
pyresttest speed issue #152

Closed
nitrocode opened this issue Feb 9, 2016 · 16 comments · Fixed by #160
@nitrocode

We're evaluating using requests instead of pyresttest because we've had issues with speed.

My login.yaml test takes about 24 seconds to run using pyresttest, while a more extensive test using the requests module (no YAML) and unittest2 takes about 6 seconds. Has anyone else noticed this too?

I'm speculating, but the slowdown in pyresttest may be due to the initial processing prior to sending the first curl request. After stepping through the program with a breakpoint on curl.perform() on line 350, it took about 4 to 5 seconds before the curl statement was reached. Is there a way to speed this up?

@svanoort
Owner

svanoort commented Feb 9, 2016

may be due to the initial processing prior to sending the first curl request

Installing libyaml may speed it up, because pyyaml can use that for faster parsing of the YAML (possible bottleneck).

Yes, there is a fixed overhead to parsing and validating the test structure, but it's a one-time cost and shouldn't be anywhere near this long unless your tests are extremely large or complex.

I would need to be able to replicate this and do some experimentation to see where the overheads are.
Can you provide sample YAML, along with which version of Python and PyRestTest (from source or the stable release from pip) and which OS you're running?

In general, once the initial parsing & configuration is done, the actual requests should be fairly fast, because PyCurl's native code is considerably faster than requests (link is to a benchmark I did). There's some overhead in configuring and analyzing each request/response pair, of course.

The benchmark methods are fairly optimized: there's a test for performance in there, which has shown that overhead is <0.3 ms per request on a fairly weak system. I'll have to see if that has regressed.

The test methods haven't been as optimized, because they do some additional processing and there hasn't been a need yet. The curl handles could be reused (big performance gain), and there's some room for caching of templates (one smaller bottleneck).

@svanoort
Owner

svanoort commented Feb 9, 2016

@nitrocode ^

@nitrocode
Author

Hi @svanoort, here is my setup

python 2.7.11
pyyaml 3.11
Windows 7 and CentOS 7

sampletest:

---
- config:
    - testset: 'Testing free api'
    - timeout: 60
    - curl_option_SSL_CIPHER_LIST: 'rsa_rc4_128_sha'

- test:
    - group: "Quickstart"
    - name: "Get data"
    - url: '/posts/1'
    - method: 'GET'
    - validators:
         - compare: {jsonpath_mini: 'userId', comparator: 'gt', expected: 0}

command:

python pyresttest/pyresttest/resttest.py http://jsonplaceholder.typicode.com sampletest/jsonplaceholder.yml

This takes about 30 msec

Possible slowdown

I haven't looked too deeply into this, but recently I've been experimenting with YAML files and pyyaml, and noticed that parsing YAML files that use dashes (lists) takes a lot longer than parsing ones without. There also seems to be overhead in iterating over parsed YAML that has many dashes, which is the case for pyresttest. Though, I'm just speculating.

Take, for example, the examples/minimap-test.yaml file, remove all of the dashes, and replace the no-value keys with 'true'; call the result minimap-test-no-lists.yaml:

---
- config:
    testset: "Benchmark tests using test app"

- benchmark: # create entity
    name: "Basic get"
    url: "/api/person/"
    warmup_runs: 7
    'benchmark_runs': '101'
    output_file: 'miniapp-benchmark.csv'
    metrics:
        total_time: 'true'
        total_time: mean
        total_time: median
        size_download: 'true'
        speed_download: median

- benchmark: # create entity
    name: "Get single person"
    url: "/api/person/1/"
    metrics: {speed_upload: median, speed_download: median, redirect_time: mean}
    output_format: json
    output_file: 'miniapp-single.json'

Here is the parse time and parsed output for your original yaml file

Took 30.9998989105 msec to parse minimap-test.yaml
[{'config': [{'testset': 'Benchmark tests using test app'}]},
 {'benchmark': [{'name': 'Basic get'},
   {'url': '/api/person/'},
   {'warmup_runs': 7},
   {'benchmark_runs': '101'},
   {'output_file': 'miniapp-benchmark.csv'},
   {'metrics': ['total_time',
     {'total_time': 'mean'},
     {'total_time': 'median'},
     'size_download',
     {'speed_download': 'median'}]}]},
 {'benchmark': [{'name': 'Get single person'},
   {'url': '/api/person/1/'},
   {'metrics': {'redirect_time': 'mean',
     'speed_download': 'median',
     'speed_upload': 'median'}},
   {'output_format': 'json'},
   {'output_file': 'miniapp-single.json'}]}]

Here it is for the same file without the dashes (no lists)

Took 20.9999084473 msec to parse minimap-test-no-lists.yml
[{'config': {'testset': 'Benchmark tests using test app'}},
 {'benchmark': {'benchmark_runs': '101',
   'metrics': {'size_download': 'true',
    'speed_download': 'median',
    'total_time': 'median'},
   'name': 'Basic get',
   'output_file': 'miniapp-benchmark.csv',
   'url': '/api/person/',
   'warmup_runs': 7}},
 {'benchmark': {'metrics': {'redirect_time': 'mean',
    'speed_download': 'median',
    'speed_upload': 'median'},
   'name': 'Get single person',
   'output_file': 'miniapp-single.json',
   'output_format': 'json',
   'url': '/api/person/1/'}}]

So there is roughly a 33% decrease (10 msec) in processing time just by converting the lists, and from looking at the differences in the parsed output, the complexity drops a bit once the lists of lists are converted into key-value pairs.

Getting the benchmark's metrics' total_time using minimap-test.yml:

data[1]['benchmark'][5]['metrics'][2]

Using minimap-test-no-lists.yml

data[1]['benchmark']['metrics']['total_time']
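A timing comparison like the one above can be reproduced with a small harness. This is a minimal sketch (assuming PyYAML is installed; the two snippets and the exact numbers are illustrative, and results will vary by machine and loader):

```python
import time
import yaml

# A cut-down testset in the two styles compared above.
WITH_LISTS = """
- config:
    - testset: "demo"
- test:
    - name: "Get data"
    - url: "/posts/1"
"""

WITHOUT_LISTS = """
- config:
    testset: "demo"
- test:
    name: "Get data"
    url: "/posts/1"
"""

def time_parse(text, runs=200):
    """Return (average parse time in msec, parsed document)."""
    start = time.perf_counter()
    for _ in range(runs):
        parsed = yaml.safe_load(text)
    elapsed = (time.perf_counter() - start) / runs * 1000.0
    return elapsed, parsed

ms_lists, doc_lists = time_parse(WITH_LISTS)
ms_flat, doc_flat = time_parse(WITHOUT_LISTS)
print("with lists: %.3f msec, without lists: %.3f msec" % (ms_lists, ms_flat))
```

The parsed structures also differ as described: the dashed form yields lists of single-key dicts, while the flat form yields plain dicts with direct key access.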

@nitrocode nitrocode reopened this Feb 17, 2016
@svanoort
Owner

@nitrocode 10-20 milliseconds of parsing isn't enough to have a big impact overall. Is performance similar on Windows and Linux for the call where pyresttest is very slow?

@svanoort
Owner

@nitrocode I've been using a test script that does profiling of PyRestTest here:
https://github.com/svanoort/pyresttest/blob/profiling-performance/profile-pyresttest.sh

This should be easy to adapt to see what the cumulative and overall time is:

python -m cProfile -s cumtime pyresttest/resttest.py http://myurl mytest.yaml > mytest-cum.txt
python -m cProfile -s tottime pyresttest/resttest.py http://myurl mytest.yaml > mytest-tot.txt

This will show where the bottlenecks are.

From the json API test script, it looks like the majority of the runtime is simply the HTTP request/response ("curl.perform()"), and the other overhead is only ~100 ms:

Test Group Quickstart SUCCEEDED: 1/1 Tests Passed!
         50354 function calls (48860 primitive calls) in 0.753 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.646    0.646    0.646    0.646 {method 'perform' of 'pycurl.Curl' objects}
        1    0.010    0.010    0.753    0.753 resttest.py:2(<module>)
      232    0.009    0.000    0.023    0.000 sre_compile.py:207(_optimize_charset)
        2    0.009    0.004    0.009    0.004 sre_compile.py:301(_optimize_unicode)
   250/38    0.007    0.000    0.019    0.000 sre_parse.py:379(_parse)
       10    0.005    0.000    0.006    0.001 collections.py:288(namedtuple)
   477/36    0.005    0.000    0.024    0.001 sre_compile.py:32(_compile)
     3034    0.004    0.000    0.005    0.000 sre_parse.py:182(__next)
        1    0.003    0.003    0.005    0.005 tests.py:1(<module>)
       41    0.002    0.000    0.002    0.000 sre_compile.py:258(_mk_bitmap)
      107    0.002    0.000    0.002    0.000 {range}
  656/217    0.002    0.000    0.002    0.000 sre_parse.py:140(getwidth)
        1    0.002    0.002    0.011    0.011 utils.py:5(<module>)
     2047    0.002    0.000    0.002    0.000 sre_parse.py:130(__getitem__)
        1    0.002    0.002    0.002    0.002 socket.py:45(<module>)
     2533    0.001    0.000    0.005    0.000 sre_parse.py:201(get)
10088/9843    0.001    0.000    0.002    0.000 {len}

When I use the content-test.yaml (much more complex use case with generators and templates against the embedded Django test app), this changes a bit and overheads go up, but it's still only a few hundred ms. To be expected since it's a more complex scenario using many of the dynamic features, which come with overheads.

So what I'm wondering is what your test is actually doing in those 4-5 seconds -- which the profiler output above would show, if you could be so kind as to attach it.

@svanoort
Owner

@nitrocode Actually... here's a better way, it'll give the full dump and I can analyze that in depth to see what's slow for you (since I can't reproduce this kind of slowness on any of my calls):

python -m cProfile -o my_api_dump.out pyresttest/resttest.py http://mylogin.api.com/api my_login_test.yaml
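Once such a dump exists, it can be summarized with the stdlib pstats module. A minimal self-contained sketch, profiling a toy stand-in function rather than an actual resttest.py run (the function name and file name here are made up for illustration):

```python
import cProfile
import io
import pstats

def toy_workload():
    # Stand-in for a real test run: burn some time in one function.
    return sum(i * i for i in range(50000))

# Profile the workload and dump stats to a file, mirroring
# `python -m cProfile -o my_api_dump.out ...`
profiler = cProfile.Profile()
profiler.enable()
toy_workload()
profiler.disable()
profiler.dump_stats("toy_dump.out")

# Load the dump back and print the top 5 entries by cumulative time.
stream = io.StringIO()
stats = pstats.Stats("toy_dump.out", stream=stream)
stats.sort_stats("cumtime").print_stats(5)
report = stream.getvalue()
print(report)
```

The same `pstats.Stats("my_api_dump.out")` pattern works on the dump produced by the command above, or a GUI like snakeviz can render it.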

@svanoort
Owner

Rough stats on overheads with content-test.yaml (skewed because there's ZERO network overhead and requests are nearly instant):

v1.6.0 code -
60% curl.perform (200 msec)
6.6% header parsing (22 msec, includes a lot of benchmark runs)
12.8% test file load/parse (43 msec, mostly YAML parsing)
~15% (most of remainder): module loading - call it ~50 msec.

The only part of this that should get longer with more network traffic/requests is the curl.perform bit (all native code, same as curl) or header parsing.

Similar breakdown on python 2.7 in Ubuntu minimal docker container, and with the latest master branch running locally.

Looking at instructions to install pyyaml with libyaml:
http://pyyaml.org/wiki/PyYAMLDocumentation

@svanoort
Owner

Personal notes from analyzing Python3 performance:

It appears that Python 3 performance is overall similar to Python 2, although distributed a bit differently - module loading seems to be a larger share (up to several hundred ms). That might be due to running within a Docker container for that portion, though.

Overall, for Python 2 & 3, it looks like rough expected benchmarks (at least on this system with a fast SSD, so disk I/O is fast) are:

  • ~100-200 ms to load & initialize python libraries - fairly constant with different testset sizes
  • 10-100 ms to load & parse YAML files (including fairly complex test sets here) - increases with testset size and complexity, but still generally fast
  • parsing headers is more expensive than expected, can be 5-10% of overall time for very fast local requests

The YAML parsing can be sped up quite a bit by enabling the libyaml bindings.
According to StackOverflow, where pure-Python YAML parsing takes 405 msec, the libyaml version takes 59.2 msec.

It looks like that would require some trivial PyRestTest changes (to conditionally import and use the yaml CLoader rather than the pure-Python loader)... however, it's not worth dealing with unless parsing becomes a bottleneck.
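The conditional import in question is small. A sketch of what such a change might look like (names like `parse_testset` are hypothetical, not PyRestTest's actual code; the try/except means the sketch degrades gracefully when PyYAML was built without libyaml):

```python
import yaml

# Prefer the libyaml-backed loader when PyYAML was compiled against libyaml;
# fall back to the pure-Python loader when the C bindings are absent.
try:
    from yaml import CSafeLoader as FastestLoader
except ImportError:
    from yaml import SafeLoader as FastestLoader

def parse_testset(text):
    """Parse a PyRestTest-style YAML testset with the fastest available loader."""
    return yaml.load(text, Loader=FastestLoader)

doc = parse_testset("---\n- config:\n    testset: 'demo'\n")
print(doc)
```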

Sample benchmark + analysis command:

python -m cProfile -o all-miniapp-nobench.profile pyresttest/resttest.py http://localhost:8000 examples/miniapp-run-all.yaml
snakeviz all-miniapp-nobench.profile

@nitrocode
Author

Hi @svanoort. Thank you for introducing me to profiling as I wasn't aware of this feature.

I used the same rest api as in my last comment, but this time profiled it: curl took 40 seconds, 99% of the total runtime. Unbelievable. I updated pycurl to the latest version and it's still incredibly slow on Windows. I still need to test this on Linux again.

@svanoort
Owner

@nitrocode Are you saying the curl.perform call is 99% of the runtime?

That's definitely a Windows issue. Windows performance for Python is sometimes kind of terrible, and there are probably other issues here with a native (curl) library not originally intended for Windows use.

@svanoort
Owner

@nitrocode By the way, this post on reddit suggests that in some cases you may actually see better performance running Python code in a Linux VM than directly in the host OS. Given the sharp performance disparity on Windows, I'd suggest giving that a shot here.

@svanoort
Owner

@nitrocode I think I've got a solution for you (the result of putting together a better answer to a StackOverflow question on PyCurl vs Requests performance). The root cause is an issue with DNS performance in curl interacting with the OS (or at least it is on Linux).

The code fix here should greatly increase performance, because it also allows use of pycurl's built-in connection pool for tests (in addition to the DNS cache): #160

Benchmarks suggest that pyresttest may see almost an order-of-magnitude improvement in test execution via pyCurl.

With the patch, the networking portions of the tests may be up to 2x to 4x faster than unittest/requests now.
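The pattern behind the fix amounts to keeping one Curl handle alive across requests, so curl's DNS cache and connection pool are reused instead of being discarded per test. A rough sketch of that pattern, not the actual #160 code (assumes pycurl is installed; the import is guarded so the sketch loads without it):

```python
import io

try:
    import pycurl
    HAVE_PYCURL = True
except ImportError:  # keep the sketch importable even without pycurl
    HAVE_PYCURL = False

def run_testset(urls):
    """Run several GETs on ONE reused handle instead of one handle per test."""
    handle = pycurl.Curl()
    # Keep DNS results cached for the whole run (-1 = never expire).
    handle.setopt(pycurl.DNS_CACHE_TIMEOUT, -1)
    bodies = []
    for url in urls:
        buf = io.BytesIO()
        handle.setopt(pycurl.URL, url)
        handle.setopt(pycurl.WRITEDATA, buf)
        # Repeated requests to the same host reuse the pooled connection
        # and the cached DNS entry rather than re-resolving every time.
        handle.perform()
        bodies.append(buf.getvalue())
    handle.close()
    return bodies
```

Creating a fresh `pycurl.Curl()` per request throws away both caches, which is where the per-test DNS penalty comes from.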

@svanoort
Owner

@nitrocode I've gone ahead and closed this issue, since I've root-caused this and identified the culprit (a bug in how the underlying native library interacts with the OS), along with a solid workaround.

Please let me know if this does not resolve it, and I'll re-open it. Might even be able to test on Windows separately from other OSes now.

@svanoort
Owner

(To be clear, this is only an order-of-magnitude improvement when running long series of tests against the same server; the improvement will probably be more modest in other cases.)

@nitrocode
Author

Yeah, I don't know what the issue is. I'm still seeing incredibly long curl times. After this experience, I'm no longer a fan of pycurl.

Thank you for all your efforts on this. I wish I could make it work for me. Hopefully I'm just a corner case.

@svanoort
Owner

@nitrocode sigh. I've got nothing, then; if it misbehaves on Linux, there are a couple of new avenues to pursue that I discovered (some additional curl settings).

I continue to be fond of pyCurl's options and performance, but at some point in the future I'm going to have to offer a fallback to requests for simplicity & reliability (at the cost of the features which depend on pycurl).
