Speed up Greenlet creation on CPython #1120

Merged
4 commits merged on Feb 24, 2018

jamadden commented Feb 24, 2018

This speeds up Greenlet creation in two ways: storing tuples instead of
_frame objects, and using direct access to two of the attributes of the
CPython frame objects.
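
To illustrate the first idea, here is a minimal pure-Python sketch of recording a spawning stack as plain tuples rather than one wrapper object per frame. The helper name `capture_stack`, the `limit` parameter, and the choice of `f_code` and `f_lineno` as the two attributes are assumptions made for illustration; this is not the code from this PR, and it cannot show the C-level direct attribute access the description refers to.

```python
import sys


def capture_stack(limit=10):
    """Return the caller's stack as a list of (code, lineno) tuples.

    Hypothetical helper: stores small tuples instead of allocating a
    frame-like wrapper object for every frame that is walked.
    """
    stack = []
    # Depth 1 skips this helper's own frame and starts at the caller.
    frame = sys._getframe(1)
    while frame is not None and len(stack) < limit:
        # Only two attributes are read from each frame (assumed pair).
        stack.append((frame.f_code, frame.f_lineno))
        frame = frame.f_back
    return stack


def spawn_site():
    # Stands in for the place where a greenlet would be created.
    return capture_stack(limit=3)


stack = spawn_site()
print([(code.co_name, lineno) for code, lineno in stack])
```

Tuples are among the cheapest containers CPython can allocate, which is the likely reason they help on the creation path; `sys._getframe` itself is a CPython implementation detail.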

Benchmarks:

CPython 2.7 master vs this change

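+------------------------+-----------------+------------------------------+
| Benchmark              | spawn_27_master | spawn_27_tuple2              |
+========================+=================+==============================+
| eventlet sleep         | 9.12 us         | 8.77 us: 1.04x faster (-4%)  |
+------------------------+-----------------+------------------------------+
| gevent spawn           | 14.5 us         | 13.2 us: 1.10x faster (-9%)  |
+------------------------+-----------------+------------------------------+
| gevent sleep           | 1.63 us         | 1.86 us: 1.14x slower (+14%) |
+------------------------+-----------------+------------------------------+
| geventpool spawn       | 30.4 us         | 23.6 us: 1.29x faster (-22%) |
+------------------------+-----------------+------------------------------+
| geventpool sleep       | 4.30 us         | 4.55 us: 1.06x slower (+6%)  |
+------------------------+-----------------+------------------------------+
| geventpool join        | 1.70 us         | 1.83 us: 1.08x slower (+8%)  |
+------------------------+-----------------+------------------------------+
| gevent spawn kwarg     | 16.5 us         | 13.5 us: 1.22x faster (-18%) |
+------------------------+-----------------+------------------------------+
| geventpool spawn kwarg | 30.5 us         | 23.9 us: 1.27x faster (-22%) |
+------------------------+-----------------+------------------------------+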

Not significant (7): eventlet spawn; geventraw spawn; geventraw sleep;
none spawn; eventlet spawn kwarg; geventraw spawn kwarg; none spawn
kwarg

CPython 3.6 master vs this change

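+------------------------+-----------------+------------------------------+
| Benchmark              | spawn_36_master | spawn_36_tuple2              |
+========================+=================+==============================+
| gevent spawn           | 13.2 us         | 11.9 us: 1.12x faster (-10%) |
+------------------------+-----------------+------------------------------+
| gevent sleep           | 1.71 us         | 1.90 us: 1.11x slower (+11%) |
+------------------------+-----------------+------------------------------+
| geventpool spawn       | 19.9 us         | 15.9 us: 1.25x faster (-20%) |
+------------------------+-----------------+------------------------------+
| geventpool sleep       | 3.54 us         | 3.75 us: 1.06x slower (+6%)  |
+------------------------+-----------------+------------------------------+
| geventpool spawn kwarg | 20.3 us         | 15.9 us: 1.27x faster (-22%) |
+------------------------+-----------------+------------------------------+
| geventraw spawn kwarg  | 5.80 us         | 6.10 us: 1.05x slower (+5%)  |
+------------------------+-----------------+------------------------------+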

Not significant (9): eventlet spawn; eventlet sleep; geventraw spawn;
geventraw sleep; none spawn; geventpool join; eventlet spawn kwarg;
gevent spawn kwarg; none spawn kwarg

PyPy2 master vs this change

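+------------------+-------------------+------------------------------+
| Benchmark        | spawn_pypy_master | spawn_pypy_tuple2            |
+==================+===================+==============================+
| eventlet spawn   | 30.5 us           | 28.9 us: 1.05x faster (-5%)  |
+------------------+-------------------+------------------------------+
| eventlet sleep   | 3.39 us           | 3.19 us: 1.06x faster (-6%)  |
+------------------+-------------------+------------------------------+
| gevent spawn     | 9.89 us           | 17.2 us: 1.73x slower (+73%) |
+------------------+-------------------+------------------------------+
| gevent sleep     | 3.14 us           | 3.99 us: 1.27x slower (+27%) |
+------------------+-------------------+------------------------------+
| geventpool spawn | 12.3 us           | 20.1 us: 1.63x slower (+63%) |
+------------------+-------------------+------------------------------+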

Not significant (1): geventpool sleep

CPython 3.6: 1.3a1 vs this change

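+------------------------+---------------+-------------------------------+
| Benchmark              | spawn_36_13a1 | spawn_36_tuple2               |
+========================+===============+===============================+
| eventlet spawn         | 14.0 us       | 13.2 us: 1.06x faster (-6%)   |
+------------------------+---------------+-------------------------------+
| gevent spawn           | 4.25 us       | 11.9 us: 2.79x slower (+179%) |
+------------------------+---------------+-------------------------------+
| gevent sleep           | 2.78 us       | 1.90 us: 1.46x faster (-32%)  |
+------------------------+---------------+-------------------------------+
| geventpool spawn       | 10.4 us       | 15.9 us: 1.52x slower (+52%)  |
+------------------------+---------------+-------------------------------+
| geventpool sleep       | 5.52 us       | 3.75 us: 1.47x faster (-32%)  |
+------------------------+---------------+-------------------------------+
| geventraw spawn        | 2.56 us       | 5.09 us: 1.99x slower (+99%)  |
+------------------------+---------------+-------------------------------+
| geventraw sleep        | 738 ns        | 838 ns: 1.14x slower (+14%)   |
+------------------------+---------------+-------------------------------+
| geventpool join        | 3.94 us       | 1.75 us: 2.25x faster (-56%)  |
+------------------------+---------------+-------------------------------+
| gevent spawn kwarg     | 5.50 us       | 12.1 us: 2.19x slower (+119%) |
+------------------------+---------------+-------------------------------+
| geventpool spawn kwarg | 11.3 us       | 15.9 us: 1.41x slower (+41%)  |
+------------------------+---------------+-------------------------------+
| geventraw spawn kwarg  | 3.90 us       | 6.10 us: 1.56x slower (+56%)  |
+------------------------+---------------+-------------------------------+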

Not significant (4): eventlet sleep; none spawn; eventlet spawn kwarg; none spawn kwarg

The eventlet, sleep, join and raw tests serve as controls, so we can see that there's up to ~10% variance between most runs anyway.

CPython 3.6 shows the least variance so those 10-20% improvement numbers are probably fairly close.
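
For readers checking the tables, the "Nx faster (-P%)" figures can be approximately reproduced from the two timing columns. The sketch below is only an illustration: the function name `compare` is made up, and because it works from the rounded timings printed above, the last digit can differ from the table for some rows (the comparison tool presumably used unrounded values).

```python
def compare(before_us, after_us):
    """Describe a timing change in the style used by the tables.

    The ratio is always reported as >= 1; the percentage is the change in
    time relative to the 'before' value (negative means less time).
    """
    percent = (after_us - before_us) / before_us * 100
    if after_us < before_us:
        return "%.2fx faster (%+.0f%%)" % (before_us / after_us, percent)
    return "%.2fx slower (%+.0f%%)" % (after_us / before_us, percent)


# Rounded timings copied from the tables above.
print("gevent spawn (2.7):    ", compare(14.5, 13.2))
print("gevent sleep (2.7):    ", compare(1.63, 1.86))
print("geventpool spawn (3.6):", compare(19.9, 15.9))
```

Note that "1.25x faster" and "-20%" describe the same change: the ratio divides the old time by the new one, while the percentage is measured against the old time.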

PyPy sadly gets slower with this change for reasons that are utterly unclear.

Compared to 1.3a1 (last benchmark) we're still up to 2-3x slower.

Creation of a raw greenlet shows 2.66us on CPython 3.6.4 vs the 3.65us reported in #755.

jamadden added some commits Feb 24, 2018

Rework bench_spawn to be perf based for more reliable numbers
I get this on 3.6.4:

.....................
eventlet spawn: Mean +- std dev: 12.2 us +- 1.2 us
.....................
eventlet sleep: Mean +- std dev: 16.3 us +- 0.8 us
.....................
gevent spawn: Mean +- std dev: 13.1 us +- 1.1 us
.....................
gevent sleep: Mean +- std dev: 10.4 us +- 0.8 us
.....................
WARNING: the benchmark result may be unstable
* the standard deviation (2.73 us) is 17% of the mean (16.1 us)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m perf system tune' command to reduce the system jitter.
Use perf stats, perf dump and perf hist to analyze results.
Use --quiet option to hide these warnings.

geventpool spawn: Mean +- std dev: 16.1 us +- 2.7 us
.....................
geventpool sleep: Mean +- std dev: 11.2 us +- 0.5 us
.....................
geventraw spawn: Mean +- std dev: 4.95 us +- 0.42 us
.....................
geventraw sleep: Mean +- std dev: 7.34 us +- 0.28 us
.....................
none spawn: Mean +- std dev: 1.98 us +- 0.05 us
.....................
geventpool join: Mean +- std dev: 6.59 us +- 0.25 us
.....................
WARNING: the benchmark result may be unstable
* the standard deviation (1.28 us) is 10% of the mean (12.7 us)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m perf system tune' command to reduce the system jitter.
Use perf stats, perf dump and perf hist to analyze results.
Use --quiet option to hide these warnings.

eventlet spawn kwarg: Mean +- std dev: 12.7 us +- 1.3 us
.....................
gevent spawn kwarg: Mean +- std dev: 14.6 us +- 1.2 us
.....................
WARNING: the benchmark result may be unstable
* the standard deviation (2.81 us) is 17% of the mean (17.0 us)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m perf system tune' command to reduce the system jitter.
Use perf stats, perf dump and perf hist to analyze results.
Use --quiet option to hide these warnings.

geventpool spawn kwarg: Mean +- std dev: 17.0 us +- 2.8 us
.....................
geventraw spawn kwarg: Mean +- std dev: 6.11 us +- 0.45 us
.....................
none spawn kwarg: Mean +- std dev: 2.22 us +- 0.07 us
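
The "may be unstable" warnings above compare the spread of the measured values with their mean. A rough sketch of that kind of check follows; the function name and the thresholds (10% for the standard deviation, 50% for the minimum and maximum) are assumptions inferred from the output in this message, not taken from the perf source, and the sample values are made up.

```python
import statistics


def unstable_reasons(values, stdev_limit=0.10, extreme_limit=0.50):
    """Return human-readable reasons why a list of timings looks unstable.

    Thresholds are assumed, not the perf module's documented rules.
    """
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    reasons = []
    if stdev / mean >= stdev_limit:
        reasons.append(
            "the standard deviation is %.0f%% of the mean" % (stdev / mean * 100))
    if (mean - min(values)) / mean >= extreme_limit:
        reasons.append(
            "the minimum is %.0f%% smaller than the mean"
            % ((mean - min(values)) / mean * 100))
    if (max(values) - mean) / mean >= extreme_limit:
        reasons.append(
            "the maximum is %.0f%% greater than the mean"
            % ((max(values) - mean) / mean * 100))
    return reasons


# Made-up timings in microseconds, for illustration only.
stable = [12.0, 12.1, 11.9, 12.0]
noisy = [10.0, 11.0, 30.0, 9.0]

print(unstable_reasons(stable))
print(unstable_reasons(noisy))
```

A single outlier is enough to trip both the standard-deviation and the maximum checks, which is consistent with the spawn benchmarks being flagged far more often than the sleep benchmarks.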

And this on 2.7.14:

.....................
WARNING: the benchmark result may be unstable
* the standard deviation (2.10 us) is 11% of the mean (18.4 us)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m perf system tune' command to reduce the system jitter.
Use perf stats, perf dump and perf hist to analyze results.
Use --quiet option to hide these warnings.

eventlet spawn: Mean +- std dev: 18.4 us +- 2.1 us
.....................
eventlet sleep: Mean +- std dev: 23.1 us +- 0.8 us
.....................
WARNING: the benchmark result may be unstable
* the standard deviation (4.39 us) is 25% of the mean (17.3 us)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m perf system tune' command to reduce the system jitter.
Use perf stats, perf dump and perf hist to analyze results.
Use --quiet option to hide these warnings.

gevent spawn: Mean +- std dev: 17.3 us +- 4.4 us
.....................
gevent sleep: Mean +- std dev: 10.3 us +- 0.5 us
.....................
WARNING: the benchmark result may be unstable
* the standard deviation (3.92 us) is 16% of the mean (24.7 us)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m perf system tune' command to reduce the system jitter.
Use perf stats, perf dump and perf hist to analyze results.
Use --quiet option to hide these warnings.

geventpool spawn: Mean +- std dev: 24.7 us +- 3.9 us
.....................
geventpool sleep: Mean +- std dev: 13.5 us +- 0.9 us
.....................
geventraw spawn: Mean +- std dev: 6.91 us +- 0.49 us
.....................
geventraw sleep: Mean +- std dev: 8.95 us +- 0.30 us
.....................
none spawn: Mean +- std dev: 2.21 us +- 0.04 us
.....................
geventpool join: Mean +- std dev: 7.93 us +- 0.28 us
.....................
eventlet spawn kwarg: Mean +- std dev: 17.4 us +- 1.3 us
.....................
WARNING: the benchmark result may be unstable
* the standard deviation (4.11 us) is 27% of the mean (15.1 us)
* the maximum (24.2 us) is 60% greater than the mean (15.1 us)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m perf system tune' command to reduce the system jitter.
Use perf stats, perf dump and perf hist to analyze results.
Use --quiet option to hide these warnings.

gevent spawn kwarg: Mean +- std dev: 15.1 us +- 4.1 us
.....................
WARNING: the benchmark result may be unstable
* the standard deviation (4.74 us) is 18% of the mean (26.8 us)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m perf system tune' command to reduce the system jitter.
Use perf stats, perf dump and perf hist to analyze results.
Use --quiet option to hide these warnings.

geventpool spawn kwarg: Mean +- std dev: 26.8 us +- 4.7 us
.....................
WARNING: the benchmark result may be unstable
* the standard deviation (959 ns) is 12% of the mean (8.00 us)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m perf system tune' command to reduce the system jitter.
Use perf stats, perf dump and perf hist to analyze results.
Use --quiet option to hide these warnings.

geventraw spawn kwarg: Mean +- std dev: 8.00 us +- 0.96 us
.....................
none spawn kwarg: Mean +- std dev: 2.48 us +- 0.06 us

Partial PyPy results:

.........
WARNING: the benchmark result may be unstable
* the standard deviation (5.77 us) is 18% of the mean (32.5 us)
* the maximum (52.2 us) is 61% greater than the mean (32.5 us)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m perf system tune' command to reduce the system jitter.
Use perf stats, perf dump and perf hist to analyze results.
Use --quiet option to hide these warnings.

eventlet spawn: Mean +- std dev: 32.5 us +- 5.8 us
.........
eventlet sleep: Mean +- std dev: 39.9 us +- 2.4 us
.........
WARNING: the benchmark result may be unstable
* the standard deviation (8.90 us) is 43% of the mean (20.6 us)
* the minimum (8.50 us) is 59% smaller than the mean (20.6 us)
* the maximum (41.5 us) is 102% greater than the mean (20.6 us)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m perf system tune' command to reduce the system jitter.
Use perf stats, perf dump and perf hist to analyze results.
Use --quiet option to hide these warnings.

gevent spawn: Mean +- std dev: 20.6 us +- 8.9 us
.........
gevent sleep: Mean +- std dev: 4.20 us +- 0.21 us
.........
WARNING: the benchmark result may be unstable
* the standard deviation (8.52 us) is 50% of the mean (17.2 us)
* the minimum (7.74 us) is 55% smaller than the mean (17.2 us)
* the maximum (58.1 us) is 238% greater than the mean (17.2 us)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m perf system tune' command to reduce the system jitter.
Use perf stats, perf dump and perf hist to analyze results.
Use --quiet option to hide these warnings.

geventpool spawn: Mean +- std dev: 17.2 us +- 8.5 us
.........
WARNING: the benchmark result may be unstable
* the standard deviation (968 ns) is 18% of the mean (5.26 us)
* the maximum (10.5 us) is 99% greater than the mean (5.26 us)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m perf system tune' command to reduce the system jitter.
Use perf stats, perf dump and perf hist to analyze results.
Use --quiet option to hide these warnings.

geventpool sleep: Mean +- std dev: 5.26 us +- 0.97 us
..........
WARNING: the benchmark result may be unstable
* the standard deviation (1.45 us) is 52% of the mean (2.80 us)
* the maximum (5.50 us) is 96% greater than the mean (2.80 us)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m perf system tune' command to reduce the system jitter.
Use perf stats, perf dump and perf hist to analyze results.
Use --quiet option to hide these warnings.

geventraw spawn: Mean +- std dev: 2.80 us +- 1.45 us
.........
geventraw sleep: Mean +- std dev: 4.24 us +- 0.27 us
.........
none spawn: Mean +- std dev: 1.10 us +- 0.06 us
Speed up Greenlet creation on CPython

@jamadden jamadden merged commit b353730 into master Feb 24, 2018

@jamadden jamadden deleted the faster-stack branch Feb 24, 2018