
Optimize Pipeline by using a deque #51

Merged: 1 commit into jsocol:master on Aug 21, 2014

Conversation

davidblewett (Contributor)

This section of code is going to be pretty hot; it's best to optimize it as much as possible.

It might be worth it to switch to a list for the data variable as well, and use '\n'.join(data) (vs. string concatenation).

From: https://docs.python.org/2/library/collections.html#collections.deque

"Though list objects support similar operations, they are optimized for fast fixed-length operations and incur O(n) memory movement costs for pop(0) and insert(0, v) operations which change both the size and position of the underlying data representation."
jsocol (Owner) commented Aug 18, 2014

Given that this is a pure-performance change, do you have any benchmarks (something simple using timeit would be great) showing that this is actually faster?

davidblewett (Contributor, Author)

I think a deque is just the correct data structure to use here; it should ease memory fragmentation as well as perform better.

The timeit harness uses a loop similar to the one in the hottest part of our code: it runs 16 patterns looking for different things in an input stream and records the time taken. I imagine the performance gains will depend on the number of items in the pipeline; the higher that number goes, the better the deque version should perform.

timeit harness: https://gist.github.com/davidblewett/201c0f83de0824d2226b
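
(The numbers below come from the gist above. As a self-contained stand-in, a comparison that isolates just the list-vs-deque difference might look like the following; the names and iteration counts here are made up for illustration:)

```python
import timeit

SETUP = "from collections import deque; N = 1000"

LIST_DRAIN = """
pipe = list(range(N))
while pipe:
    pipe.pop(0)        # O(N) per pop: shifts every remaining item
"""

DEQUE_DRAIN = """
pipe = deque(range(N))
while pipe:
    pipe.popleft()     # O(1) per pop
"""

for label, stmt in (("list.pop(0)", LIST_DRAIN),
                    ("deque.popleft()", DEQUE_DRAIN)):
    seconds = timeit.timeit(stmt, setup=SETUP, number=10000)
    print("%-16s %.2fs" % (label, seconds))
```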

CPython 2.7:

Mon Aug 18, 15:15 | /Users/dblewett/crowdstrike/csoc-buildout
kell% bin/py src/RaptorFramework/rf/test.py
Stock Pipeline (1000000 iterations):
56.6121869087
Deque Pipeline (1000000 iterations):
51.9880800247
bin/py src/RaptorFramework/rf/test.py  94.18s user 14.29s system 99% cpu 1:48.71 total

Mon Aug 18, 15:17 | /Users/dblewett/crowdstrike/csoc-buildout
kell% bin/py src/RaptorFramework/rf/test.py
Stock Pipeline (1000000 iterations):
56.276321888
Deque Pipeline (1000000 iterations):
51.1171529293
bin/py src/RaptorFramework/rf/test.py  93.37s user 13.96s system 99% cpu 1:47.47 total

Mon Aug 18, 15:19 | /Users/dblewett/crowdstrike/csoc-buildout
kell% bin/py src/RaptorFramework/rf/test.py
Stock Pipeline (1000000 iterations):
54.9464468956
Deque Pipeline (1000000 iterations):
50.5001010895
bin/py src/RaptorFramework/rf/test.py  91.70s user 13.71s system 99% cpu 1:45.52 total

What's interesting is that PyPy 2.3's JIT was able to optimize the current code pretty well:

Mon Aug 18, 15:09 | /Users/dblewett/crowdstrike/csoc-buildout-pypy
kell% bin/py src/RaptorFramework/rf/test.py
Stock Pipeline:
11.0318801403
Deque Pipeline:
9.75559520721

Mon Aug 18, 15:11 | /Users/dblewett/crowdstrike/csoc-buildout-pypy
kell% bin/py src/RaptorFramework/rf/test.py
Stock Pipeline (1000000 iterations):
10.1341640949
Deque Pipeline (1000000 iterations):
9.59893703461

Mon Aug 18, 15:14 | /Users/dblewett/crowdstrike/csoc-buildout-pypy
kell% bin/py src/RaptorFramework/rf/test.py
Stock Pipeline (1000000 iterations):
9.96120500565
Deque Pipeline (1000000 iterations):
9.59728908539

jsocol added a commit that referenced this pull request Aug 21, 2014
Optimize Pipeline by using a deque
jsocol merged commit 5029468 into jsocol:master on Aug 21, 2014
jsocol (Owner) commented Aug 21, 2014

I tweaked it to 100k iterations with 1000 items in the pipe:

$ python test_deque_pipeline.py; python test_deque_pipeline.py; python test_deque_pipeline.py 
Stock Pipeline (100000 iterations):
374.550565958
Deque Pipeline (100000 iterations):
325.278663158
Stock Pipeline (100000 iterations):
361.189954042
Deque Pipeline (100000 iterations):
327.640599966
Stock Pipeline (100000 iterations):
360.275238037
Deque Pipeline (100000 iterations):
326.505084991

I expected bigger percentage-wise wins as the size of the pipe scaled up, and just didn't see them. It is an improvement, so I merged it, but I don't think it's a significant win.

A bigger hypothetical win is combining stats, e.g. 1000 incr('foo') calls can be turned into one incr('foo', 1000), and sent in a single packet, instead of looping through the whole thing and sending ~8. (That may interfere with how some people measure, and definitely would interfere with timers, but seems safe-ish for counters without multiple sample rates and gauges.) The only problem is it depends deeply on what people are counting in pipelines. If it's lots of unique stats, then that kind of "optimization" is almost certainly worse.
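
A rough sketch of that aggregation idea, as a hypothetical combine_counters helper that is not part of the library; it deliberately leaves timers, gauges, and sampled counters alone, per the caveats above:

```python
from collections import Counter


def combine_counters(lines):
    """Collapse repeated plain-counter lines into one per stat.

    Only unsampled counters ('name:value|c') are combined; timers,
    gauges, and sampled counters pass through untouched.
    """
    totals = Counter()
    passthrough = []
    for line in lines:
        stat, _, payload = line.partition(':')
        value, _, kind = payload.partition('|')
        if kind == 'c':
            totals[stat] += int(value)
        else:
            passthrough.append(line)
    combined = ['%s:%d|c' % (stat, total) for stat, total in totals.items()]
    return combined + passthrough


lines = ['foo:1|c'] * 1000 + ['bar:25|ms', 'baz:1|c|@0.5']
print(combine_counters(lines))
# ['foo:1000|c', 'bar:25|ms', 'baz:1|c|@0.5']
```

Whether this helps depends, as noted, on the mix of stats: a thousand increments of one counter collapse to a single line, but a thousand unique counters don't shrink at all.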
