
Optimize Pipeline by using a deque #51

Merged: 1 commit into jsocol:master on Aug 21, 2014

Conversation

davidblewett (Contributor)

This section of code is going to be pretty hot; it's best to optimize it as much as possible.

It might be worth it to switch to a list for the data variable as well, and use '\n'.join(data) (vs. string concatenation).

From: https://docs.python.org/2/library/collections.html#collections.deque

"Though list objects support similar operations, they are optimized for fast fixed-length operations and incur O(n) memory movement costs for pop(0) and insert(0, v) operations which change both the size and position of the underlying data representation."
jsocol (Owner) commented Aug 18, 2014

Given that this is a pure-performance change, do you have any benchmarks (something simple using timeit would be great) showing that this is actually faster?

davidblewett (Contributor, Author)

I think a deque is just the correct data structure to use here; it should ease memory fragmentation as well as perform better.

The timeit harness uses a loop similar to the one in the hottest part of our code: it runs 16 patterns looking for different things in an input stream and records the time taken. I imagine the performance gains will depend on the number of items in the pipeline; the higher that number goes, the better the deque version should perform.

timeit harness: https://gist.github.com/davidblewett/201c0f83de0824d2226b
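
(The numbers below come from the gist above. As a self-contained stand-in, a comparison that isolates just the list-vs-deque difference might look like the following; the names and iteration counts here are made up for illustration:)

```python
import timeit

SETUP = "from collections import deque; N = 1000"

LIST_DRAIN = """
pipe = list(range(N))
while pipe:
    pipe.pop(0)        # O(N) per pop: shifts every remaining item
"""

DEQUE_DRAIN = """
pipe = deque(range(N))
while pipe:
    pipe.popleft()     # O(1) per pop
"""

for label, stmt in (("list.pop(0)", LIST_DRAIN),
                    ("deque.popleft()", DEQUE_DRAIN)):
    seconds = timeit.timeit(stmt, setup=SETUP, number=10000)
    print("%-16s %.2fs" % (label, seconds))
```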

CPython 2.7:

Mon Aug 18, 15:15 | /Users/dblewett/crowdstrike/csoc-buildout
kell% bin/py src/RaptorFramework/rf/test.py
Stock Pipeline (1000000 iterations):
56.6121869087
Deque Pipeline (1000000 iterations):
51.9880800247
bin/py src/RaptorFramework/rf/test.py  94.18s user 14.29s system 99% cpu 1:48.71 total

Mon Aug 18, 15:17 | /Users/dblewett/crowdstrike/csoc-buildout
kell% bin/py src/RaptorFramework/rf/test.py
Stock Pipeline (1000000 iterations):
56.276321888
Deque Pipeline (1000000 iterations):
51.1171529293
bin/py src/RaptorFramework/rf/test.py  93.37s user 13.96s system 99% cpu 1:47.47 total

Mon Aug 18, 15:19 | /Users/dblewett/crowdstrike/csoc-buildout
kell% bin/py src/RaptorFramework/rf/test.py
Stock Pipeline (1000000 iterations):
54.9464468956
Deque Pipeline (1000000 iterations):
50.5001010895
bin/py src/RaptorFramework/rf/test.py  91.70s user 13.71s system 99% cpu 1:45.52 total

What's interesting is that PyPy 2.3's JIT was able to optimize the current code pretty well:

Mon Aug 18, 15:09 | /Users/dblewett/crowdstrike/csoc-buildout-pypy
kell% bin/py src/RaptorFramework/rf/test.py
Stock Pipeline:
11.0318801403
Deque Pipeline:
9.75559520721

Mon Aug 18, 15:11 | /Users/dblewett/crowdstrike/csoc-buildout-pypy
kell% bin/py src/RaptorFramework/rf/test.py
Stock Pipeline (1000000 iterations):
10.1341640949
Deque Pipeline (1000000 iterations):
9.59893703461

Mon Aug 18, 15:14 | /Users/dblewett/crowdstrike/csoc-buildout-pypy
kell% bin/py src/RaptorFramework/rf/test.py
Stock Pipeline (1000000 iterations):
9.96120500565
Deque Pipeline (1000000 iterations):
9.59728908539

jsocol added a commit that referenced this pull request Aug 21, 2014
Optimize Pipeline by using a deque
jsocol merged commit 5029468 into jsocol:master on Aug 21, 2014
jsocol (Owner) commented Aug 21, 2014

I tweaked it to 100k iterations with 1000 items in the pipe:

$ python test_deque_pipeline.py; python test_deque_pipeline.py; python test_deque_pipeline.py 
Stock Pipeline (100000 iterations):
374.550565958
Deque Pipeline (100000 iterations):
325.278663158
Stock Pipeline (100000 iterations):
361.189954042
Deque Pipeline (100000 iterations):
327.640599966
Stock Pipeline (100000 iterations):
360.275238037
Deque Pipeline (100000 iterations):
326.505084991

I expected bigger percentage-wise wins as the size of the pipe scaled up, and just didn't see them. It is an improvement, so I merged it, but I don't think it's a significant win.

A bigger hypothetical win is combining stats, e.g. 1000 incr('foo') calls can be turned into one incr('foo', 1000), and sent in a single packet, instead of looping through the whole thing and sending ~8. (That may interfere with how some people measure, and definitely would interfere with timers, but seems safe-ish for counters without multiple sample rates and gauges.) The only problem is it depends deeply on what people are counting in pipelines. If it's lots of unique stats, then that kind of "optimization" is almost certainly worse.
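
A rough sketch of that aggregation idea, as a hypothetical combine_counters helper that is not part of the library; it deliberately leaves timers, gauges, and sampled counters alone, per the caveats above:

```python
from collections import Counter


def combine_counters(lines):
    """Collapse repeated plain-counter lines into one per stat.

    Only unsampled counters ('name:value|c') are combined; timers,
    gauges, and sampled counters pass through untouched.
    """
    totals = Counter()
    passthrough = []
    for line in lines:
        stat, _, payload = line.partition(':')
        value, _, kind = payload.partition('|')
        if kind == 'c':
            totals[stat] += int(value)
        else:
            passthrough.append(line)
    combined = ['%s:%d|c' % (stat, total) for stat, total in totals.items()]
    return combined + passthrough


lines = ['foo:1|c'] * 1000 + ['bar:25|ms', 'baz:1|c|@0.5']
print(combine_counters(lines))
# ['foo:1000|c', 'bar:25|ms', 'baz:1|c|@0.5']
```

Whether this helps depends, as noted, on the mix of stats: a thousand increments of one counter collapse to a single line, but a thousand unique counters don't shrink at all.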
