Optimize Pipeline by using a deque #51
This section of code is going to be pretty hot; it's best to optimize it as much as possible.
It might be worth it to switch to a list for the data variable as well, and use '\n'.join(data) (vs. string concatenation).
From the docs (https://docs.python.org/2/library/collections.html#collections.deque):
"Though list objects support similar operations, they are optimized for fast fixed-length operations and incur O(n) memory movement costs for pop(0) and insert(0, v) operations which change both the size and position of the underlying data representation."
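To make the two suggestions concrete, here is a minimal sketch rather than the project's actual Pipeline code. The function names and the sample stat strings are hypothetical; the only behavior taken from the description is draining from the front of the pipe and joining the results with '\n'.

```python
from collections import deque


def flush_with_list(stats):
    """Drain a list from the front and build the payload by concatenation."""
    pending = list(stats)
    data = ""
    while pending:
        stat = pending.pop(0)  # O(n): every remaining item shifts left
        data = data + "\n" + stat if data else stat  # copies the string each time
    return data


def flush_with_deque(stats):
    """Drain a deque from the front and join the collected pieces once."""
    pending = deque(stats)
    data = []
    while pending:
        data.append(pending.popleft())  # O(1) removal from the left end
    return "\n".join(data)


if __name__ == "__main__":
    sample = ["a:1|c", "b:2|c", "c:3|ms"]  # hypothetical stat lines
    print(flush_with_deque(sample))
```

Both functions produce the same payload; they differ only in how much work each removal and each append to the payload costs as the pipe grows.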
Given that this is a pure-performance change, do you have any benchmarks (something simple using |
I think a deque is just the correct data structure to use here; it should ease memory fragmentation as well as perform better.
The timeit harness uses a loop similar to the one in the hottest part of our code: it runs 16 patterns, each looking for different things in an input stream, and records the time taken. I imagine the performance gains will depend on the number of items in the pipeline. The higher that number goes, the better the deque version should perform.
timeit harness: https://gist.github.com/davidblewett/201c0f83de0824d2226b
CPython 2.7:
What's interesting is that PyPy 2.3's JIT was able to optimize the current code pretty well:
I tweaked it to 100k iterations with 1000 items in the pipe:
I expected bigger, percentage-wise, wins as the size of the pipe scaled up, and just didn't see them. It is an improvement, so I merged it, but I don't think it's a significant win. A bigger hypothetical win is combining stats, e.g. 1000 |