PyUnicodeWriter: change the overallocation factor for Windows #63780
PyUnicodeWriter currently overallocates the internal buffer by 25%. On Windows, PyUnicodeWriter is slower than the PyAccu API. With an overallocation factor of 50%, PyUnicodeWriter is faster. See this message for the benchmark: We might also change the factor on all platforms; performance is almost the same with a factor of 25% or 50% on Linux.
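To make the 25% vs 50% trade-off concrete, here is a minimal Python model of an overallocating string builder. It is a hypothetical sketch, not CPython's PyUnicodeWriter: the class `OverallocatingWriter`, the helper `simulate()` and the chunk sizes are made up for illustration. It counts reallocations and moved characters instead of measuring time, and it assumes the worst case where realloc() always moves the data, so it cannot show allocator-specific effects such as a block being extended in place.

```python
class OverallocatingWriter:
    """Hypothetical model of a string builder with an overallocation factor.

    Not CPython's PyUnicodeWriter: it only tracks sizes and counters.
    """

    def __init__(self, factor: float = 0.25) -> None:
        self.factor = factor   # e.g. 0.25 means "allocate 25% more than needed"
        self.capacity = 0      # allocated size, in characters
        self.length = 0        # characters written so far
        self.reallocs = 0      # number of times the buffer was grown
        self.copied = 0        # characters moved by those reallocations (worst case)

    def _grow(self, needed: int) -> None:
        # Overallocate so that the next writes can reuse the slack.
        self.capacity = needed + int(needed * self.factor)
        # Worst case: realloc() cannot extend the block and must move the data.
        self.copied += self.length
        self.reallocs += 1

    def write(self, text: str) -> None:
        needed = self.length + len(text)
        if needed > self.capacity:
            self._grow(needed)
        self.length = needed

    def finish(self) -> int:
        """Trim to the exact size, as the final string requires; return the slack."""
        slack = self.capacity - self.length
        self.capacity = self.length
        return slack


def simulate(factor: float, chunks: int = 10_000, chunk: str = "x" * 10) -> OverallocatingWriter:
    """Write `chunks` copies of `chunk` into a fresh writer and return it."""
    writer = OverallocatingWriter(factor)
    for _ in range(chunks):
        writer.write(chunk)
    return writer


if __name__ == "__main__":
    for factor in (0.25, 0.5):
        w = simulate(factor)
        print(f"factor={factor:.0%} reallocs={w.reallocs} copied={w.copied} slack={w.finish()}")
```

In this model the 50% factor needs fewer reallocations and moves fewer characters, at the price of potentially more unused slack to trim at the end; which side wins in practice depends on how cheap realloc() is on the platform.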
Your patch implies that the only two supported OSes are Linux and Windows :-) To be honest, I think we should have one single overallocation factor for all OSes. If 50% is OK on Linux too, then let it be 50%. (Did you run stringbench to see if it made a difference?)
Many collections in other programming languages use an overallocation rate of 100%. Perhaps an overallocation rate of 12.5% for Python lists is a good compromise between speed and memory usage (since Python lists have no explicit resize method), but for short-lived objects we can use a larger overallocation rate. In theory, with an overallocation rate r, a long list needs on average O(1/log(1+r)) reallocations (for a final size n, about log(n)/log(1+r)) and O((r+1)/r) copies per element. That is, a 50% overallocation rate is 3-3.4 times more efficient than the current 12.5% rate, and a 100% rate is 1.5-1.7 times more efficient than a 50% rate (in terms of the number of reallocations and copies). I'm interested in what would happen if the overallocation rate for PyAccu were increased (to 50% or 100%).
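The ratios in the comment above can be checked with a few lines of arithmetic. This sketch assumes the idealized geometric-growth model the comment describes (capacity multiplied by 1 + r on every reallocation); the function names are hypothetical.

```python
import math


def reallocs_factor(r: float) -> float:
    # Growing to n elements by a factor (1 + r) takes about log(n) / log(1 + r)
    # reallocations, so for a fixed n the count is proportional to 1 / log(1 + r).
    return 1.0 / math.log1p(r)


def writes_per_element(r: float) -> float:
    # Each element is written once and, amortized over the geometric series of
    # reallocations, copied about 1/r more times: 1 + 1/r = (r + 1) / r.
    return (r + 1.0) / r


def compare(slow: float, fast: float) -> tuple[float, float]:
    """How many times fewer reallocations and writes `fast` needs than `slow`."""
    return (reallocs_factor(slow) / reallocs_factor(fast),
            writes_per_element(slow) / writes_per_element(fast))


if __name__ == "__main__":
    for slow, fast in ((0.125, 0.5), (0.5, 1.0)):
        ra, wr = compare(slow, fast)
        print(f"{slow * 100:g}% -> {fast * 100:g}%: "
              f"{ra:.2f}x fewer reallocations, {wr:.2f}x fewer writes")
```

Going from 12.5% to 50% gives about 3.44x fewer reallocations and 3.0x fewer writes; going from 50% to 100% gives about 1.71x and 1.5x, matching the 3-3.4 and 1.5-1.7 ranges quoted in the comment.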
I chose 25% on Linux after some micro-benchmarks on str%args and str.format(args). If the buffer is too large, the final resize (needed because a PyUnicodeObject must have the exact size) is slow. I suppose that realloc() can avoid copying data if the new size is very close to the old one, but has to allocate a new memory block and copy the data if the size changes by more than some threshold. This is how _PyObject_Realloc() works, for example.
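The link between a too-large buffer and a slow final resize can be sketched with a toy realloc() shrink policy. The rule used here (keep the block in place when the new size is more than 3/4 of the old one, i.e. when shrinking would save less than 25%) is an assumption modeled on the comment in pymalloc's realloc path in Objects/obmalloc.c, which only covers small blocks; larger blocks go to the system realloc(), whose behavior differs between platforms. The function names are hypothetical.

```python
def shrink_in_place(allocated: int, exact: int) -> bool:
    """Toy policy: keep the block if shrinking would save less than 25%."""
    # Same test as '4 * new > 3 * old': the new size is above 3/4 of the old one.
    return 4 * exact > 3 * allocated


def final_resize_cost(exact: int, factor: float) -> int:
    """Characters copied when trimming an overallocated buffer to `exact`.

    Assumes the buffer was just grown to exact * (1 + factor), the worst case.
    """
    allocated = exact + int(exact * factor)
    return 0 if shrink_in_place(allocated, exact) else exact


if __name__ == "__main__":
    for factor in (0.25, 0.5):
        print(f"factor={factor:.0%} copied on final resize: {final_resize_cost(1000, factor)}")
```

Under these assumptions a 25% factor never triggers the copy (the slack is at most 25% of the final size), while a 50% factor can force a full copy of the string.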
Did you find a difference for small strings vs. large strings?
When I replaced PyAccu with PyUnicodeWriter for str%args and str.format(args), I ran a lot of benchmarks with short, medium and large strings. See for example: See issues bpo-14716 and bpo-14744 for old benchmark results. If I remember correctly, this is the script used to run the benchmark: https://bitbucket.org/haypo/misc/src/7c2deb7a37353b41a45564ce6a98e07bbe0c691b/python/bench_str.py The script should be run using: I was concerned about performance on short strings because most calls to str%args produce short strings.
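For readers who want to try this kind of measurement without the original script, a minimal timeit-based sketch follows. It is not bench_str.py: the format strings, the argument sizes in `SIZES` and the function name `bench()` are assumptions chosen only to cover short, medium and long results.

```python
import timeit

# Hypothetical argument lengths; the real benchmark script may use other sizes.
SIZES = {"short": 3, "medium": 100, "long": 10_000}


def bench(number: int = 10_000, repeat: int = 3) -> dict[str, dict[str, float]]:
    """Return the best time per call, in microseconds, for each size and API."""
    results: dict[str, dict[str, float]] = {}
    for label, size in SIZES.items():
        arg = "x" * size
        cases = {
            "str%args": lambda: "a%sb%sc" % (arg, arg),
            "str.format": lambda: "a{}b{}c".format(arg, arg),
        }
        results[label] = {
            # The lowest of several runs is the least noisy estimate.
            name: min(timeit.repeat(stmt, number=number, repeat=repeat)) / number * 1e6
            for name, stmt in cases.items()
        }
    return results


if __name__ == "__main__":
    for label, timings in bench().items():
        for name, usec in timings.items():
            print(f"{label:>6}  {name:<10} {usec:.3f} us/call")
```

Running the same script against two interpreter builds (for example, one per overallocation factor) and comparing the per-call times is the kind of comparison discussed in this issue.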
It rather means that the Windows memory allocator is different from the one used on all other operating systems. Well, if you are not convinced, we can keep the overallocation factor of 25%: performance is not so bad. The difference between 25% and 50% is small.
--
Benchmark on patched repr(list) (to use PyUnicodeWriter, see issue bpo-19513) using different overallocation factors on Linux. The best factor is 25% (the current factor).
Platform: Linux-3.9.4-200.fc18.x86_64-x86_64-with-fedora-18-Spherical_Cow
New changeset 093b9838a41c by Victor Stinner in branch 'default':