Marshal: performance regression with versions 3 and 4 #64615
Comments
Attached patch disables references for int and float types.
Use the attached bench.py to compare performance.

Without the patch:

dumps v1: 391.4 ms
dumps v2: 166.9 ms
dumps v3: 431.2 ms
dumps v4: 361.8 ms

With the patch:

dumps v1: 392.3 ms
dumps v2: 167.7 ms
dumps v3: 170.3 ms
dumps v4: 122.8 ms

dumps v3 is 60% faster; loads v3 is also 14% *faster*. dumps v4 is 66% faster; loads v4 is 16% faster.
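The attached bench.py did not survive the migration of this issue. A minimal sketch of such a benchmark, assuming only what the numbers above imply (timing marshal.dumps of the same data at each format version), might look like:

```python
import marshal
import time

# Hypothetical stand-in for the original bench.py data set: many small
# ints, where the cost of memoizing every object shows up clearly.
data = [tuple(range(100)) for _ in range(10000)]

for version in (1, 2, 3, 4):
    start = time.perf_counter()
    payload = marshal.dumps(data, version)
    elapsed = (time.perf_counter() - start) * 1000
    print("dumps v%d: %.1f ms (%d bytes)" % (version, elapsed, len(payload)))
```

Versions 3 and 4 memoize every object they serialize so it can later be written as a back-reference, which is where the slowdown reported above comes from.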
Performance of Python 3.3.3+:

dumps v1: 374.6 ms
dumps v2: 152.9 ms

So with the patch, the Python 3.4 default version (4) is *faster* (dump 20% faster, load 16% faster) and produces *smaller files* (10% smaller).
Oh, by the way: on Python 3.4, the file size (with versions 3 and 4) is unchanged with my patch. Writing a reference takes exactly the same size as writing an integer.
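This can be checked from Python. A small illustration (mine, not from the original report), assuming the current marshal format where both a 32-bit TYPE_INT and a TYPE_REF occupy one type byte plus four payload bytes:

```python
import marshal

x = 1000
distinct = marshal.dumps((1000, 2000), 4)  # two distinct int objects
shared = marshal.dumps((x, x), 4)          # one int plus a back-reference
# The reference is encoded in exactly as many bytes as the int itself.
assert len(distinct) == len(shared)
```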
I am doing the Argument Clinic conversion for the marshal module, so I am adding myself to the nosy list to make sure both tickets stay synchronized.
The Derby is suspended until the release of Python 3.4 final. I consider this issue an important performance regression that should be fixed before Python 3.4 final.
Did you test with numerous shared ints and floats? [1000] * 1000000 and [1000.0] * 1000000? AFAIK these were important use cases for adding versions 3 and 4.
Here are new benchmarks on Python 3.4:
Integers, without the patch:
Integers, with the patch:
Floats, without the patch:
Floats, with the patch:
Version 3 was added by bpo-16475. That issue talks about "sharing string constants, common tuples, even common code objects", not sharing numbers. For real data, here are interesting numbers: integers only represent 4.8% of serialized data, and only 8.2% of these integers can be shared. (Floats represent 0.29%.) Strings, by contrast, represent 58%, and 57% of them can be shared.
For the record, format 3 was added through bpo-16475, and format 4 through bpo-19219. In msg175962, Kristjan argued that there is no reason *not* to share int objects, e.g. across multiple code objects. Now it seems that this argument is flawed: there is a reason, namely the performance impact. OTOH, I consider both use cases (marshaling a large number of integers, and desiring to share ints across code objects) equally obscure: you shouldn't worry about marshal performance too much if you have loads of tiny int objects, and you shouldn't worry whether these ints get shared or not. As a compromise, we could suppress the sharing for small int objects, since they are singletons anyway. This would allow marshal to preserve/copy the object graph, while not impacting the use case that the original poster on python-dev presented.
As I wrote on python-dev, dumps performance isn't important for the pyc use case. You are also ignoring the *runtime* benefit of sharing objects: a smaller memory footprint.
This is an implementation detail. We can use a more efficient way of memoizing small int objects. I also suppose that an efficient C implementation of IdentityDict would significantly decrease the overhead (maybe by 2x). Since this is probably the first case for which IdentityDict gives a significant benefit, I'll provide an implementation for 3.5.
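IdentityDict here refers to a planned C implementation that is not shown in this issue. As a rough illustration of the concept only, an identity-keyed mapping can be sketched in Python by keying on id() while holding a strong reference to each key so its address cannot be reused:

```python
class IdentityDict:
    """Minimal identity-keyed mapping sketch.

    Entries are looked up by id(); a strong reference to each key is
    kept so CPython cannot reuse its address while it is a key here.
    """

    def __init__(self):
        self._entries = {}  # id(key) -> (key, value)

    def __setitem__(self, key, value):
        self._entries[id(key)] = (key, value)

    def __getitem__(self, key):
        return self._entries[id(key)][1]

    def __contains__(self, key):
        return id(key) in self._entries
```

In C, such a table can hash the pointer directly, avoiding the temporary int object that a PyDict keyed on id(obj) must allocate on every lookup; that allocation is a large part of the overhead discussed in this thread.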
Here is a patch which special-cases small integers. It decreases the v3 and v4 dump times of bench.py by 10% without affecting load time. Of course, in real cases the benefit will be much smaller.
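The patch itself is not included in this migrated issue. A minimal Python sketch of the idea, assuming the compromise described earlier (skip the memo for small ints, which CPython caches as singletons anyway):

```python
SMALL_INT_MIN, SMALL_INT_MAX = -5, 256  # CPython's cached singleton range

def should_memoize(obj):
    # Small ints are interpreter-wide singletons: recording them in the
    # reference table costs dict operations on every dump but can never
    # save space or preserve any identity that wasn't already shared.
    if type(obj) is int and SMALL_INT_MIN <= obj <= SMALL_INT_MAX:
        return False
    return True
```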
This is not a release blocker. You guys can play with this for 3.5. FWIW I prefer Serhiy's approach.
Here is a more general solution. For simple values (ints, floats, complex numbers, short strings) it is faster to use the value itself as a key than to create a new integer object for the id.

Without the patch:

data             ver.  dumps(ms)  loads(ms)  size(KiB)
genData            2      103.9      186.4     4090.7
[1000]*10**6       2       98.6      134.8     4882.8
[1000.0]*10**6     2      173.5      158.4     8789.1
[1000.0j]*10**6    2      288.8      221.4    16601.6
20 pydecimals      2       85.0      114.7     3936.5

With marshal3_numbers.patch:

data             ver.  dumps(ms)  loads(ms)  size(KiB)
genData            3      108.4      187.5     4090.7
[1000]*10**6       3      104.2      145.8     4882.8
[1000.0]*10**6     3      177.4      154.5     8789.1
[1000.0j]*10**6    3      501.6       61.1     4882.8
20 pydecimals      3       95.2       41.9     3373.4

With marshal_refs_by_value.patch:

data             ver.  dumps(ms)  loads(ms)  size(KiB)
genData            3      150.4      197.0     4090.7
[1000]*10**6       3      169.1       62.3     4882.8
[1000.0]*10**6     3      313.5       62.2     4882.8
[1000.0j]*10**6    3      410.6       62.5     4882.8
20 pydecimals      3       68.5       41.1     3373.4

The marshal_refs_by_value.patch produces data as compact as the unpatched code does, but dumps faster. It usually dumps slower than marshal3_numbers.patch, but may produce smaller data and load faster. Surprisingly, it dumps the code of the _pydecimal module faster. As a side effect, the patch can turn some equal simple values into identical objects.
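A minimal sketch of the refs-by-value idea (an illustration, not Serhiy's actual C patch): key the memo on (type, value) for hashable simple types so that equal values collapse into one reference slot, and fall back to identity for everything else:

```python
def memo_key(obj):
    # Hypothetical helper showing only the keying strategy.
    if type(obj) in (int, float, complex):
        return (type(obj), obj)
    if type(obj) is str and len(obj) <= 20:  # "short strings"; cutoff assumed
        return (type(obj), obj)
    return id(obj)  # in C, the pointer is hashed directly
```

This also explains the noted side effect: two equal but distinct simple objects now map to the same memo entry, so the loaded data contains a single shared object where the input had two.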
Here are the results of a benchmark which measures dump and load time for all .pyc files in the stdlib (including tests): https://bitbucket.org/storchaka/cpython-stuff/src/default/marshal/marshalbench.py

$ find * -name __pycache__ -exec rm -rf '{}' +
$ ./python -m compileall -qq -r 99 Lib
$ find Lib -name '*.pyc' | xargs ./python marshalbench.py

Without the patch:
ver. dumps loads size

With marshal3_numbers.patch:
ver. dumps loads size

With marshal_refs_by_value.patch:
ver. dumps loads size
Here is a patch which adds a separate dict for interned strings (otherwise they can become uninterned) and one for bytes. It also slightly simplifies the code.
And here is an alternative patch which uses a hashtable. Both patches have about the same performance for *.pyc files, but marshal_hashtable.patch is much faster for duplicated values. Marshalling [1000]*10**6, [1000.0]*10**6 and [1000.0j]*10**6 with versions 3 and 4 is as fast as marshalling [1000]*10**6 with version 2 (i.e. 5 times faster than the current implementation).

data             ver.  dumps(ms)  loads(ms)  size(KiB)
genData            2       99.9      188.9     4090.7
[1000]*10**6       2       97.7      131.6     4882.8
[1000.0]*10**6     2      172.9      153.5     8789.1
[1000.0j]*10**6    2      288.6      228.2    16601.6
20 pydecimals      2       88.0      111.4     3929.6
What are your thoughts, Victor?
Can you post the results of your hashtable patch with marshalbench.py? PS: we don't care about versions older than 4. |
Here are new results with the corrected marshalbench.py (more precise, and with the total size recalculated):

Without the patch:
With marshal3_numbers.patch:
With marshal_refs_by_value_3.patch:
With marshal_hashtable.patch:
The updated patch addresses Victor's comments.
Apart from the minor suggestion that I added in the review, the patch looks good to me. It's quite simple and makes dumps() 34% faster (for protocol 4).
(I didn't reproduce the benchmark, I just used Serhiy's numbers. I guess that using memcpy() doesn't make the code slower.)
New changeset 012364df2712 by Serhiy Storchaka in branch 'default': |
Thanks for your review, Antoine and Victor.