pickle: Faster serialization of Unicode strings #59801
Comments
Serialization of Unicode strings in the pickle module is suboptimal, especially for long strings. The attached patch optimizes the serialization thanks to new properties of Unicode strings (PEP-393):
The current code for protocol 0 uses raw_unicode_escape(), which is really suboptimal: it writes the escaped string into a first buffer, and then allocates a new temporary buffer to store the result with the right size (instead of just calling _PyBytes_Resize).
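As a rough illustration of how protocol 0 differs from the binary protocols for Unicode strings, the sketch below lists the opcodes each protocol emits, using only the standard `pickle` and `pickletools` modules. The sample string and the helper name `opcode_names` are assumptions made for illustration; they are not part of the patch.

```python
import pickle
import pickletools


def opcode_names(payload):
    """Return the names of the pickle opcodes found in payload."""
    return [op.name for op, _arg, _pos in pickletools.genops(payload)]


sample = "caf\u00e9 \u20ac"  # hypothetical sample: Latin-1 and BMP characters

text_pickle = pickle.dumps(sample, protocol=0)
binary_pickle = pickle.dumps(sample, protocol=2)

# Protocol 0 writes the string with the text-based UNICODE opcode,
# whose payload is raw-unicode-escape encoded.
print(opcode_names(text_pickle))

# Protocol 2 writes a length-prefixed UTF-8 payload (BINUNICODE).
print(opcode_names(binary_pickle))

# Both forms round-trip to the same string.
assert pickle.loads(text_pickle) == sample
assert pickle.loads(binary_pickle) == sample
```

The escaping step is what makes protocol 0 costly for long strings, while the binary opcode only needs the UTF-8 bytes and their length.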
Oh, I forgot to explain that I initially wrote the patch to fix the following failure on our "bigmem" buildbot.
======================================================================
Traceback (most recent call last):
  File "/opt/python-bigmem/3.x.langa-bigmem/build/Lib/test/support.py", line 1281, in wrapper
    return f(self, maxsize)
  File "/opt/python-bigmem/3.x.langa-bigmem/build/Lib/test/pickletester.py", line 1267, in test_huge_str_32b
    pickled = self.dumps(data, protocol=proto)
  File "/opt/python-bigmem/3.x.langa-bigmem/build/Lib/test/test_pickle.py", line 49, in dumps
    return pickle.dumps(arg, protocol)
MemoryError
Looks interesting. Can you post benchmark numbers?
Here is a benchmark comparing Python 3.3 without and with my patch:
ned$ python3 perf.py -b fastpickle,pickle_dict,pickle_list,slowpickle ../default/python ../fasterpickle/python
Report on Linux ned 3.4.4-4.fc16.x86_64 #1 SMP Thu Jul 5 20:01:38 UTC 2012 x86_64 x86_64
### fastpickle ###
The following not significant results are hidden, use -v to show them:
For your information, results of a benchmark comparing Python 3.2 to 3.3:
ned$ python3 perf.py -b fastpickle,pickle_dict,pickle_list,slowpickle ../3.2/python ../default/python
Report on Linux ned 3.4.4-4.fc16.x86_64 #1 SMP Thu Jul 5 20:01:38 UTC 2012 x86_64 x86_64
### fastpickle ###
### pickle_dict ###
### pickle_list ###
### slowpickle ###
Amazing! Though, it would probably be a good idea to benchmark non-ASCII strings as well.
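A micro-benchmark along these lines could cover non-ASCII input. This is only a sketch using the standard `timeit` module, not the `perf.py` suite used in this thread; the sample strings, their sizes, and the helper name `bench_dumps` are assumptions chosen to exercise each PEP 393 storage kind.

```python
import pickle
import timeit

# Hypothetical sample strings, one per PEP 393 storage kind.
SAMPLES = {
    "ascii": "a" * 10_000,
    "latin-1": "\u00e9" * 10_000,
    "bmp": "\u20ac" * 10_000,
    "non-bmp": "\U0001f600" * 10_000,
}


def bench_dumps(text, protocol, number=200):
    """Return the best total time in seconds over 3 runs of `number` dumps."""
    timer = timeit.Timer(lambda: pickle.dumps(text, protocol))
    return min(timer.repeat(repeat=3, number=number))


if __name__ == "__main__":
    for name, text in SAMPLES.items():
        for protocol in (0, 2):
            elapsed = bench_dumps(text, protocol)
            print(f"{name:8} protocol {protocol}: {elapsed:.6f} s")
```

Running the same script under the unpatched and patched interpreters would give a per-kind comparison; timings depend on the machine, so no numbers are shown here.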
Last one: Python 3.2 vs patched Python 3.3.
ned$ python3 perf.py -b fastpickle,pickle_dict,pickle_list,slowpickle ../3.2/python ../fasterpickle/python
Report on Linux ned 3.4.4-4.fc16.x86_64 #1 SMP Thu Jul 5 20:01:38 UTC 2012 x86_64 x86_64
### fastpickle ###
### pickle_dict ###
### pickle_list ###
### slowpickle ###
serhiy: I'm not really motivated to finish the work on this issue (especially "... it would probably be good idea to benchmarks non-ASCII strings as well."). Would you like to work on this?
Well, I'll take care of this. I have my own patch for raw_unicode_escape() optimization, but microbenchmarks don't show any speedup. Maybe your approach will be better.
Ping?
Since protocol 0 is essentially dead in Python 3, I would like to propose something simpler and safer: only optimize the binary protocols. If no one beats me to it, I'll adapt Victor's patch for that.
Here is a new patch. Benchmark:
### fastpickle ###
New changeset 09a84091ae96 by Antoine Pitrou in branch 'default':
I've applied the review comments and committed the patch. Thank you!
Hi Antoine, I prefer your patch. Great job!
2013/4/7 Antoine Pitrou <report@bugs.python.org>:
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.