-
-
Notifications
You must be signed in to change notification settings - Fork 31.7k
Pickle: use NEWOBJ instead of NEWOBJ_EX if possible #66972
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Currently in pickle with protocol 4 the NEWOBJ_EX opcode is used to reconstruct objects with __getnewargs__() or __getnewargs_ex__() even if there no keyword arguments. >>> import pickle, pickletools, collections
>>> P = collections.namedtuple('P', 'x y')
>>> pickletools.dis(pickle.dumps(P(12, 34), 3))
0: \x80 PROTO 3
2: c GLOBAL '__main__ P'
14: q BINPUT 0
16: K BININT1 12
18: K BININT1 34
20: \x86 TUPLE2
21: q BINPUT 1
23: \x81 NEWOBJ
24: q BINPUT 2
26: . STOP
highest protocol among opcodes = 2
>>> pickletools.dis(pickle.dumps(P(12, 34), 4))
0: \x80 PROTO 4
2: \x95 FRAME 28
11: \x8c SHORT_BINUNICODE '__main__'
21: \x94 MEMOIZE
22: \x8c SHORT_BINUNICODE 'P'
25: \x94 MEMOIZE
26: \x93 STACK_GLOBAL
27: \x94 MEMOIZE
28: K BININT1 12
30: K BININT1 34
32: \x86 TUPLE2
33: \x94 MEMOIZE
34: } EMPTY_DICT
35: \x94 MEMOIZE
36: \x92 NEWOBJ_EX
37: \x94 MEMOIZE
38: . STOP
highest protocol among opcodes = 4 These EMPTY_DICT//MEMOIZE//NEWOBJ_EX can be replaced by single NEWOBJ. Actually there is a regression in protocol 4 against protocol 3, so may be following patch should be applied on 3.4 too. Here is a patch which uses __newobj__ instead of __newobj_ex__ if there are no keywords arguments. Also it reduce the number of lines of C code and adds tests for classes with __getnewargs__() and __getnewargs_ex__(). |
Ping. |
Some examples (with bpo-19858 for protocol 4 optimization). Unpatched: >>> len(pickle.dumps([P(12, 34) for i in range(1000)], 3))
17258
>>> len(pickletools.optimize(pickle.dumps([P(12, 34) for i in range(1000)], 3)))
8018
>>> len(pickle.dumps([P(i, -i) for i in range(1000)], 3))
20999
>>> len(pickletools.optimize(pickle.dumps([P(i, -i) for i in range(1000)], 3)))
11759
>>> len(pickle.dumps([P(12, 34) for i in range(1000)], 4))
12031
>>> len(pickletools.optimize(pickle.dumps([P(12, 34) for i in range(1000)], 4)))
9028
>>> len(pickle.dumps([P(i, -i) for i in range(1000)], 4))
15772
>>> len(pickletools.optimize(pickle.dumps([P(i, -i) for i in range(1000)], 4)))
12769 Patched: >>> len(pickle.dumps([P(12, 34) for i in range(1000)], 4))
10031
>>> len(pickletools.optimize(pickle.dumps([P(12, 34) for i in range(1000)], 4)))
8028
>>> len(pickle.dumps([P(i, -i) for i in range(1000)], 4))
13772
>>> len(pickletools.optimize(pickle.dumps([P(i, -i) for i in range(1000)], 4)))
11769 |
Updated patch addresses Antoine's comments. Also added few comments explaining binary data. |
New changeset 2ffaac4c8e53 by Serhiy Storchaka in branch 'default': |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: