bpo-36694: Do not memoize temporary objects in the C implementation of pickle. #13036
…f pickle. This produces more optimal pickle data and reduces memory consumption on pickling and unpickling.
@@ -1601,15 +1601,15 @@ memo_get(PicklerObject *self, PyObject *key)
 /* Store an object in the memo, assign it a new unique ID based on the number
    of objects currently stored in the memo and generate a PUT opcode. */
 static int
-memo_put(PicklerObject *self, PyObject *obj)
+memo_put(PicklerObject *self, PyObject *obj, int opt)
I don't understand how you decide whether `opt` should be 0 or 1. What is the heuristic?
+1 on this. This is critical behavior for `cloudpickle` :)
This is an interesting question. The rule is `opt=0` for objects that save their content after saving themselves: non-empty lists, sets, dicts, and general objects with non-trivial elements 2-4 of the tuple returned by `__reduce__()`. They must be memoized so that reference loops can be detected.
Also @pierreglaser.
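The reference-loop point can be checked from Python. The following is a minimal sketch, not part of the patch: it pickles a list that contains itself and inspects the opcode stream with `pickletools.genops`. The helper name `opcode_names` is made up for illustration.

```python
import pickle
import pickletools


def opcode_names(data):
    """Return the names of the opcodes in a pickle stream, in order."""
    return [op.name for op, arg, pos in pickletools.genops(data)]


# A container that references itself: the list has to be in the memo
# before its items are saved, otherwise saving would recurse forever.
loop = []
loop.append(loop)

data = pickle.dumps(loop, protocol=4)
names = opcode_names(data)

# The list is written and memoized first; its single item is then
# emitted as a back-reference (a GET-family opcode) to that memo entry.
print(names)

restored = pickle.loads(data)
print(restored[0] is restored)  # the loop survives the round trip
```

Because the list's content is written after the list itself, skipping the memo entry here would make the loop impossible to detect, which is why such objects are passed with `opt=0`.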
 {
     char pdata[30];
     Py_ssize_t len;
     Py_ssize_t idx;

     const char memoize_op = MEMOIZE;

-    if (self->fast)
+    if (self->fast || (opt && Py_REFCNT(obj) == 1))
Requiring `Py_REFCNT(obj)` to be 1 is pretty strong, right? Does this only affect objects created using the C API, i.e. never bound to Python-level variables?
It mostly affects temporary objects. For example, if `__reduce__` returns `constructor, ((x, y),)`, then the tuple `(x, y)` will not be memoized. This is the case for namedtuples.
Another example: if you have a list of unique numbers, strings, tuples, etc., then the items of the list will not be memoized, because the only reference to each item is from the list.
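A rough way to see how many memo entries are never referenced is to compare a pickle with its `pickletools.optimize` output. This is only a sketch under stated assumptions: `optimize` strips every unused memo opcode after the fact, which is broader than the refcount-based check in this patch, and the exact counts for the raw pickle depend on the interpreter. The helper name `memo_ops` and the sample data are made up for illustration.

```python
import pickle
import pickletools


def memo_ops(data):
    """Count opcodes that store an object in the memo (PUT family, MEMOIZE)."""
    return sum(
        1
        for op, arg, pos in pickletools.genops(data)
        if op.name == "MEMOIZE" or "PUT" in op.name
    )


# Illustrative data: every tuple and string appears exactly once, and
# range(10) reduces to the range type plus a temporary argument tuple.
sample = {
    "pairs": [(i, str(i)) for i in range(5)],
    "span": range(10),
}

raw = pickle.dumps(sample, protocol=4)
slim = pickletools.optimize(raw)  # drops memo opcodes nothing refers back to

print("memo opcodes in raw pickle:      ", memo_ops(raw))
print("memo opcodes in optimized pickle:", memo_ops(slim))
print("sizes in bytes:", len(raw), len(slim))
```

Since nothing in `sample` is shared, no memo entry is ever read back, so every memo opcode left in the raw stream is overhead for both the pickler and the unpickler.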
I think this optimization should be restricted to well-known built-in types (tuples, etc.). Skipping memoization for arbitrary user objects risks introducing regressions.
For example?
I don't have any example, but I'm not confident that they don't exist. @pierreglaser mentioned
No,
I do not see any possibility of regressions.
Well, maybe there are some differences with constructors that have side effects. I'll try to write tests for this.
I believe I'm also hitting this issue. What are the remaining steps to push this patch forward? Is it just missing tests?
This produces more optimal pickle data and reduces memory consumption on
pickling and unpickling.
https://bugs.python.org/issue36694