Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Schema.from_dict() to be pickled #1469

Closed

Conversation

smith-kyle
Copy link

Fixes #1468

Right now it's just a failing test. I'm opening this now to have a place to get early feedback

@smith-kyle smith-kyle changed the title Added test to be sure that Schema.from_dict can be pickled Allow Schema.from_dict to be pickled Dec 7, 2019
@smith-kyle smith-kyle changed the title Allow Schema.from_dict to be pickled Allow Schema.from_dict() to be pickled Dec 7, 2019
@sloria
Copy link
Member

sloria commented Dec 7, 2019

Thanks. I'd be happy to merge the fix if it doesn't add too much complexity to the code. Are you working on the fix?

@smith-kyle
Copy link
Author

Great! I'll work on it today

@smith-kyle
Copy link
Author

@sloria Just so I have a better idea of what's going on, what's reason for creating a class with a dynamic name?

https://github.com/marshmallow-code/marshmallow/blob/dev/src/marshmallow/schema.py#L444

@smith-kyle
Copy link
Author

smith-kyle commented Dec 9, 2019

Or I guess I can just come out and ask the question rather than asking a leading question: I thought __repr__ was supposed to return a string value that evaluates (if it was interpreted as python) to the instance of the object. If so, it looks like schemas created with from_dict don't create valid repr results because "GeneratedSchema" isn't a class name in this namespace. This is related to the unpickle-able issue because pickle is trying to find GeneratedSchema in the marshmallow.schema module

I don't have a solution if this is a problem, but maybe you do?

@smith-kyle
Copy link
Author

btw thanks for creating your blog! I just found this post via a google search and it helped me at work

https://stevenloria.com/lazy-properties/

@deckar01
Copy link
Member

deckar01 commented Dec 9, 2019

__repr__ ... If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value... If this is not possible, a string of the form <...some useful description...> should be returned.

https://docs.python.org/3/reference/datamodel.html#object.__repr__

AFAIK no classes or class instances produce valid expressions in the default implementation of __repr__. I don't think pickle uses this value.

pickle can save and restore class instances transparently, however the class definition must be importable and live in the same module as when the object was stored.

https://docs.python.org/3/library/pickle.html#comparison-with-marshal

This makes it sound like it is unpicklable since the type is generated. It might be possible to override __reduce__(). Maybe you could add a mixin to the class list when constructing the new class. I'm not really sure how you would reference the fields in the __reduce__() method though.

https://docs.python.org/3/library/pickle.html#object.__reduce__

@smith-kyle
Copy link
Author

smith-kyle commented Dec 11, 2019

Thanks for posting that documentation.

I tried using the __reduce__() function but it appears that since I'm trying to pickle a class object (not an instance of an object) the __reduce__ function is not called

Example

class Foo(object):
     def __reduce__(self):
         print("here")
         return "A"

pickle.dumps(Foo) # pickles the class and outputs: b'\x80\x03c__main__\nFoo\nq\x00.'
pickle.dumps(Foo()) # outputs: here and throws an error about not being able to find `A`

In the case of from_dict, it returns a class which is the thing that I'd like to pickle but when pickling a class the reduce method on that class is not called. That is to say we'd have to add a __reduce__ method to the class that all class objects are instances of in order for this to work 😅

@smith-kyle
Copy link
Author

Here's a more relevant example showing that __reduce__ is not called

class Foo(object):
    def __reduce__(self):
        return 'Foo'

 pickle.dumps(type('A', (Foo,), {}))
Can't pickle <class '__main__.A'>: attribute lookup A on __main__ failed

@hdoupe
Copy link
Contributor

hdoupe commented Dec 11, 2019

FWIW, you can pickle a Schema created with from_dict with cloudpickle:

In [1]: from marshmallow import Schema, fields                                                                                                                                                

In [2]: import pickle                                                                                                                                                                         

In [3]: import cloudpickle                                                                                                                                                                    

In [4]: pickled = cloudpickle.dumps(Schema.from_dict({"foo": fields.Str()}))                                                                                                                  

In [5]: pickle.loads(pickled)().load({"foo": "hello"})                                                                                                                                        
Out[5]: {'foo': 'hello'}

The result from cloudpickle.dumps can be loaded with pickle.

@sloria
Copy link
Member

sloria commented Dec 11, 2019

Hm, that cloudpickle is purpose-built to handle dynamically-created classes suggests that we shouldn't need to handle pickle-ability here. @smith-kyle wdyt? Would using cloudpickle meet your use case?

@smith-kyle
Copy link
Author

Possibly although cloudpickle introduces the constraint that the dumped object needs to be loaded in the exact same python version.

TBH I'd sooner just tweak my object so that when it's pickled it saves the dictionary passed to from_dict rather than the result of from_dict.

Thanks for linking cloudpickle @hdoupe

@smith-kyle
Copy link
Author

Unless a maintainer sees a path forward I'll close this out

@hdoupe
Copy link
Contributor

hdoupe commented Dec 11, 2019

Possibly although cloudpickle introduces the constraint that the dumped object needs to be loaded in the exact same python version.

TBH I'd sooner just tweak my object so that when it's pickled it saves the dictionary passed to from_dict rather than the result of from_dict.

Thanks for linking cloudpickle @hdoupe

No problem! You can set the protocol to pickle.DEFAULT_PROTOCOL so that the result is compatible with other versions of Python (cloudpickle.dumps docstring) if this issue comes back up.

@smith-kyle smith-kyle closed this Dec 12, 2019
@smith-kyle smith-kyle deleted the picklable-schema-from-dict branch January 6, 2020 17:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Schema.from_dict cannot be pickled
4 participants