Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue79 pickle 2 #81

Closed

Conversation

mathieulongtin
Copy link

Fix for issue #79. Add __getstate__ and __setstate__ to classes with slots=True, including attr.Attribute. Added test to make sure objects work with pickle.

@codecov-io
Copy link

codecov-io commented Sep 8, 2016

Current coverage is 100% (diff: 100%)

Merging #81 into master will not change coverage

@@           master   #81   diff @@
===================================
  Files           8     8          
  Lines         398   416    +18   
  Methods         0     0          
  Messages        0     0          
  Branches       88    93     +5   
===================================
+ Hits          398   416    +18   
  Misses          0     0          
  Partials        0     0          

Powered by Codecov. Last update dd36808...a68abf0

@Tinche
Copy link
Member

Tinche commented Sep 8, 2016

This will bring a slight performance penalty to pickling (details at the end). This is not just theoretical for me, we pickle stuff at the end of every request.

Now obviously slower is better than not working, but I would just suggest adding a special case for Python 3: add the special methods iff slots=True and (Python2 or frozen=True). You can look in attr._compat for a way of detecting Python 2.

Benchmark example:

  1. pip install perf
  2. paste into t.py:
import attr


@attr.s(slots=True)
class A(object):
    a = attr.ib()
    b = attr.ib()
    c = attr.ib()
    d = attr.ib()
    e = attr.ib()
    f = attr.ib()

(pickling doesn't seem to work with just creating the class in the timeit invocation)
3. run pyperf timeit --rigorous -g -s "import pickle; from t import A; a = A(1, 2, 3, 4, 5, 6)" "pickle.dumps(a)"
For me, the difference is 3.83 us +- 0.18 (master) to 4.96 us +- 0.13 (yours).

@mathieulongtin
Copy link
Author

mathieulongtin commented Sep 8, 2016

It gets more complicated: pickle protocol 2 and up support slots natively. Python 3.5 uses protocol 3 by defaults. Python 2 only supports protocol 0, 1 and 2 and defaults to 0.

So do I:

  1. set __getstate__/__setstate__ regardless
  2. set __getstate__ only when needed by the default protocol (py2 and slots; py3 and slots and frozen)
  3. put an option to attrs pickle_protocol and set __getstate__ if needed by the protocol version. It would default to the python version's default protocol.
  4. put a module global attr.pickle_protocol and use that to decide whether to set __getstate__.

1 causes a performance downgrade with Python 3 and probably with Python 2 using proto2.

2 means Python 3 wouldn't work if pickle is used with protocol 0 or 1. It probably slows down Python 2 if proto2 is used.

3 is more flexible, but could be more problematic, as a library could define classes with attrs that are not compatible with your pickle protocol of choice.

4 is probably the best. It would work like option 2 by default, but it would be possible to say, at the beginning of your program, "I use protocol 2" even though I use Python 2, so don't put getstate that would slow me down. But most of the time, you wouldn't need to think about it.

What do you think?

@Tinche
Copy link
Member

Tinche commented Sep 8, 2016

Before we consider it, is it possible, and if so, how convenient would it be for you and any other PySpark user to use protocol version 2 on Python 2.7?

If it's no problem, we could just document the requirement.

@Tinche
Copy link
Member

Tinche commented Sep 8, 2016

Great piece of research, btw.

@mathieulongtin
Copy link
Author

PySpark actually uses protocol 2. So it only had an issue because it tried to serialize attr.Attribute and other attr internals which are slots and frozen.

@hynek what do you think? I don't want to force a performance penalty on everybody.

A few more options:

  1. only use __getstate__ on frozen classes regardless of Python version, and put a note in the doc that slots=True require pickle protocol>=2.
  2. don't use frozen on internal attr classes and note that frozen classes don't work well with pickle in the doc.

@Tinche
Copy link
Member

Tinche commented Sep 9, 2016

My personal recommendations (identical to @mathieulongtin except for the third bullet):

  • put the methods on attr.Attribute, since pickling it won't work without in any case.
  • document pickling slot classes may require pickle protocol >= 2.
  • put the methods on attrs classes that are both slots and frozen. (Just frozen works without, so no need for the extra penalty.)

Minimal performance impact (only for classes that are both slots and frozen, and these don't work without it anyway), and I think requiring pickle protocol >= 2 is not a great burden.

The fact of the matter is, ordinary (i.e. non-attrs) slot classes require a custom __getstate__/__setstate__ anyway for protocol < 2. People can always write their own getstate if they absolutely need protocol < 2 directly in the class. Apparently protocol 2 was introduced in Python 2.3 and is more efficient anyway.

Of course @hynek's is the last word. :)

@hynek
Copy link
Member

hynek commented Sep 9, 2016

Requiring protocol 2 sounds good to me.

@mathieulongtin
Copy link
Author

I applied the last suggestion by @Tinche. I consider this complete now. It works fine with PySpark.

@hynek hynek closed this in e59c360 Sep 10, 2016
@hynek
Copy link
Member

hynek commented Sep 10, 2016

I’ve made a few minor adjustments and merged it.

Thanks to everyone involved! I don’t care about pickles myself so it’s great to have domain experts. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants