New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue79 pickle 2 #81

Closed
wants to merge 7 commits into
base: master
from

Conversation

Projects
None yet
4 participants
@mathieulongtin

mathieulongtin commented Sep 8, 2016

Fix for issue #79. Add __getstate__ and __setstate__ to classes with slots=True, including attr.Attribute. Added test to make sure objects work with pickle.

@codecov-io

This comment has been minimized.

codecov-io commented Sep 8, 2016

Current coverage is 100% (diff: 100%)

Merging #81 into master will not change coverage

@@           master   #81   diff @@
===================================
  Files           8     8          
  Lines         398   416    +18   
  Methods         0     0          
  Messages        0     0          
  Branches       88    93     +5   
===================================
+ Hits          398   416    +18   
  Misses          0     0          
  Partials        0     0          

Powered by Codecov. Last update dd36808...a68abf0

@Tinche

This comment has been minimized.

Member

Tinche commented Sep 8, 2016

This will bring a slight performance penalty to pickling (details at the end). This is not just theoretical for me, we pickle stuff at the end of every request.

Now obviously slower is better than not working, but I would just suggest adding a special case for Python 3: add the special methods iff slots=True and (Python2 or frozen=True). You can look in attr._compat for a way of detecting Python 2.

Benchmark example:

  1. pip install perf
  2. paste into t.py:
import attr


@attr.s(slots=True)
class A(object):
    a = attr.ib()
    b = attr.ib()
    c = attr.ib()
    d = attr.ib()
    e = attr.ib()
    f = attr.ib()

(pickling doesn't seem to work with just creating the class in the timeit invocation)
3. run pyperf timeit --rigorous -g -s "import pickle; from t import A; a = A(1, 2, 3, 4, 5, 6)" "pickle.dumps(a)"
For me, the difference is 3.83 us +- 0.18 (master) to 4.96 us +- 0.13 (yours).

@mathieulongtin

This comment has been minimized.

mathieulongtin commented Sep 8, 2016

It gets more complicated: pickle protocol 2 and up support slots natively. Python 3.5 uses protocol 3 by defaults. Python 2 only supports protocol 0, 1 and 2 and defaults to 0.

So do I:

  1. set __getstate__/__setstate__ regardless
  2. set __getstate__ only when needed by the default protocol (py2 and slots; py3 and slots and frozen)
  3. put an option to attrs pickle_protocol and set __getstate__ if needed by the protocol version. It would default to the python version's default protocol.
  4. put a module global attr.pickle_protocol and use that to decide whether to set __getstate__.

1 causes a performance downgrade with Python 3 and probably with Python 2 using proto2.

2 means Python 3 wouldn't work if pickle is used with protocol 0 or 1. It probably slows down Python 2 if proto2 is used.

3 is more flexible, but could be more problematic, as a library could define classes with attrs that are not compatible with your pickle protocol of choice.

4 is probably the best. It would work like option 2 by default, but it would be possible to say, at the beginning of your program, "I use protocol 2" even though I use Python 2, so don't put getstate that would slow me down. But most of the time, you wouldn't need to think about it.

What do you think?

@Tinche

This comment has been minimized.

Member

Tinche commented Sep 8, 2016

Before we consider it, is it possible, and if so, how convenient would it be for you and any other PySpark user to use protocol version 2 on Python 2.7?

If it's no problem, we could just document the requirement.

@Tinche

This comment has been minimized.

Member

Tinche commented Sep 8, 2016

Great piece of research, btw.

@mathieulongtin

This comment has been minimized.

mathieulongtin commented Sep 9, 2016

PySpark actually uses protocol 2. So it only had an issue because it tried to serialize attr.Attribute and other attr internals which are slots and frozen.

@hynek what do you think? I don't want to force a performance penalty on everybody.

A few more options:

  1. only use __getstate__ on frozen classes regardless of Python version, and put a note in the doc that slots=True require pickle protocol>=2.
  2. don't use frozen on internal attr classes and note that frozen classes don't work well with pickle in the doc.
@Tinche

This comment has been minimized.

Member

Tinche commented Sep 9, 2016

My personal recommendations (identical to @mathieulongtin except for the third bullet):

  • put the methods on attr.Attribute, since pickling it won't work without in any case.
  • document pickling slot classes may require pickle protocol >= 2.
  • put the methods on attrs classes that are both slots and frozen. (Just frozen works without, so no need for the extra penalty.)

Minimal performance impact (only for classes that are both slots and frozen, and these don't work without it anyway), and I think requiring pickle protocol >= 2 is not a great burden.

The fact of the matter is, ordinary (i.e. non-attrs) slot classes require a custom __getstate__/__setstate__ anyway for protocol < 2. People can always write their own getstate if they absolutely need protocol < 2 directly in the class. Apparently protocol 2 was introduced in Python 2.3 and is more efficient anyway.

Of course @hynek's is the last word. :)

@hynek

This comment has been minimized.

Member

hynek commented Sep 9, 2016

Requiring protocol 2 sounds good to me.

mathieu longtin added some commits Sep 9, 2016

mathieu longtin
@mathieulongtin

This comment has been minimized.

mathieulongtin commented Sep 9, 2016

I applied the last suggestion by @Tinche. I consider this complete now. It works fine with PySpark.

@Tinche Tinche referenced this pull request Sep 9, 2016

Closed

Improve tests, bandage assoc. #84

@hynek hynek closed this in e59c360 Sep 10, 2016

@hynek

This comment has been minimized.

Member

hynek commented Sep 10, 2016

I’ve made a few minor adjustments and merged it.

Thanks to everyone involved! I don’t care about pickles myself so it’s great to have domain experts. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment