Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FrozenInstanceError with pickle in Spark #79

Closed
mathieulongtin opened this issue Sep 7, 2016 · 13 comments
Closed

FrozenInstanceError with pickle in Spark #79

mathieulongtin opened this issue Sep 7, 2016 · 13 comments

Comments

@mathieulongtin
Copy link

Using Apache Spark, with attr 16.1.0, I get this error.

Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File ".../spark/python/pyspark/worker.py", line 98, in main
    command = pickleSer._read_with_length(infile)
  File ".../spark/python/pyspark/serializers.py", line 164, in _read_with_length
    return self.loads(obj)
  File ".../spark/python/pyspark/serializers.py", line 422, in loads
    return pickle.loads(obj)
  File ".../lib/python2.7/site-packages/attr/_make.py", line 617, in __setattr__
    raise FrozenInstanceError()
FrozenInstanceError

Unfortunately, I can't reproduce it with simple code, for some reason. Using IPython, I managed to track it down to rebuilding Attributes class:

  /usr/lib64/python2.7/pickle.py(1247)load_build()
   1245         if slotstate:
   1246             for k, v in slotstate.items():
-> 1247                 setattr(inst, k, v)

Spark uses a custom pickler in order to support local function and classes, but still uses the standard unpickler.

This was not an issue with 16.0.0, so sticking to that for now.

@hynek
Copy link
Member

hynek commented Sep 7, 2016

Well because 16.0 didn't have frozen classes? :) Pickle seems to use setattr therefore you can't use it with frozen classes.

@mathieulongtin
Copy link
Author

Actually, my code doesn't use frozen. attr.Attribute uses frozen, and that's what breaks. This exact code is working fine with 16.0.0.

mathieulongtin pushed a commit to mathieulongtin/attrs that referenced this issue Sep 7, 2016
@hynek
Copy link
Member

hynek commented Sep 7, 2016

Oh drat!

@mathieulongtin
Copy link
Author

So it turns out Spark's pickle does fancy things to pickle things pickle can't pickle. (say that fast).

Basic pickle couldn't serialize an attr.Attribute at all and mumbled something about getstate. Sure enough, you can fix the whole thing by adding these methods to Attribute:

      def __getstate__(self):
          """Play nice with pickle"""
          return tuple(getattr(self, a) for a in self.__slots__)

      def __setstate__(self, state):
          """Play nice with pickle"""
          __bound_setattr = _obj_setattr.__get__(self, Attribute)
          for name, value in zip(self.__slots__, state):
              __bound_setattr(name, value)

And it works with Spark.

Basic test:

x = attr.Attribute.from_counting_attr('x', attr.ib())
assert x == pickle.loads(pickle.dumps(x))

This fails with __getstate__ and succeeds with.
I'll send a pull request once I understand your test harness better. I'm guessing this goes in the dark magic section.

It would be a good idea to add this to frozen classes so they work with Pickle.

@Tinche
Copy link
Member

Tinche commented Sep 7, 2016

(Commenting on the commit)

Ha, that's very clever. It cuts down attr.Attribute instantiation time by ~45%. However, it slows down access on the Attribute fields, so for example asdict is slower by ~12%. It's an interesting tradeoff, but I'd always trade class definition time (~once per process) for runtime operations like asdict that get called n times.

I wouldn't know about picking attr.Attribute, but pickling instances of non-frozen attr classes in general seems to work fine. (Frozen, not so much, yeah.) Does PySpark require you to pickle classes, as well as instances?

@mathieulongtin
Copy link
Author

PySpark is more aggressive about what it pickles so it can support lambda and inline classes.

mathieulongtin pushed a commit to mathieulongtin/attrs that referenced this issue Sep 8, 2016
@hynek
Copy link
Member

hynek commented Sep 8, 2016

FWIW, I could live with adding PySpark to the test suite to prevent future regressions.

@Tinche
Copy link
Member

Tinche commented Sep 8, 2016

Just to clarify, my initial comment was to do with changing the Attribute class to have the _values tuple and properties. Adding __getstate__ and __setstate__ to frozen classes basically has no impact. From playing around in the REPL, it even works with nested attrs classes.

What are the exact combinations of parameters that trigger this? slots=True and frozen=True?

@mathieulongtin
Copy link
Author

PR done. I added a test to pickle all the classes defined in the dark magic test. It seems to work fine with frozen classes without __getstate__. The problem was only slots classes.

It works fine with Spark, or rather, my previously working Spark code works with this patch.

@Tinche
Copy link
Member

Tinche commented Sep 8, 2016

Hm, really? On Python 3.5, only both slots=True and frozen=True fails. https://gist.github.com/Tinche/33b702c6a1c3e4093be95953720cd80c

@mathieulongtin
Copy link
Author

I test with 2.7. Any slots=True failed. frozen=True, slots=False works fine.

@Tinche
Copy link
Member

Tinche commented Sep 8, 2016

Hm, can confirm, on 2.7 cases B and C fail.

On 3.5 only C fails.

@mathieulongtin
Copy link
Author

Slots seems to be supported by Pickle in Python 3.5 without __getstate__. Anyway, my PR seems to cure all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants