-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Make StructValue picklable #2577
Conversation
ibis/expr/api.py
Outdated
| def _destructure(expr: StructColumn) -> DestructColumn: | ||
| """ Destructure a ``Struct`` to create a destruct column. | ||
| def _struct_get_attr(expr: StructValue, attr_name: str) -> ValueExpr: | ||
| """Get field named `attr_name` from the ``StructValue`` expression `expr`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason you are now just defining __getstate___ and __setstate___ on a StructValueExpr itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is extremely convoluted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason you are now just defining
__getstate___and__setstate___on a StructValueExpr itself?
Do you mean defining __getstate__ and __setstate__ but mimic the default implementation (i.e. their behavior if they weren't defined), so that __getattr__ is not triggered?
I think that would work too, but I'm not sure how much better it would be:
- It feels odd to be explicitly defining
__getstate__and__setstate__with an implementation that is just their default implementation - On the other hand, my current implementation is essentially defining
__getattr__with an exclusion (that says that it should not be used for__getstate__and__setstate__), which seems a little easier to follow to me
I would agree that both solutions feel like hacks though
ibis/expr/api.py
Outdated
| def _struct_get_attr(expr: StructValue, attr_name: str) -> ValueExpr: | ||
| """Get field named `attr_name` from the ``StructValue`` expression `expr`. | ||
|
|
||
| Parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also why are you defining __dir__ on StructValueExpr in the first place? this is the cause of the trouble. it would be much better to provide an attribute which holds the data itself, e.g. .attr or something, then you don't have these namespace collisions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean why we define StructValue.__getattr__? (and that we should remove StructValue.__getattr__, and only allow access to fields of the struct through StructValue.__getitem__?)
The underlying struct is already stored in the way you describe if I'm understanding correctly. Information about the underlying struct is stored in an attribute called StructValue._dtype, and then doing my_struct_value.foo or my_struct_value['foo'] constructs a operation+expression (using information from StructValue._dtype) that would pull the field's value out when executed.
Getting rid of StructValue.__getattr__ sounds like a good solution to me. It would simplify all of this (no need for any pickling-specific fix). I would have to do a little digging to see whether it's used anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no the current implementation is a hack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting rid of StructValue.getattr sounds like a good solution to me. It would simplify all of this (no need for any pickling-specific fix). I would have to do a little digging to see whether it's used anywhere.
yes do this, its way better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. can you add some documentation / examples (not sure exactly where) about getting these sub-fields
|
I've made the changes—just to confirm we're OK with the fact that this changes the Side note: I noticed |
|
Oops messages crossed each other, but yes sure I'll look into adding documentation. Looks like a few tests will have to be fixed as well |
|
can you rebase once more |
StructValueis an expr class (parent class ofStructScalarandStructColumn)Problem (Why is
StructValueunpicklable?)First,
StructValuedefines a__getattr__so that users may access fields in the struct through attrs. For example:my_struct_scalar.footo access the value of the field named'foo'.Some background on pickle in case not already familiar: During the pickling process, pickle will check if
StructValuehas a__getstate__method (generally speaking, this method can be defined for a class if custom pickling behavior is desired). Pickle will then call that method if found.Now since
StructValuehas a custom__getattr__, and pickle is looking for an attribute named'__getstate__', it will trigger the logic that tries to fetch a field named'__getstate__'from the struct, which will fail.(There's also a
__setstate__method that pickle attempts to call during the unpickling process. The same logic above applies.)Solution
In
StructValue's__getattr__method, raise anAttributeErrorif the attribute being asked for is either'__getstate__'or'__setstate__'. This tells pickle that we aren't trying to define an implementation for either method.Why always raise an error in those cases?
In other words, why not confirm the struct actually does not have any field named
'__getstate__'or'__setstate__'in the struct before raising an error? Doing a check like that would cover the edge case that the user has purposely defined a struct with a field named'__getstate__'or'__setstate__'.This won't work during the unpickling process. Here's a walkthrough:
__setstate__StructValue's__getattr__methodStructValuepopulated. This means that we don't have access to the underlying struct, and can't check if a field named'__setstate__'exists in the struct.