ENH: Add __getstate__ and __setstate__ for Nodes #2938
@patcao, can you link an issue number if you're working on one, or provide a description of why you are proposing this change? Thank you.
ibis/expr/operations.py

```
@@ -59,6 +59,18 @@ def _pp(x):
        return '{}({})'.format(opname, ', '.join(pprint_args))

    def __getstate__(self):
```
Can you add docstrings for these two methods describing what they do and why?
Please add type annotations to these as well.
```
@@ -59,6 +59,18 @@ def _pp(x):
        return '{}({})'.format(opname, ', '.join(pprint_args))

    def __getstate__(self):
        excluded_slots = {'_expr_cached', '_hash'}
        return {
```
I am not certain how this works when a slot holds another ibis TableExpr or ColumnExpr. Can you explain how it works?
@datapythonista, chiming in a bit on the rationale. We are trying to build a "consistent hash" for ibis expressions (a hash that doesn't change across Python interpreter restarts), and the current implementation first pickles the ibis expression and then uses something like hashlib.sha256() to compute the hash from the pickled bytes. In doing so, we found that because the ibis Node class has "_expr_cached" and "_hash" attributes, the pickled bytes for the same table can differ depending on whether those two fields have been populated, which is not desirable. This PR fixes the serialization/deserialization methods so that they produce consistent results for equivalent Node objects.
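To make the rationale concrete, here is a minimal, self-contained sketch. The `Node` class is a toy stand-in, not the real ibis class; only the excluded slot names `_expr_cached` and `_hash` and the pickle-then-`hashlib.sha256()` approach come from this PR. `consistent_hash` is a hypothetical helper name, and pinning the pickle protocol is an added assumption so the bytes don't depend on the interpreter's default protocol. Note that Python's built-in `hash()` of strings is randomized per process, so a cached `_hash` would also leak process-specific values into the pickle. In this toy, a node stored in another node's slot is pickled through its own `__getstate__`, so its caches are skipped too; whether ibis `TableExpr`/`ColumnExpr` objects behave the same way depends on how they are pickled and is not shown here.

```python
import hashlib
import pickle
from typing import Any, Dict


class Node:
    """Toy stand-in for an ibis Node (hypothetical, not the real class)."""

    # `_expr_cached` and `_hash` are lazily filled caches, not real state.
    __slots__ = ('name', 'args', '_expr_cached', '_hash')

    def __init__(self, name: str, *args: Any) -> None:
        self.name = name
        self.args = args

    def __hash__(self) -> int:
        # Compute once and cache, so the slot is set only after first use.
        try:
            return self._hash
        except AttributeError:
            self._hash = hash((self.name, self.args))
            return self._hash

    def __getstate__(self) -> Dict[str, Any]:
        """Return only the defining slots, so pickled bytes ignore caches."""
        excluded_slots = {'_expr_cached', '_hash'}
        return {
            slot: getattr(self, slot)
            for slot in self.__slots__
            if slot not in excluded_slots
        }

    def __setstate__(self, state: Dict[str, Any]) -> None:
        """Restore the defining slots; caches are rebuilt lazily on demand."""
        for slot, value in state.items():
            setattr(self, slot, value)


def consistent_hash(obj: Any) -> str:
    """Hypothetical helper: SHA-256 of the pickled bytes, protocol pinned."""
    return hashlib.sha256(pickle.dumps(obj, protocol=4)).hexdigest()


table = Node('table', 'my_table')
column = Node('column', table, 'x')  # a node stored in another node's slot

before = consistent_hash(column)
hash(column)  # fills column._hash and, via args, table._hash
assert consistent_hash(column) == before

restored = pickle.loads(pickle.dumps(column))
assert consistent_hash(restored) == before
```

The design choice is to exclude caches at serialization time rather than clear them: pickling stays side-effect free, and a restored node simply recomputes its caches on first use.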
ibis/tests/expr/test_table.py

```
@@ -1298,6 +1298,12 @@ def test_pickle_table_expr():
    assert t1.equals(t0)


def test_pickle_table_node(table):
    n0 = table.op()
    n1 = pickle.loads(pickle.dumps(n0))
```
Make a testing function that does this (e.g. lines 1303 and 1304) in ibis/tests/util.py, and call it `assert_pickle_roundtrip(expr)`.
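A minimal sketch of the suggested helper. Only the name `assert_pickle_roundtrip` and its location in `ibis/tests/util.py` come from the review; the body is an assumption modeled on the existing test, which round-trips through `pickle.dumps`/`pickle.loads` and compares with `equals`. Falling back to `==` for objects without an `equals` method is also an assumption.

```python
import pickle
from typing import Any


def assert_pickle_roundtrip(obj: Any) -> None:
    """Assert that `obj` survives a pickle round trip unchanged (sketch)."""
    loaded = pickle.loads(pickle.dumps(obj))
    if hasattr(obj, 'equals'):
        # ibis expressions compare structurally via `equals`.
        assert obj.equals(loaded)
    else:
        # Assumed fallback for plain Python objects.
        assert obj == loaded
```

With such a helper, the two lines in each pickle test collapse into a single call, e.g. `assert_pickle_roundtrip(table.op())`.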
Also, please add a release note.
Minor comment. @icexelloss has a couple of comments. Please add a release note as well.
ibis/expr/operations.py

```
            if slot not in excluded_slots
        }

    def __setstate__(self, state: Dict[str, Any]):
```
Annotate the return type as `-> None`.
LGTM
@patcao, can you add a release note? Ping on green.
docs/source/release/index.rst

```
@@ -12,6 +12,7 @@ Release Notes
These release notes are for versions of ibis **1.0 and later**. Release
notes for pre-1.0 versions of ibis can be found at :doc:`release-pre-1.0`

* :feature:`2938` Add `__getstate__` and `__setstate__` methods to Node
```
Can you just say that serialization/deserialization via pickle is now byte-compatible between different processes?
Thanks @patcao!