FIX-#7072: Replace MaterializationHook with the materialized object on serialization. #7075
Force-pushed from 5859622 to 6ba08c5.
-------
tuple
"""
return _hook_deserializer, (RayWrapper.materialize(self),)
Let's add a test for that.
Done.
Thanks!
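A test along the lines requested here could look like the sketch below. It is only an illustration: `LazyLengthHook` is a hypothetical stand-in, not Modin's hook class, and it assumes the materialized value is an int so that the builtin `int` can serve as the reconstructor.

```python
import pickle


class LazyLengthHook:
    """Hypothetical stand-in for a materialization hook (not Modin's class)."""

    def __init__(self, compute):
        # Callable that produces the real value on demand.
        self._compute = compute

    def materialize(self):
        return self._compute()

    def __reduce__(self):
        # Serialize the materialized value instead of the hook itself.
        # Assumption: the value is an int, so int() can rebuild it.
        return int, (self.materialize(),)


def test_hook_is_replaced_on_serialization():
    hook = LazyLengthHook(lambda: 42)
    restored = pickle.loads(pickle.dumps(hook))
    # The receiver gets a plain int, not a hook.
    assert type(restored) is int
    assert restored == 42


test_hook_is_replaced_on_serialization()
```

The lambda stored in the hook is never pickled, because `__reduce__` only hands the materialized value to the pickler.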
return _hook_deserializer, (RayWrapper.materialize(self),)

def _hook_deserializer(obj):
Let's make this function a staticmethod or a classmethod. This way we can move it closer to the place of use.
When moved inside the class, the serialized data requires 20 bytes more, because the class name is also serialized.
Moved and renamed to _get(). Also added a check: if the object is an int, int itself is used as the reconstructor, because its serialized form is 3x smaller.
@@ -307,6 +307,31 @@ def post_materialize(self, materialized):
"""
raise NotImplementedError()

def __reduce__(self):
Is this class intended to work in remote kernels?
No, it's used for lazy materialization on the host process only.
It seems this class is used in remote functions, because we use it in MetaList, which we use in remote functions.
In which context do we use MetaList in a remote function?
Seems some functions pass the non-materialized partition lengths to remote functions.
@AndreyPavlenko, did you verify that hm and plasticc passed?
I've verified hm.
I checked plasticc, it also works. @anmyachev, @dchigarev, any comments?
tuple
"""
data = RayWrapper.materialize(self)
return int if isinstance(data, int) else self._get, (data,)
In case of int, we are losing data. A bit strange.
Why? In case of int, tuple(int, tuple(data)) is returned.
I misunderstood the comma
Please add a comment explaining why you made a separate branch for int.
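The confusion about the comma comes down to operator precedence: a conditional expression binds tighter than the comma, so the return statement builds a two-element tuple in both branches. The sketch below illustrates this with hypothetical helper names, using `str` as a stand-in for `self._get`.

```python
def reduce_like(data):
    # Parsed as ((int if ... else str), (data,)), not as
    # int if ... else (str, (data,)).
    return int if isinstance(data, int) else str, (data,)


def reduce_like_explicit(data):
    # Same result, with the reconstructor chosen on its own line.
    reconstructor = int if isinstance(data, int) else str
    return reconstructor, (data,)


for value in (5, "five"):
    result = reduce_like(value)
    # Both spellings return (reconstructor, args) for every input.
    assert result == reduce_like_explicit(value)
    assert len(result) == 2
```

Nothing is lost in the int case: the data always travels in the second tuple element.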
-------
object
"""
return obj
Let's also test.
Currently, all the implementations (MetaListHook and SlicerHook) return int, so this method is never used.
It probably makes sense to remove this method entirely and add assert isinstance(data, int) or raise NotImplementedError() to __reduce__(). If a new implementation that returns a non-int appears, the assertion will fail. What do you think?
ok for me
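The int-only variant agreed on here might be sketched as follows. `IntOnlyHook` is a hypothetical class used for illustration, and the sketch uses the explicit `raise NotImplementedError()` option rather than `assert`, since assertions are skipped when Python runs with `-O`.

```python
import pickle


class IntOnlyHook:
    """Hypothetical hook whose serialized form supports ints only."""

    def __init__(self, value):
        self._value = value

    def materialize(self):
        return self._value

    def __reduce__(self):
        data = self.materialize()
        if not isinstance(data, int):
            # A future implementation returning a non-int fails loudly here.
            raise NotImplementedError(
                f"only int is supported, got {type(data).__name__}"
            )
        return int, (data,)


# An int payload round-trips as a plain int.
assert pickle.loads(pickle.dumps(IntOnlyHook(3))) == 3

# A non-int payload is rejected at serialization time.
try:
    pickle.dumps(IntOnlyHook("three"))
except NotImplementedError:
    rejected = True
else:
    rejected = False
assert rejected
```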
tuple
"""
data = RayWrapper.materialize(self)
return int if isinstance(data, int) else self._get, (data,)
I was also confused by this line. Let's make it more obvious:
- return int if isinstance(data, int) else self._get, (data,)
+ reconstructor = int if isinstance(data, int) else self._get
+ return reconstructor, (data,)
Actually, can it simply be this?
def __reduce__(self):
    return (lambda x: x), (data,)
I use len(pickle.dumps(hook)) to calculate the length of the serialized data. With int it is 37, with self._get it is 96, and with lambda x: x pickling fails with "Can't pickle local object". In any case, creating and serializing a new object every time we need to serialize a single int is redundant overhead. I'm not familiar with the Ray serializer implementation, but if it's smart enough, it should not serialize the same object multiple times. For example, when serializing 100 hook objects with a static function, that function plus 100 tuples will be serialized, rather than 100 lambdas plus 100 tuples.
Removed this method. Only ints are currently supported.
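The measurement described in this thread can be reproduced roughly as in the sketch below. `IntHook`, `LambdaHook`, and `pickled_size` are hypothetical names, not part of Modin, and the exact byte counts depend on the pickle protocol and the value being serialized, so none are hard-coded here.

```python
import pickle


class IntHook:
    """Hypothetical hook that uses the builtin int as its reconstructor."""

    def __init__(self, value):
        self._value = value

    def __reduce__(self):
        return int, (self._value,)


class LambdaHook:
    """Hypothetical hook that tries to use a lambda as its reconstructor."""

    def __init__(self, value):
        self._value = value

    def __reduce__(self):
        return (lambda x: x), (self._value,)


def pickled_size(obj):
    """Return the serialized length in bytes, or None if pickling fails."""
    try:
        return len(pickle.dumps(obj))
    except (pickle.PicklingError, AttributeError):
        # A lambda cannot be pickled by reference; the exception type
        # differs between Python versions.
        return None


print("plain int:  ", pickled_size(5))
print("IntHook:    ", pickled_size(IntHook(5)))
print("LambdaHook: ", pickled_size(LambdaHook(5)))
```

The `IntHook` form is larger than a bare int because the pickle also records a reference to the reconstructor, and a longer qualified name for that reconstructor makes the payload larger still.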
What do these changes do?
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
docs/development/architecture.rst is up-to-date