Refactor code about ray.ObjectID. #3674
Test PASSed. |
The changes regarding NIL ids and the error class renaming look good to me! For the random object ID generation, we should make sure that from_random is fork-safe if we expose it to the Python side; see the discussion in apache/arrow#2400 and this example, where every forked child prints the same ID:
import ray
import multiprocessing as mp
def child(): print(ray.ObjectID.from_random())
for i in range(4): mp.Process(target=child).start()
ObjectID(fe662239bebfa676b7c37896fbe31e8548273ef1)
ObjectID(fe662239bebfa676b7c37896fbe31e8548273ef1)
ObjectID(fe662239bebfa676b7c37896fbe31e8548273ef1)
ObjectID(fe662239bebfa676b7c37896fbe31e8548273ef1)
Pickling object IDs is a double-edged sword. It can be very convenient for users, but it can also be overused and make fault tolerance harder. I'd say we shouldn't do it for now and instead let users explicitly call .id() if they need to, so that they understand something potentially dangerous is going on.
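On the fork-safety concern raised for from_random above: a generator whose state is copied by fork() will emit the same sequence in every child. One common remedy is to reseed whenever the process ID changes. The following is only a sketch in plain Python, not the PR's implementation; the class name ForkAwareIdGenerator and the 20-byte length (inferred from the 40 hex digits in the output above) are assumptions.

```python
import os
import random

ID_LENGTH = 20  # bytes; assumed from the 40 hex digits printed above


class ForkAwareIdGenerator:
    """Hypothetical generator that reseeds itself after a fork."""

    def __init__(self):
        self._pid = None
        self._rng = None

    def _ensure_rng(self):
        # A forked child inherits the parent's generator state, so the
        # PID check forces a fresh seed the first time the child asks.
        pid = os.getpid()
        if self._rng is None or pid != self._pid:
            self._rng = random.Random(os.urandom(32))
            self._pid = pid
        return self._rng

    def next_id(self):
        rng = self._ensure_rng()
        return bytes(rng.getrandbits(8) for _ in range(ID_LENGTH))


generator = ForkAwareIdGenerator()
first_id = generator.next_id()
```

Drawing every ID directly from os.urandom would avoid the PID bookkeeping at the cost of a system call per ID.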
{"__reduce__", (PyCFunction)PyObjectID___reduce__, METH_NOARGS,
 "Say how to pickle this ObjectID. This raises an exception to prevent"
 "object IDs from being serialized."},
{"__reduce__", (PyCFunction)PyObjectID___reduce__, METH_VARARGS,
Hmm, I think there is actually a good reason to not allow object IDs to be pickled, but I'm not exactly sure what. @robertnishihara?
So, my concern was that people would define remote functions that captured object IDs and that most of the time this happened it would be an accident.
I'm not really sure how much this kind of error would occur, since I haven't seen too many people complaining about object IDs not being pickleable on GitHub.
It does force us to do some ugly stuff to make actor handles pickleable (since actor handles include a bunch of object IDs).
I could go either way on this one. @guoyuhong what were your reasons for making them pickleable? Is it to simplify the actor handle code?
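The two behaviors under discussion, refusing to pickle versus allowing it, both hinge on what __reduce__ does. The sketch below uses plain Python stand-ins rather than the C extension touched by this PR; the class names PicklableID and UnpicklableID are hypothetical, and the error message is illustrative.

```python
import pickle


class PicklableID:
    """Hypothetical stand-in for an ID type that opts in to pickling."""

    def __init__(self, id_bytes):
        self._id = id_bytes

    def id(self):
        return self._id

    def __eq__(self, other):
        return isinstance(other, PicklableID) and self._id == other._id

    def __hash__(self):
        return hash(self._id)

    def __reduce__(self):
        # Tell pickle to rebuild the object from its raw bytes.
        return (PicklableID, (self._id,))


class UnpicklableID(PicklableID):
    """Same stand-in, but pickling is refused outright."""

    def __reduce__(self):
        raise TypeError("IDs cannot be pickled; pass .id() explicitly")


original = PicklableID(b"\x01" * 20)
restored = pickle.loads(pickle.dumps(original))
```

With the refusing variant, an ID captured by accident fails loudly at serialization time, which is the safety property the comment above is weighing against convenience.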
I see there is #1317.
Thanks @guoyuhong! Similar to @raulchen's comment in #3564 (comment), I think I prefer Also, in the future, instead of having a using just |
@pcmoritz @robertnishihara Do you remember what particular issue(s) |
@@ -487,7 +487,7 @@ MOD_INIT(libraylet_library_python) {
   char common_error[] = "common.error";
   CommonError = PyErr_NewException(common_error, NULL, NULL);
   Py_INCREF(CommonError);
-  PyModule_AddObject(m, "common_error", CommonError);
+  PyModule_AddObject(m, "CommonError", CommonError);
This comment doesn't need to be addressed in this PR.
We can define this CommonError in Python code and import it into the C extension. That would simplify the code. Also, the name CommonError sounds ambiguous to me. We should use more specific exception types depending on the concrete cases.
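To illustrate the suggestion of defining the exception in Python and using more specific types, here is a rough sketch. All names in it (RayError, InvalidObjectIDError, RayletConnectionError, check_id_length) are hypothetical and are not taken from the Ray codebase.

```python
class RayError(Exception):
    """Hypothetical base class that callers can catch broadly."""


class InvalidObjectIDError(RayError):
    """Hypothetical error for malformed IDs."""


class RayletConnectionError(RayError):
    """Hypothetical error for a failed connection to the raylet."""


def check_id_length(id_bytes, expected_length=20):
    # Raise a specific type so callers can tell this failure apart
    # from unrelated errors that share the same base class.
    if len(id_bytes) != expected_length:
        raise InvalidObjectIDError(
            "expected %d bytes, got %d" % (expected_length, len(id_bytes)))
    return id_bytes
```

A C extension could then import the module that defines these classes and raise them, instead of creating the exception object itself.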
I will temporarily change the name to RayCommonError. I think this part will be easier after @suquark's Cython change.
@raulchen: If we make ObjectIDs picklable, they can enter tasks by being pickled and read through objects, or even by being closed over, even with the official API (people can obviously already do this by calling .id()). At the moment, they can only be made available to tasks or actors if they are passed into tasks/actors or by task submissions. I'm not saying this is necessarily a bad thing and I'm happy to try it out, but we should look out for possible future problems (e.g. if we want to do more precise reference counting). Once this kind of functionality is granted to users, it cannot be taken away again.
Thanks. If we don't see any potential issues by allowing ObjectID to be picklable, I prefer to give it a try. |
Any progress in this PR? I am considering closing it because |
@suquark Thanks for the reminder. I will finish this PR. The Python part will not conflict with your PR and |
d096764
to
006b43e
Compare
Test FAILed. |
Test FAILed. |
I have updated the PR.
Test FAILed. |
88feeb1
to
f0e7761
Compare
Test PASSed. |
Test PASSed. |
Looks good! I left a minor comment and will approve once that and the question about pickling ObjectIDs are addressed.
python/ray/utils.py
Outdated
@@ -70,7 +70,7 @@ def push_error_to_driver(worker,
         will be serialized with json and stored in Redis.
     """
     if driver_id is None:
-        driver_id = ray_constants.NIL_JOB_ID.id()
+        driver_id = ray.ObjectID.nil_id().id()
It looks like we can change driver_id to be a ray.ObjectID instead of handling raw bytes.
@stephanie-wang I have changed driver_id to ray.ObjectID. Regarding the question about pickling ObjectID: @raulchen discussed it with @robertnishihara, and we decided to give it a try. It works fine in the current Jenkins and Travis tests. We need to keep monitoring it to see whether users run into problems or whether making it pickleable leads to hard-to-find bugs.
096e342
to
83dbc99
Compare
Test PASSed. |
Test PASSed. |
83dbc99
to
b172316
Compare
Test FAILed. |
@AmplabJenkins retest this, please. |
test/runtest.py
Outdated
@@ -2310,12 +2310,14 @@ def test_global_state_api(shutdown_only):
     assert len(task_table) == 1
     assert driver_task_id == list(task_table.keys())[0]
     task_spec = task_table[driver_task_id]["TaskSpec"]
+    nil_id_hex = ray.experimental.state.binary_to_hex(
nit: you can remove this and just use "ray.ObjectID.nil_id().hex()" below
Small nit, rest LGTM
Test PASSed. |
Test PASSed. |
Test PASSed. |
What do these changes do?
The meaning of 'id' in the Python code is not consistent. Sometimes 'id' means bytes and sometimes 'id' means ray.ObjectID. In this PR, I will do my best to make 'id' consistently refer to a ray.ObjectID. The ID needs to be converted to bytes with id() only when it is used as a hash key.
This PR includes the following changes:
- Allow ray.ObjectID to be pickled / unpickled.
- Allow ray.ObjectID() to generate a NIL ID, the same as the backend does (see "Convert UniqueID::nil() to a constructor", #3564).
- Add ray.ObjectID.from_random() to generate a random object ID.
- Change NIL_ID to ray.ObjectID's is_nil().
- Rename common_error to CommonError, which matches the naming that ObjectID, RayletClient, Task, etc. use.
Related issue number
N/A
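The convention described above (pass the ID object around, convert to bytes with id() only for hash keys) can be sketched with a plain Python stand-in. This is not the real ray.ObjectID: the class StandInObjectID is hypothetical, and the 20-byte length and the all-0xff NIL value are assumptions. Only the method names id(), hex(), is_nil(), nil_id() and from_random() come from the discussion above.

```python
import os

ID_LENGTH = 20  # assumed ID size in bytes


class StandInObjectID:
    """Hypothetical stand-in that mirrors the method names in this PR."""

    def __init__(self, id_bytes=None):
        # With no argument, build the NIL ID (assumed to be all 0xff).
        self._id = b"\xff" * ID_LENGTH if id_bytes is None else id_bytes

    @classmethod
    def nil_id(cls):
        return cls()

    @classmethod
    def from_random(cls):
        return cls(os.urandom(ID_LENGTH))

    def id(self):
        return self._id

    def hex(self):
        return self._id.hex()

    def is_nil(self):
        return self._id == b"\xff" * ID_LENGTH


# Keep the ID as an object everywhere; use raw bytes only as a dict key.
object_id = StandInObjectID(b"\x07" * ID_LENGTH)
locations = {object_id.id(): "node-a"}

nil = StandInObjectID.nil_id()
```

Checking nil.is_nil() replaces comparisons against a module-level NIL_ID constant, and nil.hex() gives the hex string directly, as suggested in the review nit on test/runtest.py.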