Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor code about ray.ObjectID. #3674

Merged
merged 7 commits into from Jan 13, 2019
Merged

Conversation

guoyuhong
Copy link
Contributor

What do these changes do?

The instances of ids in python are not constant. Sometimes, ’id‘ means bytes and sometimes ’id‘ means ray.ObjectID. In this PR, I will do my best to make the meaning of ’id‘ consistent to represent ray.ObjectID. Only when the id is used as a hash key, the id is needed to transformed to bytes using id().

This PR included following changes:

  1. Enable ray.ObjectID to be pickled / unpickled.
  2. Add default constructor of ray.ObjectID() to generate a NIL ID which is the same as the backend does. Convert UniqueID::nil() to a constructor #3564
  3. Add a function of ray.ObjectID.from_random() to generate a random Object Id.
  4. Replace the comparison of id bytes to NIL_ID to ray.ObjectID's is_nil().
  5. Change the name of common_error to CommonError which is the way ObjectID, RayletClient, Task, etc. use.

Related issue number

N/A

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10541/
Test PASSed.

@pcmoritz
Copy link
Contributor

pcmoritz commented Jan 2, 2019

The changes regarding NIL ids and the error class renaming look good to me!

For the random object id generation, we should make sure that from_random is fork save if we expose it to the python side, see the discussion in apache/arrow#2400 and

import ray
import multiprocessing as mp
def child(): print(ray.ObjectID.from_random())
for i in range(4): mp.Process(target=child).start()

ObjectID(fe662239bebfa676b7c37896fbe31e8548273ef1)
ObjectID(fe662239bebfa676b7c37896fbe31e8548273ef1)
ObjectID(fe662239bebfa676b7c37896fbe31e8548273ef1)
ObjectID(fe662239bebfa676b7c37896fbe31e8548273ef1)

Pickling object_ids is a double edged sword. It can be very convenient for users, but can also be over-used and make fault-tolerance harder. I'd say we shouldn't do it for now and let users explicitly call .id() if they need to, to make sure they understand something potentially dangerous is going on.

python/ray/actor.py Outdated Show resolved Hide resolved
python/ray/ray_constants.py Outdated Show resolved Hide resolved
src/ray/raylet/lib/python/common_extension.cc Outdated Show resolved Hide resolved
{"__reduce__", (PyCFunction)PyObjectID___reduce__, METH_NOARGS,
"Say how to pickle this ObjectID. This raises an exception to prevent"
"object IDs from being serialized."},
{"__reduce__", (PyCFunction)PyObjectID___reduce__, METH_VARARGS,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think there is actually a good reason to not allow object IDs to be pickled, but I'm not exactly sure what. @robertnishihara?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, my concern was that people would define remote functions that captured object IDs and that most of the time this happened it would be an accident.

I'm not really sure how much this kind of error would occur, since I haven't seen too many people complaining about object IDs not being pickleable on GitHub.

It does force us to do some ugly stuff to make actor handles pickleable (since actor handles include a bunch of object IDs).

I could go either way on this one. @guoyuhong what were your reasons for making them pickleable? Is it to simplify the actor handle code?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see there is #1317.

@robertnishihara
Copy link
Collaborator

robertnishihara commented Jan 3, 2019

Thanks @guoyuhong!

Similar to @raulchen's comment in #3564 (comment), I think I prefer ray.ObjectID.nil() to ray.ObjectID() because it is explicit and it's clear how it relates to object_id.is_nil() (it's also more parallel to ray.ObjectID.from_random()). What do you think about that?

Also, in the future, instead of having a using just ObjectID, we should probably have equivalent classes ActorID and ActorHandleID, and so on instead of just using ObjectID everywhere. Similarly in in C++, we should make these actually different classes instead of typedefs. This comment is not relevant for this PR though.

@raulchen
Copy link
Contributor

raulchen commented Jan 4, 2019

@pcmoritz @robertnishihara Do you remember what particular issue(s) make fault-tolerance harder?
Some of my colleagues complained about ObjectID not being pickle-able. I'd like to understand the problems and see if they can be resolved. Thanks.

python/ray/worker.py Outdated Show resolved Hide resolved
src/ray/raylet/lib/python/common_extension.cc Outdated Show resolved Hide resolved
src/ray/raylet/lib/python/common_extension.cc Outdated Show resolved Hide resolved
@@ -487,7 +487,7 @@ MOD_INIT(libraylet_library_python) {
char common_error[] = "common.error";
CommonError = PyErr_NewException(common_error, NULL, NULL);
Py_INCREF(CommonError);
PyModule_AddObject(m, "common_error", CommonError);
PyModule_AddObject(m, "CommonError", CommonError);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment doesn't need to be addressed in this PR.
We can define this CommonError in Python code and import it to the C extension. That would simplify the code. Also, the name CommonError sounds ambiguous to me. We should use more specific exception types depending on the concrete cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will temporarily change the name to RayCommonError. I think after @suquark 's cython change. This part will easier.

@pcmoritz
Copy link
Contributor

pcmoritz commented Jan 4, 2019

@raulchen: If we make ObjectIDs picklable, they can enter tasks by being pickled and read through objects, or even being closed over even with the official API (people can obviously already do this by calling .id()). At the moment, they can only be made available to tasks or actors if they are passed into tasks/actors or by tasks submissions.

I'm not saying this is necessarily a bad thing and I'm happy to try it out, but we should look out for possible future problems (e.g. if we want to do more precise reference counting etc.). Once this kind of functionality is granted to the users, it cannot be taken away any more.

@raulchen
Copy link
Contributor

raulchen commented Jan 4, 2019

@raulchen: If we make ObjectIDs picklable, they can enter tasks by being pickled and read through objects, or even being closed over even with the official API (people can obviously already do this by calling .id()). At the moment, they can only be made available to tasks or actors if they are passed into tasks/actors or by tasks submissions.

I'm not saying this is necessarily a bad thing and I'm happy to try it out, but we should look out for possible future problems (e.g. if we want to do more precise reference counting etc.). Once this kind of functionality is granted to the users, it cannot be taken away any more.

Thanks. If we don't see any potential issues by allowing ObjectID to be picklable, I prefer to give it a try.

@robertnishihara
Copy link
Collaborator

@raulchen @pcmoritz that sounds fine, let's give it a try.

@suquark suquark added this to In progress in Code quality & design Jan 8, 2019
@suquark suquark moved this from In progress to Needs review in Code quality & design Jan 8, 2019
@suquark
Copy link
Member

suquark commented Jan 11, 2019

Any progress in this PR? I am considering closing it because ray.ObjectID has been implemented in #3541.

@guoyuhong
Copy link
Contributor Author

@suquark Thanks for the reminding. I will finish this PR. The python part can be will not conflict with your PR and test_object_id_properties can also test the cython code for ObjectID.

@guoyuhong guoyuhong force-pushed the refineObjectID branch 2 times, most recently from d096764 to 006b43e Compare January 11, 2019 06:17
@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10761/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10762/
Test FAILed.

@guoyuhong
Copy link
Contributor Author

I have updated the PR.

  1. Change CommonError to RayCommonError.
  2. Remove from_random and use the old way since it may have some concerns.
  3. use from ray import ObjectID in worker.py and actor.py which use ObjectID frequently.
  4. Use ray.ObjectID.nil_id() to replace ray.ObjectID() to generate nil ID.
  5. I also remove .id() in all hash keys.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10767/
Test FAILed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10769/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10772/
Test PASSed.

Copy link
Contributor

@stephanie-wang stephanie-wang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I left a minor comment and will approve once that and the question about pickling ObjectIDs is addressed.

@@ -70,7 +70,7 @@ def push_error_to_driver(worker,
will be serialized with json and stored in Redis.
"""
if driver_id is None:
driver_id = ray_constants.NIL_JOB_ID.id()
driver_id = ray.ObjectID.nil_id().id()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we can change driver_id to be a ray.ObjectID instead of handling raw bytes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephanie-wang I have changed driver_id to ray.ObjectID. For the question about pickling ObjectID, @raulchen had discussed with @robertnishihara , we decided to give it a try. From current Jenkins and Travis test, it works fine. We need to monitor it continuously to see whether users will have problems or could there be difficult bugs when it's pickleable.

@guoyuhong guoyuhong force-pushed the refineObjectID branch 2 times, most recently from 096e342 to 83dbc99 Compare January 12, 2019 12:12
@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10790/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10791/
Test PASSed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10794/
Test FAILed.

@guoyuhong
Copy link
Contributor Author

@AmplabJenkins retest this, please.

test/runtest.py Outdated
@@ -2310,12 +2310,14 @@ def test_global_state_api(shutdown_only):
assert len(task_table) == 1
assert driver_task_id == list(task_table.keys())[0]
task_spec = task_table[driver_task_id]["TaskSpec"]
nil_id_hex = ray.experimental.state.binary_to_hex(
Copy link
Contributor

@pcmoritz pcmoritz Jan 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can remove this and just use "ray.ObjectID.nil_id().hex()" below

Copy link
Contributor

@pcmoritz pcmoritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit, rest LGTM

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10816/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10817/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10820/
Test PASSed.

@pcmoritz pcmoritz merged commit d2cf856 into ray-project:master Jan 13, 2019
Code quality & design automation moved this from Needs review to Done Jan 13, 2019
@guoyuhong guoyuhong deleted the refineObjectID branch January 25, 2019 10:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

7 participants