Refactor code about ray.ObjectID. #3674

guoyuhong · 2019-01-02T07:38:17Z

What do these changes do?

The meaning of 'id' in the Python code is not consistent. Sometimes 'id' means bytes and sometimes 'id' means ray.ObjectID. In this PR, I will do my best to make 'id' consistently represent ray.ObjectID. Only when the id is used as a hash key does it need to be converted to bytes using id().

This PR includes the following changes:

Enable ray.ObjectID to be pickled / unpickled.
Add a default constructor, ray.ObjectID(), that generates a NIL ID the same way the backend does. Convert UniqueID::nil() to a constructor #3564
Add a function, ray.ObjectID.from_random(), to generate a random object ID.
Replace comparisons of id bytes against NIL_ID with ray.ObjectID's is_nil().
Rename common_error to CommonError, following the naming style that ObjectID, RayletClient, Task, etc. use.
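
To make the intent of the list above concrete, here is a minimal sketch of the proposed semantics. The ObjectID class below is a hypothetical pure-Python stand-in, not Ray's C extension, and the nil value of 20 bytes of 0xff is an assumption made for illustration.

```python
NIL_BYTES = b"\xff" * 20  # assumed nil value, for illustration only


class ObjectID:
    """Hypothetical stand-in for ray.ObjectID (not Ray's implementation)."""

    def __init__(self, id_bytes=NIL_BYTES):
        # The default constructor yields the NIL ID, as the description proposes.
        if len(id_bytes) != len(NIL_BYTES):
            raise ValueError("an ID must be exactly 20 bytes long")
        self._id = bytes(id_bytes)

    def id(self):
        """Return the raw bytes of this ID."""
        return self._id

    def is_nil(self):
        """Replace ad hoc comparisons of raw bytes against a NIL constant."""
        return self._id == NIL_BYTES


object_id = ObjectID(b"\x01" * 20)

# 'id' variables hold ObjectID instances; bytes appear only as hash keys.
table = {object_id.id(): "some value"}
```

With this shape, callers ask object_id.is_nil() instead of comparing bytes, and only dictionary lookups touch the raw bytes.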

Related issue number

N/A

AmplabJenkins · 2019-01-02T09:39:26Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10541/
Test PASSed.

pcmoritz · 2019-01-02T21:24:54Z

The changes regarding NIL ids and the error class renaming look good to me!

For the random object ID generation, we should make sure that from_random is fork-safe if we expose it to the Python side. See the discussion in apache/arrow#2400 and the following example, where every forked child prints the same ID:

import ray
import multiprocessing as mp
def child(): print(ray.ObjectID.from_random())
for i in range(4): mp.Process(target=child).start()

ObjectID(fe662239bebfa676b7c37896fbe31e8548273ef1)
ObjectID(fe662239bebfa676b7c37896fbe31e8548273ef1)
ObjectID(fe662239bebfa676b7c37896fbe31e8548273ef1)
ObjectID(fe662239bebfa676b7c37896fbe31e8548273ef1)
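
The effect shown above can be reproduced without Ray. The following is a sketch under stated assumptions: it is POSIX-only (it uses os.fork), the helper names are hypothetical, and a seeded random.Random instance stands in for whatever generator backs from_random. Generator state that lives in process memory is copied by fork, so each child draws the same ID, whereas asking the operating system for randomness on each call avoids the problem.

```python
import os
import random

ID_SIZE = 20  # bytes, i.e. the 40 hex characters printed in the output above

# A user-space generator: its state lives in process memory, so fork() copies it.
_rng = random.Random(12345)


def unsafe_random_id():
    """Draw an ID from the inherited generator state (not fork-safe)."""
    return bytes(_rng.getrandbits(8) for _ in range(ID_SIZE))


def fork_safe_random_id():
    """Ask the OS for fresh randomness on every call."""
    return os.urandom(ID_SIZE)


def ids_from_forked_children(make_id, num_children=4):
    """Fork num_children processes and return the hex ID each one generates."""
    hex_ids = []
    for _ in range(num_children):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: write one ID to the pipe, then exit without cleanup.
            try:
                os.close(read_fd)
                os.write(write_fd, make_id().hex().encode("ascii"))
            finally:
                os._exit(0)
        # Parent: read the child's ID until EOF, then reap the child.
        os.close(write_fd)
        with os.fdopen(read_fd, "rb") as reader:
            hex_ids.append(reader.read().decode("ascii"))
        os.waitpid(pid, 0)
    return hex_ids


if __name__ == "__main__":
    print(ids_from_forked_children(unsafe_random_id))     # four identical IDs
    print(ids_from_forked_children(fork_safe_random_id))  # four different IDs
```

Reseeding the generator in the child after fork would be another option; os.urandom is used here only because it keeps the sketch short.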

Pickling object_ids is a double-edged sword. It can be very convenient for users, but it can also be overused and make fault tolerance harder. I'd say we shouldn't do it for now and let users explicitly call .id() if they need to, to make sure they understand something potentially dangerous is going on.

python/ray/actor.py

python/ray/function_manager.py

python/ray/ray_constants.py

src/ray/raylet/lib/python/common_extension.cc

stephanie-wang · 2019-01-03T00:28:10Z

src/ray/raylet/lib/python/common_extension.cc

-    {"__reduce__", (PyCFunction)PyObjectID___reduce__, METH_NOARGS,
-     "Say how to pickle this ObjectID. This raises an exception to prevent"
-     "object IDs from being serialized."},
+    {"__reduce__", (PyCFunction)PyObjectID___reduce__, METH_VARARGS,


Hmm, I think there is actually a good reason to not allow object IDs to be pickled, but I'm not exactly sure what. @robertnishihara?

So, my concern was that people would define remote functions that captured object IDs and that most of the time this happened it would be an accident.

I'm not really sure how much this kind of error would occur, since I haven't seen too many people complaining about object IDs not being pickleable on GitHub.

It does force us to do some ugly stuff to make actor handles pickleable (since actor handles include a bunch of object IDs).

I could go either way on this one. @guoyuhong what were your reasons for making them pickleable? Is it to simplify the actor handle code?

I see there is #1317.
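
The trade-off discussed in this thread can be sketched in plain Python. Both classes below are hypothetical stand-ins rather than Ray's C extension types: the first mirrors the old behavior, where __reduce__ raises so that an ID cannot be serialized by accident, and the second mirrors the proposed behavior, where __reduce__ tells pickle how to rebuild the ID from its bytes.

```python
import pickle


class UnpicklableID:
    """Stand-in for the old behavior: pickling raises an exception."""

    def __init__(self, id_bytes):
        self._id = bytes(id_bytes)

    def id(self):
        return self._id

    def __eq__(self, other):
        return type(other) is type(self) and other._id == self._id

    def __hash__(self):
        return hash(self._id)

    def __reduce__(self):
        # Refusing here catches IDs that are captured by accident.
        raise TypeError("object IDs cannot be serialized; call .id() explicitly")


class PicklableID(UnpicklableID):
    """Stand-in for the proposed behavior: pickling round-trips the ID."""

    def __reduce__(self):
        # Tell pickle to rebuild the object by calling PicklableID(self._id).
        return (type(self), (self._id,))


raw = b"\x01" * 20

try:
    pickle.dumps(UnpicklableID(raw))
    accidental_pickle_blocked = False
except TypeError:
    accidental_pickle_blocked = True

# The explicit route mentioned in the thread: ship the bytes, rebuild the ID.
explicit_copy = UnpicklableID(pickle.loads(pickle.dumps(UnpicklableID(raw).id())))

# The convenient route: the ID itself survives a pickle round trip.
convenient_copy = pickle.loads(pickle.dumps(PicklableID(raw)))
```

The first class forces users through .id(), which makes the serialization visible in their code; the second removes that friction, which is the behavior the actor handle code would benefit from.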

robertnishihara · 2019-01-03T06:09:51Z

Thanks @guoyuhong!

Similar to @raulchen's comment in #3564 (comment), I think I prefer ray.ObjectID.nil() to ray.ObjectID() because it is explicit and it's clear how it relates to object_id.is_nil() (it's also more parallel to ray.ObjectID.from_random()). What do you think about that?

Also, in the future, instead of using just ObjectID everywhere, we should probably have equivalent classes such as ActorID and ActorHandleID. Similarly, in C++ we should make these actually different classes instead of typedefs. This comment is not relevant for this PR though.

raulchen · 2019-01-04T03:18:22Z

@pcmoritz @robertnishihara Do you remember what particular issue(s) make fault-tolerance harder?
Some of my colleagues complained about ObjectID not being pickle-able. I'd like to understand the problems and see if they can be resolved. Thanks.

python/ray/worker.py

src/ray/raylet/lib/python/common_extension.cc

raulchen · 2019-01-04T03:47:15Z

src/ray/raylet/lib/python/raylet_extension.cc

@@ -487,7 +487,7 @@ MOD_INIT(libraylet_library_python) {
  char common_error[] = "common.error";
  CommonError = PyErr_NewException(common_error, NULL, NULL);
  Py_INCREF(CommonError);
-  PyModule_AddObject(m, "common_error", CommonError);
+  PyModule_AddObject(m, "CommonError", CommonError);


This comment doesn't need to be addressed in this PR.
We can define this CommonError in Python code and import it to the C extension. That would simplify the code. Also, the name CommonError sounds ambiguous to me. We should use more specific exception types depending on the concrete cases.

I will temporarily change the name to RayCommonError. I think this part will be easier after @suquark's Cython change.

pcmoritz · 2019-01-04T05:00:22Z

@raulchen: If we make ObjectIDs picklable, they can enter tasks by being pickled and read through objects, or even by being closed over, even with the official API (people can obviously already do this by calling .id()). At the moment, they can only be made available to tasks or actors if they are passed into tasks/actors or through task submissions.

I'm not saying this is necessarily a bad thing and I'm happy to try it out, but we should look out for possible future problems (e.g. if we want to do more precise reference counting etc.). Once this kind of functionality is granted to the users, it cannot be taken away any more.

raulchen · 2019-01-04T09:00:39Z

> @raulchen: If we make ObjectIDs picklable, they can enter tasks by being pickled and read through objects, or even by being closed over, even with the official API (people can obviously already do this by calling .id()). At the moment, they can only be made available to tasks or actors if they are passed into tasks/actors or through task submissions.
>
> I'm not saying this is necessarily a bad thing and I'm happy to try it out, but we should look out for possible future problems (e.g. if we want to do more precise reference counting etc.). Once this kind of functionality is granted to the users, it cannot be taken away any more.

Thanks. If we don't see any potential issues with allowing ObjectID to be picklable, I'd prefer to give it a try.

robertnishihara · 2019-01-07T22:23:52Z

@raulchen @pcmoritz that sounds fine, let's give it a try.

suquark · 2019-01-11T04:08:21Z

Any progress in this PR? I am considering closing it because ray.ObjectID has been implemented in #3541.

guoyuhong · 2019-01-11T05:44:33Z

@suquark Thanks for the reminder. I will finish this PR. The Python part will not conflict with your PR, and test_object_id_properties can also test the Cython code for ObjectID.

AmplabJenkins · 2019-01-11T06:26:39Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10761/
Test FAILed.

AmplabJenkins · 2019-01-11T06:39:58Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10762/
Test FAILed.

guoyuhong · 2019-01-11T07:43:09Z

I have updated the PR.

Change CommonError to RayCommonError.
Remove from_random and keep the old approach, since there are some concerns about it.
Use from ray import ObjectID in worker.py and actor.py, which use ObjectID frequently.
Use ray.ObjectID.nil_id() instead of ray.ObjectID() to generate the nil ID.
I also removed .id() from all hash keys.
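
The revised shape described in the list above might look roughly like the sketch below. It is a hypothetical pure-Python stand-in, not Ray's implementation, and the nil value of 20 bytes of 0xff is an assumption. The point is that an explicit nil_id() factory replaces the default constructor, and that defining __eq__ and __hash__ lets an ID serve as a dictionary key directly, so callers no longer need .id() for hashing.

```python
import binascii


class ObjectID:
    """Hypothetical stand-in for ray.ObjectID (not Ray's implementation)."""

    SIZE = 20  # assumed ID length in bytes

    def __init__(self, id_bytes):
        if len(id_bytes) != self.SIZE:
            raise ValueError("an ID must be exactly {} bytes long".format(self.SIZE))
        self._id = bytes(id_bytes)

    @classmethod
    def nil_id(cls):
        """Explicit factory for the nil ID, replacing the default constructor."""
        return cls(b"\xff" * cls.SIZE)  # assumed nil value

    def is_nil(self):
        return self == type(self).nil_id()

    def id(self):
        """Return the raw bytes of this ID."""
        return self._id

    def hex(self):
        """Return the ID as a hex string."""
        return binascii.hexlify(self._id).decode("ascii")

    def __eq__(self, other):
        return isinstance(other, ObjectID) and other._id == self._id

    def __hash__(self):
        return hash(self._id)


# IDs can be used as hash keys directly, without calling .id().
object_id = ObjectID(b"\x01" * 20)
table = {object_id: "some value"}
```

A lookup with an equal but distinct instance, such as table[ObjectID(b"\x01" * 20)], finds the same entry because equality and hashing are both defined on the underlying bytes.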

AmplabJenkins · 2019-01-11T08:04:29Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10767/
Test FAILed.

AmplabJenkins · 2019-01-11T11:59:42Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10769/
Test PASSed.

AmplabJenkins · 2019-01-11T17:15:06Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10772/
Test PASSed.

stephanie-wang

Looks good! I left a minor comment and will approve once that and the question about pickling ObjectIDs are addressed.

stephanie-wang · 2019-01-11T19:08:43Z

python/ray/utils.py

@@ -70,7 +70,7 @@ def push_error_to_driver(worker,
            will be serialized with json and stored in Redis.
    """
    if driver_id is None:
-        driver_id = ray_constants.NIL_JOB_ID.id()
+        driver_id = ray.ObjectID.nil_id().id()


It looks like we can change driver_id to be a ray.ObjectID instead of handling raw bytes.

@stephanie-wang I have changed driver_id to ray.ObjectID. As for the question about pickling ObjectID, @raulchen discussed it with @robertnishihara, and we decided to give it a try. It works fine in the current Jenkins and Travis tests. We need to keep monitoring it to see whether users run into problems or whether making it picklable leads to hard-to-debug bugs.

AmplabJenkins · 2019-01-12T13:17:13Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10790/
Test PASSed.

AmplabJenkins · 2019-01-12T14:19:17Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10791/
Test PASSed.

AmplabJenkins · 2019-01-12T15:43:48Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10794/
Test FAILed.

guoyuhong · 2019-01-13T01:28:49Z

@AmplabJenkins retest this, please.

pcmoritz · 2019-01-13T01:43:31Z

test/runtest.py

@@ -2310,12 +2310,14 @@ def test_global_state_api(shutdown_only):
    assert len(task_table) == 1
    assert driver_task_id == list(task_table.keys())[0]
    task_spec = task_table[driver_task_id]["TaskSpec"]
+    nil_id_hex = ray.experimental.state.binary_to_hex(


nit: you can remove this and just use "ray.ObjectID.nil_id().hex()" below

pcmoritz

Small nit, rest LGTM

AmplabJenkins · 2019-01-13T04:10:37Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10816/
Test PASSed.

AmplabJenkins · 2019-01-13T05:01:46Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10817/
Test PASSed.

AmplabJenkins · 2019-01-13T09:31:25Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10820/
Test PASSed.

guoyuhong requested review from stephanie-wang, robertnishihara, raulchen and jovany-wang and removed request for stephanie-wang January 2, 2019 07:38

This was referenced Jan 2, 2019

Fix multi-thread problem of function manager and Jenkins test #3648

Merged

Segfault when putting object whose class closes over an ObjectID. #1317

Closed

guoyuhong requested review from ericl and richardliaw January 2, 2019 07:59

pcmoritz reviewed Jan 2, 2019

python/ray/actor.py (outdated, resolved)

pcmoritz reviewed Jan 2, 2019

python/ray/function_manager.py (outdated, resolved)

stephanie-wang reviewed Jan 3, 2019

raulchen reviewed Jan 4, 2019

robertnishihara mentioned this pull request Jan 7, 2019

Raylet events subscription support #3557

Closed

suquark added this to In progress in Code quality & design Jan 8, 2019

suquark moved this from In progress to Needs review in Code quality & design Jan 8, 2019

suquark assigned guoyuhong Jan 9, 2019

guoyuhong force-pushed the refineObjectID branch 2 times, most recently from d096764 to 006b43e Compare January 11, 2019 06:17

guoyuhong force-pushed the refineObjectID branch from 88feeb1 to f0e7761 Compare January 11, 2019 09:28

stephanie-wang reviewed Jan 11, 2019

guoyuhong added 4 commits January 12, 2019 19:02

Refactor code about ray.ObjectID.

fc8aa44

remove from_random and use nil_id instead of constructor

cde751d

remove id() in hash

4e30466

Lint and fix

fa3321c

guoyuhong force-pushed the refineObjectID branch 2 times, most recently from 096e342 to 83dbc99 Compare January 12, 2019 12:12

Change driver id to ObjectID

b172316

guoyuhong force-pushed the refineObjectID branch from 83dbc99 to b172316 Compare January 12, 2019 14:29

pcmoritz reviewed Jan 13, 2019

pcmoritz approved these changes Jan 13, 2019

Replace binary_to_hex(ObjectID.id()) to ObjectID.hex()

3df7c59

Merge branch 'master' into refineObjectID

828dd45

pcmoritz merged commit d2cf856 into ray-project:master Jan 13, 2019

Code quality & design automation moved this from Needs review to Done Jan 13, 2019

guoyuhong deleted the refineObjectID branch January 25, 2019 10:08

guoyuhong mentioned this pull request May 17, 2019

Refactor ID Serial 1: Separate ObjectID and TaskID from UniqueID #4776

Merged

4 tasks
