[Data] Propagate driver `DataContext` to `RayTrainWorkers` #40116

scottjlee · 2023-10-04T19:29:26Z

Why are these changes needed?

This is a second attempt at #39698, which was found to be incompatible with RLlib Learner classes. In this PR, the logic for passing the driver's DataContext to the workers lives in the BackendExecutor rather than in the RayTrainWorker as before.

Related issue number

Closes #39237
Previous PR: #39698

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
  - I've added any new APIs to the API Reference. For example, if I added a
    method in Tune, I've added it in doc/source/tune/api/ under the
    corresponding .rst file.
- I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

Signed-off-by: Scott Lee <sjl@anyscale.com>

scottjlee · 2023-10-06T21:59:36Z

CI run with ML / RL tests passing: https://buildkite.com/ray-project/oss-ci-build-pr/builds/37993

I'm now going to revert the manual enabling of the RL tests trigger.

Signed-off-by: Scott Lee <sjl@anyscale.com>

woshiyyya

Thanks @scottjlee, this solution is cleaner than the previous one.

Also, can you elaborate more on the RLLib Learner issue?

scottjlee · 2023-10-06T22:50:10Z

> Thanks @scottjlee, this solution is cleaner than the previous one.
>
> Also, can you elaborate more on the RLLib Learner issue?

Yeah, the previous implementation added a new parameter to RayTrainWorker.__init__(), which was incompatible with Learner, since Learner doesn't have this extra parameter (and we do not want to change this RLlib API).
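To make the incompatibility concrete, here is a minimal sketch in plain Python (no Ray required). Every name in it (`FakeContext`, `TrainWorker`, `LearnerLikeWorker`, `WorkerGroup`, `start_workers`, `_set_context`) is a hypothetical stand-in chosen for illustration, not Ray's actual implementation. It only shows the general pattern: passing the context through the constructor breaks any worker class with a fixed `__init__` signature, while setting it through an `execute()` call after construction does not.

```python
from dataclasses import dataclass
from typing import List, Type


@dataclass
class FakeContext:
    """Hypothetical stand-in for a per-process settings object."""
    target_block_size: int = 128


class TrainWorker:
    """Hypothetical worker; each one starts with default settings."""

    def __init__(self):
        self.context = FakeContext()

    def execute(self, fn, *args):
        # Run an arbitrary function against this worker's state.
        return fn(self, *args)


class LearnerLikeWorker(TrainWorker):
    """A worker class whose constructor signature must not change."""

    def __init__(self):
        super().__init__()


class WorkerGroup:
    """Hypothetical group that builds workers and fans out calls."""

    def __init__(self, worker_cls: Type[TrainWorker], num_workers: int, **init_kwargs):
        self.workers: List[TrainWorker] = [
            worker_cls(**init_kwargs) for _ in range(num_workers)
        ]

    def execute(self, fn, *args):
        return [w.execute(fn, *args) for w in self.workers]


def _set_context(worker: TrainWorker, ctx: FakeContext) -> None:
    # Overwrite the worker's default settings with the driver's settings.
    worker.context = ctx


def start_workers(worker_cls, num_workers: int, driver_ctx: FakeContext) -> WorkerGroup:
    # Construct workers with their unmodified signature, then push the
    # driver's context to them in a separate execute() step.
    group = WorkerGroup(worker_cls, num_workers)
    group.execute(_set_context, driver_ctx)
    return group
```

In this sketch, `WorkerGroup(LearnerLikeWorker, 1, data_context=ctx)` raises a `TypeError` because the constructor accepts no such argument, whereas `start_workers(LearnerLikeWorker, 2, ctx)` succeeds and every worker ends up with the driver's settings.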

matthewdeng · 2023-10-11T17:16:43Z

python/ray/train/tests/test_backend.py

+# TODO(@justinvyu: fix test and/or deprecate relevant code path)
+@pytest.mark.skip("Mocked execute_async doesn't work as intended")


Is this intentional as part of this PR?

scottjlee · 2023-10-11

Yeah. I paired with @justinvyu on this for some time, and we concluded that the mocking inside the test may need to be updated to be compatible with the fix in this PR, but we couldn't figure out how. I think @justinvyu said he can come back later to fix or remove the test; I'll let him elaborate.

scottjlee added 9 commits October 4, 2023 12:27

move logic to BackendExecutor

e1c4da9

Signed-off-by: Scott Lee <sjl@anyscale.com>

null param

7f9a24f

Signed-off-by: Scott Lee <sjl@anyscale.com>

Merge branch 'master' into 1004-propagate-context

adfb73b

Signed-off-by: Scott Lee <sjl@anyscale.com>

update kwarg usage

6a5bfd7

Signed-off-by: Scott Lee <sjl@anyscale.com>

only include when class is provided

9df9974

Signed-off-by: Scott Lee <sjl@anyscale.com>

move logic to worker_group.execute()

3e7f100

Signed-off-by: Scott Lee <sjl@anyscale.com>

Merge branch 'master' into 1004-propagate-context

e78c71e

Signed-off-by: Scott Lee <sjl@anyscale.com>

remove from trainworker

f33b523

Signed-off-by: Scott Lee <sjl@anyscale.com>

tests

50606f7

Signed-off-by: Scott Lee <sjl@anyscale.com>

scottjlee added 2 commits October 6, 2023 15:07

Merge branch 'master' into 1004-propagate-context

4aa87c1

Signed-off-by: Scott Lee <sjl@anyscale.com>

disable manual rllib test trigger

e73a8d1

Signed-off-by: Scott Lee <sjl@anyscale.com>

scottjlee marked this pull request as ready for review October 6, 2023 22:09

scottjlee assigned justinvyu and woshiyyya Oct 6, 2023

woshiyyya approved these changes Oct 6, 2023

matthewdeng approved these changes Oct 11, 2023

matthewdeng merged commit 9d35273 into ray-project:master Oct 11, 2023
38 of 41 checks passed

matthewdeng mentioned this pull request Nov 29, 2023

[data] Fix the issue that DataContext is not propagated when using streaming_split #41473

Merged

raulchen mentioned this pull request Dec 6, 2023

[data][bug] Dataset.context not being sealed after creation #41573

Open
