Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Data] Propagate driver DataContext to RayTrainWorkers #40116

Merged
merged 11 commits into from
Oct 11, 2023

Conversation

scottjlee
Copy link
Contributor

@scottjlee scottjlee commented Oct 4, 2023

Why are these changes needed?

Second attempt on #39698, which was found to be incompatible with RLLib Learner classes. In this PR, we instead move the logic of passing the driver's DataContext into the BackendExecutor, instead of the RayTrainWorker as previously.

Related issue number

Closes #39237
Previous PR: #39698

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
@scottjlee
Copy link
Contributor Author

scottjlee commented Oct 6, 2023

CI run with ML / RL tests passing: https://buildkite.com/ray-project/oss-ci-build-pr/builds/37993

Going to now revert manual enabling the RL tests trigger.

Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
@scottjlee scottjlee marked this pull request as ready for review October 6, 2023 22:09
Copy link
Member

@woshiyyya woshiyyya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @scottjlee, this solution is cleaner than the previous one.

Also, can you elaborate more on the RLLib Learner issue?

@scottjlee
Copy link
Contributor Author

Thanks @scottjlee, this solution is cleaner than the previous one.

Also, can you elaborate more on the RLLib Learner issue?

Yeah, the previous implementation, which added a new parameter into RayTrainWorker.__init__(), was incompatible with Learner which doesn't have this extra parameter (and we do not want to change this RLLib API).

Comment on lines +239 to +240
# TODO(@justinvyu: fix test and/or deprecate relevant code path)
@pytest.mark.skip("Mocked execute_async doesn't work as intended")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intentional as part of this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, paired with @justinvyu on this for some time, and we came to the conclusion that the mocking inside the test may need to be updated to be compatible with the fix in this PR, but we couldn't figure it out. I think @justinvyu said he can come back in the future to fix or remove the test, will also let him elaborate

@matthewdeng matthewdeng merged commit 9d35273 into ray-project:master Oct 11, 2023
38 of 41 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[data] datacontext setting does not work for preprocessor
4 participants