-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[Train] Colocate Trainer and rank 0 worker (#43115)
This PR automatically merge the trainer bundle with the rank 0 worker bundle, so that the trainer and rank 0 worker can always colocate on the same node. ### Benefits: - Enables users to specify additional resources for rank 0 worker. - Always colocate trainers and rank 0 workers together to make the scheduling behavior deterministic. ### Major changes: #### 1. Merge trainer bundle and the first worker bundle. Specifically, we build a placement groups with bundles `[{}, {trainer+worker}, {worker}, ..., {worker}]`, and schedule the `TrainTrainable` with the first non-empty bundle. When assigning worker ranks, we designate the worker with the smallest GPU ID on the same node as the trainer to be rank 0. #### 2. Set `num_workers=1` by default in `ScalingConfig`. Previously, setting `num_workers` to `None` resulted launching a single `TrainTrainable` with zero workers. It no longer applies to the current Ray Train, as all Trainers now require at least one worker to execute the `train_func`. Additionally, this approach led to undefined behaviors during the merging and separation of the first bundle. To ensure the consistent behavior, we have now set the default value of `num_workers` to 1. #### 3. Forbid using `ScalingConfig` with `tune.with_resources`. `ScalingConfig` should be a Ray Train only utility and it's should not be used for Tune Trainables. For example, it doesn't make sense to provide ScalingConfig for a function trainable, since there's no trainer and worker concepts for it. Passed Release Test:https://buildkite.com/ray-project/release/builds/9650#018dee6e-e3ce-4376-9f3d-5ad7e250e513 ## Related PRs: The below two PRs enabled that the actors with empty resources can be launched on the node of a specific bundle in placement group. - #43269 - #43448 Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com> Signed-off-by: Yunxuan Xiao <xiaoyunxuan1998@gmail.com>
- Loading branch information
Showing
13 changed files
with
186 additions
and
138 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.