[core] Support spill objects to node_id sub-directory #44487
Merged
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
jjyao approved these changes on Apr 5, 2024.
jjyao pushed a commit that referenced this pull request on Apr 12, 2024:
In #44487, node_id is obtained from an RPC call, which adds overhead. This change instead passes node_id down from the raylet when default workers are created, avoiding that overhead. Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
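A minimal sketch of the idea in that follow-up commit, under stated assumptions: the node_id is handed to a worker when it is created, and an RPC lookup is only a fallback for workers that did not receive one (the fallback path is an assumption, not something the commit message states). `resolve_node_id` and `fake_rpc` are hypothetical names, not Ray APIs.

```python
from typing import Callable, List, Optional


def resolve_node_id(passed_node_id: Optional[str],
                    fetch_via_rpc: Callable[[], str]) -> str:
    """Prefer a node_id handed down at worker creation (hypothetical helper).

    Only when no node_id was passed do we pay for the RPC lookup.
    """
    if passed_node_id:
        return passed_node_id
    return fetch_via_rpc()


# Stand-in for the RPC; records each call so the saved round trips are visible.
calls: List[int] = []


def fake_rpc() -> str:
    calls.append(1)
    return "node-from-rpc"


# A default worker created with a node_id from the raylet: no RPC is made.
default_worker_id = resolve_node_id("node-from-raylet", fake_rpc)
# A worker without a passed node_id falls back to the RPC lookup.
other_worker_id = resolve_node_id(None, fake_rpc)
print(default_worker_id, other_worker_id, len(calls))
```

Only the second call reaches `fake_rpc`, which is the overhead the commit avoids for default workers.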
The same commit was later pushed to the forks harborn/ray (by harborn, Apr 18, 2024) and ryanaoleary/ray (by ryanaoleary, Jun 7, 2024).
Why are these changes needed?
When the system config is used to specify the spilling location, only one directory can be given, and it is the same for all workers. When that directory is on a shared NFS mount, spill files from different nodes can conflict.
https://docs.ray.io/en/latest/ray-core/objects/object-spilling.html#cluster-mode
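For reference, a sketch of the single-directory setting, assuming the filesystem spilling config format shown in the linked docs (`object_spilling_config` with `type` and `params.directory_path`, JSON-encoded as a string inside the system config). `directory_seen_by_node` is a hypothetical helper that decodes the flag the way any node would; every node started with the same flag ends up with the same `directory_path`, which is why a shared NFS path is contended.

```python
import json

# Filesystem spilling config with one directory, as in the cluster-mode docs.
spill_config = {"type": "filesystem", "params": {"directory_path": "/tmp/spill"}}

# object_spilling_config is itself a JSON-encoded string inside the system config.
system_config = {"object_spilling_config": json.dumps(spill_config)}
flag = "--system-config=" + json.dumps(system_config)
print(flag)  # needs shell quoting when passed to `ray start`


def directory_seen_by_node(cli_flag: str) -> str:
    """Decode the flag and return the spill directory (hypothetical helper)."""
    outer = json.loads(cli_flag.split("=", 1)[1])
    inner = json.loads(outer["object_spilling_config"])
    return inner["params"]["directory_path"]


# Head and worker nodes receive the identical flag, hence the identical directory.
head_dir = directory_seen_by_node(flag)
worker_dir = directory_seen_by_node(flag)
```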
This PR adds support for using <NODE_ID> as part of the spilling directory name to avoid these conflicts. Specifically, the spilling directory is set to /tmp/spill/ray_spilled_objects_<NODE_ID> when the configured directory is /tmp/spill. Note that this change is not backward compatible, since it changes the spill directory name. We preferred this over introducing an additional config argument (an API change) to control whether <NODE_ID> is included in the directory name.
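A minimal sketch of the naming scheme described above, assuming the per-node directory is simply the configured base directory joined with `ray_spilled_objects_<NODE_ID>`. `node_spill_dir` and the node IDs `node-a`/`node-b` are hypothetical placeholders; real Ray node IDs are long hex strings.

```python
import os

# Prefix taken from the example path in the PR description.
SPILL_DIR_PREFIX = "ray_spilled_objects"


def node_spill_dir(base_dir: str, node_id: str) -> str:
    """Return this node's spill sub-directory under a shared base (hypothetical helper)."""
    return os.path.join(base_dir, f"{SPILL_DIR_PREFIX}_{node_id}")


# Two nodes configured with the same NFS-backed base directory
# now resolve to disjoint sub-directories.
a = node_spill_dir("/tmp/spill", "node-a")
b = node_spill_dir("/tmp/spill", "node-b")
print(a)
print(b)
```

Embedding the node ID in the name needs no extra config flag, which matches the trade-off stated above: the directory name changes, but the API does not.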
Note that after this change, Ray no longer cleans up the spill directory at start-up (since it has no knowledge of the previous node ID) and instead leaves this to external systems, such as VM clean-up. As future work, functionality should be added to clean up the spill directory at node shutdown.
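One way the future-work item could look, as a sketch only: at shutdown a node still knows its own ID, so it can remove exactly its own sub-directory and leave other nodes' directories on the shared mount untouched. `cleanup_node_spill_dir` is hypothetical and not part of Ray; the temporary directory stands in for the shared spill location.

```python
import os
import shutil
import tempfile

SPILL_DIR_PREFIX = "ray_spilled_objects"


def cleanup_node_spill_dir(base_dir: str, node_id: str) -> bool:
    """Remove only this node's spill sub-directory (hypothetical helper).

    Returns True if a directory was removed, False if none existed.
    """
    path = os.path.join(base_dir, f"{SPILL_DIR_PREFIX}_{node_id}")
    if not os.path.isdir(path):
        return False
    shutil.rmtree(path)
    return True


with tempfile.TemporaryDirectory() as base:
    # Simulate two nodes that spilled into the same shared base directory.
    for nid in ("node-a", "node-b"):
        os.makedirs(os.path.join(base, f"{SPILL_DIR_PREFIX}_{nid}"))
    removed = cleanup_node_spill_dir(base, "node-a")        # node-a shuts down
    removed_again = cleanup_node_spill_dir(base, "node-a")  # nothing left to remove
    remaining = sorted(os.listdir(base))

print(removed, removed_again, remaining)
```

Doing this at start-up instead would not work, because a restarted node has a new ID and cannot tell which leftover directory was its own.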
Related issue number
Closes #44206
Fixes postmerge errors in #44341
Checks
I've signed off every commit (git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
If I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.