[core] Lazily subscribe to node changes from workers #51718
dayshah wants to merge 9 commits into ray-project:master from
Signed-off-by: dayshah <dhyey2019@gmail.com>
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
Why are these changes needed?
This should significantly increase the number of nodes and workers the GCS can support. The GCS main thread spends a lot of time serving GetAllNodeInfo requests, and a worker issues one of these requests every time it subscribes to node changes, in order to fetch the current state of all nodes. Not every worker needs to know the state of all other nodes: only workers that are "owners" of objects or that submit tasks do, and very few workers are either. In most cases that is just the driver, which is why this change still subscribes eagerly when the worker is the driver.
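To make the eager-versus-lazy difference concrete, here is a minimal Python sketch under a simplified, in-process model. FakeGcs, LazyNodeView, get_all_node_info, subscribe_node_changes, publish and is_node_alive are all hypothetical names chosen for illustration; they are not Ray's C++ core worker or GCS client API. The point it shows: a worker only subscribes, and only pays for the full node-table snapshot, the first time it actually needs node state, while the driver still subscribes at startup.

```python
import threading
from typing import Callable, Dict, List

# Hypothetical, simplified model: none of these names are Ray's real API.


class FakeGcs:
    """Stand-in for the GCS that counts full node-table fetches."""

    def __init__(self, nodes: Dict[str, bool]):
        self.nodes = dict(nodes)  # node_id -> alive
        self.get_all_node_info_calls = 0
        self._subscribers: List[Callable[[str, bool], None]] = []

    def get_all_node_info(self) -> Dict[str, bool]:
        # The expensive request: a snapshot of every node in the cluster.
        self.get_all_node_info_calls += 1
        return dict(self.nodes)

    def subscribe_node_changes(self, callback: Callable[[str, bool], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, node_id: str, alive: bool) -> None:
        # Push a single node-state change to every current subscriber.
        self.nodes[node_id] = alive
        for callback in list(self._subscribers):
            callback(node_id, alive)


class LazyNodeView:
    """Per-worker node cache that subscribes only when first needed."""

    def __init__(self, gcs: FakeGcs, is_driver: bool):
        self._gcs = gcs
        self._lock = threading.Lock()
        self._subscribed = False
        self._alive: Dict[str, bool] = {}
        if is_driver:
            # The driver is usually an owner and task submitter,
            # so it subscribes eagerly at startup.
            self._ensure_subscribed()

    def _on_node_change(self, node_id: str, alive: bool) -> None:
        with self._lock:
            self._alive[node_id] = alive

    def _ensure_subscribed(self) -> None:
        # The lock is held for the whole setup, so an update delivered by
        # another thread during setup is applied after the snapshot.
        with self._lock:
            if self._subscribed:
                return
            self._gcs.subscribe_node_changes(self._on_node_change)
            self._alive.update(self._gcs.get_all_node_info())
            self._subscribed = True

    def is_node_alive(self, node_id: str) -> bool:
        # Only workers that ask (e.g. owners, task submitters) pay for
        # the subscription and the full snapshot.
        self._ensure_subscribed()
        with self._lock:
            return self._alive.get(node_id, False)


# Usage: one driver plus many workers, only one of which queries node state.
gcs = FakeGcs({"node-a": True, "node-b": True})
driver = LazyNodeView(gcs, is_driver=True)
workers = [LazyNodeView(gcs, is_driver=False) for _ in range(1000)]
print(gcs.get_all_node_info_calls)  # 1: only the driver has fetched so far

workers[0].is_node_alive("node-a")  # first query triggers the lazy subscription
gcs.publish("node-b", False)
print(gcs.get_all_node_info_calls)  # 2
print(workers[0].is_node_alive("node-b"))  # False
```

In this toy run an eager design would issue one full snapshot per worker at startup (1,001 in total), while the lazy one issues two. That count comes from the toy model only, not from a measurement of the real GCS.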
Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.