[WIP] Kuberay Ray Autoscaler integration #100
Conversation
@@ -13,13 +13,14 @@ RUN go mod download
COPY main.go main.go
COPY api/ api/
COPY controllers/ controllers/
COPY rpc/ rpc/
We may change how the code is organized later. Let's concentrate on the functionality for now and adapt to the new structure afterwards.
@@ -19,7 +19,8 @@ type RayClusterSpec struct {
    // RayVersion is the version of ray being used. this affects the command used to start ray
    RayVersion string `json:"rayVersion,omitempty"`
    // EnableInTreeAutoscaling indicates whether operator should create in tree autoscaling configs
    EnableInTreeAutoscaling *bool `json:"enableInTreeAutoscaling,omitempty"`
    EnableInTreeAutoscaling *bool `json:"enableInTreeAutoscaling,omitempty"`
    DesiredWorkers []string `json:"desiredWorkers,omitempty"`
Would this be a full snapshot of the cluster's workers? I remember the community discussed full vs. delta updates for this in the past. One piece of feedback was that for a large Ray cluster this list would get long.
    ScaleStrategy ScaleStrategy `json:"scaleStrategy,omitempty"`
We defined ScaleStrategy for this purpose in the past. Can that be reused?
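For comparison, a rough sketch of the two shapes being discussed (not the exact KubeRay API; if I remember correctly the existing ScaleStrategy hangs off the worker group spec, and the field names here are only illustrative):

```go
// Sketch only: rough shapes for comparison, not the exact KubeRay types.
package v1alpha1

// Delta-style: the autoscaler only names the workers to remove.
type ScaleStrategy struct {
	WorkersToDelete []string `json:"workersToDelete,omitempty"`
}

// Full-snapshot style (this PR's proposal): the autoscaler declares every
// desired worker, which can become a long list on large clusters.
type RayClusterSpec struct {
	EnableInTreeAutoscaling *bool    `json:"enableInTreeAutoscaling,omitempty"`
	DesiredWorkers          []string `json:"desiredWorkers,omitempty"`
}
```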
/cc @akanso
In the current iteration, I'm just launching worker pods directly from the gRPC server that the autoscaler calls. That's simple and effective and doesn't require schema changes.
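Roughly, the handler does something like the sketch below. This is not the actual proto or code in this PR; the service and message names (AutoscalerServer, CreateWorkerRequest/Reply) and the labels are placeholders, and only the client-go calls are standard:

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// Hypothetical request/response types standing in for the real proto messages.
type CreateWorkerRequest struct {
	Namespace, ClusterName, Image, HeadAddress string
}
type CreateWorkerReply struct {
	PodName string
}

// AutoscalerServer is a hypothetical gRPC service backed by a Kubernetes clientset.
type AutoscalerServer struct {
	kubeClient kubernetes.Interface
}

// CreateWorker launches a worker pod directly, with no changes to the RayCluster CR.
func (s *AutoscalerServer) CreateWorker(ctx context.Context, req *CreateWorkerRequest) (*CreateWorkerReply, error) {
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			GenerateName: req.ClusterName + "-worker-",
			Namespace:    req.Namespace,
			Labels: map[string]string{
				"ray.io/cluster":   req.ClusterName, // label keys are illustrative
				"ray.io/node-type": "worker",
			},
		},
		Spec: corev1.PodSpec{
			RestartPolicy: corev1.RestartPolicyNever,
			Containers: []corev1.Container{{
				Name:    "ray-worker",
				Image:   req.Image,
				Command: []string{"/bin/bash", "-c", "--"},
				Args:    []string{"ulimit -n 65536; ray start --block --address=" + req.HeadAddress},
			}},
		},
	}
	created, err := s.kubeClient.CoreV1().Pods(req.Namespace).Create(ctx, pod, metav1.CreateOptions{})
	if err != nil {
		return nil, err
	}
	return &CreateWorkerReply{PodName: created.Name}, nil
}
```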
        path: code.py
      - key: example.py
        path: example.py
  - name: autoscaler-config
I think you are playing with the configuration here to save some debugging effort on the e2e work? Just want to confirm we are on the same page on the flow. If EnableInTreeAutoscaling is enabled (see the sketch after this list):
- The controller converts the CR into an autoscaling config.
- The controller creates the ConfigMap and mounts it to the head pod automatically.
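A rough sketch of that controller-side flow, assuming controller-runtime. The helper renderAutoscalerConfig, the ConfigMap name/key, and the API import path are illustrative, not the final implementation:

```go
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	rayiov1alpha1 "github.com/ray-project/kuberay/ray-operator/api/v1alpha1" // path approximate
)

// RayClusterReconciler mirrors the real reconciler's shape.
type RayClusterReconciler struct {
	client.Client
	Scheme *runtime.Scheme
}

// renderAutoscalerConfig is hypothetical: it would translate the CR into the
// YAML config the Ray autoscaler expects.
func renderAutoscalerConfig(instance *rayiov1alpha1.RayCluster) (string, error) {
	return "cluster_name: " + instance.Name + "\n", nil // placeholder
}

// reconcileAutoscalerConfig creates the autoscaler ConfigMap when
// EnableInTreeAutoscaling is set; the head pod template would then mount it.
func (r *RayClusterReconciler) reconcileAutoscalerConfig(ctx context.Context, instance *rayiov1alpha1.RayCluster) error {
	if instance.Spec.EnableInTreeAutoscaling == nil || !*instance.Spec.EnableInTreeAutoscaling {
		return nil
	}

	configYAML, err := renderAutoscalerConfig(instance)
	if err != nil {
		return err
	}

	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:      instance.Name + "-autoscaler-config",
			Namespace: instance.Namespace,
		},
		Data: map[string]string{"autoscaler.yaml": configYAML},
	}
	if err := ctrl.SetControllerReference(instance, cm, r.Scheme); err != nil {
		return err
	}
	if err := r.Create(ctx, cm); err != nil && !errors.IsAlreadyExists(err) {
		return err
	}
	// The head pod spec then gets a volume + volumeMount referencing this
	// ConfigMap (added where the controller builds the head pod).
	return nil
}
```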
Yes, this will be integrated into the controller; it is just for testing at the moment.
@@ -252,7 +252,9 @@ func setMissingRayStartParams(rayStartParams map[string]string, nodeType rayiov1
func concatenateContainerCommand(nodeType rayiov1alpha1.RayNodeType, rayStartParams map[string]string) (fullCmd string) {
    switch nodeType {
    case rayiov1alpha1.HeadNode:
        return fmt.Sprintf("ulimit -n 65536; ray start --head %s", convertParamMap(rayStartParams))
        // ray start --head --no-monitor --port=6379 --redis-password=5241590000000000 --dashboard-host=0.0.0.0 --node-ip-address=$MY_POD_IP --node-manager-port=12346 --num-cpus=1 --object-manager-port=12345 --object-store-memory=100000000
        return fmt.Sprintf("ulimit -n 65536; ray start --head --no-monitor --block %s", convertParamMap(rayStartParams))
--block is tracked in #77.
We are now doing this differently, see ray-project/ray#21086
This PR is not functional yet; it is used to track the KubeRay <-> Ray Autoscaler integration prototype.
Why are these changes needed?
Related issue number
Checks