
[WIP] Kuberay Ray Autoscaler integration #100

Closed
pcmoritz wants to merge 8 commits

Conversation

pcmoritz (Collaborator)

This PR is not functional yet; it is used to track the Kuberay <-> Ray Autoscaler integration prototype.

Why are these changes needed?

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@@ -13,13 +13,14 @@ RUN go mod download
COPY main.go main.go
COPY api/ api/
COPY controllers/ controllers/
COPY rpc/ rpc/
Collaborator

We may change how the code is organized later. Let's concentrate on the functionality for now and adapt to any structural changes afterwards.

@@ -19,7 +19,8 @@ type RayClusterSpec struct {
// RayVersion is the version of ray being used. this affects the command used to start ray
RayVersion string `json:"rayVersion,omitempty"`
// EnableInTreeAutoscaling indicates whether operator should create in tree autoscaling configs
EnableInTreeAutoscaling *bool `json:"enableInTreeAutoscaling,omitempty"`
EnableInTreeAutoscaling *bool `json:"enableInTreeAutoscaling,omitempty"`
DesiredWorkers []string `json:"desiredWorkers,omitempty"`
Collaborator

Would this be a full snapshot of the cluster workers? I remember the community had some discussions in the past about whether to use a full list or a delta. Some of the feedback was that for a large Ray cluster this list would get long.

Collaborator

ScaleStrategy ScaleStrategy `json:"scaleStrategy,omitempty"`

We defined ScaleStrategy for this purpose in the past. Can that be reused?
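
For context, a minimal sketch (not the PR's code) of how the existing ScaleStrategy field could carry the autoscaler's intent as a delta instead of a full DesiredWorkers snapshot. The WorkersToDelete field name is an illustrative assumption, not necessarily the repo's exact definition.

// Sketch only: reusing the existing scaleStrategy field as a delta, so the
// autoscaler reports only the workers it wants removed rather than a full
// snapshot of the cluster.
package v1alpha1

// ScaleStrategy describes how the autoscaler wants the worker set to change.
type ScaleStrategy struct {
    // WorkersToDelete is a delta: only the pods the autoscaler wants removed.
    WorkersToDelete []string `json:"workersToDelete,omitempty"`
}

// WorkerGroupSpec (excerpt) already embeds the strategy, so the autoscaler
// could patch this field instead of publishing a cluster-wide DesiredWorkers list.
type WorkerGroupSpec struct {
    Replicas      *int32        `json:"replicas,omitempty"`
    ScaleStrategy ScaleStrategy `json:"scaleStrategy,omitempty"`
}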

Collaborator

/cc @akanso

Collaborator Author

In the current iteration, I'm now just launching worker pods directly in the gRPC server from the autoscaler. That's simple and effective and doesn't need schema changes.
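
For context on the approach described above, a hypothetical sketch (not the actual PR code) of a gRPC server handler that creates a worker pod directly with client-go. The request type, label key, and container command are illustrative assumptions.

// Hypothetical sketch: the gRPC server that the autoscaler talks to launches
// worker pods directly via client-go, with no RayCluster schema changes.
package rpc

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// CreateWorkerRequest is an illustrative request shape, not the PR's proto message.
type CreateWorkerRequest struct {
    ClusterName string
    Namespace   string
    Image       string
}

type server struct {
    client kubernetes.Interface
}

// CreateWorker builds a worker pod spec and submits it to the API server.
func (s *server) CreateWorker(ctx context.Context, req *CreateWorkerRequest) error {
    pod := &corev1.Pod{
        ObjectMeta: metav1.ObjectMeta{
            GenerateName: req.ClusterName + "-worker-",
            // Label key is illustrative; it just ties the pod back to its cluster.
            Labels: map[string]string{"ray.io/cluster": req.ClusterName},
        },
        Spec: corev1.PodSpec{
            Containers: []corev1.Container{{
                Name:    "ray-worker",
                Image:   req.Image,
                Command: []string{"/bin/sh", "-c", "ulimit -n 65536; ray start --block --address=$RAY_HEAD_IP:6379"},
            }},
        },
    }
    _, err := s.client.CoreV1().Pods(req.Namespace).Create(ctx, pod, metav1.CreateOptions{})
    return err
}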

path: code.py
- key: example.py
path: example.py
- name: autoscaler-config
Collaborator

I think you are experimenting with the configuration here to save some debugging effort on the e2e work? Just want to confirm that we are on the same page about the flow (see the sketch after this list).

If EnableInTreeAutoscaling is enabled:

  1. The controller converts the CR into an autoscaling config.
  2. The controller creates the ConfigMap and mounts it into the head pod automatically.
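
A rough sketch of those two steps, assuming the controller uses client-go types; the ConfigMap name, data key, and mount path are illustrative, not taken from the PR.

// Sketch of the flow above: derive an autoscaler config from the RayCluster CR,
// write it into a ConfigMap, and mount that ConfigMap into the head pod.
package controllers

import (
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// buildAutoscalerConfigMap wraps the generated autoscaling YAML in a ConfigMap.
// The config content itself is a placeholder produced elsewhere from the CR.
func buildAutoscalerConfigMap(clusterName, namespace, configYAML string) *corev1.ConfigMap {
    return &corev1.ConfigMap{
        ObjectMeta: metav1.ObjectMeta{
            Name:      clusterName + "-autoscaler-config",
            Namespace: namespace,
        },
        Data: map[string]string{"autoscaler.yaml": configYAML},
    }
}

// headPodVolumes returns the volume and mount that expose the ConfigMap to the
// head container, so the autoscaler can read its config at a known path.
func headPodVolumes(clusterName string) (corev1.Volume, corev1.VolumeMount) {
    vol := corev1.Volume{
        Name: "autoscaler-config",
        VolumeSource: corev1.VolumeSource{
            ConfigMap: &corev1.ConfigMapVolumeSource{
                LocalObjectReference: corev1.LocalObjectReference{Name: clusterName + "-autoscaler-config"},
            },
        },
    }
    mount := corev1.VolumeMount{Name: "autoscaler-config", MountPath: "/etc/ray/autoscaler"}
    return vol, mount
}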

Collaborator Author

Yes, this will be integrated into the controller; it is just here for testing at the moment.

@@ -252,7 +252,9 @@ func setMissingRayStartParams(rayStartParams map[string]string, nodeType rayiov1
func concatenateContainerCommand(nodeType rayiov1alpha1.RayNodeType, rayStartParams map[string]string) (fullCmd string) {
switch nodeType {
case rayiov1alpha1.HeadNode:
return fmt.Sprintf("ulimit -n 65536; ray start --head %s", convertParamMap(rayStartParams))
// ray start --head --no-monitor --port=6379 --redis-password=5241590000000000 --dashboard-host=0.0.0.0 --node-ip-address=$MY_POD_IP --node-manager-port=12346 --num-cpus=1 --object-manager-port=12345 --object-store-memory=100000000
return fmt.Sprintf("ulimit -n 65536; ray start --head --no-monitor --block %s", convertParamMap(rayStartParams))
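
For reference, a hedged sketch of what a convertParamMap helper could look like (the controller's actual implementation may differ); it only illustrates how rayStartParams become the --key=value flags appended to the ray start command above.

// Sketch only: turn rayStartParams into a space-separated flag string.
package controllers

import (
    "fmt"
    "sort"
    "strings"
)

func convertParamMap(rayStartParams map[string]string) string {
    keys := make([]string, 0, len(rayStartParams))
    for k := range rayStartParams {
        keys = append(keys, k)
    }
    sort.Strings(keys) // deterministic flag order for reproducible pod specs
    flags := make([]string, 0, len(keys))
    for _, k := range keys {
        flags = append(flags, fmt.Sprintf("--%s=%s", k, rayStartParams[k]))
    }
    return strings.Join(flags, " ")
}
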
Collaborator

--block is tracked in #77

pcmoritz (Collaborator Author)

We are now doing this differently; see ray-project/ray#21086.

@pcmoritz closed this on Feb 19, 2022
@Jeffwan deleted the kuberay-autoscaler branch on March 14, 2022 at 03:40
@Jeffwan restored the kuberay-autoscaler branch on March 14, 2022 at 03:41
@Jeffwan deleted the kuberay-autoscaler branch on August 18, 2022 at 03:39