
[Bug] Allow zero replica for workers for Helm #968

Merged 2 commits into ray-project:master on Jun 12, 2023

Conversation

ducviet00
Copy link
Contributor

@ducviet00 ducviet00 commented Mar 16, 2023

Why are these changes needed?

We are currently using Ray for compute-heavy tasks on GKE. On initialization, the chart spawns one worker per worker group, which triggers a GKE node scale-up and incurs unnecessary cost.

This happens because of the default function in the template file: Helm treats the integer 0 as an empty value, so {{ 0 | default 1 }} evaluates to 1. The affected template lines are:

minReplicas: {{ $values.minReplicas | default (default 1 $values.miniReplicas) }}

minReplicas: {{ .Values.worker.minReplicas | default (default 1 .Values.worker.miniReplicas) }}

This PR works around the issue by setting the default replica count to zero.
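
For illustration (not part of this PR's diff), here is a minimal sketch of the pitfall and one possible pattern that respects an explicit 0. The field names follow the snippets above; the kindIs-based alternative is only an assumption about how the fallback could be expressed, not necessarily the change made in this PR.

# values.yaml (hypothetical)
worker:
  minReplicas: 0

# Helm's "default" treats 0 as empty, so this renders minReplicas: 1
minReplicas: {{ .Values.worker.minReplicas | default 1 }}

# One possible alternative: only fall back when the value is actually unset (nil)
minReplicas: {{ if kindIs "invalid" .Values.worker.minReplicas }}1{{ else }}{{ .Values.worker.minReplicas }}{{ end }}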

Related issue number

Open #965

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

Copy link
Member

@kevin85421 kevin85421 left a comment


Thank you for the contribution! I am wondering: is there any difference for your use case between disabled: true and replicas: 0?

@ducviet00
Copy link
Contributor Author

ducviet00 commented Mar 17, 2023

Thank you for the contribution! I am wondering: is there any difference for your use case between disabled: true and replicas: 0?

As I understand it, minReplicas: 0 allows scaling the pods down to zero, while disabled: true does not allow the pods to scale up at all.
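
To make the comparison concrete, a hypothetical values.yaml sketch of the two options (field names are assumed from the conversation and the chart snippets above, not taken verbatim from the repo):

# Option A: keep the worker group defined but start with zero pods;
# the autoscaler can still scale it up on demand.
worker:
  replicas: 0
  minReplicas: 0
  maxReplicas: 10   # assumed field for the upper bound

# Option B: disable the worker group entirely;
# no pods are created and none can be scaled up.
worker:
  disabled: true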

@ducviet00
Copy link
Contributor Author

ducviet00 commented Mar 17, 2023

I think setting replicas: 0 as the default is better than replicas: 1 because we shouldn't create a worker replica up front. A worker requests a lot of memory, so an idle one is a waste of resources. The autoscaler will add workers based on the job's resource demand, and minReplicas: 0 allows that while also letting the group scale down to zero when no job is running.

@ducviet00
Copy link
Contributor Author

@kevin85421 Could you review this?

@kevin85421 kevin85421 self-requested a review May 30, 2023 20:19
@kevin85421 kevin85421 self-assigned this May 30, 2023
Copy link
Member

@kevin85421 kevin85421 left a comment


Test this PR manually using this gist.

# Step 0: Replace values.yaml with the gist
# (path: helm-chart/ray-cluster)
helm install ray-cluster .

# Step 1: Try to scale up the cluster
export HEAD_POD=$(kubectl get pods -o custom-columns=POD:metadata.name | grep raycluster-autoscaler-head)
kubectl exec $HEAD_POD -it -c ray-head -- python -c "import ray;ray.init();ray.autoscaler.sdk.request_resources(num_cpus=4)"

# Step 2: The RayCluster will scale from 0 workers to 3 workers.
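
As an optional follow-up check (not part of the original test steps), one could clear the resource request and confirm the group scales back down to zero; this assumes the standard ray.autoscaler.sdk.request_resources behavior, where passing num_cpus=0 cancels the earlier request and the autoscaler removes idle workers after its idle timeout.

# Step 3 (optional): release the request and wait for scale-down to zero
kubectl exec $HEAD_POD -it -c ray-head -- python -c "import ray;ray.init();ray.autoscaler.sdk.request_resources(num_cpus=0)"
kubectl get pods --watch   # worker pods should terminate after the idle timeout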

@kevin85421 kevin85421 merged commit 7ad3acf into ray-project:master Jun 12, 2023
@kevin85421 kevin85421 mentioned this pull request Jun 22, 2023
@yc2984
Copy link

yc2984 commented Jul 21, 2023

@kevin85421 is this available on 0.5.2?

@yc2984
Copy link

yc2984 commented Jul 21, 2023


I see it's only on 0.6.0. Is it stable or still WIP?

lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023