Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[autoscaler][kubernetes] Helm chart #15614

Merged
merged 41 commits into from
May 17, 2021
Merged

Conversation

DmitriGekhtman
Copy link
Contributor

@DmitriGekhtman DmitriGekhtman commented May 3, 2021

Why are these changes needed?

This PR adds a Helm chart to simplify deployment of the Ray operator.
For now, only basic configuration options are exposed -- we could later expose more configuration on user feedback.

UX:
Launching operator and cluster with default settings.
helm -n <namespace> install <release name> ./ray
For custom configuration values one, uses -f <custom-values.yaml> or --set <option>=value.
Can launch just operator, without Ray cluster using --set operatorOnly=true.
Can create a RayCluster resource without the operator using --set clusterOnly=true:
if the operator is already running, this is the way to create multiple ray clusters.

This PR also pins Ray images used in the Ray clusters to 1.3.0
The operator is still set to nightly for now, for better stability. There's comment in the helm chart noting this and the alternatives of using 1.3.0 or a specific Ray master commit.

Related issue number

Closes #12654
Mostly addresses #15222 (still using nightly operator image, for now)
Closes #15652

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@DmitriGekhtman
Copy link
Contributor Author

DmitriGekhtman commented May 3, 2021

Before adding documentation, I could use some feedback on whether the configuration options exposed in values.yaml look sensible.
cc @tgaddair @ijrsvt

Copy link
Contributor

@ijrsvt ijrsvt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DmitriGekhtman Is this file auto-generated or did you write this by hand:
python/ray/autoscaler/kubernetes/operator_configs/helm/ray/crds/cluster_crd.yaml

@yiranwang52
Copy link
Contributor

nice!

@edoakes
Copy link
Contributor

edoakes commented May 3, 2021

@DmitriGekhtman the config options look very reasonable to me as a first cut!

Copy link
Contributor

@tgaddair tgaddair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Just a couple suggestions.

@DmitriGekhtman
Copy link
Contributor Author

@DmitriGekhtman Is this file auto-generated or did you write this by hand:
python/ray/autoscaler/kubernetes/operator_configs/helm/ray/crds/cluster_crd.yaml

This was copied from kubernetes/operator_configs/cluster_crd.yaml.
That file has the schema for a Kubernetes podSpec pasted in which explains the length.
If you fold podConfig.spec the file looks much smaller.
It's an open item to do podSpec validation in a saner way:
#13674

@DmitriGekhtman
Copy link
Contributor Author

I've added the nodeSelector field.
I've also exposed the rayResources field, so that users can signal the presence of custom resources (e.g. gpu type).

Now will add some tests and documentation.

@DmitriGekhtman
Copy link
Contributor Author

Not sure about the ideal protocol for versioning and distributing this chart.
Will solve that later -- for now it will be available by cloning Ray master.

@DmitriGekhtman
Copy link
Contributor Author

docs written -- this should be good for a final review.
@richardliaw @javi-redondo could you guys glance at the docs?

@DmitriGekhtman
Copy link
Contributor Author

Copy link
Contributor

@ijrsvt ijrsvt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me!

@DmitriGekhtman
Copy link
Contributor Author

@ijrsvt It's good to merge
-- main thing was to get +1s onvalues.yaml

@ijrsvt ijrsvt merged commit 95c3d88 into ray-project:master May 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants