
[kuberay][autoscaler] Update KubeRay version to v1.0.0 #40918

Merged

Conversation

@kevin85421 kevin85421 (Member) commented Nov 2, 2023

Why are these changes needed?

  • Create a YAML file ray-cluster.autoscaler-template.yaml for testing instead of using the YAML file ray-cluster.autoscaler.yaml from the KubeRay repository.

    • Note that the tests assume that both head and worker Pods have exactly 1 CPU. Hence, if we set num-cpus: "0" in the head's rayStartParams, the current test logic would not work (see the first sketch after this list).
  • Why did I remove the test for "Confirming that the operator and autoscaler ignore pods marked for termination"?

    • KubeRay tries to maintain the desired number of runningPods, but the definition of runningPods differs between KubeRay versions (see the second sketch after this list).
      • Definition 1: For KubeRay v0.6.0 and older, runningPods are the Pods that are running or pending and not terminating.
      • Definition 2: For KubeRay v1.0.0, runningPods are the Pods whose Ray containers have not actually terminated. See [GCS FT] Consider the case of sidecar containers kuberay#1386 for more details.
      • That is, under definition 1, KubeRay may create new Pods while some Pods are still terminating. Hence, it is possible to have more than maxReplicas Pods and Ray nodes from both the Kubernetes and Ray perspectives. Under definition 2, KubeRay only creates new Pods once the Ray nodes have actually terminated.
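
First sketch: a hypothetical check of the 1-CPU assumption. The path and the CR layout here are assumptions based on the standard RayCluster spec, not code from this PR:

    import yaml

    # Hypothetical path to the template added in this PR.
    TEMPLATE_PATH = "ray-cluster.autoscaler-template.yaml"

    with open(TEMPLATE_PATH) as f:
        config = yaml.safe_load(f)

    # The standard RayCluster CR keeps head start parameters here.
    head_params = config["spec"]["headGroupSpec"]["rayStartParams"]

    # The tests assume every Ray node advertises exactly 1 CPU, so the
    # head must not be started with num-cpus "0".
    assert head_params.get("num-cpus", "1") != "0"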

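Second sketch: a minimal comparison of the two runningPods definitions, using plain dicts as stand-ins for Pod objects. The field names are illustrative, not KubeRay's actual data model:

    def running_pods_v0_6(pods):
        """Definition 1 (KubeRay v0.6.0 and older): Pods that are running
        or pending and whose deletion has not started."""
        return [
            p for p in pods
            if p["phase"] in ("Running", "Pending")
            and p["deletion_timestamp"] is None
        ]

    def running_pods_v1_0(pods):
        """Definition 2 (KubeRay v1.0.0): Pods whose Ray container has not
        actually terminated, even if the Pod is already terminating."""
        return [p for p in pods if not p["ray_container_terminated"]]

    # A terminating Pod whose Ray container is still shutting down:
    pod = {"phase": "Running",
           "deletion_timestamp": "2023-11-02T00:00:00Z",
           "ray_container_terminated": False}
    assert running_pods_v0_6([pod]) == []     # definition 1 ignores it
    assert running_pods_v1_0([pod]) == [pod]  # definition 2 still counts it

Because definition 1 stops counting a Pod as soon as it starts terminating, KubeRay may create a replacement early, briefly exceeding maxReplicas; definition 2 waits until the Ray node is actually gone.
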
Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: kaihsun <kaihsun@anyscale.com>
@@ -79,7 +79,12 @@ def _get_ray_cr_config(
    """
    with open(EXAMPLE_CLUSTER_PATH) as ray_cr_config_file:
        ray_cr_config_str = ray_cr_config_file.read()
    config = yaml.safe_load(ray_cr_config_str)

    kuberay_crd_sets = set(["RayCluster", "RayJob", "RayService"])
kevin85421 (Member Author) commented:

A YAML file may have multiple K8s objects.
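
As a sketch of what parsing such a multi-object manifest might look like (using PyYAML's safe_load_all; the PR's actual logic may differ):

    import yaml

    kuberay_crd_sets = set(["RayCluster", "RayJob", "RayService"])

    # Hypothetical path; EXAMPLE_CLUSTER_PATH plays this role in the diff above.
    with open("ray-cluster.autoscaler-template.yaml") as f:
        documents = list(yaml.safe_load_all(f.read()))

    # A single YAML file may hold several K8s objects (ConfigMaps, Services,
    # and the RayCluster itself); keep only the KubeRay custom resources.
    ray_crs = [d for d in documents if d and d.get("kind") in kuberay_crd_sets]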

Signed-off-by: kaihsun <kaihsun@anyscale.com>
@kevin85421 kevin85421 changed the title [kuberay][autoscaler] Update KubeRay version to v1.0.0-rc.2 [kuberay][autoscaler] Update KubeRay version to v1.0.0 Nov 6, 2023
Signed-off-by: kaihsun <kaihsun@anyscale.com>
@kevin85421 kevin85421 marked this pull request as ready for review November 6, 2023 14:47
@architkulkarni (Contributor) commented:

test_memory_pressure unrelated
Windows serve tests unrelated
Windows wheels failure unrelated: "RuntimeError: Detected Python version 3.7, which is not supported. Only Python 3.8, 3.9, 3.10, 3.11 are supported."

@architkulkarni architkulkarni added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Nov 6, 2023
@architkulkarni architkulkarni merged commit 390738a into ray-project:master Nov 6, 2023
29 of 33 checks passed
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Nov 29, 2023