[Feature] Add a flag to make zero downtime upgrades optional #1564

Merged (4 commits) on Oct 25, 2023

kevin85421 (Member) commented Oct 24, 2023

Why are these changes needed?

For LLM serving, some users do not have enough GPU resources to run two RayClusters simultaneously, which a zero-downtime upgrade requires.
This PR therefore adds ENABLE_ZERO_DOWNTIME as a feature flag so that zero-downtime upgrades can be disabled.

Related issue number

Closes #1476

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(
# Modify `values.yaml` to disable zero-downtime upgrade
# - name: ENABLE_ZERO_DOWNTIME
#   value: "false"

# (Path: helm-chart/kuberay-operator)
helm install kuberay-operator . --set image.repository=controller,image.tag=latest

# Install a RayService.
kubectl apply -f ray_v1alpha1_rayservice.yaml

# Forward the port for serving in a new terminal
kubectl port-forward svc/rayservice-sample-serve-svc 8000

# Send a request to the serve port.
curl -X POST -H 'Content-Type: application/json' localhost:8000/fruit/ -d '["PEAR", 12]'
# [Expected output]: 12

# Test in-place update
# Update the price of `PearStand` from 1 to 2.
#   - name: PearStand
#     num_replicas: 1
#     user_config:
#       price: 2
kubectl apply -f ray_v1alpha1_rayservice.yaml

# Send a request to the serve port again.
curl -X POST -H 'Content-Type: application/json' localhost:8000/fruit/ -d '["PEAR", 12]'
# [Expected output]: 24

# Test zero-downtime upgrade.
# Set `rayVersion: '2.100.0'` in the RayService YAML. With the flag enabled, this change would trigger a zero-downtime upgrade.
kubectl apply -f ray_v1alpha1_rayservice.yaml

# [Expected result]: No new RayCluster will be created.

# Check logs
kubectl logs kuberay-operator-8fd6754b4-zvlxb | grep "Zero-downtime upgrade is disabled"
# You should see something like:
# 2023-10-25T18:08:47.993Z        INFO    controllers.RayService  Zero-downtime upgrade is disabled (ENABLE_ZERO_DOWNTIME: false). Skip preparing a new RayCluster.
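
The log check above hardcodes a pod name, which changes on every rollout. A minimal sketch of a reusable check follows. It assumes the operator runs as a Deployment named `kuberay-operator` (inferred from the pod name, not confirmed by this PR), and `zero_downtime_skipped` is a hypothetical helper, not part of KubeRay.

```shell
#!/bin/sh
# Hypothetical helper (not part of KubeRay): reads log text from stdin and
# exits 0 if it contains the message the operator logs when it skips the upgrade.
zero_downtime_skipped() {
  grep -q "Zero-downtime upgrade is disabled"
}

# Usage, assuming the Deployment is named `kuberay-operator`:
#   kubectl logs deployment/kuberay-operator | zero_downtime_skipped \
#     && echo "upgrade skipped as expected"
```

Reading from stdin keeps the helper independent of how the logs are fetched, so it works equally well with a saved log file.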

kevin85421 marked this pull request as ready for review October 24, 2023 19:54

kevin85421 (Member, Author) commented:

@architkulkarni I have already added the details about manual tests.

kevin85421 (Member, Author) commented:

cc @YQ-Wang

kevin85421 merged commit 99abccf into ray-project:master Oct 25, 2023
23 checks passed
kevin85421 added a commit to kevin85421/kuberay that referenced this pull request Nov 2, 2023