Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix in HeadPod Service Generation logic which was causing frequent reconciliation #1056

Merged

Conversation

msumitjain
Copy link
Contributor

@msumitjain msumitjain commented Apr 27, 2023

Why are these changes needed?

BuildServiceForHeadPod method creates the corev1.Service object without ensuring the order of ports. When the rayservice_controller reconciliation loop updates the Service Spec, K8s detects a change in the and applies it. This will lead to a performance degradation.

Related issue number

Closes 1044

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@msumitjain msumitjain changed the title [WIP] Fix in HeadPod Service Generation logic which was causing frequent reconciliation Fix in HeadPod Service Generation logic which was causing frequent reconciliation Apr 27, 2023
@kevin85421 kevin85421 self-requested a review May 1, 2023 16:37
Copy link
Member

@kevin85421 kevin85421 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you for the contribution.

I cloned the fork and tested it by the following instructions:

Reproduce

helm install kuberay-operator kuberay/kuberay-operator --version 0.5.0
# path: ray-operator/config/samples
kubectl apply -f ray_v1alpha1_rayservice.yaml

# Wait for 5 mins
# (a) Check the order of ports
kubectl describe svc rayservice-sample-head-svc

# (b) Check resourceVersion
kubectl get svc rayservice-sample-head-svc -o=jsonpath='{.metadata.resourceVersion}'

# Wait for another 5 mins
# (c) Check the order of ports
kubectl describe svc rayservice-sample-head-svc

# (d) Check resourceVersion
kubectl get svc rayservice-sample-head-svc -o=jsonpath='{.metadata.resourceVersion}'

# (c) is possible to be not equal to (a), and (d) must be different from (b).

Test this PR

# Build Docker image (path: ray-operator)
make docker-image
helm install kuberay-operator kuberay/kuberay-operator --version 0.5.0 --set image.repository=controller,image.tag=latest
# path: ray-operator/config/samples
kubectl apply -f ray_v1alpha1_rayservice.yaml

# Wait for 5 mins
# (a) Check the order of ports
kubectl describe svc rayservice-sample-head-svc

# (b) Check resourceVersion
kubectl get svc rayservice-sample-head-svc -o=jsonpath='{.metadata.resourceVersion}'

# Wait for another 5 mins
# (c) Check the order of ports
kubectl describe svc rayservice-sample-head-svc

# (d) Check resourceVersion
kubectl get svc rayservice-sample-head-svc -o=jsonpath='{.metadata.resourceVersion}'

# (c) should be equal to (a), and (d) should be equal to (b).

This PR prevents the RayService's head service from being updated by Kubernetes APIServer. However, we still need to make sure we only call the Update function if the object has actually been modified to avoid unnecessary API calls and potential race conditions (i.e. 409 conflict) (At this moment, we call Update() in every reconciliation (link)). This follow-up will be tracked by #983.

@kevin85421
Copy link
Member

Hi @msumitjain, would you mind rebasing with the master branch? Some CI-related PRs are merged recently. Thanks!

@kevin85421 kevin85421 merged commit 3571d52 into ray-project:master May 2, 2023
19 checks passed
lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
…conciliation (ray-project#1056)

Fix in HeadPod Service Generation logic which was causing frequent reconciliation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RayController updating RayService resource every second.
2 participants