Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Modification of nameOverride will cause label selector mismatch for head node #572

Merged
merged 6 commits into from
Sep 16, 2022

Conversation

kevin85421
Copy link
Member

@kevin85421 kevin85421 commented Sep 16, 2022

Why are these changes needed?

As mentioned in #495 and #530, the value of nameOverride needs to be hardcoded with the value "kuberay" to avoid the mismatch between head pod and head service.

The value of app.kubernetes.io/name in the head pod's labels is same as app.kubernetes.io/name specified in RayCluster.Spec.HeadGroupSpec.Template.ObjectMeta.Labels of RayCluster custom resource YAML file. See the function labelPod in pod.go for more details.

RayCluster.Spec.HeadGroupSpec.Template.ObjectMeta.Labels["app.kubernetes.io/name"] is controlled by the value of nameOverride in helm chart.

However, the value of app.kubernetes.io/name in the head pod service's selector is hardcoded with the constant variable ApplicationName (= "kuberay")

Related issue number

closes #571
#495
#530

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(
# Step1: Clone my repo and checkout to `nameOverride` branch
# Step2: Build Docker image
# Step3: Update image of kuberay-operator chart
# Step4: Update `nameOverride` in ray-cluster chart from "kuberay" to "aaaaaaaaaa"
# Step5: Install kuberay-operator
cd helm-chart/kuberay-operator
helm install kuberay-operator .

# Step6: Install raycluster custom resource
cd helm-chart/ray-cluster
helm install ray-cluster .

# Step7: Set up port-forwarding
kubectl port-forward svc/ray-cluster-aaaaaaaaaa-head-svc 8265:8265

# Step8: Submit a simple job
ray job submit --address http://localhost:8265 -- python -c "import ray; ray.init(); print(ray.cluster_resources())"

Check head node service

  • kubectl describe svc/ray-cluster-aaaaaaaaaa-head-svc

  • Without this PR:

Selector: app.kubernetes.io/created-by=kuberay-operator,app.kubernetes.io/name=kuberay,ray.io/cluster=ray-cluster-aaaaaaaaaa,ray.io/identifier=ray-cluster-aaaaaaaaaa-head,ray.io/node-type=head
  • With this PR:
Selector: app.kubernetes.io/created-by=kuberay-operator,app.kubernetes.io/name=aaaaaaaaaa,ray.io/cluster=ray-cluster-aaaaaaaaaa,ray.io/identifier=ray-cluster-aaaaaaaaaa-head,ray.io/node-type=head

Copy link
Collaborator

@DmitriGekhtman DmitriGekhtman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me -- I left some minor nits about test documentation.

If you haven't yet, could you test this out manually? You could do that by building an operator image with the changes, deploying with a non-default label, and making sure the worker pod connects to the cluster successfully.

ray-operator/controllers/ray/common/service_test.go Outdated Show resolved Hide resolved
@DmitriGekhtman
Copy link
Collaborator

This makes sense to me -- I left some minor nits about test documentation.

If you haven't yet, could you test this out manually? You could do that by building an operator image with the changes, deploying with a non-default label, and making sure the worker pod connects to the cluster successfully.

Sorry, I see you've already done that.

@DmitriGekhtman DmitriGekhtman merged commit d46b431 into ray-project:master Sep 16, 2022
lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
… for head node (ray-project#572)

As mentioned in ray-project#495 and ray-project#530, the value of nameOverride needs to be hardcoded with the value "kuberay" to avoid the mismatch between head pod and head service.

The value of app.kubernetes.io/name in the head pod's labels is same as app.kubernetes.io/name specified in RayCluster.Spec.HeadGroupSpec.Template.ObjectMeta.Labels of RayCluster custom resource YAML file. See the function labelPod in pod.go for more details.

RayCluster.Spec.HeadGroupSpec.Template.ObjectMeta.Labels["app.kubernetes.io/name"] is controlled by the value of nameOverride in helm chart.

However, the value of app.kubernetes.io/name in the head pod service's selector is hardcoded with the constant variable ApplicationName (= "kuberay")
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Modification of nameOverride will cause label selector mismatch for head node
2 participants