
[Feature] Split ray.io/originated-from into ray.io/originated-from-cr-name and ray.io/originated-from-crd #1864

Merged 5 commits into ray-project:master on Jan 24, 2024

Conversation

kevin85421 (Member) commented Jan 23, 2024

Why are these changes needed?

Fix several bugs, add tests, and refactor. I have added comments to all important parts.

Related issue number

This is a follow-up to #1830.

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@@ -450,72 +450,6 @@ def clean_up(self):
show_cluster_info(self.namespace)
raise Exception("RayClusterAddCREvent clean_up() timeout")

class RayServiceFullCREvent(CREvent):
kevin85421 (Member Author):
This class is unused.

@@ -62,7 +62,14 @@ def wait(self):
namespace = self.namespace, label_selector='ray.io/node-type=worker')
serve_services = k8s_v1_api.list_namespaced_service(
namespace = self.namespace, label_selector =
f"ray.io/originated-from=RayService_{self.custom_resource_object['metadata']['name']}-serve")
kevin85421 (Member Author):
This is a bug that causes #1830 to fail on the master branch. See this build for more details.

Contributor:

Oh, I mistakenly treated ray.io/serve as ray.io/service.

f"ray.io/originated-from=RayService_{self.custom_resource_object['metadata']['name']}-serve")
f"ray.io/originated-from-cr-name={self.custom_resource_object['metadata']['name']},"
f"ray.io/originated-from-crd=RayService,"
f"ray.io/serve={self.custom_resource_object['metadata']['name']}-serve")
kevin85421 (Member Author):
TODO: The definition of ray.io/serve is messy; its value can be True/False or $SVC_NAME-serve.
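
For reference, a minimal client-go sketch of the same query under the split labels. The function name and wiring here are illustrative, not KubeRay code; the Python test above does the equivalent with the Kubernetes Python client.

package example

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listServeServices lists the serve services originating from a RayService CR,
// selecting on the two new labels plus ray.io/serve.
func listServeServices(clientset kubernetes.Interface, namespace, crName string) (*corev1.ServiceList, error) {
	selector := fmt.Sprintf(
		"ray.io/originated-from-cr-name=%s,"+
			"ray.io/originated-from-crd=RayService,"+
			"ray.io/serve=%s-serve",
		crName, crName)
	return clientset.CoreV1().Services(namespace).List(
		context.TODO(), metav1.ListOptions{LabelSelector: selector})
}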

@@ -1064,7 +1064,7 @@ func getCreator(instance rayv1.RayCluster) string {
if instance.Labels == nil {
return ""
}
- creatorName, exist := instance.Labels[utils.KubernetesCreatedByLabelKey]
+ creatorName, exist := instance.Labels[utils.RayOriginatedFromCRDLabelKey]
kevin85421 (Member Author):
The value is used to determine whether to inject certain environment variables for Ray Serve (code). However, before #1830, the definition of KubernetesCreatedByLabelKey was pretty messy: in most cases the value of the label was kuberay-operator, but the RayService controller set it to rayservice until #1830 deleted that line.

After #1830 deleted the line, the env variables for Ray Serve were no longer injected. This PR fixes the issue.
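
A hedged sketch of the fixed getCreator: the label lookup is the line quoted in the diff above, while the not-found branch, the call-site comment, and the import paths are assumptions about the surrounding KubeRay code, not verbatim source.

package example

import (
	rayv1 "github.com/ray-project/kuberay/ray-operator/apis/ray/v1"
	"github.com/ray-project/kuberay/ray-operator/controllers/ray/utils"
)

// getCreator reads the CRD type that created this RayCluster from the new
// ray.io/originated-from-crd label.
func getCreator(instance rayv1.RayCluster) string {
	if instance.Labels == nil {
		return ""
	}
	creatorName, exist := instance.Labels[utils.RayOriginatedFromCRDLabelKey]
	if !exist {
		return ""
	}
	return creatorName
}

// Assumed call-site shape: inject Ray Serve env vars (e.g. RAY_SERVE_KV_TIMEOUT_S)
// only for clusters created by a RayService:
//   if getCreator(instance) == "RayService" { /* inject Serve env vars */ }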

@@ -257,6 +257,7 @@ var _ = Context("Inside the default namespace", func() {
for _, pod := range workerPods.Items {
// Worker Pod should have only one container.
Expect(len(pod.Spec.Containers)).Should(Equal(1))
Expect(utils.EnvVarExists(utils.RAY_SERVE_KV_TIMEOUT_S, pod.Spec.Containers[utils.RayContainerIndex].Env)).Should(BeTrue())
kevin85421 (Member Author):
Check whether the environment variables for Ray Serve have been injected. If we don't have the check, this issue may happen again.
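
For context, a minimal sketch of what an EnvVarExists-style helper checks; the real utils.EnvVarExists in KubeRay may differ in signature or detail.

package example

import corev1 "k8s.io/api/core/v1"

// EnvVarExists reports whether an env var with the given name is present.
func EnvVarExists(name string, envVars []corev1.EnvVar) bool {
	for _, env := range envVars {
		if env.Name == name {
			return true
		}
	}
	return false
}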

serveKvTimeoutEnv := corev1.EnvVar{Name: utils.RAY_SERVE_KV_TIMEOUT_S, Value: "5"}
container.Env = append(container.Env, serveKvTimeoutEnv)
}
if !envVarExists(utils.SERVE_CONTROLLER_PIN_ON_NODE, container.Env) {
kevin85421 (Member Author):
SERVE_CONTROLLER_PIN_ON_NODE: this env var was removed in Ray 2.6.0.

kevin85421 marked this pull request as ready for review on January 24, 2024, 08:35
kevin85421 (Member Author):
cc @rueian

rueian (Contributor) commented Jan 24, 2024:

Hi @kevin85421, thanks! And why do we split the label again?

kevin85421 (Member Author):

> Hi @kevin85421, thanks! And why do we split the label again?

I discovered that retrieving CRDType from ray.io/originated-from is quite common. Moreover, dividing the label into two parts simplifies the validation process. For example, we can easily check whether the label belongs to CRDType or not.

Additionally, I've observed that some labels in KubeRay have multiple definitions, which complicates maintenance (see example1 and example2). If we use one label for both the CRD type and the CR name, retrieving the CRD type from the label requires string manipulation, which is error-prone.
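
To illustrate the validation point, a hedged sketch: with a dedicated ray.io/originated-from-crd label, checking the value is a set-membership test, whereas a combined "CRDType_CRName" value must first be split apart. The CRD type names below are assumptions, not necessarily KubeRay's exact constants.

package example

// validCRDTypes enumerates the KubeRay CRDs; the exact set is an assumption.
var validCRDTypes = map[string]struct{}{
	"RayCluster": {},
	"RayJob":     {},
	"RayService": {},
}

// isValidCRDType is trivial with the dedicated label.
func isValidCRDType(labelValue string) bool {
	_, ok := validCRDTypes[labelValue]
	return ok
}

// With the old combined label, the same check needs string surgery first:
//   parts := strings.SplitN(labelValue, "_", 2) // e.g. "RayService_my-cluster"
//   ok := len(parts) == 2 && isValidCRDType(parts[0])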

architkulkarni (Contributor) left a comment:

The code looks good, but what's the reason for splitting the labels? Maybe you could summarize it in the PR description; I couldn't find the relevant comments in the previous PR.

architkulkarni (Contributor) left a comment:

Oh sorry, just saw the most recent comments on this PR.

@kevin85421 kevin85421 merged commit 6691b70 into ray-project:master Jan 24, 2024
23 checks passed
ryanaoleary pushed a commit to ryanaoleary/kuberay that referenced this pull request Feb 13, 2024
kevin85421 mentioned this pull request Feb 16, 2024