[Feature] Split `ray.io/originated-from` into `ray.io/originated-from-cr-name` and `ray.io/originated-from-crd` #1864
Conversation
```diff
@@ -450,72 +450,6 @@ def clean_up(self):
         show_cluster_info(self.namespace)
         raise Exception("RayClusterAddCREvent clean_up() timeout")

-class RayServiceFullCREvent(CREvent):
```
This class is unused.
```diff
@@ -62,7 +62,14 @@ def wait(self):
             namespace = self.namespace, label_selector='ray.io/node-type=worker')
         serve_services = k8s_v1_api.list_namespaced_service(
             namespace = self.namespace, label_selector =
             f"ray.io/originated-from=RayService_{self.custom_resource_object['metadata']['name']}-serve")
```
This is a bug that causes #1830 to fail on the master branch. See this build for more details.
Oh, I mistakenly treated the `ray.io/serve` label as `ray.io/service`.
```diff
-            f"ray.io/originated-from=RayService_{self.custom_resource_object['metadata']['name']}-serve")
+            f"ray.io/originated-from-cr-name={self.custom_resource_object['metadata']['name']},"
+            f"ray.io/originated-from-crd=RayService,"
+            f"ray.io/serve={self.custom_resource_object['metadata']['name']}-serve")
```
TODO: The definition of `ray.io/serve` is messy. Its value can be either `True`/`False` or `$SVC_NAME-serve`.
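To make the new selector concrete, here is a minimal sketch (not the actual test code) of how the split-label selector above can be assembled. `cr_name` stands in for `self.custom_resource_object['metadata']['name']`; the helper name is hypothetical.

```python
def build_serve_selector(cr_name: str) -> str:
    """Build the label selector for a RayService's serve services using the
    split labels, instead of the old combined ray.io/originated-from label."""
    return (
        f"ray.io/originated-from-cr-name={cr_name},"
        f"ray.io/originated-from-crd=RayService,"
        f"ray.io/serve={cr_name}-serve"
    )

# Example: selector for a RayService named "rayservice-sample".
print(build_serve_selector("rayservice-sample"))
```

Because each label carries a single value, no string parsing of the selector value is needed on the consuming side.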
```diff
@@ -1064,7 +1064,7 @@ func getCreator(instance rayv1.RayCluster) string {
 	if instance.Labels == nil {
 		return ""
 	}
-	creatorName, exist := instance.Labels[utils.KubernetesCreatedByLabelKey]
+	creatorName, exist := instance.Labels[utils.RayOriginatedFromCRDLabelKey]
```
The value is used to determine whether to inject certain environment variables for Ray Serve (code). However, before #1830, the definition of `KubernetesCreatedByLabelKey` was pretty messy: in most cases the value of the label is `kuberay-operator`, but the RayService controller set the label to `rayservice` until #1830 deleted that line.
After #1830 deletes the line, the env variables for Ray Serve will no longer be injected. This PR fixes the issue.
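The fixed lookup logic can be sketched in Python as follows (a minimal mirror of the Go snippet above, with the label key written out; the function name follows `getCreator`):

```python
# Label key written to RayClusters created by a RayService controller.
RAY_ORIGINATED_FROM_CRD = "ray.io/originated-from-crd"

def get_creator(labels):
    """Return the originating CRD type for a RayCluster, or "" if unknown."""
    if not labels:
        return ""
    return labels.get(RAY_ORIGINATED_FROM_CRD, "")
```

A cluster labeled `ray.io/originated-from-crd=RayService` resolves to `"RayService"`, which is the signal the operator uses to inject the Ray Serve environment variables.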
```diff
@@ -257,6 +257,7 @@ var _ = Context("Inside the default namespace", func() {
 			for _, pod := range workerPods.Items {
 				// Worker Pod should have only one container.
 				Expect(len(pod.Spec.Containers)).Should(Equal(1))
+				Expect(utils.EnvVarExists(utils.RAY_SERVE_KV_TIMEOUT_S, pod.Spec.Containers[utils.RayContainerIndex].Env)).Should(BeTrue())
```
Check whether the environment variables for Ray Serve have been injected. If we don't have the check, this issue may happen again.
```go
		serveKvTimeoutEnv := corev1.EnvVar{Name: utils.RAY_SERVE_KV_TIMEOUT_S, Value: "5"}
		container.Env = append(container.Env, serveKvTimeoutEnv)
	}
	if !envVarExists(utils.SERVE_CONTROLLER_PIN_ON_NODE, container.Env) {
```
`SERVE_CONTROLLER_PIN_ON_NODE`: This env var was removed in 2.6.0.
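The injection pattern above can be sketched in Python with plain dicts standing in for `corev1.EnvVar` (names mirror the Go helpers; the value `"5"` comes from the snippet above):

```python
def env_var_exists(name, env):
    """Return True if an env var with the given name is already present."""
    return any(var["name"] == name for var in env)

def inject_serve_env(env):
    """Append RAY_SERVE_KV_TIMEOUT_S only when it is absent, so a
    user-provided override is never clobbered."""
    if not env_var_exists("RAY_SERVE_KV_TIMEOUT_S", env):
        env.append({"name": "RAY_SERVE_KV_TIMEOUT_S", "value": "5"})
    return env
```

Guarding the append with an existence check is what makes the injection idempotent and override-friendly.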
cc @rueian
Hi @kevin85421, thanks! And why do we split the label again?
I discovered that retrieving the CRDType from the combined label requires extra parsing. Additionally, I've observed that some labels in KubeRay have multiple definitions, complicating maintenance (see example1 and example2). If we opt to use one label for both CRDType and CR name, retrieving the CRDType from the label will involve string manipulation, which may introduce some issues due to the complexity.
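An illustrative comparison of the two approaches (hypothetical label values): with the old combined label the CRD type must be parsed out of the value, while the split labels allow a direct lookup.

```python
# Old scheme: one label encoding both CRD type and CR name.
combined = {"ray.io/originated-from": "RayService_rayservice-sample-serve"}
# The CRD type has to be sliced out of the value with string manipulation.
crd_type_old = combined["ray.io/originated-from"].split("_", 1)[0]

# New scheme: two labels, one value each.
split_labels = {
    "ray.io/originated-from-crd": "RayService",
    "ray.io/originated-from-cr-name": "rayservice-sample",
}
crd_type_new = split_labels["ray.io/originated-from-crd"]

print(crd_type_old, crd_type_new)
```

The direct lookup has no edge cases around underscores or suffixes in the CR name, which is the maintenance concern raised above.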
The code looks good, but what's the reason for splitting the labels? Maybe you could summarize it in the PR description; I couldn't find the relevant comments in the previous PR.
Oh sorry, just saw the most recent comments on this PR.
…-cr-name` and `ray.io/originated-from-crd` (ray-project#1864)
Why are these changes needed?
Fix several bugs, add tests, and refactor. I have added comments to all important parts.
Related issue number
This is a follow-up to #1830.
Checks