
Refactor to Ensure Consistent Use of CRDType #1892

Merged
1 commit merged on Feb 2, 2024


@Yicheng-Lu-llll (Contributor) commented on Jan 30, 2024

Why are these changes needed?

This PR consistently uses `CRDType` instead of plain strings to identify the CRD that created a RayCluster, which avoids potential bugs. It addresses the following TODO:

// TODO (kevin85421): It is better to use `CRDType` as the return type.
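For readers unfamiliar with the pattern, here is a minimal sketch of what a typed `CRDType` buys over plain strings. It is modeled loosely on KubeRay's `utils.CRDType` and `utils.GetCRDType`, but the constant names, their string values, and the fallback to `RayClusterCRD` are assumptions for illustration, not the operator's actual code.

```go
package main

import "fmt"

// CRDType is a named string type. A plain `string` variable cannot be
// passed where a CRDType is expected without an explicit conversion
// (untyped string literals still can).
type CRDType string

// Assumed constants, one per KubeRay CRD.
const (
	RayClusterCRD CRDType = "RayCluster"
	RayJobCRD     CRDType = "RayJob"
	RayServiceCRD CRDType = "RayService"
)

// GetCRDType maps a raw label value to a CRDType.
// Assumption: empty or unknown values fall back to a standalone RayCluster.
func GetCRDType(key string) CRDType {
	switch key {
	case "RayJob":
		return RayJobCRD
	case "RayService":
		return RayServiceCRD
	default:
		return RayClusterCRD
	}
}

func main() {
	fmt.Println(GetCRDType("RayService")) // RayService
	fmt.Println(GetCRDType(""))           // RayCluster
}
```

Centralizing the string-to-type mapping in one function means callers compare against constants such as `RayServiceCRD` instead of repeating string literals that can silently drift.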

Related issue number

Checks

As shown below, the operator still correctly detects that the RayCluster was created by a RayService:

  1. The Serve health check is injected into the worker's readiness probe.
  2. The `RAY_timeout_ms_task_wait_for_death_info` environment variable is set to `0`.
```sh
# Run a RayService sample from https://github.com/ray-project/kuberay/tree/master/ray-operator/config/samples
kubectl apply -f /home/ubuntu/kuberay/ray-operator/config/samples/ray-service.sample.yaml
kubectl get pod
# NAME                                                      READY   STATUS    RESTARTS   AGE
# ervice-sample-raycluster-sdntw-worker-small-group-j7crr   1/1     Running   0          88s
# kuberay-operator-5987588ffc-2cgbf                         1/1     Running   0          2m41s
# rayservice-sample-raycluster-sdntw-head-mp2qb             1/1     Running   0          88s
kubectl describe $(kubectl get pods -o=name | grep worker) | grep "Readiness:\|Liveness:"
# Liveness:   exec [bash -c wget -T 2 -q -O- http://localhost:52365/api/local_raylet_healthz | grep success] delay=30s timeout=1s period=5s #success=1 #failure=120
# Readiness:  exec [bash -c wget -T 2 -q -O- http://localhost:52365/api/local_raylet_healthz | grep success && wget -T 2 -q -O- http://localhost:8000/-/healthz | grep success] delay=10s timeout=1s period=5s #success=1 #failure=1
kubectl describe $(kubectl get pods -o=name | grep worker) | grep "RAY_timeout_ms_task_wait_for_death_info"
# RAY_timeout_ms_task_wait_for_death_info:  0
```
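Both checks depend on the operator recognizing that a pod's creator is a RayService. Below is a minimal sketch of that branching, with the creator passed as a typed `CRDType` (as in the new `BuildPod` signature) rather than a string. The `Container` and `EnvVar` structs are hypothetical stand-ins for the `corev1` types, and `applyRayServiceDefaults` is a hypothetical helper name; only the health-check commands, the environment variable name, and its value `0` come from the output above.

```go
package main

import "fmt"

// CRDType mirrors the typed string the PR switches to (values assumed).
type CRDType string

const (
	RayClusterCRD CRDType = "RayCluster"
	RayServiceCRD CRDType = "RayService"
)

// EnvVar and Container are simplified, hypothetical stand-ins for
// corev1.EnvVar and corev1.Container.
type EnvVar struct {
	Name  string
	Value string
}

type Container struct {
	Env            []EnvVar
	ReadinessCheck string // shell command run by the readiness probe
}

// Health-check commands taken from the kubectl output above.
const (
	rayletCheck = "wget -T 2 -q -O- http://localhost:52365/api/local_raylet_healthz | grep success"
	serveCheck  = "wget -T 2 -q -O- http://localhost:8000/-/healthz | grep success"
)

// applyRayServiceDefaults is a hypothetical helper. Every container gets
// the raylet check; only RayService-created pods also get the Serve
// check and the environment variable.
func applyRayServiceDefaults(c *Container, creator CRDType) {
	c.ReadinessCheck = rayletCheck
	if creator != RayServiceCRD {
		return
	}
	c.ReadinessCheck = rayletCheck + " && " + serveCheck
	c.Env = append(c.Env, EnvVar{Name: "RAY_timeout_ms_task_wait_for_death_info", Value: "0"})
}

func main() {
	var worker Container
	applyRayServiceDefaults(&worker, RayServiceCRD)
	fmt.Println(worker.ReadinessCheck)
	fmt.Println(worker.Env)
}
```

Because `creator` is a `CRDType`, passing an unrelated `string` variable (for example, a pod name) is a compile error rather than a comparison that silently never matches.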
  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

Signed-off-by: Yicheng-Lu-llll <luyc58576@gmail.com>

```diff
-	return creatorName
+func getCreatorCRDType(instance rayv1.RayCluster) utils.CRDType {
+	return utils.GetCRDType(instance.Labels[utils.RayOriginatedFromCRDLabelKey])
```
Contributor Author

If `Labels` is nil or does not contain the key, `instance.Labels[utils.RayOriginatedFromCRDLabelKey]` returns an empty string, because reading a nil map or a missing key in Go yields the zero value of the map's value type. See this simple example.
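The Go behavior this comment relies on can be checked directly: indexing a nil map, or a map that lacks the key, returns the value type's zero value (`""` for strings) without panicking. The label key below is a placeholder, not necessarily the actual value of KubeRay's `RayOriginatedFromCRDLabelKey`.

```go
package main

import "fmt"

// Placeholder key; the real constant lives in KubeRay's utils package.
const originatedFromKey = "example.io/originated-from-crd"

func main() {
	var nilLabels map[string]string // nil map: reads are safe, writes would panic
	otherLabels := map[string]string{"app": "demo"}

	fmt.Printf("%q\n", nilLabels[originatedFromKey])   // ""
	fmt.Printf("%q\n", otherLabels[originatedFromKey]) // ""

	// The two-value form distinguishes "missing" from "present but empty".
	_, ok := otherLabels[originatedFromKey]
	fmt.Println(ok) // false
}
```

This is what lets the new `getCreatorCRDType` drop explicit nil and existence checks: a missing label simply reaches `utils.GetCRDType` as an empty string.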

@kevin85421 kevin85421 self-requested a review February 1, 2024 06:30
@kevin85421 kevin85421 self-assigned this Feb 1, 2024
```diff
@@ -292,14 +292,14 @@ func initLivenessAndReadinessProbe(rayContainer *corev1.Container, rayNodeType r
 }
 
 // BuildPod a pod config
-func BuildPod(ctx context.Context, podTemplateSpec corev1.PodTemplateSpec, rayNodeType rayv1.RayNodeType, rayStartParams map[string]string, headPort string, enableRayAutoscaler *bool, creator string, fqdnRayIP string) (aPod corev1.Pod) {
+func BuildPod(ctx context.Context, podTemplateSpec corev1.PodTemplateSpec, rayNodeType rayv1.RayNodeType, rayStartParams map[string]string, headPort string, enableRayAutoscaler *bool, creatorCRDType utils.CRDType, fqdnRayIP string) (aPod corev1.Pod) {
```
Member

nit: Is the named return value `(aPod corev1.Pod)` necessary?

We don't need to update it in this PR. We can revisit all named return values in a separate PR.

Contributor Author

I agree; `aPod` is unnecessary.
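For context on the nit: a named result such as `(aPod corev1.Pod)` declares a variable that is zero-initialized on entry and returned by a bare `return`. The sketch below uses a hypothetical `Pod` struct in place of `corev1.Pod` to show that the named and unnamed forms produce the same value, so here the name serves mainly as documentation.

```go
package main

import "fmt"

// Pod is a hypothetical stand-in for corev1.Pod.
type Pod struct {
	Name string
}

// Named result: aPod starts as the zero Pod and a bare return yields it.
func buildNamed(name string) (aPod Pod) {
	aPod.Name = name
	return
}

// Unnamed result: the value is built and returned explicitly.
func buildUnnamed(name string) Pod {
	return Pod{Name: name}
}

func main() {
	fmt.Println(buildNamed("head") == buildUnnamed("head")) // true
}
```

Named results mostly pay off when a deferred function must modify the return value; otherwise the explicit form is usually easier to follow in long functions like `BuildPod`.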

@kevin85421 kevin85421 merged commit 160ab10 into ray-project:master Feb 2, 2024
23 checks passed
ryanaoleary pushed a commit to ryanaoleary/kuberay that referenced this pull request Feb 13, 2024

2 participants