Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Ray address env. #388

Merged

Conversation

DmitriGekhtman
Copy link
Collaborator

@DmitriGekhtman DmitriGekhtman commented Jul 18, 2022

Why are these changes needed?

  • Adds RAY_ADDRESS env variable for Ray containers.
    Allows using the natural-looking ray.init() to connect to Ray from within the cluster, rather than the mystical ray.init("auto")

  • Fixes a bug: currently workers look to their own RayStartParams for the Ray GCS server port. They should look at the head's RayStartParams instead. This bug fix required modifying function signatures a bit.

Related issue number

Closes #373

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
      I tested connecting to Ray with ray.init() on the head node.
    • This PR is not tested :(

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
// if worker, use the service name of the head
ip.Value = svcName
}
ip := v1.EnvVar{Name: RAY_IP, Value: rayIP}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what RAY_IP or RAY_PORT envs are for.

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
@@ -605,8 +607,10 @@ func (r *RayClusterReconciler) buildWorkerPod(instance rayiov1alpha1.RayCluster,
podName := strings.ToLower(instance.Name + common.DashSymbol + string(rayiov1alpha1.WorkerNode) + common.DashSymbol + worker.GroupName + common.DashSymbol)
podName = utils.CheckName(podName) // making sure the name is valid
svcName := utils.GenerateServiceName(instance.Name)
podTemplateSpec := common.DefaultWorkerPodTemplate(instance, worker, podName, svcName)
pod := common.BuildPod(podTemplateSpec, rayiov1alpha1.WorkerNode, worker.RayStartParams, svcName, instance.Spec.EnableInTreeAutoscaling)
headPort := common.GetHeadPort(instance.Spec.HeadGroupSpec.RayStartParams)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could try to refactor to deduplicate the buildWorkerPod and and buildHeadPod logic.

The fact that HeadGroupSpec and WorkerGroupSpec are different types makes that a bit awkward, though.

Copy link
Collaborator

@pcmoritz pcmoritz Jul 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, once we switch to Go 1.18 we could possibly use generics for this if they support this use case :)

@Jeffwan
Copy link
Collaborator

Jeffwan commented Jul 19, 2022

Kind of overwhelmed today. I will help review the change tomorrow morning.

Copy link
Collaborator

@pcmoritz pcmoritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks a lot for doing this! I left some small suggestions :)

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
@DmitriGekhtman
Copy link
Collaborator Author

Will let GCS HA #294 get merged first to avoid blocking it with a merge conflict.

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
@Jeffwan
Copy link
Collaborator

Jeffwan commented Jul 20, 2022

@DmitriGekhtman HA PR has been merged. This one is unblocked now. Seem it does have few conflicts.

@DmitriGekhtman
Copy link
Collaborator Author

The nightly compatibility test has been unhappy lately.
It might make sense to view that one as "non-blocking" given that it is sensitive to changes in another repo.

@DmitriGekhtman DmitriGekhtman merged commit a44f9a0 into ray-project:master Jul 21, 2022
@DmitriGekhtman DmitriGekhtman deleted the dmitri/envs-and-other-stuff branch July 21, 2022 01:14
lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
Adds RAY_ADDRESS env variable for Ray containers.
Allows using the natural-looking ray.init() to connect to Ray from within the cluster, rather than the mystical ray.init("auto")

Fixes a bug: currently workers look to their own RayStartParams for the Ray GCS server port. They should look at the head's RayStartParams instead. This bug fix required modifying function signatures a bit.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] Set RAY_ADDRESS env for the head pod
4 participants