Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RayJob: inject RAY_DASHBOARD_ADDRESS envariable variable for user provided submiter templates #1852

Merged
merged 2 commits into from
Jan 25, 2024

Conversation

andrewsykim
Copy link
Contributor

@andrewsykim andrewsykim commented Jan 19, 2024

Why are these changes needed?

See #1850 for details.

Related issue number

Closes #1850

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

Copy link
Member

@kevin85421 kevin85421 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind also updating the following code snippet to use FQ_RAY_IP for submitting a Ray job? Thank you!

# SubmitterPodTemplate is the template for the pod that will run the `ray job submit` command against the RayCluster.
# If SubmitterPodTemplate is specified, the first container is assumed to be the submitter container.
# submitterPodTemplate:
# spec:
# restartPolicy: Never
# containers:
# - name: my-custom-rayjob-submitter-pod
# image: rayproject/ray:2.9.0
# # If Command is not specified, the correct command will be supplied at runtime using the RayJob spec `entrypoint` field.
# # Specifying Command is not recommended.
# # command: ["ray job submit --address=http://rayjob-sample-raycluster-v6qcq-head-svc.default.svc.cluster.local:8265 -- echo hello world"]

@@ -327,6 +327,18 @@ func (r *RayJobReconciler) getSubmitterTemplate(rayJobInstance *rayv1.RayJob, ra
r.Log.Info("default submitter template is used")
} else {
submitterTemplate = *rayJobInstance.Spec.SubmitterPodTemplate.DeepCopy()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. How about moving the logic to the same location as the PYTHONUNBUFFERED environment variable (code)?

  2. We can always inject the env variable no matter whether users set SubmitterPodTemplate by themselves or not.

  3. We can only inject FQ_RAY_IP into the Ray container, which is always the first container (i.e., submitterTemplate.Spec.Containers[utils.RayContainerIndex]) in a Pod in KubeRay. We can inject it into other sidecar containers later upon user request.

  4. Do we need to check len(container.Env) == 0? Can we directly append to the slice instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I updated to always inject the environment variable. I wasn't sure if it made sense to inject the env var in the default submitter template.

@kevin85421 kevin85421 self-assigned this Jan 19, 2024
@kevin85421
Copy link
Member

Btw, you can add "Closes #1850" in the PR description so that the issue will be closed when the PR is merged.

@andrewsykim
Copy link
Contributor Author

Updated sample manifest to use FQ_RAY_IP

@@ -102,7 +102,7 @@ spec:
# image: rayproject/ray:2.9.0
# # If Command is not specified, the correct command will be supplied at runtime using the RayJob spec `entrypoint` field.
# # Specifying Command is not recommended.
# # command: ["ray job submit --address=http://rayjob-sample-raycluster-v6qcq-head-svc.default.svc.cluster.local:8265 -- echo hello world"]
# # command: ["ray job submit --address=http://$FQ_RAY_IP:8265 -- echo hello world"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind trying whether it actually works or not? I forgot why I divided the command into a slice of multiple strings instead of a slice with just one string in rayjob_test.go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The env injection works, but the original entrypoint was invalid. Updated to this:

command: ["sh", "-c", "ray job submit --address=http://$FQ_RAY_IP -- echo hello world"]

submitterTemplate, err = r.getSubmitterTemplate(rayJobInstanceWithTemplate, nil)
assert.NoError(t, err)
found = false
for _, envVar := range submitterTemplate.Spec.Containers[utils.RayContainerIndex].Env {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can make the function envVarExists public and reuse it? In the future, we can consider moving the function to ray-operator/utils instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I make this change in a follow-up PR? There are other tests and assertions that I would have to refactor

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Would you mind opening an issue to track the progress? Thanks!

@kevin85421
Copy link
Member

Would you mind rebasing with the master branch? Thanks!

@andrewsykim
Copy link
Contributor Author

Btw, FQ_RAY_IP is a bit misleading because Dashboard URL in the status also contains the port. Something like FQ_RAY_HOST might be better but I won't try to fix that in this PR

@kevin85421
Copy link
Member

Btw, FQ_RAY_IP is a bit misleading because Dashboard URL in the status also contains the port. Something like FQ_RAY_HOST might be better but I won't try to fix that in this PR

Is there any reason not to address the issue in this PR? If it doesn't involve changes unrelated to this PR, I would prefer to resolve it here.

@andrewsykim
Copy link
Contributor Author

Is there any reason not to address the issue in this PR? If it doesn't involve changes unrelated to this PR, I would prefer to resolve it here.

I didn't want to change other references to FQ_RAY_IP. But thinking about it more, can we just use RAY_ADDRESS instead which usually contains the port?

@andrewsykim
Copy link
Contributor Author

@kevin85421 I updated the implementation to use RAY_ADDRESS, which means the submitter entrypoint doesn't even need to specify --address anymore

@kevin85421
Copy link
Member

can we just use RAY_ADDRESS instead which usually contains the port?

RAY_ADDRESS is set to $FQ_RAY_IP:$GCS_PORT, but DashboardURL is set to $FQ_RAY_IP:$DASHBOARD_PORT. Maybe we should use a different name?

@andrewsykim
Copy link
Contributor Author

How about RAY_DASHBOARD_ADDRESS?

@@ -348,6 +348,12 @@ func (r *RayJobReconciler) getSubmitterTemplate(rayJobInstance *rayv1.RayJob, ra
Value: "1",
})

// Set RAY_ADDRESS so that the Ray address can be discovered at runtime
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -170,6 +170,19 @@ func TestGetSubmitterTemplate(t *testing.T) {
}
}
assert.True(t, found)

// Test 5: Check default FQ_RAY_IP env var
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@andrewsykim andrewsykim changed the title RayJob: inject FQ_RAY_IP envariable variable for user provided submiter templates RayJob: inject RAY_DASHBOARD_ADDRESS envariable variable for user provided submiter templates Jan 23, 2024
@kevin85421
Copy link
Member

There is a conflict. Would you mind rebasing with the master branch? Thanks!

…ter templates

Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
ryanaoleary pushed a commit to ryanaoleary/kuberay that referenced this pull request Feb 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature][RayJob] Set the environment variable FQ_RAY_IP for the submitter Kubernetes Job
2 participants