-
Notifications
You must be signed in to change notification settings - Fork 321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RayJob: inject RAY_DASHBOARD_ADDRESS envariable variable for user provided submiter templates #1852
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind also updating the following code snippet to use FQ_RAY_IP
for submitting a Ray job? Thank you!
kuberay/ray-operator/config/samples/ray_v1alpha1_rayjob.yaml
Lines 95 to 105 in 33f29f6
# SubmitterPodTemplate is the template for the pod that will run the `ray job submit` command against the RayCluster. | |
# If SubmitterPodTemplate is specified, the first container is assumed to be the submitter container. | |
# submitterPodTemplate: | |
# spec: | |
# restartPolicy: Never | |
# containers: | |
# - name: my-custom-rayjob-submitter-pod | |
# image: rayproject/ray:2.9.0 | |
# # If Command is not specified, the correct command will be supplied at runtime using the RayJob spec `entrypoint` field. | |
# # Specifying Command is not recommended. | |
# # command: ["ray job submit --address=http://rayjob-sample-raycluster-v6qcq-head-svc.default.svc.cluster.local:8265 -- echo hello world"] |
@@ -327,6 +327,18 @@ func (r *RayJobReconciler) getSubmitterTemplate(rayJobInstance *rayv1.RayJob, ra | |||
r.Log.Info("default submitter template is used") | |||
} else { | |||
submitterTemplate = *rayJobInstance.Spec.SubmitterPodTemplate.DeepCopy() | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
How about moving the logic to the same location as the
PYTHONUNBUFFERED
environment variable (code)? -
We can always inject the env variable no matter whether users set
SubmitterPodTemplate
by themselves or not. -
We can only inject
FQ_RAY_IP
into the Ray container, which is always the first container (i.e.,submitterTemplate.Spec.Containers[utils.RayContainerIndex]
) in a Pod in KubeRay. We can inject it into other sidecar containers later upon user request. -
Do we need to check
len(container.Env) == 0
? Can we directly append to the slice instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I updated to always inject the environment variable. I wasn't sure if it made sense to inject the env var in the default submitter template.
Btw, you can add "Closes #1850" in the PR description so that the issue will be closed when the PR is merged. |
9bb7a79
to
032a052
Compare
Updated sample manifest to use FQ_RAY_IP |
@@ -102,7 +102,7 @@ spec: | |||
# image: rayproject/ray:2.9.0 | |||
# # If Command is not specified, the correct command will be supplied at runtime using the RayJob spec `entrypoint` field. | |||
# # Specifying Command is not recommended. | |||
# # command: ["ray job submit --address=http://rayjob-sample-raycluster-v6qcq-head-svc.default.svc.cluster.local:8265 -- echo hello world"] | |||
# # command: ["ray job submit --address=http://$FQ_RAY_IP:8265 -- echo hello world"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind trying whether it actually works or not? I forgot why I divided the command into a slice of multiple strings instead of a slice with just one string in rayjob_test.go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The env injection works, but the original entrypoint was invalid. Updated to this:
command: ["sh", "-c", "ray job submit --address=http://$FQ_RAY_IP -- echo hello world"]
submitterTemplate, err = r.getSubmitterTemplate(rayJobInstanceWithTemplate, nil) | ||
assert.NoError(t, err) | ||
found = false | ||
for _, envVar := range submitterTemplate.Spec.Containers[utils.RayContainerIndex].Env { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can make the function envVarExists public and reuse it? In the future, we can consider moving the function to ray-operator/utils
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I make this change in a follow-up PR? There are other tests and assertions that I would have to refactor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Would you mind opening an issue to track the progress? Thanks!
Would you mind rebasing with the master branch? Thanks! |
Btw, |
Is there any reason not to address the issue in this PR? If it doesn't involve changes unrelated to this PR, I would prefer to resolve it here. |
I didn't want to change other references to FQ_RAY_IP. But thinking about it more, can we just use RAY_ADDRESS instead which usually contains the port? |
@kevin85421 I updated the implementation to use RAY_ADDRESS, which means the submitter entrypoint doesn't even need to specify --address anymore |
|
How about RAY_DASHBOARD_ADDRESS? |
@@ -348,6 +348,12 @@ func (r *RayJobReconciler) getSubmitterTemplate(rayJobInstance *rayv1.RayJob, ra | |||
Value: "1", | |||
}) | |||
|
|||
// Set RAY_ADDRESS so that the Ray address can be discovered at runtime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -170,6 +170,19 @@ func TestGetSubmitterTemplate(t *testing.T) { | |||
} | |||
} | |||
assert.True(t, found) | |||
|
|||
// Test 5: Check default FQ_RAY_IP env var |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
156433c
to
635ed70
Compare
There is a conflict. Would you mind rebasing with the master branch? Thanks! |
…ter templates Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
…vided submiter templates (ray-project#1852)
Why are these changes needed?
See #1850 for details.
Related issue number
Closes #1850
Checks