Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RayJob] Enable job log streaming by setting PYTHONUNBUFFERED in job container #1375

Merged
merged 13 commits into from
Sep 6, 2023

Conversation

architkulkarni
Copy link
Contributor

@architkulkarni architkulkarni commented Aug 29, 2023

Why are these changes needed?

Previously, kubectl get logs job/my-rayjob and kubectl logs my-rayjob-xxxxx (getting the logs from the pod directly) would be empty until the job finished.

This PR sets PYTHONUNBUFFERED in the job container so that the logs can be viewed while the job is still running.

This improves the log viewing experience to match that of ordinary Kubernetes Jobs.

This PR adds unit tests, but adding an end-to-end test (e.g. submit a job that prints a log and then sleeps forever, and check that we can read the log) will require some followup work. The main issue is the following:

  • The RayJob sample YAML test framework waits for a job to "converge" (actually, to be SUCCEEDED) before running any Rules. But for this test, we just want the job to start before running the Rule.

Related issue number

Closes #1374

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

Signed-off-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>
Signed-off-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>
Signed-off-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>
Signed-off-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>
This reverts commit 89f6d62.
Signed-off-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>
This reverts commit 4ebe88e.
@@ -375,6 +376,26 @@ func (r *RayJobReconciler) getSubmitterTemplate(rayJobInstance *rayv1alpha1.RayJ
r.Log.Info("User-provided command is used", "command", submitterTemplate.Spec.Containers[0].Command)
}

// Set PYTHONUNBUFFERED=1 for real-time logging, unless user already set it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can reuse envVarExists.

func envVarExists(envName string, envVars []v1.EnvVar) bool {

assert.NoError(t, err)
pythonUnbufferedSet := false
for _, envVar := range submitterTemplate.Spec.Containers[0].Env {
if envVar.Name == "PYTHONUNBUFFERED" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use PythonUnbufferedEnvVarName instead?

userValue := "0"
rayJobInstanceWithTemplate.Spec.SubmitterPodTemplate.Spec.Containers[0].Env = []corev1.EnvVar{
{
Name: "PYTHONUNBUFFERED",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use PythonUnbufferedEnvVarName instead?

submitterTemplate, err = r.getSubmitterTemplate(rayJobInstanceWithTemplate)
assert.NoError(t, err)
for _, envVar := range submitterTemplate.Spec.Containers[0].Env {
if envVar.Name == "PYTHONUNBUFFERED" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use PythonUnbufferedEnvVarName instead?

submitterTemplate, err = r.getSubmitterTemplate(rayJobInstanceWithTemplate)
assert.NoError(t, err)
for _, envVar := range submitterTemplate.Spec.Containers[0].Env {
if envVar.Name == "PYTHONUNBUFFERED" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible for the test to pass without entering this if statement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Signed-off-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>
@architkulkarni
Copy link
Contributor Author

@kevin85421 Comments have been addressed, please take another look!

// Set PYTHONUNBUFFERED=1 for real-time logging, unless user already set it.
userProvidedValue := ""
if common.EnvVarExists(PythonUnbufferedEnvVarName, submitterTemplate.Spec.Containers[0].Env) {
for _, envVar := range submitterTemplate.Spec.Containers[0].Env {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit weird to iterate through the loop again after calling EnvVarExists.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may consider returning bool, EnvVar.

if common.EnvVarExists(PythonUnbufferedEnvVarName, submitterTemplate.Spec.Containers[0].Env) {
for _, envVar := range submitterTemplate.Spec.Containers[0].Env {
if envVar.Name == PythonUnbufferedEnvVarName {
userProvidedValue = envVar.Value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic doesn't seem entirely correct to me, especially in cases where users can use ValueFrom instead of Value.

Signed-off-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>
@architkulkarni
Copy link
Contributor Author

@kevin85421 updated the PR to not check if the env var was already set, as discussed offline, because we don't anticipate any use case for PYTHONUNBUFFERED=0.

Copy link
Member

@kevin85421 kevin85421 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can use submitterTemplate.Spec.Containers[common.RayContainerIndex] to replace submitterTemplate.Spec.Containers[0]. You can either update it in the PR or open an issue as a good first issue.

@@ -375,6 +376,12 @@ func (r *RayJobReconciler) getSubmitterTemplate(rayJobInstance *rayv1alpha1.RayJ
r.Log.Info("User-provided command is used", "command", submitterTemplate.Spec.Containers[0].Command)
}

// Set PYTHONUNBUFFERED=1 for real-time logging
submitterTemplate.Spec.Containers[0].Env = append(submitterTemplate.Spec.Containers[0].Env, v1.EnvVar{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: What will happen if users specify PYTHONUNBUFFERED on their own? I am not sure whether defining the same env variable twice in a container will throw an error message or override the first value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the second definition will override the first one: https://www.baeldung.com/linux/kubernetes-pod-environment-variables

Signed-off-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>
Signed-off-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>
@kevin85421
Copy link
Member

Maybe we can use submitterTemplate.Spec.Containers[common.RayContainerIndex] to replace submitterTemplate.Spec.Containers[0]. You can either update it in the PR or open an issue as a good first issue.

Open a good first issue #1397.

@kevin85421 kevin85421 merged commit 9382c1f into ray-project:master Sep 6, 2023
20 of 21 checks passed
z103cb pushed a commit to z103cb/kuberay that referenced this pull request Sep 11, 2023
…b container (ray-project#1375)

Enable job log streaming by setting `PYTHONUNBUFFERED` in job container
lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
…b container (ray-project#1375)

Enable job log streaming by setting `PYTHONUNBUFFERED` in job container
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] RayJob does not stream logs in real time
2 participants