Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core] Merge Driver/Job's runtime environment when it conflicts #39208

Merged

Conversation

rkooo567
Copy link
Contributor

@rkooo567 rkooo567 commented Sep 1, 2023

Why are these changes needed?

When the job and driver both specifies the runtime env, driver's runtime env is ignored. This is a confusing behavior and we'd like to merge instead.

This PR merges the runtime envs, and if there's a conflict raises an exception. User can override the job's runtime env by driver's runtime env by specifying the env var.

Related issue number

Closes #37930

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

doc/source/ray-core/handling-dependencies.rst Outdated Show resolved Hide resolved
# Driver's specified `runtime_env`
{"env_vars": {"C": "c"}}

# Driver's actual `runtime_env` (merged with Job's)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume by Driver's runtime env, you mean runtime_env specified in ray.init(), then the driver's actual runtime env can only be job's runtime env since ray.init runtime env only have effects on tasks and actors.

I'd avoid Driver's runtime_env, by saying runtime_env specified via ray.init()

Also there is another confusing thing: we normally say ray.init() runtime_env as job level runtime env but now there is another job level runtime env specified via job submission.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I will call it job-level runtime env specified ray job submit vs ray.init then

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote this part. Can you check?

Copy link
Contributor

@angelinalg angelinalg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits for style/consistency and clarification.

doc/source/ray-core/handling-dependencies.rst Outdated Show resolved Hide resolved
doc/source/ray-core/handling-dependencies.rst Outdated Show resolved Hide resolved
doc/source/ray-core/handling-dependencies.rst Outdated Show resolved Hide resolved
doc/source/ray-core/handling-dependencies.rst Outdated Show resolved Hide resolved
python/ray/runtime_env/runtime_env.py Outdated Show resolved Hide resolved
python/ray/runtime_env/runtime_env.py Outdated Show resolved Hide resolved
python/ray/runtime_env/runtime_env.py Outdated Show resolved Hide resolved
python/ray/runtime_env/runtime_env.py Outdated Show resolved Hide resolved
python/ray/runtime_env/runtime_env.py Outdated Show resolved Hide resolved
Copy link
Contributor

@architkulkarni architkulkarni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Some minor concerns:

  • It looks like we've duplicated the merging code for job->driver and the existing merging code for driver->actor/task. Ideally both of these cases would just call into a single helper function (maybe with an arg raise_exception: bool)
  • Similarly the merge behavior is described in the docs in two different places, ideally there would be a single source of truth that we just link to from both places.

Runtime Environment Specified by Both Job and Driver
""""""""""""""""""""""""""""""""""""""""""""""""""""

You can specify Ray driver's runtime environment via `ray.init(runtime_env=...)` or `ray job submit --runtime-env`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ray driver's runtime env can only be specified via job submission. ray.init runtime env has no effect on the driver.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh great catch. ray.init() runtime env only sets the default for any subsequent task or actor's runtime env

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait, so if we do job submission, the driver script will already have a runtime env created? How does it work now? Is the runtime env created by supervisor actor first and the driver is started with the same config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it is already written in a pretty confusing way; https://docs.ray.io/en/latest/ray-core/handling-dependencies.html#specifying-a-runtime-environment-per-job (it considers as job submit & ray.init as "runtime env per job"). I think we should technically distinguish them in this case. Let me try rewriting this part

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjyao @architkulkarni I updated the doc based on the feedback again; 401bf31. Can you guys check again? if this doesn't satisfy the merge bar, I will create a meeting to discuss this.

Also @architkulkarni

It looks like we've duplicated the merging code for job->driver and the existing merging code for driver->actor/task. Ideally both of these cases would just call into a single helper function (maybe with an arg raise_exception: bool)

I think we shouldn't use the Python helper for everything because we'd like to keep the inheritance logic in cpp if possible (so that it will be language independent). That said, we ideally should merge the runtime env in cpp. It seems like the job submit runtime env inheritance is not even supported from other languages, so I think we can do it when we actually support them in multi languages.

Similarly the merge behavior is described in the docs in two different places, ideally there would be a single source of truth that we just link to from both places.

Addressed in 401bf31

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the runtime env created by supervisor actor first and the driver is started with the same config?

That's right, the driver is started by the actor as a subprocess so it inherits the cwd and env vars automatically without Ray having to do anything.

Copy link
Contributor

@architkulkarni architkulkarni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update looks good to me, just some nits about the text.

doc/source/ray-core/handling-dependencies.rst Outdated Show resolved Hide resolved
doc/source/ray-core/handling-dependencies.rst Outdated Show resolved Hide resolved

This ensures the runtime environment is installed on the cluster before the entrypoint script is run.
If ``runtime_env`` is specified from ``ray.init(runtime_env=...)``, the runtime env is only applied to all children Tasks and Actors, not the entrypoint script (Driver) itself.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I've seen, tasks, actors and drivers are not capitalized elsewhere in the doc. It would be great if you could double check and make it consistent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was recommended from @angelinalg. I will just stick to this it for now.

doc/source/ray-core/handling-dependencies.rst Outdated Show resolved Hide resolved
@rkooo567 rkooo567 merged commit b1356d7 into ray-project:master Sep 7, 2023
81 of 88 checks passed
rkooo567 added a commit to rkooo567/ray that referenced this pull request Sep 7, 2023
…project#39208)

When the job and driver both specifies the runtime env, driver's runtime env is ignored. This is a confusing behavior and we'd like to merge instead.

This PR merges the runtime envs, and if there's a conflict raises an exception. User can override the job's runtime env by driver's runtime env by specifying the env var.
harborn pushed a commit to harborn/ray that referenced this pull request Sep 8, 2023
…project#39208)

When the job and driver both specifies the runtime env, driver's runtime env is ignored. This is a confusing behavior and we'd like to merge instead.

This PR merges the runtime envs, and if there's a conflict raises an exception. User can override the job's runtime env by driver's runtime env by specifying the env var.
jimthompson5802 pushed a commit to jimthompson5802/ray that referenced this pull request Sep 12, 2023
…project#39208)

When the job and driver both specifies the runtime env, driver's runtime env is ignored. This is a confusing behavior and we'd like to merge instead.

This PR merges the runtime envs, and if there's a conflict raises an exception. User can override the job's runtime env by driver's runtime env by specifying the env var.

Signed-off-by: Jim Thompson <jimthompson5802@gmail.com>
vymao pushed a commit to vymao/ray that referenced this pull request Oct 11, 2023
…project#39208)

When the job and driver both specifies the runtime env, driver's runtime env is ignored. This is a confusing behavior and we'd like to merge instead.

This PR merges the runtime envs, and if there's a conflict raises an exception. User can override the job's runtime env by driver's runtime env by specifying the env var.

Signed-off-by: Victor <vctr.y.m@example.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants