-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[jobs] Add job manager class for simple jobs python APIs #19567
Conversation
Runs a command as a child process, streaming stderr & stdout to given | ||
log file. | ||
""" | ||
with open(log_file, "a+") as fin: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to gc the child if the JobSupervisor dies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep fate sharing and proper gc is one of the next steps, just putting up a working one first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good start, please re-request when you address the outstanding issues
Created for each submitted job from JobManager, runs on head node in same | ||
process as ray dashboard. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't run in the same process as the ray dashboard ? Also, why are you mentioning the ray dashboard in this file? That's a leaky abstraction. This file should be a standalone module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not mentioning dashboard here is good idea. Afaik dashboard runs on separate server process, but all job supervisor actors live in same ray process as where ray.init() was called on dashboard module ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ray actors live in separate processes from the driver they're created in
except ValueError: # Ray returns ValueError for nonexistent actor. | ||
return None | ||
|
||
def submit_job(self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to accept arbitrary key, value metadata that should be attached to the inner job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep we agreed on this since beginning. i put it in a separate PR
… subprocess exception test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good as a first cut, ping me to merge when tests are passing
Changes
The Job Manager class lives in ray private subfolder, intends to empower HTTP, cli and python APIs.
Test plan
`
❯ pytest test_job_manager.py -sv
Test session starts (platform: darwin, Python 3.8.10, pytest 5.4.3, pytest-sugar 0.9.4)
cachedir: .pytest_cache
rootdir: /Users/jiaodong/Workspace/ray/python
plugins: sugar-0.9.4, anyio-3.3.1, asyncio-0.15.1, timeout-1.4.2, lazy-fixture-0.6.3, rerunfailures-10.0
collecting ... 2021-10-21 17:38:21,719 INFO services.py:1331 -- View the Ray dashboard at http://127.0.0.1:8265
ray/_private/job_manager/tests/test_job_manager.py::test_submit_basic_echo ✓ 14% █▌
ray/_private/job_manager/tests/test_job_manager.py::test_submit_stderr ✓ 29% ██▉
ray/_private/job_manager/tests/test_job_manager.py::test_submit_ls_grep ✓ 43% ████▍
ray/_private/job_manager/tests/test_job_manager.py::test_subprocess_exception ✓ 57% █████▊
ray/_private/job_manager/tests/test_job_manager.py::test_submit_with_s3_runtime_env ✓ 71% ███████▎
ray/_private/job_manager/tests/test_job_manager.py::TestRuntimeEnv.test_inheritance ✓ 86% ████████▋
ray/_private/job_manager/tests/test_job_manager.py::TestRuntimeEnv.test_multiple_runtime_envs ✓ 100% ██████████
Results (5.94s):
7 passed
`
Next Steps
Related issue number
Closes #19414
Checks
scripts/format.sh
to lint the changes in this PR.