Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Ready for review] Initial http jobs server on head node #19657

Merged
merged 4 commits into from
Oct 23, 2021

Conversation

jiaodong
Copy link
Member

@jiaodong jiaodong commented Oct 22, 2021

After syncing with Ant this week, they confirmed this folder is not used anymore as they use internal implementation only. So in this PR we can just strip it to the minimum and redo a simple http server.

Changes

  • Removed existing job server basically completely .. with only class JobHead(dashboard_utils.DashboardHeadModule): left
  • Added pydantic types for each API's request / request as well as job spec
  • Extended dashboard_utils.rest_response() to optionally not google style keys (job_id -> jobId)
  • Added and verified local + s3 working_dir tests passed with http api calls.
  • Changed python job_manager not to call supervisor actor if None is returned (actor exited)

Test plan

Closes #19413

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Copy link
Contributor

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good. Do you plan to use pydantic types for validating inputs?

Comment on lines 37 to 38
if not ray.is_initialized():
ray.init(address="auto")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we centralize this ray.init logic to one place? Maybe make it a decorator?

Also, we should use some kind of internal namespace ("_ray_internal_jobs"?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decorator sounds great, looks like ray.init() has to happen at api calls (run() didn't work either) but at least only called once.

internal namespace sounds great too !

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i will refuse to ship job server without pydantic type checks so this has to be included

@jiaodong jiaodong changed the title [WIP] Initial http jobs server on head node [Ready for review] Initial http jobs server on head node Oct 23, 2021
@jiaodong jiaodong marked this pull request as ready for review October 23, 2021 02:17
@edoakes
Copy link
Contributor

edoakes commented Oct 23, 2021

Windows failures look unrelated and same tests also have failures on master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Job API server for job submission
4 participants