Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[autoscaler] run setup commands with restart_only=True #13836

Merged

Conversation

ijrsvt
Copy link
Contributor

@ijrsvt ijrsvt commented Feb 1, 2021

Why are these changes needed?

The Docker Command Runner may restart containers, requiring setup_commands to be re-run.

NOTE: This does not apply to workers at the moment, as per #13861

Related issue number

Closes #13587

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Copy link
Contributor Author

@ijrsvt ijrsvt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also do this for init_commands in autoscaler.py (WORKERS)

Copy link
Contributor Author

@ijrsvt ijrsvt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO(ilr): Make sure to validate with workers!

@ijrsvt
Copy link
Contributor Author

ijrsvt commented Feb 2, 2021

@AmeerHajAli I'm not sure what I should do with respect to worker nodes, I just realized that we don't have a field in ray-schema.json for "restart_only". We try to get that value here in autoscaler.py, but we never set it in commands.py. Should I:

  1. Leave this PR as only for head node and open an issue for adding it to worker nodes
  2. Open a separate PR & fix it for worker nodes (then merge this PR on top of that)

@AmeerHajAli
Copy link
Contributor

@AmeerHajAli I'm not sure what I should do with respect to worker nodes, I just realized that we don't have a field in ray-schema.json for "restart_only". We try to get that value here in autoscaler.py, but we never set it in commands.py. Should I:

  1. Leave this PR as only for head node and open an issue for adding it to worker nodes
  2. Open a separate PR & fix it for worker nodes (then merge this PR on top of that)

This is a good catch @ijrsvt. Lets go with option 1 for now.
Thanks!

@ijrsvt ijrsvt force-pushed the setup-commands-docker-restart-only branch from 287a6b7 to c0e193f Compare February 9, 2021 16:30
Copy link
Contributor

@AmeerHajAli AmeerHajAli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@richardliaw
Copy link
Contributor

richardliaw commented Feb 10, 2021

haven't tried this out but looks fine - thanks @ijrsvt !

@richardliaw richardliaw changed the title run setup commands with restart_only=True [autoscaler] run setup commands with restart_only=True Feb 10, 2021
@ijrsvt ijrsvt added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Feb 11, 2021
@AmeerHajAli
Copy link
Contributor

@richardliaw , can you please merge?

@richardliaw richardliaw merged commit f6cfc44 into ray-project:master Feb 11, 2021
fishbone added a commit to fishbone/ray that referenced this pull request Feb 16, 2021
fishbone added a commit to fishbone/ray that referenced this pull request Feb 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests-ok The tagger certifies test failures are unrelated and assumes personal liability.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[docker] Restarting container needs to re-run setup commands
3 participants