[serve] Rolling updates for user_config changes #15909

Merged on Jul 1, 2021 (19 commits)

@edoakes edoakes commented May 19, 2021

Why are these changes needed?

Config updates may be slow and cause replicas to be unavailable while updating, so we should do rolling updates for these as well as code changes.

This PR uses the existing codepath for rolling updates but adds the hash of the user_config to the version used to check for required updates. This requires the user_config to be hashable; see the BackendVersion implementation for details.
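To illustrate the idea of folding the user_config hash into the version, here is a minimal sketch. `SketchVersion` is a hypothetical stand-in written for this illustration; it is not the actual BackendVersion class from the PR, and the field names are assumptions.

```python
class SketchVersion:
    """Hypothetical version object: code version plus a hash of user_config."""

    def __init__(self, code_version, user_config=None):
        self.code_version = code_version
        # hash() raises TypeError for unhashable values (e.g. dict, list),
        # which is why the user_config has to be hashable in this scheme.
        self.user_config_hash = hash(user_config)

    def _key(self):
        return (self.code_version, self.user_config_hash)

    def __eq__(self, other):
        if not isinstance(other, SketchVersion):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


same = SketchVersion("v1", ("threshold", 0.5)) == SketchVersion("v1", ("threshold", 0.5))
config_changed = SketchVersion("v1", ("threshold", 0.5)) != SketchVersion("v1", ("threshold", 0.9))
```

With this shape, two versions compare equal only when both the code version and the user_config match, so a config-only change is visible to the same check that detects code changes.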

When the user-provided version is mismatched, we perform the existing rolling update. When only the user_config is mismatched, we call the reconfigure method and wait for it to return. This transitions the replica back to the STARTING_OR_UPDATING state.
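The two cases described above can be sketched as a small decision function. This is an illustration only: `UpdateAction` and `choose_update_action` are hypothetical names, and versions are modeled as plain `(code_version, user_config_hash)` tuples rather than the PR's real types.

```python
from enum import Enum


class UpdateAction(Enum):
    NONE = "none"                        # replica already matches the target
    RECONFIGURE = "reconfigure"          # only user_config differs
    ROLLING_UPDATE = "rolling_update"    # user-provided code version differs


def choose_update_action(current, target):
    """Pick an action for one replica.

    `current` and `target` are (code_version, user_config_hash) tuples;
    this representation is an assumption made for the sketch.
    """
    current_code, current_config_hash = current
    target_code, target_config_hash = target
    if current_code != target_code:
        # Code changed: go through the existing rolling-update path.
        return UpdateAction.ROLLING_UPDATE
    if current_config_hash != target_config_hash:
        # Only the config changed: call reconfigure and wait for it to return.
        return UpdateAction.RECONFIGURE
    return UpdateAction.NONE
```

A code version mismatch takes priority here, since a replica restarted with new code would pick up the new user_config anyway.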

Note that to maintain backwards compatibility this leaves the long poll client for backend configs in place, but once we remove the legacy update_backend_config codepath we can completely remove this from the backend_worker.

Note that #15820 is dependent on this change so that we can properly wait for config changes to go through (i.e., endpoints to be available).

Related issue number

Closes #15136

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@simon-mo
Contributor

cc @caitengwei @liuyang-my This PR should help with the common use case of reloading models into a running backend.

@simon-mo simon-mo left a comment

LGTM overall, just a few small comments.

Resolved review threads:
  • python/ray/serve/api.py (outdated)
  • python/ray/serve/backend_state.py (five threads, one outdated)
  • python/ray/serve/common.py
@jiaodong jiaodong added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jun 16, 2021
@edoakes edoakes merged commit a6051ea into ray-project:master Jul 1, 2021

Successfully merging this pull request may close these issues.

[serve] Support rolling updates for config changes