[serve] Support rolling updates for config changes #15136

Closed
edoakes opened this issue Apr 6, 2021 · 1 comment · Fixed by #15909
Labels
enhancement Request for new feature and/or capability P2 Important issue, but not time-critical serve Ray Serve Related Issue

Comments

@edoakes
Contributor

edoakes commented Apr 6, 2021

Now that we do rolling updates for code changes, we should also consider doing the same for config changes. Applying a config change may be very slow (e.g., if the new config makes each replica load a new heavyweight model), which could cause downtime if all replicas apply it simultaneously.

This should be straightforward to implement: add a new replica state, ReplicaState.UPDATING_CONFIG, along with the relevant state transitions.
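A minimal sketch of how the proposed state could fit into a controller loop, assuming a simplified model. Only the name `UPDATING_CONFIG` comes from the issue; the other `ReplicaState` members, the `Replica` dataclass, `reconcile_config`, `finish_update`, and the `max_concurrent` parameter are hypothetical and are not Ray Serve's actual internals. A replica moves from RUNNING to UPDATING_CONFIG and back, and the controller caps how many replicas are in UPDATING_CONFIG at once. This is analogous to the one-replica-at-a-time teardown already used for code changes.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ReplicaState(Enum):
    # Hypothetical subset of replica states; only UPDATING_CONFIG is from the issue.
    STARTING = auto()
    RUNNING = auto()
    UPDATING_CONFIG = auto()
    STOPPING = auto()


# Allowed transitions, including the proposed UPDATING_CONFIG state.
ALLOWED = {
    ReplicaState.STARTING: {ReplicaState.RUNNING, ReplicaState.STOPPING},
    ReplicaState.RUNNING: {ReplicaState.UPDATING_CONFIG, ReplicaState.STOPPING},
    ReplicaState.UPDATING_CONFIG: {ReplicaState.RUNNING, ReplicaState.STOPPING},
    ReplicaState.STOPPING: set(),
}


@dataclass
class Replica:
    # Hypothetical replica record tracked by the controller.
    tag: str
    config: str
    state: ReplicaState = ReplicaState.RUNNING

    def transition(self, new_state: ReplicaState) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.tag}: {self.state.name} -> {new_state.name} not allowed"
            )
        self.state = new_state


def reconcile_config(replicas, target_config, max_concurrent=1):
    """One control-loop step of a rolling config update (hypothetical).

    Moves outdated RUNNING replicas into UPDATING_CONFIG, never letting more
    than `max_concurrent` replicas be in that state at the same time.
    Returns the tags of replicas that started updating in this step.
    """
    in_flight = sum(r.state is ReplicaState.UPDATING_CONFIG for r in replicas)
    started = []
    for r in replicas:
        if in_flight >= max_concurrent:
            break
        if r.state is ReplicaState.RUNNING and r.config != target_config:
            r.transition(ReplicaState.UPDATING_CONFIG)
            in_flight += 1
            started.append(r.tag)
    return started


def finish_update(replica, target_config):
    # Called once the replica's reconfigure() has returned with the new config.
    replica.config = target_config
    replica.transition(ReplicaState.RUNNING)
```

With two replicas and `max_concurrent=1`, the first step moves only one replica into UPDATING_CONFIG while the other keeps serving; the second replica is only updated after `finish_update` returns the first one to RUNNING.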

@edoakes edoakes added enhancement Request for new feature and/or capability P2 Important issue, but not time-critical serve Ray Serve Related Issue labels Apr 6, 2021
@edoakes edoakes added this to the [serve] v2 API milestone Apr 6, 2021
@edoakes
Contributor Author

edoakes commented Apr 20, 2021

```python
import os
import time

from ray import serve

serve.start(detached=True)

@serve.deployment(version="1", num_replicas=2, user_config="test")
class MyDeployment:
    def __init__(self):
        print(f"Replica {serve.get_replica_context().replica_tag} starting up.")

    def reconfigure(self, config):
        # Called with the deployment's user_config. The sleep stands in for
        # slow work such as loading a new heavyweight model.
        print(f"Replica {serve.get_replica_context().replica_tag} reconfiguring.")
        time.sleep(1)

    def __call__(self):
        return os.getpid()

# Initial deployment, replicas come up as they're ready.
MyDeployment.deploy()

# Redeploy new version. Rolling update to tear one replica down at a time.
MyDeployment.options(version="2").deploy()

# Config change. Replicas are long-polling for config changes so they may all
# call reconfigure() with the new user_config at once, which could mean no
# requests are able to be handled if reconfigure is slow.
MyDeployment.options(version="2", user_config="updated").deploy()
```
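To make the failure mode in the repro concrete without a Ray cluster, here is a back-of-the-envelope availability model. It is a simplified, hypothetical model, not a measurement of Ray Serve: a replica serves no traffic while reconfiguring, every reconfigure takes the same number of ticks (think one tick per second, matching the `time.sleep(1)` above), and the next batch starts only after the previous one finishes. `simulate_config_update` and `max_concurrent` are illustrative names.

```python
def simulate_config_update(num_replicas, reconfigure_ticks, max_concurrent):
    """Return (min_available, total_ticks) for a batched config update.

    Hypothetical model: replicas in the current batch are unavailable for the
    whole batch, and batches run one after another.
    """
    if num_replicas < 1 or reconfigure_ticks < 1 or max_concurrent < 1:
        raise ValueError("all arguments must be >= 1")
    available_per_tick = []
    remaining = num_replicas
    while remaining > 0:
        batch = min(max_concurrent, remaining)
        # Record how many replicas can still serve during each tick of this batch.
        available_per_tick.extend([num_replicas - batch] * reconfigure_ticks)
        remaining -= batch
    return min(available_per_tick), len(available_per_tick)


# Same shape as the repro: 2 replicas, reconfigure() takes about one tick.
all_at_once = simulate_config_update(2, 1, max_concurrent=2)
rolling = simulate_config_update(2, 1, max_concurrent=1)
print(all_at_once)  # (0, 1): no replica can serve while both reconfigure
print(rolling)      # (1, 2): one replica always serves, but it takes twice as long
```

The trade-off is the same as for rolling code updates: capping concurrent reconfigures keeps capacity above zero at the cost of a longer total update.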
