[serve] Change stopping behavior #43187
Conversation
Looks good
```diff
@@ -206,6 +206,7 @@ def __call__(self):
 @pytest.mark.skipif(sys.platform == "win32", reason="Failing on Windows.")
 @pytest.mark.parametrize("smoothing_factor", [1, 0.2])
 @pytest.mark.parametrize("use_upscale_downscale_config", [True, False])
+@mock.patch("ray.serve._private.router.HANDLE_METRIC_PUSH_INTERVAL_S", 1)
```
why's this needed?
In this test (for RAY_SERVE_COLLECT_AUTOSCALING_METRICS_ON_HANDLE=0), when the deployment scales back down to 0, the last metric report pushed from the handle is often a nonzero number (because the push interval is set to 10 seconds). So for the next 10 seconds the number of replicas keeps oscillating between 0 and 1 because of the outdated metric from the handle. I'm not sure why we never ran into this before, but this seems like a totally reasonable scenario, so I set the handle push interval lower to avoid making the test wait longer.
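For intuition, here's a toy sketch of that oscillation. This is illustrative only, not Ray Serve's actual autoscaling code: just a ceil-based policy applied to the stream of metric reports the controller might see after traffic stops.

```python
import math

def desired_replicas(avg_ongoing_requests: float, target_per_replica: float = 1.0) -> int:
    # Simple ceil-based scaling decision on the latest metric report.
    return math.ceil(avg_ongoing_requests / target_per_replica)

# After traffic stops, the handle's last push (a stale nonzero value, up
# to 10s old with the default interval) is interleaved with fresh zeros,
# so the decision flips between 1 and 0 until the stale report expires.
for report in [1.0, 0.0, 1.0, 0.0]:
    print(desired_replicas(report))  # 1, 0, 1, 0 -> replica count oscillates
```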
```python
# We wait for this to be satisfied at the end because there may be
# more than 3 worker nodes after the deployment finishes deploying,
# since replicas are being started and stopped at the same time, and
# there is a strict max replicas per node requirement. However nodes
```
This seems like a little bit of a nasty interaction. This may cause excessive resource fragmentation in some cases since we'll start a new node for the replacement replica, then the old replica will get removed.
I suppose the defragmentation work will address it...
Yup, this is definitely a case where this behavior change will have negative side effects... and agreed, we should aim to address this with the defragmentation work.
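To make the concern concrete, here is a back-of-the-envelope sketch. The helper function and numbers are made up for illustration; `max_replicas_per_node` is the real Serve deployment option being discussed.

```python
import math

# Hypothetical illustration: with a strict max_replicas_per_node, a
# replacement replica that starts before the old one finishes stopping
# temporarily raises the live replica count, which can force an extra
# node to start that is later left partially empty (fragmented).
def nodes_needed(num_replicas: int, max_replicas_per_node: int) -> int:
    return math.ceil(num_replicas / max_replicas_per_node)

print(nodes_needed(6, 2))      # 3 nodes at steady state
print(nodes_needed(6 + 1, 2))  # 4 nodes while a replacement overlaps a stopping replica
```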
@edoakes Tests are passing!
[serve] Change stopping behavior
Change from stop-fully-then-start to stop-then-immediately-start.
More concretely, previously we would make sure that the total number of replicas that are running or moving toward running, plus the number of stopping replicas, never exceeds the target. This PR changes the calculation so that stopping replicas are no longer counted. The reasoning is that replicas that take a long time to gracefully shut down should not prevent new replicas from starting, since that needlessly reduces availability.
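As a minimal sketch of the accounting change (function and parameter names here are illustrative, not Serve's actual internals; the boolean mirrors the RAY_SERVE_STOP_FULLY_THEN_START_REPLICAS flag):

```python
def replicas_to_start(
    target: int,
    running_or_starting: int,
    stopping: int,
    stop_fully_then_start: bool,
) -> int:
    """How many new replicas to launch this control-loop iteration."""
    if stop_fully_then_start:
        # Old behavior: stopping replicas still count against the target,
        # so replacements wait for graceful shutdown to finish.
        return max(0, target - (running_or_starting + stopping))
    # New behavior: stopping replicas are ignored, so replacements start
    # immediately while the old replicas drain.
    return max(0, target - running_or_starting)

# With target=3, two replicas running and one still stopping:
assert replicas_to_start(3, 2, 1, stop_fully_then_start=True) == 0   # wait for the drain
assert replicas_to_start(3, 2, 1, stop_fully_then_start=False) == 1  # start the replacement now
```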
Rollout plan:
* New behavior is enabled by default.
* Old behavior can be switched back on through the feature flag RAY_SERVE_STOP_FULLY_THEN_START_REPLICAS.
* Old behavior will be removed in the next release.
Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>
Stack created with Sapling. Best reviewed with ReviewStack.