
[serve] Change stopping behavior #43187

Merged · 1 commit · Feb 21, 2024

Conversation

@zcin (Contributor) commented Feb 15, 2024

[serve] Change stopping behavior

Change from stop-fully-then-start to stop-then-immediately-start.

More concretely, previously we made sure that the total number of replicas that are running (or moving towards running), plus the number of stopping replicas, never exceeds the target. This PR changes the calculation so that stopping replicas are no longer counted. The reasoning is that replicas that take a long time to gracefully shut down should not block new replicas from starting, since waiting for them to fully stop needlessly reduces availability.
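The accounting change can be sketched roughly like this (a minimal illustration with made-up names, not Ray Serve's actual controller code):

```python
# Hypothetical sketch of the accounting change; names are illustrative,
# not Ray Serve internals.

def replicas_to_start(target: int, running_or_starting: int, stopping: int,
                      stop_fully_then_start: bool = False) -> int:
    """How many new replicas the controller should launch this tick."""
    if stop_fully_then_start:
        # Old behavior: stopping replicas still count against the target,
        # so a slow graceful shutdown blocks its replacement from starting.
        occupied = running_or_starting + stopping
    else:
        # New behavior: stopping replicas are ignored, so the replacement
        # starts immediately while the old replica drains.
        occupied = running_or_starting
    return max(0, target - occupied)

# target=3, with 2 healthy replicas and 1 replica slowly shutting down:
print(replicas_to_start(3, 2, 1, stop_fully_then_start=True))   # 0 (wait)
print(replicas_to_start(3, 2, 1, stop_fully_then_start=False))  # 1 (start now)
```

Under the old accounting the replacement waits for the drain to finish; under the new accounting it starts right away.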

Rollout plan:

  • New behavior is enabled by default.
  • Old behavior can be switched back on through the feature flag RAY_SERVE_STOP_FULLY_THEN_START_REPLICAS.
  • Old behavior will be removed in the next release.

Signed-off-by: Cindy Zhang cindyzyx9@gmail.com


Stack created with Sapling. Best reviewed with ReviewStack.

@edoakes (Contributor) left a comment:

Looks good

python/ray/serve/_private/constants.py (outdated; resolved)
@@ -206,6 +206,7 @@ def __call__(self):
@pytest.mark.skipif(sys.platform == "win32", reason="Failing on Windows.")
@pytest.mark.parametrize("smoothing_factor", [1, 0.2])
@pytest.mark.parametrize("use_upscale_downscale_config", [True, False])
@mock.patch("ray.serve._private.router.HANDLE_METRIC_PUSH_INTERVAL_S", 1)
Contributor:

why's this needed?

@zcin (Contributor, Author) commented Feb 20, 2024:

In this test (with RAY_SERVE_COLLECT_AUTOSCALING_METRICS_ON_HANDLE=0), when the deployment scales back down to 0, the last metric report pushed from the handle is often a non-zero number, because the push interval is set to 10 seconds. So for the next 10 seconds the number of replicas keeps oscillating between 0 and 1 because of the outdated metric from the handle. I'm not sure why we never ran into this before, but this seems like a totally reasonable scenario, so I set the handle push interval lower to avoid making the test wait longer.
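The oscillation can be illustrated with a toy model (hypothetical names and deliberately simplified logic, not the real autoscaler):

```python
# Toy model of the stale-metric oscillation described above; not Ray Serve code.

def desired_replicas(reports: dict) -> int:
    """Scale to 0 only when every reporter shows zero ongoing requests."""
    return 1 if sum(reports.values()) > 0 else 0

# Traffic has stopped and the replica reports 0, but the handle's last
# push (from up to 10 s ago) still claims 1 ongoing request:
reports = {"replica": 0, "handle": 1}
print(desired_replicas(reports))  # 1 -> deployment scales back up

# Once the handle pushes a fresh report (sooner, with a lower push interval):
reports["handle"] = 0
print(desired_replicas(reports))  # 0 -> deployment can stay scaled down
```

Lowering the handle push interval shortens the window in which the stale report keeps resurrecting a replica.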

Comment on lines +169 to +172
# We wait for this to be satisfied at the end because there may be
# more than 3 worker nodes after the deployment finishes deploying,
# since replicas are being started and stopped at the same time, and
# there is a strict max replicas per node requirement. However nodes
Contributor:

This seems like a little bit of a nasty interaction. This may cause excessive resource fragmentation in some cases since we'll start a new node for the replacement replica, then the old replica will get removed.

I suppose the defragmentation work will address it...

Contributor Author:

Yup, this is definitely a case where this behavior change will have negative side effects... and agreed, we should aim to address this with the defragmentation work.

@zcin force-pushed the pr43187 branch 2 times, most recently from 5ae7a4b to 8aba50a on February 20, 2024 at 22:10
@zcin (Contributor, Author) commented Feb 21, 2024:

@edoakes Tests are passing!

@edoakes edoakes merged commit e2d9f42 into ray-project:master Feb 21, 2024
10 checks passed