[serve] Clip autoscaling aggregation window to shed ramp overshoot#64509

Open
johntaylor-cell wants to merge 4 commits into ray-project:master from johntaylor-cell:serve-autoscale-clip

Conversation

@johntaylor-cell

Contributor

Under free autoscaling with target_ongoing_requests=1, the desired replica count tracks get_total_num_requests -- a ~30s time-weighted mean. During a ramp the queued high-water mark is averaged into that mean for ~30s after the ramp settles, so the total reads above the true steady-state count, the deployment never leaves "_in_transition", and (with downscale_delay >> the marination window) the overshoot is not shed.
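To make the overshoot concrete, here is a toy sketch of a time-weighted mean over a 30s window. It is not Serve's aggregation code: the function name `time_weighted_mean`, the step-function representation, and the numbers (a queued peak of 40 that settles to 10 at t=10s) are assumptions chosen for illustration.

```python
from typing import List, Tuple


def time_weighted_mean(
    points: List[Tuple[float, float]],
    window_start: float,
    window_end: float,
) -> float:
    """Mean of a step function over [window_start, window_end].

    `points` is a time-sorted list of (timestamp, value); each value holds
    until the next timestamp. Time before the first point counts as zero.
    """
    if window_end <= window_start:
        raise ValueError("window_end must be greater than window_start")
    total = 0.0
    for i, (t, value) in enumerate(points):
        next_t = points[i + 1][0] if i + 1 < len(points) else window_end
        seg_start = max(t, window_start)
        seg_end = min(next_t, window_end)
        if seg_end > seg_start:
            total += value * (seg_end - seg_start)
    return total / (window_end - window_start)


# Hypothetical ramp: 40 requests in flight until t=10s, then a steady 10.
ramp = [(0.0, 40.0), (10.0, 10.0)]

# 20s after the ramp settled, the 30s window still contains the peak.
print(time_weighted_mean(ramp, window_start=0.0, window_end=30.0))   # 20.0

# Only once the window has slid fully past the peak does the mean settle.
print(time_weighted_mean(ramp, window_start=10.0, window_end=40.0))  # 10.0
```

In this toy case the mean reads 20 while the true steady-state load is 10, so a policy that sizes replicas from this total would keep roughly twice the needed capacity until the peak ages out of the window.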

Add RAY_SERVE_AUTOSCALE_CLIP_WINDOW_S (default 0 = off): when > 0, clip the aggregation window_start to the most recent N seconds so the stale ramp transient is not averaged in. Implemented as a pure module-level helper _clip_window_start and applied in _merge_and_aggregate_timeseries right after the partial-period exclusion.
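A minimal sketch of what such a clip could look like follows. The name `clip_window_start_sketch`, its signature, and its handling of a missing `window_start` are assumptions made for illustration; the actual `_clip_window_start` in this PR may differ.

```python
from typing import Optional


def clip_window_start_sketch(
    window_start: Optional[float],
    window_end: float,
    clip_window_s: float,
) -> Optional[float]:
    """Move window_start forward so the window spans at most clip_window_s.

    Assumed behavior (not taken from the PR diff):
    - clip_window_s <= 0 means the flag is off; the input is returned as-is,
      including None.
    - When the flag is on and window_start is None, the clipped bound is used.
    """
    if clip_window_s <= 0:
        return window_start
    clipped = window_end - clip_window_s
    if window_start is None:
        return clipped
    # Never widen the window: keep the later of the two bounds.
    return max(window_start, clipped)


# Flag off: unchanged.
print(clip_window_start_sketch(0.0, 30.0, 0.0))    # 0.0
# Flag on with N=10: a 30s window shrinks to the last 10s.
print(clip_window_start_sketch(0.0, 30.0, 10.0))   # 20.0
# A window already shorter than N is left alone.
print(clip_window_start_sketch(25.0, 30.0, 10.0))  # 25.0
```

Keeping the helper pure (no reads of environment or global state) makes it straightforward to unit test and to reuse from a second call site.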

Behavior is unchanged with the flag off (default). The helper mirrors the clip used by the columnar autoscaling-ingest merge path; when that lands the two share one helper so the timeseries and columnar clips stay in lockstep.

@johntaylor-cell requested a review from a team as a code owner July 2, 2026 15:57

@gemini-code-assist (bot) left a comment

Contributor


Code Review

This pull request introduces a mechanism to clip the autoscaling aggregation window to the most recent N seconds using a new environment variable RAY_SERVE_AUTOSCALE_CLIP_WINDOW_S. This prevents stale ramp transients from skewing the aggregation mean. Feedback suggests preserving the None state in _clip_window_start when no clipping is needed, updating the unit tests accordingly, and using the existing get_env_float_non_negative helper to parse the environment variable consistently.
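For readers unfamiliar with the parsing concern, the following is a stand-in sketch of reading a non-negative float from the environment. It is not Ray's `get_env_float_non_negative`, whose signature and error behavior are not shown in this conversation; `read_non_negative_float_env` is a hypothetical name used only here.

```python
import os


def read_non_negative_float_env(name: str, default: float) -> float:
    """Read `name` from the environment as a float >= 0.

    Hypothetical stand-in, not Ray's helper. Unset or empty values fall back
    to `default`; non-numeric, negative, or NaN values raise ValueError.
    """
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        value = float(raw)
    except ValueError as exc:
        raise ValueError(f"{name}={raw!r} is not a number") from exc
    # `not (value >= 0)` also rejects NaN, which compares false to everything.
    if not (value >= 0):
        raise ValueError(f"{name}={raw!r} must be non-negative")
    return value


# Default 0 keeps the clip disabled unless the variable is set.
clip_window_s = read_non_negative_float_env("RAY_SERVE_AUTOSCALE_CLIP_WINDOW_S", 0.0)
```

Validating at parse time means a typo such as a negative window fails loudly at startup instead of silently disabling or distorting the clip.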

Comment thread python/ray/serve/_private/autoscaling_state.py Outdated
Comment thread python/ray/serve/tests/unit/test_autoscale_clip_window.py Outdated
Comment thread python/ray/serve/_private/constants.py Outdated
@johntaylor-cell self-assigned this Jul 2, 2026
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: john.taylor <john.taylor@anyscale.com>
@johntaylor-cell added the "go" label (add ONLY when ready to merge, run all tests) Jul 2, 2026
@ray-gardener (bot) added the "serve" label (Ray Serve Related Issue) Jul 2, 2026
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: john.taylor <john.taylor@anyscale.com>
johntaylor-cell and others added 2 commits July 3, 2026 11:38
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: john.taylor <john.taylor@anyscale.com>
…ardinality_metric_tags timeout; unrelated)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: john.taylor <john.taylor@anyscale.com>
Labels

go: add ONLY when ready to merge, run all tests
performance
serve: Ray Serve Related Issue

1 participant