[serve] Clip autoscaling aggregation window to shed ramp overshoot #64509
Open
johntaylor-cell wants to merge 4 commits into
Code Review
This pull request introduces a mechanism to clip the autoscaling aggregation window to the most recent N seconds using a new environment variable RAY_SERVE_AUTOSCALE_CLIP_WINDOW_S. This prevents stale ramp transients from skewing the aggregation mean. Feedback suggests preserving the None state in _clip_window_start when no clipping is needed, updating the unit tests accordingly, and using the existing get_env_float_non_negative helper to parse the environment variable consistently.
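The parsing suggestion can be sketched as below. read_non_negative_float is a hypothetical stand-in: the sketch assumes, without confirming, that get_env_float_non_negative returns the default when the variable is unset and rejects negative values. It is not the helper's actual code.

```python
import os


def read_non_negative_float(name: str, default: float) -> float:
    """Hypothetical stand-in for get_env_float_non_negative (assumed contract)."""
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        # Unset or empty: fall back to the default.
        return default
    value = float(raw)  # malformed input raises ValueError
    if value < 0:
        raise ValueError(f"{name} must be non-negative, got {value}")
    return value


# 0 (the default) means clipping is off.
CLIP_WINDOW_S = read_non_negative_float("RAY_SERVE_AUTOSCALE_CLIP_WINDOW_S", 0.0)
```

Rejecting negatives at parse time keeps the downstream check to a single "> 0 means on" comparison.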
9b127a6 to 0a4a505
0a4a505 to b4dc994
Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: john.taylor <john.taylor@anyscale.com>
b4dc994 to 280409c
Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: john.taylor <john.taylor@anyscale.com>
Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: john.taylor <john.taylor@anyscale.com>
…ardinality_metric_tags timeout; unrelated) Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: john.taylor <john.taylor@anyscale.com>
Under free autoscaling with target_ongoing_requests=1, the desired replica count tracks get_total_num_requests, a time-weighted mean over roughly the last 30s. During a ramp, the queued high-water mark stays in that mean for about 30s after the ramp settles. The total therefore reads above the true steady-state count, the deployment never leaves "_in_transition", and, when downscale_delay is much larger than the marination window, the overshoot is never shed.
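To make the overshoot concrete, here is a minimal sketch assuming each sample holds its value until the next one (a step function) and a 30s window. The ramp numbers and the time_weighted_mean function are hypothetical illustrations, not the Ray Serve implementation.

```python
from typing import List, Tuple

# (timestamp_s, value) samples; each value is assumed to hold until the next sample.
Series = List[Tuple[float, float]]


def time_weighted_mean(series: Series, window_start: float, now: float) -> float:
    """Time-weighted mean of a step-function series over [window_start, now]."""
    total = covered = 0.0
    for i, (t, v) in enumerate(series):
        seg_end = series[i + 1][0] if i + 1 < len(series) else now
        lo, hi = max(t, window_start), min(seg_end, now)
        if hi > lo:
            total += v * (hi - lo)
            covered += hi - lo
    return total / covered if covered > 0 else 0.0


# Hypothetical ramp: 40 queued requests for the first 10s, then a steady 10.
ramp = [(0.0, 40.0), (10.0, 10.0)]
WINDOW_S = 30.0
for now in (30.0, 35.0, 40.0):
    print(now, time_weighted_mean(ramp, now - WINDOW_S, now))
# 30.0 20.0
# 35.0 15.0
# 40.0 10.0
```

With target_ongoing_requests=1, a mean of 20 asks for roughly twice the 10 replicas the steady state needs, and the excess only drains once the ramp has aged out of the window, 30s after it settled.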
Add RAY_SERVE_AUTOSCALE_CLIP_WINDOW_S (default 0, meaning off). When it is > 0, clip the aggregation window_start to the most recent N seconds, where N is the flag's value, so the stale ramp transient is no longer averaged in. This is implemented as a pure module-level helper, _clip_window_start, applied in _merge_and_aggregate_timeseries right after the partial-period exclusion.
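A minimal sketch of the clip, assuming window_start is the value left after the partial-period exclusion and that None means "no lower bound". clip_window_start and aggregate are hypothetical stand-ins for _clip_window_start and _merge_and_aggregate_timeseries, whose real signatures are not shown here; the single-series step-function mean is also an assumption.

```python
from typing import List, Optional, Tuple

# (timestamp_s, value) samples; each value is assumed to hold until the next sample.
Series = List[Tuple[float, float]]


def time_weighted_mean(series: Series, window_start: float, now: float) -> float:
    """Time-weighted mean of a step-function series over [window_start, now]."""
    total = covered = 0.0
    for i, (t, v) in enumerate(series):
        seg_end = series[i + 1][0] if i + 1 < len(series) else now
        lo, hi = max(t, window_start), min(seg_end, now)
        if hi > lo:
            total += v * (hi - lo)
            covered += hi - lo
    return total / covered if covered > 0 else 0.0


def clip_window_start(
    window_start: Optional[float], now: float, clip_window_s: float
) -> Optional[float]:
    """Hypothetical stand-in for _clip_window_start."""
    if clip_window_s <= 0:
        # Flag off (default): return the input untouched, including None.
        return window_start
    clip_start = now - clip_window_s
    if window_start is None:
        return clip_start
    # Only ever move the start later, never earlier.
    return max(window_start, clip_start)


def aggregate(
    series: Series, now: float, window_start: Optional[float], clip_window_s: float
) -> float:
    """Hypothetical single-series stand-in for _merge_and_aggregate_timeseries.

    Assumes a non-empty series and that window_start already reflects the
    partial-period exclusion.
    """
    window_start = clip_window_start(window_start, now, clip_window_s)
    if window_start is None:
        window_start = series[0][0]  # no lower bound: use all samples
    return time_weighted_mean(series, window_start, now)


ramp = [(0.0, 40.0), (10.0, 10.0)]
print(aggregate(ramp, now=30.0, window_start=0.0, clip_window_s=0.0))   # 20.0
print(aggregate(ramp, now=30.0, window_start=0.0, clip_window_s=10.0))  # 10.0
```

With the flag at 0, the window and the result are untouched; with a 10s clip, the stale 40-request segment falls outside the window and the mean reads the steady-state 10.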
Behavior is unchanged with the flag off (the default). The helper mirrors the clip used by the columnar autoscaling-ingest merge path; once that path lands, the two will share one helper so the timeseries and columnar clips stay in lockstep.