-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[serve] fix lightweight update max ongoing requests #45006
Conversation
ca48357
to
fe48404
Compare
When a lightweight update occurs for a deployment and `max_ongoing_requests` is updated, two components need to be notified: 1. Deployment handles, to know not to send more requests to a replica when it's reached its maximum 2. Replicas, to know to reject requests when it's reached its maximum Right now we handle (1), but we don't handle (2), i.e. replicas aren't notified of the updated `max_ongoing_requests` for lightweight updates. The problem is that (1) is not strict enforcement of `max_ongoing_requests` since it relies on a cache that can be stale, so the current bug is that replicas aren't updated -> updated max is not fully enforced. This PR fixes that, and updates a test to fully test this behavior. Fixes ray-project#44975. Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this!
@@ -117,7 +117,7 @@ class DeploymentConfig(BaseModel): | |||
) | |||
max_ongoing_requests: PositiveInt = Field( | |||
default=DEFAULT_MAX_ONGOING_REQUESTS, | |||
update_type=DeploymentOptionUpdateType.NeedsReconfigure, | |||
update_type=DeploymentOptionUpdateType.NeedsActorReconfigure, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the behavior if the number of queued requests exceeds the new max? Does the replica simply drain the queue until it reaches the new max before accepting new requests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I believe so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes that's correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
[serve] fix lightweight update max ongoing requests
When a lightweight update occurs for a deployment and
max_ongoing_requests
is updated, two components need to be notified:Right now we handle (1), but we don't handle (2), i.e. replicas aren't notified of the updated
max_ongoing_requests
for lightweight updates. The problem is that (1) is not strict enforcement ofmax_ongoing_requests
since it relies on a cache that can be stale, so the current bug is that replicas aren't updated -> updated max is not fully enforced.This PR fixes that, and updates a test to fully test this behavior.
Fixes #44975.
Signed-off-by: Cindy Zhang cindyzyx9@gmail.com