[serve] fix lightweight update max ongoing requests #45006

Merged: 1 commit, Apr 29, 2024

@zcin zcin commented Apr 27, 2024

[serve] fix lightweight update max ongoing requests

When a lightweight update changes `max_ongoing_requests` for a deployment, two components need to be notified:

  1. Deployment handles, so they stop sending requests to a replica that has reached its maximum
  2. Replicas, so they reject requests once they have reached their maximum

Right now we handle (1) but not (2): replicas aren't notified of the updated `max_ongoing_requests` on lightweight updates. Since (1) relies on a cache that can be stale, it is not strict enforcement of `max_ongoing_requests`. The bug, then, is that replicas aren't updated, so the updated max is not fully enforced.

This PR fixes that, and updates a test to fully test this behavior.
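
To make the failure mode concrete, here is a minimal toy sketch of the two enforcement layers described above. It is not Ray Serve's implementation: `ToyReplica`, `ToyHandle`, `simulate`, and all field names are hypothetical, and the stale cache is modeled as a plain integer that lags behind the replica's real counter.

```python
from dataclasses import dataclass


@dataclass
class ToyReplica:
    """Hypothetical stand-in for a replica; not Ray Serve's actual class."""

    max_ongoing_requests: int
    num_ongoing: int = 0

    def try_accept(self) -> bool:
        # Strict enforcement: the replica checks its own, always-current counter.
        if self.num_ongoing >= self.max_ongoing_requests:
            return False
        self.num_ongoing += 1
        return True

    def reconfigure(self, max_ongoing_requests: int) -> None:
        self.max_ongoing_requests = max_ongoing_requests


@dataclass
class ToyHandle:
    """Hypothetical stand-in for a deployment handle with a cached queue length."""

    max_ongoing_requests: int
    cached_num_ongoing: int = 0  # may lag behind the replica's real counter

    def send(self, replica: ToyReplica) -> bool:
        # Best-effort check against the (possibly stale) cache.
        if self.cached_num_ongoing >= self.max_ongoing_requests:
            return False
        return replica.try_accept()


def simulate(notify_replica: bool) -> bool:
    """Lower the max from 5 to 2 while the handle's cache is stale."""
    replica = ToyReplica(max_ongoing_requests=5, num_ongoing=2)
    # The cache says 0 ongoing requests, but the replica really has 2.
    handle = ToyHandle(max_ongoing_requests=5, cached_num_ongoing=0)

    # Lightweight update: the handle always learns the new max.
    handle.max_ongoing_requests = 2
    if notify_replica:
        replica.reconfigure(max_ongoing_requests=2)

    # True means a third request got through despite the new max of 2.
    return handle.send(replica)


print("replica not notified, request accepted:", simulate(notify_replica=False))
print("replica notified, request accepted:", simulate(notify_replica=True))
```

In this model the handle-side check alone lets the extra request through whenever its cache is stale; only the replica-side check, once it knows the new max, rejects it.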

Fixes #44975.

Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>

@zcin zcin changed the title from "[serve] fix bug" to "[serve] fix lightweight update max ongoing requests" Apr 27, 2024
@zcin zcin force-pushed the pr45006 branch 2 times, most recently from ca48357 to fe48404 April 27, 2024 05:31
@zcin zcin marked this pull request as ready for review April 29, 2024 16:26
@zcin zcin requested a review from a team April 29, 2024 16:26

@shrekris-anyscale shrekris-anyscale left a comment

Thanks for fixing this!

@@ -117,7 +117,7 @@ class DeploymentConfig(BaseModel):
     )
     max_ongoing_requests: PositiveInt = Field(
         default=DEFAULT_MAX_ONGOING_REQUESTS,
-        update_type=DeploymentOptionUpdateType.NeedsReconfigure,
+        update_type=DeploymentOptionUpdateType.NeedsActorReconfigure,

Contributor:

What is the behavior if the number of queued requests exceeds the new max? Does the replica simply drain the queue until it reaches the new max before accepting new requests?

Contributor Author:

Yes I believe so

Contributor:

yes that's correct
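
A toy sketch of the drain behavior confirmed in this thread, under the assumption that a replica simply compares a counter of ongoing requests against the current max. `DrainingReplica` and `drain_trace` are hypothetical names for illustration, not Ray Serve code.

```python
class DrainingReplica:
    """Hypothetical counter-based model of a replica; not Ray Serve's actual code."""

    def __init__(self, max_ongoing_requests: int) -> None:
        self.max_ongoing_requests = max_ongoing_requests
        self.num_ongoing = 0

    def try_accept(self) -> bool:
        # Reject whenever the ongoing count is at or above the current max.
        if self.num_ongoing >= self.max_ongoing_requests:
            return False
        self.num_ongoing += 1
        return True

    def finish_one(self) -> None:
        self.num_ongoing -= 1

    def reconfigure(self, max_ongoing_requests: int) -> None:
        # Requests already in flight are left alone; only the admission limit changes.
        self.max_ongoing_requests = max_ongoing_requests


def drain_trace() -> list:
    """Return (ongoing_before_attempt, accepted) pairs after lowering the max."""
    replica = DrainingReplica(max_ongoing_requests=4)
    for _ in range(4):
        replica.try_accept()
    replica.reconfigure(max_ongoing_requests=2)  # 4 ongoing > new max of 2

    trace = [(replica.num_ongoing, replica.try_accept())]
    for _ in range(3):
        replica.finish_one()
        ongoing_before = replica.num_ongoing
        trace.append((ongoing_before, replica.try_accept()))
    return trace


for ongoing, accepted in drain_trace():
    print(f"ongoing={ongoing} accepted={accepted}")
```

In this model, new requests are rejected until the ongoing count has dropped below the new max, at which point they are accepted again.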


@GeneDer GeneDer left a comment

LGTM!

@edoakes edoakes merged commit f59c553 into ray-project:master Apr 29, 2024
6 checks passed
@zcin zcin self-assigned this Apr 30, 2024
harborn pushed a commit to harborn/ray that referenced this pull request May 8, 2024
ryanaoleary pushed a commit to ryanaoleary/ray that referenced this pull request Jun 7, 2024

Successfully merging this pull request may close these issues.

[serve] strict enforcement of max ongoing requests doesn't work with lightweight update
4 participants