
[serve] Add exponential backoff for queue_len_response_deadline_s #42041

Merged: edoakes merged 7 commits into ray-project:master on Dec 21, 2023

@edoakes (Contributor) commented Dec 20, 2023

Why are these changes needed?

We currently use a flat deadline (0.1s by default) when waiting for a replica's queue length response. Under heavy load or high network latency, this deadline may be missed consistently, causing requests to pile up because they can't be scheduled.

#42001 made this deadline configurable, but setting it high by default defeats its purpose: reducing tail latency when a single replica is overloaded, blocked, or unresponsive.

This change backs off the deadline exponentially, doubling it after each missed response, so the initial deadline can stay low while avoiding "halting" under degraded conditions.

The maximum deadline is 1s by default and can be configured using `RAY_SERVE_MAX_QUEUE_LENGTH_RESPONSE_DEADLINE_S`.
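The backoff schedule described above can be sketched as a small pure function. This is a minimal illustration, not Ray Serve's implementation: the helper name `get_deadlines` is hypothetical, the doubling factor is inferred from the example log output later in this thread (0.1s, 0.2s, 0.4s, 0.8s), and it assumes the initial deadline is read from `RAY_SERVE_QUEUE_LENGTH_RESPONSE_DEADLINE_S` (the variable named in the warning message) and the cap from `RAY_SERVE_MAX_QUEUE_LENGTH_RESPONSE_DEADLINE_S`.

```python
import os
from typing import List, Optional


def get_deadlines(
    num_attempts: int,
    initial_s: Optional[float] = None,
    max_s: Optional[float] = None,
) -> List[float]:
    """Return the per-attempt deadline (in seconds) for num_attempts attempts.

    Hypothetical helper for illustration only; not part of Ray Serve.
    """
    # Assumed env var names and defaults, taken from this PR's description and logs.
    if initial_s is None:
        initial_s = float(os.environ.get("RAY_SERVE_QUEUE_LENGTH_RESPONSE_DEADLINE_S", "0.1"))
    if max_s is None:
        max_s = float(os.environ.get("RAY_SERVE_MAX_QUEUE_LENGTH_RESPONSE_DEADLINE_S", "1.0"))

    deadlines: List[float] = []
    deadline = initial_s
    for _ in range(num_attempts):
        # Cap each attempt at max_s so the wait per attempt stays bounded.
        deadlines.append(min(deadline, max_s))
        deadline *= 2  # exponential backoff: double after each missed response
    return deadlines
```

With neither environment variable set, `get_deadlines(6)` yields `[0.1, 0.2, 0.4, 0.8, 1.0, 1.0]`: the first attempt stays fast for the common case, and only repeated misses push the deadline toward the cap.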

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@edoakes edoakes changed the title from "[WIP][serve]" to "[WIP][serve] backoff for queue len response" Dec 20, 2023
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@edoakes edoakes changed the title from "[WIP][serve] backoff for queue len response" to "[serve] Add exponential backoff for queue_len_response_deadline_s" Dec 21, 2023
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@edoakes edoakes requested review from alexeykudinkin and a team December 21, 2023 17:06
@edoakes (Contributor Author) commented Dec 21, 2023

Example log output after artificially injecting a 0.8s sleep into the queue length response:

(ProxyActor pid=33502) WARNING 2023-12-21 11:06:20,159 proxy 127.0.0.1 router.py:740 - Failed to get queue length from replica default#A#u3m0im2u within 0.1s. If this happens repeatedly it's likely caused by high network latency in the cluster. You can configure the deadline using the `RAY_SERVE_QUEUE_LENGTH_RESPONSE_DEADLINE_S` environment variable.
(ProxyActor pid=33502) WARNING 2023-12-21 11:06:20,361 proxy 127.0.0.1 router.py:740 - Failed to get queue length from replica default#A#u3m0im2u within 0.2s. If this happens repeatedly it's likely caused by high network latency in the cluster. You can configure the deadline using the `RAY_SERVE_QUEUE_LENGTH_RESPONSE_DEADLINE_S` environment variable.
(ProxyActor pid=33502) WARNING 2023-12-21 11:06:20,815 proxy 127.0.0.1 router.py:740 - Failed to get queue length from replica default#A#u3m0im2u within 0.4s. If this happens repeatedly it's likely caused by high network latency in the cluster. You can configure the deadline using the `RAY_SERVE_QUEUE_LENGTH_RESPONSE_DEADLINE_S` environment variable.
(ProxyActor pid=33502) WARNING 2023-12-21 11:06:21,718 proxy 127.0.0.1 router.py:740 - Failed to get queue length from replica default#A#u3m0im2u within 0.8s. If this happens repeatedly it's likely caused by high network latency in the cluster. You can configure the deadline using the `RAY_SERVE_QUEUE_LENGTH_RESPONSE_DEADLINE_S` environment variable.
(ServeReplica:default:A pid=33504) INFO 2023-12-21 11:06:22,688 default_A u3m0im2u c39176f8-7c1b-4966-bd5b-c45135dc3e46 / replica.py:745 - __CALL__ OK 0.1ms
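The log above can be reproduced in miniature with an asyncio simulation. This is a sketch under stated assumptions, not Serve's router code: `fake_queue_len_response` and `probe_with_backoff` are hypothetical, the simulated replica always answers after a fixed delay, and every retry sends a fresh request to the same replica (the real router may behave differently). Setting `max_s` equal to `initial_s` reproduces the old flat deadline, which never succeeds when the replica is consistently slower than the deadline.

```python
import asyncio
from typing import List


async def fake_queue_len_response(delay_s: float) -> int:
    # Hypothetical replica: reports an empty queue after a fixed delay.
    await asyncio.sleep(delay_s)
    return 0


async def probe_with_backoff(
    delay_s: float,
    initial_s: float = 0.1,
    max_s: float = 1.0,
    max_attempts: int = 10,
) -> List[float]:
    """Probe until a response arrives; return the deadlines tried (the last one succeeded)."""
    tried: List[float] = []
    deadline = min(initial_s, max_s)
    for _ in range(max_attempts):
        tried.append(deadline)
        try:
            # Each attempt is a fresh request bounded by the current deadline.
            await asyncio.wait_for(fake_queue_len_response(delay_s), timeout=deadline)
            return tried
        except asyncio.TimeoutError:
            print(f"Failed to get queue length within {deadline}s.")
            # Double the deadline for the next attempt, but never exceed max_s.
            deadline = min(deadline * 2, max_s)
    raise RuntimeError(f"No queue length response after {max_attempts} attempts.")
```

For example, `asyncio.run(probe_with_backoff(0.9))` times out at 0.1s, 0.2s, 0.4s and 0.8s and then succeeds under the capped 1.0s deadline, while `asyncio.run(probe_with_backoff(0.9, max_s=0.1))` keeps using a flat 0.1s deadline, exhausts its attempts, and raises `RuntimeError`.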

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@GeneDer (Contributor) left a comment

LGTM!

@GeneDer (Contributor) commented Dec 21, 2023

Can probably follow up with a doc change 🙃

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@edoakes edoakes merged commit bc1768b into ray-project:master Dec 21, 2023
9 checks passed
@edoakes edoakes self-assigned this Jan 2, 2024
edoakes added a commit to edoakes/ray that referenced this pull request Jan 4, 2024
…ay-project#42041)

We currently have a flat deadline of 0.1s (by default). Under heavy load or high network latency conditions, this deadline might be consistently missed and cause requests to pile up because they're unable to be scheduled.

ray-project#42001 made this deadline configurable, but setting it high by default defeats its purpose (to reduce tail latency when a single replica is overloaded/blocked/unresponsive).

This change backs off the deadline exponentially so the initial deadline can still be low while avoiding "halting" under degraded conditions.

The max is set to 1s by default but can be configured using `RAY_SERVE_MAX_QUEUE_LENGTH_RESPONSE_DEADLINE_S`.

---------

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
can-anyscale pushed a commit that referenced this pull request Jan 5, 2024
Cherry-picks two PRs to address issues under high network delays:

[serve] Enable setting queue length response deadline via environment variable #42001
[serve] Add exponential backoff for queue_len_response_deadline_s #42041

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
vickytsang pushed a commit to ROCm/ray that referenced this pull request Jan 12, 2024
…ay-project#42041)

Labels
None yet
Projects
None yet

3 participants