[serve] Add exponential backoff for `queue_len_response_deadline_s` #42041

edoakes · 2023-12-20T18:07:38Z

Why are these changes needed?

The queue length response deadline is currently a flat 0.1s (by default). Under heavy load or high network latency, this deadline can be missed consistently, so requests pile up because they cannot be scheduled.

#42001 made this deadline configurable, but setting it high by default defeats its purpose (to reduce tail latency when a single replica is overloaded/blocked/unresponsive).

This change backs off the deadline exponentially so the initial deadline can still be low while avoiding "halting" under degraded conditions.

The max is set to 1s by default but can be configured using `RAY_SERVE_MAX_QUEUE_LENGTH_RESPONSE_DEADLINE_S`.
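As a rough illustration of the schedule described above, here is a minimal sketch, not the actual router code. The helper name `backoff_deadline_s` and the `multiplier` parameter are hypothetical; the doubling factor is inferred from the 0.1s, 0.2s, 0.4s, 0.8s sequence in the log posted later in this thread, and the two environment variable names are the ones mentioned in this PR.

```python
import os

# Defaults as stated in the PR description: 0.1s initial deadline, 1s cap.
INITIAL_DEADLINE_S = float(
    os.environ.get("RAY_SERVE_QUEUE_LENGTH_RESPONSE_DEADLINE_S", "0.1")
)
MAX_DEADLINE_S = float(
    os.environ.get("RAY_SERVE_MAX_QUEUE_LENGTH_RESPONSE_DEADLINE_S", "1.0")
)


def backoff_deadline_s(
    attempt: int,
    initial_s: float = INITIAL_DEADLINE_S,
    max_s: float = MAX_DEADLINE_S,
    multiplier: float = 2.0,  # assumption: the deadline doubles on each miss
) -> float:
    """Hypothetical helper: deadline for the given zero-based attempt."""
    return min(initial_s * multiplier**attempt, max_s)
```

With the defaults, the first attempt keeps the low 0.1s deadline and later attempts grow until they hit the 1s cap, which is what lets scheduling make progress under degraded conditions.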

Related issue number

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
  - I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.
- I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>


edoakes · 2023-12-21T17:07:02Z

Example of artificially injecting a 0.8s sleep into the response:

(ProxyActor pid=33502) WARNING 2023-12-21 11:06:20,159 proxy 127.0.0.1 router.py:740 - Failed to get queue length from replica default#A#u3m0im2u within 0.1s. If this happens repeatedly it's likely caused by high network latency in the cluster. You can configure the deadline using the `RAY_SERVE_QUEUE_LENGTH_RESPONSE_DEADLINE_S` environment variable.
(ProxyActor pid=33502) WARNING 2023-12-21 11:06:20,361 proxy 127.0.0.1 router.py:740 - Failed to get queue length from replica default#A#u3m0im2u within 0.2s. If this happens repeatedly it's likely caused by high network latency in the cluster. You can configure the deadline using the `RAY_SERVE_QUEUE_LENGTH_RESPONSE_DEADLINE_S` environment variable.
(ProxyActor pid=33502) WARNING 2023-12-21 11:06:20,815 proxy 127.0.0.1 router.py:740 - Failed to get queue length from replica default#A#u3m0im2u within 0.4s. If this happens repeatedly it's likely caused by high network latency in the cluster. You can configure the deadline using the `RAY_SERVE_QUEUE_LENGTH_RESPONSE_DEADLINE_S` environment variable.
(ProxyActor pid=33502) WARNING 2023-12-21 11:06:21,718 proxy 127.0.0.1 router.py:740 - Failed to get queue length from replica default#A#u3m0im2u within 0.8s. If this happens repeatedly it's likely caused by high network latency in the cluster. You can configure the deadline using the `RAY_SERVE_QUEUE_LENGTH_RESPONSE_DEADLINE_S` environment variable.
(ServeReplica:default:A pid=33504) INFO 2023-12-21 11:06:22,688 default_A u3m0im2u c39176f8-7c1b-4966-bd5b-c45135dc3e46 / replica.py:745 - __CALL__ OK 0.1ms
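The pattern in this log can be reproduced in isolation with a small asyncio sketch. This is only an approximation under stated assumptions: `slow_replica` and `probe_with_backoff` are hypothetical names, each retry issues a fresh probe (the real router may behave differently), and the deadline doubles after every miss up to the cap.

```python
import asyncio


async def slow_replica(delay_s: float) -> int:
    """Hypothetical stand-in for a replica whose response is delayed."""
    await asyncio.sleep(delay_s)
    return 0  # pretend the reported queue length is 0


async def probe_with_backoff(
    delay_s: float,
    initial_s: float = 0.1,
    max_s: float = 1.0,
    max_attempts: int = 10,
):
    """Retry the probe, doubling the deadline after each miss up to max_s.

    Returns the queue length and the list of deadlines that were missed.
    """
    missed = []
    for attempt in range(max_attempts):
        deadline_s = min(initial_s * 2**attempt, max_s)
        try:
            queue_len = await asyncio.wait_for(
                slow_replica(delay_s), timeout=deadline_s
            )
            return queue_len, missed
        except asyncio.TimeoutError:
            # Analogous to the "Failed to get queue length ... within Xs" warning.
            missed.append(deadline_s)
    raise TimeoutError(f"no response after {max_attempts} attempts")
```

Running `asyncio.run(probe_with_backoff(0.8))` exercises the same deadline sequence as the log. The 0.8s attempt sits right at the injected delay, so whether it misses depends on timing overhead; in the log above it did miss, and the request completed afterwards.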


GeneDer

LGTM!

GeneDer · 2023-12-21T18:27:16Z

Can probably follow up by a doc change 🙃


Cherry-picks two PRs to address issues under high network delays:
- [serve] Enable setting queue length response deadline via environment variable #42001
- [serve] Add exponential backoff for queue_len_response_deadline_s #42041

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>


fix

6c4ecda


edoakes changed the title ~~[WIP][serve]~~ [WIP][serve] backoff for queue len response Dec 20, 2023

edoakes added 3 commits December 20, 2023 13:34

fix

584da0f


Merge branch 'master' of https://github.com/ray-project/ray into repl…

d01e5b5

…ica-queue-timeout-backoff

fix

754e5bb


edoakes changed the title ~~[WIP][serve] backoff for queue len response~~ [serve] Add exponential backoff for queue_len_response_deadline_s Dec 21, 2023

edoakes added the v2.9.1-pick label Dec 21, 2023

fix

2726b43


edoakes requested review from alexeykudinkin and a team December 21, 2023 17:06

shrekris-anyscale approved these changes Dec 21, 2023


fix

7e73910


GeneDer approved these changes Dec 21, 2023


fix

16aecbe


edoakes merged commit bc1768b into ray-project:master Dec 21, 2023
9 checks passed

edoakes self-assigned this Jan 2, 2024

edoakes mentioned this pull request Jan 4, 2024

[serve] Cherry-pick queue length deadline configuration changes #42176

Merged

8 tasks
