[serve] Optimize cold start time #38351

Merged
merged 7 commits into from Aug 17, 2023

@zcin zcin commented Aug 11, 2023

Why are these changes needed?

Small optimization: a handle now sends its report to the controller immediately if the current state is "idle", meaning:

  • the handle has no queued requests
  • there are 0 replicas for the deployment

As a result, the first request sent to a scaled-to-zero deployment no longer has to wait for the periodic metrics push, which happens every 10 seconds.
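The idea can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `HandleSketch`, `push_to_controller`, `on_request`, and `on_periodic_tick` are hypothetical names, and the real handle tracks more state than two counters.

```python
from dataclasses import dataclass
from typing import Callable

# The "every-10-second" interval from the description above.
PUSH_INTERVAL_S = 10.0


@dataclass
class HandleSketch:
    """Hypothetical stand-in for a Serve handle; not Ray's actual class."""

    # Hypothetical callback that receives the number of queued requests.
    push_to_controller: Callable[[int], None]
    num_replicas: int = 0
    num_queued: int = 0

    def is_idle(self) -> bool:
        # "Idle" as described above: nothing queued and zero replicas.
        return self.num_queued == 0 and self.num_replicas == 0

    def on_request(self) -> None:
        was_idle = self.is_idle()
        self.num_queued += 1
        if was_idle:
            # Report right away instead of waiting up to PUSH_INTERVAL_S
            # for the next periodic push.
            self.push_to_controller(self.num_queued)

    def on_periodic_tick(self) -> None:
        # Regular push, assumed to be called every PUSH_INTERVAL_S seconds.
        self.push_to_controller(self.num_queued)


reports = []
handle = HandleSketch(push_to_controller=reports.append)
handle.on_request()  # idle, so the report is sent immediately
handle.on_request()  # a request is already queued, so this waits for the tick
print(reports)  # [1]
```

Without the immediate push, the controller in this sketch would only learn about the first request at the next tick, up to 10 seconds later.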

I ran the load tests below for comparison. The load test ramps from 0 -> 10 users -> 50 users -> 100 users. The change greatly helps the first ramp-up from 0 -> 10 users and has no noticeable effect on the later stages.
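For anyone reproducing the comparison, the ramp can be written down as a simple step schedule. This is only a sketch: the user counts come from the description, but the 60-second stage durations and the `target_users` helper are assumptions for illustration, not the settings used for the results here.

```python
from typing import List, Tuple

# (stage duration in seconds, target users). The user counts come from the
# description; the 60-second durations are an assumption.
STAGES: List[Tuple[float, int]] = [(60.0, 10), (60.0, 50), (60.0, 100)]


def target_users(elapsed_s: float, stages: List[Tuple[float, int]] = STAGES) -> int:
    """Return the target number of users `elapsed_s` seconds into the test."""
    if elapsed_s < 0:
        return 0
    stage_end = 0.0
    for duration, users in stages:
        stage_end += duration
        if elapsed_s < stage_end:
            return users
    return 0  # All stages finished.
```

The first stage is the one that starts from a scaled-to-zero deployment, which is where the immediate report matters.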

[Screenshots comparing load test results: Original vs. Improved]

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>
@zcin zcin marked this pull request as ready for review August 17, 2023 17:39
@zcin zcin requested a review from a team August 17, 2023 17:40
@edoakes edoakes merged commit e79c914 into ray-project:master Aug 17, 2023
34 of 36 checks passed
edoakes pushed a commit that referenced this pull request Aug 17, 2023
Issue caused by two PRs merged close in time to each other.

#38416 was merged first, and #38351 didn't have the updated changes. Issue: `self.deployment_name` no longer exists.
vitsai pushed a commit to vitsai/ray that referenced this pull request Aug 19, 2023
vitsai pushed a commit to vitsai/ray that referenced this pull request Aug 19, 2023
@zcin zcin deleted the autoscale-cold-start branch August 25, 2023 17:33
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023
Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023
Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
vymao pushed a commit to vymao/ray that referenced this pull request Oct 11, 2023
Signed-off-by: Victor <vctr.y.m@example.com>
vymao pushed a commit to vymao/ray that referenced this pull request Oct 11, 2023
Signed-off-by: Victor <vctr.y.m@example.com>