[Docs][SkyServe] Autoscaling doc for SkyServe #2989

Merged: 10 commits, Jan 20, 2024

Collaborator

@cblmemo cblmemo commented Jan 16, 2024

Blocked by #2995.

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

cblmemo and others added 2 commits January 18, 2024 00:51
Collaborator

@MaoZiming MaoZiming left a comment

LGTM. Left a few nits

cblmemo and others added 3 commits January 18, 2024 17:30
Co-authored-by: Ziming Mao <ziming.mao@yale.edu>
Collaborator

@concretevitamin concretevitamin left a comment

Looks great! Made some changes, PTAL @cblmemo @MaoZiming.


# ...

The service will scale down all replicas when there is no traffic to the system and will save costs on idle replicas. In this case, the scale up will be faster when the system has no replicas: it will **scale up immediately if any traffic detected**.
Collaborator

What does "In this case, the scale up will be faster when the system has no replicas:" mean?

Collaborator

It means that when the service has no replicas, user traffic will trigger an immediate scale-up.

Collaborator

i.e., the upscale delay is ignored?

Collaborator Author

Yes, it will immediately change Ntar.

Collaborator

How about rephrasing: When upscaling from zero, the upscale delay will be ignored in order to bring up the service faster.
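
To make the behavior discussed in this thread concrete, here is a minimal Python sketch of a request-rate autoscaler that waits out an upscale delay in the normal case but skips it when scaling up from zero replicas. It is an illustration only, not SkyServe's implementation: the class `ToyAutoscaler`, its field names, and the tick-based delays are all hypothetical, and `target` merely stands in for the "Ntar" mentioned above.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyAutoscaler:
    """Hypothetical autoscaler sketch; all names here are assumptions."""
    target_qps_per_replica: float
    min_replicas: int
    max_replicas: int
    upscale_delay_ticks: int    # ticks a higher demand must persist before scaling up
    downscale_delay_ticks: int  # ticks a lower demand must persist before scaling down
    target: int = 0             # current target replica count ("Ntar" in the thread)
    _up_streak: int = 0
    _down_streak: int = 0

    def step(self, qps: float) -> int:
        """Observe one tick of traffic and return the new target replica count."""
        desired = math.ceil(qps / self.target_qps_per_replica)
        desired = max(self.min_replicas, min(self.max_replicas, desired))

        if desired > self.target:
            self._down_streak = 0
            self._up_streak += 1
            # Upscaling from zero ignores the upscale delay so the
            # service comes back up as soon as any traffic arrives.
            if self.target == 0 or self._up_streak >= self.upscale_delay_ticks:
                self.target = desired
                self._up_streak = 0
        elif desired < self.target:
            self._up_streak = 0
            self._down_streak += 1
            if self._down_streak >= self.downscale_delay_ticks:
                self.target = desired
                self._down_streak = 0
        else:
            self._up_streak = 0
            self._down_streak = 0
        return self.target


scaler = ToyAutoscaler(target_qps_per_replica=1.0, min_replicas=0,
                       max_replicas=3, upscale_delay_ticks=3,
                       downscale_delay_ticks=2)
history = [scaler.step(q) for q in (0.5, 2.0, 2.0, 2.0, 0.0, 0.0, 5.0)]
print(history)
```

In this sketch the first request moves the target from 0 to 1 on the same tick, whereas the later increase from 1 to 2 only takes effect once the higher demand has persisted for the full upscale delay.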

Collaborator Author

cblmemo commented Jan 20, 2024

@concretevitamin Those changes look great to me! Added a line on how to adjust scaling delays.

Collaborator

@concretevitamin concretevitamin left a comment

Thanks, LGTM.

cblmemo and others added 2 commits January 20, 2024 11:48
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
@cblmemo cblmemo merged commit 351916f into master Jan 20, 2024
19 checks passed
@cblmemo cblmemo deleted the serve-autoscaling-doc branch January 20, 2024 04:26