[SkyServe] Update Autoscaler Decision, Use Target QPS, Deprecate Auto Restart #2878

Merged
merged 30 commits into from
Dec 27, 2023


MaoZiming
Collaborator

@MaoZiming MaoZiming commented Dec 16, 2023

Changes:

  • Update the AutoscalerDecision;
  • Change the return type of evaluate_scaling to List[AutoscalerDecision];
  • Use the same interface as the spot policy (target_qps instead of {upper,lower}_threshold);
  • Migrate the _get_desired_num_replicas function;
  • Deprecate auto_restart.
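A minimal sketch of how a single target_qps can be turned into a replica count, assuming the rule is "enough replicas that each serves at most target_qps, clamped to the configured minimum and maximum". Unlike a pair of upper/lower thresholds, this yields one number to scale toward. The function name get_desired_num_replicas and its parameters are hypothetical; this is not the code migrated in this PR.

```python
import math


def get_desired_num_replicas(observed_qps: float, target_qps: float,
                             min_replicas: int, max_replicas: int) -> int:
    """Hypothetical target-QPS rule (loosely modeled on _get_desired_num_replicas).

    Returns enough replicas that each serves at most `target_qps`,
    clamped to [min_replicas, max_replicas].
    """
    if target_qps <= 0:
        raise ValueError('target_qps must be positive.')
    if min_replicas > max_replicas:
        raise ValueError('min_replicas must not exceed max_replicas.')
    # Round up so no replica is asked to exceed the target load.
    desired = math.ceil(observed_qps / target_qps)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# Example: 25 QPS with a target of 10 QPS per replica -> ceil(2.5) = 3 replicas.
print(get_desired_num_replicas(25.0, 10.0, min_replicas=1, max_replicas=5))  # 3
```

Rounding up is the conservative choice here: it may over-provision by one replica, but it never asks a replica to exceed the target.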

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Manual Test: sky serve up tests/test_yamls/test_serve_autoscaler.yaml followed by python3 tests/test_serve_autoscaler.py
  • pytest tests/test_smoke.py::test_skyserve_gcp_http
  • pytest tests/test_smoke.py::test_skyserve_auto_restart
  • All smoke tests: pytest tests/test_smoke.py

Collaborator

@cblmemo cblmemo left a comment

Thanks for migrating this!! Left several comments 🫡

docs/source/serving/service-yaml-spec.rst (outdated)
sky/serve/autoscalers.py (outdated)
sky/serve/autoscalers.py (outdated)
sky/serve/autoscalers.py (outdated)
sky/serve/controller.py (outdated)
sky/serve/controller.py (outdated)
sky/utils/schemas.py (outdated)
@MaoZiming
Collaborator Author

@cblmemo Let's use the resource_override_dict as in the spot policy. Updated.
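To illustrate why an evaluate_scaling that returns a list of decisions pairs well with a resource override dict, here is a hypothetical sketch: each scale-up decision carries an optional override dict for the replica it launches, and each scale-down decision carries the ID of the replica to terminate. ScalingDecision, DecisionOperator, the field names, and the "remove the highest IDs first" rule are all assumptions, not SkyServe's actual AutoscalerDecision.

```python
import dataclasses
import enum
from typing import Any, Dict, List, Optional


class DecisionOperator(enum.Enum):
    SCALE_UP = 'scale_up'
    SCALE_DOWN = 'scale_down'


@dataclasses.dataclass
class ScalingDecision:
    """Hypothetical stand-in for an autoscaler decision."""
    operator: DecisionOperator
    # SCALE_UP: optional resource overrides for the new replica.
    # SCALE_DOWN: ID of the replica to terminate.
    target: Any = None


def evaluate_scaling(replica_ids: List[int], desired: int,
                     override: Optional[Dict[str, Any]] = None
                    ) -> List[ScalingDecision]:
    """Returns zero or more decisions that move the fleet to `desired`."""
    decisions: List[ScalingDecision] = []
    num_current = len(replica_ids)
    if desired > num_current:
        for _ in range(desired - num_current):
            # Copy so each decision owns its own override dict.
            decisions.append(ScalingDecision(
                DecisionOperator.SCALE_UP,
                dict(override) if override is not None else None))
    elif desired < num_current:
        # Assumed policy: terminate the highest (newest) replica IDs first.
        victims = sorted(replica_ids, reverse=True)[:num_current - desired]
        for replica_id in victims:
            decisions.append(
                ScalingDecision(DecisionOperator.SCALE_DOWN, replica_id))
    return decisions
```

Returning a list lets one evaluation round launch or terminate several replicas at once, each with its own target, rather than encoding a single "scale by N" action.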

@MaoZiming
Collaborator Author

        if not auto_restart:
            with ux_utils.print_exception_no_traceback():
                raise ValueError('auto_restart=False is deprecated.')

Deprecated auto_restart in service_spec.py.
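For readers outside the codebase, the same deprecation check can be sketched without SkyPilot internals (ux_utils is omitted here). This hypothetical validator assumes the flag lives in a plain config dict and that a missing key means the new default behavior, restarting failed replicas.

```python
from typing import Any, Dict


def check_auto_restart(config: Dict[str, Any]) -> None:
    """Hypothetical validator: rejects the deprecated auto_restart=False."""
    # Assumption: a missing key means the new default (restart failed replicas).
    auto_restart = config.get('auto_restart', True)
    if not auto_restart:
        raise ValueError('auto_restart=False is deprecated.')
```

Configs that omit the key or set it to True pass silently; only an explicit opt-out is rejected.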

Collaborator

@cblmemo cblmemo left a comment

Thanks for the prompt fix!!! 🚀 Left some comments.

docs/source/serving/service-yaml-spec.rst (outdated)
sky/serve/autoscalers.py
sky/serve/autoscalers.py (outdated)
sky/serve/autoscalers.py (outdated)
sky/serve/autoscalers.py (outdated)
sky/serve/autoscalers.py
sky/serve/replica_managers.py (outdated)
sky/serve/serve_state.py
sky/serve/service_spec.py (outdated)
sky/serve/service_spec.py (outdated)
@MaoZiming
Collaborator Author

@cblmemo Thanks for the quick reviews!

Collaborator

@cblmemo cblmemo left a comment

Thanks for the prompt fix!! Sorry for the delay, left some comments (mostly nits). After these the PR should be ready to go!

docs/source/serving/service-yaml-spec.rst
sky/serve/autoscalers.py (outdated)
sky/serve/autoscalers.py
sky/serve/autoscalers.py (outdated)
sky/serve/autoscalers.py
sky/serve/service_spec.py (outdated)
sky/serve/service_spec.py (outdated)
sky/serve/service_spec.py (outdated)
sky/serve/service_spec.py (outdated)
sky/utils/schemas.py
sky/serve/autoscalers.py (outdated)
sky/serve/autoscalers.py (outdated)
sky/serve/autoscalers.py (outdated)
sky/serve/constants.py (outdated)
sky/serve/serve_utils.py (outdated)
sky/serve/service_spec.py
sky/utils/schemas.py (outdated)
sky/serve/service_spec.py (outdated)
Collaborator

@cblmemo cblmemo left a comment

Thanks for the fix!! More comments, mostly nits ;)

sky/serve/autoscalers.py (outdated)
sky/serve/autoscalers.py (outdated)
sky/serve/autoscalers.py (outdated)
sky/serve/autoscalers.py (outdated)
sky/serve/service_spec.py (outdated)
sky/serve/service_spec.py (outdated)
@MaoZiming
Collaborator Author

@cblmemo Thanks for the comments! Fixed.

@MaoZiming MaoZiming changed the title [SkyServe] Update Autoscaler Decision, Use Target QPS [SkyServe] Update Autoscaler Decision, Use Target QPS, Deprecate Auto Restart Dec 22, 2023
@MaoZiming MaoZiming mentioned this pull request Dec 22, 2023
Collaborator

@Michaelvll Michaelvll left a comment

Thanks for submitting the PR @MaoZiming! This will be a great improvement to the UX of the serving autoscaling policy. Left several comments.

docs/source/serving/service-yaml-spec.rst (outdated)
docs/source/serving/service-yaml-spec.rst
sky/serve/autoscalers.py (outdated)
sky/serve/autoscalers.py
sky/serve/autoscalers.py (outdated)
sky/serve/autoscalers.py (outdated)
sky/serve/service_spec.py (outdated)
sky/serve/service_spec.py (outdated)
@MaoZiming
Collaborator Author

@Michaelvll Thanks! PTAL

Collaborator

@Michaelvll Michaelvll left a comment

Thanks for this important change @MaoZiming! LGTM. Left several nits. : )

docs/source/serving/service-yaml-spec.rst
sky/serve/autoscalers.py (outdated)
sky/serve/autoscalers.py
sky/serve/replica_managers.py (outdated)
sky/serve/replica_managers.py (outdated)
Collaborator

@cblmemo cblmemo left a comment

Thanks for the awesome work!! 🚀 It looks great. Left some nits 🫡 Please make sure all smoke tests pass before merging 👀

sky/serve/autoscalers.py (outdated)
docs/source/serving/service-yaml-spec.rst (outdated)
docs/source/serving/service-yaml-spec.rst (outdated)
sky/serve/autoscalers.py (outdated)
sky/serve/autoscalers.py (outdated)
sky/serve/service_spec.py (outdated)
sky/serve/service_spec.py (outdated)
sky/serve/service_spec.py
sky/utils/schemas.py (outdated)
Collaborator

Could we still have a round-robin test for 3 replicas?

Collaborator Author

@MaoZiming MaoZiming Dec 26, 2023

I think we might not need the smoke test for round-robin, as now the default behavior for failed replicas is auto-restart.
We already have test_skyserve_auto_restart.
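For context, the property a 3-replica round-robin test would check (each replica receiving an equal share of requests) can be sketched with an in-memory selector. RoundRobinSelector, count_requests_per_replica and the replica names are hypothetical placeholders, not SkyServe's load balancer.

```python
import itertools
from typing import Dict, Iterator, List


class RoundRobinSelector:
    """Hypothetical round-robin policy over a fixed list of replicas."""

    def __init__(self, replicas: List[str]) -> None:
        if not replicas:
            raise ValueError('At least one replica is required.')
        self._cycle: Iterator[str] = itertools.cycle(replicas)

    def select(self) -> str:
        # Each call returns the next replica, wrapping around at the end.
        return next(self._cycle)


def count_requests_per_replica(num_replicas: int,
                               num_requests: int) -> Dict[str, int]:
    replicas = [f'replica-{i}' for i in range(num_replicas)]
    selector = RoundRobinSelector(replicas)
    counts = {replica: 0 for replica in replicas}
    for _ in range(num_requests):
        counts[selector.select()] += 1
    return counts


# With 3 replicas and 9 requests, each replica should serve exactly 3.
print(count_requests_per_replica(3, 9))
# {'replica-0': 3, 'replica-1': 3, 'replica-2': 3}
```

A real smoke test would additionally need every replica to stay READY for the whole run, which is why such a test overlaps with auto-restart behavior.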

@MaoZiming
Collaborator Author

@cblmemo Thanks for the comments! All smoke tests passed.

@MaoZiming MaoZiming merged commit 708ad97 into master Dec 27, 2023
19 checks passed
@MaoZiming MaoZiming deleted the autoscaler-api branch December 27, 2023 15:06
Labels
None yet
Projects
None yet

3 participants