-
Notifications
You must be signed in to change notification settings - Fork 415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SkyServe] Update Autoscaler Decision, Use Target QPS, Deprecate Auto Restart #2878
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for migrating this!! Left several comments 🫡
@cblmemo Let's use the resource_override_dict as in the spot policy. Updated |
Deprecate auto_restart in service_spec.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the prompt fix!!! 🚀 Left some comments.
@cblmemo Thanks for the quick reviews! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the prompt fix!! Sorry for the delay, left some comments (mostly nits). After these the PR should be ready to go!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix!! More comments, mostly nits ;)
@cblmemo Thanks for the comments! fixed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for submitting the PR @MaoZiming! This will be a great improvement to the UX of the serving autoscaling policy. Left several comments.
@Michaelvll Thanks! PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this important change @MaoZiming! LGTM. Left several nits. : )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the awesome work!! 🚀 It looks great. Left some nits 🫡 Please make sure all smoke tests passed before merging it 👀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we still have a round-robin test for 3 replicas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might not need the smoke test for round-robin, as now the default behavior for failed replicas is auto-restart.
We already have the test_skyserve_auto_restart
@cblmemo Thanks for the comments! All smoke tests passed. |
Changes:
AutoscalerDecision
;evaluate_scaling
(i.e.,List[AutoscalerDecision]
);target_qps
instead of{upper,lower}_threshold
) as spot policy;_get_desired_num_replicas
function.auto_restart
Tested (run the relevant ones):
bash format.sh
sky serve up tests/test_yamls/test_serve_autoscaler.yaml
followed bypython3 tests/test_serve_autoscaler.py
pytest tests/test_smoke.py:test_skyserve_gcp_http
pytest tests/test_smoke.py::test_skyserve_auto_restart
pytest tests/test_smoke.py