[serve] Use placement groups to bypass autoscaler throttling #13844
Looks good!
This means we are turning off admission control in create_backend, right? Is there any way to update the documentation about this, or to add some sort of create_backend(_creation_timeout=...) so it won't block?
@simon-mo Yeah, that's right. I think we should add a top-level flag that toggles whether we block on the operation or not.
FYI, waiting to merge this because it causes a bunch of spam about the actors not being schedulable due to a placement group implementation detail (cc @rkooo567).
Why are these changes needed?
Creates a placement group for each actor to bypass the autoscaler's throttling in autoscaling clusters.
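As a rough illustration of the "one placement group per actor" idea, the sketch below builds a single-bundle placement group sized to one replica and schedules the actor into it. The helper names `bundle_for_replica` and `start_replica` are hypothetical and not taken from this PR. The Ray calls use the `scheduling_strategy` option from recent Ray releases, which is an assumption here: the code in this PR may pass the placement group to the actor differently.

```python
from typing import Dict, List, Optional


def bundle_for_replica(
    num_cpus: float = 1,
    num_gpus: float = 0,
    custom: Optional[Dict[str, float]] = None,
) -> List[Dict[str, float]]:
    """Return a one-bundle placement group spec covering a single actor.

    Hypothetical helper: zero-valued CPU/GPU entries are left out of the bundle.
    """
    bundle: Dict[str, float] = {}
    if num_cpus:
        bundle["CPU"] = float(num_cpus)
    if num_gpus:
        bundle["GPU"] = float(num_gpus)
    bundle.update(custom or {})
    return [bundle]


def start_replica(actor_cls, num_cpus: float = 1, num_gpus: float = 0):
    """Create a dedicated placement group and start one actor inside it.

    `actor_cls` is assumed to be a class decorated with `@ray.remote`.
    """
    # Imported lazily so bundle_for_replica can be used without Ray installed.
    from ray.util.placement_group import placement_group
    from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

    # The placement group requests the actor's resources up front.
    pg = placement_group(bundle_for_replica(num_cpus, num_gpus), strategy="PACK")
    handle = actor_cls.options(
        num_cpus=num_cpus,
        num_gpus=num_gpus,
        scheduling_strategy=PlacementGroupSchedulingStrategy(placement_group=pg),
    ).remote()
    return pg, handle
```

Returning the placement group alongside the handle lets the caller remove the group when the replica is torn down.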
Also removes the resource check now that we can gracefully handle incrementally scaling backends. I added a warning message that is printed when backends take a long time to start up, to avoid user confusion if things "just hang."
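The slow-startup warning could look roughly like the polling loop below. This is a minimal sketch, not the code from this PR: `wait_for_replicas`, its parameters, and the 30-second default are all hypothetical. The `clock` and `sleep` arguments are injectable only so the loop can be exercised without real delays.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger(__name__)


def wait_for_replicas(
    num_ready: Callable[[], int],
    target: int,
    warn_after_s: float = 30.0,
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until `target` replicas are ready, warning periodically.

    Returns the number of warnings that were logged while waiting.
    """
    next_warning = clock() + warn_after_s
    warnings = 0
    while True:
        ready = num_ready()
        if ready >= target:
            return warnings
        now = clock()
        if now >= next_warning:
            # Tell the user why the call appears to hang.
            logger.warning(
                "Waited %.0fs or more: %d/%d replicas started. The cluster may "
                "still be scaling up to fit the remaining replicas.",
                warn_after_s, ready, target,
            )
            warnings += 1
            next_warning = now + warn_after_s
        sleep(poll_s)
```

A non-blocking variant, as discussed in the review comments, would return after the first poll instead of looping.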
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.