-
Notifications
You must be signed in to change notification settings - Fork 154
Adds information about the importance of adaptive allocations #1454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks great! Left a couple of comments and suggestions.
Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>
Thank you! I applied your suggestions in my latest commit. |
Thanks! I think there are a few different aspects to this we will want to cover (cc: @arisonl @shubhaat ) Adaptive resources is enabled (from the UI):
Adaptive resources is disabled (from the UI):
Adaptive allocations enabled (from the API):
Adaptive allocations disabled (from the API):
|
Some things struck me here:
Please disregard if the linked page contains the full details and we're happy to have general overview here :) |
Thank you @ppf2 and @leemthompo for all of your suggestions! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a minor suggestion, otherwise LGTM.
Co-authored-by: Arianna Laudazzi <46651782+alaudazzi@users.noreply.github.com>
Based on my conversation with Pius, I’ve updated my PR as follows:
@shubhaat @arisonl I’d love to hear your thoughts - is this a good way to communicate how adaptive allocations work at this point in the guide? cc @ppf2 |
|
||
For more information about adaptive allocations and resources, refer to the [trained model autoscaling](/deploy-manage/autoscaling/trained-model-autoscaling.md) documentation. | ||
|
||
::::{note} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're delegating to the trained model page, should this note still live here or would it make sense to move that too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do you think it would make more sense to place it?
IMO it should live here because it links to the Trained model autoscaling page, where the adaptive allocations information is located. But maybe I'm misunderstanding your comment? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I agree it's good to have cost warning here. But I'd also expect this to be on the main page we're linking to, so I was just wondering if that duplication is OK.
- I'm not sure about the current placement of the note.
- Here's my thinking: The
::::{note}
goes into details about pricing implications but we've already sent readers to another page with "For more information about adaptive allocations and resources, refer to the trained model autoscaling documentation." So the flow feels wrong. I'd argue for moving the cost note before the link. And also perhaps it should beimportant
given it deals with real cost implications.
- Here's my thinking: The
- Nit: I think the wording could be more concise too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I thought you were referring to the sentence before the note - but it totally makes sense now.
My point is that we should include it here because the support ticket that raised this issue were about customers using inference services without realizing the cost implications.
Part of this is already mentioned on the trained model autoscaling page, specifically:
Note: If you set the minimum number of allocations to 1, you will be charged even if the system is not using those resources.
In this case, I think the duplication is fine, as it's important to warn users about potential costs (according to the issue, this information was missing from this page.)
You're absolutely right that placing the note before that sentence interrupts the flow - I’ll move it and rework the wording a bit.
Thanks so much for your suggestions!
I think these changes are a good first step, eventually I think we want to hide some of these settings on serverless completely and simplify the UI and choices for customers. Very few users on serverless see the Trained models UI, so my guess is very few read the docs as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
📸 Preveiw
Description
This PR updates the Inference integration documentation to:
Related issue: #1393