
Conversation


@kosabogi kosabogi commented May 22, 2025

📸 Preview

Description

This PR updates the Inference integration documentation to:

  • Clearly state that not enabling adaptive allocations can result in unnecessary resource usage and higher costs.
  • Expand the scope of the page to cover not only third-party service integrations, but also the Elasticsearch service.

Related issue: #1393

@kosabogi kosabogi requested a review from szabosteve May 22, 2025 08:53
@kosabogi kosabogi requested review from a team as code owners May 22, 2025 08:53

@szabosteve szabosteve left a comment


It looks great! Left a couple of comments and suggestions.

kosabogi and others added 2 commits May 22, 2025 13:51
Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>
@kosabogi

> It looks great! Left a couple of comments and suggestions.

Thank you! I applied your suggestions in my latest commit.

@kosabogi kosabogi requested a review from ppf2 May 22, 2025 14:01
@ppf2

ppf2 commented May 23, 2025

Thanks! I think there are a few different aspects to this we will want to cover (cc: @arisonl @shubhaat )

Adaptive resources enabled (from the UI):

  • Depending on the selected usage level, whether the deployment is optimized for search or ingest, and the platform type (ECH/ECE vs. Serverless), it may or may not autoscale down to 0 allocations when the load is low.

Adaptive resources disabled (from the UI):

  • Even at the low usage level, there will still be at least 1 or 2 allocations, depending on whether the deployment is optimized for search or ingest.

Adaptive allocations enabled (from the API):

  • If enabled, model allocations can scale down to 0 when the load is low, unless the user has explicitly specified a min_number_of_allocations setting greater than 0.

Adaptive allocations disabled (from the API):

  • The user defines the num_allocations used by the model.
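The API-side scenarios above could be sketched as a single request, assuming the `elasticsearch` inference service on a recent release; the endpoint name, model ID, and allocation counts below are placeholders:

```console
PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_threads": 1,
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 0,
      "max_number_of_allocations": 4
    }
  }
}
```

With `enabled: true` and `min_number_of_allocations: 0`, allocations can scale down to 0 when the load is low; with adaptive allocations disabled (or the object omitted), a fixed `num_allocations` would be set in `service_settings` instead.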

@leemthompo

Some things struck me here:

  • the separation between UI and API tabbed sections seems somewhat arbitrary since both are constrained by the same platform-specific infrastructure realities
  • the format forces readers to mentally cross-reference three variables (usage level, optimization type, platform) across multiple paragraphs
  • perhaps we could replace the entire tabbed prose section with a single table?
    • some of the prose is vague and requires guesswork; it might be better defined explicitly

Please disregard if the linked page contains the full details and we're happy to have a general overview here :)

@kosabogi

> Some things struck me here:
>
>   • the separation between UI and API tabbed sections seems somewhat arbitrary since both are constrained by the same platform-specific infrastructure realities
>   • the format forces readers to mentally cross-reference three variables (usage level, optimization type, platform) across multiple paragraphs
>   • perhaps we could replace the entire tabbed prose section with a single table?
>     • some of the prose is vague and requires guesswork; it might be better defined explicitly
>
> Please disregard if the linked page contains the full details and we're happy to have a general overview here :)

Thank you @ppf2 and @leemthompo for all of your suggestions!
I've updated the Adaptive allocations section by rewriting the content as a table to make it easier to scan and compare configurations across platform, usage level, and optimization type.
Let me know what you think!


@alaudazzi alaudazzi left a comment


I left a minor suggestion, otherwise LGTM.

Co-authored-by: Arianna Laudazzi <46651782+alaudazzi@users.noreply.github.com>
@leemthompo

Thanks @kosabogi, might be nice to get a final 👀 from @ppf2 and @shubhaat before merging :)

@kosabogi

kosabogi commented Jun 3, 2025

Based on my conversation with Pius, I’ve updated my PR as follows:

  • Removed the tables to avoid duplicating the content from the Trained model autoscaling page. Instead, I added links to the relevant sections so users can easily access the autoscaling settings guidance.
  • Added a note to inform users about the implications of setting min_number_of_allocations greater than 0 when adaptive allocations are enabled.

@shubhaat @arisonl I’d love to hear your thoughts - is this a good way to communicate how adaptive allocations work at this point in the guide?

cc @ppf2
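As a sketch of the configuration the new note warns about (field values are placeholders, assuming the `elasticsearch` inference service settings shape):

```console
"adaptive_allocations": {
  "enabled": true,
  "min_number_of_allocations": 1,
  "max_number_of_allocations": 4
}
```

With a minimum of 1, at least one allocation stays up, and is billed, even when the system is idle.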


> For more information about adaptive allocations and resources, refer to the [trained model autoscaling](/deploy-manage/autoscaling/trained-model-autoscaling.md) documentation.
>
> ::::{note}


If we're delegating to the trained model page, should this note still live here or would it make sense to move that too?


@kosabogi kosabogi Jun 3, 2025


Where do you think it would make more sense to place it?
IMO it should live here because it links to the Trained model autoscaling page, where the adaptive allocations information is located. But maybe I'm misunderstanding your comment? 🤔


  • I agree it's good to have a cost warning here. But I'd also expect this to be on the main page we're linking to, so I was just wondering if that duplication is OK.
  • I'm not sure about the current placement of the note.
    • Here's my thinking: the ::::{note} goes into detail about pricing implications, but we've already sent readers to another page with "For more information about adaptive allocations and resources, refer to the trained model autoscaling documentation." So the flow feels wrong. I'd argue for moving the cost note before the link. Perhaps it should also be an important admonition, given it deals with real cost implications.
  • Nit: I think the wording could be more concise too.
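If the note were promoted as suggested, the change might look like the following, assuming the same admonition syntax already used on the page; the wording here is only illustrative:

```
::::{important}
If you set `min_number_of_allocations` to a value greater than 0, you will be charged for those allocations even if the system is not using those resources.
::::
```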

@kosabogi

Sorry, I thought you were referring to the sentence before the note - but it totally makes sense now.

My point is that we should include it here because the support ticket that raised this issue was about customers using inference services without realizing the cost implications.
Part of this is already mentioned on the trained model autoscaling page, specifically:

> Note: If you set the minimum number of allocations to 1, you will be charged even if the system is not using those resources.

In this case, I think the duplication is fine, as it's important to warn users about potential costs (according to the issue, this information was missing from this page).

You're absolutely right that placing the note before that sentence interrupts the flow - I’ll move it and rework the wording a bit.

Thanks so much for your suggestions!

@shubhaat

shubhaat commented Jun 3, 2025

I think these changes are a good first step; eventually we want to hide some of these settings on serverless completely and simplify the UI and choices for customers. Very few users on serverless see the Trained models UI, so my guess is very few read the docs as well.


@leemthompo leemthompo left a comment


👍

@kosabogi kosabogi merged commit 47acfae into main Jun 6, 2025
6 checks passed
@kosabogi kosabogi deleted the inference-adaptive-allocations branch June 6, 2025 08:00