How to block to auto-scaling in specific situation #15375

jinholee-makinarocks · 2024-07-05T00:59:06Z

Ask your question here:

We use the Knative and KServe projects in our product to provide inference services with auto-scaling. In some cases, we need to pause auto-scaling as part of our multi-tenant resource handling.
For example, we have two tenants that are each allocated the same amount of resources:

1st tenant: CPU 10 cores, memory 10 GiB
2nd tenant: CPU 10 cores, memory 10 GiB

If the 2nd tenant deploys a lot of objects, such as InferenceService or Service, it can exhaust its own resources.
In this situation, we have to block auto-scaling to prevent the 2nd tenant from consuming the 1st tenant's resources.

Could you suggest an approach to solve this?

skonto · 2024-07-11T10:59:25Z

Hi @jinholee-makinarocks. I suspect you could map your tenant concept to one or more namespaces and apply resource quotas per namespace to reflect the total amount of resources per tenant. Knative services run in a namespace and scaling happens within that namespace, so you could restrict the resources there. Are you looking for something else?
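
To make the per-namespace quota suggestion concrete, here is a minimal sketch that generates one Kubernetes ResourceQuota manifest per tenant. It assumes each tenant maps to exactly one namespace; the namespace names `tenant-1` and `tenant-2`, the `-quota` naming scheme, and the `tenant_quota` helper are hypothetical and only for illustration. The 10 CPU / 10Gi values come from the question above.

```python
import json


def tenant_quota(namespace: str, cpu: str, memory: str) -> dict:
    """Build a ResourceQuota manifest that caps the total pod resources in one namespace."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        # Hypothetical naming scheme: "<namespace>-quota".
        "metadata": {"name": f"{namespace}-quota", "namespace": namespace},
        "spec": {
            "hard": {
                # Sum of requests and limits across all pods in the namespace.
                "requests.cpu": cpu,
                "requests.memory": memory,
                "limits.cpu": cpu,
                "limits.memory": memory,
            }
        },
    }


# Assumption: one namespace per tenant, with the allocations from the question.
tenants = {"tenant-1": ("10", "10Gi"), "tenant-2": ("10", "10Gi")}
manifests = [tenant_quota(ns, cpu, mem) for ns, (cpu, mem) in tenants.items()]

if __name__ == "__main__":
    # Emit a single List object so all quotas can be applied in one step.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

The JSON output could be piped to `kubectl apply -f -`. With a quota like this in place, pods that would push a namespace past its limit are rejected at creation time, so a tenant's autoscaling stops at the tenant's own allocation instead of spilling into another tenant's. Two things to keep in mind: once a quota constrains requests or limits, every container in the namespace must declare them (or receive defaults from a LimitRange), and every container in a pod counts toward the quota, including the Knative queue-proxy sidecar.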

jinholee-makinarocks added the kind/question Further information is requested label Jul 5, 2024
