We use the Knative and KServe projects in our product to provide inference services with auto-scaling. In some cases, we need to pause auto-scaling in order to enforce per-tenant resource limits in a multi-tenant setup.
For example,
We have two tenants that are allocated the same amount of resources, as follows:
1st tenant: CPU 10 cores, Memory 10 GiB
2nd tenant: CPU 10 cores, Memory 10 GiB
Suppose the 2nd tenant deploys a lot of objects, such as InferenceService or Service, and exhausts its own resources.
In this situation, we have to block auto-scaling in order to prevent the 2nd tenant from consuming the resources of the 1st tenant.
Could you suggest any ideas or an approach to solve this?
Hi @jinholee-makinarocks. I suspect you could map your tenant concept to one or more namespaces and apply a quota per namespace that reflects the total amount of resources for each tenant. Knative services run in a namespace and scaling happens within that namespace, so the quota would restrict how far a tenant can scale. Are you looking for something else?
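To make the per-namespace quota idea concrete, here is a minimal sketch that generates one Kubernetes `ResourceQuota` manifest per tenant, using the 10 core / 10 GiB figures from the question. The namespace names `tenant-1` and `tenant-2`, the quota name `tenant-quota`, and the helper `tenant_quota` are hypothetical, and capping limits at the same values as requests is an assumption rather than something stated in this thread.

```python
import json


def tenant_quota(namespace: str, cpu_cores: int, memory_gib: int) -> dict:
    """Build a ResourceQuota manifest that caps one tenant namespace.

    The quota name and the choice to cap limits at the same value as
    requests are illustrative assumptions.
    """
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "tenant-quota", "namespace": namespace},
        "spec": {
            "hard": {
                # Sum of container requests allowed in the namespace.
                "requests.cpu": str(cpu_cores),
                "requests.memory": f"{memory_gib}Gi",
                # Sum of container limits allowed in the namespace.
                "limits.cpu": str(cpu_cores),
                "limits.memory": f"{memory_gib}Gi",
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical namespaces, one per tenant.
    items = [tenant_quota(ns, 10, 10) for ns in ("tenant-1", "tenant-2")]
    # Emit a single List object as JSON so it can be applied in one step.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

The printed JSON could be piped to `kubectl apply -f -`. Once a quota covers a compute resource, Kubernetes rejects new pods in that namespace that do not declare a request or limit for it, so a `LimitRange` with defaults may be needed alongside the quota. When a tenant reaches its quota, additional pods requested by the autoscaler fail to be created in that namespace, which keeps the other tenant's share untouched.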