
Support Kserve #1603

Open · 3 tasks
tenzen-y opened this issue Jan 17, 2024 · 12 comments

@tenzen-y
Member

What would you like to be added:
I would like Kueue to support the serverless ML inference tool, KServe.

Why is this needed:
In a hybrid-workload cluster (one that runs both training jobs and inference servers), users often want to manage all of the cluster's capacity through Kueue's flavor quotas. So, as a first step toward supporting inference servers, adding KServe support to Kueue would be nice to have.
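
For illustration, a minimal sketch of how a single ClusterQueue's flavor quotas could cap the capacity shared by training and serving. This is not from the issue; the queue name, flavor name, and quota values are assumptions:

```yaml
# Hypothetical ClusterQueue capping the capacity that training Jobs
# and (once supported) KServe InferenceServices would draw from.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: hybrid-workloads
spec:
  namespaceSelector: {}            # admit Workloads from any namespace
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor         # assumed ResourceFlavor name
      resources:
      - name: cpu
        nominalQuota: 64
      - name: memory
        nominalQuota: 256Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 8
```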

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.
We will probably need to implement suspend semantics on the KServe side.
Additionally, we need to move #77 forward together with this to support the inference server's autoscaling semantics.

@tenzen-y added the kind/feature label on Jan 17, 2024
@alculquicondor
Contributor

I think I talked about this with @astefanutti.
Also cc @mwielgus @ahg-g

@ahg-g
Contributor

ahg-g commented Jan 24, 2024

How do you envision that working? Can you list a couple of CUJs?

@kerthcet
Contributor

kerthcet commented Mar 6, 2024

Online inference services are latency sensitive and need to scale quickly, so reclaiming/preempting KServe-managed services doesn't look right to me. Offline inference is where this might be helpful, but I guess KServe is not that good at offline inference. cc @terrytangyuan

@lizzzcai
Contributor

lizzzcai commented Mar 6, 2024

I would like to see support for this, as I am looking for a unified way of managing resources for both model training and serving, and Kueue looks like it has this capability. In our case, both training and serving run in the same cluster. I am also interested in how it could integrate with the recent MultiKueue feature to schedule workloads to clusters with available GPUs (sometimes there is a shortage of GPUs in certain regions). Since a KServe deployment has min and max replicas, it should be scheduled to a cluster that can meet the max replicas.
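
For concreteness, a sketch of that scenario. minReplicas/maxReplicas are real KServe fields; the queue-name label mirrors Kueue's existing job integrations and is an assumption here, not a supported KServe integration today:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  labels:
    kueue.x-k8s.io/queue-name: team-a-queue   # assumed opt-in wiring
spec:
  predictor:
    minReplicas: 1
    maxReplicas: 4       # a MultiKueue placement would need quota for 4
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
      resources:
        limits:
          nvidia.com/gpu: "1"
```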

@tenzen-y
Member Author

tenzen-y commented Mar 6, 2024

> How do you envision that working? Can you list a couple of CUJs?

I imagined a similar approach to the RayCluster integration.

So, I would like to add a Suspend field to the InferenceService resource.
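
A rough sketch of the proposed semantics, assuming a new spec.suspend field. It does not exist in KServe today; adding it is exactly what this proposal is about, by analogy with batch/v1 Job.spec.suspend and RayCluster suspension:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  labels:
    kueue.x-k8s.io/queue-name: team-a-queue
spec:
  suspend: true          # proposed field: created suspended, no pods run;
                         # Kueue would set it to false once quota is admitted
  predictor:
    minReplicas: 1
    maxReplicas: 4
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
```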

@tenzen-y
Member Author

tenzen-y commented Mar 6, 2024

> Online inference services are latency sensitive and need to scale quickly, so reclaiming/preempting KServe-managed services doesn't look right to me. Offline inference is where this might be helpful, but I guess KServe is not that good at offline inference. cc @terrytangyuan

@kerthcet I believe that the lending limit would allow us to guarantee capacity for latency-sensitive workloads.
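
As a sketch, Kueue's lendingLimit (an existing ClusterQueue field, gated by the LendingLimit feature; the names and numbers here are made up) could keep part of the serving queue's quota unborrowable by the rest of the cohort:

```yaml
# Hypothetical: "serving" lends at most 2 of its 8 GPUs to other queues
# in the cohort, so 6 GPUs stay reserved for latency-sensitive inference
# even while it is idle.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: serving
spec:
  cohort: ml-platform
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: "nvidia.com/gpu"
        nominalQuota: 8
        lendingLimit: 2
```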

@tenzen-y
Member Author

tenzen-y commented Mar 6, 2024

> I would like to see support for this, as I am looking for a unified way of managing resources for both model training and serving, and Kueue looks like it has this capability. In our case, both training and serving run in the same cluster. I am also interested in how it could integrate with the recent MultiKueue feature to schedule workloads to clusters with available GPUs (sometimes there is a shortage of GPUs in certain regions). Since a KServe deployment has min and max replicas, it should be scheduled to a cluster that can meet the max replicas.

Yes, that's right. Actually, I also deploy training Jobs and inference servers into a single cluster.

@tenzen-y
Member Author

tenzen-y commented Mar 6, 2024

Let me try to design this integration.

/assign

@terrytangyuan
Member

Thanks! Great to see this. Looking forward to your proposal. @tenzen-y

@tenzen-y
Member Author

tenzen-y commented Mar 6, 2024

> Thanks! Great to see this. Looking forward to your proposal. @tenzen-y

I will create a dedicated issue on the KServe side later as well.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jun 4, 2024
@tenzen-y
Copy link
Member Author

tenzen-y commented Jun 5, 2024

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label on Jun 5, 2024