From 0743da87117e3b3d7dbe8a6acafcff65cec84a94 Mon Sep 17 00:00:00 2001 From: Sergey Kanzhelev Date: Tue, 16 Apr 2024 23:44:24 +0000 Subject: [PATCH] WG serving proposal --- OWNERS_ALIASES | 4 ++ liaisons.md | 1 + sig-apps/README.md | 1 + sig-architecture/README.md | 1 + sig-autoscaling/README.md | 1 + sig-instrumentation/README.md | 1 + sig-list.md | 1 + sig-network/README.md | 1 + sig-node/README.md | 1 + sig-scheduling/README.md | 1 + sig-storage/README.md | 1 + sigs.yaml | 44 ++++++++++++++++ wg-serving/README.md | 44 ++++++++++++++++ wg-serving/charter.md | 98 +++++++++++++++++++++++++++++++++++ 14 files changed, 200 insertions(+) create mode 100644 wg-serving/README.md create mode 100644 wg-serving/charter.md diff --git a/OWNERS_ALIASES b/OWNERS_ALIASES index a9e0b17f877..96d1e206cb7 100644 --- a/OWNERS_ALIASES +++ b/OWNERS_ALIASES @@ -142,6 +142,10 @@ aliases: - JimBugwadia - poonam-lamba - sudermanjr + wg-serving-leads: + - ArangoGutierrez + - SergeyKanzhelev + - terrytangyuan wg-structured-logging-leads: - mengjiao-liu - pohly diff --git a/liaisons.md b/liaisons.md index e08a51d1c83..e238f092d40 100644 --- a/liaisons.md +++ b/liaisons.md @@ -60,6 +60,7 @@ members will assume one of the departing members groups. 
| [WG Device Management](wg-device-management/README.md) | Patrick Ohly (**[@pohly](https://github.com/pohly)**) | | [WG LTS](wg-lts/README.md) | Nabarun Pal (**[@palnabarun](https://github.com/palnabarun)**) | | [WG Policy](wg-policy/README.md) | Patrick Ohly (**[@pohly](https://github.com/pohly)**) | +| [WG Serving](wg-serving/README.md) | Maciej Szulik (**[@soltysh](https://github.com/soltysh)**) | | [WG Structured Logging](wg-structured-logging/README.md) | Nabarun Pal (**[@palnabarun](https://github.com/palnabarun)**) | | [Committee Code of Conduct](committee-code-of-conduct/README.md) | Nabarun Pal (**[@palnabarun](https://github.com/palnabarun)**) | | [Committee Security Response](committee-security-response/README.md) | Stephen Augustus (**[@justaugustus](https://github.com/justaugustus)**) | diff --git a/sig-apps/README.md b/sig-apps/README.md index f66544965b3..56c48d97fc5 100644 --- a/sig-apps/README.md +++ b/sig-apps/README.md @@ -59,6 +59,7 @@ subprojects, and resolve cross-subproject technical issues and decisions. The following [working groups][working-group-definition] are sponsored by sig-apps: * [WG Batch](/wg-batch) * [WG Data Protection](/wg-data-protection) +* [WG Serving](/wg-serving) ## Subprojects diff --git a/sig-architecture/README.md b/sig-architecture/README.md index 9c75ccb0ebc..eb8e07eaea6 100644 --- a/sig-architecture/README.md +++ b/sig-architecture/README.md @@ -60,6 +60,7 @@ The following [working groups][working-group-definition] are sponsored by sig-ar * [WG Device Management](/wg-device-management) * [WG LTS](/wg-lts) * [WG Policy](/wg-policy) +* [WG Serving](/wg-serving) * [WG Structured Logging](/wg-structured-logging) diff --git a/sig-autoscaling/README.md b/sig-autoscaling/README.md index 9142338330c..06547fbdffe 100644 --- a/sig-autoscaling/README.md +++ b/sig-autoscaling/README.md @@ -48,6 +48,7 @@ The Chairs of the SIG run operations and processes governing the SIG. 
The following [working groups][working-group-definition] are sponsored by sig-autoscaling: * [WG Batch](/wg-batch) * [WG Device Management](/wg-device-management) +* [WG Serving](/wg-serving) ## Subprojects diff --git a/sig-instrumentation/README.md b/sig-instrumentation/README.md index 9e5b1ffc9a9..f2e24bd46e0 100644 --- a/sig-instrumentation/README.md +++ b/sig-instrumentation/README.md @@ -53,6 +53,7 @@ subprojects, and resolve cross-subproject technical issues and decisions. ## Working Groups The following [working groups][working-group-definition] are sponsored by sig-instrumentation: +* [WG Serving](/wg-serving) * [WG Structured Logging](/wg-structured-logging) diff --git a/sig-list.md b/sig-list.md index 7c7de17e7db..01ffe44dc33 100644 --- a/sig-list.md +++ b/sig-list.md @@ -67,6 +67,7 @@ When the need arises, a [new SIG can be created](sig-wg-lifecycle.md) |[Device Management](wg-device-management/README.md)|[device-management](https://github.com/kubernetes/kubernetes/labels/wg%2Fdevice-management)|* Architecture
* Autoscaling
* Network
* Node
* Scheduling
|* [John Belamaric](https://github.com/johnbelamaric), Google
* [Kevin Klues](https://github.com/klueska), NVIDIA
* [Patrick Ohly](https://github.com/pohly), Intel
|* [Slack](https://kubernetes.slack.com/messages/wg-device-management)
* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-device-management)|* Regular WG Meeting: [Tuesdays at 8:30 PT (Pacific Time) (biweekly)](TBD)
|[LTS](wg-lts/README.md)|[lts](https://github.com/kubernetes/kubernetes/labels/wg%2Flts)|* Architecture
* Cluster Lifecycle
* K8s Infra
* Release
* Security
* Testing
|* [Jeremy Rickard](https://github.com/jeremyrickard), Microsoft
* [Jordan Liggitt](https://github.com/liggitt), Google
* [Micah Hausler](https://github.com/micahhausler), Amazon
|* [Slack](https://kubernetes.slack.com/messages/wg-lts)
* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-lts)|* Regular WG Meeting: [Tuesdays at 07:00 PT (Pacific Time) (biweekly)](https://zoom.us/j/92480197536?pwd=dmtSMGJRQmNYYTIyZkFlQ25JRngrdz09)
|[Policy](wg-policy/README.md)|[policy](https://github.com/kubernetes/kubernetes/labels/wg%2Fpolicy)|* Architecture
* Auth
* Multicluster
* Network
* Node
* Scheduling
* Storage
|* [Jim Bugwadia](https://github.com/JimBugwadia), Kyverno/Nirmata
* [Poonam Lamba](https://github.com/poonam-lamba), Google
* [Andy Suderman](https://github.com/sudermanjr), Fairwinds
|* [Slack](https://kubernetes.slack.com/messages/wg-policy)
* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-wg-policy)|* Regular WG Meeting: [Wednesdays at 8:00 PT (Pacific Time) (semimonthly)](https://zoom.us/j/7375677271)
+|[Serving](wg-serving/README.md)|[serving](https://github.com/kubernetes/kubernetes/labels/wg%2Fserving)|* Apps
* Architecture
* Autoscaling
* Instrumentation
* Network
* Node
* Scheduling
* Storage
|* [Eduardo Arango](https://github.com/ArangoGutierrez), NVIDIA
* [Sergey Kanzhelev](https://github.com/SergeyKanzhelev), Google
* [Yuan Tang](https://github.com/terrytangyuan), Red Hat
|* [Slack](https://kubernetes.slack.com/messages/wg-serving)
* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-serving)|* WG Serving weekly meeting ([Calendar](https://calendar.google.com/calendar/embed?src=e896b769743f3877edfab2d4c6a14132b2aa53287021e9bbf113cab676da54ba%40group.calendar.google.com)): [Wednesdays at 9:00 AM PT (Pacific Time) (weekly)](https://zoom.us/j/93517402529?pwd=RnkwUUQ4L3J2QmNYYlNBcnZGbXcvQT09)
|[Structured Logging](wg-structured-logging/README.md)|[structured-logging](https://github.com/kubernetes/kubernetes/labels/wg%2Fstructured-logging)|* API Machinery
* Architecture
* Cloud Provider
* Instrumentation
* Network
* Node
* Scheduling
* Storage
|* [Mengjiao Liu](https://github.com/mengjiao-liu), DaoCloud
* [Patrick Ohly](https://github.com/pohly), Intel
|* [Slack](https://kubernetes.slack.com/messages/wg-structured-logging)
* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-wg-structured-logging)| ### Committees diff --git a/sig-network/README.md b/sig-network/README.md index 99283aa437a..6d53ba81254 100644 --- a/sig-network/README.md +++ b/sig-network/README.md @@ -74,6 +74,7 @@ subprojects, and resolve cross-subproject technical issues and decisions. The following [working groups][working-group-definition] are sponsored by sig-network: * [WG Device Management](/wg-device-management) * [WG Policy](/wg-policy) +* [WG Serving](/wg-serving) * [WG Structured Logging](/wg-structured-logging) diff --git a/sig-node/README.md b/sig-node/README.md index 4e0dc89ed6c..be0bf38c1a3 100644 --- a/sig-node/README.md +++ b/sig-node/README.md @@ -55,6 +55,7 @@ The following [working groups][working-group-definition] are sponsored by sig-no * [WG Batch](/wg-batch) * [WG Device Management](/wg-device-management) * [WG Policy](/wg-policy) +* [WG Serving](/wg-serving) * [WG Structured Logging](/wg-structured-logging) diff --git a/sig-scheduling/README.md b/sig-scheduling/README.md index 3901cf69fb0..9980cd500ec 100644 --- a/sig-scheduling/README.md +++ b/sig-scheduling/README.md @@ -65,6 +65,7 @@ The following [working groups][working-group-definition] are sponsored by sig-sc * [WG Batch](/wg-batch) * [WG Device Management](/wg-device-management) * [WG Policy](/wg-policy) +* [WG Serving](/wg-serving) * [WG Structured Logging](/wg-structured-logging) diff --git a/sig-storage/README.md b/sig-storage/README.md index 7640d2be84d..042ff5dd4d8 100644 --- a/sig-storage/README.md +++ b/sig-storage/README.md @@ -58,6 +58,7 @@ subprojects, and resolve cross-subproject technical issues and decisions. 
The following [working groups][working-group-definition] are sponsored by sig-storage: * [WG Data Protection](/wg-data-protection) * [WG Policy](/wg-policy) +* [WG Serving](/wg-serving) * [WG Structured Logging](/wg-structured-logging) diff --git a/sigs.yaml b/sigs.yaml index 2a0844a49eb..490533115e8 100644 --- a/sigs.yaml +++ b/sigs.yaml @@ -3481,6 +3481,50 @@ workinggroups: liaison: github: pohly name: Patrick Ohly +- dir: wg-serving + name: Serving + mission_statement: > + Discuss and enhance the support of inference serving for accelerated workloads + in Kubernetes. Make Kubernetes the natural choice for hosting production inference + reliably, and improve all serving workloads along the way. + + charter_link: charter.md + stakeholder_sigs: + - Apps + - Architecture + - Autoscaling + - Instrumentation + - Network + - Node + - Scheduling + - Storage + label: serving + leadership: + chairs: + - github: ArangoGutierrez + name: Eduardo Arango + company: NVIDIA + - github: SergeyKanzhelev + name: Sergey Kanzhelev + company: Google + - github: terrytangyuan + name: Yuan Tang + company: Red Hat + meetings: + - description: WG Serving weekly meeting ([Calendar](https://calendar.google.com/calendar/embed?src=e896b769743f3877edfab2d4c6a14132b2aa53287021e9bbf113cab676da54ba%40group.calendar.google.com)) + day: Wednesday + time: 9:00 AM + tz: PT (Pacific Time) + frequency: weekly + url: https://zoom.us/j/93517402529?pwd=RnkwUUQ4L3J2QmNYYlNBcnZGbXcvQT09 + archive_url: https://docs.google.com/document/d/1aExJFtaLnO-TM6_2uILgI8NI0IjOm7FcwLABBKEMEo0/edit + recordings_url: https://www.youtube.com/playlist?list=TODO + contact: + slack: wg-serving + mailing_list: https://groups.google.com/a/kubernetes.io/g/wg-serving + liaison: + github: soltysh + name: Maciej Szulik - dir: wg-structured-logging name: Structured Logging mission_statement: > diff --git a/wg-serving/README.md b/wg-serving/README.md new file mode 100644 index 00000000000..35b2db0d568 --- /dev/null +++ 
b/wg-serving/README.md @@ -0,0 +1,44 @@ + +# Serving Working Group + +Discuss and enhance the support of inference serving for accelerated workloads in Kubernetes. Make Kubernetes the natural choice for hosting production inference reliably, and improve all serving workloads along the way. + +The [charter](charter.md) defines the scope and governance of the Serving Working Group. + +## Stakeholder SIGs +* [SIG Apps](/sig-apps) +* [SIG Architecture](/sig-architecture) +* [SIG Autoscaling](/sig-autoscaling) +* [SIG Instrumentation](/sig-instrumentation) +* [SIG Network](/sig-network) +* [SIG Node](/sig-node) +* [SIG Scheduling](/sig-scheduling) +* [SIG Storage](/sig-storage) + +## Meetings +*Joining the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-serving) for the group will typically add invites for the following meetings to your calendar.* +* WG Serving weekly meeting ([Calendar](https://calendar.google.com/calendar/embed?src=e896b769743f3877edfab2d4c6a14132b2aa53287021e9bbf113cab676da54ba%40group.calendar.google.com)): [Wednesdays at 9:00 AM PT (Pacific Time)](https://zoom.us/j/93517402529?pwd=RnkwUUQ4L3J2QmNYYlNBcnZGbXcvQT09) (weekly). [Convert to your timezone](http://www.thetimezoneconverter.com/?t=9:00 AM&tz=PT%20%28Pacific%20Time%29). + * [Meeting notes and Agenda](https://docs.google.com/document/d/1aExJFtaLnO-TM6_2uILgI8NI0IjOm7FcwLABBKEMEo0/edit). + * [Meeting recordings](https://www.youtube.com/playlist?list=TODO). 
+ +## Organizers + +* Eduardo Arango (**[@ArangoGutierrez](https://github.com/ArangoGutierrez)**), NVIDIA +* Sergey Kanzhelev (**[@SergeyKanzhelev](https://github.com/SergeyKanzhelev)**), Google +* Yuan Tang (**[@terrytangyuan](https://github.com/terrytangyuan)**), Red Hat + +## Contact +- Slack: [#wg-serving](https://kubernetes.slack.com/messages/wg-serving) +- [Mailing list](https://groups.google.com/a/kubernetes.io/g/wg-serving) +- [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/wg%2Fserving) +- Steering Committee Liaison: Maciej Szulik (**[@soltysh](https://github.com/soltysh)**) + + + diff --git a/wg-serving/charter.md b/wg-serving/charter.md new file mode 100644 index 00000000000..fe93ebc50b9 --- /dev/null +++ b/wg-serving/charter.md @@ -0,0 +1,98 @@ +# WG Serving Charter + +This charter adheres to the conventions described in the [Kubernetes Charter README] and uses +the Roles and Organization Management outlined in [wg-governance]. + +[Kubernetes Charter README]: /committee-steering/governance/README.md + +## Scope + +Discuss and enhance serving workloads on Kubernetes, specifically focusing on +hardware-accelerated AI/ML inference. The working group will focus on the novel +challenges of compute-intensive online inference. Scenarios solving use cases +involving non-fungible accelerators will be prioritized over solutions targeting +generic CPUs. However, all improvements should, where possible, benefit other +serving workloads like web services or stateful databases, be usable as +primitives by multiple ecosystem projects, and compose well into the workflows +of those deploying models to production. WG Batch has a similar +scope. By a simplified definition, the difference is that WG Serving +will generally concentrate on workloads where Pods run with +restartPolicy=Always, while WG Batch will generally look at Pods with +restartPolicy=OnFailure. 
There are edge cases to this definition, but it creates +an easy enough framework to differentiate the scope of these two Working Groups. + +### In scope + +- Gather requirements for serving workloads (inference primarily, but benefiting + other non-batch use cases where possible) that have broad community alignment + from practitioners, distros, and vendors. Provide concrete input to other SIGs + and WGs around needs for identified requirements. Do it in partnership + with existing ecosystem projects like KServe, Seldon, Kaito, and + others to identify, extract, or implement solutions to common shared problems (as Kueue + abstracted deferred scheduling for multiple batch frameworks). +- Specific areas of improvement include: + - Directly improve key Kubernetes workload controllers when used with + accelerators and the most common inference serving frameworks and model + servers. + - Explore new projects that improve orchestration, scaling, and load balancing + of inference workloads and compose well with other workloads on Kubernetes. + - Run serving workloads safely while giving up + available slack capacity to batch frameworks. + +### Out of scope + +- Training and batch inference, which are covered by WG Batch. +- The ability to describe workflows for serving workloads is out of scope; + Kubernetes will offer building blocks to MLOps platforms to build those. + +## Stakeholders + +Stakeholders in this working group span multiple SIGs that own parts of the +code in core Kubernetes components and addons. 
+ +- SIG Apps as a primary SIG +- SIG Architecture +- SIG Node +- SIG Scheduling +- SIG Autoscaling +- SIG Network +- SIG Instrumentation +- SIG Storage + +## Deliverables + +The list of deliverables includes the following high-level features: + +- To SIG Apps: + - Ability to express model serving workloads with easy-to-understand logical + objects with the ability to scale to multi-host +- To SIG Scheduling and SIG Autoscaling: + - Faster scaling up and down + - Ability to preempt workloads +- To SIG Node: + - Runtime support for Pod preemption + - Runtime support for device partitioning + +## Roles and Organization Management + +This WG adheres to the Roles and Organization Management outlined in [wg-governance] +and opts in to updates and modifications to [wg-governance]. + +[wg-governance]: /committee-steering/governance/wg-governance.md + +Additionally, the WG commits to: + +- maintain a solid communication line between the Kubernetes groups and the wider CNCF community; +- submit a proposal to the KubeCon/CloudNativeCon maintainers track. + +## Timelines and Disbanding + +As a first mandate, the WG will define a roadmap in the first quarter of operation. +We believe there will be a set of features the Working Group can identify and deliver +that will enable the majority of frameworks to operate natively on Kubernetes. + +Achieving the aforementioned deliverables, also mentioned in the `In Scope` +section, will allow us to decide when to disband this WG. There is no +expectation that the Working Group will be converted into a SIG long term; +however, there is a chance that a separate project or a sizeable sub-component +of SIG Apps can be created as a result of the Working Group.