@tarilabs
Member

I'm following up on the action item to raise a WG proposal to Kubeflow, per yesterday's Model Registry meeting (recording timestamp).

As discussed in the KF community meeting.

Main links:

👉 I'm opening a draft PR to "seed/bootstrap" the work of raising the request to form the WG. Using a draft PR gives us a branch that stakeholders can collaborate on: @andreyvelich @Tomcli @dhirajsb @rimolive

This also gives us a medium we can keep tabs on, so we can report back on progress during Tuesday's community plenary meetings, wdyt?

@thesuperzapper
Member

I am very strongly opposed to using the name WG-Lifecycle, because that implies that the working group is related to the lifecycle of Kubeflow itself.

My proposal for the name is: WG-Data

Where "data" can mean both actual data (spark) and metadata (model registry). We can also split it up in the future, if the members who are maintaining these components diverge.

@tarilabs
Member Author

My proposal for the name is: WG-Data

very well noted @thesuperzapper, as also marked here:
https://github.com/kubeflow/community/pull/673/files#diff-11b55409b3d27f083915bd4b910672caaf0e9550cf34d77fe76e8b6b9515023dR524

I just wanted a branch where we can start collecting this kind of feedback in one place, and also to report back to you and the group on progress at the Tuesday meetings.

@dhirajsb

dhirajsb commented Dec 14, 2023

@thesuperzapper how about we make it more explicit: WG ML Model Data?

@thesuperzapper
Member

As it currently stands, this WG does not meet the requirement for diverse leadership, given that all chairs come from one company (IBM, which owns Red Hat).

@dhirajsb

@thesuperzapper Andrey is listed as a Chair, and he's from Apple.

@tarilabs
Member Author

I notice only now that this was not marked as a Draft PR, despite that being my intent:

using a draft PR gives us a branch we can collaborate on

my sincerest apologies.

Marked as Draft PR per the original message in this thread.

@rimolive
Member

@thesuperzapper Is there a minimum number of companies to compose the chair to make the WG eligible?

@thesuperzapper
Member

While there is no specific number requirement, the steering committee (currently @jbottum and @james-jwu) must approve the new WG in line with the community's interests. I would expect at least some concern with having 4 leads from one company and only 1 from another.

For reference, here is the lifecycle and other info about forming a working group:

Also, there are only meant to be 2-3 chairs. Some other WGs have more, but in most cases there are 2 active members and we just need to formally clean up the inactive chairs.

@thesuperzapper
Member

Also, some of the proposed chairs are not even current Kubeflow org members, so they are ineligible unless they go through that process first:

@rimolive
Member

Thank you for the references! Those are valid points though, and I'll see how we can work on the eligibility topic as well as your concerns.

@tarilabs
Member Author

As Ricardo noted, thanks!

Is there guidance on deputies to keep WG work going during leaves, please?
The reason for listing more than 3 chairs is that I was going through this point earlier today and, seeing that other WGs have more than 3, I assumed it was for that purpose.

As noted, I will work to account for all the feedback received. Thank you, it is very helpful.

@andreyvelich
Member

Thank you for starting this @tarilabs! Let's collaborate on this PR for the WG Charter and Name.

Please suggest how we should name this WG, which will initially have the Spark Operator and Model Registry components.

A few initial suggestions if WG Lifecycle is too ambitious:

  • WG Data
  • WG ML Data
  • WG ML Lifecycle

I would expect at least some concern with having 4 leads from one company and only 1 from another.

This is a valid concern, @thesuperzapper. We can add folks from the Spark Operator maintainers to this WG.
cc @mwielgus @vara-bonthu @yuchaoran2011

@andreyvelich
Member

cc @kubeflow/wg-training-leads
@kubeflow/wg-pipeline-leads
@kubeflow/wg-deployment-leads
@kubeflow/wg-notebooks-leads
@kubeflow/wg-manifests-leads

@bigsur0

bigsur0 commented Dec 15, 2023

I would request "WG ML Lifecycle" if the purpose of the group is to house things in the MLOps orbit that don't have a more specific working group yet, so they can "incubate". Data Preparation, Feature Store, and Model Registry are 3 recently discussed examples that likely aren't big enough yet to have their own working group. I guess one key aspect here is to consider how new efforts can happen without the overhead of setting up a new working group for each one until it is truly merited and bandwidth is available.

Is there a process that exists for refactoring a topic out of one working group to a new working group?

@jbottum
Contributor

jbottum commented Dec 18, 2023

Kubeflow seems to be entering a new growth phase. The community needs a structure to support add-on components (Spark, Ray, Model Registry, Feature Store, etc). We want to encourage contributors and users to meet, discuss, experiment, decide, store code and produce documentation with a goal that integrations will help both Kubeflow and the add-on projects. We need to minimize overhead. We need to set expectations (of support...to/from Kubeflow and for users) especially if we are experimenting and trying to find market acceptance. Most importantly, we need active user participation, comment and leadership. I want to move this forward...I am a +1 to adding a single umbrella WG for all of these projects to get things moving. @james-jwu would you please provide your thoughts

@thesuperzapper
Member

I think that the name WG Data will happily encompass the various categories proposed:

  • distributed processing (Spark, Ray, etc.)
  • model registry (unnamed Red Hat proposal)
  • feature store (potentially Feast)

Also, WG Data follows the convention of being a single word, like all other working group names.

I am still very much against WG Lifecycle. At best it's like calling it WG Other, because the whole point of Kubeflow is to map across the MLOps lifecycle, so it's just confusing.

Separately from the discussion around names, I think we should confirm that the maintainers of these various components actually overlap; otherwise it will be difficult for this "mega working group" to function.

@vara-bonthu
Contributor

+1 to @thesuperzapper

I would suggest voting for WG Data, as it seems most appropriate for the Spark Operator. This is because it is primarily used for data processing, both batch and streaming, as well as some ML processing.

@tarilabs
Member Author

tarilabs commented Dec 19, 2023

New commit ae188fe incorporates some feedback received around:

  • made it even more prominent that the name is provisional. Noted that the more recent feedback here and here seems likely to converge on WG Data eventually, but while this is still a draft there is a chance to account for all proposals, like here
  • reflected that the name is provisional in the PR title
  • reworked designated chairs

I will keep everyone posted on any further updates during the KF Community meeting.

@tarilabs changed the title from "WG Lifecycle proposal" to "WG Data(name provisional) proposal" on Dec 19, 2023

@thesuperzapper
Member

Just so we are clear, I think WG Data should be the name, not WG ML Data as the PR currently stands.

quote:

Given current discussions with Feast, I recommend we start the Data WG without them and update the WG later.

ref:

kubeflow#673 (comment)

This reverts commit fa3c318.

Signed-off-by: tarilabs <matteo.mortari@gmail.com>
@tarilabs
Member Author

Given current discussions with Feast, I recommend we start the Data WG without them and update the WG later.

this is one of those times I'm glad we kept the commit history in this PR tidy; done with 77d772d

@andreyvelich
Member

@tarilabs Can you please rebase your PR so we can merge it?

@tarilabs
Member Author

@tarilabs Can you please rebase your PR so we can merge it?

yes I'm on it 🙏

@tarilabs
Member Author

one moment sorry
/hold

resolve conflict on OWNER_ALIASES from upstream/master

Signed-off-by: tarilabs <matteo.mortari@gmail.com>
@tarilabs
Member Author

/remove-hold

Member

@andreyvelich andreyvelich left a comment


Thanks @tarilabs!
Just small updates.

name: Matteo Mortari
company: Red Hat
meetings:
- description: KF Model Registry community meeting (US/EMEA)
Member

Can you also add the Spark Operator call, please?
cc @vara-bonthu @ChenYi015 @nabuskey

Member Author

lmk if 89ddd0a satisfies this

Member

@tarilabs can you run make generate to re-generate the readme?

Member Author

it's been a while since I last looked into this PR, thanks for the reminder 🙏 👀

Member Author

run make generate to re-generate the readme

done with 8f3f780

tarilabs and others added 5 commits August 20, 2025 15:40
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Signed-off-by: Matteo Mortari <matteo.mortari@gmail.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Signed-off-by: Matteo Mortari <matteo.mortari@gmail.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Signed-off-by: Matteo Mortari <matteo.mortari@gmail.com>
Signed-off-by: tarilabs <matteo.mortari@gmail.com>
Member

@andreyvelich andreyvelich left a comment


Finally we can move forward with this PR 🎉
Thanks everyone!
/lgtm
/approve
/hold
@tarilabs Feel free to un-hold. Please also create a PR for the Kubeflow website to update the WG list: https://www.kubeflow.org/docs/about/community/#kubeflow-working-groups

@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich


@tarilabs
Member Author

thank you @andreyvelich for the confirmation!
will unhold and create the PR

/remove-hold
ref: #673 (review)

@google-oss-prow google-oss-prow bot merged commit 7c3f61a into kubeflow:master Aug 20, 2025
2 checks passed
tarilabs added a commit to tarilabs/kubeflow-website that referenced this pull request Aug 20, 2025
followups to kubeflow/community#673 (review)

Signed-off-by: Matteo Mortari <matteo.mortari@gmail.com>
google-oss-prow bot pushed a commit to kubeflow/website that referenced this pull request Aug 20, 2025
followups to kubeflow/community#673 (review)

Signed-off-by: Matteo Mortari <matteo.mortari@gmail.com>