[Release 1.2] Feature Planning / Roadmap #5224

Closed
swiftdiaries opened this issue Aug 18, 2020 · 25 comments

@swiftdiaries
Member

/kind question

Question:
Hey all,
This is a tracking issue for feature planning and the release roadmap for Kubeflow 1.2.
This will serve as a common issue to track features going into 1.2 and touch base with the application owners / WG owners.

It'd be great if they could post a list of things they're planning around 1.2.
/cc
Application WGs:

@issue-label-bot

Issue Label Bot is not confident enough to auto-label this issue.
See dashboard for more details.

@kubeflow-bot kubeflow-bot added this to To Do in Needs Triage Aug 18, 2020
@swiftdiaries swiftdiaries moved this from To Do to High priority in Needs Triage Aug 18, 2020
@swiftdiaries swiftdiaries pinned this issue Aug 18, 2020
@swiftdiaries swiftdiaries added this to To do in Kubeflow 1.2 via automation Aug 18, 2020
@swiftdiaries
Member Author

KF Serving: https://kubeflow.slack.com/archives/CH6E58LNP/p1597776731267800
The latest release was KFS v0.4.
This will probably be the release that goes into KF v1.2.

@issue-label-bot

Issue-Label Bot is automatically applying the labels:

Label Probability
area/engprod 0.59


@swiftdiaries swiftdiaries added the area/release Release related work items label Aug 18, 2020
@Bobgy
Contributor

Bobgy commented Aug 20, 2020

For KF Pipelines, we don't have headline features planned for the upcoming 3 months.
KFP IR is our focus right now, and it will take more than 3 months to land.

@andreyvelich
Member

For AutoML, the main feature will be switching from Katib v1alpha3 to v1beta1.
In addition, we want to bring these features/enhancements:

  1. Support new Trial templates (ref: Support new Trial Template in v1beta1 katib#1208).
  2. Extracting metrics in multiple ways (ref: extracting metric value in multiple ways katib#1140).
  3. Resume Experiment from volume (ref: Save Suggestion state in persistent volume katib#1250).
  4. Enable using custom CRDs as Trial templates (ref: [Feature] Modify Job provider to support any kind of Kubernetes CRDs katib#1214).
  5. Early stopping (ref: support early stop feature katib#692).
  6. Test infrastructure design for AutoML WG (ref: A more sustainable approach to owning and maintaining test/release infrastructure testing#737, Insufficient regional quota to satisfy request and katib job is blocked. testing#749).
  7. Unit/e2e test improvements.
  8. UI enhancements and bug fixes.

/cc @gaocegege @johnugeorge

For Training WG, I think one of the main features will be migrating the controller code to use the kubeflow/common repository.
TF operator: kubeflow/training-operator#1171.
PyTorch operator: kubeflow/pytorch-operator#282.

@Jeffwan @terrytangyuan @ChanYiLin What else do we want to bring from Training WG for Kubeflow 1.2?

@jlewi
Contributor

jlewi commented Sep 1, 2020

@swiftdiaries What is the timeline for 1.2?

@swiftdiaries
Member Author

Mid-November was the target I was shooting for, so mid-October is when I hope to start the process. I want to do a minor release in 2 weeks to capture the current state of things and document it.

@Jeffwan
Member

Jeffwan commented Sep 1, 2020

Training stories for 1.2 and this year have been documented in the roadmap doc here. I think WG-Training will try to finish as many as we can. Some of them might be postponed to the 1.3 release. I think we can update the progress later.

/cc @terrytangyuan @gaocegege @johnugeorge @ChanYiLin Feel free to add more stories.

Maintenance and reliability

  • Enhance maintainability of operator common module. Related issue: #54.
  • Migrate operators to use kubeflow/common APIs. Related issue: #64.

Features

  • Support PyTorch elastic in pytorch-operator. Related issue: pytorch-operator#296.
  • Surface Pod and other errors that prevent TFJob from starting. Related issue: tf-operator#1131.
  • Support dynamic volume provisioning for distributed training jobs. Related issue: #19.
  • MLOps - Allow user to submit jobs using Git repo without building container images. Related issue: #66.
  • Add Job priority and Queue in SchedulingPolicy for advanced scheduling in common operator. Related issue: #46.
  • Add pipeline launcher components for different training jobs. Related issue: pipeline#3445.

Monitoring

  • Provide a standardized logging interface. Related issue: #60.
  • Expose generic Prometheus metrics in common operators. Related issue: #22.
  • Centralized Job Dashboard for training jobs (Add metadata graph, model artifacts later). Related issue: #67.

Performance

@jlewi
Contributor

jlewi commented Oct 6, 2020

@swiftdiaries Could you provide an update on the release? Are the respective WGs on track? It would be good to have a standing item on the agenda to discuss the release.

@jlewi
Contributor

jlewi commented Oct 13, 2020

Any update on the release?

/cc @kubeflow/wg-automl-leads @kubeflow/wg-serving-leads @kubeflow/wg-training-leads

@jlewi
Contributor

jlewi commented Oct 15, 2020

@kimwnasptd @thesuperzapper any update for notebooks?

@Jeffwan @animeshsingh @yanniszark @swiftdiaries @vpavlin @yanniszark any update for deployments/kfctl/manifests?

@Jeffwan
Member

Jeffwan commented Oct 15, 2020

@jlewi I will give some updates on the training side later next week.

@jlewi
Contributor

jlewi commented Oct 16, 2020

Prospective deployment leads (@Jeffwan @animeshsingh @yanniszark @swiftdiaries @vpavlin @yanniszark): releases are arguably defined by the branch cut of kubeflow/manifests.

Per kubeflow/community#402, this repo will be owned by you. Do we have a date for a branch cut? The initial suggested date was mid-October. We are arguably past mid-October.

Does someone want to volunteer to take the lead on cutting the branch and coordinating with the WGs to get their manifests updated?

@Jeffwan
Member

Jeffwan commented Oct 16, 2020

@jlewi Let me help on 1.2. I can pick up the work from @swiftdiaries and coordinate with other WG leads.

My concern is that we don't have enough work to roll out in 1.2, and whether we still want to keep a 3-month release period.

@andreyvelich
Member

Any update on the release?

/cc @kubeflow/wg-automl-leads @kubeflow/wg-serving-leads @kubeflow/wg-training-leads

From AutoML WG, this is the tracking issue: kubeflow/katib#1360.

@thesuperzapper
Member

Depending on the cutoff date for Kubeflow 1.2, we would really like to get the new crud-web-app implemented to replace the old jupyter-web-app. The first PR is #5332, but there is more to come before it will be ready for prime time.

If the cutoff is really soon, we might use some variation of PR #5280 to ensure we get some features released.

Comments @kimwnasptd?

@jlewi
Contributor

jlewi commented Oct 19, 2020

@thesuperzapper The cutoff date was mid-October, so it has arguably passed. I would encourage all application owners to ensure their manifests are as up to date as possible to avoid missing the release.

@thesuperzapper
Member

thesuperzapper commented Oct 19, 2020

@jlewi, while I agree that we missed the boat (assuming a mid-October cutoff), this raises some questions about Kubeflow 1.2.

What major changes are being implemented?

I agree with @Jeffwan that we might consider changing how we manage releases to a voting system, to make sure there is sufficient differentiation between releases.

This is especially important because our release process is a bit arduous and long.

EDIT: I also had another thought. Given how much work is involved for users of Kubeflow to install/update, I think there is some merit in waiting for the deployment working group to resolve the issues with the manifests repo before releasing any new versions.

@Jeffwan
Member

Jeffwan commented Oct 26, 2020

I sent out an email about the 1.2 release. We are still targeting Nov 16 and plan to cut the release around Nov 7. I don't think we will have as many features to roll out as before. Stability is the first priority in this version. I will pull a list from an engineering point of view and coordinate with PMs on the list of features/improvements in 1.2.

WG owners, please help make sure code is complete ASAP, and let's at least build a stable manifest into 1.2. For Kubeflow manifest testing, I need help from the different cloud providers and vendors to sign off on the release. @PatrickXYS @animeshsingh @yanniszark @vpavlin @Bobgy @adrian555

/cc @kubeflow/wg-automl-leads
/cc @kubeflow/wg-pipeline-leads
/cc @kubeflow/wg-serving-leads
/cc @kubeflow/wg-training-leads
/cc @kubeflow/wg-notebook-leads

@PatrickXYS
Member

Thanks for driving 1.2, @Jeffwan.

I'll help validate the release from the AWS side as well.

@Bobgy
Contributor

Bobgy commented Oct 27, 2020

The Pipelines side will update manifests to include KFP 1.0.4.
I'll own that.

@nakfour
Member

nakfour commented Oct 29, 2020

@Jeffwan can we add kubeflow/manifests#1567?

@thesuperzapper
Member

@Jeffwan I thought we agreed that the release would be cut on the 15th, not the 7th.

@Jeffwan
Member

Jeffwan commented Oct 30, 2020

@theadactyl Hmm, I thought that was the official release date. Is there a large feature that isn't done yet that we want to bring into 1.2? I synced with AutoML, Pipelines, and Serving; they have already cut the Katib, KFP, and KFServing releases for v1.2. I'm not sure about the new UI on the notebook side. Any estimate?

@Jeffwan
Member

Jeffwan commented Nov 18, 2020

1.2 Feature Planning / Roadmap is done. Let's track release progress in #5371.

@Jeffwan Jeffwan closed this as completed Nov 18, 2020
Needs Triage automation moved this from High priority to Closed Nov 18, 2020
Kubeflow 1.2 automation moved this from To do to Done Nov 18, 2020
@kubeflow-bot kubeflow-bot removed this from Closed in Needs Triage Nov 18, 2020
9 participants