[AWS] Kubeflow v1.2 features #5178

Closed
Jeffwan opened this issue Jul 31, 2020 · 7 comments

Comments

@Jeffwan
Member

Jeffwan commented Jul 31, 2020

/kind question

Question:
I'd like to collect feedback on the 1.2 features users want on the AWS side.

We recently published a blog post describing the existing AWS work on Kubeflow.
https://aws.amazon.com/blogs/opensource/enterprise-ready-kubeflow-securing-and-scaling-ai-and-machine-learning-pipelines-with-aws/

This issue only tracks AWS-specific features such as service integrations. We will definitely contribute to generic features and projects as well, but it's better to discuss those in a separate topic.

Kubeflow

  • Consider if we still want to use kfctl to manage non-k8s resources.
    • Evaluate kfctl - operator
    • Evaluate CDK
  • Keep investigating multi-user support in Kubeflow Pipelines (authorization, manifests, and artifacts)
  • Tekton-based testing infrastructure
  • [kfctl] Support addons in AWSPlugin - users want to install Fluentd, NVIDIA, FSx, EFS, etc. on their own
  • Better GPU monitoring (metrics) in Jupyter Notebook and Kubeflow Pipelines
  • Better log support in Kubeflow Pipelines and the training operators
  • Consider creating a single training operator to simplify installation - work with WG Training.
  • Support manifest upgrades (need to identify applications and stable components first)
  • Keep improving the v3 manifests: organize them into stacks and remove low-quality applications
  • Jupyter Infra improvement plan
@issue-label-bot

Issue-Label Bot is automatically applying the labels:

Label Probability
platform/aws 0.90

Please mark this comment with 👍 or 👎 to give our bot feedback!
Links: app homepage, dashboard and code for this bot.

@issue-label-bot

Issue Label Bot is not confident enough to auto-label this issue.
See dashboard for more details.

@issue-label-bot

Issue-Label Bot is automatically applying the labels:

Label Probability
kind/feature 0.59

Please mark this comment with 👍 or 👎 to give our bot feedback!
Links: app homepage, dashboard and code for this bot.

@apryiomka

apryiomka commented Aug 18, 2020

My recommendation:

  1. dropping kfctl completely and deploying the manifests with kustomize only
  2. investing in a solid metadata service that is shared by notebooks, Katib, and KFP
  3. integrating Katib with metadata and KFP instead of having it use its own standalone DB
  4. improving the KFP SDK to be more applied-scientist friendly
  5. allowing kustomization of the notebooks UI, e.g. removing or disabling k8s components such as persistent volumes that model developers do not use
  6. multi-cluster deployment - borrowed from Flyte
  7. improving the KFP rendering UI to scale to relatively large workflows, say tens or hundreds of parallel subgraphs

@issue-label-bot

Issue-Label Bot is automatically applying the labels:

Label Probability
area/kfctl 0.65

Please mark this comment with 👍 or 👎 to give our bot feedback!
Links: app homepage, dashboard and code for this bot.

jtfogarty moved this from To Do to Assigned to Area Owner For Triage in Needs Triage on Oct 7, 2020
kubeflow-bot removed this from Assigned to Area Owner For Triage in Needs Triage on Oct 7, 2020
@Jeffwan
Member Author

Jeffwan commented Oct 8, 2020

/assign @PatrickXYS

@stale

stale bot commented Jan 9, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the lifecycle/stale label on Jan 9, 2021
stale bot closed this as completed on Jan 17, 2021