Roadmaps for 1.4 release #1496

Closed
Jeffwan opened this issue Dec 1, 2021 · 2 comments

Comments

Member

Jeffwan commented Dec 1, 2021

Training Operator 1.3 was released a few months ago, so it's time to build a wishlist for the next release. The community is also collecting roadmaps for the Kubeflow 1.5 release (Jan or Feb?): kubeflow/community#535

I think for the next release we can put more time into building a decent elastic training story.

  • @gaocegege is working on the PyTorch parts, and the large PR has been merged.
  • On the other hand, mpi-operator v1 has been integrated into training-operator, and we can enrich the elastic work (expose arbitrary workers to scale in, instead of just changing the number of workers) based on what @zw0610 did in the past.
  • If we have additional time, we can revisit the TensorFlow elastic training story.

There are also some meaningful nice-to-have tasks, such as a GenericJob that can support frameworks flexibly. Supporting different gang definitions may be worth exploring as well. Feel free to brainstorm ideas; we can then summarize a roadmap and recruit contributors and release managers.

/cc @kubeflow/wg-training-leads

Member

terrytangyuan commented Dec 1, 2021

Notes from today's community meeting:


stale bot commented Mar 2, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
