Add Katib ROADMAP 2022/2023 #2153

Merged 4 commits on Aug 24, 2023

@andreyvelich commented May 10, 2023
Member

I created an initial draft of the Katib ROADMAP for 2022/2023.
I tried to consolidate the features that we implemented in 2022, since we didn't update the ROADMAP in 2022 (if I missed any significant features that we've done, please let me know).

Please let me know if you have better ideas for how to name the headers.

Let's discuss the items over the next month and finalise the ROADMAP.

/cc @johnugeorge @gaocegege @tenzen-y @anencore94 @kimwnasptd @elenzio9 @orfeas-k @apo-ger @d-gol @shaowei-su @zhixian82 @fischor @keisuke-umezawa @jbottum @DnPlas @juliusvonkohout

cc @kubeflow/wg-training-leads

@google-oss-prow

@andreyvelich: GitHub didn't allow me to request PR reviews from the following users: d-gol, shaowei-su, fischor, keisuke-umezawa, zhixian82.

Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs.


@andreyvelich
Member Author

/hold for review

@tenzen-y left a comment
Member

Thank you for compiling our roadmap!

Also, I think creating a separate CRD for NAS might be better, since the current Experiment CRD seems a bit complex.

- Support advanced HyperParameter tuning algorithms:
  - Population Based Training (PBT) - [#1382](https://github.com/kubeflow/katib/issues/1382)
  - Tree of Parzen Estimators (TPE)
Member

Member Author

That's right @tenzen-y.
Since this ROADMAP covers 2022/2023, I tried to outline the features that we already implemented in 2022 there, so users will have an overview of what we do in 2022-2023.

Member

Ah, I see.
That's great! Thanks for clarifying!

@andreyvelich
Member Author

> Also, I think creating a separate CRD for NAS might be better, since the current Experiment CRD seems a bit complex.

@tenzen-y Yes, we discussed this before: how to separate the various AutoML features from the Kubernetes CRD perspective:

  1. HyperParameter tuning
  2. NAS
  3. Auto Feature Engineering
  4. Auto Model Compression

Then it will be easier to maintain the APIs for every aspect of AutoML.
I think we need to decide on the API structure before moving Katib components to the v1 version.

@tenzen-y
Member

> > Also, I think creating a separate CRD for NAS might be better, since the current Experiment CRD seems a bit complex.
>
> @tenzen-y Yes, we discussed this before: how to separate the various AutoML features from the Kubernetes CRD perspective:
>
> 1. HyperParameter tuning
> 2. NAS
> 3. Auto Feature Engineering
> 4. Auto Model Compression
>
> Then it will be easier to maintain the APIs for every aspect of AutoML. I think we need to decide on the API structure before moving Katib components to the v1 version.

Oh, I see. Thank you for letting me know :) It would be worth creating an issue for discussion.

@johnugeorge
Member

> > Also, I think creating a separate CRD for NAS might be better, since the current Experiment CRD seems a bit complex.
>
> @tenzen-y Yes, we discussed this before: how to separate the various AutoML features from the Kubernetes CRD perspective:
>
> 1. HyperParameter tuning
> 2. NAS
> 3. Auto Feature Engineering
> 4. Auto Model Compression
>
> Then it will be easier to maintain the APIs for every aspect of AutoML. I think we need to decide on the API structure before moving Katib components to the v1 version.

I feel we need to make the API changes in this release if we are planning to make them.
For example: can the "Experiment" resource always refer to HP tuning? Should we rename the API resource? Another option is to add explicit resource names for the other AutoML features.

@tenzen-y
Member

> I feel we need to make the API changes in this release if we are planning to make them.

If we can gather contributors for it, we can do that in this release. However, changing the API name is a significant amount of work...

@github-actions

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@gaocegege
Member

Should we merge this?

@andreyvelich
Member Author

Sure. I removed the Katib CRD names (Experiment, Suggestion, Trial) from the ROADMAP, since we need to discuss what naming we are going to use in the v1 version of our APIs.
We had an initial discussion about this during our latest community call.
I will create an issue for this soon.

If other items look good, we can merge it.
/assign @gaocegege @tenzen-y @johnugeorge

@andreyvelich changed the title from "[WIP] Add Katib ROADMAP 2022/2023" to "Add Katib ROADMAP 2022/2023" on Aug 24, 2023
@gaocegege
Member

LGTM!

@tenzen-y
Member

Thanks Andrey!
/lgtm

@google-oss-prow bot added the lgtm label on Aug 24, 2023
@andreyvelich
Member Author

/hold cancel

Comment on lines 42 to 43
# network-plugin: cni
# cni: flannel
Member

We need to set up CNI due to kubernetes/minikube#16143.

Member Author

Makes sense, let me add it back.
Not sure why we are getting the following error in our CI:

chmod: cannot access '/etc/cni/net.d': No such file or directory
Error: The process '/usr/bin/sudo' failed with exit code 1

Any ideas?

Member

I'm not sure why the error occurs. How about upgrading the Ubuntu version?

Member Author

Let me try that, 23.04

Member Author

Alright, 22.04 works. Let me submit a separate PR to update the Ubuntu version for our E2E tests.

Member

Thank you 🎉

@terrytangyuan left a comment
Member

/lgtm

@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich, terrytangyuan


@google-oss-prow bot removed the lgtm label on Aug 24, 2023
@andreyvelich
Member Author

/hold

@andreyvelich
Member Author

/hold cancel

@tenzen-y
Member

/lgtm

@google-oss-prow bot added the lgtm label on Aug 24, 2023
@google-oss-prow bot merged commit e3e0aa2 into kubeflow:master on Aug 24, 2023
59 checks passed
@andreyvelich deleted the roadmap-2023 branch on August 24, 2023 23:00