Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] Improve the UX for the katib-config #2150

Closed
Tracked by #2156
tenzen-y opened this issue Apr 24, 2023 · 29 comments · Fixed by #2176
Closed
Tracked by #2156

[Proposal] Improve the UX for the katib-config #2150

tenzen-y opened this issue Apr 24, 2023 · 29 comments · Fixed by #2176

Comments

@tenzen-y
Copy link
Member

/kind feature

Describe the solution you'd like
[A clear and concise description of what you want to happen.]
Currently, YAML (ConfigMap) and JSON (Options) are mixed in katib-config.
IMO, this situation does not have good UX.

So I'd like to enable setting parameters using only YAML.

My proposal is two steps:

  1. Make available to set the parameters as YAML like the following:
apiVersion: config.katib.kubeflow.org/v1beta1
kind: KatibConfig
metrics-collector-sidecar:
- name: StdOut
  image: "docker.io/kubeflowkatib/file-metrics-collector:latest"
suggestion:
- name: random
  image: "docker.io/kubeflowkatib/suggestion-hyperopt:latest"
  resources:
    limits:
      memory: 200Mi
early-stopping:
- name: medianstop
  image: "docker.io/kubeflowkatib/earlystopping-medianstop:latest"
...
  1. Generate a ConfigMap using the katib-config as a YAML defined in the first step:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
configMapGenerator:
- files:
  - katib-config.yaml
  name: katib-config
...

This way is similar to the component-config of kubebuilder and is a popular way, although we don't depend on the component-config API in the katib-config.

Note: The config.katib.kubeflow.org/v1beta1 KatibConfig isn't a CustomResource, just a YAML. So the maintenance costs wouldn't increase.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]


Love this feature? Give it a 👍 We prioritize the features with the most 👍

@tenzen-y
Copy link
Member Author

@andreyvelich
Copy link
Member

I think, it is very interesting idea.

But, users will lose ability to modify Katib Config at runtime (as we mentioned in this doc: https://www.kubeflow.org/docs/components/katib/katib-config/), right ?
The proposed config will be upload to the Katib Controller during startup, and we can't make changes after it.

Which means if users implement custom Suggestion and want to use it with Katib, they have to modify config and restart the controller.

Do we have any workaround for it ?

@gaocegege
Copy link
Member

Thanks for raising this!

Personally, prefer keeping a configmap. But I think we could refine the config structure.

@tenzen-y
Copy link
Member Author

tenzen-y commented Apr 25, 2023

I think, it is very interesting idea.

But, users will lose ability to modify Katib Config at runtime (as we mentioned in this doc: ?https://www.kubeflow.org/docs/components/katib/katib-config/), right ?
The proposed config will be upload to the Katib Controller during startup, and we can't make changes after it.

Which means if users implement custom Suggestion and want to use it with Katib, they have to modify config and restart the controller.

Do we have any workaround for it ?

Thank you for your feedback! @andreyvelich

I think that users can modify the katib-config at runtime since this proposal only contains changing the katib-config schema.
So the katib-controller reads the katib-config using the same logic as the current one.

@tenzen-y
Copy link
Member Author

Thanks for raising this!

Personally, prefer keeping a configmap. But I think we could refine the config structure.

Thanks for your suggestions! @gaocegege

We keep using the configmap after my proposal is implemented.
This means that we can get the below a configmap once run with kustomize build:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: katib-config
  namespace: kubeflow
data:
  katib-config.yaml: |
    apiVersion: config.katib.kubeflow.org/v1beta1
    kind: KatibConfig
    metrics-collector-sidecar:
    - name: StdOut
      image: "docker.io/kubeflowkatib/file-metrics-collector:latest"
    suggestion:
    - name: random
      image: "docker.io/kubeflowkatib/suggestion-hyperopt:latest"
      resources:
        limits:
          memory: 200Mi
    early-stopping:
    - name: medianstop
      image: "docker.io/kubeflowkatib/earlystopping-medianstop:latest"
    ...

@andreyvelich
Copy link
Member

Thanks for the clarification @tenzen-y, that might work.
Then, we are going to parse that ConfigMap during runtime to get the actual Suggestion data, right ?

@tenzen-y
Copy link
Member Author

Thanks for the clarification @tenzen-y, that might work. Then, we are going to parse that ConfigMap during runtime to get the actual Suggestion data, right ?

@andreyvelich Yes, you're right 👍

@tenzen-y
Copy link
Member Author

If @kubeflow/wg-automl-leads don't have any objections, I'd like to work on this. After that, I'll work on #2149.

@andreyvelich
Copy link
Member

Sure, thank you for your contributions @tenzen-y!
/assign @tenzen-y

One question that I might have, do we want to include only Cert Generator args to Katib Config, as you mentioned here: #2149 (comment) ?

Do we want to still use args for Katib Controller settings (e.g. trial-resources)?

@johnugeorge
Copy link
Member

johnugeorge commented May 10, 2023

@tenzen-y To confirm: So for any new change to be made in config, users have to either do Kustomise build on the yaml to generate configmap or can edit generated configmap manually. right?

Btw, this would change the current configmap structure which needs to be added in release notes. I don't think, there is any problem with that though.

@tenzen-y
Copy link
Member Author

One question that I might have, do we want to include only Cert Generator args to Katib Config, as you mentioned here: #2149 (comment) ?

Do we want to still use args for Katib Controller settings (e.g. trial-resources)?

@andreyvelich That's a good point. Ideally, migrating all args to katib-config would be good, although the backward compatibility will be loset. wdyt?

@tenzen-y
Copy link
Member Author

To confirm: So for any new change to be made in config, users have to either do Kustomise build on the yaml to generate configmap or can edit generated configmap manually. right?

@johnugeorge Yes, we keep using pure YAML, not CRD. So, users need to build manifests with kustomize the same as always.

Btw, this would change the current configmap structure which needs to be added in release notes. I don't think, there is any problem with that though.

Sounds great. Let's point this out in a release note.

@johnugeorge
Copy link
Member

Sounds good

@andreyvelich
Copy link
Member

One question that I might have, do we want to include only Cert Generator args to Katib Config, as you mentioned here: #2149 (comment) ?
Do we want to still use args for Katib Controller settings (e.g. trial-resources)?

@andreyvelich That's a good point. Ideally, migrating all args to katib-config would be good, although the backward compatibility will be loset. wdyt?

@tenzen-y @johnugeorge I think, we need to decide purpose of Katib Config from the Katib user perspective.
Do we want to give them ability to modify HP Tuning Algorithms, Metrics Collectors, and Katib components (DB, Controller, Cert Manager) settings in one config ?

@tenzen-y
Copy link
Member Author

One question that I might have, do we want to include only Cert Generator args to Katib Config, as you mentioned here: #2149 (comment) ?
Do we want to still use args for Katib Controller settings (e.g. trial-resources)?

@andreyvelich That's a good point. Ideally, migrating all args to katib-config would be good, although the backward compatibility will be loset. wdyt?

@tenzen-y @johnugeorge I think, we need to decide purpose of Katib Config from the Katib user perspective. Do we want to give them ability to modify HP Tuning Algorithms, Metrics Collectors, and Katib components (DB, Controller, Cert Manager) settings in one config ?

@andreyvelich IMO, all options for the katib can be modified in KatibConfig would be useful.
@johnugeorge WDYT?

@andreyvelich
Copy link
Member

Makes sense, let's have unify place for all Katib settings.

@tenzen-y
Copy link
Member Author

Makes sense, let's have unify place for all Katib settings.

Awesome! Let's go ahead!

@tenzen-y
Copy link
Member Author

Through some more consideration, I'd like to change the proposed design like the following:

apiVersion: config.katib.kubeflow.org/v1beta1
kind: KatibConfig
static:
  metricsAddr: ":8080"
  injectSecurityContext: false
...
dynamic:
  metrics-collector-sidecar:
  - name: StdOut
    image: "docker.io/kubeflowkatib/file-metrics-collector:latest"
  suggestion:
  - name: random
    image: "docker.io/kubeflowkatib/suggestion-hyperopt:latest"
    resources:
      limits:
        memory: 200Mi
  early-stopping:
  - name: medianstop
    image: "docker.io/kubeflowkatib/earlystopping-medianstop:latest"
...

In the above new design, we put options that users need to restart the katib-controller inside static and options that users don't need to restart one inside dynamic.

This means we move options specifying via katib-controller flags to static and move options specifying via katib-config to dynamic.

@kubeflow/wg-automl-leads wdyt?

@tenzen-y
Copy link
Member Author

tenzen-y commented Jun 1, 2023

@kubeflow/wg-automl-leads ping

@andreyvelich
Copy link
Member

Sorry for the late reply @tenzen-y.
I guess, that additional params (static and dynamic) will be useful if users don't check the full documentation ? Do we have anything else when it will be useful ?

Also, (if we decide to use them) maybe we should use runtime instead of dynamic and initialization (or think about better word to explain start time) instead of static ?

WDYT @tenzen-y @johnugeorge @gaocegege ?

@tenzen-y
Copy link
Member Author

tenzen-y commented Jun 2, 2023

Thanks for the response! @andreyvelich

Also, (if we decide to use them) maybe we should use runtime instead of dynamic and initialization (or think about better word to explain start time) instead of static ?

The runtime and initialization sounds good!

Do we have anything else when it will be useful ?

I think runtime and initialization would be enough since users will be confused by overcomplicating KatibConfig.
wdyt?

@andreyvelich
Copy link
Member

I think runtime and initialization would be enough since users will be confused by overcomplicating KatibConfig.
wdyt?

I meant, can this additional fields (runtime and initialization) overcomplicate Katib Config ? Or it provides more clarity how these settings are used ?

@tenzen-y
Copy link
Member Author

tenzen-y commented Jun 5, 2023

Ah, I see. Thanks for the clarification!
I believe it provides more clarity. But let me know what other leads think. > @johnugeorge @gaocegege

@tenzen-y
Copy link
Member Author

tenzen-y commented Jun 9, 2023

I'll move this forward with runtime and initialization since others seem not to have any concerns.

@tenzen-y
Copy link
Member Author

I just came back here. I will create a PR within this week.

@andreyvelich
Copy link
Member

That would be great @tenzen-y, thank you for your hard work!
If you need any help please let me know.

@tenzen-y
Copy link
Member Author

If you need any help please let me know.

Thanks!

@andreyvelich
Copy link
Member

andreyvelich commented Jul 18, 2023

@tenzen-y We've been discussing with @johnugeorge about naming.
We might even make it shorter - init, similar to initContainer in K8s.
So make it as follows:

runtime:
  suggestion:
    - name: random
      image: docker.io/random
init:
  controller:
    trialResources:
      - TFJob.v1.kubeflow.org
      - PyTorchJob.v1.kubeflow.org
    metricsAddr: 8080
  cert-generator:
    namespace: kubeflow

WDYT @tenzen-y @johnugeorge ?

@tenzen-y
Copy link
Member Author

@tenzen-y We've been discussing with @johnugeorge about naming. We might even make it shorter - init, similar to initContainer in K8s. So make it as follows:

runtime:
  suggestion:
    - name: random
      image: docker.io/random
init:
  controller:
    trialResources:
      - TFJob.v1.kubeflow.org
      - PyTorchJob.v1.kubeflow.org
    metricsAddr: 8080
  cert-generator:
    namespace: kubeflow

WDYT @tenzen-y @johnugeorge ?

@andreyvelich @johnugeorge It looks good :) The shorter name increases visibility. Thank you for a great suggestion!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants