Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arguments missing for custom metrics collector #2109

Closed
votti opened this issue Feb 10, 2023 · 13 comments
Closed

Arguments missing for custom metrics collector #2109

votti opened this issue Feb 10, 2023 · 13 comments

Comments

@votti
Copy link

votti commented Feb 10, 2023

/kind bug

What steps did you take and what happened:
I want to run an experiment with a custom metrics collector using the following spec:

  metricsCollectorSpec:
    source:
      fileSystemPath:
        path: "/tmp/outputs/mlpipeline_metrics/data"
        kind: File
    collector:
      customCollector:
        image: votti/kfpv1-metricscollector:v0.0.7
        imagePullPolicy: Always
        name: custom-metrics-logger-and-collector
      kind: Custom

This is used for: #2019

What did you expect to happen:
I expected the metrics collector sidecar pod to be created to carry the arguments:

  • -t trial.Name
  • -m: metricNames
  • -o-type: string(trial.Spec.Objective.Type)
  • -s-db: katibmanagerv1beta1.GetDBManagerAddr()

which should have been added by the following code:

args := []string{"-t", trial.Name, "-m", metricNames, "-o-type", string(trial.Spec.Objective.Type), "-s-db", katibmanagerv1beta1.GetDBManagerAddr()}

Anything else you would like to add:

Environment:

  • Katib version (check the Katib controller image version): docker.io/kubeflowkatib/katib-controller:v0.14.0"
  • Kubernetes version: (kubectl version):
    Client Version: v1.26.0
    Kustomize Version: v4.5.7
    Server Version: v1.23.14-gke.1800
  • OS (uname -a): Linux LAPTOP-SS9UALAT 4.19.128-microsoft-standard SMP Tue Jun 23 12:58:10 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Impacted by this bug? Give it a 👍 We prioritize the issues with the most 👍

@andreyvelich
Copy link
Member

andreyvelich commented Feb 10, 2023

Thank you for creating this @votti!

I think, originally we didn't modify Custom Metrics Collector container args. So user can set any container args without unexpected parameters.
To send Trial names for such cases, you can check this example: https://github.com/kubeflow/katib/blob/master/examples/v1beta1/metrics-collector/custom-metrics-collector.yaml#L32-L36.

Does it solve your problem ? Or you want to pass some other info to your container ?

@votti
Copy link
Author

votti commented Feb 10, 2023

Ok let me try this again - I thought TrialNamePrefix from metadata.name was just the prefix of the trial, but not the actual trial_name.

I need the trial name to submit the metrics to the correct trial_name such that it is correctly registered:
https://github.com/d-one/katib/blob/0d25cbfc6a69e3014c3c25c9ea94b4c97b965473/cmd/metricscollector/v1beta1/kfpv1-metricscollector/main.py#L109

I wonder how the kubeflowkatib/custom-metrics-collector:latest managed to do this - but i guess that code is still lost: #1263

Once I get this working, I think I have the know-how together to make a new custom-metrics-collector example that should simplify any future development of such a collector.

@votti
Copy link
Author

votti commented Feb 10, 2023

Actually this works and gives the name of the current pod.
In my case this is the TrialName+the suffix, so I will need some custom code to get the true TrialName - but this is definitely doable.

  metricsCollectorSpec:
    collector:
      customCollector:
        args:
          - -m
          - val-accuracy;accuracy
          - -s
          - katib-db-manager.kubeflow:6789
          - -path
          - /tmp/outputs/mlpipeline_metrics/data
          - -t
          - $(TrialNamePrefix)
        image: votti/kfpv1-metricscollector:v0.0.9
        imagePullPolicy: Always
        name: custom-metrics-logger-and-collector
        env:
          - name: TrialNamePrefix
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
      kind: Custom
    source:
      fileSystemPath:
        kind: File
        path: /tmp/outputs/mlpipeline_metrics/data

Thanks for pointing me to this!

@votti votti closed this as completed Feb 10, 2023
@andreyvelich
Copy link
Member

Great. @votti I guess, it gives you Pod name where Training + Metrics Collector containers are running, right ?

@votti
Copy link
Author

votti commented Feb 10, 2023

Yes I was just updating my comment as I noticed this.

But in my case the Pod name is the TrialName + suffix, so this will work.

@andreyvelich
Copy link
Member

andreyvelich commented Feb 10, 2023

Thanks @votti. That means that Pod name must have Trial name, which might not work for every Kubernetes CRDs as Trial template (e.g. Argo Workflows).
@johnugeorge @tenzen-y I think we should keep this in mind when users try to create Custom Metrics Collectors.

Also, we mutate Trial name to the Pod labels, so another solution is to fetch this data from Pod's labels.

@votti votti reopened this Feb 10, 2023
@votti
Copy link
Author

votti commented Feb 10, 2023

Somehow having the option/a flag to add the standard CLI arguments to the customCollector as well would be a bit more elegant in my opinion.
Maybe this would work with a StandardCustom or so kind for the collector?

Then such a standard custom metrics collector just would need to stick to a CLI interface to get these parameters, instead of relying on additional workarounds.

Eg having to hard code the metrics as well as the katib-db-manager as in the example is a bit inelegant as well.

@votti
Copy link
Author

votti commented Feb 10, 2023

@pod lables: in my case I do not find that label (should be katib.kubeflow.org/tria
Could be because I am actually working on an Argo Workflow, where the Trial kicks of a Workflow that orchestrates the pods.
Yet this does not seem to be a reliable solution as well.

@andreyvelich
Copy link
Member

Then such a standard custom metrics collector just would need to stick to a CLI interface to get these parameters, instead of relying on additional workarounds.

We should think more about it, we can always inject args to container ENVs to not break the runtime.

@pod lables: in my case I do not find that label (should be katib.kubeflow.org/tria

@votti What version of Katib Controller you are using ? This feature was implemented recently (in Katib 0.15 release).

@votti
Copy link
Author

votti commented Feb 10, 2023

@andreyvelich : 0.14 - will check in 0.15 soon

votti added a commit to d-one/katib that referenced this issue Feb 10, 2023
The TrialName can be parse from the pod name.

This seems currently a good way to get the trial name. For
more discussion see: kubeflow#2109
@tenzen-y
Copy link
Member

Thanks @votti. That means that Pod name must have Trial name, which might not work for every Kubernetes CRDs as Trial template (e.g. Argo Workflows). @johnugeorge @tenzen-y I think we should keep this in mind when users try to create Custom Metrics Collectors.

Also, we mutate Trial name to the Pod labels, so another solution is to fetch this data from Pod's labels.

It seems that we don't have any documentation for the custom metrics collector. It would be good to add the documentation for that and mention about Pod name.

votti added a commit to d-one/katib that referenced this issue Jul 18, 2023
The TrialName can be parse from the pod name.

This seems currently a good way to get the trial name. For
more discussion see: kubeflow#2109
@github-actions
Copy link

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions
Copy link

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants