
Expose step id and step name #1191

Merged: merged 8 commits into kubeflow:master on Apr 25, 2019

Conversation

@cheyang (Contributor) commented Apr 19, 2019

We need to expose the step ID and step name when running pipelines.


@cheyang (Contributor, Author) commented Apr 19, 2019

/assign @Ark-kun

@cheyang changed the title from "Expose step uid and step name" to "Expose step id and step name" on Apr 19, 2019
@Ark-kun (Contributor) commented Apr 19, 2019

Can you please explain the purpose of this change? Why do you need to pass the step ID and name?
I see that you're passing the workflow ID and name, which is not the same as the task/pod ID/name.
I also see that you're using Argo placeholders in the code. I'd like to limit their usage, since it would prevent us from switching orchestrators if we decide to do so.

@cheyang (Contributor, Author) commented Apr 19, 2019

Yes, the requirement comes from our customer. They want to persist the logs and some outputs to distributed storage, so they need the unique name of the step. And I think we can change this if the orchestrator changes.

What's your suggestion for how the user should get the unique ID and name of the step?

@animeshsingh (Contributor) commented

Hi @cheyang, why do they want to persist at the step level? Argo has a setting where you can configure logs to be stored on S3 for archival for a pipeline run. What would then be needed is a UI change to display the logs from S3 if they're stored there. cc @vicaire

@cheyang (Contributor, Author) commented Apr 20, 2019

> Hi @cheyang, why do they want to persist at the step level? Argo has a setting where you can configure logs to be stored on S3 for archival for a pipeline run. What would then be needed is a UI change to display the logs from S3 if they're stored there. cc @vicaire

Thanks for the response.

In fact, the purpose is to persist and organize the logs and some outputs (for example, the results of feature extraction) in a specific format in distributed storage, so they can easily be used to trace the experiment even long after the run (even after the step container is cleaned up). A sample format is {experiment_name}/{RUN_ID}/{STEP_ID}, so the user can easily relate each step of a run to that step's results.

Also, for on-prem users and users in China, S3 is not an option.
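For illustration, here is a minimal sketch of what we would like pipeline authors to be able to express (the image, mount path, and experiment name are hypothetical; {{workflow.uid}} and {{pod.name}} are Argo placeholders resolved at run time):

```python
import kfp.dsl as dsl

@dsl.pipeline(name='log-archival-sample')
def sample_pipeline():
    # Hypothetical per-step output prefix built from Argo's runtime
    # placeholders, so each step writes to a unique location in the
    # shape {experiment_name}/{RUN_ID}/{STEP_ID}.
    step_prefix = '/mnt/dfs/my-experiment/{{workflow.uid}}/{{pod.name}}'
    dsl.ContainerOp(
        name='extract-features',
        image='my-registry/feature-extractor:latest',  # hypothetical image
        command=['sh', '-c'],
        arguments=['extract_features --output-dir %s' % step_prefix],
    )
```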

@Ark-kun (Contributor) commented Apr 22, 2019

@cheyang Is it possible to move these to the pipeline level? The higher up it is, the easier it will be to change later.
The pipeline step may receive a URI or a unique ID (as opposed to explicitly receiving a "step name" and "step id"). See:

output_template = str(output) + '/{{workflow.uid}}/{{pod.name}}/data'

Our current way of passing data (until artifact passing is supported) is to give a step a URI template that gets resolved at run time; the step then outputs that URI to pass it forward. So a train step outputs the model URI, not a step name or ID. This is also easier for the consuming step, since you pass it a URI instead of a step name and ID that it would need to combine to build that URI.
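
A rough sketch of that pattern (the images and bucket are hypothetical; Argo resolves the template at run time, and the producer echoes the resolved URI so the consumer receives a plain URI):

```python
import kfp.dsl as dsl

@dsl.pipeline(name='uri-template-sample')
def train_and_deploy(output='gs://my-bucket'):  # hypothetical bucket
    # The train step receives a URI template; Argo resolves the
    # placeholders, and the step writes its model there and emits
    # the resolved URI as a file-based output.
    model_uri_template = str(output) + '/{{workflow.uid}}/{{pod.name}}/data'
    train = dsl.ContainerOp(
        name='train',
        image='my-registry/trainer:latest',  # hypothetical image
        command=['sh', '-c'],
        arguments=['train --model-dir "$0" && echo "$0" > /tmp/model_uri',
                   model_uri_template],
        file_outputs={'model_uri': '/tmp/model_uri'},
    )
    # The consumer gets a concrete URI; it never sees step names or IDs.
    dsl.ContainerOp(
        name='deploy',
        image='my-registry/deployer:latest',  # hypothetical image
        command=['sh', '-c'],
        arguments=['deploy --model-uri %s' % train.outputs['model_uri']],
    )
```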

@cheyang (Contributor, Author) commented Apr 22, 2019

@Ark-kun our users will build a lot of pipelines, so it is duplicated work for them to set output_template = str(output) + '/{{workflow.uid}}/{{pod.name}}/data', and it exposes more details to them. The concept of a step exists in Kubeflow, so it's fine for users to understand. But workflow and pod are implementation-layer concepts, and I'd prefer to keep them transparent to pipeline authors.

I think the benefit of doing this in an abstraction layer (the arena op) is that only one place has to change if we stop using Argo. End users should not be aware of that change; they would just update the SDK.
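
To make this concrete, a rough sketch of such an abstraction (the helper and op names are hypothetical): the Argo placeholders live in one SDK-level helper, so pipeline authors never see them, and only this helper changes if the orchestrator does.

```python
import kfp.dsl as dsl

def unique_step_prefix(base_uri):
    # Hypothetical helper: the only place Argo placeholders appear.
    # Swapping orchestrators would mean changing just this function.
    return str(base_uri) + '/{{workflow.uid}}/{{pod.name}}'

def arena_op(name, image, output_base):
    # Hypothetical arena-style wrapper: authors pass a base output
    # location and get a per-step unique path without ever touching
    # workflow- or pod-level concepts.
    return dsl.ContainerOp(
        name=name,
        image=image,
        command=['sh', '-c'],
        arguments=['run_job --output-dir %s' % unique_step_prefix(output_base)],
    )
```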

WDYT?

@vicaire (Contributor) commented Apr 24, 2019

@Ark-kun, do you have additional comments on this one?

@animeshsingh (Contributor) left a review comment

@vicaire @Ark-kun @cheyang I am still struggling to see the overall value here. We can archive logs from a pipeline; Argo supports that:
https://github.com/argoproj/argo/blob/master/docs/workflow-controller-configmap.yaml#L40-L76

Now, wouldn't it be easier to look into Argo and see whether it supports a backend other than S3?

@vicaire (Contributor) commented Apr 25, 2019

@Ark-kun, any further comments on this one?

@Ark-kun (Contributor) commented Apr 25, 2019

> @Ark-kun, do you have additional comments on this one?

I'm handling this PR and I'm also chatting with @cheyang on Hangouts.

@Ark-kun (Contributor) commented Apr 25, 2019

@cheyang

> A sample format is {experiment_name}/{RUN_ID}/{STEP_ID}

Are you sure your code is doing what you want? I don't see it using the step/pod ID; it only uses workflow.uid, which is the same for all steps. Did you mean pod.name?

@Ark-kun (Contributor) commented Apr 25, 2019

I see you've fixed the pod.name issue.

/lgtm
/approve

@k8s-ci-robot commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Ark-kun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


@vicaire (Contributor) commented Apr 25, 2019

@animeshsingh, I saw your comment only after entering mine. I agree with the idea of using Argo to back up the logs to an object store. (Exposing the step ID and step name might be useful regardless, for instance to call the future ML Metadata service and store metadata about an artifact, including which pipeline and which pipeline step computed it.)

@k8s-ci-robot merged commit 8c8e505 into kubeflow:master on Apr 25, 2019
@cheyang (Contributor, Author) commented Apr 25, 2019

> I see you've fixed the pod.name issue.
>
> /lgtm
> /approve

Thank you!
