Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(components): De-hardcoded local output paths. #580

Conversation

Ark-kun
Copy link
Contributor

@Ark-kun Ark-kun commented Dec 21, 2018

Preparation for the future storage system.


This change is Reviewable

@Ark-kun
Copy link
Contributor Author

Ark-kun commented Dec 21, 2018

/test kubeflow-pipeline-sample-test

@Ark-kun Ark-kun force-pushed the Components---De-hardcoded-local-output-paths branch from d5a541e to f2efab5 Compare December 27, 2018 00:55
@Ark-kun Ark-kun changed the title Components - De-hardcoded local output paths. [WIP]Components - De-hardcoded local output paths. Jan 2, 2019
@Ark-kun
Copy link
Contributor Author

Ark-kun commented Jan 2, 2019

/test kubeflow-pipeline-build-image

@Ark-kun
Copy link
Contributor Author

Ark-kun commented Jan 3, 2019

/test kubeflow-pipeline-sample-test

@Ark-kun Ark-kun force-pushed the Components---De-hardcoded-local-output-paths branch from 908ce6e to a6adf38 Compare January 5, 2019 03:15
@Ark-kun
Copy link
Contributor Author

Ark-kun commented Jan 9, 2019

/test kubeflow-pipeline-sample-test

@Ark-kun
Copy link
Contributor Author

Ark-kun commented Jan 9, 2019

/test kubeflow-pipeline-e2e-test

@@ -60,6 +61,14 @@ def parse_arguments():
type=int,
default=32,
help='Batch size used in prediction.')
parser.add_argument('--prediction-results-uri-pattern-output-path',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably missed some context here. Why is this needed? Does it add burden to component author to add each output as a command line argument?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The industry is moving away from Docker towards open and standardized container support and so do our users and customers.
We have received bug reports from users that our pipelines fail on non-Docker Kubernetes clusters.
While making Argo work on a non-Docker cluster is trivial, it's much harder with Pipelines.
One particular problem is with local artifact paths. There are specific system-imposed requirements for where the artifacts can and cannot be stored. More so, this depends on the Argo executor being used in the cluster.
De-hardcoding the local artifact paths seems to be the only option for keeping the pipelines portable.
Usually the programs are already written without any paths hard-coded.
If the program did not follow the best practices, the author only need to add one line to the program to resolve the problem. This seem to be a small price where the alternative is a component that only work on some clusters.

@gaoning777
Copy link
Contributor

Are you planning to close this issue? since you did not try to resolve hongye's concern, should we close it?

@Ark-kun
Copy link
Contributor Author

Ark-kun commented Aug 13, 2019

The industry is moving away from Docker towards open and standardized container support and so do our users and customers.
We have received bug reports from users that our pipelines fail on non-Docker Kubernetes clusters. #1654
While making Argo work on a non-Docker cluster is trivial, it's much harder with Pipelines.
One particular problem is with local artifact paths. There are specific system-imposed requirements for where the artifacts can and cannot be stored. More so, this depends on the Argo executor being used in the cluster.
De-hardcoding the local artifact paths seems to be the only option for keeping the pipelines portable.
Usually the programs are already written without any paths hard-coded.
If the program did not follow the best practices, the author only need to add one line to the program to resolve the problem. This seem to be a small price where the alternative is a component that only work on some clusters.

More so, we've received reports that our components code is hard to test. Indeed most of the components miss test cases and testing the components is complicated by the fact that they try to write outputs to unconfigurable global locations which are usually off-limits for the test code. Making the paths output paths configurable makes the component code much easier to test.

@xinbinhuang
Copy link

xinbinhuang commented Aug 18, 2020

@Ark-kun One quick question: does it mean that moving forward, components should not utilize local output path but external storage system for "artifacts" (i.e. not the argo artifacts) that user want to persist?

@Ark-kun
Copy link
Contributor Author

Ark-kun commented Aug 26, 2020

@Ark-kun One quick question: does it mean that moving forward, components should not utilize local output path but external storage system for "artifacts" (i.e. not the argo artifacts) that user want to persist?

No. The components should still output data by writing it to local paths unless they output 100GB of data.

But components should not hardcode those output paths in the code. The local paths should be given by the system.

Does this make sense?

@google-cla google-cla bot added the cla: yes label Aug 26, 2020
@Ark-kun Ark-kun changed the title [WIP]Components - De-hardcoded local output paths. refactor(components): De-hardcoded local output paths. Aug 26, 2020
@Bobgy
Copy link
Contributor

Bobgy commented Aug 29, 2020

/retest

Looks like we need to fix this flakiness

@Bobgy
Copy link
Contributor

Bobgy commented Aug 30, 2020

/retest

3 similar comments
@Ark-kun
Copy link
Contributor Author

Ark-kun commented Sep 1, 2020

/retest

@Bobgy
Copy link
Contributor

Bobgy commented Sep 1, 2020

/retest

@Ark-kun
Copy link
Contributor Author

Ark-kun commented Sep 2, 2020

/retest

@Bobgy
Copy link
Contributor

Bobgy commented Sep 3, 2020

/lgtm
/approve
Thanks @Ark-kun

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Bobgy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit a77af2c into kubeflow:master Sep 3, 2020
@Bobgy Bobgy added cherrypick-approved area OWNER approves to cherry pick this PR to current active release branch cherrypicked cherry picked to release branch `release-x.y` labels Sep 4, 2020
Bobgy pushed a commit to Bobgy/pipelines that referenced this pull request Sep 4, 2020
* Components - De-hardcoded local output paths.

* pip install pathlib2

* Added component.yaml changes

* The Dataflow components have been deleted
Bobgy pushed a commit that referenced this pull request Sep 4, 2020
* Components - De-hardcoded local output paths.

* pip install pathlib2

* Added component.yaml changes

* The Dataflow components have been deleted
@Bobgy Bobgy removed cherrypick-approved area OWNER approves to cherry pick this PR to current active release branch cherrypicked cherry picked to release branch `release-x.y` labels Sep 10, 2020
Bobgy added a commit that referenced this pull request Sep 10, 2020
Jeffwan pushed a commit to Jeffwan/pipelines that referenced this pull request Dec 9, 2020
* Components - De-hardcoded local output paths.

* pip install pathlib2

* Added component.yaml changes

* The Dataflow components have been deleted
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants