Commit
Added and fixed links, fixed branding and text (#128)
Updated branding; added a link to the guide to building your own components; fixed 2 links; made other minor textual fixes.
commit 8c75eca (1 parent: 63e9c54)
Showing 1 changed file with 18 additions and 16 deletions.
@@ -1,32 +1,34 @@
-# ML Pipeline Components
+# Kubeflow pipeline components
 
-ML Pipeline Components are implementation of ML Pipeline tasks. Each task takes
-one or more [artifacts](../artifacts) as input and may produce one or more
-[artifacts](../artifacts).
+Kubeflow pipeline components are implementations of Kubeflow pipeline tasks. Each task takes
+one or more [artifacts](https://github.com/kubeflow/pipelines/wiki/Concepts#step-output-artifacts)
+as input and may produce one or more
+[artifacts](https://github.com/kubeflow/pipelines/wiki/Concepts#step-output-artifacts) as output.
 
 
-## XGBoost DataProc Components
-* [Setup Cluster](dataproc/xgboost/create_cluster.py)
+**Example: XGBoost DataProc components**
+* [Set up cluster](dataproc/xgboost/create_cluster.py)
 * [Analyze](dataproc/xgboost/analyze.py)
 * [Transform](dataproc/xgboost/transform.py)
-* [Distributed Train](dataproc/xgboost/train.py)
-* [Delete Cluster](dataproc/xgboost/delete_cluster.py)
+* [Distributed train](dataproc/xgboost/train.py)
+* [Delete cluster](dataproc/xgboost/delete_cluster.py)
 
 Each task usually includes two parts:
 
-``Client Code``
+``Client code``
 The code that talks to endpoints to submit jobs. For example, code to talk to Google
-Dataproc API to submit Spark job.
+Dataproc API to submit a Spark job.
 
-``Runtime Code``
-The code that does the actual job and usually run in cluster. For example, Spark code
-that transform raw data into preprocessed data.
+``Runtime code``
+The code that does the actual job and usually runs in the cluster. For example, Spark code
+that transforms raw data into preprocessed data.
 
 ``Container``
 A container image that runs the client code.
 
-There is a naming convention to client code and runtime code. For a task named "mytask",
-there is mytask.py including client code, and there is also a mytask directory holding
-all runtime code.
+Note the naming convention for client code and runtime code. For a task named "mytask":
+
+* The `mytask.py` program contains the client code.
+* The `mytask` directory contains all the runtime code.
+
+See [how to build your own components](https://github.com/kubeflow/pipelines/wiki/Build-Your-Own-Component)
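The client-code/runtime-code split described in the README could be sketched roughly as below. This is only an illustration of the convention: the function name `build_spark_job_spec`, the job-spec field names, and the file paths are all hypothetical, not the actual Kubeflow Pipelines or Google Dataproc API.

```python
# Hypothetical sketch of the "mytask.py" client-code convention from the
# README above. The function and job-spec fields are illustrative; they
# are NOT the real Kubeflow Pipelines or Google Dataproc API.

def build_spark_job_spec(cluster_name, main_file, args=None):
    """Client code: assemble the request a client would submit to a job
    endpoint. The runtime code (the actual Spark transforms) lives in the
    sibling `mytask/` directory and is referenced here only by path."""
    return {
        "cluster": cluster_name,
        "sparkJob": {
            "mainPythonFile": main_file,  # entry point of the runtime code
            "args": list(args or []),
        },
    }

if __name__ == "__main__":
    spec = build_spark_job_spec(
        "xgboost-cluster",
        "mytask/transform.py",
        ["--input", "gs://my-bucket/raw-data"],
    )
    print(spec["sparkJob"]["mainPythonFile"])  # mytask/transform.py
```

Keeping the client side a thin spec-builder like this mirrors the README's convention: `mytask.py` only talks to endpoints, while everything under `mytask/` is what actually runs on the cluster.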