gaoning777 and k8s-ci-robot Release 1449d08 (#2099)
* Updated component images to version 1449d08

* Updated components to version e7a021e
Latest commit 4f46720 Sep 13, 2019
README.md move old gcp components to deprecated folder (#2031) Sep 6, 2019
xgboost_training_cm.py Release 1449d08 (#2099) Sep 13, 2019

README.md

Overview

The xgboost_training_cm.py pipeline trains XGBoost models on structured data in CSV format. It supports both classification and regression.

The pipeline starts by creating a Google Cloud Dataproc cluster, then runs analysis, transformation, distributed training, and prediction on that cluster. For classification, a single-node confusion-matrix aggregator then provides the confusion matrix data to the front end. Finally, a delete-cluster operation destroys the cluster created at the start. The delete-cluster operation is registered as an exit handler, so it runs whether the pipeline succeeds or fails.
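The exit-handler behavior described above can be pictured with plain Python. This is only an analogy built on try/finally, not the Kubeflow Pipelines API; the function name and the step callables are hypothetical, and the step names are shortened from the overview.

```python
def run_with_exit_handler(steps, exit_step):
    """Run (name, callable) steps in order, then always run exit_step.

    Hypothetical helper: it mimics exit-handler semantics with try/finally
    and does not call any Kubeflow Pipelines code.
    """
    log = []
    status = "Succeeded"
    try:
        for name, step in steps:
            log.append(name)
            step()
    except Exception:
        # A failing step stops the remaining steps, as in a failed run.
        status = "Failed"
    finally:
        # The exit step runs on success and on failure alike.
        exit_name, exit_fn = exit_step
        log.append(exit_name)
        exit_fn()
    return status, log


def ok():
    """Stand-in for a step that succeeds."""


def fail():
    """Stand-in for a step that fails."""
    raise RuntimeError("step failed")


if __name__ == "__main__":
    steps = [("create-cluster", ok), ("analyze", ok), ("transform", fail), ("train", ok)]
    print(run_with_exit_handler(steps, ("delete-cluster", ok)))
```

In the sketch, the run stops at the failing step, yet the delete-cluster step is still recorded last, which is the guarantee that keeps a failed run from leaving a cluster behind.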

Requirements

Preprocessing uses Google Cloud Dataproc, so you must enable the Dataproc API for the GCP project in which you run the pipeline.
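One way to enable the API is the gcloud CLI. The sketch below only builds the `gcloud services enable` command and runs it on request; it assumes gcloud is installed and authenticated, and the helper name and the project ID `my-gcp-project` are placeholders, not values from this sample.

```python
import subprocess

# Service name of the Dataproc API as used by `gcloud services enable`.
DATAPROC_SERVICE = "dataproc.googleapis.com"


def enable_dataproc(project_id, dry_run=True):
    """Build (and optionally run) the gcloud command that enables the API.

    Hypothetical helper; with dry_run=True nothing is executed.
    """
    if not project_id:
        raise ValueError("project_id must be a non-empty string")
    cmd = ["gcloud", "services", "enable", DATAPROC_SERVICE, "--project", project_id]
    if not dry_run:
        # Raises CalledProcessError if gcloud exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Placeholder project ID; replace it with your own before running for real.
    print(" ".join(enable_dataproc("my-gcp-project")))
```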

Compile

Follow the guide to building a pipeline to install the Kubeflow Pipelines SDK and compile the sample Python file into a workflow specification. The specification takes the form of a YAML file compressed into a .zip file.
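To check what the compiler produced, the package can be opened with the Python standard library. This is a minimal sketch assuming only what is stated above (a YAML file inside a .zip); the helper name is hypothetical, and the member name `pipeline.yaml` in the demo is a stand-in, not a documented guarantee.

```python
import io
import zipfile


def read_workflow_yaml(package):
    """Return (member_name, text) of the first YAML file in a compiled package.

    `package` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(package) as zf:
        names = [n for n in zf.namelist() if n.endswith((".yaml", ".yml"))]
        if not names:
            raise ValueError("no YAML workflow found in package")
        return names[0], zf.read(names[0]).decode("utf-8")


if __name__ == "__main__":
    # Build a stand-in package in memory so the sketch runs without the SDK.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("pipeline.yaml", "kind: Workflow\n")
    buf.seek(0)
    name, text = read_workflow_yaml(buf)
    print(name)
    print(text)
```

For a real package, pass the path of the compiled .zip file instead of the in-memory buffer.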

Deploy

Open the Kubeflow Pipelines UI. Create a new pipeline, and then upload the compiled specification (.zip file) as a new pipeline template.

Run

Most arguments have default values. Only output and project must always be provided.

  • output is a Google Cloud Storage path that holds the pipeline run results. Each pipeline run creates a unique directory under output, so it does not overwrite previous results.
  • project is the GCP project in which the Dataproc cluster is created.
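The per-run directory under output can be illustrated with a small helper. This is a sketch of the idea only: the function name, the bucket path, and the run IDs are hypothetical, and the actual layout the pipeline uses may differ.

```python
def run_output_dir(output, run_id):
    """Compose a per-run directory under the `output` Cloud Storage path.

    Hypothetical helper: it shows why distinct runs cannot collide,
    not how the pipeline itself builds its paths.
    """
    if not output.startswith("gs://"):
        raise ValueError("output must be a Google Cloud Storage path (gs://...)")
    if not run_id:
        raise ValueError("run_id must be a non-empty string")
    # Strip a trailing slash so the result never contains '//' after the bucket path.
    return output.rstrip("/") + "/" + run_id


if __name__ == "__main__":
    # Placeholder bucket and run IDs.
    print(run_output_dir("gs://my-bucket/xgboost", "run-a"))
    print(run_output_dir("gs://my-bucket/xgboost", "run-b"))
```

Because the run identifier is part of the path, two runs that share the same output argument write to separate directories.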

Components source

Create Cluster: source code container

Analyze (step one for preprocessing): source code container

Transform (step two for preprocessing): source code container

Distributed Training: source code container

Distributed Predictions: source code container

Confusion Matrix: source code container

ROC: source code container

Delete Cluster: source code container
