<font size=-1>Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at [https://www.apache.org/licenses/LICENSE-2.0](https://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  See the License for the specific language governing permissions and limitations under the License.</font>

This sample was tested with TFX 0.21.4 and KFP 0.4.0. Make sure these versions are installed in your environment; the cell below prints the installed versions.

In [36]:
!python -c "import tfx; print('TFX version: {}'.format(tfx.__version__))"
!python -c "import kfp; print('KFP version: {}'.format(kfp.__version__))"

TFX version: 0.21.4
KFP version: 0.4.0
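The cell above starts a separate Python interpreter for each package. As a minimal sketch, the same check can run inside the kernel and report only mismatches, by importing each module and reading its `__version__` attribute (the same attribute the cell above prints). `check_versions` and `TESTED_VERSIONS` are illustrative names, not part of TFX or KFP.

```python
import importlib

# Versions this sample was tested with (from the text above).
TESTED_VERSIONS = {"tfx": "0.21.4", "kfp": "0.4.0"}


def check_versions(expected):
    """Return {module: (installed, expected)} for each module whose version differs.

    A module that cannot be imported is reported with installed=None.
    """
    mismatches = {}
    for module_name, wanted in expected.items():
        try:
            module = importlib.import_module(module_name)
            installed = getattr(module, "__version__", None)
        except ImportError:
            installed = None
        if installed != wanted:
            mismatches[module_name] = (installed, wanted)
    return mismatches


if __name__ == "__main__":
    problems = check_versions(TESTED_VERSIONS)
    for module_name, (installed, wanted) in problems.items():
        print(f"{module_name}: installed {installed}, tested with {wanted}")
    if not problems:
        print("Installed versions match the tested versions.")
```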


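Add `/home/jupyter/.local/bin`, the directory where `skaffold` is installed, to `PATH` so that the TFX CLI can find `skaffold` when it builds the container image.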

In [37]:
PATH=%env PATH
%env PATH=/home/jupyter/.local/bin:{PATH}

env: PATH=/home/jupyter/.local/bin:/home/jupyter/.local/bin:/opt/conda/bin:/opt/conda/condabin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
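The output above lists `/home/jupyter/.local/bin` twice, most likely because the cell was run more than once. A minimal sketch of an idempotent alternative using only the standard library: `prepend_to_path` (an illustrative helper) moves the directory to the front of `PATH` without duplicating it, and `shutil.which` confirms whether `skaffold` can be found.

```python
import os
import shutil


def prepend_to_path(directory):
    """Put `directory` at the front of PATH exactly once and return the new PATH."""
    entries = [e for e in os.environ.get("PATH", "").split(os.pathsep) if e]
    # Remove any existing copy so that re-running the cell does not duplicate it.
    entries = [e for e in entries if e != directory]
    os.environ["PATH"] = os.pathsep.join([directory] + entries)
    return os.environ["PATH"]


if __name__ == "__main__":
    prepend_to_path("/home/jupyter/.local/bin")
    # shutil.which returns None when skaffold is not on PATH.
    print("skaffold:", shutil.which("skaffold"))
```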


## Building and deploying the pipeline

You will use the TFX CLI to compile and deploy the pipeline. As explained in the previous section, environment-specific settings are passed to the pipeline through environment variables and embedded into the pipeline package at compile time.
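The runner script (`dataflow_runner.py`) is not shown in this notebook. As a hypothetical sketch of the pattern, a runner might collect these settings from the environment when it is loaded at compile time, failing fast if one is missing so that an empty value is never embedded in the package. `PipelineSettings` and `settings_from_env` are illustrative names; the variable names match the ones exported in the compile step below.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineSettings:
    project_id: str
    gcp_region: str
    artifact_store_uri: str
    data_root_uri: str
    pipeline_name: str
    tfx_image: str


# Maps each settings field to the environment variable it is read from.
_ENV_NAMES = {
    "project_id": "PROJECT_ID",
    "gcp_region": "GCP_REGION",
    "artifact_store_uri": "ARTIFACT_STORE_URI",
    "data_root_uri": "DATA_ROOT_URI",
    "pipeline_name": "PIPELINE_NAME",
    "tfx_image": "KUBEFLOW_TFX_IMAGE",
}


def settings_from_env(env=None):
    """Build PipelineSettings from environment variables, raising KeyError if any is missing."""
    env = os.environ if env is None else env
    missing = [var for var in _ENV_NAMES.values() if not env.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return PipelineSettings(**{field: env[var] for field, var in _ENV_NAMES.items()})
```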

### Configure settings


In [38]:
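# Environment-specific settings: replace these values with your own.
PROJECT_ID = 'mlops-dev-env'
GCP_REGION = 'us-central1'
# Host name of the AI Platform Pipelines (KFP) endpoint.
ENDPOINT = '309f723963874a47-dot-us-central2.pipelines.googleusercontent.com'
# Cloud Storage location for pipeline artifacts.
ARTIFACT_STORE_URI = 'gs://mlops-dev-workspace'
PIPELINE_NAME = 'custom_component'
# Cloud Storage folder containing the input CSV data.
DATA_ROOT_URI = 'gs://mlops-dev-workspace/data/taxi'
# Container image for the pipeline components, built by the TFX CLI with skaffold.
CUSTOM_TFX_IMAGE = 'gcr.io/{}/{}'.format(PROJECT_ID, PIPELINE_NAME)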
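A few sanity checks on these values can catch typos before they are embedded in the pipeline package. This is a minimal sketch: `check_settings` is an illustrative helper, and the rules it applies (both URIs on Cloud Storage, the image under the project's `gcr.io` path, no bracketed placeholder left unreplaced) are assumptions drawn from the values above rather than requirements imposed by the TFX CLI.

```python
def check_settings(settings):
    """Return a list of problems found in a dict of notebook settings."""
    problems = []
    for name, value in settings.items():
        # Flag empty values and unreplaced placeholders such as '[YOUR RUN ID]'.
        if not value or (value.startswith("[") and value.endswith("]")):
            problems.append(f"{name} is not set")
    for name in ("ARTIFACT_STORE_URI", "DATA_ROOT_URI"):
        if not settings.get(name, "").startswith("gs://"):
            problems.append(f"{name} should be a gs:// URI")
    image_prefix = "gcr.io/{}/".format(settings.get("PROJECT_ID", ""))
    if not settings.get("CUSTOM_TFX_IMAGE", "").startswith(image_prefix):
        problems.append(f"CUSTOM_TFX_IMAGE should start with {image_prefix}")
    return problems


if __name__ == "__main__":
    # In the notebook, pass the variables defined in the cell above instead.
    settings = {
        "PROJECT_ID": "mlops-dev-env",
        "GCP_REGION": "us-central1",
        "ENDPOINT": "309f723963874a47-dot-us-central2.pipelines.googleusercontent.com",
        "ARTIFACT_STORE_URI": "gs://mlops-dev-workspace",
        "PIPELINE_NAME": "custom_component",
        "DATA_ROOT_URI": "gs://mlops-dev-workspace/data/taxi",
        "CUSTOM_TFX_IMAGE": "gcr.io/mlops-dev-env/custom_component",
    }
    print(check_settings(settings) or "No problems found.")
```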

In [39]:
!gsutil cat {DATA_ROOT_URI}/data.csv

pickup_community_area,fare,trip_start_month,trip_start_hour,trip_start_day,trip_start_timestamp,pickup_latitude,pickup_longitude,dropoff_latitude,dropoff_longitude,trip_miles,pickup_census_tract,dropoff_census_tract,payment_type,company,trip_seconds,dropoff_community_area,tips
60,27.05,10,2,3,1380593700,41.836150155,-87.648787952,,,12.6,,,Cash,Taxi Affiliation Services,1380,,0.0
10,5.85,10,1,2,1382319000,41.985015101,-87.804532006,,,0.0,,,Cash,Taxi Affiliation Services,180,,0.0
14,16.65,5,7,5,1369897200,41.968069,-87.721559063,,,0.0,,,Cash,Dispatch Taxi Affiliation,1080,,0.0
13,16.45,11,12,3,1446554700,41.983636307,-87.723583185,,,6.9,,,Cash,,780,,0.0
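In this preview, the dropoff coordinates, both census tracts, and the dropoff community area are empty in every row, and one row has no company. A minimal sketch that counts empty fields per column with the standard `csv` module; it runs on the rows copied from the output above, and `count_empty_fields` is an illustrative helper.

```python
import csv
import io

# Header and rows copied verbatim from the preview above.
SAMPLE = """\
pickup_community_area,fare,trip_start_month,trip_start_hour,trip_start_day,trip_start_timestamp,pickup_latitude,pickup_longitude,dropoff_latitude,dropoff_longitude,trip_miles,pickup_census_tract,dropoff_census_tract,payment_type,company,trip_seconds,dropoff_community_area,tips
60,27.05,10,2,3,1380593700,41.836150155,-87.648787952,,,12.6,,,Cash,Taxi Affiliation Services,1380,,0.0
10,5.85,10,1,2,1382319000,41.985015101,-87.804532006,,,0.0,,,Cash,Taxi Affiliation Services,180,,0.0
14,16.65,5,7,5,1369897200,41.968069,-87.721559063,,,0.0,,,Cash,Dispatch Taxi Affiliation,1080,,0.0
13,16.45,11,12,3,1446554700,41.983636307,-87.723583185,,,6.9,,,Cash,,780,,0.0
"""


def count_empty_fields(csv_text):
    """Return {column: number of rows in which the column is empty}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            if row[name] == "":
                counts[name] += 1
    return counts


if __name__ == "__main__":
    for column, missing in count_empty_fields(SAMPLE).items():
        if missing:
            print(f"{column}: {missing} empty")
```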


### Compile the pipeline


In [40]:
%env PROJECT_ID={PROJECT_ID}
%env KUBEFLOW_TFX_IMAGE={CUSTOM_TFX_IMAGE}
%env ARTIFACT_STORE_URI={ARTIFACT_STORE_URI}
%env DATA_ROOT_URI={DATA_ROOT_URI}
%env GCP_REGION={GCP_REGION}
%env PIPELINE_NAME={PIPELINE_NAME}

env: PROJECT_ID=mlops-dev-env
env: KUBEFLOW_TFX_IMAGE=gcr.io/mlops-dev-env/custom_component
env: ARTIFACT_STORE_URI=gs://mlops-dev-workspace
env: DATA_ROOT_URI=gs://mlops-dev-workspace/data/taxi
env: GCP_REGION=us-central1
env: PIPELINE_NAME=custom_component
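The `%env` magic sets variables in the kernel's `os.environ`, which shell commands started with `!` inherit. Outside a notebook, for example in a plain Python script that later calls the TFX CLI through `subprocess`, the same effect can be sketched with `os.environ` directly; `export_settings` is an illustrative helper.

```python
import os


def export_settings(**settings):
    """Set each keyword argument as an environment variable, mirroring %env."""
    for name, value in settings.items():
        os.environ[name] = str(value)
        print(f"env: {name}={value}")


if __name__ == "__main__":
    project_id = "mlops-dev-env"
    pipeline_name = "custom_component"
    export_settings(
        PROJECT_ID=project_id,
        KUBEFLOW_TFX_IMAGE="gcr.io/{}/{}".format(project_id, pipeline_name),
        ARTIFACT_STORE_URI="gs://mlops-dev-workspace",
        DATA_ROOT_URI="gs://mlops-dev-workspace/data/taxi",
        GCP_REGION="us-central1",
        PIPELINE_NAME=pipeline_name,
    )
```

Any child process started afterwards with `subprocess.run`, including the TFX CLI, inherits these values.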


In [41]:
!tfx pipeline compile --engine kubeflow --pipeline_path dataflow_runner.py

CLI
Compiling pipeline
INFO:absl:Adding upstream dependencies for component CsvExampleGen
INFO:absl:Adding upstream dependencies for component HelloComponent
INFO:absl:   ->  Component: CsvExampleGen
INFO:absl:Adding upstream dependencies for component StatisticsGen
INFO:absl:   ->  Component: HelloComponent
Pipeline compiled successfully.
Pipeline package path: /home/jupyter/tfx-sandbox/hello_world/custom_component.tar.gz
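When the compile step is scripted rather than run interactively, the package location can be read from the CLI output. A minimal sketch: `package_path_from_output`, `list_package`, and `compile_pipeline` are illustrative helpers that look for the `Pipeline package path:` line shown above and list the members of the resulting `.tar.gz` archive with `tarfile`. The archive's exact contents depend on the TFX and KFP versions, so inspect the listing rather than assuming file names.

```python
import re
import subprocess
import tarfile


def package_path_from_output(cli_output):
    """Return the path after 'Pipeline package path:', or None if it is absent."""
    match = re.search(r"^Pipeline package path:\s*(\S+)\s*$", cli_output, re.MULTILINE)
    return match.group(1) if match else None


def list_package(path):
    """Return the names of the files inside a gzipped pipeline package."""
    with tarfile.open(path, "r:gz") as archive:
        return archive.getnames()


def compile_pipeline(pipeline_path="dataflow_runner.py"):
    """Run the same compile command as the cell above and return the package path."""
    result = subprocess.run(
        ["tfx", "pipeline", "compile", "--engine", "kubeflow", "--pipeline_path", pipeline_path],
        capture_output=True, text=True, check=True,
    )
    # Search both streams, since log lines and status messages may go to either.
    return package_path_from_output(result.stdout + "\n" + result.stderr)
```

For example, `list_package(compile_pipeline())` compiles the pipeline and lists what the package contains.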


### Deploy the pipeline package to AI Platform Pipelines

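After the pipeline compiles without errors, use the `tfx pipeline create` command to build the container image and deploy the pipeline. The cell below runs `tfx pipeline update` instead, which applies when the pipeline already exists on the endpoint; for a first deployment, replace `update` with `create`.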


In [None]:
!tfx pipeline update  \
--pipeline_path=dataflow_runner.py \
--endpoint={ENDPOINT} 

CLI
Updating pipeline
Detected Kubeflow.
Use --engine flag if you intend to use a different orchestrator.
Reading build spec from build.yaml
Use skaffold to build the container image.
/home/jupyter/.local/bin/skaffold
New container image is built. Target image is available in the build spec file.


To redeploy the pipeline, you can either delete the previous version with `tfx pipeline delete` and create it again, or update the pipeline in place with `tfx pipeline update`.

To delete the pipeline:

`tfx pipeline delete --pipeline_name {PIPELINE_NAME} --endpoint {ENDPOINT}`

To update the pipeline:

`tfx pipeline update --pipeline_path dataflow_runner.py --endpoint {ENDPOINT}`
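The `create`, `update`, and `delete` commands differ only in a few flags. A minimal sketch that builds the argument list for each, using only the flags that appear in this notebook (`--pipeline_path`, `--pipeline_name`, `--endpoint`); `tfx_pipeline_command` and `run_tfx` are illustrative helpers, not part of the TFX CLI.

```python
import shlex
import subprocess


def tfx_pipeline_command(action, endpoint, pipeline_path=None, pipeline_name=None):
    """Return argv for `tfx pipeline create|update|delete`."""
    if action in ("create", "update"):
        if not pipeline_path:
            raise ValueError(f"{action!r} requires pipeline_path")
        flags = ["--pipeline_path", pipeline_path]
    elif action == "delete":
        if not pipeline_name:
            raise ValueError("'delete' requires pipeline_name")
        flags = ["--pipeline_name", pipeline_name]
    else:
        raise ValueError(f"unsupported action: {action!r}")
    return ["tfx", "pipeline", action] + flags + ["--endpoint", endpoint]


def run_tfx(argv):
    """Echo the command in shell syntax, then run it and raise if it fails."""
    print(" ".join(shlex.quote(arg) for arg in argv))
    subprocess.run(argv, check=True)
```

For example, deleting and recreating the pipeline is `run_tfx(tfx_pipeline_command("delete", ENDPOINT, pipeline_name=PIPELINE_NAME))` followed by `run_tfx(tfx_pipeline_command("create", ENDPOINT, pipeline_path="dataflow_runner.py"))`.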

### Create and monitor a pipeline run

After the pipeline has been deployed, you can trigger and monitor pipeline runs with the TFX CLI or the KFP UI.

To submit a pipeline run with the TFX CLI:

In [34]:
!tfx run create --pipeline_name={PIPELINE_NAME} --endpoint={ENDPOINT}

CLI
Creating a run for pipeline: custom_component
Detected Kubeflow.
Use --engine flag if you intend to use a different orchestrator.
Run created for pipeline: custom_component
+------------------+--------------------------------------+----------+---------------------------+-----------------------------------------------------------------------------------------------------------------------------+
| pipeline_name    | run_id                               | status   | created_at                | link                                                                                                                        |
| custom_component | 83b7e3fa-cd64-4b0f-be90-34aa9bb92a53 |          | 2020-05-12T22:14:07+00:00 | http://309f723963874a47-dot-us-central2.pipelines.googleusercontent.com/#/runs/details/83b7e3fa-cd64-4b0f-be90-34aa9bb92a53 |
+------------------+--------------------------------------+----------+---------------------------+--------------------------------------------------
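The run ID that `tfx run status` needs is printed in the table above. A minimal sketch that turns such a `|`-delimited table into dictionaries keyed by the header row, so the ID can be extracted programmatically; `parse_cli_table` is an illustrative helper, and it assumes the layout shown here (border lines starting with `+`, no `|` inside cell values).

```python
def parse_cli_table(text):
    """Parse a '|'-delimited table into a list of {header: cell} dicts."""
    rows = [line.strip() for line in text.splitlines() if line.strip().startswith("|")]
    if not rows:
        return []

    def cells(line):
        # Drop the outer '|' characters, then split and trim each cell.
        return [cell.strip() for cell in line[1:-1].split("|")]

    header = cells(rows[0])
    return [dict(zip(header, cells(row))) for row in rows[1:]]
```

In the notebook, IPython can capture a shell command's output as a list of lines, for example `output = !tfx run list --pipeline_name {PIPELINE_NAME} --endpoint {ENDPOINT}`; `parse_cli_table("\n".join(output))` then yields one dictionary per run, and its `run_id` value can be assigned to `RUN_ID`. Lines that are not table rows, such as `Detected Kubeflow.`, are skipped.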

To list all runs of the pipeline:

In [16]:
!tfx run list --pipeline_name {PIPELINE_NAME} --endpoint {ENDPOINT}

CLI
Listing all runs of pipeline: custom_component
Detected Kubeflow.
Use --engine flag if you intend to use a different orchestrator.
+------------------+--------------------------------------+-----------+---------------------------+-----------------------------------------------------------------------------------------------------------------------------+
| pipeline_name    | run_id                               | status    | created_at                | link                                                                                                                        |
| custom_component | 7700e961-1501-4a23-8301-77d33713bc94 | Succeeded | 2020-05-12T17:40:46+00:00 | http://309f723963874a47-dot-us-central2.pipelines.googleusercontent.com/#/runs/details/7700e961-1501-4a23-8301-77d33713bc94 |
+------------------+--------------------------------------+-----------+---------------------------+----------------------------------------------------------------------------------------

To retrieve the status of a given run:

In [None]:
# Replace with a run_id from the output of `tfx run create` or `tfx run list`.
RUN_ID='[YOUR RUN ID]'

!tfx run status --pipeline_name {PIPELINE_NAME} --run_id {RUN_ID} --endpoint {ENDPOINT}
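Rather than rerunning the status cell by hand, you can poll until the run finishes. A minimal sketch: `wait_for_run` is an illustrative helper that accepts any zero-argument callable returning the current status string, for example a hypothetical wrapper around `tfx run status`. The set of terminal statuses is an assumption; only `Succeeded` appears in this notebook's output, so adjust `terminal` to the statuses your KFP deployment reports.

```python
import time

# Assumed terminal run statuses; "Succeeded" is the only one observed above.
DEFAULT_TERMINAL = frozenset({"Succeeded", "Failed", "Error", "Skipped"})


def wait_for_run(get_status, poll_seconds=30.0, timeout_seconds=3600.0,
                 terminal=DEFAULT_TERMINAL):
    """Call get_status() until it returns a terminal status, then return it.

    Raises TimeoutError if no terminal status is seen within timeout_seconds.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {status!r} after {timeout_seconds} seconds")
        time.sleep(poll_seconds)
```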