## Kubeflow pipelines

This notebook walks through using Kubeflow Pipelines from the Python 3 interpreter (command line) to preprocess, train, tune, and deploy the babyweight model.


### 1. Start Hosted Pipelines and Notebook

To try out this notebook, first launch Kubeflow Hosted Pipelines and an AI Platform Notebooks instance.
Follow the instructions in this [README.md](pipelines/README.md) file.

### 2. Install necessary packages

In [2]:
%pip install --quiet kfp python-dateutil --upgrade --use-feature=2020-resolver

Note: you may need to restart the kernel to use updated packages.


Make sure to *restart the kernel* to pick up the new packages (look for the restart button in the ribbon of icons above this notebook).
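
After restarting the kernel, it can help to confirm that the packages are actually visible to the interpreter. The following is a minimal sketch, not part of the original notebook; the helper name `installed_version` is hypothetical, and on Python 3.7 it assumes the `importlib_metadata` backport is available.

```python
try:
    from importlib.metadata import PackageNotFoundError, version
except ImportError:
    # Python 3.7: assumes the importlib_metadata backport is installed.
    from importlib_metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report the packages installed in the cell above.
for name in ("kfp", "python-dateutil"):
    print(name, installed_version(name) or "not installed")
```

If either package is reported as not installed, the kernel is probably using a different environment from the one `%pip` installed into.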

### 3. Connect to the Hosted Pipelines

Visit https://console.cloud.google.com/ai-platform/pipelines/clusters
and get the hostname for your cluster by clicking on the Settings icon.
Alternatively, click on the Open Pipelines Dashboard link and copy the hostname from the URL.
Then change the settings in the following cell.

In [3]:
# CHANGE THESE
PIPELINES_HOST='447cdd24f70c9541-dot-us-central1.notebooks.googleusercontent.com'
PROJECT='qwiklabs-gcp-01-974853e7c436'
BUCKET='ai-analytics-solutions-kfpdemo'
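
If you copy the full dashboard URL rather than the bare hostname, a small helper can strip the scheme and path. This is a sketch only: `host_from_dashboard_url` is a hypothetical helper, and the example URL (including its `/#/pipelines` suffix) is made up from the sample hostname in the cell above.

```python
from urllib.parse import urlparse


def host_from_dashboard_url(url):
    """Extract the bare hostname from a dashboard URL (a bare hostname passes through)."""
    url = url.strip()
    # urlparse only fills in netloc when a scheme is present, so add one if needed.
    if "://" not in url:
        url = "https://" + url
    return urlparse(url).netloc


# Hypothetical URL built from the example hostname above.
example_url = "https://447cdd24f70c9541-dot-us-central1.notebooks.googleusercontent.com/#/pipelines"
print(host_from_dashboard_url(example_url))
# 447cdd24f70c9541-dot-us-central1.notebooks.googleusercontent.com
```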

In [4]:
import kfp
import os
client = kfp.Client(host=PIPELINES_HOST)
#client.list_pipelines()

ERROR:root:Failed to get healthz info attempt 1 of 5.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/kfp/_client.py", line 312, in get_kfp_healthz
    response = self._healthz_api.get_healthz()
  File "/opt/conda/lib/python3.7/site-packages/kfp_server_api/api/healthz_service_api.py", line 77, in get_healthz
    return self.get_healthz_with_http_info(**kwargs)  # noqa: E501
  File "/opt/conda/lib/python3.7/site-packages/kfp_server_api/api/healthz_service_api.py", line 162, in get_healthz_with_http_info
    collection_formats=collection_formats)
  File "/opt/conda/lib/python3.7/site-packages/kfp_server_api/api_client.py", line 383, in call_api
    _preload_content, _request_timeout, _host)
  File "/opt/conda/lib/python3.7/site-packages/kfp_server_api/api_client.py", line 202, in __call_api
    raise e
  File "/opt/conda/lib/python3.7/site-packages/kfp_server_api/api_client.py", line 199, in __call_api
    _request_timeout=_request_timeout)
  File "/opt/

TimeoutError: Failed getting healthz endpoint after 5 attempts.
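
The output above shows the client giving up after five failed health checks. One way to narrow down the cause is to test plain network reachability of the host first. The sketch below uses only the standard library; `can_reach` is a hypothetical helper, and port 443 is an assumption (HTTPS). It does not check authentication or the Kubeflow Pipelines API itself, only whether a TCP connection can be opened.

```python
import socket


def can_reach(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # DNS failures, refused connections, and timeouts all raise OSError subclasses.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_reach(PIPELINES_HOST)` returning `False` suggests a wrong hostname or a network problem, while `True` points toward credentials or the cluster itself.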

### 4. [Optional] Build Docker containers

I have made my containers public (see https://cloud.google.com/container-registry/docs/access-control for how to do this), so you can simply use my images.

In [None]:
%%bash
cd pipelines/containers
#bash build_all.sh

Check that the Docker images work properly ...

In [None]:
#!docker run -t gcr.io/ai-analytics-solutions/babyweight-pipeline-bqtocsv:latest --project $PROJECT  --bucket $BUCKET --local
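
The same check can be assembled from Python, which avoids shell-quoting surprises when the project or bucket name comes from a variable. This is a sketch under assumptions: `docker_run_command` is a hypothetical helper, the flags are copied from the commented-out command above, and `my-project` and `my-bucket` are placeholders.

```python
import shlex


def docker_run_command(image, project, bucket, local=True):
    """Build the argument list for a local test run of one pipeline container."""
    args = ["docker", "run", "-t", image, "--project", project, "--bucket", bucket]
    if local:
        args.append("--local")
    return args


cmd = docker_run_command(
    "gcr.io/ai-analytics-solutions/babyweight-pipeline-bqtocsv:latest",
    "my-project",  # placeholder project ID
    "my-bucket",   # placeholder bucket name
)
# Print a copy-pasteable command line; pass cmd to subprocess.run to execute it.
print(" ".join(shlex.quote(arg) for arg in cmd))
```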

### 5. Upload and execute pipeline

Upload the pipeline to the Kubeflow Pipelines cluster and run it.

In [3]:
from pipelines.containers.pipeline import mlp_babyweight

args = {
    'project' : PROJECT, 
    'bucket' : BUCKET
}

#pipeline = client.create_run_from_pipeline_func(mlp_babyweight.preprocess_train_and_deploy, args)

os.environ['HPARAM_JOB'] = 'babyweight_200207_231639' # change to the job name from a completed hyperparameter tuning step
pipeline = client.create_run_from_pipeline_func(mlp_babyweight.train_and_deploy, args)
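
Because a submitted run fails only after it has been scheduled, it can save time to validate the inputs before calling `create_run_from_pipeline_func`. The sketch below is illustrative: `build_pipeline_args` and `require_env` are hypothetical helpers, and it assumes the pipeline expects a bare bucket name (as in the settings cell) and reads `HPARAM_JOB` from the environment.

```python
import os


def build_pipeline_args(project, bucket):
    """Return the pipeline arguments dict after basic cleanup and validation."""
    project = project.strip()
    bucket = bucket.strip()
    # Assumption: the pipeline wants a bare bucket name, not a gs:// URL.
    if bucket.startswith("gs://"):
        bucket = bucket[len("gs://"):]
    bucket = bucket.rstrip("/")
    if not project or not bucket:
        raise ValueError("Both project and bucket must be non-empty")
    return {"project": project, "bucket": bucket}


def require_env(name):
    """Return a non-empty environment variable, or raise with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set {name} before submitting the pipeline")
    return value


print(build_pipeline_args("my-project", "gs://my-bucket/"))
# {'project': 'my-project', 'bucket': 'my-bucket'}
```

Calling `require_env('HPARAM_JOB')` just before submitting `train_and_deploy` surfaces a missing job name immediately instead of partway through the run.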

In [None]:
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.