# Train and deploy on Kubeflow from Notebooks

This notebook introduces you to using Kubeflow Fairing to train and deploy a model to Kubeflow on Google Kubernetes Engine (GKE), and Kubeflow Pipelines to build a simple pipeline and deploy it on GKE. This notebook demonstrates how to:
 
* Train an XGBoost model in a local notebook,
* Use Kubeflow Fairing to train an XGBoost model remotely on Kubeflow,
  * For simplicity, synthetic data generated in code is used.
  * The append builder is used to rapidly build a Docker image.
* Use Kubeflow Fairing to deploy a trained model to Kubeflow, and call the deployed endpoint for predictions.
* Use a simple pipeline to train a model in GKE. 

To learn more about how to run this notebook locally, see the guide to [training and deploying on GCP from a local notebook][gcp-local-notebook].

[gcp-local-notebook]: https://kubeflow.org/docs/fairing/gcp/tutorials/gcp-local-notebook/

## Set up your notebook for training an XGBoost model

Import the libraries required to train this model.

In [4]:
!pip3 install retrying
!pip3 install https://github.com/kubeflow/fairing/archive/master.zip
!pip3 install kfmd

You are using pip version 19.0.1, however version 19.2.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Collecting https://github.com/kubeflow/fairing/archive/master.zip
  Downloading https://github.com/kubeflow/fairing/archive/master.zip
     | 2.6MB 80.3MB/s

Building wheels for collected packages: fairing
  Building wheel for fairing (setup.py) ... done
  Stored in directory: /tmp/pip-ephem-wheel-cache-ctd8rjlc/wheels/d7/62/60/cece640c93ab6418ee8a72f5cdca19b02eff4ce4515edbd99c
Successfully built fairing
You are using pip version 19.0.1, however version 19.2.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Collecting kfmd
  Downloading https://files.pythonhosted.org/packages/cf/72/048a49042dacd93925f6f4253cb765aeddef34da4cbec05066dc1ac555f5/kfmd-0.1.8.tar.gz
Building wheels for collected packages: kfmd
  Building wheel for kfmd (setup.py) ... done
  Stored in directory: /home/jovyan/.cache/pip/wheels/3d/ef/17/5f5099e588c582d66506547e0bd28bd7071959137a88b110ca
Successfully built kfmd
Installing collected packages: kfmd
Successfully installed kfmd-0.1.8
You are using pip version 19.0.1, however version 19.2.1 is available.
You should consider upgrading via the 'pi

In [3]:
import util
from pathlib import Path
import os

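# Local setup only: point Google client libraries at a service account key file.
# Adjust or remove this line for your environment.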
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/home/jovyan/key.json"
util.notebook_setup()


In [11]:
# fairing:include-cell
import fire
import joblib
import logging
import kfmd
import nbconvert
import os
import pathlib
import sys
from pathlib import Path
import pandas as pd
import pprint
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from xgboost import XGBRegressor
from importlib import reload
from sklearn.datasets import make_regression
from kfmd import metadata
from datetime import datetime


In [6]:
# Imports not to be included in the built docker image
import kfp
import kfp.components as comp
import kfp.gcp as gcp
import kfp.dsl as dsl
import kfp.compiler as compiler
from kubernetes import client as k8s_client
import fairing   
from fairing.builders import append
from fairing.deployers import job
from fairing.preprocessors.converted_notebook import ConvertNotebookPreprocessorWithFire


In [7]:
# fairing:include-cell
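def read_synthetic_input(test_size=0.25, random_state=42):
    """Generate synthetic data and split it into train and test sets.

    A fixed random_state makes every call return the same data, so later calls
    (e.g. for prediction) use the same coefficients the model was trained on.
    """
    # generate regression dataset
    X, y = make_regression(n_samples=200, n_features=5, noise=0.1,
                           random_state=random_state)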
    train_X, test_X, train_y, test_y = train_test_split(X,
                                                      y,
                                                      test_size=test_size,
                                                      shuffle=False)

    imputer = SimpleImputer()
    train_X = imputer.fit_transform(train_X)
    test_X = imputer.transform(test_X)

    return (train_X, train_y), (test_X, test_y)
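`read_synthetic_input` relies on two scikit-learn behaviors worth making explicit. With `shuffle=False`, `train_test_split` keeps row order, so the test set is simply the last `ceil(test_size * n_samples)` rows (50 of the 200 here). `SimpleImputer` learns column means on the training split only and reuses them on the test split; since `make_regression` produces no missing values, the imputer changes nothing in this notebook. The plain-Python sketch below mirrors both steps; the helper names are hypothetical and `None` stands in for a missing value.

```python
import math


def sequential_split(rows, targets, test_size=0.25):
    """Split like train_test_split(..., shuffle=False): the last rows become the test set."""
    n_test = math.ceil(test_size * len(rows))  # test size is rounded up
    n_train = len(rows) - n_test
    return (rows[:n_train], targets[:n_train]), (rows[n_train:], targets[n_train:])


def fit_column_means(rows):
    """Compute per-column means, ignoring missing values (None)."""
    means = []
    for c in range(len(rows[0])):
        present = [r[c] for r in rows if r[c] is not None]
        means.append(sum(present) / len(present))
    return means


def impute(rows, means):
    """Replace missing values with column means learned from the training rows."""
    return [[means[c] if v is None else v for c, v in enumerate(r)] for r in rows]


rows = [[1.0, 2.0], [3.0, None], [5.0, 6.0], [None, 8.0]]
targets = [10, 20, 30, 40]
(train_X, train_y), (test_X, test_y) = sequential_split(rows, targets, test_size=0.25)
means = fit_column_means(train_X)  # learned from the training rows only
train_X = impute(train_X, means)
test_X = impute(test_X, means)     # the test rows reuse the training means
print(train_y, test_y)  # [10, 20, 30] [40]
print(test_X)           # [[3.0, 8.0]]
```

Fitting the imputer on the training split only keeps information from the test rows out of training, which is why the notebook calls `fit_transform` on `train_X` but only `transform` on `test_X`.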


In [12]:
# fairing:include-cell
def train_model(train_X,
                train_y,
                test_X,
                test_y,
                n_estimators,
                learning_rate):
    """Train the model using XGBRegressor."""
    model = XGBRegressor(n_estimators=n_estimators, learning_rate=learning_rate)

    model.fit(train_X,
              train_y,
              early_stopping_rounds=40,
              eval_set=[(test_X, test_y)])

    # best_iteration is 0-based, so add 1 to report the number of rounds
    print("Best RMSE on eval: %.2f with %d rounds" %
          (model.best_score, model.best_iteration + 1))
    return model

def eval_model(model, test_X, test_y):
    """Evaluate the model performance."""
    predictions = model.predict(test_X)
    mae = mean_absolute_error(test_y, predictions)
    logging.info("mean_absolute_error=%.2f", mae)
    return mae

def save_model(model, model_file):
    """Save XGBoost model for serving."""
    joblib.dump(model, model_file)
    logging.info("Model export success: %s", model_file)
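`train_model` relies on XGBoost's early stopping: with `early_stopping_rounds=40`, training stops once the validation RMSE has not improved for 40 consecutive rounds, and `best_iteration` records the 0-based index of the best round (hence the `+ 1` when reporting rounds). The sketch below replays that rule over a list of per-round scores in plain Python, alongside the mean absolute error that `eval_model` reports. It is a simplified model of the behavior, not XGBoost's implementation; the function names and scores are hypothetical.

```python
def early_stopping(eval_scores, patience):
    """Replay per-round validation errors (lower is better) under an early-stopping rule.

    Returns (best_score, best_iteration, rounds_run); best_iteration is 0-based,
    like XGBoost's `best_iteration` attribute.
    """
    best_score = float("inf")
    best_iteration = -1
    for i, score in enumerate(eval_scores):
        if score < best_score:
            best_score, best_iteration = score, i
        elif i - best_iteration >= patience:
            # No improvement for `patience` rounds: training stops after this round.
            return best_score, best_iteration, i + 1
    return best_score, best_iteration, len(eval_scores)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


scores = [140.4, 134.4, 128.3, 129.0, 130.1, 131.2]
best, best_it, rounds = early_stopping(scores, patience=3)
print(best, best_it + 1, rounds)  # 128.3 3 6
print(mean_absolute_error([1, 2, 3], [2, 2, 5]))  # 1.0
```

If every round improves, as in the local run where the best round is the 50th of `n_estimators=50`, early stopping never triggers and all rounds are used.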

## Define Train and Predict functions

In [13]:
# fairing:include-cell
class ModelServe(object):
    
    def __init__(self, model_file=None):
        self.n_estimators = 50
        self.learning_rate = 0.1
        if not model_file:
            if "MODEL_FILE" in os.environ:
                print("model_file not supplied; checking environment variable")
                model_file = os.getenv("MODEL_FILE")
            else:
                print("model_file not supplied; using the default")
                model_file = "mockup-model.dat"
        
        self.model_file = model_file
        print("model_file={0}".format(self.model_file))
        
        self.model = None
        self.exec = self.create_execution()

    def train(self):
        (train_X, train_y), (test_X, test_y) = read_synthetic_input()
        self.exec.log_input(metadata.DataSet(
            description="xgboost synthetic data",
            name="synthetic-data",
            owner="someone@kubeflow.org",
            uri="file://path/to/dataset",
            version="v1.0.0"))
        
        model = train_model(train_X,
                          train_y,
                          test_X,
                          test_y,
                          self.n_estimators,
                          self.learning_rate)

        mae = eval_model(model, test_X, test_y)
        self.exec.log_output(metadata.Metrics(
            name="xgboost-synthetic-training-eval",
            owner="someone@kubeflow.org",
            description="training evaluation for xgboost synthetic",
            uri="gcs://path/to/metrics",
            metrics_type=metadata.Metrics.VALIDATION,
            values={"mean_absolute_error": mae}))
        
        save_model(model, self.model_file)
        self.exec.log_output(metadata.Model(
            name="housing-price-model",
            description="housing price prediction model using synthetic data",
            owner="someone@kubeflow.org",
            uri=self.model_file,
            model_type="linear_regression",
            training_framework={
                "name": "xgboost",
                "version": "0.9.0"
            },
            hyperparameters={
                "learning_rate": self.learning_rate,
                "n_estimators": self.n_estimators
            },
            version=datetime.utcnow().isoformat("T")))
        
    def predict(self, X, feature_names):
        """Predict using the model for given ndarray."""
        if not self.model:
            self.model = joblib.load(self.model_file)
        # Do any preprocessing
        prediction = self.model.predict(data=X)
        # Do any postprocessing
        return [[prediction.item(0), prediction.item(0)]]
    
    def create_execution(self):
        workspace = metadata.Workspace(
            # Connect to metadata-service in namespace kubeflow in the k8s cluster.
            backend_url_prefix="metadata-service.kubeflow:8080",
            name="xgboost-synthetic",
            description="workspace for xgboost-synthetic artifacts and executions")

        r = metadata.Run(
            workspace=workspace,
            name="xgboost-synthetic-fairing-run" + datetime.utcnow().isoformat("T"),
            description="a notebook run")

        return metadata.Execution(
            name="execution" + datetime.utcnow().isoformat("T"),
            workspace=workspace,
            run=r,
            description="execution for training xgboost-synthetic")
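`ModelServe.__init__` resolves the model path in a fixed order: an explicit `model_file` argument wins, then the `MODEL_FILE` environment variable, then the default `mockup-model.dat`. The environment-variable step lets the same image load a different model at deploy time without code changes. The standalone helper below is hypothetical and not part of the notebook; it isolates that rule and takes the environment as a parameter so it can be checked without touching `os.environ`.

```python
import os

DEFAULT_MODEL_FILE = "mockup-model.dat"


def resolve_model_file(model_file=None, environ=None):
    """Pick the model path: explicit argument, then $MODEL_FILE, then the default."""
    if environ is None:
        environ = os.environ
    if model_file:
        return model_file
    if "MODEL_FILE" in environ:
        return environ["MODEL_FILE"]
    return DEFAULT_MODEL_FILE


# The paths below are placeholders for illustration.
print(resolve_model_file("trained.dat", environ={}))                         # trained.dat
print(resolve_model_file(environ={"MODEL_FILE": "/mnt/models/model.dat"}))   # /mnt/models/model.dat
print(resolve_model_file(environ={}))                                        # mockup-model.dat
```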

## Train your Model Locally

* Train your model locally inside your notebook

In [14]:
ModelServe(model_file="mockup-model.dat").train()

model_file=mockup-model.dat
[0]	validation_0-rmse:140.364
Will train until validation_0-rmse hasn't improved in 40 rounds.
[1]	validation_0-rmse:134.368
[2]	validation_0-rmse:128.262
[3]	validation_0-rmse:122.131
[4]	validation_0-rmse:117.647
[5]	validation_0-rmse:113.133
[6]	validation_0-rmse:108.111
[7]	validation_0-rmse:103.921
[8]	validation_0-rmse:101.611
[9]	validation_0-rmse:98.3824
[10]	validation_0-rmse:94.145
[11]	validation_0-rmse:91.121
[12]	validation_0-rmse:88.3225
[13]	validation_0-rmse:85.5555
[14]	validation_0-rmse:83.3553
[15]	validation_0-rmse:81.3385
[16]	validation_0-rmse:80.4257
[17]	validation_0-rmse:78.5906
[18]	validation_0-rmse:76.9606
[19]	validation_0-rmse:75.3192
[20]	validation_0-rmse:73.6229
[21]	validation_0-rmse:71.8799
[22]	validation_0-rmse:70.4872
[23]	validation_0-rmse:68.9296
[24]	validation_0-rmse:67.5273
[25]	validation_0-rmse:67.1219
[26]	validation_0-rmse:66.0533
[27]	validation_0-rmse:64.5362
[28]	validation_0-rmse:63.5446
[29]	validation_0-rm

mean_absolute_error=36.96
Model export success: mockup-model.dat


Best RMSE on eval: %.2f with %d rounds 52.49889 50


## Predict locally

* Run prediction inside the notebook using the newly trained model

In [15]:
(train_X, train_y), (test_X, test_y) = read_synthetic_input()

ModelServe().predict(test_X, None)

model_file not supplied; using the default
model_file=mockup-model.dat


[[6.803134918212891, 6.803134918212891]]

## Use Fairing to Launch a K8s Job to train your model

### Set up Kubeflow Fairing for training and predictions

Import the `fairing` library and configure the environment that your training or prediction job will run in.

In [16]:
# Set up Google Container Registry (GCR) for storing output containers.
# You can use any Docker container registry instead of GCR.
GCP_PROJECT = fairing.cloud.gcp.guess_project_name()
print(GCP_PROJECT)
DOCKER_REGISTRY = 'gcr.io/{}/fairing-job'.format(GCP_PROJECT)
print(DOCKER_REGISTRY)
PY_VERSION = ".".join([str(x) for x in sys.version_info[0:3]])
BASE_IMAGE = 'python:{}'.format(PY_VERSION)
# You can use the Dockerfile in this repo to build your own base image;
# the builders below use this prebuilt base_image rather than BASE_IMAGE.
base_image = "gcr.io/kubeflow-images-public/xgboost-fairing-example-base:v-20190612"


zhenghui-kubeflow
gcr.io/zhenghui-kubeflow/fairing-job
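The cell above derives two image names from the project and the running interpreter: the registry path that built images are pushed to, and a `python:<major>.<minor>.<micro>` tag matching the notebook's Python version. The helper below reproduces that string logic in isolation; `image_settings` is a hypothetical name, and `my-project` and the version tuple are placeholder values.

```python
import sys


def image_settings(project, version_info=sys.version_info):
    """Derive the registry path and a python:<major.minor.micro> base image tag."""
    registry = "gcr.io/{}/fairing-job".format(project)
    py_version = ".".join(str(x) for x in version_info[0:3])
    return registry, "python:{}".format(py_version)


registry, base = image_settings("my-project", (3, 6, 8))
print(registry)  # gcr.io/my-project/fairing-job
print(base)      # python:3.6.8
```

Matching the base image's Python version to the notebook's reduces the chance that code or pickled models behave differently once they run in the container.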


## Use Fairing to build the Docker image

* The cluster builder builds an image inside the cluster from the base image, and the append builder then rapidly builds new images on top of it

In [26]:
from fairing.builders import cluster

# Convert the notebook into a Python module; only cells marked
# "# fairing:include-cell" are kept, and ModelServe is exposed via fire.
preprocessor = ConvertNotebookPreprocessorWithFire("ModelServe")

# Extra files to include in the build context alongside the converted notebook
input_files = ["xgboost_util.py", "mockup-model.dat"]
preprocessor.input_files = set([os.path.normpath(f) for f in input_files])
preprocessor.preprocess()

[PosixPath('build-train-deploy.py'), 'mockup-model.dat', 'xgboost_util.py']
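`ConvertNotebookPreprocessorWithFire` turns the notebook into the `build-train-deploy.py` module listed above. Conceptually, it keeps the code cells marked `# fairing:include-cell` (so notebook-only imports such as `kfp` stay out of the image) and adds an entry point that exposes `ModelServe` through python-fire, which is why a trailing `train` argument on the command line runs `ModelServe.train()`. The sketch below imitates that idea on a list of cell sources; it is a simplified illustration, not fairing's implementation, and `convert_notebook` is a hypothetical name.

```python
def convert_notebook(cells, class_name, tag="# fairing:include-cell"):
    """Keep only cells whose first line is the include tag, then append a fire entry point."""
    kept = []
    for cell in cells:
        lines = cell.strip().splitlines()
        if lines and lines[0].strip() == tag:
            kept.append(cell.strip())
    entry_point = (
        'if __name__ == "__main__":\n'
        "    import fire\n"
        "    fire.Fire({})".format(class_name)
    )
    return "\n\n".join(kept + [entry_point]) + "\n"


cells = [
    "# fairing:include-cell\nimport os",
    "import kfp  # notebook-only import, excluded from the module",
    "# fairing:include-cell\nclass ModelServe(object):\n    def train(self):\n        print('training')",
]
print(convert_notebook(cells, "ModelServe"))
```

Running the generated module as `python build-train-deploy.py train` would then have fire construct `ModelServe()` and call its `train` method.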

In [30]:
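# Build the image inside the cluster; the build context is uploaded to GCS.
# Replace the namespace below with your own Kubeflow namespace.
cluster_builder = cluster.cluster.ClusterBuilder(registry=DOCKER_REGISTRY,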
                                                 base_image=base_image,
                                                 namespace='zhenghui-at-google-com-e1e117',
                                                 preprocessor=preprocessor,
                                                 pod_spec_mutators=[fairing.cloud.gcp.add_gcp_credentials_if_exists],
                                                 context_source=cluster.gcs_context.GCSContextSource())
cluster_builder.build()

Building image using cluster builder.
build-train-deploy.py already exists in Fairing context, skipping...
build-train-deploy.py already exists in Fairing context, skipping...
/tmp/fairing_dockerfile_uuj5upj9 already exists in Fairing context, skipping...
/tmp/fairing_dockerfile_csmnoar3 already exists in Fairing context, skipping...
Creating docker context: /tmp/fairing_context_0uul09r_
build-train-deploy.py already exists in Fairing context, skipping...
build-train-deploy.py already exists in Fairing context, skipping...
build-train-deploy.py already exists in Fairing context, skipping...
/tmp/fairing_dockerfile_uuj5upj9 already exists in Fairing context, skipping...
/tmp/fairing_dockerfile_csmnoar3 already exists in Fairing context, skipping...
/tmp/fairing_dockerfile_ohhy43dg already exists in Fairing context, skipping...
Not able to find gcp credentials secret: user-gcp-sa
Waiting for fairing-builder-6xb6v to start...
Waiting for fairing-builder-6xb6v to start...
Waiting for fairi

[36mINFO[0m[0006] Downloading base image gcr.io/kubeflow-images-public/xgboost-fairing-example-base:v-20190612
[36mINFO[0m[0006] Downloading base image gcr.io/kubeflow-images-public/xgboost-fairing-example-base:v-20190612
[33mWARN[0m[0006] Error while retrieving image from cache: getting image from path: open /cache/sha256:f90e54e312c4cfba28bec6993add2a85b4e127b52149ec0aaf41e5f8889a4086: no such file or directory
[36mINFO[0m[0006] Checking for cached layer gcr.io/zhenghui-kubeflow/fairing-job/fairing-job/cache:e46cfa04f5f0d0445ce3ce8b91886d94e96f2875510a69aa9afaeb0ba9e62fc4...
[36mINFO[0m[0006] No cached layer found for cmd RUN if [ -e requirements.txt ];then pip install --no-cache -r requirements.txt; fi
[36mINFO[0m[0006] Unpacking rootfs as cmd RUN if [ -e requirements.txt ];then pip install --no-cache -r requirements.txt; fi requires it.
[36mINFO[0m[0116] Taking snapshot of full filesystem...
[36mINFO[0m[0131] Skipping paths under /dev, as it is a whitelisted directo

In [25]:
builder = append.append.AppendBuilder(registry=DOCKER_REGISTRY,
                                      base_image=cluster_builder.image_tag, preprocessor=preprocessor)
builder.build()


Building image using Append builder...
Creating docker context: /tmp/fairing_context_etb8hkx_
build-train-deploy.py already exists in Fairing context, skipping...
build-train-deploy.py already exists in Fairing context, skipping...
build-train-deploy.py already exists in Fairing context, skipping...
build-train-deploy.py already exists in Fairing context, skipping...
build-train-deploy.py already exists in Fairing context, skipping...
/tmp/fairing_dockerfile_mibzzsjo already exists in Fairing context, skipping...
/tmp/fairing_dockerfile_4fakhp0m already exists in Fairing context, skipping...
/tmp/fairing_dockerfile_l8_8yobc already exists in Fairing context, skipping...
/tmp/fairing_dockerfile_6ppkd982 already exists in Fairing context, skipping...
Loading Docker credentials for repository 'gcr.io/zhenghui-kubeflow/fairing-job/fairing-job:E58E6963'
Invoking 'docker-credential-gcloud' to obtain Docker credentials.
Successfully obtained Docker credentials.


V2DiagnosticException: response: {'docker-distribution-api-version': 'registry/2.0', 'content-type': 'application/json', 'date': 'Thu, 25 Jul 2019 22:39:27 GMT', 'server': 'Docker Registry', 'cache-control': 'private', 'x-xss-protection': '0', 'x-frame-options': 'SAMEORIGIN', 'transfer-encoding': 'chunked', 'status': '404', 'content-length': '164', '-content-encoding': 'gzip'}
Failed to fetch "E58E6963" from request "/v2/zhenghui-kubeflow/fairing-job/fairing-job/manifests/E58E6963".: None
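The append builder avoids a full `docker build`: it packages the build context as one extra layer, appends that layer to an existing base image (here `cluster_builder.image_tag`), and pushes the result straight to the registry. The base image therefore has to exist in the registry already; the 404 above is consistent with a missing image, since the registry returned no manifest for tag `E58E6963`. The sketch below shows only the layer-packing step with the standard library: files go into a gzipped tar under `app/` (an assumption, chosen to match the `/app` working directory the pipeline step sets later), and the blob is identified by its SHA-256 digest, which is how registries address blobs. It does not talk to a registry, and `build_layer` is a hypothetical name.

```python
import hashlib
import io
import tarfile


def build_layer(files, prefix="app"):
    """Pack {relative_path: bytes} into a gzipped tar layer; return (blob, digest)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in sorted(files):  # sorted for a stable entry order
            data = files[name]
            info = tarfile.TarInfo(name="{}/{}".format(prefix, name))
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    blob = buf.getvalue()
    return blob, "sha256:" + hashlib.sha256(blob).hexdigest()


blob, digest = build_layer({"build-train-deploy.py": b"# converted notebook\n"})
print(len(blob) > 0, digest[:7])  # True sha256:
```

Because only the small code layer is rebuilt, iterating on notebook code is fast, while the heavier dependency layers from the cluster build are reused unchanged.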

## Launch the K8s Job

* Use a pod spec mutator to attach GCP credentials to the pod

In [24]:
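pod_spec = builder.generate_pod_spec()
NAMESPACE = "user1"
# Run the training image as a Kubernetes Job; the mutator attaches GCP
# credentials to the pod if the credentials secret exists.
train_deployer = job.job.Job(namespace=NAMESPACE,
                             cleanup=False,
                             pod_spec_mutators=[
                                 fairing.cloud.gcp.add_gcp_credentials_if_exists])

# Add command line arguments: "train" invokes ModelServe.train() via fire
pod_spec.containers[0].command.extend(["train"])
result = train_deployer.deploy(pod_spec)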

NameError: name 'builder' is not defined

In [None]:
!kubectl get jobs -l fairing-id={train_deployer.job_id} -o yaml
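A pod spec mutator is a function that edits the generated pod spec before the job is submitted. `add_gcp_credentials_if_exists` attaches GCP credentials only when the credentials secret is present (the cluster build log above reports that the `user-gcp-sa` secret was not found). The sketch below uses plain dictionaries instead of the Kubernetes client's `V1PodSpec` objects and does not follow fairing's mutator signature; the secret name `user-gcp-sa` comes from this notebook, while the mount path, key file name, and `add_credentials` itself are hypothetical. It mounts the secret as a volume and points `GOOGLE_APPLICATION_CREDENTIALS` at the key file, a common way to hand credentials to Google client libraries.

```python
SECRET_NAME = "user-gcp-sa"    # secret name used elsewhere in this notebook
MOUNT_PATH = "/etc/secrets"    # hypothetical mount path
KEY_FILE = "user-gcp-sa.json"  # hypothetical key file name inside the secret


def add_credentials(pod_spec):
    """Mount the secret in every container and set GOOGLE_APPLICATION_CREDENTIALS."""
    pod_spec.setdefault("volumes", []).append(
        {"name": SECRET_NAME, "secret": {"secretName": SECRET_NAME}})
    for container in pod_spec["containers"]:
        container.setdefault("volumeMounts", []).append(
            {"name": SECRET_NAME, "mountPath": MOUNT_PATH, "readOnly": True})
        container.setdefault("env", []).append(
            {"name": "GOOGLE_APPLICATION_CREDENTIALS",
             "value": "{}/{}".format(MOUNT_PATH, KEY_FILE)})
    return pod_spec


# Illustrative pod spec; the real one comes from builder.generate_pod_spec().
pod_spec = {"containers": [{"name": "fairing-job",
                            "command": ["python", "build-train-deploy.py"]}]}
pod_spec["containers"][0]["command"].extend(["train"])  # same step as the cell above
add_credentials(pod_spec)
print(pod_spec["containers"][0]["env"])
```

Keeping credential handling in a mutator separates it from the image itself, so the same image can run with different credentials in different namespaces.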

## Deploy the trained model to Kubeflow for predictions

In [21]:
from fairing.deployers import serving
pod_spec = builder.generate_pod_spec()

module_name = os.path.splitext(preprocessor.executable.name)[0]
deployer = serving.serving.Serving(module_name + ".ModelServe",
                                   service_type="ClusterIP",
                                   labels={"app": "mockup"})
    
url = deployer.deploy(pod_spec)

INFO:root:Cluster endpoint: http://fairing-service-jjgxd.user1.svc.cluster.local


In [22]:
!kubectl get deploy -o yaml {deployer.deployment.metadata.name}

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2019-06-12T20:22:27Z"
  generateName: fairing-deployer-
  generation: 1
  labels:
    app: mockup
    fairing-deployer: serving
    fairing-id: cbc0e610-8d4f-11e9-9207-96ec34699c76
  name: fairing-deployer-cltbb
  namespace: user1
  resourceVersion: "7556174"
  selfLink: /apis/extensions/v1beta1/namespaces/user1/deployments/fairing-deployer-cltbb
  uid: cbc54e8f-8d4f-11e9-b008-42010a8e01a5
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: mockup
      fairing-deployer: serving
      fairing-id: cbc0e610-8d4f-11e9-9207-96ec34699c76
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: mockup
        fairing-deployer: serv

## Call the prediction endpoint

Create a test dataset, then call the endpoint on Kubeflow for predictions.

In [23]:
(train_X, train_y), (test_X, test_y) = read_synthetic_input()


In [24]:
full_url = url + ":5000/predict"
result = util.predict_nparray(full_url, test_X)
pprint.pprint(result.content)

(b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">\n<title>500 Inter'
 b'nal Server Error</title>\n<h1>Internal Server Error</h1>\n<p>The server en'
 b'countered an internal error and was unable to complete your request. Either '
 b'the server is overloaded or there is an error in the application.</p>\n')
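`util.predict_nparray` is not shown in this notebook, so the exact payload it sends is not visible here. `ModelServe.predict(self, X, feature_names)` appears to follow the Seldon Core Python model convention (a `predict` method taking the input array and feature names), and Seldon's REST `/predict` endpoint accepts a JSON body of the form `{"data": {"ndarray": [...]}}`. The sketch below assumes that format and builds, without sending, such a request with the standard library; `build_predict_request` is a hypothetical helper, and the deployed server may expect a different encoding. A `500` response like the one above means the request reached the server but the model code raised an error; the serving pod's logs, for example via `kubectl logs -l app=mockup`, usually show the traceback.

```python
import json
import urllib.request


def build_predict_request(url, rows):
    """Build (but do not send) a POST whose JSON body wraps rows as {"data": {"ndarray": rows}}."""
    body = json.dumps({"data": {"ndarray": rows}}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST")


req = build_predict_request(
    "http://fairing-service-jjgxd.user1.svc.cluster.local:5000/predict",
    [[0.1, 0.2, 0.3, 0.4, 0.5]])  # one row with the 5 synthetic features
print(req.get_method(), req.full_url)
print(json.loads(req.data))
# From inside the cluster, urllib.request.urlopen(req) would send it.
```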


## Clean up the prediction endpoint

Delete the prediction endpoint created by this notebook.

In [33]:
# Uncomment to delete the service and deployment created above (labeled app=mockup)
# !kubectl delete service -l app=mockup
# !kubectl delete deploy -l app=mockup

## Build a simple one-step pipeline

In [25]:
EXPERIMENT_NAME = 'MockupModel'

#### Define the pipeline
The pipeline function must be decorated with the `@dsl.pipeline` decorator.

In [26]:
@dsl.pipeline(
    name='Training pipeline',
    description='A pipeline that trains an XGBoost model on synthetic data.'
)
def train_pipeline():
    # Run the converted notebook in the image built above and call ModelServe.train
    command = ["python", preprocessor.executable.name, "train"]
    train_op = dsl.ContainerOp(
        name="train",
        image=builder.image_tag,
        command=command,
    ).apply(
        gcp.use_gcp_secret('user-gcp-sa'),
    )
    train_op.container.working_dir = "/app"

#### Compile the pipeline

In [27]:
pipeline_func = train_pipeline
pipeline_filename = pipeline_func.__name__ + '.pipeline.zip'
compiler.Compiler().compile(pipeline_func, pipeline_filename)
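Compiling with a `.zip` filename produces an archive containing the workflow spec that Kubeflow Pipelines runs (an Argo Workflow YAML in SDK versions of this era). Inspecting it is a quick way to check the image and command the `train` step will use before submitting. The sketch below reads the single YAML entry from such a package with the standard library; the expectation of exactly one YAML file is an assumption, `read_pipeline_spec` is a hypothetical name, and the demo builds a stand-in package rather than reading `train_pipeline.pipeline.zip`.

```python
import os
import tempfile
import zipfile


def read_pipeline_spec(package_path):
    """Return the text of the workflow spec stored in a compiled pipeline .zip package."""
    with zipfile.ZipFile(package_path) as archive:
        # Assumption: the package holds exactly one YAML file.
        yaml_names = [n for n in archive.namelist() if n.endswith((".yaml", ".yml"))]
        if len(yaml_names) != 1:
            raise ValueError("expected one YAML file, found {}".format(yaml_names))
        return archive.read(yaml_names[0]).decode("utf-8")


# Demo with a stand-in package; in the notebook, pass pipeline_filename instead.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train_pipeline.pipeline.zip")
    with zipfile.ZipFile(path, "w") as archive:
        archive.writestr("pipeline.yaml", "kind: Workflow\n")
    print(read_pipeline_spec(path))
```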

#### Submit the pipeline for execution

In [28]:
# Specify pipeline argument values
arguments = {}

# Get or create an experiment and submit a pipeline run
client = kfp.Client()
experiment = client.create_experiment(EXPERIMENT_NAME)

# Submit a pipeline run
run_name = pipeline_func.__name__ + ' run'
run_result = client.run_pipeline(experiment.id, run_name, pipeline_filename, arguments)

# The run link in the output leads to the run information page. (Note: a bug in JupyterLab modifies the URL and makes the link stop working.)

INFO:root:Creating experiment MockupModel.
