# Train and deploy on Kubeflow from Notebooks

This notebook introduces you to using Kubeflow Fairing to train and deploy a model to Kubeflow on Google Kubernetes Engine (GKE) and to Google Cloud ML Engine. This notebook demonstrates how to:
 
* Train an XGBoost model in a local notebook,
* Use Kubeflow Fairing to train an XGBoost model remotely on Kubeflow,
* Use Kubeflow Fairing to train an XGBoost model remotely on Cloud ML Engine,
* Use Kubeflow Fairing to deploy a trained model to Kubeflow, and
* Call the deployed endpoint for predictions.

To learn more about how to run this notebook locally, see the guide to [training and deploying on GCP from a local notebook][gcp-local-notebook].

[gcp-local-notebook]: https://kubeflow.org/docs/fairing/gcp-local-notebook/

## Set up your notebook for training an XGBoost model

Install and import the libraries required to train this model.

```python
!pip3 install --user joblib
# The "sklearn" package on PyPI is a deprecated alias; install scikit-learn directly.
!pip3 install --user scikit-learn
```


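```python
import argparse
import logging
import os
import joblib
import sys
from pathlib import Path
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from xgboost import XGBRegressor
# ames.py is a helper module in this repository; it provides read_input(),
# train_model(), eval_model(), and save_model().
import ames
from importlib import reload

# Reload ames so that edits to ames.py are picked up without restarting the kernel.
reload(ames)
```

```
<module 'ames' from '/home/jovyan/git_jlewi-kubecon-demo/ames.py'>
```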

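```python
# Configure logging first so that the logging.info() call below is displayed.
logging.basicConfig(format='%(message)s')
logging.getLogger().setLevel(logging.INFO)

# If a local checkout of the fairing source exists, put it at the front of
# sys.path so that it is imported instead of any installed fairing package.
fairing_code = os.path.join(Path.home(), "git_jlewi-kubecon-demo", "fairing")

if os.path.exists(fairing_code):
    logging.info("Adding %s to path", fairing_code)
    sys.path = [fairing_code] + sys.path
```

```
Adding /home/jovyan/git_jlewi-kubecon-demo/fairing to path
```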

Define a class for your model, with methods for training and prediction.

```python
class HousingServe(object):
    def __init__(self):
        self.train_input = "ames_dataset/train.csv"
        self.n_estimators = 50
        self.learning_rate = 0.1
        self.model_file = "trained_ames_model.dat"
        self.model = None  # Loaded lazily by predict()

    def train(self):
        """Train the model, evaluate it on the test split, and save it."""
        (train_X, train_y), (test_X, test_y) = ames.read_input(self.train_input)
        model = ames.train_model(train_X,
                                 train_y,
                                 test_X,
                                 test_y,
                                 self.n_estimators,
                                 self.learning_rate)

        ames.eval_model(model, test_X, test_y)
        ames.save_model(model, self.model_file)

    def predict(self, X, feature_names):
        """Predict using the model for the given ndarray.

        feature_names is accepted to match the serving interface but is not used.
        """
        if self.model is None:
            # Load the model that train() saved, on the first call only.
            self.model = joblib.load(self.model_file)
        # Do any preprocessing
        prediction = self.model.predict(X)
        # Do any postprocessing. Only the first prediction is returned,
        # duplicated into a single row of two values.
        return [[prediction.item(0), prediction.item(0)]]
```
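
The key design choice in `HousingServe` appears to be that `train()` writes the model to a file and `predict()` loads it lazily, so the same class can run both in a training job and in a separate serving process that never calls `train()`. The toy sketch below shows that pattern using only the standard library. `ToyServe` and its mean-value "model" are hypothetical, and `pickle` stands in for the `joblib.load` and `ames.save_model` calls used above.

```python
import os
import pickle
import tempfile


class ToyServe:
    """Hypothetical class with the same shape as HousingServe.

    The "model" is just the mean of the training targets, stored as a dict.
    """

    def __init__(self, model_file="toy_model.pkl"):
        self.model_file = model_file
        self.model = None  # loaded lazily on the first predict() call

    def train(self, y):
        model = {"mean": sum(y) / len(y)}
        with open(self.model_file, "wb") as f:
            pickle.dump(model, f)

    def predict(self, X, feature_names=None):
        # feature_names is accepted (and ignored) to mirror HousingServe.predict.
        if self.model is None:
            with open(self.model_file, "rb") as f:
                self.model = pickle.load(f)
        return [self.model["mean"] for _ in X]


model_path = os.path.join(tempfile.mkdtemp(), "toy_model.pkl")
ToyServe(model_path).train([100.0, 200.0, 300.0])
server = ToyServe(model_path)  # fresh instance, like a separate serving process
print(server.predict([[1], [2]]))  # [200.0, 200.0]
```

Because the trained state lives only in the file, packaging that file alongside the class is enough for a fresh instance to serve predictions.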

## Train an XGBoost model in a notebook

Call `HousingServe().train()` to train the model. The `train()` method also evaluates the model on the held-out test split and saves it to `trained_ames_model.dat`.

```python
HousingServe().train()
```

```
[0]	validation_0-rmse:177514
Will train until validation_0-rmse hasn't improved in 40 rounds.
[1]	validation_0-rmse:161858
[2]	validation_0-rmse:147237
[3]	validation_0-rmse:134132
[4]	validation_0-rmse:122224
[5]	validation_0-rmse:111538
[6]	validation_0-rmse:102142
[7]	validation_0-rmse:93392.3
[8]	validation_0-rmse:85824.6
[9]	validation_0-rmse:79667.6
[10]	validation_0-rmse:73463.4
[11]	validation_0-rmse:68059.4
[12]	validation_0-rmse:63350.5
[13]	validation_0-rmse:59732.1
[14]	validation_0-rmse:56260.7
[15]	validation_0-rmse:53392.6
[16]	validation_0-rmse:50770.8
[17]	validation_0-rmse:48107.8
[18]	validation_0-rmse:45923.9
[19]	validation_0-rmse:44154.2
[20]	validation_0-rmse:42488.1
[21]	validation_0-rmse:41263.3
[22]	validation_0-rmse:40212.8
[23]	validation_0-rmse:39089.1
[24]	validation_0-rmse:37691.1
[25]	validation_0-rmse:36875.2
[26]	validation_0-rmse:36276.2
[27]	validation_0-rmse:35444.1
[28]	validation_0-rmse:34831.5
[29]	validation_0-rmse:34205.4
[30]	validation_0-rmse
```
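
The log line "Will train until validation_0-rmse hasn't improved in 40 rounds" indicates XGBoost early stopping on the validation RMSE with a patience of 40 rounds, presumably configured inside `ames.train_model`, which is not shown here. The following is a minimal pure-Python sketch of that stopping rule for a lower-is-better metric. The function name is hypothetical, and XGBoost's own implementation differs in details.

```python
def early_stopping_round(scores, patience):
    """Return (best_round, stop_round) for a lower-is-better metric.

    Training stops once `patience` rounds pass without a new best score;
    stop_round is None if that never happens within `scores`.
    """
    best_round = 0
    for i, score in enumerate(scores):
        if score < scores[best_round]:
            best_round = i  # new best: reset the patience window
        elif i - best_round >= patience:
            return best_round, i
    return best_round, None


# First six RMSE values from the log above: still improving, so no stop yet.
print(early_stopping_round([177514, 161858, 147237, 134132, 122224, 111538], 40))
# A toy series that stalls after round 1, with a patience of 3 rounds.
print(early_stopping_round([5, 4, 4.5, 4.2, 4.1], 3))  # (1, 4)
```

In the log above the RMSE decreases every round, so within the rounds shown the 40-round patience window is never exhausted.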

## Set up Kubeflow Fairing for training and predictions

Import the `fairing` library and configure the environment that your training or prediction job will run in.

```python
import os
import sys
import fairing

# Set up Google Container Registry (GCR) for storing output containers.
# You can use any Docker container registry instead of GCR.
GCP_PROJECT = fairing.cloud.gcp.guess_project_name()
DOCKER_REGISTRY = 'gcr.io/{}/fairing-job'.format(GCP_PROJECT)

# Use a base image that matches the notebook's Python version (e.g. python:3.6.7).
PY_VERSION = ".".join([str(x) for x in sys.version_info[0:3]])
BASE_IMAGE = 'python:{}'.format(PY_VERSION)
```
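
The base image tag is derived from the notebook's own interpreter version, presumably so that code serialized in the notebook is deserialized by the same Python version inside the container (the training log shows `Downloading base image python:3.6.7`). The sketch below isolates that string logic into two hypothetical helpers; `my-project` is a placeholder project ID. Note that this approach only works if a `python:<major>.<minor>.<micro>` image tag exists for your exact patch release.

```python
import sys


def base_image_for(version_info):
    """Build a python:<major>.<minor>.<micro> tag, as PY_VERSION/BASE_IMAGE do above."""
    return "python:{}".format(".".join(str(x) for x in version_info[0:3]))


def registry_for(project):
    """Build the gcr.io/<project>/fairing-job repository used as DOCKER_REGISTRY."""
    return "gcr.io/{}/fairing-job".format(project)


print(base_image_for((3, 6, 7)))    # python:3.6.7
print(registry_for("my-project"))   # gcr.io/my-project/fairing-job
print(base_image_for(sys.version_info))  # depends on the running interpreter
```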

## Train an XGBoost model remotely on Kubeflow

Import the `TrainJob` and `KubeflowGKEBackend` classes. Kubeflow Fairing packages the `HousingServe` class, the training data, and the training job's software prerequisites as a Docker image. Then Kubeflow Fairing deploys and runs the training job on Kubeflow.
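
Conceptually, packaging starts with a Docker build context: a gzipped tarball of the input files. The deployment log in this notebook shows Fairing creating `/tmp/fairing.context.tar.gz` and adding each file under `/app/` (for example, `Adding requirements.txt at /app/requirements.txt`). The following is a minimal sketch of that step under those assumptions. `build_context` is a hypothetical helper rather than Fairing's code, and it omits the Dockerfile that a real build context also needs.

```python
import os
import posixpath
import tarfile
import tempfile


def build_context(files, context_path, root=".", app_dir="app"):
    """Pack files (paths relative to root) into a gzipped tarball under app_dir/.

    Hypothetical helper mirroring the /app/<file> layout from the deployment log.
    """
    with tarfile.open(context_path, "w:gz") as tar:
        for rel_path in files:
            # Keep each file's relative path, prefixed with app_dir.
            tar.add(os.path.join(root, rel_path),
                    arcname=posixpath.join(app_dir, rel_path))
    return context_path


# Usage: create two placeholder files and pack them.
workdir = tempfile.mkdtemp()
for name in ["requirements.txt", "ames.py"]:
    with open(os.path.join(workdir, name), "w") as f:
        f.write("# placeholder\n")

ctx = build_context(["requirements.txt", "ames.py"],
                    os.path.join(workdir, "context.tar.gz"), root=workdir)
with tarfile.open(ctx, "r:gz") as tar:
    print(sorted(tar.getnames()))  # ['app/ames.py', 'app/requirements.txt']
```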

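```python
# Set to False to print cached output from an earlier run instead of
# building an image and submitting a new training job.
rerun_training = True
```

```python
from fairing import TrainJob
from fairing.backends import KubeflowGKEBackend

# Package HousingServe with the training data, the ames helper module, and
# requirements.txt, then run the training job on Kubeflow.
train_job = TrainJob(HousingServe, BASE_IMAGE,
                     input_files=["ames_dataset/train.csv", "ames.py", "requirements.txt"],
                     docker_registry=DOCKER_REGISTRY, backend=KubeflowGKEBackend())

if rerun_training:
    train_job.submit()
else:
    # Print cached output. This is solely for demo purposes since training can take a while.
    with open("train_output.txt") as hf:
        print(hf.read())
```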

```
INFO:fairing.kubernetes.manager:Pod started running True

INFO[0000] Downloading base image python:3.6.7
ERROR: logging before flag.Parse: E0517 18:36:50.906574       1 metadata.go:142] while reading 'google-dockercfg' metadata: http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/attributes/google-dockercfg
ERROR: logging before flag.Parse: E0517 18:36:50.910509       1 metadata.go:159] while reading 'google-dockercfg-url' metadata: http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/attributes/google-dockercfg-url
2019/05/17 18:36:50 No matching credentials were found, falling back on anonymous
INFO[0000] Executing 0 build triggers
INFO[0000] Unpacking rootfs as cmd RUN if [ -e requirements.txt ];then pip install --no-cache -r requirements.txt; fi requires it.
INFO[0020] Taking snapshot of full filesystem...
INFO[0028] Skipping paths under /dev, as it is a whitelisted directory
INFO[0028] S

  Downloading https://files.pythonhosted.org/packages/0a/9d/8bd5d0e516b196f59f1c4439b424b8d4fa62d492a4b531aae322d2d82a7b/grpcio-1.20.1-cp36-cp36m-manylinux1_x86_64.whl (2.1MB)
Collecting flask (from seldon-core->-r requirements.txt (line 7))
  Downloading https://files.pythonhosted.org/packages/9a/74/670ae9737d14114753b8c8fdf2e8bd212a05d3b361ab15b44937dfd40985/Flask-1.0.3-py2.py3-none-any.whl (92kB)
Collecting six>=1.9.0 (from google-auth>=1.2.0->google-cloud-storage->-r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/73/fb/00a976f728d0d1fecfe898238ce23f502a721c0ac0ecfedb80e0d88c64e9/six-1.12.0-py2.py3-none-any.whl
Collecting pyasn1-modules>=0.2.1 (from google-auth>=1.2.0->google-cloud-storage->-r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/91/f0/b03e00ce9fddf4827c42df1c3ce10c74eadebfb706231e8d6d1c356a4062/pyasn1_modules-0.2.5-py2.py3-none-any.whl (74kB)
Collecting cachetools>=2.0.0 (from google-auth>=1.2.0->google-c

Installing collected packages: six, pyasn1, pyasn1-modules, cachetools, rsa, google-auth, chardet, urllib3, idna, certifi, requests, protobuf, googleapis-common-protos, pytz, google-api-core, google-cloud-core, google-resumable-media, google-cloud-storage, numpy, python-dateutil, pandas, joblib, scipy, xgboost, scikit-learn, sklearn, opentracing, click, MarkupSafe, Jinja2, Werkzeug, itsdangerous, flask, Flask-OpenTracing, flatbuffers, pyyaml, tornado, threadloop, thrift, jaeger-client, flask-cors, redis, astor, gast, absl-py, keras-preprocessing, grpcio, termcolor, h5py, keras-applications, markdown, tensorboard, mock, tensorflow-estimator, tensorflow, grpcio-opentracing, seldon-core
  Running setup.py install for googleapis-common-protos: started
    Running setup.py install for googleapis-common-protos: finished with status 'done'
  Running setup.py install for sklearn: started
    Running setup.py install for sklearn: finished with status 'done'
  Running setup.py install for opentr
```

Inspect the Kubernetes Job that Kubeflow Fairing created for the training run:

```python
!kubectl get jobs -o yaml fairing-job-j5bh6
```

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  creationTimestamp: "2019-05-17T01:25:23Z"
  generateName: fairing-job-
  labels:
    fairing-deployer: job
    fairing-id: a483e1be-7842-11e9-85d6-0a580a00012d
  name: fairing-job-j5bh6
  namespace: kubeflow
  resourceVersion: "13095243"
  selfLink: /apis/batch/v1/namespaces/kubeflow/jobs/fairing-job-j5bh6
  uid: a48de4d8-7842-11e9-8964-42010a8e00ff
spec:
  backoffLimit: 6
  completions: 1
  parallelism: 1
  selector:
    matchLabels:
      controller-uid: a48de4d8-7842-11e9-8964-42010a8e00ff
  template:
    metadata:
      creationTimestamp: null
      labels:
        controller-uid: a48de4d8-7842-11e9-8964-42010a8e00ff
        fairing-deployer: job
        fairing-id: a483e1be-7842-11e9-85d6-0a580a00012d
        job-name: fairing-job-j5bh6
      name: fairing-deployer
    spec:
      containers:
      - command:
        - python
        - /app/function_shim.py
        - --serialized_fn_file
        - /app/p
```
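
The YAML above is cut off before the Job's `status` section, which is where completion is reported. In the Kubernetes `batch/v1` Job API, a finished Job carries a `status.conditions` entry of type `Complete` or `Failed` with `status: "True"`. The following is a minimal sketch that classifies a Job object parsed from `kubectl get jobs -o json fairing-job-j5bh6`. The helper name and the sample objects are hypothetical; from the shell, `kubectl wait --for=condition=complete job/fairing-job-j5bh6` blocks on the same condition.

```python
import json


def job_state(job):
    """Classify a batch/v1 Job object parsed from `kubectl get job -o json`.

    Returns "Complete" or "Failed" when that condition is True, else "InProgress".
    """
    for cond in job.get("status", {}).get("conditions", []):
        if cond.get("status") == "True" and cond.get("type") in ("Complete", "Failed"):
            return cond["type"]
    return "InProgress"


# Hypothetical sample objects, not output captured from this notebook.
sample = json.loads(
    '{"status": {"succeeded": 1,'
    ' "conditions": [{"type": "Complete", "status": "True"}]}}'
)
print(job_state(sample))                      # Complete
print(job_state({"status": {"active": 1}}))  # InProgress
```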

## Deploy the trained model to Kubeflow for predictions

Import the `PredictionEndpoint` and `KubeflowGKEBackend` classes. Kubeflow Fairing packages the `HousingServe` class, the trained model, and the prediction endpoint's software prerequisites as a Docker image. Then Kubeflow Fairing deploys and runs the prediction endpoint on Kubeflow.

```python
# Set to True to build an image and deploy a new prediction endpoint. When False,
# reuse the URL of an endpoint deployed earlier and print cached output.
rerun_deploy = False

from fairing import PredictionEndpoint
from fairing.backends import KubeflowGKEBackend

endpoint = PredictionEndpoint(HousingServe, BASE_IMAGE,
                              input_files=["trained_ames_model.dat", "requirements.txt"],
                              docker_registry=DOCKER_REGISTRY, backend=KubeflowGKEBackend())

if rerun_deploy:
    endpoint.create()
else:
    # Point at the endpoint deployed in an earlier run.
    endpoint.url = "http://fairing-service-p7zjs.kubeflow.svc.cluster.local:5000/predict"
    # Print cached output. This is solely for demo purposes since deployment can take a while.
    with open("deploy_output.txt") as hf:
        print(hf.read())
```

```
INFO:root:Using ClusterBuilder

INFO:root:Using ClusterBuilder
INFO:root:Building the docker image.
INFO:root:Creating docker context: /tmp/fairing.context.tar.gz
INFO:root:Adding files to context: {'trained_ames_model.dat', 'requirements.txt'}
INFO:root:Context: /tmp/fairing.context.tar.gz, Adding /home/jovyan/git_jlewi-kubecon-demo/fairing/fairing/__init__.py at /app/fairing/__init__.py
INFO:root:Context: /tmp/fairing.context.tar.gz, Adding /home/jovyan/git_jlewi-kubecon-demo/fairing/fairing/runtime_config.py at /app/fairing/runtime_config.py
INFO:root:Context: /tmp/fairing.context.tar.gz, Adding trained_ames_model.dat at /app/trained_ames_model.dat
INFO:root:Context: /tmp/fairing.context.tar.gz, Adding requirements.txt at /app/requirements.txt
INFO:root:Context: /tmp/fairing.context.tar.gz, Adding /home/jovyan/git_jlewi-kubecon-demo/fairing/fairing/functions/function_shim.py at /app/function_shim.py
INFO:root:Context: /tmp/fairing.context.tar.gz, Adding /opt/conda/lib/python3.6/site-packages/cloudpickle/__init__.p
```
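
The cached `endpoint.url` follows the Kubernetes in-cluster DNS naming scheme for a Service, `<service>.<namespace>.svc.<cluster-domain>`, where `cluster.local` is the usual default domain. Such an address resolves only from inside the cluster (for example, from a notebook server running on Kubeflow), not from a local machine. The hypothetical helper below simply assembles that URL from its parts.

```python
def service_url(service, namespace, port, path, cluster_domain="cluster.local"):
    """Build an in-cluster URL: http://<svc>.<ns>.svc.<domain>:<port><path>."""
    return "http://{}.{}.svc.{}:{}{}".format(service, namespace, cluster_domain, port, path)


print(service_url("fairing-service-p7zjs", "kubeflow", 5000, "/predict"))
# http://fairing-service-p7zjs.kubeflow.svc.cluster.local:5000/predict
```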

Inspect the Kubernetes Deployment that Kubeflow Fairing created for the prediction endpoint:

```python
!kubectl get deploy -o yaml fairing-deployer-gqpq8
```

```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2019-05-17T01:32:33Z"
  generateName: fairing-deployer-
  generation: 1
  labels:
    fairing-deployer: serving
    fairing-id: a4bd1e10-7843-11e9-85d6-0a580a00012d
  name: fairing-deployer-gqpq8
  namespace: kubeflow
  resourceVersion: "13097036"
  selfLink: /apis/extensions/v1beta1/namespaces/kubeflow/deployments/fairing-deployer-gqpq8
  uid: a4be5a12-7843-11e9-8964-42010a8e00ff
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      fairing-deployer: serving
      fairing-id: a4bd1e10-7843-11e9-85d6-0a580a00012d
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        fairing-deployer: serving
        fairing-id: a4bd1e10-7843-11e9-85d6-0
```
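
The Deployment above uses the `RollingUpdate` strategy with `maxSurge: 25%` and `maxUnavailable: 25%` (the Kubernetes defaults) and `replicas: 1`. Kubernetes converts these percentages into pod counts by rounding `maxSurge` up and `maxUnavailable` down. The following is a small worked sketch of that arithmetic (the function name is hypothetical) using integer math to avoid floating-point rounding surprises.

```python
def rolling_update_bounds(replicas, max_surge_pct, max_unavailable_pct):
    """Resolve percentage maxSurge/maxUnavailable into absolute pod counts."""
    # maxSurge rounds up: ceil(replicas * pct / 100), via negated floor division.
    surge = -(-replicas * max_surge_pct // 100)
    # maxUnavailable rounds down: floor(replicas * pct / 100).
    unavailable = replicas * max_unavailable_pct // 100
    return surge, unavailable


print(rolling_update_bounds(1, 25, 25))   # (1, 0)
print(rolling_update_bounds(10, 25, 25))  # (3, 2)
```

For this single-replica endpoint, the result `(1, 0)` means a rollout starts one new pod before the old one is removed, so up to two pods run briefly and at least one stays available throughout.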

## Call the prediction endpoint

Create a test dataset, then call the endpoint on Kubeflow for predictions.

```python
# read_input() lives in the ames helper module, so call it through the module.
(train_X, train_y), (test_X, test_y) = ames.read_input("ames_dataset/train.csv")
```

```python
import pprint

# Send the test features to the deployed endpoint and print the raw response.
result = endpoint.predict_nparray(test_X)
pprint.pprint(result)
```

```
'{"data":{"names":["t:0","t:1"],"tensor":{"shape":[1,2],"values":[165164.875,165164.875]}},"meta":{}}\n'
```
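
The endpoint returns a JSON string (which is why `pprint` shows it quoted) in the Seldon Core tensor format: `values` holds the flattened numbers in row-major order and `shape` gives the dimensions. The two identical values come from `HousingServe.predict`, which returns its first prediction twice. The following is a minimal sketch that turns such a response into rows, assuming a 2-D shape. `parse_tensor_response` is a hypothetical helper name.

```python
import json


def parse_tensor_response(text):
    """Convert a {"data": {"tensor": {"shape": [...], "values": [...]}}} JSON
    string into a list of rows (row-major order; a 2-D shape is assumed)."""
    tensor = json.loads(text)["data"]["tensor"]
    n_rows, n_cols = tensor["shape"]
    values = tensor["values"]
    return [values[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


raw = '{"data":{"names":["t:0","t:1"],"tensor":{"shape":[1,2],"values":[165164.875,165164.875]}},"meta":{}}\n'
print(parse_tensor_response(raw))  # [[165164.875, 165164.875]]
```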


## Clean up the prediction endpoint

Delete the prediction endpoint created by this notebook.

```python
# Uncomment the line below to delete the prediction endpoint.
# endpoint.delete()
```