# Kubeflow Fairing Introduction

Kubeflow Fairing is a Python package that streamlines the process of `building`, `training`, and `deploying` machine learning (ML) models in a hybrid cloud environment. By using Kubeflow Fairing and adding a few lines of code, you can run your ML training job locally or in the cloud, directly from Python code or a Jupyter notebook. After your training job is complete, you can use Kubeflow Fairing to deploy your trained model as a prediction endpoint.


# How does Kubeflow Fairing work?

Kubeflow Fairing:
1. Packages your Jupyter notebook, Python function, or Python file as a Docker image.
2. Deploys and runs the training job on Kubeflow or AI Platform.
3. Deploys your trained model as a prediction endpoint on Kubeflow after the training job is complete.


# Goals of the Kubeflow Fairing project

- Easily package ML training jobs: Enable ML practitioners to easily package their ML model training code, and their code’s dependencies, as a Docker image.
- Easily train ML models in a hybrid cloud environment: Provide a high-level API for training ML models to make it easy to run training jobs in the cloud, without needing to understand the underlying infrastructure.
- Streamline the process of deploying a trained model: Make it easy for ML practitioners to deploy trained ML models to a hybrid cloud environment.


> Note: Before starting the Fairing workshop, please read `README.md` under `02_01_fairing_introduction`.


In [8]:
# Check that Kubeflow Fairing is installed
!pip show kubeflow-fairing

Name: kubeflow-fairing
Version: 1.0.2
Summary: Kubeflow Fairing Python SDK.
Home-page: https://github.com/kubeflow/fairing
Author: Kubeflow Authors
Author-email: hejinchi@cn.ibm.com
License: Apache License Version 2.0
Location: /usr/local/lib/python3.6/dist-packages
Requires: google-cloud-storage, docker, google-cloud-logging, oauth2client, numpy, nbconvert, retrying, kubernetes, kubeflow-pytorchjob, six, kubeflow-tfjob, grpcio, azure-mgmt-storage, tornado, future, azure-storage-file, boto3, python-dateutil, setuptools, requests, httplib2, kfserving, urllib3, ibm-cos-sdk, notebook, google-auth, google-api-python-client, cloudpickle
Required-by: 
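The same check can be done from Python, for example to fail fast at the top of a notebook. This is a minimal sketch that assumes only the standard library's `importlib.metadata` (Python 3.8+) and falls back to setuptools' `pkg_resources` on older interpreters such as the Python 3.6 shown above; `installed_version` is a hypothetical helper name, not part of Fairing.

```python
def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        from importlib import metadata  # standard library on Python 3.8+
    except ImportError:
        metadata = None

    if metadata is not None:
        try:
            return metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            return None

    # Older interpreters (e.g. Python 3.6): fall back to setuptools' pkg_resources.
    import pkg_resources
    try:
        return pkg_resources.get_distribution(dist_name).version
    except pkg_resources.DistributionNotFound:
        return None


version = installed_version("kubeflow-fairing")
print("kubeflow-fairing:", version or "not installed")
```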


## Basic Example

If you run into any issues, restart the notebook kernel; they are usually caused by newly installed packages.

Click `Kernel` -> `Restart & Clear Output`.

In [7]:
#%%writefile train_model.py
import os
import sys
import tensorflow as tf
import numpy as np

def nkTrain():
    # Generating random linear data
    # 50 evenly spaced data points from 0 to 50
    x = np.linspace(0, 50, 50) 
    y = np.linspace(0, 50, 50) 

    # Adding noise to the random linear data 
    x += np.random.uniform(-4, 4, 50) 
    y += np.random.uniform(-4, 4, 50) 

    n = len(x) # Number of data points 

    X = tf.placeholder("float") 
    Y = tf.placeholder("float")
    W = tf.Variable(np.random.randn(), name = "W") 
    b = tf.Variable(np.random.randn(), name = "b") 
    learning_rate = 0.01
    training_epochs = 1000
    
    # Hypothesis 
    y_pred = tf.add(tf.multiply(X, W), b) 

    # Mean Squared Error Cost Function 
    cost = tf.reduce_sum(tf.pow(y_pred-Y, 2)) / (2 * n)

    # Gradient Descent Optimizer 
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost) 

    # Global Variables Initializer 
    init = tf.global_variables_initializer() 


    sess = tf.Session()
    sess.run(init) 
      
    # Iterating through all the epochs 
    for epoch in range(training_epochs): 
          
        # Feeding each data point into the optimizer using Feed Dictionary 
        for (_x, _y) in zip(x, y): 
            sess.run(optimizer, feed_dict = {X : _x, Y : _y}) 
          
        # Displaying the result after every 50 epochs 
        if (epoch + 1) % 50 == 0: 
            # Calculating the cost every 50 epochs
            c = sess.run(cost, feed_dict = {X : x, Y : y}) 
            print("Epoch", (epoch + 1), ": cost =", c, "W =", sess.run(W), "b =", sess.run(b)) 
      
    # Storing necessary values to be used outside the Session 
    training_cost = sess.run(cost, feed_dict ={X: x, Y: y}) 
    weight = sess.run(W) 
    bias = sess.run(b)
    sess.close()  # release the session's resources once the values are fetched

    print('Weight: ', weight, 'Bias: ', bias)
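`nkTrain()` relies on TensorFlow 1.x APIs (`tf.placeholder`, `tf.Session`, `tf.train.GradientDescentOptimizer`); the remote job below uses the `tensorflow/tensorflow:1.15.0-py3` base image, and under TensorFlow 2.x these APIs are only available through `tf.compat.v1`. For a quick sanity check without TensorFlow, here is a NumPy sketch of the same per-sample gradient descent on the same cost, (W·x + b − y)² / (2n). `nk_train_numpy` is a hypothetical name; it mirrors the hyperparameters above but is not part of the original workshop code.

```python
import numpy as np

def nk_train_numpy(x, y, learning_rate=0.01, training_epochs=1000, seed=None, verbose=True):
    """Per-sample gradient descent for y ~ W*x + b, mirroring nkTrain() without TF1 APIs."""
    rng = np.random.RandomState(seed)
    n = len(x)
    W, b = rng.randn(), rng.randn()  # random initial weight and bias, like np.random.randn()
    for epoch in range(training_epochs):
        for _x, _y in zip(x, y):
            err = W * _x + b - _y  # residual for a single data point
            # Gradients of (W*x + b - y)^2 / (2n) with respect to W and b, applied together
            W -= learning_rate * err * _x / n
            b -= learning_rate * err / n
        if verbose and (epoch + 1) % 50 == 0:
            cost = np.sum((W * x + b - y) ** 2) / (2 * n)
            print("Epoch", epoch + 1, ": cost =", cost, "W =", W, "b =", b)
    return W, b


if __name__ == "__main__":
    # Same noisy linear data as in nkTrain()
    x = np.linspace(0, 50, 50) + np.random.uniform(-4, 4, 50)
    y = np.linspace(0, 50, 50) + np.random.uniform(-4, 4, 50)
    weight, bias = nk_train_numpy(x, y)
    print('Weight: ', weight, 'Bias: ', bias)
```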

## Local training for development



In [3]:
nkTrain()

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Epoch 50 : cost = 6.010859 W = 1.0002372 b = 0.260292
Epoch 100 : cost = 5.8873587 W = 0.99437755 b = 0.5443702
Epoch 150 : cost = 5.800606 W = 0.9891751 b = 0.7965846
Epoch 200 : cost = 5.741632 W = 0.9845562 b = 1.0205089
Epoch 250 : cost = 5.703499 W = 0.9804554 b = 1.2193136
Epoch 300 : cost = 5.680856 W = 0.9768146 b = 1.3958187
Epoch 350 : cost = 5.6695924 W = 0.9735821 b = 1.5525293
Epoch 400 : cost = 5.6665587 W = 0.97071224 b = 1.6916598
Epoch 450 : cost = 5.669358 W = 0.9681642 b = 1.8151871
Epoch 500 : cost = 5.676172 W = 0.9659021 b = 1.9248542
Epoch 550 : cost = 5.685632 W = 0.96389365 b = 2.0222213
Epoch 600 : cost = 5.6967263 W = 0.96211064 b = 2.1086648
Epoch 650 : cost = 5.708691 W = 0.9605275 b = 2.1854148
Epoch 700 : cost = 5.720987 W = 0.9591218 b = 2.2535608
Epoch 750 : cost = 5.7332225 W = 0.95787376 b = 2.314065
Epoch 800 : cost = 5.745124 W = 0.95676595 b = 2.3677742
Ep
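In the log above the cost bottoms out around epoch 400 and then rises slightly while `b` keeps increasing, so the per-sample updates have not settled at the best straight-line fit. One way to judge the reported `W` and `b` is to compare them with the closed-form ordinary least-squares solution on the same `x` and `y`. A minimal NumPy sketch; `least_squares_line` is a hypothetical helper, and it returns the cost using the same (1/2n) scaling as `nkTrain()`.

```python
import numpy as np

def least_squares_line(x, y):
    """Closed-form ordinary least-squares fit of y ~ W*x + b; returns (W, b, cost)."""
    A = np.column_stack([x, np.ones_like(x)])  # design matrix with columns [x, 1]
    (W, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    cost = np.sum((W * x + b - y) ** 2) / (2 * len(x))  # same scaling as the TF cost
    return W, b, cost
```

Inside `nkTrain()`, calling `least_squares_line(x, y)` after the training loop gives the lowest achievable cost for that data, which the printed epoch costs can be compared against.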

## Remote training

This section shows how to run the training job remotely in a Kubernetes cluster. You can use Amazon `ECR` as your container image registry.
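The log of the cell below shows Fairing building an image with its cluster builder and then launching a Kubernetes Job (`fairing-job-kqw78`) whose pod runs the training code. The sketch builds a minimal `batch/v1` Job manifest of that general shape as a plain Python dict. It is illustrative only: the `build_training_job` helper, the ECR image name (account ID and region are placeholders), and the command are hypothetical, and the manifest Fairing actually generates contains more fields. The `eksworkshop` namespace is taken from the service URL used later in this notebook.

```python
import json

def build_training_job(name, image, command, namespace="default", backoff_limit=0):
    """Return a minimal Kubernetes batch/v1 Job manifest that runs `command` in `image` once."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        # generateName lets Kubernetes append a random suffix, e.g. fairing-job-kqw78
        "metadata": {"generateName": name + "-", "namespace": namespace},
        "spec": {
            "backoffLimit": backoff_limit,  # do not retry a failed training run
            "template": {
                "spec": {
                    "restartPolicy": "Never",  # Job pods must use Never or OnFailure
                    "containers": [
                        {"name": name, "image": image, "command": list(command)}
                    ],
                }
            },
        },
    }


manifest = build_training_job(
    name="fairing-job",
    # Hypothetical ECR image produced by the build step (placeholder account and region).
    image="123456789012.dkr.ecr.us-west-2.amazonaws.com/fairing-job:latest",
    command=["python", "train_model.py"],
    namespace="eksworkshop",
)
print(json.dumps(manifest, indent=2))
```

Saved to a file, such a manifest would be submitted with `kubectl create -f`; `kubectl apply` does not accept `generateName`.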

In [11]:
!nkode create:image

hello auto from ./src/commands/hello.js
sh /usr/lib/node_modules/nkode/scripts/remoteTrain/remote_train.sh  auto data tensorflow/tensorflow:1.15.0-py3
auto data tensorflow/tensorflow:1.15.0-py3
[I 201022 20:33:44 utils:320] IMDS ENDPOINT: http://169.254.169.254/
[W 201022 20:33:44 function:49] The FunctionPreProcessor is optimized for using in a notebook or IPython environment. For it to work, the python version should be same for both local python and the python in the docker. Please look at alternatives like BasePreprocessor or FullNotebookPreprocessor.
[W 201022 20:33:44 tasks:62] Using builder: <class 'kubeflow.fairing.builders.cluster.cluster.ClusterBuilder'>
[I 201022 20:33:44 tasks:66] Building the docker image.
[I 201022 20:33:44 cluster:46] Building image using cluster builder.
[W 201022 20:33:44 base:94] /usr/local/lib/python3.6/dist-packages/kubeflow/fairing/__init__.py already exists in Fairing context, skipping...
[I 201022 20:33:44 base:107] Creating docker context: /tmp/

  Downloading https://files.pythonhosted.org/packages/dc/41/9fa443d5ae8907dd8f7d12146cb0092dc053afd67b5b57e7e8786a328547/jupyter_client-6.1.7-py3-none-any.whl (108kB)
Collecting ipykernel
  Downloading https://files.pythonhosted.org/packages/52/19/c2812690d8b340987eecd2cbc18549b1d130b94c5d97fcbe49f5f8710edf/ipykernel-5.3.4-py3-none-any.whl (120kB)
Collecting terminado>=0.8.3
  Downloading https://files.pythonhosted.org/packages/5f/17/fa9560738187cdb185d282a120ec4a147064f001d7c0114271ca1374d0a1/terminado-0.9.1-py3-none-any.whl
Collecting Send2Trash
  Downloading https://files.pythonhosted.org/packages/49/46/c3dc27481d1cc57b9385aff41c474ceb7714f7935b1247194adae45db714/Send2Trash-1.5.0-py3-none-any.whl
Collecting jinja2
  Downloading https://files.pythonhosted.org/packages/30/9e/f663a2aa66a09d838042ae1a2c5659828bb9b41ea3a6efa20a20fd92b121/Jinja2-2.11.2-py2.py3-none-any.whl (125kB)
Collecting pyzmq>=17
  Downloading https://files.pythonhosted.org/packages/56/ff/34bf45e5cf8367edcf

  Downloading https://files.pythonhosted.org/packages/69/79/e6afb3d8b0b4e96cefbdc690f741d7dd24547ff1f94240c997a26fa908d3/s3transfer-0.3.3-py2.py3-none-any.whl (69kB)
Collecting cachetools<5.0,>=2.0.0
  Downloading https://files.pythonhosted.org/packages/cd/5c/f3aa86b6d5482f3051b433c7616668a9b96fbe49a622210e2c9781938a5c/cachetools-4.1.1-py3-none-any.whl
Collecting google-crc32c<2.0dev,>=1.0; python_version >= "3.5"
  Downloading https://files.pythonhosted.org/packages/72/4a/399443af2bb5596c5ea1f575c39b8a0e3d6823e31fa6de2bee9bffc84553/google_crc32c-1.0.0-cp36-cp36m-manylinux2010_x86_64.whl
Collecting pytz
  Downloading https://files.pythonhosted.org/packages/4f/a4/879454d49688e2fad93e59d7d4efda580b783c745fd2ec2a3adf87b0808d/pytz-2020.1-py2.py3-none-any.whl (510kB)
Collecting protobuf>=3.12.0
  Downloading https://files.pythonhosted.org/packages/30/79/510974552cebff2ba04038544799450defe75e96ea5f1675dbf72cc8744f/protobuf-3.13.0-cp36-cp36m-manylinux1_x86_64.whl (1.3MB)
Collecting g

  Building wheel for retrying (setup.py): finished with status 'done'
[W 201022 20:35:57 aws:70] Not able to find aws credentials secret: aws-secret
[W 201022 20:35:57 job:101] The job fairing-job-kqw78 launched.
[W 201022 20:35:57 manager:298] Waiting for fairing-job-kqw78-kc72q to start...
[W 201022 20:35:57 manager:298] Waiting for fairing-job-kqw78-kc72q to start...
[W 201022 20:35:58 manager:298] Waiting for fairing-job-kqw78-kc72q to start...
[I 201022 20:36:41 manager:304] Pod started running True

  Created wheel for retrying: filename=retrying-1.3.3-cp36-none-any.whl size=9533 sha256=1cf97f0c1dd8f029209fc294c9ec408ae115d2ebb0deb2bd19124a6df55df4ae
  Stored in directory: /tmp/pip-ephem-wheel-cache-8b_htg7x/wheels/d7/a9/33/acc7b709e2a35caa7d4cae442f6fe6fbf2c43f80823d46460c
  Building wheel for pandocfilters (setup.py): started
  Building wheel for pandocfilters (setup.py): finished with status 'done'
  Created wheel for pandocfilters: filename=pandocfilters-1.4.2-cp36-none-any.wh

2020-10-22 20:36:41.940404: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
[W 201022 20:36:57 job:173] Cleaning up job fairing-job-kqw78...

2020-10-22 20:36:41.940427: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (fairing-job-kqw78-kc72q): /proc/driver/nvidia/version does not exist
2020-10-22 20:36:41.940689: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-10-22 20:36:41.964970: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2499995000 Hz
2020-10-22 20:36:41.965167: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x49a0ac0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-10-22 20:36:41.965183: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0)

## Remote Deployment

This deploys a remote prediction endpoint and tests it.
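The `curl` calls below send a bare GET with no input data, and the service name ends in `svc.cluster.local`, so it only resolves from inside the cluster (for example, from this notebook server). A prediction request normally POSTs the input features. This standard-library sketch builds such a request; the JSON payload shape (`{"data": {"ndarray": [[...]]}}`, a Seldon-style format) is an assumption, so check what the deployed server actually expects. `build_predict_request` and `predict` are hypothetical helpers.

```python
import json
import urllib.request

# Endpoint from the cell below; the service name is specific to this deployment.
PREDICT_URL = "http://fairing-service-p2qsx.eksworkshop.svc.cluster.local:5000/predict"

def build_predict_request(url, rows):
    """Build a POST request carrying feature rows as JSON (payload format is an assumption)."""
    body = json.dumps({"data": {"ndarray": rows}}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

def predict(url, rows, timeout=5.0):
    """Send the request and decode the JSON reply; raises OSError (incl. URLError) on network failures."""
    with urllib.request.urlopen(build_predict_request(url, rows), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Usage from inside the cluster:
# print(predict(PREDICT_URL, [[10.0], [20.0]]))
```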

In [5]:
!curl

curl: try 'curl --help' or 'curl --manual' for more information


In [6]:
!curl http://fairing-service-p2qsx.eksworkshop.svc.cluster.local:5000/predict
        

curl: (56) Recv failure: Connection reset by peer
