# Clipper Tutorial: Part 2

In this part of the tutorial, you will put on your data scientist hat and train and deploy some models to Clipper to improve your application's accuracy.


# Connect to Clipper (again)

Because this is a separate Python instance, you must create a new `ClipperConnection` object and connect to your running Clipper cluster.

In [None]:
import sys
import os

from clipper_admin import ClipperConnection, DockerContainerManager
clipper_conn = ClipperConnection(DockerContainerManager())
clipper_conn.connect()

# Connect to Clipper using a Configuration File

You can build models, register applications, and link models to apps by using a configuration file that specifies the models, applications, and links. An example of such a file is "sample_config.yaml", found in this directory.

In [None]:
clipper_conn_config = ClipperConnection(DockerContainerManager())
clipper_conn_config.start_clipper(config_file="sample_config.yaml")

# Load CIFAR

Because this is a new notebook, you must load the CIFAR dataset again. This time, you will be using it to train and evaluate machine learning models.

Set `cifar_loc` to the same location you used in the "Download the Images" section of part one of the tutorial. You will load into Python the number of training and test datapoints specified in the "Extract the images" section of part one.

In [None]:
cifar_loc = ""
import cifar_utils
train_x, train_y = cifar_utils.filter_data(
    *cifar_utils.load_cifar(cifar_loc, cifar_filename="cifar_train.data", norm=True))
test_x, test_y = cifar_utils.filter_data(
    *cifar_utils.load_cifar(cifar_loc, cifar_filename="cifar_test.data", norm=True))

# Train Logistic Regression Model

When tackling a new problem with machine learning, it's always good to start with simple models and only add complexity when needed. Start by training a logistic regression binary classifier using [Scikit-Learn](http://scikit-learn.org/). This model gets about 68% accuracy on the offline evaluation dataset if you use 10,000 training examples. It gets about 74% if you use all 50,000 examples.

In [None]:
from sklearn import linear_model as lm 
def train_sklearn_model(m, train_x, train_y):
    m.fit(train_x, train_y)
    return m
lr_model = train_sklearn_model(lm.LogisticRegression(), train_x, train_y)
print("Logistic Regression test score: %f" % lr_model.score(test_x, test_y))

# Deploy Logistic Regression Model

While 68-74% accuracy on a CIFAR binary classification task is significantly below state of the art, it's already much better than the 50% accuracy your application yields right now by guessing randomly.

To deploy your Scikit-Learn logistic regression model, you can use one of the provided Clipper model deployer modules. In particular, because the Scikit-Learn model can be pickled, you can wrap it in a function closure and
deploy that function directly as a model to Clipper without needing to manually save the model or write a Docker
container. To do this, you will import the `clipper_admin.deployers.python` module.

First, let's write and test the prediction function we will deploy. This function must take a list of inputs (a list of CIFAR images in this example) as the only function argument and return a list of predictions as strings.

> _The reason the prediction function takes a list of inputs rather than a single input is to allow deployed models the possibility of computing multiple predictions in parallel to improve model performance. For example, many models that run on a GPU can significantly improve throughput by batching predictions to better utilize the many parallel cores of the GPU._

In [None]:
def sklearn_predict(xs):
    preds = lr_model.predict(xs)
    return [str(p) for p in preds]

# Test the function on the first four images in the CIFAR dataset
sklearn_predict(test_x[:4])
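The contract described above (a list of inputs in, an equally long list of strings out) can be checked locally before deploying. The sketch below is only an illustration: `check_prediction_contract` and `fake_predict` are hypothetical helpers, not part of `clipper_admin`, and the stand-in predictor exists only so the block runs without a trained model.

```python
def check_prediction_contract(predict_fn, batch):
    """Call predict_fn on a batch and verify the list-in, list-of-strings-out shape."""
    outputs = predict_fn(batch)
    if not isinstance(outputs, list):
        raise TypeError("prediction function must return a list")
    if len(outputs) != len(batch):
        raise ValueError("expected one prediction per input")
    for output in outputs:
        if not isinstance(output, str):
            raise TypeError("each prediction must be a string")
    return outputs


def fake_predict(xs):
    # Hypothetical stand-in model: predicts 1 when the mean input value exceeds 0.5.
    return [str(int(sum(x) / len(x) > 0.5)) for x in xs]


checked = check_prediction_contract(fake_predict, [[0.1, 0.2], [0.9, 0.8]])
print(checked)  # ['0', '1']
```

In the notebook, the same check could be applied to the real function with `check_prediction_contract(sklearn_predict, test_x[:4])`.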

Now that you have defined a prediction function, you can deploy it directly to Clipper. When deploying a model, you must assign some additional metadata to it: a unique name (`"birds-vs-planes-classifier"`), a version (1), and the input type it accepts (this input type must match that of any application you link the model to). You can also optionally assign descriptive labels to the model and specify how many replicas of the model to launch. Each model replica runs in its own Docker container, so adding more replicas can increase the aggregate throughput of the model. You can always change the number of replicas of a model later by calling `clipper_conn.set_num_replicas()`.

After completing this step, Clipper will be managing a new container in Docker with your model in it:
<img src="img/deploy_sklearn_model.png" style="width: 500px;"/>

> *Once again, because you are deploying a Docker image, this command may take a while to download the image. Thanks for being patient!*

In [None]:
from clipper_admin.deployers import python as python_deployer

model_name = "birds-vs-planes-classifier"

python_deployer.deploy_python_closure(clipper_conn,
                                      name=model_name,
                                      version=1,
                                      input_type="doubles",
                                      func=sklearn_predict
                                     )

## Link your app to your model

To use your newly deployed model to generate predictions, it needs to be linked to your Clipper application. This tells Clipper to route requests for the `"cifar-demo"` app to the `"birds-vs-planes-classifier"` model to make predictions.

When you link a model to an application, the link automatically applies to all versions of that model. So if you update the model to a new version or roll back to an old one, all application links will be automatically preserved.

In [None]:
app_name = "cifar-demo"
clipper_conn.link_model_to_app(app_name, model_name)
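To see why a link survives version changes, it can help to picture the link as pointing at the model name rather than at one specific version. The toy router below illustrates that idea only. It is not Clipper's implementation, and `ToyRouter` and its methods are hypothetical names.

```python
class ToyRouter:
    """Toy illustration of name-based linking; NOT Clipper's implementation or API."""

    def __init__(self):
        self.links = {}     # app name -> model name
        self.versions = {}  # model name -> {version: prediction function}
        self.active = {}    # model name -> currently active version

    def deploy(self, model, version, func):
        self.versions.setdefault(model, {})[version] = func
        self.active[model] = version  # the newest deployment becomes active

    def link(self, app, model):
        self.links[app] = model

    def set_version(self, model, version):
        if version not in self.versions.get(model, {}):
            raise KeyError("unknown version %r of model %r" % (version, model))
        self.active[model] = version

    def predict(self, app, inputs):
        model = self.links[app]  # the link never mentions a version
        return self.versions[model][self.active[model]](inputs)


router = ToyRouter()
router.deploy("birds-vs-planes-classifier", 1, lambda xs: ["v1"] * len(xs))
router.link("cifar-demo", "birds-vs-planes-classifier")

# Updating the model does not require relinking the app.
router.deploy("birds-vs-planes-classifier", 2, lambda xs: ["v2"] * len(xs))
after_update = router.predict("cifar-demo", [0, 0])

# Rolling back also leaves the link untouched.
router.set_version("birds-vs-planes-classifier", 1)
after_rollback = router.predict("cifar-demo", [0, 0])

print(after_update, after_rollback)
```

The app is linked once, and only the active version changes underneath it.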

You can view which models your app is linked to by running the code below.

In [None]:
clipper_conn.get_linked_models(app_name)

Now that you've deployed your model and linked it to your app, go ahead and check back on your running frontend application from part 1. You should see the accuracy rise from around 50% to the accuracy of your Scikit-Learn model (68-74%), without having to stop or modify your application at all!
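You can also send a single query by hand to confirm that the linked model is answering. The sketch below only builds the HTTP request, using the standard library. It assumes Clipper's REST convention of `POST /<app_name>/predict` with a JSON body of the form `{"input": [...]}`, and it assumes the query frontend listens on `localhost:1337`; the real address can be read from `clipper_conn.get_query_addr()`. `build_predict_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_predict_request(query_addr, app_name, image):
    """Build (but do not send) a prediction request for one input vector."""
    url = "http://%s/%s/predict" % (query_addr, app_name)
    body = json.dumps({"input": [float(v) for v in image]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed address and a tiny made-up input, just to show the request shape.
request = build_predict_request("localhost:1337", "cifar-demo", [0.0, 0.5, 1.0])
print(request.full_url)
print(request.data)

# Sending requires a running Clipper instance:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

In the notebook, `image` would be one row of `test_x` and `query_addr` would come from `clipper_conn.get_query_addr()`.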

# Load TensorFlow Model

To improve the accuracy of your application further, you will now deploy a TensorFlow convolutional neural network. This model takes a few hours to train, so you will download the trained model parameters rather than training it from scratch. This model gets about 88% accuracy on the test dataset.

There is a pre-trained TensorFlow model stored in the repo using [`git-lfs`](https://git-lfs.github.com/). Once you install `git-lfs`, you can download the model with the command `git lfs pull`. If you don't want to deploy a TensorFlow model, you can skip this step.

In [None]:
import os
import tensorflow as tf
import numpy as np
tf_cifar_model_path = os.path.abspath("tf_cifar_model/cifar10_model_full")
tf_session = tf.Session('', tf.Graph())
with tf_session.graph.as_default():
    saver = tf.train.import_meta_graph("%s.meta" % tf_cifar_model_path)
    saver.restore(tf_session, tf_cifar_model_path)

def tensorflow_score(session, test_x, test_y):
    """
    NOTE: This predict method expects pre-whitened (normalized) images
    """
    logits = session.run('softmax_logits:0',
                         feed_dict={'x:0': test_x})
    relevant_activations = logits[:, [cifar_utils.negative_class, cifar_utils.positive_class]]
    preds = np.argmax(relevant_activations, axis=1)
    return float(np.sum(preds == test_y)) / float(len(test_y))
print("TensorFlow CNN test score: %f" % tensorflow_score(tf_session, test_x, test_y))
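The scoring function above keeps only two of the network's output columns and compares them, which turns a many-class network into a binary classifier. The plain-Python sketch below reproduces that logic on made-up numbers. The class indices `0` and `2` are assumptions chosen for illustration (the real values come from `cifar_utils`), and the toy rows have four columns rather than the full set of CIFAR classes.

```python
def binary_accuracy(logits, labels, negative_class, positive_class):
    """Fraction of rows where comparing the two relevant logits matches the label."""
    correct = 0
    for row, label in zip(logits, labels):
        # Mirrors argmax over [negative, positive]: ties resolve to 0 (negative).
        pred = 1 if row[positive_class] > row[negative_class] else 0
        correct += int(pred == label)
    return correct / len(labels)


# Made-up logits; column 1 is large but is ignored by the binary task.
toy_logits = [
    [2.0, 9.0, 1.0, 0.0],  # negative logit wins -> predicts 0
    [0.5, 9.0, 3.0, 0.0],  # positive logit wins -> predicts 1
    [1.0, 0.0, 1.0, 5.0],  # tie -> predicts 0
]
toy_labels = [0, 1, 1]

accuracy = binary_accuracy(toy_logits, toy_labels, negative_class=0, positive_class=2)
print(accuracy)
```

Two of the three toy rows are classified correctly, so the printed accuracy is two thirds.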

# Deploy TensorFlow Model

Unlike the Scikit-Learn model, TensorFlow models cannot be pickled. Instead, they must be saved using TensorFlow's native serialization API. Because of this, you must save the model yourself and specify a Docker container that knows how to load and run a TensorFlow model. We have provided a Docker image for you, `clipper/tf_cifar_container:latest`, that can run the CIFAR TensorFlow model. The Docker container will load and reconstruct the model from the serialized model checkpoint when the container is started.

After completing this step and deploying the new model, Clipper will send queries to the newly-deployed TensorFlow model instead of the logistic regression Scikit-Learn model, improving the application's accuracy.
<img src="img/tf_replaces_sklearn_model.png" style="width: 600px;"/>

> *Once again, please be patient while the Docker image is downloaded.*

In [None]:
clipper_conn.build_and_deploy_model(
    name=model_name,
    version=2,
    input_type="doubles",
    model_data_path=os.path.abspath("tf_cifar_model"),
    base_image="clipper/tf_cifar_container:latest",
    num_replicas=1
)

If you check the accuracy of your frontend application a final time, you should see accuracy around 88%.

# Inspect Clipper Metrics

Clipper also records various system performance metrics. You can inspect the current state of these metrics with the `inspect_instance()` command.

In [None]:
clipper_conn.inspect_instance()

__Congratulations! You've now successfully completed the tutorial. You started Clipper, created an application and queried it from a frontend client, and deployed two models trained in two different machine learning frameworks (Scikit-Learn and TensorFlow) to the running system.__

*Head back to the notebook from part 1. When you're done watching the accuracy of your application, stop the cell (hit the little "stop" square in the notebook toolbar).*

<img src="img/warning.jpg" style="width: 400px;"/>

> The cleanup step below will stop and remove all Clipper Docker containers. It will not affect any other running Docker containers.

# Cleanup


__When you're completely done with the tutorial and want to shut down your Clipper instance, you can run the `stop_all()` command to stop all the Clipper Docker containers.__



In [None]:
clipper_conn.stop_all()
clipper_conn_config.stop_all()