# Operationalizing models

We're going to learn how to train a predictive model in an interactive notebook and then expose it as a REST service.  The techniques we'll use in this notebook are intended to illustrate the power and flexibility afforded by publishing models as microservices and are not ones you'd use in production.  However, you'll finish this notebook well-equipped to use a production model server of your choice (or to start implementing your own)!

## Preliminaries

The service we're going to use is available [here](https://github.com/willb/simple-model-server/).  This service exposes a couple of routes:

- `/model`, which accepts an HTTP POST request with a form payload containing two serialized Python callables: `validator`, which checks that a supplied argument is suitable for the given model, and `predictor`, which does the actual prediction work; and
- `/predict`, which accepts an HTTP POST request with a form payload containing a serialized Python object called `args` (the input data to score) and returns the result of making a prediction.

Our service is not quite immutable, but it is *write-once*:  once a model has been installed, it cannot be replaced while the service is running.  Restarting the service will enable you to install a new model.
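
The write-once behavior described above can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the linked service's actual implementation: the class name `WriteOnceModel`, its method names, and the use of exceptions to signal errors are all hypothetical, as is the assumption that the service runs the validator before the predictor.

```python
class WriteOnceModel:
    """Holds at most one (validator, predictor) pair for the life of the process."""

    def __init__(self):
        self._validator = None
        self._predictor = None

    def install(self, validator, predictor):
        # Refuse a second install; only a restart (a fresh object) clears the slot.
        if self._predictor is not None:
            raise RuntimeError("a model is already installed; restart to replace it")
        self._validator = validator
        self._predictor = predictor

    def predict(self, args):
        if self._predictor is None:
            raise RuntimeError("no model has been installed yet")
        # Assumption: arguments are validated before they reach the predictor.
        if not self._validator(args):
            raise ValueError("arguments rejected by the validator")
        return self._predictor(args)


store = WriteOnceModel()
store.install(lambda xs: len(xs) == 2, lambda xs: sum(xs))
print(store.predict([1, 2]))  # prints 3
```

A second call to `store.install(...)` raises `RuntimeError`, which mirrors the need to restart the service before installing a different model.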

## A client library

To simplify sending serialized Python objects to the service, we'll use a very simple client library:

In [1]:
import base64
import requests
import cloudpickle

def publish(baseurl, validator, predictor):
    """ publish a model consisting of two callables (validator, 
        which validates input data, and predictor, which makes a 
        prediction) to the model service located at baseurl
        (baseurl must end with a trailing slash) """
    val = base64.b64encode(cloudpickle.dumps(validator))
    pred = base64.b64encode(cloudpickle.dumps(predictor))
    payload = {'validator' : val, 'predictor': pred}
    url = "%smodel" % baseurl
    r = requests.post(url, data=payload)
    return r.text

def predict(baseurl, args):
    """ make a prediction from the model service located at baseurl """
    payload = {'args': base64.b64encode(cloudpickle.dumps(args))}
    url = "%spredict" % baseurl
    r = requests.post(url, data=payload)
    return r.text
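
On the receiving end, the service presumably reverses this encoding: it base64-decodes the form field and then unpickles the bytes. The round trip below is a sketch using only the standard library. `encode_field` and `decode_field` are hypothetical names, and plain `pickle` stands in for `cloudpickle` here. That substitution is fine for plain data such as a list of floats, but `pickle` serializes functions by reference rather than by value, which is why the client above uses `cloudpickle` for the callables.

```python
import base64
import pickle


def encode_field(obj):
    """Serialize obj and base64-encode it so it can travel in a form field."""
    return base64.b64encode(pickle.dumps(obj))


def decode_field(field):
    """Invert encode_field: base64-decode, then unpickle."""
    return pickle.loads(base64.b64decode(field))


point = [0.25, -0.5]
field = encode_field(point)          # bytes, safe to place in a form payload
print(decode_field(field) == point)  # prints True
```

Unpickling can execute arbitrary code, so a service that decodes payloads this way should only accept them from trusted clients.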

## Setting up Spark

We'll now set up a Spark context, as we did in the previous notebook, generate some random data, and train a k-means clustering model on it.

In [2]:
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()
sc = spark.sparkContext

In [3]:
from pyspark.sql.functions import array, column, rand, udf
from pyspark.ml.linalg import Vectors, VectorUDT
as_vector = udf(lambda l: Vectors.dense(l), VectorUDT())

# 2048 points drawn uniformly from the square [-1, 1) x [-1, 1)
randomDF = (
    spark.range(0, 2048)
    .select((rand() * 2 - 1).alias("x"), (rand() * 2 - 1).alias("y"))
    .select(column("x"), column("y"), as_vector(array(column("x"), column("y"))).alias("features"))
)

In [4]:
from pyspark.ml.clustering import KMeans

K = 7
SEED = 0xdea110c8

kmeans = KMeans().setK(K).setSeed(SEED).setFeaturesCol("features")
model = kmeans.fit(randomDF)

## Serializing our model

The Spark models are convenient but many of their implementations depend on having access to a Spark context.  We could, of course, have our model service depend upon Spark, but there's another option:  we can create our own lightweight implementation of the model itself.  (This is usually pretty straightforward, since most machine learning models are far more complicated to train than they are to predict from.)

We'll start by looking at the actual data contained within the model:  the cluster centers.  The Spark k-means implementation exposes these through the `clusterCenters` method:

In [5]:
model.clusterCenters()

[array([-0.70752534, -0.46349022]),
 array([ 0.69432508,  0.43201093]),
 array([ 0.65751856, -0.47415352]),
 array([-0.67205974,  0.59460778]),
 array([-0.0472727 , -0.69547542]),
 array([ 0.08862141,  0.72747683]),
 array([-0.1152846 ,  0.07499047])]

To make a prediction with the k-means model, we need only identify which of the cluster centers is closest to a given point.  We'll calculate the Euclidean distance from the point to each center (using the `norm` of the vector difference) and pick the center with the smallest distance.
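
Written out, with $\mu_0, \dots, \mu_{K-1}$ denoting the cluster centers and $x$ the point to score (notation introduced here for illustration), the predicted cluster is the index of the nearest center:

$$\hat{c}(x) = \operatorname*{arg\,min}_{i \in \{0, \dots, K-1\}} \lVert x - \mu_i \rVert_2, \qquad \lVert x - \mu_i \rVert_2 = \sqrt{\sum_{j} \left(x_j - \mu_{i,j}\right)^2}$$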

In [6]:
from numpy.linalg import norm

centers = model.clusterCenters()

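def km_predict(vec):
    """ return the index of the cluster center closest to vec """
    # pair each distance with its center index; min() picks the smallest distance
    _, idx = min([(norm(vec - center), idx) for idx, center in enumerate(centers)])
    return idx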

As a sanity check, each cluster center should be assigned to its own cluster:

In [7]:
from numpy import array
for center in centers:
    print(km_predict(center))

0
1
2
3
4
5
6


Now we'll actually publish the model.  Before running the cell below, replace `YOUR_HOSTNAME_HERE` with the route assigned to the simple model service in your project.  For example, if your hostname was `simple-model-server-myproject.127.0.0.1.nip.io`, you'd change the line to `hostname = "simple-model-server-myproject.127.0.0.1.nip.io"`.

In [None]:
hostname = "YOUR_HOSTNAME_HERE"
service_url = "http://%s:8080/" % hostname

def km_validate(args):
    """ returns True if the argument has the same number of dimensions as the cluster centers """
    return len(args) == len(centers[0])

publish(service_url, km_validate, km_predict)
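
The validator above checks only the length of its argument. A slightly stricter check is sketched below; it is one possible design, not part of the original service or client. The name `km_validate_strict` and the constant `DIMENSIONS` are hypothetical, and `DIMENSIONS = 2` assumes the two-dimensional model trained earlier in this notebook.

```python
import math
from numbers import Real

DIMENSIONS = 2  # assumption: equals len(centers[0]) for the model trained above


def km_validate_strict(args):
    """Return True only if args is a sequence of DIMENSIONS finite real numbers."""
    try:
        values = list(args)
    except TypeError:
        # args is not iterable (for example None or a bare number)
        return False
    if len(values) != DIMENSIONS:
        return False
    return all(
        isinstance(v, Real) and not isinstance(v, bool) and math.isfinite(v)
        for v in values
    )


print(km_validate_strict([0.5, -0.25]))          # prints True
print(km_validate_strict([0.5, float("nan")]))   # prints False
```

Rejecting malformed input in the validator keeps the predictor free of defensive checks and lets the service report a bad request before any prediction work is attempted.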

## Exercises

1.  Design a REST API for serving multiple versions of multiple models.
1.  &starf; What else would you need to change about the simple model service we used in this project in order to use it in production?  (Hint:  to start, consider security, resilience, high availability, and load balancing.)
1.  REST calls are convenient but can involve expensive serialization and communication.  (This service in particular gains generality at the cost of communication overhead!)  Consider a few techniques for adapting this service to support scoring data with lower latency or higher throughput.
1.  The landscape of open-source model servers is increasingly competitive.  Identify a few that you'd like to try out and think about how you'd adapt these techniques to use with a production server.
1.  &starf; Consider how you'd design a model service for extremely low-latency environments like transaction processing (in which the round-trip cost of an RPC call might be totally unacceptable).