# Ray Serve - Model Serving Challenges

© 2019-2020, Anyscale. All Rights Reserved

![Anyscale Academy](../images/AnyscaleAcademy_Logo_clearbanner_141x100.png)

Now we'll explore a nontrivial example for Ray Serve.

We'll work through an example that covers training a model, deploying it, and then updating it, based on this [documentation example](https://docs.ray.io/en/latest/serve/deployment.html). That page also has a section on [deployment to Kubernetes](https://docs.ray.io/en/latest/serve/deployment.html#deploying-as-a-kubernetes-service).

See also the Serve documentation's [mini-tutorials](https://docs.ray.io/en/latest/serve/tutorials/index.html) for using Serve with various frameworks.

In [None]:
!../tools/start-ray.sh --check --verbose

In [None]:
import ray
from ray import serve
import os
import requests  # for making web requests

In [None]:
ray.init(address='auto', ignore_reinit_error=True)

At this time, we have to either restart the Ray cluster or run Serve on a different port with the `http_port` argument, which is what we do here. The next release of Ray will have a `serve.shutdown()` method that will let us cleanly shut down Serve at the end of each lesson (like the last one).

There is also an `http_host` argument, which defaults to `localhost`. When you want to serve requests from other machines, use `0.0.0.0` for this argument, so those machines can access this service.
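If you would rather not guess which port an earlier lesson left occupied, one option is to ask the operating system for an unused port and pass that as `http_port`. The sketch below uses only the standard library; `find_free_port` is a hypothetical helper, not part of Ray. Binding to port 0 lets the OS pick a free port, but the port is released as soon as the socket closes, so another process could claim it before Serve starts.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Hypothetical helper: ask the OS for a currently unused TCP port.

    Binding to port 0 makes the OS choose a free port. The socket is closed
    on return, so treat the result as a good guess rather than a reservation.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 means "let the OS pick"
        return sock.getsockname()[1]


candidate_port = find_free_port()
print(candidate_port)  # e.g. pass this as serve.init(..., http_port=candidate_port)
```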

In [None]:
PORT=8001

In [None]:
serve.init(name='serve-example-2', http_port=PORT)  # Name for this Serve instance

## First, Get a Model to Serve ;)

We'll begin by training a classifier on the Iris data we used before, this time with [scikit-learn](https://scikit-learn.org/stable/). The details aren't too important for our purposes, except that we save the trained model to disk so Serve can load it later.
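One detail of the train/validation split matters: the Iris rows are ordered by class, so the data has to be shuffled before taking the first 100 rows for training, and the features and labels must be shuffled with the *same* permutation. Shuffling them separately pairs each row with an arbitrary label. A minimal standard-library sketch of a paired shuffle (the helper `paired_shuffle` is hypothetical; with NumPy arrays the equivalent is indexing both arrays with a single `np.random.permutation(len(data))`):

```python
import random


def paired_shuffle(features, labels, seed=None):
    """Shuffle features and labels with one shared permutation.

    Returns new lists; each feature row keeps the label it started with.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)  # one permutation used for both sequences
    return [features[i] for i in order], [labels[i] for i in order]


# Toy rows sorted by class, like the Iris dataset.
rows = [[5.1, 3.5], [4.9, 3.0], [6.4, 3.2], [6.9, 3.1], [6.3, 3.3], [5.8, 2.7]]
classes = [0, 0, 1, 1, 2, 2]
shuffled_rows, shuffled_classes = paired_shuffle(rows, classes, seed=0)

# Every (row, label) pair survives the shuffle intact.
original_pairs = {tuple(r): c for r, c in zip(rows, classes)}
print(all(original_pairs[tuple(r)] == c for r, c in zip(shuffled_rows, shuffled_classes)))
```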

In [None]:
import pickle
import json
import numpy as np

from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import mean_squared_error

In [None]:
# Load data
iris_dataset = load_iris()
data, target, target_names = iris_dataset["data"], iris_dataset[
    "target"], iris_dataset["target_names"]

In [None]:
# Instantiate model
model = GradientBoostingClassifier()

In [None]:
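# Training and validation split.
# Shuffle features and labels with the same permutation so each row keeps its label.
perm = np.random.permutation(len(data))
data, target = data[perm], target[perm]
train_x, train_y = data[:100], target[:100]
val_x, val_y = data[100:], target[100:]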

In [None]:
# Train and evaluate models
model.fit(train_x, train_y)
print("MSE:", mean_squared_error(model.predict(val_x), val_y))

In [None]:
# Save the model and labels to files. (This could also be S3 or another "global" location.)
os.makedirs('/tmp/data', exist_ok=True)
with open("/tmp/data/iris_model_logistic_regression.pkl", "wb") as f:
    pickle.dump(model, f)
with open("/tmp/data/iris_labels.json", "w") as f:
    json.dump(target_names.tolist(), f)

## Create a Model and Serve It

Next, we define a servable model: a class whose `__call__` method Ray Serve invokes with each incoming request. Then we define the backend and endpoint that use it.
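Because Serve simply calls the instance with a request object, the request-handling logic can be checked locally before deploying, by passing any object that has a `.json` attribute. In this sketch, `StubModel` (a toy rule, not a trained model) and the `SimpleNamespace` request are hypothetical stand-ins for the pickled classifier and the Flask request Serve passes in; `IrisHandler` mirrors the `__call__` logic of `BoostingModelv1` below, with the model and labels injected instead of loaded from disk.

```python
from types import SimpleNamespace

FEATURES = ["sepal length", "sepal width", "petal length", "petal width"]


class StubModel:
    """Hypothetical stand-in for the pickled classifier (scikit-learn-style predict)."""

    def predict(self, rows):
        # Toy rule for illustration only: classify by petal length.
        return [0 if row[2] < 2.5 else 1 if row[2] < 5.0 else 2 for row in rows]


class IrisHandler:
    """Same request handling as BoostingModelv1, with the model passed in."""

    def __init__(self, model, label_list, version):
        self.model = model
        self.label_list = label_list
        self.version = version

    def __call__(self, request):
        payload = request.json
        input_vector = [payload[name] for name in FEATURES]
        prediction = self.model.predict([input_vector])[0]
        return {"result": self.label_list[prediction], "version": self.version}


handler = IrisHandler(StubModel(), ["setosa", "versicolor", "virginica"], "v1")
# Only the .json attribute is used, so a SimpleNamespace can play the request.
fake_request = SimpleNamespace(json={"sepal length": 1.2, "sepal width": 1.0,
                                     "petal length": 1.1, "petal width": 0.9})
result = handler(fake_request)
print(result)  # {'result': 'setosa', 'version': 'v1'}
```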

In [None]:
class BoostingModelv1:
    def __init__(self):
        with open("/tmp/data/iris_model_logistic_regression.pkl", "rb") as f:
            self.model = pickle.load(f)
        with open("/tmp/data/iris_labels.json") as f:
            self.label_list = json.load(f)

    def __call__(self, flask_request):
        payload = flask_request.json
        print("Worker: received flask request with data", payload)

        input_vector = [
            payload["sepal length"],
            payload["sepal width"],
            payload["petal length"],
            payload["petal width"],
        ]
        prediction = self.model.predict([input_vector])[0]
        human_name = self.label_list[prediction]
        return {"result": human_name, "version": "v1"}

In [None]:
serve.create_backend("lr:v1", BoostingModelv1)
serve.create_endpoint("iris_classifier", backend="lr:v1", route="/regressor")

Internally, Serve runs the backend as a Ray actor and routes traffic to it whenever the endpoint is queried, in this case over HTTP.

Now let’s query the endpoint to see results.
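`requests.get(..., json=...)` serializes the dictionary to JSON, sends it as the request body, and sets the `Content-Type: application/json` header, which is what allows `flask_request.json` to parse it inside the backend. As a sketch of what goes over the wire, the standard library's `urllib.request.Request` can build an equivalent request without sending it (port 8001 matches `PORT` above; the commented-out send assumes Serve is running):

```python
import json
import urllib.request

port = 8001  # same value as PORT above
payload = {"sepal length": 1.2, "sepal width": 1.0, "petal length": 1.1, "petal width": 0.9}

# Build (but don't send) a GET request carrying a JSON body, like requests.get(json=...).
req = urllib.request.Request(
    f"http://localhost:{port}/regressor",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="GET",
)
print(req.get_method(), req.full_url)
print(req.get_header("Content-type"), req.data)

# To actually send it once Serve is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```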

In [None]:
sample_request_input = {
    "sepal length": 1.2,
    "sepal width": 1.0,
    "petal length": 1.1,
    "petal width": 0.9,
}
response = requests.get(f"http://localhost:{PORT}/regressor", json=sample_request_input)
response.text

## Deploying Updated Models

Deploying an updated model is as simple as deploying the first version. First, we train a new model on the same data.
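The code that saves each model version differs only in a file-name suffix. A sketch of a reusable pair of helpers (`save_artifacts` and `load_artifacts` are hypothetical names, not from the original example) that writes and reads the pickled model and JSON labels for a given suffix; the demo pickles a plain dict in place of a trained model and uses a throwaway directory:

```python
import json
import os
import pickle
import tempfile


def save_artifacts(model, labels, directory, suffix=""):
    """Pickle `model`, write `labels` as JSON, and return both file paths.

    `suffix` distinguishes versions (e.g. "_2"), mirroring this notebook's file names.
    """
    os.makedirs(directory, exist_ok=True)
    model_path = os.path.join(directory, f"iris_model_logistic_regression{suffix}.pkl")
    labels_path = os.path.join(directory, f"iris_labels{suffix}.json")
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    with open(labels_path, "w") as f:
        json.dump(list(labels), f)
    return model_path, labels_path


def load_artifacts(model_path, labels_path):
    """Inverse of save_artifacts: return (model, labels)."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    with open(labels_path) as f:
        labels = json.load(f)
    return model, labels


# Demo: a plain dict stands in for a trained model.
with tempfile.TemporaryDirectory() as tmp:
    paths = save_artifacts({"weights": [0.1, 0.2]}, ["setosa", "versicolor", "virginica"], tmp, "_2")
    restored_model, restored_labels = load_artifacts(*paths)
print(os.path.basename(paths[0]), restored_model, restored_labels)
```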

In [None]:
# Instantiate a new model
model = GradientBoostingClassifier()

# Training and validation split, shuffling features and labels with one permutation
perm = np.random.permutation(len(data))
data, target = data[perm], target[perm]
train_x, train_y = data[:100], target[:100]
val_x, val_y = data[100:], target[100:]

# Train and evaluate models
model.fit(train_x, train_y)
print("MSE:", mean_squared_error(model.predict(val_x), val_y))

# Save the model and labels to files
with open("/tmp/data/iris_model_logistic_regression_2.pkl", "wb") as f:
    pickle.dump(model, f)
with open("/tmp/data/iris_labels_2.json", "w") as f:
    json.dump(target_names.tolist(), f)

Now we define a new model class that loads the newly saved model.

In [None]:
class BoostingModelv2:
    def __init__(self):
        with open("/tmp/data/iris_model_logistic_regression_2.pkl", "rb") as f:
            self.model = pickle.load(f)
        with open("/tmp/data/iris_labels_2.json") as f:
            self.label_list = json.load(f)

    def __call__(self, flask_request):
        payload = flask_request.json
        print("Worker: received flask request with data", payload)

        input_vector = [
            payload["sepal length"],
            payload["sepal width"],
            payload["petal length"],
            payload["petal width"],
        ]
        prediction = self.model.predict([input_vector])[0]
        human_name = self.label_list[prediction]
        return {"result": human_name, "version": "v2"}

Finally, we create a new backend with this model and split the `iris_classifier` endpoint's traffic between the two backends: 25% to `lr:v2` and 75% to `lr:v1`.
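A weighted split is applied per request, so 10 requests will not necessarily show exactly two or three `v2` responses; the share only approaches 25% over many requests. The sketch below simulates that idea with `random.choices`. It illustrates what a 25/75 traffic policy means and is an assumption-level model, not Serve's actual router implementation.

```python
import random
from collections import Counter


def route(traffic, rng):
    """Pick one backend tag with probability proportional to its weight."""
    backends = list(traffic)
    weights = [traffic[b] for b in backends]
    return rng.choices(backends, weights=weights, k=1)[0]


traffic = {"lr:v2": 0.25, "lr:v1": 0.75}
rng = random.Random(42)  # seeded so the demo is repeatable

few = Counter(route(traffic, rng) for _ in range(10))
many = Counter(route(traffic, rng) for _ in range(10_000))
v2_share = many["lr:v2"] / 10_000
print("10 requests:", dict(few))
print("share of lr:v2 over 10,000 requests:", v2_share)
```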

In [None]:
serve.create_backend("lr:v2", BoostingModelv2)
serve.set_traffic("iris_classifier", {"lr:v2": 0.25, "lr:v1": 0.75})

In [None]:
for i in range(10):
    response = requests.get(f"http://localhost:{PORT}/regressor", json=sample_request_input).json()
    print(response)

## Exercise - Try More Models and Traffic Patterns

Here are some things you can try:

1. Refactor the `BoostingModelvN` classes to eliminate duplication. We don't need separate implementations, because constructor arguments can specify what's unique to each one. Recall that you can pass constructor arguments to `serve.create_backend()`.
2. Add one or more new models.
3. Change the traffic patterns.
4. "Automate" the steps we did:
    * Wrap the training in a function.
    * Run the whole sequence of retraining every minute for a few minutes.

## Cleanup

In [None]:
eps = serve.list_endpoints()
for name in eps.keys():
    serve.delete_endpoint(name)

bes = serve.list_backends()
for name in bes.keys():
    serve.delete_backend(name)

eps = serve.list_endpoints()
bes = serve.list_backends()
print(f'endpoints: {eps}, backends: {bes}')

# serve.shutdown()  # uncomment once your Ray release provides this method