# Deploying an ML model using TensorFlow Serving

## 1) Prepare your saved model 

A SavedModel represents a version of your model. It is stored as a directory containing a saved_model.pb file, which defines the computation graph (represented as a serialized protocol buffer), and a variables subdirectory containing the variable values. Here is the code to save your model.

In [None]:
import os
import tensorflow as tf

# Assumes that `model` has already been trained
model_version = "0001"
model_name = "my_mnist_model"
model_path = os.path.join(model_name, model_version)
tf.saved_model.save(model, model_path)

The directory structure of your saved model is as follows: 
```
my_mnist_model
└── 0001
    ├── assets
    ├── saved_model.pb
    └── variables
        ├── variables.data-00000-of-00001
        └── variables.index
```
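
TF Serving derives model versions from numeric subdirectory names like the `0001` shown above, so it can be handy to check which versions exist on disk before starting the server. The following is a minimal sketch using only the standard library; `list_model_versions` and `latest_version_dir` are hypothetical helper names, not part of TensorFlow.

```python
import os


def list_model_versions(model_dir):
    """Return {version_number: path} for numeric subdirectories holding a saved_model.pb."""
    versions = {}
    for entry in os.listdir(model_dir):
        path = os.path.join(model_dir, entry)
        # Only ASCII-numeric directory names such as "0001" are treated as versions here.
        is_numeric = entry.isascii() and entry.isdigit()
        if is_numeric and os.path.isfile(os.path.join(path, "saved_model.pb")):
            versions[int(entry)] = path
    return versions


def latest_version_dir(model_dir):
    """Return the path of the highest-numbered version, or None if there is none."""
    versions = list_model_versions(model_dir)
    return versions[max(versions)] if versions else None
```

For the directory tree above, `latest_version_dir("my_mnist_model")` would point at the `0001` subdirectory.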


There are two ways to load the model in your Python code.

First, you can load a SavedModel using the `tf.saved_model.load()` function. However, the returned object is not a Keras model: it represents the SavedModel, including its computation graph and variable values. You can use it like a function, and it will make predictions (make sure to pass the inputs as tensors of the appropriate type):

In [None]:
saved_model = tf.saved_model.load(model_path)
y_pred = saved_model(tf.constant(X_new, dtype=tf.float32))

Alternatively, you can load this SavedModel directly as a Keras model using the `keras.models.load_model()` function. However, we only use `load_model()` here to check that the saved model is the correct one. Our goal is to deploy the model and send prediction requests to it.

In [None]:
from tensorflow import keras

# Load the SavedModel back as a Keras model
model = keras.models.load_model(model_path)
y_pred = model.predict(tf.constant(X_new, dtype=tf.float32))
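
One way to make the "is it the correct model" check concrete is to compare the reloaded model's predictions with those of the model before it was saved. The sketch below does this for plain nested lists; `predictions_match` is a hypothetical helper and the tolerance is an arbitrary choice.

```python
import math


def predictions_match(expected, actual, abs_tol=1e-6):
    """Return True if two nested lists of probabilities agree element-wise."""
    if len(expected) != len(actual):
        return False
    for row_e, row_a in zip(expected, actual):
        if len(row_e) != len(row_a):
            return False
        # Compare each pair of values within the given absolute tolerance.
        if not all(math.isclose(e, a, abs_tol=abs_tol) for e, a in zip(row_e, row_a)):
            return False
    return True
```

If the predictions are NumPy arrays, `np.allclose(y_before, y_pred)` performs a similar check, where `y_before` stands for predictions you kept from the original model.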


## 2) Install TensorFlow Serving

There are many ways to install TF Serving, but the TensorFlow team highly recommends Docker because it is simple to install and performs well. This guide assumes that you have already installed Docker. Run the following commands in a **terminal**.

First pull the TF Serving image: 
```
docker pull tensorflow/serving
```

Create a Docker container to run this image:
```
docker run -it --rm -p 8500:8500 -p 8501:8501 \
           -v "$ML_PATH/my_mnist_model:/models/my_mnist_model" \
           -e MODEL_NAME=my_mnist_model \
           tensorflow/serving
```

TF Serving is running. It loaded our MNIST model (version 1), and it is serving it through both gRPC (on port 8500) and REST (on port 8501). Here is the meaning of all Docker flags above:

`-it`: makes the container interactive, which means you can stop it by pressing Ctrl-C.

`--rm`: removes the container after you stop it.

`-p 8500:8500`: forwards the host's TCP port 8500 to the container's TCP port 8500. By default, TF Serving serves the gRPC API on this port.

`-p 8501:8501`: forwards the host's TCP port 8501 to the container's TCP port 8501. By default, TF Serving serves the REST API on this port.

`-v host_path:container_path`: mounts the host directory into the container at the given path. Here, `$ML_PATH` is assumed to be an environment variable pointing to the directory that contains `my_mnist_model`. On Windows, you may need to replace / with \ in the host path.

`-e MODEL_NAME=my_mnist_model`: sets the container's `MODEL_NAME` environment variable so that TF Serving knows which model to serve. By default, it serves the latest version of the model that it finds.

`tensorflow/serving`: the name of the image to run.
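
To see how the options fit together, here is a small sketch that assembles the same command from Python and prints it, without running Docker. `build_tf_serving_command` is a hypothetical helper, and `/path/to/my_mnist_model` is a placeholder for wherever your model directory lives.

```python
import shlex


def build_tf_serving_command(host_model_dir, model_name, grpc_port=8500, rest_port=8501):
    """Build the `docker run` argument list for serving one model."""
    return [
        "docker", "run", "-it", "--rm",
        "-p", f"{grpc_port}:8500",  # host port -> container gRPC port
        "-p", f"{rest_port}:8501",  # host port -> container REST port
        "-v", f"{host_model_dir}:/models/{model_name}",  # mount the model directory
        "-e", f"MODEL_NAME={model_name}",  # tell TF Serving which model to serve
        "tensorflow/serving",
    ]


command = build_tf_serving_command("/path/to/my_mnist_model", "my_mnist_model")
print(shlex.join(command))  # a single line you could paste into a terminal
```

Changing `rest_port` only changes the host side of the mapping, which can help if port 8501 is already taken on your machine.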

Your TensorFlow model is now being served from a running container. To make predictions, we will send requests to the REST API on port 8501.
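
Before sending predictions, you may want to confirm that the model has finished loading. TF Serving's REST API exposes a model status endpoint at `/v1/models/<model_name>`. The sketch below queries it with the standard library; the helper names are hypothetical, and the host and port assume the container started above.

```python
import json
import urllib.request


def model_status_url(model_name, host="localhost", port=8501):
    """URL of TF Serving's REST model status endpoint."""
    return f"http://{host}:{port}/v1/models/{model_name}"


def available_versions(status):
    """Return the version numbers reported as AVAILABLE in a status response dict."""
    return [
        int(entry["version"])
        for entry in status.get("model_version_status", [])
        if entry.get("state") == "AVAILABLE"
    ]


def fetch_model_status(model_name, timeout=5.0):
    """Query the running server (the container must be up for this to succeed)."""
    with urllib.request.urlopen(model_status_url(model_name), timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `available_versions(fetch_model_status("my_mnist_model"))` lists the versions that are ready to serve.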

## 3) Send requests through the REST API

Prepare a JSON payload to send to TF Serving. JSON is a plain-text format, so the input array must first be converted to a Python list.

Note: the request uses the Predict SignatureDef (`"signature_name": "serving_default"`).

Predict SignatureDefs support calls to TensorFlow Serving's Predict API. These signatures let you flexibly support arbitrarily many input and output tensors.
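
As a sketch of what a Predict request body can look like, TF Serving's REST API accepts a row format (an `"instances"` key) and a columnar format (an `"inputs"` key). The tiny `batch` below is made-up data for illustration only.

```python
import json

# Hypothetical toy batch: two "images" of three pixels each.
batch = [[0.0, 0.5, 1.0], [1.0, 0.5, 0.0]]

# Row format: one entry per instance; the response carries a "predictions" key.
row_request = json.dumps({"signature_name": "serving_default", "instances": batch})

# Columnar format: the whole batch as one tensor; the response carries an "outputs" key.
columnar_request = json.dumps({"signature_name": "serving_default", "inputs": batch})
```

This guide uses the row format throughout.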

Pros and cons of using JSON

Pros:
- Easy to use
- Does not require complicated dependencies

Cons:
- The input must be converted to a list, which is not very efficient
- Floats are converted to strings, which is far from ideal, especially for large data
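
To give a rough sense of the text-encoding overhead, the sketch below compares the size of 784 floats (the length of one flattened 28x28 image) serialized as JSON with the same values packed as 4-byte floats. The random values are a made-up stand-in for real input data, and packed float32 is only a proxy for a binary encoding.

```python
import json
import random
import struct

random.seed(0)
# Hypothetical stand-in for one flattened 28x28 image (784 float values).
pixels = [random.random() for _ in range(784)]

json_bytes = json.dumps(pixels).encode("utf-8")
binary_bytes = struct.pack(f"<{len(pixels)}f", *pixels)  # 4 bytes per float32

print(len(json_bytes), "bytes as JSON text")
print(len(binary_bytes), "bytes as packed float32")
```

The exact ratio depends on how many digits each value needs, so treat the numbers as indicative only.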

Solution? gRPC, an open-source remote procedure call (RPC) framework developed by Google. It is somewhat advanced, so we will not cover it in this class. For now, just understand that it exists as an alternative to the REST API.

In [None]:
import json
# X_new is our input data
input_data_json = json.dumps({
    "signature_name": "serving_default",
    "instances": X_new.tolist(),
})

We can then send the input data to TF Serving with an HTTP POST request. This can be done using the `requests` Python library:

In [None]:
import requests

SERVER_URL = 'http://localhost:8501/v1/models/my_mnist_model:predict'
response = requests.post(SERVER_URL, data=input_data_json)
response.raise_for_status() # raise an exception in case of error
response = response.json()
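
The URL above targets whichever version TF Serving is currently serving. The REST API also accepts a `/versions/<n>` segment to pin a request to one version. The sketch below builds such a request with the standard library only, without sending it; `predict_url` and `build_predict_request` are hypothetical helper names.

```python
import json
import urllib.request


def predict_url(model_name, version=None, host="localhost", port=8501):
    """Build the REST predict URL, optionally pinned to one model version."""
    base = f"http://{host}:{port}/v1/models/{model_name}"
    if version is not None:
        base += f"/versions/{version}"
    return base + ":predict"


def build_predict_request(instances, model_name, version=None):
    """Create (but do not send) a POST request carrying a JSON payload."""
    body = json.dumps({"signature_name": "serving_default", "instances": instances})
    return urllib.request.Request(
        predict_url(model_name, version),
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the server running, `urllib.request.urlopen(request, timeout=10)` would send it. With `requests`, passing `timeout=10` to `requests.post()` similarly avoids waiting forever on an unresponsive server.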

TF Serving returns a dictionary with a single "predictions" key, whose value is the list of predicted probabilities. You can convert it to a NumPy array:

In [None]:
import numpy as np
y_proba = np.array(response["predictions"])
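
Since each row of "predictions" holds one probability per class, the predicted class is the index of the largest value in that row. Here is a minimal sketch in plain Python; `predicted_classes` is a hypothetical helper and `sample_response` is made-up data, not real server output.

```python
def predicted_classes(response):
    """Return the index of the highest probability for each prediction row."""
    return [
        max(range(len(row)), key=row.__getitem__)
        for row in response["predictions"]
    ]


# Made-up response for two inputs and three classes, for illustration only.
sample_response = {"predictions": [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]]}
print(predicted_classes(sample_response))  # [1, 0]
```

With a NumPy array of probabilities, `y_proba.argmax(axis=1)` gives the same result.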