# Model Deployment using Keras

## 1. Introduction
In this workbook, we train a simple Keras CNN model on MNIST and deploy it for inference.

Parts of this workbook are adapted from the [Keras MNIST convnet example](https://keras.io/examples/vision/mnist_convnet/).

## 2. Imports and Dependencies
The packages needed are loaded next. `numpy`, `tensorflow`, `keras`, and `mlflow` are used throughout this tutorial. The `requests` package is used later to query the deployed model, and `json` is used to encode the request and decode the response from the server.

In [1]:
import os
import sys
import mlflow
import mlflow.keras
import numpy as np
from mlflow import pyfunc
import cloudpickle
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from mlflow.utils.environment import _mlflow_conda_env

# Suppress warnings
import warnings
warnings.filterwarnings("ignore")

2021-11-14 19:26:45.910967: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-11-14 19:26:45.911013: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.


## MLflow for experiment tracking and model deployment

MLflow is an open source platform for managing the end-to-end machine learning lifecycle. It tackles four primary functions:

- Tracking experiments to record and compare parameters and results (MLflow Tracking).
- Packaging ML code in a reusable, reproducible form to share with other data scientists or transfer to production (MLflow Projects).
- Managing and deploying models from a variety of ML libraries to a variety of model serving and inference platforms (MLflow Models).
- Providing a central model store to collaboratively manage the full lifecycle of an MLflow Model, including model versioning, stage transitions, and annotations (MLflow Model Registry).

More information is available [here](https://www.mlflow.org/docs/latest/index.html#).



![image.png](https://www.mlflow.org/docs/latest/_images/scenario_4.png)

- In the diagram, localhost is the server on which the current notebook is running

- The tracking server is the server at the address stored in the environment variable `TRACKING_URL`, which can be printed using `os.environ.get("TRACKING_URL")`

- The next cell creates an mlflow client that communicates with the tracking server

In [2]:
from mlflow.tracking import MlflowClient

# Point mlflow at the tracking server given by the TRACKING_URL environment variable
tracking_uri = os.environ.get("TRACKING_URL")
client = MlflowClient(tracking_uri=tracking_uri)
mlflow.set_tracking_uri(tracking_uri)
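
Because `os.environ.get("TRACKING_URL")` returns `None` when the variable is not set, a small guard can make a missing configuration fail early with a clear message. The sketch below is one possible way to do that; `resolve_tracking_uri` is a hypothetical helper (not an MLflow function), and the URL in the usage line is a placeholder.

```python
import os
from typing import Mapping, Optional


def resolve_tracking_uri(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the tracking server URL from TRACKING_URL, or fail loudly.

    Hypothetical helper for illustration; it is not an MLflow function.
    """
    env = os.environ if env is None else env
    uri = env.get("TRACKING_URL", "").strip()
    if not uri:
        raise RuntimeError("TRACKING_URL is not set; cannot reach the tracking server")
    return uri


# Usage with an explicit mapping (the URL below is a placeholder, not a real server).
example_uri = resolve_tracking_uri({"TRACKING_URL": "http://tracking.example:5000"})
print(example_uri)
```

Passing the environment as a mapping keeps the helper easy to try out without changing the real process environment.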

## Create an experiment in mlflow database using mlflow client

- Get the list of all the experiments (Click on the **Experiments** tab on the sidebar to see the list)
- Create a new experiment named *keras_deployment* if it doesn't exist
- Set *keras_deployment* as the experiment under which different **runs** are tracked

## MLflow Entity Hierarchy

- Experiment 1
    - Run 1
        - Parameters
        - Metrics
        - Artifacts
            - Folder 1
                - File 1
                - File 2
            - Folder 2 
    - Run 2
    - Run 3

- Experiment 2
- Experiment 3        
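
The hierarchy above can be pictured as nested records. The sketch below uses plain dataclasses as stand-ins; they are not MLflow's own entity classes, and the field names are chosen only to mirror the outline.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Run:
    """One execution of training code, holding what is recorded for it."""
    name: str
    params: Dict[str, str] = field(default_factory=dict)      # e.g. epochs, batch_size
    metrics: Dict[str, float] = field(default_factory=dict)   # e.g. accuracy
    artifacts: List[str] = field(default_factory=list)        # relative file paths


@dataclass
class Experiment:
    """A named group of runs."""
    name: str
    runs: List[Run] = field(default_factory=list)


# Illustrative values only; these are not results from this notebook.
run = Run(
    name="Run 1",
    params={"epochs": "5", "batch_size": "128"},
    artifacts=["Folder 1/File 1", "Folder 1/File 2"],
)
experiment = Experiment(name="Experiment 1", runs=[run, Run("Run 2"), Run("Run 3")])

for r in experiment.runs:
    print(experiment.name, "/", r.name, "-", len(r.artifacts), "artifact(s)")
```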

In [3]:
# Use a named experiment to keep the runs organized; create it on first use
# Note: newer MLflow releases replace list_experiments() with search_experiments()
experiment_name = "keras_deployment"
experiment_names = [exp.name for exp in client.list_experiments()]
if experiment_name not in experiment_names:
    mlflow.create_experiment(experiment_name)
mlflow.set_experiment(experiment_name)


## Python Class for inference

- `ModelWrapper` is derived from `mlflow.pyfunc.PythonModel` ([more info](https://www.mlflow.org/docs/latest/python_api/mlflow.pyfunc.html))
- The `load_context()` member function loads the model. In this case, it loads a trained Keras model that was saved to disk.
- The `predict()` member function decodes the JSON-encoded input array, runs the Keras model on it, and returns the predictions as a JSON string
- An object of this class will be saved as a pickle file in blob storage

In [4]:
# Model wrapper that loads the saved Keras model and serves predictions
class ModelWrapper(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Imported here so that the pickled wrapper is self-contained on the server
        import tensorflow as tf
        self.model = tf.keras.models.load_model(context.artifacts['model_path'])
        print("Model initialized")

    def predict(self, context, model_input):
        import json
        import numpy as np
        # The request body is JSON text that arrives split across the DataFrame's
        # column names, so join the names back together before decoding
        json_txt = ", ".join(model_input.columns)
        inputs = np.array(json.loads(json_txt))
        print(inputs.shape)
        if inputs.ndim == 4:
            print('batch inference')
        elif inputs.ndim == 3:
            print('single inference')
            inputs = np.expand_dims(inputs, 0)  # add the batch dimension
        else:
            raise ValueError('invalid input shape')
        predictions = self.model.predict(inputs).tolist()
        return json.dumps(predictions)
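
To see why joining the column names recovers the input, the sketch below replays the decoding steps without MLflow or NumPy. It assumes the serving layer parses a `text/csv` body with its first line as the header, which is what the use of `model_input.columns` suggests; `csv.reader` stands in for that parser, and `nesting_depth` and `decode_payload` are hypothetical helper names.

```python
import csv
import io
import json


def nesting_depth(value):
    """Count list nesting levels; a NumPy-free stand-in for len(array.shape)."""
    depth = 0
    while isinstance(value, list) and value:
        depth += 1
        value = value[0]
    return depth


def decode_payload(body: str):
    """Recover the nested list from a JSON body that was parsed as a CSV header."""
    # A CSV parser splits the single header line on commas ...
    header_fields = next(csv.reader(io.StringIO(body)))
    # ... and joining the fields with ", " yields valid JSON again.
    data = json.loads(", ".join(header_fields))
    # Wrap a single (height, width, channels) sample into a batch of one.
    if nesting_depth(data) == 3:
        data = [data]
    elif nesting_depth(data) != 4:
        raise ValueError("invalid input shape")
    return data


# A tiny 2x2x1 "image" stands in for the 28x28x1 MNIST input.
single = [[[0.1], [0.2]], [[0.3], [0.4]]]
batch = decode_payload(json.dumps(single))
print(nesting_depth(batch), len(batch))
```

The round trip works because extra whitespace between JSON tokens is ignored, so the spaces left over from the comma split do not matter.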

## Register a model using mlflow

- Log user-defined parameters in a remote database through the tracking server
- Create a `model_wrapper` object using the `ModelWrapper()` class defined in the cell above
- Create a default conda environment that needs to be installed in the Docker container that serves the REST API
- Save the model object as a pickle file, together with the conda environment, as artifacts (files) in S3 or Blob Storage

## 3. Training

We download the MNIST dataset of handwritten digits using the Keras dataset utilities. mlflow can automatically log all the metrics along with the model. Once training is complete, mlflow logs the model that will be used for inference.

First, we download the dataset and preprocess it.

In [5]:
# Model / data parameters
num_classes = 10
input_shape = (28, 28, 1)

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Scale images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")


# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
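
The two preprocessing steps above, scaling pixels to [0, 1] and converting labels to binary class matrices, can be illustrated on a few values. This is a plain-Python sketch of the idea, not the Keras implementation; `scale_pixels` and `one_hot` are hypothetical helper names.

```python
def scale_pixels(pixels):
    """Map 8-bit pixel values (0-255) to floats in [0, 1]."""
    return [p / 255 for p in pixels]


def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows, similar to to_categorical."""
    return [[1.0 if i == label else 0.0 for i in range(num_classes)] for label in labels]


print(scale_pixels([0, 51, 255]))
print(one_hot([3, 0], num_classes=4))
```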


Let's build a CNN model for training.

In [6]:
model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 1600)              0         
_________________________________________________________________
dropout (Dropout)            (None, 1600)              0         
_________________________________________________________________
dense (Dense)                (None, 10)                1

2021-11-14 19:26:48.571004: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-11-14 19:26:48.571058: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2021-11-14 19:26:48.571080: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (rlxlgt66-b87c796c-vfrc5): /proc/driver/nvidia/version does not exist
2021-11-14 19:26:48.571417: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
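
The output shapes and parameter counts in the summary can be checked by hand. The sketch below assumes the Keras defaults used in the model definition (no padding, stride 1 for the convolutions, pooling stride equal to the pool size); `conv_output` and `conv_params` are hypothetical helper names.

```python
def conv_output(size, kernel):
    """Spatial size after an unpadded convolution with stride 1."""
    return size - kernel + 1


def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return kernel * kernel * in_channels * filters + filters


size = 28
size = conv_output(size, 3)   # 26 after the first Conv2D
size = size // 2              # 13 after 2x2 max pooling
size = conv_output(size, 3)   # 11 after the second Conv2D
size = size // 2              # 5 after 2x2 max pooling (floor division)
flat = size * size * 64       # 1600 features after Flatten

print(conv_params(3, 1, 32))   # 320
print(conv_params(3, 32, 64))  # 18496
print(flat * 10 + 10)          # 16010 for the Dense layer (weights + biases)
```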


Next, we compile the model, fit it, and log the result to mlflow.

In [7]:
# instantiate the python inference model wrapper for the server
model_wrapper = ModelWrapper()

batch_size = 128
epochs = 5
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)

# checkpointing and logging the model in mlflow
artifact_path = './keras-model'
model.save(artifact_path)
model_artifacts = {"model_path" : artifact_path}
env = mlflow.tensorflow.get_default_conda_env()

# model.summary() prints and returns None, so capture its lines for logging
summary_lines = []
model.summary(print_fn=lambda line, **kwargs: summary_lines.append(line))

with mlflow.start_run():
    mlflow.log_text("\n".join(summary_lines), "model_summary.txt")
    mlflow.log_param("epochs",epochs)
    mlflow.log_param("batch_size",batch_size)
    # history.history maps each metric name to a list with one value per epoch
    for name, values in history.history.items():
        for epoch, value in enumerate(values):
            mlflow.log_metric(name, value, step=epoch)
    mlflow.pyfunc.log_model("keras_model", python_model=model_wrapper, artifacts=model_artifacts, conda_env=env)

2021-11-14 19:26:48.771078: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


2021-11-14 19:28:26.467713: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.


INFO:tensorflow:Assets written to: ./keras-model/assets

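
The object returned by Keras `fit` keeps the training curves in its `history` attribute, a dictionary that maps each metric name to one value per epoch. The sketch below shows one way such a dictionary could be flattened into per-step records, each of which could then be passed to `mlflow.log_metric(name, value, step=epoch)`. `history_to_records` is a hypothetical helper, and the numbers are made up for illustration.

```python
def history_to_records(history_dict):
    """Flatten {metric: [value per epoch]} into (metric, epoch, value) tuples."""
    return [
        (name, epoch, float(value))
        for name, values in history_dict.items()
        for epoch, value in enumerate(values)
    ]


# Made-up numbers for illustration; they are not the results of the run above.
fake_history = {"loss": [0.9, 0.4], "accuracy": [0.7, 0.9]}
for name, epoch, value in history_to_records(fake_history):
    print(f"{name} step={epoch} value={value}")
```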

## 4. Deploying the model
The code above logs the model as a run that appears in the Experiments tab. For more information, please refer to the [user guide](https://rocketml.gitbook.io/rocketml-user-guide/experiments).

### 4.1 Find experiment in experiment list and click on it
![experiments_list](https://github.com/rocketmlhq/sciml/raw/e8abbef269c5bee9d2b69398495fc5ced7457708/03_Deployment/experiments_list.png)

### 4.2 Find run in runs list and click on it
![runs_list](https://github.com/rocketmlhq/sciml/raw/e8abbef269c5bee9d2b69398495fc5ced7457708/03_Deployment/runs_list.png)

### 4.3 Get run details and click on artifacts
![run_details](https://github.com/rocketmlhq/sciml/raw/e8abbef269c5bee9d2b69398495fc5ced7457708/03_Deployment/run_details.png)

### 4.4 Check different files logged as artifacts
![artifacts](https://github.com/rocketmlhq/sciml/raw/e8abbef269c5bee9d2b69398495fc5ced7457708/03_Deployment/artifacts.png)

- An MLflow Model is a standard format for packaging machine learning models that can be used in a variety of downstream tools ([more details](https://www.mlflow.org/docs/latest/models.html#storage-format))
- The `ModelWrapper()` object is saved as a pkl file
- The `conda.yaml` and `requirements.txt` files are used to manage the Python environment
- The saved Keras model (`keras-model`) is stored in the artifacts folder within the main folder (`keras_model`)

### 4.5 Deploy ML model as a REST API service

Click on **Convert To Model** and fill in the form. **Note: For deploying the Keras model, please select at least 4096 memory for deployment.**

![model_deployment](https://github.com/rocketmlhq/sciml/raw/e8abbef269c5bee9d2b69398495fc5ced7457708/03_Deployment/model_deployment.png)

### 4.6 Go to models tab and wait until the model turns to **ON** state
![model_list](https://github.com/rocketmlhq/sciml/raw/e8abbef269c5bee9d2b69398495fc5ced7457708/03_Deployment/model_list.png)

## 5. Use the Endpoint and Query from the server

There are two ways to query the endpoint: with the `requests` library or with the `curl` shell command. The cell below uses `requests`.

In [10]:
import requests
import json

################################################################################
# *** SET MODEL URL HERE BEFORE RUNNING THIS CELL (instructions above) ***
# Example: https://<random_string>.sciml.rocketml.net/invocations
url = ""
################################################################################

if not url:
    raise ValueError('Model URL not set! Please read instructions on how to deploy model, set the correct URL, and try again.')

# The JSON text is sent as text/csv, matching how ModelWrapper.predict decodes it
headers = {"Content-Type":"text/csv"}

def query_model(np_array):
    """POST one input array to the endpoint and print the predicted probabilities."""
    response = requests.post(url, data=json.dumps(np_array), headers=headers)
    if response.status_code == 200:
        # The wrapper returns a JSON string, so the response is decoded twice
        output = np.array(json.loads(response.json())).astype(np.float32)
        print(output)
    else:
        print(response.status_code)
        print("REST API deployment is in progress -- please try again in a few minutes!")
        print("Make sure that the model is in ON state.")

# First case, run inference on single data point
query_model(np.random.rand(28,28,1).tolist())

# Second case, run inference on multiple data points
query_model(np.random.rand(2,28,28,1).tolist())


[[5.4550702e-03 4.0400081e-04 3.8316324e-02 2.7348047e-02 4.0564043e-03
  7.5183250e-02 7.6675541e-03 5.5649033e-04 8.3789396e-01 3.1188931e-03]]
[[2.9919872e-03 1.2010236e-03 7.7142179e-02 3.4051750e-02 1.2698015e-02
  9.8876357e-02 2.0492699e-03 1.7935239e-02 7.4503058e-01 8.0236373e-03]
 [9.3202889e-03 2.1819610e-04 4.3830331e-02 3.8132112e-02 5.9867408e-03
  1.1520740e-01 3.1106498e-03 5.8234786e-04 7.7803266e-01 5.5792583e-03]]
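
Each row of the response holds one probability per digit class, so the predicted digit is the index of the largest value. The sketch below applies this to the first response printed above. Since the inputs were random noise, the prediction itself carries no meaning; the point is only how to read the output.

```python
# First response printed above: one row of 10 class probabilities.
probabilities = [
    5.4550702e-03, 4.0400081e-04, 3.8316324e-02, 2.7348047e-02, 4.0564043e-03,
    7.5183250e-02, 7.6675541e-03, 5.5649033e-04, 8.3789396e-01, 3.1188931e-03,
]

# Index of the largest probability is the predicted digit.
predicted_digit = max(range(len(probabilities)), key=probabilities.__getitem__)
print(predicted_digit, probabilities[predicted_digit])

# Softmax outputs should sum to approximately 1.
print(round(sum(probabilities), 4))
```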
