# Simple Docker Model + Inference API

We will containerize a model, create a simple inference endpoint, and curl the endpoint to get test predictions. 

## Boilerplate Code

In [19]:
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

import pickle

def load_raw_titanc():
    '''
    Returns a dataframe with the popular titanic data.
    '''
    # Load the Titanic dataset
    titanic_url = 'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv'
    return pd.read_csv(titanic_url)

def clean_titanic(titanic_data):
    '''
    Inputs:
    - titanic_data: A dataframe with the raw titanic data
    Returns a dataframe with cleaned titanic data
    '''
    df = titanic_data.drop(['PassengerId', 'Name', 'Ticket', 'Cabin'], axis=1)
    # Fill missing values
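    # Assign the result back instead of using inplace=True on a column selection,
    # which is deprecated in recent pandas and may not modify df
    df['Age'] = df['Age'].fillna(df['Age'].median())
    df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])
    df['Fare'] = df['Fare'].fillna(df['Fare'].median())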
    # Encode categorical features
    df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})
    df['Embarked'] = df['Embarked'].map({'C': 0, 'Q': 1, 'S': 2})
    return df

def load_titanic():
    '''
    Returns the titanic dataset as a tuple (features dataframe, label series) - aka (X,y)
    '''
    df = load_raw_titanc()
    df = clean_titanic(df)
    return df.drop('Survived', axis=1), df['Survived']

## Train and Save Model

In [24]:
# Create train and test split
X, y = load_titanic()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Create a simple model and save it to disk as model.pkl
clf = RandomForestClassifier(n_estimators=20, max_depth=5, random_state=42)
clf.fit(X_train, y_train)
with open('model.pkl', 'wb') as f:
    pickle.dump(clf, f)

# Print model accuracy
print(f"Accuracy: {clf.score(X_test, y_test):.4f}")

Accuracy: 0.8045
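
Before containerizing, it can be worth checking that the pickled file loads back into a model that predicts the same thing. The sketch below is only an illustration: it uses a tiny made-up dataset (`X_toy`, `y_toy`) and a temporary directory rather than the notebook's actual `X_train` and `model.pkl`, so it can run on its own.

```python
import os
import pickle
import tempfile

from sklearn.ensemble import RandomForestClassifier

# Made-up toy rows standing in for the Titanic features (illustration only)
X_toy = [[3, 0, 22.0], [1, 1, 38.0], [3, 1, 26.0], [1, 0, 54.0]]
y_toy = [0, 1, 1, 0]

clf = RandomForestClassifier(n_estimators=20, max_depth=5, random_state=42)
clf.fit(X_toy, y_toy)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")
    # Save the fitted model, then load it back from the same file
    with open(path, "wb") as f:
        pickle.dump(clf, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored model should reproduce the original predictions exactly
round_trip_ok = list(clf.predict(X_toy)) == list(restored.predict(X_toy))
print(f"Round trip OK: {round_trip_ok}")
```

Pickled scikit-learn models are generally only safe to load with the same scikit-learn version that saved them, so the container should install the version used for training.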


## Create Inference API Endpoint

Check out the `inference.py` file. 

This script loads the model saved above from disk (`model.pkl`), defines an inference endpoint using Flask, and runs the Flask app on port `5432`.
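
For readers without the file at hand, here is a minimal sketch of what such a script could look like. It is not the actual `inference.py`: the `create_app` factory, the error message, and the `0.0.0.0` host are assumptions, while the `/predict` route, the `{"data": [...]}` request body, and the `prediction` response key follow the requests shown later in this notebook.

```python
import pickle

from flask import Flask, jsonify, request


def create_app(model):
    """Build a Flask app that serves predictions from `model`.

    `model` is any object with a scikit-learn style `predict` method.
    (Hypothetical factory, used here so the app is easy to test.)
    """
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        body = request.get_json(silent=True)
        features = body.get("data") if isinstance(body, dict) else None
        if not isinstance(features, list):
            return jsonify({"error": "expected a JSON body with a 'data' list"}), 400
        # predict expects a 2D input: one inner list per sample
        prediction = model.predict([features])[0]
        # Cast to a plain int so the value is JSON serializable
        return jsonify({"prediction": int(prediction)})

    return app


if __name__ == "__main__":
    # Load the model trained and pickled in the previous section
    with open("model.pkl", "rb") as f:
        clf = pickle.load(f)
    # Binding to 0.0.0.0 makes the server reachable from outside the container
    create_app(clf).run(host="0.0.0.0", port=5432)
```

Passing the model into a factory function keeps the route logic testable with Flask's built-in test client, without needing `model.pkl` or a running container.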

## Containerize the Code

Make sure you've installed Docker on your computer.

Check out the file `Dockerfile`.

This file adds the model and Flask app to the container image, exposes port `5432` on the container, and runs the Flask app.

Build a Docker image containing the model and Flask app (`inference.py`):  
`$ docker build -t docker-model .`

List the images:  
`$ docker images`

Run the container, publishing container port `5432` on host port `5432`:  
`$ docker run -p 5432:5432 docker-model`

## Make Predictions

The container is now running and waiting for requests.

Use this command to hit the endpoint from the terminal:  
`curl -X POST http://localhost:5432/predict -H 'Content-Type: application/json' -d '{"data": [3.0, 0.0, 28.0, 1.0, 1.0, 15.2458, 0.0]}'`
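
The seven numbers in the request body are positional, so their order matters. Assuming the features keep the column order left in `X` after `clean_titanic` and `load_titanic` drop the unused columns (`Pclass`, `Sex`, `Age`, `SibSp`, `Parch`, `Fare`, `Embarked`), a small hypothetical helper like `build_payload` below can build the body from named fields, reusing the same encodings as `clean_titanic`:

```python
import json

# Assumed feature order: the columns remaining in X after PassengerId, Name,
# Ticket, Cabin and Survived are dropped from the raw Titanic data
FEATURE_ORDER = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]

# Same encodings as clean_titanic
SEX_CODES = {"male": 0, "female": 1}
EMBARKED_CODES = {"C": 0, "Q": 1, "S": 2}


def build_payload(passenger):
    """Hypothetical helper: turn raw passenger fields into the JSON request body."""
    encoded = dict(passenger)
    encoded["Sex"] = SEX_CODES[passenger["Sex"]]
    encoded["Embarked"] = EMBARKED_CODES[passenger["Embarked"]]
    # Emit the values as floats, in the order the model was trained on
    return json.dumps({"data": [float(encoded[name]) for name in FEATURE_ORDER]})


passenger = {"Pclass": 3, "Sex": "male", "Age": 28, "SibSp": 1,
             "Parch": 1, "Fare": 15.2458, "Embarked": "C"}
print(build_payload(passenger))
```

For this passenger the helper produces the same body as the `curl` command above.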

Or run the following code to make predictions for all the test data.

In [18]:
import json 
import requests

# Define the url of our docker container's inference endpoint
url = "http://localhost:5432/predict"

# Loop over the test data's rows
for i in range(len(X_test)):
    # Get an example row
    row = list(X_test.iloc[i].values)
    y_true = y_test.iloc[i]

    # Send the POST request to the container's inference endpoint
    payload = json.dumps({"data": row})
    response = requests.post(url, headers={"Content-Type": "application/json"}, data=payload)

    # Print the response
    if response.status_code == 200:
        y_pred = response.json()['prediction']
        print(f"{i} [ {y_pred == y_true} ] y_pred: {y_pred}, y_true: {y_true} ({row})")
    else:
        print(f"Failed to get prediction. Status Code: {response.status_code}, Response: {response.text}")

0 [     0 ] y_pred: 0, y_true: 1 ([3.0, 0.0, 28.0, 1.0, 1.0, 15.2458, 0.0])
1 [     1 ] y_pred: 0, y_true: 0 ([2.0, 0.0, 31.0, 0.0, 0.0, 10.5, 2.0])
2 [     1 ] y_pred: 0, y_true: 0 ([3.0, 0.0, 20.0, 0.0, 0.0, 7.925, 2.0])
3 [     1 ] y_pred: 1, y_true: 1 ([2.0, 1.0, 6.0, 0.0, 1.0, 33.0, 2.0])
4 [     0 ] y_pred: 0, y_true: 1 ([3.0, 1.0, 14.0, 1.0, 0.0, 11.2417, 0.0])
5 [     1 ] y_pred: 1, y_true: 1 ([1.0, 1.0, 26.0, 0.0, 0.0, 78.85, 2.0])
6 [     0 ] y_pred: 0, y_true: 1 ([3.0, 1.0, 28.0, 0.0, 0.0, 7.75, 1.0])
7 [     1 ] y_pred: 0, y_true: 0 ([3.0, 0.0, 16.0, 2.0, 0.0, 18.0, 2.0])
8 [     0 ] y_pred: 0, y_true: 1 ([3.0, 1.0, 16.0, 0.0, 0.0, 7.75, 1.0])
9 [     1 ] y_pred: 1, y_true: 1 ([1.0, 1.0, 19.0, 0.0, 2.0, 26.2833, 2.0])
10 [     1 ] y_pred: 0, y_true: 0 ([1.0, 0.0, 37.0, 1.0, 0.0, 53.1, 2.0])
11 [     1 ] y_pred: 0, y_true: 0 ([3.0, 0.0, 44.0, 0.0, 0.0, 8.05, 2.0])
12 [     1 ] y_pred: 0, y_true: 0 ([3.0, 1.0, 28.0, 3.0, 1.0, 25.4667, 2.0])
13 [     1 ] y_pred: 0, y_true: 0 (