# Create an API for machine learning

In a real project, we need to deploy our model into production.
There are two ways to do so: as a batch job or behind an API.
Today, we will discuss how to create an API that provides real-time predictions.

We will build an API that serves a model for classifying the iris dataset.

# Create a model

In [None]:
#import libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

In [None]:
#load data into dataframe
trainFile = "dataset/iris.data"
train = pd.read_csv(trainFile, delimiter=',', names=["sepal_length", "sepal_width", "petal_length", "petal_width","class"])
train.head()

In [None]:
#define feature, excluding 'class' column
features = [feature for feature in train.columns if feature not in ['class']]
print('Total features : {}'.format(len(features)))

#label
label = train['class']

In [None]:
#train and test split - 70% training, 30% testing
seed = 1987

train_data, test_data, train_label, test_label = train_test_split(train[features], label, test_size = 0.3, random_state = seed)

In [None]:
#build a model
model = RandomForestClassifier(random_state = seed).fit(train_data, train_label)

In [None]:
#train score 
model.score(train_data, train_label)

In [None]:
#test score
model.score(test_data, test_label)

In [None]:
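#add the prediction and the ground truth to a copy of the test data,
#so that test_data keeps only the feature columns and the cell can be re-run
test_result = test_data.copy()
test_result['prediction'] = model.predict(test_data)
test_result['ground truth'] = test_label
test_result.head()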

# After that, what should we do?
Most online courses on machine learning stop at the evaluation step and then move on to optimizing accuracy. This can give new Data Scientists the impression that Machine Learning is "an art of optimizing the score on the testing dataset in a Notebook environment". In fact, machine learning should be part of a system, a piece of code or an application, so that it can affect customer experience and increase business value.

We will create a pickle file for storing the model.
Pickle is commonly used to save a Python object to disk, so that we can load it again later.
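
To make the idea concrete, here is a minimal sketch of a save and load round trip. It assumes `joblib` is installed as a standalone package and, to stay self-contained, trains on scikit-learn's bundled copy of the iris data and writes to a temporary folder rather than to `ml_model/`.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train a small model on scikit-learn's bundled copy of the iris data
# (an assumption for this sketch; the tutorial reads dataset/iris.data instead).
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=1987).fit(X, y)

# Save the fitted model to a temporary folder, then load it back.
model_path = os.path.join(tempfile.mkdtemp(), "model_rf_iris.pkl")
joblib.dump(model, model_path)
restored = joblib.load(model_path)

# The restored model should give the same predictions as the original one.
same_predictions = bool((model.predict(X) == restored.predict(X)).all())
print(same_predictions)
```

The loaded object behaves like the original model, which is what lets another program reuse it without retraining.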

In [None]:
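#import joblib to save the model as a file
#(sklearn.externals.joblib was removed from recent scikit-learn releases)
import os
import joblib

In [None]:
#create the output folder if needed, then create the pickle for the model
os.makedirs('ml_model', exist_ok=True)
joblib.dump(model, 'ml_model/model_rf_iris.pkl')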

# What is the next step, after getting the pickle file?
Usually, this file will be copied to another code base outside of this notebook. For this tutorial we will stay in the same notebook, but pretend that everything below this line runs in a different environment.

Note: To make sure everything written below is independent of the variables above, you can restart the kernel and run the code below without running the code above.

# Types of Production ML

There are two ways to run machine learning in production: as a batch job or served as an API.

# 1. Batch 

### Most commonly the code is not in Notebook format, so it would look like this in a .py file

In [None]:
#import libraries
import pandas as pd
import joblib

#Extract
#These lines could be replaced by code that loads data directly from a database
file = "dataset/iris_new_data.data"
new_data = pd.read_csv(file, delimiter = ',', names = ["sepal_length", "sepal_width", "petal_length", "petal_width"])

#Transform
#load the saved model and predict the class of each row
rf_model_load = joblib.load('ml_model/model_rf_iris.pkl')
new_data['prediction'] = rf_model_load.predict(new_data)

#Load
#These lines could be replaced by code that exports the result directly to a database
new_data.to_csv("predicted_iris_data.csv", sep=',', index=False)

print("Job Success")

## Summary of Batch 

The code above can be scheduled with a server scheduler, or triggered as an ETL job from a user interface. The Extract and Load parts can read from any data source and write to any data source.
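
To make the job easier to schedule, the three ETL steps could be wrapped in a function that takes the model and the file paths as arguments. The following is only a sketch: `run_batch`, `FEATURES` and the command-line convention are names and choices made up for this illustration.

```python
import sys

import joblib
import pandas as pd

# Column names in the order the model was trained on.
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def run_batch(model, input_path, output_path):
    """Hypothetical helper: predict every row of a CSV file and save the result."""
    # Extract: read the raw rows (the file has no header line)
    new_data = pd.read_csv(input_path, names=FEATURES)

    # Transform: predict on the feature columns only
    new_data["prediction"] = model.predict(new_data[FEATURES])

    # Load: write features and predictions to the output file
    new_data.to_csv(output_path, index=False)
    return len(new_data)


# Only run as a job when the three paths are given on the command line, e.g.
#   python batch_job.py ml_model/model_rf_iris.pkl input.csv output.csv
# (batch_job.py is an assumed file name)
if __name__ == "__main__" and len(sys.argv) == 4:
    model_file, input_file, output_file = sys.argv[1:]
    rows = run_batch(joblib.load(model_file), input_file, output_file)
    print("Job Success: {} rows predicted".format(rows))
```

A scheduler entry then only needs to call the script with the right paths, and the same function can be reused if the data later comes from a database instead of a file.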

# 2. Serve as API

In order to build an API, we will use Flask.

Flask is commonly used for building web applications.

In [None]:
#install the library
!pip install flask

In [None]:
#import libraries and define flask
from flask import Flask, jsonify, request
import pandas as pd
import joblib
import json

app = Flask(__name__)

In [None]:
#create a decorator to define the path of the url and the allowed method
#the function below is mapped to that path and method
@app.route('/predict_ml', methods = ['POST'])
def predict():
    if request.method == 'POST':
        #read the request body and parse it as json
        data = request.data
        dataDict = json.loads(data)

        #convert the json data to a dataframe, load the model and predict
        pandas_df = pd.DataFrame([dataDict])
        rf_model_load = joblib.load('ml_model/model_rf_iris.pkl')
        prediction = rf_model_load.predict(pandas_df)[0]
        print(prediction)

        #result returned whenever someone calls the API
        return jsonify({'prediction': prediction})

Run the lines below to start the web application.

Note that the code below is a running service, which means the process will never finish on its own. It keeps running until you decide to stop the service.

In [None]:
if __name__ == '__main__':
    app.run(host = 'localhost', port = 8080)
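
If you only want to check the endpoint without starting a long-running service, Flask's built-in test client can call the route in-process. The sketch below is an illustration under assumptions: `create_app`, `StubModel` and `FEATURES` are hypothetical names, and the stub stands in for the pickled random forest so the example runs without the model file.

```python
from flask import Flask, jsonify, request

# Feature names in the order used to train the model.
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


class StubModel:
    """Stand-in for the loaded random forest (an assumption for this sketch)."""

    def predict(self, rows):
        return ["Iris-setosa" for _ in rows]


def create_app(model):
    """Hypothetical app factory: build the Flask app around an already loaded model."""
    app = Flask(__name__)

    @app.route("/predict_ml", methods=["POST"])
    def predict_ml():
        # Parse the JSON body and put the values in training order
        payload = request.get_json(force=True)
        row = [payload[name] for name in FEATURES]
        prediction = model.predict([row])[0]
        return jsonify({"prediction": str(prediction)})

    return app


# Call the route through the test client; no server process is started.
client = create_app(StubModel()).test_client()
response = client.post(
    "/predict_ml",
    json={"sepal_length": 5.3, "sepal_width": 2.4, "petal_length": 2.5, "petal_width": 4.2},
)
result = response.get_json()
print(response.status_code, result)
```

In a real check, the stub would be replaced by the model loaded with `joblib.load`.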

# Send the testing data to the API

Open Postman (it can also be downloaded as a Chrome extension), type the input below as the request body, and send it to http://localhost:8080/predict_ml. Don't forget to use the POST method.

### Input format for the API
{
    "sepal_length": 5.3,
    "sepal_width": 2.4,
    "petal_length": 2.5,
    "petal_width": 4.2
}
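
The same request can also be sent from Python instead of Postman. Below is a minimal sketch using only the standard library. It assumes the service from this tutorial is listening on http://localhost:8080/predict_ml, and the helper names `build_request` and `call_api` are made up for this example.

```python
import json
import urllib.request

# Assumed address of the service started in this tutorial.
API_URL = "http://localhost:8080/predict_ml"


def build_request(features, url=API_URL):
    """Hypothetical helper: wrap the feature dict in a POST request with a JSON body."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(features, url=API_URL):
    """Send the request and return the parsed JSON response (needs a running service)."""
    with urllib.request.urlopen(build_request(features, url)) as response:
        return json.loads(response.read().decode("utf-8"))


sample = {
    "sepal_length": 5.3,
    "sepal_width": 2.4,
    "petal_length": 2.5,
    "petal_width": 4.2,
}

# Building the request does not need the service to be up.
request_to_send = build_request(sample)
print(request_to_send.get_method(), request_to_send.full_url)
```

With the service running, `call_api(sample)` would return the parsed response, a dictionary with a `prediction` key.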


# In Practice
It's not common practice to start a serving application in a Jupyter Notebook!
Usually the code above runs in a web server, written as a .py file like the one below.

In [None]:
#import libraries
from flask import Flask, jsonify, request
import joblib
import pandas as pd
import json

app = Flask(__name__)

@app.route('/predict_ml', methods=['POST'])
def predict_ml():
    if request.method == 'POST':
        #read the request body and parse it as json
        data = request.data
        dataDict = json.loads(data)

        #convert the json data to a dataframe, load the model and predict
        pandas_df = pd.DataFrame([dataDict])
        rf_model_load = joblib.load('ml_model/model_rf_iris.pkl')
        prediction = rf_model_load.predict(pandas_df)[0]
        print(prediction)

        #response returned to the caller
        return jsonify({'prediction': prediction})

if __name__ == '__main__':
    app.run(host='localhost', port=8080)
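
One detail the handler leaves open is what happens when the request body is incomplete, or lists the fields in a different order than the training columns. A possible way to handle this is sketched below. `parse_features` and `FEATURES` are hypothetical names, and raising `ValueError` is an assumption for this sketch, not something the original service does.

```python
import json

# Feature names in the order used to train the model.
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def parse_features(raw_body):
    """Hypothetical helper: turn a JSON request body into an ordered feature dict."""
    payload = json.loads(raw_body)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")

    missing = [name for name in FEATURES if name not in payload]
    if missing:
        raise ValueError("missing fields: {}".format(", ".join(missing)))

    # Keep only the known features, in training order, converted to float.
    return {name: float(payload[name]) for name in FEATURES}


# The fields arrive in a different order than the training columns.
body = b'{"petal_width": 4.2, "sepal_length": 5.3, "sepal_width": 2.4, "petal_length": 2.5}'
print(parse_features(body))
```

Inside the handler, `pd.DataFrame([parse_features(request.data)])` would then always have its columns in training order, and a `ValueError` could be turned into an error response for the caller.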

What is that really?
Well, at this point we are in Software Engineering territory. It's common for a Software Engineer to use a REST API like the one above.
For example, you can call the URL from a REST API client, as in the screenshot below. It shows the model URL being called with 4 input variables. When the request is made, the response shows the prediction result.
![alt text](ML Model in Production/screenshot call API.jpg "Title")

# Summary
Machine Learning is not only "an art of increasing accuracy in a notebook"; it is a part of software engineering.
There are 2 common ways to use a Machine Learning model in production: Batch and REST API.
I hope this tutorial helps you.