## Deploying Machine Learning Models 

<img src="CRISPDM_Process_Diagram.png" alt="drawing" width="600"/>

[https://www.datasciencecentral.com/profiles/blogs/crisp-dm-a-standard-methodology-to-ensure-a-good-outcome](https://www.datasciencecentral.com/profiles/blogs/crisp-dm-a-standard-methodology-to-ensure-a-good-outcome)

#### [AWS Sagemaker Video Walkthrough](https://www.youtube.com/watch?v=wZ2G9erPX00&feature=youtu.be) 

## Two Techniques for Deploying a Machine Learning Model

### API 

```python
import numpy as np
import pandas as pd
```

```python
df = pd.read_csv('../Data/kc_house_data.csv')
df.head()
```

| | id | date | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | view | ... | grade | sqft_above | sqft_basement | yr_built | yr_renovated | zipcode | lat | long | sqft_living15 | sqft_lot15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7129300520 | 10/13/2014 | 221900.0 | 3 | 1.0 | 1180 | 5650 | 1.0 | NaN | 0.0 | ... | 7 | 1180 | 0.0 | 1955 | 0.0 | 98178 | 47.5112 | -122.257 | 1340 | 5650 |
| 1 | 6414100192 | 12/9/2014 | 538000.0 | 3 | 2.25 | 2570 | 7242 | 2.0 | 0.0 | 0.0 | ... | 7 | 2170 | 400.0 | 1951 | 1991.0 | 98125 | 47.721 | -122.319 | 1690 | 7639 |
| 2 | 5631500400 | 2/25/2015 | 180000.0 | 2 | 1.0 | 770 | 10000 | 1.0 | 0.0 | 0.0 | ... | 6 | 770 | 0.0 | 1933 | NaN | 98028 | 47.7379 | -122.233 | 2720 | 8062 |
| 3 | 2487200875 | 12/9/2014 | 604000.0 | 4 | 3.0 | 1960 | 5000 | 1.0 | 0.0 | 0.0 | ... | 7 | 1050 | 910.0 | 1965 | 0.0 | 98136 | 47.5208 | -122.393 | 1360 | 5000 |
| 4 | 1954400510 | 2/18/2015 | 510000.0 | 3 | 2.0 | 1680 | 8080 | 1.0 | 0.0 | 0.0 | ... | 8 | 1680 | 0.0 | 1987 | 0.0 | 98074 | 47.6168 | -122.045 | 1800 | 7503 |


```python
df = df[['price', 'bedrooms', 'bathrooms', 'sqft_living']]
```

```python
df.isnull().sum()
```

```
price          0
bedrooms       0
bathrooms      0
sqft_living    0
dtype: int64
```

```python
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor
```

```python
# Number of features to consider at every split
# (not included in param_grid below; newer scikit-learn releases also
# no longer accept 'auto' for RandomForestRegressor)
max_features = ['auto', 'sqrt']
# Maximum number of levels in tree: 10, 20, 30, 40, 50, or unlimited (None)
max_depth = [int(x) for x in np.linspace(10, 50, num=5)]
max_depth.append(None)
# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10]
# Create the parameter grid (GridSearchCV tries every combination)
param_grid = {'max_depth': max_depth,
              'min_samples_split': min_samples_split}
```

```python
reg = RandomForestRegressor()
```

```python
rf_grid = GridSearchCV(estimator=reg, param_grid=param_grid,
                       cv=3, verbose=2, n_jobs=-1)
```

```python
rf_grid.fit(df.drop('price', axis=1), df['price'])
```

```
Fitting 3 folds for each of 18 candidates, totalling 54 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:   11.7s
[Parallel(n_jobs=-1)]: Done  54 out of  54 | elapsed:   21.6s finished
```

```
GridSearchCV(cv=3, estimator=RandomForestRegressor(), n_jobs=-1,
             param_grid={'max_depth': [10, 20, 30, 40, 50, None],
                         'min_samples_split': [2, 5, 10]},
             verbose=2)
```
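The "18 candidates, totalling 54 fits" line follows directly from the grid: `GridSearchCV` tries every combination of the listed values (6 depths × 3 split sizes = 18), once per fold. A minimal pure-Python sketch of that expansion, using `itertools.product` as a stand-in for scikit-learn's internal grid handling:

```python
from itertools import product

# Same values as the notebook: np.linspace(10, 50, num=5) gives 10..50, plus None
param_grid = {'max_depth': [10, 20, 30, 40, 50, None],
              'min_samples_split': [2, 5, 10]}
n_folds = 3  # cv=3

# Each combination of the listed values is one candidate
candidates = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]

print(len(candidates))            # 18
print(len(candidates) * n_folds)  # 54
print(candidates[0])              # {'max_depth': 10, 'min_samples_split': 2}
```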

```python
rf_grid.best_params_
```

```
{'max_depth': 10, 'min_samples_split': 10}
```

```python
rf_grid.best_score_
```

```
0.5253271155337025
```
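`best_score_` is the mean cross-validated score of the best candidate. With no `scoring` argument, `GridSearchCV` falls back on the estimator's own `score` method, which is R² for `RandomForestRegressor`, so roughly 0.53 means the model explains about half of the variance in price on held-out folds. A small from-scratch sketch of the R² formula (1 − residual sum of squares / total sum of squares), written for illustration rather than taken from scikit-learn:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1 - ss_res / ss_tot


print(r_squared([1, 2, 3], [1, 2, 4]))  # 0.5
print(r_squared([1, 2, 3], [2, 2, 2]))  # 0.0, same as always predicting the mean
```

A score of 1.0 is a perfect fit, and a model worse than predicting the mean gets a negative score.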

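Retrain on the entire dataset. Note that the cell below fits a fresh `RandomForestRegressor()` with its default hyperparameters, not the `best_params_` found by the grid search.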

```python
reg = RandomForestRegressor()
reg = reg.fit(df.drop('price', axis=1), df['price'])
```
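To carry the tuned hyperparameters into the final model, the best parameters can be unpacked into a new estimator. With its default `refit=True`, `GridSearchCV` has also already refit the best configuration on all of the data passed to `fit` and exposes it as `best_estimator_`. The sketch below uses small synthetic data (a hypothetical stand-in for the housing columns), a reduced grid, and a fixed `random_state` so the two routes can be compared:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Hypothetical synthetic data standing in for bedrooms, bathrooms, sqft_living -> price
rng = np.random.default_rng(0)
X = rng.uniform([1, 1, 500], [6, 4, 4000], size=(200, 3))
y = 150 * X[:, 2] + 10_000 * X[:, 0] + rng.normal(0, 20_000, size=200)

# Small grid and few trees so the example runs quickly
grid = GridSearchCV(RandomForestRegressor(n_estimators=20, random_state=0),
                    param_grid={'max_depth': [5, None], 'min_samples_split': [2, 10]},
                    cv=3)
grid.fit(X, y)

# Option 1: build a fresh model from the tuned hyperparameters and fit on all data
final_model = RandomForestRegressor(n_estimators=20, random_state=0, **grid.best_params_)
final_model.fit(X, y)

# Option 2: reuse the model GridSearchCV already refit on all data (refit=True by default)
refit_model = grid.best_estimator_

print(grid.best_params_)
print(np.allclose(final_model.predict(X[:5]), refit_model.predict(X[:5])))
```

In the notebook this would correspond to `RandomForestRegressor(**rf_grid.best_params_)` or simply `rf_grid.best_estimator_`. Without a fixed `random_state` the two forests would differ slightly from run to run.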

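```python
import pickle

# Serialize the trained model so a separate serving process can load it later;
# the with-block makes sure the file is closed after writing
with open('final_prediction.pickle', 'wb') as f:
    pickle.dump(reg, f)
```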
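The notebook does not show the server behind `http://0.0.0.0:5000/api/`. As a minimal sketch under assumptions (an `/api/` route, port 5000, and a JSON body that is a list of feature rows, matching the request below), the pickled model could be served with Python's standard-library `http.server`. A real deployment would more commonly use a web framework such as Flask or FastAPI. The names `make_handler` and `serve` are hypothetical, and this version returns a JSON array of floats, so its response body is formatted differently from the quoted string in the output below.

```python
import json
import pickle
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(model):
    """Build a request handler class that answers POST /api/ using model.predict."""

    class PredictionHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # Only the assumed /api/ route is served
            if self.path != '/api/':
                self.send_error(404)
                return
            length = int(self.headers.get('Content-Length', 0))
            try:
                rows = json.loads(self.rfile.read(length))  # e.g. [[4, 4, 2000]]
            except ValueError:
                self.send_error(400, 'Body must be JSON')
                return
            prediction = model.predict(rows)
            body = json.dumps([float(p) for p in prediction]).encode('utf-8')
            self.send_response(200)
            self.send_header('Content-Type', 'application/json')
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return PredictionHandler


def serve(model_path='final_prediction.pickle', host='0.0.0.0', port=5000):
    """Load the pickled model and serve predictions until interrupted (blocks)."""
    with open(model_path, 'rb') as f:
        model = pickle.load(f)
    HTTPServer((host, port), make_handler(model)).serve_forever()
```

Calling `serve()` from a separate script or terminal keeps it listening on port 5000, after which the client cells can post to it. Only unpickle files you trust, since loading a pickle can execute arbitrary code.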

#### Make an API Request

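```python
import requests
import json

# Assumes the prediction server is running locally on port 5000.
# 0.0.0.0 is the server's "all interfaces" bind address; 'localhost' or
# 127.0.0.1 is a more portable target for the client.
url = 'http://0.0.0.0:5000/api/'
```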

```python
# One house: bedrooms, bathrooms, sqft_living (same column order as in training)
data = [[4, 4, 2000]]
house_data = json.dumps(data)
headers = {'content-type': 'application/json', 'Accept-Charset': 'UTF-8'}
r = requests.post(url, data=house_data, headers=headers)
print(r, r.text)
```

```
<Response [200]> "[752538.92857143]"
```
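`requests` can also serialize the payload itself: passing `json=` instead of `data=` encodes the body and sets the `Content-Type: application/json` header. A sketch of that variant with a timeout and an HTTP-status check added; `predict_remote` is a hypothetical helper name, and the prepared request is built only to show what would be sent, so no running server is needed for that part:

```python
import requests

url = 'http://0.0.0.0:5000/api/'  # same endpoint as above; assumes the server is running
house = [[4, 4, 2000]]            # bedrooms, bathrooms, sqft_living

# json= serializes the payload and sets the Content-Type header
prepared = requests.Request('POST', url, json=house).prepare()
print(prepared.headers['Content-Type'])  # application/json
print(prepared.body)


def predict_remote(rows, url=url, timeout=5):
    """Send feature rows to the prediction API and return the decoded JSON response."""
    response = requests.post(url, json=rows, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of silently printing them
    return response.json()
```

With the server running, `predict_remote(house)` returns whatever JSON the server sends back.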



Predicting locally with the same model gives the same value as the API response:

```python
reg.predict(np.array([4, 4, 2000]).reshape(1, -1))
```

```
array([752538.92857143])
```

## Web Application

More examples:
- https://lyric-writer.herokuapp.com/
- [AI Matchmaker](http://52.91.45.65:8501/)

```python
ar = [1, 2, 5, 8, -4, -3, 7, 6, 5]
```

```python
ar[::2]
```

```
[1, 5, -4, 7, 5]
```

```python
ar[1::2]
```

```
[2, 8, -3, 6]
```

```python
counter = 0
for first, second in zip(ar[::2], ar[1::2]):
    if abs(first - second) == 1:
        counter += 1
counter
```

```
3
```
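The loop pairs each element at an even index with the element right after it: `ar[::2]` takes indices 0, 2, 4, and so on, while `ar[1::2]` takes indices 1, 3, 5, and so on. Because `zip` stops at the shorter slice, the unpaired trailing `5` is ignored. The same logic as a reusable function (the name is hypothetical):

```python
def count_pairs_differing_by_one(values):
    """Count pairs (values[0], values[1]), (values[2], values[3]), ... whose elements differ by exactly 1."""
    evens = values[::2]  # elements at indices 0, 2, 4, ...
    odds = values[1::2]  # elements at indices 1, 3, 5, ...
    # zip stops at the shorter slice, so an unpaired trailing element is skipped
    return sum(1 for a, b in zip(evens, odds) if abs(a - b) == 1)


ar = [1, 2, 5, 8, -4, -3, 7, 6, 5]
print(count_pairs_differing_by_one(ar))  # 3
```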

In [35]:
import re

In [37]:
test = "gh12cdy695m1"

In [39]:
re.findall('[0-9]+', test)

['12', '695', '1']
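`[0-9]+` matches each maximal run of consecutive ASCII digits, so `findall` returns the numbers as strings. To work with them as numbers, each match can be converted with `int`. (For `str` patterns, `\d+` would also match non-ASCII Unicode digits, which makes `[0-9]+` the stricter choice.) A short sketch:

```python
import re

test = "gh12cdy695m1"

# Each maximal run of ASCII digits becomes one match, converted to an int
numbers = [int(match) for match in re.findall(r'[0-9]+', test)]
print(numbers)       # [12, 695, 1]
print(sum(numbers))  # 708
```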