
# Deploying a Trained Model

Once you have trained and optimized a machine learning model to your satisfaction (or your boss's), you need to deploy it in a way that lets stakeholders actually *use* it. This will most likely be on a website or through some other application; you probably won't want everyone coming back to your IPython notebook every time they want a prediction on new data.

A model should be deployed in an environment that uses the same versions of Python and of the libraries that were used to train it. What matters even more is preserving the model's parameterization: the fitted parameters, together with the hyperparameters used to fit them, are what we've been after this whole time. If we can extract that parameterization and move it to a similar deployment environment (a website, a phone app, or other hardware), we can make predictions just as we did in the notebook. Then **if** (a big *if* here) the new data your model sees in this new environment is well represented by the training data, it might even give good predictions.
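To make "parameterization" concrete, here is a minimal sketch of pulling the fitted weights and intercept out of a logistic regression and reproducing its predictions with NumPy alone. The synthetic data from `make_classification` and the helper name `predict_from_parameters` are assumptions for illustration; they are not part of the challenge below.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption; not the Pima dataset used below)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

clf = LogisticRegression(C=20, penalty="l1", solver="liblinear")
clf.fit(X_demo, y_demo)

# The learned parameterization: one weight per feature plus an intercept
weights = clf.coef_[0]
intercept = clf.intercept_[0]

def predict_from_parameters(X, weights, intercept):
    """Hypothetical helper: binary logistic regression predictions without sklearn."""
    z = X @ weights + intercept
    # sigmoid(z) > 0.5 exactly when z > 0, so the sign of z decides the class
    return (z > 0).astype(int)

manual = predict_from_parameters(X_demo, weights, intercept)
matches = np.array_equal(manual, clf.predict(X_demo))
print("Manual predictions match sklearn:", matches)
```

For a binary model with classes 0 and 1, those few numbers are all a deployment environment strictly needs; serializing the whole fitted object, as the challenge does, is simply the more convenient route.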
___

Today's code challenge is all about getting your model's parameterization out of your IPython notebook in a way that preserves it and allows the model to be redeployed at a later date.

In [0]:
# LAMBDA SCHOOL
# 
# MACHINE LEARNING
#
# MIT LICENSE

#Code to make your life a little easier
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23

url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
df = pd.read_csv(url, names=names)

Y = df['class']
X = df.drop(['class'], axis=1)

## The Code Challenge:

Your challenge for today is to:

1) Train a logistic regression classifier on the Pima Indians diabetes dataset. Do this using a C value of 20 and a penalty of "l1" for your hyperparameters (in recent scikit-learn releases the L1 penalty requires a compatible solver such as "liblinear"). We'll pretend that we have already done hyperparameter tuning on this model previously for time's sake.

2) Use joblib (bundled with older scikit-learn releases as sklearn.externals.joblib) to save a serialized version of your fitted model (which will contain the all-important model parameterization) to your machine.

3) Upload the saved model from your machine and use it to make predictions on some "new" data. I have generated some new (albeit fake) data which lives in this GitHub repository:

[https://raw.githubusercontent.com/ryanleeallred/fake-data/master/fake-data.csv](https://raw.githubusercontent.com/ryanleeallred/fake-data/master/fake-data.csv)



## Train Logistic Regression

In [16]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=.2, random_state=12)
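# The L1 penalty needs a solver that supports it; newer scikit-learn defaults to lbfgs, which does not
model = LogisticRegression(C=20, penalty="l1", solver="liblinear")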
model.fit(X_train, Y_train)
result = model.score(X_test, Y_test)
print("Accuracy: %.3f%%" % (result*100.0))

Accuracy: 81.169%


## Save the model to "disk" (Colab/Google Drive in our case)

In [17]:
filename = 'finalized_model.sav'
joblib.dump(model, filename) 

['finalized_model.sav']

## Load model from "disk" and score it to prove that it's working.

In [20]:
# Do this in some other environment typically:
 
loaded_model = joblib.load(filename)
result = loaded_model.score(X_test, Y_test)
print(result)

# The X_test and Y_test above ^ are still coming from my colab.

0.8116883116883117
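Matching accuracy is good evidence, but you can also check directly that the parameters themselves survived the trip. The sketch below is self-contained and uses synthetic data and a temporary file named `demo_model.joblib`, both of which are assumptions for illustration rather than part of the notebook above.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption; not the Pima dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
clf = LogisticRegression(C=20, penalty="l1", solver="liblinear").fit(X_demo, y_demo)

# Serialize to a throwaway directory, then load the model back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.joblib")
    joblib.dump(clf, path)
    restored = joblib.load(path)

# The restored model should carry identical weights and give identical predictions
same_weights = np.array_equal(clf.coef_, restored.coef_)
same_predictions = np.array_equal(clf.predict(X_demo), restored.predict(X_demo))
print("Weights preserved:", same_weights)
print("Predictions preserved:", same_predictions)
```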


In [11]:
# Load fake data
fake_url = "https://raw.githubusercontent.com/ryanleeallred/fake-data/master/fake-data.csv"
X = pd.read_csv(fake_url)

# Y_test = fake['churn']

predicted_values = model.predict(X)

# print(predicted_values)

# Copy the features so adding the prediction column does not modify X
with_predictions = X.copy()
with_predictions['churn'] = predicted_values

# View full dataframe with churn predictions
with_predictions.head()

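| Unnamed: 0 | preg | plas | pres | skin | test | mass | pedi | age | churn |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 134 | 77 | 0 | 0 | 24.6 | 1.412 | 29 | 0 |
| 1 | 0 | 168 | 77 | 0 | 0 | 42.6 | 1.141 | 68 | 1 |
| 2 | 2 | 190 | 72 | 0 | 0 | 29.4 | 0.419 | 25 | 1 |
| 3 | 4 | 132 | 55 | 0 | 0 | 23.8 | 0.398 | 30 | 0 |
| 4 | 5 | 166 | 80 | 0 | 0 | 33.9 | 0.224 | 58 | 1 |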
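A deployed model only behaves sensibly if new data arrives with the same feature columns, in the same order, as the training data. Below is a minimal sketch of a guard for that, assuming the eight feature names from the `names` list used at training time. The helper `align_features` is hypothetical, and the two rows are copied from the output above, with an extra `Unnamed: 0` column added to show that stray columns get dropped.

```python
import pandas as pd

# Feature names as used when training (taken from the `names` list, minus the target)
FEATURE_NAMES = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age']

def align_features(frame, feature_names):
    """Hypothetical helper: keep only the expected columns, in training order."""
    missing = [name for name in feature_names if name not in frame.columns]
    if missing:
        raise ValueError(f"New data is missing expected columns: {missing}")
    return frame[feature_names]

# Two rows from the output above, with shuffled columns and a stray index-like column
new_data = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "age": [29, 68],
    "pedi": [1.412, 1.141],
    "mass": [24.6, 42.6],
    "test": [0, 0],
    "skin": [0, 0],
    "pres": [77, 77],
    "plas": [134, 168],
    "preg": [1, 0],
})

aligned = align_features(new_data, FEATURE_NAMES)
print(list(aligned.columns))
```

Calling something like `loaded_model.predict(aligned)` instead of predicting on the raw frame would then fail loudly on missing columns rather than silently feeding the model misordered features.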


In [0]:
# Get the model out of colab-land and onto my machine.
from google.colab import files

files.download('finalized_model.sav')

## Running the model on a local Python environment

If your local environment has different Python or library versions than your notebook, you'll need something like pipenv to control them. Luckily, my local versions were all up to date (I had to wipe my machine last month), so we didn't run into any big issues.

## Here is a pretty great pipenv tutorial
[http://docs.python-guide.org/en/latest/dev/virtualenvs/](http://docs.python-guide.org/en/latest/dev/virtualenvs/)
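Whichever tool you use to pin versions, it helps to know which versions the model was trained with in the first place. One possible approach, sketched here with the standard library only, is to record a small version snapshot next to the saved model and compare it when loading. The function names and the idea of a JSON sidecar are assumptions for illustration, not something the original workflow does.

```python
import json
import platform
from importlib import metadata

def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def environment_snapshot(packages=("scikit-learn", "joblib", "numpy", "pandas")):
    """Collect the Python version and the versions of the given packages."""
    return {
        "python": platform.python_version(),
        "packages": {name: installed_version(name) for name in packages},
    }

def version_mismatches(saved, current):
    """Return {name: (saved_version, current_version)} for every difference."""
    mismatches = {}
    if saved["python"] != current["python"]:
        mismatches["python"] = (saved["python"], current["python"])
    for name, saved_version in saved["packages"].items():
        current_version = current["packages"].get(name)
        if saved_version != current_version:
            mismatches[name] = (saved_version, current_version)
    return mismatches

# Training side: serialize the snapshot (in practice, write it to a file beside the model)
snapshot = environment_snapshot()
serialized = json.dumps(snapshot, indent=2)

# Deployment side: read the snapshot back and compare with the current environment
restored = json.loads(serialized)
print(version_mismatches(restored, environment_snapshot()))
```

An empty dictionary means the two environments agree on everything that was recorded; anything else tells you exactly which versions to pin before trusting the loaded model.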

In [0]:
# Use this code in a local python environment to prove to the students that the exported parameterization works.

# Load Libraries
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
import pandas as pd

# Load fake data
fake_url = "https://raw.githubusercontent.com/ryanleeallred/fake-data/master/fake-data.csv"
X = pd.read_csv(fake_url)

# Load previously exported model
filename='finalized_model.sav'
loaded_model = joblib.load(filename)

# predict some values
predicted_values = loaded_model.predict(X)

# add predictions to dataframe
# Copy the features so adding the prediction column does not modify X
with_predictions = X.copy()
with_predictions['churn'] = predicted_values

# export dataframe to csv
with_predictions.to_csv('fake_predictions.csv', index=False)