[View in Colaboratory](https://colab.research.google.com/github/schwaaweb/aimlds1_07-TheMachineLearningFramework/blob/master/Th07_CC--DJ--Deploying_A_Trained_Model.ipynb)

# Deploying a Trained Model

Once you have trained and optimized a machine learning model to your satisfaction (or your boss's), you need to be able to take the trained model and deploy it in a way that allows it to actually be *used* by stakeholders. This will most likely be on a website or through some other application; you most likely won't want everyone coming back to your IPython notebook every time they want to make a prediction on new data.

When deploying a model, the deployment environment needs to use the same version of Python and the same libraries that were used to train it, but what matters even more is that we preserve the model's parameterization. A model's parameters and hyperparameters are what we've truly been after this whole time. If we could extract a model's parameterization and move those parameters to a similar deployment environment, regardless of whether that is on a website, phone app, or other hardware, we would be able to make predictions just like usual. Then **if** (a big *if* here) the new data that your model sees in this new environment was represented well by the training data, it might even give good predictions.
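
To make the idea of "moving the parameterization" concrete, here is a minimal sketch using only the Python standard library. It assumes a logistic regression whose parameters are a list of coefficients plus an intercept; the coefficient and intercept values below are made up for illustration and do not come from any fitted model. The feature names match the Pima dataset columns used in this notebook.

```python
import json
import math

# Hypothetical parameters, for illustration only. In practice these would be
# read from a fitted model rather than typed in by hand.
params = {
    "feature_names": ["preg", "plas", "pres", "skin", "test", "mass", "pedi", "age"],
    "coef": [0.1, 0.03, -0.01, 0.0, 0.0, 0.08, 0.9, 0.01],
    "intercept": -8.0,
}

# "Training side": serialize the parameters to a portable string.
serialized = json.dumps(params)

# "Deployment side": restore the parameters and predict without the original model object.
loaded = json.loads(serialized)


def predict_proba(row, p):
    """Return the logistic-regression probability of class 1 for one row (a dict)."""
    z = p["intercept"] + sum(w * row[name] for w, name in zip(p["coef"], p["feature_names"]))
    return 1.0 / (1.0 + math.exp(-z))


row = {"preg": 6, "plas": 148, "pres": 72, "skin": 35,
       "test": 0, "mass": 33.6, "pedi": 0.627, "age": 50}
prob = predict_proba(row, loaded)
label = int(prob >= 0.5)  # threshold the probability at 0.5 to get a class label
print(prob, label)
```

The point of the sketch is only that the numbers, not the notebook, carry the model. Serializing the whole fitted estimator, as the challenge asks, saves you from reimplementing the prediction function by hand.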
___

Today's code challenge is all about getting your model's parameterization out of your IPython notebook in a way that will preserve it and allow the model to be redeployed at a later date.
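
Because redeploying at a later date depends on matching Python and library versions, one option is to record those versions next to the saved model. The following is a rough sketch, not part of the original challenge; the helper name `describe_environment` and the package list are assumptions chosen for illustration.

```python
import json
import platform
from importlib import metadata


def describe_environment(packages):
    """Collect the Python version and the installed version of each named package."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {"python": platform.python_version(), "packages": versions}


# Assumed list of packages relevant to this notebook.
env_info = describe_environment(["scikit-learn", "numpy", "pandas", "joblib"])
print(json.dumps(env_info, indent=2))
```

Writing this dictionary to a small JSON file alongside the serialized model gives whoever redeploys it a record of the environment it was trained in.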

In [1]:
# LAMBDA SCHOOL
# 
# MACHINE LEARNING
#
# MIT LICENSE

#Code to make your life a little easier
import pandas as pd
import numpy as np
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
# sklearn.externals.joblib was removed in scikit-learn 0.23; import the standalone package instead
import joblib

# I'm going to use a copy of the data I have in my local anaconda environment
#url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
df = pd.read_csv('pima-indians-diabetes.data.csv', names=names)

Y = df['class']
X = df.drop(['class'], axis=1)

## The Code Challenge:

Your challenge for today is to:

1) Train a logistic regression classifier on the Pima Indians diabetes dataset. Do this using a C value of 20 and a penalty of "L1" for your hyperparameters. We'll pretend that we have already done hyperparameter tuning on this model previously for time's sake. 

2) Use joblib (historically available as sklearn.externals.joblib) to save a serialized version of your fitted model (which will contain the all-important model parameterization) to your machine.

3) Upload the saved model from your machine and use it to make predictions on some "new" data. I have generated some new (albeit fake) data which lives in this github repository: 

[https://raw.githubusercontent.com/ryanleeallred/fake-data/master/fake-data.csv](https://raw.githubusercontent.com/ryanleeallred/fake-data/master/fake-data.csv)
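
The three steps above can be sketched roughly as follows. To stay runnable anywhere, this sketch trains on synthetic data standing in for the eight Pima feature columns rather than the real CSV, and it writes to a temporary directory; the file name `logreg_model.joblib` and the choice of the `liblinear` solver are assumptions (the default `lbfgs` solver does not support an L1 penalty).

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the 8 feature columns (assumption: NOT the real Pima data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 8))
y_demo = (X_demo[:, 0] + 0.5 * X_demo[:, 1] > 0).astype(int)

# Step 1: fit with the hyperparameters named in the challenge.
# liblinear is one of the solvers that supports the L1 penalty.
model = LogisticRegression(C=20, penalty="l1", solver="liblinear")
model.fit(X_demo, y_demo)

# Step 2: serialize the fitted model (including its learned parameters) to disk.
model_path = os.path.join(tempfile.mkdtemp(), "logreg_model.joblib")
joblib.dump(model, model_path)

# Step 3: load the model back and predict on "new" rows.
restored = joblib.load(model_path)
X_new = rng.normal(size=(5, 8))
original_preds = model.predict(X_new)
restored_preds = restored.predict(X_new)
print(restored_preds)
```

With the real data, `X_demo` and `y_demo` would be replaced by the `X` and `Y` defined in the setup cell, and `X_new` by the fake data file.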



In [12]:
##### Your Code Here #####
#df.shape
#df.head()

y = df[['class']]
X = df.drop(['class'], axis=1)
print(y.shape)
print(y.head())
print(X.shape)
print(X.head())
print(df.shape)

(768, 1)
   class
0      1
1      0
2      1
3      0
4      1
(768, 8)
   preg  plas  pres  skin  test  mass   pedi  age
0     6   148    72    35     0  33.6  0.627   50
1     1    85    66    29     0  26.6  0.351   31
2     8   183    64     0     0  23.3  0.672   32
3     1    89    66    23    94  28.1  0.167   21
4     0   137    40    35   168  43.1  2.288   33
(768, 9)


In [6]:
#!wget https://raw.githubusercontent.com/ryanleeallred/fake-data/master/fake-data.csv
dffd = pd.read_csv('fake-data.csv')

In [15]:
dffd.shape

(49, 8)

In [16]:
X.shape

(768, 8)
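
Matching shapes are a useful first check, but they do not guarantee that the columns are the same or in the same order. A small sketch of a stricter check follows; the helper `align_features` is hypothetical, and the three rows are copied from the fake-data rows displayed at the end of this notebook, including the `Unnamed: 0` column header that appears there.

```python
import pandas as pd

# Feature columns the model was trained on, in training order.
TRAINING_FEATURES = ["preg", "plas", "pres", "skin", "test", "mass", "pedi", "age"]


def align_features(new_df, feature_names):
    """Return new_df restricted to feature_names, in that order; fail loudly if any are missing."""
    missing = [c for c in feature_names if c not in new_df.columns]
    if missing:
        raise ValueError(f"new data is missing columns: {missing}")
    return new_df.loc[:, feature_names]


new_rows = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],  # extra index-like column that the model never saw
    "preg": [1, 0, 2],
    "plas": [134, 168, 190],
    "pres": [77, 77, 72],
    "skin": [0, 0, 0],
    "test": [0, 0, 0],
    "mass": [24.6, 42.6, 29.4],
    "pedi": [1.412, 1.141, 0.419],
    "age": [29, 68, 25],
})

aligned = align_features(new_rows, TRAINING_FEATURES)
print(aligned.shape)  # (3, 8)
```
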

In [20]:
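# Note: LogisticRegression() uses the defaults (C=1.0, L2 penalty),
# not the C=20 / L1 setting named in the challenge.
model = LogisticRegression()
# Hold out 7% of the rows for testing; random_state fixes the split.
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.07, random_state=42)
# y_train is a one-column DataFrame, so flatten it to the 1-D array fit() expects.
model.fit(X_train, np.ravel(y_train))
# score() returns the mean accuracy on the held-out rows.
result_test = model.score(X_test, y_test)
f"random_seed = 42  Accuracy:  {result_test}"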

'random_seed = 42  Accuracy:  0.7037037037037037'

In [25]:
yhat = model.predict(dffd)

In [26]:
yhat

array([0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1,
       1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1,
       1, 0, 1, 1, 0])

In [27]:
# .copy() makes an independent DataFrame; plain assignment would only alias dffd
dffdy = dffd.copy()

Unnamed: 0  preg  plas  pres  skin  test  mass   pedi  age
         0     1   134    77     0     0  24.6  1.412   29
         1     0   168    77     0     0  42.6  1.141   68
         2     2   190    72     0     0  29.4  0.419   25
         3     4   132    55     0     0  23.8  0.398   30
         4     5   166    80     0     0  33.9  0.224   58
         5     2   106    94     0     0  20.4  0.243   44
         6     4   135    68     0     0  43.8  0.806   68
         7     1    86    49     0     0  23.9  0.707   23
         8     1    87    85     0     0  26.2  0.988   22
         9     2   132    89     0     0  27.4  0.417   66
