# Deploying a Trained Model

Once you have trained and optimized a machine learning model to your satisfaction (or your boss's), you need to be able to deploy it in a way that allows stakeholders to actually *use* it. This will most likely be on a website or through some other application; either way, you won't want everyone coming back to your IPython notebook every time they want to make a prediction on new data.

When you deploy a model, the deployment environment should use the same version of Python and the same libraries that were used to train it. Even more important is preserving the model's parameterization. A model's parameters and hyperparameters are what we've truly been after this whole time. If we can extract a model's parameterization and move those parameters to a similar deployment environment (whether that is a website, a phone app, or other hardware), we can make predictions just like usual. Then **if** (a big *if* here) the new data that your model sees in this new environment was represented well by the training data, it might even give good predictions.
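To make "parameterization" concrete, here is a minimal sketch, assuming a binary logistic regression fitted on synthetic stand-in data (`X_demo`, `y_demo`, and `clf` are illustrative names, not part of this notebook). It pulls the learned weights and intercept out of the fitted estimator and recomputes the predictions with plain NumPy, which is roughly what any deployment target would need to do.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption; not the dataset used in this notebook)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# The learned parameters: one weight per feature plus a single intercept
weights = clf.coef_[0]
bias = clf.intercept_[0]

# Recompute the predictions without calling the estimator
scores = X_demo @ weights + bias          # linear decision scores
probs = 1.0 / (1.0 + np.exp(-scores))     # sigmoid turns scores into probabilities
manual_preds = (scores > 0).astype(int)   # positive score -> class 1

# Fraction of rows where the manual predictions agree with the estimator
agreement = (manual_preds == clf.predict(X_demo)).mean()
print(agreement)
```

For a linear model like this one, the weights and intercept are the whole story. More complex estimators carry more state, which is one reason to serialize the fitted object rather than copy numbers by hand.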
___

Today's code challenge is all about getting your model's parameterization out of your IPython notebook in a way that preserves it and allows the model to be redeployed at a later date.

In [0]:
# LAMBDA SCHOOL
# 
# MACHINE LEARNING
#
# MIT LICENSE

#Code to make your life a little easier
import pandas as pd
import numpy as np
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
import joblib  # standalone package; sklearn.externals.joblib was removed in scikit-learn 0.23

url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
df = pd.read_csv(url, names=names)

Y = df['class']
X = df.drop(['class'], axis=1)

## The Code Challenge:

Your challenge for today is to:

1) Train a logistic regression classifier on the Pima Indians diabetes dataset. Do this using a C value of 20 and a penalty of "L1" for your hyperparameters. We'll pretend that we have already done hyperparameter tuning on this model previously for time's sake. 

2) Use joblib (imported from `sklearn.externals` in older scikit-learn releases, and as the standalone `joblib` package in newer ones) to save a serialized version of your fitted model (which will contain the all-important model parameterization) to your machine.

3) Upload the saved model from your machine and use it to make predictions on some "new" data. I have generated some new (albeit fake) data which lives in this github repository: 

[https://raw.githubusercontent.com/ryanleeallred/fake-data/master/fake-data.csv](https://raw.githubusercontent.com/ryanleeallred/fake-data/master/fake-data.csv)



In [0]:
from google.colab import files

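# liblinear supports the L1 penalty; the default solver in newer scikit-learn (lbfgs) does not
model = LogisticRegression(C=20, penalty='l1', solver='liblinear')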
model.fit(X, Y)

# save file in colab environment
joblib.dump(model, 'model.pkl')

# download to local machine
files.download('model.pkl')
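Since the deployment environment should match the library versions used for training, one option is to store those versions next to the model. The following is a minimal sketch under stated assumptions: the `bundle` dictionary layout and the file name `model_bundle.pkl` are hypothetical choices, and the model is fitted on synthetic stand-in data rather than the Pima dataset.

```python
import os
import sys
import tempfile

import joblib
import sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption; not the dataset used in this notebook)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# Hypothetical layout: the fitted model plus the versions it was trained with
bundle = {
    "model": clf,
    "sklearn_version": sklearn.__version__,
    "python_version": sys.version.split()[0],
}

# Write to a temporary directory so the sketch leaves no files behind in the working directory
path = os.path.join(tempfile.mkdtemp(), "model_bundle.pkl")
joblib.dump(bundle, path)

# Later, in the deployment environment: load and compare versions before predicting
restored = joblib.load(path)
if restored["sklearn_version"] != sklearn.__version__:
    print("Warning: model was trained with scikit-learn", restored["sklearn_version"])
restored_model = restored["model"]
```

A version mismatch does not always break a pickled model, but checking for one up front makes it easier to explain odd behavior later.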

In [3]:
# not necessary when in the same colab instance, but I'll pretend I'm in a new one

# files.upload() returns the uploaded file contents, not a fitted model
uploaded = files.upload()

# colab saves the upload as 'model (1).pkl' because model.pkl already exists

Saving model.pkl to model (1).pkl


In [0]:
loaded_model = joblib.load('model (1).pkl') 
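As an alternative to loading by file name, the model could be loaded straight from the uploaded bytes. This is a sketch that assumes `files.upload()` returns a dictionary mapping file names to file contents as bytes; that dictionary is simulated here with an in-memory buffer so the example runs outside Colab. The names `uploaded_demo` and `clf` are illustrative.

```python
import io

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption; not the dataset used in this notebook)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# Serialize the model into memory instead of onto disk
buffer = io.BytesIO()
joblib.dump(clf, buffer)

# Stand-in for the assumed return value of files.upload(): {file name: bytes}
uploaded_demo = {"model.pkl": buffer.getvalue()}

# Take the first (and only) uploaded file and load the model from its bytes
file_name, payload = next(iter(uploaded_demo.items()))
loaded_from_bytes = joblib.load(io.BytesIO(payload))

print(file_name, type(loaded_from_bytes).__name__)
```

Loading from the bytes avoids depending on whatever name Colab gave the saved copy, such as `model (1).pkl`.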

In [5]:
new_data = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/fake-data/master/fake-data.csv')
new_data.head()

Unnamed: 0,preg,plas,pres,skin,test,mass,pedi,age
0,1,134,77,0,0,24.6,1.412,29
1,0,168,77,0,0,42.6,1.141,68
2,2,190,72,0,0,29.4,0.419,25
3,4,132,55,0,0,23.8,0.398,30
4,5,166,80,0,0,33.9,0.224,58


In [6]:
yhat = loaded_model.predict(new_data)
yhat

array([0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1,
       1, 1, 1, 1, 0])
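The model only gives sensible output if the new data has the same columns, in the same order, as the training data. The sketch below shows one way to enforce that before calling `predict`. The helper `prepare_features` and the `row_id` column are hypothetical, and the data is synthetic; only the feature names are taken from this notebook.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Feature names used when training in this notebook
feature_names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age']

# Synthetic stand-in for the training data (an assumption)
X_arr, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
X_train = pd.DataFrame(X_arr, columns=feature_names)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_demo)


def prepare_features(frame, expected_columns):
    """Hypothetical helper: keep only the expected columns, in training order."""
    missing = [c for c in expected_columns if c not in frame.columns]
    if missing:
        raise ValueError(f"new data is missing columns: {missing}")
    return frame[expected_columns]


# Simulate incoming data with reversed column order and one extra column
incoming = X_train.sample(5, random_state=1)[feature_names[::-1]].copy()
incoming["row_id"] = range(len(incoming))

aligned = prepare_features(incoming, feature_names)
preds = clf.predict(aligned)                # hard class labels
probs = clf.predict_proba(aligned)[:, 1]    # probability of class 1
print(preds, probs.round(3))
```

Returning probabilities alongside the labels lets whoever consumes the predictions see how confident the model is, which matters when the new data may not resemble the training data.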