# EZ-Pickle for the Flask Lesson
___

The `pickle` library allows us to serialize almost any Python object. This saves the object, exactly as it sits in our code, to an actual file that we can load up later (or even send to someone else). Serialization has many uses; here we use it to save a trained model so that another script (the Flask app) can load it and use it later on.

Since everything in Python is an object, (almost) anything can be serialized; lambdas and open file handles are common exceptions.
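A quick way to see this without touching the disk is `pickle.dumps` / `pickle.loads`, which serialize to and from a `bytes` object instead of a file. The sketch below round-trips an ordinary dictionary (made-up example data) and shows a lambda failing to pickle: pickle stores functions by their importable name, and a lambda has none.

```python
import pickle

# an ordinary Python object (hypothetical example data)
house = {'Overall Qual': 7, 'Full Bath': 2, 'Garage Area': 480.0}

# serialize to bytes, then rebuild an equal (but separate) object
data = pickle.dumps(house)
restored = pickle.loads(data)
print(type(data).__name__)      # bytes
print(restored == house)        # True
print(restored is house)        # False

# functions are pickled by name, so an anonymous lambda cannot be pickled
try:
    pickle.dumps(lambda x: x + 1)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError, TypeError):
    lambda_pickled = False
print(lambda_pickled)           # False
```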

**Imports**

`pickle` is part of Python's standard library, so there is nothing to install! Simply `import pickle` and you are good to go.

In [47]:
import pickle

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

**Read in data and fit a model**

We are fitting a linear regression model on the Ames housing training data (`ames.csv`), using degree-2 polynomial features that are then standardized. This model will power a web form in the Flask demo.

In [2]:
df = pd.read_csv('ames.csv')

In [3]:
df['Bedroom AbvGr']

0       3
1       4
2       3
3       3
4       3
       ..
2046    3
2047    1
2048    3
2049    3
2050    3
Name: Bedroom AbvGr, Length: 2051, dtype: int64

In [48]:
# read in the data
df = pd.read_csv('ames.csv')

# pick some columns, drop rows with nulls in those columns
good_cols = ['Overall Qual', 'Full Bath', 'Garage Area', 'Lot Area', 'Year Built', 'Bedroom AbvGr']
df.dropna(subset=good_cols, inplace=True)

# set up feature matrix and target vector
X = df[good_cols]
y = df['SalePrice']

# expand the 6 features into all degree-2 terms (bias, originals, squares, pairwise products)
poly = PolynomialFeatures()
X = poly.fit_transform(X)   # X is now a NumPy array with 28 columns

# standardize each column to mean 0 and standard deviation 1
ss = StandardScaler()
X = ss.fit_transform(X)

# instantiate the model
model_to_be_pickled = LinearRegression()

# fit the model
model_to_be_pickled.fit(X, y)

# print out the score (R^2 on the training data) and coefficients
print(f'The model explains {100*model_to_be_pickled.score(X, y):.2f}% of the variance' + '\n-----\n' + 'Coefficients:')
print(np.round(model_to_be_pickled.coef_, 4))

The model explains 80.53% of the variance
-----
Coefficients:
[      0.      -88540.4107   18291.6169 -177973.794   -50464.6328
  148739.3374  130088.957    52927.5489   12953.5379   29503.5288
   15236.6755   62479.4057   -1269.8859   17602.2609    3751.4623
    7874.6763  -41386.4002   -4190.6691    2577.8679  -29679.8823
  174671.5905    7391.0331  -16341.3826   78268.3088    -451.4726
 -142161.467  -136334.1701   10469.1166]


**Pickling**  
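Everything above this was just 'normal' modeling. Now we will actually save the model to a file with the `.p` extension (the extension is only a convention; `.pkl` is also common).
- `open(filename, mode)`: opens a file on our computer and returns a file object; the mode string controls how it is opened, e.g. `'wb'` to write bytes and `'rb'` to read bytes.
- `pickle.dump(object, file)`: serializes an object and writes the bytes to a file opened in binary write mode.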
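Putting `open()` together with `pickle.dump()` and `pickle.load()` inside `with` blocks (so each file is closed for us) gives the round trip below. It writes to a temporary directory so it does not overwrite a real `model.p`; the dictionary standing in for a model is a made-up example.

```python
import os
import pickle
import tempfile

# a stand-in object; any picklable object works the same way
obj = {'coef': [1.5, -2.0], 'intercept': 3.0}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'model.p')

    # 'wb' = write binary: pickle produces bytes, not text
    with open(path, 'wb') as f:
        pickle.dump(obj, f)

    # 'rb' = read binary: read the bytes back and rebuild the object
    with open(path, 'rb') as f:
        loaded = pickle.load(f)

    size_on_disk = os.path.getsize(path)

print(loaded == obj)   # True
```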

In [49]:
# put the two functions above together, using 'wb' (write binary) mode;
# the with-block closes the file once the model has been written
with open('model.p', 'wb') as f:
    pickle.dump(model_to_be_pickled, f)

**Check our work**

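Let's load our model back in and check that the score and coefficients match.
- `pickle.load(file)`: deserializes the stored object from a file opened in binary read mode (`'rb'`) and returns it. Only load pickles from sources you trust: unpickling can execute arbitrary code.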
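Rather than eyeballing the printed numbers, we could compare the two models directly. The sketch below fits a `LinearRegression` on small synthetic data (not the Ames data), round-trips it through `pickle.dumps` / `pickle.loads`, and checks that the coefficients, intercept, and predictions are identical.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# synthetic data standing in for the Ames features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + 4.0

original = LinearRegression().fit(X_demo, y_demo)

# same round trip as dump/load, but through an in-memory bytes object
copy = pickle.loads(pickle.dumps(original))

# an unpickled model is a new object with exactly the same learned parameters
same_coef = np.array_equal(original.coef_, copy.coef_)
same_intercept = original.intercept_ == copy.intercept_
same_preds = np.array_equal(original.predict(X_demo), copy.predict(X_demo))
print(same_coef, same_intercept, same_preds)   # True True True
```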

In [50]:
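# use the function above with open() and 'rb' (read binary) mode to get our model back
with open('model.p', 'rb') as f:
    model_that_was_pickled = pickle.load(f)

# X is a NumPy array after the transforms (it has no .columns), so print the raw coefficient array
print(f'The model explains {100*model_that_was_pickled.score(X, y):.2f}% of the variance' + '\n-----\n' + 'Coefficients:')
print(np.round(model_that_was_pickled.coef_, 4))

The model explains 80.53% of the variance
-----
Coefficients:
[      0.      -88540.4107   18291.6169 -177973.794   -50464.6328
  148739.3374  130088.957    52927.5489   12953.5379   29503.5288
   15236.6755   62479.4057   -1269.8859   17602.2609    3751.4623
    7874.6763  -41386.4002   -4190.6691    2577.8679  -29679.8823
  174671.5905    7391.0331  -16341.3826   78268.3088    -451.4726
 -142161.467  -136334.1701   10469.1166]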
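One caveat for the Flask demo: `model.p` contains only the `LinearRegression`. To turn raw form inputs into the 28 scaled columns the model expects, the app would also need the fitted `PolynomialFeatures` and `StandardScaler`. One option (a sketch, not what this notebook does) is to bundle all three steps in a scikit-learn `Pipeline` and pickle that single object. The data below is synthetic, and the file name `pipeline.p` and the example house are hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

cols = ['Overall Qual', 'Full Bath', 'Garage Area', 'Lot Area', 'Year Built', 'Bedroom AbvGr']

# synthetic stand-in for the Ames features and SalePrice
rng = np.random.default_rng(42)
n = 200
X_raw = pd.DataFrame({
    'Overall Qual': rng.integers(1, 11, size=n),
    'Full Bath': rng.integers(0, 4, size=n),
    'Garage Area': rng.uniform(0, 1000, size=n),
    'Lot Area': rng.uniform(2000, 20000, size=n),
    'Year Built': rng.integers(1900, 2011, size=n),
    'Bedroom AbvGr': rng.integers(0, 6, size=n),
})
y_demo = 20000 * X_raw['Overall Qual'] + 50 * X_raw['Garage Area'] + rng.normal(0, 1000, size=n)

# the same three steps as above, fitted together as one object
pipe = make_pipeline(PolynomialFeatures(), StandardScaler(), LinearRegression())
pipe.fit(X_raw, y_demo)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'pipeline.p')   # hypothetical file name
    with open(path, 'wb') as f:
        pickle.dump(pipe, f)
    with open(path, 'rb') as f:
        loaded_pipe = pickle.load(f)

# the loaded pipeline accepts raw inputs, e.g. one row built from a web form
new_house = pd.DataFrame([[7, 2, 480.0, 9000.0, 1995, 3]], columns=cols)
print(loaded_pipe.predict(new_house))
```

With this design the Flask app only has to unpickle one file and pass it the raw column values; the preprocessing cannot drift out of sync with the model.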