## Importing pickle
pickle is a module in Python's standard library that serializes ("pickles") most built-in Python objects. You can import it like this:

In [2]:
import pickle

In [3]:
data_object = {
    'a': [1, 2.0, 3, 4+6j],
    'b': ('character string', b'byte string'),
    'c': {None, True, False}
}

## Writing Objects to Pickle

In [4]:
with open('data.pickle', 'wb') as f:
    pickle.dump(data_object, f)

## Importing Objects from Pickle Files

In [6]:
with open('data.pickle', 'rb') as f:
    data_object2 = pickle.load(f)
data_object2

{'a': [1, 2.0, 3, (4+6j)],
 'b': ('character string', b'byte string'),
 'c': {False, None, True}}
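The same round trip can also be done in memory, which is handy for checking that an object survives pickling before writing anything to disk. A minimal sketch, assuming the same `data_object` as above; `payload` and `restored` are illustrative names:

```python
import pickle

data_object = {
    'a': [1, 2.0, 3, 4+6j],
    'b': ('character string', b'byte string'),
    'c': {None, True, False},
}

# dumps returns the pickled bytes instead of writing them to a file
payload = pickle.dumps(data_object, protocol=pickle.HIGHEST_PROTOCOL)

# loads rebuilds a new object from those bytes
restored = pickle.loads(payload)

print(type(payload).__name__)    # the serialized form is a bytes object
print(restored == data_object)   # equal contents
print(restored is data_object)   # but a distinct object
```

The loaded object compares equal to the original but is a separate copy, which is why `data_object2` above can be modified without affecting `data_object`.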

Important reminder: DO NOT load pickle files unless you trust the source (e.g. you created them yourself). Unpickling can execute arbitrary code, and pickle has no built-in security constraints.
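One way to reduce (not eliminate) this risk is to restrict which globals a pickle is allowed to reference by overriding `Unpickler.find_class`, a technique described in the Python documentation. The sketch below is only an illustration: the allow-list `SAFE_BUILTINS` and the helper `restricted_loads` are hypothetical choices, and a restricted loader like this would refuse scikit-learn models, so trusting the source remains the main defense.

```python
import builtins
import io
import pickle

# Hypothetical allow-list: the only globals a pickle may reference.
SAFE_BUILTINS = {"set", "frozenset", "complex", "range", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks for a global by name.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads, but rejects globals outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain data such as the dictionary used earlier still loads.
safe_bytes = pickle.dumps({'a': [1, 2.0, 3, 4+6j], 'c': {None, True, False}})
print(restricted_loads(safe_bytes))

# A pickle that references a disallowed global (here the builtin eval) is rejected.
try:
    restricted_loads(pickle.dumps(eval))
except pickle.UnpicklingError as exc:
    print("Rejected:", exc)
```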

## Instantiating and Fitting a Model

In [7]:
from sklearn.linear_model import LinearRegression

# y = x + 1
X = [[1],[2],[3],[4],[5]]
y = [2, 3, 4, 5, 6]

model = LinearRegression()
model.fit(X, y)

print(f"Fitted model is y = {model.coef_[0]}x + {model.intercept_}")
print(model.predict([[7], [8], [9]]))

Fitted model is y = 1.0000000000000002x + 0.9999999999999991
[ 8.  9. 10.]


## Importing joblib

In [8]:
import joblib
with open('regression_model.pkl', 'wb') as f:
    joblib.dump(model, f)

## Importing Objects with joblib

In [10]:
with open('regression_model.pkl', 'rb') as f:
    model2 = joblib.load(f)
    
print(f"Loaded model is y = {model2.coef_[0]}x + {model2.intercept_}")
print(model2.predict([[10], [11], [12]]))

Loaded model is y = 1.0000000000000002x + 0.9999999999999991
[11. 12. 13.]


## Pickling and Deploying Pipelines

### Creating a Cloud Function

Let's go ahead and create one! We are going to use the format required by Google Cloud Functions.

In order to deploy a model, you will need:

- A pickled model file
- A Python file defining the function
- A requirements file

In [11]:
import pandas as pd
from sklearn.datasets import load_iris

data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target, name="class")
pd.concat([X, y], axis=1)

,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),class
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0
...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,2
146,6.3,2.5,5.0,1.9,2
147,6.5,3.0,5.2,2.0,2
148,6.2,3.4,5.4,2.3,2


In [12]:
X.describe()

,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm)
count,150.0,150.0,150.0,150.0
mean,5.843333,3.057333,3.758,1.199333
std,0.828066,0.435866,1.765298,0.762238
min,4.3,2.0,1.0,0.1
25%,5.1,2.8,1.6,0.3
50%,5.8,3.0,4.35,1.3
75%,6.4,3.3,5.1,1.8
max,7.9,4.4,6.9,2.5


Next, we fit a pipeline that standardizes the features and then applies logistic regression with scikit-learn's default regularization:

In [14]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression())
])
pipe.fit(X, y)

Now the pipeline is ready to make predictions on new data!

In the example below, we pass in the feature values from the first record in the training data. We know its class is 0, so this is a quick check that the pipeline works and returns the result we expect:

In [15]:
example = pd.DataFrame([[5.1, 3.5, 1.4, 0.2]])
example.columns = X.columns
example

,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm)
0,5.1,3.5,1.4,0.2


In [16]:
pipe.predict(example)[0]

0

It worked!

## Pickling Our Pipeline

In this case, because raw data needs to be preprocessed before our model can use it, we'll pickle the entire pipeline, not just the model.

In [17]:
with open("model.pkl", "wb") as f:
    joblib.dump(pipe, f)

## Creating Our Function

The serialized model is not sufficient on its own for the HTTP API server to know what to do. You also need to write the actual Python function for it to execute.

In [19]:
def iris_prediction(sepal_length, 
                    sepal_width, 
                    petal_length, 
                    petal_width,
                    features):
    """
    Given sepal length, sepal width, petal length, and petal width,
    predict the class of iris
    """
    
    # Load the model from the file
    with open("model.pkl", "rb") as f:
        model = joblib.load(f)
        
    # Construct the 2D matrix of values that .predict is expecting
    X = [[sepal_length, sepal_width, petal_length, petal_width]]
    
    data = pd.DataFrame(X)
    data.columns = features
    
    # Get the array of predictions and select only the first
    predictions = model.predict(data)
    # Convert the NumPy integer to a plain int so the result is JSON-serializable
    prediction = int(predictions[0])

    return {"predicted_class": prediction}

Now let's test it out!

In [20]:
features = X.columns
preds = iris_prediction(5.1, 3.5, 1.4, 0.2, features)
preds

{'predicted_class': 0}
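To serve this function over HTTP, it still needs an entry point that reads the request and returns a response. Google Cloud Functions passes a Flask request object to Python HTTP functions, so a wrapper might look like the sketch below. This is an illustration under assumptions, not the required implementation: `make_http_handler`, `FakeRequest`, `stub_predict`, and the JSON field names are all hypothetical, and the stub stands in for `iris_prediction` so the example runs without the model file.

```python
import json

# Hypothetical JSON field names expected in the request body.
REQUIRED_FIELDS = ("sepal_length", "sepal_width", "petal_length", "petal_width")


def make_http_handler(predict_fn):
    """Wrap a prediction function in an HTTP-style entry point."""

    def handler(request):
        # get_json(silent=True) returns None instead of raising on a bad body
        payload = request.get_json(silent=True) or {}
        missing = [name for name in REQUIRED_FIELDS if name not in payload]
        if missing:
            return json.dumps({"error": f"missing fields: {missing}"}), 400
        try:
            values = [float(payload[name]) for name in REQUIRED_FIELDS]
        except (TypeError, ValueError):
            return json.dumps({"error": "all fields must be numeric"}), 400
        # The result must be JSON-serializable (plain int, not a NumPy integer)
        return json.dumps(predict_fn(*values)), 200

    return handler


class FakeRequest:
    """Stand-in for the Flask request object, for local testing only."""

    def __init__(self, body):
        self._body = body

    def get_json(self, silent=False):
        return self._body


def stub_predict(sepal_length, sepal_width, petal_length, petal_width):
    # Placeholder for the real model call; always predicts class 0.
    return {"predicted_class": 0}


handler = make_http_handler(stub_predict)
print(handler(FakeRequest({"sepal_length": 5.1, "sepal_width": 3.5,
                           "petal_length": 1.4, "petal_width": 0.2})))
print(handler(FakeRequest({"sepal_length": 5.1})))
```

With the real model, the stub could be swapped for something like `lambda *m: iris_prediction(*m, features)`, provided the returned prediction is converted to a plain `int` so that `json.dumps` accepts it.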

In [22]:
import sklearn
sklearn.__version__

'1.4.2'

In [23]:
joblib.__version__

'1.2.0'
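These versions matter for the requirements file: a pickled scikit-learn model is not guaranteed to load correctly under a different library version, so the deployment environment should pin the versions used here. A minimal sketch of generating those pins, assuming the PyPI distribution names `scikit-learn`, `joblib`, and `pandas`; `pinned_requirements` is a hypothetical helper, not part of any library:

```python
from importlib import metadata


def pinned_requirements(packages):
    """Return 'name==version' lines for each installed distribution."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Not installed in this environment, so there is nothing to pin
            continue
    return lines


requirement_lines = pinned_requirements(["scikit-learn", "joblib", "pandas"])
print("\n".join(requirement_lines))

# To save the pins for deployment (writing the file is left commented out here):
# with open("requirements.txt", "w") as f:
#     f.write("\n".join(requirement_lines) + "\n")
```

In the environment shown above, the first two lines would be expected to read `scikit-learn==1.4.2` and `joblib==1.2.0`.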