# 002 Secondary Mushroom - Load Model from `.joblib`


In [1]:
import joblib

## Import model artifacts

You may remember that we stored our artifacts as a dictionary, then used
`joblib.dump()` to write that dictionary to a file.

We'll reload that dictionary using `joblib.load()`.


In [2]:
path = "../models/artifacts.joblib"

# joblib.load() accepts a path directly, so no open() call is needed
artifact = joblib.load(path)
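
As a recap of the save-and-reload round trip, here is a minimal sketch. The dictionary contents and the temporary file name are hypothetical stand-ins, not the real artifacts from this project.

```python
import os
import tempfile

import joblib

# Hypothetical stand-ins for the real artifacts (pipeline, model, test data)
artifacts_to_save = {
    "preprocessor": "placeholder for a fitted preprocessor",
    "post_col_names": ["cap-diameter_z", "has-ring_t"],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_artifacts.joblib")

    # joblib.dump() takes the object and a filename to write to
    joblib.dump(artifacts_to_save, demo_path)

    # joblib.load() takes the same filename and returns the stored object
    reloaded = joblib.load(demo_path)

print(type(reloaded), list(reloaded.keys()))
```

Both functions work with a plain file path, which is why the loading cell does not need to open the file itself.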

In [3]:
display(
    type(artifact),
    artifact.keys())

dict

dict_keys(['preprocessor', 'lr_model', 'X_test', 'y_test', 'post_col_names'])

In [4]:
# preprocessing pipeline
artifact["preprocessor"]

In [5]:
# logistic regression model (a classifier; it exposes `predict_proba` below)
artifact["lr_model"]

In [6]:
# test features DataFrame (note we haven't imported Pandas)
display(
    type(artifact['X_test']),
    artifact['X_test'].head()
)

pandas.core.frame.DataFrame

       cap-diameter  stem-height  stem-width  has-ring  cap-shape
49474          5.19         7.05       15.79         f          x
22798          6.84         5.03       12.98         f          s
60027         10.44         4.58       25.92         f          o
35232           3.9          7.5        8.21         t          x
42968         10.76        11.26       17.32         t          p


In [7]:
# test targets
display(
    type(artifact['y_test']),
    artifact['y_test'].head()
)

pandas.core.series.Series

49474    0
22798    1
60027    1
35232    0
42968    0
Name: poisonous, dtype: int64

### Unpack into variables

You could use these objects directly from the dictionary. We'll unpack
them into variables for ease of use.


In [8]:
preprocessor = artifact["preprocessor"]
lr_model = artifact['lr_model']
X_test = artifact['X_test']
y_test = artifact['y_test']
post_col_names = artifact['post_col_names']

## Use model to predict


In [9]:
# preprocess raw data
X_test_processed = preprocessor.transform(X_test)

In [10]:
# make predictions
y_pred = lr_model.predict(X_test_processed)

### Score predictions

To score the predictions, we'll need to import metrics.


In [11]:
from sklearn.metrics import (
    f1_score, accuracy_score
)

In [12]:
# Score predictions
print(
    f1_score(
        y_true=y_test,
        y_pred=y_pred),
    accuracy_score(
        y_true=y_test,
        y_pred=y_pred))

0.6874297427759043 0.6129850990666448
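
To make the two numbers easier to interpret, here is a small sketch that computes F1 and accuracy by hand and compares them with scikit-learn. The labels are hypothetical, chosen only so the arithmetic is easy to follow; they are not taken from the mushroom data.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical labels, not from the mushroom test set
y_true_demo = [0, 1, 1, 0, 1, 1]
y_pred_demo = [0, 1, 0, 1, 1, 1]

pairs = list(zip(y_true_demo, y_pred_demo))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)
recall = tp / (tp + fn)

# F1 is the harmonic mean of precision and recall
f1_manual = 2 * precision * recall / (precision + recall)

# Accuracy is the share of predictions that match the true label
accuracy_manual = sum(1 for t, p in pairs if t == p) / len(pairs)

print(f1_manual, f1_score(y_true=y_true_demo, y_pred=y_pred_demo))
print(accuracy_manual, accuracy_score(y_true=y_true_demo, y_pred=y_pred_demo))
```

F1 does not count true negatives, while accuracy does, so the two scores can differ, as they do in the output above.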


## Prediction from a single, new observation

You may remember the features of the mushrooms that made it into the
model:

```python
    'cap-diameter',
    'stem-height',
    'stem-width',
    'has-ring',
    'cap-shape'
```

You may also remember the values that we saw in the data:

```python
# continuous values
                   mean        std   min     max
cap-diameter   6.733854   5.264845  0.38   62.34
stem-height    6.581538   3.370017  0.00   33.92
stem-width    12.149410  10.035955  0.00  103.91

# categorical
Index(['f', 't'], dtype='object', name='has-ring')

Index(['x', 'f', 's', 'b', 'o', 'p', 'c'],
      dtype='object', name='cap-shape')
```


Let's make a single example to simulate a mushroom found in the wild. For
now, we'll take care to ensure that the values fall within the ranges
we've already observed.


We've used `ColumnTransformer` (as well as `OneHotEncoder` and
`StandardScaler`), so the preprocessor is expecting a Pandas DataFrame as
input.

To make this easy to convert into a DataFrame, each value in the
dictionary should be a list. Since there's only one observation, each
list holds a single value.


In [13]:
observation = {
    'cap-diameter': [50],
    'stem-height': [20],
    'stem-width': [30],
    'has-ring': ['t'],
    'cap-shape': ['c']
}
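
The check that these values fall within what we've already observed can be sketched in plain Python. The ranges and categories are copied from the summary above; `find_out_of_range` and `candidate` are hypothetical names introduced only for this illustration.

```python
# Observed ranges and categories, copied from the summary shown earlier
numeric_ranges = {
    "cap-diameter": (0.38, 62.34),
    "stem-height": (0.00, 33.92),
    "stem-width": (0.00, 103.91),
}
allowed_categories = {
    "has-ring": {"f", "t"},
    "cap-shape": {"x", "f", "s", "b", "o", "p", "c"},
}


def find_out_of_range(obs):
    """Return names of features whose values fall outside what was observed."""
    problems = []
    for name, (low, high) in numeric_ranges.items():
        if not all(low <= value <= high for value in obs[name]):
            problems.append(name)
    for name, allowed in allowed_categories.items():
        if not all(value in allowed for value in obs[name]):
            problems.append(name)
    return problems


# Same values as the observation defined above
candidate = {
    "cap-diameter": [50],
    "stem-height": [20],
    "stem-width": [30],
    "has-ring": ["t"],
    "cap-shape": ["c"],
}
print(find_out_of_range(candidate))
```

An empty list means every value sits inside the observed minimum and maximum, or among the known categories. It does not say how typical the values are.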

Notice that we've not needed Pandas yet at all. We'll need to import it
in order to make our single observation into a DataFrame.

You'll typically place all imports at the top. We import here to
demonstrate that Pandas was not necessary until this step.

> Note: If you only need `DataFrame` from Pandas, you could arguably import
> it explicitly with `from pandas import DataFrame`.


In [14]:
import pandas as pd

In [15]:
single_observation = pd.DataFrame(observation)
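
A short sketch of why each dictionary value is wrapped in a list. The two-feature dictionaries here are hypothetical and only illustrate the behavior of `pd.DataFrame`.

```python
import pandas as pd

# Hypothetical two-feature observation, once as scalars and once as lists
scalar_values = {"cap-diameter": 50, "has-ring": "t"}
list_values = {"cap-diameter": [50], "has-ring": ["t"]}

try:
    pd.DataFrame(scalar_values)
    scalar_error = None
except ValueError as err:
    # With only scalars, pandas cannot infer how many rows to create
    scalar_error = err

# Lists of length one give pandas a row count, so this yields one row
one_row = pd.DataFrame(list_values)

print(scalar_error)
print(one_row.shape)
```

Wrapping each value in a one-element list is one way to get a single-row DataFrame. Passing an explicit `index` would be another.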

In [16]:
single_obs_processed = preprocessor.transform(single_observation)
single_obs_processed

array([[8.19456826, 3.97901331, 1.77942124, 1.        , 0.        ,
        1.        , 0.        , 0.        , 0.        , 0.        ]])

In [17]:
lr_model.predict(single_obs_processed)

array([0])

In [18]:
lr_model.predict_proba(single_obs_processed)

array([[0.86758864, 0.13241136]])

Note that the model predicts class 0 (not poisonous) for this mushroom,
with an estimated probability of about 0.87.
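
The relationship between `predict` and `predict_proba` can be sketched on a toy model. This assumes `lr_model` behaves like scikit-learn's `LogisticRegression`, which this notebook does not state explicitly. The data and the `demo_` names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical one-feature data set, used only to get a fitted classifier
X_demo = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y_demo = np.array([0, 0, 0, 1, 1, 1])
demo_model = LogisticRegression().fit(X_demo, y_demo)

proba = demo_model.predict_proba(X_demo)  # one column per class
labels = demo_model.predict(X_demo)

# Columns of proba follow the order of demo_model.classes_,
# so the most probable column maps back to a class label
labels_from_proba = demo_model.classes_[np.argmax(proba, axis=1)]

print(demo_model.classes_)
print(np.array_equal(labels, labels_from_proba))
```

Under that assumption, the first column in the output above is the probability of class 0 and the second is the probability of class 1.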


### Side quest: attaching column names post-processing

We saved the post-processing column names. You can re-attach them after processing.


In [19]:
pd.DataFrame(
    single_obs_processed,
    columns=post_col_names)

   cap-diameter_z  stem-height_z  stem-width_z  has-ring_t  cap-shape_b  cap-shape_c  cap-shape_f  cap-shape_p  cap-shape_s  cap-shape_x
0        8.194568       3.979013      1.779421         1.0          0.0          1.0          0.0          0.0          0.0          0.0
