# Creating a wrapper for your model

In this short lab, we will create a **wrapper** for your model.

A wrapper is, as the name suggests, code that _wraps_ other code. We use one to gain more control over how we interact with the model, specifically over the name of the function/method we call to generate _predictions_. This will become useful when we switch to a different ML framework that does not have a `predict()` function (because the method has a different name, or because it requires extra steps).

We'll write a _class_ that exposes two methods:
- **predict()**: expects several examples as input, produces that many outputs.
- **predictOne()**: expects only one example, and produces only one output.
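
The relationship between the two methods is mostly a matter of array shapes. Here is a minimal sketch with plain NumPy; `predict_batch` and `predict_one` are hypothetical stand-ins invented for illustration, not the methods we write below, and the "model" is just a rule on the row sum.

```python
import numpy as np

def predict_batch(X: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for a model: label 1 if the row sum is positive, else 0
    return (X.sum(axis=1) > 0).astype(int)

def predict_one(x: np.ndarray) -> int:
    # Turn the vector of shape (p,) into a batch of shape (1, p),
    # run the batch prediction, then unpack the single result
    return predict_batch(x[np.newaxis, :])[0]

# Shape (2, 3): n=2 examples, p=3 features
X = np.array([[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]])

batch_out = predict_batch(X)    # one output per row
single_out = predict_one(X[0])  # one output for one example
print(batch_out.shape, single_out)
```

The single-example version reuses the batch version, so only the batch logic needs to be written for each model.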

We'll be a bit fancy here, and actually have a sort-of _abstract_ class called `TrainedModelWrapper` that will serve as a kind of _blueprint_ for what wrappers should look like. Then, each specific wrapper (we'll need one for each type of ML framework that we want to adapt) will _inherit_ from that. Don't worry too much about this right now!

Again, we're being quite liberal here in how we write the code, but that's OK for now.

In [1]:
# %%writefile model_wrapper.py

import numpy as np

class TrainedModelWrapper:

    def predict(self, X: np.ndarray) -> np.ndarray:
        """
        Assumes X has shape (n, p), where n is the number of samples and p the number of features
        """
        # Each specific wrapper must override this method
        raise NotImplementedError("Subclasses must implement predict()")
    
    def predictOne(self, x: np.ndarray) -> float | int:
        """
        Assumes the input is a one-dimensional array (a vector).
        Its shape should be something like (p,)
        """
        return self.predict(x[np.newaxis, :])[0]

class TrainedSklearnModelWrapper(TrainedModelWrapper):

    def __init__(self, model):
        """
        This is the class' **constructor**. It is called every time you _instantiate_ a new
        object of this class, that is when you write something like
        my_instance = TrainedSklearnModelWrapper(...).

        This specific constructor accepts exactly **one** parameter, which is "saved" in
        a variable so that it is available from within that instance through `self`.
        Such variables are called _attributes_ (you will also hear them called _properties_).

        tl;dr: from now on, you can access the model from within the other methods of this class
        using `self.model`
        """
        self.model = model

    def predict(self, X: np.ndarray) -> np.ndarray:
        """
        Assumes X has shape (n, p), where n is the number of samples and p the number of features
        """

        ### TODO ###
        # Hint: you will likely want to use the `.predict()` function of the model you saved
        # Hint: you can access previously stored properties using `self.`
        return self.model.predict(X)
        ### END ###
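
To see why the blueprint is useful, here is a minimal sketch of a second wrapper for a framework that has no `predict()` method. Everything framework-related here is an assumption made up for illustration: `ToyScoringModel`, its `scores()` method, and `TrainedToyModelWrapper` do not belong to any real library. The base class is repeated in a minimal form so that the sketch runs on its own.

```python
import numpy as np

class TrainedModelWrapper:
    # Minimal version of the blueprint, repeated so this sketch is self-contained
    def predict(self, X: np.ndarray) -> np.ndarray:
        raise NotImplementedError

    def predictOne(self, x: np.ndarray):
        return self.predict(x[np.newaxis, :])[0]

class ToyScoringModel:
    # Hypothetical model from an imaginary framework: no predict(), only per-class scores
    def scores(self, X: np.ndarray) -> np.ndarray:
        # Two fake classes: the score of class 0 is the row sum, class 1 gets its negation
        s = X.sum(axis=1)
        return np.stack([s, -s], axis=1)

class TrainedToyModelWrapper(TrainedModelWrapper):
    def __init__(self, model):
        self.model = model

    def predict(self, X: np.ndarray) -> np.ndarray:
        # The extra step this imaginary framework needs: pick the highest-scoring class
        return self.model.scores(X).argmax(axis=1)

wrapped_toy = TrainedToyModelWrapper(ToyScoringModel())
X = np.array([[1.0, 2.0], [-3.0, 1.0]])
print(wrapped_toy.predict(X))        # labels for both rows
print(wrapped_toy.predictOne(X[1]))  # label for the second row only
```

Code that uses a wrapper only ever calls `predict()` or `predictOne()`, so it does not need to change when the underlying model does. Note that `predictOne()` is inherited from the blueprint and did not have to be rewritten.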

Now we check that the wrapper actually works and produces the expected results.

In [2]:
# Load the model saved in the previous lab

import pickle as pkl

with open("../artifacts/best_model_sklearn.pkl", 'rb') as f:
    model = pkl.load(f)

# We print the type of the model, to make sure that we're loading the right thing.
# It should print something like "sklearn.<SOMETHINGSOMETHING>"
print(type(model))

<class 'sklearn.linear_model._logistic.LogisticRegression'>


In [3]:
# We create the wrapper object, passing the model as the only argument to the constructor
wrapped_model = TrainedSklearnModelWrapper(model)

We want to test the wrapped model by running it on a small subsample of the test set, just to make sure that the results make sense.

In [4]:
from pathlib import Path # Handling files and folder paths

DATA_DIR = "../data/MNIST_CSV"
TEST_CSV = Path(DATA_DIR) / "mnist_test.csv"

In [5]:
import pandas as pd

test_df = pd.read_csv(TEST_CSV, header=None)
test_df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,775,776,777,778,779,780,781,782,783,784
0,7,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,2,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,4,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


If you remember from the previous lab, the first column indicates the actual digit shown in the picture. Let's isolate it, for clarity.

In [6]:
### TODO ###
# Hint: requires `.iloc` and slicing using square brackets. Refer to previous lab for more hints
test_df.head().iloc[:, 0]
### END ###

0    7
1    2
2    1
3    0
4    4
Name: 0, dtype: int64

So the labels of the first 5 examples in the test set should be `7 2 1 0 4`. Let's see if the algorithm we chose gets those right.

In [7]:
y_pred = wrapped_model.predict(test_df.head().iloc[:5, 1:])
print(y_pred)

[7 2 1 0 4]
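
Rather than comparing the two lists by eye, you can let NumPy do the comparison. This is a minimal sketch that hard-codes the labels and predictions printed above; in the notebook you would use your own arrays, and your predictions may differ.

```python
import numpy as np

y_true = np.array([7, 2, 1, 0, 4])  # labels of the first 5 test examples, as shown above
y_pred = np.array([7, 2, 1, 0, 4])  # predictions printed above; yours may differ

matches = y_pred == y_true  # element-wise comparison, one boolean per example
accuracy = matches.mean()   # fraction of correct predictions
print(f"{matches.sum()} of {len(y_true)} correct (accuracy {accuracy:.2f})")
```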


In [8]:
# Let's also check that `.predictOne()` works as intended
y_pred = wrapped_model.predictOne(test_df.head().iloc[1, 1:].to_numpy()) # We need to use to_numpy() explicitly here because of how `.predictOne()` works
print(y_pred)

2


Depending on the model you chose, you may have gotten slightly different results, but hopefully most of the predicted labels should be correct!

If you want to play around with your trained model a little more, you can try and classify more than the first 5 examples.

You can control how many rows are displayed when using the `.head()` function by passing it a parameter `n`, for instance if you want to display the first 7 rows you would use `.head(n=7)`. Remember to adjust the slicing when using `.iloc` as well!

In these cases, it's useful to _parametrize_ N, that is, to save it in a variable and then use that variable instead of the number itself; this way, if you want to change the number, you only have to do so in one place.
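
A minimal sketch of that idea follows. `toy_df` is a small random DataFrame invented here as a stand-in for `test_df` (label in the first column, features in the rest), so that the example runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for test_df: 10 rows, label in column 0, then 4 "pixel" columns
rng = np.random.default_rng(0)
toy_df = pd.DataFrame(rng.integers(0, 10, size=(10, 5)))

N = 7  # change this one value to look at more or fewer examples

subset = toy_df.head(n=N)
y_true = subset.iloc[:, 0].to_numpy()  # first column: the labels
X = subset.iloc[:, 1:].to_numpy()      # remaining columns: the features

print(y_true.shape, X.shape)
```

In the notebook, you would apply the same pattern to `test_df` and pass the resulting features to `wrapped_model.predict()`.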

Now that you have tested that your wrapper does what it should, go up to the first cell of this notebook, **uncomment** the first line (the one with `%%writefile model_wrapper.py`) and **run** that cell again: you should see a new file called `model_wrapper.py` appear in the `notebooks` folder (check the column on the left).