# Managing Model Parameters

The MLModel base class dictates that the version of the model should be returned from the “version” property. This version refers to the code version of the model and does not contain any information about the version of the parameters that the model object contains. The distinction is necessary because a model can vary along two dimensions: the code and the parameters. The code associated with a model version is a combination of the code used to train the model parameters and the code used to make predictions with them. The training and prediction code work together, so a set of model parameters trained with one version of the training codebase is not guaranteed to work with a different version of the prediction codebase.

Model parameters are the data that a model uses to make predictions. A set of model parameters is the result of a training run.

To manage model parameters, we'll need to do two things: 

- make information about the parameters available to the user of our model
- enable the user to instantiate the model with any parameters that they choose

The main goal of this approach to managing model parameters is to hide the implementation details of the parameters that are being used by the model object. We want to do this so that the user of the model does not need to know any of the internal implementation details of our model in order to be able to use it.


The relationship between model code and model parameters can be hard to understand. Each version of the code works with the sets of parameters that were trained for it. To keep things simple, we can think of the parameters version as a fourth element added to the normal semantic versioning triplet. So model version “3.2.1.0” is the model instance that is running code version “3.2.1” and is hosting parameters version “0”. When the model is retrained, a new set of parameters is created but the code version remains the same, so we get version “3.2.1.1”. If the code of the model changes, we create a new semantic version such as “3.3.0” and start the parameters versioning from 0 again, so the first set of parameters will be “3.3.0.0”.
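The four-part version described above can be sketched in a few lines. This is only an illustration: the `FullModelVersion` helper and its method names are hypothetical and are not part of the ml_base package.

```python
from typing import NamedTuple


class FullModelVersion(NamedTuple):
    """Hypothetical helper: a semantic code version plus a parameters version."""
    code_version: str        # for example "3.2.1"
    parameters_version: int  # for example 0

    def __str__(self) -> str:
        return "{}.{}".format(self.code_version, self.parameters_version)

    def retrain(self) -> "FullModelVersion":
        # same code, new parameters: only the last element increases
        return FullModelVersion(self.code_version, self.parameters_version + 1)

    def new_code_version(self, code_version: str) -> "FullModelVersion":
        # new code: the parameters version starts from 0 again
        return FullModelVersion(code_version, 0)


first = FullModelVersion("3.2.1", 0)
retrained = first.retrain()
changed = retrained.new_code_version("3.3.0")

print(first, retrained, changed)
```

Keeping the two parts as separate fields, and only joining them for display, avoids having to parse the combined string later.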

In [1]:
import sys

sys.path.insert(0, "../")

from IPython.display import clear_output

In [2]:
# !pip install ml_base
# clear_output()

## Training a Model

We'll be building a model using scikit-learn so we need to install it:

In [3]:
!pip install scikit-learn
!pip install pandas

clear_output()

In order to show how we can more easily manage parameters with the MLModel base class, we'll first need a model to work with. I created a function that trains a model on the Iris dataset so that I can generate a new model easily:

In [4]:
from sklearn import datasets
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = datasets.load_iris(return_X_y=True)

def train_model(random_state: int):
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=random_state)
    model = svm.SVC(gamma=1.0, C=1.0)
    
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    
    accuracy = accuracy_score(y_test, predictions)
    
    training_set_size = len(X_train)
    testing_set_size = len(X_test)

    return model, accuracy, training_set_size, testing_set_size

In [5]:
model, accuracy, training_set_size, testing_set_size = train_model(random_state=42)

print("Model: ", model)
print("Accuracy: ", accuracy)

Model:  SVC(gamma=1.0)
Accuracy:  1.0


The "model" variable contains a reference to the model object which contains the model parameters.

## Saving Model Parameters

To manage model parameters, we'll need to save them in a standard format. The pickle format is a simple way to do this, so we'll use it but there are other options as well. We also need to save some metadata about the model parameters that we'll save alongside the model parameters.

First, we'll save the model parameters themselves into a bytes array using the pickle package:

In [6]:
import pickle


model_bytes = pickle.dumps(model)

We'll also need to save some metadata, which we'll define here in a dictionary:

In [7]:
from datetime import datetime


model_parameters_metadata = {
    "model_qualified_name": "iris_model",
    "model_version": "0.1.0",
    "model_parameters_version": "1",
    "description": "A model to predict the species of a flower based on its measurements.",
    "creation_timestamp": datetime.utcnow(),
    "author": "Brian Schmidt",
    "author_email": "6666331+schmidtbri@users.noreply.github.com",
    "metadata": {
        "training_set_size": training_set_size,
        "testing_set_size": testing_set_size,
        "accuracy": accuracy
    },
    "dependencies": [
        "scikit-learn==0.24.2",
        "pandas==1.3.3"
    ]
}

The data structure that we defined above can also be created by using the ModelParametersMetadata type, which is defined in the ml_base.schemas module. We'll paste the code here for clarity:

In [8]:
from typing import Dict, List, Any, Optional
from pydantic import BaseModel, Field
from datetime import datetime


class ModelParametersMetadata(BaseModel):
    model_qualified_name: str = Field(description="Qualified name of the model to which these parameters belong.")
    model_version: str = Field(description="Model code version that these parameters belong to.")
    model_parameters_version: str = Field(description="Version of the model parameters.")
    creation_timestamp: datetime = Field(description="Datetime when the model parameters were created.")
    description: Optional[str] = Field(description="Short description for the model parameters.")
    author: Optional[str] = Field(description="Name of person who created the model parameters.")
    author_email: Optional[str] = Field(description="Email of person who created the model parameters.")
    tags: Optional[List[str]] = Field(description="List of strings that hold information about the parameters.")
    metadata: Optional[Dict[str, Any]] = Field(description="Key value pairs containing any extra metadata about the "
                                                           "model parameters.")
    dependencies: Optional[List[str]] = Field(description="List of code dependencies.")

The information is meant to fully describe a set of parameters for a given model. The fields are described in the code, but we'll go through them here:

- model_qualified_name: the qualified name of the model to which these parameters belong, this means that the MLModel class that has this qualified name should be able to open the model parameters and use them to make a prediction.
- model_version: the version of the model class with which these model parameters should work, this ties together the code of the model class with the parameters so that we can evolve the code safely in the future.
- model_parameters_version: the version of the parameters, this is open ended and can be an integer, a datetime, or a guid, but it should uniquely identify these model parameters.
- creation_timestamp: the date and time when these model parameters were created
- description: a general description of what these model parameters are, this is an optional field
- author: the name or username of the author who created these model parameters, this is an optional field
- author_email: the email of the author who created these model parameters, this is an optional field
- tags: a list of strings that describe the parameters, this is open-ended and can be useful to store metadata about the model parameters, this is an optional field.
- metadata: a dictionary that can store key value pairs with information about the model parameters, this is also open-ended and is similar to tags, this is an optional field.
- dependencies: a list of dependencies that the model parameters require in order to work, this is an optional field.

Now that we have a type associated with the parameter metadata, we'll use it to hold the information that we previously had in a dictionary above:

In [9]:
from ml_base.schemas import ModelParametersMetadata

model_parameters_metadata = ModelParametersMetadata(
    model_qualified_name="iris_model",
    model_version="0.1.0",
    model_parameters_version="1",
    description="A model to predict the species of a flower based on its measurements.",
    creation_timestamp=datetime.utcnow(),
    author="Brian Schmidt",
    author_email="6666331+schmidtbri@users.noreply.github.com",
    metadata={
        "training_set_size": training_set_size,
        "testing_set_size": testing_set_size,
        "accuracy": accuracy
    },
    dependencies=[
        "scikit-learn==0.24.2",
        "pandas==1.3.3"
    ])

The model metadata is not strictly required for storing the model parameters, but it becomes important when choosing which set of model parameters to use later on. We'll store it by serializing the object to a JSON string encoded into a byte array.

In [10]:
json_bytes = model_parameters_metadata.json().encode("utf-8")
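To make the serialization step more concrete, here is a rough standard-library-only sketch of the same round trip. It is not how pydantic implements `.json()`. It assumes the timestamp is written as an ISO 8601 string, and the `stand_in_metadata` dictionary is a hypothetical stand-in for the real metadata object.

```python
import json
from datetime import datetime

# hypothetical stand-in for the metadata object, with only two of its fields
stand_in_metadata = {
    "model_parameters_version": "1",
    "creation_timestamp": datetime(2021, 10, 27, 12, 51, 34),
}

# datetime objects are not JSON serializable, so we convert them to ISO 8601 strings
stand_in_json_bytes = json.dumps(
    stand_in_metadata, default=lambda value: value.isoformat()
).encode("utf-8")

# reading the bytes back and restoring the timestamp to a datetime object
restored = json.loads(stand_in_json_bytes.decode("utf-8"))
restored["creation_timestamp"] = datetime.fromisoformat(restored["creation_timestamp"])

print(restored == stand_in_metadata)
```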

In order to save the model parameters as a single file, we'll combine the pickle and JSON data into a single zip file:

In [11]:
from zipfile import ZipFile

file_name = "{}-{}-{}.zip".format(model_parameters_metadata.model_qualified_name,
                                  model_parameters_metadata.model_version,
                                  model_parameters_metadata.model_parameters_version)

with ZipFile(file_name, "w") as zip_file:
    zip_file.writestr("model.pickle", model_bytes)
    zip_file.writestr("model_parameters_metadata.json", json_bytes)
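As a quick check of the archive layout, the sketch below writes the same two members to an in-memory buffer and reads them back. The payloads are stand-in values and not a trained model, and the buffer replaces the file on disk only to keep the example self-contained.

```python
import json
import pickle
from io import BytesIO
from zipfile import ZipFile

# stand-in payloads; in the notebook these would be model_bytes and json_bytes
stand_in_model_bytes = pickle.dumps({"weights": [0.1, 0.2]})
stand_in_json_bytes = json.dumps({"model_parameters_version": "1"}).encode("utf-8")

# writing the archive to an in-memory buffer instead of a file on disk
buffer = BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("model.pickle", stand_in_model_bytes)
    archive.writestr("model_parameters_metadata.json", stand_in_json_bytes)

# reading both members back from the same buffer
buffer.seek(0)
with ZipFile(buffer) as archive:
    member_names = sorted(archive.namelist())
    restored_object = pickle.loads(archive.read("model.pickle"))
    restored_metadata = json.loads(
        archive.read("model_parameters_metadata.json").decode("utf-8"))

print(member_names)
print(restored_metadata)
```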

The zip will be used later to load the model parameters and metadata from one file. We'll also need to access the model by itself, so we'll save it to disk:

In [12]:
with open("model.pickle", 'wb') as file:
    pickle.dump(model, file)

## Parameter Versioning

In the example above, you can see that I saved the model parameters with the version "1". This is not a requirement and we can provide a version string that can convey more meaning. For example we can use:

- an increasing integer
- the date and time that the training run finished
- a GUID

Although a GUID is easy to generate and always unique, it is sometimes useful to arrange model parameters in the order in which they were created. In those cases it is better to use an increasing integer or a datetime.

A set of parameters is always associated with a model and a particular code version of that model. A set of parameters generated by a specific version of the training code of a model is always guaranteed to work with the prediction code of that version of the model.
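The three versioning options can be sketched as follows. The helper names are hypothetical, and the timestamp format is just one sortable choice among many.

```python
import uuid
from datetime import datetime, timezone


def next_integer_version(existing_versions):
    """Return the next increasing integer version as a string (hypothetical helper)."""
    numbers = [int(version) for version in existing_versions]
    return str(max(numbers, default=0) + 1)


def timestamp_version(finished_at):
    """Return a sortable version string built from the end time of the training run."""
    return finished_at.strftime("%Y%m%dT%H%M%SZ")


def guid_version():
    """Return a unique version string; it carries no ordering information."""
    return str(uuid.uuid4())


print(next_integer_version(["1", "2"]))
print(timestamp_version(datetime(2021, 10, 27, 12, 51, 34, tzinfo=timezone.utc)))
print(guid_version())

# integer versions stored as strings should be compared as integers,
# because "10" sorts before "2" when compared as text
print(sorted(["10", "9", "2"], key=int))
```

Because the metadata stores the parameters version as a string, any ordering logic has to convert integer versions back to numbers, as the last line does.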

## Creating the Model Class

Now that we have a set of parameters saved, we can create the model class that can use the parameters to make predictions.

First, we'll need to define the model's input and output schemas so we can add them to the model class later:

In [13]:
from pydantic import BaseModel, Field
from pydantic import ValidationError
from enum import Enum


class ModelInput(BaseModel):
    sepal_length: float = Field(gt=5.0, lt=8.0, description="The length of the sepal of the flower.")
    sepal_width: float = Field(gt=2.0, lt=6.0, description="The width of the sepal of the flower.")
    petal_length: float = Field(gt=1.0, lt=6.8, description="The length of the petal of the flower.")
    petal_width: float = Field(gt=0.0, lt=3.0, description="The width of the petal of the flower.")


class Species(str, Enum):
    iris_setosa = "Iris setosa"
    iris_versicolor = "Iris versicolor"
    iris_virginica = "Iris virginica"


class ModelOutput(BaseModel):
    species: Species = Field(description="The predicted species of the flower.")

In [14]:
import os
from numpy import array
from ml_base.ml_model import MLModel, MLModelSchemaValidationException


class IrisModel(MLModel):
    
    display_name= "Iris Model"
    qualified_name = "iris_model"
    description = "A model to predict the species of a flower based on its measurements."
    version = "0.1.0"
    input_schema = ModelInput
    output_schema = ModelOutput
    
    def __init__(self):
        dir_path = os.path.abspath("")
        with open(os.path.join(dir_path, "model.pickle"), 'rb') as f:
            self._svm_model = pickle.load(f)

    def predict(self, data: ModelInput) -> ModelOutput:
        
        # creating a numpy array using the fields in the input object
        X = array([data.sepal_length, 
                   data.sepal_width, 
                   data.petal_length, 
                   data.petal_width]).reshape(1, -1)
        
        # making a prediction, at this point its a number
        y_hat = int(self._svm_model.predict(X)[0])
        
        # converting the prediction from a number to a string
        targets = ["Iris setosa", "Iris versicolor", "Iris virginica"]
        species = targets[y_hat]
        
        # returning the prediction inside an object
        return ModelOutput(species=species)

The model can be used by instantiating the model class like this:

In [15]:
model = IrisModel()

prediction = model.predict(data=ModelInput(**{"sepal_length": 6.0, "sepal_width": 2.1, 
                                              "petal_length": 1.2, "petal_width": 1.3}))

prediction

ModelOutput(species=<Species.iris_virginica: 'Iris virginica'>)

## Adding Parameter Metadata to a Model Class

We've implemented a model class that wraps around our model and is able to make predictions. So far, the code is not any different from the basic example we've seen before, so we are going to add some functionality that is supported by the MLModel base class for parameters metadata.

The MLModel base class has a class method defined that returns model parameter metadata. It is implemented in the base class:

In [16]:
IrisModel.parameters()

[]

The implementation in the base class purposefully returns an empty list so that we can have a default behavior in case the developer does not want to support many model parameters in their model class. The method is supposed to be overridden in any class that inherits from MLModel, but only if the class is able to return metadata about the model parameters that it supports. We'll implement the method now:

In [17]:
import glob
from zipfile import ZipFile
from io import BytesIO
import json
from ml_base.schemas import ModelParametersMetadata


class IrisModel(MLModel):
    
    display_name= "Iris Model"
    qualified_name = "iris_model"
    description = "A model to predict the species of a flower based on its measurements."
    version = "0.1.0"
    input_schema = ModelInput
    output_schema = ModelOutput
    
    @classmethod
    def parameters(cls) -> List[ModelParametersMetadata]:
        """Return a list of ModelParametersMetadata objects that describe the parameters available for use."""
        model_parameters_metadata = []
        
        # getting a list of the zip files in the directory, filtering the file names to only the 
        # ones that match the model qualified name and version
        zip_file_paths = glob.glob("./{}-{}-*.zip".format(cls.qualified_name, cls.version))
        
        # iterating through the zip files found
        for zip_file_path in zip_file_paths:
            # opening a zip file
            zip_file = ZipFile(zip_file_path)
            # extracting the metadata in the JSON file to a string
            json_string = zip_file.open("model_parameters_metadata.json").read().decode('utf-8')
            # deserializing the JSON string and creating a metadata object from it
            metadata_dictionary = json.loads(json_string)
            model_parameters_metadata.append(ModelParametersMetadata(**metadata_dictionary))
            
        return model_parameters_metadata
        
    
    def __init__(self):
        dir_path = os.path.abspath("")
        with open(os.path.join(dir_path, "model.pickle"), 'rb') as f:
            self._svm_model = pickle.load(f)

    def predict(self, data: ModelInput) -> ModelOutput:
        X = array([data.sepal_length, data.sepal_width, data.petal_length, data.petal_width]).reshape(1, -1)
        y_hat = int(self._svm_model.predict(X)[0])
        targets = ["Iris setosa", "Iris versicolor", "Iris virginica"]
        species = targets[y_hat]
        return ModelOutput(species=species)

The parameters() class method is able to scan the directory for any zip files, opening each one to access the JSON file that we saved with the parameters metadata. The JSON is deserialized into a ModelParametersMetadata object that is then returned in a list from the class method.

## Accessing Model Parameters Metadata

Now we can access the parameters that are available by calling a class method:

In [18]:
model_parameters = IrisModel.parameters()

model_parameters

[ModelParametersMetadata(model_qualified_name='iris_model', model_version='0.1.0', model_parameters_version='2', creation_timestamp=datetime.datetime(2021, 10, 27, 2, 46, 30, 837543), description='A model to predict the species of a flower based on its measurements.', author='Brian Schmidt', author_email='6666331+schmidtbri@users.noreply.github.com', tags=None, metadata={'training_set_size': 120, 'testing_set_size': 30, 'accuracy': 1.0}, dependencies=['scikit-learn==0.24.2', 'pandas==1.3.3']),
 ModelParametersMetadata(model_qualified_name='iris_model', model_version='0.1.0', model_parameters_version='1', creation_timestamp=datetime.datetime(2021, 10, 27, 12, 51, 34, 772958), description='A model to predict the species of a flower based on its measurements.', author='Brian Schmidt', author_email='6666331+schmidtbri@users.noreply.github.com', tags=None, metadata={'training_set_size': 120, 'testing_set_size': 30, 'accuracy': 1.0}, dependencies=['scikit-learn==0.24.2', 'pandas==1.3.3'])]

The parameters metadata is extracted from the JSON file within each zip file and made available to the user of the model just by calling the method. The technique that we use to store the model parameters and their metadata can be changed without affecting the users of the model; indeed, the model parameters don't even have to be stored on the local computer. What really matters is that the parameters metadata is made available to the users of the model in a standardized way.

The model parameters metadata is accessed from the model class as a class method because we want to enable users of the model to get the metadata without having to instantiate the model beforehand. Next, we'll modify the model class to enable the user to choose which model parameters they want.
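Because the metadata is available before any model object exists, a caller can pick a parameters version from it first. The sketch below illustrates the idea with a hypothetical `StandInMetadata` dataclass in place of ModelParametersMetadata; the accuracy values are invented for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class StandInMetadata:
    """Hypothetical stand-in with a few of the ModelParametersMetadata fields."""
    model_parameters_version: str
    creation_timestamp: datetime
    metadata: Dict[str, Any] = field(default_factory=dict)


def choose_most_accurate(available: List[StandInMetadata]) -> str:
    """Return the parameters version whose recorded accuracy is highest."""
    best = max(available, key=lambda item: item.metadata.get("accuracy", 0.0))
    return best.model_parameters_version


# invented entries, standing in for the list returned by a parameters() class method
available = [
    StandInMetadata("1", datetime(2021, 10, 1), {"accuracy": 0.93}),
    StandInMetadata("2", datetime(2021, 10, 8), {"accuracy": 0.97}),
]

chosen_version = choose_most_accurate(available)
print(chosen_version)
```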

## Instantiating a Model Object With Parameters

The initialization method is not able to handle multiple versions of the model parameters. We'll rewrite it and accept a parameter called "model_parameters_version" that allows us to select the parameters we want when we instantiate the model object.

In [19]:
class IrisModel(MLModel):
    
    display_name= "Iris Model"
    qualified_name = "iris_model"
    description = "A model to predict the species of a flower based on its measurements."
    version = "0.1.0"
    input_schema = ModelInput
    output_schema = ModelOutput
    
    @classmethod
    def parameters(cls) -> List[ModelParametersMetadata]:
        model_parameters_metadata = []

        zip_file_paths = glob.glob("./{}-{}-*.zip".format(cls.qualified_name, cls.version))
        
        for zip_file_path in zip_file_paths:
            zip_file = ZipFile(zip_file_path)
            json_string = zip_file.open("model_parameters_metadata.json").read().decode('utf-8')
            metadata_dictionary = json.loads(json_string)
            model_parameters_metadata.append(ModelParametersMetadata(**metadata_dictionary))
            
        return model_parameters_metadata
    
    def __init__(self, model_parameters_version: Optional[str] = None):
        # the name of the zip file always matches model parameters version
        zip_file_path = "{}-{}-{}.zip".format(self.qualified_name,
                                              self.version,
                                              model_parameters_version)
        
        # opening the zip file
        zip_file = ZipFile(zip_file_path)
        
        # loading the model parameters in the pickle file
        model_bytes = zip_file.open("model.pickle").read()
        self._svm_model = pickle.loads(model_bytes)

    def predict(self, data: ModelInput) -> ModelOutput:
        X = array([data.sepal_length, data.sepal_width, data.petal_length, data.petal_width]).reshape(1, -1)
        y_hat = int(self._svm_model.predict(X)[0])
        targets = ["Iris setosa", "Iris versicolor", "Iris virginica"]
        species = targets[y_hat]
        return ModelOutput(species=species)

Now we can request that the model class instantiate itself with parameters version "1":

In [20]:
model = IrisModel(model_parameters_version="1")

prediction = model.predict(data=ModelInput(**{"sepal_length": 6.0, "sepal_width": 2.1, 
                                              "petal_length": 1.2, "petal_width": 1.3}))

prediction

ModelOutput(species=<Species.iris_virginica: 'Iris virginica'>)

What happens if we choose model parameters that don't exist? We'll just handle that by raising an exception:

In [21]:
from ml_base.ml_model import MLModelParametersNotAvailableException


class IrisModel(MLModel):
    
    display_name= "Iris Model"
    qualified_name = "iris_model"
    description = "A model to predict the species of a flower based on its measurements."
    version = "0.1.0"
    input_schema = ModelInput
    output_schema = ModelOutput
    
    @classmethod
    def parameters(cls) -> List[ModelParametersMetadata]:
        model_parameters_metadata = []

        zip_file_paths = glob.glob("./{}-{}-*.zip".format(cls.qualified_name, cls.version))
        
        for zip_file_path in zip_file_paths:
            zip_file = ZipFile(zip_file_path)
            json_string = zip_file.open("model_parameters_metadata.json").read().decode('utf-8')
            metadata_dictionary = json.loads(json_string)
            model_parameters_metadata.append(ModelParametersMetadata(**metadata_dictionary))
            
        return model_parameters_metadata
        
    
    def __init__(self, model_parameters_version: Optional[str] = None):
        # the name of the zip file always matches model parameters version
        zip_file_path = "{}-{}-{}.zip".format(self.qualified_name,
                                              self.version,
                                              model_parameters_version)
        
        # opening the zip file
        try:
            zip_file = ZipFile(zip_file_path)
        except FileNotFoundError as e:
            # and raise an exception if we can't find it
            raise MLModelParametersNotAvailableException("Parameters version {} not found.".format(model_parameters_version))

        # loading the model parameters in the pickle file
        model_bytes = zip_file.open("model.pickle").read()
        self._svm_model = pickle.loads(model_bytes)

    def predict(self, data: ModelInput) -> ModelOutput:
        X = array([data.sepal_length, data.sepal_width, data.petal_length, data.petal_width]).reshape(1, -1)
        y_hat = int(self._svm_model.predict(X)[0])
        targets = ["Iris setosa", "Iris versicolor", "Iris virginica"]
        species = targets[y_hat]
        return ModelOutput(species=species)

Now when we try to load a nonexistent set of parameters, we'll get a specific exception type, the MLModelParametersNotAvailableException. This is useful for hiding the implementation details of how the model class stores and accesses model parameters: users of the model only need to catch one type of exception to find out that the model parameters are not available.

In [22]:
try:
    model = IrisModel(model_parameters_version="2")
except MLModelParametersNotAvailableException as e:
    print("Exception raised: ", e)

## Accessing the Parameters Metadata of a Model Object

When we have a model object instantiated, we need a way to tell which model parameters version it is currently holding. To do this, we'll implement the "parameters_metadata" property. This property is actually implemented in the base class, but it purposefully returns "None":

In [23]:
print(model.parameters_metadata)

None


It returns None by default so that developers have the option to not handle parameter versioning in their own MLModel class. However, we do want to handle parameter versions, so we'll override the property in our own class:

In [24]:
class IrisModel(MLModel):
    
    display_name= "Iris Model"
    qualified_name = "iris_model"
    description = "A model to predict the species of a flower based on its measurements."
    version = "0.1.0"
    input_schema = ModelInput
    output_schema = ModelOutput
    
    @classmethod
    def parameters(cls) -> List[ModelParametersMetadata]:
        model_parameters_metadata = []

        zip_file_paths = glob.glob("./{}-{}-*.zip".format(cls.qualified_name, cls.version))
        
        for zip_file_path in zip_file_paths:
            zip_file = ZipFile(zip_file_path)
            json_string = zip_file.open("model_parameters_metadata.json").read().decode('utf-8')
            metadata_dictionary = json.loads(json_string)
            model_parameters_metadata.append(ModelParametersMetadata(**metadata_dictionary))
            
        return model_parameters_metadata
    
    @property
    def parameters_metadata(self) -> Optional[ModelParametersMetadata]:
        return self.current_model_parameters_metadata
        
    
    def __init__(self, model_parameters_version: Optional[str] = None):
        zip_file_path = "{}-{}-{}.zip".format(self.qualified_name,
                                              self.version,
                                              model_parameters_version)
        
        try:
            zip_file = ZipFile(zip_file_path)
        except FileNotFoundError as e:
            raise MLModelParametersNotAvailableException("Parameters version {} not found.".format(model_parameters_version))
        
        model_bytes = zip_file.open("model.pickle").read()
        self._svm_model = pickle.loads(model_bytes)
        
        # extracting the parameter metadata in the JSON file to a string
        json_string = zip_file.open("model_parameters_metadata.json").read().decode('utf-8')
        # deserializing the JSON string and creating a metadata object from it
        metadata_dictionary = json.loads(json_string)
        # saving the metadata to an instance property
        self.current_model_parameters_metadata = ModelParametersMetadata(**metadata_dictionary)

    def predict(self, data: ModelInput) -> ModelOutput:
        X = array([data.sepal_length, data.sepal_width, data.petal_length, data.petal_width]).reshape(1, -1)
        y_hat = int(self._svm_model.predict(X)[0])
        targets = ["Iris setosa", "Iris versicolor", "Iris virginica"]
        species = targets[y_hat]
        return ModelOutput(species=species)

When the model parameters are loaded in the \_\_init\_\_ method, we save the metadata object to the "self.current_model_parameters_metadata" attribute. Then we return it from the parameters_metadata property of the model object.

Now when we load a set of parameters, we'll be able to know what the metadata is for them:

In [25]:
model = IrisModel(model_parameters_version="1")

print(model.parameters_metadata)

model_qualified_name='iris_model' model_version='0.1.0' model_parameters_version='1' creation_timestamp=datetime.datetime(2021, 10, 27, 12, 51, 34, 772958) description='A model to predict the species of a flower based on its measurements.' author='Brian Schmidt' author_email='6666331+schmidtbri@users.noreply.github.com' tags=None metadata={'training_set_size': 120, 'testing_set_size': 30, 'accuracy': 1.0} dependencies=['scikit-learn==0.24.2', 'pandas==1.3.3']


## Default Model Parameters Version

The model_parameters_version argument does not have to be provided to the \_\_init\_\_ method. If it is not provided, it's good practice to choose the latest set of model parameters and load those.

In [26]:
class IrisModel(MLModel):
    
    display_name= "Iris Model"
    qualified_name = "iris_model"
    description = "A model to predict the species of a flower based on its measurements."
    version = "0.1.0"
    input_schema = ModelInput
    output_schema = ModelOutput
    
    @classmethod
    def parameters(cls) -> List[ModelParametersMetadata]:
        model_parameters_metadata = []
        
        zip_file_paths = glob.glob("./{}-{}-*.zip".format(cls.qualified_name, cls.version))
        
        for zip_file_path in zip_file_paths:
            zip_file = ZipFile(zip_file_path)
            json_string = zip_file.open("model_parameters_metadata.json").read().decode('utf-8')
            metadata_dictionary = json.loads(json_string)
            model_parameters_metadata.append(ModelParametersMetadata(**metadata_dictionary))
            
        return model_parameters_metadata
    
    @property
    def parameters_metadata(self) -> Optional[ModelParametersMetadata]:
        return self.current_model_parameters_metadata        
    
    def __init__(self, model_parameters_version: Optional[str] = None):
        
        if model_parameters_version is None and len(self.parameters()) == 0:
            raise MLModelParametersNotAvailableException("No parameters available.")
        
        # sorting the model parameters by creation_timestamp and selecting the latest one
        elif model_parameters_version is None and len(self.parameters()) > 0:
            sorted_parameters = sorted(self.parameters(), key=lambda parameters: parameters.creation_timestamp, reverse=True)
            model_parameters_version = sorted_parameters[0].model_parameters_version
            
        zip_file_path = "{}-{}-{}.zip".format(self.qualified_name,
                                              self.version,
                                              model_parameters_version)

        # opening the zip file
        try:
            zip_file = ZipFile(zip_file_path)
        except FileNotFoundError as e:
            raise MLModelParametersNotAvailableException("Parameters version {} not found.".format(model_parameters_version))

        # loading the model parameters in the pickle file
        model_bytes = zip_file.open("model.pickle").read()
        self._svm_model = pickle.loads(model_bytes)
        
        # extracting the parameter metadata in the JSON file to a string
        json_string = zip_file.open("model_parameters_metadata.json").read().decode('utf-8')
        # deserializing the JSON string and creating a metadata object from it
        metadata_dictionary = json.loads(json_string)
        # saving the metadata to an instance property
        self.current_model_parameters_metadata = ModelParametersMetadata(**metadata_dictionary)

    def predict(self, data: ModelInput) -> ModelOutput:
        X = array([data.sepal_length, data.sepal_width, data.petal_length, data.petal_width]).reshape(1, -1)
        y_hat = int(self._svm_model.predict(X)[0])
        targets = ["Iris setosa", "Iris versicolor", "Iris virginica"]
        species = targets[y_hat]
        return ModelOutput(species=species)

To show how this works, we'll need a new set of model parameters so we'll just train version "2" and save them to disk. The only thing that's changing is the model parameters version:

In [27]:
model, accuracy, training_set_size, testing_set_size = train_model(random_state=42)

# saving the model pickle to a bytes object
model_bytes = pickle.dumps(model)
    
model_parameters_metadata = ModelParametersMetadata(
    model_qualified_name="iris_model",
    model_version="0.1.0",
    model_parameters_version="2",
    description="A model to predict the species of a flower based on its measurements.",
    creation_timestamp=datetime.utcnow(),
    author="Brian Schmidt",
    author_email="6666331+schmidtbri@users.noreply.github.com",
    metadata={
        "training_set_size": training_set_size,
        "testing_set_size": testing_set_size,
        "accuracy": accuracy
    },
    dependencies=[
        "scikit-learn==0.24.2",
        "pandas==1.3.3"
    ])

# saving the json string to a bytes object
json_bytes = model_parameters_metadata.json().encode("utf-8")
    
file_name = "{}-{}-{}.zip".format(model_parameters_metadata.model_qualified_name,
                                  model_parameters_metadata.model_version,
                                  model_parameters_metadata.model_parameters_version)

with ZipFile(file_name, "w") as zip_file:
    zip_file.writestr("model.pickle", model_bytes)
    zip_file.writestr("model_parameters_metadata.json", json_bytes)

Since the "2" version of the model parameters was trained later, it should be chosen when we don't provide the model_parameters_version to the \_\_init\_\_ method:

In [28]:
model = IrisModel()

print(model.parameters_metadata.model_parameters_version)

2


The parameters version "2" was selected when we didn't provide the desired version to the IrisModel \_\_init\_\_ method.
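The selection rule can be looked at in isolation. The sketch below assumes a small stand-in class, `ParametersInfo`, that carries only the two metadata fields the rule needs; it is a hypothetical simplification and not the ModelParametersMetadata class itself. The timestamps are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class ParametersInfo:
    """Hypothetical stand-in for ModelParametersMetadata with only the fields used here."""
    model_parameters_version: str
    creation_timestamp: datetime


def latest_parameters_version(available: List[ParametersInfo]) -> str:
    """Return the version of the most recently created set of parameters."""
    if not available:
        raise ValueError("No parameters available.")
    # the newest parameters win, regardless of how the version strings compare
    newest = max(available, key=lambda info: info.creation_timestamp)
    return newest.model_parameters_version


# illustrative timestamps, not taken from real training runs
available = [
    ParametersInfo("2", datetime(2021, 10, 27, 12, 51)),
    ParametersInfo("1", datetime(2021, 10, 26, 9, 30)),
]
print(latest_parameters_version(available))
```

Note that the choice depends on the creation timestamp stored in the metadata, not on the version string, so a parameters version that is retrained later would be picked even if its version label sorts lower.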

## Matching Model Parameters Version to Model Code Version

Model parameters are always related to the model class that is designed to load them and use them. This is why we store the model version in the parameters metadata. When the model training code changes, the model class has to change with it so that it can still load the model parameters and predict with them correctly.
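Because the model version is stored in the metadata, a model class could verify that a set of parameters was written for it before using them. The following is a minimal sketch of such a check, not something the model class in this guide does; `ParametersInfo`, `IncompatibleParametersError`, and `check_parameters_match` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


class IncompatibleParametersError(Exception):
    """Hypothetical exception raised when parameters do not belong to a model class."""


@dataclass
class ParametersInfo:
    """Hypothetical stand-in for ModelParametersMetadata with only the fields used here."""
    model_qualified_name: str
    model_version: str
    model_parameters_version: str


def check_parameters_match(qualified_name: str, version: str, metadata: ParametersInfo) -> None:
    """Raise if the metadata was written for a different model or a different code version."""
    if metadata.model_qualified_name != qualified_name:
        raise IncompatibleParametersError(
            "Parameters belong to model '{}', not '{}'.".format(
                metadata.model_qualified_name, qualified_name))
    if metadata.model_version != version:
        raise IncompatibleParametersError(
            "Parameters were created for model version {}, not {}.".format(
                metadata.model_version, version))


metadata = ParametersInfo("iris_model", "0.2.0", "1")

# matching name and version: the check passes silently
check_parameters_match("iris_model", "0.2.0", metadata)

# mismatched code version: the check raises
try:
    check_parameters_match("iris_model", "0.1.0", metadata)
except IncompatibleParametersError as error:
    print(error)
```

A check like this would turn a confusing deserialization failure into an explicit error that names both versions.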

To show how this works, we'll train a new model and change the way that we serialize the model:

In [29]:
from io import BytesIO
import joblib

model, accuracy, training_set_size, testing_set_size = train_model(random_state=42)

model_bytes = BytesIO()
joblib.dump(model, model_bytes)
model_bytes.seek(0)
model_bytes = model_bytes.read()

For this model, we serialized the model object with the joblib package. As a result, the "0.1.0" version of the model class cannot read these new parameters.

Next, we'll create the parameters metadata for the new model parameters:

In [30]:
model_parameters_metadata = ModelParametersMetadata(
    model_qualified_name="iris_model",
    model_version="0.2.0",
    model_parameters_version="1",
    description="A model to predict the species of a flower based on its measurements.",
    creation_timestamp=datetime.utcnow(),
    author="Brian Schmidt",
    author_email="6666331+schmidtbri@users.noreply.github.com",
    metadata={
        "training_set_size": training_set_size,
        "testing_set_size": testing_set_size,
        "accuracy": accuracy
    },
    dependencies=[
        "scikit-learn==0.24.2",
        "pandas==1.3.3"
    ])

json_bytes = model_parameters_metadata.json().encode("utf-8")

Notice that the model version went up to "0.2.0" because we are changing the way the model is serialized, which means that the model class will also need to change. The parameters version went back down to "1" because we reset the count when we got a new model version.
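The numbering rule can be sketched as a small helper. This is only an illustration under two assumptions: parameters versions are strings holding positive integers, and counting starts at "1" for every new model version, as it does in this example. The function name and the `existing` dictionary are hypothetical.

```python
from typing import Dict, List


def next_parameters_version(existing: Dict[str, List[str]], model_version: str) -> str:
    """Return the next parameters version for the given model code version.

    Assumes parameters versions are strings holding positive integers.
    """
    versions = existing.get(model_version, [])
    if not versions:
        # a model version with no parameters yet starts the count over
        return "1"
    return str(max(int(version) for version in versions) + 1)


# illustrative state: two sets of parameters exist for model version 0.1.0
existing = {"0.1.0": ["1", "2"]}

print(next_parameters_version(existing, "0.1.0"))  # 3
print(next_parameters_version(existing, "0.2.0"))  # 1
```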

Next, we'll save the new model parameters as before:

In [31]:
# building up a filename
file_name = "{}-{}-{}.zip".format(model_parameters_metadata.model_qualified_name,
                                  model_parameters_metadata.model_version,
                                  model_parameters_metadata.model_parameters_version)

# saving the model file and JSON file into the zip file
with ZipFile(file_name, "w") as zip_file:
    zip_file.writestr("model.joblib", model_bytes)
    zip_file.writestr("model_parameters_metadata.json", json_bytes)
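To see why the older loading code cannot use an archive like this one, the sketch below builds an equivalent archive in memory with placeholder bytes and then tries to open the "model.pickle" member that the "0.1.0" class expects. It uses only the standard library; the placeholder contents are an assumption made so that the example runs without a trained model.

```python
from io import BytesIO
from zipfile import ZipFile

# building an in-memory archive laid out like the new parameters file
buffer = BytesIO()
with ZipFile(buffer, "w") as zip_file:
    zip_file.writestr("model.joblib", b"placeholder bytes")
    zip_file.writestr("model_parameters_metadata.json", b"{}")

# the 0.1.0 loading code asks for "model.pickle", which is not in the archive
with ZipFile(buffer) as zip_file:
    members = zip_file.namelist()
    try:
        zip_file.open("model.pickle")
        old_loader_works = True
    except KeyError:
        # ZipFile.open raises KeyError for a member name that is not present
        old_loader_works = False

print(members)
print(old_loader_works)
```

In practice the "0.1.0" class would not even find this file, since it only looks for archives that carry its own version in the file name. The member name and the serialization format are a second reason the two versions cannot be mixed.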

Now we need to rewrite the model class to handle the new joblib serialization format:

In [32]:
class IrisModel(MLModel):
    
    display_name = "Iris Model"
    qualified_name = "iris_model"
    description = "A model to predict the species of a flower based on its measurements."
    version = "0.2.0"
    input_schema = ModelInput
    output_schema = ModelOutput
    
    @classmethod
    def parameters(cls) -> List[ModelParametersMetadata]:
        model_parameters_metadata = []
        
        zip_file_paths = glob.glob("./{}-{}-*.zip".format(cls.qualified_name, cls.version))
        
        for zip_file_path in zip_file_paths:
            zip_file = ZipFile(zip_file_path)
            json_string = zip_file.open("model_parameters_metadata.json").read().decode('utf-8')
            metadata_dictionary = json.loads(json_string)
            model_parameters_metadata.append(ModelParametersMetadata(**metadata_dictionary))
            
        return model_parameters_metadata
    
    @property
    def parameters_metadata(self) -> Optional[ModelParametersMetadata]:
        return self.current_model_parameters_metadata        
    
    def __init__(self, model_parameters_version: Optional[str] = None):
        
        if model_parameters_version is None and len(self.parameters()) == 0:
            raise MLModelParametersNotAvailableException("No parameters available.")
            
        elif model_parameters_version is None and len(self.parameters()) > 0:
            sorted_parameters = sorted(self.parameters(), key=lambda parameters: parameters.creation_timestamp, reverse=True)
            model_parameters_version = sorted_parameters[0].model_parameters_version
            
        zip_file_path = "{}-{}-{}.zip".format(self.qualified_name,
                                              self.version,
                                              model_parameters_version)
        try:
            zip_file = ZipFile(zip_file_path)
        except FileNotFoundError as e:
            raise MLModelParametersNotAvailableException("Parameters version {} not found.".format(model_parameters_version))

        # loading the model parameters from the joblib file
        model_bytes = zip_file.open("model.joblib").read()
        model_bytes = BytesIO(model_bytes)
        self._svm_model = joblib.load(model_bytes)
        
        json_string = zip_file.open("model_parameters_metadata.json").read().decode('utf-8')
        metadata_dictionary = json.loads(json_string)
        self.current_model_parameters_metadata = ModelParametersMetadata(**metadata_dictionary)

    def predict(self, data: ModelInput) -> ModelOutput:
        X = array([data.sepal_length, data.sepal_width, data.petal_length, data.petal_width]).reshape(1, -1)
        y_hat = int(self._svm_model.predict(X)[0])
        targets = ["Iris setosa", "Iris versicolor", "Iris virginica"]
        species = targets[y_hat]
        return ModelOutput(species=species)

In [33]:
model = IrisModel(model_parameters_version="1")

print(model.parameters_metadata)

model_qualified_name='iris_model' model_version='0.2.0' model_parameters_version='1' creation_timestamp=datetime.datetime(2021, 10, 27, 12, 51, 34, 980899) description='A model to predict the species of a flower based on its measurements.' author='Brian Schmidt' author_email='6666331+schmidtbri@users.noreply.github.com' tags=None metadata={'training_set_size': 120, 'testing_set_size': 30, 'accuracy': 1.0} dependencies=['scikit-learn==0.24.2', 'pandas==1.3.3']


## Managing Model Objects

Now that we have a way to store model parameters along with metadata and a way to load different parameters with the same model class, we'll need a way to manage multiple versions of parameters within the same process. In the previous example, we used the ModelManager class to do this; we'll extend it to manage parameter versions as well.
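The idea of holding several parameter versions of one model in a single process can be sketched with a plain dictionary keyed by model name and parameters version. The `SimpleModelRegistry` class below is hypothetical and exists only to illustrate the idea; it is not the ModelManager API, and its behavior when no version is given is an assumption.

```python
from typing import Any, Dict, Optional, Tuple


class SimpleModelRegistry:
    """Hypothetical registry for illustration only; this is not the ml_base ModelManager."""

    def __init__(self) -> None:
        # maps (qualified name, parameters version) to a model object
        self._models: Dict[Tuple[str, str], Any] = {}

    def add(self, qualified_name: str, parameters_version: str, model: Any) -> None:
        key = (qualified_name, parameters_version)
        if key in self._models:
            raise ValueError("Model {} with parameters {} is already registered.".format(*key))
        self._models[key] = model

    def get(self, qualified_name: str, parameters_version: Optional[str] = None) -> Any:
        if parameters_version is not None:
            return self._models[(qualified_name, parameters_version)]
        # assumption: without a version, the lookup only succeeds if it is unambiguous
        matches = [model for (name, _), model in self._models.items() if name == qualified_name]
        if len(matches) != 1:
            raise KeyError("Expected exactly one registered version of {}.".format(qualified_name))
        return matches[0]


registry = SimpleModelRegistry()
# strings stand in for model objects to keep the sketch self-contained
registry.add("iris_model", "1", "model object with parameters 1")
registry.add("iris_model", "2", "model object with parameters 2")

print(registry.get("iris_model", "2"))
```

Keying on both values lets two instances of the same model class, each holding different parameters, live side by side without overwriting each other.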

To work with the ModelManager, we first need to instantiate it:

In [34]:
from ml_base.utilities.model_manager import ModelManager

model_manager = ModelManager()

SyntaxError: invalid syntax (model_manager.py, line 127)

### Adding a Model Object to the Model Manager

Adding a model object with parameters metadata.

In [None]:
from tests.mocks import MLModelMockWithParametersMetadata

metadata = MLModelMockWithParametersMetadata.parameters()
metadata

In [None]:
model = MLModelMockWithParametersMetadata()

model_manager.add_model(model)

In [None]:
model_details = model_manager.get_model_details()

Adding a model object without parameters version.

In [None]:
add_model

### Removing a Model Object from the Model Manager

Removing a model object with parameters version.

In [None]:
remove_model

Removing a model object without parameters version.

In [None]:
remove_model

### Loading a Model


In [None]:
load_model

### Getting a List of Models from the Model Manager


In [None]:
get_models

### Getting a List of Models and Parameters from the Model Manager


In [None]:
get_model_metadata

### Getting a Model Object from the Model Manager

With parameters version

In [None]:
get_model

Without parameters version

In [None]:
get_model

## Conclusion

The technique used to save and access model parameters in this guide is simple, open-ended, and not prescriptive. If we wanted to save model parameters in another way, we could simply change the code in the model class accordingly. What this guide shows is a standardized way to report which model parameters are available for a model, and a standardized way to create a model object that uses a specific set of model parameters. 

We can return metadata about the available model parameters by adding a property to the MLModel base class that any model class inheriting from it can implement. The property returns a list of metadata objects, one for each set of model parameters available. Each metadata object contains information about the model, the model version, and other details that are useful to keep track of when storing model parameters. 

When we have decided which model parameters to load, we can load them by using 

What is required?
What is left up to the developer?


The model parameters are always saved with metadata that helps to identify them later. The metadata is in a standardized format. The parameters metadata is accessed from the model class of the model object through a defined API. Parameters metadata is not required 