# Your First MLOps Pipeline

## Aim

> Build several common components found in MLOps pipelines to illustrate the essential concepts of MLOps.


This is by no means a comprehensive overview of all facets of MLOps, but it aims to illustrate a few of the important concepts.

## Outline
- What is MLOps?
- Turning the ML model training process into a pipeline
- Artefact and metadata tracking
  - Tensorboard demo
- Serving
  - APIs
- Drift
  - Data drift
  - Concept drift
- Monitoring
  - Visualising data drift
  - Logging user requests to visualise
- Alerting
  - Pagerduty example
- Retraining
  - Collecting more data to train on
  - Cron
  - Automatic retraining

## What is MLOps?

- MLOps is shorthand for Machine Learning Operations. 
- MLOps is everything other than the model training code required to put AI systems into production
- MLOps is to AI engineering what devops is to software engineering.
- MLOps empowers organisations not just to deploy once, but to deploy over and over again quickly and efficiently by reducing the overhead and increasing automation that usually goes into maintaining machine learning systems in production.

## Why do we need MLOps?

- Serve more models
- Serve new versions of the same model

Why do we need new models?
- Unexpected performance differences between testing and production
- Evolving environments


### About the dataset

In this example, we're working with data from an online retailer, like Amazon, who make timely offers to their customers after every purchase. We want to build a machine learning model to determine the likelihood of a user claiming the offer so that we can confidently offer it to people who will take it. We don't want to offer it to everyone because it costs us to make the offer, which reduces our margins.

- Features:
    - `product_rating`: the difference between the average rating for that product and the user's rating
    - `delivery_duration`: the difference between the claimed delivery time and the actual delivery time
- Label:
    - `used_offer`: Whether the user claimed an offer shared with them after their successful delivery

However, we know that the distribution of data might change over time:
- Buyers may become more or less sensitive to product quality as supply changes
- Buyers may become more or less sensitive to delivery duration
- Buyers may change whether they claim the offer or not due to changes in the economy

> _Note: The datapoints are ordered sequentially in time, so you can see how the data changes over time_

In [1]:
import pandas as pd


def load_data():
    data = pd.read_csv("https://raw.githubusercontent.com/life-efficient/Your-First-MLOps-Pipeline/main/data/initial_data.csv")
    # ^^^ could read from database, filesystem storage, or other source in another application

    features = data.drop(columns=["used_offer"])
    labels = data["used_offer"]

    return features, labels


features, labels = load_data()

print(features.describe())
print(features.head())
print(labels.head())


       product_rating  delivery_duration
count      971.000000         971.000000
mean         0.848218           1.245125
std          1.187715           1.302528
min         -3.151794          -2.930829
25%          0.056051           0.352596
50%          0.820649           1.247354
75%          1.708110           2.196187
max          3.964207           5.192221
   product_rating  delivery_duration
0       -1.435952          -0.934353
1        0.850237           1.462074
2        0.212885           0.326928
3        0.172982          -0.057004
4        0.170494          -1.584580
0    0
1    0
2    0
3    1
4    1
Name: used_offer, dtype: int64


## What goes into training a machine learning model?

You could fill a library with everything that I could mention here, but to summarise it, here are the key steps that go into training a ML model:
- Data preparation
- Model training
- Hyperparameter tuning
- Validation set evaluation
- Test set evaluation

The cells below illustrate what a simple version of this might look like.

In [2]:
# split
from sklearn.model_selection import train_test_split

def split_data(features, labels):
    features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.2, random_state=42) # split data into train and test data
    features_train, features_val, labels_train, labels_val = train_test_split(features_train, labels_train, test_size=0.2, random_state=42) # split train data into train and validation data
    return features_train, features_test, features_val, labels_train, labels_test, labels_val

features_train, features_test, features_val, labels_train, labels_test, labels_val = split_data(features, labels)

Most machine learning algorithms learning procedure is controlled by some parameters that are set before the learning takes place during what is known as model training. These parameters set before training are called hyperparameters.

The cell below defines a function that gets some random hyperparameters. You can tweak these around to try out different model configurations.

I won't go into it here, but in practice, you'd want to systematically sample a range of hyperparameters and use the performance on the validation set to determine which performs best on unseen data.

We'll use a decision tree classifier as our machine learning model. It's a simple model that can still work remarkably well. You can read about it's hyperparameters [here](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html).

In [3]:
def get_hyperparameters():
    return {
        'max_depth': 4,
        'min_samples_split': 2,
        'min_samples_leaf': 1,
    }

hyperparameters = get_hyperparameters()

Now let's train the model

In [4]:
from sklearn.tree import DecisionTreeClassifier

def train_model(features, labels, hyperparameters):
    """Trains a model on the data"""

    model = DecisionTreeClassifier(
        **hyperparameters
    )
    model.fit(features, labels)



    return model

model = train_model(features_train, labels_train, hyperparameters)

Next, we need to evaluate the performance of the model

In [5]:
# implement methods for the different parts of the pipeline, including hyperparameter tuning and evaluation

from sklearn.metrics import accuracy_score

def evaluate_model(model, X_test, y_test):
    """Evaluates the model on the test set"""

    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)

    return accuracy

accuracy = evaluate_model(model, features_test, labels_test)

print(f"Accuracy: {accuracy}")



Accuracy: 0.9641025641025641


Now create a function to save the model

In [6]:
# save model
import joblib

def save_model(model):
    """Saves the model to disk"""

    joblib.dump(model, "model.joblib")

save_model(model)

Putting that all together...

In [7]:

features, labels = load_data()
train_features, validation_features, test_features, train_labels, validation_labels, test_labels = split_data(features, labels)
hyperparameters = get_hyperparameters()
model = train_model(train_features, train_labels, hyperparameters)
accuracy = evaluate_model(model, test_features, test_labels)
print(f"Accuracy: {accuracy}")


Accuracy: 0.9423076923076923


### Pipelining the ML training process

Imagine you're a data scientist who's developed a machine learning model. You've found out how to create a model that works by setting the right model configuration and processing the data correctly. However, you know that in the future your data is going to change, because you know that over time, the inputs will change with trends. That means you're going to train this model more than once, which is why it's useful to keep this code for re-use.

Now, we can put that all into a function that trains the model from end to end.

In [8]:
def train_and_evaluate_model(features, labels):
    features, labels = load_data()
    train_features, validation_features, test_features, train_labels, validation_labels, test_labels = split_data(features, labels)
    hyperparameters = get_hyperparameters()
    model = train_model(train_features, train_labels, hyperparameters)
    accuracy = evaluate_model(model, test_features, test_labels)
    return model, accuracy, hyperparameters

train_and_evaluate_model(features, labels)
print(f"Accuracy: {accuracy}")

Accuracy: 0.9423076923076923


### End-to-end Pipeline Tools for Model Training used in the Wild

- Airflow
- AWS Sagemaker
- Databricks
- Kubeflow

## Metadata Tracking

### What is model metadata?
Model metadata refers to information about a machine learning model that is not part of the actual model itself, but rather provides context and descriptive details about the model. This metadata can include information such as the model's name, version, author, creation date, input and output formats, hyperparameters, performance metrics, and more.

### Why is it important?
Metadata is important because it allows users to understand the characteristics of a model, how it was created, and how it can be used. This information can help users make informed decisions about which models to use for their particular tasks, and can also aid in reproducibility and collaboration between data scientists and other stakeholders.


### Implementation
The worst way to keep track of your model metadata would be to build your own solution. Everyone's had that idea, and there are simply many way better tools than what you can use off the shelf.

Model metadata can be stored and accessed in various ways, such as through a separate file or database that accompanies the model, or as part of the model's documentation or comments within the code itself. Some machine learning frameworks and platforms also provide tools for automatically generating and managing model metadata.

In this illustration, I'll use Tensorboard to track model metadata. It's in no way the most sophisticated tool for the job - there are many other more advanced solutions used in the wild that I'll share after the following code. Check out the tensorboard documentation [here](https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter.add_hparams).

Let's update our training pipeline to log some of the important metadata. Once you run this code, you'll notice a `runs` folder appear, which contains the metadata - it's simply saved in filesystem storage.

_Bonus points if you can implement it with a [decorator](https://www.youtube.com/watch?v=fm_oY5tXD_s)_

In [9]:
# log metadata using tensorboard
from torch.utils.tensorboard import SummaryWriter

def log_metadata(model, accuracy, hyperparameters):
    """Logs metadata to tensorboard"""

    writer = SummaryWriter()
    writer.add_hparams(
        hparam_dict=hyperparameters, 
        metric_dict={"accuracy": accuracy}
    )
    writer.close()

log_metadata(model, accuracy, hyperparameters)

  from .autonotebook import tqdm as notebook_tqdm


### Model Metadata Tracking Tools used in the Wild

- Weights and biases
- MLFlow
- Neptune

## Serving

Now that the model has been trained, the easy part is over. The next step is serving predictions to users. Serving is the essence of what people mean when they talk about "deployment".

The most common pattern is to serve your model predictions through an API. Users can make requests to the API and receive responses. E.g. Someone can make a request for a prediction, and we serve them that prediction by processing their data with our model.

Below is the Python code that defines the API endpoints that could serve the model.

_Note: The last line, which sets the API up to listen for requests, won't work in a notebook because it needs to be run from a Python file. However, the methods above will still be defined, so make sure to run this cell because we'll be using those methods going forward._

In [10]:
from fastapi import FastAPI
import uvicorn

model = joblib.load("model.joblib")  # load model from disk

api = FastAPI()

@api.get("/")
def root():
    return {"Hello": "World"}

@api.get("/predict") # defines the /predict endpoint
def predict(data): # defines that the endpoint takes a parameter called data

    data = pd.read_json(data)
    prediction = model.predict(data)
    return prediction.tolist()


# uvicorn.run(api, host='localhost', port=8000) # UNCOMMENT THIS TO RUN IN A PYTHON FILE

To deploy the API defined in the above cell, here's what you would do with the code:
1. Put into a Python file
1. Define a Docker image that contains that file, the model params, and the model class
1. Run that Docker container from that image on a computer in the cloud

From another file or application, you could make requests to the API like this:

_Note: Like the cell above, we expect an error to be thrown, because we can't run an API from a notebook, so the endpoint won't be found when we try to make a request._

In [11]:
import requests
import json

API_ROOT = "http://localhost:8000"
API_ENDPOINT = "/predict"
URL = API_ROOT + API_ENDPOINT

# features = pd.DataFrame([
#     {
#         "product_rating": 4.5,
#         "delivery_duration": 0.2,
#     },
#     {
#         "product_rating": 1.5,
#         "delivery_duration": 0.1,
#     },
# ])

payload = features.to_json()

# response = requests.get(URL, data=payload)


To simplify the scenario for use in a notebook, I'm just going to call the API method directly.

In [12]:
payload = features.to_json()
predict(payload)


[0,
 0,
 0,
 1,
 1,
 1,
 0,
 1,
 0,
 1,
 1,
 1,
 1,
 0,
 1,
 1,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 1,
 1,
 1,
 0,
 0,
 0,
 1,
 1,
 0,
 1,
 0,
 1,
 1,
 1,
 0,
 1,
 1,
 0,
 1,
 1,
 1,
 1,
 1,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 1,
 0,
 1,
 1,
 1,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 1,
 1,
 1,
 1,
 1,
 0,
 0,
 1,
 0,
 0,
 1,
 1,
 1,
 0,
 0,
 0,
 1,
 0,
 1,
 1,
 1,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 1,
 1,
 1,
 1,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 1,


We get back two numbers that represent the model's predictions for the two examples we sent with our request.

### API Clients

In our case, here's a simulated example version that works from within a notebook. Often, many APIs, like the OpenAI API for example, come packaged in a Python library. When an API is accessible through a library in a programming language, we call that library a _client_. The client makes the API easy to use because calling the method covers up the direct HTTP request under the hood.

_Note: The below code is not something that you would have in a real world situation, it simply poses as an API for us here, because we can't run one through a notebook_

In a real situation, an API client's source code might look like this:

In [13]:
class APIClient:
    def __init__(self, api_root):
        self.api_root = api_root

    def predict(self, data):
        payload = data.to_json()
        response = requests.get(self.api_root + "/predict", data=payload)
        return response.json()

In our case, that won't work because we can't run an API from within a notebook. So below, I've got another implementation which replaces the request with a direct function call of one of the API's methods. This is NOT something you would see in a real situation.

Make sure you understand the difference:
- The above code is what a real API client's source code might look like
    - It makes a request to an API over the internet
- The below code is NOT something you'd see in practice, because the API would be running on a totally separate machine far away in the cloud
    - You would not have access to the predict method because it would not be in the same file.

In [14]:
class APIClient:
    def __init__(self, testing=False):
        self.model = model
        self.step = 0 # adding this step variable so that we can log metadata later in the next section
        self.writer = SummaryWriter() # also adding this summary writer here so that we can log metadata later in the next section
        if testing:
            self.logging_label = "testing"
        else:
            self.logging_label = "production"

    def predict(self, data):
        data = data.to_json()
        prediction = predict(data) # IN PRACTICE, THIS LINE WOULD BE A REQUEST TO A REMOTE API (see first APIClient class)
        return prediction


Below shows how you would typically use an API client

In [15]:
api_client = APIClient(testing=True)
api_client.predict(features)

[0,
 0,
 0,
 1,
 1,
 1,
 0,
 1,
 0,
 1,
 1,
 1,
 1,
 0,
 1,
 1,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 1,
 1,
 1,
 0,
 0,
 0,
 1,
 1,
 0,
 1,
 0,
 1,
 1,
 1,
 0,
 1,
 1,
 0,
 1,
 1,
 1,
 1,
 1,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 1,
 0,
 1,
 1,
 1,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 1,
 1,
 1,
 1,
 1,
 0,
 0,
 1,
 0,
 0,
 1,
 1,
 1,
 0,
 0,
 0,
 1,
 0,
 1,
 1,
 1,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 1,
 1,
 1,
 1,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 1,


### Serving Tools used in the Wild

- FastAPI
- AWS Sagemaker
- KFServing
- Kubernetes
- Docker (underpins a lot of the above)

## Monitoring

You can't run ML models in production without keeping an eye on them. We should track some metrics from our API.

### Monitoring Tools used in the Wild
- Prometheus
- Grafana
- Evidently
- Fiddler
- Arize

Let's start off by logging some basic hardware metrics, like [CPU utilisation](psutil.cpu_percent(4)) using the Python library `psutil`

In [16]:
!pip install psutil



In [17]:
# Importing the library
import os
import psutil

def log_hardware_metrics():
    cpu_utilisation_percent = psutil.cpu_percent(4)
    api_client.writer.add_scalar(
        "cpu_utilisation_percent", 
        cpu_utilisation_percent, 
        api_client.step
    ) # log the cpu utilisation to tensorboard  
    print(f"CPU utilisation: {cpu_utilisation_percent}%")

    # load1, load5, load15 = psutil.getloadavg() # get the load average over the last 1, 5 and 15 minutes
    # cpu_usage = round(100 * load15/os.cpu_count(), 1)
    # print(f"15 minute rolling average CPU utilisation: {cpu_usage}%")


Let's add that hardware metric logging to our API by adding to the `predict` method that runs when a prediction is requested. This will overwrite the predict method defined earlier.

In [18]:
def predict(data):  # defines that the endpoint takes a parameter called data

    data = pd.read_json(data)
    prediction = model.predict(data)
    log_hardware_metrics()
    return prediction.tolist()


Now when we request predictions from our API, the API logs the hardware metrics.

In [19]:
api_client = APIClient(testing=True)
api_client.predict(features)

CPU utilisation: 56.6%


[0,
 0,
 0,
 1,
 1,
 1,
 0,
 1,
 0,
 1,
 1,
 1,
 1,
 0,
 1,
 1,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 1,
 1,
 1,
 0,
 0,
 0,
 1,
 1,
 0,
 1,
 0,
 1,
 1,
 1,
 0,
 1,
 1,
 0,
 1,
 1,
 1,
 1,
 1,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 1,
 0,
 1,
 1,
 1,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 1,
 1,
 1,
 1,
 1,
 0,
 0,
 1,
 0,
 0,
 1,
 1,
 1,
 0,
 0,
 0,
 1,
 0,
 1,
 1,
 1,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 1,
 1,
 1,
 1,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 1,


As well as these hardware metrics though, there are some things that you should track specific to ML. 

These might include:
- Prediction input features
- Predictions
- Confidence of predictions (for models which product probabalistic predictions)

Let's start by logging our model features.

In [20]:

from time import time

def log_model_inputs(data):
    for idx, example in data.iterrows():
        api_client.writer.add_scalar(
            f"product_rating/{api_client.logging_label}",
            example["product_rating"], 
            api_client.step
        )
        api_client.writer.add_scalar(
            f"delivery_duration/{api_client.logging_label}",
            example["delivery_duration"], 
            api_client.step
        )
        api_client.step += 1
    # print("Logging inputs")


In [21]:
api_client = APIClient(testing=True)

def predict(data):  # defines that the endpoint takes a parameter called data
    data = pd.read_json(data)
    log_model_inputs(data) ## add input logging
    prediction = model.predict(data)
    # log_hardware_metrics()
    return prediction.tolist()


api_client.predict(features)


[0,
 0,
 0,
 1,
 1,
 1,
 0,
 1,
 0,
 1,
 1,
 1,
 1,
 0,
 1,
 1,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 1,
 1,
 1,
 0,
 0,
 0,
 1,
 1,
 0,
 1,
 0,
 1,
 1,
 1,
 0,
 1,
 1,
 0,
 1,
 1,
 1,
 1,
 1,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 1,
 0,
 1,
 1,
 1,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 1,
 1,
 1,
 1,
 1,
 0,
 0,
 1,
 0,
 0,
 1,
 1,
 1,
 0,
 0,
 0,
 1,
 0,
 1,
 1,
 1,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 1,
 1,
 1,
 1,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 1,


Let's run the entire training dataset through this logging process and see how the results look.

_Note: We haven't set a `global_step` parameter in our `add_scalar` function calls, so make sure to view the graphs on "wall time" not "step", because they will all have the same step.

In [22]:
from torch.utils.tensorboard import SummaryWriter # this is just here so that the "Launch Tensorboard Session" shows up in VSCode

api_client.predict(features)


[0,
 0,
 0,
 1,
 1,
 1,
 0,
 1,
 0,
 1,
 1,
 1,
 1,
 0,
 1,
 1,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 1,
 1,
 1,
 0,
 0,
 0,
 1,
 1,
 0,
 1,
 0,
 1,
 1,
 1,
 0,
 1,
 1,
 0,
 1,
 1,
 1,
 1,
 1,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 1,
 0,
 1,
 1,
 1,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 1,
 1,
 1,
 1,
 1,
 0,
 0,
 1,
 0,
 0,
 1,
 1,
 1,
 0,
 0,
 0,
 1,
 0,
 1,
 1,
 1,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 1,
 1,
 1,
 1,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 1,


![](./images/training_data_drift.png)

## Drift - Is my problem changing over time?

> _Drift_ is when data changes over time. 

There are two main types of drift:
- Data drift
- Concept drift

### Data Drift

Data drift refers to a change in the statistical properties of the input data over time. This can occur for a variety of reasons, such as changes in the data collection process, changes in the user population, or changes in the underlying distribution of the data. Data drift can lead to a decrease in model accuracy if the model is not updated to reflect the new data distribution.

For example, if a model is trained to classify images of cats and dogs based on certain visual features, and the distribution of images in the real-world data changes over time (e.g., more pictures of certain breeds), this can lead to data drift and cause the model to make more errors in its predictions.

### Concept Drift

Concept drift, on the other hand, refers to a change in the relationship between the input features and the target variable over time. This can occur when the underlying relationships between the features and target variable change due to external factors, such as changes in customer behavior or market trends. Concept drift can lead to a decrease in model accuracy if the model is not updated to reflect the new relationships between the features and target variable.

For example, if a model is trained to predict customer churn based on certain customer attributes (e.g., age, income, location), and the factors that influence customer churn change over time (e.g., new competitors entering the market), this can lead to concept drift and cause the model to make more errors in its predictions.

### Visualising data drift

Below, I use a class `RealWorldDataStream` that I've implemented to simulate a sequence of data being sent from real customers to your API.

In [23]:
from production_data import RealWorldDataStream

api_client = APIClient(testing=False)

for idx, datapoint in enumerate(RealWorldDataStream()):
    api_client.predict(datapoint) # send request to API and log metrics of production data


KeyboardInterrupt: 

### Challenge: Build a drift detection system that prints when drift is detected

## Alerting

> Without alerting, all the monitoring in the world might not make you aware of the problem.

Detecting and responding to data drift and other ML-related issues is critical to maintaining the accuracy, fairness, and performance of a machine learning model in production. By setting up alerts and automating the monitoring process, you can ensure that issues are detected early and addressed in a timely manner.

One typical practice is for MLOps engineers to take shifts where they are on call, prepared and ready to look into and address any issues that come up. 

One industry-grade tool that I like to use is PagerDuty. It's a tool that can trigger phone calls, texts, and push notification alerts to members on your team. 

I once heard people talking about "carrying the pager", and made the mistake of thinking that someone had an old-school, physical pager. These days, alerts come straight to your phone.

Check out the pagerduty Python client documentation [here](https://pagerduty.github.io/pdpyras/user_guide.html#:~:text=Events%20API%20v2-,%C2%B6,-Trigger%20and%20resolve).

_Note: PagerDuty is just an API like the one we developed above. The Python client sends requests to the PagerDuty API, and in response you get a phone call. APIs can do whatever you program them to, and many companies are just a huge scale API._

In [None]:
import pdpyras

API_KEY = "YOUR_API_KEY_HERE"

session = pdpyras.EventsAPISession(API_KEY)

dedup_key = session.trigger("Server is on fire", 'dusty.old.server.net')


## Retraining

Automatic retraining of machine learning models is a technique used to improve the performance of a model over time by continuously updating it with new data. This approach is particularly useful when the underlying data is changing or evolving, such as in the case of sensor data, social media data, or financial data.

### Challenge: Implement automatic retraining and redeployment of the model when significant data drift is detected