# Performance Degradation in Machine Learning Models

In many real-world machine learning applications, it is expected that a model's performance will degrade over time. Understanding and anticipating this performance degradation during the model development phase is crucial for building a robust predictive system.

## Anticipating Performance Degradation

Is it possible to predict how much a model's performance will degrade over time during the model construction process?

Yes, by employing a validation workflow that incorporates time, we can gain insight into the potential degradation. Instead of randomly splitting the dataset for training and testing, we can use a time-based split, where:

- Data before a specific cutoff date is used for training.
- Data after the cutoff date is used for testing.
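A minimal sketch of such a split is shown below. The helper name `time_based_split`, the column names, and the toy data are assumptions made for illustration only.

```python
import pandas as pd

def time_based_split(df, date_col, cutoff):
    """Split df into rows strictly before the cutoff and rows on or after it."""
    cutoff = pd.Timestamp(cutoff)
    is_train = df[date_col] < cutoff
    return df.loc[is_train], df.loc[~is_train]

# Toy data: one row per day for ten days (hypothetical values)
toy = pd.DataFrame({
    "Date": pd.date_range("2023-01-01", periods=10, freq="D"),
    "Feature_1": range(10),
})

train, test = time_based_split(toy, "Date", "2023-01-08")
print(len(train), len(test))
```

Unlike a random split, every training row here is older than every test row, so the test set plays the role of "future" data.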

## Validation Workflow for Time-Based Prediction Models

To estimate how a model will degrade before it is deployed to production, follow this approach:

- Split the data at a cutoff date and train the model only on data before the cutoff.
- Calculate the model's accuracy (or another performance metric) on the held-out data at regular time intervals (daily, weekly, and so on).
- Plot the metric over time, for example as a bar graph, to see how performance changes as the test data moves further from the cutoff.
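The per-interval metric could be computed along these lines. This is a sketch, not the notebook's own code: the helper `accuracy_by_period` and the toy labels are hypothetical, and it uses weekly buckets via pandas `resample`.

```python
import numpy as np
import pandas as pd

def accuracy_by_period(y_true, y_pred, dates, freq="W"):
    """Return accuracy per calendar period as a DataFrame with columns [date, accuracy]."""
    correct = pd.Series(
        (np.asarray(y_true) == np.asarray(y_pred)).astype(float),
        index=pd.DatetimeIndex(dates),
    )
    return (
        correct.resample(freq).mean()  # share of correct predictions per period
        .rename("accuracy")
        .rename_axis("date")
        .reset_index()
    )

# Toy example: 14 daily predictions, Monday 2023-01-02 through Sunday 2023-01-15
dates = pd.date_range("2023-01-02", periods=14, freq="D")
y_true = [1] * 14
y_pred = [1] * 7 + [1, 1, 1, 0, 0, 0, 0]  # second week is mostly wrong

weekly = accuracy_by_period(y_true, y_pred, dates, freq="W")
print(weekly)
```

Passing `freq="D"` gives daily buckets instead. Periods with no predictions come back as `NaN`, which is worth handling before plotting.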

This experiment lets you project the model's expected behavior from evidence already present in the historical data. Keep in mind that it only captures changes that have already occurred: unforeseen changes, such as shifts in user behavior, may cause the model to degrade faster in production than the experiment suggests.

## Simulating the Impact of Data Changes on Model Performance

### Scenario 1: No Degradation

In the first scenario, we simulate random data that remains relatively stable over time. In this case, the model does not experience significant performance degradation.

The simulation generates a synthetic dataset with no significant drift or change over time. After splitting the dataset into training and testing sets based on a time cutoff, we train a **RandomForestClassifier** and evaluate its performance on the test set. The results are then displayed as a bar graph, showing the model’s performance over time.

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score

In [2]:
# Generate a simulated dataset
X, y = make_classification(
    n_samples=300*100,
    n_features=10,
    n_informative=5,
    n_redundant=2,
    random_state=42
)

start_date = pd.to_datetime("2023-01-01")
num_days = X.shape[0] // 100  # 100 samples per day, so 30,000 samples span 300 days
dates = pd.date_range(start=start_date, periods=num_days, freq="D").repeat(100)

# Create a DataFrame from the simulated data and the dates column
df = pd.DataFrame(X, columns=[f"Feature_{i}" for i in range(1, 11)])
df["Label"] = y
df["Date"] = dates[:X.shape[0]]

df.head()

| | Feature_1 | Feature_2 | Feature_3 | Feature_4 | Feature_5 | Feature_6 | Feature_7 | Feature_8 | Feature_9 | Feature_10 | Label | Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.182264 | -0.234382 | 0.177566 | 1.65081 | 0.334894 | 1.356317 | -0.178905 | 0.423151 | -0.626189 | 1.17419 | 0 | 2023-01-01 |
| 1 | 1.583839 | 2.179018 | -2.149535 | -0.052005 | 0.55801 | -3.259897 | 0.627321 | -0.791987 | 1.417631 | -1.444508 | 1 | 2023-01-01 |
| 2 | 0.201082 | 2.198913 | -2.914435 | -0.183695 | -1.094415 | 0.667563 | 2.623809 | 2.58263 | -1.218585 | 1.064815 | 0 | 2023-01-01 |
| 3 | -0.312984 | 1.122585 | -1.269978 | -0.728335 | 0.569287 | -0.300377 | 0.558103 | -0.230394 | 1.430662 | 1.323242 | 1 | 2023-01-01 |
| 4 | 2.276166 | 0.297265 | -0.406471 | -0.549914 | 1.327052 | -0.153719 | 0.903378 | 1.085651 | -0.653437 | -0.72903 | 1 | 2023-01-01 |


In [3]:
X = df.drop(["Label", "Date"], axis=1)
y = df["Label"]


In [4]:
filter_train = df["Date"] < "2023-10-01"

X_train = X.loc[filter_train, :]
X_test = X.loc[~filter_train, :]

y_train = y.loc[filter_train]
y_test = y.loc[~filter_train]
date_test = df.loc[~filter_train, "Date"]


In [5]:
clf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print(f"Accuracy score: {accuracy_score(y_test, y_pred):.2f}")

Accuracy score: 0.93


In [6]:
df_pred = pd.DataFrame({"y_true": y_test, "y_pred": y_pred, "date": date_test})

# Select the label columns explicitly so that apply() does not operate on the
# grouping column, which newer pandas versions warn about
accuracy_by_date = (
    df_pred.groupby("date")[["y_true", "y_pred"]]
    .apply(lambda g: accuracy_score(g["y_true"], g["y_pred"]))
    .reset_index(name="accuracy")
)

px.bar(accuracy_by_date, x="date", y="accuracy", title="Model Performance Over Time")


The bar graph should show stable accuracy across the test dates, with no visible degradation.
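One way to back up a visual impression of stability with a number is to fit a straight line to the daily accuracy and inspect its slope. The sketch below is an illustration on made-up accuracy values, not output from the notebook. The helper name `accuracy_trend` is hypothetical.

```python
import numpy as np
import pandas as pd

def accuracy_trend(accuracy_by_date):
    """Least-squares slope of accuracy versus days since the first test date."""
    days = (accuracy_by_date["date"] - accuracy_by_date["date"].min()).dt.days.to_numpy()
    slope, _intercept = np.polyfit(days, accuracy_by_date["accuracy"].to_numpy(), deg=1)
    return slope

# Made-up daily accuracies for 27 test days (assumed values, for illustration)
dates = pd.date_range("2023-10-01", periods=27, freq="D")
rng = np.random.default_rng(0)
stable = pd.DataFrame({"date": dates, "accuracy": 0.93 + rng.normal(0, 0.01, size=27)})
decaying = pd.DataFrame({"date": dates, "accuracy": np.linspace(0.93, 0.55, 27)})

print(f"stable slope:   {accuracy_trend(stable):+.4f} per day")
print(f"decaying slope: {accuracy_trend(decaying):+.4f} per day")
```

A slope close to zero is consistent with stable performance, while a clearly negative slope gives a rough estimate of accuracy lost per day.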

### Scenario 2: Simulated Performance Degradation

Now, let's simulate a scenario where the data changes over time by adding date-dependent noise to the test features. This mimics real-world data drift, which reduces model accuracy as the prediction data moves further from the training cutoff date.

In [7]:
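# Noise standard deviation grows linearly with row position, from near 0 up to 7.5.
# The test rows are in chronological order, so later dates receive more noise.
noise_magnitude = np.arange(1, len(X_test) + 1)/len(X_test) * 7.5
np.random.seed(1234)
# Broadcast the per-row standard deviation across all feature columns
noise = np.random.normal(0, noise_magnitude[:, np.newaxis], size=X_test.shape)
X_test_noise = X_test + noise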

In this case, the model's performance drops as time progresses, as shown by the bar graph that plots accuracy over time with the noisy test data.
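The noise in the cell above grows with row position, which tracks the date only because the rows are sorted chronologically. A variant that ties the noise scale to elapsed days explicitly might look like the sketch below. The helper `add_date_dependent_noise` and its parameters are assumptions for illustration, and it is not guaranteed to reproduce the same random values as the original cell.

```python
import numpy as np
import pandas as pd

def add_date_dependent_noise(X, dates, max_scale=7.5, seed=1234):
    """Add Gaussian noise whose standard deviation grows linearly with elapsed days."""
    dates = pd.Series(pd.to_datetime(dates)).reset_index(drop=True)
    elapsed = (dates - dates.min()).dt.days.to_numpy()
    # Scale runs from max_scale / n_days on the first day to max_scale on the last day
    n_days = elapsed.max() + 1
    scale = (elapsed + 1) / n_days * max_scale
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, 1.0, size=X.shape) * scale[:, np.newaxis]
    return X + noise, scale

# Toy input: three days, two rows per day, two features (hypothetical values)
X_toy = np.zeros((6, 2))
dates_toy = pd.date_range("2023-10-01", periods=3, freq="D").repeat(2)

X_noisy, scale = add_date_dependent_noise(X_toy, dates_toy)
print(scale)
```

With this version, all rows from the same day share one noise scale, so the result does not depend on how rows are ordered within the test set.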

In [8]:
y_pred_noise = clf.predict(X_test_noise)

print(f"Accuracy score: {accuracy_score(y_test, y_pred_noise):.2f}")

df_pred_noise = pd.DataFrame({"y_true": y_test, "y_pred": y_pred_noise, "date": date_test})

# Select the label columns explicitly so that apply() does not operate on the
# grouping column, which newer pandas versions warn about
accuracy_by_date_noise = (
    df_pred_noise.groupby("date")[["y_true", "y_pred"]]
    .apply(lambda g: accuracy_score(g["y_true"], g["y_pred"]))
    .reset_index(name="accuracy")
)

px.bar(accuracy_by_date_noise, x="date", y="accuracy", color="accuracy", title="Performance Decrease Over Time")

Accuracy score: 0.65






This simulation provides an early indication of how performance might decline in a production setting, giving you insight into the extent of the performance drop before the model is even deployed.

## Handling Model Performance Degradation

What should you do when the model's performance falls below an acceptable threshold?

- **Retrain the model**: Regularly retraining the model with fresh data can help address performance degradation.
- **Collect more data**: Gathering new data helps the model stay up-to-date with evolving patterns.
- **Add new features**: Introducing new features or improving feature engineering can help maintain or enhance the model’s performance.
- **Test new algorithms**: Evaluate different algorithms to see if they handle the evolving data better.
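As a rough sketch of the first option, retraining can be as simple as refitting a fresh copy of the model on a recent window of data. The helper `retrain_on_recent`, the 30-day window, and the synthetic data below are assumptions for illustration. A real pipeline would also validate the new model before replacing the old one.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier

def retrain_on_recent(model, df, feature_cols, label_col, date_col, as_of, window_days):
    """Fit a fresh copy of model on rows from the window_days before as_of."""
    as_of = pd.Timestamp(as_of)
    start = as_of - pd.Timedelta(days=window_days)
    recent = df[(df[date_col] >= start) & (df[date_col] < as_of)]
    fresh = clone(model)  # unfitted copy with the same hyperparameters
    fresh.fit(recent[feature_cols], recent[label_col])
    return fresh, len(recent)

# Hypothetical data: 60 days, 20 rows per day, label depends on Feature_1
rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-01", periods=60, freq="D").repeat(20)
df = pd.DataFrame(rng.normal(size=(len(dates), 2)), columns=["Feature_1", "Feature_2"])
df["Label"] = (df["Feature_1"] > 0).astype(int)
df["Date"] = dates

base = RandomForestClassifier(n_estimators=20, max_depth=3, random_state=42)
model, n_rows = retrain_on_recent(
    base, df, ["Feature_1", "Feature_2"], "Label", "Date",
    as_of="2023-03-01", window_days=30,
)
print(f"Retrained on {n_rows} rows")
```

Using `clone` leaves the original estimator untouched, which makes it easier to compare the old and new models on the same held-out data.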

## Practical Insights for Production

By analyzing the accuracy degradation over time, you can make informed decisions about:

- **Retraining intervals**: Determine how frequently the model needs to be retrained based on the observed rate of performance degradation.
- **Monitoring metrics**: Set performance thresholds that trigger retraining or model updates when accuracy falls below a certain point.
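Both decisions can be derived from an accuracy-by-date table. The sketch below finds the first date on which accuracy falls below a threshold, which gives a rough upper bound on the retraining interval. The helper `first_breach`, the 0.80 threshold, and the accuracy values are made up for illustration.

```python
import pandas as pd

def first_breach(accuracy_by_date, threshold):
    """Return the first date on which accuracy is below threshold, or None."""
    below = accuracy_by_date.loc[accuracy_by_date["accuracy"] < threshold, "date"]
    return below.min() if not below.empty else None

cutoff = pd.Timestamp("2023-10-01")
# Made-up daily accuracies after the training cutoff (assumed values)
history = pd.DataFrame({
    "date": pd.date_range(cutoff, periods=5, freq="D"),
    "accuracy": [0.93, 0.90, 0.86, 0.79, 0.74],
})

breach = first_breach(history, threshold=0.80)
if breach is not None:
    days_until_breach = (breach - cutoff).days
    print(f"Accuracy fell below 0.80 on {breach.date()}, "
          f"{days_until_breach} days after the cutoff")
```

In practice, a single noisy day may not justify retraining, so a rolling average or a rule requiring several consecutive breaches is a common refinement.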

Regular monitoring and retraining are critical to maintaining the effectiveness of machine learning models in production environments.