# Step 10: Model Monitoring (Detecting Data Drift)

### Objectives:
- Load the train and production datasets.
- Use alibi-detect to detect data drift.
- Print test results in a tabular format.

### 1. Install & Import Required Libraries

In [2]:
!pip install alibi-detect scikit-learn pandas numpy tabulate pyarrow


In [4]:
import pandas as pd
import numpy as np
from alibi_detect.cd import KSDrift
from tabulate import tabulate

### 2. Load Train & Production Data

In [5]:
# Load train dataset
train_data_path = "https://raw.githubusercontent.com/riasingh-13/Bank_Marketing_Project/main/datasets/train.parquet"
df_train = pd.read_parquet(train_data_path, engine="pyarrow")

# Load production dataset
prod_data_path = "https://raw.githubusercontent.com/riasingh-13/Bank_Marketing_Project/main/datasets/prod.parquet"
df_prod = pd.read_parquet(prod_data_path, engine="pyarrow")

# Drop target column if present
target_col = "y"
if target_col in df_train.columns:
    df_train = df_train.drop(columns=[target_col])

if target_col in df_prod.columns:
    df_prod = df_prod.drop(columns=[target_col])

# Align production columns to the training column order
# (raises a KeyError if production lacks a training column)
df_prod = df_prod[df_train.columns]

# Display dataset sizes
print(f"✅ Train Data Shape: {df_train.shape}")
print(f"✅ Production Data Shape: {df_prod.shape}")

✅ Train Data Shape: (24712, 20)
✅ Production Data Shape: (8238, 20)


### 3. Preprocess Data for Drift Detection

In [6]:
# Convert categorical variables to numeric
df_train_encoded = pd.get_dummies(df_train)
df_prod_encoded = pd.get_dummies(df_prod)

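# Ensure both datasets have the same columns: categories absent from
# production get an all-zero column
missing_cols = set(df_train_encoded.columns) - set(df_prod_encoded.columns)
for col in missing_cols:
    df_prod_encoded[col] = 0  # Add missing columns with default value

# Reorder to the training column order; categories seen only in production are dropped here
df_prod_encoded = df_prod_encoded[df_train_encoded.columns]

# Convert to float NumPy arrays for drift detection (WITHOUT preprocess_drift).
# The explicit dtype avoids an object array when boolean dummy columns are
# mixed with numeric ones.
X_train = df_train_encoded.to_numpy(dtype=np.float64)
X_prod = df_prod_encoded.to_numpy(dtype=np.float64)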

print("✅ Data preprocessing completed for drift detection.")

✅ Data preprocessing completed for drift detection.
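
The column alignment in this step can be illustrated on a small example. The sketch below uses hypothetical toy frames (`train`, `prod`) rather than the bank marketing data, and shows `DataFrame.reindex` as one possible way to add missing one-hot columns and drop unseen ones in a single call. It also reports which categories differ, which may be worth checking before trusting the drift results.

```python
import pandas as pd

# Hypothetical toy frames standing in for the train and production data
train = pd.DataFrame({"age": [30, 41, 52], "job": ["admin", "student", "retired"]})
prod = pd.DataFrame({"age": [28, 45], "job": ["admin", "unknown"]})

train_encoded = pd.get_dummies(train, dtype=float)
prod_encoded = pd.get_dummies(prod, dtype=float)

# Categories present in only one of the two frames
missing_in_prod = sorted(set(train_encoded.columns) - set(prod_encoded.columns))
unseen_in_train = sorted(set(prod_encoded.columns) - set(train_encoded.columns))

# reindex adds the missing columns (filled with 0) and drops the unseen ones
prod_aligned = prod_encoded.reindex(columns=train_encoded.columns, fill_value=0.0)

print("Missing in production:", missing_in_prod)
print("Unseen in training (dropped):", unseen_in_train)
print(prod_aligned)
```

A category that appears only in production is itself a hint of drift, so silently dropping it can hide a change the detector would otherwise never see.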


### 4. Detect Data Drift Using Alibi-Detect
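
As background for this step, the sketch below shows what a two-sample Kolmogorov-Smirnov test measures for a single feature: the largest gap between the empirical CDFs of the reference and production samples. It is a simplified pure-Python illustration, not alibi-detect's implementation. The helper names (`ks_statistic`, `bonferroni_threshold`), the toy samples, and the feature count of 50 are all assumptions made for the example.

```python
from bisect import bisect_right


def ks_statistic(reference, production):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    prod = sorted(production)
    max_gap = 0.0
    # The gap can only change at observed values, so checking those is enough
    for value in set(ref) | set(prod):
        cdf_ref = bisect_right(ref, value) / len(ref)
        cdf_prod = bisect_right(prod, value) / len(prod)
        max_gap = max(max_gap, abs(cdf_ref - cdf_prod))
    return max_gap


def bonferroni_threshold(p_val, n_features):
    """Per-feature threshold that keeps the family-wise error rate at or below p_val."""
    return p_val / n_features


# Hypothetical toy samples, not taken from the bank marketing data
same = ks_statistic([1, 2, 3, 4], [1, 2, 3, 4])
shifted = ks_statistic([1, 2, 3, 4], [11, 12, 13, 14])

# Hypothetical feature count, chosen only to illustrate the correction
threshold = bonferroni_threshold(0.05, 50)

print(same, shifted, threshold)
```

A statistic near 0 means the two samples look alike, and a statistic near 1 means they barely overlap. Because one test is run per feature, a correction such as Bonferroni lowers the per-feature threshold so that testing many features does not inflate the overall false-alarm rate.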

In [7]:
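# Significance level shared by the detector and the per-feature table
P_VAL = 0.05

# Initialize the Kolmogorov-Smirnov drift detector with the training data as reference
cd = KSDrift(X_train, p_val=P_VAL)

# Run the drift test on production data
preds = cd.predict(X_prod)

# Extract results. With alibi-detect's default settings the overall is_drift
# flag uses a Bonferroni-corrected threshold (p_val / n_features), so it is
# stricter than the uncorrected per-feature check below.
drift_status = "Drift Detected" if preds["data"]["is_drift"] == 1 else "No Significant Drift"
p_values = preds["data"]["p_val"]
feature_names = df_train_encoded.columns.tolist()

# Build one table row per feature: name, p-value, and uncorrected drift flag
drift_results = [
    [feature, p, "Drift" if p < P_VAL else "No Drift"]
    for feature, p in zip(feature_names, p_values)
]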

# Display results in tabular format
print("\n📊 **Data Drift Analysis Results:**\n")
print(tabulate(drift_results, headers=["Feature", "p-value", "Drift Status"], tablefmt="grid"))

# Print overall drift status
print(f"\n✅ **Overall Drift Status:** {drift_status}")


📊 **Data Drift Analysis Results:**

+-------------------------------+-----------+----------------+
| Feature                       |   p-value | Drift Status   |
+===============================+===========+================+
| age                           |  0.166333 | No Drift       |
+-------------------------------+-----------+----------------+
| duration                      |  0.498735 | No Drift       |
+-------------------------------+-----------+----------------+
| campaign                      |  0.222693 | No Drift       |
+-------------------------------+-----------+----------------+
| pdays                         |  0.993563 | No Drift       |
+-------------------------------+-----------+----------------+
| previous                      |  0.999986 | No Drift       |
+-------------------------------+-----------+----------------+
| emp.var.rate                  |  0.999998 | No Drift       |
+-------------------------------+-----------+----------------+
| cons.price.idx                |  0.998293 | No Drift       |
+--------------