# Chapter 40: Why Models Fail Over Time

⚠️ **DO NOT SKIP THIS CELL**

## Run the Next Cell

Before executing any other cell, run the next cell to set up the project folder environment.

In [None]:
from pathlib import Path

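# Detect whether the notebook is running in Google Colab
try:
    from google.colab import drive
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

# In Colab, mount Google Drive and use the project folder stored there;
# otherwise assume the notebook sits one level below the project root
if IN_COLAB:
    drive.mount("/content/drive")
    PROJECT_ROOT = Path("/content/drive/MyDrive/DataScience/census-education-analysis")
else:
    PROJECT_ROOT = Path.cwd().parent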

DATA_DIR = PROJECT_ROOT / "data"
RAW_DIR = DATA_DIR / "raw"
STAGING_DIR = DATA_DIR / "staging"
PROCESSED_DIR = DATA_DIR / "processed"
OUTPUTS_DIR = PROJECT_ROOT / "outputs"

PROJECT_ROOT


## Problem 1: What Dataset Are We Monitoring?

In [None]:
import pandas as pd

input_path = OUTPUTS_DIR / "india_bi_ready.csv"
df = pd.read_csv(input_path)

df.head()

## Problem 2: What Actually Changes After a Model Goes Live?

## Problem 3: What Is Data Drift in Plain Terms?

In [None]:
df[[
    "literacy_rate",
    "gender_literacy_gap",
    "risk_score"
]].describe()
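
The summary statistics above describe one batch only; drift becomes visible when the same statistics are compared across two batches. The cell below is a minimal sketch of that comparison, not a definitive drift test. The helper `summarize_shift` and the toy frames are hypothetical, the numbers are made up for illustration, and expressing the shift in baseline standard deviations is just one simple choice among many.

```python
import pandas as pd


def summarize_shift(baseline: pd.DataFrame, current: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Compare per-column means between a baseline batch and a current batch."""
    rows = []
    for col in columns:
        base_mean = baseline[col].mean()
        base_std = baseline[col].std()
        curr_mean = current[col].mean()
        # Shift measured in baseline standard deviations; NaN when the baseline has no spread
        shift = (curr_mean - base_mean) / base_std if base_std > 0 else float("nan")
        rows.append(
            {
                "column": col,
                "baseline_mean": base_mean,
                "current_mean": curr_mean,
                "shift_in_std": shift,
            }
        )
    return pd.DataFrame(rows).set_index("column")


# Made-up toy values, for illustration only (not taken from the chapter's dataset)
toy_baseline = pd.DataFrame({"literacy_rate": [70.0, 75.0, 80.0], "risk_score": [0.2, 0.3, 0.4]})
toy_current = pd.DataFrame({"literacy_rate": [80.0, 85.0, 90.0], "risk_score": [0.2, 0.3, 0.4]})

shift_report = summarize_shift(toy_baseline, toy_current, ["literacy_rate", "risk_score"])
shift_report
```

In this toy case `literacy_rate` moves while `risk_score` does not, which is the kind of column-level difference a drift check is meant to surface. With real data, `df` and a later extract of the same table would take the place of the toy frames.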

## Problem 4: How Do Pipelines Break Without Throwing Errors?
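
One way a pipeline can break quietly is through lenient type conversion. The sketch below assumes a hypothetical upstream change in which a numeric column starts arriving with the text `"N/A"`; the batch and its values are made up for illustration. `pd.to_numeric` with `errors="coerce"` turns the unparseable entry into `NaN`, and the aggregate that follows still runs without complaint.

```python
import pandas as pd

# Made-up upstream batch: the source starts sending "N/A" instead of a number
raw_batch = pd.DataFrame({"literacy_rate": ["74.0", "81.5", "N/A", "69.2"]})

# errors="coerce" converts unparseable values to NaN instead of raising an error
parsed = pd.to_numeric(raw_batch["literacy_rate"], errors="coerce")

# The aggregation still succeeds because mean() skips NaN by default
mean_rate = parsed.mean()
n_lost = int(parsed.isna().sum())

print(f"mean literacy_rate: {mean_rate:.2f}, rows silently lost: {n_lost}")
```

Nothing here raises an exception, yet the reported mean is now based on fewer rows than the batch contains. Counting the coerced values, as `n_lost` does, is one simple way to make the loss visible.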

## Problem 5: Why Are Silent Failures More Dangerous Than Crashes?

## Problem 6: Why Should We Monitor Data, Not Just Models?

In [None]:
df.isna().mean()
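
A missing-value rate is most useful when it is compared with an earlier rate for the same column. The following sketch shows one possible check, assuming a baseline batch is available. The function `flag_missing_increase`, the `tolerance` of 0.05, and the toy frames are all hypothetical choices for illustration, not values from this project.

```python
import pandas as pd


def flag_missing_increase(baseline: pd.DataFrame, current: pd.DataFrame, tolerance: float = 0.05) -> pd.DataFrame:
    """Flag columns whose share of missing values grew by more than `tolerance`."""
    base_rate = baseline.isna().mean()
    curr_rate = current.isna().mean()
    # Align on the baseline's columns; a column absent from `current` appears as NaN
    report = pd.DataFrame(
        {
            "baseline_missing": base_rate,
            "current_missing": curr_rate.reindex(base_rate.index),
        }
    )
    report["increase"] = report["current_missing"] - report["baseline_missing"]
    # Flag large increases, and also columns that disappeared entirely
    report["flagged"] = report["increase"].gt(tolerance) | report["current_missing"].isna()
    return report


# Made-up toy batches, for illustration only
toy_baseline = pd.DataFrame(
    {"literacy_rate": [70.0, 75.0, 80.0, 85.0], "gender_literacy_gap": [5.0, 6.0, 7.0, 8.0]}
)
toy_current = pd.DataFrame(
    {"literacy_rate": [70.0, None, None, 85.0], "gender_literacy_gap": [5.0, 6.0, 7.0, 8.0]}
)

missing_report = flag_missing_increase(toy_baseline, toy_current)
missing_report
```

A check like this inspects the data itself, so it can raise a flag even when the model and the pipeline code have not changed.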

## Problem 7: Why Must Analysts Stay Involved After Deployment?

## Problem 8: Creating a Baseline Snapshot for Drift Detection

In [None]:
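# Copy the current data so the baseline is independent of later changes to df
baseline_df = df.copy()
# Record the date on which the snapshot was taken
baseline_df["baseline_date"] = pd.Timestamp.today().date()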

In [None]:
output_path = OUTPUTS_DIR / "india_bi_baseline_snapshot.csv"
baseline_df.to_csv(output_path, index=False)

output_path
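
A saved snapshot is only useful if later batches are compared against it. The sketch below illustrates one such comparison, a schema check, under stated assumptions: it writes a made-up toy snapshot to a temporary folder instead of touching `OUTPUTS_DIR`, and the helper `compare_schema`, the file name, and the column `risk` in the new batch are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def compare_schema(baseline: pd.DataFrame, current: pd.DataFrame) -> dict:
    """Return columns missing from, or unexpected in, the current batch."""
    # baseline_date is snapshot metadata, so it is excluded from the comparison
    base_cols = set(baseline.columns) - {"baseline_date"}
    curr_cols = set(current.columns)
    return {
        "missing": sorted(base_cols - curr_cols),
        "unexpected": sorted(curr_cols - base_cols),
    }


# Made-up toy snapshot, stamped the same way as the baseline above
toy_snapshot = pd.DataFrame({"literacy_rate": [70.0, 80.0], "risk_score": [0.2, 0.4]})
toy_snapshot["baseline_date"] = pd.Timestamp.today().date()

# Round-trip through a temporary CSV to mimic reloading a saved snapshot
with tempfile.TemporaryDirectory() as tmp_dir:
    snapshot_path = Path(tmp_dir) / "toy_baseline_snapshot.csv"
    toy_snapshot.to_csv(snapshot_path, index=False)
    reloaded = pd.read_csv(snapshot_path, parse_dates=["baseline_date"])

# Hypothetical later batch in which a column has been renamed upstream
toy_new_batch = pd.DataFrame({"literacy_rate": [72.0], "risk": [0.3]})

schema_diff = compare_schema(reloaded, toy_new_batch)
schema_diff
```

A renamed column is a typical change that produces no error on load, so comparing column sets against the baseline is a cheap first check before any statistical comparison.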

## End-of-Chapter Direction