# Chapter 32: When Do You Actually Need a Model?

⚠️ **DO NOT SKIP THIS CELL**

## Run the Next Cell
### Before executing any other cell, run the next cell to set up the project folder environment.

In [None]:
from pathlib import Path

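# Detect whether the notebook is running in Google Colab.
try:
    from google.colab import drive
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    # In Colab, the project lives on the mounted Google Drive.
    drive.mount("/content/drive")
    PROJECT_ROOT = Path("/content/drive/MyDrive/DataScience/census-education-analysis")
else:
    # Locally, treat the parent of the current working directory as the project root.
    PROJECT_ROOT = Path.cwd().parent

# Project folders, all defined relative to PROJECT_ROOT.
DATA_DIR = PROJECT_ROOT / "data"
RAW_DIR = DATA_DIR / "raw"
STAGING_DIR = DATA_DIR / "staging"
PROCESSED_DIR = DATA_DIR / "processed"
OUTPUTS_DIR = PROJECT_ROOT / "outputs"

# Display the resolved root as a quick sanity check.
PROJECT_ROOT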


## Problem 1: What Dataset Are We Starting From?

In [None]:
import pandas as pd

# Load the starting dataset from the outputs folder.
ai_path = OUTPUTS_DIR / "india_ai_ready.csv"
ai_df = pd.read_csv(ai_path)

ai_df.head()
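
The cells that follow read four columns from `ai_df`: `state_name`, `total_persons`, `literacy_rate`, and `gender_literacy_gap`. A quick check that they exist, and that the rates look like fractions rather than percentages, can catch problems before any rule is applied. This is a minimal sketch on a small made-up frame standing in for `ai_df`; the helper name `missing_columns` and the sample values are illustrative assumptions, not part of the project.

```python
import pandas as pd

# Columns that later cells in this chapter read from ai_df.
REQUIRED_COLUMNS = [
    "state_name",
    "total_persons",
    "literacy_rate",
    "gender_literacy_gap",
]


def missing_columns(df, required):
    """Return the required column names that are absent from df (hypothetical helper)."""
    return [col for col in required if col not in df.columns]


# Made-up stand-in for ai_df; the values are illustrative only.
sample_df = pd.DataFrame({
    "state_name": ["A", "B", "C"],
    "total_persons": [1000, 2000, 3000],
    "literacy_rate": [0.62, 0.78, 0.91],
    "gender_literacy_gap": [0.18, 0.09, 0.03],
})

missing = missing_columns(sample_df, REQUIRED_COLUMNS)

# A cutoff such as 0.75 only makes sense if rates are fractions in [0, 1].
rates_are_fractions = bool(sample_df["literacy_rate"].between(0, 1).all())

print("Missing columns:", missing)
print("Rates look like fractions:", rates_are_fractions)
```

To run the same check on the real data, pass `ai_df` in place of `sample_df`.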

## Problem 2: What Decision Are We Trying to Make?

## Problem 3: Can a Simple Rule Solve This?

In [None]:
# Flag rows below 0.75 (assumes literacy_rate is a fraction, not a percentage).
ai_df["low_literacy_rule"] = ai_df["literacy_rate"] < 0.75

In [None]:
ai_df["low_literacy_rule"].value_counts()
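
The count above depends entirely on where the cutoff sits. One way to see how sensitive the rule is would be to count flagged rows at a few nearby cutoffs. The sketch below does this on made-up rates standing in for `ai_df["literacy_rate"]`; the alternative cutoffs 0.70 and 0.80 are arbitrary choices for illustration.

```python
import pandas as pd

# Made-up literacy rates standing in for ai_df["literacy_rate"].
rates = pd.Series([0.62, 0.71, 0.74, 0.76, 0.79, 0.91], name="literacy_rate")

# Count how many rows each candidate cutoff would flag.
cutoffs = [0.70, 0.75, 0.80]
flag_counts = {cutoff: int((rates < cutoff).sum()) for cutoff in cutoffs}

for cutoff, count in flag_counts.items():
    print(f"cutoff {cutoff:.2f}: {count} flagged")
```

If small moves in the cutoff change the count a lot, many rows sit close to the boundary and the rule's output deserves a closer look.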

## Problem 4: When Do Thresholds Improve on Simple Rules?

In [None]:
# Flag rows where the gender literacy gap exceeds 0.10 (assumes the gap is a fraction).
ai_df["high_gender_gap_threshold"] = (
    ai_df["gender_literacy_gap"] > 0.10
)

## Problem 5: How Do We Combine Multiple Simple Decisions?

In [None]:
# A row is a priority if either rule fires (element-wise OR).
ai_df["priority_flag"] = (
    ai_df["low_literacy_rule"] |
    ai_df["high_gender_gap_threshold"]
)

In [None]:
ai_df["priority_flag"].value_counts()
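
Because `priority_flag` is an OR of two rules, its total does not show which rule is responsible for each flagged row. A cross-tabulation of the two inputs separates rows flagged by one rule, by the other, or by both. This is a minimal sketch on a made-up frame that reuses the chapter's column names and cutoffs; the values are illustrative only.

```python
import pandas as pd

# Made-up stand-in for ai_df; the values are illustrative only.
sample_df = pd.DataFrame({
    "literacy_rate": [0.62, 0.71, 0.78, 0.82, 0.91],
    "gender_literacy_gap": [0.18, 0.06, 0.14, 0.05, 0.03],
})

# Same rules and cutoffs as the cells above.
sample_df["low_literacy_rule"] = sample_df["literacy_rate"] < 0.75
sample_df["high_gender_gap_threshold"] = sample_df["gender_literacy_gap"] > 0.10
sample_df["priority_flag"] = (
    sample_df["low_literacy_rule"] | sample_df["high_gender_gap_threshold"]
)

# Rows of the table: low_literacy_rule; columns: high_gender_gap_threshold.
overlap = pd.crosstab(
    sample_df["low_literacy_rule"],
    sample_df["high_gender_gap_threshold"],
)
print(overlap)
```

The cell where both rules are `False` holds the rows that are not prioritized; the other three cells add up to the `True` count of `priority_flag`.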

## Problem 6: When Do Simple Rules Start to Break?
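
One place a hard cutoff shows strain is right at the boundary: two rows with nearly identical literacy rates can land on opposite sides of 0.75 and receive opposite flags. The sketch below lists such near-boundary rows on a made-up frame. The `MARGIN` value is a hypothetical band width chosen for illustration, not a value taken from the project.

```python
import pandas as pd

CUTOFF = 0.75  # same cutoff as low_literacy_rule
MARGIN = 0.02  # hypothetical width of the "too close to call" band

# Made-up stand-in for ai_df; the values are illustrative only.
sample_df = pd.DataFrame({
    "state_name": ["A", "B", "C", "D"],
    "literacy_rate": [0.62, 0.745, 0.755, 0.91],
})
sample_df["low_literacy_rule"] = sample_df["literacy_rate"] < CUTOFF

# Rows whose rate sits within MARGIN of the cutoff, on either side.
distance = (sample_df["literacy_rate"] - CUTOFF).abs()
near_cutoff = sample_df[distance <= MARGIN]
print(near_cutoff)
```

Here the two rows that differ by one percentage point get different flags, which is the kind of behavior worth keeping in mind when comparing rules against a model.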

## Problem 7: What Actually Changes When You Introduce a Model?

## Problem 8: Freezing Rule-Based Decisions for Comparison

In [None]:
# Keep identifiers, rule inputs, and rule outputs; .copy() detaches the result from ai_df.
decision_df = ai_df[[
    "state_name",
    "total_persons",
    "literacy_rate",
    "gender_literacy_gap",
    "low_literacy_rule",
    "high_gender_gap_threshold",
    "priority_flag"
]].copy()

In [None]:
# Save the frozen decisions without the DataFrame index.
output_path = OUTPUTS_DIR / "india_rule_based_decisions.csv"
decision_df.to_csv(output_path, index=False)

output_path
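
Since this file is meant to serve as a fixed baseline, it may be worth confirming that the flags read back exactly as they were written. The sketch below round-trips a small made-up frame through a temporary CSV; the file name and values are illustrative assumptions and do not touch `OUTPUTS_DIR`.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up stand-in for decision_df; the values are illustrative only.
decision_sample = pd.DataFrame({
    "state_name": ["A", "B"],
    "literacy_rate": [0.62, 0.91],
    "priority_flag": [True, False],
})

# Write to a temporary folder and read the file straight back.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "rule_based_decisions_sample.csv"
    decision_sample.to_csv(sample_path, index=False)
    reloaded = pd.read_csv(sample_path)

# Series.equals compares both values and dtype.
flags_survived = reloaded["priority_flag"].equals(decision_sample["priority_flag"])

print(reloaded.dtypes)
print("Flags unchanged after reload:", flags_survived)
```

The same comparison can be run on the real output by reading `output_path` and comparing against `decision_df`.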

## End-of-Chapter Direction