# Chapter 35: Building a Responsible First Model

⚠️ **DO NOT SKIP THIS CELL**

## Run the next cell first.
### Before executing any other cell, you must run the next cell to set up the project folder environment.

In [None]:
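from pathlib import Path

# Detect whether the notebook is running inside Google Colab.
try:
    from google.colab import drive
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    # Mount Google Drive so the project folder is reachable from Colab.
    drive.mount("/content/drive")
    PROJECT_ROOT = Path("/content/drive/MyDrive/DataScience/census-education-analysis")
else:
    # Locally, assume the notebook runs from a subfolder one level below the project root.
    PROJECT_ROOT = Path.cwd().parent

# Project folders used throughout the chapter.
DATA_DIR = PROJECT_ROOT / "data"
RAW_DIR = DATA_DIR / "raw"
STAGING_DIR = DATA_DIR / "staging"
PROCESSED_DIR = DATA_DIR / "processed"
OUTPUTS_DIR = PROJECT_ROOT / "outputs"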

PROJECT_ROOT
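If later cells cannot find their files, the usual cause is a `PROJECT_ROOT` that points at the wrong folder. The sketch below uses a hypothetical helper, `missing_project_dirs`, to list which of the folders named in the setup cell do not exist under a given root. The temporary directory is only there so the example runs anywhere; in the notebook you would pass `PROJECT_ROOT` instead.

```python
from pathlib import Path
import tempfile

# Folder layout assumed by the setup cell, relative to the project root.
EXPECTED_DIRS = [
    Path("data") / "raw",
    Path("data") / "staging",
    Path("data") / "processed",
    Path("outputs"),
]


def missing_project_dirs(root: Path) -> list[str]:
    """Hypothetical helper: return expected folders that do not exist under root."""
    return [str(rel) for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


# Usage: build a throwaway project that has only some of the folders.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data" / "raw").mkdir(parents=True)
    (root / "outputs").mkdir()
    missing = missing_project_dirs(root)

print(missing)  # the staging and processed folders are reported as missing
```

An empty list means every expected folder is in place.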


## Problem 1: What Dataset Are We Starting From?

In [None]:
import pandas as pd

input_path = OUTPUTS_DIR / "india_feature_ready.csv"
df = pd.read_csv(input_path)

df.head()
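Before modelling, it is worth confirming that the columns the later cells rely on are present and complete; scikit-learn's `LogisticRegression` raises an error if the feature matrix contains `NaN`. The sketch below runs on a small made-up frame standing in for `india_feature_ready.csv`; the helper `describe_readiness` and the extra `district_name` column are hypothetical and only illustrate how a missing column is reported.

```python
import numpy as np
import pandas as pd

# Columns used later in the chapter (the three features and the target).
REQUIRED_COLUMNS = ["literacy_rate", "gender_literacy_gap", "total_persons", "priority_flag"]

# Made-up rows, not census data; one literacy value is deliberately missing.
toy = pd.DataFrame({
    "literacy_rate": [72.5, 64.1, np.nan, 81.0],
    "gender_literacy_gap": [12.3, 18.9, 15.2, 7.4],
    "total_persons": [1_200_000, 850_000, 430_000, 2_100_000],
    "priority_flag": [0, 1, 1, 0],
})


def describe_readiness(frame: pd.DataFrame, required: list[str]) -> pd.DataFrame:
    """Hypothetical helper: report presence, dtype and missing count per required column."""
    rows = []
    for col in required:
        present = col in frame.columns
        rows.append({
            "column": col,
            "present": present,
            "dtype": str(frame[col].dtype) if present else None,
            "n_missing": int(frame[col].isna().sum()) if present else None,
        })
    return pd.DataFrame(rows).set_index("column")


report = describe_readiness(toy, REQUIRED_COLUMNS + ["district_name"])
print(report)
```

In the notebook, `describe_readiness(df, REQUIRED_COLUMNS)` would give the same report for the real file.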

## Problem 2: What Decision Are We Trying to Support?

## Problem 3: Why Choose the Simplest Possible Model?

## Problem 4: How Do We Separate Training from Testing (Conceptually)?

In [None]:
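from sklearn.model_selection import train_test_split

# Input features: literacy level, gender gap in literacy, and population size.
features = [
    "literacy_rate",
    "gender_literacy_gap",
    "total_persons"
]

X = df[features]
# Target: the binary priority flag the model learns to predict.
y = df["priority_flag"]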
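The three features live on very different scales: two are rates, while `total_persons` is a population count in the hundreds of thousands or more. The sketch below, on made-up values, shows the size of that gap and how scikit-learn's `StandardScaler` brings each column to mean 0 and standard deviation 1, which keeps the large count from dominating a linear model's optimisation.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up feature values: two rates and one population count.
X_toy = pd.DataFrame({
    "literacy_rate": [72.5, 64.1, 58.3, 81.0, 69.7],
    "gender_literacy_gap": [12.3, 18.9, 21.4, 7.4, 14.0],
    "total_persons": [1_200_000, 850_000, 430_000, 2_100_000, 990_000],
})

# Raw spread differs by several orders of magnitude across columns.
print(X_toy.std(ddof=0))

# StandardScaler subtracts each column's mean and divides by its population std.
scaled = StandardScaler().fit_transform(X_toy)
print(scaled.mean(axis=0).round(6))  # approximately 0 for every column
print(scaled.std(axis=0).round(6))   # approximately 1 for every column
```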

In [None]:
# Hold out 20% of the rows for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42
)
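When only a minority of rows carry the priority flag, a purely random split can leave the test set with noticeably more or fewer positives than the full data. `train_test_split` accepts a `stratify` argument that preserves the class proportions in both splits. A minimal sketch on made-up labels, 20% of them positive:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data: 100 rows, 20 of them flagged as priority.
rng = np.random.default_rng(0)
X_toy = pd.DataFrame({"literacy_rate": rng.uniform(40, 95, size=100)})
y_toy = pd.Series([1] * 20 + [0] * 80, name="priority_flag")

# stratify=y_toy keeps the 20% positive share in both the train and test splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)
print(len(y_te), y_te.mean())  # 20 test rows, 20% of them positive
print(len(y_tr), y_tr.mean())  # 80 training rows, 20% of them positive
```

Stratification needs at least two rows of every class; with a single positive row, `train_test_split` raises a `ValueError`.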

## Problem 5: What Kind of Model Fits This Goal?

In [None]:
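from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize features first: total_persons is a count on a far larger scale than
# the two rate features, which can slow the solver or stop it from converging.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)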
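Part of the appeal of logistic regression as a first model is that its fitted coefficients can be read directly. The sketch below fits a scaler-plus-logistic-regression pipeline on made-up data in which low literacy or a wide gender gap drives the flag; after standardization, each coefficient is the change in log-odds for a one-standard-deviation increase in that feature. The data-generating rule is an assumption chosen for illustration, not a property of the census data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

features = ["literacy_rate", "gender_literacy_gap", "total_persons"]

# Made-up data: the flag depends on literacy and the gap, not on population.
rng = np.random.default_rng(0)
n = 200
X_toy = pd.DataFrame({
    "literacy_rate": rng.uniform(40, 95, n),
    "gender_literacy_gap": rng.uniform(0, 30, n),
    "total_persons": rng.integers(50_000, 3_000_000, n),
})
y_toy = ((X_toy["literacy_rate"] < 60) | (X_toy["gender_literacy_gap"] > 22)).astype(int)

model_toy = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model_toy.fit(X_toy[features], y_toy)

# The last pipeline step is the LogisticRegression; coef_ has shape (1, n_features).
coefs = pd.Series(model_toy[-1].coef_[0], index=features)
print(coefs.sort_values())
```

Here the literacy coefficient should come out negative and the gap coefficient positive, matching the rule that generated the labels.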

## Problem 6: What Does the Model Actually Produce?

In [None]:
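# Probability of the positive class (model.classes_[1]) for every row.
# This includes rows the model was trained on, so evaluate on X_test only.
df["risk_score"] = model.predict_proba(X)[:, 1]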
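`predict_proba` returns one column per class, ordered as in `classes_` (sorted), so `[:, 1]` is the probability of the second class, here the flagged class. For binary logistic regression that probability is the sigmoid of the model's linear score, which `decision_function` exposes. A small check on made-up one-feature data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up one-feature data: lower values are more often flagged.
X_toy = np.array([[35.0], [42.0], [50.0], [58.0], [66.0], [74.0], [81.0], [90.0]])
y_toy = np.array([1, 1, 1, 0, 1, 0, 0, 0])

clf = LogisticRegression().fit(X_toy, y_toy)

# classes_ is sorted, so column 1 of predict_proba is P(class == classes_[1]).
print(clf.classes_)  # [0 1]

# P(class 1) = 1 / (1 + exp(-(w.x + b))), with w.x + b given by decision_function.
linear_score = clf.decision_function(X_toy)
manual_proba = 1.0 / (1.0 + np.exp(-linear_score))
print(np.allclose(manual_proba, clf.predict_proba(X_toy)[:, 1]))  # True
```

The score is therefore a probability-like value between 0 and 1, not a yes/no decision; turning it into an action still requires choosing a cut-off or a review budget.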

## Problem 7: How Do We Evaluate Usefulness (Not Just Accuracy)?

In [None]:
df.sort_values("risk_score", ascending=False).head()
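A ranked list is useful only if its top really contains priority areas, and that should be measured on rows the model did not see during training. One common measure is precision at k: the share of the k highest-scored rows that are truly flagged, compared with the base rate you would get by picking rows at random. The helper `precision_at_k` and the labels and scores below are made up for illustration.

```python
import numpy as np


def precision_at_k(y_true, scores, k):
    """Share of the k highest-scored rows whose true label is 1."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores), kind="stable")  # highest score first
    return float(y_true[order[:k]].mean())


# Made-up held-out labels and risk scores for 10 rows.
y_test_toy = np.array([1, 0, 1, 0, 0, 1, 0, 0, 0, 1])
scores_toy = np.array([0.91, 0.85, 0.80, 0.40, 0.35, 0.30, 0.22, 0.15, 0.10, 0.05])

base_rate = y_test_toy.mean()  # expected precision when picking rows at random
p_at_3 = precision_at_k(y_test_toy, scores_toy, k=3)
print(f"base rate: {base_rate:.2f}, precision@3: {p_at_3:.2f}")  # 0.40 vs 0.67
```

In the notebook, the equivalent call would be `precision_at_k(y_test, model.predict_proba(X_test)[:, 1], k)` with k set to the number of areas that can realistically be reviewed.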

## Problem 8: How Do We Keep Humans in the Loop?

In [None]:
df.sort_values("risk_score", ascending=False).head()
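One way to keep a person in charge of the final call is to export only the top-ranked rows as a review sheet with empty columns for the reviewer's decision and notes, so the model proposes and a human disposes. The helper `build_review_sheet`, the columns `review_rank`, `reviewer_decision` and `reviewer_note`, and the `area_name` identifier are all hypothetical names chosen for this sketch.

```python
import pandas as pd


def build_review_sheet(scored: pd.DataFrame, k: int) -> pd.DataFrame:
    """Hypothetical helper: top-k rows by risk_score plus blank reviewer columns."""
    sheet = (
        scored.sort_values("risk_score", ascending=False)
        .head(k)
        .reset_index(drop=True)
    )
    sheet.insert(0, "review_rank", list(range(1, len(sheet) + 1)))
    sheet["reviewer_decision"] = pd.NA  # e.g. "confirm" or "reject", set by a person
    sheet["reviewer_note"] = ""
    return sheet


# Made-up scored rows; area_name stands in for whatever identifies an area.
scored_toy = pd.DataFrame({
    "area_name": ["A", "B", "C", "D", "E"],
    "risk_score": [0.12, 0.87, 0.45, 0.91, 0.33],
})
sheet = build_review_sheet(scored_toy, k=3)
print(sheet)
```

Keeping the reviewer's decision in its own column, rather than overwriting the model's score, preserves a record of where people agreed or disagreed with the ranking.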

## Problem 9: Saving the Model Output for the Next Chapter

In [None]:
output_path = OUTPUTS_DIR / "india_model_scored.csv"
df.to_csv(output_path, index=False)

output_path
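Since the next chapter starts from this CSV, a quick round-trip check can confirm that the file reads back with the same columns, row count and scores. The sketch below writes a small made-up frame to a temporary folder and compares it with what `pd.read_csv` returns; in the notebook you would read `output_path` and compare it with `df` instead.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up frame standing in for df after the risk_score column is added.
scored_toy = pd.DataFrame({
    "literacy_rate": [72.5, 64.1, 58.3],
    "priority_flag": [0, 1, 1],
    "risk_score": [0.18, 0.66, 0.81],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "india_model_scored.csv"
    scored_toy.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

# Raises an AssertionError if columns, dtypes or values differ beyond float tolerance.
pd.testing.assert_frame_equal(reloaded, scored_toy)
print("round trip OK")
```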

## End-of-Chapter Direction