# Chapter 25: Aggregation and National Metrics

⚠️ **DO NOT SKIP THIS CELL**

## Run the Next Cell First
Before executing any other cell, run the next cell to set up the project folder environment.

In [None]:
from pathlib import Path

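# Detect whether the notebook is running in Google Colab.
try:
    from google.colab import drive
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    # In Colab, the project lives on the mounted Google Drive.
    drive.mount("/content/drive")
    PROJECT_ROOT = Path("/content/drive/MyDrive/DataScience/census-education-analysis")
else:
    # Locally, assume the notebook runs from a subfolder of the project root.
    PROJECT_ROOT = Path.cwd().parent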

DATA_DIR = PROJECT_ROOT / "data"
RAW_DIR = DATA_DIR / "raw"
STAGING_DIR = DATA_DIR / "staging"
PROCESSED_DIR = DATA_DIR / "processed"
OUTPUTS_DIR = PROJECT_ROOT / "outputs"

PROJECT_ROOT


## Problem 1: What Data Are We Aggregating?

In [None]:
import pandas as pd

india_path = PROCESSED_DIR / "india_national.csv"
india_df = pd.read_csv(india_path)

india_df.head()

## Problem 2: Why Must We Remove State-Level Rows Before Aggregation?

In [None]:
# Keep only district rows; rows with district_code == 0 are treated as state-level totals.
district_df = india_df[india_df["district_code"] != 0]
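
The effect of this filter can be sketched on a made-up toy frame. The sketch assumes, as the filter above suggests, that a row with `district_code == 0` is a state-level total that already includes every district in that state. The state name and all numbers are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: one state-level row (district_code 0) plus two districts.
toy_df = pd.DataFrame({
    "state_name": ["Toyland", "Toyland", "Toyland"],
    "district_code": [0, 1, 2],
    "total_persons": [300, 100, 200],
})

# Summing every row counts each person twice: once in the state row, once in a district row.
naive_total = toy_df["total_persons"].sum()

# Summing district rows only reproduces the state total.
district_total = toy_df.loc[toy_df["district_code"] != 0, "total_persons"].sum()

print(naive_total, district_total)
```

If the state-level rows stayed in, every state sum computed later in the chapter would be inflated in the same way.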

## Problem 3: Why Must Aggregation Respect `area_type`?
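
A minimal sketch of the risk, on made-up data. It assumes that `area_type` takes values such as `"Total"`, `"Rural"`, and `"Urban"`, and that the `"Total"` row already equals rural plus urban; the actual labels in `india_national.csv` may differ, so check them with `india_df["area_type"].unique()` before relying on this.

```python
import pandas as pd

# Hypothetical toy data: the "Total" row is assumed to equal Rural + Urban.
toy_df = pd.DataFrame({
    "state_name": ["Toyland"] * 3,
    "area_type": ["Total", "Rural", "Urban"],
    "total_persons": [300, 180, 120],
})

# Grouping by state alone mixes the three area types and double-counts.
collapsed = toy_df.groupby("state_name")["total_persons"].sum()

# Grouping by state and area type keeps each population separate.
by_area = toy_df.groupby(["state_name", "area_type"])["total_persons"].sum()

print(collapsed)
print(by_area)
```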

## Problem 4: What Level of Analysis Do We Want First?

## Problem 5: Which Columns Can Be Safely Aggregated?
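
One way to see the distinction is to compare counts, which add up across districts, with rates, which do not. The toy numbers below are invented for illustration, and the column names simply mirror the ones used in this chapter.

```python
import pandas as pd

# Hypothetical toy data: a small, highly literate district and a large, less literate one.
toy_df = pd.DataFrame({
    "total_persons": [100, 900],
    "total_literate": [90, 450],
})
toy_df["literacy_rate"] = toy_df["total_literate"] / toy_df["total_persons"]

# Averaging the district rates ignores how many people live in each district.
mean_of_rates = toy_df["literacy_rate"].mean()

# Summing the counts first and then dividing weights each district by its population.
rate_of_sums = toy_df["total_literate"].sum() / toy_df["total_persons"].sum()

print(mean_of_rates, rate_of_sums)
```

The two results differ (roughly 0.70 versus 0.54 here), which suggests a rule of thumb: sum count columns, then recompute any rate from the summed counts.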

## Problem 6: How Do We Aggregate Districts into States?

In [None]:
state_area_totals = (
    district_df
    .groupby(["state_name", "area_type"], as_index=False)
    .agg({
        "total_persons": "sum",
        "male_persons": "sum",
        "female_persons": "sum",
        "total_literate": "sum",
        "male_literate": "sum",
        "female_literate": "sum",
        "total_illiterate": "sum",
        "male_illiterate": "sum",
        "female_illiterate": "sum",
    })
)
state_area_totals.head()
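
A quick consistency check can follow the aggregation. The sketch below runs on a made-up frame and assumes that male plus female persons should equal total persons, and that literate plus illiterate should also equal total persons. Both are assumptions about how the source columns are defined, not something this chapter confirms.

```python
import pandas as pd

# Hypothetical aggregated totals for one state, split by area type.
toy_totals = pd.DataFrame({
    "state_name": ["Toyland", "Toyland"],
    "area_type": ["Rural", "Urban"],
    "total_persons": [180, 120],
    "male_persons": [95, 60],
    "female_persons": [85, 60],
    "total_literate": [110, 100],
    "total_illiterate": [70, 20],
})

# Row-wise checks: each pair of parts should add up to the total.
sex_ok = (
    toy_totals["male_persons"] + toy_totals["female_persons"]
    == toy_totals["total_persons"]
)
literacy_ok = (
    toy_totals["total_literate"] + toy_totals["total_illiterate"]
    == toy_totals["total_persons"]
)

# Any rows listed here would need a closer look before computing metrics.
inconsistent_rows = toy_totals[~(sex_ok & literacy_ok)]
print(len(inconsistent_rows))
```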

## Problem 7: How Do We Compute Meaningful State Metrics?

In [None]:
# Literacy rate as a fraction between 0 and 1, computed from the summed counts.
state_area_totals["literacy_rate"] = (
    state_area_totals["total_literate"]
    / state_area_totals["total_persons"]
)
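
The same ratio-of-sums pattern could be extended to sex-specific rates. The sketch below uses a made-up one-row frame; the output column names `male_literacy_rate`, `female_literacy_rate`, and `literacy_gender_gap` are hypothetical and do not appear elsewhere in this chapter.

```python
import pandas as pd

# Hypothetical aggregated totals for a single state and area type.
toy_totals = pd.DataFrame({
    "state_name": ["Toyland"],
    "area_type": ["Total"],
    "male_persons": [500],
    "female_persons": [500],
    "male_literate": [400],
    "female_literate": [300],
})

# Each rate divides a summed literate count by the matching summed population.
toy_totals["male_literacy_rate"] = (
    toy_totals["male_literate"] / toy_totals["male_persons"]
)
toy_totals["female_literacy_rate"] = (
    toy_totals["female_literate"] / toy_totals["female_persons"]
)

# A positive gap means the male rate is higher than the female rate.
toy_totals["literacy_gender_gap"] = (
    toy_totals["male_literacy_rate"] - toy_totals["female_literacy_rate"]
)

print(toy_totals[["male_literacy_rate", "female_literacy_rate", "literacy_gender_gap"]])
```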

## Problem 8: How Do We Quickly Read National Patterns?

In [None]:
state_area_totals.sort_values(
    ["area_type", "literacy_rate"],
    ascending=[True, False]
).head()
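
Because the frame is sorted by `area_type` first, a plain `.head()` shows only rows from the first area type. One possible variation, sketched here on made-up data, takes the top rows within each area type using `groupby(...).head(n)`. State names and rates are hypothetical.

```python
import pandas as pd

# Hypothetical state-level rates for two area types.
toy_rates = pd.DataFrame({
    "state_name": ["A", "B", "C", "A", "B", "C"],
    "area_type": ["Rural"] * 3 + ["Urban"] * 3,
    "literacy_rate": [0.6, 0.7, 0.5, 0.8, 0.9, 0.85],
})

# Sort as in the cell above, then keep the top 2 rows of every area type.
top_per_area = (
    toy_rates
    .sort_values(["area_type", "literacy_rate"], ascending=[True, False])
    .groupby("area_type")
    .head(2)
)

print(top_per_area)
```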

## Problem 9: How Do We Save Aggregated Results for the Next Chapter?

In [None]:
output_path = PROCESSED_DIR / "india_literacy.csv"
state_area_totals.to_csv(output_path, index=False)

output_path
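
Before moving on, it can help to confirm that a saved file reads back with the same shape. The sketch below writes a made-up frame to a temporary folder rather than to `PROCESSED_DIR`, so it does not touch the real output; the file name `toy_literacy.csv` is hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical aggregated results standing in for state_area_totals.
toy_totals = pd.DataFrame({
    "state_name": ["Toyland", "Toyland"],
    "area_type": ["Rural", "Urban"],
    "total_persons": [180, 120],
    "total_literate": [90, 90],
    "literacy_rate": [0.5, 0.75],
})

# Write to a throwaway directory and read the file straight back.
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = Path(tmp_dir) / "toy_literacy.csv"
    toy_totals.to_csv(toy_path, index=False)
    reloaded = pd.read_csv(toy_path)

# With index=False, no extra index column should appear after reloading.
round_trip_ok = (
    list(reloaded.columns) == list(toy_totals.columns)
    and reloaded.shape == toy_totals.shape
)
print(round_trip_ok)
```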

## End-of-Chapter Direction