# Chapter 26: Gender-Based Education Analysis

⚠️ **DO NOT SKIP THIS CELL**

## Run the Next Cell
### Before executing any other cell, you must run the next cell to set up the project folder environment.

In [None]:
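from pathlib import Path

# Detect whether the notebook is running inside Google Colab.
try:
    from google.colab import drive
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    # Mount Google Drive so the project folder is reachable from Colab.
    drive.mount("/content/drive")
    PROJECT_ROOT = Path("/content/drive/MyDrive/DataScience/census-education-analysis")
else:
    # Locally, assume this notebook sits one level below the project root
    # (for example, in a notebooks/ folder).
    PROJECT_ROOT = Path.cwd().parent

# Project folder layout.
DATA_DIR = PROJECT_ROOT / "data"
RAW_DIR = DATA_DIR / "raw"
STAGING_DIR = DATA_DIR / "staging"
PROCESSED_DIR = DATA_DIR / "processed"
OUTPUTS_DIR = PROJECT_ROOT / "outputs"

PROJECT_ROOT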
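If later cells fail with `FileNotFoundError`, a quick check of the folder layout can help. The sketch below uses a hypothetical helper, `missing_project_dirs`, which is not part of the original project; it assumes the same `data/raw`, `data/staging`, `data/processed`, and `outputs` layout defined in the setup cell and reports which of those folders do not exist yet.

```python
import tempfile
from pathlib import Path


def missing_project_dirs(root: Path) -> list[Path]:
    """Return the expected project folders that do not exist under root.

    Hypothetical helper: the folder names mirror the setup cell.
    """
    expected = [
        root / "data" / "raw",
        root / "data" / "staging",
        root / "data" / "processed",
        root / "outputs",
    ]
    return [path for path in expected if not path.is_dir()]


# Usage on a throwaway folder that only contains data/processed.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data" / "processed").mkdir(parents=True)
    missing = missing_project_dirs(root)
    print([str(path.relative_to(root)) for path in missing])
```

In the notebook itself you would call `missing_project_dirs(PROJECT_ROOT)`; an empty list means every expected folder is in place.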


## Problem 1: What Data Are We Starting With?

In [None]:
import pandas as pd

india_path = PROCESSED_DIR / "india_literacy.csv"
india_df = pd.read_csv(india_path)

india_df.head()
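Before going further, it helps to confirm that the file has the columns later cells rely on. The sketch below reads a tiny made-up CSV (not census values) from a string so that it runs on its own; in the notebook you would run the same checks on `india_df`. `REQUIRED_COLUMNS` is a hypothetical name that simply lists the columns this chapter uses.

```python
import io

import pandas as pd

# Columns used by later cells in this chapter.
REQUIRED_COLUMNS = [
    "state_name",
    "area_type",
    "male_persons",
    "male_literate",
    "female_persons",
    "female_literate",
]

# Made-up rows for illustration only; not census data.
csv_text = """state_name,area_type,male_persons,male_literate,female_persons,female_literate
State A,total,1000,800,950,600
State A,rural,600,450,580,330
State A,urban,400,350,370,270
"""
sample_df = pd.read_csv(io.StringIO(csv_text))

# Any column listed here is missing and would break a later cell.
missing_columns = [col for col in REQUIRED_COLUMNS if col not in sample_df.columns]
print("Missing columns:", missing_columns)

# Count columns should load as integers, and area_type shows which row types exist.
print(sample_df.dtypes)
print(sample_df["area_type"].value_counts())
```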

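## Problem 2: Why Are Raw Literacy Counts Misleading for Gender Analysis?

Raw counts of literate people largely reflect population size: a populous state will report more literate women than a small one even if a smaller *share* of its women can read. Male and female populations can also differ in size within the same state, so comparing `male_literate` with `female_literate` directly mixes literacy with population. Dividing each count by the matching population turns it into a rate that can be compared across genders and across states.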
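A small made-up example shows why raw counts mislead. Hypothetical State A has far more literate women than State B simply because it is larger, yet a smaller share of its women are literate. The numbers are invented for illustration.

```python
import pandas as pd

# Two hypothetical states; the numbers are invented.
toy = pd.DataFrame(
    {
        "state_name": ["State A", "State B"],
        "female_persons": [10_000_000, 1_000_000],
        "female_literate": [5_000_000, 800_000],
    }
)

# Ranking by raw counts favours the larger state...
print(toy.sort_values("female_literate", ascending=False)["state_name"].tolist())
# ['State A', 'State B']

# ...but the rate (literate / persons) tells the opposite story.
toy["female_literacy_rate"] = toy["female_literate"] / toy["female_persons"]
print(toy.sort_values("female_literacy_rate", ascending=False)["state_name"].tolist())
# ['State B', 'State A']
```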

## Problem 3: Which Rows Should Be Used for Gender Analysis?

In [None]:
gender_df = india_df[india_df["area_type"] == "total"].copy()
gender_df.shape
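Why keep only `total`? Assuming the file also carries `rural` and `urban` rows that split the same population (as the `area_type` column suggests), summing over every row would count each person twice. The invented numbers below show the effect for one hypothetical state.

```python
import pandas as pd

# One hypothetical state split into total, rural, and urban rows (invented numbers).
toy = pd.DataFrame(
    {
        "state_name": ["State A"] * 3,
        "area_type": ["total", "rural", "urban"],
        "female_persons": [950, 580, 370],
    }
)

# Summing every row double counts: rural + urban already equal total.
all_rows = toy["female_persons"].sum()

# Filtering to the total rows keeps each person exactly once.
total_only = toy.loc[toy["area_type"] == "total", "female_persons"].sum()

print(all_rows, total_only)  # 1900 950
```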

## Problem 4: What Basic Numbers Do We Already Have?
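The counts that every later rate depends on are `male_persons`, `male_literate`, `female_persons`, and `female_literate`. A minimal way to look at them is to summarize the four columns and confirm that no literate count exceeds its population. The sketch uses invented rows standing in for `gender_df`; in the notebook you would run the same lines on `gender_df` itself.

```python
import pandas as pd

# Invented rows standing in for gender_df.
sample = pd.DataFrame(
    {
        "state_name": ["State A", "State B"],
        "male_persons": [1000, 500],
        "male_literate": [800, 420],
        "female_persons": [950, 480],
        "female_literate": [600, 350],
    }
)

count_cols = ["male_persons", "male_literate", "female_persons", "female_literate"]

# Summary statistics (count, mean, min, max, ...) for the raw counts.
print(sample[count_cols].describe())

# A literate count should never exceed the matching population.
male_ok = (sample["male_literate"] <= sample["male_persons"]).all()
female_ok = (sample["female_literate"] <= sample["female_persons"]).all()
print("Counts consistent:", bool(male_ok and female_ok))
```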

## Problem 5: How Do We Compute Male and Female Literacy Rates?

In [None]:
gender_df["male_literacy_rate"] = (
    gender_df["male_literate"] / gender_df["male_persons"]
)

gender_df["female_literacy_rate"] = (
    gender_df["female_literate"] / gender_df["female_persons"]
)
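The division is element-wise: pandas divides each row's literate count by the same row's population. For instance, a state with 800 literate men out of 1,000 gets a male rate of 0.80, or 80%. The sketch below repeats the chapter's formula on invented data and adds a percentage column for display only; the fraction columns are the ones later cells use.

```python
import pandas as pd

# Invented counts for two hypothetical states.
df = pd.DataFrame(
    {
        "state_name": ["State A", "State B"],
        "male_persons": [1000, 500],
        "male_literate": [800, 420],
        "female_persons": [1000, 400],
        "female_literate": [650, 300],
    }
)

# Same formula as the chapter: literate / persons, computed row by row.
df["male_literacy_rate"] = df["male_literate"] / df["male_persons"]
df["female_literacy_rate"] = df["female_literate"] / df["female_persons"]

# Percentages are easier to read; keep the fractions for calculations.
df["female_literacy_pct"] = (df["female_literacy_rate"] * 100).round(1)
print(df[["state_name", "male_literacy_rate", "female_literacy_rate", "female_literacy_pct"]])
```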

## Problem 6: How Do We Measure the Gender Literacy Gap Directly?

In [None]:
gender_df["gender_literacy_gap"] = (
    gender_df["male_literacy_rate"] -
    gender_df["female_literacy_rate"]
)
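Because both rates are fractions, the gap is a difference of fractions: a positive value means men are more literate than women, a negative value means the reverse, and multiplying by 100 expresses the gap in percentage points (not percent). The rates below are invented to show both signs; the `gap_direction` labels are a hypothetical convenience, not part of the chapter's output.

```python
import pandas as pd

# Invented rates for two hypothetical states.
df = pd.DataFrame(
    {
        "state_name": ["State A", "State B"],
        "male_literacy_rate": [0.80, 0.70],
        "female_literacy_rate": [0.65, 0.72],
    }
)

# Same definition as the chapter: male rate minus female rate.
df["gender_literacy_gap"] = df["male_literacy_rate"] - df["female_literacy_rate"]

# Express the gap in percentage points and label its direction.
df["gap_pct_points"] = (df["gender_literacy_gap"] * 100).round(1)
df["gap_direction"] = df["gender_literacy_gap"].map(
    lambda gap: "men ahead" if gap > 0 else ("women ahead" if gap < 0 else "parity")
)
print(df[["state_name", "gap_pct_points", "gap_direction"]])
```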

## Problem 7: How Do We Summarize Gender Inequality at the State Level?

In [None]:
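# Average the gap over each state's rows, then rank states from the widest
# gap (men furthest ahead) to the narrowest.
# Note: .mean() weights every row equally, regardless of its population.
state_gender_gap = (
    gender_df
    .groupby("state_name")["gender_literacy_gap"]
    .mean()
    .reset_index()
    .sort_values("gender_literacy_gap", ascending=False)
)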
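`.mean()` gives every row in a state equal weight. If the `total` rows are sub-state units such as districts (the chapter does not say), a small district counts as much as a large one. A population-weighted alternative, sketched here with invented numbers, sums the counts per state first and then computes the rates. This is a swapped-in technique shown for comparison, not the chapter's method.

```python
import pandas as pd

# Two hypothetical districts in one state, very different in size (invented numbers).
districts = pd.DataFrame(
    {
        "state_name": ["State A", "State A"],
        "male_persons": [100, 10_000],
        "male_literate": [90, 7_000],
        "female_persons": [100, 10_000],
        "female_literate": [50, 6_500],
    }
)

# Chapter's approach: compute a gap per row, then average the gaps.
districts["gap"] = (
    districts["male_literate"] / districts["male_persons"]
    - districts["female_literate"] / districts["female_persons"]
)
mean_of_gaps = districts.groupby("state_name")["gap"].mean()

# Population-weighted alternative: sum the counts per state, then compute rates.
totals = districts.groupby("state_name")[
    ["male_persons", "male_literate", "female_persons", "female_literate"]
].sum()
pooled_gap = (
    totals["male_literate"] / totals["male_persons"]
    - totals["female_literate"] / totals["female_persons"]
)

print(mean_of_gaps.round(3).to_dict())  # {'State A': 0.225}
print(pooled_gap.round(3).to_dict())    # {'State A': 0.053}
```

The tiny district pulls the simple average far above the gap experienced by most of the state's population, which is why the choice of weighting matters whenever a state has more than one row.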

## Problem 8: How Do We Sanity-Check These Results?

In [None]:
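from IPython.display import display

# A notebook cell only renders its last expression, so show both ends explicitly:
# the states with the widest gaps, then those with the narrowest.
display(state_gender_gap.head())
display(state_gender_gap.tail())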
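Eyeballing the top and bottom states is a good start; a few automated checks catch problems the eye can miss. The function below, `check_gender_metrics`, is a hypothetical helper that assumes the column names used in this chapter and returns a list of problems (empty if none were found).

```python
import pandas as pd


def check_gender_metrics(df: pd.DataFrame) -> list[str]:
    """Return human-readable problems found in the gender metric columns."""
    problems = []
    for col in ["male_literacy_rate", "female_literacy_rate"]:
        if df[col].isna().any():
            problems.append(f"{col} has missing values")
        # Rates are fractions, so they must lie between 0 and 1 inclusive.
        if not df[col].dropna().between(0, 1).all():
            problems.append(f"{col} has values outside [0, 1]")
    # The gap should equal male rate minus female rate (within float tolerance).
    expected_gap = df["male_literacy_rate"] - df["female_literacy_rate"]
    if not ((df["gender_literacy_gap"] - expected_gap).abs() < 1e-9).all():
        problems.append("gender_literacy_gap does not match male minus female rate")
    return problems


# Invented rows: the second has an impossible female rate of 1.2.
sample = pd.DataFrame(
    {
        "male_literacy_rate": [0.80, 0.70],
        "female_literacy_rate": [0.65, 1.20],
    }
)
sample["gender_literacy_gap"] = sample["male_literacy_rate"] - sample["female_literacy_rate"]
print(check_gender_metrics(sample))
# ['female_literacy_rate has values outside [0, 1]']
```

In the notebook, `check_gender_metrics(gender_df)` should return an empty list before the metrics are saved.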

## Problem 9: How Do We Save Gender Metrics for the Next Chapter?

In [None]:
gender_output_path = PROCESSED_DIR / "india_gender_metrics.csv"
gender_df.to_csv(gender_output_path, index=False)

gender_output_path
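Since the next chapter depends on this file, it can be worth reading it back and comparing it with the DataFrame in memory. The sketch writes invented data to a temporary folder instead of `PROCESSED_DIR` so that it runs anywhere; `pandas.testing.assert_frame_equal` raises an `AssertionError` if the two frames differ.

```python
import tempfile
from pathlib import Path

import pandas as pd
from pandas.testing import assert_frame_equal

# Invented metrics standing in for gender_df.
metrics = pd.DataFrame(
    {
        "state_name": ["State A", "State B"],
        "male_literacy_rate": [0.80, 0.84],
        "female_literacy_rate": [0.65, 0.75],
    }
)
metrics["gender_literacy_gap"] = metrics["male_literacy_rate"] - metrics["female_literacy_rate"]

with tempfile.TemporaryDirectory() as tmp:
    output_path = Path(tmp) / "india_gender_metrics.csv"
    metrics.to_csv(output_path, index=False)
    reloaded = pd.read_csv(output_path)

# Raises if columns, dtypes, or values differ after the CSV round trip;
# check_exact=False tolerates tiny floating-point differences.
assert_frame_equal(metrics, reloaded, check_exact=False)
print("Round trip OK:", reloaded.shape)
```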

## End-of-Chapter Direction