# 01 — AHS Ingestion (American Housing Survey)

AHS-specific cleaning and variable construction. The output must conform to the **analysis-ready schema** so that `02_analysis_AHS.ipynb` and `03_comparative_master.ipynb` can consume it through `scripts/core_metrics.py`.

## Analysis-ready schema (standardized column names)

- **Target:** `Multigen_Rate`
- **IDs:** `GEOID` (or household id), `Area_Name` (optional)
- **Weight (optional):** `_total_hh`
- **Features:** See `scripts/core_metrics.ANALYSIS_READY_SCHEMA["feature_cols"]` (e.g. `Pct_65Plus`, `Median_HH_Income`, `Pct_Owner`, …)

Save output to `data/processed/ahs_analysis_ready.csv`.
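
One way to catch schema drift before saving is a small column check. The sketch below is illustrative only: `FEATURE_COLS` is a stand-in for `ANALYSIS_READY_SCHEMA["feature_cols"]` that lists just the three feature names mentioned above, `missing_schema_columns` is a hypothetical helper (not part of `core_metrics`), and treating every feature column as required is an assumption that may be stricter than the downstream notebooks need.

```python
import pandas as pd

# Stand-in for ANALYSIS_READY_SCHEMA["feature_cols"]; only the three names the
# text mentions are listed. The real list lives in scripts/core_metrics.py.
FEATURE_COLS = ["Pct_65Plus", "Median_HH_Income", "Pct_Owner"]
REQUIRED_COLS = ["GEOID", "Multigen_Rate"] + FEATURE_COLS


def missing_schema_columns(df, required=REQUIRED_COLS):
    """Return the required column names absent from df, in schema order."""
    return [col for col in required if col not in df.columns]


# Toy frame that lacks Pct_Owner (values are made up for illustration).
toy = pd.DataFrame({
    "GEOID": ["00001"],
    "Multigen_Rate": [0.05],
    "Pct_65Plus": [14.2],
    "Median_HH_Income": [61000],
})
print(missing_schema_columns(toy))
```

For the toy frame the helper reports `Pct_Owner` as missing. In the notebook, the same check could be run against `df_ahs` with the real `feature_cols` list before writing the CSV.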

In [None]:
import os
import sys
import pandas as pd

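# Resolve the repo root whether the notebook runs from notebooks/ or from the root.
cwd = os.getcwd()
REPO_ROOT = os.path.dirname(cwd) if os.path.basename(cwd) == "notebooks" else cwd
# Make scripts/ importable so core_metrics can be loaded.
sys.path.insert(0, os.path.join(REPO_ROOT, "scripts"))
from core_metrics import ANALYSIS_READY_SCHEMA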

DATA_RAW = os.path.join(REPO_ROOT, "data", "raw", "ahs")
DATA_PROCESSED = os.path.join(REPO_ROOT, "data", "processed")
os.makedirs(DATA_PROCESSED, exist_ok=True)
print("Schema feature_cols (first 10):", ANALYSIS_READY_SCHEMA["feature_cols"][:10])

## AHS-specific cleaning

1. Place AHS raw files in `data/raw/ahs/`.
2. Map AHS variables to the schema (multigenerational indicator → `Multigen_Rate`, demographics → `Pct_*`, etc.).
3. Write the analysis-ready DataFrame to `data/processed/ahs_analysis_ready.csv`, using **exactly the schema's column names**.
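
The mapping in step 2 might look roughly like the sketch below, which aggregates household-level records to one row per area. Everything about the raw side is assumed: `area_code`, `hh_weight`, `is_multigen`, and `is_owner` are placeholder column names, not actual AHS variable names, and the data is invented. The sketch also assumes `Multigen_Rate` is a weighted fraction between 0 and 1, `Pct_*` columns are on a 0 to 100 scale, and `_total_hh` is the sum of household weights; adjust these to whatever `core_metrics` expects.

```python
import pandas as pd

# Hypothetical household-level extract; column names and values are
# placeholders, not real AHS variables.
raw = pd.DataFrame({
    "area_code": ["A", "A", "B", "B"],
    "hh_weight": [100.0, 300.0, 200.0, 200.0],
    "is_multigen": [1, 0, 1, 1],
    "is_owner": [1, 1, 0, 1],
})


def weighted_share(group, flag_col, weight_col="hh_weight"):
    """Weighted share of households in `group` whose flag_col equals 1."""
    weights = group[weight_col]
    return (group[flag_col] * weights).sum() / weights.sum()


# Build one schema-named record per area.
records = []
for area, group in raw.groupby("area_code"):
    records.append({
        "GEOID": area,
        "Multigen_Rate": weighted_share(group, "is_multigen"),  # assumed 0-1
        "Pct_Owner": 100 * weighted_share(group, "is_owner"),   # assumed 0-100
        "_total_hh": group["hh_weight"].sum(),
    })

df_ahs = pd.DataFrame(records)
print(df_ahs)
```

The remaining feature columns would be built the same way, one schema name per derived measure, so that the saved file lines up with the other surveys' analysis-ready outputs.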

In [None]:
# After building df_ahs with schema columns:
# out_path = os.path.join(DATA_PROCESSED, "ahs_analysis_ready.csv")
# df_ahs.to_csv(out_path, index=False)
# print("Saved:", out_path)