# 01 — ASEC Ingestion (CPS Annual Social and Economic Supplement)

ASEC-specific cleaning and variable construction. The output must conform to the **analysis-ready schema** (the same one used for AHS) so that `02_analysis_ASEC.ipynb` and `03_comparative_master.ipynb` can consume it through `scripts/core_metrics.py`.

Save output to `data/processed/asec_analysis_ready.csv` with **identical column naming** as in `ANALYSIS_READY_SCHEMA`.
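
A minimal sketch of how the column-naming requirement could be checked before saving. `EXAMPLE_SCHEMA` is a stand-in for `ANALYSIS_READY_SCHEMA`: only the `target_col` and `feature_cols` keys are taken from this notebook, while the feature names, the toy values, and the helper `missing_schema_columns` are hypothetical.

```python
import pandas as pd

# Stand-in for core_metrics.ANALYSIS_READY_SCHEMA. Only the two keys used in
# this notebook are assumed; the feature names below are placeholders.
EXAMPLE_SCHEMA = {
    "target_col": "Multigen_Rate",
    "feature_cols": ["Median_HH_Income", "Example_Feature"],
}


def missing_schema_columns(df, schema):
    """Return the schema columns that are absent from df, in schema order."""
    required = [schema["target_col"], *schema["feature_cols"]]
    return [col for col in required if col not in df.columns]


# Toy frame with made-up values, used only to exercise the check.
df_demo = pd.DataFrame({"Multigen_Rate": [0.17], "Median_HH_Income": [75000]})
missing = missing_schema_columns(df_demo, EXAMPLE_SCHEMA)
print("Missing columns:", missing)
```

An empty list means the frame carries every column the downstream notebooks expect; anything else names the columns still to be built.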

In [1]:
import os
import sys
import pandas as pd

REPO_ROOT = os.path.dirname(os.getcwd()) if os.path.basename(os.getcwd()) == "notebooks" else os.getcwd()
sys.path.insert(0, os.path.join(REPO_ROOT, "scripts"))
from core_metrics import ANALYSIS_READY_SCHEMA

DATA_RAW = os.path.join(REPO_ROOT, "data", "raw", "asec")
DATA_PROCESSED = os.path.join(REPO_ROOT, "data", "processed")
os.makedirs(DATA_PROCESSED, exist_ok=True)
print("Target:", ANALYSIS_READY_SCHEMA["target_col"], "| Features:", len(ANALYSIS_READY_SCHEMA["feature_cols"]))

Target: Multigen_Rate | Features: 42


In [8]:
# Load the ASEC public-use household, person, and family files from DATA_RAW
# (defined in the setup cell) rather than a machine-specific absolute path.
hh = pd.read_csv(os.path.join(DATA_RAW, "hhpub25.csv"))
p = pd.read_csv(os.path.join(DATA_RAW, "pppub25.csv"))
f = pd.read_csv(os.path.join(DATA_RAW, "ffpub25.csv"))

In [36]:
# --- STEP A: Identify Basic Generations (Relative to Ref Person) ---
p['is_ref'] = p['PERRP'].isin([40, 41]).astype(int)
p['is_child_of_ref'] = (p['PERRP'] == 48).astype(int)
p['is_grandchild_of_ref'] = (p['PERRP'] == 49).astype(int)
p['is_parent_of_ref'] = (p['PERRP'] == 50).astype(int)

# --- STEP B: Identify Adult Children (Ages 25+) ---
# Counting these households is what moves the multigen share from about 6% to about 17%
p['is_adult_child_of_ref'] = ((p['is_child_of_ref'] == 1) & (p['A_AGE'] >= 25)).astype(int)

# --- STEP C: Identify Parent-Child links anywhere (Using Parent Pointers) ---
# PEPAR1 and PEPAR2 hold the line number (A_LINENO) of each co-resident parent
# Cast to int so this flag has the same 0/1 dtype as the other flags
p['has_parent_in_hh'] = ((p['PEPAR1'] > 0) | (p['PEPAR2'] > 0)).astype(int)

# --- STEP D: Aggregate to Household Level (PH_SEQ) ---
hh_flags = p.groupby('PH_SEQ').agg({
    'is_ref': 'max',
    'is_child_of_ref': 'max',
    'is_grandchild_of_ref': 'max',
    'is_parent_of_ref': 'max',
    'is_adult_child_of_ref': 'max',
    'has_parent_in_hh': 'max'
}).reset_index()

# --- STEP E: The "Broad" Multigen Definition ---
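# A household is multigen if any of the following holds:
# 1. Strict 3-gen (Ref + Child + Grandchild)
# 2. Two adult generations (Ref + Adult Child aged 25+)
# 3. Two adult generations (Ref + Parent, PERRP == 50)
# Note: has_parent_in_hh from Step C is aggregated but not used in this definition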
hh_flags['is_multigen_broad'] = (
    ((hh_flags['is_ref'] == 1) & (hh_flags['is_child_of_ref'] == 1) & (hh_flags['is_grandchild_of_ref'] == 1)) |
    ((hh_flags['is_ref'] == 1) & (hh_flags['is_adult_child_of_ref'] == 1)) |
    ((hh_flags['is_ref'] == 1) & (hh_flags['is_parent_of_ref'] == 1))
).astype(int)
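
The parent pointers from Step C could also be used to count generations directly, independent of the reference person. The sketch below is one possible approach, not the definition used in this notebook: it follows `PEPAR1`/`PEPAR2` links within each `PH_SEQ` and reports the longest chain. The helper name `generation_depth` and the toy records are hypothetical, age is ignored, and a non-positive pointer is treated as "no co-resident parent", mirroring the `> 0` test in Step C.

```python
import pandas as pd


def generation_depth(persons: pd.DataFrame) -> pd.Series:
    """Longest parent-pointer chain, counted in generations, per household."""
    depths = {}
    for hh_id, grp in persons.groupby("PH_SEQ"):
        # Map each person's line number to the line numbers of their parents.
        parents = {
            int(row.A_LINENO): [int(x) for x in (row.PEPAR1, row.PEPAR2) if x > 0]
            for row in grp.itertuples()
        }

        def chain(line, seen=()):
            # Pointers to absent lines, or accidental cycles, end the chain.
            if line not in parents or line in seen:
                return 0
            return 1 + max(
                (chain(par, seen + (line,)) for par in parents[line]), default=0
            )

        depths[hh_id] = max(chain(line) for line in parents)
    return pd.Series(depths, name="generation_depth")


# Toy person records (made up): household 1 has grandparent -> parent -> child,
# household 2 has two unrelated-by-descent adults, household 3 has parent -> child.
toy = pd.DataFrame({
    "PH_SEQ":   [1, 1, 1, 2, 2, 3, 3],
    "A_LINENO": [1, 2, 3, 1, 2, 1, 2],
    "PEPAR1":   [-1, 1, 2, -1, -1, -1, 1],
    "PEPAR2":   [-1, -1, -1, -1, -1, -1, 0],
})
depth = generation_depth(toy)
print(depth)
```

Under these assumptions a depth of 3 or more marks a household with three linked generations even when the reference person is the middle generation, a case the `PERRP`-based flags express only indirectly.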

In [37]:
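# Merge the household flag back onto p (the person dataframe).
# Drop any earlier copy first so re-running this cell does not create _x/_y columns.
p = p.drop(columns='is_multigen_broad', errors='ignore').merge(hh_flags[['PH_SEQ', 'is_multigen_broad']], on='PH_SEQ', how='left')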

# Person-weighted share: sum of MARSUPWT for people in multigen households over total MARSUPWT
total_pop = p['MARSUPWT'].sum()
multigen_pop = p.loc[p['is_multigen_broad'] == 1, 'MARSUPWT'].sum()

print(f"Share of US People living in Multigen Households: {(multigen_pop / total_pop) * 100:.2f}%")

Share of US People living in Multigen Households: 17.68%
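
The share above is a ratio of summed person weights, so any uniform scaling of `MARSUPWT` cancels out. As an illustration of why the weights matter, the sketch below compares a weighted and an unweighted share on made-up data; the helper `weighted_share` and the toy values are hypothetical.

```python
import pandas as pd


def weighted_share(df, flag_col, weight_col):
    """Share of total weight carried by rows where flag_col == 1."""
    total = df[weight_col].sum()
    if total == 0:
        raise ValueError("weights sum to zero")
    return df.loc[df[flag_col] == 1, weight_col].sum() / total


# Four toy person records with made-up weights.
toy = pd.DataFrame({
    "is_multigen_broad": [1, 1, 0, 0],
    "MARSUPWT": [100.0, 300.0, 400.0, 200.0],
})
share = weighted_share(toy, "is_multigen_broad", "MARSUPWT")
unweighted = toy["is_multigen_broad"].mean()
print(f"weighted: {share:.2%}, unweighted: {unweighted:.2%}")
```

The two figures differ whenever flagged and unflagged records carry different average weights, which is why the notebook sums `MARSUPWT` instead of counting rows.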


## ASEC-specific cleaning

1. Place ASEC raw files in `data/raw/asec/`.
2. Map ASEC variables to the same schema (e.g. multigenerational → `Multigen_Rate`, income → `Median_HH_Income` or equivalent).
3. Write to `data/processed/asec_analysis_ready.csv` with standardized column names.
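
Steps 2 and 3 above can be sketched as a rename followed by a column selection in schema order. Everything on the left-hand side of `ASEC_TO_SCHEMA`, the helper `to_analysis_ready`, and the toy values are hypothetical, and the example writes to a temporary directory instead of `data/processed/` so it can run anywhere.

```python
import os
import tempfile

import pandas as pd

# Hypothetical mapping from intermediate ASEC column names to schema names.
ASEC_TO_SCHEMA = {
    "multigen_share": "Multigen_Rate",
    "median_hh_income": "Median_HH_Income",
}


def to_analysis_ready(df, mapping, schema_cols):
    """Rename columns via mapping and keep only schema_cols, in that order.

    Selecting with a list raises KeyError if any schema column is missing.
    """
    return df.rename(columns=mapping)[schema_cols]


# Toy intermediate frame with made-up values and one scratch column to drop.
raw = pd.DataFrame({
    "multigen_share": [0.12, 0.20],
    "median_hh_income": [60000, 82000],
    "scratch": [1, 2],
})
df_asec = to_analysis_ready(raw, ASEC_TO_SCHEMA, ["Multigen_Rate", "Median_HH_Income"])

out_path = os.path.join(tempfile.mkdtemp(), "asec_analysis_ready.csv")
df_asec.to_csv(out_path, index=False)
print("Saved:", out_path)
```

In the notebook itself, the column list would presumably come from `ANALYSIS_READY_SCHEMA` and the output path from `DATA_PROCESSED`.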

In [None]:
# After building df_asec:
# out_path = os.path.join(DATA_PROCESSED, "asec_analysis_ready.csv")
# df_asec.to_csv(out_path, index=False)
# print("Saved:", out_path)