# Tutorial 1: ACS PUMS Microdata Analysis

This tutorial shows how to analyze person-level microdata from the American Community Survey (ACS) Public Use Microdata Sample (PUMS).

**Goal:** Get age and sex data for adults in California and Texas, then create weighted frequency tables stratified by state.

## Setup

```python
import os
from cendat import CenDatHelper
from dotenv import load_dotenv

# Load your Census API key from the environment (load_dotenv reads a local .env file)
load_dotenv()
cdh = CenDatHelper(years=[2022], key=os.getenv("CENSUS_API_KEY"))
```

```text
✅ Years set to: [2022]
✅ API key loaded successfully.
```
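
If the variable is missing, `os.getenv` returns `None`. The helper would then receive `key=None`, and this line would raise no error. A small guard makes the failure explicit. This is a minimal sketch: `require_env` is a hypothetical helper, not part of `cendat` or `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Hypothetical helper: return an environment variable or fail loudly if it is unset/empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; add it to your environment or .env file."
        )
    return value
```

You would call it as `CenDatHelper(years=[2022], key=require_env("CENSUS_API_KEY"))`, after `load_dotenv()` has run.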


## Step 1: Find and Select the PUMS Product

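```python
# Search for the ACS 1-year PUMS product.
# The \b word boundary requires "pums" to be followed by a non-word character
# (or the end of the string), so names that merely start with "pums" are not matched.
cdh.list_products(patterns=r"acs/acs1/pums\b")
# Set the product matched above
cdh.set_products()
```

```text
✅ Product set: '2022 American Community Survey: 1-Year Estimates - Public Use Microdata Sample (2022/acs/acs1/pums)' (Vintage: [2022])
```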
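
You can check what `\b` does with Python's `re` module. This sketch assumes `list_products` applies the pattern as a regular-expression search over endpoint strings, and the strings below are hypothetical examples. Because `/` is a non-word character, `\b` does not exclude a path that continues with `/...`. Anchoring the pattern with `$` does.

```python
import re

pattern = re.compile(r"acs/acs1/pums\b")

# Hypothetical endpoint strings, used only to show what \b does and does not exclude
candidates = [
    "2022/acs/acs1/pums",          # matches: the string ends right after "pums"
    "2022/acs/acs1/pumspr",        # no match: "pums" is followed by a word character
    "2022/acs/acs1/pums/example",  # matches: "/" is a non-word character
]
print([c for c in candidates if pattern.search(c)])
# ['2022/acs/acs1/pums', '2022/acs/acs1/pums/example']

# Anchoring with $ rules out anything after "pums"
exact = re.compile(r"acs/acs1/pums$")
print([c for c in candidates if exact.search(c)])
# ['2022/acs/acs1/pums']
```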


## Step 2: Select Geography and Variables

```python
# For PUMS, geography is simpler: we only need "state"
cdh.set_geos(values="state", by="desc")

# Select the variables we need:
# - SEX: Person's sex (1 = male, 2 = female)
# - AGEP: Person's age in years
# - ST: State FIPS code
# - PWGTP: Person weight, the number of people each record represents
#   (needed to turn sample records into population estimates)
cdh.set_variables(names=["SEX", "AGEP", "ST", "PWGTP"])
```

```text
✅ Geographies set: 'state'
✅ Variables set:
  - Product: 2022 American Community Survey: 1-Year Estimates - Public Use Microdata Sample (2022/acs/acs1/pums) (Vintage: [2022])
    Variables: SEX, PWGTP, ST, AGEP
```
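
The weight matters because each PUMS record stands for `PWGTP` people. Counting records gives sample sizes, while summing `PWGTP` gives population estimates. Here is a minimal illustration with made-up records (not real PUMS data):

```python
from collections import defaultdict

# Made-up (SEX, AGEP, PWGTP) records for illustration; SEX: 1 = male, 2 = female
records = [(1, 36, 55), (1, 65, 50), (2, 87, 5), (2, 30, 40)]

sample_counts = defaultdict(int)   # unweighted: how many records
population_est = defaultdict(int)  # weighted: how many people the records represent
for sex, _age, weight in records:
    sample_counts[sex] += 1
    population_est[sex] += weight

print(dict(sample_counts))   # {1: 2, 2: 2}
print(dict(population_est))  # {1: 105, 2: 45}
```

The two sexes look identical when you count records, but the weighted totals differ by more than a factor of two.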


## Step 3: Get Data

```python
# Fetch data for California (state FIPS 06) and Texas (state FIPS 48)
response = cdh.get_data(
    within={"state": ["06", "48"]}
)
```

```text
✅ Parameters created for 1 geo-variable/group combinations.
✅ Data fetching complete. Stacking results.
```
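
The call above passes state FIPS codes as zero-padded, two-character strings ("06" rather than 6), and keeping that format is the safe choice. If you prefer to work with state names, a small lookup can build the same `within` dictionary. `STATE_FIPS` and `within_states` are hypothetical helpers, and the lookup covers only the two states used here.

```python
# Hypothetical lookup: a two-state subset of the state FIPS codes
STATE_FIPS = {"California": "06", "Texas": "48"}


def within_states(*names: str) -> dict[str, list[str]]:
    """Build a `within` filter from state names (hypothetical helper)."""
    return {"state": [STATE_FIPS[name] for name in names]}


print(within_states("California", "Texas"))  # {'state': ['06', '48']}

# Zero-pad a numeric code to two characters
print(f"{6:02d}")  # 06
```

With the helper, the fetch would read `cdh.get_data(within=within_states("California", "Texas"))`.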


## Step 4: Analyze with Tabulate

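The `tabulate()` method creates Stata-style frequency tables. With `weight_var="PWGTP"`, each record counts as the number of people it represents. As a result, `n` and `cumn` are population estimates, and `pct` and `cumpct` are weighted percentages: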

```python
# Age distribution by sex, stratified by state
# Only adults (AGEP > 17), using person weights
response.tabulate(
    "SEX", "AGEP",
    strat_by="ST",
    weight_var="PWGTP",
    where="AGEP > 17"
)
```

```text
shape: (292, 7)
┌────┬─────┬──────┬─────────┬─────┬────────────┬────────┐
│ ST ┆ SEX ┆ AGEP ┆       n ┆ pct ┆       cumn ┆ cumpct │
╞════╪═════╪══════╪═════════╪═════╪════════════╪════════╡
│ 06 ┆   1 ┆   18 ┆ 263,169 ┆ 0.9 ┆    263,169 ┆    0.9 │
│ 06 ┆   1 ┆   19 ┆ 255,365 ┆ 0.8 ┆    518,534 ┆    1.7 │
│ 06 ┆   1 ┆   20 ┆ 279,423 ┆ 0.9 ┆    797,957 ┆    2.6 │
│ 06 ┆   1 ┆   21 ┆ 281,050 ┆ 0.9 ┆  1,079,007 ┆    3.5 │
│ 06 ┆   1 ┆   22 ┆ 269,824 ┆ 0.9 ┆  1,348,831 ┆    4.4 │
│ 06 ┆   1 ┆   23 ┆ 265,563 ┆ 0.9 ┆  1,614,394 ┆    5.3 │
│ 06 ┆   1 ┆   24 ┆ 267,327 ┆ 0.9 ┆  1,881,721 ┆    6.2 │
│ 06 ┆   1 ┆   25 ┆ 273,599 ┆ 0.9 ┆  2,155,320 ┆    7.1 │
│ 06 ┆   1 ┆   26 ┆ 287,357 ┆ 0.9 ┆  2,442,677 ┆    8.0 │
│ 06 ┆   1 ┆   27 ┆ 280,104 ┆ 0.9 ┆  2,722,781 ┆    8.9 │
│ 06 ┆   1 ┆   28 ┆ 287,141 ┆ 0.9 ┆  3,009,922 ┆    9.9 │
│ 06 ┆   1 ┆   29 ┆ 292,246 ┆ 1.0 ┆  3,302,168 ┆   10.8 │
│ 06 ┆   1 ┆   30 ┆ 315,175 ┆ 1.0 ┆  3,617,343 ┆   11.8 │
│ 06 ┆   1 ┆   31 ┆ 304,168 ┆ 1.0 ┆  3,921,511 ┆   12.8 │
```
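
You can reproduce the `n`, `pct`, `cumn`, and `cumpct` columns by hand. In the output, `cumn` is a running sum of `n`. The percentages appear to be computed within each state (stratum) rather than across both states, and the sketch below assumes that. `weighted_tab` is a hypothetical helper, not `cendat`'s implementation, and the records are made up.

```python
from collections import defaultdict


def weighted_tab(records, strat_key, keys, weight_key, keep=lambda r: True):
    """Hypothetical sketch of a weighted, stratified frequency table.

    Returns rows of (stratum, *key values, n, pct, cumn, cumpct), where n is the
    sum of weights in each cell and pct/cumpct are percentages of the stratum total.
    """
    # Sum the weights for each (stratum, key values) cell, skipping filtered-out records
    cells = defaultdict(float)
    for r in records:
        if keep(r):
            cells[(r[strat_key], *(r[k] for k in keys))] += r[weight_key]

    # Weighted total per stratum: the denominator for pct and cumpct
    totals = defaultdict(float)
    for cell, n in cells.items():
        totals[cell[0]] += n

    # Walk cells in sorted order, keeping a running total within each stratum
    rows = []
    running = defaultdict(float)
    for cell in sorted(cells):
        stratum, n = cell[0], cells[cell]
        running[stratum] += n
        total = totals[stratum]
        rows.append((*cell, n, 100 * n / total, running[stratum], 100 * running[stratum] / total))
    return rows


# Made-up records for illustration only (not real PUMS data)
records = [
    {"ST": "06", "SEX": 1, "AGEP": 18, "PWGTP": 30},
    {"ST": "06", "SEX": 1, "AGEP": 18, "PWGTP": 20},
    {"ST": "06", "SEX": 2, "AGEP": 40, "PWGTP": 50},
    {"ST": "06", "SEX": 2, "AGEP": 15, "PWGTP": 99},  # dropped by the adult filter
    {"ST": "48", "SEX": 1, "AGEP": 25, "PWGTP": 10},
]

for row in weighted_tab(records, "ST", ["SEX", "AGEP"], "PWGTP", keep=lambda r: r["AGEP"] > 17):
    print(row)
# ('06', 1, 18, 50.0, 50.0, 50.0, 50.0)
# ('06', 2, 40, 50.0, 50.0, 100.0, 100.0)
# ('48', 1, 25, 10.0, 100.0, 10.0, 100.0)
```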

## Step 5: Convert to DataFrame

```python
# For further analysis, convert the response to a Polars DataFrame
df = response.to_polars(concat=True, destring=True)
print(df.head())
```

```text
shape: (5, 9)
┌─────┬───────┬─────┬──────┬───┬─────────────────────────────────┬─────────┬────────┬───────┐
│ SEX ┆ PWGTP ┆ ST  ┆ AGEP ┆ … ┆ product                         ┆ vintage ┆ sumlev ┆ desc  │
│ --- ┆ ---   ┆ --- ┆ ---  ┆   ┆ ---                             ┆ ---     ┆ ---    ┆ ---   │
│ i64 ┆ i64   ┆ str ┆ i64  ┆   ┆ str                             ┆ str     ┆ str    ┆ str   │
╞═════╪═══════╪═════╪══════╪═══╪═════════════════════════════════╪═════════╪════════╪═══════╡
│ 1   ┆ 55    ┆ 48  ┆ 36   ┆ … ┆ 2022 American Community Survey… ┆ 2022    ┆ 040    ┆ state │
│ 1   ┆ 50    ┆ 48  ┆ 65   ┆ … ┆ 2022 American Community Survey… ┆ 2022    ┆ 040    ┆ state │
│ 1   ┆ 80    ┆ 48  ┆ 15   ┆ … ┆ 2022 American Community Survey… ┆ 2022    ┆ 040    ┆ state │
│ 1   ┆ 13    ┆ 48  ┆ 18   ┆ … ┆ 2022 American Community Survey… ┆ 2022    ┆ 040    ┆ state │
│ 2   ┆ 5     ┆ 48  ┆ 87   ┆ … ┆ 2022 American Community Survey… ┆ 2022    ┆ 040    ┆ state │
└─────┴───────┴─────┴──────┴───┴─────────────────────────────────┴─────────┴────────┴───────┘
```

`df` contains every person record returned by the API, including the 15-year-old in the third row. The `where="AGEP > 17"` filter in Step 4 applied only to that tabulation. The dtypes also show that, with `destring=True`, `SEX`, `PWGTP`, and `AGEP` came back as integers while `ST` stayed a string.
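
A common next step is a weighted statistic, such as mean adult age by state. Because `df` holds all ages, the adult filter has to be applied again, here through `keep`. This is a minimal pure-Python sketch. `weighted_mean_by` is a hypothetical helper, and the sample rows are just the five shown by `df.head()`, not a representative sample.

```python
from collections import defaultdict


def weighted_mean_by(rows, group_key, value_key, weight_key, keep=lambda r: True):
    """Weighted mean of `value_key` within each `group_key` group (hypothetical helper)."""
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for r in rows:
        if keep(r):
            weighted_sum[r[group_key]] += r[value_key] * r[weight_key]
            weight_total[r[group_key]] += r[weight_key]
    return {g: weighted_sum[g] / weight_total[g] for g in weighted_sum}


# The five rows printed by df.head() above (Texas only)
rows = [
    {"ST": "48", "AGEP": 36, "PWGTP": 55},
    {"ST": "48", "AGEP": 65, "PWGTP": 50},
    {"ST": "48", "AGEP": 15, "PWGTP": 80},
    {"ST": "48", "AGEP": 18, "PWGTP": 13},
    {"ST": "48", "AGEP": 87, "PWGTP": 5},
]
print(weighted_mean_by(rows, "ST", "AGEP", "PWGTP", keep=lambda r: r["AGEP"] > 17))
```

With the full data, `df.to_dicts()` (a standard Polars method) produces rows in this shape. For large frames, a native Polars expression would be faster than a Python loop.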