# Tutorial 1: ACS PUMS Microdata Analysis

This tutorial demonstrates analyzing person-level microdata from the American Community Survey (ACS) Public Use Microdata Sample (PUMS).

**Goal:** Get age and sex data for adults in California and Texas, then create weighted frequency tables stratified by state.

## Setup

In [None]:
import os
from cendat import CenDatHelper
from dotenv import load_dotenv

# Load your API key from environment
load_dotenv()
cdh = CenDatHelper(years=[2022], key=os.getenv("CENSUS_API_KEY"))

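`load_dotenv()` copies variables from a `.env` file into the process environment, and `os.getenv` returns `None` without complaint when a variable is missing. That can leave the helper with no key and no obvious error. The sketch below fails early instead. `require_api_key` is a hypothetical helper written for this tutorial; it is not part of `cendat` or `python-dotenv`.

```python
import os


def require_api_key(var_name="CENSUS_API_KEY"):
    """Return the API key from the environment, failing loudly if it is unset."""
    key = os.getenv(var_name)
    if not key:
        # os.getenv returns None (or "" if set but empty), so stop here with a clear message
        raise RuntimeError(
            f"{var_name} is not set; add it to your environment or a .env file"
        )
    return key
```

With this helper, the setup line could read `cdh = CenDatHelper(years=[2022], key=require_api_key())`.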
## Step 1: Find and Select the PUMS Product

In [None]:
# Search for the ACS 1-year PUMS product
# \b is a regex word boundary: it keeps "pums" from matching inside a longer name,
# but on its own it does not exclude a subpath such as ".../pums/pr"
cdh.list_products(patterns=r"acs/acs1/pums\b")
cdh.set_products()

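To see what the `\b` in the search pattern does, here is a small test using Python's `re` module. We assume, without confirming it against `cendat`'s source, that `list_products` applies the pattern with a regex search against endpoint paths. The path strings below are illustrative only. The sketch compares a trailing `\b` with a `$` anchor.

```python
import re

# Illustrative path strings to test the patterns against (not fetched from the API)
candidates = [
    "acs/acs1/pums",
    "acs/acs1/pums/pr",
    "acs/acs1/pumsxyz",
    "acs/acs5/pums",
]


def matching(pattern, names):
    """Return the names in which re.search finds the pattern."""
    return [n for n in names if re.search(pattern, n)]


# \b: "pums" must not be followed by a letter, digit, or underscore,
# but "/" counts as a boundary, so the subpath still matches
print(matching(r"acs/acs1/pums\b", candidates))

# $: "pums" must end the string, so only the exact endpoint matches
print(matching(r"acs/acs1/pums$", candidates))
```

If a search returns more than one product, anchoring with `$` is a simple way to narrow it to the exact endpoint.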
## Step 2: Select Geography and Variables

In [None]:
# For PUMS, geography is simpler; we just need "state"
cdh.set_geos(values="state", by="desc")

# Select the variables we need:
# - SEX: Person's sex (1 = Male, 2 = Female)
# - AGEP: Person's age
# - ST: State code
# - PWGTP: Person weight, the number of people each record represents
#   (required for population estimates from microdata)
cdh.set_variables(names=["SEX", "AGEP", "ST", "PWGTP"])

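`PWGTP` matters because each PUMS record stands in for many people. Summing the weights estimates a population count, and weighting a mean by `PWGTP` can change the answer noticeably. The records below are made up for illustration and are not real PUMS rows.

```python
# Hypothetical PUMS-like person records (values invented for illustration)
records = [
    {"SEX": 1, "AGEP": 34, "PWGTP": 120},
    {"SEX": 2, "AGEP": 29, "PWGTP": 95},
    {"SEX": 2, "AGEP": 61, "PWGTP": 40},
]


def weighted_count(rows, weight_var="PWGTP"):
    """Estimated number of people represented by the sample rows."""
    return sum(r[weight_var] for r in rows)


def weighted_mean(rows, var, weight_var="PWGTP"):
    """Mean of `var` with each row counted `weight_var` times."""
    total_weight = weighted_count(rows, weight_var)
    return sum(r[var] * r[weight_var] for r in rows) / total_weight


print(len(records))             # 3 sample records
print(weighted_count(records))  # 255 people represented
print(sum(r["AGEP"] for r in records) / len(records))  # unweighted mean age
print(weighted_mean(records, "AGEP"))                   # weighted mean age
```

Here the unweighted mean age is about 41.3, while the weighted mean is about 36.4, because the oldest record carries the smallest weight.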
## Step 3: Get Data

In [None]:
# Fetch data for California (06) and Texas (48)
response = cdh.get_data(
    within={"state": ["06", "48"]}
)

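For reference, a request like this corresponds roughly to one Census Data API query that asks for the selected variables (`get`) for a list of state FIPS codes (`for`). We assume, without checking the library's internals, that `cendat` builds something equivalent. `pums_url` is a hypothetical helper that only constructs the URL and does not send a request.

```python
from urllib.parse import urlencode


def pums_url(year, variables, states, key=None, dataset="acs/acs1/pums"):
    """Build a Census Data API query URL for PUMS records in the given states."""
    params = {
        "get": ",".join(variables),          # e.g. SEX,AGEP,ST,PWGTP
        "for": "state:" + ",".join(states),  # FIPS codes: 06 = California, 48 = Texas
    }
    if key:
        params["key"] = key
    # urlencode percent-encodes "," and ":", which the server decodes as usual
    return f"https://api.census.gov/data/{year}/{dataset}?{urlencode(params)}"


print(pums_url(2022, ["SEX", "AGEP", "ST", "PWGTP"], ["06", "48"]))
```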
## Step 4: Analyze with Tabulate

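The `tabulate()` method creates Stata-style frequency tables. Passing `weight_var` weights each record by that variable (here the person weight `PWGTP`), `strat_by` produces a separate table for each state, and `where` filters records before counting: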

In [None]:
# Age distribution by sex, stratified by state
# Only adults (AGEP > 17), using person weights
response.tabulate(
    "SEX", "AGEP",
    strat_by="ST",
    weight_var="PWGTP",
    where="AGEP > 17"
)

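Conceptually, a weighted, stratified two-way table applies the `where` filter, groups records by the stratifying variable, and sums the weights for each (row, column) combination. The pure-Python sketch below illustrates that idea; it is not `cendat`'s implementation, and `weighted_tab`, `shares`, and the sample records are hypothetical.

```python
from collections import defaultdict


def weighted_tab(records, row, col, strat_by, weight_var, keep=lambda r: True):
    """Return {stratum: {(row_value, col_value): summed_weight}}."""
    tables = defaultdict(lambda: defaultdict(float))
    for rec in records:
        if not keep(rec):
            continue  # plays the role of the where= filter
        tables[rec[strat_by]][(rec[row], rec[col])] += rec[weight_var]
    # Convert nested defaultdicts to plain dicts for display
    return {s: dict(t) for s, t in tables.items()}


def shares(table):
    """Weighted share of each cell within one stratum's table."""
    total = sum(table.values())
    return {cell: weight / total for cell, weight in table.items()}


# Hypothetical records; the 15-year-old is dropped by the adult filter
records = [
    {"ST": "06", "SEX": 1, "AGEP": 30, "PWGTP": 100},
    {"ST": "06", "SEX": 2, "AGEP": 30, "PWGTP": 50},
    {"ST": "06", "SEX": 1, "AGEP": 15, "PWGTP": 80},
    {"ST": "48", "SEX": 2, "AGEP": 45, "PWGTP": 120},
]
tabs = weighted_tab(records, "SEX", "AGEP", "ST", "PWGTP", keep=lambda r: r["AGEP"] > 17)
print(tabs)
print(shares(tabs["06"]))
```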
## Step 5: Convert to DataFrame

In [None]:
# For further analysis, convert to a DataFrame
df = response.to_polars(concat=True, destring=True)
print(df.head())
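The Census Data API typically returns values as JSON strings, which is why a "destring" step (Stata's term for converting string columns to numbers) is useful. We have not confirmed exactly how `destring=True` behaves in `cendat`; the hypothetical `destring_rows` below shows the general idea and one pitfall: codes with leading zeros, such as `ST = "06"`, become the integer `6`.

```python
def destring_value(value):
    """Convert a numeric-looking string to int or float; leave anything else unchanged."""
    if not isinstance(value, str):
        return value
    try:
        return int(value)
    except ValueError:
        pass
    try:
        return float(value)
    except ValueError:
        return value  # genuinely non-numeric text stays a string


def destring_rows(rows):
    """Apply destring_value to every field of every row."""
    return [{k: destring_value(v) for k, v in row.items()} for row in rows]


raw = [{"ST": "06", "AGEP": "34", "PWGTP": "120", "NOTE": "abc"}]
print(destring_rows(raw))
```

If the real conversion works this way, filter on `ST == 6` rather than `ST == "06"` after destringing.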