# 02 - Fetch and Merge ACS Variables for CVD Mortality

This notebook fetches 19 county-level variables from the Census Bureau's American Community Survey (ACS) 5-year estimates API for each year from 2012 to 2019 and merges them with the preprocessed CVD mortality data.

## Variables Selected (19 total):

### Standard Tables - B (14 variables):
- Median Household Income
- Total Population
- Gini Index
- Median Age
- Hispanic Population
- Black Population
- White Population
- No Vehicle (Owner)
- No Vehicle (Renter)
- Total Occupied Households
- Rent Burden Count (+50%)
- Rent Denominator
- Total Families (Single Mother)
- Total Families

### Summary Tables - S (5 variables):
- Poverty Rate
- Unemployment Rate
- Disability Rate
- Bachelor's Degree or Higher (%)
- High School Degree or Higher (%)

**Note:** Variable names are in Title Case format (publication-ready). Raw counts will be converted to percentages in later notebooks.
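
The note above leaves the count-to-percentage step to later notebooks. As an illustration only, here is a sketch of what that step could look like, assuming the Title Case column names produced here. The helper `add_percentage_columns`, the spec table, and the derived column names are hypothetical, and the numerator/denominator pairings are assumptions rather than the project's definitions:

```python
import numpy as np
import pandas as pd

# Hypothetical numerator/denominator pairs (assumed, not taken from later notebooks)
PERCENT_SPECS = {
    'Hispanic (%)': (['Hispanic Population'], 'Total Population'),
    'Black (%)': (['Black Population'], 'Total Population'),
    'White (%)': (['White Population'], 'Total Population'),
    'No Vehicle (%)': (['No Vehicle (Owner)', 'No Vehicle (Renter)'], 'Total Occupied Households'),
    'Rent Burden (%)': (['Rent Burden Count (+50%)'], 'Rent Denominator'),
    'Single Mother Families (%)': (['Total Families (Single Mother)'], 'Total Families'),
}

def add_percentage_columns(df: pd.DataFrame, specs=PERCENT_SPECS) -> pd.DataFrame:
    """Return a copy of df with percentage columns derived from raw counts."""
    out = df.copy()
    for new_col, (numerators, denominator) in specs.items():
        # min_count makes the sum NaN if any numerator is missing (instead of treating it as 0)
        numerator = out[numerators].sum(axis=1, min_count=len(numerators))
        share = numerator / out[denominator] * 100
        # A zero denominator gives inf (or NaN for 0/0); report both as NaN
        out[new_col] = share.replace([np.inf, -np.inf], np.nan)
    return out
```

One pairing worth checking: `B25070_001E` (Rent Denominator) also counts renter households for which rent as a share of income is not computed (reported separately as `B25070_011E`), so a stricter rent-burden rate might subtract that category from the denominator.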

## 1. Import Libraries

```python
import pandas as pd
import requests
import os
```

## 2. API Key Configuration

Set your Census API key as an environment variable for security:
```bash
export CENSUS_API_KEY='your_api_key_here'
```

```python
# Option 1: Load from environment variable (recommended)
API_KEY = os.getenv('CENSUS_API_KEY')

# Option 2: Set directly if not using an environment variable
if API_KEY is None:
    API_KEY = 'your_api_key_here'  # Replace with your key; never commit a real key
    print("Warning: Using hardcoded API key. Consider using an environment variable for security.")
else:
    print("API key loaded from environment variable")
```



## 3. Function Definitions

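Two functions handle the two ACS table types: standard B tables (the Census Bureau's detailed tables, served from `/acs/acs5`) and summary S tables (subject tables, served from `/acs/acs5/subject`). Apart from the endpoint, the two functions are identical.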
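
Because only the endpoint differs, the choice could be made from the variable code's prefix. A minimal sketch, assuming every code used here starts with `B` or `S`; `acs5_endpoint` is a hypothetical helper, not part of the notebook:

```python
def acs5_endpoint(year: int, variable_code: str) -> str:
    """Return the ACS 5-year API endpoint for a variable code.

    Assumption: 'S' codes are subject tables served from '/subject';
    'B' codes use the base detailed-table endpoint.
    """
    base = f"https://api.census.gov/data/{year}/acs/acs5"
    prefix = variable_code[:1].upper()
    if prefix == "S":
        return f"{base}/subject"
    if prefix == "B":
        return base
    # Other table families are not used in this notebook
    raise ValueError(f"Unsupported table prefix in {variable_code!r}")
```

A single fetch function calling `acs5_endpoint(year, variable_code)` could then replace both functions below; the notebook keeps them separate.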

### 3.1 Function for Standard Tables (B)

```python
def fetch_and_merge_acs_variable(variable_code, variable_name, year, api_key):
    """
    Fetches a standard ACS variable (B table) from the Census API and merges it with 
    the preprocessed CVD mortality data.
    
    Parameters:
    -----------
    variable_code : str
        ACS variable code to fetch (e.g., 'B19013_001E' for Median Household Income)
    variable_name : str
        Descriptive name for the variable in Title Case (e.g., 'Median Household Income')
    year : int
        Year of the ACS data (2012-2019)
    api_key : str
        API key for accessing the Census API
    
    Returns:
    --------
    pd.DataFrame
        Merged DataFrame with CVD mortality and the ACS variable
    """
    # Input and output paths - using data_cvd folder structure
    preprocessed_path = f'../data_cvd/processed/preprocessed_fips_cvd/preprocessed_cvd_fips_{year}.csv'
    output_path = f'../data_cvd/processed/acs_individual_variables/dataset_with_{variable_name}_{year}.csv'
    
    # Load preprocessed CVD and FIPS data; read FIPS columns as strings so they
    # are never parsed as floats (e.g. '1.0') when a column contains blanks
    print(f"Loading preprocessed data for {year}...")
    preprocessed_df = pd.read_csv(preprocessed_path, dtype={'State_FIPS': str, 'County_FIPS': str})
    
    # Construct API endpoint and parameters
    print(f"Fetching {variable_name} ({variable_code}) from Census API for {year}...")
    acs_endpoint = f'https://api.census.gov/data/{year}/acs/acs5'
    params = {
        'get': variable_code,
        'for': 'county:*',  # All counties
        'in': 'state:*',    # All states
        'key': api_key
    }
    
    # Make the GET request; the timeout keeps a stalled request from hanging the loop
    response = requests.get(acs_endpoint, params=params, timeout=60)
    if response.status_code != 200:
        raise RuntimeError(
            f"Failed to fetch {variable_name} for {year}. "
            f"Status code: {response.status_code}. Response: {response.text[:200]}"
        )
    
    # Parse the API response into a DataFrame (first row is the header)
    acs_data = response.json()
    acs_df = pd.DataFrame(columns=acs_data[0], data=acs_data[1:])
    acs_df = acs_df.rename(columns={
        variable_code: variable_name,
        'state': 'State_FIPS',
        'county': 'County_FIPS'
    })

    # Format FIPS codes with leading zeros (e.g., '01' for Alabama, '001' for county)
    preprocessed_df['State_FIPS'] = preprocessed_df['State_FIPS'].astype(str).str.zfill(2)
    preprocessed_df['County_FIPS'] = preprocessed_df['County_FIPS'].astype(str).str.zfill(3)
    acs_df['State_FIPS'] = acs_df['State_FIPS'].str.zfill(2)
    acs_df['County_FIPS'] = acs_df['County_FIPS'].str.zfill(3)
    
    # Convert variable to numeric (non-numeric entries such as null become NaN)
    acs_df[variable_name] = pd.to_numeric(acs_df[variable_name], errors='coerce')
    # The API reports unavailable estimates as large negative codes (e.g. -666666666);
    # none of these variables can be negative, so mask them as NaN
    acs_df[variable_name] = acs_df[variable_name].mask(acs_df[variable_name] < 0)
    
    # Merge ACS data with preprocessed CVD mortality data on FIPS codes
    final_df = pd.merge(
        preprocessed_df,
        acs_df[['State_FIPS', 'County_FIPS', variable_name]],
        on=['State_FIPS', 'County_FIPS'],
        how='left'
    )
    
    # Verify that CVD mortality column exists
    if 'cvd_mortality_rate' not in final_df.columns:
        raise ValueError("'cvd_mortality_rate' column is missing in the final merged dataset.")
    
    # Ensure output directory exists
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    
    # Save the merged dataset
    final_df.to_csv(output_path, index=False)
    print(f"  Saved: {output_path}")
    
    return final_df
```
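
The Census API returns JSON as a list of rows whose first row is the header (the requested variable code followed by `state` and `county`). The parsing steps above can be exercised offline with a hand-made payload. A sketch, assuming that response shape; `parse_acs_json` is a hypothetical helper, and the extra `Fips` column (2-digit state + 3-digit county) is an illustrative addition:

```python
import pandas as pd

def parse_acs_json(payload, variable_code, variable_name):
    """Turn a Census API JSON payload (list of rows, header first) into a DataFrame."""
    header, rows = payload[0], payload[1:]
    df = pd.DataFrame(rows, columns=header).rename(columns={
        variable_code: variable_name, 'state': 'State_FIPS', 'county': 'County_FIPS'})
    # Zero-pad to 2-digit state and 3-digit county codes
    df['State_FIPS'] = df['State_FIPS'].str.zfill(2)
    df['County_FIPS'] = df['County_FIPS'].str.zfill(3)
    # 5-digit county FIPS = state code + county code
    df['Fips'] = df['State_FIPS'] + df['County_FIPS']
    # JSON nulls and non-numeric strings become NaN
    values = pd.to_numeric(df[variable_name], errors='coerce')
    # Negative sentinel codes (e.g. -666666666) mark unavailable estimates
    df[variable_name] = values.mask(values < 0)
    return df[['State_FIPS', 'County_FIPS', 'Fips', variable_name]]
```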

### 3.2 Function for Summary Tables (S)

```python
def fetch_and_merge_acs_variable_summary(variable_code, variable_name, year, api_key):
    """
    Fetches an ACS summary table variable (S table) from the Census API and merges it 
    with the preprocessed CVD mortality data.
    
    Parameters:
    -----------
    variable_code : str
        ACS variable code to fetch (e.g., 'S1701_C03_001E' for Poverty Rate)
    variable_name : str
        Descriptive name for the variable (e.g., 'Poverty Rate')
    year : int
        Year of the ACS data (e.g., 2012, 2013, ...)
    api_key : str
        API key for accessing the Census API
    
    Returns:
    --------
    pd.DataFrame
        Merged DataFrame with CVD mortality and the ACS variable
    """
    # Input and output paths - using data_cvd folder structure
    preprocessed_path = f'../data_cvd/processed/preprocessed_fips_cvd/preprocessed_cvd_fips_{year}.csv'
    output_path = f'../data_cvd/processed/acs_individual_variables/dataset_with_{variable_name}_{year}.csv'
    
    # Load preprocessed CVD and FIPS data; read FIPS columns as strings so they
    # are never parsed as floats (e.g. '1.0') when a column contains blanks
    print(f"Loading preprocessed data for {year}...")
    preprocessed_df = pd.read_csv(preprocessed_path, dtype={'State_FIPS': str, 'County_FIPS': str})
    
    # Construct API endpoint for summary tables (note the '/subject' endpoint)
    print(f"Fetching {variable_name} ({variable_code}) from Census API for {year}...")
    acs_endpoint = f'https://api.census.gov/data/{year}/acs/acs5/subject'
    params = {
        'get': variable_code,
        'for': 'county:*',
        'in': 'state:*',
        'key': api_key
    }
    
    # Make the GET request; the timeout keeps a stalled request from hanging the loop
    response = requests.get(acs_endpoint, params=params, timeout=60)
    if response.status_code != 200:
        raise RuntimeError(
            f"Failed to fetch {variable_name} for {year}. "
            f"Status code: {response.status_code}. Response: {response.text[:200]}"
        )
    
    # Parse the API response (first row is the header)
    acs_data = response.json()
    acs_df = pd.DataFrame(columns=acs_data[0], data=acs_data[1:])
    acs_df = acs_df.rename(columns={
        variable_code: variable_name,
        'state': 'State_FIPS',
        'county': 'County_FIPS'
    })

    # Format FIPS codes with leading zeros (e.g., '01' for Alabama, '001' for county)
    preprocessed_df['State_FIPS'] = preprocessed_df['State_FIPS'].astype(str).str.zfill(2)
    preprocessed_df['County_FIPS'] = preprocessed_df['County_FIPS'].astype(str).str.zfill(3)
    acs_df['State_FIPS'] = acs_df['State_FIPS'].str.zfill(2)
    acs_df['County_FIPS'] = acs_df['County_FIPS'].str.zfill(3)
    
    # Convert variable to numeric (non-numeric entries such as null become NaN)
    acs_df[variable_name] = pd.to_numeric(acs_df[variable_name], errors='coerce')
    # The API reports unavailable estimates as large negative codes (e.g. -666666666);
    # these rates cannot be negative, so mask them as NaN
    acs_df[variable_name] = acs_df[variable_name].mask(acs_df[variable_name] < 0)
    
    # Merge ACS data with preprocessed CVD mortality data on FIPS codes
    final_df = pd.merge(
        preprocessed_df,
        acs_df[['State_FIPS', 'County_FIPS', variable_name]],
        on=['State_FIPS', 'County_FIPS'],
        how='left'
    )
    
    # Verify that CVD mortality column exists
    if 'cvd_mortality_rate' not in final_df.columns:
        raise ValueError("'cvd_mortality_rate' column is missing in the final merged dataset.")
    
    # Ensure output directory exists
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    
    # Save the merged dataset
    final_df.to_csv(output_path, index=False)
    print(f"  Saved: {output_path}")
    
    return final_df
```
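
Both functions make one request per variable per year, so a single transient failure (a 429 rate-limit response or a 5xx error) aborts the whole loop. A sketch of a retrying `requests` session built on `urllib3`'s `Retry`; the helper name `make_census_session` and the retry settings are assumptions to tune, not values from the source:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_census_session(total_retries=3, backoff=1.0):
    """Hypothetical helper: a Session that retries rate-limit and server errors.

    Retries use exponential backoff scaled by `backoff` seconds.
    """
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,
        status_forcelist=[429, 500, 502, 503, 504],
    )
    session = requests.Session()
    # Apply the retry policy to every HTTPS request made through this session
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session
```

Inside either function, `session.get(acs_endpoint, params=params, timeout=60)` would then replace `requests.get(...)`, with one session created before the loops.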

## 4. Fetch Standard Variables (B Tables)

Demographic, economic, and housing variables for all counties, 2012-2019.

```python
# Define standard variables (B tables)
standard_variables = [
    {'code': 'B19013_001E', 'name': 'Median Household Income'},
    {'code': 'B01003_001E', 'name': 'Total Population'},
    {'code': 'B19083_001E', 'name': 'Gini Index'},
    {'code': 'B01002_001E', 'name': 'Median Age'},
    {'code': 'B03003_003E', 'name': 'Hispanic Population'},
    {'code': 'B02001_003E', 'name': 'Black Population'},
    {'code': 'B02001_002E', 'name': 'White Population'},
    {'code': 'B25044_003E', 'name': 'No Vehicle (Owner)'},
    {'code': 'B25044_010E', 'name': 'No Vehicle (Renter)'},
    {'code': 'B25044_001E', 'name': 'Total Occupied Households'},
    {'code': 'B25070_010E', 'name': 'Rent Burden Count (+50%)'},
    {'code': 'B25070_001E', 'name': 'Rent Denominator'},
    {'code': 'B11003_016E', 'name': 'Total Families (Single Mother)'},
    {'code': 'B11003_001E', 'name': 'Total Families'}
]

# Fetch all standard variables for years 2012-2019
for year in range(2012, 2020):
    print(f"\n--- Year {year} ---")
    for var in standard_variables:
        fetch_and_merge_acs_variable(
            variable_code=var['code'],
            variable_name=var['name'],
            year=year,
            api_key=API_KEY
        )

print("\nAll standard variables fetched successfully!")
```

```text
--- Year 2012 ---
Loading preprocessed data for 2012...
Fetching Median Household Income (B19013_001E) from Census API for 2012...
  Saved: ../data_cvd/processed/acs_individual_variables/dataset_with_Median Household Income_2012.csv
Loading preprocessed data for 2012...
Fetching Total Population (B01003_001E) from Census API for 2012...
  Saved: ../data_cvd/processed/acs_individual_variables/dataset_with_Total Population_2012.csv
Loading preprocessed data for 2012...
Fetching Gini Index (B19083_001E) from Census API for 2012...
  Saved: ../data_cvd/processed/acs_individual_variables/dataset_with_Gini Index_2012.csv
Loading preprocessed data for 2012...
Fetching Median Age (B01002_001E) from Census API for 2012...
  Saved: ../data_cvd/processed/acs_individual_variables/dataset_with_Median Age_2012.csv
Loading preprocessed data for 2012...
Fetching Hispanic Population (B03003_003E) from Census API for 2012...
  Saved: ../data_cvd/processed/acs_individual_variables/dataset_with_Hispanic 
```
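
The loop above sends 14 variables × 8 years = 112 requests. The Census API also accepts a comma-separated list of variables in `get` (up to 50 per request), so one request per year could return all B-table variables. A sketch under that assumption; `build_batch_params` and `batch_payload_to_frame` are hypothetical helpers:

```python
import pandas as pd

def build_batch_params(variable_codes, api_key, max_vars=50):
    """Build query params requesting several variables for all counties in one call.

    Assumption: the API accepts a comma-separated 'get' list of up to 50 variables.
    """
    if len(variable_codes) > max_vars:
        raise ValueError(f"At most {max_vars} variables per request")
    return {'get': ','.join(variable_codes), 'for': 'county:*', 'in': 'state:*', 'key': api_key}

def batch_payload_to_frame(payload, code_to_name):
    """Convert a multi-variable payload (header row first) into one wide DataFrame."""
    df = pd.DataFrame(payload[1:], columns=payload[0])
    df = df.rename(columns={**code_to_name, 'state': 'State_FIPS', 'county': 'County_FIPS'})
    for name in code_to_name.values():
        values = pd.to_numeric(df[name], errors='coerce')
        df[name] = values.mask(values < 0)  # mask negative sentinel codes as NaN
    return df
```

The resulting wide frame could be merged once per year instead of once per variable, although notebook 03 currently expects one file per variable.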

## 5. Fetch Summary Table Variables (S Tables)

Poverty, unemployment, disability, and educational-attainment rates from the subject tables, 2012-2019.

```python
# Summary table variables (S tables)
summary_variables = [
    {'code': 'S1701_C03_001E', 'name': 'Poverty Rate'},
    {'code': 'S2301_C04_001E', 'name': 'Unemployment Rate'},
    {'code': 'S1810_C03_001E', 'name': 'Disability Rate'},
    {'code': 'S1501_C02_015E', 'name': "Bachelor's Degree or Higher (%)"},
    {'code': 'S1501_C02_014E', 'name': 'High School Degree or Higher (%)'}
]

print("=" * 60)
print("FETCHING SUMMARY TABLE VARIABLES (S Tables)")
print("=" * 60)

for year in range(2012, 2020):
    print(f"\n--- Year {year} ---")
    for var in summary_variables:
        fetch_and_merge_acs_variable_summary(
            variable_code=var['code'],
            variable_name=var['name'],
            year=year,
            api_key=API_KEY
        )

print("\nAll summary table variables fetched successfully!")
```

```text
FETCHING SUMMARY TABLE VARIABLES (S Tables)

--- Year 2012 ---
Loading preprocessed data for 2012...
Fetching Poverty Rate (S1701_C03_001E) from Census API for 2012...
  Saved: ../data_cvd/processed/acs_individual_variables/dataset_with_Poverty Rate_2012.csv
Loading preprocessed data for 2012...
Fetching Unemployment Rate (S2301_C04_001E) from Census API for 2012...
  Saved: ../data_cvd/processed/acs_individual_variables/dataset_with_Unemployment Rate_2012.csv
Loading preprocessed data for 2012...
Fetching Disability Rate (S1810_C03_001E) from Census API for 2012...
  Saved: ../data_cvd/processed/acs_individual_variables/dataset_with_Disability Rate_2012.csv
Loading preprocessed data for 2012...
Fetching Bachelor's Degree or Higher (%) (S1501_C02_015E) from Census API for 2012...
  Saved: ../data_cvd/processed/acs_individual_variables/dataset_with_Bachelor's Degree or Higher (%)_2012.csv
Loading preprocessed data for 2012...
Fetching High School Degree or Higher (%) (S1501_C02_014E) fr
```
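
Subject-table column numbering can shift between ACS releases when a table is redesigned, so it is worth confirming that a code such as `S1501_C02_015E` carries the same label in every year from 2012 to 2019. A sketch that reads each variable's label from the API's per-variable metadata, assuming the URL pattern `.../acs/acs5[/subject]/variables/{code}.json` and a `label` field in the response; `variable_label` and `labels_by_year` are hypothetical helpers:

```python
import requests

def variable_label(year, variable_code, session=requests):
    """Return the API label of one variable in a given ACS 5-year release.

    Assumed metadata URL:
    https://api.census.gov/data/{year}/acs/acs5[/subject]/variables/{code}.json
    """
    base = f"https://api.census.gov/data/{year}/acs/acs5"
    if variable_code.startswith("S"):
        base += "/subject"
    response = session.get(f"{base}/variables/{variable_code}.json", timeout=30)
    response.raise_for_status()
    return response.json().get("label", "")

def labels_by_year(variable_code, years, session=requests):
    """Map each year to the label of variable_code, to spot layout changes."""
    return {year: variable_label(year, variable_code, session) for year in years}
```

For example, `labels_by_year('S1501_C02_015E', range(2012, 2020))`; if the labels differ across years, the earlier years need the code of the equivalent column instead.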

## 6. Summary

All ACS variables have been fetched and merged with CVD mortality data for years 2012-2019.

**Output location:** `../data_cvd/processed/acs_individual_variables/`

**Total files created:** 19 variables × 8 years = 152 CSV files

Each file contains:
- County and State information
- State_FIPS and County_FIPS codes
- Fips (full 5-digit FIPS code)
- `cvd_mortality_rate` (target variable)
- The corresponding ACS variable (in Title Case)
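
Because the merge is a left join, counties in the CVD data without a matching ACS row keep NaN in the ACS column, as do counties whose value the API returned as null. A sketch for summarizing that gap in one saved file; `merge_coverage` is a hypothetical helper:

```python
import pandas as pd

def merge_coverage(df: pd.DataFrame, variable_name: str) -> dict:
    """Count rows whose ACS value is missing after the left merge."""
    total = len(df)
    missing = int(df[variable_name].isna().sum())
    return {
        'rows': total,
        'missing': missing,
        # Guard against an empty file
        'missing_share': missing / total if total else 0.0,
    }
```

For example, `merge_coverage(pd.read_csv(path), 'Median Household Income')` on one of the saved files; a nonzero share points to FIPS mismatches (such as counties whose code changed) or unavailable estimates.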

**Next step:** Proceed to notebook 03 to combine all features into a single dataset for each year.

```python
# Verify the output
output_dir = '../data_cvd/processed/acs_individual_variables/'
if os.path.exists(output_dir):
    files = [f for f in os.listdir(output_dir) if f.endswith('.csv')]
    print(f"Total files created: {len(files)}")
    print("\nSample files:")
    for f in sorted(files)[:5]:
        print(f"  {f}")
```

```text
Total files created: 152

Sample files:
  dataset_with_Bachelor's Degree or Higher (%)_2012.csv
  dataset_with_Bachelor's Degree or Higher (%)_2013.csv
  dataset_with_Bachelor's Degree or Higher (%)_2014.csv
  dataset_with_Bachelor's Degree or Higher (%)_2015.csv
  dataset_with_Bachelor's Degree or Higher (%)_2016.csv
```
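
A count of 152 confirms the total but not that every variable-year pair is present (a stray extra file could offset a missing one). A sketch that lists expected filenames absent from the output directory, assuming the `dataset_with_{name}_{year}.csv` pattern used above; `missing_output_files` is a hypothetical helper:

```python
import os

def missing_output_files(output_dir, variable_names, years):
    """Return expected 'dataset_with_{name}_{year}.csv' files that are absent."""
    # A missing directory means every expected file is missing
    existing = set(os.listdir(output_dir)) if os.path.isdir(output_dir) else set()
    expected = {f"dataset_with_{name}_{year}.csv" for name in variable_names for year in years}
    return sorted(expected - existing)
```

Called as `missing_output_files(output_dir, [v['name'] for v in standard_variables + summary_variables], range(2012, 2020))`, it should return an empty list when all 152 files exist.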
