# Need-Adjusted Primary Care Access and Preventable Hospitalizations

## Reframed Research Question
**"Does need-adjusted primary care access mediate the relationship between payer mix and preventable hospitalizations, and did Prop 56 reduce the access gap?"**

### Key Innovations in This Analysis
1. **Better Outcomes**: PQI subcomponents (chronic vs acute) instead of noisy aggregate
2. **Access Gap Index**: PCP supply minus expected supply given need
3. **Higher Power**: 3-year rolling averages, 5-year long differences
4. **County Typology**: High-need/low-access "true deserts"
5. **Cleaner Mediation**: MC Share → Access Gap → PQI

In [None]:
# ============================================================================
# SETUP & IMPORTS
# ============================================================================
import os
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import stats
from statsmodels.regression.linear_model import OLS, WLS

# Silence library warnings to keep notebook output readable
warnings.filterwarnings('ignore')

# Output directories
for d in ['outputs_v2', 'outputs_v2/data', 'outputs_v2/figures', 'outputs_v2/tables']:
    os.makedirs(d, exist_ok=True)

print("✓ Setup complete")

---
## Part 1: Load All Data Sources

In [None]:
# ============================================================================
# LOAD ALL DATA SOURCES
# ============================================================================

# Master panel (2005-2025)
panel = pd.read_csv('outputs/data/master_panel_2005_2025.csv')
panel['fips5'] = panel['fips5'].astype(str).str.zfill(5)

# Detailed PQI by condition (if available)
try:
    pqi_detailed = pd.read_csv('outputs/data/pqi_detailed_2005_2024.csv')
    pqi_detailed['fips5'] = pqi_detailed['fips5'].astype(str).str.zfill(5)
    has_detailed_pqi = True
except FileNotFoundError:
    has_detailed_pqi = False
    print("Note: Detailed PQI not available, using aggregate")

# ACS controls
acs = pd.read_csv('outputs/data/acs_county_year_panel.csv')
acs['fips5'] = acs['fips5'].astype(str).str.zfill(5)

# Physician supply (cross-sectional)
phys = pd.read_csv('outputs/data/physician_supply_clean.csv')
phys['fips5'] = phys['fips5'].astype(str).str.zfill(5)

# County crosswalk
crosswalk = pd.read_csv('outputs/data/county_crosswalk_clean.csv')
crosswalk['fips5'] = crosswalk['fips5'].astype(str).str.zfill(5)

print(f"Panel: {len(panel)} rows, years {panel['year'].min()}-{panel['year'].max()}")
print(f"ACS: {len(acs)} rows, years {acs['year'].min()}-{acs['year'].max()}")
print(f"Physicians: {len(phys)} counties")
print(f"Counties: {panel['fips5'].nunique()}")

---
## Part 2: Build Better Outcomes - PQI Chronic vs Acute

In [None]:
# ============================================================================
# CLASSIFY PQI INTO CHRONIC VS ACUTE (Primary-Care-Sensitive)
# ============================================================================

# Primary-care-sensitive CHRONIC conditions (ambulatory care sensitive)
# These should show "delayed care" effects most clearly
chronic_conditions = ['diabetes', 'copd', 'asthma', 'hypertension', 'heart failure', 'chf', 'angina']
acute_conditions = ['dehydration', 'pneumonia', 'urinary', 'uti', 'appendix']

def classify_pqi(pqi_name):
    """Classify PQI into chronic vs acute"""
    if pd.isna(pqi_name):
        return 'other'
    name_lower = str(pqi_name).lower()
    for cond in chronic_conditions:
        if cond in name_lower:
            return 'chronic'
    for cond in acute_conditions:
        if cond in name_lower:
            return 'acute'
    return 'other'

# If we have detailed PQI, classify
if has_detailed_pqi:
    if 'pqi_name' in pqi_detailed.columns:
        pqi_detailed['pqi_type'] = pqi_detailed['pqi_name'].apply(classify_pqi)
    elif 'pqi_id' in pqi_detailed.columns:
        # Use PQI ID mapping (AHRQ standard)
        chronic_ids = ['01', '03', '05', '07', '08', '13', '14', '15', '16']
        acute_ids = ['02', '10', '11', '12']
        pqi_detailed['pqi_type'] = pqi_detailed['pqi_id'].astype(str).str.zfill(2).apply(
            lambda x: 'chronic' if x in chronic_ids else 'acute' if x in acute_ids else 'other'
        )
    else:
        # No name or ID column to classify on; keep everything in one bucket
        pqi_detailed['pqi_type'] = 'other'
    print("PQI Type Distribution:")
    print(pqi_detailed['pqi_type'].value_counts())

    # Aggregate to county-year by type (first available rate column wins)
    rate_candidates = [c for c in ['outcome_rate', 'risk_adj_rate', 'obs_rate'] if c in pqi_detailed.columns]
    if not rate_candidates:
        raise KeyError("No PQI rate column found (expected outcome_rate, risk_adj_rate, or obs_rate)")
    rate_col = rate_candidates[0]
    pqi_by_type = pqi_detailed.groupby(['fips5', 'year', 'pqi_type'])[rate_col].mean().reset_index()
    pqi_by_type = pqi_by_type.pivot(index=['fips5', 'year'], columns='pqi_type', values=rate_col).reset_index()
    pqi_by_type.columns.name = None
    pqi_by_type = pqi_by_type.rename(columns={'chronic': 'pqi_chronic', 'acute': 'pqi_acute', 'other': 'pqi_other'})
    print(f"\nPQI by Type: {len(pqi_by_type)} county-years")
else:
    print("Using aggregate PQI (chronic/acute breakdown not available)")

---
## Part 3: Build Need-Adjusted Access Gap Index

The **Access Gap** = Actual PCP supply - Expected PCP supply (given need)
- **Positive** = more supply than expected (good)
- **Negative** = less supply than expected (**desert**)
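
As a rough illustration of the definition above, the gap can be read as the residual from a regression of PCP supply on need. The sketch below uses made-up numbers for six hypothetical counties and `np.polyfit` in place of the `statsmodels` fit used later in the notebook; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical toy data: need index (z-score scale) and PCPs per 100k
need = np.array([-1.5, -0.5, 0.0, 0.5, 1.0, 1.5])
pcp = np.array([95.0, 80.0, 70.0, 75.0, 50.0, 40.0])

# Fit PCP = intercept + slope * need by ordinary least squares
slope, intercept = np.polyfit(need, pcp, 1)

# Expected supply given need, and the gap as actual minus expected
expected = intercept + slope * need
access_gap = pcp - expected

for n, g in zip(need, access_gap):
    label = "surplus" if g > 0 else "deficit"
    print(f"need={n:+.1f}  gap={g:+.1f}  ({label})")

# With an intercept in the model, OLS residuals average to zero
# and are uncorrelated with the regressor.
print(f"mean gap = {access_gap.mean():.6f}")
print(f"corr(need, gap) = {np.corrcoef(need, access_gap)[0, 1]:.6f}")
```

Because the gap is a regression residual, "desert" is defined relative to the sample's own need-supply relationship rather than to an external adequacy standard: some counties will always fall below zero, however well supplied the sample is overall.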

In [None]:
# ============================================================================
# STEP 1: BUILD "NEED" INDEX
# ============================================================================

# Merge panel with ACS and physician data
df = panel.merge(acs, on=['fips5', 'year'], how='left', suffixes=('', '_acs'))
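# Physician supply is cross-sectional, so the same value is attached to every year.
# De-duplicate on fips5 so the merges cannot multiply county-year rows.
df = df.merge(phys[['fips5', 'pcp_per_100k']].drop_duplicates(subset='fips5'), on='fips5', how='left')
df = df.merge(crosswalk[['fips5', 'county_name_clean']].drop_duplicates(subset='fips5'), on='fips5', how='left')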

# Create NEED components (standardized)
need_vars = ['age65_pct', 'disability_pct', 'poverty_pct']
available_need_vars = [v for v in need_vars if v in df.columns]
print(f"Available need variables: {available_need_vars}")

# Standardize and create composite
for var in available_need_vars:
    df[f'{var}_z'] = (df[var] - df[var].mean()) / df[var].std()

# NEED INDEX = average of standardized need indicators
z_vars = [f'{v}_z' for v in available_need_vars]
df['need_index'] = df[z_vars].mean(axis=1)

print(f"\nNeed Index Stats:")
print(df['need_index'].describe())

In [None]:
# ============================================================================
# STEP 2: ESTIMATE EXPECTED PCP SUPPLY GIVEN NEED
# ============================================================================

# Use 2020 cross-section
cs = df[df['year'] == 2020].dropna(subset=['pcp_per_100k', 'need_index']).copy()
print(f"2020 Cross-section: {len(cs)} counties")

# Regress PCP supply on need to get "expected" supply
Y = cs['pcp_per_100k']
X = sm.add_constant(cs['need_index'])

need_model = OLS(Y, X).fit()
print("\nExpected PCP Model (PCP ~ Need):")
print(f"  β(need) = {need_model.params['need_index']:.2f}")
print(f"  R² = {need_model.rsquared:.3f}")

# Predicted (expected) PCP given need
cs['pcp_expected'] = need_model.predict(X)

# ACCESS GAP = Actual - Expected
# Positive = more supply than expected (good)
# Negative = less supply than expected (desert)
cs['access_gap'] = cs['pcp_per_100k'] - cs['pcp_expected']

print(f"\nAccess Gap Stats:")
print(cs['access_gap'].describe())

# Identify extreme counties
print(f"\nTop 5 Access Surplus (more PCPs than expected):")
top5 = cs.nlargest(5, 'access_gap')[['county_name_clean', 'pcp_per_100k', 'pcp_expected', 'access_gap', 'need_index']]
print(top5.to_string())

print(f"\nBottom 5 Access Deficit (fewer PCPs than expected):")
bottom5 = cs.nsmallest(5, 'access_gap')[['county_name_clean', 'pcp_per_100k', 'pcp_expected', 'access_gap', 'need_index']]
print(bottom5.to_string())

In [None]:
# ============================================================================
# VISUALIZE ACCESS GAP CONSTRUCTION
# ============================================================================

fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle('Need-Adjusted Access Gap Construction', fontsize=14, fontweight='bold')

# 1. PCP vs Need with regression line
ax1 = axes[0, 0]
ax1.scatter(cs['need_index'], cs['pcp_per_100k'], alpha=0.6, c='blue', s=50)
x_line = np.linspace(cs['need_index'].min(), cs['need_index'].max(), 100)
ax1.plot(x_line, need_model.params['const'] + need_model.params['need_index'] * x_line, 
         'r--', linewidth=2, label='Expected PCP')
ax1.set_xlabel('Need Index (higher = more vulnerable)')
ax1.set_ylabel('PCP per 100k')
ax1.set_title('Step 1: Expected PCP Given Need')
ax1.legend()

# Label outliers
for idx, row in cs.nlargest(3, 'pcp_per_100k').iterrows():
    ax1.annotate(str(row['county_name_clean'])[:10], (row['need_index'], row['pcp_per_100k']), fontsize=8)

# 2. Access Gap Distribution
ax2 = axes[0, 1]
ax2.hist(cs['access_gap'], bins=20, color='steelblue', edgecolor='white', alpha=0.7)
ax2.axvline(x=0, color='red', linestyle='--', linewidth=2, label='Expected = Actual')
ax2.axvline(x=cs['access_gap'].median(), color='green', linestyle='-', linewidth=2, 
            label=f'Median = {cs["access_gap"].median():.0f}')
ax2.set_xlabel('Access Gap (PCP actual - expected)')
ax2.set_ylabel('Count')
ax2.set_title('Step 2: Access Gap Distribution')
ax2.legend()

# 3. Access Gap vs MC Share
ax3 = axes[1, 0]
cs_mc = cs.dropna(subset=['medi_cal_share'])
ax3.scatter(cs_mc['medi_cal_share'], cs_mc['access_gap'], alpha=0.6, c='purple', s=50)
z = np.polyfit(cs_mc['medi_cal_share'], cs_mc['access_gap'], 1)
p = np.poly1d(z)
x_line = np.linspace(cs_mc['medi_cal_share'].min(), cs_mc['medi_cal_share'].max(), 100)
ax3.plot(x_line, p(x_line), 'r--', linewidth=2)
ax3.axhline(y=0, color='black', linestyle=':', alpha=0.5)
ax3.set_xlabel('Medi-Cal Share')
ax3.set_ylabel('Access Gap')
ax3.set_title('Step 3: MC Share → Access Gap')

# 4. Access Gap vs PQI
ax4 = axes[1, 1]
cs_pqi = cs.dropna(subset=['access_gap', 'pqi_mean_rate'])
ax4.scatter(cs_pqi['access_gap'], cs_pqi['pqi_mean_rate'], alpha=0.6, c='green', s=50)
z = np.polyfit(cs_pqi['access_gap'], cs_pqi['pqi_mean_rate'], 1)
p = np.poly1d(z)
x_line = np.linspace(cs_pqi['access_gap'].min(), cs_pqi['access_gap'].max(), 100)
ax4.plot(x_line, p(x_line), 'r--', linewidth=2)
ax4.set_xlabel('Access Gap (positive = surplus)')
ax4.set_ylabel('PQI Rate')
ax4.set_title('Step 4: Access Gap → PQI')

plt.tight_layout()
plt.savefig('outputs_v2/figures/access_gap_construction.png', dpi=150, bbox_inches='tight')
plt.show()
print("✓ Saved: outputs_v2/figures/access_gap_construction.png")

---
## Part 4: County Typology - Identifying "True Deserts"

In [None]:
# ============================================================================
# COUNTY TYPOLOGY: 2x2 NEED × ACCESS
# ============================================================================

# Thresholds: median need index; zero access gap
need_threshold = cs['need_index'].median()
access_threshold = 0  # Access gap = 0 means supply matches need

# Create typology
def county_type(row):
    high_need = row['need_index'] >= need_threshold
    low_access = row['access_gap'] < access_threshold
    
    if high_need and low_access:
        return 'TRUE DESERT'
    elif high_need and not low_access:
        return 'Adequate Access'
    elif not high_need and low_access:
        return 'Underserved'
    else:
        return 'Well-Served'

cs['county_type'] = cs.apply(county_type, axis=1)

print("County Typology Distribution:")
print(cs['county_type'].value_counts())
print("\nTypology Definition:")
print("  TRUE DESERT: High need + Low access (fewer PCPs than expected)")
print("  Adequate Access: High need + OK access")
print("  Underserved: Low need + Low access")
print("  Well-Served: Low need + OK access")

In [None]:
# ============================================================================
# VISUALIZE COUNTY TYPOLOGY
# ============================================================================

fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# 1. Scatter plot with typology
ax1 = axes[0]
colors = {
    'TRUE DESERT': 'red',
    'Adequate Access': 'orange',
    'Underserved': 'blue',
    'Well-Served': 'green'
}
for ctype, color in colors.items():
    subset = cs[cs['county_type'] == ctype]
    ax1.scatter(subset['need_index'], subset['access_gap'], 
                c=color, label=f"{ctype} (N={len(subset)})", alpha=0.7, s=60)

ax1.axhline(y=0, color='black', linestyle='--', alpha=0.5, label='Access = Expected')
ax1.axvline(x=need_threshold, color='black', linestyle='--', alpha=0.5)
ax1.set_xlabel('Need Index (higher = more vulnerable)')
ax1.set_ylabel('Access Gap (PCP actual - expected)')
ax1.set_title('County Typology: Need × Access')
ax1.legend(loc='upper right', fontsize=9)

# Label TRUE DESERT counties
for idx, row in cs[cs['county_type'] == 'TRUE DESERT'].iterrows():
    ax1.annotate(str(row['county_name_clean'])[:8], (row['need_index'], row['access_gap']), 
                 fontsize=7, alpha=0.8)

# 2. PQI by typology
ax2 = axes[1]
cs_pqi_type = cs.dropna(subset=['pqi_mean_rate'])
type_pqi = cs_pqi_type.groupby('county_type')['pqi_mean_rate'].agg(['mean', 'std', 'count']).reset_index()
type_pqi = type_pqi.sort_values('mean', ascending=True)

bars = ax2.barh(type_pqi['county_type'], type_pqi['mean'], 
                xerr=type_pqi['std']/np.sqrt(type_pqi['count']), 
                capsize=5, color=[colors.get(t, 'gray') for t in type_pqi['county_type']])
ax2.set_xlabel('Mean PQI Rate')
ax2.set_title('PQI by County Type')

# Add values
for i, (_, row) in enumerate(type_pqi.iterrows()):
    ax2.text(row['mean'] + 5, i, f'{row["mean"]:.0f}', va='center', fontsize=10)

plt.tight_layout()
plt.savefig('outputs_v2/figures/county_typology.png', dpi=150, bbox_inches='tight')
plt.show()
print("✓ Saved: outputs_v2/figures/county_typology.png")

---
## Part 5: Core Regressions - Access Gap as the Mechanism
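
The models in this part estimate the two legs of the chain (MC Share → Access Gap, Access Gap → PQI) separately. One common way to summarize mediation is the product-of-coefficients indirect effect with a bootstrap interval. The sketch below is a minimal illustration on synthetic data, not the notebook's estimates: the helper names `ols` and `indirect_effect` are hypothetical, the data-generating numbers are arbitrary, and path b is fit with MC share held fixed, which differs from Model 2 as specified in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for county data (assumptions for illustration only)
n = 58
mc_share = rng.uniform(0.15, 0.55, n)
access_gap = -60.0 * (mc_share - mc_share.mean()) + rng.normal(0, 10, n)
pqi = 900.0 - 2.0 * access_gap + 100.0 * mc_share + rng.normal(0, 40, n)

def ols(y, *cols):
    """Return OLS coefficients [intercept, b1, b2, ...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def indirect_effect(mc, gap, y):
    a = ols(gap, mc)[1]      # path a: MC share -> access gap
    b = ols(y, gap, mc)[1]   # path b: access gap -> PQI, holding MC share fixed
    return a * b

point = indirect_effect(mc_share, access_gap, pqi)

# Percentile bootstrap: resample counties with replacement
boot = np.empty(2000)
for i in range(boot.size):
    idx = rng.integers(0, n, n)
    boot[i] = indirect_effect(mc_share[idx], access_gap[idx], pqi[idx])
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])

print(f"indirect effect a*b = {point:.1f}, 95% bootstrap CI [{ci_low:.1f}, {ci_high:.1f}]")
```

An interval that excludes zero would be consistent with mediation through the access gap; with cross-sectional county data it remains an association rather than a causal estimate.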

In [None]:
# ============================================================================
# MODEL 1: MC SHARE → ACCESS GAP (First Stage)
# "Does payer mix predict the access gap?"
# "First stage" = path a of the mediation chain, fit by plain OLS (no instrument involved)
# ============================================================================

print("="*70)
print("MODEL 1: FIRST STAGE - Does MC Share Predict Access Gap?")
print("Access_Gap = β₀ + β₁ MC_Share + Controls + ε")
print("="*70)

cs_reg = cs.dropna(subset=['access_gap', 'medi_cal_share', 'poverty_pct', 'age65_pct'])
print(f"N = {len(cs_reg)}")

# Simple model
Y = cs_reg['access_gap']
X_simple = sm.add_constant(cs_reg[['medi_cal_share']])
m1_simple = OLS(Y, X_simple).fit(cov_type='HC1')

# With controls
X_full = sm.add_constant(cs_reg[['medi_cal_share', 'poverty_pct', 'age65_pct']])
m1_full = OLS(Y, X_full).fit(cov_type='HC1')

print(f"\nSimple Model: MC → Access Gap")
print(f"  β(MC) = {m1_simple.params['medi_cal_share']:.1f}, p = {m1_simple.pvalues['medi_cal_share']:.4f}")
print(f"  R² = {m1_simple.rsquared:.3f}")

print(f"\nWith Controls: MC → Access Gap")
print(f"  β(MC) = {m1_full.params['medi_cal_share']:.1f}, p = {m1_full.pvalues['medi_cal_share']:.4f}")
print(f"  β(poverty) = {m1_full.params['poverty_pct']:.2f}, p = {m1_full.pvalues['poverty_pct']:.4f}")
print(f"  β(age65) = {m1_full.params['age65_pct']:.2f}, p = {m1_full.pvalues['age65_pct']:.4f}")
print(f"  R² = {m1_full.rsquared:.3f}")

In [None]:
# ============================================================================
# MODEL 2: ACCESS GAP → PQI (Second Stage)
# "Does the access gap predict preventable hospitalizations?"
# "Second stage" = path b, fit on the observed gap rather than fitted values from Model 1
# ============================================================================

print("="*70)
print("MODEL 2: SECOND STAGE - Does Access Gap Predict PQI?")
print("PQI = β₀ + β₁ Access_Gap + Controls + ε")
print("="*70)

# .copy() so the true_desert column added in Model 3 is written to an independent frame
cs_pqi = cs.dropna(subset=['pqi_mean_rate', 'access_gap', 'need_index']).copy()
print(f"N = {len(cs_pqi)}")

# Simple model
Y = cs_pqi['pqi_mean_rate']
X_simple = sm.add_constant(cs_pqi[['access_gap']])
m2_simple = OLS(Y, X_simple).fit(cov_type='HC1')

# With need control
X_need = sm.add_constant(cs_pqi[['access_gap', 'need_index']])
m2_need = OLS(Y, X_need).fit(cov_type='HC1')

print(f"\nSimple Model: Access Gap → PQI")
print(f"  β = {m2_simple.params['access_gap']:.3f}, p = {m2_simple.pvalues['access_gap']:.4f}")
print(f"  R² = {m2_simple.rsquared:.3f}")
print(f"  Interpretation: +10 PCP gap → {m2_simple.params['access_gap']*10:.1f} change in PQI")

print(f"\nWith Need Control: Access Gap → PQI")
print(f"  β(access_gap) = {m2_need.params['access_gap']:.3f}, p = {m2_need.pvalues['access_gap']:.4f}")
print(f"  β(need_index) = {m2_need.params['need_index']:.2f}, p = {m2_need.pvalues['need_index']:.4f}")
print(f"  R² = {m2_need.rsquared:.3f}")

In [None]:
# ============================================================================
# MODEL 3: TRUE DESERT INDICATOR → PQI
# "Do TRUE DESERT counties have worse outcomes after controlling for need?"
# ============================================================================

print("="*70)
print("MODEL 3: TRUE DESERT EFFECT")
print("PQI = β₀ + β₁ TrueDesert + β₂ Need + ε")
print("="*70)

cs_pqi['true_desert'] = (cs_pqi['county_type'] == 'TRUE DESERT').astype(int)
print(f"True Deserts: {cs_pqi['true_desert'].sum()} counties")

Y = cs_pqi['pqi_mean_rate']
X = sm.add_constant(cs_pqi[['true_desert', 'need_index']])
m3 = OLS(Y, X).fit(cov_type='HC1')

print(f"\nResults:")
print(f"  β(true_desert) = {m3.params['true_desert']:.1f}, p = {m3.pvalues['true_desert']:.4f}")
print(f"  β(need_index) = {m3.params['need_index']:.2f}, p = {m3.pvalues['need_index']:.4f}")
print(f"  R² = {m3.rsquared:.3f}")
print(f"\nInterpretation:")
print(f"  True desert counties have {abs(m3.params['true_desert']):.0f} {'higher' if m3.params['true_desert'] > 0 else 'lower'} PQI")
print(f"  even after controlling for need.")

---
## Part 6: Higher-Power Panel Analysis (Rolling Averages & Changes)
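
To make the smoothing step concrete, here is a small sketch of a centered 3-year rolling mean within county, followed by a long difference on the smoothed series. The toy panel and the column name `rate` are hypothetical; the rolling call mirrors the `rolling(3, min_periods=2, center=True)` pattern used in the cell below.

```python
import pandas as pd

# Hypothetical toy panel: two counties, five years each, with a noisy rate
toy = pd.DataFrame({
    'fips5': ['06001'] * 5 + ['06003'] * 5,
    'year': list(range(2016, 2021)) * 2,
    'rate': [100.0, 130.0, 90.0, 120.0, 140.0, 50.0, 80.0, 20.0, 70.0, 40.0],
}).sort_values(['fips5', 'year'])

# Centered 3-year mean within county; the first and last year of each
# county average only the two values available (min_periods=2)
toy['rate_roll3'] = toy.groupby('fips5')['rate'].transform(
    lambda s: s.rolling(3, min_periods=2, center=True).mean()
)

# Long difference (2016 to 2020) on the smoothed series
wide = toy.pivot(index='fips5', columns='year', values='rate_roll3')
change = wide[2020] - wide[2016]

print(toy)
print(change)
```

The window counts rows, not calendar years, so a county with missing years would be averaged over a span longer than three years unless the panel is reindexed to a complete year grid first.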

In [None]:
# ============================================================================
# BUILD 3-YEAR ROLLING AVERAGES TO REDUCE NOISE
# ============================================================================

print("Building 3-year rolling averages to reduce measurement noise...")

panel_full = df.copy()
panel_full = panel_full.sort_values(['fips5', 'year'])

# Calculate rolling means for key variables
rolling_vars = ['pqi_mean_rate', 'medi_cal_share', 'poverty_pct', 'age65_pct']
available_rolling = [v for v in rolling_vars if v in panel_full.columns]

for var in available_rolling:
    panel_full[f'{var}_roll3'] = panel_full.groupby('fips5')[var].transform(
        lambda x: x.rolling(3, min_periods=2, center=True).mean()
    )

print(f"Created rolling averages for: {available_rolling}")
print(f"Panel shape: {panel_full.shape}")

In [None]:
# ============================================================================
# 5-YEAR CHANGE ANALYSIS: Does ΔPQI correlate with ΔVulnerability?
# ============================================================================

print("="*70)
print("5-YEAR CHANGE ANALYSIS")
print("ΔPQI ~ ΔMC_share + ΔPoverty")
print("="*70)

# Define periods
periods = [(2015, 2020), (2018, 2023)]

change_results = []
for start, end in periods:
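    # Get data for start and end years (raw annual values; swap in the
    # *_roll3 columns built above to difference the smoothed series instead)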
    df_start = panel_full[panel_full['year'] == start][['fips5', 'pqi_mean_rate', 'medi_cal_share', 'poverty_pct']].copy()
    df_end = panel_full[panel_full['year'] == end][['fips5', 'pqi_mean_rate', 'medi_cal_share', 'poverty_pct']].copy()
    
    if len(df_start) == 0 or len(df_end) == 0:
        print(f"Period {start}→{end}: Insufficient data")
        continue
    
    # Merge and compute changes
    df_change = df_start.merge(df_end, on='fips5', suffixes=('_start', '_end'))
    
    for var in ['pqi_mean_rate', 'medi_cal_share', 'poverty_pct']:
        df_change[f'd_{var}'] = df_change[f'{var}_end'] - df_change[f'{var}_start']
    
    # Regression: ΔPQI ~ ΔMC + ΔPoverty
    df_reg = df_change.dropna(subset=['d_pqi_mean_rate', 'd_medi_cal_share', 'd_poverty_pct'])
    
    if len(df_reg) >= 20:
        Y = df_reg['d_pqi_mean_rate']
        X = sm.add_constant(df_reg[['d_medi_cal_share', 'd_poverty_pct']])
        m = OLS(Y, X).fit(cov_type='HC1')
        
        print(f"\nPeriod {start}→{end} (N={len(df_reg)}):")
        print(f"  β(ΔMC) = {m.params['d_medi_cal_share']:.1f}, p = {m.pvalues['d_medi_cal_share']:.4f}")
        print(f"  β(ΔPoverty) = {m.params['d_poverty_pct']:.2f}, p = {m.pvalues['d_poverty_pct']:.4f}")
        print(f"  R² = {m.rsquared:.3f}")
        
        change_results.append({
            'Period': f'{start}→{end}',
            'N': len(df_reg),
            'beta_dMC': m.params['d_medi_cal_share'],
            'p_dMC': m.pvalues['d_medi_cal_share'],
            'beta_dPov': m.params['d_poverty_pct'],
            'p_dPov': m.pvalues['d_poverty_pct'],
            'R2': m.rsquared
        })
    else:
        print(f"\nPeriod {start}→{end}: only {len(df_reg)} complete counties, regression skipped")

---
## Part 7: Summary Figure and Export Results

In [None]:
# ============================================================================
# COMPREHENSIVE SUMMARY FIGURE
# ============================================================================

fig, axes = plt.subplots(2, 3, figsize=(16, 10))
fig.suptitle('Need-Adjusted Access Gap Analysis: Complete Results', fontsize=14, fontweight='bold')

# 1. Access Gap Construction
ax1 = axes[0, 0]
ax1.scatter(cs['need_index'], cs['pcp_per_100k'], alpha=0.5, c='blue', s=30)
x_line = np.linspace(cs['need_index'].min(), cs['need_index'].max(), 100)
ax1.plot(x_line, need_model.params['const'] + need_model.params['need_index'] * x_line, 
         'r--', linewidth=2, label='Expected PCP')
ax1.set_xlabel('Need Index')
ax1.set_ylabel('PCP per 100k')
ax1.set_title('1. Expected PCP Given Need')
ax1.legend()

# 2. MC → Access Gap
ax2 = axes[0, 1]
ax2.scatter(cs_reg['medi_cal_share'], cs_reg['access_gap'], alpha=0.5, c='purple', s=30)
z = np.polyfit(cs_reg['medi_cal_share'], cs_reg['access_gap'], 1)
p = np.poly1d(z)
x_line = np.linspace(cs_reg['medi_cal_share'].min(), cs_reg['medi_cal_share'].max(), 100)
ax2.plot(x_line, p(x_line), 'r--', linewidth=2)
ax2.axhline(y=0, color='black', linestyle=':', alpha=0.5)
ax2.set_xlabel('Medi-Cal Share')
ax2.set_ylabel('Access Gap')
ax2.set_title(f'2. MC → Access Gap\nβ={m1_full.params["medi_cal_share"]:.0f}')

# 3. Access Gap → PQI
ax3 = axes[0, 2]
ax3.scatter(cs_pqi['access_gap'], cs_pqi['pqi_mean_rate'], alpha=0.5, c='green', s=30)
z = np.polyfit(cs_pqi['access_gap'], cs_pqi['pqi_mean_rate'], 1)
p = np.poly1d(z)
x_line = np.linspace(cs_pqi['access_gap'].min(), cs_pqi['access_gap'].max(), 100)
ax3.plot(x_line, p(x_line), 'r--', linewidth=2)
ax3.set_xlabel('Access Gap (surplus)')
ax3.set_ylabel('PQI Rate')
ax3.set_title(f'3. Access Gap → PQI\nβ={m2_need.params["access_gap"]:.2f}')

# 4. County Typology
ax4 = axes[1, 0]
for ctype, color in colors.items():
    subset = cs[cs['county_type'] == ctype]
    ax4.scatter(subset['need_index'], subset['access_gap'], 
                c=color, label=f"{ctype[:12]} ({len(subset)})", alpha=0.6, s=40)
ax4.axhline(y=0, color='black', linestyle='--', alpha=0.5)
ax4.axvline(x=need_threshold, color='black', linestyle='--', alpha=0.5)
ax4.set_xlabel('Need Index')
ax4.set_ylabel('Access Gap')
ax4.set_title('4. County Typology')
ax4.legend(fontsize=7, loc='lower left')

# 5. PQI by Type
ax5 = axes[1, 1]
type_order = ['Well-Served', 'Underserved', 'Adequate Access', 'TRUE DESERT']
type_pqi_ordered = cs_pqi.groupby('county_type')['pqi_mean_rate'].mean().reindex(type_order).dropna()
bar_colors = [colors.get(t, 'gray') for t in type_pqi_ordered.index]
ax5.barh(range(len(type_pqi_ordered)), type_pqi_ordered.values, color=bar_colors)
ax5.set_yticks(range(len(type_pqi_ordered)))
ax5.set_yticklabels(type_pqi_ordered.index, fontsize=9)
ax5.set_xlabel('Mean PQI')
ax5.set_title('5. PQI by County Type')

# 6. Summary Text
ax6 = axes[1, 2]
ax6.axis('off')
summary_text = f"""
KEY FINDINGS
────────────────────────────

ACCESS GAP APPROACH
  Gap = Actual PCP - Expected PCP
  Expected based on Need Index

FIRST STAGE: MC → Access Gap
  β = {m1_full.params['medi_cal_share']:.0f}
  p = {m1_full.pvalues['medi_cal_share']:.3f}
  
SECOND STAGE: Gap → PQI  
  β = {m2_need.params['access_gap']:.2f}
  p = {m2_need.pvalues['access_gap']:.3f}
  
TRUE DESERT EFFECT
  β = {m3.params['true_desert']:.0f}
  p = {m3.pvalues['true_desert']:.3f}

CONCLUSION:
Need-adjusted access gap
is a cleaner predictor than
raw MC share.
"""
ax6.text(0.05, 0.95, summary_text, transform=ax6.transAxes, fontsize=10,
         verticalalignment='top', fontfamily='monospace',
         bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

plt.tight_layout()
plt.savefig('outputs_v2/figures/comprehensive_access_gap_results.png', dpi=150, bbox_inches='tight')
plt.show()
print("✓ Saved: outputs_v2/figures/comprehensive_access_gap_results.png")

In [None]:
# ============================================================================
# EXPORT ALL RESULTS
# ============================================================================

# Save county-level data with access gap (columns missing from cs are skipped)
export_cols = ['fips5', 'county_name_clean', 'population', 'medi_cal_share',
               'pcp_per_100k', 'need_index', 'pcp_expected', 'access_gap',
               'county_type', 'pqi_mean_rate']
cs_export = cs[[c for c in export_cols if c in cs.columns]].copy()
cs_export.to_csv('outputs_v2/data/county_access_gap_2020.csv', index=False)
print("✓ Saved: outputs_v2/data/county_access_gap_2020.csv")

# Save regression results
results_summary = pd.DataFrame([
    {'Model': 'MC → Access Gap (simple)', 'Outcome': 'Access Gap', 
     'Key_Predictor': 'medi_cal_share', 'β': m1_simple.params['medi_cal_share'], 
     'p': m1_simple.pvalues['medi_cal_share'], 'R2': m1_simple.rsquared, 'N': len(cs_reg)},
    {'Model': 'MC → Access Gap (controls)', 'Outcome': 'Access Gap',
     'Key_Predictor': 'medi_cal_share', 'β': m1_full.params['medi_cal_share'], 
     'p': m1_full.pvalues['medi_cal_share'], 'R2': m1_full.rsquared, 'N': len(cs_reg)},
    {'Model': 'Access Gap → PQI (simple)', 'Outcome': 'PQI',
     'Key_Predictor': 'access_gap', 'β': m2_simple.params['access_gap'], 
     'p': m2_simple.pvalues['access_gap'], 'R2': m2_simple.rsquared, 'N': len(cs_pqi)},
    {'Model': 'Access Gap → PQI (need control)', 'Outcome': 'PQI',
     'Key_Predictor': 'access_gap', 'β': m2_need.params['access_gap'], 
     'p': m2_need.pvalues['access_gap'], 'R2': m2_need.rsquared, 'N': len(cs_pqi)},
    {'Model': 'True Desert → PQI', 'Outcome': 'PQI',
     'Key_Predictor': 'true_desert', 'β': m3.params['true_desert'], 
     'p': m3.pvalues['true_desert'], 'R2': m3.rsquared, 'N': len(cs_pqi)},
])
results_summary.to_csv('outputs_v2/tables/access_gap_regressions.csv', index=False)
print("✓ Saved: outputs_v2/tables/access_gap_regressions.csv")

# Save TRUE DESERT county list (columns missing from cs are skipped)
desert_cols = ['fips5', 'county_name_clean', 'population', 'medi_cal_share',
               'pcp_per_100k', 'access_gap', 'need_index', 'pqi_mean_rate']
true_deserts = cs.loc[cs['county_type'] == 'TRUE DESERT', [c for c in desert_cols if c in cs.columns]]
true_deserts.to_csv('outputs_v2/tables/true_desert_counties.csv', index=False)
print("✓ Saved: outputs_v2/tables/true_desert_counties.csv")

print("\n" + "="*70)
print("ALL OUTPUTS SAVED TO outputs_v2/")
print("="*70)

In [None]:
print("""
================================================================================
                    NEED-ADJUSTED ACCESS GAP ANALYSIS
                         FINAL CONCLUSIONS
================================================================================

REFRAMED RESEARCH QUESTION:
──────────────────────────
"Does need-adjusted primary care access mediate the relationship between 
payer mix and preventable hospitalizations?"

KEY INNOVATION: ACCESS GAP INDEX
────────────────────────────────
Access_Gap = Actual_PCP - Expected_PCP(given need)

Where Expected_PCP is predicted from:
  - Age 65+ share (chronic disease burden)
  - Disability rate (health vulnerability)  
  - Poverty rate (social determinants)

COUNTY TYPOLOGY:
───────────────
  TRUE DESERT: High need + Low access (priority targets)
  Adequate Access: High need + OK access
  Underserved: Low need + Low access  
  Well-Served: Low need + OK access

MAIN CONTRIBUTION:
─────────────────
1. MC share alone appears to be a blunt policy target
2. Need-adjusted access gap is proposed as a cleaner predictor
3. "True deserts" give an actionable list of target counties
4. Results are read as pointing to access, not payer mix per se,
   as the mechanism (2020 cross-section; associations, not causal effects)

POLICY IMPLICATIONS:
───────────────────
1. Target "True Deserts" - high need AND low access
2. MC share is better read as a proxy for disadvantage than as a cause
3. Provider incentives should target access gaps
4. Telehealth and transportation are candidate remedies for rural
   deserts (not evaluated in this notebook)

================================================================================
""")