# Notebook 3: Statistical Modeling of Maternal Health Outcomes


This notebook evaluates the relationship between proxy maternal health funding and key maternal health outcomes, such as **preterm birth rate** and **infant mortality rate**, across Texas counties.

We will use **linear regression** to assess whether funding levels are associated with better outcomes, while controlling for potential confounders (e.g., poverty %, insurance coverage).


In [1]:
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns

ImportError: cannot import name '_lazywhere' from 'scipy._lib._util' (/Users/stephmozley/anaconda3/lib/python3.11/site-packages/scipy/_lib/_util.py)

## Step 1: Load and Merge Datasets

In [None]:

# Load proxy funding data
funding_df = pd.read_csv("../data/processed/proxy_funding_by_county_year.csv")

# Load outcomes (placeholder path; replace with the actual file).
# Must include the columns used below: preterm_birth_rate, infant_mortality_rate.
outcomes_df = pd.read_csv("../data/processed/maternal_health_outcomes.csv")

# Load sociodemographic covariates (e.g., poverty %, uninsured %)
covars_df = pd.read_csv("../data/processed/county_covariates.csv")

# Merge all three on county and year. The default inner join keeps only
# county-years that appear in every file.
merged_df = (
    funding_df
    .merge(outcomes_df, on=["county", "year"])
    .merge(covars_df, on=["county", "year"])
)

merged_df.head()
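Because an inner merge drops unmatched county-years without warning, it can help to check the keys before modeling. The following is a minimal sketch on made-up toy frames (the county rows and values are illustrative assumptions, not the project data); it relies on the `how`, `indicator`, and `validate` parameters of `DataFrame.merge`.

```python
import pandas as pd

# Toy stand-ins for two of the processed files (all values are made up).
funding = pd.DataFrame({
    "county": ["Travis", "Harris", "Bexar"],
    "year": [2020, 2020, 2020],
    "proxy_funding": [1.2, 3.4, 2.1],
})
outcomes = pd.DataFrame({
    "county": ["Travis", "Harris"],
    "year": [2020, 2020],
    "preterm_birth_rate": [10.1, 11.3],
})

# how="outer" keeps unmatched rows, indicator=True adds a "_merge" column
# recording which side each row came from, and validate="one_to_one"
# raises an error if a (county, year) key is duplicated on either side.
check = funding.merge(
    outcomes,
    on=["county", "year"],
    how="outer",
    indicator=True,
    validate="one_to_one",
)

# Rows that an inner join would have dropped.
unmatched = check[check["_merge"] != "both"]
print(unmatched[["county", "year", "_merge"]])
```

If `unmatched` is non-empty on the real data, those county-years are missing from at least one file and will not enter the regressions.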


## Step 2: Linear Regression Model - Preterm Birth Rate

In [None]:

# Define model variables
X = merged_df[["proxy_funding", "poverty_pct", "uninsured_women_pct"]]
y = merged_df["preterm_birth_rate"]

# Add constant and fit model
X = sm.add_constant(X)
model = sm.OLS(y, X).fit()
print(model.summary())


## Step 3: Linear Regression Model - Infant Mortality Rate

In [None]:

# Same predictors as Step 2; only the outcome changes
X = merged_df[["proxy_funding", "poverty_pct", "uninsured_women_pct"]]
y = merged_df["infant_mortality_rate"]

X = sm.add_constant(X)
model2 = sm.OLS(y, X).fit()
print(model2.summary())


## Step 4: Interpretation


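- **Coefficients** give the estimated change in the outcome for a one-unit increase in a predictor, holding the other predictors constant.
- **p-values** test whether a coefficient differs from zero. A value below 0.05 is conventionally called statistically significant; it indicates evidence of an association, not its strength, which is read from the coefficient's size and confidence interval.
- If the `proxy_funding` coefficient is significant and negative, higher funding is associated with better outcomes (e.g., lower preterm birth rates). With observational county data, this is an association and does not by itself establish a causal effect.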
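The phrase "per unit change" depends on the units of each predictor, which matters when reading the `proxy_funding` coefficient. The sketch below uses synthetic data and plain `numpy.linalg.lstsq` so the arithmetic is visible; the dollar amounts, the helper `ols_coefs`, and the built-in effect size are all hypothetical. It shows that expressing funding in thousands of dollars multiplies the coefficient by 1,000 without changing the fitted relationship.

```python
import numpy as np

# Synthetic data (made up): funding in dollars and a poverty covariate.
rng = np.random.default_rng(1)
n = 500
funding_dollars = rng.uniform(10_000, 100_000, n)
poverty_pct = rng.uniform(5, 30, n)

# Built-in assumption: each extra $1,000 lowers the rate by 0.02 points.
rate = (
    12
    - 0.02 * (funding_dollars / 1000)
    + 0.15 * poverty_pct
    + rng.normal(0, 0.5, n)
)

def ols_coefs(y, *columns):
    """Return least-squares coefficients [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b_dollars = ols_coefs(rate, funding_dollars, poverty_pct)
b_thousands = ols_coefs(rate, funding_dollars / 1000, poverty_pct)

print("change in rate per $1:    ", b_dollars[1])
print("change in rate per $1,000:", b_thousands[1])
```

A coefficient that looks negligible per dollar can therefore correspond to a meaningful change per thousand or per million dollars, so the unit of `proxy_funding` should be stated alongside any reported estimate.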
