In [5]:
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv('~/Pycharmprojects/PredictingPatientReadmission/datasets/primary.csv')
df.head()

Unnamed: 0,Year,County,ICD Version,Total Admits (Consolidated),30-day Readmits (Consolidated),30-day Readmission Rate (Consolidated),PCPI,Population,Total Admits (Proportion),30-day Readmits (Proportion)
0,2011,Alameda,ICD-9,75103.0,11377.0,15.1,50432.0,1530893,0.049058,0.007432
1,2011,Alpine,ICD-9,13.0,1.0,7.7,54040.0,1093,0.011894,0.000915
2,2011,Amador,ICD-9,2657.0,345.0,13.0,34847.0,37539,0.07078,0.00919
3,2011,Butte,ICD-9,20422.0,3198.0,15.7,33669.0,219983,0.092834,0.014537
4,2011,Calaveras,ICD-9,3253.0,392.0,12.1,37516.0,45159,0.072034,0.00868


In [7]:
# Step 1: Regress mediator (30-day Readmits Proportion) on PCPI
mediator_model1 = smf.ols("Q('30-day Readmits (Proportion)') ~ PCPI", data=df).fit()

# Step 2: Regress outcome (30-day Readmission Rate) on PCPI and 30-day Readmits Proportion
outcome_model1 = smf.ols(
    "Q('30-day Readmission Rate (Consolidated)') ~ PCPI + Q('30-day Readmits (Proportion)')",
    data=df
).fit()

# Display Results
print("Mediator Model 1 (30-day Readmits Proportion ~ PCPI):")
print(mediator_model1.summary())

print("\nOutcome Model 1 (Readmission Rate ~ PCPI + 30-day Readmits Proportion):")
print(outcome_model1.summary())


Mediator Model 1 (30-day Readmits Proportion ~ PCPI):
                                    OLS Regression Results                                   
Dep. Variable:     Q('30-day Readmits (Proportion)')   R-squared:                       0.069
Model:                                           OLS   Adj. R-squared:                  0.068
Method:                                Least Squares   F-statistic:                     50.76
Date:                               Tue, 31 Dec 2024   Prob (F-statistic):           2.66e-12
Time:                                       20:14:58   Log-Likelihood:                 3156.4
No. Observations:                                684   AIC:                            -6309.
Df Residuals:                                    682   BIC:                            -6300.
Df Model:                                          1                                         
Covariance Type:                           nonrobust                                         
      

PCPI (per capita personal income) has a mixed relationship with the 30-day readmission rate. The indirect path runs through 30-day Readmits (Proportion): higher PCPI is associated with a lower per-capita readmit burden, which in turn is associated with a lower readmission rate. The direct path runs the other way: with the mediator controlled for, PCPI has a positive association with the readmission rate.

Higher-income counties are therefore associated with lower readmission rates indirectly, through a smaller proportional burden of 30-day readmits. Once that burden is controlled for, however, they show a direct positive association with readmission rates, possibly reflecting better healthcare access, reporting, or utilization patterns. This highlights the complex interplay between income, healthcare access, and outcomes.
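To make the direct/indirect split concrete, here is a minimal sketch of the product-of-coefficients calculation. It runs on synthetic data, not on primary.csv, and uses only the standard library so it works without the dataset. The helper `mediation_paths` and the simulated effect sizes are hypothetical, chosen only to mimic the pattern described above (negative indirect path, positive direct path). In the notebook's terms, `a` would be the PCPI coefficient in mediator_model1, and `b` and `c_prime` the mediator and PCPI coefficients in outcome_model1.

```python
import random


def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)


def mediation_paths(x, m, y):
    """Return (a, b, c_prime, c) for a single-mediator OLS setup.

    a       : slope of m ~ x
    b       : coefficient of m in y ~ x + m
    c_prime : coefficient of x in y ~ x + m (direct effect)
    c       : slope of y ~ x (total effect)
    """
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    det = sxx * smm - sxm ** 2  # determinant of the 2x2 normal equations
    a = sxm / sxx
    c = sxy / sxx
    c_prime = (sxy * smm - smy * sxm) / det
    b = (smy * sxx - sxy * sxm) / det
    return a, b, c_prime, c


# Synthetic data (an assumption for illustration, not the county data):
# x lowers m, m raises y, and x also raises y directly.
rng = random.Random(42)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
m = [-0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.3 * xi + 0.8 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

a, b, c_prime, c = mediation_paths(x, m, y)
indirect = a * b

print(f"indirect effect (a*b): {indirect:.3f}")
print(f"direct effect (c'):    {c_prime:.3f}")
print(f"total effect (c):      {c:.3f}")
```

For OLS with a single mediator, the total effect equals the direct effect plus the indirect effect, so opposite-signed paths can leave a total effect close to zero.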

In [8]:
# Step 1: Regress mediator (Total Admits Proportion) on PCPI
mediator_model2 = smf.ols("Q('Total Admits (Proportion)') ~ PCPI", data=df).fit()

# Step 2: Regress outcome (30-day Readmission Rate) on PCPI and Total Admits Proportion
outcome_model2 = smf.ols(
    "Q('30-day Readmission Rate (Consolidated)') ~ PCPI + Q('Total Admits (Proportion)')",
    data=df
).fit()

# Display Results
print("Mediator Model 2 (Total Admits Proportion ~ PCPI):")
print(mediator_model2.summary())

print("\nOutcome Model 2 (Readmission Rate ~ PCPI + Total Admits Proportion):")
print(outcome_model2.summary())


Mediator Model 2 (Total Admits Proportion ~ PCPI):
                                  OLS Regression Results                                  
Dep. Variable:     Q('Total Admits (Proportion)')   R-squared:                       0.119
Model:                                        OLS   Adj. R-squared:                  0.118
Method:                             Least Squares   F-statistic:                     92.09
Date:                            Tue, 31 Dec 2024   Prob (F-statistic):           1.53e-20
Time:                                    20:15:13   Log-Likelihood:                 1973.6
No. Observations:                             684   AIC:                            -3943.
Df Residuals:                                 682   BIC:                            -3934.
Df Model:                                       1                                         
Covariance Type:                        nonrobust                                         

PCPI has both direct and indirect associations with readmission rates. The indirect path (via Total Admits (Proportion)) suggests lower hospital utilization in wealthier counties, which is associated with lower readmission rates. The direct path, however, indicates that wealthier counties may have other factors (e.g., better reporting, healthcare complexity) that push readmission rates slightly higher.

Although PCPI and readmission rates were only weakly correlated directly, this analysis shows that the mediators (Total Admits (Proportion) and 30-day Readmits (Proportion)) play a significant role in linking them, revealing a more nuanced relationship. Because the indirect and direct paths have opposite signs, they partly cancel, which is consistent with the weak overall correlation. This supports the hypothesis that the mediators provide a pathway through which PCPI relates to healthcare outcomes, even when the direct relationship is weak.
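The OLS summaries test each path separately, but they do not test the indirect effect itself. One common way to do that is a percentile bootstrap of the product a*b. The sketch below is an illustration on synthetic data, not the analysis run in this notebook. The function `indirect_effect`, the simulated coefficients, and the resample count are all assumptions. It also resamples rows independently. For county-by-year data, resampling whole counties may be more appropriate.

```python
import random


def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)


def indirect_effect(x, m, y):
    """Product of coefficients a*b for the model m ~ x, y ~ x + m."""
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    a = sxm / sxx                                        # slope of m ~ x
    b = (smy * sxx - sxy * sxm) / (sxx * smm - sxm ** 2)  # coef of m in y ~ x + m
    return a * b


# Synthetic data (hypothetical effect sizes, not the county data).
rng = random.Random(0)
n = 300
x = [rng.gauss(0, 1) for _ in range(n)]
m = [-0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.3 * xi + 0.8 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

point = indirect_effect(x, m, y)

# Percentile bootstrap: resample rows with replacement and recompute a*b.
n_boot = 1000
boot = []
for _ in range(n_boot):
    idx = [rng.randrange(n) for _ in range(n)]
    boot.append(indirect_effect([x[i] for i in idx],
                                [m[i] for i in idx],
                                [y[i] for i in idx]))
boot.sort()
ci_low = boot[int(0.025 * n_boot)]
ci_high = boot[int(0.975 * n_boot) - 1]

print(f"indirect effect: {point:.3f}")
print(f"95% percentile bootstrap CI: [{ci_low:.3f}, {ci_high:.3f}]")
```

An interval that excludes zero supports an indirect path. An interval that straddles zero means the data do not distinguish the mediated effect from no effect.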