<a href="https://colab.research.google.com/github/kumarrajesh1992-arch/kumarrajesh1992-arch.github.io/blob/main/Technical_Efficiency_Health_Education_Expenditure.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

I begin by importing the required Python libraries: pandas for data handling and cleaning, and statsmodels for estimating OLS regressions and obtaining statistical inference (coefficients, standard errors, and p-values).

In [41]:
import pandas as pd
import statsmodels.api as sm

I load the merged state-level dataset containing HDI residuals and sectoral technical efficiency measures for health and education. These efficiency scores proxy implementation effectiveness conditional on inputs.

In [42]:
df = pd.read_csv(
    "Chart6_NITI_Aayog_Health_Education_Index_with_Technical_Efficiency.csv"
)

In [43]:
df.columns

Index(['State', 'residual_HDI', 'Edu_gov', 'Hea_gov', 'Per_Capita_Social_Exp',
       'Education_Technical_Efficiency', 'Health_Technical_Efficiency'],
      dtype='object')

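Since technical efficiency scores are not available for all states, I restrict the analytical sample to states with non-missing values on every model variable (listwise deletion). This ensures that all regressions are estimated on the same set of states. It does not by itself remove bias if efficiency scores are missing non-randomly, so the results describe only the states retained.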

In [44]:
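# Keep only the regression variables and drop states with any missing value.
# .copy() makes reg_df independent of df, so columns can be added to it later
# without triggering a SettingWithCopyWarning.
reg_df = df[
    ["State",
     "residual_HDI",
     "Education_Technical_Efficiency",
     "Health_Technical_Efficiency"]
].dropna().copy()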

In [45]:
reg_df.shape

(22, 4)
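
Before relying on the restricted sample, it can help to see which states were removed and which variable caused the removal. The following is a minimal sketch on a toy frame: the state labels `A` to `D` and all values are invented for illustration, and `toy`, `model_cols`, and `dropped_states` are hypothetical names, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the merged dataset; all values are invented.
toy = pd.DataFrame({
    "State": ["A", "B", "C", "D"],
    "residual_HDI": [0.01, -0.02, 0.03, 0.00],
    "Education_Technical_Efficiency": [1.00, np.nan, 0.93, 0.81],
    "Health_Technical_Efficiency": [0.98, 0.97, np.nan, 0.96],
})

model_cols = [
    "residual_HDI",
    "Education_Technical_Efficiency",
    "Health_Technical_Efficiency",
]

# Rows with at least one missing model variable are the ones dropna() removes.
missing_mask = toy[model_cols].isna().any(axis=1)
dropped_states = toy.loc[missing_mask, "State"].tolist()

# Listwise deletion restricted to the model variables.
toy_reg = toy.dropna(subset=model_cols).copy()

print("Dropped states:", dropped_states)
print("Missing values per column:")
print(toy[model_cols].isna().sum())
```

Applied to the real data, the same pattern would list the states excluded from the 22-state sample.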

In [46]:
reg_df.head()

            State  residual_HDI  Education_Technical_Efficiency  Health_Technical_Efficiency
0  Andhra Pradesh     -0.032105                            1.0                         0.983
2           Assam      0.015827                            0.81                        0.966
3           Bihar     -0.006484                            1.0                         0.976
4    Chhattisgarh     -0.009414                            0.93                        0.966
6         Gujarat     -0.035715                            1.0                         0.988


I estimate an OLS regression of income-adjusted HDI (HDI residuals) on education and health technical efficiency. This specification tests whether states that use resources more efficiently systematically outperform their income-predicted human development levels.

In [47]:
Y = reg_df["residual_HDI"]

X = reg_df[
    ["Education_Technical_Efficiency",
     "Health_Technical_Efficiency"]
]

X = sm.add_constant(X)

The model is estimated using ordinary least squares. I report coefficient estimates, statistical significance, and overall model fit to assess whether sectoral efficiency is associated with residual HDI.

In [48]:
model = sm.OLS(Y, X)
results = model.fit()

print(results.summary())

                            OLS Regression Results                            
Dep. Variable:           residual_HDI   R-squared:                       0.229
Model:                            OLS   Adj. R-squared:                  0.148
Method:                 Least Squares   F-statistic:                     2.829
Date:                Tue, 30 Dec 2025   Prob (F-statistic):             0.0840
Time:                        11:59:41   Log-Likelihood:                 55.667
No. Observations:                  22   AIC:                            -105.3
Df Residuals:                      19   BIC:                            -102.1
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                                     coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------------------
const       

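To make the coefficients comparable across variables measured on different scales, I standardise the HDI residuals and both technical efficiency measures into z-scores. Each slope can then be read as the standard-deviation change in residual HDI associated with a one-standard-deviation increase in the corresponding efficiency score. Standardisation only rescales the coefficients: R-squared, the F-statistic, and the slope t-statistics are unchanged.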

In [49]:
reg_df["Edu_TE_z"] = (
    reg_df["Education_Technical_Efficiency"]
    - reg_df["Education_Technical_Efficiency"].mean()
) / reg_df["Education_Technical_Efficiency"].std()

reg_df["Health_TE_z"] = (
    reg_df["Health_Technical_Efficiency"]
    - reg_df["Health_Technical_Efficiency"].mean()
) / reg_df["Health_Technical_Efficiency"].std()

reg_df["HDI_residual_z"] = (
    reg_df["residual_HDI"]
    - reg_df["residual_HDI"].mean()
) / reg_df["residual_HDI"].std()
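
The effect of this transformation can be checked numerically. The sketch below is an illustration on simulated data: it uses NumPy's least-squares solver in place of statsmodels to stay dependency-light, and `zscore` and `ols_slopes_and_r2` are hypothetical helpers. It verifies that each standardised slope equals the raw slope multiplied by sd(x)/sd(y), and that R-squared is unchanged.

```python
import numpy as np
import pandas as pd


def zscore(s: pd.Series) -> pd.Series:
    """Centre on the mean and scale by the sample SD (ddof=1, pandas' default)."""
    return (s - s.mean()) / s.std()


def ols_slopes_and_r2(y: np.ndarray, X: np.ndarray):
    """Least-squares fit with an intercept; returns (slopes, R-squared)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centred = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centred @ centred)
    return beta[1:], r2


# Simulated data; the coefficients below are arbitrary.
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    "x1": rng.uniform(0.8, 1.0, 22),
    "x2": rng.uniform(0.95, 1.0, 22),
})
toy["y"] = 0.1 * toy["x1"] - 0.3 * toy["x2"] + rng.normal(0, 0.02, 22)

raw_b, raw_r2 = ols_slopes_and_r2(
    toy["y"].to_numpy(), toy[["x1", "x2"]].to_numpy()
)

toy_z = toy.apply(zscore)  # standardise every column
std_b, std_r2 = ols_slopes_and_r2(
    toy_z["y"].to_numpy(), toy_z[["x1", "x2"]].to_numpy()
)

# Standardised slope implied by the raw fit: raw slope * sd(x) / sd(y).
implied_b = raw_b * toy[["x1", "x2"]].std().to_numpy() / toy["y"].std()

print("Slopes match:", np.allclose(std_b, implied_b))
print("R-squared match:", np.isclose(raw_r2, std_r2))
```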

I re-estimate the regression using standardised variables to assess the relative strength of associations between education and health efficiency and income-adjusted HDI.

In [50]:
Xz = sm.add_constant(
    reg_df[["Edu_TE_z", "Health_TE_z"]]
)
Yz = reg_df["HDI_residual_z"]

model_z = sm.OLS(Yz, Xz).fit()
print(model_z.summary())

                            OLS Regression Results                            
Dep. Variable:         HDI_residual_z   R-squared:                       0.229
Model:                            OLS   Adj. R-squared:                  0.148
Method:                 Least Squares   F-statistic:                     2.829
Date:                Tue, 30 Dec 2025   Prob (F-statistic):             0.0840
Time:                        11:59:41   Log-Likelihood:                -27.837
No. Observations:                  22   AIC:                             61.67
Df Residuals:                      19   BIC:                             64.95
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
const       -1.388e-17      0.197  -7.05e-17      

To enable an interactive Vega-Lite visualisation with a dropdown selector, I reshape the dataset into long format, stacking education and health efficiency into a single variable.

In [51]:
viz_df = reg_df[
    ["State", "HDI_residual_z", "Edu_TE_z", "Health_TE_z"]
].copy()

viz_long = viz_df.melt(
    id_vars=["State", "HDI_residual_z"],
    value_vars=["Edu_TE_z", "Health_TE_z"],
    var_name="Efficiency_Type",
    value_name="Efficiency_z"
)

viz_long.head()

            State  HDI_residual_z Efficiency_Type  Efficiency_z
0  Andhra Pradesh       -1.358831        Edu_TE_z      0.685078
1           Assam        0.774481        Edu_TE_z     -2.095141
2           Bihar       -0.218511        Edu_TE_z      0.685078
3    Chhattisgarh       -0.348941        Edu_TE_z     -0.339213
4         Gujarat       -1.519515        Edu_TE_z      0.685078
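
To make the wide-to-long reshape concrete, here is a small sketch on a two-state toy frame. The values are invented, and `wide`, `long_df`, `rows_per_state`, and `round_trip` are illustrative names. It checks that each state appears once per efficiency type and that pivoting back recovers the original values.

```python
import pandas as pd

# Two-state toy frame in the same wide layout as viz_df; values are invented.
wide = pd.DataFrame({
    "State": ["A", "B"],
    "HDI_residual_z": [0.5, -0.5],
    "Edu_TE_z": [1.0, -1.0],
    "Health_TE_z": [0.2, -0.2],
})

# Stack the two efficiency columns into (type, value) pairs.
long_df = wide.melt(
    id_vars=["State", "HDI_residual_z"],
    value_vars=["Edu_TE_z", "Health_TE_z"],
    var_name="Efficiency_Type",
    value_name="Efficiency_z",
)

# Each state should appear once per efficiency type.
rows_per_state = long_df.groupby("State")["Efficiency_Type"].nunique()

# Pivoting back recovers the wide layout, confirming no values were lost.
round_trip = long_df.pivot(
    index="State", columns="Efficiency_Type", values="Efficiency_z"
)

print(long_df)
print(round_trip)
```

The long layout has one row per state and efficiency type, which is what lets a dropdown filter on `Efficiency_Type` switch between the two series.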


I export the cleaned and reshaped dataset to CSV for use in Vega-Lite visualisations embedded in the HTML portfolio.

In [53]:
viz_long.to_csv(
    "HDI_residual_vs_efficiency_long.csv",
    index=False
)
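
A quick round-trip check can confirm that the exported file reloads with the expected columns. The sketch below writes a toy frame to an in-memory buffer rather than to disk. The frame and the names `toy_long`, `buffer`, and `reloaded` are assumptions made for illustration.

```python
import io

import numpy as np
import pandas as pd

# Toy long-format frame with the same columns as viz_long; values are invented.
toy_long = pd.DataFrame({
    "State": ["A", "B", "A", "B"],
    "HDI_residual_z": [0.5, -0.5, 0.5, -0.5],
    "Efficiency_Type": ["Edu_TE_z", "Edu_TE_z", "Health_TE_z", "Health_TE_z"],
    "Efficiency_z": [1.0, -1.0, 0.2, -0.2],
})

# Write to an in-memory buffer; index=False keeps the row index out of the
# file, so no extra "Unnamed: 0" column appears when it is read back.
buffer = io.StringIO()
toy_long.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

same_columns = list(reloaded.columns) == list(toy_long.columns)
same_values = np.allclose(reloaded["Efficiency_z"], toy_long["Efficiency_z"])

print("Columns preserved:", same_columns)
print("Values preserved:", same_values)
```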