
I begin by loading the merged state-level dataset containing residual HDI and per-capita social sector expenditure.

In [None]:
import pandas as pd
import numpy as np

df = pd.read_csv("/content/PC4_Per_Capita_Social_Sector.csv")
df.head()

Unnamed: 0,State,residual_HDI,Per_Capita_Exp
0,Andhra Pradesh,-0.032105,16449.40989
1,Arunachal Pradesh,0.004563,40275.27723
2,Assam,0.015827,11626.6016
3,Bihar,-0.006484,8212.896988
4,Chhattisgarh,-0.009414,13619.23168


I verify the variable names and check for missing values before estimation.

In [None]:
df.columns

Index(['State', 'residual_HDI', 'Per_Capita_Exp'], dtype='object')

In [None]:
df.isna().sum().sort_values(ascending=False).head(15)

Unnamed: 0,0
State,0
residual_HDI,0
Per_Capita_Exp,0


I standardise column names to avoid errors in modelling and charts.

In [None]:
# Optional: edit these ONLY if your column names differ
rename_map = {
    "State_UT": "State",
    "PerCapita_SocialSector_INR_2021": "pc_social_inr",
    "PerCapita_Social_Sector_INR": "pc_social_inr",
    "PerCapita_SocialSector": "pc_social_inr",
    "Residual_HDI": "residual_HDI"
}

df = df.rename(columns={k:v for k,v in rename_map.items() if k in df.columns})
df.columns

Index(['State', 'residual_HDI', 'Per_Capita_Exp'], dtype='object')

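I create a clean estimation sample by dropping rows with missing values in the core variables. The check below flags `pc_social_inr` as absent: the rename map has no entry for `Per_Capita_Exp`, so the spending column keeps its original name and the later cells reference it directly.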

In [None]:
needed = ["State", "residual_HDI", "pc_social_inr"]
missing = [c for c in needed if c not in df.columns]
missing

['pc_social_inr']
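
The check above returns `['pc_social_inr']`, yet nothing stops the notebook from continuing, and `df.rename` ignores mapping keys that are not present by default. As a minimal sketch of a stricter check, assuming a hypothetical helper `missing_columns` (not part of the original notebook) and the column names printed by `df.columns`:

```python
def missing_columns(columns, required):
    """Return the required names that are absent from `columns`, in order."""
    present = set(columns)
    return [name for name in required if name not in present]


# Column names as printed by df.columns earlier in the notebook.
columns = ["State", "residual_HDI", "Per_Capita_Exp"]

# The list the notebook checks: 'pc_social_inr' was never created by the rename.
needed = ["State", "residual_HDI", "pc_social_inr"]
gap = missing_columns(columns, needed)
print(gap)

# Assumption: pointing the check at the spending column that actually exists.
needed_actual = ["State", "residual_HDI", "Per_Capita_Exp"]
gap_actual = missing_columns(columns, needed_actual)
print(gap_actual)
```

In the notebook itself this could be called as `missing_columns(df.columns, needed)`, with a `KeyError` raised when the result is non-empty, so that a failed rename is caught before estimation.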

In [None]:
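# Only columns that exist are used here. 'pc_social_inr' is absent, so the
# subset is State and residual_HDI; Per_Capita_Exp has no missing values anyway.
dfm = df.dropna(subset=[c for c in needed if c in df.columns]).copy()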
dfm.shape, dfm.head()

((31, 3),
                State  residual_HDI  Per_Capita_Exp
 0     Andhra Pradesh     -0.032105    16449.409890
 1  Arunachal Pradesh      0.004563    40275.277230
 2              Assam      0.015827    11626.601600
 3              Bihar     -0.006484     8212.896988
 4       Chhattisgarh     -0.009414    13619.231680)

I check summary statistics to ensure per-capita values and residuals are on reasonable scales.

In [None]:
dfm[["residual_HDI", "Per_Capita_Exp"]].describe()

Unnamed: 0,residual_HDI,Per_Capita_Exp
count,31.0,31.0
mean,0.000771,19032.800145
std,0.019764,9903.190291
min,-0.035715,7323.086784
25%,-0.011386,12993.233925
50%,0.002492,16537.02133
75%,0.011811,21144.821415
max,0.050576,46883.30871


I estimate a simple cross-sectional association between income-adjusted human development performance and per-capita social sector spending.

In [None]:
import statsmodels.api as sm

X = sm.add_constant(dfm["Per_Capita_Exp"])
y = dfm["residual_HDI"]

model = sm.OLS(y, X).fit()
model.summary()

0,1,2,3
Dep. Variable:,residual_HDI,R-squared:,0.126
Model:,OLS,Adj. R-squared:,0.096
Method:,Least Squares,F-statistic:,4.193
Date:,"Sat, 20 Dec 2025",Prob (F-statistic):,0.0497
Time:,21:17:25,Log-Likelihood:,80.256
No. Observations:,31,AIC:,-156.5
Df Residuals:,29,BIC:,-153.6
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,-0.0127,0.007,-1.719,0.096,-0.028,0.002
Per_Capita_Exp,7.093e-07,3.46e-07,2.048,0.050,8.55e-10,1.42e-06

0,1,2,3
Omnibus:,2.605,Durbin-Watson:,1.868
Prob(Omnibus):,0.272,Jarque-Bera (JB):,1.562
Skew:,0.531,Prob(JB):,0.458
Kurtosis:,3.289,Cond. No.,46900.0
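
The slope looks tiny only because spending is measured in single rupees. As a rough worked example using the coefficient and quartiles reported above (the Rs 10,000 step is an assumed unit chosen for readability, not something the notebook estimates), the same association can be restated on more interpretable scales:

```python
# Values copied from the OLS summary and the describe() output above.
slope_per_rupee = 7.093e-07      # coefficient on Per_Capita_Exp
spend_p25 = 12993.233925         # 25th percentile of Per_Capita_Exp
spend_p75 = 21144.821415         # 75th percentile of Per_Capita_Exp

# Assumed unit for illustration: a Rs 10,000 difference in per-capita spending.
slope_per_10k = slope_per_rupee * 10_000

# Predicted gap in residual HDI between the 75th and 25th percentile of spending.
iqr_gap = slope_per_rupee * (spend_p75 - spend_p25)

print(f"per Rs 10,000: {slope_per_10k:.4f}")
print(f"across the interquartile range: {iqr_gap:.4f}")
```

Both figures are well below the standard deviation of `residual_HDI` (0.0198), which is in line with the modest R-squared of 0.126.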


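Because per-capita spending is in rupees, the raw coefficient is hard to read, so I also estimate a specification with spending standardised to z-scores. The slope then gives the difference in residual HDI associated with a one-standard-deviation difference in spending; the outcome itself is left in its original units.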

In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
dfm["pc_social_inr_z"] = scaler.fit_transform(dfm[["Per_Capita_Exp"]])

Xz = sm.add_constant(dfm["pc_social_inr_z"])
model_z = sm.OLS(dfm["residual_HDI"], Xz).fit()
model_z.summary()

0,1,2,3
Dep. Variable:,residual_HDI,R-squared:,0.126
Model:,OLS,Adj. R-squared:,0.096
Method:,Least Squares,F-statistic:,4.193
Date:,"Sat, 20 Dec 2025",Prob (F-statistic):,0.0497
Time:,21:17:28,Log-Likelihood:,80.256
No. Observations:,31,AIC:,-156.5
Df Residuals:,29,BIC:,-153.6
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,0.0008,0.003,0.229,0.821,-0.006,0.008
pc_social_inr_z,0.0069,0.003,2.048,0.050,8.33e-06,0.014

0,1,2,3
Omnibus:,2.605,Durbin-Watson:,1.868
Prob(Omnibus):,0.272,Jarque-Bera (JB):,1.562
Skew:,0.531,Prob(JB):,0.458
Kurtosis:,3.289,Cond. No.,1.0
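
The two specifications are the same fit on different scales, which is why the t-statistic and R-squared are unchanged. A short arithmetic check, using only the numbers reported above, shows how the standardised slope follows from the rupee slope. One detail matters here: `StandardScaler` divides by the population standard deviation (ddof=0), while `describe()` reports the sample standard deviation (ddof=1).

```python
import math

# Values copied from describe() and the unstandardised OLS summary.
n = 31
sample_sd = 9903.190291          # pandas describe() uses ddof=1
mean_spend = 19032.800145
slope_per_rupee = 7.093e-07

# StandardScaler divides by the population SD (ddof=0).
population_sd = sample_sd * math.sqrt((n - 1) / n)

# Slope on the z-score = slope per rupee times one population SD in rupees.
slope_per_sd = slope_per_rupee * population_sd

# z-score for Andhra Pradesh (Per_Capita_Exp = 16449.40989).
z_andhra = (16449.40989 - mean_spend) / population_sd

print(round(slope_per_sd, 4), round(z_andhra, 4))
```

The recomputed slope rounds to the 0.0069 shown in the standardised summary.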


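I store the fitted value and the remaining model error for each state, then export the results.

In [None]:
# Residual HDI predicted from per-capita spending alone (unstandardised model)
dfm["residual_pred_spend"] = model.predict(sm.add_constant(dfm["Per_Capita_Exp"]))
# Observed minus predicted: positive values mean a state does better than its spending predicts
dfm["spend_model_error"] = dfm["residual_HDI"] - dfm["residual_pred_spend"]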

out = dfm[["State", "residual_HDI", "Per_Capita_Exp", "pc_social_inr_z", "residual_pred_spend", "spend_model_error"]].copy()
out.to_csv("/content/PCX_PerCapitaSpending_ResidualHDI.csv", index=False)

print("Exported: PCX_PerCapitaSpending_ResidualHDI.csv")
out.head()

Exported: PCX_PerCapitaSpending_ResidualHDI.csv


Unnamed: 0,State,residual_HDI,Per_Capita_Exp,pc_social_inr_z,residual_pred_spend,spend_model_error
0,Andhra Pradesh,-0.032105,16449.40989,-0.265177,-0.001061,-0.031044
1,Arunachal Pradesh,0.004563,40275.27723,2.180471,0.015839,-0.011276
2,Assam,0.015827,11626.6016,-0.760222,-0.004482,0.020309
3,Bihar,-0.006484,8212.896988,-1.110628,-0.006903,0.00042
4,Chhattisgarh,-0.009414,13619.23168,-0.555685,-0.003069,-0.006346
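
As a quick sanity check on the exported columns, the sketch below recomputes `spend_model_error` as observed minus predicted residual HDI for the five rows shown, then orders those states by it. The numbers are copied from the table above; the variable names are illustrative only.

```python
# First five exported rows: (State, residual_HDI, residual_pred_spend, spend_model_error).
rows = [
    ("Andhra Pradesh", -0.032105, -0.001061, -0.031044),
    ("Arunachal Pradesh", 0.004563, 0.015839, -0.011276),
    ("Assam", 0.015827, -0.004482, 0.020309),
    ("Bihar", -0.006484, -0.006903, 0.00042),
    ("Chhattisgarh", -0.009414, -0.003069, -0.006346),
]

# spend_model_error is defined as observed minus predicted residual HDI.
recomputed = {state: observed - predicted for state, observed, predicted, _ in rows}

# Largest gap between recomputed and reported errors (display rounding only).
max_gap = max(abs(recomputed[state] - reported) for state, _, _, reported in rows)

# States ordered from furthest above to furthest below the spending-based prediction.
ranking = sorted(recomputed, key=recomputed.get, reverse=True)
print(ranking)
print(f"max rounding gap: {max_gap:.1e}")
```

Among these five states, Assam sits furthest above what its spending level predicts and Andhra Pradesh furthest below.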
