<a href="https://colab.research.google.com/github/kumarrajesh1992-arch/kumarrajesh1992-arch.github.io/blob/main/Chart_8_Vision_2047_Very_High_HDI_Scenarios.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

I begin by importing the required libraries. pandas is used for data handling and reshaping; numpy is used for interpolation, logs, and geometric mean calculations; and re supports regex extraction of years from the UNDP “key–value” export structure.

In [1]:
import pandas as pd
import numpy as np
import re

I load the raw UNDP HDRO country export (stored in “key–value” format). This preserves data lineage from the original source file and ensures the analysis is reproducible directly from the repository.

In [2]:
raw_data_url = "https://raw.githubusercontent.com/kumarrajesh1992-arch/kumarrajesh1992-arch.github.io/refs/heads/main/project/raw_original_data/Chart8_India_HDI_Time_Series.csv"

print("Loading raw data from GitHub...")
df_raw = pd.read_csv(raw_data_url)

print("Raw dataset shape:", df_raw.shape)
df_raw.head()

Loading raw data from GitHub...
Raw dataset shape: (1035, 2)


Unnamed: 0,key,value
0,ISO3,IND
1,HDR Country Name,India
2,Human Development Groups,Medium
3,UNDP Developeing Regions,SA
4,HDI Rank (2023),130


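Each time-series key ends with a year in parentheses, e.g., “Indicator (unit) (2023)”. I extract that year with a regex anchored to the end of the string, drop metadata rows that have no year, and create a stable indicator label by removing the year suffix. Rows such as “HDI Rank (2023)” also carry a year, so they survive this step; they are removed later, when I filter to the indicators of interest.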

In [3]:
# Extract year (four digits in parentheses at the end)
df_raw["Year"] = df_raw["key"].str.extract(r"\((\d{4})\)$")

# Keep only rows with a year (time-series rows)
df_clean = df_raw.dropna(subset=["Year"]).copy()
df_clean["Year"] = df_clean["Year"].astype(int)

# Remove trailing " (YYYY)" to get stable indicator labels
df_clean["Indicator_Raw"] = df_clean["key"].str.replace(r"\s*\(\d{4}\)$", "", regex=True)

df_clean[["Year", "Indicator_Raw", "value"]].head()

Unnamed: 0,Year,Indicator_Raw,value
4,2023,HDI Rank,130.0
5,1990,Human Development Index (value),0.446
6,1991,Human Development Index (value),0.448
7,1992,Human Development Index (value),0.453
8,1993,Human Development Index (value),0.458
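
As a quick illustration of why the end-of-string anchor matters, the sketch below runs the same two patterns on a handful of sample keys. The keys are illustrative stand-ins written in the shape of the export, not rows copied from the file.

```python
import pandas as pd

# Illustrative sample keys in the same "Label (unit) (YYYY)" shape as the export
keys = pd.Series([
    "ISO3",
    "HDI Rank (2023)",
    "Human Development Index (value) (1990)",
    "Gross National Income Per Capita (2021 PPP$) (2023)",
])

# The trailing "$" anchor means only a final "(YYYY)" group is captured,
# so the "2021" inside "(2021 PPP$)" is never mistaken for the observation year
year = keys.str.extract(r"\((\d{4})\)$")[0]
label = keys.str.replace(r"\s*\(\d{4}\)$", "", regex=True)

demo = pd.DataFrame({"key": keys, "Year": year, "Indicator_Raw": label})
print(demo)
```

The metadata key without a year yields a missing value, which is what `dropna(subset=["Year"])` relies on.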


I restrict the dataset to the five series needed for the HDI narrative and projections: HDI, life expectancy, expected years of schooling, mean years of schooling, and GNI per capita (PPP). I also convert values into numeric form to enable calculations.

In [4]:
keep_indicators = [
    "Human Development Index (value)",
    "Life Expectancy at Birth (years)",
    "Expected Years of Schooling (years)",
    "Mean Years of Schooling (years)",
    "Gross National Income Per Capita (2021 PPP$)"
]

df_clean = df_clean[df_clean["Indicator_Raw"].isin(keep_indicators)].copy()

# Convert to numeric (coerce errors to NaN)
df_clean["Value"] = pd.to_numeric(df_clean["value"], errors="coerce")

df_clean[["Year", "Indicator_Raw", "Value"]].head()

Unnamed: 0,Year,Indicator_Raw,Value
5,1990,Human Development Index (value),0.446
6,1991,Human Development Index (value),0.448
7,1992,Human Development Index (value),0.453
8,1993,Human Development Index (value),0.458
9,1994,Human Development Index (value),0.463


I reshape the long key–value time series into a clean wide-format dataset with one row per year and one column per indicator. I then rename the columns into concise labels to keep the downstream projection and HDI calculations readable.

In [5]:
# Pivot to wide format: one row per year
df_wide = df_clean.pivot(index="Year", columns="Indicator_Raw", values="Value").reset_index()

rename_map = {
    "Human Development Index (value)": "HDI",
    "Life Expectancy at Birth (years)": "Life_expectancy",
    "Expected Years of Schooling (years)": "Expected_years_schooling",
    "Mean Years of Schooling (years)": "Mean_years_schooling",
    "Gross National Income Per Capita (2021 PPP$)": "GNI_per_capita"
}

df_wide = df_wide.rename(columns=rename_map)

print(f"Data successfully cleaned. Time series range: {df_wide['Year'].min()} - {df_wide['Year'].max()}")
df_wide.tail()

Data successfully cleaned. Time series range: 1990 - 2023


Indicator_Raw,Year,Expected_years_schooling,GNI_per_capita,HDI,Life_expectancy,Mean_years_schooling
29,2019,11.75398,7895.441397,0.651,70.746,6.28138
30,2020,12.12914,7331.951385,0.652,70.156,6.49
31,2021,12.40237,7992.775136,0.647,67.282,6.53
32,2022,12.95646,8475.67988,0.676,71.698,6.57
33,2023,12.95454,9046.756336,0.685,72.003,6.88
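
`DataFrame.pivot` raises a `ValueError` if any (Year, indicator) pair appears more than once. The export used here pivots cleanly, but a defensive check along the following lines could be run first. This is a sketch on toy data, and keeping the first occurrence is an assumption of the example, not a rule taken from the notebook.

```python
import pandas as pd

# Toy long-format frame with one deliberately duplicated (Year, Indicator_Raw) pair
toy = pd.DataFrame({
    "Year": [2022, 2023, 2023],
    "Indicator_Raw": ["HDI", "HDI", "HDI"],
    "Value": [0.676, 0.685, 0.685],
})

# Surface every row that belongs to a duplicated pair before pivoting
dupes = toy[toy.duplicated(subset=["Year", "Indicator_Raw"], keep=False)]
print(f"{len(dupes)} duplicated rows")

# One possible resolution (an assumption of this sketch): keep the first occurrence
deduped = toy.drop_duplicates(subset=["Year", "Indicator_Raw"], keep="first")
wide = deduped.pivot(index="Year", columns="Indicator_Raw", values="Value").reset_index()
print(wide)
```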


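I define two 2047 pathways. Scenario A represents a feasible “entry” trajectory consistent with the high-income/very-high-HDI threshold: its targets yield a recomputed HDI of 0.800 in 2047, the cutoff for the very high human development group. Scenario B represents an aspirational “convergence” trajectory aligned with developed-country benchmarks, ending at an HDI of about 0.925.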

In [6]:
target_A = {
    "Life_expectancy": 76.0,
    "Expected_years_schooling": 14.0,
    "Mean_years_schooling": 10.0,
    "GNI_per_capita": 23215.0
}

target_B = {
    "Life_expectancy": 82.0,
    "Expected_years_schooling": 16.5,
    "Mean_years_schooling": 12.5,
    "GNI_per_capita": 53014.0
}
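
To make the income targets easier to interpret, the sketch below converts them into the constant annual growth rate they imply over 2023 to 2047. It uses the observed 2023 GNI per capita from the table above as the base; the variable names are introduced here for illustration only.

```python
# Implied compound annual growth rate (CAGR) of GNI per capita, 2023 to 2047
base_gni = 9046.756336          # observed 2023 value (2021 PPP$)
horizon = 2047 - 2023           # 24 years

gni_targets = {
    "Scenario A (Entry)": 23215.0,
    "Scenario B (Convergence)": 53014.0,
}

# CAGR = (target / base) ** (1 / years) - 1
implied_cagr = {
    name: (target / base_gni) ** (1 / horizon) - 1
    for name, target in gni_targets.items()
}

for name, rate in implied_cagr.items():
    print(f"{name}: {rate:.2%} per year")
```

These are real growth rates in constant 2021 PPP dollars, roughly 4% a year for Scenario A and between 7% and 8% a year for Scenario B.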

I construct a long-format dataset with the four fields required for Vega-Lite: Year, Indicator, Value, Scenario. I first append observed values up to 2023. Then, using 2023 as the shared base year, I project 2024–2047 values under both scenarios. Social indicators are interpolated linearly; income is interpolated geometrically (CAGR-style). Finally, I recompute HDI for each projected year as the geometric mean of the health, education, and income indices, each normalised with the UNDP goalposts.
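
The two interpolation rules can also be expressed with NumPy's built-in spacing functions. The following is a minimal vectorised sketch, assuming the Scenario A endpoints and the 2023 base values shown earlier. It is an equivalent formulation for illustration, not the code used in the cell below.

```python
import numpy as np

years = np.arange(2023, 2048)               # 25 points: 2023 through 2047

# Linear path for a social indicator (life expectancy, Scenario A target)
life_path = np.linspace(72.003, 76.0, num=len(years))

# Geometric path for income: a constant growth rate between the endpoints,
# equivalent to start * (target / start) ** frac
gni_path = np.geomspace(9046.756336, 23215.0, num=len(years))

# Year-on-year growth along the geometric path
growth = gni_path[1:] / gni_path[:-1] - 1

print(years[-1], life_path[-1], gni_path[-1])
print(f"constant annual growth: {growth[0]:.2%}")
```

A linear path adds the same absolute amount each year, while the geometric path multiplies by the same factor each year, which is the usual convention for income series.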

In [7]:
years = np.arange(2023, 2048)
scenarios = {"Scenario A (Entry)": target_A, "Scenario B (Convergence)": target_B}
final_rows = []

# Append observed historical data (Observed)
hist_data = df_wide[df_wide["Year"] <= 2023]
for _, row in hist_data.iterrows():
    for ind in ["HDI", "Life_expectancy", "Expected_years_schooling", "Mean_years_schooling", "GNI_per_capita"]:
        final_rows.append({
            "Year": int(row["Year"]),
            "Indicator": ind,
            "Value": row[ind],
            "Scenario": "Observed"
        })

# Baseline for projections (2023)
base_2023 = df_wide[df_wide["Year"] == 2023].iloc[0]

# Scenario projections + HDI recalculation
for scenario_name, targets in scenarios.items():
    for year in years:
        if year == 2023:
            continue  # Skip base year

        frac = (year - 2023) / (2047 - 2023)
        year_vals = {}

        for ind, target_val in targets.items():
            start_val = base_2023[ind]

            # Linear interpolation for social indicators
            if ind != "GNI_per_capita":
                curr_val = start_val + frac * (target_val - start_val)
            # CAGR (geometric) interpolation for income
            else:
                curr_val = start_val * ((target_val / start_val) ** frac)

            year_vals[ind] = curr_val
            final_rows.append({
                "Year": year,
                "Indicator": ind,
                "Value": curr_val,
                "Scenario": scenario_name
            })

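        # Recalculate HDI using UNDP methodology (goalposts + geometric mean)
        # Goalposts: life expectancy 20 to 85 years, EYS 0 to 18, MYS 0 to 15,
        # GNI per capita 100 to 75,000 (2021 PPP$, on a log scale)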
        i_health = (year_vals["Life_expectancy"] - 20) / (85 - 20)
        i_eys = (year_vals["Expected_years_schooling"] - 0) / 18
        i_mys = (year_vals["Mean_years_schooling"] - 0) / 15
        i_edu = (i_eys + i_mys) / 2
        i_inc = (np.log(year_vals["GNI_per_capita"]) - np.log(100)) / (np.log(75000) - np.log(100))

        hdi = (i_health * i_edu * i_inc) ** (1/3)

        final_rows.append({
            "Year": year,
            "Indicator": "HDI",
            "Value": hdi,
            "Scenario": scenario_name
        })

df_final = pd.DataFrame(final_rows)

print("Projection complete. Preview:")
df_final.head()

Projection complete. Preview:


Unnamed: 0,Year,Indicator,Value,Scenario
0,1990,HDI,0.446,Observed
1,1990,Life_expectancy,58.618,Observed
2,1990,Expected_years_schooling,8.204444,Observed
3,1990,Mean_years_schooling,2.780574,Observed
4,1990,GNI_per_capita,2167.222109,Observed
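
One way to gain confidence in the recomputed HDI is to apply the same arithmetic to the observed 2023 components and compare the result with the published 2023 value of 0.685. The sketch below wraps the goalpost formula from the loop in a standalone function; `hdi_from_components` is a hypothetical helper name, and it applies no capping at the upper goalposts, which does not matter here because every input lies below them.

```python
import numpy as np

def hdi_from_components(life_exp, eys, mys, gni):
    """Illustrative helper mirroring the loop's goalpost arithmetic."""
    i_health = (life_exp - 20) / (85 - 20)
    i_edu = (eys / 18 + mys / 15) / 2
    i_inc = (np.log(gni) - np.log(100)) / (np.log(75000) - np.log(100))
    # HDI is the geometric mean of the three dimension indices
    return (i_health * i_edu * i_inc) ** (1 / 3)

# Observed 2023 components from the wide table above
hdi_2023 = hdi_from_components(72.003, 12.95454, 6.88, 9046.756336)
print(round(hdi_2023, 3))

# Scenario targets for 2047
hdi_a = hdi_from_components(76.0, 14.0, 10.0, 23215.0)
hdi_b = hdi_from_components(82.0, 16.5, 12.5, 53014.0)
print(round(hdi_a, 3), round(hdi_b, 3))
```

If the function reproduces the published figure to three decimals, the projected HDI values are on the same scale as the observed series, so the join at 2023 should not show a methodological jump.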


I export the final long-format dataset to CSV. This file is directly used by my Vega-Lite specification embedded in the HTML project page.

In [8]:
output_filename = "Chart8_HDI_Projected_Scenarios.csv"
df_final.to_csv(output_filename, index=False)

print(f"File saved: {output_filename}")

# Sanity check: last row per scenario and indicator
# (2023 for Observed, 2047 for the scenarios; relies on rows being in year order)
df_final.groupby(["Scenario", "Indicator"])["Value"].last().unstack()

File saved: Chart8_HDI_Projected_Scenarios.csv


Scenario,Expected_years_schooling,GNI_per_capita,HDI,Life_expectancy,Mean_years_schooling
Observed,12.95454,9046.756336,0.685,72.003,6.88
Scenario A (Entry),14.0,23215.0,0.8,76.0,10.0
Scenario B (Convergence),16.5,53014.0,0.924775,82.0,12.5
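
The `.last()` check depends on rows being appended in year order. A variant that selects the final year explicitly is sketched below on a small toy frame with deliberately shuffled rows; the frame only mimics the four-column shape of the exported file.

```python
import pandas as pd

# Toy long-format frame, rows deliberately out of year order
toy = pd.DataFrame({
    "Year": [2047, 2023, 2024, 2022],
    "Indicator": ["HDI", "HDI", "HDI", "HDI"],
    "Value": [0.80, 0.685, 0.69, 0.676],
    "Scenario": [
        "Scenario A (Entry)", "Observed",
        "Scenario A (Entry)", "Observed",
    ],
})

# Row label of the maximum Year within each (Scenario, Indicator) group
last_idx = toy.groupby(["Scenario", "Indicator"])["Year"].idxmax()

# Pull those rows and index them by group for easy lookup
final_year = toy.loc[last_idx].set_index(["Scenario", "Indicator"])[["Year", "Value"]]
print(final_year)
```

Keeping the Year column in the result also makes it visible that the Observed row refers to 2023 while the scenario rows refer to 2047.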
