<a href="https://colab.research.google.com/github/lawrennd/fitkit/blob/main/examples/world_bank_indicators.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# World Bank Development Indicators Exploration

This notebook demonstrates how to load and analyze various World Bank indicators using `fitkit`.
We'll explore relationships between:
- GDP per capita
- Life expectancy
- Human Capital Index (HCI)
- Other development metrics

All visualizations are interactive with hover tooltips showing country names.

In [None]:
import sys
import subprocess
from pathlib import Path


def _pip_install(args: list[str]) -> None:
    cmd = [sys.executable, "-m", "pip", *args]
    print("Running:", " ".join(cmd))
    subprocess.check_call(cmd)


def ensure_fitkit_installed() -> None:
    """Prefer editable local install; fall back to GitHub.

    - Local (typical): `pip install -e ..` when running from `examples/`
    - Colab/remote: `pip install git+https://github.com/lawrennd/fitkit.git`
    """
    try:
        import fitkit  # noqa: F401

        return
    except ImportError:
        pass

    here = Path.cwd().resolve()
    candidates = [here, here.parent, here.parent.parent]

    for root in candidates:
        if (root / "pyproject.toml").exists() and (root / "fitkit").is_dir():
            _pip_install(["install", "-e", str(root)])
            return

    _pip_install(["install", "git+https://github.com/lawrennd/fitkit.git"])


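ensure_fitkit_installed()

# Refresh import caches so a package installed moments ago is visible to this
# process. If the import below still fails, restart the kernel and rerun.
import importlib

importlib.invalidate_caches()
import fitkit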

print("fitkit version:", getattr(fitkit, "__version__", "unknown"))

In [None]:
import numpy as np
import pandas as pd
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
from scipy.stats import spearmanr, pearsonr

from fitkit import (
    load_gdp_per_capita,
    load_life_expectancy,
    load_human_capital_index,
    load_worldbank_indicator,
    load_country_names
)

## 1. Load Data

Load three World Bank indicators for 2020: GDP per capita, life expectancy, and the Human Capital Index. 2020 is used because it is the most recent complete year for most indicators.

In [None]:
# Load country names for visualization
country_names = load_country_names()
print(f"Loaded {len(country_names)} country names\n")

# Load core indicators
print("Loading World Bank indicators...")
gdp_df = load_gdp_per_capita(start_year=2020, end_year=2020)
life_exp_df = load_life_expectancy(start_year=2020, end_year=2020)
hci_df = load_human_capital_index(start_year=2020, end_year=2020)

print(f"\nGDP per capita: {len(gdp_df)} countries")
print(f"Life expectancy: {len(life_exp_df)} countries")
print(f"Human Capital Index: {len(hci_df)} countries")

## 2. Merge Indicators

Create a unified dataframe with all indicators for countries that have complete data.

In [None]:
# Merge all indicators
indicators = gdp_df[[2020]].rename(columns={2020: 'gdp_per_capita'})
indicators = indicators.join(life_exp_df[[2020]].rename(columns={2020: 'life_expectancy'}), how='inner')
indicators = indicators.join(hci_df[[2020]].rename(columns={2020: 'hci'}), how='inner')

# Add country names
indicators['country_name'] = [country_names.get(c, c) for c in indicators.index]

# Remove any rows with missing data
indicators = indicators.dropna()

print(f"Complete data available for {len(indicators)} countries\n")
print("Summary statistics:")
print(indicators.describe())

print("\nSample countries:")
print(indicators.head(10))
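
To see how the inner join and `dropna()` each reduce coverage, here is a minimal sketch on made-up numbers. The codes `AAA` to `DDD` and all values are illustrative, not World Bank data. It assumes, as the cell above does, that each frame is indexed by country code with integer year columns.

```python
import numpy as np
import pandas as pd

# Toy frames shaped like the inputs above: index = country code, columns = years.
gdp_toy = pd.DataFrame({2020: [100.0, 200.0, 300.0]}, index=["AAA", "BBB", "CCC"])
life_toy = pd.DataFrame({2020: [60.0, np.nan, 80.0]}, index=["AAA", "BBB", "DDD"])

# The inner join keeps only codes present in both frames (AAA and BBB here).
merged = gdp_toy[[2020]].rename(columns={2020: "gdp_per_capita"}).join(
    life_toy[[2020]].rename(columns={2020: "life_expectancy"}), how="inner"
)

# dropna() then removes rows that joined but still carry a missing value (BBB).
complete = merged.dropna()

print("after join:  ", sorted(merged.index))
print("after dropna:", sorted(complete.index))
```

A country can therefore drop out for two different reasons: it is absent from one of the frames, or it is present but has no value for 2020.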

## 3. GDP vs Life Expectancy

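Classic relationship: wealthier countries tend to have higher life expectancy. The relationship is closer to linear in log GDP than in raw GDP, so the Pearson correlation below is computed on log₁₀(GDP per capita); the Spearman rank correlation is unaffected by the log transform.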

In [None]:
# Compute correlation
r_pearson, p_pearson = pearsonr(np.log10(indicators['gdp_per_capita']), indicators['life_expectancy'])
r_spearman, p_spearman = spearmanr(indicators['gdp_per_capita'], indicators['life_expectancy'])

# Create scatter plot
fig = px.scatter(
    indicators,
    x='gdp_per_capita',
    y='life_expectancy',
    hover_name='country_name',
    log_x=True,
    labels={
        'gdp_per_capita': 'GDP per Capita (current US$, log scale)',
        'life_expectancy': 'Life Expectancy at Birth (years)'
    },
    title=f'GDP per Capita vs Life Expectancy (2020)<br><sub>Pearson r (log₁₀ GDP) = {r_pearson:.3f}, Spearman ρ = {r_spearman:.3f}</sub>',
    color='life_expectancy',
    color_continuous_scale='RdYlGn'
)

fig.update_traces(
    marker=dict(size=8, opacity=0.7, line=dict(width=0.5, color='DarkSlateGray'))
)

fig.update_layout(
    width=900,
    height=600,
    hovermode='closest'
)

fig.show()
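
The cell above passes log₁₀(GDP) to `pearsonr` but raw GDP to `spearmanr`. The sketch below illustrates why that is consistent: Spearman's ρ depends only on ranks, and a monotonic transform such as a logarithm leaves ranks unchanged, whereas Pearson's r does change. It uses only the standard library and made-up values (not World Bank data); the helper names are hypothetical, and `ranks` assumes there are no ties.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Ranks 1..n in ascending order, assuming no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Illustrative values only.
gdp = [500.0, 2000.0, 8000.0, 30000.0, 60000.0]
life = [55.0, 68.0, 66.0, 79.0, 82.0]
log_gdp = [math.log10(g) for g in gdp]

rho_raw, rho_log = spearman(gdp, life), spearman(log_gdp, life)
r_raw, r_log = pearson(gdp, life), pearson(log_gdp, life)

print(f"Spearman: raw={rho_raw:.3f}, log={rho_log:.3f}")
print(f"Pearson:  raw={r_raw:.3f}, log={r_log:.3f}")
```

On these toy values the two Spearman results are identical, while the Pearson correlation is higher after the log transform.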

## 4. Human Capital Index Relationships

Compare HCI with both GDP and life expectancy side-by-side.

In [None]:
# Create side-by-side subplots
fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=('HCI vs GDP per Capita', 'HCI vs Life Expectancy'),
    horizontal_spacing=0.12
)

# HCI vs GDP
r_hci_gdp, _ = spearmanr(indicators['hci'], np.log10(indicators['gdp_per_capita']))
fig.add_trace(
    go.Scatter(
        x=indicators['hci'],
        y=np.log10(indicators['gdp_per_capita']),
        mode='markers',
        marker=dict(size=8, opacity=0.7, color='steelblue', line=dict(width=0.5, color='DarkSlateGray')),
        text=indicators['country_name'],
        customdata=np.column_stack((indicators.index, indicators['hci'], indicators['gdp_per_capita'])),
        hovertemplate='<b>%{text}</b> (%{customdata[0]})<br>' +
                      'HCI: %{customdata[1]:.3f}<br>' +
                      'GDP per Capita: $%{customdata[2]:,.0f}<extra></extra>',
        showlegend=False
    ),
    row=1, col=1
)

# HCI vs Life Expectancy
r_hci_life, _ = spearmanr(indicators['hci'], indicators['life_expectancy'])
fig.add_trace(
    go.Scatter(
        x=indicators['hci'],
        y=indicators['life_expectancy'],
        mode='markers',
        marker=dict(size=8, opacity=0.7, color='coral', line=dict(width=0.5, color='DarkSlateGray')),
        text=indicators['country_name'],
        customdata=np.column_stack((indicators.index, indicators['hci'], indicators['life_expectancy'])),
        hovertemplate='<b>%{text}</b> (%{customdata[0]})<br>' +
                      'HCI: %{customdata[1]:.3f}<br>' +
                      'Life Expectancy: %{customdata[2]:.1f} years<extra></extra>',
        showlegend=False
    ),
    row=1, col=2
)

# Update axes
fig.update_xaxes(title_text='Human Capital Index', row=1, col=1)
fig.update_xaxes(title_text='Human Capital Index', row=1, col=2)
fig.update_yaxes(title_text='log₁₀(GDP per Capita)', row=1, col=1)
fig.update_yaxes(title_text='Life Expectancy (years)', row=1, col=2)

fig.update_layout(
    width=1400,
    height=500,
    title_text=f'Human Capital Index Relationships (2020)<br><sub>HCI-GDP: ρ={r_hci_gdp:.3f} | HCI-Life: ρ={r_hci_life:.3f}</sub>',
    hovermode='closest'
)

fig.show()

## 5. Top and Bottom Performers

Identify countries with exceptional or concerning development outcomes.

In [None]:
print("="*80)
print("TOP 10 COUNTRIES BY INDICATOR")
print("="*80)

print("\nHighest GDP per Capita:")
top_gdp = indicators.nlargest(10, 'gdp_per_capita')[['country_name', 'gdp_per_capita', 'life_expectancy', 'hci']]
print(top_gdp.to_string())

print("\n" + "-"*80)
print("\nHighest Life Expectancy:")
top_life = indicators.nlargest(10, 'life_expectancy')[['country_name', 'life_expectancy', 'gdp_per_capita', 'hci']]
print(top_life.to_string())

print("\n" + "-"*80)
print("\nHighest Human Capital Index:")
top_hci = indicators.nlargest(10, 'hci')[['country_name', 'hci', 'gdp_per_capita', 'life_expectancy']]
print(top_hci.to_string())

print("\n" + "="*80)
print("BOTTOM 10 COUNTRIES BY INDICATOR")
print("="*80)

print("\nLowest Life Expectancy:")
bottom_life = indicators.nsmallest(10, 'life_expectancy')[['country_name', 'life_expectancy', 'gdp_per_capita', 'hci']]
print(bottom_life.to_string())

print("\n" + "-"*80)
print("\nLowest Human Capital Index:")
bottom_hci = indicators.nsmallest(10, 'hci')[['country_name', 'hci', 'gdp_per_capita', 'life_expectancy']]
print(bottom_hci.to_string())

## 6. Outlier Analysis

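Find countries that perform better or worse than expected based on their GDP. We fit a straight line of life expectancy against log₁₀(GDP per capita) and rank countries by residual: actual minus fitted life expectancy.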

In [None]:
# Fit linear model: life expectancy ~ log10(GDP per capita)
log_gdp = np.log10(indicators['gdp_per_capita'])
coeffs = np.polyfit(log_gdp, indicators['life_expectancy'], 1)
expected_life_exp = np.polyval(coeffs, log_gdp)

indicators['life_exp_residual'] = indicators['life_expectancy'] - expected_life_exp

print("Countries with HIGHER life expectancy than expected from GDP:")
overperformers = indicators.nlargest(10, 'life_exp_residual')[['country_name', 'life_expectancy', 'gdp_per_capita', 'life_exp_residual']]
print(overperformers.to_string())

print("\n" + "-"*80)
print("\nCountries with LOWER life expectancy than expected from GDP:")
underperformers = indicators.nsmallest(10, 'life_exp_residual')[['country_name', 'life_expectancy', 'gdp_per_capita', 'life_exp_residual']]
print(underperformers.to_string())
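
To make the residual calculation concrete, here is a dependency-free sketch of the same idea: an ordinary least-squares line fitted in closed form, followed by actual-minus-fitted residuals. It is an illustration on made-up values rather than a replacement for `np.polyfit`; the helper `fit_line` and the labels `A` to `E` are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y ≈ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Illustrative values only (not World Bank data).
labels = ["A", "B", "C", "D", "E"]
gdp = [500.0, 2000.0, 8000.0, 30000.0, 60000.0]
life = [55.0, 68.0, 66.0, 79.0, 82.0]

log_gdp = [math.log10(g) for g in gdp]
slope, intercept = fit_line(log_gdp, life)

# Residual = actual life expectancy minus the value the line predicts from GDP.
residuals = [b - (slope * a + intercept) for a, b in zip(log_gdp, life)]

# Positive residual: longer-lived than the fitted line suggests; negative: shorter-lived.
for name, res in sorted(zip(labels, residuals), key=lambda t: t[1], reverse=True):
    print(f"{name}: {res:+.2f} years")
```

Because GDP enters as log₁₀, the slope reads as the change in fitted life expectancy (in years) associated with a tenfold increase in GDP per capita. Least-squares residuals with an intercept sum to zero, so over- and underperformance are always relative to the other countries in the sample.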

## 7. Correlation Matrix

Visualize correlations between all indicators.

In [None]:
# Compute correlation matrix on life expectancy, HCI, and log10(GDP)
corr_data = indicators[['life_expectancy', 'hci']].copy()
corr_data['log_gdp'] = np.log10(indicators['gdp_per_capita'])
corr_data = corr_data.rename(columns={
    'life_expectancy': 'Life Expectancy',
    'hci': 'HCI',
    'log_gdp': 'log₁₀(GDP)'
})

corr_matrix = corr_data.corr()

# Create heatmap
fig = go.Figure(data=go.Heatmap(
    z=corr_matrix.values,
    x=corr_matrix.columns,
    y=corr_matrix.columns,
    colorscale='RdBu',
    zmid=0,
    zmin=-1,
    zmax=1,
    text=corr_matrix.values,
    texttemplate='%{text:.3f}',
    textfont={"size": 14},
    colorbar=dict(title='Pearson r')
))

fig.update_layout(
    title='Correlation Matrix: World Bank Development Indicators (2020)',
    width=600,
    height=600
)

fig.show()

print("\nCorrelation Matrix:")
print(corr_matrix)

## 8. Time Series: Selected Countries

Compare trends over time for a few selected countries.

In [None]:
# Load longer time series
gdp_ts = load_gdp_per_capita(start_year=2000, end_year=2020)
life_ts = load_life_expectancy(start_year=2000, end_year=2020)

# Select interesting countries
countries_to_plot = ['USA', 'CHN', 'IND', 'DEU', 'JPN', 'BRA', 'NGA']
available_countries = [c for c in countries_to_plot if c in gdp_ts.index and c in life_ts.index]

# Create subplots
fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=('GDP per Capita Over Time', 'Life Expectancy Over Time'),
    horizontal_spacing=0.12
)

years = list(range(2000, 2021))

# Plot GDP trends
for country in available_countries:
    country_name = country_names.get(country, country)
    values = gdp_ts.loc[country, years].values
    fig.add_trace(
        go.Scatter(
            x=years,
            y=values,
            mode='lines+markers',
            name=country_name,
            hovertemplate=f'<b>{country_name}</b><br>Year: %{{x}}<br>GDP: $%{{y:,.0f}}<extra></extra>'
        ),
        row=1, col=1
    )

# Plot life expectancy trends
for country in available_countries:
    country_name = country_names.get(country, country)
    values = life_ts.loc[country, years].values
    fig.add_trace(
        go.Scatter(
            x=years,
            y=values,
            mode='lines+markers',
            name=country_name,
            showlegend=False,
            hovertemplate=f'<b>{country_name}</b><br>Year: %{{x}}<br>Life Exp: %{{y:.1f}} years<extra></extra>'
        ),
        row=1, col=2
    )

fig.update_xaxes(title_text='Year', row=1, col=1)
fig.update_xaxes(title_text='Year', row=1, col=2)
fig.update_yaxes(title_text='GDP per Capita (current US$)', row=1, col=1)
fig.update_yaxes(title_text='Life Expectancy (years)', row=1, col=2)

fig.update_layout(
    width=1400,
    height=500,
    title_text='Development Trends: Selected Countries (2000-2020)',
    hovermode='closest'
)

fig.show()

## 9. Loading Additional Indicators

The `load_worldbank_indicator()` function can load any World Bank indicator by its indicator code.
The cell below prints some example codes and a usage pattern:

In [None]:
print("Examples of other World Bank indicators you can load:\n")
print("Education:")
print("  - School enrollment, tertiary: 'SE.TER.ENRR'")
print("  - Government expenditure on education: 'SE.XPD.TOTL.GD.ZS'")
print("\nHealth:")
print("  - Infant mortality rate: 'SP.DYN.IMRT.IN'")
print("  - Health expenditure: 'SH.XPD.CHEX.GD.ZS'")
print("\nInfrastructure:")
print("  - Internet users: 'IT.NET.USER.ZS'")
print("  - Mobile subscriptions: 'IT.CEL.SETS.P2'")
print("\nEnvironment:")
print("  - CO2 emissions: 'EN.ATM.CO2E.PC'")
print("  - Forest area: 'AG.LND.FRST.ZS'")
print("\nEconomy:")
print("  - Inflation: 'FP.CPI.TOTL.ZG'")
print("  - Unemployment: 'SL.UEM.TOTL.ZS'")
print("\nUsage example:")
print("  from fitkit import load_worldbank_indicator")
print("  internet_df = load_worldbank_indicator('IT.NET.USER.ZS', start_year=2020, end_year=2020)")
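
If you want to load several of these indicators at once, one option is to keep the codes in a dictionary and loop over them. The sketch below is a hypothetical helper, not part of `fitkit`: `INDICATOR_CODES`, `load_indicators`, and `fake_loader` are names introduced here for illustration. The loader is passed in as an argument and is assumed to accept `(code, start_year=..., end_year=...)`, matching the usage example printed above; a stand-in loader is used so the sketch runs without network access.

```python
# Short names mapped to World Bank indicator codes from the list above.
INDICATOR_CODES = {
    "tertiary_enrollment": "SE.TER.ENRR",
    "infant_mortality": "SP.DYN.IMRT.IN",
    "internet_users": "IT.NET.USER.ZS",
    "forest_area": "AG.LND.FRST.ZS",
    "unemployment": "SL.UEM.TOTL.ZS",
}


def load_indicators(loader, names, year):
    """Call loader(code, start_year=year, end_year=year) for each requested name."""
    names = list(names)
    unknown = [n for n in names if n not in INDICATOR_CODES]
    if unknown:
        raise KeyError(f"Unknown indicator name(s): {unknown}")
    return {n: loader(INDICATOR_CODES[n], start_year=year, end_year=year) for n in names}


def fake_loader(code, start_year, end_year):
    """Stand-in for a real loader: just echoes its arguments."""
    return (code, start_year, end_year)


result = load_indicators(fake_loader, ["internet_users", "forest_area"], 2020)
print(result)
```

In this notebook you would pass `load_worldbank_indicator` in place of `fake_loader`; each dictionary value would then be whatever that function returns for the given code and year.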

## Summary

This notebook demonstrates:

1. **Data Loading**: Easy access to World Bank indicators via `fitkit`
2. **Data Integration**: Merging multiple indicators for comprehensive analysis
3. **Visualization**: Interactive plots with country name hover tooltips
4. **Analysis**: Correlations, outliers, and time trends

Key findings (2020 data):
- Strong positive correlation between log GDP per capita and life expectancy (r ≈ 0.8-0.9)
- Human Capital Index correlates positively with both GDP per capita and life expectancy
- Some countries (e.g., Costa Rica, Vietnam) achieve higher life expectancy than GDP alone would predict
- GDP per capita and life expectancy follow different trajectories across the selected countries over 2000-2020

For economic complexity analysis, see `atlas_fitness_comparison.ipynb`.