# International Disability Data

**Author**: Luke Steuber  
**Date**: February 2026  
**Data Sources**: WHO, OECD, World Bank, UN, Eurostat

This notebook analyzes 7 international datasets covering disability prevalence, employment gaps, healthy life expectancy, and treaty ratification across 194+ countries.

## Key Findings

- **1.3 billion people (16%)** globally live with significant disability (WHO 2022)
- **190 countries** have signed the UN CRPD (Convention on the Rights of Persons with Disabilities)
- **34 percentage point average employment gap** across OECD countries between disabled and non-disabled populations
- **26.9% of EU population** reports some activity limitation (EU-SILC 2023)
- Healthy life expectancy (HALE) shows wide global disparity: 69.9 years (Japan) vs 45.7 years (Nigeria)

In [None]:
import pandas as pd
import json
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings

warnings.filterwarnings('ignore')
%matplotlib inline

# Set visualization style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 8)
plt.rcParams['font.size'] = 11

# Data directory
DATA_DIR = Path('../')

print("International Disability Data Analysis")
print("="*60)

## 1. WHO Disability Prevalence (194 countries)

Global disability prevalence compiled from WHO reference reports, UN SDG indicators, and World Bank health statistics.
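
The cell below reads several deeply nested keys (for example `metadata -> sources -> [2] -> countries_covered`), and any missing level raises a `KeyError` or `IndexError`. As a minimal sketch, a small lookup helper can return a default instead. The helper name `dig` and the sample dictionary are assumptions for illustration and are not part of the dataset.

```python
def dig(obj, *keys, default=None):
    """Walk nested dicts/lists; return `default` if any step is missing."""
    for key in keys:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


# Hypothetical structure that mirrors the key paths used in this notebook
sample = {"metadata": {"sources": [{"name": "a"}, {"name": "b"}]}}

print(dig(sample, "metadata", "sources", 1, "name"))
print(dig(sample, "metadata", "sources", 2, "countries_covered", default="n/a"))
```

With the real file this could be called as `dig(who_data, 'metadata', 'sources', 2, 'countries_covered', default='n/a')`.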

In [None]:
# Load WHO disability prevalence
with open(DATA_DIR / 'who_disability_prevalence.json', 'r') as f:
    who_data = json.load(f)

# Display metadata
print("WHO Disability Prevalence Data")
print("-" * 60)
print(f"Title: {who_data['metadata']['title']}")
print(f"Fetched: {who_data['metadata']['date_fetched'][:10]}")
print(f"\nData Sources:")
for source in who_data['metadata']['sources'][:3]:
    print(f"  - {source['name']} ({source.get('year', 'various years')})")

# Display reference statistics
print("\nWHO Reference Statistics:")
ref_stats = who_data['who_reference_statistics']
print(f"  Global prevalence: {ref_stats['global_prevalence']['estimate_2022']['percentage']}% ({ref_stats['global_prevalence']['estimate_2022']['count']})")
print(f"  Low-income countries: {ref_stats['prevalence_by_income_level']['low_income']['percentage']}%")
print(f"  High-income countries: {ref_stats['prevalence_by_income_level']['high_income']['percentage']}%")
print(f"\n  Total SDG records available: {who_data['metadata']['sources'][2]['total_records']}")
print(f"  Countries with SDG disability data: {who_data['metadata']['sources'][2]['countries_covered']}")

## 2. WHO GHO Disability Indicators (DALYs/YLDs)

This dataset contains Years Lived with Disability (YLD) and Disability-Adjusted Life Years (DALY) from 2000-2021. **Note**: The file is about 13MB, so the cell below prints its metadata only; `json.load` still parses the whole file into memory.
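
Since only the metadata is displayed, it may be possible to avoid parsing the full file. The following is a sketch using only the standard library. It assumes the first occurrence of `"metadata"` in the file is the top-level key and that its value is a JSON object or array; the function name and parameters are hypothetical.

```python
import json


def read_top_level_value(path, key="metadata", chunk_size=65536, max_chars=2_000_000):
    """Decode only the value stored under `key`, reading the file in chunks.

    Assumption: the first occurrence of '"<key>"' is the top-level key and
    its value is an object or array (a truncated scalar could be misread).
    """
    decoder = json.JSONDecoder()
    needle = json.dumps(key)  # e.g. '"metadata"'
    buffer = ""
    with open(path, "r", encoding="utf-8") as f:
        while len(buffer) < max_chars:
            chunk = f.read(chunk_size)
            if not chunk:
                break  # reached end of file
            buffer += chunk
            key_pos = buffer.find(needle)
            if key_pos == -1:
                continue  # key not seen yet
            colon = buffer.find(":", key_pos + len(needle))
            if colon == -1:
                continue
            value_pos = colon + 1
            while value_pos < len(buffer) and buffer[value_pos].isspace():
                value_pos += 1
            try:
                value, _ = decoder.raw_decode(buffer, value_pos)
            except json.JSONDecodeError:
                continue  # value is incomplete; read another chunk
            return value
    raise KeyError(f"{key!r} not found within the first {max_chars} characters")
```

A call such as `read_top_level_value(DATA_DIR / 'who_gho_disability_indicators.json')` would then return the metadata block alone, provided the assumptions above hold for that file.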

In [None]:
# Load the WHO GHO file (about 13MB); only the metadata is printed below
with open(DATA_DIR / 'who_gho_disability_indicators.json', 'r') as f:
    # Note: json.load parses the entire file, not just the metadata block
    gho_data = json.load(f)

print("WHO GHO Disability Indicators (YLD/DALY)")
print("-" * 60)
print(f"Title: {gho_data['metadata']['title']}")
print(f"Coverage: {gho_data['metadata']['coverage']}")
print(f"Total records: {gho_data['metadata']['total_records']:,}")
print(f"Indicators: {len(gho_data['metadata']['indicators'])}")
print(f"\nSample indicators:")
for ind in gho_data['metadata']['indicators'][:3]:
    print(f"  - {ind['code']}: {ind['description']}")
print(f"\n(File size: 13MB - full data not displayed)")

## 3. WHO Healthy Life Expectancy (HALE)

HALE measures the average number of years a person can expect to live in good health, accounting for disability and disease.

In [None]:
# Load HALE data (long format for easier analysis)
hale_df = pd.read_csv(DATA_DIR / 'who_hale_long.csv')

print("WHO Healthy Life Expectancy (HALE)")
print("-" * 60)
print(f"Countries: {hale_df['country_code'].nunique()}")
print(f"Year range: {hale_df['year'].min()}-{hale_df['year'].max()}")
print(f"Total records: {len(hale_df):,}")
print(f"\nSample data:")
print(hale_df.head(10))

# Get latest year data
latest_year = hale_df['year'].max()
latest_hale = hale_df[hale_df['year'] == latest_year].sort_values('hale_years', ascending=False)

print(f"\nTop 5 countries (HALE {latest_year}):")
print(latest_hale[['country_code', 'hale_years']].head())
print(f"\nBottom 5 countries (HALE {latest_year}):")
print(latest_hale[['country_code', 'hale_years']].tail())

## 4. OECD Disability Data (38 countries)

Employment rates, disability prevalence, and incapacity spending for OECD member countries.

In [None]:
# Load OECD data
with open(DATA_DIR / 'oecd_disability_data.json', 'r') as f:
    oecd_data = json.load(f)

print("OECD Disability Statistics")
print("-" * 60)
print(f"Coverage: {oecd_data['metadata']['coverage']}")
print(f"Fetched: {oecd_data['metadata']['fetch_date'][:10]}")

# Extract employment gap data
employment_data = oecd_data['oecd_disability_statistics']['employment_rates_by_country']
employment_df = pd.DataFrame(employment_data)

print(f"\nEmployment data available for {len(employment_df)} countries")
print(f"Average employment gap: {employment_df['gap_pp'].mean():.1f} percentage points")
print(f"\nTop 5 countries with smallest employment gap:")
print(employment_df.nsmallest(5, 'gap_pp')[['country', 'gap_pp']])
print(f"\nTop 5 countries with largest employment gap:")
print(employment_df.nlargest(5, 'gap_pp')[['country', 'gap_pp']])

## 5. Eurostat EU Disability Statistics (36 countries)

GALI (Global Activity Limitation Indicator) disability prevalence and employment gap across EU/EEA countries.

In [None]:
# Load Eurostat data
with open(DATA_DIR / 'eurostat_disability_eu.json', 'r') as f:
    eurostat_data = json.load(f)

print("Eurostat EU Disability Statistics")
print("-" * 60)
print(f"Dataset: {eurostat_data['dataset']}")
print(f"Generated: {eurostat_data['generated'][:10]}")

# EU27 summary
eu27 = eurostat_data['eu27_summary_2023']
print(f"\nEU27 Summary (2023):")
print(f"  Total disability rate: {eu27['total_disability_rate_pct']}%")
print(f"  Some limitation: {eu27['some_limitation_pct']}%")
print(f"  Severe limitation: {eu27['severe_limitation_pct']}%")
print(f"  Employment gap: {eu27['employment_gap_pp']} percentage points")
print(f"  Severe employment gap: {eu27['severe_employment_gap_pp']} percentage points")

# Extract country data
eurostat_countries = pd.DataFrame(eurostat_data['countries'])
print(f"\nCountries with data: {len(eurostat_countries)}")

## 6. World Bank Disability Indicators (265 economies and aggregates)

Time-series data on life expectancy, mortality, diabetes prevalence, health expenditure, and physician density.
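
For time-series records like these, a common first step is to take the most recent non-missing value per country and indicator. The sketch below assumes a long-format table with columns `country_code`, `indicator`, `year`, and `value`. Only `indicator` is confirmed by this notebook; the other column names and all values are placeholders.

```python
import pandas as pd

# Placeholder records; column names other than 'indicator' are assumptions
wb_example = pd.DataFrame({
    "country_code": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "indicator": ["IND.X", "IND.X", "IND.Y", "IND.X", "IND.X"],
    "year": [2020, 2021, 2021, 2020, 2021],
    "value": [70.0, 71.0, 3.2, 60.0, None],
})

# Drop missing values, then keep the latest year per country/indicator pair
latest = (
    wb_example.dropna(subset=["value"])
    .sort_values("year")
    .groupby(["country_code", "indicator"], as_index=False)
    .last()
)
print(latest)
```

Country `BBB` falls back to 2020 here because its 2021 value is missing, which is the behavior one usually wants for sparsely reported indicators.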

In [None]:
# Load World Bank data
with open(DATA_DIR / 'world_bank_disability_indicators.json', 'r') as f:
    wb_data = json.load(f)

print("World Bank Disability Indicators")
print("-" * 60)
print(f"Total records: {wb_data['metadata']['total_records']:,}")
print(f"Countries: {wb_data['metadata']['countries_covered']}")
print(f"Year range: {wb_data['metadata']['year_range']}")

print(f"\nIndicators:")
for ind in wb_data['metadata']['indicators']:
    print(f"  - {ind['code']}: {ind['description']}")

# Convert to dataframe (showing summary only due to size)
wb_df = pd.DataFrame(wb_data['data'])
print(f"\nSample data:")
print(wb_df.head())
print(f"\nRecords by indicator:")
print(wb_df['indicator'].value_counts())

## 7. UN CRPD Ratification Status (199 countries)

Signature and ratification status for the Convention on the Rights of Persons with Disabilities.
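
The regional ratification rates printed below come precomputed from the file. As a cross-check, they could be recomputed from country-level records along these lines. The sketch uses the `region` and `crpd_ratification_date` columns seen in this notebook; the rows themselves are made-up placeholders.

```python
import pandas as pd

# Placeholder rows for illustration only
crpd_example = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "region": ["R1", "R1", "R2", "R2"],
    "crpd_ratification_date": ["2008-05-03", None, "2010-01-01", "2012-06-15"],
})

# A country counts as ratified when it has a ratification date
by_region = (
    crpd_example
    .assign(ratified=crpd_example["crpd_ratification_date"].notna())
    .groupby("region")["ratified"]
    .agg(ratifications="sum", total_countries="count")
)
by_region["ratification_rate_pct"] = (
    by_region["ratifications"] / by_region["total_countries"] * 100
).round(1)
print(by_region)
```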

In [None]:
# Load UN CRPD data
with open(DATA_DIR / 'un_crpd_ratification.json', 'r') as f:
    crpd_data = json.load(f)

print("UN CRPD Ratification Status")
print("-" * 60)
print(f"Treaty: {crpd_data['metadata']['treaty']['name']}")
print(f"Adopted: {crpd_data['metadata']['treaty']['adopted']}")
print(f"Entered into force: {crpd_data['metadata']['treaty']['entered_into_force']}")
print(f"Countries tracked: {crpd_data['metadata']['record_count']}")

# Summary statistics
stats = crpd_data['statistics']
print(f"\nGlobal Summary:")
print(f"  Total signatories: {stats['total_signatures']}")
print(f"  Total ratifications: {stats['total_ratifications']}")
print(f"  Optional Protocol signatures: {stats['optional_protocol_signatures']}")
print(f"  Optional Protocol ratifications: {stats['optional_protocol_ratifications']}")

# Regional breakdown
print(f"\nRatifications by region:")
for region, data in stats['by_region'].items():
    print(f"  {region}: {data['ratifications']}/{data['total_countries']} ({data['ratification_rate_pct']}%)")

# Convert to dataframe
crpd_df = pd.DataFrame(crpd_data['countries'])
print(f"\nSample data:")
print(crpd_df[['country', 'region', 'crpd_ratification_date']].head(10))

---

## Visualizations

### Visualization A: OECD Employment Gap by Country

Percentage point difference between employment rates of disabled and non-disabled populations (working age 20-64).
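
To make the unit concrete: a percentage point gap is a plain difference between two rates, which is not the same as a relative (percent) difference. The sketch below uses hypothetical column names and placeholder rates; only `gap_pp` matches a field used in this notebook.

```python
import pandas as pd

# Placeholder rates; the two rate column names are assumptions
rates = pd.DataFrame({
    "country": ["A", "B"],
    "employment_rate_without_disability": [75.0, 80.0],
    "employment_rate_with_disability": [50.0, 40.0],
})

# Gap in percentage points: simple difference of the two rates
rates["gap_pp"] = (
    rates["employment_rate_without_disability"]
    - rates["employment_rate_with_disability"]
)

# Relative gap in percent, shown only to contrast with percentage points
rates["relative_gap_pct"] = (
    rates["gap_pp"] / rates["employment_rate_without_disability"] * 100
)
print(rates[["country", "gap_pp", "relative_gap_pct"]])
```

Country `B` has a 40 percentage point gap, while its relative gap is 50 percent; the chart in this section reports the former.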

In [None]:
# Sort by employment gap
employment_sorted = employment_df.sort_values('gap_pp', ascending=True)

# Create horizontal bar chart
fig, ax = plt.subplots(figsize=(12, 14))

colors = ['#d62728' if gap > 40 else '#ff7f0e' if gap > 30 else '#2ca02c' 
          for gap in employment_sorted['gap_pp']]

ax.barh(employment_sorted['country'], employment_sorted['gap_pp'], color=colors)

ax.set_xlabel('Employment Gap (percentage points)', fontsize=12, fontweight='bold')
ax.set_title('OECD Disability Employment Gap by Country\n(Disabled vs Non-Disabled Employment Rates, Ages 20-64)',
             fontsize=14, fontweight='bold', pad=20)

ax.axvline(employment_df['gap_pp'].mean(), color='black', linestyle='--', 
           linewidth=1.5, alpha=0.7, label=f'OECD Average: {employment_df["gap_pp"].mean():.1f}pp')

ax.legend(loc='lower right', fontsize=11)
ax.grid(axis='x', alpha=0.3)
ax.set_xlim(0, employment_sorted['gap_pp'].max() * 1.1)

plt.tight_layout()
plt.show()

print(f"Average employment gap: {employment_df['gap_pp'].mean():.1f} percentage points")
print(f"Range: {employment_df['gap_pp'].min():.1f} to {employment_df['gap_pp'].max():.1f} pp")

### Visualization B: HALE Trends for Selected Countries

Healthy Life Expectancy over time for major economies and health systems.
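
Alongside the line plot, the trend can be summarized as the change in HALE between the first and last available year. This is a sketch on placeholder data that reuses the `country_code`, `year`, and `hale_years` columns from `who_hale_long.csv`. It assumes one row per country and year.

```python
import pandas as pd

# Placeholder values, not WHO figures
hale_example = pd.DataFrame({
    "country_code": ["AAA", "AAA", "BBB", "BBB"],
    "year": [2000, 2021, 2000, 2021],
    "hale_years": [60.0, 65.5, 50.0, 58.0],
})

# Long -> wide: one row per country, one column per year
wide = hale_example.pivot(index="country_code", columns="year", values="hale_years")

first_year, last_year = wide.columns.min(), wide.columns.max()
change = (wide[last_year] - wide[first_year]).rename("hale_change_years")
print(change.sort_values(ascending=False))
```

If the real file carries more than one row per country and year (for example a breakdown by sex), `pivot` raises an error, and the data would need filtering or `pivot_table` with an explicit aggregation first.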

In [None]:
# Select countries to highlight
selected_countries = ['USA', 'JPN', 'NGA', 'IND', 'DEU']
country_names = {
    'USA': 'United States',
    'JPN': 'Japan',
    'NGA': 'Nigeria',
    'IND': 'India',
    'DEU': 'Germany'
}

# Filter data
hale_selected = hale_df[hale_df['country_code'].isin(selected_countries)].copy()
hale_selected['country_name'] = hale_selected['country_code'].map(country_names)

# Create line plot
fig, ax = plt.subplots(figsize=(14, 8))

for country in selected_countries:
    data = hale_selected[hale_selected['country_code'] == country]
    ax.plot(data['year'], data['hale_years'], marker='o', linewidth=2.5, 
            label=country_names[country], markersize=6)

ax.set_xlabel('Year', fontsize=12, fontweight='bold')
ax.set_ylabel('Healthy Life Expectancy (HALE, years)', fontsize=12, fontweight='bold')
ax.set_title('Healthy Life Expectancy Trends: Selected Countries\n(WHO Global Health Observatory)',
             fontsize=14, fontweight='bold', pad=20)

ax.legend(loc='best', fontsize=11, framealpha=0.9)
ax.grid(True, alpha=0.3)
ax.set_xlim(hale_selected['year'].min() - 1, hale_selected['year'].max() + 1)

plt.tight_layout()
plt.show()

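# Print latest values (skip gracefully if a country has no record for that year)
print(f"\nHALE in {latest_year}:")
for country in selected_countries:
    vals = hale_selected.loc[(hale_selected['country_code'] == country) &
                             (hale_selected['year'] == latest_year), 'hale_years']
    if vals.empty:
        print(f"  {country_names[country]}: no data for {latest_year}")
    else:
        print(f"  {country_names[country]}: {vals.iloc[0]:.1f} years")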

### Visualization C: UN CRPD Ratification Status

Global treaty adoption breakdown: signed, ratified, or neither.
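
The three-way split can be expressed as an ordered rule: ratified takes priority over signed, and everything else is unsigned. A minimal sketch using `numpy.select` follows, with an explicit status-to-color mapping so a given status keeps the same color regardless of how the counts are ordered. The rows are placeholders; the date column names are those used in this notebook.

```python
import numpy as np
import pandas as pd

# Placeholder rows; country C ratified by accession without signing
treaty = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "crpd_signature_date": ["2007-03-30", "2007-03-30", None, None],
    "crpd_ratification_date": ["2008-05-03", None, "2011-02-01", None],
})

ratified = treaty["crpd_ratification_date"].notna()
signed = treaty["crpd_signature_date"].notna()

# Conditions are checked in order, so 'Ratified' wins over 'Signed only'
treaty["status"] = np.select(
    [ratified, signed], ["Ratified", "Signed only"], default="Not signed"
)

# Look colors up by status name rather than by position in the counts
status_colors = {"Ratified": "#2ca02c", "Signed only": "#ff7f0e", "Not signed": "#d62728"}
counts = treaty["status"].value_counts()
colors = [status_colors[s] for s in counts.index]
print(treaty[["country", "status"]])
```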

In [None]:
# Calculate ratification categories
crpd_df['status'] = 'Not signed'
crpd_df.loc[crpd_df['crpd_signature_date'].notna(), 'status'] = 'Signed only'
crpd_df.loc[crpd_df['crpd_ratification_date'].notna(), 'status'] = 'Ratified'

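# Fix the category order so the color list below always maps to the same status
status_counts = crpd_df['status'].value_counts().reindex(
    ['Ratified', 'Signed only', 'Not signed'], fill_value=0)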

# Create pie chart
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 7))

# Main CRPD status
colors_status = ['#2ca02c', '#ff7f0e', '#d62728']
wedges, texts, autotexts = ax1.pie(status_counts.values, labels=status_counts.index,
                                     autopct='%1.1f%%', colors=colors_status,
                                     startangle=90, textprops={'fontsize': 12})

for autotext in autotexts:
    autotext.set_color('white')
    autotext.set_fontweight('bold')

ax1.set_title(f'UN CRPD Ratification Status\n({len(crpd_df)} countries tracked)',
              fontsize=14, fontweight='bold', pad=20)

# Optional Protocol status
crpd_df['op_status'] = 'Not signed'
crpd_df.loc[crpd_df['optional_protocol_signature_date'].notna(), 'op_status'] = 'Signed only'
crpd_df.loc[crpd_df['optional_protocol_ratification_date'].notna(), 'op_status'] = 'Ratified'

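# Same fixed order as the main CRPD pie so the colors stay consistent
op_counts = crpd_df['op_status'].value_counts().reindex(
    ['Ratified', 'Signed only', 'Not signed'], fill_value=0)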

wedges2, texts2, autotexts2 = ax2.pie(op_counts.values, labels=op_counts.index,
                                        autopct='%1.1f%%', colors=colors_status,
                                        startangle=90, textprops={'fontsize': 12})

for autotext in autotexts2:
    autotext.set_color('white')
    autotext.set_fontweight('bold')

ax2.set_title('Optional Protocol Status\n(Individual complaints mechanism)',
              fontsize=14, fontweight='bold', pad=20)

plt.tight_layout()
plt.show()

print(f"\nCRPD Status Breakdown:")
print(status_counts)
print(f"\nOptional Protocol Status:")
print(op_counts)

### Visualization D: Eurostat Disability Prevalence Across EU

GALI (Global Activity Limitation Indicator) rates by country, showing percentage of population with some or severe activity limitation.
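
The stacked bars in this section implicitly assume that "some" plus "severe" equals the reported "some or severe" total. A quick consistency check can flag countries where rounding or missing values break that assumption. The sketch below uses the three field names from this notebook with placeholder values; the 0.2 point tolerance is an arbitrary choice.

```python
import pandas as pd

# Placeholder values for illustration only
gali = pd.DataFrame({
    "country_code": ["AA", "BB", "CC"],
    "some_limitation": [18.0, 15.5, 20.0],
    "severe_limitation": [7.0, 6.5, 5.0],
    "some_or_severe_limitation": [25.0, 22.0, 26.0],
})

# Compare the stacked bar length with the reported total
gali["stacked_total"] = gali["some_limitation"] + gali["severe_limitation"]
gali["mismatch_pp"] = (gali["stacked_total"] - gali["some_or_severe_limitation"]).abs()

# Flag rows whose components differ from the total by more than the tolerance
flagged = gali[gali["mismatch_pp"] > 0.2]
print(flagged[["country_code", "stacked_total", "some_or_severe_limitation"]])
```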

In [None]:
# Extract GALI rates from nested structure
eu_countries = []
for country in eurostat_data['countries']:
    if 'gali_indicator' in country:
        eu_countries.append({
            'country': country['country_name'],
            'country_code': country['country_code'],
            'total_limitation': country['gali_indicator'].get('some_or_severe_limitation', None),
            'severe_limitation': country['gali_indicator'].get('severe_limitation', None),
            'some_limitation': country['gali_indicator'].get('some_limitation', None)
        })

eu_df = pd.DataFrame(eu_countries)
eu_df = eu_df[eu_df['total_limitation'].notna()]
eu_df = eu_df.sort_values('total_limitation', ascending=False)

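# Filter to individual countries (exclude aggregates whose codes start with EU or EA);
# str.match anchors at the start, so codes that merely contain these letters are kept
eu_df = eu_df[~eu_df['country_code'].str.match(r'EU|EA', na=False)]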

# Create stacked bar chart
fig, ax = plt.subplots(figsize=(14, 12))

x = range(len(eu_df))
ax.barh(x, eu_df['severe_limitation'], label='Severe limitation', color='#d62728')
ax.barh(x, eu_df['some_limitation'], left=eu_df['severe_limitation'], 
        label='Some limitation', color='#ff7f0e')

# Add country labels (use country codes for compactness)
ax.set_yticks(x)
ax.set_yticklabels(eu_df['country_code'], fontsize=10)

ax.set_xlabel('Percentage of Population', fontsize=12, fontweight='bold')
ax.set_title('EU/EEA Disability Prevalence by Country\n(GALI Indicator: Activity Limitation Due to Health, 2023)',
             fontsize=14, fontweight='bold', pad=20)

ax.grid(axis='x', alpha=0.3)
ax.set_xlim(0, eu_df['total_limitation'].max() * 1.1)

# Add EU27 average line
eu27_avg = eurostat_data['eu27_summary_2023']['some_or_severe_limitation']
ax.axvline(eu27_avg, color='black', linestyle='--', linewidth=1.5, 
           alpha=0.7, label=f'EU27 Average: {eu27_avg}%')
ax.legend(loc='lower right', fontsize=11)

plt.tight_layout()
plt.show()

print(f"EU27 average disability prevalence: {eu27_avg}%")
print(f"Range: {eu_df['total_limitation'].min():.1f}% to {eu_df['total_limitation'].max():.1f}%")

---

## Summary Statistics

In [None]:
print("INTERNATIONAL DISABILITY DATA: KEY STATISTICS")
print("=" * 60)

print("\n1. GLOBAL PREVALENCE (WHO)")
print(f"   - 1.3 billion people (16%) live with significant disability")
print(f"   - Low-income countries: {ref_stats['prevalence_by_income_level']['low_income']['percentage']}%")
print(f"   - High-income countries: {ref_stats['prevalence_by_income_level']['high_income']['percentage']}%")

print("\n2. EMPLOYMENT GAP (OECD)")
print(f"   - Average gap: {employment_df['gap_pp'].mean():.1f} percentage points")
print(f"   - Smallest gap: {employment_df['gap_pp'].min():.1f}pp ({employment_df.loc[employment_df['gap_pp'].idxmin(), 'country']})")
print(f"   - Largest gap: {employment_df['gap_pp'].max():.1f}pp ({employment_df.loc[employment_df['gap_pp'].idxmax(), 'country']})")

print("\n3. EU DISABILITY PREVALENCE (Eurostat GALI)")
print(f"   - EU27 total: {eu27['total_disability_rate_pct']}%")
print(f"   - Some limitation: {eu27['some_limitation_pct']}%")
print(f"   - Severe limitation: {eu27['severe_limitation_pct']}%")
print(f"   - EU employment gap: {eu27['employment_gap_pp']}pp")

print("\n4. HEALTHY LIFE EXPECTANCY (WHO HALE)")
latest_top = latest_hale.iloc[0]
latest_bottom = latest_hale.iloc[-1]
print(f"   - Highest: {latest_top['hale_years']:.1f} years ({latest_top['country_code']})")
print(f"   - Lowest: {latest_bottom['hale_years']:.1f} years ({latest_bottom['country_code']})")
print(f"   - Global disparity: {latest_top['hale_years'] - latest_bottom['hale_years']:.1f} years")

print("\n5. UN CRPD RATIFICATION")
print(f"   - Total signatories: {stats['total_signatures']}")
print(f"   - Total ratifications: {stats['total_ratifications']}")
print(f"   - Ratification rate: {stats['total_ratifications'] / crpd_data['metadata']['record_count'] * 100:.1f}%")
print(f"   - Optional Protocol ratifications: {stats['optional_protocol_ratifications']}")

print("\n6. DATASETS ANALYZED")
print(f"   1. WHO disability prevalence: {who_data['metadata']['sources'][2]['countries_covered']} countries")
print(f"   2. WHO GHO indicators: {gho_data['metadata']['total_records']:,} records")
print(f"   3. WHO HALE: {hale_df['country_code'].nunique()} countries, {hale_df['year'].min()}-{hale_df['year'].max()}")
print(f"   4. OECD: {len(employment_df)} countries")
print(f"   5. Eurostat: {len(eu_df)} EU/EEA countries")
print(f"   6. World Bank: {wb_data['metadata']['total_records']:,} records")
print(f"   7. UN CRPD: {crpd_data['metadata']['record_count']} countries")

print("\n" + "=" * 60)
print("Analysis complete.")

---

## Data Sources

1. **WHO Global Disability Prevalence**: WHO Global Report on Health Equity for Persons with Disabilities (2022), World Report on Disability (2011), UN SDG Database, World Bank WDI
2. **WHO GHO Disability Indicators**: Years Lived with Disability (YLD) and Disability-Adjusted Life Years (DALY), 2000-2021
3. **WHO HALE**: Healthy Life Expectancy by country, 2000-2021
4. **OECD Disability Data**: OECD reports (2022-2024), SOCX database, EU-SILC
5. **Eurostat EU Disability**: Global Activity Limitation Indicator (GALI), EU-SILC 2023
6. **World Bank Indicators**: Life expectancy, mortality, health expenditure, physician density, 2010-2023
7. **UN CRPD**: Convention on the Rights of Persons with Disabilities ratification status, OHCHR Treaty Body Database

All datasets fetched February 2026. See individual metadata files for detailed source documentation.

---

**Author**: Luke Steuber  
**License**: Data sourced from public international organizations (WHO, UN, OECD, Eurostat, World Bank). Analysis and visualizations by Luke Steuber.