# 13 – Health Inequalities: Measuring the Gradient

**Learning Objectives:**
- Understand the distinction between absolute and relative measures of inequality
- Calculate the Slope Index of Inequality (SII) and Relative Index of Inequality (RII)
- Interpret concentration curves and concentration indices
- Apply these measures to dietary intake and nutrition-related health outcomes
- Critically evaluate whether inequalities are widening or narrowing over time

---

## 1. Introduction: Why Measure Inequalities?

Average population health can improve while inequalities widen. Consider:

- Life expectancy in England increased for all groups between 2001 and 2019
- But the gap between the most and least deprived areas *also* increased

If we only track averages, we miss this divergence. Health inequality metrics help us:

1. **Describe** the current distribution of health across social groups
2. **Monitor** whether policies are reducing or widening gaps
3. **Target** interventions toward those in greatest need

### The Marmot Curve

Health follows a **gradient** — it's not simply that the poorest are unhealthy. Each step up the socioeconomic ladder is associated with better health outcomes. This gradient exists for:

- Mortality
- Chronic disease prevalence
- Dietary quality
- Obesity prevalence
- Mental health

## 2. Setup

In [None]:
# ============================================================
# Bootstrap cell (works both locally and in Colab)
# ============================================================

import os
import sys
import pathlib
import subprocess

REPO_URL = "https://github.com/ggkuhnle/phn-epi.git"
REPO_DIR = "phn-epi"

cwd = pathlib.Path.cwd()

if (cwd / "scripts" / "epi_utils.py").is_file():
    repo_root = cwd
elif (cwd.parent / "scripts" / "epi_utils.py").is_file():
    repo_root = cwd.parent
else:
    repo_root = cwd / REPO_DIR
    if not repo_root.is_dir():
        print(f"Cloning repository from {REPO_URL} ...")
        subprocess.run(["git", "clone", REPO_URL, str(repo_root)], check=True)
    else:
        print(f"Using existing repository at {repo_root}")
    os.chdir(repo_root)
    repo_root = pathlib.Path.cwd()

scripts_dir = repo_root / "scripts"
if str(scripts_dir) not in sys.path:
    sys.path.insert(0, str(scripts_dir))

print(f"Repository root: {repo_root}")
print("Bootstrap completed successfully.")

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from ipywidgets import interact, FloatSlider, Dropdown, VBox, HBox, Output
import ipywidgets as widgets
from IPython.display import display

from epi_utils import (
    calculate_sii, calculate_rii, calculate_concentration_index,
    plot_concentration_curve, INEQUALITY_EXAMPLE_DATA
)

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = [10, 6]
np.random.seed(42)

print("Libraries loaded successfully.")

## 3. Example Data: Health Outcomes by Deprivation

We'll use data structured by **Index of Multiple Deprivation (IMD) quintiles**, where Q1 is most deprived and Q5 is least deprived.

In [None]:
# Example: Life expectancy by deprivation quintile (England, illustrative)
life_expectancy_data = pd.DataFrame({
    'quintile': ['Q1 (Most deprived)', 'Q2', 'Q3', 'Q4', 'Q5 (Least deprived)'],
    'quintile_rank': [1, 2, 3, 4, 5],
    'population_share': [0.20, 0.20, 0.20, 0.20, 0.20],
    'life_expectancy_male': [74.0, 77.2, 79.1, 80.5, 83.2],
    'life_expectancy_female': [78.8, 81.2, 82.8, 84.0, 86.3]
})

print("Life Expectancy by Deprivation Quintile (Illustrative Data)")
print("=" * 70)
display(life_expectancy_data)

In [None]:
# Visualise the gradient
fig, ax = plt.subplots(figsize=(10, 6))

x = np.arange(5)
width = 0.35

bars1 = ax.bar(x - width/2, life_expectancy_data['life_expectancy_male'], width, 
               label='Male', color='steelblue')
bars2 = ax.bar(x + width/2, life_expectancy_data['life_expectancy_female'], width,
               label='Female', color='coral')

ax.set_xlabel('Deprivation Quintile')
ax.set_ylabel('Life Expectancy (years)')
ax.set_title('The Social Gradient in Life Expectancy')
ax.set_xticks(x)
ax.set_xticklabels(['Q1\n(Most deprived)', 'Q2', 'Q3', 'Q4', 'Q5\n(Least deprived)'])
ax.legend()
ax.set_ylim(70, 90)

# Add gap annotation
gap_male = life_expectancy_data['life_expectancy_male'].iloc[-1] - life_expectancy_data['life_expectancy_male'].iloc[0]
ax.annotate(f'Gap: {gap_male:.1f} years', xy=(4, 83.5), fontsize=11, fontweight='bold')

plt.tight_layout()
plt.show()

print(f"\nMale life expectancy gap (Q5 - Q1): {gap_male:.1f} years")
gap_female = life_expectancy_data['life_expectancy_female'].iloc[-1] - life_expectancy_data['life_expectancy_female'].iloc[0]
print(f"Female life expectancy gap (Q5 - Q1): {gap_female:.1f} years")

## 4. Absolute vs Relative Measures

### Simple Gap Measures

The simplest inequality measures compare extremes:

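- **Absolute gap** = Value in least deprived − Value in most deprived
- **Relative gap (ratio)** = Value in least deprived ÷ Value in most deprived

For adverse outcomes such as obesity, the order is usually reversed (most deprived minus, or divided by, least deprived) so that the gap is positive and the ratio exceeds 1, as in the obesity example that follows.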

### The Problem with Simple Gaps

These measures only use two data points and ignore the middle quintiles. They can give misleading impressions if the gradient is non-linear.
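
To make this limitation concrete, the sketch below compares two hypothetical prevalence profiles (invented for illustration, not drawn from any dataset) that share the same Q1 and Q5 values but differ in the middle quintiles. The helper `simple_gaps` is an illustrative name, not part of `epi_utils`.

```python
import numpy as np

# Hypothetical prevalence profiles, ordered Q1 (most deprived) to Q5 (least deprived).
# Values are invented for illustration only.
linear_profile = np.array([0.34, 0.305, 0.27, 0.235, 0.20])
threshold_profile = np.array([0.34, 0.21, 0.21, 0.21, 0.20])

def simple_gaps(values):
    """Return (absolute gap, ratio) comparing Q1 with Q5 only."""
    return values[0] - values[-1], values[0] / values[-1]

for name, profile in [("linear", linear_profile), ("threshold", threshold_profile)]:
    abs_gap, ratio = simple_gaps(profile)
    print(f"{name:>9}: absolute gap = {abs_gap:.2f}, ratio = {ratio:.2f}, "
          f"mean = {profile.mean():.3f}")
```

Both profiles report the same gap and ratio, even though in the second one the excess prevalence is confined to the most deprived quintile and the population mean is lower.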

In [None]:
# Example: Obesity prevalence by deprivation
obesity_data = pd.DataFrame({
    'quintile': ['Q1', 'Q2', 'Q3', 'Q4', 'Q5'],
    'quintile_rank': [1, 2, 3, 4, 5],
    'population_share': [0.20, 0.20, 0.20, 0.20, 0.20],
    'obesity_prevalence': [0.34, 0.30, 0.27, 0.24, 0.20]
})

# Simple gap measures
absolute_gap = obesity_data['obesity_prevalence'].iloc[0] - obesity_data['obesity_prevalence'].iloc[-1]
relative_ratio = obesity_data['obesity_prevalence'].iloc[0] / obesity_data['obesity_prevalence'].iloc[-1]

print("Obesity Prevalence by Deprivation Quintile")
print("=" * 50)
display(obesity_data)
print(f"\nAbsolute gap (Q1 - Q5): {absolute_gap:.1%}")
print(f"Relative ratio (Q1 / Q5): {relative_ratio:.2f}")
print(f"\nInterpretation: Obesity prevalence in the most deprived quintile is {relative_ratio:.1f} times that in the least deprived quintile")

## 5. Slope Index of Inequality (SII)

The SII uses **all** data points by fitting a regression line across the social gradient.

### Method

1. Rank groups from most to least deprived
2. Calculate the **cumulative population midpoint** (ridit) for each group
3. Regress the health outcome on the ridit score
4. The slope coefficient is the SII

$$\text{SII} = \beta_1 \text{ from: } Y = \beta_0 + \beta_1 \times \text{ridit}$$

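The SII represents the **absolute difference** in the outcome between the theoretical least deprived (ridit = 1) and most deprived (ridit = 0) individuals. With this coding, a negative SII means the outcome is lower among the less deprived.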
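
The four steps above can also be sketched for groups of unequal size. This is a minimal illustration, assuming a population-weighted least-squares fit via `np.polyfit`. The population shares below are hypothetical, and `weighted_sii` is an illustrative helper rather than a function from `epi_utils`.

```python
import numpy as np

def weighted_sii(outcome, pop_share):
    """Population-weighted SII for groups ordered most to least deprived."""
    outcome = np.asarray(outcome, dtype=float)
    pop_share = np.asarray(pop_share, dtype=float)
    pop_share = pop_share / pop_share.sum()   # normalise so shares sum to 1

    cum_pop = np.cumsum(pop_share)
    ridit = cum_pop - pop_share / 2           # midpoint of each group's cumulative range

    # np.polyfit applies weights to the unsquared residuals, so sqrt(share)
    # weights each squared residual by the group's population share.
    slope, intercept = np.polyfit(ridit, outcome, deg=1, w=np.sqrt(pop_share))
    return slope, intercept, ridit

# Hypothetical groups of unequal size (illustrative values only)
shares = [0.10, 0.25, 0.30, 0.25, 0.10]
prevalence = [0.34, 0.30, 0.27, 0.24, 0.20]

sii, intercept, ridit = weighted_sii(prevalence, shares)
print("Ridit scores:", np.round(ridit, 3))
print(f"Weighted SII: {sii:.3f}")
```

When all groups have the same population share, the weights are constant and the result coincides with an ordinary unweighted regression on the ridit scores.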

In [None]:
def calculate_sii_manual(data, outcome_col, pop_share_col='population_share'):
    """
    Calculate Slope Index of Inequality.
    
    Assumes data is ordered from most to least deprived.
    """
    # Calculate cumulative population share
    data = data.copy()
    data['cum_pop'] = data[pop_share_col].cumsum()
    data['cum_pop_lag'] = data['cum_pop'].shift(1, fill_value=0)
    
    # Ridit = midpoint of cumulative population
    data['ridit'] = (data['cum_pop'] + data['cum_pop_lag']) / 2
    
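    # Ordinary least squares regression of the outcome on the ridit.
    # Note: this fit is unweighted. It matches a population-weighted fit
    # only when all groups have equal population shares (as in this notebook).
    slope, intercept, r_value, p_value, std_err = stats.linregress(
        data['ridit'], data[outcome_col]
    )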
    
    return {
        'sii': slope,
        'intercept': intercept,
        'r_squared': r_value**2,
        'p_value': p_value,
        'data_with_ridit': data
    }

# Calculate SII for obesity
sii_result = calculate_sii_manual(obesity_data, 'obesity_prevalence')

print("SII Calculation for Obesity Prevalence")
print("=" * 50)
print(f"\nSlope Index of Inequality (SII): {sii_result['sii']:.3f}")
print(f"\nInterpretation: Moving from the most to least deprived")
print(f"position is associated with a {abs(sii_result['sii']) * 100:.1f} percentage point")
print(f"{'decrease' if sii_result['sii'] < 0 else 'increase'} in obesity prevalence.")

# Show the ridit calculation
print("\nRidit scores (cumulative population midpoints):")
display(sii_result['data_with_ridit'][['quintile', 'obesity_prevalence', 'ridit']])

In [None]:
# Visualise the SII
fig, ax = plt.subplots(figsize=(10, 6))

data = sii_result['data_with_ridit']

# Plot points
ax.scatter(data['ridit'], data['obesity_prevalence'], s=100, color='steelblue', zorder=5)

# Plot regression line
x_line = np.linspace(0, 1, 100)
y_line = sii_result['intercept'] + sii_result['sii'] * x_line
ax.plot(x_line, y_line, 'r-', linewidth=2, label=f'SII = {sii_result["sii"]:.3f}')

# Labels
for i, row in data.iterrows():
    ax.annotate(row['quintile'], (row['ridit'], row['obesity_prevalence']),
                textcoords="offset points", xytext=(0, 10), ha='center')

ax.set_xlabel('Ridit Score (0 = most deprived, 1 = least deprived)')
ax.set_ylabel('Obesity Prevalence')
ax.set_title('Slope Index of Inequality for Obesity')
ax.legend()
ax.set_xlim(-0.05, 1.05)
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: '{:.0%}'.format(y)))

plt.tight_layout()
plt.show()

## 6. Relative Index of Inequality (RII)

The RII expresses inequality in **relative** terms:

$$\text{RII} = \frac{\text{SII}}{\text{Mean outcome}}$$

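Alternatively, the RII can be defined as the ratio of the predicted values at the two extremes of the distribution (ridit = 0 and ridit = 1). The two definitions are related but give different numbers, so always state which one is being reported.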

The RII is useful for:
- Comparing inequalities across outcomes with different scales
- Comparing inequalities across time when absolute levels change
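
The mean-based and ratio forms of the RII are on different scales, but they are linked. As a worked sketch, assume the fitted line passes through the point $(0.5, \bar{Y})$, where $\bar{Y}$ is the mean outcome; this holds for equal-sized groups or for a population-weighted fit. Writing $R = \text{SII} / \bar{Y}$ and recalling that $\text{SII} = \beta_1$:

$$\beta_0 = \bar{Y} - \tfrac{1}{2}\,\beta_1, \qquad \beta_0 + \beta_1 = \bar{Y} + \tfrac{1}{2}\,\beta_1$$

$$\frac{\hat{Y}(\text{ridit}=0)}{\hat{Y}(\text{ridit}=1)} = \frac{\beta_0}{\beta_0 + \beta_1} = \frac{1 - R/2}{1 + R/2}$$

For example, a mean-based RII of $R = -0.5$ corresponds to a ratio of $1.25 / 0.75 \approx 1.67$.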

In [None]:
# Calculate RII
mean_obesity = obesity_data['obesity_prevalence'].mean()
rii = sii_result['sii'] / mean_obesity

# Alternative: ratio of predicted values
predicted_q1 = sii_result['intercept'] + sii_result['sii'] * 0  # ridit = 0
predicted_q5 = sii_result['intercept'] + sii_result['sii'] * 1  # ridit = 1
rii_ratio = predicted_q1 / predicted_q5

print("Relative Index of Inequality")
print("=" * 50)
print(f"Mean obesity prevalence: {mean_obesity:.1%}")
print(f"SII: {sii_result['sii']:.3f}")
print(f"RII (SII/mean): {rii:.2f}")
print(f"\nPredicted at most deprived (ridit=0): {predicted_q1:.1%}")
print(f"Predicted at least deprived (ridit=1): {predicted_q5:.1%}")
print(f"RII as ratio: {rii_ratio:.2f}")

## 7. Concentration Curves and Index

The **concentration curve** is analogous to the Lorenz curve for income inequality.

### Construction

1. Rank the population by socioeconomic position (poorest to richest)
2. Plot cumulative % of population (x-axis) against cumulative % of health outcome (y-axis)
3. The **line of equality** is the 45° diagonal

### Interpretation

- Curve **below** diagonal: outcome concentrated among the better-off
- Curve **above** diagonal: outcome concentrated among the worse-off
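- **Concentration Index** = 2 × area between curve and diagonal (ranges from −1 to +1); by convention it is negative when the curve lies above the diagonal and positive when it lies below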
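
As a check on the definition, the sketch below computes the concentration index for grouped data in two ways: from the area under the concentration curve, and from a grouped-data rank formula based on each group's ridit score. The prevalences are the illustrative obesity values used elsewhere in this notebook; the variable names are chosen for this example only.

```python
import numpy as np

# Illustrative grouped data, ordered most to least deprived
pop_share = np.array([0.20, 0.20, 0.20, 0.20, 0.20])
prevalence = np.array([0.34, 0.30, 0.27, 0.24, 0.20])

mean_outcome = np.sum(pop_share * prevalence)

# Route 1: area under the concentration curve (trapezoid rule, written out)
cum_pop = np.concatenate([[0.0], np.cumsum(pop_share)])
cum_outcome = np.concatenate([[0.0], np.cumsum(pop_share * prevalence) / mean_outcome])
area = np.sum(np.diff(cum_pop) * (cum_outcome[:-1] + cum_outcome[1:]) / 2)
ci_area = 1 - 2 * area

# Route 2: grouped-data rank formula, using the ridit of each group
ridit = np.cumsum(pop_share) - pop_share / 2
ci_rank = 2 / mean_outcome * np.sum(pop_share * prevalence * ridit) - 1

print(f"CI from area:         {ci_area:.4f}")
print(f"CI from rank formula: {ci_rank:.4f}")
```

The two routes agree for grouped data, and the negative sign reflects a curve lying above the line of equality.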

In [None]:
def plot_concentration_curve_manual(data, outcome_col, pop_col='population_share', ax=None):
    """
    Plot concentration curve for a health outcome.
    
    Assumes data is ordered from most to least deprived.
    """
    if ax is None:
        fig, ax = plt.subplots(figsize=(8, 8))
    
    data = data.copy()
    
    # Calculate cumulative shares
    total_outcome = (data[outcome_col] * data[pop_col]).sum()
    data['outcome_share'] = (data[outcome_col] * data[pop_col]) / total_outcome
    
    data['cum_pop'] = data[pop_col].cumsum()
    data['cum_outcome'] = data['outcome_share'].cumsum()
    
    # Add origin
    cum_pop = np.concatenate([[0], data['cum_pop'].values])
    cum_outcome = np.concatenate([[0], data['cum_outcome'].values])
    
    # Plot
    ax.plot([0, 1], [0, 1], 'k--', label='Line of equality')
    ax.plot(cum_pop, cum_outcome, 'b-', linewidth=2, marker='o', label='Concentration curve')
    ax.fill_between(cum_pop, cum_pop, cum_outcome, alpha=0.3)
    
    ax.set_xlabel('Cumulative % of population (most to least deprived)')
    ax.set_ylabel('Cumulative % of health outcome')
    ax.legend()
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    ax.set_aspect('equal')
    
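    # Calculate concentration index (simplified)
    # CI = 1 - 2 * area under concentration curve
    # NumPy 2.0 introduced np.trapezoid and deprecated np.trapz; use whichever exists
    trapezoid = getattr(np, 'trapezoid', None) or np.trapz
    area = trapezoid(cum_outcome, cum_pop)
    ci = 1 - 2 * area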
    
    return ci, data

# Plot for obesity
fig, ax = plt.subplots(figsize=(8, 8))
ci, _ = plot_concentration_curve_manual(obesity_data, 'obesity_prevalence', ax=ax)
ax.set_title(f'Concentration Curve for Obesity\nConcentration Index = {ci:.3f}')
plt.tight_layout()
plt.show()

print(f"\nConcentration Index: {ci:.3f}")
print(f"\nInterpretation: A negative CI indicates the outcome (obesity)")
print(f"is concentrated among the more deprived groups.")

## 8. Application: Dietary Quality by Deprivation

Let's apply these measures to dietary intake data.

In [None]:
# Example: Fruit and vegetable intake by deprivation (g/day)
diet_data = pd.DataFrame({
    'quintile': ['Q1', 'Q2', 'Q3', 'Q4', 'Q5'],
    'quintile_rank': [1, 2, 3, 4, 5],
    'population_share': [0.20, 0.20, 0.20, 0.20, 0.20],
    'fruit_veg_g': [220, 260, 300, 340, 400],  # grams per day
    'free_sugar_pct': [14.5, 13.0, 11.5, 10.5, 9.0],  # % of energy
    'fibre_g': [14, 16, 18, 20, 23]  # grams per day
})

print("Dietary Intake by Deprivation Quintile")
print("=" * 60)
display(diet_data)

In [None]:
# Calculate SII and RII for each dietary variable
dietary_inequalities = []

for outcome in ['fruit_veg_g', 'free_sugar_pct', 'fibre_g']:
    sii_res = calculate_sii_manual(diet_data, outcome)
    mean_val = diet_data[outcome].mean()
    rii = sii_res['sii'] / mean_val
    
    dietary_inequalities.append({
        'Outcome': outcome,
        'Mean': mean_val,
        'SII': sii_res['sii'],
        'RII': rii,
        'Direction': 'Pro-rich' if sii_res['sii'] > 0 else 'Pro-poor'
    })

ineq_df = pd.DataFrame(dietary_inequalities)

print("Dietary Inequalities Summary")
print("=" * 70)
display(ineq_df.style.format({'Mean': '{:.1f}', 'SII': '{:.1f}', 'RII': '{:.2f}'}))

print("\nInterpretation:")
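print("- Fruit & veg: higher intake among the less deprived (pro-rich; a beneficial intake, so the more deprived are disadvantaged)")
print("- Free sugar: higher intake among the more deprived (pro-poor; an adverse intake, so the more deprived are again disadvantaged)")
print("- Fibre: higher intake among the less deprived (pro-rich; a beneficial intake, so the more deprived are disadvantaged)")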

## 9. Monitoring Change Over Time

A key use of inequality measures is tracking whether gaps are widening or narrowing.
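
Absolute and relative measures can tell different stories about the same trend. The sketch below uses hypothetical prevalences (invented for illustration) to compare a uniform absolute fall with a uniform proportional fall; `gap_and_ratio` is an illustrative helper, not part of `epi_utils`.

```python
import numpy as np

# Hypothetical prevalences, Q1 (most deprived) to Q5 (least deprived)
baseline = np.array([0.30, 0.27, 0.24, 0.21, 0.18])

scenarios = {
    "baseline": baseline,
    "uniform 5-point fall": baseline - 0.05,   # same absolute drop everywhere
    "uniform 20% fall": baseline * 0.80,       # same proportional drop everywhere
}

def gap_and_ratio(values):
    """Absolute gap (Q1 - Q5) and ratio (Q1 / Q5)."""
    return values[0] - values[-1], values[0] / values[-1]

for label, values in scenarios.items():
    gap, ratio = gap_and_ratio(values)
    print(f"{label:<22} gap = {gap:.3f}  ratio = {ratio:.2f}")
```

A uniform absolute fall leaves the gap unchanged but raises the ratio, while a uniform proportional fall shrinks the gap and leaves the ratio unchanged. This is why trends are best reported with both an absolute and a relative measure.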

In [None]:
# Example: Obesity trends over time
time_data = {
    2010: {'Q1': 0.28, 'Q2': 0.25, 'Q3': 0.23, 'Q4': 0.21, 'Q5': 0.18},
    2015: {'Q1': 0.31, 'Q2': 0.27, 'Q3': 0.25, 'Q4': 0.22, 'Q5': 0.19},
    2020: {'Q1': 0.34, 'Q2': 0.30, 'Q3': 0.27, 'Q4': 0.24, 'Q5': 0.20}
}

# Calculate SII for each year
trend_results = []

for year, prevalences in time_data.items():
    df = pd.DataFrame({
        'quintile': list(prevalences.keys()),
        'population_share': [0.20] * 5,
        'obesity': list(prevalences.values())
    })
    sii_res = calculate_sii_manual(df, 'obesity')
    mean_val = df['obesity'].mean()
    
    trend_results.append({
        'Year': year,
        'Mean prevalence': mean_val,
        'Q1 prevalence': prevalences['Q1'],
        'Q5 prevalence': prevalences['Q5'],
        'Absolute gap': prevalences['Q1'] - prevalences['Q5'],
        'SII': sii_res['sii'],
        'RII': sii_res['sii'] / mean_val
    })

trend_df = pd.DataFrame(trend_results)

print("Obesity Inequality Trends")
print("=" * 80)
display(trend_df.style.format({
    'Mean prevalence': '{:.1%}',
    'Q1 prevalence': '{:.1%}',
    'Q5 prevalence': '{:.1%}',
    'Absolute gap': '{:.1%}',
    'SII': '{:.3f}',
    'RII': '{:.2f}'
}))

In [None]:
# Visualise trends
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Plot 1: Prevalence by quintile over time
ax = axes[0]
for q in ['Q1', 'Q5']:
    values = [time_data[y][q] for y in [2010, 2015, 2020]]
    ax.plot([2010, 2015, 2020], values, marker='o', label=q, linewidth=2)
ax.set_xlabel('Year')
ax.set_ylabel('Obesity Prevalence')
ax.set_title('Obesity Prevalence: Q1 vs Q5')
ax.legend()
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: '{:.0%}'.format(y)))

# Plot 2: Absolute gap
ax = axes[1]
ax.bar(trend_df['Year'].astype(str), trend_df['Absolute gap'], color='coral')
ax.set_xlabel('Year')
ax.set_ylabel('Absolute Gap (Q1 - Q5)')
ax.set_title('Absolute Inequality (Gap)')
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: '{:.0%}'.format(y)))

# Plot 3: SII
ax = axes[2]
ax.bar(trend_df['Year'].astype(str), trend_df['SII'].abs(), color='steelblue')
ax.set_xlabel('Year')
ax.set_ylabel('|SII| (magnitude of Slope Index of Inequality)')
ax.set_title('SII Over Time')

plt.tight_layout()
plt.show()

print("\nConclusion: Both the absolute gap and the magnitude of the SII have increased over time,")
print("indicating widening inequalities in obesity prevalence.")

## 10. Interactive Explorer

Explore how different patterns of inequality affect the SII and concentration index.

In [None]:
# Interactive inequality explorer
q1_slider = FloatSlider(value=0.35, min=0.10, max=0.50, step=0.01, description='Q1:')
q3_slider = FloatSlider(value=0.27, min=0.10, max=0.50, step=0.01, description='Q3:')
q5_slider = FloatSlider(value=0.20, min=0.10, max=0.50, step=0.01, description='Q5:')

output = Output()

def update_inequality(change=None):
    # Interpolate Q2 and Q4
    q1, q3, q5 = q1_slider.value, q3_slider.value, q5_slider.value
    q2 = (q1 + q3) / 2
    q4 = (q3 + q5) / 2
    
    data = pd.DataFrame({
        'quintile': ['Q1', 'Q2', 'Q3', 'Q4', 'Q5'],
        'population_share': [0.20] * 5,
        'outcome': [q1, q2, q3, q4, q5]
    })
    
    with output:
        output.clear_output(wait=True)
        
        # Calculate measures
        sii_res = calculate_sii_manual(data, 'outcome')
        mean_val = data['outcome'].mean()
        rii = sii_res['sii'] / mean_val
        
        fig, axes = plt.subplots(1, 2, figsize=(12, 5))
        
        # Bar chart
        ax = axes[0]
        ax.bar(data['quintile'], data['outcome'], color='steelblue')
        ax.set_ylabel('Prevalence')
        ax.set_title(f'Outcome by Quintile\nSII = {sii_res["sii"]:.3f}, RII = {rii:.2f}')
        ax.set_ylim(0, 0.55)
        
        # Concentration curve
        ax = axes[1]
        ci, _ = plot_concentration_curve_manual(data, 'outcome', ax=ax)
        ax.set_title(f'Concentration Curve\nCI = {ci:.3f}')
        
        plt.tight_layout()
        plt.show()

for slider in [q1_slider, q3_slider, q5_slider]:
    slider.observe(update_inequality, names='value')

print("Adjust prevalence values to see how inequality measures change:")
print("(Q2 and Q4 are interpolated)")
display(VBox([q1_slider, q3_slider, q5_slider, output]))
update_inequality()

## 11. Discussion Questions

1. **Absolute vs relative**: If obesity prevalence doubles in all quintiles, what happens to the absolute gap? The relative ratio? Which gives a better picture of inequality?

2. **Policy targets**: PHE/OHID targets often focus on reducing absolute gaps. What are the advantages and disadvantages of this approach compared to targeting relative inequality?

3. **The concentration index for mortality**: If we calculate a concentration index for mortality (which is "bad"), should we expect it to be positive or negative? Why?

4. **Proportionate universalism**: Marmot's review recommended "proportionate universalism" — universal services with intensity proportionate to need. How might you measure whether a dietary intervention follows this principle?

## 12. Exercises

### Exercise 1: Calculate SII for Life Expectancy

Using the `life_expectancy_data` from Section 3, calculate the SII for male life expectancy.

In [None]:
# YOUR CODE HERE



### Exercise 2: Compare Two Time Points

The data below shows fruit and vegetable intake in 2010 and 2020. Calculate the SII for both years. Have inequalities widened or narrowed?

In [None]:
fv_2010 = pd.DataFrame({
    'quintile': ['Q1', 'Q2', 'Q3', 'Q4', 'Q5'],
    'population_share': [0.20] * 5,
    'fruit_veg_g': [200, 230, 260, 290, 340]
})

fv_2020 = pd.DataFrame({
    'quintile': ['Q1', 'Q2', 'Q3', 'Q4', 'Q5'],
    'population_share': [0.20] * 5,
    'fruit_veg_g': [220, 260, 300, 340, 400]
})

# YOUR CODE HERE



---

## Summary

- Health inequalities follow a **gradient** — not just a gap between richest and poorest
- **SII** (Slope Index of Inequality) measures the absolute difference across the gradient
- **RII** (Relative Index of Inequality) expresses this relative to the mean
- **Concentration curves and indices** visualise and quantify the distribution of health
- Monitoring over time reveals whether inequalities are widening or narrowing
- Dietary quality shows strong socioeconomic gradients, contributing to health inequalities

---

## References

- Marmot M. (2010). Fair Society, Healthy Lives (The Marmot Review).
- Mackenbach JP & Kunst AE. (1997). Measuring the magnitude of socio-economic inequalities in health. *Social Science & Medicine*.
- O'Donnell O et al. (2008). Analyzing Health Equity Using Household Survey Data. World Bank.
- Public Health England. (2017). Health Profile for England.