# World Cup Socioeconomic Choropleth

Interactively explore GDP per capita, HDI, life expectancy, mean years of schooling, median age, and age skew for World Cup participant countries.

# Interactive Choropleth Visualization

This section provides an interactive choropleth tool for exploring the geographic distribution of socioeconomic indicators across countries for any selected year. The interface allows users to examine how different metrics vary spatially and to compare relative performance across nations.

## Purpose

The choropleth visualization illustrates cross-country differences in the following socioeconomic metrics:

* GDP per capita
* Human Development Index (HDI)
* Life expectancy
* Mean years of schooling
* Median age
* Age skew
* Normalized versions of each metric for distribution-based comparisons

By selecting a metric and a year, the user can study global patterns and identify broad development trends, disparities, and clusters.

## How It Works

1. **Data Source**
   The visualization relies on `metrics_df`, a unified long-form dataset built from multiple raw socioeconomic files.
   `normalize_columns` standardizes the identifier columns of each file, and the loading loop adds `value` and `metric`, so every metric shares the same structural columns:

   * `country`
   * `iso3`
   * `year`
   * `value`
   * `metric`

   Every file is truncated to `COMMON_YEAR_MAX`, the earliest of the files' latest years, so that no metric extends beyond the years the others cover.

2. **Metric Selection**
   A dropdown widget provides access to all available metrics, including both raw and normalized variants.
   Selecting a metric determines which values will be mapped onto the world choropleth.

3. **Year Selection**
   A slider widget lets the user choose any year within the valid range of the dataset.
   The year range is computed automatically based on the merged dataset.

4. **Dynamic Choropleth Construction**
   A dedicated function builds the choropleth using Plotly.
   The function:

   * Filters the dataset to the chosen metric and year
   * Sets the color scale dynamically based on the data range
   * Produces a hover-enabled world map with country-level values
   * Handles missing data gracefully by displaying a placeholder message when necessary

5. **Widget Callbacks**
   Each widget's `observe` method registers `update_plot` as the callback for changes to its `value`.
   When the user changes either control, the map is rebuilt, the output container is cleared, and the updated figure is rendered.
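
The data-preparation step (item 1) can be sketched in isolation. The block below is a minimal illustration on toy data, not the notebook's loader: the two small DataFrames, their values, and the names `sources`, `common_year_max`, and `long_df` are all hypothetical. It shows how a shared cutoff year is derived and how each file is reshaped into the `country`, `iso3`, `year`, `value`, `metric` layout.

```python
import pandas as pd

# Toy stand-ins for two source files (hypothetical values, not real data).
gdp = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "iso3": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "year": [2018, 2019, 2020, 2018, 2019, 2020],
    "gdp_per_capita": [100.0, 110.0, 120.0, 50.0, 55.0, 60.0],
})
hdi = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "iso3": ["AAA", "AAA", "BBB", "BBB"],
    "year": [2018, 2019, 2018, 2019],
    "hdi": [0.80, 0.81, 0.60, 0.62],
})

# Display label -> (frame, name of the column holding the metric).
sources = {"GDP per capita": (gdp, "gdp_per_capita"), "HDI": (hdi, "hdi")}

# Shared cutoff: the earliest of the per-file latest years.
common_year_max = min(df["year"].max() for df, _ in sources.values())

frames = []
for label, (df, value_col) in sources.items():
    # Keep only years up to the cutoff and the four columns of interest.
    tidy = df.loc[df["year"] <= common_year_max,
                  ["country", "iso3", "year", value_col]]
    # Rename the metric column to "value" and tag the rows with the label.
    tidy = tidy.rename(columns={value_col: "value"}).assign(metric=label)
    frames.append(tidy)

long_df = pd.concat(frames, ignore_index=True)
print(long_df)
```

Because the HDI toy file ends one year earlier, the GDP rows for the final year are dropped and both metrics end in the same year.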

## Components

* **Metric Dropdown:** Allows selection of the socioeconomic indicator to visualize.
* **Year Slider:** Allows selection of the year to be displayed.
* **Output Container:** Displays a dynamic Plotly choropleth map.
* **Update Function:** Refreshes the map when the user changes metric or year.

## Features

* Unified and standardized data loading across multiple source files
* Automatic detection and correction of inconsistent column names
* Enforcement of a shared year range for all metrics
* Safe re-run behavior through output clearing before each redraw
* A single Viridis color scale, fitted to the value range of each metric and year selection
* Works seamlessly with normalized and non-normalized versions of each metric
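
The column-name correction can be illustrated without pandas. The sketch below is one possible way to express the alias lookup, assuming the same alias lists the notebook uses; `ALIASES`, `build_rename_map`, and `duplicate_targets` are hypothetical names introduced here. It also flags a case worth watching for: a file that contains two aliases for the same concept (for example both `Country` and `Name`) would end up with two columns called `country` after renaming.

```python
from collections import Counter

# Alias table (hypothetical helper; the aliases mirror those in the notebook).
ALIASES = {
    "country": {"country", "nation", "name"},
    "iso3": {"iso3", "iso3 code", "iso3 alpha-code", "alpha_3", "code3"},
    "year": {"year", "yr", "time"},
}


def build_rename_map(columns):
    """Map each recognized raw column name to its standard name."""
    rename_map = {}
    for col in columns:
        # Compare case-insensitively and ignore surrounding whitespace.
        key = str(col).strip().lower()
        for standard, aliases in ALIASES.items():
            if key in aliases:
                rename_map[col] = standard
                break
    return rename_map


def duplicate_targets(rename_map):
    """Return standard names that more than one raw column maps to."""
    counts = Counter(rename_map.values())
    return sorted(name for name, n in counts.items() if n > 1)


raw_columns = ["Country", "ISO3 Alpha-code", "Time", "gdp_per_capita"]
print(build_rename_map(raw_columns))

# A file with two aliases for the same concept produces a collision.
print(duplicate_targets(build_rename_map(["Country", "Name", "Year"])))
```

The resulting dictionary can be passed to `DataFrame.rename(columns=...)`; checking `duplicate_targets` first is an optional safeguard, not something the notebook currently does.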

In [2]:
import pandas as pd
import plotly.express as px
from pathlib import Path
import ipywidgets as widgets
from IPython.display import display, clear_output

# -----------------------------------------------------------
# Locate data directory
# -----------------------------------------------------------
PROJECT_ROOT = Path(__file__).resolve().parents[2] if "__file__" in globals() else Path.cwd()
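# Walk upward until a "data" directory is found; stop at the filesystem root
while not (PROJECT_ROOT / "data").exists():
    if PROJECT_ROOT == PROJECT_ROOT.parent:
        raise FileNotFoundError("Could not find a 'data' directory in any parent folder.")
    PROJECT_ROOT = PROJECT_ROOT.parent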

DATA_DIR = PROJECT_ROOT / "data" / "created_datasets" / "socioeconomic"


# -----------------------------------------------------------
# Helper to normalize column names across all files
# -----------------------------------------------------------
def normalize_columns(df):
    """Normalize inconsistent column naming conventions across datasets.

    Many datasets use different labels for the same concept (e.g., "ISO3 Alpha-code",
    "iso3", "Country", "Name", "Year", "Time"). This helper standardizes them into
    consistent names so downstream merging is reliable.

    Args:
        df (pd.DataFrame): A raw socioeconomic dataset with heterogeneous columns.

    Returns:
        pd.DataFrame: A copy of the DataFrame with the following standardized columns:
            - 'country': country name
            - 'iso3': ISO3 country code
            - 'year': numeric year
    """
    rename_map = {}

    # Standardize country name
    for c in df.columns:
        if c.lower() in ["country", "nation", "name"]:
            rename_map[c] = "country"

    # Standardize ISO3 country code
    for c in df.columns:
        if c.lower() in ["iso3", "iso3 code", "iso3 alpha-code", "alpha_3", "code3"]:
            rename_map[c] = "iso3"

    # Standardize year
    for c in df.columns:
        if c.lower() in ["year", "yr", "time"]:
            rename_map[c] = "year"

    df = df.rename(columns=rename_map)
    return df


# -----------------------------------------------------------
# Metric sources: display label -> (filename, value column)
# -----------------------------------------------------------
metric_sources = {
    # GDP
    "GDP per capita": ("gdp_world_cup.csv", "gdp_per_capita"),
    "Normalized GDP/capita": ("gdp_world_cup.csv", "norm_gdp_per_capita"),

    # HDI
    "HDI": ("hdi_world_cup.csv", "hdi"),
    "Normalized HDI": ("hdi_world_cup.csv", "norm_hdi"),

    # Life Expectancy
    "Life expectancy": ("life_expectancy_world_cup.csv", "life_expectancy"),
    "Normalized Life expectancy": ("life_expectancy_world_cup.csv", "norm_life_expectancy"),

    # Schooling
    "Mean years of schooling": ("schooling_world_cup.csv", "mean_school_years"),
    "Normalized Mean years of schooling": ("schooling_world_cup.csv", "norm_mean_school_years"),

    # Population / Age Structure
    "Median age": ("world_cup_pop_long.csv", "Median Age"),
    "Normalized Median age": ("world_cup_pop_long.csv", "norm_median_age"),
    "Age Skew": ("world_cup_pop_long.csv", "skew"),
    "Normalized Age Skew": ("world_cup_pop_long.csv", "norm_skew"),
}


# -----------------------------------------------------------
# Find a COMMON maximum year across all files
# -----------------------------------------------------------
max_years = []

for label, (filename, value_col) in metric_sources.items():
    df = pd.read_csv(DATA_DIR / filename)
    df = normalize_columns(df)

    if "year" not in df.columns:
        raise KeyError(
            f"File '{filename}' is missing a recognized 'year' column "
            f"after normalization. Available columns: {df.columns.tolist()}"
        )

    max_years.append(df["year"].max())

COMMON_YEAR_MAX = min(max_years)


# -----------------------------------------------------------
# Load and normalize all metrics
# -----------------------------------------------------------
frames = []

for label, (filename, value_col) in metric_sources.items():
    df = pd.read_csv(DATA_DIR / filename)
    df = normalize_columns(df)

    # Ensure required structural columns exist
    if "country" not in df.columns or "iso3" not in df.columns:
        raise KeyError(
            f"File '{filename}' is missing 'country' or 'iso3' after normalization. "
            f"Columns: {df.columns.tolist()}"
        )

    # Restrict to valid year range
    df = df[df["year"] <= COMMON_YEAR_MAX]

    if value_col not in df.columns:
        raise KeyError(
            f"Column '{value_col}' missing in '{filename}'. "
            f"Columns: {df.columns.tolist()}"
        )

    # Create tidy subset
    subset = df[["country", "iso3", "year", value_col]].copy()
    subset = subset.rename(columns={value_col: "value"})
    subset["metric"] = label
    frames.append(subset)

# Merge all metrics into one long-form DataFrame
metrics_df = pd.concat(frames, ignore_index=True)
metrics_df["year"] = metrics_df["year"].astype(int)

year_min = metrics_df["year"].min()
year_max = metrics_df["year"].max()


# -----------------------------------------------------------
# Choropleth builder
# -----------------------------------------------------------
def build_choropleth(metric_label, year):
    """Construct a choropleth map for a given metric and year.

    Args:
        metric_label (str): Name of the socioeconomic metric
            (must match keys in metric_sources).
        year (int): Selected year for visualization.

    Returns:
        plotly.graph_objects.Figure: Fully assembled choropleth map.
    """
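    df = metrics_df[
        (metrics_df["metric"] == metric_label) &
        (metrics_df["year"] == year)
    ].dropna(subset=["value"])  # all-NaN selections fall through to the placeholder

    if df.empty:
        fig = px.choropleth()
        # Anchor the message to the figure itself; an empty map has no x/y axes
        fig.add_annotation(
            text="No data available",
            xref="paper", yref="paper", x=0.5, y=0.5,
            showarrow=False,
        )
        return fig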

    fig = px.choropleth(
        df,
        locations="iso3",
        color="value",
        hover_name="country",
        color_continuous_scale="Viridis",
        labels={"value": metric_label},
        range_color=(df["value"].min(), df["value"].max()),
    )

    fig.update_layout(
        title=f"{metric_label} ({year})",
        margin=dict(l=0, r=0, t=40, b=0)
    )

    return fig


# -----------------------------------------------------------
# Widget UI (clean, no duplicates)
# -----------------------------------------------------------
metric_dropdown = widgets.Dropdown(
    options=sorted(metric_sources.keys()),
    value="GDP per capita",
    description="Metric:",
    layout=widgets.Layout(width="350px"),
)

year_slider = widgets.IntSlider(
    min=year_min,
    max=year_max,
    value=year_min,
    step=1,
    description="Year:",
    continuous_update=False,
)

fig_output = widgets.Output()


def update_plot(*args):
    """Update the choropleth visualization based on widget selections.

    This callback is triggered whenever the metric dropdown or year slider
    is changed. It rebuilds the figure using `build_choropleth()`, clears
    the previous output, and displays the updated map.

    Args:
        *args: Ignored widget event metadata required by ipywidgets.
    """
    fig = build_choropleth(metric_dropdown.value, year_slider.value)
    with fig_output:
        clear_output(wait=True)
        display(fig)


# Attach widget listeners
metric_dropdown.observe(update_plot, names="value")
year_slider.observe(update_plot, names="value")

# Build UI
ui = widgets.VBox([
    widgets.HBox([metric_dropdown, year_slider]),
    fig_output
])

display(ui)
update_plot()



# Interactive Trend-Line Visualization

This section provides an interactive tool for examining how key socioeconomic metrics evolve over time for any country included in the unified dataset. The interface uses dropdown widgets to dynamically update a Plotly line chart based on user selections.

## Purpose

The trend-line component allows the user to visualize temporal patterns in several important socioeconomic indicators, including:

* GDP per capita
* Human Development Index (HDI)
* Life expectancy
* Mean years of schooling
* Median age
* Age skew (a measure of the population age distribution)

These metrics help illustrate long-run development trajectories and enable comparisons between countries across decades.

## How It Works

1. **Data Source**
   The visualization uses `metrics_df`, the standardized long-form dataset created in the choropleth section. It contains the following fields:

   * `country`
   * `iso3`
   * `year`
   * `metric`
   * `value`

2. **Metric Selection**
   Only non-normalized metrics are included.
   Normalized metrics are excluded from the trend-line interface because they are often normalized within each year, so their lines show a country's standing relative to other countries in that year rather than actual change over time.

3. **Country Selection**
   Selecting a country filters `metrics_df` to that country's time series for the chosen metric.

4. **Dynamic Plot Generation**
   The visualization uses Plotly to create an interactive line chart with:

   * Year markers
   * Hover tooltips
   * Responsive updates without producing duplicate figures

   Output is managed using an `ipywidgets.Output` container to ensure clean redraws.

5. **Widget Callbacks**
   The `observe` method is used to attach callback functions to the dropdowns.
   When either selection changes, the callback clears the previous plot and renders a new one.
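
The reason for excluding normalized metrics (item 2) is easier to see on a small example. The sketch below assumes min-max scaling within each year; the source does not state which normalization the `norm_*` columns use, so this is an illustration of the effect rather than a reproduction of the dataset. The toy values and the helper `min_max` are hypothetical.

```python
import pandas as pd

# Toy data: every country's raw value doubles each decade (hypothetical).
df = pd.DataFrame({
    "country": ["A", "B", "C"] * 3,
    "year": [2000] * 3 + [2010] * 3 + [2020] * 3,
    "value": [10.0, 20.0, 30.0, 20.0, 40.0, 60.0, 40.0, 80.0, 120.0],
})


def min_max(s):
    """Scale a Series to [0, 1] using its own minimum and maximum."""
    return (s - s.min()) / (s.max() - s.min())


# Assumed normalization: min-max applied separately within each year.
df["norm_value"] = df.groupby("year")["value"].transform(min_max)

# Country B's raw value grows fourfold, but its normalized value is flat.
b = df[df["country"] == "B"].sort_values("year")
print(b[["year", "value", "norm_value"]])
```

Under this assumption, country B's normalized series stays at the same level in every year even though its raw value keeps rising, which is why a trend line of the normalized metric would say little about change over time.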

## Components

* **Country Dropdown:** Allows the user to select the country whose time series will be displayed.
* **Metric Dropdown:** Provides a filtered list of non-normalized socioeconomic indicators.
* **Output Area:** Displays the Plotly time-series chart.
* **Callback Function:** Rebuilds the visualization whenever the country or metric selection changes.

## Features

* Prevents duplicated plots through proper output clearing
* Safe to rerun without accumulating widget state
* Includes Google-style docstrings for maintainability and clarity
* Fully compatible with the unified dataset created earlier in the notebook

In [4]:
import ipywidgets as widgets
from IPython.display import display, clear_output
import plotly.express as px

# -----------------------------------------------------------
# Cleanup any prior widgets (avoid duplicates)
# -----------------------------------------------------------
for wname in ["trend_country_dropdown", "trend_metric_dropdown",
              "trend_output", "trend_ui"]:
    obj = globals().get(wname)
    if obj is not None and hasattr(obj, "close"):
        obj.close()


# -----------------------------------------------------------
# Filter metrics: EXCLUDE normalized metrics
# -----------------------------------------------------------
def get_non_normalized_metrics(metrics_df):
    """Return a sorted list of metric names excluding normalized versions.

    Args:
        metrics_df (pd.DataFrame): Master socioeconomic dataset containing
            columns ['country', 'iso3', 'year', 'value', 'metric'].

    Returns:
        List[str]: Sorted list of metric names that do NOT start with
            'Normalized'.
    """
    return sorted([
        m for m in metrics_df["metric"].unique()
        if not m.lower().startswith("normalized")
    ])


metrics_no_norm = get_non_normalized_metrics(metrics_df)


# -----------------------------------------------------------
# Dropdown widgets
# -----------------------------------------------------------
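trend_countries = sorted(metrics_df["country"].dropna().unique())

trend_country_dropdown = widgets.Dropdown(
    options=trend_countries,
    # Fall back to the first country if the preferred default is absent
    value="United States" if "United States" in trend_countries else trend_countries[0],
    description="Country:",
    layout=widgets.Layout(width="350px")
)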

trend_metric_dropdown = widgets.Dropdown(
    options=metrics_no_norm,
    value=metrics_no_norm[0],
    description="Metric:",
    layout=widgets.Layout(width="350px")
)

trend_output = widgets.Output()


# -----------------------------------------------------------
# Trend plotting function
# -----------------------------------------------------------
def update_trend_plot(change=None):
    """Update the trend line plot based on selected country and metric.

    This function retrieves the selected country and metric from the
    dropdown widgets, filters the metrics_df DataFrame, and renders a
    Plotly line chart showing how the chosen metric evolves over time.

    Args:
        change (dict, optional): Callback event information from widget
            interaction. Unused but required for ipywidgets.observe().
    """
    country = trend_country_dropdown.value
    metric = trend_metric_dropdown.value

    # Filter data for selected country & metric
    sub = (
        metrics_df[
            (metrics_df["country"] == country)
            & (metrics_df["metric"] == metric)
        ]
        .sort_values("year")
    )

    with trend_output:
        clear_output(wait=True)

        if sub.empty:
            print("No data available for this selection.")
            return

        # Build trend line
        fig = px.line(
            sub,
            x="year",
            y="value",
            markers=True,
            title=f"{metric} over time — {country}",
            labels={"value": metric, "year": "Year"},
        )

        fig.update_layout(
            margin=dict(l=0, r=0, t=40, b=0)
        )

        display(fig)


# -----------------------------------------------------------
# Attach observers
# -----------------------------------------------------------
trend_country_dropdown.observe(update_trend_plot, names="value")
trend_metric_dropdown.observe(update_trend_plot, names="value")


# -----------------------------------------------------------
# UI layout
# -----------------------------------------------------------
trend_ui = widgets.VBox([
    widgets.HBox([trend_country_dropdown, trend_metric_dropdown]),
    trend_output
])


# -----------------------------------------------------------
# Display UI and initial plot
# -----------------------------------------------------------
display(trend_ui)
update_trend_plot()

