# 🌍 COVID‑19 Trade Data Analysis

This interactive notebook allows exploration of trade data during the COVID‑19 pandemic:
- Select multiple countries & date ranges
- View correlation heatmaps
- Run ARIMA forecasting for trade recovery
- Cluster countries by trade recovery patterns

# 📌 Executive Summary

This project presents an in-depth, interactive analysis of international trade patterns during the COVID-19 pandemic using real trade data. The goal is to identify which countries were most affected, how quickly they recovered, and how trade behavior shifted across different transport modes and commodities.

Using a combination of **Python**, **pandas**, **matplotlib**, **seaborn**, and **ipywidgets**, the project provides dynamic visual insights that allow users to:
- Compare trade trends across selected countries over time.
- Track recovery speed by identifying when countries returned to 90% of pre-COVID trade levels.
- Analyze normalized recovery curves to compare the pace and strength of rebound across nations.
- Cluster countries based on trade recovery patterns using KMeans clustering.
- Explore transport mode contributions and how they were impacted during and after the pandemic.
- Switch between **trade values in Tonnes and Dollars** to view trade by volume or by value.

Additional summary widgets offer on-demand answers to key questions such as:
- Which country recovered the fastest?
- Which transport mode was most affected?
- Which countries saw the sharpest trade drops?
- What changed in trade behavior post-COVID?

The final notebook serves as both a technical portfolio piece and a functional interactive tool for exploring COVID-era trade dynamics.

## 📌 Interactive Trade Data Explorer

This section allows you to explore COVID-19 trade data interactively using filters such as country, direction (import/export), transport mode, trade value, and date period.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import ipywidgets as widgets
from IPython.display import display
import numpy as np

# Load your dataset
df = pd.read_csv("effects-of-covid-19-on-trade-at-15-december-2021-provisional.csv")

# Convert date column
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True, errors='coerce')
df.dropna(subset=['Date'], inplace=True)
df['Year'] = df['Date'].dt.year

# Replace inf values
df.replace([np.inf, -np.inf], np.nan, inplace=True)
df.dropna(subset=['Value'], inplace=True)

# Display basic info
df.info()
df.head()

## This code finds, for each country, the first month in which its average trade value recovered to **at least 90%** of its 2019 average, which is treated as the **"Pre-COVID"** benchmark.

In [None]:
# STEP 1: Get 2019 average trade per country
pre_covid = df[df['Date'].dt.year == 2019].groupby('Country')['Value'].mean()

# STEP 2: For each country, find the first post-onset month at or above 90% of its 2019 average
covid_onset = pd.Timestamp('2020-04-01')  # assumed start of the search window; adjust as needed
recovery_list = []
for country in df['Country'].unique():
    country_df = df[df['Country'] == country].sort_values('Date')
    baseline = pre_covid.get(country)
    if pd.isna(baseline) or baseline == 0:
        continue
    recovery_threshold = 0.9 * baseline
    monthly_avg = country_df.resample('M', on='Date')['Value'].mean()
    # Search only after the onset; otherwise pre-2020 months would count as "recovery"
    monthly_avg = monthly_avg[monthly_avg.index >= covid_onset]
    recovery_date = monthly_avg[monthly_avg >= recovery_threshold].first_valid_index()
    if pd.notna(recovery_date):
        recovery_list.append((country, recovery_date))

# STEP 3: Convert to DataFrame
fastest_recovery_df = pd.DataFrame(recovery_list, columns=['Country', 'Recovery Date'])
fastest_recovery_df['Recovery Date'] = pd.to_datetime(fastest_recovery_df['Recovery Date'])
fastest_recovery_df = fastest_recovery_df.dropna().sort_values('Recovery Date')

## 📊 Summary Insight Dropdown

This section introduces an interactive dropdown widget that lets users explore quick, high-level insights from the trade dataset based on their selected filters (Country, Direction, Transport Mode, Year Range, and Measure: Tonnes or $).

The dropdown provides the following insights:

- **📈 Peak Trade Year**: Displays the year with the highest total trade value within the selected filter.
- **🔻 Lowest Trade Year**: Shows the year with the lowest trade performance in the given scope.
- **📉 Overall Trend Direction**: Analyzes whether the total trade value is generally increasing or decreasing over time.

These insights update dynamically based on the current selections in this section's filter widgets.

This feature helps users quickly identify meaningful trends and highlights in the data without needing to analyze charts manually.

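The three insights listed above can all be derived from yearly totals such as the `grouped` series built in the cell that follows. Here is a minimal sketch on made-up numbers. `summarize_yearly` is a hypothetical helper, and taking the sign of a least-squares slope as the "trend direction" is an assumption; other definitions are possible.

```python
import numpy as np
import pandas as pd

def summarize_yearly(yearly: pd.Series) -> dict:
    """Return peak year, lowest year and trend direction for yearly totals.

    `yearly` is indexed by year. The trend is the sign of a least-squares
    slope, which is only one possible definition of "overall direction".
    """
    yearly = yearly.sort_index()
    years = yearly.index.to_numpy(dtype=float)
    values = yearly.to_numpy(dtype=float)
    # Centre the years so the fit is numerically well behaved
    slope = np.polyfit(years - years.min(), values, 1)[0] if len(yearly) > 1 else 0.0
    if slope > 0:
        trend = "increasing"
    elif slope < 0:
        trend = "decreasing"
    else:
        trend = "flat"
    return {
        "peak_year": int(yearly.idxmax()),
        "lowest_year": int(yearly.idxmin()),
        "trend": trend,
    }

# Made-up totals, for illustration only
example = pd.Series({2017: 90.0, 2018: 100.0, 2019: 110.0, 2020: 80.0, 2021: 120.0})
summary = summarize_yearly(example)
print(summary)
```

A slope-based trend is sensitive to a single unusual year, so it is best read alongside the bar chart.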
In [None]:
# Ensure datetime and year
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df.dropna(subset=['Date', 'Value'], inplace=True)
df['Year'] = df['Date'].dt.year

# --- Format function for labels ---
def format_billions(x):
    if x >= 1_000_000_000_000:
        return f"{x / 1_000_000_000_000:.1f}T"
    elif x >= 1_000_000_000:
        return f"{x / 1_000_000_000:.1f}B"
    elif x >= 1_000_000:
        return f"{x / 1_000_000:.1f}M"
    return f"{x:,.0f}"

# --- WIDGET SETUP ---
unique_countries = sorted(set(df['Country'].dropna().unique()) - {'All'})
unique_directions = sorted(set(df['Direction'].dropna().unique()) - {'All'})
unique_modes = sorted(set(df['Transport_Mode'].dropna().unique()) - {'All'})
unique_measures = sorted(df['Measure'].dropna().unique())

year_range = (df['Year'].min(), df['Year'].max())

# Define widgets
country_widget = widgets.Dropdown(options=['All'] + unique_countries, description='Country:')
direction_widget = widgets.Dropdown(options=['All'] + unique_directions, description='Direction:')
mode_widget = widgets.Dropdown(options=['All'] + unique_modes, description='Mode:')
measure_widget = widgets.Dropdown(options=unique_measures, description='Measure:')
year_widget = widgets.IntRangeSlider(
    value=[year_range[0], year_range[1]],
    min=year_range[0],
    max=year_range[1],
    step=1,
    description='Year Range:',
    continuous_update=False
)

# --- INTERACTIVE FUNCTION ---
def interactive_eda(country, year_range, direction, mode, measure):
    filtered = df[df['Measure'] == measure].copy()

    if country != 'All':
        filtered = filtered[filtered['Country'] == country]
    if direction != 'All':
        filtered = filtered[filtered['Direction'] == direction]
    if mode != 'All':
        filtered = filtered[filtered['Transport_Mode'] == mode]
    filtered = filtered[(filtered['Year'] >= year_range[0]) & (filtered['Year'] <= year_range[1])]
    
    if filtered.empty:
        print("⚠️ No data for current selection.")
        return
    
    # Grouped data
    grouped = filtered.groupby('Year')['Value'].sum()

    # Plot
    plt.figure(figsize=(10, 5))
    bars = sns.barplot(x=grouped.index, y=grouped.values, palette='viridis')

    # Add labels
    for bar, val in zip(bars.patches, grouped.values):
        bars.annotate(format_billions(val),
                      (bar.get_x() + bar.get_width() / 2, bar.get_height()),
                      ha='center', va='bottom', fontsize=9, color='black')
    
    plt.title(f"Trade Value Over Time ({measure})")
    plt.ylabel("Value")
    plt.xlabel("Year")
    plt.tight_layout()
    plt.show()

# Display widgets
ui = widgets.VBox([country_widget, year_widget, direction_widget, mode_widget, measure_widget])
out = widgets.interactive_output(interactive_eda, {
    'country': country_widget,
    'year_range': year_widget,
    'direction': direction_widget,
    'mode': mode_widget,
    'measure': measure_widget
})

display(ui, out)

## 📊 Forecasting Trade Recovery Using ARIMA

In this section, we forecast the trade value trends for selected countries using the ARIMA model. The goal is to understand how trade volumes are expected to evolve based on historical patterns.

Please select a country from the dropdown below to view its 12-month trade value forecast.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import ipywidgets as widgets
from IPython.display import display, Markdown
from statsmodels.tsa.arima.model import ARIMA
import warnings

warnings.filterwarnings("ignore")

# --- Load and prepare data ---
df = pd.read_csv("effects-of-covid-19-on-trade-at-15-december-2021-provisional.csv")
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True, errors='coerce')
df.dropna(subset=['Date', 'Value'], inplace=True)
df['Measure'] = df['Measure'].astype(str).str.strip()

# Unique countries excluding 'All'
country_list = sorted([c for c in df['Country'].unique() if c != 'All'])
measure_list = ['$','Tonnes']

# --- Widgets ---
country_selector = widgets.Dropdown(
    options=country_list,
    description='Country:',
    layout=widgets.Layout(width='50%')
)

measure_selector = widgets.ToggleButtons(
    options=measure_list,
    description='Measure:',
    button_style='info'
)

# --- Forecast function ---
def plot_forecast(selected_country, selected_measure):
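    # regex=False: "$" is otherwise read as the end-of-string anchor and matches every row
    filtered = df[(df['Country'] == selected_country) &
                  (df['Measure'].str.contains(selected_measure, regex=False, na=False))]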
    
    # Aggregate monthly
    series = filtered.set_index('Date')['Value'].resample('M').sum().dropna()
    
    if len(series) < 24:
        display(Markdown(f"⚠️ Not enough data for `{selected_country}` ({selected_measure}) — need at least 24 months."))
        return

    # Fit and forecast
    model = ARIMA(series, order=(1, 1, 1))
    model_fit = model.fit()
    forecast = model_fit.forecast(steps=12)

    # Plot
    plt.figure(figsize=(12, 5))
    plt.plot(series.index, series.values / 1e9, label='Observed', color='blue')
    plt.plot(forecast.index, forecast.values / 1e9, label='Forecast', color='red')
    plt.title(f"📈 12-Month Forecast for {selected_country} ({selected_measure})")
    plt.xlabel("Date")
    plt.ylabel("Trade Value (Billions)")
    plt.legend()
    plt.grid(True)
    plt.tight_layout()
    plt.show()

# --- Display interactive widgets ---
widgets.interact(plot_forecast, selected_country=country_selector, selected_measure=measure_selector);

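One detail is worth checking whenever `Measure` is filtered with `str.contains`: pandas treats the pattern as a regular expression by default, and `$` is the end-of-string anchor. The sketch below uses a made-up series to compare regex, literal and exact matching.

```python
import pandas as pd

# Made-up Measure labels, for illustration only
measures = pd.Series(["$", "Tonnes", "$", "Tonnes"])

# Default: the pattern is a regular expression and "$" is the end-of-string
# anchor, so every non-null row matches
regex_mask = measures.str.contains("$", na=False)

# Literal matching: only rows that actually contain the dollar sign
literal_mask = measures.str.contains("$", regex=False, na=False)

# Exact comparison is the simplest option when the labels are clean
exact_mask = measures.str.strip() == "$"

print(regex_mask.tolist())
print(literal_mask.tolist())
print(exact_mask.tolist())
```

With the default regex behaviour, dollar and tonne rows would be summed together, which makes the resulting totals hard to interpret.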
## 🔍 Features:

- Select countries, transport modes, directions
- Filter by date range
- View dynamic correlation matrix heatmap
- Great for uncovering trade trend similarities during COVID-19

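The heatmap cell in this section plots total trade per country and year. If a correlation view between countries is wanted, one option is to pivot the data so that each country is a column and call `DataFrame.corr()`. This is a minimal sketch on made-up numbers; the countries `A`, `B` and `C` are hypothetical, and the column names follow the dataset.

```python
import pandas as pd

# Made-up monthly totals for three hypothetical countries
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-31", "2020-02-29", "2020-03-31", "2020-04-30"] * 3),
    "Country": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "Value": [10, 8, 4, 6, 20, 16, 8, 12, 5, 6, 9, 7],
})

# Rows = dates, columns = countries
wide = toy.pivot_table(index="Date", columns="Country", values="Value", aggfunc="sum")

# Pearson correlation between every pair of country series
corr = wide.corr()
print(corr.round(2))
```

The resulting matrix can be passed to `sns.heatmap(corr, vmin=-1, vmax=1, cmap="coolwarm", annot=True)`. Countries whose trade moved together show values near 1, regardless of their absolute size.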
In [None]:
from IPython.display import clear_output

# --- Load and preprocess data ---
df = pd.read_csv("effects-of-covid-19-on-trade-at-15-december-2021-provisional.csv")
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True, errors='coerce')  # day-first, as in the first load
df.dropna(subset=['Date'], inplace=True)
df['Year'] = df['Date'].dt.year
df['Measure'] = df['Measure'].astype(str).str.strip()

# --- Widgets ---
measure_widget = widgets.ToggleButtons(
    options=['$', 'Tonnes'],
    description='Measure:',
    button_style='info'
)

direction_widget = widgets.Dropdown(
    options=['All'] + sorted(df['Direction'].dropna().unique()),
    description='Direction:',
    layout={'width': '250px'}
)

mode_widget = widgets.Dropdown(
    options=['All'] + sorted(df['Transport_Mode'].dropna().unique()),
    description='Transport:',
    layout={'width': '250px'}
)

country_widget = widgets.Dropdown(
    options=['All'] + sorted(df['Country'].dropna().unique()),
    description='Country:',
    layout={'width': '250px'}
)

year_range_widget = widgets.IntRangeSlider(
    value=[df['Year'].min(), df['Year'].max()],
    min=df['Year'].min(),
    max=df['Year'].max(),
    step=1,
    description='Year Range:',
    continuous_update=False,
    layout={'width': '600px'}
)

# --- Output area ---
heatmap_output = widgets.Output()

# --- Callback function ---
def update_heatmap(measure, direction, transport, country, year_range):
    with heatmap_output:
        clear_output()
        
        filtered = df.copy()
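        if measure:
            # regex=False: "$" is otherwise read as the end-of-string anchor and matches every row
            filtered = filtered[filtered['Measure'].str.contains(measure, regex=False, na=False)]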
        if direction != 'All':
            filtered = filtered[filtered['Direction'] == direction]
        if transport != 'All':
            filtered = filtered[filtered['Transport_Mode'] == transport]
        if country != 'All':
            filtered = filtered[filtered['Country'] == country]
        filtered = filtered[(filtered['Year'] >= year_range[0]) & (filtered['Year'] <= year_range[1])]
        
        if filtered.empty:
            print("⚠️ No data for selected filters.")
            return
        
        pivot = filtered.groupby(['Country', 'Year'])['Value'].sum().unstack(fill_value=0)
        pivot = pivot.loc[pivot.sum(axis=1).sort_values(ascending=False).index]

        plt.figure(figsize=(13, 8))
        sns.heatmap(pivot, cmap='YlOrRd', linewidths=0.3, linecolor='gray')
        plt.title(f"🌐 Trade Heatmap by Country & Year ({measure})")
        plt.xlabel("Year")
        plt.ylabel("Country")
        plt.tight_layout()
        plt.show()

# --- Interactive display ---
ui = widgets.VBox([
    widgets.HBox([measure_widget, direction_widget, mode_widget]),
    widgets.HBox([country_widget]),
    year_range_widget
])

interactive_heatmap = widgets.interactive_output(update_heatmap, {
    'measure': measure_widget,
    'direction': direction_widget,
    'transport': mode_widget,
    'country': country_widget,
    'year_range': year_range_widget
})

display(ui, heatmap_output, interactive_heatmap)

## We are using clustering to group countries that showed similar trade recovery patterns over time.

## For example:

- Cluster 1: Countries that recovered quickly and stayed high.
- Cluster 2: Countries with a slow and steady recovery.
- Cluster 3: Countries that dipped again after recovering.

In [None]:
from IPython.display import clear_output
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Ensure Date is datetime and extract Month
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df['Month'] = df['Date'].dt.to_period('M')

# --- Widget Setup ---
measure_options = sorted(df['Measure'].dropna().unique())
measure_widget = widgets.Dropdown(
    options=measure_options,
    description='Tonnes / $:',
    style={'description_width': 'initial'}
)

country_options = sorted(df['Country'].dropna().unique())
country_widget = widgets.SelectMultiple(
    options=country_options,
    description='Countries:',
    layout={'height': '200px'},
    style={'description_width': 'initial'}
)

cluster_slider = widgets.IntSlider(
    value=3, min=2, max=6, step=1, description='Clusters (k):'
)

# --- Interactive Plot Function ---
def update_clusters(measure, countries, k):
    clear_output(wait=True)

    # Filter by measure and countries
    filtered = df[df['Measure'] == measure]
    if countries:
        filtered = filtered[filtered['Country'].isin(countries)]

    # Group monthly
    monthly = filtered.groupby(['Country', 'Month'])['Value'].sum().reset_index()
    pivot = monthly.pivot(index='Month', columns='Country', values='Value').fillna(0)

    # If not enough data
    if pivot.shape[1] < k:
        print(f"⚠️ Not enough countries selected for {k} clusters.")
        return

    # Scale (countries as rows)
    scaled = StandardScaler().fit_transform(pivot.T)

    # KMeans
    kmeans = KMeans(n_clusters=k, random_state=42)
    labels = kmeans.fit_predict(scaled)

    # Cluster assignment table
    clustered = pd.DataFrame({
        'Country': pivot.columns,
        'Cluster': labels
    }).sort_values('Cluster')

    # 👉 Display the cluster result table
    display(clustered)

    # Cluster mapping for plot
    cluster_map = dict(zip(clustered['Country'], clustered['Cluster']))

    # Plot one panel per cluster
    plt.figure(figsize=(15, 5))
    for cluster in range(k):
        plt.subplot(1, k, cluster + 1)
        cluster_countries = [c for c in pivot.columns if cluster_map[c] == cluster]
        for country in cluster_countries:
            plt.plot(pivot.index.to_timestamp(), pivot[country], label=country)
        plt.title(f"Cluster {cluster}")
        plt.xticks(rotation=45)
        if cluster_countries:
            plt.legend(fontsize=8)  # one legend per panel, so every cluster is labelled

    plt.suptitle(f"📈 Trade Recovery Clustering ({measure})", fontsize=14, y=1.05)
    plt.tight_layout()
    plt.show()

# --- Display Widgets ---
ui = widgets.VBox([measure_widget, country_widget, cluster_slider])
out = widgets.interactive_output(update_clusters, {
    'measure': measure_widget,
    'countries': country_widget,
    'k': cluster_slider
})

display(ui, out)

## 🧠 What this does:

- Normalizes trade recovery values across countries.
- Groups them into 3 clusters.
- Shows how many countries are in each.
- Then shows which country is in which cluster.

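To make the normalization step concrete: with countries as rows, `StandardScaler` standardizes each column (here, each month) to zero mean and unit variance across countries. The sketch below reproduces that with NumPy on made-up values. It also shows an alternative that indexes each country to its own first month; that alternative is an assumption for comparison and is not what the notebook does.

```python
import numpy as np

# Rows = countries, columns = months (made-up values)
X = np.array([
    [100.0, 60.0, 90.0],   # large trader
    [10.0, 6.0, 9.0],      # small trader with the same shape
    [50.0, 45.0, 20.0],    # different shape
])

# Column-wise z-score: what StandardScaler computes (population std, ddof=0)
z = (X - X.mean(axis=0)) / X.std(axis=0)

# Alternative (an assumption, not used in the notebook): index each country
# to its own first month so that only the shape of the curve remains
shape = X / X[:, [0]]

print(z.round(2))
print(shape.round(2))
```

After column-wise scaling, the large and small traders still sit far apart even though their curves have the same shape. That may be worth keeping in mind when interpreting clusters as "recovery patterns".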
## The code below plots the number of countries in each cluster and shows which cluster each country belongs to

In [None]:
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Pivot to rows = dates, columns = countries (missing combinations become 0)
recovery = df.pivot_table(index='Date', columns='Country', values='Value', aggfunc='sum').fillna(0)

# Transpose so each country is one row, then standardize
recovery_T = recovery.T
scaler = StandardScaler()
recovery_norm = scaler.fit_transform(recovery_T)

# Cluster
kmeans = KMeans(n_clusters=3, random_state=42)
cluster_labels = kmeans.fit_predict(recovery_norm)

# Map countries to clusters
country_cluster_map = pd.DataFrame({
    'Country': recovery_T.index,
    'Cluster': cluster_labels
})

# Plot as bar chart
plt.figure(figsize=(10, 6))
sns.countplot(data=country_cluster_map, x='Cluster', palette='tab10')
plt.title('Number of Countries in Each Cluster')
plt.xlabel('Cluster')
plt.ylabel('Country Count')
plt.show()

# Optional: Bar chart showing which country is in which cluster
plt.figure(figsize=(10, 8))
sns.barplot(data=country_cluster_map.sort_values('Cluster'), x='Cluster', y='Country', palette='Set2')
plt.title('Country Cluster Assignments')
plt.show()

## Interactive Comparative Visuals

## Let users select two or more countries and compare:

- Trade drop during COVID-19
- Speed of recovery
- Transport mode contributions


In [None]:
from IPython.display import clear_output

# Convert 'Date' column to datetime
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

# --- Widgets ---
country_selector = widgets.SelectMultiple(
    options=sorted(df['Country'].dropna().unique()),
    value=['China', 'United States'],
    description='Select Countries:',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='50%')
)

min_date = df['Date'].min()
max_date = df['Date'].max()

date_range = widgets.SelectionRangeSlider(
    options=pd.date_range(min_date, max_date, freq='MS'),
    index=(0, len(pd.date_range(min_date, max_date, freq='MS')) - 1),
    description='Date Range:',
    layout=widgets.Layout(width='90%'),
    style={'description_width': 'initial'}
)

measure_selector = widgets.Dropdown(
    options=sorted(df['Measure'].dropna().unique()),
    description='Tonnes / $:',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='50%')
)

# --- Format for large values ---
def format_billions(val):
    return f'{val/1e9:.1f}B'

# --- Main interactive function ---
def compare_countries(selected_countries, date_range, selected_measure):
    clear_output(wait=True)

    if not selected_countries:
        print("⚠️ Please select at least one country.")
        return

    start_date, end_date = date_range

    filtered_df = df[
        (df['Country'].isin(selected_countries)) &
        (df['Date'] >= start_date) &
        (df['Date'] <= end_date) &
        (df['Measure'] == selected_measure)
    ]

    if filtered_df.empty:
        print("⚠️ No data available for the selected filters. Try a different range, measure, or country.")
        return

    # --- Line Plot ---
    plt.figure(figsize=(12, 5))
    sns.lineplot(data=filtered_df, x='Date', y='Value', hue='Country')
    plt.title(f'Trade Value Over Time ({selected_measure})')
    plt.ylabel('Trade Value')
    plt.xlabel('Date')
    plt.legend(title='Country')
    plt.grid(True)
    plt.tight_layout()
    plt.show()

    # --- Normalized Recovery Plot ---
    recovery_df = filtered_df.groupby(['Country', 'Date'])['Value'].sum().reset_index()
    recovery_df['Normalized'] = recovery_df.groupby('Country')['Value'].transform(
        lambda x: (x - x.min()) / (x.max() - x.min()) if x.max() != x.min() else 0
    )

    plt.figure(figsize=(12, 5))
    sns.lineplot(data=recovery_df, x='Date', y='Normalized', hue='Country')
    plt.title('Normalized Recovery Trend')
    plt.ylabel('Normalized Trade Value')
    plt.xlabel('Date')
    plt.grid(True)
    plt.tight_layout()
    plt.show()

    # --- Transport Mode Contribution ---
    mode_df = filtered_df.groupby(['Country', 'Transport_Mode'])['Value'].sum().reset_index()

    plt.figure(figsize=(12, 6))
    barplot = sns.barplot(data=mode_df, x='Country', y='Value', hue='Transport_Mode')

    for container in barplot.containers:
        labels = [format_billions(val) for val in container.datavalues]
        barplot.bar_label(container, labels=labels, label_type='edge', padding=3)

    plt.title(f'Transport Mode Contribution ({selected_measure})')
    plt.ylabel('Total Trade Value (B)')
    plt.xlabel('Country')
    plt.xticks(rotation=45)
    plt.legend(title='Transport Mode')
    plt.tight_layout()
    plt.show()

# --- Display interactive dashboard ---
widgets.interact(
    compare_countries,
    selected_countries=country_selector,
    date_range=date_range,
    selected_measure=measure_selector
)

## Full Interactive Recovery Speed Analysis Code
## Features of this code:

- You can explore how recovery varies with different definitions of "baseline" and "recovery" time ranges.
- The threshold slider lets you test stricter or looser definitions (e.g. 80% vs 95% recovery).
- Results are sorted and printed interactively based on the settings.


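The effect of the threshold can be shown on a single series. This is a minimal sketch with made-up monthly totals and an assumed baseline of 100; `first_recovery_month` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

def first_recovery_month(monthly: pd.Series, baseline: float, threshold: float):
    """Return the first index label where monthly >= threshold * baseline, or None."""
    hits = monthly[monthly >= threshold * baseline]
    return hits.index[0] if not hits.empty else None

# Made-up monthly totals for one hypothetical country
monthly = pd.Series(
    [70.0, 78.0, 85.0, 92.0, 97.0],
    index=pd.period_range("2020-06", periods=5, freq="M"),
)
baseline = 100.0  # assumed pre-COVID monthly average

for threshold in (0.80, 0.95, 1.00):
    print(threshold, first_recovery_month(monthly, baseline, threshold))
```

On this series, raising the threshold from 80% to 95% moves the recovery month from August to October 2020, and a 100% threshold is never met within the window.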
In [None]:
# Ensure correct datetime formatting
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True, errors='coerce')
df = df.dropna(subset=['Date'])
df['Period'] = df['Date'].dt.to_period('M')

# Widgets
baseline_range = widgets.SelectionRangeSlider(
    options=sorted(df['Date'].dt.to_period('M').unique()),
    index=(0, 12),
    description='Baseline:',
    layout={'width': '500px'}
)

recovery_range = widgets.SelectionRangeSlider(
    options=sorted(df['Date'].dt.to_period('M').unique()),
    index=(15, len(df['Period'].unique()) - 1),
    description='Recovery:',
    layout={'width': '500px'}
)

threshold_slider = widgets.FloatSlider(
    value=0.9,
    min=0.5,
    max=1.0,
    step=0.05,
    description='Recovery Threshold:',
    readout_format='.0%',
    layout={'width': '400px'}
)

run_button = widgets.Button(description="Calculate Recovery", button_style='success')

output = widgets.Output()

def calculate_recovery(b):
    with output:
        output.clear_output()
        
        display(Markdown("### 🟢 Countries That Recovered Fastest from COVID-19 Trade Drop"))
        
        # Parse user inputs
        baseline_start, baseline_end = baseline_range.value
        recovery_start, recovery_end = recovery_range.value
        threshold = threshold_slider.value

        # Prepare grouped data
        recovery_stats = df.groupby(['Country', 'Date'])['Value'].sum().reset_index()
        recovery_stats['Period'] = recovery_stats['Date'].dt.to_period('M')
        
        # Calculate baseline average
        baseline = recovery_stats[
            (recovery_stats['Period'] >= baseline_start) & 
            (recovery_stats['Period'] <= baseline_end)
        ].groupby('Country')['Value'].mean()

        # Filter for recovery period
        post_covid = recovery_stats[
            (recovery_stats['Period'] >= recovery_start) & 
            (recovery_stats['Period'] <= recovery_end)
        ]

        recovered_countries = {}
        for country in post_covid['Country'].unique():
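            baseline_val = baseline.get(country, None)
            # Skip missing or zero baselines: with a zero baseline any value would count as recovered
            if baseline_val is None or baseline_val == 0:
                continue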
            country_data = post_covid[post_covid['Country'] == country]
            recovery_point = country_data[country_data['Value'] >= threshold * baseline_val]
            if not recovery_point.empty:
                recovered_countries[country] = recovery_point.iloc[0]['Date']

        # Sort and display
        fastest_recovery = sorted(recovered_countries.items(), key=lambda x: x[1])
        if not fastest_recovery:
            print("⚠️ No countries met the recovery threshold in the selected period.")
        else:
            for country, date in fastest_recovery[:10]:
                print(f"{country}: Recovered by {date.strftime('%b %Y')}")

run_button.on_click(calculate_recovery)

# Display all widgets
display(widgets.VBox([baseline_range, recovery_range, threshold_slider, run_button, output]))

## 🔍 What This Interactive Code Does:

- Lets you define what counts as “baseline” and “post-COVID”.
- Computes the normalized drop in trade by transport mode.
- Shows the worst point after COVID as a percentage of the pre-COVID average, with a visualization.
- Lets you select the measure (Tonnes or $).

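The "worst point as a percentage of the pre-COVID average" by transport mode can be sketched as follows. The data, the two windows and the helper `in_window` are made up for illustration. The result is named `min_points` only to match the variable name used later in the notebook.

```python
import pandas as pd

# Made-up monthly totals per transport mode
toy = pd.DataFrame({
    "Period": pd.PeriodIndex(["2019-06", "2019-07", "2020-04", "2020-05"] * 2, freq="M"),
    "Transport_Mode": ["Air"] * 4 + ["Sea"] * 4,
    "Value": [100.0, 100.0, 40.0, 70.0, 200.0, 200.0, 150.0, 180.0],
})

# Assumed windows; in the notebook these would come from the range sliders
baseline_window = (pd.Period("2019-06", "M"), pd.Period("2019-07", "M"))
post_window = (pd.Period("2020-04", "M"), pd.Period("2020-05", "M"))

def in_window(periods: pd.Series, window) -> pd.Series:
    """Boolean mask for periods inside an inclusive (start, end) window."""
    return (periods >= window[0]) & (periods <= window[1])

monthly = toy.groupby(["Transport_Mode", "Period"])["Value"].sum().reset_index()

baseline_avg = (
    monthly[in_window(monthly["Period"], baseline_window)]
    .groupby("Transport_Mode")["Value"].mean()
)
worst_month = (
    monthly[in_window(monthly["Period"], post_window)]
    .groupby("Transport_Mode")["Value"].min()
)

# Worst post-COVID month as a percentage of the baseline average
min_points = (100 * worst_month / baseline_avg).round(1)
print(min_points)
```

A lower percentage means a deeper trough. On these made-up numbers, Air falls to 40% of its baseline and Sea only to 75%.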
In [None]:
# Ensure datetime and period formatting
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True, errors='coerce')
df = df.dropna(subset=['Date'])
df['Period'] = df['Date'].dt.to_period('M')

# --- WIDGETS ---
# Period selectors
periods = sorted(df['Period'].dropna().unique())

baseline_range = widgets.SelectionRangeSlider(
    options=periods,
    index=(0, 12),
    description='Baseline:',
    layout={'width': '500px'}
)

recovery_range = widgets.SelectionRangeSlider(
    options=periods,
    index=(15, len(periods) - 1),
    description='Recovery:',
    layout={'width': '500px'}
)

# Measure selector (Tonnes or $)
measure_selector = widgets.Dropdown(
    options=sorted(df['Measure'].dropna().unique()),
    value=sorted(df['Measure'].dropna().unique())[0],
    description='Measure:',
    layout={'width': '300px'}
)

# Recovery threshold
threshold_slider = widgets.FloatSlider(
    value=0.9,
    min=0.5,
    max=1.0,
    step=0.05,
    description='Recovery Threshold:',
    readout_format='.0%',
    layout={'width': '400px'}
)

# Button and output
run_button = widgets.Button(description="Calculate Recovery", button_style='success')
output = widgets.Output()

# --- Recovery Function ---
def calculate_recovery(b):
    with output:
        output.clear_output()
        
        display(Markdown("### 🟢 Countries That Recovered Fastest from COVID-19 Trade Drop"))
        
        baseline_start, baseline_end = baseline_range.value
        recovery_start, recovery_end = recovery_range.value
        threshold = threshold_slider.value
        selected_measure = measure_selector.value

        # Filter by selected measure
        filtered_df = df[df['Measure'] == selected_measure].copy()

        # Grouping
        recovery_stats = filtered_df.groupby(['Country', 'Date'])['Value'].sum().reset_index()
        recovery_stats['Period'] = recovery_stats['Date'].dt.to_period('M')
        
        # Baseline average per country
        baseline = recovery_stats[
            (recovery_stats['Period'] >= baseline_start) & 
            (recovery_stats['Period'] <= baseline_end)
        ].groupby('Country')['Value'].mean()

        # Recovery window
        post_covid = recovery_stats[
            (recovery_stats['Period'] >= recovery_start) & 
            (recovery_stats['Period'] <= recovery_end)
        ]

        recovered_countries = {}
        for country in post_covid['Country'].unique():
            baseline_val = baseline.get(country, None)
            if baseline_val is None or baseline_val == 0:
                continue
            country_data = post_covid[post_covid['Country'] == country]
            recovery_point = country_data[country_data['Value'] >= threshold * baseline_val]
            if not recovery_point.empty:
                recovered_countries[country] = recovery_point.iloc[0]['Date']

        # Results
        fastest_recovery = sorted(recovered_countries.items(), key=lambda x: x[1])
        if not fastest_recovery:
            print("⚠️ No countries met the recovery threshold in the selected period.")
        else:
            for country, date in fastest_recovery[:10]:
                print(f"{country}: Recovered by {date.strftime('%b %Y')}")

# --- Bind ---
run_button.on_click(calculate_recovery)

# --- Display UI ---
display(Markdown("## 📈 Trade Recovery Analysis"))
display(widgets.VBox([
    widgets.HBox([baseline_range, recovery_range]),
    widgets.HBox([measure_selector, threshold_slider]),
    run_button,
    output
]))

In [None]:
import pandas as pd
import plotly.express as px

# Load your dataset
df = pd.read_csv("effects-of-covid-19-on-trade-at-15-december-2021-provisional.csv")

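# Ensure correct datetime parsing (dates are day-first, as in the earlier cells)
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True, errors='coerce')
df.dropna(subset=['Date'], inplace=True)

# Extract Year from Date
df['Year'] = df['Date'].dt.year

# Keep only dollar values in a separate frame so later cells can still use Tonnes.
# regex=False: "$" is otherwise read as the end-of-string anchor and matches every row.
dollar_df = df[df['Measure'].str.contains("$", regex=False, na=False)].copy()

# Group by Year, Country, Direction
yearly_data = dollar_df.groupby(['Year', 'Country', 'Direction'])['Value'].sum().reset_index()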

# Plot animated bar chart by year and direction
fig = px.bar(
    yearly_data,
    x='Country',
    y='Value',
    color='Direction',
    animation_frame='Year',
    title='📦 Trade Value Over Time (Imports, Exports, Re-Imports)',
    labels={'Value': 'Trade Value ($)'},
    height=600
)

fig.update_layout(
    xaxis_tickangle=-30,
    yaxis_title='Trade Value ($)',
    xaxis_title='Country',
    showlegend=True
)

fig.show()

## 📊 Interactive COVID-19 Trade Insights Dashboard

This interactive dashboard enables exploration of trade trends and recovery patterns during and after the COVID-19 pandemic. It allows users to:

- 🔍 Select from key insights such as:
  - Fastest recovering countries
  - Most affected transport modes
  - Biggest trade drops during COVID-19
  - Countries’ bounce-back speed
  - Post-COVID transport mode changes

- 💱 Toggle between **trade value ($)** and **volume (Tonnes)** using the **"Measure"** dropdown for more precise analysis.

- 📅 View dynamic charts and summaries that respond to the selected filters.

This tool is built using `ipywidgets`, `pandas`, `matplotlib`, and `seaborn` to support interactive and visual data exploration.

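As an illustration of the "biggest trade drops" insight, the percentage drop between two yearly totals is `100 * (1 - total_2020 / total_2019)`. This is a minimal sketch on made-up numbers for two hypothetical countries; the zero-baseline guard is an added assumption.

```python
import pandas as pd

# Made-up yearly totals, for illustration only
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "Year": [2019, 2020, 2019, 2020],
    "Value": [200.0, 150.0, 80.0, 72.0],
})

# Rows = countries, columns = years
yearly = toy.pivot_table(index="Country", columns="Year", values="Value", aggfunc="sum")

# Exclude countries with a zero or missing 2019 total to avoid dividing by zero
yearly = yearly[yearly[2019] > 0].copy()

# Drop (%) = 100 * (1 - total_2020 / total_2019)
yearly["Drop (%)"] = 100 * (1 - yearly[2020] / yearly[2019])
print(yearly.sort_values("Drop (%)", ascending=False))
```

A negative value in `Drop (%)` would mean that trade grew between the two years.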
In [None]:
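# Sample fallback values, used when the names below are not defined at notebook level
# (`fastest_recovery` inside calculate_recovery() is local to that function)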
if 'fastest_recovery' not in locals():
    fastest_recovery = [('Country A', '2021-01-01'), ('Country B', '2021-03-01'), ('Country C', '2021-05-01')]

if 'min_points' not in locals():
    min_points = pd.Series({'Air': 55.2, 'Sea': 62.8, 'All': 50.0})

# Convert fastest_recovery to DataFrame
top_recovery_df = pd.DataFrame(fastest_recovery, columns=['Country', 'Recovery Date'])
top_recovery_df['Recovery Date'] = pd.to_datetime(top_recovery_df['Recovery Date'], errors='coerce')
top_recovery_df = top_recovery_df.dropna(subset=['Recovery Date'])

# --- Dropdown Widgets ---
insight_selector = widgets.Dropdown(
    options=[
        "Which country recovered fastest?",
        "Which transport mode was most affected?",
        "Top trade drop during COVID-19",
        "How fast did countries bounce back?",
        "What changed after COVID-19?"
    ],
    description='Insight:',
    layout={'width': '500px'}
)

measure_selector = widgets.Dropdown(
    options=sorted(df['Measure'].dropna().unique()),
    description='Measure:',
    layout={'width': '300px'}
)

# --- Output area ---
insight_output = widgets.Output()

# --- Callback Function ---
def display_insight(change):
    with insight_output:
        clear_output()
        insight = insight_selector.value
        selected_measure = measure_selector.value

        filtered_df = df[df['Measure'] == selected_measure].copy()
        filtered_df['Date'] = pd.to_datetime(filtered_df['Date'], errors='coerce')
        filtered_df.dropna(subset=['Date', 'Value'], inplace=True)

        if insight == "Which country recovered fastest?":
            display(Markdown("### 🟢 Fastest Recovering Countries"))
            display(Markdown("These are countries that returned to 90% of pre-COVID trade levels earliest."))

            top5 = top_recovery_df.copy()
            top5['Months to Recover'] = (top5['Recovery Date'] - pd.to_datetime('2020-04-01')).dt.days // 30
            top5 = top5.sort_values('Months to Recover').head(5)

            if top5.empty:
                display(Markdown("⚠️ No valid recovery dates available for plotting."))
            else:
                plt.figure(figsize=(8, 4))
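                ax = sns.barplot(data=top5, x='Months to Recover', y='Country', palette='Greens')
                # Annotate by bar position: after sorting, the DataFrame index no longer matches the bar order
                for pos, (months, rec_date) in enumerate(zip(top5['Months to Recover'], top5['Recovery Date'])):
                    ax.text(months + 0.2, pos, rec_date.strftime('%b %Y'),
                            va='center', color='black', fontsize=9)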
                plt.title("Top 5 Countries by Fastest Recovery")
                plt.xlabel("Months from April 2020")
                plt.ylabel("Country")
                plt.grid(True)
                plt.tight_layout()
                plt.show()

        elif insight == "Which transport mode was most affected?":
            display(Markdown("### 🚨 Most Affected Transport Modes"))
            plt.figure(figsize=(6, 3))
            min_points.sort_values().plot(kind='barh', color='salmon')
            plt.xlabel("% of Pre-COVID Trade")
            plt.title("Minimum Post-COVID Trade Levels")
            plt.grid(True)
            plt.tight_layout()
            plt.show()

        elif insight == "Top trade drop during COVID-19":
            display(Markdown("### 📉 Country-wise Trade Collapse"))
            try:
                drop_df = filtered_df[filtered_df['Date'].dt.year.isin([2019, 2020])].groupby(
                    ['Country', filtered_df['Date'].dt.year])['Value'].sum().unstack()
                drop_df['Drop (%)'] = 100 * (1 - (drop_df[2020] / drop_df[2019]))
                drop_df = drop_df.sort_values('Drop (%)', ascending=False).dropna().head(10)

                plt.figure(figsize=(10, 5))
                sns.barplot(x=drop_df['Drop (%)'], y=drop_df.index, color='skyblue')
                for i, v in enumerate(drop_df['Drop (%)']):
                    plt.text(v + 1, i, f"{v:.1f}%", va='center')
                plt.title("Top 10 Trade Drops (2019→2020)")
                plt.xlabel("Drop (%)")
                plt.grid(True)
                plt.tight_layout()
                plt.show()
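            except Exception as exc:  # avoid a bare except, which would also swallow KeyboardInterrupt
                display(Markdown(f"⚠️ Trade drop data not available or filtering failed ({type(exc).__name__})."))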

        elif insight == "How fast did countries bounce back?":
            display(Markdown("### ⏱️ Recovery Duration"))
            if not top_recovery_df.empty:
                bounce = top_recovery_df.copy()
                bounce['Months'] = (bounce['Recovery Date'] - pd.to_datetime('2020-04-01')).dt.days // 30
                bounce = bounce.sort_values('Months').head(5)

                plt.figure(figsize=(8, 4))
                sns.barplot(data=bounce, x='Months', y='Country', palette='Blues_r')
                for i, v in enumerate(bounce['Months']):
                    plt.text(v + 0.3, i, f"{v} months", va='center')
                plt.title("Bounce-Back Duration from COVID Drop")
                plt.grid(True)
                plt.tight_layout()
                plt.show()
            else:
                display(Markdown("⚠️ No valid recovery data found."))

        elif insight == "What changed after COVID-19?":
            display(Markdown("### 🔍 Post-COVID Transport Trends"))
            try:
                trend = filtered_df[filtered_df['Date'] > '2020-03'].groupby(
                    ['Transport_Mode', filtered_df['Date'].dt.year])['Value'].sum().unstack()
                trend = trend.div(trend.sum(axis=0), axis=1)
                trend.T.plot(kind='bar', stacked=True, figsize=(9, 5), colormap='Paired')
                plt.title(f"Transport Mode Share by Year (Post-COVID) - {selected_measure}")
                plt.ylabel("Share of Trade")
                plt.xticks(rotation=0)
                plt.tight_layout()
                plt.grid(True)
                plt.show()
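            except Exception as exc:  # avoid a bare except, which would also swallow KeyboardInterrupt
                display(Markdown(f"⚠️ Transport mode breakdown not available ({type(exc).__name__})."))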

# --- Bind and display ---
insight_selector.observe(display_insight, names='value')
measure_selector.observe(display_insight, names='value')

display(Markdown("## 📌 Summary Insights"))
display(widgets.HBox([insight_selector, measure_selector]), insight_output)

# Trigger once at start
display_insight({'new': insight_selector.value})

## 📊 Interactive Trade Value Animation by Measure

This interactive visualization animates yearly trade totals by country and trade direction (Imports, Exports, Re-Imports).

Use the **Measure toggle** to switch between:

- **Tonnes** — physical quantity of trade
- **$** — monetary value of trade

The animation highlights the impact of COVID-19 and how trade recovered over time for each country and direction.
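
Each animation frame is one year of totals per country and direction. The sketch below shows that aggregation on a small toy table; the column names follow the notebook's `df`, while the helper `yearly_totals` and all values are hypothetical and only illustrate the shape of the data passed to the chart.

```python
import pandas as pd

# Toy records with hypothetical values; column names follow the notebook's df
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2019-03-01", "2019-07-01", "2020-03-01",
                            "2020-07-01", "2020-07-01"]),
    "Country": ["Country A", "Country A", "Country A", "Country A", "Country B"],
    "Direction": ["Exports", "Exports", "Exports", "Imports", "Exports"],
    "Measure": ["$", "$", "$", "$", "Tonnes"],
    "Value": [120.0, 80.0, 90.0, 40.0, 7.0],
})
# Derive the Year column in case the dataset only provides a Date
toy["Year"] = toy["Date"].dt.year

def yearly_totals(frame, measure):
    """Sum Value per (Year, Country, Direction) for a single measure."""
    subset = frame[frame["Measure"] == measure]
    return subset.groupby(["Year", "Country", "Direction"], as_index=False)["Value"].sum()

grouped = yearly_totals(toy, "$")
print(grouped)
```

Filtering on one measure before summing keeps dollars and tonnes from being added together in the same bar.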

In [None]:
import plotly.express as px
from IPython.display import clear_output

# Extract measure types (Tonnes, $)
available_measures = sorted(df['Measure'].dropna().unique())

# Create widget
measure_widget = widgets.Dropdown(
    options=available_measures,
    value=available_measures[0],
    description='Measure:',
    layout={'width': '300px'}
)

# Output container
output = widgets.Output()

# Function to update chart
def update_chart(change):
    with output:
        clear_output()

        selected_measure = change['new']

        # Filter data
        filtered_df = df[df['Measure'] == selected_measure]

        if filtered_df.empty:
            print(f"No data found for measure: {selected_measure}")
            return

        # Group
        grouped = filtered_df.groupby(['Year', 'Country', 'Direction'])['Value'].sum().reset_index()

        # Plot
        fig = px.bar(
            grouped,
            x='Country',
            y='Value',
            color='Direction',
            animation_frame='Year',
            title=f"📦 Trade Value Over Time ({selected_measure})",
            labels={'Value': f'Trade Value ({selected_measure})'},
            height=600
        )
        fig.update_layout(xaxis_tickangle=-30, yaxis_title=f'Trade Value ({selected_measure})')
        fig.show()

# Initial call and binding
measure_widget.observe(update_chart, names='value')
display(measure_widget, output)
update_chart({'new': measure_widget.value})

## 🔍 Summary Statistics & Key Insights

In this section, we explore overall trade behavior during and after COVID-19:

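- 🏁 **Fastest Recovery Country**: The country with the largest average month-over-month trade growth in the selected date range.
- 🚢 **Most Affected Transport Mode**: The mode with the lowest total trade in the selected range, used here as a proxy for the sharpest decline.
- 🔄 **Shifting Trade Trends**: The three countries whose average trade rose most from before March 2020 to June 2020 onward.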

This analysis helps explain *how countries adapted and which sectors struggled or thrived*.
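
One way to quantify a "sharpest decline" is the relative change between pre- and post-COVID averages per transport mode. The sketch below is an illustration under stated assumptions: the helper `relative_change_by_mode` and the toy values are hypothetical, and the period cutoffs (before March 2020, from June 2020) are borrowed from the country comparison in the cell below.

```python
import pandas as pd

# Toy data with hypothetical values; column names follow the notebook's df
toy = pd.DataFrame({
    "Period": pd.PeriodIndex(
        ["2019-11"] * 3 + ["2020-01"] * 3 + ["2020-07"] * 3 + ["2020-09"] * 3,
        freq="M",
    ),
    "Transport_Mode": ["Air", "Sea", "Rail"] * 4,
    "Value": [100.0, 400.0, 10.0,
              100.0, 400.0, 10.0,
              50.0, 360.0, 10.0,
              70.0, 380.0, 12.0],
})

def relative_change_by_mode(frame, pre_end="2020-03", post_start="2020-06"):
    """Percent change of mean Value per mode, post-COVID vs pre-COVID."""
    pre = frame[frame["Period"] < pd.Period(pre_end, freq="M")]
    post = frame[frame["Period"] >= pd.Period(post_start, freq="M")]
    pre_mean = pre.groupby("Transport_Mode")["Value"].mean()
    post_mean = post.groupby("Transport_Mode")["Value"].mean()
    # Most negative first, so the first entry is the sharpest relative decline
    return ((post_mean - pre_mean) / pre_mean * 100).sort_values()

change = relative_change_by_mode(toy)
print(change.round(1))
print("Sharpest relative decline:", change.idxmin())
print("Lowest total trade:", toy.groupby("Transport_Mode")["Value"].sum().idxmin())
```

In this toy data the two measures disagree: Rail has the lowest total but grew slightly, while Air shows the largest relative drop. A relative measure avoids labelling a mode as "most affected" simply because it is small.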

In [None]:
import pandas as pd
import ipywidgets as widgets
from IPython.display import display, Markdown, clear_output

# Ensure datetime and period columns
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True, errors='coerce')
df['Period'] = df['Date'].dt.to_period('M')

# --- Widgets ---
available_periods = sorted(df['Period'].dropna().unique())

summary_date_range = widgets.SelectionRangeSlider(
    options=available_periods,
    index=(0, len(available_periods)-1),
    description='Date Range:',
    layout=widgets.Layout(width='80%'),
    style={'description_width': 'initial'}
)

# Named distinctly so it does not rebind `measure_selector`, which the
# "Summary Insights" callback above reads at call time
summary_measure_selector = widgets.Dropdown(
    options=sorted(df['Measure'].dropna().unique()),
    value=sorted(df['Measure'].dropna().unique())[0],
    description='Measure:',
    layout={'width': '250px'}
)

output_summary = widgets.Output()

# --- Summary Stats Function ---
def display_summary_stats(change=None):
    with output_summary:
        clear_output(wait=True)

        start, end = summary_date_range.value
        start = pd.Period(start, freq='M')
        end = pd.Period(end, freq='M')
        selected_measure = summary_measure_selector.value

        filtered_df = df[
            (df['Period'] >= start) &
            (df['Period'] <= end) &
            (df['Measure'] == selected_measure)
        ]

        if filtered_df.empty:
            display(Markdown("⚠️ **No data available for the selected filters.**"))
            return

        # Recovery by country: average month-over-month change in total trade
        monthly = filtered_df.groupby(['Country', 'Period'])['Value'].sum().unstack(fill_value=0)
        recovery_growth = monthly.diff(axis=1).mean(axis=1).dropna()
        # A single-month range has no month-over-month change to rank
        if recovery_growth.empty:
            fastest_recovery = "N/A (select at least two months)"
        else:
            fastest_recovery = recovery_growth.idxmax()

        # Most affected mode: lowest total trade in the selected range
        # (a proxy; this does not measure the size of the decline)
        affected_mode = filtered_df.groupby('Transport_Mode')['Value'].sum().idxmin()

        # Post-COVID growth comparison (uses the full dataset, not the selected date range)
        pre_covid = df[(df['Period'] < pd.Period('2020-03')) & (df['Measure'] == selected_measure)]\
            .groupby('Country')['Value'].mean()
        post_covid = df[(df['Period'] >= pd.Period('2020-06')) & (df['Measure'] == selected_measure)]\
            .groupby('Country')['Value'].mean()
        trend_change = (post_covid - pre_covid).dropna().sort_values(ascending=False).head(3)

        # --- Output ---
        display(Markdown(f"### 📊 Summary for {selected_measure}"))
        display(Markdown(f"**✅ Fastest Recovery:** {fastest_recovery}"))
        display(Markdown(f"**📉 Most Affected Transport Mode:** {affected_mode}"))
        display(Markdown("**📈 Post-COVID Trade Boost (Top 3 Countries):**"))
        unit = "tonnes" if "tonne" in selected_measure.lower() else "$"
        for country, value in trend_change.items():
            # Signed format avoids printing "+-" when the change is negative
            formatted_val = f"{value:+,.0f} {unit}"
            display(Markdown(f"- {country}: {formatted_val}"))

# --- Bind events ---
summary_date_range.observe(display_summary_stats, names='value')
summary_measure_selector.observe(display_summary_stats, names='value')

# --- Display UI ---
display(Markdown("## 🧮 Summary Statistics and Insights"))
display(widgets.HBox([summary_date_range, summary_measure_selector]))
display(output_summary)

# Trigger once initially
display_summary_stats()

## ✅ Conclusion

This analysis shows the severe impact of COVID-19 on global trade and how countries responded at different speeds. It highlights:
- Varying recovery trajectories by country
- Transport modes most affected
- Shifts in trade patterns post-pandemic

These insights are crucial for understanding economic resilience and future preparedness.

---

📌 **Tools Used**: pandas, matplotlib, seaborn, plotly, ipywidgets  
📊 **Focus**: Interactive data exploration, ARIMA forecasting, clustering, comparative visuals  
🧠 **Skills Demonstrated**: Time series analysis, clustering, data visualization, widget interactivity