<a href="https://colab.research.google.com/github/lawrennd/fitkit/blob/main/examples/atlas_fitness_comparison.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Economic Fitness Analysis: Comparing 2000 vs 2020

This notebook demonstrates the Fitness-Complexity algorithm using real-world trade data from the Harvard Atlas of Economic Complexity. We compare economic fitness and product complexity between 2000 and 2020 to understand how countries and products evolved over this 20-year period.

## What is Economic Fitness?

The Fitness-Complexity algorithm (Tacchella et al., 2012) measures:
- **Economic Fitness (F)**: A country's ability to produce complex products
- **Product Complexity (Q)**: How difficult a product is to produce

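These are computed iteratively from the binary country-product export matrix $M_{cp}$. Each country's fitness $F_c$ depends on the complexities $Q_p$ of the products it exports, and each product's complexity depends on the fitness of the countries that export it.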
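In the original formulation of Tacchella et al. (2012), $M_{cp} = 1$ when country $c$ exports product $p$ with revealed comparative advantage and $0$ otherwise. Starting from $F_c^{(0)} = Q_p^{(0)} = 1$, iteration $n$ updates both quantities from the previous step and then rescales each by its mean. fitkit's exact normalization and stopping rule may differ.

$$
\begin{aligned}
\tilde F_c^{(n)} &= \sum_p M_{cp}\, Q_p^{(n-1)}, &\qquad
\tilde Q_p^{(n)} &= \frac{1}{\sum_c M_{cp} / F_c^{(n-1)}}, \\
F_c^{(n)} &= \frac{\tilde F_c^{(n)}}{\langle \tilde F^{(n)} \rangle_c}, &\qquad
Q_p^{(n)} &= \frac{\tilde Q_p^{(n)}}{\langle \tilde Q^{(n)} \rangle_p}.
\end{aligned}
$$

Here $\langle \cdot \rangle_c$ and $\langle \cdot \rangle_p$ denote averages over countries and over products.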

In [None]:
import sys
import subprocess
from pathlib import Path


def _pip_install(args: list[str]) -> None:
    cmd = [sys.executable, "-m", "pip", *args]
    print("Running:", " ".join(cmd))
    subprocess.check_call(cmd)


def ensure_fitkit_installed() -> None:
    """Prefer editable local install; fall back to GitHub.

    - Local (typical): `pip install -e ..` when running from `examples/`
    - Colab/remote: `pip install git+https://github.com/lawrennd/fitkit.git`
    """
    try:
        import fitkit  # noqa: F401

        return
    except ImportError:
        pass

    here = Path.cwd().resolve()
    candidates = [here, here.parent, here.parent.parent]

    for root in candidates:
        if (root / "pyproject.toml").exists() and (root / "fitkit").is_dir():
            _pip_install(["install", "-e", str(root)])
            return

    _pip_install(["install", "git+https://github.com/lawrennd/fitkit.git"])


ensure_fitkit_installed()
import fitkit

print("fitkit version:", getattr(fitkit, "__version__", "unknown"))

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import spearmanr, pearsonr

from fitkit import load_atlas_trade, list_atlas_available_years
from fitkit.algorithms import FitnessComplexity

# Set plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 10

## 1. Data Loading

First, let's check what years are available and load data for 2000 and 2020.
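The loader's `rca_threshold=1.0` argument suggests the usual binarization of export values via Balassa's revealed comparative advantage (RCA). This is an assumption about what `load_atlas_trade` does internally, not something this notebook confirms. The minimal NumPy sketch below uses a made-up export-value matrix `X` (countries × products) and the hypothetical helper `rca_binary`. It also computes the density of the resulting matrix in the same way as the data summary below.

```python
import numpy as np


def rca_binary(X, threshold=1.0):
    """Binarize an export-value matrix with Balassa's RCA (illustrative sketch).

    X[c, p] is the export value of product p by country c. Assumes every
    country and every product has positive total exports. Returns an integer
    matrix M with M[c, p] = 1 where RCA[c, p] >= threshold (the >= is an
    assumption; some implementations use a strict inequality).
    """
    X = np.asarray(X, dtype=float)
    country_share = X / X.sum(axis=1, keepdims=True)  # share of p in c's exports
    world_share = X.sum(axis=0) / X.sum()             # share of p in world exports
    rca = country_share / world_share
    return (rca >= threshold).astype(int)


# Toy data: 3 hypothetical countries x 4 products (illustrative values only).
X = np.array([
    [10.0, 0.0, 5.0, 85.0],
    [50.0, 35.0, 15.0, 0.0],
    [5.0, 60.0, 30.0, 5.0],
])
M = rca_binary(X)
density = M.sum() / M.size  # fraction of non-zero entries
print(M.tolist())           # [[0, 0, 0, 1], [1, 1, 0, 0], [0, 1, 1, 0]]
print(f"Density: {density:.4f}")  # Density: 0.4167
```

Raising the threshold makes $M$ sparser, which changes both the density reported below and the inputs to the Fitness-Complexity iteration.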

In [None]:
# Check available years
years = list_atlas_available_years('hs92')
print(f"Atlas HS92 data available for {len(years)} years")
print(f"Range: {min(years)} - {max(years)}")
print(f"\nYears: {years}")

In [None]:
# Load data for 2000 and 2020 at 4-digit product level
print("Loading 2000 data...")
M_2000, countries_2000, products_2000 = load_atlas_trade(
    year=2000,
    classification='hs92',
    product_level=4,
    rca_threshold=1.0
)

print("\nLoading 2020 data...")
M_2020, countries_2020, products_2020 = load_atlas_trade(
    year=2020,
    classification='hs92',
    product_level=4,
    rca_threshold=1.0
)

print("\n" + "="*60)
print("DATA SUMMARY")
print("="*60)
print(f"\n2000: {M_2000.shape[0]} countries × {M_2000.shape[1]} products")
print(f"      Density: {M_2000.nnz / (M_2000.shape[0] * M_2000.shape[1]):.4f}")
print(f"      Non-zero entries: {M_2000.nnz:,}")

print(f"\n2020: {M_2020.shape[0]} countries × {M_2020.shape[1]} products")
print(f"      Density: {M_2020.nnz / (M_2020.shape[0] * M_2020.shape[1]):.4f}")
print(f"      Non-zero entries: {M_2020.nnz:,}")

## 2. Computing Fitness and Complexity

Now we compute the Fitness-Complexity metrics for both years. The algorithm iteratively updates country fitness and product complexity until convergence.
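The update loop can be sketched in NumPy as follows. This is a minimal illustration of the Tacchella et al. (2012) iteration on a small dense matrix. It is not fitkit's `FitnessComplexity` implementation. The function name `fitness_complexity`, the tolerance `tol`, the unit-mean normalization, and the max-absolute-change stopping rule are all assumptions made for this sketch.

```python
import numpy as np


def fitness_complexity(M, max_iter=1000, tol=1e-10):
    """Iterate the Fitness-Complexity map on a binary country x product matrix.

    Assumes every row and column of M contains at least one 1.
    Returns (F, Q, dF_history), where dF_history[i] is the largest absolute
    change in normalized fitness at iteration i + 1.
    """
    M = np.asarray(M, dtype=float)
    F = np.ones(M.shape[0])
    Q = np.ones(M.shape[1])
    dF_history = []
    for _ in range(max_iter):
        F_new = M @ Q                    # sum of complexities each country exports
        Q_new = 1.0 / (M.T @ (1.0 / F))  # dominated by the least fit exporters
        F_new /= F_new.mean()            # rescale both to unit mean
        Q_new /= Q_new.mean()
        dF = np.max(np.abs(F_new - F))
        F, Q = F_new, Q_new
        dF_history.append(dF)
        if dF < tol:                     # stop once fitness barely changes
            break
    return F, Q, dF_history


M_toy = np.array([[1, 1, 1], [1, 1, 0], [1, 0, 0]])  # perfectly nested toy matrix
F, Q, dF_history = fitness_complexity(M_toy)
print("F:", F.round(3), "Q:", Q.round(3), "iterations:", len(dF_history))
```

On this perfectly nested toy matrix, the weakest country's fitness keeps shrinking across iterations instead of settling at a positive value. As a result, the loop may run until `max_iter`. The ordering of countries and products is nevertheless stable from the first iteration.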

In [None]:
# Compute fitness-complexity for 2000
print("Computing fitness-complexity for 2000...")
fc_2000 = FitnessComplexity()
F_2000, Q_2000 = fc_2000.fit_transform(M_2000)
history_2000 = fc_2000.history_
print(f"  Converged in {len(history_2000['dF'])} iterations")

# Compute fitness-complexity for 2020
print("\nComputing fitness-complexity for 2020...")
fc_2020 = FitnessComplexity()
F_2020, Q_2020 = fc_2020.fit_transform(M_2020)
history_2020 = fc_2020.history_
print(f"  Converged in {len(history_2020['dF'])} iterations")


In [None]:
history_2000

In [None]:
# Plot convergence
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

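# Use the same 'dF' history key as the convergence printout above
# (assumed to hold the per-iteration change in fitness).
ax1.semilogy(np.arange(1, len(history_2000['dF']) + 1), history_2000['dF'], 'o-', label='2000')
ax1.semilogy(np.arange(1, len(history_2020['dF']) + 1), history_2020['dF'], 's-', label='2020')
ax1.set_xlabel('Iteration')
ax1.set_ylabel('Change in fitness (dF)')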
ax1.set_title('Convergence of Fitness-Complexity Algorithm')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Distribution comparison
ax2.hist(np.log10(F_2000 + 1e-10), bins=30, alpha=0.5, label='2000', density=True)
ax2.hist(np.log10(F_2020 + 1e-10), bins=30, alpha=0.5, label='2020', density=True)
ax2.set_xlabel('log₁₀(Fitness)')
ax2.set_ylabel('Density')
ax2.set_title('Distribution of Country Fitness')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 3. Country Rankings and Changes

Let's examine which countries have the highest economic fitness in each year and how rankings changed.
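The ranking cell assigns rank 1 to the highest fitness by sorting in descending order and taking the row position. Section 4 then defines the rank change as `rank_2000 - rank_2020`, so a positive value means that a country moved up. The small NumPy sketch below illustrates that convention. It uses made-up fitness values for four hypothetical countries, and `descending_ranks` is a hypothetical helper. This helper breaks ties by position, whereas the notebook's `sort_values` does not guarantee any particular order for tied values.

```python
import numpy as np


def descending_ranks(values):
    """Return 1-based ranks with rank 1 for the largest value (ties broken by position)."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(-values, kind="stable")  # indices from largest to smallest
    ranks = np.empty(len(values), dtype=int)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks


# Illustrative fitness values for hypothetical countries A, B, C, D.
fitness_2000 = np.array([2.5, 0.4, 1.1, 0.2])
fitness_2020 = np.array([2.0, 1.3, 0.5, 0.3])

rank_2000 = descending_ranks(fitness_2000)  # [1, 3, 2, 4]
rank_2020 = descending_ranks(fitness_2020)  # [1, 2, 3, 4]
rank_change = rank_2000 - rank_2020         # positive = moved up
print(rank_change.tolist())                 # [0, 1, -1, 0]
```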

In [None]:
# Add fitness to dataframes
countries_2000['fitness'] = F_2000
countries_2000['year'] = 2000
countries_2000 = countries_2000.sort_values('fitness', ascending=False).reset_index(drop=True)
countries_2000['rank_2000'] = countries_2000.index + 1

countries_2020['fitness'] = F_2020
countries_2020['year'] = 2020
countries_2020 = countries_2020.sort_values('fitness', ascending=False).reset_index(drop=True)
countries_2020['rank_2020'] = countries_2020.index + 1

# Show top countries in each year
print("="*60)
print("TOP 20 COUNTRIES BY ECONOMIC FITNESS")
print("="*60)

top_2000 = countries_2000[['rank_2000', 'country', 'fitness']].head(20).copy()
top_2020 = countries_2020[['rank_2020', 'country', 'fitness']].head(20).copy()

comparison_top = pd.merge(
    top_2000.rename(columns={'fitness': 'fitness_2000'}),
    top_2020.rename(columns={'fitness': 'fitness_2020'}),
    on='country',
    how='outer'
).fillna({'rank_2000': 999, 'rank_2020': 999})

comparison_top['rank_2000'] = comparison_top['rank_2000'].astype(int)
comparison_top['rank_2020'] = comparison_top['rank_2020'].astype(int)
comparison_top = comparison_top.sort_values('rank_2020')

print("\n{:<10} {:>10} {:>15} {:>10} {:>15}".format(
    'Country', 'Rank 2000', 'Fitness 2000', 'Rank 2020', 'Fitness 2020'
))
print("-"*70)

for _, row in comparison_top.head(20).iterrows():
    r2000 = str(row['rank_2000']) if row['rank_2000'] < 999 else '-'
    f2000 = f"{row['fitness_2000']:.4f}" if pd.notna(row['fitness_2000']) else '-'
    r2020 = str(row['rank_2020']) if row['rank_2020'] < 999 else '-'
    f2020 = f"{row['fitness_2020']:.4f}" if pd.notna(row['fitness_2020']) else '-'

    print(f"{row['country']:<10} {r2000:>10} {f2000:>15} {r2020:>10} {f2020:>15}")

## 4. Fitness Changes Over Time

Now let's analyze which countries gained or lost the most economic fitness between 2000 and 2020.
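One caveat applies when reading these changes. In Tacchella et al.'s formulation, fitness is rescaled to unit mean at every iteration, so $F$ measures a country's position relative to the other countries in the same year. If fitkit follows that convention (an assumption here), a uniform improvement in every country's capabilities leaves $F$ unchanged. A positive fitness change then means gaining ground on other countries. The toy sketch below uses made-up unnormalized scores and a hypothetical helper, `normalize_to_unit_mean`.

```python
import numpy as np


def normalize_to_unit_mean(x):
    """Rescale a positive score vector so that its mean is 1."""
    x = np.asarray(x, dtype=float)
    return x / x.mean()


# Made-up, unnormalized capability scores for four hypothetical countries.
raw_2000 = np.array([4.0, 2.0, 1.0, 1.0])
raw_uniform = 3.0 * raw_2000                              # everyone improves equally
raw_catchup = raw_2000 * np.array([1.0, 1.0, 3.0, 1.0])   # only country 2 improves

toy_F_2000 = normalize_to_unit_mean(raw_2000)        # [2.0, 1.0, 0.5, 0.5]
toy_F_uniform = normalize_to_unit_mean(raw_uniform)  # identical to toy_F_2000
toy_F_catchup = normalize_to_unit_mean(raw_catchup)  # [1.6, 0.8, 1.2, 0.4]

change = toy_F_catchup - toy_F_2000
print(change.round(2).tolist())   # [-0.4, -0.2, 0.7, -0.1]
print(abs(change.sum()) < 1e-12)  # True: changes over a fixed set of countries sum to 0
```

In the notebook, `comparison` keeps only the countries present in both years. Because each year's normalization runs over its own full set of countries, the mean fitness change reported below need not be exactly zero.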

In [None]:
# Merge data for countries present in both years
comparison = countries_2000[['country', 'fitness', 'rank_2000']].merge(
    countries_2020[['country', 'fitness', 'rank_2020']],
    on='country',
    how='inner',
    suffixes=('_2000', '_2020')
)

comparison['fitness_change'] = comparison['fitness_2020'] - comparison['fitness_2000']
comparison['fitness_pct_change'] = 100 * (comparison['fitness_2020'] / comparison['fitness_2000'] - 1)
comparison['rank_change'] = comparison['rank_2000'] - comparison['rank_2020']  # Positive = improved

print(f"\nAnalyzing {len(comparison)} countries present in both years\n")

# Biggest gainers and losers
print("="*60)
print("TOP 10 FITNESS GAINERS (2000-2020)")
print("="*60)
gainers = comparison.nlargest(10, 'fitness_change')
print("\n{:<10} {:>12} {:>12} {:>15} {:>12}".format(
    'Country', 'Fit. 2000', 'Fit. 2020', 'Abs. Change', 'Rank Change'
))
print("-"*70)
for _, row in gainers.iterrows():
    print(f"{row['country']:<10} {row['fitness_2000']:>12.4f} {row['fitness_2020']:>12.4f} "
          f"{row['fitness_change']:>+15.4f} {row['rank_change']:>+12.0f}")

print("\n" + "="*60)
print("TOP 10 FITNESS LOSERS (2000-2020)")
print("="*60)
losers = comparison.nsmallest(10, 'fitness_change')
print("\n{:<10} {:>12} {:>12} {:>15} {:>12}".format(
    'Country', 'Fit. 2000', 'Fit. 2020', 'Abs. Change', 'Rank Change'
))
print("-"*70)
for _, row in losers.iterrows():
    print(f"{row['country']:<10} {row['fitness_2000']:>12.4f} {row['fitness_2020']:>12.4f} "
          f"{row['fitness_change']:>+15.4f} {row['rank_change']:>+12.0f}")

## 5. Visualization: Fitness Scatter Plot

Let's create a scatter plot showing fitness in 2000 vs 2020, with countries colored by their change in fitness.
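The plotting cell also reports both Pearson and Spearman correlations. Spearman's ρ is the Pearson correlation of the ranks, so it is not driven by a few very fit countries far above the rest. The NumPy-only sketch below illustrates the difference. It uses made-up values and hypothetical helpers, `pearson` and `spearman_no_ties`. Unlike `scipy.stats.spearmanr`, which assigns average ranks to ties, this sketch assumes there are no ties.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient of two 1-D arrays."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    return float(np.sum(x * y) / np.sqrt(np.sum(x * x) * np.sum(y * y)))


def spearman_no_ties(x, y):
    """Spearman's rho as the Pearson correlation of 0-based ranks (assumes no ties)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return pearson(rank_x, rank_y)


# Made-up fitness values: identical ordering in both years, one extreme value.
f_a = np.array([0.1, 0.2, 0.3, 0.4, 5.0])
f_b = np.array([0.2, 0.3, 0.4, 0.5, 0.6])

print(round(pearson(f_a, f_b), 3))  # 0.743: pulled down by the extreme value
print(spearman_no_ties(f_a, f_b))   # 1.0: the rankings agree perfectly
```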

In [None]:
# Create scatter plot with color-coded changes
fig, ax = plt.subplots(figsize=(12, 10))

# Plot all countries
scatter = ax.scatter(
    comparison['fitness_2000'],
    comparison['fitness_2020'],
    c=comparison['fitness_change'],
    cmap='RdYlGn',
    s=100,
    alpha=0.6,
    edgecolors='black',
    linewidth=0.5
)

# Add diagonal line (no change)
max_val = max(comparison['fitness_2000'].max(), comparison['fitness_2020'].max())
ax.plot([0, max_val], [0, max_val], 'k--', alpha=0.3, linewidth=1, label='No change')

# Label top gainers and losers
top_movers = pd.concat([
    comparison.nlargest(5, 'fitness_change'),
    comparison.nsmallest(5, 'fitness_change')
])

for _, row in top_movers.iterrows():
    ax.annotate(
        row['country'],
        (row['fitness_2000'], row['fitness_2020']),
        xytext=(5, 5),
        textcoords='offset points',
        fontsize=9,
        bbox=dict(boxstyle='round,pad=0.3', facecolor='white', edgecolor='gray', alpha=0.7)
    )

ax.set_xlabel('Economic Fitness 2000', fontsize=12)
ax.set_ylabel('Economic Fitness 2020', fontsize=12)
ax.set_title('Economic Fitness: 2000 vs 2020\n(Countries above diagonal improved, below declined)', fontsize=14)
ax.legend()
ax.grid(True, alpha=0.3)

# Add colorbar
cbar = plt.colorbar(scatter, ax=ax)
cbar.set_label('Fitness Change (2020 - 2000)', fontsize=11)

plt.tight_layout()
plt.show()

# Compute correlation
r_pearson, p_pearson = pearsonr(comparison['fitness_2000'], comparison['fitness_2020'])
r_spearman, p_spearman = spearmanr(comparison['fitness_2000'], comparison['fitness_2020'])

print("\nCorrelation between 2000 and 2020 fitness:")
print(f"  Pearson r = {r_pearson:.3f} (p = {p_pearson:.1e})")
print(f"  Spearman ρ = {r_spearman:.3f} (p = {p_spearman:.1e})")

## 6. Product Complexity Analysis

Now let's examine which products are most complex and how complexity changed over time.
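In the Tacchella et al. update, a product's unnormalized complexity is $1 / \sum_c M_{cp}/F_c$, so it depends on the fitness of every country that exports it. Each additional exporter lowers complexity, but a low-fitness exporter lowers it far more than a high-fitness one. The one-step worked example below uses made-up fitness values and a hypothetical helper, `raw_complexity`.

```python
import numpy as np


def raw_complexity(exporter_fitness):
    """One unnormalized complexity update: 1 / (sum over exporters of 1/F_c)."""
    f = np.asarray(exporter_fitness, dtype=float)
    return 1.0 / np.sum(1.0 / f)


print(raw_complexity([4.0, 4.0]))       # 2.0: two very fit exporters
print(raw_complexity([4.0, 4.0, 4.0]))  # 1.333...: one more fit exporter
print(raw_complexity([4.0, 4.0, 0.1]))  # 0.0952...: one weak exporter dominates
```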

In [None]:
# Add complexity to product dataframes
products_2000['complexity'] = Q_2000
products_2000 = products_2000.sort_values('complexity', ascending=False).reset_index(drop=True)
products_2000['rank_2000'] = products_2000.index + 1

products_2020['complexity'] = Q_2020
products_2020 = products_2020.sort_values('complexity', ascending=False).reset_index(drop=True)
products_2020['rank_2020'] = products_2020.index + 1

# Merge for comparison
product_comparison = products_2000[['product', 'complexity', 'rank_2000']].merge(
    products_2020[['product', 'complexity', 'rank_2020']],
    on='product',
    how='inner',
    suffixes=('_2000', '_2020')
)

product_comparison['complexity_change'] = product_comparison['complexity_2020'] - product_comparison['complexity_2000']

print("="*60)
print("TOP 15 MOST COMPLEX PRODUCTS (2020)")
print("="*60)
print("\n{:<12} {:>15} {:>15} {:>15}".format(
    'Product', 'Complex. 2000', 'Complex. 2020', 'Change'
))
print("-"*60)

for _, row in product_comparison.nlargest(15, 'complexity_2020').iterrows():
    print(f"{row['product']:<12} {row['complexity_2000']:>15.4f} "
          f"{row['complexity_2020']:>15.4f} {row['complexity_change']:>+15.4f}")

In [None]:
# Plot product complexity distributions
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Histogram of complexity
axes[0, 0].hist(np.log10(Q_2000 + 1e-10), bins=30, alpha=0.5, label='2000', density=True)
axes[0, 0].hist(np.log10(Q_2020 + 1e-10), bins=30, alpha=0.5, label='2020', density=True)
axes[0, 0].set_xlabel('log₁₀(Complexity)')
axes[0, 0].set_ylabel('Density')
axes[0, 0].set_title('Distribution of Product Complexity')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Complexity scatter plot
axes[0, 1].scatter(
    product_comparison['complexity_2000'],
    product_comparison['complexity_2020'],
    alpha=0.5,
    s=50
)
max_q = max(product_comparison['complexity_2000'].max(), product_comparison['complexity_2020'].max())
axes[0, 1].plot([0, max_q], [0, max_q], 'k--', alpha=0.3)
axes[0, 1].set_xlabel('Product Complexity 2000')
axes[0, 1].set_ylabel('Product Complexity 2020')
axes[0, 1].set_title('Product Complexity: 2000 vs 2020')
axes[0, 1].grid(True, alpha=0.3)

# Complexity changes
axes[1, 0].hist(product_comparison['complexity_change'], bins=40, edgecolor='black', alpha=0.7)
axes[1, 0].axvline(0, color='red', linestyle='--', linewidth=1, alpha=0.5)
axes[1, 0].set_xlabel('Change in Complexity (2020 - 2000)')
axes[1, 0].set_ylabel('Number of Products')
axes[1, 0].set_title('Distribution of Complexity Changes')
axes[1, 0].grid(True, alpha=0.3)

# Rank changes
product_comparison['rank_change'] = product_comparison['rank_2000'] - product_comparison['rank_2020']
axes[1, 1].hist(product_comparison['rank_change'], bins=40, edgecolor='black', alpha=0.7)
axes[1, 1].axvline(0, color='red', linestyle='--', linewidth=1, alpha=0.5)
axes[1, 1].set_xlabel('Rank Change (positive = more complex in 2020)')
axes[1, 1].set_ylabel('Number of Products')
axes[1, 1].set_title('Distribution of Complexity Rank Changes')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 7. Diversification Analysis

Let's examine how product diversification (the number of products a country exports with revealed comparative advantage, i.e. the row sums of the export matrix) relates to economic fitness.
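Diversification is the row sum of the binary matrix $M$. The matching column sum, usually called ubiquity, counts how many countries export each product. The sketch below computes both on a small dense toy matrix using a hypothetical helper, `diversification_and_ubiquity`. For the SciPy sparse matrices returned by the loader, `np.asarray(M.sum(axis=1)).ravel()` gives the same flat array of row sums.

```python
import numpy as np


def diversification_and_ubiquity(M):
    """Row sums (diversification) and column sums (ubiquity) of a binary matrix."""
    M = np.asarray(M)
    diversification = M.sum(axis=1)  # products per country
    ubiquity = M.sum(axis=0)         # countries per product
    return diversification, ubiquity


M_toy = np.array([
    [1, 1, 1, 1],  # highly diversified country
    [1, 1, 0, 0],
    [1, 0, 0, 0],  # exports only the most ubiquitous product
])
div, ubq = diversification_and_ubiquity(M_toy)
print(div.tolist())  # [4, 2, 1]
print(ubq.tolist())  # [3, 2, 1, 1]
```

In a nested matrix like this one, diversified countries export both ubiquitous and rare products, while less diversified countries export mostly ubiquitous ones. Fitness goes further than a count because it also weighs which products a country exports.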

In [None]:
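# Diversification = number of products a country exports with RCA above the
# threshold, i.e. the row sums of M. Rows of M follow the loader's original
# country order, but countries_2000/2020 were re-sorted by fitness in Section 3
# and `comparison` has a fresh 0..n-1 index from the merge, so the values must
# be matched to countries by name rather than by position.
def diversification_by_country(M, F, ranked):
    div = np.asarray(M.sum(axis=1)).ravel()
    F = np.asarray(F, dtype=float)
    # Reproduce the fitness-descending order used in Section 3; exact fitness
    # ties normally come from identical export baskets with equal diversification.
    order = np.argsort(-F, kind='stable')
    assert np.allclose(ranked['fitness'].to_numpy(), F[order], equal_nan=True)
    return pd.Series(div[order], index=ranked['country'].to_numpy())


div_2000 = diversification_by_country(M_2000, F_2000, countries_2000)
div_2020 = diversification_by_country(M_2020, F_2020, countries_2020)

comparison['diversification_2000'] = comparison['country'].map(div_2000)
comparison['diversification_2020'] = comparison['country'].map(div_2020)
comparison['div_change'] = comparison['diversification_2020'] - comparison['diversification_2000']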

# Plot diversification vs fitness
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# 2000
axes[0].scatter(
    comparison['diversification_2000'],
    comparison['fitness_2000'],
    alpha=0.6,
    s=80,
    edgecolors='black',
    linewidth=0.5
)
axes[0].set_xlabel('Number of Products Exported', fontsize=12)
axes[0].set_ylabel('Economic Fitness', fontsize=12)
axes[0].set_title('Diversification vs Fitness (2000)', fontsize=13)
axes[0].grid(True, alpha=0.3)

# Add trend line
z = np.polyfit(comparison['diversification_2000'], comparison['fitness_2000'], 1)
p = np.poly1d(z)
x_line = np.linspace(comparison['diversification_2000'].min(), comparison['diversification_2000'].max(), 100)
axes[0].plot(x_line, p(x_line), 'r--', alpha=0.5, linewidth=2, label='Linear fit')
axes[0].legend()

# 2020
axes[1].scatter(
    comparison['diversification_2020'],
    comparison['fitness_2020'],
    alpha=0.6,
    s=80,
    edgecolors='black',
    linewidth=0.5
)
axes[1].set_xlabel('Number of Products Exported', fontsize=12)
axes[1].set_ylabel('Economic Fitness', fontsize=12)
axes[1].set_title('Diversification vs Fitness (2020)', fontsize=13)
axes[1].grid(True, alpha=0.3)

# Add trend line
z = np.polyfit(comparison['diversification_2020'], comparison['fitness_2020'], 1)
p = np.poly1d(z)
x_line = np.linspace(comparison['diversification_2020'].min(), comparison['diversification_2020'].max(), 100)
axes[1].plot(x_line, p(x_line), 'r--', alpha=0.5, linewidth=2, label='Linear fit')
axes[1].legend()

plt.tight_layout()
plt.show()

# Compute correlations
r_2000, p_2000 = spearmanr(comparison['diversification_2000'], comparison['fitness_2000'])
r_2020, p_2020 = spearmanr(comparison['diversification_2020'], comparison['fitness_2020'])

print("\nCorrelation between diversification and fitness:")
print(f"  2000: Spearman ρ = {r_2000:.3f} (p = {p_2000:.1e})")
print(f"  2020: Spearman ρ = {r_2020:.3f} (p = {p_2020:.1e})")

## 8. Summary Statistics

Let's summarize the key findings from our analysis.

In [None]:
print("="*70)
print("SUMMARY OF CHANGES (2000-2020)")
print("="*70)

print(f"\nCountries analyzed: {len(comparison)}")
print(f"Products compared: {len(product_comparison)}")

print(f"\n{'FITNESS CHANGES:':<40}")
print(f"  Mean change: {comparison['fitness_change'].mean():>25.4f}")
print(f"  Median change: {comparison['fitness_change'].median():>23.4f}")
print(f"  Std dev: {comparison['fitness_change'].std():>29.4f}")
print(f"  Countries that improved: {(comparison['fitness_change'] > 0).sum():>16}")
print(f"  Countries that declined: {(comparison['fitness_change'] < 0).sum():>16}")

print(f"\n{'DIVERSIFICATION CHANGES:':<40}")
print(f"  Mean change in # products: {comparison['div_change'].mean():>14.1f}")
print(f"  Median change in # products: {comparison['div_change'].median():>12.1f}")
print(f"  Countries more diversified: {(comparison['div_change'] > 0).sum():>15}")
print(f"  Countries less diversified: {(comparison['div_change'] < 0).sum():>15}")

print(f"\n{'PRODUCT COMPLEXITY:':<40}")
print(f"  Mean complexity change: {product_comparison['complexity_change'].mean():>17.4f}")
print(f"  Products more complex: {(product_comparison['complexity_change'] > 0).sum():>18}")
print(f"  Products less complex: {(product_comparison['complexity_change'] < 0).sum():>18}")

print("\n" + "="*70)

## 9. Key Insights

The results above can be read in terms of several patterns:

1. **Persistence**: A high correlation between fitness in 2000 and 2020 indicates that economic capabilities are persistent. Countries that were fit in 2000 tend to remain fit.

2. **Diversification matters**: A positive correlation between the number of products exported and economic fitness is consistent with capability diversity being important. Fitness is computed from the same export matrix, however, so the two measures are not independent.

3. **Complexity evolution**: Product complexity rankings tend to be relatively stable, but some products become more or less complex as global production patterns shift.

4. **Winners and losers**: Some countries make significant gains in fitness (often emerging economies), while others decline (for instance, through specialization in declining sectors). The gainer and loser tables in Section 4 show which countries fall into each group for this dataset.

## References

- Tacchella, A., Cristelli, M., Caldarelli, G., Gabrielli, A., & Pietronero, L. (2012). A new metrics for countries' fitness and products' complexity. *Scientific Reports*, 2(1), 723.
- Hausmann, R., et al. (2014). *The Atlas of Economic Complexity*. MIT Press.
- Data source: Harvard Growth Lab's Atlas of Economic Complexity (https://atlas.hks.harvard.edu/)