# 🏥 European Health Analysis: Infrastructure vs Outcomes
## Portfolio Project Using Eurostat API

**Goal:** Analyze the relationship between healthcare infrastructure and population health outcomes across Europe

**Data Source:** Eurostat API (Official EU Statistics)

---

## Step 1: Setup

In [None]:
# Install packages (statsmodels is required by Plotly's trendline='ols' in Step 7)
!pip install eurostat pandas matplotlib seaborn plotly statsmodels requests numpy -q
print("✅ Packages installed!")

In [None]:
# Imports
import eurostat
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import json
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 7)

print("📊 Ready to analyze!")

## Step 2: Fetch Data from Eurostat API

In [None]:
print("🔄 Fetching European health data...\n")
print("⏱️  This takes ~30-60 seconds...\n")

# Dataset 1: Life expectancy at birth
print("📥 1/2 Downloading life expectancy...")
life_exp_raw = eurostat.get_data_df('demo_mlexpec')
print(f"    ✅ {len(life_exp_raw):,} records\n")

# Dataset 2: Hospital beds
print("📥 2/2 Downloading hospital beds...")
beds_raw = eurostat.get_data_df('hlth_rs_bds')
print(f"    ✅ {len(beds_raw):,} records\n")

print("✨ Data download complete!")
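
Since the download takes a while, it can help to cache each dataset locally between runs. The following is a minimal sketch, not part of the eurostat package: `cached_fetch` and the `eurostat_cache` directory are hypothetical names, and the fetch function is passed in as an argument so the helper works with `eurostat.get_data_df` or any stand-in.

```python
from pathlib import Path

import pandas as pd


def cached_fetch(code, fetch_fn, cache_dir="eurostat_cache"):
    """Return the dataset for `code`, downloading it only if no cached copy exists.

    `cached_fetch` and `cache_dir` are hypothetical names used for illustration.
    `fetch_fn` is any callable that takes a dataset code and returns a DataFrame.
    """
    cache_path = Path(cache_dir) / f"{code}.pkl"
    if cache_path.exists():
        # Reuse the local copy instead of calling the API again
        return pd.read_pickle(cache_path)

    df = fetch_fn(code)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    df.to_pickle(cache_path)
    return df
```

In this notebook it could be called as `cached_fetch('demo_mlexpec', eurostat.get_data_df)`. Deleting the cache directory forces a fresh download.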

## Step 3: Process Life Expectancy

In [None]:
# Clean life expectancy data
life_exp = life_exp_raw.copy()

# Normalize column names: some versions of the eurostat package name the geo
# column 'geo\TIME_PERIOD' and may return non-string year labels
life_exp.columns = [str(col).split('\\')[0] for col in life_exp.columns]

# Filter: age = less than 1 year (birth), sex = total
life_exp = life_exp[
    (life_exp['age'] == 'Y_LT1') &
    (life_exp['sex'] == 'T')
]

# Find the most recent year with data for more than 15 geo codes
year_cols = sorted([col for col in life_exp.columns if col.isdigit()], reverse=True)

life_year = next((year for year in year_cols if life_exp[year].notna().sum() > 15), None)
if life_year is None:
    raise ValueError("No year has life expectancy data for more than 15 geo codes")

# Extract data
life_clean = life_exp[['geo', life_year]].copy()
life_clean.columns = ['country', 'life_expectancy']
life_clean['life_expectancy'] = pd.to_numeric(life_clean['life_expectancy'], errors='coerce')
life_clean = life_clean.dropna()

print(f"📊 Life Expectancy ({life_year})")
print(f"Countries: {len(life_clean)}\n")
print(life_clean.sort_values('life_expectancy', ascending=False).head(10))
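
The "most recent year with enough data" rule is easier to reason about on a small table. The sketch below uses made-up values and a hypothetical helper, `latest_year_with_coverage`, to show how a sparse latest year gets skipped.

```python
import numpy as np
import pandas as pd


def latest_year_with_coverage(df, min_rows):
    """Return the most recent year column with more than `min_rows` non-missing values."""
    year_cols = sorted((c for c in df.columns if str(c).isdigit()), reverse=True)
    for year in year_cols:
        if df[year].notna().sum() > min_rows:
            return year
    return None  # no year has enough data


# Made-up values for illustration; 2022 is mostly missing
toy = pd.DataFrame({
    "geo": ["AT", "BE", "BG"],
    "2020": [81.3, 80.8, 73.6],
    "2021": [81.3, 81.9, 71.4],
    "2022": [np.nan, np.nan, 74.3],
})

chosen_year = latest_year_with_coverage(toy, min_rows=2)
print(chosen_year)  # 2021
```

Returning `None` when no year qualifies lets the caller decide whether to stop or lower the threshold.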

## Step 4: Process Hospital Beds

In [None]:
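# Clean hospital beds data
beds = beds_raw.copy()

# Normalize column names: some versions of the eurostat package name the geo
# column 'geo\TIME_PERIOD' and may return non-string year labels
beds.columns = [str(col).split('\\')[0] for col in beds.columns]

# Filter for beds per 100k population
beds = beds[beds['unit'] == 'P_HTHAB']

# Keep one row per country. Assumption: the dataset has a 'facility' dimension
# in which the code 'HBEDT' stands for all hospital beds
if 'facility' in beds.columns:
    beds = beds[beds['facility'] == 'HBEDT']

# Find the most recent year with data for more than 15 geo codes
year_cols = sorted([col for col in beds.columns if col.isdigit()], reverse=True)

beds_year = next((year for year in year_cols if beds[year].notna().sum() > 15), None)
if beds_year is None:
    raise ValueError("No year has hospital bed data for more than 15 geo codes")

# Extract data
beds_clean = beds[['geo', beds_year]].copy()
beds_clean.columns = ['country', 'beds_per_100k']
beds_clean['beds_per_100k'] = pd.to_numeric(beds_clean['beds_per_100k'], errors='coerce')
beds_clean = beds_clean.dropna()

print(f"🏥 Hospital Beds ({beds_year})")
print(f"Countries: {len(beds_clean)}\n")
print(beds_clean.sort_values('beds_per_100k', ascending=False).head(10))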
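
Before merging, it is worth checking that each country appears only once, because a dataset with more than one dimension can return several rows per country for the same unit. The sketch below uses made-up rows. The `facility` column and its codes are assumptions for illustration and have not been verified against the real dataset.

```python
import pandas as pd

# Made-up rows: two bed types per country under the same unit
toy_beds = pd.DataFrame({
    "unit": ["P_HTHAB"] * 4,
    "facility": ["HBEDT", "HBEDT_CUR", "HBEDT", "HBEDT_CUR"],  # assumed codes
    "geo": ["AT", "AT", "BE", "BE"],
    "2021": [690.0, 530.0, 550.0, 500.0],
})

# Filtering on the unit alone still leaves two rows per country
by_unit = toy_beds[toy_beds["unit"] == "P_HTHAB"]
duplicated_geos = by_unit.loc[by_unit["geo"].duplicated(), "geo"].unique().tolist()
print(duplicated_geos)  # ['AT', 'BE']

# Filtering on the second dimension as well leaves one row per country
totals = by_unit[by_unit["facility"] == "HBEDT"]
print(totals["geo"].is_unique)  # True
```

A check like `beds_clean['country'].is_unique` just before the merge would catch this kind of duplication early.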

## Step 5: Combine & Enrich Dataset

In [None]:
# Merge datasets
df = life_clean.merge(beds_clean, on='country', how='inner')

# Add country names
country_map = {
    'AT': 'Austria', 'BE': 'Belgium', 'BG': 'Bulgaria', 'HR': 'Croatia',
    'CY': 'Cyprus', 'CZ': 'Czechia', 'DK': 'Denmark', 'EE': 'Estonia',
    'FI': 'Finland', 'FR': 'France', 'DE': 'Germany', 'EL': 'Greece',
    'HU': 'Hungary', 'IE': 'Ireland', 'IT': 'Italy', 'LV': 'Latvia',
    'LT': 'Lithuania', 'LU': 'Luxembourg', 'MT': 'Malta', 'NL': 'Netherlands',
    'PL': 'Poland', 'PT': 'Portugal', 'RO': 'Romania', 'SK': 'Slovakia',
    'SI': 'Slovenia', 'ES': 'Spain', 'SE': 'Sweden', 'NO': 'Norway',
    'IS': 'Iceland', 'CH': 'Switzerland', 'UK': 'United Kingdom'
}

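# Keep country-level rows only: Eurostat country codes have two letters, while
# aggregates such as 'EU27_2020' are longer and would distort the averages
df = df[df['country'].str.len() == 2].copy()
df['name'] = df['country'].map(country_map).fillna(df['country'])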

# Calculate categories
df['life_category'] = pd.cut(df['life_expectancy'], 
                              bins=[0, 77, 80, 100], 
                              labels=['Lower', 'Medium', 'High'])

df['beds_category'] = pd.cut(df['beds_per_100k'], 
                              bins=[0, 400, 600, 10000], 
                              labels=['Low', 'Medium', 'High'])

print(f"📊 Combined Dataset: {len(df)} countries\n")
print("Sample:")
print(df[['name', 'life_expectancy', 'beds_per_100k']].head())
print("\nStats:")
print(df[['life_expectancy', 'beds_per_100k']].describe())
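
One detail of `pd.cut` worth knowing is that bins are closed on the right by default, so a value equal to an edge falls into the lower category. A small sketch with made-up values and the same bin edges as the life expectancy categories:

```python
import pandas as pd

# Same bin edges and labels as the life expectancy categories; values are made up
values = pd.Series([76.5, 77.0, 77.1, 80.0, 83.2])
categories = pd.cut(values, bins=[0, 77, 80, 100], labels=["Lower", "Medium", "High"])

print(categories.tolist())
# ['Lower', 'Lower', 'Medium', 'Medium', 'High']
```

Here exactly 77.0 is labelled "Lower" and exactly 80.0 is labelled "Medium". Passing `right=False` would flip that behaviour if the opposite convention is wanted.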

## Step 6: Key Metrics

In [None]:
# Calculate correlation
correlation = df['beds_per_100k'].corr(df['life_expectancy'])

# European averages
avg_life = df['life_expectancy'].mean()
avg_beds = df['beds_per_100k'].mean()

# Top and bottom performers
top_life = df.nlargest(3, 'life_expectancy')[['name', 'life_expectancy', 'beds_per_100k']]
bottom_life = df.nsmallest(3, 'life_expectancy')[['name', 'life_expectancy', 'beds_per_100k']]
top_beds = df.nlargest(3, 'beds_per_100k')[['name', 'life_expectancy', 'beds_per_100k']]

print("🎯 KEY FINDINGS")
print("="*70)
print(f"\n📊 Correlation (Beds vs Life Expectancy): {correlation:.3f}")
print(f"\n📈 European Averages:")
print(f"   Life Expectancy: {avg_life:.1f} years")
print(f"   Hospital Beds: {avg_beds:.1f} per 100k")
print(f"\n🏆 Top 3 Life Expectancy:")
for _, row in top_life.iterrows():
    print(f"   {row['name']}: {row['life_expectancy']:.1f} years ({row['beds_per_100k']:.0f} beds per 100k)")
print(f"\n⚠️  Bottom 3 Life Expectancy:")
for _, row in bottom_life.iterrows():
    print(f"   {row['name']}: {row['life_expectancy']:.1f} years ({row['beds_per_100k']:.0f} beds per 100k)")
print(f"\n🏥 Top 3 Hospital Beds:")
for _, row in top_beds.iterrows():
    print(f"   {row['name']}: {row['beds_per_100k']:.0f} per 100k ({row['life_expectancy']:.1f} years)")
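
`Series.corr` computes Pearson's r by default, which measures linear association only. As a rough cross-check, a rank-based (Spearman) correlation can be obtained by applying the same call to the ranks. The values below are made up for illustration.

```python
import pandas as pd

# Made-up values: life expectancy falls as beds rise, but not along a straight line
toy = pd.DataFrame({
    "beds_per_100k": [300, 400, 500, 600, 800],
    "life_expectancy": [83.0, 82.5, 81.0, 78.0, 77.5],
})

pearson = toy["beds_per_100k"].corr(toy["life_expectancy"])

# Spearman's rho is Pearson's r computed on the ranks
spearman = toy["beds_per_100k"].rank().corr(toy["life_expectancy"].rank())

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

If the two numbers differ a lot on the real data, that may hint at outliers or a non-linear relationship.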

## Step 7: Visualizations 📊

In [None]:
# Chart 1: Life Expectancy Rankings
fig = px.bar(
    df.sort_values('life_expectancy', ascending=False),
    x='name',
    y='life_expectancy',
    color='life_expectancy',
    color_continuous_scale='Viridis',
    title=f'Life Expectancy at Birth Across Europe ({life_year})',
    labels={'life_expectancy': 'Years', 'name': 'Country'},
    height=500
)
fig.update_layout(xaxis_tickangle=-45, showlegend=False)
fig.show()

In [None]:
# Chart 2: Hospital Beds Distribution
fig = px.bar(
    df.sort_values('beds_per_100k', ascending=False),
    x='name',
    y='beds_per_100k',
    color='beds_per_100k',
    color_continuous_scale='Blues',
    title=f'Hospital Beds per 100,000 Population ({beds_year})',
    labels={'beds_per_100k': 'Beds per 100k', 'name': 'Country'},
    height=500
)
fig.update_layout(xaxis_tickangle=-45, showlegend=False)
fig.show()

In [None]:
# Chart 3: THE KEY INSIGHT - Correlation Scatter
fig = px.scatter(
    df,
    x='beds_per_100k',
    y='life_expectancy',
    text='name',
    size='life_expectancy',
    color='life_expectancy',
    color_continuous_scale='RdYlGn',
    trendline='ols',
    title=f'Healthcare Infrastructure vs Outcomes (Correlation: {correlation:.3f})',
    labels={
        'beds_per_100k': 'Hospital Beds per 100,000',
        'life_expectancy': 'Life Expectancy (years)'
    },
    height=650
)
fig.update_traces(textposition='top center', textfont_size=9)
fig.add_annotation(
    text=f"European Average: {avg_life:.1f} years",
    xref="paper", yref="paper",
    x=0.02, y=0.98, showarrow=False,
    bgcolor="white", bordercolor="black", borderwidth=1
)
fig.show()

print(f"\n💡 INSIGHT: Correlation is {correlation:.3f}")
direction = "positive" if correlation > 0 else "negative"
if abs(correlation) > 0.5:
    print(f"   → Strong {direction} relationship")
elif abs(correlation) > 0.3:
    print(f"   → Moderate {direction} relationship")
else:
    print("   → Weak relationship - other factors matter more!")
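
The `trendline='ols'` option draws an ordinary least squares line. For a single predictor, a comparable slope and intercept can be computed directly with `np.polyfit`, which makes the trend easier to quote in text. A sketch on made-up points that lie exactly on a known line:

```python
import numpy as np

# Made-up points lying exactly on life = 90 - 0.01 * beds
beds = np.array([300.0, 400.0, 500.0, 600.0, 800.0])
life = 90.0 - 0.01 * beds

# A degree-1 polynomial fit is a simple least-squares line; returns [slope, intercept]
slope, intercept = np.polyfit(beds, life, 1)

print(f"slope: {slope:.4f} years per extra bed per 100k")
print(f"intercept: {intercept:.1f} years")
```

On the real data this would be `np.polyfit(df['beds_per_100k'], df['life_expectancy'], 1)`.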

In [None]:
# Chart 4: Efficiency Analysis
# Countries doing well with less infrastructure = efficient!

df['efficiency'] = df['life_expectancy'] / (df['beds_per_100k'] / 100)

fig = px.scatter(
    df,
    x='beds_per_100k',
    y='life_expectancy',
    text='name',
    size='efficiency',
    color='efficiency',
    color_continuous_scale='Plasma',
    title='Healthcare System Efficiency (Outcomes per Unit Infrastructure)',
    labels={
        'beds_per_100k': 'Hospital Beds per 100,000',
        'life_expectancy': 'Life Expectancy (years)',
        'efficiency': 'Efficiency Score'
    },
    height=650
)
fig.update_traces(textposition='top center', textfont_size=9)
fig.add_hline(y=avg_life, line_dash="dash", line_color="red", 
              annotation_text="EU Average Life Expectancy")
fig.add_vline(x=avg_beds, line_dash="dash", line_color="blue",
              annotation_text="EU Average Beds")
fig.show()

# Find most efficient countries
efficient = df.nlargest(5, 'efficiency')[['name', 'life_expectancy', 'beds_per_100k', 'efficiency']]
print("\n🌟 Most Efficient Healthcare Systems:")
print("(High life expectancy relative to infrastructure)")
for _, row in efficient.iterrows():
    print(f"   {row['name']}: {row['life_expectancy']:.1f} years with {row['beds_per_100k']:.0f} beds")
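
To make the efficiency score concrete, here is the same formula applied to two made-up countries with identical life expectancy. The function name `efficiency_score` is hypothetical.

```python
def efficiency_score(life_expectancy, beds_per_100k):
    """Life expectancy in years divided by beds per 100k, expressed in hundreds of beds."""
    return life_expectancy / (beds_per_100k / 100)


# Made-up inputs: same life expectancy, different bed counts
print(efficiency_score(82.0, 400))  # 20.5
print(efficiency_score(82.0, 800))  # 10.25
```

The score halves when the bed count doubles, so this ranking is driven largely by low bed counts. It is best read as a rough screening ratio, not a full measure of system efficiency.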

In [None]:
# Chart 5: Category Heatmap
pivot = pd.crosstab(df['life_category'], df['beds_category'])

fig = px.imshow(
    pivot,
    text_auto=True,
    aspect='auto',
    color_continuous_scale='YlOrRd',
    title='Distribution: Life Expectancy vs Hospital Capacity Categories',
    labels={'x': 'Hospital Beds Category', 'y': 'Life Expectancy Category', 'color': 'Count'}
)
fig.show()

print("\n📊 Category Breakdown:")
print(pivot)
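
Depending on the data, some category combinations (or a whole category) may have no countries, which can change the shape or ordering of the heatmap. One way to get a stable 3x3 grid is to reindex the crosstab explicitly. A sketch with made-up category assignments:

```python
import pandas as pd

life_labels = ["Lower", "Medium", "High"]
beds_labels = ["Low", "Medium", "High"]

# Made-up category assignments for four countries; no country is in "Lower"
toy = pd.DataFrame({
    "life_category": ["High", "High", "Medium", "Medium"],
    "beds_category": ["Low", "Low", "Medium", "High"],
})

pivot = pd.crosstab(toy["life_category"], toy["beds_category"])

# Force a full 3x3 grid in a fixed order; combinations with no countries become 0
pivot = pivot.reindex(index=life_labels, columns=beds_labels, fill_value=0)
print(pivot)
```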

## Step 8: Export Results

In [None]:
# Save dataset
df.to_csv('eu_health_analysis.csv', index=False)
print("✅ Data saved: eu_health_analysis.csv")

# Create summary JSON
summary = {
    'metadata': {
        'analysis_date': datetime.now().strftime('%Y-%m-%d'),
        'data_source': 'Eurostat API',
        'countries': len(df)
    },
    'data_years': {
        'life_expectancy': life_year,
        'hospital_beds': beds_year
    },
    'european_benchmarks': {
        'avg_life_expectancy': round(float(avg_life), 2),
        'avg_beds_per_100k': round(float(avg_beds), 2)
    },
    'correlation': round(float(correlation), 3),
    'top_performers': {
        'life_expectancy': top_life['name'].tolist(),
        'efficiency': efficient['name'].tolist()
    },
    'key_insights': [
        f"Analyzed {len(df)} European countries using official Eurostat data",
        f"Correlation between beds and life expectancy: {correlation:.3f}",
        f"Average European life expectancy: {avg_life:.1f} years",
        "Infrastructure alone doesn't determine outcomes",
        "Some countries achieve high outcomes with moderate infrastructure (efficiency)"
    ]
}

with open('analysis_summary.json', 'w') as f:
    json.dump(summary, f, indent=2)

print("✅ Summary saved: analysis_summary.json")
print("\n📂 GitHub-ready files created!")
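
A quick way to confirm the export is usable is to read the JSON back and compare it with the original dictionary. The sketch below uses a made-up summary with a similar shape and writes to a temporary directory so it does not touch the real output files.

```python
import json
import tempfile
from pathlib import Path

# Made-up summary with a similar top-level shape to the export
summary = {
    "metadata": {"data_source": "Eurostat API", "countries": 3},
    "correlation": -0.412,
    "top_performers": {"life_expectancy": ["Spain", "Italy", "Sweden"]},
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "analysis_summary.json"
    with open(path, "w", encoding="utf-8") as f:
        json.dump(summary, f, indent=2)

    # Reload and confirm nothing was lost or changed in serialization
    with open(path, encoding="utf-8") as f:
        reloaded = json.load(f)

print(reloaded == summary)  # True
```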

## 🎯 Portfolio Impact Summary

In [None]:
print("\n" + "="*80)
print("🎯 ERASMUS+ APPLICATION HIGHLIGHTS")
print("="*80)

print(f"\n✅ EUROPEAN DATA FOCUS")
print(f"   • Used official Eurostat API (European Commission)")
print(f"   • Analyzed {len(df)} European countries")
print(f"   • Demonstrates European healthcare knowledge")

print(f"\n✅ TECHNICAL SKILLS")
print(f"   • REST API integration (no authentication)")
print(f"   • Data cleaning & preprocessing")
print(f"   • Statistical analysis (correlation)")
print(f"   • Interactive visualizations (Plotly)")

print(f"\n✅ HEALTHCARE INSIGHTS")
print(f"   • Infrastructure ≠ Outcomes (correlation: {correlation:.3f})")
print(f"   • Efficiency may matter more than capacity")
print(f"   • Policy-relevant findings")

print(f"\n✅ DELIVERABLES")
print(f"   • Working Jupyter notebook")
print(f"   • Clean dataset (CSV)")
print(f"   • Analysis summary (JSON)")
print(f"   • 5 publication-ready visualizations")

print(f"\n📝 MEDIUM ARTICLE ANGLE:")
print(f'   "Why More Hospital Beds Don\'t Always Mean Longer Lives:')
print(f'    A Data Analysis of {len(df)} European Healthcare Systems"')

print(f"\n💡 KEY MESSAGE FOR APPLICATIONS:")
print(f"   This project demonstrates my ability to:")
print(f"   1. Work with European data sources")
print(f"   2. Deliver end-to-end analytics solutions")
print(f"   3. Extract policy-relevant insights")
print(f"   4. Communicate findings visually")

print("\n" + "="*80)
print("\n🚀 NEXT: Upload to GitHub + Write Medium article + Deploy as Streamlit app")
print("="*80)

## 📚 Additional Analysis Ideas

### To strengthen your portfolio further:

1. **Time Series Analysis**
   - Track changes over 5-10 years
   - Identify trends and outliers
   
2. **Clustering Analysis**
   - Group countries by healthcare model
   - K-means or hierarchical clustering

3. **Predictive Modeling**
   - Build regression model
   - Predict life expectancy from multiple factors

4. **COVID-19 Impact**
   - Before/after comparison
   - Infrastructure stress testing

5. **Interactive Dashboard**
   - Deploy on Streamlit Cloud
   - Allow users to explore data
   - Add filters and comparisons
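
As a possible starting point for the time series idea, the wide year-per-column layout can be reshaped into a long table with `DataFrame.melt`, which makes per-country trends straightforward. The values below are made up, and the column names mirror the ones used earlier in this notebook.

```python
import pandas as pd

# Made-up wide table: one column per year
wide = pd.DataFrame({
    "geo": ["AT", "BE"],
    "2019": [82.0, 82.1],
    "2020": [81.3, 80.8],
    "2021": [81.3, 81.9],
})

year_cols = [c for c in wide.columns if c.isdigit()]

# One row per (country, year), ready for line charts or per-country trends
long = wide.melt(id_vars="geo", value_vars=year_cols,
                 var_name="year", value_name="life_expectancy")
long["year"] = long["year"].astype(int)

# Change between the first and last year for each country
change = (long.sort_values("year")
              .groupby("geo")["life_expectancy"]
              .agg(lambda s: s.iloc[-1] - s.iloc[0]))
print(change.round(2))
```

The long table can be passed to `px.line(long, x='year', y='life_expectancy', color='geo')` for a quick trend chart.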

---

**This notebook is ready for your GitHub portfolio!** 🎉