# Chart 1: Immigration to Sweden by Region of Origin (2000–2024)

## Data Source
**Statistics Sweden (SCB)** — Official Swedish government statistics

| Field | Value |
|-------|-------|
| Table | `BE0101M3` — Immigrations and emigrations by country of birth and sex |
| URL | https://www.statistikdatabasen.scb.se/pxweb/en/ssd/START__BE__BE0101__BE0101J/ImmiEmiFod/ |
| Coverage | 2000–2024 |
| Unit | Number of persons immigrating to Sweden per year |

## Methodology
1. Download immigration data by country of birth from SCB
2. Group ~200 countries into 8 geographic regions
3. Aggregate annual totals by region
4. Export to JSON for Vega-Lite visualization
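
The four steps above can be sketched on a toy table. This is a minimal illustration rather than the notebook's full code: the four rows are copied from the SCB extract previewed further down, and `toy_mapping` is a hypothetical subset of the full region mapping.

```python
import json
import pandas as pd

# Step 1 stand-in: four rows and two year columns copied from the SCB extract
toy = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Australia"],
    "2000": [832, 62, 92, 357],
    "2024": [1436, 626, 161, 376],
})

# Step 2: map each country to a region (hypothetical subset of the full mapping)
toy_mapping = {
    "Afghanistan": "Middle East & North Africa",
    "Algeria": "Middle East & North Africa",
    "Albania": "European non-EU",
    "Australia": "Americas & Oceania",
}
toy["region"] = toy["country"].map(toy_mapping).fillna("Other")

# Step 3: sum each year column within a region
toy_agg = toy.groupby("region")[["2000", "2024"]].sum().reset_index()

# Step 4: reshape to one record per (region, year) and serialize as JSON
toy_long = toy_agg.melt(id_vars="region", var_name="year", value_name="immigrants")
toy_long["year"] = toy_long["year"].astype(int)
toy_records = toy_long.to_dict(orient="records")
toy_json = json.dumps(toy_records, indent=2)

print(toy_json)
```

Each record carries one region, one year, and one count, which is the long format Vega-Lite expects.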

In [5]:
import pandas as pd
import json

print("Libraries loaded successfully")

Libraries loaded successfully


In [8]:
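# Optional (Colab only): upload a local copy of the CSV instead of reading DATA_URL.
# Not needed for the GitHub load below, so it is left commented out.
# from google.colab import files
# uploaded = files.upload()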

# Load data directly from GitHub
# Raw CSV downloaded from SCB, hosted on the repository in project/data/p1_raw.csv
# Alternatively you can go to: https://www.statistikdatabasen.scb.se/pxweb/en/ssd/START__BE__BE0101__BE0101J/ImmiEmiFod/
# Select: Immigrations → All countries → All years (2000-2024)

DATA_URL = "https://raw.githubusercontent.com/kelvinchwng/kelvinchwng.github.io/main/project/data/p1_raw.csv"

# SCB uses special characters, so we need latin-1 encoding
df_raw = pd.read_csv(DATA_URL, encoding='latin-1')

print(f"Loaded from: {DATA_URL}")
print(f"Shape: {df_raw.shape[0]} countries × {df_raw.shape[1]} columns")
print(f"\nColumns: {df_raw.columns.tolist()}")

print("First 10 rows:")
df_raw.head(10)

Loaded from: https://raw.githubusercontent.com/kelvinchwng/kelvinchwng.github.io/main/project/data/p1_raw.csv
Shape: 208 countries × 26 columns

Columns: ['country of birth', '2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021', '2022', '2023', '2024']
First 10 rows:


Unnamed: 0,country of birth,2000,2001,2002,2003,2004,2005,2006,2007,2008,...,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024
0,Afghanistan,832,950,952,929,851,577,1592,816,971,...,2974,3607,9297,8093,6845,2270,2022,2732,2546,1436
1,Albania,62,45,70,65,71,103,170,97,110,...,498,676,1085,1331,1445,911,759,1050,889,626
2,Algeria,92,89,90,88,100,135,184,164,160,...,150,183,168,169,167,101,100,128,138,161
3,Andorra,0,0,0,0,0,0,0,0,0,...,1,0,0,1,1,2,0,2,1,1
4,Angola,23,34,41,38,33,33,97,35,25,...,17,32,24,31,29,22,19,17,29,18
5,Anguilla,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,Antigua and Barbuda,0,0,1,1,0,1,0,0,1,...,0,0,1,0,0,0,1,1,0,1
7,Argentina,83,130,169,148,95,116,123,120,147,...,135,144,138,234,327,299,396,361,294,251
8,Armenia,76,74,90,82,94,125,279,149,119,...,235,170,197,189,151,111,154,88,94,170
9,Australia,357,336,313,321,322,295,368,367,397,...,501,527,537,553,427,319,303,401,373,376


In [10]:
# Rename the first column for easier handling
df = df_raw.copy()
df = df.rename(columns={df.columns[0]: 'country'})

# List all countries in the dataset
print(f"Total countries: {len(df)}")
print(f"\nAll countries:")
print(df['country'].tolist())

# Check data types and missing values
print("Data types:")
print(df.dtypes)

print(f"\nMissing values: {df.isnull().sum().sum()}")

Total countries: 208

All countries:
['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola', 'Anguilla', 'Antigua and Barbuda', 'Argentina', 'Armenia', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bhutan', 'Bolivia', 'Bosnia and Herzegovina', 'Botswana', 'Brazil', 'British Virgin Islands', 'Brunei', 'Bulgaria', 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde', 'Central African Republic', 'Chad', 'Chile', 'China', 'Colombia', 'Comoros', 'Congo, the Republic of the', 'Costa Rica', 'Cote d´Ivoire', 'Croatia', 'Cuba', 'Cyprus', 'Czech Republic', 'Czechoslovakia', 'Democratic Republic of the Congo', 'Denmark', 'Djibouti', 'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia', 'Eswatini', 'Ethiopia', 'Fiji', 'Finland', 'France', 'Gabon', 'Gambia', 'Gaza Strip', 'Georgia', 'Germany', 'Ghana', 'Gibraltar', 'Greece', '

In [11]:
# Quick look at total immigration per year (before grouping)
year_cols = [col for col in df.columns if col.isdigit()]
print(f"Year columns found: {year_cols}")

print("\nTotal immigration by year (all countries):")
for year in ['2000', '2010', '2015', '2016', '2020', '2024']:
    if year in df.columns:
        total = df[year].sum()
        print(f"  {year}: {total:,}")

Year columns found: ['2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021', '2022', '2023', '2024']

Total immigration by year (all countries):
  2000: 58,659
  2010: 98,801
  2015: 134,240
  2016: 163,005
  2020: 82,518
  2024: 116,197


| Region | Description |
|--------|-------------|
| Sweden | Returning Swedish-born residents |
| Other Nordic | Denmark, Norway, Finland, Iceland |
| EU (excl. Nordic) | Other EU member states, plus the United Kingdom and European microstates |
| European non-EU | Ukraine, Russia, Balkans, Switzerland, etc. |
| Middle East & North Africa | Syria, Iraq, Iran, Afghanistan, Turkey, etc. |
| Sub-Saharan Africa | Eritrea, Somalia, Ethiopia, etc. |
| Asia | India, China, Thailand, Pakistan, etc. |
| Americas & Oceania | USA, South America, Australia, etc. |
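
The mapping in the next cell is keyed on exact country-name strings, so a small spelling difference in the SCB export (a trailing space, for instance) would send a country to the fallback bucket. One way to make the lookup more forgiving is sketched below; `normalize_name`, `lookup_region`, and the three-entry `sample_mapping` are hypothetical names used only for illustration and are not part of the notebook.

```python
def normalize_name(name: str) -> str:
    """Collapse runs of whitespace and trim both ends of a country name."""
    return " ".join(name.split())


# Hypothetical three-entry mapping; keys are stored in normalized form
sample_mapping = {
    normalize_name(key): region
    for key, region in {
        "Kingdom of the Netherlands ": "EU (excl. Nordic)",  # trailing space, as in the SCB data
        "Denmark": "Other Nordic",
        "Sweden": "Sweden",
    }.items()
}


def lookup_region(country: str, mapping: dict, default: str = "Other") -> str:
    """Return the region for a country, ignoring stray whitespace."""
    return mapping.get(normalize_name(country), default)


print(lookup_region("Kingdom of the Netherlands ", sample_mapping))
print(lookup_region("  Denmark", sample_mapping))
print(lookup_region("Vatican City", sample_mapping))
```

Normalizing both the dictionary keys and the looked-up value means the mapping no longer has to reproduce whitespace quirks from the source file.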

In [12]:
# Complete region mapping dictionary
# Maps each country name (as it appears in SCB data) to a region

region_mapping = {
    # ========== SWEDEN ==========
    'Sweden': 'Sweden',

    # ========== OTHER NORDIC ==========
    'Denmark': 'Other Nordic',
    'Norway': 'Other Nordic',
    'Finland': 'Other Nordic',
    'Iceland': 'Other Nordic',

    # ========== EU (EXCL. NORDIC) ==========
    # Western Europe
    'Germany': 'EU (excl. Nordic)',
    'United Kingdom': 'EU (excl. Nordic)',
    'France': 'EU (excl. Nordic)',
    'Netherlands': 'EU (excl. Nordic)',
    'Kingdom of the Netherlands ': 'EU (excl. Nordic)',  # Note: trailing space in SCB data
    'Belgium': 'EU (excl. Nordic)',
    'Austria': 'EU (excl. Nordic)',
    'Ireland': 'EU (excl. Nordic)',
    'Luxembourg': 'EU (excl. Nordic)',

    # Southern Europe
    'Spain': 'EU (excl. Nordic)',
    'Italy': 'EU (excl. Nordic)',
    'Portugal': 'EU (excl. Nordic)',
    'Greece': 'EU (excl. Nordic)',
    'Malta': 'EU (excl. Nordic)',
    'Cyprus': 'EU (excl. Nordic)',

    # Eastern Europe (EU members)
    'Poland': 'EU (excl. Nordic)',
    'Romania': 'EU (excl. Nordic)',
    'Bulgaria': 'EU (excl. Nordic)',
    'Hungary': 'EU (excl. Nordic)',
    'Czech Republic': 'EU (excl. Nordic)',
    'Czechoslovakia': 'EU (excl. Nordic)',  # Historical
    'Slovakia': 'EU (excl. Nordic)',
    'Slovenia': 'EU (excl. Nordic)',
    'Croatia': 'EU (excl. Nordic)',
    'Estonia': 'EU (excl. Nordic)',
    'Latvia': 'EU (excl. Nordic)',
    'Lithuania': 'EU (excl. Nordic)',

    # Microstates
    'Liechtenstein': 'EU (excl. Nordic)',
    'Monaco': 'EU (excl. Nordic)',
    'Andorra': 'EU (excl. Nordic)',
    'San Marino': 'EU (excl. Nordic)',
    'Gibraltar': 'EU (excl. Nordic)',

    # ========== EUROPEAN NON-EU ==========
    'Ukraine': 'European non-EU',
    'Russian Federation': 'European non-EU',
    'Belarus': 'European non-EU',
    'Moldova': 'European non-EU',
    'Moldova, Republic of': 'European non-EU',

    # Balkans
    'Serbia': 'European non-EU',
    'Serbia and Montenegro': 'European non-EU',  # Historical
    'Bosnia and Herzegovina': 'European non-EU',
    'Kosovo': 'European non-EU',
    'North Macedonia': 'European non-EU',
    'Albania': 'European non-EU',
    'Montenegro': 'European non-EU',
    'Yugoslavia': 'European non-EU',  # Historical

    # Caucasus
    'Georgia': 'European non-EU',
    'Armenia': 'European non-EU',
    'Azerbaijan': 'European non-EU',

    # Other
    'Switzerland': 'European non-EU',
    'Soviet Union': 'European non-EU',  # Historical

    # ========== MIDDLE EAST & NORTH AFRICA ==========
    # Levant & Iraq
    'Syrian Arab Republic': 'Middle East & North Africa',
    'Iraq': 'Middle East & North Africa',
    'Lebanon': 'Middle East & North Africa',
    'Jordan': 'Middle East & North Africa',
    'Palestinian territory, occupied': 'Middle East & North Africa',
    'Gaza Strip': 'Middle East & North Africa',
    'Israel': 'Middle East & North Africa',

    # Iran & Afghanistan
    'Iran, Islamic Republic of': 'Middle East & North Africa',
    'Iran (Islamic Republic of)': 'Middle East & North Africa',
    'Afghanistan': 'Middle East & North Africa',

    # Turkey
    'Türkiye': 'Middle East & North Africa',

    # Arabian Peninsula
    'Saudi Arabia': 'Middle East & North Africa',
    'United Arab Emirates': 'Middle East & North Africa',
    'Kuwait': 'Middle East & North Africa',
    'Qatar': 'Middle East & North Africa',
    'Bahrain': 'Middle East & North Africa',
    'Oman': 'Middle East & North Africa',
    'Yemen': 'Middle East & North Africa',

    # North Africa
    'Egypt': 'Middle East & North Africa',
    'Morocco': 'Middle East & North Africa',
    'Algeria': 'Middle East & North Africa',
    'Tunisia': 'Middle East & North Africa',
    'Libya': 'Middle East & North Africa',
    'Libyan Arab Jamahiriya': 'Middle East & North Africa',  # Historical name

    # ========== SUB-SAHARAN AFRICA ==========
    # Horn of Africa
    'Eritrea': 'Sub-Saharan Africa',
    'Somalia': 'Sub-Saharan Africa',
    'Ethiopia': 'Sub-Saharan Africa',
    'Djibouti': 'Sub-Saharan Africa',

    # East Africa
    'Kenya': 'Sub-Saharan Africa',
    'Uganda': 'Sub-Saharan Africa',
    'Tanzania, United Republic of': 'Sub-Saharan Africa',
    'Rwanda': 'Sub-Saharan Africa',
    'Burundi': 'Sub-Saharan Africa',

    # Central Africa
    'Democratic Republic of the Congo': 'Sub-Saharan Africa',
    'Congo, the Republic of the': 'Sub-Saharan Africa',
    'Cameroon': 'Sub-Saharan Africa',
    'Central African Republic': 'Sub-Saharan Africa',
    'Chad': 'Sub-Saharan Africa',
    'Gabon': 'Sub-Saharan Africa',
    'Equatorial Guinea': 'Sub-Saharan Africa',

    # West Africa
    'Nigeria': 'Sub-Saharan Africa',
    'Ghana': 'Sub-Saharan Africa',
    'Gambia': 'Sub-Saharan Africa',
    'Senegal': 'Sub-Saharan Africa',
    'Mali': 'Sub-Saharan Africa',
    'Guinea': 'Sub-Saharan Africa',
    'Guinea-Bissau': 'Sub-Saharan Africa',
    'Sierra Leone': 'Sub-Saharan Africa',
    'Liberia': 'Sub-Saharan Africa',
    "Cote d´Ivoire": 'Sub-Saharan Africa',
    'Burkina Faso': 'Sub-Saharan Africa',
    'Niger': 'Sub-Saharan Africa',
    'Benin': 'Sub-Saharan Africa',
    'Togo': 'Sub-Saharan Africa',
    'Mauritania': 'Sub-Saharan Africa',
    'Cape Verde': 'Sub-Saharan Africa',

    # Southern Africa
    'South Africa': 'Sub-Saharan Africa',
    'Zimbabwe': 'Sub-Saharan Africa',
    'Zambia': 'Sub-Saharan Africa',
    'Mozambique': 'Sub-Saharan Africa',
    'Moçambique': 'Sub-Saharan Africa',
    'Angola': 'Sub-Saharan Africa',
    'Namibia': 'Sub-Saharan Africa',
    'Botswana': 'Sub-Saharan Africa',
    'Eswatini': 'Sub-Saharan Africa',
    'Lesotho': 'Sub-Saharan Africa',
    'Malawi': 'Sub-Saharan Africa',
    'Madagascar': 'Sub-Saharan Africa',
    'Mauritius': 'Sub-Saharan Africa',
    'Comoros': 'Sub-Saharan Africa',
    'Seychelles': 'Sub-Saharan Africa',
    'Sao Tome and Principe': 'Sub-Saharan Africa',

    # Sudan
    'Sudan': 'Sub-Saharan Africa',
    'South Sudan': 'Sub-Saharan Africa',

    # ========== ASIA ==========
    # South Asia
    'India': 'Asia',
    'Pakistan': 'Asia',
    'Bangladesh': 'Asia',
    'Sri Lanka': 'Asia',
    'Nepal': 'Asia',
    'Bhutan': 'Asia',
    'Maldives': 'Asia',

    # East Asia
    'China': 'Asia',
    'Japan': 'Asia',
    'Korea, Republic of': 'Asia',
    'Korea, Republic of Korea': 'Asia',
    'Korea, Democratic People´s Republic of': 'Asia',
    "Korea, Dem. People´s Republic of": 'Asia',
    'Mongolia': 'Asia',
    'Taiwan': 'Asia',
    'Hong Kong': 'Asia',

    # Southeast Asia
    'Thailand': 'Asia',
    'Vietnam': 'Asia',
    'Viet Nam': 'Asia',
    'Philippines': 'Asia',
    'Indonesia': 'Asia',
    'Malaysia': 'Asia',
    'Singapore': 'Asia',
    'Myanmar': 'Asia',
    'Cambodia': 'Asia',
    'Laos': 'Asia',
    "Lao People's Democratic Republic": 'Asia',
    "Lao People´s Democratic Republic": 'Asia',
    'Brunei': 'Asia',
    'Timor-Leste': 'Asia',

    # Central Asia
    'Kazakhstan': 'Asia',
    'Uzbekistan': 'Asia',
    'Kyrgyzstan': 'Asia',
    'Tadzjikistan': 'Asia',
    'Turkmenistan': 'Asia',

    # ========== AMERICAS & OCEANIA ==========
    # North America
    'United States of America': 'Americas & Oceania',
    'Canada': 'Americas & Oceania',
    'Mexico': 'Americas & Oceania',

    # Central America
    'Guatemala': 'Americas & Oceania',
    'Honduras': 'Americas & Oceania',
    'El Salvador': 'Americas & Oceania',
    'Nicaragua': 'Americas & Oceania',
    'Costa Rica': 'Americas & Oceania',
    'Panama': 'Americas & Oceania',
    'Belize': 'Americas & Oceania',

    # Caribbean
    'Cuba': 'Americas & Oceania',
    'Dominican Republic': 'Americas & Oceania',
    'Jamaica': 'Americas & Oceania',
    'Haiti': 'Americas & Oceania',
    'Trinidad and Tobago': 'Americas & Oceania',
    'Bahamas': 'Americas & Oceania',
    'Barbados': 'Americas & Oceania',
    'Grenada': 'Americas & Oceania',
    'Saint Lucia': 'Americas & Oceania',
    'Saint Vincent and the Grenadines': 'Americas & Oceania',
    'Saint Kitts and Nevis': 'Americas & Oceania',
    'Antigua and Barbuda': 'Americas & Oceania',
    'Dominica': 'Americas & Oceania',
    'Anguilla': 'Americas & Oceania',
    'Bermuda': 'Americas & Oceania',
    'British Virgin Islands': 'Americas & Oceania',

    # South America
    'Brazil': 'Americas & Oceania',
    'Argentina': 'Americas & Oceania',
    'Chile': 'Americas & Oceania',
    'Colombia': 'Americas & Oceania',
    'Peru': 'Americas & Oceania',
    'Venezuela': 'Americas & Oceania',
    'Ecuador': 'Americas & Oceania',
    'Bolivia': 'Americas & Oceania',
    'Uruguay': 'Americas & Oceania',
    'Paraguay': 'Americas & Oceania',
    'Guyana': 'Americas & Oceania',
    'Suriname': 'Americas & Oceania',

    # Oceania
    'Australia': 'Americas & Oceania',
    'New Zealand': 'Americas & Oceania',
    'Fiji': 'Americas & Oceania',
    'Papua New Guinea': 'Americas & Oceania',
    'Samoa': 'Americas & Oceania',
    'Tonga': 'Americas & Oceania',
    'Vanuatu': 'Americas & Oceania',
    'Solomon Islands': 'Americas & Oceania',
    'Kiribati': 'Americas & Oceania',
    'Marshall Islands': 'Americas & Oceania',
    'Micronesia': 'Americas & Oceania',
    'Palau': 'Americas & Oceania',
    'Nauru': 'Americas & Oceania',
    'Tuvalu': 'Americas & Oceania',
}

print(f"Defined mappings for {len(region_mapping)} countries")
print(f"Regions: {sorted(set(region_mapping.values()))}")

Defined mappings for 216 countries
Regions: ['Americas & Oceania', 'Asia', 'EU (excl. Nordic)', 'European non-EU', 'Middle East & North Africa', 'Other Nordic', 'Sub-Saharan Africa', 'Sweden']


In [13]:
# Function to map country to region
def get_region(country):
    """Map a country name to its region. Returns 'Other' if not found."""
    return region_mapping.get(country, 'Other')

# Apply mapping
df['region'] = df['country'].apply(get_region)

# Check results
print("Region counts:")
print(df['region'].value_counts())

Region counts:
region
Americas & Oceania            52
Sub-Saharan Africa            49
EU (excl. Nordic)             31
Asia                          30
Middle East & North Africa    22
European non-EU               17
Other Nordic                   4
Other                          2
Sweden                         1
Name: count, dtype: int64


In [14]:
# Identify any unmapped countries
unmapped = df[df['region'] == 'Other']['country'].tolist()

if unmapped:
    print(f"Unmapped countries ({len(unmapped)}):")
    for c in unmapped:
        print(f"  - {c}")
else:
    print("All countries mapped successfully!")

Unmapped countries (2):
  - Vatican City
  - unknown country of birth


In [15]:
# Preview the mapping results
print("Sample of mapped data:")
df[['country', 'region', '2000', '2016', '2024']].head(20)

Sample of mapped data:


Unnamed: 0,country,region,2000,2016,2024
0,Afghanistan,Middle East & North Africa,832,3607,1436
1,Albania,European non-EU,62,676,626
2,Algeria,Middle East & North Africa,92,183,161
3,Andorra,EU (excl. Nordic),0,0,1
4,Angola,Sub-Saharan Africa,23,32,18
5,Anguilla,Americas & Oceania,0,0,0
6,Antigua and Barbuda,Americas & Oceania,0,0,1
7,Argentina,Americas & Oceania,83,144,251
8,Armenia,European non-EU,76,170,170
9,Australia,Americas & Oceania,357,527,376


In [16]:
# Get year columns (2000-2024)
year_cols = [str(y) for y in range(2000, 2025)]
year_cols = [c for c in year_cols if c in df.columns]  # Only include columns that exist

print(f"Year columns: {year_cols}")

Year columns: ['2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021', '2022', '2023', '2024']


In [17]:
# Aggregate immigration totals by region
df_agg = df.groupby('region')[year_cols].sum().reset_index()

print("Immigration by region (wide format):")
df_agg

Immigration by region (wide format):


Unnamed: 0,region,2000,2001,2002,2003,2004,2005,2006,2007,2008,...,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024
0,Americas & Oceania,4265,4339,4315,4279,4298,4382,5287,4964,5471,...,5456,5542,6329,6724,6729,5233,6371,6669,6055,5984
1,Asia,4610,4931,5707,7085,7662,9160,10503,11451,13228,...,12794,13938,17886,21467,22002,13877,17289,20570,19894,19013
2,EU (excl. Nordic),8247,8684,9119,8514,10512,12098,18257,23918,23962,...,24434,24757,24495,23062,20601,17001,19751,26713,24705,21143
3,European non-EU,5820,5328,5459,5615,5385,5256,9483,5458,5784,...,7325,7852,9278,9851,9517,7397,7347,8502,6745,34083
4,Middle East & North Africa,11329,11955,13207,12025,9839,9419,22232,23575,21895,...,47329,72044,51828,39747,27763,15483,16251,18141,15672,13268
5,Other,119,57,57,80,77,52,97,104,104,...,175,210,215,199,132,71,184,66,49,24
6,Other Nordic,8604,9223,10076,9961,8847,9096,9765,9581,8439,...,7333,7541,7208,6597,5981,5069,4961,5084,5037,4805
7,Sub-Saharan Africa,2183,2481,2881,3648,3941,4700,7305,8094,8900,...,14814,15803,12822,12150,11125,6727,7997,6822,5764,5970
8,Sweden,13482,13797,13266,12588,11467,11066,12821,12340,13388,...,14580,15318,14428,12805,11955,11660,10480,9869,10593,11907


In [18]:
# Convert to long format for Vega-Lite
df_long = df_agg.melt(
    id_vars=['region'],
    value_vars=year_cols,
    var_name='year',
    value_name='immigrants'
)

# Convert year to integer
df_long['year'] = df_long['year'].astype(int)

# Sort by year and region
df_long = df_long.sort_values(['year', 'region']).reset_index(drop=True)

print(f"Long format: {len(df_long)} rows")
df_long.head(20)

Long format: 225 rows


Unnamed: 0,region,year,immigrants
0,Americas & Oceania,2000,4265
1,Asia,2000,4610
2,EU (excl. Nordic),2000,8247
3,European non-EU,2000,5820
4,Middle East & North Africa,2000,11329
5,Other,2000,119
6,Other Nordic,2000,8604
7,Sub-Saharan Africa,2000,2183
8,Sweden,2000,13482
9,Americas & Oceania,2001,4339


In [19]:
# Verify totals match original data
print("Validation: Total immigration by year")
print("="*50)

for year in [2000, 2010, 2015, 2016, 2020, 2024]:
    original_total = df[str(year)].sum()
    grouped_total = df_long[df_long['year'] == year]['immigrants'].sum()
    match = "✓" if original_total == grouped_total else "✗"
    print(f"{year}: Original={original_total:,} | Grouped={grouped_total:,} {match}")

Validation: Total immigration by year
2000: Original=58,659 | Grouped=58,659 ✓
2010: Original=98,801 | Grouped=98,801 ✓
2015: Original=134,240 | Grouped=134,240 ✓
2016: Original=163,005 | Grouped=163,005 ✓
2020: Original=82,518 | Grouped=82,518 ✓
2024: Original=116,197 | Grouped=116,197 ✓


In [20]:
# Key statistics for the chart
print("\nKey findings:")
print("="*50)

# Peak year
yearly_totals = df_long.groupby('year')['immigrants'].sum()
peak_year = yearly_totals.idxmax()
peak_value = yearly_totals.max()
print(f"Peak immigration: {peak_year} ({peak_value:,} immigrants)")

# 2016 breakdown (Syrian crisis)
print(f"\n2016 breakdown (Syrian refugee crisis):")
df_2016 = df_long[df_long['year'] == 2016].sort_values('immigrants', ascending=False)
for _, row in df_2016.iterrows():
    if row['region'] != 'Other':
        pct = row['immigrants'] / df_2016['immigrants'].sum() * 100
        print(f"  {row['region']}: {row['immigrants']:,} ({pct:.1f}%)")

# 2024 breakdown (Ukraine)
print(f"\n2024 breakdown (Ukraine war):")
df_2024 = df_long[df_long['year'] == 2024].sort_values('immigrants', ascending=False)
for _, row in df_2024.iterrows():
    if row['region'] != 'Other':
        pct = row['immigrants'] / df_2024['immigrants'].sum() * 100
        print(f"  {row['region']}: {row['immigrants']:,} ({pct:.1f}%)")


Key findings:
Peak immigration: 2016 (163,005 immigrants)

2016 breakdown (Syrian refugee crisis):
  Middle East & North Africa: 72,044 (44.2%)
  EU (excl. Nordic): 24,757 (15.2%)
  Sub-Saharan Africa: 15,803 (9.7%)
  Sweden: 15,318 (9.4%)
  Asia: 13,938 (8.6%)
  European non-EU: 7,852 (4.8%)
  Other Nordic: 7,541 (4.6%)
  Americas & Oceania: 5,542 (3.4%)

2024 breakdown (Ukraine war):
  European non-EU: 34,083 (29.3%)
  EU (excl. Nordic): 21,143 (18.2%)
  Asia: 19,013 (16.4%)
  Middle East & North Africa: 13,268 (11.4%)
  Sweden: 11,907 (10.2%)
  Americas & Oceania: 5,984 (5.1%)
  Sub-Saharan Africa: 5,970 (5.1%)
  Other Nordic: 4,805 (4.1%)


In [21]:
# Export to JSON for Vega-Lite
output_filename = 'p1_data.json'

# Convert to list of dictionaries
chart_data = df_long.to_dict(orient='records')

# Save to file
with open(output_filename, 'w') as f:
    json.dump(chart_data, f, indent=2)

print(f"Saved: {output_filename}")
print(f"Records: {len(chart_data)}")

# Preview the JSON structure
print("\nJSON structure (first 3 records):")
print(json.dumps(chart_data[:3], indent=2))

Saved: p1_data.json
Records: 225

JSON structure (first 3 records):
[
  {
    "region": "Americas & Oceania",
    "year": 2000,
    "immigrants": 4265
  },
  {
    "region": "Asia",
    "year": 2000,
    "immigrants": 4610
  },
  {
    "region": "EU (excl. Nordic)",
    "year": 2000,
    "immigrants": 8247
  }
]


In [22]:
# Download the JSON file
from google.colab import files
files.download(output_filename)

print(f"\n✓ Downloaded: {output_filename}")
print("\nNext steps:")
print("1. Upload to GitHub: project/data/chart1_immigration_by_region.json")
print("2. Test in Vega Editor: https://vega.github.io/editor/")


✓ Downloaded: p1_data.json

Next steps:
1. Upload to GitHub: project/data/chart1_immigration_by_region.json
2. Test in Vega Editor: https://vega.github.io/editor/


---
## Summary

### Data Pipeline
```
SCB Table BE0101M3 (208 countries × 25 years)
    ↓
Loaded from GitHub: kelvinchwng.github.io/project/data/p1_raw.csv
    ↓
Region mapping (8 regions + "Other" for unmapped rows)
    ↓
Aggregation by region and year
    ↓
JSON export (225 records)
```

### Output Schema
```json
[
  {"region": "Sweden", "year": 2000, "immigrants": 13482},
  {"region": "Other Nordic", "year": 2000, "immigrants": 8604},
  ...
]
```
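
Records in this shape can be passed to Vega-Lite without further reshaping. The sketch below builds one possible spec as a Python dictionary. The stacked-area mark, the Vega-Lite v5 schema URL, the axis titles, and the relative data path `p1_data.json` are assumptions about how the chart might be set up, not the final chart definition.

```python
import json

# Assumed chart design: stacked area of immigrants per year, colored by region
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"url": "p1_data.json"},  # assumed location of the exported file
    "mark": "area",
    "encoding": {
        "x": {"field": "year", "type": "ordinal", "title": "Year"},
        "y": {
            "field": "immigrants",
            "type": "quantitative",
            "stack": "zero",  # stack regions on top of each other from zero
            "title": "Immigrants per year",
        },
        "color": {"field": "region", "type": "nominal", "title": "Region of birth"},
    },
}

spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

The printed JSON can be pasted into the Vega Editor, with the `data.url` value adjusted to wherever the file is actually hosted.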

### Key Findings
- **Peak year:** 2016 (163,005 immigrants) — Syrian refugee crisis
- **COVID impact:** 2020 saw 82,518 immigrants (lowest since 2005)
- **Ukraine surge:** in 2024, European non-EU immigration jumped to 34,083 (vs 6,745 in 2023 and 7,852 in 2016)
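
The regional shares behind these findings follow from dividing a region's count by the year's total. A small sketch using figures printed earlier in the notebook; `region_share` is a hypothetical helper, not a function defined in the cells above.

```python
def region_share(region_count: int, year_total: int) -> float:
    """Share of a year's immigration attributable to one region, in percent."""
    return round(region_count / year_total * 100, 1)


# Figures taken from the notebook output
mena_2016 = region_share(72_044, 163_005)    # Middle East & North Africa, 2016
non_eu_2024 = region_share(34_083, 116_197)  # European non-EU, 2024

print(f"Middle East & North Africa share of 2016 immigration: {mena_2016}%")
print(f"European non-EU share of 2024 immigration: {non_eu_2024}%")
```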