# Chart 3: Employment Rate by Migration Category & Years in Sweden

## Data Source
**Statistics Sweden (SCB)** — Integration Statistics (Register-based)

| Field | Value |
|-------|-------|
| Table | Number in population by employment status, grounds for settlement, number of years in Sweden, region of birth, sex, age and year |
| URL | https://www.statistikdatabasen.scb.se/pxweb/en/ssd/START__AA__AA0003__AA0003B/ |
| Coverage | 2020–2023 (annual) |
| Unit | Number of persons → converted to employment rate (%) |

## How to Replicate (Download from SCB)
1. Go to: https://www.statistikdatabasen.scb.se/pxweb/en/ssd/START__AA__AA0003__AA0003B/
2. Select table: **Persons aged 20-74 by occupation, grounds for settlement, number of years in Sweden, region of birth, sex and age**
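3. Choose variables (the notebook filters down to the values listed here itself, so a download with more values selected, like the published `p3_raw.csv`, also works):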
   - **employment status**: select `total` and `gainfully employed population`
   - **grounds for settlement**: select ALL
   - **number of years in Sweden**: select ALL
   - **region of birth**: select `total`
   - **sex**: select `total`
   - **age**: select `total`
   - **year**: select ALL (2020-2023)
4. Click **Continue**
5. Download as: **CSV with heading**
6. Save as `p3_raw.csv`
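The loading code below passes `skiprows=1` because, per its own comment, the CSV starts with a title row before the real header. A minimal sketch of that behaviour on an in-memory sample follows; the title text and the three columns are illustrative assumptions, not the exact SCB file.

```python
import io

import pandas as pd

# Hypothetical miniature of a "CSV with heading" download: line 1 is a table
# title, line 2 holds the real column names, then the data rows follow.
sample = (
    "Number in population by employment status (illustrative title)\n"
    '"employment status","grounds for settlement","2023"\n'
    '"total","total",7019050\n'
)

# skiprows=1 drops the title line, so pandas takes line 2 as the header
df = pd.read_csv(io.StringIO(sample), skiprows=1)
print(df.columns.tolist())
```

Without `skiprows=1`, pandas would treat the title line as the header and produce a single mislabelled column.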

## Output
Heat map data showing employment rate by:
- **Rows**: Migration category (Labour migrants, Refugees & relatives, Family reunion, Other)
- **Columns**: Years in Sweden (0-3, 4-9, 10+)
- **Color**: Employment rate (%)
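The chart specification itself is not shown in this notebook. As a sketch of how the exported records could map onto a Vega-Lite heat map, here is a spec written as a Python dict; the file URL, the schema version, and the fixed 2023 filter are assumptions, and a year selector is left out for brevity.

```python
import json

# Assumed: p3_data.json is served next to the page that renders the chart.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"url": "p3_data.json"},
    "transform": [{"filter": "datum.year == 2023"}],  # show a single year
    "mark": "rect",  # one coloured cell per category x duration pair
    "encoding": {
        "x": {
            "field": "years_in_sweden",
            "type": "ordinal",
            "sort": ["0-3 yrs", "4-9 yrs", "10+ yrs"],  # duration order, not alphabetical
            "title": "Years in Sweden",
        },
        "y": {"field": "category", "type": "nominal", "title": "Migration category"},
        "color": {
            "field": "employment_rate",
            "type": "quantitative",
            "title": "Employment rate (%)",
        },
    },
}

print(json.dumps(spec, indent=2))
```

The explicit `sort` matters because alphabetical order would place `10+ yrs` before `4-9 yrs`.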

In [10]:
```python
import pandas as pd
import json

print("Libraries loaded successfully")
```

Libraries loaded successfully


In [11]:
```python
# Load data directly from GitHub, or replicate using instructions above
DATA_URL = "https://raw.githubusercontent.com/kelvinchwng/kelvinchwng.github.io/main/project/data/p3_raw.csv"

# Skip the title row (row 0), use row 1 as header
df_raw = pd.read_csv(DATA_URL, encoding='utf-8', skiprows=1)

print(f"Loaded from: {DATA_URL}")
print(f"Shape: {df_raw.shape}")
print(f"\nColumns: {df_raw.columns.tolist()}")

# Preview the data
df_raw.head(10)
```

```
Loaded from: https://raw.githubusercontent.com/kelvinchwng/kelvinchwng.github.io/main/project/data/p3_raw.csv
Shape: (6804, 10)

Columns: ['employment status', 'grounds for settlement', 'number of years in Sweden', 'region of birth', 'sex', 'age', '2020', '2021', '2022', '2023']
```


| | employment status | grounds for settlement | number of years in Sweden | region of birth | sex | age | 2020 | 2021 | 2022 | 2023 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | total | total | total | total | total | total | 6964186 | 6979822 | 7009602 | 7019050 |
| 1 | total | total | total | total | total | 20-24 years | 579464 | 578214 | 585492 | 591869 |
| 2 | total | total | total | total | total | 25-29 years | 717758 | 692315 | 666652 | 642641 |
| 3 | total | total | total | total | total | 30-39 years | 1398677 | 1437787 | 1475318 | 1496129 |
| 4 | total | total | total | total | total | 40-49 years | 1301464 | 1299227 | 1300728 | 1299753 |
| 5 | total | total | total | total | total | 50-59 years | 1310242 | 1326134 | 1339202 | 1343104 |
| 6 | total | total | total | total | total | 60-64 years | 569227 | 569806 | 575089 | 587195 |
| 7 | total | total | total | total | total | 65-69 years | 536735 | 540223 | 542741 | 543719 |
| 8 | total | total | total | total | total | 70-74 years | 550616 | 536117 | 524385 | 514636 |
| 9 | total | total | total | Sweden | total | total | 5301126 | 5273091 | 5250604 | 5232057 |


In [12]:
```python
# Check unique values for key columns
print("Employment status values:")
print(df_raw['employment status'].unique())

print("\nGrounds for settlement values:")
print(df_raw['grounds for settlement'].unique())

print("\nNumber of years in Sweden values:")
print(df_raw['number of years in Sweden'].unique())
```

```
Employment status values:
['total' 'not gainfully employed population'
 'gainfully employed population']

Grounds for settlement values:
['total' 'relatives - not refugees' 'labour market/students'
 'refugees/persons in need of subsidiary protection/relatives '
 'other/GFS not relevant' 'data not available']

Number of years in Sweden values:
['total' 'born in Sweden' '0-3 years' '4-9 years' '10- years'
 'data not available']
```

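The refugee label ends with a trailing space, which the filtering code below handles by matching it exactly. An alternative, sketched here on a toy frame, is to strip whitespace from every label once, right after loading. This is an assumption about a cleaner workflow, not the notebook's method; if you adopt it, the trailing space must also be removed from `settlement_categories` and `category_mapping`.

```python
import pandas as pd

# Toy frame reproducing the trailing space seen in one SCB label
df = pd.DataFrame({
    "grounds for settlement": [
        "labour market/students",
        "refugees/persons in need of subsidiary protection/relatives ",  # trailing space
    ],
    "count": [50652, 26843],
})

# Normalise labels once so later filters and mappings need no exact-whitespace match
df["grounds for settlement"] = df["grounds for settlement"].str.strip()

mask = df["grounds for settlement"] == "refugees/persons in need of subsidiary protection/relatives"
result = df.loc[mask, "count"].tolist()
print(result)  # [26843]
```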

In [13]:
```python
# Filter for what we need:
# - Total population and employed population
# - Specific grounds for settlement (not 'total' or 'data not available')
# - Specific years in Sweden (not 'total' or 'born in Sweden' or 'data not available')
# - Total for region, sex, age

# Define the categories we want
settlement_categories = [
    'labour market/students',
    'refugees/persons in need of subsidiary protection/relatives ',  # Note trailing space
    'relatives - not refugees',
    'other/GFS not relevant'
]

years_categories = ['0-3 years', '4-9 years', '10- years']

# Filter the dataframe
df_filtered = df_raw[
    (df_raw['grounds for settlement'].isin(settlement_categories)) &
    (df_raw['number of years in Sweden'].isin(years_categories)) &
    (df_raw['region of birth'] == 'total') &
    (df_raw['sex'] == 'total') &
    (df_raw['age'] == 'total')
].copy()

print(f"Filtered shape: {df_filtered.shape}")
df_filtered.head(20)
```

Filtered shape: (36, 10)


| | employment status | grounds for settlement | number of years in Sweden | region of birth | sex | age | 2020 | 2021 | 2022 | 2023 |
|---|---|---|---|---|---|---|---|---|---|---|
| 504 | total | relatives - not refugees | 0-3 years | total | total | total | 56858 | 55896 | 55123 | 53106 |
| 567 | total | relatives - not refugees | 4-9 years | total | total | total | 93221 | 90189 | 86557 | 84961 |
| 630 | total | relatives - not refugees | 10- years | total | total | total | 168535 | 186204 | 205647 | 220682 |
| 882 | total | labour market/students | 0-3 years | total | total | total | 49170 | 50897 | 52912 | 50652 |
| 945 | total | labour market/students | 4-9 years | total | total | total | 43174 | 42450 | 45388 | 45730 |
| 1008 | total | labour market/students | 10- years | total | total | total | 43987 | 51600 | 58714 | 63028 |
| 1260 | total | refugees/persons in need of subsidiary protect... | 0-3 years | total | total | total | 75144 | 52727 | 37164 | 26843 |
| 1323 | total | refugees/persons in need of subsidiary protect... | 4-9 years | total | total | total | 179930 | 204601 | 216923 | 209047 |
| 1386 | total | refugees/persons in need of subsidiary protect... | 10- years | total | total | total | 352337 | 364291 | 379500 | 404979 |
| 1638 | total | other/GFS not relevant | 0-3 years | total | total | total | 10987 | 15408 | 14675 | 12092 |

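The filtered shape of 36 rows equals 3 employment statuses × 4 grounds for settlement × 3 durations, i.e. one row per combination. Rather than eyeballing the shape, a small helper can assert this. `check_full_grid` is a hypothetical name, and the sketch is demonstrated on a toy grid.

```python
import itertools
import math

import pandas as pd


def check_full_grid(df, levels):
    """Hypothetical helper: True if df has exactly one row for every combination
    of the allowed values in `levels` (a dict mapping column -> list of values)."""
    keys = list(levels)
    expected = math.prod(len(values) for values in levels.values())
    in_levels = all(df[col].isin(values).all() for col, values in levels.items())
    no_duplicates = not df.duplicated(subset=keys).any()
    # All values allowed + no duplicate keys + right row count => every combination present
    return in_levels and no_duplicates and len(df) == expected


levels = {
    "employment status": ["total", "gainfully employed population"],
    "number of years in Sweden": ["0-3 years", "4-9 years", "10- years"],
}
toy = pd.DataFrame(list(itertools.product(*levels.values())), columns=list(levels))

print(check_full_grid(toy, levels))                              # complete grid
print(check_full_grid(toy.iloc[1:], levels))                     # one combination missing
print(check_full_grid(pd.concat([toy, toy.iloc[:1]]), levels))   # one duplicated row
```

Applied to the notebook, the levels would be the three employment status values, `settlement_categories`, and `years_categories`, passed as `check_full_grid(df_filtered, {...})`.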

In [14]:
```python
# Reshape: we need to calculate employment rate = employed / total * 100
# First, let's pivot to get total and employed side by side

# Melt the year columns into rows
year_cols = ['2020', '2021', '2022', '2023']
df_melted = df_filtered.melt(
    id_vars=['employment status', 'grounds for settlement', 'number of years in Sweden'],
    value_vars=year_cols,
    var_name='year',
    value_name='count'
)

print(f"Melted shape: {df_melted.shape}")
df_melted.head(10)
```

Melted shape: (144, 5)


| | employment status | grounds for settlement | number of years in Sweden | year | count |
|---|---|---|---|---|---|
| 0 | total | relatives - not refugees | 0-3 years | 2020 | 56858 |
| 1 | total | relatives - not refugees | 4-9 years | 2020 | 93221 |
| 2 | total | relatives - not refugees | 10- years | 2020 | 168535 |
| 3 | total | labour market/students | 0-3 years | 2020 | 49170 |
| 4 | total | labour market/students | 4-9 years | 2020 | 43174 |
| 5 | total | labour market/students | 10- years | 2020 | 43987 |
| 6 | total | refugees/persons in need of subsidiary protect... | 0-3 years | 2020 | 75144 |
| 7 | total | refugees/persons in need of subsidiary protect... | 4-9 years | 2020 | 179930 |
| 8 | total | refugees/persons in need of subsidiary protect... | 10- years | 2020 | 352337 |
| 9 | total | other/GFS not relevant | 0-3 years | 2020 | 10987 |

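`melt` converts the wide layout (one column per year) into a long one (one row per year). The toy sketch below uses two rows and two years taken from the tables above; it also shows that the new `year` column holds the original column names as strings, which is why the year is cast to `int` later on.

```python
import pandas as pd

# Wide toy frame: one row per employment status, one column per year
wide = pd.DataFrame({
    "employment status": ["total", "gainfully employed population"],
    "grounds for settlement": ["labour market/students"] * 2,
    "2022": [52912, 35999],
    "2023": [50652, 36475],
})

# Each (row, year column) pair becomes its own row: 2 rows x 2 years -> 4 rows
long = wide.melt(
    id_vars=["employment status", "grounds for settlement"],
    value_vars=["2022", "2023"],
    var_name="year",
    value_name="count",
)
print(long)
```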

In [15]:
```python
# Pivot to get 'total' and 'gainfully employed population' as columns
df_pivot = df_melted.pivot_table(
    index=['grounds for settlement', 'number of years in Sweden', 'year'],
    columns='employment status',
    values='count',
    aggfunc='first'
).reset_index()

print(f"Pivot shape: {df_pivot.shape}")
print(f"Columns: {df_pivot.columns.tolist()}")
df_pivot.head(10)
```

```
Pivot shape: (48, 6)
Columns: ['grounds for settlement', 'number of years in Sweden', 'year', 'gainfully employed population', 'not gainfully employed population', 'total']
```


| | grounds for settlement | number of years in Sweden | year | gainfully employed population | not gainfully employed population | total |
|---|---|---|---|---|---|---|
| 0 | labour market/students | 0-3 years | 2020 | 32338 | 16830 | 49170 |
| 1 | labour market/students | 0-3 years | 2021 | 34138 | 16760 | 50897 |
| 2 | labour market/students | 0-3 years | 2022 | 35999 | 16914 | 52912 |
| 3 | labour market/students | 0-3 years | 2023 | 36475 | 14176 | 50652 |
| 4 | labour market/students | 10- years | 2020 | 32899 | 11086 | 43987 |
| 5 | labour market/students | 10- years | 2021 | 38802 | 12799 | 51600 |
| 6 | labour market/students | 10- years | 2022 | 44413 | 14304 | 58714 |
| 7 | labour market/students | 10- years | 2023 | 48488 | 14540 | 63028 |
| 8 | labour market/students | 4-9 years | 2020 | 30899 | 12274 | 43174 |
| 9 | labour market/students | 4-9 years | 2021 | 30668 | 11780 | 42450 |

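With `aggfunc='first'`, `pivot_table` never complains about duplicate keys: if the same (group, year, status) appeared twice, it would silently keep the first value. Since each combination should occur exactly once, `DataFrame.pivot` is a stricter alternative because it raises `ValueError` on duplicates. The sketch below uses a hypothetical duplicated `total` row (99999) to show the difference.

```python
import pandas as pd

long = pd.DataFrame({
    "year": ["2023", "2023", "2023"],
    "employment status": ["total", "gainfully employed population", "total"],
    "count": [50652, 36475, 99999],  # 99999: hypothetical duplicate 'total'
})

# pivot_table keeps the first 'total' (50652) and silently drops 99999
silent = long.pivot_table(
    index="year", columns="employment status", values="count", aggfunc="first"
)
print(silent)

# pivot refuses to reshape when an index/column pair repeats
try:
    long.pivot(index="year", columns="employment status", values="count")
except ValueError as err:
    print("pivot failed:", err)
```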

In [16]:
```python
# Calculate employment rate
df_pivot['employment_rate'] = (df_pivot['gainfully employed population'] / df_pivot['total'] * 100).round(1)

# Clean up column names
df_pivot = df_pivot.rename(columns={
    'grounds for settlement': 'category',
    'number of years in Sweden': 'years_in_sweden'
})

# Clean up category names for display
category_mapping = {
    'labour market/students': 'Labour migrants',
    'refugees/persons in need of subsidiary protection/relatives ': 'Refugees & relatives',
    'relatives - not refugees': 'Family reunion',
    'other/GFS not relevant': 'Other'
}

years_mapping = {
    '0-3 years': '0-3 yrs',
    '4-9 years': '4-9 yrs',
    '10- years': '10+ yrs'
}

df_pivot['category'] = df_pivot['category'].map(category_mapping)
df_pivot['years_in_sweden'] = df_pivot['years_in_sweden'].map(years_mapping)

df_pivot.head(20)
```

| | category | years_in_sweden | year | gainfully employed population | not gainfully employed population | total | employment_rate |
|---|---|---|---|---|---|---|---|
| 0 | Labour migrants | 0-3 yrs | 2020 | 32338 | 16830 | 49170 | 65.8 |
| 1 | Labour migrants | 0-3 yrs | 2021 | 34138 | 16760 | 50897 | 67.1 |
| 2 | Labour migrants | 0-3 yrs | 2022 | 35999 | 16914 | 52912 | 68.0 |
| 3 | Labour migrants | 0-3 yrs | 2023 | 36475 | 14176 | 50652 | 72.0 |
| 4 | Labour migrants | 10+ yrs | 2020 | 32899 | 11086 | 43987 | 74.8 |
| 5 | Labour migrants | 10+ yrs | 2021 | 38802 | 12799 | 51600 | 75.2 |
| 6 | Labour migrants | 10+ yrs | 2022 | 44413 | 14304 | 58714 | 75.6 |
| 7 | Labour migrants | 10+ yrs | 2023 | 48488 | 14540 | 63028 | 76.9 |
| 8 | Labour migrants | 4-9 yrs | 2020 | 30899 | 12274 | 43174 | 71.6 |
| 9 | Labour migrants | 4-9 yrs | 2021 | 30668 | 11780 | 42450 | 72.2 |

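As a worked check of the rate formula, row 3 above (Labour migrants, 0-3 years, 2023) gives 36475 / 50652 × 100 ≈ 72.01, rounded to 72.0. The sketch also illustrates a risk in the label cleanup: `Series.map` returns `NaN` for any label missing from the mapping, so a changed SCB label would silently disappear. The "unexpected label" value is hypothetical.

```python
import pandas as pd

# Worked check: Labour migrants, 0-3 years, 2023 (counts from the table above)
employed, total = 36475, 50652
rate = round(employed / total * 100, 1)
print(rate)  # 72.0

# map() yields NaN for labels absent from the dict, so unmapped labels can be listed explicitly
labels = pd.Series(["labour market/students", "other/GFS not relevant", "unexpected label"])
mapping = {"labour market/students": "Labour migrants", "other/GFS not relevant": "Other"}
mapped = labels.map(mapping)
unmapped = labels[mapped.isna()].tolist()
print(unmapped)  # ['unexpected label']
```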

In [17]:
```python
# Keep all years (2020-2023); the heat map shows 2023 by default

df_heatmap = df_pivot[['category', 'years_in_sweden', 'year', 'employment_rate']].copy()
df_heatmap['year'] = df_heatmap['year'].astype(int)

print(f"Final shape: {df_heatmap.shape}")
print(f"\nYear range: {df_heatmap['year'].min()} - {df_heatmap['year'].max()}")
print(f"\nCategories: {df_heatmap['category'].unique().tolist()}")
print(f"\nYears in Sweden: {df_heatmap['years_in_sweden'].unique().tolist()}")

df_heatmap
```

Final shape: (48, 4)

Year range: 2020 - 2023

Categories: ['Labour migrants', 'Other', 'Refugees & relatives', 'Family reunion']

Years in Sweden: ['0-3 yrs', '10+ yrs', '4-9 yrs']


| | category | years_in_sweden | year | employment_rate |
|---|---|---|---|---|
| 0 | Labour migrants | 0-3 yrs | 2020 | 65.8 |
| 1 | Labour migrants | 0-3 yrs | 2021 | 67.1 |
| 2 | Labour migrants | 0-3 yrs | 2022 | 68.0 |
| 3 | Labour migrants | 0-3 yrs | 2023 | 72.0 |
| 4 | Labour migrants | 10+ yrs | 2020 | 74.8 |
| 5 | Labour migrants | 10+ yrs | 2021 | 75.2 |
| 6 | Labour migrants | 10+ yrs | 2022 | 75.6 |
| 7 | Labour migrants | 10+ yrs | 2023 | 76.9 |
| 8 | Labour migrants | 4-9 yrs | 2020 | 71.6 |
| 9 | Labour migrants | 4-9 yrs | 2021 | 72.2 |


In [18]:
```python
# Preview the 2023 data (what the heat map will show by default)
df_2023 = df_heatmap[df_heatmap['year'] == 2023].pivot(
    index='category',
    columns='years_in_sweden',
    values='employment_rate'
)

# Reorder columns
df_2023 = df_2023[['0-3 yrs', '4-9 yrs', '10+ yrs']]

print("Employment Rate by Category and Years in Sweden (2023):")
print("="*60)
df_2023
```

Employment Rate by Category and Years in Sweden (2023):


| category | 0-3 yrs | 4-9 yrs | 10+ yrs |
|---|---|---|---|
| Family reunion | 41.8 | 65.0 | 72.0 |
| Labour migrants | 72.0 | 78.1 | 76.9 |
| Other | 60.6 | 76.4 | 77.8 |
| Refugees & relatives | 33.5 | 62.2 | 67.6 |

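The manual column reorder above is needed because string labels sort alphabetically (`10+ yrs` before `4-9 yrs`). An alternative, sketched here on two categories from the 2023 table, is to declare the duration as an ordered categorical once so that pivots and sorts follow duration order automatically; this is a suggested variation, not what the notebook does.

```python
import pandas as pd

# Toy long-format rows: 2023 rates for two categories, from the table above
df = pd.DataFrame({
    "category": ["Labour migrants"] * 3 + ["Refugees & relatives"] * 3,
    "years_in_sweden": ["0-3 yrs", "10+ yrs", "4-9 yrs"] * 2,
    "employment_rate": [72.0, 76.9, 78.1, 33.5, 67.6, 62.2],
})

# Declare the logical order once; categorical levels keep it through the pivot
order = ["0-3 yrs", "4-9 yrs", "10+ yrs"]
df["years_in_sweden"] = pd.Categorical(df["years_in_sweden"], categories=order, ordered=True)

grid = df.pivot(index="category", columns="years_in_sweden", values="employment_rate")
print(grid)
```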

In [19]:
```python
# Export to JSON
output_filename = 'p3_data.json'

# Convert to records format for Vega-Lite
records = df_heatmap.to_dict(orient='records')

with open(output_filename, 'w') as f:
    json.dump(records, f, indent=2)

print(f"Exported {len(records)} records to {output_filename}")
print("\nJSON structure:")
print(json.dumps(records[:6], indent=2))
```

```
Exported 48 records to p3_data.json

JSON structure:
[
  {
    "category": "Labour migrants",
    "years_in_sweden": "0-3 yrs",
    "year": 2020,
    "employment_rate": 65.8
  },
  {
    "category": "Labour migrants",
    "years_in_sweden": "0-3 yrs",
    "year": 2021,
    "employment_rate": 67.1
  },
  {
    "category": "Labour migrants",
    "years_in_sweden": "0-3 yrs",
    "year": 2022,
    "employment_rate": 68.0
  },
  {
    "category": "Labour migrants",
    "years_in_sweden": "0-3 yrs",
    "year": 2023,
    "employment_rate": 72.0
  },
  {
    "category": "Labour migrants",
    "years_in_sweden": "10+ yrs",
    "year": 2020,
    "employment_rate": 74.8
  },
  {
    "category": "Labour migrants",
    "years_in_sweden": "10+ yrs",
    "year": 2021,
    "employment_rate": 75.2
  }
]
```

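Before wiring the file into a chart, it can help to check the exported records after a JSON round trip: exactly the four expected keys, `year` as an integer, and `employment_rate` as a float between 0 and 100. `validate_records` is a hypothetical helper, shown here on two records with the 2023 values from the preview table.

```python
import json

# Expected schema of each exported record (assumed from the JSON output above)
REQUIRED = {"category": str, "years_in_sweden": str, "year": int, "employment_rate": float}


def validate_records(records):
    """Return a list of problems; an empty list means every record passed."""
    problems = []
    for i, rec in enumerate(records):
        if set(rec) != set(REQUIRED):
            problems.append(f"record {i}: unexpected keys {sorted(rec)}")
            continue
        for key, typ in REQUIRED.items():
            if not isinstance(rec[key], typ):
                problems.append(f"record {i}: {key} is {type(rec[key]).__name__}")
        rate = rec["employment_rate"]
        if isinstance(rate, float) and not 0 <= rate <= 100:
            problems.append(f"record {i}: rate {rate} out of range")
    return problems


# Round-trip through JSON text, as a browser would read the file
text = json.dumps([
    {"category": "Labour migrants", "years_in_sweden": "0-3 yrs", "year": 2023, "employment_rate": 72.0},
    {"category": "Refugees & relatives", "years_in_sweden": "0-3 yrs", "year": 2023, "employment_rate": 33.5},
])
print(validate_records(json.loads(text)))  # []
```

In the notebook, the same check could run on `json.load(open(output_filename))` right after the export.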

In [20]:
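```python
# Download the JSON file (google.colab is only available inside Google Colab)
try:
    from google.colab import files
    files.download(output_filename)
    print(f"\nDownloaded: {output_filename}")
except ImportError:
    # Outside Colab, the file written above is already in the working directory
    print(f"\nNot running in Colab; {output_filename} was saved locally")
```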



Downloaded: p3_data.json


---
## Summary

### Data Pipeline
```
SCB Integration Statistics (population counts by employment status)
    |
    v
Filter: grounds for settlement × years in Sweden × total (region/sex/age)
    |
    v
Calculate employment rate: (employed / total) × 100
    |
    v
JSON export for Vega-Lite heat map
```
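The pipeline stages above can be condensed into one function. This is a sketch, not the notebook's code: `to_heatmap_records` is a hypothetical name, it uses `pivot` (which raises on duplicate keys) instead of `pivot_table(aggfunc='first')`, and it skips the display-label mapping. The toy input includes one age-specific row that the filter should drop.

```python
import pandas as pd


def to_heatmap_records(df_raw, grounds, durations, years):
    """Filter -> melt -> pivot -> employment rate, returned as JSON-ready records."""
    keep = (
        df_raw["grounds for settlement"].isin(grounds)
        & df_raw["number of years in Sweden"].isin(durations)
        & (df_raw["region of birth"] == "total")
        & (df_raw["sex"] == "total")
        & (df_raw["age"] == "total")
    )
    long = df_raw[keep].melt(
        id_vars=["employment status", "grounds for settlement", "number of years in Sweden"],
        value_vars=years, var_name="year", value_name="count",
    )
    wide = long.pivot(
        index=["grounds for settlement", "number of years in Sweden", "year"],
        columns="employment status", values="count",
    ).reset_index()
    wide["employment_rate"] = (wide["gainfully employed population"] / wide["total"] * 100).round(1)
    wide["year"] = wide["year"].astype(int)
    cols = ["grounds for settlement", "number of years in Sweden", "year", "employment_rate"]
    return wide[cols].to_dict(orient="records")


toy = pd.DataFrame({
    "employment status": ["total", "gainfully employed population", "total"],
    "grounds for settlement": ["labour market/students"] * 3,
    "number of years in Sweden": ["0-3 years"] * 3,
    "region of birth": ["total"] * 3,
    "sex": ["total"] * 3,
    "age": ["total", "total", "20-24 years"],  # last row must be filtered out
    "2023": [50652, 36475, 1000],
})
records = to_heatmap_records(toy, ["labour market/students"], ["0-3 years"], ["2023"])
print(records)
```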

### Output Schema
```json
[
  {"category": "Labour migrants", "years_in_sweden": "0-3 yrs", "year": 2023, "employment_rate": 72.0},
  {"category": "Refugees & relatives", "years_in_sweden": "0-3 yrs", "year": 2023, "employment_rate": 33.5},
  ...
]
```

### Key Findings
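- **Labour migrants** have the highest employment rates at 0-3 years (72.0%) and 4-9 years (78.1%) in Sweden; at 10+ years the **Other** group (77.8%) is marginally ahead of them (76.9%) in 2023.
- **Refugees & relatives** start lowest (33.5% at 0-3 years) but reach 67.6% at 10+ years, so the gap to labour migrants narrows with longer residence. Each duration band is a different group of people observed in the same year, so this is a cross-sectional comparison rather than the same people followed over time.
- Employment rises with years in Sweden for Family reunion, Other, and Refugees & relatives; for Labour migrants it peaks at 4-9 years (78.1%) and dips slightly at 10+ years (76.9%).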
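The narrowing gap between labour migrants and refugees & relatives can be quantified directly from the 2023 preview table above. A short sketch computing the percentage-point gap per duration band:

```python
# 2023 employment rates (%) from the preview table above
rates_2023 = {
    "Labour migrants": {"0-3 yrs": 72.0, "4-9 yrs": 78.1, "10+ yrs": 76.9},
    "Refugees & relatives": {"0-3 yrs": 33.5, "4-9 yrs": 62.2, "10+ yrs": 67.6},
}

# Percentage-point gap per duration band; round() removes float noise such as 15.899999...
gaps = {
    band: round(rates_2023["Labour migrants"][band] - rates_2023["Refugees & relatives"][band], 1)
    for band in ["0-3 yrs", "4-9 yrs", "10+ yrs"]
}
print(gaps)  # {'0-3 yrs': 38.5, '4-9 yrs': 15.9, '10+ yrs': 9.3}
```

The gap shrinks from 38.5 to 9.3 percentage points across the bands, with the same cross-sectional caveat as above.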