# Chart 2: Employment Rate by Birth Status, Education & Sex (2005–2024)

## Data Source
**Statistics Sweden (SCB)** — Labour Force Survey (AKU)

| Field | Value |
|-------|-------|
| Table | `AM0401Y3` — Population aged 15-74 (LFS) by labour status, born in Sweden/foreign-born, level of education, sex and year |
| URL | https://www.statistikdatabasen.scb.se/pxweb/en/ssd/START__AM__AM0401__AM0401R/ |
| Coverage | 2005–2024 (annual) |
| Unit | Population in 1000s → converted to employment rate (%) |

## How to Replicate (Download from SCB)
1. Go to: https://www.statistikdatabasen.scb.se/pxweb/en/ssd/START__AM__AM0401__AM0401R/
2. Select table: **Population aged 15-74 (LFS) by labour status, born in Sweden/foreign-born, level of education, sex and year**
3. Choose variables:
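   - **labour status**: select `total` and `employed` (the hosted CSV also contains `unemployed` and `not in the labour force`, which this notebook does not use)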
   - **born in Sweden/foreign-born**: select `born in Sweden`, `foreign-born`, `total`
   - **level of education**: select ALL levels
   - **sex**: select `men`, `women`, `total`
   - **year**: select ALL (2005-2024)
4. Click **Continue**
5. Download as: **CSV with heading**
6. Save as `p2_raw.csv`

## Output
Three views for the interactive dropdown:
1. **By Birth Status**: Native-born vs Foreign-born
2. **By Education**: Primary vs Secondary vs Tertiary
3. **By Sex**: Men vs Women

In [None]:
import pandas as pd
import json

print("Libraries loaded successfully")

Libraries loaded successfully


In [None]:
# Load the raw SCB Labour Force Survey CSV hosted in the GitHub repository
# (to regenerate the file yourself, follow the replication steps above)

DATA_URL = "https://raw.githubusercontent.com/kelvinchwng/kelvinchwng.github.io/main/project/data/p2_raw.csv"

# Skip the title row (row 0), use row 1 as header
df_raw = pd.read_csv(DATA_URL, encoding='latin-1', skiprows=1)

print(f"Loaded from: {DATA_URL}")
print(f"Shape: {df_raw.shape}")
print(f"\nColumns: {df_raw.columns.tolist()}")

# Preview the data
df_raw.head(20)

Loaded from: https://raw.githubusercontent.com/kelvinchwng/kelvinchwng.github.io/main/project/data/p2_raw.csv
Shape: (180, 24)

Columns: ['labour status', 'born in Sweden/foreign-born', 'level of education', 'sex', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021', '2022', '2023', '2024']


Unnamed: 0,labour status,born in Sweden/foreign-born,level of education,sex,2005,2006,2007,2008,2009,2010,...,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024
0,total,born in Sweden,All educational levels,men,2875.9,2904.0,2926.2,2945.8,2958.8,2978.0,...,2988.9,2990.4,2989.8,2965.5,2946.5,2939.6,2922.7,2916.1,2910.7,2908.2
1,total,born in Sweden,All educational levels,women,2791.8,2800.5,2810.5,2821.3,2835.2,2839.8,...,2835.0,2832.1,2824.3,2833.2,2835.0,2821.3,2822.6,2814.9,2808.2,2804.3
2,total,born in Sweden,All educational levels,total,5667.7,5704.5,5736.6,5767.1,5794.0,5817.7,...,5823.8,5822.5,5814.1,5798.7,5781.5,5760.9,5745.3,5731.0,5719.0,5712.4
3,total,born in Sweden,primary and lower secondary education,men,747.2,743.8,733.4,726.0,708.3,697.6,...,569.3,541.6,530.7,504.8,494.1,479.0,460.1,450.0,447.1,449.8
4,total,born in Sweden,primary and lower secondary education,women,639.4,616.9,599.3,598.1,577.9,561.7,...,429.7,418.5,407.4,380.6,372.7,355.7,351.4,333.1,334.4,332.2
5,total,born in Sweden,primary and lower secondary education,total,1386.6,1360.7,1332.7,1324.1,1286.2,1259.2,...,999.1,960.1,938.0,885.4,866.8,834.7,811.5,783.2,781.5,782.0
6,total,born in Sweden,upper secondary education,men,1315.7,1327.5,1335.4,1353.3,1379.0,1380.7,...,1359.8,1356.2,1353.3,1332.1,1305.5,1288.1,1275.9,1249.3,1217.4,1201.5
7,total,born in Sweden,upper secondary education,women,1212.4,1222.0,1228.7,1223.8,1224.5,1216.1,...,1129.7,1104.2,1078.9,1055.0,1016.4,979.6,964.9,944.6,920.8,893.7
8,total,born in Sweden,upper secondary education,total,2528.1,2549.5,2564.1,2577.1,2603.5,2596.8,...,2489.5,2460.4,2432.2,2387.1,2321.9,2267.7,2240.8,2193.9,2138.2,2095.1
9,total,born in Sweden,post secondary education,men,789.7,816.0,836.8,848.8,859.0,893.8,...,1054.2,1088.4,1101.6,1124.4,1140.2,1167.8,1182.7,1213.5,1242.2,1254.6


In [None]:
# Clean column names
df = df_raw.copy()
df.columns = ['labour_status', 'birth_status', 'education', 'sex'] + list(df.columns[4:])

# Identify year columns (headers that are purely numeric, such as '2005')
year_cols = [c for c in df.columns if str(c).isdigit()]
print(f"Year columns: {year_cols}")

df.head()

Year columns: ['2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021', '2022', '2023', '2024']


Unnamed: 0,labour_status,birth_status,education,sex,2005,2006,2007,2008,2009,2010,...,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024
0,total,born in Sweden,All educational levels,men,2875.9,2904.0,2926.2,2945.8,2958.8,2978.0,...,2988.9,2990.4,2989.8,2965.5,2946.5,2939.6,2922.7,2916.1,2910.7,2908.2
1,total,born in Sweden,All educational levels,women,2791.8,2800.5,2810.5,2821.3,2835.2,2839.8,...,2835.0,2832.1,2824.3,2833.2,2835.0,2821.3,2822.6,2814.9,2808.2,2804.3
2,total,born in Sweden,All educational levels,total,5667.7,5704.5,5736.6,5767.1,5794.0,5817.7,...,5823.8,5822.5,5814.1,5798.7,5781.5,5760.9,5745.3,5731.0,5719.0,5712.4
3,total,born in Sweden,primary and lower secondary education,men,747.2,743.8,733.4,726.0,708.3,697.6,...,569.3,541.6,530.7,504.8,494.1,479.0,460.1,450.0,447.1,449.8
4,total,born in Sweden,primary and lower secondary education,women,639.4,616.9,599.3,598.1,577.9,561.7,...,429.7,418.5,407.4,380.6,372.7,355.7,351.4,333.1,334.4,332.2


In [None]:
# Check unique values
print("Labour status:", df['labour_status'].unique())
print("\nBirth status:", df['birth_status'].unique())
print("\nEducation:", df['education'].unique())
print("\nSex:", df['sex'].unique())

Labour status: ['total' 'unemployed' 'not in the labour force' 'employed']

Birth status: ['born in Sweden' 'foreign-born' 'total']

Education: ['All educational levels' 'primary and lower secondary education'
 'upper secondary education' 'post secondary education'
 'no information about level of educational attainment']

Sex: ['men' 'women' 'total']


In [None]:
# Convert year columns to numeric (handle '..' as NaN)
for col in year_cols:
    df[col] = pd.to_numeric(df[col], errors='coerce')

print("Converted to numeric")
df.dtypes

Converted to numeric


Unnamed: 0,0
labour_status,object
birth_status,object
education,object
sex,object
2005,float64
2006,float64
2007,float64
2008,float64
2009,float64
2010,float64
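The conversion above relies on `errors='coerce'` turning non-numeric placeholders into `NaN`. A minimal sketch of that behaviour on a toy series (the sample values are illustrative; the `'..'` placeholder is taken from the cell comment above):

```python
import pandas as pd

# Toy column: two numeric strings and one '..' placeholder (illustrative values).
sample = pd.Series(["2875.9", "..", "747.2"])

# errors="coerce" converts anything unparseable to NaN instead of raising.
converted = pd.to_numeric(sample, errors="coerce")

print(converted)
print("Missing values:", int(converted.isna().sum()))
```

Counting the `NaN` values per column after conversion is a quick way to see how many cells were suppressed in the source table.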


Employment rate = (employed / total population) x 100
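As a worked check of the formula, the sketch below uses the 2024 native-born figures that appear in this notebook's tables (population 5712.4 thousand, employed 4000.1 thousand). The helper name `employment_rate` is hypothetical and is not part of the notebook's pipeline.

```python
def employment_rate(employed: float, population: float) -> float:
    """Return employed as a percentage of population, rounded to one decimal."""
    if population <= 0:
        raise ValueError("population must be positive")
    return round(employed / population * 100, 1)


# 2024, born in Sweden, all educational levels, both sexes (values in 1000s).
rate_2024 = employment_rate(employed=4000.1, population=5712.4)
print(rate_2024)  # 70.0
```

Because both inputs are in thousands, the unit cancels and no rescaling is needed.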

In [None]:
# Separate total population and employed
df_total = df[df['labour_status'] == 'total'].copy()
df_employed = df[df['labour_status'] == 'employed'].copy()

print(f"Total population rows: {len(df_total)}")
print(f"Employed rows: {len(df_employed)}")

Total population rows: 45
Employed rows: 45


In [None]:
# Merge to calculate rates
# Join on birth_status, education, sex
df_merged = df_total.merge(
    df_employed,
    on=['birth_status', 'education', 'sex'],
    suffixes=('_pop', '_emp')
)

print(f"Merged rows: {len(df_merged)}")
df_merged.head()

Merged rows: 45


Unnamed: 0,labour_status_pop,birth_status,education,sex,2005_pop,2006_pop,2007_pop,2008_pop,2009_pop,2010_pop,...,2015_emp,2016_emp,2017_emp,2018_emp,2019_emp,2020_emp,2021_emp,2022_emp,2023_emp,2024_emp
0,total,born in Sweden,All educational levels,men,2875.9,2904.0,2926.2,2945.8,2958.8,2978.0,...,2092.1,2093.3,2114.4,2122.5,2108.4,2086.1,2081.3,2103.7,2084.6,2078.5
1,total,born in Sweden,All educational levels,women,2791.8,2800.5,2810.5,2821.3,2835.2,2839.8,...,1880.5,1901.6,1914.0,1936.2,1944.5,1894.6,1892.1,1925.5,1930.2,1921.6
2,total,born in Sweden,All educational levels,total,5667.7,5704.5,5736.6,5767.1,5794.0,5817.7,...,3972.6,3994.9,4028.4,4058.7,4052.9,3980.7,3973.4,4029.2,4014.8,4000.1
3,total,born in Sweden,primary and lower secondary education,men,747.2,743.8,733.4,726.0,708.3,697.6,...,225.9,205.3,201.0,196.2,192.3,178.9,169.8,169.6,159.8,165.5
4,total,born in Sweden,primary and lower secondary education,women,639.4,616.9,599.3,598.1,577.9,561.7,...,122.3,121.9,125.9,119.6,114.4,99.9,97.7,105.9,108.8,103.3
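The merge above assumes each (birth_status, education, sex) combination appears exactly once on each side. One way to make that assumption explicit is the `validate` argument of `DataFrame.merge`, sketched here on two toy rows (the numbers are illustrative, not SCB figures):

```python
import pandas as pd

keys = ["birth_status", "education", "sex"]

# Toy population table: one row per key combination (illustrative values).
pop = pd.DataFrame({
    "birth_status": ["born in Sweden", "foreign-born"],
    "education": ["All educational levels", "All educational levels"],
    "sex": ["total", "total"],
    "2024": [100.0, 50.0],
})

# Toy employed table with the same keys.
emp = pop.copy()
emp["2024"] = [70.0, 30.0]

# validate="one_to_one" raises pandas.errors.MergeError if keys are duplicated
# on either side, instead of silently multiplying rows.
merged = pop.merge(emp, on=keys, suffixes=("_pop", "_emp"), validate="one_to_one")
print(merged)
```

If the raw file ever contained duplicated key combinations, this variant would fail loudly rather than produce extra rows.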


In [None]:
# Calculate employment rate for each year
output_data = []

for _, row in df_merged.iterrows():
    birth_status = row['birth_status']
    education = row['education']
    sex = row['sex']

    for year in year_cols:
        pop_col = f"{year}_pop"
        emp_col = f"{year}_emp"

        if pop_col in row and emp_col in row:
            pop = row[pop_col]
            emp = row[emp_col]

            if pd.notna(pop) and pd.notna(emp) and pop > 0:
                rate = (emp / pop) * 100
                output_data.append({
                    'year': int(year),
                    'birth_status': birth_status,
                    'education': education,
                    'sex': sex,
                    'employment_rate': round(rate, 1)
                })

df_rates = pd.DataFrame(output_data)
print(f"Employment rate records: {len(df_rates)}")
df_rates.head(20)

Employment rate records: 844


Unnamed: 0,year,birth_status,education,sex,employment_rate
0,2005,born in Sweden,All educational levels,men,69.2
1,2006,born in Sweden,All educational levels,men,69.9
2,2007,born in Sweden,All educational levels,men,70.8
3,2008,born in Sweden,All educational levels,men,70.6
4,2009,born in Sweden,All educational levels,men,68.3
5,2010,born in Sweden,All educational levels,men,68.6
6,2011,born in Sweden,All educational levels,men,69.3
7,2012,born in Sweden,All educational levels,men,69.1
8,2013,born in Sweden,All educational levels,men,69.4
9,2014,born in Sweden,All educational levels,men,69.7
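The row-by-row loop above can also be written as a column-wise calculation. The following is an alternative sketch, not the notebook's method; it uses a single merged row with the native-born totals for 2023 and 2024 shown in the tables above, and the variable names are assumptions.

```python
import pandas as pd

keys = ["birth_status", "education", "sex"]
years = ["2023", "2024"]

# One merged row in the same wide layout as df_merged (values in 1000s).
merged = pd.DataFrame({
    "birth_status": ["born in Sweden"],
    "education": ["All educational levels"],
    "sex": ["total"],
    "2023_pop": [5719.0], "2024_pop": [5712.4],
    "2023_emp": [4014.8], "2024_emp": [4000.1],
})

# Non-positive populations become NaN so they drop out, like the loop's pop > 0 check.
pop = merged[[f"{y}_pop" for y in years]].where(lambda d: d > 0).to_numpy()
emp = merged[[f"{y}_emp" for y in years]].to_numpy()

# Element-wise rate for every row and year at once.
rates = pd.DataFrame(emp / pop * 100, columns=years, index=merged.index).round(1)

# Reshape to one record per (keys, year), matching the loop's output columns.
tidy = (
    pd.concat([merged[keys], rates], axis=1)
    .melt(id_vars=keys, var_name="year", value_name="employment_rate")
    .dropna(subset=["employment_rate"])
    .astype({"year": int})
    .sort_values(keys + ["year"])
    .reset_index(drop=True)
)
print(tidy)
```

The records come out sorted by key rather than in the original row order, which does not matter for the filtering steps that follow.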


In [None]:
# Standardize labels
birth_mapping = {
    'born in Sweden': 'Native-born',
    'foreign-born': 'Foreign-born',
    'total': 'Total'
}

education_mapping = {
    'All educational levels': 'All levels',
    'primary and lower secondary education': 'Primary',
    'upper secondary education': 'Secondary',
    'post secondary education': 'Tertiary',
    'no information about level of educational attainment': 'Unknown'
}

sex_mapping = {
    'men': 'Men',
    'women': 'Women',
    'total': 'Total'
}

df_rates['birth_status'] = df_rates['birth_status'].map(birth_mapping)
df_rates['education'] = df_rates['education'].map(education_mapping)
df_rates['sex'] = df_rates['sex'].map(sex_mapping)

print("Standardized labels")
print("Birth status:", df_rates['birth_status'].unique())
print("Education:", df_rates['education'].unique())
print("Sex:", df_rates['sex'].unique())

Standardized labels
Birth status: ['Native-born' 'Foreign-born' 'Total']
Education: ['All levels' 'Primary' 'Secondary' 'Tertiary' 'Unknown']
Sex: ['Men' 'Women' 'Total']
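`Series.map` with a dictionary returns `NaN` for any label that is missing from the dictionary, so a changed category name in a future download would silently disappear from the views. A small sketch of a check for that, using a deliberately misspelled label as a hypothetical example:

```python
import pandas as pd

sex_mapping = {"men": "Men", "women": "Women", "total": "Total"}

# The last label has a trailing space and is not in the mapping (hypothetical case).
labels = pd.Series(["men", "women", "total", "men "])
mapped = labels.map(sex_mapping)

# Collect the original labels that failed to map.
unmapped = labels[mapped.isna()].unique().tolist()
print("Unmapped labels:", unmapped)
```

Running the same check on each of the three mapped columns before building the views would surface label changes early.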


In [None]:
# VIEW 1: By Birth Status
# Filter: All education levels, Total sex
view1 = df_rates[
    (df_rates['education'] == 'All levels') &
    (df_rates['sex'] == 'Total') &
    (df_rates['birth_status'].isin(['Native-born', 'Foreign-born']))
].copy()

view1['view'] = 'Birth Status'
view1['category'] = view1['birth_status']
view1 = view1[['year', 'view', 'category', 'employment_rate']]

print(f"View 1 (Birth Status): {len(view1)} rows")
view1.head(10)

View 1 (Birth Status): 40 rows


Unnamed: 0,year,view,category,employment_rate
40,2005,Birth Status,Native-born,66.6
41,2006,Birth Status,Native-born,67.3
42,2007,Birth Status,Native-born,68.3
43,2008,Birth Status,Native-born,68.2
44,2009,Birth Status,Native-born,66.1
45,2010,Birth Status,Native-born,66.0
46,2011,Birth Status,Native-born,67.0
47,2012,Birth Status,Native-born,67.0
48,2013,Birth Status,Native-born,67.3
49,2014,Birth Status,Native-born,67.8


In [None]:
# VIEW 2: By Education
# Filter: Total birth status, Total sex
view2 = df_rates[
    (df_rates['birth_status'] == 'Total') &
    (df_rates['sex'] == 'Total') &
    (df_rates['education'].isin(['Primary', 'Secondary', 'Tertiary']))
].copy()

view2['view'] = 'Education'
view2['category'] = view2['education']
view2 = view2[['year', 'view', 'category', 'employment_rate']]

print(f"View 2 (Education): {len(view2)} rows")
view2.head(10)

View 2 (Education): 60 rows


Unnamed: 0,year,view,category,employment_rate
651,2005,Education,Primary,38.4
652,2006,Education,Primary,39.2
653,2007,Education,Primary,38.8
654,2008,Education,Primary,37.7
655,2009,Education,Primary,35.5
656,2010,Education,Primary,34.9
657,2011,Education,Primary,35.2
658,2012,Education,Primary,34.5
659,2013,Education,Primary,34.0
660,2014,Education,Primary,34.4


In [None]:
# VIEW 3: By Sex
# Filter: Total birth status, All education levels
view3 = df_rates[
    (df_rates['birth_status'] == 'Total') &
    (df_rates['education'] == 'All levels') &
    (df_rates['sex'].isin(['Men', 'Women']))
].copy()

view3['view'] = 'Sex'
view3['category'] = view3['sex']
view3 = view3[['year', 'view', 'category', 'employment_rate']]

print(f"View 3 (Sex): {len(view3)} rows")
view3.head(10)

View 3 (Sex): 40 rows


Unnamed: 0,year,view,category,employment_rate
551,2005,Sex,Men,67.8
552,2006,Sex,Men,68.5
553,2007,Sex,Men,69.5
554,2008,Sex,Men,69.5
555,2009,Sex,Men,67.0
556,2010,Sex,Men,67.3
557,2011,Sex,Men,68.0
558,2012,Sex,Men,67.7
559,2013,Sex,Men,68.0
560,2014,Sex,Men,68.5


In [None]:
# Combine all views
df_final = pd.concat([view1, view2, view3], ignore_index=True)
df_final = df_final.sort_values(['view', 'category', 'year']).reset_index(drop=True)

print(f"\nTotal records: {len(df_final)}")
print(f"Views: {df_final['view'].unique()}")
df_final


Total records: 140
Views: ['Birth Status' 'Education' 'Sex']


Unnamed: 0,year,view,category,employment_rate
0,2005,Birth Status,Foreign-born,55.5
1,2006,Birth Status,Foreign-born,55.8
2,2007,Birth Status,Foreign-born,56.9
3,2008,Birth Status,Foreign-born,57.7
4,2009,Birth Status,Foreign-born,55.8
...,...,...,...,...
135,2020,Sex,Women,64.0
136,2021,Sex,Women,64.2
137,2022,Sex,Women,65.9
138,2023,Sex,Women,67.0
139,2024,Sex,Women,66.8


In [None]:
# Key statistics
print("KEY FINDINGS")
print("="*60)

# Birth status gap
latest_year = df_final['year'].max()
birth_data = df_final[(df_final['view'] == 'Birth Status') & (df_final['year'] == latest_year)]
native = birth_data[birth_data['category'] == 'Native-born']['employment_rate'].values[0]
foreign = birth_data[birth_data['category'] == 'Foreign-born']['employment_rate'].values[0]
print(f"\n{latest_year} Birth Status Gap:")
print(f"  Native-born: {native}%")
print(f"  Foreign-born: {foreign}%")
print(f"  Gap: {native - foreign:.1f} percentage points")

# Education gap
edu_data = df_final[(df_final['view'] == 'Education') & (df_final['year'] == latest_year)]
print(f"\n{latest_year} Education Levels:")
for _, row in edu_data.iterrows():
    print(f"  {row['category']}: {row['employment_rate']}%")

# Sex gap
sex_data = df_final[(df_final['view'] == 'Sex') & (df_final['year'] == latest_year)]
men = sex_data[sex_data['category'] == 'Men']['employment_rate'].values[0]
women = sex_data[sex_data['category'] == 'Women']['employment_rate'].values[0]
print(f"\n{latest_year} Gender Gap:")
print(f"  Men: {men}%")
print(f"  Women: {women}%")
print(f"  Gap: {men - women:.1f} percentage points")

KEY FINDINGS
============================================================

2024 Birth Status Gap:
  Native-born: 70.0%
  Foreign-born: 65.8%
  Gap: 4.2 percentage points

2024 Education Levels:
  Primary: 35.9%
  Secondary: 69.1%
  Tertiary: 79.5%

2024 Gender Gap:
  Men: 71.0%
  Women: 66.8%
  Gap: 4.2 percentage points


In [None]:
# Export
output_filename = 'p2_data.json'
chart_data = df_final.to_dict(orient='records')

with open(output_filename, 'w') as f:
    json.dump(chart_data, f, indent=2)

print(f"Saved: {output_filename}")
print(f"Records: {len(chart_data)}")

# Preview
print("\nJSON structure:")
print(json.dumps(chart_data[:6], indent=2))

Saved: p2_data.json
Records: 140

JSON structure:
[
  {
    "year": 2005,
    "view": "Birth Status",
    "category": "Foreign-born",
    "employment_rate": 55.5
  },
  {
    "year": 2006,
    "view": "Birth Status",
    "category": "Foreign-born",
    "employment_rate": 55.8
  },
  {
    "year": 2007,
    "view": "Birth Status",
    "category": "Foreign-born",
    "employment_rate": 56.9
  },
  {
    "year": 2008,
    "view": "Birth Status",
    "category": "Foreign-born",
    "employment_rate": 57.7
  },
  {
    "year": 2009,
    "view": "Birth Status",
    "category": "Foreign-born",
    "employment_rate": 55.8
  },
  {
    "year": 2010,
    "view": "Birth Status",
    "category": "Foreign-born",
    "employment_rate": 55.1
  }
]
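Before handing the file to the chart, the exported records can be sanity-checked against the structure shown above. This is a minimal sketch; the helper `check_records` is hypothetical, and the sample record is copied from the preview.

```python
import json

EXPECTED_KEYS = {"year", "view", "category", "employment_rate"}


def check_records(records):
    """Return a list of problems; an empty list means every record looks sane."""
    problems = []
    for i, rec in enumerate(records):
        if set(rec) != EXPECTED_KEYS:
            problems.append(f"record {i}: unexpected keys {sorted(rec)}")
        elif not 0 <= rec["employment_rate"] <= 100:
            problems.append(f"record {i}: rate {rec['employment_rate']} outside 0-100")
    return problems


# One record from the preview above, parsed the same way the file would be.
sample = json.loads(
    '[{"year": 2005, "view": "Birth Status", '
    '"category": "Foreign-born", "employment_rate": 55.5}]'
)
print(check_records(sample))  # []
```

To apply it to the real export, load `p2_data.json` with `json.load` and pass the resulting list to the same function.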


In [None]:
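# Download (Colab only; elsewhere the file is already in the working directory)
try:
    from google.colab import files
    files.download(output_filename)
    print(f"\nDownloaded: {output_filename}")
except ImportError:
    print(f"\nNot running in Colab; {output_filename} is in the working directory")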



Downloaded: p2_data.json


---
## Summary

### Data Pipeline
```
SCB Labour Force Survey (1 CSV with all breakdowns)
    |
    v
Calculate employment rate: (employed / population) x 100
    |
    v
Create 3 views:
  - Birth Status (Native vs Foreign-born)
  - Education (Primary vs Secondary vs Tertiary)
  - Sex (Men vs Women)
    |
    v
JSON export for Vega-Lite interactive chart
```

### Output Schema
```json
[
  {"year": 2024, "view": "Birth Status", "category": "Native-born", "employment_rate": 70.0},
  {"year": 2024, "view": "Birth Status", "category": "Foreign-born", "employment_rate": 65.8},
  {"year": 2024, "view": "Education", "category": "Tertiary", "employment_rate": 79.5},
  ...
]
```

### Interactive Dropdown
The user selects a view and the chart shows the corresponding lines:
- Birth Status: 2 lines (native vs foreign)
- Education: 3 lines (primary vs secondary vs tertiary)
- Sex: 2 lines (men vs women)
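The dropdown amounts to filtering the records on `view` and drawing one line per `category`. The sketch below mimics that selection in Python; it is an approximation for checking the data, not the Vega-Lite specification itself, and `lines_for_view` is a hypothetical helper. The sample records use the 2024 values from the key findings.

```python
records = [
    {"year": 2024, "view": "Birth Status", "category": "Native-born", "employment_rate": 70.0},
    {"year": 2024, "view": "Birth Status", "category": "Foreign-born", "employment_rate": 65.8},
    {"year": 2024, "view": "Sex", "category": "Men", "employment_rate": 71.0},
    {"year": 2024, "view": "Sex", "category": "Women", "employment_rate": 66.8},
]


def lines_for_view(records, view):
    """Group one view's records into {category: [(year, rate), ...]} sorted by year."""
    lines = {}
    for rec in records:
        if rec["view"] == view:
            lines.setdefault(rec["category"], []).append(
                (rec["year"], rec["employment_rate"])
            )
    for points in lines.values():
        points.sort()
    return lines


for view in ("Birth Status", "Sex"):
    print(view, "->", sorted(lines_for_view(records, view)))
```

With the full export, each view should return the number of categories listed above (2, 3 and 2).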