# Chart 5: NIER Fiscal Contribution by Country of Birth

## Data Source
**Konjunkturinstitutet (NIER)** — Specialstudie 117

| Field | Value |
|-------|-------|
| Report | Invandrades nettobidrag till de offentliga finanserna 1983–2022 |
| URL | https://www.konj.se/media/kpgnt5iw/specialstudie-117-invandrades-nettobidrag-till-de-offentliga-finanserna-1983-2022.pdf |
| Table | Tabell 8 (Bilaga 2) — Pages 52-53 |
| Published | June 2025 |

## Table Structure
- **Nettobidrag (*)**: Total net contribution in billions of SEK (1990, 2005, 2022)
- **Nettobidrag per person (†)**: Net contribution per person in thousands of SEK (1990, 2005, 2022)
- **Befolkning (‡)**: Population in thousands of persons (1990, 2005, 2022)
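
The three column groups are linked by their units: a per-person value (thousand SEK) times a population (thousands) gives millions of SEK, so dividing by 1,000 should roughly reproduce the total in billions. A minimal sketch of that sanity check, using two rows copied from the extraction output in this notebook. The function name `implied_total_bn` and the 0.1 tolerance are illustrative assumptions; because the published figures are rounded, rows with small populations may deviate more.

```python
def implied_total_bn(per_person_ksek: float, population_k: float) -> float:
    """Thousand SEK per person x thousand persons = million SEK; /1000 gives billion SEK."""
    return per_person_ksek * population_k / 1000.0


# (country, year, published total in bn SEK, per person in kSEK, population in thousands)
samples = [
    ("Finland", 2022, -13.7, -99, 138),
    ("Danmark", 1990, 0.8, 17, 47),
]

for country, year, published, per_person, pop in samples:
    implied = implied_total_bn(per_person, pop)
    # Published figures are rounded, so compare with a tolerance (assumed: 0.1 bn SEK).
    status = "ok" if abs(implied - published) < 0.1 else "check"
    print(f"{country} {year}: implied {implied:.2f} vs published {published} -> {status}")
```

A check like this can flag column mix-ups after extraction, for example if the per-person and population blocks were swapped.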

In [41]:
# Install pdfplumber
!pip install pdfplumber -q

In [42]:
import pdfplumber
import pandas as pd
import requests

print("Libraries loaded")

Libraries loaded


In [43]:
# Download the PDF
pdf_url = "https://www.konj.se/media/kpgnt5iw/specialstudie-117-invandrades-nettobidrag-till-de-offentliga-finanserna-1983-2022.pdf"
pdf_path = "nier_report.pdf"

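response = requests.get(pdf_url, timeout=60)
response.raise_for_status()  # fail loudly rather than saving an HTML error page as a PDF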
with open(pdf_path, 'wb') as f:
    f.write(response.content)

print(f"Downloaded: {pdf_path} ({len(response.content) / 1024:.0f} KB)")

Downloaded: nier_report.pdf (4481 KB)


In [44]:
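# Extract Tabell 8, which spans two consecutive pages.
# pdf.pages is 0-indexed, so indexes 52 and 53 are the 53rd and 54th pages of the PDF file;
# the output below confirms the table is found at these indexes.

all_rows = []

with pdfplumber.open(pdf_path) as pdf:
    for page_num in [52, 53]:  # 0-based indexes into pdf.pages
        page = pdf.pages[page_num]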
        tables = page.extract_tables()

        print(f"\n=== Page {page_num} ===")
        print(f"Found {len(tables)} table(s)")

        if tables:
            table = tables[0]
            print(f"Table has {len(table)} rows")
            for i, row in enumerate(table):
                print(f"Row {i}: {row}")
            all_rows.extend(table)

print(f"\nTotal rows collected: {len(all_rows)}")


=== Page 52 ===
Found 1 table(s)
Table has 30 rows
Row 0: ['', '', None, None, 'Nettobidrag', None, None, '', None, None]
Row 1: ['', 'Nettobidrag (*)', None, None, 'per person (†)', None, None, 'Befolkning (‡)', None, None]
Row 2: ['Födelseland', '1990', '2005', '2022', '1990', '2005', '2022', '1990', '2005', '2022']
Row 3: ['Europa', '', '', '', '', '', '', '', '', '']
Row 4: ['Bosnien och', '', '', '', '', '', '', '', '', '']
Row 5: ['Hercegovina', '…', '−1,7', '3,6', '…', '−31', '60', '…', '55', '61']
Row 6: ['Danmark', '0,8', '−1,1', '−0,7', '17', '−25', '−18', '47', '45', '40']
Row 7: ['Finland', '6,6', '−0,3', '−13,7', '29', '−2', '−99', '224', '189', '138']
Row 8: ['Grekland', '−0,9', '−0,5', '−0,1', '−64', '−49', '−5', '13', '11', '21']
Row 9: ['Italien', '0,1', '−0,1', '0,8', '17', '−9', '49', '6', '7', '16']
Row 10: ['Jugoslavien', '−1,6', '−4,4', '−0,3', '−42', '−60', '−5', '38', '74', '62']
Row 11: ['Litauen', '…', '…', '1,2', '…', '…', '66', '…', '…', '18']
Row 12: ['Ned
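
Because `pdf.pages` is a 0-indexed list, hard-coded indexes are easy to get off by one against the page numbers printed in the report. One way to avoid guessing is to search the extracted text for the table caption. The sketch below is an assumption-laden helper, not part of the original notebook: `find_page_indexes` is a hypothetical name, and it relies only on each page object exposing `extract_text()`, which pdfplumber pages do. In-text references to "Tabell 8" will match as well, so the hits still need a manual look.

```python
def find_page_indexes(pages, needle):
    """Return 0-based indexes of pages whose extracted text contains `needle`."""
    hits = []
    for index, page in enumerate(pages):
        text = page.extract_text() or ""  # guard against pages without a text layer
        if needle in text:
            hits.append(index)
    return hits


# Usage with the downloaded report (requires pdfplumber and the PDF on disk):
# import pdfplumber
# with pdfplumber.open("nier_report.pdf") as pdf:
#     print(find_page_indexes(pdf.pages, "Tabell 8"))
```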

In [47]:
# Define column names
columns = [
    'country',
    'netto_1990', 'netto_2005', 'netto_2022',
    'per_person_1990', 'per_person_2005', 'per_person_2022',
    'pop_1990', 'pop_2005', 'pop_2022'
]

# Region headers to skip
region_headers = ['Europa', 'Afrika', 'Asien', 'Nord- och', 'Sydamerika', 'Nord- och Sydamerika']

# Header keywords to skip
header_keywords = ['Födelseland', 'Nettobidrag', 'per person', 'Befolkning', '1990', '2005', '2022']

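# Filter and clean rows - handle split country names (e.g., "Bosnien och" + "Hercegovina")
data_rows = []
pending_name = None  # first part of a country name that wraps onto the next row

for row in all_rows:
    # Skip malformed rows that do not have all 10 columns
    if row is None or len(row) < 10:
        continue

    first_cell = str(row[0]).strip() if row[0] else ''

    # Skip rows without a label
    if not first_cell:
        continue

    # Skip region headers such as "Europa"
    if first_cell in region_headers:
        continue

    # Skip table header rows
    if any(kw in first_cell for kw in header_keywords):
        continue

    # A row counts as data if any value cell is neither empty nor the "…" placeholder
    has_data = any(row[i] and str(row[i]).strip() not in ['', '…'] for i in range(1, 10))

    if not has_data:
        # Label-only row: the first part of a split name (e.g., "Bosnien och")
        pending_name = first_cell
        continue

    if pending_name:
        # Join the pending fragment with this row's label
        row = [f"{pending_name} {first_cell}"] + list(row[1:10])
        pending_name = None
    else:
        row = row[:10]

    data_rows.append(row)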

print(f"Data rows after filtering: {len(data_rows)}")
for row in data_rows:
    print(row)

Data rows after filtering: 33
['Bosnien och Hercegovina', '…', '−1,7', '3,6', '…', '−31', '60', '…', '55', '61']
['Danmark', '0,8', '−1,1', '−0,7', '17', '−25', '−18', '47', '45', '40']
['Finland', '6,6', '−0,3', '−13,7', '29', '−2', '−99', '224', '189', '138']
['Grekland', '−0,9', '−0,5', '−0,1', '−64', '−49', '−5', '13', '11', '21']
['Italien', '0,1', '−0,1', '0,8', '17', '−9', '49', '6', '7', '16']
['Jugoslavien', '−1,6', '−4,4', '−0,3', '−42', '−60', '−5', '38', '74', '62']
['Litauen', '…', '…', '1,2', '…', '…', '66', '…', '…', '18']
['Nederländerna', '…', '0,4', '1,1', '…', '66', '71', '…', '6', '15']
['Norge', '−0,6', '−1,8', '−0,6', '−10', '−37', '−15', '57', '47', '42']
['Polen', '−0,8', '0,0', '3,6', '−22', '−1', '36', '36', '47', '100']
['Rumänien', '−0,5', '0,1', '1,9', '−55', '5', '53', '9', '13', '36']
['Ryssland', '…', '−0,8', '1,8', '…', '−70', '72', '…', '11', '25']
['Serbien', '…', '…', '0,4', '…', '…', '22', '…', '…', '18']
['Storbritannien', '0,8', '1,9', '3,4', '68'
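
The split-name handling is easier to follow on a tiny input. The sketch below re-implements the same idea as a standalone function and runs it on four rows taken from the extraction output, shortened to the first three value columns for brevity. `merge_split_names` and its `skip_labels` parameter are illustrative names, not part of the notebook. One caveat of this heuristic: a country whose value cells are all `…` would be mistaken for a name fragment.

```python
def merge_split_names(rows, skip_labels=()):
    """Join label-only rows with the next data row; drop skipped labels and blank rows."""
    merged = []
    pending = None  # label fragment waiting for the row that carries the values
    for row in rows:
        label = (row[0] or "").strip()
        if not label or label in skip_labels:
            continue
        has_values = any((cell or "").strip() not in ("", "…") for cell in row[1:])
        if not has_values:
            pending = label  # e.g. "Bosnien och"
            continue
        if pending:
            label = f"{pending} {label}"
            pending = None
        merged.append([label, *row[1:]])
    return merged


raw = [
    ["Europa", "", "", ""],
    ["Bosnien och", "", "", ""],
    ["Hercegovina", "…", "−1,7", "3,6"],
    ["Danmark", "0,8", "−1,1", "−0,7"],
]

for row in merge_split_names(raw, skip_labels=("Europa",)):
    print(row)
```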

In [48]:
# Create DataFrame
df = pd.DataFrame(data_rows, columns=columns)

# Clean function for Swedish numbers
def clean_num(val):
    if pd.isna(val) or val in ['', None, '…', '...']:
        return None
    s = str(val).strip()
    if s in ['…', '...', '']:
        return None
    # Remove spaces, replace comma with period, fix minus signs
    s = s.replace(' ', '').replace('\xa0', '')
    s = s.replace(',', '.')
    s = s.replace('−', '-').replace('–', '-').replace('—', '-')
    try:
        return float(s)
    except ValueError:
        # Anything that still fails to parse is treated as missing
        return None

# Apply cleaning to numeric columns
numeric_cols = [c for c in columns if c != 'country']
for col in numeric_cols:
    df[col] = df[col].apply(clean_num)

# Clean country names
df['country'] = df['country'].str.strip()

# Split country names (e.g., "Bosnien och" + "Hercegovina") were merged in the filtering step;
# as a safeguard, drop any rows left with an empty country name
df = df[df['country'] != '']

print(f"\nCleaned DataFrame ({len(df)} countries):")
print(df.to_string())


Cleaned DataFrame (33 countries):
                    country  netto_1990  netto_2005  netto_2022  per_person_1990  per_person_2005  per_person_2022  pop_1990  pop_2005  pop_2022
0   Bosnien och Hercegovina         NaN        -1.7         3.6              NaN            -31.0             60.0       NaN      55.0      61.0
1                   Danmark         0.8        -1.1        -0.7             17.0            -25.0            -18.0      47.0      45.0      40.0
2                   Finland         6.6        -0.3       -13.7             29.0             -2.0            -99.0     224.0     189.0     138.0
3                  Grekland        -0.9        -0.5        -0.1            -64.0            -49.0             -5.0      13.0      11.0      21.0
4                   Italien         0.1        -0.1         0.8             17.0             -9.0             49.0       6.0       7.0      16.0
5               Jugoslavien        -1.6        -4.4        -0.3            -42.0            -60
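
As an alternative to applying a Python function cell by cell, the same cleaning can be sketched with vectorized pandas string methods and `pd.to_numeric`. This is a swapped-in technique, not what the notebook does. The sample series is an assumption for illustration: the first four values mirror cell formats seen in the table, while `"1 234,5"` is a hypothetical value added only to show the space handling.

```python
import pandas as pd

# Sample cells: Unicode minus, decimal comma, "…" placeholder, empty string,
# and a hypothetical value with a space as thousands separator.
raw = pd.Series(["\u221213,7", "0,8", "…", "", "1 234,5"])

cleaned = (
    raw.str.replace("\u2212", "-", regex=False)  # Unicode minus -> ASCII hyphen-minus
       .str.replace("\u00a0", "", regex=False)   # non-breaking space
       .str.replace(" ", "", regex=False)        # ordinary space
       .str.replace(",", ".", regex=False)       # decimal comma -> decimal point
)

# errors="coerce" turns anything unparseable ("…", "") into NaN
values = pd.to_numeric(cleaned, errors="coerce")
print(values)
```

The trade-off is that `errors="coerce"` silently converts every unparseable cell to NaN, so it is worth comparing the NaN count against the number of `…` cells in the source.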

In [49]:
# Create slope chart data: 2005 → 2022 per-person values
# Filter to countries with both years

df_valid = df.dropna(subset=['per_person_2005', 'per_person_2022']).copy()
print(f"Countries with both 2005 and 2022 data: {len(df_valid)}")
print(df_valid[['country', 'per_person_2005', 'per_person_2022']])

Countries with both 2005 and 2022 data: 30
                    country  per_person_2005  per_person_2022
0   Bosnien och Hercegovina            -31.0             60.0
1                   Danmark            -25.0            -18.0
2                   Finland             -2.0            -99.0
3                  Grekland            -49.0             -5.0
4                   Italien             -9.0             49.0
5               Jugoslavien            -60.0             -5.0
7             Nederländerna             66.0             71.0
8                     Norge            -37.0            -15.0
9                     Polen             -1.0             36.0
10                 Rumänien              5.0             53.0
11                 Ryssland            -70.0             72.0
13           Storbritannien            106.0            100.0
14                  Turkiet            -67.0             13.0
15                 Tyskland            -34.0             -2.0
16                   Ungern

In [50]:
# Melt to long format for Vega-Lite slope chart
df_slope = pd.melt(
    df_valid[['country', 'per_person_2005', 'per_person_2022']],
    id_vars=['country'],
    var_name='year_col',
    value_name='net_contribution_per_person'
)

# Extract year
df_slope['year'] = df_slope['year_col'].str.extract(r'(\d{4})').astype(int)
df_slope = df_slope[['country', 'year', 'net_contribution_per_person']]
df_slope = df_slope.sort_values(['country', 'year']).reset_index(drop=True)

print("Slope chart data (long format):")
print(df_slope.to_string())

Slope chart data (long format):
                    country  year  net_contribution_per_person
0               Afghanistan  2005                       -166.0
1               Afghanistan  2022                        -19.0
2   Bosnien och Hercegovina  2005                        -31.0
3   Bosnien och Hercegovina  2022                         60.0
4                     Chile  2005                          2.0
5                     Chile  2022                         30.0
6                   Danmark  2005                        -25.0
7                   Danmark  2022                        -18.0
8                   Eritrea  2005                         -7.0
9                   Eritrea  2022                        -35.0
10                 Etiopien  2005                         -8.0
11                 Etiopien  2022                          1.0
12             Filippinerna  2005                        -24.0
13             Filippinerna  2022                         27.0
14                  Fin

In [51]:
# Export CSVs
df.to_csv('p5_full_table.csv', index=False)
df_slope.to_csv('p5_data.csv', index=False)

print("Exported:")
print("  - p5_full_table.csv (wide format, all columns)")
print("  - p5_data.csv (slope chart format)")
print("\np5_data.csv preview:")
print(df_slope.head(20))

Exported:
  - p5_full_table.csv (wide format, all columns)
  - p5_data.csv (slope chart format)

p5_data.csv preview:
                    country  year  net_contribution_per_person
0               Afghanistan  2005                       -166.0
1               Afghanistan  2022                        -19.0
2   Bosnien och Hercegovina  2005                        -31.0
3   Bosnien och Hercegovina  2022                         60.0
4                     Chile  2005                          2.0
5                     Chile  2022                         30.0
6                   Danmark  2005                        -25.0
7                   Danmark  2022                        -18.0
8                   Eritrea  2005                         -7.0
9                   Eritrea  2022                        -35.0
10                 Etiopien  2005                         -8.0
11                 Etiopien  2022                          1.0
12             Filippinerna  2005                        -24.0


In [52]:
# Trigger browser downloads of both CSVs (works only inside Google Colab)
from google.colab import files
files.download('p5_data.csv')
files.download('p5_full_table.csv')


---
## Summary

### Data Pipeline
```
NIER Specialstudie 117 (PDF)
    ↓
pdfplumber: extract pages 52 + 53
    ↓
Combine tables, filter region headers
    ↓
Clean Swedish number format
    ↓
Melt to long format (country, year, value)
    ↓
CSV export for Vega-Lite slope chart
```
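
The chart specification itself is not part of this notebook. As a rough sketch of how the exported long-format CSV could feed a Vega-Lite slope chart, the dictionary below builds a minimal spec and prints it as JSON. The field names come from `p5_data.csv`; the schema version, the axis title, and the colour encoding are assumptions.

```python
import json

# Minimal Vega-Lite spec: one line per country between the two years.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"url": "p5_data.csv"},
    "mark": {"type": "line", "point": True},
    "encoding": {
        "x": {"field": "year", "type": "ordinal"},
        "y": {
            "field": "net_contribution_per_person",
            "type": "quantitative",
            "title": "Net contribution per person (thousand SEK)",  # assumed title
        },
        "color": {"field": "country", "type": "nominal"},  # also groups the lines
    },
}

print(json.dumps(spec, indent=2))
```

Treating `year` as ordinal keeps 2005 and 2022 as two discrete positions rather than points on a continuous time axis, which is what gives a slope chart its shape.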

### Key Insight
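The per-person values show how each group's net fiscal contribution changed between 2005 and 2022. Several groups improved markedly, for example Afghanistan from −166 to −19 thousand SEK per person and Bosnien och Hercegovina from −31 to +60. The trend is not uniform: Finland fell from −2 to −99 thousand SEK and Eritrea from −7 to −35.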
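
The size of each shift is simply the 2022 value minus the 2005 value. A small sketch in plain Python, using four per-person pairs copied from the output above; the variable names are illustrative.

```python
# (2005, 2022) per-person net contribution in thousand SEK, copied from the table output
per_person = {
    "Afghanistan": (-166, -19),
    "Bosnien och Hercegovina": (-31, 60),
    "Eritrea": (-7, -35),
    "Finland": (-2, -99),
}

# Change from 2005 to 2022; positive means the net contribution per person rose
changes = {country: v2022 - v2005 for country, (v2005, v2022) in per_person.items()}

for country, delta in sorted(changes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{country}: {delta:+d} thousand SEK")
```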

### Source
Konjunkturinstitutet (2025). "Invandrades nettobidrag till de offentliga finanserna 1983–2022". Specialstudie 117, Tabell 8.