# Allsvenskan Transfers 2020-2025

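A cleaned dataset of all transfers into, out of, and within Allsvenskan between 2020 and 2025. Each record is classified as one of:
- **Entry:** Players who joined an Allsvenskan club from outside the league (another league, or a club with no recorded competition)
- **Exit:** Players who left an Allsvenskan club for a destination outside the league
- **Intra:** Moves between two Allsvenskan clubs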

In [1]:
import pandas as pd
from pathlib import Path

DATA_PATH = Path('../../thesis_data')
OUTPUT_PATH = DATA_PATH / 'processed'
OUTPUT_PATH.mkdir(parents=True, exist_ok=True)  # also create missing parent folders

## 1. Load Data

In [2]:
# Transfer history
th = pd.read_parquet(DATA_PATH / 'tm_data/transfer_history_all.parquet')
print(f"Transfer history: {len(th):,} records")

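# Names from Twelve: player_id -> short_name (player_id is matched against wy_player_id later;
# drop_duplicates keeps the first short_name seen for each player)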
twelve = pd.read_parquet(DATA_PATH / 'raw_data_twelve/Twelve/male_transfer_model.parquet')
name_lookup = twelve.drop_duplicates('player_id').set_index('player_id')['short_name'].to_dict()
print(f"Name lookup: {len(name_lookup):,} players")

Transfer history: 363,943 records
Name lookup: 41,468 players
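
Because `drop_duplicates('player_id')` keeps only the first `short_name` per player, any spelling conflicts in the source are resolved silently. The following is a minimal sketch for surfacing such conflicts. It runs on a small invented frame rather than the Twelve data, and the helper name `conflicting_names` is hypothetical.

```python
import pandas as pd


def conflicting_names(df: pd.DataFrame,
                      id_col: str = "player_id",
                      name_col: str = "short_name") -> pd.Series:
    """Return the number of distinct names for each ID that has more than one."""
    counts = df.groupby(id_col)[name_col].nunique()
    return counts[counts > 1]


# Toy rows standing in for the Twelve table (all values are invented).
toy = pd.DataFrame({
    "player_id": [1, 1, 2, 3, 3],
    "short_name": ["A. One", "A. One", "B. Two", "C. Three", "C. Tree"],
})

conflicts = conflicting_names(toy)
print(conflicts)
```

An empty result would suggest that taking the first name per player is safe for the lookup.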


## 2. Filter Allsvenskan Transfers (2020-2025)

In [3]:
# Parse dates; unparseable values become NaT and therefore fall outside the window below
th['transfer_date'] = pd.to_datetime(th['date'], errors='coerce')
th['transfer_year'] = th['transfer_date'].dt.year

# Filters (na=False: rows without a competition name count as non-Allsvenskan)
is_from_allsv = th['competition_name_from'].str.contains('Allsvenskan', case=False, na=False)
is_to_allsv = th['competition_name_to'].str.contains('Allsvenskan', case=False, na=False)
in_window = th['transfer_year'].between(2020, 2025)  # inclusive on both ends

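# Categorize
# Entry: destination is Allsvenskan, origin is not
entries = th[~is_from_allsv & is_to_allsv & in_window].copy()
entries['transfer_type'] = 'entry'

# Exit: origin is Allsvenskan, destination is not
exits = th[is_from_allsv & ~is_to_allsv & in_window].copy()
exits['transfer_type'] = 'exit'

# Intra: both origin and destination are Allsvenskan
intra = th[is_from_allsv & is_to_allsv & in_window].copy()
intra['transfer_type'] = 'intra'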

print(f"Entries: {len(entries):,}")
print(f"Exits: {len(exits):,}")
print(f"Intra: {len(intra):,}")
print(f"Total: {len(entries) + len(exits) + len(intra):,}")

Entries: 802
Exits: 808
Intra: 147
Total: 1,757
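
The substring match on the competition name would also flag any competition whose name merely contains "Allsvenskan", if such rows exist in the source. One hedged alternative is to match on the competition ID instead. The sample rows in this notebook show `SE1` for Allsvenskan, but whether that ID is unique to the league across the full file is an assumption. The sketch below uses invented rows and `numpy.select` to assign the type in a single pass.

```python
import numpy as np
import pandas as pd

ALLSV_ID = "SE1"  # assumption: Allsvenskan's competition ID, as seen in the sample rows

# Invented rows; only the two ID columns matter here.
toy = pd.DataFrame({
    "competition_id_from": ["SE1", "SE2", "SE1", None, "DK1"],
    "competition_id_to":   ["GB1", "SE1", "SE1", "SE1", "NO1"],
})

from_allsv = toy["competition_id_from"].eq(ALLSV_ID)  # missing IDs compare as False
to_allsv = toy["competition_id_to"].eq(ALLSV_ID)

# Conditions are checked in order, so "intra" must come before "exit" and "entry".
toy["transfer_type"] = np.select(
    [from_allsv & to_allsv, from_allsv, to_allsv],
    ["intra", "exit", "entry"],
    default="other",  # rows that do not touch Allsvenskan at all
)
print(toy)
```

Comparing the row counts from both approaches on the real data would show whether the name-based filter picks up anything unintended.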


## 3. Build Clean Dataset

In [4]:
# Combine
allsv = pd.concat([entries, exits, intra], ignore_index=True)

# Add player names from Twelve
allsv['player_name'] = allsv['wy_player_id'].map(name_lookup)

# Rename player_id to tm_player_id for clarity
allsv = allsv.rename(columns={'player_id': 'tm_player_id'})

# Select columns
allsv = allsv[[
    # IDs
    'wy_player_id', 'tm_player_id', 'player_name',
    # Transfer info
    'transfer_type', 'transfer_date', 'transfer_year',
    # From
    'team_id_from', 'team_name_from', 
    'competition_id_from', 'competition_name_from', 'competition_country_from',
    # To
    'team_id_to', 'team_name_to',
    'competition_id_to', 'competition_name_to', 'competition_country_to',
    # Value
    'age_at_transfer', 'transfer_fee', 'transfer_value',
    'remaining_contract_period', 'contract_until_date'
]].sort_values(['transfer_date', 'wy_player_id']).reset_index(drop=True)

print(f"\nDataset shape: {allsv.shape}")
print(f"Players with name: {allsv['player_name'].notna().sum():,} ({allsv['player_name'].notna().mean()*100:.1f}%)")
allsv.head(10)


Dataset shape: (1757, 21)
Players with name: 1,757 (100.0%)


| | wy_player_id | tm_player_id | player_name | transfer_type | transfer_date | transfer_year | team_id_from | team_name_from | competition_id_from | competition_name_from | ... | team_id_to | team_name_to | competition_id_to | competition_name_to | competition_country_to | age_at_transfer | transfer_fee | transfer_value | remaining_contract_period | contract_until_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 101885 | 187042 | K. Pogrebnyak | exit | 2020-01-01 | 2020.0 | 5196 | Falkenbergs FF | SE1 | Allsvenskan | ... | 15309 | Lokomotiv Tashkent | UZ1 | O'zbekiston Superligasi | Uzbekistan | 27.0 | 0.0 | 250000 | | 2020-12-30 |
| 1 | 361563 | 357091 | Maxwell | exit | 2020-01-01 | 2020.0 | 3654 | Kalmar FF | SE1 | Allsvenskan | ... | 28022 | Cuiabá Esporte Clube (MT) | BRA2 | Campeonato Brasileiro Série B | Brazil | 24.0 | | 200000 | 1306.0 | 2021-02-27 |
| 2 | 361865 | 357622 | A. Jallow | entry | 2020-01-01 | 2020.0 | 7411 | Jönköpings Södra IF | SE2 | Superettan | ... | 801 | IFK Göteborg | SE1 | Allsvenskan | Sweden | 21.0 | | 250000 | 364.0 | 2023-12-30 |
| 3 | 472352 | 428703 | M. Persson | exit | 2020-01-01 | 2020.0 | 3654 | Kalmar FF | SE1 | Allsvenskan | ... | 28447 | Oskarshamns AIK | SE3S | Ettan Södra | Sweden | 19.0 | | 0 | | |
| 4 | 607268 | 663237 | J. Ondrejka | entry | 2020-01-02 | 2020.0 | 2294 | Landskrona BoIS | SE3S | Ettan Södra | ... | 1101 | IF Elfsborg | SE1 | Allsvenskan | Sweden | 17.0 | | 0 | 728.0 | 2023-12-30 |
| 5 | 51334 | 101029 | V. Prodell | exit | 2020-01-03 | 2020.0 | 1056 | Örebro SK | SE1 | Allsvenskan | ... | 20830 | Ho Chi Minh City FC | VIE1 | V.League 1 | Vietnam | 31.0 | | 300000 | 362.0 | |
| 6 | 347512 | 377468 | A. Ahmedhodžić | entry | 2020-01-03 | 2020.0 | 5818 | Hobro IK | DK1 | Superliga | ... | 496 | Malmö FF | SE1 | Allsvenskan | Sweden | 20.0 | | 450000 | 178.0 | 2023-12-30 |
| 7 | 473832 | 323325 | L. Bengtsson | intra | 2020-01-05 | 2020.0 | 1059 | Hammarby IF | SE1 | Allsvenskan | ... | 1109 | BK Häcken | SE1 | Allsvenskan | Sweden | 21.0 | 100000.0 | 200000 | 725.0 | 2022-12-30 |
| 8 | 73350 | 204664 | E. Bejtulai | entry | 2020-01-06 | 2020.0 | 1167 | Shkendija Tetovo | MAZ1 | Prva Makedonska Fudbalska Liga | ... | 699 | Helsingborgs IF | SE1 | Allsvenskan | Sweden | 26.0 | 0.0 | 600000 | | |
| 9 | 280326 | 288597 | J. Asani | entry | 2020-01-06 | 2020.0 | 4040 | FK Partizani | ALB1 | Abissnet Superiore | ... | 272 | AIK | SE1 | Allsvenskan | Sweden | 24.0 | | 400000 | 540.0 | 2020-08-29 |
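
`transfer_year` displays as `2020.0` because the column is float-typed, which is what pandas falls back to when `.dt.year` is taken over dates that include missing values. If integer years are preferred, one option is the nullable `Int64` dtype. This is a small sketch on invented dates, not a change to the pipeline above.

```python
import pandas as pd

# Invented dates, including one that cannot be parsed.
raw = pd.Series(["2020-01-01", "not a date", "2025-07-15"])
dates = pd.to_datetime(raw, errors="coerce")  # unparseable -> NaT

year_float = dates.dt.year              # float dtype because of the NaT
year_int = year_float.astype("Int64")   # nullable integer; the NaT row becomes <NA>

print(year_int)
```

Window filters such as `between(2020, 2025)` still treat `<NA>` as outside the range, so the cast should not change which rows are selected.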


## 4. Quick Stats

In [5]:
# By year and type
print("TRANSFERS BY YEAR AND TYPE")
print("="*50)
pivot = allsv.pivot_table(index='transfer_year', columns='transfer_type', 
                          aggfunc='size', fill_value=0)
pivot['total'] = pivot.sum(axis=1)
print(pivot)

TRANSFERS BY YEAR AND TYPE
transfer_type  entry  exit  intra  total
transfer_year                           
2020.0           128   105     17    250
2021.0           129   122     18    269
2022.0           159   146     37    342
2023.0           153   162     23    338
2024.0           134   152     29    315
2025.0            99   121     23    243
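
The same year-by-type table can also be built with `pd.crosstab`, which adds totals through `margins=True` instead of a manual `sum`. A minimal sketch on invented rows:

```python
import pandas as pd

# Invented rows with the two columns the table needs.
toy = pd.DataFrame({
    "transfer_year": [2020, 2020, 2021, 2021, 2021],
    "transfer_type": ["entry", "exit", "entry", "entry", "intra"],
})

table = pd.crosstab(
    toy["transfer_year"],
    toy["transfer_type"],
    margins=True,           # adds a totals row and a totals column
    margins_name="total",
)
print(table)
```

Unlike the pivot above, this version also includes a totals row across all years.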


In [6]:
# Exit fees (restricted to exits with a recorded fee above zero)
print("EXIT FEES")
print("="*50)
exit_fees = allsv[(allsv['transfer_type'] == 'exit') & (allsv['transfer_fee'] > 0)]
print(f"Exits with fee > 0: {len(exit_fees):,}")
print(f"Total: €{exit_fees['transfer_fee'].sum()/1e6:.1f}M")
print(f"Avg: €{exit_fees['transfer_fee'].mean()/1e6:.2f}M")
print(f"Max: €{exit_fees['transfer_fee'].max()/1e6:.2f}M")

EXIT FEES
Exits with fee > 0: 153
Total: €246.9M
Avg: €1.61M
Max: €20.00M
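
The average above is conditional on a positive recorded fee, which applies to 153 of the 808 exits. The remaining rows hold either a zero or a missing fee, and the sample rows show both cases. A coverage breakdown makes that split explicit. The sketch below uses invented values, and the helper name `fee_coverage` is hypothetical.

```python
import pandas as pd


def fee_coverage(fees: pd.Series) -> pd.Series:
    """Count fees that are missing, zero, or positive."""
    return pd.Series({
        "missing": int(fees.isna().sum()),
        "zero": int((fees == 0).sum()),
        "positive": int((fees > 0).sum()),
    })


# Invented fees in euros; None marks a missing fee.
toy_fees = pd.Series([None, 0.0, 100_000.0, None, 2_500_000.0])

coverage = fee_coverage(toy_fees)
print(coverage)
print(f"Mean of positive fees: €{toy_fees[toy_fees > 0].mean() / 1e6:.2f}M")
```

On the real data this would be called as `fee_coverage(allsv.loc[allsv['transfer_type'] == 'exit', 'transfer_fee'])`.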


In [7]:
# Top 20 exits by fee
print("TOP 20 EXITS BY FEE")
print("="*80)
top_exits = allsv[allsv['transfer_type'] == 'exit'].nlargest(20, 'transfer_fee')
for i, (_, r) in enumerate(top_exits.iterrows(), 1):
    name = r['player_name'] if pd.notna(r['player_name']) else f"ID:{r['wy_player_id']}"
    print(f"{i:2}. {name:<22} €{r['transfer_fee']/1e6:>5.2f}M  {r['team_name_from']} → {r['team_name_to']} ({r['competition_name_to']})")

TOP 20 EXITS BY FEE
 1. L. Bergvall            €20.00M  Djurgårdens IF → Tottenham Hotspur (Premier League)
 2. S. Nanasi              €11.00M  Malmö FF → RC Strasbourg Alsace (Ligue 1)
 3. H. Larsson             € 9.00M  Malmö FF → Eintracht Frankfurt (Bundesliga)
 4. S. Hakšabanović        € 6.50M  IFK Norrköping → Rubin Kazan (Premier Liga)
 5. S. Tounekti            € 6.40M  Hammarby IF → Celtic FC (Scottish Premiership)
 6. N. Adjei               € 5.70M  Hammarby IF → FC Lorient (Ligue 2)
 7. S. Dahl                € 5.67M  Djurgårdens IF → AS Roma (Serie A)
 8. W. Swedberg            € 5.50M  Hammarby IF → Celta de Vigo (LaLiga)
 9. M. Danielson           € 5.00M  Djurgårdens IF → Dalian Professional (2009-2024) (Chinese Super League)
10. A. Fatah               € 4.70M  AIK → ESTAC Troyes (Ligue 1)
11. I. Hien                € 4.60M  Djurgårdens IF → Hellas Verona (Serie A)
12. A. Ahmedhodžić         € 4.50M  Malmö FF → Sheffield United (Championship)
13. V. Birmančević         

In [8]:
# Destination leagues (exits)
print("TOP DESTINATION LEAGUES (Exits)")
print("="*50)
print(allsv[allsv['transfer_type'] == 'exit']['competition_name_to'].value_counts().head(15).to_string())

TOP DESTINATION LEAGUES (Exits)
competition_name_to
Superettan             163
Ettan Norra             47
Eliteserien             46
Ettan Södra             44
Superliga               39
Eredivisie              22
Veikkausliiga           19
1.Division              18
PKO BP Ekstraklasa      18
Premier Liga            16
OBOS-ligaen             16
Protathlima Cyta        14
Major League Soccer     14
Jupiler Pro League      14
Championship            13


In [9]:
# Source leagues (entries)
print("TOP SOURCE LEAGUES (Entries)")
print("="*50)
print(allsv[allsv['transfer_type'] == 'entry']['competition_name_from'].value_counts().head(15).to_string())

TOP SOURCE LEAGUES (Entries)
competition_name_from
Superettan                 168
Ettan Norra                 61
Eliteserien                 58
Ettan Södra                 45
Superliga                   44
1.Division                  27
Veikkausliiga               23
Eredivisie                  21
OBOS-ligaen                 17
Keuken Kampioen Divisie     13
Premier Liga                12
Protathlima Cyta            11
Major League Soccer         11
Jupiler Pro League          10
Besta deild                  9
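
Putting the destination and source rankings side by side gives a net flow per league. The sketch below runs on invented rows and assumes the same `transfer_type`, `competition_name_from` and `competition_name_to` columns as `allsv`. The function name `net_flow` and the output column names are hypothetical.

```python
import pandas as pd


def net_flow(df: pd.DataFrame) -> pd.DataFrame:
    """Entries from, exits to, and the difference, per league."""
    entries = df.loc[df["transfer_type"] == "entry", "competition_name_from"].value_counts()
    exits = df.loc[df["transfer_type"] == "exit", "competition_name_to"].value_counts()
    # Align both counts on league name; leagues missing on one side get 0.
    out = pd.concat({"entries_from": entries, "exits_to": exits}, axis=1)
    out = out.fillna(0).astype(int)
    out["net_into_allsvenskan"] = out["entries_from"] - out["exits_to"]
    return out.sort_values("net_into_allsvenskan", ascending=False)


# Invented rows; intra-league moves are ignored by the function.
toy = pd.DataFrame({
    "transfer_type": ["entry", "entry", "exit", "exit", "exit", "intra"],
    "competition_name_from": ["Superettan", "Eliteserien", "Allsvenskan",
                              "Allsvenskan", "Allsvenskan", "Allsvenskan"],
    "competition_name_to": ["Allsvenskan", "Allsvenskan", "Superettan",
                            "Superettan", "Superliga", "Allsvenskan"],
})

flow = net_flow(toy)
print(flow)
```

A positive value means more players arrived from that league than left for it.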


## 5. Save

In [10]:
out_file = OUTPUT_PATH / 'allsvenskan_transfers_2020_2025.parquet'
allsv.to_parquet(out_file, index=False)
print(f"✅ Saved: {out_file.name}")
print(f"   {len(allsv):,} records")
print(f"   Columns: {list(allsv.columns)}")

✅ Saved: allsvenskan_transfers_2020_2025.parquet
   1,757 records
   Columns: ['wy_player_id', 'tm_player_id', 'player_name', 'transfer_type', 'transfer_date', 'transfer_year', 'team_id_from', 'team_name_from', 'competition_id_from', 'competition_name_from', 'competition_country_from', 'team_id_to', 'team_name_to', 'competition_id_to', 'competition_name_to', 'competition_country_to', 'age_at_transfer', 'transfer_fee', 'transfer_value', 'remaining_contract_period', 'contract_until_date']
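
A few cheap checks before writing the file can catch schema drift. The rules in this sketch (required columns, allowed `transfer_type` values, years within 2020-2025) are taken from this notebook. The `validate` helper itself is hypothetical and the toy rows are invented.

```python
import pandas as pd

REQUIRED = {"wy_player_id", "transfer_type", "transfer_date", "transfer_year"}
ALLOWED_TYPES = {"entry", "exit", "intra"}


def validate(df: pd.DataFrame) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    missing = REQUIRED - set(df.columns)
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
        return problems  # the remaining checks need these columns
    bad_types = set(df["transfer_type"].unique()) - ALLOWED_TYPES
    if bad_types:
        problems.append(f"unexpected transfer_type values: {sorted(bad_types)}")
    if not df["transfer_year"].between(2020, 2025).all():
        problems.append("transfer_year outside 2020-2025")
    return problems


# Invented rows: one frame that passes and one that violates two rules.
good = pd.DataFrame({
    "wy_player_id": [1, 2],
    "transfer_type": ["entry", "exit"],
    "transfer_date": pd.to_datetime(["2020-01-01", "2025-06-30"]),
    "transfer_year": [2020, 2025],
})
bad = good.assign(transfer_type=["entry", "loan"], transfer_year=[2019, 2025])

print(validate(good))
print(validate(bad))
```

Calling `validate(allsv)` just before `to_parquet` and stopping on a non-empty result would be one way to use it.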


---

## ⚠️ Data Gap

This dataset contains **transfers** only. It does not cover:

- Players who played in Allsvenskan during 2020-2025 without a transfer in that window
- Per-season performance metrics
- Minutes played

**Action:** Request the complete Allsvenskan 2020-2025 dataset from Twelve.