# Allsvenskan Player Trajectories (2020-2025)

**Objective:** Build the complete history of player movements in Allsvenskan:
- Entries into the league (from other leagues)
- Exits from the league (to other leagues)
- Intra-league transfers (team changes within Allsvenskan)

**Output:** Clean dataset of player trajectories, including:
- `player_id`, `player_name`
- `transfer_date`, `transfer_type` (entry/exit/intra)
- `team_from`, `team_to`
- `league_from`, `league_to`
- `transfer_fee`, `transfer_value`
- `age_at_transfer`
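
The output names listed above are shorter than the raw column names the notebook works with (`wy_player_id`, `team_name_from`, `competition_name_from`, and so on). A minimal sketch of one possible rename step follows. The `COLUMN_MAP` pairing is an assumption inferred from the two sets of names, not something the source data defines, and the one-row frame is toy data.

```python
import pandas as pd

# Assumed mapping from raw column names to the output names listed above.
# Adjust it if the final schema differs.
COLUMN_MAP = {
    "wy_player_id": "player_id",
    "team_name_from": "team_from",
    "team_name_to": "team_to",
    "competition_name_from": "league_from",
    "competition_name_to": "league_to",
}

# Toy one-row frame standing in for the real transfer table.
raw = pd.DataFrame({
    "wy_player_id": [101],
    "player_name": ["Player A"],
    "team_name_from": ["Team X"],
    "team_name_to": ["Team Y"],
    "competition_name_from": ["Superettan"],
    "competition_name_to": ["Allsvenskan"],
})

# Columns not present in the mapping (player_name here) are left untouched.
clean = raw.rename(columns=COLUMN_MAP)
print(list(clean.columns))
```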

In [None]:
import pandas as pd
import numpy as np
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Paths
DATA_PATH = Path('../../thesis_data')
TH_PATH = DATA_PATH / 'tm_data/transfer_history_all.parquet'

print(f"Loading transfer history from: {TH_PATH}")

## 1. Load and Explore Transfer History

In [None]:
# Load full transfer history
th = pd.read_parquet(TH_PATH)
print(f"Total records: {len(th):,}")
print(f"Columns: {list(th.columns)}")
th.head(3)

In [None]:
# Parse dates
th['transfer_date'] = pd.to_datetime(th['date'], errors='coerce')
th['transfer_year'] = th['transfer_date'].dt.year

print(f"Date range: {th['transfer_date'].min()} to {th['transfer_date'].max()}")
print(f"\nTransfers by year (last 10 years):")
th[th['transfer_year'] >= 2015]['transfer_year'].value_counts().sort_index()
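
Because the dates are parsed with `errors='coerce'`, any value that cannot be parsed becomes `NaT`, and those rows later drop out of the year filter without warning. The sketch below, on toy data, shows one way to count how many rows were lost to parsing rather than being empty to begin with. The variable names are illustrative only.

```python
import pandas as pd

# Toy data: one valid date, one malformed string, one missing value, one valid date.
toy = pd.DataFrame({"date": ["2021-01-15", "not a date", None, "2023-07-31"]})
toy["transfer_date"] = pd.to_datetime(toy["date"], errors="coerce")

n_missing_raw = toy["date"].isna().sum()        # empty in the source
n_nat = toy["transfer_date"].isna().sum()       # empty after parsing
n_failed = n_nat - n_missing_raw                # present but unparseable

print(f"Missing in source: {n_missing_raw}, unparseable: {n_failed}")
```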

## 2. Filter Allsvenskan Transfers (2020-2025)

In [None]:
# Define window
START_YEAR = 2020
END_YEAR = 2025

# Identify Allsvenskan transfers
is_from_allsv = th['competition_name_from'].str.contains('Allsvenskan', case=False, na=False)
is_to_allsv = th['competition_name_to'].str.contains('Allsvenskan', case=False, na=False)
in_window = (th['transfer_year'] >= START_YEAR) & (th['transfer_year'] <= END_YEAR)

# Three categories:
# 1. ENTRY: from other league TO Allsvenskan
entries = th[~is_from_allsv & is_to_allsv & in_window].copy()
entries['transfer_type'] = 'entry'

# 2. EXIT: FROM Allsvenskan to other league
exits = th[is_from_allsv & ~is_to_allsv & in_window].copy()
exits['transfer_type'] = 'exit'

# 3. INTRA: within Allsvenskan (team change)
intra = th[is_from_allsv & is_to_allsv & in_window].copy()
intra['transfer_type'] = 'intra'

print(f"ENTRIES to Allsvenskan ({START_YEAR}-{END_YEAR}): {len(entries):,}")
print(f"EXITS from Allsvenskan ({START_YEAR}-{END_YEAR}): {len(exits):,}")
print(f"INTRA-LEAGUE transfers ({START_YEAR}-{END_YEAR}): {len(intra):,}")
print(f"\nTotal movements: {len(entries) + len(exits) + len(intra):,}")
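
The filter above uses a case-insensitive substring match, so any competition whose name merely contains "allsvenskan" is also treated as Allsvenskan. Whether the transfer history contains such names is not known here. The sketch below uses toy names (including "Damallsvenskan", the women's league) to compare the substring match with an exact match; listing the distinct matched names in the real data would settle which one is needed.

```python
import pandas as pd

# Toy competition names; the real column may or may not contain these values.
names = pd.Series(["Allsvenskan", "Damallsvenskan", "Superettan", None])

# Substring match, as used in the notebook filter.
substring_hit = names.str.contains("Allsvenskan", case=False, na=False)

# Exact match after trimming whitespace and lowercasing.
exact_hit = names.str.strip().str.lower().eq("allsvenskan")

print(pd.DataFrame({"name": names, "substring": substring_hit, "exact": exact_hit}))
```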

In [None]:
# Combine all Allsvenskan-related transfers
allsv_transfers = pd.concat([entries, exits, intra], ignore_index=True)

# Select and rename columns for clarity
allsv_transfers = allsv_transfers[[
    'wy_player_id', 'player_name', 'player_short_name',
    'transfer_date', 'transfer_year', 'transfer_type',
    'team_id_from', 'team_name_from', 'competition_name_from', 'competition_country_from',
    'team_id_to', 'team_name_to', 'competition_name_to', 'competition_country_to',
    'age_at_transfer', 'transfer_fee', 'transfer_value',
    'remaining_contract_period', 'contract_until_date'
]].copy()

# Sort by player and date
allsv_transfers = allsv_transfers.sort_values(['wy_player_id', 'transfer_date']).reset_index(drop=True)

print(f"\nAllsvenskan transfers dataset: {allsv_transfers.shape}")
allsv_transfers.head(10)

## 3. Unique Players in Allsvenskan Window

In [None]:
# All unique players who touched Allsvenskan in this window
players_entered = set(entries['wy_player_id'].unique())
players_exited = set(exits['wy_player_id'].unique())
players_intra = set(intra['wy_player_id'].unique())

all_players = players_entered | players_exited | players_intra

print(f"Players who ENTERED Allsvenskan: {len(players_entered):,}")
print(f"Players who EXITED Allsvenskan: {len(players_exited):,}")
print(f"Players with INTRA transfers: {len(players_intra):,}")
print(f"\nTotal unique players: {len(all_players):,}")

# Venn-style breakdown
only_entered = players_entered - players_exited - players_intra
only_exited = players_exited - players_entered - players_intra
entered_and_exited = players_entered & players_exited

print(f"\n--- Player Categories ---")
print(f"Only entered (still there or no tracked exit): {len(only_entered):,}")
print(f"Only exited (were there before 2020): {len(only_exited):,}")
print(f"Entered AND exited in window: {len(entered_and_exited):,}")

## 4. Transfer Statistics

In [None]:
# Transfers by year and type
pivot = allsv_transfers.pivot_table(
    index='transfer_year', 
    columns='transfer_type', 
    values='wy_player_id', 
    aggfunc='count',
    fill_value=0
)
pivot['total'] = pivot.sum(axis=1)
print("Transfers by Year and Type:")
pivot

In [None]:
# Transfer fees analysis
print("=" * 60)
print("TRANSFER FEE ANALYSIS")
print("=" * 60)

for ttype in ['entry', 'exit', 'intra']:
    subset = allsv_transfers[allsv_transfers['transfer_type'] == ttype]
    fees = subset['transfer_fee'].dropna()
    fees_nonzero = fees[fees > 0]
    
    print(f"\n{ttype.upper()}:")
    print(f"  Total transfers: {len(subset):,}")
    print(f"  With fee data: {len(fees):,} ({len(fees)/len(subset)*100:.1f}%)")
    print(f"  Non-zero fees: {len(fees_nonzero):,}")
    if len(fees_nonzero) > 0:
        print(f"  Fee range: €{fees_nonzero.min():,.0f} - €{fees_nonzero.max():,.0f}")
        print(f"  Median fee: €{fees_nonzero.median():,.0f}")
        print(f"  Total fees: €{fees_nonzero.sum():,.0f}")

In [None]:
# Top transfer fees
print("\nTOP 15 TRANSFER FEES (Exits from Allsvenskan):")
print("=" * 80)

top_exits = exits.nlargest(15, 'transfer_fee')[[
    'player_name', 'team_name_from', 'team_name_to', 'competition_name_to',
    'transfer_date', 'transfer_fee', 'age_at_transfer'
]].copy()
top_exits['transfer_fee_m'] = top_exits['transfer_fee'] / 1_000_000
top_exits

## 5. Source and Destination Leagues

In [None]:
# Where do players come FROM when entering Allsvenskan?
print("TOP 15 SOURCE LEAGUES (entries to Allsvenskan):")
print(entries['competition_name_from'].value_counts().head(15))

In [None]:
# Where do players GO when leaving Allsvenskan?
print("TOP 15 DESTINATION LEAGUES (exits from Allsvenskan):")
print(exits['competition_name_to'].value_counts().head(15))

In [None]:
# Source countries
print("\nSOURCE COUNTRIES (entries):")
print(entries['competition_country_from'].value_counts().head(10))

print("\nDESTINATION COUNTRIES (exits):")
print(exits['competition_country_to'].value_counts().head(10))

## 6. Allsvenskan Teams Analysis

In [None]:
# Teams receiving players (entries + intra)
teams_receiving = pd.concat([
    entries[['team_name_to', 'team_id_to', 'transfer_fee']],
    intra[['team_name_to', 'team_id_to', 'transfer_fee']]
])

teams_receiving_agg = teams_receiving.groupby(['team_name_to', 'team_id_to']).agg(
    signings=('transfer_fee', 'size'),  # 'size' counts every row; 'count' would skip missing fees
    total_spent=('transfer_fee', lambda x: x.fillna(0).sum()),
    avg_fee=('transfer_fee', lambda x: x[x > 0].mean() if (x > 0).any() else 0)
).reset_index().sort_values('signings', ascending=False)

print("ALLSVENSKAN TEAMS - SIGNINGS (2020-2025):")
teams_receiving_agg.head(20)

In [None]:
# Teams losing players (exits + intra)
teams_selling = pd.concat([
    exits[['team_name_from', 'team_id_from', 'transfer_fee']],
    intra[['team_name_from', 'team_id_from', 'transfer_fee']]
])

teams_selling_agg = teams_selling.groupby(['team_name_from', 'team_id_from']).agg(
    departures=('transfer_fee', 'size'),  # 'size' counts every row; 'count' would skip missing fees
    total_received=('transfer_fee', lambda x: x.fillna(0).sum()),
    avg_fee=('transfer_fee', lambda x: x[x > 0].mean() if (x > 0).any() else 0)
).reset_index().sort_values('departures', ascending=False)

print("ALLSVENSKAN TEAMS - DEPARTURES (2020-2025):")
teams_selling_agg.head(20)
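
When tallying signings and departures per team, note that the `'count'` aggregation only counts non-null values, while `'size'` counts every row in the group. With transfer fees that are often missing, the two can differ substantially. A small illustration on toy data:

```python
import numpy as np
import pandas as pd

# Toy data: Team X has three signings, two of them without fee information.
toy = pd.DataFrame({
    "team_name_to": ["Team X", "Team X", "Team X", "Team Y"],
    "transfer_fee": [500_000.0, np.nan, np.nan, 0.0],
})

agg = toy.groupby("team_name_to").agg(
    with_fee_data=("transfer_fee", "count"),  # non-null fees only
    signings=("transfer_fee", "size"),        # every row in the group
)
print(agg)
```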

## 7. Build Player Trajectories

In [None]:
def build_player_trajectory(player_id, transfers_df):
    """
    Build a trajectory for a single player from their transfer records.
    Returns a list of dicts, one per transfer, ordered by transfer date.
    """
    player_transfers = transfers_df[transfers_df['wy_player_id'] == player_id].sort_values('transfer_date')
    
    if player_transfers.empty:
        return []
    
    trajectory = []
    
    for _, row in player_transfers.iterrows():
        transfer_date = row['transfer_date']
        year = transfer_date.year if pd.notna(transfer_date) else None
        month = transfer_date.month if pd.notna(transfer_date) else None
        
        # Determine season (Swedish season is calendar year)
        # Transfers Jan-Jun typically for current season, Jul-Dec for next
        if month and month <= 6:
            season = year
        else:
            season = year + 1 if year else None
        
        trajectory.append({
            'date': transfer_date,
            'season_effect': season,
            'type': row['transfer_type'],
            'from_team': row['team_name_from'],
            'to_team': row['team_name_to'],
            'from_league': row['competition_name_from'],
            'to_league': row['competition_name_to'],
            'fee': row['transfer_fee'],
            'value': row['transfer_value'],
            'age': row['age_at_transfer']
        })
    
    return trajectory

# Example: build trajectory for a player with multiple transfers
sample_players = allsv_transfers.groupby('wy_player_id').size().sort_values(ascending=False).head(5)
print("Players with most transfers in window:")
print(sample_players)
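
The rule above credits July to December transfers to the following season, whereas the season inference in section 8 treats the transfer year as the season. One way to keep the choice explicit is a single helper with a cutoff parameter. This is a sketch only: `assign_season` and `cutoff_month` are hypothetical names, and which cutoff is appropriate for Allsvenskan is left as an open assumption.

```python
import pandas as pd

def assign_season(transfer_date, cutoff_month=12):
    """Map a transfer date to a calendar-year season.

    Transfers in months after `cutoff_month` are credited to the next year.
    cutoff_month=12 always returns the transfer year;
    cutoff_month=6 reproduces the Jan-Jun / Jul-Dec split.
    """
    if pd.isna(transfer_date):
        return None
    if transfer_date.month <= cutoff_month:
        return transfer_date.year
    return transfer_date.year + 1

summer_move = pd.Timestamp("2022-08-10")
print(assign_season(summer_move))                  # season = transfer year
print(assign_season(summer_move, cutoff_month=6))  # season = following year
```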

In [None]:
# Show example trajectory
example_id = sample_players.index[0]
example_name = allsv_transfers[allsv_transfers['wy_player_id'] == example_id]['player_name'].iloc[0]

print(f"\nEXAMPLE TRAJECTORY: {example_name} (ID: {example_id})")
print("=" * 80)

traj = build_player_trajectory(example_id, allsv_transfers)
for i, t in enumerate(traj, 1):
    fee_str = f"€{t['fee']:,.0f}" if pd.notna(t['fee']) and t['fee'] > 0 else "Free/Unknown"
    print(f"{i}. {t['date'].strftime('%Y-%m-%d') if pd.notna(t['date']) else 'N/A'} | {t['type'].upper()}")
    print(f"   {t['from_team']} ({t['from_league']}) → {t['to_team']} ({t['to_league']})")
    print(f"   Fee: {fee_str} | Age: {t['age']}")
    print()

## 8. Create Comprehensive Player-Season-Team Dataset

In [None]:
def infer_seasons_in_allsvenskan(player_id, transfers_df, start_year=2020, end_year=2025):
    """
    Infer which seasons a player was in Allsvenskan and with which team.
    Returns list of dicts with season, team, and status.
    """
    player_transfers = transfers_df[transfers_df['wy_player_id'] == player_id].sort_values('transfer_date')
    
    if player_transfers.empty:
        return []
    
    # Get player name
    player_name = player_transfers['player_name'].iloc[0]
    
    # Track status by season
    seasons = {}
    
    # Process each transfer
    for _, row in player_transfers.iterrows():
        transfer_date = row['transfer_date']
        if pd.isna(transfer_date):
            continue
            
        year = transfer_date.year
        month = transfer_date.month
        ttype = row['transfer_type']
        
        # For Swedish football, season = calendar year
        # Main window: Jan-Mar (for current season), Jul-Aug (for current season)
        current_season = year
        
        if ttype == 'entry':
            # Player entered Allsvenskan
            team = row['team_name_to']
            # Mark from this season onwards until exit
            for s in range(current_season, end_year + 1):
                if s not in seasons or seasons[s]['status'] != 'in_league':
                    seasons[s] = {'team': team, 'status': 'in_league', 'entry_date': transfer_date}
                    
        elif ttype == 'exit':
            # Player left Allsvenskan
            team = row['team_name_from']
            # If no entry was tracked (player was already in the league),
            # record the exit season with the team they left
            if current_season not in seasons:
                seasons[current_season] = {'team': team, 'status': 'in_league'}
            seasons[current_season]['exit_date'] = transfer_date
            seasons[current_season]['exit_team'] = row['team_name_to']
            seasons[current_season]['exit_league'] = row['competition_name_to']
            seasons[current_season]['exit_fee'] = row['transfer_fee']
            # Remove from future seasons
            for s in range(current_season + 1, end_year + 1):
                if s in seasons:
                    del seasons[s]
                    
        elif ttype == 'intra':
            # Changed team within Allsvenskan
            new_team = row['team_name_to']
            for s in range(current_season, end_year + 1):
                seasons[s] = {'team': new_team, 'status': 'in_league'}
    
    # Filter to our window and format output
    result = []
    for season in range(start_year, end_year + 1):
        if season in seasons:
            record = {
                'wy_player_id': player_id,
                'player_name': player_name,
                'season': season,
                'team': seasons[season].get('team'),
                'in_allsvenskan': True,
                'exit_date': seasons[season].get('exit_date'),
                'exit_team': seasons[season].get('exit_team'),
                'exit_league': seasons[season].get('exit_league'),
                'exit_fee': seasons[season].get('exit_fee')
            }
            result.append(record)
    
    return result

# Test with example player
print(f"Inferred seasons for {example_name}:")
inferred = infer_seasons_in_allsvenskan(example_id, allsv_transfers)
pd.DataFrame(inferred)
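
The core idea of the function above is a carry-forward: once a player enters, they are assumed to stay with that team every season until an exit is recorded or the window ends. The sketch below shows that idea in isolation on a toy table of stints. The `stints` layout, with one row per spell and `entry_year` / `exit_year` columns, is an assumption made for illustration and is simpler than the transfer records used in this notebook.

```python
import pandas as pd

END_YEAR = 2025

# Toy stints: player 1 leaves in 2022, player 2 has no recorded exit.
stints = pd.DataFrame({
    "wy_player_id": [1, 2],
    "team": ["Team X", "Team Y"],
    "entry_year": [2021, 2023],
    "exit_year": [2022, None],
})

# An open-ended stint is carried forward to the end of the window.
last_year = stints["exit_year"].fillna(END_YEAR).astype(int)

# One list of seasons per stint, then one row per season.
stints["season"] = pd.Series(
    [list(range(start, stop + 1)) for start, stop in zip(stints["entry_year"], last_year)],
    index=stints.index,
)
player_seasons = (
    stints.explode("season")[["wy_player_id", "team", "season"]]
    .astype({"season": int})
    .reset_index(drop=True)
)
print(player_seasons)
```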

In [None]:
# Build for all players
print("Building player-season-team dataset...")

all_player_seasons = []
for player_id in all_players:
    seasons = infer_seasons_in_allsvenskan(player_id, allsv_transfers)
    all_player_seasons.extend(seasons)

player_seasons_df = pd.DataFrame(all_player_seasons)
print(f"\nPlayer-season records: {len(player_seasons_df):,}")
print(f"Unique players: {player_seasons_df['wy_player_id'].nunique():,}")
print(f"\nRecords by season:")
print(player_seasons_df['season'].value_counts().sort_index())

In [None]:
# Players by number of seasons in Allsvenskan
seasons_per_player = player_seasons_df.groupby('wy_player_id')['season'].nunique()
print("\nPlayers by number of seasons in Allsvenskan:")
print(seasons_per_player.value_counts().sort_index())

# Players with at least N seasons
for n in [1, 2, 3, 4, 5]:
    count = (seasons_per_player >= n).sum()
    print(f"Players with >= {n} seasons: {count:,}")

## 9. Export Datasets

In [None]:
# Export paths
OUTPUT_PATH = DATA_PATH / 'processed'
OUTPUT_PATH.mkdir(parents=True, exist_ok=True)  # also create missing parent folders

# 1. All Allsvenskan transfers (entries, exits, intra)
allsv_transfers.to_parquet(OUTPUT_PATH / 'allsvenskan_transfers_2020_2025.parquet', index=False)
print(f"Saved: allsvenskan_transfers_2020_2025.parquet ({len(allsv_transfers):,} records)")

# 2. Player-season-team dataset
player_seasons_df.to_parquet(OUTPUT_PATH / 'allsvenskan_player_seasons_2020_2025.parquet', index=False)
print(f"Saved: allsvenskan_player_seasons_2020_2025.parquet ({len(player_seasons_df):,} records)")

# 3. Player summary (one row per player)
player_summary = player_seasons_df.groupby(['wy_player_id', 'player_name']).agg(
    seasons_count=('season', 'nunique'),
    first_season=('season', 'min'),
    last_season=('season', 'max'),
    teams=('team', lambda x: list(x.dropna().unique())),
    exited=('exit_date', lambda x: x.notna().any()),
    exit_fee=('exit_fee', 'max')
).reset_index()

player_summary.to_parquet(OUTPUT_PATH / 'allsvenskan_players_summary_2020_2025.parquet', index=False)
print(f"Saved: allsvenskan_players_summary_2020_2025.parquet ({len(player_summary):,} players)")

In [None]:
# Preview player summary
print("\nPLAYER SUMMARY PREVIEW:")
player_summary.sort_values('exit_fee', ascending=False).head(20)

## 10. Summary Statistics

In [None]:
print("=" * 70)
print("ALLSVENSKAN 2020-2025 - TRANSFER DATA SUMMARY")
print("=" * 70)

print(f"\n📊 TRANSFERS:")
print(f"   Total movements: {len(allsv_transfers):,}")
print(f"   - Entries (to Allsvenskan): {len(entries):,}")
print(f"   - Exits (from Allsvenskan): {len(exits):,}")
print(f"   - Intra-league: {len(intra):,}")

print(f"\n👥 PLAYERS:")
print(f"   Total unique players: {len(all_players):,}")
print(f"   With 1+ season: {(seasons_per_player >= 1).sum():,}")
print(f"   With 2+ seasons: {(seasons_per_player >= 2).sum():,}")
print(f"   With 3+ seasons: {(seasons_per_player >= 3).sum():,}")

exits_with_fee = exits[exits['transfer_fee'] > 0]
print(f"\n💰 TRANSFER FEES (Exits):")
print(f"   Exits with fee: {len(exits_with_fee):,} ({len(exits_with_fee)/len(exits)*100:.1f}%)")
print(f"   Total fees: €{exits_with_fee['transfer_fee'].sum()/1e6:.1f}M")
print(f"   Avg fee: €{exits_with_fee['transfer_fee'].mean()/1e6:.2f}M")
print(f"   Max fee: €{exits_with_fee['transfer_fee'].max()/1e6:.2f}M")

print(f"\n📁 EXPORTED FILES:")
print(f"   - allsvenskan_transfers_2020_2025.parquet")
print(f"   - allsvenskan_player_seasons_2020_2025.parquet")
print(f"   - allsvenskan_players_summary_2020_2025.parquet")

---

## ⚠️ DATA GAP - NEXT STEPS

**What we have:**
- ✅ Complete transfer history (entries, exits, intra-league)
- ✅ Player trajectories (season, team)
- ✅ Transfer fees, values, dates

**What is missing:**
- ❌ **Performance metrics** for ALL players, per season
- ❌ Players who made NO transfer (played only in Allsvenskan)
- ❌ Minutes played per season

**Options:**
1. Ask Twelve for the complete Allsvenskan dataset (not just transfers)
2. Use the Wyscout API to obtain metrics
3. Scrape FBref or other sources
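
Once a per-season appearance table is available from any of these sources, the players with no transfer in the window could be identified with an anti-join against the transfer-based player list. The sketch below assumes a hypothetical `appearances` table with `wy_player_id`, `season`, and `minutes` columns. None of these sources has been checked for that layout, and the data is toy data.

```python
import pandas as pd

# Hypothetical appearance table from a performance data source (columns assumed).
appearances = pd.DataFrame({
    "wy_player_id": [1, 2, 3],
    "season": [2022, 2022, 2022],
    "minutes": [1800, 950, 2400],
})

# Players already covered by the transfer-based dataset.
transfer_players = pd.DataFrame({"wy_player_id": [1, 3]})

# Left merge with an indicator column; 'left_only' rows have no transfer record.
merged = appearances.merge(
    transfer_players, on="wy_player_id", how="left", indicator=True
)
no_transfer = merged.loc[
    merged["_merge"] == "left_only", ["wy_player_id", "season", "minutes"]
]
print(no_transfer)
```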