<small><i>June 2024 - This notebook was created by [Ivan Casanovas Rodríguez](https://www.linkedin.com/in/ivancasanovaas/)</i></small>

# Network construction. OVERVIEW



<div class = "alert alert-info" style ="border-radius:10px;border-width:3px" >
    
    
1. [General information](#1)
 
2. [Features](#2)

3. [Cleaning the dataset](#3)
    
4. [Characterization of the network: nodes and edges](#4)

</div>


In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# 1. General information  <a class="anchor" id="1"></a>

This dataset contains the transfers of the 7 major European football leagues (listed in the Features section) from 2009 to 2021, according to the [Transfermarkt](https://www.transfermarkt.com) website.

- The dataset can be found in the `dataset` folder.
- The scraper and preprocessing Jupyter notebooks, with their code, can be found in the `dataset/src` folder.
- The raw data files scraped from Transfermarkt can be found in the `dataset/data` folder.

https://github.com/d2ski/football-transfers-data

In [2]:
df = pd.read_csv('../dataset/transfers.csv')
df

Unnamed: 0,league,season,window,team_id,team_name,team_country,dir,player_id,player_name,player_age,...,counter_team_id,counter_team_name,counter_team_country,transfer_fee_amnt,market_val_amnt,is_free,is_loan,is_loan_end,is_retired,transfer_id
0,GB1,2009,s,985,Manchester United,England,in,33544,Antonio Valencia,23.0,...,1071,Wigan Athletic,England,18800000.0,,False,False,False,False,310832
1,GB1,2009,s,985,Manchester United,England,in,62049,Mame Diouf,21.0,...,687,Molde FK,Norway,4500000.0,1600000.0,False,False,False,False,319841
2,GB1,2009,s,985,Manchester United,England,in,43261,Gabriel Obertan,20.0,...,40,FC Girondins Bordeaux,France,4000000.0,400000.0,False,False,False,False,315185
3,GB1,2009,s,985,Manchester United,England,in,1397,Michael Owen,29.0,...,762,Newcastle United,England,0.0,,True,False,False,False,306421
4,GB1,2009,s,985,Manchester United,England,in,73538,Scott Moffatt,18.0,...,5242,Manchester United U18,England,,,False,False,False,False,339015
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
70001,NL1,2021,w,306,SC Heerenveen,Netherlands,in,257808,Runar Espejord,25.0,...,1293,Tromsø IL,Norway,,500000.0,False,True,True,False,3071862
70002,NL1,2021,w,306,SC Heerenveen,Netherlands,in,580142,Joaquín Fernández,22.0,...,37535,Montevideo City Torque,Uruguay,,100000.0,False,True,True,False,3268245
70003,NL1,2021,w,468,Sparta Rotterdam,Netherlands,in,340353,Maduka Okoye,22.0,...,1010,Watford FC,England,,1000000.0,False,True,False,False,3619917
70004,NL1,2021,w,468,Sparta Rotterdam,Netherlands,left,340353,Maduka Okoye,22.0,...,1010,Watford FC,England,7000000.0,1000000.0,False,False,False,False,3619916
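
The `Unnamed: 0` column in the output above suggests that the CSV file stores a saved row index as its first column. As a minimal sketch, assuming that is the case, `index_col=0` makes pandas use that column as the index instead of loading it as data. The toy CSV below is hypothetical and only mimics the layout of `transfers.csv`.

```python
import io
import pandas as pd

# Toy CSV standing in for transfers.csv (hypothetical rows);
# the first column has no header, as happens when an index is saved to file.
toy_csv = io.StringIO(
    ",league,season,window\n"
    "0,GB1,2009,s\n"
    "1,NL1,2021,w\n"
)

# index_col=0 uses the first column as the row index,
# so no 'Unnamed: 0' column is created.
toy_df = pd.read_csv(toy_csv, index_col=0)
print(list(toy_df.columns))
```

The same argument could be passed to the `pd.read_csv` call on `../dataset/transfers.csv`, provided the file really has that layout.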


# 2. Features  <a class="anchor" id="2"></a>
`league` - codename of the football league. Takes the following values:
- `GB1` English Premier League
- `ES1` La Liga
- `IT1` Serie A
- `L1` Bundesliga
- `FR1` French Ligue 1
- `PO1` Liga Portugal
- `NL1` Dutch Eredivisie
 
`season` - season  
`window` - transfer window (`s` summer or `w` winter)  
`team_id` - team's ID as used by Transfermarkt  
`team_name` - team's name  
`team_country` - team's country  
`dir` - transfer direction (`in` or `left`)  
`player_id` - player's ID as used by Transfermarkt  
`player_name` - player's name  
`player_age` - player's age when transfer occurred  
`player_nation` - player's nationality  
`player_nation2` - player's 2nd nationality  
`player_pos` - player's field position  
`counter_team_id` - counter team's ID as used by Transfermarkt  
`counter_team_name` - counter team's name  
`counter_team_country` - counter team's country  
`transfer_fee_amnt` - transfer fee amount (EUR)  
`market_val_amnt` - player's market value (EUR) at the time of the transfer, as estimated by Transfermarkt  
`is_free` - free transfer (`True` or `False`)  
`is_loan` - loan transfer (`True` or `False`)  
`is_loan_end` - end of loan transfer (`True` or `False`)  
`is_retired` - player retired (`True` or `False`)  
`transfer_id` - transfer's ID as used by Transfermarkt  

In [3]:
features = list(df.columns)

features_info = {}
for feature in features:
    features_info[feature] = df[feature].unique()
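
One way to get a quick overview of such a dictionary is to count the distinct values of each feature. The sketch below builds the dictionary the same way on a small hypothetical table (`toy_df`, `toy_info` and `n_unique` are illustrative names, not part of the notebook).

```python
import pandas as pd

# Hypothetical toy table with a few of the dataset's column names
toy_df = pd.DataFrame({
    'league': ['GB1', 'GB1', 'NL1'],
    'window': ['s', 'w', 's'],
    'is_loan': [False, True, False],
})

# Distinct values of every feature, written as a dict comprehension
toy_info = {feature: toy_df[feature].unique() for feature in toy_df.columns}

# Number of distinct values per feature
n_unique = {feature: len(values) for feature, values in toy_info.items()}
print(n_unique)
```

Columns with very few distinct values (flags, transfer window, direction) are easy to spot this way.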

# 3. Cleaning the dataset  <a class="anchor" id="3"></a>
We are only interested in the market connections between clubs, so we discard every transfer that involves a free agent. In practice, this means excluding free transfers (`is_free` is `True`) and transfers whose counterpart is `Without Club`.

In [4]:
# Drop transfers whose counterpart is not a club
df = df[df['counter_team_name'] != 'Without Club']
# Drop free transfers
df = df[df['is_free'] == False]
df

Unnamed: 0,league,season,window,team_id,team_name,team_country,dir,player_id,player_name,player_age,...,counter_team_id,counter_team_name,counter_team_country,transfer_fee_amnt,market_val_amnt,is_free,is_loan,is_loan_end,is_retired,transfer_id
0,GB1,2009,s,985,Manchester United,England,in,33544,Antonio Valencia,23.0,...,1071,Wigan Athletic,England,18800000.0,,False,False,False,False,310832
1,GB1,2009,s,985,Manchester United,England,in,62049,Mame Diouf,21.0,...,687,Molde FK,Norway,4500000.0,1600000.0,False,False,False,False,319841
2,GB1,2009,s,985,Manchester United,England,in,43261,Gabriel Obertan,20.0,...,40,FC Girondins Bordeaux,France,4000000.0,400000.0,False,False,False,False,315185
4,GB1,2009,s,985,Manchester United,England,in,73538,Scott Moffatt,18.0,...,5242,Manchester United U18,England,,,False,False,False,False,339015
5,GB1,2009,s,985,Manchester United,England,in,42411,Fraizer Campbell,21.0,...,148,Tottenham Hotspur,England,,700000.0,False,True,True,False,301497
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
69999,NL1,2021,s,403,Willem II Tilburg,Netherlands,left,340172,Lindon Selahi,22.0,...,317,Twente Enschede FC,Netherlands,,650000.0,False,True,True,False,3206359
70001,NL1,2021,w,306,SC Heerenveen,Netherlands,in,257808,Runar Espejord,25.0,...,1293,Tromsø IL,Norway,,500000.0,False,True,True,False,3071862
70002,NL1,2021,w,306,SC Heerenveen,Netherlands,in,580142,Joaquín Fernández,22.0,...,37535,Montevideo City Torque,Uruguay,,100000.0,False,True,True,False,3268245
70003,NL1,2021,w,468,Sparta Rotterdam,Netherlands,in,340353,Maduka Okoye,22.0,...,1010,Watford FC,England,,1000000.0,False,True,False,False,3619917
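
The effect of the two filters can be checked on a small table. This is a minimal sketch on hypothetical rows (the player and club names are placeholders, not taken from the dataset), and it assumes `is_free` is a boolean column without missing values.

```python
import pandas as pd

# Hypothetical rows: one counterpart without a club, one free transfer
toy_df = pd.DataFrame({
    'player_name': ['Player A', 'Player B', 'Player C', 'Player D'],
    'counter_team_name': ['Club X', 'Without Club', 'Club Y', 'Club Z'],
    'is_free': [False, False, True, False],
})

# Keep rows whose counterpart is a club and that are not free transfers
has_club = toy_df['counter_team_name'] != 'Without Club'
not_free = ~toy_df['is_free']
cleaned = toy_df[has_club & not_free]

removed = len(toy_df) - len(cleaned)
print(f'{removed} of {len(toy_df)} rows removed')
```

Comparing the row count before and after the filter is a simple way to see how much of the dataset the cleaning step discards.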


# 4. Characterization of the network: nodes and edges  <a class="anchor" id="4"></a>

The idea is to use the player transfers to build a complex economic network that describes the connections between clubs in the transfer market.

The starting point is to treat clubs as nodes and to study the simplest case first: an undirected and unweighted network. Undirected means that it does not matter whether a club buys or sells the player; unweighted means that the number of players transferred between two clubs is ignored. Therefore, we draw a single edge between two clubs when at least one player has been transferred between them.
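
To illustrate the reduction on a small case, the sketch below uses three hypothetical clubs. Sorting each pair of names makes `(Club A, Club B)` and `(Club B, Club A)` identical (undirected), and dropping duplicates keeps one edge per pair regardless of how many transfers took place (unweighted).

```python
import pandas as pd

# Hypothetical transfers: three between Club A and Club B, one between Club C and Club A
toy = pd.DataFrame({
    'team_name':         ['Club A', 'Club B', 'Club A', 'Club C'],
    'counter_team_name': ['Club B', 'Club A', 'Club B', 'Club A'],
})

# Undirected: order each pair alphabetically so direction is lost
pairs = toy.apply(lambda row: sorted(row), axis=1, result_type='expand')

# Unweighted: keep a single row per pair of clubs
pairs = pairs.drop_duplicates()

toy_edges = pairs.values.tolist()
print(toy_edges)
```

Four transfers collapse into two edges: one between Club A and Club B, and one between Club A and Club C.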

In [5]:
# Clubs as nodes: every club that appears on either side of a transfer
clubs = pd.concat([df['team_name'], df['counter_team_name']]).unique()
N = len(clubs)
nodes = np.arange(1, N + 1)  # Integer labels 1..N
club_to_node = dict(zip(clubs, nodes))

# Transfers as edges
edges = df[['team_name', 'counter_team_name']]
edges = edges.apply(lambda row: sorted(row), axis=1, result_type='expand')  # Undirected network
edges = edges.drop_duplicates(keep='first')  # Unweighted network
E = edges.shape[0]
# Replace club names by their node labels in a single pass
edges = edges.apply(lambda col: col.map(club_to_node)).values.tolist()

# Output nodes file
with open('./nodes.txt', 'w') as fnodes:
    for i in range(N):
        fnodes.write('%6i %40s\n' % (nodes[i], clubs[i]))

# Output edges file
with open('./edges.txt', 'w') as fedges:
    for i in range(E):
        fedges.write('%6i %6i\n' % (edges[i][0], edges[i][1]))
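
As a quick sanity check, an edge list in this format can be read back and the number of connections of each node counted. The sketch below is only an illustration: it reads a hypothetical four-edge list from an in-memory string in the same `'%6i %6i'` layout rather than the real `edges.txt`, and it verifies that the degrees add up to twice the number of edges.

```python
import io
from collections import Counter

import numpy as np

# Hypothetical edge list in the same fixed-width layout as edges.txt
toy_edges_file = io.StringIO(
    "     1      2\n"
    "     1      3\n"
    "     2      3\n"
    "     3      4\n"
)

# ndmin=2 keeps a two-column array even if the file has a single edge
edge_array = np.loadtxt(toy_edges_file, dtype=int, ndmin=2)

# Degree of a node = number of edges it takes part in
degree = Counter()
for u, v in edge_array:
    degree[int(u)] += 1
    degree[int(v)] += 1

n_edges = len(edge_array)
# Every edge contributes to the degree of exactly two nodes
degrees_consistent = sum(degree.values()) == 2 * n_edges
print(dict(degree), degrees_consistent)
```

To run the same check on the real output, the in-memory string would be replaced by the path `'./edges.txt'`.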