# Transfermarkt Dataset Notebook

This dataset contains squad market values for the teams in the five major European football leagues (Premier League, LaLiga, Serie A, Bundesliga, Ligue 1) and the Turkish Süper Lig. The data was scraped from transfermarkt.com; the exact URLs are listed in the notebook. The data is available in CSV format.

## Data dictionary

- **Team Name:** The name of the team
- **League:** The league the team belongs to
- **Squad Value:** The team's squad value in millions of euros. Values that the site reports in billions are converted to millions, so a squad worth 1.5bn is stored as 1500.0.

In [1]:
import pandas as pd
from requests import get
from bs4 import BeautifulSoup as bs
import re

header = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4.1 Safari/605.1.15'
}

In [9]:
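def parse_transfermarkt(url):
    """Scrape one league's market-value page into a DataFrame of team name, league and squad value."""
    response = get(url, headers=header)
    response.raise_for_status()  # Stop on HTTP errors instead of parsing an error page
    soup = bs(response.text, 'html.parser')
    table = soup.find('table', { 'class': 'items' })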

    teamtags = table.find_all('img', {'class': 'tiny_wappen'})
    teamnames = []
    for tag in teamtags:
        name = tag.attrs['title']
        teamnames.append(str(name))

    leaguenames = []
    leaguetags = table.find_all('td', {'class': ''})
    for tag in leaguetags:
        leaguenames.append(str(tag.contents[0]))
    leaguenames = leaguenames[1:] # The first tag is useless

    teamvalues = []
    valuetags = table.find_all('td', {'class': 'rechts hauptlink'})
    for tag in valuetags:
        tag = tag.contents[0]
        atag = tag.find('a')
        price = str(atag.contents[0])
        pat = r'[0-9]+\.?[0-9]*'  # Raw string so that \. is a literal dot, not an invalid escape
        price_millions = float(re.findall(pat, price)[0])
        if 'bn' in price:
            price_millions *= 1000
        teamvalues.append(price_millions)

    df = pd.DataFrame({
        'Team Name': teamnames,
        'League': leaguenames,
        'Squad Value': teamvalues
    })

    return df
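
The value-parsing step inside `parse_transfermarkt` can be tried out without any network access. The sketch below isolates it in a hypothetical helper, `value_to_millions`, which is not part of the original notebook. The `bn` branch mirrors the code above. The `k` branch is an assumption: it guesses that values below one million would carry a `k` suffix, which the original code does not handle.

```python
import re

# Same pattern as in parse_transfermarkt, compiled once.
VALUE_PATTERN = re.compile(r'[0-9]+\.?[0-9]*')


def value_to_millions(text):
    """Convert a value string such as '€1.5bn' or '€29.35m' to millions of euros.

    Hypothetical helper, not part of the original notebook.
    """
    match = VALUE_PATTERN.search(text)
    if match is None:
        raise ValueError(f'no numeric value found in {text!r}')
    value = float(match.group())
    if 'bn' in text:
        value *= 1000  # billions -> millions, as in the notebook
    elif 'k' in text:
        value /= 1000  # ASSUMPTION: thousands suffix, not handled in the original
    return value


print(value_to_millions('€1.5bn'))
print(value_to_millions('€29.35m'))
```

Raising a `ValueError` for strings without digits is a design choice of this sketch; the original code would fail with an `IndexError` in that case.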

In [10]:
stsl = 'https://www.transfermarkt.com/super-lig/marktwerteverein/wettbewerb/TR1/stichtag/2024-04-15'
epl = 'https://www.transfermarkt.com/premier-league/marktwerteverein/wettbewerb/GB1'
laliga = 'https://www.transfermarkt.com/laliga/marktwerteverein/wettbewerb/ES1'
seriea = 'https://www.transfermarkt.com/serie-a/marktwerteverein/wettbewerb/IT1'
bundesliga = 'https://www.transfermarkt.com/bundesliga/marktwerteverein/wettbewerb/L1'
ligueone = 'https://www.transfermarkt.com/ligue-1/marktwerteverein/wettbewerb/FR1'

leagues = [stsl, epl, laliga, seriea, bundesliga, ligueone]
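
The six URLs above share one pattern: a league slug, the fixed path `marktwerteverein/wettbewerb`, a competition code, and an optional `stichtag` date segment. As a minimal sketch, they could be generated from that pattern instead of being written out by hand. The names `build_url`, `BASE` and `league_urls` are hypothetical and do not appear in the original notebook.

```python
BASE = 'https://www.transfermarkt.com'


def build_url(slug, code, date=None):
    """Build a market-value page URL from a league slug and competition code.

    Hypothetical helper; the path layout is copied from the URLs in the notebook.
    """
    url = f'{BASE}/{slug}/marktwerteverein/wettbewerb/{code}'
    if date is not None:
        url += f'/stichtag/{date}'  # optional cut-off date segment
    return url


league_urls = [
    build_url('super-lig', 'TR1', date='2024-04-15'),
    build_url('premier-league', 'GB1'),
    build_url('laliga', 'ES1'),
    build_url('serie-a', 'IT1'),
    build_url('bundesliga', 'L1'),
    build_url('ligue-1', 'FR1'),
]

for url in league_urls:
    print(url)
```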

In [11]:
dfs = [parse_transfermarkt(league) for league in leagues]
df = pd.concat(dfs)
df.sample(10)

| | Team Name | League | Squad Value |
|---|---|---|---|
| 18 | FC Empoli | Serie A | 68.6 |
| 4 | Atalanta BC | Serie A | 349.6 |
| 11 | CA Osasuna | LaLiga | 99.3 |
| 18 | Sheffield United | Premier League | 144.2 |
| 9 | Kayserispor | Süper Lig | 29.35 |
| 17 | FC Metz | Ligue 1 | 45.7 |
| 4 | VfB Stuttgart | Bundesliga | 277.9 |
| 15 | Le Havre AC | Ligue 1 | 61.5 |
| 9 | West Ham United | Premier League | 446.6 |
| 8 | Borussia Mönchengladbach | Bundesliga | 185.23 |
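
Index labels such as 18, 4 and 9 appear more than once in the sample above because `pd.concat` keeps each league's own 0-based index. If unique row labels are wanted, passing `ignore_index=True` is one option. The sketch below shows the difference on two toy frames with made-up values; `league_a` and `league_b` are placeholders, not real data.

```python
import pandas as pd

# Toy frames standing in for two per-league results (made-up values).
league_a = pd.DataFrame({
    'Team Name': ['Team A1', 'Team A2'],
    'League': ['League A', 'League A'],
    'Squad Value': [100.0, 50.0],
})
league_b = pd.DataFrame({
    'Team Name': ['Team B1', 'Team B2'],
    'League': ['League B', 'League B'],
    'Squad Value': [80.0, 20.0],
})

kept = pd.concat([league_a, league_b])                           # per-frame labels are kept
renumbered = pd.concat([league_a, league_b], ignore_index=True)  # fresh 0..n-1 labels

print(kept.index.tolist())        # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```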


Uncomment the following line to save the data as a CSV file.

In [15]:
# df.to_csv('transfermarkt.csv', index=False)
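
The `index=False` argument matters when the file is read back: without it, pandas writes the row labels as an extra unnamed column, which `pd.read_csv` then loads as `Unnamed: 0`. The sketch below illustrates this with an in-memory round trip on a toy frame with made-up values, so no file is written.

```python
import io

import pandas as pd

# Toy frame with the same columns as the dataset (made-up values).
toy = pd.DataFrame({
    'Team Name': ['Team A', 'Team B'],
    'League': ['League X', 'League X'],
    'Squad Value': [100.0, 50.0],
})

# to_csv() without a path returns the CSV text as a string.
with_index = pd.read_csv(io.StringIO(toy.to_csv()))
without_index = pd.read_csv(io.StringIO(toy.to_csv(index=False)))

print(with_index.columns.tolist())     # ['Unnamed: 0', 'Team Name', 'League', 'Squad Value']
print(without_index.columns.tolist())  # ['Team Name', 'League', 'Squad Value']
```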