# FBREF

https://fbref.com/ is a popular website for advanced football statistics. The nice thing about FBref is that the data is presented in plain HTML tables, so we can scrape it easily using just pandas.

So while we'll only be scraping a couple of tables in this video, the same method applies to any table on the website.

In [1]:
import pandas as pd

In [2]:
# We'll start off by scraping the data from the 2022/2023 Champions League

df = pd.read_html('https://fbref.com/en/comps/8/2022-2023/2022-2023-Champions-League-Stats', attrs={'id': 'results2022-202380_overall'})[0]
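To see what the `attrs` argument is doing without hitting the website, here is a minimal sketch that runs `pd.read_html` on a small inline HTML string. The HTML, the table ids (`standings`, `scorers`) and the team and player names are all made up for illustration, and it assumes an HTML parser such as lxml is installed alongside pandas.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the ids matter for the lookup.
html = """
<table id="standings">
  <tr><th>Rk</th><th>Squad</th><th>Pts</th></tr>
  <tr><td>1</td><td>Team A</td><td>29</td></tr>
  <tr><td>2</td><td>Team B</td><td>24</td></tr>
</table>
<table id="scorers">
  <tr><th>Player</th><th>Gls</th></tr>
  <tr><td>Player X</td><td>12</td></tr>
</table>
"""

# attrs narrows the search to tables whose HTML attributes match.
# read_html always returns a list of DataFrames, hence the [0].
standings = pd.read_html(StringIO(html), attrs={"id": "standings"})[0]
print(standings)
```

On a real page, the id can usually be found by inspecting the table element in the browser's developer tools.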

In [3]:
df.head()

Unnamed: 0,Rk,Squad,MP,W,D,L,GF,GA,GD,Pts,xG,xGA,xGD,xGD/90,Attendance,Top Team Scorer,Goalkeeper,Notes
0,1,eng Manchester City,13.0,8.0,5.0,0.0,32.0,5.0,27.0,29.0,26.5,10.5,16.0,1.23,63639.0,Erling Haaland - 12,Ederson,
1,,,,,,,,,,,,,,,,,,
2,2,it Inter,13.0,7.0,3.0,3.0,19.0,11.0,8.0,24.0,17.9,15.1,2.8,0.22,71415.0,Edin Džeko - 4,André Onana,
3,,,,,,,,,,,,,,,,,,
4,SF,es Real Madrid,12.0,8.0,2.0,2.0,26.0,13.0,13.0,26.0,22.3,15.9,6.4,0.53,58761.0,Vinicius Júnior - 7,Thibaut Courtois,


In [4]:
# FBref also inserts empty separator rows, so we'll drop every row where 'Rk' is missing
df = df.dropna(subset=['Rk'])
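The same clean-up can be tried on a small hand-built frame. This is only a sketch: the rows below are made up to mimic the scraped layout, and the `reset_index` and `astype(int)` steps are optional extras that the notebook itself does not use.

```python
import numpy as np
import pandas as pd

# Made-up rows that mimic the scraped layout: real rows alternate with blank separators.
raw = pd.DataFrame({
    "Rk": ["1", np.nan, "2", np.nan, "SF"],
    "Squad": ["Team A", np.nan, "Team B", np.nan, "Team C"],
    "MP": [13.0, np.nan, 13.0, np.nan, 12.0],
})

# Keep only the rows that have a rank, then renumber the index from 0.
clean = raw.dropna(subset=["Rk"]).reset_index(drop=True)

# The NaN rows forced MP to float; with them gone it can be cast back to int.
clean["MP"] = clean["MP"].astype(int)
print(clean)
```

Without `reset_index(drop=True)` the original row labels are kept, which is why the index in the notebook output has gaps after the separator rows are dropped.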

In [5]:
df.head()

Unnamed: 0,Rk,Squad,MP,W,D,L,GF,GA,GD,Pts,xG,xGA,xGD,xGD/90,Attendance,Top Team Scorer,Goalkeeper,Notes
0,1,eng Manchester City,13.0,8.0,5.0,0.0,32.0,5.0,27.0,29.0,26.5,10.5,16.0,1.23,63639.0,Erling Haaland - 12,Ederson,
2,2,it Inter,13.0,7.0,3.0,3.0,19.0,11.0,8.0,24.0,17.9,15.1,2.8,0.22,71415.0,Edin Džeko - 4,André Onana,
4,SF,es Real Madrid,12.0,8.0,2.0,2.0,26.0,13.0,13.0,26.0,22.3,15.9,6.4,0.53,58761.0,Vinicius Júnior - 7,Thibaut Courtois,
5,SF,it Milan,12.0,5.0,3.0,4.0,15.0,11.0,4.0,18.0,16.5,14.6,1.9,0.16,72546.0,Olivier Giroud - 5,Mike Maignan,
7,QF,de Bayern Munich,10.0,8.0,1.0,1.0,22.0,6.0,16.0,25.0,19.4,10.9,8.4,0.84,75000.0,"Leroy Sané, Eric Maxim Choupo-Moting - 4",Yann Sommer,


In [6]:
# Let's scrape one more table just for practice
# We'll scrape the Liverpool 2022/2023 stats from the Premier League
df = pd.read_html('https://fbref.com/en/squads/822bd0ba/2022-2023/Liverpool-Stats', attrs={'id': 'stats_standard_9'})[0]

In [7]:
df.head()

Unnamed: 0_level_0,Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Playing Time,Playing Time,Playing Time,Performance,Performance,...,Per 90 Minutes,Per 90 Minutes,Per 90 Minutes,Per 90 Minutes,Per 90 Minutes,Per 90 Minutes,Per 90 Minutes,Per 90 Minutes,Per 90 Minutes,Unnamed: 33_level_0
Unnamed: 0_level_1,Player,Nation,Pos,Age,MP,Starts,Min,90s,Gls,Ast,...,Ast,G+A,G-PK,G+A-PK,xG,xAG,xG+xAG,npxG,npxG+xAG,Matches
0,Alisson,br BRA,GK,29.0,37,37,3330.0,37.0,0.0,1.0,...,0.03,0.03,0.0,0.03,0.0,0.01,0.01,0.0,0.01,Matches
1,Mohamed Salah,eg EGY,FW,30.0,38,37,3290.0,36.6,19.0,12.0,...,0.33,0.85,0.47,0.79,0.59,0.21,0.8,0.51,0.72,Matches
2,Trent Alexander-Arnold,eng ENG,DF,23.0,37,34,2923.0,32.5,2.0,9.0,...,0.28,0.34,0.06,0.34,0.07,0.35,0.43,0.07,0.43,Matches
3,Virgil van Dijk,nl NED,DF,31.0,32,32,2835.0,31.5,3.0,1.0,...,0.03,0.13,0.1,0.13,0.08,0.04,0.12,0.08,0.12,Matches
4,Fabinho,br BRA,MF,28.0,36,31,2671.0,29.7,0.0,2.0,...,0.07,0.07,0.0,0.07,0.02,0.07,0.1,0.02,0.1,Matches


In [8]:
# You'll also notice that the columns are multi-indexed, so we'll flatten the columns
df.columns = ['_'.join(col).strip() for col in df.columns.values]
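To make the flattening step easier to follow, here is a small sketch on a hand-built two-level header. The column names and the single data row are made up, shaped like what `read_html` returned for the Liverpool table.

```python
import pandas as pd

# Hypothetical two-level header: (top level, bottom level) pairs.
columns = pd.MultiIndex.from_tuples([
    ("Unnamed: 0_level_0", "Player"),
    ("Playing Time", "MP"),
    ("Performance", "Gls"),
])
demo = pd.DataFrame([["Player X", 37, 0]], columns=columns)

# Each column label of a MultiIndex is a tuple of its levels.
print(demo.columns.tolist())

# Joining each tuple with "_" collapses the two levels into one string.
demo.columns = ["_".join(col).strip() for col in demo.columns.values]
print(demo.columns.tolist())
```

The `Unnamed: N_level_0` labels are placeholders pandas generates when the top header cell is empty, which is why they show up as prefixes after the join.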

In [9]:
df.head()

Unnamed: 0,Unnamed: 0_level_0_Player,Unnamed: 1_level_0_Nation,Unnamed: 2_level_0_Pos,Unnamed: 3_level_0_Age,Unnamed: 4_level_0_MP,Playing Time_Starts,Playing Time_Min,Playing Time_90s,Performance_Gls,Performance_Ast,...,Per 90 Minutes_Ast,Per 90 Minutes_G+A,Per 90 Minutes_G-PK,Per 90 Minutes_G+A-PK,Per 90 Minutes_xG,Per 90 Minutes_xAG,Per 90 Minutes_xG+xAG,Per 90 Minutes_npxG,Per 90 Minutes_npxG+xAG,Unnamed: 33_level_0_Matches
0,Alisson,br BRA,GK,29.0,37,37,3330.0,37.0,0.0,1.0,...,0.03,0.03,0.0,0.03,0.0,0.01,0.01,0.0,0.01,Matches
1,Mohamed Salah,eg EGY,FW,30.0,38,37,3290.0,36.6,19.0,12.0,...,0.33,0.85,0.47,0.79,0.59,0.21,0.8,0.51,0.72,Matches
2,Trent Alexander-Arnold,eng ENG,DF,23.0,37,34,2923.0,32.5,2.0,9.0,...,0.28,0.34,0.06,0.34,0.07,0.35,0.43,0.07,0.43,Matches
3,Virgil van Dijk,nl NED,DF,31.0,32,32,2835.0,31.5,3.0,1.0,...,0.03,0.13,0.1,0.13,0.08,0.04,0.12,0.08,0.12,Matches
4,Fabinho,br BRA,MF,28.0,36,31,2671.0,29.7,0.0,2.0,...,0.07,0.07,0.0,0.07,0.02,0.07,0.1,0.02,0.1,Matches


In [10]:
# You can strip the 'Unnamed: ..._' placeholder prefixes if you want; this removes the one in front of Player
df.columns = df.columns.str.replace('Unnamed: 0_level_0_', '')
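The replacement above only touches the `Player` column, because each placeholder prefix contains a different column number. One possible way to strip all of them at once is a regular expression. This is a sketch on a hand-written list of column names copied from the flattened output, not part of the original notebook.

```python
import pandas as pd

# Column names as they look after the '_'.join(...) flattening step.
cols = pd.Index([
    "Unnamed: 0_level_0_Player",
    "Unnamed: 1_level_0_Nation",
    "Playing Time_Starts",
    "Unnamed: 33_level_0_Matches",
])

# \d+ matches the column number, so every placeholder prefix is removed,
# while real group names such as "Playing Time_" are left alone.
cleaned = cols.str.replace(r"^Unnamed: \d+_level_0_", "", regex=True)
print(list(cleaned))
```

Applied to the real frame, the same call would be `df.columns = df.columns.str.replace(r"^Unnamed: \d+_level_0_", "", regex=True)`.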

In [11]:
df.head()

Unnamed: 0,Player,Unnamed: 1_level_0_Nation,Unnamed: 2_level_0_Pos,Unnamed: 3_level_0_Age,Unnamed: 4_level_0_MP,Playing Time_Starts,Playing Time_Min,Playing Time_90s,Performance_Gls,Performance_Ast,...,Per 90 Minutes_Ast,Per 90 Minutes_G+A,Per 90 Minutes_G-PK,Per 90 Minutes_G+A-PK,Per 90 Minutes_xG,Per 90 Minutes_xAG,Per 90 Minutes_xG+xAG,Per 90 Minutes_npxG,Per 90 Minutes_npxG+xAG,Unnamed: 33_level_0_Matches
0,Alisson,br BRA,GK,29.0,37,37,3330.0,37.0,0.0,1.0,...,0.03,0.03,0.0,0.03,0.0,0.01,0.01,0.0,0.01,Matches
1,Mohamed Salah,eg EGY,FW,30.0,38,37,3290.0,36.6,19.0,12.0,...,0.33,0.85,0.47,0.79,0.59,0.21,0.8,0.51,0.72,Matches
2,Trent Alexander-Arnold,eng ENG,DF,23.0,37,34,2923.0,32.5,2.0,9.0,...,0.28,0.34,0.06,0.34,0.07,0.35,0.43,0.07,0.43,Matches
3,Virgil van Dijk,nl NED,DF,31.0,32,32,2835.0,31.5,3.0,1.0,...,0.03,0.13,0.1,0.13,0.08,0.04,0.12,0.08,0.12,Matches
4,Fabinho,br BRA,MF,28.0,36,31,2671.0,29.7,0.0,2.0,...,0.07,0.07,0.0,0.07,0.02,0.07,0.1,0.02,0.1,Matches
