Big 5 league player stats for fbref#69

Closed
andrewRowlinson wants to merge 3 commits into probberechts:master from andrewRowlinson:master

Conversation

@andrewRowlinson
Contributor

Added read_big5_season_stats to fbref.py for efficiently reading the data from the big five leagues (England, Italy, France, Germany, Spain).

@andrewRowlinson
Contributor Author

FBref also has combined pages for the big five leagues, which let you fetch player data more efficiently when you want multiple leagues. I added a method here to get this data, but it doesn't fit neatly into the existing class because it ignores the leagues attribute. I have tried to keep the interface and results similar to the other methods.

@probberechts
Owner

Thanks for your PR! The fbref.read_team_season_stats method is indeed inefficient, as it visits the page of each individual team in a league. I've noticed that FBref now has a single page for each league/season where these stats can be obtained (e.g., https://fbref.com/en/comps/9/stats/Premier-League-Stats). Using that page to obtain the data would already reduce the number of requests by a factor of 15-20, and it works for every league.

Additionally, you could then use the page for the top-5 leagues if the user requested data from more than one of the top-5 leagues, but the benefit would be more limited. This should not be a separate (public) function, though. It should be integrated into the fbref.read_team_season_stats function, and the selection of the best source page should happen transparently for the user.
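The source selection described above could be sketched roughly as follows. This is only an illustration of the idea, not code from the library: `plan_sources`, the `"big5"`/`"league"` labels, the league identifiers, and the "two or more Big 5 leagues" threshold are all assumptions made for the example.

```python
# Hypothetical league identifiers; the real class may name leagues differently.
BIG5_LEAGUES = {
    "ENG-Premier League",
    "ESP-La Liga",
    "FRA-Ligue 1",
    "GER-Bundesliga",
    "ITA-Serie A",
}


def plan_sources(requested_leagues):
    """Decide which source pages to fetch for the requested leagues.

    Returns a list of (page_kind, leagues) tuples, where page_kind is
    "big5" for the combined Big 5 page or "league" for a single
    league/season page. Both labels are made up for this sketch.
    """
    # Drop duplicates while keeping the caller's order.
    requested = list(dict.fromkeys(requested_leagues))
    in_big5 = [lg for lg in requested if lg in BIG5_LEAGUES]

    if len(in_big5) >= 2:
        # One request to the combined page covers all requested Big 5 leagues.
        combined = [("big5", tuple(in_big5))]
        single = [lg for lg in requested if lg not in BIG5_LEAGUES]
    else:
        # With zero or one Big 5 league, the combined page saves nothing.
        combined = []
        single = requested

    return combined + [("league", (lg,)) for lg in single]
```

A public method would call such a helper internally, so the user only ever passes the leagues they want and never chooses a page.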

@andrewRowlinson
Contributor Author

> I've noticed that FBref now has a single page for each league/season where these stats can be obtained (e.g., https://fbref.com/en/comps/9/stats/Premier-League-Stats). Using that page to obtain the data would already reduce the number of requests by a factor of 15-20, and it works for every league.

Unfortunately, the player stats on these new pages (e.g. https://fbref.com/en/comps/9/stats/Premier-League-Stats) wouldn't be loaded by the existing functions, as they currently load only the top table containing squad/opponent stats and not the player statistics underneath. I think you can get around this by using Selenium to load the whole page, unless I am missing a simpler way?

> Additionally, you could then use the page for the top-5 leagues if the user requested data from more than one of the top-5 leagues, but the benefit would be more limited. This should not be a separate (public) function, though. It should be integrated into the fbref.read_team_season_stats function, and the selection of the best source page should happen transparently for the user.

I have amended fbref.read_team_season_stats to use the Big-5 league data. It is significantly faster. The disadvantage is that you lose the aggregated team and opponent statistics, and it also misses players who have not played any minutes.

@probberechts
Owner

> Unfortunately, the player stats on these new pages (e.g. https://fbref.com/en/comps/9/stats/Premier-League-Stats) wouldn't be loaded by the existing functions, as they currently load only the top table containing squad/opponent stats and not the player statistics underneath. I think you can get around this by using Selenium to load the whole page, unless I am missing a simpler way?

I've only looked at this quickly in the browser, but it seems that the tables are actually there with all the data. They are just commented out in the HTML, and some JavaScript then makes them visible. I think a simple html.replace('<!--', '') should do the trick.
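As a rough illustration of that idea, the sketch below strips the comment markers from a made-up HTML snippet and checks which tables a parser can see before and after. The sample markup, the table ids, and the helper names (`uncomment`, `table_ids`) are invented for the example and are not taken from FBref or the library. It also removes the closing `-->`, which goes slightly beyond the one-liner above, so no stray marker text is left in the page.

```python
from html.parser import HTMLParser


class TableIdCollector(HTMLParser):
    """Collect the id attribute of every <table> the parser encounters."""

    def __init__(self):
        super().__init__()
        self.table_ids = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_ids.append(dict(attrs).get("id"))


def table_ids(html):
    """Return the ids of all tables visible to an HTML parser."""
    parser = TableIdCollector()
    parser.feed(html)
    parser.close()
    return parser.table_ids


def uncomment(html):
    """Remove HTML comment markers so commented-out markup becomes visible."""
    return html.replace("<!--", "").replace("-->", "")


# Made-up page: the second table is wrapped in an HTML comment.
SAMPLE = """
<div><table id="squads"><tr><td>Team A</td></tr></table></div>
<div><!--
<table id="players"><tr><td>Player A</td></tr></table>
--></div>
"""

print(table_ids(SAMPLE))             # only the table outside the comment
print(table_ids(uncomment(SAMPLE)))  # both tables
```

Note that a blanket replace also exposes anything else the page has commented out, so a real implementation might prefer to uncomment only the containers it needs.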

probberechts added a commit that referenced this pull request Sep 27, 2022
FBRef now has a single page for each league/season where player stats can be
obtained for each player in the league (e.g.,
https://fbref.com/en/comps/9/stats/Premier-League-Stats). Therefore, it is no
longer required to visit the page of each individual team in a league. The
fbref.read_team_season_stats method now uses 15-20x fewer requests, leading to a
large speed-up.

See also #69

Co-authored-by: Andrew Rowlinson <rowlinsonandy@gmail.com>