
Faster scraping of player season stats from FBref #85

Merged: 16 commits, Oct 23, 2022


andrewRowlinson
Contributor

I am having another go at this (previous attempt: #69) now that you have updated the FBref class to use the league pages. I have tested it against every stat_type for the 2020-2021 season, and it seems to work.

  • Amended the FBref scraper so that it uses the combined Big 5 pages when all five leagues are requested.
  • Added type checks for the stat_type argument.
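The two changes above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the league identifiers in `BIG_FIVE`, the stat type names in `PLAYER_STAT_TYPES` and both helper functions are hypothetical stand-ins, and the sketch assumes the Big 5 page is used only when exactly those five leagues are requested.

```python
# Hypothetical sketch: league IDs, stat type names and helper names are
# illustrative assumptions, not copied from the soccerdata source.
from typing import Iterable

BIG_FIVE = frozenset({
    "ENG-Premier League",
    "ESP-La Liga",
    "FRA-Ligue 1",
    "GER-Bundesliga",
    "ITA-Serie A",
})

PLAYER_STAT_TYPES = frozenset({
    "standard", "keeper", "keeper_adv", "shooting", "passing",
    "passing_types", "goal_shot_creation", "defense", "possession",
    "playing_time", "misc",
})


def validate_stat_type(stat_type: str) -> str:
    """Reject non-string or unknown stat types before any request is made."""
    if not isinstance(stat_type, str):
        raise TypeError(f"stat_type must be a str, got {type(stat_type).__name__}")
    if stat_type not in PLAYER_STAT_TYPES:
        raise ValueError(
            f"Invalid stat_type {stat_type!r}; choose from {sorted(PLAYER_STAT_TYPES)}"
        )
    return stat_type


def use_big_five_pages(leagues: Iterable[str]) -> bool:
    """True only when exactly the five leagues are requested, in any order."""
    return frozenset(leagues) == BIG_FIVE
```

Comparing sets makes the check independent of the order in which leagues are passed. In this sketch a request for more than the five leagues returns False, so it would fall back to per-league pages; supporting that case with the combined page would mean filtering the combined table afterwards.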

I can't run the tests locally, but I'll try to fix anything that fails afterwards.

@andrewRowlinson
Contributor Author

read_player_season_stats is failing the checks because its logic is flagged as too complex. I added conditional statements that change the URL and the read_html call when using the Big 5 leagues data. I'm not sure how to resolve this, since breaking up the method would likely duplicate code.
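One common way to satisfy a complexity check without duplicating code is to move the branching into a single helper that returns a uniform description of the pages to scrape, so the main method only loops over the result. The sketch below assumes that approach; `PageSpec`, `season_stats_pages`, the URL layout and the table ids are all hypothetical and do not reflect FBref's actual URL scheme or the soccerdata code.

```python
# Hypothetical sketch: names, URL layout and table ids are illustrative
# assumptions, not FBref's documented scheme or the soccerdata code.
from dataclasses import dataclass
from typing import Dict, List

BASE_URL = "https://fbref.com/en/comps"  # assumed URL root


@dataclass(frozen=True)
class PageSpec:
    """Everything the caller needs to fetch and parse one stats page."""
    url: str
    table_id: str  # hypothetical id of the <table> to parse on that page


def season_stats_pages(
    comp_ids: Dict[str, str], season: str, page_slug: str, big_five: bool
) -> List[PageSpec]:
    """Return one combined page for the Big 5, else one page per league.

    Keeping this decision in a single helper lets the main method loop
    over the result without its own if/else branches.
    """
    table_id = f"stats_{page_slug}"
    if big_five:
        return [PageSpec(f"{BASE_URL}/Big5/{season}/{page_slug}/players/", table_id)]
    return [
        PageSpec(f"{BASE_URL}/{comp_id}/{season}/{page_slug}/", table_id)
        for comp_id in comp_ids.values()
    ]


# Example: a single league with an assumed competition id of "9".
pages = season_stats_pages({"ENG-Premier League": "9"}, "2020-2021", "shooting", big_five=False)
# pages[0].url == "https://fbref.com/en/comps/9/2020-2021/shooting/"
```

With this split, the main method would fetch and parse every returned page the same way. The Big 5 versus per-league decision lives in one small, separately testable function, which is what lowers the branch count of the method the checks complain about.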

@probberechts
Owner

One remaining issue is that the table headers are not always consistent between the Big 5 leagues page and the individual league pages, which is annoying if you want to merge tables. For example, GA and PKA have no category in the 2020/21 Ligue 1 table, but they are grouped under Goals in the Big 5 leagues table.
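One way to make such tables mergeable is to fill in a missing header category from the Big 5 version of the same table. The sketch below is an illustration under stated assumptions, not part of this PR: it treats column keys as (category, stat) tuples, and it treats an empty category, or one starting with "Unnamed", as missing, since that is what pandas.read_html typically produces for blank header cells. The function name `align_headers` is hypothetical.

```python
# Hypothetical sketch: align_headers is an illustrative helper, not part of
# soccerdata. Columns are (category, stat) tuples, e.g. ("Goals", "GA").
from typing import Dict, List, Tuple

Column = Tuple[str, str]


def align_headers(columns: List[Column], reference: List[Column]) -> List[Column]:
    """Borrow a missing category from the reference table's header.

    A column is relabelled only when its stat name appears exactly once in
    the reference, so a stat name repeated under several categories
    (e.g. a per-90 variant) is left untouched.
    """
    counts: Dict[str, int] = {}
    category_of: Dict[str, str] = {}
    for category, stat in reference:
        counts[stat] = counts.get(stat, 0) + 1
        category_of[stat] = category

    aligned = []
    for category, stat in columns:
        # Treat blank or pandas-generated "Unnamed: ..." categories as missing.
        blank = category == "" or category.startswith("Unnamed")
        if blank and counts.get(stat) == 1:
            category = category_of[stat]
        aligned.append((category, stat))
    return aligned
```

With pandas, the result could be applied as `df.columns = pd.MultiIndex.from_tuples(align_headers(list(df.columns), list(big5_df.columns)))`, where `df` and `big5_df` are hypothetical names for the single-league and Big 5 tables, both with two-level column headers.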

@andrewRowlinson
Contributor Author

Thanks for working on this. I really want to use it for my project, but scraping each league individually currently takes too long.

probberechts merged commit 2ce4956 into probberechts:master on Oct 23, 2022