Skip to content

[FBref] Optimize reading team match history by creating and reading from csv cache#533

Closed
Kalaweksh wants to merge 3 commits intoprobberechts:masterfrom
Kalaweksh:master
Closed

[FBref] Optimize reading team match history by creating and reading from csv cache#533
Kalaweksh wants to merge 3 commits intoprobberechts:masterfrom
Kalaweksh:master

Conversation

@Kalaweksh
Copy link
Copy Markdown
Contributor

Resolves

Changes

  • Modified read_team_match_stats to cache match history tables as csv files for faster reading.

  • Added force_cache parameter to read_team_match_stats for consistency with other methods.

Issues

  • Currently does not detect if league seasons in the cache are not complete e.g la liga season 20-21 only cached up to matchweek 24. This can result in the silent returning of tables with missing data to the user in certain edge cases.

-Modified read_team_match_stats to cache match history tables as csv files for faster reading.

-Added force_cache parameter to read_team_match_stats for consistency with other methods.
-Moved read csv cache code into separate method to declutter read_team_match_stats.

-All seasons that aren't the current season are dropped after reading from cache; we assume leagues and seasons in the cached df are complete.
@probberechts
Copy link
Copy Markdown
Owner

Thank you for taking the time to submit this pull request. I really appreciate your effort for improving the project. Yet, I do not believe that this is a good solution. It is too ad hoc (I prefer a general solution for all FBref endpoints), will make things too complicated and does not really fit in the scope of soccerdata.

I see that storing the full HTML page is inefficient, but I am not a huge fan of caching the preprocessed data. Soccerdata is meant to be able to download and parse the data. It is (at least for now) not intended as a database system. How to store the data and make it quickly accessible is out of scope.

Rather, I see value in compacting the HTML page before caching it. For example by only keeping the tables.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants