
v3.0.0

@oseymour oseymour released this 17 Jun 22:58
· 17 commits to main since this release

Why the change?

This is a big update and it's not backwards compatible; some of you will have to rewrite small parts of your own code. I know this can be frustrating, so I want to explain why I'm making these changes. If you're not interested in the "why", feel free to skip to the Changelog below and see what the changes are!

A lot of the changes are non-codebase changes: things I should have done from Day 1, like unit tests, CI pipelines for testing, docs, and builds. Most of you won't care about or see these unless you a) look for them or b) contribute code in the future.

The codebase changes fall into a few categories:

  1. Making it easier for me to maintain the code moving forward. The code got pretty messy and hard for me to take care of.
  2. Making the code run faster and more reliably.
  3. Making it easier for community members (you!) to contribute new code.

Changelog

Now the part you've all been waiting for.

Shared functions

  • Moved the ScraperFC exceptions into their own file.
  • Got rid of the overly complicated function for checking years and leagues, get_source_comp_info(). This function dates from very early on in ScraperFC. It was poorly architected and was too much of a pain in the a$$ to fix before this. Now each module has a comps dict in its .py file, and any checks that the year and league inputs are valid are done in the module functions.
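
To give a rough idea of the per-module pattern described above, here is a minimal sketch. Everything in it is hypothetical: the dict contents, the example.com URLs, the exception class, and the function name are stand-ins for illustration, not the actual ScraperFC code.

```python
# Hypothetical sketch of a module-level comps dict plus in-function checks.
# None of these names or URLs come from ScraperFC itself.


class InvalidLeagueError(Exception):
    """Stand-in exception for an unsupported league name."""


# Each module keeps its own dict of supported competitions.
comps = {
    "EPL": {"url": "https://example.com/epl"},
    "La Liga": {"url": "https://example.com/la-liga"},
}


def build_season_url(year: str, league: str) -> str:
    """Validate the inputs inside the module function, then build a URL."""
    if not isinstance(year, str):
        raise TypeError("year must be a str")
    if league not in comps:
        raise InvalidLeagueError(
            f"{league!r} is not supported; choose one of {sorted(comps)}"
        )
    return f"{comps[league]['url']}/{year}"
```

The point of the design is that each module owns its list of competitions, so adding a league means touching one dict rather than a shared helper.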

FBref

  • Updated the capitalization, I finally realized the "r" is lowercase 🤦‍♂️.
  • FBref.close() has been removed. Only 1 function used the Selenium driver and that function has been updated to open, use, and then close the driver without the user needing to call close().
  • Added FBref.get_valid_seasons(). This returns the valid seasons for a given competition, scraped directly from the competition's history page on FBref.
  • The year argument is no longer an int. This is a byproduct of adding get_valid_seasons(). The year is now a str and needs to match the year as it appears on the competition's history page on FBref. This will require a lot of user code changes but makes it far easier to assert the year is valid. See the year parameter page on ReadTheDocs for more details.
  • FBref.scrape_league_table() now returns all tables from the season's league table page. The first table should be the league table and then any tables after that vary by competition.
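
If you're migrating code that passed int years, something like the sketch below might help. It rests on two assumptions that you should verify against the ReadTheDocs year parameter page: that your old int years meant the calendar year in which a season ended, and that the season strings for your competition look like "2022-2023". The helper names and the hard-coded season list are hypothetical; in real code the list would come from get_valid_seasons().

```python
# Hypothetical migration helpers; the formats below are assumptions, not
# something the library guarantees for every competition.


def old_int_year_to_season(year: int) -> str:
    """Convert e.g. 2023 to '2022-2023' (assumed cross-year season format)."""
    return f"{year - 1}-{year}"


def check_season(year: str, valid_seasons) -> str:
    """Return year unchanged if it is a str found in valid_seasons."""
    if not isinstance(year, str):
        raise TypeError("year must be a str, e.g. '2022-2023'")
    if year not in valid_seasons:
        raise ValueError(f"{year!r} is not one of the valid seasons")
    return year


# Stand-in for the seasons a real call to get_valid_seasons() would return.
example_valid_seasons = ["2021-2022", "2022-2023", "2023-2024"]

season = check_season(old_int_year_to_season(2023), example_valid_seasons)
```

Competitions that run within a single calendar year would need a different conversion, so treat the first helper as a starting point only.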

Understat

  • No longer need to call Understat.close(). The Understat module doesn't even need Selenium anymore! Understat embeds a lot of the raw data as JSON in JS scripts right in the HTML.
  • As a result of getting the data in a different format, a lot of the functions have changed functionality or been deprecated in favor of new functions. Please read the ReadTheDocs page for this module.
  • Added Understat.get_valid_seasons().
  • The year argument is a string now. Write the year as it appears in the season dropdown on the Understat website. See the year parameter page on ReadTheDocs for more details.
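
For the curious, pulling JSON out of a script tag can look roughly like the sketch below. It assumes the page contains something of the form var name = JSON.parse('...') with a hex-escaped payload; the sample HTML, the variable name, and the function are made up for illustration and this is not the module's actual code.

```python
import json
import re

# Stand-in HTML. The payload is hex-escaped JSON for [{"id":"1"}].
SAMPLE_HTML = r"""
<script>
var datesData = JSON.parse('\x5B\x7B\x22id\x22\x3A\x221\x22\x7D\x5D');
</script>
"""


def extract_embedded_json(html: str, var_name: str):
    """Find `var <var_name> = JSON.parse('...')` in html and decode the payload."""
    pattern = re.compile(
        r"var\s+" + re.escape(var_name) + r"\s*=\s*JSON\.parse\('(.*?)'\)",
        re.DOTALL,
    )
    match = pattern.search(html)
    if match is None:
        raise ValueError(f"no embedded JSON found for {var_name!r}")
    # Turn escapes such as \x5B back into the characters they stand for.
    # This simple decode is only safe for ASCII payloads like the sample.
    decoded = match.group(1).encode("utf-8").decode("unicode_escape")
    return json.loads(decoded)


data = extract_embedded_json(SAMPLE_HTML, "datesData")
```

Because the data is already in the HTML, a plain HTTP request is enough and no browser driver is required.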

Sofascore

  • I switched from requests to the Botasaurus library. Requests was no longer returning accurate data but using Botasaurus fixes this.
  • I renamed a lot of the functions to more closely match the naming convention of the rest of the modules.
  • Just about the only complaint I ever heard about this module was that it wasn't automated enough; a lot of the functions required a match link as input but there was no way to get all of the match URLs for a given season. So....
    • I've added a function to return basic info for all of the matches, Sofascore.get_match_dicts().
    • You can use the match IDs in the output of this function as input to a lot of the other functions because they now take match URLs or match IDs as inputs. Match URLs must be strings, match IDs must be ints.
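
Here's a minimal sketch of how a function could accept either input type. It assumes the match ID is the run of digits at the very end of the match URL; the helper name and the example URL are hypothetical and not taken from the Sofascore module.

```python
import re
from typing import Union


def to_match_id(match: Union[str, int]) -> int:
    """Accept a match URL (str) or a match ID (int) and return the ID."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(match, bool):
        raise TypeError("match must be a str URL or an int ID")
    if isinstance(match, int):
        return match
    if isinstance(match, str):
        # Assumption: the ID is the trailing digits of the URL.
        found = re.search(r"(\d+)$", match)
        if found is None:
            raise ValueError(f"could not find a match ID in {match!r}")
        return int(found.group(1))
    raise TypeError("match must be a str URL or an int ID")


# Made-up URL, used only to show the shape of the input.
example_url = "https://www.sofascore.com/team-a-team-b/AbCd#id:12345"
match_id = to_match_id(example_url)
```

Note that an ID passed as a string of digits, like "12345", would be treated as a URL here, which is why the types matter.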

Transfermarkt

  • Removed Transfermarkt.close(). The Transfermarkt module now uses cloudscraper instead of a Selenium driver.
  • Added Transfermarkt.get_valid_seasons().
  • The year argument is a string now. Enter the string as it appears in the competition's season dropdown on the Transfermarkt website. See the year parameter page on ReadTheDocs for more details.

Capology

  • No longer need to call Capology.close(). Driver will be closed on its own when scraping is done.
  • Added Capology.get_valid_seasons().
  • The year argument is a string now. Write the year as it appears in the season dropdown on the Capology website. See the year parameter page on ReadTheDocs for more details.
  • Removed Capology.scrape_payrolls(). It ended up doing the same thing as Capology.scrape_salaries().

ClubELO

  • Minor changes to how invalid team names are detected. Shouldn't impact anything.

FiveThirtyEight

  • No longer need to call FiveThirtyEight.close(). Driver will be closed on its own when scraping is done.
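
The "open, use, then close" pattern that replaces the close() methods can be sketched like this. FakeDriver is a stand-in so the example runs without a browser; it only imitates the get(), page_source, and quit() parts of a Selenium driver, and the function name is hypothetical rather than anything in ScraperFC.

```python
class FakeDriver:
    """Stand-in for a browser driver so the sketch runs without Selenium."""

    def __init__(self):
        self.page_source = ""
        self.closed = False

    def get(self, url):
        # A real driver would load the page; this just fakes some HTML.
        self.page_source = f"<html><body>{url}</body></html>"

    def quit(self):
        self.closed = True


def scrape_page(url, driver_factory=FakeDriver):
    """Open a driver, use it, and always close it, even if scraping fails."""
    driver = driver_factory()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Runs on success and on error, so the caller never calls close().
        driver.quit()
```

Putting the cleanup in a finally block means a failed scrape can't leave a browser process hanging around.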

"Behind the Scenes"

  • Unit tests
    • Uses pytest and pytest-cov
    • These are in the test folder at the root of the GitHub repository.
    • There's a test file for each ScraperFC module.
  • Python packaging tooling changes
    • tox: I've created tox environments for running the unit tests, building the docs, and building the package.
    • GitHub Actions:
      • Every push now automatically runs the test suite and does a test build of the docs.
      • Tagged commits will trigger a workflow to build from that commit and upload to PyPI.
  • I've updated the layout of the documentation on Read the Docs.
  • I've updated the examples in Examples.ipynb in the GitHub repo to reflect all of the changes introduced in ScraperFC 3.0.