owena-b/aueagles-rosters

American University Athletics rosters

Python programs that scrape the roster tables for American University's NCAA Division I teams from the AU Athletics website. The goal is to streamline data collection for sports journalists.

The programs format the data into one CSV per team, in which each line is one player. The code also fixes abbreviations and height formatting, and it separates the hometown, state/country, and high school into their own columns.
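As a rough illustration of that cleanup, here is a minimal sketch. It is not the repository's code: the function names are hypothetical, and it assumes heights arrive as "6-2" and that hometown, state/country, and high school arrive in one cell shaped like "City, State / High School". The real source tables may differ.

```python
import re


def format_height(raw):
    """Turn a height like '6-2' into 6'2" (assumed input format).

    Values that do not match the assumed pattern are returned stripped but unchanged.
    """
    match = re.fullmatch(r"\s*(\d+)-(\d+)\s*", raw)
    if not match:
        return raw.strip()
    feet, inches = match.groups()
    return f"{feet}'{inches}\""


def split_hometown(raw):
    """Split 'City, State / High School' (assumed layout) into three columns."""
    place, _, school = raw.partition("/")
    city, _, region = place.partition(",")
    return {
        "hometown": city.strip(),
        "state_country": region.strip(),
        "high_school": school.strip(),
    }
```

With these assumptions, `split_hometown("Bethesda, Md. / Walt Whitman")` yields separate values for the hometown, state/country, and high school columns.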

Teams (and corresponding CSV files):
Men's Basketball (CSV)
Women's Basketball (CSV)
Men's Soccer (CSV)
Women's Soccer (CSV)
Men's Cross Country (CSV)
Women's Cross Country (CSV)
Men's Track and Field (CSV)
Women's Track and Field (CSV)
Men's Swimming and Diving (CSV)
Women's Swimming and Diving (CSV)
Field Hockey (CSV)
Lacrosse (CSV)
Volleyball (CSV)
Wrestling (CSV)

How to use these files:
Click the CSV link for the team you want data on. On the right of the screen, below "History," you can download the raw file. It will download a CSV (comma-separated values) file, which you can import into the spreadsheet software of your choosing. In Google Sheets, you can import the CSV into a new or existing spreadsheet. Remember, always double-check your data!
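If you prefer to double-check a downloaded file in Python rather than a spreadsheet, a small sketch like the one below can flag rows with blank cells. The column names and sample rows are made up for illustration; substitute the headers from the CSV you downloaded.

```python
import csv
import io


def find_incomplete_rows(csv_text, required_columns):
    """Return (line_number, missing_columns) for each row with a blank required cell."""
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = []
    # Line 1 of the file is the header, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        missing = [
            col for col in required_columns
            if not (row.get(col) or "").strip()
        ]
        if missing:
            problems.append((line_number, missing))
    return problems


# Made-up sample data standing in for a downloaded roster CSV.
sample = (
    "name,hometown,high_school\n"
    "Sam Doe,Bethesda,Walt Whitman\n"
    "Alex Roe,,Central\n"
)
problems = find_incomplete_rows(sample, ["name", "hometown"])
```

To run it on a real file, read the file's text with `open(path, newline="", encoding="utf-8").read()` and pass that in place of `sample`.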

A little about the methodology (for snooping data editors):
Originally, I wrote several different Python programs, one for each sport. This helped me find the differences in each sport's web table, but I quickly realized it was a lot of repeated code. So, I took an object-oriented approach. I created a single Scraper class that is compatible with every sport and relies on a number of if/else statements to correct for the differences in each table.
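The general shape of that approach might look like the sketch below. This is not the actual Scraper class: the column headers ("Name", "Yr.", "Pos."), the set of sports without a position column, and the choice to pass in already-parsed rows instead of fetching a web page are all assumptions made to keep the example self-contained.

```python
class Scraper:
    """Hypothetical sketch: one class for every sport, with if/else for table differences."""

    # Assumed for illustration: these sports' tables have no position column.
    NO_POSITION = {"Men's Cross Country", "Women's Cross Country"}

    def __init__(self, sport, rows):
        self.sport = sport
        self.rows = rows  # parsed table rows; a stand-in for the scraped web table

    def scrape(self):
        """Return one cleaned dictionary per player."""
        players = []
        for row in self.rows:
            player = {"name": row["Name"].strip(), "year": row["Yr."].strip()}
            # Branch on the sport to handle a column that only some tables have.
            if self.sport in self.NO_POSITION:
                player["position"] = ""
            else:
                player["position"] = row["Pos."].strip()
            players.append(player)
        return players


# Made-up rows: create one object per sport, then call scrape() on each.
basketball = Scraper("Men's Basketball", [{"Name": " Sam Doe ", "Yr.": "Jr.", "Pos.": "G"}])
cross_country = Scraper("Men's Cross Country", [{"Name": "Alex Roe", "Yr.": "Fr."}])
rosters = {s.sport: s.scrape() for s in (basketball, cross_country)}
```

Keeping the per-sport differences inside one class means a change to the shared cleanup logic only has to be made once.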

main.py simply creates a Scraper object for each sport, then calls the scrape() method on each one. oop_scrape.py houses the Scraper class and a set of dictionaries that allow it to work for all the sports. fixes.py contains dictionaries that correct abbreviations and misspellings in the source data.
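A correction dictionary of the kind described for fixes.py could work roughly as follows. The dictionary names and entries here are invented examples, not the contents of the real file.

```python
# Hypothetical correction tables; the real fixes.py entries may differ.
YEAR_FIXES = {"Fr.": "Freshman", "So.": "Sophomore", "Jr.": "Junior", "Sr.": "Senior"}
STATE_FIXES = {"Md.": "Maryland", "Va.": "Virginia"}


def apply_fixes(value, fixes):
    """Look up a cleaned value in a correction table, leaving unknown values as they are."""
    cleaned = value.strip()
    return fixes.get(cleaned, cleaned)
```

Because `dict.get` falls back to the original value, anything without a listed correction passes through untouched.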
