RetrosheetUmpires

Extract umpire information from Retrosheet Event and Box Score Event files

Tested with Python 3.10.8 on Windows.

These files are licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license: https://creativecommons.org/licenses/by-nc/4.0/

References:

https://www.retrosheet.org/game.htm (to download Event Files and Box Score Event Files, ballparks.zip, and teams.zip)

https://www.retrosheet.org/eventfile.htm (explanation of the Event File format)

https://www.retrosheet.org/biofile.htm (to download biofile.zip)

Requirements:

  1. Download ballparks.zip, biofile.zip, and teams.zip from retrosheet.org; unzip all of these into a subfolder named "ids".

  2. Download and unzip one or more Event Files (.evx) into a subfolder named "evx".

    AND/OR

  3. Download and unzip one or more Box Score Event Files (.ebx) into a subfolder named "ebx".

At least one .evx or .ebx file is required.
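The folder layout described above can be checked before running anything. The following is only a sketch and is not taken from umpires.py: the function name `find_game_files` is hypothetical, and it assumes that ".evx" and ".ebx" stand for any suffix beginning with ".ev" or ".eb" (for example .EVA or .EBN).

```python
from pathlib import Path


def find_game_files(root="."):
    """Return (event_files, box_files) found under root/evx and root/ebx.

    Hypothetical helper, not part of umpires.py.
    """
    root = Path(root)

    def collect(folder, prefix):
        # Assumption: ".evx"/".ebx" are placeholders, so match any suffix
        # that starts with ".ev" or ".eb", ignoring case.
        directory = root / folder
        if not directory.is_dir():
            return []
        return sorted(
            p for p in directory.iterdir()
            if p.is_file() and p.suffix.lower().startswith(prefix)
        )

    if not (root / "ids").is_dir():
        raise FileNotFoundError("missing 'ids' subfolder")

    event_files = collect("evx", ".ev")
    box_files = collect("ebx", ".eb")
    if not event_files and not box_files:
        raise FileNotFoundError(
            "need at least one Event File or Box Score Event File"
        )
    return event_files, box_files
```

Calling `find_game_files()` from the folder that contains "ids", "evx", and "ebx" either returns the two file lists or raises an error naming what is missing.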

Note that some games are not included in the Event Files due to lack of information. All games prior to 1950 are included in Box Score Event Files, even if the game is also included in an Event File. When a game appears in both, this script assumes that the Event File is more complete and accurate, and uses it as the primary data source for that game.
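The "Event File first" rule can be illustrated with a small sketch. It is not the code in umpires.py: the function names are hypothetical, the game and umpire IDs in the usage note are invented, and it assumes both file types carry `id,<game id>` and `info,<field>,<value>` records as described on the Event File format page.

```python
import csv

# Umpire-related "info" fields. Assumption: these are the field names used
# in the files being read; adjust the set if the files use others.
UMPIRE_FIELDS = {"umphome", "ump1b", "ump2b", "ump3b", "umplf", "umprf"}


def read_umpires(lines):
    """Map game id -> {umpire field: umpire id} from event-format lines."""
    games = {}
    current = None
    for row in csv.reader(lines):
        if not row:
            continue
        if row[0] == "id" and len(row) >= 2:
            # A new game starts at each "id" record.
            current = games.setdefault(row[1], {})
        elif (row[0] == "info" and len(row) >= 3
              and row[1] in UMPIRE_FIELDS and current is not None):
            current[row[1]] = row[2]
    return games


def merge_sources(box_games, event_games):
    """Combine both sources; an Event File record replaces a box score one."""
    merged = dict(box_games)
    merged.update(event_games)
    return merged
```

For example, with invented IDs, `merge_sources(read_umpires(["id,GAME1", "info,umphome,aaaa001"]), read_umpires(["id,GAME1", "info,umphome,cccc001"]))` keeps only the Event File value for GAME1. Replacing the whole game record, rather than merging field by field, is a design choice made for this sketch.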

Example output files for the regular season from 1900-1979 are included, based on data files downloaded from Retrosheet on December 20, 2023. Retrosheet released these data files on December 6, 2023, as part of its "Fall 2023 Release": https://www.retrosheet.org/fall2023release.html

Note that this release remapped the "CLE" team abbreviation in teams.csv from Cleveland Indians to Cleveland Guardians. I did not update umpires.py to compensate for this, so the Guardians name now appears in the generated .csv files for all years from 1901 to the present. This is similar to how the Boston Red Sox name is used for the early 1900s. I may fix these problems in a future release.

Additional examples for All-Star Games and Postseason games are also included. These include Negro League games.

  • The .csv files were generated by umpires.py.
  • The .xlsx files were created manually in Microsoft Excel, starting from the corresponding .csv files.

Copyright notice for the Retrosheet data used to generate the .csv files:

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.