A {json:api} dataset of lottery and entrant data for the Western States Endurance Run

runspired/wser-open-dataset

wser | open dataset

This project collects publicly available data for the Western States Endurance Run and formats it into {json:api} for ease of consumption.

Goals

The goal of this project is to provide:

  • a raw, minimally processed dataset
  • a normalized relational dataset
  • CDN-based access to both datasets
  • git/npm-based access to both datasets
  • a SQLite seed of the normalized dataset
  • TypeScript types for the data in both datasets
  • request builders and schemas for both datasets for use with warp-drive.io

Raw Dataset

The raw dataset is the result of ingesting various public sources and transforming them into well-structured {json:api}. This dataset stores each source in isolation. The type+id information in the dataset is unique only within a given race year and data source.
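Because type+id is only unique within a given year and source, anything that merges raw files needs a wider key. The following is a minimal sketch of that idea; the `RawResource` shape, the `RawSource` names, and the `globalKey` helper are illustrative assumptions, not the project's published types.

```typescript
// Hypothetical shapes for illustration only; not the project's actual types.
type RawSource = "finisher" | "applicant" | "entrant" | "split" | "waitlist";

interface RawResource {
  type: string;
  id: string;
  attributes: Record<string, unknown>;
}

// Qualifies type+id with the race year and data source so the key
// stays unique when resources from several raw files are combined.
function globalKey(year: number, source: RawSource, resource: RawResource): string {
  return `${year}:${source}:${resource.type}:${resource.id}`;
}

const runner: RawResource = { type: "finisher", id: "1", attributes: {} };

// The same type+id from two different years yields two distinct keys.
console.log(globalKey(2013, "finisher", runner)); // 2013:finisher:finisher:1
console.log(globalKey(2014, "finisher", runner)); // 2014:finisher:finisher:1
```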

The following data sources are currently available:

Important

In the URL and file path schemes below, replace {YYYY} with the desired year, e.g. 2013.

1974 →

  • finishers
    • source: https://www.wser.org/results/{YYYY}-results/
    • output: ./data/raw/{YYYY}/finisher.json

Tip

Some early years had starters but no finishers, and in some years the results include runners who finished slightly after the 30-hour mark, listed without a place. A few finishers in this data also have no listed age.

2013 →

  • applicants
    • source: https://www.wser.org/lottery{YYYY}.html
    • output: ./data/raw/{YYYY}/applicant.json

Tip

Beginning in 2020, the race assigns each applicant an ID. We are not yet sure whether this ID is stable across years.

  • entrants
    • source: https://www.wser.org/{YYYY}-entrants-list/
    • output: ./data/raw/{YYYY}/entrant.json

Tip

The entrants list contains non-lottery entrant data as well as individuals who were selected from the waitlist. It does not fully represent the lottery outcome.

2014 →

  • splits
    • XLSX files from https://www.wser.org/splits/
    • source: https://www.wser.org/wp-content/uploads/stats/wser{YYYY}.xlsx
    • output: ./data/raw/{YYYY}/split.json

2017 →

  • wait-list
    • source: https://www.wser.org/{YYYY}-wait-list/
    • output: ./data/raw/{YYYY}/waitlist.json

Tip

The 2020 waitlist became the 2021 waitlist, but it can still be useful for tracking who withdrew and did not roll over.

2024 →

  • live (lottery outcome)
    • source: https://lottery.wser.org/
    • output: ./data/raw/{YYYY}/live-lottery-results.json

Tip

The live dataset can only be collected during the year of the given lottery. It can be useful for tracking the delta of who withdrew from the entrants list.
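The URL and file path schemes above can be resolved for a given year with a simple template substitution. This is a sketch only: the `SOURCES` table, the `fillYear` and `resolveSource` helpers, and the reading of headings such as "2013 →" as "available from that year onward" are assumptions for illustration, and a resolved URL is not guaranteed to exist for every year.

```typescript
// Templates are copied from the list above. `firstYear` is an assumed
// interpretation of the "YYYY →" headings, not a guarantee of availability.
interface SourceSpec {
  firstYear: number;
  source: string;
  output: string;
}

const SOURCES: Record<string, SourceSpec> = {
  finisher: { firstYear: 1974, source: "https://www.wser.org/results/{YYYY}-results/", output: "./data/raw/{YYYY}/finisher.json" },
  applicant: { firstYear: 2013, source: "https://www.wser.org/lottery{YYYY}.html", output: "./data/raw/{YYYY}/applicant.json" },
  entrant: { firstYear: 2013, source: "https://www.wser.org/{YYYY}-entrants-list/", output: "./data/raw/{YYYY}/entrant.json" },
  split: { firstYear: 2014, source: "https://www.wser.org/wp-content/uploads/stats/wser{YYYY}.xlsx", output: "./data/raw/{YYYY}/split.json" },
  waitlist: { firstYear: 2017, source: "https://www.wser.org/{YYYY}-wait-list/", output: "./data/raw/{YYYY}/waitlist.json" },
  live: { firstYear: 2024, source: "https://lottery.wser.org/", output: "./data/raw/{YYYY}/live-lottery-results.json" },
};

// Replaces every {YYYY} placeholder in a template with the given year.
function fillYear(template: string, year: number): string {
  return template.replace(/\{YYYY\}/g, String(year));
}

// Resolves the source URL and output path for one data source and year.
function resolveSource(name: string, year: number): { source: string; output: string } {
  const spec = SOURCES[name];
  if (!spec) throw new Error(`unknown source: ${name}`);
  if (year < spec.firstYear) {
    throw new Error(`${name} is listed from ${spec.firstYear} onward, got ${year}`);
  }
  return { source: fillYear(spec.source, year), output: fillYear(spec.output, year) };
}

console.log(resolveSource("applicant", 2013));
```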

Contributing

  1. Install bun from https://oven.sh

  2. Install dependencies:

bun install

  3. Run the script that ingests and processes the data as necessary:

bun run ./index.ts

This scrapes publicly available data from https://wser.org and stores it in data/raw/. We keep this directory under git versioning and only scrape data when the cache does not already have an entry for it, so the script will likely do nothing unless you are adding data for a new year or adding ingestion of data from new sources or earlier years. Setting the ENV var FORCE_GENERATE=true causes the files in data/raw to be rebuilt. Note: they are rebuilt from the responses stored in .fetch-cache when possible, see below.
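The rebuild rule described above comes down to a small decision: regenerate a file when it is missing or when FORCE_GENERATE=true is set. A minimal sketch of that decision follows; the `shouldGenerate` function and its parameters are hypothetical names, and the actual script in index.ts may structure this differently.

```typescript
// Hypothetical helper; the real script may organize this check differently.
type Env = Record<string, string | undefined>;

// Returns true when an output file in data/raw should be (re)built:
// either it does not exist yet, or FORCE_GENERATE=true was set.
function shouldGenerate(outputExists: boolean, env: Env): boolean {
  if (env["FORCE_GENERATE"] === "true") return true;
  return !outputExists;
}

console.log(shouldGenerate(true, {})); // false
console.log(shouldGenerate(false, {})); // true
console.log(shouldGenerate(true, { FORCE_GENERATE: "true" })); // true
```

When running under Bun, the environment object passed in would typically be `process.env`.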

Additionally, we cache every successful raw fetch response that we scrape in .fetch-cache. This lets us write tests and work offline, ensures access to the data in the future should the wser site change, and further reduces server load so that we don't accidentally put a site we love under undue strain.

When fetching a page to scrape data from, use the GET method to participate in the .fetch-cache.

To bypass the fetch cache, set the ENV var FORCE_FETCH=true.
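The caching behavior described above can be sketched as a wrapper around a fetch function. This is an illustration under stated assumptions, not the project's implementation: `createCachedGet` and `Fetcher` are hypothetical names, and an in-memory `Map` stands in for the on-disk .fetch-cache so the example runs offline.

```typescript
// Hypothetical names; an in-memory Map stands in for the .fetch-cache directory.
type Fetcher = (url: string) => Promise<string>;
type Env = Record<string, string | undefined>;

function createCachedGet(fetcher: Fetcher, cache: Map<string, string>, env: Env) {
  return async function get(url: string): Promise<string> {
    const forceFetch = env["FORCE_FETCH"] === "true";
    const cached = cache.get(url);
    // Serve from the cache unless FORCE_FETCH=true asks for a fresh request.
    if (!forceFetch && cached !== undefined) return cached;
    // If the fetcher throws, nothing is stored, so only successful
    // responses ever reach the cache.
    const body = await fetcher(url);
    cache.set(url, body);
    return body;
  };
}

// Usage with a stub fetcher, so no real request is made.
let calls = 0;
const stub: Fetcher = async (url) => {
  calls++;
  return `<html>${url}</html>`;
};
const get = createCachedGet(stub, new Map(), {});

get("https://www.wser.org/2024-entrants-list/")
  .then(() => get("https://www.wser.org/2024-entrants-list/"))
  .then(() => console.log(`network calls: ${calls}`)); // network calls: 1
```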

Manual Data

For the occasional data with no other available source, we keep manually created JSON files in data/manual.
