wser | open dataset

This project collects publicly available data for the Western States Endurance Run and formats it into {json:api} for ease of consumption.

Goals

The goals for this project are to provide:

a raw, minimally processed dataset
a normalized relational dataset
CDN-based access to both datasets
git/npm-based access to both datasets
a SQLite seed of the normalized dataset
TypeScript types for the data in both datasets
request builders and schemas for both datasets for use with warp-drive.io

Raw Dataset

The raw dataset is the result of ingesting various public sources and transforming them into well-structured {json:api}. This dataset stores each source in isolation, so type+id pairs in the dataset are only unique within a given race year and data source.
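Because type+id is only unique within one year and one source, anything that loads several raw files at once needs a wider key. The sketch below assumes each raw file is a {json:api} document whose `data` is an array of resource objects; the `indexRaw` and `rawResourceKey` helpers and the source labels are hypothetical, not part of the project.

```typescript
// Minimal {json:api} shapes: only `type` and `id` are relied on here.
interface ResourceObject {
  type: string;
  id: string;
  attributes?: Record<string, unknown>;
}

interface JsonApiDocument {
  data: ResourceObject[];
}

// Hypothetical helper: a key that stays unique across the whole raw dataset
// by prefixing the race year and data source to the type+id pair.
function rawResourceKey(year: number, source: string, r: ResourceObject): string {
  return `${year}/${source}/${r.type}:${r.id}`;
}

// Hypothetical helper: index resources from several raw files into one Map.
// Throws if a key repeats, which would mean a type+id was duplicated
// within a single year and source.
function indexRaw(
  files: { year: number; source: string; doc: JsonApiDocument }[],
): Map<string, ResourceObject> {
  const index = new Map<string, ResourceObject>();
  for (const { year, source, doc } of files) {
    for (const r of doc.data) {
      const key = rawResourceKey(year, source, r);
      if (index.has(key)) throw new Error(`duplicate resource ${key}`);
      index.set(key, r);
    }
  }
  return index;
}
```

With this scheme, `finisher:1` from 2013 and `finisher:1` from 2014 land under different keys instead of overwriting each other.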

The following data sources are currently available:

Important

In the URL and file path schemes below, replace {YYYY} with the desired year, e.g. 2013.
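Filling in the year is plain string substitution. A small sketch, where `fillYear` is a hypothetical helper rather than something the project exports:

```typescript
// Hypothetical helper: substitute a race year into the URL and file path
// templates listed in this README.
function fillYear(template: string, year: number): string {
  if (!Number.isInteger(year)) {
    throw new RangeError(`race year must be an integer, got ${year}`);
  }
  // split/join replaces every occurrence of {YYYY} without needing String.replaceAll.
  return template.split("{YYYY}").join(String(year));
}

// e.g. fillYear("./data/raw/{YYYY}/finisher.json", 2013)
```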

1974 →

finishers
- source: https://www.wser.org/results/{YYYY}-results/
- output: ./data/raw/{YYYY}/finisher.json

Tip

Some early years had starters but no finishers, and in some years the results include runners who finished slightly after the 30-hour mark but without a place. There are also a few finishers without a listed age in this data.
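Consumers should treat place and age as optional. The attribute names below (`name`, `place`, `age`) are assumptions for illustration, not the dataset's actual schema; the sketch relies on the tip above that runners past the 30-hour mark have no place.

```typescript
// Assumed finisher attributes; `place` and `age` may be missing.
interface FinisherAttributes {
  name: string;
  place?: number; // absent for runners listed after the 30-hour mark
  age?: number; // absent for a few finishers
}

// Placed finishers only, relying on the absence of a place to exclude
// runners who finished after the 30-hour mark.
function officialFinishers(rows: FinisherAttributes[]): FinisherAttributes[] {
  return rows.filter((r) => r.place !== undefined);
}

// Average age over finishers with a listed age. Missing ages are skipped
// rather than counted as 0, and a year with no finishers yields undefined.
function averageAge(rows: FinisherAttributes[]): number | undefined {
  const ages = rows.map((r) => r.age).filter((a): a is number => a !== undefined);
  if (ages.length === 0) return undefined;
  return ages.reduce((sum, a) => sum + a, 0) / ages.length;
}
```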

2013 →

applicants
- source: https://www.wser.org/lottery{YYYY}.html
- output: ./data/raw/{YYYY}/applicant.json

Tip

Beginning in 2020, the race began assigning each applicant an ID. We are not yet sure whether this ID is stable across years.
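One way to probe ID stability is to compare the names attached to the same ID in two years. This is a hypothetical check with assumed field names; a mismatch only suggests reassignment, since names can legitimately change or be spelled differently between lists.

```typescript
// Assumed applicant shape for illustration.
interface Applicant {
  id: string;
  name: string;
}

// IDs that appear in both years but are attached to different names.
function idConflicts(yearA: Applicant[], yearB: Applicant[]): string[] {
  const namesA = new Map(yearA.map((a) => [a.id, a.name] as [string, string]));
  const conflicts: string[] = [];
  for (const b of yearB) {
    const nameA = namesA.get(b.id);
    if (nameA !== undefined && nameA !== b.name) conflicts.push(b.id);
  }
  return conflicts;
}
```

An empty result across several consecutive years would be weak evidence that IDs carry over; it cannot prove it.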

entrants
- source: https://www.wser.org/{YYYY}-entrants-list/
- output: ./data/raw/{YYYY}/entrant.json

Tip

The entrants list contains non-lottery entrants as well as individuals who were selected from the waitlist, so it does not fully represent the lottery outcome.

2014 →

splits
- XLSX files from https://www.wser.org/splits/
- source: https://www.wser.org/wp-content/uploads/stats/wser{YYYY}.xlsx
- output: ./data/raw/{YYYY}/split.json
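Turning a splits spreadsheet into JSON starts with mapping rows to records. The sketch assumes the worksheet has already been read into a 2D array of cells by some XLSX reader (not shown) and that the first row holds column headers; both are assumptions about the sheet layout, and `rowsToRecords` is hypothetical.

```typescript
type Cell = string | number | null;

// Map a header row plus data rows to objects keyed by header text.
function rowsToRecords(rows: Cell[][]): Record<string, Cell>[] {
  if (rows.length === 0) return [];
  const [header, ...body] = rows;
  // Give unnamed columns a positional fallback key.
  const keys = header.map((h, i) => (h === null || h === "" ? `column${i}` : String(h)));
  return body
    // Skip fully empty rows, which spreadsheets often contain.
    .filter((row) => row.some((c) => c !== null && c !== ""))
    .map((row) => {
      const record: Record<string, Cell> = {};
      keys.forEach((k, i) => {
        // Short rows leave trailing cells undefined; normalize them to null.
        record[k] = row[i] ?? null;
      });
      return record;
    });
}
```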

2017 →

wait-list
- source: https://www.wser.org/{YYYY}-wait-list/
- output: ./data/raw/{YYYY}/waitlist.json

Tip

The 2020 waitlist became the 2021 waitlist, but it can still be useful for tracking who withdrew and did not roll over.

2024 →

live (lottery outcome)
- source: https://lottery.wser.org/
- output: ./data/raw/{YYYY}/live-lottery-results.json

Tip

The live dataset can only be collected during the year of the given lottery. It can be useful for tracking the delta of who withdrew from the entrants list.
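Tracking that delta is a set difference between two snapshots. This is a minimal sketch that matches people on a normalized name, which is an assumption: the sources do not obviously share a stable key, so two people with the same name would be conflated.

```typescript
// Collapse case and whitespace differences between sources.
function normalizeName(name: string): string {
  return name.trim().toLowerCase().replace(/\s+/g, " ");
}

// Names present in `before` that no longer appear in `after`.
function missingFrom(before: string[], after: string[]): string[] {
  const remaining = new Set(after.map(normalizeName));
  return before.filter((n) => !remaining.has(normalizeName(n)));
}
```

For example, `missingFrom(liveLotteryNames, entrantNames)` would list lottery selections no longer on the entrants list, and the same call works for comparing the 2020 and 2021 waitlists. The reverse direction is less useful because the entrants list also includes non-lottery entrants.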

Contributing

Install bun from https://oven.sh
Install dependencies:

bun install

To run the script that ingests and processes the data as necessary:

bun run ./index.ts

This will scrape publicly available data from https://wser.org and store it in data/raw/. We keep this directory under git version control and only scrape data when we don't already have an entry for it in the cache, so unless you are adding data for a new year or adding ingestion from new sources or earlier years, this will likely do nothing. Setting the ENV var FORCE_GENERATE=true will cause the files in data/raw to rebuild. Note: they will be rebuilt from the responses stored in .fetch-cache when possible (see below).
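The regeneration rule can be expressed as a small predicate. This sketch assumes "already have an entry" means the output file exists; `shouldGenerate` and its injected `exists` check are hypothetical and not taken from index.ts.

```typescript
type Env = Record<string, string | undefined>;

// Build an output file only when it is missing, unless FORCE_GENERATE=true.
function shouldGenerate(
  outputPath: string,
  exists: (path: string) => boolean,
  env: Env,
): boolean {
  if (env["FORCE_GENERATE"] === "true") return true;
  return !exists(outputPath);
}
```

In Bun, `existsSync` from `node:fs` and `process.env` can be passed in directly; injecting them keeps the rule testable without touching the filesystem.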

Additionally, we cache every successful raw fetch response that we scrape into .fetch-cache. This allows us to write tests and work offline, ensures access to the data in the future should the wser site change, and further reduces server load so that we don't accidentally put a site we love under undue strain.

When fetching a page to scrape data from, use the GET method to participate in the .fetch-cache.

To bypass the fetch cache, set the ENV var FORCE_FETCH=true.
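The caching rules above (GET only, successful responses only, FORCE_FETCH bypass) can be sketched in memory. The real project persists responses to .fetch-cache on disk; the `FetchCache` class, keying by URL, and refreshing the stored copy on a forced fetch are all assumptions for illustration.

```typescript
type Env = Record<string, string | undefined>;
type Fetcher = (url: string, method: string) => Promise<string>;

class FetchCache {
  private readonly store = new Map<string, string>();
  private readonly fetcher: Fetcher;
  private readonly env: Env;

  constructor(fetcher: Fetcher, env: Env) {
    this.fetcher = fetcher;
    this.env = env;
  }

  async request(url: string, method = "GET"): Promise<string> {
    // Only GET responses are read from or written to the cache.
    const cacheable = method.toUpperCase() === "GET";
    const bypass = this.env["FORCE_FETCH"] === "true";
    if (cacheable && !bypass) {
      const hit = this.store.get(url);
      if (hit !== undefined) return hit;
    }
    // If the fetcher rejects, nothing is stored, so only successful responses
    // are cached. A forced fetch overwrites the stored copy (an assumption).
    const body = await this.fetcher(url, method);
    if (cacheable) this.store.set(url, body);
    return body;
  }
}
```

A Bun fetcher could wrap the global `fetch`, throw when `response.ok` is false, and return `response.text()`; `process.env` can be passed as the env.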

Manual Data

For the occasional data with no other available source, we keep manually created JSON files in data/manual.

runspired/wser-open-dataset
