out
race-data-bri
race-data-pcs
README.md
byRider.json
countries.txt
package.json
paris-roubaix-scraper-bri.js
paris-roubaix-scraper-pcs.js
parisRoubaix-full.json
parisRoubaix-fullv2.json
race-data-mash.js
race-downloads-bri.js
race-downloads-pcs.js
racerPoints.js
races-bri.json
races-pcs.json
races.txt
server.js
starters.csv
top10percent.json


Paris-Roubaix Data Sets

These data files were gathered and compiled in several steps:

First, all of the relevant HTML pages are downloaded from their respective sources by race-downloads-bri.js and race-downloads-pcs.js. Working from saved copies avoids overwhelming the source servers with repeated requests.
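As a rough illustration of this download-once approach (not the actual contents of the race-downloads scripts), the sketch below fetches each page in sequence, skips pages that are already saved, and pauses between requests. The names fileNameFor, downloadAll, fetchPage, savePage, isSaved and delayMs are all hypothetical; the fetching and saving functions are passed in so that any HTTP client and storage location can be used.

```javascript
// Hypothetical helper: turn a URL into a safe local file name.
function fileNameFor(url) {
  return url.replace(/^https?:\/\//, '').replace(/[^a-zA-Z0-9.-]+/g, '_') + '.html';
}

// Resolve after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Download pages one at a time, skipping any that are already saved.
// fetchPage, savePage and isSaved are supplied by the caller (assumptions,
// not part of the original scripts).
async function downloadAll(urls, { fetchPage, savePage, isSaved, delayMs = 1000 }) {
  const downloaded = [];
  for (const url of urls) {
    const name = fileNameFor(url);
    if (await isSaved(name)) continue; // already on disk, no request needed
    const html = await fetchPage(url);
    await savePage(name, html);
    downloaded.push(name);
    await sleep(delayMs); // pause so the server is not hit in a tight loop
  }
  return downloaded;
}
```

Because each page is requested at most once, the scrapers can be rerun against the local files as often as needed without touching the source sites again.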

The HTML files are then scraped by paris-roubaix-scraper-bri.js and paris-roubaix-scraper-pcs.js, which output races-bri.json and races-pcs.json respectively.

These two files are combined by race-data-mash.js, which performs name lookups to improve formatting, matches countries to racers when available, calculates speed, and compiles the result into parisRoubaix-fullv2.json. In a few cases the records are incomplete, so speed and time are estimated based on rank; those entries have the attribute est set to true.
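The speed calculation could look something like the sketch below. It assumes times are stored as "h:mm:ss" strings and that the race distance in kilometres is known for each year; the field names time and speed, and the functions parseTimeToSeconds, averageSpeedKmh and withSpeed, are assumptions for illustration and may not match race-data-mash.js. The rank-based estimate itself is not reproduced here; the caller supplies an estimated time, and the sketch only shows how est could be set.

```javascript
// Convert an "h:mm:ss" string to a number of seconds (assumed time format).
function parseTimeToSeconds(time) {
  const [h, m, s] = time.split(':').map(Number);
  return h * 3600 + m * 60 + s;
}

// Average speed in km/h from a distance in km and a time in seconds.
function averageSpeedKmh(distanceKm, timeSeconds) {
  return distanceKm / (timeSeconds / 3600);
}

// Attach speed to a result record. If the record has no recorded time,
// fall back to a caller-supplied estimate and flag the record with est: true.
function withSpeed(record, distanceKm, estimatedTime) {
  const est = !record.time;
  const time = est ? estimatedTime : record.time;
  const speed = averageSpeedKmh(distanceKm, parseTimeToSeconds(time));
  return { ...record, time, speed, est };
}

// Example values only, not taken from the data set.
console.log(withSpeed({ rank: 1, time: '6:00:00' }, 258));
```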

From parisRoubaix-fullv2.json, racerPoints.js compiles byRider.json, an associative array keyed by every racer known to have participated in Paris-Roubaix. Each entry lists the years they participated, their rank, and the points they would have earned under the current UCI ranking system (http://www.uci.ch/mm/Document/News/Rulesandregulation/17/73/59/2-ROA-20161108-E_English.PDF, page 60), tracked both individually and cumulatively. The top 10% of riders under this scoring system are output to top10percent.json.
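A minimal sketch of this aggregation is shown below. The record shape ({ year, rider, rank }), the function names compileByRider and topTenPercent, and the output fields are assumptions rather than the actual structure of byRider.json. The points table in the example is a placeholder, not the UCI scale; the real values would come from the document linked above.

```javascript
// Build a per-rider summary from a flat list of results.
// pointsByRank maps a finishing rank to points; ranks not listed score 0.
function compileByRider(results, pointsByRank) {
  const byRider = {};
  // Process in year order so the cumulative totals make sense.
  const ordered = [...results].sort((a, b) => a.year - b.year);
  for (const { year, rider, rank } of ordered) {
    const points = pointsByRank[rank] || 0;
    if (!byRider[rider]) byRider[rider] = { races: [], totalPoints: 0 };
    const entry = byRider[rider];
    entry.totalPoints += points;
    entry.races.push({ year, rank, points, cumulative: entry.totalPoints });
  }
  return byRider;
}

// Keep the top 10% of riders by total points (rounded up to at least one).
function topTenPercent(byRider) {
  const ranked = Object.entries(byRider)
    .sort((a, b) => b[1].totalPoints - a[1].totalPoints);
  const count = Math.ceil(ranked.length * 0.1);
  return Object.fromEntries(ranked.slice(0, count));
}

// Placeholder points table and made-up riders, for illustration only.
const placeholderPoints = { 1: 10, 2: 5 };
const sample = [
  { year: 2001, rider: 'Rider B', rank: 1 },
  { year: 2000, rider: 'Rider A', rank: 1 },
  { year: 2001, rider: 'Rider A', rank: 2 },
];
console.log(topTenPercent(compileByRider(sample, placeholderPoints)));
```

Rounding the 10% cutoff up is a choice made for this sketch so that small inputs still return at least one rider; racerPoints.js may handle the cutoff differently.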

None of these data sets would exist without the incredible and complete resources on which they are based at ProCyclingStats.com, BikeRaceInfo.com, and www.letour.com/paris-roubaix/2016/us/.