Motorsport data YAML files created using Python to scrape and parse from the Wikipedia API.

imclab/motorsport-yaml

Motorsport data YAMLs

Motorsport data YAML files created using Python to scrape and parse from the Wikipedia API.

CSV files

These were created using a Chrome scraping extension to find the proper XPaths for the relevant data on the following Wikipedia pages:

2013 Formula One season
2013 IndyCar Series season
2013 NASCAR Sprint Cup Series
2013 NASCAR Nationwide Series
2013 NASCAR Camping World Truck Series
2013 NHRA Mello Yello Drag Racing Series

After grabbing the appropriate Wikipedia URLs for the relevant race events, I used them to generate the corresponding URLs for Wikipedia's MediaWiki API.
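As a rough illustration of that URL conversion, here is a minimal sketch. It is not taken from the scripts in this repo: the function name `api_url_from_article` is hypothetical, and the query parameters are just one common way to ask the MediaWiki API for an article's wikitext.

```python
from urllib.parse import urlsplit, unquote, urlencode


def api_url_from_article(article_url):
    """Turn a Wikipedia article URL into a MediaWiki API query URL.

    Hypothetical helper; assumes the standard ".../wiki/<Title>" URL layout.
    """
    parts = urlsplit(article_url)
    prefix = "/wiki/"
    if not parts.path.startswith(prefix):
        raise ValueError("not a standard Wikipedia article URL: " + article_url)

    # The article title is everything after "/wiki/", percent-decoded.
    title = unquote(parts.path[len(prefix):])

    # Ask for the latest revision's content as JSON.
    query = urlencode({
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "format": "json",
        "titles": title,
    })
    return f"{parts.scheme}://{parts.netloc}/w/api.php?{query}"


if __name__ == "__main__":
    print(api_url_from_article("https://en.wikipedia.org/wiki/2013_Monaco_Grand_Prix"))
```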

Python scripts

race_alias_tester.py
This is used to test the API URLs and to work out the parsing rules for extracting the race event aliases. This was a big pain in the ass, since Wikipedia articles do not have uniform formatting.
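To give a feel for what such a parsing rule might look like, here is a small sketch. It assumes (and this is only an assumption, the README does not say so) that aliases show up as bold names in the article's wikitext. The function name `candidate_aliases` and the regular expressions are illustrative, not the actual rules used in the script.

```python
import re

# Bold text in wikitext is wrapped in three apostrophes: '''like this'''.
BOLD = re.compile(r"'''(.+?)'''")
# Wiki links look like [[Target]] or [[Target|Label]]; keep the visible part.
LINK = re.compile(r"\[\[(?:[^|\]]*\|)?([^\]]+)\]\]")


def candidate_aliases(wikitext):
    """Return bolded names found in wikitext, in order, without duplicates.

    Illustrative only: real articles vary a lot, so this will both miss
    aliases and pick up bold text that is not an alias.
    """
    aliases = []
    for match in BOLD.findall(wikitext):
        name = LINK.sub(r"\1", match).strip()
        if name and name not in aliases:
            aliases.append(name)
    return aliases


if __name__ == "__main__":
    sample = ("The '''2013 Daytona 500''', also called the "
              "'''[[Daytona 500|Great American Race]]''', was a stock car race.")
    print(candidate_aliases(sample))
```

A rule this simple does not handle bold-italic markup or templates, which is the kind of non-uniform formatting that makes per-series tweaking necessary.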

racing_yaml_creator.py
This is the main script. It reads the CSV files, parses the race event aliases from the Wikipedia API responses, and then creates the corresponding YAML file. Because Wikipedia data is not uniform, the generated YAML files needed some minor cleanup afterwards.
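The fetching step in the middle of that pipeline could look roughly like the sketch below. The helper names `wikitext_from_response` and `fetch_wikitext` and the User-Agent string are made up for illustration. It also assumes the legacy JSON layout that the API returns for `prop=revisions&rvprop=content`, where the wikitext sits under the `"*"` key of the first revision.

```python
import json
from urllib.request import Request, urlopen


def wikitext_from_response(payload):
    """Pull the article wikitext out of a decoded MediaWiki API response.

    Assumes the legacy response layout:
    {"query": {"pages": {"<id>": {"revisions": [{"*": "<wikitext>"}]}}}}
    Returns an empty string if the page is missing or has no revisions.
    """
    pages = payload.get("query", {}).get("pages", {})
    for page in pages.values():
        revisions = page.get("revisions")
        if revisions:
            return revisions[0].get("*", "")
    return ""


def fetch_wikitext(api_url):
    """Download one API URL and return the wikitext (needs network access)."""
    # Hypothetical User-Agent; Wikipedia asks clients to identify themselves.
    request = Request(api_url, headers={"User-Agent": "motorsport-yaml-example/0.1"})
    with urlopen(request) as response:
        return wikitext_from_response(json.load(response))
```

Keeping the response handling separate from the download makes it possible to try parsing rules against saved responses without hitting the API each time.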

aliases_csv2yml.py
This is a quick script that creates the YAML file from a CSV file that already contains the race event aliases, as is the case for Formula One.
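A conversion like this can be sketched with the standard library alone. The column names `name` and `aliases`, the semicolon separator, and the output layout are all assumptions made for the example; the real CSV and YAML files in this repo may be structured differently. Strings are quoted with `json.dumps`, which works because JSON-style double-quoted strings are also valid YAML scalars.

```python
import csv
import io
import json


def csv_to_yaml(csv_text):
    """Convert CSV text with 'name' and 'aliases' columns into YAML text.

    Hypothetical layout: aliases are separated by semicolons in one cell.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        aliases = [a.strip() for a in row["aliases"].split(";") if a.strip()]
        lines.append("- name: " + json.dumps(row["name"], ensure_ascii=False))
        if aliases:
            lines.append("  aliases:")
            for alias in aliases:
                lines.append("    - " + json.dumps(alias, ensure_ascii=False))
        else:
            # An explicit empty list avoids a null value in the YAML output.
            lines.append("  aliases: []")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = "name,aliases\nMonaco Grand Prix,Monaco GP; Grand Prix de Monaco\n"
    print(csv_to_yaml(example), end="")
```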
