Toss-up: The Impact of Winning the Toss on the Probability of Winning
We analyze data from nearly 43,000 first-class men's cricket matches, a near census of the relevant population. We make a series of discoveries that upend some conventional wisdom and understanding based on analyses of much smaller datasets. In fact, one prominent previous study (pdf) bases its analysis on just about 1% of the data we have.
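To make the headline quantity concrete, here is a minimal sketch of one way to estimate it: the share of decided matches won by the side that won the toss, with a 95% Wilson score interval. It is an illustration under stated assumptions, not our analysis code. The function name `toss_win_rate`, the input format of `(toss_winner, match_winner)` pairs, and the choice to drop draws, ties, and no-results are all hypothetical.

```python
import math
from typing import Iterable, Optional, Tuple


def toss_win_rate(
    matches: Iterable[Tuple[str, Optional[str]]], z: float = 1.96
) -> Tuple[float, float, float, int]:
    """Share of decided matches won by the toss winner, with a Wilson score interval.

    Hypothetical input: (toss_winner, match_winner) pairs, where match_winner
    is None for draws, ties, and no results; those matches are excluded.
    Returns (proportion, lower bound, upper bound, number of decided matches).
    """
    n = 0
    wins = 0
    for toss_winner, match_winner in matches:
        if match_winner is None:
            continue  # not a decided match
        n += 1
        if toss_winner == match_winner:
            wins += 1
    if n == 0:
        raise ValueError("no decided matches")
    p = wins / n
    # Wilson score interval: unlike the plain normal approximation,
    # it stays inside [0, 1] even for small n.
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, center - half, center + half, n


if __name__ == "__main__":
    # Made-up toy pairs, not real match data.
    sample = [("A", "A"), ("B", "C"), ("C", "C"), ("D", None)]
    p, lo, hi, n = toss_win_rate(sample)
    print(f"toss winner won {p:.3f} of {n} decided matches, 95% CI ({lo:.3f}, {hi:.3f})")
```

On the toy sample, the no-result match is dropped and the toss winner wins two of the three decided matches. With tens of thousands of matches, a proportion near 0.5 with a tight interval would suggest the toss matters little on average.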
Match-Level Data: We got our data from espncricinfo.com. We downloaded and parsed the data in a couple of different ways. Gaurav scraped and parsed the HTML pages. Derek, clearly the sharper of the two, realized that espncricinfo also provides a nice JSON API and developed a Python module.
Aware of the duplicated effort, in this repository we provide only the scripts and data that aren't available elsewhere: a script to download match ids, match ids by match type (JSON), a script for making the requests and parsing the responses using the JSON data, and that script's output for ODI matches. The one exception is the final dataset we use, which is the same as the one posted in Gaurav's repository.
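The JSON route described above can be sketched as follows. This is a minimal sketch assuming a simplified, flat match record: the field names in `FIELDS` and the helpers `parse_match` and `rows_to_csv` are hypothetical and do not reproduce espncricinfo's actual schema or our scripts. It shows only the general pattern of decoding each record with `json.loads`, keeping the fields the toss analysis needs, and writing the rows to CSV.

```python
import csv
import io
import json
from typing import Any, Dict, List

# Hypothetical, simplified field names; the real espncricinfo JSON is richer
# and uses its own structure.
FIELDS = ["match_id", "date", "team1", "team2", "toss_winner", "winner"]


def parse_match(raw: str) -> Dict[str, Any]:
    """Flatten one JSON match record into a row keyed by FIELDS (missing keys become None)."""
    record = json.loads(raw)
    row = {field: record.get(field) for field in FIELDS}
    # Store ids as strings so they serialize consistently.
    if row["match_id"] is not None:
        row["match_id"] = str(row["match_id"])
    return row


def rows_to_csv(rows: List[Dict[str, Any]]) -> str:
    """Serialize parsed rows to CSV text; None values are written as empty strings."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up record for illustration only.
    raw = ('{"match_id": 1, "date": "2001-01-01", "team1": "Team A", '
           '"team2": "Team B", "toss_winner": "Team A", "winner": null}')
    print(rows_to_csv([parse_match(raw)]))
```

Keeping a fixed column list means a record missing a field still yields a complete row, with the gap left blank rather than shifting columns.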
Rankings Data: parse_rankings gets monthly rankings for ODIs from 1981 to 2013 and for Tests from 1952 to 2013. In 2014, the ICC changed its site so that it shows only the most recent rankings. The script outputs ODI rankings and Test rankings.
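Because the rankings are monthly snapshots, attaching them to matches means picking, for each match, the latest snapshot on or before the match date. Below is a minimal sketch using `bisect`, assuming a hypothetical in-memory layout in which each team maps to a date-sorted list of `(snapshot_date, rank)` pairs. `rank_as_of` is a hypothetical helper, not part of `parse_rankings`.

```python
import bisect
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: team -> [(snapshot_date, rank), ...], sorted by date.
Rankings = Dict[str, List[Tuple[date, int]]]


def rank_as_of(rankings: Rankings, team: str, match_date: date) -> Optional[int]:
    """Return the team's rank from the latest snapshot on or before match_date."""
    history = rankings.get(team)
    if not history:
        return None
    dates = [snapshot for snapshot, _ in history]
    # bisect_right counts the snapshots dated on or before match_date.
    i = bisect.bisect_right(dates, match_date)
    if i == 0:
        return None  # the match predates the team's first snapshot
    return history[i - 1][1]


if __name__ == "__main__":
    # Made-up toy snapshots, not real ICC rankings.
    toy = {"Team A": [(date(2010, 1, 1), 3), (date(2010, 2, 1), 2)]}
    print(rank_as_of(toy, "Team A", date(2010, 1, 15)))  # 3
```

Returning `None` for matches before the first snapshot (for example, ODIs played before 1981) keeps them distinguishable from ranked matches instead of silently assigning a rank.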
Analysis, Write-up, and Figures
Gaurav Sood and Derek Willis
Scripts, figures, and writing are released under CC BY 2.0.