CS74: Predicting basketball statistics
- MATLAB R2014a
- scrapy
- Run `pip install scrapy` to install scrapy.
- All data is contained in the `data` directory and was collected from basketball-reference.com. We used total and per-possession data from the 1979-80 season, when the three-point line was introduced, to the present. `column_headers` gives a short description of each column of information.
- To scrape your own data, first set up the Python environment by running `source env/bin/activate` in the home directory. Then run `scrapy crawl curry` in the `scraper/scraper` directory to crawl basketball-reference.com for data.
- To run pre-selected MARS regressions, call `run_mars` from the `mars` directory. This script tests the MARS algorithm on per-possession data to predict two-point percentage, assists, total rebounds, and points for various positions.
- Call `interface` in the `mars` directory and follow the given directions to test MARS on parameters of your choosing.
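
For readers unfamiliar with MARS, the sketch below may help show the core idea: the model is built from pairs of hinge functions, `max(0, x - knot)` and `max(0, knot - x)`, and the forward pass picks the knot that lowers the squared error most. This is not the code behind `run_mars` or `interface`; it is a pure-Python illustration of a single forward step on one feature, and the function names and toy data are assumptions made up for the example.

```python
# Illustrative sketch of one MARS forward step on a single feature.
# Not the repository's MATLAB implementation; all names and data are hypothetical.

def hinge_pair(x, knot):
    """Return the two mirrored hinge basis values for x at the given knot."""
    return max(0.0, x - knot), max(0.0, knot - x)

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w

def fit_with_knot(xs, ys, knot):
    """Least-squares fit of y ~ w0 + w1*max(0, x-knot) + w2*max(0, knot-x)."""
    rows = [[1.0, *hinge_pair(x, knot)] for x in xs]
    k = 3
    # Normal equations: (R^T R) w = R^T y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    w = solve(A, b)
    sse = sum((y - sum(wi * ri for wi, ri in zip(w, r))) ** 2
              for r, y in zip(rows, ys))
    return w, sse

def best_knot(xs, ys):
    """Try every interior data point as a knot and keep the lowest-error fit."""
    candidates = sorted(set(xs))[1:-1]  # interior points keep the system solvable
    return min(((knot, *fit_with_knot(xs, ys, knot)) for knot in candidates),
               key=lambda t: t[2])

# Toy V-shaped data (made up, not basketball statistics).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.0, 2.0, 1.0, 0.0, 1.0, 2.0, 3.0]
knot, weights, sse = best_knot(xs, ys)
print(knot, weights, sse)
```

A full MARS fit repeats this search over every feature, keeps adding hinge pairs (and products of them), and then prunes terms in a backward pass; the sketch stops after the first step.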
- To see how GBDT works, call `run_gbdt` from the `gbdt` directory. This script runs the GBDT algorithm on per-possession data and predicts two-point percentage.
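
As a rough picture of what gradient boosted decision trees do, the sketch below fits a sequence of one-split trees (stumps) to the residuals of the running prediction, using squared loss. It is a pure-Python illustration under simplifying assumptions (one feature, depth-1 trees, fixed learning rate) and is not the code in `run_gbdt`; the function names and toy numbers are hypothetical.

```python
# Illustrative gradient boosting with depth-1 regression trees and squared loss.
# Not the repository's MATLAB implementation; all names and data are hypothetical.

def fit_stump(xs, residuals):
    """Find the single split that minimizes squared error on the residuals.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # each candidate leaves points on both sides
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return t, lmean, rmean

def stump_predict(stump, x):
    """Predict the left or right leaf mean depending on the split threshold."""
    t, lmean, rmean = stump
    return lmean if x <= t else rmean

def fit_gbdt(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps

def gbdt_predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, lr, stumps = model
    return base + lr * sum(stump_predict(s, x) for s in stumps)

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Toy data (made up, not real two-point percentages).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.40, 0.42, 0.45, 0.50, 0.52, 0.55]
model = fit_gbdt(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
model_mse = mse(ys, [gbdt_predict(model, x) for x in xs])
print(baseline_mse, model_mse)
```

The small learning rate means each stump corrects only part of the remaining error, which is why many rounds are used; real implementations also grow deeper trees over many features.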
- All files in the `env` directory are from virtualenv. All files in the `scraper` directory, with the exception of `items.py` and `bballspider.py`, are from scrapy.