Skip to content


Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time

Scripts to scrape and analyze basketball data.

Scripts scrape, Sports Illustrated, etc. and analyze the resulting data.

Newest data collection scripts pull JSON (using the requests module) via the NBA API. For instance, this block of Python code saves data for the first game of the 2014-15 regular season:

import bb_tools as bb
gameid = '0021400001'
bb.write_game_json(gameid) # save to disk (not necessary, but faster)
box = bb.get_boxscore(gameid) # read from disk -- or web if not saved 
pbp = bb.get_pbp(gameid)
shots = bb.get_shots(gameid)

Data read/write paths are set in

The NBA's Game ID, 0021400001, is a 10-digit code: XXXYYGGGGG, where XXX refers to a season prefix, YY is the season year (e.g. 14 for 2014-15), and GGGGG refers to the game number (1-1230 for a full 30-team regular season).

Season prefixes are...

001 : Pre Season
002 : Regular Season
003 : All-Star
004 : Post Season

To save all data for the current season, run the following from the command line:


To save data from previous seasons, you can pass an entire list of games. For example, the list of regular season games from 1996-97 to 2014-15 is provided in the repo.

gamelist = bb.REPOHOME + '/data/csv/games_96-14.csv'


Older data collection scripts parse HTML (using modules re, BeautifulSoup, etc.) into CSV and include:

  1. : Given a range of dates & season code (e.g. 00214 = 2014-15 regular season), formulates a text list of all NBA games in the range. [REPLACED BY]

  2. : Given a list of games, queries and writes box score data for each one. Also can query to get moneyline odds for each game. [REPLACED BY]

  3. : Given a list of games, queries and writes play-by-play data for each one. [REPLACED BY]

  4. : Given a list of play-by-play data, convert certain events (shots attempted/made, fouls, turnovers) into point process data (i.e. list of binary outcomes over time [0 0 0 0 1 ...]) [REPLACED BY]


Analysis scripts apply machine learning techniques to the resulting data in Python, Matlab, and R. Some simple examples are shown in IPython Notebooks.


Scripts to collect and analyze basketball data






No releases published


No packages published