## Historical NBA Playoff Series Information

This notebook demonstrates how to build a data set of historical NBA playoff match ups, using box scores from [stats.nba.com](http://stats.nba.com/).

In [1]:
import numpy as np
import pandas as pd
pd.options.display.max_rows = 999
pd.options.display.max_columns = 999
pd.options.display.float_format = '{:.3f}'.format

We will use the [`pracnbastats`](https://pypi.org/project/pracnbastats/) package to scrape [stats.nba.com](http://stats.nba.com/). The `playoffs` module in this package also contains functions and classes that make it easy to represent and analyze NBA playoffs information.

You can install this package in your sports analytics Python environment by executing `pip install pracnbastats` in Terminal (on Mac or Linux computers) or at the Anaconda Prompt (on Windows computers).

In [2]:
import pracnbastats as nba

In [3]:
from pracpred.utils import printsource

In [4]:
from pathlib import Path

This code assumes the existence of a directory to hold scraped NBA data. You can create and name this directory however you want, and adjust the code in the cell below to suit your preferences. If you've previously scraped the data, the `pracnbastats` library can find it and avoid re-scraping. You just need to specify the location of the previously scraped data using the `store` object defined below.

In [5]:
PROJECT_DIR = Path.cwd().parent
DATA_DIR = PROJECT_DIR / 'data'
STATS_DIR = DATA_DIR / 'stats-nba-com'
OUTPUT_DIR = DATA_DIR / 'prepared'
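If the data directories do not exist yet, you can create them before scraping. The following is a minimal sketch using only the standard library; the helper name `ensure_data_dirs` is hypothetical and is not part of `pracnbastats`. It assumes the same `data/stats-nba-com` and `data/prepared` layout as the cell above.

```python
from pathlib import Path


def ensure_data_dirs(project_dir):
    """Create the scraped-data and prepared-data directories if they are missing.

    Hypothetical helper for illustration; mirrors the directory layout used above.
    """
    data_dir = Path(project_dir) / 'data'
    stats_dir = data_dir / 'stats-nba-com'
    output_dir = data_dir / 'prepared'
    for directory in (stats_dir, output_dir):
        # parents=True also creates 'data' when needed;
        # exist_ok=True makes re-running the cell harmless.
        directory.mkdir(parents=True, exist_ok=True)
    return stats_dir, output_dir
```

You could then write `STATS_DIR, OUTPUT_DIR = ensure_data_dirs(Path.cwd().parent)` in place of the assignments above.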

To scrape data from [stats.nba.com](http://stats.nba.com/), you need to specify a user agent. Below is the user agent I used. You can find your own user agent by searching for "my user agent" in Google.

In [6]:
USER_AGENT = (
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) '
    'AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/66.0.3359.139 Safari/537.36'
)

In [7]:
session = nba.scrape.NBASession(user_agent=USER_AGENT)

In [8]:
store = nba.store.FlatFiles.CSV(path=STATS_DIR)

In [9]:
scraper = nba.scrape.NBAScraper(session=session, store=store)

### Historical Playoff Match Ups

I created the `playoffs` module in the `pracnbastats` package to make it easy to analyze historical NBA playoff match ups. You can see the source code [here](https://github.com/practicallypredictable/pracnbastats/tree/master/pracnbastats). Let's see how to use this module.

We can create a `SeriesBoxScores` object to get all of the playoff team box scores for a particular season. Let's get all of the post-season box scores for the 2016-17 season.

In [10]:
post2016 = nba.playoffs.SeriesBoxScores(scraper=scraper, season=nba.params.Season(start_year=2016))

This `SeriesBoxScores` object has a lot of useful information. It has the scores for each post-season match up.

In [11]:
len(post2016.box_scores.matchups)

79

In [12]:
post2016.box_scores.matchups.head()

Unnamed: 0,game_id,season,season_type,date,video,team_id_h,team_abbr_h,pts_h,win_loss_h,team_id_r,team_abbr_r,pts_r,win_loss_r,hr_winner,winner,loser,mov
0,41600405,2016,post,2017-06-12,Y,1610612744,GSW,129,W,1610612739,CLE,120,L,H,GSW,CLE,9
1,41600404,2016,post,2017-06-09,Y,1610612739,CLE,137,W,1610612744,GSW,116,L,H,CLE,GSW,21
2,41600403,2016,post,2017-06-07,Y,1610612739,CLE,113,L,1610612744,GSW,118,W,R,GSW,CLE,5
3,41600402,2016,post,2017-06-04,Y,1610612744,GSW,132,W,1610612739,CLE,113,L,H,GSW,CLE,19
4,41600401,2016,post,2017-06-01,Y,1610612744,GSW,113,W,1610612739,CLE,91,L,H,GSW,CLE,22


It has each team's box score for every post-season game.

In [13]:
len(post2016.box_scores.data)

158

There are 158 box scores, 2 for each of the 79 post-season games in the 2016-17 season.

In [14]:
post2016.box_scores.data.head()

Unnamed: 0,season,season_type,team_id,team_abbr,game_id,date,opp_team_abbr,home_road,win_loss,min,fgm,fga,fg_pct,fg3m,fg3a,fg3_pct,ftm,fta,ft_pct,oreb,dreb,reb,ast,stl,blk,tov,pf,pts,mov,video
0,2016,post,1610612739,CLE,41600405,2017-06-12,GSW,R,L,240,47,88,0.534,11,24,0.458,15,23,0.652,12,28,40,22,6,5,15,22,120,-9,Y
1,2016,post,1610612744,GSW,41600405,2017-06-12,CLE,H,W,240,46,90,0.511,14,38,0.368,23,28,0.821,13,29,42,27,8,2,13,24,129,9,Y
2,2016,post,1610612744,GSW,41600404,2017-06-09,CLE,R,L,240,39,87,0.448,11,39,0.282,27,36,0.75,16,24,40,26,5,6,12,27,116,-21,Y
3,2016,post,1610612739,CLE,41600404,2017-06-09,GSW,H,W,240,46,87,0.529,24,45,0.533,21,31,0.677,11,30,41,27,6,3,11,24,137,21,Y
4,2016,post,1610612744,GSW,41600403,2017-06-07,CLE,R,W,240,40,83,0.482,16,33,0.485,22,24,0.917,8,36,44,29,8,4,18,28,118,5,Y


Most importantly, this object understands the NBA playoff series structure. Let's take a look.

In [15]:
post2016.data

Unnamed: 0,season,playoff_round,best_of,games_played,series_hca,series_non_hca,series_winner,game_home_teams,game_winners,game_ids
0,2016,1,7,4,CLE,IND,CLE,"CLE,CLE,IND,IND","CLE,CLE,CLE,CLE","41600111,41600112,41600113,41600114"
1,2016,1,7,7,LAC,UTA,UTA,"LAC,LAC,UTA,UTA,LAC,UTA,LAC","UTA,LAC,LAC,UTA,UTA,LAC,UTA","41600171,41600172,41600173,41600174,41600175,4..."
2,2016,1,7,6,SAS,MEM,SAS,"SAS,SAS,MEM,MEM,SAS,MEM","SAS,SAS,MEM,MEM,SAS,SAS","41600151,41600152,41600153,41600154,41600155,4..."
3,2016,1,7,6,TOR,MIL,TOR,"TOR,TOR,MIL,MIL,TOR,MIL","MIL,TOR,MIL,TOR,TOR,TOR","41600121,41600122,41600123,41600124,41600125,4..."
4,2016,1,7,6,WAS,ATL,WAS,"WAS,WAS,ATL,ATL,WAS,ATL","WAS,WAS,ATL,ATL,WAS,WAS","41600131,41600132,41600133,41600134,41600135,4..."
5,2016,1,7,6,BOS,CHI,BOS,"BOS,BOS,CHI,CHI,BOS,CHI","CHI,CHI,BOS,BOS,BOS,BOS","41600101,41600102,41600103,41600104,41600105,4..."
6,2016,1,7,4,GSW,POR,GSW,"GSW,GSW,POR,POR","GSW,GSW,GSW,GSW","41600141,41600142,41600143,41600144"
7,2016,1,7,5,HOU,OKC,HOU,"HOU,HOU,OKC,OKC,HOU","HOU,HOU,OKC,HOU,HOU","41600161,41600162,41600163,41600164,41600165"
8,2016,2,7,7,BOS,WAS,BOS,"BOS,BOS,WAS,WAS,BOS,WAS,BOS","BOS,BOS,WAS,WAS,BOS,WAS,BOS","41600201,41600202,41600203,41600204,41600205,4..."
9,2016,2,7,4,CLE,TOR,CLE,"CLE,CLE,TOR,TOR","CLE,CLE,CLE,CLE","41600211,41600212,41600213,41600214"


The above `DataFrame` tells us which teams played in each series, the round of each series, where each game was played, and which team won each game.

To make it easier to process NBA playoff information, this `SeriesBoxScores` object has various [methods](https://realpython.com/instance-class-and-static-methods-demystified/). For example, let's look at the conference semi-finals for the 2016-17 season.

In [16]:
semis = list(post2016.conf_semis())
semis

[PlayoffSeries(season=2016, playoff_round=2, best_of=7, games_played=7, series_hca='BOS', series_non_hca='WAS', series_winner='BOS', game_home_teams='BOS,BOS,WAS,WAS,BOS,WAS,BOS', game_winners='BOS,BOS,WAS,WAS,BOS,WAS,BOS', game_ids='41600201,41600202,41600203,41600204,41600205,41600206,41600207'),
 PlayoffSeries(season=2016, playoff_round=2, best_of=7, games_played=4, series_hca='CLE', series_non_hca='TOR', series_winner='CLE', game_home_teams='CLE,CLE,TOR,TOR', game_winners='CLE,CLE,CLE,CLE', game_ids='41600211,41600212,41600213,41600214'),
 PlayoffSeries(season=2016, playoff_round=2, best_of=7, games_played=6, series_hca='SAS', series_non_hca='HOU', series_winner='SAS', game_home_teams='SAS,SAS,HOU,HOU,SAS,HOU', game_winners='HOU,SAS,SAS,HOU,SAS,SAS', game_ids='41600231,41600232,41600233,41600234,41600235,41600236'),
 PlayoffSeries(season=2016, playoff_round=2, best_of=7, games_played=4, series_hca='GSW', series_non_hca='UTA', series_winner='GSW', game_home_teams='GSW,GSW,UTA,UTA', 

For each of the 4 conference semi-finals series, we know how many games were played, which team had series home court advantage, which team won which game, and which team won the playoff series. We also have the [stats.nba.com](http://stats.nba.com/) game IDs so we can get the underlying detailed game information if we want.

This information is returned in the form of a Python [generator](https://realpython.com/introduction-to-python-generators/). Passing the generator to `list()` creates a `list` that we can inspect in a Jupyter notebook. In your own Python code, you can iterate over the generator directly without converting it to a list first. You can learn more about generators [here](https://dbader.org/blog/python-generator-expressions).
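To illustrate iterating over a generator directly, here is a small self-contained sketch. The `Series` tuple and the `series_in_round` function are hypothetical stand-ins, not the `pracnbastats` implementation; the three series are taken from the table above.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for a playoff series record.
Series = namedtuple('Series', ['playoff_round', 'series_hca', 'series_non_hca'])

ALL_SERIES = [
    Series(1, 'CLE', 'IND'),
    Series(2, 'BOS', 'WAS'),
    Series(2, 'CLE', 'TOR'),
]


def series_in_round(all_series, playoff_round):
    """Yield the series from one playoff round, one at a time."""
    for series in all_series:
        if series.playoff_round == playoff_round:
            yield series


# Iterate over the generator directly; no intermediate list of series is built.
matchups = [
    f'{s.series_hca} vs. {s.series_non_hca}'
    for s in series_in_round(ALL_SERIES, 2)
]
print(matchups)  # ['BOS vs. WAS', 'CLE vs. TOR']
```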

Notice that the `PlayoffSeries` objects are [tuples](https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences). (In fact, they are instances of a custom Python class that inherits from a class created by the [`namedtuple`](https://docs.python.org/3/library/collections.html#collections.namedtuple) factory function in the Python standard library [`collections`](https://docs.python.org/3/library/collections.html) module.)
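The general pattern of subclassing a `namedtuple` can be sketched as follows. `ToySeries` is a hypothetical class written for illustration and carries only three of the fields shown above; the actual `PlayoffSeries` class may be organized differently.

```python
from collections import namedtuple


class ToySeries(namedtuple('ToySeries', ['series_hca', 'series_non_hca', 'game_ids'])):
    """Hypothetical namedtuple subclass that adds a convenience property."""

    __slots__ = ()  # keep instances as lightweight as plain tuples

    @property
    def all_game_ids(self):
        # Split the comma-separated string into a list of integer game IDs.
        return [int(game_id) for game_id in self.game_ids.split(',')]


toy = ToySeries('CLE', 'TOR', '41600211,41600212,41600213,41600214')
print(isinstance(toy, tuple))  # True
print(toy.series_hca)          # CLE
print(toy.all_game_ids)        # [41600211, 41600212, 41600213, 41600214]
```

Subclassing keeps the tuple behavior (indexing, unpacking, immutability) while leaving room for derived attributes.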

Let's examine the `PlayoffSeries` object corresponding to the Celtics/Wizards conference semi-finals series in more detail. First, we can see that Boston had series home court advantage.

In [17]:
series = semis[0]
series.series_hca

'BOS'

The tuple also tells us the abbreviation of the team playing at home for each game in the series. It would be cumbersome to have to work with the actual team abbreviations (which change for every series). We can get the home court information in a more useful format as follows:

In [18]:
series.all_home_teams

[TeamType.SHCA,
 TeamType.SHCA,
 TeamType.OTHER,
 TeamType.OTHER,
 TeamType.SHCA,
 TeamType.OTHER,
 TeamType.SHCA]

This tells us that the series follows the 2-2-1-1-1 format for playoff game location. The `TeamType` class inherits from the `Enum` class defined in the Python standard library [`enum`](https://docs.python.org/3/library/enum.html) module. The `SHCA` member represents the team with series home court advantage, while `OTHER` represents the team without series home court advantage.
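As a rough sketch of how such a mapping could work, the code below converts home team abbreviations into enum members. `ToyTeamType` and `home_team_types` are hypothetical names, and the member values 1 and 2 are assumptions; the package's own `TeamType` may be defined differently.

```python
from enum import Enum


class ToyTeamType(Enum):
    """Hypothetical stand-in for the package's TeamType enum."""
    SHCA = 1   # team with series home court advantage (value is an assumption)
    OTHER = 2  # team without series home court advantage (value is an assumption)


def home_team_types(series_hca, game_home_teams):
    """Map a comma-separated string of home team abbreviations to enum members."""
    return [
        ToyTeamType.SHCA if team == series_hca else ToyTeamType.OTHER
        for team in game_home_teams.split(',')
    ]


home = home_team_types('BOS', 'BOS,BOS,WAS,WAS,BOS,WAS,BOS')
print([member.name for member in home])
# ['SHCA', 'SHCA', 'OTHER', 'OTHER', 'SHCA', 'OTHER', 'SHCA']
```

Because the result no longer depends on team abbreviations, the same code can compare home court patterns across different series.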

Now let's get information on the series outcome.

In [19]:
series.outcome

SeriesOutcome('1122121')

The `SeriesOutcome` object tells us which team won which game. Each `'1'` or `'2'` character in the string records the winner of one game, in the order the games were played. Team 1 is the team with series home court advantage, and Team 2 is the other team.
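The encoding can be reproduced with a few lines of plain Python. This is a sketch under the convention just described; `encode_outcome` and `series_winner` are hypothetical helpers, not functions from `pracnbastats`.

```python
from collections import Counter


def encode_outcome(series_hca, game_winners):
    """Encode comma-separated winner abbreviations as a string of '1' and '2'."""
    return ''.join(
        '1' if winner == series_hca else '2'
        for winner in game_winners.split(',')
    )


def series_winner(outcome):
    """Return '1' or '2', whichever team won more games in the outcome string."""
    wins = Counter(outcome)
    return '1' if wins['1'] > wins['2'] else '2'


celtics_wizards = encode_outcome('BOS', 'BOS,BOS,WAS,WAS,BOS,WAS,BOS')
print(celtics_wizards)                 # 1122121
print(series_winner(celtics_wizards))  # 1
```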

We can obtain the game outcomes in a more useful format (using the same `TeamType` class that we saw for the home teams above) as follows:

In [20]:
series.outcome.game_winners

[TeamType.SHCA,
 TeamType.SHCA,
 TeamType.OTHER,
 TeamType.OTHER,
 TeamType.SHCA,
 TeamType.OTHER,
 TeamType.SHCA]

This object can also tell us how many games were played in the series:

In [21]:
series.outcome.games_played

7

as well as which team won the playoff series:

In [22]:
series.outcome.winner

TeamType.SHCA

We can get the game IDs as well:

In [23]:
series.all_game_ids

[41600201, 41600202, 41600203, 41600204, 41600205, 41600206, 41600207]

Using the game IDs, we can get detailed game information. For example, if we want to know the scores and margin of victory, we can get them as follows:

In [24]:
games = post2016.box_scores.matchups['game_id'].isin(series.all_game_ids)
post2016.box_scores.matchups[games]

Unnamed: 0,game_id,season,season_type,date,video,team_id_h,team_abbr_h,pts_h,win_loss_h,team_id_r,team_abbr_r,pts_r,win_loss_r,hr_winner,winner,loser,mov
13,41600207,2016,post,2017-05-15,Y,1610612738,BOS,115,W,1610612764,WAS,105,L,H,BOS,WAS,10
15,41600206,2016,post,2017-05-12,Y,1610612764,WAS,92,W,1610612738,BOS,91,L,H,WAS,BOS,1
17,41600205,2016,post,2017-05-10,Y,1610612738,BOS,123,W,1610612764,WAS,101,L,H,BOS,WAS,22
21,41600204,2016,post,2017-05-07,Y,1610612764,WAS,121,W,1610612738,BOS,102,L,H,WAS,BOS,19
26,41600203,2016,post,2017-05-04,Y,1610612764,WAS,116,W,1610612738,BOS,89,L,H,WAS,BOS,27
30,41600202,2016,post,2017-05-02,Y,1610612738,BOS,129,W,1610612764,WAS,119,L,H,BOS,WAS,10
35,41600201,2016,post,2017-04-30,Y,1610612738,BOS,123,W,1610612764,WAS,111,L,H,BOS,WAS,12


We can also get the team box scores for each game in the series.

In [25]:
games = post2016.box_scores.data['game_id'].isin(series.all_game_ids)
post2016.box_scores.data[games]

Unnamed: 0,season,season_type,team_id,team_abbr,game_id,date,opp_team_abbr,home_road,win_loss,min,fgm,fga,fg_pct,fg3m,fg3a,fg3_pct,ftm,fta,ft_pct,oreb,dreb,reb,ast,stl,blk,tov,pf,pts,mov,video
26,2016,post,1610612738,BOS,41600207,2017-05-15,WAS,H,W,240,42,79,0.532,11,24,0.458,20,25,0.8,4,27,31,27,7,4,8,24,115,10,Y
27,2016,post,1610612764,WAS,41600207,2017-05-15,BOS,R,L,240,36,80,0.45,10,29,0.345,23,29,0.793,10,33,43,18,6,4,15,17,105,-10,Y
30,2016,post,1610612738,BOS,41600206,2017-05-12,WAS,R,L,240,32,79,0.405,11,35,0.314,16,18,0.889,7,30,37,25,5,3,11,19,91,-1,Y
31,2016,post,1610612764,WAS,41600206,2017-05-12,BOS,H,W,240,37,86,0.43,5,24,0.208,13,21,0.619,11,35,46,20,4,7,12,24,92,1,Y
34,2016,post,1610612738,BOS,41600205,2017-05-10,WAS,H,W,240,46,87,0.529,16,33,0.485,15,21,0.714,9,39,48,33,7,8,12,25,123,22,Y
35,2016,post,1610612764,WAS,41600205,2017-05-10,BOS,R,L,240,35,91,0.385,7,29,0.241,24,29,0.828,16,29,45,21,5,5,13,18,101,-22,Y
42,2016,post,1610612764,WAS,41600204,2017-05-07,BOS,H,W,240,44,84,0.524,9,21,0.429,24,27,0.889,12,33,45,29,13,3,20,22,121,19,Y
43,2016,post,1610612738,BOS,41600204,2017-05-07,WAS,R,L,240,35,79,0.443,14,31,0.452,18,24,0.75,9,22,31,28,13,6,20,25,102,-19,Y
52,2016,post,1610612738,BOS,41600203,2017-05-04,WAS,R,L,240,27,77,0.351,10,32,0.313,25,33,0.758,8,30,38,20,4,2,16,26,89,-27,Y
53,2016,post,1610612764,WAS,41600203,2017-05-04,BOS,H,W,240,43,92,0.467,8,25,0.32,22,25,0.88,13,37,50,26,7,5,10,26,116,27,Y
