# NBA Bubble Shooting

If you've been paying any attention to the NBA, you've probably heard many analysts (including friends and family; we're all analysts, really) comment on the recent increase in offensive production. I'm not talking about rule changes like those on hand checking or resetting the shot clock to 14 seconds; I'm talking about the circumstances NBA players are experiencing in bubble life.

There are many factors that go into this: some social (e.g., players can't go out at night and are spending more time together than ever before), some logistical (e.g., no travel, improved diet). However, one thing we keep hearing is that shooting is much easier because of improved depth perception on the bubble courts. Let's see if teams are really shooting better in the bubble.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings(action='once')


In [2]:
#!pip install xlrd

## Getting the Data

All of the monthly team data used below was taken directly from stats.nba.com, including the formulas for computing advanced metrics.

**Source:** https://stats.nba.com/teams/traditional/?sort=W_PCT&dir=-1&Season=2019-20&SeasonType=Regular%20Season&Month=3

In [3]:
bubble_months = ["July", "August", "Playoffs"]
reg_months = ["October", 'November', "December", "January", "February", "March"]
months = [bubble_months, reg_months]

# Read NBA team monthly stats from: 
# "https://stats.nba.com/teams/traditional/?sort=W_PCT&dir=-1&Season=2019-20&SeasonType=Regular%20Season&Month=3"


team_stats = pd.read_excel("NBA_team_stats.xlsx", sheet_name=bubble_months+reg_months)
#bubble = pd.read_excel("NBA_team_stats.xlsx", sheet_name=bubble_months)
#reg = pd.read_excel("NBA_team_stats.xlsx", sheet_name=reg_months)

# Merge monthly data into single df
df = pd.DataFrame()
for key in team_stats.keys():
    team_stats[key]["Month"] = key
    df = pd.concat([df, team_stats[key]], axis=0)

df.head(10)


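|  | Team | GP | W | L | Win % | MIN | PTS | FGM | FGA | FG% | ... | REB | AST | TOV | STL | BLK | BLKA | PF | PFD | Plus/Minus | Month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Houston Rockets | 1 | 1 | 0 | 1.0 | 53.0 | 153.0 | 52.0 | 101.0 | 51.5 | ... | 43.0 | 26.0 | 9.0 | 12.0 | 10.0 | 4.0 | 31.0 | 36.0 | 4.0 | July |
| 1 | Los Angeles Lakers | 1 | 1 | 0 | 1.0 | 48.0 | 103.0 | 32.0 | 82.0 | 39.0 | ... | 45.0 | 21.0 | 16.0 | 6.0 | 3.0 | 5.0 | 27.0 | 30.0 | 2.0 | July |
| 2 | Milwaukee Bucks | 1 | 1 | 0 | 1.0 | 48.0 | 119.0 | 39.0 | 84.0 | 46.4 | ... | 47.0 | 25.0 | 15.0 | 9.0 | 7.0 | 5.0 | 26.0 | 28.0 | 7.0 | July |
| 3 | Orlando Magic | 1 | 1 | 0 | 1.0 | 48.0 | 128.0 | 46.0 | 87.0 | 52.9 | ... | 42.0 | 31.0 | 14.0 | 8.0 | 3.0 | 2.0 | 23.0 | 25.0 | 10.0 | July |
| 4 | Phoenix Suns | 1 | 1 | 0 | 1.0 | 48.0 | 125.0 | 42.0 | 80.0 | 52.5 | ... | 37.0 | 29.0 | 18.0 | 10.0 | 3.0 | 2.0 | 24.0 | 26.0 | 13.0 | July |
| 5 | Portland Trail Blazers | 1 | 1 | 0 | 1.0 | 53.0 | 140.0 | 49.0 | 91.0 | 53.8 | ... | 42.0 | 27.0 | 17.0 | 10.0 | 9.0 | 5.0 | 36.0 | 26.0 | 5.0 | July |
| 6 | San Antonio Spurs | 1 | 1 | 0 | 1.0 | 48.0 | 129.0 | 48.0 | 90.0 | 53.3 | ... | 42.0 | 32.0 | 15.0 | 6.0 | 4.0 | 2.0 | 17.0 | 25.0 | 9.0 | July |
| 7 | Utah Jazz | 1 | 1 | 0 | 1.0 | 48.0 | 106.0 | 37.0 | 84.0 | 44.0 | ... | 43.0 | 17.0 | 20.0 | 11.0 | 6.0 | 2.0 | 23.0 | 25.0 | 2.0 | July |
| 8 | Boston Celtics | 1 | 0 | 1 | 0.0 | 48.0 | 112.0 | 37.0 | 91.0 | 40.7 | ... | 48.0 | 18.0 | 16.0 | 8.0 | 5.0 | 7.0 | 28.0 | 26.0 | -7.0 | July |
| 9 | Brooklyn Nets | 1 | 0 | 1 | 0.0 | 48.0 | 118.0 | 44.0 | 91.0 | 48.4 | ... | 39.0 | 30.0 | 14.0 | 4.0 | 2.0 | 3.0 | 25.0 | 23.0 | -10.0 | July |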
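
The merge step above can also be written as a single `pd.concat` call. The sketch below is only an illustration: it uses a small made-up dictionary (`sheets`, with placeholder teams and numbers) in place of the one `pd.read_excel` returns when `sheet_name` is a list, so it runs without the Excel file.

```python
import pandas as pd

# Stand-in for the dict of DataFrames returned by
# pd.read_excel(..., sheet_name=[...]); teams and values are hypothetical.
sheets = {
    "July": pd.DataFrame({"Team": ["Team A", "Team B"], "GP": [1, 1]}),
    "August": pd.DataFrame({"Team": ["Team A", "Team B"], "GP": [7, 6]}),
}

# Tag each sheet with its month, then concatenate everything once.
merged = pd.concat(
    [sheet.assign(Month=name) for name, sheet in sheets.items()],
    ignore_index=True,
)

print(merged)
```

Unlike the loop above, `ignore_index=True` gives the merged frame a fresh 0..n-1 index instead of repeating each sheet's own row numbers, and `assign` leaves the original sheets untouched.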


In [4]:
def basic_clean(df_total, months, num_skip=4):
    """Condense monthly team stats into per-game averages.

    Parameters:
        df_total: a DataFrame organized as above
        months: pair of month lists - (bubble_months, regular_months)
        num_skip: number of columns at the beginning of the DataFrame
            (after indexing by Team and Month) to sum, not average

    Returns a list of three DataFrames of per-game averages:
    [all months, bubble months, regular months for bubble teams only].
    """
    df_bubble = df_total[df_total["Month"].isin(months[0])] # Bubble months
    df_reg = df_total[df_total["Month"].isin(months[1])] # Regular months
    df_reg = df_reg[df_reg["Team"].isin(df_bubble["Team"])] # Eliminate teams not in bubble
    dfs = [df_total, df_bubble, df_reg]
    for i in range(len(dfs)):
        dfs[i] = dfs[i].set_index(["Team", "Month"])
        dfs[i] = dfs[i].sort_index(level=["Team", "Month"])

        # Condense monthly data into per-game averages:
        # 1) turn each monthly per-game average into a monthly total (average * GP)
        dfs[i] = dfs[i].apply(lambda x: pd.concat([x.iloc[:num_skip], x.iloc[num_skip:]*x["GP"]]), axis=1)
        # 2) sum the totals (and the first num_skip columns) over all months for each team
        dfs[i] = dfs[i].groupby("Team").sum()
        # 3) divide by total GP to get the weighted average (by GP) for each stat
        dfs[i] = dfs[i].apply(lambda x: pd.concat([x.iloc[:num_skip], x.iloc[num_skip:]/x["GP"]]), axis=1)

        # Recalculate percentages from the underlying makes and attempts
        dfs[i].loc[:,"Win %"] = dfs[i].loc[:,"W"]/dfs[i].loc[:,"GP"]
        dfs[i].loc[:,"FG%"] = dfs[i].loc[:,"FGM"]/dfs[i].loc[:,"FGA"]
        dfs[i].loc[:,"3P%"] = dfs[i].loc[:,"3PM"]/dfs[i].loc[:,"3PA"]
        dfs[i].loc[:,"FT%"] = dfs[i].loc[:,"FTM"]/dfs[i].loc[:,"FTA"]
        dfs[i] = dfs[i].reset_index()
    return dfs

dfs = basic_clean(df, months) # Returns [all months, bubble months, regular months (bubble teams only)]
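
To see why `basic_clean` multiplies by `GP` before summing, here is a minimal sketch with made-up numbers for a single hypothetical team ("Team A"). It compares the games-weighted average with a naive mean of the monthly averages; the column `PTS_total` is introduced only for this illustration.

```python
import pandas as pd

# Hypothetical monthly rows: one game in July, seven in August.
monthly = pd.DataFrame({
    "Team": ["Team A", "Team A"],
    "Month": ["July", "August"],
    "GP": [1, 7],
    "PTS": [120.0, 112.0],  # points per game within each month
})

# Step 1: per-game average -> monthly total.
monthly["PTS_total"] = monthly["PTS"] * monthly["GP"]

# Step 2: sum games and totals across months.
totals = monthly.groupby("Team")[["GP", "PTS_total"]].sum()

# Step 3: divide by total games to get the weighted per-game average.
totals["PTS"] = totals["PTS_total"] / totals["GP"]

weighted = totals.loc["Team A", "PTS"]  # (120*1 + 112*7) / 8
naive = monthly["PTS"].mean()           # (120 + 112) / 2

print(weighted, naive)
```

The naive mean treats the single July game as if it counted as much as all seven August games, which is why the weighted version is used.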

In [5]:
for d in dfs:
    print(d.head())

                Team    GP     W     L     Win %        MIN         PTS  \
0      Atlanta Hawks  67.0  20.0  47.0  0.298507  48.619403  111.759701   
1     Boston Celtics  76.0  52.0  24.0  0.684211  48.409211  113.601316   
2      Brooklyn Nets  76.0  35.0  41.0  0.460526  48.535526  111.448684   
3  Charlotte Hornets  65.0  23.0  42.0  0.353846  48.447692  102.889231   
4      Chicago Bulls  65.0  22.0  43.0  0.338462  48.229231  106.841538   

         FGM        FGA       FG%  ...       DREB        REB        AST  \
0  40.635821  90.540299  0.448815  ...  33.392537  43.262687  23.955224   
1  41.138158  89.392105  0.460199  ...  35.310526  45.934211  22.768421   
2  40.173684  90.321053  0.444788  ...  37.143421  47.723684  24.526316   
3  37.312308  85.952308  0.434105  ...  31.803077  42.775385  23.832308   
4  39.569231  88.658462  0.446311  ...  31.384615  41.850769  23.258462   

         TOV        STL       BLK      BLKA         PF        PFD  Plus/Minus  
0  16.220896   7.8

## Feature Engineering: Advanced Stats

Now that we have the per-game averages for each NBA team throughout the season, let's add some advanced stats of our own that we may want to analyze in the future.

- Three Point Attempt Rate (3PAr)
    - Percentage of a team's field goal attempts taken from behind the arc
    - 3PAr = 3PA/FGA
- Effective Field Goal Percentage (eFG%)
    - eFG% = (FGM + 0.5x3PM)/FGA
        - Adjusts for a made 3PT shot being worth 1.5 times as much as a made 2PT shot
- True Shooting Percentage (TS%)
    - TS% = PTS/[2(FGA + 0.44xFTA)]
        - 0.44 is a constant that estimates the share of free throw attempts that end a possession
        - Factors in value of 3PT shots and FTs
- Offensive Rating (OffRtg)
    - OffRtg = Points per 100 Possessions = 100 x [PTS/Possessions]
        - Possessions = FGA + 0.44xFTA - OREB + TOV
            - Note: offensive rebounds are subtracted because they extend the current possession rather than create a new one.
- PACE
    - Number of possessions per 48 minutes
    - PACE = 48 x Possessions/MIN
    
    
**Source:** https://stats.nba.com/help/glossary/#tspct
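
As a quick worked example of the formulas above, the sketch below plugs in one row of made-up per-game averages (illustrative numbers, not real team data) and computes each stat by hand.

```python
import pandas as pd

# Hypothetical per-game averages for a single team (not real data).
team = pd.DataFrame({
    "PTS": [110.0], "FGM": [40.0], "FGA": [88.0],
    "3PM": [12.0], "3PA": [34.0], "FTA": [25.0],
    "OREB": [10.0], "TOV": [14.0], "MIN": [48.0],
})

three_par = (team["3PA"] / team["FGA"]).iloc[0]
efg = ((team["FGM"] + 0.5 * team["3PM"]) / team["FGA"]).iloc[0]
ts = (team["PTS"] / (2 * (team["FGA"] + 0.44 * team["FTA"]))).iloc[0]

# Possessions = FGA + 0.44 x FTA - OREB + TOV
possessions = (team["FGA"] + 0.44 * team["FTA"] - team["OREB"] + team["TOV"]).iloc[0]
off_rtg = 100 * team["PTS"].iloc[0] / possessions
pace = possessions * 48 / team["MIN"].iloc[0]

print(f"3PAr={three_par:.3f} eFG%={efg:.3f} TS%={ts:.3f}")
print(f"Possessions={possessions:.1f} OffRtg={off_rtg:.1f} PACE={pace:.1f}")
```

With these inputs the possession estimate is 88 + 11 - 10 + 14 = 103, so the team scores about 106.8 points per 100 possessions.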

In [6]:
def add_advanced(dfs, is_list=True, sort_val="OffRtg"):
    """Add the advanced stats listed above to each DataFrame in dfs.

    Parameters:
        - dfs: list of DataFrames (or a single DataFrame if is_list is False)
        - is_list: boolean representing whether or not dfs is a list
        - sort_val: column passed to sort_values before returning

    The new columns are added in place and dfs is returned.
    """
    frames = dfs if is_list else [dfs]
    for d in frames:
        d["3PAr"] = d["3PA"] / d["FGA"]
        d["eFG%"] = (d["FGM"] + 0.5 * d["3PM"]) / d["FGA"]
        d["TS%"] = d["PTS"] / (2 * (d["FGA"] + 0.44 * d["FTA"]))
        possessions = d["FGA"] + 0.44 * d["FTA"] - d["OREB"] + d["TOV"]
        d["PPP"] = d["PTS"] / possessions  # points per possession
        d["OffRtg"] = 100 * d["PPP"]
        d["PACE"] = (possessions * 48) / d["MIN"]
        # Note: sort_values returns a sorted copy; the result is not assigned here,
        # so the row order of d is left unchanged.
        d.sort_values(sort_val, ascending=False)
    return dfs

dfs = add_advanced(dfs)
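
One pandas detail worth knowing when reading `add_advanced`: by default `DataFrame.sort_values` returns a new, sorted DataFrame rather than reordering the original, so calling it without assigning the result leaves the row order as it was. A minimal sketch with made-up ratings (the team names and values are placeholders):

```python
import pandas as pd

# Hypothetical offensive ratings, not real data.
ratings = pd.DataFrame({
    "Team": ["Team A", "Team B", "Team C"],
    "OffRtg": [105.0, 112.0, 108.0],
})

# Result discarded: `ratings` keeps its original order.
ratings.sort_values("OffRtg", ascending=False)
unchanged_order = list(ratings["Team"])

# Assigning the result gives a ranked copy.
ranked = ratings.sort_values("OffRtg", ascending=False).reset_index(drop=True)
ranked_order = list(ranked["Team"])

print(unchanged_order)
print(ranked_order)
```

If a ranked table is wanted, assigning the result as in `ranked` does it without touching the DataFrames that get exported later.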

In [7]:
for d in dfs:
    print(d.head())

                Team    GP     W     L     Win %        MIN         PTS  \
0      Atlanta Hawks  67.0  20.0  47.0  0.298507  48.619403  111.759701   
1     Boston Celtics  76.0  52.0  24.0  0.684211  48.409211  113.601316   
2      Brooklyn Nets  76.0  35.0  41.0  0.460526  48.535526  111.448684   
3  Charlotte Hornets  65.0  23.0  42.0  0.353846  48.447692  102.889231   
4      Chicago Bulls  65.0  22.0  43.0  0.338462  48.229231  106.841538   

         FGM        FGA       FG%  ...      BLKA         PF        PFD  \
0  40.635821  90.540299  0.448815  ...  6.394030  23.107463  20.943284   
1  41.138158  89.392105  0.460199  ...  5.432895  21.715789  20.755263   
2  40.173684  90.321053  0.444788  ...  5.272368  21.002632  21.197368   
3  37.312308  85.952308  0.434105  ...  5.044615  18.815385  20.569231   
4  39.569231  88.658462  0.446311  ...  5.904615  21.783077  19.170769   

   Plus/Minus      3PAr      eFG%       TS%       PPP      OffRtg        PACE  
0   -7.962687  0.398141 

Now we'll export the four DataFrames as CSV files, which go up on GitHub so we can access them from RStudio, which I find much better suited to descriptive analytics and visualization.

In [9]:
df.to_csv("df.csv", sep=",", index=False)
dfs[0].to_csv("df_total.csv", sep=",", index=False)
dfs[1].to_csv("df_bubble.csv", sep=",", index=False)
dfs[2].to_csv("df_reg.csv", sep=",", index=False)  