# COGS 108 - Data Checkpoint

# Names

- Khiem Pham
- Charles Mastrangelo
- Joseph Perez
- Patrick Yeh
- Riki Osawa

<a id='research_question'></a>
# Research Question

How did the 2020 NBA playoffs, played during the coronavirus pandemic at a single neutral site with no fans, affect home-team advantage? How did these changes affect NBA playing tendencies (e.g., shot type, pace, play type such as isolation, post-up, or pick-and-roll, and efficiency)?

# Dataset(s)

- Dataset Name: NBA
- Link to the dataset: https://www.nba.com/stats/teams/boxscores/
- Number of observations: 986

Each observation is one team's box score from a single playoff game, so every game contributes two observations (one home, one away). The data cover the playoffs from 2015 to 2020: 162 observations in 2015, 172 in 2016, 158 in 2017, 164 in 2018, 164 in 2019, and 166 in 2020, for a total of 986.

This dataset, containing a wide array of basketball game statistics, is found on the official website of the NBA.


- Dataset Name: Basketball Reference
- Link to the dataset: https://www.basketball-reference.com/
- Number of observations: 986

As with the NBA dataset, each observation is one team (home or away) in a single playoff game from 2015 to 2020: 81 * 2 in 2015, 86 * 2 in 2016, 79 * 2 in 2017, 82 * 2 in 2018, 82 * 2 in 2019, and 83 * 2 in 2020, for a total of 986.
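The observation counts follow directly from the number of playoff games per season, with two team rows per game. A quick Python check of that arithmetic, using only the per-season game counts listed above:

```python
# Playoff games per season, taken from the breakdown above
games_per_season = {2015: 81, 2016: 86, 2017: 79, 2018: 82, 2019: 82, 2020: 83}

# Each game contributes two observations: one row for the home team, one for the away team
obs_per_season = {year: 2 * games for year, games in games_per_season.items()}

# Pre-COVID seasons are 2015-2019; 2020 is the bubble playoffs
pre_covid_total = sum(n for year, n in obs_per_season.items() if year < 2020)
total = sum(obs_per_season.values())

print(obs_per_season)
print(pre_covid_total, total)  # 820 986
```

The pre-COVID total of 820 is also the row count we expect for the combined pre-COVID DataFrame built later in this notebook.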

Basketball Reference is a premier website containing a comprehensive database of basketball statistics. It comprises information from games played across multiple leagues, including the NBA, ABA, WNBA, and European conferences.

We are considering merging the team offensive and defensive ratings from the Basketball Reference database with the statistics in the NBA dataset. If we decide against this, we will not use the Basketball Reference data at all.
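If we do incorporate the ratings, the merge could look roughly like the sketch below. It assumes, hypothetically, that the Basketball Reference data has been loaded into a DataFrame with one row per team per game and columns named `TEAM`, `GAME DATE`, `ORtg`, and `DRtg`; those column names and the function name `attach_ratings` are our own assumptions. We also believe Basketball Reference uses a few team abbreviations that differ from NBA.com's (for example, BRK rather than BKN for Brooklyn), so the sketch normalizes them first; that mapping should be checked against the real export.

```python
import pandas as pd

# Assumed differences between Basketball Reference and NBA.com team abbreviations;
# verify against the actual data before relying on this mapping.
BBREF_TO_NBA = {'BRK': 'BKN', 'PHO': 'PHX', 'CHO': 'CHA'}

def attach_ratings(box_scores, ratings):
    """Left-join offensive/defensive ratings onto the NBA box scores.

    `ratings` is a hypothetical DataFrame with columns
    'TEAM', 'GAME DATE', 'ORtg', 'DRtg' (one row per team per game).
    """
    ratings = ratings.copy()
    ratings['TEAM'] = ratings['TEAM'].replace(BBREF_TO_NBA)
    # Parse dates on both sides so '4/18/2015' and '2015-04-18' compare equal
    ratings['GAME DATE'] = pd.to_datetime(ratings['GAME DATE'])
    box = box_scores.copy()
    box['GAME DATE'] = pd.to_datetime(box['GAME DATE'])
    # validate='one_to_one' raises if either side has duplicate (TEAM, GAME DATE) keys
    return box.merge(ratings[['TEAM', 'GAME DATE', 'ORtg', 'DRtg']],
                     on=['TEAM', 'GAME DATE'], how='left', validate='one_to_one')
```

Usage would be along the lines of `attach_ratings(df14_19, bbref_ratings)`, where `bbref_ratings` is the (hypothetical) loaded Basketball Reference table. A left join keeps every NBA box score even if a rating is missing, which makes unmatched rows easy to spot as `NaN`.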


# Setup

In [None]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import math
import seaborn as sns
sns.set()
sns.set_context('talk')

import warnings
warnings.filterwarnings('ignore')
import patsy
import statsmodels.api as sm
import scipy.stats as stats
from scipy.stats import ttest_ind, chisquare, normaltest

# Data Cleaning

We obtained our data directly from the NBA's official website. It was already clean and in tidy format, with each relevant game statistic in its own column. To get the data into a usable format, we copied it from the website into Microsoft Excel and saved each season as a CSV file. We then uploaded the CSV files to GitHub for ease of access.
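Because every season's CSV lives in the same GitHub repository under a predictable name, the six loading cells below could be collapsed into a loop. This is a sketch: the file-name pattern is inferred from the URLs used in this notebook, the helper names are ours, and passing `index_col=0` (our choice, not what the cells below do) reads the CSV's leading unnamed column as the row index instead of as an `Unnamed: 0` column.

```python
import pandas as pd

# Base location of the CSVs, taken from the URLs used in this notebook
BASE_URL = 'https://raw.githubusercontent.com/kpham841/NBA-Playoffs-Data/main/'
SEASONS = ['2014-15', '2015-16', '2016-17', '2017-18', '2018-19', '2019-20']

def season_url(season):
    # Spaces in the file name are percent-encoded as %20
    return f'{BASE_URL}NBA%20Playoffs%20{season}.csv'

def load_seasons(seasons=SEASONS):
    # index_col=0 treats the leading unnamed column as the row index
    return {season: pd.read_csv(season_url(season), index_col=0) for season in seasons}
```

For example, `frames = load_seasons()` followed by `frames['2019-20'].head()` would show the bubble playoffs. The `home_away` function defined below still works on these frames, since it never selects the `Unnamed: 0` column.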

Because we were unfamiliar with NBA conventions, we had to interpret some of the data ourselves. In particular, the "MATCH UP" column uses two formats: "Team 1 @ Team 2" and "Team 1 vs. Team 2." In both, Team 1 is the team listed in the "TEAM" column. "@" means that team played away, at Team 2's arena, while "vs." means that team played at home. To make the home and away teams explicit, we wrote a function that adds a "HOME/AWAY" column and places it third in each DataFrame. This lets us group home and away games separately for our analysis.
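The home/away rule can also be applied without a row-by-row loop. The sketch below is an alternative to the notebook's `home_away` function, not the version we use; `home_away_vectorized` is a hypothetical name. Like the original, it finds the home team from the "MATCH UP" string (after "@", or before "vs.") and compares it with "TEAM".

```python
import numpy as np
import pandas as pd

def home_away_vectorized(df):
    """Return a copy of df with a HOME/AWAY column, using vectorized string ops."""
    out = df.copy()
    matchup = out['MATCH UP']
    is_at = matchup.str.contains('@', regex=False)
    # In "A @ B" the home team is B (last 3 characters); in "A vs. B" it is A (first 3)
    home_team = np.where(is_at, matchup.str[-3:], matchup.str[:3])
    out['HOME/AWAY'] = np.where(out['TEAM'].to_numpy() == home_team, 'Home', 'Away')
    return out
```

On the real data, `home_away_vectorized(df14_15)['HOME/AWAY']` should agree with the column produced by `home_away`, since both apply the same rule; the vectorized form simply avoids `iterrows`, which is slow on larger tables.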

To answer our research question, we must compare pre-COVID NBA playoff games with the 2020 COVID playoff games. We currently have the five pre-COVID playoff seasons in five separate CSV files. This granularity will be useful for season-level analyses, but we have also combined all pre-COVID games into a single DataFrame.
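One way to keep the per-season granularity while still having a single pre-COVID table is to concatenate with keys, so each row records its season. A sketch, assuming the season DataFrames are collected in a dict keyed by season label; the function name `combine_seasons` and the `SEASON` column name are our own choices.

```python
import pandas as pd

def combine_seasons(frames):
    """Stack per-season DataFrames, adding a SEASON column from the dict keys."""
    # With a dict, pd.concat uses the keys as the outer level of a MultiIndex
    combined = pd.concat(frames, names=['SEASON', 'ROW'])
    # Move SEASON out of the index into an ordinary column, then renumber rows 0..n-1
    return combined.reset_index(level='SEASON').reset_index(drop=True)
```

For example, `combine_seasons({'2014-15': df14_15, '2015-16': df15_16, '2016-17': df16_17, '2017-18': df17_18, '2018-19': df18_19})` would give the same 820 rows as the combined DataFrame below, plus a `SEASON` column that `groupby('SEASON')` can use.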



In [None]:
df14_15 = pd.read_csv('https://raw.githubusercontent.com/kpham841/NBA-Playoffs-Data/main/NBA%20Playoffs%202014-15.csv')
df14_15.head()

Unnamed: 0,TEAM,MATCH UP,GAME DATE,Win/Loss,MIN,PTS,FGM,FGA,FG%,3PM,3PA,3P%,FTM,FTA,FT%,OREB,DREB,REB,AST,STL,BLK,TOV,PF,Plus/Minus
0,DAL,DAL @ HOU,4/18/2015,L,240,108,44,99,44.4,6,17,35.3,14,17,82.4,14,35,49,19,5,4,17,28,-10
1,NOP,NOP @ GSW,4/18/2015,L,240,99,35,83,42.2,9,22,40.9,20,25,80.0,10,34,44,24,8,5,14,28,-7
2,TOR,TOR vs. WAS,4/18/2015,L,265,86,35,92,38.0,6,29,20.7,10,14,71.4,10,38,48,21,5,4,12,21,-7
3,HOU,HOU vs. DAL,4/18/2015,W,240,118,38,85,44.7,10,25,40.0,32,45,71.1,8,36,44,26,11,9,13,22,10
4,CHI,CHI vs. MIL,4/18/2015,W,240,103,38,83,45.8,12,32,37.5,15,22,68.2,10,42,52,30,8,7,19,21,12


In [None]:
df15_16 = pd.read_csv('https://raw.githubusercontent.com/kpham841/NBA-Playoffs-Data/main/NBA%20Playoffs%202015-16.csv')
df15_16.head()

Unnamed: 0,TEAM,MATCH UP,GAME DATE,Win/Loss,MIN,PTS,FGM,FGA,FG%,3PM,3PA,3P%,FTM,FTA,FT%,OREB,DREB,REB,AST,STL,BLK,TOV,PF,Plus/Minus
0,OKC,OKC vs. DAL,4/16/2016,W,240,108,36,80,45.0,12,35,34.3,24,28,85.7,14,42,56,23,7,6,18,25,38
1,BOS,BOS @ ATL,4/16/2016,L,240,101,37,102,36.3,11,35,31.4,16,19,84.2,15,35,50,27,6,6,10,32,-1
2,ATL,ATL vs. BOS,4/16/2016,W,240,102,35,86,40.7,5,26,19.2,27,35,77.1,13,40,53,23,4,9,12,20,1
3,IND,IND @ TOR,4/16/2016,W,240,100,34,79,43.0,11,21,52.4,21,29,72.4,9,29,38,19,11,8,13,25,10
4,TOR,TOR vs. IND,4/16/2016,L,240,90,30,79,38.0,4,19,21.1,26,38,68.4,20,32,52,17,6,5,20,27,-10


In [None]:
df16_17 = pd.read_csv('https://raw.githubusercontent.com/kpham841/NBA-Playoffs-Data/main/NBA%20Playoffs%202016-17.csv')
df16_17.head()

Unnamed: 0,TEAM,MATCH UP,GAME DATE,Win/Loss,MIN,PTS,FGM,FGA,FG%,3PM,3PA,3P%,FTM,FTA,FT%,OREB,DREB,REB,AST,STL,BLK,TOV,PF,Plus/Minus
0,LAC,LAC vs. UTA,4/15/2017,L,240,95,36,81,44.4,8,24,33.3,15,17,88.2,8,32,40,20,6,4,15,18,-2
1,IND,IND @ CLE,4/15/2017,L,240,108,40,81,49.4,11,24,45.8,17,20,85.0,12,29,41,21,6,1,13,25,-1
2,MIL,MIL @ TOR,4/15/2017,W,240,97,38,85,44.7,9,23,39.1,12,15,80.0,9,34,43,22,5,5,5,23,14
3,MEM,MEM @ SAS,4/15/2017,L,240,82,31,79,39.2,7,20,35.0,13,17,76.5,8,27,35,17,7,6,9,20,-29
4,SAS,SAS vs. MEM,4/15/2017,W,240,111,41,76,53.9,10,19,52.6,19,25,76.0,8,34,42,16,4,11,10,20,29


In [None]:
df17_18 = pd.read_csv('https://raw.githubusercontent.com/kpham841/NBA-Playoffs-Data/main/NBA%20Playoffs%202017-18.csv')
df17_18.head()

Unnamed: 0,TEAM,MATCH UP,GAME DATE,Win/Loss,MIN,PTS,FGM,FGA,FG%,3PM,3PA,3P%,FTM,FTA,FT%,OREB,DREB,REB,AST,STL,BLK,TOV,PF,Plus/Minus
0,WAS,WAS @ TOR,4/14/2018,L,240,106,41,86,47.7,8,21,38.1,16,18,88.9,6,29,35,29,11,3,14,21,-8
1,TOR,TOR vs. WAS,4/14/2018,W,240,114,41,77,53.2,16,30,53.3,16,20,80.0,5,33,38,26,6,7,17,18,8
2,SAS,SAS @ GSW,4/14/2018,L,240,92,32,80,40.0,9,22,40.9,19,24,79.2,3,27,30,19,9,4,13,20,-21
3,PHI,PHI vs. MIA,4/14/2018,W,240,130,45,95,47.4,18,28,64.3,22,29,75.9,17,33,50,34,9,4,11,23,27
4,POR,POR vs. NOP,4/14/2018,L,240,95,37,98,37.8,12,39,30.8,9,12,75.0,15,37,52,17,10,6,12,15,-2


In [None]:
df18_19 = pd.read_csv('https://raw.githubusercontent.com/kpham841/NBA-Playoffs-Data/main/NBA%20Playoffs%202018-19.csv')
df18_19.head()

Unnamed: 0,TEAM,MATCH UP,GAME DATE,Win/Loss,MIN,PTS,FGM,FGA,FG%,3PM,3PA,3P%,FTM,FTA,FT%,OREB,DREB,REB,AST,STL,BLK,TOV,PF,Plus/Minus
0,UTA,UTA @ HOU,4/14/2019,L,240,90,30,77,39.0,7,27,25.9,23,27,85.2,7,34,41,17,6,2,19,20,-32
1,UTA,UTA vs. HOU,4/22/2019,W,240,107,37,86,43.0,11,35,31.4,22,26,84.6,16,36,52,24,7,7,15,19,16
2,UTA,UTA @ HOU,4/24/2019,L,240,93,35,94,37.2,9,38,23.7,14,19,73.7,14,33,47,26,8,4,16,22,-7
3,UTA,UTA @ HOU,4/17/2019,L,240,98,39,98,39.8,8,38,21.1,12,18,66.7,15,33,48,27,16,3,12,26,-20
4,UTA,UTA vs. HOU,4/20/2019,L,240,101,32,77,41.6,12,41,29.3,25,38,65.8,7,41,48,21,9,11,14,22,-3


In [None]:
df19_20 = pd.read_csv('https://raw.githubusercontent.com/kpham841/NBA-Playoffs-Data/main/NBA%20Playoffs%202019-20.csv')
df19_20.head()

Unnamed: 0,TEAM,MATCH UP,GAME DATE,Win/Loss,MIN,PTS,FGM,FGA,FG%,3PM,3PA,3P%,FTM,FTA,FT%,OREB,DREB,REB,AST,STL,BLK,TOV,PF,Plus/Minus
0,TOR,TOR vs. BKN,8/17/2020,W,240,134,40,85,47.1,22,44,50.0,32,33,97.0,9,38,47,26,4,6,12,22,24
1,BOS,BOS vs. PHI,8/17/2020,W,240,109,38,90,42.2,10,31,32.3,23,26,88.5,16,27,43,22,12,4,7,21,8
2,DAL,DAL @ LAC,8/17/2020,L,240,110,37,81,45.7,15,43,34.9,21,24,87.5,5,36,41,18,9,2,21,21,-8
3,DEN,DEN vs. UTA,8/17/2020,W,265,135,49,95,51.6,22,41,53.7,15,18,83.3,8,33,41,23,10,6,12,22,10
4,PHI,PHI @ BOS,8/17/2020,L,240,101,37,80,46.3,9,27,33.3,18,23,78.3,15,35,50,23,5,3,18,24,-8


In [None]:
# Return a copy of the DataFrame with a new HOME/AWAY column placed third.
# In "MATCH UP", "A @ B" means A played at B (A away, B home);
# "A vs. B" means A hosted B (A home, B away).
def home_away(df):
    df = df.copy()  # avoid modifying the caller's DataFrame in place
    home_away_labels = []
    for _, row in df.iterrows():
        matchup, team = row['MATCH UP'], row['TEAM']
        if '@' in matchup:
            # The home team is listed after the '@'
            home_away_labels.append('Home' if matchup[-3:] == team else 'Away')
        else:
            # The home team is listed before 'vs.'
            home_away_labels.append('Home' if matchup[:3] == team else 'Away')
    df['HOME/AWAY'] = home_away_labels
    # Reorder columns so HOME/AWAY is third; this also drops the 'Unnamed: 0' column
    return df[['TEAM', 'MATCH UP', 'HOME/AWAY', 'GAME DATE', 'Win/Loss', 'MIN', 'PTS',
               'FGM', 'FGA', 'FG%', '3PM', '3PA', '3P%', 'FTM', 'FTA', 'FT%',
               'OREB', 'DREB', 'REB', 'AST', 'STL', 'BLK', 'TOV', 'PF', 'Plus/Minus']]

In [None]:
# Create a new column that indicates if the team was the Home or Away team
df14_15 = home_away(df14_15)
df15_16 = home_away(df15_16)
df16_17 = home_away(df16_17)
df17_18 = home_away(df17_18)
df18_19 = home_away(df18_19)
df19_20 = home_away(df19_20)

In [None]:
df14_15.head()

Unnamed: 0,TEAM,MATCH UP,HOME/AWAY,GAME DATE,Win/Loss,MIN,PTS,FGM,FGA,FG%,3PM,3PA,3P%,FTM,FTA,FT%,OREB,DREB,REB,AST,STL,BLK,TOV,PF,Plus/Minus
0,DAL,DAL @ HOU,Away,4/18/2015,L,240,108,44,99,44.4,6,17,35.3,14,17,82.4,14,35,49,19,5,4,17,28,-10
1,NOP,NOP @ GSW,Away,4/18/2015,L,240,99,35,83,42.2,9,22,40.9,20,25,80.0,10,34,44,24,8,5,14,28,-7
2,TOR,TOR vs. WAS,Home,4/18/2015,L,265,86,35,92,38.0,6,29,20.7,10,14,71.4,10,38,48,21,5,4,12,21,-7
3,HOU,HOU vs. DAL,Home,4/18/2015,W,240,118,38,85,44.7,10,25,40.0,32,45,71.1,8,36,44,26,11,9,13,22,10
4,CHI,CHI vs. MIL,Home,4/18/2015,W,240,103,38,83,45.8,12,32,37.5,15,22,68.2,10,42,52,30,8,7,19,21,12


In [None]:
df15_16.head()

Unnamed: 0,TEAM,MATCH UP,HOME/AWAY,GAME DATE,Win/Loss,MIN,PTS,FGM,FGA,FG%,3PM,3PA,3P%,FTM,FTA,FT%,OREB,DREB,REB,AST,STL,BLK,TOV,PF,Plus/Minus
0,OKC,OKC vs. DAL,Home,4/16/2016,W,240,108,36,80,45.0,12,35,34.3,24,28,85.7,14,42,56,23,7,6,18,25,38
1,BOS,BOS @ ATL,Away,4/16/2016,L,240,101,37,102,36.3,11,35,31.4,16,19,84.2,15,35,50,27,6,6,10,32,-1
2,ATL,ATL vs. BOS,Home,4/16/2016,W,240,102,35,86,40.7,5,26,19.2,27,35,77.1,13,40,53,23,4,9,12,20,1
3,IND,IND @ TOR,Away,4/16/2016,W,240,100,34,79,43.0,11,21,52.4,21,29,72.4,9,29,38,19,11,8,13,25,10
4,TOR,TOR vs. IND,Home,4/16/2016,L,240,90,30,79,38.0,4,19,21.1,26,38,68.4,20,32,52,17,6,5,20,27,-10


In [None]:
df16_17.head()

Unnamed: 0,TEAM,MATCH UP,HOME/AWAY,GAME DATE,Win/Loss,MIN,PTS,FGM,FGA,FG%,3PM,3PA,3P%,FTM,FTA,FT%,OREB,DREB,REB,AST,STL,BLK,TOV,PF,Plus/Minus
0,LAC,LAC vs. UTA,Home,4/15/2017,L,240,95,36,81,44.4,8,24,33.3,15,17,88.2,8,32,40,20,6,4,15,18,-2
1,IND,IND @ CLE,Away,4/15/2017,L,240,108,40,81,49.4,11,24,45.8,17,20,85.0,12,29,41,21,6,1,13,25,-1
2,MIL,MIL @ TOR,Away,4/15/2017,W,240,97,38,85,44.7,9,23,39.1,12,15,80.0,9,34,43,22,5,5,5,23,14
3,MEM,MEM @ SAS,Away,4/15/2017,L,240,82,31,79,39.2,7,20,35.0,13,17,76.5,8,27,35,17,7,6,9,20,-29
4,SAS,SAS vs. MEM,Home,4/15/2017,W,240,111,41,76,53.9,10,19,52.6,19,25,76.0,8,34,42,16,4,11,10,20,29


In [None]:
df17_18.head()

Unnamed: 0,TEAM,MATCH UP,HOME/AWAY,GAME DATE,Win/Loss,MIN,PTS,FGM,FGA,FG%,3PM,3PA,3P%,FTM,FTA,FT%,OREB,DREB,REB,AST,STL,BLK,TOV,PF,Plus/Minus
0,WAS,WAS @ TOR,Away,4/14/2018,L,240,106,41,86,47.7,8,21,38.1,16,18,88.9,6,29,35,29,11,3,14,21,-8
1,TOR,TOR vs. WAS,Home,4/14/2018,W,240,114,41,77,53.2,16,30,53.3,16,20,80.0,5,33,38,26,6,7,17,18,8
2,SAS,SAS @ GSW,Away,4/14/2018,L,240,92,32,80,40.0,9,22,40.9,19,24,79.2,3,27,30,19,9,4,13,20,-21
3,PHI,PHI vs. MIA,Home,4/14/2018,W,240,130,45,95,47.4,18,28,64.3,22,29,75.9,17,33,50,34,9,4,11,23,27
4,POR,POR vs. NOP,Home,4/14/2018,L,240,95,37,98,37.8,12,39,30.8,9,12,75.0,15,37,52,17,10,6,12,15,-2


In [None]:
df18_19.head()

Unnamed: 0,TEAM,MATCH UP,HOME/AWAY,GAME DATE,Win/Loss,MIN,PTS,FGM,FGA,FG%,3PM,3PA,3P%,FTM,FTA,FT%,OREB,DREB,REB,AST,STL,BLK,TOV,PF,Plus/Minus
0,UTA,UTA @ HOU,Away,4/14/2019,L,240,90,30,77,39.0,7,27,25.9,23,27,85.2,7,34,41,17,6,2,19,20,-32
1,UTA,UTA vs. HOU,Home,4/22/2019,W,240,107,37,86,43.0,11,35,31.4,22,26,84.6,16,36,52,24,7,7,15,19,16
2,UTA,UTA @ HOU,Away,4/24/2019,L,240,93,35,94,37.2,9,38,23.7,14,19,73.7,14,33,47,26,8,4,16,22,-7
3,UTA,UTA @ HOU,Away,4/17/2019,L,240,98,39,98,39.8,8,38,21.1,12,18,66.7,15,33,48,27,16,3,12,26,-20
4,UTA,UTA vs. HOU,Home,4/20/2019,L,240,101,32,77,41.6,12,41,29.3,25,38,65.8,7,41,48,21,9,11,14,22,-3


In [None]:
df19_20.head()

Unnamed: 0,TEAM,MATCH UP,HOME/AWAY,GAME DATE,Win/Loss,MIN,PTS,FGM,FGA,FG%,3PM,3PA,3P%,FTM,FTA,FT%,OREB,DREB,REB,AST,STL,BLK,TOV,PF,Plus/Minus
0,TOR,TOR vs. BKN,Home,8/17/2020,W,240,134,40,85,47.1,22,44,50.0,32,33,97.0,9,38,47,26,4,6,12,22,24
1,BOS,BOS vs. PHI,Home,8/17/2020,W,240,109,38,90,42.2,10,31,32.3,23,26,88.5,16,27,43,22,12,4,7,21,8
2,DAL,DAL @ LAC,Away,8/17/2020,L,240,110,37,81,45.7,15,43,34.9,21,24,87.5,5,36,41,18,9,2,21,21,-8
3,DEN,DEN vs. UTA,Home,8/17/2020,W,265,135,49,95,51.6,22,41,53.7,15,18,83.3,8,33,41,23,10,6,12,22,10
4,PHI,PHI @ BOS,Away,8/17/2020,L,240,101,37,80,46.3,9,27,33.3,18,23,78.3,15,35,50,23,5,3,18,24,-8


In [None]:
# Create a single DataFrame for all pre-COVID games.
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result.
df14_16 = pd.concat([df14_15, df15_16])
df14_17 = pd.concat([df14_16, df16_17])
df14_18 = pd.concat([df14_17, df17_18])
df14_19 = pd.concat([df14_18, df18_19])
print(df14_19.shape)
df14_19.head()

(820, 25)


Unnamed: 0,TEAM,MATCH UP,HOME/AWAY,GAME DATE,Win/Loss,MIN,PTS,FGM,FGA,FG%,3PM,3PA,3P%,FTM,FTA,FT%,OREB,DREB,REB,AST,STL,BLK,TOV,PF,Plus/Minus
0,DAL,DAL @ HOU,Away,4/18/2015,L,240,108,44,99,44.4,6,17,35.3,14,17,82.4,14,35,49,19,5,4,17,28,-10
1,NOP,NOP @ GSW,Away,4/18/2015,L,240,99,35,83,42.2,9,22,40.9,20,25,80.0,10,34,44,24,8,5,14,28,-7
2,TOR,TOR vs. WAS,Home,4/18/2015,L,265,86,35,92,38.0,6,29,20.7,10,14,71.4,10,38,48,21,5,4,12,21,-7
3,HOU,HOU vs. DAL,Home,4/18/2015,W,240,118,38,85,44.7,10,25,40.0,32,45,71.1,8,36,44,26,11,9,13,22,10
4,CHI,CHI vs. MIL,Home,4/18/2015,W,240,103,38,83,45.8,12,32,37.5,15,22,68.2,10,42,52,30,8,7,19,21,12
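With all pre-COVID games in one DataFrame, a first measure of home-court advantage is the share of games won by home and away teams, along with their average plus/minus. A minimal sketch (the function name is ours); applying it to `df14_19` and to `df19_20` would put the pre-COVID and bubble numbers side by side. In the 2020 bubble the "Home" label is nominal, which is exactly what this comparison is meant to probe.

```python
import pandas as pd

def home_advantage_summary(df):
    """Win rate, mean plus/minus, and row count for Home vs. Away rows."""
    won = df['Win/Loss'].eq('W')  # True for wins, False for losses
    return (df.assign(WON=won)
              .groupby('HOME/AWAY')
              .agg(win_rate=('WON', 'mean'),
                   mean_plus_minus=('Plus/Minus', 'mean'),
                   n=('WON', 'count')))
```

Because each game has one home and one away row, the two win rates sum to 1 within a dataset, so the home win rate alone summarizes the advantage; plus/minus adds information about margin of victory.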


In [None]:
# Check for any null values across all seasons, including 2020
print(pd.concat([df14_19, df19_20]).isnull().values.any())

False
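Beyond missing values, a useful structural check is that every game contributes exactly one Home row and one Away row. A sketch, assuming a game can be identified by its date plus the pair of teams involved; the function name `check_home_away_pairs` and the `PAIR` key are our own.

```python
import pandas as pd

def check_home_away_pairs(df):
    """Return games that do not have exactly one Home and one Away row."""
    # Sorted pair of team codes, so "DAL @ HOU" and "HOU vs. DAL" share a key
    teams = df['MATCH UP'].str.replace(' vs. ', ' @ ', regex=False).str.split(' @ ')
    pair = teams.apply(lambda t: '-'.join(sorted(t)))
    counts = (df.assign(PAIR=pair)
                .groupby(['GAME DATE', 'PAIR'])['HOME/AWAY']
                .value_counts()
                .unstack(fill_value=0)
                .reindex(columns=['Home', 'Away'], fill_value=0))
    bad = (counts['Home'] != 1) | (counts['Away'] != 1)
    return counts[bad]
```

Running `check_home_away_pairs(pd.concat([df14_19, df19_20]))` should return an empty DataFrame if every game was captured from both teams' perspectives.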
