# Exploratory Analysis on NBA Shot Log Data

This notebook performs some exploratory analysis on the 2014-2015 NBA shot log data from Kaggle (credit to Dan Becker).

# Step 1: Loading the Data

In [3]:
import pandas as pd

In [4]:
nba_data = pd.read_csv('E:/Random_Projects/NBA-Stuff/data/shot_logs.csv')

# Step 2: Properties of the Data 

Some questions to consider:
* What are the data types of each column (format, etc.)?
* What attributes are missing, if any?

Using the `info` method, we can see that the shot clock column has missing values, and we can tell which columns are numeric and which are non-numeric.

In [12]:
nba_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 128069 entries, 0 to 128068
Data columns (total 21 columns):
GAME_ID                       128069 non-null int64
MATCHUP                       128069 non-null object
LOCATION                      128069 non-null object
W                             128069 non-null object
FINAL_MARGIN                  128069 non-null int64
SHOT_NUMBER                   128069 non-null int64
PERIOD                        128069 non-null int64
GAME_CLOCK                    128069 non-null object
SHOT_CLOCK                    122502 non-null float64
DRIBBLES                      128069 non-null int64
TOUCH_TIME                    128069 non-null float64
SHOT_DIST                     128069 non-null float64
PTS_TYPE                      128069 non-null int64
SHOT_RESULT                   128069 non-null object
CLOSEST_DEFENDER              128069 non-null object
CLOSEST_DEFENDER_PLAYER_ID    128069 non-null int64
CLOSE_DEF_DIST                128069 non-null
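Because `info` reports non-null counts rather than missing counts, the gap for `SHOT_CLOCK` has to be worked out by subtraction: 128069 - 122502 = 5567 missing values. A minimal sketch of getting the missing counts and the numeric/non-numeric split directly is shown below. It uses a small hand-built stand-in frame with values copied from a few of the rows displayed in this notebook; in the notebook itself the same calls would go on `nba_data`.

```python
import numpy as np
import pandas as pd

# Small stand-in for nba_data, built from rows shown in this notebook
# (in the notebook itself, call the same methods on nba_data).
sample = pd.DataFrame({
    'GAME_ID': [21400899, 21400899, 21400899, 21400006],
    'SHOT_CLOCK': [10.8, 3.4, np.nan, np.nan],
    'SHOT_DIST': [7.7, 28.2, 10.1, 5.1],
    'SHOT_RESULT': ['made', 'missed', 'missed', 'made'],
})

# Count missing values per column and keep only the columns that have gaps.
missing = sample.isna().sum()
missing = missing[missing > 0]
print(missing)

# Split the columns into numeric and non-numeric groups by dtype.
numeric_cols = sample.select_dtypes(include='number').columns.tolist()
other_cols = sample.select_dtypes(exclude='number').columns.tolist()
print('numeric:', numeric_cols)
print('non-numeric:', other_cols)
```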

In [5]:
nba_data.head(5)

Unnamed: 0,GAME_ID,MATCHUP,LOCATION,W,FINAL_MARGIN,SHOT_NUMBER,PERIOD,GAME_CLOCK,SHOT_CLOCK,DRIBBLES,...,SHOT_DIST,PTS_TYPE,SHOT_RESULT,CLOSEST_DEFENDER,CLOSEST_DEFENDER_PLAYER_ID,CLOSE_DEF_DIST,FGM,PTS,player_name,player_id
0,21400899,"MAR 04, 2015 - CHA @ BKN",A,W,24,1,1,1:09,10.8,2,...,7.7,2,made,"Anderson, Alan",101187,1.3,1,2,brian roberts,203148
1,21400899,"MAR 04, 2015 - CHA @ BKN",A,W,24,2,1,0:14,3.4,0,...,28.2,3,missed,"Bogdanovic, Bojan",202711,6.1,0,0,brian roberts,203148
2,21400899,"MAR 04, 2015 - CHA @ BKN",A,W,24,3,1,0:00,,3,...,10.1,2,missed,"Bogdanovic, Bojan",202711,0.9,0,0,brian roberts,203148
3,21400899,"MAR 04, 2015 - CHA @ BKN",A,W,24,4,2,11:47,10.3,2,...,17.2,2,missed,"Brown, Markel",203900,3.4,0,0,brian roberts,203148
4,21400899,"MAR 04, 2015 - CHA @ BKN",A,W,24,5,2,10:34,10.9,2,...,3.7,2,missed,"Young, Thaddeus",201152,1.1,0,0,brian roberts,203148


In [35]:
nba_data.tail(10)

Unnamed: 0,Game Id,Matchup,Location,W,Final Margin,Shot Number,Period,Game Clock,Shot Clock,Dribbles,...,Shot Dist,Pts Type,Shot Result,Closest Defender,Closest Defender Player Id,Close Def Dist,Fgm,Pts,Player Name,Player Id
128059,21400033,"NOV 01, 2014 - BKN @ DET",A,W,12,4,4,8:34,19.8,0,...,22.7,3,missed,"Augustin, D.J.",201571,4.0,0,0,jarrett jack,101127
128060,21400006,"OCT 29, 2014 - BKN @ BOS",A,L,-16,1,1,1:59,11.4,16,...,12.6,2,missed,"Rondo, Rajon",200765,4.8,0,0,jarrett jack,101127
128061,21400006,"OCT 29, 2014 - BKN @ BOS",A,L,-16,2,2,10:10,19.0,0,...,7.4,2,missed,"Bradley, Avery",202340,2.7,0,0,jarrett jack,101127
128062,21400006,"OCT 29, 2014 - BKN @ BOS",A,L,-16,3,2,7:46,7.0,1,...,14.5,2,made,"Smart, Marcus",203935,3.1,1,2,jarrett jack,101127
128063,21400006,"OCT 29, 2014 - BKN @ BOS",A,L,-16,4,2,5:05,15.3,2,...,8.9,2,made,"Sullinger, Jared",203096,5.7,1,2,jarrett jack,101127
128064,21400006,"OCT 29, 2014 - BKN @ BOS",A,L,-16,5,3,1:52,18.3,5,...,8.7,2,missed,"Smart, Marcus",203935,0.8,0,0,jarrett jack,101127
128065,21400006,"OCT 29, 2014 - BKN @ BOS",A,L,-16,6,4,11:28,19.8,4,...,0.6,2,made,"Turner, Evan",202323,0.6,1,2,jarrett jack,101127
128066,21400006,"OCT 29, 2014 - BKN @ BOS",A,L,-16,7,4,11:10,23.0,2,...,16.9,2,made,"Thornton, Marcus",201977,4.2,1,2,jarrett jack,101127
128067,21400006,"OCT 29, 2014 - BKN @ BOS",A,L,-16,8,4,2:37,9.1,4,...,18.3,2,missed,"Bradley, Avery",202340,3.0,0,0,jarrett jack,101127
128068,21400006,"OCT 29, 2014 - BKN @ BOS",A,L,-16,9,4,0:12,,5,...,5.1,2,made,"Bradley, Avery",202340,2.3,1,2,jarrett jack,101127


# Step 3: Data Cleaning

Now we're going to make some adjustments to make the data easier to read.

In [6]:
# fix column names: replace underscores with spaces, then title-case each word
# (e.g. 'GAME_ID' -> 'Game Id')
column_list = list(nba_data.columns)

proper_col = [col.replace('_', ' ').title() for col in column_list]

nba_data.columns = proper_col
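The same renaming can also be done without a list comprehension, using pandas' vectorized string methods on the column `Index`. A small sketch on a toy frame whose column names are copied from this dataset:

```python
import pandas as pd

# Toy frame with a few of the original column names from the shot log data
df = pd.DataFrame(columns=['GAME_ID', 'SHOT_CLOCK', 'CLOSE_DEF_DIST', 'player_name'])

# Index.str applies the string method to every column label at once:
# underscores become spaces, then each word is title-cased.
df.columns = df.columns.str.replace('_', ' ', regex=False).str.title()

print(list(df.columns))
```

One side effect of `title()` worth knowing: acronyms lose their capitalization, which is why `FGM` and `PTS` appear as `Fgm` and `Pts` in the tail output above.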

The `Matchup` column packs several pieces of information that can be split into separate columns. Right now it is not clear which team is the home team and which is the away team. The column contains the following parts:

* Date

* Home Team

* Away Team
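As an aside, all three parts can also be pulled out in one vectorized step with `Series.str.extract`. This is a sketch under assumptions: it expects strings shaped like the rows above (`'MAR 04, 2015 - CHA @ BKN'`), where, in the usual `away @ home` notation, the team before `@` is the away team, which matches `LOCATION` being `A` for those rows. The `vs.` separator for home games is an assumption about rows not shown here, and the `'BKN vs. MIN'` string is a hypothetical example. Strings matching neither form come back as `NaN`.

```python
import pandas as pd

# Matchup strings copied from the rows shown above, plus one hypothetical
# home-game string using an assumed 'vs.' separator.
matchup = pd.Series([
    'MAR 04, 2015 - CHA @ BKN',
    'OCT 29, 2014 - BKN @ BOS',
    'NOV 05, 2014 - BKN vs. MIN',
])

# Named groups become the column names of the extracted DataFrame.
pattern = (
    r'^(?P<date>[A-Z]{3} \d{2}, \d{4}) - '
    r'(?P<team>[A-Z]+) (?P<sep>@|vs\.) (?P<opponent>[A-Z]+)$'
)
parts = matchup.str.extract(pattern)

# With '@' the first team is away and the second is home;
# with the assumed 'vs.' form, the first team is home.
is_away = parts['sep'] == '@'
parts['home_team'] = parts['opponent'].where(is_away, parts['team'])
parts['away_team'] = parts['team'].where(is_away, parts['opponent'])

print(parts[['date', 'home_team', 'away_team']])
```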

We'll write functions that can be applied to the series to extract each part, but first let's try it on a single example.

In [8]:
matchup = nba_data['Matchup']

In [9]:
sample_date = matchup[0].split(' - ')[0]

print(sample_date)

MAR 04, 2015


Now we can use the `datetime` module to parse the date string.

In [10]:
from datetime import datetime

parsed_date = (datetime.strptime(sample_date, '%b %d, %Y')).date()

print(parsed_date)

2015-03-04


With that working, we can wrap the logic in a function that returns the parsed date for any matchup string.

In [15]:
def get_date(s):
    """Return the game date from a matchup string like 'MAR 04, 2015 - CHA @ BKN'."""
    date_str = s.split(' - ')[0]
    return datetime.strptime(date_str, '%b %d, %Y').date()

In [17]:
game_date = nba_data['Matchup'].apply(get_date)

nba_data['Date'] = game_date
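`apply` calls `get_date` once per row, which works here but is a Python-level loop over roughly 128k rows. A sketch of a vectorized alternative uses `str.split` and `pd.to_datetime` with the same format string. `pd.to_datetime` returns pandas `Timestamp`s, so `.dt.date` is added to match the `datetime.date` objects that `get_date` returns. Like the `strptime` call above, this assumes an English locale for the `%b` month names.

```python
from datetime import date

import pandas as pd

# A few matchup strings copied from the rows shown above
matchup = pd.Series([
    'MAR 04, 2015 - CHA @ BKN',
    'NOV 01, 2014 - BKN @ DET',
    'OCT 29, 2014 - BKN @ BOS',
])

# Keep the text before ' - ' and parse it with the same format as get_date.
date_str = matchup.str.split(' - ').str[0]
timestamps = pd.to_datetime(date_str, format='%b %d, %Y')

# .dt.date converts each Timestamp to a datetime.date, matching get_date.
game_date = timestamps.dt.date
print(game_date.tolist())
```

Keeping `Timestamp`s (that is, dropping `.dt.date`) is often more convenient for later analysis, since the `.dt` accessor then exposes fields such as month and weekday directly.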