# NBA SHOT LOGS

## The Dataset

The dataset contains information on shots taken during the 2014-2015 NBA season, with columns recording who took the shot, where on the court it was taken from, who the nearest defender was, and so on. There are 21 columns and 128,069 observations.

The data was downloaded from https://www.kaggle.com/dansbecker/nba-shot-logs and was originally scraped from the NBA's REST API.


### Columns

The 21 columns are:
1.	GAME_ID: the ID of the game on the NBA website.
2.	MATCHUP: Date and the two teams' abbreviated names.
3.	LOCATION: A (away) or H (home), from the shot taker's perspective.
4.	W: W (win) or L (loss), from the shot taker's perspective.
5.	FINAL_MARGIN: Point differential between the two teams at the end of regulation.
6.	SHOT_NUMBER: Number of the shot taken by that player during the game.
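7.	PERIOD: Period in which the shot was taken.
8.	GAME_CLOCK: Time shown on the game clock when the shot was taken (MM:SS in the raw file). The sample rows suggest this is the time remaining in the period, since it decreases as SHOT_NUMBER increases within a period.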
9.	SHOT_CLOCK: Seconds remaining on the shot clock when the shot was taken (the NBA shot clock allots 24 s).
10.	DRIBBLES: Number of dribbles taken before the shot.
11.	TOUCH_TIME: Time with the ball before taking the shot.
12.	SHOT_DIST: How far the shot is from the basket (measured in feet).
13.	PTS_TYPE: 2 or 3 (depends on whether the shot was taken inside or outside the 3-point line).
14.	SHOT_RESULT: made/missed.
15.	CLOSEST_DEFENDER: Name of the closest defender.
16.	CLOSEST_DEFENDER_PLAYER_ID: ID of the closest defender.
17.	CLOSE_DEF_DIST: Distance of the closest defender from the shot taker.
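18.	FGM: Field goal made on that shot: 1 if the shot went in, 0 otherwise (in the sample rows it is 0 on every missed shot, so it is not a running total).
19.	PTS: Points scored on that shot (0 for a miss, otherwise 2 or 3).
20.	player_name: Name of the player taking the shot.
21.	player_id: ID of the player taking the shot.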
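
Because MATCHUP packs the date and both teams into one string, it can be useful to split it before any analysis. The following is a minimal sketch, not part of the original notebook: `parse_matchup` is a hypothetical helper, the pattern is inferred from the single format visible in this write-up (`MAR 04, 2015 - CHA @ BKN`), and the `vs.` separator for home games is an assumption that should be checked against the real data. It also assumes the first team listed is the shot taker's team.

```python
import re
from datetime import date

# Month abbreviations as they appear in MATCHUP (upper case, e.g. "MAR").
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

# Expected shape: "MAR 04, 2015 - CHA @ BKN".
# The "vs." alternative is an ASSUMPTION about how home games are written.
MATCHUP_RE = re.compile(
    r"^(?P<mon>[A-Z]{3}) (?P<day>\d{2}), (?P<year>\d{4}) - "
    r"(?P<team>[A-Z]{3}) (?P<sep>@|vs\.) (?P<opp>[A-Z]{3})$"
)


def parse_matchup(matchup):
    """Hypothetical helper: split a MATCHUP string into date, teams and venue."""
    m = MATCHUP_RE.match(matchup.strip())
    if m is None:
        # Fail loudly on any format the pattern does not cover.
        raise ValueError(f"unrecognised MATCHUP value: {matchup!r}")
    game_date = date(int(m["year"]), MONTHS.index(m["mon"]) + 1, int(m["day"]))
    return {
        "date": game_date,
        "team": m["team"],          # assumed to be the shot taker's team
        "opponent": m["opp"],
        "away": m["sep"] == "@",    # "@" means the first team is the visitor
    }


print(parse_matchup("MAR 04, 2015 - CHA @ BKN"))
```

On a DataFrame, the same helper could be applied with something like `df['MATCHUP'].map(parse_matchup)`; the `away` flag should then agree with the LOCATION column.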


## Data prep/cleaning process

In [13]:
import pandas as pd
import numpy as np
import datetime
import matplotlib.pyplot as plt
import matplotlib
matplotlib.style.use('fivethirtyeight')

In [84]:
# import data
df = pd.read_csv("shot_logs.csv")

The first step was to check the type of each column:

In [7]:
df.dtypes

GAME_ID                         int64
MATCHUP                        object
LOCATION                       object
W                              object
FINAL_MARGIN                    int64
SHOT_NUMBER                     int64
PERIOD                          int64
GAME_CLOCK                     object
SHOT_CLOCK                    float64
DRIBBLES                        int64
TOUCH_TIME                    float64
SHOT_DIST                     float64
PTS_TYPE                        int64
SHOT_RESULT                    object
CLOSEST_DEFENDER               object
CLOSEST_DEFENDER_PLAYER_ID      int64
CLOSE_DEF_DIST                float64
FGM                             int64
PTS                             int64
player_name                    object
player_id                       int64
dtype: object

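The only column that needs to be changed is GAME_CLOCK. It is read as an object (an MM:SS string) and is converted to a number of seconds so that it can be used as a numeric measure in the analysis. Note that the converted value is the number of seconds shown on the game clock; in the sample rows it counts down within each period, so it reads as time remaining in the period rather than time elapsed since the start of the game.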

In [85]:
# Parse the MM:SS string as a datetime (the date defaults to 1900-01-01),
# then subtract that default date and convert the difference to seconds
df['GAME_CLOCK'] = (pd.to_datetime(df['GAME_CLOCK'], format="%M:%S") - pd.to_datetime('1900-01-01')).dt.total_seconds()
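
If a measure of time elapsed since tip-off is needed, it can be derived from PERIOD and the converted GAME_CLOCK. The sketch below is not from the original notebook: `elapsed_game_seconds` is a hypothetical helper, and it assumes that GAME_CLOCK is the time remaining in the period, that periods 1 to 4 are 12-minute quarters, and that any later period is a 5-minute overtime.

```python
REGULATION_PERIODS = 4
QUARTER_SECONDS = 12 * 60    # length of an NBA regulation quarter
OVERTIME_SECONDS = 5 * 60    # length of an NBA overtime period


def elapsed_game_seconds(period, clock_remaining):
    """Hypothetical helper: seconds played since the start of the game.

    ASSUMES clock_remaining is GAME_CLOCK in seconds, counting down
    from the start of the period.
    """
    if period <= REGULATION_PERIODS:
        seconds_before_period = (period - 1) * QUARTER_SECONDS
        period_length = QUARTER_SECONDS
    else:
        overtimes_completed = period - REGULATION_PERIODS - 1
        seconds_before_period = (REGULATION_PERIODS * QUARTER_SECONDS
                                 + overtimes_completed * OVERTIME_SECONDS)
        period_length = OVERTIME_SECONDS
    return seconds_before_period + (period_length - clock_remaining)


# (PERIOD, GAME_CLOCK) pairs taken from the first rows of the dataset
for period, clock in [(1, 69.0), (1, 0.0), (2, 707.0)]:
    print(period, clock, elapsed_game_seconds(period, clock))
```

For those three shots the helper returns 651.0, 720.0 and 733.0 seconds, which keeps them in the same order as SHOT_NUMBER. On the full data it could be applied row by row, for example with `df.apply(lambda r: elapsed_game_seconds(r['PERIOD'], r['GAME_CLOCK']), axis=1)`.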

Here are the first 5 observations of the dataset: 

In [86]:
df.head()

| | GAME_ID | MATCHUP | LOCATION | W | FINAL_MARGIN | SHOT_NUMBER | PERIOD | GAME_CLOCK | SHOT_CLOCK | DRIBBLES | ... | SHOT_DIST | PTS_TYPE | SHOT_RESULT | CLOSEST_DEFENDER | CLOSEST_DEFENDER_PLAYER_ID | CLOSE_DEF_DIST | FGM | PTS | player_name | player_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 21400899 | MAR 04, 2015 - CHA @ BKN | A | W | 24 | 1 | 1 | 69.0 | 10.8 | 2 | ... | 7.7 | 2 | made | Anderson, Alan | 101187 | 1.3 | 1 | 2 | brian roberts | 203148 |
| 1 | 21400899 | MAR 04, 2015 - CHA @ BKN | A | W | 24 | 2 | 1 | 14.0 | 3.4 | 0 | ... | 28.2 | 3 | missed | Bogdanovic, Bojan | 202711 | 6.1 | 0 | 0 | brian roberts | 203148 |
| 2 | 21400899 | MAR 04, 2015 - CHA @ BKN | A | W | 24 | 3 | 1 | 0.0 | NaN | 3 | ... | 10.1 | 2 | missed | Bogdanovic, Bojan | 202711 | 0.9 | 0 | 0 | brian roberts | 203148 |
| 3 | 21400899 | MAR 04, 2015 - CHA @ BKN | A | W | 24 | 4 | 2 | 707.0 | 10.3 | 2 | ... | 17.2 | 2 | missed | Brown, Markel | 203900 | 3.4 | 0 | 0 | brian roberts | 203148 |
| 4 | 21400899 | MAR 04, 2015 - CHA @ BKN | A | W | 24 | 5 | 2 | 634.0 | 10.9 | 2 | ... | 3.7 | 2 | missed | Young, Thaddeus | 201152 | 1.1 | 0 | 0 | brian roberts | 203148 |
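
The third row above (index 2) has no SHOT_CLOCK value, so it is worth counting missing values before computing any statistics. The sketch below is not part of the original notebook; it uses a small stand-in frame (`sample`, a hypothetical name) built from three columns of the rows shown above rather than the full file. One possible explanation, offered here only as an assumption, is that the shot clock is switched off when less time remains in the period than on the shot clock, which would fit the GAME_CLOCK of 0.0 in that row.

```python
import numpy as np
import pandas as pd

# Stand-in for the real data: three columns copied from the rows above.
sample = pd.DataFrame({
    "SHOT_NUMBER": [1, 2, 3, 4, 5],
    "GAME_CLOCK": [69.0, 14.0, 0.0, 707.0, 634.0],
    "SHOT_CLOCK": [10.8, 3.4, np.nan, 10.3, 10.9],
})

# Number of missing values in each column
missing_per_column = sample.isna().sum()
print(missing_per_column)

# The rows where the shot clock is missing, to inspect their GAME_CLOCK
missing_rows = sample[sample["SHOT_CLOCK"].isna()]
print(missing_rows)
```

On the full dataset the same check would be `df.isna().sum()`; how to treat the affected rows (drop them, or keep them and exclude SHOT_CLOCK from those calculations) depends on the analysis.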


## Simple Statistics

## Visualization