# NBA Box Scores and data analysis

Data from [Kaggle](https://www.kaggle.com/pablote/nba-enhanced-stats "Kaggle page").

3/21/2018: Playing around with NBA box score data as practice for learning Python and pandas.

11/7/2018: Revisiting this project as practice for learning pandas and NumPy.


## Interesting questions
1. Are specific referees biased against particular teams or individual players?
2. Specific player stats: regular season vs postseason performance? Home vs away games? Shooting percentage depending on game venue?
3. Seasonal influence on player performance: do players play better at particular times of the year? At night or during the day? On weekdays or weekends? On holidays?


In [1]:
import numpy as np
import pandas as pd
!ls

2016-17_officialBoxScore.csv  2017-18_teamBoxScore.csv
2016-17_playerBoxScore.csv    metadata_officialBoxScore.pdf
2016-17_standings.csv	      metadata_playerBoxScore.pdf
2016-17_teamBoxScore.csv      metadata_standing.pdf
2017-18_officialBoxScore.csv  metadata_teamBoxScore.pdf
2017-18_playerBoxScore.csv    NBA Scores Analysis.ipynb
2017-18_standings.csv	      teamBoxScore.csv


Load the player box scores

In [3]:
df = pd.read_csv('2016-17_playerBoxScore.csv')

Get a sense of the data and available columns 

In [6]:
df.head()

Unnamed: 0,gmDate,gmTime,seasTyp,playLNm,playFNm,teamAbbr,teamConf,teamDiv,teamLoc,teamRslt,...,playFT%,playORB,playDRB,playTRB,opptAbbr,opptConf,opptDiv,opptLoc,opptRslt,opptDayOff
0,2016-10-25,08:00,Regular,Porziņģis,Kristaps,NY,East,Atlantic,Away,Loss,...,0.5,4,3,7,CLE,East,Central,Home,Win,0
1,2016-10-25,08:00,Regular,Rose,Derrick,NY,East,Atlantic,Away,Loss,...,1.0,2,1,3,CLE,East,Central,Home,Win,0
2,2016-10-25,08:00,Regular,Anthony,Carmelo,NY,East,Atlantic,Away,Loss,...,1.0,1,4,5,CLE,East,Central,Home,Win,0
3,2016-10-25,08:00,Regular,Lee,Courtney,NY,East,Atlantic,Away,Loss,...,0.0,1,2,3,CLE,East,Central,Home,Win,0
4,2016-10-25,08:00,Regular,Noah,Joakim,NY,East,Atlantic,Away,Loss,...,0.0,1,5,6,CLE,East,Central,Home,Win,0


In [7]:
df.iloc[1]

gmDate          2016-10-25
gmTime               08:00
seasTyp            Regular
playLNm               Rose
playFNm            Derrick
teamAbbr                NY
teamConf              East
teamDiv           Atlantic
teamLoc               Away
teamRslt              Loss
teamDayOff               0
offLNm1               Lane
offFNm1               Karl
offLNm2              Adams
offFNm2             Bennie
offLNm3            Kennedy
offFNm3               Bill
playDispNm    Derrick Rose
playStat           Starter
playMin                 30
playPos                 PG
playHeight              75
playWeight             190
playBDate       1988-10-04
playPTS                 17
playAST                  1
playTO                   4
playSTL                  0
playBLK                  1
playPF                   1
playFGA                 17
playFGM                  7
playFG%             0.4118
play2PA                 15
play2PM                  6
play2P%                0.4
play3PA                  2
p

Let's dig deeper into Klay Thompson's numbers from 2016-2017

In [9]:
# Select Klay Thompson's rows and the columns summarized in the output below
kt = df[df['playDispNm'] == 'Klay Thompson'][['playMin', 'playFGM', 'playPTS', 'play2PA']]
kt.describe()

# Another way to extract a single column:
# df[df['playLNm'] == 'Rose'].loc[:, 'playPTS']

Unnamed: 0,playMin,playFGM,playPTS,play2PA
count,78.0,78.0,78.0,78.0
mean,33.935897,8.25641,22.333333,9.358974
std,4.231422,2.862366,7.979953,3.023606
min,24.0,3.0,8.0,4.0
25%,31.0,6.25,17.0,7.0
50%,34.0,8.0,22.5,9.0
75%,37.0,10.0,26.0,11.0
max,47.0,21.0,60.0,19.0
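
The home versus away question from the list at the top could be approached for one player roughly as follows. This is a minimal sketch on made-up rows, not the notebook's own analysis: `home_away_summary` is a hypothetical helper, the player names and numbers are invented, and only the column names (`playDispNm`, `teamLoc`, `playMin`, `playPTS`) come from the data shown above.

```python
import pandas as pd

# Made-up rows for illustration only; the column names match the player box score file.
sample = pd.DataFrame({
    'playDispNm': ['Player A', 'Player A', 'Player A', 'Player A', 'Player B', 'Player B'],
    'teamLoc':    ['Home', 'Away', 'Home', 'Away', 'Home', 'Away'],
    'playMin':    [36, 34, 35, 33, 20, 22],
    'playPTS':    [30, 20, 26, 18, 10, 12],
})


def home_away_summary(frame, player):
    """Return mean minutes and points for one player, split by game location."""
    rows = frame[frame['playDispNm'] == player]
    return rows.groupby('teamLoc')[['playMin', 'playPTS']].mean()


summary = home_away_summary(sample, 'Player A')
print(summary)
```

With the real file loaded, the same helper could be called as `home_away_summary(df, 'Klay Thompson')`, assuming `teamLoc` only takes the values `Home` and `Away`.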


At first glance 

## Player performance by game time

Do players perform better during the day or at night?

1. Get columns for common player stats (AST, TO, FG%, BLK, STL) and game times
2. Categorize player stats by game times
3. Compare average player stats between night and day and plot the difference
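
The three steps can be sketched as follows. This is only an illustration under stated assumptions: the rows are made up, `label_time_of_day` and `NIGHT_START_HOUR` are hypothetical names, and `gmTime` is assumed to be an `HH:MM` string on a 24-hour clock. The rows printed in this notebook show values such as `08:00` and `10:30`, which may instead be 12-hour times without an AM/PM marker; if so, the times would need converting before a cutoff like this makes sense.

```python
import pandas as pd

# Made-up rows for illustration only; gmTime is ASSUMED to be "HH:MM" on a 24-hour clock.
sample = pd.DataFrame({
    'gmTime':  ['13:00', '15:30', '19:00', '20:30', '22:30'],
    'playAST': [4, 6, 5, 7, 3],
    'playTO':  [2, 4, 5, 3, 7],
    'playFG%': [0.45, 0.47, 0.44, 0.50, 0.42],
})

NIGHT_START_HOUR = 18  # hypothetical cutoff between "day" and "night" games


def label_time_of_day(gm_time, night_start=NIGHT_START_HOUR):
    """Classify an "HH:MM" string as 'day' or 'night' using a simple hour cutoff."""
    hour = int(gm_time.split(':')[0])
    return 'night' if hour >= night_start else 'day'


# Step 1: keep the game time and the stat columns of interest.
stat_cols = ['playAST', 'playTO', 'playFG%']
stats = sample[['gmTime'] + stat_cols].copy()

# Step 2: categorize each row by game time.
stats['timeOfDay'] = stats['gmTime'].map(label_time_of_day)

# Step 3: compare the average stats between night and day games.
means = stats.groupby('timeOfDay')[stat_cols].mean()
difference = means.loc['night'] - means.loc['day']
print(means)
print(difference)
```

If matplotlib is installed, `difference.plot(kind='bar')` would be one way to draw the comparison from step 3.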


In [15]:
df = pd.read_csv('2016-17_officialBoxScore.csv')
df.iloc[1,:]

gmDate        2016-10-25
gmTime             08:00
seasTyp          Regular
offLNm             Adams
offFNm            Bennie
teamAbbr              NY
teamConf            East
teamDiv         Atlantic
teamLoc             Away
teamRslt            Loss
teamMin              240
teamDayOff             0
teamPTS               88
teamAST               17
teamTO                18
teamSTL                6
teamBLK                6
teamPF                22
teamFGA               87
teamFGM               32
teamFG%           0.3678
team2PA               60
team2PM               23
team2P%           0.3833
team3PA               27
team3PM                9
team3P%           0.3333
teamFTA               20
teamFTM               15
teamFT%             0.75
                 ...    
opptPTS1              28
opptPTS2              20
opptPTS3              34
opptPTS4              35
opptPTS5               0
opptPTS6               0
opptPTS7               0
opptPTS8               0
opptTREB%        54.8387


In [22]:
# Golden State's rows: game time plus turnover, steal, and shooting-percentage columns
df[df['teamAbbr'] == 'GS'][['gmTime', 'teamTO', 'teamSTL', 'teamFG%', 'team2P%', 'team3P%', 'teamFT%']]

Unnamed: 0,gmTime,teamTO,teamSTL,teamFG%,team2P%,team3P%,teamFT%
15,10:30,16,11,0.4706,0.6346,0.2121,0.7222
16,10:30,16,11,0.4706,0.6346,0.2121,0.7222
17,10:30,16,11,0.4706,0.6346,0.2121,0.7222
144,09:30,14,8,0.4835,0.5556,0.3214,0.8929
145,09:30,14,8,0.4835,0.5556,0.3214,0.8929
146,09:30,14,8,0.4835,0.5556,0.3214,0.8929
222,06:00,16,6,0.4217,0.5000,0.2759,0.8485
223,06:00,16,6,0.4217,0.5000,0.2759,0.8485
224,06:00,16,6,0.4217,0.5000,0.2759,0.8485
318,10:00,16,14,0.5618,0.6349,0.3846,0.7391
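
The output above repeats each Golden State game three times with identical team stats. That is consistent with the officials file holding one row per referee for each team and game (the row printed earlier has a single `offLNm`/`offFNm` pair), although this reading is inferred from the output rather than confirmed by the metadata. Under that assumption, a minimal sketch for collapsing the repeats before averaging looks like this; the rows and referee names below are made up.

```python
import pandas as pd

# Made-up rows for illustration only: two games, one row per official.
sample = pd.DataFrame({
    'gmDate':   ['2016-01-01'] * 3 + ['2016-01-02'] * 3,
    'teamAbbr': ['GS'] * 6,
    'offLNm':   ['Ref A', 'Ref B', 'Ref C', 'Ref D', 'Ref E', 'Ref F'],
    'gmTime':   ['10:30'] * 3 + ['09:30'] * 3,
    'teamTO':   [16, 16, 16, 14, 14, 14],
})

# Keep one row per team per game.
# ASSUMPTION: (gmDate, teamAbbr) uniquely identifies a team's game.
per_game = (
    sample
    .drop_duplicates(subset=['gmDate', 'teamAbbr'])
    .drop(columns=['offLNm'])  # the referee column no longer applies to the collapsed row
)
print(per_game)
```

Deduplicating first keeps later counts and plots from treating one game as three observations.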


In [None]:
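# One possible completion (the original cell stopped at `df.pivot`): mean team stats per game time
df2 = df.pivot_table(index='gmTime', values=['teamTO', 'teamSTL', 'teamFG%'], aggfunc='mean')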