<img src="https://theundefeated.com/wp-content/uploads/2017/05/nba-logo.png?w=50" style="float: left; margin: 20px; height: 55px">

# NBA Player Prediction

_Author: Patrick Wales-Dinan_

---

This project is an attempt to glean information about NBA players. When teams invest in a player, they are taking a risk: they pay him millions of dollars for the promise of a return on the basketball court and/or through marketing and ticket sales. These two returns are related, but the relationship is not one-to-one. Here I focus on whether a player's future performance can be predicted from his statistics in his first two years:

<img src="./assets/mitchell.png" style="float: left; margin: 10px; height: 90px">




---


## Contents:
- [Data Import](#Data-Import)
- [Feature Creation](#Feature-Creation)
- [Choosing the Features](#Feature-Choice)
- [Log Scaling](#Log-Scaling-Independent-Variables)
- [Cleaning and Modifying the Data](#Cleaning-&-Creating-the-Data-Set)
- [Modeling the Data](#Modeling-the-Data)
- [Model Analysis](#Analyzing-the-model)

Please see the [Graphs & Relationships notebook](/Users/pwalesdi/Desktop/GA/GA_Project_2/Project_2_Graphs_&_Relationships.ipynb) for additional visuals.


```python
import numpy as np
import pandas as pd
import os

# Widen the display so the wide stat tables are not truncated
pd.set_option('display.max_rows', 2000)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
```


```python
%cd ..
```

```
/Users/pwalesdi/Desktop/GA/NBA_Player_Prediction
```


```python
# Load one advanced-metrics file and one draft file per season, from 2019 back to 2006
advanced_list = []  # every advanced-metrics DF
draft_list = []     # every draft DF
for year in range(2019, 2005, -1):
    season_df = pd.read_csv(f'./csv_files/{year}_advanced.txt')
    season_df['SEASON'] = year
    advanced_list.append(season_df)

    draft_df = pd.read_csv(f'./csv_files/draft/{year}_draft.txt')
    # First season in which that year's draftees can play
    draft_df['DRAFT_YEAR+1'] = year + 1
    draft_list.append(draft_df)
```

```python
# Creating a master advanced-metrics DF and a master draft DF
advanced = pd.concat(advanced_list)
draft = pd.concat(draft_list)
```

```python
# Split the combined "Name\player_id" column into a player name and a unique player id,
# then drop the original Player column, the unnamed columns, and the rank column
advanced[['player_name', 'player_id']] = advanced.Player.str.split("\\", expand=True)
advanced.drop(columns=['Player', 'Unnamed: 19', 'Unnamed: 24', 'Rk'], inplace=True)
advanced.head(3)
```

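| | Pos | Age | Tm | G | MP | PER | TS% | 3PAr | FTr | ORB% | DRB% | TRB% | AST% | STL% | BLK% | TOV% | USG% | OWS | DWS | WS | WS/48 | OBPM | DBPM | BPM | VORP | SEASON | player_name | player_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SG | 25 | OKC | 31 | 588 | 6.3 | 0.507 | 0.809 | 0.083 | 0.9 | 7.8 | 4.2 | 4.3 | 1.3 | 0.9 | 7.9 | 12.2 | 0.1 | 0.6 | 0.6 | 0.053 | -2.4 | -0.9 | -3.4 | -0.2 | 2019 | Álex Abrines | abrinal01 |
| 1 | PF | 28 | PHO | 10 | 123 | 2.9 | 0.379 | 0.833 | 0.556 | 2.7 | 20.1 | 11.3 | 8.2 | 0.4 | 2.7 | 15.2 | 9.2 | -0.1 | 0.0 | -0.1 | -0.022 | -5.7 | -0.3 | -5.9 | -0.1 | 2019 | Quincy Acy | acyqu01 |
| 2 | PG | 22 | ATL | 34 | 428 | 7.6 | 0.474 | 0.673 | 0.082 | 2.6 | 12.3 | 7.4 | 19.8 | 1.5 | 1.0 | 19.7 | 13.5 | -0.1 | 0.2 | 0.1 | 0.011 | -3.1 | -1.3 | -4.4 | -0.3 | 2019 | Jaylen Adams | adamsja01 |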
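The split above works because each raw `Player` entry stores the display name and a unique player id in one string separated by a backslash (this format is inferred from the split itself). Since `"\\"` is a single character, pandas treats it as a literal separator rather than a regular expression. A minimal sketch on two made-up raw strings assembled from the rows shown above:

```python
import pandas as pd

# Made-up raw rows in the assumed "Name\id" format
toy_raw = pd.DataFrame({"Player": ["Álex Abrines\\abrinal01", "Quincy Acy\\acyqu01"]})

# A one-character pattern is split on literally; expand=True returns one column per piece
toy_raw[["player_name", "player_id"]] = toy_raw["Player"].str.split("\\", expand=True)
print(toy_raw[["player_name", "player_id"]])
```

If a name ever contained a second backslash, `expand=True` would produce a third column and the two-column assignment would fail, which makes the split a cheap sanity check on the raw files.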


```python
# Split the player name and unique player id, then drop the Player and rank columns
# along with stat columns that are redundant with the advanced-metrics DF
draft[['player_name', 'player_id']] = draft.Player.str.split("\\", expand=True)
draft.drop(columns=['Player', 'MP', 'MP.1', 'WS', 'WS/48', 'VORP', 'BPM', 'G', 'Rk'], inplace=True)
draft.head(3)
```

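| | Pk | Tm | College | Yrs | PTS | TRB | AST | FG% | 3P% | FT% | PTS.1 | TRB.1 | AST.1 | DRAFT_YEAR+1 | player_name | player_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | NOP | Duke University | | | | | | | | | | | 2020 | Zion Williamson | willizi01 |
| 1 | 2 | MEM | Murray State University | | | | | | | | | | | 2020 | Ja Morant | moranja01 |
| 2 | 3 | NYK | Duke University | | | | | | | | | | | 2020 | RJ Barrett | barrerj01 |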


```python
# Left-merge so every player-season is kept; draft columns are NaN for players
# with no matching draft record (undrafted, or drafted before 2006)
nba = pd.merge(advanced, draft, how='left', on='player_id')
nba.drop(columns=['player_name_y'], inplace=True)
nba.rename({'player_name_x': 'Player_name', 'Tm_y': 'Draft_team',
            'PTS.1': 'PPG', 'TRB.1': 'RPG', 'AST.1': 'APG'}, axis=1, inplace=True)
nba = nba[['Player_name', 'player_id', 'SEASON', 'Tm_x', 'DRAFT_YEAR+1', 'Draft_team', 'Pk', 'Pos', 'Age', 'G',
           'MP', 'PER', 'TS%', '3PAr', 'FTr', 'ORB%', 'DRB%', 'TRB%', 'AST%', 'STL%', 'BLK%', 'TOV%', 'USG%',
           'OWS', 'DWS', 'WS', 'WS/48', 'OBPM', 'DBPM', 'BPM', 'VORP', 'College', 'Yrs', 'PTS', 'TRB', 'AST',
           'FG%', '3P%', 'FT%', 'PPG', 'RPG', 'APG']]

# Map relocated or renamed franchises to their current abbreviations
nba['Tm_x'] = nba['Tm_x'].replace({'NOH': 'NOP', 'NOK': 'NOP', 'SEA': 'OKC', 'NJN': 'BRK', 'CHA': 'CHO'})

# Drop the "TOT" rows of players who played for several teams in a season
# (their per-team rows remain); .copy() avoids a SettingWithCopyWarning below
nba = nba[nba.Tm_x != 'TOT'].copy()

# Treat picks 1-30 as first round and later picks as second round;
# NaN picks fail both comparisons, so players without a draft record stay NaN
nba['draft_round'] = np.nan
nba.loc[nba['Pk'] > 30, 'draft_round'] = 2
nba.loc[nba['Pk'] < 31, 'draft_round'] = 1
%store nba
```

```
Stored 'nba' (DataFrame)
```
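Because both tables contain `player_name` and `Tm`, `pd.merge` suffixes the overlapping columns with `_x` (left table, advanced metrics) and `_y` (right table, draft), which is where `Tm_x`, `player_name_y`, and the renamed `Draft_team` come from. The left join also keeps player-seasons with no draft match and fills their draft columns with NaN, which then flows through the `draft_round` rule. A toy illustration; the ids, names, and values are made up:

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the advanced-metrics and draft tables
toy_advanced = pd.DataFrame({"player_id": ["aa01", "bb01"],
                             "player_name": ["Player A", "Player B"],
                             "Tm": ["OKC", "PHO"]})
toy_draft = pd.DataFrame({"player_id": ["aa01"],
                          "player_name": ["Player A"],
                          "Tm": ["SEA"],
                          "Pk": [2]})

# how="left" keeps every advanced row; shared column names get _x / _y suffixes
toy_merged = pd.merge(toy_advanced, toy_draft, how="left", on="player_id")

# Same round rule as in the cell above: comparisons with a NaN pick are False,
# so the unmatched player keeps draft_round = NaN
toy_merged["draft_round"] = np.nan
toy_merged.loc[toy_merged["Pk"] > 30, "draft_round"] = 2
toy_merged.loc[toy_merged["Pk"] < 31, "draft_round"] = 1
print(toy_merged)
```

Note that the unmatched row also turns `Pk` into a float column, which is why picks appear as `6.0`, `32.0`, and so on in the tables below.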


```python
def improved_seasons(df, stat, threshold):
    """Return each player's earliest season in which `stat` exceeded its value
    from two seasons earlier by more than `threshold`."""
    rows = []
    # Players are grouped by name; two players sharing a name would be merged
    # (grouping by player_id would avoid this)
    for player in df['Player_name'].unique():
        player_df = df.loc[df['Player_name'] == player]
        for year in range(player_df['SEASON'].min(), player_df['SEASON'].max() + 1):
            # Sum over rows so a player traded mid-season gets one season total
            # (a season with no rows counts as 0)
            before = player_df.loc[player_df['SEASON'] == year, stat].sum()
            later = player_df.loc[player_df['SEASON'] == year + 2]
            if not later.empty and later[stat].sum() > before + threshold:
                rows.append(later)
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    years = pd.concat(rows)
    earliest = years.groupby('Player_name')['SEASON'].transform('min')
    return years[years['SEASON'] == earliest]

# Each player's earliest season with a VORP gain of more than 2 over two seasons earlier
improvement = improved_seasons(nba, 'VORP', 2)
```

```python
# Each player's earliest season with a PER gain of more than 4 over two seasons earlier
per_improvement = improved_seasons(nba, 'PER', 4)
```
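The code above compares each season with the season two years earlier one player at a time. As an alternative formulation (not the code behind the results shown here), the same idea can be sketched with a groupby and a self-merge: total each player-season, relabel a copy two seasons later, and join it back. One deliberate difference: a season is compared only when the player has a recorded season two years earlier, whereas the loops treat a missing earlier season as 0. The helper name `stat_jumps` and the toy rows are made up.

```python
import pandas as pd

def stat_jumps(df: pd.DataFrame, stat: str, threshold: float, key: str = "Player_name") -> pd.DataFrame:
    """Hypothetical helper: each player's earliest season in which `stat` beat
    its value from two seasons earlier by more than `threshold`."""
    # One row per player-season; traded players' per-team rows are summed
    totals = df.groupby([key, "SEASON"], as_index=False)[stat].sum()
    # Relabel each season as season + 2 so it lines up with the later season
    earlier = totals.assign(SEASON=totals["SEASON"] + 2).rename(columns={stat: f"{stat}_prev"})
    # Inner join: seasons without a recorded season two years earlier are skipped
    paired = totals.merge(earlier, on=[key, "SEASON"], how="inner")
    jumps = paired[paired[stat] > paired[f"{stat}_prev"] + threshold]
    # Keep each player's earliest qualifying season
    return jumps[jumps["SEASON"] == jumps.groupby(key)["SEASON"].transform("min")]

# Made-up rows: player B was traded in 2012, so that season has two rows
toy_seasons = pd.DataFrame({
    "Player_name": ["A", "A", "A", "A", "B", "B", "B"],
    "SEASON": [2010, 2011, 2012, 2013, 2010, 2012, 2012],
    "VORP": [0.5, 1.0, 3.0, 4.0, 0.2, 1.5, 1.0],
})
print(stat_jumps(toy_seasons, "VORP", 2))
```

Player A qualifies in both 2012 and 2013, but only 2012 is kept; player B qualifies only because his two 2012 rows are summed. The sketch returns season totals for the chosen stat rather than full stat rows, so merging its output back onto `nba` by player and season would be needed to recover the other columns, and its counts could differ from the loop version because of the missing-season difference noted above.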

```python
# Keep seasons with more than 900 minutes or more than 55 games, and a PER above 11
per_improvement = per_improvement.loc[(per_improvement['MP'] > 900) | (per_improvement['G'] > 55)]
per_improvement = per_improvement[per_improvement.PER > 11]
```

```python
# Keep only players with a draft record, then show the result sorted by season
per_improvement = per_improvement.loc[per_improvement['DRAFT_YEAR+1'].notnull()]
print(per_improvement.shape)
per_improvement.sort_values(by='SEASON', ascending=True)
```

```
(166, 43)
```


| | Player_name | player_id | SEASON | Tm_x | DRAFT_YEAR+1 | Draft_team | Pk | Pos | Age | G | MP | PER | TS% | 3PAr | FTr | ORB% | DRB% | TRB% | AST% | STL% | BLK% | TOV% | USG% | OWS | DWS | WS | WS/48 | OBPM | DBPM | BPM | VORP | College | Yrs | PTS | TRB | AST | FG% | 3P% | FT% | PPG | RPG | APG | draft_round |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5484 | Brandon Roy | roybr01 | 2009 | POR | 2007.0 | MIN | 6.0 | SG | 24 | 78 | 2903 | 24.0 | 0.573 | 0.167 | 0.383 | 4.4 | 11.6 | 7.9 | 25.4 | 1.7 | 0.6 | 9.0 | 27.4 | 10.9 | 2.6 | 13.5 | 0.223 | 5.9 | -0.2 | 5.8 | 5.7 | University of Washington | 6.0 | 6136.0 | 1388.0 | 1517.0 | 0.459 | 0.348 | 0.8 | 18.8 | 4.3 | 4.7 | 1.0 |
| 5419 | Steve Novak | novakst01 | 2009 | LAC | 2007.0 | HOU | 32.0 | PF | 25 | 71 | 1161 | 13.5 | 0.606 | 0.722 | 0.058 | 1.8 | 10.9 | 6.2 | 5.8 | 0.9 | 0.3 | 5.4 | 16.8 | 2.0 | 0.2 | 2.2 | 0.092 | 1.6 | -3.5 | -1.9 | 0.0 | Marquette University | 11.0 | 2177.0 | 591.0 | 132.0 | 0.437 | 0.43 | 0.877 | 4.7 | 1.3 | 0.3 | 2.0 |
| 5478 | Rajon Rondo | rondora01 | 2009 | BOS | 2007.0 | PHO | 21.0 | PG | 22 | 80 | 2642 | 18.8 | 0.543 | 0.063 | 0.353 | 4.8 | 13.9 | 9.6 | 39.7 | 3.0 | 0.3 | 19.2 | 19.2 | 4.8 | 5.1 | 9.9 | 0.179 | 2.0 | 2.5 | 4.5 | 4.3 | University of Kentucky | 13.0 | 8567.0 | 3989.0 | 6975.0 | 0.46 | 0.315 | 0.605 | 10.4 | 4.8 | 8.5 | 1.0 |
| 5346 | Kyle Lowry | lowryky01 | 2009 | MEM | 2007.0 | MEM | 24.0 | PG | 22 | 49 | 1071 | 14.1 | 0.54 | 0.254 | 0.592 | 1.5 | 11.4 | 6.4 | 26.9 | 2.4 | 0.6 | 18.7 | 18.4 | 0.9 | 0.9 | 1.8 | 0.081 | -0.7 | -0.8 | -1.5 | 0.1 | Villanova University | 13.0 | 12355.0 | 3654.0 | 5224.0 | 0.424 | 0.367 | 0.805 | 14.4 | 4.3 | 6.1 | 1.0 |
| 4611 | Andrea Bargnani | bargnan01 | 2010 | TOR | 2007.0 | TOR | 1.0 | PF | 24 | 80 | 2799 | 15.5 | 0.552 | 0.284 | 0.205 | 4.6 | 15.9 | 10.4 | 5.4 | 0.5 | 3.0 | 8.8 | 22.3 | 3.3 | 0.9 | 4.2 | 0.072 | 0.7 | -1.5 | -0.8 | 0.8 | | 10.0 | 7873.0 | 2541.0 | 653.0 | 0.439 | 0.354 | 0.824 | 14.3 | 4.6 | 1.2 | 1.0 |
| 4621 | Marco Belinelli | belinma01 | 2010 | TOR | 2008.0 | GSW | 18.0 | SG | 23 | 66 | 1121 | 12.6 | 0.543 | 0.417 | 0.319 | 1.5 | 8.3 | 5.0 | 11.8 | 1.9 | 0.3 | 12.2 | 20.1 | 0.9 | 0.2 | 1.2 | 0.049 | 0.4 | -2.6 | -2.2 | -0.1 | | 12.0 | 8009.0 | 1670.0 | 1360.0 | 0.425 | 0.376 | 0.847 | 10.0 | 2.1 | 1.7 | 1.0 |
| 4781 | Al Horford | horfoal01 | 2010 | ATL | 2008.0 | ATL | 3.0 | C | 23 | 81 | 2845 | 19.4 | 0.594 | 0.001 | 0.319 | 9.6 | 23.3 | 16.4 | 10.4 | 1.1 | 2.4 | 11.2 | 17.6 | 6.9 | 3.9 | 10.9 | 0.183 | 1.5 | 1.9 | 3.3 | 3.8 | University of Florida | 12.0 | 11092.0 | 6597.0 | 2548.0 | 0.525 | 0.368 | 0.754 | 14.1 | 8.4 | 3.2 | 1.0 |
| 4707 | Kevin Durant | duranke01 | 2010 | OKC | 2008.0 | SEA | 2.0 | SF | 21 | 82 | 3239 | 26.2 | 0.607 | 0.21 | 0.504 | 3.8 | 17.9 | 11.0 | 13.5 | 1.8 | 1.9 | 11.7 | 32.0 | 11.1 | 5.0 | 16.1 | 0.238 | 4.9 | 0.2 | 5.1 | 5.8 | University of Texas at Austin | 12.0 | 22940.0 | 5992.0 | 3486.0 | 0.493 | 0.381 | 0.883 | 27.0 | 7.1 | 4.1 | 1.0 |
| 4396 | Kevin Love | loveke01 | 2011 | MIN | 2009.0 | MEM | 5.0 | PF | 22 | 73 | 2611 | 24.3 | 0.593 | 0.206 | 0.486 | 13.7 | 34.2 | 23.6 | 11.8 | 0.9 | 0.8 | 11.1 | 22.9 | 8.9 | 2.5 | 11.4 | 0.21 | 3.9 | -0.2 | 3.7 | 3.8 | University of California Los Angeles | 11.0 | 12006.0 | 7397.0 | 1519.0 | 0.442 | 0.37 | 0.827 | 18.3 | 11.3 | 2.3 | 1.0 |
| 4572 | Shawne Williams | willish03 | 2011 | NYK | 2007.0 | IND | 17.0 | PF | 24 | 64 | 1323 | 12.2 | 0.558 | 0.551 | 0.127 | 5.1 | 15.6 | 10.3 | 5.3 | 1.5 | 2.8 | 10.2 | 15.2 | 1.4 | 1.1 | 2.5 | 0.089 | 0.0 | -0.4 | -0.5 | 0.5 | University of Memphis | 7.0 | 1769.0 | 945.0 | 220.0 | 0.403 | 0.339 | 0.755 | 5.6 | 3.0 | 0.7 | 1.0 |


```python
%store per_improvement
%store improvement
```

```
Stored 'per_improvement' (DataFrame)
Stored 'improvement' (DataFrame)
```


```python
improvement.sort_values(by='DRAFT_YEAR+1', ascending=True)
```

| | Player_name | player_id | SEASON | Tm_x | DRAFT_YEAR+1 | Draft_team | Pk | Pos | Age | G | MP | PER | TS% | 3PAr | FTr | ORB% | DRB% | TRB% | AST% | STL% | BLK% | TOV% | USG% | OWS | DWS | WS | WS/48 | OBPM | DBPM | BPM | VORP | College | Yrs | PTS | TRB | AST | FG% | 3P% | FT% | PPG | RPG | APG | draft_round |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5033 | LaMarcus Aldridge | aldrila01 | 2009 | POR | 2007.0 | CHI | 2.0 | PF | 23 | 81 | 3004 | 19.1 | 0.529 | 0.023 | 0.269 | 9.5 | 15.6 | 12.5 | 9.3 | 1.4 | 2.0 | 8.0 | 23.7 | 6.1 | 3.3 | 9.5 | 0.151 | 1.2 | 0.7 | 1.9 | 3.0 | University of Texas at Austin | 13.0 | 18598.0 | 7968.0 | 1856.0 | 0.491 | 0.283 | 0.81 | 19.6 | 8.4 | 2.0 | 1.0 |
| 3136 | P.J. Tucker | tuckepj01 | 2014 | PHO | 2007.0 | TOR | 35.0 | SF | 28 | 81 | 2490 | 13.3 | 0.54 | 0.311 | 0.327 | 7.3 | 16.6 | 11.9 | 8.1 | 2.2 | 0.7 | 12.7 | 14.2 | 3.3 | 2.8 | 6.1 | 0.117 | 0.7 | 1.6 | 2.3 | 2.7 | University of Texas at Austin | 8.0 | 4304.0 | 3323.0 | 824.0 | 0.422 | 0.361 | 0.744 | 7.4 | 5.7 | 1.4 | 2.0 |
| 5478 | Rajon Rondo | rondora01 | 2009 | BOS | 2007.0 | PHO | 21.0 | PG | 22 | 80 | 2642 | 18.8 | 0.543 | 0.063 | 0.353 | 4.8 | 13.9 | 9.6 | 39.7 | 3.0 | 0.3 | 19.2 | 19.2 | 4.8 | 5.1 | 9.9 | 0.179 | 2.0 | 2.5 | 4.5 | 4.3 | University of Kentucky | 13.0 | 8567.0 | 3989.0 | 6975.0 | 0.46 | 0.315 | 0.605 | 10.4 | 4.8 | 8.5 | 1.0 |
| 4397 | Kyle Lowry | lowryky01 | 2011 | HOU | 2007.0 | MEM | 24.0 | PG | 24 | 75 | 2563 | 16.5 | 0.55 | 0.424 | 0.31 | 3.9 | 9.7 | 6.8 | 29.4 | 2.0 | 0.6 | 14.7 | 18.6 | 5.1 | 1.8 | 7.0 | 0.13 | 3.1 | -0.4 | 2.7 | 3.0 | Villanova University | 13.0 | 12355.0 | 3654.0 | 5224.0 | 0.424 | 0.367 | 0.805 | 14.4 | 4.3 | 6.1 | 1.0 |
| 5484 | Brandon Roy | roybr01 | 2009 | POR | 2007.0 | MIN | 6.0 | SG | 24 | 78 | 2903 | 24.0 | 0.573 | 0.167 | 0.383 | 4.4 | 11.6 | 7.9 | 25.4 | 1.7 | 0.6 | 9.0 | 27.4 | 10.9 | 2.6 | 13.5 | 0.223 | 5.9 | -0.2 | 5.8 | 5.7 | University of Washington | 6.0 | 6136.0 | 1388.0 | 1517.0 | 0.459 | 0.348 | 0.8 | 18.8 | 4.3 | 4.7 | 1.0 |
| 4291 | Rudy Gay | gayru01 | 2011 | MEM | 2007.0 | HOU | 8.0 | SF | 24 | 54 | 2152 | 17.8 | 0.548 | 0.166 | 0.277 | 4.5 | 14.2 | 9.3 | 11.6 | 2.2 | 2.2 | 12.2 | 23.3 | 2.7 | 2.9 | 5.5 | 0.123 | 1.4 | 1.4 | 2.7 | 2.6 | University of Connecticut | 13.0 | 15461.0 | 5213.0 | 1965.0 | 0.456 | 0.348 | 0.796 | 17.6 | 5.9 | 2.2 | 1.0 |
| 5100 | Ronnie Brewer | brewero02 | 2009 | UTA | 2007.0 | UTA | 14.0 | SF | 23 | 81 | 2605 | 16.1 | 0.565 | 0.103 | 0.425 | 4.6 | 9.1 | 6.9 | 10.9 | 2.7 | 0.9 | 10.1 | 18.8 | 3.9 | 2.9 | 6.8 | 0.126 | 1.0 | 1.0 | 2.0 | 2.7 | University of Arkansas | 8.0 | 3940.0 | 1427.0 | 828.0 | 0.49 | 0.254 | 0.675 | 7.8 | 2.8 | 1.6 | 1.0 |
| 3033 | Joakim Noah | noahjo01 | 2014 | CHI | 2008.0 | CHI | 9.0 | C | 28 | 80 | 2820 | 20.0 | 0.531 | 0.003 | 0.419 | 11.6 | 24.5 | 18.2 | 26.4 | 1.9 | 3.3 | 17.0 | 18.7 | 4.5 | 6.6 | 11.2 | 0.19 | 1.1 | 5.5 | 6.6 | 6.1 | University of Florida | 12.0 | 5867.0 | 6042.0 | 1900.0 | 0.491 | 0.0 | 0.7 | 8.8 | 9.1 | 2.8 | 1.0 |
| 3325 | Marc Gasol | gasolma01 | 2013 | MEM | 2008.0 | LAL | 48.0 | C | 28 | 80 | 2796 | 19.5 | 0.559 | 0.016 | 0.364 | 7.6 | 18.9 | 13.1 | 19.1 | 1.6 | 4.1 | 13.5 | 19.2 | 6.1 | 5.4 | 11.5 | 0.197 | 1.7 | 4.4 | 6.1 | 5.7 | | 11.0 | 11921.0 | 6114.0 | 2740.0 | 0.483 | 0.35 | 0.777 | 15.0 | 7.7 | 3.4 | 2.0 |
| 4707 | Kevin Durant | duranke01 | 2010 | OKC | 2008.0 | SEA | 2.0 | SF | 21 | 82 | 3239 | 26.2 | 0.607 | 0.21 | 0.504 | 3.8 | 17.9 | 11.0 | 13.5 | 1.8 | 1.9 | 11.7 | 32.0 | 11.1 | 5.0 | 16.1 | 0.238 | 4.9 | 0.2 | 5.1 | 5.8 | University of Texas at Austin | 12.0 | 22940.0 | 5992.0 | 3486.0 | 0.493 | 0.381 | 0.883 | 27.0 | 7.1 | 4.1 | 1.0 |
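The question posed at the top is whether a player's first two years predict his later performance. With the merged `nba` table, one way to isolate those years is to keep rows whose `SEASON` equals `DRAFT_YEAR+1` (the first season a draftee can play) or the season after. A minimal sketch on made-up rows; the helper name `first_two_seasons` is hypothetical, and because it counts from the draft year, a player who debuted late would have fewer than two seasons selected:

```python
import pandas as pd

def first_two_seasons(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep rows from a draftee's first two eligible seasons."""
    offset = df["SEASON"] - df["DRAFT_YEAR+1"]  # 0 = rookie season, 1 = second season
    return df[offset.between(0, 1)]  # NaN offsets (no draft record) compare False and are dropped

# Made-up rows: player B has no draft record
toy_nba = pd.DataFrame({
    "Player_name": ["A", "A", "A", "B"],
    "SEASON": [2010, 2011, 2012, 2011],
    "DRAFT_YEAR+1": [2010.0, 2010.0, 2010.0, float("nan")],
    "PER": [12.0, 14.5, 18.0, 15.0],
})

early = first_two_seasons(toy_nba)
# Average PER over each player's first two seasons
print(early.groupby("Player_name")["PER"].mean())
```

Aggregates like this per-player mean could then serve as candidate features, with a later season's stat as the target.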
