In this assignment I chose to perform my supervised learning to predict the NBA All Stars for the 2019-2020 season. All data was pulled from `https://www.basketball-reference.com/`. I used the season data from the 2016-2017, 2017-2018, and 2018-2019 seasons along with the chosen all star team members from those respective seasons. 

I chose 2 types of supervised classifier models (knn and random forest) to perform the classifications. Both performed with high accuracy, but had a hard time classifying the all stars with a high accuracy, given that the dataset was so imbalanced (far more non all stars than all stars).

In the test sets, the notable differences between actual and predicted all stars included:

Actual but not Predicted (model says they shouldn't have been all stars)
- Giannis Antetokounmpo, 2016
- Kyle Lowry, 2018
- Klay Thompson, 2016
- LaMarcus Aldridge, 2017
- Dirk Nowitzki, 2018
- Khris Middleton, 2018
- DeAndre Jordan, 2016

Predicted but not Actual (model says they should've been all stars, but were not)
- Marc Gasol, 2017
- CJ McCollum, 2016

Also, using the dataset from the 2019-2020 season so far, the projected all stars include
- Pascal Siakam
- LeBron James
- Kawhi Leonard
- Bradley Beal
- Buddy Hield
- Paul George
- Karl-Anthony Towns
- Donovan Mitchell
- Brandon Ingram
- Kyrie Irving
- Andrew Wiggins
- Anthony Davis
- Damian Lillard
- CJ McCollum
- Giannis Antetokounmpo
- James Harden
- DeMar DeRozan

In [172]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
import sklearn.metrics
from matplotlib import pyplot as plt

pd.set_option('display.max_columns', 999)

In [173]:
as2016 = pd.read_csv('AS_2016.csv')
as2016['year'] = 2016
as2016['name'] = as2016['Starters'].apply(lambda x: x.split("\\")[0])
as_2016 = as2016['name'].tolist()

d2016 = pd.read_csv('2016.csv')
d2016['year'] = 2016
d2016['name'] = d2016['Player'].apply(lambda x: x.split("\\")[0])
d2016['allstar'] = d2016['name'].apply(lambda x: 1 if x in as_2016 else 0)
mt2016 = d2016[d2016['Tm']=='TOT']['name'].tolist()
d2016.drop(d2016[(d2016['name'].isin(mt2016)) & (d2016['Tm']!='TOT')].index , inplace=True)

as2017 = pd.read_csv('AS_2017.csv')
as2017['year'] = 2017
as2017['name'] = as2017['Starters'].apply(lambda x: x.split("\\")[0])
as_2017 = as2017['name'].tolist()

d2017 = pd.read_csv('2017.csv')
d2017['year'] = 2017
d2017['name'] = d2017['Player'].apply(lambda x: x.split("\\")[0])
d2017['allstar'] = d2017['name'].apply(lambda x: 1 if x in as_2017 else 0)
mt2017 = d2017[d2017['Tm']=='TOT']['name'].tolist()
d2017.drop(d2017[(d2017['name'].isin(mt2017)) & (d2017['Tm']!='TOT')].index , inplace=True)

as2018 = pd.read_csv('AS_2018.csv')
as2018['year'] = 2018
as2018['name'] = as2018['Starters'].apply(lambda x: x.split("\\")[0])
as_2018 = as2018['name'].tolist()

d2018 = pd.read_csv('2018.csv')
d2018['year'] = 2018
d2018['name'] = d2018['Player'].apply(lambda x: x.split("\\")[0])
d2018['allstar'] = d2018['name'].apply(lambda x: 1 if x in as_2018 else 0)
mt2018 = d2018[d2018['Tm']=='TOT']['name'].tolist()
d2018.drop(d2018[(d2018['name'].isin(mt2018)) & (d2018['Tm']!='TOT')].index , inplace=True)

d2019 = pd.read_csv('2019.csv')
d2019['year'] = 2019
d2019['name'] = d2019['Player'].apply(lambda x: x.split("\\")[0])
d2019.fillna(0, inplace=True)
d2019['GS%'] = d2019['GS']/d2019['G']

df = pd.concat([d2016, d2017, d2018])
df['GS%'] = df['GS']/df['G']
df.fillna(0, inplace=True)

#print(mt2016) 
df.head()

Unnamed: 0,Rk,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,year,name,allstar,GS%
0,1,Álex Abrines\abrinal01,SG,23,OKC,68,6,15.5,2.0,5.0,0.393,1.4,3.6,0.381,0.6,1.4,0.426,0.531,0.6,0.7,0.898,0.3,1.0,1.3,0.6,0.5,0.1,0.5,1.7,6.0,2016,Álex Abrines,0,0.088235
1,2,Quincy Acy\acyqu01,PF,26,TOT,38,1,14.7,1.8,4.5,0.412,1.0,2.4,0.411,0.9,2.1,0.413,0.521,1.2,1.6,0.75,0.5,2.5,3.0,0.5,0.4,0.4,0.6,1.8,5.8,2016,Quincy Acy,0,0.026316
2,3,Steven Adams\adamsst01,C,23,OKC,80,80,29.9,4.7,8.2,0.571,0.0,0.0,0.0,4.7,8.2,0.572,0.571,2.0,3.2,0.611,3.5,4.2,7.7,1.1,1.1,1.0,1.8,2.4,11.3,2016,Steven Adams,0,1.0
3,4,Arron Afflalo\afflaar01,SG,31,SAC,61,45,25.9,3.0,6.9,0.44,1.0,2.5,0.411,2.0,4.4,0.457,0.514,1.4,1.5,0.892,0.1,1.9,2.0,1.3,0.3,0.1,0.7,1.7,8.4,2016,Arron Afflalo,0,0.737705
4,5,Alexis Ajinça\ajincal01,C,28,NOP,39,15,15.0,2.3,4.6,0.5,0.0,0.1,0.0,2.3,4.5,0.511,0.5,0.7,1.0,0.725,1.2,3.4,4.5,0.3,0.5,0.6,0.8,2.0,5.3,2016,Alexis Ajinça,0,0.384615


In [174]:
X = df.drop(columns=['Rk','Player','Pos','Tm','G','GS','name','year','allstar'])
y = df['allstar']

X_2019 = d2019.drop(columns=['Rk','Player','Pos','Tm','G','GS','name','year'])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1, stratify=y)

In [175]:
knn = KNeighborsClassifier(n_neighbors = 3)
knn.fit(X_train,y_train)
y_predict = knn.predict(X_test)
y_2019 = knn.predict(X_2019)
knn.score(X_test, y_test)

0.9711538461538461

In [179]:
knn_results = X_test.merge(df)
knn_results.rename({'allstar':'actual'}, axis=1, inplace=True)
knn_results['predicted'] = y_predict
knn_results.sort_values(by='actual', ascending=False, inplace=True)
knn_results[(knn_results['actual']==1) |  (knn_results['predicted']==1)]

Unnamed: 0,Age,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,GS%,Rk,Player,Pos,Tm,G,GS,year,name,actual,predicted
158,22,35.6,8.2,15.7,0.521,0.6,2.3,0.272,7.6,13.5,0.563,0.541,5.9,7.7,0.77,1.8,7.0,8.8,5.4,1.6,1.9,2.9,3.1,22.9,1.0,16,Giannis Antetokounmpo\antetgi01,SF,MIL,80,80,2016,Giannis Antetokounmpo,1,0
264,24,33.7,9.1,18.7,0.484,1.2,4.1,0.3,7.8,14.6,0.535,0.517,8.2,10.1,0.804,2.5,11.1,13.6,3.7,0.7,1.9,3.5,3.3,27.5,1.0,155,Joel Embiid\embiijo01,C,PHI,64,64,2018,Joel Embiid,1,1
284,32,34.0,4.7,11.4,0.411,2.4,7.0,0.347,2.3,4.4,0.514,0.518,2.5,3.0,0.83,0.6,4.2,4.8,8.7,1.4,0.5,2.8,2.6,14.2,1.0,313,Kyle Lowry\lowryky01,PG,TOR,65,65,2018,Kyle Lowry,1,0
185,30,33.8,9.2,19.4,0.472,5.1,11.7,0.437,4.0,7.7,0.525,0.604,3.8,4.2,0.916,0.7,4.7,5.3,5.2,1.3,0.4,2.8,2.4,27.3,1.0,124,Stephen Curry\curryst01,PG,GSW,69,69,2018,Stephen Curry,1,1
292,29,36.8,10.8,24.5,0.442,4.8,13.2,0.368,6.0,11.3,0.528,0.541,9.7,11.0,0.879,0.8,5.8,6.6,7.5,2.0,0.7,5.0,3.1,36.1,1.0,206,James Harden\hardeja01,PG,HOU,78,78,2018,James Harden,1,1
215,24,32.8,10.0,17.3,0.578,0.7,2.8,0.256,9.3,14.5,0.641,0.599,6.9,9.5,0.729,2.2,10.3,12.5,5.9,1.3,1.5,3.7,3.2,27.7,1.0,18,Giannis Antetokounmpo\antetgi01,PF,MIL,72,72,2018,Giannis Antetokounmpo,1,1
188,28,34.6,10.2,24.0,0.425,2.5,7.2,0.343,7.7,16.8,0.459,0.476,8.8,10.4,0.845,1.7,9.0,10.7,10.4,1.6,0.4,5.4,2.3,31.6,1.0,458,Russell Westbrook\westbru01,PG,OKC,81,81,2016,Russell Westbrook,1,1
93,25,33.4,8.6,17.7,0.485,2.0,5.2,0.38,6.6,12.5,0.529,0.541,6.3,7.2,0.88,1.1,4.7,5.8,3.5,1.8,0.7,2.1,1.6,25.5,1.0,261,Kawhi Leonard\leonaka01,SF,SAS,74,74,2016,Kawhi Leonard,1,1
95,27,35.4,9.7,20.9,0.467,0.4,1.7,0.266,9.3,19.2,0.484,0.477,7.4,8.7,0.842,0.9,4.3,5.2,3.9,1.1,0.2,2.4,1.8,27.3,1.0,108,DeMar DeRozan\derozde01,SG,TOR,74,74,2016,DeMar DeRozan,1,1
224,28,36.9,9.2,21.0,0.438,3.8,9.8,0.386,5.4,11.1,0.484,0.529,5.9,7.0,0.839,1.4,6.8,8.2,4.1,2.2,0.4,2.7,2.8,28.0,1.0,183,Paul George\georgpa01,SF,OKC,77,77,2018,Paul George,1,1


In [192]:
pred_2019 = X_2019.merge(d2019)
pred_2019['predicted'] = y_2019
pred_2019.sort_values(by='predicted', ascending=False, inplace=True)
pred_2019[pred_2019['predicted']==1]

Unnamed: 0,Age,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,GS%,Rk,Player,Pos,Tm,G,GS,year,name,predicted
122,20,32.2,9.4,19.6,0.481,3.0,9.3,0.326,6.4,10.3,0.62,0.558,7.4,9.2,0.803,1.3,8.3,9.6,8.9,1.2,0.1,4.4,2.3,29.3,1.0,123,Luka Dončić\doncilu01,PG,DAL,25,25,2019,Luka Dončić,1
10,25,31.2,11.7,20.8,0.564,1.6,5.1,0.321,10.1,15.7,0.642,0.603,6.6,11.0,0.598,2.7,10.1,12.8,5.3,1.3,1.2,3.7,3.1,31.7,1.0,11,Giannis Antetokounmpo\antetgi01,PF,MIL,27,27,2019,Giannis Antetokounmpo,1
219,27,33.8,10.2,22.9,0.444,2.8,8.3,0.341,7.4,14.6,0.503,0.506,5.4,5.7,0.937,0.9,4.5,5.4,7.2,1.1,0.5,2.4,2.9,28.5,1.0,220,Kyrie Irving\irvinky01,PG,BRK,11,11,2019,Kyrie Irving,1
446,31,34.8,8.5,20.0,0.426,1.2,5.1,0.228,7.3,14.8,0.494,0.455,4.7,6.1,0.767,1.6,6.5,8.1,7.2,1.6,0.4,4.4,3.8,22.8,1.0,447,Russell Westbrook\westbru01,PG,HOU,24,24,2019,Russell Westbrook,1
271,29,36.7,8.2,18.6,0.443,3.2,9.0,0.359,5.0,9.6,0.522,0.53,6.8,7.5,0.903,0.5,3.8,4.3,7.6,1.0,0.4,2.9,2.0,26.5,1.0,272,Damian Lillard\lillada01,PG,POR,26,26,2019,Damian Lillard,1
157,29,30.2,8.1,18.0,0.448,3.9,9.8,0.403,4.1,8.2,0.5,0.557,4.7,5.1,0.913,0.4,5.4,5.9,3.7,1.6,0.4,3.6,2.6,24.7,1.0,158,Paul George\georgpa01,SF,LAC,18,18,2019,Paul George,1
262,24,33.1,7.9,18.3,0.432,3.0,7.6,0.395,4.9,10.7,0.458,0.514,4.4,5.3,0.831,0.8,3.7,4.4,3.9,1.3,0.4,3.3,2.4,23.2,1.0,263,Zach LaVine\lavinza01,SG,CHI,30,30,2019,Zach LaVine,1
111,26,34.7,9.6,19.3,0.499,1.2,3.6,0.319,8.5,15.7,0.541,0.529,7.0,8.2,0.858,2.4,6.9,9.3,3.3,1.5,2.6,2.3,2.4,27.4,1.0,112,Anthony Davis\davisan02,PF,LAL,26,26,2019,Anthony Davis,1
227,35,34.7,10.0,20.1,0.498,2.2,6.2,0.353,7.9,14.0,0.563,0.552,3.7,5.5,0.673,1.0,6.3,7.4,10.6,1.3,0.6,4.0,1.8,25.9,1.0,228,LeBron James\jamesle01,PG,LAL,28,28,2019,LeBron James,1
422,24,33.9,9.0,17.5,0.514,3.6,8.5,0.418,5.4,9.0,0.604,0.615,4.9,6.2,0.796,2.7,9.0,11.7,4.4,1.0,1.3,3.1,3.5,26.5,1.0,423,Karl-Anthony Towns\townska01,C,MIN,23,23,2019,Karl-Anthony Towns,1


In [211]:
model = RandomForestClassifier(class_weight="balanced")
model.fit(X_train, y_train)
y_predict_rf = model.predict(X_test)
y_2019_rf = model.predict(X_2019)
metrics.accuracy_score(y_test.values, y_predict_rf)



0.9711538461538461

In [212]:
rf_results = X_test.merge(df)
rf_results.rename({'allstar':'actual'}, axis=1, inplace=True)
rf_results['predicted'] = y_predict
rf_results.sort_values(by='actual', ascending=False, inplace=True)
rf_results[(rf_results['actual']==1) |  (rf_results['predicted']==1)]

Unnamed: 0,Age,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,GS%,Rk,Player,Pos,Tm,G,GS,year,name,actual,predicted
158,22,35.6,8.2,15.7,0.521,0.6,2.3,0.272,7.6,13.5,0.563,0.541,5.9,7.7,0.77,1.8,7.0,8.8,5.4,1.6,1.9,2.9,3.1,22.9,1.0,16,Giannis Antetokounmpo\antetgi01,SF,MIL,80,80,2016,Giannis Antetokounmpo,1,0
264,24,33.7,9.1,18.7,0.484,1.2,4.1,0.3,7.8,14.6,0.535,0.517,8.2,10.1,0.804,2.5,11.1,13.6,3.7,0.7,1.9,3.5,3.3,27.5,1.0,155,Joel Embiid\embiijo01,C,PHI,64,64,2018,Joel Embiid,1,1
284,32,34.0,4.7,11.4,0.411,2.4,7.0,0.347,2.3,4.4,0.514,0.518,2.5,3.0,0.83,0.6,4.2,4.8,8.7,1.4,0.5,2.8,2.6,14.2,1.0,313,Kyle Lowry\lowryky01,PG,TOR,65,65,2018,Kyle Lowry,1,0
185,30,33.8,9.2,19.4,0.472,5.1,11.7,0.437,4.0,7.7,0.525,0.604,3.8,4.2,0.916,0.7,4.7,5.3,5.2,1.3,0.4,2.8,2.4,27.3,1.0,124,Stephen Curry\curryst01,PG,GSW,69,69,2018,Stephen Curry,1,1
292,29,36.8,10.8,24.5,0.442,4.8,13.2,0.368,6.0,11.3,0.528,0.541,9.7,11.0,0.879,0.8,5.8,6.6,7.5,2.0,0.7,5.0,3.1,36.1,1.0,206,James Harden\hardeja01,PG,HOU,78,78,2018,James Harden,1,1
215,24,32.8,10.0,17.3,0.578,0.7,2.8,0.256,9.3,14.5,0.641,0.599,6.9,9.5,0.729,2.2,10.3,12.5,5.9,1.3,1.5,3.7,3.2,27.7,1.0,18,Giannis Antetokounmpo\antetgi01,PF,MIL,72,72,2018,Giannis Antetokounmpo,1,1
188,28,34.6,10.2,24.0,0.425,2.5,7.2,0.343,7.7,16.8,0.459,0.476,8.8,10.4,0.845,1.7,9.0,10.7,10.4,1.6,0.4,5.4,2.3,31.6,1.0,458,Russell Westbrook\westbru01,PG,OKC,81,81,2016,Russell Westbrook,1,1
93,25,33.4,8.6,17.7,0.485,2.0,5.2,0.38,6.6,12.5,0.529,0.541,6.3,7.2,0.88,1.1,4.7,5.8,3.5,1.8,0.7,2.1,1.6,25.5,1.0,261,Kawhi Leonard\leonaka01,SF,SAS,74,74,2016,Kawhi Leonard,1,1
95,27,35.4,9.7,20.9,0.467,0.4,1.7,0.266,9.3,19.2,0.484,0.477,7.4,8.7,0.842,0.9,4.3,5.2,3.9,1.1,0.2,2.4,1.8,27.3,1.0,108,DeMar DeRozan\derozde01,SG,TOR,74,74,2016,DeMar DeRozan,1,1
224,28,36.9,9.2,21.0,0.438,3.8,9.8,0.386,5.4,11.1,0.484,0.529,5.9,7.0,0.839,1.4,6.8,8.2,4.1,2.2,0.4,2.7,2.8,28.0,1.0,183,Paul George\georgpa01,SF,OKC,77,77,2018,Paul George,1,1


In [214]:
pred_2019_rf = X_2019.merge(d2019)
pred_2019_rf['predicted'] = y_2019_rf
pred_2019_rf.sort_values(by='predicted', ascending=False, inplace=True)
pred_2019_rf[pred_2019_rf['predicted']==1]

Unnamed: 0,Age,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,GS%,Rk,Player,Pos,Tm,G,GS,year,name,predicted
400,25,36.6,9.4,20.5,0.457,2.5,6.3,0.392,6.9,14.2,0.486,0.517,3.9,4.7,0.813,1.4,6.6,8.0,3.6,1.0,1.0,2.7,2.9,25.1,1.0,401,Pascal Siakam\siakapa01,PF,TOR,27,27,2019,Pascal Siakam,1
227,35,34.7,10.0,20.1,0.498,2.2,6.2,0.353,7.9,14.0,0.563,0.552,3.7,5.5,0.673,1.0,6.3,7.4,10.6,1.3,0.6,4.0,1.8,25.9,1.0,228,LeBron James\jamesle01,PG,LAL,28,28,2019,LeBron James,1
268,28,31.3,8.9,19.6,0.451,1.7,4.8,0.356,7.1,14.8,0.482,0.495,6.1,7.0,0.871,1.0,6.8,7.8,5.0,1.8,0.7,3.5,2.0,25.5,1.0,269,Kawhi Leonard\leonaka01,SF,LAC,21,21,2019,Kawhi Leonard,1
32,26,37.0,9.3,21.3,0.438,2.7,8.0,0.333,6.7,13.3,0.5,0.5,6.3,7.6,0.832,1.2,3.6,4.8,7.0,1.0,0.3,3.6,2.7,27.6,1.0,33,Bradley Beal\bealbr01,SG,WAS,26,26,2019,Bradley Beal,1
198,27,34.2,8.0,18.3,0.435,3.9,10.3,0.38,4.0,8.0,0.507,0.543,1.7,2.0,0.855,1.0,4.0,5.0,2.9,0.9,0.3,2.5,2.5,21.6,1.0,199,Buddy Hield\hieldbu01,SG,SAC,27,27,2019,Buddy Hield,1
157,29,30.2,8.1,18.0,0.448,3.9,9.8,0.403,4.1,8.2,0.5,0.557,4.7,5.1,0.913,0.4,5.4,5.9,3.7,1.6,0.4,3.6,2.6,24.7,1.0,158,Paul George\georgpa01,SF,LAC,18,18,2019,Paul George,1
422,24,33.9,9.0,17.5,0.514,3.6,8.5,0.418,5.4,9.0,0.604,0.615,4.9,6.2,0.796,2.7,9.0,11.7,4.4,1.0,1.3,3.1,3.5,26.5,1.0,423,Karl-Anthony Towns\townska01,C,MIN,23,23,2019,Karl-Anthony Towns,1
313,23,34.7,9.4,20.7,0.453,2.3,6.3,0.359,7.1,14.4,0.495,0.508,4.2,5.0,0.837,0.9,3.8,4.7,3.7,1.1,0.3,2.3,2.3,25.2,1.0,314,Donovan Mitchell\mitchdo01,SG,UTA,27,27,2019,Donovan Mitchell,1
218,22,33.7,9.0,18.5,0.489,2.3,5.8,0.396,6.8,12.7,0.531,0.551,4.9,5.8,0.842,0.9,6.2,7.1,3.7,0.8,0.8,2.9,3.2,25.3,1.0,219,Brandon Ingram\ingrabr01,PF,NOP,25,25,2019,Brandon Ingram,1
219,27,33.8,10.2,22.9,0.444,2.8,8.3,0.341,7.4,14.6,0.503,0.506,5.4,5.7,0.937,0.9,4.5,5.4,7.2,1.1,0.5,2.4,2.9,28.5,1.0,220,Kyrie Irving\irvinky01,PG,BRK,11,11,2019,Kyrie Irving,1
