# 2022 NBA Player Performance Report

This report evaluates player performance throughout the 2022 season relative to salary. The goal is to identify players who are overpaid or underpaid for their performance, both on its own and in comparison to their peers. This gives us a broad look at how players are performing and highlights any underdogs, as well as overcompensated players whose performance is not up to par with their salary.

## Data Processing

Our import libraries:

In [2]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import MinMaxScaler, StandardScaler

## First Step: Data Cleanup

The first step in building our model is to pull together the information we need: player stats and salary data. Once the two are combined, we can choose which features to use in the model and make sure the data is in the correct format. First, we dropped rows with missing data, since the model can't use them. Second, we dropped columns that repeated information already available elsewhere in the table, which keeps the model from receiving redundant inputs and makes it faster.

In [3]:
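# Load the performance stats and salaries, then join them on player name.
NBA_Perf_22 = pd.read_csv("data/NBA_Perf_22.csv", encoding='latin-1')
NBA_salaries_22 = pd.read_csv("data/nba_salaries_22.csv")
NBA_22 = pd.merge(NBA_Perf_22, NBA_salaries_22, on='Player', how='inner')
# Drop rows with missing values.
NBA_22 = NBA_22.dropna()
# Drop redundant columns by position.
drop_cols = [1, 3, 7, 8, 10, 11, 13, 14, 17, 18]
eNBA_22 = NBA_22.drop(NBA_22.columns[drop_cols], axis=1)
# Strip "$" and "," from the salary strings and convert them to integers.
salary_array = eNBA_22.Salary.str.replace(r'[\$,]', '', regex=True).fillna(0).astype(int)
player_array = eNBA_22.Player
# Keep only the numeric features for the model.
y = eNBA_22.drop(['Salary', "Player"], axis=1)
y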

Unnamed: 0,Age,G,GS,MP,FG%,3P%,2P%,eFG%,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,22,73,28,23.6,0.439,0.359,0.468,0.486,0.595,2.0,4.5,6.5,1.1,0.5,0.6,1.2,2.1,9.1
1,28,76,75,26.3,0.547,0.000,0.548,0.547,0.543,4.6,5.4,10.0,3.4,0.9,0.8,1.5,2.0,6.9
2,24,56,56,32.6,0.557,0.000,0.562,0.557,0.753,2.4,7.6,10.1,3.4,1.4,0.8,2.6,3.1,19.1
3,21,32,0,11.3,0.402,0.125,0.560,0.424,0.625,1.0,1.7,2.7,0.7,0.2,0.3,0.5,1.1,4.1
4,23,65,21,22.6,0.372,0.311,0.433,0.449,0.743,0.6,2.3,2.9,2.4,0.7,0.4,1.4,1.6,10.6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
498,33,52,1,16.3,0.518,0.354,0.554,0.550,0.469,1.5,2.5,4.0,2.0,1.0,0.3,1.0,1.6,6.2
499,33,26,1,14.2,0.578,0.000,0.602,0.578,0.455,1.5,2.0,3.6,2.3,0.9,0.3,1.2,1.5,6.1
500,33,26,0,18.3,0.465,0.395,0.495,0.524,0.481,1.5,2.9,4.4,1.7,1.2,0.4,0.8,1.7,6.3
501,23,76,76,34.9,0.460,0.382,0.512,0.536,0.904,0.7,3.1,3.7,9.7,0.9,0.1,4.0,1.7,28.4
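
To make the merge and salary-parsing steps concrete, here is a minimal sketch on a tiny made-up table. The player names and figures are hypothetical placeholders, not rows from our data; only the `Player` and `Salary` column names come from the code above.

```python
import pandas as pd

# Hypothetical toy tables standing in for the performance and salary files.
perf = pd.DataFrame({"Player": ["Player A", "Player B", "Player C"],
                     "PTS": [9.1, 6.9, 19.1]})
salaries = pd.DataFrame({"Player": ["Player A", "Player B", "Player D"],
                         "Salary": ["$1,500,000", "$12,250,000", "$900,000"]})

# An inner join keeps only players that appear in both tables.
merged = pd.merge(perf, salaries, on="Player", how="inner")

# Strip "$" and "," so the salary strings can be cast to integers.
merged["Salary"] = (merged["Salary"]
                    .str.replace(r"[\$,]", "", regex=True)
                    .astype(int))
print(merged)
```

Note that the inner join silently drops any player whose name is missing or spelled differently in one of the two files, so it is worth comparing row counts before and after the merge.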


## Second Step: Data Analysis

Now that we have our table of needed information, we can move on to building our model. We will use a clustering model that groups players by performance, which we can then compare against salary to identify players who are overpaid or underpaid. Before clustering, we scale every numeric column to fit between 0 and 1. K-means is distance-based, so without scaling, columns with large ranges (such as games played) would dominate columns measured as fractions (such as FG%).

In [4]:
numbers_listing = list(y.select_dtypes('number'))
y[numbers_listing] = MinMaxScaler().fit_transform(y[numbers_listing])
y.head()

Unnamed: 0,Age,G,GS,MP,FG%,3P%,2P%,eFG%,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,0.136364,0.888889,0.341463,0.591429,0.445896,0.359,0.290667,0.533582,0.392804,0.434783,0.392523,0.426573,0.101852,0.227273,0.214286,0.25,0.408163,0.273649
1,0.409091,0.925926,0.914634,0.668571,0.647388,0.0,0.397333,0.647388,0.314843,1.0,0.476636,0.671329,0.314815,0.409091,0.285714,0.3125,0.387755,0.199324
2,0.227273,0.679012,0.682927,0.848571,0.666045,0.0,0.416,0.666045,0.629685,0.521739,0.682243,0.678322,0.314815,0.636364,0.285714,0.541667,0.612245,0.611486
3,0.090909,0.382716,0.0,0.24,0.376866,0.125,0.413333,0.41791,0.437781,0.217391,0.130841,0.160839,0.064815,0.090909,0.107143,0.104167,0.204082,0.10473
4,0.181818,0.790123,0.256098,0.562857,0.320896,0.311,0.244,0.464552,0.614693,0.130435,0.186916,0.174825,0.222222,0.318182,0.142857,0.291667,0.306122,0.324324
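
As a quick check on what the scaling does, the sketch below computes min-max scaling by hand and compares it with `MinMaxScaler`. The ages are hypothetical values chosen only to illustrate the formula.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical ages, used only to illustrate the formula.
toy = pd.DataFrame({"Age": [19.0, 22.0, 30.0, 41.0]})

# Min-max scaling: x' = (x - min) / (max - min), computed per column.
manual = (toy - toy.min()) / (toy.max() - toy.min())

# MinMaxScaler applies the same formula with its default feature_range=(0, 1).
scaled = MinMaxScaler().fit_transform(toy)

print(np.allclose(manual.to_numpy(), scaled))
```

Each column's smallest value maps to 0 and its largest to 1, so a scaled value only tells us where a player sits within that column's range.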


Oh boy! This is confusing. Let's make sense of it by plotting it. We want the variables that best separate underperforming, overpaid players from well-performing, underpaid ones. We could use every variable, but after trying many different combinations, the ones that gave us the best results were field goal percentage (FG%), points (PTS), and assists (AST).

In [13]:
# Cluster on the three chosen features.
clust_data_NBA = y[["FG%", "PTS", "AST"]]
kmeans_obj_NBA = KMeans(n_clusters=3, random_state=1).fit(clust_data_NBA)
# Plot the same features in 3D, colored by salary, with player names on hover.
fig = px.scatter_3d(y, x="FG%", y="PTS", z="AST", color=salary_array, hover_data=[player_array],
                    title="")
fig.show()
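
The plot colors points by salary, so the fitted k-means labels are not shown directly. As a minimal sketch of one way to inspect them, the example below attaches the labels to a table and averages each cluster. The data is random placeholder data, not ours, and setting `n_init=10` explicitly is an assumption made here for reproducibility across scikit-learn versions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical scaled features and salaries; random placeholders, not our data.
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.random((30, 3)), columns=["FG%", "PTS", "AST"])
toy_salary = pd.Series(rng.integers(1_000_000, 40_000_000, size=30), name="Salary")

# Fit k-means on the three features; n_init is set explicitly as an assumption.
km = KMeans(n_clusters=3, random_state=1, n_init=10).fit(toy)

# labels_ holds one cluster id per row, in the same order as the input.
summary = (toy.assign(cluster=km.labels_, Salary=toy_salary)
              .groupby("cluster")[["FG%", "PTS", "AST", "Salary"]]
              .mean())
print(summary)
```

On the real data, the same pattern would use `kmeans_obj_NBA.labels_` together with `salary_array`, which would show whether any cluster combines high average performance with low average salary.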





Now that we can see the plot, we can pick out underpaid, well-performing players we might want to snag for a good playoff team. Desirability here depends on a few things. First, note the color scale: the darker the color, the lower the player's salary. Second, all three axes (assists, points, and FG%) increase with performance, so players higher on the graph are performing better and players lower on the graph are performing worse. We don't see distinct clusters, but we do see a clear relationship between salary and performance.

We are looking for dark-colored points in the region where most points are bright: players who perform like the highest-paid players but earn much less. For us, the desired players are Tyrese Haliburton, Ja Morant, LaMelo Ball, and Darius Garland. Players we want to avoid include Rudy Gobert, Marvin Bagley III, and Deandre Ayton, who are overpaid and underperforming. If we can't get our desired players, we should try to get Fred VanVleet, Ricky Rubio, or Kevin Porter Jr. They are also underpaid and well-performing, but not as good as our first choices.
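
The reading above is done by eye. One hypothetical way to turn it into a ranked list is sketched below. The performance score (a plain average of the three plotted features) and the gap measure are our assumptions for illustration, not something the plot or the clustering computes, and the players and values are made up.

```python
import pandas as pd

# Hypothetical players with values already scaled to [0, 1]; not real figures.
toy = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C", "Player D"],
    "FG%": [0.60, 0.70, 0.40, 0.55],
    "PTS": [0.90, 0.30, 0.20, 0.80],
    "AST": [0.80, 0.20, 0.10, 0.70],
    "Salary": [0.10, 0.90, 0.15, 0.85],
})

# Assumed score: the plain average of the three plotted features.
toy["performance"] = toy[["FG%", "PTS", "AST"]].mean(axis=1)

# Positive gap: performs better than the salary suggests (underpaid).
# Negative gap: paid more than the performance suggests (overpaid).
toy["gap"] = toy["performance"] - toy["Salary"]

ranked = toy.sort_values("gap", ascending=False)
print(ranked[["Player", "performance", "Salary", "gap"]])
```

Players at the top of `ranked` correspond to the dark points in the high-performing region of the plot, and players at the bottom to the bright points in the low-performing region.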