# Problem statement: How do we choose the best players for a basketball team?

In the NBA, players are frequently traded between teams. A team that wants to acquire a player wants the best statistics for the least amount of money. One way to find low-cost, high-performing players is K-means, a clustering algorithm. The goal is to cluster players based on specific attributes and then select the underrated players from those clusters. To solve this problem, I used a data set from the NBA 2025 season, which lists statistics for individual players.

# What is the process?

To do this, we first clean the data by removing columns that are not useful for the model (such as a player's team or position) and any duplicate rows. Next, we scale the data, because each feature is measured on a different scale. K-means groups players by distance, so without standardization a feature with large values (such as salary in dollars) would dominate the clustering. The model is then built for a chosen value of k, the number of cluster centers. It is good practice to check whether another value of k works better, using the variance explained and/or the silhouette score. Once the model has been created and tuned, create a visualization or graph to help you evaluate players.

In my case, I selected three features, **player points, player salary, and effective field goal percentage**, as I think they are the most influential statistics to look at. The features you examine will vary with the data set and the problem you are trying to solve.
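
The scaling, clustering, and k-selection steps described above could be sketched roughly as follows. This is a minimal illustration, assuming scikit-learn is available. The synthetic `X` array is a stand-in for the cleaned player table (it is not real player data), and the candidate range of 2 to 6 for k is an arbitrary choice, not necessarily the range used in this analysis.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for (PTS, Salary, eFG%) columns; NOT real player data.
rng = np.random.default_rng(0)
centers = np.array([[200, 2e6, 0.45], [900, 12e6, 0.55], [1800, 40e6, 0.65]])
noise = rng.normal(size=(90, 3)) * np.array([50, 5e5, 0.01])
X = np.repeat(centers, 30, axis=0) + noise

# Standardize each feature to mean 0 and standard deviation 1.
X_scaled = StandardScaler().fit_transform(X)

# Fit K-means for several values of k and record each silhouette score.
scores = {}
for k in range(2, 7):
    candidate_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, candidate_labels)

# Keep the k with the highest silhouette score and refit the final model.
best_k = max(scores, key=scores.get)
model = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X_scaled)
labels = model.labels_
print(best_k, {k: round(float(s), 3) for k, s in scores.items()})
```

The silhouette score is only one way to compare values of k; plotting `model.inertia_` against k (the "elbow" plot) is a common alternative.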

> ### Why did I use these variables?
> Player points are the total number of points the player scored in the season. Generally speaking, a higher point total correlates well with how good a player is.
>
> The player's salary tells us how much the player is paid; if we selected the player, we would have to pay them that amount or more. We want players with lower salaries so that we can spend the money on other players on the team.
>
> A player's effective field goal percentage (abbreviated "eFG%") seems complicated. However, "field goals" in basketball simply refer to the baskets a player makes, not counting free throws. Since a field goal can be worth either 2 or 3 points, eFG% accounts for this by giving extra weight to made 3-pointers, so it is generally higher if a player makes more 3-point field goals. This is useful for identifying a good player because it shows how good a shooter the player is.
>
> Age was also a factor, but it was not a feature I plotted. Younger players have more of their career ahead of them, so team longevity is increased. Keep in mind that a professional basketball player's career typically runs from about age 22 to 35 (a rough estimate).
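
To make the eFG% description concrete, here is a small sketch using the standard formula, eFG% = (FG + 0.5 × 3P) / FGA, where each made 3-pointer counts as 1.5 made field goals. The function name `effective_fg_pct` and the two example shooters are hypothetical and serve only as an illustration.

```python
def effective_fg_pct(fg_made: int, three_made: int, fg_attempted: int) -> float:
    """eFG% = (FG + 0.5 * 3P) / FGA, counting each made three as 1.5 field goals."""
    if fg_attempted == 0:
        return 0.0
    return (fg_made + 0.5 * three_made) / fg_attempted


# Two hypothetical shooters, each making 8 of 20 attempts (40% raw FG%).
inside_scorer = effective_fg_pct(fg_made=8, three_made=0, fg_attempted=20)
three_shooter = effective_fg_pct(fg_made=8, three_made=4, fg_attempted=20)
print(inside_scorer, three_shooter)  # 0.4 0.5
```

Both shooters make the same share of their shots, but the one who makes four 3-pointers ends up with the higher eFG%.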

# What players did I pick and why?

* **Best-choice players**:
  * Ryan Kalkbrenner and Jaxson Hayes. Both are good prospects, with low salaries and high eFG%. They are also both relatively young.
  * Jaylon Tyson and Keyonte George. Both have an above-average point total and are younger.

* **Mid-choice players**:
  * Mikal Bridges. He has above-average points and an average eFG%, but he has a high salary and is 29 years old.
  * Tim Hardaway Jr. He has above-average points and eFG% with a lower salary, but he is an older player (33 years old).
  * Immanuel Quickley. He has above-average points and an average eFG%, but he has a high salary.

* **Probably-shouldn't-choose players**:
  * OG Anunoby. He is in the second half of his career (28 years old), with both a high salary and a low point total.
  * Evan Mobley. Although he is a younger player (23 years old) with an above-average point total, he has a very high salary.
  * Joe Ingles. An older player (38 years old) who is past his prime, he has very few points and an average eFG%.
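
The reasoning behind these tiers (more points, higher eFG%, lower salary, younger age) could be turned into a rough screening rule. The sketch below is not the method used to make the picks above, which were made by inspecting the plot. The rows are placeholders with made-up numbers, and the `criteria_met` function, the median thresholds, and the age cutoff of 28 are all assumptions chosen for illustration.

```python
import pandas as pd

# Placeholder rows with made-up numbers; NOT real player statistics.
toy = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C", "Player D"],
    "PTS": [900, 1500, 300, 1200],
    "Salary": [3_000_000, 35_000_000, 2_500_000, 8_000_000],
    "eFG%": [0.62, 0.55, 0.50, 0.58],
    "Age": [23, 29, 38, 24],
})


def criteria_met(row, medians, age_cutoff=28):
    """Count how many of the four desirable traits a player has (hypothetical rule)."""
    checks = [
        row["PTS"] >= medians["PTS"],        # scores at least the median
        row["eFG%"] >= medians["eFG%"],      # shoots at least the median
        row["Salary"] <= medians["Salary"],  # costs no more than the median
        row["Age"] < age_cutoff,             # younger than the assumed cutoff
    ]
    return int(sum(checks))


medians = toy[["PTS", "Salary", "eFG%"]].median()
toy["criteria_met"] = toy.apply(criteria_met, axis=1, medians=medians)
print(toy.sort_values("criteria_met", ascending=False)[["Player", "criteria_met"]])
```

A count like this is only a starting point; it treats all four traits as equally important, which a real front office would not.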

The 3D graph below plots these players. You can rotate the graph and hover over points to see the specific stats. The legend on the right shows the player names.

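```python
import pandas as pd
import plotly.express as px

# Load salaries (header=1 uses the file's second row as column names) and season stats.
salary = pd.read_csv("2025_salaries.csv", header=1, encoding="latin-1")
stats = pd.read_csv("nba_2025.csv", encoding="latin-1")

# Join the two tables on player name and drop columns the analysis does not use.
merged = pd.merge(salary, stats, on="Player")
merged = merged.drop(columns=["Player-additional", "Awards"])
merged = merged.drop_duplicates()

# Keep one row per player (the first row after sorting by team).
players = merged.sort_values("Team")
players = players.drop_duplicates(subset="Player", keep="first")

# Drop the team, position, and rank columns, then any rows with missing values.
final = players.drop(columns=["Tm", "Team", "Pos", "Rk"]).dropna()

# Convert salary strings to integers by stripping "$" and ",".
# regex=False treats "$" as a literal character rather than a regex anchor.
final["2025-26"] = (
    final["2025-26"]
    .str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(int)
)
final = final.rename(columns={"2025-26": "Salary"})

player_names = ["Ryan Kalkbrenner", "Jaxson Hayes", "Jaylon Tyson", "Keyonte George", "OG Anunoby", "Evan Mobley", "Joe Ingles", "Mikal Bridges", "Tim Hardaway Jr.", "Immanuel Quickley"]

df = final[final["Player"].isin(player_names)][["Player", "PTS", "Salary", "eFG%", "Age"]]

# Color by the "Player" column so each point carries its own row's name;
# passing the player_names list would assign names by list order, not by row.
fig = px.scatter_3d(
    df, x="PTS", y="Salary", z="eFG%",
    color="Player",
    title="10 NBA Players: Points, Salary, eFG%")
fig.show()
```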
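
Because the tables are joined on the player's name and rows with missing values are dropped, a selected player can silently disappear from the plot (for example, if a name is spelled differently in the two files). A small check like the one below can catch that. The helper `missing_players` and the placeholder table are hypothetical; in the notebook, the same call would be made on `final` and `player_names`.

```python
import pandas as pd


def missing_players(table: pd.DataFrame, wanted: list[str]) -> list[str]:
    """Return the names in `wanted` that have no row in `table`."""
    present = set(table["Player"])
    return [name for name in wanted if name not in present]


# Placeholder table standing in for `final`; the names are hypothetical.
toy_final = pd.DataFrame({"Player": ["Player A", "Player B"], "PTS": [900, 1500]})
print(missing_players(toy_final, ["Player A", "Player C"]))  # ['Player C']
```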