NBA Clustering based on Similar Skillset

Background

NBA players were grouped together based on their stats provided from: Basketball Reference

Code and Resources Used

Python Version 3.7
Jupyter Notebook, Spyder
Basketball Reference

Files

NBA_cluster.ipynb - Jupyter notebook for clustering NBA players based on similar skillset
HC_Cluster.ipynb - Jupyter notebook for Hierarchical Clustering of NBA players based on similar skillset
data_scrape.py - Python script to scrape NBA players stats
nba.csv - csv file for all NBA players stats

The Dataset

Scraping the data from Basketball Reference and manipulating it, the dataset contains:
- NAME: Player's Name
- Pos: Player's position
- MP: Minutes Played
- PER: Player Efficiency Rating
- TS%: True Shooting %
- 3PAr: 3pt Attempt Rate
- Ftr: Free throw Attempt Rate
- ORB%: Offensive Rebound %
- DRB%: Defensive Rebound %
- TRB%: Total Rebound %
- AST%: Assist %
- STL%: Steal %
- BLK%: Block %
- TOV%: Turnover %
- USG%: Usage %
- OWS: Offensive Win Shares
- DWS: Defensive Win Share
- WS: Win Shares
- WS/48: Win Shares per 48 games
- OBPM: Offensive Box Plus/Minus
- DBPM: Defensive Box Plus/Minus
- BPM: Box Plus/Minus
- VORP: Value over Replacement Player
- 2P: 2-Point field goal per game
- 2P%: 2-Point %
- 2PA: 2-Point Attempts
- 3P: 3-Point field goal per game
- 3P%: 3-Point %
- 3PA: 3-Point Attempts per game
- AST: Assists Per Game
- BLK: Blocks Per Game
- DRB: Defense Rebound per game
- FG: Field goals per game
- FT: Free throws per game
- FT%: Free throw %
- FTA: Free throw attempts per game
- ORB: Offensive Rebound per game
- PF: Personal Fouls
- PTS: Points per game
- STL: Steals per game
- TOV: Turnover per game
- TRB: Total rebounds per game
- PPG: Average Points per game
- Salary: Player salary

Clusters

Utilziing the elbow method and silhouette scores, NBA players were grouped into 3 distinct clusters.
Cluster 0 represents players who are generally more 'bigger' in the league and can grab rebounds
We can see here, that cluster 1 favours more of the Fowards positions (C and PF)
While cluster 2 favours the Guards positions (PG and SG)
In an attempt to cluster each player by their positions (PG, SG, SF, PF, C) I ran the algorithm again for 5 clusters.
We can see using 5 clusters, we were able to cluster each player respective to their position
Clusters 3 and 4 represent players who play the Forward position (PF and C)
Clusters 0 and 1 represent players who play the Guard Position (PG and SG)
While cluster 2 is an interesting cluster as it clusters players who tend to have a higher shooting rate.
- It clusters PF's who have a higher shooting percentage, e.g. Lauri Markkanen and Marc Gasol.

Anomaly Detection

Using a Multivariate Gaussian Distribution, I was able to identify outliers within the dataset.
The picture shown below highlights players who are anomalous, and a deeper understanding on why they are anomalous can be found within the jupyter notebook.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Images

Images

HC_Cluster.ipynb

HC_Cluster.ipynb

NBA_cluster.ipynb

NBA_cluster.ipynb

README.md

README.md

data_scrape.py

data_scrape.py

nba.csv

nba.csv

Repository files navigation

NBA Clustering based on Similar Skillset

Background

Code and Resources Used

Files

The Dataset

Clusters

Anomaly Detection

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
Images		Images
HC_Cluster.ipynb		HC_Cluster.ipynb
NBA_cluster.ipynb		NBA_cluster.ipynb
README.md		README.md
data_scrape.py		data_scrape.py
nba.csv		nba.csv

jason-huynh83/NBA-cluster

Folders and files

Latest commit

History

Repository files navigation

NBA Clustering based on Similar Skillset

Background

Code and Resources Used

Files

The Dataset

Clusters

Anomaly Detection

About

Resources

Stars

Watchers

Forks

Languages