Clustering NBA players based on similar skillsets using K Means clustering algorithm

jason-huynh83/NBA-cluster

NBA Clustering based on Similar Skillset

Background

Code and Resources Used

Files

  • NBA_cluster.ipynb - Jupyter notebook for clustering NBA players based on similar skillset
  • HC_Cluster.ipynb - Jupyter notebook for Hierarchical Clustering of NBA players based on similar skillset
  • data_scrape.py - Python script to scrape NBA players stats
  • nba.csv - csv file for all NBA players stats

The Dataset

  • The data was scraped from Basketball Reference and cleaned. The resulting dataset contains:
    • NAME: Player's Name
    • Pos: Player's position
    • MP: Minutes Played
    • PER: Player Efficiency Rating
    • TS%: True Shooting %
    • 3PAr: 3pt Attempt Rate
    • Ftr: Free throw Attempt Rate
    • ORB%: Offensive Rebound %
    • DRB%: Defensive Rebound %
    • TRB%: Total Rebound %
    • AST%: Assist %
    • STL%: Steal %
    • BLK%: Block %
    • TOV%: Turnover %
    • USG%: Usage %
    • OWS: Offensive Win Shares
    • DWS: Defensive Win Shares
    • WS: Win Shares
    • WS/48: Win Shares per 48 minutes
    • OBPM: Offensive Box Plus/Minus
    • DBPM: Defensive Box Plus/Minus
    • BPM: Box Plus/Minus
    • VORP: Value over Replacement Player
    • 2P: 2-Point field goals per game
    • 2P%: 2-Point %
    • 2PA: 2-Point attempts per game
    • 3P: 3-Point field goals per game
    • 3P%: 3-Point %
    • 3PA: 3-Point attempts per game
    • AST: Assists per game
    • BLK: Blocks per game
    • DRB: Defensive rebounds per game
    • FG: Field goals per game
    • FT: Free throws per game
    • FT%: Free throw %
    • FTA: Free throw attempts per game
    • ORB: Offensive rebounds per game
    • PF: Personal fouls
    • PTS: Points per game
    • STL: Steals per game
    • TOV: Turnovers per game
    • TRB: Total rebounds per game
    • PPG: Average Points per game
    • Salary: Player salary
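
A table like this mixes identifier columns (NAME, Pos) with numeric stats on very different scales (percentages, per-game counts, salary), so distance-based clustering usually needs the numeric columns isolated and standardized first. The sketch below shows one way to do that with pandas and scikit-learn. It is an assumption about the preprocessing, not a copy of the notebook: the helper name `build_feature_matrix` is hypothetical, and the small frame stands in for nba.csv with placeholder values rather than real stats.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for nba.csv; the numbers are placeholders, not real stats.
df = pd.DataFrame({
    "NAME": ["Player A", "Player B", "Player C", "Player D"],
    "Pos": ["PG", "C", "SF", "PF"],
    "TRB%": [5.1, 19.4, 9.8, 14.2],
    "AST%": [32.0, 6.5, 12.3, 9.1],
    "3PAr": [0.45, 0.02, 0.38, 0.21],
})

def build_feature_matrix(frame, id_cols=("NAME", "Pos")):
    """Drop identifier columns and standardize the remaining numeric ones."""
    numeric = frame.drop(columns=list(id_cols)).select_dtypes("number")
    # Zero mean and unit variance per column, so no single stat dominates distances.
    scaled = StandardScaler().fit_transform(numeric)
    return pd.DataFrame(scaled, columns=numeric.columns, index=frame.index)

X = build_feature_matrix(df)
```

Keeping the original index on `X` makes it easy to join cluster labels back onto NAME and Pos later.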

Clusters

  • Using the elbow method and silhouette scores, NBA players were grouped into 3 distinct clusters.

  • Cluster 0 represents the league's bigger players, who tend to grab more rebounds.

  • Cluster 1 leans toward the forward and center positions (PF and C).

  • Cluster 2 leans toward the guard positions (PG and SG).

  • To see whether the clusters would line up with the five positions (PG, SG, SF, PF, C), I ran the algorithm again with 5 clusters.

  • With 5 clusters, players were grouped according to their positions.

  • Clusters 3 and 4 represent forwards and centers (PF and C).

  • Clusters 0 and 1 represent guards (PG and SG).

  • Cluster 2 is the interesting one: it groups players who tend to have a higher shooting rate.

    • It picks up PFs who have a higher shooting percentage, e.g. Lauri Markkanen and Marc Gasol.
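
The choice of cluster count described above can be sketched as a scan over candidate values of k, recording the K-Means inertia (for the elbow plot) and the silhouette score for each. This is a minimal illustration assuming scikit-learn. The helper name `scan_k` is hypothetical, and the data is a synthetic set of three well-separated blobs standing in for the scaled player features, so the result says nothing about the real dataset.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the scaled player features: three well-separated groups.
X, _ = make_blobs(
    n_samples=300,
    centers=[[-6, -6], [0, 6], [6, -6]],
    cluster_std=1.0,
    random_state=0,
)

def scan_k(X, k_values, seed=0):
    """Return inertia (elbow curve) and silhouette score for each candidate k."""
    inertias, silhouettes = {}, {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        inertias[k] = model.inertia_  # within-cluster sum of squared distances
        silhouettes[k] = silhouette_score(X, model.labels_)
    return inertias, silhouettes

inertias, silhouettes = scan_k(X, range(2, 7))
# One simple rule: take the k with the highest silhouette score.
best_k = max(silhouettes, key=silhouettes.get)
```

Plotting `inertias` against k gives the elbow curve. To check which positions each cluster favours, the fitted labels can be compared against the Pos column, for example with `pandas.crosstab`.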

Anomaly Detection

  • Using a multivariate Gaussian distribution, I was able to identify outliers within the dataset.
  • The picture shown below highlights the players flagged as anomalous; the Jupyter notebook explains in more detail why each of them stands out.
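
As a rough illustration of the idea, the sketch below fits a single multivariate Gaussian to the data and flags the rows with the lowest density. It assumes NumPy and SciPy and uses synthetic data with one deliberately extreme row. The helper name `gaussian_outliers` and the 1% quantile threshold are assumptions made for this example; the notebook may pick its threshold differently.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
# Synthetic stand-in: 200 "typical" rows plus one extreme row appended at the end.
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)), [[8.0, -8.0]]])

def gaussian_outliers(X, quantile=0.01):
    """Flag rows whose density under a fitted Gaussian falls in the lowest `quantile`."""
    mu = X.mean(axis=0)              # estimated mean vector
    sigma = np.cov(X, rowvar=False)  # estimated covariance matrix
    # Log-density avoids numerical underflow for very unlikely points.
    log_density = multivariate_normal(mean=mu, cov=sigma).logpdf(X)
    threshold = np.quantile(log_density, quantile)  # assumed cutoff rule
    return log_density <= threshold, log_density

is_outlier, log_density = gaussian_outliers(X)
```

Sorting players by `log_density` gives a ranking from most to least unusual, which is often easier to inspect than a hard cutoff.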
