# K-Means Clustering: NBA Teams
- Title Contender(4):   
- Playoff Team(12):  
- Good(5):  
- Below Average(5):  
- Worst(4):   

## Project Motivation  
Although the past decade witnessed unprecedented trades and dramatic franchise shake-ups, it is regarded as one of the most lopsided eras in basketball history because the Finals matchups were so predictable: an unstoppable dynasty from the West against one of the greatest players of all time, LeBron James, from the East. Over time, teams have moved from pairing a couple of superstars with several mediocre players toward a more balanced league in which talent can be found on any roster. Whether the cause is advances in sports medicine, increased cap space, smarter recruiting, or improved management, the league as a whole is getting better. The overall style of play in today's era has also changed remarkably. Younger players such as Luka Doncic and Jayson Tatum are rising stars who will lift the league to new heights. With substantial moves happening every season, the purpose of this project is to group teams into clusters based on their past performance.

## Data Source
The data was scraped from basketball-reference and nba.com/stats and covers every team from the 2014-15 season through the 2018-19 season. 2014 was chosen as the start year to capture a crucial period: LeBron James returning to the Cavaliers and the rise of Golden State. In total, there are 30 teams and 48 features describing each team, covering both per-game averages and end-of-season totals.
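
The title and the five tiers listed at the top point to k-means with five clusters over the 30 teams. As a minimal sketch of that technique, the NumPy code below implements Lloyd's algorithm (alternate an assignment step and a centroid-update step) on z-scored features, assuming the 48 features have already been assembled into a numeric matrix with one row per team. The names `standardize` and `kmeans` and the deterministic farthest-point initialisation are illustrative choices, not the author's code; in practice `sklearn.cluster.KMeans` is the usual tool.

```python
import numpy as np

def standardize(X):
    """Z-score each column so per-game averages and season totals share one scale."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns: avoid dividing by zero
    return (X - X.mean(axis=0)) / std

def _assign(X, centroids):
    # squared Euclidean distance from every row to every centroid -> index of the nearest
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def _init_farthest(X, k):
    # deterministic start: row 0, then repeatedly the row farthest from all chosen centroids
    idx = [0]
    for _ in range(1, k):
        d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    return X[idx].copy()

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps until stable."""
    X = np.asarray(X, dtype=float)
    centroids = _init_farthest(X, k)
    for _ in range(n_iter):
        labels = _assign(X, centroids)
        # update step: each centroid moves to the mean of its members (empty clusters stay put)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return _assign(X, centroids), centroids
```

For the five tiers this would be called as `kmeans(standardize(X), k=5)` with `X` a hypothetical 30 x 48 feature matrix. Standardizing first matters because k-means uses raw Euclidean distance, so season totals would otherwise dominate per-game percentages.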

In [1]:
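# helpful packages to load
import os

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# plotly for interactive charts
import plotly.express as px
import plotly.graph_objs as go
import chart_studio

# read the Chart Studio API key from an environment variable instead of hard-coding it
# (CHART_STUDIO_API_KEY is an assumed variable name)
chart_studio.tools.set_credentials_file(username='sathwikkes',
                                        api_key=os.environ.get('CHART_STUDIO_API_KEY', ''))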

In [None]:
df_2015 = pd.read_csv('data/totals_2015.csv') #2014-2015
df_2016 = pd.read_csv('data/totals_2016.csv') #2015-2016
df_2017 = pd.read_csv('data/totals_2017.csv') #2016-2017
df_2018 = pd.read_csv('data/totals_2018.csv') #2017-2018
df_2019 = pd.read_csv('data/totals_2019.csv') #2018-2019

In [None]:
df_2015['season'] = '2014-15'
df_2016['season'] = '2015-16'
df_2017['season'] = '2016-17'
df_2018['season'] = '2017-18'
df_2019['season'] = '2018-19'

In [None]:
# combine all seasons into one dataframe (DataFrame.append was removed in pandas 2.0)
df = pd.concat([df_2015, df_2016, df_2017, df_2018, df_2019])
df = df.drop(['Team2', 'G2'], axis=1)
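
`DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so `pd.concat` is the current way to stack the season tables. A possible refinement, sketched below on two toy frames with placeholder values (not the real data), passes `keys=` so that `pd.concat` records which season each row came from; that would make the five separate `season` assignments unnecessary.

```python
import pandas as pd

# two tiny stand-in season tables with placeholder values
s1 = pd.DataFrame({'Team': ['Boston Celtics', 'Utah Jazz'], 'PTS': [100.0, 90.0]})
s2 = pd.DataFrame({'Team': ['Boston Celtics', 'Utah Jazz'], 'PTS': [110.0, 95.0]})

# keys= labels each block; the labels become an outer index level named 'season'
combined = pd.concat([s1, s2], keys=['2014-15', '2015-16'], names=['season', None])
# move the season level into a regular column and replace the repeated row numbers
combined = combined.reset_index(level='season').reset_index(drop=True)
print(combined)
```

Resetting the index also avoids the repeated 0, 1, 2, ... row labels that a plain concatenation of per-season files produces.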

In [None]:
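# playoff teams are marked with '*' after the name: turn that into a 0/1 flag, then strip the marker
df['made_playoffs'] = df['Team'].str.contains('*', regex=False)
df['made_playoffs'] = df['made_playoffs'].astype(int)
df['Team'] = df['Team'].str.replace('*', '', regex=False)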
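
The playoff flag relies on the source tables marking playoff teams with `*`. A slightly stricter variant, sketched here on toy names, only looks at a trailing marker, so an asterisk elsewhere in a name would not count. This is an alternative formulation, not the notebook's original step.

```python
import pandas as pd

# toy team names; a trailing '*' marks a playoff team
teams = pd.Series(['Golden State Warriors*', 'Sacramento Kings', 'Cleveland Cavaliers*'])

made_playoffs = teams.str.endswith('*').astype(int)  # 1 if the name ends with '*', else 0
clean_names = teams.str.rstrip('*')                  # remove only the trailing marker
```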

In [None]:
df.Team

In [None]:
df.shape

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
plt.figure(figsize=(12,6))

_ = plt.hist(df['3P%-Total'], bins=7)
_ = plt.xticks(rotation=90)
plt.show()

In [None]:
df.isnull().sum()/df.shape[0]
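
Dividing the null count by the number of rows gives the fraction of missing values per column. Because the mean of a boolean mask is the share of `True` values, `isnull().mean()` computes the same thing in one call, as this toy example (placeholder values) shows.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'PTS': [100.0, np.nan, 110.0, 105.0], 'AST': [20, 22, 25, 24]})

# mean of a boolean mask = share of True values = fraction missing per column
missing_share = toy.isnull().mean()
same_as_ratio = toy.isnull().sum() / toy.shape[0]
```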

In [None]:
def unique_count(df, col):
    return df[col].nunique()

In [None]:
for col in df.columns: 
    print("# of unique values in {}:".format(col), unique_count(df, col))

In [None]:
stat_cols = list(dict.fromkeys([i.split('-')[0] for i in df.columns]))
stat_cols
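
`stat_cols` relies on two details: splitting on `-` strips the `-PG` / `-Total` suffix, and `dict.fromkeys` removes duplicates while keeping first-seen order (dicts preserve insertion order from Python 3.7). The column names below are a toy subset following the notebook's naming pattern.

```python
# toy column names following the '<stat>-PG' / '<stat>-Total' pattern
cols = ['Team', 'G', 'FGM-PG', 'FGM-Total', '3P%-PG', '3P%-Total', 'season']

# order-preserving de-duplication of the part before the first '-'
stat_cols = list(dict.fromkeys(c.split('-')[0] for c in cols))
```

One caveat: a column whose name contains `-` for another reason, such as `+/-`, would be cut down to `+/`.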

In [None]:
## splitting the data by division
atlantic = ['Boston Celtics', 'Philadelphia 76ers', 'Brooklyn Nets', 
            'New York Knicks', 'Toronto Raptors']

central = ['Milwaukee Bucks', 'Indiana Pacers', 'Cleveland Cavaliers', 
          'Chicago Bulls', 'Detroit Pistons']

southeast = ['Atlanta Hawks', 'Orlando Magic', 'Charlotte Hornets', 
             'Miami Heat', 'Washington Wizards']

northwest = ['Utah Jazz', 'Portland Trail Blazers', 'Denver Nuggets', 
             'Oklahoma City Thunder', 'Minnesota Timberwolves']

pacific = ['Los Angeles Lakers', 'Los Angeles Clippers', 'Phoenix Suns', 
           'Golden State Warriors', 'Sacramento Kings']

southwest = ['San Antonio Spurs', 'Memphis Grizzlies','Dallas Mavericks', 
             'New Orleans Pelicans', 'Houston Rockets']

In [None]:
df.loc[df.Team.isin(atlantic), 'division'] = 'Atlantic'
df.loc[df.Team.isin(central), 'division'] = 'Central'
df.loc[df.Team.isin(southeast), 'division'] = 'Southeast'
df.loc[df.Team.isin(northwest), 'division'] = 'Northwest'
df.loc[df.Team.isin(pacific), 'division'] = 'Pacific'
df.loc[df.Team.isin(southwest), 'division'] = 'Southwest'

In [None]:
df.loc[df.Team.isin(atlantic), 'conference'] = 'east'
df.loc[df.Team.isin(central), 'conference'] = 'east'
df.loc[df.Team.isin(southeast), 'conference'] = 'east'
df.loc[df.Team.isin(northwest), 'conference'] = 'west'
df.loc[df.Team.isin(pacific), 'conference'] = 'west'
df.loc[df.Team.isin(southwest), 'conference'] = 'west'
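
The twelve `.loc` assignments can also be expressed as two lookups: invert the `{division: [teams]}` lists into a `{team: division}` dict, then `map` divisions to conferences. The sketch below uses shortened, hypothetical lists; the notebook's full division lists would take their place.

```python
import pandas as pd

# shortened division lists for illustration only
divisions = {
    'Atlantic': ['Boston Celtics', 'Toronto Raptors'],
    'Pacific': ['Golden State Warriors', 'Sacramento Kings'],
}
conference_of = {'Atlantic': 'east', 'Central': 'east', 'Southeast': 'east',
                 'Northwest': 'west', 'Pacific': 'west', 'Southwest': 'west'}

# invert {division: [teams]} into {team: division}
team_to_division = {team: div for div, teams in divisions.items() for team in teams}

toy = pd.DataFrame({'Team': ['Boston Celtics', 'Golden State Warriors', 'Toronto Raptors']})
toy['division'] = toy['Team'].map(team_to_division)
toy['conference'] = toy['division'].map(conference_of)
```

A team missing from the lists ends up as `NaN` in both columns, which `isnull()` will flag, instead of silently receiving a default label.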

In [None]:
df.conference.value_counts()

In [None]:
df['division'].value_counts()

In [None]:
fig = px.histogram(df, x='division', y='made_playoffs')
fig.show()

In [None]:
fig = px.histogram(df, x='season', y='FGM-Total')
fig.show()

In [None]:
fig = px.histogram(df, x='conference', y='FGM-PG')
fig.show()

In [None]:
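data = {}
for i in stat_cols:
    if i not in ['Team', 'G']:
        # match the part before '-' exactly, so a stat whose name is a prefix of another is not mixed in
        data[i] = df[[q for q in df.columns if q.split('-')[0] == i]]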
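
Grouping columns with `startswith` is only safe when no stat name is a prefix of another. Whether that holds depends on the CSV headers; basketball-reference, for example, uses `FT` for made free throws, which is a prefix of `FTA`. The column names below are hypothetical and show the difference between a prefix test and an exact comparison of the part before `-`.

```python
# hypothetical column names: 'FT' (made free throws) and 'FTA' (attempts)
cols = ['FT-PG', 'FT-Total', 'FTA-PG', 'FTA-Total']

# startswith('FT') also picks up the FTA columns
loose = [c for c in cols if c.startswith('FT')]
# comparing the part before '-' matches the stat exactly
exact = [c for c in cols if c.split('-')[0] == 'FT']
```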

In [None]:
data['TOV']

In [None]:
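# value counts of per-game free-throw attempts (assumes an 'FTA-PG' column, following the '<stat>-PG' naming)
fta = data['FTA']['FTA-PG'].value_counts().sort_index()
fig = go.Figure([go.Bar(x=fta.index, y=fta.values)])
fig.show()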

In [None]:
Teams = {}
for i in df.Team.unique():
    Teams[i] = df[df.Team == i]

In [None]:
Teams.keys()

In [None]:
Teams['Los Angeles Lakers'].sort_values('FG%-PG', ascending=False)

In [None]:
min_per_game = df['MP-PG'].value_counts()
min_per_game

In [None]:
Divisions = {}
for i in df.division.unique():
    Divisions[i] = df[df.division == i]

In [None]:
Divisions.keys()

In [None]:
Seasons = {}
for i in df.season.unique():
    Seasons[i] = df[df.season == i]
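
The `Teams`, `Divisions`, and `Seasons` dictionaries all follow one pattern: one sub-DataFrame per distinct value of a column. A sketch using `groupby` is below; `split_by` is a hypothetical helper name, and `sort=False` keeps keys in first-appearance order, matching `unique()`.

```python
import pandas as pd

def split_by(frame, col):
    """Return {value: sub-DataFrame} for each distinct value of `col`."""
    # iterating a groupby yields (key, group) pairs; dict() collects them
    return dict(tuple(frame.groupby(col, sort=False)))

toy = pd.DataFrame({'Team': ['A', 'B', 'A'], 'season': ['2014-15', '2014-15', '2015-16']})
by_team = split_by(toy, 'Team')
by_season = split_by(toy, 'season')
```

With it, the three loops would become `split_by(df, 'Team')`, `split_by(df, 'division')`, and `split_by(df, 'season')`.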

In [None]:
szn = pd.DataFrame(Seasons.items())
szn.head()

In [None]:
Seasons.keys()

In [None]:
pac = Divisions['Pacific'][['Team', 'season', 'FG%-PG']]
pac = pac.sort_values('FG%-PG', ascending = False)
fig = px.bar(x=pac.Team, y=pac['FG%-PG'], color = pac['season'])
fig.update_layout(barmode='stack')
fig.show()

In [None]:
fig = go.Figure(data=[
    # histfunc='avg' plots the mean 3P% per division (the default 'count' would just count rows)
    go.Histogram(name='atlantic', x=Divisions['Atlantic']['division'], y=Divisions['Atlantic']['3P%-PG'], histfunc='avg'),
    go.Histogram(name='pacific', x=Divisions['Pacific']['division'], y=Divisions['Pacific']['3P%-PG'], histfunc='avg')
])
# show the two divisions' bars side by side
fig.update_layout(barmode='group')
fig.show()
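
When both `x` and `y` are given, `go.Histogram` defaults to `histfunc='count'`, so each bar counts team-seasons rather than summarising `3P%-PG`; `histfunc='avg'` plots the division mean instead. The numbers behind either version can be checked with a pandas groupby, shown here on placeholder values.

```python
import pandas as pd

toy = pd.DataFrame({
    'division': ['Atlantic', 'Atlantic', 'Pacific', 'Pacific'],
    '3P%-PG': [0.35, 0.37, 0.36, 0.40],
})
counts = toy.groupby('division')['3P%-PG'].count()  # what histfunc='count' plots
means = toy.groupby('division')['3P%-PG'].mean()    # what histfunc='avg' plots
```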

In [None]:
Seasons['2015-16']['division']

In [None]:
# https://stackoverflow.com/questions/59406167/plotly-how-to-filter-a-pandas-dataframe-using-a-dropdown-menu
# https://plotly.com/python/dropdowns/

# only numeric columns can be drawn as bar heights
num_cols = df.select_dtypes(include='number').columns
# the concatenated index repeats each season's row numbers, so label bars by team and season
labels = df['Team'] + ' (' + df['season'] + ')'

fig = go.Figure()
# start with the first numeric stat; the dropdown swaps in the others
fig.add_trace(go.Bar(name='Team', x=labels, y=df[num_cols[0]]))

# buttons are the entries you see in the dropdown: one per stat column
# (dropdowns can do more than replace data)
buttons = []
for col in num_cols:
    buttons.append(dict(method='restyle',
                        label=col,
                        visible=True,
                        # [0] targets the first (and only) trace
                        args=[{'y': [df[col]], 'x': [labels], 'type': 'bar'}, [0]]))

# add the dropdown menu to the figure
updatemenu = [dict(buttons=buttons, direction='down', showactive=True)]
fig.update_layout(showlegend=True, updatemenus=updatemenu)
fig.show()
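
The same button-building pattern appears in every chart that swaps data through a dropdown, so it could be factored into a small helper. `restyle_buttons` below is a hypothetical name, not a plotly function; it only returns the plain dicts plotly accepts in `updatemenus`, where `args` is a pair of (attribute updates, target trace indices) and each new value is wrapped in a list with one entry per targeted trace.

```python
def restyle_buttons(series_by_label, trace_index=0):
    """Build updatemenu buttons that replace x/y of trace `trace_index`.

    series_by_label maps a dropdown label to an (x, y) pair of sequences.
    """
    buttons = []
    for label, (x, y) in series_by_label.items():
        buttons.append(dict(
            method='restyle',
            label=label,
            # one list entry per targeted trace, hence the extra brackets
            args=[{'x': [list(x)], 'y': [list(y)]}, [trace_index]],
        ))
    return buttons

menu = dict(buttons=restyle_buttons({'PTS': (['A', 'B'], [100, 110]),
                                     'AST': (['A', 'B'], [20, 25])}),
            direction='down', showactive=True)
```

The resulting `menu` would be attached with `fig.update_layout(updatemenus=[menu])`.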

In [None]:
fig = go.Figure(layout=go.Layout(title= go.layout.Title(text="Comparing Teams by Stats")))
fig.add_trace(go.Bar(name= 'Team 1', x= df.index, y=df.values))

buttons = []
# add buttons for first series of bars  
for i in list(Teams.keys())[0:]:
    buttons.append(dict(method='restyle',
                        label= i,
                        visible=True,
                        args=[{'y':[Teams[i]['FG%-PG']],
                               'x':[Teams[i]['3P%-PG'].index.name],
                               'type':'bar'}, [0]], # the [0] at the end lets us know they are for the first trace
                        )
                  )

fig.add_trace(go.Bar(name= 'Team 2',x= df.index, y=(df.values)))

buttons2 = []
# add buttons for second series of bars               
for i in list(Teams.keys())[0:]:
    buttons2.append(dict(method='restyle',
                        label= i,
                        visible=True,
                        args=[{'y':[Teams[i]['3P%-PG']],
                               'x':[Teams[i]['3P%-PG'].index.name],
                               'type':'bar'}, [1]], # the [1] at the end lets us know they are for the first trace
                        )                        #literally figured that out by just experimenting 
                  )
    

# adjusted dropdown placement 
#found out updatemenus take a dictionary of buttons and allow you to format how the dropdowns look etc.
# https://plotly.com/python/dropdowns/
button_layer_1_height = 1.23
updatemenus = list([
    dict(buttons=buttons,
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=button_layer_1_height,
            yanchor="top"),
    dict(buttons=buttons2,
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.5,
            xanchor="left",
            y=button_layer_1_height,
            yanchor="top")])


fig.update_layout(annotations=[
        dict(text="Team 1", x=0, xref="paper", y=1.15, yref="paper",
                             align="left", showarrow=False),
        dict(text="Team 2", x=0.65, xref="paper", y=1.15,
                             yref="paper", showarrow=False)
    ])
fig.update_xaxes(categoryorder= 'array', categoryarray=df.season.unique())
#fig.update_xaxes(ticktext = ['2014-2015', '2015-2016', '2016-2017','2017-2018', '2018-2019'], tickmode='array')  
fig.update_layout(updatemenus=updatemenus)
fig.show()

#add topline to each for all types
# add seleciton 1 and selection 2

In [None]:
columns = df.season.unique()
columns

In [None]:
headers = ['Team', 'Match-Up', 'Game-Date', 'Season', 'W/L', 'Min', 'PTS', 'FGM', 'FGA', 'FG%', '3PM', '3PA', '3P%',
          'FTM', 'FTA', 'FT%', 'OREB', 'DREB', 'REB', 'AST', 'TOV', 'STL', 'BLK', 'PF', '+/-']
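
In a long literal list such as `headers`, a missing comma between two strings does not raise an error: Python joins adjacent string literals at compile time, so two column names silently become one and every later name shifts by a position. A length check (one name per box-score column, 25 here) catches it before `read_csv` misaligns the data.

```python
# adjacent string literals are concatenated, so a missing comma merges two names
broken = ['3P%' 'FTM', 'FTA']
fixed = ['3P%', 'FTM', 'FTA']
```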

In [None]:
boxscores_2014 = pd.read_csv('data/totals_2014-15.csv', names=headers, skiprows=1, index_col=False)
boxscores_2014.head(10)

In [None]:
boxscores_2014.info()

In [None]:
boxscores_2014 = boxscores_2014.replace('\n','', regex=True)
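
`regex=True` is what makes `DataFrame.replace` strip newlines *inside* string cells; without it, only cells whose entire value equals `'\n'` would be replaced. Non-string columns are left alone. The toy frame below uses placeholder values.

```python
import pandas as pd

toy = pd.DataFrame({'Team': ['Boston Celtics\n', 'Utah\nJazz'], 'PTS': [100, 95]})
# substring replacement inside every string cell; numeric columns are untouched
cleaned = toy.replace('\n', '', regex=True)
```

Note that an embedded newline is removed without a space (`'Utah\nJazz'` becomes `'UtahJazz'`); replacing with `' '` and then calling `str.strip()` may be preferable if names were wrapped mid-value.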

In [None]:
boxscores_2014.head()

In [None]:
boxscores_2014.shape

In [None]:
boxscores_2014.Team.value_counts()