# NBA Project

## Introduction

This project analyzes the Washington Wizards, focusing on the team's offensive capabilities. With Wizards star John Wall planning a return from injury in the 2020-21 NBA season, the analysis seeks to determine whether his return will address the team's biggest offensive weaknesses. My hypothesis is that over the past two seasons, while John Wall has been injured, the Wizards have lacked a top-quality playmaker and that their greatest weakness is a lack of assists. Because John Wall's greatest strength is creating assists, I expect his return to address this need.

To begin this analysis, I first had to acquire the data needed to understand how the Washington Wizards compare to other NBA teams. As discussed in Project Deliverable 1, an immense amount of data is readily accessible directly from the NBA using a Python API called nba_api developed by Swar Patel (Patel, 2018). Using this API, I connected to several of the available endpoints.

## Import packages and begin evaluating endpoints

In [2]:
import pandas as pd

First, I had to collect team statistics in order to identify the ID for the Washington Wizards. This was accomplished by calling the teams endpoint.

In [4]:
from nba_api.stats.static import teams
# get_teams returns a list of 30 dictionaries, each an NBA team.
nba_teams = teams.get_teams()

In [5]:
# Testing to see the results for the Wizards, in order to find the team_id
wiz = [team for team in nba_teams
         if team['full_name'] == 'Washington Wizards'][0]
wiz

{'id': 1610612764,
 'full_name': 'Washington Wizards',
 'abbreviation': 'WAS',
 'nickname': 'Wizards',
 'city': 'Washington',
 'state': 'District of Columbia',
 'year_founded': 1961}

I next referred to the teamyearbyyearstats endpoint to understand the historical statistics of the Washington Wizards. In future analysis, I determined this endpoint was not needed, but it provided a useful starting point as I began exploring the available endpoints and how they could be used in tandem to perform this analysis.

In [6]:
from nba_api.stats.endpoints import teamyearbyyearstats

In [7]:
wiz_id = '1610612764'
wiz_stats = teamyearbyyearstats.TeamYearByYearStats(team_id=wiz_id)
all_wiz_stats = wiz_stats.get_data_frames()[0]
all_wiz_stats.head(5)

Unnamed: 0,TEAM_ID,TEAM_CITY,TEAM_NAME,YEAR,GP,WINS,LOSSES,WIN_PCT,CONF_RANK,DIV_RANK,...,OREB,DREB,REB,AST,PF,STL,TOV,BLK,PTS,PTS_RANK
0,1610612764,Chicago,Packers,1961-62,80,18,62,0.225,0,5,...,0,0,0,1802,1954,0,0,0,8874,9
1,1610612764,Chicago,Zephyrs,1962-63,80,25,55,0.313,0,5,...,0,0,0,1773,2065,0,0,0,8795,8
2,1610612764,Baltimore,Bullets,1963-64,80,31,49,0.388,0,4,...,0,0,0,1423,2073,0,0,0,8948,5
3,1610612764,Baltimore,Bullets,1964-65,80,37,43,0.463,0,3,...,0,0,0,1676,2119,0,0,0,9087,2
4,1610612764,Baltimore,Bullets,1965-66,80,38,42,0.475,0,2,...,0,0,0,1890,2199,0,0,0,9465,2


Having assessed the teamyearbyyearstats endpoint, I determined that this analysis does not need the full set of data from 1961 to the present. Instead, I referred to Basketball Reference (https://www.basketball-reference.com/players/w/walljo01.html) to determine the seasons that John Wall has played in the league. Since he started in 2010, we will focus this analysis on the seasons from 2010-11 through 2019-20.

In [8]:
# John Wall joined the league in 2010, so we won't need anything before that season
years = ['2010-11','2011-12','2012-13','2013-14','2014-15','2015-16','2016-17','2017-18','2018-19','2019-20']
years

['2010-11',
 '2011-12',
 '2012-13',
 '2013-14',
 '2014-15',
 '2015-16',
 '2016-17',
 '2017-18',
 '2018-19',
 '2019-20']
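The season labels can also be generated rather than typed by hand, which avoids typos if the range changes. The following is a minimal sketch; the helper name `season_labels` is my own and is not part of nba_api.

```python
def season_labels(first_start_year, last_start_year):
    """Return NBA season labels such as '2010-11', one per starting year (inclusive)."""
    return [
        f"{year}-{(year + 1) % 100:02d}"  # two-digit ending year, zero padded
        for year in range(first_start_year, last_start_year + 1)
    ]


years = season_labels(2010, 2019)
print(years)
```

The `% 100` keeps the suffix to two digits, so a season starting in 1999 would be labeled '1999-00'.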

### First data pull: team statistics by year

Following this analysis, I determined that the teamyearbyyearstats endpoint provided more information than was necessary for this analysis, so I looked instead to the endpoint titled 'leaguedashteamstats'. This endpoint allowed me to select specific years and collect all of the team-level statistics for each team in the league.

In [9]:
from nba_api.stats.endpoints import leaguedashteamstats

In [10]:
# To pull the data for all years from 2010-2020, I used a for-loop to build a single dataframe.
# DataFrame.append was removed in pandas 2.0, so the per-season frames are collected
# in a list and combined with pd.concat.
season_frames = []
for year in years:
    scoring = leaguedashteamstats.LeagueDashTeamStats(season=year)
    scoring_stats = scoring.get_data_frames()[0]
    scoring_stats['SEASON'] = year
    season_frames.append(scoring_stats)
team_stats = pd.concat(season_frames)
team_stats['SEASON'].unique()

array(['2010-11', '2011-12', '2012-13', '2013-14', '2014-15', '2015-16',
       '2016-17', '2017-18', '2018-19', '2019-20'], dtype=object)

Looking specifically at the Wizards subset of the data, we see that there are now 10 rows for each team, one per season from 2010-11 to 2019-20. The most obvious cleansing step is to account for seasons with different numbers of games played (GP). The 2011-12 season was only 66 games because of a lockout stemming from disputes between the players and the owners over the league's collective bargaining agreement (CBA), largely over salary caps. Then, in 2019-20, which ended in October 2020, the season was only 72 games because of the COVID-19 pandemic. To account for this, we will need to focus largely on team ranks or use per-game statistics rather than totals.

In [11]:
team_stats[team_stats['TEAM_NAME']=='Washington Wizards']

Unnamed: 0,TEAM_ID,TEAM_NAME,GP,W,L,W_PCT,MIN,FGM,FGA,FG_PCT,...,STL_RANK,BLK_RANK,BLKA_RANK,PF_RANK,PFD_RANK,PTS_RANK,PLUS_MINUS_RANK,CFID,CFPARAMS,SEASON
29,1610612764,Washington Wizards,82,23,59,0.28,3986.0,3048,6888,0.443,...,4,1,20,29,21,21,29,10,Washington Wizards,2010-11
29,1610612764,Washington Wizards,66,20,46,0.303,3173.0,2414,5475,0.441,...,13,2,7,26,23,22,26,10,Washington Wizards,2011-12
29,1610612764,Washington Wizards,82,29,53,0.354,3971.0,2910,6693,0.435,...,20,21,7,20,21,28,21,10,Washington Wizards,2012-13
29,1610612764,Washington Wizards,82,44,38,0.537,4011.0,3177,6920,0.459,...,11,15,7,13,23,16,15,10,Washington Wizards,2013-14
29,1610612764,Washington Wizards,82,46,36,0.561,3991.0,3139,6790,0.462,...,20,16,6,17,21,17,14,10,Washington Wizards,2014-15
29,1610612764,Washington Wizards,82,41,41,0.5,3951.0,3238,7033,0.46,...,8,26,6,21,16,9,17,10,Washington Wizards,2015-16
29,1610612764,Washington Wizards,82,49,33,0.598,3971.0,3388,7137,0.475,...,2,24,10,27,18,5,9,10,Washington Wizards,2016-17
29,1610612764,Washington Wizards,82,43,39,0.524,3971.0,3275,7018,0.467,...,10,22,13,25,15,13,15,10,Washington Wizards,2017-18
29,1610612764,Washington Wizards,82,32,50,0.39,3986.0,3456,7387,0.468,...,7,23,10,13,12,10,25,10,Washington Wizards,2018-19
29,1610612764,Washington Wizards,72,25,47,0.347,3471.0,2990,6544,0.457,...,10,22,20,30,4,8,25,10,Washington Wizards,2019-20
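The rows above show why season totals can mislead. As a small illustration, the sketch below copies the GP and FGM values for three Wizards seasons from that output and compares totals with per-game values; the dataframe name `wiz_fgm` is only for this example.

```python
import pandas as pd

# GP and FGM copied from the Wizards rows shown above.
wiz_fgm = pd.DataFrame({
    "SEASON": ["2010-11", "2011-12", "2019-20"],
    "GP": [82, 66, 72],
    "FGM": [3048, 2414, 2990],
})

# Dividing by games played puts shortened seasons on the same scale.
wiz_fgm["FGM_PG"] = wiz_fgm["FGM"] / wiz_fgm["GP"]
print(wiz_fgm.round(2))
```

By total field goals made, 2010-11 looks better than 2019-20, but per game the order reverses because 2019-20 had ten fewer games.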


### Second data pull: all player stats by year

The second key endpoint that will be used for this analysis is the 'leaguedashplayerstats' endpoint. This endpoint provides the same set of statistics as the 'leaguedashteamstats' endpoint, but provides these statistics at the player level rather than the team level. This will allow us to compare John Wall with other players.

In [12]:
from nba_api.stats.endpoints import leaguedashplayerstats

In [13]:
# We again use a for-loop to collect all seasons from 2010-2020 into a single dataframe.
# As above, pd.concat replaces DataFrame.append, which was removed in pandas 2.0.
player_frames = []
for year in years:
    all_stats = leaguedashplayerstats.LeagueDashPlayerStats(season=year)
    player_stats_x = all_stats.get_data_frames()[0]
    player_stats_x['SEASON'] = year
    player_frames.append(player_stats_x)
player_stats = pd.concat(player_frames)
player_stats['SEASON'].unique()

array(['2010-11', '2011-12', '2012-13', '2013-14', '2014-15', '2015-16',
       '2016-17', '2017-18', '2018-19', '2019-20'], dtype=object)

## Data Cleansing

We will need to update the team names in both datasets. For example, the Brooklyn Nets were known as the New Jersey Nets until 2012, so both names are in the dataset but represent the same team. We will reference https://www.world-today-news.com/origins-and-name-changes-of-all-nba-teams/ for this information.

In [14]:
player_stats.columns

Index(['PLAYER_ID', 'PLAYER_NAME', 'TEAM_ID', 'TEAM_ABBREVIATION', 'AGE', 'GP',
       'W', 'L', 'W_PCT', 'MIN', 'FGM', 'FGA', 'FG_PCT', 'FG3M', 'FG3A',
       'FG3_PCT', 'FTM', 'FTA', 'FT_PCT', 'OREB', 'DREB', 'REB', 'AST', 'TOV',
       'STL', 'BLK', 'BLKA', 'PF', 'PFD', 'PTS', 'PLUS_MINUS',
       'NBA_FANTASY_PTS', 'DD2', 'TD3', 'GP_RANK', 'W_RANK', 'L_RANK',
       'W_PCT_RANK', 'MIN_RANK', 'FGM_RANK', 'FGA_RANK', 'FG_PCT_RANK',
       'FG3M_RANK', 'FG3A_RANK', 'FG3_PCT_RANK', 'FTM_RANK', 'FTA_RANK',
       'FT_PCT_RANK', 'OREB_RANK', 'DREB_RANK', 'REB_RANK', 'AST_RANK',
       'TOV_RANK', 'STL_RANK', 'BLK_RANK', 'BLKA_RANK', 'PF_RANK', 'PFD_RANK',
       'PTS_RANK', 'PLUS_MINUS_RANK', 'NBA_FANTASY_PTS_RANK', 'DD2_RANK',
       'TD3_RANK', 'CFID', 'CFPARAMS', 'SEASON'],
      dtype='object')

In [15]:
team_stats.columns

Index(['TEAM_ID', 'TEAM_NAME', 'GP', 'W', 'L', 'W_PCT', 'MIN', 'FGM', 'FGA',
       'FG_PCT', 'FG3M', 'FG3A', 'FG3_PCT', 'FTM', 'FTA', 'FT_PCT', 'OREB',
       'DREB', 'REB', 'AST', 'TOV', 'STL', 'BLK', 'BLKA', 'PF', 'PFD', 'PTS',
       'PLUS_MINUS', 'GP_RANK', 'W_RANK', 'L_RANK', 'W_PCT_RANK', 'MIN_RANK',
       'FGM_RANK', 'FGA_RANK', 'FG_PCT_RANK', 'FG3M_RANK', 'FG3A_RANK',
       'FG3_PCT_RANK', 'FTM_RANK', 'FTA_RANK', 'FT_PCT_RANK', 'OREB_RANK',
       'DREB_RANK', 'REB_RANK', 'AST_RANK', 'TOV_RANK', 'STL_RANK', 'BLK_RANK',
       'BLKA_RANK', 'PF_RANK', 'PFD_RANK', 'PTS_RANK', 'PLUS_MINUS_RANK',
       'CFID', 'CFPARAMS', 'SEASON'],
      dtype='object')

Looking at the columns for both datasets, we see that the team stats and player stats refer to teams differently: the team stats use the full team name, whereas the player stats use only the team abbreviation. The four team names that need changing are the 'Charlotte Bobcats' (Charlotte Hornets), 'New Jersey Nets' (Brooklyn Nets), 'New Orleans Hornets' (New Orleans Pelicans), and 'LA Clippers' (Los Angeles Clippers). However, because both the Charlotte team and the Clippers kept their abbreviations when their names changed, we only need to correct two of the abbreviations.

In [16]:
# Map historical team names to their current names in one pass.
# (Chained `df[col].loc[mask] = value` assignment may not update the original dataframe.)
team_stats['TEAM_NAME'] = team_stats['TEAM_NAME'].replace({
    'Charlotte Bobcats': 'Charlotte Hornets',
    'New Jersey Nets': 'Brooklyn Nets',
    'New Orleans Hornets': 'New Orleans Pelicans',
    'LA Clippers': 'Los Angeles Clippers',
})
team_stats['TEAM_NAME'].unique()

array(['Atlanta Hawks', 'Boston Celtics', 'Charlotte Hornets',
       'Chicago Bulls', 'Cleveland Cavaliers', 'Dallas Mavericks',
       'Denver Nuggets', 'Detroit Pistons', 'Golden State Warriors',
       'Houston Rockets', 'Indiana Pacers', 'Los Angeles Clippers',
       'Los Angeles Lakers', 'Memphis Grizzlies', 'Miami Heat',
       'Milwaukee Bucks', 'Minnesota Timberwolves', 'Brooklyn Nets',
       'New Orleans Pelicans', 'New York Knicks', 'Oklahoma City Thunder',
       'Orlando Magic', 'Philadelphia 76ers', 'Phoenix Suns',
       'Portland Trail Blazers', 'Sacramento Kings', 'San Antonio Spurs',
       'Toronto Raptors', 'Utah Jazz', 'Washington Wizards'], dtype=object)

In [17]:
# Update the two abbreviations that changed (New Orleans and Brooklyn).
player_stats['TEAM_ABBREVIATION'] = player_stats['TEAM_ABBREVIATION'].replace({'NOH': 'NOP', 'NJN': 'BKN'})
player_stats['TEAM_ABBREVIATION'].unique()

array(['IND', 'PHX', 'NOP', 'GSW', 'DEN', 'ATL', 'UTA', 'LAC', 'TOR',
       'CLE', 'NYK', 'WAS', 'PHI', 'POR', 'MIL', 'LAL', 'BKN', 'MIN',
       'SAC', 'SAS', 'DET', 'BOS', 'CHA', 'HOU', 'ORL', 'DAL', 'CHI',
       'OKC', 'MIA', 'MEM'], dtype=object)

As mentioned, the next step is to create per-game statistics for any of the total stats. Percentages and rankings do not need to be updated, so we can focus on only updating the cumulative statistics, such as Wins, Points, Assists and Rebounds.

In both datasets, these columns are: 'W', 'L', 'MIN', 'FGM', 'FGA',
       'FG3M', 'FG3A', 'FTM', 'FTA', 'OREB', 'DREB', 'REB', 'AST', 'TOV',
       'STL', 'BLK', 'BLKA', 'PF', 'PFD', 'PTS', 'PLUS_MINUS'.

Each of these is divided by games played ('GP') to produce a per-game column with the suffix '_PG'.

In [18]:
per_game_columns = ['W','L','MIN', 'FGM', 'FGA', 'FG3M', 'FG3A', 'FTM', 'FTA', 'OREB', 'DREB', 'REB', 'AST', 'TOV', 'STL', 'BLK', 'BLKA', 'PF', 'PFD', 'PTS', 'PLUS_MINUS']

for stat in per_game_columns:
    team_stats[stat+'_PG'] = team_stats[stat]/team_stats['GP']
    player_stats[stat+'_PG'] = player_stats[stat]/player_stats['GP']

In [19]:
import numpy as np

# For player stats, I'd also like to see what percentage of games they were available for.
# A normal season is 82 games. In 2011-12, all teams played 66 games.
# In 2019-20, 22 of the 30 teams played 72 games. The other 8 teams played only 64 games.
short_2019_teams = ['GSW','ATL','CLE','NYK','MIN','CHA','CHI','DET']

season = player_stats['SEASON'].to_numpy()
on_short_team = player_stats['TEAM_ABBREVIATION'].isin(short_2019_teams).to_numpy()

# Pick the season length for each row. Plain arrays avoid index alignment,
# since the index repeats across seasons after concatenation.
# np.select uses the first matching condition, so the 64-game case comes before the 72-game case.
season_games = np.select(
    [season == '2011-12', (season == '2019-20') & on_short_team, season == '2019-20'],
    [66, 64, 72],
    default=82,
)
player_stats['GP_Perc'] = player_stats['GP'] / season_games
player_stats['SEASON'].unique()

array(['2010-11', '2011-12', '2012-13', '2013-14', '2014-15', '2015-16',
       '2016-17', '2017-18', '2018-19', '2019-20'], dtype=object)

Because teams played different numbers of games in 2019-20, the rankings for that season are not completely accurate: the ranking statistics are based on season totals. We therefore need to recalculate the 2019-20 rankings from each team's per-game statistics.

In [20]:
# Using the same per game columns list, we can calculate the rank for each team.

# First, select subset of data for just 2019-20
team_stats_latest = team_stats[team_stats['SEASON']=='2019-20'].copy()  # copy so the new ranks are not written to a view
for stat in per_game_columns:
    team_stats_latest[stat+'_RANK'] = team_stats_latest[stat+'_PG'].rank(ascending=False)

# For losses, turnovers, and fouls, a lower value is better, so rank ascending
for stat in ['L','TOV','PF']:
    team_stats_latest[stat+'_RANK'] = team_stats_latest[stat+'_PG'].rank(ascending=True)

# Next, drop 2019-20 from original dataset and add the new 2019-20 dataset
# (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
team_stats = team_stats[team_stats['SEASON']!='2019-20']
team_stats = pd.concat([team_stats, team_stats_latest])
team_stats.tail(3)

Unnamed: 0,TEAM_ID,TEAM_NAME,GP,W,L,W_PCT,MIN,FGM,FGA,FG_PCT,...,REB_PG,AST_PG,TOV_PG,STL_PG,BLK_PG,BLKA_PG,PF_PG,PFD_PG,PTS_PG,PLUS_MINUS_PG
27,1610612761,Toronto Raptors,72,53,19,0.736,3476.0,2897,6331,0.458,...,45.388889,25.222222,14.819444,8.833333,4.972222,5.486111,21.652778,20.361111,112.75,6.236111
28,1610612762,Utah Jazz,72,44,28,0.611,3471.0,2886,6130,0.471,...,44.902778,22.430556,15.125,6.083333,4.055556,4.569444,20.388889,20.847222,111.291667,2.472222
29,1610612764,Washington Wizards,72,25,47,0.347,3471.0,2990,6544,0.457,...,42.041667,25.013889,14.166667,7.972222,4.291667,5.041667,22.694444,22.236111,114.416667,-4.666667
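To make the effect of re-ranking concrete, here is a toy sketch with three hypothetical teams (the team labels and totals are invented for illustration and are not from the dataset). It ranks the same teams once by total assists and once by assists per game.

```python
import pandas as pd

# Hypothetical totals; only the column layout mirrors the real data.
toy = pd.DataFrame({
    "TEAM": ["A", "B", "C"],
    "GP": [72, 64, 72],
    "AST": [1800, 1664, 1728],
})
toy["AST_PG"] = toy["AST"] / toy["GP"]

# Rank 1 is best. method="min" gives tied teams the same whole-number rank.
toy["AST_RANK_TOTAL"] = toy["AST"].rank(ascending=False, method="min").astype(int)
toy["AST_RANK_PG"] = toy["AST_PG"].rank(ascending=False, method="min").astype(int)
print(toy)
```

Team B played only 64 games, so it ranks last by total but first per game. Note that `rank` defaults to `method='average'`, which assigns fractional ranks to ties; `method='min'` is one alternative if whole-number ranks are preferred.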


I then tried to pull a list of all players and their positions to allow further analysis at the position level, which would let us compare John Wall with other point guards. However, an API request limit prevented me from pulling this data for the full list of teams, even when I split the pull into multiple calls instead of a single looped call. A future iteration of this project could explore other methods for this level of analysis.
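One approach a future iteration might try is to space out requests and retry failed calls with a growing pause. The sketch below is generic and has not been tested against the NBA's servers: `call_with_retry` and `flaky_fetch` are hypothetical names, and `flaky_fetch` is only a stand-in for whichever endpoint call returns player positions.

```python
import time


def call_with_retry(fetch, *args, retries=3, pause=1.0, sleep=time.sleep, **kwargs):
    """Call fetch(*args, **kwargs); on failure, wait and try again up to `retries` times."""
    for attempt in range(retries + 1):
        try:
            return fetch(*args, **kwargs)
        except Exception:  # broad on purpose for the sketch; narrow this in real use
            if attempt == retries:
                raise
            sleep(pause * (2 ** attempt))  # wait 1x, 2x, 4x ... the base pause


# Stand-in for a real endpoint call: fails twice, then succeeds.
attempts = {"count": 0}


def flaky_fetch(team_id):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated timeout")
    return {"team_id": team_id, "players": []}


result = call_with_retry(flaky_fetch, 1610612764, pause=0.0)
print(result["team_id"], attempts["count"])
```

In a real pull, the base pause would be nonzero, and a fixed delay between successful calls may also be needed. The appropriate values are an open question here.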

Finally, we will check to confirm that there are no duplicates and no missing values in our dataset.

In [21]:
print('Player Stats: ',player_stats.isnull().sum().sum(), player_stats.duplicated().sum(), 'Team Stats: ',team_stats.isnull().sum().sum(), team_stats.duplicated().sum())

Player Stats:  0 0 Team Stats:  0 0


Having cleansed the dataset and determined that there are no null values or duplicates in either of our datasets, we can now export the results to a CSV to allow us to also perform a visual analysis of the data in Tableau.

In [41]:
# Export to CSV
player_stats.to_csv('all_season_stats.csv',index=False)
team_stats.to_csv('team_stats.csv',index=False)

## Exploratory Analysis

### Averages

First, we will explore the Washington Wizards' averages for each metric and compare them with the league's, before turning to the team's rankings. This will provide a picture of which metric has been the worst over the last ten years. We can also evaluate these metrics over time, which will initially be performed in Tableau.

In [22]:
# All teams metrics
team_stats[['W_PCT','PTS_PG','AST_PG','OREB_PG','FG3M_PG']].mean()

W_PCT        0.499483
PTS_PG     103.245949
AST_PG      22.575597
OREB_PG     10.595015
FG3M_PG      8.783876
dtype: float64

In [23]:
# Wizards only
wiz_stats = team_stats[team_stats['TEAM_NAME']=='Washington Wizards']
wiz_stats[['W_PCT','PTS_PG','AST_PG','OREB_PG','FG3M_PG']].mean()

W_PCT        0.439400
PTS_PG     103.163840
AST_PG      23.224486
OREB_PG     10.545787
FG3M_PG      8.172432
dtype: float64

From this comparison, we can see that the Wizards are close to the league average on nearly all metrics, falling below it in every case except assists per game. A season-by-season comparison may reveal more detail.
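The size of each gap is easier to read as a difference and as a percentage of the league average. The sketch below copies the two sets of means from the outputs above; the names `league_mean`, `wizards_mean`, and `gap` are only for this example.

```python
import pandas as pd

metrics = ["W_PCT", "PTS_PG", "AST_PG", "OREB_PG", "FG3M_PG"]

# Ten-season means copied from the two outputs above.
league_mean = pd.Series([0.499483, 103.245949, 22.575597, 10.595015, 8.783876], index=metrics)
wizards_mean = pd.Series([0.439400, 103.163840, 23.224486, 10.545787, 8.172432], index=metrics)

gap = pd.DataFrame({
    "difference": wizards_mean - league_mean,
    "pct_of_league": (wizards_mean / league_mean - 1) * 100,
})
print(gap.round(3))
```

On these figures, assists per game is the only metric above the league average (by roughly 3%), while winning percentage (about 12% lower) and 3-pointers made (about 7% lower) show the largest relative shortfalls.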

In [24]:
print('Wizards: ', '\n', wiz_stats[['W_PCT','PTS_PG','AST_PG','OREB_PG','FG3M_PG']],'\n',  'All Teams: ', '\n', team_stats[['W_PCT','PTS_PG','AST_PG','OREB_PG','FG3M_PG']].groupby(team_stats['SEASON']).mean())

Wizards:  
     W_PCT      PTS_PG     AST_PG    OREB_PG    FG3M_PG
29  0.280   97.280488  19.414634  12.353659   4.780488
29  0.303   93.636364  19.121212  11.742424   5.212121
29  0.354   93.219512  21.646341  10.817073   6.646341
29  0.537  100.658537  23.280488  10.804878   7.890244
29  0.561   98.536585  24.012195  10.512195   6.060976
29  0.500  104.073171  24.451220   9.060976   8.646341
29  0.598  109.182927  23.853659  10.280488   9.219512
29  0.524  106.609756  25.182927  10.036585   9.926829
29  0.390  114.024390  26.268293   9.682927  11.341463
29  0.347  114.416667  25.013889  10.166667  12.000000 
 All Teams:  
             W_PCT      PTS_PG     AST_PG    OREB_PG    FG3M_PG
SEASON                                                        
2010-11  0.500000   99.550407  21.498780  10.912195   6.457724
2011-12  0.499900   96.259596  20.976263  11.369697   6.410606
2012-13  0.500100   98.135878  22.137423  11.169492   7.160976
2013-14  0.500033  101.008943  22.004472  10.913008 

Interestingly, the Wizards have been well above the league's points-per-game average in the past two seasons without John Wall, even though their winning percentage has been very low. The team's assists have also been above average, and offensive rebounding in 2018-19 is the most clearly below-average statistic since John Wall's injury.

In [25]:
wiz_stats[['W_PCT','PTS_PG','AST_PG','OREB_PG','FG3M_PG']].describe()

Unnamed: 0,W_PCT,PTS_PG,AST_PG,OREB_PG,FG3M_PG
count,10.0,10.0,10.0,10.0,10.0
mean,0.4394,103.16384,23.224486,10.545787,8.172432
std,0.116705,7.785118,2.419112,0.958285,2.499499
min,0.28,93.219512,19.121212,9.060976,4.780488
25%,0.34875,97.594512,22.054878,10.069106,6.207317
50%,0.445,102.365854,23.932927,10.396341,8.268293
75%,0.53375,108.539634,24.873222,10.814024,9.75
max,0.598,114.416667,26.268293,12.353659,12.0


In addition to the statistical averages and distributions, we can look at the Wizards' rankings to see how their rank has changed over time. Across all ten seasons, the Wizards' average rank is 12.2 in assists and 14.8 in points, compared with 16.7 in offensive rebounds, 19.5 in 3-point field goals made, and 19.6 in wins. This begins to suggest that the Wizards may be worse at shooting 3-point field goals and rebounding than they are at making assists. While these overall statistics include John Wall's contribution to the team, our previous year-by-year analysis showed that even in 2019-20, when John Wall did not play a single game, the team averaged more assists than the league.

In [26]:
wiz_stats[['W_RANK','PTS_RANK','AST_RANK','OREB_RANK','FG3M_RANK']].describe()

Unnamed: 0,W_RANK,PTS_RANK,AST_RANK,OREB_RANK,FG3M_RANK
count,10.0,10.0,10.0,10.0,10.0
mean,19.6,14.8,12.2,16.7,19.5
std,6.752777,7.360556,9.425733,6.766749,5.441609
min,9.0,5.0,4.0,3.0,13.0
25%,14.75,9.25,6.0,13.0,16.0
50%,20.0,14.5,7.0,18.0,17.5
75%,24.5,20.0,17.25,20.5,24.0
max,29.0,28.0,29.0,27.0,28.0


### Data Visualization

This project then used graphical analyses to assess trends in the dataset. Please see the attached Tableau dashboard, whose Story board presents the main visualizations.

In this dashboard, we can see that the Wizards tend to perform worse on offensive rebounding and 3-point shooting than on other offensive statistics. However, we can also see in the third slide of the dashboard that the Wizards' assist total dropped noticeably without John Wall on the team. In every season in which John Wall averaged at least 8.5 assists per game, the team ranked in the top 8 for assists. In contrast, in the 2019-20 season, the Wizards fell to 12th in assists.

We also notice that since 2016-17, the season Marcin Gortat left the Washington Wizards, the Wizards have not had a dominant offensive rebounder. The chart depicting individual players' statistics shows most teams with at least one high outlier for offensive rebounding, but the Wizards have no player with a high offensive rebounding output. In contrast, between Bradley Beal and John Wall, the Wizards have always had at least one high performer for points scored and assists.

## Modeling and Algorithms

### Correlation Analysis

We can also evaluate John Wall's average ranking for each metric; however, we can easily see that in his most recent couple of seasons, he did not play many games, which distorts these ranks. Because of this, we can instead compare his Per Game statistics to the distribution of the Per Game statistics for the league. 

In [27]:
# John Wall metrics
wall_stats = player_stats[player_stats['PLAYER_NAME']=='John Wall']
wall_stats['GP']

232    69
247    66
237    49
237    82
245    79
226    77
231    78
257    41
263    32
Name: GP, dtype: int64

In [28]:
wall_stats[['W_PCT','PTS_PG','AST_PG','OREB_PG','FG3M_PG']].describe()

Unnamed: 0,W_PCT,PTS_PG,AST_PG,OREB_PG,FG3M_PG
count,9.0,9.0,9.0,9.0,9.0
mean,0.464,19.025321,9.109345,0.575013,0.962605
std,0.12539,2.167476,1.062318,0.122684,0.584841
min,0.275,16.30303,7.612245,0.455696,0.045455
25%,0.344,17.556962,8.318841,0.46875,0.492754
50%,0.49,19.304878,8.792683,0.536585,1.141026
75%,0.561,19.883117,10.025316,0.714286,1.493506
max,0.615,23.141026,10.653846,0.75641,1.59375


In [29]:
wall_stats[['W_RANK','PTS_RANK','AST_RANK','OREB_RANK','FG3M_RANK']].mean()

W_RANK       201.333333
PTS_RANK      59.666667
AST_RANK      14.555556
OREB_RANK    231.333333
FG3M_RANK    154.000000
dtype: float64

We see that John Wall's average win percentage of 46.4% is higher than the team's average of 43.9% and much higher than the team's 34.7% win percentage in 2019-20 (down from 39% in 2018-19). We can also see that, averaged across the nine seasons he has played, John Wall ranks about 15th in assists and inside the top 60 in points.

In [30]:
wall_stats[['W_RANK','PTS_RANK','AST_RANK','OREB_RANK','FG3M_RANK']].groupby(wall_stats['SEASON']).mean()

SEASON,W_RANK,PTS_RANK,AST_RANK,OREB_RANK,FG3M_RANK
2010-11,290,53,10,248,164
2011-12,251,28,4,159,271
2012-13,258,91,30,234,244
2013-14,93,15,1,219,57
2014-15,75,19,2,227,125
2015-16,139,18,3,204,50
2016-17,47,18,3,145,107
2017-18,265,129,20,289,161
2018-19,394,166,58,357,207


Grouped by season, the rankings show even stronger evidence of John Wall's prowess as both a scorer and an assist maker. Despite playing only 41 games (50% of the season), John Wall still ranked 20th in assists in 2017-18. He also finished in the top three in assists in four consecutive seasons, from 2013-14 through 2016-17, and in the top 20 in scoring in each of those seasons. In two of those seasons he also finished in the top 60 in 3-pointers made.

In [31]:
# All NBA players
player_stats[['W_PCT','PTS_PG','AST_PG','OREB_PG','FG3M_PG']].describe()

Unnamed: 0,W_PCT,PTS_PG,AST_PG,OREB_PG,FG3M_PG
count,4934.0,4934.0,4934.0,4934.0,4934.0
mean,0.487411,8.265268,1.824629,0.878578,0.706274
std,0.185745,5.842187,1.774729,0.799016,0.737832
min,0.0,0.0,0.0,0.0,0.0
25%,0.359,3.8,0.622327,0.321429,0.032923
50%,0.5,6.870266,1.226541,0.629883,0.5
75%,0.614,11.559694,2.394781,1.1875,1.136831
max,1.0,36.128205,11.698113,5.52,5.130435
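One way to place John Wall's per-game averages on the league's scale is a z-score: the number of league standard deviations his average sits above or below the league mean. The sketch below copies the means and standard deviations from the two `describe()` outputs above. It is a rough comparison only, since the league distribution covers every player-season (including low-minute players) and Wall's figure is the mean of his nine season averages.

```python
import pandas as pd

metrics = ["PTS_PG", "AST_PG", "OREB_PG", "FG3M_PG"]

# Values copied from the describe() outputs for all players and for John Wall.
league_mean = pd.Series([8.265268, 1.824629, 0.878578, 0.706274], index=metrics)
league_std = pd.Series([5.842187, 1.774729, 0.799016, 0.737832], index=metrics)
wall_mean = pd.Series([19.025321, 9.109345, 0.575013, 0.962605], index=metrics)

z_scores = (wall_mean - league_mean) / league_std
print(z_scores.round(2))
```

On these numbers, assists stand out most (more than four standard deviations above the league mean), points are well above average, and offensive rebounding is slightly below.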


Finally, we can evaluate the impact John Wall has on the Wizards' ranks and winning percentages. To do this, we will see if there is any correlation between the number of games John Wall plays and these metrics. Additionally, we will see if there is any correlation between John Wall's per game statistics and these metrics, to see if perhaps it is not a matter of just John Wall playing but that he has to perform at a certain level to have a positive impact on the Wizards' offensive output.

In [50]:
# John Wall's impact
wiz_stats_wall = pd.merge(wiz_stats,wall_stats,how='left',on=['SEASON'])
wiz_stats_wall = wiz_stats_wall.fillna(0)
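The merge above produces the `_x` and `_y` suffixes used in the following cells: columns present in both dataframes get `_x` for the left (team) side and `_y` for the right (player) side. The toy sketch below uses invented values to show this, along with what `fillna(0)` does to a season in which the player has no row.

```python
import pandas as pd

# Hypothetical values; only the column names mirror the notebook.
team = pd.DataFrame({
    "SEASON": ["2017-18", "2018-19", "2019-20"],
    "AST_PG": [25.2, 26.3, 25.0],
})
player = pd.DataFrame({
    "SEASON": ["2017-18", "2018-19"],
    "AST_PG": [9.6, 8.7],
    "GP_Perc": [0.50, 0.39],
})

merged = pd.merge(team, player, how="left", on=["SEASON"])
print(merged.columns.tolist())  # shared column names receive _x / _y suffixes
missing_before_fill = int(merged.isna().sum().sum())

# A left merge keeps the team's 2019-20 row; the player's columns are NaN until filled.
filled = merged.fillna(0)
print(filled)
```

A zero is accurate for `GP_Perc` in a season with no games played, but a zero for the player's `AST_PG_y` is a placeholder rather than an observed average, which is worth keeping in mind when reading correlations that involve that column.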

In [51]:
wiz_stats_wall[['W_PCT_x','PTS_PG_x','FG_PCT_x','AST_PG_x','OREB_PG_x','FG3M_PG_x','GP_Perc']].corr(method='pearson')

Unnamed: 0,W_PCT_x,PTS_PG_x,FG_PCT_x,AST_PG_x,OREB_PG_x,FG3M_PG_x,GP_Perc
W_PCT_x,1.0,0.245124,0.764962,0.561654,-0.547267,0.237488,0.350155
PTS_PG_x,0.245124,1.0,0.758724,0.815712,-0.648877,0.936908,-0.628429
FG_PCT_x,0.764962,0.758724,1.0,0.798651,-0.659639,0.650244,-0.06911
AST_PG_x,0.561654,0.815712,0.798651,1.0,-0.888062,0.863119,-0.479578
OREB_PG_x,-0.547267,-0.648877,-0.659639,-0.888062,1.0,-0.749227,0.298819
FG3M_PG_x,0.237488,0.936908,0.650244,0.863119,-0.749227,1.0,-0.723688
GP_Perc,0.350155,-0.628429,-0.06911,-0.479578,0.298819,-0.723688,1.0


From the above correlation matrix, the share of games John Wall played does not appear to be strongly correlated with most of the metrics. The strongest relationship is a negative correlation of -0.72 between the percentage of games played by John Wall and the Wizards' 3-pointers made per game. However, this is likely because John Wall has played progressively fewer games, with the past three seasons his fewest, while the NBA-wide trend toward more 3-pointers has continued each year. John Wall is also not known as a strong 3-point scorer, at least relative to his scoring and assists, as we saw in his ranking statistics. Interestingly, both points per game (-0.63) and assists per game (-0.48) have a moderate negative correlation with the share of games played by John Wall.
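With only ten seasons, even fairly large correlations can arise by chance. As a rough check, the sketch below converts each correlation with `GP_Perc` into a t statistic and compares it with the two-sided 5% critical value for 8 degrees of freedom. The function name `pearson_t` is my own, and the test assumes independent, roughly normal observations, which consecutive seasons of one team may not satisfy.

```python
import math


def pearson_t(r, n):
    """t statistic for testing whether a Pearson correlation r from n pairs differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


N_SEASONS = 10
T_CRITICAL = 2.306  # two-sided 5% critical value of Student's t with 8 degrees of freedom

# Correlations with GP_Perc copied from the matrix above.
correlations = {
    "W_PCT_x": 0.350155,
    "PTS_PG_x": -0.628429,
    "AST_PG_x": -0.479578,
    "FG3M_PG_x": -0.723688,
}

for name, r in correlations.items():
    t = pearson_t(r, N_SEASONS)
    print(f"{name}: r={r:+.2f}, t={t:+.2f}, clears 5% threshold: {abs(t) > T_CRITICAL}")
```

Under these assumptions, only the correlation with 3-pointers made per game clears the threshold, and the text above already offers a league-wide trend as a likely explanation for it.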

It will be interesting to look further into the assists per game to see if the number of assists John Wall has per game has a correlation with the team's wins, points scored, field goals scored, or team assist totals.

In [52]:
wiz_stats_wall[['W_PCT_x','PTS_PG_x','FGM_PG_x','FG3M_PG_x','AST_PG_x','AST_PG_y']].corr(method='pearson')

Unnamed: 0,W_PCT_x,PTS_PG_x,FGM_PG_x,FG3M_PG_x,AST_PG_x,AST_PG_y
W_PCT_x,1.0,0.245124,0.382545,0.237488,0.561654,0.529661
PTS_PG_x,0.245124,1.0,0.97792,0.936908,0.815712,-0.32011
FGM_PG_x,0.382545,0.97792,1.0,0.8816,0.820183,-0.168321
FG3M_PG_x,0.237488,0.936908,0.8816,1.0,0.863119,-0.390597
AST_PG_x,0.561654,0.815712,0.820183,0.863119,1.0,-0.055484
AST_PG_y,0.529661,-0.32011,-0.168321,-0.390597,-0.055484,1.0


John Wall's assists per game have a moderate positive correlation (0.53) with the team's win percentage, which is a positive sign for his return. We also see that Wall's assists per game have almost no correlation (-0.06) with the team's assists per game, which is an interesting finding; it is likely due in part to other players filling in and providing assists for teammates when he is not playing.