## PSTAT 134
## J Steven Raquel
## Fri 9am Section

## Assignment 2

The objective of this assignment is to create an interactive dashboard in this Jupyter notebook using NBA data.

### Problem 1

First we need to download the data. We can do this with the `get_nba_data()` function that was defined in Lecture 5. We redefine it here and then use it.

In [None]:
import numpy as np
import ipywidgets as widgets
from IPython.display import display
import matplotlib.pyplot as plt

%matplotlib nbagg

In [None]:
# Problem 1
# from Lecture 5, Data Frame and Visualization
import pandas as pd

def get_nba_data(endpt, params, return_url=False):

    ## endpt: https://github.com/seemethere/nba_py/wiki/stats.nba.com-Endpoint-Documentation
    ## params: dictionary of parameters: i.e., {'LeagueID':'00'}
    
    from pandas import DataFrame
    from urllib.parse import urlencode
    import json
    
    useragent = "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9\""
    dataurl = "\"" + "http://stats.nba.com/stats/" + endpt + "?" + urlencode(params) + "\""
    
    # for debugging: just return the url
    if return_url:
        return(dataurl)
    
    jsonstr = !wget -q -O - --user-agent={useragent} {dataurl}
    
    data = json.loads(jsonstr[0])
    
    h = data['resultSets'][0]['headers']
    d = data['resultSets'][0]['rowSet']
    
    return(DataFrame(d, columns=h))

This function allows us to directly download the NBA data, with the specific parameters that we set. We're going to use it to take info about the 2016-17 teams and their rosters, and later to pull specific shot chart information about each player.
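
As a rough illustration of the two things the function does around the download itself, the sketch below builds the request URL with `urlencode` and turns a `resultSets`-style payload into rows. The helper names `build_stats_url` and `rows_from_payload` and the sample payload are made up for this example; only the URL prefix and the `headers`/`rowSet` keys come from the function above.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://stats.nba.com/stats/"  # same prefix used in get_nba_data()

def build_stats_url(endpt, params):
    """Hypothetical helper: assemble the query URL without sending a request."""
    return BASE_URL + endpt + "?" + urlencode(params)

def rows_from_payload(jsonstr):
    """Hypothetical helper: pair each row with the headers of the first result set."""
    result = json.loads(jsonstr)["resultSets"][0]
    return [dict(zip(result["headers"], row)) for row in result["rowSet"]]

url = build_stats_url("commonallplayers", {"LeagueID": "00", "Season": "2016-17"})

# made-up payload with the same shape the function expects
sample = json.dumps({"resultSets": [{"headers": ["TEAM_ID", "ABBREVIATION"],
                                     "rowSet": [[1, "AAA"], [2, "BBB"]]}]})
rows = rows_from_payload(sample)
print(url)
print(rows)
```

Calling `get_nba_data()` with `return_url=True` serves a similar debugging purpose, except that it also wraps the URL in quotes for the shell command.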

### Cleaning the data

Our first step is to import the NBA teams and players into DataFrames with the `get_nba_data()` function. The following code, taken from Lecture 5, subsets and cleans the data, and shows how to group by team abbreviation and by team code.

In [None]:
## get all teams
params = {'LeagueID':'00'}
teams = get_nba_data('commonTeamYears', params)

In [None]:
## get all players
params = {'LeagueID':'00', 'Season': '2016-17', 'IsOnlyCurrentSeason': '0'}
players = get_nba_data('commonallplayers', params)

In [None]:
# changing types of the columns
teams.ABBREVIATION = teams.ABBREVIATION.astype('category')
teams.TEAM_ID      = teams.TEAM_ID.astype('category')
teams.MIN_YEAR     = teams.MIN_YEAR.astype('int')
teams.MAX_YEAR     = teams.MAX_YEAR.astype('int')

We're only looking at the teams and players who were active up to the 2016-17 season, so we're going to subset the data down to the current teams.

In [None]:
# subset just current teams
teams = teams[teams['MAX_YEAR'] == 2017].copy() # only teams active as late as 2017; copy avoids SettingWithCopyWarning
teams['TEAM_AGE'] = teams.MAX_YEAR - teams.MIN_YEAR # new column for the age of the team
teams_clean = teams.copy() ## make a copy for later 

In [None]:
# adding 'TEAM_ABBREVIATION'
team_names = players[['TEAM_ABBREVIATION', 'TEAM_CODE']].drop_duplicates()#.set_index('TEAM_ABBREVIATION')
teams = pd.merge(teams_clean, team_names, left_on='ABBREVIATION', right_on='TEAM_ABBREVIATION')
teams.TEAM_CODE = teams.TEAM_CODE.str.capitalize() # returns values so needs to be reassigned
teams.sort_values('ABBREVIATION', inplace=True)    # modifies object

We'll do the same with the players, keeping only those on a team that was active in the 2016-17 season.

In [None]:
# subset just the players in current teams
players = players[players.TEAM_ID.isin(teams.TEAM_ID)]

In [None]:
# create a dictionary mapping each team's "abbreviation, team code" label to its team ID
team_dd_text = teams.TEAM_ABBREVIATION+', '+teams.TEAM_CODE
team_dd = dict(zip(team_dd_text, teams.TEAM_ID))

plyr_by_team_dd = dict()
for t, p in players.groupby('TEAM_ID'):
    plyr_by_team_dd[t] = dict(zip(p.DISPLAY_LAST_COMMA_FIRST, p.PERSON_ID))
    
plyr_dd_text = players.DISPLAY_LAST_COMMA_FIRST
plyr_dd_id = players.PERSON_ID
plyr_dd = dict(zip(plyr_dd_text, plyr_dd_id))
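
To make the shape of `plyr_by_team_dd` concrete, here is a minimal sketch on a made-up three-player roster. The column names match the `players` DataFrame above, but the IDs and names are invented for illustration.

```python
import pandas as pd

# made-up roster rows; column names match the players DataFrame above
toy_players = pd.DataFrame({
    "TEAM_ID": [10, 10, 20],
    "DISPLAY_LAST_COMMA_FIRST": ["Alpha, Ann", "Beta, Bob", "Gamma, Gus"],
    "PERSON_ID": [101, 102, 201],
})

toy_by_team = {}
for team_id, roster in toy_players.groupby("TEAM_ID"):
    # one inner dictionary per team: display name -> player ID
    toy_by_team[team_id] = dict(zip(roster.DISPLAY_LAST_COMMA_FIRST, roster.PERSON_ID))

print(toy_by_team[10])
```

The outer key is the team ID and the inner dictionary maps display names to player IDs, which is the label-to-value form that a drop-down menu can take as its options.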

### Problem 2
#### Creating widgets

The first two widgets are drop-down menus that allow us to select the team and a player from the team's roster. These were given in the lecture.

The third widget is another drop-down menu that lets you select the range from which a player was shooting relative to the basket: less than 8 ft, 8-16 ft, 16-24 ft, 24+ ft, or a Back Court Shot (from behind the half-court line).

In [None]:
from ipywidgets import interact, FloatSlider, Dropdown, Button

selected = 'LAC, Clippers'
selected2 = 'Less Than 8 ft.'
# dictionary of ranges
ranges = {'Less Than 8 ft.': '',
         '8-16 ft.': '',
         '16-24 ft.' : '',
         '24+ ft.': '',
         'Back Court Shot': ''}

team_menu = Dropdown(options=team_dd, label=selected)
plyr_menu = Dropdown(options=plyr_by_team_dd[team_dd[selected]])
range_menu = Dropdown(options=list(ranges), value=selected2)  # pass a plain list of labels rather than a dict view
fetch_button = Button(description='Count Attempts!', icon='check')

# showing the buttons
display(team_menu, plyr_menu, range_menu, fetch_button)

## update players list
def update_team(change):
    plyr_menu.index = None
    plyr_menu.options = plyr_by_team_dd[change['new']]
    plyr_menu.value = list(plyr_by_team_dd[change['new']].values())[0]

# to change the team/players when the user alters the team
team_menu.observe(update_team, names = 'value')
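
The handler registered with `observe` receives a `change` dictionary whose `'new'` entry holds the newly selected value. The sketch below exercises the same update logic without a running notebook. It assumes a made-up roster lookup and uses a `SimpleNamespace` as a stand-in for the player drop-down, so it only illustrates the control flow, not real widget behaviour.

```python
from types import SimpleNamespace

# made-up lookup with the same shape as plyr_by_team_dd: team ID -> {name: player ID}
toy_rosters = {10: {"Alpha, Ann": 101, "Beta, Bob": 102},
               20: {"Gamma, Gus": 201}}

# stand-in for the player Dropdown; a real widget would validate these attributes
toy_menu = SimpleNamespace(options={}, value=None)

def toy_update_team(change):
    roster = toy_rosters[change["new"]]           # 'new' holds the newly selected team ID
    toy_menu.options = roster                     # swap in that team's roster
    toy_menu.value = next(iter(roster.values()))  # default to the first listed player

# simulate the user switching the team drop-down from team 10 to team 20
toy_update_team({"name": "value", "old": 10, "new": 20})
print(toy_menu.value)
```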

### Problem 3
#### Downloading data with changing widget states

We want to make sure that the data is pulled correctly from the NBA stats database. The function `get_range_stats()` queries the NBA API for the selected team and player, and counts the number of shots that player attempted from each distance range.

Clicking the button below should now print the number of attempts the player made from the selected range in the 2016-2017 season.

In [None]:
# this function gets the number of attempts from a certain range
def get_range_stats(change):
    params = {'PlayerID': plyr_menu.value,
      'PlayerPosition':'',
      'Season':'2016-17',
      'ContextMeasure':'FGA',
      'DateFrom':'',
      'DateTo':'',
      'GameID':'',
      'GameSegment':'',
      'LastNGames':'0',
      'LeagueID':'00',
      'Location':'',
      'Month':'0',
      'OpponentTeamID':'0',
      'Outcome':'',
      'Period':'0',
      'Position':'',
      'RookieYear':'',
      'SeasonSegment':'',
      'SeasonType':'Regular Season',
      'TeamID': team_menu.value,
      'VsConference':'',
      'VsDivision':''}

    plyr_data = get_nba_data('shotchartdetail', params)
    fga = plyr_data.groupby('SHOT_ZONE_RANGE')['SHOT_ATTEMPTED_FLAG'].count().to_dict()
    # adding zeroes into the dictionary for unattempted ranges
    for key in ranges:
        if key not in fga:
            fga[key] = 0
    print("This player attempted ", fga[range_menu.value], 
          " shots from this range in the 2016-2017 season.", sep = "")
    
# showing the widgets and now adding the on_click functionality
display(team_menu, plyr_menu, range_menu, fetch_button)
fetch_button.on_click(get_range_stats)
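
The zero-filling loop in `get_range_stats()` could also be written with `reindex`. This is only an alternative sketch on made-up shots, not what the function above does. The column names follow the `shotchartdetail` data used above.

```python
import pandas as pd

range_labels = ["Less Than 8 ft.", "8-16 ft.", "16-24 ft.", "24+ ft.", "Back Court Shot"]

# made-up shots that only cover two of the five ranges
toy_shots = pd.DataFrame({
    "SHOT_ZONE_RANGE": ["24+ ft.", "24+ ft.", "Less Than 8 ft."],
    "SHOT_ATTEMPTED_FLAG": [1, 1, 1],
})

counts = toy_shots.groupby("SHOT_ZONE_RANGE")["SHOT_ATTEMPTED_FLAG"].count()
# reindex adds the missing ranges and fills them with 0
fga = counts.reindex(range_labels, fill_value=0).to_dict()
print(fga)
```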

### Problem 4
#### Data transformation and visualization

Now we want to create a couple of data transformations using the split-apply-combine approach from Lecture 5. We will use the `groupby` method to summarize different groupings of the data.

* The _split_ step will break up and group a DataFrame depending on the value of the specified key, for example some categorical variable like 'Team' or 'Year'. 

* The _apply_ step computes some function within individual groups. 

* The _combine_ step merges the results of these operations into an output array.
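
The three steps can be sketched on a tiny made-up table. The `Team` labels and shot outcomes below are invented; `SHOT_MADE_FLAG` is the same 0/1 column used later in this notebook.

```python
import pandas as pd

toy = pd.DataFrame({
    "Team": ["A", "A", "B", "B", "B"],
    "SHOT_MADE_FLAG": [1, 0, 1, 1, 0],
})

# split: one group per team
groups = toy.groupby("Team")["SHOT_MADE_FLAG"]
# apply + combine: each group's mean and size, gathered into one table
summary = groups.agg(["mean", "count"])
print(summary)
```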

#### James Harden's 2016-17 season

I'm going to look at the 2016-17 statistics of Houston Rockets player James Harden. Harden is a shooting guard, a position that is expected to score a lot of points, and he is currently one of the most successful players at that position.

The question we would like to answer with this approach is: how does Harden's shooting average change between home games and away games, and from one away venue to another?

In [None]:
# querying NBA API for Harden's 2016-17 season statistics
params = {'PlayerID':'201935',
          'PlayerPosition':'',
          'Season':'2016-17',
          'ContextMeasure':'FGA',
          'DateFrom':'',
          'DateTo':'',
          'GameID':'',
          'GameSegment':'',
          'LastNGames':'0',
          'LeagueID':'00',
          'Location':'',
          'Month':'0',
          'OpponentTeamID':'0',
          'Outcome':'',
          'Period':'0',
          'Position':'',
          'RookieYear':'',
          'SeasonSegment':'',
          'SeasonType':'Regular Season',
          'TeamID':'0',
          'VsConference':'',
          'VsDivision':''}

harden_shotdata = get_nba_data('shotchartdetail', params)

### Bar graph of Harden's field goal percentage at home and away

What follows is a bar graph of Harden's field goal percentage at home and at each away venue. It is meant to illustrate the old adage of 'home court advantage', the idea that teams (or players) perform better at home than they do away.

In [None]:
# vector of all the home teams, including his own home team
import matplotlib.pyplot as plt; plt.rcdefaults()
import numpy as np

season_avg = harden_shotdata['SHOT_MADE_FLAG'].mean() # 0.44

# all team abbreviations, alphabetized
home_teams = sorted(harden_shotdata['HTM'].unique())
shot_avg = harden_shotdata.groupby('HTM')['SHOT_MADE_FLAG'].mean()

# creating the bar plot
fig, ax = plt.subplots()
y_pos = np.arange(len(home_teams))
plt.barh(y_pos, shot_avg, align='center', alpha=0.5, color = 'red')
plt.yticks(y_pos, home_teams)
ax.invert_yaxis()
plt.xlim(0,1)
plt.xlabel("Shooting Average")
plt.title("Harden's Shooting Avg vs Different Home Teams, NBA '16-17 Season")
# drawing a vertical line of his 2016-2017 shooting average
plt.axvline(x=season_avg, color ='blue', alpha=0.5, 
            label=f"Mean Season FG% ({season_avg:.2f})")  # label uses the computed average
plt.legend(loc='upper right')
plt.show()
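
The bar graph has one bar per home team. To compare home and away directly, the shots could also be collapsed into two groups. The sketch below uses made-up shots and assumes, as in `harden_shotdata`, that `HTM` holds the home team's abbreviation, so that `HTM == 'HOU'` marks a Rockets home game. The `VENUE` column is introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# made-up shots; HTM is the home team's abbreviation, as in harden_shotdata
toy_shots = pd.DataFrame({
    "HTM": ["HOU", "HOU", "HOU", "HOU", "CLE", "CLE", "OKC", "OKC"],
    "SHOT_MADE_FLAG": [1, 0, 1, 0, 1, 1, 0, 1],
})

# label each shot by venue: home when the Rockets are the home team
toy_shots["VENUE"] = np.where(toy_shots["HTM"] == "HOU", "Home", "Away")
venue_avg = toy_shots.groupby("VENUE")["SHOT_MADE_FLAG"].mean()
print(venue_avg)
```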

### Observations

When we look at Harden's shooting average over the whole of the season relative to who was playing at home, a couple of interesting observations emerge. 

Firstly, Harden shot well below his season average away at the Indiana Pacers (IND) and the Oklahoma City Thunder (OKC). Secondly, he shot close to his own average when playing at home in Houston (HOU). He also had a remarkably strong performance away at Cleveland.

As for Harden's background, he played college basketball at Arizona State, near Phoenix, and his numbers in Phoenix mirror his numbers at home in Houston. He also played for Oklahoma City for three years before being traded to the Rockets, and his second least successful performance away from home was in Oklahoma City.

### Plot of Harden's field goal percentage vs distance from basket

The mark of a good shooter in basketball is being able to make shots from all around the court. We can make a line plot of Harden's field goal percentage at each distance, alongside the percentage of his shots that are taken from that distance.

In [None]:
# grouping by shot distance, and returning the avg shots landed and count of shots attempted
avg_by_distance = harden_shotdata.groupby('SHOT_DISTANCE')['SHOT_MADE_FLAG'].agg(['mean','count'])
total_shots = len(harden_shotdata) # each row is one shot attempt, not one game
distances = avg_by_distance.index # actual distances in feet, so gaps in the data don't shift the x-axis
pct_of_attempts = avg_by_distance['count'] / total_shots
plt.figure() # start a new figure rather than drawing on the previous one
plt.plot(distances, avg_by_distance['mean'], 
            color = 'blue', alpha=0.5, 
            label='Mean FG%')
plt.plot(distances, pct_of_attempts, 
            color = 'red', alpha=0.5, 
        label='% of Total Shots Attempted from this Distance')
plt.legend(loc='best')
plt.title("Harden's Accuracy vs Distance from Basket, NBA '16-'17 Season")
plt.xlabel("Distance from Basket (Feet)")
plt.ylim(0,1)
plt.show()
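
The two curves are linked: weighting each distance's accuracy by its share of attempts and summing recovers the overall field goal percentage. The sketch below checks this on made-up shots from two distances. The numbers are invented and are not Harden's.

```python
import pandas as pd

toy_shots = pd.DataFrame({
    "SHOT_DISTANCE": [0, 0, 0, 0, 25, 25, 25, 25, 25, 25],
    "SHOT_MADE_FLAG": [1, 1, 1, 0, 1, 0, 0, 1, 0, 0],
})

by_dist = toy_shots.groupby("SHOT_DISTANCE")["SHOT_MADE_FLAG"].agg(["mean", "count"])
share = by_dist["count"] / len(toy_shots)   # fraction of all shots taken at each distance
overall = (by_dist["mean"] * share).sum()   # share-weighted mean accuracy
print(overall, toy_shots["SHOT_MADE_FLAG"].mean())
```

This is why a high accuracy at a rarely used distance moves the season average very little.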

### Observations

As we can see, a significant portion of Harden's attempted shots occur around 22-27 feet from the basket. In the NBA, the 3-point line is 23.75 feet from the hoop, and 22 feet from the hoop in the corners, so this peak corresponds to his 3-point attempts. About a fifth of his shots take place from behind the 3-point line, and around a tenth of them are directly up against the basket.

We can thus see that while Harden shoots more often from behind the 3-point line, he is most successful when he gets as close to the basket as possible. The plot also shows that he remains a formidable two-point shooter even when not directly against the basket, shooting at or above his season average (44%).

Finally, we note that he attempted a handful of very long shots from well behind the line and made some of them, but these occasions are rare.

A potential follow-up question would be to look at how distance correlates with the number of points or the percentage of total points scored, but I won't explore that here.