# Who guards the guards?

NBA teams value lockdown defenders -- players who can keep an opponent's star from controlling the game. Although some defensive specialists are widely recognized around the league, and may play extensively despite limited offensive games, others may be overlooked.

In this notebook, I'll use publicly-released data from the NBA to identify the players most frequently tasked with challenging defensive assignments -- limited to guards for now -- and look at some related questions, like:
- Does top-defender-dom persist from year to year?
- How are teams with two important offensive players defended? How do teams with two top defenders assign them?
- How do these matchups change in the playoffs?

## What's our universe of "important offensive players"?

We're going to start with a Usage leaderboard. Usage reflects the proportion of a team's possessions, while a player is on the floor, that end with that player (by shooting the ball or turning it over). Because turnovers are most likely while a player is dribbling or passing, the turnover component helps capture players who spend a lot of time handling the ball, even if they don't shoot as often themselves.
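For reference, here is a minimal sketch of the widely cited box-score approximation of usage. This is an assumption about how the stat is built, not a confirmed description of what stats.nba.com computes (the site may count possessions differently); the 0.44 free-throw weight and the function name `usage_rate` are part of that assumption, and the stat line is made up.

```python
def usage_rate(fga, fta, tov, minutes, team_fga, team_fta, team_tov, team_minutes):
    """Approximate share of team possessions a player ends while on the floor.

    Box-score approximation (an assumption, not necessarily the NBA's formula):
    a possession ends with a shot attempt, a trip to the line (weighted 0.44
    per free throw), or a turnover.
    """
    player_ended = fga + 0.44 * fta + tov
    team_ended = team_fga + 0.44 * team_fta + team_tov
    # scale the team totals down to the time this player was actually on the floor
    on_floor_share = minutes / (team_minutes / 5)
    return player_ended / (team_ended * on_floor_share)


# made-up single-game line, purely for illustration
example = usage_rate(fga=20, fta=10, tov=4, minutes=36,
                     team_fga=88, team_fta=25, team_tov=14, team_minutes=240)
print(round(example, 3))
```

Five players who split everything evenly would each come out at 0.20, which is why a cutoff a little above 20% is a natural place to start looking for "important" offensive players.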

We could consider incorporating Assist Ratio, which is the proportion of possessions for which that player receives credit for an assist (the last pass leading directly to a made shot), but assists are noisier than the components of usage. For one, assists are awarded subjectively by official scorekeepers, who judge whether the last pass was sufficiently proximate to the shot -- scorekeepers are tied to an arena and have been shown to award more assists to the home team. In addition, two passes of equal quality will not be treated identically, because an assist is only awarded if the shot is made; a miss (or a shooting foul drawn) can't be assisted. As a result, Assist Ratio depends on whether the game is at home or on the road, and on the shooting ability of a player's teammates (and, to a smaller extent, on the skill of the defenders guarding those teammates). So we'll set it aside for now.

In addition, we'll focus on Guards for now -- we want a relatively homogeneous pool of offensive players so that a standout defender is likely to be matched up against most or all of them. In particular, a player who can match up against a point guard could also handle other perimeter players but not necessarily centers.

In [12]:
# Our first step will be to pull a leaderboard for Usage from stats.nba.com and turn it into a pandas dataframe.
# Here, I'm following the workflow helpfully laid out by Greg Reda (http://www.gregreda.com/2015/02/15/web-scraping-finding-the-api/)
# and Savvas Tjortjoglou (http://savvastjortjoglou.com/nba-shot-sharts.html) that they used to obtain other sets of stats from the same site.

import requests
import pandas as pd
import numpy as np
import seaborn as sns
from time import sleep
%matplotlib inline

In [2]:
# we'll save the URL as a string first

# this gets us regular-season data from 2018-19 in JSON format;
# the MeasureType=Advanced parameter gets us the Usage stat, among others
usage_url = 'https://stats.nba.com/stats/leaguedashplayerstats?College=&Conference=&Country='+ \
                '&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&GameSegment=&Height='+ \
                '&LastNGames=0&LeagueID=00&Location=&MeasureType=Advanced&Month=0&OpponentTeamID=0'+ \
                '&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience='+ \
                '&PlayerPosition=G&PlusMinus=N&Rank=N&Season=2018-19&SeasonSegment=&SeasonType=Regular+Season'+ \
                '&ShotClockRange=&StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision=&Weight='
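The same kind of query string can be assembled from a dictionary, which makes it easier to see (and later change) the parameters that matter. This is a minimal sketch using only the standard library. The helper name `build_stats_url` is hypothetical, and only a handful of the parameters from the hand-built URL are included; I haven't verified that the server accepts a request with the empty parameters left out, so in practice the full set may be needed.

```python
from urllib.parse import urlencode

BASE_URL = 'https://stats.nba.com/stats/leaguedashplayerstats'


def build_stats_url(season='2018-19', season_type='Regular Season', position='G'):
    # only a subset of the parameters in the hand-built URL; the empty ones
    # (College=, Conference=, ...) are left out here for readability
    params = {
        'LeagueID': '00',
        'MeasureType': 'Advanced',
        'PerMode': 'PerGame',
        'PlayerPosition': position,
        'Season': season,
        'SeasonType': season_type,
    }
    # urlencode escapes the space in 'Regular Season' as '+'
    return BASE_URL + '?' + urlencode(params)


print(build_stats_url())
```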

The server won't accept a request sent with the default headers from requests.get(), so we need to send headers like the ones a browser sends when I load the page manually.
I'm not sure how this squares with the TOS for the NBA Stats site, so I'm going to endeavor to keep the number of GET requests to a minimum: no more than I would generate by playing around with the full site.
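One way to keep the number of requests down is to save each response to disk and reuse it on later runs. Here is a minimal sketch, assuming a hypothetical helper `cached_json` and a local `cache` directory (neither is part of this notebook); `fetch` is any zero-argument function that returns the parsed JSON.

```python
import json
from pathlib import Path


def cached_json(name, fetch, cache_dir='cache'):
    """Return the JSON saved under `name`, calling `fetch()` only on a cache miss."""
    path = Path(cache_dir) / (name + '.json')
    if path.exists():
        # already downloaded on an earlier run: no request is sent
        return json.loads(path.read_text())
    data = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data))
    return data
```

Wrapping the request as `cached_json('usage_2018-19', lambda: requests.get(usage_url, headers=http_headers).json())` would then hit the server only the first time the cell runs.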

In [3]:
http_headers = {'Accept': 'application/json', 'x-nba-stats-token': 'true', 'X-NewRelic-ID': 'VQECWF5UChAHUlNTBwgBVw==',
                'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131',
                'x-nba-stats-origin': 'stats', 'Referer': 'https://stats.nba.com/players/advanced/?sort=USG_PCT&dir=-1&CF=GP*G*5:MIN*G*20&Season=2018-19&SeasonType=Regular%20Season'}

usage_output = requests.get(usage_url, headers=http_headers)


In [18]:
# now take that JSON output and turn it into a dataframe
# (parse the response once and reuse the result)
usage_results = usage_output.json()['resultSets'][0]
headers1 = usage_results['headers']
players1 = usage_results['rowSet']

usage_df = pd.DataFrame(players1, columns=headers1)

usage_df.shape

(262, 73)
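The cell above relies on the response holding a list of `resultSets`, each with parallel `headers` and `rowSet` entries. A toy payload in that shape (the values are made up, and only three columns are shown) illustrates how the two line up:

```python
# hypothetical, heavily trimmed payload in the shape the cell above assumes
payload = {
    'resultSets': [{
        'headers': ['PLAYER_ID', 'PLAYER_NAME', 'USG_PCT'],
        'rowSet': [
            [1, 'Player A', 0.31],
            [2, 'Player B', 0.18],
        ],
    }]
}

result = payload['resultSets'][0]
# pair each row's values with the column names, one dict per player
records = [dict(zip(result['headers'], row)) for row in result['rowSet']]
print(records[0])
```

Passing `rowSet` and `headers` to `pd.DataFrame` does the same pairing, with one dataframe row per inner list.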

In [5]:
# this has the stats we want, but includes plenty of players we aren't interested in,
# so we'll apply a couple filters to eliminate guys who played a limited number of games (or minutes per game)
# and then pare down to the Usage leaders based on a threshold -- between 20% and 25%

# let's first look at the variable names
list(usage_df)

['PLAYER_ID',
 'PLAYER_NAME',
 'TEAM_ID',
 'TEAM_ABBREVIATION',
 'AGE',
 'GP',
 'W',
 'L',
 'W_PCT',
 'MIN',
 'eOFF_RATING',
 'OFF_RATING',
 'sp_work_OFF_RATING',
 'eDEF_RATING',
 'DEF_RATING',
 'sp_work_DEF_RATING',
 'eNET_RATING',
 'NET_RATING',
 'sp_work_NET_RATING',
 'AST_PCT',
 'AST_TO',
 'AST_RATIO',
 'OREB_PCT',
 'DREB_PCT',
 'REB_PCT',
 'TM_TOV_PCT',
 'EFG_PCT',
 'TS_PCT',
 'USG_PCT',
 'ePACE',
 'PACE',
 'sp_work_PACE',
 'PIE',
 'FGM',
 'FGA',
 'FGM_PG',
 'FGA_PG',
 'FG_PCT',
 'GP_RANK',
 'W_RANK',
 'L_RANK',
 'W_PCT_RANK',
 'MIN_RANK',
 'eOFF_RATING_RANK',
 'OFF_RATING_RANK',
 'sp_work_OFF_RATING_RANK',
 'eDEF_RATING_RANK',
 'DEF_RATING_RANK',
 'sp_work_DEF_RATING_RANK',
 'eNET_RATING_RANK',
 'NET_RATING_RANK',
 'sp_work_NET_RATING_RANK',
 'AST_PCT_RANK',
 'AST_TO_RANK',
 'AST_RATIO_RANK',
 'OREB_PCT_RANK',
 'DREB_PCT_RANK',
 'REB_PCT_RANK',
 'TM_TOV_PCT_RANK',
 'EFG_PCT_RANK',
 'TS_PCT_RANK',
 'USG_PCT_RANK',
 'ePACE_RANK',
 'PACE_RANK',
 'sp_work_PACE_RANK',
 'PIE_RANK',
 'F

In [20]:
# now we'll implement our filters -- a threshold of 22% gives us 38 players, or a little more than one per team
usage_leaders = usage_df.loc[(usage_df['MIN'] >= 24.0) & (usage_df['GP'] >= 20) & (usage_df['USG_PCT'] >= .22)]

print(usage_leaders.shape)
print(usage_leaders.head())

(38, 73)
    PLAYER_ID     PLAYER_NAME     TEAM_ID TEAM_ABBREVIATION   AGE  GP   W   L  \
9      203952  Andrew Wiggins  1610612750               MIN  24.0  73  31  42   
19     203078    Bradley Beal  1610612764               WAS  25.0  82  32  50   
25    1627741     Buddy Hield  1610612758               SAC  26.0  82  39  43   
27     203468     CJ McCollum  1610612757               POR  27.0  70  43  27   
31    1627747    Caris LeVert  1610612751               BKN  24.0  40  19  21   

    W_PCT   MIN         ...          PACE_RANK  sp_work_PACE_RANK  PIE_RANK  \
9   0.425  34.8         ...                164                164       132   
19  0.390  36.9         ...                127                127        24   
25  0.476  31.9         ...                 64                 64        42   
27  0.614  33.9         ...                171                171        57   
31  0.475  26.6         ...                141                141        56   

    FGM_RANK  FGA_RANK  FGM_P
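The 22% cutoff is a judgment call, so it can help to see how quickly the pool shrinks as the threshold moves. A small sketch on made-up usage rates (these numbers stand in for `usage_df['USG_PCT']` after the games and minutes filters; they are not real data):

```python
# made-up usage rates, purely for illustration
usage = [0.31, 0.27, 0.24, 0.23, 0.21, 0.20, 0.18, 0.15]
thresholds = (0.20, 0.22, 0.25)

# number of players at or above each candidate threshold
pool_sizes = {t: sum(u >= t for u in usage) for t in thresholds}

for t, size in pool_sizes.items():
    print(t, size)
```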

## Who guards those players?

In [7]:
# now we take the list of important offensive players' player IDs
offensive_list = usage_leaders['PLAYER_ID'].tolist()

offensive_list[0:5]
# note that these appear as integers, and they're in first-name alpha order

[203952, 203078, 1627741, 203468, 1627747]

In [8]:
# and use it as the source for a new query to stats.nba.com to get the list of players they matched up against
# we'll start with a single example (James Harden) and then generalize to the list

matchup_url = 'https://stats.nba.com/stats/leagueseasonmatchups?DateFrom=&DateTo=&LeagueID=00&OffPlayerID=' + \
              str(201935) + '&Outcome=&PORound=0&PerMode=Totals&Season=2018-19&SeasonType=Regular+Season'

http_headers2 = {'Accept': 'application/json', 'x-nba-stats-token': 'true', 'X-NewRelic-ID': 'VQECWF5UChAHUlNTBwgBVw==',
                'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131',
                'x-nba-stats-origin': 'stats', 'Referer': 'https://stats.nba.com/player/' + str(201935) + '/matchups/?Season=2018-19&SeasonType=Regular%20Season&PerMode=Totals'}

matchup_output = requests.get(matchup_url, headers=http_headers2)

matchup_results = matchup_output.json()['resultSets'][0]
headers2 = matchup_results['headers']
players2 = matchup_results['rowSet']

harden_df = pd.DataFrame(players2, columns=headers2)

harden_df.head()

Unnamed: 0,OFF_TEAM_ID,OFF_TEAM_ABBREVIATION,OFF_TEAM_CITY,OFF_TEAM_NICKNAME,OFF_PLAYER_ID,OFF_PLAYER_NAME,DEF_TEAM_ID,DEF_TEAM_ABBREVIATION,DEF_TEAM_CITY,DEF_TEAM_NICKNAME,...,FGA,FGA_DIFF,FG_PCT,FG3M,FG3A,FG3_PCT,FTM,SFL,DEF_FOULS,OFF_FOULS
0,1610612745,HOU,Houston,Rockets,201935,James Harden,1610612760,OKC,Oklahoma City,Thunder,...,39,0.795922,0.462,6,20,0.3,12,3,1,3
1,1610612745,HOU,Houston,Rockets,201935,James Harden,1610612740,NOP,New Orleans,Pelicans,...,39,0.879078,0.308,9,25,0.36,7,0,2,0
2,1610612745,HOU,Houston,Rockets,201935,James Harden,1610612762,UTA,Utah,Jazz,...,31,0.761245,0.452,4,9,0.444,9,4,0,0
3,1610612745,HOU,Houston,Rockets,201935,James Harden,1610612742,DAL,Dallas,Mavericks,...,28,0.748423,0.286,6,21,0.286,8,3,0,0
4,1610612745,HOU,Houston,Rockets,201935,James Harden,1610612756,PHX,Phoenix,Suns,...,30,0.846847,0.533,7,14,0.5,9,1,2,0


In [9]:
# now we generalize that process over each offensive player in our list
# this definitely doesn't feel pythonic, so let's assume it'll get tinkered with over time
def matchups(players, season='2018-19', seasontype='Regular+Season'):
    frames = []
    for player_id in players:
        req_url = 'https://stats.nba.com/stats/leagueseasonmatchups?DateFrom=&DateTo=&LeagueID=00&OffPlayerID=' + \
              str(player_id) + '&Outcome=&PORound=0&PerMode=Totals&Season=' + season + '&SeasonType=' + seasontype

        # the Referer should describe the same season type as the request itself
        req_headers = {'Accept': 'application/json', 'Accept-Encoding': 'gzip, deflate, br', 'Accept-Language': 'en-US,en;q=0.9',
                'Connection': 'keep-alive', 'x-nba-stats-token': 'true', 'X-NewRelic-ID': 'VQECWF5UChAHUlNTBwgBVw==',
                'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131',
                'x-nba-stats-origin': 'stats', 'Referer': 'https://stats.nba.com/player/' + str(player_id) + '/matchups/?Season=' + season + '&SeasonType=' + seasontype.replace('+', '%20') + '&PerMode=Totals'}

        req_output = requests.get(req_url, headers=req_headers)
        sleep(.5)  # pause between requests to go easy on the server

        # parse the response once, then build the dataframe for this offensive player
        result = req_output.json()['resultSets'][0]
        df = pd.DataFrame(result['rowSet'], columns=result['headers'])

        # let's limit the list to defenders who faced that offensive player a non-trivial number of times
        frames.append(df.loc[df['POSS'] >= 20])

    # concatenating once at the end is much faster than growing a dataframe inside the loop
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
 
matchup_df = matchups(offensive_list)


In [10]:
# sanity check: how many matchups were for exactly 20 possessions
matchup_df.loc[(matchup_df['POSS'] == 20)]

Unnamed: 0,OFF_TEAM_ID,OFF_TEAM_ABBREVIATION,OFF_TEAM_CITY,OFF_TEAM_NICKNAME,OFF_PLAYER_ID,OFF_PLAYER_NAME,DEF_TEAM_ID,DEF_TEAM_ABBREVIATION,DEF_TEAM_CITY,DEF_TEAM_NICKNAME,...,FGA,FGA_DIFF,FG_PCT,FG3M,FG3A,FG3_PCT,FTM,SFL,DEF_FOULS,OFF_FOULS
79,1610612750,MIN,Minnesota,Timberwolves,203952,Andrew Wiggins,1610612763,MEM,Memphis,Grizzlies,...,6,1.313245,0.333,0,0,0.000,1,0,0,0
80,1610612750,MIN,Minnesota,Timberwolves,203952,Andrew Wiggins,1610612761,TOR,Toronto,Raptors,...,5,1.094371,0.800,1,2,0.500,0,0,0,0
81,1610612750,MIN,Minnesota,Timberwolves,203952,Andrew Wiggins,1610612758,SAC,Sacramento,Kings,...,2,0.437748,0.000,0,2,0.000,0,0,0,0
82,1610612750,MIN,Minnesota,Timberwolves,203952,Andrew Wiggins,1610612740,NOP,New Orleans,Pelicans,...,1,0.218874,1.000,0,0,0.000,1,1,0,0
83,1610612750,MIN,Minnesota,Timberwolves,203952,Andrew Wiggins,1610612765,DET,Detroit,Pistons,...,6,1.313245,0.833,1,1,1.000,0,0,0,0
176,1610612764,WAS,Washington,Wizards,203078,Bradley Beal,1610612766,CHA,Charlotte,Hornets,...,2,0.399743,0.500,0,0,0.000,2,0,0,0
262,1610612758,SAC,Sacramento,Kings,1627741,Buddy Hield,1610612744,GSW,Golden State,Warriors,...,7,1.448090,0.429,1,2,0.500,0,0,0,0
263,1610612758,SAC,Sacramento,Kings,1627741,Buddy Hield,1610612745,HOU,Houston,Rockets,...,8,1.654960,0.625,3,4,0.750,0,0,0,0
264,1610612758,SAC,Sacramento,Kings,1627741,Buddy Hield,1610612748,MIA,Miami,Heat,...,7,1.448090,0.143,1,6,0.167,3,1,1,0
335,1610612757,POR,Portland,Trail Blazers,203468,CJ McCollum,1610612747,LAL,Los Angeles,Lakers,...,6,1.191951,0.333,0,2,0.000,0,0,0,0


In [16]:
# group and aggregate by defender (as a pandas groupby object)
# explore properties of this construction of the data

defenders = matchup_df.groupby(['DEF_PLAYER_NAME'])

defenders[['GP', 'POSS']].agg(['count', np.sum, np.max])

Unnamed: 0_level_0,GP,GP,GP,POSS,POSS,POSS
Unnamed: 0_level_1,count,sum,amax,count,sum,amax
DEF_PLAYER_NAME,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
Aaron Gordon,4,10,4,4,190,57
Aaron Holiday,2,5,3,2,57,35
Abdel Nader,2,4,2,2,71,51
Al-Farouq Aminu,3,10,4,3,100,49
Alec Burks,11,17,2,11,415,61
Alex Abrines,2,4,2,2,45,24
Alex Caruso,5,7,2,5,130,36
Alfonzo McKinnie,1,4,4,1,47,47
Allen Crabbe,8,19,4,8,282,90
Allonzo Trier,8,21,4,8,217,34


In [46]:
# identify a threshold value to create our list of standout defenders
some_defs = defenders.filter(lambda x: (x['POSS'].mean() >= 40) and (x['POSS'].count() >= 10))

# here, we'll use a combination of POSS.count, so a defender has been matched up with many of our offensive leaders
# and POSS.mean, so a defender has covered them lots of times over the course of the season
top_defs = some_defs.groupby(['DEF_PLAYER_NAME'])

top_defs['POSS'].agg(['count', 'sum'])

Unnamed: 0_level_0,count,sum
DEF_PLAYER_NAME,Unnamed: 1_level_1,Unnamed: 2_level_1
Avery Bradley,32,1655
Brandon Ingram,14,600
Bruce Brown,23,1056
Buddy Hield,20,848
CJ McCollum,21,1214
Collin Sexton,19,1042
Cory Joseph,28,1212
D'Angelo Russell,16,949
D.J. Augustin,15,1097
Damian Lillard,21,1347


In [50]:
# is there any overlap between the lists (two-way players)?
def two_way_plr(offense):
    # compare against the ID values (`in` on a Series tests the index, not the values)
    # and build a new list: removing items from a list while looping over it skips elements
    defender_ids = set(some_defs['DEF_PLAYER_ID'])
    return [o for o in offense if o in defender_ids]

two_way_plr(offensive_list)

In [None]:
# use this list of IDs to filter and grab the two-way players (and identify one-way offensive players)
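A sketch of how that split might look, using `isin` on toy data. The frame `leaders` and the list `two_way_ids` are made-up stand-ins for `usage_leaders` and the output of `two_way_plr`; the IDs and names are not real.

```python
import pandas as pd

# toy stand-ins for usage_leaders and the list of two-way player IDs
leaders = pd.DataFrame({
    'PLAYER_ID': [1, 2, 3, 4],
    'PLAYER_NAME': ['A', 'B', 'C', 'D'],
})
two_way_ids = [2, 4]

is_two_way = leaders['PLAYER_ID'].isin(two_way_ids)
two_way = leaders.loc[is_two_way]    # high usage and a frequent defender of usage leaders
one_way = leaders.loc[~is_two_way]   # high usage only
print(two_way['PLAYER_NAME'].tolist(), one_way['PLAYER_NAME'].tolist())
```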

### Is this consistent from year to year?

In [None]:
# we quickly repeat the same exercise but for the 2017-18 season (without using the cutoff for defenders,
# for either season, to make the output richer)
# pair the years against each other by defensive player
# plot pairwise in a scatterplot
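The pairing step could look something like the sketch below, which merges per-defender totals from two seasons. The numbers are made up and the suffix names are just one possible choice.

```python
import pandas as pd

# made-up per-defender possession totals for two seasons
prev = pd.DataFrame({'DEF_PLAYER_NAME': ['A', 'B', 'C'], 'POSS': [900, 400, 650]})
curr = pd.DataFrame({'DEF_PLAYER_NAME': ['A', 'B', 'D'], 'POSS': [1000, 350, 700]})

# an inner merge keeps only defenders who appear in both seasons
paired = prev.merge(curr, on='DEF_PLAYER_NAME', suffixes=('_2017_18', '_2018_19'))
print(paired)

# one point per defender; a tight diagonal would suggest the role persists
# paired.plot.scatter(x='POSS_2017_18', y='POSS_2018_19')
```

Defenders who appear in only one season (C and D here) drop out of an inner merge, so it may be worth counting them separately before reading much into the scatterplot.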

### Elite teammates

In [None]:
# identify cases in the single-season data where two players from the same team are both
# 1) important offensive players or 2) defensive standouts
# do their matchups look different from others?
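Finding the teams with more than one player on a list is a simple grouping problem. A minimal sketch with made-up team and player labels (in the notebook, the pairs would come from the `TEAM_ABBREVIATION` and `PLAYER_NAME` columns of `usage_leaders`):

```python
from collections import defaultdict

# made-up (team, player) pairs standing in for the usage leaders
leaders = [('T1', 'A'), ('T2', 'B'), ('T2', 'C'), ('T3', 'D'), ('T1', 'E')]

by_team = defaultdict(list)
for team, player in leaders:
    by_team[team].append(player)

# teams with two or more players on the list
duos = {team: players for team, players in by_team.items() if len(players) >= 2}
print(duos)
```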

### The Playoffs

In [None]:
# return to stats.nba.com to pull playoff data (probably 2017-18 for now)
# look to see if the following patterns hold:
# - proportion of high-usage players (since rotations shorten)
# - ability of defenders to retain their matchups (more switching)
# - new names (Iguodala)?
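One possible way to quantify whether defenders retain their matchups is the share of an offensive player's tracked possessions spent against their single most frequent defender. The sketch below is a hypothetical measure, not something the NBA publishes. It assumes one row per (offensive player, defender) pair, and the rows are made up.

```python
from collections import defaultdict


def primary_defender_share(rows):
    """rows: iterable of (offensive player, defender, possessions) tuples."""
    totals = defaultdict(int)
    biggest = defaultdict(int)
    for off, _defender, poss in rows:
        totals[off] += poss
        biggest[off] = max(biggest[off], poss)
    # fraction of each offensive player's tracked possessions spent against
    # whichever single defender saw them most
    return {off: biggest[off] / totals[off] for off in totals}


# made-up rows: the same scorer in the regular season and in the playoffs
regular = [('X', 'D1', 60), ('X', 'D2', 20), ('X', 'D3', 20)]
playoffs = [('X', 'D1', 30), ('X', 'D2', 30), ('X', 'D3', 40)]
print(primary_defender_share(regular), primary_defender_share(playoffs))
```

A lower share in the playoffs would be consistent with more switching, though a small playoff sample could produce the same pattern by chance.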