# NBA Play-by-Play Possessions

For poor free throw shooters, is it a better strategy to *intentionally* miss the second free throw and aim for the offensive rebound and a new possession? This notebook builds the possession data needed to explore that question, using play-by-play data from [nba_api](https://github.com/swar/nba_api).

### TODO

- [ ] Use consistent methods for creating new DF columns 
- [ ] Reduce use of list comprehensions for `pandas` / `numpy` methods

## 1. Setup and get play-by-play DataFrame

In [1]:
import numpy as np
import pandas as pd

In [2]:
# Get a list of team dictionaries
from nba_api.stats.static import teams
nba_teams = teams.get_teams()

# Collect the team ID from each team's dictionary
team_ids = [team['id'] for team in nba_teams]

# Query for the regular season games of the Cavs (the third entry in the team list)
from nba_api.stats.endpoints import leaguegamefinder
from nba_api.stats.library.parameters import Season
from nba_api.stats.library.parameters import SeasonType

gamefinder = leaguegamefinder.LeagueGameFinder(team_id_nullable=team_ids[2], 
                            season_nullable=Season.default,
                            season_type_nullable=SeasonType.regular)  

games_dict = gamefinder.get_normalized_dict()
games = games_dict['LeagueGameFinderResults']
game_IDs = [game['GAME_ID'] for game in games]

In [3]:
# Query for the play-by-play of the most recent regular season game
from nba_api.stats.endpoints import playbyplayv2
df = playbyplayv2.PlayByPlayV2(game_IDs[0]).get_data_frames()[0]

## 2. DataFrame Formatting

In [4]:
#Convert the game clock time to seconds and re-order
df['PCTIME_SECONDS'] = df['PCTIMESTRING'].map(lambda x: int(x.split(":")[0])*60 + int(x.split(":")[1]))
df = df.sort_values(['PERIOD','PCTIME_SECONDS','EVENTNUM'], ascending=[True,False,True])
df = df.reset_index(drop=True)

#Update the SCORE column to fill in blanks
df.at[0,"SCORE"] = "0 - 0"
df["SCORE"] = df["SCORE"].ffill()

#New columns for possession formulae
df['EVENTMSGTYPE_1'] = df['EVENTMSGTYPE'].shift(-1)
df['EVENTMSGACTIONTYPE_1'] = df['EVENTMSGACTIONTYPE'].shift(-1)
df['PCTIME_SECONDS_1'] = df['PCTIME_SECONDS'].shift(-1)
df['PLAYER1_TEAM_ID_1'] = df['PLAYER1_TEAM_ID'].shift(-1)
# Unlike the look-ahead columns above, SCORE_1 holds the score from the previous play
df['SCORE_1'] = df['SCORE'].shift(1)
df.at[0,'SCORE_1'] = "0 - 0"
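
As a step towards the TODO items, here is a minimal sketch of the same formatting steps on a toy DataFrame, using `pandas` string methods instead of a `lambda`. The helper name `clock_to_seconds` and the toy rows are made up for illustration; only the column names come from the play-by-play data.

```python
import pandas as pd

def clock_to_seconds(clock):
    """Convert a Series of 'MM:SS' game-clock strings to seconds remaining."""
    parts = clock.str.split(":", expand=True).astype(int)
    return parts[0] * 60 + parts[1]

# Toy rows (hypothetical), shaped like the first plays of a game
toy = pd.DataFrame({
    "PCTIMESTRING": ["12:00", "11:35", "11:32"],
    "SCORE": [None, "2 - 0", None],
})

toy["PCTIME_SECONDS"] = clock_to_seconds(toy["PCTIMESTRING"])

# Seed the opening score, then carry the last known score forward
toy.loc[0, "SCORE"] = "0 - 0"
toy["SCORE"] = toy["SCORE"].ffill()

# shift(-1) looks ahead to the next play; shift(1) looks back at the previous one
toy["PCTIME_SECONDS_1"] = toy["PCTIME_SECONDS"].shift(-1)
toy["SCORE_1"] = toy["SCORE"].shift(1, fill_value="0 - 0")

print(toy)
```

Using `fill_value` in `shift` would also remove the need to patch row 0 of `SCORE_1` afterwards.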

## 3. Extracting Further Info

There are four main ways a possession can end:
1. Made FG / FT
2. Missed FG / FT followed by a defensive rebound
3. Turnover
4. Quarter end

We need a formula for each of the outcomes to check if and when each possession ends.
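
To keep the numeric codes readable, one option is to name them. The sketch below is not part of `nba_api`: `EventType` and `CAN_END_POSSESSION` are hypothetical names, and the codes are the `EVENTMSGTYPE` values as used in this notebook's comments and sample output.

```python
from enum import IntEnum

class EventType(IntEnum):
    """EVENTMSGTYPE codes as used in this notebook (hypothetical names)."""
    FG_MADE = 1
    FG_MISSED = 2
    FREE_THROW = 3
    REBOUND = 4
    TURNOVER = 5
    FOUL = 6
    JUMP_BALL = 10
    PERIOD_START = 12
    PERIOD_END = 13

# Event types that can end a possession, covering the four outcomes listed above
CAN_END_POSSESSION = {
    EventType.FG_MADE,
    EventType.FG_MISSED,
    EventType.FREE_THROW,
    EventType.TURNOVER,
    EventType.PERIOD_END,
}

# IntEnum members compare equal to the raw integers in the DataFrame
print(EventType(5).name, EventType.FOUL == 6)
```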

In [5]:
def possEndFG(loc, df):
    # Check if it's a shooting foul: the next play is a foul with the same time code
    if (df.iloc[loc]['EVENTMSGTYPE_1'] == 6) and (df.iloc[loc]['PCTIME_SECONDS'] == df.iloc[loc]['PCTIME_SECONDS_1']):
        return False
    else:
        return True
    
def possEndRebound(loc, df):
    # Check for offensive rebound: the next play (i.e. the rebound after a miss) is by the same team
    if df.iloc[loc]['PLAYER1_TEAM_ID'] == df.iloc[loc]['PLAYER1_TEAM_ID_1']:
        return False
    else:
        return True

def possEndFT(loc, df):
    # Check for last FT (10 is 1st of 1; 12 is 2nd of 2; 15 is 3rd of 3) 
    if df.iloc[loc]['EVENTMSGACTIONTYPE'] in [10,12,15]:
        if (df.iloc[loc]['EVENTMSGTYPE_1'] == 4):
            return possEndRebound(loc, df)
        else:
            return True
    else:
        return False

# The keys in this dict correspond to relevant EVENTMSGTYPE (1 - FG make, 2 - FG miss, 3 - FT attempt)
possOutcomesDict = {
    '1': possEndFG,
    '2': possEndRebound,
    '3': possEndFT
}


def possEndCheck(loc, df):    
    if df.iloc[loc]['EVENTMSGTYPE'] in [1,2,3]:
        #print(f"running {possOutcomesDict[str(df.iloc[loc]['EVENTMSGTYPE'])]}")
        return possOutcomesDict[str(df.iloc[loc]['EVENTMSGTYPE'])](loc, df)
    elif df.iloc[loc]['EVENTMSGTYPE'] in [5,13]:
        return True
    else:
        return False
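
The same rules can probably be expressed with boolean masks instead of a row-by-row check, which would address the second TODO item. This is a sketch under the assumption that the rules above are complete; `possession_end_mask` is a hypothetical name and the toy plays are invented.

```python
import numpy as np
import pandas as pd

def possession_end_mask(df):
    """Boolean Series: True where a play ends a possession (same rules as above)."""
    etype = df["EVENTMSGTYPE"]
    next_type = etype.shift(-1)
    same_time = df["PCTIME_SECONDS"] == df["PCTIME_SECONDS"].shift(-1)
    same_team = df["PLAYER1_TEAM_ID"] == df["PLAYER1_TEAM_ID"].shift(-1)

    # Made FG ends the possession unless a foul follows at the same time
    made_fg = (etype == 1) & ~((next_type == 6) & same_time)
    # Missed FG ends the possession unless the next play is by the same team
    missed_fg = (etype == 2) & ~same_team
    # Last FT ends the possession unless the shooting team gets the rebound
    last_ft = (etype == 3) & df["EVENTMSGACTIONTYPE"].isin([10, 12, 15])
    ft_end = last_ft & ~((next_type == 4) & same_team)
    # Turnovers and period ends always end the possession
    always = etype.isin([5, 13])

    return made_fg | missed_fg | ft_end | always

# Toy plays (hypothetical): miss + offensive rebound, and-one, miss + defensive
# rebound, turnover, end of period
toy = pd.DataFrame({
    "EVENTMSGTYPE":       [2, 4, 1, 6, 3, 2, 4, 5, 13],
    "EVENTMSGACTIONTYPE": [79, 0, 5, 2, 10, 6, 0, 1, 0],
    "PCTIME_SECONDS":     [700, 698, 690, 690, 690, 670, 668, 660, 0],
    "PLAYER1_TEAM_ID":    [1.0, 1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0, np.nan],
})

toy_ends = possession_end_mask(toy)
print(toy_ends.tolist())
```

A comparison against a missing team ID evaluates to `False` in both versions, so plays followed by a row without `PLAYER1_TEAM_ID` are treated the same way.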

With these formulae we can run through the plays in the game and determine on which of them a possession ended, and who was in possession for each play.
We can also calculate the change in score for each play, to use in analysing points per possession later on.
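
To make the score change concrete, here is a small sketch on invented scores. It assumes the `SCORE` column always has the form `"A - B"` once the blanks are filled; `score_change` is a hypothetical helper, not something from `nba_api`.

```python
import pandas as pd

def score_change(score):
    """Points scored on each play, given a Series of 'A - B' score strings."""
    points = score.str.split(" - ", expand=True).astype(int)
    # Only one team scores on a play, so the larger of the two diffs is the change
    return points.diff().fillna(0).max(axis=1).astype(int)

# Toy scores (hypothetical): a two, a three for the other team, then another two
toy_score = pd.Series(["0 - 0", "0 - 0", "2 - 0", "2 - 3", "4 - 3"])
toy_change = score_change(toy_score)
print(toy_change.tolist())
```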

In [6]:
# Add new column for possession end True / False
df['POSSESSION_END'] = [possEndCheck(loc, df) for loc in range(len(df))]

# Gets team that wins the tip - index 0 is the start of game play 
# Index 1 is the jump ball row and Player 3 is who it gets tipped to
currentTeam = df.iloc[1]["PLAYER3_TEAM_ABBREVIATION"]

# Get the abbreviations of the two teams 
teamNames = list(filter(lambda x: x is not None, df["PLAYER1_TEAM_ABBREVIATION"].unique().tolist()))

# Initialising variables for the loop
teamInPoss = []

for loc in range(len(df)):

    teamInPoss.append(currentTeam)

    # If POSSESSION_END == True, switch the team in possession for the next play
    # Picking the other team directly works whichever team won the tip
    if df.iloc[loc]["POSSESSION_END"]:
        currentTeam = teamNames[1] if currentTeam == teamNames[0] else teamNames[0]

df["POSSESSION_TEAM_ABBREVIATION"] = teamInPoss

# Calculates the change in score between two plays - they don't have to be sequential
def eventScore(before,after):

    start_score = str(before).split(" - ")
    end_score = str(after).split(" - ")

    diff_score = [int(x)-int(y) for x, y in zip(end_score, start_score)]

    return max(diff_score)

df['SCORE_CHANGE'] = df.apply(lambda row: eventScore(row['SCORE_1'],row['SCORE']), axis=1)
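
The possession loop above could likely be replaced by a cumulative count of possession changes, since the team in possession only depends on whether that count is odd or even. A minimal sketch, assuming exactly two teams and that every `POSSESSION_END` flips possession; `assign_possession` is a hypothetical name.

```python
import numpy as np
import pandas as pd

def assign_possession(poss_end, first_team, other_team):
    """Team in possession on each play, given the POSSESSION_END flags."""
    # Count the possession changes that happened before each play
    changes_before = poss_end.shift(1, fill_value=False).cumsum()
    # An even count means the team that started with the ball has it again
    teams = np.where(changes_before % 2 == 0, first_team, other_team)
    return pd.Series(teams, index=poss_end.index, name="POSSESSION_TEAM_ABBREVIATION")

# Toy flags (hypothetical)
toy_end = pd.Series([False, True, False, True, True, False])
toy_teams = assign_possession(toy_end, "CLE", "OKC")
print(toy_teams.tolist())
```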

In [7]:
df.head()

|  | GAME_ID | EVENTNUM | EVENTMSGTYPE | EVENTMSGACTIONTYPE | PERIOD | WCTIMESTRING | PCTIMESTRING | HOMEDESCRIPTION | NEUTRALDESCRIPTION | VISITORDESCRIPTION | ... | VIDEO_AVAILABLE_FLAG | PCTIME_SECONDS | EVENTMSGTYPE_1 | EVENTMSGACTIONTYPE_1 | PCTIME_SECONDS_1 | PLAYER1_TEAM_ID_1 | SCORE_1 | POSSESSION_END | POSSESSION_TEAM_ABBREVIATION | SCORE_CHANGE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22200739 | 2 | 12 | 0 | 1 | 8:11 PM | 12:00 |  | Start of 1st Period (8:11 PM EST) |  | ... | 0 | 720 | 10.0 | 0.0 | 720.0 | 1610613000.0 | 0 - 0 | False | CLE | 0 |
| 1 | 22200739 | 4 | 10 | 0 | 1 | 8:11 PM | 12:00 | Jump Ball Jay. Williams vs. Allen: Tip to Mobley |  |  | ... | 1 | 720 | 2.0 | 79.0 | 695.0 | 1610613000.0 | 0 - 0 | False | CLE | 0 |
| 2 | 22200739 | 7 | 2 | 79 | 1 | 8:11 PM | 11:35 |  |  | MISS Garland 26' 3PT Pullup Jump Shot | ... | 0 | 695 | 4.0 | 0.0 | 692.0 | 1610613000.0 | 0 - 0 | False | CLE | 0 |
| 3 | 22200739 | 8 | 4 | 0 | 1 | 8:11 PM | 11:32 |  |  | Allen REBOUND (Off:1 Def:0) | ... | 0 | 692 | 2.0 | 6.0 | 682.0 | 1610613000.0 | 0 - 0 | False | CLE | 0 |
| 4 | 22200739 | 9 | 2 | 6 | 1 | 8:11 PM | 11:22 |  |  | MISS Garland 2' Driving Layup | ... | 1 | 682 | 4.0 | 0.0 | 680.0 |  | 0 - 0 | True | CLE | 0 |


### Points Per Possession

As an example, using the new columns we've added to the pbp data, we can calculate the average points per possession on OKC offensive rebounds.

I'm sure this example could be updated to use more `pandas` or `numpy` tools in place of the list comprehensions used here.

In [8]:
def pointsPerPossession(team, df):
  # Get a list of missed shots
  missedShots = df.index[df['EVENTMSGTYPE'] == 2].tolist()

  # For each missed shot, add if the following play is a rebound with the same team abbreviation
  ORB = [shot + 1 for shot in missedShots if df.loc[shot]['PLAYER1_TEAM_ABBREVIATION'] == team and 
                                                df.loc[shot+1]['PLAYER1_TEAM_ABBREVIATION'] == team and 
                                                df.loc[shot+1]['EVENTMSGTYPE'] == 4]

  # Get a list of the plays where possession changed
  possessionChanges = df.index[df['POSSESSION_END']].to_numpy()

  # For each rebound, find the next change of possession and return the pair of start / end indices
  ORB_poss = [[reb, possessionChanges[possessionChanges > reb].min()+1] for reb in ORB]

  # Find the total of the points for each possession after a ORB
  points = sum([df.iloc[reb[0]:reb[1]]['SCORE_CHANGE'].sum() for reb in ORB_poss])
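  num_poss = len(ORB)

  # Avoid dividing by zero when the team had no offensive rebounds in this game
  if num_poss == 0:
    return [float('nan'), 0]

  return [points / num_poss, num_poss]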

# Points per possession is the total number of points divided by the number of possessions
team = "OKC"

ORB = pointsPerPossession(team, df)
print(f'Points Per Possession ({team}): {ORB[0]: .2f} (on {ORB[1]} possessions)') 

Points Per Possession (OKC):  1.56 (on 9 possessions)
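
For comparison, here is a rough sketch of how `groupby` could replace the list comprehensions. It is simpler than the function above: it computes points per possession over *all* possessions for each team, not just those following an offensive rebound. The name `points_per_possession` and the toy rows are hypothetical; the column names are the ones created earlier in this notebook.

```python
import pandas as pd

def points_per_possession(df):
    """Mean points and possession count per team, over all possessions."""
    # Give every play a possession number: it increases after each POSSESSION_END
    poss_id = (
        df["POSSESSION_END"].shift(1, fill_value=False).cumsum().rename("POSSESSION_ID")
    )

    # One row per possession: who had the ball and how many points they scored
    per_poss = df.groupby(poss_id).agg(
        team=("POSSESSION_TEAM_ABBREVIATION", "first"),
        points=("SCORE_CHANGE", "sum"),
    )

    return per_poss.groupby("team")["points"].agg(["mean", "count"])

# Toy plays (hypothetical): CLE scores 2, OKC scores 3, CLE comes up empty
toy = pd.DataFrame({
    "POSSESSION_END": [False, True, False, True, True],
    "POSSESSION_TEAM_ABBREVIATION": ["CLE", "CLE", "OKC", "OKC", "CLE"],
    "SCORE_CHANGE": [0, 2, 0, 3, 0],
})

summary = points_per_possession(toy)
print(summary)
```

Restricting this to possessions that start with an offensive rebound would need an extra filter on the rebound rows, which is left out here.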
