# Working With Play by Play

Working with play by play can be interesting because the feed contains many undocumented data types and requires a fair amount of string parsing. In addition, there are a ton of cool things that can be done with play by play, like sending the feed into a pub/sub model so other systems can interact with it, building your own UI, or a whole host of other ideas.

The goal of this notebook is to walk through the play by play feed examining data such as 

1. `EVENTMSGTYPE` which provides the play type (e.g. FIELD_GOAL_MADE, FIELD_GOAL_MISSED, TIMEOUT, PERIOD_BEGIN, etc.)
2. `EVENTMSGACTIONTYPE` which provides a subcategorization of `EVENTMSGTYPE` (e.g. REVERSE_LAYUP, 3PT_JUMP_SHOT, HOOK_SHOT, etc.)

This notebook builds on the following notebooks: [Finding Games](notebook2.ipynb) and [Basics Notebook](Basics.ipynb), and of course dives into the `PlayByPlay` endpoint. Note that the `PlayByPlayV2` endpoint is an extension of `PlayByPlay`.


So with that...let's get started!

The goals are:
1. Get the last game the Pacers played (maybe we'll get lucky and get a current game)
2. Examine the feed and the fields that are returned
3. See how regex can be applied to the play by play
4. Dynamically build a unique list of NBA player action events using `EVENTMSGACTIONTYPE`
5. See what's hiding in the feed...need to get those BLOCKS from the shot blockers!

# Let's get started and jump into the game!

First things first...get the Pacers team_id

In [1]:
#Get the Pacers team_id
from nba_api.stats.static import teams

nba_teams = teams.get_teams()

# Select the dictionary for the Pacers, which contains their team ID
pacers = [team for team in nba_teams if team['abbreviation'] == 'IND'][0]
pacers_id = pacers['id']
print(f'pacers_id: {pacers_id}')

pacers_id: 1610612754


Search through the games to get the most recent Pacers game_id

In [3]:
# Query for the last regular season game where the Pacers were playing
from nba_api.stats.endpoints import leaguegamefinder
from nba_api.stats.library.parameters import Season
from nba_api.stats.library.parameters import SeasonType

gamefinder = leaguegamefinder.LeagueGameFinder(team_id_nullable=pacers_id,
                            season_nullable=Season.default,
                            season_type_nullable=SeasonType.regular)  

games_dict = gamefinder.get_normalized_dict()
games = games_dict['LeagueGameFinderResults']
game = games[0]
game_id = game['GAME_ID']
game_matchup = game['MATCHUP']

print(f'Searching through {len(games)} game(s) for the game_id of {game_id} where {game_matchup}')

Searching through 63 game(s) for the game_id of 0021800854 where IND vs. MIL
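
The cell above takes `games[0]` as the most recent game, which depends on the order in which the endpoint returns its rows. As a hedged alternative, here is a minimal sketch that picks the latest game explicitly. It assumes each row carries a `GAME_DATE` string in `YYYY-MM-DD` form; the helper name `most_recent_game` and the sample rows (ids, dates, opponents) are made up for illustration and are not real results.

```python
from datetime import datetime

def most_recent_game(games, date_key='GAME_DATE', date_format='%Y-%m-%d'):
    """Return the row with the latest date, or None if the list is empty."""
    if not games:
        return None
    return max(games, key=lambda g: datetime.strptime(g[date_key], date_format))

# Hypothetical rows shaped like entries of games_dict['LeagueGameFinderResults']
sample_games = [
    {'GAME_ID': 'GAME_A', 'GAME_DATE': '2019-02-09', 'MATCHUP': 'IND vs. AAA'},
    {'GAME_ID': 'GAME_B', 'GAME_DATE': '2019-02-13', 'MATCHUP': 'IND vs. BBB'},
    {'GAME_ID': 'GAME_C', 'GAME_DATE': '2019-01-31', 'MATCHUP': 'IND @ CCC'},
]

latest = most_recent_game(sample_games)
print(latest['GAME_ID'], latest['MATCHUP'])
```

With real data, the same function could be called as `most_recent_game(games)` in place of `games[0]`.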


# Retrieving the play by play data
Now that we've got a game_id, let's pull some play by play data

In [92]:
# Query for the play by play of that most recent regular season game
from nba_api.stats.endpoints import playbyplay
df = playbyplay.PlayByPlay(game_id).get_data_frames()[0]
df.head() #just looking at the head of the data

Unnamed: 0,GAME_ID,EVENTNUM,EVENTMSGTYPE,EVENTMSGACTIONTYPE,PERIOD,WCTIMESTRING,PCTIMESTRING,HOMEDESCRIPTION,NEUTRALDESCRIPTION,VISITORDESCRIPTION,SCORE,SCOREMARGIN
0,21800854,2,12,0,1,7:11 PM,12:00,,,,,
1,21800854,4,10,0,1,7:11 PM,12:00,Jump Ball Turner vs. Lopez: Tip to Antetokounmpo,,,,
2,21800854,7,5,1,1,7:11 PM,11:40,Bogdanovic STEAL (1 STL),,Bledsoe Bad Pass Turnover (P1.T1),,
3,21800854,9,1,1,1,7:11 PM,11:25,Turner 27' 3PT Jump Shot (3 PTS) (Collison 1 AST),,,0 - 3,3.0
4,21800854,13,2,71,1,7:12 PM,10:57,,,MISS Antetokounmpo 1' Finger Roll Layup,,


Optional: DataFrames can become large. In pandas you can set a few display options to show more of the data if needed.

In [5]:
#Since the dataset is fairly large you'll see plenty of ellipses (...).
#If that's the case, you can set the following options to expand the data
#You can adjust these as you'd like
import pandas
pandas.set_option('display.max_colwidth',250)
pandas.set_option('display.max_rows',250)

Some of the most valuable fields of `PlayByPlay` are the following:
`EVENTMSGTYPE`,
`EVENTMSGACTIONTYPE`,
`HOMEDESCRIPTION`,
and `VISITORDESCRIPTION`.

`EVENTMSGTYPE` gives us the type of event that has occurred. The set of values that show up can vary per game, which is why collecting them into an Enum or a similar structure is a good idea.

In [6]:
#List unique values in the df['EVENTMSGTYPE'] column
print(f'EVENTMSGTYPE: {sorted(df.EVENTMSGTYPE.unique())}')

EVENTMSGTYPE: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 18]


In [7]:
#For quick reference, here's an Enum for `EVENTMSGTYPE`
#This list may be incomplete, as a thorough play by play scan is necessary
#(for example, the value 18 in the output above is not mapped here)

from enum import Enum

class EventMsgType(Enum):
    FIELD_GOAL_MADE = 1
    FIELD_GOAL_MISSED = 2
    FREE_THROW = 3
    REBOUND = 4
    TURNOVER = 5
    FOUL = 6
    VIOLATION = 7
    SUBSTITUTION = 8
    TIMEOUT = 9
    JUMP_BALL = 10
    EJECTION = 11
    PERIOD_BEGIN = 12
    PERIOD_END = 13
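
Because the Enum may be incomplete, looking up a raw value directly (for instance the `18` seen in the output above) raises a `ValueError`. The following is a minimal sketch of a tolerant lookup; the helper name `event_name` and the `UNKNOWN_<n>` placeholder are assumptions made for illustration, not part of `nba_api`.

```python
from enum import Enum

class EventMsgType(Enum):
    FIELD_GOAL_MADE = 1
    FIELD_GOAL_MISSED = 2
    FREE_THROW = 3
    REBOUND = 4
    TURNOVER = 5
    FOUL = 6
    VIOLATION = 7
    SUBSTITUTION = 8
    TIMEOUT = 9
    JUMP_BALL = 10
    EJECTION = 11
    PERIOD_BEGIN = 12
    PERIOD_END = 13

def event_name(value):
    """Return the Enum member name, or a placeholder for unmapped values."""
    try:
        return EventMsgType(value).name
    except ValueError:
        # Hypothetical placeholder so unmapped ids stay visible instead of crashing
        return f'UNKNOWN_{value}'

names = [event_name(v) for v in [1, 2, 12, 18]]
print(names)
```

Keeping the unmapped id in the placeholder makes it easy to spot which values still need a name after scanning more games.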

Using the `EVENTMSGTYPE` field, we can begin to examine the event types to see what typical values will be in the `EVENTMSGACTIONTYPE`, `HOMEDESCRIPTION`, and `VISITORDESCRIPTION` fields.

In [93]:
#Pull the data for a specific EVENTMSGTYPE
df.loc[df['EVENTMSGTYPE'] == 1].head() #hint: use the EVENTMSGTYPE values above to see different data

Unnamed: 0,GAME_ID,EVENTNUM,EVENTMSGTYPE,EVENTMSGACTIONTYPE,PERIOD,WCTIMESTRING,PCTIMESTRING,HOMEDESCRIPTION,NEUTRALDESCRIPTION,VISITORDESCRIPTION,SCORE,SCOREMARGIN
3,21800854,9,1,1,1,7:11 PM,11:25,Turner 27' 3PT Jump Shot (3 PTS) (Collison 1 AST),,,0 - 3,3
6,21800854,15,1,72,1,7:12 PM,10:54,,,Antetokounmpo 1' Putback Layup (2 PTS),2 - 3,1
17,21800854,26,1,1,1,7:14 PM,9:26,,,Lopez 26' 3PT Jump Shot (3 PTS) (Antetokounmpo 1 AST),5 - 3,-2
18,21800854,28,1,80,1,7:14 PM,9:08,Bogdanovic 16' Step Back Jump Shot (2 PTS),,,5 - 5,TIE
22,21800854,34,1,76,1,7:15 PM,8:29,,,Antetokounmpo Running Finger Roll Layup (4 PTS),7 - 5,-2


Now that we've seen what the output of `EVENTMSGTYPE` is, let's dig into `EVENTMSGACTIONTYPE`.

For this next exercise, let's pull all unique `EVENTMSGACTIONTYPE` values for `EVENTMSGTYPE = 1`

_Note: `EVENTMSGACTIONTYPE` ids have a very loose correlation to `EVENTMSGTYPE` ids. This means that different `EVENTMSGTYPE` ids share some of the same `EVENTMSGACTIONTYPE` ids. This allows the NBA to have a 'Missed Field Goal' share the same '3PT Jump Shot' with a 'Made Field Goal'. That being said, a single id does not always map to a single description. We'll see this towards the end._

In [53]:
#List unique values in the df['EVENTMSGACTIONTYPE'] column for EVENTMSGTYPE 1
emt_df = df.loc[df['EVENTMSGTYPE'] == 1]
print(f'EVENTMSGACTIONTYPE: {sorted(emt_df.EVENTMSGACTIONTYPE.unique())}')

EVENTMSGACTIONTYPE: [1, 3, 5, 6, 7, 41, 44, 47, 50, 52, 58, 66, 71, 72, 73, 75, 76, 78, 79, 80, 86, 97, 98, 99, 108]


# So how do we know what each `EVENTMSGACTIONTYPE` is?

Let the fun begin.

Apply some regular expressions that are `EVENTMSGTYPE` specific against `HOMEDESCRIPTION` and `VISITORDESCRIPTION` while keeping track of the `EVENTMSGACTIONTYPE`.

To see the regular expressions in action, take the example listed in the comments, along with the regex, and head on over to https://regex101.com/ or your favorite interactive regex tool.

# `EVENTMSGTYPE == 1`
The regular expression `(\s{2}|' )([\w+ ]*)` looks for the type of basket within the `VISITORDESCRIPTION` or `HOMEDESCRIPTION` and ties it to the `EVENTMSGACTIONTYPE`. The first group anchors on either two whitespace characters or the `' ` that follows a shot distance; the second group captures the words that come after it.

Example: Given a `VISITORDESCRIPTION == 'Young Cutting Layup Shot (2 PTS) (Collison 1 AST)'` and an `EVENTMSGACTIONTYPE = 98`, the code will produce an output of `CUTTING_LAYUP_SHOT = 98`

Let's see it in action...

_Note: The regex may need to be adjusted over time to account for changes in the data_
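
To make the regex concrete before running it against a full game, here is a small sketch that applies it to single description strings. The helper name `shot_label` is made up for illustration. The second example string assumes the raw feed puts two spaces between the player name and the shot type when no distance is given, which is what the `\s{2}` branch of the pattern suggests; that spacing is an assumption, not something confirmed by the rendered tables above.

```python
import re

# Same pattern as in the text, written as a raw string
SHOT_PATTERN = re.compile(r"(\s{2}|' )([\w+ ]*)")

def shot_label(description):
    """Return an upper-case, underscore-separated shot label, or None if nothing matches."""
    match = SHOT_PATTERN.search(description)
    if match is None:
        return None
    return match.group(2).strip().replace(' ', '_').upper()

examples = [
    "Turner 27' 3PT Jump Shot (3 PTS) (Collison 1 AST)",   # distance marker branch
    "Young  Cutting Layup Shot (2 PTS) (Collison 1 AST)",  # assumed two-space branch
    "Turner BLOCK (1 BLK)",                                # no shot type present
]

for text in examples:
    print(f'{shot_label(text)!r} <- {text!r}')
```

Returning `None` for non-matching text keeps the helper safe to call on descriptions that are not shots.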

In [78]:
#Mapping out all of the EventMsgActionTypes for EventMsgType 1
import re
import operator

#the following expression is specific to EventMsgType 1
#(raw string so the backslashes reach the regex engine unchanged)
p = re.compile(r"(\s{2}|' )([\w+ ]*)")

#get the PlayByPlay data from the Pacers game_id
plays = playbyplay.PlayByPlay(game_id).get_normalized_dict()['PlayByPlay']

#declare a few variables
description = ''
event_msg_action_types = {}

#loop over the play by play data
for play in plays:
    if play['EVENTMSGTYPE'] == 1:
        description = play['HOMEDESCRIPTION'] if play['HOMEDESCRIPTION'] is not None else play['VISITORDESCRIPTION']
        #skip rows that have no description or that the regex does not match
        match = p.search(description) if description is not None else None
        if match is not None:
            #do a bit of searching(regex) and a little character magic: underscores and upper case
            event_msg_action = match.group(2).rstrip().replace(' ', '_').upper()
            #Add it to our dictionary
            event_msg_action_types[event_msg_action] = play['EVENTMSGACTIONTYPE']
            
#sort it all
event_msg_action_types = sorted(event_msg_action_types.items(), key=operator.itemgetter(0))

#output a class that we could plug into our code base
for action in event_msg_action_types:
    print(f'\t{action[0]} = {action[1]}')

	3PT_JUMP_SHOT = 1
	3PT_PULLUP_JUMP_SHOT = 79
	3PT_STEP_BACK_JUMP_SHOT = 80
	ALLEY_OOP_DUNK = 52
	CUTTING_DUNK_SHOT = 108
	CUTTING_FINGER_ROLL_LAYUP_SHOT = 99
	CUTTING_LAYUP_SHOT = 98
	DRIVING_FINGER_ROLL_LAYUP = 75
	DRIVING_LAYUP = 6
	DRIVING_REVERSE_LAYUP = 73
	DUNK = 7
	FINGER_ROLL_LAYUP = 71
	FLOATING_JUMP_SHOT = 78
	HOOK_SHOT = 3
	JUMP_BANK_SHOT = 66
	JUMP_SHOT = 1
	LAYUP = 5
	PULLUP_JUMP_SHOT = 79
	PUTBACK_LAYUP = 72
	REVERSE_LAYUP = 44
	RUNNING_DUNK = 50
	RUNNING_FINGER_ROLL_LAYUP = 76
	RUNNING_LAYUP = 41
	STEP_BACK_JUMP_SHOT = 80
	TIP_LAYUP_SHOT = 97
	TURNAROUND_FADEAWAY = 86
	TURNAROUND_HOOK_SHOT = 58
	TURNAROUND_JUMP_SHOT = 47


# `EVENTMSGTYPE == 2`
We'll reuse the regex expression `(\s{2}|' )([\w+ ]*)` from `EVENTMSGTYPE == 1` for `EVENTMSGTYPE == 2`. EventMsgType 2 are missed field goals. Again, it'll look for the type of basket within the `VISITORDESCRIPTION` or `HOMEDESCRIPTION` and tie that to the `EVENTMSGACTIONTYPE`.

Example: Given a `HOMEDESCRIPTION == 'MISS Collison 24' 3PT Jump Shot'` with `EVENTMSGTYPE = 2` and `EVENTMSGACTIONTYPE = 1`, the code will produce an output of `3PT_JUMP_SHOT = 1`

Let's see it in action...

In [95]:
#Mapping out all of the EventMsgActionTypes for EventMsgType 2
import re
import operator

#the following expression is reused from EventMsgType 1
#(raw string so the backslashes reach the regex engine unchanged)
p = re.compile(r"(\s{2}|' )([\w+ ]*)")

#get the PlayByPlay data from the Pacers game_id
plays = playbyplay.PlayByPlay(game_id).get_normalized_dict()['PlayByPlay']

#declare a few variables
description = ''
event_msg_action_types = {}

#loop over the play by play data
#do a bit of findall(regex) and a little character magic: underscores and upper case
#we're using a findall here as we have to deal with the extra word MISS at the beginning of the text.
#that extra text means we'll have multiple matches for our regex.
for play in plays:
    if play['EVENTMSGTYPE'] == 2:
        match = list()
        if play['HOMEDESCRIPTION'] is not None: 
            match = p.findall(play['HOMEDESCRIPTION'])
        
        if not match and play['VISITORDESCRIPTION'] is not None:
            match = p.findall(play['VISITORDESCRIPTION'])

        #skip rows where neither description matched the regex
        if not match:
            continue

        event_msg_action = re.sub(' ', '_', match[0][1].strip()).upper()
        event_msg_action_types[event_msg_action] = play['EVENTMSGACTIONTYPE']
        
event_msg_action_types = sorted(event_msg_action_types.items(), key=operator.itemgetter(0))

for action in event_msg_action_types:
    print(f'\t{action[0]} = {action[1]}')

	3PT_JUMP_SHOT = 1
	3PT_PULLUP_JUMP_SHOT = 79
	3PT_STEP_BACK_JUMP_SHOT = 80
	ALLEY_OOP_DUNK = 52
	ALLEY_OOP_LAYUP = 43
	CUTTING_LAYUP_SHOT = 98
	DRIVING_FINGER_ROLL_LAYUP = 75
	DRIVING_LAYUP = 6
	DRIVING_REVERSE_LAYUP = 73
	FADEAWAY_JUMPER = 63
	FINGER_ROLL_LAYUP = 71
	FLOATING_JUMP_SHOT = 78
	HOOK_SHOT = 3
	JUMP_BANK_SHOT = 66
	JUMP_SHOT = 1
	LAYUP = 5
	PULLUP_JUMP_SHOT = 79
	REVERSE_LAYUP = 44
	RUNNING_FINGER_ROLL_LAYUP = 76
	RUNNING_JUMP_SHOT = 2
	RUNNING_LAYUP = 41
	STEP_BACK_JUMP_SHOT = 80
	TIP_DUNK_SHOT = 107
	TIP_LAYUP_SHOT = 97
	TURNAROUND_FADEAWAY_BANK_JUMP_SHOT = 105
	TURNAROUND_JUMP_SHOT = 47


# What About Blocks?
If you've taken a close look at the data, especially the rows where `EVENTMSGTYPE == 2`, you may have noticed that a few of the missed field goals were due to some incredible shot-blocking players. By adding a few lines of code, we can find these shot blockers. Dealing with this data is a bit beyond the scope of this notebook, but it's worth pointing out that the data is in there. One idea is to place each block into its own play by play event (just a thought).

In [91]:
#Blocks are not separate events in the feed but are part of the EVENTMSGTYPE 2 rows
import re
import operator

print('------------------')

#the following expression is reused from EventMsgType 1
#(raw string so the backslashes reach the regex engine unchanged)
p = re.compile(r"(\s{2}|' )([\w+ ]*)")

#get the PlayByPlay data from the Pacers game_id
plays = playbyplay.PlayByPlay(game_id).get_normalized_dict()['PlayByPlay']

#declare a few variables
description = ''
event_msg_action_types = {}

#loop over the play by play data
#do a bit of findall(regex) and a little character magic: underscores and upper case
#we're using a findall here as we have to deal with the extra word MISS at the beginning of the text.
#that extra text means we'll have multiple matches for our regex.
for play in plays:
    if play['EVENTMSGTYPE'] == 2:
        match = list()
        if play['HOMEDESCRIPTION'] is not None: 
            match = p.findall(play['HOMEDESCRIPTION'])

            #looking for blocks: a home miss with visitor text alongside it
            if match and play['VISITORDESCRIPTION'] is not None:
                print(play['VISITORDESCRIPTION'])

        if not match and play['VISITORDESCRIPTION'] is not None:
            match = p.findall(play['VISITORDESCRIPTION'])
            
            #looking for blocks: a visitor miss with home text alongside it
            if match and play['HOMEDESCRIPTION'] is not None:
                print(play['HOMEDESCRIPTION'])

        #skip rows where neither description matched the regex
        if not match:
            continue

        event_msg_action = re.sub(' ', '_', match[0][1].strip()).upper()
        event_msg_action_types[event_msg_action] = play['EVENTMSGACTIONTYPE']
            
event_msg_action_types = sorted(event_msg_action_types.items(), key=operator.itemgetter(0))

print('------------------')


------------------
Antetokounmpo BLOCK (1 BLK)
Joseph BLOCK (1 BLK)
Turner BLOCK (1 BLK)
Young BLOCK (1 BLK)
Turner BLOCK (2 BLK)
Snell BLOCK (1 BLK)
Lopez BLOCK (1 BLK)
------------------
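
Picking up the idea of giving blocks their own event, here is a minimal sketch that parses the block lines printed above into small dictionaries. The pattern, the helper name `parse_block`, and the output keys (`EVENT`, `PLAYER`, `BLOCKS`) are all assumptions made for illustration. The number in parentheses is treated as the player's running block count, which is how it appears to behave in the output above (Turner goes from 1 to 2).

```python
import re

# Assumed shape of a block description: '<player> BLOCK (<n> BLK)'
BLOCK_PATTERN = re.compile(r'^(?P<player>.+?) BLOCK \((?P<total>\d+) BLK\)$')

def parse_block(description):
    """Return a small event dict for a block description, or None otherwise."""
    match = BLOCK_PATTERN.match(description.strip()) if description else None
    if match is None:
        return None
    return {
        'EVENT': 'BLOCK',
        'PLAYER': match.group('player'),
        'BLOCKS': int(match.group('total')),
    }

lines = [
    'Antetokounmpo BLOCK (1 BLK)',
    'Turner BLOCK (2 BLK)',
    "MISS Collison 24' 3PT Jump Shot",
]
block_events = [event for event in map(parse_block, lines) if event is not None]
print(block_events)
```

In the loop above, `parse_block` could be applied to the description that is currently printed, and the resulting dictionaries collected into a list for later use.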
