# NBA Advanced Stats Video

[NBA Advanced Stats](https://www.nba.com/stats/) maintains videos of (pretty much) every play in every game.
We can use the play-by-play data to get the video URL of any given play.

This notebook leverages the [`nba_api`](https://github.com/swar/nba_api) Python package, specifically the [`playbyplayv2`](https://github.com/swar/nba_api/blob/master/docs/nba_api/stats/endpoints/playbyplayv2.md) and [`videoeventsasset`](https://github.com/swar/nba_api/blob/master/docs/nba_api/stats/endpoints/videoevents.md) endpoints.

```python
# Imports
import numpy as np
import pandas
import urllib.parse
import requests

from nba_api.stats.endpoints import playbyplayv2
```


## Step 1 - Getting Play-by-Play Data

To get the video for a play, we will need a `game_id` and an `event_id`.
`game_id` can be sourced in a number of different ways, including:
- `nba_api` endpoints (e.g. `leaguegamefinder` or `scoreboardv2`)
- [NBA.com](https://www.nba.com), where the game ID is part of the box score URL
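When the game ID is copied from an NBA.com page, a small helper can pull it out of the URL. This is a minimal sketch that assumes the URL contains the game ID as a run of exactly 10 digits (as in a game page ending in `chi-vs-cle-0022200552`); the helper name `gameIdFromUrl` and the sample URL are illustrative, not part of `nba_api` or a documented NBA.com URL scheme.

```python
import re

def gameIdFromUrl(url):
  # Hypothetical helper: returns the first run of exactly 10 digits in the URL,
  # assuming NBA.com embeds the game ID that way (e.g. '...-0022200552').
  match = re.search(r'(?<!\d)(\d{10})(?!\d)', url)
  if match is None:
    raise ValueError('No 10-digit game ID found in {}'.format(url))
  return match.group(1)

# Illustrative URL; the exact NBA.com path layout is an assumption
print(gameIdFromUrl('https://www.nba.com/game/chi-vs-cle-0022200552/box-score'))
```

Keeping the result as a string preserves the leading zeros, which the endpoints expect.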

The example `game_id` below is from the CHI @ CLE game on Jan 2nd, 2023 (Donovan Mitchell's 71-point game). We'll be looking at Mitchell's putback off his own missed free throw at the end of the fourth quarter.

Once we have the `game_id` we can source the `event_id` from the play-by-play data.

```python
# Get the play-by-play data for the game
game_id = "0022200552" # CHI @ CLE, 2023-01-02
df = playbyplayv2.PlayByPlayV2(game_id).get_data_frames()[0]

# Look at the plays at the end of the fourth quarter
df[df['PERIOD'] == 4].tail(10)
```

| | GAME_ID | EVENTNUM | EVENTMSGTYPE | EVENTMSGACTIONTYPE | PERIOD | WCTIMESTRING | PCTIMESTRING | HOMEDESCRIPTION | NEUTRALDESCRIPTION | VISITORDESCRIPTION | ... | PLAYER2_TEAM_NICKNAME | PLAYER2_TEAM_ABBREVIATION | PERSON3TYPE | PLAYER3_ID | PLAYER3_NAME | PLAYER3_TEAM_ID | PLAYER3_TEAM_CITY | PLAYER3_TEAM_NICKNAME | PLAYER3_TEAM_ABBREVIATION | VIDEO_AVAILABLE_FLAG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 488 | 0022200552 | 697 | 8 | 0 | 4 | 9:42 PM | 0:04 | | | SUB: Drummond FOR White | ... | Bulls | CHI | 0 | 0 | | | | | | 0 |
| 489 | 0022200552 | 702 | 4 | 0 | 4 | 9:42 PM | 0:03 | Mitchell REBOUND (Off:3 Def:4) | | | ... | | | 0 | 0 | | | | | | 1 |
| 490 | 0022200552 | 703 | 1 | 72 | 4 | 9:44 PM | 0:03 | Mitchell 2' Putback Layup (58 PTS) | | | ... | | | 0 | 0 | | | | | | 1 |
| 491 | 0022200552 | 705 | 9 | 1 | 4 | 9:44 PM | 0:03 | | | Bulls Timeout: Regular (Reg.6 Short 0) | ... | | | 0 | 0 | | | | | | 0 |
| 492 | 0022200552 | 706 | 8 | 0 | 4 | 9:44 PM | 0:03 | SUB: Osman FOR Lopez | | | ... | Cavaliers | CLE | 0 | 0 | | | | | | 0 |
| 493 | 0022200552 | 707 | 8 | 0 | 4 | 9:44 PM | 0:03 | | | SUB: White FOR Williams | ... | Bulls | CHI | 0 | 0 | | | | | | 0 |
| 494 | 0022200552 | 708 | 8 | 0 | 4 | 9:44 PM | 0:03 | | | SUB: Dosunmu FOR Drummond | ... | Bulls | CHI | 0 | 0 | | | | | | 0 |
| 495 | 0022200552 | 712 | 2 | 63 | 4 | 9:45 PM | 0:00 | | | MISS DeRozan 27' 3PT Fadeaway Jumper | ... | | | 0 | 0 | | | | | | 1 |
| 496 | 0022200552 | 713 | 4 | 0 | 4 | 9:45 PM | 0:00 | | | Bulls Rebound | ... | | | 0 | 0 | | | | | | 0 |
| 497 | 0022200552 | 714 | 13 | 0 | 4 | 9:45 PM | 0:00 | | End of 4th Period (9:45 PM EST) | | ... | | | 0 | 0 | | | | | | 1 |


We can see that the `EVENTNUM` for the putback is `703`. Together with the other information we have about the play, this is enough to build the NBA.com video URL.
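Instead of reading the `EVENTNUM` off the table by eye, the play can be located by filtering on its description. This is a minimal sketch on a few rows copied from the table above; `findEventNums` is a hypothetical helper, and treating `VIDEO_AVAILABLE_FLAG == 1` as "video exists" is an assumption based on the column name.

```python
import pandas as pd

# A few rows copied from the table above, standing in for the full play-by-play DataFrame
pbp = pd.DataFrame({
  'EVENTNUM': [702, 703, 712],
  'HOMEDESCRIPTION': ['Mitchell REBOUND (Off:3 Def:4)', "Mitchell 2' Putback Layup (58 PTS)", None],
  'VISITORDESCRIPTION': [None, None, "MISS DeRozan 27' 3PT Fadeaway Jumper"],
  'VIDEO_AVAILABLE_FLAG': [1, 1, 1],
})

def findEventNums(df, text, column='HOMEDESCRIPTION'):
  # Hypothetical helper: EVENTNUMs whose description contains `text`
  # (plain substring match) and that are flagged as having video
  hasText = df[column].str.contains(text, regex=False, na=False)  # empty cells count as no match
  hasVideo = df['VIDEO_AVAILABLE_FLAG'] == 1
  return df.loc[hasText & hasVideo, 'EVENTNUM'].tolist()

print(findEventNums(pbp, 'Putback'))  # [703]
```

On the real `df`, the same call would search every play in the game, so a more specific search string may be needed when several plays match.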

There are two different ways to get video URLs:

### A. Using only the play-by-play data we already have

**Advantages**: Uses information we already have; no additional requests to the NBA.com API are needed.

**Disadvantages**: Generates the URL of the NBA.com video page, not of the video file itself.

### B. Using an additional request to the `videoeventsasset` endpoint

**Advantages**: Generates the URL of the video file itself.

**Disadvantages**: Requires an additional request per play, so getting URLs for multiple plays means a new request for each one.

```python
event_id = 703
```

## Step 2A - Just Play-by-Play Data

The last step for this method is to extract and format the description of the play. The description of a given play usually appears in one of three columns, depending on which team the action belongs to:
- `HOMEDESCRIPTION`
- `NEUTRALDESCRIPTION` (e.g. the end of a period)
- `VISITORDESCRIPTION`

Some plays can fill in more than one of these columns, so the order in which they are checked matters.
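The same home, then neutral, then visitor fallback can also be applied to every row at once with pandas. This is a sketch on a toy frame mirroring three rows of the table above (`None` marks an empty column); the new `DESCRIPTION` column name is our own choice, not something the API returns.

```python
import pandas as pd

# Toy frame mirroring three rows of the table above (None = empty column)
pbp = pd.DataFrame({
  'HOMEDESCRIPTION': ["Mitchell 2' Putback Layup (58 PTS)", None, None],
  'NEUTRALDESCRIPTION': [None, None, 'End of 4th Period (9:45 PM EST)'],
  'VISITORDESCRIPTION': [None, 'Bulls Rebound', None],
})

# Take HOMEDESCRIPTION where present, else NEUTRALDESCRIPTION, else VISITORDESCRIPTION
pbp['DESCRIPTION'] = (
  pbp['HOMEDESCRIPTION']
  .combine_first(pbp['NEUTRALDESCRIPTION'])
  .combine_first(pbp['VISITORDESCRIPTION'])
  .fillna('No Description')
)
print(pbp['DESCRIPTION'].tolist())
```

Because the home column is checked first, a play that fills in both the home and visitor columns keeps its home description, matching the order used by `eventDescription` below.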

```python
def eventDescription(df, INDEX=None, EVENTNUM=None, URL=True):
  # Given an NBA play-by-play DataFrame, returns the description of a single play.
  # Accepts either the DataFrame index (INDEX) or the NBA-provided EVENTNUM for the play.

  if EVENTNUM is not None:
    matches = df.index[df['EVENTNUM'] == int(EVENTNUM)].tolist()
    if not matches:
      print('EVENTNUM {} not found'.format(EVENTNUM))
      return None
    INDEX = matches[0]

  # Compare against None so that index 0 is still accepted as a valid location
  if INDEX is None:
    print('No location entered')
    return None

  # Get all three possible description columns and keep the first one that is populated
  descRows = ['HOMEDESCRIPTION', 'NEUTRALDESCRIPTION', 'VISITORDESCRIPTION']
  descValues = df.loc[INDEX, descRows].tolist()
  descEvent = next((item for item in descValues if pandas.notna(item)), 'No Description')

  # Return a URL-encoded description by default, or the readable one if URL=False
  if URL:
    return urllib.parse.quote(descEvent)
  return descEvent
```

```python
def getEventVidPage(df, event_id, season):
  # Given a game's play-by-play DataFrame and an event within that game, returns the NBA.com video page of that event.
  # For now the season has to be entered manually; a future version should not require it.
  # Season is a string of the season years in the form 'YYYY-YY' (e.g. '2022-23')

  game_id = df['GAME_ID'].iloc[0]
  description = eventDescription(df, EVENTNUM=event_id)  # already URL-encoded

  vidURL = 'https://www.nba.com/stats/events?CFID=&CFPARAMS=&GameEventID={}&GameID={}&Season={}&flag=1&title={}'.format(
    event_id,
    game_id,
    season,
    description
  )

  return vidURL

season = '2022-23'
getEventVidPage(df, event_id, season)
```

'https://www.nba.com/stats/events?CFID=&CFPARAMS=&GameEventID=703&GameID=0022200552&Season=2022-23&flag=1&title=Mitchell%202%27%20Putback%20Layup%20%2858%20PTS%29'
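The manual `season` argument could be dropped by reading the season out of the game ID. This sketch assumes the commonly observed layout of NBA game IDs, where characters 4 and 5 hold the last two digits of the season's starting year (`002` + `22` + `00552` for this game); `seasonFromGameId` and `buildEventVidPage` are hypothetical names. It also hands all of the percent-encoding to `urllib.parse.urlencode`, so the description is passed in unencoded.

```python
import urllib.parse

def seasonFromGameId(game_id):
  # Assumption: characters 4-5 of the 10-digit game ID are the last two digits
  # of the season's starting year (e.g. '0022200552' -> 2022 -> '2022-23').
  game_id = str(game_id).zfill(10)  # restores leading zeros if the ID was read as an int
  yy = int(game_id[3:5])
  start = 1900 + yy if yy >= 46 else 2000 + yy  # the league's first season was 1946-47
  return '{}-{:02d}'.format(start, (start + 1) % 100)

def buildEventVidPage(game_id, event_id, description, season=None):
  # Hypothetical variant of getEventVidPage: derives the season when it is not given
  # and lets urlencode percent-encode every parameter.
  params = {
    'CFID': '',
    'CFPARAMS': '',
    'GameEventID': event_id,
    'GameID': game_id,
    'Season': season or seasonFromGameId(game_id),
    'flag': 1,
    'title': description,  # the raw, readable description
  }
  # quote_via=quote encodes spaces as %20 (the default, quote_plus, would use '+')
  query = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
  return 'https://www.nba.com/stats/events?' + query

print(buildEventVidPage('0022200552', 703, "Mitchell 2' Putback Layup (58 PTS)"))
```

With our `df`, the description would come from `eventDescription(df, EVENTNUM=event_id, URL=False)`, and the printed URL matches the one `getEventVidPage` produced above.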

## Step 2B - `videoeventsasset` Request

The `videoeventsasset` endpoint is not supported by `nba_api`, so we have to do the request ourselves.

```python
def getEventVidURL(event_id, game_id, resolution='LARGE'):
  # Given a game_id and an event_id from within that game, returns the video file URL of that event.
  # Optionally accepts a video resolution as a string: 'SMALL', 'MEDIUM' or 'LARGE'.

  headers = {
    'Host': 'stats.nba.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate, br',
    'x-nba-stats-origin': 'stats',
    'x-nba-stats-token': 'true',
    'Connection': 'keep-alive',
    'Referer': 'https://stats.nba.com/',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache'
  }

  # Keys of the resolution-specific URLs in the videoeventsasset response
  vidRes = {
    'SMALL': 'surl',
    'MEDIUM': 'murl',
    'LARGE': 'lurl'
  }

  params = {'GameEventID': event_id, 'GameID': game_id}
  r = requests.get('https://stats.nba.com/stats/videoeventsasset', params=params, headers=headers, timeout=30)
  r.raise_for_status()  # fail loudly on HTTP errors instead of on r.json()
  data = r.json()

  # The response also contains data['resultSets']['playlist'], which isn't needed here
  videoUrls = data['resultSets']['Meta']['videoUrls']
  if not videoUrls:
    print('No video available for event {}'.format(event_id))
    return None

  return videoUrls[0][vidRes[resolution.upper()]]


getEventVidURL(event_id, game_id)
```

'https://videos.nba.com/nba/pbp/media/2023/01/02/0022200552/703/09495445-c118-67ad-b084-f9dcbe97b7db_1280x720.mp4'
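Since method B costs one request per play, collecting clips for several plays means looping over event IDs. This is a minimal sketch; `getManyVidURLs` is a hypothetical helper, and the one-second default delay is an arbitrary, polite guess rather than a documented NBA.com rate limit. It takes the fetching function as an argument, so it does not depend on any particular request code.

```python
import time

def getManyVidURLs(event_ids, fetch, delay=1.0):
  # Hypothetical helper: calls fetch(event_id) once per play, pausing `delay`
  # seconds between calls so stats.nba.com isn't hit in a tight loop.
  # A failed event is recorded as None instead of aborting the whole batch.
  urls = {}
  for i, event_id in enumerate(event_ids):
    if i > 0:
      time.sleep(delay)
    try:
      urls[event_id] = fetch(event_id)
    except Exception as err:  # e.g. requests.HTTPError, or KeyError on an unexpected response
      print('Event {} failed: {}'.format(event_id, err))
      urls[event_id] = None
  return urls
```

With the function above, a call could look like `getManyVidURLs([702, 703], lambda e: getEventVidURL(e, game_id))`, making one network request per play.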

## Final URLs

The two URLs we've retrieved with the two methods are:

1. [Method A: NBA.com video page](https://www.nba.com/stats/events?CFID=&CFPARAMS=&GameEventID=703&GameID=0022200552&Season=2022-23&flag=1&title=Mitchell%202%27%20Putback%20Layup%20%2858%20PTS%29)

   `https://www.nba.com/stats/events?CFID=&CFPARAMS=&GameEventID=703&GameID=0022200552&Season=2022-23&flag=1&title=Mitchell%202%27%20Putback%20Layup%20%2858%20PTS%29`

2. [Method B: video file](https://videos.nba.com/nba/pbp/media/2023/01/02/0022200552/703/09495445-c118-67ad-b084-f9dcbe97b7db_1280x720.mp4)

   `https://videos.nba.com/nba/pbp/media/2023/01/02/0022200552/703/09495445-c118-67ad-b084-f9dcbe97b7db_1280x720.mp4`