<font size = "6"> Fordham Sports Analytics Society Big Data Bowl 2023 - Data Preparation </font>

<font size = "4"> Prepare data sets provided in the case for our exploratory work and model building. </font>

- Authors:  Peter Majors, Chris Orlando, and Jack Townsend
- Kaggle:  https://www.kaggle.com/competitions/nfl-big-data-bowl-2023/overview (Resources)
- Our Github:  https://github.com/peterlmajors/FSAS_BigDataBowl_2023 (Up-To-Date Code)

<font size="5"> Importing And Merging Original Data</font>

In [2]:
#Import Required Packages
import pandas as pd
import numpy as np
import math

#Notebook Settings
pd.set_option('display.max_columns', 1000)
pd.set_option('display.max_rows', 1000)

In [2]:
#Importing Kaggle Data

# #Games - Basic Information On All Games
games = pd.read_csv("C:/Users/Peter/Python Scripts/Case Competitions/NFL Big Data Bowl 2023/case_data/games.csv")

# #pffScout - PFF Judgements For Each Player On Each Play
pffScout = pd.read_csv("C:/Users/Peter/Python Scripts/Case Competitions/NFL Big Data Bowl 2023/case_data/pffScoutingData.csv")

# #Players - Basics On Players
players = pd.read_csv("C:/Users/Peter/Python Scripts/Case Competitions/NFL Big Data Bowl 2023/case_data/players.csv")

# #Plays - Everything About Specific Plays
plays = pd.read_csv("C:/Users/Peter/Python Scripts/Case Competitions/NFL Big Data Bowl 2023/case_data/plays.csv")

# #Week - Frame-By-Frame Player Tracking
week1 = pd.read_csv("C:/Users/Peter/Python Scripts/Case Competitions/NFL Big Data Bowl 2023/case_data/week1.csv")
week2 = pd.read_csv("C:/Users/Peter/Python Scripts/Case Competitions/NFL Big Data Bowl 2023/case_data/week2.csv")
week3 = pd.read_csv("C:/Users/Peter/Python Scripts/Case Competitions/NFL Big Data Bowl 2023/case_data/week3.csv")
week4 = pd.read_csv("C:/Users/Peter/Python Scripts/Case Competitions/NFL Big Data Bowl 2023/case_data/week4.csv")
week5 = pd.read_csv("C:/Users/Peter/Python Scripts/Case Competitions/NFL Big Data Bowl 2023/case_data/week5.csv")
week6 = pd.read_csv("C:/Users/Peter/Python Scripts/Case Competitions/NFL Big Data Bowl 2023/case_data/week6.csv")
week7 = pd.read_csv("C:/Users/Peter/Python Scripts/Case Competitions/NFL Big Data Bowl 2023/case_data/week7.csv")
week8 = pd.read_csv("C:/Users/Peter/Python Scripts/Case Competitions/NFL Big Data Bowl 2023/case_data/week8.csv")

In [8]:
#Merging All Data Together (Not Needed If Importing Merged DataFrames In Next Cell)

#Merge Game Info And Play Info, Each Play Now Has More Context
pbp = pd.merge(games, plays, how = "inner", on = "gameId")

# #Append All Week DataFrames (The Individual Week DataFrames Stay Available Separately)
#DataFrame.append Was Deprecated And Later Removed From pandas, So Use pd.concat Instead
weeks = [week1, week2, week3, week4, week5, week6, week7, week8]
week = pd.concat(weeks, ignore_index = True)

# #Merge Player Tracking, PFF Grading, and Player History Data
ptrack = pd.merge(pffScout, week, how = "inner", on = ["gameId", "playId", "nflId"])
ptrack = pd.merge(ptrack, players, how = "inner", on = ["nflId"])

#Export pbp Data Frame For Safe Keeping
pbp.to_csv("C:/Users/Peter/Python Scripts/Case Competitions/NFL Big Data Bowl 2023/merged_data/pbp.csv")
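
As a small illustration of the stacking step above, the sketch below combines per-week frames with `pd.concat`. The two toy frames (`week_a`, `week_b`) and their values are invented for the example and are not taken from the competition data.

```python
import pandas as pd

# Toy stand-ins for two weekly tracking files (values are invented)
week_a = pd.DataFrame({"gameId": [1, 1], "playId": [10, 10], "frameId": [1, 2]})
week_b = pd.DataFrame({"gameId": [2], "playId": [20], "frameId": [1]})

# Stack the frames row-wise and rebuild a clean 0..n-1 index
stacked = pd.concat([week_a, week_b], ignore_index=True)
print(stacked)
```

Passing the whole list to a single `pd.concat` call avoids copying the growing frame once per week, which is what a repeated append in a loop does.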



<font size="5"> Find Distance of Each Player From The Quarterback </font>

In [9]:
#Find QB Distance Tracking and Merge Onto ptrack DataFrame (Distance As Yards)
ptrack_qb = ptrack[ptrack.pff_positionLinedUp == "QB"]
ptrack_qb = ptrack_qb[['gameId', 'playId', 'frameId', 'x','y', 's', 'a', 'dis','o','dir']]
ptrack = pd.merge(ptrack, ptrack_qb, how = 'inner', on = ['gameId', 'playId', 'frameId'])
ptrack = ptrack.rename(columns = {"x_x":"x", "y_x": "y", "s_x": "s", "a_x": "a", "dis_x": "dis", "o_x": "o", "dir_x": "dir",
                       "x_y":"x_qb", "y_y": "y_qb", "s_y": "s_qb", "a_y": "a_qb", "dis_y": "dis_qb", "o_y": "o_qb", "dir_y": "dir_qb"})
ptrack['dist_from_qb'] = np.hypot((ptrack.x - ptrack.x_qb), (ptrack.y - ptrack.y_qb))

#Replace NaN Values In ptrack With 0
ptrack = ptrack.fillna(0)
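
To make the distance step concrete, here is a minimal sketch on a toy frame. The coordinates are invented, and the column names simply mirror the ones used above.

```python
import numpy as np
import pandas as pd

# Two toy rows: one player 3 yards over and 4 yards up from the QB, and the QB himself
toy = pd.DataFrame({
    "x":    [13.0, 10.0],
    "y":    [24.0, 20.0],
    "x_qb": [10.0, 10.0],
    "y_qb": [20.0, 20.0],
})

# Euclidean distance from each player to the QB, in the same units as x and y
toy["dist_from_qb"] = np.hypot(toy.x - toy.x_qb, toy.y - toy.y_qb)
print(toy)
```

The first row is a 3-4-5 triangle, so its distance is 5.0. The QB's own row comes out as 0.0.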

<font size="5"> Find Angle of Each Player From The Quarterback and Orientation Deviation From That Angle</font>

In [12]:
#Calculate Angles Between Players And QB (Angles Work Same As Orientation and Direction Metrics From Kaggle)
for i in range(len(ptrack)):
    #Subtract From 90 Degrees To Alter The Starting Axis Of The Angle
    #Use Absolute Offsets (Not Squared Offsets) So The Arctangent Returns The True First-Quadrant Angle
    ptrack.loc[i, 'angle_to_qb'] = 90 - (math.atan2(abs(ptrack.loc[i, 'y_qb'] - ptrack.loc[i, 'y']), abs(ptrack.loc[i, 'x_qb'] - ptrack.loc[i, 'x'])))*(180/math.pi)

#Adjust For Quadrants Relative To QB Position On The Field
ptrack.loc[(ptrack.y < ptrack.y_qb) & (ptrack.x > ptrack.x_qb), 'angle_to_qb'] = 180 - ptrack.angle_to_qb #Bottom Right
ptrack.loc[(ptrack.y < ptrack.y_qb) & (ptrack.x < ptrack.x_qb), 'angle_to_qb'] = 180 + ptrack.angle_to_qb #Bottom Left
ptrack.loc[(ptrack.y > ptrack.y_qb) & (ptrack.x < ptrack.x_qb), 'angle_to_qb'] = 360 - ptrack.angle_to_qb #Top Left

#Calculate Deviation Between Orientation of Player And Their Angle To The Quarterback
#Evaluating Offensive Linemen, This Means We Want To Measure If Their Backs Are To The QB They're Protecting
#Positive = Player's Left Shoulder Turned Towards QB So Many Degrees // Negative = Player's Right Shoulder Turned Towards QB So Many Degrees
#Assign With Bracket Notation So pandas Creates A New Column (Attribute Assignment Does Not)
ptrack['angle_to_qb_diff_o'] = ptrack.o - ptrack.angle_to_qb

#Change QB Angles To Themselves To 0
ptrack.loc[ptrack['pff_positionLinedUp'] == 'QB', 'angle_to_qb'] = 0
ptrack.loc[ptrack['pff_positionLinedUp'] == 'QB', 'angle_to_qb_diff_o'] = 0




In [50]:
#Attempt To Create Faster Implementation of QB-To-Player Angle (First For Loop In Previous Code Block)

#Previous Mean
ptrack.angle_to_qb.mean()

#Previous Math
90 + (math.atan2(((ptrack.loc[1, 'y_qb'] - ptrack.loc[1, 'y'])**2),((ptrack.loc[1, 'x_qb'] - ptrack.loc[1, 'x'])**2)))*(180/math.pi)

#Proposed Math (Using Numpy (np))
(np.degrees(np.arctan2(((ptrack.y_qb - ptrack.y)**2), (ptrack.x_qb - ptrack.x)**2))).mean()

174.36871305154258
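
One possible vectorized form of the angle calculation is sketched below. It assumes the Kaggle convention that 0 degrees points along +y and angles increase clockwise, and that the angle is measured from the QB out to the player, which is what the quadrant adjustments in the earlier cell imply. The toy coordinates are invented, one player per quadrant around the QB.

```python
import numpy as np
import pandas as pd

# One toy player in each quadrant around a QB at (10, 20)
toy = pd.DataFrame({
    "x":    [11.0, 11.0,  9.0,  9.0],
    "y":    [21.0, 19.0, 19.0, 21.0],
    "x_qb": [10.0, 10.0, 10.0, 10.0],
    "y_qb": [20.0, 20.0, 20.0, 20.0],
})

# Signed offsets from the QB to the player
dx = toy.x - toy.x_qb
dy = toy.y - toy.y_qb

# arctan2(dx, dy) measures clockwise from the +y axis; the modulo maps
# negative angles into the 0-360 range, so no per-quadrant fix-up is needed
toy["angle_to_qb"] = np.degrees(np.arctan2(dx, dy)) % 360
print(toy)
```

Because the signed offsets go into `arctan2` directly, the top-right, bottom-right, bottom-left, and top-left players come out at 45, 135, 225, and 315 degrees. These are the same values the quadrant rules (θ, 180 - θ, 180 + θ, 360 - θ) give for θ = 45.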

<font size = '5'> Determine When Pass Rushers Enter Immediate Zone On Known Blocking Plays (First Block In Play) </font>

In [77]:
#Find All Tracking Data For All Frames Where Blocks Occur Between Pass Rushers and Pass Blockers

#Create Data Frame With Only Pass Blocking Plays (Missing Block Types Were Filled With 0 Above, So Compare As Strings)
ptrack_block = ptrack.loc[ptrack['pff_blockType'].astype(str) != '0']

#Find Rows With Players Who Were Blocked Against (And Who Have The Role of Pass Rusher)
ptrack_block_rushers = ptrack.loc[(ptrack.nflId.isin(ptrack_block.pff_nflIdBlockedPlayer)) & (ptrack.pff_role == "Pass Rush")]
ptrack_block_rushers = ptrack_block_rushers[['gameId', 'playId', 'nflId', 'frameId', 'pff_role', 'pff_positionLinedUp', 'x', 'y', 's', 'a', 'dis', 'o', 'dir', 'displayName']]

#Merge Pass Blocking Plays Data With Pass Rusher Data
ptrack_imm_box = ptrack_block.merge(ptrack_block_rushers, left_on = ['gameId', 'playId', 'pff_nflIdBlockedPlayer', 'frameId'], right_on = ['gameId', 'playId', 'nflId', 'frameId'], how = 'inner')

#Reduce To Columns Of Interest
ptrack_imm_box = ptrack_imm_box[['gameId', 'playId', 'nflId_x', 'frameId', 'pff_role_x', 'pff_positionLinedUp_x', 'pff_blockType','x_x', 'y_x', 's_x', 'a_x', 'dis_x', 'o_x', 'dir_x', 'displayName_x','nflId_y', 'pff_role_y', 'pff_positionLinedUp_y', 'x_y', 'y_y', 's_y', 'a_y', 'dis_y', 'o_y', 'dir_y', 'displayName_y']]

#Rename Columns
ptrack_imm_box = ptrack_imm_box.rename(columns = {"nflId_x":"nflId_blocker", "displayName_x": "displayName_blocker", "pff_role_x": "pff_role_blocker", "pff_positionLinedUp_x": "pff_positionLinedUp_blocker", "x_x": "x_blocker", "y_x": "y_blocker", "s_x": "s_blocker", "a_x": "a_blocker", "dis_x": "dis_blocker", "o_x": "o_blocker", "dir_x": "dir_blocker", "nflId_y":"nflId_rusher", "displayName_y": "displayName_rusher", "pff_role_y": "pff_role_rusher", "pff_positionLinedUp_y": "pff_positionLinedUp_rusher", "x_y": "x_rusher", "y_y": "y_rusher", "s_y": "s_rusher", "a_y": "a_rusher", "dis_y": "dis_rusher", "o_y": "o_rusher", "dir_y": "dir_rusher"})

#Calculate Distance Between Pass Blocker And Pass Rusher at Each Frame
ptrack_imm_box['blocker_rusher_distance'] = round(np.hypot((ptrack_imm_box.y_blocker - ptrack_imm_box.y_rusher), (ptrack_imm_box.x_blocker - ptrack_imm_box.x_rusher)),2)

#Calculate Difference Between Rusher Direction and Blocker Orientation
ptrack_imm_box['diff_btw_rusher_dir_blocker_o'] = abs((ptrack_imm_box.dir_rusher - 180) - ptrack_imm_box.o_blocker)
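
A raw subtraction of two headings can exceed 180 degrees when the values sit on either side of the 0/360 boundary. The sketch below shows one way to wrap the difference into the 0 to 180 range. The helper name `wrapped_angle_diff` is hypothetical and not part of the notebook.

```python
import numpy as np

def wrapped_angle_diff(a_deg, b_deg):
    """Smallest absolute difference between two headings, in degrees (0 to 180)."""
    diff = np.abs(np.asarray(a_deg, dtype=float) - np.asarray(b_deg, dtype=float)) % 360
    # Differences above 180 are shorter when measured the other way around
    return np.where(diff > 180, 360 - diff, diff)

# Headings of 350 and 10 degrees are only 20 degrees apart, not 340
example = wrapped_angle_diff([350, 90, 180], [10, 270, 180])
print(example)
```

If this were used for the rusher and blocker columns, the reversed rusher direction (`dir_rusher - 180`) and `o_blocker` would be the two inputs.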


In [78]:
#Create The Immediate Zone For Pass Blockers And Vertices Between Coordinates Of Interest

#Define The Half-Width (Applied To Each Side Of The Blocker) And Depth Of This Zone, In Yards
imm_box_width = .75
imm_box_depth = 1.20

#bl = Bottom Left Immediate Zone Coordinate
ptrack_imm_box['bl_im_x'] = round(ptrack_imm_box.x_blocker - (imm_box_width * np.cos(np.radians(ptrack_imm_box.o_blocker))),2)
ptrack_imm_box['bl_im_y'] = round(ptrack_imm_box.y_blocker + (imm_box_width * np.sin(np.radians(ptrack_imm_box.o_blocker))),2)

#br = Bottom Right Immediate Zone Coordinate
ptrack_imm_box['br_im_x'] = round(ptrack_imm_box.x_blocker + (imm_box_width * np.cos(np.radians(ptrack_imm_box.o_blocker))),2)
ptrack_imm_box['br_im_y'] = round(ptrack_imm_box.y_blocker - (imm_box_width * np.sin(np.radians(ptrack_imm_box.o_blocker))),2)

#fl = Front Left Immediate Zone Coordinate
ptrack_imm_box['fl_im_x'] = round(ptrack_imm_box.bl_im_x + (imm_box_depth * np.cos(np.radians(ptrack_imm_box.o_blocker - 90))),2)
ptrack_imm_box['fl_im_y'] = round(ptrack_imm_box.bl_im_y - (imm_box_depth * np.sin(np.radians(ptrack_imm_box.o_blocker - 90))),2)

#fr = Front Right Immediate Zone Coordinate
ptrack_imm_box['fr_im_x'] = round(ptrack_imm_box.br_im_x + (imm_box_depth * np.cos(np.radians(ptrack_imm_box.o_blocker - 90))),2)
ptrack_imm_box['fr_im_y'] = round(ptrack_imm_box.br_im_y - (imm_box_depth * np.sin(np.radians(ptrack_imm_box.o_blocker - 90))),2)
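
The corner formulas above can be checked on a single blocker. The sketch below restates them with two unit vectors, assuming the tracking convention that orientation 0 points along +y and increases clockwise. The function name `imm_box_corners` and the sample position are hypothetical.

```python
import numpy as np

def imm_box_corners(x, y, o_deg, half_width=0.75, depth=1.20):
    """Return the bl, br, fl, fr corners of the zone in front of a blocker."""
    o = np.radians(o_deg)
    right = np.array([np.cos(o), -np.sin(o)])  # unit vector toward the blocker's right
    front = np.array([np.sin(o), np.cos(o)])   # unit vector the blocker is facing
    center = np.array([x, y])
    bl = center - half_width * right           # back-left corner
    br = center + half_width * right           # back-right corner
    fl = bl + depth * front                    # front-left corner
    fr = br + depth * front                    # front-right corner
    return bl, br, fl, fr

# A blocker at (10, 20) facing straight along +x (orientation 90 degrees)
bl, br, fl, fr = imm_box_corners(10.0, 20.0, 90.0)
print(bl, br, fl, fr)
```

For this blocker the box spans x from 10 to 11.2 and y from 19.25 to 20.75. The blocker sits at the midpoint of the back edge.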

In [79]:
#Calculate Relevant Vectors Between Points In The Immediate Zone And Dot Products of The Vectors

#Vector From Bottom Left Corner To Top Left of Immediate Zone
ptrack_imm_box['AB_x'] = ptrack_imm_box.fl_im_x - ptrack_imm_box.bl_im_x
ptrack_imm_box['AB_y'] = ptrack_imm_box.fl_im_y - ptrack_imm_box.bl_im_y
#Vector From Top Left Corner To Top Right Corner of Immediate Zone
ptrack_imm_box['BC_x'] = ptrack_imm_box.fr_im_x - ptrack_imm_box.fl_im_x
ptrack_imm_box['BC_y'] = ptrack_imm_box.fr_im_y - ptrack_imm_box.fl_im_y
#Vector From Bottom Left Corner of Immediate Zone To Rusher (The Point Being Tested)
ptrack_imm_box['AM_x'] = ptrack_imm_box.x_rusher - ptrack_imm_box.bl_im_x
ptrack_imm_box['AM_y'] = ptrack_imm_box.y_rusher - ptrack_imm_box.bl_im_y
#Vector From Top Left Corner of Immediate Zone To Rusher (The Point Being Tested)
ptrack_imm_box['BM_x'] = ptrack_imm_box.x_rusher - ptrack_imm_box.fl_im_x
ptrack_imm_box['BM_y'] = ptrack_imm_box.y_rusher - ptrack_imm_box.fl_im_y

#Calculate Dot Products of Vectors

#AB AM Dot Product
ptrack_imm_box['AB_AM_dot'] = (ptrack_imm_box.AB_x * ptrack_imm_box.AM_x) + (ptrack_imm_box.AB_y * ptrack_imm_box.AM_y)
#AB AB Dot Product
ptrack_imm_box['AB_AB_dot'] = (ptrack_imm_box.AB_x * ptrack_imm_box.AB_x) + (ptrack_imm_box.AB_y * ptrack_imm_box.AB_y)
#BC BM Dot Product
ptrack_imm_box['BC_BM_dot'] = (ptrack_imm_box.BC_x * ptrack_imm_box.BM_x) + (ptrack_imm_box.BC_y * ptrack_imm_box.BM_y)
#BC BC Dot Product
ptrack_imm_box['BC_BC_dot'] = (ptrack_imm_box.BC_x * ptrack_imm_box.BC_x) + (ptrack_imm_box.BC_y * ptrack_imm_box.BC_y)

In [80]:
#Binary Column Created Determining If A Rusher Is In A Pass Blocker's Immediate Zone
ptrack_imm_box['rusher_in_imm_box'] = np.where(((0 <= ptrack_imm_box.AB_AM_dot) & (ptrack_imm_box.AB_AM_dot <= ptrack_imm_box.AB_AB_dot) & (0 <= ptrack_imm_box.BC_BM_dot) & (ptrack_imm_box.BC_BM_dot <= ptrack_imm_box.BC_BC_dot)), True, False)

#Merge The ptrack_imm_box Data Frame Onto ptrack
ptrack_imm_box = ptrack_imm_box[['gameId','playId','nflId_blocker','frameId', 'rusher_in_imm_box']]
ptrack = ptrack.merge(ptrack_imm_box, left_on = ['gameId','playId','nflId','frameId'], right_on = ['gameId','playId','nflId_blocker','frameId'], how = 'left')
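
The dot-product condition used for `rusher_in_imm_box` is a standard point-in-rectangle test: a point M lies inside rectangle ABCD when 0 <= AB·AM <= AB·AB and 0 <= BC·BM <= BC·BC. The sketch below applies it to one hand-built rectangle. The function name `point_in_rectangle` and the sample points are hypothetical.

```python
import numpy as np

def point_in_rectangle(a, b, c, m):
    """True if point m lies inside the rectangle with consecutive corners a, b, c."""
    a, b, c, m = (np.asarray(p, dtype=float) for p in (a, b, c, m))
    ab, bc = b - a, c - b          # two adjacent edges of the rectangle
    am, bm = m - a, m - b          # vectors from the corners to the test point
    # Project the test point onto each edge and check it falls within the edge length
    return bool(0 <= ab @ am <= ab @ ab and 0 <= bc @ bm <= bc @ bc)

# Rectangle corners: A = back-left, B = front-left, C = front-right
A, B, C = (10.0, 20.75), (11.2, 20.75), (11.2, 19.25)

inside = point_in_rectangle(A, B, C, (10.5, 20.0))   # point within the box
outside = point_in_rectangle(A, B, C, (12.0, 20.0))  # point beyond the front edge
print(inside, outside)
```

The test needs only two edges because the other two are parallel to them, so four comparisons decide membership for a rectangle at any rotation.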

<font size="5"> Export Player Tracking Data Frame With New Fields </font>

In [18]:
#Export Distance Data Frame To .csv
ptrack.to_csv("C:/Users/Peter/Python Scripts/Case Competitions/NFL Big Data Bowl 2023/merged_data/ptrack.csv")