<font size = "6"> Fordham Sports Analytics Society Big Data Bowl 2023 - Data Preparation </font>

<font size = "4"> Prepare the data sets provided for the competition for our exploratory work and model building. </font>

- Authors:  Peter Majors, Chris Orlando, and Jack Townsend
- Kaggle:  https://www.kaggle.com/competitions/nfl-big-data-bowl-2023/overview (Resources)
- Our Github:  https://github.com/peterlmajors/FSAS_BigDataBowl_2023 (Up-To-Date Code)

<font size="5"> Importing And Merging Original Data</font>

In [1]:
#Import Required Packages
import pandas as pd
import numpy as np
import math

#Notebook Settings
pd.set_option('display.max_columns', 1000)
pd.set_option('display.max_rows', 1000)

In [2]:
#Importing Kaggle Data
#Folder Holding The Kaggle .csv Files (Adjust To Your Local Path)
data_dir = "C:/Users/Peter/Python Scripts/Case Competitions/NFL Big Data Bowl 2023/case_data/"

#Games - Basic Information On All Games
games = pd.read_csv(data_dir + "games.csv")

#pffScout - PFF Judgements For Each Player On Each Play
pffScout = pd.read_csv(data_dir + "pffScoutingData.csv")

#Players - Basics On Players
players = pd.read_csv(data_dir + "players.csv")

#Plays - Everything About Specific Plays
plays = pd.read_csv(data_dir + "plays.csv")

#Week - Frame-By-Frame Player Tracking
week1 = pd.read_csv(data_dir + "week1.csv")
week2 = pd.read_csv(data_dir + "week2.csv")
week3 = pd.read_csv(data_dir + "week3.csv")
week4 = pd.read_csv(data_dir + "week4.csv")
week5 = pd.read_csv(data_dir + "week5.csv")
week6 = pd.read_csv(data_dir + "week6.csv")
week7 = pd.read_csv(data_dir + "week7.csv")
week8 = pd.read_csv(data_dir + "week8.csv")

In [8]:
#Merging All Data Together (Not Needed If Importing Merged DataFrames In Next Cell)

#Merge Game Info And Play Info, Each Play Now Has More Context
pbp = pd.merge(games, plays, how = "inner", on = "gameId")

#Stack All Week DataFrames Into One (Our Preference Is Still For Weeks To Stay Separate)
#pd.concat Replaces DataFrame.append, Which Was Deprecated In pandas 1.4 And Removed In 2.0
weeks = [week1, week2, week3, week4, week5, week6, week7, week8]
week = pd.concat(weeks, ignore_index = True)

#Merge Player Tracking, PFF Grading, and Player History Data
ptrack = pd.merge(pffScout, week, how = "inner", on = ["gameId", "playId", "nflId"])
ptrack = pd.merge(ptrack, players, how = "inner", on = ["nflId"])

#Export pbp Data Frame For Safe Keeping
pbp.to_csv("C:/Users/Peter/Python Scripts/Case Competitions/NFL Big Data Bowl 2023/merged_data/pbp.csv")
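Stacking the weekly tracking files and then inner-merging on the shared keys can be sketched on toy frames. This is a minimal sketch that assumes tiny hypothetical data in place of the Kaggle CSVs. It uses `pd.concat`, the supported way to stack DataFrames now that `DataFrame.append` is gone from pandas 2.0, and shows that an inner merge keeps only rows whose `(gameId, playId, nflId)` keys appear in both frames.

```python
import pandas as pd

# Hypothetical stand-ins for two weekly tracking files (one row per player per frame)
week1 = pd.DataFrame({"gameId": [1, 1], "playId": [10, 10], "nflId": [100, 200],
                      "frameId": [1, 1], "x": [50.0, 48.0], "y": [26.0, 25.0]})
week2 = pd.DataFrame({"gameId": [2], "playId": [20], "nflId": [300],
                      "frameId": [1], "x": [30.0], "y": [20.0]})

# Hypothetical PFF scouting rows: nflId 200 has no scouting entry
pffScout = pd.DataFrame({"gameId": [1, 2], "playId": [10, 20], "nflId": [100, 300],
                         "pff_positionLinedUp": ["QB", "LT"]})

# Stack the weeks into one frame with a fresh 0..n-1 index
week = pd.concat([week1, week2], ignore_index=True)

# Inner merge keeps only key combinations present in both frames
ptrack = pd.merge(pffScout, week, how="inner", on=["gameId", "playId", "nflId"])
print(len(week), len(ptrack))  # 3 2
```

The tracking row for nflId 200 is dropped by the inner merge, so any player without a PFF scouting entry disappears from `ptrack` in the same way.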


<font size="5"> Find Distance of Each Player From The Quarterback </font>

In [9]:
#Find QB Distance Tracking and Merge Onto ptrack DataFrame (Distance As Yards)
ptrack_qb = ptrack[ptrack.pff_positionLinedUp == "QB"]
ptrack_qb = ptrack_qb[['gameId', 'playId', 'frameId', 'x', 'y', 's', 'a', 'dis', 'o', 'dir']]
#suffixes = ("", "_qb") Keeps Each Player's Own Columns Unchanged And Tags The QB's Copies (x_qb, y_qb, s_qb, ...)
ptrack = pd.merge(ptrack, ptrack_qb, how = 'inner', on = ['gameId', 'playId', 'frameId'], suffixes = ("", "_qb"))
ptrack['dist_from_qb'] = np.hypot((ptrack.x - ptrack.x_qb), (ptrack.y - ptrack.y_qb))

#Replace Remaining NaN Values In ptrack With 0
ptrack = ptrack.fillna(0)
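The QB-distance step can be checked on a single hypothetical frame. This is a minimal sketch with made-up coordinates (in yards): the QB's rows are filtered out, merged back onto every player in the same `(gameId, playId, frameId)`, and `np.hypot` gives the straight-line distance. Passing `suffixes=("", "_qb")` keeps the player's own columns unsuffixed and tags the QB's copies, so no renaming is needed.

```python
import numpy as np
import pandas as pd

# Hypothetical single frame: a QB and two linemen (coordinates in yards)
ptrack = pd.DataFrame({
    "gameId": [1, 1, 1], "playId": [10, 10, 10], "frameId": [5, 5, 5],
    "pff_positionLinedUp": ["QB", "LT", "RT"],
    "x": [40.0, 43.0, 41.0], "y": [26.0, 30.0, 26.0],
})

# Keep only the QB's position for each (gameId, playId, frameId)
qb = ptrack.loc[ptrack.pff_positionLinedUp == "QB", ["gameId", "playId", "frameId", "x", "y"]]

# Player columns keep their names; the QB's copies become x_qb and y_qb
ptrack = pd.merge(ptrack, qb, how="inner", on=["gameId", "playId", "frameId"],
                  suffixes=("", "_qb"))

# Euclidean distance from each player to the QB, in yards
ptrack["dist_from_qb"] = np.hypot(ptrack.x - ptrack.x_qb, ptrack.y - ptrack.y_qb)
print(ptrack[["pff_positionLinedUp", "dist_from_qb"]])
```

The LT sits 3 yards over and 4 yards up from the QB, so his distance is the 3-4-5 hypotenuse; the QB's distance to himself is 0.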

<font size="5"> Find Angle of Each Player From The Quarterback and Orientation Deviation From That Angle</font>

In [12]:
#Calculate Angles Between Players And QB (Angles Work Same As Orientation and Direction Metrics From Kaggle)
for i in range(len(ptrack)):
    #Reference Angle From The Absolute Offsets (Squaring The Offsets Would Distort Their Ratio And So The Angle)
    dy = abs(ptrack.loc[i, 'y_qb'] - ptrack.loc[i, 'y'])
    dx = abs(ptrack.loc[i, 'x_qb'] - ptrack.loc[i, 'x'])
    #Subtract From 90 Degrees To Alter The Starting Axis Of The Angle
    ptrack.loc[i, 'angle_to_qb'] = 90 - math.degrees(math.atan2(dy, dx))

#Adjust For Quadrants Relative To QB Position On The Field
ptrack.loc[(ptrack.y < ptrack.y_qb) & (ptrack.x > ptrack.x_qb), 'angle_to_qb'] = 180 - ptrack.angle_to_qb #Bottom Right
ptrack.loc[(ptrack.y < ptrack.y_qb) & (ptrack.x < ptrack.x_qb), 'angle_to_qb'] = 180 + ptrack.angle_to_qb #Bottom Left
ptrack.loc[(ptrack.y > ptrack.y_qb) & (ptrack.x < ptrack.x_qb), 'angle_to_qb'] = 360 - ptrack.angle_to_qb #Top Left

#Calculate Deviation Between Orientation of Player And Their Angle To The Quarterback
#For Offensive Linemen, This Measures Whether Their Backs Are To The QB They're Protecting
#Positive = Player's Left Shoulder Turned Towards QB So Many Degrees // Negative = Player's Right Shoulder Turned Towards QB So Many Degrees
#Use Bracket Assignment: pandas Does Not Create A New Column From Attribute Assignment (ptrack.new_col = ...)
ptrack['angle_to_qb_diff_o'] = ptrack.o - ptrack.angle_to_qb
#Wrap Into [-180, 180) So Orientations On Either Side Of 0/360 Degrees Compare Correctly
ptrack['angle_to_qb_diff_o'] = (ptrack.angle_to_qb_diff_o + 180) % 360 - 180

#Set The QB's Angle And Deviation Relative To Himself To 0
ptrack.loc[ptrack['pff_positionLinedUp'] == 'QB', 'angle_to_qb'] = 0
ptrack.loc[ptrack['pff_positionLinedUp'] == 'QB', 'angle_to_qb_diff_o'] = 0



In [50]:
#Attempt To Create Faster Implementation of QB-To-Player Angle (First For Loop In Previous Code Block)

# #Previous Mean
# ptrack.angle_to_qb.mean()

# #Previous Math
# 90 + (math.atan2(((ptrack.loc[1, 'y_qb'] - ptrack.loc[1, 'y'])**2),((ptrack.loc[1, 'x_qb'] - ptrack.loc[1, 'x'])**2)))*(180/math.pi)

# #Proposed Math (Using Numpy (np))
# (np.degrees(np.arctan2(((ptrack.y_qb - ptrack.y)**2), (ptrack.x_qb - ptrack.x)**2))).mean()

174.36871305154258
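One way to finish the vectorized version attempted in the cell above is to let `np.arctan2` handle all four quadrants at once. Assuming the orientation convention the earlier comments rely on (0° along +y, increasing toward +x), the angle from the QB to a player is `atan2(x - x_qb, y - y_qb)` in degrees, taken mod 360. This agrees with the loop's quadrant adjustments when its reference angle is built from absolute (not squared) offsets, and it also covers players level with the QB on either axis. A minimal sketch on hypothetical toy coordinates; `bearing_from_qb` is an illustrative name, not part of any library.

```python
import numpy as np
import pandas as pd

def bearing_from_qb(x, y, x_qb, y_qb):
    """Angle in degrees, in [0, 360), from the QB to the player: 0 along +y, 90 along +x."""
    # arctan2(dx, dy), with the arguments in this order, measures clockwise from +y
    return np.degrees(np.arctan2(x - x_qb, y - y_qb)) % 360

# Hypothetical players placed around a QB at (40, 26), with made-up orientations
df = pd.DataFrame({"x": [41.0, 41.0, 39.0, 39.0, 39.0],
                   "y": [27.0, 25.0, 25.0, 27.0, 26.0],
                   "o": [45.0, 0.0, 225.0, 10.0, 270.0]})
df["x_qb"], df["y_qb"] = 40.0, 26.0

# Whole-column computation, no Python loop
df["angle_to_qb"] = bearing_from_qb(df.x, df.y, df.x_qb, df.y_qb)

# Signed orientation deviation, wrapped into [-180, 180)
df["angle_to_qb_diff_o"] = (df.o - df.angle_to_qb + 180) % 360 - 180
print(df[["angle_to_qb", "angle_to_qb_diff_o"]])
```

Because every step is a NumPy operation on whole columns, this scales to the full tracking table without the per-row `.loc` writes that make the loop slow.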

<font size = '5'> Create The Responsibility Zone </font>

In [20]:
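#Create The Responsibility Zone In The Main ptrack Data Frame
#The Zone Is A Rectangle In Front Of Each Player, Turned With The Player's Orientation (o)

#Define The Zone Dimensions In Yards: resp_box_width Extends To Each Side Of The Player (Total Width = 2 * resp_box_width), resp_box_depth Extends Forward
resp_box_width = .75
resp_box_depth = 1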

#bl = Bottom Left
ptrack['bl_rz_x'] = ptrack.x - (resp_box_width * np.cos(np.radians(ptrack.o)))
ptrack['bl_rz_y'] = ptrack.y + (resp_box_width * np.sin(np.radians(ptrack.o)))

#br = Bottom Right
ptrack['br_rz_x'] = ptrack.x + (resp_box_width * np.cos(np.radians(ptrack.o)))
ptrack['br_rz_y'] = ptrack.y - (resp_box_width * np.sin(np.radians(ptrack.o)))

#fl = Front Left
ptrack['fl_rz_x'] = ptrack.bl_rz_x + (resp_box_depth * np.cos(np.radians(ptrack.o - 90)))
ptrack['fl_rz_y'] = ptrack.bl_rz_y - (resp_box_depth * np.sin(np.radians(ptrack.o - 90)))

#fr = Front Right
ptrack['fr_rz_x'] = ptrack.br_rz_x + (resp_box_depth * np.cos(np.radians(ptrack.o - 90)))
ptrack['fr_rz_y'] = ptrack.br_rz_y - (resp_box_depth * np.sin(np.radians(ptrack.o - 90)))
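The corner formulas above treat `o` as a compass-style heading (0° along +y, 90° along +x): since `cos(o - 90) = sin(o)` and `-sin(o - 90) = cos(o)`, the forward unit vector is `(sin o, cos o)`, and the `br` offset `(cos o, -sin o)` points to the player's right. The zone is therefore a rectangle `2 * resp_box_width` wide and `resp_box_depth` deep, with its back edge through the player. A minimal sketch for a single player, written with those two unit vectors directly; `responsibility_zone` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def responsibility_zone(x, y, o, width=0.75, depth=1.0):
    """Corners (bl, br, fl, fr) of the box in front of a player at (x, y) with heading o in degrees."""
    rad = np.radians(o)
    forward = np.array([np.sin(rad), np.cos(rad)])  # unit vector the player faces
    right = np.array([np.cos(rad), -np.sin(rad)])   # unit vector to the player's right
    pos = np.array([x, y])
    bl = pos - width * right   # back left, beside the player
    br = pos + width * right   # back right, beside the player
    fl = bl + depth * forward  # front left
    fr = br + depth * forward  # front right
    return bl, br, fl, fr

# Hypothetical player near midfield facing +x (o = 90): the box reaches one yard toward +x
for name, corner in zip(["bl", "br", "fl", "fr"], responsibility_zone(60.0, 26.65, 90.0)):
    print(name, np.round(corner, 2))
```

Expressing the corners through `forward` and `right` makes it easy to see that the box turns with the player; the column-wise formulas in the cell compute the same four points for every row at once.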


<font size="5"> Export Player Tracking Data Frame With New Fields </font>

In [18]:
#Export ptrack (With Distance, Angle, And Responsibility Zone Fields) To .csv
ptrack.to_csv("C:/Users/Peter/Python Scripts/Case Competitions/NFL Big Data Bowl 2023/merged_data/ptrack.csv")