# Best Pass Defenders To Break on The Ball

In this notebook, we identify which players are best at defending the pass while the ball is in the air. On pass plays, the defender's goal is to keep the pass catcher from catching the ball by any means that doesn't draw a flag. In zone coverage, defenders cover a designated area of the field while watching the quarterback. In man-to-man coverage, on the other hand, the defender tracks a receiver step for step.

In both coverages, defenders rely on speed, agility, and quick reflexes to keep the receiver from catching the ball. Once the quarterback throws in a receiver's direction, a nearby defender can either play the man or go after the ball for a pass deflection or an interception. We show which players cover the most ground while the ball is in the air on plays that end in an incompletion or an interception.



In [None]:
import pandas as pd
import numpy as np
import glob

- We use all weeks except Week 2 because of anomalies in the data.
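A glob character class such as `[1|3-9]` matches exactly one character (and `|` inside it is a literal character, not "or"), so a pattern keyed on the last digit of the file name also drops any week whose number ends in 0 or 2. The sketch below illustrates this with `fnmatch`, which uses the same pattern rules as `glob`, on a hypothetical file listing that assumes the tracking files are named `week1.csv` through `week17.csv`, and then selects week files by parsing the week number instead. The helper `week_number` and the `EXCLUDED_WEEKS` set are illustrative names, not part of the dataset.

```python
import fnmatch
import re

# Hypothetical file listing, assuming tracking files are named week1.csv ... week17.csv.
files = ['games.csv', 'players.csv', 'plays.csv'] + [f'week{i}.csv' for i in range(1, 18)]

# A character class matches ONE character, so this keys on the last digit only.
by_last_digit = [f for f in files if fnmatch.fnmatch(f, '*[1|3-9].csv')]


def week_number(filename):
    """Return the week number parsed from 'weekN.csv', or None for other files."""
    match = re.fullmatch(r'week(\d+)\.csv', filename)
    return int(match.group(1)) if match else None


# Exclude weeks by number rather than by the last character of the file name.
EXCLUDED_WEEKS = {2}
selected = [f for f in files
            if week_number(f) is not None and week_number(f) not in EXCLUDED_WEEKS]

# Week files the last-digit pattern silently skips.
print(sorted(set(f for f in files if week_number(f)) - set(by_last_digit)))
# ['week10.csv', 'week12.csv', 'week2.csv']
print(len(selected))  # 16
```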

In [None]:
import os

#Players & Play DataFrames
plays = pd.read_csv('../input/nfl-big-data-bowl-2021/plays.csv')
players = pd.read_csv('../input/nfl-big-data-bowl-2021/players.csv')

# Tracking files are named week1.csv ... week17.csv; edit this set to skip other weeks.
# (A glob character class such as [1|3-9] matches a single character, so it would also drop weeks 10 and 12.)
excluded = {'week2.csv'}
filenames = sorted(f for f in glob.glob('../input/nfl-big-data-bowl-2021/week*.csv')
                   if os.path.basename(f) not in excluded)

#Tracking Data excluding week 2
week = pd.concat((pd.read_csv(f) for f in filenames), ignore_index=True)


## 1.0 Preprocessing

- On every pass play, the play description names the intended receiver. On incompletions, the defender in coverage appears in parentheses; on completions, the player in parentheses is the one who made the tackle.
- Using regular expressions, we extract the intended receiver on every pass play, along with the defender on incomplete and intercepted passes.
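The parsing rules above can be sketched with Python's `re` module on a single description string. This is a minimal sketch, assuming descriptions follow the pattern "... to X.Name (Y.Name)" and "... INTERCEPTED by Z.Name ..."; the two sample descriptions are made up in that style, and `parse_pass` is a hypothetical helper, not the notebook's code. Note that the receiver pattern uses a non-capturing group `(?:to|intended for)`: a character class like `[to|intended for]` would match any single one of those characters rather than the words.

```python
import re

# Illustrative patterns; the same regexes also work with pandas' Series.str.extract.
DEFENDER_RE = re.compile(r'\(([A-Z]\.[a-zA-Z].*?)\)')            # first "(X.Name...)" group
INTERCEPTOR_RE = re.compile(r'INTERCEPTED by\s([A-Z]\.[a-zA-Z]+)')
RECEIVER_RE = re.compile(r'\b(?:to|intended for)\s([A-Z]\.[a-zA-Z]+)')


def parse_pass(description):
    """Return (intended_receiver, pass_defender) parsed from one play description."""
    receiver = RECEIVER_RE.search(description)
    interceptor = INTERCEPTOR_RE.search(description)
    defender = DEFENDER_RE.search(description)
    if interceptor:
        # On interceptions, the intercepting player is the pass defender.
        pass_defender = interceptor.group(1)
    elif defender:
        # If several names share the parentheses, keep the first one.
        pass_defender = defender.group(1).split(',')[0].strip()
    else:
        pass_defender = None
    return (receiver.group(1) if receiver else None, pass_defender)


print(parse_pass("(14:10) (Shotgun) C.Wentz pass incomplete short right to D.Goedert (D.Riley)."))
# ('D.Goedert', 'D.Riley')
print(parse_pass("(3:02) M.Ryan pass deep left intended for J.Jones INTERCEPTED by D.Slay at PHI 10. "
                 "D.Slay to PHI 25 for 15 yards (J.Jones)."))
# ('J.Jones', 'D.Slay')
```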

In [None]:
week.head()

- For example, in the play description below, the intended receiver is D.Goedert and the player in coverage is D.Riley.

In [None]:
plays['playDescription'][22]

In [None]:
# Regex: extract the defender in parentheses and the player who intercepted the pass
plays['passDefender'] = plays['playDescription'].str.extract(r'\(([A-Z]\.[a-zA-Z].*?)\)', expand=False)
plays['interceptedDefender'] = plays['playDescription'].str.extract(r'INTERCEPTED by\s([A-Z]\.[a-zA-Z]+)', expand=False)

# On interceptions, use the intercepting player as the pass defender
plays.loc[plays['interceptedDefender'].notna(), 'passDefender'] = plays['interceptedDefender']
plays.drop('interceptedDefender', axis=1, inplace=True)

# If multiple defenders are listed, use the first defender
defenders = plays['passDefender'].str.split(',', n=2, expand=True)
plays['passDefender'] = defenders[0]

# Extract the intended receiver; (?:...) matches the words, whereas [...] would match any single character in it
plays['intendedReceiver'] = plays['playDescription'].str.extract(r'\b(?:to|intended for)\s([A-Z]\.[a-zA-Z]+)', expand=False)

In [None]:
#Relevant columns 
cols = ['gameId', 'playId', 'frameId', 'x', 'y', 's', 'a', 'event', 'nflId', 'displayName', 'position']

week = week[cols]

#add Defender in coverage and Intended Receiver to tracking data
week = pd.merge(week, plays[['gameId','playId','passResult','passDefender','intendedReceiver']], how='left', on=['gameId','playId'])

In [None]:
#Normalize Name for Joins
names = week['displayName'].str.split(n=2, expand=True)
week['displayName_norm'] = names[0].str[0]+'.'+names[1]

In [None]:
#Keep tracking data of the players extracted from playDescription.
thrown_at = week.loc[(week['intendedReceiver']==week['displayName_norm'])|(week['passDefender']==week['displayName_norm'])]
# replace() returns a new DataFrame, so the result must be assigned
thrown_at = thrown_at.replace(to_replace='None', value=np.nan)

In [None]:
#Label offensive ('O') and defensive ('D') players so the defender sorts before the receiver within each frame.
thrown_at['Position'] = np.where(thrown_at['position'].isin(['RB','WR','TE','HB','FB']),'O','D')
thrown_at = thrown_at.sort_values(['gameId', 'playId', 'frameId', 'Position'])

#Remove nulls
thrown_at = thrown_at.loc[thrown_at['passDefender'].notna()]
thrown_at = thrown_at.loc[thrown_at['intendedReceiver'].notna()]

- Each frame in the player tracking data carries an *event* label marking moments of the play such as the ball snap, the pass release (pass_forward), and the pass arrival (pass_arrived).
- We focus only on pass_forward and pass_arrived, computing the Euclidean distance between the defender and the receiver, and between the defender and the ball, at both events.
- We reshape the player tracking data so that each play's information sits in one row.
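The reshape and distance steps can be sketched on a toy play. This sketch swaps the notebook's `groupby` plus list unpacking for pandas' `pivot_table`, which keys the new columns by event and side rather than by row position; the coordinates are made up, and the flattened column names (e.g. `D_pass_forward_x`) are illustrative rather than the notebook's names.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format tracking rows for one play: the defender ('D') and the
# intended receiver ('O') at the two events of interest. Coordinates are in yards.
long_rows = pd.DataFrame({
    'gameId':   [1, 1, 1, 1],
    'playId':   [10, 10, 10, 10],
    'event':    ['pass_forward', 'pass_forward', 'pass_arrived', 'pass_arrived'],
    'Position': ['D', 'O', 'D', 'O'],
    'x':        [10.0, 16.0, 15.0, 16.0],
    'y':        [20.0, 28.0, 26.0, 26.0],
})

# One row per play; columns become (variable, event, side) triples.
wide = long_rows.pivot_table(index=['gameId', 'playId'],
                             columns=['event', 'Position'], values=['x', 'y'])
# Flatten e.g. ('x', 'pass_forward', 'D') -> 'D_pass_forward_x'.
wide.columns = [f'{side}_{event}_{var}' for var, event, side in wide.columns]
wide = wide.reset_index()

# Euclidean distance between defender and receiver at each event, vectorized over plays.
for event in ['pass_forward', 'pass_arrived']:
    wide[f'{event}_dist'] = np.hypot(wide[f'D_{event}_x'] - wide[f'O_{event}_x'],
                                     wide[f'D_{event}_y'] - wide[f'O_{event}_y'])

print(wide[['gameId', 'playId', 'pass_forward_dist', 'pass_arrived_dist']])
```

Because `pivot_table` aggregates (mean by default), a play with duplicated rows would be averaged rather than shifting values into the wrong columns, so checking group sizes first, as the notebook does with its 4-event filter, is still worthwhile.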

In [None]:
week.head()

In [None]:
#Keep the two events of interest, then collect each play's rows into lists (one row per play)
thrown_at = thrown_at[thrown_at['event'].isin(['pass_forward','pass_arrived'])]

play_group = thrown_at.groupby(['gameId', 'playId'])
X=play_group.agg({'displayName':lambda x:x.tolist(),'event':lambda x:x.tolist(),
                  'x':lambda x:x.tolist(), 'y':lambda x:x.tolist(),
                  's':lambda x:x.tolist(),'a': lambda x:x.tolist()})

#Keep only plays with exactly 4 rows: defender and receiver at pass_forward and at pass_arrived
X = X.reset_index()
X = X.loc[X['event'].str.len() == 4]

In [None]:
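# Each list is ordered [D forward, O forward, D arrived, O arrived] (sorted by frameId, then Position)
X[['D_displayName','O_displayName', 'throwaway', 'throwaway1']] = pd.DataFrame(X.displayName.tolist(), index= X.index)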
X[['D_forward_x','O_forward_x','D_arrived_x','O_arrived_x']] = pd.DataFrame(X.x.tolist(), index= X.index)
X[['D_forward_y','O_forward_y','D_arrived_y','O_arrived_y']] = pd.DataFrame(X.y.tolist(), index= X.index)
X[['D_forward_s','O_forward_s','D_arrived_s','O_arrived_s']] = pd.DataFrame(X.s.tolist(), index= X.index)
X[['D_forward_a','O_forward_a','D_arrived_a','O_arrived_a']] = pd.DataFrame(X.a.tolist(), index= X.index)

In [None]:
X = X.drop(['displayName','event','x','y','s','a','throwaway','throwaway1'], axis=1)

In [None]:
#Add football location
football_forward = week.loc[(week['displayName']=='Football')&(week['event']=='pass_forward')]
football_arrival = week.loc[(week['displayName']=='Football')&(week['event']=='pass_arrived')]

X = pd.merge(X, football_forward[['gameId','playId','x','y']], how='left', on=['gameId','playId']).rename(columns={'x': 'ball_forward_x', 'y': 'ball_forward_y'})
X = pd.merge(X, football_arrival[['gameId','playId','x','y']], how='left', on=['gameId','playId']).rename(columns={'x': 'ball_arrived_x', 'y': 'ball_arrived_y'})

In [None]:
def euclidean_distance(x1, y1, x2, y2):
    # Works element-wise on scalars or whole pandas columns, so no row-wise apply is needed
    return np.sqrt((x1 - x2)**2 + (y1 - y2)**2)

# Defender-to-receiver distance at the throw and at arrival
X['forwardDisatnce_players'] = euclidean_distance(X['D_forward_x'], X['D_forward_y'], X['O_forward_x'], X['O_forward_y'])
X['arrivedDisatnce_players'] = euclidean_distance(X['D_arrived_x'], X['D_arrived_y'], X['O_arrived_x'], X['O_arrived_y'])
# X['distanceDiff_players'] = X['forwardDisatnce_players']-X['arrivedDisatnce_players']

# Defender-to-ball distance at the throw and at arrival
X['forwardDisatnce_D_to_football'] = euclidean_distance(X['D_forward_x'], X['D_forward_y'], X['ball_forward_x'], X['ball_forward_y'])
X['arrivedDisatnce_D_to_football'] = euclidean_distance(X['D_arrived_x'], X['D_arrived_y'], X['ball_arrived_x'], X['ball_arrived_y'])
# X['distanceDiff_ball'] = X['forwardDisatnce_D_to_football']-X['arrivedDisatnce_D_to_football']

In [None]:
X.shape

In [None]:
X

In [None]:
#Add pass result
X = pd.merge(X, plays[['gameId', 'playId', 'passResult']], how='left', on=['gameId', 'playId'])
X.shape

- When the quarterback throws to a receiver, the defender in coverage must quickly decide whether to play the receiver or the ball. Playing the receiver gives the defender a better chance to make the tackle once the receiver catches the ball, while going for the ball is riskier but can pay off with an interception if timed correctly.
- Whether a defender goes for the ball or the receiver varies from play to play. For this analysis, arrived_distance is the smaller of the defender's distance to the receiver and the defender's distance to the ball when the pass arrives; we assume that whichever is smaller tells us whether the defender broke on the receiver or on the ball. break_on_the_ball is then the defender-to-receiver distance at the throw minus arrived_distance, i.e. the ground the defender closed while the ball was in the air.
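As a worked example of the definition above, break_on_the_ball = d(defender, receiver at the throw) − min(d(defender, receiver at arrival), d(defender, ball at arrival)). The sketch below computes it for a single play from (x, y) positions in yards; the function name and the coordinates are illustrative assumptions, not values from the dataset.

```python
import math


def break_on_the_ball(defender_fwd, receiver_fwd, defender_arr, receiver_arr, ball_arr):
    """Ground (yards) a defender closes while the ball is in the air.

    Each argument is an (x, y) tuple in yards: positions at pass_forward (_fwd)
    and at pass_arrived (_arr).
    """
    forward_dist = math.dist(defender_fwd, receiver_fwd)   # defender -> receiver at the throw
    to_receiver = math.dist(defender_arr, receiver_arr)    # defender -> receiver at arrival
    to_ball = math.dist(defender_arr, ball_arr)            # defender -> ball at arrival
    arrived_distance = min(to_receiver, to_ball)           # whichever the defender broke on
    return forward_dist - arrived_distance


# Defender starts 10 yards from the receiver and ends 1 yard from him
# (and about 2.24 yards from the ball), so he closed 9 yards.
print(break_on_the_ball((10, 20), (16, 28), (15, 26), (16, 26), (17, 27)))  # 9.0
```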

In [None]:
# Smaller of the defender-to-receiver and defender-to-ball distances at arrival (ties included)
X['arrived_distance'] = np.minimum(X['arrivedDisatnce_players'], X['arrivedDisatnce_D_to_football'])

X['break_on_the_ball'] = X['forwardDisatnce_players'] - X['arrived_distance']

## 2.0 Analysis
- In this analysis, we examine which defenders cover the most ground before causing an incompletion or a turnover on a pass. We keep only interceptions and incompletions, and discard plays where break_on_the_ball is zero or negative, which we treat as anomalies.


In [None]:
break_ups = X[(X['passResult'].isin(['I', 'IN']))&(X['break_on_the_ball']>0)]

break_ups.sort_values('break_on_the_ball', ascending=False).head(10)

- Some of the records with a high break_on_the_ball value come from Hail Marys, where the quarterback throws a high, punt-like pass that gives all his receivers time to get under the ball in hopes of a miracle catch. To limit the influence of these outliers, we rank defenders by their median break_on_the_ball rather than the mean.
- We require a minimum of 5 plays per defender.
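The effect of the median and the play minimum can be seen on a small synthetic example. The defender names and values below are made up: "Defender B" has one Hail Mary-like 40-yard play that lifts his mean above Defender A's but barely moves his median, and "Defender C" is dropped by the 5-play minimum. The variable names are chosen to avoid clobbering the notebook's own `plays` and `leaders`.

```python
import pandas as pd

# Hypothetical per-play break_on_the_ball values (yards) for three defenders.
sample = pd.DataFrame({
    'D_displayName': ['Defender A'] * 5 + ['Defender B'] * 6 + ['Defender C'] * 2,
    'break_on_the_ball': [4.0, 5.0, 5.5, 6.0, 4.5,         # Defender A
                          2.0, 2.5, 3.0, 2.0, 3.5, 40.0,   # Defender B, one Hail Mary outlier
                          9.0, 10.0],                      # Defender C, too few plays
})

summary = (sample.groupby('D_displayName')['break_on_the_ball']
           .agg(['mean', 'median', 'count'])
           .rename(columns={'count': 'num_plays'}))

MIN_PLAYS = 5
toy_leaders = summary[summary['num_plays'] >= MIN_PLAYS].sort_values('median', ascending=False)
print(toy_leaders)
```

Ranked by mean, Defender B (about 8.83) would sit above Defender A (5.0); ranked by median, Defender A (5.0) leads Defender B (2.75).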

In [None]:
#use median for hail mary outliers
leaders = break_ups.groupby(['D_displayName']).agg({'break_on_the_ball':'median', 'playId':'count'}).reset_index().rename(columns={'playId': 'num_plays'})
leaders = leaders[leaders['num_plays']>=5].sort_values('break_on_the_ball', ascending=False)
leaders = leaders.rename(columns={"D_displayName":"Name"})
leaders.head(15)

In [None]:
# leaders[leaders['Name']=='Darius Slay']

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(rc={'figure.figsize':(11.7,8.27)})
ax = sns.barplot(x="Name", y="break_on_the_ball", data=leaders.head(10), palette="Greens_d")
plt.xticks(rotation=70)
plt.title("Median Yards Closed Before Causing an INT/Incompletion")

In [None]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
ax = sns.barplot(x="Name", y="num_plays", data=leaders.sort_values('num_plays', ascending=False).head(10), palette="Reds_d")
plt.xticks(rotation=70)
plt.title("Most Plays Broken Up")


- Knowing roughly how much ground a defender can cover to reach the ball can help coaches design defensive schemes. Coaches can give players like Kevin Byard (median 9.73 yards to make a play on the ball) a larger area to cover in zone, since he has great range. Having a player like this can shrink other parts of the field for fellow defenders.
- Players like Darius Slay (15 total breakups, median 0.66 yards to make a play on the ball), by contrast, are probably used in man coverage more than zone, given how few yards they cover when causing pass disruptions.


## 3.0 Further Work
- In future work, instead of treating the tackler as the pass defender on completed passes, we plan to use the player tracking data to find the defender nearest to the receiver when the ball arrives. This will yield more insight into completed passes, where metrics such as catch probability can be estimated from the nearest defender's distance. Identifying the nearest defender at arrival will also remove the assumption that the defender who tackles the pass catcher was the receiver's closest defender. That assumption caused anomalies in the completion data (completions were not used in this analysis): when a defender made a tackle far downfield, the measured distance was much greater than the actual coverage distance.
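The nearest-defender idea could look roughly like the sketch below, which pairs every defender with the intended receiver of the same play at pass_arrived and keeps the closest one. This is a minimal sketch on made-up data, not the planned implementation; the `side` and `isIntendedReceiver` columns are assumed helper columns that would have to be derived from the tracking and play data.

```python
import numpy as np
import pandas as pd

# Hypothetical pass_arrived frame for one play (coordinates in yards).
# 'side' ('O'/'D') and 'isIntendedReceiver' are assumed helper columns.
arrived = pd.DataFrame({
    'gameId': [1] * 4,
    'playId': [10] * 4,
    'displayName': ['Receiver R', 'Defender A', 'Defender B', 'Defender C'],
    'side': ['O', 'D', 'D', 'D'],
    'isIntendedReceiver': [True, False, False, False],
    'x': [30.0, 33.0, 31.0, 40.0],
    'y': [20.0, 24.0, 21.0, 20.0],
})

receivers = arrived.loc[arrived['isIntendedReceiver'], ['gameId', 'playId', 'x', 'y']]
defenders = arrived[arrived['side'] == 'D']

# Pair every defender with the intended receiver of the same play.
pairs = defenders.merge(receivers, on=['gameId', 'playId'], suffixes=('', '_rec'))
pairs['dist'] = np.hypot(pairs['x'] - pairs['x_rec'], pairs['y'] - pairs['y_rec'])

# Keep the closest defender per play.
nearest = pairs.loc[pairs.groupby(['gameId', 'playId'])['dist'].idxmin(),
                    ['gameId', 'playId', 'displayName', 'dist']]
print(nearest)
```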
