# The Effects of Fatigue on Defensive Coverage

### Cornerback is among the most critical positions on the defense, yet one of the most difficult to play due to the speed, technique, and quickness required to guard the best receivers in the league. According to the *New York Times* article, [How Offense Took Over the N.F.L.](https://www.nytimes.com/2019/01/19/sports/nfl-offense-records.html), teams passed the football 47% of the time in 1980. However, in the past three years, teams have averaged around a 60% passing rate. With NFL offenses passing on more plays than ever, it is imperative for teams to have lockdown cornerbacks.

![](https://nflmocks.com/wp-content/uploads/imagn-images/2017/07/15225265.jpeg)

## Introduction

The most effective cornerbacks create or maintain tight separation between themselves and the target receiver at the point of pass arrival. The tighter the separation, the harder it is for the receiver to catch the ball.

**The goal of this analysis is to...**
1. Identify cornerbacks who maintain the tightest separation throughout the play
1. Identify cornerbacks who are able to create tight separation throughout the play
1. Explore the impact of fatigue on separation for the cornerbacks in each category

## 1. Data Wrangling

Prior to analysis, the data was cleaned, standardized, and aggregated by play. For each play, the target receiver and closest defender were identified, along with their position, separation, speed, and orientation at the point of pass release and pass arrival.
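The closest-defender step can be sketched in a few lines. This is a minimal illustration rather than the notebook's implementation (which loops over tracking rows); the function name `closest_defender` and the coordinates are made up for the example.

```python
import numpy as np

def closest_defender(receiver_xy, defender_xy):
    """Return (index, distance) of the defender nearest to the receiver.

    receiver_xy: sequence [x, y] in yards.
    defender_xy: array-like of shape (n_defenders, 2) in yards.
    """
    defenders = np.asarray(defender_xy, dtype=float)
    receiver = np.asarray(receiver_xy, dtype=float)
    # Euclidean distance from the receiver to every defender at once
    distances = np.linalg.norm(defenders - receiver, axis=1)
    nearest = int(np.argmin(distances))
    return nearest, float(distances[nearest])

# Made-up coordinates, for illustration only
receiver = [50.0, 20.0]
defenders = [[53.0, 24.0], [44.0, 20.0], [60.0, 10.0]]
idx, separation = closest_defender(receiver, defenders)
print(idx, separation)
```

The returned distance is the separation used throughout the rest of the analysis.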

A few key notes:
* Pass Classification: passes were classified by distance
    * 1: distance < 10 yards
    * 2: 10 yards <= distance <= 20 yards
    * 3: distance > 20 yards
    
    
* Difference in Separation = separationAtRelease - separationAtArrival
    * Positive difference in separation: defender gained ground on the receiver
    * Negative difference in separation: defender lost ground on the receiver
    
    
* For the purpose of this analysis, cornerbacks are players listed as either "CB" or "DB".

* Players with 30 or fewer plays were filtered out of the analysis.

* Only plays where a wide receiver was the target receiver and a cornerback was the closest defender were kept.

* For classification of pass outcomes, pass_outcome_touchdown is counted as pass_outcome_caught and pass_outcome_interception is counted as pass_outcome_incomplete.
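The pass classification and the sign convention for difference in separation listed above can be written as two small helpers. This is a sketch only; `classify_pass` and `separation_difference` are hypothetical names, and the input values are made up.

```python
def classify_pass(distance):
    """Bucket a pass by distance in yards: 1 = under 10, 2 = 10 to 20, 3 = over 20."""
    if distance < 10:
        return 1
    if distance <= 20:
        return 2
    return 3

def separation_difference(separation_at_release, separation_at_arrival):
    """Positive when the defender gained ground while the ball was in the air."""
    return separation_at_release - separation_at_arrival

# Made-up values, for illustration only
print(classify_pass(8.5), classify_pass(20.0), classify_pass(35.2))
print(separation_difference(2.5, 1.5))
```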

In [None]:
# Import Libraries 
from collections import namedtuple
from datetime import datetime
import glob
import math
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pytz
import seaborn as sns
import time
from tqdm import tqdm
import warnings 

pd.set_option('display.max_columns', 1000)  # full option name; the short 'max_columns' is ambiguous in newer pandas
warnings.filterwarnings('ignore')

In [None]:
"""
Function to calculate the distance between two points

Parameters:
    p1: point in the form of [x, y]
    p2: point in the form of [x, y]
    
Returns: euclidean distance between p1 and p2
"""
def calcDistance(p1, p2):
    return (math.sqrt( ((p1[0]-p2[0])**2)+((p1[1]-p2[1])**2) ))

'''
Function to reduce memory usage of dataframe 

Code for function from Revised BDB Data by Joe Andruzzi (jdruzzi)
'''
def reduce_mem_usage(df):

    start_mem = df.memory_usage().sum() / 1024**2
    
    for col in df.columns:
        col_type = df[col].dtype
        
        if col_type != object:
            c_min = df[col].min()
            c_max = df[col].max()
            if str(col_type)[:3] == 'int':
                if c_min > np.iinfo(np.int8).min and c_max < np.iinfo(np.int8).max:
                    df[col] = df[col].astype(np.int8)
                elif c_min > np.iinfo(np.int16).min and c_max < np.iinfo(np.int16).max:
                    df[col] = df[col].astype(np.int16)
                elif c_min > np.iinfo(np.int32).min and c_max < np.iinfo(np.int32).max:
                    df[col] = df[col].astype(np.int32)
                elif c_min > np.iinfo(np.int64).min and c_max < np.iinfo(np.int64).max:
                    df[col] = df[col].astype(np.int64)  
            else:
                if c_min > np.finfo(np.float16).min and c_max < np.finfo(np.float16).max:
                    df[col] = df[col].astype(np.float16)
                elif c_min > np.finfo(np.float32).min and c_max < np.finfo(np.float32).max:
                    df[col] = df[col].astype(np.float32)
                else:
                    df[col] = df[col].astype(np.float64)
        else:
            df[col] = df[col].astype('category')

    end_mem = df.memory_usage().sum() / 1024**2
    print('Decreased by {:.1f}%'.format(100 * (start_mem - end_mem) / start_mem))
    
    return df

'''
Class to import, clean and standardize the data

Code for class edited from Revised BDB Data by Joe Andruzzi (jdruzzi)
'''
class CreateNFLData:

    def __init__(self):
        pass

    """
    Load data from all weeks into one dataframe.
    
    Parameters: None
    Return: dataFrame with all weeks of data
    """
    def LoadData(self):
        globbed_files = glob.glob("../input/nfl-big-data-bowl-2021/week*.csv") #creates a list of all csv files
        data = []
        for csv in tqdm(globbed_files):
            frame = pd.read_csv(csv, index_col=0)
            frame = reduce_mem_usage(frame)
            data.append(frame)

        WeekData = pd.concat(data).reset_index()
        
        return WeekData


    """
    Function to standardize data coordinates, orientation, direction, and possession teams.
    
    Parameter: 
        W: dataframe to be standardized
        
    Returns: dataframe with standardized columns
    """
    def Standardize(self,W):
        W['Dir_rad'] = np.mod(90 - W.dir, 360) * math.pi/180.0
        W['ToLeft'] = W.playDirection == "left"
        W['TeamOnOffense'] = "home"
        W.loc[W.possessionTeam != W.PlayerTeam, 'TeamOnOffense'] = "away"
        W['YardLine_std'] = 100 - W.yardlineNumber
        W.loc[W.yardlineSide.fillna('') == W.possessionTeam,  
                'YardLine_std'
                ] = W.loc[W.yardlineSide.fillna('') == W.possessionTeam,  
                'yardlineNumber']
        W['X_std'] = W.x
        W.loc[W.ToLeft, 'X_std'] = 120 - W.loc[W.ToLeft, 'x'] 
        W['Y_std'] = W.y
        W.loc[W.ToLeft, 'Y_std'] = 160/3 - W.loc[W.ToLeft, 'y'] 
        W['Dir_std'] = W.Dir_rad
        W.loc[W.ToLeft, 'Dir_std'] = np.mod(np.pi + W.loc[W.ToLeft, 'Dir_rad'], 2*np.pi)
        W['X_std'] = round(W['X_std'],2)
        W['Y_std'] = round(W['Y_std'],2)
        W['Orientation_rad'] = np.mod(-W.o + 90, 360) * math.pi/180.0
        W['Orientation_std'] = W.Orientation_rad
        W.loc[W.ToLeft, 'Orientation_std'] = np.mod(np.pi + W.loc[W.ToLeft, 'Orientation_rad'], 2*np.pi)
        return W

    """
    Function to merge the game and play data into the dataframe df
    
    Parameters:
        df: dataframe for play and game data to be merged with
        
    Returns: dataframe merged with play and game data
    """
    def mergeGameData(self,df):
        games = pd.read_csv('../input/nfl-big-data-bowl-2021/games.csv', index_col=0)
        plays = pd.read_csv('../input/nfl-big-data-bowl-2021/plays.csv', index_col=0, usecols= ['gameId', 'playId','yardlineNumber','possessionTeam','yardlineSide'])
        df = pd.merge(df, games, how='left', left_on=['gameId'], right_on=['gameId'])
        df['PlayerTeam'] = np.where(df['team'] == 'home', df['homeTeamAbbr'], df['visitorTeamAbbr'] )
        df = df.drop(['homeTeamAbbr','visitorTeamAbbr'], axis=1)
        df = pd.merge(df, plays, how='left', left_on=['gameId', 'playId'], right_on=['gameId', 'playId'])
        df['OnOffense'] = np.where(df['PlayerTeam'] == df['possessionTeam'], True, False)
        return df

"""
Class to store information for one play
"""
class play:
        
    def __init__(self, gameId, playId):
        self.gameId = gameId
        self.playId = playId
        self.targetReceiver = None #target
        self.receiverPos = None
        self.receiverName = None
        self.receiverArrivedLocation = None
        self.receiverArrivedSpeed = None
        self.receiverArrivedOrientation = None
        self.receiverToBall = None
        self.defender = None
        self.defenderPos = None
        self.defenderName = None
        self.defenderArrivedLocation = None
        self.defenderArrivedSpeed = None
        self.defenderArrivedOrientation = None
        self.separationAtArrival = None
        self.footballReleaseLocation = None
        self.releaseTimestamp = None
        self.footballArriveLocation = None
        self.arriveTimestamp = None
        self.distanceOfPass = None
        self.passClassification = None
        self.timeElapsed = None
        self.events = None
        self.passOutcome = None
        self.receiverReleaseLocation = None
        self.receiverReleaseSpeed = None
        self.receiverReleaseOrientation = None
        self.defenderReleaseLocation = None
        self.defenderReleaseSpeed = None
        self.defenderReleaseOrientation = None
        self.separationRelease = None
        self.separationDiff = None
        self.separationOverTime = None
        self.timeIntoGameRelease = None
        self.timeIntoGameArrive = None

        
    def add_receiver_information(self, position, name, location, speed, orientation, distanceToBall):
        self.receiverPos = position
        self.receiverName = name
        self.receiverArrivedLocation = location
        self.receiverArrivedSpeed = speed
        self.receiverArrivedOrientation = orientation
        self.receiverToBall = distanceToBall
    
    def add_defender_information(self, nflId, position, name, location, separation, speed, orientation):
        self.defender = nflId
        self.defenderPos = position
        self.defenderName = name
        self.defenderArrivedLocation = location
        self.separationAtArrival = separation
        self.defenderArrivedSpeed = speed
        self.defenderArrivedOrientation = orientation

    def add_football_release_information(self, location, timestamp):
        self.footballReleaseLocation = location
        self.releaseTimestamp = timestamp
        
    def add_football_arrive_information(self, location, timestamp):
        self.footballArriveLocation = location
        self.arriveTimestamp = timestamp
        
    def add_pass_information(self, distance, targetReceiver):
        self.distanceOfPass = distance
        if (distance < 10):
            self.passClassification = 1
        elif (10 <= distance <= 20):
            self.passClassification = 2
        else: # > 20
            self.passClassification = 3
        self.targetReceiver = targetReceiver
    
    def add_game_time(self, timeIntoGameRelease, timeIntoGameArrive, timeElapsed):
        self.timeIntoGameRelease = timeIntoGameRelease
        self.timeIntoGameArrive = timeIntoGameArrive
        self.timeElapsed = timeElapsed
        
    def add_event_information(self, events, outcome):
        self.events = events
        self.passOutcome = outcome
        
    def add_release_information(self, receiver, receiverSpeed, receiverOrientation, defender, defenderSpeed, defenderOrientation, distance):
        self.receiverReleaseLocation = receiver
        self.receiverReleaseSpeed = receiverSpeed
        self.receiverReleaseOrientation = receiverOrientation
        self.defenderReleaseLocation = defender
        self.defenderReleaseSpeed = defenderSpeed
        self.defenderReleaseOrientation = defenderOrientation
        self.separationRelease = distance
        
    def add_separation_metrics(self, diffSeparation, separationOverTime):
        self.separationDiff = diffSeparation
        self.separationOverTime = separationOverTime

In [None]:
# Initialize class to use to import and clean data 
Create = CreateNFLData()

# Load data and reduce memory usage
WeekData = Create.LoadData()

# Merge play and game data into play by play data
WeekGameData = Create.mergeGameData(WeekData)

# Standardize play by play data
Finaldf = Create.Standardize(WeekGameData)

# Drop unneeded columns
cleanedData = Finaldf.drop(columns=['x', 'y', 'o', 'jerseyNumber', 'yardlineSide', 'yardlineNumber', 'ToLeft'])

#cleanedData.to_csv("finalDf.csv")

In [None]:
# Get all play information for plays with pass_arrived or pass_forward events
forward_passes = cleanedData.query('event == "pass_forward" or event == "pass_arrived"')

# Define named tuple to use for storing plays
PlayIdentifier = namedtuple("UniquePlay", ["game", "play"])

# Dictionary to map play identifier tuple to play class object
plays = {}

# Try to identify the football location and time at pass_release and pass_arrived for all plays 
for index, row in forward_passes.iterrows():
    gameId = row["gameId"]
    playId = row["playId"]
    
    # Create play identifier tuple
    p = PlayIdentifier(gameId, playId)
        
    currentPlay = plays.get(p)
    
    if currentPlay is None:
        currentPlay = play(gameId, playId)
    
    if (row["team"] == 'football'):  
        # Try to get football release position and time for pass_forward
        if(row["event"] == "pass_forward"):
            currentPlay.add_football_release_information([row["X_std"], row["Y_std"]], row["time"])
        
        # Try to get football arrival position and time for pass_arrived
        elif(row["event"] == "pass_arrived"):
            currentPlay.add_football_arrive_information([row["X_std"], row["Y_std"]], row["time"])
    
    plays[p] = currentPlay
    
# Import csv with target receiver for each play
targetedReciever = pd.read_csv("../input/nfl-big-data-bowl-2021-bonus/targetedReceiver.csv")

# Datetime format for play by play (UTC)
fmt1 = '%Y-%m-%dT%H:%M:%S.%fZ' # 2018-10-12T00:25:17.099Z

# Filter all plays to keep plays with release and arrive ball location and target receiver 
validPlays = {}
for p in plays.keys():
    c = plays.get(p)
    
    # Check if we have a release location and arrive location for pass
    if ((c.footballReleaseLocation is not None) and (c.footballArriveLocation is not None)):
        #Get target receiver for current play
        currentPlay = targetedReciever[(targetedReciever.gameId == c.gameId) & (targetedReciever.playId == c.playId)]
        
        # Confirm we have only one row = one target receiver
        assert len(currentPlay.index) == 1
        
        receiver = currentPlay.iloc[0]['targetNflId']
        
        # Filter out plays where receiver is None/undefined
        if not math.isnan(receiver):
            # Calculate distance between football release location and arrive location (pass distance)
            passDist = calcDistance(c.footballReleaseLocation, c.footballArriveLocation)
            
            # Add objects to current play class object
            c.add_pass_information(passDist, receiver)
            
            # Add play to play dictionary
            validPlays[p] = c 

assert(len(validPlays.keys()) == 14485)
            
# Dictionary for plays containing all needed information
finalPlays = {}
count = 0

fmt2 = '%m/%d/%Y %H:%M:%S'# 12/10/2020 20:00:00

for p in validPlays.keys():
    c = validPlays.get(p)
    
    currentPlayData = cleanedData[(cleanedData.gameId == c.gameId) & (cleanedData.playId == c.playId)]
    
    gameTime = currentPlayData.gameTimeEastern.unique()
    assert (len(gameTime) == 1)
    gameDate = currentPlayData.gameDate.unique()
    assert (len(gameDate) == 1)
    gameTime = gameTime[0]
    gameDate = gameDate[0]
    
    eastUS = pytz.timezone('US/Eastern')
    timestampEastern = eastUS.localize(datetime.strptime(gameDate + " " + gameTime, fmt2))
    timestampUTC = timestampEastern.astimezone(pytz.utc)  
    
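    # The tracking timestamps are already UTC, so attach the UTC timezone directly;
    # astimezone() on a naive datetime would assume the machine's local timezone
    timeRelease = pytz.utc.localize(datetime.strptime(c.releaseTimestamp, fmt1))
    timeArrive = pytz.utc.localize(datetime.strptime(c.arriveTimestamp, fmt1))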
    
    timeIntoGameRelease = (timeRelease - timestampUTC).total_seconds()
    timeIntoGameArrive = (timeArrive - timestampUTC).total_seconds()
    
    timeElapsed = (timeArrive - timeRelease).total_seconds()
    
    c.add_game_time(timeIntoGameRelease, timeIntoGameArrive, timeElapsed)
    
    passArrivedDataReceiver = currentPlayData[(currentPlayData.event == "pass_arrived") & (currentPlayData.OnOffense == True) & (currentPlayData.nflId == c.targetReceiver)]
    if (len(passArrivedDataReceiver) != 1):
        # No data on target receiver at pass_arrived
        count+=1
        continue
    
    passArrivedDataDefense = currentPlayData[(currentPlayData.event == "pass_arrived") & (currentPlayData.OnOffense == False)]
    
    passForwardData = currentPlayData[(currentPlayData.event == "pass_forward")]
    
    events = currentPlayData.event.unique()
    outcome = list(filter(lambda k: 'pass_outcome' in k, events))
    if (len(outcome) == 0):
        # Exclude plays without a pass_outcome
        count+=1
        continue
    
    outcome = outcome[0]
    
    if (outcome == "pass_outcome_touchdown"):
        outcome = "pass_outcome_caught"
    
    c.add_event_information(events, outcome)
    
    position = passArrivedDataReceiver.iloc[0]["position"]
    name = passArrivedDataReceiver.iloc[0]["displayName"]
    receiverArriveLocation = [passArrivedDataReceiver.iloc[0]["X_std"], passArrivedDataReceiver.iloc[0]["Y_std"]]
    distReceiverToBall = calcDistance(c.footballArriveLocation, receiverArriveLocation)
    
    c.add_receiver_information(position, name, receiverArriveLocation, passArrivedDataReceiver.iloc[0]["s"], passArrivedDataReceiver.iloc[0]["Orientation_std"], distReceiverToBall)
    
    minDist = 120
    defender = None
    position = None
    name = None
    speed = None
    orientation = None
    location = [0, 0]
    
    for index, row in passArrivedDataDefense.iterrows():
        if (row["displayName"] != "Football"):
            distanceToTarget = calcDistance(receiverArriveLocation, [row['X_std'], row['Y_std']])
            
            if(distanceToTarget < minDist):
                minDist = distanceToTarget
                defender = row['nflId']
                position = row['position']
                name = row['displayName']
                speed = row["s"]
                orientation = row["Orientation_std"]
                location = [row['X_std'], row['Y_std']]
    
    c.add_defender_information(defender, position, name, location, minDist, speed, orientation )
    
    
    defenderReleaseLocationData = passForwardData[passForwardData.nflId == defender]
    if(len(defenderReleaseLocationData.index) == 0):
        # No data on defender at time of pass_forward
        count+=1
        continue
    
    defenderReleaseLocation = [defenderReleaseLocationData.iloc[0]['X_std'], defenderReleaseLocationData.iloc[0]['Y_std']]

    targetReleaseLocationData = passForwardData[passForwardData.nflId == c.targetReceiver]
    if (len(targetReleaseLocationData.index) == 0):
        # No data on target receiver at time of pass_forward
        count+=1
        continue
    
    targetReleaseLocation = [targetReleaseLocationData.iloc[0]['X_std'], targetReleaseLocationData.iloc[0]['Y_std']]
    
    separation = calcDistance(defenderReleaseLocation, targetReleaseLocation)   
    
    c.add_release_information(targetReleaseLocation, targetReleaseLocationData.iloc[0]['s'], targetReleaseLocationData.iloc[0]['Orientation_std'], defenderReleaseLocation, defenderReleaseLocationData.iloc[0]['s'], defenderReleaseLocationData.iloc[0]['Orientation_std'], separation)
    
    diffSeparation = c.separationRelease - c.separationAtArrival
    if (c.timeElapsed < 0.5):
        # exclude plays where time elapsed is less than half a second
        count+=1
        continue
    
    separationOverTime = diffSeparation / c.timeElapsed
    
    c.add_separation_metrics(diffSeparation, separationOverTime)
      
    finalPlays[p] = c

assert (count == 635)
assert (len(finalPlays.keys()) == 13850)

# Create dataframe with calculated play information
playColumns = ['gameId', 'playId', 'targetReceiver','receiverPos','receiverName','receiverArrivedLocation', 'receiverArrivedSpeed','receiverArrivedOrientation',
               'receiverDistanceToBall','defender','defenderPos','defenderName','defenderArrivedLocation','defenderArrivedSpeed','defenderArrivedOrientation',
               'separationAtArrival','footballReleaseLocation','releaseTimestamp','footballArriveLocation','arriveTimestamp','distanceOfPass',
               'passClassification','timeElapsed','events','passOutcome','receiverReleaseLocation','receiverReleaseSpeed','receiverReleaseOrientation',
               'defenderReleaseLocation','defenderReleaseSpeed','defenderReleaseOrientation','separationRelease','separationDiff', 'separationOverTime',
               'timeIntoGameRelease','timeIntoGameArrive']

# Each column is read from the play attribute of the same name, except receiverDistanceToBall
attributeFor = {'receiverDistanceToBall': 'receiverToBall'}

# Build all rows first, then create the dataframe once;
# DataFrame.append was removed in pandas 2.0 and was slow when called in a loop
rows = [{col: getattr(c, attributeFor.get(col, col)) for col in playColumns}
        for c in finalPlays.values()]
playInformation = pd.DataFrame(rows, columns=playColumns)
    
#playInformation.head()
#playInformation.to_csv("playAnalysis.csv")

playInformation = playInformation.replace("pass_outcome_interception", "pass_outcome_incomplete")

plays = pd.read_csv("../input/nfl-big-data-bowl-2021/plays.csv")

playInformation = playInformation.merge(plays, on=["gameId", "playId"])

## 2. Analysis

### Lowest Average Separation At Pass Release and Arrival 
As the scatter plot below shows, two players fall into the bottom-left corner, with low average separation at both pass release and pass arrival. William Jackson, cornerback for the Bengals, and Byron Jones, cornerback for the Cowboys, averaged the lowest separation at release and arrival, meaning they keep close coverage on the receiver throughout the entire play. Both players have a lower average at pass arrival than at pass release, and Jackson closes over half a yard on average, from a separation of 2.4 yards at release to 1.7 yards at arrival. The top players in this category are great candidates for man coverage, as they maintain tight separation throughout the play.
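The per-defender averages behind this plot can also be computed in a single grouped aggregation. The sketch below uses toy data and a hypothetical helper name, `average_separation`; only the column names mirror the notebook's.

```python
import pandas as pd

def average_separation(plays_df, min_plays):
    """Average separation at release and arrival per defender,
    keeping only defenders with more than min_plays plays."""
    summary = (
        plays_df.groupby("defenderName")
        .agg(
            avgRelease=("separationRelease", "mean"),
            avgArrival=("separationAtArrival", "mean"),
            plays=("separationRelease", "size"),
        )
        .reset_index()
    )
    summary = summary[summary["plays"] > min_plays]
    # Tightest coverage at pass arrival first
    return summary.sort_values("avgArrival").reset_index(drop=True)

# Toy data, for illustration only
toy = pd.DataFrame({
    "defenderName": ["A", "A", "A", "B", "B"],
    "separationRelease": [2.0, 3.0, 4.0, 1.0, 2.0],
    "separationAtArrival": [1.0, 2.0, 3.0, 2.0, 3.0],
})
result = average_separation(toy, min_plays=2)
print(result)
```

With the real data, `min_plays=30` would match the filter used in this notebook.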

In [None]:
DBWRCoverage = playInformation[((playInformation.defenderPos == "CB") | (playInformation.defenderPos == "DB")) & (playInformation.receiverPos == "WR")]

distanceAtRelease = DBWRCoverage[["separationRelease", "defenderName"]]
distanceAtArrival = DBWRCoverage[["separationAtArrival", "defenderName"]]

df = DBWRCoverage.defenderName.value_counts().rename_axis('defenderName').reset_index(name='plays')

avgRelease = distanceAtRelease.groupby('defenderName', as_index=False)['separationRelease'].mean()

avgArrival = distanceAtArrival.groupby('defenderName', as_index=False)['separationAtArrival'].mean()

finalData = avgRelease.merge(avgArrival, on="defenderName").sort_values(by=["separationRelease"])

finalData = finalData.merge(df, on="defenderName").sort_values(by=["separationAtArrival"])

finalData = finalData[(finalData.plays > 30)].reset_index(drop=True)

subset = finalData[(finalData.separationAtArrival < 3.0) & (finalData.separationRelease < 3.0)]

top5Arrival = subset.head(5)

plt.figure(1, figsize=(7,7))
sns.scatterplot(data=subset, x='separationRelease', y='separationAtArrival', hue='defenderName', size='plays', sizes=(80,200))
plt.legend(bbox_to_anchor=(1.1, 1),borderaxespad=0)
plt.xlim(left=1.5, right=3.1)
plt.ylim(bottom=1.5, top=3.1)
plt.title('Separation At Pass Arrival vs. Release', fontsize=18)
plt.xlabel('Separation At Pass Release (yards)', fontsize=15)
plt.ylabel('Separation At Pass Arrival (yards)', fontsize=15)
plt.show()

### Greatest Decrease in Average Separation from Pass Release to Arrival

As the box plot below shows, two players are outliers in average decrease in separation, i.e., these two players gain the most ground on the receiver, on average, from pass release to pass arrival. These players are Darryl Roberts, cornerback for the Jets, and Terrance Mitchell, cornerback for the Browns. Roberts and Mitchell are not the best cornerbacks in the league, with fairly average player grades. However, this stat offers insight into hustle: these players fight to stay with and gain ground on the wide receiver while the ball is in the air. Though Roberts and Mitchell rank 48th and 87th, respectively, in lowest average separation at pass release and arrival, they are strong personnel for zone coverage due to their ability to close quickly on the receiver.
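For reference, a box plot drawn with matplotlib's default settings marks as outliers any points more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule directly; `boxplot_outliers` is a hypothetical helper and the averages are made up.

```python
import numpy as np

def boxplot_outliers(values, whis=1.5):
    """Return the values lying beyond whis * IQR from the first and third quartiles."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return values[(values < lower) | (values > upper)]

# Made-up per-player averages, for illustration only
avg_diff = [0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 1.5]
print(boxplot_outliers(avg_diff))
```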

In [None]:
# greatest decrease in separation

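# Remove the single outlier play (gameId 2018120902, playId 2017);
# two != tests joined with & would also drop every other play in that game and every playId 2017
diffSeparation = DBWRCoverage[~((DBWRCoverage.gameId == 2018120902) & (DBWRCoverage.playId == 2017))]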

diffSeparation = diffSeparation[["separationDiff", "defenderName"]]

diffSeparationAvg = diffSeparation.groupby('defenderName', as_index=False)['separationDiff'].mean()
diffSeparationAvg = diffSeparationAvg.sort_values(by=["separationDiff"], ascending=False)

diffSeparationAvg = diffSeparationAvg.merge(df, on=["defenderName"]).reset_index(drop=True)
diffSeparationAvg = diffSeparationAvg[(diffSeparationAvg.plays > 30)].reset_index(drop=True)

top5Separation = diffSeparationAvg.head(5)

print(top5Separation)

diffSeparationAvg.drop(columns = ["plays"]).plot(kind='box', subplots=True, layout=(1,1), sharex=False, sharey=False, figsize=(3,3), 
                                        title='Distribution of Average Separation Difference')

plt.show()

### Effects of Fatigue on the Top Players

We begin our analysis of the effects of fatigue by looking at the trend of separation and EPA throughout the game for the top 5 players in each category. EPA (Expected Points Added) measures the value of an individual play in points, and is calculated as the difference between the Expected Points (EP) implied by the down, distance, and field position at the end of the play and at the start of the play. Defenders therefore aim to keep EPA as low as possible on each play.

As shown below, for the top 5 players by lowest average separation, average EPA increases by about 0.15 and average separation at arrival increases by 0.5 yards from the beginning to the end of the game. For the top 5 players by greatest average difference in separation, however, average EPA increases only slightly, by about 0.01, while average separation at arrival decreases by about 0.3 yards. This suggests that for players who play tighter coverage, fatigue sets in over the course of the game and loosens their coverage as the game progresses.
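A start-to-end change like the ones quoted above can be read off a fitted trend line as the slope multiplied by the time span covered by the data. The following is a minimal sketch with toy numbers; `change_over_game` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def change_over_game(time_seconds, values):
    """Change in the fitted linear trend between the earliest and latest observation."""
    t = np.asarray(time_seconds, dtype=float)
    v = np.asarray(values, dtype=float)
    slope, intercept = np.polyfit(t, v, 1)  # least-squares line v = slope * t + intercept
    return slope * (t.max() - t.min())

# Toy data: separation grows by exactly 0.5 yards over three hours
time_seconds = [0, 3600, 7200, 10800]
separation = [2.0 + 0.5 * t / 10800 for t in time_seconds]
change = change_over_game(time_seconds, separation)
print(round(change, 3))
```

A straight-line fit is a simplification; it summarizes the direction of the trend but says nothing about how noisy the individual plays are.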

In [None]:
top5Separation = top5Separation[['defenderName']]

top5Arrival = top5Arrival[['defenderName']]

arrival = DBWRCoverage.query("defenderName in @top5Arrival.defenderName")

separation = DBWRCoverage.query("defenderName in @top5Separation.defenderName")

plt.figure(1)
m, b = np.polyfit(arrival['timeIntoGameArrive'], arrival['epa'], 1)
#add linear regression line to scatterplot 
plt.plot(arrival['timeIntoGameArrive'],[(m*x)+b for x in arrival['timeIntoGameArrive']], label='Top 5 Players by Lowest Average Separation')

m, b = np.polyfit(separation['timeIntoGameArrive'], separation['epa'], 1)
#add linear regression line to scatterplot 
plt.plot(separation['timeIntoGameArrive'],[(m*x)+b for x in separation['timeIntoGameArrive']], label='Top 5 Players by Greatest Difference in Separation')
plt.legend(bbox_to_anchor=(1.5, 1),borderaxespad=0)

plt.title('EPA vs. Game Time Elapsed', fontsize=16)
plt.xlabel('Game Time Elapsed (seconds)', fontsize=12)
plt.ylabel('EPA', fontsize=12)

plt.show()

plt.figure(2)
m, b = np.polyfit(arrival['timeIntoGameArrive'], arrival['separationAtArrival'], 1)
#add linear regression line to scatterplot 
plt.plot(arrival['timeIntoGameArrive'],[(m*x)+b for x in arrival['timeIntoGameArrive']], label='lowestSeparation')

m, b = np.polyfit(separation['timeIntoGameArrive'], separation['separationAtArrival'], 1)
#add linear regression line to scatterplot 
plt.plot(separation['timeIntoGameArrive'],[(m*x)+b for x in separation['timeIntoGameArrive']], label='greatestDiffSeparation')
plt.legend(bbox_to_anchor=(1.5, 1),borderaxespad=0)

plt.title('Separation at Arrival vs. Game Time Elapsed', fontsize=16)
plt.xlabel('Game Time Elapsed (seconds)', fontsize=12)
plt.ylabel('Avg Separation At Arrival (yards)', fontsize=12)

plt.show()

Suppose you're a coach for the Raiders in 2018 and Gareon Conley is one of your starting cornerbacks. However, you recently acquired Rashaan Melvin in free agency. Gareon Conley ranks 3rd in lowest average separation at pass release and arrival, while Rashaan Melvin ranks 15th in greatest average difference in separation. As a coach, is there a time in the game when it's most effective to make the substitution?

Let's look at pass coverage for passes over 20 yards. Conley starts the game with tighter average separation at pass arrival and a lower average EPA. As the game progresses, however, his average separation at pass arrival increases and his average EPA rises with it, so there comes a point where Melvin would be more effective. As a coach, it may be more effective either to substitute a fresh Melvin toward the end of the third quarter or to give Conley more rest earlier in the game so that his coverage stays tight throughout.

In [None]:
# Select the coverage plays for each cornerback
Melvin = DBWRCoverage[DBWRCoverage.defenderName == "Rashaan Melvin"]
Conley = DBWRCoverage[DBWRCoverage.defenderName == "Gareon Conley"]

# Trend line of separation at pass arrival over game time for each player
m, b = np.polyfit(Melvin['timeIntoGameArrive'], Melvin['separationAtArrival'], 1)
plt.plot(Melvin['timeIntoGameArrive'], m * Melvin['timeIntoGameArrive'] + b, label="Rashaan Melvin")

m, b = np.polyfit(Conley['timeIntoGameArrive'], Conley['separationAtArrival'], 1)
plt.plot(Conley['timeIntoGameArrive'], m * Conley['timeIntoGameArrive'] + b, label="Gareon Conley")

plt.legend(bbox_to_anchor=(1.5, 1), borderaxespad=0)
plt.ylim(bottom=0.0, top=3.0)
plt.title('Separation at Arrival vs. Game Time Elapsed', fontsize=15)
plt.xlabel('Game Time Elapsed (seconds)', fontsize=12)
plt.ylabel('Avg Separation At Arrival (yards)', fontsize=12)

plt.show()

# Trend line of EPA over game time for each player
m, b = np.polyfit(Melvin['timeIntoGameArrive'], Melvin['epa'], 1)
plt.plot(Melvin['timeIntoGameArrive'], m * Melvin['timeIntoGameArrive'] + b, label="Rashaan Melvin")

m, b = np.polyfit(Conley['timeIntoGameArrive'], Conley['epa'], 1)
plt.plot(Conley['timeIntoGameArrive'], m * Conley['timeIntoGameArrive'] + b, label="Gareon Conley")

plt.title('EPA vs. Game Time Elapsed', fontsize=15)
plt.xlabel('Game Time Elapsed (seconds)', fontsize=12)
plt.ylabel('EPA', fontsize=12)
plt.legend(bbox_to_anchor=(1.5, 1),borderaxespad=0)
plt.show()
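
The point at which one player's trend line overtakes the other's can also be computed directly rather than read off the plot. The sketch below is a minimal illustration, not part of the original analysis: `crossover_time` is a hypothetical helper, the numbers are synthetic rather than taken from the tracking data, and it assumes game time is measured in seconds with 900-second quarters. Two fitted lines $y = m_a x + b_a$ and $y = m_b x + b_b$ intersect at $x = (b_b - b_a) / (m_a - m_b)$.

```python
import numpy as np

def crossover_time(x_a, y_a, x_b, y_b):
    """Return the x at which the two least-squares lines intersect, or None if they are parallel."""
    m_a, b_a = np.polyfit(x_a, y_a, 1)
    m_b, b_b = np.polyfit(x_b, y_b, 1)
    if np.isclose(m_a, m_b):
        return None
    # Solve m_a * x + b_a == m_b * x + b_b for x
    return (b_b - b_a) / (m_a - m_b)

# Synthetic, illustrative values only (not real player data)
t = np.array([0.0, 900.0, 1800.0, 2700.0, 3600.0])  # seconds into the game
starter_sep = 1.0 + 0.0005 * t  # starts tight, loosens quickly
backup_sep = 2.0 + 0.0001 * t   # starts looser, loosens slowly

t_cross = crossover_time(t, starter_sep, t, backup_sep)
quarter = int(t_cross // 900) + 1  # assumes 15-minute (900-second) quarters
print(f"Trend lines cross about {t_cross:.0f} seconds in (quarter {quarter})")
```

With these made-up lines the crossover falls at roughly 2,500 seconds, late in the third quarter. On real data the result is only meaningful if it lies inside the observed range of game times and the underlying fits are reasonably tight; with few plays per player, the crossover can shift substantially.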

## 3. Future Directions

**1. Effect of Player Load on Separation**

With biometric data increasingly available, many teams have begun tracking player load, a measurement of a player's work commonly calculated from acceleration data. Adding player load data to play-by-play data enables coaches and trainers to evaluate the effect of load on performance. For cornerbacks, since we see separation trend upward as the game progresses, it is likely there is a point where player load affects the separation a cornerback is able to create or maintain. With this analysis, the coaching staff could make an informed decision on when to make a substitution or how to better manage a player's load to maintain tight average separation throughout the game.

**2. Coverage Breakdown: Man vs. Zone**

Taking the analysis of separation and fatigue a step further, a cornerback's separation at arrival over time could be evaluated in man coverage versus zone coverage. With this analysis, a coach could better choose a coverage based on the defensive scheme, the opponent, or the healthy personnel available. Players with strong closing speed, who can tighten separation after the ball is released, may be better suited for zone coverage, while players who are able to maintain a tight degree of separation from pass release to arrival may be better suited for man coverage at the beginning of the game. But does that trend hold throughout the whole game?
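
One way this comparison could be set up is to fit the same separation-versus-time trend separately for each coverage type and compare the slopes. The sketch below is illustrative only: the `coverageType` column and the `slope_by_group` helper are hypothetical (the dataset used above is not shown to carry a coverage label), and the values are synthetic.

```python
import numpy as np
import pandas as pd

def slope_by_group(df, group_col, x_col="timeIntoGameArrive", y_col="separationAtArrival"):
    """Fit y = m*x + b within each group and return the slope m per group."""
    slopes = {}
    for name, grp in df.groupby(group_col):
        m, _ = np.polyfit(grp[x_col], grp[y_col], 1)
        slopes[name] = m
    return pd.Series(slopes, name="slope")

# Synthetic plays for one cornerback; "coverageType" is an assumed label
t = np.array([0.0, 1200.0, 2400.0, 3600.0])  # seconds into the game
plays = pd.DataFrame({
    "timeIntoGameArrive": np.tile(t, 2),
    "coverageType": ["man"] * 4 + ["zone"] * 4,
    "separationAtArrival": np.concatenate([1.0 + 0.0004 * t, 1.5 + 0.0001 * t]),
})

# Yards of separation gained per second of game time, by coverage type
slopes = slope_by_group(plays, "coverageType")
print(slopes)
```

A steeper positive slope in one coverage type would suggest that the player's separation loosens faster over the game in that coverage, which is the kind of signal a coach could weigh when choosing a scheme late in the game.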

**3. Situational Breakdown**

Lastly, identifying trends in separation for players in specific situations may help identify effective personnel for those game day situations. For example, in end-of-game situations where the defense is winning a one-possession game, you may identify one cornerback who plays tighter coverage than average to try to eliminate the pass to his receiver. On the other hand, what if a different cornerback plays looser coverage than normal so that he can help out his teammates if needed? Which player do you go with? It may depend on the opponent, the coaching staff, or the other personnel on the field. With this information, however, coaches can make an informed choice about which player to use in each situation.