# **Player Coverage Analysis 🏈**

## Overview
- ***Introduction:*** How well does a defensive player cover his opponent?
- ***Preparation:*** Measure the distances (what this script does)
- ***Conclusion:*** Calculate the average (skip to here to see the plots and results of this analysis)
- ***Problems & Potential***
- What do you think this data can be useful for?

Have fun and please leave feedback!

### ***Introduction:*** How well does a defensive player cover his opponent?
![Visual example of good cover](https://drive.google.com/uc?export=view&id=1pYEFdvegoDZrfGmNP7b4idRwB_YB5N22)

To gain insight into this metric, we can calculate the distance between two players. Because calculation times are long even for a single position, we pick one pairing: cornerback and wide receiver, in every full play (from the snap until the play is completely over). This demands some preparation and a lot of calculation time on the Kaggle cloud 📊.

#### Note: This could be done with any other pair of positions by simply changing the position codes in the script to the corresponding positions.
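As a rough illustration of that note, the two position filters could be wrapped in a small helper. This is only a sketch on toy data: `split_positions` and its parameter names are hypothetical, and the codes `SS` and `TE` are assumed to appear in the `position` column the same way `CB` and `WR` do.

```python
import pandas as pd

def split_positions(week, defender="CB", opponent="WR"):
    """Return (defender_rows, opponent_rows) for two position codes (hypothetical helper)."""
    defenders = week[week["position"] == defender]
    opponents = week[week["position"] == opponent]
    return defenders, opponents

# Toy stand-in for a week file; only the columns needed here.
toy_week = pd.DataFrame({
    "position": ["CB", "WR", "SS", "TE", "WR"],
    "nflId": [1, 2, 3, 4, 5],
})

# Same logic, different pair of positions (codes assumed to exist in the data).
safeties, tight_ends = split_positions(toy_week, defender="SS", opponent="TE")
print(safeties["nflId"].tolist(), tight_ends["nflId"].tolist())
```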

### ***Preparation:*** Measure the distances (what this script does)
It takes the position of every cornerback, picks out the position of a given wide receiver, and measures the distance between them for every second of every play.
From this data, a list of average distances ("coverage") is created.

Of course, this is only one variable that could be used to measure the efficiency of a cornerback. There are many more metrics than distance to the wide receiver to take into account, for example the play situation the player is facing, where the ball is, an interception, etc.
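The measuring step described above can be sketched on toy data. The sketch assumes that straight-line (Euclidean) distance in yards is the intended measure and reuses the `_1` (cornerback) and `_2` (wide receiver) column suffixes of the merged frame; the coordinates are invented.

```python
import numpy as np
import pandas as pd

# Toy data: one cornerback (_1) and one wide receiver (_2), three time steps of one play.
merged = pd.DataFrame({
    "x_1": [10.0, 12.0, 15.0],
    "y_1": [20.0, 20.0, 21.0],
    "x_2": [13.0, 16.0, 15.0],
    "y_2": [24.0, 23.0, 26.0],
})

# Straight-line distance between the two players at each time step.
merged["distance"] = np.hypot(merged["x_1"] - merged["x_2"],
                              merged["y_1"] - merged["y_2"])

# One "coverage" value for the whole play: the mean over all time steps.
avg_distance = merged["distance"].mean()
print(merged["distance"].tolist(), avg_distance)
```

A smaller average means the cornerback stayed closer to the receiver over the play.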

![How it works](https://drive.google.com/uc?export=view&id=1wwhEZBTP-nUO8uIzCkDtOMkCgOre2DpZ)




In [None]:
import numpy as np
import pandas as pd
import os

#takes the week data and extracts two positions to get the distance, this is the core of this analysis
def getPerformanceForWeek(weekNr):
    week = pd.read_csv('/kaggle/input/nfl-big-data-bowl-2021/week' + str(weekNr) + '.csv')
    week = week.fillna(0) # fillna returns a new frame, so the result has to be kept
    if not os.path.exists('./week_' + str(weekNr)):
        os.makedirs('./week_' + str(weekNr))

    week["new_secoud"] = week.time.apply(lambda x: x.split("T")[1].split(".")[0])
    
    #filter relevant positions
    week_cbs = week[(week.position == "CB")] #(week.gameId == 2018090600) if you want to make it only for one game of week 1
    week_wrs = week[(week.position == "WR")] #(week.gameId == 2018090600) if you want to make it only for one game of week 1

    #group it for: play per player per week
    week_wrs_plays = [x for _, x in week_wrs.groupby(['nflId', 'playId'])]
    week_cbs_plays = [x for _, x in week_cbs.groupby(['nflId','playId'])]

    i = 1 # for logging
    for _refPlay in week_cbs_plays:
        _refPlay = _refPlay.drop_duplicates(subset='new_secoud', keep="last") # reduces the play to one row per second
        _refPlayId = _refPlay['playId'].iloc[0]
        shortestStartDistance = 9999 # holds the shortest start distance to the other position
        selectedPlayCombination = None
        print("(Week " + str(weekNr) + ") Working on " + str(i) + "/" + str(len(week_cbs_plays)))
        i += 1 # for logging

        for play in week_wrs_plays:
            selectedPlay = play[(play.playId == _refPlayId)]
            selectedPlay = selectedPlay.drop_duplicates(subset='new_secoud', keep="last") # reduces the play to one row per second
            if not selectedPlay.empty:
                # Merge the two players to calculate the distance on the per-second frame
                _refPlayMerged = _refPlay.merge(selectedPlay, on=['playId', 'new_secoud'], how='inner', suffixes=('_1', '_2'))
                if _refPlayMerged.shape[0] < 3: # skip if the play has fewer than 3 data points
                    continue

                # Euclidean (straight-line) distance between the two players, from the merged data
                _refPlayMerged['distance'] = np.sqrt((_refPlayMerged['x_1'] - _refPlayMerged['x_2']) ** 2 + (_refPlayMerged['y_1'] - _refPlayMerged['y_2']) ** 2)
                _refPlayMerged['avg_distance'] = _refPlayMerged['distance'].sum() / _refPlayMerged.shape[0]
                startDistance = _refPlayMerged['distance'].iloc[0] # the distance at the first second of the play
                if shortestStartDistance > startDistance: # keeps the nearest opposing player (there can be more than one CB and WR)
                    shortestStartDistance = startDistance
                    selectedPlayCombination = _refPlayMerged
                    _refPlayMerged.to_csv('./week_' + str(weekNr) + '/merged_week_' + str(weekNr) + '_play_'+ str(i) +'.csv', index=False) # save merged play to CSV
                
            pass

# now analyse the data and save it to a CSV per week (a week takes ~1.5 h to analyse)
getPerformanceForWeek(1)
getPerformanceForWeek(2)
""" Only the first 2 because Kaggle only allows 9 h of notebook run time
getPerformanceForWeek(3)
getPerformanceForWeek(4)
getPerformanceForWeek(5)
getPerformanceForWeek(6)
getPerformanceForWeek(7)
getPerformanceForWeek(8)
getPerformanceForWeek(9)
getPerformanceForWeek(10)
getPerformanceForWeek(11)
getPerformanceForWeek(12)
getPerformanceForWeek(13)
getPerformanceForWeek(14)
getPerformanceForWeek(15)
getPerformanceForWeek(16)
getPerformanceForWeek(17)"""



# Run this command to create a ZIP containing the created files which can be downloaded at once
# !zip -r file.zip /kaggle/working

### ✨ ***Analysis:*** Data Visualisation
Now to the interesting stuff: the graphs and plots we all want to see.
The following code takes a given week and playIndex and creates a plot of the distance between the CB and WR for that play. This could be done for every play.
Here I chose a few plays to show what the distance metric looks like.

As an example, these are the play matchups, in order: Week 1 Play 1; Week 1 Play 2; Week 2 Play 5; Week 2 Play 6:

In [None]:
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt

# Loads the data of the play
def displayPlayDistancePlot(weekId, playIndex):
    fileName = sorted(os.listdir("/kaggle/working/week_" + str(weekId)))[playIndex] # sorted, because os.listdir returns files in arbitrary order
    play = pd.read_csv('/kaggle/working/week_' + str(weekId) + '/' + fileName)
    plotMatchup(play['playId'].iloc[0], play)

# Creates a plot for display
def plotMatchup(_refPlayId, _refPlayMerged):
    fig = plt.figure(figsize=(8, 6))
    ax1 = fig.add_subplot(111)
    ax1.text(0.02, 65, "Play " + str(_refPlayId) + " - " + _refPlayMerged['displayName_1'].iloc[0] + " vs " + _refPlayMerged['displayName_2'].iloc[0])
    ax1.text(0.02, 60, "Avg. distance (rounded to two digits): " + str(round(_refPlayMerged['avg_distance'].iloc[0], 2)) + " yards")
    ax1.plot(_refPlayMerged['new_secoud'], _refPlayMerged['distance'])
    ax1.set_xlabel("Play time in seconds")
    ax1.set_ylabel("Distance in yards")
    ax1.set_ylim(0,70)
    fig.tight_layout()
    plt.show()

displayPlayDistancePlot(1, 1)
displayPlayDistancePlot(1, 2)
displayPlayDistancePlot(2, 5)
displayPlayDistancePlot(2, 6)
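The plots label their x-axis "Play time in seconds", while the `new_secoud` column holds wall-clock strings. A minimal sketch of turning those strings into seconds elapsed since the first sample is shown here, using invented values; the `elapsed_s` column name is hypothetical, and the sketch assumes a play does not cross midnight.

```python
import pandas as pd

# Toy stand-in for one merged play: wall-clock "HH:MM:SS" strings plus distances.
play = pd.DataFrame({
    "new_secoud": ["01:07:14", "01:07:15", "01:07:16", "01:07:18"],
    "distance": [6.2, 5.1, 3.8, 2.4],
})

# Parse the strings as durations and subtract the first one to get elapsed time.
clock = pd.to_timedelta(play["new_secoud"])
play["elapsed_s"] = (clock - clock.iloc[0]).dt.total_seconds()
print(play["elapsed_s"].tolist())
```

Plotting `elapsed_s` instead of `new_secoud` would give a numeric axis that starts at zero for every play.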

### 🤓 ***Conclusion:*** Calculate the average
Now we can get to the nerdy part: calculating the average distance of a player over every play of a game (or week).
The following script takes a given week and calculates the average for every player. This could be done for every week.
The prepared data for each play can be analysed in many ways; this is one of the simplest.
To conclude this analysis, we print the smallest-distance "high score" tables of the first two weeks, one table per week:

In [None]:
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt

# Creates a list of average distances for a match (week)
def generateList(weekId):
    fileNames = os.listdir("/kaggle/working/week_" + str(weekId))
    playerPerformances = {} # format: {NflId: PlayData}
    for fileName in fileNames:
        play = pd.read_csv('/kaggle/working/week_' + str(weekId) + '/' + fileName)
        
        # add results to a player
        playerId = int(play['nflId_1'].iloc[0])
        if playerId not in playerPerformances:
            playerPerformances[playerId] = []
        playerPerformances[playerId].append(play)
    printPlayerStats(getPerformanceStats(playerPerformances))

# Converts the data to a different format to calculate overall averages
def getPerformanceStats(playerPerformance):
    newPlayerStats = {}
    for key, value in playerPerformance.items():
        if(not key in newPlayerStats):
            newPlayerStats[key] = { "name": "Nan", "avgDistances": [] }

        for performanceData in value:
            newPlayerStats[key]["name"] = performanceData['displayName_1'].iloc[0];
            newPlayerStats[key]["avgDistances"].append(performanceData['avg_distance'].iloc[0]);
            pass
    return newPlayerStats

def printPlayerStats(stats):
    dataFrameData = {'id': [], 'name': [], 'avgDistance': [], 'max': [], 'min': [], 'entrys': []};
    for key, value in stats.items():
        avgDistance = round(sum(stats[key]["avgDistances"]) / len(stats[key]["avgDistances"]), 2)
        dataFrameData['id'].append(key);
        dataFrameData['name'].append(stats[key]["name"]);
        dataFrameData['avgDistance'].append(avgDistance);
        dataFrameData['max'].append(max(stats[key]["avgDistances"]));
        dataFrameData['min'].append(min(stats[key]["avgDistances"]));
        dataFrameData['entrys'].append(len(stats[key]["avgDistances"]));
        # print(stats[key]["name"] + ": " + str(avgDistance) + " min: " + str(min(stats[key]["avgDistances"])) + " max: " + str(max(stats[key]["avgDistances"])) + " entrys: " + str(len(stats[key]["avgDistances"])))
    df = pd.DataFrame (dataFrameData, columns = ['id', 'name', 'avgDistance', 'max', 'min', 'entrys'])
    sortedDf = df[(df.entrys > 3)].sort_values(by=['avgDistance'])
    print(sortedDf)
    _tmpDisplayDf = sortedDf[['name', 'avgDistance', 'entrys']];
    # hide axes
    fig, ax = plt.subplots()
    fig.patch.set_visible(False)
    ax.axis('off')
    ax.axis('tight')

    table = ax.table(cellText=_tmpDisplayDf.values, colLabels=['Name', 'Average', 'Entries'], loc='center')
    table.scale(2, 2)
    fig.tight_layout()


generateList(1)
generateList(2)
"""generateList(3)
generateList(4)
generateList(5)
generateList(6)
generateList(7)
generateList(8)
generateList(9)
generateList(10)
generateList(11)
generateList(12)
generateList(13)
generateList(14)
generateList(15)
generateList(16)
generateList(17)"""
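The per-player averaging above can also be expressed as a single pandas `groupby`. The following is a sketch on toy data, not a replacement for the cell above: it assumes one row per saved play with the columns `nflId_1`, `displayName_1`, and `avg_distance`, and the player names and values are invented.

```python
import pandas as pd

# Toy stand-in: one row per saved play, as if the first row of each CSV had been collected.
plays = pd.DataFrame({
    "nflId_1": [11, 11, 11, 22, 22],
    "displayName_1": ["Player A", "Player A", "Player A", "Player B", "Player B"],
    "avg_distance": [4.0, 6.0, 5.0, 3.0, 9.0],
})

# Mean, max, min, and number of plays per cornerback, smallest average first.
summary = (
    plays.groupby(["nflId_1", "displayName_1"])["avg_distance"]
         .agg(avgDistance="mean", max="max", min="min", entrys="count")
         .reset_index()
         .sort_values("avgDistance")
)
print(summary)
```

The `entrys > 3` filter used in the notebook could then be applied to `summary` in the same way as before.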

### 💥 Problems
- The prep script always picks the opposing player with the shortest initial distance in a play as the opponent for the distance measurement. This could lead to a false identification and wrong data.
- Execution takes ~1.5 h for one week on the default Kaggle CPU. Execution on a GPU may be faster, but I have no idea how to do that.

### 💡 Potential
- The code performance could be improved, or the code could be run on a faster CPU, to get results sooner; currently the analysis of one week takes ~1.5 h.
- As this script is not hardcoded for the CB and WR positions only, any defensive position and the corresponding offensive position could be plugged in to measure the average distance between them.
- Currently only the average distance is extracted; potentially the whole merged dataframe could be analysed in many different ways.
- The events could be added to the plot display to show what happened.
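On the performance point, one possible direction is to replace the nested Python loops with a single merge and two `groupby` steps. This is a sketch on toy data and has not been timed on the real files. It assumes the tracking data has `gameId` and `frameId` columns to join on, where the notebook joins on `playId` and the per-second column; the coordinates and ids are invented.

```python
import numpy as np
import pandas as pd

# Toy tracking rows: one CB (nflId 1) and two WRs (nflId 2 and 3), one play, three frames.
tracking = pd.DataFrame({
    "gameId":   [1] * 9,
    "playId":   [75] * 9,
    "frameId":  [1, 2, 3] * 3,
    "nflId":    [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "position": ["CB"] * 3 + ["WR"] * 6,
    "x": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 10.0, 12.0, 14.0],
    "y": [20.0, 20.0, 20.0, 24.0, 24.0, 24.0, 40.0, 41.0, 42.0],
})

keys = ["gameId", "playId", "frameId"]
cbs = tracking[tracking["position"] == "CB"]
wrs = tracking[tracking["position"] == "WR"]

# One merge pairs every CB with every WR in the same frame of the same play.
pairs = cbs.merge(wrs, on=keys, suffixes=("_1", "_2"))
pairs["distance"] = np.hypot(pairs["x_1"] - pairs["x_2"],
                             pairs["y_1"] - pairs["y_2"])

# Per CB/WR pair: distance at the first frame and mean distance over the play.
per_pair = (
    pairs.sort_values("frameId")
         .groupby(["gameId", "playId", "nflId_1", "nflId_2"])["distance"]
         .agg(start_distance="first", avg_distance="mean")
         .reset_index()
)

# For each CB in each play, keep the WR that was nearest at the first frame.
nearest = per_pair.loc[
    per_pair.groupby(["gameId", "playId", "nflId_1"])["start_distance"].idxmin()
]
print(nearest)
```

Including `gameId` in the join keys also keeps plays from different games apart when they happen to share a `playId`.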

### What do you think this data can be useful for? How would you use the script? Write a comment.