<div id = "tableOfContents"></div>

# Table of Contents

[1 Introduction](#1introduction)

[2 Inferring and Animating Matchups](#2hmm)

[3 Categorizing Passes](#3nmf)

[4 Player Evaluation](#4eval)

   [4.1 Pass Frequency](#41freq)

   [4.2 Pass Efficiency](#41eff)
    
[5 Conclusions](#5conclusion)

<div id="1introduction"></div>

# 1. Introduction

In 2014, Alexander Franks, Andrew Miller, Luke Bornn, and Kirk Goldsberry published [Characterizing the Spatial Structure of Defensive Skill in Professional Basketball][1], which used positional data from the 2013-14 NBA season to build defensive metrics for players in terms of their impact on both shot frequency and shot efficiency. Here, we adapt the methodology of Franks et al. to positional data from the 2018 NFL season to evaluate the impact of defensive players on pass frequency and pass efficiency. The analysis proceeds as follows. First, we identify plays to be discarded from consideration. Second, we infer who is guarding whom at each time point during a play. Third, we categorize the passes to partition the field. Finally, we infer the impact of each defender in each region of the field.

Some plays are clearly not of interest. One such category is plays with no quarterback on the field. These could be fake punts, plays from the wildcat formation, or something similar, and intuitively they are not insightful. Passes thrown behind the line of scrimmage are very frequently screen passes, where the presence of blockers obscures the impact of defenders. Plays that fall into either category are discarded. For plays that have more than 5 offensive linemen, at most 5 offensive players have positional information. Because the additional offensive lineman is occasionally the target of the pass (e.g., Dion Dawkins in Week 10), and because allowing a varying number of offensive players would add complexity, plays without exactly 6 offensive players were discarded. Plays with missing tracking data were also discarded. Of interest, 40 of the 64 plays that fall into this category involve Baltimore safety Tony Jefferson.
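
The discard rules above can be summarized as a single predicate. The following is a minimal sketch, not the notebook's actual filtering code: the `PlaySummary` class and its field names are hypothetical stand-ins for quantities that would have to be derived from the tracking data.

```python
from dataclasses import dataclass

@dataclass
class PlaySummary:
    """Hypothetical per-play summary; the field names are illustrative only."""
    has_quarterback: bool        # a quarterback appears in the tracking data
    num_offensive_players: int   # offensive players with positional data
    pass_behind_los: bool        # pass thrown behind the line of scrimmage
    has_missing_frames: bool     # tracking data is missing during the play

def keep_play(p: PlaySummary) -> bool:
    """Return True only if the play survives every discard rule."""
    return (p.has_quarterback
            and p.num_offensive_players == 6
            and not p.pass_behind_los
            and not p.has_missing_frames)

plays = [
    PlaySummary(True, 6, False, False),   # kept
    PlaySummary(False, 6, False, False),  # no quarterback (e.g. wildcat)
    PlaySummary(True, 5, False, False),   # extra lineman, only 5 tracked
    PlaySummary(True, 6, True, False),    # likely a screen pass
    PlaySummary(True, 6, False, True),    # missing tracking data
]
kept = [keep_play(p) for p in plays]
```

Only the first of the five illustrative plays passes all four checks.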

[1]: https://arxiv.org/pdf/1405.0231v2.pdf

<div id="2hmm"></div>

# 2. Inferring and Animating Matchups

To begin, we must infer the defender-offender matchups. We do this with a Hidden Markov Model (HMM) that models how matchups evolve over the course of a play. There are two perspectives from which to do this. First, we may take the perspective of the defence and identify how much attention each defender is paying to each offender. To build the model, we define the ideal position for a defender as an affine combination of 4 points: the location of the offender, the location of the ball, the point on the back of the endzone nearest to the offender, and the point on the first-down line nearest to the offender. The EM algorithm is used to estimate the weights of this affine combination, as well as the remaining HMM parameters (the position and velocity error standard deviations and the matchup persistence `rho`). Alternatively, we may take the perspective of the offence and infer which defender each offender thinks is paying attention to him. In this case, we reverse the roles of the offence and defence.
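
As a concrete illustration of the affine combination, the sketch below computes the ideal defender position for one offender at one time point. The function name `ideal_defender_position` and the example coordinates are assumptions made for illustration; the weights are the initial (pre-EM) values used in Cell 2 for the defence's perspective, not fitted estimates.

```python
import numpy as np

def ideal_defender_position(offender, ball, endzone_x, first_down_x, weights):
    """Affine combination of the four anchor points for one offender.

    offender, ball: (x, y) pairs. endzone_x and first_down_x are the
    x-coordinates of the back of the endzone and the first-down line; the
    nearest point on each line shares the offender's y-coordinate.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # affine: weights sum to 1, but need not be non-negative
    anchors = np.array([
        offender,
        ball,
        (endzone_x, offender[1]),
        (first_down_x, offender[1]),
    ], dtype=float)
    return w @ anchors

# Illustrative coordinates only (yards, field running from x=0 to x=120)
pos = ideal_defender_position(
    offender=(50.0, 20.0),
    ball=(40.0, 27.0),
    endzone_x=120.0,
    first_down_x=55.0,
    weights=[0.72, 0.03, 0.07, 0.18],
)
```

With these inputs the ideal position sits slightly downfield of the offender, pulled toward the endzone and the first-down line.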

The following animation (taken from Josh Allen's 75-yard TD pass to Robert Foster in Week 12) gives a clear visualization of the HMM. The width of each black line connecting an offensive and a defensive player shows the amount of attention paid. The arrow attached to each player dot gives the velocity of the player, and the solid black dot gives the ideal position, at each point in time, for a defender guarding Robert Foster. In this animation, we see the attention that each defender is paying to each offender. Just after the snap, we see a large amount of fluctuation in who is guarding whom. In particular, Barry Church (#42 Jacksonville) goes from paying attention to Josh Allen (#17 Buffalo) to Zay Jones (#11) to Robert Foster (#16). Similarly, at the time of the snap, Robert Foster is initially covered by A.J. Bouye (#21 Jacksonville). As Foster progresses through his route and Jacksonville drops back into coverage, Foster gains the attention of Barry Church and Myles Jack (#44), and this attention is maintained until just before the pass is thrown, when Jack shifts his attention to Zay Jones.
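
The attention values that set the line widths are smoothed posterior probabilities from the HMM. The sketch below shows one way to compute them for a single defender with the scaled forward-backward recursions and a "sticky" transition matrix. It is a simplified illustration rather than the notebook's implementation: the function names are hypothetical, the emission likelihoods are made-up numbers, and a uniform initial state is assumed.

```python
import numpy as np

def sticky_transition(n_states, rho):
    """Keep the same matchup with probability rho; otherwise switch uniformly."""
    off = (1.0 - rho) / (n_states - 1)
    return np.full((n_states, n_states), off) + (rho - off) * np.eye(n_states)

def attention(emissions, rho):
    """Posterior P(defender guards offender i at frame t | whole play).

    emissions: array of shape (T, n_states) holding, for one defender, the
    likelihood of the observed frame under each candidate offender.
    """
    T, n = emissions.shape
    A = sticky_transition(n, rho)
    alpha = np.zeros((T, n))
    beta = np.ones((T, n))
    scale = np.zeros(T)

    # Forward pass, rescaling each frame so that alpha[t] sums to 1
    alpha[0] = emissions[0] / n
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * emissions[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, reusing the forward scaling constants
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (emissions[t + 1] * beta[t + 1])) / scale[t + 1]

    return alpha * beta

# Made-up likelihoods: 3 frames, 2 candidate offenders
att = attention(np.array([[0.9, 0.1], [0.5, 0.5], [0.01, 0.99]]), rho=0.95)
```

Each row of `att` sums to 1, so a line width proportional to an entry can be read directly as the share of that defender's attention on that offender.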

In [None]:
'''
Cell 1: Import libraries to be used throughout.
'''

import random, pandas as pd, numpy as np, math, json, os, time, scipy.sparse as mySparse
from scipy.stats import norm as normalDist
from scipy.sparse import csr_matrix 
from sklearn.decomposition import NMF
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale
from sklearn.cluster import KMeans

%matplotlib inline
import matplotlib as mpl, matplotlib.pyplot as plt
from matplotlib import animation, rc
plt.rcParams["animation.html"] = "jshtml"
plt.rcParams['animation.embed_limit'] = 2**23
plt.rcParams['figure.figsize'] = [3.0, 2.0]

from IPython.display import IFrame, HTML

In [None]:
'''
Cell 2: Declare values and functions that will be used throughout
'''

#A constant for weighting the impact of velocity on modeling
MYSTERYVELMULTIPLIER = 1/10

#Set this as 1 to get 30 random plays and run the EM algorithm to estimate the parameters
BUILDTHEPARAMS = 0

#This will flip the model from looking at which offender the defender is guarding 
#to which defender the offender thinks is guarding him
REVERSE = 0 

#These parameters come into play later to skip computation of pre-calculated values
METADATALEVEL = 2
PARAMLEVEL = 3

EM=False
drawSd=True
drawRho=True
maxEM=100
useVel=True
games = []
gamePlays = {}
DIRMODIFIER = '../input/mydata/'
NFLDATADIR = '../input/nfl-big-data-bowl-2021/'
#There are some plays with multiple "ball_snap" events; map 'gameId,playId' to the frame to use
BALLSNAPMAP = {'2018091605,2715': 11,
               '2018120905,1426': 11
              }
#Likewise, map 'gameId,playId' to the forward-pass frame to use for these plays
PASSFORWARDMAP = {'2018110410,3640': 26,
                  '2018111106,2650': 36
                 }

colourDict = {'ARI': ['#97233F', '#000000'],
              'ATL': ['#A71930', '#000000'],
              'BAL': ['#241773', '#000000'],
              'BUF': ['#00338D', '#C60C30'],
              'CAR': ['#0085CA', '#101820'],
              'CHI': ['#0B162A', '#C83803'],
              'CIN': ['#FB4F14', '#000000'],
              'CLE': ['#311D00', '#000000'],
              'DAL': ['#003594', '#000000'],
              'DEN': ['#FB4F14', '#000000'],
              'DET': ['#0076B6', '#000000'],
              'GB': ['#203731', '#000000'],
              'HOU': ['#03202F', '#000000'],
              'IND': ['#002C5F', '#000000'],
              'JAX': ['#006778', '#000000'],
              'KC': ['#E31837', '#000000'],
              'LA': ['#003594', '#000000'],
              'LAC': ['#0080C6', '#000000'],
              'MIA': ['#008E97', '#000000'],
              'MIN': ['#4F2683', '#000000'],
              'NE': ['#002244', '#000000'],
              'NO': ['#D3BC8D', '#000000'],
              'NYG': ['#0B2265', '#000000'],
              'NYJ': ['#125740', '#000000'],
              'OAK': ['#000000', '#000000'],
              'PHI': ['#004C54', '#000000'],
              'PIT': ['#FFB612', '#000000'],
              'SEA': ['#002244', '#000000'],
              'SF': ['#AA0000', '#000000'],
              'TB': ['#D50A0A', '#000000'],
              'TEN': ['#0C2340', '#000000'],
              'WAS': ['#773141', '#000000']}

with open(NFLDATADIR + 'games.csv', 'r') as f:
    w = f.read().strip().split('\n')
headers = w[0].split(',')

#print(headers, '\n')
for line in w[1:]:
    line = line.split(',')
    games.append({headers[j]: line[j] for j in range(len(headers))})


with open(NFLDATADIR + 'plays.csv', 'r') as f:
    w = f.read().strip().split('\n')
headers = w[0].split(',')

#print(headers, '\n')
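for line in w[1:]:
    #Commas inside quoted fields are replaced by ';' so that a plain split(',') is safe
    flag = 0 #1 while inside a quoted field
    line = list(line)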
    for j in range(len(line)):
        if line[j] == '"':
            flag = 1-flag
        elif line[j] == ',' and flag:
            line[j] = ';'
    line = ''.join(line).split(',')
    playsDict = gamePlays.get(line[0], [])
    playsDict.append({headers[j]: line[j] for j in range(len(headers))})
    gamePlays[line[0]] = playsDict
    
def constructDataForPos(df, play, possession, defence, reverse, printer=1):
    dfPos = df[(df['playId'] == int(play['playId']))]
    dfPos = dfPos.drop_duplicates()
    frameIds = dfPos.frameId.unique()
    ballSnapped = passThrown = passOutcome = -1
    ballSnapped = BALLSNAPMAP.get(play['gameId'] + ',' + play['playId'], -1)
    passThrown = PASSFORWARDMAP.get(play['gameId'] + ',' + play['playId'], -1)
    direction = 0
    for index, row in dfPos.iterrows():
        if row['playDirection'] == 'right':
            assert direction != -1
            direction = 1
        elif row['playDirection'] == 'left':
            assert direction != 1
            direction = -1
        if row['event']:
            if BALLSNAPMAP.get(play['gameId'] + ',' + play['playId'], -1) == -1 and row['event'] == 'ball_snap':
                assert not (ballSnapped >= 0 and int(row['frameId']) != ballSnapped) #Ball snapped at 2 time points?
                ballSnapped = int(row['frameId'])
            elif PASSFORWARDMAP.get(play['gameId'] + ',' + play['playId'], -1) == -1 and row['event'] in ['pass_forward', 'pass_shovel']:
                assert not (passThrown >= 0 and int(row['frameId']) != passThrown) #Multiple forward passes?
                passThrown = int(row['frameId'])
            elif row['event'].startswith('pass_outcome'):
                if passOutcome >= 0 and int(row['frameId']) != passOutcome: #Multiple pass outcomes?
                    pass
                passOutcome = int(row['frameId'])
    if passOutcome == -1:
        passOutcome = len(dfPos['frameId'].unique()) - 1 #Lop off the last frame, because occasionally a player is missing that frame
    if printer:
        print('Ball snapped: %d\tForward Pass: %d\tPass Outcome: %d' % (ballSnapped, passThrown, passOutcome))
    ballSnapped = 0 #Overrides the detected snap frame so the filter below keeps pre-snap frames too
    dfPos = dfPos[(dfPos['frameId'] >= ballSnapped) & (dfPos['frameId'] <= passOutcome)]
    
    ballLocs = dfPos[dfPos['team'] == 'football'][['time', 'x', 'y']]
    if reverse: 
        oPlayers = [*map(int, sorted(dfPos[dfPos['team'] == defence]['nflId'].unique()))]
        dPlayers = [*map(int, sorted(dfPos[dfPos['team'] == possession]['nflId'].unique()))]
    else:
        oPlayers = [*map(int, sorted(dfPos[dfPos['team'] == possession]['nflId'].unique()))]
        dPlayers = [*map(int, sorted(dfPos[dfPos['team'] == defence]['nflId'].unique()))]
    DEFENDERS = len(dPlayers)
    OFFENDERS = len(oPlayers)# - 1 #Take away 1 for the passer
    ballX = ballLocs.iloc[0,]['x']
    qbX = dfPos[dfPos['team'] == possession].iloc[0,]['x'] #Not necessarily QB, but no offensive player should be on the wrong side of the ball
    if play['absoluteYardlineNumber']:
        absolute = int(play['absoluteYardlineNumber'])
    elif play['yardlineNumber'] == '50': #Midfield => 10+50
        absolute = 60
    elif play['yardlineNumber'] and play['yardlineSide'] and play['possessionTeam']:
        if (play['yardlineSide'] == play['possessionTeam'] and qbX > ballX) or (play['yardlineSide'] != play['possessionTeam'] and qbX < ballX):
            absolute = 120 - 10 - int(play['yardlineNumber'])
        elif (play['yardlineSide'] == play['possessionTeam'] and qbX < ballX) or (play['yardlineSide'] != play['possessionTeam'] and qbX > ballX):
            absolute = int(play['yardlineNumber']) + 10
        
        
    #If reversing to look at who the WR "thinks" is guarding him, focus on my endzone?
    goalLocX = 60 + (-1)**reverse*direction*60 
    goalLoc = [goalLocX, 27] #This is the middle of the back of the endzone -- my equivalent to hoop location
    
    playerDict = {}
    defData = {}
    
    for dPlayer in dPlayers:
        dMat = dfPos[dfPos['nflId'] == dPlayer]
        jersey = [*map(int, sorted(dMat['jerseyNumber'].unique()))][0]
        name = [*sorted(dMat['displayName'].unique())][0]
        playerDict[dPlayer] = '%s #%s' % (name.replace('.', ''), jersey)
        velX = []
        velY = []
        for index, row in dMat.iterrows():
            velX.append(row['s']*math.sin(math.radians(row['dir'])))
            velY.append(row['s']*math.cos(math.radians(row['dir'])))
        dMat = dMat.assign(velX=velX)
        dMat = dMat.assign(velY=velY)
        defData[dPlayer] = dMat[['gameId', 'playId', 'event', 'nflId', 'time', 'x', 'y', 'velX', 'velY']]
        
    offData = {}
    
    for oPlayer in oPlayers:
        oMat = dfPos[dfPos['nflId'] == oPlayer]
        jersey = [*map(int, sorted(oMat['jerseyNumber'].unique()))][0]
        name = [*sorted(oMat['displayName'].unique())][0]
        playerDict[oPlayer] = '%s #%s' % (name.replace('.', ''), jersey)
        velX = []
        velY = []
        goalX = []
        goalY = []
        firstDownX = []
        firstDownY = []
        for index, row in oMat.iterrows():
            velX.append(row['s']*math.sin(math.radians(row['dir'])))
            velY.append(row['s']*math.cos(math.radians(row['dir'])))
            goalX.append(goalLocX)
            goalY.append(row['y'])
            firstDownX.append(absolute+direction*int(play['yardsToGo']))
            firstDownY.append(row['y'])
        oMat = oMat.assign(velX=velX)
        oMat = oMat.assign(velY=velY)
        oMat = oMat.assign(goalX=goalX)
        oMat = oMat.assign(goalY=goalY)
        oMat = oMat.assign(firstDownX=firstDownX)
        oMat = oMat.assign(firstDownY=firstDownY)
        offData[oPlayer] = oMat[['gameId', 'playId', 'event', 'nflId', 'time', 'x', 'y', 'velX', 'velY', 'goalX', 'goalY', 'firstDownX', 'firstDownY']]
        
    uniqueGameClock = dfPos['frameId'].unique()
    observedX = [0 for _ in range(len(uniqueGameClock)*OFFENDERS*DEFENDERS)]
    observedY = [0 for _ in range(len(uniqueGameClock)*OFFENDERS*DEFENDERS)]
    designX = [[0, 0, 0, 0] for _ in range(len(uniqueGameClock)*OFFENDERS*DEFENDERS)]
    designY = [[0, 0, 0, 0] for _ in range(len(uniqueGameClock)*OFFENDERS*DEFENDERS)]
    velocities = [{__: 0 for __ in ['vxDef', 'vyDef', 'vxOff', 'vyOff']} for _ in range(len(uniqueGameClock)*OFFENDERS*DEFENDERS)]
    info = [[-1, -1] for _ in range(len(uniqueGameClock)*OFFENDERS*DEFENDERS)]

    idx = 0
    
    for oPlayer in oPlayers:
        for dPlayer in dPlayers:
            endIdx = idx+len(uniqueGameClock)
            indices = [*range(idx, endIdx)]
            for j in range(len(indices)):
                info[indices[j]] = [oPlayer, dPlayer]
                velocities[indices[j]]['vxDef'] = defData[dPlayer]['velX'].iloc[j]
                velocities[indices[j]]['vyDef'] = defData[dPlayer]['velY'].iloc[j]
                velocities[indices[j]]['vxOff'] = offData[oPlayer]['velX'].iloc[j]
                velocities[indices[j]]['vyOff'] = offData[oPlayer]['velY'].iloc[j]
                observedX[indices[j]] = defData[dPlayer]['x'].iloc[j]
                observedY[indices[j]] = defData[dPlayer]['y'].iloc[j]
                #Design[0] is the player position
                designX[indices[j]][0] = offData[oPlayer]['x'].iloc[j]
                designY[indices[j]][0] = offData[oPlayer]['y'].iloc[j]
                #Design[1] is the ball location
                designX[indices[j]][1] = ballLocs['x'].iloc[j]
                designY[indices[j]][1] = ballLocs['y'].iloc[j]
                #Design[2] is the location of the perpendicular to the back of the endzone
                designX[indices[j]][2] = offData[oPlayer]['goalX'].iloc[j]
                designY[indices[j]][2] = offData[oPlayer]['goalY'].iloc[j]
                #Design[3] is the location of the perpendicular to the first down line
                designX[indices[j]][3] = offData[oPlayer]['firstDownX'].iloc[j]
                designY[indices[j]][3] = offData[oPlayer]['firstDownY'].iloc[j]
            idx = endIdx
    return {'designX': designX, 'designY': designY, 'observedX': observedX, 'observedY': observedY,
            'gc': uniqueGameClock, 'oPlayers': oPlayers, 'dPlayers': dPlayers,
            'info': info, 'velocities': velocities, 'playerDict': playerDict}

def runEMalgorithm(mysteryVelMultiplier, reverse, EM, useVel, drawSd, drawRho, maxEM, pdatList, globalDesign, globalObserved, globalVelocitiesObserved, globalVelocitiesDesign, printer=1):    
    if reverse:
        initWeights = [1.02, 0.05, 0.07, -0.14]
        errSdPos=1.3
        errSdVel=0.78
        rho=0.96
    else:
        initWeights = [0.72, 0.03, 0.07, 0.18]
        errSdPos=1.6
        errSdVel=0.9
        rho=0.95
    numWeights = len(initWeights)
    attArrList = {}
    alphaMat = []
    betaMat = []
    initWeights = [j/sum(initWeights) for j in initWeights]
    pars = initWeights[:] + [errSdPos, errSdVel, rho]
    if printer:
        print('Parameters:', pars)
    design = np.array(globalDesign)
    observed = np.array(globalObserved)
    count = 1
    parsPrev = [-1000000000000 for j in pars] #Sentinel far from pars so that the convergence test passes on the first iteration
    ############ RUN EM ######################## 
    #If there's no EM, then you just need 1 iteration to get the matrices, and such
    while (EM or (not EM and count == 1)) and sum([(parsPrev[j] - pars[j])**2 for j in range(len(pars))]) > 10**(-8) and count < maxEM:

        ############ E - step ######################
        idx = 0
        expected = [0 for _ in range(len(globalObserved))]
        weights = pars[:numWeights]
        errSdPos, errSdVel, rho = pars[numWeights:]
        for p in pdatList: #Go through the plays
            ## EM - Baum Welch
            ## i is offense
            ## j is defense
            pdat = pdatList[p]
            playerDict = pdat['playerDict']
            uniqueGameClock = pdat['gc']
            designX = pdat['designX']
            designY = pdat['designY']
            observedX = pdat['observedX']
            observedY = pdat['observedY']
            nr = len(designX)
            velocities = pdat['velocities']

            dPlayers = pdat['dPlayers']
            oPlayers = pdat['oPlayers']

            DEFENDERS = len(dPlayers)
            OFFENDERS = len(oPlayers)
            oPlayers2 = [j for j in oPlayers]

            #Sticky transition matrix over offenders: rho on the diagonal, (1-rho)/(OFFENDERS-1) elsewhere
            transition = ((OFFENDERS*rho - 1)/(OFFENDERS-1))*np.identity(OFFENDERS) + np.full((OFFENDERS, OFFENDERS), (1-rho)/(OFFENDERS-1))

            #scalingMatrix[j][t] is the forward normalizing constant for defender j at time t
            scalingMatrix = [[0 for _ in range(len(uniqueGameClock))] for _ in range(DEFENDERS)]

            alphaMat = [[[0 for _ in range(DEFENDERS)] for _ in range(OFFENDERS)] for _ in range(len(uniqueGameClock))]
            betaMat = [[[0 for _ in range(DEFENDERS)] for _ in range(OFFENDERS)] for _ in range(len(uniqueGameClock))]
            xiMat = [[[[0 for _ in range(DEFENDERS)] for _ in range(OFFENDERS)] for _ in range(OFFENDERS)] for _ in range(len(uniqueGameClock)-1)]
            emitMat = [[[0 for _ in range(DEFENDERS)] for _ in range(OFFENDERS)] for _ in range(len(uniqueGameClock))]

            #Forward Step
            #Time step 0
            for i in range(OFFENDERS):
                for j in range(DEFENDERS):
                    firstIdx = (j+DEFENDERS*i)*len(uniqueGameClock)

                    #Take the position of the offensive player, ball, and goal, 
                    #and construct what is currently the "ideal" defender's position
                    defMeanX = np.matmul(designX[firstIdx], weights) 
                    defMeanY = np.matmul(designY[firstIdx], weights)

                    meanVx0 = velocities[firstIdx]['vxOff'] #Offender's velocity components
                    meanVy0 = velocities[firstIdx]['vyOff']

                    #The projected position is the ideal defender position shifted by a fraction of the offender's velocity
                    projectedX = defMeanX+mysteryVelMultiplier*meanVx0
                    projectedY = defMeanY+mysteryVelMultiplier*meanVy0

                    #Expected defender velocity: points from the observed position toward the projected position,
                    #with magnitude 0.02 times the distance between them (no division, so a zero distance is safe)
                    meanVx = 0.02*(projectedX-observedX[firstIdx])
                    meanVy = 0.02*(projectedY-observedY[firstIdx])

                    alphaMat[0][i][j] = emitMat[0][i][j] = normalDist.pdf(observedX[firstIdx], loc=defMeanX, scale=errSdPos)*normalDist.pdf(observedY[firstIdx], loc=defMeanY, scale=errSdPos)

                    if useVel:
                        alphaMat[0][i][j] = emitMat[0][i][j] = emitMat[0][i][j]*normalDist.pdf(velocities[firstIdx]['vxDef'], loc=meanVx, scale=errSdVel)*normalDist.pdf(velocities[firstIdx]['vyDef'], loc=meanVy, scale=errSdVel)
            for j in range(DEFENDERS):
                scalingMatrix[j][0] = sum([alphaMat[0][i][j] for i in range(OFFENDERS)])
            for i in range(OFFENDERS):
                for j in range(DEFENDERS):
                    alphaMat[0][i][j] = alphaMat[0][i][j]*(1/scalingMatrix[j][0])
                    #This scales alphaMat[0] so that the columns sum to 1
                    betaMat[len(uniqueGameClock) - 1][i][j] = 1
            for t in range(1, len(uniqueGameClock)):
                for j in range(DEFENDERS):
                    indices = [(j+DEFENDERS*i)*len(uniqueGameClock) + t for i in range(OFFENDERS)]
                    for i in range(len(indices)):
                        ind = indices[i]
                        dCurX = observedX[ind]
                        dCurY = observedY[ind]

                        defMeanX = np.matmul(designX[ind], weights) 
                        defMeanY = np.matmul(designY[ind], weights)

                        meanVx0 = velocities[ind]["vxOff"]
                        meanVy0 = velocities[ind]["vyOff"]

                        projectedX = defMeanX+mysteryVelMultiplier*meanVx0
                        projectedY = defMeanY+mysteryVelMultiplier*meanVy0

                        #Expected defender velocity: points from the observed position toward the projected position,
                        #with magnitude 0.02 times the distance between them (no division, so a zero distance is safe)
                        meanVx = 0.02*(projectedX-observedX[ind])
                        meanVy = 0.02*(projectedY-observedY[ind])
                        emitMat[t][i][j] = normalDist.pdf(dCurX, loc=defMeanX, scale=errSdPos)*normalDist.pdf(dCurY, loc=defMeanY, scale=errSdPos)
                        if useVel:
                            emitMat[t][i][j] = emitMat[t][i][j]*normalDist.pdf(velocities[ind]['vxDef'], loc=meanVx, scale=errSdVel)*normalDist.pdf(velocities[ind]['vyDef'], loc=meanVy, scale=errSdVel)
                    v = np.array(np.matmul(np.matrix([alphaMat[t-1][i][j] for i in range(OFFENDERS)]), np.matrix(transition))[0])[0]
                    matrix = [v[i]*emitMat[t][i][j] for i in range(OFFENDERS)]
                    for i in range(OFFENDERS):
                        alphaMat[t][i][j] = matrix[i]
                    scaleCur = sum(matrix)
                    ## check if the scaling constant is below machine precision
                    ## if so, put all mass on the most likely offender
                    if scaleCur < 10**(-321): #Underflow -- not good
                        print('Underflow on iteration %d for defender %d -- %d: %s' % (t, j, dPlayers[j], playerDict[dPlayers[j]]))
                        print(matrix)
                        print(v)
                        print([emitMat[t][i][j] for i in range(OFFENDERS)])
                        for line in alphaMat[t]:
                            print(' '.join(['%10.8f' for _ in line]) % tuple(line))
                        scaleCur = 1
                        alphaCur = [0 for _ in range(OFFENDERS)]
                        alphaCur[np.argmax(matrix)] = 1
                        for i in range(OFFENDERS):
                            alphaMat[t][i][j] = alphaCur[i]
                    scalingMatrix[j][t] = scaleCur

                for i in range(OFFENDERS):
                    for j in range(DEFENDERS):
                        alphaMat[t][i][j] = alphaMat[t][i][j]*(1/scalingMatrix[j][t])
            ## Backward step
            for t in range(len(uniqueGameClock)-2, -1, -1):
                for j in range(DEFENDERS):
                    v = [emitMat[t+1][i][j]*betaMat[t+1][i][j] for i in range(OFFENDERS)] #Entry-wise product of the emission and the next-step beta values
                    matrix = np.array(np.matmul(np.matrix(v), np.matrix(transition)))[0]
                    for i in range(OFFENDERS):
                        betaMat[t][i][j] = matrix[i]
                for i in range(OFFENDERS):
                    for j in range(DEFENDERS):
                        betaMat[t][i][j] = betaMat[t][i][j]*(1/scalingMatrix[j][t+1])
            attArr = [[[alphaMat[t][i][j]*betaMat[t][i][j] for j in range(DEFENDERS)] for i in range(OFFENDERS)] for t in range(len(uniqueGameClock))]

            if EM and drawRho:
                for j in range(DEFENDERS):
                    for t in range(len(uniqueGameClock)-1):
                        aVec = [alphaMat[t][i][j] for i in range(OFFENDERS)]
                        bVec = [betaMat[t+1][i][j]*emitMat[t+1][i][j] for i in range(OFFENDERS)]
                        cVec = np.matmul(np.matrix(aVec).transpose(), np.matrix(bVec))
                        dVec = np.multiply(cVec, np.matrix(transition))
                        dVec = np.array(dVec/dVec.sum())
                        for row in range(OFFENDERS):
                            for col in range(OFFENDERS):
                                xiMat[t][row][col][j] = dVec[row][col]

            ePosLen = len(uniqueGameClock)*OFFENDERS*DEFENDERS*2
            ePos = [0 for j in range(ePosLen)]
            for i in range(OFFENDERS):
                for j in range(DEFENDERS):
                    posIds = [k for k in range(len(pdat['info'])) if pdat['info'][k] == [oPlayers2[i], dPlayers[j]]]
                    for p0 in range(len(posIds)):
                        ePos[posIds[p0]] = ePos[posIds[p0]+len(uniqueGameClock)*OFFENDERS*DEFENDERS] = attArr[p0][i][j]
            for j in range(ePosLen):
                expected[idx+j] = ePos[j]
            idx += ePosLen
            attArrList[p] = attArr

        #FINISHED THE E STEP!!

        #Now begin the M step
        if EM:
            parsPrev = weights[:] + [errSdPos, errSdVel, rho]
            expected0 = [j + np.finfo(float).eps for j in expected]
            # Weighted least squares
            desEx = [np.multiply(design[j], expected0[j]) for j in range(len(expected))]
            xexInv = np.linalg.inv(np.matmul(design.transpose(), desEx))
            desExY = np.matmul(np.array(desEx).transpose(), observed)
            gamHat = np.matmul(xexInv, desExY)
            gamCorrection = xexInv.sum(axis=1)/xexInv.sum()*(1-sum(gamHat))
            weights = list(gamHat+gamCorrection)

            if drawRho:
                summedMat = [[sum([xiMat[t][x][y][j] for t in range(len(uniqueGameClock)-1) for j in range(DEFENDERS)]) for y in range(OFFENDERS)] for x in range(OFFENDERS)]
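                #rhoPrior (a pair of values added to the numerator and denominator) is not a parameter of this
                #function, so it must be defined globally before EM is run with drawRho.
                #It seems like a fudge factor to speed up convergence?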
                numer = np.diag(summedMat).sum() + rhoPrior[0]
                denom = np.array(summedMat).sum() - np.diag(summedMat).sum() + rhoPrior[1]

                M = 1/4*numer/denom

                rho = M/(M+1)
            transition = ((OFFENDERS*rho - 1)/(OFFENDERS-1))*np.identity(OFFENDERS) + np.full((OFFENDERS, OFFENDERS), (1-rho)/(OFFENDERS-1))
            if drawSd:
                step10 = np.matmul(design, weights)
                step11 = np.subtract(observed, step10)
                step12 = np.power(step11, 2)
                step13 = np.multiply(expected0, step12)
                errSdPos = math.sqrt(sum(step13)/len(globalObserved))
                if useVel:
                    step20 = np.subtract(globalVelocitiesObserved, globalVelocitiesDesign)
                    step21 = np.power(step20, 2)
                    step22 = np.multiply(expected0, step21)
                    errSdVel = math.sqrt(sum(step22)/len(globalVelocitiesObserved))
            pars = weights[:] + [errSdPos, errSdVel, rho]
        if printer:
            print('Pars in count %d:\n\t%s' % (count, [round(1000000*q)/1000000 for q in pars]))
        count += 1
    return attArrList, alphaMat, betaMat, pars

def rotator(center, points, angle, subtract=0):
    c = math.cos(math.radians(-angle))
    s = math.sin(math.radians(-angle))
    vertices = []
    for j in range(len(points)):
        x = points[j][0] - subtract*center[0]
        y = points[j][1] - subtract*center[1]
        vertices.append([c*x - s*y + center[0], s*x + c*y + center[1]])
    return vertices

def myUpdate(i, playerIds, ballCircle, widgets, attention, playerData, footballData, ballHandlers, direction, ballSnapped, passThrown, passOutcome, oPlayers, dPlayers, alphaMat, betaMat, table, firstDown, circleSize):
    table.get_celld()[(0, len(dPlayers)//2)].get_text().set_text("Frame #%d -- Event: %1s" % (i+1, footballData['events'][i]))
    width, widthMod = 5, 1/20
    for j, circle in enumerate(widgets['circles']):
        myId = playerIds[j]
        x, y = playerData[myId]['x'][i], playerData[myId]['y'][i]
        circle.center = x, y
        widgets['annotations'][j].set_position(circle.center)
        widgets['triangles'][j].set_xy(rotator([playerData[myId]['x'][i], playerData[myId]['y'][i]], [[circleSize, 0], [-circleSize, 0], [0, 2*circleSize]], playerData[myId]['orientation'][i]))
        length = playerData[myId]['speed'][i]
        points = [[x+width/10, y], [x+width*widthMod, y+3*length/4], [x, y+length], [x-width*widthMod, y+3*length/4], [x-width*widthMod, y]]
        widgets['velArrows'][j].set_xy(rotator([x, y], points, playerData[myId]['angle'][i], subtract=1))
        if widgets['ideal'][0] and playerData[myId]['name'] == widgets['ideal'][0]:
            if len(widgets['weights']) == 3:
                widgets['ideal'][1].center = x*widgets['weights'][0] + footballData['x'][i]*widgets['weights'][1] + (60+60*direction)*widgets['weights'][2], y*widgets['weights'][0] + footballData['y'][i]*widgets['weights'][1] + 27*widgets['weights'][2]
            else:
                widgets['ideal'][1].center = x*widgets['weights'][0] + footballData['x'][i]*widgets['weights'][1] + (60+60*direction)*widgets['weights'][2] + firstDown*widgets['weights'][3], y*widgets['weights'][0] + footballData['y'][i]*widgets['weights'][1] + y*widgets['weights'][2] + y*widgets['weights'][3]
    ballCircle.center = footballData['x'][i], footballData['y'][i]
    for j in range(len(ballHandlers)):
        widgets['polygons'][j].set_xy([[footballData['x'][i], footballData['y'][i]], [playerData[ballHandlers[j]]['x'][i] - 0*math.sin(math.radians(playerData[ballHandlers[j]]['angle'][i])), playerData[ballHandlers[j]]['y'][i] - 0*math.cos(math.radians(playerData[ballHandlers[j]]['angle'][i]))], [60 + direction*60, 27]])
        if i+1 == ballSnapped: #i starts from 0. FrameIds start from 1
            widgets['polygons'][j].set_alpha(0.5)
        elif i+1 == passThrown:
            widgets['polygons'][j].set_alpha(0)
    if ballSnapped <= i+1 <= passOutcome:
        for o in range(len(oPlayers)):
            for d in range(len(dPlayers)):
                line = widgets['matchups'][len(dPlayers)*o + d]
                line.set_xdata([playerData[oPlayers[o]]['x'][i], playerData[dPlayers[d]]['x'][i]])
                line.set_ydata([playerData[oPlayers[o]]['y'][i], playerData[dPlayers[d]]['y'][i]])
                line.set_linewidth(5*attention[i-ballSnapped][o][d])
                table.get_celld()[(o+2,d+1)].get_text().set_text("%.5f/%.5f" % (alphaMat[i-ballSnapped][o][d], betaMat[i-ballSnapped][o][d]))
    elif i+1 > passOutcome:
        for line in widgets['matchups']:
            line.set_linewidth(0)
        for i0 in range(len(oPlayers)):
            for j in range(len(dPlayers)):
                table.get_celld()[(i0+2,j+1)].get_text().set_text("?")
    return (ballCircle, widgets,)

def makeAnimation(play, reverse, ideal, focus):
    FOOTBALLCOLOUR = '#CD853F'
    CIRCLESIZE = 1
    playerIds = []
    playerData = {}
    myGame = [game for game in games if game['gameId'] == str(play[0])][0]
    allPlays = gamePlays[myGame['gameId']]
    chosenPlay = [p1 for p1 in allPlays if p1['playId'] == str(play[1])][0]
    gameId = myGame['gameId']
    playId = chosenPlay['playId']
    possession = 'home' if chosenPlay['possessionTeam'] == myGame['homeTeamAbbr'] else 'away'
    defence = 'away' if chosenPlay['possessionTeam'] == myGame['homeTeamAbbr'] else 'home'

    print('GameId: %1s\nPlayId: %1s' % (myGame['gameId'], chosenPlay['playId']))
    #pd.set_option('display.max_rows', 1400)
    #pd.set_option('display.max_columns', 500)
    df0 = pd.read_csv(NFLDATADIR + 'week%s.csv' % myGame['week'], header=0, skip_blank_lines=True)
    playDf = df0[(df0['gameId'] == int(myGame['gameId'])) & (df0['playId'] == int(chosenPlay['playId']))]
    playDf = playDf.drop_duplicates()
    
    theData = constructDataForPos(playDf, chosenPlay, possession, defence, reverse, printer=False)
    pdatList = {myGame['gameId']+'_'+chosenPlay['playId']: theData}
    nr = len(theData['designX'])
    localDesign = theData['designX']+theData['designY']
    localObserved = theData['observedX']+theData['observedY']
    localVelocityObserved = [j['vxDef'] for j in theData['velocities']] + [j['vyDef'] for j in theData['velocities']] #Defence velocity X then Defence velocity Y
    localVelocityDesign = [j['vxOff'] for j in theData['velocities']] + [j['vyOff'] for j in theData['velocities']] #Offence velocity X then Offence velocity Y
    globalDesign = localDesign[:]
    globalObserved = localObserved[:]
    globalVelocitiesDesign = localVelocityDesign[:]
    globalVelocitiesObserved = localVelocityObserved[:]
    #Get the attention values based on who each defender is guarding
    attArrList, alphaMat, betaMat, pars = runEMalgorithm(MYSTERYVELMULTIPLIER, reverse, False, True, True, True, 100, pdatList, globalDesign, globalObserved, globalVelocitiesObserved, globalVelocitiesDesign, printer=False)
    
    teams = {'home': myGame['homeTeamAbbr'], 'away': myGame['visitorTeamAbbr']}
    colours = {'home': [colourDict[myGame['homeTeamAbbr']][0], '#FFFFFF'], 'away': ['#FFFFFF', colourDict[myGame['visitorTeamAbbr']][0]]}
    colourSingle = {'home': colourDict[myGame['homeTeamAbbr']][0], 'away': colourDict[myGame['visitorTeamAbbr']][0]}
    if reverse: #Offence perspective
        titleStr = 'D\\O'
        offColour = colourSingle[defence]
        defColour = colourSingle[possession]
        weights = [1.02, 0.05, 0.07, -0.14]
        oPlayers2 = [*map(int, sorted(playDf[playDf['team'] == defence]['nflId'].unique()))]
        dPlayers = [*map(int, sorted(playDf[playDf['team'] == possession]['nflId'].unique()))]
    else: #Defence perspective
        titleStr = 'O\\D'
        offColour = colourSingle[possession]
        defColour = colourSingle[defence]
        weights = [0.72, 0.03, 0.07, 0.18]
        oPlayers2 = [*map(int, sorted(playDf[playDf['team'] == possession]['nflId'].unique()))]
        dPlayers = [*map(int, sorted(playDf[playDf['team'] == defence]['nflId'].unique()))]
    DEFENDERS = len(dPlayers)
    OFFENDERS = len(oPlayers2)
    playerDict = {}
    for dPlayer in dPlayers:
        dMat = playDf[playDf['nflId'] == dPlayer]
        jersey = [*map(int, sorted(dMat['jerseyNumber'].unique()))][0]
        name = [*sorted(dMat['displayName'].unique())][0]
        playerDict[dPlayer] = '%s #%s' % (name.replace('.', ''), jersey)
    
    for oPlayer in oPlayers2:
        oMat = playDf[playDf['nflId'] == oPlayer]
        jersey = [*map(int, sorted(oMat['jerseyNumber'].unique()))][0]
        name = [*sorted(oMat['displayName'].unique())][0]
        playerDict[oPlayer] = '%s #%s' % (name.replace('.', ''), jersey)
    
    frameIds = playDf.frameId.unique()
    
    players = playDf[playDf['team'] != 'football']
    football = playDf[playDf['team'] == 'football']

    field = plt.imread(DIRMODIFIER + "field2.png")
    plt.figure(figsize=(24, 16))
    plt.axis([0, 120, 0, 55])
    ax = plt.axes(xlim=(0, 120), ylim=(0, 55))
    
    plt.imshow(field, zorder=0, extent=[0,120, 0, 55])
    plt.xlim(0,120)

    ax.axis('off')
    fig = plt.gcf()
    ax.grid(False)  # Remove grid

    #cbar = plt.colorbar(orientation="horizontal")

    z = 1
    annotations = []
    playerCircles = []
    velArrows = []
    
    playDesc = ax.text(120 / 2 - 6 / 1.5 + 0.10, 55 - 6 / 1.5 - 0.35, '%s\nDown: %s & %s' % (chosenPlay['playDescription'].replace('. ', '.\n'), chosenPlay['down'], chosenPlay['yardsToGo']),
                       fontsize=20, ha='center', va='top', wrap=True)
    
    possessingTeam = 'home' if chosenPlay['possessionTeam'] == myGame['homeTeamAbbr'] else 'away'
    #print('Possessing Team:', possessingTeam)
    ballSnapped = passThrown = passOutcome = -1
    ballSnapped = BALLSNAPMAP.get(chosenPlay['gameId'] + ',' + chosenPlay['playId'], -1)
    passThrown = PASSFORWARDMAP.get(chosenPlay['gameId'] + ',' + chosenPlay['playId'], -1)
    firstWR = -1
    ballHandlers = []
    triangles = []
    matchups = []
    highlight = ['T']
    polygons = []
    convex = ['#808080', '#FF0000', '#0000FF', '#00FFFF', '#008080']
    direction = 0
    width, widthMod = 5, 1/20
    dirs = []
    footballData = {'name': 'football', 'team': 'football', 'x': [], 'y': [], 'events': []}
    for index, row in football.iterrows():
        if row['frameId'] == frameIds[-1]: #I'm going to ignore the last frame, because occasionally someone misses the last frame
            continue
        footballData['x'].append(row['x'])
        footballData['y'].append(row['y'])
        footballData['events'].append(row['event'])
        if row['event'] == 'ball_snap':
            dirs.append(row['x'])
        elif 0 < len(dirs) < 5:
            dirs.append(row['x'])
        myTime = row['time'].split('T')[1].split('Z')[0]
        myTime = 3600*int(myTime.split(':')[0]) + 60*int(myTime.split(':')[1]) + float(myTime.split(':')[2])
        #plt.scatter(row['x'], row['y'], c='#CD853F', s=1000, zorder=z)
    for index, row in players.iterrows():
        direction = 0
        if row['playDirection'] == 'right':
            assert direction != -1
            direction = 1
        elif row['playDirection'] == 'left':
            assert direction != 1
            direction = -1
        else:
            raise ValueError('Unexpected playDirection: %r' % row['playDirection'])
        if row['frameId'] == frameIds[-1]: #I'm going to ignore the last frame, because occasionally someone misses the last frame
            continue
        if not playerData.get(int(row['nflId']), ''):
            #print(row['displayName'])
            playerIds.append(int(row['nflId']))
            playerData[int(row['nflId'])] = {'name': row['displayName'].replace('.', ''), 'jersey': str(int(row['jerseyNumber'])), 
                                             'team': row['team'], 'x': [], 'y': [], 'speed': [], 'acceleration': [],
                                             'distance': [], 'orientation': [], 'angle': [], 'events': []
                                            }
            annotations.append(ax.annotate(playerData[int(row['nflId'])]['jersey'], xy=[row['x'], row['y']], color=colours[playerData[int(row['nflId'])]['team']][1],
                                           horizontalalignment='center',
                                           verticalalignment='center', fontweight='bold'))
            playerCircles.append(plt.Circle((row['x'], row['y']), CIRCLESIZE, color=colours[playerData[int(row['nflId'])]['team']][0], zorder=3))
            triangles.append(plt.Polygon(rotator([row['x'], row['y']], [[CIRCLESIZE, 0], [-CIRCLESIZE, 0], [0, 2*CIRCLESIZE]], row['o']),
                                         color=colours[playerData[int(row['nflId'])]['team']][0], zorder=2))
            #velArrows.append(plt.arrow(row['x'], row['y'], row['s']*math.cos(math.radians(row['dir'])), row['s']*math.sin(math.radians(row['dir'])),linewidth=5, zorder=2, color='#FF0000'))
            x, y = row['x'], row['y']
            length = row['s']
            points = [[x+width/10, y], [x+width*widthMod, y+3*length/4], [x, y+length], [x-width*widthMod, y+3*length/4], [x-width*widthMod, y]]
            velArrows.append(plt.Polygon(rotator([x, y], points, row['dir'], subtract=1),
                             zorder=2, color='#FF0000', alpha=0.2))
            #widgets['velArrows'][j].set_xy(rotator([x, y], points, playerData[myId]['angle'][i], subtract=1))
            if row['position'] == 'QB':
                qbX = row['x']
            if row['position'] == 'WR' and firstWR == -1:
                firstWR = int(row['nflId'])
            if row['position'] in ['RB', 'FB', 'WR', 'TE', 'HB']:
                ballHandlers.append(int(row['nflId']))
        playerData[int(row['nflId'])]['x'].append(row['x'])
        playerData[int(row['nflId'])]['y'].append(row['y'])
        playerData[int(row['nflId'])]['speed'].append(row['s'])
        playerData[int(row['nflId'])]['acceleration'].append(row['a'])
        playerData[int(row['nflId'])]['distance'].append(row['dis'])
        playerData[int(row['nflId'])]['orientation'].append(row['o'])
        playerData[int(row['nflId'])]['angle'].append(row['dir'])
        playerData[int(row['nflId'])]['events'].append(row['event'])
        myTime = row['time'].split('T')[1].split('Z')[0]
        myTime = 3600*int(myTime.split(':')[0]) + 60*int(myTime.split(':')[1]) + float(myTime.split(':')[2])
        #plt.scatter(row['x'], row['y'], c=colourDict[teams[row['team']]], s=1000, zorder=z)
        z += 1
        if row['event']:
            if BALLSNAPMAP.get(chosenPlay['gameId'] + ',' + chosenPlay['playId'], -1) == -1 and row['event'] == 'ball_snap':
                assert not (ballSnapped >= 0 and int(row['frameId']) != ballSnapped) #Ball snapped at 2 time points?
                ballSnapped = int(row['frameId'])
            elif PASSFORWARDMAP.get(chosenPlay['gameId'] + ',' + chosenPlay['playId'], -1) == -1 and row['event'] in ['pass_forward', 'pass_shovel']:
                assert not (passThrown >= 0 and int(row['frameId']) != passThrown) #Multiple forward passes?
                passThrown = int(row['frameId'])
            elif row['event'].startswith('pass_outcome'):
                if passOutcome >= 0 and int(row['frameId']) != passOutcome: #Multiple pass outcomes?
                    pass
                passOutcome = int(row['frameId'])

    if passOutcome == -1:
        passOutcome = len(players['frameId'].unique())
    ballSnapped = 0 #I'm setting this to 0 so that the matchups are shown before the snap
    columnColours = ['#FFFFFF' for j in range(DEFENDERS+1)]
    cellColours = [['#FFFFFF'] + [defColour for j in range(DEFENDERS)]] + [[offColour]+['#FFFFFF' for j in range(DEFENDERS)] for i in range(OFFENDERS)]
    cellText = [[titleStr] + [playerDict[dPlayers[j]] for j in range(DEFENDERS)]] + [[playerDict[oPlayers2[i]]]+['?' for j in range(DEFENDERS)] for i in range(OFFENDERS)]
    table = plt.table(cellText=cellText,
                      colLabels=['' for j in range(DEFENDERS+1)],
                      colColours=columnColours,
                      colWidths=[1/(DEFENDERS+1) for j in range(DEFENDERS+1)],
                      loc='bottom',
                      cellColours=cellColours,
                      cellLoc='center')
    table.get_celld()[(0,0)].visible_edges = "TBL"
    for j in range(1, DEFENDERS):
        cell = table.get_celld()[(0,j)]
        cell.visible_edges = "TB"
    table.get_celld()[(0,DEFENDERS)].visible_edges = "TBR"
    table.get_celld()[(0,DEFENDERS//2)].get_text().set_text("Frame #1 -- Event: " + footballData['events'][0])
    table.auto_set_font_size(False)
    table.set_fontsize(14)
    table.scale(1, 4)
    for cell in table.get_celld().values():
        cell.get_text().set_color('black')
    #print('Ball snapped: %d\tForward Pass: %d\tPass Outcome: %d' % (ballSnapped, passThrown, passOutcome))

    if len(ballHandlers) != 5:
        print('Wrong number of ball handlers:', len(ballHandlers))
        #1/0
        
    if direction == 0:
        print(dirs)
        if all([dirs[j+1] == dirs[j] for j in range(len(dirs)-1)]): #The ball position doesn't move?
            print('The ball stayed in the same place')
        elif all([dirs[j+1] >= dirs[j] for j in range(len(dirs)-1)]): #Direction from snap is right => "forward" is left
            print('Direction = -1')
        elif all([dirs[j+1] <= dirs[j] for j in range(len(dirs)-1)]): #Direction from snap is left => "forward" is right
            print('Direction = +1')
        else: #This would be strange. It means that the ball moves both right and left immediately after the snap
            print('Mixture?')
    ballX = footballData['x'][0]
    ballCircle = plt.Circle((footballData['x'][0], footballData['y'][0]), CIRCLESIZE, color=FOOTBALLCOLOUR, zorder=4)

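    if chosenPlay['absoluteYardlineNumber']:
        absolute = int(chosenPlay['absoluteYardlineNumber'])
    elif chosenPlay['yardlineNumber'] == '50': #Midfield => 10+50
        absolute = 60
    else: #Without a yard line, the scrimmage and first-down markers cannot be drawn
        raise ValueError('No usable yard line for play %s' % chosenPlay['playId'])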

    l = plt.Line2D([absolute, absolute], [0, 55], color="#0000FF", linewidth=10, zorder=1)
    ax.add_line(l)

    firstDown = absolute+direction*int(chosenPlay['yardsToGo'])
    l = plt.Line2D([firstDown, firstDown], [0, 55], color="#FFFF00", linewidth=10, zorder=1)
    ax.add_line(l)
    
    if ideal[0]:
        for key in playerData:
            if ideal[0] == playerData[key]['name'].replace('.', ''):
                if len(weights) == 3:
                    idealCenter = playerData[key]['x'][0]*weights[0] + footballData['x'][0]*weights[1] + (60+60*direction)*weights[2], playerData[key]['y'][0]*weights[0] + footballData['y'][0]*weights[1] + 27*weights[2]
                else:
                    idealCenter = playerData[key]['x'][0]*weights[0] + footballData['x'][0]*weights[1] + (60+60*direction)*weights[2] + firstDown*weights[3], playerData[key]['y'][0]*weights[0] + footballData['y'][0]*weights[1] + playerData[key]['y'][0]*weights[2] + playerData[key]['y'][0]*weights[3]
                ideal[1] = plt.Circle(idealCenter, CIRCLESIZE, color='#000000', zorder=3)
    
    for _ in range(len(dPlayers)*len(oPlayers2)):
        l = plt.Line2D([0, 0], [0, 0], color="#000000", linewidth=5, zorder=1)
        ax.add_line(l)
        matchups.append(l)
    
    if highlight:
        ballHandlers = [j for j in ballHandlers if playerData[j]['name'] in highlight]
        
    hide = []
    if focus:
        for j in range(len(playerCircles)):
            if playerData[playerIds[j]]['name'] not in focus:
                triangles[j].set_alpha(0.3)
                playerCircles[j].set_alpha(0.3)
                annotations[j].set_alpha(0.3)
                velArrows[j].set_alpha(0.3)

    for j in range(len(ballHandlers)):
        polygons.append(plt.Polygon([[footballData['x'][0], footballData['y'][0]], [playerData[ballHandlers[j]]['x'][0], playerData[ballHandlers[j]]['y'][0]], [60 + direction*60, 27]], 
                                    color=convex[j], alpha=0))
        ax.add_patch(polygons[-1]) #Polygons are patches, not lines
    
    for triangle in triangles:
        ax.add_patch(triangle)
    for circle in playerCircles:
        ax.add_patch(circle)
    for arrow in velArrows:
        ax.add_patch(arrow)
    ax.add_patch(ballCircle)
    if ideal[0]:
        ax.add_patch(ideal[1])

    widgets = {'polygons': polygons, 'circles': playerCircles, 'triangles': triangles,
               'matchups': matchups, 'annotations': annotations, 'velArrows': velArrows, 'ideal': ideal, 'weights': weights}
    
    #fig.set_size_inches((24, 16))
    anim = animation.FuncAnimation(fig, myUpdate,
                                   fargs=(playerIds, ballCircle, widgets, attArrList[gameId+'_'+playId], playerData, footballData, ballHandlers, direction, ballSnapped, passThrown, passOutcome, oPlayers2, dPlayers, alphaMat, betaMat, table, firstDown, CIRCLESIZE),
                                   frames=len(frameIds)-1, interval=100)

    return anim

In [None]:
'''
Cell 3: Animate the 75-yard TD pass from Josh Allen to Robert Foster against Jacksonville
'''

ideal = ['Robert Foster', None]
focus = []

anim = makeAnimation(['2018112501', '875'], False, ideal, focus)
#HTML(anim.to_jshtml())
plt.rcParams['animation.html'] = 'html5'
display(anim)
plt.clf()

<div id="3nmf"></div>

# 3. Categorizing Passes

After discerning who is guarding whom on a given play, we categorize passes by their location on the field and find each player's loadings onto these pass categories. We begin with a 64 yd $\times$ 70 yd grid, discretized into 1 yd $\times$ 1 yd squares. For a pass $p = (x, y)$, where $x$ is the horizontal position with $x=0$ at the left sideline, and $y$ is the vertical position with $y=0$ at the line of scrimmage, $p$ is mapped to the cell $c_{j, k}$ where $j = \lfloor \min(58.3, \max(x, -5)) \rfloor + 6$ and $k = \lfloor \min(y, 69) \rfloor + 1$. For each player $i$ who was targeted on at least 15 passes thrown beyond the line of scrimmage, we define $X_{i,v}$ to be the number of targets of player $i$ in cell $v$ of the discretized field. Then, for each player $i$, we use a log Gaussian Cox process to fit an intensity surface given by $\vec{\lambda}_{i} = (\lambda_{i,c_{1,1}}, \lambda_{i,c_{1,2}},\cdots,\lambda_{i,c_{64,70}})^T$. The intensity surfaces were fit in GNU Octave with the software package [GPstuff][2]. To illustrate, the following figure overlays a scatter plot of targets on the intensity surface for Julio Jones.

[2]: https://research.cs.aalto.fi/pml/software/gpstuff/
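
As a minimal sketch of the cell mapping described above, the two formulas can be written directly in Python. The function names `pass_to_cell` and `cell_to_index` are hypothetical and not part of the original analysis; the flattening order is an assumption chosen to match the `col*LENGTH + row` indexing used in the plotting cells.

```python
import math

def pass_to_cell(x, y):
    """Map a pass location in yards to the 1-indexed grid cell (j, k)."""
    # Clamp x to [-5, 58.3] so passes slightly outside the sidelines stay on the grid
    j = math.floor(min(58.3, max(x, -5))) + 6
    # Clamp y at 69 yards beyond the line of scrimmage
    k = math.floor(min(y, 69)) + 1
    return j, k

def cell_to_index(j, k, length=70):
    """Flatten (j, k) to a 0-based index, assuming the order c_{1,1}, c_{1,2}, ..."""
    return (j - 1) * length + (k - 1)

# A pass caught near midfield width, 12.4 yards downfield
j, k = pass_to_cell(26.65, 12.4)
print(j, k, cell_to_index(j, k))
```

Only passes beyond the line of scrimmage ($y \geq 0$) are considered here, so no lower clamp on $y$ is applied.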

In [None]:
LENGTH = 70
WIDTH = 64
FRACTIONW = 64/WIDTH
FRACTIONL = 70/LENGTH #Make the blocks 1x1

with open('../input/mydata/2495454_Julio JonesLambda.txt', 'r') as f:
    w = f.read().strip().split('\n')
#print(len(w))

bins = [[0 for _ in range(WIDTH)] for _ in range(LENGTH)]
lambdas = [*map(float, w)]
s = sum(lambdas)

for j in range(len(lambdas)):
    lambdas[j] = lambdas[j]/s

for row in range(LENGTH):
    for col in range(WIDTH):
        bins[row][col] = float(lambdas[col*LENGTH + row])

fig = plt.figure(figsize=(12,7),frameon=False)#(12,7)
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8]) #where to place the plot within the figure

#mpl.colors.Normalize(vmin=Z.min(), vmax=Z.max(), clip=False)
pcm = ax.pcolormesh([w*FRACTIONW for w in range(WIDTH)], [l*FRACTIONL for l in range(LENGTH)], bins,
                    norm=mpl.colors.Normalize(vmin=0, vmax=max([bins[l][w] for l in range(LENGTH) for w in range(WIDTH)])),
                    cmap='jet',
                    shading='auto')
fig.colorbar(pcm, ax=ax, extend='max')

with open('../input/mydata/2495454_Julio Jones.txt', 'r') as f:
    w = f.read().strip().split('\n')
xVals, yVals = [], []
for line in w:
    line = [*map(float, line.split())]
    xVals.append(line[0])
    yVals.append(line[1])
plt.scatter(xVals, yVals, marker='D', color='black')
plt.title('Targets and Intensity Surface for Julio Jones')
plt.show()

We now construct the matrix $\Lambda = [\overline{\lambda}_1, \overline{\lambda}_2, \cdots]^T$ over all of the players, where each row is a player's intensity surface normalized so that $\sum\limits_{j=1}^{64\times 70} \overline{\lambda}_{i,j} = 1$ for each $i$. The final step at this point is to factor $\Lambda$ into two non-negative matrices $W$ and $L$ such that $WL \approx \Lambda$. Here, $\Lambda$ is an $N \times V$ matrix, where $N$ is the number of players and $V$ is the number of cells. Additionally, $W$ is an $N \times K$ matrix, and $L$ is a $K \times V$ matrix. In this factorization, $K$ gives the number of bases used to categorize passes; $L_{b,c}$ gives the intensity of cell $c$ in basis $b$; and $W_{i,b}$ gives the weight for player $i$ in basis $b$. Based on the figure below, $K = 5$ was chosen, because the reduction in the Kullback-Leibler divergence from choosing a larger number of bases was negligible, but there was a noticeable benefit from 5 bases compared to 4 bases.
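
The factorization step can be illustrated with a small sketch. The code below uses the standard multiplicative updates for the generalized Kullback-Leibler divergence on a random toy matrix; it is an assumption that this resembles the solver actually used, and the names `nmf_kl` and `kl_divergence` are hypothetical.

```python
import numpy as np

def kl_divergence(X, Y, eps=1e-12):
    """Generalized Kullback-Leibler divergence D(X || Y) between non-negative arrays."""
    return float(np.sum(X * np.log((X + eps) / (Y + eps)) - X + Y))

def nmf_kl(Lam, K, n_iter=200, seed=0, eps=1e-12):
    """Factor Lam (N x V) into non-negative W (N x K) and L (K x V)."""
    rng = np.random.default_rng(seed)
    N, V = Lam.shape
    W = rng.random((N, K)) + 0.1
    L = rng.random((K, V)) + 0.1
    for _ in range(n_iter):
        # Update the basis surfaces, then the player weights
        ratio = Lam / (W @ L + eps)
        L *= (W.T @ ratio) / (W.sum(axis=0)[:, None] + eps)
        ratio = Lam / (W @ L + eps)
        W *= (ratio @ L.T) / (L.sum(axis=1)[None, :] + eps)
    return W, L

# Toy stand-in for Lambda: 6 "players", 12 "cells", each row summing to 1
rng = np.random.default_rng(1)
Lam = rng.random((6, 12))
Lam /= Lam.sum(axis=1, keepdims=True)

W0, L0 = nmf_kl(Lam, 2, n_iter=0)   # random starting point
W, L = nmf_kl(Lam, 2)               # after the updates
print(kl_divergence(Lam, W0 @ L0), kl_divergence(Lam, W @ L))
```

Repeating the fit for several values of $K$ and comparing the final divergences gives the kind of curve used above to settle on $K = 5$.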

In [None]:
img = plt.imread('../input/mydata/klFactorizations.png')
fig, ax = plt.subplots(figsize=(9,6))
ax.axis('off')
plt.imshow(img)
plt.show()

For these 5 bases, we may associate each cell with one of the 5 bases according to which basis has the most weight in that cell. The following figure colour-codes the cells by basis. From this figure, we see a very intuitive breakdown of a football field. Broadly, the field is partitioned into very long passes (basis 1: red), short passes (basis 2: green), passes between about 10 and 40 yards on the left side of the field (basis 3: blue), passes up to about 40 yards on the right side of the field (basis 4: orange), and passes between about 10 and 35 yards over the middle (basis 5: purple). Going forward, we evaluate the impact of defenders in each of these bases.

In [None]:
K = 5

LENGTH = 70
WIDTH = 64
FRACTIONW = 64/WIDTH
FRACTIONL = 70/LENGTH #Make the blocks 1x1

with open('../input/mydata/factorization%d.json' % K) as f:
    data = json.load(f)

#Normalize each basis to sum to 1, then find which basis has the largest weight in each tile
totals = [sum(data['loadings'][a]) for a in range(K)]
basis = [max(range(K), key=lambda a: data['loadings'][a][j]/totals[a]) for j in range(WIDTH*LENGTH)]
    
colourMap = {1: 'red',
             2: 'green',
             3: 'blue',
             4: 'orange',
             5: 'purple',
            }

from matplotlib.patches import Rectangle

fig = plt.figure(figsize=(9,6))
ax = plt.axes([0.1, 0.1, 0.8, 0.8]) #where to place the plot within the figure
plt.xlim(0, WIDTH)
plt.ylim(0, LENGTH)
#plt.rcParams['figure.figsize'] = [9.0, 6.0]

for row in range(LENGTH):
    for col in range(WIDTH):
        restricted = Rectangle([col*FRACTIONW, row*FRACTIONL], FRACTIONW, FRACTIONL, color=colourMap[1 + basis[col*LENGTH + row]],alpha=0.8, fill=True)
        ax.add_patch(restricted)

plt.show()

<div id="4eval"></div>

# 4. Player Evaluation

To evaluate players in each basis, we cluster the defensive players by constructing a matrix describing how frequently each defender is guarding an offender in each basis, and applying Principal Component Analysis to that matrix. Using 3 clusters, with Tremaine Edmunds, Tre'Davious White, and Jordan Poyer as the cluster centres, the defenders approximately cluster into linebackers, cornerbacks, and safeties.
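
The clustering step might look roughly like the sketch below. The text does not spell out how defenders are assigned once the centres are fixed, so nearest-centre assignment in principal-component space is an assumption here, and the toy matrix, `pca_scores`, and `assign_to_seeds` are all hypothetical.

```python
import numpy as np

def pca_scores(M, n_components=2):
    """Project the rows of M onto the leading principal components."""
    centred = M - M.mean(axis=0)
    # Rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ Vt[:n_components].T

def assign_to_seeds(scores, seed_rows):
    """Label each row with the index of the nearest seed row (the cluster centre)."""
    centres = scores[seed_rows]
    dists = np.linalg.norm(scores[:, None, :] - centres[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Toy guarding frequencies: 6 defenders (rows) by 5 bases (columns)
M = np.array([
    [0.05, 0.70, 0.10, 0.10, 0.05],  # linebacker-like profile
    [0.04, 0.68, 0.12, 0.11, 0.05],
    [0.20, 0.10, 0.35, 0.30, 0.05],  # cornerback-like profile
    [0.22, 0.08, 0.33, 0.32, 0.05],
    [0.30, 0.10, 0.10, 0.10, 0.40],  # safety-like profile
    [0.28, 0.12, 0.09, 0.11, 0.40],
])
labels = assign_to_seeds(pca_scores(M), [0, 2, 4])
print(labels)
```

Fixing the centres at three well-known players makes the cluster labels interpretable by construction, at the cost of depending on how representative those players are.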

<div id="41freq"></div>

## 4.1 Pass Frequency

To model the impact of defenders on pass frequency and on pass efficiency, we follow the outline of Franks et al. For play $n$, we let $\mathcal{S}_n$ be a categorical random variable describing the target and basis of the pass, with the possibility that no player is targeted, for example because the quarterback is sacked or the pass is thrown away. We model $\mathcal{S}_n$ by $$\mathbb{P}\left[\mathcal{S}_n(i, b) = 1 \vert \alpha_{i,b}, Z_n\right] = \dfrac{\exp\left(\alpha_{i,b} + \sum\limits_{j=1}^{d_n} Z_n(j, i) \beta_{j,b}\right)}{1 + \exp\left(\alpha_{i,b} + \sum\limits_{j=1}^{d_n} Z_n(j, i) \beta_{j,b}\right)},$$ where $\alpha_{i,b}$ captures how frequently offender $i$ is the target of a pass in basis $b$; $\beta_{j,b}$ is the impact of defender $j$ on pass frequency in basis $b$; $Z_n(j, i)$ is the proportion of time that defender $j$ spent guarding offender $i$ in play $n$; and $d_n$ is the number of defenders in play $n$. Using the variational method described by Franks et al., we arrive at the following table of outcomes for defenders in each basis.
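
To make the frequency model concrete, the probability above can be evaluated for a single offender and basis. This is only a sketch with made-up parameter values; `target_probability` is a hypothetical helper, not part of the fitted model code.

```python
import math

def target_probability(alpha_ib, z_col, beta_b):
    """Logistic of alpha_{i,b} + sum_j Z_n(j, i) * beta_{j,b}."""
    eta = alpha_ib + sum(z * b for z, b in zip(z_col, beta_b))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical values: two defenders split their time guarding offender i
alpha_ib = -2.0        # offender's baseline log-odds of being targeted in basis b
z_col = [0.8, 0.2]     # Z_n(j, i): share of the play each defender spent on offender i
beta_b = [-0.5, 0.3]   # beta_{j,b}: negative values suppress targets

p_base = target_probability(alpha_ib, [], [])
p_guarded = target_probability(alpha_ib, z_col, beta_b)
print(p_base, p_guarded)
```

Because the defender effects are weighted by $Z_n(j, i)$, a defender who covers the offender for most of the play dominates the adjustment to the baseline.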



In [None]:
with open('../input/mydata/columnHeaders.json') as f:
    columnHeaders = json.load(f)
    
with open('../input/mydata/clusterLabels.json') as f:
    clusterLabels = json.load(f)
    
with open('../input/mydata/defenceIds.json') as f:
    defenceIds = json.load(f)
    
with open('../input/mydata/distanceList.json') as f:
    distanceList = json.load(f)
    
with open('../input/mydata/medianDistances.json') as f:
    medianDistances = json.load(f)
    
with open('../input/mydata/modelParameters.json') as f:
    params = json.load(f)
    
def getDefensiveRanks(parEst, betaSd, rankInterval, component, defIds, defenderGroups, colHeaders, group=0, tp='eff', medianDist=None, distsList=None, minFaced=10, K=5, filename=None):
    '''
    A function to calculate the player ranks for a given cluster in a given basis
    '''
    groupIdx = [j for j in range(len(defenderGroups)) if defenderGroups[j] == group]
    if tp == 'sel':
        colNames = colHeaders['wInterceptSel'] + colHeaders['wSelection'] + colHeaders['xInterceptSel'] + colHeaders['xSelection'] 
    else:
        colNames = colHeaders['wInterceptEff'] + colHeaders['wEfficiency'] + colHeaders['xInterceptEff'] + colHeaders['xSlopeEff'] + colHeaders['xEfficiency'] 
        
    indices = [j for j in range(len(colNames)) if colNames[j].startswith('beta') and colNames[j].split('+')[-1] == str(component)]

    pars = [parEst[j] for j in indices]
    if medianDist:
        medianDist = [medianDist[j] for j in [j for j in range(len(colHeaders['xDistance'])) if colHeaders['xDistance'][j].startswith('beta') and colHeaders['xDistance'][j].split('+')[-1] == str(component)]]
    if distsList:
        numFaced = [len(distsList[j]) for j in [j for j in range(len(colHeaders['xDistance'])) if colHeaders['xDistance'][j].startswith('beta') and colHeaders['xDistance'][j].split('+')[-1] == str(component)]]
    
    intIndices = [j for j in range(len(colNames)) if colNames[j].startswith('int-b') and colNames[j].split('+')[1] == str(component)]
    
    parsInt = [parEst[j] for j in intIndices] #Get the intercept parameters for the given basis
    parsInt = [parsInt[defenderGroups[j]] for j in range(len(defIds))] #list of the intercept parameters based on cluster
    
    #There's only 1 of these per basis, and only occurs with efficiency
    gamma = [parEst[j] for j in range(len(colNames)) if colNames[j].startswith('gamma') and colNames[j].split('+')[-1] == str(component)]
    
    effect = [pars[j] + parsInt[j] for j in range(len(pars))]
    
    if tp in ['eff', 'effVB']:
        effect = [effect[j] + gamma[0]*medianDist[j] for j in range(len(effect))]

    sdInd = [j for j in range(len(colHeaders['betaSdSel'])) if colHeaders['betaSdSel'][j].startswith('beta') and colHeaders['betaSdSel'][j].split('+')[-1] == str(component)]
    sdEffect = [betaSd[j] for j in sdInd]
    
    rkInd = [j for j in range(len(colHeaders['bRanksStrSel'])) if colHeaders['bRanksStrSel'][j].startswith('beta') and colHeaders['bRanksStrSel'][j].split('+')[-1] == str(component)]
    rankInterval = [rankInterval[j] for j in rkInd]
    
    tableIndices = [j for j in range(len(defIds)) if defenderGroups[j] == group and numFaced[j] >= minFaced]
    
    return sorted([[defIds[j], effect[j], sdEffect[j], rankInterval[j], medianDist[j]] for j in tableIndices], key=lambda a: a[1])
    
NDISPLAY = 15
nClusters = 3
allTheRanks = []
mPost = params['frequency']['mPost']
alphaSd = params['frequency']['alphaSd']
betaSd = params['frequency']['betaSd']
aRanksStr = ['[%s]' % j for j in params['frequency']['aRanksStr']]
bRanksStr = ['[%s]' % j for j in params['frequency']['bRanksStr']]
colNames = columnHeaders['wInterceptSel'] + columnHeaders['wSelection'] + columnHeaders['xInterceptSel'] + columnHeaders['xSelection'] 
averages = [mPost[j] for j in [j for j in range(len(colNames)) if colNames[j].startswith('int-b')]]
aRanksStr = [j.replace('.25', '').replace('.75', '') for j in aRanksStr]
bRanksStr = [j.replace('.25', '').replace('.75', '') for j in bRanksStr]

for component in range(K):
    print('Basis:', component+1)
    for g in range(nClusters):
        print('\tCluster:', g)
        ranks = getDefensiveRanks(mPost, betaSd, bRanksStr, component, defenceIds,
                                  clusterLabels, columnHeaders, group=g, tp='sel',
                                  medianDist=medianDistances, distsList=distanceList, minFaced=10)
        allTheRanks.append(ranks)
        if len(ranks):
            print('\t\t%2s\t%30s\t%8s\t%8s' % ('Rk', 'Name', 'Effect', 'Std. Dev'))
        for j in range(min(NDISPLAY, len(ranks))):
            print('\t\t%2d\t%30s\t%8.4f\t%8.4f' % (j+1, ranks[j][0], ranks[j][1], ranks[j][2]))
        if len(ranks):
            print('\t\t.\n\t\tAverage: %8.4f' % averages[component*nClusters + g])
        if len(ranks) > NDISPLAY:
            print('\t\t.')
            q = min(NDISPLAY, len(ranks)-NDISPLAY)
            for j in range(q):
                print('\t\t%2d\t%30s\t%8.4f\t%8.4f' % (len(ranks)+j-q+1, ranks[j-q][0], ranks[j-q][1], ranks[j-q][2]))
        print('')

In many cases these results correspond nicely with the perception of defenders. For example, in Bases 3 and 4, many prominent cornerbacks such as Patrick Peterson, Stephon Gilmore, Tre'Davious White, Jalen Ramsey, and Xavien Howard reduce the frequency of passes in these bases. Among linebackers, Matt Milano is one of the best at denying pass frequency in Basis 2, but Bobby Wagner faces passes more frequently than average. Likewise, Eddie Jackson is near the bottom for safeties reducing pass frequency in Basis 5. While some of these results seem counterintuitive, it is worth keeping in mind that reducing pass frequency is only one way to influence a game.

<div id="42eff"></div>

## 4.2 Pass Efficiency

The second quality to consider in defenders is their impact on pass efficiency. Here, we consider only plays in which a pass was thrown to a target, and let $Y_n$ be the binary value that is $1$ if the pass was completed in play $n$, and $0$ otherwise. Accordingly, $Y_n$ is modelled by $$\mathbb{P}\left[Y_n = 1 \vert \mathcal{S}_n(i, b) = 1, j, \mathcal{D}_n, \alpha_{i,b}, \beta_{j,b}, \gamma_{b} \right] = \dfrac{\exp{\left(\alpha_{i,b} + \beta_{j,b} + \gamma_{b}\mathcal{D}_n\right)}}{1 + \exp{\left(\alpha_{i,b} + \beta_{j,b} + \gamma_{b}\mathcal{D}_n\right)}}.$$ Similar to before, $\alpha_{i,b}$ is the impact of the targeted receiver on pass efficiency in basis $b$, and $\beta_{j,b}$ is the impact of the defender on pass efficiency in basis $b$. By defining $\mathcal{D}_n$ to be the distance between the defender and target at the time the pass was thrown, and $\gamma_b$ to be the influence that this distance has on a pass being completed in basis $b$, the $\gamma_{b}\mathcal{D}_n$ term gives insight into the importance of the distance between the target and defender. To infer the variables for pass efficiency, PyStan was used, and the $\hat{R}$ measure for each variable was approximately $1$.
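
The role of the $\gamma_{b}\mathcal{D}_n$ term can be seen in a short worked sketch. The parameter values below are invented for illustration and `completion_probability` is a hypothetical helper; the only claim is the algebraic one, that each extra yard of separation multiplies the completion odds by $\exp(\gamma_b)$.

```python
import math

def completion_probability(alpha_ib, beta_jb, gamma_b, distance):
    """Logistic of alpha_{i,b} + beta_{j,b} + gamma_b * D_n."""
    eta = alpha_ib + beta_jb + gamma_b * distance
    return 1.0 / (1.0 + math.exp(-eta))

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

# Hypothetical receiver, defender, and basis parameters
alpha_ib, beta_jb, gamma_b = 0.3, -0.4, 0.15

p_tight = completion_probability(alpha_ib, beta_jb, gamma_b, 1.0)  # 1 yard of separation
p_loose = completion_probability(alpha_ib, beta_jb, gamma_b, 2.0)  # 2 yards of separation
print(p_tight, p_loose, odds(p_loose) / odds(p_tight))
```

With a positive $\gamma_b$, a defender can lower completion probability either through a low $\beta_{j,b}$ or simply by staying closer to the target when the ball is thrown.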

In [None]:
mPost = params['efficiencyStan']['mPost']
alphaSd = params['efficiencyStan']['alphaSD']
betaSd = params['efficiencyStan']['betaSD']
aRanksStr = ['[%s]' % j for j in params['efficiencyStan']['aRanksStr']]
bRanksStr = ['[%s]' % j for j in params['efficiencyStan']['bRanksStr']]
# Column names, in the same order as the entries of the posterior summary vector mPost
colNames = (columnHeaders['wInterceptEff'] + columnHeaders['wEfficiency'] + columnHeaders['xInterceptEff']
            + columnHeaders['xSlopeEff'] + columnHeaders['xEfficiency'])
betaInds = [j for j, name in enumerate(colNames) if name.startswith('beta')]
intercepts = [mPost[j] for j, name in enumerate(colNames) if name.startswith('int-b')]
slopes = [mPost[j] for j, name in enumerate(colNames) if name.startswith('gamma')]
# Median defender-target distances grouped by basis (the basis index follows the last '+' in the column name)
medians = [[medianDistances[j] for j in range(len(betaInds)) if colNames[betaInds[j]].split('+')[-1] == str(component)] for component in range(K)]
# Linear predictor for each (basis, cluster) pair, evaluated at that cluster's median distance
averages = []
for component in range(K):
    for g in range(nClusters):
        med = np.median([medians[component][j] for j in range(len(medians[component])) if clusterLabels[j] == g])
        averages.append(intercepts[component*nClusters + g] + slopes[component]*med)
        
aRanksStr = [j.replace('.25', '').replace('.75', '') for j in aRanksStr]
bRanksStr = [j.replace('.25', '').replace('.75', '') for j in bRanksStr]

for component in range(K):
    print('Basis:', component+1)
    for g in range(nClusters):
        print('\tCluster:', g)
        ranks = getDefensiveRanks(mPost, betaSd, bRanksStr, component, defenceIds,
                                  clusterLabels, columnHeaders, group=g, tp='eff',
                                  medianDist=medianDistances, distsList=distanceList, minFaced=10)
        allTheRanks.append(ranks)
        if len(ranks):
            print('\t\t%2s\t%30s\t%8s\t%8s' % ('Rk', 'Name', 'Effect', 'Std. Dev'))
        for j in range(min(NDISPLAY, len(ranks))):
            print('\t\t%2d\t%30s\t%8.4f\t%8.4f' % (j+1, ranks[j][0], ranks[j][1], ranks[j][2]))
        if len(ranks):
            print('\t\t.\n\t\tAverage: %8.4f' % averages[component*nClusters + g])
        if len(ranks) > NDISPLAY:
            print('\t\t.')
            q = min(NDISPLAY, len(ranks)-NDISPLAY)
            for j in range(q):
                print('\t\t%2d\t%30s\t%8.4f\t%8.4f' % (len(ranks)+j-q+1, ranks[j-q][0], ranks[j-q][1], ranks[j-q][2]))
        print('')
        


These results show that Bobby Wagner is the linebacker with the greatest impact on limiting pass completions in Basis 2, with Matt Milano just behind him. Conversely, Jamie Collins limits the number of passes thrown his way in Basis 2, but ranks near the bottom when defending the target of a pass. For cornerbacks, location and performance are related. For example, Holton Hill and Xavien are among the best at denying completions in Basis 3, but not in Basis 4, while Denzel Ward, Stephon Gilmore, and Jalen Ramsey are among the most impactful cornerbacks in Basis 4. Finally, Chicago appears to have been very good at limiting completion efficiency on passes over the middle (Basis 5) during the 2018 season, driven by the safety duo of Eddie Jackson (#1) and Adrian Amos (#10).
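To make rankings like these easier to interpret, an effect can be translated into a change in completion probability. The sketch below assumes that the printed effects and cluster averages are on the log-odds scale of the logistic model, and that a defender's effect acts as an additive offset to the cluster average; both are assumptions, since the internals of `getDefensiveRanks` are not shown here. The helper names and the numbers are hypothetical.

```python
import math


def sigmoid(x):
    """Map a log-odds value to a probability (numerically stable)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def completion_shift(average_logit, defender_effect):
    """Return (baseline probability, probability with the defender's offset applied).

    Assumes defender_effect is an additive log-odds offset to average_logit.
    """
    return sigmoid(average_logit), sigmoid(average_logit + defender_effect)


# Hypothetical values, not results from this analysis
baseline, adjusted = completion_shift(average_logit=0.6, defender_effect=-0.5)
print('baseline: %.3f, with defender: %.3f' % (baseline, adjusted))
```

Because the logistic curve is nonlinear, the same log-odds effect produces a larger probability change near 50% than near 0% or 100%, so comparisons across bases with very different averages should be made on the probability scale with care.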

<div id="5conclusion"></div>

# 5. Conclusions

This analysis gives notable insights into the impact defenders have on the frequency and efficiency of passes in various regions of a football field. It supports the perception that defenders such as Milano, Wagner, Jackson, Gilmore, Ramsey, and White, who are viewed as elite coverage players, are truly elite talents in pass defence. Of note, this analysis focuses exclusively on the player providing primary coverage against a receiver, so secondary coverage has no impact on the pass efficiency evaluation. Future research on this topic might consider the impact of a secondary defender on pass efficiency, which would likely give valuable insights into the coverage abilities of safeties in particular. Additionally, this analysis did not distinguish between man and zone coverage. It may be worthwhile to estimate the effect of defenders on pass frequency and efficiency separately for man and zone coverage schemes.