### This notebook aims to employ player tracking data to tackle two main questions:
- Which cornerbacks are the best at closely tracking receivers as they try to get open before the ball is thrown? 
- Which cornerbacks are the best at closing on receivers when the ball is in the air?

In order to answer these questions, we developed a simple score for evaluating cornerbacks' performance on each of the following events during a play: 
- from 'ball snap' to 'pass forward' (before the ball is thrown)
- from 'pass forward' to the next non-None event in the play (ball is in the air). 

This score consists of two components: personal ability and coverage ability, where each component is measured by simple and related metrics defined in terms of player tracking data on different plays.

**Personal ability** measures how good a cornerback is individually. It is composed of three related metrics:
- agility measures a cornerback's ability to change direction; it is defined as the mean, over plays, of the maximum change in direction between consecutive timestamps within a play
- speed measures how fast a cornerback can run; it is defined as the mean, over plays, of the cornerback's maximum speed within a play
- acceleration measures the rate at which a cornerback reaches his top speed; it is defined as the mean, over plays, of the maximum change in acceleration between consecutive timestamps within a play

Personal ability is computed as a weighted sum of these three metrics; the weights are assigned subjectively according to the importance of each metric and the nature of the event being considered.
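
As a rough illustration of the "mean of per-play maxima" idea behind these metrics, the following sketch uses toy numbers and hypothetical helper names (`mean_of_play_maxima`, `mean_of_max_consecutive_change`). It takes the absolute frame-to-frame change, which is an assumption of this sketch and not necessarily what every metric function in the notebook does.

```python
from statistics import mean

def mean_of_play_maxima(plays):
    """Average, over plays, of the largest value seen in each play."""
    return mean(max(play) for play in plays if play)

def mean_of_max_consecutive_change(plays):
    """Average, over plays, of the largest absolute frame-to-frame change."""
    maxima = []
    for play in plays:
        changes = [abs(b - a) for a, b in zip(play, play[1:])]
        if changes:  # a play with a single frame has no change to measure
            maxima.append(max(changes))
    return mean(maxima)

# Toy speeds for two hypothetical plays
speeds = [[1.0, 4.0, 6.5], [2.0, 3.0, 5.5]]
top_speed = mean_of_play_maxima(speeds)

# Toy directions (degrees) for the same two plays
directions = [[10.0, 15.0, 40.0], [90.0, 80.0, 85.0]]
agility = mean_of_max_consecutive_change(directions)

print(top_speed, agility)
```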

**Coverage ability** measures how well a cornerback covers the closest offensive player during a play. It is composed of six related metrics:
- distance tracking measures the ability to stay close to his responsible offensive player; it is defined as the mean, over plays, of the mean distance between the cornerback and the closest offensive player
- speed tracking measures the ability to match the speed of his responsible offensive player; it is defined as the mean, over plays, of the mean speed difference between the cornerback and the closest offensive player
- acceleration tracking measures the ability to match the acceleration of his responsible offensive player; it is defined as the mean, over plays, of the mean acceleration difference between the cornerback and the closest offensive player
- orientation tracking measures the ability to match the orientation of his responsible offensive player; it is defined as the mean, over plays, of the mean orientation difference between the cornerback and the closest offensive player
- direction tracking measures the ability to match the direction of his responsible offensive player; it is defined as the mean, over plays, of the mean direction difference between the cornerback and the closest offensive player
- pursuit measures the ability of a cornerback to close in on his responsible offensive player; it is defined in terms of the closest distance between the cornerback and the closest offensive player and the time taken to reach that distance, averaged over plays
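
Orientation and direction tracking both compare two angles. If those angles are measured in degrees on a 0 to 360 circle (an assumption about the tracking data made for this sketch), a plain absolute difference treats 350 and 10 as 340 degrees apart rather than 20. The hypothetical helper `angular_difference` below shows one wrap-aware alternative; it is a sketch for comparison, not the computation used in the notebook's metric functions.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)

def mean_angular_difference(cb_angles, receiver_angles):
    """Mean wrap-aware difference over the frames of one play."""
    diffs = [angular_difference(a, b) for a, b in zip(cb_angles, receiver_angles)]
    return sum(diffs) / len(diffs)

wrapped = angular_difference(350.0, 10.0)   # crosses the 0/360 boundary
plain = angular_difference(90.0, 120.0)     # no wrap-around involved
play_mean = mean_angular_difference([350.0, 90.0], [10.0, 120.0])

print(wrapped, plain, play_mean)
```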

Note that these six metrics are computed only from plays in which the mean distance between the cornerback and the closest offensive player is below a given threshold, since this condition indicates that the cornerback is covering at least one offensive player.
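
A minimal sketch of this filtering step, using toy distances and a hypothetical function name (`distance_tracking`). Returning `None` when no play qualifies is a choice made for this sketch only.

```python
from statistics import mean

def distance_tracking(plays_closest_dists, threshold):
    """Mean of per-play mean distances, using only plays below the threshold."""
    play_means = [mean(d) for d in plays_closest_dists if d]
    covered = [m for m in play_means if m < threshold]
    # None signals that no play qualified, instead of silently returning NaN
    return mean(covered) if covered else None

# Per-frame distance to the closest offensive player in three toy plays
plays = [[2.0, 3.0, 4.0], [12.0, 14.0], [5.0, 7.0]]
result = distance_tracking(plays, threshold=10)

print(result)
```

In this toy input the second play has a mean distance of 13.0, so it is treated as uncovered and left out of the average.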

Coverage ability is computed as a weighted sum of these six metrics; the weights are assigned subjectively according to the importance of each metric and the nature of the event being considered.

The score used to evaluate a cornerback's performance is computed as a weighted sum of the personal and coverage abilities; the weights are assigned according to the importance of each ability and the nature of the event being considered.
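
The sketch below illustrates how the two weighted sums could be combined into one score. The metric values are toy numbers assumed to be already normalized to the range 0 to 1, the function names are hypothetical, and the weights are only examples. Coverage metrics are "lower is better", so the sketch inverts them before weighting.

```python
def weighted_sum(values, weights):
    return sum(w * v for w, v in zip(weights, values))

def overall_score(personal, coverage, personal_w, coverage_w, ability_w):
    # personal metrics: higher is better, used as-is
    personal_score = weighted_sum(personal, personal_w)
    # coverage metrics: lower is better, so invert the normalized values
    coverage_score = weighted_sum([1.0 - c for c in coverage], coverage_w)
    return ability_w[0] * personal_score + ability_w[1] * coverage_score

personal = [1.0, 0.8, 0.6]                    # agility, speed, acceleration
coverage = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]     # six coverage metrics
score = overall_score(personal, coverage,
                      personal_w=[0.5, 0.25, 0.25],
                      coverage_w=[0.3, 0.15, 0.15, 0.0, 0.2, 0.2],
                      ability_w=[0.3, 0.7])

print(round(score, 3))
```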

Here, we hypothesize that some cornerbacks are better than others at closely tracking receivers before the ball is thrown, while others are better while the ball is in the air. By combining the scores computed from both events, we may be able to pinpoint the best cornerbacks in the game.

We demonstrate the usefulness of the developed score through experiments on a small set of examples containing a few of the best cornerbacks in 2018 and a larger set containing the first 40 cornerbacks listed in 'players.csv'.

In [None]:
import numpy as np
import pandas as pd
import random
from scipy.spatial import distance
import matplotlib.pyplot as plt
import pickle
from prettytable import PrettyTable
import operator

### Functions used to collect tracking data from a set of plays for each cornerback 

For each cornerback, we selected a set of 100 plays as a representation of all plays for the cornerback.

Then the corresponding tracking data of each selected play is collected from the corresponding week.

The collected tracking data is filtered according to the event being considered: event type 0 covers the timestamps from the ball snap to the pass forward event (before the ball is thrown), and event type 1 covers the timestamps from the pass forward event to the next non-None event (while the ball is in the air).

The event-filtered data is then separated by player position (defense: CB, offense: WR and TE).
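
The event filtering can be pictured on a toy list of `(time, event)` pairs. The helper names and the toy event sequence below are assumptions made for illustration; the notebook's own functions operate on the tracking DataFrame.

```python
def frames_between(frames, start_event, end_event):
    """Return the frames from start_event through end_event, inclusive."""
    selected = []
    started = False
    for time, event in frames:
        if event == start_event:
            started = True
        if started:
            selected.append((time, event))
            if event == end_event:
                break
    return selected

def next_event_after(frames, event_name):
    """Name of the first non-None event that follows event_name, or None."""
    seen = False
    for _, event in frames:
        if seen and event != 'None':
            return event
        if event == event_name:
            seen = True
    return None

# Toy play: one frame per entry, 'None' marks frames without an event
frames = [(0, 'None'), (1, 'ball_snap'), (2, 'None'), (3, 'pass_forward'),
          (4, 'None'), (5, 'pass_arrived'), (6, 'None')]

before_pass = frames_between(frames, 'ball_snap', 'pass_forward')        # event type 0
in_air = frames_between(frames, 'pass_forward',
                        next_event_after(frames, 'pass_forward'))         # event type 1

print([t for t, _ in before_pass], [t for t, _ in in_air])
```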

In [None]:
def get_cb_players(n):
    cb_players_info = []
    players = pd.read_csv('../input/nfl-big-data-bowl-2021/players.csv')
    all_cb_playerids = list(players.loc[players.position == 'CB'].nflId.values)
    #print(len(all_cb_playerids))
    #sample_cb_playerids = random.sample(all_cb_playerids, n)
    for cb_pid in all_cb_playerids[:n]:
        cb_players_info.append([float(cb_pid), players.loc[players.nflId == cb_pid].displayName.values[0]])
    return cb_players_info

def random_select_cb_plays(cb_id, n_wk, n):
    #in order to save time, we merely collect all plays involving a given cb from n_wk weeks (out of 17 weeks)
    all_cb_plays = []
    for i in range(1, n_wk + 1):
        wk_tracking_df = pd.read_csv('../input/nfl-big-data-bowl-2021/week%i.csv' %i)
        cb_wk_tracking_df = wk_tracking_df.loc[wk_tracking_df.nflId == cb_id]
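        #the tracking file has one row per frame, so keep one entry per unique play
        unique_plays = cb_wk_tracking_df[['gameId', 'playId']].drop_duplicates()
        for game_id, play_id in unique_plays.itertuples(index=False):
            all_cb_plays.append([i, game_id, play_id])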
    #randomly select n plays
    #print(len(all_cb_plays))
    if n > len(all_cb_plays):
        return all_cb_plays
    else:
        cb_sample_plays = random.sample(all_cb_plays, n)
        return cb_sample_plays

def collect_tracking_data_in_sel_play(sel_play):
    wk_tracking_df = pd.read_csv('../input/nfl-big-data-bowl-2021/week%i.csv' %sel_play[0])
    play_tracking_data = wk_tracking_df.loc[(wk_tracking_df.gameId == sel_play[1]) & (wk_tracking_df.playId == sel_play[2])]
    return play_tracking_data


def get_events_in_play(play_tracking_data):
    events_in_play = play_tracking_data.loc[(play_tracking_data.event != 'None')]
    return list(events_in_play.event.unique())

def get_all_timestamps_in_play(play_tracking_data):
    timestamps = list(play_tracking_data.time.unique())
    tes = []
    for ts in timestamps:
        e = play_tracking_data.loc[play_tracking_data.time == ts].event.values[0]
        tes.append([ts, e])
    return tes

def collect_timestamps_btw_events(tes, e1, e2):
    timestamps_btw_events = []
    flag = False
    for i in range(len(tes)):
        if tes[i][1] == e1:
            flag = True
            timestamps_btw_events.append(tes[i][0])
        elif tes[i][1] == e2:
            timestamps_btw_events.append(tes[i][0])
            break
        elif flag:
            timestamps_btw_events.append(tes[i][0])
        else:
            continue
    return timestamps_btw_events

# event_type:
# 0: ball_snap-->pass_forward (before passing)
# 1: pass_forward-->next-non-None-event (ball in the air)
def collect_tracking_data_btw_events_in_play(play_tracking_data, event_type):
    events = get_events_in_play(play_tracking_data)
    tes = get_all_timestamps_in_play(play_tracking_data)
    timestamps_btw_event = collect_timestamps_btw_events(tes, events[event_type], events[event_type + 1])
    play_tracking_data_btw_events = play_tracking_data.loc[play_tracking_data.time.isin(timestamps_btw_event)]
    return play_tracking_data_btw_events

def select_play_tracking_data_btw_events_by_positions(play_tracking_data_btw_events, positions):
    sel_play_tracking_btw_events = play_tracking_data_btw_events.loc[play_tracking_data_btw_events.position.isin(positions)]
    return sel_play_tracking_btw_events

def get_players_tracking_data_btw_events(sel_play_tracking_data_btw_events):
    df = sel_play_tracking_data_btw_events
    stats = ['x', 'y', 's', 'a', 'dis', 'o', 'dir']
    unique_player_ids = list(df.nflId.unique())
    unique_ts = list(df.time.unique())
    players_tracking_data_btw_events = {}
    for pid in unique_player_ids:
        dat = {stat: [] for stat in stats}
        player_df = df.loc[df.nflId == pid]
        for ts in unique_ts:
            #first row of this player at this timestamp (raises IndexError if the frame is missing)
            row = player_df.loc[player_df.time == ts].iloc[0]
            for stat in stats:
                dat[stat].append(row[stat])
        players_tracking_data_btw_events[pid] = dat
    return players_tracking_data_btw_events

### Functions used to identify the closest offensive player and the closest other cornerback at each timestamp during a play of a given cornerback
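
As a small illustration of the per-timestamp nearest-player search, the sketch below works on toy `(x, y)` positions with hypothetical player labels. It uses `math.dist` (Python 3.8+) rather than the SciPy call used in the notebook.

```python
import math

def closest_player_per_frame(cb_xy, offense_xy):
    """For each frame, return (player_id, distance) of the nearest offensive player."""
    closest = []
    for i, cb_pos in enumerate(cb_xy):
        dists = {pid: math.dist(cb_pos, path[i]) for pid, path in offense_xy.items()}
        pid = min(dists, key=dists.get)
        closest.append((pid, dists[pid]))
    return closest

# Two frames of toy positions for one cornerback and two offensive players
cb_xy = [(10.0, 20.0), (11.0, 20.0)]
offense_xy = {
    'WR1': [(13.0, 24.0), (20.0, 20.0)],
    'TE1': [(10.0, 30.0), (11.0, 23.0)],
}
closest = closest_player_per_frame(cb_xy, offense_xy)

print(closest)
```

Note that the nearest player is chosen independently at every frame, so the "responsible" offensive player can change during a play, as it does in this toy example.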

In [None]:
def distances_btw_players_btw_events(x1_btw_events, y1_btw_events, x2_btw_events, y2_btw_events):
    dists_btw_events = []
    for i in range(len(x1_btw_events)):
        dists_btw_events.append(distance.euclidean([y1_btw_events[i], x1_btw_events[i]], [y2_btw_events[i], x2_btw_events[i]]))
    return dists_btw_events

#identify the closest offensive player at each timestamp for a given cb player
def identify_cb_cls_offense(db_tracking_data_btw_events, off_tracking_data_btw_events, cb_id):
    cls_offenseids = []
    cls_offensedists = []
    cb_xs = db_tracking_data_btw_events[cb_id]['x']
    cb_ys = db_tracking_data_btw_events[cb_id]['y']
    off_pids = list(off_tracking_data_btw_events.keys())
    all_off_dists = []
    for o_pid in off_pids:
        o_xs = off_tracking_data_btw_events[o_pid]['x']
        o_ys = off_tracking_data_btw_events[o_pid]['y']
        all_off_dists.append(distances_btw_players_btw_events(cb_xs, cb_ys, o_xs, o_ys))
    for i in range(len(cb_xs)):
        off_dists_at_i = []
        for j in range(len(off_pids)):
            off_dists_at_i.append(all_off_dists[j][i])
        cls_offensedists.append(min(off_dists_at_i))
        cls_offenseids.append(off_pids[np.argmin(off_dists_at_i)])
    return {'cls_ids': cls_offenseids, 'cls_dists': cls_offensedists}

def identify_cb_cls_defense(db_tracking_data_btw_events, cb_id):
    cls_defenseids = []
    cls_defensedists = []
    cb_xs = db_tracking_data_btw_events[cb_id]['x']
    cb_ys = db_tracking_data_btw_events[cb_id]['y']
    db_pids = list(db_tracking_data_btw_events.keys())
    db_pids.remove(cb_id)
    all_db_dists = []
    for db_pid in db_pids:
        db_xs = db_tracking_data_btw_events[db_pid]['x']
        db_ys = db_tracking_data_btw_events[db_pid]['y']
        all_db_dists.append(distances_btw_players_btw_events(cb_xs, cb_ys, db_xs, db_ys))
    for i in range(len(cb_xs)):
        db_dists_at_i = []
        for j in range(len(db_pids)):
            db_dists_at_i.append(all_db_dists[j][i])
        cls_defensedists.append(min(db_dists_at_i))
        cls_defenseids.append(db_pids[np.argmin(db_dists_at_i)])
    return {'cls_ids': cls_defenseids, 'cls_dists': cls_defensedists}

def collect_stat_of_cls_player(players_tracking_data, cb_cls_player_in_play, stat):
    stat_dat = []
    for i in range(len(cb_cls_player_in_play['cls_ids'])):
        cls_id = cb_cls_player_in_play['cls_ids'][i]
        stat_dat.append(players_tracking_data[cls_id][stat][i])
    return stat_dat

### Functions used to collect tracking data corresponding to a given cornerback and his closest offensive player during the specified event within each selected play

In [None]:
def collect_cb_tracking_data_of_sel_plays(cb_id, n_wk, n, event_type):
    dat = {'xs':[], 'ys':[], 'ss':[], 'as':[], 'diss':[], 'os':[], 'dirs':[], 'o_cls_xs':[], 'o_cls_ys':[], 'o_cls_ss':[], 'o_cls_as':[], 'o_cls_diss':[],
          'o_cls_os':[], 'o_cls_dirs':[], 'o_cls_dists':[]}
           #'d_cls_xs':[], 'd_cls_ys':[], 'd_cls_ss':[], 'd_cls_as':[], 'd_cls_diss':[], 'd_cls_os':[],'d_cls_dirs':[], 'd_cls_dists':[]}
    #get sample plays
    cb_sample_plays = random_select_cb_plays(cb_id, n_wk, n)
    for play in cb_sample_plays:
        #print('%s %d' %(play[0], play[1]))
        stats = ['x', 'y', 's', 'a', 'dis', 'o', 'dir']
        try:
            #collect tracking data in a selected play
            play_tracking_data = collect_tracking_data_in_sel_play(play)
            #collect tracking data between events
            play_tracking_data_btw_events = collect_tracking_data_btw_events_in_play(play_tracking_data, event_type)

            db_play_tracking_btw_events = select_play_tracking_data_btw_events_by_positions(play_tracking_data_btw_events, ['CB'])
            off_play_tracking_btw_events = select_play_tracking_data_btw_events_by_positions(play_tracking_data_btw_events, ['WR', 'TE'])
            db_players_tracking_data = get_players_tracking_data_btw_events(db_play_tracking_btw_events)
            off_players_tracking_data = get_players_tracking_data_btw_events(off_play_tracking_btw_events)

            cb_stats = db_players_tracking_data[cb_id]
            cb_cls_offense_in_play = identify_cb_cls_offense(db_players_tracking_data, off_players_tracking_data, cb_id)
            cls_stats = {stat: collect_stat_of_cls_player(off_players_tracking_data, cb_cls_offense_in_play, stat)
                         for stat in stats}
        except Exception:
            #skip plays with missing events, frames, or players
            continue
        #append only after every step succeeded, so the per-play lists stay aligned
        for stat in stats:
            dat[stat + 's'].append(cb_stats[stat])
            dat['o_cls_' + stat + 's'].append(cls_stats[stat])
        dat['o_cls_dists'].append(cb_cls_offense_in_play['cls_dists'])
    return dat

### Functions used to compute metrics for evaluating cornerback's performance
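
To make the pursuit metric concrete, the sketch below computes its per-play value on toy distances: the smallest distance to the closest offensive player multiplied by the frame index at which it is reached. The function name `pursuit_value` is hypothetical, and returning `None` for a skipped play is a choice made for this sketch.

```python
def pursuit_value(closest_dists):
    """Smallest gap times the frame index at which it occurs (lower is better)."""
    min_dist = min(closest_dists)
    frame = closest_dists.index(min_dist)
    if frame == 0:
        # the gap was already smallest at the first frame, so nothing was closed
        return None
    return min_dist * frame

closing_play = pursuit_value([6.0, 4.0, 1.5, 2.0])   # gap closes to 1.5 at frame 2
skipped_play = pursuit_value([1.0, 2.0])             # smallest gap at frame 0

print(closing_play, skipped_play)
```

The product rewards both a small final gap and reaching it quickly, at the cost of mixing a distance and a frame count into a single number.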

In [None]:
#higher is better
def max_agility(cb_dat):
    max_agi = []
    plays_dirs = cb_dat['dirs']
    for dir_p in plays_dirs:
        try:
            diff_dirs = []
            for i in range(1, len(dir_p)):
                diff_dirs.append(abs(dir_p[i-1] - dir_p[i]))
            max_agi.append(max(diff_dirs))
        except:
            continue
    return np.mean(max_agi)

#higher is better
def max_speed(cb_dat):
    max_spd = []
    plays_ss = cb_dat['ss']
    for s_p in plays_ss:
        max_spd.append(max(s_p))
    return np.mean(max_spd)

#higher is better
def max_acceleration(cb_dat):
    max_acc = []
    plays_as = cb_dat['as']
    for a_p in plays_as:
        try:
            diff_as = []
            for i in range(1, len(a_p)):
                diff_as.append(a_p[i-1] - a_p[i])
            max_acc.append(max(diff_as))
        except:
            continue
    return np.mean(max_acc)

#lower is better
def mean_cls_dist(cb_dat, threshold):
    mean_dist = []
    plays_cls_dists = cb_dat['o_cls_dists']
    for cls_dist_p in plays_cls_dists:
        m_cls_dist_p = np.mean(cls_dist_p)
        if m_cls_dist_p < threshold:
            mean_dist.append(m_cls_dist_p)
    return np.mean(mean_dist)

#lower is better
def matching_cls_speed(cb_dat, threshold):
    mean_diff_ss = []
    plays_cls_dists = cb_dat['o_cls_dists']
    plays_ss = cb_dat['ss']
    plays_cls_ss = cb_dat['o_cls_ss']
    for i in range(len(plays_cls_dists)):
        m_cls_dist_p = np.mean(plays_cls_dists[i])
        if m_cls_dist_p < threshold:
            try:
                diff_ss = []
                s_p = plays_ss[i]
                cls_s_p = plays_cls_ss[i]
                for j in range(len(s_p)):
                    diff_ss.append(abs(s_p[j] - cls_s_p[j]))
                mean_diff_ss.append(np.mean(diff_ss))
            except:
                continue
    return np.mean(mean_diff_ss)

#lower is better
def matching_cls_acceleration(cb_dat, threshold):
    mean_diff_as = []
    plays_cls_dists = cb_dat['o_cls_dists']
    plays_as = cb_dat['as']
    plays_cls_as = cb_dat['o_cls_as']
    for i in range(len(plays_cls_dists)):
        m_cls_dist_p = np.mean(plays_cls_dists[i])
        if m_cls_dist_p < threshold:
            try:
                diff_as = []
                a_p = plays_as[i]
                cls_a_p = plays_cls_as[i]
                for j in range(len(a_p)):
                    diff_as.append(abs(a_p[j] - cls_a_p[j]))
                mean_diff_as.append(np.mean(diff_as))
            except:
                continue
    return np.mean(mean_diff_as)

#lower is better
def matching_cls_orientation(cb_dat, threshold):
    mean_diff_os = []
    plays_cls_dists = cb_dat['o_cls_dists']
    plays_os = cb_dat['os']
    plays_cls_os = cb_dat['o_cls_os']
    for i in range(len(plays_cls_dists)):
        m_cls_dist_p = np.mean(plays_cls_dists[i])
        if m_cls_dist_p < threshold:
            try:
                diff_os = []
                o_p = plays_os[i]
                cls_o_p = plays_cls_os[i]
                for j in range(len(o_p)):
                    diff_os.append(abs(o_p[j] - cls_o_p[j]))
                mean_diff_os.append(np.mean(diff_os))
            except:
                continue
    return np.mean(mean_diff_os)

#lower is better
def matching_cls_direction(cb_dat, threshold):
    mean_diff_dirs = []
    plays_cls_dists = cb_dat['o_cls_dists']
    plays_dirs = cb_dat['dirs']
    plays_cls_dirs = cb_dat['o_cls_dirs']
    for i in range(len(plays_cls_dists)):
        m_cls_dist_p = np.mean(plays_cls_dists[i])
        if m_cls_dist_p < threshold:
            try:
                diff_dirs = []
                dir_p = plays_dirs[i]
                cls_dir_p = plays_cls_dirs[i]
                for j in range(len(dir_p)):
                    diff_dirs.append(abs(dir_p[j] - cls_dir_p[j]))
                mean_diff_dirs.append(np.mean(diff_dirs))
            except:
                continue
    return np.mean(mean_diff_dirs)

#lower is better
def pursuit(cb_dat, threshold):
    min_dists = []
    plays_cls_dists = cb_dat['o_cls_dists']
    for cls_dist_p in plays_cls_dists:
        m_cls_dist_p = np.mean(cls_dist_p)
        if m_cls_dist_p < threshold:
            min_dist = min(cls_dist_p)
            ts = np.argmin(cls_dist_p)
            if ts == 0:
                continue
            min_dists.append(min_dist * ts)
    return np.mean(min_dists)

### Weighted sum of the above metrics to measure personal and coverage abilities of a cornerback and weighted sum of both abilities to produce the overall score
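
Before the weighted sums are taken, each metric is divided by its largest value across cornerbacks. A minimal pure-Python sketch of that column-wise normalization, with a hypothetical helper name and toy numbers:

```python
def normalize_by_column_max(rows):
    """Divide every column by its maximum so the largest entry becomes 1.0."""
    col_max = [max(col) for col in zip(*rows)]
    return [[v / m for v, m in zip(row, col_max)] for row in rows]

# Three toy cornerbacks (rows) and two toy metrics (columns)
rows = [[20.0, 8.0], [10.0, 4.0], [5.0, 6.0]]
normed = normalize_by_column_max(rows)

print(normed)
```

Dividing by the maximum maps the largest value to 1 but does not map the smallest value to 0, so when the values of a "lower is better" metric are close together, `1 - value` produces scores that are all small and close together as well.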

In [None]:
def compute_cb_metrics(cb_ids, cb_dats, t):
    cbs_metrics = []
    for cb_id in cb_ids:
        cb_m = []
        #personal metrics
        cb_m.append(max_agility(cb_dats[cb_id]))
        cb_m.append(max_speed(cb_dats[cb_id]))
        cb_m.append(max_acceleration(cb_dats[cb_id]))
        
        #coverage metrics
        cb_m.append(mean_cls_dist(cb_dats[cb_id], t))
        cb_m.append(matching_cls_speed(cb_dats[cb_id], t))
        cb_m.append(matching_cls_acceleration(cb_dats[cb_id], t))
        cb_m.append(matching_cls_orientation(cb_dats[cb_id], t))
        cb_m.append(matching_cls_direction(cb_dats[cb_id], t))
        cb_m.append(pursuit(cb_dats[cb_id], t))
        
        cbs_metrics.append(cb_m)
    
    #normalize data for each column
    cbs_metrics = np.array(cbs_metrics)
    normed_cbs_metrics = cbs_metrics/cbs_metrics.max(axis=0)
    return normed_cbs_metrics

def cb_personal_score(normed_cb_metrics, w):
    cb_personal_metrics = normed_cb_metrics[:3]
    score = 0.0
    for i in range(len(cb_personal_metrics)):
        score += (w[i] * cb_personal_metrics[i])
    return score

def cb_coverage_score(normed_cb_metrics, w):
    cb_coverage_metrics = normed_cb_metrics[3:]
    score = 0.0
    for i in range(len(cb_coverage_metrics)):
        score += (w[i] * (1.0 - cb_coverage_metrics[i]))
    return score

def cb_overall_score(personal_score, coverage_score, w):
    #weighted sum of the two abilities (the coverage score is multiplied by its weight)
    score = (w[0] * personal_score) + (w[1] * coverage_score)
    return score

### Functions used to visualize experimental results, which include comparison of cornerbacks' different metrics on both events, scatter plot illustrating relationship between personal and coverage abilities, etc.

In [None]:
def compare_personal_metrics(cb_names, cbs_metrics_bfr_pass, cbs_metrics_in_air):
    titles = ['Agility', 'Speed', 'Acceleration']
    fig, axs = plt.subplots(1, 3, figsize=(20,7))
    fig.suptitle('Compare personal metrics (higher is better)')
    X = np.arange(len(cb_names))
    width = 0.35
    #personal metrics occupy columns 0..2 of the metrics array
    for col, (ax, title) in enumerate(zip(axs, titles)):
        ax.bar(X-width/2, list(cbs_metrics_bfr_pass[:, col]), width, label='bfr pass')
        ax.bar(X+width/2, list(cbs_metrics_in_air[:, col]), width, label='in air')
        ax.set_title(title)
        ax.set_xticks(X)
        ax.set_xticklabels(cb_names, rotation = 45)
        ax.legend()
    plt.show()

def compare_coverage_metrics(cb_names, cbs_metrics_bfr_pass, cbs_metrics_in_air):
    titles = ['Closest distance', 'Matching speed', 'Matching acceleration',
              'Matching orientation', 'Matching direction', 'Pursuit']
    fig, axs = plt.subplots(2, 3, sharex=True, figsize=(20, 14))
    fig.suptitle('Compare coverage metrics (lower is better)')
    X = np.arange(len(cb_names))
    width = 0.35
    #coverage metrics occupy columns 3..8 of the metrics array
    for k, (ax, title) in enumerate(zip(axs.flat, titles)):
        col = k + 3
        ax.bar(X-width/2, list(cbs_metrics_bfr_pass[:, col]), width, label='bfr pass')
        ax.bar(X+width/2, list(cbs_metrics_in_air[:, col]), width, label='in air')
        ax.set_title(title)
        ax.set_xticks(X)
        ax.set_xticklabels(cb_names, rotation = 45)
        ax.legend()

    plt.show()

def compare_abilities(cb_names, personal_scores_bfr_pass, personal_scores_in_air, coverage_scores_bfr_pass, coverage_scores_in_air):
    fig, axs = plt.subplots(1, 2, figsize=(20,10))
    fig.suptitle('Compare personal and coverage abilities of CB for different events')
    axs[0].scatter(personal_scores_bfr_pass, coverage_scores_bfr_pass)
    for j, txt in enumerate(cb_names):
        axs[0].annotate(txt, (personal_scores_bfr_pass[j], coverage_scores_bfr_pass[j]))
    axs[0].set_title('event: before the ball is thrown')
    axs[0].set_xlabel('Personal ability')
    axs[0].set_ylabel('Coverage ability')
    
    axs[0].set_xlim([0.0, 1.0])
    axs[0].set_ylim([0.0, 1.0])
    
    axs[1].scatter(personal_scores_in_air, coverage_scores_in_air, color='orange')
    for j, txt in enumerate(cb_names):
        axs[1].annotate(txt, (personal_scores_in_air[j], coverage_scores_in_air[j]))
    axs[1].set_title('event: ball is in the air')
    axs[1].set_xlabel('Personal ability')
    axs[1].set_ylabel('Coverage ability')
    
    
    axs[1].set_xlim([0.0, 1.0])
    axs[1].set_ylim([0.0, 1.0])
    
    plt.show()

### Experiment with a small set of examples containing a few of the best cornerbacks of the 2018 NFL season (here, we randomly select only 50 plays for each cornerback to save processing time)

In [None]:
cb_players_info = get_cb_players(100)
#print(cb_players_info)
cb_ids = [cb_info[0] for cb_info in cb_players_info]
n_wk = 5
n = 50

example_cb_idx = [35, 42, 58, 59]
example_cbs = [cb_players_info[idx] for idx in example_cb_idx]
example_cb_ids = [example_cbs[i][0] for i in range(len(example_cbs))]
example_cb_names = [example_cbs[i][1] for i in range(len(example_cbs))]
print(example_cb_names)

Collect the tracking data of the selected CBs for both events.

In [None]:
cb_dats_bfr_pass = {}
cb_dats_in_air = {}
for cb_id in example_cb_ids:
    print(cb_id)
    cb_dats_bfr_pass[cb_id] = collect_cb_tracking_data_of_sel_plays(cb_id, n_wk, n, 0)
    cb_dats_in_air[cb_id] = collect_cb_tracking_data_of_sel_plays(cb_id, n_wk, n, 1)

Assign the closest-distance threshold and compute all metrics on both events for each CB.

In [None]:
t = 10
normed_cbs_metrics_bfr_pass = compute_cb_metrics(example_cb_ids, cb_dats_bfr_pass, t)
normed_cbs_metrics_in_air = compute_cb_metrics(example_cb_ids, cb_dats_in_air, t)

Plot a comparison of all metrics corresponding to the personal ability.

In [None]:
compare_personal_metrics(example_cb_names, normed_cbs_metrics_bfr_pass, normed_cbs_metrics_in_air)

Plot a comparison of all metrics corresponding to the coverage ability.

In [None]:
compare_coverage_metrics(example_cb_names, normed_cbs_metrics_bfr_pass, normed_cbs_metrics_in_air)

From these plots, we can see that Stephon Gilmore keeps the smallest distance to his responsible offensive player and also matches that player's direction fairly closely, which may indicate good awareness. He also closes the gap to the offensive player quickly (pursuit).

We can also see that a cornerback may perform differently from event to event. This suggests that a cornerback uses different coverage schemes and strategies during different phases of a play.

Here, we compute the personal ability and coverage ability from the metrics of each cornerback.
For simplicity, we use the same set of weights for both events (i.e., before the ball is thrown and while the ball is in the air).
Note that the weight of each metric is assigned subjectively; it can be changed to better reflect the importance of the corresponding metric in measuring the cornerback's ability.

In [None]:
pers_abi_w = [0.5, 0.25, 0.25]
cover_abi_w = [0.3, 0.15, 0.15, 0.0, 0.2, 0.2]

personal_scores_bfr_pass = []
coverage_scores_bfr_pass = []
personal_scores_in_air = []
coverage_scores_in_air = []
for i in range(len(example_cb_names)):
    personal_scores_bfr_pass.append(cb_personal_score(normed_cbs_metrics_bfr_pass[i], pers_abi_w))
    coverage_scores_bfr_pass.append(cb_coverage_score(normed_cbs_metrics_bfr_pass[i], cover_abi_w))
    personal_scores_in_air.append(cb_personal_score(normed_cbs_metrics_in_air[i], pers_abi_w))
    coverage_scores_in_air.append(cb_coverage_score(normed_cbs_metrics_in_air[i], cover_abi_w))
compare_abilities(example_cb_names, personal_scores_bfr_pass, personal_scores_in_air, coverage_scores_bfr_pass, coverage_scores_in_air)

Here, we compute the score used to evaluate the performance of each cornerback.
For simplicity, we use the same set of weights for both events (i.e., before the ball is thrown and while the ball is in the air).

In [None]:
abi_w = [0.3, 0.7]
scores_bfr_pass = []
scores_in_air = []
for i in range(len(example_cb_names)):
    score_bfr_pass = cb_overall_score(personal_scores_bfr_pass[i], coverage_scores_bfr_pass[i], abi_w)
    scores_bfr_pass.append(score_bfr_pass)
    score_in_air = cb_overall_score(personal_scores_in_air[i], coverage_scores_in_air[i], abi_w)
    scores_in_air.append(score_in_air)
    total_score = score_bfr_pass + score_in_air
    print('%s has scores: %.2f (before pass), %.2f (in the air), %.2f (total)' %(example_cb_names[i], score_bfr_pass, score_in_air, total_score))

The scores show that Stephon Gilmore is the best of these example cornerbacks both before the ball is thrown and while the ball is in the air, and therefore also in total score. Xavien Howard comes second in total score, as he is the second-best cornerback while the ball is in the air. Patrick Peterson is third in total score. Jalen Ramsey is fourth: although he is the second-best cornerback before the ball is thrown, he ranks last while the ball is in the air.

Plot showing the performance of the top cornerbacks with respect to their scores for the two events (before the ball is passed and while the ball is in the air).

In [None]:
plt.figure(figsize=(10, 10))
plt.scatter(scores_bfr_pass, scores_in_air)
for j, txt in enumerate(example_cb_names):
    plt.annotate(txt, (scores_bfr_pass[j], scores_in_air[j]))
plt.xlabel('Score before the ball is passed')
plt.ylabel('Score while the ball is in the air')
plt.show()

### To save time and resources, we use tracking data for the first 40 cornerbacks in 'players.csv' and, for each selected cornerback, 100 randomly selected plays from the first 5 weeks (out of 17 weeks) to compute the metrics and his score

That is, we assume that these 100 sample plays are a good representation of the whole population (all plays in which the cornerback took part).

Note that different sets of sample plays could lead to very different estimates of the metrics for a particular cornerback. To better capture the variance of the whole population, we should therefore use more samples to compute the metrics (ideally all plays), which should yield more consistent estimates of each metric.

The collected tracking data for the two events are stored separately using pickle: cb_tracking_dat_0.p holds the tracking data before the ball is thrown, and cb_tracking_dat_1.p holds the tracking data while the ball is in the air.
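
The effect of sample size can be illustrated with a small simulation. This is only a sketch on synthetic numbers: the `population` array stands in for the per-play values of one metric and is not drawn from the tracking data, and `spread_of_sample_means` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic per-play values of one metric; not real tracking data.
population = rng.normal(loc=5.0, scale=2.0, size=5000)

def spread_of_sample_means(values, sample_size, n_repeats=500):
    """Std. dev. of the sample mean across repeated random draws of plays."""
    means = [rng.choice(values, size=sample_size, replace=False).mean()
             for _ in range(n_repeats)]
    return float(np.std(means))

spread_100 = spread_of_sample_means(population, 100)
spread_1000 = spread_of_sample_means(population, 1000)
print('spread with 100 plays: %.3f, with 1000 plays: %.3f' % (spread_100, spread_1000))
```

The estimate based on 1000 plays varies much less from draw to draw than the one based on 100 plays, which is the consistency argument made above.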

In [None]:
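with open("../input/cb-tracking-data/cb_tracking_dat_0.p", "rb") as f:
    cb_tracking_dat_0 = pickle.load(f)  # tracking data before the ball is thrown
with open("../input/cb-tracking-data/cb_tracking_dat_1.p", "rb") as f:
    cb_tracking_dat_1 = pickle.load(f)  # tracking data while the ball is in the air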

In [None]:
first_40_cb_players_info = get_cb_players(40)
#print(cb_players_info)
first_40_cb_ids = [cb_info[0] for cb_info in first_40_cb_players_info]
first_40_cb_names = [cb_info[1] for cb_info in first_40_cb_players_info]
print(first_40_cb_ids)
print(first_40_cb_names)

Set the closest-distance threshold and compute all metrics on both events for each cornerback

In [None]:
t = 10
first_40_normed_cbs_metrics_bfr_pass = compute_cb_metrics(first_40_cb_ids, cb_tracking_dat_0, t)
first_40_normed_cbs_metrics_in_air = compute_cb_metrics(first_40_cb_ids, cb_tracking_dat_1, t)

Select 5 players and plot a comparison of all metrics corresponding to the personal ability

In [None]:
rand_cbs_idx = [0, 2, 9, 17, 20]
rand_cb_ids = [first_40_cb_players_info[idx][0] for idx in rand_cbs_idx]
rand_cb_names = [first_40_cb_players_info[idx][1] for idx in rand_cbs_idx]
print(rand_cb_names)

In [None]:
rand_normed_cbs_metrics_bfr_pass = []
rand_normed_cbs_metrics_in_air = []
for idx in rand_cbs_idx:
    rand_normed_cbs_metrics_bfr_pass.append(list(first_40_normed_cbs_metrics_bfr_pass[idx]))
    rand_normed_cbs_metrics_in_air.append(list(first_40_normed_cbs_metrics_in_air[idx]))
compare_personal_metrics(rand_cb_names, np.array(rand_normed_cbs_metrics_bfr_pass), np.array(rand_normed_cbs_metrics_in_air))

Plot a comparison of all metrics corresponding to the coverage ability

In [None]:
compare_coverage_metrics(rand_cb_names, np.array(rand_normed_cbs_metrics_bfr_pass), np.array(rand_normed_cbs_metrics_in_air))

Interestingly, all coverage-ability metrics of Damarious Randall (except orientation) decrease significantly from before the ball is thrown to while the ball is in the air. Such a steep decline probably indicates that this player usually changes coverage scheme from man (before the ball is thrown) to zone (while the ball is in the air).

Here, we compute the personal ability and coverage ability from the metrics of each cornerback. For simplicity, we use the same set of weights for both events (i.e., before the ball is thrown and while the ball is in the air). Note that the weight of each metric is assigned subjectively; it can be changed to better reflect the importance of the corresponding metric in measuring the cornerback's ability.

In [None]:
f40_pers_abi_w = [0.5, 0.25, 0.25]
f40_cover_abi_w = [0.3, 0.15, 0.15, 0.0, 0.2, 0.2]

first_40_personal_scores_bfr_pass = []
first_40_coverage_scores_bfr_pass = []
first_40_personal_scores_in_air = []
first_40_coverage_scores_in_air = []
for i in range(len(first_40_cb_names)):
    first_40_personal_scores_bfr_pass.append(cb_personal_score(first_40_normed_cbs_metrics_bfr_pass[i], f40_pers_abi_w))
    first_40_coverage_scores_bfr_pass.append(cb_coverage_score(first_40_normed_cbs_metrics_bfr_pass[i], f40_cover_abi_w))
    first_40_personal_scores_in_air.append(cb_personal_score(first_40_normed_cbs_metrics_in_air[i], f40_pers_abi_w))
    first_40_coverage_scores_in_air.append(cb_coverage_score(first_40_normed_cbs_metrics_in_air[i], f40_cover_abi_w))
compare_abilities(first_40_cb_names, first_40_personal_scores_bfr_pass, first_40_personal_scores_in_air, 
                  first_40_coverage_scores_bfr_pass, first_40_coverage_scores_in_air)
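
Because the weights are subjective, it can help to see how a different choice shifts an ability score. The sketch below assumes the ability is a weighted sum of normalized metrics and that the personal weights are ordered as (agility, speed, acceleration); both are assumptions, and `weighted_ability` is a hypothetical helper rather than the notebook's `cb_personal_score`.

```python
import numpy as np

def weighted_ability(normed_metrics, weights):
    """Hypothetical weighted sum of normalized metrics; weights must sum to 1."""
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError('weights should sum to 1, got %.3f' % weights.sum())
    return float(np.dot(normed_metrics, weights))

# Made-up normalized (agility, speed, acceleration) values for one cornerback.
metrics = [0.8, 0.4, 0.6]
agility_heavy = weighted_ability(metrics, [0.5, 0.25, 0.25])
speed_heavy = weighted_ability(metrics, [0.25, 0.5, 0.25])
print('%.2f (agility-heavy), %.2f (speed-heavy)' % (agility_heavy, speed_heavy))
```

Checking that the weights sum to 1 keeps scores comparable across different weight choices; both weight lists used above, `f40_pers_abi_w` and `f40_cover_abi_w`, satisfy this.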

From these plots, we can roughly classify cornerbacks into 4 types with respect to their abilities:
- cornerbacks with high personal ability but low coverage ability (e.g., Rasul Douglas (before pass), Ronald Darby (before pass), Damarious Randall (in the air), Leshaun Sims (in the air)); these players may be suited to zone coverage
- cornerbacks with low personal ability but high coverage ability (e.g., Malcolm Butler, Marlon Humphrey, Robert Alford, Anthony Averett (in the air)); these players may have high awareness, compensating for their low personal ability with extensive experience in man coverage
- cornerbacks with high personal and coverage abilities (e.g., Anthony Averett (before pass), Artie Burns (in the air)); these players are good at man coverage
- cornerbacks with low personal and coverage abilities (e.g., Xavien Howard, Kareem Jackson); these players may be deployed mainly in zone coverage

From these plots, we can also infer the coverage scheme that a cornerback regularly plays during a given event within a play:
- Damarious Randall regularly plays man coverage before the ball is thrown and then changes to zone coverage while the ball is in the air
- Rasul Douglas, Artie Burns, Ronald Darby, etc. regularly play zone coverage before the ball is thrown and then change to man coverage while the ball is in the air
- Kareem Jackson, Johnathan Joseph, etc. regularly play zone coverage during both events
- Robert Alford, Marlon Humphrey, Malcolm Butler, Anthony Averett, etc. regularly play man coverage during both events
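
The four-way grouping above was read off the plots by eye. One possible way to make it reproducible is a median split on each ability, sketched below. The median threshold, the helper `ability_quadrants`, and the placeholder names and scores are all assumptions introduced for illustration.

```python
import numpy as np

def ability_quadrants(names, personal, coverage):
    """Label each player by whether each ability is above the group median."""
    p_cut, c_cut = np.median(personal), np.median(coverage)
    labels = {}
    for name, p, c in zip(names, personal, coverage):
        p_level = 'high' if p > p_cut else 'low'
        c_level = 'high' if c > c_cut else 'low'
        labels[name] = '%s personal / %s coverage' % (p_level, c_level)
    return labels

# Placeholder players and made-up scores (not taken from the notebook's results).
demo = ability_quadrants(['CB A', 'CB B', 'CB C', 'CB D'],
                         [0.9, 0.2, 0.8, 0.1],
                         [0.3, 0.7, 0.9, 0.2])
for name, label in demo.items():
    print(name, '->', label)
```

Running the same function separately on the before-pass and in-the-air scores would show which players switch quadrant between events, which is the pattern used above to guess at man versus zone coverage.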

Compute the score for each event in order to rank the cornerbacks who perform best during each event and across both events

In [None]:
f40_abi_w = [0.3, 0.7]
bfr_pass_scores_dict = {}
in_air_scores_dict = {}
total_scores_dict = {}
for i in range(len(first_40_cb_names)):
    score_bfr_pass = cb_overall_score(first_40_personal_scores_bfr_pass[i], first_40_coverage_scores_bfr_pass[i], f40_abi_w)
    score_in_air = cb_overall_score(first_40_personal_scores_in_air[i], first_40_coverage_scores_in_air[i], f40_abi_w)
    bfr_pass_scores_dict[first_40_cb_names[i]] = score_bfr_pass
    in_air_scores_dict[first_40_cb_names[i]] = score_in_air
    total_scores_dict[first_40_cb_names[i]] = score_bfr_pass + score_in_air
# Rank cornerbacks by score, highest first
sorted_bfr_pass_scores_dict = sorted(bfr_pass_scores_dict.items(), key=operator.itemgetter(1), reverse=True)
sorted_in_air_scores_dict = sorted(in_air_scores_dict.items(), key=operator.itemgetter(1), reverse=True)
sorted_total_scores_dict = sorted(total_scores_dict.items(), key=operator.itemgetter(1), reverse=True)
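
The ranking step can be wrapped in a small helper so that the same logic serves all three tables. `top_n` and the demo scores are hypothetical and shown only as a sketch of the sorting pattern used above.

```python
import operator

def top_n(scores_by_name, n=10):
    """Return up to n (name, score) pairs, highest score first."""
    ranked = sorted(scores_by_name.items(), key=operator.itemgetter(1), reverse=True)
    return ranked[:n]

# Placeholder names and made-up scores.
demo_scores = {'CB A': 0.42, 'CB B': 0.77, 'CB C': 0.58}
best_two = top_n(demo_scores, n=2)
print(best_two)
```

Slicing with `ranked[:n]` also works when fewer than `n` players are available. One caveat of keying the dictionaries by name is that two cornerbacks sharing a name would overwrite each other; keying by player id would avoid that.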

Top 10 cornerbacks (out of 40) before the ball is passed

In [None]:
t_bfr_pass = PrettyTable(['Name', 'Score'])
for i in range(10):
    t_bfr_pass.add_row([sorted_bfr_pass_scores_dict[i][0], sorted_bfr_pass_scores_dict[i][1]])
print(t_bfr_pass)

Top 10 cornerbacks (out of 40) while the ball is in the air

In [None]:
t_in_air = PrettyTable(['Name', 'Score'])
for i in range(10):
    t_in_air.add_row([sorted_in_air_scores_dict[i][0], sorted_in_air_scores_dict[i][1]])
print(t_in_air)

Plot the performance of the 40 cornerbacks in terms of their scores for the two events (before the ball is passed and while the ball is in the air)

In [None]:
bfr_pass_scores_by_name = [bfr_pass_scores_dict[n] for n in first_40_cb_names]
in_air_scores_by_name = [in_air_scores_dict[n] for n in first_40_cb_names]

plt.figure(figsize=(10, 10))
plt.scatter(bfr_pass_scores_by_name, in_air_scores_by_name)
for j, txt in enumerate(first_40_cb_names):
    plt.annotate(txt, (bfr_pass_scores_by_name[j], in_air_scores_by_name[j]))
plt.xlabel('Score before the ball is passed')
plt.ylabel('Score while the ball is in the air')
plt.show()

Top 10 cornerbacks (out of 40) for both events

In [None]:
t_total = PrettyTable(['Name', 'Score'])
for i in range(10):
    t_total.add_row([sorted_total_scores_dict[i][0], sorted_total_scores_dict[i][1]])
print(t_total)

**As mentioned earlier, the metrics for each cornerback were computed from only 100 sample plays, which leaves much of the variance of the whole population unexplained.**

**We believe that computing these metrics from a larger number of samples should yield a score that better reflects the true performance of a cornerback.**