<h3 style='background:green; border:0; color:white'><center>Quick Navigation</center></h3>

## TABLE OF CONTENTS
* [Introduction](#one)
* [Landing areas prediction](#two)
    - [Data Preparation](#a)
    - [Model Building](#b)
        - [Random Forest](#a)
        - [Neural Network](#b)
    - [Model Prediction](#c)
* [Ranking returners](#three)
* [Dynamic Voronoi](#four)
* [Conclusion and future work](#five)

American football is the most popular sport in the USA, with nearly 96.4 million viewers each year watching this great and exciting event. Many fans and data scientists study it. 
Special teams have rarely been a research topic in past years. With more and more data available, ordinary people like us can now analyze them. 
Special teams are responsible for **creating points, as well as protecting points and field position**, so they are well worth researching.
# **<font color=#ff0000 size=4 face="黑体"> “ You are what your record says you are.” </font>** 
>                             - Former NFL head coach Bill Parcells

In this notebook, we explore where the football lands after a punt or kickoff. We build several machine learning models to tackle this problem, and then we rank the best returners and draw Voronoi areas, an important pitch control feature. There are many excellent notebooks, models and participants in this Kaggle competition. We hope that after reading this notebook you will know how to tackle these problems:
<h5>
    <font color=#000000 size=2>
1.     Where will the football land?<br>
2.     Predicting with Machine Learning and NN. <br>
3.     How to rank the Returners? <br>
4.     How to draw dynamic Voronoi area?
    </font>
</h5>

# Part I : Where will the football land?

The first part of this notebook gives coaches and others a new angle on predicting where the ball lands after a punt or kickoff. We divide the pitch into the areas A1, A2, B1, B2, C1, C2, D1, D2, E1, E2 and F (the end zone, split into F1 and F2); together with an out-of-bounds class, this gives the 13 classes used below. The following code and picture show the layout.

In [None]:
#Reading Data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
pd.set_option("display.max_columns",None)

df_games = pd.read_csv('../input/nfl-big-data-bowl-2022/games.csv')
df_plays = pd.read_csv('../input/nfl-big-data-bowl-2022/plays.csv')
df_players = pd.read_csv('../input/nfl-big-data-bowl-2022/players.csv')
df_pff = pd.read_csv('../input/nfl-big-data-bowl-2022/PFFScoutingData.csv')

def downcast(df, verbose=True):
    start_mem = df.memory_usage().sum() / 1024**2
    for col in df.columns:
        dtype_name = df[col].dtype.name
        if dtype_name == 'object':
            pass
        elif dtype_name == 'bool':
            df[col] = df[col].astype('int8')
        elif dtype_name.startswith('int') or (df[col].round() == df[col]).all():
            df[col] = pd.to_numeric(df[col], downcast='integer')
        else:
            df[col] = pd.to_numeric(df[col], downcast='float')
    end_mem = df.memory_usage().sum() / 1024**2
    if verbose:
        print('{:.1f}% Compressed'.format(100 * (start_mem - end_mem) / start_mem))    
    return df

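seasons = ["2018", "2019", "2020"]
# Collect the per-season frames and concatenate once (DataFrame.append was removed in pandas 2.0)
tracking_frames = []
for s in seasons:
    df_trackingTemp = pd.read_csv("../input/nfl-big-data-bowl-2022/tracking"+s+".csv")
    tracking_frames.append(downcast(df_trackingTemp,False))
df_tracking = pd.concat(tracking_frames)
# Keep the old per-file row index as an 'index' column; it is dropped later after merging
df_tracking.reset_index(inplace=True)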

# Landing areas definition
dict_Y = {'None':0, 'OutOfBound':1, 
                 'A1':2,
                 'B1':3,
                 'C1':4,
                 'D1':5,
                 'E1':6,
                 'F1':7,
                 'A2':8,
                 'B2':9,
                 'C2':10,
                 'D2':11,
                 'E2':12,
                 'F2':13,
                }

# Split the pitch into several areas
def convert_Y_to_area(df):
    df['Y_final'] = dict_Y['None']
    df.loc[(df['Y']<100),'Y_final'] = dict_Y['OutOfBound']
    df.loc[(df['Y']>=100) & (df['Y']<=106),'Y_final'] = dict_Y['A1']
    df.loc[(df['Y']>106) & (df['Y']<=108),'Y_final'] = dict_Y['B1']
    df.loc[(df['Y']>108) & (df['Y']<=110),'Y_final'] = dict_Y['C1']
    df.loc[(df['Y']>110) & (df['Y']<=111),'Y_final'] = dict_Y['D1']
    df.loc[(df['Y']>111) & (df['Y']<=112),'Y_final'] = dict_Y['E1']
    df.loc[(df['Y']>112),'Y_final'] = dict_Y['F1']
    #df.loc[(df['Y']>112) & (df['Y']<=113),'Y_final'] = 'F1'
    df.loc[(df['Y']>=200) & (df['Y']<=206),'Y_final'] = dict_Y['A2']
    df.loc[(df['Y']>206) & (df['Y']<=208),'Y_final'] = dict_Y['B2']
    df.loc[(df['Y']>208) & (df['Y']<=210),'Y_final'] = dict_Y['C2']
    df.loc[(df['Y']>210) & (df['Y']<=211),'Y_final'] = dict_Y['D2']
    df.loc[(df['Y']>211) & (df['Y']<=212),'Y_final'] = dict_Y['E2']
    df.loc[(df['Y']>212),'Y_final'] = dict_Y['F2']
    #df.loc[(df['Y']>212) & (df['Y']<=213),'Y_final'] = 'F2'
    df.loc[(df['Y']>300),'Y_final'] = dict_Y['OutOfBound']    
    return df

# We only focus on the tracking rows of the football
# (.copy() so that the in-place assignments below do not act on a view of df_tracking)
df_football = df_tracking.query('displayName=="football"').copy()
# Standardize the tracking data so that it is always in the direction of the kicking team rather than raw on-field coordinates.
df_football.loc[df_football['playDirection'] == "left", 'x'] = 120-df_football.loc[df_football['playDirection'] == "left", 'x']
df_football.loc[df_football['playDirection'] == "left", 'y'] = 160/3-df_football.loc[df_football['playDirection'] == "left", 'y']
df_kickers = df_plays[~df_plays['kickerId'].isnull()]
df_plays_PK_ids = df_kickers.query('specialTeamsPlayType=="Punt" | specialTeamsPlayType=="Kickoff"')[['gameId','playId']]
df_PK_football_trackings = df_plays_PK_ids.merge(df_football,on=['gameId','playId'])
df_Y = df_PK_football_trackings[['x','y','event','frameId','gameId','playId']].copy()
# Now we transform the x,y coordinates into an area code:
# hundreds digit = half of the field width, last two digits = 10-yard bin along x
df_Y['Y'] = (((160/3/2) + df_Y['y']) / (160/3/2)).astype('int32')*100 + ((10+df_Y['x'])/10).astype('int32')
df_Y = convert_Y_to_area(df_Y)

def create_field_background(size=(10, 12)):
    fig, ax = plt.subplots(1,1,figsize=size)    
    ymin = 0
    ymax = 120
    half_w = 53.33/2
    xmin = 0
    xmax = 53.33    
    ax.set_ylabel('')
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_ylim([0,120])
    #ax.legend()
    ax.set_xlim([xmin,xmax])
    ax.set_facecolor('green')
    ax.patch.set_alpha(0.3)
    x = 10
    ax.text(x,  26,  "A2", fontsize=30, color="#ffffff")
    ax.text(x, 56, "B2", fontsize=30, color="red")
    ax.text(x, 76, "C2", fontsize=30, color="#ffffff")
    ax.text(x, 92, "D2", fontsize=30, color="#ff0000")
    ax.text(x, 103, "E2", fontsize=30, color="#ffffff")
    x = 38
    ax.text(x,  26,  "A1", fontsize=30, color="#ffffff")
    ax.text(x, 56, "B1", fontsize=30, color="red")
    ax.text(x, 76, "C1", fontsize=30, color="#ffffff")
    ax.text(x, 92, "D1", fontsize=30, color="#ff0000")
    ax.text(x, 103, "E1", fontsize=30, color="#ffffff")
    ax.text(x-10, 112, "F", fontsize=30, color="#ffffff")
    ax.hlines([50,70,90,100,110], xmin, xmax, "white", linestyles='dashed') 
    ax.vlines([half_w], ymin, ymax-10, "yellow", linestyles='dashed') 
    plt.fill_between([0,53.3],[120,120],[110],facecolor="red",alpha=0.3)
    plt.fill_between([0,half_w],[110,110],[100,100],facecolor="#FF00FF",alpha=0.3)
    plt.fill_between([half_w,53.3],[110,110],[100,100],facecolor="blue",alpha=0.3)
    plt.fill_between([0,half_w],[100,100],[90,90],facecolor="#00FFFF",alpha=0.3)
    plt.fill_between([half_w,53.3],[100,100],[90,90],facecolor="green",alpha=0.3)
    plt.fill_between([0,half_w],[90,90],[70,70],facecolor="#D7FFEE",alpha=0.3)
    plt.fill_between([half_w,53.3],[90,90],[70,70],facecolor="gold",alpha=0.3)
    plt.fill_between([0,half_w],[70,70],[50,50],facecolor="#AAFFAA",alpha=0.3)
    plt.fill_between([half_w,53.3],[70,70],[50,50],facecolor="lime",alpha=0.3)
    plt.fill_between([0,53.3],[50,50],facecolor="azure",alpha=0.3)    
    ax.legend(bbox_to_anchor=(1,0),loc='lower right', fontsize=40) # legend position
    return fig,ax

def draw_event_landed_points(df,event_name,size=1):
    fig,ax = create_field_background()
    groups = df.query('event==@event_name').groupby('event')
    groups.get_group(event_name).plot(x='y', y='x', ax=ax, style='.', label='Football', color='red', markersize=size)
    ax.set_xlabel('')

draw_event_landed_points(df_Y,"kick_received",3)

#### From the graph above, we can clearly see that most kick_received events happen deep downfield, in areas E1, E2 and F (the end zone), which means that kickoffs mostly land in these areas. The ball should be kept out of B1 and B2. Our goal now is to predict the landing area of the ball, so that coaches can use the prediction to guide their players during a match or as a reference.
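
To make the area encoding concrete, here is a minimal sketch that maps a single standardized `(x, y)` position to an area label. It mirrors the coordinate formula and the thresholds of `convert_Y_to_area` above. The helper names `area_code`, `area_label` and `LETTER_BY_MAX_BIN` are ours, introduced only for illustration, and are not part of the notebook's pipeline.

```python
FIELD_WIDTH = 160 / 3  # field width in yards, as used in the notebook


def area_code(x, y):
    """Encode a position as side*100 + x_bin, like the 'Y' column."""
    side = int((FIELD_WIDTH / 2 + y) / (FIELD_WIDTH / 2))  # 1 or 2 for y inside the field
    x_bin = int((10 + x) / 10)                              # 10-yard bins along x
    return side * 100 + x_bin


# Largest x_bin (inclusive) that still belongs to each letter,
# following the thresholds in convert_Y_to_area (hypothetical helper table)
LETTER_BY_MAX_BIN = [(6, 'A'), (8, 'B'), (10, 'C'), (11, 'D'), (12, 'E')]


def area_label(x, y):
    """Return a label such as 'D1', 'F2' or 'OutOfBound' for one position."""
    side, x_bin = divmod(area_code(x, y), 100)
    if side < 1 or side > 2:
        return 'OutOfBound'
    for max_bin, letter in LETTER_BY_MAX_BIN:
        if x_bin <= max_bin:
            return letter + str(side)
    return 'F' + str(side)


print(area_label(105, 10))   # D1
print(area_label(115, 40))   # E2
```

The sketch works on one point at a time for readability; the notebook does the same thing for whole columns with `df.loc` masks.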

# Data Preparation

### How can we know the landing area, and can we predict it with machine learning?
First, we map the 13 areas to the categories 1, 2, 3, ..., 13. The model then outputs either a predicted category or a probability for each of the 13 areas, and the area with the greatest probability is taken as the prediction. In the tracking data, certain 'event' values mark the frame in which the football lands, together with its x, y coordinates, so we know where the football landed and in which area. We use the events 'punt_received', 'kick_received', 'punt_land', 'punt_downed', 'out_of_bounds', 'fair_catch', 'touchback' and 'kickoff_land' as the landing events. The resulting area number (Y_final) is shown in the following table.
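
As a small illustration of the "greatest probability wins" rule, the sketch below turns a vector of 13 class probabilities into an area name. The class order follows `dict_Y` (class 1 is OutOfBound, class 13 is F2); the function `predict_area` and the probabilities are made up for this example and are not model output.

```python
# Area names in class order 1..13, following dict_Y in the notebook
AREA_NAMES = ['OutOfBound', 'A1', 'B1', 'C1', 'D1', 'E1', 'F1',
              'A2', 'B2', 'C2', 'D2', 'E2', 'F2']


def predict_area(probabilities):
    """Return (area name, class number) for the most probable class."""
    if len(probabilities) != len(AREA_NAMES):
        raise ValueError("expected one probability per area")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return AREA_NAMES[best], best + 1  # classes are numbered from 1


# Made-up probabilities for illustration only
example = [0.02, 0.01, 0.01, 0.03, 0.05, 0.40, 0.10,
           0.01, 0.01, 0.02, 0.04, 0.25, 0.05]
print(predict_area(example))  # ('E1', 6)
```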

In [None]:
df_temp = df_Y.query('event=="punt_received" | event=="kick_received" | \
                                event == "punt_land" | event == "punt_downed" | \
                                event=="out_of_bounds" | event=="fair_catch"| \
                                event =="touchback" | event == "kickoff_land"')
df_temp = df_temp[['gameId','playId','frameId','Y_final']]
# Keep the first landing event of each play
# (replaces the row-by-row DataFrame.append loop, which was removed in pandas 2.0)
df_Y_final = df_temp.groupby(['gameId','playId']).head(1)

df_Y_final = df_Y_final.drop('frameId',axis=1)
df_Y_final

In the second step, we select the features used to predict Y_final (the area). They include the tracking columns of the kicker or punter at the first event of the play (such as 'ball_snap' or 'kick_off', which marks the start of a play; the data in that frame become our X data for training and predicting), the height, weight and age of that player, and some features from the PFF scouting data.

#### The first event data

In [None]:
df_temp = df_Y_final.merge(df_plays,on=['gameId','playId'])
df_temp.rename({'kickerId':'nflId'},axis=1,inplace=True)
df_X_tracking = df_temp.merge(df_tracking,on = ['gameId','playId','nflId'])
df_X_tracking.drop('index',inplace=True,axis=1)
# First tracking row with a real event (not "None") for each play
# (replaces the row-by-row DataFrame.append loop, which was removed in pandas 2.0)
df_events = df_X_tracking[df_X_tracking['event']!="None"]
df_first_event = df_events.groupby(['gameId','playId']).head(1)
    
keep_cols = ['gameId','playId','frameId','x','y','s','a','dis','o','dir','nflId','specialTeamsPlayType']
df_PK_tracking_all= df_first_event[keep_cols]

#### Players' height, weight and age

In [None]:
# convert height to meter
h_ft_in = (df_players.height.str.contains('-'), 'height')
df_players.loc[h_ft_in] = df_players.loc[h_ft_in].str.split('-').str[0].astype(int)*12 \
    + df_players.loc[h_ft_in].str.split('-').str[1].astype(int)
df_players['height'] = df_players.height.astype(int) / 39.37
# convert birthday to years old
df_players['birthDate'] = pd.to_datetime(df_players['birthDate'], infer_datetime_format = True) 
df_players['birthDate'] = np.round((pd.Timestamp.now() - df_players['birthDate']).dt.days/365)
df_players = df_players.rename(columns={'birthDate': 'age'})
df_players['age'] = round(df_players['age'].fillna(df_players['age'].mean()), 2)
df_players_X = df_players[['nflId','height','weight','age']]
df_PK_tracking_all_players = df_PK_tracking_all.merge(df_players_X,on='nflId')

#### PFFScouting data

In [None]:
df_X = df_pff[['gameId','playId','hangTime','kickType','kickDirectionActual','returnDirectionActual','kickoffReturnFormation','kickContactType']]
df_X = df_PK_tracking_all.merge(df_X,on=['gameId','playId'])
df_pkType = pd.get_dummies(df_X['specialTeamsPlayType'],prefix="pk")
df_kickType = pd.get_dummies(df_X['kickType'],prefix="kickType")
df_kick_direction = pd.get_dummies(df_X['kickDirectionActual'],prefix='kick_dire')
df_re_direction = pd.get_dummies(df_X['returnDirectionActual'],prefix="re_dire")
df_formation = pd.get_dummies(df_X['kickoffReturnFormation'],prefix="formation")
df_contackType = pd.get_dummies(df_X['kickContactType'],prefix="kickContactType")
PFF_temp_df = pd.concat([df_X[['gameId','playId','hangTime']],df_pkType,df_kickType,df_kick_direction,df_re_direction,df_formation,df_contackType],axis=1)
PFF_temp_df['hangTime'] = round(PFF_temp_df['hangTime'].fillna(PFF_temp_df['hangTime'].mean()), 2)

#### Merging X and Y 

In [None]:
df_X_Final = df_PK_tracking_all_players.merge(PFF_temp_df,on=['gameId','playId'])
df_train = df_X_Final.merge(df_Y_final,on=['gameId','playId'])

# Machine learning models for training and predicting

In [None]:
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# cols_to_use = ['hangTime', 'x', 'y', 'weight','height','dis','dir','a','age']
cols_to_use = ['x', 'y', 's', 'a', 'dis', 'o', 'dir',
       'height', 'weight', 'age', 'hangTime', 'pk_Kickoff', 'pk_Punt',
       'kickType_A', 'kickType_B', 'kickType_D', 'kickType_F', 'kickType_K',
       'kickType_N', 'kickType_O', 'kickType_P', 'kickType_Q', 'kickType_R',
       'kickType_S', 'kick_dire_C', 'kick_dire_L', 'kick_dire_R', 're_dire_C',
       're_dire_L', 're_dire_R', 'formation_10-0-0', 'formation_5-0-4',
       'formation_5-3-2', 'formation_6-0-3', 'formation_6-0-4',
       'formation_6-2-2', 'formation_7-0-3', 'formation_7-1-2',
       'formation_8-0-1', 'formation_8-0-2', 'formation_8-0-3',
       'formation_8-1-0', 'formation_8-1-1', 'formation_9-0-0',
       'formation_9-0-1', 'formation_9-1-0', 'kickContactType_BB',
       'kickContactType_BC', 'kickContactType_BF', 'kickContactType_BOG',
       'kickContactType_CC', 'kickContactType_CFFG', 'kickContactType_DEZ',
       'kickContactType_ICC', 'kickContactType_KTB', 'kickContactType_KTC',
       'kickContactType_KTF', 'kickContactType_MBC', 'kickContactType_MBDR',
       'kickContactType_OOB']
X = df_train[cols_to_use]
y = df_train.Y_final
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.3, random_state = 11)
X_valid,X_test ,y_valid , y_test =train_test_split(X_valid,y_valid,test_size=0.5,random_state=42)

models = {
    "XGBClassifier": XGBClassifier(),
    "Decision Tree Classifier": DecisionTreeClassifier(),
    "Random Forest Classifier": RandomForestClassifier(random_state = 5)
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name + " trained")
    
print("-------------------------", '\n')

for name, model in models.items():
    print(name)
    predictions = model.predict(X_valid)
    print("y_valid Accuracy: %.2f%%" % (accuracy_score(y_valid, predictions, normalize=True) * 100.0))
    predictions=model.predict(X_test)
    print("y_test Accuracy: %.2f%%" % (accuracy_score(y_test, predictions, normalize=True) * 100.0))

# Neural Network Model

In [None]:
from keras.layers import Dense,Input,Flatten,concatenate,Dropout,Lambda,BatchNormalization,LeakyReLU
from keras.models import Model
from keras.callbacks import Callback
from  keras.callbacks import EarlyStopping,ModelCheckpoint
from tensorflow.keras.utils import plot_model
from sklearn.model_selection import train_test_split, KFold
import time

def get_model(x_tr,y_tr,x_val,y_val,x_test,y_test):
    inp = Input(shape = (x_tr.shape[1],))
    x = Dense(1024, input_dim=X.shape[1])(inp)
    x = LeakyReLU(alpha=0.3)(x)
    x = Dropout(0.2)(x)
    x = BatchNormalization()(x)
    x = Dense(512)(x)
    x = LeakyReLU(alpha=0.2)(x)
    x = Dropout(0.2)(x)
    x = BatchNormalization()(x)
    x = Dense(256)(x)
    x = LeakyReLU(alpha=0.1)(x)
    x = Dropout(0.2)(x)
    x = BatchNormalization()(x)
    
    out = Dense(13, activation='softmax')(x)
    model = Model(inp,out)

    model.compile(optimizer='adam', loss='categorical_crossentropy',weighted_metrics=['accuracy'])    
    bsz = 100
    steps = x_tr.shape[0]/bsz
    call_backs = EarlyStopping(monitor='val_loss', min_delta=0, patience=150, verbose=1, restore_best_weights=True)
    check_point = ModelCheckpoint('/kaggle/working/best_model.tf',save_best_only=True, verbose=1, save_weights_only=True)

    # Pass the validation split so that the val_loss-based callbacks have something to monitor
    model.fit(x_tr, y_tr, validation_data=(x_val, y_val), epochs=100, batch_size=bsz,verbose=1,callbacks=[call_backs,check_point])

    print('Starting the prediction of x_test')
    y_pred = model.predict(x_test)
    argmax_pred = np.argmax(y_pred,axis=1)
    argmax_true=np.argmax(y_test,axis=1)
    comp = argmax_pred-argmax_true
    error = np.count_nonzero(comp)
    total = y_test.shape[0]
    accuracy = 1- error/total
    print("this is accuracy:")
    print(accuracy)

    return model


areas = df_train.Y_final
y = np.zeros((areas.shape[0], 13))
for idx, target in enumerate(list(areas)):
    y[idx][target-1] = 1

s_time = time.time()
for k in range(1):
    kfold = KFold(2, random_state = 42 + k, shuffle = True)    
    for k_fold, (tr_inds, val_inds) in enumerate(kfold.split(areas)):
        print("---------------------------------")
        print("The %dth fold: "%(k_fold+1))
        print("---------------------------------")
        tr_x,tr_y = X.iloc[tr_inds],y[tr_inds]
        val_x,val_y = X.iloc[val_inds],y[val_inds]
        val_x, test_x, val_y, test_y = train_test_split(val_x, val_y, test_size=0.01)
        val_x, test_x2, val_y, test_y2 = train_test_split(val_x, val_y, test_size=0.5)
        
        tr_x = pd.concat([tr_x,test_x2],axis=0)      
        tr_y =np.concatenate((tr_y,test_y2),axis=0)
        
        model = get_model(tr_x,tr_y,val_x,val_y,test_x,test_y)

In [None]:
plot_model(model=model,show_shapes=True)

# Part II

## Ranking Returners

As mentioned in the introduction, returners and tacklers are both very important to a special teams unit and can give their team a spark, so in Part II we evaluate returners and tacklers based on our own calculations. Evaluating a player is not easy: many aspects must be considered to do it well. We rate a returner on four indices. The first is how many tacklers the returner breaks, measured by the missedTackler count. The second is **energy**: a returner needs energy to run, so we define an energy index as the player's weight (or BMI) multiplied by the square of his speed. The third is how many times the returner reaches the opponent's end zone and scores a touchdown for his team. Last but not least, we consider how many yards the returner gains on the return.
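
The sketch below illustrates how such indices can be combined, assuming made-up numbers for three hypothetical returners. `energy_index`, `zscores` and `returner_scores` are illustrative helpers and not the notebook's functions; the weights 0.2, 0.5 and 0.3 are the ones used in the ranking code further down, and the touchdown term is left out for brevity. Speed is taken in whatever unit the tracking data provides, since standardization removes constant factors anyway.

```python
LB_TO_KG = 0.45359237  # pounds to kilograms


def energy_index(weight_lb, max_speed):
    """Mass in kg times the square of the top speed reached during the return."""
    return weight_lb * LB_TO_KG * max_speed ** 2


def zscores(values):
    """Standardize a list to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std if std > 0 else 0.0 for v in values]


def returner_scores(missed_tacklers, return_yards, energies, weights=(0.2, 0.5, 0.3)):
    """Weighted sum of the standardized indices, one score per returner."""
    columns = [zscores(missed_tacklers), zscores(return_yards), zscores(energies)]
    n = len(missed_tacklers)
    return [sum(w * col[i] for w, col in zip(weights, columns)) for i in range(n)]


# Made-up numbers for three hypothetical returners
missed = [0, 2, 1]
yards = [10, 45, 20]
energies = [energy_index(200, 8.0), energy_index(190, 10.0), energy_index(210, 7.0)]
scores = returner_scores(missed, yards, energies)
best = max(range(len(scores)), key=lambda i: scores[i])
print("best returner index:", best)
```

Standardizing each index first keeps a large-valued feature such as yardage from dominating the weighted sum purely because of its scale.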

### Calculating each missedTackler

In [None]:
df_track = df_tracking[['gameId','playId','nflId','team','jerseyNumber']]
def count_missedTackler(row):
    if row=='0':
        return 0
    count = len(row.split(';'))
    return count
keep_cols = ['gameId','playId','missedTackler']
df_missedTackler = df_pff[keep_cols].copy()  # copy so the column assignments below do not act on a view
df_missedTackler['missedTackler'] = df_missedTackler['missedTackler'].fillna('0')
df_missedTackler['missedTackler_count'] = df_missedTackler['missedTackler'].apply(count_missedTackler)
df_missedTackler.drop('missedTackler',inplace=True,axis=1)

# Returners ranking

In [None]:
keep_cols = ['gameId','playId','possessionTeam','specialTeamsPlayType','kickReturnYardage','kickerId','returnerId']
# keep_cols = ['gameId','playId','possessionTeam','kickerId','returnerId','kickReturnYardage']
df_returner = df_plays[keep_cols]
df_returner = df_returner[~df_returner['returnerId'].isnull()].copy()
df_returner[['nflId_returner','nflId2','nflId3']] = df_returner['returnerId'].str.split(';',expand=True)
df_returner.drop(['returnerId','nflId2','nflId3'],inplace=True,axis=1)
df_returner['nflId_returner'] = df_returner['nflId_returner'].astype(float)
df_returner['kickReturnYardage'] = df_returner['kickReturnYardage'].fillna(0)
keep_cols = ['gameId','playId','frameId','event','s','nflId','a']
df_track_returner = df_tracking[keep_cols]
df_returner = df_returner.rename(columns={'nflId_returner':'nflId'})
df_returner_names= df_returner.merge(df_players,on='nflId')
returner_features = df_track_returner.merge(df_returner_names,on=['gameId','playId','nflId'])

# Returner's energy

In [None]:
returner_features=returner_features.rename(columns={'nflId':'nflId_returner'})
df_returner_features = df_missedTackler.merge(returner_features,on=['gameId','playId'])
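# BMI in kg/m^2: weight is in pounds and height was converted to metres above,
# so the imperial factor 703 (for pounds and inches) does not apply here
df_returner_features['BMI'] = (df_returner_features['weight'] * 0.45359237) / (df_returner_features['height'] ** 2)
# 'Energy' holds the maximum speed reached in the play
df1 = df_returner_features.groupby(['gameId','playId'],as_index=False)['s'].max()
df1.rename({'s':'Energy'},axis=1,inplace=True)
df_returner_features = df_returner_features.merge(df1,on=['gameId','playId']) 
# BMI is already metric, so no further unit conversion is needed
df_returner_features['Energy_BMI'] = df_returner_features['BMI'] * (df_returner_features['Energy'] ** 2)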
# df_returner_features['Energy'] = (df_returner_features['weight'] * 0.45359237) * (df_returner_features['Energy'] ** 2)
df_returner_features.drop_duplicates('nflId_returner')

In [None]:
from sklearn.preprocessing import StandardScaler

feature_columns = ['missedTackler_count','kickReturnYardage','Energy','Energy_BMI']
scaler = StandardScaler()
df2 = scaler.fit_transform(df_returner_features[feature_columns])
# Name the scaled columns directly instead of renaming them one by one
df2 = pd.DataFrame(df2, columns=feature_columns)
df_returner_features.drop(feature_columns,axis=1,inplace=True)
df_returner_features = pd.concat([df_returner_features,df2],axis=1)
keep_cols = ['nflId_returner','missedTackler_count','kickReturnYardage','Energy','Energy_BMI']
# Select the numeric columns before averaging; recent pandas raises on mean() over string columns
df_returner_rank = df_returner_features.groupby(['gameId','playId'],as_index=False)[keep_cols].mean()[keep_cols]
df_returner_rank['returner_score']=df_returner_rank['missedTackler_count']*0.2 \
                                             +df_returner_rank['kickReturnYardage']*0.5 \
                                             +df_returner_rank['Energy']*0.3

df_returner_td = df_returner_features.query('event=="touchdown"')[['gameId','playId','frameId','event','nflId_returner']]
df_returner_td = df_returner_td.groupby('nflId_returner',as_index=False).count()
df_returner_td = df_returner_td[['nflId_returner','event','gameId','playId']]
df_returner_td.rename({'event':'td_count'},axis=1,inplace=True)
df_returner_td = df_returner_td[['nflId_returner','td_count']]
df_returner_rank = df_returner_rank.groupby(['nflId_returner'],as_index=False).mean()
df_returner_rank = df_returner_rank.merge(df_returner_td,how='outer',on='nflId_returner')
df_returner_rank['td_count'] = df_returner_rank['td_count'].fillna(0)
scaler = StandardScaler()
X3 = scaler.fit_transform(df_returner_rank[['nflId_returner','td_count']])
dict_fc = {0:'nflId_returner',1:'td_count'}
X3 = pd.DataFrame(X3)
for col in X3.columns:
    X3 = X3.rename({col:dict_fc[col]},axis=1)
X3.reset_index(inplace=True)
X3.drop('nflId_returner',inplace=True,axis=1)
df_returner_rank.drop('td_count',inplace=True,axis=1)
df_returner_rank = pd.concat([df_returner_rank,X3],axis=1)
df_returner_rank.drop('index',inplace=True,axis=1)
df_returner_rank['returner_score'] = 100*(df_returner_rank['returner_score'] + (1+df_returner_rank['td_count']))/2

## Ranking according to the score

In [None]:
df_returner_rank['rank_by_score'] = df_returner_rank['returner_score'].rank(ascending=False)
df_returner_rank = df_returner_rank.sort_values('rank_by_score',ascending=True)
df_returner_rank = df_returner_rank[['nflId_returner','rank_by_score','returner_score']]
df_returner_rank.rename({'nflId_returner':'nflId'},axis=1,inplace=True)
df_returner_rank = df_returner_rank.merge(df_players,on=['nflId'])

# Visualization

In [None]:
import plotly.express as px
# Top 50 returners by score (df_returner_rank is already sorted by rank)
returner_top_50 = df_returner_rank[0:50]
# Keep only the score and the player name for plotting
check = returner_top_50[['returner_score','displayName']].copy()

check.columns = [
    
    'scores', 
    'playerName'
]

check = check.sort_values('scores')

fig = px.bar(
    check, 
    y='playerName', 
    x="scores", 
    orientation='h', 
    title='Top 50 returners by score', 
    height=900, 
    width=800
)

fig.show()

# Part III : Pitch Control -- Voronoi Visualization Animation

First, we put Voronoi diagrams and the players' movement on an NFL field together. Each player has a control area, the part of the field that he is closer to than any other player, and control of the field is vital in any sport, so we visualize it for a play in a game.
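
To show what a control area means numerically, the following sketch approximates each player's Voronoi cell by assigning every one-yard grid cell to its nearest player. This is a discrete stand-in for the exact `scipy.spatial.Voronoi` construction used below, not the notebook's method; `nearest_player`, `control_area` and the player positions are assumptions made for the example, and the field is trimmed to 53 yards wide so that the grid has a whole number of cells.

```python
def nearest_player(point, players):
    """Index of the player closest to `point` (squared Euclidean distance)."""
    px, py = point
    return min(range(len(players)),
               key=lambda i: (players[i][0] - px) ** 2 + (players[i][1] - py) ** 2)


def control_area(players, length, width, step=1.0):
    """Approximate each player's Voronoi cell area by counting grid cells."""
    areas = [0.0] * len(players)
    nx, ny = int(length / step), int(width / step)
    for i in range(nx):
        for j in range(ny):
            center = ((i + 0.5) * step, (j + 0.5) * step)  # centre of the grid cell
            areas[nearest_player(center, players)] += step * step
    return areas


# Two hypothetical players placed symmetrically about midfield
players = [(30.0, 26.5), (90.0, 26.5)]
areas = control_area(players, length=120, width=53)
print(areas)  # each player controls half of the 120 x 53 grid
```

A finer `step` brings the grid estimate closer to the exact polygon areas at the cost of more distance computations.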

In [None]:
import dateutil
from math import radians
from IPython.display import Video

from matplotlib import animation
from matplotlib.animation import FFMpegWriter

import matplotlib.patches as patches
from scipy.spatial import Voronoi, voronoi_plot_2d

def voronoi_finite_polygons_2d(vor, radius=None):
    """
    Reconstruct infinite voronoi regions in a 2D diagram to finite
    regions.

    Parameters
    ----------
    vor : Voronoi
        Input diagram
    radius : float, optional
        Distance to 'points at infinity'.

    Returns
    -------
    regions : list of tuples
        Indices of vertices in each revised Voronoi regions.
    vertices : list of tuples
        Coordinates for revised Voronoi vertices. Same as coordinates
        of input vertices, with 'points at infinity' appended to the
        end.

    """

    if vor.points.shape[1] != 2:
        raise ValueError("Requires 2D input")

    new_regions = []
    new_vertices = vor.vertices.tolist()

    center = vor.points.mean(axis=0)
    if radius is None:
        radius = np.ptp(vor.points).max()*2  # np.ptp: the ndarray.ptp method was removed in NumPy 2.0

    # Construct a map containing all ridges for a given point
    all_ridges = {}
    for (p1, p2), (v1, v2) in zip(vor.ridge_points, vor.ridge_vertices):
        all_ridges.setdefault(p1, []).append((p2, v1, v2))
        all_ridges.setdefault(p2, []).append((p1, v1, v2))

    # Reconstruct infinite regions
    for p1, region in enumerate(vor.point_region):
        vertices = vor.regions[region]

        if all([v >= 0 for v in vertices]):
            # finite region
            new_regions.append(vertices)
            continue

        # reconstruct a non-finite region
        ridges = all_ridges[p1]
        new_region = [v for v in vertices if v >= 0]

        for p2, v1, v2 in ridges:
            if v2 < 0:
                v1, v2 = v2, v1
            if v1 >= 0:
                # finite ridge: already in the region
                continue

            # Compute the missing endpoint of an infinite ridge

            t = vor.points[p2] - vor.points[p1] # tangent
            t /= np.linalg.norm(t)
            n = np.array([-t[1], t[0]])  # normal

            midpoint = vor.points[[p1, p2]].mean(axis=0)
            direction = np.sign(np.dot(midpoint - center, n)) * n
            far_point = vor.vertices[v2] + direction * radius

            new_region.append(len(new_vertices))
            new_vertices.append(far_point.tolist())

        # sort region counterclockwise
        vs = np.asarray([new_vertices[v] for v in new_region])
        c = vs.mean(axis=0)
        angles = np.arctan2(vs[:,1] - c[1], vs[:,0] - c[0])
        new_region = np.array(new_region)[np.argsort(angles)]

        # finish
        new_regions.append(new_region.tolist())

    return new_regions, np.asarray(new_vertices)


def calculate_dx_dy_arrow(x, y, angle, speed, multiplier):
    # Angles are measured clockwise from the +y axis (0 degrees points along +y,
    # 90 degrees along +x), so a single sin/cos pair covers all four quadrants.
    # The previous per-quadrant version swapped sin and cos for angles in (90, 180].
    dx = np.sin(radians(angle)) * multiplier * speed
    dy = np.cos(radians(angle)) * multiplier * speed
    return dx, dy

def create_football_field(linenumbers=True,
                          endzones=True,
                          highlight_line=False,
                          highlight_line_number=55,
                          highlight_first_down_line=False,
                          yards_to_go=10,
                          highlighted_name='Line of Scrimmage',
                          fifty_is_los=False,
                          figsize=(12, 6.33)):
    """
    Function that plots the football field for viewing plays.
    Allows for showing or hiding endzones.
    """
    rect = patches.Rectangle((0, 0), 120, 53.3, linewidth=0.1,
                             edgecolor='r', facecolor='darkgreen', zorder=0)

    fig, ax = plt.subplots(1, figsize=figsize)
    ax.add_patch(rect)

    plt.plot([10, 10, 10, 20, 20, 30, 30, 40, 40, 50, 50, 60, 60, 70, 70, 80,
              80, 90, 90, 100, 100, 110, 110, 120, 0, 0, 120, 120],
             [0, 0, 53.3, 53.3, 0, 0, 53.3, 53.3, 0, 0, 53.3, 53.3, 0, 0, 53.3,
              53.3, 0, 0, 53.3, 53.3, 0, 0, 53.3, 53.3, 53.3, 0, 0, 53.3],
             color='white')
    if fifty_is_los:
        plt.plot([60, 60], [0, 53.3], color='gold')
        plt.text(62, 50, '<- Player Yardline at Snap', color='gold')
    # Endzones
    if endzones:
        ez1 = patches.Rectangle((0, 0), 10, 53.3,
                                linewidth=0.1,
                                edgecolor='r',
                                facecolor='blue',
                                alpha=0.2,
                                zorder=0)
        ez2 = patches.Rectangle((110, 0), 10, 53.3,  # end zone is 10 yards wide
                                linewidth=0.1,
                                edgecolor='r',
                                facecolor='blue',
                                alpha=0.2,
                                zorder=0)
        ax.add_patch(ez1)
        ax.add_patch(ez2)
    plt.xlim(0, 120)
    plt.ylim(-5, 58.3)
    plt.axis('off')
    if linenumbers:
        for x in range(20, 110, 10):
            numb = x
            if x > 50:
                numb = 120 - x
            plt.text(x, 5, str(numb - 10),
                     horizontalalignment='center',
                     fontsize=20,  # fontname='Arial',
                     color='white')
            plt.text(x - 0.95, 53.3 - 5, str(numb - 10),
                     horizontalalignment='center',
                     fontsize=20,  # fontname='Arial',
                     color='white', rotation=180)
    if endzones:
        hash_range = range(11, 110)
    else:
        hash_range = range(1, 120)

    for x in hash_range:
        ax.plot([x, x], [0.4, 0.7], color='white')
        ax.plot([x, x], [53.0, 52.5], color='white')
        ax.plot([x, x], [22.91, 23.57], color='white')
        ax.plot([x, x], [29.73, 30.39], color='white')

    if highlight_line:
        hl = highlight_line_number + 10
        plt.plot([hl, hl], [0, 53.3], color='yellow')
        #plt.text(hl + 2, 50, '<- {}'.format(highlighted_name),
        #         color='yellow')
        
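    if highlight_first_down_line:
        # Compute the line-of-scrimmage x here so this branch does not rely
        # on `hl`, which is only defined when highlight_line is True.
        fl = highlight_line_number + 10 + yards_to_go
        plt.plot([fl, fl], [0, 53.3], color='yellow')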
        #plt.text(fl + 2, 50, '<- {}'.format(highlighted_name),
        #         color='yellow')
    return fig, ax
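
The yard-number labels drawn by the function above follow a simple mapping from the plot x-coordinate (0 to 120, with a 10-yard end zone on each side) to the number painted on the field. Below is a minimal sketch of that mapping. `yardline_label` is a hypothetical helper name used only for illustration and is not part of the notebook.

```python
def yardline_label(x):
    """Painted yard number for a plot x-coordinate between 10 and 110.

    Hypothetical helper: it mirrors the label arithmetic in the plotting
    loop (x counts from the back of the left end zone, so x - 10 is the
    distance from the left goal line).
    """
    yards_from_left_goal = x - 10
    if yards_from_left_goal <= 50:
        return yards_from_left_goal
    # Past midfield the numbers count back down toward the right goal line
    return 100 - yards_from_left_goal


# Labels for the same x positions the plotting loop uses
labels = [yardline_label(x) for x in range(20, 110, 10)]
print(labels)
```

The same offset of 10 explains why the highlighted line is drawn at `highlight_line_number + 10`.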


In [None]:
def animate_player_movement(weekNumber, playId, gameId):
    weekData = df_tracking
    playData = df_plays
    
    # Select this play once per team. .copy() returns independent frames, so
    # the 'time' assignments below do not trigger SettingWithCopyWarning.
    playFilter = 'gameId==' + str(gameId) + ' and playId==' + str(playId)
    playHome = weekData.query(playFilter + ' and team == "home"').copy()
    playAway = weekData.query(playFilter + ' and team == "away"').copy()
    playFootball = weekData.query(playFilter + ' and team == "football"').copy()

    # Replace each timestamp with its dense rank (1, 2, 3, ...) so that
    # 'time' can be used directly as a frame number.
    playHome['time'] = playHome['time'].apply(lambda x: dateutil.parser.parse(x).timestamp()).rank(method='dense')
    playAway['time'] = playAway['time'].apply(lambda x: dateutil.parser.parse(x).timestamp()).rank(method='dense')
    playFootball['time'] = playFootball['time'].apply(lambda x: dateutil.parser.parse(x).timestamp()).rank(method='dense')

    maxTime = int(playAway['time'].unique().max())
    minTime = int(playAway['time'].unique().min())
    
    yardlineNumber = playData.query('gameId==' + str(gameId) + ' and playId==' + str(playId))['yardlineNumber'].item()
    yardsToGo = playData.query('gameId==' + str(gameId) + ' and playId==' + str(playId))['yardsToGo'].item()
    absoluteYardlineNumber = playData.query('gameId==' + str(gameId) + ' and playId==' + str(playId))['absoluteYardlineNumber'].item() - 10
    playDir = playHome.sample(1)['playDirection'].item()
    
    # Express the yard line as yards from the left goal line
    if absoluteYardlineNumber > 50:
        yardlineNumber = 100 - yardlineNumber

    # Plays moving left gain yards toward smaller x
    if playDir == 'left':
        yardsToGo = -yardsToGo
    
    fig, ax = create_football_field(highlight_line=True, highlight_line_number=yardlineNumber, highlight_first_down_line=True, yards_to_go=yardsToGo)
    playDesc = playData.query('gameId==' + str(gameId) + ' and playId==' + str(playId))['playDescription'].item()
    plt.title(f'Game # {gameId} Play # {playId} \n {playDesc}')
    
    def update_animation(time):
        patch = []
        if True:
            ############################
            # Home players' location
            homeX = playHome.query('time == ' + str(time))['x']
            homeY = playHome.query('time == ' + str(time))['y']
            homeNum = playHome.query('time == ' + str(time))['jerseyNumber']
            homeOrient = playHome.query('time == ' + str(time))['o']
            homeDir = playHome.query('time == ' + str(time))['dir']
            homeSpeed = playHome.query('time == ' + str(time))['s']
            patch.extend(plt.plot(homeX, homeY, 'o',c='gold', ms=20, mec='white'))

            # Home players' jersey number 
            for x, y, num in zip(homeX, homeY, homeNum):
                patch.append(plt.text(x, y, int(num), va='center', ha='center', color='black', size='medium'))

            # Home players' orientation
            for x, y, orient in zip(homeX, homeY, homeOrient):
                dx, dy = calculate_dx_dy_arrow(x, y, orient, 1, 1)
                patch.append(plt.arrow(x, y, dx, dy, color='gold', width=0.5, shape='full'))

            # Home players' direction
            for x, y, direction, speed in zip(homeX, homeY, homeDir, homeSpeed):
                dx, dy = calculate_dx_dy_arrow(x, y, direction, speed, 1)
                patch.append(plt.arrow(x, y, dx, dy, color='black', width=0.25, shape='full'))


            #########################
            # Away players' location
            awayX = playAway.query('time == ' + str(time))['x']
            awayY = playAway.query('time == ' + str(time))['y']
            awayNum = playAway.query('time == ' + str(time))['jerseyNumber']
            awayOrient = playAway.query('time == ' + str(time))['o']
            awayDir = playAway.query('time == ' + str(time))['dir']
            awaySpeed = playAway.query('time == ' + str(time))['s']
            patch.extend(plt.plot(awayX, awayY, 'o',c='orangered', ms=20, mec='white'))

            # Away players' jersey number 
            for x, y, num in zip(awayX, awayY, awayNum):
                patch.append(plt.text(x, y, int(num), va='center', ha='center', color='white', size='medium'))

            # Away players' orientation
            for x, y, orient in zip(awayX, awayY, awayOrient):
                dx, dy = calculate_dx_dy_arrow(x, y, orient, 1, 1)
                patch.append(plt.arrow(x, y, dx, dy, color='orangered', width=0.5, shape='full'))

            # Away players' direction
            for x, y, direction, speed in zip(awayX, awayY, awayDir, awaySpeed):
                dx, dy = calculate_dx_dy_arrow(x, y, direction, speed, 1)
                patch.append(plt.arrow(x, y, dx, dy, color='black', width=0.25, shape='full'))

            # Football' location
            footballX = playFootball.query('time == ' + str(time))['x']
            footballY = playFootball.query('time == ' + str(time))['y']
            patch.extend(plt.plot(footballX, footballY, 'o', c='black', ms=10, mec='white'))

        # Voronoi polygon 
        carrier_loc = (playHome.query('time == ' + str(time))[['x','y']]).values
        notcarrier_loc = (playAway.query('time == ' + str(time))[['x','y']]).values
        points = np.append(carrier_loc, notcarrier_loc, axis=0)
        vor = Voronoi(points)
        regions, vertices = voronoi_finite_polygons_2d(vor)
        for i, region in enumerate(regions):
            polygon = vertices[region]
            patch.extend(plt.fill(*zip(*polygon), color='yellow', alpha=0.6))
#             patch.extend(plt.fill(*zip(*polygon), alpha=0.8))
        
        return patch
    
    ims = [[]]
    for time in np.arange(minTime, maxTime+1):
        patch = update_animation(time)
        ims.append(patch)
        
    anim = animation.ArtistAnimation(fig, ims, repeat=False)
    
    return anim




# anim = animate_player_movement(1, 366, 2018090600)
# anim = animate_player_movement(1, 1374, 2018122302)
anim = animate_player_movement(1, 36, 2018111900)

writer = FFMpegWriter(fps=7)
anim.save('animation_notrail.mp4', writer=writer)
Video("animation_notrail.mp4")
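
The animation above only shades the Voronoi cells. To use them as a numeric pitch-control feature, each cell's area would also be needed. The sketch below computes the area of one polygon with the shoelace formula. `polygon_area` is a hypothetical helper that is not part of the notebook, and it assumes the vertices are listed in order around the boundary. Cells of players on the edge of the formation extend far beyond the field, so they would need to be clipped to the field rectangle first, which this sketch does not do.

```python
import numpy as np


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula.

    Hypothetical helper. `vertices` is an (n, 2) array-like of x, y points
    listed in order around the boundary (clockwise or counterclockwise).
    """
    v = np.asarray(vertices, dtype=float)
    x, y = v[:, 0], v[:, 1]
    # Sum of cross products between each vertex and the next one
    cross_sum = np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1))
    return 0.5 * abs(cross_sum)


# A 10 x 5 yard rectangle and a right triangle with legs 4 and 3
rect_area = polygon_area([(0, 0), (10, 0), (10, 5), (0, 5)])
tri_area = polygon_area([(0, 0), (4, 0), (0, 3)])
print(rect_area, tri_area)
```

Inside `update_animation`, the same function could be applied to each `vertices[region]` array, under the ordering and clipping assumptions stated above.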

# Conclusion and Future work

We hope that predicting the landing area can help coaches shape their strategies. However, this is initial work, and there is a lot left to do. We did not finish the tackler prediction, because it depends on many interactions among players. These interactions should not be omitted, and we hope that a Graph Neural Network can process both the temporal and the spatial information in the NFL Big Data. We hope to provide more insightful suggestions in the near future.