###  1. Introduction
In present-day NFL games, the role of special teams has become increasingly important. The productivity of special teams hinges on response time, i.e. how quickly a player makes an appropriate move.

Player speed, which has recently come to be recognised as a defining feature of the NFL game, is a very important factor. The importance of speed is all the greater because NFL games are now more 'running games' than 'passing games' in nature.

This presentation focuses on the relevance of speed to NFL special teams. The goal is to promote insights, deduced from analysis of the NFL datasets, that could lead to the optimization of NFL players' speed.

Specifically, insights drawn from the analysis of the NFL datasets led to:
- a presentation of the present state of NFL special teams speed;
- the development of a machine-learning (ML) based model to clarify the factors affecting special teams speed;
- the development of a simple statistical model, an Ordinary Least Squares (OLS) model;
- a Graphical User Interface (GUI) for monitoring and managing special teams speed.

It is hoped that this work will contribute, in one way or another, to the knowledge base of the NFL.

###  2. Python Libraries and Modules
Import the libraries and modules used for the data analysis.

In [None]:
# load needed libraries and modules
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd

###  3.  Read Data Files. Select Variables to Work on. Remove NaN Values. 
#### 3.1. Load the Datasets
Read the NFL datasets into memory for processing: the Plays dataset and the 2018, 2019 and 2020 NFL Tracking datasets.

In [None]:
# read the plays dataset
Play = pd.read_csv('../input/nfl-big-data-bowl-2022/plays.csv')
# select the variables of interest for the study
Play = Play[['gameId','playId','quarter','possessionTeam','specialTeamsPlayType','specialTeamsResult','playResult']]
# convert empty strings to NaN, then drop rows with missing values
Play = Play.replace("", np.nan)
Play = Play.dropna(subset=['gameId','quarter','possessionTeam','specialTeamsPlayType','specialTeamsResult','playResult'])

In [None]:
# read the 2018 tracking dataset
d2018 = pd.read_csv('../input/nfl-big-data-bowl-2022/tracking2018.csv')
# select the variables of interest for the study
d2018 = d2018[['gameId','playId','x','y','s','dis','dir','position']]
# convert empty strings to NaN, then drop rows with missing values
d2018 = d2018.replace("", np.nan)
d2018 = d2018.dropna(subset=['gameId','playId','x','y','s','dis','dir','position'])

In [None]:
# read the 2019 tracking dataset
d2019 = pd.read_csv('../input/nfl-big-data-bowl-2022/tracking2019.csv')
# select the variables of interest for the study
d2019 = d2019[['gameId','playId','x','y','s','dis','dir','position']]
# convert empty strings to NaN, then drop rows with missing values
d2019 = d2019.replace("", np.nan)
d2019 = d2019.dropna(subset=['gameId','playId','x','y','s','dis','dir','position'])

In [None]:
# read the 2020 tracking dataset
d2020 = pd.read_csv('../input/nfl-big-data-bowl-2022/tracking2020.csv')
# select the variables of interest for the study
d2020 = d2020[['gameId','playId','x','y','s','dis','dir','position']]
# convert empty strings to NaN, then drop rows with missing values
d2020 = d2020.replace("", np.nan)
d2020 = d2020.dropna(subset=['gameId','playId','x','y','s','dis','dir','position'])
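The three tracking cells above repeat the same read, select and clean steps. A minimal sketch of a hypothetical helper, `load_tracking`, that factors them out; it assumes the tracking CSVs carry the column names used above, and the tiny in-memory CSV in the usage example is made up to stand in for the real files. Passing `usecols` to `pd.read_csv` also avoids loading unused columns, which helps with files of this size.

```python
import io

import numpy as np
import pandas as pd

# columns kept from each tracking file (same list as in the cells above)
TRACK_COLS = ['gameId', 'playId', 'x', 'y', 's', 'dis', 'dir', 'position']


def load_tracking(path_or_buffer):
    """Hypothetical helper: read one tracking CSV, keep TRACK_COLS,
    turn empty strings into NaN and drop incomplete rows."""
    df = pd.read_csv(path_or_buffer, usecols=TRACK_COLS)
    df = df.replace("", np.nan)
    return df.dropna(subset=TRACK_COLS)


# usage on a tiny made-up CSV; the third row (the football) has no dir or position
sample = io.StringIO(
    "gameId,playId,x,y,s,dis,dir,position,team\n"
    "1,10,35.0,20.1,3.2,0.33,90.0,K,home\n"
    "1,10,40.2,25.4,5.1,0.52,180.0,PR,away\n"
    "1,10,36.0,20.0,0.0,0.00,,,football\n"
)
tracking = load_tracking(sample)
print(tracking.shape)  # (2, 8)
```

On the real data this would read, for example, `d2018 = load_tracking('../input/nfl-big-data-bowl-2022/tracking2018.csv')`.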

####  3.2. Merge Files.  Remove NaN Values. 
Merge the files into a single dataset, named NFLProject for convenience, to be used for the analysis.
The final merge of the original Plays and 2018, 2019 and 2020 NFL Tracking datasets has 35171290 rows × 13 columns.

In [None]:
# stack the 2018, 2019 and 2020 tracking datasets into a single DataFrame
Track = pd.concat([d2018, d2019, d2020], ignore_index=True)

# convert empty strings to NaN, then drop rows with missing values
Track = Track.replace("", np.nan)
Track = Track.dropna(subset=['gameId','playId','x','y','s','dis','dir','position'])

In [None]:
# merge plays and 3 tracking datasets
NFLProject = pd.merge(Play, Track, on=["gameId", "playId"])
#NFLProject
# 35171290 rows × 13 columns
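Because `Play` has one row per play while `Track` has one row per player per tracking frame, the inner merge repeats each play's attributes on every matching tracking row. That is why the result runs to millions of rows, and why it has 7 + 8 - 2 = 13 columns (the two key columns are shared). A toy sketch with made-up values:

```python
import pandas as pd

# toy stand-ins for Play (one row per play) and Track (one row per player-frame)
play = pd.DataFrame({
    'gameId': [1, 1], 'playId': [10, 11], 'quarter': [1, 2],
    'possessionTeam': ['A', 'B'], 'specialTeamsPlayType': ['Punt', 'Kickoff'],
    'specialTeamsResult': ['Return', 'Touchback'], 'playResult': [40, 0],
})
track = pd.DataFrame({
    'gameId': [1, 1, 1], 'playId': [10, 10, 11],
    'x': [30.0, 31.0, 50.0], 'y': [20.0, 21.0, 26.0], 's': [4.0, 5.0, 2.0],
    'dis': [0.4, 0.5, 0.2], 'dir': [90.0, 95.0, 180.0], 'position': ['P', 'P', 'K'],
})

# inner join on the two keys, as in the NFLProject cell
merged = pd.merge(play, track, on=['gameId', 'playId'])
print(merged.shape)                     # (3, 13)
print(merged.groupby('playId').size())  # play 10 appears twice, play 11 once
```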

### 4. Statistical Distribution of 'NFL specialTeamsPlayType' and  'specialTeamsResult' data
An exploratory data analysis (EDA) of the merged dataset. The scope is deliberately minimal, emphasising only the essential distributions.
####  4.1. NFL  specialTeamsPlayType by specialTeamsResult  distribution
A look at the distribution of NFL specialTeamsPlayType and specialTeamsResult. The presentation includes a table, a plot and a characterisation of salient observations.

 - About 44.5% of Blocked Kick Attempt results come from Extra Point plays, while the remaining 55.5% come from Field Goal plays.
 - All (100%) Blocked Punt, Downed and Fair Catch results come from Punt plays.
 - About 57% of Kick Attempt Good results come from Extra Point plays, while the remaining 43% come from Field Goal plays.
 - About 30% of Kick Attempt No Good results come from Extra Point plays, while the remaining 70% come from Field Goal plays.
 - All (100%) Kickoff Team Recovery results come from Kickoff plays.
 - About 72% of Muffed results come from Punt plays, while the remaining 28% come from Kickoff plays.
 - About 70% of Non-Special Teams Result outcomes come from Punt plays, 15% from Extra Point plays and the remaining 15% from Field Goal plays.
 - About 95% of Out of Bounds results come from Punt plays, while the remaining 5% come from Kickoff plays.
 - About 50% of Return results come from Punt plays and the remaining 50% from Kickoff plays.
 - Only about 10% of Touchback results come from Punt plays, while the remaining 90% come from Kickoff plays.

In [None]:
# a plot of the crosstab
pd.crosstab(NFLProject.specialTeamsResult, NFLProject.specialTeamsPlayType, normalize='index').plot.bar(stacked=True)
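One caveat on the percentages above: `NFLProject` has one row per player per tracking frame, so a crosstab on it weights each play by its number of tracking rows rather than counting plays. A minimal sketch, on made-up toy rows, of how row-weighted and play-weighted proportions can differ; keeping one row per (`gameId`, `playId`) with `drop_duplicates` gives per-play proportions.

```python
import pandas as pd

# toy merged rows: the Punt play has 4 tracking rows, the Field Goal play has 1
rows = pd.DataFrame({
    'gameId': [1, 1, 1, 1, 1],
    'playId': [10, 10, 10, 10, 11],
    'specialTeamsPlayType': ['Punt'] * 4 + ['Field Goal'],
    'specialTeamsResult': ['Out of Bounds'] * 5,
})

# row-weighted proportions (what a crosstab on the merged data gives)
by_row = pd.crosstab(rows.specialTeamsResult, rows.specialTeamsPlayType,
                     normalize='index')

# play-weighted proportions: keep one row per (gameId, playId) first
plays = rows.drop_duplicates(subset=['gameId', 'playId'])
by_play = pd.crosstab(plays.specialTeamsResult, plays.specialTeamsPlayType,
                      normalize='index')

print(by_row)   # Field Goal 0.2, Punt 0.8
print(by_play)  # Field Goal 0.5, Punt 0.5
```

Which weighting is preferable depends on the question: per-play proportions describe how often each result occurs, while row-weighted ones also reflect how long each play lasts.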

####  4.2. NFL specialTeamsResult given specialTeamsPlayType and quarter distribution
A look at the distribution of NFL specialTeamsResult given specialTeamsPlayType and quarter. The presentation includes a table, a heatmap plot and a characterisation of salient observations.

  - Blocked Kick Attempt results most likely come from Extra Point plays in the 2nd quarter.
  - Blocked Punt results come from Punt plays in all 4 quarters, most prominently in the 1st and 3rd quarters.
  - Downed results come from Punt plays in all 4 quarters and in overtime (quarter 5); they are most prominent in the 2nd and 4th quarters.
  - Fair Catch results come from Kickoff and Punt plays in all 4 quarters and in overtime; they are most likely in the 1st and 2nd quarters.
  - Kick Attempt Good results come from Extra Point and Field Goal plays in all 4 quarters and in overtime; they are most likely in the 2nd and 4th quarters.
  - Kick Attempt No Good results come from Extra Point and Field Goal plays in all 4 quarters and in overtime; they are most likely in the 2nd and 4th quarters.
  - Kickoff Team Recovery results come from Kickoff plays in all 4 quarters; they are most likely in the 4th quarter.
  - Muffed results come from Kickoff and Punt plays in all 4 quarters and in overtime; they are most likely in the 2nd and 3rd quarters.
  - Non-Special Teams Result outcomes come from Extra Point, Field Goal and Punt plays in all 4 quarters; they are most likely in the 2nd and 4th quarters.
  - Out of Bounds results come from Field Goal, Kickoff and Punt plays in all 4 quarters and in overtime; they are most likely in the 2nd and 4th quarters.
  - Return results come from Kickoff and Punt plays in all 4 quarters and in overtime; they are most likely in the 2nd and 3rd quarters.
  - Touchback results come from Kickoff and Punt plays in all 4 quarters and in overtime; they are most likely in the 3rd quarter.
  
  - The heatmap below shows how often each result occurs within each (specialTeamsPlayType, quarter) column; the values are column proportions and therefore range from 0 to 1.

In [None]:
import seaborn as sns
sns.heatmap(pd.crosstab(NFLProject.specialTeamsResult, [NFLProject.specialTeamsPlayType, NFLProject.quarter],normalize='columns'),
            cmap="YlGnBu", annot=True, cbar=True)
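With `normalize='columns'`, each heatmap cell is the share of that result within one (specialTeamsPlayType, quarter) column, so every column sums to 1 and the colour scale spans 0 to 1. A toy illustration with made-up rows:

```python
import pandas as pd

# toy plays: specialTeamsResult by (play type, quarter)
plays = pd.DataFrame({
    'specialTeamsPlayType': ['Punt', 'Punt', 'Punt', 'Kickoff', 'Kickoff'],
    'quarter': [1, 1, 2, 1, 1],
    'specialTeamsResult': ['Return', 'Fair Catch', 'Downed', 'Touchback', 'Touchback'],
})

# normalize='columns' divides every cell by its column total
table = pd.crosstab(plays.specialTeamsResult,
                    [plays.specialTeamsPlayType, plays.quarter],
                    normalize='columns')

print(table.sum(axis=0))   # every (play type, quarter) column sums to 1.0
print(table[('Punt', 1)])  # Downed 0.0, Fair Catch 0.5, Return 0.5, Touchback 0.0
```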

#### 4.3. Distribution of 'speed' data
Summary statistics of the speed variable 's' (in yds/sec) over the 35171290 rows of the merged dataset indicate that:

In [None]:
# derive the essential statistics of the 's' variable
NFLProject['s'].describe()

- the maximum 'speed' of NFL players in the dataset is 15.63 yds/sec, i.e. 31.97 mph;
- the average 'speed' of NFL players is 3.65 yds/sec, i.e. 7.476 mph;
- the median 'speed' of NFL players is 3.16 yds/sec, i.e. 6.46 mph.

Furthermore, general observation indicates that most sprints to the end zone to score touchdowns are run at about 10.57 yds/sec, i.e. about 21.6 mph.

The proposal is that programmed, regular monitoring could significantly improve these levels.
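The mph figures follow from 1 mile = 1,760 yards and 1 hour = 3,600 seconds, so 1 yd/sec is about 2.045 mph. A small helper (its name is illustrative) that works on a single value or on a whole pandas Series, e.g. `yds_per_sec_to_mph(NFLProject['s']).describe()`:

```python
# the 's' column is in yards per second; convert to miles per hour
YARDS_PER_MILE = 1760
SECONDS_PER_HOUR = 3600


def yds_per_sec_to_mph(speed):
    """Convert a speed (scalar or pandas Series) from yd/s to mph."""
    return speed * SECONDS_PER_HOUR / YARDS_PER_MILE


for label, yps in [('maximum', 15.63), ('median', 3.16)]:
    print(label, round(yds_per_sec_to_mph(yps), 2))
# maximum 31.97
# median 6.46
```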

###  5.  A Statistical  Model For Prediction of speed in NFL games
A statistical model to explore NFL player speed further. The model considers the relationships between the x, y, dis, dir and specialTeamsPlayType variables and s (speed).
####  5.1. Ordinary Least Squares (OLS) Model
Apply the following Ordinary Least Squares model, fitted with the statsmodels formula API:

In [None]:
from statsmodels.formula.api import ols
result = ols(formula="s ~ y + x + dis + dir + specialTeamsPlayType", data=NFLProject).fit()

####  5.2. Interpretation of the Results of the OLS Model

- An adjusted R-squared of 0.996 indicates a very good fit, provided the usual OLS assumptions (a linear relationship and independent errors with constant variance) hold.
- The intercept of -0.0156 is the expected speed s (in yds/sec) for an Extra Point play, the baseline play type, when y, x, dis and dir are all zero.
- 'T.Field Goal' is a 0/1 indicator for Field Goal plays relative to the Extra Point baseline: holding everything else constant, the expected speed on a Field Goal play is about 0.0002 yds/sec higher than on an Extra Point play.
- 'T.Kickoff' is a 0/1 indicator for Kickoff plays: holding everything else constant, the expected speed on a Kickoff play is about 0.0523 yds/sec higher than on an Extra Point play.
- 'T.Punt' is a 0/1 indicator for Punt plays: holding everything else constant, the expected speed on a Punt play is about 0.0338 yds/sec higher than on an Extra Point play.
- The 'y' coefficient suggests that, holding everything else constant, a one-yard increase in y is associated with an expected decrease in speed of about 0.000003283 yds/sec.
- The 'x' coefficient suggests that, holding everything else constant, a one-yard increase in x is associated with an expected decrease in speed of about 0.0000004162 yds/sec.
- The 'dis' coefficient suggests that, holding everything else constant, a one-yard increase in dis is associated with an expected increase in speed of about 9.8978 yds/sec. This is close to 10, as expected: tracking data are recorded 10 times per second and dis is the distance covered since the previous frame, so 10 × dis approximates speed in yds/sec.
- The 'dir' coefficient suggests that, holding everything else constant, a one-degree increase in dir is associated with an expected increase in speed of about 0.000002328 yds/sec.
- The standard error (std err) measures the precision of each coefficient estimate: the lower it is, the more precise the estimate. A p-value below 0.05 indicates that the corresponding coefficient is statistically significantly different from zero at the 5% level.
- The 95% confidence interval gives the range of plausible values for each coefficient; intervals constructed this way contain the true coefficient in 95% of repeated samples.
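Two points in the interpretation above can be checked on synthetic data: the statsmodels formula API encodes specialTeamsPlayType as 0/1 treatment dummies against the alphabetically first level ('Extra Point'), and a dis coefficient near 10 is what one expects when dis is the distance covered between frames recorded 10 times per second. The sketch below fabricates such data purely for illustration (the noise level and value ranges are assumptions); it does not reproduce the notebook's numbers.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)
n = 2000

# synthetic frames: dis is yards covered in 0.1 s, so s is roughly 10 * dis
df = pd.DataFrame({
    'specialTeamsPlayType': rng.choice(['Extra Point', 'Field Goal', 'Kickoff', 'Punt'], size=n),
    'x': rng.uniform(0, 120, n),
    'y': rng.uniform(0, 53.3, n),
    'dis': rng.uniform(0, 1, n),
    'dir': rng.uniform(0, 360, n),
})
df['s'] = 10 * df['dis'] + rng.normal(0, 0.05, n)

result = ols(formula="s ~ y + x + dis + dir + specialTeamsPlayType", data=df).fit()

# the categorical term appears as dummies against the baseline 'Extra Point'
print(result.params.index.tolist())
print(round(result.params['dis'], 2))  # close to 10
print(result.conf_int().loc['dis'])    # 95% confidence interval for dis
```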

#### 5.3. A Graphical User Interface (GUI)
- A plausible Graphical User Interface (GUI), based on the findings of the OLS model in section 5.2 above, is presented here.
- This GUI is an interactive tool suggested as an auxiliary speed-monitoring device. Adapting its concepts could be of great use to NFL team managers, special teams managers and administrators in general.
- The ranges of field goals, kickoffs and punts offered by the GUI sliders are based on averages derived from the processed data.
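In an OLS model, a predicted value is the intercept plus each coefficient multiplied by its variable's value, with a play-type dummy contributing only for that play type (Extra Point, the baseline, adds nothing). A minimal sketch of that calculation using the coefficients reported in 5.2; the helper name `predict_speed` and the example inputs are illustrative assumptions.

```python
# OLS coefficients as reported in section 5.2
INTERCEPT = -0.0156
PLAY_TYPE = {'Extra Point': 0.0, 'Field Goal': 0.0002, 'Kickoff': 0.0523, 'Punt': 0.0338}
B_X, B_Y, B_DIS, B_DIR = -0.0000004162, -0.000003283, 9.8978, 0.000002328


def predict_speed(play_type, x, y, dis, dir_deg):
    """Hypothetical helper: expected s (yd/s) for one tracking row."""
    return (INTERCEPT + PLAY_TYPE[play_type]
            + B_X * x + B_Y * y + B_DIS * dis + B_DIR * dir_deg)


# a kickoff-coverage player near midfield covering 0.5 yd in one frame
print(round(predict_speed('Kickoff', 60, 26.65, 0.5, 90), 3))  # 4.986
```

As the output suggests, dis dominates the prediction; the position and direction terms shift it only in the fourth decimal place.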

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output
from ipywidgets import interact, interactive, fixed, interact_manual

# title labels shown on the left of the GUI
infoone = widgets.Label(value='GRAPHICAL USER INTERFACE: ',style={'description_width': 'initial'})
infotwo = widgets.Label(value='based on NFL Data Sets: ',style={'description_width': 'initial'})
infothree = widgets.Label (value= 'Games, Plays, 2018,  ',style={'description_width': 'initial'})
infofour = widgets.Label (value= '2019 and 2020 Tracking  ',style={'description_width': 'initial'})

# stack the title labels vertically
vbox_image_label=widgets.VBox([infoone,infotwo,infothree,infofour])

# spacing
label1 = widgets.Label(value='.......',style={'description_width':'initial'})
vbox_space = widgets.VBox([label1])

# label widget for sliders
labelA = widgets.Label(value='NFL specialTeamsPlayTypes SPEED MONITOR..               ',style={'description_width':'initial'})
labelB = widgets.Label(value="     1. Monitor speed for specialTeamsPlayType(s);     ",style={'description_width':'initial'})
labelC = widgets.Label(value='     2. Self-explanatory labeling;                     ',style={'description_width':'initial'})
labelD = widgets.Label(value='     3. Choose from specialTeamsPlayType(s) sliders;   ',style={'description_width':'initial'})
labelE = widgets.Label(value='     HOW IT WORKS:        ',style={'description_width':'initial'})
labelF = widgets.Label(value='       Goal is to monitor speed of NFL players        ',style={'description_width':'initial'})
labelG = widgets.Label(value='       for an optimized specialTeamsPlayType result.     ',style={'description_width':'initial'})

vbox_info=widgets.VBox([labelA, labelB, labelC, labelD, labelE, labelF, labelG])

def funcT(Intercept, T_FieldGoal, T_Kickoff, T_Punt, Xaxis, Yaxis, dis, dir):
    # Add up the OLS terms from 5.2 with their reported signs: the intercept,
    # x and y coefficients are negative; play-type, dis and dir are positive.
    # Play-type terms are scaled by the slider counts; x, y, dis and dir
    # enter at one unit each.
    display(-Intercept + 0.0002*T_FieldGoal + 0.0523*T_Kickoff + 0.0338*T_Punt
            - Xaxis - Yaxis + dis + dir)

# red sliders are fixed (min == max) and only display the coefficients;
# blue sliders choose the number of each specialTeamsPlayType
w = interactive(funcT,
                Intercept=widgets.FloatSlider(min=0.0156, max=0.0156, value=0.0156, description="Intercept", layout=widgets.Layout(width="200px"), style={'handle_color': 'red'}),
                T_FieldGoal=widgets.IntSlider(min=0, max=3, value=0, step=1, description="#FieldGoal", layout=widgets.Layout(width="200px"), style={'handle_color': 'blue'}),
                T_Kickoff=widgets.IntSlider(min=0, max=10, value=0, step=1, description="#Kickoff", layout=widgets.Layout(width="200px"), style={'handle_color': 'blue'}),
                T_Punt=widgets.IntSlider(min=0, max=5, value=0, step=1, description="#Punt", layout=widgets.Layout(width="200px"), style={'handle_color': 'blue'}),
                Xaxis=widgets.FloatSlider(min=0.0000004162, max=0.0000004162, value=0.0000004162, description="xConstant", layout=widgets.Layout(width="200px"), style={'handle_color': 'red'}),
                Yaxis=widgets.FloatSlider(min=0.000003283, max=0.000003283, value=0.000003283, description="yConstant", layout=widgets.Layout(width="200px"), style={'handle_color': 'red'}),
                dis=widgets.FloatSlider(min=9.8978, max=9.8978, value=9.8978, description="disConstant", layout=widgets.Layout(width="200px"), style={'handle_color': 'red'}),
                dir=widgets.FloatSlider(min=0.000002328, max=0.000002328, value=0.000002328, description="dirConstant", layout=widgets.Layout(width="200px"), style={'handle_color': 'red'}),
)

GUIpage = widgets.HBox([vbox_image_label,vbox_space, vbox_info,vbox_space, w])
display(GUIpage)

###  6. Summary and Remarks
"The faster the ball moves, say from the long snapper to the punter, the quicker the punter gets the ball off into the air, thereby reducing the number of turnovers for his team on 4th down." This captures the essence of the NFL game.

NFL datasets were analysed, and some insights drawn from the analysis were discussed with the aim of helping to strengthen the knowledge base of the NFL.

Subsequently, a simple and plausible approach for improving the speed of special teams in particular was presented.

The concept and the suggested approach are durable because they can easily be updated as more data become available. The concept centres on close monitoring of players' 'speed' to promote improved performance and team victory rates.

Furthermore, more robust models could be sought to strengthen the validity of the concept presented in this presentation.