# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

**Introduction**

   Many strides have been made over the years to advance the game of football on both sides of the ball. However, many forget about the third side of the ball: special teams. This year’s prompt offered an opportunity to take a deep dive into an overlooked aspect of football that can have a significant impact on the final score of a game.

   What is also often overlooked is how much more rigor it takes to prepare football data for analysis than data from other sports such as baseball, known for being the first sport to truly embrace analytics. In baseball the focus of the analysis is fairly consistent: in most cases it centers on outcomes at the plate, where each batter faces the outfield. In football, however, the two teams march up and down the field in opposing directions.

   It is often said that baseball these days is played on a spreadsheet. To play football on a spreadsheet, a sizable amount of data manipulation is required before the data is normalized to the point where it can be analyzed in a consistent, easily translatable, and actionable manner.

   At all levels of the game, teams often fill their special teams units with reserve players from the full roster, as special teams can be treated as an afterthought not fit for the starters. However, teams that master the discipline of special teams can give themselves an edge, and every little advantage matters. As the great Vince Lombardi famously declared, “football is a game of inches, and inches make the champion.”

   A punt returned for a touchdown or a blocked field goal can swing the result of a game. We believe that investing in special teams is a worthwhile pursuit, especially at a time when many teams have invested heavily in analytics to gain an edge over their opponents. The recommendations that follow could help wise teams get one step ahead.

**Focus of Analysis**

   For our project we decided to focus on punts, as those inches translate into yards and, subsequently, field position, which can make all the difference. We made return yardage on punts our key performance indicator (KPI), and we wanted to focus on what teams can do to improve their outcomes on that KPI during each punting sequence.

   We considered various topics, but ultimately we took a particular interest in the punt formation, specifically the matchup of gunners versus vises. We believe that this is the most crucial matchup for explaining net punt yardage, because a single uncovered gunner can wreak havoc on the chances of a successful punt return.

   Field position is crucial in football, and teams have an incentive to place high importance on it. According to pro-football-reference.com data, the average drive began at the 28-yard line over the 2018-2020 seasons. We looked at the same time range as the data provided for this project for consistency. In those three seasons, the average start of each drive and the standard deviations were as follows:



https://docs.google.com/document/d/14x7_iEuB1fiznRcfLrVp2jJsH9jDuqibQ8Eyx2dcFHw/edit

A difference of even two yards would be meaningful, as the standard deviations seen here range from about 1.5 to 2 yards. This matters because we also observed a moderate correlation between the average start of each drive and both points per drive (correlation coefficient of .36) and drive scoring percentage (correlation coefficient of .38) for these three seasons.

With that in mind, we felt that an analysis of punting strategies that improve a team’s net punt yardage relative to an opponent’s would be a worthwhile pursuit, since the recommendations would be easy for any team to implement and the benefits would be impactful. Football is about placing your moving chess pieces in the right places at the right time for the situation, and we believe that we have an actionable recommendation for punts.

   Consider the distribution below. Moving up even a few yards would make a meaningful difference. A three-to-five-yard gain could be the difference between having to punt and kicking a field goal, and those three points could make or break a game.


https://docs.google.com/document/d/1roBJaj9mgQWFhG8txazPwgpqas5_otMHXJms74nSkOs/edit

**Process**

   These were the steps we followed to make the provided data ready for our analysis:


* Extracted, transformed, and loaded all five of the initially provided datasets
* Enhanced and transformed the loaded data as follows:

 - Created a new table named player_game_team from existing data to provide a direct means of player-team and player-game association
 - Normalized the returnerId values into a new table named play_returners
 - Normalized the penaltyCodes and penaltyJerseyNumbers values into a new table named play_penalties
 - Parsed the penaltyCodes and penaltyJerseyNumbers values in the plays data and stored them in a dataframe
 - Computed the defensePenaltyCnt and offensePenaltyCnt values
 - Parsed the semicolon-delimited multiple values in the PFF Scouting data
 - Added the following fields to the existing pff_scouting database table:
   - gunnersCnt = count of gunners
   - visesCnt = count of vises


We then used Stata and SPSS to do statistical analysis.

*Flipping the Field For More Ready-for-Review Plays*

Building off the dataset we described above, these were the steps we followed for our method of visualizing plays in a normalized format across all teams.


* Transformed all the right-to-left plays to be left-to-right for consistency
* Rotated the field 90 degrees so plays display vertically instead of horizontally 
* Looked at video of plays to audit the transformation and ensure it still had the original details intact
* To compare multiple plays in the same visual we normalized the line of scrimmage (LOS)
  - We calculated the offset of each player's (and the football's) y-coordinate from each play's LOS and then recalculated the y-coordinates to be relative to a LOS that is always at the 50-yard line.
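
The three transformations listed above (flip, rotate, normalize the LOS) can be sketched roughly as follows. The sketch assumes a 120 by 53.3 yard coordinate system, tracking columns named `x`, `y`, `playDirection`, and `displayName`, and that the football's position at the snap marks the LOS; the function and output column names are hypothetical, and the detailed code in the Appendix may differ.

```python
import pandas as pd

FIELD_LENGTH = 120.0  # yards, including both end zones (assumed)
FIELD_WIDTH = 53.3    # yards (assumed)
MIDFIELD = 60.0       # the 50-yard line in 0-120 coordinates

def normalize_snap_frame(frame: pd.DataFrame) -> pd.DataFrame:
    """Flip, rotate, and LOS-normalize one play's at-snap tracking rows."""
    out = frame.copy()

    # Step 1: mirror right-to-left plays so every play runs left-to-right.
    going_left = out["playDirection"] == "left"
    out.loc[going_left, "x"] = FIELD_LENGTH - out.loc[going_left, "x"]
    out.loc[going_left, "y"] = FIELD_WIDTH - out.loc[going_left, "y"]

    # Step 2: rotate 90 degrees so the play runs south-to-north.
    out["vert_x"] = FIELD_WIDTH - out["y"]  # position across the field
    out["vert_y"] = out["x"]                # position along the field

    # Step 3: shift every play so its LOS sits on the 50-yard line.
    los = out.loc[out["displayName"] == "football", "vert_y"].iloc[0]
    out["vert_y_norm"] = out["vert_y"] - los + MIDFIELD
    return out

# Toy right-to-left play (invented coordinates).
snap = pd.DataFrame({
    "displayName": ["football", "Gunner A", "Vise B"],
    "playDirection": ["left", "left", "left"],
    "x": [80.0, 80.5, 79.0],
    "y": [26.65, 5.0, 6.0],
})
normalized = normalize_snap_frame(snap)
print(normalized[["displayName", "vert_x", "vert_y_norm"]])
```

After the transformation the football sits at 60.0 (the 50-yard line), the punting team's player is just below it, and the return team's player is just above it, so plays from different games can share one plot.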


**Methodology**

Our analysis used player-tracking data at the time the ball was snapped in order to first find each punt formation. We were interested in each player’s position on the field in terms of their x-y coordinates, but particularly interested in the count of gunners and the count of the opposing vises. Our assumption was that if there was an imbalance between the two we would observe significant differences in the outcome in terms of net punt yardage.

 Our analysis looked at 5,935 punts from 2018-2020. We found the exact distance of each punt using the x-y coordinates of the football itself, tracking its flight path and subsequent return when applicable.
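
One way to measure a punt from the football's coordinates is sketched below. It is a sketch under stated assumptions rather than our exact method: it assumes the football's tracking rows carry `frameId`, `event`, `x`, and `y` columns, that the kick is labeled `punt`, that the end of the flight is labeled with one of the events in `END_EVENTS`, and that straight-line (Euclidean) distance is the quantity of interest.

```python
import numpy as np
import pandas as pd

# Assumed event labels marking the end of the ball's flight.
END_EVENTS = ("punt_received", "punt_land", "fair_catch")

def punt_distance(ball_frames: pd.DataFrame) -> float:
    """Straight-line distance the football travels from the kick to the end of its flight."""
    ordered = ball_frames.sort_values("frameId")
    start = ordered.loc[ordered["event"] == "punt"].iloc[0]
    end = ordered.loc[ordered["event"].isin(END_EVENTS)].iloc[0]
    return float(np.hypot(end["x"] - start["x"], end["y"] - start["y"]))

# Toy football track for a single play (invented coordinates).
ball = pd.DataFrame({
    "frameId": [1, 2, 3, 4],
    "event": ["ball_snap", "punt", "None", "punt_received"],
    "x": [35.0, 30.0, 50.0, 70.0],
    "y": [26.0, 20.0, 35.0, 50.0],
})
distance = punt_distance(ball)
print(f"punt distance: {distance:.1f} yards")
```

If only downfield yardage matters, the difference in `x` alone could be used instead of the Euclidean distance.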

  In its original form, our data included punt plays running both left-to-right and right-to-left. To make each play easier to understand and compare, we normalized all of the plays to run left-to-right and, after the vertical transformation, south-to-north. This makes it easy to overlay two or more plays.

  More details on the transformation process and its explanatory power can be found in the Results section that follows, and the Appendix shows how it was done.


**Results**

*Hypothesis Testing:*
- Data Transformations:
  - Using our dataset, we created a column to flag plays where the gunners outnumbered the vises, and we used this column to split the sample.
  - Filtered the dataset to keep only the football's rows, so that each playId appears once instead of being duplicated for every tracked player.

SPSS Output: We ran an independent-samples t-test on this data.


https://docs.google.com/document/d/1czGlFrGW1BFJ5sO87ZXJlU80Bv3oCWZdtDGqI2BkE14/edit

- Interpretation:
  - We assumed equal variances because Levene’s Test for Equality of Variances was not significant (p = .083 > α = 0.05).
  - There is a significant difference in net punt return yardage when a team’s gunners outnumber opposing vises (t(5933) = -3.476, p < .001).
  - The difference in means for our sample is -4.553 yards, and the 95% confidence interval places the true difference in means between -7.121 and -1.985 yards (standard error 1.310).
  - The difference in return yardage between plays where gunners outnumber vises (Mean = 36.03; SD = 15.72) and plays where they do not (Mean = 40.59; SD = 12.41) was significant (t(5933) = -3.476, p < .001). We therefore reject the null hypothesis of no difference, which supports our hypothesis that an advantage in gunners leads to a decrease in net punt yardage.
  - Recommendation: Having at least one more gunner than vises is associated with a decrease of about 4.5 yards per punt return.
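
For readers without SPSS, the same kind of test can be approximated in Python. The sketch below is not the analysis we ran: it uses synthetic yardage values (not the competition data), hypothetical variable names, and SciPy's `levene` and `ttest_ind` functions, with the pooled-variance confidence interval computed by hand.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic yardage values for illustration only; NOT the competition data.
flagged = rng.normal(loc=36.0, scale=12.5, size=300)   # gunners outnumber vises
others = rng.normal(loc=40.5, scale=12.5, size=3000)   # all remaining punts

# Levene's test (mean-centered) checks the equal-variance assumption.
levene_stat, levene_p = stats.levene(flagged, others, center="mean")

# Independent-samples t-test with pooled (equal) variances.
t_stat, p_value = stats.ttest_ind(flagged, others, equal_var=True)

# 95% confidence interval for the difference in means, pooled standard error.
n1, n2 = len(flagged), len(others)
df = n1 + n2 - 2
diff = flagged.mean() - others.mean()
pooled_var = ((n1 - 1) * flagged.var(ddof=1) + (n2 - 1) * others.var(ddof=1)) / df
se = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_crit = stats.t.ppf(0.975, df)
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se

print(f"Levene p = {levene_p:.3f}")
print(f"t({df}) = {t_stat:.3f}, p = {p_value:.4f}")
print(f"mean difference = {diff:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

If Levene's test were significant, passing `equal_var=False` to `ttest_ind` would run Welch's t-test instead, and the pooled interval above would no longer apply.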


- In cases where vises outnumber gunners:
  - Data Transformations:
    - The same as before, except this time we created a column that flagged cases where vises outnumbered gunners.
  - SPSS Output: We ran an independent-samples t-test on this data.


https://docs.google.com/document/d/1eElY-2B6r2kwzFZt3_SRGi7BUoaOVmtzC7yuq0DDq-A/edit

- Interpretation:
  - We assumed equal variances because Levene’s Test for Equality of Variances was not significant (p = .485 > α = 0.05).
  - There is a significant difference in return yardage when a team’s vises outnumber opposing gunners (t(5933) = 8.550, p < .001).
  - The difference in means for our sample is 2.835 yards, and the 95% confidence interval places the true difference in means between 2.185 and 3.485 yards.
  - The difference in net punt yardage between plays where vises outnumber gunners (Mean = 42.27; SD = 12.01) and plays where they do not (Mean = 39.44; SD = 12.637) was significant (t(5933) = 8.550, p < .001).
  - We therefore reject the null hypothesis of no difference, which supports our hypothesis that an advantage in vises leads to an increase in net punt yardage.
  - Recommendation: Having at least one more vise than gunners is associated with an increase of about 2.84 yards per punt return.
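
As a rough consistency check on the interval reported for this second test, the standard error can be backed out of the mean difference and the t statistic. The check assumes the usual large-sample critical value of about 1.96 for 5,933 degrees of freedom:

$$
SE = \frac{\bar{x}_1 - \bar{x}_2}{t} = \frac{2.835}{8.550} \approx 0.3316,
\qquad
2.835 \pm 1.96 \times 0.3316 \approx 2.835 \pm 0.650 = (2.185,\ 3.485)
$$

This matches the reported 95% confidence interval of 2.185 to 3.485 yards.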


To illustrate our directional normalization results, we share a few examples. PlayId 218 from gameId 2019092211, when the Saints played the Seahawks, is shown below, pre-snap. It is not normalized; based on the x-y coordinates, the play moves from left to right. Deonte Harris scored a touchdown on this play (https://www.youtube.com/watch?v=-h34qUml52c).


https://docs.google.com/document/d/1Ovv10GP5nL9UVHM4dotpafCd6vbJy2hBFAd97q3zgmY/edit

For an example of a right-to-left play, let us review playId 1339 from gameId 2018121610. In this play the Patriots punt to the Steelers and do an excellent job of keeping the ball out of the end zone to down it at the Steelers’ one-yard line (https://www.youtube.com/watch?v=BQbs7gpfkcI).


https://docs.google.com/document/d/1VlCzRkjH35BTmf5Oxiq-kRUG7Yyv6lqgrEIcaPWe-J8/edit

We then flipped that play so that it also went left-to-right.


https://docs.google.com/document/d/1D_mRTDqD_0XS84QcYlA-ATaAqc1VHWpH6sRnfUZar_4/edit

Once all plays were left-to-right, we transformed the x-y coordinates for a vertical field.

This is how playId 218 from gameId 2019092211, when the Saints played the Seahawks, looks on the vertical field:


https://docs.google.com/document/d/17kvJ6BQFmUbKl6cwPFa2gF5A_YDm9Uitfk8r-i4zW7U/edit

And this is how the Patriots versus Steelers play looks on the vertical field:


https://docs.google.com/document/d/1z1d3Nys5J9mI0tNXQHcy2nStPbQTPZngTyBH7yia2Zo/edit

With all plays running in the same vertical direction, it is easier for coaches to analyze game film. Every play has the end zone that the team in possession is advancing toward at the top. This saves cognitive effort for coaches and players, as the focus is always on understanding the play itself rather than on figuring out which direction it went and recalibrating from there.

 Another benefit of this approach is the ability to overlay multiple plays on the same visual. The figure below shows both plays we have been discussing on one field. To show multiple plays on a vertical field, we need to normalize the LOS for each play. For this analysis we chose the 50-yard line.


https://docs.google.com/document/d/1QVxeSMt1vaA0CbKqBZs2oDEHCJOJyE5OwLLyrVriI_s/edit

**Conclusion**

  Looking at all plays in one consistent direction will save milliseconds each time a team reviews a new play, since no one needs to spend time figuring out which end zone is relevant for that specific play. Over the course of a season, all those milliseconds add up and give coaches more time to focus on instruction.

  One might compare those milliseconds to the aforementioned inches: that extra instruction may lead to the inches that make a team the champion, especially for teams that pursue every seemingly small edge, such as aiming for advantages in the gunner/vise matchup and wreaking havoc on special teams. Championship teams make sure to give their all in all three facets of the game.


**Appendix**

- Detailed Code: https://www.kaggle.com/zayuhtheiv/appendix-a2b3e5/edit
- Stata Dataset: https://docs.google.com/spreadsheets/d/1pviymC8HL4Cgo77gm0RMDwno5YDg6YtUG2uDtBjXVJ4/edit?usp=sharing
- SPSS Dataset: https://docs.google.com/spreadsheets/d/1kLzdpQKf1BBrQhZUvfIfID7Eq-Ewf6GX/edit#gid=232218285