NFL Punt Analytics

Programming note: We performed our analysis in SQL Server and Visual FoxPro. We are just using the Kernel to list the queries and programs we ran, knowing they will not run in the Kernel. When reviewing the Kernel, we noticed that asterisks were being deleted from our statements when the Kernel was saved (they showed up OK in the editing window). Therefore, in select statements, if you see a table being selected without specified columns (table name followed by a period and no column name after), there should be an asterisk at that spot.

Our central question was what rule change for punts would increase player safety while still being actionable. The thing that helped us most for this Kaggle competition was something unexpected.

The main Kevin (McKee) has been watching the "Last Call" show for the past 2 years. After each weekend of NFL games, Mike Pereira and Dean Blandino (former Vice Presidents of Officiating at the NFL) explain calls and interesting things from the prior week’s games. They not only go into the details of the rules, they also explain the theory behind the rules. Getting in the mindset of former NFL officials, and learning about the rule making process, was extremely valuable when it got to the point of proposing a new rule. Kevin felt like he had insider knowledge from watching the show.

We decided to ignore things that couldn't be changed. For example (a hypothetical, not a fact), assume 100% of punt concussions occurred in domed stadiums. Would we propose a rule to outlaw domed stadiums? This is an extreme example, but we also considered many other findings that we felt could not be turned into an implementable rule.

Numerous rules have already been put in place to increase player safety (e.g., "12-2-8 Makes lowering the head to initiate contact with the helmet a foul.") (source: NFL 2018 Rule Book). We wanted to make sure not to duplicate any of those rules.

Our initial thoughts included line formation changes, gunner changes and punt receiver positioning. Our proposal was not something we expected before we started looking at the data.

We first ran the SQL statements in the "SQL tables" kernel. This brought all the competition csv files into a SQL database. The Next Gen Stats files were all put in one table. Microsoft SQL Server 2017 was used.

Looking at the tables, we noticed the role data for the players did not indicate whether the player was on offense or defense. We created a new table with the following query:

SELECT Distinct [Role]

  INTO [KaggleNFLPunt].[dbo].[roles]

  FROM [KaggleNFLPunt].[dbo].[play_player_role_data]

  ORDER BY [Role]

We then added a column to this table called offdef, and manually entered "O" or "D" depending on whether the player role was on offense or defense. Offense or defense was determined using the appendix from the Punt Data and Attributes Manual included in the contest data.

We ran the following to get the total number of plays in the database:

SELECT [play_information].*, [video_review].*

  FROM [KaggleNFLPunt].[dbo].[play_information]

  LEFT JOIN [KaggleNFLPunt].[dbo].[video_review]

  ON [play_information].[GameKey] = [video_review].[GameKey]

  AND [play_information].[PlayID] = [video_review].[PlayID] 

There were 6,681 rows returned, so there were that many plays in the database. (Going forward, when we give a play count after a query, it will be the number of rows returned by the query.)

We then ran the following: (Going forward, we will not be putting this in front of each query. Assume each query listed is run and the results discussed below the query.)

SELECT [play_information].*, [video_review].*

  FROM [KaggleNFLPunt].[dbo].[play_information]

  LEFT JOIN [KaggleNFLPunt].[dbo].[video_review]

  ON [play_information].[GameKey] = [video_review].[GameKey]

  AND [play_information].[PlayID] = [video_review].[PlayID]

  WHERE [video_review].[PlayID] > 0

This told us there were 37 plays resulting in concussions in the database.

Next, we had to determine how many of the punt formation plays were actually punts. We noticed that if the word "punts" was not in the PlayDescription column of the play_information table, the play was not a punt.

SELECT [play_information].*, [video_review].*

  FROM [KaggleNFLPunt].[dbo].[play_information]

  LEFT JOIN [KaggleNFLPunt].[dbo].[video_review]

  ON [play_information].[GameKey] = [video_review].[GameKey]

  AND [play_information].[PlayID] = [video_review].[PlayID]

  WHERE CHARINDEX('punts',[play_information].[PlayDescription]) = 0

There were 219 non-punt plays in the database.

SELECT [play_information].*, [video_review].*

  FROM [KaggleNFLPunt].[dbo].[play_information]

  LEFT JOIN [KaggleNFLPunt].[dbo].[video_review]

  ON [play_information].[GameKey] = [video_review].[GameKey]

  AND [play_information].[PlayID] = [video_review].[PlayID]

  WHERE CHARINDEX('punts',[play_information].[PlayDescription]) = 0

  AND [video_review].[PlayID] > 0
  
One of the non-punt plays resulted in a concussion.
Notice the previous 2 queries are the same except for the last line:

AND [video_review].[PlayID] > 0

Going forward, when we give a query that returns a number of plays, we will also give the number of concussions, but not include that query, because it will be the same as the previous query except for adding "AND [video_review].[PlayID] > 0" at the end.
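For readers who want to try this counting convention inside the Kernel, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The three plays and the single video_review row are made up for illustration, and SQLite's INSTR (arguments reversed relative to CHARINDEX) stands in for CHARINDEX. The helper name count_plays_and_concussions is our own. It returns both numbers from one query by counting matched video_review rows, assuming at most one video_review row per play.

```python
import sqlite3

# Hypothetical miniature of the two tables; the real ones come from the
# competition csv files loaded into SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE play_information (GameKey INTEGER, PlayID INTEGER, PlayDescription TEXT);
    CREATE TABLE video_review (GameKey INTEGER, PlayID INTEGER);
    INSERT INTO play_information VALUES
        (1, 10, 'A.Punter punts 45 yards, fair catch by B.Returner'),
        (1, 20, 'A.Punter punts 50 yards. B.Returner for 8 yards'),
        (2, 10, 'C.Runner up the middle for 3 yards');
    INSERT INTO video_review VALUES (1, 20);
""")


def count_plays_and_concussions(where_sql="1 = 1"):
    """Return (plays, concussions) for the plays matching where_sql."""
    # COUNT(*) counts every play; COUNT(v.PlayID) counts only plays that
    # matched a video_review row, i.e. the plays with a concussion.
    # where_sql is pasted in directly, which is fine for ad hoc analysis only.
    row = conn.execute(f"""
        SELECT COUNT(*), COUNT(v.PlayID)
          FROM play_information AS p
          LEFT JOIN video_review AS v
            ON p.GameKey = v.GameKey AND p.PlayID = v.PlayID
         WHERE {where_sql}
    """).fetchone()
    return tuple(row)


print(count_plays_and_concussions())
print(count_plays_and_concussions("INSTR(p.PlayDescription, 'punts') = 0"))
```

Note that PlayID 10 appears in two different games in the sample data, which is why the join uses both GameKey and PlayID.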
  
We wanted a slide (#2) to break out the various non-punt plays in the data. 

SELECT [play_information].*, [video_review].*

  FROM [KaggleNFLPunt].[dbo].[play_information]

  LEFT JOIN [KaggleNFLPunt].[dbo].[video_review]

  ON [play_information].[GameKey] = [video_review].[GameKey]

  AND [play_information].[PlayID] = [video_review].[PlayID]

  WHERE CHARINDEX('punts',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('blocked',[play_information].[PlayDescription]) > 0
  
29 blocked punts, 0 concussions

SELECT [play_information].*, [video_review].*

  FROM [KaggleNFLPunt].[dbo].[play_information]

  LEFT JOIN [KaggleNFLPunt].[dbo].[video_review]

  ON [play_information].[GameKey] = [video_review].[GameKey]

  AND [play_information].[PlayID] = [video_review].[PlayID]

  WHERE CHARINDEX('punts',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('blocked',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('aborted',[play_information].[PlayDescription]) > 0
  
5 aborted punts, 0 concussions. The aborted punts include the punter fumbling based on the play description.

SELECT [play_information].*, [video_review].*

  FROM [KaggleNFLPunt].[dbo].[play_information]

  LEFT JOIN [KaggleNFLPunt].[dbo].[video_review]

  ON [play_information].[GameKey] = [video_review].[GameKey]

  AND [play_information].[PlayID] = [video_review].[PlayID]

  WHERE CHARINDEX('punts',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('blocked',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('aborted',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('pass',[play_information].[PlayDescription]) > 0

19 passes, 0 concussions. 
Note: The order in which we run the SQL statements determines the results. There was one blocked punt that turned into a pass play. Because we count that play in the blocked punts, we do not count it in the pass plays. This is not relevant to our analysis; we just wanted to point it out for completeness.

SELECT [play_information].*, [video_review].*

  FROM [KaggleNFLPunt].[dbo].[play_information]

  LEFT JOIN [KaggleNFLPunt].[dbo].[video_review]

  ON [play_information].[GameKey] = [video_review].[GameKey]

  AND [play_information].[PlayID] = [video_review].[PlayID]

  WHERE CHARINDEX('punts',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('blocked',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('aborted',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('pass',[play_information].[PlayDescription]) = 0  

  AND CHARINDEX('penalty',[play_information].[PlayDescription]) > 0 

142 penalties, 0 concussions. (Does not include 6 penalties in the blocked category, and 4 in the pass category.)

SELECT [play_information].*, [video_review].*

  FROM [KaggleNFLPunt].[dbo].[play_information]

  LEFT JOIN [KaggleNFLPunt].[dbo].[video_review]

  ON [play_information].[GameKey] = [video_review].[GameKey]

  AND [play_information].[PlayID] = [video_review].[PlayID]

  WHERE CHARINDEX('punts',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('blocked',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('aborted',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('pass',[play_information].[PlayDescription]) = 0  

  AND CHARINDEX('penalty',[play_information].[PlayDescription]) = 0 

24 runs, 1 concussion. (Since the play was none of the above types, it was a run.) 
All this information is summarized on Slide #2.
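The chain of queries above amounts to a first-match-wins classification: each query excludes the keywords tested before it. A minimal Python sketch of that idea follows. The play descriptions in the usage lines are invented, the function name is our own, and the lowercase comparison assumes the SQL Server database used a case-insensitive collation (the default), so that 'blocked' also matches 'BLOCKED'.

```python
# Ordered (keyword, label) rules for non-punt plays. The first match wins,
# mirroring how each query excludes the keywords tested before it.
NON_PUNT_RULES = [
    ("blocked", "blocked"),
    ("aborted", "aborted"),
    ("pass", "pass"),
    ("penalty", "penalty"),
]


def classify_non_punt(description):
    """Label a non-punt play by its first matching keyword, else 'run'."""
    # Assumption: matching is case-insensitive, as with a default collation.
    text = description.lower()
    for keyword, label in NON_PUNT_RULES:
        if keyword in text:
            return label
    return "run"  # none of the keywords matched


# Hypothetical descriptions, for illustration only.
print(classify_non_punt("Punt BLOCKED, recovered, then a pass incomplete"))
print(classify_non_punt("PENALTY on the kicking team, False Start"))
print(classify_non_punt("J.Doe up the middle for 3 yards"))
```

Reordering the rules would move plays such as the blocked punt that became a pass from one category to another, which is the order dependence noted above.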

Now it is time to analyze the punt plays.

SELECT [play_information].*, [video_review].*

  FROM [KaggleNFLPunt].[dbo].[play_information]

  LEFT JOIN [KaggleNFLPunt].[dbo].[video_review]

  ON [play_information].[GameKey] = [video_review].[GameKey]

  AND [play_information].[PlayID] = [video_review].[PlayID]

  WHERE CHARINDEX('punts',[play_information].[PlayDescription]) > 0

  AND CHARINDEX('fair catch',[play_information].[PlayDescription]) > 0

1,663 fair catches, 2 concussions.

SELECT [play_information].*, [video_review].*

  FROM [KaggleNFLPunt].[dbo].[play_information]

  LEFT JOIN [KaggleNFLPunt].[dbo].[video_review]

  ON [play_information].[GameKey] = [video_review].[GameKey]

  AND [play_information].[PlayID] = [video_review].[PlayID]

  WHERE CHARINDEX('punts',[play_information].[PlayDescription]) > 0

  AND CHARINDEX('fair catch',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('downed',[play_information].[PlayDescription]) > 0

810 downed, 3 concussions.

SELECT [play_information].*, [video_review].*

  FROM [KaggleNFLPunt].[dbo].[play_information]

  LEFT JOIN [KaggleNFLPunt].[dbo].[video_review]

  ON [play_information].[GameKey] = [video_review].[GameKey]

  AND [play_information].[PlayID] = [video_review].[PlayID]

  WHERE CHARINDEX('punts',[play_information].[PlayDescription]) > 0

  AND CHARINDEX('fair catch',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('downed',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('touchback',[play_information].[PlayDescription]) > 0

402 touchbacks, 0 concussions.

SELECT [play_information].*, [video_review].*

  FROM [KaggleNFLPunt].[dbo].[play_information]

  LEFT JOIN [KaggleNFLPunt].[dbo].[video_review]

  ON [play_information].[GameKey] = [video_review].[GameKey]

  AND [play_information].[PlayID] = [video_review].[PlayID]

  WHERE CHARINDEX('punts',[play_information].[PlayDescription]) > 0

  AND CHARINDEX('fair catch',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('downed',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('touchback',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('muff',[play_information].[PlayDescription]) > 0

201 muffed punts, 2 concussions.

SELECT [play_information].*, [video_review].*

  FROM [KaggleNFLPunt].[dbo].[play_information]

  LEFT JOIN [KaggleNFLPunt].[dbo].[video_review]

  ON [play_information].[GameKey] = [video_review].[GameKey]

  AND [play_information].[PlayID] = [video_review].[PlayID]

  WHERE CHARINDEX('punts',[play_information].[PlayDescription]) > 0

  AND CHARINDEX('fair catch',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('downed',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('touchback',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('muff',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('for ',[play_information].[PlayDescription]) > 0

2,746 regular punts (i.e. none of the types already listed), 29 concussions.

SELECT [play_information].*, [video_review].*

  FROM [KaggleNFLPunt].[dbo].[play_information]

  LEFT JOIN [KaggleNFLPunt].[dbo].[video_review]

  ON [play_information].[GameKey] = [video_review].[GameKey]

  AND [play_information].[PlayID] = [video_review].[PlayID]

  WHERE CHARINDEX('punts',[play_information].[PlayDescription]) > 0

  AND CHARINDEX('fair catch',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('downed',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('touchback',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('muff',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('for ',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('out of bounds',[play_information].[PlayDescription]) > 0

639 out of bounds, 0 concussions.

SELECT [play_information].*, [video_review].*

  FROM [KaggleNFLPunt].[dbo].[play_information]

  LEFT JOIN [KaggleNFLPunt].[dbo].[video_review]

  ON [play_information].[GameKey] = [video_review].[GameKey]

  AND [play_information].[PlayID] = [video_review].[PlayID]

  WHERE CHARINDEX('punts',[play_information].[PlayDescription]) > 0

  AND CHARINDEX('fair catch',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('downed',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('touchback',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('muff',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('for ',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('out of bounds',[play_information].[PlayDescription]) = 0

1 uncategorized play, 0 concussions. Reading the description, this was a touchback, so this will be added to the touchback total on the slide.

We prepared slide #3 with this information, along with the percentage of punts each play accounted for, and the concussion percentage for each play type. We noticed that fair catches had a concussion percentage of only 0.12%. If we could increase the percentage of fair catches, this should reduce the number of concussions. We figured the way to do this would be to award the team some yardage after a fair catch. To determine how many yards, we needed more information.
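The percentages can be recomputed from the counts reported after each query. The sketch below does that in Python. The counts are the ones listed above, with the single uncategorized play folded into touchbacks. The dictionary and function names are ours, and the slide itself may round or group the numbers differently.

```python
# (plays, concussions) per punt outcome, taken from the query results above.
# The one uncategorized play is added to touchbacks, as described.
PUNT_OUTCOMES = {
    "fair catch": (1663, 2),
    "downed": (810, 3),
    "touchback": (402 + 1, 0),
    "muffed": (201, 2),
    "regular return": (2746, 29),
    "out of bounds": (639, 0),
}


def summarize(outcomes):
    """Return {outcome: (share of punts in %, concussion rate in %)}."""
    total_plays = sum(plays for plays, _ in outcomes.values())
    return {
        name: (100.0 * plays / total_plays, 100.0 * concussions / plays)
        for name, (plays, concussions) in outcomes.items()
    }


summary = summarize(PUNT_OUTCOMES)
for name, (share, rate) in summary.items():
    print(f"{name:15s} {share:5.1f}% of punts, {rate:.2f}% concussion rate")
```

With these counts, the fair catch rate works out to 2 / 1,663, or about 0.12%, while regular returns come to 29 / 2,746, or roughly 1.06%.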

We first wanted to see average return distance. 

SELECT SUBSTRING([play_information].[PlayDescription],CHARINDEX('for ',[play_information].[PlayDescription])+4,3) As RetYds, 

  [play_information].*, [video_review].* 

  FROM [KaggleNFLPunt].[dbo].[play_information]

  LEFT JOIN [KaggleNFLPunt].[dbo].[video_review]

  ON [play_information].[GameKey] = [video_review].[GameKey]

  AND [play_information].[PlayID] = [video_review].[PlayID]

  WHERE CHARINDEX('punts',[play_information].[PlayDescription]) > 0

  AND CHARINDEX('fair catch',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('downed',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('touchback',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('muff',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('for ',[play_information].[PlayDescription]) > 0

  order by retyds

From the SQL results window, we selected all and copied with headers, and pasted to an Excel workbook. We inserted column B, and used the following formula to put the return yards in this column, as the yards from the SQL query needed some processing.  

“=IF(ISNUMBER(A2),A2,IF(LEFT(A2,2)="no",0,VALUE(LEFT(A2,1))))”

We found that 10.1 yards was the average return distance. We then created a pivot table and used that to create the graph seen in Slide #4. 
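The SUBSTRING expression and the Excel formula together turn the three characters after "for " into a number. A rough Python equivalent is sketched below. The function name and the sample descriptions are made up, and the handling of each case (whole number, "no gain", single digit followed by text) is our reading of the formula rather than a tested port of it.

```python
from statistics import mean


def return_yards(description):
    """Parse return yards from the text after the first 'for ', else None."""
    start = description.find("for ")
    if start < 0:
        return None
    chunk = description[start + 4:start + 7]  # same 3 characters as SUBSTRING
    try:
        return int(chunk)        # "12 " -> 12, "-3 " -> -3 (spaces are ignored)
    except ValueError:
        pass
    if chunk[:2] == "no":        # "for no gain"
        return 0
    return int(chunk[:1])        # single digit followed by text: "5 y" -> 5


# Hypothetical descriptions, for illustration only.
samples = [
    "R.Returner to SEA 32 for 12 yards (T.Tackler).",
    "R.Returner to SEA 25 for 5 yards (T.Tackler).",
    "R.Returner to SEA 20 for no gain (T.Tackler).",
    "R.Returner to SEA 17 for -3 yards (T.Tackler).",
]
yards = [return_yards(s) for s in samples]
print(yards, mean(yards))
```

A description whose text after "for " is neither a number nor "no" would raise a ValueError here, much as the Excel VALUE call would return an error for it.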

SELECT [ngs].* , 

  [play_player_role_data].[Role],
 
  [roles].[offdef]

  FROM [KaggleNFLPunt].[dbo].[ngs]

  LEFT JOIN [KaggleNFLPunt].[dbo].[play_player_role_data]

  ON  [ngs].[GameKey] =  [play_player_role_data].[GameKey]

  AND  [ngs].[PlayID] =  [play_player_role_data].[PlayID]

  AND  [ngs].[GSISID] =  [play_player_role_data].[GSISID]

  LEFT JOIN [KaggleNFLPunt].[dbo].[roles]

  ON [play_player_role_data].[Role] = [roles].[Role]

  WHERE [ngs].[Event] = 'punt_received' AND [roles].[Role] IS NOT NULL

  ORDER BY [ngs].[PlayID], [ngs].[GameKey], [roles].[offdef]

This gave us the Next Gen Stats info at the time the punt was received. We copied the results to a csv file named puntngs.csv. (We ran the same query later, changing the event from ‘punt_received’ to ‘fair_catch’ to analyze fair catches.)

SELECT  [play_information].*, [video_review].* 

  FROM [KaggleNFLPunt].[dbo].[play_information]

  LEFT JOIN [KaggleNFLPunt].[dbo].[video_review]

  ON [play_information].[GameKey] = [video_review].[GameKey]

  AND [play_information].[PlayID] = [video_review].[PlayID]

  WHERE CHARINDEX('punts',[play_information].[PlayDescription]) > 0

  AND CHARINDEX('fair catch',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('downed',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('touchback',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('muff',[play_information].[PlayDescription]) = 0

  AND CHARINDEX('for ',[play_information].[PlayDescription]) > 0

This gave us the play information for regular punt plays. We copied the results to a csv file named puntdefd.csv.

Following is the Visual FoxPro code we ran to get the distances to the nearest 3 opposing players at the time of the catch. We assumed the punt returner was the player fielding the ball. (There was no ball data in the Next Gen Stats provided to us, which would have allowed us to identify the player receiving the punt.) We feel any error from this assumption will be minimal. When adding the Next Gen Stats to our play information database, if a Game/Play had no Next Gen info or had duplicate punt_received events, we deleted that record from our play information database. We ended up excluding 54 punt return plays (and 46 fair catch plays below) for this reason. As this was less than 2% of the punt return plays (and less than 3% of the fair catch plays), we do not think this missing information would have changed our analysis.

----- Begin FoxPro Code -----
SET EXCLUSIVE OFF

CLOSE ALL

CLEAR ALL

RELEASE ALL

SET TALK ON

SET DEFAULT TO c:\nfl

CLOSE DATABASES ALL

IF FILE('puntngs.dbf')

  DELETE FILE 'puntngs.dbf'

ENDIF

CREATE TABLE puntngs( ;

  nSeason    N(4,0), ;

  nGameKey   N(3,0), ;

  nPlayId    N(4), ;

  nGSISID    N(5), ;

  cTime      c(23), ;

  nX         N(12,8), ;

  nY         N(11,8), ;

  nDis       N(4,2), ;

  nO         N(12,8), ;

  nDir       N(12,8), ;

  cEvent     c(24), ;

  cRole      c(4), ;

  cOffDef    c(1))

SELECT puntngs

APPEND FROM puntngs.CSV TYPE DELIMITED

IF FILE('puntdefd.dbf')

  DELETE FILE 'puntdefd.dbf'

ENDIF

CREATE TABLE puntdefd( ;

  nSYear     N(4,0), ;

  sSType     c(4), ;

  nGameKey   N(3,0), ;

  cGameDate  c(10), ;

  nWeek      N(2,0), ;

  nPlayId    N(4), ;

  cGameClock c(5), ;

  cYardLine  c(6), ;

  cQuarter   N(1,0), ;

  cPlayType  c(4), ;

  cPossTeam  c(3), ;

  cHTVT      c(7), ;

  cScore     c(7), ;

  cPlayDesc  c(254))

SELECT puntdefd

APPEND FROM puntdefd.CSV TYPE DELIMITED

ALTER TABLE puntdefd ADD COLUMN nDef1 N(12,8)

ALTER TABLE puntdefd ADD COLUMN nDef2 N(12,8)

ALTER TABLE puntdefd ADD COLUMN nDef3 N(12,8)

nPRX = 0

nPRY = 0

SELECT puntdefd

GO TOP

DO WHILE !EOF()

  SELECT puntngs.* ;

    FROM puntngs ;

    WHERE puntdefd.nGameKey = puntngs.nGameKey ;

    AND puntdefd.nPlayId = puntngs.nPlayId ;

    AND puntngs.cRole = "PR " ;

    INTO CURSOR curNGS

  nRecs = _TALLY  && rows returned by the SELECT above; avoids relying on RECCOUNT() of the cursor

  IF nRecs = 1

    nPRX = curNGS.nX

    nPRY = curNGS.nY

    SELECT puntngs.*, ;

      SQRT((nPRX - puntngs.nX)^2 + (nPRY - puntngs.nY)^2) AS nDefDist ;

      FROM puntngs ;

      WHERE puntdefd.nGameKey = puntngs.nGameKey ;

      AND puntdefd.nPlayId = puntngs.nPlayId ;

      AND puntngs.cOffDef = "O" ;

      ORDER BY nDefDist ;

      INTO CURSOR curNGS
      
    SELECT curNGS

    GO TOP

    REPLACE puntdefd.nDef1 WITH curNGS.nDefDist

    SKIP

    REPLACE puntdefd.nDef2 WITH curNGS.nDefDist

    SKIP
    REPLACE puntdefd.nDef3 WITH curNGS.nDefDist

  ELSE

    SELECT puntdefd

    DELETE

  ENDIF

  SELECT puntdefd

  SKIP

ENDDO

SELECT puntdefd

PACK

COPY TO puntdefd TYPE xl5

----- End FoxPro Code -----
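The core of the FoxPro program is a distance calculation for one play: find the punt returner, measure the straight-line distance to every player on the punting team, and keep the three smallest. A compact Python sketch of that step is shown here. The dictionary layout and function name are our own illustration, not the layout of the csv files, and the coordinates in the usage example are invented.

```python
import math


def nearest_three(players):
    """Distances from the punt returner to the 3 closest punting-team players.

    players: list of dicts with keys role, offdef, x, y for one play at the
    moment of the catch. Returns None when there is not exactly one 'PR' row,
    mirroring how the FoxPro program drops such plays.
    """
    returners = [p for p in players if p["role"] == "PR"]
    if len(returners) != 1:
        return None
    pr = returners[0]
    distances = sorted(
        math.hypot(pr["x"] - p["x"], pr["y"] - p["y"])
        for p in players
        if p["offdef"] == "O"  # the punting team is the offense
    )
    return distances[:3]


# Invented coordinates (yards) for one play.
play = [
    {"role": "PR", "offdef": "D", "x": 0.0, "y": 0.0},
    {"role": "GL", "offdef": "O", "x": 3.0, "y": 4.0},
    {"role": "GR", "offdef": "O", "x": 6.0, "y": 8.0},
    {"role": "PLS", "offdef": "O", "x": 0.0, "y": 1.0},
    {"role": "P", "offdef": "O", "x": 30.0, "y": 40.0},
]
print(nearest_three(play))
```

Sorting all distances and slicing is simple and fast enough here, since a play has at most 11 opposing players.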

This gave us an Excel workbook, which we then worked on. We did the same analysis for fair catches, running the following 2 SQL commands before running the FoxPro program above.

SELECT [ngs].* , 

  [play_player_role_data].[Role],
 
  [roles].[offdef]

  FROM [KaggleNFLPunt].[dbo].[ngs]

  LEFT JOIN [KaggleNFLPunt].[dbo].[play_player_role_data]

  ON  [ngs].[GameKey] =  [play_player_role_data].[GameKey]

  AND  [ngs].[PlayID] =  [play_player_role_data].[PlayID]

  AND  [ngs].[GSISID] =  [play_player_role_data].[GSISID]

  LEFT JOIN [KaggleNFLPunt].[dbo].[roles]

  ON [play_player_role_data].[Role] = [roles].[Role]

  WHERE [ngs].[Event] = 'fair_catch' AND [roles].[Role] IS NOT NULL

  ORDER BY [ngs].[PlayID], [ngs].[GameKey], [roles].[offdef]


SELECT  [play_information].*, [video_review].*

  FROM [KaggleNFLPunt].[dbo].[play_information]

  LEFT JOIN [KaggleNFLPunt].[dbo].[video_review]

  ON [play_information].[GameKey] = [video_review].[GameKey]

  AND [play_information].[PlayID] = [video_review].[PlayID]

  WHERE CHARINDEX('punts',[play_information].[PlayDescription]) > 0

  AND CHARINDEX('fair catch',[play_information].[PlayDescription]) > 0

To calculate the return distance based on the nearest defender (Slide 5), we put the following formulas in the Excel spreadsheet.

Col S: “=INT(O2)” – Integer value for closest defender

Col V: “=IFERROR(SEARCH("for ",N2),0)”  -- use for next calculation

Col W: “=IFERROR(IF(V2>0,VALUE(MID(N2, V2+3,3)),"N/A"),0)”  -- return distance

For punt returns, we created a pivot table, and then created a chart from the pivot table data (Slide #5). This confirmed that the farther away the defenders are at the time of the catch, the more yardage the returner gains. We noted that if a defender is within 6 yards, the average return distance is less than 8 yards. If the nearest defender is more than 10 yards away, the average return is greater than 10 yards.

We also created pivot tables for both punt return and fair catches, showing the number of each play based on distance from the nearest defender. We combined the data to get the fair catch % of total punt plays broken out by the distance to the nearest defender (Slide #6).
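The combination of the two pivot tables can be sketched in a few lines of Python: bucket each play by the whole-yard distance to the nearest defender (as the INT formula in Col S does), count returns and fair catches per bucket, and divide. The function name and the sample distances are hypothetical, chosen only to show the calculation.

```python
from collections import Counter


def fair_catch_share(return_dists, fair_catch_dists):
    """Return {whole-yard bucket: fair catch % of all plays in that bucket}."""
    # int() truncates, which matches Excel's INT for non-negative distances.
    returns = Counter(int(d) for d in return_dists)
    fair = Counter(int(d) for d in fair_catch_dists)
    share = {}
    for bucket in sorted(set(returns) | set(fair)):
        total = returns[bucket] + fair[bucket]
        share[bucket] = 100.0 * fair[bucket] / total
    return share


# Hypothetical nearest-defender distances (yards) for a handful of plays.
share = fair_catch_share(
    return_dists=[1.2, 5.5, 5.9, 12.0],
    fair_catch_dists=[1.8, 1.1, 5.0],
)
print(share)
```

Buckets with very few plays will give noisy percentages, so it is worth checking the play counts per bucket alongside the shares.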

Fair catches are called over 50% of the time when the nearest defender is within 4 yards. Eyeballing slides 4 and 5, the punt returner seems to be giving up about 5 yards by doing this. Based on this analysis, we are proposing the rule changes in our slides. See slides for details.

