# Introduction
Understanding when, why, and how penalties occur in the National Football League would give the NFL and all 32 teams a deeper understanding of the game. Penalties committed by the defense make the game easier for the offense by handing it free downs and free yards, which ultimately makes scoring easier. Playing in the secondary is especially challenging: it demands elite athleticism and the ability to “predict” where a receiver is going, and that difficulty can lead to defensive penalties. In this notebook we explore three questions about penalties:

1. What penalties are most likely to be called and who are they likely called on?
2. In what situations is a defense more likely to commit a penalty?
3. Overall, does the likelihood of penalties decrease as the season goes on?

# Data Preprocessing
Though we were given the NFL's internal data from all passing plays in the 2018 season, the data required considerable cleanup and preprocessing to be useful for this project.

Our preprocessing steps included creating a database on SQL Server to store all the given data and to build new data sets. We first imported all the given data and assigned primary and secondary keys as required. Then we joined each week's data set with plays.csv on 'playId' and 'gameId' and created a new table for each week. This gave us a more detailed data set for each week that contained information about both plays and players. We also flipped coordinates where necessary so that every play is oriented toward the offense’s target end zone. Furthermore, we created a calculated 'distance from line of scrimmage' column to understand player positioning. Since we were given coverage data for each play in Week 1, we added an extra column to the Week 1 data set by joining it with the coverage data set on 'playId' and 'gameId'.
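
The join, coordinate flip, and line-of-scrimmage distance described above were done in SQL Server; the same steps can be sketched in pandas. This is a minimal illustration rather than our actual queries: the column names `x`, `y`, `playDirection`, and `losX`, the 120 by 53.3 yard coordinate system, and the sample rows are all assumptions made for the example.

```python
import pandas as pd

# Assumed field dimensions in yards (length includes both end zones).
FIELD_LENGTH = 120.0
FIELD_WIDTH = 53.3


def join_and_standardize(week: pd.DataFrame, plays: pd.DataFrame) -> pd.DataFrame:
    """Join tracking rows to play rows and orient every play left-to-right."""
    merged = week.merge(plays, on=["gameId", "playId"], how="left")

    # Mirror plays moving left so the offense always attacks increasing x.
    moving_left = merged["playDirection"] == "left"
    merged.loc[moving_left, "x"] = FIELD_LENGTH - merged.loc[moving_left, "x"]
    merged.loc[moving_left, "y"] = FIELD_WIDTH - merged.loc[moving_left, "y"]
    merged.loc[moving_left, "losX"] = FIELD_LENGTH - merged.loc[moving_left, "losX"]

    # Positive values are downfield of the line of scrimmage.
    merged["distFromLos"] = merged["x"] - merged["losX"]
    return merged


# Hypothetical sample rows, for illustration only.
week1 = pd.DataFrame({
    "gameId": [1, 1],
    "playId": [10, 11],
    "x": [30.0, 80.0],
    "y": [20.0, 10.0],
    "playDirection": ["right", "left"],
})
plays = pd.DataFrame({
    "gameId": [1, 1],
    "playId": [10, 11],
    "losX": [25.0, 85.0],
})

week1_plays = join_and_standardize(week1, plays)
print(week1_plays[["playId", "x", "y", "losX", "distFromLos"]])
```

Mirroring the line of scrimmage together with the player coordinates keeps the distance column comparable across plays regardless of which direction the offense was moving.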

To analyze penalties specifically, we filtered plays.csv to the rows where the 'penaltyCodes' column was not null and appended them to a new penalties data set. In this new table, we added a column called 'penPosition'. Using each week's data set, we found where the 'penaltyNumbers' column matched the 'jerseyNumbers' column, pulled the position of that player, and inserted it into 'penPosition'. This allowed us to see the position of the player(s) that a penalty was called on. We also created a column called 'penaltyYardage' that records how many yards the offense gained or lost from penalties on each play. Finally, we added a column that shows whether a penalty was accepted, declined, or offsetting.
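
The filter and jersey-number lookup above can be sketched in pandas as follows. This is a simplified illustration, not our SQL: it assumes one flagged jersey number per play and made-up sample rows, whereas a full version would also need to match on team, since jersey numbers repeat across teams.

```python
import pandas as pd

# Hypothetical sample rows; real plays may flag several jerseys per play.
plays = pd.DataFrame({
    "gameId": [1, 1, 1],
    "playId": [10, 11, 12],
    "penaltyCodes": ["DPI", None, "DH"],
    "penaltyNumbers": [24.0, None, 31.0],
})
week1 = pd.DataFrame({
    "gameId": [1, 1, 1, 1],
    "playId": [10, 10, 12, 12],
    "jerseyNumbers": [24, 88, 31, 24],
    "position": ["CB", "WR", "SS", "CB"],
})

# Keep only plays on which a penalty was recorded.
penalties = plays[plays["penaltyCodes"].notna()].copy()
penalties["penaltyNumbers"] = penalties["penaltyNumbers"].astype(int)

# One row per player per play, renamed so the join keys line up.
roster = (
    week1.drop_duplicates(["gameId", "playId", "jerseyNumbers"])
    .rename(columns={"jerseyNumbers": "penaltyNumbers", "position": "penPosition"})
)

# Attach the flagged player's position to each penalized play.
penalties = penalties.merge(
    roster, on=["gameId", "playId", "penaltyNumbers"], how="left"
)
print(penalties)
```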

We used a combination of technologies to perform our analysis and create our visualizations. We used SQL, Excel, and Python to process data, and used SQL specifically to calculate the correlations between penalties and positions discussed below. We also used Tableau to explore these correlations further and to create visualizations.

# Penalties Findings
For our research on penalties, we decided to focus specifically on analyzing trends around Defensive Facemask, Defensive Holding and Defensive Pass Interference penalties as well as trends around penalty yardage.

For the three penalties we investigated, we found that the call was almost always on a cornerback. Safeties and outside linebackers were a distant second and third.
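
A breakdown like the one above can be produced with a cross-tabulation of penalty type against 'penPosition'. The sketch below uses made-up rows and placeholder penalty labels purely to show the shape of the calculation; the counts are not our 2018 results.

```python
import pandas as pd

# Made-up rows for illustration; these are not the 2018 counts.
penalties = pd.DataFrame({
    "penaltyCodes": ["DPI", "DPI", "Holding", "Holding", "Facemask", "DPI"],
    "penPosition": ["CB", "CB", "CB", "SS", "CB", "OLB"],
})

# Raw counts of each penalty type by the flagged player's position.
counts = pd.crosstab(penalties["penaltyCodes"], penalties["penPosition"])

# Share of each penalty type attributed to each position (rows sum to 1).
shares = pd.crosstab(
    penalties["penaltyCodes"], penalties["penPosition"], normalize="index"
)

print(counts)
print(shares.round(2))
```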

**Facemask**

Out of all the defensive facemask calls in 2018, the majority occurred on a traditional dropback with the offense in shotgun formation. On almost every play where this call was made, the pass was completed. Most of these plays also had 4 to 6 DBs on the field, meaning that more players were likely outside the box in nickel or dime coverage.

From these statistics, we made the inference that when the QB drops back to pass, cornerbacks drop back to cover the pass. When the catch is completed, cornerbacks often get called for a facemask penalty when trying to tackle the receiver or when trying to shed a block downfield.

**Defensive Holding**

When looking into defensive holding (DH) penalties, we focused on designed and scramble rollout dropbacks. Interestingly, we found that the pass was almost always completed, so the holding call was most likely declined, meaning the penalty yardage was zero. Even here, cornerbacks were responsible for most of the holding calls. Some linebackers were also flagged for holding on rollouts, but far fewer.

From these numbers, we hypothesized that when the quarterback rolls out left or right and the offensive line is trying to block for him, corners are more likely to hold the receiver to keep him from getting free to catch the ball. This could be the result of a designed rollout in some circumstances or of a broken play in others.

**Defensive Pass Interference**

Defensive pass interference (DPI) penalties happen on every down but are most likely on first down. Across 202 pass interference calls in 2018, NFL defenses gave up 3,526 yards, an average of nearly 17.5 yards per call. Of these 202 DPI calls, 157 were committed by cornerbacks, the most of any position.
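
The per-call average and the cornerback share follow directly from the three totals quoted above. A quick check of the arithmetic:

```python
dpi_calls = 202     # DPI calls in 2018, from the text above
dpi_yards = 3526    # yards given up on those calls, from the text above
cb_dpi_calls = 157  # DPI calls on cornerbacks, from the text above

avg_yards_per_dpi = dpi_yards / dpi_calls
cb_share = cb_dpi_calls / dpi_calls

print(f"{avg_yards_per_dpi:.2f} yards per DPI call")
print(f"{cb_share:.1%} of DPI calls were on cornerbacks")
```

This works out to about 17.46 yards per call, with roughly 78% of calls on cornerbacks.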

As part of our analysis, we visualized the ‘free yardage’ that defensive pass interference calls give to offenses. Our Tableau file includes a ‘Free Yardage Visualization’ with nearly 50 data points where the offensive play itself gained 0 yards but pass interference turned the play into a large gain.
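
The plays behind a view like this can be selected with a simple filter. The sketch below is illustrative only: `offensePlayResult` is an assumed column name for the yards gained by the play itself, and the rows are made up.

```python
import pandas as pd

# Made-up rows; `offensePlayResult` is an assumed column name.
penalties = pd.DataFrame({
    "playId": [10, 11, 12, 13],
    "penaltyCodes": ["DPI", "DPI", "DH", "DPI"],
    "offensePlayResult": [0, 12, 0, 0],
    "penaltyYardage": [38, 0, 5, 21],
})

# DPI plays where the offense gained nothing on its own but the flag moved the ball.
free_yardage = penalties[
    (penalties["penaltyCodes"] == "DPI")
    & (penalties["offensePlayResult"] == 0)
    & (penalties["penaltyYardage"] > 0)
].sort_values("penaltyYardage", ascending=False)

print(free_yardage)
print("Total free yards:", free_yardage["penaltyYardage"].sum())
```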

**Penalty Yardage**

We hypothesized that penalty yardage per week would decrease over the course of a season as teams become more disciplined. However, we did not find that trend. In the first two weeks of 2018, league-wide penalty yardage on passing plays was just over 500 yards per week before jumping to over 650 yards in Week 3. For most of the 2018 season, weekly penalty yardage on passing plays ranged between 250 and 550 yards, with one outlier in Week 13.

In Week 13, the total penalty yardage on passing plays skyrocketed to 876 yards. This unusually high total largely stemmed from a season-high 483 penalty yards on defensive pass interference calls. Looking at this number in the context of Week 13, we found that over half of these 483 yards came from six pass interference calls of 25 yards or longer across various games, which created the outlier in our data.
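
Weekly totals like these can be computed by grouping the penalties data set by week. The sketch below uses made-up rows and assumes a `week` column has already been joined onto each play; the fitted slope is just one simple way to check for a trend and is not the method we used.

```python
import numpy as np
import pandas as pd

# Made-up rows; a `week` column is assumed to be present on each play.
penalties = pd.DataFrame({
    "week": [1, 1, 2, 2, 3, 3, 13, 13],
    "penaltyYardage": [15, 5, 10, 12, 0, 8, 40, 35],
})

# Total penalty yardage per week across the league.
weekly = penalties.groupby("week")["penaltyYardage"].sum()

# Week with the highest total, a candidate outlier to inspect by hand.
peak_week = weekly.idxmax()

# Least-squares slope of weekly yardage against week number.
slope = np.polyfit(
    weekly.index.to_numpy(dtype=float), weekly.to_numpy(dtype=float), 1
)[0]

print(weekly)
print("Peak week:", peak_week)
print("Slope (yards per week):", round(slope, 2))
```

A clearly negative slope would support the hypothesis that discipline improves over the season; a slope near zero or above it would not.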

Overall, cornerbacks allowed by far the most penalty yardage of any position, nearly 4,000 total yards in the 2018 season.

In [None]:
from IPython.display import IFrame

# Embed the Tableau 'Free Yardage Visualization' in the notebook.
IFrame(
    'https://public.tableau.com/views/VisualizationofPenaltyData/FreeYardageVisualization?:embed=y&:display_count=yes&:showVizHome=no',
    width=750,
    height=1000,
)

# Summary, Conclusions and Future Considerations
Facemask penalties can be called on almost any player on the field, but in pass coverage they are almost always called on cornerbacks. We attribute this to smaller cornerbacks struggling to tackle larger wide receivers downfield and grabbing anywhere they can to bring their opponent down. Facemask discipline can be improved by working on tackling form, and the penalty can be avoided entirely if corners deny the receiver the ball.

When analyzing defensive holding calls, we came to a few conclusions regarding quarterback rollouts. When the quarterback finds no one open after making his reads, he may scramble out of the pocket to generate some offense. In these situations, the wide receiver is likely to abandon his route to break free and get open for the quarterback. When this happens, the cornerback is more likely to hold the receiver so as not to let him break loose and catch the ball. Put simply: the longer the quarterback has the ball in his hands, the more likely a defender is to hold. Defensive holding calls could thus be reduced with a more effective pass rush.

Defensive pass interference calls are the costliest penalty an NFL defense can commit, averaging roughly 17.5 yards, which is more than the 15-yard maximum for personal fouls. Though cornerbacks are mainly to blame for committing pass interference, that is largely because they are the ones covering receivers nine times out of ten. Overall, NFL teams cannot expect to become more disciplined from a penalty standpoint as the year progresses, as there is no clear downward trend in penalty yards per week over the season. However, if defenses can become more disciplined and avoid defensive holding and pass interference, they should find more success, because the free yardage and free downs given up on those plays are costly.

In future work, we would like to compare trends in pass coverage penalties with those in running situations. Penalties we would be interested in investigating include facemasks and offensive and defensive holding.