In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

## Identification & Evaluation of Monsterbacks
An exploratory analysis aiming to identify and evaluate the utility players of the secondary

### The So What

The game of football is constantly changing. Analytics is on the rise, quarterbacks are doing more than ever before, and overall game strategy is becoming more aggressive. Defenses must therefore be ready for anything and everything. With cap space and roster spots at a premium, teams want to invest in talent that can fit any scheme or style of play and perform against any opponent. A monsterback is just that: a certified defensive playmaker. Although he is listed as a safety or cornerback, he can line up anywhere on the field, remains tight in coverage, and has tunnel vision on the ball after the snap. This analysis focuses on identifying these players by keying in on what makes them great: successful play, versatility, and the ability to close.
 
A prime example of a monsterback is first-ballot Hall of Famer Troy Polamalu, known not only for his great hair but also for his ability to make big-time plays all over the field. A more recent example is Jamal Adams. [Matt Bowen's 2020 Shutdown Index](https://www.espn.com/nfl/insider/story/_/id/30470420/best-nfl-defensive-backs-minkah-fitzpatrick-tyrann-mathieu-matt-bowen-2020-shutdown-index) names Adams as the best monsterback, citing his "versatility as both as a safety and overhang box defender, he can align in multiple spots to play top-down on the ball."

<img src="https://github.com/monicalashay/Big_Data_Bowl_21/blob/main/malcolm_jenkins.jpg?raw=true" width="600px">  

### Analysis
 - Identify high performing defenders
 - Determine who the hybrid defensive backs are
 - Evaluate these players' closing ability
 - Determine who stands out

### Top Performers
Diving into the analysis, we first look at some of 2018's highest-performing defenders when targeted. To assess performance, a few key metrics were calculated using the events and pass results present in the data, with a large emphasis on EPA (expected points added). Because football is so situational, EPA is typically the best indicator of success given the current situation. For example, an offense gaining 10 yards on first down from its own 5-yard line has a much different impact on the game than a successful 3-yard pass on 4th and goal. EPA accounts for these nuances and provides a more robust assessment of performance.
 
To be eligible for this table, a defender must have played a minimum of 100 snaps and been targeted at least 50 times throughout the season. While determining the number of snaps played by each defensive player was straightforward, the remaining outputs relied on target receiver designations provided by the additional dataset 'Target Receiver.' A breakdown of these outputs is as follows:

 - **Number of snaps** : A simple count of the number of plays where the defender was present at the time of the snap (event = ball_snap)
 - **Number of targets** : A defender is tagged as targeted if he is the closest defender to the target receiver at the time the pass arrives (event = pass_arrived). If there was no target receiver present in the data at the time of pass arrival, then the closest defender at the frame of the pass outcome (e.g. pass_outcome_caught, pass_outcome_incomplete) was marked as targeted.
 - **Completion %** : The percentage of completed passes (passResult = C) on plays where the defender was targeted.
 - **EPA** : A simple average EPA of all plays where the defender was targeted. The lower the EPA, the better.
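
The metric definitions above can be sketched in pandas as follows. This is a minimal illustration rather than the notebook's actual code: the `side` and `targetedDefender` columns, both function names, and the toy data are assumptions introduced here, while `nflId`, `x`, `y`, `passResult`, and `epa` are assumed to match the competition data.

```python
import numpy as np
import pandas as pd


def closest_defender(frame: pd.DataFrame, target_id: int) -> int:
    """Return the nflId of the defender nearest the target receiver in one frame."""
    target = frame.loc[frame["nflId"] == target_id].iloc[0]
    defenders = frame[frame["side"] == "defense"]  # "side" is a hypothetical column
    dist = np.hypot(defenders["x"] - target["x"], defenders["y"] - target["y"])
    return int(defenders.loc[dist.idxmin(), "nflId"])


def summarize_targets(plays: pd.DataFrame, min_targets: int = 50) -> pd.DataFrame:
    """Targets, completion % and mean EPA per targeted defender, sorted by EPA."""
    summary = plays.groupby("targetedDefender").agg(
        targets=("passResult", "size"),
        completion_pct=("passResult", lambda s: 100 * (s == "C").mean()),
        mean_epa=("epa", "mean"),
    )
    # Lower EPA is better for the defense, so sort ascending.
    return summary[summary["targets"] >= min_targets].sort_values("mean_epa")


# Toy data, invented purely for illustration.
frame = pd.DataFrame({
    "nflId": [1, 2, 3],
    "side": ["offense", "defense", "defense"],
    "x": [10.0, 11.0, 20.0],
    "y": [10.0, 10.0, 10.0],
})
plays = pd.DataFrame({
    "targetedDefender": [2, 2, 3],
    "passResult": ["C", "I", "C"],
    "epa": [1.0, -1.0, 0.5],
})
nearest = closest_defender(frame, target_id=1)
summary = summarize_targets(plays, min_targets=2)
```

In practice `closest_defender` would be applied to the `pass_arrived` frame of each play (falling back to the pass outcome frame), and the resulting defender ID would populate `targetedDefender` before summarizing.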
 
<img src="https://github.com/monicalashay/Big_Data_Bowl_21/blob/main/at_a_glance.png?raw=true" width="500px">  

Sorting on EPA, Marlon Humphrey is the top defender that meets our criteria. Along with posting the lowest EPA, Humphrey also has the 2nd-lowest completion percentage on targeted plays. It also stands out that New England, the LA Chargers, Miami, and Jacksonville each have two players appearing in the top 15. We note these outcomes and push forward to evaluate defensive back versatility.


### Versatility
As mentioned, monsterbacks are players that can play many roles. They thrive in the slot, deep in the field, and even hovering near the line to sniff out the run. Because they can succeed in many alignments, these players often split time among these common positions at the snap. To determine who these hybrid defenders are, we must evaluate where they line up and how often. However, unlike the meticulously combed-over PFF data provided to many NFL teams, alignment designations are not included in this dataset, so we must create them from the data provided.
 
The four key designations considered are outside, slot, safety, and in the box. If a defender was found to line up in one of these positions at the time of the snap during a play, a binary indicator was created for the appropriate position (e.g. for play 146 in game 2018090600, Malcolm Jenkins was the closest defender to the outside WR Calvin Ridley and was considered to be lined up 'outside'). The calculations for these positions are described below:

 - **Outside defender** : The closest defender to either of the two furthest-outside offensive players, based on x and y coordinates
 - **Slot defender** : The slot position is any eligible receiver lined up inside the farthest eligible receiver from the ball and outside of the tight end or offensive tackle on the line of scrimmage. Accordingly, the slot defender was deemed the closest defender to any non-outside offensive player, excluding the QB, who is not in the backfield (3 yards behind the ball) or attached to the line (within 3 yards of the ball on the y-axis, used in lieu of data on the offensive line) at the snap.
 - **Safety** : A defender outside of the box (8+ yards beyond the line of scrimmage according to PFF definitions)
 - **In the box** : A defender within the rectangle formed 3 yards in either direction of the ball (on the y-axis) and less than 8 yards beyond the line of scrimmage
 
If a defender was tagged as both outside and safety based on the above definitions, the outside designation took precedence if the defender was aligned directly across from the offensive player, which here means within 1 yard in either direction on the y-axis. The same rule applied if a defender was tagged as both slot and safety. Additionally, there were many plays where a defender was tagged as both outside and slot. This can be attributed to a number of factors, such as bunch formations or movement at the time of the snap, but for simplicity the outside designation took precedence over slot. Plays where the defender was not found to be in any of the four key positions were tagged as 'other.'
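
One way to express these designations and tie-break rules is sketched below. The function name, argument names, and constants are hypothetical; the outside and slot flags are taken as inputs rather than computed; and the handling of cases the text does not spell out (a safety-depth defender who is not directly across from the receiver falls back to safety, and box is checked last) is an assumption.

```python
from typing import Optional

BOX_HALF_WIDTH = 3.0     # yards either side of the ball on the y-axis
SAFETY_DEPTH = 8.0       # yards beyond the line of scrimmage
ALIGNED_TOLERANCE = 1.0  # yards on the y-axis to count as "directly across"


def designate(
    is_outside: bool,
    is_slot: bool,
    depth: float,
    lateral_from_ball: float,
    lateral_from_receiver: Optional[float] = None,
) -> str:
    """Assign one alignment label to a defender at the snap."""
    is_safety = depth >= SAFETY_DEPTH
    is_box = depth < SAFETY_DEPTH and abs(lateral_from_ball) <= BOX_HALF_WIDTH
    aligned = (
        lateral_from_receiver is not None
        and abs(lateral_from_receiver) <= ALIGNED_TOLERANCE
    )

    if is_outside:
        # Outside beats slot; it beats safety only when directly across.
        return "safety" if (is_safety and not aligned) else "outside"
    if is_slot:
        return "safety" if (is_safety and not aligned) else "slot"
    if is_safety:
        return "safety"
    if is_box:
        return "box"
    return "other"


# Toy alignments, invented for illustration.
labels = [
    designate(True, True, 5.0, 15.0, 0.5),    # outside and slot
    designate(True, False, 12.0, 15.0, 0.5),  # deep but directly across
    designate(True, False, 12.0, 10.0, 4.0),  # deep and not across
    designate(False, True, 10.0, 6.0, 0.2),   # deep slot, directly across
    designate(False, False, 4.0, 2.0),        # near the ball, shallow
    designate(False, False, 4.0, 10.0),       # none of the four
]
```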

<img src="https://github.com/monicalashay/Big_Data_Bowl_21/blob/main/split_plot.png?raw=true" width="400px"> 

Assessing the results above, it is important to note opportunities for additional exploration. The low percentage of snaps in the box and the significant number of snaps falling in the 'other' category both warrant further investigation. It is possible that the defined box area is too restrictive, producing such low percentages. There may also be something in the data that we have overlooked, leading to the high number of snaps not covered by our four designations. Due to these considerations, we do not consider box snaps when determining versatile players.
 
To qualify as a hybrid defender, a player must be a defensive back (linebackers are excluded), play at least 100 snaps, and play a minimum of 10% of snaps in each of the outside, slot, and safety positions. In total, 24 players made the cut, including FS Tyrann Mathieu, CB Bashaud Breeland, and S Derwin James. In all, 15 of the players were safeties, 9 were cornerbacks, and 1 was a designated DB.
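
The qualification rule can be written as a single boolean filter. The sketch below assumes a per-player table with hypothetical column names (`position`, `snaps`, and per-role snap counts) and an assumed set of defensive back position codes; the players and numbers are invented.

```python
import pandas as pd

DB_POSITIONS = {"CB", "S", "FS", "SS", "DB"}  # assumed position codes
ROLES = ["outside", "slot", "safety"]         # box snaps are not considered


def hybrid_defenders(
    snaps: pd.DataFrame, min_snaps: int = 100, min_share: float = 0.10
) -> pd.DataFrame:
    """Keep defensive backs with enough snaps and at least min_share in every role."""
    shares = snaps[ROLES].div(snaps["snaps"], axis=0)
    mask = (
        snaps["position"].isin(DB_POSITIONS)
        & (snaps["snaps"] >= min_snaps)
        & (shares >= min_share).all(axis=1)
    )
    return snaps[mask]


# Toy snap counts, invented for illustration.
snap_counts = pd.DataFrame(
    {
        "position": ["S", "CB", "ILB", "FS"],
        "snaps": [200, 300, 400, 80],
        "outside": [30, 250, 50, 20],
        "slot": [60, 40, 50, 20],
        "safety": [100, 5, 50, 20],
    },
    index=["A", "B", "C", "D"],
)
hybrids = hybrid_defenders(snap_counts)
```

Here only player A qualifies: B rarely lines up at safety, C is a linebacker, and D falls short of the snap minimum.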

<img src="https://github.com/monicalashay/Big_Data_Bowl_21/blob/main/monster_table.png?raw=true" width="600px">  

### Closing Ability
In addition to being versatile, a monsterback also showcases great closing ability. He has the speed to cover ground and minimize the distance between himself and the ball or targeted receiver once the ball is in the air. Typically, this places him in the perfect position to make the play, whether it's a pass breakup or a tackle.
 
To evaluate this skill among the hybrid defenders, we narrow in on what takes place between the time of the throw and the time the ball arrives at the target receiver. Here we ask a few key questions: How does the defender's distance from the targeted receiver change through the play? How often is the defender in a position to possibly 'close' a play at the time of the throw? In these instances, how often does the defender close?
 
When a defender is within 10 yards of the targeted receiver at the throw, the play is considered possible to close. A play is considered closed if the defender is within 1.5 yards of the target receiver at the end of the interval (in the frame of the pass outcome event). We identify these plays and calculate the distance between the defender and the targeted receiver at each frame tagged with an event during the interval. Standout monsterbacks will often close in on the ball after the throw.
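
A minimal sketch of the closing calculation, assuming the defender-to-receiver distances at the throw and at the pass outcome have already been computed per play. The column names, the function name, and the toy distances are hypothetical.

```python
import pandas as pd


def closing_efficiency(
    plays: pd.DataFrame, closable_within: float = 10.0, closed_within: float = 1.5
) -> pd.DataFrame:
    """Share of closable plays that each defender actually closed."""
    df = plays.copy()
    # Closable: defender within 10 yards of the target at the throw.
    df["closable"] = df["dist_at_throw"] <= closable_within
    # Closed: a closable play that ends within 1.5 yards of the target.
    df["closed"] = df["closable"] & (df["dist_at_outcome"] <= closed_within)
    out = df.groupby("defender").agg(
        closable=("closable", "sum"), closed=("closed", "sum")
    )
    out["efficiency"] = out["closed"] / out["closable"]
    return out


# Toy distances in yards, invented for illustration.
toy_plays = pd.DataFrame({
    "defender": ["X", "X", "X", "X", "Y"],
    "dist_at_throw": [8.0, 9.0, 17.0, 5.0, 4.0],
    "dist_at_outcome": [1.0, 3.0, 1.0, 1.2, 5.0],
})
closing = closing_efficiency(toy_plays)
```

Note that defender X's third play ends within 1.5 yards but starts 17 yards away, so under this definition it counts as neither closable nor closed.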

In one extreme case of great closing, in the Week 13 matchup between the Chargers and Steelers, Derwin James was 17 yards from the targeted receiver, Antonio Brown, when the ball was thrown. However, by the time the pass fell incomplete, James was able to close in on Brown. This is an example of James covering ground and coming in quickly to help on a play from the safety position. It is quite impressive considering that his distance from Brown at the time of the throw was too great to meet our criteria for plays the defender could have closed on.

<img src="https://github.com/monicalashay/Big_Data_Bowl_21/blob/main/long_close.png?raw=true" width="600px">  

Overall, we see that Troy Hill is our most efficient closer among hybrid defenders, closing on 23% of his eligible plays. Derwin James and Minkah Fitzpatrick also stand out as familiar names from our initial evaluation of defender performance, ranking 2nd and 4th in EPA, respectively.

<img src="https://github.com/monicalashay/Big_Data_Bowl_21/blob/main/closing_efficiency.png?raw=true" width="500px">  


However, a key consideration in this evaluation is that we do not know what the defender's assigned role was. In some instances where a defender has not closed a play, it is likely the defender was simply doing his job, and understanding this is key to a true assessment of closing efficiency. Unfortunately, the data does not explicitly give us the defender's assignment or provide a coverage designation for all plays. This is something to keep in mind when extending and enhancing this type of analysis.

### Conclusion
To determine final rankings of monsterbacks in the 2018 season, we calculate an overall score for each hybrid defender by taking the average of the player’s rankings in EPA when targeted and closing efficiency.
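
The overall score described above can be sketched as an average of two ranks. The column names, the function name, and the numbers are invented for illustration; ranking within the table passed in and the tie-handling rule (`method="min"`) are assumptions, since the text does not specify them.

```python
import pandas as pd


def monsterback_scores(players: pd.DataFrame) -> pd.DataFrame:
    """Average each player's EPA rank and closing-efficiency rank (lower is better)."""
    out = players.copy()
    # Lower EPA allowed is better, so rank ascending.
    out["epa_rank"] = out["mean_epa"].rank(method="min", ascending=True)
    # Higher closing efficiency is better, so rank descending.
    out["closing_rank"] = out["closing_efficiency"].rank(method="min", ascending=False)
    out["score"] = out[["epa_rank", "closing_rank"]].mean(axis=1)
    return out.sort_values("score")


# Toy values, invented for illustration.
toy_players = pd.DataFrame(
    {
        "mean_epa": [-0.3, -0.1, 0.2],
        "closing_efficiency": [0.20, 0.25, 0.10],
    },
    index=["P", "Q", "R"],
)
scores = monsterback_scores(toy_players)
```

Because both inputs are reduced to ranks, the score is insensitive to how far apart two players are on either metric; a weighted or standardized combination would be an alternative design.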

<img src="https://github.com/monicalashay/Big_Data_Bowl_21/blob/main/Derwin%20james.png?raw=true" width="600px">  

Of our qualifying monsterbacks, **Derwin James (S, LAC)** takes the top spot, ranking in the top 20 in both closing efficiency and EPA. Dolphins DB Minkah Fitzpatrick comes in a close second, with an impressive ranking in EPA but slipping out of the top 20 in closing efficiency. For the LA Chargers (and each team with players in the table below), this monsterback ranking highlights a key player they can use in a variety of situations. For their opponents, it points to defenders to keep on the radar when determining game strategy.

<img src="https://github.com/monicalashay/Big_Data_Bowl_21/blob/main/final_scores.png?raw=true" width="500px">  

Future applications of this evaluation include use cases in free agency and in preparation for the draft. With this analysis applied throughout a season, scout teams will be able to identify defensive backs that play monsterback roles and prepare their teams to succeed in many different schemes and situations. Finally, the monsterback metric can arm teams with the data they need when looking for well-rounded and versatile defensive players.


Code and graphics can be found on GitHub [here](https://github.com/monicalashay/Big_Data_Bowl_21).