# Who is the best box safety in coverage?
* Emily O'Connell and Benjamin O'Connell
* Big Data Bowl 2021 - College Competition 

As NFL offenses trend towards heavier passing attacks, defenses have been forced to adapt. More wide receivers are on the field at once, and tight ends have become premier pass catchers. Traditionally, defenses have primarily operated out of “base” personnel (4 pass rushers and 3 linebackers/2 cornerbacks/2 safeties in coverage) or “nickel” personnel (4 pass rushers and 2 linebackers/3 cornerbacks/2 safeties in coverage). However, linebackers usually don’t have the speed or coverage skills needed to guard speedy slot receivers or elite tight ends. Teams have tried putting cornerbacks in the position, but most corners don’t have the physicality to guard 6'4" tight ends and provide support in run defense. As a result, the “box safety” was born. Brought into the public eye by the New England Patriots as a response to All-Pro tight end Zach Ertz in Super Bowl LII, the “big nickel” defense, a nickel package in which the extra defensive back is a safety rather than a cornerback, is a direct counter to the rise of tall, receiving tight ends such as Travis Kelce, George Kittle, and Darren Waller. Theoretically, safeties have the coverage skills of a cornerback without sacrificing the size and strength needed to help in run defense and tight end coverage.

Although free safeties and strong safeties are almost always lined up 10+ yards behind the line of scrimmage, box safeties line up as if they are linebackers. This new role is not always easy for the players; it is a fundamental change in how the safety position is played. Safeties who are used to having 10 or more yards of cushion from the receivers now have half of that. However, the payoff for those who master the position is huge. Players like Jamal Adams, Harrison Smith, and Derwin James have become versatile game wreckers capable of neutralizing almost any player on the field. Because reliable box safeties provide such excellent value to a defense, every team is scrambling to find players who can fill the role. In fact, the Arizona Cardinals spent the 8th overall pick in the 2020 NFL Draft on uber-athlete Isaiah Simmons in hopes that he can be moved to any coverage position at will. Additionally, Jeremy Chinn of the Carolina Panthers, who plays a significant portion of his snaps in the box, is a frontrunner to win Defensive Rookie of the Year honors.

With the box safety position becoming extremely valuable, it is important that coaches and front office personnel are able to evaluate performance effectively. Using the data given to us by the NFL, we created an objective formula to rank the coverage abilities of box safeties in 2018.

These rankings would be useful to both defensive staff and front offices. For example, our system could suggest that an expensive veteran might not be worth his large salary when the rookie sitting behind him is playing just as well. It could also be used to identify talented players on other teams who are flying under the radar and could be acquired cheaply.

In [None]:
import numpy as np
import pandas as pd
pd.set_option("display.max_rows", None, "display.max_columns", 16)

In [None]:
games   = pd.read_csv('../input/nfl-big-data-bowl-2021/games.csv')
players = pd.read_csv('../input/nfl-big-data-bowl-2021/players.csv')
plays   = pd.read_csv('../input/nfl-big-data-bowl-2021/plays.csv')

week_1  = pd.read_csv('../input/nfl-big-data-bowl-2021/week1.csv')
week_2  = pd.read_csv('../input/nfl-big-data-bowl-2021/week2.csv')
week_3  = pd.read_csv('../input/nfl-big-data-bowl-2021/week3.csv')
week_4  = pd.read_csv('../input/nfl-big-data-bowl-2021/week4.csv')
week_5  = pd.read_csv('../input/nfl-big-data-bowl-2021/week5.csv')
week_6  = pd.read_csv('../input/nfl-big-data-bowl-2021/week6.csv')
week_7  = pd.read_csv('../input/nfl-big-data-bowl-2021/week7.csv')
week_8  = pd.read_csv('../input/nfl-big-data-bowl-2021/week8.csv')
week_9  = pd.read_csv('../input/nfl-big-data-bowl-2021/week9.csv')

week_10 = pd.read_csv('../input/nfl-big-data-bowl-2021/week10.csv')
week_11 = pd.read_csv('../input/nfl-big-data-bowl-2021/week11.csv')
week_12 = pd.read_csv('../input/nfl-big-data-bowl-2021/week12.csv')
week_13 = pd.read_csv('../input/nfl-big-data-bowl-2021/week13.csv')
week_14 = pd.read_csv('../input/nfl-big-data-bowl-2021/week14.csv')
week_15 = pd.read_csv('../input/nfl-big-data-bowl-2021/week15.csv')
week_16 = pd.read_csv('../input/nfl-big-data-bowl-2021/week16.csv')
week_17 = pd.read_csv('../input/nfl-big-data-bowl-2021/week17.csv')
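
The seventeen reads above all follow one pattern, so they can also be written as a loop. The sketch below is only an illustration under stated assumptions: `load_weeks` is a hypothetical helper, not part of the notebook, and the `week` column it adds does not exist in the original tracking files.

```python
from pathlib import Path

import pandas as pd


def load_weeks(data_dir, weeks=range(1, 18)):
    """Read week{n}.csv for each requested week and stack them into one frame."""
    frames = []
    for n in weeks:
        df = pd.read_csv(Path(data_dir) / f"week{n}.csv")
        df["week"] = n  # added here for bookkeeping; not a column in the source data
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


# Example call, assuming the same Kaggle input directory as the cell above:
# tracking = load_weeks("../input/nfl-big-data-bowl-2021")
```

With a single stacked frame, the per-week position and route filters further down could be applied once instead of seventeen times. The rest of this notebook keeps the separate `week_1` ... `week_17` variables.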

In [None]:
# make a dataframe of all strong and free safeties for each week
wk1 = week_1[(week_1["position"]=="SS") | (week_1["position"]=="FS")]
wk2 = week_2[(week_2["position"]=="SS") | (week_2["position"]=="FS")]
wk3 = week_3[(week_3["position"]=="SS") | (week_3["position"]=="FS")]
wk4 = week_4[(week_4["position"]=="SS") | (week_4["position"]=="FS")]
wk5 = week_5[(week_5["position"]=="SS") | (week_5["position"]=="FS")]
wk6 = week_6[(week_6["position"]=="SS") | (week_6["position"]=="FS")]
wk7 = week_7[(week_7["position"]=="SS") | (week_7["position"]=="FS")]
wk8 = week_8[(week_8["position"]=="SS") | (week_8["position"]=="FS")]
wk9 = week_9[(week_9["position"]=="SS") | (week_9["position"]=="FS")]
wk10 = week_10[(week_10["position"]=="SS") | (week_10["position"]=="FS")]
wk11 = week_11[(week_11["position"]=="SS") | (week_11["position"]=="FS")]
wk12 = week_12[(week_12["position"]=="SS") | (week_12["position"]=="FS")]
wk13 = week_13[(week_13["position"]=="SS") | (week_13["position"]=="FS")]
wk14 = week_14[(week_14["position"]=="SS") | (week_14["position"]=="FS")]
wk15 = week_15[(week_15["position"]=="SS") | (week_15["position"]=="FS")]
wk16 = week_16[(week_16["position"]=="SS") | (week_16["position"]=="FS")]
wk17 = week_17[(week_17["position"]=="SS") | (week_17["position"]=="FS")]

In [None]:
# concatenate all of the weeks to get a dataframe of every safety on every play
frames = [wk1,wk2,wk3,wk4,wk5,wk6,wk7,wk8,wk9,wk10,wk11,wk12,wk13,wk14,wk15,wk16,wk17]
safeties = pd.concat(frames)

# create a df of safeties when the ball is snapped and safeties when the pass arrives
safeties_snap = safeties[safeties["event"] == "ball_snap"]
safeties_arrive = safeties[safeties["event"] == "pass_arrived"]

In [None]:
# to find which safeties are in the box, we will only consider roughly the middle third of the field, between 17.75 and 35.5 on the y axis
safeties_snap = safeties_snap[(safeties_snap["y"] <= 35.5) & (safeties_snap["y"] >= 17.75)]

In [None]:
# merge the safety data with the play data
s_play_data = plays.merge(safeties_snap, on=['playId', 'gameId'])

# calculate the distance from the line of scrimmage of each safety at the snap
s_play_data["LOSdistance"] = abs(s_play_data.absoluteYardlineNumber - s_play_data.x)

In [None]:
# of the safeties in the center third, only consider those less than 6 yards off of the line of scrimmage
boxSafeties = s_play_data[s_play_data.LOSdistance < 6.0]
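
Taken together, the two filters above define a box safety as one who, at the snap, stands in the middle band of the field and less than 6 yards from the line of scrimmage. A minimal sketch of that rule on made-up rows follows; `in_box` and the toy players are hypothetical, while the thresholds are the ones used in the notebook.

```python
import pandas as pd

Y_MIN, Y_MAX = 17.75, 35.5   # the notebook's "middle third" bounds on the y axis
MAX_DEPTH = 6.0              # yards off the line of scrimmage


def in_box(df):
    """Boolean mask: inside the middle band and less than MAX_DEPTH yards from the LOS."""
    depth = (df["absoluteYardlineNumber"] - df["x"]).abs()
    return df["y"].between(Y_MIN, Y_MAX) & (depth < MAX_DEPTH)


# Made-up snap positions, all with the line of scrimmage at x = 40
toy = pd.DataFrame({
    "displayName": ["A", "B", "C"],
    "x": [44.0, 52.0, 43.5],
    "y": [26.0, 27.0, 10.0],
    "absoluteYardlineNumber": [40.0, 40.0, 40.0],
})
print(toy[in_box(toy)]["displayName"].tolist())
```

Only player A passes both tests: B is 12 yards deep, and C is close to the line but lined up outside the middle band.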

In [None]:
# keep each safety's position when the pass arrives, renamed so it does not clash with the snap position
arrive = safeties_arrive[['gameId', 'playId', 'nflId', 'displayName', 'x', 'y']]
arrive = arrive.rename(columns={'x': 'arrive_x', 'y': 'arrive_y'})
# displayName is on both sides, so the merge yields displayName_x (snap) and displayName_y (arrival)
safety_info = boxSafeties.merge(arrive, on=['playId', 'gameId', 'nflId'])

In [None]:
# create a dataframe of all of the pass catchers
r1 = week_1[week_1.route.notna()]
r2 = week_2[week_2.route.notna()]
r3 = week_3[week_3.route.notna()]
r4 = week_4[week_4.route.notna()]
r5 = week_5[week_5.route.notna()]
r6 = week_6[week_6.route.notna()]
r7 = week_7[week_7.route.notna()]
r8 = week_8[week_8.route.notna()]
r9 = week_9[week_9.route.notna()]
r10 = week_10[week_10.route.notna()]
r11 = week_11[week_11.route.notna()]
r12 = week_12[week_12.route.notna()]
r13 = week_13[week_13.route.notna()]
r14 = week_14[week_14.route.notna()]
r15 = week_15[week_15.route.notna()]
r16 = week_16[week_16.route.notna()]
r17 = week_17[week_17.route.notna()]

In [None]:
# concatenate all of the weeks to get a dataframe of every pass catcher on every play when the ball arrived
frames_r = [r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,r11,r12,r13,r14,r15,r16,r17]
routes = pd.concat(frames_r)
routes = routes[routes["event"] == "pass_arrived"]

In [None]:
# merge the route data with the play data
route_info = plays.merge(routes, on=['playId', 'gameId'])

In [None]:
# extract the player's first initial and last name for each data frame
route_info["player"] = route_info['displayName'].str[:1] + '.' + route_info['displayName'].str.rsplit(' ', n=1).str.get(1)
# use a regular expression to extract the target for each pass
route_info['target'] = route_info['playDescription'].str.extract(r"(to [A-Z][a-z]*.\s*[A-Z]'*[A-Za-z]+)", expand=False).str.strip()
route_info['target'] = route_info['target'].str.replace("to ", "", regex=False)
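
To make the name matching concrete, here is a small sketch that applies the same two steps to invented rows. The player names and play descriptions are made up in the style of NFL play-by-play text and are not taken from the data set; `TARGET_RE` is just a name given to the pattern for this example.

```python
import pandas as pd

# Same pattern as in the cell above, written as a raw string
TARGET_RE = r"(to [A-Z][a-z]*.\s*[A-Z]'*[A-Za-z]+)"

toy = pd.DataFrame({
    "displayName": ["Jamie Rivers", "Casey O'Neil", "Terry Quinn"],  # invented players
    "playDescription": [
        "(15:00) T.Quinn pass short right to J.Rivers to PHI 39 for 10 yards (M.Stone).",
        "(10:12) T.Quinn pass incomplete deep left to C.O'Neil.",
        "(2:00) T.Quinn sacked at ATL 20 for -7 yards (D.Stone).",
    ],
})

# First initial + "." + last name, e.g. "Jamie Rivers" becomes "J.Rivers"
toy["player"] = toy["displayName"].str[:1] + "." + toy["displayName"].str.rsplit(" ", n=1).str.get(1)

# First "to <Initial>.<Lastname>" in the description, with the leading "to " removed
toy["target"] = (
    toy["playDescription"]
    .str.extract(TARGET_RE, expand=False)
    .str.replace("to ", "", regex=False)
)

print(toy[["player", "target"]])
```

The sack has no "to ..." phrase, so its target is missing and the row drops out when `target` is compared with `player`. Two players on the same play who share a first initial and last name would both match, which is a limitation of matching on abbreviated names.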

In [None]:
# create a new DataFrame that shows all the data for each target
passes = route_info[route_info.target == route_info.player]

# Box Safeties in coverage of targeted pass catchers
Below is the data concerning all coverage by box safeties of targeted players. We determined that a safety contributed to coverage if, when the pass arrived, they were within 4 yards of the target both vertically and horizontally. With over 500 total plays meeting this definition, box safeties had a significant impact on coverage against the pass, highlighting a new trend in the pass-happy NFL.

In [None]:
# only consider box safeties that are within 4 yards vertically and horizontally of the target
coverage = safety_info.merge(passes, on=['playId', 'gameId'], suffixes = ['_s', '_pc'])
coverage["x_sep"] = abs(coverage.arrive_x - coverage.x_pc)
coverage["y_sep"] = abs(coverage.arrive_y - coverage.y_pc)
# copy the filtered rows so new columns can be added without a SettingWithCopyWarning
boxCoverage = coverage[(coverage["x_sep"] <= 4) & (coverage["y_sep"] <= 4)].copy()
boxCoverage["coverage"] = boxCoverage['displayName_x']
boxCoverage
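
The coverage rule checks each axis separately, so the window around the target is a square rather than a circle. The sketch below, on made-up coordinates, shows how that interacts with the straight-line distance used later; the players and positions are hypothetical, and `MAX_SEP` is only a name for the notebook's 4-yard threshold.

```python
import numpy as np
import pandas as pd

MAX_SEP = 4.0  # yards allowed on each axis, as in the cell above

toy = pd.DataFrame({
    "safety": ["A", "B", "C"],         # made-up players
    "arrive_x": [50.0, 50.0, 50.0],    # safety position at pass_arrived
    "arrive_y": [20.0, 20.0, 20.0],
    "x_pc": [53.0, 54.5, 47.0],        # targeted pass catcher at pass_arrived
    "y_pc": [24.0, 21.0, 17.0],
})
toy["x_sep"] = (toy["arrive_x"] - toy["x_pc"]).abs()
toy["y_sep"] = (toy["arrive_y"] - toy["y_pc"]).abs()
toy["in_coverage"] = (toy["x_sep"] <= MAX_SEP) & (toy["y_sep"] <= MAX_SEP)
toy["distance"] = np.hypot(toy["x_sep"], toy["y_sep"])  # straight-line separation

print(toy[["safety", "x_sep", "y_sep", "in_coverage", "distance"]])
```

Safety A counts as in coverage at a straight-line distance of 5 yards, while B is excluded at roughly 4.6 yards because the x separation alone exceeds 4. A radius test on `distance` would be one alternative definition, although it is not the one used in this analysis.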

In [None]:
# count the targets, completions, and incompletions for each box safety
# groupby().size() gives the same counts as value_counts() with column names that do not depend on the pandas version
target = boxCoverage.groupby('displayName_x').size().reset_index(name='targets').rename(columns={'displayName_x': 'displayName'})
complete = boxCoverage[boxCoverage.passResult_s=='C'].groupby('displayName_x').size().reset_index(name='complete').rename(columns={'displayName_x': 'displayName'})
incomplete = boxCoverage[boxCoverage.passResult_s=='I'].groupby('displayName_x').size().reset_index(name='incomplete').rename(columns={'displayName_x': 'displayName'})

In [None]:
# combine the per-safety counts (inner merges: a safety needs at least one completion and one incompletion to remain)
stats = pd.merge(target, complete, on='displayName')
stats = pd.merge(stats, incomplete, on='displayName')
# count box snaps per safety and keep only safeties with at least 10
snaps = boxSafeties.groupby('displayName').size().reset_index(name='snaps')
stats = snaps.merge(stats, on="displayName")
stats = stats[stats["snaps"] >= 10]
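
Because the merges above are inner joins, a safety who allowed no completions, or forced no incompletions, has no row in one of the count tables and drops out of `stats`. Whether that is intended is not stated. The sketch below, on made-up counts, contrasts the inner join with one possible alternative, a left join that fills missing counts with zero.

```python
import pandas as pd

# Made-up counts: A allowed every target to be caught, B allowed none
target = pd.DataFrame({"displayName": ["A", "B"], "targets": [3, 2]})
complete = pd.DataFrame({"displayName": ["A"], "complete": [3]})
incomplete = pd.DataFrame({"displayName": ["B"], "incomplete": [2]})

# Inner joins keep only names present in all three tables
inner = target.merge(complete, on="displayName").merge(incomplete, on="displayName")

# Left joins keep every targeted safety; missing counts become 0
left = (
    target.merge(complete, on="displayName", how="left")
    .merge(incomplete, on="displayName", how="left")
    .fillna({"complete": 0, "incomplete": 0})
)

print(len(inner))
print(left)
```

In this toy case the inner join leaves no rows at all, while the left join keeps both players. Switching the notebook to left joins would change which safeties appear in the final ranking, so it is shown here only as an option.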

In [None]:
# straight-line distance between the safety and the target when the pass arrives
boxCoverage["distance"] = ((boxCoverage.x_sep) ** 2 + (boxCoverage.y_sep) ** 2) ** (1/2)
# average distance from the target and average EPA for each safety
per_safety = boxCoverage.groupby('displayName_x')[['distance', 'epa_s']].mean().reset_index().rename(columns={'displayName_x': 'displayName'})
# add both averages to the per-safety stats
final = pd.merge(stats, per_safety, on='displayName')
final

# Box Safety Season Stats

We chose to rank the effectiveness of box safeties by evaluating the following criteria:
* the percent of their box safety snaps on which the pass catcher they were covering was targeted
* the percent of those targets that were completed
* their average EPA allowed as a box safety

We ranked each player on these three factors, with the lowest value in each category ranked first, and then averaged the three ranks to get an overall rank, as we believe that these are three equally important factors when grading box safeties.

In [None]:
final["percent_targeted"] = (final["targets"]/final["snaps"]) *100
final["percent_complete"] = (final["complete"]/final["targets"]) *100
# rank ascending: the lowest value in each column gets rank 1, and ties share the lowest rank
final["target_pct_rank"] = final["percent_targeted"].rank(method='min')
final["complete_pct_rank"] = final["percent_complete"].rank(method='min')
final["avg_epa_rank"] = final["epa_s"].rank(method='min')
# overall rank is the rank of the average of the three individual ranks
final["average_ranking"] = (final.target_pct_rank + final.complete_pct_rank + final.avg_epa_rank)/3
final["overall_rank"] = final["average_ranking"].rank(method='min')
final = final.sort_values(by='overall_rank').reset_index(drop=True)


In [None]:
final