# Introduction: The Legend of Revis Island

![](https://i.pinimg.com/originals/e8/8a/bc/e88abc5615d88985d53f1e8e01802dff.jpg)

The year is 2009. Week 12 of the NFL regular season. Finishing my weekend chores _just_ before game time, I excitedly turn on my Carolina Panthers vs. the New York Jets on the living room TV. I throw a bag of popcorn in the microwave, relax, and watch the opening kickoff...

... and within the first 4 minutes of play, Darrelle Revis intercepts Delhomme and returns it for a Touchdown. A Pick 6.

Revis would go on to hold Steve Smith, our star receiver, to 1 catch for 5 yards on 6 targets. The Jets would go on to win 17 to 6. 

A Sunday ruined...

---

My nightmarish memory of Darrelle Revis against my team that Sunday was not the exception, but the rule. Revis consistently dominated whomever he was matched up against, and those opponents were often future first-ballot HOFers like himself. 2009 Revis was the definition of a "Shutdown Corner."

![](https://i.redd.it/mj85t54nxta51.jpg)

What is so amazing about Revis as a player is that throughout his career he made a name for himself by matching up against the best receivers in the league and absolutely annihilating their offensive output. Playing man-to-man with no Safety help, he could hold receivers coming off 200+ yard games to a single catch. His notoriety for shutting down phenomenal WRs planted the idea of Revis Island in the minds of Football Fans.

> **Revis Island:** A term describing the effect Darrelle Revis had on the receivers he covered. No matter how good they were, Revis always found a way to neutralize their game impact, leaving them seemingly lost on the Island. (Revis' career coincided with the airing of the show LOST, which is likely where the idea of getting lost on an island came from.)

Ever since Revis retired after the 2017 season, I've wondered whether a CB could ever take up the mantle of Revis Island. Would there ever again be a defensive player as dominant in the passing game as Revis?

That question led me to Data Bowl 2021: **In Search of Revis Island**. Specifically, I focus on answering these questions:

### 1. Can we quantify not just the reduction in targets and yards, but actual **Game Impact** of a defensive player?

### 2. Who truly are the **Shutdown Defensive Backs** of the NFL in 2018?

Below is an outline of the analytics notebook, along with the questions/comments addressed in each section:

### PART I: SETTING THE TABLE
**1. Determining Coverage** 
- Determining if a player is in man coverage
- Determining if a play is a man coverage play
- Determining who a DB is covering
        
**2. Not All Catches are Created Equal** 
- Calculating **Game Impact** of a pass attempt. **Game Impact:** Weighting yards gained on a play by the game context 
- Calculating **Game Impact** of an incompletion 
- Visualizing a Receiver's Game Impact over the course of a season
- Modifying game impact by matchup difficulty
    
### PART II: THE SEARCH FOR REVIS ISLANDERS

**3. Finding Clutch Defensive Players**
- Comparing Yards forfeited to Game Impact forfeited 
- Calculating Coverage, Target, and Impact **Resistance** of every Defensive Player. **Resistance:** The reduction in output a defensive player inflicts on their opponents
- Scoring & Ranking DBs by their **Resistance** scores

**4. Finding Shutdown Man-Coverage DBs**
- Visualizing Resistance for Man Coverage
- Visualizing Resistance for Man Coverage against the league's best Receivers
    
Enjoy!

_Author's Note: I put my descriptions for charts in markdown **underneath** the chart unless otherwise specified._

In [None]:
import numpy as np 
import pandas as pd
from statistics import mode
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
from plotly.subplots import make_subplots
import plotly.express as px
import matplotlib.pyplot as plt


init_notebook_mode(connected=True)


# PART I: SETTING THE TABLE 

# 1. Determining Coverage

The objective of this section was to determine which defensive player was covering which offensive player on every passing play from 2018. Using frame-by-frame data of distances and velocities, I was able to determine this to a relatively strong degree of reliability (metrics below). 

While this process was interesting from a Data Science perspective, an extensive debrief on the subject would be tangential to this notebook's true purpose: measuring the game impact resistance of defensive players. In addition, the preprocessing required to determine coverage takes several hours, so for expediency (and to save space in this NB) I generated csv files of the data rather than include all of the code to rerun. You can absolutely check out my entire process in this [public notebook](https://www.kaggle.com/adam1brownell/setting-the-table-determining-coverage).

Skip to the next section if you are more interested in insights/analytics than preprocessing.

**In a Nutshell:** 

I built three models: (1) a play-level man/zone predictor, (2) a player-level man/zone predictor, and (3) a player-level coverage (offense and defense) predictor. All three took in frame-by-frame data, computed each player's distance to other offensive and defensive players, and attempted to predict labels: both those provided in the NFL bonus data and labels I generated by spot-checking plays. The features used were a mix of homebrewed values and those outlined in "Unsupervised Methods for Identifying Pass Coverage Among Defensive Backs with NFL Player Tracking Data" (although the unsupervised method was not a very strong predictor in the end). Thanks to [Andika Rachman](https://www.kaggle.com/ar2017) for the boilerplate code. 
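To give a feel for the distance features described above, here is a minimal sketch that computes, for a single tracking frame, each defender's distance to every offensive player and picks the nearest one. The positions and variable names are made up for illustration; this is not the feature code from the linked notebook.

```python
import numpy as np

# Hypothetical (x, y) positions in yards for one tracking frame
defenders = np.array([[30.0, 20.0], [35.0, 40.0]])
receivers = np.array([[33.0, 24.0], [36.0, 41.0], [50.0, 10.0]])

# Pairwise Euclidean distances, shape (n_defenders, n_receivers)
dists = np.linalg.norm(defenders[:, None, :] - receivers[None, :, :], axis=2)

# For each defender: index of the closest receiver and the distance to him
nearest_idx = dists.argmin(axis=1)
nearest_dist = dists.min(axis=1)

print(nearest_idx, nearest_dist.round(2))
```

Repeating this over every frame of a play gives a per-defender time series (nearest receiver, distance to him) that a classifier could consume.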

Both the play-level and player-level man/zone models performed reasonably well, reaching 75% accuracy, with an emphasis on low false negative rates for the player-level predictions. Confusion matrix results can be found in the notebook linked above.
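For readers unfamiliar with the two metrics mentioned, the sketch below shows how accuracy and false negative rate fall out of a confusion matrix. The labels are toy values invented for this example (1 = man coverage, 0 = zone); they are not the model's actual predictions.

```python
import numpy as np

# Toy labels for illustration only: 1 = man coverage, 0 = zone
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 1, 0])

tp = int(((y_true == 1) & (y_pred == 1)).sum())  # man predicted as man
fn = int(((y_true == 1) & (y_pred == 0)).sum())  # man missed (predicted zone)
fp = int(((y_true == 0) & (y_pred == 1)).sum())  # zone predicted as man
tn = int(((y_true == 0) & (y_pred == 0)).sum())  # zone predicted as zone

accuracy = (tp + tn) / len(y_true)
false_negative_rate = fn / (fn + tp)

print(accuracy, false_negative_rate)
```

A low false negative rate means few true man-coverage snaps are missed, which matters here because the later sections filter down to man-coverage plays.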

In [None]:
# Load Data

plays_pd = pd.read_csv('/kaggle/input/nfl-big-data-bowl-2021/plays.csv')

play_man_pd = pd.read_csv('/kaggle/input/setting-the-table-coverage-data-bowl-2018/play_pred.csv') \
                .drop('Unnamed: 0',axis=1)
# player_man_pd = pd.read_csv('/kaggle/input/setting-the-table-coverage-data-bowl-2018/player_level_man_pred.csv') \
#                   .rename(columns={'displayName':'covering','isManPred':'isManCoverage'})


player_man_pd = pd.read_csv('/kaggle/input/setting-the-table-coverage-data-bowl-2018/def_player_level_man_pred.csv') \
                  .rename(columns={'displayName':'defender','isManPred':'isManCoverage'})

coverage_pd = pd.read_csv('/kaggle/input/setting-the-table-coverage-data-bowl-2018/coverage_pairings_2018.csv') \
                .drop('Unnamed: 0',axis=1)
madden_scores = pd.read_excel('/kaggle/input/madden19-ratings/madden_nfl_19_-_full_player_ratings_1.xlsx')[['Name','Overall','Team']]

diff_pd = madden_scores.rename(columns={'Overall':'difficulty','Name':'covering','Team':'off_team'})[['covering','difficulty','off_team']]

df_team_pd = madden_scores.rename(columns={'Name':'defender','Team':'def_team'})[['defender','def_team']]

revis_pd = coverage_pd.merge(player_man_pd, on=['gameId','playId','defender']) \
                      .merge(play_man_pd, on=['gameId','playId']) \
                      .merge(diff_pd,on='covering') \
                      .merge(df_team_pd, on='defender')

# Columns & Helpers needed along the way
revis_pd['completed'] = revis_pd.targetted & (revis_pd.passResult == 'C')

diff = revis_pd[['covering','difficulty']].drop_duplicates().difficulty 
diff_mean = np.mean(diff)
diff_std = np.std(diff)

revis_pd['zscore'] = (revis_pd.difficulty - diff_mean)/diff_std
revis_pd['normalDiff'] = (revis_pd.difficulty - min(diff))/(max(diff)-min(diff))*4+1

### gameImpact strictly shows up after we score our plays

revis_pd

# 2. Not all Catches are Created Equal

While the traditional DB metrics of targets, completions, and yards allowed directionally show who is a stronger/weaker defensive player, they fail to capture the critical element of **game context**. A 5 yard completion on 3rd and 2 is much worse than a 5 yard completion on 2nd and 22, even though the stat sheet would count them as equal. In addition, holding Julio Jones, a top 10 receiver in 2018, to only 3 catches for 30 yards is a much more impressive feat than holding Russell Gage (not a top 10 receiver in 2018) to the same numbers.

It is with this in mind that we will create a new lens through which to look at our DBs: **Game Impact**. Below are the general rules that will guide our process:
- Weighting Play Outcome by **Game Context:**
    - Critical Incompletions should be rewarded
    - Yardage that nets an offensive milestone (1st downs, TDs, redzone yards) should incur a penalty 
    - Close Game Plays (good and bad) are worth more than blowout game plays
- Highlighting **Matchup Difficulty** in coverage statistics:
    - Good defensive plays on stronger receivers should be rewarded
    - Bad defensive plays on weaker receivers should be punished
    - A strong game against a WR1 should be worth more for a DB's stats than an identical game against a WR3

## Weighting Play Outcome by **Game Context:**

In [None]:
# Visual: Largest Change when converting to EPA Yards
completed_pd = plays_pd[(plays_pd.passResult == 'C')& \
                        ~(plays_pd.playDescription.str.contains('FUMBLE'))&\
                        (plays_pd.playResult > 0)].copy()

completed_pd['epa_yards'] = completed_pd.playResult

# If First Down
completed_pd.loc[completed_pd.playResult >= completed_pd.yardsToGo, 'epa_yards'] = completed_pd.epa_yards * \
                                                                                    (0.8 + completed_pd.down/5)

# If Touchdown
completed_pd.loc[completed_pd.playDescription.str.contains('TOUCHDOWN'), 'epa_yards'] = completed_pd.epa_yards * 1.6

# If Close Game
completed_pd.loc[abs(completed_pd.preSnapHomeScore - completed_pd.preSnapVisitorScore)<=10, 'epa_yards'] = completed_pd.epa_yards * \
                                                                                                          (0.9 + completed_pd.quarter/10)
# If meaningless yards
completed_pd.loc[(completed_pd.epa <= 0), 'epa_yards'] = 0

completed_pd['delta'] = completed_pd.epa_yards-completed_pd.playResult

color = ['green','yellow','orange','red']
dwn = ['1st Down','2nd Down','3rd Down', '4th Down']

layout = go.Layout(
                title='Weighting Contextually Important Completions',
                paper_bgcolor='rgba(0,0,0,0)',
                plot_bgcolor='rgba(0,0,0,0)',
                xaxis = dict(title = 'Yards Gained'),
                yaxis = dict(title = 'Game Impact'),
)

fig = go.Figure(layout = layout)

for i in range(4):
    d = completed_pd[completed_pd.down == i+1]

    trace1 = go.Scatter(
    x=d.playResult,
    y=d.epa_yards,
    text=d.playDescription,
    name= dwn[i],
    hoverinfo='text',
    mode='markers',
    marker=dict(
        color=color[i]))
    
    fig.add_trace(trace1)



fig.update_layout(shapes=[
    dict(
      type= 'line',
      line=dict(
        color="gray",
        width=1,
        dash="dot",
        ),
      yref= 'y', y0= 0, y1= 100,
      xref= 'x', x0= 0, x1= 100
    )
    
])

fig.add_annotation(
            xref="x",
            yref="y",
            x=46, y=153,
            text="<b>1</b>",
            showarrow=False)

fig.add_annotation(
            xref="x",
            yref="y",
            x=48, y=0,
            text="<b>2</b>",
            showarrow=False)

iplot(fig)

The above chart quantifies the game context & impact of a completion. While generally increasing linearly with absolute yards gained, **Game Impact**  spikes when a play results in a Touchdown or First Down, particularly on 3rd or 4th down. Large plays on 3rd down in close games are painful for defenses, and so Game Impact captures this effect reasonably well. Conversely, large chunks of yards that have no impact on the game are dropped to 0 game impact, ensuring that garbage-time yardage does not penalize a DB.

To illustrate this with an example:

**Play #1** marked above is a 47 yard touchdown pass on 4th down with less than 2min of play left. An absolutely terrible play to give up, as a stop would likely have sealed the game; hence the Game Impact of that play is significant.

**Play #2** marked above, while resulting in _more_ yards gained than Play #1, was a hail mary that did not reach the endzone before time expired. The DB on the play did his job by preventing a TD, and so the Game Impact is neutralized.

The act of weighting plays by their game context will help our later analysis separate generally good DBs from those that _almost never_ give up a big play... and hopefully reveal a few corners whose performance is actually much worse than their stats show at first glance
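The weighting rules for completions can also be read as a single per-play function. The sketch below mirrors the multipliers used in the completion cell above; the function name, argument names, and the example inputs (yards to go, score margin, EPA values) are assumptions made for illustration.

```python
def completion_impact(yards, yards_to_go, down, quarter, score_diff, epa,
                      touchdown=False):
    """Game Impact of one completion, using the same multipliers as the
    completion cell (hypothetical helper, not part of the notebook)."""
    if epa <= 0:                      # meaningless yards are zeroed out
        return 0.0
    impact = float(yards)
    if yards >= yards_to_go:          # first down: later downs weigh more
        impact *= 0.8 + down / 5
    if touchdown:                     # touchdown bonus
        impact *= 1.6
    if abs(score_diff) <= 10:         # close game: later quarters weigh more
        impact *= 0.9 + quarter / 10
    return impact


# The 5-yard example from earlier, in a hypothetical 14-point game
third_and_2 = completion_impact(5, 2, 3, quarter=1, score_diff=14, epa=1.0)
second_and_22 = completion_impact(5, 22, 2, quarter=1, score_diff=14, epa=0.2)
print(third_and_2, second_and_22)
```

With these inputs the 3rd-and-2 conversion scores about 7 while the 2nd-and-22 catch stays at 5. A play shaped like Play #1 (47-yard touchdown on 4th down in the 4th quarter, assuming a one-score game and a made-up 10 yards to go) comes out at roughly 156 via `completion_impact(47, 10, 4, 4, 3, 3.0, touchdown=True)`.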

In [None]:
incomplete_pd = plays_pd[((plays_pd.passResult == 'I')|(plays_pd.passResult == 'IN'))&
                         ~(plays_pd.playDescription.str.contains('sack'))&
                         ~(plays_pd.playDescription.str.contains('PENALTY'))&
                        ~(plays_pd.playDescription.str.contains('FUMBLE'))&
                        ~(plays_pd.playDescription.str.contains('Intentional Grounding'))&
                        ~(plays_pd.playDescription.str.contains('Roughness'))].copy()

# If increases chance of victory 
incomplete_pd['epa_yards'] = incomplete_pd.epa+incomplete_pd.epa*(incomplete_pd.down-1)
# incomplete_pd.loc[incomplete_pd.passResults == 'IN', 'epa_yards'] = 


# If Close Game
incomplete_pd.loc[abs(incomplete_pd.preSnapHomeScore - incomplete_pd.preSnapVisitorScore)<=10, 'epa_yards'] = incomplete_pd.epa_yards * \
                                                                                                          (0.9 + incomplete_pd.quarter/10)

# If RedZone
incomplete_pd.loc[incomplete_pd.absoluteYardlineNumber <= 45, 'epa_yards'] = incomplete_pd.epa_yards * \
                                                                    (1 + (45-incomplete_pd.absoluteYardlineNumber)/30)


color = ['green','yellow','orange','red']
dwn = ['1st Down','2nd Down','3rd Down', '4th Down']

layout = go.Layout(
                title='Weighting Contextually Important Incompletions',
                paper_bgcolor='rgba(0,0,0,0)',
                plot_bgcolor='rgba(0,0,0,0)',
                xaxis = dict(title = 'Yards to Endzone'),
                yaxis = dict(title = 'Game Impact', range=[-25,0.5]),
    
)

fig = go.Figure(layout = layout)

for i in range(4):
    d = incomplete_pd[incomplete_pd.down == i+1]

    trace1 = go.Scatter(
    x=d.absoluteYardlineNumber,
    y=d.epa_yards,
    text=d.playDescription,
    name= dwn[i],
    hoverinfo='text',
    mode='markers',
    marker=dict(
        color=color[i]))
    
    fig.add_trace(trace1)



fig.update_layout(shapes=[
    dict(
      type= 'line',
      line=dict(
        color="gray",
        width=1,
        dash="dot",
        ),
      yref= 'y', y0= 0, y1= 0,
      xref= 'paper', x0= 0, x1= 1,
    )
    
])

iplot(fig)


Just as some completions are worse coverage plays than others, some _incompletions_ are better defensive plays than others. While all of these plays are incompletions, those that occur in close games or in the redzone, are intercepted, or otherwise hurt the opponent's chance of winning are rewarded with a negative game impact score. Notice how many 4th down plays carry a large negative weight: good DBs ensure that 4th down conversions do not happen.
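As with completions, the incompletion weighting can be sketched as a per-play function. It follows the multipliers in the incompletion cell above, including its `absoluteYardlineNumber <= 45` cutoff; the function name and the example inputs are hypothetical.

```python
def incompletion_impact(epa, down, quarter, score_diff, yardline):
    """Game Impact of one incompletion, using the same multipliers as the
    incompletion cell (hypothetical helper, not part of the notebook)."""
    impact = epa + epa * (down - 1)        # equivalent to epa * down
    if abs(score_diff) <= 10:              # close game: later quarters weigh more
        impact *= 0.9 + quarter / 10
    if yardline <= 45:                     # same yardline cutoff as the cell
        impact *= 1 + (45 - yardline) / 30
    return impact


# Made-up plays: a late 4th-down stop vs. an early 1st-down miss in a blowout
late_stop = incompletion_impact(epa=-2.0, down=4, quarter=4, score_diff=3, yardline=15)
early_miss = incompletion_impact(epa=-0.5, down=1, quarter=1, score_diff=21, yardline=60)
print(late_stop, early_miss)
```

The late 4th-down stop ends up far more negative (better for the defense) than the early miss, even though both are recorded as a single incompletion on the stat sheet.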

In [None]:
# Handle Edge Cases
   
sacks = plays_pd[(plays_pd.passResult == 'S')|plays_pd.playDescription.str.contains('Intentional Grounding')][['gameId','playId']].copy()
sacks['epa_yards'] = 0

off_penalty_mask = ((plays_pd.penaltyCodes!='ING')& \
                   ((plays_pd.penaltyCodes.str.contains('D'))|(plays_pd.penaltyCodes.str.contains('d'))))

catch_val_mask = (plays_pd.playDescription.str.contains('FUMB')) | off_penalty_mask |\
                 (plays_pd.penaltyCodes=='RPS')

fumbles = plays_pd[(plays_pd.passResult != 'S') & catch_val_mask] \
                 [['gameId','playId','offensePlayResult']]

fumbles.columns = ['gameId','playId','epa_yards']

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames the same way
fumbles = pd.concat([fumbles, sacks])

epa_pd = pd.concat([completed_pd[['gameId','playId','epa_yards']],
                    incomplete_pd[['gameId','playId','epa_yards']],
                    fumbles])

epa_plays = plays_pd.merge(epa_pd, on=['gameId','playId'],how='left')
epa_plays.loc[epa_plays['epa_yards'].isnull(),'epa_yards'] = epa_plays['playResult']




In [None]:
# Prep for JJ Visual 
revis_pd = revis_pd.merge(epa_plays[['gameId','playId','epa_yards']],on=['gameId','playId']) \
                   .rename(columns={'epa_yards':'gameImpact'})

games_pd = pd.read_csv('../input/nfl-big-data-bowl-2021/games.csv')

impacts = revis_pd[revis_pd.targetted].groupby(['covering','gameId'])['gameImpact'].sum().values
mean_ = round(np.mean(impacts),2)
max_ = 250
print("Average Impact: {}".format(mean_))
print("Max Impact: {}, capped at 250 for visual".format(round(np.max(impacts),2)))

In [None]:
## TODO: 
    # Make Widget to look up other receivers
    # Color Background with Team Logo
    # Color Bars Against Team Colors

## Visualizing Receiver-Level Game Impact

The following few charts showcase how the strength of the receiver plays a crucial role in understanding DB strength, and highlights how not all stat lines for DBs should be treated equally.

In [None]:
# Create figure with secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])

# Look at Resistance for Just Julio Jones
wr = 'Julio Jones'

x = revis_pd[revis_pd.covering==wr].groupby('gameId').agg({"playId":["count"],
                                                                  "targetted":["sum"],
                                                                 "completed":["sum"]})

x2 = revis_pd[(revis_pd.targetted)&(revis_pd.covering==wr)].groupby('gameId').agg({"gameImpact":['sum']})

x = x.merge(x2,on='gameId')
x.columns = ['snaps','targets','completed','impact']
x['targetsPerSnap'] = np.round(100*x['targets']/x['snaps'],0)
x['completionsPerTarget'] = np.round(100*x['completed']/x['snaps'],0)

# Get Team Names and week
x_pd = x.merge(games_pd[['gameId','homeTeamAbbr','visitorTeamAbbr','week']],on='gameId')
x_pd['against'] = x_pd.homeTeamAbbr
wr_team = mode(x_pd.homeTeamAbbr)

x_pd.loc[x_pd.against == wr_team, 'against'] = x_pd.visitorTeamAbbr
x_pd = x_pd.drop(['homeTeamAbbr','visitorTeamAbbr'],axis=1) \
           .sort_values('week')

fig = make_subplots(specs=[[{"secondary_y": True}]])

fig.add_trace(
    go.Bar(x=x_pd.week, y=x_pd.impact, opacity=0.5, name="Game Impact Score", hovertemplate ='Against '+x_pd.against),
    secondary_y=True,
)

fig.add_trace(
    go.Scatter(x=x_pd.week, y=x_pd.targetsPerSnap, name="Targets per Snap (%)", 
               hovertemplate =x_pd.targetsPerSnap.astype(int).astype(str)+'%'),
    secondary_y=False,
)

fig.add_trace(
    go.Scatter(x=x_pd.week, y=x_pd.completionsPerTarget, name="Completions per Snap (%)", 
               hovertemplate =x_pd.completionsPerTarget.astype(int).astype(str)+'%'),
    secondary_y=False,
)

fig.update_layout(shapes=[
    dict(
      type= 'line',
      name='average WR impact',
      line=dict(
        color="gray",
        width=1,
        dash="dot",
        ),
      yref= 'y2', y0= mean_, y1= mean_,
      xref= 'paper', x0= 0, x1= 0.94
    )])

# Add figure title
fig.update_layout(
    title_text=wr+" Game Impact 2018"
)

# Set x-axis title
fig.update_xaxes(title_text="Game", range=[1, 17])

# Set y-axes titles
fig.update_yaxes(title_text="Receiving Statistics (%)", range=[0, 101], showgrid=False, secondary_y=False)
fig.update_yaxes(title_text="Impact Score", range=[0, 250], showgrid=False, secondary_y=True)

fig.show()

Above: **Julio Jones**, the star receiver for the Atlanta Falcons in 2018. He consistently posts monster games in terms of Game Impact, which makes sense as he was the Falcons' go-to receiver that year. In Week 13 against Baltimore, he is held to only ~10pts of Game Impact. This is a seriously impressive feat given the context of Jones' other appearances, and should be rewarded as such.

Below: **Marvin Hall**, another receiver for the Atlanta Falcons in 2018. Even though his Completion/Reception stats are sometimes comparable to Jones' (and actually higher in Week 7), you can see from the Game Impact scores that Hall and Jones do not have the same impact on the game. Hall's ~30pt game against the Giants in Week 7 is a poor defensive showing by the DB, and should count more heavily against them than if they had forfeited the same Game Impact score to Jones.

In [None]:
# Create figure with secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])


revis_pd['completed'] = revis_pd.targetted & (revis_pd.passResult == 'C')

# Look at Resistance for just Marvin Hall
wr = 'Marvin Hall'

x = revis_pd[revis_pd.covering==wr].groupby('gameId').agg({"playId":["count"],
                                                                  "targetted":["sum"],
                                                                 "completed":["sum"]})

x2 = revis_pd[(revis_pd.targetted)&(revis_pd.covering==wr)].groupby('gameId').agg({"gameImpact":['sum']})

x = x.merge(x2,on='gameId')
x.columns = ['snaps','targets','completed','impact']
x['targetsPerSnap'] = np.round(100*x['targets']/x['snaps'],0)
x['completionsPerTarget'] = np.round(100*x['completed']/x['snaps'],0)

# Get Team Names and week
x_pd = x.merge(games_pd[['gameId','homeTeamAbbr','visitorTeamAbbr','week']],on='gameId')
x_pd['against'] = x_pd.homeTeamAbbr
wr_team = mode(x_pd.homeTeamAbbr)

x_pd.loc[x_pd.against == wr_team, 'against'] = x_pd.visitorTeamAbbr
x_pd = x_pd.drop(['homeTeamAbbr','visitorTeamAbbr'],axis=1) \
           .sort_values('week')

fig = make_subplots(specs=[[{"secondary_y": True}]])

fig.add_trace(
    go.Bar(x=x_pd.week, y=x_pd.impact, opacity=0.5, name="Game Impact Score", hovertemplate ='Against '+x_pd.against),
    secondary_y=True,
)

fig.add_trace(
    go.Scatter(x=x_pd.week, y=x_pd.targetsPerSnap, name="Targets per Snap (%)", 
               hovertemplate =x_pd.targetsPerSnap.astype(int).astype(str)+'%'),
    secondary_y=False,
)

fig.add_trace(
    go.Scatter(x=x_pd.week, y=x_pd.completionsPerTarget, name="Completions per Snap (%)", 
               hovertemplate =x_pd.completionsPerTarget.astype(int).astype(str)+'%'),
    secondary_y=False,
)

fig.update_layout(shapes=[
    dict(
      type= 'line',
      name='average WR impact',
      line=dict(
        color="gray",
        width=1,
        dash="dot",
        ),
      yref= 'y2', y0= mean_, y1= mean_,
      xref= 'paper', x0= 0, x1= 0.94
    )])

# Add figure title
fig.update_layout(
    title_text=wr+" Game Impact 2018"
)

# Set x-axis title
fig.update_xaxes(title_text="Game", range=[1, 17])

# Set y-axes titles
fig.update_yaxes(title_text="Receiving Statistics (%)", range=[0, 101], showgrid=False, secondary_y=False)
fig.update_yaxes(title_text="Impact Score", range=[0, 250], showgrid=False, secondary_y=True)

fig.show()

In [None]:
def show_game_impact(wr_name):
    # CREATE WIDGET + SAMPLE

    # Create figure with secondary y-axis
    fig = make_subplots(specs=[[{"secondary_y": True}]])


    revis_pd['completed'] = revis_pd.targetted & (revis_pd.passResult == 'C')

    # Pick a random receiver to suggest if the lookup fails
    wr = revis_pd[['covering']].drop_duplicates().sample()['covering'].values[0]

    x = revis_pd[revis_pd.covering==wr_name].groupby('gameId').agg({"playId":["count"],
                                                                      "targetted":["sum"],
                                                                     "completed":["sum"]})

    if len(x) < 1:
        print('Could not find a receiver named {}...\nBut if you\'re interested, you could look up the stats for {}!'.format(wr_name,wr))
    else:
        x2 = revis_pd[(revis_pd.targetted)&(revis_pd.covering==wr_name)].groupby('gameId').agg({"gameImpact":['sum']})

        x = x.merge(x2,on='gameId')
        x.columns = ['snaps','targets','completed','impact']
        x['targetsPerSnap'] = np.round(100*x['targets']/x['snaps'],0)
        x['completionsPerTarget'] = np.round(100*x['completed']/x['snaps'],0)

        # Get Team Names and week
        x_pd = x.merge(games_pd[['gameId','homeTeamAbbr','visitorTeamAbbr','week']],on='gameId')
        x_pd['against'] = x_pd.homeTeamAbbr
        wr_team = mode(x_pd.homeTeamAbbr)

        x_pd.loc[x_pd.against == wr_team, 'against'] = x_pd.visitorTeamAbbr
        x_pd = x_pd.drop(['homeTeamAbbr','visitorTeamAbbr'],axis=1) \
                   .sort_values('week')

        fig = make_subplots(specs=[[{"secondary_y": True}]])

        fig.add_trace(
            go.Bar(x=x_pd.week, y=x_pd.impact, opacity=0.5, name="Game Impact Score", hovertemplate ='Against '+x_pd.against),
            secondary_y=True,
        )

        fig.add_trace(
            go.Scatter(x=x_pd.week, y=x_pd.targetsPerSnap, name="Targets per Snap (%)", 
                       hovertemplate =x_pd.targetsPerSnap.astype(int).astype(str)+'%'),
            secondary_y=False,
        )

        fig.add_trace(
            go.Scatter(x=x_pd.week, y=x_pd.completionsPerTarget, name="Completions per Snap (%)", 
                       hovertemplate =x_pd.completionsPerTarget.astype(int).astype(str)+'%'),
            secondary_y=False,
        )

        fig.update_layout(shapes=[
            dict(
              type= 'line',
              name='average WR impact',
              line=dict(
                color="gray",
                width=1,
                dash="dot",
                ),
              yref= 'y2', y0= mean_, y1= mean_,
              xref= 'paper', x0= 0, x1= 0.94
            )])

        # Add figure title
        fig.update_layout(
            title_text=wr_name+" Game Impact 2018"
        )

        # Set x-axis title
        fig.update_xaxes(title_text="Game", range=[1, 17])

        # Set y-axes titles
        fig.update_yaxes(title_text="Receiving Statistics (%)", range=[0, 101], showgrid=False, secondary_y=False)
        fig.update_yaxes(title_text="Impact Score", range=[0, 250], showgrid=False, secondary_y=True)

        fig.show()

## Hey! Look Up Your Favorite 2018 Receiver's Game Impact Score Below:

In [None]:
show_game_impact(input())

## Highlighting Matchup Difficulty:

While we should not combine game impact and matchup difficulty into a single metric (a 40-yard catch on 3rd and 10 is just as bad from a game perspective regardless of who caught it), it is crucial that we highlight both game impact and matchup difficulty when assessing DB strength.

![](https://upload.wikimedia.org/wikipedia/en/thumb/6/6d/Madden19cover.jpeg/220px-Madden19cover.jpeg)

As a proxy for WR strength, we will use each WR's overall rating from **Madden 2019**, the video game released to simulate that season of football. Even a quick look at these raw numbers helps us get a clear separation of elite DBs vs. others:

In [None]:
diff.hist(bins=20,grid=False)
plt.title('Madden Scores Histogram\nMean: {:.2f} || STD: {:.2f}'.format(diff_mean,diff_std))

plt.gca().spines["right"].set_visible(False)
plt.gca().spines["top"].set_visible(False)
plt.gca().spines["bottom"].set_visible(False)
plt.tick_params(
    axis='x',          # changes apply to the x-axis
    which='both',      # both major and minor ticks are affected
    bottom=False,      # ticks along the bottom edge are off
    top=False,         # ticks along the top edge are off
    labelbottom=True)

plt.show()

To our pleasant surprise, the WR scores from Madden form a relatively normal distribution, centered around 75 and ranging from ~50 to ~100. We simply normalized each WR's rating to find the relative strength of each coverage matchup.
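The normalization referred to here corresponds to the `zscore` and `normalDiff` columns computed when the data was loaded. A minimal sketch on made-up ratings (the three values below are hypothetical, not real Madden scores) shows what the two scalings do:

```python
import numpy as np

# Hypothetical Madden overall ratings, for illustration only
ratings = np.array([50.0, 75.0, 100.0])

# Standard score: distance from the mean in standard deviations
zscore = (ratings - ratings.mean()) / ratings.std()

# Min-max rescale onto a 1-to-5 difficulty scale
normal_diff = (ratings - ratings.min()) / (ratings.max() - ratings.min()) * 4 + 1

print(zscore.round(2), normal_diff)
```

The z-score is convenient for asking "how far above average is this receiver?", while the 1-to-5 scale is always positive and so can be used directly as a multiplicative weight.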

In [None]:
press_defenders = revis_pd[revis_pd.press].groupby('defender').count()
press_defenders = press_defenders[press_defenders.gameId >= 10]

press_revis_pd = revis_pd[(revis_pd.press) & revis_pd.defender.isin(list(press_defenders.index))]

x = press_revis_pd.groupby('defender').agg({"playId":["count"],"difficulty":["mean"]})
x.columns = ['numPlays','avgDiff']


diff = x.avgDiff.values
plays = x.numPlays.values
plyrs = x.index

trace1 = go.Scatter(
    x=plays,
    y=diff,
    text=plyrs,
    hoverinfo='text',
    mode='markers'
)

layout = go.Layout(
                title='Finding Elite Press Coverage DBs:',
                paper_bgcolor='rgba(0,0,0,0)',
                plot_bgcolor='rgba(0,0,0,0)',
                xaxis = dict(title = 'Number of Press Plays'),
                yaxis = dict(title = 'Average Difficulty of Assignment'),
)

fig = go.Figure(data=trace1, layout = layout)

fig.update_layout(shapes=[
    dict(
      type= 'line',
      line=dict(
        color="gray",
        width=1,
        dash="dot",
        ),
      yref= 'paper', y0= 0, y1= 1,
      xref= 'x', x0= 150, x1= 150
    ),
   dict(
      type= 'line',
      line=dict(
        color="gray",
        width=1,
        dash="dot",
        ),
      yref= 'y', y0= 85, y1= 85,
      xref= 'paper', x0= 0, x1= 1
    ),
    
])

fig.add_annotation(
            xref="paper",
            yref="y",
            x=0, y=86,
            text="consistently pressing Elite WRs",
            showarrow=False)

fig.add_annotation(
            xref="x",
            yref="paper",
            x=169, y=0,
            text="consistently playing<br>Press Coverage",
            showarrow=False)

fig.add_annotation(
            xref="x",
            yref="y",
            x=100, y=88.5,
            font = dict(size = 8),
            text="Jalen Ramsey",
            showarrow=False)

fig.add_annotation(
            xref="x",
            yref="y",
            x=104, y=83,
            font = dict(size = 8),
            text="Jaire Alexander",
            showarrow=False)


iplot(fig)

Even with little processing, we can see how difficulty of assignment starts to separate defensive backs into a variety of different categories. 

For example: **Jalen Ramsey** & **Jaire Alexander**, while playing about the same number of press plays, play press against significantly different cohorts of receivers. Ramsey consistently plays press against elite receivers, whereas Alexander plays press against more average opponents. This example is simply to show why it is important to consider receiver difficulty when assessing DB strength!

In [None]:
# x = revis_pd[['zscore','targetted','completed','epa_yards']].copy()
# x['diff_group'] = np.round(x.zscore,1)
# x = x[x.diff_group>-1.8].groupby('diff_group').mean().drop('epa_yards',axis=1) \
#  .merge(
#     x[x.completed].groupby('diff_group')[['epa_yards']].mean(),
#     left_index=True,
#     right_index=True
#         )

# fig = make_subplots(specs=[[{"secondary_y": True}]])


# fig.add_trace(go.Scatter(
#     x=x.index,
#     y=x.targetted,
#     mode='lines',
#     name='Targets per Snap'
#     ),
#     secondary_y=False)

# fig.add_trace(go.Scatter(
#     x=x.index,
#     y=x.completed,
#     mode='lines',
#     name='Completions per Snap'),
#     secondary_y=False)

# fig.add_trace(go.Scatter(
#     x=x.index,
#     y=x.epa_yards,
#     mode='lines',
#     name='Game Impact per Completion'),
#     secondary_y=True)

# # Add figure title
# fig.update_layout(
#     title_text='WR Output over WR Strength'
# )

# # Set x-axis title
# fig.update_xaxes(title_text="WR Strength (std)")

# # Set y-axes titles
# fig.update_yaxes(title_text="Targets/Completions (%)", showgrid=False, secondary_y=False)
# fig.update_yaxes(title_text="Game Impact", showgrid=False, secondary_y=True)


# fig.show()

# PART II: THE SEARCH FOR REVIS ISLANDERS

# 3. Finding Clutch DBs

> It's 4th and 3 on your own 11, your divisional rival has the ball with 1:22 left on the clock. Down by 5, they need a touchdown to win. You know the ball is going to their star receiver, who has picked apart defenses all season in these situations...
> 
> ...who do you trust to make the stop?


Now that we have impact scores of plays in addition to raw yardage, we can take a shot at finding DBs that are exceptionally good at diminishing the impact of the WRs they play against. We can also look for DBs that aren't noteworthy defenders by traditional metrics but rarely allow high-impact plays... and vice versa!

Taking a note from circuit boards, the term we will use for "diminishing the impact of a player" is **resistance**. This value is found by calculating the expected Targets/Completions/Game Impact of a WR per snap and comparing it to the actual values during a WR/DB pairing. A strong DB will have a much higher resistance than a weaker DB.

In this section we will introduce 3 new metrics for us to examine:
- **Target Resistance:** How many fewer targets do they receive per snap against this DB?
- **Completion Resistance:** How many fewer completions do they receive per target against this DB?
- **Impact Resistance:** What is the average reduction in Game Impact of the WRs they cover? How "shutdown" is this DB against WRs?


Whereas **Impact Resistance** is a function of yards gained weighted by the game impact of those yards, **Target Resistance** and **Completion Resistance** are functions of the reduction in WR output weighted by the difficulty of the matchup.
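To make the idea concrete, here is a minimal sketch of the "expected minus actual" calculation for a single WR/DB pairing. It mirrors the pairing-level formulas used later in this notebook, but the function name, argument names, and numbers are hypothetical and only for illustration; the matchup-difficulty weighting is left out.

```python
def pairing_resistance(baseline_target_rate, baseline_completion_rate,
                       baseline_impact_per_target,
                       snaps, targets, completions, impact):
    """Expected-minus-actual WR output for one WR/DB pairing (unweighted)."""
    # What the WR would be expected to produce based on his rates
    # against every *other* defender
    expected_targets = baseline_target_rate * snaps
    expected_completions = baseline_completion_rate * targets
    expected_impact = baseline_impact_per_target * targets

    # Positive values mean the DB held the WR below his usual output
    return {
        "target_resistance": expected_targets - targets,
        "completion_resistance": expected_completions - completions,
        "impact_resistance": expected_impact - impact,
    }


# Hypothetical pairing: a WR who is usually targeted on 20% of snaps,
# catches 60% of his targets, and averages 5.0 Game Impact per target
example = pairing_resistance(0.20, 0.60, 5.0,
                             snaps=40, targets=5, completions=2, impact=12.0)
print(example)
```

In this made-up pairing the WR saw fewer targets, fewer completions, and less Game Impact than his baseline would suggest, so all three resistance values come out positive.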

Clutch vs. Choke (https://en.wikipedia.org/wiki/Clutch_(sports))

In [None]:
x = revis_pd[revis_pd.targetted].groupby('defender').agg({'playResult':['mean'],'gameImpact':['mean'],
                                                          'targetted':['sum'], 'zscore':['mean']})
x.columns = ['playResult','gameImpact','targetted','difficulty']
x['difficulty'] = x.difficulty - np.mean(x.difficulty)
x = x[x.targetted > 20]



trace1 = go.Scatter(
    x=x.playResult,
    y=100*(x.playResult-x.gameImpact)/x.playResult,
    text=x.index,
    hoverinfo='text',
    mode='markers',
    marker=dict(
    color=x.difficulty,
    size=10,
    colorbar=dict(
            title="Avg. Relative Matchup Difficulty"
        ),
        colorscale="Viridis")
)

layout = go.Layout(
                title='Hidden Busts & Hidden Gems:<br>Comparing Avg. Yards Forfeit to Game Impact',
                paper_bgcolor='rgba(0,0,0,0)',
                plot_bgcolor='rgba(0,0,0,0)',
                xaxis = dict(title = 'Yards Forfeit per Target'),
                yaxis = dict(title = 'Game Impact Difference <br>to Yards Forfeited (%)'),
)

fig = go.Figure(data=trace1, layout = layout)

fig.update_layout(shapes=[
    dict(
      type= 'line',
      line=dict(
        color="gray",
        width=1,
        dash="dot",
        ),
      yref= 'y', y0= 0, y1= 0,
      xref= 'paper', x0= 0, x1= 1
    ),
    dict(
      type= 'line',
      line=dict(
        color="gray",
        width=1,
        dash="dot",
        ),
      yref= 'paper', y0= 0, y1= 1,
      xref= 'x', x0=np.mean(x.playResult), x1=np.mean(x.playResult)
    ),

])

fig.add_annotation(
            xref="paper",
            yref="y",
            x=1, y=4,
            text="Choke",
            showarrow=False)

fig.add_annotation(
            xref="paper",
            yref="y",
            x=1, y=-4,
            text="Clutch",
            showarrow=False)

fig.add_annotation(
            xref="x",
            yref="paper",
            x=np.mean(x.playResult)-0.455, y=1,
            text="Tough",
            showarrow=False)

fig.add_annotation(
            xref="x",
            yref="paper",
            x=np.mean(x.playResult)+0.33, y=1,
            text="Soft",
            showarrow=False)

iplot(fig)

## Interpreting this Graph:

By visualizing Yards Forfeit against the Game Impact of those yards, we can uncover some interesting insights into the play of defensive players in 2018. To start, the X axis shows the typical stat line for a DB: given that their WR was targeted, the average yards given up. At first glance, this should cleanly separate DBs by skill. However, when we take into account the game ramifications of those yards, a different picture of DB skill emerges.

_Note: Good corners forfeit fewer yards and less game impact, so the bottom-left section is the "best" section._

By cutting the dataset into quadrants, we get the following groups:

### 1. Posers: Low Absolute Yards Forfeited, but High Game Impact Allowed
On paper, these DBs look like very strong defensive players. But what the stat line does _not_ capture is the significant game impact of the yards they allow. They may only allow a 4-yard completion, but it'll be on a 3rd and 2 in a close game. Rely on these DBs for 80% of the game, but when you really need a stop, look for a more Clutch DB to cover their WR1.

### 2. Busts: High Absolute Yards Forfeited with High Game Impact Allowed
These DBs have had unfavorable matchups this year. For whatever reason, they have found themselves giving up big yardage plays to the detriment of their team (and likely season). Supporting these DBs with on-field help and off-field training is the logical remedy.

### 3. Bend-Don't-Breaks: High Absolute Yards Forfeited, but Low Game Impact Allowed
These DBs understand that a 12-yard completion on 3rd and 15 is acceptable, even if not optimal. Often these DBs are covering some of the most elite WRs in the league, and doing a fine job. These DBs are underrated and, despite their stat line at times, should be on the field on pivotal plays.

### 4. Gems: Low Absolute Yards Forfeited with Low Game Impact Allowed
Gems look good on paper and on the field. They are excellent DBs in all situations. They rarely give up yards, and more importantly, rarely give up a big important play that gives the offense momentum. If Gems ask for a higher salary, pay them.
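As a rough illustration of how the four quadrants could be assigned in code, the sketch below splits on the two dotted lines drawn in the chart: the league-average yards forfeit per target, and zero on the Game Impact difference axis. The function name and inputs are hypothetical, and it assumes (as the chart's annotations do) that points above the zero line sit on the "Choke" side and points below it on the "Clutch" side.

```python
def classify_db(yards_per_target, impact_diff_pct, league_mean_yards):
    """Assign a DB to one of the four quadrants described above."""
    low_yards = yards_per_target < league_mean_yards   # left of the vertical line
    low_impact = impact_diff_pct < 0                   # below the zero line ("Clutch")

    if low_yards and not low_impact:
        return "Poser"
    if not low_yards and not low_impact:
        return "Bust"
    if not low_yards and low_impact:
        return "Bend-Don't-Break"
    return "Gem"


# Hypothetical DBs, with a league average of 8.0 yards forfeit per target
print(classify_db(5.0, 3.0, 8.0))    # low yards, above the zero line
print(classify_db(11.0, -2.5, 8.0))  # high yards, below the zero line
```

Hard cut-offs like these are sensitive to DBs sitting right on a dotted line, so the labels are best read as a starting point rather than a verdict.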

A note on **Holton Hill**: while he seems to be THE elite DB in the NFL in 2018 based on this graph, it is important to look at the average difficulty of his match-ups. Jalen Ramsey, an excellent CB, had a much more challenging suite of WRs to cover, whereas Hill has one of the easiest covering duties amongst the players plotted. His easy match-ups helped catapult him to be an outlier in the dataset. The color of the dot hopefully provides a third yardstick to compare DBs.

In [None]:
df = pd.DataFrame(columns=['defender','target_resistance','completion_resistance','impact_resistance'])
d = revis_pd.groupby('defender').count()
d = d[d.gameId>50].index

df['defender'] = d

low_pairs = 0
low_reps = 0
low_targets = 0

# For every defender...
for defender in d:

#     defender = 'Stephon Gilmore'
    
    wrs = revis_pd[revis_pd.defender==defender].covering.unique()
    
    # for every receiver...
    non_cover_plays = revis_pd[revis_pd.covering.isin(wrs)&(revis_pd.defender!=defender)]
    cover_plays = revis_pd[revis_pd.covering.isin(wrs)&(revis_pd.defender==defender)]
    
    non_cover_stats = non_cover_plays.groupby('covering')[['targetted','normalDiff']].mean() \
                                     .merge(
                                        non_cover_plays[non_cover_plays.targetted].groupby('covering')[['gameImpact','completed']].mean(),
                                        right_index=True,left_index=True
    )
    
    non_cover_stats.columns = ['non_cover_'+i if i != 'normalDiff' else 'normalDiff' for i in non_cover_stats.columns]
    
    
    cover_stats = cover_plays.groupby('covering')[['targetted']].sum() \
                                 .merge(
                                    cover_plays[cover_plays.targetted].groupby('covering')[['completed','gameImpact']].sum(),
                                    right_index=True,left_index=True
    )

    cover_stats.columns = ['pairing_'+i for i in cover_stats.columns]
    
    # Receivers with at least 50 snaps away from this defender (computed here, but not merged into z below)
    non_cover_snaps = non_cover_plays.groupby('covering')[['gameId']].count()
    non_cover_snaps.columns = ['non_cover_snap_count']
    non_cover_snaps = non_cover_snaps[non_cover_snaps.non_cover_snap_count>=50]
    
    # ...low quantity coverage pairing values should be downweighted until they hit 10
    cover_snaps = cover_plays.groupby('covering')[['gameId']].count()
    cover_snaps.columns = ['pairing_snap_count']
    
    z = cover_stats.merge(cover_snaps,left_index=True,right_index=True) \
                   .merge(non_cover_stats,left_index=True,right_index=True) 
    
    # Calculate Pairing-level resistance
    z['target_resistance'] = z.non_cover_targetted*z.pairing_snap_count-z.pairing_targetted
    z['completion_resistance'] = z.non_cover_completed*z.pairing_targetted-z.pairing_completed
    z['impact_resistance'] = z.non_cover_gameImpact*z.pairing_targetted-z.pairing_gameImpact
#     break
    
#     z['target_resistance'] = (z.non_cover_targetted*z.pairing_snap_count-z.pairing_targetted)*z.normalDiff/(z.non_cover_targetted*z.pairing_snap_count)
#     z['completion_resistance'] = (z.non_cover_completed*z.non_cover_targetted-z.pairing_completed)/(z.non_cover_completed*z.non_cover_targetted)
#     z['impact_resistance'] = (z.non_cover_epa_yards*z.non_cover_targetted-z.pairing_epa_yards)/(z.non_cover_epa_yards*z.non_cover_targetted)
    
    # Fix bug where 0 completions causes problems
    z.loc[z.non_cover_completed == 0, 'completion_resistance'] = np.mean(z[z.non_cover_completed!=0].completion_resistance)
    z.loc[z.non_cover_gameImpact == 0, 'impact_resistance'] = np.mean(z[z.non_cover_gameImpact!=0].impact_resistance)
#     break
    # Can iterate through all 3 with this
        # Reward > 0 resistance on good players
        # Punish < 0 resistance on bad players
    for col in ['target_resistance','completion_resistance','impact_resistance']:
        z.loc[(z[col] < 0), col] = z[col]*(5-z.normalDiff)*z.pairing_snap_count
        z.loc[(z[col] > 0), col] = z[col]*z.normalDiff*z.pairing_snap_count

    df.loc[df.defender==defender,'snaps'] = sum(z.pairing_snap_count)
    df.loc[df.defender==defender,'impact_resistance'] = np.sum(z.impact_resistance)/sum(z.pairing_snap_count)
    df.loc[df.defender==defender,'target_resistance'] = np.sum(z.target_resistance)/sum(z.pairing_snap_count)
    df.loc[df.defender==defender,'completion_resistance'] = np.sum(z.completion_resistance)/sum(z.pairing_snap_count)

In [None]:
db_stats = df
db_stats['target_resistance_rank'] = db_stats.target_resistance.rank(method = 'max', ascending=False)+1
db_stats['completion_resistance_rank'] = db_stats.completion_resistance.rank(method = 'max', ascending=False)+1
db_stats['impact_resistance_rank'] = db_stats.impact_resistance.rank(method = 'max', ascending=False)+1
db_stats['overall_rank'] = ((db_stats.target_resistance_rank+db_stats.completion_resistance_rank+db_stats.impact_resistance_rank)/3).rank(method = 'max')+1

# db_stats[['target_resistance_rank','completion_resistance_rank','impact_resistance_rank','overall_rank']]

In [None]:
fig = go.Figure(data=[go.Scatter3d(x=db_stats.target_resistance_rank, 
                                   y=db_stats.completion_resistance_rank, 
                                   z=db_stats.impact_resistance_rank,
                                   mode='markers',
                                   name='',
                                   hovertemplate = db_stats.defender,
                                    marker=dict(
                                        color=db_stats.overall_rank,
                                        size=10,
                                        colorbar=dict(
                                                title="Overall Rank"
                                            )))])


# Set axis titles (matching the x/y/z data passed to Scatter3d above)
fig.update_layout(scene = dict(
                    xaxis_title='Target Resistance Rank',
                    yaxis_title='Completion Resistance Rank',
                    zaxis_title='Impact Resistance Rank'),
                    width=700,
                    margin=dict(r=20, b=10, l=10, t=10))

# # Add figure title
# fig.update_layout(
#     title_text="The Resistance Cube"
# )

fig.update_layout(showlegend=False)

fig.show()

The above chart plots the **ordinal rank** of each DB across our 3 resistance metrics. The color represents the overall ordinal rank of the DB, with Blue DBs being stronger overall than Yellow DBs. The smaller the ordinal rank, the better (think 1st place).
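For readers less familiar with ordinal ranking, here is a small sketch of the idea using pandas on three made-up defenders. The names and scores are hypothetical. Unlike the cell above, this sketch does not add 1 to each rank, so the best rank here is 1.

```python
import pandas as pd

# Hypothetical resistance scores (higher is better)
scores = pd.DataFrame({
    "defender": ["DB A", "DB B", "DB C"],
    "target_resistance": [0.30, 0.10, 0.20],
    "completion_resistance": [0.05, 0.15, 0.10],
    "impact_resistance": [2.0, 1.0, 3.0],
}).set_index("defender")

# Rank each metric separately: the highest score gets rank 1
ranks = scores.rank(method="max", ascending=False)

# Average the three ranks, then rank that average (lowest average is best)
overall = ranks.mean(axis=1).rank(method="max")

print(ranks)
print(overall)
```

Averaging ranks treats the three metrics as equally important and ignores how far apart the underlying scores are, which is a simplification worth keeping in mind when reading the chart.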

While the above chart is very cool looking and fun to play around with, it is a little challenging to glean insights from. A more readable chart of the same information is generated below:

In [None]:
import matplotlib.pyplot as plt
from matplotlib_venn import venn3

plt.figure(num=None, figsize=(20, 20), dpi=80, facecolor='w', edgecolor='k')

good_target = db_stats[db_stats.target_resistance_rank <=16]
good_compl = db_stats[db_stats.completion_resistance_rank <=16]
good_impact = db_stats[db_stats.impact_resistance_rank <=16]

set1 = set(good_target.defender)
set2 = set(good_compl.defender)
set3 = set(good_impact.defender)

names = ["Strong\nTarget Resistance", "Strong\nCompletion Resistance", "Strong\nImpact Resistance"]
venn = venn3([set1, set2, set3], set_labels=(names))

# Label and color each Venn region with the DBs that fall in it.
# Regions with no members have no label/patch (the getters return None), so skip them.
regions = {
    '100': (set1-set2-set3, '#B2F7EF'),
    '110': (set1&set2-set3, '#1B98E0'),
    '010': (set2-set3-set1, '#136F63'),
    '101': (set1&set3-set2, '#090C9B'),
    '111': (set1&set2&set3, '#E8F1F2'),
    '011': (set2&set3-set1, '#034078'),
    '001': (set3-set2-set1, '#4C3957'),
}

for region_id, (members, color) in regions.items():
    label = venn.get_label_by_id(region_id)
    patch = venn.get_patch_by_id(region_id)
    if label is None or patch is None:
        continue
    label.set_text(',\n'.join(members))
    patch.set_color(color)
    label.set_fontsize(15)



plt.title('Defensive Resistance Scores; Ordinally Ranked')

s = 'DBs not listed fell outside top 10% for Resistance Metrics'
# n = db_stats[~db_stats.index.isin(set1|set2|set3)].defender
# for i in range(len(n)):
#     s += n[i]
#     if (i % 20 == 0) and (i!=0):
#         s+=',\n'
#     else:
#         s+=', '
plt.text(0.5,-0.05,s,horizontalalignment='center',
     verticalalignment='center', transform = plt.gca().transAxes)

plt.show()

## Segmenting DBs by Resistance
The above Venn diagram captures the top 10% of DBs across the three main categories we discussed at the beginning of this section. With the best DBs in football falling into distinct categories, let's add some meaning and context to each of these buckets:

### Cohort Descriptions: 

- **Impact Only:** These DBs are likely to give up yardage and catches, but know how to tighten up when it matters. They rarely give up the big play. There is likely a preference for LBs in these metrics, since LBs are often covering short routes by RBs and TEs as opposed to deep plays downfield.

- **Targets Only:** Playstyle likely leads to the QB avoiding this matchup unless it's open, but when it is, the pass usually nets the offense a large chunk of yards. Could be indicative of inconsistent DB performance by a strong player.

- **Completions Only:** Usually paired up against WRs that QBs love to throw to, but can often make a good play to prevent a completion. Scoring may indicate an overcommitment to deflecting/disrupting the pass, which leads to big plays when missed.

- **Impact & Completions Only:** Often play against WRs that QBs love to throw to, even in tight coverage. Despite the high targeting rate, these DBs often make a good play on the ball. This group of DBs is high caliber.
    - **Spot Check:** It makes perfect sense that Jalen Ramsey would fall in this category. We saw in our earlier analysis that he was often covering very strong WRs but did a praiseworthy job of mitigating their impact.

- **Impact & Targets Only:** Known to allow the receiver to get open and catch on occasion, but mostly on shorter, less meaningful routes. Typically plays tight coverage and does not give up the big play. Likely also plays against WR1s and high-target WRs less often; perhaps consider moving them to play more difficult matchups.
    - **Spot Check:** [Humphrey](https://en.wikipedia.org/wiki/Marlon_Humphrey) was first-team All-Pro and made the Pro Bowl for the first time the year after this season (2019). Maybe the Ravens _also_ saw the same potential this analysis is revealing!

- **Targets & Completions Only:** Does a good job covering their players most of the time, but when they get beat, they get beat badly.

- **All Three Categories:** DBs who represent the top 10% for Target, Completion, and Impact Resistance. This segment is where all DBs should strive to be: significantly reducing a WR's output across all meaningful metrics. While I don't have data to prove this, I would assume a HOF corner like Revis would often find his season stats placing him in this section.
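The cohort a DB lands in is just a question of which of the three "top" sets he belongs to. A minimal sketch of that lookup is below; the function name and the example player names are hypothetical placeholders, not results from the data.

```python
def cohort(defender, target_set, completion_set, impact_set):
    """Return the cohort label for a defender based on set membership."""
    flags = (defender in target_set,
             defender in completion_set,
             defender in impact_set)
    labels = {
        (True, False, False): "Targets Only",
        (False, True, False): "Completions Only",
        (False, False, True): "Impact Only",
        (True, True, False): "Targets & Completions Only",
        (True, False, True): "Impact & Targets Only",
        (False, True, True): "Impact & Completions Only",
        (True, True, True): "All Three Categories",
    }
    # Defenders in none of the three sets are not drawn on the diagram
    return labels.get(flags, "Not listed")


# Hypothetical top sets for the three resistance metrics
top_target = {"DB A", "DB B"}
top_completion = {"DB B", "DB C"}
top_impact = {"DB B", "DB C", "DB D"}

for name in ["DB A", "DB B", "DB C", "DB D", "DB E"]:
    print(name, "->", cohort(name, top_target, top_completion, top_impact))
```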

# 4. Finding Shutdown Man-Coverage DBs

In [None]:
man_pd = revis_pd[(revis_pd.isManCoverage == 1) & (revis_pd.isManPlay == 1)].copy()

df = pd.DataFrame(columns=['defender','target_resistance','completion_resistance','impact_resistance'])
d = man_pd.groupby('defender').count()
d = d[d.gameId>50].index

df['defender'] = d

low_pairs = 0
low_reps = 0
low_targets = 0

# For every defender...
for defender in d:

#     defender = 'Stephon Gilmore'
    
    wrs = man_pd[man_pd.defender==defender].covering.unique()
    
    # for every receiver...
    non_cover_plays = revis_pd[revis_pd.covering.isin(wrs)&(revis_pd.defender!=defender)]
    cover_plays = man_pd[man_pd.covering.isin(wrs)&(man_pd.defender==defender)]
    
    non_cover_stats = non_cover_plays.groupby('covering')[['targetted','normalDiff']].mean() \
                                     .merge(
                                        non_cover_plays[non_cover_plays.targetted].groupby('covering')[['gameImpact','completed']].mean(),
                                        right_index=True,left_index=True
    )
    
    non_cover_stats.columns = ['non_cover_'+i if i != 'normalDiff' else 'normalDiff' for i in non_cover_stats.columns]
    
    
    cover_stats = cover_plays.groupby('covering')[['targetted']].sum() \
                                 .merge(
                                    cover_plays[cover_plays.targetted].groupby('covering')[['completed','gameImpact']].sum(),
                                    right_index=True,left_index=True
    )

    cover_stats.columns = ['pairing_'+i for i in cover_stats.columns]
    
    # Receivers with at least 50 snaps away from this defender (computed here, but not merged into z below)
    non_cover_snaps = non_cover_plays.groupby('covering')[['gameId']].count()
    non_cover_snaps.columns = ['non_cover_snap_count']
    non_cover_snaps = non_cover_snaps[non_cover_snaps.non_cover_snap_count>=50]
    
    # ...low quantity coverage pairing values should be downweighted until they hit 10
    cover_snaps = cover_plays.groupby('covering')[['gameId']].count()
    cover_snaps.columns = ['pairing_snap_count']
    
    z = cover_stats.merge(cover_snaps,left_index=True,right_index=True) \
                   .merge(non_cover_stats,left_index=True,right_index=True) 
    
    # Calculate Pairing-level resistance
    z['target_resistance'] = z.non_cover_targetted*z.pairing_snap_count-z.pairing_targetted
    z['completion_resistance'] = z.non_cover_completed*z.pairing_targetted-z.pairing_completed
    z['impact_resistance'] = z.non_cover_gameImpact*z.pairing_targetted-z.pairing_gameImpact
#     break
    
#     z['target_resistance'] = (z.non_cover_targetted*z.pairing_snap_count-z.pairing_targetted)*z.normalDiff/(z.non_cover_targetted*z.pairing_snap_count)
#     z['completion_resistance'] = (z.non_cover_completed*z.non_cover_targetted-z.pairing_completed)/(z.non_cover_completed*z.non_cover_targetted)
#     z['impact_resistance'] = (z.non_cover_epa_yards*z.non_cover_targetted-z.pairing_epa_yards)/(z.non_cover_epa_yards*z.non_cover_targetted)
    
    # Fix bug where 0 completions causes problems
    z.loc[z.non_cover_completed == 0, 'completion_resistance'] = np.mean(z[z.non_cover_completed!=0].completion_resistance)
    z.loc[z.non_cover_gameImpact == 0, 'impact_resistance'] = np.mean(z[z.non_cover_gameImpact!=0].impact_resistance)
#     break
    # Can iterate through all 3 with this
        # Reward > 0 resistance on good players
        # Punish < 0 resistance on bad players
    for col in ['target_resistance','completion_resistance','impact_resistance']:
        z.loc[(z[col] < 0), col] = z[col]*(5-z.normalDiff)*z.pairing_snap_count
        z.loc[(z[col] > 0), col] = z[col]*z.normalDiff*z.pairing_snap_count

    df.loc[df.defender==defender,'snaps'] = sum(z.pairing_snap_count)
    df.loc[df.defender==defender,'impact_resistance'] = np.sum(z.impact_resistance)/sum(z.pairing_snap_count)
    df.loc[df.defender==defender,'target_resistance'] = np.sum(z.target_resistance)/sum(z.pairing_snap_count)
    df.loc[df.defender==defender,'completion_resistance'] = np.sum(z.completion_resistance)/sum(z.pairing_snap_count)

In [None]:
revis_stats = man_pd.groupby('defender')[['zscore']].mean().reset_index().merge(df, on=['defender'])

trace1=go.Scatter(x=revis_stats.impact_resistance, 
                                   y=revis_stats.completion_resistance, 
                                   mode='markers',
                                   name='',
                                   hovertemplate = revis_stats.defender,
                                    marker=dict(
                                        color=revis_stats.zscore,
                                        size=revis_stats.snaps/6,
                                        colorbar=dict(
                                                title="Average Coverage Difficulty"
                                            ),
                                    colorscale="speed"))

layout = go.Layout(
                title='Hidden Busts & Hidden Gems:<br>Comparing Avg. Yards Forfeit to Game Impact',
                paper_bgcolor='rgba(0,0,0,0)',
                plot_bgcolor='rgba(0,0,0,0)',
                xaxis = dict(title = 'Impact Resistance',range=[-30,105]),
                yaxis = dict(title = 'Completion Resistance'),
)

fig = go.Figure(data=[trace1], layout = layout)

# Add figure title
fig.update_layout(
    title_text="Shutdown Man-Coverage Players"
)

# fig.add_trace(go.Scatter(x=[0,1,2,0], y=[0,2,0,0], fill="toself", mode='lines'))

fig.update_layout(shapes=[
    dict(
      type= 'line',
      line=dict(
        color="gray",
        width=1,
        dash="dot",
        ),
      yref= 'y', y0= 0, y1= 0,
      xref= 'paper', x0= 0, x1= 1
    ),
    dict(
      type= 'line',
      line=dict(
        color="gray",
        width=1,
        dash="dot",
        ),
      yref= 'paper', y0= 0, y1= 1,
      xref= 'x', x0=0, x1=0
    ),

])

# x, y = ellipse(x_center=6.5, y_center=150,
# #                    ax1 =[np.cos(np.pi/12), np.sin(np.pi/12)],  ax2=[-np.sin(np.pi/12),np.cos(np.pi/12)],
#                    a=1.5, b =25)
# fig.add_scatter(
#             x=x,
#             y=y,
#             mode = 'lines')

fig.update_layout(showlegend=False)

fig.add_annotation(
            xref="x",
            yref="y",
            x=85, y=3.6,
            text="<b>Revis Island</b>",
            showarrow=False)

fig.add_annotation(
            xref="x",
            yref="y",
            x=54, y=1.8,
            text="<b>Explorers</b>",
            showarrow=False)

l = revis_stats[revis_stats.defender == 'Richard Sherman']
fig.add_annotation(
            xref="x",
            yref="y",
            x=l.impact_resistance.values[0], y=l.completion_resistance.values[0],
            text="<b>Sherman</b>",
            showarrow=True)

fig.show()

## Visualizing Shutdown Corners

Calculating & Visualizing a DB's **Completion Resistance** and **Impact Resistance** on man coverage plays finally reveals to us what we have been searching for:


## Revis Island!

![](https://pix10.agoda.net/hotelImages/186782/-1/d767492a24b0547a44db71f806e03d1d.jpg?s=1024x768)


... or at least an island that's part of the Revis archipelago.

### Revis Islanders
While strong cornerbacks in their own right when looking at all plays, Jalen Ramsey, James Bradberry, Joe Haden, Taron Johnson, and Marlon Humphrey are a cut above the pack when it comes to man-to-man coverage. There is a wide gap across both completion and impact resistance between these players and the rest of the DBs in the NFL. According to game resistance on man coverage, these are the most elite DBs in the league. If your team is looking to gain a shutdown DB, trade for one of these players.

_Note: Taron Johnson, while I'm sure an elite player, has fewer man-to-man snaps than his fellow Revis Islanders. The jury is still out on his status._

### Revis Explorers
There is an interesting cohort of DBs that also falls out of this list: DBs who have a low number of man coverage plays in 2018, but have some promising results, particularly Anthony Brown, Adoree' Jackson, and Carlton Davis. They have started to separate themselves from the rest of the league in terms of impact, but have a small sample size and average completion resistance. A recommendation for the teams these DBs play for is to test their mettle by playing more man coverage, ideally against WR1s, and see if they have what it takes to make it to Revis Island. Stephon Gilmore, Adoree' Jackson, and William Jackson all also fall into this category, but have a larger number of snaps. Perhaps they need some more technical help to achieve Revis-Island status, but they clearly still have the potential!

#### Interesting Specific Examples:

**Janoris Jenkins** sits somewhere between the Revis Islanders and Explorers. Depending on his previous year stats, he could be trending towards Revis-Island status or dropping down out of it. Either way, he is in a year of transition it seems.

**Stephon Gilmore**, the best corner in the game according to all plays, is "only" in the Explorers grouping. The Explorers are still in the top 10% of corners in the league, but I would have expected him to be a Revis Islander. This leads me to believe that he is great in team defenses, but when it comes to one-on-one coverage, there are better specialists than he.

**Richard Sherman** is a famous CB in football today, due to his spectacular ["Swat Heard Round the World"](https://www.youtube.com/watch?v=OhtfaME_v0Y) in coverage against Crabtree. Before this analysis and based on that playoff moment, I assumed he was an excellent shutdown corner. But the data shows that he is not that great in man-to-man coverage. I guess one play does not make a career...

## Visualizing DB Man Coverage Performance against Elite WRs

In [None]:
best_wrs = man_pd[['covering','difficulty']].drop_duplicates().sort_values('difficulty',ascending=False).head(10)[['covering']]

best_man_pd = man_pd.merge(best_wrs, on='covering')

# man_pd = revis_pd[(revis_pd.isManCoverage == 1) & (revis_pd.isManCoverage == 1)].copy()

df = pd.DataFrame(columns=['defender','target_resistance','completion_resistance','impact_resistance'])
d = best_man_pd.groupby('defender').count()
d = d[d.gameId>25].index

df['defender'] = d

low_pairs = 0
low_reps = 0
low_targets = 0

# For every defender...
for defender in d:

#     defender = 'Stephon Gilmore'
    
    wrs = best_man_pd[best_man_pd.defender==defender].covering.unique()
    
    # for every receiver...
    non_cover_plays = revis_pd[revis_pd.covering.isin(wrs)&(revis_pd.defender!=defender)]
    cover_plays = best_man_pd[best_man_pd.covering.isin(wrs)&(best_man_pd.defender==defender)]
    
    non_cover_stats = non_cover_plays.groupby('covering')[['targetted','normalDiff']].mean() \
                                     .merge(
                                        non_cover_plays[non_cover_plays.targetted].groupby('covering')[['gameImpact','completed']].mean(),
                                        right_index=True,left_index=True
    )
    
    non_cover_stats.columns = ['non_cover_'+i if i != 'normalDiff' else 'normalDiff' for i in non_cover_stats.columns]
    
    
    cover_stats = cover_plays.groupby('covering')[['targetted']].sum() \
                                 .merge(
                                    cover_plays[cover_plays.targetted].groupby('covering')[['completed','gameImpact']].sum(),
                                    right_index=True,left_index=True
    )

    cover_stats.columns = ['pairing_'+i for i in cover_stats.columns]
    
    # Receivers with at least 50 snaps away from this defender (computed here, but not merged into z below)
    non_cover_snaps = non_cover_plays.groupby('covering')[['gameId']].count()
    non_cover_snaps.columns = ['non_cover_snap_count']
    non_cover_snaps = non_cover_snaps[non_cover_snaps.non_cover_snap_count>=50]
    
    # ...low quantity coverage pairing values should be downweighted until they hit 10
    cover_snaps = cover_plays.groupby('covering')[['gameId']].count()
    cover_snaps.columns = ['pairing_snap_count']
    
    z = cover_stats.merge(cover_snaps,left_index=True,right_index=True) \
                   .merge(non_cover_stats,left_index=True,right_index=True) 
    
    # Calculate Pairing-level resistance
    z['target_resistance'] = z.non_cover_targetted*z.pairing_snap_count-z.pairing_targetted
    z['completion_resistance'] = z.non_cover_completed*z.pairing_targetted-z.pairing_completed
    z['impact_resistance'] = z.non_cover_gameImpact*z.pairing_targetted-z.pairing_gameImpact
#     break
    
    
    # Fix bug where 0 completions causes problems
    z.loc[z.non_cover_completed == 0, 'completion_resistance'] = np.mean(z[z.non_cover_completed!=0].completion_resistance)
    z.loc[z.non_cover_gameImpact == 0, 'impact_resistance'] = np.mean(z[z.non_cover_gameImpact!=0].impact_resistance)
#     break
    # Can iterate through all 3 with this
        # Reward > 0 resistance on good players
        # Punish < 0 resistance on bad players
    for col in ['target_resistance','completion_resistance','impact_resistance']:
        z.loc[(z[col] < 0), col] = z[col]*(5-z.normalDiff)*z.pairing_snap_count
        z.loc[(z[col] > 0), col] = z[col]*z.normalDiff*z.pairing_snap_count

    df.loc[df.defender==defender,'snaps'] = sum(z.pairing_snap_count)
    df.loc[df.defender==defender,'impact_resistance'] = np.sum(z.impact_resistance)/sum(z.pairing_snap_count)
    df.loc[df.defender==defender,'target_resistance'] = np.sum(z.target_resistance)/sum(z.pairing_snap_count)
    df.loc[df.defender==defender,'completion_resistance'] = np.sum(z.completion_resistance)/sum(z.pairing_snap_count)
    
    

In [None]:
wr_list = best_man_pd[['defender','covering']].drop_duplicates().groupby('defender')['covering'].apply(list).reset_index()

df_plot = df.merge(wr_list, on=['defender'])

# Divide by df_plot's own snaps column so rows stay aligned after the merge
df_plot['impact_resistance'] = df_plot['impact_resistance']/df_plot.snaps

df_plot = df_plot[df_plot.snaps>20].sort_values('impact_resistance',ascending=False)

trace1 = go.Bar(x=df_plot.defender, y=df_plot.impact_resistance, name="Game Impact Score", 
                hovertemplate =['Top Receivers Covered:<br>'+ '<br>'.join(i) for i in df_plot.covering])

layout = go.Layout(
                title='Shutdown DBs against The Best WRs in the League',
                paper_bgcolor='rgba(0,0,0,0)',
                plot_bgcolor='rgba(0,0,0,0)',
                yaxis = dict(title = 'Impact Resistance per Matchup Snap'),
)

# Build the figure first, then style it
fig = go.Figure(data=trace1, layout=layout)

# Dotted reference line at zero resistance
fig.update_layout(
    showlegend=False,
    shapes=[
        dict(
            type='line',
            line=dict(color="gray", width=1, dash="dot"),
            yref='y', y0=0, y1=0,
            xref='paper', x0=0, x1=1,
        ),
    ],
)

fig.show()

The chart above shows which DBs stand out when it comes to covering the strongest WRs in the league: the elite DBs of the NFL playing against the elite WRs of the NFL.

Interestingly, **Jimmy Smith** has played enough man-to-man snaps against top-10 WRs (25) to make this list. His inclusion is welcome, because it highlights how strong the remaining 3 DBs are. **Bradberry, Ramsey, and Haden** are on Revis Island against all players, and maintain their dominance against elite WRs. These are clearly the best shutdown DBs in the league in 2018.

A surprising element of this chart, not immediately obvious, is that no CB with more than 25 snaps against a top-10 WR has a negative Impact Resistance. The narrative this fits is that if you can't prove you can cover Julio Jones, they'll pull you off that matchup.
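
A quick way to check that claim against the plotted frame is a two-condition filter. The sketch below runs on made-up rows standing in for `df_plot` (the defender names and values are placeholders, not results from the analysis); on the real frame only the last two statements would be needed.

```python
import pandas as pd

# Placeholder rows standing in for df_plot (hypothetical values)
df_plot = pd.DataFrame(
    {
        "defender": ["DB_A", "DB_B", "DB_C"],
        "snaps": [40, 27, 12],
        "impact_resistance": [0.8, 0.1, -0.3],
    }
)

# Defenders with more than 25 snaps who still have a negative Impact Resistance
negative_heavy_use = df_plot[(df_plot.snaps > 25) & (df_plot.impact_resistance < 0)]

# An empty result means no heavily used defender has a negative score
print(negative_heavy_use.empty)
```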

**Lattimore** is an intriguing entry: although he is middle of the pack in man-to-man coverage in general, his team trusts him to make the plays against the best in the league, and he does a decent job of it too! He clearly has the potential to be the top shutdown corner in the league, and is already almost there against elite WRs.

**Marlon Humphrey**, present on Revis Island for all man-coverage plays, does not make this list, leading us to believe that perhaps he is a shutdown corner against your typical WR, but not against Julio Jones and company. Or maybe he simply needs more reps against top-10 receivers, and he can prove this analysis wrong in Data Bowl 2021.

### Real World Data

After the 2019 season, Jalen Ramsey [signed a 5-year, $105 million contract, $71 million of it guaranteed](https://www.espn.com/nfl/story/_/id/29844575/rams-cb-jalen-ramsey-agree-five-year-105m-contract-agent-says). This was the largest guarantee in a contract for a DB in the history of the NFL. 

Some sports analysts questioned whether he was worth that much. 

But based on the data collected above, Ramsey is absolutely one of THE BEST cornerbacks when it comes to shutting down tough receivers in man-to-man coverage, which is particularly impressive given the resume of WRs he covered in 2018. If your team is in need of a modern-day Darrelle Revis, paying him a gigantic contract makes perfect sense, and the data supports this.

![](https://www.totalprosports.com/wp-content/uploads/2018/08/GettyImages-908463338.jpg)

# Conclusion: There and Back Again

Let's recap what we covered, and a few interesting insights/actions uncovered:
- **Process:** By modifying play outcome by game context and matchup difficulty, we can distill out the true **game impact** of a play. Game Impact often varies significantly from play outcome
- **Process:** By comparing a WR's game impact achieved against a DB versus a WR's typical performance, we can capture a DB's **resistance** to forfeiting targets, completions, and game impact
- **Insight:** While Marlon Humphrey and Tre'Davious White are stellar DBs, their yards stat line is not what it first appears. They give up fewer yards than average, but the yards they give up are worse than they appear. Turn to a Clutch cornerback like Denzel Ward or Josh Norman, who gives up a few more yards but makes the big plays
- **Insight:** Stephon Gilmore, while one of the strongest overall cornerbacks in the league, is not comparable to Darrelle Revis in terms of shutdown man coverage. That honor is reserved for Jalen Ramsey and Marlon Humphrey
- **Action:** If your team has a DB that is a "Revis Explorer," provide them with more opportunities to play man-to-man against WR1. You may have a budding Darrelle Revis on your hands!
- **Action:** On critical plays in the game, play your most Clutch DB against their best receiver. Ignore the stat line: Clutch DBs are more likely to get a stop

## Next Steps
Aka "things I wish I could've done"
- I would absolutely love to run this analysis for Darrelle Revis' 2009 and 2010 seasons, and see how he compares to the 2018 cohort. Would Jalen Ramsey and company be comparable? Better, worse? It would be a fun yardstick to measure success against
- **No Man is a (Revis) Island:** Sticking with the narrative around individual-player performance, I had to scrap my analysis around team-level strength. But now that I have player-level stats, it would be a fun exercise to explore if the best defensive teams always have the best players, or if a collection of above-average players does a better job than 1 superstar
- **Making a Revis:** A further analysis that would be so fun for someone to explore is finding explanatory elements of what makes a shutdown corner. This analysis looked strictly at output, but finding technique-related differences would be amazing to see (this would require pretty serious tracking equipment...maybe 2024)
- **QB Games:** Shaking off QBs, or messing up their reads, would be an amazing future topic. My original metric for difficulty included which read number a WR was. But we did not have QB head movement, so could not check who their first/second reads were
- **Longitudinal Study:** Strong Football metrics can stand the test of time. Would love to see resistance scores over multiple seasons, and see how they improve/degrade for each cornerback. Hopefully **Resistance** can be used to track a DB's skill across their career! I also wanted to run this on college prospects and see if it (1) translated to the pros and (2) could result in draft-prospecting strategies

Thanks for reading, and I hope you enjoyed!

![](https://blog.sevenponds.com/wp-content/uploads/2018/01/800px-Sunset_in_Zadar_2-513x344.jpg)