# Deriving success rate from play-by-play details

The purpose of this notebook is to take detailed NFL play-by-play data that I previously scraped from ESPN's website and extract some specific information about each play, including whether the playcall was a run, pass, punt, or field goal, and how many yards were gained on the play. That information can then be used to determine whether a play was successful. For this analysis, I am using Football Outsiders' definition of success rate: a successful play gains 50% of the yards to go on first down, 70% on second down, and 100% on third or fourth down.
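The success rule is easy to state as a small function. This is a minimal sketch that assumes the yardage values are already numeric. `is_successful` is a hypothetical helper for illustration; the notebook itself applies the same thresholds inline in `process_pbp`.

```python
def is_successful(down: int, yds_to_go: float, yds_gained: float) -> bool:
    """Success rule as described above: a play needs 50% of the yards to go
    on 1st down, 70% on 2nd down, and 100% on 3rd or 4th down."""
    thresholds = {1: 0.5, 2: 0.7, 3: 1.0, 4: 1.0}
    if down not in thresholds:
        raise ValueError(f"down must be 1-4, got {down}")
    return yds_gained >= thresholds[down] * yds_to_go


print(is_successful(1, 10, 5))  # True: 5 >= 0.5 * 10
print(is_successful(2, 10, 6))  # False: 6 < 0.7 * 10
print(is_successful(3, 3, 3))   # True: 3 >= 1.0 * 3
```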

Once that information is extracted from the play-by-play details, it will live in pandas DataFrames. I can then filter those to create a cross-section of any kind of game situation: success rate by down, success rate allowed by a particular defense in away games, and so on.
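Once the plays live in a DataFrame, a cross-section is just a boolean mask plus a `groupby`. Here is a minimal sketch on a made-up five-play table; the teams and outcomes are invented. The column names (`offense`, `defense`, `home`, `away`, `down`, `is_success`) match the processed play table that `process_pbp` returns later in the notebook.

```python
import pandas as pd

# Made-up stand-in for the processed play table
toy_plays = pd.DataFrame({
    "offense":    ["CAR", "ATL", "NO",  "NO",  "CAR"],
    "defense":    ["ATL", "CAR", "CAR", "CAR", "NO"],
    "home":       ["CAR", "CAR", "NO",  "NO",  "NO"],
    "away":       ["ATL", "ATL", "CAR", "CAR", "CAR"],
    "down":       [1, 2, 3, 1, 2],
    "is_success": [1, 0, 1, 0, 1],
})

# Success rate by down, league-wide
by_down = toy_plays.groupby("down").is_success.mean()
print(by_down)

# Success rate allowed by one defense (CAR) in its away games
car_away_def = toy_plays[(toy_plays.defense == "CAR") & (toy_plays.away == "CAR")]
print(car_away_def.is_success.mean())
```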

So, without further ado, here's some code to get things set up.

In [1]:
# Load necessary packages
import pandas as pd
import numpy as np
import copy

In [2]:
# Load dataframes from disk
gamedata_df = pd.read_csv("espn_gamedata.csv")
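# Read the play-by-play CSV: ISO-8859-1 avoids decode errors on bytes that aren't valid UTF-8,
# and low_memory=False parses the file in one pass so mixed-type columns get a single dtype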
allplays_df = pd.read_csv("espn_parsedplays_2004-2016.csv", 
                          encoding = "ISO-8859-1",
                          low_memory = False
                         )

# And have a look at what we've got to deal with
print(allplays_df.info())
print(gamedata_df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 570330 entries, 0 to 570329
Data columns (total 23 columns):
Unnamed: 0         570330 non-null int64
downdist           570330 non-null object
detail             568720 non-null object
home               570330 non-null object
away               570330 non-null object
possession         570330 non-null object
home_score         570330 non-null int64
away_score         570330 non-null int64
gameId             570330 non-null int64
down               570330 non-null int64
dist               570330 non-null object
home_fieldpos      570330 non-null int64
qtr                570330 non-null object
time_rem           570330 non-null object
secs_rem           570330 non-null int64
home_lead          570330 non-null int64
total_score        570330 non-null int64
adj_lead           570330 non-null float64
OT                 570330 non-null int64
home_possession    570330 non-null int64
home_win           570330 non-null object
season           

## A function to parse the detailed play-by-play

In [3]:
# Objective: Assuming no fumbled snap or pre-snap penalty or other shenanigans,
#  should be able to figure out which plays are "successful" and then look at success rates

# Practical things to deal with:
    # What raw data to handle?
    # Need to be able to take a raw-ish pbp table and extract success rate
    # So make one function that will try to label plays as run/pass, yardage gained, and success
def found_pass(detail):
    d = detail.lower()
    if " pass" in d:
        return True
    elif " sacked " in d:
        return True
    elif " scramble" in d:
        return True
    elif " interception" in d:
        return True
    elif " intercepted" in d:
        return True
    return False
    
def found_run(detail):
    d = detail.lower()
    # Scrambles are treated as pass plays, so never label them as runs
    if " scramble" in d:
        return False
    run_phrases = (" run ", " rush", " left tackle ", " right tackle ",
                   " up the middle ", " left end ", " right end ",
                   " left guard ", " right guard ")
    return any(phrase in d for phrase in run_phrases)
        
def found_punt(detail):
    d = detail.lower()
    if " punts " in d:
        return True
    elif " punt return" in d:
        return True
    return False
    
def found_fieldgoal(detail):
    d = detail.lower()
    if " field goal " in d:
        return True
    return False

def yds_run( i, detail ):
    words = detail.lower().split()
    # look for yardage in format "for X yards"
    for j, w in enumerate(words):
        if w == "for" and len(words) > j+2:
            if words[j+2].rstrip(".,") in ("yd","yds","yrd","yrds","yard","yards"):
                return int(words[j+1])
            # or "for no gain"
            elif "no" in words[j+1] and "gain" in words[j+2]:
                return 0
        
        # or "X yard run/rush"
        elif w in ("yd","yds","yrd","yrds","yard","yards") and len(words) >= j+2:
            if words[j+1].rstrip(".,") in ("run","rush"):
                return int(words[j-1])
        
    return "x"
    
def yds_passed( i, detail ):
    words = detail.lower().split()
    # look for yardage in format "for X yards"
    for j, w in enumerate(words):
        if w == "for" and len(words) > j+2:
            if words[j+2].rstrip(".,") in ("yd","yds","yrd","yrds","yard","yards"):
                return int(words[j+1])
            # or "for no gain"
            elif "no" in words[j+1] and "gain" in words[j+2]:
                return 0
            
        # or "X yard pass"
        elif w in ("yd","yds","yrd","yrds","yard","yards") and len(words) >= j+2:
            if words[j+1].rstrip(".,") == "pass":
                return int(words[j-1])

    # Or maybe pass went incomplete
    if "incomplete" in detail.lower():
        return 0
    
    # Or maybe pass was intercepted. In this case, just say yds_gained is zero
    elif ("intercepted" in detail.lower()) or ("interception" in detail.lower()):
        return 0
    
    return "x"

    
def parse_details(df):
    print(df.columns)
    details = df.detail.values
    down = df.down.values
    
    # Make lists (one entry per play, in row order) for storing play-specific data.
    # Because the lists are positional, duplicate detail strings are not a problem.
    is_parseable = [False for d in details]
    is_run = [False for d in details]
    is_pass = [False for d in details]
    is_punt = [False for d in details]
    is_fieldgoal = [False for d in details]
    yds_gained = ["x" for d in details]
    qual_play = [False for d in details]
    
    # Loop through details going through logic tree to find appropriate values
    for i, d in enumerate(details):
        
        # Look exclusively for play details on downs 1-4
        # (some detail values are missing/NaN, so skip anything that isn't text)
        if down[i] in [1,2,3,4] and isinstance(d, str):
            
            # Try and parse a run
            if found_run(d):
                is_run[i] = True
                yds_gained[i] = yds_run(i,d)
            
            # Try and parse a pass
            elif found_pass(d):
                is_pass[i] = True
                yds_gained[i] = yds_passed(i,d)
            
            # Try and parse a punt
            elif found_punt(d):
                is_punt[i] = True
            
            # Try and parse a field goal
            elif found_fieldgoal(d):
                is_fieldgoal[i] = True
                
    for i, yds in enumerate(yds_gained):
        if (is_run[i] or is_pass[i]) and (yds != "x"):
            is_parseable[i] = True
            qual_play[i] = True
        elif is_punt[i]:
            is_parseable[i] = True
        elif is_fieldgoal[i]:
            is_parseable[i] = True
                
                
    # Now write the columns to the end of the df
    df['is_parseable'] = is_parseable
    df['is_run'] = is_run
    df['is_pass'] = is_pass
    df['is_punt'] = is_punt
    df['is_fieldgoal'] = is_fieldgoal
    df['yds_gained'] = yds_gained
    df['qual_play'] = qual_play
    
    return df
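The yardage helpers above scan the detail text word by word. A regular expression is one alternative way to pull out the same "for X yards" and "for no gain" phrases. This is a hypothetical sketch, not what the notebook uses. Unlike `yds_run`/`yds_passed`, it ignores the "X yard run/pass" form and the incomplete/intercepted fallbacks, and the player names in the examples are made up.

```python
import re

# Hypothetical regex alternative to the word-by-word scan in yds_run/yds_passed.
# Matches phrases like "for 7 yards", "for -3 yds", "for 1 yard" and "for no gain".
YARDS_RE = re.compile(r"\bfor (-?\d+) (?:yd|yds|yrd|yrds|yard|yards)\b")
NO_GAIN_RE = re.compile(r"\bfor no gain\b")


def yards_from_detail(detail):
    """Return yards gained as an int, or None if no yardage phrase is found."""
    text = detail.lower()
    match = YARDS_RE.search(text)
    if match:
        return int(match.group(1))
    if NO_GAIN_RE.search(text):
        return 0
    return None


print(yards_from_detail("J.Smith left guard to DEN 40 for 6 yards."))  # 6
print(yards_from_detail("A.Passer sacked at CAR 20 for -8 yards."))    # -8
print(yards_from_detail("A.Passer pass incomplete short left."))       # None
```

Like the original scan, this takes the first matching phrase, so a detail string that also describes a penalty yardage later in the text still returns the play's own gain.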

In [4]:
def process_pbp(df):
    # Function to parse details in pbp tables, 
    # adding columns for play selection, yds_gained, and success binary
    
    # Process detail column of df
    parsed_df = parse_details(df)
    
    # Filter to just "quality plays" that are totally parseable;
    # .copy() makes qp independent of the original DataFrame
    qp = parsed_df[parsed_df.qual_play].copy()
    
    # Get yardage to gain for goal to go situations
    dist = qp.dist.values
    fixed_dist = [d for d in dist]
    home_poss = qp.home_possession.values
    for i, loc in enumerate(qp.home_fieldpos.values):
        if dist[i] == "Goal":
            if home_poss[i] == 1:
                fixed_dist[i] = 50 - loc
            else:
                fixed_dist[i] = loc + 50
                
    qp['yds_to_go'] = fixed_dist
    
    # Make column for successful plays
    down = qp.down.values
    dist = qp.yds_to_go.values
    gain = qp.yds_gained.values
    is_successful = [0 for d in down]
    for i, d in enumerate(down):
        if d == 1:
            if float(gain[i]) >= 0.5*float(dist[i]):
                is_successful[i] = 1
        elif d == 2:
            if float(gain[i]) >= 0.7*float(dist[i]):
                is_successful[i] = 1
        elif d in (3,4):
            if float(gain[i]) >= float(dist[i]):
                is_successful[i] = 1
                
    qp['is_success'] = is_successful
    
    # Finally, make column for specifying team on defense
#    home = qp.home.values
#    away = qp.away.values
#    offense = qp.possession.values
    defense = [t for t in qp.home.values]
    zipped_things = zip(qp.possession.values,
                        qp.home.values,
                        qp.away.values)
    for i, (off, home, away) in enumerate(zipped_things):
        if off == home:
            defense[i] = away
        elif off == away:
            defense[i] = home
            
    qp['defense'] = defense
    
    # Rename possession column to offense
    qp.rename(columns={'possession':'offense'}, inplace=True)
    
    
    # and eventually return the dataFrame, 
    # which is now separate in memory from original
    return qp
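The success and defense loops in `process_pbp` can also be written column-wise with NumPy. This sketch runs on a made-up six-play table and assumes the same column names (`down`, `yds_to_go`, `yds_gained`, `possession`, `home`, `away`) and the same thresholds. `np.select` picks the required fraction for each down. `np.where` reproduces the loop's defense rule: the away team when the offense is the home team, otherwise the home team.

```python
import numpy as np
import pandas as pd

# Made-up plays using the same column names as process_pbp
toy_qp = pd.DataFrame({
    "down":       [1, 1, 2, 2, 3, 4],
    "yds_to_go":  [10, 10, 10, 10, 3, 1],
    "yds_gained": [5, 4, 8, 6, 3, 0],
    "possession": ["CAR", "CAR", "ATL", "ATL", "CAR", "ATL"],
    "home":       ["CAR"] * 6,
    "away":       ["ATL"] * 6,
})

gain = pd.to_numeric(toy_qp["yds_gained"])
to_go = pd.to_numeric(toy_qp["yds_to_go"])

# Fraction of yds_to_go needed on each down: 50% / 70% / 100%
required = np.select(
    [toy_qp["down"] == 1, toy_qp["down"] == 2, toy_qp["down"].isin([3, 4])],
    [0.5, 0.7, 1.0],
    default=np.inf,  # any other down value never counts as a success
)
toy_qp["is_success"] = (gain >= required * to_go).astype(int)

# Defense is the away team when the offense is the home team, else the home team
toy_qp["defense"] = np.where(
    toy_qp["possession"] == toy_qp["home"], toy_qp["away"], toy_qp["home"]
)
print(toy_qp[["down", "yds_to_go", "yds_gained", "is_success", "defense"]])
```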

## Let's see how to use these functions



In [53]:
# Run the processing function
plays = process_pbp(allplays_df)
# Filter to just plays from 2015
plays_2015 = plays.loc[plays.season == 2015]

# Create some sample output to look at
cols = ['offense','defense','home','away','down',
        'yds_to_go','yds_gained','is_success','detail']
plays_2015[cols].sample(10)

Index(['Unnamed: 0', 'downdist', 'detail', 'home', 'away', 'possession',
       'home_score', 'away_score', 'gameId', 'down', 'dist', 'home_fieldpos',
       'qtr', 'time_rem', 'secs_rem', 'home_lead', 'total_score', 'adj_lead',
       'OT', 'home_possession', 'home_win', 'season', 'week', 'is_parseable',
       'is_run', 'is_pass', 'is_punt', 'is_fieldgoal', 'yds_gained',
       'qual_play'],
      dtype='object')


index,offense,defense,home,away,down,yds_to_go,yds_gained,is_success,detail
494058,DEN,OAK,DEN,OAK,2,5,6,1,(13:21 - 4th) (Shotgun) R.Hillman left guard ...
524711,HOU,JAX,JAX,HOU,1,15,27,1,(13:33 - 4th) B.Hoyer pass short left to A.Hu...
505744,BAL,JAX,JAX,BAL,1,10,2,0,(14:27 - 3rd) J.Forsett left tackle to BLT 22...
498970,DAL,NYG,DAL,NYG,2,3,0,0,(5:40 - 1st) M.Cassel pass incomplete short r...
514323,CAR,NO,CAR,NO,3,3,3,1,(10:35 - 3rd) (Shotgun) C.Newton pass short r...
496311,SD,GB,SD,GB,2,10,25,1,"(6:06 - 1st) (No Huddle, Shotgun) M.Gordon ri..."
505777,JAX,BAL,JAX,BAL,2,10,0,0,(14:56 - 4th) (Shotgun) B.Bortles pass incomp...
520041,ARI,PHI,ARI,PHI,2,6,3,0,(11:46 - 4th) C.Palmer pass short left to Jo....
522053,BAL,PIT,PIT,BAL,2,15,0,0,(14:52 - 3rd) (Shotgun) R.Mallett pass incomp...
499576,NE,MIA,MIA,NE,2,9,13,1,(8:57 - 3rd) (Shotgun) T.Brady pass short lef...


## Have a look at some sample analysis

In [54]:
# Extract success rate by down for each team in the league
grouped = plays_2015.groupby(['offense','down'])
gdf = grouped.is_success.agg(['mean','count'])
# Look at team's success rate
print(grouped.is_success.mean().unstack(0).transpose())
# Also look at how many qualifying plays each team ran
print(grouped.is_success.count().unstack(0).transpose())


# League averages by down: an unweighted mean of the team-level values,
# so 'count' here is the average number of qualifying plays per team
gdf.groupby(level='down').mean()

down            1         2         3         4
offense                                        
ARI      0.444668  0.461972  0.470874  0.625000
ATL      0.466135  0.395028  0.476395  0.571429
BAL      0.410678  0.442308  0.372385  0.480000
BUF      0.439294  0.399425  0.413043  0.466667
CAR      0.436735  0.477654  0.426540  0.700000
CHI      0.409692  0.425770  0.458150  0.600000
CIN      0.438178  0.463768  0.421053  0.692308
CLE      0.396476  0.364641  0.426778  0.416667
DAL      0.453333  0.392216  0.367150  0.470588
DEN      0.410417  0.385042  0.372294  0.571429
DET      0.463918  0.436261  0.381395  0.571429
GB       0.398287  0.426396  0.330472  0.478261
HOU      0.390099  0.350404  0.404580  0.333333
IND      0.406048  0.401685  0.409836  0.615385
JAX      0.457082  0.382090  0.354545  0.470588
KC       0.411085  0.447205  0.393365  0.700000
MIA      0.414798  0.397015  0.320574  0.434783
MIN      0.374150  0.398827  0.374408  0.500000
NE       0.434426  0.432507  0.444444  0

down,mean,count
1,0.422906,468.40625
2,0.411927,353.71875
3,0.401065,223.78125
4,0.532783,15.5
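The league-average table takes a plain mean over teams. Every offense counts equally regardless of how many plays it ran, which is why `count` turns into the average number of qualifying plays per team. If you want a play-weighted league rate instead, compute total successes over total plays. Here is a sketch with made-up numbers for two hypothetical teams:

```python
import pandas as pd

# Made-up team-by-down table shaped like gdf: 'mean' = success rate, 'count' = plays
toy_gdf = pd.DataFrame(
    {"mean": [0.50, 0.40, 0.30, 0.60], "count": [100, 300, 50, 50]},
    index=pd.MultiIndex.from_tuples(
        [("TEAM_A", 1), ("TEAM_B", 1), ("TEAM_A", 2), ("TEAM_B", 2)],
        names=["offense", "down"],
    ),
)

# Unweighted: each team counts equally (what groupby(level='down').mean() does)
unweighted = toy_gdf.groupby(level="down")["mean"].mean()

# Play-weighted: total successes divided by total plays on each down
successes = (toy_gdf["mean"] * toy_gdf["count"]).groupby(level="down").sum()
n_plays = toy_gdf["count"].groupby(level="down").sum()
weighted = successes / n_plays

print(pd.DataFrame({"unweighted": unweighted, "weighted": weighted}))
```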


In [55]:
grouped = plays_2015.groupby(['offense','down'])
gdf = grouped.is_success.agg(['mean','count'])
# Access one team's stats
print(gdf.loc['ARI'])

          mean  count
down                 
1     0.444668    497
2     0.461972    355
3     0.470874    206
4     0.625000      8


In [56]:
grouped = plays_2015[plays_2015.is_pass].groupby(['offense','down'])
gdf = grouped.is_success.agg(['mean','count'])
gdf.unstack(0).transpose().unstack(0)

offense,1_mean,1_count,2_mean,2_count,3_mean,3_count,4_mean,4_count
ARI,0.523207,237.0,0.481982,222.0,0.457627,177.0,0.5,6.0
ATL,0.556391,266.0,0.373913,230.0,0.465,200.0,0.470588,17.0
BAL,0.460208,289.0,0.423581,229.0,0.346734,199.0,0.5,18.0
BUF,0.497561,205.0,0.421053,190.0,0.378531,177.0,0.384615,13.0
CAR,0.530702,228.0,0.497436,195.0,0.380368,163.0,0.5,4.0
CHI,0.512195,205.0,0.40625,192.0,0.427027,185.0,0.5,8.0
CIN,0.48954,239.0,0.507463,201.0,0.421384,159.0,0.571429,7.0
CLE,0.447059,255.0,0.356,250.0,0.410138,217.0,0.380952,21.0
DAL,0.541284,218.0,0.384615,208.0,0.340782,179.0,0.333333,12.0
DEN,0.503968,252.0,0.368182,220.0,0.358209,201.0,0.6,10.0
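The `unstack(0).transpose().unstack(0)` chain pivots the (offense, down) index into a wide table with one row per offense and (down, stat) columns. A possibly easier-to-read equivalent is to unstack `down` directly and swap the column levels. This is sketched on a made-up table. The only differences I'd expect are column order and the count dtype: the transpose route upcasts counts to float, which is why they print as 237.0.

```python
import pandas as pd

# Made-up plays with just the columns the groupby needs
toy_plays = pd.DataFrame({
    "offense":    ["ARI", "ARI", "ARI", "ATL", "ATL", "ATL"],
    "down":       [1, 1, 2, 1, 2, 2],
    "is_success": [1, 0, 1, 1, 0, 0],
})
toy_gdf = toy_plays.groupby(["offense", "down"]).is_success.agg(["mean", "count"])

# The notebook's reshape: one row per offense, (down, stat) column pairs
wide_a = toy_gdf.unstack(0).transpose().unstack(0)

# Alternative: move 'down' into the columns, then make it the outer column level
wide_b = toy_gdf.unstack("down").swaplevel(0, 1, axis=1).sort_index(axis=1)
print(wide_b)
```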


In [57]:
grouped = plays_2015[plays_2015.is_run].groupby(['offense','down'])
gdf = grouped.is_success.agg(['mean','count'])
gdf.unstack(0).transpose().unstack(0)

offense,1_mean,1_count,2_mean,2_count,3_mean,3_count,4_mean,4_count
ARI,0.373077,260.0,0.428571,133.0,0.551724,29.0,1.0,2.0
ATL,0.364407,236.0,0.431818,132.0,0.545455,33.0,1.0,4.0
BAL,0.338384,198.0,0.474074,135.0,0.5,40.0,0.428571,7.0
BUF,0.391129,248.0,0.373418,158.0,0.528302,53.0,1.0,2.0
CAR,0.354962,262.0,0.453988,163.0,0.583333,48.0,0.833333,6.0
CHI,0.325301,249.0,0.448485,165.0,0.595238,42.0,1.0,2.0
CIN,0.382883,222.0,0.402778,144.0,0.42,50.0,0.833333,6.0
CLE,0.331658,199.0,0.383929,112.0,0.590909,22.0,0.666667,3.0
DAL,0.37069,232.0,0.404762,126.0,0.535714,28.0,0.8,5.0
DEN,0.307018,228.0,0.411348,141.0,0.466667,30.0,0.5,4.0


In [58]:
gdf.loc['CAR']

down,mean,count
1,0.354962,262
2,0.453988,163
3,0.583333,48
4,0.833333,6


In [79]:
gdf.loc['CAR',['mean']]  # list of columns -> returns a DataFrame

down,mean
1,0.354962
2,0.453988
3,0.583333
4,0.833333


In [80]:
gdf.loc['CAR','mean']  # single column label -> returns a Series

down
1    0.354962
2    0.453988
3    0.583333
4    0.833333
Name: mean, dtype: float64

In [60]:
gdf.loc[('CAR',2),'mean']

0.45398773006134968

# Bones for CSR article

In [61]:
# Filter plays to the 2015 and 2016 seasons
plays_2015 = plays[plays.season == 2015]
plays_2016 = plays[plays.season == 2016]

In [62]:
# Look at overall success rate by down for each offense (pass/run splits come later)
sr_all2015 = plays_2015.groupby(['offense','down']).is_success.agg(['mean','count'])
sr_all2015 = sr_all2015.unstack(0).transpose().unstack(0)

sr_all2016 = plays_2016.groupby(['offense','down']).is_success.agg(['mean','count'])
sr_all2016 = sr_all2016.unstack(0).transpose().unstack(0)

offense_sr = pd.merge(sr_all2015, sr_all2016,
         how='outer', # to capture both STL and LAR
         left_index=True,
         right_index=True,
         suffixes=('_2015','_2016')
        )

# Add row for league averages
offense_sr.loc['league_avg'] = offense_sr.mean()
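The `how='outer'` merge matters because the Rams appear as STL in 2015 and LAR in 2016, so an inner join would drop both. This sketch uses made-up rates to show the difference. It also shows that `DataFrame.mean()` skips the resulting NaNs by default, so each code only contributes to the league average for the season it appears in.

```python
import pandas as pd

# Made-up first-down success rates; the Rams are STL in 2015 and LAR in 2016
sr_2015 = pd.DataFrame({"mean": [0.44, 0.40]},
                       index=pd.Index(["ARI", "STL"], name="offense"))
sr_2016 = pd.DataFrame({"mean": [0.45, 0.41]},
                       index=pd.Index(["ARI", "LAR"], name="offense"))

merge_kwargs = dict(left_index=True, right_index=True, suffixes=("_2015", "_2016"))
inner = pd.merge(sr_2015, sr_2016, how="inner", **merge_kwargs)
outer = pd.merge(sr_2015, sr_2016, how="outer", **merge_kwargs)

print(inner)         # only ARI, since STL and LAR never match
print(outer)         # ARI, LAR and STL, with NaN for the season a code is absent
print(outer.mean())  # NaNs are skipped, so each column averages the teams present
```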

In [63]:
offense_sr

offense,1_2015_mean,1_2015_count,2_2015_mean,2_2015_count,3_2015_mean,3_2015_count,4_2015_mean,4_2015_count,1_2016_mean,1_2016_count,2_2016_mean,2_2016_count,3_2016_mean,3_2016_count,4_2016_mean,4_2016_count
ARI,0.444668,497.0,0.461972,355.0,0.470874,206.0,0.625,8.0,0.449219,512.0,0.418848,382.0,0.424779,226.0,0.5,10.0
ATL,0.466135,502.0,0.395028,362.0,0.476395,233.0,0.571429,21.0,0.493976,498.0,0.467492,323.0,0.417989,189.0,0.533333,15.0
BAL,0.410678,487.0,0.442308,364.0,0.372385,239.0,0.48,25.0,0.422594,478.0,0.388587,368.0,0.369919,246.0,0.444444,18.0
BUF,0.439294,453.0,0.399425,348.0,0.413043,230.0,0.466667,15.0,0.439825,457.0,0.44507,355.0,0.413636,220.0,0.466667,15.0
CAR,0.436735,490.0,0.477654,358.0,0.42654,211.0,0.7,10.0,0.418219,483.0,0.369748,357.0,0.378723,235.0,0.578947,19.0
CHI,0.409692,454.0,0.42577,357.0,0.45815,227.0,0.6,10.0,0.486316,475.0,0.44582,323.0,0.4,185.0,0.307692,13.0
CIN,0.438178,461.0,0.463768,345.0,0.421053,209.0,0.692308,13.0,0.399151,471.0,0.491935,372.0,0.409091,220.0,0.777778,9.0
CLE,0.396476,454.0,0.364641,362.0,0.426778,239.0,0.416667,24.0,0.393665,442.0,0.369231,325.0,0.358744,223.0,0.684211,19.0
DAL,0.453333,450.0,0.392216,334.0,0.36715,207.0,0.470588,17.0,0.465021,486.0,0.512968,347.0,0.458763,194.0,1.0,9.0
DEN,0.410417,480.0,0.385042,361.0,0.372294,231.0,0.571429,14.0,0.416667,456.0,0.408571,350.0,0.353191,235.0,0.5625,16.0


In [64]:
sr_all2016 = plays_2016.groupby(['offense','down']).is_success.agg(['mean','count'])
sr_all2016.loc['CAR','mean']

down
1    0.418219
2    0.369748
3    0.378723
4    0.578947
Name: mean, dtype: float64

In [65]:
offense_sr.loc['CAR'][['1_2015','1_2016']]

down         
1_2015  mean       0.436735
        count    490.000000
1_2016  mean       0.418219
        count    483.000000
Name: CAR, dtype: float64

In [66]:
offense_sr['1_change','mean'] = offense_sr['1_2016','mean']-offense_sr['1_2015','mean']
offense_sr[['1_2015','1_2016','1_change']].sort_values(by=('1_change','mean'),ascending=False)

offense,1_2015_mean,1_2015_count,1_2016_mean,1_2016_count,1_change_mean
NO,0.413519,503.0,0.500971,515.0,0.087452
CHI,0.409692,454.0,0.486316,475.0,0.076624
SF,0.378571,420.0,0.440181,443.0,0.061609
HOU,0.390099,505.0,0.449679,467.0,0.05958
GB,0.398287,467.0,0.437367,471.0,0.03908
MIN,0.37415,441.0,0.404977,442.0,0.030828
IND,0.406048,463.0,0.435685,482.0,0.029637
ATL,0.466135,502.0,0.493976,498.0,0.02784
OAK,0.413483,445.0,0.440252,477.0,0.026768
WSH,0.420935,449.0,0.447257,474.0,0.026322


## A bit of analysis
Carolina's success rate on first down dipped a bit from the electrifying 2015 campaign into 2016, from 43.7% to 41.8%. Any drop in efficiency may seem like a big deal, but what the Panthers experienced was actually one of the smaller year-on-year changes in efficiency in the league. It could be more instructive to break this down into run and pass plays and see how successful the offense was at each.
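The table above is sorted by signed change, so it only shows the biggest improvements. To see how the size of Carolina's change compares, one option is to rank the absolute change. This sketch uses four teams' first-down rates copied from the tables above. On the full `offense_sr` table, you'd want to drop the `league_avg` row first.

```python
import pandas as pd

# First-down success rates for four offenses, copied from the tables above
sr = pd.DataFrame(
    {"sr_2015": [0.436735, 0.413519, 0.409692, 0.453333],
     "sr_2016": [0.418219, 0.500971, 0.486316, 0.465021]},
    index=pd.Index(["CAR", "NO", "CHI", "DAL"], name="offense"),
)

sr["change"] = sr["sr_2016"] - sr["sr_2015"]
# Rank by the size of the change, ignoring direction (1 = smallest change)
sr["abs_rank"] = sr["change"].abs().rank().astype(int)
print(sr.sort_values("abs_rank"))
```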

In [67]:
# Look at success rate on passing plays
sr_pass2015 = plays_2015[plays_2015.is_pass].groupby(['offense','down']).is_success.agg(['mean','count'])
sr_pass2015 = sr_pass2015.unstack(0).transpose().unstack(0)

sr_pass2016 = plays_2016[plays_2016.is_pass].groupby(['offense','down']).is_success.agg(['mean','count'])
sr_pass2016 = sr_pass2016.unstack(0).transpose().unstack(0)

pass_sr = pd.merge(sr_pass2015, sr_pass2016,
         how='outer',
         left_index=True,
         right_index=True,
         suffixes=('_2015','_2016')
        )

# Add row for league averages
pass_sr.loc['league_avg'] = pass_sr.mean()

# Do the same for designed runs
sr_run2015 = plays_2015[plays_2015.is_run].groupby(['offense','down']).is_success.agg(['mean','count'])
sr_run2015 = sr_run2015.unstack(0).transpose().unstack(0)

sr_run2016 = plays_2016[plays_2016.is_run].groupby(['offense','down']).is_success.agg(['mean','count'])
sr_run2016 = sr_run2016.unstack(0).transpose().unstack(0)

run_sr = pd.merge(sr_run2015, sr_run2016,
         how='outer',
         left_index=True,
         right_index=True,
         suffixes=('_2015','_2016')
        )

# Add row for league averages
run_sr.loc['league_avg'] = run_sr.mean()

In [68]:
pass_sr.loc[['CAR','league_avg']][['1_2015','1_2016']]

offense,1_2015_mean,1_2015_count,1_2016_mean,1_2016_count
CAR,0.530702,228.0,0.4875,240.0
league_avg,0.491067,246.4375,0.493068,246.71875


In [69]:
run_sr.loc[['CAR','league_avg']][['1_2015','1_2016']]

offense,1_2015_mean,1_2015_count,1_2016_mean,1_2016_count
CAR,0.354962,262.0,0.349794,243.0
league_avg,0.346678,221.96875,0.357816,220.40625


It turns out that the offense's success rate on first down dipped on pass plays (from 53.1 to 48.8 percent), but was pretty steady at around 35% on run plays. The run/pass mix also changed a little between the two seasons, though it actually shifted toward the pass: the team ran 262 qualifying runs and 228 passes on first down in 2015, versus 243 runs and 240 passes in 2016.
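The run/pass mix comes straight from the `count` columns in the two tables above. Here is a quick check of Carolina's first-down run share. Note that these counts only include plays the parser could classify and assign yardage to, and that scrambles are counted as passes.

```python
# CAR qualifying first-down plays, copied from the pass_sr and run_sr tables above
counts = {
    2015: {"run": 262, "pass": 228},
    2016: {"run": 243, "pass": 240},
}


def run_share(runs, passes):
    """Fraction of qualifying plays that were designed runs."""
    return runs / (runs + passes)


shares = {season: run_share(c["run"], c["pass"]) for season, c in counts.items()}
for season, share in shares.items():
    print(f"{season}: {share:.1%} of qualifying first-down plays were runs")
# 2015: 53.5% of qualifying first-down plays were runs
# 2016: 50.3% of qualifying first-down plays were runs
```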

So what does that mean? It's hard to make sweeping conclusions based on one statistic like this, but I think two things are important to highlight: the offense's pass efficiency dropped on first down, and the team could have tried to compensate for that by running the ball more. My guess is that these changes point to the void created by Michael Oher's season-ending concussion early in 2016 (he only played in 3 games), forcing the team to plug Mike Remmers in at LT. With the dropoff in pass protection (from both OL and the refs) and the resulting beating Cam took over the course of the season, he was unable to maintain his MVP form from 2015.

We can also do the same kind of analysis on second and third downs to see if this basic hypothesis holds up.

In [70]:
print("OVERALL SUCCESS RATE")
print(offense_sr.loc[['CAR','league_avg']][['2_2015','2_2016']])
print("\nPASSING SUCCESS RATE")
print(pass_sr.loc[['CAR','league_avg']][['2_2015','2_2016']])
print("\nRUSHING SUCCESS RATE")
print(run_sr.loc[['CAR','league_avg']][['2_2015','2_2016']])

OVERALL SUCCESS RATE
down          2_2015               2_2016           
                mean      count      mean      count
offense                                             
CAR         0.477654  358.00000  0.369748  357.00000
league_avg  0.411927  353.71875  0.419936  350.28125

PASSING SUCCESS RATE
down          2_2015               2_2016           
                mean      count      mean      count
offense                                             
CAR         0.497436  195.00000  0.383085  201.00000
league_avg  0.426706  219.09375  0.431139  216.28125

RUSHING SUCCESS RATE
down          2_2015             2_2016       
                mean    count      mean  count
offense                                       
CAR         0.453988  163.000  0.352564  156.0
league_avg  0.387537  134.625  0.402165  134.0


Second down tells quite a story about the difference between the 2015 and 2016 offenses. The 2015 offense found success on nearly half of its second downs (47.8%), well above the league average of 41.2%. The 2016 offense struggled on second down, with just a 37% success rate. That's a drop of nearly 11 percentage points! Breaking that down into run and pass plays, both saw precipitous drops in efficiency from 2015 to 2016. Now, there are potential reasons for some of this decline beyond "the offense was bad," like the offense facing more yards to gain on the average second down. But given the relatively steady first-down success rate between 2015 and 2016, there's more going on here.
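It's worth being explicit about units when comparing rates. The difference between two percentages is measured in percentage points, which is not the same as the relative (percent) change. Using Carolina's overall second-down success rates from the tables above:

```python
# CAR overall second-down success rates, copied from the tables above
sr_2015 = 0.477654
sr_2016 = 0.369748

point_drop = (sr_2015 - sr_2016) * 100               # difference in percentage points
relative_drop = (sr_2015 - sr_2016) / sr_2015 * 100  # decline as a percent of the 2015 rate

print(f"{point_drop:.1f} percentage points")      # 10.8 percentage points
print(f"{relative_drop:.1f}% relative decline")   # 22.6% relative decline
```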

In [71]:
print("OVERALL SUCCESS RATE")
print(offense_sr.loc[['CAR','league_avg']][['3_2015','3_2016']])
print("\nPASSING SUCCESS RATE")
print(pass_sr.loc[['CAR','league_avg']][['3_2015','3_2016']])
print("\nRUSHING SUCCESS RATE")
print(run_sr.loc[['CAR','league_avg']][['3_2015','3_2016']])

OVERALL SUCCESS RATE
down          3_2015               3_2016        
                mean      count      mean   count
offense                                          
CAR         0.426540  211.00000  0.378723  235.00
league_avg  0.401065  223.78125  0.403243  220.25

PASSING SUCCESS RATE
down          3_2015              3_2016         
                mean     count      mean    count
offense                                          
CAR         0.380368  163.0000  0.347150  193.000
league_avg  0.378073  188.9375  0.378962  186.625

RUSHING SUCCESS RATE
down          3_2015              3_2016        
                mean     count      mean   count
offense                                         
CAR         0.583333  48.00000  0.523810  42.000
league_avg  0.526762  34.84375  0.532336  33.625


Moving on to third down, the efficiency dropoff between 2015 and 2016 is again pretty obvious, with the offense going from above to below average in passing, rushing, and overall success rate. The difference here is less stark than on second down, but it is still far larger than the dropoff in first-down success rate.

This could be partially a result of the 2016 offense's poor production on second down, as the offense faces a longer yardage situation on average. Let's see if the offense does in fact face longer yardage on third down.

In [78]:
# plays_2015/plays_2016 are slices of `plays`, so take explicit copies before
# changing columns (avoids SettingWithCopyWarning)
plays_2015 = plays_2015.copy()
plays_2016 = plays_2016.copy()

# yds_gained and yds_to_go are stored as object columns; convert them to numbers
for df in (plays_2015, plays_2016):
    df['yds_gained'] = pd.to_numeric(df['yds_gained'])
    df['yds_to_go'] = pd.to_numeric(df['yds_to_go'])

# Average yards gained and yards to go by down for Carolina
ytg_cols = ['yds_gained', 'yds_to_go']
ytg_2015 = plays_2015.groupby(['offense', 'down'])[ytg_cols].mean()
print(ytg_2015.loc['CAR'])

ytg_2016 = plays_2016.groupby(['offense', 'down'])[ytg_cols].mean()
ytg_2016.loc['CAR']



             yds_gained yds_to_go
                   mean      mean
offense down                     
CAR     1.0    5.763265  9.853061
        2.0    5.832402  7.765363
        3.0    5.545024  7.203791
        4.0    8.200000  2.000000




offense,down,yds_gained_mean,yds_to_go_mean
CAR,1.0,6.161491,9.917184
CAR,2.0,4.787115,8.103641
CAR,3.0,5.638298,7.455319
CAR,4.0,4.105263,5.947368


So it appears that the offense did face slightly longer third downs on average in 2016 (7.5 yards to go versus 7.2), but the difference is not as large as might be expected. Simple averages leave out a lot of information about the distribution of situations the offense found itself in on third down. Still, based on the information here, it doesn't look like falling behind on down and distance can take much of the blame for the offense's struggles on third down in 2016. In fact, the 2016 offense gained more yards per play on first down than the 2015 offense did (6.2 versus 5.8). It's really on second down that the 2016 offense looks weak again, gaining less than 4.8 yards per play versus 5.8 in 2015.
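Averages can hide a shift in the mix of short and long third downs. One way to look at the distribution is to bucket `yds_to_go` and compare the share of third downs in each bucket by season. This sketch uses made-up distances, and the bucket edges (1-3, 4-6, 7+) are an arbitrary choice. With the real data, you'd first filter `plays_2015`/`plays_2016` to `offense == 'CAR'` and `down == 3`.

```python
import pandas as pd

# Made-up third-down distances for two seasons
third_downs = pd.DataFrame({
    "season":    [2015] * 5 + [2016] * 5,
    "yds_to_go": [1, 3, 5, 8, 12, 2, 6, 7, 10, 15],
})

# Bucket distances: (0, 3], (3, 6], (6, 99]
bins = [0, 3, 6, 99]
labels = ["short (1-3)", "medium (4-6)", "long (7+)"]
third_downs["bucket"] = pd.cut(third_downs["yds_to_go"], bins=bins, labels=labels)

# Share of each season's third downs that fall in each bucket (rows sum to 1)
dist_table = pd.crosstab(third_downs["season"], third_downs["bucket"], normalize="index")
print(dist_table)
```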

## A bit of analysis
The numbers here support what we all saw last season. It would be difficult to maintain the kind of success the offense saw in Cam's MVP season, but you could be forgiven for thinking the offense should still be a top unit in 2016. Some regression is to be expected, but the offense's success rates dropped from some of the best in the league in 2015 to well below average in 2016. A lot of that has to do with the offense's performance on second down, and I want to take a moment to tell you why it was all about Cam.

In 2015, the defense really had to respect Cam's legs on any given play, and that affected how defensive ends could attack off the edge. Even if the play gets diagnosed as a pass, the end can't really risk losing contain or else Cam could take off for who-knows-how-many yards. That makes an OT's life easier (Stephen White did a great breakdown on how the threat of Cam running helped Michael Oher in particular), and an average OT is really all an offense needs to be successful. At the same time, the whole O-line was relatively healthy in 2015, and all of that playing time meant that the group really got on the same page so that everyone knew exactly what they needed to do on any given play. Schematically, the OL has an easier job protecting the QB, which means more time for routes to develop downfield. Longer routes can stretch the defense vertically as well as horizontally, creating more space in between zones and leaving receivers one-on-one underneath. You don't need Aaron Rodgers at QB if somebody's always open.

On running plays, the offense runs enough of the option that the defense has to account for Cam as a runner. That makes it tougher for the defense to key on specific reads, makes the linebackers flow just a tick slower, and makes it easier for the OL to make space for a healthy Jonathan Stewart. Norwell, Kalil, and Turner are already pretty good, and this confluence of factors made them look like the best interior OL in football.

This whole situation is predicated on a quarterback that can take off and beat you with his legs. The dual-threat QB makes everyone around him better.

Now, fast forward to 2016. It's week 1 and Denver's defense is teeing off on Cam. There are at least a couple of unnecessary roughness penalties on Cam that get missed, but even if they'd been called, the damage is done. Cam gets knocked around and Ace Boogie ain't dancing quite the same way. Then, within a few weeks, Michael Oher and Ryan Kalil are lost for the season. That entirely serviceable LT gets replaced by Mike Remmers, and the interior OL has to rebuild all of the understanding it developed in 2015 with a new center. Cam's not running as much or as well, so the defense doesn't have to respect the option or the scramble to the same degree. With just a few injuries, the whole temple of Cam Newton falls apart. The offense looks pedestrian, and nowhere is that more evident than on second down. Imagine you've gotten a few yards on first down and you're ahead of the sticks, so you could do anything, except now you've got worse protection for a less mobile QB. The edge rushers don't have to worry so much about containing the passer, and you can't count on having time to stretch the defense, so the offense can't attack with the same ideas that had been so successful in 2015. So Shula and the offensive staff had to adjust the team's offensive philosophy mid-season. They had to turn a passing game built on play-action and deep shots (designed to exploit a defense's respect for Cam Newton's legs) into something else.
The result was the disappointing season we all remember.

## Remind me, why is any of this relevant now?
Any strategic decision requires context to make sense. This past offseason marked a big departure for the Carolina offense, and to really understand why that strategic shift occurred, you have to understand what came before. In a way, all of the big (football) personnel decisions that were made this offseason look like direct responses to what happened in 2016.

#### Big problem: 
- Cam got beat up and played through injury all season.

#### Actions taken:
- Sign Matt Kalil to play LT
- Draft Taylor Moton
- Try to limit hits on Cam by having him run less
- Try to make the quick passing game more effective

#### Big problem:
- Skill personnel don't particularly fit a quick passing attack.

#### Actions taken:
- Draft Christian McCaffrey
- Draft Curtis Samuel
- Focus on quick-twitch speed guys (Damiere Byrd, Kaelin Clay, and notably not YA BOI Sunshine) for depth at WR

So through 6 weeks of the season, the Panthers are indeed trying some new (and some old) things on offense, with mixed success. As is typical for talented individuals or groups, there are flashes of greatness, but there are also injuries that are being worked around, obvious duds, and other kinks that need to be worked out. That's the nature of evolution. That's what Rivera, Shula, and co. say they're trying to do, and after seeing a few games I mostly believe them. Such an evolution was always going to take a little bit of time, and with Cam's practice reps being as limited as they have been following his offseason shoulder surgery, the transition has been a bit slower than we all wanted.

Whether this kind of strategic shift was a good idea or not, the offense is still figuring out who and what it is, and I will hold off on doing a deep dive into the numbers for now. The armchair analysis will be coming, but I think Shula and the offense deserve some time to perfect their new look before I criticize it too much.