In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("darkgrid")

Note that we have 7 files from the competition hosts:
- Game Data: games.csv, the data for each game (season, date and time, location, home and visitor teams); key variable is `gameId`.
- PFF Scouting Data: PFFScoutingData.csv, "play-level scouting information for each game"; key variables are `gameId` and `playId` (note that `nflId` is not included). Contains information about kick type, kick direction, and air time.
- Player Data: players.csv, information for each player (height, weight, birth date, college, position, name); key variable is `nflId` (does not include `gameId` or `playId`).
- Play Data: plays.csv, "play-level information from each game"; key variables are `gameId` and `playId`. Contains game timing information (quarter, game clock), the type of play, and the play result. `kickerId` is the `nflId` of the kicker.
- Tracking Data: tracking2018.csv, tracking2019.csv, and tracking2020.csv. Each contains "player tracking data" for the indicated season; key variables are `gameId`, `playId`, and `nflId`. Records the on-field position of each player, and of the football, for every special teams play in each game.
We also have a weather dataset from ThomasJBliss.
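A minimal sketch of how these keys link the tables, using tiny hypothetical stand-ins for plays.csv, games.csv, and players.csv (the values and the `homeTeamAbbr` column are illustrative assumptions, not read from the files). `validate='many_to_one'` makes pandas raise an error if a key on the right-hand side is unexpectedly duplicated.

```python
import pandas as pd

# Hypothetical toy versions of plays.csv, games.csv and players.csv
plays = pd.DataFrame({'gameId': [1, 1, 2], 'playId': [10, 20, 10], 'kickerId': [100, 100, 200]})
games = pd.DataFrame({'gameId': [1, 2], 'homeTeamAbbr': ['KC', 'NO']})
players = pd.DataFrame({'nflId': [100, 200], 'displayName': ['Kicker A', 'Kicker B']})

# Many plays per game: join on gameId and check that gameId is unique in games
plays_games = plays.merge(games, on='gameId', how='left', validate='many_to_one')

# kickerId in plays is an nflId, so join it to players.nflId
plays_full = plays_games.merge(players, left_on='kickerId', right_on='nflId',
                               how='left', validate='many_to_one')

print(plays_full[['gameId', 'playId', 'homeTeamAbbr', 'displayName']])
```

Tracking rows would join to plays on the pair (`gameId`, `playId`), and to players on `nflId`.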

In [2]:
DATA_DIR = "/Users/Edmonds.110/Desktop/Python/EI/nfl-big-data-bowl-2022/"

games = pd.read_csv(DATA_DIR + "games.csv")
scout = pd.read_csv(DATA_DIR + "PFFScoutingData.csv")
players = pd.read_csv(DATA_DIR + "players.csv")
play = pd.read_csv(DATA_DIR + "plays.csv")

track18 = pd.read_csv(DATA_DIR + "tracking2018.csv")
track19 = pd.read_csv(DATA_DIR + "tracking2019.csv")
track20 = pd.read_csv(DATA_DIR + "tracking2020.csv")

# Weather data
g_weather = pd.read_csv(DATA_DIR + "games_weather.csv")
game_ident = pd.read_csv(DATA_DIR + "games_w.csv")
stadium_ident = pd.read_csv(DATA_DIR + "stadium_coordinates.csv")

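The weather data comes in three files (weather readings, game identifiers, and stadium coordinates). We merge them into one table and then split it into three datasets, one per season, to match the tracking files.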

In [3]:
def get_weather_data():
    # Uses the g_weather, game_ident and stadium_ident DataFrames loaded above

    # Merge game and weather data on game_id
    g_weather_merge = pd.merge(g_weather, game_ident, on='game_id')

    # Merge stadium data on StadiumName
    final_df = pd.merge(g_weather_merge, stadium_ident, on='StadiumName')

    # Convert time columns to datetime objects
    time_cols = ['TimeMeasure', 'TimeStartGame', 'TimeEndGame']

    for col in time_cols:
        final_df[col] = pd.to_datetime(final_df[col], format='%m/%d/%Y %H:%M')

    # Split by NFL season rather than calendar year: late-season games
    # (e.g. the final week of the 2020 season, played in January 2021) belong to the earlier season
    weather2018 = final_df[final_df['Season'] == 2018]
    weather2019 = final_df[final_df['Season'] == 2019]
    weather2020 = final_df[final_df['Season'] == 2020]

    return weather2018, weather2019, weather2020

In [4]:
weather2018, weather2019, weather2020 = get_weather_data()

In [5]:
weather2018

Unnamed: 0,game_id,Source,DistanceToStation,TimeMeasure,Temperature,DewPoint,Humidity,Precipitation,WindSpeed,WindDirection,...,Season,StadiumName,TimeStartGame,TimeEndGame,TZOffset,HomeTeam,RoofType,Longitude,Latitude,StadiumAzimuthAngle
3099,2018091608,Meteostat,3.98,2018-09-16 12:00:00,75.56,68.90,80.0,0.0,10.31,90.0,...,2018,FedExField,2018-09-16 13:00:00,2018-09-16 15:53:00,-4,WAS,Outdoor,-76.864444,38.907778,295.0
3100,2018091608,Meteostat,3.98,2018-09-16 13:00:00,75.92,68.54,78.0,0.0,8.08,40.0,...,2018,FedExField,2018-09-16 13:00:00,2018-09-16 15:53:00,-4,WAS,Outdoor,-76.864444,38.907778,295.0
3101,2018091608,Meteostat,3.98,2018-09-16 14:00:00,77.72,69.62,76.0,0.0,9.20,30.0,...,2018,FedExField,2018-09-16 13:00:00,2018-09-16 15:53:00,-4,WAS,Outdoor,-76.864444,38.907778,295.0
3102,2018091608,Meteostat,3.98,2018-09-16 15:00:00,78.80,69.44,73.0,0.0,9.20,70.0,...,2018,FedExField,2018-09-16 13:00:00,2018-09-16 15:53:00,-4,WAS,Outdoor,-76.864444,38.907778,295.0
3103,2018091608,Meteostat,3.98,2018-09-16 16:00:00,80.24,69.62,70.0,0.0,12.74,100.0,...,2018,FedExField,2018-09-16 13:00:00,2018-09-16 15:53:00,-4,WAS,Outdoor,-76.864444,38.907778,295.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
39067,2018121600,Meteostat,4.47,2018-12-16 14:00:00,51.98,46.94,83.0,0.0,6.96,310.0,...,2018,Mercedes-Benz Stadium,2018-12-16 13:00:00,2018-12-16 16:07:00,-5,ATL,Retractable,-84.400000,33.755556,70.9
39068,2018121600,Meteostat,4.47,2018-12-16 15:00:00,53.06,44.96,74.0,0.0,5.84,310.0,...,2018,Mercedes-Benz Stadium,2018-12-16 13:00:00,2018-12-16 16:07:00,-5,ATL,Retractable,-84.400000,33.755556,70.9
39069,2018121600,Meteostat,4.47,2018-12-16 16:00:00,53.06,46.04,77.0,0.0,9.20,310.0,...,2018,Mercedes-Benz Stadium,2018-12-16 13:00:00,2018-12-16 16:07:00,-5,ATL,Retractable,-84.400000,33.755556,70.9
39070,2018121600,Meteostat,4.47,2018-12-16 17:00:00,53.06,46.04,77.0,0.0,14.98,310.0,...,2018,Mercedes-Benz Stadium,2018-12-16 13:00:00,2018-12-16 16:07:00,-5,ATL,Retractable,-84.400000,33.755556,70.9
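Filtering on the calendar year of `TimeMeasure` would leave out late-season games played in January (the 2020 season's final week was played in January 2021). A minimal sketch of splitting on the `Season` column instead, returning a dict keyed by season; the helper name `split_by_season` and the toy rows are hypothetical.

```python
import pandas as pd

def split_by_season(weather: pd.DataFrame) -> dict:
    """Return {season: rows for that season}, grouping on the Season column."""
    return {season: group.reset_index(drop=True)
            for season, group in weather.groupby('Season')}

# Hypothetical hourly readings: the January 2021 game belongs to the 2020 season
toy = pd.DataFrame({
    'game_id': [1, 2, 3],
    'Season': [2019, 2020, 2020],
    'TimeMeasure': pd.to_datetime(['2019-12-29 13:00', '2020-12-27 13:00', '2021-01-03 13:00']),
})

by_season = split_by_season(toy)
print({season: len(rows) for season, rows in by_season.items()})

# Calendar-year filtering keeps only one of the two 2020-season rows
print((toy['TimeMeasure'].dt.year == 2020).sum())
```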


We need to standardize the height of all players. Heights are stored either as feet-inches strings (e.g. `6-2`) or as plain inches, and inches are easier to code with. So we first define a function `ft_in` that converts feet-inches strings to inches and casts plain inch values to `int`, and then apply it to the `height` column of our dataframe.

In [6]:
def ft_in(x):
    if '-' in x:
        meas=x.split('-')
        #this will be a list ['ft','in']
        inches = int(meas[0])*12 + int(meas[1])
        return inches
    else:
        return int(x)

In [7]:
players['height'] = players['height'].apply(ft_in)
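The same conversion can also be written without a Python-level `apply`, using a regular expression with `Series.str.extract`. A minimal sketch with a hypothetical `height_to_inches` helper, assuming heights are either feet-inches strings such as `6-2` or plain inch values. Unlike `ft_in`, anything else becomes missing (`<NA>`) instead of raising an error.

```python
import pandas as pd

def height_to_inches(heights: pd.Series) -> pd.Series:
    """Vectorized feet-inches -> inches conversion (sketch)."""
    s = heights.astype(str).str.strip()
    # Capture feet and inches from strings like '6-2'; non-matching rows get NaN
    parts = s.str.extract(r'^(\d+)-(\d+)$')
    from_feet = pd.to_numeric(parts[0]) * 12 + pd.to_numeric(parts[1])
    # Rows without a feet-inches match are treated as plain inches
    plain = pd.to_numeric(s.where(parts[0].isna()), errors='coerce')
    return from_feet.fillna(plain).astype('Int64')

print(height_to_inches(pd.Series(['6-2', '73', '5-11'])).tolist())  # [74, 73, 71]
```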

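Now we turn to the tracking data. Plays run toward either end zone, so we standardize the coordinates so that every play moves in the same direction (left to right). For plays with `playDirection == 'left'`, x becomes 120 - x and y becomes 160/3 - y. This rotates the field 180° about its center, which is equivalent to moving the origin from the bottom-left corner to the top-right corner.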

In [8]:
# Field dimensions in yards: 120 yards long including both end zones,
# and 160 ft (= 160/3, about 53.3 yards) wide
FIELD_LENGTH = 120
FIELD_WIDTH = 160 / 3

# Flip the coordinates of left-moving plays in the 2018, 2019 and 2020 tracking data (in place)
for track in (track18, track19, track20):
    left = track['playDirection'] == 'left'
    track.loc[left, 'x'] = FIELD_LENGTH - track.loc[left, 'x']
    track.loc[left, 'y'] = FIELD_WIDTH - track.loc[left, 'y']
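If the orientation (`o`) and direction-of-motion (`dir`) angles are used later, they need the same 180° rotation, which for an angle in degrees means adding 180 modulo 360. A minimal sketch of a non-mutating helper that does both, assuming the tracking files carry `o` and `dir` columns in degrees (an assumption to check); the helper name `standardize_direction` and the toy rows are hypothetical.

```python
import pandas as pd

FIELD_LENGTH = 120.0   # yards, end line to end line
FIELD_WIDTH = 160 / 3  # 160 ft expressed in yards

def standardize_direction(track: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where every play moves left to right (180° rotation for 'left' plays)."""
    out = track.copy()
    left = out['playDirection'] == 'left'
    out.loc[left, 'x'] = FIELD_LENGTH - out.loc[left, 'x']
    out.loc[left, 'y'] = FIELD_WIDTH - out.loc[left, 'y']
    # A 180° rotation turns every heading around: add 180 degrees, wrap into [0, 360)
    for angle in ('o', 'dir'):
        if angle in out.columns:
            out.loc[left, angle] = (out.loc[left, angle] + 180) % 360
    return out

toy = pd.DataFrame({'playDirection': ['left', 'right'], 'x': [100.0, 100.0],
                    'y': [10.0, 10.0], 'o': [90.0, 90.0], 'dir': [270.0, 270.0]})
print(standardize_direction(toy))
```

Returning a copy leaves the raw frame untouched, which makes it harder to flip the same rows twice by re-running a cell.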


We are specifically looking at extra points in this notebook, so we keep only the plays whose `specialTeamsPlayType` is `'Extra Point'`.

In [9]:
#extraPoint
play_extrapoint = play.loc[play['specialTeamsPlayType']=='Extra Point']

In [10]:
play_extrapoint.value_counts('specialTeamsResult')

specialTeamsResult
Kick Attempt Good           3252
Kick Attempt No Good         199
Blocked Kick Attempt          24
Non-Special Teams Result      13
dtype: int64

In [11]:
play_extrapoint.value_counts('kickReturnYardage')

Series([], dtype: int64)

In [12]:
play_extrapoint.value_counts('passResult')

passResult
I    4
dtype: int64

Four of the 13 Non-Special Teams Results are incomplete passes (`passResult` = `I`).

In [13]:
play_extrapoint.value_counts('yardlineNumber')

yardlineNumber
15    3438
20      29
10       8
25       6
30       5
7        2
dtype: int64

In [14]:
play_extrapoint.value_counts('penaltyYards')

penaltyYards
 15.0    31
 5.0     28
-15.0     3
 0.0      3
dtype: int64

In [15]:
play_extrapoint[play_extrapoint['penaltyYards']==0]

Unnamed: 0,gameId,playId,playDescription,quarter,down,yardsToGo,possessionTeam,specialTeamsPlayType,specialTeamsResult,kickerId,...,penaltyCodes,penaltyJerseyNumbers,penaltyYards,preSnapHomeScore,preSnapVisitorScore,passResult,kickLength,kickReturnYardage,playResult,absoluteYardlineNumber
3751,2018111104,1392,"J.Lambo extra point is Blocked (D.Autry), Cent...",2,0,0,JAX,Extra Point,Blocked Kick Attempt,43068.0,...,LBL,IND 36,0.0,21,13,,,,0,95
15764,2020102500,4362,"M.Prater extra point is GOOD, Center-D.Muhlbac...",4,0,0,DET,Extra Point,Kick Attempt Good,31446.0,...,ILF,ATL,0.0,22,22,,,,0,40
16406,2020110112,4733,"B.McManus extra point is GOOD, Center-J.Bobenm...",4,0,0,DEN,Extra Point,Kick Attempt Good,40276.0,...,ILF,LAC,0.0,30,30,,,,0,25


'LBL' = low block and 'ILF' = illegal formation

In [16]:
play_extrapoint[play_extrapoint['penaltyCodes'].isin(['LBL', 'ILF'])]

Unnamed: 0,gameId,playId,playDescription,quarter,down,yardsToGo,possessionTeam,specialTeamsPlayType,specialTeamsResult,kickerId,...,penaltyCodes,penaltyJerseyNumbers,penaltyYards,preSnapHomeScore,preSnapVisitorScore,passResult,kickLength,kickReturnYardage,playResult,absoluteYardlineNumber
1543,2018093009,3479,"M.McCrane extra point is GOOD, Center-T.Sieg, ...",4,0,0,OAK,Extra Point,Kick Attempt Good,46663.0,...,ILF,CLE 65,5.0,30,28,,,,0,25
2686,2018102106,1236,"J.Sanders extra point is GOOD, Center-J.Denney...",2,0,0,MIA,Extra Point,Kick Attempt Good,46298.0,...,ILF,DET 91,5.0,6,10,,,,0,25
3751,2018111104,1392,"J.Lambo extra point is Blocked (D.Autry), Cent...",2,0,0,JAX,Extra Point,Blocked Kick Attempt,43068.0,...,LBL,IND 36,0.0,21,13,,,,0,95
5252,2018120902,991,"C.Catanzaro extra point is GOOD, Center-J.Jans...",2,0,0,CAR,Extra Point,Kick Attempt Good,41736.0,...,ILF,CLE 90,5.0,7,13,,,,0,95
15764,2020102500,4362,"M.Prater extra point is GOOD, Center-D.Muhlbac...",4,0,0,DET,Extra Point,Kick Attempt Good,31446.0,...,ILF,ATL,0.0,22,22,,,,0,40
16406,2020110112,4733,"B.McManus extra point is GOOD, Center-J.Bobenm...",4,0,0,DEN,Extra Point,Kick Attempt Good,40276.0,...,ILF,LAC,0.0,30,30,,,,0,25
18922,2020122006,3769,"A.Rosas extra point is GOOD, Center-R.Matiscik...",4,0,0,JAX,Extra Point,Kick Attempt Good,43937.0,...,ILF,BAL 98,5.0,40,13,,,,0,95
19889,2021010312,3758,"S.Sloman extra point is GOOD, Center-M.Overton...",4,0,0,TEN,Extra Point,Kick Attempt Good,52656.0,...,ILF,HOU,5.0,35,37,,,,0,25
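An exact comparison against `'LBL'` or `'ILF'` only matches plays whose `penaltyCodes` value is that single code. If a play with several penalties stores them in one semicolon-separated string (an assumption worth checking against the raw data, e.g. `'ILF;UNR'`), an exact match would miss it. A minimal sketch that splits the codes first; `PENALTY_NAMES`, `plays_with_penalty`, and the toy rows are hypothetical.

```python
import pandas as pd

PENALTY_NAMES = {'LBL': 'low block', 'ILF': 'illegal formation'}

def plays_with_penalty(plays: pd.DataFrame, codes) -> pd.DataFrame:
    """Rows whose penaltyCodes contain any of `codes` (semicolon-separated lists assumed)."""
    wanted = set(codes)
    # Missing codes become '' so that every row yields a list of strings
    code_lists = plays['penaltyCodes'].fillna('').str.split(';')
    mask = code_lists.apply(lambda cs: any(c.strip() in wanted for c in cs))
    return plays[mask]

toy = pd.DataFrame({'playId': [1, 2, 3, 4], 'penaltyCodes': ['LBL', 'ILF;UNR', None, 'DOF']})
print(plays_with_penalty(toy, PENALTY_NAMES)['playId'].tolist())  # [1, 2]
```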


Remove columns that are entirely null or constant across extra point plays, e.g. `kickReturnYardage` is always null and `yardsToGo` is always `0`.

Note that `playDescription` is free text and should not be used directly in the analysis; we keep it only for reference later.
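Instead of listing these columns by hand, candidates can be found programmatically. In the sketch below, `nunique(dropna=False)` counts missing values as a value, so an all-null column and a column that is always `0` both have exactly one distinct value. It is a minimal sketch with a hypothetical `constant_columns` helper and toy rows; `keep` lists columns to retain regardless, and the output should be compared with the hand-picked list before dropping anything.

```python
import numpy as np
import pandas as pd

def constant_columns(df: pd.DataFrame, keep=()) -> list:
    """Columns with at most one distinct value (NaN counted as a value)."""
    return [col for col in df.columns
            if col not in keep and df[col].nunique(dropna=False) <= 1]

toy = pd.DataFrame({
    'yardsToGo': [0, 0, 0],
    'kickReturnYardage': [np.nan, np.nan, np.nan],
    'quarter': [1, 2, 4],
    'specialTeamsPlayType': ['Extra Point'] * 3,
})
print(constant_columns(toy, keep=('specialTeamsPlayType',)))  # ['yardsToGo', 'kickReturnYardage']
```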

In [17]:
ep = play_extrapoint.drop(columns =['kickReturnYardage', 'kickLength', 'playResult', 'returnerId', 'yardsToGo', 'down', 'specialTeamsPlayType'])


In [18]:
ep

Unnamed: 0,gameId,playId,playDescription,quarter,possessionTeam,specialTeamsResult,kickerId,kickBlockerId,yardlineSide,yardlineNumber,gameClock,penaltyCodes,penaltyJerseyNumbers,penaltyYards,preSnapHomeScore,preSnapVisitorScore,passResult,absoluteYardlineNumber
15,2018090600,2883,"J.Elliott extra point is GOOD, Center-R.Lovato...",3,PHI,Kick Attempt Good,44966.0,,ATL,15,04:37:00,,,,9,6,,25
19,2018090600,3553,"M.Bryant extra point is No Good, Hit Right Upr...",4,ATL,Kick Attempt No Good,27091.0,,PHI,15,09:48:00,,,,10,12,,25
25,2018090900,380,"J.Tucker extra point is GOOD, Center-M.Cox, Ho...",1,BAL,Kick Attempt Good,39470.0,,BUF,15,08:42:00,,,,6,0,,95
30,2018090900,972,"J.Tucker extra point is GOOD, Center-M.Cox, Ho...",1,BAL,Kick Attempt Good,39470.0,,BUF,15,01:32:00,,,,13,0,,95
44,2018090900,2757,"J.Tucker extra point is GOOD, Center-M.Cox, Ho...",3,BAL,Kick Attempt Good,39470.0,,BUF,15,12:28:00,,,,32,0,,25
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19968,2021010315,2813,"T.Vizcaino extra point is GOOD, Center-C.Holba...",4,SF,Kick Attempt Good,47590.0,,SEA,15,14:22:00,,,,15,6,,95
19970,2021010315,3074,"J.Myers extra point is No Good, Wide Left, Cen...",4,SEA,Kick Attempt No Good,41175.0,,SF,15,10:54:00,,,,16,12,,25
19973,2021010315,3667,"J.Myers extra point is GOOD, Center-T.Ott, Hol...",4,SEA,Kick Attempt Good,41175.0,,SF,15,02:20:00,,,,16,18,,25
19975,2021010315,3870,"J.Myers extra point is GOOD, Center-T.Ott, Hol...",4,SEA,Kick Attempt Good,41175.0,,SF,15,01:49:00,,,,16,25,,25


In [19]:
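# Attach kicker information (height, weight, position, name) by matching kickerId to nflId;
# ep shares no column names with these player columns, so no suffixes are needed
ep_play = pd.merge(ep, players[['nflId', 'height', 'weight', 'Position', 'displayName']], how='left',
                   left_on='kickerId', right_on='nflId')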
ep_play

Unnamed: 0,gameId,playId,playDescription,quarter,possessionTeam,specialTeamsResult,kickerId,kickBlockerId,yardlineSide,yardlineNumber,...,penaltyYards,preSnapHomeScore,preSnapVisitorScore,passResult,absoluteYardlineNumber,nflId,height,weight,Position,displayName
0,2018090600,2883,"J.Elliott extra point is GOOD, Center-R.Lovato...",3,PHI,Kick Attempt Good,44966.0,,ATL,15,...,,9,6,,25,44966.0,69.0,167.0,K,Jake Elliott
1,2018090600,3553,"M.Bryant extra point is No Good, Hit Right Upr...",4,ATL,Kick Attempt No Good,27091.0,,PHI,15,...,,10,12,,25,27091.0,69.0,203.0,K,Matt Bryant
2,2018090900,380,"J.Tucker extra point is GOOD, Center-M.Cox, Ho...",1,BAL,Kick Attempt Good,39470.0,,BUF,15,...,,6,0,,95,39470.0,73.0,183.0,K,Justin Tucker
3,2018090900,972,"J.Tucker extra point is GOOD, Center-M.Cox, Ho...",1,BAL,Kick Attempt Good,39470.0,,BUF,15,...,,13,0,,95,39470.0,73.0,183.0,K,Justin Tucker
4,2018090900,2757,"J.Tucker extra point is GOOD, Center-M.Cox, Ho...",3,BAL,Kick Attempt Good,39470.0,,BUF,15,...,,32,0,,25,39470.0,73.0,183.0,K,Justin Tucker
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3483,2021010315,2813,"T.Vizcaino extra point is GOOD, Center-C.Holba...",4,SF,Kick Attempt Good,47590.0,,SEA,15,...,,15,6,,95,47590.0,74.0,205.0,K,Tristan Vizcaino
3484,2021010315,3074,"J.Myers extra point is No Good, Wide Left, Cen...",4,SEA,Kick Attempt No Good,41175.0,,SF,15,...,,16,12,,25,41175.0,70.0,190.0,K,Jason Myers
3485,2021010315,3667,"J.Myers extra point is GOOD, Center-T.Ott, Hol...",4,SEA,Kick Attempt Good,41175.0,,SF,15,...,,16,18,,25,41175.0,70.0,190.0,K,Jason Myers
3486,2021010315,3870,"J.Myers extra point is GOOD, Center-T.Ott, Hol...",4,SEA,Kick Attempt Good,41175.0,,SF,15,...,,16,25,,25,41175.0,70.0,190.0,K,Jason Myers


In [20]:
ep_play.value_counts('Position')

Position
K    3466
P       9
dtype: int64

Rename the player columns to mark them as kicker information, then drop the redundant `nflId` column (it duplicates `kickerId`).

In [21]:
ep_plays=ep_play.rename(columns = {"height": 'kicker_height', "weight": 'kicker_weight', "Position": 'kicker_position', "displayName": 'kicker_name'})

ep_plays=ep_plays.drop(columns=['nflId'])


In [22]:
ep_plays

Unnamed: 0,gameId,playId,playDescription,quarter,possessionTeam,specialTeamsResult,kickerId,kickBlockerId,yardlineSide,yardlineNumber,...,penaltyJerseyNumbers,penaltyYards,preSnapHomeScore,preSnapVisitorScore,passResult,absoluteYardlineNumber,kicker_height,kicker_weight,kicker_position,kicker_name
0,2018090600,2883,"J.Elliott extra point is GOOD, Center-R.Lovato...",3,PHI,Kick Attempt Good,44966.0,,ATL,15,...,,,9,6,,25,69.0,167.0,K,Jake Elliott
1,2018090600,3553,"M.Bryant extra point is No Good, Hit Right Upr...",4,ATL,Kick Attempt No Good,27091.0,,PHI,15,...,,,10,12,,25,69.0,203.0,K,Matt Bryant
2,2018090900,380,"J.Tucker extra point is GOOD, Center-M.Cox, Ho...",1,BAL,Kick Attempt Good,39470.0,,BUF,15,...,,,6,0,,95,73.0,183.0,K,Justin Tucker
3,2018090900,972,"J.Tucker extra point is GOOD, Center-M.Cox, Ho...",1,BAL,Kick Attempt Good,39470.0,,BUF,15,...,,,13,0,,95,73.0,183.0,K,Justin Tucker
4,2018090900,2757,"J.Tucker extra point is GOOD, Center-M.Cox, Ho...",3,BAL,Kick Attempt Good,39470.0,,BUF,15,...,,,32,0,,25,73.0,183.0,K,Justin Tucker
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3483,2021010315,2813,"T.Vizcaino extra point is GOOD, Center-C.Holba...",4,SF,Kick Attempt Good,47590.0,,SEA,15,...,,,15,6,,95,74.0,205.0,K,Tristan Vizcaino
3484,2021010315,3074,"J.Myers extra point is No Good, Wide Left, Cen...",4,SEA,Kick Attempt No Good,41175.0,,SF,15,...,,,16,12,,25,70.0,190.0,K,Jason Myers
3485,2021010315,3667,"J.Myers extra point is GOOD, Center-T.Ott, Hol...",4,SEA,Kick Attempt Good,41175.0,,SF,15,...,,,16,18,,25,70.0,190.0,K,Jason Myers
3486,2021010315,3870,"J.Myers extra point is GOOD, Center-T.Ott, Hol...",4,SEA,Kick Attempt Good,41175.0,,SF,15,...,,,16,25,,25,70.0,190.0,K,Jason Myers


Now we add the same player information for the kick blockers, matching `kickBlockerId` to `nflId`.

In [23]:
ep_full = pd.merge(ep_plays, players[['nflId', 'height', 'weight','Position', 'displayName']], how = 'left',
             left_on = 'kickBlockerId', right_on = 'nflId')

In [24]:
eps=ep_full.rename(columns = {"height": 'blocker_height', "weight": 'blocker_weight', "Position": 'blocker_position', "displayName": 'blocker_name'})

eps=eps.drop(columns=['nflId'])

eps


Unnamed: 0,gameId,playId,playDescription,quarter,possessionTeam,specialTeamsResult,kickerId,kickBlockerId,yardlineSide,yardlineNumber,...,passResult,absoluteYardlineNumber,kicker_height,kicker_weight,kicker_position,kicker_name,blocker_height,blocker_weight,blocker_position,blocker_name
0,2018090600,2883,"J.Elliott extra point is GOOD, Center-R.Lovato...",3,PHI,Kick Attempt Good,44966.0,,ATL,15,...,,25,69.0,167.0,K,Jake Elliott,,,,
1,2018090600,3553,"M.Bryant extra point is No Good, Hit Right Upr...",4,ATL,Kick Attempt No Good,27091.0,,PHI,15,...,,25,69.0,203.0,K,Matt Bryant,,,,
2,2018090900,380,"J.Tucker extra point is GOOD, Center-M.Cox, Ho...",1,BAL,Kick Attempt Good,39470.0,,BUF,15,...,,95,73.0,183.0,K,Justin Tucker,,,,
3,2018090900,972,"J.Tucker extra point is GOOD, Center-M.Cox, Ho...",1,BAL,Kick Attempt Good,39470.0,,BUF,15,...,,95,73.0,183.0,K,Justin Tucker,,,,
4,2018090900,2757,"J.Tucker extra point is GOOD, Center-M.Cox, Ho...",3,BAL,Kick Attempt Good,39470.0,,BUF,15,...,,25,73.0,183.0,K,Justin Tucker,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3483,2021010315,2813,"T.Vizcaino extra point is GOOD, Center-C.Holba...",4,SF,Kick Attempt Good,47590.0,,SEA,15,...,,95,74.0,205.0,K,Tristan Vizcaino,,,,
3484,2021010315,3074,"J.Myers extra point is No Good, Wide Left, Cen...",4,SEA,Kick Attempt No Good,41175.0,,SF,15,...,,25,70.0,190.0,K,Jason Myers,,,,
3485,2021010315,3667,"J.Myers extra point is GOOD, Center-T.Ott, Hol...",4,SEA,Kick Attempt Good,41175.0,,SF,15,...,,25,70.0,190.0,K,Jason Myers,,,,
3486,2021010315,3870,"J.Myers extra point is GOOD, Center-T.Ott, Hol...",4,SEA,Kick Attempt Good,41175.0,,SF,15,...,,25,70.0,190.0,K,Jason Myers,,,,
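The kicker and blocker steps above follow the same merge, rename, and drop pattern, so they could be folded into one helper. A minimal sketch with a hypothetical `attach_player_info` function, assuming `players` has the `nflId`, `height`, `weight`, `Position`, and `displayName` columns used above; the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd

# players column -> suffix used in the new column name
PLAYER_COLS = {'height': 'height', 'weight': 'weight', 'Position': 'position', 'displayName': 'name'}

def attach_player_info(plays, players, id_col, prefix):
    """Left-join player attributes on id_col == nflId, prefixing the new columns."""
    info = players[['nflId', *PLAYER_COLS]].rename(
        columns={col: f'{prefix}_{suffix}' for col, suffix in PLAYER_COLS.items()})
    merged = plays.merge(info, how='left', left_on=id_col, right_on='nflId',
                         validate='many_to_one')
    return merged.drop(columns='nflId')

players = pd.DataFrame({'nflId': [1, 2], 'height': [73, 77], 'weight': [190, 300],
                        'Position': ['K', 'DE'], 'displayName': ['Kicker A', 'Rusher B']})
plays = pd.DataFrame({'playId': [10, 20, 30], 'kickerId': [1.0, 1.0, np.nan],
                      'kickBlockerId': [np.nan, 2.0, np.nan]})

out = attach_player_info(plays, players, 'kickerId', 'kicker')
out = attach_player_info(out, players, 'kickBlockerId', 'blocker')
print(out[['playId', 'kicker_name', 'blocker_name']])
```

The left join keeps plays with no kicker or no blocker; their player columns are simply missing.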


Now we use the data frame above to get some high level statistics related to extra points.

Question 1: Where are extra point conversions concentrated? (by team, place kicker, yard line, and time of play)

In [142]:
## a new data frame isolating successful conversions within Extra Points Play 
## where a successful conversion means a successful kick attempt 'Kick Attempt Good'

conv_succ = eps[(eps["specialTeamsResult"]=="Kick Attempt Good")]

In [26]:
## a new data frame isolating failed conversions within Extra Point plays,
## where a failed conversion means a blocked kick attempt, a missed kick attempt,
## or a non-special teams result

conv_fail = eps[eps["specialTeamsResult"].isin(["Kick Attempt No Good", "Blocked Kick Attempt", "Non-Special Teams Result"])]

Nearly all successful and failed extra point plays occur at the 15-yard line (about 99% and 96%, respectively), and the second quarter accounts for the largest share of both (about 31% of successes and 33% of failures).

In [29]:
conv_succ.yardlineNumber.value_counts(normalize=True)

15    0.987700
20    0.007995
10    0.002153
25    0.001230
7     0.000615
30    0.000308
Name: yardlineNumber, dtype: float64

In [60]:
conv_fail.yardlineNumber.value_counts(normalize=True)

15    0.957627
30    0.016949
20    0.012712
25    0.008475
10    0.004237
Name: yardlineNumber, dtype: float64

Some more details by quarter:

In [59]:
conv_succ.quarter.value_counts(normalize=True)

2    0.308118
4    0.246925
1    0.226630
3    0.218327
Name: quarter, dtype: float64

In [61]:
conv_fail.quarter.value_counts(normalize=True)

2    0.330508
4    0.245763
1    0.241525
3    0.182203
Name: quarter, dtype: float64

Wil Lutz, Harrison Butker, Justin Tucker, and Mason Crosby led the league in total successful extra point conversions over the 2018, 2019, and 2020 seasons.

In [38]:
conv_succ.kicker_name.value_counts()

Wil Lutz                144
Harrison Butker         143
Justin Tucker           133
Mason Crosby            124
Ka'imi Fairbairn        108
Jason Myers             106
Greg Zuerlein           104
Chris Boswell           101
Matt Prater              99
Dan Bailey               96
Stephen Gostkowski       95
Ryan Succop              93
Daniel Carlson           91
Robbie Gould             89
Jason Sanders            88
Cody Parkey              84
Jake Elliott             81
Randy Bullock            79
Michael Badgley          79
Brandon McManus          78
Zane Gonzalez            77
Aldrick Rosas            69
Adam Vinatieri           65
Brett Maher              65
Dustin Hopkins           63
Cairo Santos             61
Joey Slye                59
Matt Gay                 58
Stephen Hauschka         53
Tyler Bass               52
Matt Bryant              47
Graham Gano              46
Josh Lambo               45
Sebastian Janikowski     44
Rodrigo Blankenship      42
Sam Ficken          

Harrison Butker and Jason Myers led the league in total failed extra point attempts over the 2018, 2019, and 2020 seasons.

Note: if we look at the top 8 kickers by extra point conversions (over 100 each), some of them also had the highest number of failed extra point attempts, whereas others were more efficient.

For example, Wil Lutz led the league in extra point conversions and had only 2 failed attempts, while Harrison Butker ranked 2nd in extra point conversions (just 1 fewer than Lutz) but led the league in failed attempts over this time frame.

Idea: a metric that explores efficiency of kickers
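One simple efficiency metric is each kicker's make rate (successful attempts divided by all attempts), with a minimum number of attempts so that kickers with only a handful of kicks don't top the list. A minimal sketch with a hypothetical `kicker_efficiency` helper, assuming the `eps` columns built above; the `min_attempts` threshold is an arbitrary choice, and plays without a kicker (`kicker_name` missing) are dropped by `groupby`.

```python
import pandas as pd

def kicker_efficiency(eps: pd.DataFrame, min_attempts: int = 20) -> pd.DataFrame:
    """Attempts, makes and make rate per kicker, for kickers with enough attempts."""
    made = eps['specialTeamsResult'].eq('Kick Attempt Good')
    summary = made.groupby(eps['kicker_name']).agg(attempts='size', makes='sum')
    summary['make_rate'] = summary['makes'] / summary['attempts']
    summary = summary[summary['attempts'] >= min_attempts]
    return summary.sort_values(['make_rate', 'attempts'], ascending=False)

# Hypothetical rows: kicker A makes 3 of 4, kicker B makes 2 of 2, one play has no kicker
toy = pd.DataFrame({
    'kicker_name': ['A', 'A', 'A', 'A', 'B', 'B', None],
    'specialTeamsResult': ['Kick Attempt Good'] * 3 + ['Kick Attempt No Good']
                          + ['Kick Attempt Good'] * 2 + ['Non-Special Teams Result'],
})
print(kicker_efficiency(toy, min_attempts=1))
```

On the real data this would be called as `kicker_efficiency(eps)`.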

In [43]:
conv_fail.kicker_name.value_counts()

Harrison Butker         13
Jason Myers             11
Ryan Succop              9
Cody Parkey              8
Chris Boswell            8
Ka'imi Fairbairn         8
Dan Bailey               8
Adam Vinatieri           8
Austin Seibert           6
Sam Ficken               6
Joey Slye                6
Mason Crosby             6
Caleb Sturgis            6
Stephen Gostkowski       5
Graham Gano              5
Matt Gay                 5
Robbie Gould             5
Aldrick Rosas            5
Chandler Catanzaro       5
Jake Elliott             5
Zane Gonzalez            5
Michael Badgley          4
Brandon McManus          4
Dustin Hopkins           4
Daniel Carlson           4
Josh Lambo               4
Matt Prater              4
Justin Tucker            4
Younghoe Koo             4
Greg Joseph              4
Randy Bullock            4
Sam Sloman               3
Nick Folk                3
Matt Bryant              3
Stephen Hauschka         3
Greg Zuerlein            3
Sebastian Janikowski     3
K

In [57]:
df1 = eps[(eps["kicker_name"] == "Wil Lutz")&(eps["specialTeamsResult"]!= "Kick Attempt Good")]

df1

Unnamed: 0,gameId,playId,playDescription,quarter,possessionTeam,specialTeamsResult,kickerId,kickBlockerId,yardlineSide,yardlineNumber,...,passResult,absoluteYardlineNumber,kicker_height,kicker_weight,kicker_position,kicker_name,blocker_height,blocker_weight,blocker_position,blocker_name
344,2018100800,347,"W.Lutz extra point is No Good, Wide Right, Cen...",1,NO,Kick Attempt No Good,43689.0,,WAS,15,...,,95,71.0,184.0,K,Wil Lutz,,,,
3415,2021010301,1356,"W.Lutz extra point is No Good, Wide Left, Cent...",2,NO,Kick Attempt No Good,43689.0,,CAR,15,...,,95,71.0,184.0,K,Wil Lutz,,,,


In [58]:
df2 = eps[(eps["kicker_name"] == "Harrison Butker")&(eps["specialTeamsResult"]!= "Kick Attempt Good")]

df2

Unnamed: 0,gameId,playId,playDescription,quarter,possessionTeam,specialTeamsResult,kickerId,kickBlockerId,yardlineSide,yardlineNumber,...,passResult,absoluteYardlineNumber,kicker_height,kicker_weight,kicker_position,kicker_name,blocker_height,blocker_weight,blocker_position,blocker_name
524,2018102805,1823,"H.Butker extra point is No Good, Wide Left, Ce...",2,KC,Kick Attempt No Good,45046.0,,DEN,15,...,,95,76.0,205.0,K,Harrison Butker,,,,
582,2018110403,2734,"H.Butker extra point is No Good, Hit Right Upr...",3,KC,Kick Attempt No Good,45046.0,,CLE,15,...,,25,76.0,205.0,K,Harrison Butker,,,,
753,2018111900,2567,"H.Butker extra point is No Good, Wide Left, Ce...",2,KC,Kick Attempt No Good,45046.0,,LA,15,...,,25,76.0,205.0,K,Harrison Butker,,,,
881,2018120210,2278,"H.Butker extra point is Blocked (A.Key), Cente...",2,KC,Blocked Kick Attempt,45046.0,46156.0,OAK,15,...,,25,76.0,205.0,K,Harrison Butker,77.0,240.0,DE,Arden Key
1339,2019092204,1668,"H.Butker extra point is No Good, Wide Left, Ce...",2,KC,Kick Attempt No Good,45046.0,,BAL,15,...,,25,76.0,205.0,K,Harrison Butker,,,,
1798,2019111007,2516,"H.Butker extra point is No Good, Wide Left, Ce...",3,KC,Kick Attempt No Good,45046.0,,TEN,15,...,,95,76.0,205.0,K,Harrison Butker,,,,
2224,2019122214,3206,"H.Butker extra point is No Good, Hit Left Upri...",4,KC,Kick Attempt No Good,45046.0,,CHI,15,...,,25,76.0,205.0,K,Harrison Butker,,,,
2424,2020092012,1341,"H.Butker extra point is Blocked (J.Tillery), C...",2,KC,Blocked Kick Attempt,45046.0,47811.0,LAC,25,...,,85,76.0,205.0,K,Harrison Butker,78.0,295.0,DT,Jerry Tillery
2502,2020092800,538,"H.Butker extra point is No Good, Wide Left, Ce...",1,KC,Kick Attempt No Good,45046.0,,BAL,15,...,,25,76.0,205.0,K,Harrison Butker,,,,
2575,2020100501,3262,"H.Butker extra point is No Good, Wide Left, Ce...",4,KC,Kick Attempt No Good,45046.0,,NE,15,...,,95,76.0,205.0,K,Harrison Butker,,,,


Taking a closer look at Butker's failed attempts.

We can read through the play descriptions manually to tally the outcomes:

wide left (6), wide right (1), hit left upright (3), hit right upright (1), blocked (2)

This suggests that Butker's kicks tend to veer left when he misses. It may be worth comparing the play data for his kicks against Lutz's to understand the difference in extra point efficiency.

Note: to avoid doing this manually in the future, we could write code that uses regular expressions to count the key phrases "wide left", "wide right", "hit left upright", "hit right upright", and "blocked".
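A minimal sketch of that regular-expression count. The phrases come from the play descriptions shown in this notebook; `MISS_PATTERNS` and `miss_type_counts` are hypothetical names, and the sample strings are five of Butker's descriptions. On the real data the call would be `miss_type_counts(df2['playDescription'])`.

```python
import pandas as pd

MISS_PATTERNS = {
    'wide left': r'\bWide Left\b',
    'wide right': r'\bWide Right\b',
    'hit left upright': r'\bHit Left Upright\b',
    'hit right upright': r'\bHit Right Upright\b',
    'blocked': r'\bBlocked\b',
}

def miss_type_counts(descriptions: pd.Series) -> pd.Series:
    """Number of descriptions matching each miss-type phrase (case-insensitive)."""
    return pd.Series({
        label: int(descriptions.str.contains(pattern, case=False, regex=True, na=False).sum())
        for label, pattern in MISS_PATTERNS.items()
    })

sample = pd.Series([
    "H.Butker extra point is No Good, Wide Left, Center-J.Winchester, Holder-D.Colquitt.",
    "H.Butker extra point is No Good, Hit Right Upright, Center-J.Winchester, Holder-D.Colquitt.",
    "H.Butker extra point is Blocked (A.Key), Center-J.Winchester, Holder-D.Colquitt.",
    "H.Butker extra point is No Good, Hit Left Upright, Center-J.Winchester, Holder-D.Colquitt.",
    "H.Butker extra point is No Good, Wide Left, Center-J.Winchester, Holder-T.Townsend.",
])
print(miss_type_counts(sample))
```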

In [65]:
pd.set_option("max_colwidth", None)
df2[["playDescription"]]

Unnamed: 0,playDescription
524,"H.Butker extra point is No Good, Wide Left, Center-J.Winchester, Holder-D.Colquitt."
582,"H.Butker extra point is No Good, Hit Right Upright, Center-J.Winchester, Holder-D.Colquitt."
753,"H.Butker extra point is No Good, Wide Left, Center-J.Winchester, Holder-D.Colquitt."
881,"H.Butker extra point is Blocked (A.Key), Center-J.Winchester, Holder-D.Colquitt."
1339,"H.Butker extra point is No Good, Wide Left, Center-J.Winchester, Holder-D.Colquitt."
1798,"H.Butker extra point is No Good, Wide Left, Center-J.Winchester, Holder-D.Colquitt."
2224,"H.Butker extra point is No Good, Hit Left Upright, Center-J.Winchester, Holder-D.Colquitt."
2424,"H.Butker extra point is Blocked (J.Tillery), Center-J.Winchester, Holder-T.Townsend."
2502,"H.Butker extra point is No Good, Wide Left, Center-J.Winchester, Holder-T.Townsend."
2575,"H.Butker extra point is No Good, Wide Left, Center-J.Winchester, Holder-T.Townsend."


Most of Butker's failed attempts came in the 2nd and 4th quarters (10 of 13), and both blocked attempts occurred in the 2nd quarter.

In [68]:
df2.quarter.value_counts()

2    6
4    4
3    2
1    1
Name: quarter, dtype: int64

Next, a look at which teams have the most extra point conversions. An immediate question: what style of play leads a team to attempt a high number of extra points?

In [41]:
conv_succ.possessionTeam.value_counts()

NO     144
KC     143
BAL    133
TB     126
GB     124
LA     123
SEA    120
IND    117
TEN    117
PIT    108
HOU    108
NE     107
BUF    103
LAC    103
DAL    102
DET     99
ARI     99
MIN     99
CHI     97
ATL     96
SF      95
CAR     94
CLE     93
MIA     88
CIN     85
PHI     81
DEN     79
NYG     79
NYJ     67
WAS     63
JAX     63
OAK     57
LV      40
Name: possessionTeam, dtype: int64

In [42]:
conv_fail.possessionTeam.value_counts()

CLE    17
TB     15
KC     14
NYJ    11
SEA    11
CAR    11
IND    10
NE     10
LAC    10
MIN     9
NYG     8
HOU     8
ATL     8
SF      8
PIT     8
BUF     6
TEN     6
LA      6
GB      6
CHI     6
PHI     6
DEN     5
JAX     5
DET     5
BAL     4
WAS     4
CIN     4
OAK     3
DAL     3
ARI     3
NO      2
MIA     2
LV      2
Name: possessionTeam, dtype: int64
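Raw counts mix two things: how often a team scores touchdowns and how many games it played. A minimal sketch of extra point attempts per game, assuming games.csv has `homeTeamAbbr` and `visitorTeamAbbr` columns (an assumption to verify against the file). It folds `OAK` into `LV`, since the Raiders moved from Oakland to Las Vegas for the 2020 season. The helper `attempts_per_game` and the toy rows are hypothetical.

```python
import pandas as pd

RELABEL = {'OAK': 'LV'}  # same franchise, relocated for the 2020 season

def attempts_per_game(eps: pd.DataFrame, games: pd.DataFrame) -> pd.DataFrame:
    """Extra point attempts per game played, by possession team."""
    games_played = (pd.concat([games['homeTeamAbbr'], games['visitorTeamAbbr']])
                    .replace(RELABEL).value_counts())
    attempts = eps['possessionTeam'].replace(RELABEL).value_counts()
    out = pd.DataFrame({'attempts': attempts, 'games': games_played}).fillna(0)
    out['attempts_per_game'] = out['attempts'] / out['games']
    return out.sort_values('attempts_per_game', ascending=False)

toy_games = pd.DataFrame({'homeTeamAbbr': ['OAK', 'KC'], 'visitorTeamAbbr': ['KC', 'LV']})
toy_eps = pd.DataFrame({'possessionTeam': ['OAK', 'KC', 'KC', 'KC', 'LV']})
print(attempts_per_game(toy_eps, toy_games))
```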

Question 2: How many times were these extra point attempts blocked? What is happening in the blocks? Can we identify good offensive and defensive line players?

In [74]:
## create a new data frame isolating the blocks

conv_fail_blk = conv_fail[(conv_fail["specialTeamsResult"]=="Blocked Kick Attempt")]
conv_fail_blk.info()
    

<class 'pandas.core.frame.DataFrame'>
Int64Index: 24 entries, 118 to 3348
Data columns (total 26 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   gameId                  24 non-null     int64  
 1   playId                  24 non-null     int64  
 2   playDescription         24 non-null     object 
 3   quarter                 24 non-null     int64  
 4   possessionTeam          24 non-null     object 
 5   specialTeamsResult      24 non-null     object 
 6   kickerId                24 non-null     float64
 7   kickBlockerId           24 non-null     float64
 8   yardlineSide            24 non-null     object 
 9   yardlineNumber          24 non-null     int64  
 10  gameClock               24 non-null     object 
 11  penaltyCodes            2 non-null      object 
 12  penaltyJerseyNumbers    2 non-null      object 
 13  penaltyYards            2 non-null      float64
 14  preSnapHomeScore        24 non-null     

In [81]:
conv_fail_blk[["playDescription"]]

Unnamed: 0,playDescription
118,"M.Nugent extra point is Blocked (S.Barrett), Center-T.Sieg, Holder-J.Townsend."
660,"J.Lambo extra point is Blocked (D.Autry), Center-M.Overton, Holder-L.Cooke. DEFENSIVE TWO-POINT ATTEMPT. A.Walker recovered the blocked kick. ATTEMPT FAILS. PENALTY on IND-C.Moore, Low Block, 0 yards, enforced between downs."
881,"H.Butker extra point is Blocked (A.Key), Center-J.Winchester, Holder-D.Colquitt."
888,"R.Succop extra point is Blocked (H.Anderson), Center-B.Brinkley, Holder-B.Kern. DEFENSIVE TWO-POINT ATTEMPT. T.Brooks recovered the blocked kick. ATTEMPT FAILS."
1274,"D.Bailey extra point is Blocked (T.Brown), Center-A.Cutting, Holder-B.Colquitt."
1367,"M.Gay extra point is Blocked (D.Lawrence II), Center-Z.Triner, Holder-B.Pinion."
1645,"S.Hauschka extra point is Blocked (D.Barnett), Center-R.Ferguson, Holder-C.Bojorquez. Ball smothered at BUF 36, no recovery after block,"
1701,"K.Fairbairn extra point is Blocked (C.Davis), Center-J.Weeks, Holder-B.Anger. DEFENSIVE TWO-POINT ATTEMPT. M.Smith recovered the blocked kick. ATTEMPT FAILS."
1711,"J.Slye extra point is Blocked (D.Cruikshank), Center-J.Jansen, Holder-M.Palardy. PENALTY on TEN-D.Bates, Lowering the Head to Initiate Contact, 15 yards, enforced between downs."
1726,"A.Vinatieri extra point is Blocked (C.Heyward), Center-L.Rhodes, Holder-R.Sanchez."


In [72]:
conv_fail_blk.kicker_name.value_counts()

Ryan Succop         3
Sam Sloman          2
Harrison Butker     2
Matt Gay            2
Joey Slye           2
Mike Nugent         1
Justin Tucker       1
Cairo Santos        1
Chris Boswell       1
Robbie Gould        1
Greg Zuerlein       1
Adam Vinatieri      1
Daniel Carlson      1
Josh Lambo          1
Ka'imi Fairbairn    1
Stephen Hauschka    1
Dan Bailey          1
Sam Ficken          1
Name: kicker_name, dtype: int64

In [79]:
conv_fail_blk_rs = conv_fail_blk[conv_fail_blk["kicker_name"]=="Ryan Succop"]
conv_fail_blk_rs

Unnamed: 0,gameId,playId,playDescription,quarter,possessionTeam,specialTeamsResult,kickerId,kickBlockerId,yardlineSide,yardlineNumber,...,passResult,absoluteYardlineNumber,kicker_height,kicker_weight,kicker_position,kicker_name,blocker_height,blocker_weight,blocker_position,blocker_name
888,2018120211,1910,"R.Succop extra point is Blocked (H.Anderson), Center-B.Brinkley, Holder-B.Kern. DEFENSIVE TWO-POINT ATTEMPT. T.Brooks recovered the blocked kick. ATTEMPT FAILS.",2,TEN,Blocked Kick Attempt,34707.0,42436.0,NYJ,15,...,,95,74.0,218.0,K,Ryan Succop,78.0,301.0,DE,Henry Anderson
2489,2020092711,1788,"R.Succop extra point is Blocked (J.Jones), Center-Z.Triner, Holder-B.Pinion.",2,TB,Blocked Kick Attempt,34707.0,45533.0,DEN,15,...,,25,74.0,218.0,K,Ryan Succop,72.0,231.0,ILB,Joseph Jones
2921,2020111500,2482,"R.Succop extra point is Blocked (B.Roy), Center-Z.Triner, Holder-B.Pinion.",3,TB,Blocked Kick Attempt,34707.0,52592.0,CAR,15,...,,95,74.0,218.0,K,Ryan Succop,73.0,333.0,DT,Bravvion Roy


In [84]:
conv_fail_blk.blocker_name.value_counts()

Shaquil Barrett       1
Denico Autry          1
Vincent Taylor        1
Romeo Okwara          1
Bravvion Roy          1
Tyrone Crawford       1
Justin Bethel         1
Dion Jordan           1
Tre Flowers           1
Joseph Jones          1
Isaac Rochell         1
Jerry Tillery         1
Hassan Ridgeway       1
Tanoh Kpassagnon      1
Kendall Sheffield     1
Cameron Heyward       1
Dane Cruikshank       1
Cody Davis            1
Derek Barnett         1
Dexter Lawrence       1
Tony Brown            1
Henry Anderson        1
Arden Key             1
Sheldon Richardson    1
Name: blocker_name, dtype: int64

In [85]:
conv_fail[["playDescription"]]

Unnamed: 0,playDescription
1,"M.Bryant extra point is No Good, Hit Right Upright, Center-J.Overbaugh, Holder-M.Bosher."
41,(Kick formation) TWO-POINT CONVERSION ATTEMPT. M.Palardy rushes left end. ATTEMPT FAILS.
82,"Z.Gonzalez extra point is No Good, Wide Left, Center-C.Hughlett, Holder-B.Colquitt."
84,"Z.Gonzalez extra point is No Good, Wide Left, Center-C.Hughlett, Holder-B.Colquitt."
87,"J.Sanders extra point is No Good, Wide Right, Center-J.Denney, Holder-M.Haack."
...,...
3415,"W.Lutz extra point is No Good, Wide Left, Center-Z.Wood, Holder-T.Morstead."
3432,"M.Prater extra point is No Good, Wide Left, Center-D.Muhlbach, Holder-J.Fox."
3450,"G.Gano extra point is No Good, Wide Left, Center-C.Kreiter, Holder-R.Dixon."
3472,"K.Fairbairn extra point is No Good, Wide Right, Center-J.Weeks, Holder-B.Anger."


In [86]:
import re

In [104]:
#right_upright = "hit right upright"
#left_upright = "hit left upright"
#wide_left = "wide left" 
#wide_right = "wide right"
#blocked = "blocked"

#eps_fail_type1 = {'Right': [right_upright, wide_right], 
            #'Left': [left_upright,wide_left],
            #'Blocked': blocked}

#eps_fail_type2 = {'Options':[right_upright, left_upright, wide_right, wide_left, blocked]}

#string = "M.Bryant extra point is No Good, Hit Right Upright, Center-J.Overbaugh, Holder-M.Bosher."
#print(string.lower())
#result = re.findall(pattern,string.lower())
#print(result)

Revisiting the types of extra-point results (including blocks) through the `playDescription` column.

In [143]:
# Split the failed conversions by failure type (case-insensitive substring match on playDescription)
conv_fail_right_upright = conv_fail.loc[conv_fail["playDescription"].str.contains("Right Upright", case=False)]
conv_fail_left_upright = conv_fail.loc[conv_fail["playDescription"].str.contains("Left Upright", case=False)]
conv_fail_wide_right = conv_fail.loc[conv_fail["playDescription"].str.contains("Wide Right", case=False)]
conv_fail_wide_left = conv_fail.loc[conv_fail["playDescription"].str.contains("Wide Left", case=False)]
conv_fail_blocked = conv_fail.loc[conv_fail["playDescription"].str.contains("Blocked", case=False)]
# Successful kicked extra points
conv_succ_good = conv_succ.loc[conv_succ["playDescription"].str.contains("extra point is Good", case=False)]
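The five filters above each scan `playDescription` separately. A minimal sketch of the same tally in one pass, assuming plain case-insensitive substring matching is enough; the sample strings (including the placeholder `K.Kicker` row) are hypothetical stand-ins for `conv_fail["playDescription"]`. The row-wise check also shows whether any description matches more than one failure type, which matters because the later cells overwrite `playDescription` one type at a time.

```python
import pandas as pd

# Hypothetical sample descriptions in the same format as conv_fail["playDescription"]
descriptions = pd.Series([
    "M.Bryant extra point is No Good, Hit Right Upright, Center-J.Overbaugh, Holder-M.Bosher.",
    "K.Kicker extra point is No Good, Hit Left Upright, Center-A.Center, Holder-B.Holder.",
    "J.Sanders extra point is No Good, Wide Right, Center-J.Denney, Holder-M.Haack.",
    "Z.Gonzalez extra point is No Good, Wide Left, Center-C.Hughlett, Holder-B.Colquitt.",
    "H.Butker extra point is Blocked (A.Key), Center-J.Winchester, Holder-D.Colquitt.",
    "(Kick formation) TWO-POINT CONVERSION ATTEMPT. M.Palardy rushes left end. ATTEMPT FAILS.",
])

fail_labels = ["hit right upright", "hit left upright", "wide right", "wide left", "blocked"]

# One boolean column per failure type (case-insensitive, literal substring match)
flags = pd.DataFrame({
    label: descriptions.str.contains(label, case=False, regex=False)
    for label in fail_labels
})

counts = flags.sum()                                # rows per failure type
unmatched = int((~flags.any(axis=1)).sum())         # rows with no label, e.g. failed two-point tries
max_labels_per_row = int(flags.sum(axis=1).max())   # a value > 1 would mean overlapping labels

print(counts)
print("unmatched:", unmatched, "| max labels per row:", max_labels_per_row)
```

If `max_labels_per_row` stays at 1 on the real data, the order of the overwrite cells does not change the result.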

In [145]:
conv_succ_good.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3252 entries, 0 to 3487
Data columns (total 26 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   gameId                  3252 non-null   int64  
 1   playId                  3252 non-null   int64  
 2   playDescription         3252 non-null   object 
 3   quarter                 3252 non-null   int64  
 4   possessionTeam          3252 non-null   object 
 5   specialTeamsResult      3252 non-null   object 
 6   kickerId                3252 non-null   float64
 7   kickBlockerId           0 non-null      float64
 8   yardlineSide            3252 non-null   object 
 9   yardlineNumber          3252 non-null   int64  
 10  gameClock               3252 non-null   object 
 11  penaltyCodes            61 non-null     object 
 12  penaltyJerseyNumbers    61 non-null     object 
 13  penaltyYards            61 non-null     float64
 14  preSnapHomeScore        3252 non-null   

In [128]:
conv_fail_right_upright.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 24 entries, 1 to 3333
Data columns (total 26 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   gameId                  24 non-null     int64  
 1   playId                  24 non-null     int64  
 2   playDescription         24 non-null     object 
 3   quarter                 24 non-null     int64  
 4   possessionTeam          24 non-null     object 
 5   specialTeamsResult      24 non-null     object 
 6   kickerId                24 non-null     float64
 7   kickBlockerId           0 non-null      float64
 8   yardlineSide            24 non-null     object 
 9   yardlineNumber          24 non-null     int64  
 10  gameClock               24 non-null     object 
 11  penaltyCodes            0 non-null      object 
 12  penaltyJerseyNumbers    0 non-null      object 
 13  penaltyYards            0 non-null      float64
 14  preSnapHomeScore        24 non-null     in

In [138]:
eps[["playDescription"]]

Unnamed: 0,playDescription
0,"J.Elliott extra point is GOOD, Center-R.Lovato, Holder-C.Johnston."
1,"M.Bryant extra point is No Good, Hit Right Upright, Center-J.Overbaugh, Holder-M.Bosher."
2,"J.Tucker extra point is GOOD, Center-M.Cox, Holder-S.Koch."
3,"J.Tucker extra point is GOOD, Center-M.Cox, Holder-S.Koch."
4,"J.Tucker extra point is GOOD, Center-M.Cox, Holder-S.Koch."
...,...
3483,"T.Vizcaino extra point is GOOD, Center-C.Holba, Holder-M.Wishnowsky."
3484,"J.Myers extra point is No Good, Wide Left, Center-T.Ott, Holder-M.Dickson."
3485,"J.Myers extra point is GOOD, Center-T.Ott, Holder-M.Dickson."
3486,"J.Myers extra point is GOOD, Center-T.Ott, Holder-M.Dickson."


In [110]:
conv_fail_left_upright.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 20 entries, 166 to 3351
Data columns (total 26 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   gameId                  20 non-null     int64  
 1   playId                  20 non-null     int64  
 2   playDescription         20 non-null     object 
 3   quarter                 20 non-null     int64  
 4   possessionTeam          20 non-null     object 
 5   specialTeamsResult      20 non-null     object 
 6   kickerId                20 non-null     float64
 7   kickBlockerId           0 non-null      float64
 8   yardlineSide            20 non-null     object 
 9   yardlineNumber          20 non-null     int64  
 10  gameClock               20 non-null     object 
 11  penaltyCodes            1 non-null      object 
 12  penaltyJerseyNumbers    1 non-null      object 
 13  penaltyYards            1 non-null      float64
 14  preSnapHomeScore        20 non-null     

In [111]:
conv_fail_wide_left.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 78 entries, 82 to 3484
Data columns (total 26 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   gameId                  78 non-null     int64  
 1   playId                  78 non-null     int64  
 2   playDescription         78 non-null     object 
 3   quarter                 78 non-null     int64  
 4   possessionTeam          78 non-null     object 
 5   specialTeamsResult      78 non-null     object 
 6   kickerId                78 non-null     float64
 7   kickBlockerId           0 non-null      float64
 8   yardlineSide            78 non-null     object 
 9   yardlineNumber          78 non-null     int64  
 10  gameClock               78 non-null     object 
 11  penaltyCodes            0 non-null      object 
 12  penaltyJerseyNumbers    0 non-null      object 
 13  penaltyYards            0 non-null      float64
 14  preSnapHomeScore        78 non-null     i

In [112]:
conv_fail_wide_right.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 76 entries, 87 to 3472
Data columns (total 26 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   gameId                  76 non-null     int64  
 1   playId                  76 non-null     int64  
 2   playDescription         76 non-null     object 
 3   quarter                 76 non-null     int64  
 4   possessionTeam          76 non-null     object 
 5   specialTeamsResult      76 non-null     object 
 6   kickerId                76 non-null     float64
 7   kickBlockerId           0 non-null      float64
 8   yardlineSide            76 non-null     object 
 9   yardlineNumber          76 non-null     int64  
 10  gameClock               76 non-null     object 
 11  penaltyCodes            0 non-null      object 
 12  penaltyJerseyNumbers    0 non-null      object 
 13  penaltyYards            0 non-null      float64
 14  preSnapHomeScore        76 non-null     i

In [114]:
#import re

Note to self: how do we attach the type of each failed extra-point attempt to our `eps` data frame? If we do this and then one-hot encode it, it could give us another feature to cluster on.
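One possible answer, sketched under assumptions: label every row of `eps` directly from its own `playDescription` (so no merge is needed and the original text is left intact), then one-hot encode the label with `pd.get_dummies`. The miniature `eps` below is hypothetical sample data, and the column name `ep_result`, the `"ep"` prefix, and the `"other"` fallback are illustrative choices, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for `eps` (gameId/playId/playDescription as in plays.csv)
eps = pd.DataFrame({
    "gameId": [1, 1, 2, 2],
    "playId": [10, 20, 30, 40],
    "playDescription": [
        "J.Tucker extra point is GOOD, Center-M.Cox, Holder-S.Koch.",
        "M.Bryant extra point is No Good, Hit Right Upright, Center-J.Overbaugh, Holder-M.Bosher.",
        "Z.Gonzalez extra point is No Good, Wide Left, Center-C.Hughlett, Holder-B.Colquitt.",
        "H.Butker extra point is Blocked (A.Key), Center-J.Winchester, Holder-D.Colquitt.",
    ],
})

desc = eps["playDescription"].str.lower()

# Conditions are checked in order; the first True one wins
conditions = [
    desc.str.contains("extra point is good", regex=False),
    desc.str.contains("hit right upright", regex=False),
    desc.str.contains("hit left upright", regex=False),
    desc.str.contains("wide right", regex=False),
    desc.str.contains("wide left", regex=False),
    desc.str.contains("blocked", regex=False),
]
labels = ["good", "hit right upright", "hit left upright", "wide right", "wide left", "blocked"]

# New label column; playDescription itself is not modified
eps["ep_result"] = np.select(conditions, labels, default="other")

# One 0/1 column per label that actually occurs
ep_dummies = pd.get_dummies(eps["ep_result"], prefix="ep", dtype=int)
eps_encoded = pd.concat([eps, ep_dummies], axis=1)
print(eps_encoded.drop(columns="playDescription"))
```

Because each row gets exactly one label, the dummy columns are mutually exclusive; dropping one of them (`drop_first=True`) may be preferable for some models but is usually unnecessary for clustering.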

In [148]:
#right_upright = "hit right upright"
#left_upright = "hit left upright"
#wide_left = "wide left" 
#wide_right = "wide right"
#blocked = "blocked"

#eps_fail_type1 = {'Right': [right_upright, wide_right], 
            #'Left': [left_upright,wide_left],
            #'Blocked': blocked}

#eps_fail_type2 = {'Options':[right_upright, left_upright, wide_right, wide_left, blocked]}

#string = "M.Bryant extra point is No Good, Hit Right Upright, Center-J.Overbaugh, Holder-M.Bosher."
#pattern = "Hit Right Upright"
#result = re.findall(pattern, string)
#print(result)

#result = re.sub(r"^.*(Hit\sRight\sUpright).*$", r"\1", string)
                


['Hit Right Upright']


In [None]:
## regular expressions for each fail option
                
#(?i)hit\s{0,3}right\s{0,3}upright(\W|$)
#(?i)hit\s{0,3}left\s{0,3}upright(\W|$)
#(?i)wide\s{0,3}left(\W|$)
#(?i)wide\s{0,3}right(\W|$)
#(?i)blocked(\W|$)
                
## regular expression for list of possible fail options
                
#(?i)(\W|^)(hit\sright\supright|hit\sleft\supright|wide\sleft|wide\sright|blocked)(\W|$)
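A minimal sketch of applying the combined pattern above with `Series.str.extract`. The only change assumed here is making the boundary groups non-capturing (`(?:...)`), so that the single remaining group is the failure phrase itself; with the original three capture groups, `str.extract` would return three columns. The sample descriptions are hypothetical stand-ins for `conv_fail["playDescription"]`.

```python
import pandas as pd

# Combined pattern from the notes, boundary groups made non-capturing
FAIL_PATTERN = (
    r"(?i)(?:\W|^)"
    r"(hit\s+right\s+upright|hit\s+left\s+upright|wide\s+left|wide\s+right|blocked)"
    r"(?:\W|$)"
)

# Hypothetical sample descriptions
descriptions = pd.Series([
    "M.Bryant extra point is No Good, Hit Right Upright, Center-J.Overbaugh, Holder-M.Bosher.",
    "J.Sanders extra point is No Good, Wide Right, Center-J.Denney, Holder-M.Haack.",
    "H.Butker extra point is Blocked (A.Key), Center-J.Winchester, Holder-D.Colquitt.",
    "(Kick formation) TWO-POINT CONVERSION ATTEMPT. M.Palardy rushes left end. ATTEMPT FAILS.",
    "J.Tucker extra point is GOOD, Center-M.Cox, Holder-S.Koch.",
])

# First match per row; NaN where nothing matches. Lower-case and collapse whitespace.
fail_type = (
    descriptions.str.extract(FAIL_PATTERN, expand=False)
    .str.lower()
    .str.replace(r"\s+", " ", regex=True)
)
print(fail_type)
```

Unlike the overwrite cells below, this produces a new Series and keeps non-matching rows as `NaN`, so two-point tries and successful kicks are easy to tell apart with `fail_type.isna()`.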

In [154]:
# Inside .apply each x is a plain str (no .str accessor), so call re.search on it directly
#conv_fail_wide_right["playDescription"] = conv_fail_wide_right["playDescription"].apply(lambda x: re.search(r'(?i)wide\s{0,3}right', x).group(0))

In [153]:
# Capture the phrase itself; the original group (\W|$) only captured the trailing punctuation
#conv_fail_wide_right["playDescription"]=conv_fail_wide_right["playDescription"].str.extract(r'(?i)(wide\s{0,3}right)(?:\W|$)', expand=False)

In [164]:
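# Collapse each failed attempt's description to its failure type (overwrites playDescription in conv_fail)
conv_fail.loc[conv_fail['playDescription'].str.contains('hit right upright', flags=re.IGNORECASE),'playDescription'] = 'hit right upright'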

In [165]:
conv_fail.loc[conv_fail['playDescription'].str.contains('hit left upright', flags=re.IGNORECASE),'playDescription'] = 'hit left upright'

In [166]:
conv_fail.loc[conv_fail['playDescription'].str.contains('wide left', flags=re.IGNORECASE),'playDescription'] = 'wide left'

In [167]:
conv_fail.loc[conv_fail['playDescription'].str.contains('wide right', flags=re.IGNORECASE),'playDescription'] = 'wide right'

In [174]:
conv_fail.loc[conv_fail['playDescription'].str.contains('blocked', flags=re.IGNORECASE),'playDescription'] = 'blocked'

In [175]:
conv_fail.playDescription.value_counts()

wide left           78
wide right          76
blocked             25
Right Upright       24
hit left upright    20
(Kick formation) TWO-POINT CONVERSION ATTEMPT. M.W

General note: I think it's worth looking more closely at what influences the failed conversion attempts.
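One way to start, sketched under assumptions: compare failure rates rather than raw counts, since tallies such as the blocked-kick `value_counts` above also reflect how many attempts each kicker had. The helper `failure_rate_by` and the boolean column `ep_failed` are hypothetical; the toy frame is illustrative only. The same call could be pointed at `quarter`, `kicker_name`, or `possessionTeam` on the real `eps` data once a failure flag exists.

```python
import pandas as pd

def failure_rate_by(df: pd.DataFrame, column: str, failed_col: str = "ep_failed") -> pd.DataFrame:
    """Hypothetical helper: attempts, failures and failure rate per level of `column`."""
    failed = df[failed_col].astype(int)          # True/False -> 1/0
    grouped = failed.groupby(df[column])
    out = pd.DataFrame({
        "attempts": grouped.size(),
        "failures": grouped.sum(),
    })
    out["fail_rate"] = out["failures"] / out["attempts"]
    return out.sort_values("fail_rate", ascending=False)

# Toy data (hypothetical), not drawn from the Big Data Bowl files
toy = pd.DataFrame({
    "quarter":   [1, 1, 2, 2, 4, 4, 4, 4],
    "ep_failed": [False, False, True, False, True, True, True, False],
})
print(failure_rate_by(toy, "quarter"))
```

With few attempts per group (many kickers have only a handful of blocked or missed tries), rates are noisy, so it may help to filter on a minimum `attempts` before ranking.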