# Predict Final Outcome using Aggression Scores

We are also interested in predicting the outcome of a game based on how aggressive a team is. We expect more aggressive teams to try to score in situations where less aggressive teams would give up the ball (by punting or kicking a field goal) rather than risk handing the other team an easy scoring chance. In particular, we count each team's fourth-down attempts and two-point conversion attempts. More aggressive teams are also likely to throw deep passes for high yardage. These three measures, together with a fourth-down "go rate" defined below, are combined into an aggression score for each of the 32 NFL teams. Using this engineered feature, we then predict the odds of a two-point conversion attempt in a game.

In [1]:
import pandas as pd
import altair as alt
from sklearn.tree import DecisionTreeRegressor


We load a dataset of every play in the 2023 NFL season (regular season and playoffs) and add two columns: whether a fourth-down attempt was made, and whether a two-point conversion succeeded.

In [2]:
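# low_memory=False reads the file in one pass and avoids the mixed-dtype warning
all_plays_2023 = pd.read_csv('/Users/MC/Downloads/play_by_play_2023.csv', low_memory=False)
# 1 when the offense went for it on fourth down (converted or failed), else 0 (NaN on non-play rows)
all_plays_2023['fourth_down_attempts'] = all_plays_2023['fourth_down_converted'] + all_plays_2023['fourth_down_failed']
# True only when a two-point conversion succeeded
all_plays_2023['two_point_conv_result2'] = all_plays_2023['two_point_conv_result'] == 'success'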
all_plays_2023


|  | play_id | game_id | old_game_id | home_team | away_team | season_type | week | posteam | posteam_type | defteam | ... | qb_epa | xyac_epa | xyac_mean_yardage | xyac_median_yardage | xyac_success | xyac_fd | xpass | pass_oe | fourth_down_attempts | two_point_conv_result2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2023_01_ARI_WAS | 2023091007 | WAS | ARI | REG | 1 |  |  |  | ... | 0.000000 |  |  |  |  |  |  |  |  | False |
| 1 | 39 | 2023_01_ARI_WAS | 2023091007 | WAS | ARI | REG | 1 | WAS | home | ARI | ... | 0.000000 |  |  |  |  |  |  |  | 0.0 | False |
| 2 | 55 | 2023_01_ARI_WAS | 2023091007 | WAS | ARI | REG | 1 | WAS | home | ARI | ... | -0.336103 |  |  |  |  |  | 0.515058 | -51.505846 | 0.0 | False |
| 3 | 77 | 2023_01_ARI_WAS | 2023091007 | WAS | ARI | REG | 1 | WAS | home | ARI | ... | 0.703308 | 0.340652 | 3.328642 | 1.0 | 0.996628 | 0.583928 | 0.661106 | 33.889407 | 0.0 | False |
| 4 | 102 | 2023_01_ARI_WAS | 2023091007 | WAS | ARI | REG | 1 | WAS | home | ARI | ... | 0.469799 |  |  |  |  |  | 0.196065 | -19.606467 | 0.0 | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 48420 | 4253 | 2023_19_PIT_BUF | 2024011501 | BUF | PIT | POST | 19 | PIT | away | BUF | ... | 0.097917 | 0.642515 | 5.621778 | 4.0 | 0.988080 | 0.249705 | 0.962465 | 3.753471 | 0.0 | False |
| 48421 | 4278 | 2023_19_PIT_BUF | 2024011501 | BUF | PIT | POST | 19 | PIT | away | BUF | ... | -0.858869 |  |  |  |  |  | 0.968867 | 3.113294 | 0.0 | False |
| 48422 | 4322 | 2023_19_PIT_BUF | 2024011501 | BUF | PIT | POST | 19 | PIT | away | BUF | ... | -0.316456 |  |  |  |  |  | 0.940734 | 5.926609 | 0.0 | False |
| 48423 | 4349 | 2023_19_PIT_BUF | 2024011501 | BUF | PIT | POST | 19 | PIT | away | BUF | ... | -1.543516 |  |  |  |  |  | 0.962551 | 3.744876 | 0.0 | False |


We use the first 15 weeks of the season as the training set. We are most interested in how often a team goes for it on fourth down with more than 5 minutes remaining when the ball is NOT between the opponent's 30- and 40-yard lines (that is, when `yardline_100`, the distance to the opponent's end zone, is below 30 or above 40). Inside the 30, the safe option is to kick a field goal, so going for a conversion is the more aggressive choice. Similarly, beyond the 40, the safe option is to punt, so we want to see how often a team goes for it instead. Note that this 30-to-40-yard "no man's land" is chosen arbitrarily; a future extension of this project is to determine which bounds make the most sense.
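As a sanity check on the field-position logic, here is a stdlib-only sketch of the same filter applied to a single play. It assumes, as in the nflverse data, that `yardline_100` is the distance in yards to the opponent's end zone; the helper name `in_training_window` is hypothetical and does not appear in the notebook.

```python
def in_training_window(week: int, game_seconds_remaining: float, yardline_100: float) -> bool:
    """Hypothetical helper mirroring the training filter for one play.

    Assumes yardline_100 is the distance to the opponent's end zone, so
    values below 30 are roughly field-goal range and values above 40 are
    usually punting territory.
    """
    early_season = week <= 15
    enough_time = game_seconds_remaining >= 300  # at least 5 minutes left
    outside_zone = yardline_100 < 30 or yardline_100 > 40  # skip the 30-40 zone
    return early_season and enough_time and outside_zone


print(in_training_window(week=3, game_seconds_remaining=1800, yardline_100=25))   # True
print(in_training_window(week=3, game_seconds_remaining=1800, yardline_100=35))   # False
print(in_training_window(week=16, game_seconds_remaining=1800, yardline_100=25))  # False
```

Because both comparisons are strict, plays exactly at the 30 or the 40 fall inside the excluded zone.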

In [3]:
plays_2023 = all_plays_2023[['season_type','game_id','week','posteam','side_of_field','game_seconds_remaining','home_team','away_team','yards_gained','yardline_100','down','play_type','pass_length','air_yards','fourth_down_attempts','fourth_down_converted','two_point_attempt','two_point_conv_result2','spread_line','total_line']].copy()
train = plays_2023[(plays_2023["week"] <= 15) & (plays_2023["game_seconds_remaining"] >= 300) & ((plays_2023["yardline_100"] < 30) | (plays_2023["yardline_100"] > 40))].copy()
# two_point_conv_result2 is already boolean (set in In [2]), so it is not re-mapped here;
# comparing it to 'success' again would turn every value into False.
train = train.fillna(0)

We define a statistic called `go`: the proportion of a team's fourth downs on which it runs or passes rather than punting, kicking, or anything else, counting only fourth downs outside the 30-to-40-yard zone with more than 5 minutes left. In a close game with less than 5 minutes left, a team is likely to go for it regardless of how aggressive it truly is. Between the 30 and the 40, the field is a kind of no man's land where going for it makes sense: a turnover still leaves the opposing team a lot of yardage to cover, and the offense is likely too far away for a field goal attempt.
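The go rate is a simple per-team proportion. A stdlib sketch of the same calculation on a few made-up plays (the helper name and the sample data are hypothetical, not taken from the dataset); as in the cell below, any play type other than `pass` or `run` counts as not going for it.

```python
from collections import defaultdict


def go_rates(fourth_downs):
    """Share of fourth downs on which each team ran or passed.

    fourth_downs: iterable of (team, play_type) pairs, one per fourth-down play.
    Any play type other than 'pass' or 'run' counts as not going for it.
    """
    went = defaultdict(int)
    total = defaultdict(int)
    for team, play_type in fourth_downs:
        total[team] += 1
        if play_type in ("pass", "run"):
            went[team] += 1
    return {team: went[team] / total[team] for team in total}


# Made-up plays for illustration only (not taken from the dataset)
sample = [
    ("TEAM_A", "punt"), ("TEAM_A", "run"), ("TEAM_A", "field_goal"), ("TEAM_A", "pass"),
    ("TEAM_B", "punt"), ("TEAM_B", "punt"), ("TEAM_B", "pass"),
]
print(go_rates(sample))  # {'TEAM_A': 0.5, 'TEAM_B': 0.3333333333333333}
```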

In [4]:
# Fourth-down plays from the filtered training set
four_down = train[train['down'] == 4].copy()
# 1 if the offense ran or passed (went for it); 0 for punts, field goals, penalties, etc.
four_down['play'] = four_down['play_type'].isin(['pass', 'run']).astype(int)
# Go rate per team = mean of the 0/1 indicator = share of fourth downs on which the team went for it
go = four_down.groupby('posteam')['play'].mean()

In [5]:
# Deep passes: attempts with more than 20 air yards
teams = pd.DataFrame(train[train['air_yards'] > 20].groupby('posteam').count()['pass_length'])
# Filter with train's own columns (not plays_2023) so the boolean masks share train's index
teams['two_point_attempt'] = train[train['two_point_attempt'] == 1].groupby('posteam').count()['two_point_attempt']
teams['fourth_down_attempt'] = train[train['fourth_down_attempts'] == 1].groupby('posteam').count()['fourth_down_attempts']


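Below is a data frame of each team and its aggression score. The score is a weighted sum, deep passes + 5 × two-point attempts + 2 × fourth-down attempts + 200 × go rate, divided by the largest team total so that the most aggressive team scores 1. The most aggressive team is Arizona, with 10 two-point attempts, 18 fourth-down attempts, and a go rate of about 20% on fourth downs outside the 30-to-40-yard zone. Jacksonville, with a score of about 0.78, is roughly 78% as aggressive as Arizona by this measure.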
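To make the weighting concrete, here is a small stdlib sketch that recomputes three rows of the table below from their components. The weights (1, 5, 2, 200) are copied from the In [6] cell; the component values are the rounded numbers shown in the table, and the helper name `raw_aggression` is hypothetical.

```python
def raw_aggression(deep_passes, two_point_attempts, fourth_down_attempts, go_rate):
    """Weighted sum used in the notebook's In [6] cell (weights 1, 5, 2, 200)."""
    return deep_passes + 5 * two_point_attempts + 2 * fourth_down_attempts + 200 * go_rate


# Component values copied from the table (go rates rounded to 6 decimals)
components = {
    "ARI": (29, 10, 18, 0.204545),
    "JAX": (43, 3, 14, 0.181818),
    "PHI": (43, 2, 14, 0.202899),
}
raw = {team: raw_aggression(*vals) for team, vals in components.items()}
top = max(raw.values())  # Arizona has the largest total
scores = {team: value / top for team, value in raw.items()}
for team, score in scores.items():
    print(f"{team}: raw={raw[team]:.2f}, score={score:.4f}")
```

This prints scores of 1.0000, 0.7848 and 0.7798, matching the table. Normalizing by the maximum of these three teams gives the same result as normalizing over all 32 because Arizona has the highest raw total. The weight of 200 puts the go rate, a proportion between 0 and 1, on a scale comparable to the counts.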

In [6]:
teams = teams.rename_axis('team').reset_index()
teams = teams.fillna(0)  # teams missing a count get 0 instead of NaN
teams = teams.rename(columns={"pass_length": "deep_passes"})
teams['go'] = teams['team'].map(go)
# Weighted sum of the four components (fixed weights)
teams['score'] = teams['deep_passes'] + 5*teams['two_point_attempt'] + 2*teams['fourth_down_attempt'] + 200*teams['go']
# Scale so the most aggressive team scores 1
teams['score'] = teams['score'] / teams['score'].max()
teams.sort_values(['score'], ascending=False)

|  | team | deep_passes | two_point_attempt | fourth_down_attempt | go | score |
|---|---|---|---|---|---|---|
| 0 | ARI | 29 | 10.0 | 18 | 0.204545 | 1.0 |
| 14 | JAX | 43 | 3.0 | 14 | 0.181818 | 0.78484 |
| 25 | PHI | 43 | 2.0 | 14 | 0.202899 | 0.779812 |
| 7 | CLE | 46 | 5.0 | 13 | 0.117117 | 0.772395 |
| 13 | IND | 31 | 3.0 | 18 | 0.189474 | 0.769004 |
| 11 | GB | 38 | 4.0 | 14 | 0.150538 | 0.744713 |
| 10 | DET | 23 | 3.0 | 16 | 0.222222 | 0.734046 |
| 8 | DAL | 40 | 4.0 | 11 | 0.15942 | 0.730452 |
| 31 | WAS | 43 | 4.0 | 11 | 0.135802 | 0.719397 |
| 12 | HOU | 33 | 5.0 | 13 | 0.132653 | 0.708943 |


We store the scores in `agg`, a pandas Series indexed by team abbreviation that can be used like a dictionary to look up each team's aggression score.

In [7]:
agg = teams.set_index('team')['score']

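Next, we build a `games` data frame with one row per regular-season game, containing the spread, the total, the aggression scores of the away and home teams, the number of two-point attempts in the game, and a 0/1 flag for whether any two-point attempt was made.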
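The `two_point_try` column turns a per-game count into the binary target used for modeling. A stdlib sketch of that aggregation; the helper name and the sample game IDs are hypothetical.

```python
from collections import Counter


def two_point_summary(plays):
    """Aggregate play-level rows into per-game two-point counts and a 0/1 flag.

    plays: iterable of (game_id, two_point_attempt) pairs with attempt 0 or 1.
    Returns {game_id: (attempts, any_attempt)}.
    """
    attempts = Counter()
    for game_id, attempt in plays:
        attempts[game_id] += attempt  # per-game sum, like groupby(...).sum()
    return {game: (n, 1 if n > 0 else 0) for game, n in attempts.items()}


# Hypothetical game IDs and plays, for illustration only
sample = [("GAME_A", 0), ("GAME_A", 0), ("GAME_B", 1), ("GAME_B", 0), ("GAME_B", 1)]
print(two_point_summary(sample))  # {'GAME_A': (0, 0), 'GAME_B': (2, 1)}
```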

In [8]:
regular = plays_2023[plays_2023['season_type'] == 'REG']
# One row per game: first non-null value per column (spread, total and teams are constant within a game)
games = regular.groupby('game_id').first()
# Summing per game counts the two-point attempts in each game
games_tries = regular.groupby('game_id').sum(numeric_only=True)
games = games[['spread_line', 'total_line', 'away_team', 'home_team']].copy()
games['two_point_attempts'] = games_tries['two_point_attempt']
# Binary target: 1 if the game had at least one two-point attempt
games['two_point_try'] = (games['two_point_attempts'] > 0).astype(int)
games = games.rename_axis('game').reset_index()
games['away_team_score'] = games['away_team'].map(agg)
games['home_team_score'] = games['home_team'].map(agg)
games = games[['game', 'spread_line', 'total_line', 'away_team_score', 'home_team_score', 'two_point_attempts', 'two_point_try']]
games

|  | game | spread_line | total_line | away_team_score | home_team_score | two_point_attempts | two_point_try |
|---|---|---|---|---|---|---|---|
| 0 | 2023_01_ARI_WAS | 7.0 | 38.0 | 1.000000 | 0.719397 | 0.0 | 0 |
| 1 | 2023_01_BUF_NYJ | -2.5 | 44.5 | 0.620677 | 0.490472 | 0.0 | 0 |
| 2 | 2023_01_CAR_ATL | 3.5 | 40.5 | 0.695155 | 0.569172 | 0.0 | 0 |
| 3 | 2023_01_CIN_CLE | -1.0 | 46.5 | 0.398137 | 0.772395 | 1.0 | 1 |
| 4 | 2023_01_DAL_NYG | -3.5 | 44.5 | 0.730452 | 0.630321 | 0.0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 267 | 2023_18_NYJ_NE | 2.5 | 28.5 | 0.490472 | 0.392452 | 1.0 | 1 |
| 268 | 2023_18_PHI_NYG | -4.5 | 43.0 | 0.779812 | 0.630321 | 0.0 | 0 |
| 269 | 2023_18_PIT_BAL | -3.0 | 34.0 | 0.487600 | 0.413703 | 0.0 | 0 |
| 270 | 2023_18_SEA_ARI | -2.5 | 48.0 | 0.447393 | 1.000000 | 1.0 | 1 |


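We then fit a decision tree regression model (maximum depth 5) that uses the spread, the total, and both teams' aggression scores to predict whether a two-point attempt occurs. Because the target is 0/1 and a regression tree predicts the mean target of the training games in each leaf, its prediction can be read as the estimated probability of a two-point attempt.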
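To see why a leaf mean acts as a probability, here is a hypothetical stdlib sketch of a depth-1 regression tree (a "stump") fit by minimizing squared error on made-up data. It is not scikit-learn's implementation, only the same idea in miniature; in this sketch, split thresholds are midpoints between neighboring feature values.

```python
def fit_stump(X, y):
    """Fit a depth-1 regression tree (a stump) by minimizing squared error.

    X: list of feature rows, y: list of 0/1 targets.
    Assumes at least one feature takes two or more distinct values.
    Returns (feature_index, threshold, left_mean, right_mean).
    """
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighboring values
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                # Each leaf predicts the mean target, i.e. the share of 1s
                best = (cost, j, t, sum(left) / len(left), sum(right) / len(right))
    _, j, t, left_mean, right_mean = best
    return j, t, left_mean, right_mean


def predict_stump(stump, row):
    j, t, left_mean, right_mean = stump
    return left_mean if row[j] <= t else right_mean


# Made-up data: one feature (a game total) and whether a two-point try happened
X = [[38.0], [40.0], [41.0], [44.0], [46.0], [48.0]]
y = [0, 0, 0, 1, 1, 0]
stump = fit_stump(X, y)
print(stump)  # (0, 42.5, 0.0, 0.6666666666666666)
print(predict_stump(stump, [47.5]))  # 0.6666666666666666
```

The 0.667 prediction is simply the fraction of training games in the right-hand leaf that had an attempt (2 of 3). A depth-5 tree applies the same idea recursively, which is why `clf.predict` can be read as a probability.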

In [9]:
X = games[['spread_line','total_line','away_team_score','home_team_score']]
y = games['two_point_try']
clf = DecisionTreeRegressor(max_depth=5)
clf.fit(X,y)

These are the spread, total, and aggression scores for the 2024 Super Bowl matchup, with San Francisco as the away team and Kansas City as the home team.

In [10]:
data = pd.DataFrame({"spread_line":[2],"total_line":[47.5],'away_team_score':agg['SF'],'home_team_score':agg['KC']})
data

|  | spread_line | total_line | away_team_score | home_team_score |
|---|---|---|---|---|
| 0 | 2 | 47.5 | 0.313234 | 0.421437 |


The predicted probability of a two-point attempt in the 2023-2024 Super Bowl is about 30.3%.

In [11]:
p_super_try = clf.predict(data)[0]
p_super_try

0.30344827586206896

The fair odds against a two-point attempt in the 2023-2024 Super Bowl are therefore about 229.5:100, computed as (1 - p)/p × 100. In other words, at these break-even odds a $100 bet that there will be a two-point attempt would win about $229.55 (plus the returned stake) if such an attempt is made in the Super Bowl.

In [12]:
(1-p_super_try)/p_super_try*100

229.5454545454546
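The cell above converts a probability into fair (break-even) odds per $100 staked. A small sketch of that conversion and its inverse; the function names are hypothetical, and the numbers ignore any bookmaker margin.

```python
def fair_odds_against(p: float) -> float:
    """Break-even payout per 100 staked on an event with probability p."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return (1 - p) / p * 100


def implied_probability(odds_against: float) -> float:
    """Inverse of fair_odds_against: probability implied by a payout per 100 staked."""
    return 100 / (odds_against + 100)


p_attempt = 0.30344827586206896  # model output from In [11]
payout = fair_odds_against(p_attempt)
print(round(payout, 2))  # 229.55
print(round(implied_probability(payout), 4))  # 0.3034

# At fair odds the expected profit of a 100 stake is zero
expected_profit = p_attempt * payout - (1 - p_attempt) * 100
print(abs(expected_profit) < 1e-9)  # True
```

Passing the success probability from In [14] instead reproduces the roughly 495:100 figure computed later.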

To estimate the probability of success, we divide the number of successful two-point conversions by the number of two-point attempts in the 2023 NFL season (including the playoffs).

In [13]:
p_sucess = len(all_plays_2023[all_plays_2023['two_point_conv_result2'] ==  True])/len(all_plays_2023[all_plays_2023['two_point_attempt'] == 1])
p_sucess

0.5538461538461539

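To estimate the probability of a successful two-point conversion in the Super Bowl, we multiply the probability of an attempt by the probability that an attempt succeeds. This treats success as independent of the game situation and ignores games with more than one attempt, so it is an approximation. The predicted chance of a successful two-point conversion is about 16.8%.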
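Written out with the rounded values from this notebook, the approximation is:

$$
P(\text{successful conversion}) \approx P(\text{attempt}) \times P(\text{success} \mid \text{attempt}) \approx 0.3034 \times 0.5538 \approx 0.168
$$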

In [14]:
p_super_sucess = p_super_try * p_sucess
p_super_sucess

0.16806366047745358

The fair odds against a successful two-point conversion in the 2023-2024 Super Bowl are therefore about 495:100: at these break-even odds, a $100 bet that there will be a successful two-point conversion would win about $495 if one occurs in the Super Bowl.

In [15]:
((1-p_super_sucess)/p_super_sucess)*100

495.0126262626263