# NFL Super Bowl Prediction Model

First, I imported pandas, numpy, seaborn, statsmodels, and scipy's poisson distribution to build the regression model, then read in the CSV file containing game-level details of the 2018 NFL season. The first few rows of the dataset are shown in the output below.

In [4]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
import seaborn
from scipy.stats import poisson,skellam

NFL_2018 = pd.read_csv("NFL 2018 Season Data.csv") 
NFL_2018.head()

Unnamed: 0,Team,Opponent,Game Number,Game Location,Team_Score,Opponent_Score,Team_Passing,Team_Rushing,Team_Turnovers,Opponent_Passing,Opponent_Rushing,Opponent_Turnovers
0,Arizona Cardinals,Washington Redskins,1,HOME,6.0,24.0,145.0,68.0,2,247.0,182.0,1
1,Arizona Cardinals,Los Angeles Rams,2,AWAY,0.0,34.0,83.0,54.0,1,342.0,90.0,1
2,Arizona Cardinals,Chicago Bears,3,HOME,14.0,16.0,168.0,53.0,4,194.0,122.0,2
3,Arizona Cardinals,Seattle Seahawks,4,HOME,17.0,20.0,171.0,92.0,1,160.0,171.0,0
4,Arizona Cardinals,San Francisco 49ers,5,AWAY,28.0,18.0,164.0,56.0,0,300.0,147.0,5


## Data Manipulation 

In the Data Manipulation portion, after dropping rows with NA values, I added three new columns: Team_Winner, Opponent_Winner, and Passing_Winner.

Team_Winner is 1 when the team's score is larger than the opponent's score and 0 otherwise. Opponent_Winner is 1 when the opponent's score is larger than the team's score and 0 otherwise, so a tied game gets 0 in both columns. Lastly, Passing_Winner, which is the basis of the model, is 1 if the team's passing yards are larger than the opponent's passing yards and 0 otherwise, including when the two are equal.

The updated data frame can be viewed below in the output cell. 

In [13]:
#Data Manipulation
NFL_2018 = NFL_2018.dropna(axis='rows')

NFL_2018['Team_Winner'] = np.where(NFL_2018['Team_Score'] > NFL_2018['Opponent_Score'], 1, 0)
NFL_2018['Opponent_Winner'] = np.where(NFL_2018['Team_Score'] < NFL_2018['Opponent_Score'], 1, 0)
NFL_2018['Passing_Winner'] = np.where(NFL_2018['Team_Passing'] > NFL_2018['Opponent_Passing'], 1, 0)

NFL_2018.head()

Unnamed: 0,Team,Opponent,Game Number,Game Location,Team_Score,Opponent_Score,Team_Passing,Team_Rushing,Team_Turnovers,Opponent_Passing,Opponent_Rushing,Opponent_Turnovers,Team_Winner,Opponent_Winner,Passing_Winner
0,Arizona Cardinals,Washington Redskins,1,HOME,6.0,24.0,145.0,68.0,2,247.0,182.0,1,0,1,0
1,Arizona Cardinals,Los Angeles Rams,2,AWAY,0.0,34.0,83.0,54.0,1,342.0,90.0,1,0,1,0
2,Arizona Cardinals,Chicago Bears,3,HOME,14.0,16.0,168.0,53.0,4,194.0,122.0,2,0,1,0
3,Arizona Cardinals,Seattle Seahawks,4,HOME,17.0,20.0,171.0,92.0,1,160.0,171.0,0,0,1,1
4,Arizona Cardinals,San Francisco 49ers,5,AWAY,28.0,18.0,164.0,56.0,0,300.0,147.0,5,1,0,0


## Poisson Model

The Poisson distribution gives the probability of each possible count (here, points scored in a game) once the average count is known. Using each team's scoring history, the opponents it faced, and which side had more passing yards, I built a model to help predict whether the LA Rams or the New England Patriots would win.
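For reference, the model described here can be written out as below. This is the standard Poisson regression with a log link, matching the formula `Team_Score ~ Passing_Winner + Team + Opponent` used in the code cell; the symbols ($\lambda$, $\beta_0$, $\gamma$, $\alpha$, $\delta$) are notation chosen for this sketch and do not appear in the model output.

```latex
% Probability that a team scores exactly k points when its expected score is \lambda
P(Y = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots

% Log link: the expected score depends on the team, the opponent, and the passing indicator
\log \lambda = \beta_0
             + \gamma \cdot \text{Passing\_Winner}
             + \alpha_{\text{Team}}
             + \delta_{\text{Opponent}}
```

Because the link is logarithmic, each coefficient acts multiplicatively on the expected score: adding $c$ to the right-hand side multiplies $\lambda$ by $e^{c}$.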

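Below is the result of the Poisson model. The coef column plays the same role as the slope in a linear regression, but on the log scale. A positive value implies more points for the team relative to the baseline (the Arizona Cardinals, which are absorbed into the intercept). Values closer to zero represent more neutral effects.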

The last row, Passing_Winner, has a coef of 0.0345. This means that teams with more passing yards than their opponent score e^0.0345 ≈ 1.035 times as many points, or about 3.5% more. It is important to note that the LA Rams have a coef of 0.8567 while the New England Patriots have a coef of 0.6662, indicating that the LA Rams are better at scoring on average.
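The arithmetic behind these interpretations can be checked with a few lines of Python. This is a minimal sketch that uses only the three coefficients quoted above; the helper name `rate_ratio` is made up for illustration, and the Rams-to-Patriots ratio assumes both teams face the same opponent with the same Passing_Winner value.

```python
import math

# Coefficients quoted in the text above (from the fitted model's summary)
passing_winner_coef = 0.0345
rams_coef = 0.8567
patriots_coef = 0.6662


def rate_ratio(log_scale_difference):
    """Turn a difference on the log scale into a multiplier on expected points."""
    return math.exp(log_scale_difference)


# Effect of out-passing the opponent
passing_effect = rate_ratio(passing_winner_coef)

# Rams' expected points relative to the Patriots', all else being equal
rams_vs_patriots = rate_ratio(rams_coef - patriots_coef)

print(f"Passing_Winner multiplier: {passing_effect:.3f}")
print(f"Rams / Patriots scoring ratio: {rams_vs_patriots:.3f}")
```

The multipliers come out to roughly 1.035 and 1.21, so under these assumptions the Rams would be expected to score about 21% more than the Patriots against an identical opponent.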


In [14]:
# Poisson Model
# Fit a Poisson GLM (log link) on NFL_2018 directly, with Team and Opponent
# as categorical effects and Passing_Winner as a 0/1 indicator.
poisson_model = smf.glm(formula="Team_Score ~ Passing_Winner + Team + Opponent", data=NFL_2018, 
                        family=sm.families.Poisson()).fit()
poisson_model.summary()

Dep. Variable:,Team_Score,No. Observations:,528.0
Model:,GLM,Df Residuals:,464.0
Model Family:,Poisson,Df Model:,63.0
Link Function:,log,Scale:,1.0
Method:,IRLS,Log-Likelihood:,-2251.8
Date:,"Sun, 03 Feb 2019",Deviance:,1954.8
Time:,13:17:04,Pearson chi2:,1780.0
No. Iterations:,7,,

,coef,std err,z,P>|z|,95% CI lower,95% CI upper
Intercept,2.6873,0.085,31.640,0.000,2.521,2.854
Team[T.Atlanta Falcons],0.6120,0.086,7.110,0.000,0.443,0.781
Team[T.Baltimore Ravens],0.5288,0.086,6.167,0.000,0.361,0.697
Team[T.Buffalo Bills],0.1889,0.093,2.030,0.042,0.007,0.371
Team[T.Carolina Panthers],0.5144,0.087,5.905,0.000,0.344,0.685
Team[T.Chicago Bears],0.5489,0.084,6.557,0.000,0.385,0.713
Team[T.Cincinnati Bengals],0.5202,0.087,5.968,0.000,0.349,0.691
Team[T.Cleveland Browns],0.4694,0.088,5.355,0.000,0.298,0.641
Team[T.Dallas Cowboys],0.4467,0.086,5.192,0.000,0.278,0.615


## Prediction 

I passed the LA Rams and New England Patriots into the Poisson model, which returns the expected number of points for the indicated team. Passing_Winner is set to 1 for the Rams and 0 for the Patriots, so these predictions assume the Rams finish with more passing yards. As shown below, the model predicted that the **LA Rams** will score about **31** and the **New England Patriots** will score about **28**.

In [7]:
poisson_model.predict(pd.DataFrame(data={'Team': 'Los Angeles Rams', 'Opponent': 'New England Patriots',
                                       'Passing_Winner':1},index=[1]))

array([ 31.11805604])

In [8]:
poisson_model.predict(pd.DataFrame(data={'Team': 'New England Patriots', 'Opponent': 'Los Angeles Rams',
                                       'Passing_Winner':0},index=[1]))

array([ 28.16202667])

## Game Simulation

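The matrix returned by the game_simulation function gives the probability of each scoreline: the entry in row i, column j is the probability that the LA Rams score i points and the New England Patriots score j points, treating the two scores as independent Poisson variables. For example, along the diagonal both teams score the same number of points, so summing the diagonal gives the probability of a draw. Entries below the diagonal are scorelines where the Rams win, and entries above it are scorelines where the Patriots win.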


In [9]:
def game_simulation(game_model, Home_Team, Away_Team, max_points=40):
    # Expected points for each side; Passing_Winner=1 assumes the home team out-passes the away team
    home_points_avg = game_model.predict(pd.DataFrame(data={'Team': Home_Team, 
                                                            'Opponent': Away_Team,'Passing_Winner':1},
                                                      index=[1]))
    away_points_avg = game_model.predict(pd.DataFrame(data={'Team': Away_Team, 
                                                            'Opponent': Home_Team,'Passing_Winner':0},
                                                      index=[1]))
    # Poisson probability of every score from 0 to max_points, for each team
    team_pred = [[poisson.pmf(i, team_avg) for i in range(0, max_points+1)]
                 for team_avg in [home_points_avg, away_points_avg]]
    # Rows index the home team's points, columns index the away team's points
    return np.outer(np.array(team_pred[0]), np.array(team_pred[1]))

game_simulation(poisson_model, 'Los Angeles Rams', 'New England Patriots', max_points=40)

array([[  1.79881787e-26,   5.06583568e-25,   7.13320998e-24, ...,
          4.20431384e-16,   3.03594868e-16,   2.13746169e-16],
       [  5.59757153e-25,   1.57638959e-23,   2.21971628e-22, ...,
          1.30830074e-14,   9.44728212e-15,   6.65136527e-15],
       [  8.70927723e-24,   2.45270898e-22,   3.45366278e-21, ...,
          2.03558878e-13,   1.46990527e-13,   1.03488779e-13],
       ..., 
       [  1.86612251e-14,   5.25537918e-13,   7.40010643e-12, ...,
          4.36162260e-04,   3.14954185e-04,   2.21743704e-04],
       [  1.48897704e-14,   4.19326112e-13,   5.90453658e-12, ...,
          3.48013376e-04,   2.51301589e-04,   1.76929051e-04],
       [  1.15835178e-14,   3.26215336e-13,   4.59344250e-12, ...,
          2.70737493e-04,   1.95500423e-04,   1.37642203e-04]])

## Probability of Teams Winning

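As shown below, summing the lower triangle gives the LA Rams around a **62%** chance of winning, and summing the upper triangle gives the New England Patriots around a **36%** chance. Both sums include the diagonal, because np.tril and np.triu keep it by default, so each figure also contains the roughly 4.8% probability of a draw; excluding it leaves about 57% for the Rams and about 31% for the Patriots. The remaining probability, roughly 6%, falls outside the matrix because scores above max_points=40 are cut off. Either way the Rams are favored, which is consistent with the predicted score of 31 to 28 above.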

In [10]:
# Probability LA Rams win (np.tril keeps the diagonal, so draws are included)

LA_Rams = game_simulation(poisson_model, "Los Angeles Rams", "New England Patriots", max_points=40)
np.sum(np.tril(LA_Rams))

0.62258498435038223

In [11]:
#Probability of Draw
np.sum(np.diag(LA_Rams))

0.048126417085664924

In [12]:
# Probability New England Patriots win (np.triu keeps the diagonal, so draws are included)

np.sum(np.triu(LA_Rams))

0.36165925266021326
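To separate wins from draws, the diagonal has to be left out of both triangle sums. In numpy that would be `np.tril(m, k=-1)` and `np.triu(m, k=1)`; the sketch below does the same thing with plain loops so it runs without any dependencies. It assumes independent Poisson scores with the two predicted means from the Prediction section, and the function name `outcome_probabilities` is made up for this example.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability of exactly k points, computed in log space for stability."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def outcome_probabilities(home_mean, away_mean, max_points):
    """Return (home win, draw, away win) over all scorelines from 0 to max_points."""
    home = [poisson_pmf(i, home_mean) for i in range(max_points + 1)]
    away = [poisson_pmf(j, away_mean) for j in range(max_points + 1)]
    home_win = draw = away_win = 0.0
    for i, p_home in enumerate(home):
        for j, p_away in enumerate(away):
            p = p_home * p_away
            if i > j:
                home_win += p   # strictly below the diagonal
            elif i == j:
                draw += p       # on the diagonal
            else:
                away_win += p   # strictly above the diagonal
    return home_win, draw, away_win


rams_mean, patriots_mean = 31.11805604, 28.16202667

# Compare the notebook's cap of 40 points with a much larger cap
for cap in (40, 100):
    rams, draw, patriots = outcome_probabilities(rams_mean, patriots_mean, cap)
    total = rams + draw + patriots
    print(f"max_points={cap}: Rams {rams:.4f}, draw {draw:.4f}, "
          f"Patriots {patriots:.4f}, total {total:.4f}")
```

With the cap at 40 the three outcomes do not add up to 1 because of the truncation; with a cap of 100 they do, and the Rams remain the favorite in both cases.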