# 49ers Super Bowl Score Prediction Model

#### Methodology: Use multi-step modeling to predict the 49ers' Super Bowl score. First, I ranked offensive stats (e.g., pass yards, rush yards) by their correlation with the 49ers' total points and fit a linear regression of points on the most important ones. I did this once for each of two data sets: one covering the past ten seasons and one covering only the 2019 season. Next, I regressed each important offensive feature separately on opponents' defensive stats and evaluated those regressions at Kansas City's defensive stats, projecting the 49ers' offensive stats for the Super Bowl under each data set. I then plugged the projected features into their respective score regressions to obtain two separate projected scores, one based on the last ten seasons and one based on the 2019 season. Finally, I took a weighted average of the two scores, giving more weight to the 2019 season.
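The methodology is a two-stage regression. Below is a minimal sketch of that idea, not the notebook's exact code, under several assumptions: it uses plain least squares via NumPy instead of scikit-learn, feeds every offensive feature the same defensive columns (the notebook hand-picks a different subset per feature), and runs on synthetic noise-free data. The names `fit_ols`, `predict` and `two_stage_score` are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept column; returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return beta[0], beta[1:]

def predict(intercept, coefs, x):
    """Evaluate intercept + coefs . x for one observation."""
    return float(intercept + np.dot(coefs, x))

def two_stage_score(offense, defense, points, opponent_defense):
    """Stage 1: regress points on the offensive features.
    Stage 2: regress each offensive feature on defensive stats, project it
    for the upcoming opponent, and plug the projections into stage 1."""
    b0, b = fit_ols(offense, points)
    projected = [
        predict(*fit_ols(defense, offense[:, j]), opponent_defense)
        for j in range(offense.shape[1])
    ]
    return predict(b0, b, projected)

# Synthetic, noise-free data (NOT the 49ers CSV): 3 defensive stats drive
# 2 offensive features, which in turn drive points scored.
rng = np.random.default_rng(0)
defense = rng.normal(size=(170, 3))
offense = defense @ np.array([[1.0, 0.5], [-0.3, 2.0], [0.8, -1.0]]) + [5.0, 3.0]
points = offense @ np.array([2.0, 1.5]) + 10.0
opponent = np.array([0.2, -0.1, 0.4])  # hypothetical opponent defensive profile

ten_season = two_stage_score(offense, defense, points, opponent)             # all rows
recent = two_stage_score(offense[:18], defense[:18], points[:18], opponent)  # "2019" rows
blend = 0.25 * ten_season + 0.75 * recent  # weight the recent season more heavily
print(round(ten_season, 2), round(recent, 2), round(blend, 2))
```

Because the synthetic data has no noise, the 170-row and 18-row fits both recover the same projection (24.85). With real data the two estimates differ, and the 25/75 blend decides how much each one counts.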

In [1]:
import numpy as np
import pandas as pd
from sklearn import linear_model


In [9]:
data=pd.read_csv('/Users/jt/Downloads/49ersfinal.csv')

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 170 entries, 0 to 169
Data columns (total 47 columns):
Tm             170 non-null int64
OCmp           170 non-null int64
OAtt           170 non-null int64
PassYds        170 non-null int64
OInt           170 non-null int64
Sk taken       170 non-null int64
OYds           170 non-null int64
Pass Y/A       170 non-null float64
Pass NY/A      170 non-null float64
Cmp%           170 non-null float64
qBRate         170 non-null float64
Att            170 non-null int64
rushYds        170 non-null int64
RUSH Y/A       170 non-null float64
Pnt            170 non-null int64
pntYds         170 non-null int64
3DConv         170 non-null int64
3DAtt          170 non-null int64
4DConv         170 non-null int64
4DAtt          170 non-null int64
Team Name      170 non-null object
GDef           170 non-null float64
PFDef          170 non-null float64
TotalYdsDef    170 non-null float64
PlyDef         170 non-null float64
Y/PDef         170 non-nul

In [10]:
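# Flag any feature whose |correlation| with an earlier feature exceeds 0.8.
# select_dtypes drops the text column 'Team Name', which recent pandas
# versions refuse to correlate.
correlated_features = set()
correlation_matrix = data.drop('Tm', axis=1).select_dtypes('number').corr()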

for i in range(len(correlation_matrix.columns)):
    for j in range(i):
        if abs(correlation_matrix.iloc[i, j]) > 0.8:
            colname = correlation_matrix.columns[i]
            correlated_features.add(colname)

print(correlated_features)

{'Pass NY/A', 'pntYds', 'Sc%Def', 'PassYdsDef', 'AttDef', 'OAtt', '1stDDefP', '1stDDef', '1stDDefr', 'OYds', 'TO%Def', 'qBRate', 'NY/ADef', 'Y/PDef', 'PenYdsDef', 'IntDef'}
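The nested loop above walks the lower triangle of the correlation matrix and flags column `i` when it is strongly correlated with any earlier column `j < i`. A vectorized equivalent, sketched with a hypothetical helper name and a toy frame rather than the real CSV:

```python
import numpy as np
import pandas as pd

def highly_correlated(df, threshold=0.8):
    """Columns whose |correlation| with any earlier column exceeds threshold.
    Mirrors the nested loop above, which only checks pairs with j < i."""
    corr = df.select_dtypes('number').corr().abs()
    lower = np.tril(corr.to_numpy(), k=-1)  # zero out the diagonal and upper triangle
    flagged = (lower > threshold).any(axis=1)
    return set(corr.columns[flagged])

# Toy frame (hypothetical values): 'x_scaled' is nearly a multiple of 'x',
# 'noise' is unrelated, and 'team' is text that select_dtypes drops.
toy = pd.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'x_scaled': [2, 4, 6, 8, 10.5],
    'noise': [5, 1, 4, 2, 3],
    'team': ['SF', 'KC', 'SF', 'KC', 'SF'],
})
print(highly_correlated(toy))  # {'x_scaled'}
```

Only the later column of each correlated pair is flagged, which is why both `Pass NY/A` and `qBRate` can appear in the set while `Pass Y/A` does not.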


In [14]:
# select_dtypes drops the text column 'Team Name' (recent pandas raises on it)
corr = data.select_dtypes('number').corr()
#Correlation with output variable
cor_target = abs(corr["Tm"])
#Selecting highly correlated features
relevant_features = cor_target[cor_target>.4]
relevant_features

Tm           1.000000
Sk taken     0.435890
Pass Y/A     0.584282
Pass NY/A    0.633727
qBRate       0.680982
Att          0.503850
rushYds      0.514343
Pnt          0.403251
Name: Tm, dtype: float64
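The same correlate-then-threshold pattern repeats for every target below. A small helper (hypothetical name `select_by_target_corr`) captures it, and also drops the target's own 1.0 entry and any non-numeric columns; the toy data is illustrative only:

```python
import pandas as pd

def select_by_target_corr(df, target, threshold):
    """Numeric columns whose |correlation| with `target` exceeds `threshold`,
    strongest first, excluding the target's own 1.0 entry."""
    corr = df.select_dtypes('number').corr()[target].abs()
    picked = corr[(corr > threshold) & (corr.index != target)]
    return picked.sort_values(ascending=False)

toy = pd.DataFrame({
    'Tm': [10, 20, 30, 40, 50],
    'a': [1, 2, 3, 4, 6],     # tracks Tm closely
    'b': [3, 1, 2, 5, 1],     # uncorrelated with Tm
    'Team Name': ['SF'] * 5,  # text column, dropped before corr()
})
print(select_by_target_corr(toy, 'Tm', 0.4))
```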

## Most Important Features for Predicting San Francisco's Score Over 10 Seasons: QB Rating, Net Passing Yards per Attempt, Rushing Yards, Rushing Attempts (`Att`) and Punts

## Original Score Regression Equation (Figure 1)

In [29]:
X=data[['qBRate','Pass NY/A','rushYds','Att','Pnt']]
y=data[['Tm']]


lm = linear_model.LinearRegression()
model = lm.fit(X,y)
print('Score:', model.score(X, y))
print('intercept:', model.intercept_)
print('49ers:', model.coef_)

Score: 0.629499695315029
intercept: [-8.14191842]
49ers: [[ 0.15039175  1.10708581  0.05185833  0.1531507  -0.18188369]]
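Written out with coefficients rounded to four decimals, the Figure 1 equation (in-sample $R^2 \approx 0.63$) is:

$$
\widehat{\text{Tm}} = -8.1419 + 0.1504\,\text{qBRate} + 1.1071\,\text{Pass NY/A} + 0.0519\,\text{rushYds} + 0.1532\,\text{Att} - 0.1819\,\text{Pnt}
$$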


## Predicting QB Rating Using Defensive Stats

In [30]:
corr = data.select_dtypes('number').corr()
#Correlation with output variable
cor_target = abs(corr["qBRate"])
#Selecting highly correlated features
relevant_features = cor_target[cor_target>.25]
relevant_features

Tm             0.680982
PassYds        0.517997
OInt           0.590461
Sk taken       0.326820
OYds           0.261479
Pass Y/A       0.801849
Pass NY/A      0.787481
Cmp%           0.666620
qBRate         1.000000
Att            0.290773
Pnt            0.321828
pntYds         0.293324
PFDef          0.310070
TotalYdsDef    0.358403
Y/PDef         0.374231
1stDDef        0.266760
PassYdsDef     0.258001
NY/ADef        0.358225
RushYdsDef     0.266330
EXPDef         0.274644
Name: qBRate, dtype: float64
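`qBRate` is presumably the standard NFL passer rating (an assumption; the notebook does not define it). That formula is built from completion percentage, yards per attempt, touchdowns per attempt and interceptions per attempt, which fits the strong correlations above with `Pass Y/A` (0.80), `Cmp%` (0.67) and `OInt` (0.59). A sketch of the standard formula, using a hypothetical helper name:

```python
def passer_rating(cmp, att, yds, td, ints):
    """Standard NFL passer rating (hypothetical helper name). Each of the four
    components is clamped to [0, 2.375] before averaging."""
    def clamp(v):
        return max(0.0, min(2.375, v))
    a = clamp((cmp / att - 0.3) * 5)       # completion percentage
    b = clamp((yds / att - 3) * 0.25)      # yards per attempt
    c = clamp(td / att * 20)               # touchdowns per attempt
    d = clamp(2.375 - ints / att * 25)     # interceptions per attempt
    return (a + b + c + d) / 6 * 100

print(round(passer_rating(20, 30, 250, 2, 1), 1))  # 100.7
print(round(passer_rating(10, 10, 200, 2, 0), 1))  # 158.3, the maximum
```

Touchdowns are not among the columns printed by `data.info()` above, so this sketch cannot be checked row by row against `qBRate`.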

In [35]:
X=data[['PFDef','Y/PDef','EXPDef','1stDDef']]
y=data[['qBRate']]


lm = linear_model.LinearRegression()
model = lm.fit(X,y)
print('Score:', model.score(X, y))
print('intercept:', model.intercept_)
print('QBR:', model.coef_)

Score: 0.14546752056821033
intercept: [-32.16639052]
QBR: [[ 7.52311851e-01  1.97426815e+01 -8.99829051e-04 -2.18006557e-01]]


## 49ers Projected QB Rating Given the Chiefs' Defensive Stats = 84.2799679

## Predicting Pass NY/A

In [36]:
corr = data.select_dtypes('number').corr()
#Correlation with output variable
cor_target = abs(corr["Pass NY/A"])
#Selecting highly correlated features
relevant_features = cor_target[cor_target>.2]
relevant_features

Tm             0.633727
PassYds        0.706043
Sk taken       0.548172
OYds           0.488160
Pass Y/A       0.934568
Pass NY/A      1.000000
Cmp%           0.511626
qBRate         0.787481
Att            0.281973
Pnt            0.510320
pntYds         0.479986
3DAtt          0.259162
PFDef          0.278579
TotalYdsDef    0.342528
Y/PDef         0.342916
1stDDef        0.233103
PassYdsDef     0.233325
NY/ADef        0.336762
RushAttDef     0.209030
RushYdsDef     0.272535
RTDDef         0.210288
Y/ADef         0.224710
1stDDefr       0.223664
Sc%Def         0.235842
EXPDef         0.204029
Name: Pass NY/A, dtype: float64

In [37]:
X=data[['Y/PDef','1stDDef','Sc%Def','RushAttDef']]
y=data[['Pass NY/A']]


lm = linear_model.LinearRegression()
model = lm.fit(X,y)
print('Score:', model.score(X, y))
print('intercept:', model.intercept_)
print('Pass NY/A:', model.coef_)

Score: 0.14301172943147888
intercept: [-6.31145198]
Pass NY/A: [[ 2.07264826 -0.01405938 -0.0518262   0.13323889]]


## Projected Pass NY/A Given the Chiefs' Defensive Stats = 6.24959

## Predicting Rush Yds

In [39]:
corr = data.select_dtypes('number').corr()
#Correlation with output variable
cor_target = abs(corr["rushYds"])
#Selecting highly correlated features
relevant_features = cor_target[cor_target>.2]
relevant_features

Tm            0.514343
OCmp          0.372039
OAtt          0.445819
PassYds       0.205732
OInt          0.204210
Sk taken      0.326696
OYds          0.293045
qBRate        0.240237
Att           0.698615
rushYds       1.000000
RUSH Y/A      0.796252
Pnt           0.326171
pntYds        0.324189
1stDDef       0.232767
RushAttDef    0.306531
RushYdsDef    0.398460
RTDDef        0.271403
Y/ADef        0.315436
1stDDefr      0.351612
1stPyDef      0.222947
Name: rushYds, dtype: float64

In [40]:
X=data[['RushYdsDef','1stDDefr','RTDDef','1stPyDef']]
y=data[['rushYds']]


lm = linear_model.LinearRegression()
model = lm.fit(X,y)
print('Score:', model.score(X, y))
print('intercept:', model.intercept_)
print('Rush Yds:', model.coef_)

Score: 0.21470569258474093
intercept: [-75.79458795]
Rush Yds: [[ 1.26621475 -0.76177122  8.77300035 30.41496579]]


## Projected Rushing Yards Given the Chiefs' Defensive Stats = 162.99

## Predicting Rushing Attempts (`Att`)

In [42]:
corr = data.select_dtypes('number').corr()
#Correlation with output variable
cor_target = abs(corr["Att"])
#Selecting highly correlated features
relevant_features = cor_target[cor_target>.15]
relevant_features

Tm             0.503850
OCmp           0.321234
OAtt           0.419999
OInt           0.249406
Sk taken       0.431372
OYds           0.450617
Pass Y/A       0.194786
Pass NY/A      0.281973
qBRate         0.290773
Att            1.000000
rushYds        0.698615
RUSH Y/A       0.160598
Pnt            0.281651
pntYds         0.261763
3DConv         0.399651
3DAtt          0.182210
PFDef          0.169546
TotalYdsDef    0.175431
PlyDef         0.267664
1stDDef        0.225631
AttDef         0.197544
RushAttDef     0.419460
RushYdsDef     0.355497
RTDDef         0.276532
1stDDefr       0.366773
1stPyDef       0.174090
Name: Att, dtype: float64

In [45]:
X=data[['PlyDef','1stDDefr','RushAttDef','RTDDef']]
y=data[['Att']]


lm = linear_model.LinearRegression()
model = lm.fit(X,y)
print('Score:', model.score(X, y))
print('intercept:', model.intercept_)
print('Attempts:', model.coef_)

Score: 0.1846298255064701
intercept: [-10.42811164]
Attempts: [[0.19106544 0.62779357 0.84245307 0.93059086]]


## Projected Rushing Attempts Given the Chiefs' Defensive Stats = 29.265

## Predicting Punts

In [46]:
corr = data.select_dtypes('number').corr()
#Correlation with output variable
cor_target = abs(corr["Pnt"])
#Selecting highly correlated features
relevant_features = cor_target[cor_target>.15]
relevant_features

Tm             0.403251
OAtt           0.156835
PassYds        0.310015
Sk taken       0.398670
OYds           0.338105
Pass Y/A       0.442368
Pass NY/A      0.510320
Cmp%           0.401857
qBRate         0.321828
Att            0.281651
rushYds        0.326171
RUSH Y/A       0.210734
Pnt            1.000000
pntYds         0.966533
3DConv         0.243464
3DAtt          0.350686
TotalYdsDef    0.172677
Y/PDef         0.202447
1stDDef        0.170039
RushYdsDef     0.182913
Y/ADef         0.206254
Sc%Def         0.241600
EXPDef         0.196469
Name: Pnt, dtype: float64

In [48]:
X=data[['Sc%Def','Y/PDef','RushYdsDef','EXPDef']]
y=data[['Pnt']]


lm = linear_model.LinearRegression()
model = lm.fit(X,y)
print('Score:', model.score(X, y))
print('intercept:', model.intercept_)
print('Punts:', model.coef_)

Score: 0.06515841901976904
intercept: [7.94534146]
Punts: [[-0.06531023  0.03513437 -0.00993858  0.0204101 ]]


## Projected Number of 49ers Punts Given the Chiefs' Defense = 4.524

# 49ers Projected Score Using Projected Features with Figure 1 Regression = 23.5634
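The Figure 1 projection is the regression equation evaluated at the five projected features. Recomputing it from the printed coefficients and the projected values in the headings above reproduces the reported score:

```python
import numpy as np

# Figure 1 intercept and coefficients, copied from the printed output
# (column order: qBRate, Pass NY/A, rushYds, Att, Pnt).
intercept_10yr = -8.14191842
coefs_10yr = np.array([0.15039175, 1.10708581, 0.05185833, 0.1531507, -0.18188369])

# Projected 49ers features against the Chiefs, from the headings above.
projected_10yr = np.array([84.2799679, 6.24959, 162.99, 29.265, 4.524])

score_10yr = intercept_10yr + coefs_10yr @ projected_10yr
print(round(float(score_10yr), 4))  # 23.5634
```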

# Modeling Using Only the 2019 Season

In [49]:
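# Subset the data to the 2019 season: the first 18 rows hold the 2019 games
# (16 regular-season games plus 2 playoff games). This overwrites `data`.
data = data.iloc[0:18]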
print(data)

    Tm  OCmp  OAtt  PassYds  OInt  Sk taken  OYds  Pass Y/A  Pass NY/A  Cmp%  \
0   31    18    27      158     1         1     8       6.1        5.6  66.7   
1   41    18    26      312     1         0     0      12.0       12.0  69.2   
2   24    23    32      268     2         1     9       8.7        8.1  71.9   
3   31    20    29      171     0         2    10       6.2        5.5  69.0   
4   20    24    33      232     1         2    11       7.4        6.6  72.7   
5    9    12    21      146     1         2     5       7.2        6.3  57.1   
6   51    18    22      156     1         3    19       8.0        6.2  81.8   
7   28    28    37      310     0         1     7       8.6        8.2  75.7   
8   24    24    46      215     1         5    33       5.4        4.2  52.2   
9   36    34    45      408     2         2    16       9.4        8.7  75.6   
10  37    14    20      227     0         3    26      12.7        9.9  70.0   
11  17    15    21      157     0       
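The printed rows also show how the passing rate stats relate to the raw counts. Assuming (as the numbers suggest, though the notebook does not say so) that `PassYds` is already net of sack yardage and `OYds` is yards lost to sacks, the first five 2019 rows reproduce `Pass Y/A`, `Pass NY/A` and `Cmp%` to one decimal. This near-definitional overlap explains why `Pass Y/A` and `Pass NY/A` correlate at 0.93 in the 10-season data.

```python
import pandas as pd

# First five 2019 rows, copied from the printed table above.
rows = pd.DataFrame({
    'OCmp':     [18, 18, 23, 20, 24],
    'OAtt':     [27, 26, 32, 29, 33],
    'PassYds':  [158, 312, 268, 171, 232],
    'Sk taken': [1, 0, 1, 2, 2],
    'OYds':     [8, 0, 9, 10, 11],
})

# Assumption: PassYds is net of sack yards (OYds), so gross yards add them back.
rows['Pass Y/A'] = (rows['PassYds'] + rows['OYds']) / rows['OAtt']
# Net yards per attempt: net passing yards over dropbacks (attempts + sacks).
rows['Pass NY/A'] = rows['PassYds'] / (rows['OAtt'] + rows['Sk taken'])
rows['Cmp%'] = 100 * rows['OCmp'] / rows['OAtt']
print(rows[['Pass Y/A', 'Pass NY/A', 'Cmp%']].round(1))
```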

In [52]:
corr = data.select_dtypes('number').corr()
#Correlation with output variable
cor_target = abs(corr["Tm"])
#Selecting highly correlated features
relevant_features = cor_target[cor_target>.4]
relevant_features

Tm          1.000000
Pass Y/A    0.428956
Cmp%        0.461774
qBRate      0.552706
RUSH Y/A    0.469184
3DAtt       0.482738
4DAtt       0.502788
Name: Tm, dtype: float64

## Most Important Features for Predicting the 49ers' 2019 Score: QB Rating, 3rd Down Attempts, Rush Yards per Attempt, Completion %

## Score Regression Equation for 2019 Data (Figure 2)

In [59]:
X=data[['qBRate','3DAtt','RUSH Y/A','Cmp%']]
y=data[['Tm']]


lm = linear_model.LinearRegression()
model = lm.fit(X,y)
print('Score:', model.score(X, y))
print('intercept:', model.intercept_)

print('49ers:', model.coef_)

Score: 0.4388430068976401
intercept: [14.64774963]
49ers: [[ 0.13049058 -0.98543987  1.14450736  0.13495023]]
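`model.score(X, y)` returns the in-sample $R^2$. With only 18 games and four predictors in Figure 2, an adjusted $R^2$, which penalizes the number of predictors relative to the sample size, gives a more conservative read of fit. A small sketch (hypothetical helper name) applied to the two printed scores, with n = 170 from `data.info()` and n = 18 from the 2019 subset:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# In-sample R^2 values printed by model.score(X, y).
fig1 = adjusted_r2(0.629499695315029, n=170, p=5)   # ten seasons, 5 features
fig2 = adjusted_r2(0.4388430068976401, n=18, p=4)   # 2019 only, 4 features
print(round(fig1, 3), round(fig2, 3))  # 0.618 0.266
```

The penalty barely moves Figure 1 but cuts Figure 2's fit noticeably, because 18 games leave few degrees of freedom for four predictors.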


## Predicting QB Rating

In [63]:
corr = data.select_dtypes('number').corr()
#Correlation with output variable
cor_target = abs(corr["qBRate"])
#Selecting highly correlated features
relevant_features = cor_target[cor_target>.25]
relevant_features

Tm           0.552706
PassYds      0.497805
OInt         0.427488
Sk taken     0.319470
Pass Y/A     0.682556
Pass NY/A    0.724381
Cmp%         0.697563
qBRate       1.000000
Att          0.287602
RUSH Y/A     0.406866
3DAtt        0.303927
PlyDef       0.263761
Y/PDef       0.424514
TODef        0.449635
FLDef        0.439837
IntDef       0.290445
NY/ADef      0.353810
Y/ADef       0.285080
TO%Def       0.370213
EXPDef       0.368205
Name: qBRate, dtype: float64

In [68]:
X=data[['TODef','Y/PDef','EXPDef']]
y=data[['qBRate']]

lm = linear_model.LinearRegression()
model = lm.fit(X,y)
print('Score:', model.score(X, y))
print('intercept:', model.intercept_)

print('QBR:', model.coef_)

Score: 0.36519616057844373
intercept: [-212.11911274]
QBR: [[-50.21299283  75.24149289   7.35700727]]


## Projected QB Rating Given the Chiefs' Defensive Stats = 94.068752

## Predicting 3rd Down Attempts

In [69]:
corr = data.select_dtypes('number').corr()
#Correlation with output variable
cor_target = abs(corr["3DAtt"])
#Selecting highly correlated features
relevant_features = cor_target[cor_target>.25]
relevant_features

Tm           0.482738
OCmp         0.380998
OAtt         0.473901
Pass Y/A     0.667516
Pass NY/A    0.550019
Cmp%         0.295484
qBRate       0.303927
rushYds      0.279558
RUSH Y/A     0.505898
Pnt          0.348242
pntYds       0.352700
3DConv       0.790207
3DAtt        1.000000
4DConv       0.644804
4DAtt        0.622388
PFDef        0.266032
PlyDef       0.349710
1stDDef      0.383455
CmpDef       0.356726
TDDef        0.506293
IntDef       0.389192
1stDDefP     0.277249
1stPyDef     0.520025
Name: 3DAtt, dtype: float64

In [73]:
X=data[['TDDef','1stPyDef','IntDef','PlyDef']]
y=data[['3DAtt']]

lm = linear_model.LinearRegression()
model = lm.fit(X,y)
print('Score:', model.score(X, y))
print('intercept:', model.intercept_)

print('3rd Down Attempts:', model.coef_)

Score: 0.3594512768538465
intercept: [-6.47440384]
3rd Down Attempts: [[1.34184595 2.61741732 0.20131184 0.17785535]]


## Projected 3rd Down Attempts Given the Chiefs' Defense = 13.467393

## Predicting Rush Yards Per Attempt

In [72]:
corr = data.select_dtypes('number').corr()
#Correlation with output variable
cor_target = abs(corr["RUSH Y/A"])
#Selecting highly correlated features
relevant_features = cor_target[cor_target>.5]
relevant_features

OAtt        0.535581
rushYds     0.812314
RUSH Y/A    1.000000
3DAtt       0.505898
PlyDef      0.609466
CmpDef      0.652653
TDDef       0.529295
1stDDefP    0.585332
Name: RUSH Y/A, dtype: float64

In [74]:
X=data[['1stDDefP','TDDef','CmpDef','PlyDef']]
y=data[['RUSH Y/A']]

lm = linear_model.LinearRegression()
model = lm.fit(X,y)
print('Score:', model.score(X, y))
print('intercept:', model.intercept_)

print('Rush Yards/Attempt:', model.coef_)

Score: 0.5175226589440517
intercept: [26.83551545]
Rush Yards/Attempt: [[ 0.3819564   0.07732942 -0.52797497 -0.23680891]]


## Projected Rush Yards per Attempt Given the Chiefs' Defensive Stats = 4.4201879

## Predicting Completion %

In [77]:
corr = data.select_dtypes('number').corr()
#Correlation with output variable
cor_target = abs(corr["Cmp%"])
#Selecting highly correlated features
relevant_features = cor_target[cor_target>.15]
relevant_features

Tm           0.461774
OCmp         0.233359
PassYds      0.323498
OInt         0.228409
Sk taken     0.383621
OYds         0.158548
Pass Y/A     0.526085
Pass NY/A    0.557067
Cmp%         1.000000
qBRate       0.697563
rushYds      0.181496
RUSH Y/A     0.282035
Pnt          0.394789
pntYds       0.426276
3DAtt        0.295484
4DAtt        0.236679
TODef        0.203851
FLDef        0.177973
RTDDef       0.185599
Y/ADef       0.151547
PenDef       0.218650
PenYdsDef    0.294998
TO%Def       0.225437
Name: Cmp%, dtype: float64

In [78]:
X=data[['PenYdsDef','TO%Def','RTDDef','Y/ADef']]
y=data[['Cmp%']]

lm = linear_model.LinearRegression()
model = lm.fit(X,y)
print('Score:', model.score(X, y))
print('intercept:', model.intercept_)

print('Completion %:', model.coef_)

Score: 0.15918540581539053
intercept: [42.46012234]
Completion %: [[ 0.34876852 -0.19668166  4.26464464  1.29411659]]


## Projected Completion % Given the Chiefs' Defensive Stats = 68.294288

## 49ers Projected Score Using 2019 Projected Features With Figure 2 Regression = 27.9268
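Likewise, plugging the four projected 2019-model features from the headings above into the Figure 2 coefficients reproduces the reported score:

```python
import numpy as np

# Figure 2 intercept and coefficients, copied from the printed output
# (column order: qBRate, 3DAtt, RUSH Y/A, Cmp%).
intercept_2019 = 14.64774963
coefs_2019 = np.array([0.13049058, -0.98543987, 1.14450736, 0.13495023])

# Projected 49ers features against the Chiefs under the 2019-only models.
projected_2019 = np.array([94.068752, 13.467393, 4.4201879, 68.294288])

score_2019 = intercept_2019 + coefs_2019 @ projected_2019
print(round(float(score_2019), 4))  # 27.9268
```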

## 49ers Projected Score Using Projected Features with Figure 1 Regression = 23.5634

# Weighted Average of Both: 25% × 23.5634 + 75% × 27.9268 ≈ 26.836, or 27 Points
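The final blend is a convex combination of the two projections: because the weights sum to 1, the result always lies between the 10-season and 2019 scores, and moving weight toward 2019 pulls it toward 27.9268. Written out:

```python
score_10yr = 23.5634   # Figure 1 projection
score_2019 = 27.9268   # Figure 2 projection
w_2019 = 0.75          # extra weight on the 2019 season, per the methodology
blended = (1 - w_2019) * score_10yr + w_2019 * score_2019
print(round(blended, 3), round(blended))  # 26.836 27
```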