# Kansas City Super Bowl Score Prediction

#### Methodology

Use correlation-based feature selection to identify the variables most strongly associated with Kansas City's points scored in two separate datasets: the 2019 season and the last 10 seasons. Then predict each of those important features with a linear regression on opposing-defense statistics, evaluated at the 49ers' 2019 defensive averages. Next, plug the projected feature values into two linear regression models of Kansas City's score, one fit on the 2019 season only and one fit on the last 10 seasons. Finally, take a weighted average of the two projections, weighting the 2019 season more heavily, to produce a projected score for the Super Bowl.

In [305]:
# Import packages (the unused sklearn `datasets` and duplicate LinearRegression imports are dropped)
import numpy as np
import pandas as pd
from sklearn import linear_model


In [415]:
# Load the data
data = pd.read_csv('/Users/jt/Downloads/KCRS.csv')
data3 = pd.read_csv('/Users/jt/Downloads/KCRSStandardized.csv')
data4 = pd.read_csv('/Users/jt/Downloads/KCRSfinal.csv')
data3 = data3.dropna()        # drop rows with missing values
data4 = data4.iloc[:, 0:47]   # keep only the first 47 columns

In [429]:
# Collinearity check: flag any feature whose |correlation| with an earlier column exceeds 0.8
correlated_features = set()
correlation_matrix = data4.drop('Tm', axis=1).corr()

for i in range(len(correlation_matrix.columns)):
    for j in range(i):
        if abs(correlation_matrix.iloc[i, j]) > 0.8:
            colname = correlation_matrix.columns[i]
            correlated_features.add(colname)

correlated_features

{'1stDDef',
 '1stDDefP',
 '1stDDefr',
 'AttDef',
 'IntDef',
 'NY/ADef',
 'OAtt',
 'OYds',
 'Pass NY/A',
 'PassYdsDef',
 'PenYdsDef',
 'Sc%Def',
 'TO%Def',
 'Y/PDef',
 'pntYds'}
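The set above names the flagged columns but not the columns they collide with. A minimal sketch of a variant that reports each offending pair and its correlation is shown here; the helper name `correlated_pairs` and the toy data are made up for illustration and are not part of the notebook's data.

```python
import pandas as pd

def correlated_pairs(df, threshold=0.8):
    """Return (col_a, col_b, r) for every column pair with |r| > threshold."""
    corr = df.corr()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i):  # lower triangle only, so each pair appears once
            r = float(corr.iloc[i, j])
            if abs(r) > threshold:
                pairs.append((cols[j], cols[i], r))
    return pairs

# Toy data (made up): b tracks a closely, c does not.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 4, 6, 8, 10, 13],
    "c": [5, 1, 4, 2, 6, 3],
})

pairs = correlated_pairs(toy)
for left, right, r in pairs:
    print(f"{left} ~ {right}: r = {r:.3f}")
```

Seeing the pairs makes it easier to decide which member of each collinear pair to keep as a regression input.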

# Most Important Features Given 10 years of Records

In [336]:
#All Records Model
corr = data.corr()
corr.head()
#Correlation with output variable
cor_target = abs(corr["Tm"])
#Selecting highly correlated features
relevant_features = cor_target[cor_target>0.35]
relevant_features

Tm           1.000000
OCmp         0.353253
PassYds      0.558114
Pass Y/A     0.613573
Pass NY/A    0.639146
Cmp%         0.482553
qBRate       0.734852
Pnt          0.435199
pntYds       0.452020
Name: Tm, dtype: float64
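The same "correlate with the target, keep everything above a threshold" step is repeated for every target in this notebook. One possible way to wrap it is sketched here; `select_by_target_corr` is a hypothetical helper and the toy frame is made up. Unlike the cell above, it drops the target's own row, which always has correlation 1.

```python
import pandas as pd

def select_by_target_corr(df, target, threshold):
    """Absolute correlations with `target` above `threshold`, strongest first."""
    strength = df.corr()[target].abs()
    strength = strength.drop(target)  # the target always correlates 1.0 with itself
    return strength[strength > threshold].sort_values(ascending=False)

# Toy data (made up): x1 moves with y, x2 is close to unrelated.
toy = pd.DataFrame({
    "y":  [1, 2, 3, 4, 5, 6],
    "x1": [2, 4, 6, 8, 10, 13],
    "x2": [5, 1, 4, 2, 6, 3],
})

selected = select_by_target_corr(toy, target="y", threshold=0.35)
print(selected)
```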

## Most Important Features Predicting Kansas City's Score Given 10 Years of Records Are Net Pass Yards/Attempt (Pass NY/A), QBR (qBRate), Comp% and Punts

## 10 Year Score Regression Equation 

In [337]:
#Most Important Features Predicting score Given All records
X=data[['Pass NY/A','qBRate','Cmp%','Pnt']]
y=data[['Tm']]


lm = linear_model.LinearRegression()
model = lm.fit(X,y)



In [338]:
print('Score:', model.score(X, y))

Score: 0.5788556786336514


In [328]:
#Regression for all records
print('intercept:', model.intercept_)

print('KC Score Coef:', model.coef_)

intercept: [7.49767352]
KC Score Coef: [[ 0.76444273  0.2649305  -0.13694137 -0.80508179]]
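Written out, with the coefficients rounded to three decimals and paired with the features in the order they were passed to `X`, the fitted 10-year equation reads:

$$
\widehat{\text{Tm}} = 7.498 + 0.764\,(\text{Pass NY/A}) + 0.265\,(\text{qBRate}) - 0.137\,(\text{Cmp\%}) - 0.805\,(\text{Pnt})
$$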


## Predicting Pass NY/A

In [421]:

corr = data4.corr()
corr.head()
#Correlation with output variable
cor_target = abs(corr["Pass NY/A"])
#Selecting highly correlated features
relevant_features = cor_target[cor_target>0.25]
relevant_features

Tm             0.645039
OCmp           0.328995
PassYds        0.760972
Sk taken       0.462355
OYds           0.422129
Pass Y/A       0.949785
Pass NY/A      1.000000
Cmp%           0.565369
qBRate         0.733114
Pnt            0.594274
pntYds         0.602594
3DConv         0.251544
3DAtt          0.274648
TotalYdsDef    0.297717
Y/PDef         0.368874
1stDDef        0.263512
PassYdsDef     0.303551
NY/ADef        0.351942
Sc%Def         0.308101
EXPDef         0.345237
Name: Pass NY/A, dtype: float64

In [424]:
X=data4[['Y/PDef','Sc%Def','EXPDef','1stDDef']]
y=data4[['Pass NY/A']]
lm = linear_model.LinearRegression()
model = lm.fit(X,y)
print('Score:', model.score(X, y))
print('intercept:', model.intercept_)
print('Pass NY/A:', model.coef_)

Score: 0.1587852015225215
intercept: [-0.21265145]
Pass NY/A: [[ 1.02329635  0.01350145 -0.06274578  0.03249883]]


## Chiefs Projected Pass NY/A Given 49ers Stats = 5.189
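The notebook does not show the cell that turns the fitted model and the 49ers' defensive averages into this projection. A minimal sketch of one way to do it with `predict` follows. The column names `def_stat_1` and `def_stat_2`, the training rows, and the opponent values are all made up for illustration; they are not the 49ers' actual numbers.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

features = ["def_stat_1", "def_stat_2"]  # hypothetical defensive columns

# Made-up training rows in which the offensive stat is exactly 1 + 2*d1 + 3*d2.
train = pd.DataFrame({
    "def_stat_1": [0, 1, 2, 3, 4],
    "def_stat_2": [1, 0, 2, 1, 3],
})
train["offense_stat"] = 1 + 2 * train["def_stat_1"] + 3 * train["def_stat_2"]

model = LinearRegression().fit(train[features], train["offense_stat"])

# One-row frame holding the opponent's season averages (placeholder values).
opponent = pd.DataFrame([{"def_stat_1": 1.0, "def_stat_2": 1.0}], columns=features)
projected = float(model.predict(opponent)[0])
print(round(projected, 3))
```

Passing a one-row DataFrame with the same column order used in `fit` keeps the inputs aligned with the coefficients.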

## Predicting QBR

In [426]:
#qbr
corr = data4.corr()
corr.head()
#Correlation with output variable
cor_target = abs(corr["qBRate"])
#Selecting highly correlated features
relevant_features = cor_target[cor_target>0.2]
relevant_features

Tm             0.735357
OCmp           0.394869
PassYds        0.594238
OInt           0.573691
Pass Y/A       0.750265
Pass NY/A      0.733114
Cmp%           0.693766
qBRate         1.000000
Pnt            0.365489
pntYds         0.373933
3DConv         0.302736
TotalYdsDef    0.216639
Y/PDef         0.263001
1stDDef        0.209333
CmpDef         0.209637
PassYdsDef     0.218310
IntDef         0.204248
NY/ADef        0.247409
Sc%Def         0.294955
EXPDef         0.369308
Name: qBRate, dtype: float64

In [431]:
X=data4[['EXPDef','Sc%Def','NY/ADef','CmpDef']]
y=data4[['qBRate']]
lm = linear_model.LinearRegression()
model = lm.fit(X,y)
print('Score:', model.score(X, y))
print('intercept:', model.intercept_)
print('QBR:', model.coef_)

Score: 0.15981815821154877
intercept: [36.80485427]
QBR: [[-1.50241314  0.72614923 -1.60336251  1.80685669]]


## Chiefs Projected QBR Given 49ers Stats = 77.109

## Predicting Comp%

In [434]:
#Cmp%
corr = data4.corr()
corr.head()
#Correlation with output variable
cor_target = abs(corr["Cmp%"])
#Selecting highly correlated features
relevant_features = cor_target[cor_target>0.15]
relevant_features

Tm             0.487861
OCmp           0.477258
PassYds        0.400461
OInt           0.206050
OYds           0.175776
Pass Y/A       0.592624
Pass NY/A      0.565369
Cmp%           1.000000
qBRate         0.693766
Pnt            0.440259
pntYds         0.437154
3DConv         0.291407
4DAtt          0.171675
TotalYdsDef    0.157335
Y/PDef         0.199159
CmpDef         0.284468
PassYdsDef     0.197295
NY/ADef        0.156924
Sc%Def         0.167139
EXPDef         0.192271
Name: Cmp%, dtype: float64

In [436]:
X=data4[['CmpDef','PassYdsDef','EXPDef','Sc%Def']]
y=data4[['Cmp%']]
lm = linear_model.LinearRegression()
model = lm.fit(X,y)
print('Score:', model.score(X, y))
print('intercept:', model.intercept_)
print('Cmp%:', model.coef_)

Score: 0.11014947450373946
intercept: [34.91352477]
Cmp%: [[ 1.64633665 -0.0617111  -0.2243508   0.21479017]]


## Chiefs Projected Comp% Given 49ers Stats = 62.116%

## Projected Punts

In [440]:
#pnt
corr = data4.corr()
corr.head()
#Correlation with output variable
cor_target = abs(corr["Pnt"])
#Selecting highly correlated features
relevant_features = cor_target[cor_target>0.15]
relevant_features

Tm            0.422487
OCmp          0.353749
PassYds       0.536156
Sk taken      0.292438
OYds          0.241011
Pass Y/A      0.553819
Pass NY/A     0.594274
Cmp%          0.440259
qBRate        0.365489
Pnt           1.000000
pntYds        0.963743
3DConv        0.349650
3DAtt         0.261149
PlyDef        0.179803
Y/PDef        0.243618
CmpDef        0.180131
PassYdsDef    0.204513
NY/ADef       0.208005
1stDDefP      0.179565
RushAttDef    0.249465
Sc%Def        0.173459
EXPDef        0.294719
Name: Pnt, dtype: float64

In [441]:
X=data4[['EXPDef','Y/PDef','RushAttDef','CmpDef']]
y=data4[['Pnt']]
lm = linear_model.LinearRegression()
model = lm.fit(X,y)
print('Score:', model.score(X, y))
print('intercept:', model.intercept_)
print('Pnt:', model.coef_)

Score: 0.165742334694149
intercept: [2.50353237]
Pnt: [[ 0.07890892 -0.68793236  0.21425513  0.00167077]]


## Chiefs Expected Punts Given 49ers Stats = 5.15

## Chiefs Expected Score Using the Original 10-Year Score Regression From Above, Plugging In the Projected Feature Values = 19.237
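The plug-in step can be reproduced directly from the printed intercept, the printed coefficients, and the four projected feature values. This is a sketch of that arithmetic, not the author's original cell. It evaluates to about 19.24, which differs slightly from the reported 19.237, presumably because the projected inputs above are rounded.

```python
# Intercept and coefficients as printed by the 10-year score regression.
intercept = 7.49767352
coefs = {
    "Pass NY/A": 0.76444273,
    "qBRate": 0.2649305,
    "Cmp%": -0.13694137,
    "Pnt": -0.80508179,
}

# Projected feature values given the 49ers' stats, as reported in the headings above.
projected = {
    "Pass NY/A": 5.189,
    "qBRate": 77.109,
    "Cmp%": 62.116,
    "Pnt": 5.15,
}

# Linear prediction: intercept plus the sum of coefficient * feature value.
score = intercept + sum(coefs[name] * projected[name] for name in coefs)
print(f"{score:.2f}")
```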

# Modeling only given 2019 data

In [340]:
#subset data looking at 2019 only
data=data.iloc[0:18]
print(data)

    Tm  OCmp  OAtt  PassYds  OInt  Sk taken  OYds  Pass Y/A  Pass NY/A  Cmp%  \
0   40    25    34      378     0         0     0      11.1       11.1  73.5   
1   28    30    44      436     0         2     7      10.1        9.5  68.2   
2   33    27    37      363     0         1    11      10.1        9.6  73.0   
3   34    24    42      315     0         0     0       7.5        7.5  57.1   
4   13    22    39      288     0         4    33       8.2        6.7  56.4   
5   24    19    35      256     1         1    17       7.8        7.1  54.3   
6   30    20    30      191     0         1     2       6.4        6.2  66.7   
7   24    24    36      249     0         2    18       7.4        6.6  66.7   
8   26    25    35      230     0         5    45       7.9        5.8  71.4   
9   32    36    51      433     0         2    13       8.7        8.2  70.6   
10  24    19    32      180     1         1     2       5.7        5.5  59.4   
11  40    15    29      163     0       

## Most important features given only 2019 data

In [341]:
corr = data.corr()
corr.head()
#Correlation with output variable
cor_target = abs(corr["Tm"])
#Selecting highly correlated features
relevant_features = cor_target[cor_target>0.35]
relevant_features

Tm           1.000000
OInt         0.363130
Sk taken     0.493352
OYds         0.496155
Pass NY/A    0.383688
qBRate       0.440394
rushYds      0.425616
3DAtt        0.428863
Name: Tm, dtype: float64

## Most Important Features Predicting Score Given 2019 Records are Sacks Taken, Pass Yards per Attempt, QBR,  Rush Yards and 3rd Down Attempts

In [343]:
X=data[['Sk taken','Pass NY/A','qBRate','rushYds','3DAtt']]
y=data[['Tm']]

lm = linear_model.LinearRegression()
model = lm.fit(X,y)


In [344]:
print('Score:', model.score(X, y))

Score: 0.6222313986057935


## 2019 Kansas City Score Regression Equation

In [345]:
print('intercept:', model.intercept_)

print('KC Score Coef:', model.coef_)

intercept: [18.70414193]
KC Score Coef: [[-3.10153976 -2.39199662  0.32229273  0.0698103  -0.59521856]]


In [442]:

data5=data4.iloc[0:18,0:47]
data5.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18 entries, 0 to 17
Data columns (total 47 columns):
Tm             18 non-null int64
OCmp           18 non-null int64
OAtt           18 non-null int64
PassYds        18 non-null int64
OInt           18 non-null int64
Sk taken       18 non-null int64
OYds           18 non-null int64
Pass Y/A       18 non-null float64
Pass NY/A      18 non-null float64
Cmp%           18 non-null float64
qBRate         18 non-null float64
Att            18 non-null int64
rushYds        18 non-null int64
RUSH Y/A       18 non-null float64
Pnt            18 non-null int64
pntYds         18 non-null int64
3DConv         18 non-null int64
3DAtt          18 non-null int64
4DConv         18 non-null int64
4DAtt          18 non-null int64
Team Name      18 non-null object
GDef           18 non-null float64
PFDef          18 non-null float64
TotalYdsDef    18 non-null float64
PlyDef         18 non-null float64
Y/PDef         18 non-null float64
TODef          18 

## Predicting Sacks Taken

In [411]:

corr = data5.corr()
corr.head()
#Correlation with output variable
cor_target = abs(corr['Sk taken'])
#Selecting highly correlated features
relevant_features = cor_target[cor_target>0.3]
relevant_features

Tm            0.493352
Sk taken      1.000000
OYds          0.865710
Att           0.440396
Pnt           0.325657
pntYds        0.364677
3DAtt         0.300723
4DConv        0.360243
4DAtt         0.370956
TODef         0.387045
CmpDef        0.494084
AttDef        0.342184
IntDef        0.354212
RushYdsDef    0.323604
RTDDef        0.545291
1stDDefr      0.323682
TO%Def        0.430620
Name: Sk taken, dtype: float64

In [402]:
X=data5[['RTDDef','CmpDef','TO%Def','RushYdsDef']]
y=data5[['Sk taken']]

lm = linear_model.LinearRegression()
model = lm.fit(X,y)
print('Score:', model.score(X, y))
print('intercept:', model.intercept_)
print('Sk taken:', model.coef_)

## Chiefs Sacks Taken Given 49ers Stats = 1.29



## Predicting Pass NY/A

In [445]:
#Predicting 'Pass NY/A'
corr = data5.corr()
corr.head()
#Correlation with output variable
cor_target = abs(corr['Pass NY/A'])
#Selecting highly correlated features
relevant_features = cor_target[cor_target>0.2]
relevant_features

Tm             0.383688
OCmp           0.542856
OAtt           0.320919
PassYds        0.822746
OInt           0.216516
Sk taken       0.219781
OYds           0.335864
Pass Y/A       0.937970
Pass NY/A      1.000000
Cmp%           0.579316
qBRate         0.835480
Pnt            0.434899
pntYds         0.507379
3DAtt          0.327488
4DConv         0.286040
4DAtt          0.217067
TotalYdsDef    0.252080
Y/PDef         0.283524
1stDDef        0.218120
IntDef         0.243771
RushYdsDef     0.287983
RTDDef         0.432475
Y/ADef         0.379647
1stDDefr       0.238502
Name: Pass NY/A, dtype: float64

In [447]:
X=data5[['RTDDef','Y/ADef','1stDDef','IntDef']]
y=data5[['Pass NY/A']]

lm = linear_model.LinearRegression()
model = lm.fit(X,y)
print('Score:', model.score(X, y))
print('intercept:', model.intercept_)
print('Pass NY/A:', model.coef_)

Score: 0.2775776682888451
intercept: [-0.02140123]
Pass NY/A: [[ 1.79348646  1.38113794  0.03835943 -0.72478295]]


## Projected Pass NY/A Given 49ers Stats = 7.570436 


## Predicting QBR

In [449]:

corr = data5.corr()
corr.head()
#Correlation with output variable
cor_target = abs(corr['qBRate'])
#Selecting highly correlated features
relevant_features = cor_target[cor_target>0.2]
relevant_features

Tm             0.440394
OCmp           0.549258
OAtt           0.252792
PassYds        0.704337
OInt           0.479796
Pass Y/A       0.843012
Pass NY/A      0.835480
Cmp%           0.676375
qBRate         1.000000
pntYds         0.246358
3DAtt          0.208215
4DConv         0.225140
TotalYdsDef    0.249597
Y/PDef         0.211662
FLDef          0.288898
1stDDef        0.232260
AttDef         0.245416
PassYdsDef     0.209025
1stDDefP       0.203402
RTDDef         0.316923
Y/ADef         0.345500
Name: qBRate, dtype: float64

In [450]:
X=data5[['RTDDef','Y/ADef','FLDef','AttDef']]
y=data5[['qBRate']]

lm = linear_model.LinearRegression()
model = lm.fit(X,y)
print('Score:', model.score(X, y))
print('intercept:', model.intercept_)
print('QBR:', model.coef_)

Score: 0.30101021913043335
intercept: [-35.00357251]
QBR: [[34.57742173 11.09659719 27.01846446  1.45549022]]



## Projected QBR Given 49ers Stats = 111.34

## Projecting Rush Yds

In [454]:
corr = data5.corr()
corr.head()
#Correlation with output variable
cor_target = abs(corr['rushYds'])
#Selecting highly correlated features
relevant_features = cor_target[cor_target>0.2]
relevant_features

Tm            0.425616
OAtt          0.393041
PassYds       0.276823
Sk taken      0.231067
Cmp%          0.238247
Att           0.342299
rushYds       1.000000
RUSH Y/A      0.818487
3DAtt         0.316485
PFDef         0.202373
Y/PDef        0.235831
1stDDef       0.234133
PassYdsDef    0.288054
TDDef         0.312191
NY/ADef       0.246930
1stDDefP      0.206456
RushYdsDef    0.202386
RTDDef        0.214378
Name: rushYds, dtype: float64

In [456]:
X=data5[['TDDef','PassYdsDef','1stDDefP','PFDef']]
y=data5[['rushYds']]
lm = linear_model.LinearRegression()
model = lm.fit(X,y)
print('Score:', model.score(X, y))
print('intercept:', model.intercept_)
print('rushYds:', model.coef_)

Score: 0.2125615033365037
intercept: [104.52110471]
rushYds: [[-62.6120936   -1.65092853  25.73340366   7.86526488]]


## Projected Rush Yds Given 49ers Stats = 128.743

## Projecting 3rd Down Attempts

In [462]:
corr = data5.corr()
corr.head()
#Correlation with output variable
cor_target = abs(corr['3DAtt'])
#Selecting highly correlated features
relevant_features = cor_target[cor_target>0.4]
relevant_features

Tm             0.428863
RUSH Y/A       0.438200
Pnt            0.532693
pntYds         0.456587
3DAtt          1.000000
4DConv         0.461987
PFDef          0.416123
TotalYdsDef    0.520162
Y/PDef         0.560401
PassYdsDef     0.431809
IntDef         0.423819
RushYdsDef     0.422663
Y/ADef         0.458978
Sc%Def         0.435109
EXPDef         0.529787
Name: 3DAtt, dtype: float64

In [463]:
X=data5[['Y/PDef','Y/ADef','EXPDef','Sc%Def']]
y=data5[['3DAtt']]
lm = linear_model.LinearRegression()
model = lm.fit(X,y)
print('Score:', model.score(X, y))
print('intercept:', model.intercept_)
print('3DAtt:', model.coef_)

Score: 0.42188902885967833
intercept: [11.49642046]
3DAtt: [[ 2.37413446 -3.08347133  0.40706778  0.04546511]]


## Projected 3rd Down Attempts Given 49ers Stats = 12.54

# Chiefs Expected Score Using 2019 Regression Equation From Above, Plugging In Projected Feature Values From 2019 Season = 33.999546


# Projected Score Given 10 Years of Records = 19.237

# Weighted Projected Score = 75% × 2019 Season Score + 25% × 10-Year Score ≈ 30
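The final blend is a two-term weighted average with 75% of the weight on the 2019 model and 25% on the 10-year model. A minimal sketch using the two reported projections follows; the function name `blend` is made up for illustration.

```python
def blend(score_2019, score_10yr, weight_2019=0.75):
    """Weighted average of the two projections; the weights sum to 1."""
    if not 0.0 <= weight_2019 <= 1.0:
        raise ValueError("weight_2019 must be between 0 and 1")
    return weight_2019 * score_2019 + (1.0 - weight_2019) * score_10yr

# Projections reported above: 2019-only model and 10-year model.
weighted = blend(33.999546, 19.237)
print(f"{weighted:.2f}")
```

The unrounded blend is about 30.3, which rounds to the 30 points quoted in the heading.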