# Merging and Modeling: Total Points

This notebook contains all code related to manipulating historical data to get it ready for modeling. It also contains the modeling steps and predictions for week 12 matchups of the 2023 season.

## Table of Contents

**1. Manipulating historical data**
- Setting historical teams up against each other.
- Merging game results dataframe to determine accuracy

**2. Modeling for total points scored**
- Model setup
- Model evaluation

**3. Predictions and accuracy for Week 12 games**
- Importing current season data (up to week 11) and running for week 12 predictions
- Comparing my model to Vegas
- Calculating expected profit



### Modeling Notebooks
**1st Model: O/U (this notebook)**
- y = total points, use RMSE to test accuracy

**2nd Model: Away Team ('TeamScorePredictor-SundayProphet.ipynb')**
- y = away score, use RMSE to test accuracy

**3rd Model: Home Team ('TeamScorePredictor-SundayProphet.ipynb')**
- y = home score, use RMSE to test accuracy

**4th Model: Moneyline ('WinPredictor-SundayProphet.ipynb')**
- Not built yet, but I bet I can do this. Ideas:
    - Feature engineering: create a new column in the modeling notebook
    - If away score > home score, the away team wins (binary column, 0 or 1)
    - With this label, you can probably create a classification model for predicting wins.
- If you don't want to go down this road, just create the column afterwards: whichever team has the higher score is displayed as the winner
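
The win-label idea above could be sketched roughly as follows. This assumes the `away_score` and `home_score` columns from the game results; the column name `away_win` is hypothetical, and the three rows are the first games shown later in this notebook.

```python
import pandas as pd

# Three games from the 2004 season as they appear in the modeling data
games = pd.DataFrame({
    "away_team": ["ind", "ten", "bal"],
    "home_team": ["ne", "mia", "cle"],
    "away_score": [24.0, 17.0, 3.0],
    "home_score": [27.0, 7.0, 20.0],
})

# 1 if the away team won, 0 otherwise (a tie also maps to 0 in this sketch)
games["away_win"] = (games["away_score"] > games["home_score"]).astype(int)

print(games[["away_team", "home_team", "away_win"]])
```

A classifier would then use `away_win` as its target, with both score columns dropped from the features so the label does not leak.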

### Scraping Notebooks
**SundayProphet-Scraping Historical Data**
- The process of scraping from ESPN from scratch on all years between 2004 and 2022
- Important for model training and evaluation

**SundayProphet-Scraping In-Season Data**
- Scraping only data from this season from ESPN
- Important for deploying our model to get predictions

In [34]:
import pandas as pd
import numpy as np
from itertools import combinations
from itertools import permutations
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

from sklearn.metrics import r2_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

In [2]:
%pip install tensorflow

Note: you may need to restart the kernel to use updated packages.


In [None]:
# copying in from colab

In [145]:
hist = pd.read_csv('/Users/justintunley/Documents/BrainStation/Capstone/FINAL_Hist_Data.csv')

In [146]:
hist.head()

Unnamed: 0,SeasonID,PPG,Tot_TDs_PG,1st_Downs_PG,Rush_1st_Downs_PG,Pass_1st_Downs_PG,OFF_1st_by_pen_PG,3rd_Conv_Rate,4th_Conv_Rate,Pass_Comp_Rate,...,Sacks_Taken_PG,Sack_Yds_Lost_PG,FG_Att_PG,FG_Good_PG,Pass_Att_PG,DEF_Pass_Att_PG,DEF_Sacks_PG,DEF_Sack_Yds_PG,DEF_FG_Att_PG,DEF_FG_Good_PG
0,Offense--ari-2004,17.6,1.9375,17.5,5.375,9.5,2.625,34.86,41.67,0.560976,...,2.4375,20.0,1.8125,1.375,33.3125,31.5625,2.375,14.3125,1.625,1.5625
1,Offense--ari-2005,19.4,1.625,19.0,3.625,14.0,1.375,38.08,31.25,0.625373,...,2.8125,17.875,2.8125,2.6875,41.875,30.5,2.3125,13.5625,1.5,1.3125
2,Offense--ari-2006,19.6,2.0625,18.625,5.25,11.6875,1.6875,39.15,66.67,0.590826,...,2.1875,16.375,2.3125,1.75,34.0625,32.625,2.375,14.875,2.125,1.9375
3,Offense--ari-2007,25.3,3.0625,19.25,4.375,13.125,1.75,36.95,85.71,0.60339,...,1.5,10.1875,1.875,1.3125,36.875,35.625,2.25,15.125,2.0625,1.75
4,Offense--ari-2008,26.7,3.1875,20.5,4.5,14.4375,1.5625,41.92,50.0,0.663492,...,1.75,12.5625,1.75,1.5625,39.375,32.3125,1.9375,11.9375,1.6875,1.3125


In [147]:
hist.shape

(608, 78)

In [148]:
y04 = hist.loc[hist['SeasonID'].str.contains('2004'),:]
# testing if this works on a single year
y04.shape
# 32 teams by 78 features is correct. Now we must get every possible combination of teams against each other,
# then input the game results

(32, 78)

In [151]:
df = hist
df.shape
# should be 608 x 78

(608, 78)

In [152]:
yearList = [str(year) for year in range(2004, 2023)]
# this is our list of years ('2004' through '2022') that will be looped through

In [153]:
final_list = []

for year in yearList:
    temp_df = df.loc[df['SeasonID'].str.contains(year),:]

    # Ordered pairs (permutations, not combinations): each matchup appears twice,
    # once with each team in the Team1 slot
    combinations_list = list(permutations(temp_df.iterrows(), 2))

    pairs_data = []
    for pair in combinations_list:
        index1, row1 = pair[0]
        index2, row2 = pair[1]

        # Concatenate both teams' rows; the _Team1/_Team2 column suffixes added below tell the sides apart
        pair_data = list(row1) + list(row2)

        pairs_data.append(pair_data)

    # Create column names for the new dataframe
    columns = [f'{col}_Team1' for col in temp_df.columns] + [f'{col}_Team2' for col in temp_df.columns]

    # Create the new dataframe
    pairs_df2 = pd.DataFrame(pairs_data, columns=columns)

    final_list.append(pairs_df2)
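
To sanity-check the pairing logic, here is a minimal sketch on a hypothetical three-team season. The `toy` frame and its PPG values are illustrative only; the point is that `permutations` yields n * (n - 1) ordered pairs.

```python
from itertools import permutations

import pandas as pd

# Hypothetical three-team season with a single stat column (values are illustrative)
toy = pd.DataFrame({
    "SeasonID": ["Offense--ari-2004", "Offense--atl-2004", "Offense--bal-2004"],
    "PPG": [17.6, 21.3, 19.8],
})

# Each row becomes a plain list so two rows can be concatenated into one matchup row
rows = [list(row) for _, row in toy.iterrows()]
pairs = [left + right for left, right in permutations(rows, 2)]

columns = [f"{c}_Team1" for c in toy.columns] + [f"{c}_Team2" for c in toy.columns]
pairs_df = pd.DataFrame(pairs, columns=columns)

n = len(toy)
print(len(pairs_df), n * (n - 1))
```

With 32 teams the same logic gives 32 * 31 = 992 rows per season, and 19 seasons give 18,848 rows, which is the shape checked in the next cells.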

In [154]:
final_list

[        SeasonID_Team1  PPG_Team1  Tot_TDs_PG_Team1  1st_Downs_PG_Team1  \
 0    Offense--ari-2004       17.6            1.9375             17.5000   
 1    Offense--ari-2004       17.6            1.9375             17.5000   
 2    Offense--ari-2004       17.6            1.9375             17.5000   
 3    Offense--ari-2004       17.6            1.9375             17.5000   
 4    Offense--ari-2004       17.6            1.9375             17.5000   
 ..                 ...        ...               ...                 ...   
 987  Offense--wsh-2004       15.0            1.6250             16.8125   
 988  Offense--wsh-2004       15.0            1.6250             16.8125   
 989  Offense--wsh-2004       15.0            1.6250             16.8125   
 990  Offense--wsh-2004       15.0            1.6250             16.8125   
 991  Offense--wsh-2004       15.0            1.6250             16.8125   
 
      Rush_1st_Downs_PG_Team1  Pass_1st_Downs_PG_Team1  \
 0                     5.375

In [155]:
merged = pd.concat(final_list)

In [156]:
merged.shape
# this should be 18,848 x 156

(18848, 156)

In [157]:
merged.tail(33)

Unnamed: 0,SeasonID_Team1,PPG_Team1,Tot_TDs_PG_Team1,1st_Downs_PG_Team1,Rush_1st_Downs_PG_Team1,Pass_1st_Downs_PG_Team1,OFF_1st_by_pen_PG_Team1,3rd_Conv_Rate_Team1,4th_Conv_Rate_Team1,Pass_Comp_Rate_Team1,...,Sacks_Taken_PG_Team2,Sack_Yds_Lost_PG_Team2,FG_Att_PG_Team2,FG_Good_PG_Team2,Pass_Att_PG_Team2,DEF_Pass_Att_PG_Team2,DEF_Sacks_PG_Team2,DEF_Sack_Yds_PG_Team2,DEF_FG_Att_PG_Team2,DEF_FG_Good_PG_Team2
959,Offense--ten-2022,17.5,1.941176,16.294118,6.117647,9.0,1.176471,36.53,52.94,0.625,...,1.294118,9.411765,2.235294,1.823529,44.176471,33.529412,2.647059,19.470588,1.588235,1.411765
960,Offense--ten-2022,17.5,1.941176,16.294118,6.117647,9.0,1.176471,36.53,52.94,0.625,...,2.823529,18.294118,1.764706,1.470588,32.588235,30.235294,2.529412,16.411765,1.705882,1.529412
961,Offense--wsh-2022,18.9,2.117647,19.941176,7.294118,10.823529,1.823529,35.22,48.0,0.620939,...,2.705882,20.0,1.882353,1.647059,39.058824,34.882353,2.117647,15.647059,1.823529,1.705882
962,Offense--wsh-2022,18.9,2.117647,19.941176,7.294118,10.823529,1.823529,35.22,48.0,0.620939,...,2.176471,13.411765,2.176471,1.882353,24.411765,33.0,1.235294,8.705882,2.235294,1.941176
963,Offense--wsh-2022,18.9,2.117647,19.941176,7.294118,10.823529,1.823529,35.22,48.0,0.620939,...,2.235294,9.529412,2.529412,2.176471,28.705882,34.882353,2.823529,17.941176,2.235294,1.823529
964,Offense--wsh-2022,18.9,2.117647,19.941176,7.294118,10.823529,1.823529,35.22,48.0,0.620939,...,1.941176,9.529412,1.823529,1.588235,33.764706,33.529412,2.352941,16.941176,1.647059,1.235294
965,Offense--wsh-2022,18.9,2.117647,19.941176,7.294118,10.823529,1.823529,35.22,48.0,0.620939,...,2.117647,14.705882,2.058824,1.941176,26.882353,34.117647,2.058824,13.529412,1.941176,1.411765
966,Offense--wsh-2022,18.9,2.117647,19.941176,7.294118,10.823529,1.823529,35.22,48.0,0.620939,...,3.411765,22.294118,1.588235,1.470588,22.176471,28.294118,1.176471,7.294118,1.705882,1.235294
967,Offense--wsh-2022,18.9,2.117647,19.941176,7.294118,10.823529,1.823529,35.22,48.0,0.620939,...,2.588235,16.470588,1.705882,1.411765,35.882353,32.882353,1.764706,9.588235,2.352941,2.0
968,Offense--wsh-2022,18.9,2.117647,19.941176,7.294118,10.823529,1.823529,35.22,48.0,0.620939,...,2.588235,15.647059,1.882353,1.411765,31.764706,30.529412,2.0,12.411765,2.058824,1.764706


In [158]:
merged['SeasonID_Team2'].value_counts()
# every team appears in Team2 once per possible opponent, which is 31, so this is correct

Offense--atl-2004    31
Offense--phi-2016    31
Offense--lar-2016    31
Offense--mia-2016    31
Offense--min-2016    31
                     ..
Offense--hou-2010    31
Offense--ind-2010    31
Offense--jax-2010    31
Offense--kc-2010     31
Offense--ari-2022    31
Name: SeasonID_Team2, Length: 608, dtype: int64

In [160]:
# pd.set_option('display.max_rows', None)
# DO NOT RUN THIS, IT WILL CRASH THE NOTEBOOK

In [161]:
# pd.set_option('display.max_columns',None)
# ONCE AGAIN, DO NOT RUN THIS
pairs_df2.head(32) # there should be 31 rows with Arizona first because that's how many other teams there are

Unnamed: 0,SeasonID_Team1,PPG_Team1,Tot_TDs_PG_Team1,1st_Downs_PG_Team1,Rush_1st_Downs_PG_Team1,Pass_1st_Downs_PG_Team1,OFF_1st_by_pen_PG_Team1,3rd_Conv_Rate_Team1,4th_Conv_Rate_Team1,Pass_Comp_Rate_Team1,...,Sacks_Taken_PG_Team2,Sack_Yds_Lost_PG_Team2,FG_Att_PG_Team2,FG_Good_PG_Team2,Pass_Att_PG_Team2,DEF_Pass_Att_PG_Team2,DEF_Sacks_PG_Team2,DEF_Sack_Yds_PG_Team2,DEF_FG_Att_PG_Team2,DEF_FG_Good_PG_Team2
0,Offense--ari-2022,20.0,2.176471,19.176471,6.764706,11.117647,1.294118,35.19,43.9,0.652108,...,2.176471,13.411765,2.176471,1.882353,24.411765,33.0,1.235294,8.705882,2.235294,1.941176
1,Offense--ari-2022,20.0,2.176471,19.176471,6.764706,11.117647,1.294118,35.19,43.9,0.652108,...,2.235294,9.529412,2.529412,2.176471,28.705882,34.882353,2.823529,17.941176,2.235294,1.823529
2,Offense--ari-2022,20.0,2.176471,19.176471,6.764706,11.117647,1.294118,35.19,43.9,0.652108,...,1.941176,9.529412,1.823529,1.588235,33.764706,33.529412,2.352941,16.941176,1.647059,1.235294
3,Offense--ari-2022,20.0,2.176471,19.176471,6.764706,11.117647,1.294118,35.19,43.9,0.652108,...,2.117647,14.705882,2.058824,1.941176,26.882353,34.117647,2.058824,13.529412,1.941176,1.411765
4,Offense--ari-2022,20.0,2.176471,19.176471,6.764706,11.117647,1.294118,35.19,43.9,0.652108,...,3.411765,22.294118,1.588235,1.470588,22.176471,28.294118,1.176471,7.294118,1.705882,1.235294
5,Offense--ari-2022,20.0,2.176471,19.176471,6.764706,11.117647,1.294118,35.19,43.9,0.652108,...,2.588235,16.470588,1.705882,1.411765,35.882353,32.882353,1.764706,9.588235,2.352941,2.0
6,Offense--ari-2022,20.0,2.176471,19.176471,6.764706,11.117647,1.294118,35.19,43.9,0.652108,...,2.588235,15.647059,1.882353,1.411765,31.764706,30.529412,2.0,12.411765,2.058824,1.764706
7,Offense--ari-2022,20.0,2.176471,19.176471,6.764706,11.117647,1.294118,35.19,43.9,0.652108,...,1.588235,10.294118,1.882353,1.705882,32.705882,32.352941,3.176471,20.705882,2.352941,2.0
8,Offense--ari-2022,20.0,2.176471,19.176471,6.764706,11.117647,1.294118,35.19,43.9,0.652108,...,3.705882,24.411765,2.117647,1.647059,33.588235,35.588235,2.117647,14.411765,2.411765,2.294118
9,Offense--ari-2022,20.0,2.176471,19.176471,6.764706,11.117647,1.294118,35.19,43.9,0.652108,...,1.411765,9.588235,1.764706,1.411765,34.588235,32.941176,2.294118,15.705882,1.764706,1.470588


In [162]:
dfg = merged.groupby(by=['SeasonID_Team1', 'SeasonID_Team2']).size()
dfg

SeasonID_Team1     SeasonID_Team2   
Offense--ari-2004  Offense--atl-2004    1
                   Offense--bal-2004    1
                   Offense--buf-2004    1
                   Offense--car-2004    1
                   Offense--chi-2004    1
                                       ..
Offense--wsh-2022  Offense--pit-2022    1
                   Offense--sea-2022    1
                   Offense--sf-2022     1
                   Offense--tb-2022     1
                   Offense--ten-2022    1
Length: 18848, dtype: int64

In [163]:
merged.head()

Unnamed: 0,SeasonID_Team1,PPG_Team1,Tot_TDs_PG_Team1,1st_Downs_PG_Team1,Rush_1st_Downs_PG_Team1,Pass_1st_Downs_PG_Team1,OFF_1st_by_pen_PG_Team1,3rd_Conv_Rate_Team1,4th_Conv_Rate_Team1,Pass_Comp_Rate_Team1,...,Sacks_Taken_PG_Team2,Sack_Yds_Lost_PG_Team2,FG_Att_PG_Team2,FG_Good_PG_Team2,Pass_Att_PG_Team2,DEF_Pass_Att_PG_Team2,DEF_Sacks_PG_Team2,DEF_Sack_Yds_PG_Team2,DEF_FG_Att_PG_Team2,DEF_FG_Good_PG_Team2
0,Offense--ari-2004,17.6,1.9375,17.5,5.375,9.5,2.625,34.86,41.67,0.560976,...,3.125,17.5,1.4375,1.125,24.6875,32.3125,3.0,19.5,1.125,1.0
1,Offense--ari-2004,17.6,1.9375,17.5,5.375,9.5,2.625,34.86,41.67,0.560976,...,2.1875,15.4375,2.0,1.8125,29.0625,31.3125,2.4375,16.5,1.9375,1.625
2,Offense--ari-2004,17.6,1.9375,17.5,5.375,9.5,2.625,34.86,41.67,0.560976,...,2.375,13.4375,1.75,1.5,28.8125,30.375,2.8125,19.9375,2.0,1.6875
3,Offense--ari-2004,17.6,1.9375,17.5,5.375,9.5,2.625,34.86,41.67,0.560976,...,2.0625,15.375,1.5625,1.25,33.5,32.0625,2.125,14.0625,1.75,1.25
4,Offense--ari-2004,17.6,1.9375,17.5,5.375,9.5,2.625,34.86,41.67,0.560976,...,4.125,28.0625,1.5,0.9375,29.4375,32.1875,2.1875,10.8125,2.25,1.625


In [None]:
# team one abbrev
# team two abbrev
# year --> drop team 2 year and rename first column

In [164]:
abbrev1 = merged['SeasonID_Team1'].str.split('-', expand=True)
abbrev1.head()

Unnamed: 0,0,1,2,3
0,Offense,,ari,2004
1,Offense,,ari,2004
2,Offense,,ari,2004
3,Offense,,ari,2004
4,Offense,,ari,2004


In [165]:
merged['Team1_Abbrev'] = abbrev1[2]

In [166]:
abbrev2 = merged['SeasonID_Team2'].str.split('-', expand=True)
abbrev2.head()

Unnamed: 0,0,1,2,3
0,Offense,,atl,2004
1,Offense,,bal,2004
2,Offense,,buf,2004
3,Offense,,car,2004
4,Offense,,chi,2004


In [167]:
merged['Team2_Abbrev'] = abbrev2[2]
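
As an alternative to splitting on `-` and picking column 2, a regular expression can pull the abbreviation and the year out in one step. This is only a sketch: it assumes every ID follows the `Offense--<team>-<year>` pattern seen above, and the names `season_ids` and `parts` are hypothetical.

```python
import pandas as pd

# A few IDs in the format used by SeasonID_Team1 / SeasonID_Team2
season_ids = pd.Series(["Offense--ari-2004", "Offense--kc-2010", "Offense--wsh-2022"])

# Named groups become the column names of the result;
# team is a run of lowercase letters, year is four digits
parts = season_ids.str.extract(r"--(?P<team>[a-z]+)-(?P<year>\d{4})$")
parts["year"] = parts["year"].astype(int)

print(parts)
```

This avoids the empty column that the double dash produces when splitting, and gives an integer year that can be compared directly with `season` in the game results.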

In [168]:
merged.head()

Unnamed: 0,SeasonID_Team1,PPG_Team1,Tot_TDs_PG_Team1,1st_Downs_PG_Team1,Rush_1st_Downs_PG_Team1,Pass_1st_Downs_PG_Team1,OFF_1st_by_pen_PG_Team1,3rd_Conv_Rate_Team1,4th_Conv_Rate_Team1,Pass_Comp_Rate_Team1,...,FG_Att_PG_Team2,FG_Good_PG_Team2,Pass_Att_PG_Team2,DEF_Pass_Att_PG_Team2,DEF_Sacks_PG_Team2,DEF_Sack_Yds_PG_Team2,DEF_FG_Att_PG_Team2,DEF_FG_Good_PG_Team2,Team1_Abbrev,Team2_Abbrev
0,Offense--ari-2004,17.6,1.9375,17.5,5.375,9.5,2.625,34.86,41.67,0.560976,...,1.4375,1.125,24.6875,32.3125,3.0,19.5,1.125,1.0,ari,atl
1,Offense--ari-2004,17.6,1.9375,17.5,5.375,9.5,2.625,34.86,41.67,0.560976,...,2.0,1.8125,29.0625,31.3125,2.4375,16.5,1.9375,1.625,ari,bal
2,Offense--ari-2004,17.6,1.9375,17.5,5.375,9.5,2.625,34.86,41.67,0.560976,...,1.75,1.5,28.8125,30.375,2.8125,19.9375,2.0,1.6875,ari,buf
3,Offense--ari-2004,17.6,1.9375,17.5,5.375,9.5,2.625,34.86,41.67,0.560976,...,1.5625,1.25,33.5,32.0625,2.125,14.0625,1.75,1.25,ari,car
4,Offense--ari-2004,17.6,1.9375,17.5,5.375,9.5,2.625,34.86,41.67,0.560976,...,1.5,0.9375,29.4375,32.1875,2.1875,10.8125,2.25,1.625,ari,chi


In [None]:
merged.to_csv('team_combinations.csv', index=False)
# every conceivable matchup in a CSV, for the purpose of saving our spot in the process

## Creating a final DataFrame for model creation

**Individual team statistics:** team_combinations.csv
- created above in this notebook

**Game results for accuracy testing:** all_games.csv
- created in Game Data Notebook

In [169]:
teams = pd.read_csv('/Users/justintunley/Documents/BrainStation/Capstone/team_combinations.csv')
teams.head()

Unnamed: 0,SeasonID_Team1,PPG_Team1,Tot_TDs_PG_Team1,1st_Downs_PG_Team1,Rush_1st_Downs_PG_Team1,Pass_1st_Downs_PG_Team1,OFF_1st_by_pen_PG_Team1,3rd_Conv_Rate_Team1,4th_Conv_Rate_Team1,Pass_Comp_Rate_Team1,...,FG_Att_PG_Team2,FG_Good_PG_Team2,Pass_Att_PG_Team2,DEF_Pass_Att_PG_Team2,DEF_Sacks_PG_Team2,DEF_Sack_Yds_PG_Team2,DEF_FG_Att_PG_Team2,DEF_FG_Good_PG_Team2,Team1_Abbrev,Team2_Abbrev
0,Offense--ari-2004,17.6,1.9375,17.5,5.375,9.5,2.625,34.86,41.67,0.560976,...,1.4375,1.125,24.6875,32.3125,3.0,19.5,1.125,1.0,ari,atl
1,Offense--ari-2004,17.6,1.9375,17.5,5.375,9.5,2.625,34.86,41.67,0.560976,...,2.0,1.8125,29.0625,31.3125,2.4375,16.5,1.9375,1.625,ari,bal
2,Offense--ari-2004,17.6,1.9375,17.5,5.375,9.5,2.625,34.86,41.67,0.560976,...,1.75,1.5,28.8125,30.375,2.8125,19.9375,2.0,1.6875,ari,buf
3,Offense--ari-2004,17.6,1.9375,17.5,5.375,9.5,2.625,34.86,41.67,0.560976,...,1.5625,1.25,33.5,32.0625,2.125,14.0625,1.75,1.25,ari,car
4,Offense--ari-2004,17.6,1.9375,17.5,5.375,9.5,2.625,34.86,41.67,0.560976,...,1.5,0.9375,29.4375,32.1875,2.1875,10.8125,2.25,1.625,ari,chi


In [170]:
games = pd.read_csv('all_games.csv')
games.head()
# if you include new game statistics, this is where you import a different csv

Unnamed: 0,season,away_team,away_score,home_team,home_score,div_game,away_summary,home_summary
0,1999,min,17.0,atl,14.0,0,min_17.0,atl_14.0
1,1999,kc,17.0,chi,20.0,0,kc_17.0,chi_20.0
2,1999,pit,43.0,cle,0.0,1,pit_43.0,cle_0.0
3,1999,oak,24.0,gb,28.0,0,oak_24.0,gb_28.0
4,1999,buf,14.0,ind,31.0,1,buf_14.0,ind_31.0


In [171]:
teams.shape
# confirm it is 18,848 x 158

(18848, 158)

In [172]:
games.shape
# confirm it is 6314 x 8

(6314, 8)

In [None]:
y04 = games.loc[games['season'] == 2004]
y04.head()
# I am creating dataframes for only the years I want. There has got to be an easier way to do this
# but it would have taken me longer to figure it out than what I did

In [None]:
y05 = games.loc[games['season'] == 2005]
y05.head()

In [None]:
y06 = games.loc[games['season'] == 2006]
y06.head()

In [None]:
y07 = games.loc[games['season'] == 2007]
y07.head()

In [None]:
y08 = games.loc[games['season'] == 2008]
y08.head()

In [None]:
y09 = games.loc[games['season'] == 2009]
y09.head()

In [None]:
y10 = games.loc[games['season'] == 2010]
y10.head()

In [None]:
y11 = games.loc[games['season'] == 2011]
y11.head()

In [None]:
y12 = games.loc[games['season'] == 2012]
y12.head()

In [None]:
y13 = games.loc[games['season'] == 2013]
y13.head()

In [None]:
y14 = games.loc[games['season'] == 2014]
y14.head()

In [None]:
y15 = games.loc[games['season'] == 2015]
y15.head()

In [None]:
y16 = games.loc[games['season'] == 2016]
y16.head()

In [None]:
y17 = games.loc[games['season'] == 2017]
y17.head()

In [None]:
y18 = games.loc[games['season'] == 2018]
y18.head()

In [None]:
y19 = games.loc[games['season'] == 2019]
y19.head()

In [None]:
y20 = games.loc[games['season'] == 2020]
y20.head()

In [None]:
y21 = games.loc[games['season'] == 2021]
y21.head()

In [None]:
y21.shape
# games increased to 272 in a season this year. This is correct!

In [None]:
y22 = games.loc[games['season'] == 2022]
y22.head()

In [None]:
games = pd.concat([y04,y05,y06,y07,y08,y09,y10,y11,y12,y13,y14,y15,y16,y17,y18,y19,y20,y21,y22])
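
The per-year cells above can likely be collapsed into a single filter. A minimal sketch, using a toy stand-in for `all_games.csv` (the rows are made up; only the `season` column matters):

```python
import pandas as pd

# Toy stand-in for the game results
games = pd.DataFrame({
    "season": [1999, 2003, 2004, 2015, 2022, 2023],
    "away_team": ["min", "kc", "ind", "pit", "det", "ari"],
})

# between() is inclusive on both ends by default, so this keeps 2004 through 2022
in_range = games.loc[games["season"].between(2004, 2022)].reset_index(drop=True)

print(in_range)
```

The only difference from concatenating the yearly frames is the reset index, which does not matter here because the result is saved with `index=False`.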

In [None]:
games.head()

In [None]:
games.to_csv('range_of_dates.csv', index=False)
# save point

In [None]:
games = pd.read_csv('/Users/justintunley/Documents/BrainStation/Capstone/range_of_dates.csv')

In [None]:
games.shape
# should be 4895 x 8

In [None]:
teams.isna().sum()

In [None]:
merged_df = pd.merge(teams, games, left_on=['Team1_Abbrev', 'Team2_Abbrev', 'Year_Team1'], right_on=['away_team', 'home_team', 'season'], how='right')

In [None]:
merged_df.shape
# should be 4895 x 166

In [None]:
merged_df.head(15)
# Confirm all NaN rows are for games involving a team that has since relocated or been renamed
# (San Diego, Oakland, St. Louis and the Washington Redskins), so its old abbreviation has no match in teams.
# We will drop these rows. We could choose to still include outcomes from these 4 teams, but
# I'm going to ignore them for now because of time constraints.

In [None]:
merged_df.isna().sum()

In [None]:
rows_with_nulls = merged_df[merged_df.isnull().any(axis=1)]
rows_with_nulls
# as long as the only rows with null values are associated with sd, oak, stl and was, we are good :)
# In the future, I could associate these with their old rows
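
Rather than eyeballing the null rows, the check can be made explicit with the `indicator` option of `pd.merge`. The frames below are tiny hypothetical stand-ins for `teams` and `games`; only the merge keys mirror the real ones.

```python
import pandas as pd

# Hypothetical stand-ins with the same key columns as the real frames
teams = pd.DataFrame({
    "Team1_Abbrev": ["ind", "ten"],
    "Team2_Abbrev": ["ne", "mia"],
    "Year_Team1": [2004, 2004],
    "PPG_Team1": [32.6, 21.5],
})
games = pd.DataFrame({
    "away_team": ["ind", "ten", "sd"],
    "home_team": ["ne", "mia", "oak"],
    "season": [2004, 2004, 2004],
})

checked = pd.merge(
    teams, games,
    left_on=["Team1_Abbrev", "Team2_Abbrev", "Year_Team1"],
    right_on=["away_team", "home_team", "season"],
    how="right",
    indicator=True,  # adds a _merge column: 'both', 'left_only' or 'right_only'
)

# Games that found no matching team stats
unmatched = checked.loc[checked["_merge"] == "right_only"]
unmatched_teams = set(unmatched["away_team"]) | set(unmatched["home_team"])
print(sorted(unmatched_teams))
```

On the real data, not every team in an unmatched game uses an old abbreviation, so a fair check is that each unmatched row has `sd`, `oak`, `stl` or `was` as its away or home team.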

In [None]:
clean_merge = merged_df.dropna()

In [None]:
clean_merge.head()

In [None]:
clean_merge.shape

In [None]:
# THIS IS THE FINAL CSV BEING USED FOR TRAINING ON ALL MODELS

clean_merge.to_csv('Network_Training.csv', index=False)



# Modeling for Total Points

In [240]:
modeling = pd.read_csv('/Users/justintunley/Documents/BrainStation/Capstone/Network_Training.csv')

In [241]:
modeling.head()

Unnamed: 0,SeasonID_Team1,PPG_Team1,Tot_TDs_PG_Team1,1st_Downs_PG_Team1,Rush_1st_Downs_PG_Team1,Pass_1st_Downs_PG_Team1,OFF_1st_by_pen_PG_Team1,3rd_Conv_Rate_Team1,4th_Conv_Rate_Team1,Pass_Comp_Rate_Team1,...,Team1_Abbrev,Team2_Abbrev,season,away_team,away_score,home_team,home_score,div_game,away_summary,home_summary
0,Offense--ind-2004,32.6,4.125,23.6875,5.875,14.875,2.9375,42.68,57.14,0.669829,...,ind,ne,2004,ind,24.0,ne,27.0,0,ind_24.0,ne_27.0
1,Offense--ten-2004,21.5,2.5625,19.25,5.3125,12.5,1.4375,34.1,44.44,0.604414,...,ten,mia,2004,ten,17.0,mia,7.0,0,ten_17.0,mia_7.0
2,Offense--jax-2004,17.4,1.625,17.4375,5.5,10.0625,1.875,36.87,55.56,0.594542,...,jax,buf,2004,jax,13.0,buf,10.0,0,jax_13.0,buf_10.0
3,Offense--det-2004,18.5,2.0,16.4375,5.75,8.875,1.8125,31.43,37.5,0.564356,...,det,chi,2004,det,20.0,chi,16.0,1,det_20.0,chi_16.0
4,Offense--bal-2004,19.8,2.0625,16.25,6.4375,8.4375,1.375,35.07,26.67,0.554839,...,bal,cle,2004,bal,3.0,cle,20.0,1,bal_3.0,cle_20.0


In [242]:
newDF = modeling.select_dtypes(include='number')
newDF.head()

Unnamed: 0,PPG_Team1,Tot_TDs_PG_Team1,1st_Downs_PG_Team1,Rush_1st_Downs_PG_Team1,Pass_1st_Downs_PG_Team1,OFF_1st_by_pen_PG_Team1,3rd_Conv_Rate_Team1,4th_Conv_Rate_Team1,Pass_Comp_Rate_Team1,Pass_Yds_PG_Team1,...,Pass_Att_PG_Team2,DEF_Pass_Att_PG_Team2,DEF_Sacks_PG_Team2,DEF_Sack_Yds_PG_Team2,DEF_FG_Att_PG_Team2,DEF_FG_Good_PG_Team2,season,away_score,home_score,div_game
0,32.6,4.125,23.6875,5.875,14.875,2.9375,42.68,57.14,0.669829,288.9375,...,30.3125,33.625,2.8125,19.4375,1.125,0.9375,2004,24.0,27.0,0
1,21.5,2.5625,19.25,5.3125,12.5,1.4375,34.1,44.44,0.604414,226.0,...,36.625,27.125,2.25,13.9375,1.75,1.25,2004,17.0,7.0,0
2,17.4,1.625,17.4375,5.5,10.0625,1.875,36.87,55.56,0.594542,197.4375,...,28.8125,30.375,2.8125,19.9375,2.0,1.6875,2004,13.0,10.0,0
3,18.5,2.0,16.4375,5.75,8.875,1.8125,31.43,37.5,0.564356,182.25,...,29.4375,32.1875,2.1875,10.8125,2.25,1.625,2004,20.0,16.0,1
4,19.8,2.0625,16.25,6.4375,8.4375,1.375,35.07,26.67,0.554839,144.5,...,27.4375,28.75,2.0,11.875,1.75,1.5,2004,3.0,20.0,1


In [243]:
# randomized_df = newDF.sample(frac=1)
# this is my process for manually selecting a train/test split. This may not be necessary if only predicting one
# variable at a time :)

In [244]:
# randomized_df.head()

In [245]:
modeling = pd.read_csv('/Users/justintunley/Documents/BrainStation/Capstone/Network_Training.csv')

In [246]:
modeling['total_points'] = modeling['home_score'] + modeling['away_score']

In [247]:
modeling['summary'] = modeling['away_team'] + '_' + modeling['home_team'] + modeling['season'].astype(str)

In [248]:
modeling['season']

0       2004
1       2004
2       2004
3       2004
4       2004
        ... 
3865    2022
3866    2022
3867    2022
3868    2022
3869    2022
Name: season, Length: 3870, dtype: int64

In [249]:
modeling.head()

Unnamed: 0,SeasonID_Team1,PPG_Team1,Tot_TDs_PG_Team1,1st_Downs_PG_Team1,Rush_1st_Downs_PG_Team1,Pass_1st_Downs_PG_Team1,OFF_1st_by_pen_PG_Team1,3rd_Conv_Rate_Team1,4th_Conv_Rate_Team1,Pass_Comp_Rate_Team1,...,season,away_team,away_score,home_team,home_score,div_game,away_summary,home_summary,total_points,summary
0,Offense--ind-2004,32.6,4.125,23.6875,5.875,14.875,2.9375,42.68,57.14,0.669829,...,2004,ind,24.0,ne,27.0,0,ind_24.0,ne_27.0,51.0,ind_ne2004
1,Offense--ten-2004,21.5,2.5625,19.25,5.3125,12.5,1.4375,34.1,44.44,0.604414,...,2004,ten,17.0,mia,7.0,0,ten_17.0,mia_7.0,24.0,ten_mia2004
2,Offense--jax-2004,17.4,1.625,17.4375,5.5,10.0625,1.875,36.87,55.56,0.594542,...,2004,jax,13.0,buf,10.0,0,jax_13.0,buf_10.0,23.0,jax_buf2004
3,Offense--det-2004,18.5,2.0,16.4375,5.75,8.875,1.8125,31.43,37.5,0.564356,...,2004,det,20.0,chi,16.0,1,det_20.0,chi_16.0,36.0,det_chi2004
4,Offense--bal-2004,19.8,2.0625,16.25,6.4375,8.4375,1.375,35.07,26.67,0.554839,...,2004,bal,3.0,cle,20.0,1,bal_3.0,cle_20.0,23.0,bal_cle2004


In [250]:
modeling.set_index('summary', inplace=True)

In [251]:
modeling = modeling.drop(columns='div_game')

In [252]:
modeling.head()

Unnamed: 0_level_0,SeasonID_Team1,PPG_Team1,Tot_TDs_PG_Team1,1st_Downs_PG_Team1,Rush_1st_Downs_PG_Team1,Pass_1st_Downs_PG_Team1,OFF_1st_by_pen_PG_Team1,3rd_Conv_Rate_Team1,4th_Conv_Rate_Team1,Pass_Comp_Rate_Team1,...,Team1_Abbrev,Team2_Abbrev,season,away_team,away_score,home_team,home_score,away_summary,home_summary,total_points
summary,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
ind_ne2004,Offense--ind-2004,32.6,4.125,23.6875,5.875,14.875,2.9375,42.68,57.14,0.669829,...,ind,ne,2004,ind,24.0,ne,27.0,ind_24.0,ne_27.0,51.0
ten_mia2004,Offense--ten-2004,21.5,2.5625,19.25,5.3125,12.5,1.4375,34.1,44.44,0.604414,...,ten,mia,2004,ten,17.0,mia,7.0,ten_17.0,mia_7.0,24.0
jax_buf2004,Offense--jax-2004,17.4,1.625,17.4375,5.5,10.0625,1.875,36.87,55.56,0.594542,...,jax,buf,2004,jax,13.0,buf,10.0,jax_13.0,buf_10.0,23.0
det_chi2004,Offense--det-2004,18.5,2.0,16.4375,5.75,8.875,1.8125,31.43,37.5,0.564356,...,det,chi,2004,det,20.0,chi,16.0,det_20.0,chi_16.0,36.0
bal_cle2004,Offense--bal-2004,19.8,2.0625,16.25,6.4375,8.4375,1.375,35.07,26.67,0.554839,...,bal,cle,2004,bal,3.0,cle,20.0,bal_3.0,cle_20.0,23.0


In [253]:
newDF = modeling.select_dtypes(include='number')
newDF.head()

Unnamed: 0_level_0,PPG_Team1,Tot_TDs_PG_Team1,1st_Downs_PG_Team1,Rush_1st_Downs_PG_Team1,Pass_1st_Downs_PG_Team1,OFF_1st_by_pen_PG_Team1,3rd_Conv_Rate_Team1,4th_Conv_Rate_Team1,Pass_Comp_Rate_Team1,Pass_Yds_PG_Team1,...,Pass_Att_PG_Team2,DEF_Pass_Att_PG_Team2,DEF_Sacks_PG_Team2,DEF_Sack_Yds_PG_Team2,DEF_FG_Att_PG_Team2,DEF_FG_Good_PG_Team2,season,away_score,home_score,total_points
summary,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
ind_ne2004,32.6,4.125,23.6875,5.875,14.875,2.9375,42.68,57.14,0.669829,288.9375,...,30.3125,33.625,2.8125,19.4375,1.125,0.9375,2004,24.0,27.0,51.0
ten_mia2004,21.5,2.5625,19.25,5.3125,12.5,1.4375,34.1,44.44,0.604414,226.0,...,36.625,27.125,2.25,13.9375,1.75,1.25,2004,17.0,7.0,24.0
jax_buf2004,17.4,1.625,17.4375,5.5,10.0625,1.875,36.87,55.56,0.594542,197.4375,...,28.8125,30.375,2.8125,19.9375,2.0,1.6875,2004,13.0,10.0,23.0
det_chi2004,18.5,2.0,16.4375,5.75,8.875,1.8125,31.43,37.5,0.564356,182.25,...,29.4375,32.1875,2.1875,10.8125,2.25,1.625,2004,20.0,16.0,36.0
bal_cle2004,19.8,2.0625,16.25,6.4375,8.4375,1.375,35.07,26.67,0.554839,144.5,...,27.4375,28.75,2.0,11.875,1.75,1.5,2004,3.0,20.0,23.0


In [295]:
newDF.to_csv('FINAL_NET_TRAINING.csv')

In [254]:
# from sklearn.model_selection import train_test_split
# from sklearn.preprocessing import StandardScaler

# X = newDF.drop(columns = ['total_points'])
# y = newDF['total_points']

# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.33)

# scaler = StandardScaler().fit(X_train)
# X_train = scaler.transform(X_train)
# X_test = scaler.transform(X_test)


# This is my old process and should not be used. Accidentally leaks home and away
# points in prediction of the total points, which obviously is not allowed.

In [255]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

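# Drop the target, the two scores it is computed from, and the redundant year columns
X = newDF.drop(columns = ['total_points', 'away_score', 'home_score', 'Year_Team1', 'Year_Team2'])
y = newDF['total_points']
# Matchup labels are split alongside X and y so predictions can be looked up by game later
indices = newDF.index

X_train, X_test, y_train, y_test, index_train, index_test = train_test_split(X, y, indices, test_size = 0.33)

# Fit the scaler on the training set only, then apply the same transform to both sets
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)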

In [296]:
X.shape

(3870, 153)

In [257]:
indices

Index(['ind_ne2004', 'ten_mia2004', 'jax_buf2004', 'det_chi2004',
       'bal_cle2004', 'sea_no2004', 'cin_nyj2004', 'dal_min2004',
       'nyg_phi2004', 'atl_sf2004',
       ...
       'min_chi2022', 'bal_cin2022', 'hou_ind2022', 'nyj_mia2022',
       'car_no2022', 'cle_pit2022', 'lac_den2022', 'nyg_phi2022', 'ari_sf2022',
       'det_gb2022'],
      dtype='object', name='summary', length=3870)

In [258]:
type(X_test)

numpy.ndarray

In [259]:
X_test.shape

(1278, 153)

In [260]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

In [261]:
model = keras.Sequential()
# is keras the right one?

model.add(layers.Dense(120, activation="relu"))
model.add(layers.Dense(80, activation="relu"))
model.add(layers.Dense(60, activation="relu"))
model.add(layers.Dense(40, activation="relu"))
model.add(layers.Dense(20, activation="relu"))
model.add(layers.Dense(40, activation="relu"))
model.add(layers.Dense(60, activation="relu"))
model.add(layers.Dense(80, activation="relu"))
model.add(layers.Dense(120, activation="relu"))

# when should I use which activation? How dense?
# try a gridsearch?

model.add(layers.Dense(1, activation=None))

model.compile(
    # Optimizer
    optimizer=keras.optimizers.Adam(),
    # Loss function to minimize
    loss=keras.losses.MeanSquaredError()
)

In [262]:
please = model.fit(X_train, y_train, epochs=100, verbose=1)

Epoch 1/100
Epoch 2/100
...

In [263]:
train_loss = please.history["loss"][-1]
result = model.evaluate(X_test, y_test, verbose=0)

print(f"Train Loss: {train_loss:.4f}")
print(f"Test Loss: {result:.4f}")

predictions = model.predict(X_test)

Train Loss: 4.7960
Test Loss: 267.4741


## Note to self: this seems overfit (train loss 4.80 vs. test loss 267.47)
- Baseline:
model.add(layers.Dense(40, activation="relu"))
model.add(layers.Dense(40, activation="relu"))
model.add(layers.Dense(40, activation="relu"))
model.add(layers.Dense(40, activation="relu"))
model.add(layers.Dense(40, activation="relu"))
model.add(layers.Dense(2, activation=None))
- Results: train loss = 0.6254, test loss = 7.5824
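
One common way to limit this kind of overfitting is to hold out a validation slice during training and stop once validation loss stops improving. The sketch below is not the model from this notebook: the data is synthetic, and the layer sizes, `patience` and epoch count are arbitrary assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic stand-in for the scaled training data (shapes are arbitrary)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10)).astype("float32")
y_demo = (X_demo[:, 0] * 3.0 + rng.normal(size=200)).astype("float32")

model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation=None),
])
model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.MeanSquaredError(),
)

# Stop once validation loss has not improved for `patience` epochs,
# and roll back to the best weights seen so far
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True
)

history = model.fit(
    X_demo, y_demo,
    validation_split=0.2,  # the last 20% of rows are held out for validation
    epochs=50,
    callbacks=[early_stop],
    verbose=0,
)

print(len(history.history["val_loss"]), "epochs run")
```

Plotting `history.history["loss"]` against `history.history["val_loss"]` would show where the two curves start to diverge.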


In [264]:
predictions.shape

(1278, 1)

In [265]:
index_test.shape

(1278,)

In [266]:
predictions_df = pd.DataFrame(predictions, index=index_test)

In [267]:
predictions_df = predictions_df.rename(columns={0: 'total_points'})

In [268]:
predictions_df.head()

Unnamed: 0_level_0,total_points
summary,Unnamed: 1_level_1
pit_bal2014,40.173683
car_no2005,34.659206
tb_sf2005,48.464985
sea_ten2005,44.808758
dal_cle2016,55.810951


In [269]:
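def get_scores(away_team, home_team, season):
    """Return the predicted total points for one test-set matchup."""
    # Index keys look like 'pit_bal2014': away_home followed by the season.
    temp_index = away_team + '_' + home_team + str(season)
    result = predictions_df.loc[temp_index, 'total_points']
    return result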

In [297]:
score = get_scores('wsh', 'sea', '2015')
score

44.39333
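Because `predictions_df` only holds the test split, looking up a matchup that landed in the training split raises a `KeyError`. A minimal sketch of a more forgiving lookup follows; `get_score_or_none` and `demo_predictions` are hypothetical names, the two rows are copied from the `head()` output above, and the index is assumed to be unique.

```python
import pandas as pd

# Hypothetical stand-in for predictions_df (two rows copied from the head() output).
demo_predictions = pd.DataFrame(
    {"total_points": [40.173683, 34.659206]},
    index=pd.Index(["pit_bal2014", "car_no2005"], name="summary"),
)

def get_score_or_none(df, away_team, home_team, season):
    """Return the predicted total for a matchup, or None if it is not in df."""
    key = f"{away_team}_{home_team}{season}"  # same key format as get_scores
    if key not in df.index:
        return None
    return float(df.loc[key, "total_points"])

print(get_score_or_none(demo_predictions, "pit", "bal", 2014))
print(get_score_or_none(demo_predictions, "bal", "pit", 2014))  # reversed order is a different key
```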

In [271]:
from sklearn.metrics import mean_squared_error

mse = mean_squared_error(y_test,predictions)
mse

267.47414386230685

In [272]:
rmse = np.sqrt(mse)
rmse

16.35463676950078
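An RMSE is easier to interpret next to a naive reference, such as always predicting the training-set mean. The sketch below uses toy numbers, not the notebook's data, and `rmse` and `mean_baseline_rmse` are hypothetical helpers. It also flattens its inputs, since subtracting a `(n, 1)` prediction array from a `(n,)` target in plain NumPy would broadcast to `(n, n)`.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error; inputs are flattened to avoid (n,) vs (n, 1) broadcasting."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mean_baseline_rmse(y_train, y_test):
    """RMSE of a model that always predicts the training-set mean."""
    constant = float(np.mean(y_train))
    return rmse(y_test, np.full(len(y_test), constant))

# Toy values for illustration only.
y_train_demo = [40, 44, 48]
y_test_demo = [41, 47]
y_pred_demo = [[42], [46]]  # shaped (n, 1), like the output of model.predict

print("model RMSE:   ", rmse(y_test_demo, y_pred_demo))
print("baseline RMSE:", mean_baseline_rmse(y_train_demo, y_test_demo))
```

If the model's RMSE is not clearly below the mean-baseline RMSE on the real test split, the network is adding little over a constant guess.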

In [273]:
predictions_df.describe()

Unnamed: 0,total_points
count,1278.0
mean,44.329533
std,11.895653
min,14.020357
25%,36.062719
50%,43.579376
75%,51.655582
max,92.667191


# Model Predictions
- Feeding season-to-date stats (scraped through Week 11) into the model to predict Week 12 games

In [274]:
wk11 = pd.read_csv('/Users/justintunley/Documents/BrainStation/Capstone/Week12_modeling.csv')

In [275]:
wk11.head()

Unnamed: 0,SeasonID_Team1,PPG_Team1,Tot_TDs_PG_Team1,1st_Downs_PG_Team1,Rush_1st_Downs_PG_Team1,Pass_1st_Downs_PG_Team1,OFF_1st_by_pen_PG_Team1,3rd_Conv_Rate_Team1,4th_Conv_Rate_Team1,Pass_Comp_Rate_Team1,...,Sack_Yds_Lost_PG_Team2,FG_Att_PG_Team2,FG_Good_PG_Team2,Pass_Att_PG_Team2,DEF_Pass_Att_PG_Team2,DEF_Sacks_PG_Team2,DEF_Sack_Yds_PG_Team2,DEF_FG_Att_PG_Team2,DEF_FG_Good_PG_Team2,season
0,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,...,12.411765,1.294118,1.235294,18.941176,19.0,1.235294,8.294118,1.235294,1.235294,2023
1,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,...,9.0,1.352941,1.117647,18.235294,23.647059,2.588235,20.588235,1.470588,1.411765,2023
2,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,...,4.470588,1.117647,0.941176,22.470588,21.058824,2.294118,13.411765,1.117647,1.058824,2023
3,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,...,18.058824,1.176471,1.0,22.235294,17.294118,1.0,7.352941,0.882353,0.882353,2023
4,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,...,11.823529,1.176471,1.117647,19.294118,23.647059,0.882353,5.235294,1.411765,1.058824,2023


In [276]:
wk11.shape

(992, 155)

In [277]:
hack_t1 = wk11['SeasonID_Team1'].str.split('-', expand=True)
hack_t1.head()

Unnamed: 0,0,1,2,3
0,Offense,,ari,2023
1,Offense,,ari,2023
2,Offense,,ari,2023
3,Offense,,ari,2023
4,Offense,,ari,2023


In [278]:
wk11['Team1_abbrev'] = hack_t1[2]

In [279]:
hack_t2 = wk11['SeasonID_Team2'].str.split('-', expand=True)
hack_t2.head()

Unnamed: 0,0,1,2,3
0,Offense,,atl,2023
1,Offense,,bal,2023
2,Offense,,buf,2023
3,Offense,,car,2023
4,Offense,,chi,2023


In [280]:
wk11['Team2_abbrev'] = hack_t2[2]
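Splitting on a single `-` works, but it leaves an empty column because of the double dash in IDs like `Offense--ari-2023`. A possible alternative is a named-group regex with `Series.str.extract`. This is a sketch on two sample IDs; the pattern assumes every ID follows the `Unit--team-YYYY` shape seen in the output above.

```python
import pandas as pd

# Sample IDs in the format shown in the SeasonID_Team1 / SeasonID_Team2 columns.
demo_ids = pd.Series(["Offense--ari-2023", "Offense--atl-2023"])

# Assumed ID layout: <unit>--<team abbreviation>-<four-digit season>
pattern = r"^(?P<unit>[A-Za-z]+)--(?P<team>[a-z]+)-(?P<season>\d{4})$"
parts = demo_ids.str.extract(pattern)

print(parts)
```

Rows that do not match the pattern come back as `NaN`, so malformed IDs show up in a null check rather than shifting columns silently.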

In [281]:
wk11.head()

Unnamed: 0,SeasonID_Team1,PPG_Team1,Tot_TDs_PG_Team1,1st_Downs_PG_Team1,Rush_1st_Downs_PG_Team1,Pass_1st_Downs_PG_Team1,OFF_1st_by_pen_PG_Team1,3rd_Conv_Rate_Team1,4th_Conv_Rate_Team1,Pass_Comp_Rate_Team1,...,FG_Good_PG_Team2,Pass_Att_PG_Team2,DEF_Pass_Att_PG_Team2,DEF_Sacks_PG_Team2,DEF_Sack_Yds_PG_Team2,DEF_FG_Att_PG_Team2,DEF_FG_Good_PG_Team2,season,Team1_abbrev,Team2_abbrev
0,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,...,1.235294,18.941176,19.0,1.235294,8.294118,1.235294,1.235294,2023,ari,atl
1,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,...,1.117647,18.235294,23.647059,2.588235,20.588235,1.470588,1.411765,2023,ari,bal
2,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,...,0.941176,22.470588,21.058824,2.294118,13.411765,1.117647,1.058824,2023,ari,buf
3,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,...,1.0,22.235294,17.294118,1.0,7.352941,0.882353,0.882353,2023,ari,car
4,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,...,1.117647,19.294118,23.647059,0.882353,5.235294,1.411765,1.058824,2023,ari,chi


In [282]:
wk11['summary'] = wk11['Team1_abbrev'] + '_' + wk11['Team2_abbrev'] + wk11['season'].astype(str)

In [283]:
wk11.head()

Unnamed: 0,SeasonID_Team1,PPG_Team1,Tot_TDs_PG_Team1,1st_Downs_PG_Team1,Rush_1st_Downs_PG_Team1,Pass_1st_Downs_PG_Team1,OFF_1st_by_pen_PG_Team1,3rd_Conv_Rate_Team1,4th_Conv_Rate_Team1,Pass_Comp_Rate_Team1,...,Pass_Att_PG_Team2,DEF_Pass_Att_PG_Team2,DEF_Sacks_PG_Team2,DEF_Sack_Yds_PG_Team2,DEF_FG_Att_PG_Team2,DEF_FG_Good_PG_Team2,season,Team1_abbrev,Team2_abbrev,summary
0,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,...,18.941176,19.0,1.235294,8.294118,1.235294,1.235294,2023,ari,atl,ari_atl2023
1,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,...,18.235294,23.647059,2.588235,20.588235,1.470588,1.411765,2023,ari,bal,ari_bal2023
2,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,...,22.470588,21.058824,2.294118,13.411765,1.117647,1.058824,2023,ari,buf,ari_buf2023
3,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,...,22.235294,17.294118,1.0,7.352941,0.882353,0.882353,2023,ari,car,ari_car2023
4,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,...,19.294118,23.647059,0.882353,5.235294,1.411765,1.058824,2023,ari,chi,ari_chi2023


In [284]:
wk11.set_index('summary', inplace=True)
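The 992 rows are consistent with every ordered pairing of 32 teams (32 × 31 = 992), which would also make each `summary` key unique. The sketch below builds keys in the same `away_home` + season format for three sample teams; `ordered_matchup_keys` is a hypothetical helper.

```python
from itertools import permutations

import pandas as pd

def ordered_matchup_keys(teams, season):
    """Build 'away_home<season>' keys for every ordered pair of distinct teams."""
    return [f"{away}_{home}{season}" for away, home in permutations(teams, 2)]

demo_keys = ordered_matchup_keys(["ari", "atl", "bal"], 2023)
demo_index = pd.Index(demo_keys, name="summary")

n_teams = 32
n_pairs = n_teams * (n_teams - 1)  # ordered pairs, so ari_atl and atl_ari both count

print(demo_keys)
print("unique:", demo_index.is_unique, "| ordered pairs for 32 teams:", n_pairs)
```

On the real data, `wk11.index.is_unique` is a quick check that no key was duplicated before indexing by `summary`.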

In [285]:
wk11.head()

summary,SeasonID_Team1,PPG_Team1,Tot_TDs_PG_Team1,1st_Downs_PG_Team1,Rush_1st_Downs_PG_Team1,Pass_1st_Downs_PG_Team1,OFF_1st_by_pen_PG_Team1,3rd_Conv_Rate_Team1,4th_Conv_Rate_Team1,Pass_Comp_Rate_Team1,...,FG_Good_PG_Team2,Pass_Att_PG_Team2,DEF_Pass_Att_PG_Team2,DEF_Sacks_PG_Team2,DEF_Sack_Yds_PG_Team2,DEF_FG_Att_PG_Team2,DEF_FG_Good_PG_Team2,season,Team1_abbrev,Team2_abbrev
ari_atl2023,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,...,1.235294,18.941176,19.0,1.235294,8.294118,1.235294,1.235294,2023,ari,atl
ari_bal2023,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,...,1.117647,18.235294,23.647059,2.588235,20.588235,1.470588,1.411765,2023,ari,bal
ari_buf2023,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,...,0.941176,22.470588,21.058824,2.294118,13.411765,1.117647,1.058824,2023,ari,buf
ari_car2023,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,...,1.0,22.235294,17.294118,1.0,7.352941,0.882353,0.882353,2023,ari,car
ari_chi2023,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,...,1.117647,19.294118,23.647059,0.882353,5.235294,1.411765,1.058824,2023,ari,chi


In [286]:
wk11.to_csv('Model-ready-Wk11')
wk11.head()

summary,SeasonID_Team1,PPG_Team1,Tot_TDs_PG_Team1,1st_Downs_PG_Team1,Rush_1st_Downs_PG_Team1,Pass_1st_Downs_PG_Team1,OFF_1st_by_pen_PG_Team1,3rd_Conv_Rate_Team1,4th_Conv_Rate_Team1,Pass_Comp_Rate_Team1,...,FG_Good_PG_Team2,Pass_Att_PG_Team2,DEF_Pass_Att_PG_Team2,DEF_Sacks_PG_Team2,DEF_Sack_Yds_PG_Team2,DEF_FG_Att_PG_Team2,DEF_FG_Good_PG_Team2,season,Team1_abbrev,Team2_abbrev
ari_atl2023,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,...,1.235294,18.941176,19.0,1.235294,8.294118,1.235294,1.235294,2023,ari,atl
ari_bal2023,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,...,1.117647,18.235294,23.647059,2.588235,20.588235,1.470588,1.411765,2023,ari,bal
ari_buf2023,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,...,0.941176,22.470588,21.058824,2.294118,13.411765,1.117647,1.058824,2023,ari,buf
ari_car2023,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,...,1.0,22.235294,17.294118,1.0,7.352941,0.882353,0.882353,2023,ari,car
ari_chi2023,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,...,1.117647,19.294118,23.647059,0.882353,5.235294,1.411765,1.058824,2023,ari,chi


In [287]:
bazinga = pd.read_csv('/Users/justintunley/Documents/BrainStation/Capstone/Model-ready-Wk11', index_col='summary')

In [288]:
Wk11DF = bazinga.select_dtypes(include='number')
Wk11DF.head()

summary,PPG_Team1,Tot_TDs_PG_Team1,1st_Downs_PG_Team1,Rush_1st_Downs_PG_Team1,Pass_1st_Downs_PG_Team1,OFF_1st_by_pen_PG_Team1,3rd_Conv_Rate_Team1,4th_Conv_Rate_Team1,Pass_Comp_Rate_Team1,Pass_Yds_PG_Team1,...,Sack_Yds_Lost_PG_Team2,FG_Att_PG_Team2,FG_Good_PG_Team2,Pass_Att_PG_Team2,DEF_Pass_Att_PG_Team2,DEF_Sacks_PG_Team2,DEF_Sack_Yds_PG_Team2,DEF_FG_Att_PG_Team2,DEF_FG_Good_PG_Team2,season
ari_atl2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,111.176471,...,12.411765,1.294118,1.235294,18.941176,19.0,1.235294,8.294118,1.235294,1.235294,2023
ari_bal2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,111.176471,...,9.0,1.352941,1.117647,18.235294,23.647059,2.588235,20.588235,1.470588,1.411765,2023
ari_buf2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,111.176471,...,4.470588,1.117647,0.941176,22.470588,21.058824,2.294118,13.411765,1.117647,1.058824,2023
ari_car2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,111.176471,...,18.058824,1.176471,1.0,22.235294,17.294118,1.0,7.352941,0.882353,0.882353,2023
ari_chi2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,111.176471,...,11.823529,1.176471,1.117647,19.294118,23.647059,0.882353,5.235294,1.411765,1.058824,2023


In [289]:
columns_not_shared = set(X.columns) ^ set(Wk11DF.columns)
columns_not_shared

set()
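An empty symmetric difference confirms the column names match, but not that they are in the same order, and the scaler and the network both depend on column position. A minimal sketch of aligning the order explicitly before scaling follows; `align_features` is a hypothetical helper and the demo frame is made up. Some scikit-learn versions check feature names on `transform`, but aligning explicitly does not rely on that.

```python
import pandas as pd

def align_features(df, expected_columns):
    """Return df with columns in the training order; fail loudly on any mismatch."""
    expected = list(expected_columns)
    mismatch = set(expected) ^ set(df.columns)
    if mismatch:
        raise ValueError(f"Column mismatch: {sorted(mismatch)}")
    return df[expected]

# Made-up frame whose columns are in a different order than the expected list.
demo_features = pd.DataFrame({"season": [2023], "PPG_Team1": [17.5]})
aligned = align_features(demo_features, ["PPG_Team1", "season"])

print(list(aligned.columns))
```

In the notebook this would correspond to something like `align_features(Wk11DF, X.columns)` before calling `scaler.transform`.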

In [290]:
Wk11DF = scaler.transform(Wk11DF)

In [291]:
Wk12_predictions = model.predict(Wk11DF)
Wk12_predictions



array([[47.07318 ],
       [65.700424],
       [66.426926],
       [61.327442],
       [47.903008],
       [43.85438 ],
       [64.388824],
       [79.52333 ],
       [51.000683],
       [67.21874 ],
       [55.01541 ],
       [47.870647],
       [58.18981 ],
       [51.937176],
       [71.13933 ],
       [50.85149 ],
       [54.42864 ],
       [49.01807 ],
       [65.15986 ],
       [66.10139 ],
       [49.385567],
       [42.829906],
       [38.991196],
       [57.14031 ],
       [52.51616 ],
       [49.233765],
       [56.95738 ],
       [45.75623 ],
       [51.395237],
       [46.666775],
       [67.60845 ],
       [52.52376 ],
       [64.71544 ],
       [63.538433],
       [56.45595 ],
       [45.2404  ],
       [45.857147],
       [66.46914 ],
       [79.63013 ],
       [42.3392  ],
       [64.13027 ],
       [53.807186],
       [50.01488 ],
       [57.2795  ],
       [49.8758  ],
       [72.71152 ],
       [46.358227],
       [49.0639  ],
       [48.40206 ],
       [63.668495],


In [298]:
total = pd.DataFrame(Wk12_predictions, index=wk11.index)

In [301]:
total = total.rename(columns={0:'expected_points'})
total.head()

summary,expected_points
ari_atl2023,47.073181
ari_bal2023,65.700424
ari_buf2023,66.426926
ari_car2023,61.327442
ari_chi2023,47.903008


In [303]:
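def get_total_prediction(away_team, home_team, season):
    """Return the predicted total points for an away/home matchup."""
    # Index keys look like 'chi_min2023': away_home followed by the season.
    temp_index = away_team + '_' + home_team + str(season)
    result = total.loc[temp_index, 'expected_points']
    return result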

In [304]:
points_prediction = get_total_prediction('chi', 'min', '2023')
# put the away team first; returns the predicted total points for the matchup
points_prediction

67.2491
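To pull predictions for a whole slate at once rather than one call per game, the same key format can be built for a list of matchups and passed to `reindex`. This is a sketch: `lookup_matchups` is a hypothetical helper, the demo rows are copied from the `total.head()` output, and the matchups are illustrative rather than the actual Week 12 schedule.

```python
import pandas as pd

# Hypothetical stand-in for `total` (three rows copied from the head() output).
demo_total = pd.DataFrame(
    {"expected_points": [47.073181, 65.700424, 66.426926]},
    index=pd.Index(["ari_atl2023", "ari_bal2023", "ari_buf2023"], name="summary"),
)

def lookup_matchups(df, matchups, season):
    """Return expected points for (away, home) pairs; unknown matchups come back as NaN."""
    keys = [f"{away}_{home}{season}" for away, home in matchups]
    return df.reindex(keys)["expected_points"]

slate = lookup_matchups(demo_total, [("ari", "bal"), ("ari", "zzz")], 2023)
print(slate)
```

Using `reindex` instead of `.loc` keeps a mistyped abbreviation from raising mid-slate; the missing row shows up as `NaN` instead.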

In [302]:
total.describe()

Unnamed: 0,expected_points
count,992.0
mean,55.463173
std,11.349466
min,26.418385
25%,47.361275
50%,54.467014
75%,63.053327
max,92.886185
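As a quick sanity check, the Week 12 predictions can be compared with the test-set predictions from `predictions_df.describe()` earlier. The arithmetic below uses the means and standard deviation copied from the two `describe()` outputs; whether a shift of this size points to a problem (for example, in-season stats on a different scale than the historical ones) is an open question, not a conclusion.

```python
# Values copied from the two describe() outputs in this notebook.
test_pred_mean = 44.329533   # predictions_df (historical test split)
test_pred_std = 11.895653
week12_pred_mean = 55.463173  # total (2023 matchups)

mean_shift = week12_pred_mean - test_pred_mean
shift_in_std_units = mean_shift / test_pred_std

print(f"Mean shift: {mean_shift:.2f} points ({shift_in_std_units:.2f} test-set standard deviations)")
```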
