# NFL Gambling Model
by: Leo Dueker

### Project Summary and Goals:

This project uses several machine learning methods to predict money line bet winners in NFL games. The dataset comes from https://www.kaggle.com/datasets/tobycrabtree/nfl-scores-and-betting-data, a collection of game, weather, and gambling data from pro-football-reference, ESPN, NFL Weather, and a few other databases.

After the data has been cleaned, custom features that help predict the home team's chance of winning are created from it. The four best features are selected using recursive feature elimination (RFE) with a base model. The data is then run through several popular machine learning methods to determine which perform best, measured by mean cross-validated ROC AUC.
    
Money line predictions use classification models to predict the straight-up winner. Once the model has been trained, it is tested on held-out seasons, and each winning bet produces a return based on the spread of the game and the average money line for that spread. The average money line for each spread is given in favMoneyLineDict and undMoneyLineDict. All of the bets in this model are theoretical.
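
As a rough illustration of how an average money line turns into a return, the sketch below computes the profit on a winning $10 bet. It assumes the usual American-odds convention (a favorite priced at -X risks X to win 100, an underdog priced at +X wins X per 100 risked) and that the dictionary values are the absolute values of those lines. The helper names `favorite_profit` and `underdog_profit` are hypothetical and do not appear in the notebook.

```python
STAKE = 10.0  # assumed flat stake per bet, in dollars


def favorite_profit(line: float, stake: float = STAKE) -> float:
    """Profit on a winning favorite bet priced at -line."""
    return stake * 100.0 / line


def underdog_profit(line: float, stake: float = STAKE) -> float:
    """Profit on a winning underdog bet priced at +line."""
    return stake * line / 100.0


# a 3-point spread: average lines of 164 (favorite) and 135 (underdog)
print(round(favorite_profit(164), 6))  # 6.097561
print(round(underdog_profit(135), 6))  # 13.5
```

The stake itself is returned on top of the profit when a bet wins, which is why the results section adds 10 to each winning bet.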

## Data Setup

Packages:

In [743]:
# packages
import pandas as pd
pd.options.mode.chained_assignment = None
import numpy as np
import datetime
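import warnings
# suppress deprecation warnings for the rest of the session
# (a filter set inside `with warnings.catch_warnings():` is discarded when the block exits)
warnings.filterwarnings("ignore", category=DeprecationWarning)
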
import sklearn

# required machine learning packages
from sklearn import model_selection
from sklearn.feature_selection import RFE
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV as CCV

from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, VotingClassifier
import xgboost as xgb

Initial Datasets:

In [744]:
scoresDF = pd.read_csv("spreadspoke_scores.csv")
scoresDF.head()
scoresDF.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13504 entries, 0 to 13503
Data columns (total 17 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   schedule_date        13504 non-null  object 
 1   schedule_season      13504 non-null  int64  
 2   schedule_week        13504 non-null  object 
 3   schedule_playoff     13504 non-null  bool   
 4   team_home            13504 non-null  object 
 5   score_home           13296 non-null  float64
 6   score_away           13296 non-null  float64
 7   team_away            13504 non-null  object 
 8   team_favorite_id     10817 non-null  object 
 9   spread_favorite      10817 non-null  float64
 10  over_under_line      10807 non-null  object 
 11  stadium              13504 non-null  object 
 12  stadium_neutral      13504 non-null  bool   
 13  weather_temperature  12280 non-null  float64
 14  weather_wind_mph     12264 non-null  float64
 15  weather_humidity     8441 non-null  

In [745]:
teamsDF = pd.read_csv("nfl_teams.csv")
teamsDF.head()

Unnamed: 0,team_name,team_name_short,team_id,team_id_pfr,team_conference,team_division,team_conference_pre2002,team_division_pre2002
0,Arizona Cardinals,Cardinals,ARI,CRD,NFC,NFC West,NFC,NFC West
1,Atlanta Falcons,Falcons,ATL,ATL,NFC,NFC South,NFC,NFC West
2,Baltimore Colts,Colts,IND,CLT,AFC,,AFC,AFC East
3,Baltimore Ravens,Ravens,BAL,RAV,AFC,AFC North,AFC,AFC Central
4,Boston Patriots,Patriots,NE,NWE,AFC,,AFC,


In [746]:
favMoneyLineDict = {0 : 110,
                 -0.5 : 116,
                 -1.0 : 122, 
                 -1.5 : 128,
                 -2.0 : 131,
                 -2.5 : 142,
                 -3.0 : 164,
                 -3.5 : 191,
                 -4.0 : 211,
                 -4.5 : 224,
                 -5.0 : 234,
                 -5.5 : 244,
                 -6.0 : 261,
                 -6.5 : 282,
                 -7.0 : 319,
                 -7.5 : 346,
                 -8.0 : 366,
                 -8.5 : 397,
                 -9.0 : 416,
                 -9.5 : 436,
                 -10.0 : 483,
                 -10.5 : 538,
                 -11.0 : 567,
                 -11.5 : 646,
                 -12.0 : 660,
                 -12.5 : 675,
                 -13.0 : 729,
                 -13.5 : 819,
                 -14.0 : 890,
                 -14.5 : 984,
                 -15.0 : 1134,
                 -15.5 : 1197
                }

undMoneyLineDict = {0 : 100,
                 -0.5 : 100,
                 -1.0 : 101, 
                 -1.5 : 105,
                 -2.0 : 108,
                 -2.5 : 117,
                 -3.0 : 135,
                 -3.5 : 156,
                 -4.0 : 171,
                 -4.5 : 181,
                 -5.0 : 188,
                 -5.5 : 195,
                 -6.0 : 208,
                 -6.5 : 224,
                 -7.0 : 249,
                 -7.5 : 268,
                 -8.0 : 282,
                 -8.5 : 302,
                 -9.0 : 314,
                 -9.5 : 327,
                 -10.0 : 356,
                 -10.5 : 389,
                 -11.0 : 406,
                 -11.5 : 450,
                 -12.0 : 458,
                 -12.5 : 466,
                 -13.0 : 494,
                 -13.5 : 539,
                 -14.0 : 573,
                 -14.5 : 615,
                 -15.0 : 677,
                 -15.5 : 702
                }

## Data Cleaning

In [747]:
#replacing blank with NaN
scoresDF = scoresDF.replace(r'^\s*$', np.nan, regex=True)

# dropping rows with null values in specific columns, and seasons before 1990
scoresDF = scoresDF[scoresDF.score_home.notnull()]
scoresDF = scoresDF[scoresDF.team_favorite_id.notnull()]
scoresDF = scoresDF[scoresDF.over_under_line.notnull()]
scoresDF = scoresDF[scoresDF.schedule_season >= 1990]

#Reset Index
scoresDF.reset_index(drop=True, inplace=True)

#Change datatype of line to float
scoresDF['over_under_line'] = scoresDF.over_under_line.astype(float)

#Mapping Team names using teamsDF
scoresDF['team_home'] = scoresDF.team_home.map(teamsDF.set_index('team_name')['team_id'].to_dict())
scoresDF['team_away'] = scoresDF.team_away.map(teamsDF.set_index('team_name')['team_id'].to_dict())


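# setting the favorite for the 1968 and 1970 Super Bowls
# (no effect here, since seasons before 1990 were dropped above)
scoresDF.loc[(scoresDF.schedule_season == 1968) & (scoresDF.schedule_week == 'Superbowl'), 'team_favorite_id'] = 'IND'
scoresDF.loc[(scoresDF.schedule_season == 1970) & (scoresDF.schedule_week == 'Superbowl'), 'team_favorite_id'] = 'IND'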

#Creating home and away favorite flags (1 if that team is the favorite, 0 otherwise)
scoresDF['home_favorite'] = (scoresDF.team_favorite_id == scoresDF.team_home).astype(float)
scoresDF['away_favorite'] = (scoresDF.team_favorite_id == scoresDF.team_away).astype(float)

#creating over-under column for stats summary (1 if the total score went over the line, 0 otherwise)
scoresDF['over'] = ((scoresDF.score_home + scoresDF.score_away) > scoresDF.over_under_line).astype(float)

# stadium neutral and schedule playoff as integers (0/1)
scoresDF['stadium_neutral'] = scoresDF.stadium_neutral.astype(int)
scoresDF['schedule_playoff'] = scoresDF.schedule_playoff.astype(int)

# change data type of date columns
scoresDF['schedule_date'] = pd.to_datetime(scoresDF['schedule_date'])

In [748]:
#Recoding schedule_week as integers: week 18 is folded into week 17, and playoff rounds become weeks 18-21
scoresDF.loc[(scoresDF.schedule_week == '18'), 'schedule_week'] = '17'
scoresDF.loc[(scoresDF.schedule_week == 'Wildcard') | (scoresDF.schedule_week == 'WildCard'), 'schedule_week'] = '18'
scoresDF.loc[(scoresDF.schedule_week == 'Division'), 'schedule_week'] = '19'
scoresDF.loc[(scoresDF.schedule_week == 'Conference'), 'schedule_week'] = '20'
scoresDF.loc[(scoresDF.schedule_week == 'Superbowl') | (scoresDF.schedule_week == 'SuperBowl'), 'schedule_week'] = '21'
scoresDF['schedule_week'] = scoresDF.schedule_week.astype(int)

In [749]:
#selecting only the columns needed for analysis
scoresDF = scoresDF[['schedule_date', 'schedule_season', 'schedule_week', 'team_home',
       'team_away', 'team_favorite_id', 'spread_favorite',
       'over_under_line', 'weather_temperature',
       'weather_wind_mph', 'score_home', 'score_away',
       'stadium_neutral', 'home_favorite', 'away_favorite',
       'over']]

In [750]:
#Fixing specific date errors in dataset
scoresDF.loc[(scoresDF.schedule_date == '2016-09-19') & (scoresDF.team_home == 'MIN'), 'schedule_date'] = datetime.datetime(2016, 9, 18)
scoresDF.loc[(scoresDF.schedule_date == '2017-01-22') & (scoresDF.schedule_week == 21), 'schedule_date'] = datetime.datetime(2017, 2, 5)
scoresDF.loc[(scoresDF.schedule_date == '1990-01-27') & (scoresDF.schedule_week == 21), 'schedule_date'] = datetime.datetime(1990, 1, 28)
scoresDF.loc[(scoresDF.schedule_date == '1990-01-13'), 'schedule_date'] = datetime.datetime(1990, 1, 14)

In [751]:
scoresDF.dtypes

schedule_date          datetime64[ns]
schedule_season                 int64
schedule_week                   int32
team_home                      object
team_away                      object
team_favorite_id               object
spread_favorite               float64
over_under_line               float64
weather_temperature           float64
weather_wind_mph              float64
score_home                    float64
score_away                    float64
stadium_neutral                 int32
home_favorite                 float64
away_favorite                 float64
over                          float64
dtype: object

In [752]:
scoresDF

Unnamed: 0,schedule_date,schedule_season,schedule_week,team_home,team_away,team_favorite_id,spread_favorite,over_under_line,weather_temperature,weather_wind_mph,score_home,score_away,stadium_neutral,home_favorite,away_favorite,over
0,1990-09-09,1990,1,ATL,TEN,ATL,-1.0,46.0,79.0,10.0,47.0,27.0,0,1.0,0.0,1.0
1,1990-09-09,1990,1,BUF,IND,BUF,-7.5,37.0,59.0,7.0,26.0,10.0,0,1.0,0.0,0.0
2,1990-09-09,1990,1,CHI,SEA,CHI,-6.0,37.0,73.0,7.0,17.0,0.0,0,1.0,0.0,0.0
3,1990-09-09,1990,1,CIN,NYJ,CIN,-9.0,42.0,72.0,9.0,25.0,20.0,0,1.0,0.0,1.0
4,1990-09-09,1990,1,CLE,PIT,CLE,-3.0,37.0,68.0,8.0,13.0,3.0,0,1.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8375,2022-10-02,2022,4,NYG,CHI,NYG,-3.0,39.5,,,20.0,12.0,0,1.0,0.0,0.0
8376,2022-10-02,2022,4,PHI,JAX,PHI,-6.5,44.0,,,29.0,21.0,0,1.0,0.0,1.0
8377,2022-10-02,2022,4,PIT,NYJ,PIT,-3.0,41.0,,,20.0,24.0,0,1.0,0.0,1.0
8378,2022-10-02,2022,4,TB,KC,TB,-1.5,47.5,,,31.0,41.0,0,1.0,0.0,1.0


In [753]:
# creating result which shows if the home team won (1 if home team won, 0 otherwise; a tie is also recorded as 0)
scoresDF['result'] = (scoresDF.score_home > scoresDF.score_away).astype(int)

## Data Summary and Info

In [754]:
# summary statistics
scoresDF.describe().transpose()

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
schedule_season,8380.0,2006.005012,9.224118,1990.0,1998.0,2006.0,2014.0,2022.0
schedule_week,8380.0,9.509547,5.304503,1.0,5.0,10.0,14.0,21.0
spread_favorite,8380.0,-5.486038,3.491928,-26.5,-7.0,-4.5,-3.0,0.0
over_under_line,8380.0,42.496265,5.033493,28.0,38.5,42.5,46.0,63.5
weather_temperature,7623.0,60.540076,15.392066,-6.0,50.0,65.0,72.0,97.0
weather_wind_mph,7610.0,6.667148,5.68686,0.0,0.0,7.0,10.0,40.0
score_home,8380.0,22.861814,10.293636,0.0,16.0,23.0,30.0,62.0
score_away,8380.0,20.291289,10.070997,0.0,13.0,20.0,27.0,59.0
stadium_neutral,8380.0,0.009427,0.096641,0.0,0.0,0.0,0.0,1.0
home_favorite,8380.0,0.663842,0.472422,0.0,0.0,1.0,1.0,1.0


In [755]:
# some percentages to take into consideration when betting
home_win = "{:.2f}".format((sum((scoresDF.result == 1) & (scoresDF.stadium_neutral == 0)) / len(scoresDF)) * 100)
away_win = "{:.2f}".format((sum((scoresDF.result == 0) & (scoresDF.stadium_neutral == 0)) / len(scoresDF)) * 100)
under_line = "{:.2f}".format((sum((scoresDF.score_home + scoresDF.score_away) < scoresDF.over_under_line) / len(scoresDF)) * 100)
over_line = "{:.2f}".format((sum((scoresDF.score_home + scoresDF.score_away) > scoresDF.over_under_line) / len(scoresDF)) * 100)

favored = "{:.2f}".format((sum(((scoresDF.home_favorite == 1) & (scoresDF.result == 1)) | ((scoresDF.away_favorite == 1) & (scoresDF.result == 0)))
                           / len(scoresDF)) * 100)

cover = "{:.2f}".format((sum(((scoresDF.home_favorite == 1) & ((scoresDF.score_away - scoresDF.score_home) < scoresDF.spread_favorite)) | 
                             ((scoresDF.away_favorite == 1) & ((scoresDF.score_home - scoresDF.score_away) < scoresDF.spread_favorite))) 
                         / len(scoresDF)) * 100)

ats = "{:.2f}".format((sum(((scoresDF.home_favorite == 1) & ((scoresDF.score_away - scoresDF.score_home) > scoresDF.spread_favorite)) | 
                           ((scoresDF.away_favorite == 1) & ((scoresDF.score_home - scoresDF.score_away) > scoresDF.spread_favorite))) 
                       / len(scoresDF)) * 100)

In [756]:
# print all percentages
print("Number of Games: " + str(len(scoresDF)))
print("Home Straight Up Win Percentage: " + home_win + "%")
print("Away Straight Up Win Percentage: " + away_win + "%")
print("Under Percentage: " + under_line + "%")
print("Over Percentage: " + over_line + "%")
print("Favored Win Percentage: " + favored + "%")
print("Cover The Spread Percentage: " + cover + "%")
print("Against The Spread Percentage: " + ats + "%")

Number of Games: 8380
Home Straight Up Win Percentage: 57.06%
Away Straight Up Win Percentage: 41.99%
Under Percentage: 49.94%
Over Percentage: 48.28%
Favored Win Percentage: 65.87%
Cover The Spread Percentage: 46.61%
Against The Spread Percentage: 49.57%
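
The cover and against-the-spread percentages do not sum to 100% because a favorite that wins by exactly the spread is a push; here that leaves roughly 3.8% of games. A minimal sketch of the comparison used in the cell above, written from the favorite's point of view (the function name `spread_outcome` is hypothetical):

```python
def spread_outcome(favorite_score: float, underdog_score: float, spread_favorite: float) -> str:
    """Classify a game against the spread; spread_favorite is zero or negative."""
    margin = underdog_score - favorite_score  # negative when the favorite wins
    if margin < spread_favorite:
        return "cover"    # favorite won by more than the spread
    if margin > spread_favorite:
        return "against"  # underdog covered, or won outright
    return "push"         # favorite won by exactly the spread


print(spread_outcome(26, 10, -7.5))  # cover   (BUF 26, IND 10, BUF -7.5)
print(spread_outcome(20, 24, -3.0))  # against (PIT 20, NYJ 24, PIT -3.0)
print(spread_outcome(13, 10, -3.0))  # push    (made-up score for illustration)
```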


## Feature Manipulation

In [757]:
# creating 2 separate dataframes with the home teams / scores and the away teams / scores
score = scoresDF.groupby(['schedule_season', 'schedule_week', 'team_home'])[['score_home', 'score_away']].mean().reset_index()
aw_score = scoresDF.groupby(['schedule_season', 'schedule_week', 'team_away'])[['score_home', 'score_away']].mean().reset_index()

# create point differential column (from each team's own perspective)
score['point_diff'] = score.score_home - score.score_away
aw_score['point_diff'] = aw_score.score_away - aw_score.score_home

# stack the two dataframes (DataFrame.append was removed in pandas 2.0)
score = pd.concat([score, aw_score], ignore_index=True, sort=True)

# fill null values
score['team_home'] = score.team_home.fillna(score.team_away)

# sort by season and week 
score.sort_values(['schedule_season', 'schedule_week'], ascending = [True, True], inplace=True)

# removing unneeded columns & changing column name 
score = score[['schedule_season', 'schedule_week', 'team_home', 'point_diff']]
score.rename(columns={'team_home' : 'team'}, inplace=True)

In [758]:
tm_dict = {}
for key in score.team.unique():
    tm_dict[key] = score[score.team == key].reset_index(drop=True)

In [759]:
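frames = []

# for loop to create an expanding average of each team's previous games within each season
for yr in score.schedule_season.unique():
    for tm in score.team.unique():
        data = tm_dict[tm].copy()
        data = data[data.schedule_season == yr]
        
        # shift() excludes the current game, so the average only uses earlier games
        data.loc[:, 'avg_pts_diff'] = data.point_diff.shift().expanding().mean()
        
        frames.append(data)

# combine the per-team, per-season frames (DataFrame.append was removed in pandas 2.0)
pts_diff = pd.concat(frames)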
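
To make the feature concrete, here is a small sketch of what `shift().expanding().mean()` produces for one team in one season. The point differentials are made up for illustration.

```python
import pandas as pd

# hypothetical point differentials for one team over four games of a season
point_diff = pd.Series([7.0, -3.0, 10.0, 0.0])

# shift() moves every value down one game, so game n only sees games 1..n-1
avg_before_game = point_diff.shift().expanding().mean()
print(avg_before_game.round(2).tolist())  # [nan, 7.0, 2.0, 4.67]
```

The first game of a season has no earlier games, so its value is NaN; the notebook later fills those gaps with the team's average from the previous season.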

In [760]:
# merging to scoresDF and changing column names
scoresDF = scoresDF.merge(pts_diff[['schedule_season', 'schedule_week', 'team', 'avg_pts_diff']], 
              left_on=['schedule_season', 'schedule_week', 'team_home'], right_on=['schedule_season', 'schedule_week', 'team'],
              how='left')

scoresDF.rename(columns={'avg_pts_diff' : 'hm_avg_pts_diff'}, inplace=True)

scoresDF = scoresDF.merge(pts_diff[['schedule_season', 'schedule_week', 'team', 'avg_pts_diff']], 
              left_on=['schedule_season', 'schedule_week', 'team_away'], right_on=['schedule_season', 'schedule_week', 'team'],
              how='left')

scoresDF.rename(columns={'avg_pts_diff' : 'aw_avg_pts_diff'}, inplace=True)

In [761]:
total_season = pts_diff.groupby(['schedule_season', 'team'])['point_diff'].mean().reset_index()

In [762]:
# adding schedule week for merge and adding one to the season for predictions
total_season['schedule_week'] = 1
total_season['schedule_season'] += 1

In [763]:
# cleaning of columns
scoresDF = scoresDF[['schedule_date', 'schedule_season', 'schedule_week', 'team_home',
       'team_away', 'team_favorite_id', 'spread_favorite', 'over_under_line',
       'weather_temperature', 'weather_wind_mph', 'score_home', 'score_away', 'stadium_neutral', 'home_favorite',
       'away_favorite', 'hm_avg_pts_diff','aw_avg_pts_diff', 'over', 'result']]

In [764]:
scoresDF = scoresDF.merge(total_season[['schedule_season', 'schedule_week', 'team', 'point_diff']], 
              left_on=['schedule_season', 'schedule_week', 'team_home'], right_on=['schedule_season', 'schedule_week', 'team'],
              how='left')

scoresDF.rename(columns={'point_diff' : 'hm_avg_diff'}, inplace=True)

scoresDF = scoresDF.merge(total_season[['schedule_season', 'schedule_week', 'team', 'point_diff']], 
              left_on=['schedule_season', 'schedule_week', 'team_away'], right_on=['schedule_season', 'schedule_week', 'team'],
              how='left')

scoresDF.rename(columns={'point_diff' : 'aw_avg_diff'}, inplace=True)

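# fill null values (a team's first game of a season) with its average from the previous season
scoresDF['hm_avg_pts_diff'] = scoresDF.hm_avg_pts_diff.fillna(scoresDF.hm_avg_diff)
scoresDF['aw_avg_pts_diff'] = scoresDF.aw_avg_pts_diff.fillna(scoresDF.aw_avg_diff)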

In [765]:
# cleaning of columns
scoresDF = scoresDF[['schedule_date', 'schedule_season', 'schedule_week', 'team_home',
       'team_away', 'team_favorite_id', 'spread_favorite', 'over_under_line',
       'weather_temperature', 'weather_wind_mph', 'score_home', 'score_away', 'stadium_neutral', 'home_favorite',
       'away_favorite', 'hm_avg_pts_diff','aw_avg_pts_diff','over', 'result']]

moneyLineListFav = []
moneyLineListUnd = []

#Calculating theoretical profit on a $10 bet on the favorite (if bet is eventually won); spreads beyond -15.5 use a money line of 1200
for i in scoresDF['spread_favorite']:
    if i < -15.5:
        moneyLineListFav.append(10 * (100/1200))
    else: 
        value = 10 * (100 / favMoneyLineDict[i])
        moneyLineListFav.append(value)
    
scoresDF['moneyLine_favorite_winnings'] = moneyLineListFav

#Calculating theoretical profit on a $10 bet on the underdog (if bet is eventually won); spreads beyond -15.5 use a money line of 1200
for i in scoresDF['spread_favorite']:
    if i < -15.5:
        moneyLineListUnd.append(10 * (1200/100))
    else:
        value = 10 * (undMoneyLineDict[i] / 100)
        moneyLineListUnd.append(value)
        
scoresDF['moneyLine_underdog_winnings'] = moneyLineListUnd
        

In [766]:
# removing all rows with null values
scoresDF = scoresDF.dropna(how='any',axis=0) 

In [767]:
scoresDF.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7727 entries, 14 to 8517
Data columns (total 21 columns):
 #   Column                       Non-Null Count  Dtype         
---  ------                       --------------  -----         
 0   schedule_date                7727 non-null   datetime64[ns]
 1   schedule_season              7727 non-null   int64         
 2   schedule_week                7727 non-null   int32         
 3   team_home                    7727 non-null   object        
 4   team_away                    7727 non-null   object        
 5   team_favorite_id             7727 non-null   object        
 6   spread_favorite              7727 non-null   float64       
 7   over_under_line              7727 non-null   float64       
 8   weather_temperature          7727 non-null   float64       
 9   weather_wind_mph             7727 non-null   float64       
 10  score_home                   7727 non-null   float64       
 11  score_away                   7727 non-null

## Feature and Model Testing

In [768]:
# initial features possible for model
X = scoresDF[['schedule_season', 'schedule_week', 'over_under_line', 'spread_favorite', 'weather_temperature', 'weather_wind_mph',
        'home_favorite', 'hm_avg_pts_diff','aw_avg_pts_diff']]

y = scoresDF['result']

In [769]:
# base model
base = LDA()

# choose 4 best features
rfe = RFE(estimator=base, n_features_to_select=4)
rfe = rfe.fit(X, y)

# features
print(rfe.support_)
print(rfe.ranking_)

[False False False  True False False  True  True  True]
[3 4 6 1 5 2 1 1 1]
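
The printed mask follows the column order of X, so pairing it with the column names shows which features were kept. A small sketch using the values printed above, copied by hand rather than read from the fitted `rfe` object:

```python
feature_names = ['schedule_season', 'schedule_week', 'over_under_line', 'spread_favorite',
                 'weather_temperature', 'weather_wind_mph', 'home_favorite',
                 'hm_avg_pts_diff', 'aw_avg_pts_diff']
support = [False, False, False, True, False, False, True, True, True]
ranking = [3, 4, 6, 1, 5, 2, 1, 1, 1]

# keep the names whose mask entry is True
selected = [name for name, keep in zip(feature_names, support) if keep]
print(selected)
# ['spread_favorite', 'home_favorite', 'hm_avg_pts_diff', 'aw_avg_pts_diff']

# rank 1 marks a selected feature; higher ranks were eliminated earlier
eliminated = sorted((r, n) for r, n in zip(ranking, feature_names) if r > 1)
print(eliminated[0])  # (2, 'weather_wind_mph')
```

With the fitted object in scope, `X.columns[rfe.support_]` gives the same selection directly.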


In [770]:
# best 4 features chosen by the RFE base model
final_x = scoresDF[['spread_favorite', 'home_favorite', 'hm_avg_pts_diff','aw_avg_pts_diff']]

In [771]:
# prepare models
models = []

models.append(('LRG', LogisticRegression(solver='liblinear')))
models.append(('KNB', KNeighborsClassifier()))
models.append(('GNB', GaussianNB()))
models.append(('XGB', xgb.XGBClassifier(random_state=0)))
models.append(('RFC', RandomForestClassifier(random_state=0, n_estimators=100)))
models.append(('DTC', DecisionTreeClassifier(random_state=0, criterion='entropy', max_depth=5)))

# evaluate each model by average and standard deviations of roc auc 
results = []
names = []

for name, m in models:
    kfold = model_selection.KFold(n_splits=5)
    cv_results = model_selection.cross_val_score(m, final_x, y, cv=kfold, scoring = 'roc_auc')
    results.append(cv_results)
    names.append(name)
    msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
    print(msg)

LRG: 0.684716 (0.013185)
KNB: 0.626486 (0.009369)
GNB: 0.682995 (0.013449)
XGB: 0.662390 (0.009068)
RFC: 0.647676 (0.005860)
DTC: 0.698483 (0.012440)
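
For reading these scores: ROC AUC is the probability that a randomly chosen home win receives a higher predicted score than a randomly chosen home loss, so 0.5 is chance level and 1.0 is a perfect ranking. A minimal pure-Python sketch of that definition, with made-up outcomes and probabilities (the function `roc_auc` is a hypothetical stand-in, not the scikit-learn implementation):

```python
def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# hypothetical outcomes (1 = home win) and predicted home-win probabilities
y_true = [1, 0, 1, 1, 0]
scores = [0.80, 0.40, 0.65, 0.45, 0.60]
print(round(roc_auc(y_true, scores), 4))  # 0.8333
```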


In [772]:
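# training data (seasons before 2017) and testing data (2017 season onward)
# note: the model is trained on the four RFE features plus over_under_line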
train = scoresDF.copy()
test = scoresDF.copy()
train = train.loc[train['schedule_season'] < 2017]
test = test.loc[test['schedule_season'] > 2016]
X_train = train[['over_under_line', 'spread_favorite', 'home_favorite', 'hm_avg_pts_diff', 'aw_avg_pts_diff']]
y_train = train['result']
X_test = test[['over_under_line', 'spread_favorite', 'home_favorite', 'hm_avg_pts_diff','aw_avg_pts_diff']]
y_test = test['result']

In [773]:
test

Unnamed: 0,schedule_date,schedule_season,schedule_week,team_home,team_away,team_favorite_id,spread_favorite,over_under_line,weather_temperature,weather_wind_mph,...,score_away,stadium_neutral,home_favorite,away_favorite,hm_avg_pts_diff,aw_avg_pts_diff,over,result,moneyLine_favorite_winnings,moneyLine_underdog_winnings
7025,2017-09-07,2017,1,NE,KC,NE,-9.0,48.5,63.0,7.0,...,42.0,0,1.0,0.0,12.315789,4.470588,1.0,0,2.403846,31.4
7026,2017-09-10,2017,1,BUF,NYJ,BUF,-9.5,40.0,61.0,5.0,...,12.0,0,1.0,0.0,1.312500,-8.375000,0.0,1,2.293578,32.7
7027,2017-09-10,2017,1,CHI,ATL,ATL,-7.0,49.5,66.0,9.0,...,23.0,0,0.0,1.0,-7.500000,8.789474,0.0,0,3.134796,24.9
7028,2017-09-10,2017,1,CIN,BAL,CIN,-3.0,42.5,71.0,8.0,...,20.0,0,1.0,0.0,0.625000,1.375000,0.0,0,6.097561,13.5
7029,2017-09-10,2017,1,CLE,PIT,PIT,-9.0,47.0,67.0,9.0,...,21.0,0,0.0,1.0,-11.750000,3.842105,0.0,0,2.403846,31.4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8512,2022-10-02,2022,4,DAL,WAS,DAL,-3.0,41.0,72.0,0.0,...,10.0,0,1.0,0.0,-2.000000,-6.333333,0.0,1,6.097561,13.5
8513,2022-10-02,2022,4,DET,SEA,DET,-3.5,48.5,72.0,0.0,...,48.0,0,1.0,0.0,0.666667,-7.666667,1.0,0,5.235602,15.6
8515,2022-10-02,2022,4,HOU,LAC,LAC,-5.5,45.0,72.0,0.0,...,34.0,0,0.0,1.0,-3.333333,-8.666667,1.0,0,4.098361,19.5
8516,2022-10-02,2022,4,IND,TEN,IND,-3.5,43.0,72.0,0.0,...,24.0,0,1.0,0.0,-7.000000,-11.000000,0.0,0,5.235602,15.6


In [774]:
# calibrate probabilities and fit model to training data
boost = xgb.XGBClassifier()
dtc = DecisionTreeClassifier(max_depth=5, criterion='entropy')
lrg = LogisticRegression(solver='liblinear')
vote = VotingClassifier(estimators=[('boost', boost), ('dtc', dtc), ('lrg', lrg)], voting='soft')

model = CCV(vote, method='isotonic', cv=3)
model.fit(X_train, y_train)
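
With `voting='soft'` and no weights, the ensemble averages the class probabilities predicted by its base models; the calibration wrapper then remaps those averaged scores. A rough sketch of the averaging step only, using made-up probabilities for a single game (`soft_vote` is a hypothetical helper):

```python
def soft_vote(probabilities):
    """Average the home-win probabilities predicted by several models."""
    return sum(probabilities) / len(probabilities)


# hypothetical home-win probabilities from the three base models for one game
p_boost, p_dtc, p_lrg = 0.70, 0.55, 0.61
p_vote = soft_vote([p_boost, p_dtc, p_lrg])
print(round(p_vote, 2))  # 0.62
```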

In [775]:
# predict probabilities
predicted = model.predict_proba(X_test)[:,1]


## Results of MoneyLine Betting

In [777]:
#create column for probability that the home team wins
test.loc[:,'hm_prob'] = predicted
test = test[['schedule_season', 'schedule_week', 'team_home', 'team_away', 'spread_favorite','hm_prob', 'result','moneyLine_favorite_winnings','moneyLine_underdog_winnings']]

In [778]:
# calculate bets won: a bet is placed on the home team when hm_prob >= 0.60 and the favorite payout is at least $5.00,
# or when 0.40 <= hm_prob <= 0.50 and the underdog payout is at least $13.00
test['my_bet_won'] = ((((test.hm_prob >= 0.60) & (test.moneyLine_favorite_winnings >= 5.00))|(((test.hm_prob >= 0.40) & (test.hm_prob<=0.50)) & (test.moneyLine_underdog_winnings >= 13.00)))&(test.result == 1)).astype(int)

# calculate bets lost (same betting conditions as above, but the home team did not win)
test['my_bet_lost'] =((((test.hm_prob >= 0.60) & (test.moneyLine_favorite_winnings >= 5.00))|(((test.hm_prob >= 0.40) & (test.hm_prob<=0.50)) & (test.moneyLine_underdog_winnings >= 13.00)))&(test.result == 0)).astype(int)

actualWinningList = []

#Changing spread column to show the spread next to the team the model picks (home team if hm_prob > 0.5, otherwise away team)
picked_team = pd.Series(np.where(test['hm_prob'] > 0.5, test['team_home'], test['team_away']), index=test.index)
test['spread_favorite'] = picked_team + test['spread_favorite'].astype(str)
        
    


#determining actual winnings
for i in range(len(test['my_bet_won'])):
    #if the bet is won
    if test['my_bet_won'].iloc[i] == 1:
        #if the home teams prob to win is greater than the away teams prob to win
        if test['hm_prob'].iloc[i] > 0.5:
            #select winnings for betting the favorite (+10 is for the initial 10 dollars in the bet)
            actualWinningList.append(test['moneyLine_favorite_winnings'].iloc[i] + 10)
        else: 
            #select winnings for betting the underdog 
            actualWinningList.append(test['moneyLine_underdog_winnings'].iloc[i]+ 10)
    else:
        actualWinningList.append(0)

test['actual_winnings'] = actualWinningList

In [779]:
test

Unnamed: 0,schedule_season,schedule_week,team_home,team_away,spread_favorite,hm_prob,result,moneyLine_favorite_winnings,moneyLine_underdog_winnings,my_bet_won,my_bet_lost,actual_winnings
7025,2017,1,NE,KC,NE-9.0,0.745148,0,2.403846,31.4,0,0,0.0
7026,2017,1,BUF,NYJ,BUF-9.5,0.788846,1,2.293578,32.7,0,0,0.0
7027,2017,1,CHI,ATL,ATL-7.0,0.443243,0,3.134796,24.9,0,1,0.0
7028,2017,1,CIN,BAL,CIN-3.0,0.584922,0,6.097561,13.5,0,0,0.0
7029,2017,1,CLE,PIT,PIT-9.0,0.294657,0,2.403846,31.4,0,0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...
8512,2022,4,DAL,WAS,DAL-3.0,0.567204,1,6.097561,13.5,0,0,0.0
8513,2022,4,DET,SEA,DET-3.5,0.604812,0,5.235602,15.6,0,1,0.0
8515,2022,4,HOU,LAC,LAC-5.5,0.340222,0,4.098361,19.5,0,0,0.0
8516,2022,4,IND,TEN,IND-3.5,0.609503,0,5.235602,15.6,0,1,0.0


In [780]:
moneyPlaced = (test.my_bet_lost.sum() + test.my_bet_won.sum()) * 10
betsWon = (round(test.my_bet_won.sum() / (test.my_bet_lost.sum() + test.my_bet_won.sum()),4))

#overall Results
print("My Model Win Percentage: " + str(betsWon))
print("Total Number of Bets Won: " + str(test.my_bet_won.sum()))
print("Total Number of Bets Made: " + str((test.my_bet_lost.sum() + test.my_bet_won.sum())))
print("Possible Games: " + str(len(test['schedule_week'])))
print("Gross Winnings:" + str(round(test.actual_winnings.sum(),2)))
print("Amount of Money Bet:" + str(moneyPlaced) + ".00")
print("Net Winnings:" + str(round(test.actual_winnings.sum() - moneyPlaced,2)))

My Model Win Percentage: 0.5215
Total Number of Bets Won: 85
Total Number of Bets Made: 163
Possible Games: 935
Gross Winnings:1588.12
Amount of Money Bet:1630.00
Net Winnings:-41.88
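
The totals above can be re-derived from the printed figures, and expressing the net result as a return on the money wagered makes it easier to compare across runs. A short worked calculation, with the numbers copied from the output:

```python
bets_made = 163
bets_won = 85
stake = 10.00
gross_winnings = 1588.12  # payouts on winning bets, including the returned stakes

money_bet = bets_made * stake
net = gross_winnings - money_bet
roi = net / money_bet

print(round(bets_won / bets_made, 4))  # 0.5215
print(round(net, 2))                   # -41.88
print(round(roi * 100, 2))             # -2.57 (percent of money wagered)
```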


In [781]:
results_df = test.groupby(['schedule_season', 'schedule_week']).agg({'team_home' : 'count', 'my_bet_won' : 'sum', 'my_bet_lost' : 'sum','actual_winnings': 'sum'}).reset_index().rename(columns={'team_home' : 'total_games'})
results_df['total_bets'] = results_df.my_bet_won + results_df.my_bet_lost
results_df['bet_accuracy'] = round((results_df.my_bet_won / results_df.total_bets) * 100, 2)
results_df['winnings_for_the_week'] = results_df.actual_winnings - (results_df.total_bets * 10)

results_df = results_df[['schedule_season', 'schedule_week','my_bet_won','my_bet_lost',
                         'total_bets', 'total_games',  'bet_accuracy','winnings_for_the_week']]
#week by week results
results_df

Unnamed: 0,schedule_season,schedule_week,my_bet_won,my_bet_lost,total_bets,total_games,bet_accuracy,winnings_for_the_week
0,2017,1,1,2,3,15,33.33,-14.764398
1,2017,2,0,2,2,14,0.00,-20.000000
2,2017,3,4,3,7,16,57.14,27.997561
3,2017,4,2,3,5,16,40.00,-17.804878
4,2017,5,1,1,2,14,50.00,3.500000
...,...,...,...,...,...,...,...,...
100,2021,21,0,0,0,1,,0.000000
101,2022,1,1,0,1,6,100.00,5.235602
102,2022,2,0,0,0,5,,0.000000
103,2022,3,0,0,0,3,,0.000000
