# Project Overview

The goal of this project is to do two things: 
1. Analyze the reliability and accuracy of [FiveThirtyEight's][https://projects.fivethirtyeight.com/2022-nba-predictions/?ex_cid=rrpromo] NBA predictions against historical Vegas odds

2. Answer the controversial question: Is using the [Kelly Criterion][https://en.wikipedia.org/wiki/Kelly_criterion] an effective strategy in sports betting?

This project was inspired by an idea I had (but never tried): use FiveThirtyEight's predictions together with the Kelly criterion formula to find a betting strategy that would be profitable over the long term. Based on the current data, I do not think this would be a profitable strategy when using FiveThirtyEight's numbers, but the bigger question I would like to answer is this:

#### Given a random sample of 1000 historical games, does the Kelly criterion outperform simply betting \\$100 on every game? If so (as I suspect), what is the minimum percentage of correct bets needed to generate a profit?

[https://projects.fivethirtyeight.com/2022-nba-predictions/?ex_cid=rrpromo]:https://projects.fivethirtyeight.com/2022-nba-predictions/?ex_cid=rrpromo
[https://en.wikipedia.org/wiki/Kelly_criterion]:https://en.wikipedia.org/wiki/Kelly_criterion
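
For reference, the Kelly criterion sizes each bet as a fraction of the bankroll, f* = (b·p − q) / b, where p is the estimated win probability, q = 1 − p, and b is the net payout per unit staked. The sketch below is only an illustration of that formula applied to American moneylines, not the code used later in this project. The function names `moneyline_to_net_odds` and `kelly_fraction` are hypothetical, and clipping negative fractions to zero is an assumption (it simply means skipping bets with no edge).

```python
def moneyline_to_net_odds(ml):
    """Net profit per 1 unit staked, given an American moneyline (e.g. -140 or 120)."""
    if ml > 0:
        return ml / 100.0
    return 100.0 / abs(ml)


def kelly_fraction(p, ml):
    """Fraction of bankroll to stake under the Kelly criterion.

    p  -- estimated probability that the bet wins (e.g. a 538 forecast)
    ml -- American moneyline offered on that outcome
    Returns 0.0 when the edge is not positive (assumption: skip those bets).
    """
    b = moneyline_to_net_odds(ml)
    q = 1.0 - p
    return max(0.0, (b * p - q) / b)


# Example: 538 gives a team a 60% chance and the book offers even money (+100).
print(kelly_fraction(0.60, 100))
```

With a 60% win probability at even money the suggested stake is about a fifth of the bankroll; at 50% against a -110 line the edge is negative and the function returns zero.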

The data used here comes from two sources: 538's predictions going back to 1946 and [historical moneylines from Westgate in Vegas][https://sportsbookreviewsonline.com/scoresoddsarchives/nba/nbaoddsarchives.htm] going back to 2007.

[https://sportsbookreviewsonline.com/scoresoddsarchives/nba/nbaoddsarchives.htm]:https://sportsbookreviewsonline.com/scoresoddsarchives/nba/nbaoddsarchives.htm

In [4]:
import pandas as pd
import numpy as np
from pandas import ExcelWriter
from pandas import ExcelFile
import csv
import os
from csv import reader
import datetime as dt

# Loading the datasets

In [154]:
url = 'https://projects.fivethirtyeight.com/nba-model/nba_elo.csv'
df_538 = pd.read_csv(url)

df_538.tail()

Unnamed: 0,date,season,neutral,playoff,team1,team2,elo1_pre,elo2_pre,elo_prob1,elo_prob2,...,carm-elo2_post,raptor1_pre,raptor2_pre,raptor_prob1,raptor_prob2,score1,score2,quality,importance,total_rating
71945,2022-04-09,2022,0,,MEM,BOS,1600.706927,1553.099891,0.700503,0.299497,...,,1613.725246,1601.258409,0.591155,0.408845,,,87,31.0,59.0
71946,2022-04-09,2022,0,,MIN,CHI,1542.135198,1516.418892,0.673417,0.326583,...,,1559.730061,1553.193522,0.70899,0.29101,,,72,35.0,54.0
71947,2022-04-09,2022,0,,ORL,MIA,1296.276081,1602.627919,0.233642,0.766358,...,,1340.285813,1594.677065,0.33826,0.66174,,,33,2.0,18.0
71948,2022-04-09,2022,0,,CHO,WAS,1529.142381,1478.178616,0.704541,0.295459,...,,1508.14813,1483.416546,0.668481,0.331519,,,49,85.0,67.0
71949,2022-04-09,2022,0,,NOP,GSW,1457.055962,1646.892445,0.373521,0.626479,...,,1518.67065,1607.068099,0.506754,0.493246,,,74,12.0,43.0


In [155]:
df_538 = df_538.drop(['elo1_pre','elo2_pre','elo1_post','elo2_post',
                      'carm-elo1_pre','carm-elo2_pre','carm-elo1_post',
                      'carm-elo2_post','raptor1_pre','raptor2_pre',
                      'quality','importance','total_rating'], axis=1)

In [156]:
df_538.head()

Unnamed: 0,date,season,neutral,playoff,team1,team2,elo_prob1,elo_prob2,carm-elo_prob1,carm-elo_prob2,raptor_prob1,raptor_prob2,score1,score2
0,1946-11-01,1947,0,,TRH,NYK,0.640065,0.359935,,,,,66.0,68.0
1,1946-11-02,1947,0,,DTF,WSC,0.640065,0.359935,,,,,33.0,50.0
2,1946-11-02,1947,0,,PRO,BOS,0.640065,0.359935,,,,,59.0,53.0
3,1946-11-02,1947,0,,STB,PIT,0.640065,0.359935,,,,,56.0,51.0
4,1946-11-02,1947,0,,CHS,NYK,0.631101,0.368899,,,,,63.0,47.0


In [15]:
#example of raw dataset for one season
df_spread = pd.read_excel('nba odds 2019-20.xlsx')
df_spread.head(3)

Unnamed: 0,Date,Rot,VH,Team,1st,2nd,3rd,4th,Final,Open,Close,ML,2H
0,1022,501,V,NewOrleans,30,31,25,31,122,231.5,229.5,230,113
1,1022,502,H,Toronto,27,29,32,29,130,6.5,6.5,-280,6
2,1022,503,V,LALakers,25,29,31,17,102,227.0,3.5,-180,5


In [14]:
#change to the directory where the files are stored
data_dir = r'C:\Users\Sarah Pierce\Documents\betting data'
os.chdir(data_dir)  # os.chdir returns None, so keep the path in its own variable
files = sorted(os.listdir(data_dir))  # sorted so the seasons are processed in order
#filter the file list to only the ones we're interested in
nba_files = []

for file in files:
    if file.startswith('nba'):
        nba_files.append(file)

nba_files

['nba odds 2007-08.xlsx',
 'nba odds 2008-09.xlsx',
 'nba odds 2009-10.xlsx',
 'nba odds 2010-11.xlsx',
 'nba odds 2011-12.xlsx',
 'nba odds 2012-13.xlsx',
 'nba odds 2013-14.xlsx',
 'nba odds 2014-15.xlsx',
 'nba odds 2015-16.xlsx',
 'nba odds 2016-17.xlsx',
 'nba odds 2017-18.xlsx',
 'nba odds 2018-19.xlsx',
 'nba odds 2019-20.xlsx',
 'nba odds 2020-21.xlsx',
 'nba odds 2021-22.xlsx']

In [159]:
df_spread = pd.DataFrame()
date_year = []

#create a loop to add the appropriate year to the date column in each file
for file in nba_files:
    df_file = pd.read_excel(file)
    date = df_file['Date'].astype(str)
    first_year = file[9:13]
    second_year = file[9:11] + file[14:16]
    
    for day in date:
        if len(day) == 4:
            new_date = day + first_year 
            date_year.append(new_date)
        else:
            new_date = '0' + day + second_year
            date_year.append(new_date) 
print(len(date_year))
        
#combine all files into a master file
#(DataFrame.append was removed in pandas 2.0, so use pd.concat instead)
df_spread = pd.concat([pd.read_excel(file) for file in nba_files], ignore_index=True)
print(len(df_spread['Date']))

df_spread['Date'] = date_year

#check to see if it worked
print(df_spread.head())
print(df_spread.tail())

36886
36886
       Date  Rot VH         Team  1st  2nd  3rd  4th  Final   Open  Close  \
0  10302007  501  V     Portland   26   23   28   20     97    184  189.5   
1  10302007  502  H   SanAntonio   29   30   22   25    106   12.5     13   
2  10302007  503  V         Utah   28   34   24   31    117  214.5    212   
3  10302007  504  H  GoldenState   30   21   21   24     96      3      1   
4  10302007  505  V      Houston   16   27   27   25     95    2.5      5   

      ML     2H  
0    900     95  
1  -1400      5  
2    100  105.5  
3   -120      3  
4   -230      3  
           Date  Rot VH        Team  1st  2nd  3rd  4th  Final  Open Close  \
36881  01112022  550  H  NewOrleans   35   30   32   31    128   228   226   
36882  01112022  551  V      Denver   18   23   25   19     85   1.5   2.5   
36883  01112022  552  H  LAClippers   16   12   27   32     87   218   213   
36884  01112022  587  V     Detroit   27   25   14   21     87   221   220   
36885  01112022  588  H    
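
The loop above picks the year from the number of digits in the date: four digits means October through December, so the first year of the season is used. That rule would mislabel any October games played at the end of a season that ran long, as the 2019-20 season did when it finished in October 2020. As a hedged alternative, the sketch below bumps the year whenever the month number drops, which assumes the rows of each file are in chronological order. The helper name `dates_with_year` is hypothetical.

```python
def dates_with_year(mmdd_values, start_year):
    """Turn MDD/MMDD values into 'MMDDYYYY' strings.

    Assumes the values are in chronological order and start in start_year;
    the year is incremented each time the month number decreases.
    """
    year = start_year
    prev_month = None
    out = []
    for value in mmdd_values:
        s = str(value).zfill(4)      # 101 -> '0101'
        month = int(s[:2])
        if prev_month is not None and month < prev_month:
            year += 1                # e.g. December -> January
        prev_month = month
        out.append(f"{s}{year}")
    return out


# Toy example shaped like a season that runs from October into the next October
print(dates_with_year([1022, 1231, 101, 811, 1011], 2019))
```

The output strings keep the same `%m%d%Y` layout the notebook parses later, so the rest of the pipeline would not need to change.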

In [160]:
df_spread = df_spread.drop(['Rot','1st', '2nd', '3rd', '4th','Open','Close','2H'], axis=1)
df_spread.head(3)

Unnamed: 0,Date,VH,Team,Final,ML
0,10302007,V,Portland,97,900
1,10302007,H,SanAntonio,106,-1400
2,10302007,V,Utah,117,100


# Data Cleaning and Wrangling
This dataframe needs a lot of work before it will sync well with the 538 data. Here are some of the issues that need to be addressed:
1. Dates need to be formatted as datetime YYYY-MM-DD
2. Team names need to be renamed to match 538
3. Each game is currently split across two rows but needs to be on a single row

Once all of these problems are addressed, we can combine the datasets. 

In [161]:
#dataset duration = Oct 30, 2007 to Jan 11, 2022
print(df_spread['Date'].head(1))
print(df_spread['Date'].tail(1))

0    10302007
Name: Date, dtype: object
36885    01112022
Name: Date, dtype: object


In [162]:
#convert to datetime
df_spread['Date'] = df_spread['Date'].astype(str)
df_spread['Date'] = pd.to_datetime(df_spread['Date'],format='%m%d%Y').dt.date.astype(str)

print(df_spread.head(3))
print(df_spread.tail(3))

         Date VH        Team  Final     ML
0  2007-10-30  V    Portland     97    900
1  2007-10-30  H  SanAntonio    106  -1400
2  2007-10-30  V        Utah    117    100
             Date VH        Team  Final     ML
36883  2022-01-11  H  LAClippers     87    125
36884  2022-01-11  V     Detroit     87    750
36885  2022-01-11  H     Chicago    133  -1200


In [163]:
#filter the 538 dataset to seasons after 2007 (the 2007-08 season onward) and reset index
df_season = df_538.loc[(df_538['season'] > 2007)]
df_season = df_season.reset_index(drop=True)

#get the unique values for team names in each dataset
print(len(df_season['team1'].unique()))
print(len(df_spread['Team'].unique()))
print(df_season['team1'].unique())
print(df_spread['Team'].unique())
#'OklahomaCity' and 'Oklahoma City', 'LAClippers' and 'LA Clippers' account for the unexpected 34

32
34
['SAS' 'LAL' 'GSW' 'CLE' 'NOP' 'IND' 'TOR' 'MEM' 'NJN' 'DEN' 'ORL' 'UTA'
 'SEA' 'MIA' 'BOS' 'PHO' 'CHI' 'ATL' 'MIN' 'LAC' 'CHO' 'MIL' 'WAS' 'DAL'
 'PHI' 'HOU' 'NYK' 'DET' 'SAC' 'POR' 'OKC' 'BRK']
['Portland' 'SanAntonio' 'Utah' 'GoldenState' 'Houston' 'LALakers'
 'Philadelphia' 'Toronto' 'Washington' 'Indiana' 'Milwaukee' 'Orlando'
 'Chicago' 'NewJersey' 'Dallas' 'Cleveland' 'Memphis' 'Sacramento'
 'NewOrleans' 'Seattle' 'Denver' 'Detroit' 'Miami' 'Phoenix' 'Charlotte'
 'Atlanta' 'NewYork' 'Boston' 'Minnesota' 'LAClippers' 'OklahomaCity'
 'Brooklyn' 'Oklahoma City' 'LA Clippers']


In [164]:
print(df_spread.shape)
print(df_season.shape)

(36886, 5)
(19065, 14)


In [165]:
#create dictionary to map names
team_names = {'NewOrleans': 'NOP', 
              'Seattle': 'SEA',
              'Toronto': 'TOR',
              'LALakers': 'LAL',
              'LAClippers': 'LAC',
              'LA Clippers': 'LAC',
              'NewJersey': 'NJN',
              'Detroit': 'DET',
              'Indiana': 'IND',
              'Cleveland': 'CLE', 
              'Orlando': 'ORL',
              'Chicago': 'CHI', 
              'Charlotte': 'CHO',
              'Boston': 'BOS',
              'Philadelphia': 'PHI', 
              'Memphis': 'MEM',
              'Miami': 'MIA',
              'Minnesota': 'MIN',
              'Brooklyn': 'BRK',
              'NewYork': 'NYK',
              'SanAntonio': 'SAS',
              'Washington': 'WAS',
              'Dallas': 'DAL',
              'OklahomaCity': 'OKC',
              'Oklahoma City': 'OKC',
              'Utah': 'UTA',
              'Sacramento': 'SAC',
              'Phoenix': 'PHO',
              'Denver': 'DEN',
              'Portland': 'POR',
              'Atlanta': 'ATL',
              'Milwaukee': 'MIL',
              'Houston': 'HOU',
              'GoldenState': 'GSW'}

In [166]:
#replace names (dictionary keys) with their 538 abbreviations (dictionary values)
df_spread = df_spread.replace({"Team": team_names})

#check to see if it worked
df_spread.head()

Unnamed: 0,Date,VH,Team,Final,ML
0,2007-10-30,V,POR,97,900
1,2007-10-30,H,SAS,106,-1400
2,2007-10-30,V,UTA,117,100
3,2007-10-30,H,GSW,96,-120
4,2007-10-30,V,HOU,95,-230
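
One thing to keep in mind is that `DataFrame.replace` silently leaves any value that is missing from the dictionary unchanged, so a misspelled or new team name would slip through. A minimal check, using toy data in place of `df_spread['Team']` and `team_names` (the helper name `unmapped_names` is hypothetical):

```python
import pandas as pd


def unmapped_names(names, mapping):
    """Return the sorted list of names that have no entry in the mapping."""
    return sorted(set(names) - set(mapping))


# Toy data standing in for df_spread['Team'] and the team_names dictionary
teams = pd.Series(['Portland', 'SanAntonio', 'Oklahoma City', 'Golden State'])
mapping = {'Portland': 'POR', 'SanAntonio': 'SAS', 'Oklahoma City': 'OKC'}

print(unmapped_names(teams, mapping))
```

Running this kind of check before the replace step would surface spelling variants like the 'Oklahoma City' and 'LA Clippers' cases found earlier.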


In [167]:
# get games on the same row
df_spread_A = df_spread.iloc[::2, :]
df_spread_A = df_spread_A.reset_index(drop=True)
df_spread_A.rename(columns = {'Date':'Date2',
                              'VH':'V',
                              'Team':'Away_Team',
                              'Final':'Away_Final',
                              'ML':'Away_ML'}, inplace=True)
df_spread_H = df_spread.iloc[1::2, :]
df_spread_H = df_spread_H.reset_index(drop=True)
df_spread_H.rename(columns={'Date':'Date1',
                            'VH':'H',
                            'Team':'Home_Team',
                            'Final':'Home_Final',
                            'ML':'Home_ML'}, inplace=True)
print(df_spread_A.shape)
print(df_spread_H.shape)

df_ml = pd.concat([df_spread_H, df_spread_A], axis=1)
df_ml.head()

(18443, 5)
(18443, 5)


Unnamed: 0,Date1,H,Home_Team,Home_Final,Home_ML,Date2,V,Away_Team,Away_Final,Away_ML
0,2007-10-30,H,SAS,106,-1400,2007-10-30,V,POR,97,900
1,2007-10-30,H,GSW,96,-120,2007-10-30,V,UTA,117,100
2,2007-10-30,H,LAL,93,190,2007-10-30,V,HOU,95,-230
3,2007-10-31,H,TOR,106,-305,2007-10-31,V,PHI,97,255
4,2007-10-31,H,IND,119,105,2007-10-31,V,WAS,110,-125
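
The slicing above relies on the rows strictly alternating visitor, home. The sketch below is one hedged way to test that assumption on toy data: it pairs even and odd rows and returns any pair whose dates differ or whose `VH` flags are not `V` then `H`. The helper name `mismatched_pairs` is hypothetical, and it assumes an even number of rows.

```python
import pandas as pd


def mismatched_pairs(df):
    """Pair even rows (assumed visitors) with odd rows (assumed home teams)
    and return the pairs whose dates differ or whose V/H flags are unexpected."""
    away = df.iloc[::2].reset_index(drop=True)
    home = df.iloc[1::2].reset_index(drop=True)
    bad = (away['Date'] != home['Date']) | (away['VH'] != 'V') | (home['VH'] != 'H')
    paired = pd.concat([away.add_prefix('away_'), home.add_prefix('home_')], axis=1)
    return paired[bad]


# Toy data: the second pair has mismatched dates
toy = pd.DataFrame({
    'Date': ['2011-01-29', '2011-01-29', '2011-01-29', '2011-01-30'],
    'VH':   ['V', 'H', 'V', 'H'],
    'Team': ['ATL', 'DAL', 'CHO', 'LAC'],
})
result = mismatched_pairs(toy)
print(result)
```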


In [168]:
# one game is out of sync
problem = df_ml['Date1'] != df_ml['Date2'] 
df_ml[problem]
# a google search shows that this game occurred on the 29th

Unnamed: 0,Date1,H,Home_Team,Home_Final,Home_ML,Date2,V,Away_Team,Away_Final,Away_ML
4640,2011-01-30,H,LAC,103,-275,2011-01-29,V,CHO,88,235


In [169]:
df_ml.iloc[4640,0] = '2011-01-29'
df_ml.loc[4638:4642]

Unnamed: 0,Date1,H,Home_Team,Home_Final,Home_ML,Date2,V,Away_Team,Away_Final,Away_ML
4638,2011-01-29,H,DAL,102,-220,2011-01-29,V,ATL,91,180
4639,2011-01-29,H,SAC,102,175,2011-01-29,V,NOP,96,-210
4640,2011-01-29,H,LAC,103,-275,2011-01-29,V,CHO,88,235
4641,2011-01-30,H,OKC,103,-115,2011-01-30,V,MIA,108,-105
4642,2011-01-30,H,LAL,96,-150,2011-01-30,V,BOS,109,130


In [170]:
problem = df_ml['Date1'] != df_ml['Date2'] 
df_ml[problem]
# problem resolved

Unnamed: 0,Date1,H,Home_Team,Home_Final,Home_ML,Date2,V,Away_Team,Away_Final,Away_ML


In [171]:
df_ml = df_ml[['Date1','Home_Team','Away_Team','Home_Final','Away_Final','Home_ML','Away_ML']]
df_ml.head()

Unnamed: 0,Date1,Home_Team,Away_Team,Home_Final,Away_Final,Home_ML,Away_ML
0,2007-10-30,SAS,POR,106,97,-1400,900
1,2007-10-30,GSW,UTA,96,117,-120,100
2,2007-10-30,LAL,HOU,93,95,190,-230
3,2007-10-31,TOR,PHI,106,97,-305,255
4,2007-10-31,IND,WAS,119,110,105,-125


### Beautiful!!!
All that's left now is to join these rows to the matching rows of the 538 dataframe. Since the games on a given date can appear in a different order in each dataframe, I'll need to create some logic that lines them up for me.

This is going to require some creativity since I have multiples of each value... My plan is to create a new column in both datasets called "date_team" by combining two columns (giving a unique value for each row in the new column), sort both dataframes by "date_team", and double-check to ensure the order is the same. Then if both dataframes are in the same order: combine and delete the "date_team" column.
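
As a rough sketch of that plan on toy data: build the "date_team" key from the date and home team, sort on it, and confirm the key is unique and identical in both frames. The helper name `add_date_team_key` is hypothetical, and the sketch assumes a team hosts at most one game per date.

```python
import pandas as pd


def add_date_team_key(df, date_col, team_col):
    """Return a copy of df with a 'date_team' column such as '2007-10-30_SAS'."""
    out = df.copy()
    out['date_team'] = out[date_col].astype(str) + '_' + out[team_col].astype(str)
    return out


# Toy frames with the same games in a different order
toy_538 = pd.DataFrame({'date': ['2007-10-30', '2007-10-30'], 'team1': ['LAL', 'SAS']})
toy_ml = pd.DataFrame({'Date1': ['2007-10-30', '2007-10-30'], 'Home_Team': ['SAS', 'LAL']})

a = add_date_team_key(toy_538, 'date', 'team1').sort_values('date_team').reset_index(drop=True)
b = add_date_team_key(toy_ml, 'Date1', 'Home_Team').sort_values('date_team').reset_index(drop=True)

# The key is only useful if it is unique and both frames contain the same keys
print(a['date_team'].is_unique, a['date_team'].equals(b['date_team']))
```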

But first, let's check one more thing...

In [172]:
# Our money line data is coming up short by 622 rows. 
# The df_season dataset includes games all the way until the end of the 2022 season. 
# Let's see if this is where our missing data is. 

print(df_ml.shape)
print(df_season.shape)

(18443, 7)
(19065, 14)


In [173]:
19065-18443

622

In [174]:
df_ml.tail()

Unnamed: 0,Date1,Home_Team,Away_Team,Home_Final,Away_Final,Home_ML,Away_ML
18438,2022-01-11,TOR,PHO,95,99,165,-185
18439,2022-01-11,MEM,GSW,116,108,110,-130
18440,2022-01-11,NOP,MIN,128,125,140,-160
18441,2022-01-11,LAC,DEN,87,85,125,-145
18442,2022-01-11,CHI,DET,133,87,-1200,750


In [175]:
df_season.tail(623)
# as expected, we just need to delete the rows after Jan 11, 2022 in the 538 dataset

Unnamed: 0,date,season,neutral,playoff,team1,team2,elo_prob1,elo_prob2,carm-elo_prob1,carm-elo_prob2,raptor_prob1,raptor_prob2,score1,score2
18442,2022-01-11,2022,0,,LAC,DEN,0.560095,0.439905,,,0.491468,0.508532,87.0,85.0
18443,2022-01-12,2022,0,,WAS,ORL,0.832746,0.167254,,,0.760787,0.239213,112.0,106.0
18444,2022-01-12,2022,0,,IND,BOS,0.560011,0.439989,,,0.561706,0.438294,100.0,119.0
18445,2022-01-12,2022,0,,PHI,CHO,0.755352,0.244648,,,0.736239,0.263761,98.0,109.0
18446,2022-01-12,2022,0,,ATL,MIA,0.479371,0.520629,,,0.712704,0.287296,91.0,115.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19060,2022-04-09,2022,0,,MEM,BOS,0.700503,0.299497,,,0.591155,0.408845,,
19061,2022-04-09,2022,0,,MIN,CHI,0.673417,0.326583,,,0.708990,0.291010,,
19062,2022-04-09,2022,0,,ORL,MIA,0.233642,0.766358,,,0.338260,0.661740,,
19063,2022-04-09,2022,0,,CHO,WAS,0.704541,0.295459,,,0.668481,0.331519,,


In [180]:
# delete all rows after Jan 11, 2022
df_season = df_season.loc[:18442]
df_season.tail()

Unnamed: 0,date,season,neutral,playoff,team1,team2,elo_prob1,elo_prob2,carm-elo_prob1,carm-elo_prob2,raptor_prob1,raptor_prob2,score1,score2
18438,2022-01-11,2022,0,,TOR,PHO,0.477302,0.522698,,,0.536477,0.463523,95.0,99.0
18439,2022-01-11,2022,0,,MEM,GSW,0.589181,0.410819,,,0.62808,0.37192,116.0,108.0
18440,2022-01-11,2022,0,,NOP,MIN,0.534105,0.465895,,,0.483548,0.516452,128.0,125.0
18441,2022-01-11,2022,0,,CHI,DET,0.879148,0.120852,,,0.902778,0.097222,133.0,87.0
18442,2022-01-11,2022,0,,LAC,DEN,0.560095,0.439905,,,0.491468,0.508532,87.0,85.0
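
Trimming by a hard-coded index works here, but it depends on the row count staying the same. A hedged alternative is to filter on the date itself, which works because ISO `YYYY-MM-DD` strings sort correctly as plain text. The helper name `games_through` is hypothetical and the data is a toy stand-in for `df_season`.

```python
import pandas as pd


def games_through(df, last_date, date_col='date'):
    """Keep rows dated on or before last_date (both as 'YYYY-MM-DD' strings)."""
    return df[df[date_col] <= last_date].reset_index(drop=True)


toy = pd.DataFrame({
    'date': ['2022-01-11', '2022-01-12', '2022-04-09'],
    'team1': ['LAC', 'WAS', 'NOP'],
})
trimmed = games_through(toy, '2022-01-11')
print(trimmed)
```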


The next step is to get games in both datasets to be in the same order. To do this, I am going to create a loop that iterates through both dataframes and combines matching rows. 

In [181]:
# convert both dfs to list of lists
list_538 = df_season.values.tolist() 
list_ml = df_ml.values.tolist()

list_match_538 = []
list_match_ml = []

for row in list_538: #iterate through each row in 538
    date=row[0]
    team1=row[4]
    team2=row[5]
    score1=int(row[-2])
    score2=int(row[-1])
    
    for i in list_ml: # for each row in 538, iterate through each row in ml
        Date1=i[0] 
        Home_Team=i[1] 
        Away_Team=i[2] 
        Home_Final=i[3] 
        Away_Final=i[4]
        
        if date == Date1 and team1 == Home_Team and team2 == Away_Team and score1 == Home_Final and score2 == Away_Final:
            #that logic condition should be more than enough!!!
            list_match_538.append(row)
            list_match_ml.append(i)
        else:
            pass
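
The nested loop above compares every 538 row with every moneyline row, roughly 340 million comparisons for about 18,000 games on each side. As an alternative sketch (not the method used in this notebook), the same five-column match can be expressed as a pandas merge; `indicator=True` adds a `_merge` column that shows which 538 rows found no partner. The toy frames below stand in for `df_season` and `df_ml`.

```python
import pandas as pd

toy_538 = pd.DataFrame({
    'date': ['2007-10-30', '2007-10-30'],
    'team1': ['SAS', 'LAL'], 'team2': ['POR', 'HOU'],
    'score1': [106.0, 93.0], 'score2': [97.0, 95.0],
})
toy_ml = pd.DataFrame({
    'Date1': ['2007-10-30', '2007-10-30'],
    'Home_Team': ['LAL', 'SAS'], 'Away_Team': ['HOU', 'POR'],
    'Home_Final': [93, 106], 'Away_Final': [95, 97],
    'Home_ML': [190, -1400], 'Away_ML': [-230, 900],
})

# Scores are floats in the 538 data; cast so they compare cleanly with the integer finals
toy_538 = toy_538.astype({'score1': int, 'score2': int})

merged = toy_538.merge(
    toy_ml,
    left_on=['date', 'team1', 'team2', 'score1', 'score2'],
    right_on=['Date1', 'Home_Team', 'Away_Team', 'Home_Final', 'Away_Final'],
    how='left',
    indicator=True,
)
unmatched = merged[merged['_merge'] == 'left_only']
print(len(merged), len(unmatched))
```

Inspecting `unmatched` on the real data would also show which games account for any rows that fail to match.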


In [182]:
print(df_season.columns)
print(df_ml.columns)

Index(['date', 'season', 'neutral', 'playoff', 'team1', 'team2', 'elo_prob1',
       'elo_prob2', 'carm-elo_prob1', 'carm-elo_prob2', 'raptor_prob1',
       'raptor_prob2', 'score1', 'score2'],
      dtype='object')
Index(['Date1', 'Home_Team', 'Away_Team', 'Home_Final', 'Away_Final',
       'Home_ML', 'Away_ML'],
      dtype='object')


In [279]:
# create a new dataframe and rename columns
index_ml = list(range(0,len(list_match_ml)))
columns_ml = ['Date1', 'Home_Team', 'Away_Team', 'Home_Final', 'Away_Final','home_ml', 'away_ml']
index_538 = list(range(0,len(list_match_538)))
columns_538 = ['date', 'season', 'neutral', 'playoff', 'home_team', 'away_team', 'home_elo_prob',
       'away_elo_prob', 'home_carm-elo_prob', 'away_carm-elo_prob', 'home_raptor_prob',
       'away_raptor_prob', 'home_score', 'away_score']

df_538 = pd.DataFrame(list_match_538, index_538, columns_538)
df_ml = pd.DataFrame(list_match_ml, index_ml, columns_ml)

print(df_538.shape)
print(df_ml.shape)

df = pd.concat([df_538, df_ml], axis=1)
df.head()

(18413, 14)
(18413, 7)


Unnamed: 0,date,season,neutral,playoff,home_team,away_team,home_elo_prob,away_elo_prob,home_carm-elo_prob,away_carm-elo_prob,...,away_raptor_prob,home_score,away_score,Date1,Home_Team,Away_Team,Home_Final,Away_Final,home_ml,away_ml
0,2007-10-30,2008,0,,SAS,POR,0.882229,0.117771,,,...,,106.0,97.0,2007-10-30,SAS,POR,106,97,-1400,900
1,2007-10-30,2008,0,,LAL,HOU,0.536006,0.463994,,,...,,93.0,95.0,2007-10-30,LAL,HOU,93,95,190,-230
2,2007-10-30,2008,0,,GSW,UTA,0.637471,0.362529,,,...,,96.0,117.0,2007-10-30,GSW,UTA,96,117,-120,100
3,2007-10-31,2008,0,,CLE,DAL,0.600327,0.399673,,,...,,74.0,92.0,2007-10-31,CLE,DAL,74,92,120,-140
4,2007-10-31,2008,0,,NOP,SAC,0.713491,0.286509,,,...,,104.0,90.0,2007-10-31,NOP,SAC,104,90,-525,425


Now that we have our main dataframe, let's check to see if there are any discrepancies.

In [280]:
problem_index = df['home_team'] != df['Home_Team'] 
df[problem_index].head()

Unnamed: 0,date,season,neutral,playoff,home_team,away_team,home_elo_prob,away_elo_prob,home_carm-elo_prob,away_carm-elo_prob,...,away_raptor_prob,home_score,away_score,Date1,Home_Team,Away_Team,Home_Final,Away_Final,home_ml,away_ml


#### Looks good!
Now let's get rid of the unnecessary columns and save the result locally as a CSV for easy reference.

In [281]:
df = df.drop(['Date1', 'Home_Team', 'Away_Team', 'Home_Final', 'Away_Final'], axis=1)
df.head(3)

Unnamed: 0,date,season,neutral,playoff,home_team,away_team,home_elo_prob,away_elo_prob,home_carm-elo_prob,away_carm-elo_prob,home_raptor_prob,away_raptor_prob,home_score,away_score,home_ml,away_ml
0,2007-10-30,2008,0,,SAS,POR,0.882229,0.117771,,,,,106.0,97.0,-1400,900
1,2007-10-30,2008,0,,LAL,HOU,0.536006,0.463994,,,,,93.0,95.0,190,-230
2,2007-10-30,2008,0,,GSW,UTA,0.637471,0.362529,,,,,96.0,117.0,-120,100


In [282]:
# dropping row where moneyline was listed as 'NL'
print(df.loc[1090])
df = df.drop(index=1090)
df[(df['home_ml'] == 'NL') | (df['away_ml'] == 'NL')]

date                  2008-03-30
season                      2008
neutral                        0
playoff                      NaN
home_team                    BOS
away_team                    MIA
home_elo_prob           0.960997
away_elo_prob           0.039003
home_carm-elo_prob           NaN
away_carm-elo_prob           NaN
home_raptor_prob             NaN
away_raptor_prob             NaN
home_score                  88.0
away_score                  62.0
home_ml                       NL
away_ml                       NL
Name: 1090, dtype: object


Unnamed: 0,date,season,neutral,playoff,home_team,away_team,home_elo_prob,away_elo_prob,home_carm-elo_prob,away_carm-elo_prob,home_raptor_prob,away_raptor_prob,home_score,away_score,home_ml,away_ml


In [283]:
# convert object type to int type
df['home_ml'] = df['home_ml'].astype(float).astype(int)
df['away_ml'] = df['away_ml'].astype(float).astype(int)
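
Dropping the 'NL' row by its index works for this dataset, but it would miss any other non-numeric entries. A hedged, more general sketch uses `pd.to_numeric` with `errors='coerce'`, which turns values such as 'NL' into NaN so they can be dropped before the integer conversion. The toy frame below stands in for `df`.

```python
import pandas as pd

toy = pd.DataFrame({'home_ml': [-1400, 'NL', 190], 'away_ml': [900, 'NL', -230]})
ml_cols = ['home_ml', 'away_ml']

# Non-numeric entries such as 'NL' become NaN instead of raising an error
numeric = toy[ml_cols].apply(pd.to_numeric, errors='coerce')

clean = toy.copy()
clean[ml_cols] = numeric
# Drop rows without a usable moneyline, then convert to integers
clean = clean.dropna(subset=ml_cols).astype({col: int for col in ml_cols})
print(clean)
```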

In [284]:
df.to_csv('538_ml.csv')