In this notebook I explore the NBADATA.csv file and try to identify the statistics that are relevant for modelling. The work is split into two sections.


1. Which stats are the most correlated with victory?
    Each box score includes the game's final point differential (+/-). Using this as the output and the remaining box score statistics as inputs, we can infer which values are most correlated with it, along with other useful takeaways noted along the way.
    

2. How many games are sufficient for lookback?
    In-game data cannot be used for prediction, because it is not available before the game is played, so a lookback proxy must be used instead. To capture each team's expected performance, the average over that team's previous x games serves as its forecasted stats. I will try to identify how many games of lookback work best for this purpose.
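
The lookback idea in item 2 can be sketched on a toy series. This is only an illustration: the `PTS` values and the 3-game window are made up and do not come from NBADATA.csv. The `shift(1)` is what keeps a game's own box score out of its forecast.

```python
import pandas as pd

# Hypothetical points scored by one team over six consecutive games.
games = pd.DataFrame({"PTS": [100, 110, 90, 105, 95, 120]})

window = 3  # assumed lookback length, for illustration only

# Mean of the previous `window` games; shift(1) excludes the current game.
games["Rolling PTS"] = games["PTS"].rolling(window=window).mean().shift(1)

print(games)
```

The first `window` rows have no forecast (NaN), which is why early-season games drop out of the dataset later on.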

In [1]:
#import dependencies, and dataset. 


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline



# Going to stop here for the day. The next step is combining the rolling columns here with the relevant stats from the previous section, making sure the GAME_IDs are aligned, and confirming some correlation between the two. Generalize it enough that it can be done for an arbitrary window size so the windows can be compared, and maybe also check whether these rolling columns still possess a correlation to the outcome. After all, that is the point of this study.
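
One way the window comparison described above might look is sketched here on synthetic data. Everything in it is an assumption made for illustration: the ten hypothetical teams, their scoring averages, and the helper name `lookback_correlation`. It shows the mechanics only and says nothing about which window is best on the real data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for box scores: 10 hypothetical teams, 82 games each,
# every team scoring around its own long-run average.
n_teams, n_games = 10, 82
team_means = rng.uniform(95, 110, size=n_teams)
frame = pd.DataFrame({
    "Team": np.repeat(np.arange(n_teams), n_games),
    "PTS": rng.normal(np.repeat(team_means, n_games), 10),
})

def lookback_correlation(df, col, ngames):
    """Correlation between the shifted rolling mean of `col` and its realized value."""
    rolled = df.groupby("Team")[col].transform(
        lambda s: s.rolling(window=ngames).mean().shift(1)
    )
    valid = rolled.notna()  # the first `ngames` games per team have no forecast
    return np.corrcoef(rolled[valid], df.loc[valid, col])[0, 1]

results = {n: lookback_correlation(frame, "PTS", n) for n in (5, 10, 30)}
for n, r in results.items():
    print(f"window={n:2d}  corr={r:.3f}")
```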

In [2]:
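def create_dataset(data,ngames):
    """Build the rolling-average dataset from the stats most correlated with PLUS_MINUS.
    
     data : dataframe
         The NBADATA dataframe. 
     ngames : int
         Number of previous games averaged in each rolling window.

     Returns a dataframe of 'Rolling <stat>' columns, one row per team per game.
    """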
    nba_explore = pd.read_csv('NBADATA.csv')
    del nba_explore['Unnamed: 0'],nba_explore['GAME_ID'],nba_explore['Date'],nba_explore['Team'],nba_explore['Home'],nba_explore['Away']
    del nba_explore['OU'],nba_explore['TOTAL']

#add some other potential columns, like efficiency. 
    nba_explore['3P%'] = np.divide(nba_explore['3P'].values,nba_explore['3PA'].values) 

    nba_explore['FG%'] = np.divide(nba_explore['FG'].values,nba_explore['FGA'].values)
    nba_explore['FT%'] = np.divide(nba_explore['FT'].values,nba_explore['FTA'].values)
    nba_explore['TRB']  = nba_explore['OR'] + nba_explore['DR']

    nba_explore['AST/TO'] = np.divide(nba_explore['AST'].values,nba_explore['TO'].values)


    relevant_stats = []
    for col in nba_explore.columns:
        if col != 'PLUS_MINUS':
           # print(col + " Correlation to Outcome")
            corr = np.corrcoef(nba_explore[col],nba_explore['PLUS_MINUS'])
            #print(corr[0][1])
            if abs(corr[0][1]) < .1:
                pass
            else:
                relevant_stats.append(col)
        

    data['AST/TO'] = np.divide(data['AST'].values,data['TO'].values)
    data['3P%'] = np.divide(data['3P'].values,data['3PA'].values) 
    data['FG%'] = np.divide(data['FG'].values,data['FGA'].values)
    data['FT%'] = np.divide(data['FT'].values,data['FTA'].values)
    del data['Unnamed: 0'],data['TOTAL']
   # del data['Date']
    data = data.loc[data['GAME_ID'].values < 41300001] #genius! No playoff games now :)   
    #del data['Team'] 
    #data = pd.get_dummies(data) #option to one-hot encode the team, but not yet. Seems like overfitting. 
    teams = data.Team.unique() #each nba team. 
#iterate over those teams, make a rolling window
    nba_data = pd.DataFrame([])
    season_ids = []
    for i,val in enumerate(data['GAME_ID'].values):  #loop through every game
        season_ids.append(str(val)[1:3])

    data['Season_ID'] = season_ids #identify the unique seasons. 

    for team in teams:  #for each team
       # print(team)
    #get separate seasons here
        team_data = data.loc[data['Team'] == team]  #this contains the box score of every team game from 2013 to 2018.
        for season in data['Season_ID'].unique(): #this contains the box score of that team for that season. 
            #print(season)
            team_season = team_data.loc[team_data['Season_ID'] == season]
        
            stuff_to_turn_into_avgs =  relevant_stats  #['OR', 'DR', 'TOT', 'PF', 'ST', 'TO', 'BL', '3P%', 'FG%', 'FT%']
            for col in team_season.columns:
                if col in stuff_to_turn_into_avgs:
                        team_season['Rolling ' + col] = team_season[col].rolling(window=ngames).mean().shift(1)

            #split each season up here, 
                    #if col != 'PTS':
                    #    team_season['Rolling ' + col] = team_season[col].rolling(window=N_GAMES).mean().shift(1)

                        del team_season[col]
                    
            nba_data = pd.concat([nba_data, team_season])  #DataFrame.append was removed in pandas 2.0

           # df = pd.concat([road_df,home_df],axis=1)
#reorganize the dataset. 
    nba_data_splits = nba_data.sort_values(by = ['GAME_ID', 'Home','Away'], ascending=[True, True,False])

    nba_data_splits.dropna(inplace=True)

    del nba_data_splits['FGA'], nba_data_splits['3PA'], nba_data_splits['FTA'], nba_data_splits['OR'],nba_data_splits['PF']                                                                                                                                
    del nba_data_splits['PLUS_MINUS'], nba_data_splits['OU'],nba_data_splits['Rolling SPREAD'],nba_data_splits['Season_ID']
    rolling_vals = nba_data_splits
    
    
    return rolling_vals
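

The correlation screen inside create_dataset (keep a column when its absolute correlation with PLUS_MINUS is at least 0.1) can be sketched more compactly with `DataFrame.corr`. The toy columns below are made up: `FG%` is built to track the outcome and `NOISE` is not, so this illustrates the filter rather than reproducing results from NBADATA.csv.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 200

# Made-up data: one column tied to the outcome, one pure noise column.
plus_minus = rng.normal(0, 12, size=n)
toy = pd.DataFrame({
    "PLUS_MINUS": plus_minus,
    "FG%": 0.45 + 0.003 * plus_minus + rng.normal(0, 0.02, size=n),
    "NOISE": rng.normal(0, 1, size=n),
})

threshold = 0.1  # same cutoff as in create_dataset

# Correlation of every other column with the outcome.
corr = toy.corr()["PLUS_MINUS"].drop("PLUS_MINUS")
relevant_stats = corr[corr.abs() >= threshold].index.tolist()

print(corr)
print(relevant_stats)
```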

In [3]:
data = pd.read_csv('NBADATA.csv')
rolling_vals = create_dataset(data,30)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


In [4]:
spreads = pd.read_csv('NBADATA.csv')
spreads = spreads[['GAME_ID','SPREAD','Team']]

In [5]:
#spreads

In [6]:
test = rolling_vals.merge(spreads,on=['GAME_ID','Team'])

In [7]:
test.sort_values(by=['GAME_ID'])

Unnamed: 0,GAME_ID,Date,Team,Home,Away,Rolling FG,Rolling 3P,Rolling FT,Rolling DR,Rolling AST,Rolling ST,Rolling TO,Rolling BL,Rolling PTS,Rolling AST/TO,Rolling 3P%,Rolling FG%,Rolling FT%,SPREAD
0,21300275,4/8/14,San Antonio Spurs,0,1,41.033333,8.966667,17.366667,35.066667,25.966667,7.366667,14.200000,5.766667,108.400000,1.940596,0.401666,0.486488,0.814569,-7.0
1,21300275,4/8/14,Minnesota Timberwolves,1,0,39.166667,7.066667,22.633333,31.366667,24.500000,8.600000,13.600000,3.766667,108.033333,1.916725,0.333378,0.458320,0.766028,7.0
2,21300413,12/23/13,Utah Jazz,0,1,35.366667,6.066667,15.866667,29.166667,19.666667,7.166667,14.700000,4.900000,92.666667,1.518058,0.330579,0.430660,0.754741,6.5
3,21300424,12/26/13,Houston Rockets,1,0,37.833333,9.466667,21.333333,34.633333,20.900000,7.766667,16.233333,6.166667,106.466667,1.387903,0.354389,0.480004,0.699482,-7.5
4,21300426,12/26/13,Los Angeles Clippers,0,1,37.933333,8.000000,21.133333,32.600000,23.666667,8.300000,13.866667,4.600000,105.000000,1.805647,0.332414,0.466906,0.721228,4.5
5,21300428,12/27/13,Detroit Pistons,0,1,38.900000,6.400000,16.600000,30.066667,20.200000,9.000000,14.566667,5.300000,100.800000,1.495818,0.305121,0.453750,0.665598,-4.0
6,21300433,12/27/13,Utah Jazz,1,0,35.333333,6.233333,15.633333,29.033333,19.633333,7.100000,14.400000,4.966667,92.533333,1.541718,0.336766,0.431026,0.761408,-3.5
7,21300435,12/27/13,Golden State Warriors,1,0,38.500000,9.533333,16.033333,34.766667,22.500000,7.733333,16.733333,4.966667,102.566667,1.474916,0.404747,0.460122,0.728847,-6.5
8,21300439,12/28/13,Detroit Pistons,0,1,38.900000,6.266667,16.033333,29.966667,19.933333,9.000000,14.466667,5.333333,100.100000,1.486929,0.301880,0.450768,0.656002,3.5
9,21300440,12/28/13,Charlotte Hornets,0,1,34.466667,4.833333,18.400000,32.633333,19.366667,6.400000,12.600000,5.433333,92.166667,1.722615,0.317372,0.422401,0.718197,6.0


In [8]:
test = test.sort_values(by = ['GAME_ID', 'Home','Away'], ascending=[True, True,False])


In [9]:
from collections import Counter

# Count how many rows (teams) survived for each game.
counts = Counter(test['GAME_ID'].values)

# A game is only usable when both of its teams have rolling stats.
vals = np.array(list(counts.values())) == 2

useable_games = np.array(list(counts.keys()))[vals]

In [10]:
useable_games = np.array(list(counts.keys()))[vals]
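
The same "keep only games where both teams are present" check can also be written with pandas alone. This is a minimal sketch on a made-up frame, not the notebook's data; `complete_games` and `paired` are illustrative names.

```python
import pandas as pd

# Hypothetical merged rows: games 1 and 3 have both teams, game 2 has only one.
toy = pd.DataFrame({
    "GAME_ID": [1, 1, 2, 3, 3],
    "Team": ["A", "B", "C", "D", "E"],
})

# Number of rows per game, then the ids of games with exactly two rows.
counts = toy["GAME_ID"].value_counts()
complete_games = counts[counts == 2].index

# Keep only the rows that belong to a complete game.
paired = toy[toy["GAME_ID"].isin(complete_games)]
print(paired)
```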

In [11]:
test

Unnamed: 0,GAME_ID,Date,Team,Home,Away,Rolling FG,Rolling 3P,Rolling FT,Rolling DR,Rolling AST,Rolling ST,Rolling TO,Rolling BL,Rolling PTS,Rolling AST/TO,Rolling 3P%,Rolling FG%,Rolling FT%,SPREAD
0,21300275,4/8/14,San Antonio Spurs,0,1,41.033333,8.966667,17.366667,35.066667,25.966667,7.366667,14.200000,5.766667,108.400000,1.940596,0.401666,0.486488,0.814569,-7.0
1,21300275,4/8/14,Minnesota Timberwolves,1,0,39.166667,7.066667,22.633333,31.366667,24.500000,8.600000,13.600000,3.766667,108.033333,1.916725,0.333378,0.458320,0.766028,7.0
2,21300413,12/23/13,Utah Jazz,0,1,35.366667,6.066667,15.866667,29.166667,19.666667,7.166667,14.700000,4.900000,92.666667,1.518058,0.330579,0.430660,0.754741,6.5
3,21300424,12/26/13,Houston Rockets,1,0,37.833333,9.466667,21.333333,34.633333,20.900000,7.766667,16.233333,6.166667,106.466667,1.387903,0.354389,0.480004,0.699482,-7.5
4,21300426,12/26/13,Los Angeles Clippers,0,1,37.933333,8.000000,21.133333,32.600000,23.666667,8.300000,13.866667,4.600000,105.000000,1.805647,0.332414,0.466906,0.721228,4.5
5,21300428,12/27/13,Detroit Pistons,0,1,38.900000,6.400000,16.600000,30.066667,20.200000,9.000000,14.566667,5.300000,100.800000,1.495818,0.305121,0.453750,0.665598,-4.0
6,21300433,12/27/13,Utah Jazz,1,0,35.333333,6.233333,15.633333,29.033333,19.633333,7.100000,14.400000,4.966667,92.533333,1.541718,0.336766,0.431026,0.761408,-3.5
7,21300435,12/27/13,Golden State Warriors,1,0,38.500000,9.533333,16.033333,34.766667,22.500000,7.733333,16.733333,4.966667,102.566667,1.474916,0.404747,0.460122,0.728847,-6.5
8,21300439,12/28/13,Detroit Pistons,0,1,38.900000,6.266667,16.033333,29.966667,19.933333,9.000000,14.466667,5.333333,100.100000,1.486929,0.301880,0.450768,0.656002,3.5
9,21300440,12/28/13,Charlotte Hornets,0,1,34.466667,4.833333,18.400000,32.633333,19.366667,6.400000,12.600000,5.433333,92.166667,1.722615,0.317372,0.422401,0.718197,6.0


In [12]:
#test.loc[test['GAME_ID'].values in useable_games]

In [13]:
len(useable_games)

3850

In [14]:
test.columns

Index(['GAME_ID', 'Date', 'Team', 'Home', 'Away', 'Rolling FG', 'Rolling 3P',
       'Rolling FT', 'Rolling DR', 'Rolling AST', 'Rolling ST', 'Rolling TO',
       'Rolling BL', 'Rolling PTS', 'Rolling AST/TO', 'Rolling 3P%',
       'Rolling FG%', 'Rolling FT%', 'SPREAD'],
      dtype='object')

In [15]:
# Copy `test` so rows can be dropped without touching the original.
clunky = test.copy()

# Drop every row whose game does not have both teams present.
# Iterate over index labels rather than positions so drop() removes the intended row.
for idx, game_id in test['GAME_ID'].items():
    if game_id not in useable_games:
        print(idx)
        print('invalid')
        clunky = clunky.drop(index=idx)

2
invalid
3
invalid
4
invalid
5
invalid
6
invalid
7
invalid
8
invalid
9
invalid
10
invalid
13
invalid
14
invalid
15
invalid
16
invalid
17
invalid
18
invalid
21
invalid
22
invalid
25
invalid
28
invalid
31
invalid
38
invalid
39
invalid
40
invalid
63
invalid
1560
invalid
1561
invalid
1562
invalid
1563
invalid
1564
invalid
1565
invalid
1566
invalid
1567
invalid
1568
invalid
1571
invalid
1572
invalid
1573
invalid
1574
invalid
1575
invalid
1588
invalid
1607
invalid
1610
invalid
1631
invalid
3120
invalid
3121
invalid
3122
invalid
3123
invalid
3124
invalid
3131
invalid
3132
invalid
3133
invalid
3134
invalid
3137
invalid
3138
invalid
3145
invalid
3146
invalid
3147
invalid
3150
invalid
3151
invalid
3152
invalid
3157
invalid
3158
invalid
3159
invalid
3168
invalid
3175
invalid
3176
invalid
3177
invalid
4682
invalid
4685
invalid
4686
invalid
4687
invalid
4688
invalid
4689
invalid
4698
invalid
4699
invalid
4700
invalid
4701
invalid
4702
invalid
4703
invalid
4704
invalid
4709
invalid
4710
invalid
471

In [16]:
len(clunky)

7700

In [17]:
nba_data  = clunky
nba_data_splits = nba_data.sort_values(by = ['GAME_ID', 'Home','Away'], ascending=[True, True,False])


In [18]:
#nba_data_splits

In [19]:
clunky.head()

Unnamed: 0,GAME_ID,Date,Team,Home,Away,Rolling FG,Rolling 3P,Rolling FT,Rolling DR,Rolling AST,Rolling ST,Rolling TO,Rolling BL,Rolling PTS,Rolling AST/TO,Rolling 3P%,Rolling FG%,Rolling FT%,SPREAD
0,21300275,4/8/14,San Antonio Spurs,0,1,41.033333,8.966667,17.366667,35.066667,25.966667,7.366667,14.2,5.766667,108.4,1.940596,0.401666,0.486488,0.814569,-7.0
1,21300275,4/8/14,Minnesota Timberwolves,1,0,39.166667,7.066667,22.633333,31.366667,24.5,8.6,13.6,3.766667,108.033333,1.916725,0.333378,0.45832,0.766028,7.0
11,21300447,12/28/13,Utah Jazz,0,1,35.633333,6.366667,15.6,29.066667,19.8,7.3,14.233333,5.033333,93.233333,1.56394,0.340005,0.434021,0.767087,14.5
12,21300447,12/28/13,Los Angeles Clippers,1,0,38.2,8.0,20.9,32.8,23.566667,8.3,13.5,4.633333,105.3,1.909397,0.333049,0.465288,0.719054,-14.5
19,21300455,12/30/13,Dallas Mavericks,0,1,39.466667,8.533333,16.466667,30.7,23.233333,9.533333,14.133333,4.7,103.933333,1.840691,0.381,0.469546,0.799961,4.0


In [20]:
#del nba_data_splits['GAME_ID'],nba_data_splits['Date']
#del nba_data_splits['Home'],nba_data_splits['Away'],nba_data_splits['Team']
 
    #
#Convert to the common box score already used. 

road_df = nba_data_splits.iloc[::2]
home_df = nba_data_splits.iloc[1::2]
for col in nba_data_splits.columns:
    road_df['road_' + col] = road_df[col]
    home_df['home_' + col] = home_df[col]
    
    del road_df[col],home_df[col]

home_df.reset_index(inplace=True)
road_df.reset_index(inplace=True)

#merged into a dataframe here. 
df = pd.concat([road_df,home_df],axis=1)
del df['index']

#create the dataset here. Can consider the spread, or winner. 
#at the moment only using a single classifier, that seems sufficient. A home team loss is synonymous with a road team win. 

#del df['road_PTS'], df['home_PTS'],df['home_SPREAD']

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


In [21]:
#df

#only retain the home flag: the outcome is tracked from the home team's side, which lines up with the home team's spread
df['Home'] = df['home_Home']
df['GAME_ID'] = df['road_GAME_ID']
del df['road_GAME_ID'],df['home_GAME_ID'],df['road_Date'],df['home_Date']
del df['road_Away'],df['road_Home'],df['home_Away'],df['home_Home']
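
The road/home pairing above depends on the sorted rows alternating strictly between road and home. A sketch of an alternative that joins the two sides on GAME_ID instead is shown below; the frame and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical long-format rows: one row per team per game.
long_df = pd.DataFrame({
    "GAME_ID": [1, 1, 2, 2],
    "Home": [0, 1, 0, 1],
    "Rolling PTS": [101.0, 99.5, 108.2, 104.1],
    "SPREAD": [3.5, -3.5, -6.0, 6.0],
})

# Split by the home flag and prefix the columns of each side.
road = long_df[long_df["Home"] == 0].drop(columns="Home").add_prefix("road_")
home = long_df[long_df["Home"] == 1].drop(columns="Home").add_prefix("home_")

# One row per game, joined on the game id rather than on row position.
wide = road.merge(home, left_on="road_GAME_ID", right_on="home_GAME_ID")
wide = wide.rename(columns={"road_GAME_ID": "GAME_ID"}).drop(columns="home_GAME_ID")
print(wide)
```

Joining on the id means a game with a missing side simply drops out of the inner merge instead of shifting every later pair.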

In [22]:
data = pd.read_csv('NBADATA.csv')
data = data[['GAME_ID','PLUS_MINUS','Home']]

In [23]:
df = df.merge(data,on=['GAME_ID','Home'])  #Home is 1 on every row of df, so this attaches the home team's +/- to each game.

In [24]:
df.columns

#remove the final extraneous columns. 

del df['road_Team'],df['home_Team'],df['GAME_ID']
del df['Home']
#df['Home Team Covers?'] = 

Quick brainstorm. PLUS_MINUS here is the final point differential for the home team. The home team covers when its point differential plus its spread is positive.

Ex. +/- = -10 means the home team lost by 10. If the home spread was +11, the team covered: -10 + 11 = 1, a positive number.

If +/- = 24 and the home spread was -26, they didn't cover: 24 - 26 = -2, a negative number.


So if +/- plus spread > 0, they covered. Else they didn't.
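
The rule from the brainstorm can be written as a small helper. The function name `home_team_covers` is hypothetical; following the rule as stated, a sum of exactly 0 (a push) counts as not covering.

```python
def home_team_covers(plus_minus, home_spread):
    """Return True when the home margin plus the home spread is positive."""
    return plus_minus + home_spread > 0

# The two cases from the brainstorm.
print(home_team_covers(-10, 11))   # lost by 10 as an 11-point underdog
print(home_team_covers(24, -26))   # won by 24 as a 26-point favorite
```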

In [25]:
df.columns

Index(['road_Rolling FG', 'road_Rolling 3P', 'road_Rolling FT',
       'road_Rolling DR', 'road_Rolling AST', 'road_Rolling ST',
       'road_Rolling TO', 'road_Rolling BL', 'road_Rolling PTS',
       'road_Rolling AST/TO', 'road_Rolling 3P%', 'road_Rolling FG%',
       'road_Rolling FT%', 'road_SPREAD', 'home_Rolling FG', 'home_Rolling 3P',
       'home_Rolling FT', 'home_Rolling DR', 'home_Rolling AST',
       'home_Rolling ST', 'home_Rolling TO', 'home_Rolling BL',
       'home_Rolling PTS', 'home_Rolling AST/TO', 'home_Rolling 3P%',
       'home_Rolling FG%', 'home_Rolling FT%', 'home_SPREAD', 'PLUS_MINUS'],
      dtype='object')

In [None]:
df.to_csv('30_game_rolling_stats.csv')

In [32]:
df = pd.read_csv('30_game_rolling_stats.csv',dtype=np.float32)

In [33]:
outcome = df['PLUS_MINUS'] #df['home_SPREAD'] + df['PLUS_MINUS'] 

In [34]:
y = []
for val in outcome:
    if val>0: 
        y.append(1) #home team wins. 
    else:
        y.append(0)

In [35]:
del df['PLUS_MINUS']

In [36]:
X = df

In [37]:
del X['Unnamed: 0']

In [39]:
sum(y)/len(y)  



0.5903896103896104

The home team wins roughly 59% of the time. 

In [40]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

In [41]:
scaler = MinMaxScaler()
X_train,X_test,y_train,y_test = train_test_split(X,y,shuffle=False,test_size = .25)

In [42]:
type(X_test.values[0][0])

numpy.float32

Now, prior to normalizing, create a separate variable that holds the home-team spreads for the test games. 

In [None]:
test_spreads = X_test['home_SPREAD']

In [None]:
scaler.fit(X_train.values)
X_train = scaler.transform(X_train.values)
X_test = scaler.transform(X_test.values)

# Now apply tensorflow model here!

In [None]:
import  tensorflow as tf
tf.set_random_seed(456)
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score

In [None]:
X_train.shape[1]

In [None]:
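# Hyperparameters for the tensorflow graph
d = X_train.shape[1]   # number of input features
n_hidden = 100
learning_rate = .001
n_epochs = 100
batch_size = 100
dropout_prob = 1.0     # fed to keep_prob below, so 1.0 means no units are dropped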

In [None]:
type(X_train[0][0])

In [None]:
with tf.name_scope("placeholders"):
    x = tf.placeholder(tf.float32, (None, d))
    y = tf.placeholder(tf.float32, (None,))
    keep_prob = tf.placeholder(tf.float32)
with tf.name_scope("hidden-layer-1"):
    W = tf.Variable(tf.random_normal((d, n_hidden)))
    b = tf.Variable(tf.random_normal((n_hidden,)))
    x_hidden_1 = tf.nn.relu(tf.matmul(x, W) + b)
  # Apply dropout
    x_hidden_1 = tf.nn.dropout(x_hidden_1, keep_prob)

with tf.name_scope("hidden-layer-2"):
    W = tf.Variable(tf.random_normal((n_hidden, n_hidden)))
    b = tf.Variable(tf.random_normal((n_hidden,)))
    x_hidden_2 = tf.nn.relu(tf.matmul(x_hidden_1, W) + b)
  # Apply dropout
    x_hidden_2 = tf.nn.dropout(x_hidden_2, keep_prob)

with tf.name_scope("output"):
    W = tf.Variable(tf.random_normal((n_hidden, 1)))
    b = tf.Variable(tf.random_normal((1,)))
    y_logit = tf.matmul(x_hidden_2, W) + b
  # the sigmoid gives the class probability of 1
    y_one_prob = tf.sigmoid(y_logit)
  # Rounding P(y=1) will give the correct prediction.
    y_pred = tf.round(y_one_prob)
with tf.name_scope("loss"):
  # Compute the cross-entropy term for each datapoint
    y_expand = tf.expand_dims(y, 1)
    entropy = tf.nn.sigmoid_cross_entropy_with_logits(logits=y_logit, labels=y_expand)
  # Sum all contributions
    l = tf.reduce_sum(entropy)

with tf.name_scope("optim"):
    train_op = tf.train.AdamOptimizer(learning_rate).minimize(l)

with tf.name_scope("summaries"):
    tf.summary.scalar("loss", l)
    merged = tf.summary.merge_all()
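
To make the graph above easier to follow, here is a NumPy sketch of the same forward pass and loss on a made-up batch. It is an illustration, not the notebook's model: the weights are random, dropout is left out (equivalent to keep_prob = 1.0), and d = 28 is assumed from the 28 feature columns left once PLUS_MINUS is removed.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_hidden, batch = 28, 100, 4   # d is an assumption, see the note above

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    # tanh form of the logistic function, stable for large |z|
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def sigmoid_cross_entropy(logits, labels):
    # Numerically stable form: max(x, 0) - x*z + log(1 + exp(-|x|))
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))

x = rng.random((batch, d))            # features already scaled to [0, 1]
y = np.array([1.0, 0.0, 1.0, 0.0])    # made-up labels (1 = home win)

# Unit-variance random weights, as tf.random_normal produces by default.
W1, b1 = rng.normal(size=(d, n_hidden)), rng.normal(size=n_hidden)
W2, b2 = rng.normal(size=(n_hidden, n_hidden)), rng.normal(size=n_hidden)
W3, b3 = rng.normal(size=(n_hidden, 1)), rng.normal(size=1)

h1 = relu(x @ W1 + b1)                # hidden-layer-1
h2 = relu(h1 @ W2 + b2)               # hidden-layer-2
logits = h2 @ W3 + b3                 # shape (batch, 1)
prob_home_win = sigmoid(logits)
pred = np.round(prob_home_win)        # 1 = predicted home win
loss = sigmoid_cross_entropy(logits, y[:, None]).sum()

print(logits.shape, float(loss))
```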

In [None]:
train_writer = tf.summary.FileWriter('/tmp/nba-train',tf.get_default_graph())


In [None]:
N = X_train.shape[0]
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    step = 0
    for epoch in range(n_epochs):
        pos = 0
        while pos < N:
            batch_X = X_train[pos:pos+batch_size]
            batch_y = y_train[pos:pos+batch_size]
            feed_dict = {x: batch_X, y: batch_y, keep_prob: dropout_prob}
            _, summary, loss = sess.run([train_op, merged, l], feed_dict=feed_dict)
            print("epoch %d, step %d, loss: %f" % (epoch, step, loss))
            train_writer.add_summary(summary, step)
    
            step += 1
            pos += batch_size

  # Make Predictions (set keep_prob to 1.0 for predictions)
    train_y_pred = sess.run(y_pred, feed_dict={x: X_train, keep_prob: 1.0})
   # valid_y_pred = sess.run(y_pred, feed_dict={x: valid_X, keep_prob: 1.0})
    test_y_pred = sess.run(y_pred, feed_dict={x: X_test, keep_prob: 1.0})

train_weighted_score = accuracy_score(y_train, train_y_pred)
print("Train Weighted Classification Accuracy: %f" % train_weighted_score)
#valid_weighted_score = accuracy_score(valid_y, valid_y_pred, sample_weight=valid_w)
#print("Valid Weighted Classification Accuracy: %f" % valid_weighted_score)
test_weighted_score = accuracy_score(y_test, test_y_pred)
print("Test Weighted Classification Accuracy: %f" % test_weighted_score)

In [None]:
!tensorboard --logdir=/tmp/nba-train 

In [None]:
from sklearn.neural_network import MLPClassifier

from sklearn.svm import SVC
model = MLPClassifier()

model.fit(X_train,y_train)



In [None]:
model.score(X_train,y_train)

In [None]:
model.score(X_test,y_test)

Now to compare: 

In [None]:
predictions = model.predict(X_test)


for i,prediction in enumerate(predictions):
   # print("Prediction: ", prediction)
    #print("Spread of the home team for this game: ", test_spreads.values[i])
    
    if test_spreads.values[i] > 0 and prediction ==1:
        print("Picked a home team underdog")
        
    if test_spreads.values[i] < 0 and prediction ==0:
        print("Picked an away team underdog")

This is a good time to note that the model seems to pick underdogs very rarely, and when it does, it is more often on the road.  -- OK, now the model picks the opposite. Let's come back to this later, after a proper payout calculator has been made. 

In [None]:
def spread2ML(spread):
    """Converts spread into a moneyline value using the equation I derived in another notebook. 
    """
    if spread <=1.5:
        
        ML = 1.71409498 * spread**3 + 10.90008433 * spread **2 + 22.40247106 * spread - 138.20112341
    else: 

        ML = 1.66494668 * spread**3 -20.03302374 * spread**2 + 101.20347437 * spread - 34.68833849
    
    return ML


def ML2Payout(ML,bet,win=True):
    """Convert Moneyline odds to a payout. 
    """
    if win:
        if ML < 0: # - moneyline, 
        # PAYOUT = BET AMOUNT / (-1 *MONEYLINE ODDS / 100)

            payout = bet / (-1*ML/100)

        elif ML > 0:   #now for the underdog
        #PAYOUT = BET AMOUNT * ODDS / 100
            payout = (bet * ML) / 100

            
        else:
            payout = bet
    else:
        if ML > 0: 
            payout = -bet
        elif ML < 0:
            #in the circumstances where its a favorite, the computer makes you put down more. ie -190 means 19 to win 10. 
            payout = -bet
            
        else:
            payout = -bet
    
    return payout 

In [None]:
spread2ML(1)

In [None]:
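def risk2payout(ML,bet,win=True):
    """Return (risk, payout) for a moneyline bet sized around a base unit `bet`.

    On a favorite (ML < 0) you risk more than `bet` in order to win `bet`;
    on an underdog (ML > 0) you risk `bet` in order to win more than `bet`.
    """
    
    if ML < 0: # if betting on a favorite. 
        
        risk = bet * -ML / 100   #e.g. -190 with bet=10 means risking 19 to win 10. 
        reward = bet
    elif ML > 0: #if betting on an underdog. 
        risk = bet
        reward = bet * ML / 100
    else:
        raise ValueError("a moneyline of 0 is not a valid price")

    if win:
        payout = reward  #net winnings, not counting the returned stake. 
    else:
        payout = -risk  #this is how much you risked, and it's gone. 
    
    return risk, payout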
        
        
        
        
        

In [None]:
risk2payout(-105,10,win=False)
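
The risk/reward convention used here can be checked with two worked cases. The helper below is a stand-alone sketch with a hypothetical name (`risk_and_reward`); it assumes standard American odds, where a favorite at -190 must risk 1.9 units to win 1 and an underdog at +150 wins 1.5 units per unit risked.

```python
def risk_and_reward(ml, unit):
    """Risk and reward for American odds `ml`, sized around a base `unit`.

    Favorites (ml < 0) risk more than `unit` in order to win `unit`;
    underdogs (ml > 0) risk `unit` in order to win more than `unit`.
    """
    if ml < 0:
        return unit * -ml / 100, unit
    if ml > 0:
        return unit, unit * ml / 100
    raise ValueError("a moneyline of 0 is not a valid price")

print(risk_and_reward(-190, 10))  # favorite: risk 19 to win 10
print(risk_and_reward(150, 10))   # underdog: risk 10 to win 15
```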

In [None]:
money_made = 0
acc_count = 0
total_winings = 0
total_losings = 0
init_bet = 10

for i,prediction in enumerate(predictions):

    spread = test_spreads.values[i]   #home team's spread for this game
    #The bet is on the predicted winner, so price it from that team's side:
    #the away team's spread is the negative of the home team's spread.
    picked_spread = spread if prediction == 1 else -spread
    ML_odds = spread2ML(picked_spread)
    print()
    print("Odds of Pick: ", ML_odds)
    print("Home Spread of Game: ",spread)

    if y_test[i] == prediction:
        acc_count+=1
        risk , winnings = risk2payout(ML_odds,init_bet,win=True)
        print("Correct! Win $", winnings)
        print("You risked $",risk)
        
        money_made += winnings
        total_winings += winnings
    else:
        _ , losings = risk2payout(ML_odds,init_bet,win=False)
        print("Wrong! Lose $", -losings)

        money_made += losings
        total_losings += losings
     #   print("xxx")
    
    
  #  if test_spreads.values[i] > 0 and prediction ==1:
   #     print("Picked a home team underdog")
  #      
  #  if test_spreads.values[i] < 0 and prediction ==0:
   #     print("Picked a away team underdog")

In [None]:
money_made

In [None]:
total_winings

In [None]:
total_losings

# So, considering the spread along with these rolling statistics, the model picks the winner of NBA games with approximately 69% accuracy. Taking into account moneyline odds converted from the spread, how much money do you win? 
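
Accuracy alone does not answer that question, because a winning bet on a favorite returns less than the amount risked. A minimal sketch of a summary that tracks both is shown below; the function name `summarize_bets` and the three example bets are made up for illustration.

```python
def summarize_bets(results):
    """Summarize a list of (risk, payout) pairs; payout is negative on a loss."""
    total_risked = sum(risk for risk, _ in results)
    net = sum(payout for _, payout in results)
    wins = sum(1 for _, payout in results if payout > 0)
    return {
        "net": net,
        "win_rate": wins / len(results),
        "roi": net / total_risked,   # net profit per unit risked
    }

# Made-up results: two winning favorites and one losing underdog.
example = [(19.0, 10.0), (12.0, 10.0), (10.0, -10.0)]
summary = summarize_bets(example)
print(summary)
```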

In [None]:
len(X_test)

# Quick Review

# 1. Identified the stats most correlated to point differential in a game.

# 2. Identified the ngames lookback splits most correlated to performance in said game. 

# 3. Trained a model on these statistics and found it can pick winners with ~68% accuracy, while picks against the spread are still barely over 50%. 

# What's up for next time? Refactor this code and turn it into an updateable model as more games come in from those scrapers, and also provide a fast application for guess and check. Also build a converter from the traditional spread to money made. 