# Generating Predictions

Using the AdaBoost model that we selected in the Selecting a Model notebook, we will create predictions for the 2018 NCAA Tournament.

In [1]:
# Import packages
import sys
sys.path.append('/Users/phil/Documents/Documents/College Basketball')

import pandas as pd
import collegebasketball as cbb
cbb.__version__

import warnings
warnings.filterwarnings('ignore')

## Train the Model

We will train the model using the same method as before. To see how I arrived at this model, refer to the Selecting a Model notebook.

There is one major difference in how we train the model this time, however. Previously, the tournament data served as the test set, so we did not train on it. Because we are now predicting the 2018 games, we can add the tournament data to the training set.

In [2]:
# Load the csv files that contain the scores/kenpom data
season = cbb.load_csv('/Users/phil/Documents/Documents/College Basketball/Data/regular_season.csv')
march = cbb.load_csv('/Users/phil/Documents/Documents/College Basketball/Data/march.csv')

# Combine all of our data
data = pd.concat([season, march])

print('Number of Games in Regular Season Data: ' + str(len(season)))
print('Number of Games in March Data: ' + str(len(march)))
print('Number of Games in Combined Data: ' + str(len(data)))

Number of Games in Regular Season Data: 83707
Number of Games in March Data: 1976
Number of Games in Combined Data: 85683
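
One detail worth keeping in mind when combining tables this way: `pd.concat` keeps each frame's own row labels by default, so labels can repeat in the combined frame. The sketch below uses two small toy frames (hypothetical rows, not the real data) to show the difference `ignore_index=True` makes. Whether this matters here depends on how `cbb.load_csv` indexes the files, which is an assumption on our part.

```python
import pandas as pd

# Toy stand-ins for the regular-season and March tables (hypothetical rows).
season_toy = pd.DataFrame({'Home': ['A', 'B'], 'Away': ['C', 'D']})
march_toy = pd.DataFrame({'Home': ['E'], 'Away': ['F']})

# Default concat keeps each frame's own row labels, so label 0 appears twice.
stacked = pd.concat([season_toy, march_toy])

# ignore_index=True relabels the combined rows 0..n-1.
relabeled = pd.concat([season_toy, march_toy], ignore_index=True)

print(list(stacked.index))    # [0, 1, 0]
print(list(relabeled.index))  # [0, 1, 2]
```

Repeated labels are harmless for row counts like the ones printed above, but they can surprise label-based lookups such as `.loc[0]` later on.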


In [3]:
# Generate the Features
train, features = cbb.gen_features(data)

In [4]:
# Blocking step
train = cbb.block_table(train)
print('Now there are ' + str(len(train)) + ' games.')
print(str(len(train[train['Label'] == 1])) + ' of those games are upsets')

Now there are 45863 games.
16449 of those games are upsets
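
To put those two counts in context, we can work out the share of upsets in the blocked training set. This is plain arithmetic on the numbers printed above, assuming (as the printout suggests) that `Label == 1` marks an upset.

```python
n_games = 45863   # games remaining after blocking (from the output above)
n_upsets = 16449  # games with Label == 1, i.e. upsets

upset_share = n_upsets / n_games
print(f'{upset_share:.1%} of blocked games are upsets')  # 35.9% of blocked games are upsets
```

So a classifier that always picked the favored team would be right on roughly 64% of these training games, which is a useful baseline to keep in mind.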


In [5]:
# Train the AdaBoost classifier
ada = cbb.AdaBoostClassifier(n_estimators=50, random_state=0)
# Pass the label as a 1-D Series rather than a one-column DataFrame
ada.fit(train[features], train['Label'])

AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None,
          learning_rate=1.0, n_estimators=50, random_state=0)

## Predict Games Using the Classifier

Now that we have a trained model, we can use it to predict games in the 2018 NCAA Tournament. First, we need to load the games from the first round and create feature vectors.

In [6]:
# Load games csv
games = cbb.load_csv('/Users/phil/Documents/Documents/College Basketball/Data/Tourney/2018.csv')
games.head()

Unnamed: 0,Home,Away
0,Virginia,UMBC
1,Creighton,Kansas St.
2,Kentucky,Davidson
3,Arizona,Buffalo
4,Miami FL,Loyola Chicago


In [7]:
# Get the 2018 kenpom data
path = '/Users/phil/Documents/Documents/College Basketball/Data/Kenpom/2018_kenpom.csv'
cbb.load_kenpom_dataframe(csv_file_path=path, year=2018, save_data=True)
kenpom = cbb.load_csv(path)
kenpom.head()

Unnamed: 0,Rank,Team,Conf,Wins,Losses,AdjEM,AdjO,AdjO Rank,AdjD,AdjD Rank,...,Luck,Luck Rank,OppAdjEM,OppAdjEM Rank,OppO,OppO Rank,OppD,OppD Rank,NCSOS AdjEM,NCSOS AdjEM Rank
0,1,Virginia,ACC,31,2,32.06,116.6,21,84.5,1,...,0.032,98,9.88,25,110.8,29,100.9,26,0.23,159
1,2,Villanova,BE,30,4,31.4,127.4,1,96.0,22,...,-0.018,224,10.21,20,111.3,19,101.1,29,4.14,64
2,3,Duke,ACC,26,7,29.04,122.6,3,93.6,7,...,-0.029,261,10.8,12,110.9,26,100.1,10,4.63,53
3,4,Cincinnati,Amer,30,4,26.97,113.2,55,86.2,2,...,-0.004,185,3.15,89,107.0,99,103.8,91,-4.17,295
4,5,Purdue,B10,28,6,26.66,123.3,2,96.6,31,...,0.003,168,8.53,44,109.3,54,100.8,21,0.93,146


In [8]:
# Merge Games data with the kenpom data
games = pd.merge(games, kenpom, left_on='Home', right_on='Team', sort=False)
games = pd.merge(games, kenpom, left_on='Away', right_on='Team', suffixes=('_Home', '_Away'), sort=False)
games.insert(0, 'Year', 2018)
games.head()

Unnamed: 0,Year,Home,Away,Rank_Home,Team_Home,Conf_Home,Wins_Home,Losses_Home,AdjEM_Home,AdjO_Home,...,Luck_Away,Luck Rank_Away,OppAdjEM_Away,OppAdjEM Rank_Away,OppO_Away,OppO Rank_Away,OppD_Away,OppD Rank_Away,NCSOS AdjEM_Away,NCSOS AdjEM Rank_Away
0,2018,Virginia,UMBC,1,Virginia,ACC,31,2,32.06,116.6,...,0.145,1,-5.96,299,102.4,291,108.4,304,-3.45,279
1,2018,Creighton,Kansas St.,28,Creighton,BE,21,11,17.24,116.5,...,0.056,56,9.22,36,111.4,17,102.2,59,-6.05,321
2,2018,Kentucky,Davidson,18,Kentucky,SEC,24,10,20.38,116.4,...,-0.051,297,1.96,108,105.9,131,103.9,94,1.8,117
3,2018,Arizona,Buffalo,21,Arizona,P12,27,7,19.21,119.0,...,-0.005,192,-0.94,165,104.8,174,105.8,165,4.11,65
4,2018,Miami FL,Loyola Chicago,37,Miami FL,ACC,22,9,15.63,113.3,...,0.058,53,0.03,139,103.8,225,103.7,88,-5.61,315
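
Because `pd.merge` performs an inner join by default, a game whose team name does not appear in the ratings table exactly as spelled is dropped without any warning. A minimal sketch of a sanity check, using toy frames with placeholder values (the `Value` column and the misspelled name are hypothetical, chosen only to illustrate the failure mode):

```python
import pandas as pd

# Toy stand-ins; 'Loyola-Chicago' is deliberately misspelled with a hyphen.
games_toy = pd.DataFrame({'Home': ['Virginia', 'Miami FL'],
                          'Away': ['UMBC', 'Loyola-Chicago']})
kenpom_toy = pd.DataFrame({'Team': ['Virginia', 'UMBC', 'Miami FL', 'Loyola Chicago'],
                           'Value': [1.0, 2.0, 3.0, 4.0]})  # placeholder numbers

# Collect every team name in the games table that has no ratings row.
known_teams = set(kenpom_toy['Team'])
missing = sorted((set(games_toy['Home']) | set(games_toy['Away'])) - known_teams)
print(missing)  # ['Loyola-Chicago']

# The default inner merge silently drops the unmatched game.
merged = pd.merge(games_toy, kenpom_toy, left_on='Away', right_on='Team')
print(len(games_toy), len(merged))  # 2 1
```

Running the same kind of check on the real `games` and `kenpom` tables before merging, and comparing row counts afterwards, would confirm that no matchup was lost.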


Now that we have feature vectors for the first round of the tournament and a trained model, we can make our predictions for the 2018 NCAA Tournament.

In [9]:
# Make Predictions
predictions = cbb.predict(ada, games)

In [10]:
# First Round
predictions.iloc[0:32,:]

Unnamed: 0,Favored,Underdog,Predicted Winner,Probabilities
0,Virginia,UMBC,Virginia,0.494144
1,Creighton,Kansas St.,Kansas St.,0.500056
2,Kentucky,Davidson,Kentucky,0.495594
3,Arizona,Buffalo,Arizona,0.496677
4,Miami FL,Loyola Chicago,Loyola Chicago,0.499971
5,Tennessee,Wright St.,Tennessee,0.496323
6,Nevada,Texas,Texas,0.499808
7,Cincinnati,Georgia St.,Cincinnati,0.493606
8,Xavier,Texas Southern,Xavier,0.486106
9,Florida St.,Missouri,Missouri,0.49937


In [11]:
# Second Round
predictions.iloc[32:48,:]

Unnamed: 0,Favored,Underdog,Predicted Winner,Probabilities
32,Virginia,Kansas St.,Virginia,0.492241
33,Kentucky,Arizona,Arizona,0.499107
34,Tennessee,Loyola Chicago,Tennessee,0.497599
35,Cincinnati,Texas,Cincinnati,0.496429
36,Xavier,Missouri,Xavier,0.491823
37,Gonzaga,South Dakota St.,Gonzaga,0.494185
38,Michigan,Houston,Michigan,0.497899
39,North Carolina,Providence,North Carolina,0.495336
40,Villanova,Virginia Tech,Villanova,0.491454
41,West Virginia,Wichita St.,Wichita St.,0.499107


In [12]:
# Later Rounds
predictions.iloc[48:,:]

Unnamed: 0,Favored,Underdog,Predicted Winner,Probabilities
48,Virginia,Arizona,Virginia,0.494504
49,Cincinnati,Tennessee,Cincinnati,0.497996
50,Gonzaga,Xavier,Xavier,0.502
51,North Carolina,Michigan,Michigan,0.498663
52,Villanova,Wichita St.,Villanova,0.491435
53,Purdue,St. Bonaventure,Purdue,0.497115
54,Kansas,Clemson,Kansas,0.496736
55,Duke,Michigan St.,Michigan St.,0.499933
56,Virginia,Cincinnati,Virginia,0.494647
57,Michigan,Xavier,Xavier,0.501486
