# Machine Learning Prototype

This notebook takes all the data collected and compares different machine learning algorithms to determine which one performs best. It would not have been possible without Payton Soicher's machine learning writeup as a reference. You can find it here: https://towardsdatascience.com/can-you-accurately-predict-mlb-games-based-on-home-and-away-records-8a9a919bad29

This model looks at each head-to-head matchup to predict whether the home team will win.

In [1]:
#imports
import pandas as pd
import numpy as np

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

import xgboost as xgb
from constants import ID_TO_NAME

In [2]:
#getting the schedule data and team stats
df = pd.read_csv("data2019.csv")
stats = pd.read_csv("stats.csv")

#setting index to team name for easier referencing
stats = stats.set_index("Team Name")

In [4]:
#keeping only the schedule and per-map columns
schedule_cols = ["stage", "away id", "away", "away score", "home id", "home score", "home", "winner", "winner id", "winner label",
                 "Map 1 Name", "Map 1 Type", "Map 1 Away Points", "Map 1 Home Points", "Map 1 Winner",
                 "Map 2 Name", "Map 2 Type", "Map 2 Away Points", "Map 2 Home Points", "Map 2 Winner",
                 "Map 3 Name", "Map 3 Type", "Map 3 Away Points", "Map 3 Home Points", "Map 3 Winner",
                 "Map 4 Name", "Map 4 Type", "Map 4 Away Points", "Map 4 Home Points", "Map 4 Winner",
                 "Map 5 Name", "Map 5 Type", "Map 5 Away Points", "Map 5 Home Points", "Map 5 Winner",
                 ]

sdf = df[schedule_cols]

For the team stats, I collected both map-specific stats and general map type stats. For now, I am sticking to map type stats, with the goal of eventually using map-specific stats based on which maps are being played.

In [5]:
#filter down to the map type stats
cols = ["Points Earned", "Points Lost", "Points Differential", "Points Differential Rank", "True Win %", "Map Potential %", "Map Potential % Rank"]
types = ["Average Assault", "Average Control", "Average Hybrid", "Average Escort"]

#build every "<map type> <stat>" column name, grouped by map type
columns = [map_type + " " + stat for map_type in types for stat in cols]
sta = stats[columns]
sta

Team Name,Average Assault Points Earned,Average Assault Points Lost,Average Assault Points Differential,Average Assault Points Differential Rank,Average Assault True Win %,Average Assault Map Potential %,Average Assault Map Potential % Rank,Average Control Points Earned,Average Control Points Lost,Average Control Points Differential,...,Average Hybrid True Win %,Average Hybrid Map Potential %,Average Hybrid Map Potential % Rank,Average Escort Points Earned,Average Escort Points Lost,Average Escort Points Differential,Average Escort Points Differential Rank,Average Escort True Win %,Average Escort Map Potential %,Average Escort Map Potential % Rank
Atlanta Reign,2.3416,2.15,0.1916,6.0,55.0,0.751,11.0,1.2904,1.0764,0.214,...,57.3334,0.7708,7.0,2.3945,2.385667,0.008833,8.0,51.865,0.799333,4.0
Boston Uprising,2.3466,2.6266,-0.28,14.0,33.5,0.7144,14.0,0.818,1.5638,-0.7458,...,40.3334,0.6908,15.0,1.8945,2.222167,-0.327667,13.0,33.333333,0.6235,20.0
Chengdu Hunters,2.2584,2.3084,-0.05,11.0,45.8334,0.789,8.0,1.25,1.21,0.04,...,49.6666,0.7758,6.0,1.938833,2.272167,-0.333333,14.0,35.0,0.663167,15.0
Dallas Fuel,2.2092,2.7736,-0.5644,20.0,28.2858,0.6474,19.0,1.128,1.1584,-0.0304,...,32.8572,0.6958,14.0,2.241667,2.791667,-0.55,19.0,31.666667,0.652167,16.0
Florida Mayhem,2.4058,2.6104,-0.2046,13.0,38.3332,0.7464,13.0,0.7534,1.5706,-0.8172,...,31.7142,0.7178,12.0,1.863833,2.325,-0.461167,17.0,41.111167,0.633333,19.0
Guangzhou Charge,2.2058,2.1962,0.0096,10.0,50.6192,0.712,15.0,1.3128,1.0682,0.2446,...,48.7858,0.7446,10.0,2.062667,2.338833,-0.276167,12.0,43.531667,0.686833,10.0
Hangzhou Spark,2.1294,2.0376,0.0918,8.0,56.8572,0.7934,6.0,1.3236,1.1042,0.2194,...,45.9166,0.7078,13.0,2.261333,2.1695,0.091833,7.0,54.0875,0.766667,7.0
Houston Outlaws,1.7492,2.0878,-0.3386,16.0,36.0,0.549,20.0,1.1562,1.2646,-0.1084,...,34.1666,0.5592,20.0,1.841667,2.211167,-0.3695,16.0,38.888833,0.684167,12.0
London Spitfire,2.262,2.0416,0.2204,5.0,57.6786,0.7848,9.0,1.2546,1.2016,0.053,...,50.2976,0.727,11.0,2.038833,2.211167,-0.172333,11.0,48.1945,0.680333,13.0
Los Angeles Gladiators,2.1894,1.9448,0.2446,3.0,57.1516,0.801,4.0,1.1678,1.1632,0.0046,...,69.3334,0.8632,3.0,2.1625,2.0625,0.1,6.0,49.166667,0.7285,8.0


In [7]:
#turning all important categorical data into numeric data
def get_team_stats(team):
    teamrow = sta.loc[team, :]
    return teamrow

def home_team_winner(row):
    if row['home'] == row['winner']:
        return 1 
    else:
        return 0
    

finaldf = []
noplay = sdf[["stage", "away", "away id", "away score", "home score", "home id", "home", "winner", "winner id"]]

#combining stats and schedule dataframes
for index, row in noplay.iterrows():
    awayrow = get_team_stats(row["away"])
    homerow = get_team_stats(row["home"])
    awayrow = awayrow.rename(lambda x: "Away " + x)
    homerow = homerow.rename(lambda x: "Home " + x)
    test = pd.concat([row, awayrow, homerow], )
    finaldf.append(test)
finaldf = pd.DataFrame(finaldf)
finaldf.insert(finaldf.columns.get_loc("winner"), 'HomeTeamWin', finaldf.apply(home_team_winner, axis = 1))

#dropping all categorical data
finaldf = finaldf.drop(["stage", "home", "away", 'away score', 'home score', "winner", "winner id"], axis = 1)
finaldf = finaldf.loc[:, ~finaldf.columns.str.contains("Rank")]


finaldf

Unnamed: 0,away id,home id,HomeTeamWin,Away Average Assault Points Earned,Away Average Assault Points Lost,Away Average Assault Points Differential,Away Average Assault True Win %,Away Average Assault Map Potential %,Away Average Control Points Earned,Away Average Control Points Lost,...,Home Average Hybrid Points Earned,Home Average Hybrid Points Lost,Home Average Hybrid Points Differential,Home Average Hybrid True Win %,Home Average Hybrid Map Potential %,Home Average Escort Points Earned,Home Average Escort Points Lost,Home Average Escort Points Differential,Home Average Escort True Win %,Home Average Escort Map Potential %
0,4524,4410,0,2.3196,2.6138,-0.2942,36.9446,0.7500,1.0026,1.4804,...,2.2656,2.5166,-0.2510,50.2976,0.7270,2.038833,2.211167,-0.172333,48.194500,0.680333
1,4403,4402,0,2.1624,1.9222,0.2402,59.9884,0.8234,1.3690,1.0052,...,2.2316,2.4650,-0.2334,40.3334,0.6908,1.894500,2.222167,-0.327667,33.333333,0.623500
2,4409,4406,0,2.3574,2.2094,0.1480,53.3056,0.7998,1.3700,1.1900,...,2.4400,2.0434,0.3966,69.3334,0.8632,2.162500,2.062500,0.100000,49.166667,0.728500
3,4408,7693,1,2.6066,2.5466,0.0600,55.0000,0.7742,1.3450,1.1692,...,2.4034,2.3216,0.0818,45.9166,0.7078,2.261333,2.169500,0.091833,54.087500,0.766667
4,7695,4525,0,1.9016,2.2484,-0.3468,38.8334,0.6884,1.0394,1.3994,...,1.6952,2.3442,-0.6490,34.1666,0.5592,1.841667,2.211167,-0.369500,38.888833,0.684167
5,7698,4407,0,2.3416,2.1500,0.1916,55.0000,0.7510,1.2904,1.0764,...,2.0524,2.5780,-0.5256,31.7142,0.7178,1.863833,2.325000,-0.461167,41.111167,0.633333
6,4523,4404,1,2.2092,2.7736,-0.5644,28.2858,0.6474,1.1280,1.1584,...,2.8054,1.8916,0.9138,75.8938,0.9046,2.845833,1.380333,1.465500,90.773833,0.931500
7,7692,7699,0,2.2584,2.3084,-0.0500,45.8334,0.7890,1.2500,1.2100,...,2.3514,2.3986,-0.0472,48.7858,0.7446,2.062667,2.338833,-0.276167,43.531667,0.686833
8,4410,7694,1,2.2620,2.0416,0.2204,57.6786,0.7848,1.2546,1.2016,...,2.5834,2.7400,-0.1566,45.2500,0.7556,2.511167,2.361167,0.150000,56.944500,0.804833
9,7697,4403,1,1.5908,1.9764,-0.3856,41.0714,0.6902,0.8500,1.6034,...,2.5336,2.0938,0.4398,63.1558,0.8286,2.105333,1.776167,0.329167,63.273833,0.792333


In [21]:
#setting all data to same scale
from sklearn.preprocessing import scale

#work on an explicit copy so the scaled values are not written to a slice of finaldf
X = finaldf.drop(columns=['HomeTeamWin']).copy()
y = finaldf['HomeTeamWin']

#standardize every stat column in one call (the id columns are left untouched)
stat_cols = X.loc[:, "Away Average Assault Points Earned": "Home Average Escort Map Potential %"].columns
X[stat_cols] = scale(X[stat_cols])

Unnamed: 0,away id,home id,Away Average Assault Points Earned,Away Average Assault Points Lost,Away Average Assault Points Differential,Away Average Assault True Win %,Away Average Assault Map Potential %,Away Average Control Points Earned,Away Average Control Points Lost,Away Average Control Points Differential,...,Home Average Hybrid Points Earned,Home Average Hybrid Points Lost,Home Average Hybrid Points Differential,Home Average Hybrid True Win %,Home Average Hybrid Map Potential %,Home Average Escort Points Earned,Home Average Escort Points Lost,Home Average Escort Points Differential,Home Average Escort True Win %,Home Average Escort Map Potential %
0,4524,4410,0.383535,1.456770,-0.884121,-0.954492,-0.093244,-0.897993,1.253639,-1.089809,...,-0.128012,0.921959,-0.558648,-0.012240,-0.281921,-0.647986,0.012385,-0.363378,-0.162836,-0.607901
1,4403,4402,-0.218600,-1.091522,0.716218,0.727167,0.844508,0.787849,-0.985190,0.898625,...,-0.239578,0.710750,-0.521440,-0.619737,-0.667250,-1.151725,0.045333,-0.660770,-1.087454,-1.261677
2,4409,4406,0.528323,-0.033296,0.440112,0.239479,0.542997,0.792450,-0.114534,0.464364,...,0.444256,-1.014945,0.810418,1.148333,1.167852,-0.216376,-0.432913,0.158017,-0.102351,-0.053821
3,4408,7693,1.482853,1.209162,0.176583,0.363131,0.215933,0.677423,-0.212530,0.454441,...,0.324158,0.123784,0.144912,-0.279341,-0.486294,0.128563,-0.112418,0.142382,0.203809,0.385225
4,7695,4525,-1.217563,0.110405,-1.041640,-0.816654,-0.880241,-0.728672,0.872021,-0.811485,...,-1.999693,0.216291,-1.400044,-0.995714,-2.068059,-1.336119,0.012385,-0.740862,-0.741808,-0.563804
5,7698,4407,0.467803,-0.252163,0.570678,0.363131,-0.080468,0.426203,-0.649743,0.544695,...,-0.827595,1.173282,-1.139169,-1.145232,-0.379850,-1.258755,0.353347,-0.916362,-0.603540,-1.148560
6,4523,4404,-0.039339,2.045574,-1.693275,-1.586382,-1.404054,-0.321015,-0.263413,-0.032744,...,1.643260,-1.636293,1.903811,1.548307,1.608532,2.168532,-2.476190,2.772329,2.486325,2.281368
7,7692,7699,0.149116,0.331483,-0.152828,-0.305817,0.405017,0.240319,-0.020308,0.133589,...,0.153528,0.438961,-0.127802,-0.104412,-0.094579,-0.564805,0.394782,-0.562171,-0.452944,-0.533129
8,4410,7694,0.162905,-0.651577,0.656924,0.558606,0.351358,0.261484,-0.059883,0.164303,...,0.914801,1.836381,-0.359080,-0.319982,0.022510,1.000509,0.461676,0.253744,0.381563,0.824272
9,7697,4403,-2.408044,-0.891815,-1.157832,-0.653332,-0.857244,-1.600120,1.833134,-1.740964,...,0.751390,-0.808648,0.901746,0.771697,0.799555,-0.415894,-1.290560,0.596767,0.775356,0.680479
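
The cell above standardizes every stat column over the full table before the train/test split. A minimal, dependency-free sketch of the same transform is below. It assumes the population standard deviation, which is what sklearn's `scale` uses, and shows how the mean and standard deviation learned from the training rows could be reused on a new row. The helper names `fit_standardizer` and `apply_standardizer` and the sample values are made up for illustration and are not part of the notebook.

```python
from statistics import fmean, pstdev

def fit_standardizer(values):
    """Return the (mean, population std) learned from training values."""
    return fmean(values), pstdev(values)

def apply_standardizer(values, mean, std):
    """Standardize values with previously learned parameters."""
    return [(v - mean) / std for v in values]

# hypothetical training values for one stat column
train_points_earned = [2.0, 2.5, 3.0, 2.5]
mean, std = fit_standardizer(train_points_earned)

# the training column ends up with mean 0 and standard deviation 1
train_scaled = apply_standardizer(train_points_earned, mean, std)

# a new matchup is scaled with the training parameters, not its own
new_scaled = apply_standardizer([3.0], mean, std)
```

Keeping the learned parameters around matters for any later prediction on fresh data: a model fit on standardized columns expects new inputs on that same scale.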


In [45]:
#splitting into training and testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=55, stratify=y
)

Here, I decided to try a variety of models to find which one works best. It's worth noting that the Overwatch League changes its ruleset rapidly, so a team can be terrible at the beginning of the season and do really well towards the end. The stats collected cover the overall season, which means games played at the beginning of the season aren't predicted as well as games towards the end. This is something I hope to fix in the future.
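
One possible way to approach the season-long stats problem is to compute each team's stats only from matches played before the game being predicted. The sketch below is not part of the notebook: the `matches` list, its layout, and the single win-rate feature are hypothetical stand-ins for the real schedule and stats tables.

```python
from collections import defaultdict

# hypothetical schedule in chronological order: (away, home, winner)
matches = [
    ("Team A", "Team B", "Team B"),
    ("Team B", "Team C", "Team C"),
    ("Team A", "Team C", "Team A"),
    ("Team C", "Team B", "Team B"),
]

def point_in_time_win_rates(matches):
    """For each match, return both teams' win rates from earlier matches only."""
    wins = defaultdict(int)
    played = defaultdict(int)
    rows = []
    for away, home, winner in matches:
        # read the running totals before this match is counted
        away_rate = wins[away] / played[away] if played[away] else None
        home_rate = wins[home] / played[home] if played[home] else None
        rows.append((away, home, away_rate, home_rate))
        # only now update the totals with this match's result
        for team in (away, home):
            played[team] += 1
        wins[winner] += 1
    return rows

rows = point_in_time_win_rates(matches)
```

A team's first match has no history, so its rate is `None` here; a real version would need a rule for that case, such as carrying over the previous season's value.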

In [46]:
rfc = RandomForestClassifier(500, random_state = 534)
rfc.fit(X_train, y_train)
print('-- Random Forest -- ')
print('Training Accuracy: ', accuracy_score(y_train, rfc.predict(X_train)))
print('Testing Accuracy: ', accuracy_score(y_test, rfc.predict(X_test)))
print('Whole Dataset: ', accuracy_score(finaldf['HomeTeamWin'],rfc.predict(finaldf.loc[:, X_train.columns])))
print('\n')

lr = LogisticRegression(random_state = 534)
lr.fit(X_train, y_train)
print('-- Logistic Regression -- ')
print('Training Accuracy: ', accuracy_score(y_train, lr.predict(X_train)))
print('Testing Accuracy: ', accuracy_score(y_test, lr.predict(X_test)))
print('Whole Dataset: ', accuracy_score(finaldf['HomeTeamWin'],lr.predict(finaldf.loc[:, X_train.columns])))
print('\n')

knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
print('-- K Nearest Neighbors -- ')
print('Training Accuracy: ', accuracy_score(y_train, knn.predict(X_train)))
print('Testing Accuracy: ', accuracy_score(y_test, knn.predict(X_test)))
print('Whole Dataset: ', accuracy_score(finaldf['HomeTeamWin'],knn.predict(finaldf.loc[:, X_train.columns])))
print('\n')

sv = SVC()
sv.fit(X_train, y_train)
print('-- SVC -- ')
print('Training Accuracy: ', accuracy_score(y_train, sv.predict(X_train)))
print('Testing Accuracy: ', accuracy_score(y_test, sv.predict(X_test)))
print('Whole Dataset: ', accuracy_score(finaldf['HomeTeamWin'],sv.predict(finaldf.loc[:, X_train.columns])))
print('\n')

xgboost = xgb.XGBClassifier(seed = 82)
xgboost.fit(X_train, y_train)
print('-- XGBoost --')
print('Training Accuracy: ', accuracy_score(y_train, xgboost.predict(X_train)))
print('Testing Accuracy: ', accuracy_score(y_test, xgboost.predict(X_test)))
print('Whole Dataset: ', accuracy_score(finaldf['HomeTeamWin'],xgboost.predict(finaldf.loc[:, X_train.columns])))

-- Random Forest -- 
Training Accuracy:  0.9623430962343096
Testing Accuracy:  0.775
Whole Dataset:  0.6927899686520376


-- Logistic Regression -- 
Training Accuracy:  0.698744769874477
Testing Accuracy:  0.775
Whole Dataset:  0.6112852664576802


-- K Nearest Neighbors -- 
Training Accuracy:  0.7782426778242678
Testing Accuracy:  0.6875
Whole Dataset:  0.670846394984326


-- SVC -- 
Training Accuracy:  0.8368200836820083
Testing Accuracy:  0.7125
Whole Dataset:  0.5297805642633229


-- XGBoost --
Training Accuracy:  0.8619246861924686
Testing Accuracy:  0.7125
Whole Dataset:  0.6300940438871473




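As seen, the models vary in their predictions. The testing accuracy does not go above 80% (the best is 77.5%, for both random forest and logistic regression), and the whole dataset accuracy is noticeably worse. That last figure should be read with caution: the whole dataset is just the training and testing rows combined, so its accuracy should fall between the other two, yet it is below both for every model. The likely cause is that the whole dataset line predicts on finaldf, which holds the raw stats, while the models were fit on the scaled X. The testing accuracy is still good overall, as the Overwatch League is in such a constant state of flux that anything above 50% is considered good. Overfitting is not ruled out, though: the random forest drops from about 96% on training data to 77.5% on testing data, while logistic regression shows no such gap.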
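
The relationship between the three accuracies can be checked with a little arithmetic. Accuracy on the combined rows is the size-weighted average of the training and testing accuracies. The sketch below uses the random forest numbers printed above and assumes a split of 239 training and 80 testing rows, which is what the printed fractions imply (230/239 and 62/80) but is not stated in the notebook.

```python
# accuracies printed above for the random forest
train_acc = 0.9623430962343096
test_acc = 0.775
reported_whole_acc = 0.6927899686520376

# assumed split sizes, inferred from the printed fractions
n_train, n_test = 239, 80

# accuracy on the union of both splits is their size-weighted average
expected_whole_acc = (train_acc * n_train + test_acc * n_test) / (n_train + n_test)

print(round(expected_whole_acc, 4))
print(round(reported_whole_acc, 4))
```

Under that assumption the expected value is roughly 0.915, well above the reported 0.693, so the whole dataset figure cannot have been measured on the same feature values as the training and testing figures.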

Ideally, with more relevant features, the accuracy can go up. Furthermore, in previous seasons the league was run on a stage-by-stage basis, meaning the game patch would change every 7 weeks. That made it possible for a team's results to swing sharply between stages, since the overall way to play changed drastically. In 2019, however, while the game patch will continue to change, there will no longer be stages. As such, momentum shifts may be limited, which would in turn better standardize our data and allow better accuracy.

In [47]:
#created a function that would choose any two teams from the Overwatch League and determine the winner.
def test_predict(away, home):

    #turn the predicted label back into a team name
    def convert_prediction(prediction):
        if prediction[0] == 1:
            #Home Won
            return home
        if prediction[0] == 0:
            #Away Won
            return away

    #turn names into ids
    name_to_id = {name: team_id for team_id, name in ID_TO_NAME.items()}
    testrow = pd.Series([name_to_id[away], name_to_id[home]], index=["away id", "home id"])

    #get all stats for each team and label them by side
    awayrow = get_team_stats(away).rename(lambda x: "Away " + x)
    homerow = get_team_stats(home).rename(lambda x: "Home " + x)

    #build the feature row in the same column order used for training
    #note: these stats are raw, while the models were fit on standardized features
    testrow = pd.concat([testrow, awayrow, homerow])
    testrow = testrow[~testrow.index.str.contains("Rank")]

    #predictions
    models = [("Random Forest", rfc), ("Logistic Regression", lr),
              ("K Nearest Neighbors", knn), ("SVC", sv)]
    for label, model in models:
        print(label + " Prediction: ", convert_prediction(model.predict([testrow])))
        print(" ")


test_predict("San Francisco Shock", "Vancouver Titans")

Random Forest Prediction:  San Francisco Shock
 
Logistic Regression Prediction:  Vancouver Titans
 
K Nearest Neighbors Prediction:  San Francisco Shock
 
SVC Prediction:  Vancouver Titans
 


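One note worth mentioning is that being the home team has a massive impact on which team is predicted to win. For teams far enough apart in the standings this does not matter, but for teams that are close to each other in the standings, the predicted winner often depends on which one is at home. The feature importances at the end of the notebook do not prove that hypothesis, but general testing points to it (done below). One caveat: test_predict passes raw stats to models that were fit on scaled features, so part of this effect may come from that mismatch rather than from a real home advantage.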
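
One way to put a number on this hypothesis is to run every pairing in both orders and count how often the home team is picked both times, meaning the venue alone decided the prediction. The sketch below is an illustration only: `predict_home_win` is a hypothetical stand-in for a trained model, and the strength ratings are invented.

```python
from itertools import combinations

# invented strength ratings, standing in for real team stats
ratings = {"Team A": 0.9, "Team B": 0.55, "Team C": 0.5, "Team D": 0.1}

def predict_home_win(away, home, home_bonus=0.1):
    """Hypothetical predictor: 1 if the home team is picked, else 0."""
    return 1 if ratings[home] + home_bonus > ratings[away] else 0

def home_decides_rate(teams, predict):
    """Share of pairings where the home team is picked in both orders."""
    pairs = list(combinations(teams, 2))
    home_picked_both = sum(
        1 for a, b in pairs
        if predict(away=a, home=b) == 1 and predict(away=b, home=a) == 1
    )
    return home_picked_both / len(pairs)

rate = home_decides_rate(list(ratings), predict_home_win)
```

With these invented ratings only the closely matched pair (Team B and Team C) flips with the venue, which gives a rate of 1 in 6. The same function could be pointed at a wrapper around any of the fitted models.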

In [48]:
teams = ["Atlanta Reign",
    "Boston Uprising",
    "Chengdu Hunters",
    "Dallas Fuel",
    "Florida Mayhem",
    "Guangzhou Charge",
    "Hangzhou Spark",
    "Houston Outlaws",
    "London Spitfire",
    "Los Angeles Gladiators",
    "Los Angeles Valiant",
    "New York Excelsior",
    "Paris Eternal",
    "Philadelphia Fusion",
    "San Francisco Shock",
    "Seoul Dynasty",
    "Shanghai Dragons",
    "Toronto Defiant",
    "Vancouver Titans",
    "Washington Justice"
        ]

for home in teams:
    for away in teams:
        print(away, "vs", home)
        print("=========================")
        
        test_predict(away, home)

         

Atlanta Reign vs Atlanta Reign
Random Forest Prediction:  Atlanta Reign
 
Logistic Regression Prediction:  Atlanta Reign
 
K Nearest Neighbors Prediction:  Atlanta Reign
 
SVC Prediction:  Atlanta Reign
 
Boston Uprising vs Atlanta Reign
Random Forest Prediction:  Atlanta Reign
 
Logistic Regression Prediction:  Atlanta Reign
 
K Nearest Neighbors Prediction:  Atlanta Reign
 
SVC Prediction:  Atlanta Reign
 
Chengdu Hunters vs Atlanta Reign
Random Forest Prediction:  Atlanta Reign
 
Logistic Regression Prediction:  Atlanta Reign
 
K Nearest Neighbors Prediction:  Chengdu Hunters
 
SVC Prediction:  Atlanta Reign
 
Dallas Fuel vs Atlanta Reign
Random Forest Prediction:  Atlanta Reign
 
Logistic Regression Prediction:  Atlanta Reign
 
K Nearest Neighbors Prediction:  Atlanta Reign
 
SVC Prediction:  Atlanta Reign
 
Florida Mayhem vs Atlanta Reign
Random Forest Prediction:  Atlanta Reign
 
Logistic Regression Prediction:  Atlanta Reign
 
K Nearest Neighbors Prediction:  Atlanta Reign
 
SVC

Random Forest Prediction:  San Francisco Shock
 
Logistic Regression Prediction:  San Francisco Shock
 
K Nearest Neighbors Prediction:  San Francisco Shock
 
SVC Prediction:  Boston Uprising
 
Seoul Dynasty vs Boston Uprising
Random Forest Prediction:  Seoul Dynasty
 
Logistic Regression Prediction:  Boston Uprising
 
K Nearest Neighbors Prediction:  Seoul Dynasty
 
SVC Prediction:  Boston Uprising
 
Shanghai Dragons vs Boston Uprising
Random Forest Prediction:  Shanghai Dragons
 
Logistic Regression Prediction:  Boston Uprising
 
K Nearest Neighbors Prediction:  Shanghai Dragons
 
SVC Prediction:  Boston Uprising
 
Toronto Defiant vs Boston Uprising
Random Forest Prediction:  Boston Uprising
 
Logistic Regression Prediction:  Boston Uprising
 
K Nearest Neighbors Prediction:  Boston Uprising
 
SVC Prediction:  Boston Uprising
 
Vancouver Titans vs Boston Uprising
Random Forest Prediction:  Vancouver Titans
 
Logistic Regression Prediction:  Boston Uprising
 
K Nearest Neighbors Predi

Random Forest Prediction:  New York Excelsior
 
Logistic Regression Prediction:  New York Excelsior
 
K Nearest Neighbors Prediction:  New York Excelsior
 
SVC Prediction:  Dallas Fuel
 
Paris Eternal vs Dallas Fuel
Random Forest Prediction:  Dallas Fuel
 
Logistic Regression Prediction:  Paris Eternal
 
K Nearest Neighbors Prediction:  Paris Eternal
 
SVC Prediction:  Dallas Fuel
 
Philadelphia Fusion vs Dallas Fuel
Random Forest Prediction:  Philadelphia Fusion
 
Logistic Regression Prediction:  Philadelphia Fusion
 
K Nearest Neighbors Prediction:  Dallas Fuel
 
SVC Prediction:  Dallas Fuel
 
San Francisco Shock vs Dallas Fuel
Random Forest Prediction:  San Francisco Shock
 
Logistic Regression Prediction:  San Francisco Shock
 
K Nearest Neighbors Prediction:  San Francisco Shock
 
SVC Prediction:  Dallas Fuel
 
Seoul Dynasty vs Dallas Fuel
Random Forest Prediction:  Seoul Dynasty
 
Logistic Regression Prediction:  Seoul Dynasty
 
K Nearest Neighbors Prediction:  Seoul Dynasty
 
SV

Random Forest Prediction:  Guangzhou Charge
 
Logistic Regression Prediction:  Guangzhou Charge
 
K Nearest Neighbors Prediction:  Guangzhou Charge
 
SVC Prediction:  Guangzhou Charge
 
Hangzhou Spark vs Guangzhou Charge
Random Forest Prediction:  Guangzhou Charge
 
Logistic Regression Prediction:  Guangzhou Charge
 
K Nearest Neighbors Prediction:  Hangzhou Spark
 
SVC Prediction:  Guangzhou Charge
 
Houston Outlaws vs Guangzhou Charge
Random Forest Prediction:  Guangzhou Charge
 
Logistic Regression Prediction:  Guangzhou Charge
 
K Nearest Neighbors Prediction:  Guangzhou Charge
 
SVC Prediction:  Guangzhou Charge
 
London Spitfire vs Guangzhou Charge
Random Forest Prediction:  Guangzhou Charge
 
Logistic Regression Prediction:  Guangzhou Charge
 
K Nearest Neighbors Prediction:  London Spitfire
 
SVC Prediction:  Guangzhou Charge
 
Los Angeles Gladiators vs Guangzhou Charge
Random Forest Prediction:  Guangzhou Charge
 
Logistic Regression Prediction:  Guangzhou Charge
 
K Nearest N

Random Forest Prediction:  Houston Outlaws
 
Logistic Regression Prediction:  Houston Outlaws
 
K Nearest Neighbors Prediction:  Boston Uprising
 
SVC Prediction:  Houston Outlaws
 
Chengdu Hunters vs Houston Outlaws
Random Forest Prediction:  Chengdu Hunters
 
Logistic Regression Prediction:  Chengdu Hunters
 
K Nearest Neighbors Prediction:  Chengdu Hunters
 
SVC Prediction:  Houston Outlaws
 
Dallas Fuel vs Houston Outlaws
Random Forest Prediction:  Houston Outlaws
 
Logistic Regression Prediction:  Houston Outlaws
 
K Nearest Neighbors Prediction:  Houston Outlaws
 
SVC Prediction:  Houston Outlaws
 
Florida Mayhem vs Houston Outlaws
Random Forest Prediction:  Houston Outlaws
 
Logistic Regression Prediction:  Houston Outlaws
 
K Nearest Neighbors Prediction:  Florida Mayhem
 
SVC Prediction:  Houston Outlaws
 
Guangzhou Charge vs Houston Outlaws
Random Forest Prediction:  Guangzhou Charge
 
Logistic Regression Prediction:  Guangzhou Charge
 
K Nearest Neighbors Prediction:  Guangz

Random Forest Prediction:  London Spitfire
 
Logistic Regression Prediction:  London Spitfire
 
K Nearest Neighbors Prediction:  London Spitfire
 
SVC Prediction:  London Spitfire
 
Atlanta Reign vs Los Angeles Gladiators
Random Forest Prediction:  Los Angeles Gladiators
 
Logistic Regression Prediction:  Los Angeles Gladiators
 
K Nearest Neighbors Prediction:  Los Angeles Gladiators
 
SVC Prediction:  Los Angeles Gladiators
 
Boston Uprising vs Los Angeles Gladiators
Random Forest Prediction:  Los Angeles Gladiators
 
Logistic Regression Prediction:  Los Angeles Gladiators
 
K Nearest Neighbors Prediction:  Los Angeles Gladiators
 
SVC Prediction:  Los Angeles Gladiators
 
Chengdu Hunters vs Los Angeles Gladiators
Random Forest Prediction:  Los Angeles Gladiators
 
Logistic Regression Prediction:  Los Angeles Gladiators
 
K Nearest Neighbors Prediction:  Los Angeles Gladiators
 
SVC Prediction:  Los Angeles Gladiators
 
Dallas Fuel vs Los Angeles Gladiators
Random Forest Prediction: 

Random Forest Prediction:  Seoul Dynasty
 
Logistic Regression Prediction:  Los Angeles Valiant
 
K Nearest Neighbors Prediction:  Los Angeles Valiant
 
SVC Prediction:  Los Angeles Valiant
 
Shanghai Dragons vs Los Angeles Valiant
Random Forest Prediction:  Los Angeles Valiant
 
Logistic Regression Prediction:  Los Angeles Valiant
 
K Nearest Neighbors Prediction:  Los Angeles Valiant
 
SVC Prediction:  Los Angeles Valiant
 
Toronto Defiant vs Los Angeles Valiant
Random Forest Prediction:  Los Angeles Valiant
 
Logistic Regression Prediction:  Los Angeles Valiant
 
K Nearest Neighbors Prediction:  Los Angeles Valiant
 
SVC Prediction:  Los Angeles Valiant
 
Vancouver Titans vs Los Angeles Valiant
Random Forest Prediction:  Vancouver Titans
 
Logistic Regression Prediction:  Los Angeles Valiant
 
K Nearest Neighbors Prediction:  Vancouver Titans
 
SVC Prediction:  Los Angeles Valiant
 
Washington Justice vs Los Angeles Valiant
Random Forest Prediction:  Los Angeles Valiant
 
Logistic R

Random Forest Prediction:  New York Excelsior
 
Logistic Regression Prediction:  Paris Eternal
 
K Nearest Neighbors Prediction:  New York Excelsior
 
SVC Prediction:  Paris Eternal
 
Paris Eternal vs Paris Eternal
Random Forest Prediction:  Paris Eternal
 
Logistic Regression Prediction:  Paris Eternal
 
K Nearest Neighbors Prediction:  Paris Eternal
 
SVC Prediction:  Paris Eternal
 
Philadelphia Fusion vs Paris Eternal
Random Forest Prediction:  Paris Eternal
 
Logistic Regression Prediction:  Paris Eternal
 
K Nearest Neighbors Prediction:  Paris Eternal
 
SVC Prediction:  Paris Eternal
 
San Francisco Shock vs Paris Eternal
Random Forest Prediction:  San Francisco Shock
 
Logistic Regression Prediction:  Paris Eternal
 
K Nearest Neighbors Prediction:  San Francisco Shock
 
SVC Prediction:  Paris Eternal
 
Seoul Dynasty vs Paris Eternal
Random Forest Prediction:  Paris Eternal
 
Logistic Regression Prediction:  Paris Eternal
 
K Nearest Neighbors Prediction:  Seoul Dynasty
 
SVC P

Random Forest Prediction:  San Francisco Shock
 
Logistic Regression Prediction:  San Francisco Shock
 
K Nearest Neighbors Prediction:  San Francisco Shock
 
SVC Prediction:  San Francisco Shock
 
London Spitfire vs San Francisco Shock
Random Forest Prediction:  San Francisco Shock
 
Logistic Regression Prediction:  San Francisco Shock
 
K Nearest Neighbors Prediction:  San Francisco Shock
 
SVC Prediction:  San Francisco Shock
 
Los Angeles Gladiators vs San Francisco Shock
Random Forest Prediction:  San Francisco Shock
 
Logistic Regression Prediction:  San Francisco Shock
 
K Nearest Neighbors Prediction:  San Francisco Shock
 
SVC Prediction:  San Francisco Shock
 
Los Angeles Valiant vs San Francisco Shock
Random Forest Prediction:  San Francisco Shock
 
Logistic Regression Prediction:  San Francisco Shock
 
K Nearest Neighbors Prediction:  San Francisco Shock
 
SVC Prediction:  San Francisco Shock
 
New York Excelsior vs San Francisco Shock
Random Forest Prediction:  San Francis

Hangzhou Spark vs Shanghai Dragons
Random Forest Prediction:  Hangzhou Spark
 
Logistic Regression Prediction:  Hangzhou Spark
 
K Nearest Neighbors Prediction:  Hangzhou Spark
 
SVC Prediction:  Shanghai Dragons
 
Houston Outlaws vs Shanghai Dragons
Random Forest Prediction:  Shanghai Dragons
 
Logistic Regression Prediction:  Shanghai Dragons
 
K Nearest Neighbors Prediction:  Shanghai Dragons
 
SVC Prediction:  Shanghai Dragons
 
London Spitfire vs Shanghai Dragons
Random Forest Prediction:  London Spitfire
 
Logistic Regression Prediction:  Shanghai Dragons
 
K Nearest Neighbors Prediction:  Shanghai Dragons
 
SVC Prediction:  Shanghai Dragons
 
Los Angeles Gladiators vs Shanghai Dragons
Random Forest Prediction:  Los Angeles Gladiators
 
Logistic Regression Prediction:  Shanghai Dragons
 
K Nearest Neighbors Prediction:  Shanghai Dragons
 
SVC Prediction:  Shanghai Dragons
 
Los Angeles Valiant vs Shanghai Dragons
Random Forest Prediction:  Shanghai Dragons
 
Logistic Regression P

Random Forest Prediction:  Vancouver Titans
 
Logistic Regression Prediction:  Vancouver Titans
 
K Nearest Neighbors Prediction:  Vancouver Titans
 
SVC Prediction:  Vancouver Titans
 
Chengdu Hunters vs Vancouver Titans
Random Forest Prediction:  Vancouver Titans
 
Logistic Regression Prediction:  Vancouver Titans
 
K Nearest Neighbors Prediction:  Chengdu Hunters
 
SVC Prediction:  Vancouver Titans
 
Dallas Fuel vs Vancouver Titans
Random Forest Prediction:  Vancouver Titans
 
Logistic Regression Prediction:  Vancouver Titans
 
K Nearest Neighbors Prediction:  Vancouver Titans
 
SVC Prediction:  Vancouver Titans
 
Florida Mayhem vs Vancouver Titans
Random Forest Prediction:  Vancouver Titans
 
Logistic Regression Prediction:  Vancouver Titans
 
K Nearest Neighbors Prediction:  Vancouver Titans
 
SVC Prediction:  Vancouver Titans
 
Guangzhou Charge vs Vancouver Titans
Random Forest Prediction:  Vancouver Titans
 
Logistic Regression Prediction:  Vancouver Titans
 
K Nearest Neighbors

Random Forest Prediction:  Vancouver Titans
 
Logistic Regression Prediction:  Vancouver Titans
 
K Nearest Neighbors Prediction:  Vancouver Titans
 
SVC Prediction:  Washington Justice
 
Washington Justice vs Washington Justice
Random Forest Prediction:  Washington Justice
 
Logistic Regression Prediction:  Washington Justice
 
K Nearest Neighbors Prediction:  Washington Justice
 
SVC Prediction:  Washington Justice
 


In [49]:
#listing features for Random Forest
pd.DataFrame(list(zip(rfc.feature_importances_, X_train.columns)), columns = ['Feature Importance','Feature']
            ).sort_values('Feature Importance',ascending = False)

Unnamed: 0,Feature Importance,Feature
14,0.036052,Away Average Hybrid Points Differential
35,0.033427,Home Average Hybrid True Win %
25,0.032583,Home Average Assault True Win %
9,0.031289,Away Average Control Points Differential
10,0.029798,Away Average Control True Win %
11,0.029148,Away Average Control Map Potential %
24,0.028773,Home Average Assault Points Differential
7,0.028653,Away Average Control Points Earned
8,0.02794,Away Average Control Points Lost
33,0.026503,Home Average Hybrid Points Lost
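
To relate these importances back to the home/away question, they can be summed by side. The sketch below uses only the ten rows shown above, so the totals are partial and say nothing about the remaining features.

```python
from collections import defaultdict

# the ten (importance, feature) rows listed above
top_features = [
    (0.036052, "Away Average Hybrid Points Differential"),
    (0.033427, "Home Average Hybrid True Win %"),
    (0.032583, "Home Average Assault True Win %"),
    (0.031289, "Away Average Control Points Differential"),
    (0.029798, "Away Average Control True Win %"),
    (0.029148, "Away Average Control Map Potential %"),
    (0.028773, "Home Average Assault Points Differential"),
    (0.028653, "Away Average Control Points Earned"),
    (0.02794, "Away Average Control Points Lost"),
    (0.026503, "Home Average Hybrid Points Lost"),
]

# sum importance by the leading word of the feature name
totals = defaultdict(float)
for importance, feature in top_features:
    side = feature.split(" ", 1)[0]  # "Away" or "Home"
    totals[side] += importance

for side, total in sorted(totals.items()):
    print(side, round(total, 6))
```

Random forest importances measure how much a feature is used for splitting, not the direction of its effect, so totals like these cannot by themselves confirm or rule out a home advantage.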
