# Machine Learning Prototype

This notebook takes all the collected data and compares different machine learning algorithms to determine which one performs best. It would not have been possible without Payton Soicher's machine learning writeup as a reference. You can find it here: https://towardsdatascience.com/can-you-accurately-predict-mlb-games-based-on-home-and-away-records-8a9a919bad29

This model looks at each head-to-head matchup to predict whether the home team will win.

In [1]:
#imports
import pandas as pd
import numpy as np

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

import xgboost as xgb
from constants import ID_TO_NAME

In [3]:
#getting the schedule data and team stats
df = pd.read_csv("data2019.csv")
stats = pd.read_csv("stats.csv")

#setting index to team name for easier referencing
stats = stats.set_index("Team Name")

In [4]:
#keeping only the schedule and map result columns
columns = ["stage", "away id", "away", "away score", "home id", "home score", "home", "winner", "winner id", "winner label",
           "Map 1 Name", "Map 1 Type", "Map 1 Away Points", "Map 1 Home Points", "Map 1 Winner",
           "Map 2 Name", "Map 2 Type", "Map 2 Away Points", "Map 2 Home Points", "Map 2 Winner",
           "Map 3 Name", "Map 3 Type", "Map 3 Away Points", "Map 3 Home Points", "Map 3 Winner",
           "Map 4 Name", "Map 4 Type", "Map 4 Away Points", "Map 4 Home Points", "Map 4 Winner",
           "Map 5 Name", "Map 5 Type", "Map 5 Away Points", "Map 5 Home Points", "Map 5 Winner",
          ]

sdf = df[columns]

With the team stats, I collected both map-specific stats and general map-type stats. For now, I am sticking to map-type stats, with the goal of eventually using map-specific stats based on which maps are being played.

In [11]:
#filter out to map type stats
import itertools
cols = ["Points Earned", "Points Lost", "Points Differential", "Points Differential Rank", "True Win %", "Map Potential %", "Map Potential % Rank"]
types = ["Average Assault", "Average Control", "Average Hybrid", "Average Escort"]
columns = []
for i in types:
    columns.append([i + " " + j for j in cols])

columns = list(itertools.chain.from_iterable(columns))
sta = stats[columns]
sta

cols = ["Wins", "Losses", "Draws"]
types = ["Total Assault", "Total Control", "Total Hybrid", "Total Escort"]
columns.clear()
for i in types:
    columns.append([i + " " + j for j in cols])
columns = list(itertools.chain.from_iterable(columns))
sta2 = stats[columns]

sta = pd.concat([sta2, sta], axis = 1)

sta

Team Name,Total Assault Wins,Total Assault Losses,Total Assault Draws,Total Control Wins,Total Control Losses,Total Control Draws,Total Hybrid Wins,Total Hybrid Losses,Total Hybrid Draws,Total Escort Wins,...,Average Hybrid True Win %,Average Hybrid Map Potential %,Average Hybrid Map Potential % Rank,Average Escort Points Earned,Average Escort Points Lost,Average Escort Points Differential,Average Escort Points Differential Rank,Average Escort True Win %,Average Escort Map Potential %,Average Escort Map Potential % Rank
Atlanta Reign,17.0,14.0,1.0,25.0,17.0,0.0,17.0,15.0,0.0,16.0,...,53.125,0.7708,7.0,2.3945,2.385667,0.008833,8.0,50.0,0.799333,4.0
Boston Uprising,9.0,18.0,2.0,11.0,27.0,0.0,11.0,18.0,0.0,10.0,...,37.931034,0.6908,15.0,1.8945,2.222167,-0.327667,13.0,35.714286,0.6235,20.0
Chengdu Hunters,12.0,16.0,1.0,21.0,19.0,0.0,13.0,16.0,0.0,10.0,...,44.827586,0.7758,6.0,1.938833,2.272167,-0.333333,14.0,34.482759,0.663167,15.0
Dallas Fuel,7.0,19.0,3.0,17.0,16.0,0.0,10.0,19.0,0.0,9.0,...,34.482759,0.6958,14.0,2.241667,2.791667,-0.55,19.0,32.142857,0.652167,16.0
Florida Mayhem,10.0,15.0,3.0,7.0,25.0,0.0,8.0,18.0,2.0,11.0,...,32.142857,0.7178,12.0,1.863833,2.325,-0.461167,17.0,39.285714,0.633333,19.0
Guangzhou Charge,15.0,14.0,1.0,23.0,16.0,0.0,15.0,14.0,1.0,13.0,...,51.666667,0.7446,10.0,2.062667,2.338833,-0.276167,12.0,43.333333,0.686833,10.0
Hangzhou Spark,17.0,15.0,3.0,26.0,20.0,0.0,16.0,18.0,1.0,20.0,...,47.142857,0.7078,13.0,2.261333,2.1695,0.091833,7.0,57.142857,0.766667,7.0
Houston Outlaws,9.0,17.0,3.0,17.0,19.0,0.0,10.0,19.0,0.0,11.0,...,34.482759,0.5592,20.0,1.841667,2.211167,-0.3695,16.0,39.285714,0.684167,12.0
London Spitfire,16.0,13.0,3.0,16.0,22.0,0.0,15.0,14.0,3.0,16.0,...,51.5625,0.727,11.0,2.038833,2.211167,-0.172333,11.0,50.0,0.680333,13.0
Los Angeles Gladiators,17.0,13.0,2.0,18.0,22.0,0.0,22.0,9.0,1.0,15.0,...,70.3125,0.8632,3.0,2.1625,2.0625,0.1,6.0,48.387097,0.7285,8.0
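
The column names selected above all follow a "map type + stat" pattern. As a minimal sketch (not the code this notebook ran), the same kind of list can be built in one step with `itertools.product`. The variable names `stat_names`, `map_types` and `total_columns` are made up for this example.

```python
from itertools import product

stat_names = ["Wins", "Losses", "Draws"]
map_types = ["Total Assault", "Total Control", "Total Hybrid", "Total Escort"]

# product() varies the last argument fastest, so the names come out
# grouped by map type, matching the nested loop used in the cell above.
total_columns = [f"{map_type} {stat}" for map_type, stat in product(map_types, stat_names)]

print(total_columns)
```

This avoids building a list of lists and flattening it afterwards.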


In [12]:
#turning all important categorical data into numeric data
def get_team_stats(team):
    teamrow = sta.loc[team, :]
    return teamrow

def home_team_winner(row):
    if row['home'] == row['winner']:
        return 1 
    else:
        return 0
    

finaldf = []
noplay = sdf[["stage", "away", "away id", "away score", "home score", "home id", "home", "winner", "winner id"]]

#combining stats and schedule dataframes
for index, row in noplay.iterrows():
    awayrow = get_team_stats(row["away"])
    homerow = get_team_stats(row["home"])
    awayrow = awayrow.rename(lambda x: "Away " + x)
    homerow = homerow.rename(lambda x: "Home " + x)
    test = pd.concat([row, awayrow, homerow], )
    finaldf.append(test)
finaldf = pd.DataFrame(finaldf)
finaldf.insert(finaldf.columns.get_loc("winner"), 'HomeTeamWin', finaldf.apply(home_team_winner, axis = 1))

#dropping all categorical data
finaldf = finaldf.drop(["stage", "home", "away", 'away score', 'home score', "winner", "winner id"], axis = 1)
finaldf = finaldf.loc[:, ~finaldf.columns.str.contains("Rank")]


finaldf

Unnamed: 0,away id,home id,HomeTeamWin,Away Total Assault Wins,Away Total Assault Losses,Away Total Assault Draws,Away Total Control Wins,Away Total Control Losses,Away Total Control Draws,Away Total Hybrid Wins,...,Home Average Hybrid Points Earned,Home Average Hybrid Points Lost,Home Average Hybrid Points Differential,Home Average Hybrid True Win %,Home Average Hybrid Map Potential %,Home Average Escort Points Earned,Home Average Escort Points Lost,Home Average Escort Points Differential,Home Average Escort True Win %,Home Average Escort Map Potential %
0,4524,4410,0,10.0,17.0,4.0,15.0,25.0,0.0,19.0,...,2.2656,2.5166,-0.2510,51.562500,0.7270,2.038833,2.211167,-0.172333,50.000000,0.680333
1,4403,4402,0,20.0,13.0,3.0,28.0,19.0,0.0,23.0,...,2.2316,2.4650,-0.2334,37.931034,0.6908,1.894500,2.222167,-0.327667,35.714286,0.623500
2,4409,4406,0,15.0,15.0,4.0,24.0,18.0,0.0,20.0,...,2.4400,2.0434,0.3966,70.312500,0.8632,2.162500,2.062500,0.100000,48.387097,0.728500
3,4408,7693,1,17.0,14.0,3.0,23.0,18.0,0.0,12.0,...,2.4034,2.3216,0.0818,47.142857,0.7078,2.261333,2.169500,0.091833,57.142857,0.766667
4,7695,4525,0,11.0,16.0,2.0,14.0,18.0,0.0,6.0,...,1.6952,2.3442,-0.6490,34.482759,0.5592,1.841667,2.211167,-0.369500,39.285714,0.684167
5,7698,4407,0,17.0,14.0,1.0,25.0,17.0,0.0,17.0,...,2.0524,2.5780,-0.5256,32.142857,0.7178,1.863833,2.325000,-0.461167,39.285714,0.633333
6,4523,4404,1,7.0,19.0,3.0,17.0,16.0,0.0,10.0,...,2.8054,1.8916,0.9138,74.418605,0.9046,2.845833,1.380333,1.465500,90.476190,0.931500
7,7692,7699,0,12.0,16.0,1.0,21.0,19.0,0.0,13.0,...,2.3514,2.3986,-0.0472,51.666667,0.7446,2.062667,2.338833,-0.276167,43.333333,0.686833
8,4410,7694,1,16.0,13.0,3.0,16.0,22.0,0.0,15.0,...,2.5834,2.7400,-0.1566,44.642857,0.7556,2.511167,2.361167,0.150000,57.142857,0.804833
9,7697,4403,1,9.0,14.0,5.0,13.0,20.0,0.0,8.0,...,2.5336,2.0938,0.4398,63.888889,0.8286,2.105333,1.776167,0.329167,62.857143,0.792333
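
The row-by-row loop above attaches each team's stats to the schedule. A possible alternative, sketched here on made-up data, is to let pandas do the lookup with `DataFrame.join` and `add_prefix`. The frames `schedule` and `team_stats` and their values are hypothetical stand-ins for `noplay` and `sta`.

```python
import pandas as pd

# Hypothetical miniature versions of the schedule and team stats frames.
schedule = pd.DataFrame({
    "away": ["A", "B"],
    "home": ["B", "C"],
    "winner": ["B", "B"],
})
team_stats = pd.DataFrame(
    {"Wins": [3, 5, 1], "Losses": [4, 2, 6]},
    index=pd.Index(["A", "B", "C"], name="Team Name"),
)

# join(..., on=column) looks up each value of that column in the other
# frame's index, so every game gets its away and home team's stats.
combined = (
    schedule
    .join(team_stats.add_prefix("Away "), on="away")
    .join(team_stats.add_prefix("Home "), on="home")
)

# 1 when the home team won, 0 otherwise.
combined["HomeTeamWin"] = (combined["home"] == combined["winner"]).astype(int)

print(combined)
```

On a dataset this small the speed difference is negligible; the main benefit is that the combination step becomes a single expression.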


In [13]:
#setting all data to same scale
from sklearn.preprocessing import scale

X = finaldf.loc[:, ~finaldf.columns.isin(['HomeTeamWin'])]
y = finaldf.loc[:, 'HomeTeamWin']

for col in X.loc[:, "Away Total Assault Wins": "Home Average Escort Map Potential %"].columns:
    X[col] = scale(X[col])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
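
The warning above appears because `X` is created by slicing `finaldf` and is then assigned to column by column. A minimal sketch of one way to avoid it, on made-up data: take an explicit `.copy()` and scale all stat columns in one call. The frame `toy` and its column names are hypothetical, and `StandardScaler` is swapped in for `scale` here (both standardize to zero mean and unit variance).

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for finaldf: two id columns, the label, two stat columns.
toy = pd.DataFrame({
    "away id": [1, 2, 3, 4],
    "home id": [4, 3, 2, 1],
    "HomeTeamWin": [0, 1, 1, 0],
    "Away Wins": [10.0, 20.0, 15.0, 5.0],
    "Home Wins": [5.0, 15.0, 20.0, 10.0],
})

# .copy() makes X_toy an independent frame, so assigning into it is unambiguous.
X_toy = toy.loc[:, toy.columns != "HomeTeamWin"].copy()
y_toy = toy["HomeTeamWin"]

# Scale every stat column at once instead of looping over them.
stat_cols = ["Away Wins", "Home Wins"]
X_toy[stat_cols] = StandardScaler().fit_transform(X_toy[stat_cols])

print(X_toy)
```

The original frame is left unscaled, which matters later: anything predicted from it has to be scaled the same way first.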
  

In [14]:
#splitting into training and testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state = 55, stratify = y)

Here, I decided to try a variety of models to find which one worked best. It's worth noting that the Overwatch League changes its ruleset rapidly, so a team can be terrible at the beginning of the season and do really well towards the end. The stats collected cover the whole season, which means games played early in the season are not predicted as well as games played towards the end. This is something I hope to fix in the future.
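
One possible direction for that fix, sketched on made-up data, is to give each game only the stats its teams had accumulated before that game. The frame `games` and the columns `game_no`, `won` and `prior_win_rate` are hypothetical; the real data would need a date or match order column for this to work.

```python
import pandas as pd

# Hypothetical per-team game log: one row per team per game, in play order.
games = pd.DataFrame({
    "team": ["A", "A", "A", "B", "B", "B"],
    "game_no": [1, 2, 3, 1, 2, 3],
    "won": [0, 1, 1, 1, 0, 0],
})
games = games.sort_values(["team", "game_no"])

# shift(1) hides the current game, so each row only sees earlier results;
# expanding().mean() then gives the win rate over everything seen so far.
games["prior_win_rate"] = (
    games.groupby("team")["won"]
    .transform(lambda s: s.shift(1).expanding().mean())
)

print(games)
```

A team's first game has no history, so its value is NaN and would need a default before training.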

In [15]:
rfc = RandomForestClassifier(500, random_state = 534)
rfc.fit(X_train, y_train)
print('-- Random Forest -- ')
print('Training Accuracy: ', accuracy_score(y_train, rfc.predict(X_train)))
print('Testing Accuracy: ', accuracy_score(y_test, rfc.predict(X_test)))
print('Whole Dataset: ', accuracy_score(finaldf['HomeTeamWin'],rfc.predict(finaldf.loc[:, X_train.columns])))
print('\n')

lr = LogisticRegression(random_state = 534)
lr.fit(X_train, y_train)
print('-- Logistic Regression -- ')
print('Training Accuracy: ', accuracy_score(y_train, lr.predict(X_train)))
print('Testing Accuracy: ', accuracy_score(y_test, lr.predict(X_test)))
print('Whole Dataset: ', accuracy_score(finaldf['HomeTeamWin'],lr.predict(finaldf.loc[:, X_train.columns])))
print('\n')

knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
print('-- K Nearest Neighbors -- ')
print('Training Accuracy: ', accuracy_score(y_train, knn.predict(X_train)))
print('Testing Accuracy: ', accuracy_score(y_test, knn.predict(X_test)))
print('Whole Dataset: ', accuracy_score(finaldf['HomeTeamWin'],knn.predict(finaldf.loc[:, X_train.columns])))
print('\n')

sv = SVC()
sv.fit(X_train, y_train)
print('-- SVC -- ')
print('Training Accuracy: ', accuracy_score(y_train, sv.predict(X_train)))
print('Testing Accuracy: ', accuracy_score(y_test, sv.predict(X_test)))
print('Whole Dataset: ', accuracy_score(finaldf['HomeTeamWin'],sv.predict(finaldf.loc[:, X_train.columns])))
print('\n')

xgboost = xgb.XGBClassifier(seed = 82)
xgboost.fit(X_train, y_train)
print('-- XGBoost --')
print('Training Accuracy: ', accuracy_score(y_train, xgboost.predict(X_train)))
print('Testing Accuracy: ', accuracy_score(y_test, xgboost.predict(X_test)))
print('Whole Dataset: ', accuracy_score(finaldf['HomeTeamWin'],xgboost.predict(finaldf.loc[:, X_train.columns])))

-- Random Forest -- 
Training Accuracy:  0.9623430962343096
Testing Accuracy:  0.7625
Whole Dataset:  0.5297805642633229


-- Logistic Regression -- 
Training Accuracy:  0.702928870292887
Testing Accuracy:  0.7625
Whole Dataset:  0.6990595611285266


-- K Nearest Neighbors -- 
Training Accuracy:  0.7573221757322176
Testing Accuracy:  0.6875
Whole Dataset:  0.6865203761755486


-- SVC -- 
Training Accuracy:  0.8200836820083682
Testing Accuracy:  0.7125
Whole Dataset:  0.5297805642633229


-- XGBoost --
Training Accuracy:  0.8702928870292888
Testing Accuracy:  0.6875
Whole Dataset:  0.5235109717868338


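As seen, the models vary in their predictions. Testing accuracy never goes above 80%, and the whole-dataset accuracy is lower still for most models. That last figure should be read with caution: the whole dataset is just the training and testing rows combined, so its accuracy should fall between the two, yet Random Forest, SVC and XGBoost all land near 53%. The likely cause is that the whole-dataset prediction is run on `finaldf`, which was never scaled, while the models were fit on the scaled `X`. Even so, the testing accuracy is encouraging, since the Overwatch League is in such a constant state of flux that anything above 50% is considered good. Overfitting is not ruled out, though: Logistic Regression scores about the same on training and testing data, but Random Forest drops from 96% on training to 76% on testing.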
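
The whole-dataset figures can be sanity-checked with plain arithmetic, because the whole dataset is the training rows plus the testing rows. The sketch below recomputes what the whole-dataset accuracy would be if the same features were used everywhere. The split sizes of 239 and 80 rows are an assumption, inferred from the printed fractions and the default 25% test split; the function name is made up for this example.

```python
# Printed accuracies from the run above: (training, testing, whole dataset).
reported = {
    "Random Forest": (0.9623430962343096, 0.7625, 0.5297805642633229),
    "Logistic Regression": (0.702928870292887, 0.7625, 0.6990595611285266),
    "K Nearest Neighbors": (0.7573221757322176, 0.6875, 0.6865203761755486),
    "SVC": (0.8200836820083682, 0.7125, 0.5297805642633229),
    "XGBoost": (0.8702928870292888, 0.6875, 0.5235109717868338),
}

# Assumption: 239 training rows and 80 testing rows (319 games in total).
N_TRAIN, N_TEST = 239, 80

def implied_whole_accuracy(train_acc, test_acc):
    """Accuracy over train + test rows, rebuilt from the two partial accuracies."""
    correct = round(train_acc * N_TRAIN) + round(test_acc * N_TEST)
    return correct / (N_TRAIN + N_TEST)

for name, (train_acc, test_acc, whole_acc) in reported.items():
    implied = implied_whole_accuracy(train_acc, test_acc)
    print(f"{name}: implied {implied:.3f}, reported {whole_acc:.3f}")
```

Under that assumption every reported whole-dataset number is lower than the implied one, which is consistent with the whole-dataset prediction receiving differently scaled inputs than the models were trained on.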

Ideally, with more relevant features, the accuracy can go up. Furthermore, in previous seasons the league was run on a stage-by-stage basis, meaning the game patch changed every 7 weeks. This made it possible for a team to suddenly do much better, since the overall way to play changed drastically. However, in 2019, while the game patch will continue to change, there will no longer be stages. As such, momentum shifts may be limited, which would in turn better standardize our data and allow better accuracy.

In [20]:
#created a function that would choose any two teams from the overwatch league and determine the winner. 
def test_predict(away, home):
    
    #turn id back into name
    def convert_prediction(prediction):
        if prediction[0] == 1:
            #Home Won
            return ID_TO_NAME.get(testrow[1])
        if prediction[0] == 0:
            #Away Won
            return ID_TO_NAME.get(testrow[0])


    #create test data series
    newcol = ["away", "home"]
    
    #enter names of teams to get
    testrow = pd.Series([away, home], index=newcol)

    #get all stats for each team
    awayrow = get_team_stats(testrow[0])
    homerow = get_team_stats(testrow[1])

    #convert columns to proper team placement
    awayrow = awayrow.rename(lambda x: "Away " + x)
    homerow = homerow.rename(lambda x: "Home " + x)

    #turn name into id
    for name, team in ID_TO_NAME.items():
        if team == testrow[0]:
            testrow[0] = name
        if team == testrow[1]:
            testrow[1] = name
    testrow = pd.concat([testrow, awayrow, homerow])
    testrow = testrow[~testrow.index.str.contains("Rank")]

    #predictions
    rfcprediction = rfc.predict([testrow])
    lrprediction = lr.predict([testrow])
    knnprediction = knn.predict([testrow])
    svprediction = sv.predict([testrow])
    
    print("Random Forest Prediction: ", convert_prediction(rfcprediction))
    print(" ")
    print("Logistic Regression Prediction: ", convert_prediction(lrprediction))
    print(" ")
    print("K Nearest Neighbors Prediction: ", convert_prediction(knnprediction))
    print(" ")
    print("SVC Prediction: ", convert_prediction(svprediction))
    print(" ")


#test_predict prints its results and returns nothing, so it is called directly
print("Shock (Away) vs Titans (Home)")
print("===============================")
test_predict("San Francisco Shock", "Vancouver Titans")

print("Titans (Away) vs Shock (Home)")
print("===============================")
test_predict("Vancouver Titans", "San Francisco Shock")


Shock (Away) vs Titans (Home)
Random Forest Prediction:  San Francisco Shock
 
Logistic Regression Prediction:  Vancouver Titans
 
K Nearest Neighbors Prediction:  San Francisco Shock
 
SVC Prediction:  Vancouver Titans
 
None
Titans (Away) vs Shock (Home)
Random Forest Prediction:  Vancouver Titans
 
Logistic Regression Prediction:  San Francisco Shock
 
K Nearest Neighbors Prediction:  San Francisco Shock
 
SVC Prediction:  San Francisco Shock
 
None


One thing worth mentioning is how much being the home team affects the predicted winner. For teams far apart in the standings it makes no difference, but for teams close to each other in the standings the predicted winner often depends on who is home and who is away. The feature importances further down do not prove this hypothesis, but the general testing below supports it.
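
The Shock and Titans runs above already give a small check of this. The sketch below copies those eight printed predictions into a dictionary and flags, per model, whether the predicted winner changed when the two teams swapped home and away. The names `predictions` and `venue_dependent` are made up for this example.

```python
SHOCK, TITANS = "San Francisco Shock", "Vancouver Titans"

# predictions[model][(away, home)] = predicted winner, copied from the output above.
predictions = {
    "Random Forest": {(SHOCK, TITANS): SHOCK, (TITANS, SHOCK): TITANS},
    "Logistic Regression": {(SHOCK, TITANS): TITANS, (TITANS, SHOCK): SHOCK},
    "K Nearest Neighbors": {(SHOCK, TITANS): SHOCK, (TITANS, SHOCK): SHOCK},
    "SVC": {(SHOCK, TITANS): TITANS, (TITANS, SHOCK): SHOCK},
}

def venue_dependent(model_preds, team_a, team_b):
    """True when the predicted winner changes once the teams swap home and away."""
    return model_preds[(team_a, team_b)] != model_preds[(team_b, team_a)]

for model, model_preds in predictions.items():
    print(model, "changes with venue:", venue_dependent(model_preds, SHOCK, TITANS))
```

For this pair, only K Nearest Neighbors picks the same team both times. Logistic Regression and SVC pick the home team in both runs, while Random Forest picks the away team in both.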

In [17]:
teams = ["Atlanta Reign",
    "Boston Uprising",
    "Chengdu Hunters",
    "Dallas Fuel",
    "Florida Mayhem",
    "Guangzhou Charge",
    "Hangzhou Spark",
    "Houston Outlaws",
    "London Spitfire",
    "Los Angeles Gladiators",
    "Los Angeles Valiant",
    "New York Excelsior",
    "Paris Eternal",
    "Philadelphia Fusion",
    "San Francisco Shock",
    "Seoul Dynasty",
    "Shanghai Dragons",
    "Toronto Defiant",
    "Vancouver Titans",
    "Washington Justice"
        ]

for home in teams:
    for away in teams:
        print(away, "vs", home)
        print("=========================")
        
        test_predict(away, home)

         

Atlanta Reign vs Atlanta Reign
Random Forest Prediction:  Atlanta Reign
 
Logistic Regression Prediction:  Atlanta Reign
 
K Nearest Neighbors Prediction:  Atlanta Reign
 
SVC Prediction:  Atlanta Reign
 
Boston Uprising vs Atlanta Reign
Random Forest Prediction:  Boston Uprising
 
Logistic Regression Prediction:  Atlanta Reign
 
K Nearest Neighbors Prediction:  Atlanta Reign
 
SVC Prediction:  Atlanta Reign
 
Chengdu Hunters vs Atlanta Reign
Random Forest Prediction:  Chengdu Hunters
 
Logistic Regression Prediction:  Atlanta Reign
 
K Nearest Neighbors Prediction:  Chengdu Hunters
 
SVC Prediction:  Atlanta Reign
 
Dallas Fuel vs Atlanta Reign
Random Forest Prediction:  Atlanta Reign
 
Logistic Regression Prediction:  Atlanta Reign
 
K Nearest Neighbors Prediction:  Atlanta Reign
 
SVC Prediction:  Atlanta Reign
 
Florida Mayhem vs Atlanta Reign
Random Forest Prediction:  Florida Mayhem
 
Logistic Regression Prediction:  Atlanta Reign
 
K Nearest Neighbors Prediction:  Atlanta Reign


Random Forest Prediction:  Seoul Dynasty
 
Logistic Regression Prediction:  Seoul Dynasty
 
K Nearest Neighbors Prediction:  Seoul Dynasty
 
SVC Prediction:  Boston Uprising
 
Shanghai Dragons vs Boston Uprising
Random Forest Prediction:  Shanghai Dragons
 
Logistic Regression Prediction:  Shanghai Dragons
 
K Nearest Neighbors Prediction:  Shanghai Dragons
 
SVC Prediction:  Boston Uprising
 
Toronto Defiant vs Boston Uprising
Random Forest Prediction:  Toronto Defiant
 
Logistic Regression Prediction:  Boston Uprising
 
K Nearest Neighbors Prediction:  Boston Uprising
 
SVC Prediction:  Boston Uprising
 
Vancouver Titans vs Boston Uprising
Random Forest Prediction:  Vancouver Titans
 
Logistic Regression Prediction:  Vancouver Titans
 
K Nearest Neighbors Prediction:  Vancouver Titans
 
SVC Prediction:  Boston Uprising
 
Washington Justice vs Boston Uprising
Random Forest Prediction:  Washington Justice
 
Logistic Regression Prediction:  Boston Uprising
 
K Nearest Neighbors Predicti

Random Forest Prediction:  New York Excelsior
 
Logistic Regression Prediction:  New York Excelsior
 
K Nearest Neighbors Prediction:  New York Excelsior
 
SVC Prediction:  Dallas Fuel
 
Paris Eternal vs Dallas Fuel
Random Forest Prediction:  Paris Eternal
 
Logistic Regression Prediction:  Dallas Fuel
 
K Nearest Neighbors Prediction:  Paris Eternal
 
SVC Prediction:  Dallas Fuel
 
Philadelphia Fusion vs Dallas Fuel
Random Forest Prediction:  Philadelphia Fusion
 
Logistic Regression Prediction:  Philadelphia Fusion
 
K Nearest Neighbors Prediction:  Philadelphia Fusion
 
SVC Prediction:  Dallas Fuel
 
San Francisco Shock vs Dallas Fuel
Random Forest Prediction:  San Francisco Shock
 
Logistic Regression Prediction:  San Francisco Shock
 
K Nearest Neighbors Prediction:  San Francisco Shock
 
SVC Prediction:  Dallas Fuel
 
Seoul Dynasty vs Dallas Fuel
Random Forest Prediction:  Seoul Dynasty
 
Logistic Regression Prediction:  Seoul Dynasty
 
K Nearest Neighbors Prediction:  Seoul Dyna

Random Forest Prediction:  Houston Outlaws
 
Logistic Regression Prediction:  Guangzhou Charge
 
K Nearest Neighbors Prediction:  Guangzhou Charge
 
SVC Prediction:  Guangzhou Charge
 
London Spitfire vs Guangzhou Charge
Random Forest Prediction:  London Spitfire
 
Logistic Regression Prediction:  Guangzhou Charge
 
K Nearest Neighbors Prediction:  London Spitfire
 
SVC Prediction:  Guangzhou Charge
 
Los Angeles Gladiators vs Guangzhou Charge
Random Forest Prediction:  Los Angeles Gladiators
 
Logistic Regression Prediction:  Guangzhou Charge
 
K Nearest Neighbors Prediction:  Los Angeles Gladiators
 
SVC Prediction:  Guangzhou Charge
 
Los Angeles Valiant vs Guangzhou Charge
Random Forest Prediction:  Los Angeles Valiant
 
Logistic Regression Prediction:  Guangzhou Charge
 
K Nearest Neighbors Prediction:  Los Angeles Valiant
 
SVC Prediction:  Guangzhou Charge
 
New York Excelsior vs Guangzhou Charge
Random Forest Prediction:  New York Excelsior
 
Logistic Regression Prediction:  Ne

Random Forest Prediction:  Dallas Fuel
 
Logistic Regression Prediction:  Dallas Fuel
 
K Nearest Neighbors Prediction:  Houston Outlaws
 
SVC Prediction:  Houston Outlaws
 
Florida Mayhem vs Houston Outlaws
Random Forest Prediction:  Florida Mayhem
 
Logistic Regression Prediction:  Houston Outlaws
 
K Nearest Neighbors Prediction:  Florida Mayhem
 
SVC Prediction:  Houston Outlaws
 
Guangzhou Charge vs Houston Outlaws
Random Forest Prediction:  Guangzhou Charge
 
Logistic Regression Prediction:  Guangzhou Charge
 
K Nearest Neighbors Prediction:  Guangzhou Charge
 
SVC Prediction:  Houston Outlaws
 
Hangzhou Spark vs Houston Outlaws
Random Forest Prediction:  Hangzhou Spark
 
Logistic Regression Prediction:  Hangzhou Spark
 
K Nearest Neighbors Prediction:  Hangzhou Spark
 
SVC Prediction:  Houston Outlaws
 
Houston Outlaws vs Houston Outlaws
Random Forest Prediction:  Houston Outlaws
 
Logistic Regression Prediction:  Houston Outlaws
 
K Nearest Neighbors Prediction:  Houston Outlaw

Random Forest Prediction:  Washington Justice
 
Logistic Regression Prediction:  London Spitfire
 
K Nearest Neighbors Prediction:  London Spitfire
 
SVC Prediction:  London Spitfire
 
Atlanta Reign vs Los Angeles Gladiators
Random Forest Prediction:  Atlanta Reign
 
Logistic Regression Prediction:  Los Angeles Gladiators
 
K Nearest Neighbors Prediction:  Atlanta Reign
 
SVC Prediction:  Los Angeles Gladiators
 
Boston Uprising vs Los Angeles Gladiators
Random Forest Prediction:  Boston Uprising
 
Logistic Regression Prediction:  Los Angeles Gladiators
 
K Nearest Neighbors Prediction:  Los Angeles Gladiators
 
SVC Prediction:  Los Angeles Gladiators
 
Chengdu Hunters vs Los Angeles Gladiators
Random Forest Prediction:  Chengdu Hunters
 
Logistic Regression Prediction:  Los Angeles Gladiators
 
K Nearest Neighbors Prediction:  Los Angeles Gladiators
 
SVC Prediction:  Los Angeles Gladiators
 
Dallas Fuel vs Los Angeles Gladiators
Random Forest Prediction:  Dallas Fuel
 
Logistic Regre

Random Forest Prediction:  San Francisco Shock
 
Logistic Regression Prediction:  San Francisco Shock
 
K Nearest Neighbors Prediction:  San Francisco Shock
 
SVC Prediction:  Los Angeles Valiant
 
Seoul Dynasty vs Los Angeles Valiant
Random Forest Prediction:  Seoul Dynasty
 
Logistic Regression Prediction:  Seoul Dynasty
 
K Nearest Neighbors Prediction:  Los Angeles Valiant
 
SVC Prediction:  Los Angeles Valiant
 
Shanghai Dragons vs Los Angeles Valiant
Random Forest Prediction:  Shanghai Dragons
 
Logistic Regression Prediction:  Shanghai Dragons
 
K Nearest Neighbors Prediction:  Los Angeles Valiant
 
SVC Prediction:  Los Angeles Valiant
 
Toronto Defiant vs Los Angeles Valiant
Random Forest Prediction:  Toronto Defiant
 
Logistic Regression Prediction:  Los Angeles Valiant
 
K Nearest Neighbors Prediction:  Los Angeles Valiant
 
SVC Prediction:  Los Angeles Valiant
 
Vancouver Titans vs Los Angeles Valiant
Random Forest Prediction:  Vancouver Titans
 
Logistic Regression Predicti

Random Forest Prediction:  Los Angeles Valiant
 
Logistic Regression Prediction:  Paris Eternal
 
K Nearest Neighbors Prediction:  Los Angeles Valiant
 
SVC Prediction:  Paris Eternal
 
New York Excelsior vs Paris Eternal
Random Forest Prediction:  New York Excelsior
 
Logistic Regression Prediction:  New York Excelsior
 
K Nearest Neighbors Prediction:  New York Excelsior
 
SVC Prediction:  Paris Eternal
 
Paris Eternal vs Paris Eternal
Random Forest Prediction:  Paris Eternal
 
Logistic Regression Prediction:  Paris Eternal
 
K Nearest Neighbors Prediction:  Paris Eternal
 
SVC Prediction:  Paris Eternal
 
Philadelphia Fusion vs Paris Eternal
Random Forest Prediction:  Philadelphia Fusion
 
Logistic Regression Prediction:  Paris Eternal
 
K Nearest Neighbors Prediction:  Paris Eternal
 
SVC Prediction:  Paris Eternal
 
San Francisco Shock vs Paris Eternal
Random Forest Prediction:  San Francisco Shock
 
Logistic Regression Prediction:  San Francisco Shock
 
K Nearest Neighbors Predic

Random Forest Prediction:  San Francisco Shock
 
Logistic Regression Prediction:  San Francisco Shock
 
K Nearest Neighbors Prediction:  San Francisco Shock
 
SVC Prediction:  San Francisco Shock
 
Hangzhou Spark vs San Francisco Shock
Random Forest Prediction:  San Francisco Shock
 
Logistic Regression Prediction:  San Francisco Shock
 
K Nearest Neighbors Prediction:  San Francisco Shock
 
SVC Prediction:  San Francisco Shock
 
Houston Outlaws vs San Francisco Shock
Random Forest Prediction:  San Francisco Shock
 
Logistic Regression Prediction:  San Francisco Shock
 
K Nearest Neighbors Prediction:  San Francisco Shock
 
SVC Prediction:  San Francisco Shock
 
London Spitfire vs San Francisco Shock
Random Forest Prediction:  San Francisco Shock
 
Logistic Regression Prediction:  San Francisco Shock
 
K Nearest Neighbors Prediction:  San Francisco Shock
 
SVC Prediction:  San Francisco Shock
 
Los Angeles Gladiators vs San Francisco Shock
Random Forest Prediction:  San Francisco Shock

Random Forest Prediction:  Boston Uprising
 
Logistic Regression Prediction:  Shanghai Dragons
 
K Nearest Neighbors Prediction:  Shanghai Dragons
 
SVC Prediction:  Shanghai Dragons
 
Chengdu Hunters vs Shanghai Dragons
Random Forest Prediction:  Chengdu Hunters
 
Logistic Regression Prediction:  Chengdu Hunters
 
K Nearest Neighbors Prediction:  Chengdu Hunters
 
SVC Prediction:  Shanghai Dragons
 
Dallas Fuel vs Shanghai Dragons
Random Forest Prediction:  Dallas Fuel
 
Logistic Regression Prediction:  Shanghai Dragons
 
K Nearest Neighbors Prediction:  Shanghai Dragons
 
SVC Prediction:  Shanghai Dragons
 
Florida Mayhem vs Shanghai Dragons
Random Forest Prediction:  Florida Mayhem
 
Logistic Regression Prediction:  Shanghai Dragons
 
K Nearest Neighbors Prediction:  Shanghai Dragons
 
SVC Prediction:  Shanghai Dragons
 
Guangzhou Charge vs Shanghai Dragons
Random Forest Prediction:  Guangzhou Charge
 
Logistic Regression Prediction:  Guangzhou Charge
 
K Nearest Neighbors Predictio

Random Forest Prediction:  Seoul Dynasty
 
Logistic Regression Prediction:  Seoul Dynasty
 
K Nearest Neighbors Prediction:  Seoul Dynasty
 
SVC Prediction:  Toronto Defiant
 
Shanghai Dragons vs Toronto Defiant
Random Forest Prediction:  Shanghai Dragons
 
Logistic Regression Prediction:  Shanghai Dragons
 
K Nearest Neighbors Prediction:  Shanghai Dragons
 
SVC Prediction:  Toronto Defiant
 
Toronto Defiant vs Toronto Defiant
Random Forest Prediction:  Toronto Defiant
 
Logistic Regression Prediction:  Toronto Defiant
 
K Nearest Neighbors Prediction:  Toronto Defiant
 
SVC Prediction:  Toronto Defiant
 
Vancouver Titans vs Toronto Defiant
Random Forest Prediction:  Vancouver Titans
 
Logistic Regression Prediction:  Vancouver Titans
 
K Nearest Neighbors Prediction:  Vancouver Titans
 
SVC Prediction:  Toronto Defiant
 
Washington Justice vs Toronto Defiant
Random Forest Prediction:  Washington Justice
 
Logistic Regression Prediction:  Washington Justice
 
K Nearest Neighbors Predi

Random Forest Prediction:  Los Angeles Valiant
 
Logistic Regression Prediction:  Los Angeles Valiant
 
K Nearest Neighbors Prediction:  Los Angeles Valiant
 
SVC Prediction:  Washington Justice
 
New York Excelsior vs Washington Justice
Random Forest Prediction:  New York Excelsior
 
Logistic Regression Prediction:  New York Excelsior
 
K Nearest Neighbors Prediction:  New York Excelsior
 
SVC Prediction:  Washington Justice
 
Paris Eternal vs Washington Justice
Random Forest Prediction:  Paris Eternal
 
Logistic Regression Prediction:  Paris Eternal
 
K Nearest Neighbors Prediction:  Paris Eternal
 
SVC Prediction:  Washington Justice
 
Philadelphia Fusion vs Washington Justice
Random Forest Prediction:  Philadelphia Fusion
 
Logistic Regression Prediction:  Philadelphia Fusion
 
K Nearest Neighbors Prediction:  Washington Justice
 
SVC Prediction:  Washington Justice
 
San Francisco Shock vs Washington Justice
Random Forest Prediction:  San Francisco Shock
 
Logistic Regression Pred

In [18]:
#listing features for Random Forest
pd.DataFrame(list(zip(rfc.feature_importances_, X_train.columns)), columns = ['Feature Importance','Feature']
            ).sort_values('Feature Importance',ascending = False)

Unnamed: 0,Feature Importance,Feature
5,0.027537,Away Total Control Wins
59,0.025415,Home Average Hybrid True Win %
20,0.025058,Away Average Control Points Lost
22,0.023834,Away Average Control True Win %
26,0.023609,Away Average Hybrid Points Differential
21,0.021822,Away Average Control Points Differential
40,0.021433,Home Total Hybrid Wins
48,0.021207,Home Average Assault Points Differential
19,0.021063,Away Average Control Points Earned
23,0.020560,Away Average Control Map Potential %
