# How post-game LoL stats affect the outcome

### Description

[League of Legends](https://en.wikipedia.org/wiki/League_of_Legends) is a multiplayer online battle arena (MOBA) video game in which two teams of five players compete. A team wins by destroying the enemy "nexus" located in the enemy base. During a game, each player earns experience points (exp) and gold, which make their character stronger and help their team defeat the enemy.

In this project, I will classify League of Legends games as either `win` or `loss` based on a player's end-of-game statistics, such as number of kills/deaths, gold earned, and damage dealt, and explore how these statistics affect the outcome of the game. In particular, I will look at my own ranked solo/duo games under the summoner name `Mirinda`.

I will obtain my data through the [RiotAPI](https://developer.riotgames.com/apis). I will pull a list of matches and extract one particular player's statistics from each match.

To explore how different post-game statistics affect the outcome of the game, I will train several classification models on different sets of features. These classification models are:

- Perceptron
- Adaline SGD
- Logistic regression
- Decision tree
- Random forest


For each classifier and each set of features, I will train 10 models, one on each of 10 stratified 70/30 training/testing splits (`random_state` 1 through 10). I will then test each trained model on all 10 testing sets, record the number of misclassified examples and the accuracy, and compare the average accuracy of each classifier across the different sets of features.
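In this procedure, the testing set of one split is drawn from the same games as the training set of every other split, so when a model is tested on the other nine testing sets, a large share of those games are ones it was trained on. The minimal NumPy sketch below estimates that overlap for 907 games and a 30% test fraction. It uses a simplified, unstratified permutation split (the hypothetical `random_split` helper) as a stand-in for scikit-learn's `train_test_split`, so the exact indices differ from the notebook's splits.

```python
import numpy as np

N_GAMES = 907        # number of matches in the dataset
TEST_FRACTION = 0.3  # 70/30 split, as in split_data()


def random_split(n, test_fraction, seed):
    """Return (train_idx, test_idx) from a seeded random permutation.

    Simplified stand-in for train_test_split: no stratification.
    """
    rng = np.random.RandomState(seed)
    perm = rng.permutation(n)
    n_test = int(np.ceil(test_fraction * n))
    return perm[n_test:], perm[:n_test]


# one split per random state, mirroring random_state 1 through 10
splits = {seed: random_split(N_GAMES, TEST_FRACTION, seed) for seed in range(1, 11)}


def overlap(train_seed, test_seed):
    """Fraction of the test games of split `test_seed` that split `train_seed` trained on."""
    train_idx, _ = splits[train_seed]
    _, test_idx = splits[test_seed]
    return np.isin(test_idx, train_idx).mean()


print(overlap(1, 1))  # a split's own test set never overlaps its training set
print(overlap(1, 2))  # two different splits: roughly 0.7
```

Scoring each model only on the testing set of its own split avoids this overlap. Models that can fit their training data closely, such as decision trees grown without a depth limit, are the most likely to look better than they really are under the cross-split scheme.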

In [1]:
# imports
import numpy as np
import pandas as pd

### The data

I obtained my data through the RiotAPI using a Python script with the `riotwatcher` package. I pulled a list of 907 matches, iterated over each match's `participant` entries to find `Mirinda`, and wrote that player's game results to a CSV file.
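The collection script itself is not included here. As a rough sketch of the extraction step only, assuming the match-v5 JSON layout (`metadata.matchId`, `info.participants`, and a `summonerName` field on each participant; newer responses identify players by Riot ID instead), picking out one player's row and writing it to CSV could look like this. Fetching the matches with `riotwatcher` is left out, and `STAT_KEYS`, `extract_player_row`, `write_rows`, and the toy match are all hypothetical.

```python
import csv
import io

# Hypothetical subset of stat keys; the notebook's CSV has 54 statistics plus `win` and `matchID`.
STAT_KEYS = ["kills", "deaths", "assists", "goldEarned", "visionScore"]


def extract_player_row(match, summoner_name, stat_keys=STAT_KEYS):
    """Return one CSV row (as a dict) for `summoner_name`, or None if they are not in the match.

    Assumes a match-v5 style dict: match["metadata"]["matchId"] and
    match["info"]["participants"], each participant carrying a "summonerName".
    """
    for participant in match["info"]["participants"]:
        if participant.get("summonerName") == summoner_name:
            row = {key: participant.get(key) for key in stat_keys}
            row["win"] = participant.get("win")
            row["matchID"] = match["metadata"]["matchId"]
            return row
    return None


def write_rows(rows, fileobj, stat_keys=STAT_KEYS):
    """Write rows as CSV with a header, skipping matches where the player was absent."""
    writer = csv.DictWriter(fileobj, fieldnames=list(stat_keys) + ["win", "matchID"])
    writer.writeheader()
    for row in rows:
        if row is not None:
            writer.writerow(row)


# Usage on a made-up, heavily trimmed match (not real data)
toy_match = {
    "metadata": {"matchId": "NA1_0000000000"},
    "info": {"participants": [
        {"summonerName": "SomeoneElse", "kills": 1, "deaths": 5, "assists": 2,
         "goldEarned": 7000, "visionScore": 10, "win": False},
        {"summonerName": "Mirinda", "kills": 8, "deaths": 2, "assists": 11,
         "goldEarned": 12500, "visionScore": 25, "win": True},
    ]},
}
buf = io.StringIO()
write_rows([extract_player_row(toy_match, "Mirinda")], buf)
print(buf.getvalue())
```

When writing to a real file instead of `io.StringIO`, the `csv` module documentation recommends opening it with `newline=''`.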

Each match in the CSV has 54 player statistics, plus the `win` outcome and the `matchID` (see `df.columns`). The 54 statistics will be used as candidate features.

In [2]:
# read in data
df = pd.read_csv('game-data.csv', encoding='utf-8')
# all columns: the 54 candidate features plus `win` and `matchID`
df.columns

Index(['assists', 'bountyLevel', 'champExperience', 'champLevel',
       'consumablesPurchased', 'damageDealtToBuildings',
       'damageDealtToObjectives', 'damageDealtToTurrets',
       'damageSelfMitigated', 'deaths', 'detectorWardsPlaced', 'doubleKills',
       'goldEarned', 'goldSpent', 'killingSprees', 'kills',
       'largestCriticalStrike', 'largestKillingSpree', 'largestMultiKill',
       'longestTimeSpentLiving', 'magicDamageDealt',
       'magicDamageDealtToChampions', 'magicDamageTaken',
       'neutralMinionsKilled', 'pentaKills', 'physicalDamageDealt',
       'physicalDamageDealtToChampions', 'physicalDamageTaken', 'quadraKills',
       'sightWardsBoughtInGame', 'spell1Casts', 'spell2Casts', 'spell3Casts',
       'spell4Casts', 'timeCCingOthers', 'timePlayed', 'totalDamageDealt',
       'totalDamageDealtToChampions', 'totalDamageShieldedOnTeammates',
       'totalDamageTaken', 'totalHeal', 'totalHealsOnTeammates',
       'totalMinionsKilled', 'totalTimeCCDealt', 'totalTim

### Feature selection

In [3]:
# features: all
X_all = df.iloc[:, :54]
X_all = np.array(X_all)

# features: player combat -- offensive
# 'largestCriticalStrike', 'magicDamageDealt', 'magicDamageDealtToChampions', 'physicalDamageDealt',
# 'physicalDamageDealtToChampions', 'totalDamageDealt', 'totalDamageDealtToChampions',
# 'trueDamageDealt', 'trueDamageDealtToChampions'
X_pcombatOff = df.iloc[:, [16, 20, 21, 25, 26, 36, 37, 47, 48]]
X_pcombatOff = np.array(X_pcombatOff)

# features: player combat -- defensive
# 'damageSelfMitigated', 'magicDamageTaken', 'physicalDamageTaken', 'totalDamageShieldedOnTeammates', 'totalDamageTaken', 
# 'totalHeal', 'totalHealsOnTeammates', 'trueDamageTaken'
X_pcombatDef = df.iloc[:, [8, 22, 27, 38, 39, 40, 41, 49]]
X_pcombatDef = np.array(X_pcombatDef)

# features: player combat -- offensive AND defensive
X_pcombat = df.iloc[:, [16, 20, 21, 25, 26, 36, 37, 47, 48, 8, 22, 27, 38, 39, 40, 41, 49]]
X_pcombat = np.array(X_pcombat)

# features: spell casts
# 'spell1Casts', 'spell2Casts', 'spell3Casts','spell4Casts'
X_spells = df.iloc[:, [30, 31, 32, 33]]
X_spells = np.array(X_spells)

# features: KDA, kill streaks
# 'assists', 'bountyLevel', 'deaths', 'doubleKills', 'killingSprees', 'kills',
# 'largestKillingSpree', 'largestMultiKill', 'pentaKills', 'quadraKills', 'tripleKills',
X_kda = df.iloc[:, [0, 1, 9, 11, 14, 15, 17, 18, 24, 28, 46]]
X_kda = np.array(X_kda)

# features: vision
# 'detectorWardsPlaced', 'sightWardsBoughtInGame', 'visionScore', 'visionWardsBoughtInGame', 'wardsKilled', 'wardsPlaced'
X_vision = df.iloc[:, [10, 29, 50, 51, 52, 53]]
X_vision = np.array(X_vision)

# list of different sets of features
features_list = [X_all, X_pcombatOff, X_pcombatDef, X_pcombat, X_spells, X_kda, X_vision]

# classes
y = df.iloc[:, 54].values
# True = 1, False = 0
y = np.where(y, 1, 0)
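Selecting columns by position works only as long as `game-data.csv` keeps the same column order. A hedged alternative is to select them by name. The sketch below reuses the column names listed in the comments above and assumes `win` and `matchID` are the only non-feature columns; `FEATURE_SETS` and `build_feature_sets` are hypothetical names, and the toy DataFrame stands in for the real data.

```python
import numpy as np
import pandas as pd

# Column names copied from the comments in the feature-selection cell.
FEATURE_SETS = {
    "pcombatOff": ["largestCriticalStrike", "magicDamageDealt", "magicDamageDealtToChampions",
                   "physicalDamageDealt", "physicalDamageDealtToChampions", "totalDamageDealt",
                   "totalDamageDealtToChampions", "trueDamageDealt", "trueDamageDealtToChampions"],
    "pcombatDef": ["damageSelfMitigated", "magicDamageTaken", "physicalDamageTaken",
                   "totalDamageShieldedOnTeammates", "totalDamageTaken", "totalHeal",
                   "totalHealsOnTeammates", "trueDamageTaken"],
    "spells": ["spell1Casts", "spell2Casts", "spell3Casts", "spell4Casts"],
    "kda": ["assists", "bountyLevel", "deaths", "doubleKills", "killingSprees", "kills",
            "largestKillingSpree", "largestMultiKill", "pentaKills", "quadraKills", "tripleKills"],
    "vision": ["detectorWardsPlaced", "sightWardsBoughtInGame", "visionScore",
               "visionWardsBoughtInGame", "wardsKilled", "wardsPlaced"],
}
# offensive and defensive combat features together, in the same order as X_pcombat
FEATURE_SETS["pcombat"] = FEATURE_SETS["pcombatOff"] + FEATURE_SETS["pcombatDef"]


def build_feature_sets(df, label="win", drop=("matchID",)):
    """Return ({set name: feature array}, y) with y encoded as 1 = win, 0 = loss."""
    # "all": every column except the label and the ID column(s)
    sets = {"all": df.drop(columns=[label, *drop], errors="ignore").to_numpy()}
    for name, cols in FEATURE_SETS.items():
        sets[name] = df[cols].to_numpy()
    y = np.where(df[label].astype(bool), 1, 0)
    return sets, y


# Usage on a toy DataFrame with the same column names (made-up values)
toy_cols = sorted({c for cols in FEATURE_SETS.values() for c in cols})
toy_df = pd.DataFrame(np.arange(3 * len(toy_cols)).reshape(3, len(toy_cols)), columns=toy_cols)
toy_df["win"] = [True, False, True]
toy_df["matchID"] = ["a", "b", "c"]
toy_sets, toy_y = build_feature_sets(toy_df)
print({name: X.shape for name, X in toy_sets.items()}, toy_y)
```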

### Split data into training/testing sets

In [4]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

"""
split_data(): split data into n sets of training and testing sets with a 70/30 split respectively, and standardize training set
    input:
        randomStart - start value
        randomEnd - end value
        features - set of features as an array
        targets - classes
    output:
        dictionary of data splits:
            dataSplits[random_state] = [X_train_std, X_test_std, y_train, y_test]
"""

def split_data(randomStart, randomEnd, features, targets):

    # dataSplits[random_state] = [X_train_std, X_test_std, y_train, y_test]
    dataSplits = {}
    
    for i in range(randomStart, randomEnd):
        # Split data into testing and training 
        X_train, X_test, y_train, y_test = train_test_split(features, targets, test_size=0.3, random_state=i, stratify=targets)

        # Standardize training set
        sc = StandardScaler()
        sc.fit(X_train)
        X_train_std = sc.transform(X_train)
        X_test_std = sc.transform(X_test)

        # add split to dictionary
        dataSplits[i] = [X_train_std, X_test_std, y_train, y_test]
        
    return dataSplits
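The standardization inside `split_data()` can be written out by hand. This NumPy sketch (a hypothetical `standardize` helper, not used by the notebook) makes explicit that both sets are scaled with the training set's mean and population standard deviation, which is what `StandardScaler` computes; like `StandardScaler`, it leaves constant columns unscaled instead of dividing by zero. The toy arrays are made up.

```python
import numpy as np


def standardize(X_train, X_test):
    """Scale both sets using statistics computed from the training set only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)            # population std (ddof=0), as StandardScaler uses
    std = np.where(std == 0, 1.0, std)   # constant columns: scale of 1, no division by zero
    return (X_train - mean) / std, (X_test - mean) / std


# Usage on made-up values; the second column is constant
X_train = np.array([[1.0, 5.0], [3.0, 5.0], [5.0, 5.0]])
X_test = np.array([[3.0, 5.0], [7.0, 5.0]])
X_train_std, X_test_std = standardize(X_train, X_test)
print(X_train_std)
print(X_test_std)
```

Because the test rows never contribute to `mean` or `std`, no information about the test games leaks into the scaling.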

In [5]:
# Split data into training and testing
X_all_split = split_data(1, 11, X_all, y)
X_pcombatOff_split = split_data(1, 11, X_pcombatOff, y)
X_pcombatDef_split = split_data(1, 11, X_pcombatDef, y)
X_pcombat_split = split_data(1, 11, X_pcombat, y)
X_spells_split = split_data(1, 11, X_spells, y)
X_kda_split = split_data(1, 11, X_kda, y)
X_vision_split = split_data(1, 11, X_vision, y)

### Train Perceptron models
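scikit-learn's `Perceptron` shares its implementation with `SGDClassifier` and is documented as equivalent to `SGDClassifier(loss="perceptron", eta0=1, learning_rate="constant", penalty=None)`; this notebook passes `eta0=0.1` instead. With the two classes encoded internally as $y \in \{-1, +1\}$ and net input $z = \mathbf{w}^\top\mathbf{x} + b$, the perceptron loss and the resulting per-example update are:

$$
L(\mathbf{w}, b) = \max\bigl(0,\; -y\,z\bigr), \qquad z = \mathbf{w}^\top\mathbf{x} + b
$$

$$
\text{if } y\,z \le 0: \qquad \mathbf{w} \leftarrow \mathbf{w} + \eta\, y\, \mathbf{x}, \qquad b \leftarrow b + \eta\, y
$$

where $\eta$ is `eta0`. Correctly classified games leave the weights unchanged, so the learned weights depend on the order in which misclassified games are visited, which is one reason the `random_state` passed to each model matters.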



In [6]:
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score

"""
train_perceptrons(): train n different perceptron models and test each n times
    input:
        randomStart - start value
        randomEnd - end value
        dataSplits - dictionary of data splits
    output:
        dictionary of perceptron models:
            ppns[Perceptron model] = [(Misclassified examples, Accuracy)...]
"""

def train_perceptrons(randomStart, randomEnd, dataSplits):
        
    # ppns[Perceptron model] = [(Misclassified examples, Accuracy)...]
    ppns = {}

    for randState in dataSplits:

        # current testing and training split
        currentSplit = dataSplits[randState]

        # list of models for each random state
        results = []

        # Train Perceptron model
        ppn = Perceptron(eta0=0.1, random_state=randState)
        ppn.fit(currentSplit[0], currentSplit[2]) # X_train_std, y_train

        # test Perceptron model x times
        for i in range(randomStart, randomEnd):

            # put results into dict
            y_pred = ppn.predict(dataSplits[i][1]) # X_test_std

            # append tuple (Misclassified examples, Accuracy) ==> (y_test != y_pred).sum(), accuracy_score(y_test, y_pred))
            results.append(((dataSplits[i][3] != y_pred).sum(), accuracy_score(dataSplits[i][3], y_pred)))
            
        # append results to current perceptron dict
        ppns[ppn] = results
        
    return ppns
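The training functions for the other classifiers in this notebook repeat this loop with a different estimator. A hedged refactoring sketch takes a factory that builds an unfitted estimator from a random state; `train_models`, its `make_model` and `own_split_only` parameters, and the `MajorityClass` stand-in estimator are all hypothetical. Setting `own_split_only=True` scores each model only on the test set of the split it was trained on.

```python
import numpy as np


def train_models(make_model, dataSplits, own_split_only=False):
    """Train one model per split and score it on test sets.

    make_model     : callable taking a random state and returning an unfitted
                     estimator with fit() and predict()
    dataSplits     : {random_state: [X_train_std, X_test_std, y_train, y_test]}
    own_split_only : if True, test each model only on its own split's test set;
                     otherwise test on every split, as train_perceptrons() does
    Returns {model: [(misclassified examples, accuracy), ...]}.
    """
    models = {}
    for rand_state, (X_train, _, y_train, _) in dataSplits.items():
        model = make_model(rand_state)
        model.fit(X_train, y_train)
        test_keys = [rand_state] if own_split_only else list(dataSplits)
        results = []
        for key in test_keys:
            X_test, y_test = dataSplits[key][1], dataSplits[key][3]
            y_pred = model.predict(X_test)
            # accuracy as the fraction of correct predictions (same value as accuracy_score)
            results.append(((y_test != y_pred).sum(), np.mean(y_test == y_pred)))
        models[model] = results
    return models


class MajorityClass:
    """Hypothetical stand-in estimator: always predicts the most common training label."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, X, y):
        self.label_ = int(np.bincount(y).argmax())
        return self

    def predict(self, X):
        return np.full(len(X), self.label_)


# Usage on two tiny made-up splits
toy_splits = {
    1: [np.zeros((4, 1)), np.zeros((2, 1)), np.array([1, 1, 1, 0]), np.array([1, 0])],
    2: [np.zeros((4, 1)), np.zeros((2, 1)), np.array([0, 0, 0, 1]), np.array([1, 1])],
}
all_tests = train_models(MajorityClass, toy_splits)
own_test = train_models(MajorityClass, toy_splits, own_split_only=True)
print(list(all_tests.values()))
print(list(own_test.values()))
```

With the notebook's data, `train_models(lambda rs: Perceptron(eta0=0.1, random_state=rs), X_all_split)` should give the same results as `train_perceptrons(1, 11, X_all_split)`, and the other classifiers only need a different factory.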

In [7]:
"""
print_avgs(): print average misclassified examples and average accuracy
    input:
        model_dict - dictionary of models:
            model_dict[model] = [(Misclassified examples, Accuracy)...]
    output:
        array containing average misclassified examples and average accuracy:
            [average misclassified examples, average accuracy]
"""
def print_avgs(model_dict):
    
    total = len(model_dict)*len(model_dict[list(model_dict.keys())[0]])
    sums = []
    
    for model in model_dict:
        results = model_dict[model]
        sums.append([sum(tup) for tup in zip(*results)])

    final_list = [sum(value) for value in zip(*sums)]
    final_list = [x / total for x in final_list]
    
    print('Avg misclassified examples: %d' % final_list[0])
    print('Avg accuracy: %.3f' % final_list[1])
    
    return final_list

### Train Perceptrons

In [8]:
# train Perceptrons on all features
ppns_all = train_perceptrons(1, 11, X_all_split)
ppns_all_results = print_avgs(ppns_all)

Avg misclassified examples: 25
Avg accuracy: 0.906


In [9]:
# train Perceptrons on player combat -- offensive
ppns_pcombatOff = train_perceptrons(1, 11, X_pcombatOff_split)
ppns_pcombatOff_results = print_avgs(ppns_pcombatOff)

Avg misclassified examples: 125
Avg accuracy: 0.538


In [10]:
# train Perceptrons on player combat -- defensive
ppns_pcombatDef = train_perceptrons(1, 11, X_pcombatDef_split)
ppns_pcombatDef_results = print_avgs(ppns_pcombatDef)

Avg misclassified examples: 94
Avg accuracy: 0.652


In [11]:
# train Perceptrons on player combat -- both
ppns_pcombat = train_perceptrons(1, 11, X_pcombat_split)
ppns_pcombat_results = print_avgs(ppns_pcombat)

Avg misclassified examples: 78
Avg accuracy: 0.712


In [12]:
# train Perceptrons on spell casts
ppns_spells = train_perceptrons(1, 11, X_spells_split)
ppns_spells_results = print_avgs(ppns_spells)

Avg misclassified examples: 131
Avg accuracy: 0.517


In [13]:
# train Perceptrons on kda/killing sprees
ppns_kda = train_perceptrons(1, 11, X_kda_split)
ppns_kda_results = print_avgs(ppns_kda)

Avg misclassified examples: 51
Avg accuracy: 0.812


In [14]:
# train Perceptrons on vision
ppns_vision = train_perceptrons(1, 11, X_vision_split)
ppns_vision_results = print_avgs(ppns_vision)

Avg misclassified examples: 118
Avg accuracy: 0.566


### Perceptron results

In [15]:
features_about =  ["All features", "Player combat (offensive)", "Player combat (defensive)",
                  "Player combat", "Spells casted", "KDA/kill streaks", "Vision"]
ppn_results_list = [ppns_all_results, ppns_pcombatOff_results, ppns_pcombatDef_results, 
                  ppns_pcombat_results, ppns_spells_results, ppns_kda_results, ppns_vision_results]

ppn_results = pd.DataFrame(ppn_results_list, index = features_about, columns = ["Avg misclassified examples", "Avg accuracy"])
ppn_results = ppn_results.sort_values(by=["Avg accuracy"], ascending=False)
ppn_results

| Feature set | Avg misclassified examples | Avg accuracy |
|---|---|---|
| All features | 25.54 | 0.906447 |
| KDA/kill streaks | 51.45 | 0.811538 |
| Player combat | 78.53 | 0.712344 |
| Player combat (defensive) | 94.95 | 0.652198 |
| Vision | 118.48 | 0.566007 |
| Player combat (offensive) | 125.99 | 0.538498 |
| Spells casted | 131.79 | 0.517253 |
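The two columns in these result tables are linked by the size of each testing set. With 907 games and `test_size=0.3`, `train_test_split` puts $\lceil 0.3 \times 907 \rceil = 273$ games in each testing set, so

$$
\text{Avg accuracy} = 1 - \frac{\text{Avg misclassified examples}}{273}, \qquad \text{e.g. } 1 - \frac{25.54}{273} \approx 0.9064 .
$$

The printed averages (for example `25` rather than `25.54`) are truncated, not rounded, by the `%d` format in `print_avgs()`.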


### Train Adaline SGD

In [16]:
class AdalineSGD:
    """ADAptive LInear NEuron classifier.

    Parameters
    ------------
    eta : float
      Learning rate (between 0.0 and 1.0)
    n_iter : int
      Passes over the training dataset.
    shuffle : bool (default: True)
      Shuffles training data every epoch if True to prevent cycles.
    random_state : int
      Random number generator seed for random weight
      initialization.


    Attributes
    -----------
    w_ : 1d-array
      Weights after fitting.
    b_ : Scalar
        Bias unit after fitting.
    losses_ : list
      Mean squared error loss function value averaged over all
      training examples in each epoch.

        
    """
    def __init__(self, eta=0.01, n_iter=10, shuffle=True, random_state=None):
        self.eta = eta
        self.n_iter = n_iter
        self.w_initialized = False
        self.shuffle = shuffle
        self.random_state = random_state
        
    def fit(self, X, y):
        """ Fit training data.

        Parameters
        ----------
        X : {array-like}, shape = [n_examples, n_features]
          Training vectors, where n_examples is the number of examples and
          n_features is the number of features.
        y : array-like, shape = [n_examples]
          Target values.

        Returns
        -------
        self : object

        """
        self._initialize_weights(X.shape[1])
        self.losses_ = []
        for i in range(self.n_iter):
            if self.shuffle:
                X, y = self._shuffle(X, y)
            losses = []
            for xi, target in zip(X, y):
                losses.append(self._update_weights(xi, target))
            avg_loss = np.mean(losses)
            self.losses_.append(avg_loss)
        return self

    def partial_fit(self, X, y):
        """Fit training data without reinitializing the weights"""
        if not self.w_initialized:
            self._initialize_weights(X.shape[1])
        if y.ravel().shape[0] > 1:
            for xi, target in zip(X, y):
                self._update_weights(xi, target)
        else:
            self._update_weights(X, y)
        return self

    def _shuffle(self, X, y):
        """Shuffle training data"""
        r = self.rgen.permutation(len(y))
        return X[r], y[r]
    
    def _initialize_weights(self, m):
        """Initialize weights to small random numbers"""
        self.rgen = np.random.RandomState(self.random_state)
        self.w_ = self.rgen.normal(loc=0.0, scale=0.01, size=m)
        self.b_ = np.float64(0.)  # np.float_ was removed in NumPy 2.0
        self.w_initialized = True
        
    def _update_weights(self, xi, target):
        """Apply Adaline learning rule to update the weights"""
        output = self.activation(self.net_input(xi))
        error = (target - output)
        self.w_ += self.eta * 2.0 * xi * (error)
        self.b_ += self.eta * 2.0 * error
        loss = error**2
        return loss
    
    def net_input(self, X):
        """Calculate net input"""
        return np.dot(X, self.w_) + self.b_

    def activation(self, X):
        """Compute linear activation"""
        return X

    def predict(self, X):
        """Return class label after unit step"""
        return np.where(self.activation(self.net_input(X)) >= 0.5, 1, 0)
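The `_update_weights` method above performs one stochastic gradient step on the squared error of a single game. With the linear activation $z = \mathbf{w}^\top\mathbf{x} + b$ and target $y \in \{0, 1\}$:

$$
L = (y - z)^2, \qquad \frac{\partial L}{\partial \mathbf{w}} = -2\,(y - z)\,\mathbf{x}, \qquad \frac{\partial L}{\partial b} = -2\,(y - z)
$$

$$
\mathbf{w} \leftarrow \mathbf{w} + 2\eta\,(y - z)\,\mathbf{x}, \qquad b \leftarrow b + 2\eta\,(y - z)
$$

which corresponds to the lines `self.w_ += self.eta * 2.0 * xi * (error)` and `self.b_ += self.eta * 2.0 * error`. `predict()` then thresholds $z$ at 0.5, halfway between the two class labels.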

In [17]:
"""
train_adalineSGDs(): train n different Adaline SGD models and test each n times
    input:
        randomStart - start value
        randomEnd - end value
        dataSplits - dictionary of data splits
    output:
        dictionary of Adaline SGD models:
            aSGDs[Adaline SGD model] = [(Misclassified examples, Accuracy)...]
"""
def train_adalineSGDs(randomStart, randomEnd, dataSplits):
        
    # aSGDs[Adaline SGD model] = [(Misclassified examples, Accuracy)...]
    aSGDs = {}

    for randState in dataSplits:

        # current testing and training split
        currentSplit = dataSplits[randState]

        # list of models for each random state
        results = []

        # Train Adaline SGD model
        aSGD = AdalineSGD(n_iter=30, eta=0.01, random_state=randState)
        aSGD.fit(currentSplit[0], currentSplit[2]) # X_train_std, y_train

        # test Adaline SGD model x times
        for i in range(randomStart, randomEnd):

            # put results into dict
            y_pred = aSGD.predict(dataSplits[i][1]) # X_test_std

            # append tuple (Misclassified examples, Accuracy) ==> (y_test != y_pred).sum(), accuracy_score(y_test, y_pred))
            results.append(((dataSplits[i][3] != y_pred).sum(), accuracy_score(dataSplits[i][3], y_pred)))
            
        # append results to current perceptron dict
        aSGDs[aSGD] = results
        
    return aSGDs

In [18]:
# train Adaline SGD on all features
aSGDs_all = train_adalineSGDs(1, 11, X_all_split)
aSGDs_all_results = print_avgs(aSGDs_all)

Avg misclassified examples: 134
Avg accuracy: 0.508


In [19]:
# train Adaline SGD on player combat -- offensive
aSGDs_pcombatOff = train_adalineSGDs(1, 11, X_pcombatOff_split)
aSGDs_pcombatOff_results = print_avgs(aSGDs_pcombatOff)

Avg misclassified examples: 146
Avg accuracy: 0.462


In [20]:
# train Adaline SGD on player combat -- defensive
aSGDs_pcombatDef = train_adalineSGDs(1, 11, X_pcombatDef_split)
aSGDs_pcombatDef_results = print_avgs(aSGDs_pcombatDef)

Avg misclassified examples: 78
Avg accuracy: 0.713


In [21]:
# train Adaline SGD on player combat -- both
aSGDs_pcombat = train_adalineSGDs(1, 11, X_pcombat_split)
aSGDs_pcombat_results = print_avgs(aSGDs_pcombat)

Avg misclassified examples: 131
Avg accuracy: 0.518


In [22]:
# train Adaline SGD on spell casts
aSGDs_spells = train_adalineSGDs(1, 11, X_spells_split)
aSGDs_spells_results = print_avgs(aSGDs_spells)

Avg misclassified examples: 127
Avg accuracy: 0.533


In [23]:
# train Adaline SGD on kda/killing sprees
aSGDs_kda = train_adalineSGDs(1, 11, X_kda_split)
aSGDs_kda_results = print_avgs(aSGDs_kda)

Avg misclassified examples: 145
Avg accuracy: 0.467


In [24]:
# train Adaline SGD on vision
aSGDs_vision = train_adalineSGDs(1, 11, X_vision_split)
aSGDs_vision_results = print_avgs(aSGDs_vision)

Avg misclassified examples: 104
Avg accuracy: 0.618


### Adaline SGD results

In [25]:
aSGD_results_list = [aSGDs_all_results, aSGDs_pcombatOff_results, aSGDs_pcombatDef_results, 
                  aSGDs_pcombat_results, aSGDs_spells_results, aSGDs_kda_results, aSGDs_vision_results]

aSGD_results = pd.DataFrame(aSGD_results_list, index = features_about, columns = ["Avg misclassified examples", "Avg accuracy"])
aSGD_results = aSGD_results.sort_values(by=["Avg accuracy"], ascending=False)
aSGD_results

| Feature set | Avg misclassified examples | Avg accuracy |
|---|---|---|
| Player combat (defensive) | 78.27 | 0.713297 |
| Vision | 104.32 | 0.617875 |
| Spells casted | 127.52 | 0.532894 |
| Player combat | 131.69 | 0.517619 |
| All features | 134.42 | 0.507619 |
| KDA/kill streaks | 145.49 | 0.46707 |
| Player combat (offensive) | 146.8 | 0.462271 |


### Train logistic regression

In [36]:
from sklearn.linear_model import LogisticRegression

"""
train_lrGDs(): train n different logistic regression GD models and test each n times
    input:
        randomStart - start value
        randomEnd - end value
        dataSplits - dictionary of data splits
    output:
        dictionary of logistic regression GD models:
            lrGDs[logistic regression GD model] = [(Misclassified examples, Accuracy)...]
"""
def train_lrGDs(randomStart, randomEnd, dataSplits):
        
    # lrGDs[logistic regression GD model] = [(Misclassified examples, Accuracy)...]
    lrGDs = {}

    for randState in dataSplits:

        # current testing and training split
        currentSplit = dataSplits[randState]

        # list of models for each random state
        results = []

        # Train logistic regression GD model
        lrGD = LogisticRegression(random_state=randState)
        lrGD.fit(currentSplit[0], currentSplit[2]) # X_train_std, y_train

        # test logistic regression GD model x times
        for i in range(randomStart, randomEnd):

            # put results into dict
            y_pred = lrGD.predict(dataSplits[i][1]) # X_test_std

            # append tuple (Misclassified examples, Accuracy) ==> (y_test != y_pred).sum(), accuracy_score(y_test, y_pred))
            results.append(((dataSplits[i][3] != y_pred).sum(), accuracy_score(dataSplits[i][3], y_pred)))
            
        # append results to current perceptron dict
        lrGDs[lrGD] = results
        
    return lrGDs

In [37]:
# train logistic regression GD on all features
lrGDs_all = train_lrGDs(1, 11, X_all_split)
lrGDs_all_results = print_avgs(lrGDs_all)

Avg misclassified examples: 16
Avg accuracy: 0.941


In [38]:
# train logistic regression GD on player combat -- offensive
lrGDs_pcombatOff = train_lrGDs(1, 11, X_pcombatOff_split)
lrGDs_pcombatOff_results = print_avgs(lrGDs_pcombatOff)

Avg misclassified examples: 115
Avg accuracy: 0.578


In [39]:
# train logistic regression GD on player combat -- defensive
lrGDs_pcombatDef = train_lrGDs(1, 11, X_pcombatDef_split)
lrGDs_pcombatDef_results = print_avgs(lrGDs_pcombatDef)

Avg misclassified examples: 64
Avg accuracy: 0.762


In [40]:
# train logistic regression GD on player combat -- both
lrGDs_pcombat = train_lrGDs(1, 11, X_pcombat_split)
lrGDs_pcombat_results = print_avgs(lrGDs_pcombat)

Avg misclassified examples: 59
Avg accuracy: 0.782


In [41]:
# train logistic regression GD on spell casts
lrGDs_spells = train_lrGDs(1, 11, X_spells_split)
lrGDs_spells_results = print_avgs(lrGDs_spells)

Avg misclassified examples: 117
Avg accuracy: 0.570


In [42]:
# train logistic regression GD on kda/killing sprees
lrGDs_kda = train_lrGDs(1, 11, X_kda_split)
lrGDs_kda_results = print_avgs(lrGDs_kda)

Avg misclassified examples: 36
Avg accuracy: 0.866


In [43]:
# train logistic regression GD on vision
lrGDs_vision = train_lrGDs(1, 11, X_vision_split)
lrGDs_vision_results = print_avgs(lrGDs_vision)

Avg misclassified examples: 95
Avg accuracy: 0.649


### Logistic regression results

In [44]:
lrGD_results_list = [lrGDs_all_results, lrGDs_pcombatOff_results, lrGDs_pcombatDef_results, 
                  lrGDs_pcombat_results, lrGDs_spells_results, lrGDs_kda_results, lrGDs_vision_results]

lrGD_results = pd.DataFrame(lrGD_results_list, index = features_about, columns = ["Avg misclassified examples", "Avg accuracy"])
lrGD_results = lrGD_results.sort_values(by=["Avg accuracy"], ascending=False)
lrGD_results

| Feature set | Avg misclassified examples | Avg accuracy |
|---|---|---|
| All features | 16.09 | 0.941062 |
| KDA/kill streaks | 36.55 | 0.866117 |
| Player combat | 59.44 | 0.782271 |
| Player combat (defensive) | 64.87 | 0.762381 |
| Vision | 95.8 | 0.649084 |
| Player combat (offensive) | 115.3 | 0.577656 |
| Spells casted | 117.45 | 0.56978 |
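Accuracy alone does not show which statistics drive a prediction. Because the inputs are standardized, the magnitudes of a fitted `LogisticRegression` model's `coef_` can be compared as a rough indication of influence: each coefficient is the change in the log-odds of a win per standard deviation of that feature, holding the others fixed. A small hypothetical helper to list the largest ones, shown on made-up coefficients:

```python
import numpy as np


def top_coefficients(coef, feature_names, k=5):
    """Return the k (name, coefficient) pairs with the largest absolute value.

    coef is a 1-D array such as LogisticRegression.coef_[0] for a binary model.
    """
    coef = np.asarray(coef, dtype=float)
    order = np.argsort(-np.abs(coef))[:k]   # indices sorted by |coefficient|, largest first
    return [(feature_names[i], float(coef[i])) for i in order]


# Usage on made-up coefficients (not fitted values)
print(top_coefficients([0.1, -2.0, 0.5], ["assists", "deaths", "kills"], k=2))
```

With the notebook's models, something like `top_coefficients(list(lrGDs_kda)[0].coef_[0], kda_columns)` would list the KDA statistics with the largest weights, where `kda_columns` is a hypothetical list of the KDA column names in the order used for `X_kda`. Correlated features (for example `kills` and `killingSprees`) can trade weight against each other, so this is a rough indication rather than a ranking of causes.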


### Train decision tree

In [45]:
from sklearn.tree import DecisionTreeClassifier

"""
train_dts(): train n different decision tree models and test each n times
    input:
        randomStart - start value
        randomEnd - end value
        dataSplits - dictionary of data splits
    output:
        dictionary of decision tree models:
            dts[decision tree model] = [(Misclassified examples, Accuracy)...]
"""
def train_dts(randomStart, randomEnd, dataSplits):
        
    # dts[decision tree model] = [(Misclassified examples, Accuracy)...]
    dts = {}

    for randState in dataSplits:

        # current testing and training split
        currentSplit = dataSplits[randState]

        # list of models for each random state
        results = []

        # Train decision tree model
        dt = DecisionTreeClassifier(random_state=randState)
        dt.fit(currentSplit[0], currentSplit[2]) # X_train_std, y_train

        # test decision tree model x times
        for i in range(randomStart, randomEnd):

            # put results into dict
            y_pred = dt.predict(dataSplits[i][1]) # X_test_std

            # append tuple (Misclassified examples, Accuracy) ==> (y_test != y_pred).sum(), accuracy_score(y_test, y_pred))
            results.append(((dataSplits[i][3] != y_pred).sum(), accuracy_score(dataSplits[i][3], y_pred)))
            
        # append results to current perceptron dict
        dts[dt] = results
        
    return dts
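`DecisionTreeClassifier` uses Gini impurity as its default split criterion: at each node it picks the feature and threshold that most reduce the weighted impurity of the two child nodes. A minimal NumPy sketch of that quantity, with hypothetical `gini` and `split_impurity` helpers and made-up toy values (not taken from `game-data.csv`):

```python
import numpy as np


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of an array of class labels (0 for an empty node)."""
    labels = np.asarray(labels)
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def split_impurity(feature, labels, threshold):
    """Weighted Gini impurity of splitting on feature <= threshold vs. feature > threshold."""
    feature, labels = np.asarray(feature), np.asarray(labels)
    left, right = labels[feature <= threshold], labels[feature > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Toy example: deaths per game and whether the game was won (made-up values)
deaths = np.array([1, 2, 3, 7, 8, 9])
win = np.array([1, 1, 1, 0, 0, 1])
print(gini(win))                       # impurity before splitting
print(split_impurity(deaths, win, 5))  # impurity after splitting on deaths <= 5
```

With the default `max_depth=None`, the tree keeps splitting until its leaves are pure or cannot be split further, so it can fit its training games almost perfectly; parameters such as `max_depth` or `min_samples_leaf` limit that growth. Standardizing the inputs does not change which splits a tree can make, since thresholds only depend on the ordering of each feature's values.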

In [47]:
# train decision tree on all features
dts_all = train_dts(1, 11, X_all_split)
dts_all_results = print_avgs(dts_all)

Avg misclassified examples: 23
Avg accuracy: 0.916


In [48]:
# train decision tree on player combat -- offensive
dts_pcombatOff = train_dts(1, 11, X_pcombatOff_split)
dts_pcombatOff_results = print_avgs(dts_pcombatOff)

Avg misclassified examples: 88
Avg accuracy: 0.674


In [49]:
# train decision tree on player combat -- defensive
dts_pcombatDef = train_dts(1, 11, X_pcombatDef_split)
dts_pcombatDef_results = print_avgs(dts_pcombatDef)

Avg misclassified examples: 56
Avg accuracy: 0.794


In [50]:
# train decision tree on player combat -- both
dts_pcombat = train_dts(1, 11, X_pcombat_split)
dts_pcombat_results = print_avgs(dts_pcombat)

Avg misclassified examples: 55
Avg accuracy: 0.796


In [51]:
# train decision tree on spell casts
dts_spells = train_dts(1, 11, X_spells_split)
dts_spells_results = print_avgs(dts_spells)

Avg misclassified examples: 79
Avg accuracy: 0.710


In [52]:
# train decision tree on kda/killing sprees
dts_kda = train_dts(1, 11, X_kda_split)
dts_kda_results = print_avgs(dts_kda)

Avg misclassified examples: 26
Avg accuracy: 0.903


In [53]:
# train decision tree on vision
dts_vision = train_dts(1, 11, X_vision_split)
dts_vision_results = print_avgs(dts_vision)

Avg misclassified examples: 55
Avg accuracy: 0.796


### Decision tree results

In [54]:
dt_results_list = [dts_all_results, dts_pcombatOff_results, dts_pcombatDef_results, 
                  dts_pcombat_results, dts_spells_results, dts_kda_results, dts_vision_results]

dt_results = pd.DataFrame(dt_results_list, index = features_about, columns = ["Avg misclassified examples", "Avg accuracy"])
dt_results = dt_results.sort_values(by=["Avg accuracy"], ascending=False)
dt_results

| Feature set | Avg misclassified examples | Avg accuracy |
|---|---|---|
| All features | 23.0 | 0.915751 |
| KDA/kill streaks | 26.41 | 0.90326 |
| Vision | 55.6 | 0.796337 |
| Player combat | 55.67 | 0.796081 |
| Player combat (defensive) | 56.2 | 0.794139 |
| Spells casted | 79.21 | 0.709853 |
| Player combat (offensive) | 88.95 | 0.674176 |


### Train random forest

In [55]:
from sklearn.ensemble import RandomForestClassifier

"""
train_rfs(): train n different random forest models and test each n times
    input:
        randomStart - start value
        randomEnd - end value
        dataSplits - dictionary of data splits
    output:
        dictionary of random forest models:
            rfs[random forest model] = [(Misclassified examples, Accuracy)...]
"""
def train_rfs(randomStart, randomEnd, dataSplits):
        
    # rfs[random forest model] = [(Misclassified examples, Accuracy)...]
    rfs = {}

    for randState in dataSplits:

        # current testing and training split
        currentSplit = dataSplits[randState]

        # list of models for each random state
        results = []

        # Train random forest model
        rf = RandomForestClassifier(random_state=randState)
        rf.fit(currentSplit[0], currentSplit[2]) # X_train_std, y_train

        # test random forest model x times
        for i in range(randomStart, randomEnd):

            # put results into dict
            y_pred = rf.predict(dataSplits[i][1]) # X_test_std

            # append tuple (Misclassified examples, Accuracy) ==> (y_test != y_pred).sum(), accuracy_score(y_test, y_pred))
            results.append(((dataSplits[i][3] != y_pred).sum(), accuracy_score(dataSplits[i][3], y_pred)))
            
        # append results to current perceptron dict
        rfs[rf] = results
        
    return rfs
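Fitted random forests expose `feature_importances_`, the impurity-based importance of each input averaged over the forest's trees. As a hedged sketch, averaging it over the 10 forests returned by `train_rfs()` gives a rough ranking of the statistics the forests relied on most. `mean_importances` is a hypothetical helper, and the `SimpleNamespace` objects below are stand-ins with made-up importances.

```python
from types import SimpleNamespace

import numpy as np


def mean_importances(models, feature_names, k=5):
    """Average feature_importances_ over fitted tree ensembles and return the top k.

    models        : iterable of fitted estimators exposing feature_importances_
    feature_names : column names in the same order as the training features
    """
    importances = np.mean([m.feature_importances_ for m in models], axis=0)
    order = np.argsort(importances)[::-1][:k]   # largest average importance first
    return [(feature_names[i], float(importances[i])) for i in order]


# Usage with stand-in "forests" (made-up importances)
toy_forests = [SimpleNamespace(feature_importances_=np.array([0.1, 0.6, 0.3])),
               SimpleNamespace(feature_importances_=np.array([0.3, 0.4, 0.3]))]
top = mean_importances(toy_forests, ["deaths", "kills", "visionScore"], k=2)
print(top)
```

For the all-features models, `mean_importances(rfs_all.keys(), list(df.columns[:54]))` would apply it to the notebook's forests, since `X_all` is built from the first 54 columns. scikit-learn's documentation warns that impurity-based importances can favour features with many distinct values and points to `sklearn.inspection.permutation_importance` as an alternative.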

In [56]:
# train random forest on all features
rfs_all = train_rfs(1, 11, X_all_split)
rfs_all_results = print_avgs(rfs_all)

Avg misclassified examples: 12
Avg accuracy: 0.955


In [57]:
# train random forest on player combat -- offensive
rfs_pcombatOff = train_rfs(1, 11, X_pcombatOff_split)
rfs_pcombatOff_results = print_avgs(rfs_pcombatOff)

Avg misclassified examples: 61
Avg accuracy: 0.775


In [58]:
# train random forest on player combat -- defensive
rfs_pcombatDef = train_rfs(1, 11, X_pcombatDef_split)
rfs_pcombatDef_results = print_avgs(rfs_pcombatDef)

Avg misclassified examples: 35
Avg accuracy: 0.871


In [59]:
# train random forest on player combat -- both
rfs_pcombat = train_rfs(1, 11, X_pcombat_split)
rfs_pcombat_results = print_avgs(rfs_pcombat)

Avg misclassified examples: 30
Avg accuracy: 0.890


In [60]:
# train random forest on spell casts
rfs_spells = train_rfs(1, 11, X_spells_split)
rfs_spells_results = print_avgs(rfs_spells)

Avg misclassified examples: 63
Avg accuracy: 0.767


In [61]:
# train random forest on kda/killing sprees
rfs_kda = train_rfs(1, 11, X_kda_split)
rfs_kda_results = print_avgs(rfs_kda)

Avg misclassified examples: 23
Avg accuracy: 0.914


In [62]:
# train random forest on vision
rfs_vision = train_rfs(1, 11, X_vision_split)
rfs_vision_results = print_avgs(rfs_vision)

Avg misclassified examples: 44
Avg accuracy: 0.836


### Random forest results

In [63]:
rf_results_list = [rfs_all_results, rfs_pcombatOff_results, rfs_pcombatDef_results, 
                  rfs_pcombat_results, rfs_spells_results, rfs_kda_results, rfs_vision_results]

rf_results = pd.DataFrame(rf_results_list, index = features_about, columns = ["Avg misclassified examples", "Avg accuracy"])
rf_results = rf_results.sort_values(by=["Avg accuracy"], ascending=False)
rf_results

| Feature set | Avg misclassified examples | Avg accuracy |
|---|---|---|
| All features | 12.2 | 0.955311 |
| KDA/kill streaks | 23.59 | 0.91359 |
| Player combat | 30.03 | 0.89 |
| Player combat (defensive) | 35.16 | 0.871209 |
| Vision | 44.86 | 0.835678 |
| Player combat (offensive) | 61.53 | 0.774615 |
| Spells casted | 63.69 | 0.766703 |
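Each of the five result tables ranks the feature sets separately. A hedged sketch that lines up the average accuracies side by side, aligning rows by feature-set name (`compare_models` is a hypothetical helper, and the toy tables contain made-up values):

```python
import pandas as pd


def compare_models(results_by_model, column="Avg accuracy"):
    """Combine per-model result tables into one (feature set x model) table.

    results_by_model : {model name: DataFrame indexed by feature set}
    Rows are ordered by mean accuracy across models, highest first.
    """
    # building a DataFrame from a dict of Series aligns the rows by index label
    table = pd.DataFrame({name: res[column] for name, res in results_by_model.items()})
    order = table.mean(axis=1).sort_values(ascending=False).index
    return table.loc[order]


# Usage on two small made-up result tables, deliberately in different row orders
toy_a = pd.DataFrame({"Avg accuracy": [0.9, 0.5]}, index=["All features", "Vision"])
toy_b = pd.DataFrame({"Avg accuracy": [0.6, 0.8]}, index=["Vision", "All features"])
comparison = compare_models({"Model A": toy_a, "Model B": toy_b})
print(comparison)
```

With the notebook's tables, the call would be `compare_models({"Perceptron": ppn_results, "Adaline SGD": aSGD_results, "Logistic regression": lrGD_results, "Decision tree": dt_results, "Random forest": rf_results})`.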
