### Agenda

* Data Introduction
    - About the Game
    - About the Data
    - Feature Engineering/Selection
    - The Multicollinearity problem

*  t-SNE
    - observing bins vs avg win probability of features used for modeling
    
* Modeling GBC & MLP
    - GBC model specifications
    - MLP model specifications
    - Hyperparameter Tuning (MLP)

* Model Evaluation
    - ROC-AUC
    - Classification Report
    - Confusion Matrix  

* Practical Uses (It's just a game LOL)

* Final notes: future work needed to make practical uses feasible, plus future work ideas

### Data Introduction

## About the game
League of Legends (LoL) is a multiplayer online battle arena video game developed and published by Riot Games. In League of Legends, players assume the role of a "champion" with unique abilities and battle against a team of other player- or computer-controlled champions. The goal is usually to destroy the opposing team's "Nexus", a structure that lies at the heart of a base protected by defensive structures, although other distinct game modes exist as well with varying objectives, rules, and maps. Each League of Legends match is discrete, with all champions starting off relatively weak but increasing in strength by accumulating items and experience over the course of the game.

https://www.youtube.com/watch?v=mwERJ6qJPuc

## About the Data

Each row is a snapshot of one ranked game at the ten-minute mark: a set of scalar values, each reflecting an aspect of team performance, plus a flag for whether the blue team went on to win.

Target

Our "bluewins" variable is the target and it is a binary feature. The overall goal of this analysis is to extract informative indicators that may lead the blue team to win or lose a match.

In [None]:
# Importing Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import metrics
from sklearn.metrics import roc_curve, auc, confusion_matrix, classification_report
from sklearn.cluster import KMeans
from sklearn import preprocessing
from sklearn import decomposition
from sklearn.preprocessing import normalize, StandardScaler
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif, chi2
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.manifold import TSNE
import scipy
import time
import tensorflow as tf
from tensorflow import keras
import datetime, os, io
from google.colab import files
%matplotlib inline
%load_ext tensorboard

### The Data 

Distribution Chart
We have mixed data types here, discrete and continuous. There are noticeable skew and kurtosis aspects of these distributions that vary from what I'd expect to see with a normal distribution, but overall it looks pretty good to me. Not much data cleaning was needed, which was nice, but this dataset presented its own problems, which I will touch on later.

### Feature Engineering

For feature engineering, I created 15 different metrics and I also experimented with team acquisition ratios of different buffs based on what is obtainable within the first ten minutes of a game.

In [None]:
# Uploading data
uploaded = files.upload()
df_og = pd.read_csv(io.BytesIO(uploaded['high_diamond_ranked_10min.csv']))
df_og.columns = map(lambda x:x.lower(), df_og.columns)
df_og['gameid'] = df_og['gameid'].astype(str)
numeric_columns = df_og.select_dtypes(['int64', 'float64']).columns
FILL_LIST = []
for cols in df_og[:]:
    if cols in numeric_columns:
        FILL_LIST.append(cols)
plt.figure(figsize=(35, 95))
plt.subplots_adjust(hspace=1, wspace=1)
for i, col in enumerate(FILL_LIST):
    try:
        plt.subplot(len(FILL_LIST), 7, i+1)
        sns.distplot(df_og[col], kde=False)
        plt.title(col)
    except TypeError:
        pass
plt.tight_layout()

In [None]:
# Just in case: work on a copy so the original frame stays untouched
df1 = df_og.copy()

# Feature Engineering comparative team performance variables
df1['bcspermin_diff'] = df1['bluecspermin'] - df1['redcspermin']
df1['btotexp_diff'] = df1['bluetotalexperience'] - df1['redtotalexperience']
df1['bavglvl_diff'] = df1['blueavglevel'] - df1['redavglevel']
df1['bwardsplaced_diff'] = df1['bluewardsplaced'] - df1['redwardsplaced']
df1['bwardsdestroyed_diff'] = df1['bluewardsdestroyed'] - df1['redwardsdestroyed']
df1['btowerdeaths_diff'] = df1['bluetowersdestroyed'] - df1['redtowersdestroyed']
df1['bkills_diff'] = df1['bluekills'] - df1['redkills']
df1['bdeaths_diff'] = df1['bluedeaths'] - df1['reddeaths']
df1['bgold_per_min_diff'] = df1['bluegoldpermin'] - df1['redgoldpermin']
df1['belite_diff'] = df1['blueelitemonsters'] - df1['redelitemonsters']
df1['bdrag_diff'] = df1['bluedragons'] - df1['reddragons']
df1['bheralds_diff'] = df1['blueheralds'] - df1['redheralds']
df1['blaneminions_diff'] = df1['bluetotalminionskilled'] - df1['redtotalminionskilled']
df1['bjgmionions_diff'] = df1['bluetotaljungleminionskilled'] - df1['redtotaljungleminionskilled']
df1['bteamtotminionsdiff'] = (df1['blueelitemonsters'] + df1['bluedragons'] + df1['blueheralds'] + df1['bluetotalminionskilled'] + df1['bluetotaljungleminionskilled']) - (df1['redelitemonsters'] + df1['reddragons'] + df1['redheralds'] + df1['redtotalminionskilled'] + df1['redtotaljungleminionskilled'])

df_ana = df1.loc[:, ['gameid', 'bluewins', 'bluegolddiff', 'blueexperiencediff', 'bkills_diff',
                     'bavglvl_diff', 'bluegoldpermin', 'bluetotalexperience',
                     'blueavglevel', 'bteamtotminionsdiff', 'bluekills',
                     'bcspermin_diff', 'blaneminions_diff', 'bluetotalgold']]

### Features Selected 

I used scikit-learn's SelectKBest together with the ANOVA F-test to help me pinpoint the best variables to proceed with.

'bluewins' - binary target

'bluegolddiff' - the difference in accumulated gold

'blueexperiencediff' - the difference in accumulated experience

'bluegoldpermin' - gold accumulated per minute

'bluetotalexperience' - total experience gained

'bteamtotminionsdiff' - blue team total minions killed (including elite, rift, and jungle minions) compared to the red team's total

'blaneminions_diff' - blue team total lane minion difference

'bluetotalgold' - blue total gold acquired
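
For readers who want to see the shape of that selection step, here is a minimal sketch. It assumes the ANOVA F-test is applied through `f_classif`, and it runs on a small synthetic frame with made-up column names rather than the real engineered features, so the scores it prints say nothing about the actual dataset.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the engineered frame (hypothetical columns and values).
rng = np.random.default_rng(0)
n = 500
target = rng.integers(0, 2, size=n)
demo = pd.DataFrame({
    "informative_a": target * 2.0 + rng.normal(size=n),  # mean shifts with the target
    "informative_b": target * 1.0 + rng.normal(size=n),
    "noise_a": rng.normal(size=n),
    "noise_b": rng.normal(size=n),
})

# Score every column with the ANOVA F-test and keep the k highest-scoring ones.
selector = SelectKBest(score_func=f_classif, k=2).fit(demo, target)
scores = pd.Series(selector.scores_, index=demo.columns).sort_values(ascending=False)
selected = list(demo.columns[selector.get_support()])

print(scores)
print(selected)
```

On the real data, `demo` would be the numeric feature columns and `target` the `bluewins` column; the value of `k` is a judgment call.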


In [None]:
# Uploading data
uploaded = files.upload()
df = pd.read_csv(io.BytesIO(uploaded['data_chi3_anova9.csv']))
df = df.drop(['Unnamed: 0'], axis = 1)
df.columns = map(lambda x:x.lower(), df.columns)
df['gameid'] = df['gameid'].astype(str)
numeric_columns = df.select_dtypes(['int64', 'float64']).columns
FILL_LIST = []
for cols in df[:]:
    if cols in numeric_columns:
        FILL_LIST.append(cols)
plt.figure(figsize=(35, 95))
plt.subplots_adjust(hspace=1, wspace=1)
for i, col in enumerate(FILL_LIST):
    try:
        plt.subplot(len(FILL_LIST), 7, i+1)
        sns.distplot(df[col], kde=False)
        plt.title(col)
    except TypeError:
        pass
plt.tight_layout()

### Novel Data Problem

Every dataset has its quirks and this one is no different. Interestingly enough, multicollinearity persists among the features that correlate with the target (taking a correlation of at least 0.4 as the cutoff). It is for this reason that Principal Component Analysis piped into K-Means could not produce definitive separability between games where the blue team won or lost the match.
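
One way to make the multicollinearity concrete is to list the feature pairs whose absolute correlation clears the 0.4 cutoff. The sketch below is an illustration only: `correlated_pairs` is a hypothetical helper, and the demo frame is synthetic.

```python
import numpy as np
import pandas as pd

def correlated_pairs(frame, threshold=0.4):
    """Return feature pairs whose absolute Pearson correlation is >= threshold."""
    corr = frame.corr().abs()
    # Keep only the strict upper triangle so each pair appears once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    pairs = upper.stack()
    return pairs[pairs >= threshold].sort_values(ascending=False)

# Synthetic demo: one column is an exact rescaling of another.
rng = np.random.default_rng(1)
gold = rng.normal(size=300)
demo = pd.DataFrame({
    "gold_diff": gold,
    "gold_per_min_diff": gold / 10.0,  # perfectly correlated with gold_diff
    "unrelated": rng.normal(size=300),
})

pairs = correlated_pairs(demo)
print(pairs)
```

Run against the numeric columns of `df_ana`, a helper like this would show which of the selected features carry overlapping information.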


In [None]:
# Drop the string id column so corr() only sees numeric features
corr = df_ana.drop(columns='gameid').corr()
mask = np.zeros_like(corr, dtype=bool)  # np.bool was removed in NumPy 1.24
mask[np.triu_indices_from(mask)] = True
plt.figure(dpi=155)
sns.heatmap(abs(corr), mask=mask, vmin=0.4, cmap='gist_heat_r')
corr['bluewins'].sort_values(ascending=False)

The cell below goes from defining X & y to fitting t-SNE; the plot follows after that.

In [None]:
X = df.iloc[:, 2:]
y = df['bluewins'].values.ravel()
scaler = preprocessing.StandardScaler()
X_stand = scaler.fit_transform(X)
X_stand_df = pd.DataFrame(X_stand, columns=X.columns)
feat_cols = [ X_stand_df.columns[i] for i in range(X_stand_df.shape[1]) ]
data2 = pd.DataFrame(X_stand_df,columns=feat_cols)
data2['y'] = y
data2['label'] = data2['y'].apply(lambda i: str(i))

# For reproducibility of the results
np.random.seed(57)

# random observation selection
rndperm = np.random.permutation(data2.shape[0])

# Number of observations to use during cluster selection
N = 9000

# dataframe obj holding randomly selected data
data_subset = data2.loc[rndperm[:N],:].copy()

# data values obj from dataframe
df_subset = data_subset[feat_cols].values

time_start = time.time()

tsne = TSNE(n_components=2,
            verbose=1,
            n_iter=1000,
            perplexity=30,
            learning_rate=300,
            early_exaggeration=12)

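# Fit on the feature columns only; data_subset also holds 'y' and 'label',
# which would leak the target into the embedding
tsne_results = tsne.fit_transform(df_subset)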

print('t-SNE done! Time elapsed: {} seconds'.format(time.time() - time_start))

data_subset['tsne-2d-one'] = tsne_results[:,0]
data_subset['tsne-2d-two'] = tsne_results[:,1]

### t-SNE

This is where I used t-distributed stochastic neighbor embedding (t-SNE), developed by Laurens van der Maaten and Geoffrey Hinton, to explore and visualize the separability between games where the blue team won or lost. This is the final product of that exploration. t-SNE converts pairwise distances into similarities, using a Gaussian kernel in the high-dimensional space and a Student's t-distribution in the low-dimensional space, and then positions the low-dimensional points so that the two sets of similarities match as closely as possible.

It's easiest to think of every point on this plot as a game, with games whose ten-minute snapshots are most similar clustered together and each point's color representing that game's result. Looking at this two-dimensional plot and considering its four quadrants, we can observe that a loss is likely to occur if team performance in the first ten minutes is more similar to the games in quadrants I & III than to those in II & IV, where it appears you are more likely to win as the blue team. There appears to be very little intermingling of the red and blue points, which is good to see and fun to work through. Seeing is believing, and now that we can observe some sort of boundary, we can take a peek at how each of the selected variables impacts where a team is likely to be at the end of a game.
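
For reference, the standard t-SNE formulation can be written out as follows. The symbols are the usual ones from the method's description, not names used elsewhere in this notebook: $x_i$ are the original points, $y_i$ their low-dimensional positions, $N$ the number of points, and $\sigma_i$ a per-point bandwidth set through the perplexity.

$$
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}
$$

$$
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}},
\qquad
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
$$

The embedding is found by minimizing $C$ with gradient descent. The heavy tails of the t-distribution in $q_{ij}$ are what let dissimilar games sit far apart in the plot.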

In [None]:
plt.figure(figsize=(16,10))
sns.scatterplot(
    x="tsne-2d-one", y="tsne-2d-two",
    hue="y",
    palette=sns.color_palette("hls", 2),
    data=data_subset,
    legend="full",
    alpha=0.3)

In [None]:
# Binning features in new Dataframes for Visualizing
df_bins = df.copy()
df_bins['bluegolddiff_bins'] = pd.qcut(df_bins['bluegolddiff'], q=10)
pw_gd = df_bins.groupby('bluegolddiff_bins')['bluewins'].mean()
pw_gd = pw_gd.reset_index()
pw_gd.columns = ['bluegolddiff_bins', 'wp_bluegolddiff']
pw_gd['Delta'] = pw_gd['wp_bluegolddiff'].shift(-1) - pw_gd['wp_bluegolddiff']

df_bins['blueexperiencediff_bins'] = pd.qcut(df_bins['blueexperiencediff'], q=10)
pw_be = df_bins.groupby('blueexperiencediff_bins')['bluewins'].mean()
pw_be = pw_be.reset_index()
pw_be.columns = ['blueexperiencediff_bins', 'wp_blueexperiencediff']
pw_be['Delta'] = pw_be['wp_blueexperiencediff'].shift(-1) - pw_be['wp_blueexperiencediff']

df_bins['bluegoldpermin_bins'] = pd.qcut(df_bins['bluegoldpermin'], q=10)
pw_gm = df_bins.groupby('bluegoldpermin_bins')['bluewins'].mean()
pw_gm = pw_gm.reset_index()
pw_gm.columns = ['bluegoldpermin_bins', 'wp_bluegoldpermin']
pw_gm['Delta'] = pw_gm['wp_bluegoldpermin'].shift(-1) - pw_gm['wp_bluegoldpermin']

df_bins['bluetotalexperience_bins'] = pd.qcut(df_bins['bluetotalexperience'], q=10)
pw_te = df_bins.groupby('bluetotalexperience_bins')['bluewins'].mean()
pw_te = pw_te.reset_index()
pw_te.columns = ['bluetotalexperience_bins', 'wp_bluetotalexperience']
pw_te['Delta'] = pw_te['wp_bluetotalexperience'].shift(-1) - pw_te['wp_bluetotalexperience']

df_bins['bteamtotminionsdiff_bins'] = pd.qcut(df_bins['bteamtotminionsdiff'], q=10)
pw_tm = df_bins.groupby('bteamtotminionsdiff_bins')['bluewins'].mean()
pw_tm = pw_tm.reset_index()
pw_tm.columns = ['bteamtotminionsdiff_bins', 'wp_bteamtotminionsdiff']
pw_tm['Delta'] = pw_tm['wp_bteamtotminionsdiff'].shift(-1) - pw_tm['wp_bteamtotminionsdiff']

df_bins['blaneminions_diff_bins'] = pd.qcut(df_bins['blaneminions_diff'], q=10)
pw_lm = df_bins.groupby('blaneminions_diff_bins')['bluewins'].mean()
pw_lm = pw_lm.reset_index()
pw_lm.columns = ['blaneminions_diff_bins', 'wp_blaneminions_diff']
pw_lm['Delta'] = pw_lm['wp_blaneminions_diff'].shift(-1) - pw_lm['wp_blaneminions_diff']

df_bins['bluetotalgold_bins'] = pd.qcut(df_bins['bluetotalgold'], q=10)
pw_tg = df_bins.groupby('bluetotalgold_bins')['bluewins'].mean()
pw_tg = pw_tg.reset_index()
pw_tg.columns = ['bluetotalgold_bins', 'wp_bluetotalgold']
pw_tg['Delta'] = pw_tg['wp_bluetotalgold'].shift(-1) - pw_tg['wp_bluetotalgold']

### Binning

What we are going to see next is a series of bar charts, each accompanied by a DataFrame showing the average win percentage for each of the ten bins that feature has been condensed into. As we traverse the bins, we expect each successive bin to reflect better performance and to correspond to a higher average win percentage. I also include a Delta column that makes it easy to see the change in average win percentage from one bin to the next. Keep in mind that if we are looking at a difference metric, the blue team starts the game closer to the middle bins; otherwise, the blue team starts the game at bin zero. What I want to guide your eye towards, and what I was on the lookout for here, is this: where could my team’s effort be spent most effectively up to the ten-minute mark, and how can we detect clear wins or losses? (The option to surrender becomes available at 14 minutes.) So we are looking for the most informative bin changes. With that in mind, let’s proceed.
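
The per-feature tables all follow the same recipe: cut the feature into ten quantile bins, average `bluewins` within each bin, then take the difference to the next bin. A compact sketch of that recipe is below. `win_rate_by_decile` is a hypothetical helper name, and the demo data is synthetic, so the numbers it prints are not results from the real games.

```python
import numpy as np
import pandas as pd

def win_rate_by_decile(frame, feature, target="bluewins", q=10):
    """Mean of `target` per quantile bin of `feature`, plus the change to the next bin."""
    bins = pd.qcut(frame[feature], q=q)
    out = (frame.groupby(bins, observed=False)[target]
                .mean()
                .rename("win_rate")
                .reset_index())
    # Delta on row i is the gain in win rate from moving out of bin i into bin i + 1.
    out["Delta"] = out["win_rate"].shift(-1) - out["win_rate"]
    return out

# Synthetic games where a larger gold lead makes a win more likely.
rng = np.random.default_rng(2)
gold_diff = rng.normal(0, 2000, size=2000)
p_win = 1 / (1 + np.exp(-gold_diff / 1000))
demo = pd.DataFrame({"bluegolddiff": gold_diff,
                     "bluewins": rng.binomial(1, p_win)})

table = win_rate_by_decile(demo, "bluegolddiff")
print(table)
```

The last row's Delta is always empty because there is no higher bin to move into.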


#### Blue Gold Diff

The biggest deltas are realized when leaving bins 0 and 7. So ideally the team would be targeting the eighth bin, starting from the 4th bin, but ending up in the 0th bin pretty much means you’re going to get wrecked. Keep in mind that the bins on the ends tend to have longer tails, but the fact remains that if that’s where you are, you’re not doing well.


In [None]:
plt.figure(figsize=(20,10))
sns.barplot(x='bluegolddiff_bins', y='wp_bluegolddiff', data=pw_gd)
pw_gd

#### Blue experience Diff

The goal is pretty similar here: starting in the fourth bin, try to get to the 9th bin while avoiding the 0th bin.


In [None]:
plt.figure(figsize=(20,10))
sns.barplot(x='blueexperiencediff_bins', y='wp_blueexperiencediff', data=pw_be)
pw_be

#### Blue gold per min

For this feature, it’s best to get out of that first bin, i.e. accumulate more than 1601 gold per min, or else the game becomes a lot harder.


In [None]:
plt.figure(figsize=(20,10))
sns.barplot(x='bluegoldpermin_bins', y='wp_bluegoldpermin', data=pw_gm)
pw_gm

#### Blue total exp

This one is similar to the last. The 0th, 8th, and 9th bins are key determinants of where your team might end up.


In [None]:
plt.figure(figsize=(20,10))
sns.barplot(x='bluetotalexperience_bins', y='wp_bluetotalexperience', data=pw_te)
pw_te

#### Blue team total minions

This one is interesting in that even the 0th bin still leaves the team with a 20% chance of winning the match on average. So it might be a good idea to pick team battles for lanes in a way that minimizes minions lost and puts the team's effort into something more fruitful.


In [None]:
plt.figure(figsize=(20,10))
sns.barplot(x='bteamtotminionsdiff_bins', y='wp_bteamtotminionsdiff', data=pw_tm)
pw_tm

#### Blue lane minion difference

Somewhat the same deal here.


In [None]:
plt.figure(figsize=(20,10))
sns.barplot(x='blaneminions_diff_bins', y='wp_blaneminions_diff', data=pw_lm)
pw_lm

#### Blue total gold

The difference between bins 0 and 1 is the biggest we have observed so far. Just being in bin 1 gives you an average 27-28 percent chance of winning, and there is a 12-point bonus on reaching the last bin too. This is a feature that I would definitely watch all game, and one that is always on display during pro-level games.


In [None]:
plt.figure(figsize=(20,10))
sns.barplot(x='bluetotalgold_bins', y='wp_bluetotalgold', data=pw_tg)
pw_tg

### Modeling 

GBC
I chose the Gradient Boosting Classifier (GBC) somewhat arbitrarily: it’s a preferred model of mine, and it seemed a reasonable fit given the information available and the complexity of the problem. I could have tried other models, but the GBC serves as a benchmark here while the MLP is really the star of the show. Here’s what the hyperparameters are set to; I tuned them with GridSearchCV in a different notebook.


In [None]:
X = df.iloc[:, 2:]
scaler = StandardScaler()
X_stand = scaler.fit_transform(X)
X_df = pd.DataFrame(X_stand, columns=X.columns)
y = df.iloc[:, 1:2].values.ravel()
# Split the standardized features (X_df); the original split passed the unscaled X
X_train, X_test, y_train, y_test = train_test_split(X_df, y, test_size=.2)

In [None]:
# show model hyperparameters
gbc_tuned = GradientBoostingClassifier(learning_rate=0.01,
                                       n_estimators=300,
                                       subsample=0.1,
                                       min_samples_leaf=2,
                                       min_samples_split=2, 
                                       max_depth=4,
                                       max_features=6,
                                       min_impurity_decrease=0.1)

#### GBC Variable Importance Chart

I personally love to see relative feature importance in supervised methods, because that visibility is lost as a cost of doing business in the universe of unsupervised learning.

In [None]:
gbc_tuned.fit(X_train, y_train)
# we are making predictions here
y_preds_train = gbc_tuned.predict(X_train)
y_preds_test = gbc_tuned.predict(X_test)
feature_importance = gbc_tuned.feature_importances_
# make importances relative to max importance
feature_importance = 100.0 * (feature_importance / feature_importance.max())
sorted_idx = np.argsort(feature_importance)
pos = np.arange(sorted_idx.shape[0]) + .5
plt.barh(pos, feature_importance[sorted_idx], align='center')
plt.yticks(pos, X.columns[sorted_idx])
plt.xlabel('Relative Importance')
plt.title('Variable Importance')

#### How Does the MLP work?

The moment I, at least, have been waiting for is here: the star of the show, the MLP! I know the epoch accuracy and loss visualized in this manner look unsavory, but if you focus on the y-axis of the chart you’ll notice the model bounces around quite sporadically within a band of about .03 points.

What I built was a deep network with 2 hidden layers. Each of the 7 variables is fed into its own input node, which allows us to start feeding information forward. These 7 nodes feed the first hidden layer of 70 nodes. The network needs to start with some weights and then iteratively update them to reduce loss. The kernel initializer provides the function used to initialize the weights, which are applied to the information received from the previous layer before the activation function is applied. The same process is then repeated in the second hidden layer, with different hyperparameters that were tuned in a different notebook.
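
To make the feed-forward description concrete, here is a plain NumPy sketch of a 7 → 70 → 70 → 1 network with sigmoid, SELU, and sigmoid activations. The weights are random placeholders rather than trained values, so the outputs are meaningless as predictions; the point is only to show the shapes and the order of operations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def selu(z, alpha=1.6732632423543772, scale=1.0507009873554805):
    # Scaled exponential linear unit with its standard constants.
    return scale * np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

rng = np.random.default_rng(3)
layer_sizes = [7, 70, 70, 1]
# Random placeholder weights; a trained network would hold learned values here.
weights = [rng.normal(0, 0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    h1 = sigmoid(x @ weights[0] + biases[0])      # first hidden layer, 70 nodes
    h2 = selu(h1 @ weights[1] + biases[1])        # second hidden layer, 70 nodes
    return sigmoid(h2 @ weights[2] + biases[2])   # single output in (0, 1)

batch = rng.normal(size=(5, 7))                   # 5 games, 7 standardized features
probs = forward(batch)
n_params = sum(w.size + b.size for w, b in zip(weights, biases))
print(probs.shape, n_params)                      # (5, 1) 5601
```

The parameter count breaks down as 7·70 + 70 for the first hidden layer, 70·70 + 70 for the second, and 70 + 1 for the output node.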

In [None]:
%%time
def build_mlp():
  # input_shape=(7,): each sample is a flat vector of the 7 selected features
  model = keras.Sequential([keras.layers.Dense(70, input_shape=(7,), kernel_initializer='glorot_normal',
                                               activation='sigmoid'),
                            keras.layers.Dense(70, activation='selu', kernel_initializer='normal'),
                            keras.layers.Dense(1, activation='sigmoid', kernel_initializer='he_normal')])
  optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.001, momentum=0.1)
  model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
  return model

logdir9 = os.path.join("logs23", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir9, histogram_freq=1)
  
mlp_model = build_mlp()
mlp_model.fit(X_train, y_train, batch_size=15, epochs=100, verbose=0, callbacks=[tensorboard_callback])

y_mlp = mlp_model.predict(X_test).ravel()
%tensorboard --logdir logs23

#### Hyperparameter Tuning

Here’s the kind of cell that led to the model provided in this report. Most of the parameters specified were chosen with a cell like this, running multiple tests across each parameter to find the best one for each layer manually. Some parameters were grid searched together with other parameters where that made sense to me. The order in which parameters were tested was as follows: optimization algorithm; learning rate and momentum; kernel initializer; activation function; dropout rate (and AlphaDropout for the second layer, because of its synergy with the ‘selu’ activation function used there); batch size and epochs; and finally neurons per layer, excluding the last layer.

After doing that for a while, I got an idea to compose a cell that could help me uncover possible blind spots in the previous way of training by essentially testing more scenarios at once.

In [None]:
## Grid to determine a layer's activation function (here, the output layer)
# KerasClassifier is never imported in this notebook; this import path is an
# assumption that matches older TensorFlow releases (newer setups use scikeras)
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

def create_model(activation='softmax'):
  # input_shape=(7,): each sample is a flat vector of the 7 selected features
  model = keras.Sequential([keras.layers.Dense(70, input_shape=(7,), activation='sigmoid'),
                            keras.layers.Dense(70, activation='selu'),
                            keras.layers.Dense(1, activation=activation)])
  opt = tf.keras.optimizers.Adagrad(learning_rate=0.005, initial_accumulator_value=0.1, epsilon=1e-07)
  model.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])
  return model

# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
# create model
model = KerasClassifier(build_fn=create_model, epochs=100, batch_size=10, verbose=0)

# defining grid search parameters
activation = ["tanh", "sigmoid", 'relu', 'selu', 'softsign', 'softmax']
param_grid = dict(activation=activation)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X_train, y_train)

# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
  print("%f (%f) with: %r" % (mean, stdev, param))

#### Hyperparameter Tuning continued…

Here’s where I landed overall. This didn’t work as well as I had hoped. I had a hard time quantifying the computational limits of my environment and keeping the grid of model parameter options within those limits. To be honest, there’s probably a better way to compose this search, perhaps in a fashion that would provide more consistent results (that is, printing a result before the kernel died) and/or by framing the problem in a way that is solvable given the time constraints.

The last layer receives the output from the second hidden layer and returns an output (for each game the model is used on, this output is a number between 0 and 1) that I used for predictions. I should also note that there was no class imbalance to begin with.
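
A quick way to size a search before launching it is to count the candidate combinations. The sketch below copies the value lists from the grid cell in this section into scikit-learn's `ParameterGrid` and multiplies by the number of cross-validation folds; it only counts fits and says nothing about how long each one takes.

```python
from sklearn.model_selection import ParameterGrid

# Same value lists as the param_grid passed to GridSearchCV in this section.
param_grid = {
    "learning_rate": [0.001, 0.01, 0.05, 0.1],
    "momentum": [0.0, 0.2, 0.4],
    "batch_size": [14, 34, 72],
    "epochs": [200],
    "loss": ["binary_crossentropy"],
    "optimizer": ["RMSprop"],
    "activation": ["tanh", "sigmoid", "relu", "softmax"],
    "weight_constraint": [0, 2, 4, 6],
    "dropout_rate": [0.0, 0.2, 0.4, 0.6, 0.8],
}
cv = 3

n_candidates = len(ParameterGrid(param_grid))   # every combination of the lists above
n_fits = n_candidates * cv                      # one fit per candidate per fold
total_epochs = n_fits * 200                     # each fit trains for 200 epochs
print(n_candidates, n_fits, total_epochs)       # 2880 8640 1728000
```

With 2,880 candidates and 3 folds, the search asks for 8,640 separate model fits, which goes a long way towards explaining a kernel that dies before printing anything. Swapping in a randomized search with a fixed number of iterations is one common way to cap that cost.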


In [None]:
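# STAR GRID CELL
def create_model(learning_rate=0.001, momentum=0.0, optimizer='SGD',
                 loss='binary_crossentropy', init_mode='uniform', neurons=1,
                 activation='sigmoid', dropout_rate=0.0, weight_constraint=0):
  # tf.keras.constraints.MaxNorm stands in for the undefined name `maxnorm`;
  # init_mode is accepted but unused because the initializers are fixed per layer
  model = keras.Sequential([keras.layers.Dense(neurons, input_shape=(7,), kernel_initializer='glorot_normal', activation=activation, kernel_constraint=tf.keras.constraints.MaxNorm(weight_constraint)),
                            keras.layers.Dropout(dropout_rate),
                            keras.layers.Dense(neurons, activation=activation, kernel_initializer='normal'),
                            keras.layers.Dropout(dropout_rate),
                            keras.layers.Dense(1, activation='sigmoid', kernel_initializer='he_normal')])
  # Build the optimizer by name so learning_rate and momentum actually take effect
  # (works for 'SGD' and 'RMSprop', which both accept a momentum argument)
  opt = getattr(tf.keras.optimizers, optimizer)(learning_rate=learning_rate, momentum=momentum)
  model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])
  return model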

model = KerasClassifier(build_fn=create_model, verbose=0)

# define the grid search parameters
weight_constraint = [0, 2, 4, 6]
dropout_rate = [0.0, 0.2, 0.4, 0.6, 0.8]
activation = ["tanh", "sigmoid", 'relu', 'softmax'] # could add 'selu'
init_mode = ['uniform', 'normal', 'glorot_normal', 'he_normal', 'he_uniform'] # ['uniform', 'lecun_uniform', 'normal', 'zero', 'glorot_normal', 'glorot_uniform', 'he_normal', 'he_uniform']
optimizer = ['RMSprop']
loss = ['binary_crossentropy']
neurons = [70, 85, 100]
learning_rate = [0.001, 0.01, 0.05, 0.1]
momentum = [.0, .2, .4]
batch_size = [14, 34, 72]
epochs = [200]

# Create Parameter Grid Object & Train Model
param_grid = dict(learning_rate=learning_rate, momentum=momentum,
                  batch_size=batch_size, epochs=epochs,
                  loss=loss, optimizer=optimizer,
                  activation=activation,
                  weight_constraint=weight_constraint,
                  dropout_rate=dropout_rate)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X_train, y_train)

# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

In [None]:
df['bluewins'].value_counts()

### Model Evaluation ROC-AUC

Here I compare the two models' scores and discuss the pros and cons of each.

Receiver Operating Characteristic (ROC) curves measure a classifier’s predictive quality by visualizing the tradeoff between sensitivity and specificity across decision thresholds. The ROC curve plots the true positive rate on the Y-axis against the false positive rate on the X-axis. The ideal point is therefore the top-left corner of the plot: false positives are zero and true positives are one.

This leads to another metric, area under the curve (AUC), which summarizes the curve in a single number. The higher the AUC, the better the model generally is. The highest possible score is 1, which would reflect a perfect classifier, while 0.5 corresponds to random guessing.

In the results reported here, the MLP does better than the GBC on both metrics, but not by much. The cost of exploring and developing the MLP was high in technical complexity for me personally, whereas the GBC is easier to reason about.
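
One detail worth illustrating is that AUC is meant to be computed from scores or probabilities. Feeding it hard 0/1 predictions collapses the curve to a single operating point and usually changes the number. The toy labels and scores below are made up for the illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Toy ground truth and predicted probabilities (illustrative values only).
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.90])

# The curve sweeps over every threshold implied by the scores.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

auc_from_scores = roc_auc_score(y_true, y_score)
# Thresholding at 0.5 first throws away the ranking information.
auc_from_labels = roc_auc_score(y_true, (y_score >= 0.5).astype(int))

print(round(auc_from_scores, 4), round(auc_from_labels, 4))  # 0.8889 0.8333
```

When comparing two models, it helps to pass the same kind of input for both, for example `predict_proba(...)[:, 1]` from a scikit-learn classifier alongside the sigmoid output of a network.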

In [None]:
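# Use predicted probabilities (not hard 0/1 labels) so the GBC curve is
# comparable with the MLP curve, which is built from sigmoid outputs
fpr_gbc, tpr_gbc, thresholds_gbc = roc_curve(y_test, gbc_tuned.predict_proba(X_test)[:, 1])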
auc_gbc = auc(fpr_gbc, tpr_gbc)

fpr_mlp, tpr_mlp, thresholds_mlp = roc_curve(y_test, y_mlp)
auc_mlp = auc(fpr_mlp, tpr_mlp)

plt.plot(fpr_gbc, tpr_gbc, label='GBC (area = {:.3f})'.format(auc_gbc))
plt.plot(fpr_mlp, tpr_mlp, label='MLP (area = {:.3f})'.format(auc_mlp))
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.legend(loc='best')
plt.show()

As we look at the following metrics, make a mental note that I experienced higher variance with the MLP’s scores when returning to my work to test the model again. Sometimes there’s higher specificity and other times there’s higher sensitivity. Just things to keep in mind.

In [None]:
scores = cross_val_score(gbc_tuned, X_train, y_train, cv=3)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
print('GBC AUC : ', auc_gbc)
print()
print('Confusion Matrix GBC')
print(confusion_matrix(y_test, y_preds_test))
print()
print('Classification Report GBC')
print(classification_report(y_test, y_preds_test))

In [None]:
# Add cross_validation_score for mlp
print('MLP AUC : ', auc_mlp)
print()
print('Confusion Matrix MLP')
print(confusion_matrix(y_test, y_mlp.round(decimals=0, out=None)))
print()
print('Classification Report MLP')
print(classification_report(y_test, y_mlp.round(decimals=0, out=None)))

### Limitations and Future Work

Think back to hyperparameter tuning of the MLP: the output of the final node is a value between 0 and 1 inclusive. Here’s the distribution of that output. As we can see, there is some middle ground here that could be explored to develop some cool near-win/near-loss analysis. As an additional use case, you could gauge where your own game stands by pulling its ten-minute data into a notebook, shaping it to match the model's inputs, and running the model on it.

In [None]:
plt.title('Blue Wins Distribution')
sns.distplot(y)

In [None]:
plt.title("MLP Prediction Distribution")
sns.distplot(y_mlp)
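
As a starting point for that near-win/near-loss idea, the predicted probabilities could be bucketed into zones. The sketch below is only an illustration: `label_outcome_zone` is a hypothetical helper, and the 0.4 and 0.6 cutoffs are arbitrary assumptions rather than values derived from the data.

```python
import numpy as np
import pandas as pd

def label_outcome_zone(probabilities, low=0.4, high=0.6):
    """Bucket win probabilities into three zones using assumed cutoffs."""
    probs = np.asarray(probabilities, dtype=float).ravel()
    zones = pd.cut(probs,
                   bins=[0.0, low, high, 1.0],
                   labels=["likely loss", "toss-up", "likely win"],
                   include_lowest=True)
    return pd.Series(zones, name="zone")

example = label_outcome_zone([0.05, 0.45, 0.55, 0.93])
print(example.value_counts())
```

Passing `y_mlp` to a helper like this would give a count of games in each zone, and the "toss-up" games are the ones a near-win/near-loss analysis would focus on.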

#### Limitations and Future Work continued

Player Positivity Score
  * Communication is key in this game
  * Swearing is reported by Riot to negatively affect win rate
  * And I can personally confirm that your teammates may tilt you if you go in
  without a thick skin, and reminding players it's just a game may not help LOL

More Data AKA More Games
Ten-minute snapshot
  * Could use whole games
  * Could also use the time dimension

Could answer more pinpointed questions: (surrender becomes available at 14 minutes) how many teams FF when they have more than a 40% chance of winning?


Also, there are various minion types, from 2 different lane minions to different jungle and rift minions that provide different team buffs, and the return on those hasn't been investigated here.

Player skill information relating to the role
  * If you “main” mid and are then forced into the jungle, you may perform under
  the expected curve of player performance, which creates openings for the enemy
  team to take advantage of

Player skill relating to champion
  * Most players look at their “main” champs win rate 
  * What about the champ they are playing against
  * Could be a counter pick (a champion picked during champion select that has a
  natural advantage over another champion)

Late or Early Game Team orientation
  * Score composed based on the champions on a team and when their power spike
  or drop-off is likely to occur
  * “Early Game Teams” have more power in the early game but tend to fall off 
  later
  * There is also the concept of “Late Game Teams”

Season Patch
  * Champion and item buffs and nerfs certainly may have an effect

### Example Use Cases

- You could predict on your own game

- A team with a coach could use this model for premium game information

- Pregame information to determine where your energy as a player is best spent, which you could also share with your team

- To save time: if you're just getting wrecked in a department that leaves your team in a definite loss zone, you could use this to know if you should FF. Games can last around 1 to 2 hours but can be resolved in a lot less, saving everyone a little time and frustration


Thanks for checking out the notebook! Feel free to add me on LinkedIn and just chat if you're in the mood LOL.

https://www.linkedin.com/in/lateef-medley/

Best, 

Lateef