![](https://github.com/oracle-devrel/leagueoflegends-optimizer/raw/livelabs/images/structure_2023.webp)


If we haven't already, we need to install Python dependencies for our environment.

In [None]:
#!pip install -r /Users/mattgunnin/Sites/AI/00_Active/leagueoflegends-optimizer/deps/requirements_2023.txt

# Data Preparation & Exploration

First, we read the data from our previously exported CSV file. Then, we split it into train and test sets and save each one to its own CSV file, so that training and testing use different data.

In [None]:
import pandas as pd
from autogluon.tabular import TabularDataset
from sklearn.model_selection import train_test_split
import warnings
import numpy as np
warnings.filterwarnings("ignore")
import matplotlib.pyplot as plt
print(plt.style.available)
plt.style.use('ggplot')

In [None]:
df = pd.read_csv(
    "/Users/mattgunnin/Sites/AI/20_GPT/gpt_lolcoach/performance_report.csv",
    skipinitialspace=True,
    index_col=[0]
)
# Auxiliary random column; it is NOT used for the split and is dropped before training
df['split'] = np.random.randn(df.shape[0])

# Boolean mask: each row goes to train with probability 0.85, otherwise to test
msk = np.random.rand(len(df)) <= 0.85
train = df[msk]
test = df[~msk]

# put into files for future use
train.to_csv('train.csv')
test.to_csv('test.csv')
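The split above draws from NumPy's unseeded global generator, so train.csv and test.csv change every time the cell runs. If a reproducible split is wanted, a minimal sketch is to use a seeded generator; the helper name `split_train_test` and the seed value 42 are our own illustrative choices, not part of the original notebook. The already imported `train_test_split(df, test_size=0.15, random_state=42)` from scikit-learn would be an equivalent alternative.

```python
import numpy as np
import pandas as pd


def split_train_test(df: pd.DataFrame, train_frac: float = 0.85, seed: int = 42):
    """Randomly assign each row to train (probability train_frac) or to test."""
    rng = np.random.default_rng(seed)       # seeded generator: same split on every run
    msk = rng.random(len(df)) < train_frac  # one boolean per row
    return df[msk], df[~msk]


# Usage on a toy DataFrame standing in for performance_report.csv
toy = pd.DataFrame({'kills': np.arange(1000), 'deaths': np.arange(1000) % 7})
train_toy, test_toy = split_train_test(toy)
print(len(train_toy), len(test_toy))  # roughly 850 / 150
```

Because the mask is drawn per row, the split is only approximately 85/15; that matches the behavior of the original cell.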

In [None]:
train['calculated_player_performance'].hist()

In [None]:
train[['calculated_player_performance', 'f1', 'f2', 'f3']].describe()

In [None]:
train.hist(column=['assists', 'baronKills', 'champExperience',
                   'deaths', 'detectorWardsPlaced', 'dragonKills', 'goldEarned', 
                   'goldSpent', 'kills', 'largestCriticalStrike', 'largestMultiKill', 'largestKillingSpree',
                   'doubleKills', 'tripleKills', 'quadraKills', 'pentaKills', 'totalDamageDealt', 'totalDamageTaken', 'visionScore',
                   'wardsKilled', 'turretKills', 'duration', 'f1', 'f2', 'f3'],
           figsize=(30, 30),
           bins=10)

In [None]:
X = train

# First model: drop the engineered features f1 to f5 and the auxiliary 'split' column (random noise, not a real feature).
X = X.drop(columns=['f1', 'f2', 'f3', 'f4', 'f5', 'split'])

In [None]:
# We create a TabularDataset object (a pandas DataFrame subclass with a few AutoGluon conveniences)
train_data = TabularDataset(X)

In [None]:
# This is the kind of data we can expect:
train_data.tail(2)

In [None]:
train_data.columns

In [None]:
train_data.describe()

In [None]:
print(train_data.iloc[0])

In [None]:
list_available_graphs = ['stacked_histogram', 'density', 'box_plot', 'scatter_patterns']

# Plotting only works on numeric columns, so skip text columns such as championName
numeric = X.select_dtypes(include='number')

df1 = numeric.cumsum()
print('Calculated cumulative sum of df')
ax = df1.plot()
print('Got ax')

# Iterate over the graph names themselves (comparing an integer index to strings never matches)
for graph in list_available_graphs:
    print('Creating {} visualization...'.format(graph))

    if graph == 'stacked_histogram':
        ax = numeric.plot.hist(bins=25, stacked=True)  # stacked histogram plot
    elif graph == 'density':
        ax = numeric.plot.kde()  # density plot (requires scipy)
    elif graph == 'box_plot':
        ax = numeric.plot.box(vert=False)  # horizontal box plot
    elif graph == 'scatter_patterns':
        # compare scatter patterns between two variables (kills vs. goldEarned as an example pair)
        ax = numeric.plot.scatter(x='kills', y='goldEarned')

    # from here down: standard plot output
    ax.set_title('Visualization: {}'.format(graph))
    fig = ax.figure
    fig.set_size_inches(8, 3)
    fig.tight_layout(pad=1)
    fig.savefig('filename_{}.png'.format(graph), dpi=125)
    plt.close(fig)

# Model Training

Now that we've seen the shape of our dataset and identified the variable we want to predict (in this case, calculated_player_performance), we let AutoGluon train as many models as it can within a 10-minute time limit.

## Training

In [None]:
from autogluon.tabular import TabularPredictor
predictor = TabularPredictor(label='calculated_player_performance',
                             verbosity=2,
                             problem_type='regression',
                             path='./player_performance_models',
                             ).fit(train_data, time_limit=10*60, presets='medium_quality')
# https://auto.gluon.ai/0.5.2/tutorials/tabular_prediction/tabular-quickstart.html#presets # medium_quality, good_quality, high_quality, best_quality
# Of the four presets listed, medium_quality produces the smallest models.

## Monitoring

In [None]:
# Display a leaderboard of the trained models, best first. AutoGluon reports error metrics such as RMSE
# as negative numbers, so a higher score_val is always better.
predictor.leaderboard()

In [None]:
predictor.fit_summary(show_plot=True)
# show_plot=True also generates an HTML file with detailed information about each model

# Model Testing

## Predicting with an ensemble of models

Now that we have our first set of models trained, let's make predictions on new data. Since we previously created test.csv, we can use the data that's already in there.

In [None]:
test_data = TabularDataset(test)

test_data.head()

In [None]:
# Predict calculated_player_performance for every row of the test set in a single call
predictor.predict(test_data)

In [None]:
# predict_proba returns class probabilities, which only apply to classification problems.
# This is a regression problem, so the call is left commented out.
# predictor.predict_proba(test_data)

The MSE, MAE, RMSE and R-squared metrics are commonly used to evaluate prediction error and model performance in regression analysis:
- MAE (Mean Absolute Error) is the average of the absolute differences between the actual and predicted values.
- MSE (Mean Squared Error) is the average of the squared differences between the actual and predicted values, so large errors are penalized more heavily.
- RMSE (Root Mean Squared Error) is the square root of MSE, which expresses the error in the same units as the target.
- R-squared (coefficient of determination) measures the proportion of the target's variance that the model explains.
    It usually ranges from 0 to 1 and can be read as a percentage: the higher the value, the better the model. It can even be negative for a model that does worse than always predicting the mean.
- The Pearson correlation coefficient is a descriptive statistic, meaning that it summarizes a characteristic of a dataset.
    Specifically, it describes the strength and direction of the linear relationship between two quantitative variables (here, the actual and predicted values).
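To make these definitions concrete, here is a small NumPy sketch that computes all five metrics for a pair of arrays. The function name `regression_metrics` and the sample numbers are illustrative only; `predictor.evaluate` computes its own versions of these metrics internally.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, R-squared and Pearson r for two 1-D arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mae = np.mean(np.abs(errors))                   # mean absolute error
    mse = np.mean(errors ** 2)                      # mean of the squared errors
    rmse = np.sqrt(mse)                             # back in the target's units
    ss_res = np.sum(errors ** 2)                    # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    pearson = np.corrcoef(y_true, y_pred)[0, 1]     # linear correlation
    return {'mae': mae, 'mse': mse, 'rmse': rmse, 'r2': r2, 'pearson': pearson}


metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 10.0])
print(metrics)
```

With these toy values the MAE is 0.5, the MSE is 0.375 and R-squared is 0.925, yet the Pearson coefficient is exactly 1.0, because the predictions are a perfect linear function of the targets (1.25 × y − 1.25). This is why correlation alone is not enough to judge a regression model.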

In [None]:
# Evaluate several metrics on the test set (test_data must contain the label column).
# This helps us evaluate how well our model generalizes to unseen data.
# Keep this call last in the cell so the notebook displays its output.
predictor.evaluate(test_data)

In [None]:
# Estimate the importance of each feature, i.e., how much it affects our models' predictions
predictor.feature_importance(test_data)
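AutoGluon's `feature_importance` is based on permutation importance: shuffle one column, re-score the model, and see how much the score drops. The sketch below illustrates that idea with plain NumPy and MSE as the score; it is a simplified stand-in, not AutoGluon's implementation (which uses the predictor's own evaluation metric and also reports standard deviations and p-values). The function and toy model names are hypothetical.

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Importance of column j = mean increase in MSE after shuffling column j.

    predict: callable mapping a 2-D array (n_samples, n_features) to predictions.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    baseline = np.mean((predict(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_shuffled = X.copy()
            # shuffling column j breaks its relationship with the target
            X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
            increases.append(np.mean((predict(X_shuffled) - y) ** 2) - baseline)
        importances[j] = np.mean(increases)
    return importances


def toy_model(data):
    """Hypothetical model that only looks at column 0."""
    return 3.0 * data[:, 0]


rng = np.random.default_rng(1)
X_toy = rng.normal(size=(500, 2))
y_toy = 3.0 * X_toy[:, 0]
importances_toy = permutation_importance(toy_model, X_toy, y_toy)
print(importances_toy)  # column 0 matters, column 1 scores 0
```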

## Predicting with a single model
Even though AutoGluon trains several models, we can choose to predict with any single one of them, although the best-performing model is usually a weighted ensemble.

In [None]:
# Get the list of trained model names (use predictor.leaderboard() to see how each one scored)
models = predictor.get_model_names()
# Predict with the second model in the list. predict_proba and evaluate also accept the model argument
predictor.predict(test_data, model=models[1])

# Creating a Win Predictor

Now that we have a model that successfully predicts each player's performance, we will create a second group of models to predict the binary variable 'win'. This is just something extra, as the other model would be sufficient to determine how well you're performing, but I decided to provide as many relatively-useful models as possible.

In [None]:
from autogluon.tabular import TabularPredictor
predictor = TabularPredictor(label='win',
                             verbosity=2,
                             problem_type='binary',  # 'binary', 'multiclass', 'regression', 'quantile'
                             path='./winner_models',
                             ).fit(train_data, time_limit=10*60, presets='medium_quality')  # https://auto.gluon.ai/0.5.2/tutorials/tabular_prediction/tabular-quickstart.html#presets # medium_quality, good_quality, high_quality, best_quality

In [None]:
predictor.leaderboard()

![](https://github.com/oracle-devrel/leagueoflegends-optimizer/raw/livelabs/images/example_ensemble.png)
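> **Note**: this is an example of a weighted ensemble model: every base model makes a prediction, and the final decision is a weighted combination of those predictions in which the better models weigh more. This is different from **bagging**, in which copies of the same model are trained on different random subsets of the data and their predictions are averaged.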
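As far as we understand, AutoGluon's `WeightedEnsemble` models derive their weights with greedy ensemble selection: starting from an empty ensemble, repeatedly add (with replacement) whichever base model most improves the validation score, then use how often each model was picked as its weight. The sketch below illustrates that procedure for a regression problem scored by RMSE; it is a simplified illustration, not AutoGluon's code, and the function name and toy predictions are made up.

```python
import numpy as np


def greedy_ensemble_weights(preds, y, n_iter=25):
    """Greedy ensemble selection with replacement.

    preds: (n_models, n_samples) validation predictions of each base model.
    y:     (n_samples,) validation targets.
    Returns one weight per model; the weights sum to 1.
    """
    preds = np.asarray(preds, dtype=float)
    y = np.asarray(y, dtype=float)
    n_models = preds.shape[0]
    counts = np.zeros(n_models, dtype=int)
    ensemble_sum = np.zeros_like(y)
    for step in range(1, n_iter + 1):
        # RMSE of the ensemble average if each candidate model were added once more
        candidate_rmse = [
            np.sqrt(np.mean(((ensemble_sum + preds[m]) / step - y) ** 2))
            for m in range(n_models)
        ]
        best = int(np.argmin(candidate_rmse))
        counts[best] += 1
        ensemble_sum += preds[best]
    return counts / counts.sum()


# Hypothetical validation predictions from three base models
y_val = np.array([10.0, 12.0, 15.0, 11.0])
val_preds = np.array([
    [10.5, 12.5, 14.0, 11.5],  # model A: small errors
    [9.0, 11.0, 16.5, 10.0],   # model B: larger errors
    [12.0, 14.0, 17.0, 13.0],  # model C: consistently too high
])
weights = greedy_ensemble_weights(val_preds, y_val)
print(weights)
```

Picking with replacement lets a strong model be chosen several times, which is how it ends up weighing more upon the final decision.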

In [None]:
print(predictor.path)

# Model Inference

In this short chapter, we learn how to load already-trained models from local storage into this notebook (or any Python script).

In [None]:
del predictor

predictor = TabularPredictor.load('./winner_models/')

In [None]:
predictor.predict(test_data.iloc[0:5])

# Live Client API-Compatible Model

Now, we build a model compatible with the data that the Live Client API provides.

To give you an idea of the type of data available in this API, here are some screenshots of the full data:

![](https://github.com/oracle-devrel/leagueoflegends-optimizer/raw/livelabs/images/live_client_1.PNG)

This was the data we primarily used last year: using the player's current stats, we built a model that returned a winning probability. However, since stats aren't very important to our models (as observed with predictor.feature_importance(test_data)), that model only reached about 65-70% accuracy.

We're also interested in getting the player's level from this structure.

![](https://github.com/oracle-devrel/leagueoflegends-optimizer/raw/livelabs/images/live_client_2.PNG)

From this `gameData` structure, we get the `gameTime` variable to get player statistics per minute.

![](https://github.com/oracle-devrel/leagueoflegends-optimizer/raw/livelabs/images/live_client_3.PNG)

And, from this last object, we will extract:
- Kills
- Deaths
- Assists

And compute (with gameTime converted to minutes):
- (kills + assists) / gameTime ==> kills + assists per minute ==> f2
- deaths / gameTime ==> deaths per minute ==> f1
- level / gameTime ==> level per minute ==> f3
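The three live features can be sketched as a small function. This assumes that `gameTime` is reported in seconds and that f3 is level per minute (as in the `level_per_min` feature list of the training cell); the function name, the one-minute floor used to avoid huge rates in the first seconds of a game, and the dictionary below (shaped loosely after the screenshots, with made-up values) are all illustrative assumptions.

```python
def compute_live_features(kills, deaths, assists, level, game_time_seconds):
    """Return f1, f2, f3 as per-minute rates.

    Assumes game_time_seconds is in seconds; the elapsed time is floored at one
    minute (our own choice) so early-game rates don't blow up.
    """
    minutes = max(game_time_seconds / 60.0, 1.0)
    return {
        'f1': deaths / minutes,              # deaths per minute
        'f2': (kills + assists) / minutes,   # kills + assists per minute
        'f3': level / minutes,               # level per minute
    }


# Hypothetical excerpt shaped like the screenshots above (values are made up)
game_data = {'gameTime': 600.0}
player = {'championName': 'Ahri', 'level': 10,
          'scores': {'kills': 4, 'deaths': 2, 'assists': 6}}

features = compute_live_features(player['scores']['kills'],
                                 player['scores']['deaths'],
                                 player['scores']['assists'],
                                 player['level'],
                                 game_data['gameTime'])
print(features)
```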

In our dataset, we also had two other variables that I hoped to calculate from Live Client API data, but it wasn't possible to calculate them accurately:
- f4, the total amount of damage per minute, isn't present in any field of the Live Client API.
- f5, the total amount of gold per minute, isn't present either. You can only extract the **current** amount of gold, which doesn't add any real value to the model.



So, the idea now is to create a model that, given f1, f2 and f3, and the champion name, is **able to predict any player's performance**.

In [None]:
time_limit = 10*60  # train various models for ~10 min

# dataset f1...f5
#   'f1': deaths_per_min       - present
#   'f2': k_a_per_min          - present
#   'f3': level_per_min        - present
#   'f4': total_damage_per_min - NOT present
#   'f5': gold_per_min         - NOT present

# try a model with only f1...f3 as features and player performance as target

X = train
X = X[['championName', 'f1', 'f2', 'f3', 'calculated_player_performance']]
# This model will have 4 inputs and 1 output: calculated_player_performance.

In [None]:
# We instantiate the predictor and start fitting the model with our data.
from autogluon.tabular import TabularPredictor
predictor = TabularPredictor(label='calculated_player_performance',
                             verbosity=2,
                             problem_type='regression',  # 'binary', 'multiclass', 'regression', 'quantile'
                             path='./live_model_1',
                             ).fit(X, time_limit=time_limit, presets='medium_quality',
                                   # hyperparameters=hyperparameters,
                                   # hyperparameter_tune_kwargs=hyperparameter_tune_kwargs
                                   )  # https://auto.gluon.ai/0.5.2/tutorials/tabular_prediction/tabular-quickstart.html#presets # medium_quality, good_quality, high_quality, best_quality


In [None]:
# See how well it went
predictor.leaderboard()

In [None]:
new_test = test[['championName', 'f1', 'f2', 'f3', 'calculated_player_performance']]

predictor.feature_importance(new_test)
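To use this model during a live game, the champion name and the three live features have to be passed in with the same column names the model was trained on. A minimal sketch, assuming one prediction per player: the helper `build_live_row` and the sample values are hypothetical, and the commented-out lines show how the row would be passed to a predictor loaded with `TabularPredictor.load`.

```python
import pandas as pd

# The inputs live_model_1 was trained on (the label column is not needed for prediction)
FEATURE_COLUMNS = ['championName', 'f1', 'f2', 'f3']


def build_live_row(champion_name, f1, f2, f3):
    """Wrap one player's live features in a single-row DataFrame with the training column names."""
    return pd.DataFrame([{'championName': champion_name, 'f1': f1, 'f2': f2, 'f3': f3}],
                        columns=FEATURE_COLUMNS)


row = build_live_row('Ahri', 0.2, 1.0, 1.0)  # made-up values for illustration
print(row)

# With the trained model available on disk:
# from autogluon.tabular import TabularPredictor
# live_predictor = TabularPredictor.load('./live_model_1/')
# live_predictor.predict(row)
```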