# League of Legends Ranked Game Analysis
League of Legends (LoL) is a multiplayer online battle arena video game developed and published by Riot Games for Microsoft Windows and macOS. Players assume the role of a "champion" with unique abilities that vary by class, and battle against a team of other player- or computer-controlled champions.

![League of Legends](https://www.gamersdecide.com/sites/default/files/styles/news_images/public/content-images/news/2016/10/15/league-legends-top-ten-wallpapers/hc6k-custom.jpg)

In the main game mode, Summoner's Rift, the goal is to destroy the opposing team's Nexus, a structure that lies at the heart of their base, protected by defensive structures. 

**Summoner's Rift** is the flagship game mode of League of Legends. When the mainstream press covers the game, it is Summoner's Rift that they refer to. On this map, two teams of five players compete to destroy an enemy structure called a Nexus, which is guarded by the enemy team and a number of defensive structures called turrets, or towers. Each team aims to defend its own structures and destroy the other team's structures. There are six turrets on each lane, three per team, and destroying one generates gold for the team that does so. Two additional turrets protect each team's nexus.

![Summoner's Rift](https://upload.wikimedia.org/wikipedia/commons/thumb/d/dc/Map_of_MOBA.svg/240px-Map_of_MOBA.svg.png)

The two Nexuses are located in the teams' bases at opposite corners of the map, lower-left and upper-right. Minions spawn in waves from each team's Nexus and advance toward the enemy base along three paths: the top, middle, and bottom lanes. Players compete to push these waves into the enemy base, which allows them to destroy enemy structures and ultimately win the match. Summoner's Rift matches typically last 30–40 minutes if played until a Nexus is destroyed, but a team can also surrender early.

**Essential Game Elements**

* Champions: the player-controlled characters in League of Legends. Each champion possesses unique abilities and attributes.
* Minions: the units that make up the main force sent by the Nexus. They spawn periodically from their Nexus and advance along a lane towards the enemy Nexus, automatically engaging any enemy unit or structure they encounter. They are controlled by artificial intelligence and only use basic attacks.
* Monsters: neutral units. Unlike minions, monsters do not fight for either team, but killing them can give a team an advantage over the other.
* Wards: deployable units that remove the fog of war over the surrounding area.
* Turrets: also called towers, heavy fortifications that attack enemy units on sight. Turrets are a core component of League of Legends. They deal damage to enemies and provide vision to their team, allowing it to better control the battlefield.
* Gold: the in-game currency. It is used to buy items in the shop that provide champions with bonus stats and abilities, which is one of the main ways for champions to increase their power over the course of a game.
* Champion experience (XP): a game mechanic that lets champions level up after reaching certain amounts of experience. Leveling up gives them access to new abilities or higher ranks of existing abilities.

# This Dataset

This dataset contains statistics from the first 10 minutes of approximately 10k ranked games (SOLO QUEUE) at high ELO (DIAMOND I to MASTER), so the players are at roughly the same skill level. Our goal is to analyze how much each game element contributes to winning, and to develop a model that predicts which team will win.

# Importing Libraries
In this section we import different libraries for Analysis, Visualization, Modeling and Evaluation

In [None]:

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt # data visualization
import seaborn as sns # data visualization
from sklearn.model_selection import train_test_split # Training/Testing set split for performance measures
from sklearn.tree import DecisionTreeClassifier, plot_tree # Rule based classifier
from sklearn.metrics import accuracy_score, confusion_matrix # Performance measures

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


# Data Analysis and Preprocessing
First we're going to import the dataset and view its information to better understand the features and their datatypes, and also check for null values

In [None]:
# importing data
dataset = pd.read_csv("../input/league-of-legends-diamond-ranked-games-10-min/high_diamond_ranked_10min.csv")
dataset.info()

It turns out there are a lot of features for each game, indicating the stats of the Red and the Blue teams, and a column that indicates if the Blue team wins (1) or not (0)

In [None]:
dataset.head()

# Removing Redundant Data
There are some features that we're not going to use, and some other features that are considered redundant, such as:
* gameId: A unique key to identify each game, but unnecessary in analysis and modeling
* redFirstBlood: The opposite of blueFirstBlood, so if blueFirstBlood is 0, we already know that the red team has drawn first blood!
* redGoldDiff & redExperienceDiff: The negative of blueGoldDiff & blueExperienceDiff, so if blue gold/experience difference is negative, we already know that the blue team is behind on gold/experience.

In [None]:
dataset = dataset.drop(['gameId', 'redFirstBlood', 'redGoldDiff', 'redExperienceDiff'], axis=1) #Redundant Columns
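
The redundancy claims above can be checked before the columns are dropped. Here is a minimal sketch on a small hand-made frame; the values are invented for illustration and are not taken from the dataset. On the real data, the same comparisons would be run on `dataset` before calling `drop`.

```python
import pandas as pd

# Toy rows that mimic a few columns of the real file (values are made up).
toy = pd.DataFrame({
    'blueFirstBlood': [1, 0, 1],
    'redFirstBlood': [0, 1, 0],
    'blueGoldDiff': [643, -2908, 1172],
    'redGoldDiff': [-643, 2908, -1172],
})

# redFirstBlood should be the complement of blueFirstBlood.
first_blood_redundant = bool((toy['blueFirstBlood'] + toy['redFirstBlood']).eq(1).all())

# redGoldDiff should be the exact negative of blueGoldDiff.
gold_diff_redundant = bool(toy['redGoldDiff'].eq(-toy['blueGoldDiff']).all())

print(first_blood_redundant, gold_diff_redundant)
```

If either check came back `False` on the real data, the corresponding column would carry extra information and should probably be kept.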

# Dataset Description
In this section we view a description of the dataset and summary statistics for the numeric variables, including the minimum, maximum, mean, and standard deviation. This description helps us determine whether the dataset needs feature scaling before we apply an algorithm that is sensitive to scale.

In [None]:
dataset.describe()

This description shows us that the features are on different scales, and if we're going to use an algorithm that needs feature scaling, then feature scaling for this dataset is necessary.
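
As an illustration of what feature scaling could look like, here is a minimal sketch using scikit-learn's `StandardScaler` on a small made-up matrix; the two columns merely stand in for a gold-like and a kills-like feature. Decision trees, the model used later in this notebook, split on thresholds and do not need scaling, so this only matters for scale-sensitive algorithms.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two made-up features on very different scales (e.g. total gold vs. kills).
X = np.array([[16000.0, 4.0],
              [17500.0, 9.0],
              [15200.0, 2.0],
              [18100.0, 7.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # each column now has mean 0 and unit variance

print(X_scaled.mean(axis=0).round(6))
print(X_scaled.std(axis=0).round(6))
```

In a real pipeline the scaler would be fitted on the training set only and then applied to the test set, so that no test-set statistics leak into training.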

# Dataset Balance
A very important task in a classification problem is to check whether the dataset is balanced. Fortunately, it turns out that this dataset is very well balanced.

In [None]:
palette = sns.color_palette(['r', 'b'])
# Pass the column by keyword: recent seaborn versions expect keyword arguments here
ax1 = sns.countplot(x=dataset['blueWins'], palette=palette)
ax1.set(xticks=[0, 1], xticklabels=['Red Wins', 'Blue Wins'])
ax1.set_title('Blue vs Red Wins')
ax1.set_xlabel('')
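
The balance can also be read off as numbers rather than from a plot. Below is a minimal sketch on a made-up target column; on the real data the series would be `dataset['blueWins']`.

```python
import pandas as pd

# Made-up target column; on the real data this would be dataset['blueWins'].
blue_wins = pd.Series([1, 0, 0, 1, 1, 0, 1, 0], name='blueWins')

# Share of each class; values near 0.5 indicate a balanced binary target.
class_share = blue_wins.value_counts(normalize=True).sort_index()
print(class_share)
```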

# Dataset Correlations
Correlation means association: more precisely, it is a measure of the extent to which two variables are related. Two variables are positively correlated when one increases as the other increases, or decreases as the other decreases.
In this section, we're going to measure the correlations among the features themselves, and between each feature and the output, to see how strongly, for example, an increase in blueKills is associated with winning.
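
To make the measure concrete, here is a minimal sketch of the Pearson correlation coefficient, which is what pandas' `DataFrame.corr()` computes by default. The two arrays are made-up values for five imaginary teams, and the helper name `pearson` is just for this example.

```python
import numpy as np

# Made-up values for illustration: kills and total gold of five imaginary teams.
kills = np.array([3.0, 5.0, 6.0, 8.0, 10.0])
gold = np.array([15000.0, 16200.0, 16100.0, 17500.0, 18900.0])

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    return (x_centered * y_centered).sum() / np.sqrt((x_centered ** 2).sum() * (y_centered ** 2).sum())

r = pearson(kills, gold)
print(round(r, 3))
```

The coefficient lies between -1 and 1: values near 1 mean the two variables rise together, values near -1 mean one falls as the other rises, and values near 0 mean there is no linear relationship.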

In [None]:
fig, ax = plt.subplots(figsize=(12, 12))
sns.heatmap(dataset.corr(), ax=ax, cmap='seismic_r')

In [None]:
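def top_n_correlations(dataframe, n):
    """Return the n features most correlated (in absolute value) with blueWins."""
    # Absolute correlation of every column with the target, strongest first
    correlations = dataframe.corr()['blueWins'].abs().sort_values(ascending=False)

    # A falsy n (None or 0) returns the full ranking
    if not n:
        return correlations

    return correlations.head(n)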

In [None]:
correlations = top_n_correlations(dataset, 11).drop('blueWins')
correlations

It turns out that gold difference and experience difference are the features most strongly correlated with whether the blue team wins, which suggests that they are the strongest early indicators of the outcome.

# More on Dataset Correlations
In this section we're going to plot the features most correlated with the outcome against each other.

In [None]:
grid = sns.PairGrid(data = dataset, vars=['blueGoldDiff', 'blueExperienceDiff', 'blueGoldPerMin', 'blueTotalGold', 'blueTotalExperience'], hue='blueWins', palette=palette, hue_kws={"marker": ["D", "o"], "alpha": [0.3, 0.3]})
grid.map_diag(plt.hist)
grid.map_offdiag(plt.scatter)
grid.add_legend()
plt.show()

It turns out that these features are strongly related to each other, although not always linearly. In games won by the red team, the blue team's stats are well behind: smaller values of features such as blueGoldDiff and blueExperienceDiff are more frequent. We also notice that blueGoldPerMin and blueTotalGold have an almost perfectly linear relationship, which is quite sensible.

# More Preprocessing
For the modeling process, we need to measure the performance of our model by testing it on data it has never seen before, and for that, we're going to split the data into a training set and a test set

In [None]:
X_train, X_test, y_train, y_test = train_test_split(dataset.drop(['blueWins'], axis=1), dataset['blueWins'], test_size=0.33, random_state=0)
print('Training examples: ', X_train.shape[0])
print('Testing examples: ',X_test.shape[0])
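
One optional refinement, not used in this notebook, is a stratified split, which keeps the class proportions the same in the training and test sets. Here is a minimal sketch on synthetic stand-in data; the arrays are random and only the `stratify` argument is the point.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 samples, 3 features, perfectly balanced labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0, 1] * 50)

# stratify=y keeps the class proportions (nearly) identical in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.33, random_state=0, stratify=y)

print(len(y_tr), len(y_te))
print(y_tr.mean(), y_te.mean())
```

With a target that is already well balanced and a dataset of this size, a plain random split usually ends up close to balanced anyway, so the difference is likely to be small.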

# Decision Tree Analysis

via: https://www.hackerearth.com/practice/machine-learning/machine-learning-algorithms/ml-decision-tree/tutorial/

Decision Tree Analysis is a general, predictive modelling tool that has applications spanning a number of different areas. In general, decision trees are constructed via an algorithmic approach that identifies ways to split a data set based on different conditions. It is one of the most widely used and practical methods for supervised learning. Decision Trees are a non-parametric supervised learning method used for both classification and regression tasks. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.

![](https://s3-ap-southeast-1.amazonaws.com/he-public-data/Fig%201-18e1a01b.png)

The decision rules generally take the form of if-then-else statements. The deeper the tree, the more complex the rules and the more closely the model fits the training data; a very deep tree, however, can overfit.

The basic algorithm used in decision trees is known as the ID3 (by Quinlan) algorithm. The ID3 algorithm builds decision trees using a top-down, greedy approach. (scikit-learn's `DecisionTreeClassifier`, which this notebook uses, implements an optimized version of the related CART algorithm, which always produces binary splits.) Briefly, the steps of the ID3 algorithm are:
- Select the best attribute → A
- Assign A as the decision attribute (test case) for the NODE.
- For each value of A, create a new descendant of the NODE.
- Sort the training examples to the appropriate descendant leaf node.
- If the examples are perfectly classified, then STOP; otherwise, iterate over the new leaf nodes.
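
The first step, selecting the best attribute, can be sketched with information gain, the criterion ID3 uses. The example below is illustrative only: the attribute names (`first_blood`, `first_dragon`) and the four rows are hypothetical and hand-made, and the helper functions are not part of any library.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Parent entropy minus the weighted entropy of the children after splitting on `attribute`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    weighted_child_entropy = sum(
        len(group) / len(labels) * entropy(group) for group in groups.values())
    return entropy(labels) - weighted_child_entropy

# Hypothetical categorical examples: did the team take first blood / the first dragon?
rows = [
    {'first_blood': 'yes', 'first_dragon': 'yes'},
    {'first_blood': 'yes', 'first_dragon': 'no'},
    {'first_blood': 'no', 'first_dragon': 'yes'},
    {'first_blood': 'no', 'first_dragon': 'no'},
]
labels = ['win', 'win', 'loss', 'loss']

# Pick the attribute with the highest information gain as the decision attribute.
gains = {attr: information_gain(rows, labels, attr) for attr in rows[0]}
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute)
```

In this toy data `first_blood` separates wins from losses perfectly while `first_dragon` tells us nothing, so the former is chosen for the root node; the algorithm would then repeat the selection inside each child.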

The quality of a candidate split is measured using Gini Impurity or Entropy:

**The Entropy**, on which information gain is based, is calculated by this formula:
![](https://miro.medium.com/max/442/1*efLrD1ECWl-utII0KYb7tQ.jpeg)

**The Gini Impurity** is calculated by this formula:
![](https://miro.medium.com/max/442/1*vRlwRFknvfgWLBed1vsGoQ.jpeg)
where p is the proportion of examples in the node that belong to a given class

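**For this problem** we're going to use the Gini impurity, as it is slightly cheaper to compute than entropy (it needs no logarithm). It is also the default criterion of scikit-learn's `DecisionTreeClassifier`.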
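
For reference, with class proportions p_i in a node, the entropy is -Σ p_i·log2(p_i) and the Gini impurity is 1 - Σ p_i². The sketch below evaluates both for two made-up class distributions; the function names are chosen for this example only.

```python
import numpy as np

def gini_impurity(class_proportions):
    """Gini impurity: 1 - sum(p_i^2)."""
    p = np.asarray(class_proportions, dtype=float)
    return 1.0 - np.sum(p ** 2)

def entropy(class_proportions):
    """Shannon entropy in bits: -sum(p_i * log2(p_i)), skipping empty classes."""
    p = np.asarray(class_proportions, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# An evenly mixed node versus a mostly pure node.
for proportions in ([0.5, 0.5], [0.9, 0.1]):
    print(proportions,
          round(gini_impurity(proportions), 3),
          round(entropy(proportions), 3))
```

Both measures are largest for an evenly mixed node and drop to zero for a pure node, which is why they tend to rank candidate splits similarly.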

In [None]:
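def test_classifier(classifier, X_train, X_test, y_train, y_test):
    """Fit the classifier, then return its test accuracy (as a percentage string) and confusion matrix."""
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    cm = confusion_matrix(y_test, y_pred)

    # The % format specifier avoids float artifacts such as 56.99999999999999%
    return f'{accuracy:.0%}', cm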

# Model Performance
We've chosen a decision tree classifier with a max depth of 3, because deeper trees are prone to overfitting and typically perform worse on the test set.
The model reached 72% accuracy on the test set. We've also plotted a confusion matrix that shows the number of correctly classified and misclassified examples in terms of True Positives, True Negatives, False Positives and False Negatives.
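
The effect of depth on overfitting can be illustrated by comparing training and test accuracy at several depths. The sketch below uses a synthetic, noisy dataset from `make_classification` as a stand-in, so the numbers it prints say nothing about the League of Legends data; it only shows the kind of check one could run.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy binary problem used only as a stand-in for the real data.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0)

scores = {}
for depth in (1, 3, 5, 10, None):  # None lets the tree grow until its leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    scores[depth] = (tree.score(X_tr, y_tr), tree.score(X_te, y_te))
    print(depth, round(scores[depth][0], 3), round(scores[depth][1], 3))
```

A widening gap between the two accuracies as depth grows is the usual sign of overfitting; running the same loop on `X_train` and `X_test` from this notebook would be one way to justify the chosen depth.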

In [None]:
classifier = DecisionTreeClassifier(max_depth=3)
accuracy, cm = test_classifier(classifier, X_train, X_test, y_train, y_test)
print("Decision Tree's Accuracy: ", accuracy)
plt.figure(figsize=(8,8))
ax = sns.heatmap(cm, annot=True, cmap='seismic_r', fmt='g')
ax.set_title("Model's Confusion Matrix")
ax.set_ylabel('Actual')
ax.set_xlabel('Predicted')

It turns out that the number of True Positives (Actual 1, Predicted 1) and the number of True Negatives (Actual 0, Predicted 0) are close to each other, as are the numbers of False Positives and False Negatives. This is further evidence of the dataset's balance, and it means that accuracy is a reliable metric here; on an unbalanced dataset, accuracy alone can be misleading.
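
The four cells of a binary confusion matrix are enough to derive other common metrics. Here is a minimal sketch on ten made-up labels (1 = blue wins, 0 = red wins); the label arrays are invented, while the cell ordering is scikit-learn's documented one.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels: 1 = blue wins, 0 = red wins.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 1, 1, 0])

# For binary labels, scikit-learn orders the flattened matrix as TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # of the predicted blue wins, how many were correct
recall = tp / (tp + fn)     # of the actual blue wins, how many were found

print(tn, fp, fn, tp)
print(accuracy, precision, recall)
```

When the classes are balanced and the two kinds of error are about equally frequent, as in this toy case, accuracy, precision and recall come out close to each other.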

# Decision Tree Visualization
Sometimes it's very useful to visualize the decision tree that was built, to gain more insight into the rules that were derived from the dataset.

In [None]:
fig, ax = plt.subplots(figsize=(20,10))
# Take the feature names from the training frame so they always match the fitted columns
plot_tree(classifier, feature_names=list(X_train.columns), class_names=['redWins', 'blueWins'], precision=1, filled=True, ax=ax)
plt.show()
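
When a plot is hard to read, the same rules can be printed as text with scikit-learn's `export_text`. The sketch below fits a small tree on the bundled iris data purely so that it runs on its own; with this notebook's objects one would pass `classifier` and `list(X_train.columns)` instead.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# The bundled iris data is used here only so the example is self-contained.
iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# export_text renders the fitted tree as indented if/else rules.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```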

# What we've learned from this dataset

* Gold Difference and Experience Difference play the most important roles in deciding which team will win a game.
* KDA (Kills/Deaths/Assists) is not the only determinant, as gold and experience can also be gained by killing minions and neutral monsters.
* It's not very important to get the Kill if you can get an Assist, since both contribute to the team's overall Gold & Experience Difference.
* The early game phase alone can predict the outcome of the whole game with about 72% accuracy, so it's very important.
* Although elite monsters (Dragon, Herald) give important advantages, going for one is not that important if there's a risk of dying to get it.