Hello there! In this notebook we are going to build a Decision Tree Classifier and see which criteria are important for winning a League of Legends ranked match!

What will we do?

* Build a Decision Tree Classifier to predict the match result from a few criteria.
* Try to understand how important these criteria are for winning a ranked match.

So, here we go! Enjoy!

# The Firsts

Our dataset has some columns with information about "the firsts": firstBlood, firstTower, firstInhibitor, firstBaron, firstDragon and firstRiftHerald. I'm curious about this data, and I want to see whether it is possible to predict a match result from this set of information and how accurate that prediction is.
I will use the files match_winner_data_version1.csv and match_loser_data_version1.csv to perform the analysis.
Let's code!

**First things first: Importing some modules and loading the data**

Please note the function clean_data(). We will use it in the next steps to delete rows with NaN/null values.

In [None]:
import pandas as pd
from sklearn import tree
from sklearn.metrics import accuracy_score
import graphviz

# Input data files are available in the read-only "../input/" directory
winner_data = pd.read_csv("../input/league-of-legendslol-ranked-games-2020-ver1/match_winner_data_version1.csv")
loser_data = pd.read_csv("../input/league-of-legendslol-ranked-games-2020-ver1/match_loser_data_version1.csv")

def clean_data(data):
    # Return a copy of data without the rows that have NaN/null values
    # in the "win" column or in any of the "firsts" columns
    return data.dropna(subset=["win", "firstBlood", "firstTower", "firstInhibitor",
                               "firstBaron", "firstDragon", "firstRiftHerald"])

if winner_data.empty or loser_data.empty:
    print("Problem reading files! Please check the file paths!")
else:
    print("Files read successfully!")
    print("Winner dataset shape: ", winner_data.shape)
    print("Loser dataset shape: ", loser_data.shape)
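
To see what a cleaning step like clean_data() does, here is a minimal sketch on a hypothetical three-row toy frame (not the real dataset). It uses dropna() with a subset, so only NaN values in the listed columns cause a row to be dropped, and the original frame is left untouched because no inplace flag is used. The names FIRSTS and clean_data_sketch are illustrative only.

```python
import numpy as np
import pandas as pd

# Illustrative list of the six "firsts" columns used in this notebook
FIRSTS = ["firstBlood", "firstTower", "firstInhibitor",
          "firstBaron", "firstDragon", "firstRiftHerald"]

def clean_data_sketch(data):
    """Return a copy of data without rows that have NaN in 'win' or any 'first' column."""
    return data.dropna(subset=["win"] + FIRSTS)

# Hypothetical toy frame: row 1 misses firstBlood, row 2 misses win
toy = pd.DataFrame({
    "win": ["Win", "Win", np.nan],
    "firstBlood": [True, np.nan, False],
    "firstTower": [True, False, True],
    "firstInhibitor": [True, False, False],
    "firstBaron": [False, False, True],
    "firstDragon": [True, True, False],
    "firstRiftHerald": [False, True, True],
})

cleaned = clean_data_sketch(toy)
print(cleaned.shape)  # (1, 7): only the complete row survives
print(toy.shape)      # (3, 7): the original frame is unchanged
```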

So, we loaded the data! Now it's time for some preparation and cleaning.

First, I will replace the "Win" and "Fail" values in the "win" column with True and False. Then I will keep only the columns I am interested in: the "firsts" columns and the "win" column.

In [None]:
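#Replace "Win" and "Fail" with True and False
#(assign the result back: replace(..., inplace=True) on a single column may not update the DataFrame in newer pandas versions)
winner_data["win"] = winner_data["win"].replace({"Win": True})
loser_data["win"] = loser_data["win"].replace({"Fail": False})

#Separate training set: columns 2 to 8 are "win" plus the six "firsts" columns
#I'm taking just a part of the dataset to train our model; .copy() makes each slice an independent DataFrame
winner_training_data = winner_data.iloc[0:50000, 2:9].copy()
loser_training_data = loser_data.iloc[0:50000, 2:9].copy()

#Separate test set
winner_test_data = winner_data.iloc[50000:108828, 2:9].copy()
loser_test_data = loser_data.iloc[50000:108828, 2:9].copy()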
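
Here is a minimal sketch of the same two ideas (label conversion and a positional train/test split) on hypothetical toy frames. It uses map() as one alternative to replace(): any value missing from the dictionary becomes NaN, which a cleaning step like clean_data() would then drop. The split point of 2 rows is only for the toy example.

```python
import pandas as pd

# Hypothetical toy frames standing in for the winner/loser CSV files
winner = pd.DataFrame({"win": ["Win", "Win", "Win"]})
loser = pd.DataFrame({"win": ["Fail", "Fail", "Fail"]})

# map() converts the labels; unknown values would become NaN
labels = {"Win": True, "Fail": False}
winner["win"] = winner["win"].map(labels)
loser["win"] = loser["win"].map(labels)

# Positional split: first rows for training, the rest for testing.
# .copy() avoids SettingWithCopyWarning when the slices are modified later.
split = 2
train = winner.iloc[:split].copy()
test = winner.iloc[split:].copy()

print(winner["win"].tolist())  # [True, True, True]
print(len(train), len(test))   # 2 1
```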

Now it's time to clean! I will remove all rows that have NaN/null values, using the clean_data() function we defined at the beginning.

In [None]:
#Clean training data
winner_training_data = clean_data(winner_training_data)
loser_training_data = clean_data(loser_training_data)

#Clean test set
winner_test_data = clean_data(winner_test_data)
loser_test_data = clean_data(loser_test_data)

#Make sure the "win" column is bool in both the winner and loser datasets
winner_training_data['win'] = winner_training_data['win'].astype('bool')
winner_test_data['win'] = winner_test_data['win'].astype('bool')
loser_training_data['win'] = loser_training_data['win'].astype('bool')
loser_test_data['win'] = loser_test_data['win'].astype('bool')
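
One caveat worth keeping in mind: astype('bool') uses Python truthiness, so any non-empty string becomes True. The cast is therefore only safe after "Fail" has already been replaced by False. This small sketch on hypothetical toy Series shows the difference.

```python
import pandas as pd

# Labels that were NOT replaced yet: "Fail" is a non-empty string, so it casts to True
raw = pd.Series(["Fail", "Fail"])

# Labels after the replacement step (object dtype, as replace() may leave them)
replaced = pd.Series([False, False], dtype="object")

print(raw.astype("bool").tolist())       # [True, True]  (wrong labels)
print(replaced.astype("bool").tolist())  # [False, False]
```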

Let's prepare the test data and the variables for the fit() method. First, we need to concatenate the winner and loser datasets, both for training and for testing.

We want to predict the match result (Victory or Defeat) with a decision tree classifier. Thus, the "win" column is the dependent variable and the other columns are the independent variables.

Y => "win" column

X => firstBlood, firstTower, firstInhibitor, firstBaron, firstDragon and firstRiftHerald columns.
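
Before fitting a model, a quick sanity check is to look at the win rate of teams that got each "first". This sketch uses a hypothetical four-row toy sample with only two of the independent variables; on the real data the same loop would run over all six columns.

```python
import pandas as pd

# Hypothetical toy sample: one row per team per match
df = pd.DataFrame({
    "win":        [True, True, False, False],
    "firstBlood": [True, False, False, True],
    "firstTower": [True, True, False, False],
})

# Win rate among the rows where the team got each "first"
rates = {col: df.loc[df[col], "win"].mean() for col in ["firstBlood", "firstTower"]}
print(rates)  # firstBlood 0.5, firstTower 1.0
```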


In [None]:
#preparing test data
test_data = pd.concat([winner_test_data.iloc[:, 1:7], loser_test_data.iloc[:, 1:7]], ignore_index=True)
test_set_labels = pd.concat([winner_test_data["win"], loser_test_data["win"]], ignore_index=True)

#Preparing training data and defining the dependent and independent variables
#Defining dependent variable
Y = pd.concat([winner_training_data["win"], loser_training_data["win"]], ignore_index=True)

#Defining independent variables
X = pd.concat([winner_training_data.iloc[:, 1:7], loser_training_data.iloc[:, 1:7]], ignore_index=True)
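
The concatenation above stacks the winner rows on top of the loser rows. Here is a minimal sketch on hypothetical toy frames showing why ignore_index=True matters (without it the index would repeat as 0, 1, 0, 1) and how X and Y can also be selected by column name instead of by position. The shortened FIRSTS list is only for the toy example.

```python
import pandas as pd

FIRSTS = ["firstBlood", "firstTower"]  # shortened list for the toy example

winner_part = pd.DataFrame({"win": [True, True],
                            "firstBlood": [True, False],
                            "firstTower": [True, True]})
loser_part = pd.DataFrame({"win": [False, False],
                           "firstBlood": [False, True],
                           "firstTower": [False, False]})

# ignore_index=True builds a fresh 0..n-1 index for the stacked rows
both = pd.concat([winner_part, loser_part], ignore_index=True)
X = both[FIRSTS]  # independent variables, selected by name
Y = both["win"]   # dependent variable

print(list(both.index))  # [0, 1, 2, 3]
print(X.shape, Y.shape)  # (4, 2) (4,)
```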

Now it's time to fit our model and calculate the accuracy that our decision tree can achieve:

In [None]:
clf = tree.DecisionTreeClassifier(criterion="entropy")
clf = clf.fit(X, Y)

new_matches_test = clf.predict(test_data)
acc = accuracy_score(test_set_labels, new_matches_test)

print('Accuracy Score: ', acc)
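
Accuracy is simply the fraction of predictions that match the true labels. This small worked example with hypothetical labels computes it by hand and with accuracy_score(); 3 of the 4 predictions are correct, so both give 0.75.

```python
from sklearn.metrics import accuracy_score

# Hypothetical true labels and predictions
y_true = [True, False, True, True]
y_pred = [True, False, False, True]

# accuracy = correctly predicted samples / total samples
manual = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(manual, accuracy_score(y_true, y_pred))  # 0.75 0.75
```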

The accuracy score means that our model predicted 84.36% of the test matches correctly. I would say this is an "OK" accuracy.

One of the biggest advantages of decision trees is that we can visualize them! To do that, we can use the export_graphviz function.

In [None]:
my_tree = tree.export_graphviz(clf, out_file=None,
                               feature_names=["firstBlood", "firstTower", "firstInhibitor",
                                              "firstBaron", "firstDragon", "firstRiftHerald"],
                               # class_names must follow the sorted order of clf.classes_, i.e. [False, True]
                               class_names=["Lose", "Win"],
                               filled=True, rounded=True,
                               special_characters=True)

graph = graphviz.Source(my_tree)
graph.render("LoLDecisionTree")  # writes the DOT source and a rendered LoLDecisionTree.pdf
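
If the Graphviz binaries are not available, scikit-learn's export_text() prints the same tree as plain text. This sketch fits a tree on hypothetical toy data (where getting the first tower decides the match) and also prints clf.classes_, whose sorted order is the order that class_names in export_graphviz must follow.

```python
import pandas as pd
from sklearn import tree

# Hypothetical toy data: the team that gets the first tower wins
X = pd.DataFrame({"firstBlood": [1, 0, 1, 0], "firstTower": [1, 1, 0, 0]})
Y = pd.Series([True, True, False, False])

clf = tree.DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, Y)

# classes_ is sorted: [False, True], so class names would be ["Lose", "Win"]
print(clf.classes_)

# Plain-text rendering of the fitted tree
text = tree.export_text(clf, feature_names=list(X.columns))
print(text)
```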

The next step is to analyze the tree to make sure it makes sense! I'll leave this for the next notebooks!
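
As a possible starting point for that analysis, a fitted DecisionTreeClassifier exposes feature_importances_, an impurity-based score per column that sums to 1. This sketch uses hypothetical toy data, not the notebook's dataset; with the real model the same Series would be built from clf and X. Impurity-based importances can be biased, so sklearn.inspection.permutation_importance is a reasonable cross-check.

```python
import pandas as pd
from sklearn import tree

# Hypothetical toy data, not the notebook's dataset
X = pd.DataFrame({
    "firstBlood": [1, 0, 1, 0, 1, 0],
    "firstTower": [1, 1, 1, 0, 0, 0],
})
Y = pd.Series([True, True, True, False, False, False])

clf = tree.DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, Y)

# One importance value per column, sorted from most to least important
importances = pd.Series(clf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)
```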

If this notebook was helpful to you, please upvote!

Feel free to tell me if I made a mistake or how I can improve this little project!

Until the next notebook!