<center><img src="img/ironhack.png" width="250"></center>

# Module 2 Mini Project
## Exploratory Data Analysis

<img src="img/lol_logo.png" width="200">

The League of Legends World Championship is an annual esports tournament that gathers the best players from all continents and attracts millions of viewers worldwide. The 2020 edition begins at the end of this month (September 2020).

The game pits two teams of 5 players against each other on a symmetrical map, where the ultimate goal is to destroy the enemy's base, called the nexus. Each player, called a summoner, chooses a champion from a pool of 150 different characters, each with its own story, style and set of attributes.

A simple combination calculation gives the number of distinct 10-champion configurations that can appear on the map:

$C_{150}^{10} = \frac{150!}{10!\,(150 - 10)!} = 1169554298222310$<br>
or about $1.17 \times 10^{15}$ combinations of unique champions!
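
The figure above can be double-checked with Python's standard library. The snippet below is a minimal sketch using `math.comb` (available from Python 3.8). Like the formula, it assumes a pool of exactly 150 champions and ignores which team each champion ends up on.

```python
import math

# Number of ways to choose 10 distinct champions out of a pool of 150,
# ignoring order and team assignment (same assumption as the formula above).
n_champions = 150
n_picked = 10

combinations = math.comb(n_champions, n_picked)
print(combinations)           # 1169554298222310
print(f"{combinations:.3e}")  # 1.170e+15
```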

## The Subject and Problem Statement
With such a huge number of possible combinations, and with every available champion being unique, we will look into the synergy between these characters to try to predict the best team composition for winning a game.

## The Dataset
This dataset represents, at the time of writing and to the best of the author's knowledge, the largest and most comprehensive dataset of League of Legends ranked matches from the North American server. It contains approximately 10,000 matches, each with over 700 individual fields, ranging from champion and spell choices to team statistics and individual player performance.

In [1]:
#'b_towerKills', 'b_baronKills', 'b_dragonKills', 'b_riftHeraldKills',
#'r_towerKills', 'r_baronKills', 'r_dragonKills', 'r_riftHeraldKills',
#'b_summoner1_lane', 'b_summoner1_championId'

| Column  | Description  |
|---|---|
| b_win OR r_win  | 0 indicates a loss and 1 indicates a win  |
| b_towerKills OR r_towerKills | Number of turrets destroyed by a team  |
| b_baronKills OR r_baronKills  | Number of Baron Nashor kills by a team  |
| b_dragonKills OR r_dragonKills  | Number of dragons (neutral objectives) killed by a team  |

## Cleaning the Data
The data is quite dense, with 775 columns in total, so we will drop many columns that are not relevant to our analysis. The data is also exhaustive: it compiles statistics for all 10 players in a match, which means that many columns are repeated 10 times across the set to show each player's individual performance.
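
To illustrate how per-player columns multiply, here is a small sketch that builds column names following the `<team>_summoner<N>_<stat>` pattern used in this dataset. The three statistics chosen are only an illustrative sample; the real file holds many more per player.

```python
from itertools import product

teams = ['b', 'r']                       # blue and red team prefixes
players = range(1, 6)                    # five summoners per team
stats = ['kills', 'deaths', 'assists']   # illustrative subset of per-player stats

# Every stat is repeated once for each of the 10 players
player_cols = [f'{team}_summoner{n}_{stat}'
               for team, n, stat in product(teams, players, stats)]

print(len(player_cols))  # 30 columns for only 3 stats
print(player_cols[:3])   # ['b_summoner1_kills', 'b_summoner1_deaths', 'b_summoner1_assists']
```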

In [2]:
# Imports
from riotwatcher import LolWatcher, ApiError
import pandas as pd
import numpy as np
import json

In [3]:
# Utility functions
def getJSONValue(val):
    """Return the value stored under key `val` in the local secret.json file."""
    with open('secret.json') as file:
        data = json.load(file)
    return data[val]

In [4]:
# RiotWatcher API credentials and config
api_key = getJSONValue('API_KEY')
watcher = LolWatcher(api_key)
my_region = 'euw'

In [5]:
# Look up the current champion data version for the region
latest = watcher.data_dragon.versions_for_region(my_region)['n']['champion']
# Let's get some static champion information
static_champ_list = watcher.data_dragon.champions(latest, False, 'en_US')

# Map each numeric champion key to its champion id (name)
champ_dict = {int(row['key']): row['id'] for row in static_champ_list['data'].values()}

In [6]:
games = pd.read_csv('data/training_data.csv')
# Strip surrounding whitespace and single quotes from the column names
games.columns = [c.strip().replace("'", "") for c in games.columns]

In [7]:
# Columns to drop
# team columns to drop
superfluous_cols = ['b_firstBlood',
'b_firstInhibitor',
'b_firstTower',
'b_firstBaron',
'b_firstDragon',
'b_firstRiftHerald',
'b_inhibitorKills',
'r_firstBlood',
'r_firstTower',
'r_firstInhibitor',
'r_firstBaron',
'r_firstDragon',
'r_firstRiftHerald',
'r_inhibitorKills']
# summoner columns to drop
summon_suffix_cols = ['accountId',
'level',
'role',
'championLevel',
'championPoints',
'lastPlayTime',
'championPointsSinceLastLevel',
'championPointsUntilNextLevel',
'chestGranted',
'tokensEarned',
'totalChampionMastery',
'spell1Id',
'spell2Id',
'item0',
'item1',
'item2',
'item3',
'item4',
'item5',
'item6',
'largestKillingSpree',
'largestMultiKill',
'killingSprees',
'longestTimeSpentLiving',
'doubleKills',
'tripleKills',
'quadraKills',
'pentaKills',
'magicDamageDealt',
'physicalDamageDealt',
'trueDamageDealt',
'largestCriticalStrike',
'totalDamageDealtToChampions',
'magicDamageDealtToChampions',
'physicalDamageDealtToChampions',
'trueDamageDealtToChampions',
'totalUnitsHealed',
'damageSelfMitigated',
'damageDealtToObjectives',
'damageDealtToTurrets',
'timeCCingOthers',
'magicalDamageTaken',
'physicalDamageTaken',
'trueDamageTaken',
'goldSpent',
'turretKills',
'inhibitorKills',
'totalMinionsKilled',
'neutralMinionsKilled',
'neutralMinionsKilledTeamJungle',
'neutralMinionsKilledEnemyJungle',
'totalTimeCrowdControlDealt',
'visionWardsBoughtInGame',
'sightWardsBoughtInGame',
'wardsPlaced',
'wardsKilled',
'firstBloodKill',
'firstBloodAssist',
'firstTowerKill',
'firstTowerAssist',
'combatPlayerScore',
'objectivePlayerScore',
'totalPlayerScore',
'totalScoreRank']
# prefix to summoner columns to drop
summon_prefix_cols = ['b_summoner1_', 'b_summoner2_', 'b_summoner3_', 'b_summoner4_', 'b_summoner5_',
                      'r_summoner1_', 'r_summoner2_', 'r_summoner3_', 'r_summoner4_', 'r_summoner5_']


# Drop all superfluous columns
summoner_cols = [(col + suffix).strip().replace('\n', '') for suffix in summon_suffix_cols for col in summon_prefix_cols]
drop_cols = summoner_cols + superfluous_cols
games.drop(drop_cols, axis='columns', inplace=True)
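
`DataFrame.drop` raises a `KeyError` by default if any listed label is missing, so it can be useful to check the generated names against the actual columns first. The sketch below uses a tiny made-up frame (`toy`) and a hypothetical drop list rather than the real `games` data.

```python
import pandas as pd

# Toy frame standing in for `games`; column names follow the dataset's pattern
toy = pd.DataFrame({
    'b_summoner1_kills': [3],
    'b_summoner1_item0': [1001],
    'b_firstBlood': [1],
})

# Hypothetical drop list: the last name does not exist in the toy frame
to_drop = ['b_summoner1_item0', 'b_firstBlood', 'b_summoner1_item9']

# Labels that would make drop() raise a KeyError
missing = [col for col in to_drop if col not in toy.columns]
print(missing)  # ['b_summoner1_item9']

# Drop only the labels that are really present
present = [col for col in to_drop if col in toy.columns]
toy = toy.drop(columns=present)
print(list(toy.columns))  # ['b_summoner1_kills']
```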

In [8]:
# Set game ID as index
games.set_index('gameId', inplace=True)

In [9]:
games.head()

gameId,b_towerKills,b_baronKills,b_dragonKills,b_riftHeraldKills,r_towerKills,r_baronKills,r_dragonKills,r_riftHeraldKills,b_summoner1_lane,b_summoner1_championId,...,r_summoner5_deaths,r_summoner5_assists,r_summoner5_totalDamageDealt,r_summoner5_totalHeal,r_summoner5_visionScore,r_summoner5_totalDamageTaken,r_summoner5_goldEarned,r_summoner5_champLevel,b_win,r_win
3376031541,9,1,4,0,1,0,0,1,TOP,62,...,7,5,44086,975,48,19662,6587,11,1,0
3419506031,10,3,3,0,3,0,2,2,TOP,114,...,7,4,23721,970,42,22525,7787,14,1,0
3419541356,2,0,1,1,0,0,1,0,BOTTOM,89,...,3,0,54787,678,11,14516,5912,13,1,0
3419796753,8,1,4,1,3,0,1,1,JUNGLE,35,...,10,11,181105,9223,15,30624,10700,15,1,0
3418930158,11,0,1,1,1,0,3,0,TOP,101,...,12,1,89158,1879,8,25343,7016,14,1,0


In [10]:
# Convert champion Id to champion name
for pre in summon_prefix_cols:
    games[pre + 'championId'] = games[pre + 'championId'].apply(lambda x: champ_dict[x])
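
The lookup above raises a `KeyError` if an ID is absent from `champ_dict`. As an alternative sketch, `Series.map` with a dictionary leaves unknown IDs as `NaN`, which makes them easy to spot. The ID-to-name pairs below are placeholders made up for illustration, not real Data Dragon values.

```python
import pandas as pd

# Placeholder lookup table (made-up names, not real champion data)
lookup = {1: 'ChampA', 2: 'ChampB'}

ids = pd.Series([1, 2, 999], name='b_summoner1_championId')

# map() returns NaN for IDs missing from the dictionary instead of raising
names = ids.map(lookup)
unknown_ids = ids[names.isna()].tolist()

print(names.tolist())  # ['ChampA', 'ChampB', nan]
print(unknown_ids)     # [999]
```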

In [14]:
games.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10013 entries, 3376031541 to 3438028658
Columns: 120 entries, b_towerKills to r_win
dtypes: int64(100), object(20)
memory usage: 9.2+ MB


In [None]:
# Create team-level columns (K/D/A, average level, gold, vision, damage, healing)
# by aggregating the per-summoner columns of each team
team_stats = [  # (new team column suffix, per-summoner column suffix)
    ('kills', 'kills'),
    ('deaths', 'deaths'),
    ('assists', 'assists'),
    ('avgLevel', 'champLevel'),
    ('totalGold', 'goldEarned'),
    ('totalVisionScore', 'visionScore'),
    ('totalDamageDealt', 'totalDamageDealt'),
    ('totalHeal', 'totalHeal'),
    ('totalDamageTaken', 'totalDamageTaken'),
]

for team in ('b', 'r'):  # blue team, then red team
    for new_col, stat in team_stats:
        player_cols = [f'{team}_summoner{i}_{stat}' for i in range(1, 6)]
        total = games[player_cols].sum(axis=1)
        # The champion level is averaged over the 5 players; everything else is summed
        games[f'{team}_{new_col}'] = total / 5 if new_col == 'avgLevel' else total

In [None]:
games.head()

In [None]:
# Locate 'NONE' in summoner lanes and replace it with NaN
summoner_lanes = ['b_summoner1_lane', 'b_summoner2_lane', 'b_summoner3_lane', 'b_summoner4_lane', 'b_summoner5_lane', 'r_summoner1_lane', 'r_summoner2_lane', 'r_summoner3_lane', 'r_summoner4_lane', 'r_summoner5_lane']
games[summoner_lanes] = games[summoner_lanes].replace('NONE', np.nan)
# Drop rows with a NaN lane (854 out of 10013)
games = games.dropna(subset=summoner_lanes)
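
To see how many matches a filter like this removes before committing to it, one can count the affected rows first. The sketch below uses a three-row toy frame with two lane columns; the lane values mirror those seen in the dataset, but the rows and `gameId` values are made up.

```python
import pandas as pd

toy = pd.DataFrame({
    'b_summoner1_lane': ['TOP', 'NONE', 'JUNGLE'],
    'r_summoner1_lane': ['BOTTOM', 'TOP', 'NONE'],
}, index=pd.Index([101, 102, 103], name='gameId'))

lane_cols = ['b_summoner1_lane', 'r_summoner1_lane']

# A row is affected if at least one of its lane columns holds 'NONE'
has_none = toy[lane_cols].eq('NONE').any(axis=1)
print(int(has_none.sum()))  # 2

# Keeps the same rows as replacing 'NONE' with NaN and calling dropna on
# these columns, assuming they contain no other missing values
cleaned = toy[~has_none]
print(cleaned.index.tolist())  # [101]
```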

In [None]:
# Create subsets to isolate data specific to games and specific to teams
games_stats = games[['b_towerKills','b_baronKills','b_dragonKills','b_riftHeraldKills','r_towerKills','r_baronKills','r_dragonKills','r_riftHeraldKills','b_kills','b_deaths','b_assists','r_kills','r_deaths','r_assists','b_avgLevel','b_totalGold','b_totalVisionScore','b_totalDamageDealt','b_totalHeal','b_totalDamageTaken','r_avgLevel','r_totalGold','r_totalVisionScore','r_totalDamageDealt','r_totalHeal','r_totalDamageTaken','b_win','r_win']]
# Anchor the pattern so it only matches column names that start with the team prefix
blue_stats = games.filter(regex='^b_summoner', axis=1)
red_stats = games.filter(regex='^r_summoner', axis=1)

# Save dataframes in new csv files
games_stats.to_csv('./data/games_stats.csv')
blue_stats.to_csv('./data/blue_stats.csv')
red_stats.to_csv('./data/red_stats.csv')
games.to_csv('./data/games_clean.csv')
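
A note on selecting columns by pattern: `DataFrame.filter(regex=...)` takes a regular expression (not a shell-style wildcard) and matches it anywhere in the column name. The sketch below shows the difference an anchor makes on an empty toy frame; the column `sub_summoner_note` is hypothetical and exists only to trigger the unanchored match.

```python
import pandas as pd

# Empty toy frame: only the column names matter here
toy = pd.DataFrame(columns=[
    'b_summoner1_kills',
    'r_summoner1_kills',
    'sub_summoner_note',   # hypothetical column containing 'b_summoner' mid-name
    'b_win',
])

# Unanchored: matches any name that contains 'b_summoner'
loose = toy.filter(regex='b_summoner', axis=1).columns.tolist()
print(loose)   # ['b_summoner1_kills', 'sub_summoner_note']

# Anchored with ^: matches only names that start with 'b_summoner'
strict = toy.filter(regex='^b_summoner', axis=1).columns.tolist()
print(strict)  # ['b_summoner1_kills']
```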

In [None]:
games.describe()

In [None]:
blue_stats.describe()

In [None]:
red_stats.describe()