# League of Legends Item Balancing: Further Work Edition
### Capstone Project 1: Data Wrangling

This is the same process as the original capstone project, only now on newer data.

Some data will be acquired from the Riot Games Static API: https://ddragonexplorer.com/cdn/. This doesn't require a login.

The rest will be acquired from the Riot Games API: https://developer.riotgames.com/api-methods/. This does require a login. If you have an account for League of Legends, that will work.

In [2]:
import requests
import json
import math
import numpy as np
import pandas as pd
from pandas.io.json import json_normalize
import time

In [308]:
# Do not store the API Key in a publicly available document :)
RIOT_API_KEY = ''

In [309]:
params = {'api_key': RIOT_API_KEY}
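
One way to follow the comment above and keep the key out of the notebook is to read it from an environment variable. This is only a minimal sketch and not part of the original workflow: the environment variable name `RIOT_API_KEY` and the helpers `load_api_key` and `build_params` are assumptions made for illustration.

```python
import os


def load_api_key(env_var='RIOT_API_KEY'):
    """Return the API key stored in an environment variable (hypothetical helper).

    Raises a clear error instead of silently sending an empty key.
    """
    key = os.environ.get(env_var, '')
    if not key:
        raise RuntimeError('Set the %s environment variable first' % env_var)
    return key


def build_params(api_key):
    # Same shape as the `params` dict used for the requests in this notebook
    return {'api_key': api_key}
```

With the variable set in the shell that launches the notebook, `params = build_params(load_api_key())` would take the place of the two cells above.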

## Champion Table 

This imports champion basic data: name, role, championId.

I don't end up using it, but it was good practice for extracting the data and manipulating the JSON.

The data was extracted from the static API / Data Dragon.

Data is from patch 9.8.1

In [3]:
champions_request = requests.get('https://ddragonexplorer.com/cdn/9.8.1/data/en_US/champion.json')
champions_json = champions_request.json()
champions_json['data']['Aatrox']

{u'blurb': u'Once honored defenders of Shurima against the Void, Aatrox and his brethren would eventually become an even greater threat to Runeterra, and were defeated only by cunning mortal sorcery. But after centuries of imprisonment, Aatrox was the first to find...',
 u'id': u'Aatrox',
 u'image': {u'full': u'Aatrox.png',
  u'group': u'champion',
  u'h': 48,
  u'sprite': u'champion0.png',
  u'w': 48,
  u'x': 0,
  u'y': 0},
 u'info': {u'attack': 8, u'defense': 4, u'difficulty': 4, u'magic': 3},
 u'key': u'266',
 u'name': u'Aatrox',
 u'partype': u'Blood Well',
 u'stats': {u'armor': 33,
  u'armorperlevel': 3.25,
  u'attackdamage': 60,
  u'attackdamageperlevel': 5,
  u'attackrange': 175,
  u'attackspeed': 0.651,
  u'attackspeedperlevel': 2.5,
  u'crit': 0,
  u'critperlevel': 0,
  u'hp': 580,
  u'hpperlevel': 80,
  u'hpregen': 8,
  u'hpregenperlevel': 0.75,
  u'movespeed': 345,
  u'mp': 0,
  u'mpperlevel': 0,
  u'mpregen': 0,
  u'mpregenperlevel': 0,
  u'spellblock': 32.1,
  u'spellblockp

I build the champion table here. The tricky part is that in order for the formatting of the index to work, I needed to set it after I built the rest of the table.

In [4]:
# Get index values / champion IDs
champions_idx = [str(key) for key in champions_json['data'].keys()]

# Rest of the df
champions_df = json_normalize(champions_json['data'].values())

# Set the index
champions_df.index = champions_idx
champions_df.index = champions_df.index.rename('champion_name')
champions_df.head()

champion_name,blurb,id,image.full,image.group,image.h,image.sprite,image.w,image.x,image.y,info.attack,...,stats.movespeed,stats.mp,stats.mpperlevel,stats.mpregen,stats.mpregenperlevel,stats.spellblock,stats.spellblockperlevel,tags,title,version
MonkeyKing,Wukong is a vastayan trickster who uses his st...,MonkeyKing,MonkeyKing.png,champion,48,champion2.png,48,96,48,8,...,345,265.84,38.0,8.042,0.65,32.1,1.25,"[Fighter, Tank]",the Monkey King,9.8.1
Jax,Unmatched in both his skill with unique armame...,Jax,Jax.png,champion,48,champion1.png,48,144,48,7,...,350,338.8,32.0,7.576,0.7,32.1,1.25,"[Fighter, Assassin]",Grandmaster at Arms,9.8.1
Kayn,A peerless practitioner of lethal shadow magic...,Kayn,Kayn.png,champion,48,champion1.png,48,192,96,10,...,340,410.0,50.0,11.5,0.95,32.1,1.25,"[Fighter, Assassin]",the Shadow Reaper,9.8.1
Shaco,Crafted long ago as a plaything for a lonely p...,Shaco,Shaco.png,champion,48,champion3.png,48,384,0,8,...,350,297.2,40.0,7.156,0.45,32.1,1.25,[Assassin],the Demon Jester,9.8.1
Warwick,Warwick is a monster who hunts the gray alleys...,Warwick,Warwick.png,champion,48,champion4.png,48,48,48,9,...,335,280.0,35.0,7.466,0.575,32.1,1.25,"[Fighter, Tank]",the Uncaged Wrath of Zaun,9.8.1
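
The dotted column names such as `info.attack` come from flattening the nested JSON. The following is a rough pure-Python sketch of that idea, not pandas' actual implementation; `flatten` is a hypothetical helper, and the sample values are copied from the Aatrox record shown earlier.

```python
def flatten(record, parent_key='', sep='.'):
    """Flatten nested dicts into one level with dotted keys (illustrative only)."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects such as 'info' or 'stats'
            flat.update(flatten(value, full_key, sep))
        else:
            # Non-dict values, including lists, are kept as single cells
            flat[full_key] = value
    return flat


aatrox = {
    'id': 'Aatrox',
    'key': '266',
    'info': {'attack': 8, 'defense': 4, 'difficulty': 4, 'magic': 3},
}
flat_aatrox = flatten(aatrox)
print(sorted(flat_aatrox))
```

Each flattened record becomes one row, which is why the champion name has to be attached as the index in a separate step.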


This is a nicer way to look at the more relevant parts of this dataframe.

In [5]:
champion_cols = ['name', 'id', 'key', 'tags', 'info.attack', 'info.defense', 'info.difficulty', 'info.magic']
champions_df_min = champions_df[champion_cols]
champions_df_min.head()

champion_name,name,id,key,tags,info.attack,info.defense,info.difficulty,info.magic
MonkeyKing,Wukong,MonkeyKing,62,"[Fighter, Tank]",8,5,3,2
Jax,Jax,Jax,24,"[Fighter, Assassin]",7,5,5,7
Kayn,Kayn,Kayn,141,"[Fighter, Assassin]",10,6,8,1
Shaco,Shaco,Shaco,35,[Assassin],8,4,9,6
Warwick,Warwick,Warwick,19,"[Fighter, Tank]",9,5,3,3


In [6]:
champions_df_min.to_csv('../data/champions_min_9.8.1.csv')

## Item Table

This imports the item data. I use several columns from it for exploratory data analysis and to help assign item names to otherwise unknown data points.

The request gets the JSON for all purchasable items, acquired from the static API / Data Dragon.

Here are two example items: Spectral Cutlass and Warden's Mail.

Data is from patch 9.8.1

In [14]:
items_request = requests.get('https://ddragonexplorer.com/cdn/9.8.1/data/en_US/item.json')
lol_items_json = items_request.json()
lol_items_json['data'].values()[5]

{u'colloq': u';lethality',
 u'depth': 3,
 u'description': u"<stats>+70 Attack Damage</stats><br><br><unique>UNIQUE Passive:</unique> +18 <a href='Lethality'>Lethality</a><br><unique>UNIQUE Active:</unique> Mark your current location. After 4 seconds, you will return to the marked location (60 second cooldown).",
 u'effect': {u'Effect1Amount': u'18',
  u'Effect2Amount': u'4',
  u'Effect3Amount': u'60'},
 u'from': [u'4003', u'3134'],
 u'gold': {u'base': 400, u'purchasable': True, u'sell': 2100, u'total': 3000},
 u'image': {u'full': u'4004.png',
  u'group': u'item',
  u'h': 48,
  u'sprite': u'item3.png',
  u'w': 48,
  u'x': 240,
  u'y': 0},
 u'maps': {u'10': False, u'11': False, u'12': False},
 u'name': u'Spectral Cutlass',
 u'plaintext': u'Marks the ground, and returns you there after a few moments',
 u'stats': {u'FlatPhysicalDamageMod': 70},
 u'tags': [u'Armor', u'Damage', u'NonbootsMovement', u'ArmorPenetration']}

In [19]:
lol_items_json['data'].values()[200]

{u'colloq': u';',
 u'depth': 2,
 u'description': u"<stats>+40 Armor</stats><br><br><unique>UNIQUE Passive - Cold Steel:</unique> When hit by basic attacks, reduces the attacker's Attack Speed by 15% for 1 seconds.",
 u'effect': {u'Effect1Amount': u'-0.15', u'Effect2Amount': u'1'},
 u'from': [u'1029', u'1029'],
 u'gold': {u'base': 400, u'purchasable': True, u'sell': 700, u'total': 1000},
 u'image': {u'full': u'3082.png',
  u'group': u'item',
  u'h': 48,
  u'sprite': u'item1.png',
  u'w': 48,
  u'x': 144,
  u'y': 96},
 u'into': [u'3110', u'3143', u'3075'],
 u'maps': {u'10': True, u'11': True, u'12': True},
 u'name': u"Warden's Mail",
 u'plaintext': u'Slows Attack Speed of enemy champions when receiving basic attacks',
 u'stats': {u'FlatArmorMod': 40},
 u'tags': [u'Armor', u'Slow']}

The tricky part for this dataframe was only wanting certain columns. I named them all out manually, both for their keys in the JSON and for what I wanted the dataframe columns to be called.

In [20]:
item_cols = ['name', 'description', 'consumed', 'gold.base', 'depth', 'maps.11', 'effect.Effect1Amount', 'effect.Effect2Amount',
             'effect.Effect3Amount', 'effect.Effect4Amount', 'effect.Effect5Amount','effect.Effect6Amount',
             'effect.Effect7Amount', 'effect.Effect8Amount', 'from', 'into', 'gold.purchasable', 'gold.total', 'requiredChampion',
             'specialRecipe', 'stacks', 'stats.FlatArmorMod', 'stats.FlatCritChanceMod', 'stats.FlatHPPoolMod',
             'stats.FlatHPRegenMod', 'stats.FlatMagicDamageMod', 'stats.FlatMovementSpeedMod', 'stats.FlatPhysicalDamageMod',
             'stats.FlatSpellBlockMod', 'stats.PercentAttackSpeedMod', 'stats.PercentLifeStealMod',
             'stats.PercentMovementSpeedMod', 'tags']
item_col_names = ['name', 'description', 'consumed', 'base_gold', 'depth', 'sr', 'effect1amount', 'effect2amount',
                 'effect3amount', 'effect4amount', 'effect5amount', 'effect6amount', 'effect7amount', 'effect8amount', 'from', 'into',
                 'gold_purchasable', 'total_gold', 'req_champion', 'special_recipe', 'stacks', 'flat_armor_mod',
                 'flat_crit_chance_mod', 'flat_hp_pool_mod', 'flat_hp_regen_mod', 'flat_magic_dmg_mod', 'flat_ms_mod',
                 'flat_phys_dmg_mod', 'flat_spellblock_mod', 'flat_pct_atk_speed_mod', 'pct_lifesteal_mod', 'pct_movespeed_mod',
                  'tags']
lol_items_df = json_normalize(data=lol_items_json['data'].values())[item_cols]
lol_items_df.columns = item_col_names
lol_items_df.head()

Unnamed: 0,name,description,consumed,base_gold,depth,sr,effect1amount,effect2amount,effect3amount,effect4amount,...,flat_hp_pool_mod,flat_hp_regen_mod,flat_magic_dmg_mod,flat_ms_mod,flat_phys_dmg_mod,flat_spellblock_mod,flat_pct_atk_speed_mod,pct_lifesteal_mod,pct_movespeed_mod,tags
0,Skirmisher's Sabre,<groupLimit>Limited to 1 Gold Income or Jungle...,,300,2.0,True,80.0,30.0,5,8.0,...,,,,,,,,,,"[LifeSteal, ManaRegen, OnHit, Jungle]"
1,Heart of Targon,<stats>+60 Health<br>+50% Base Health Regen <b...,,400,,False,200.0,10.0,5,0.0,...,60.0,,,,,,,,,"[Health, HealthRegen, Aura, GoldPer, Lane]"
2,Philosopher's Medallion,<stats>+10% Cooldown Reduction<br>+50% Base He...,,450,,False,50.0,10.0,5,0.0,...,,,,,,,,,,"[HealthRegen, ManaRegen, GoldPer, CooldownRedu..."
3,Salvation,<stats><font color='#FFFFFF'>+300 Health</font...,,0,4.0,True,0.1,10.0,20,0.1,...,300.0,,,,,,,,,"[Health, HealthRegen, ManaRegen, CooldownReduc..."
4,Ghost Poro,<subtitleLeft><font color='#FFFFFF'>(Trinket)<...,True,0,,False,240.0,3.5,42,,...,,,,,,,,,,"[Vision, Trinket, Active]"
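
Because the JSON keys and the dataframe column names live in two parallel lists, they can drift out of step without any error being raised. A small sanity check along these lines could catch that. It is only a sketch: the helper name `build_rename_map` is made up here, and the short lists stand in for the full `item_cols` and `item_col_names`.

```python
def build_rename_map(source_cols, new_names):
    """Pair each JSON column with its new name, refusing mismatched lists."""
    if len(source_cols) != len(new_names):
        raise ValueError('%d source columns but %d new names'
                         % (len(source_cols), len(new_names)))
    if len(set(new_names)) != len(new_names):
        raise ValueError('New column names must be unique')
    return dict(zip(source_cols, new_names))


# Shortened stand-ins for item_cols / item_col_names
sample_cols = ['name', 'gold.base', 'maps.11', 'stats.FlatArmorMod']
sample_names = ['name', 'base_gold', 'sr', 'flat_armor_mod']
rename_map = build_rename_map(sample_cols, sample_names)
print(rename_map['maps.11'])
```

A mapping like this could also be passed to `DataFrame.rename(columns=...)` instead of assigning `.columns` directly, which keeps each old and new name visibly paired.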


As with the champion dataframe, I need to set the index separately.

In [21]:
lol_items_df_idx = [str(key) for key in lol_items_json['data'].keys()]
lol_items_df.index = lol_items_df_idx
lol_items_df.index = lol_items_df.index.rename('item_id')

In [22]:
# Filter for only Summoner's Rift items
lol_items_df = lol_items_df[lol_items_df['sr'] == True].fillna(0)
lol_items_df = lol_items_df.sort_index()

# Here is what the data looks like now
lol_items_df.head()

item_id,name,description,consumed,base_gold,depth,sr,effect1amount,effect2amount,effect3amount,effect4amount,...,flat_hp_pool_mod,flat_hp_regen_mod,flat_magic_dmg_mod,flat_ms_mod,flat_phys_dmg_mod,flat_spellblock_mod,flat_pct_atk_speed_mod,pct_lifesteal_mod,pct_movespeed_mod,tags
1001,Boots of Speed,<groupLimit>Limited to 1 pair of boots.</group...,0,300,0.0,True,0,0,0,0,...,0.0,0.0,0.0,25.0,0.0,0.0,0.0,0.0,0.0,[Boots]
1004,Faerie Charm,<stats><mana>+25% Base Mana Regen </mana></stats>,0,125,0.0,True,0,0,0,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,[ManaRegen]
1006,Rejuvenation Bead,<stats>+50% Base Health Regen </stats>,0,150,0.0,True,0,0,0,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,[HealthRegen]
1011,Giant's Belt,<stats>+380 Health</stats>,0,600,2.0,True,0,0,0,0,...,380.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,[Health]
1018,Cloak of Agility,<stats>+20% Critical Strike Chance</stats>,0,800,0.0,True,0,0,0,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,[CriticalStrike]


This is what an item's data looks like from the dataframe.

In [23]:
lol_items_df.loc['2010', :]

name                                      Total Biscuit of Everlasting Will
description               <consumable>Click to Consume:</consumable> Res...
consumed                                                               True
base_gold                                                                75
depth                                                                     0
sr                                                                     True
effect1amount                                                            15
effect2amount                                                             0
effect3amount                                                             0
effect4amount                                                             0
effect5amount                                                             0
effect6amount                                                             0
effect7amount                                                             0
effect8amoun

In [24]:
lol_items_df.to_csv('../data/items_9.8.1.csv')

## Match Data Acquisition

I need random players from the Gold, Platinum, and Diamond leagues, plus random ranked games from their match histories. Then I get the data for those matches and boil it down with the same logic that I used above.

To start, I need to find the players in those leagues.

I can acquire players from any league / division with this request: /lol/league/v4/entries/{queue}/{tier}/{division}

I can get EVERY player in NA1, in every league / division that I want.

In [3]:
from itertools import product

queue = 'RANKED_SOLO_5x5'
leagues = ['GOLD', 'PLATINUM', 'DIAMOND']
divisions = ['IV', 'III', 'II', 'I']

leagues_and_divisions = list(product(leagues, divisions))
leagues_and_divisions

[('GOLD', 'IV'),
 ('GOLD', 'III'),
 ('GOLD', 'II'),
 ('GOLD', 'I'),
 ('PLATINUM', 'IV'),
 ('PLATINUM', 'III'),
 ('PLATINUM', 'II'),
 ('PLATINUM', 'I'),
 ('DIAMOND', 'IV'),
 ('DIAMOND', 'III'),
 ('DIAMOND', 'II'),
 ('DIAMOND', 'I')]

Each league / division returns many pages of players, so I will just have to keep requesting pages until I don't get any more.

In [13]:
req = 'https://na1.api.riotgames.com/lol/league/v4/entries/'\
        + queue + '/' + leagues[2] + '/' + divisions[3]
page = 1
req_w_page = req + "?page=" + str(page)
req_w_page

'https://na1.api.riotgames.com/lol/league/v4/entries/RANKED_SOLO_5x5/DIAMOND/I?page=1'
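
The "keep requesting pages until one comes back empty" idea can be sketched independently of the network. In the sketch below, `fetch_page` is a stand-in for the real HTTP call and `iter_player_pages` is a hypothetical helper; that an exhausted listing returns an empty list is an assumption, not something verified here.

```python
def iter_player_pages(fetch_page, max_pages=50):
    """Yield (page_number, players) until a page is empty or max_pages is reached.

    fetch_page(page_number) is expected to return a list of player dicts.
    """
    for page in range(1, max_pages + 1):
        players = fetch_page(page)
        if not players:
            # An empty page is treated as the end of the listing
            break
        yield page, players


# Fake data source with three pages, used in place of the real API
fake_pages = {
    1: [{'summonerId': 'a'}],
    2: [{'summonerId': 'b'}],
    3: [{'summonerId': 'c'}],
}
collected = [player
             for _, players in iter_player_pages(lambda p: fake_pages.get(p, []))
             for player in players]
print(len(collected))
```

Passing the fetch function in as an argument keeps the paging logic testable without an API key.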

Big loop to get summoner IDs

In [86]:
%%time
# n is how many pages to request per division before saving
# Each request gets about 200 players, I want 10k players per division
n = 50

for league, division in leagues_and_divisions:
    # Keep track of the loop
    print(league + ' ' + division)
    req = 'https://na1.api.riotgames.com/lol/league/v4/entries/'\
        + queue + '/' + league + '/' + division
    page = 0
    players_df_list = []
    
    # Emulate do-while loop
    while True:
        # Don't violate the rate limit of 100 req per 2 min
        time.sleep(1.2)
        page += 1
        
        # Set up req with page number
        req_w_page = req + "?page=" + str(page)
        players = requests.get(req_w_page, params=params)
        
        if players.status_code != 200:
            # Report the failure and don't store the error payload as player rows
            print(players.status_code)
        else:
            # An empty page would be a sign of no more data; it is harmless to concatenate
            players_df_list.append(json_normalize(players.json()))
        
        # After n pages, save this division and move on to the next one
        if page % n == 0:
            print("Page %f" % page)
            
            # Save the file for this league / division
            if players_df_list:
                player_df = pd.concat(players_df_list, ignore_index=True)
                player_df.to_csv('../data/players_' + league + '_' + division + '.csv', encoding='utf-8')
            break

GOLD IV
Page 50.000000
GOLD III
Page 50.000000
GOLD II
Page 50.000000
GOLD I
Page 50.000000
PLATINUM IV
Page 50.000000
PLATINUM III
Page 50.000000
PLATINUM II
Page 50.000000
PLATINUM I
Page 50.000000
DIAMOND IV
Page 50.000000
DIAMOND III
Page 50.000000
DIAMOND II
Page 50.000000
DIAMOND I
Page 50.000000
Wall time: 14min 5s


Big loop to obtain account IDs, then match histories.

The match histories provide the match IDs, which a final request uses to fetch the actual match data.

If there are 10000 players per division, then this should take ~7 hours.
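
The ~7 hour figure follows from the rate limit. As a worked check, assuming the limit of 100 requests per 2 minutes quoted in the player loop above and two requests per player (one for the account, one for the match history):

```python
REQUESTS_PER_WINDOW = 100      # rate limit quoted earlier in this notebook
WINDOW_SECONDS = 120
REQUESTS_PER_PLAYER = 2        # summoner lookup + match history lookup
PLAYERS_PER_DIVISION = 10000   # target from the player-collection loop

seconds_per_request = WINDOW_SECONDS / float(REQUESTS_PER_WINDOW)
seconds_per_player = seconds_per_request * REQUESTS_PER_PLAYER
hours_per_division = PLAYERS_PER_DIVISION * seconds_per_player / 3600.0

print('%.1f s per request, %.1f s per player, %.1f h per division'
      % (seconds_per_request, seconds_per_player, hours_per_division))
```

This only counts the deliberate pauses, so it is a lower bound on wall time: it ignores how long each request itself takes and any extra waiting after errors.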

In [178]:
def get_match_histories(league, division):
    
    # Set up data structures
    players_df = pd.DataFrame()
    match_list = []

    # For each league / division, import, get match histories, concatenate
    filename = '../data/players_' + league + '_' + division + '.csv'
    players_df = pd.read_csv(filename, index_col=0).drop_duplicates(subset=['summonerId'])

    # Get account ID, then match history
    for summonerId in players_df.summonerId:
        time.sleep(2.4)
        
        # Get account ID
        acct_req = 'https://na1.api.riotgames.com/lol/summoner/v4/summoners/' + summonerId
        account = requests.get(acct_req, params=params)
        
        if account.status_code != 200:
            print("Account Request Error %d" % account.status_code)
            
            if account.status_code == 503:
                time.sleep(10)
                continue
            
            else:
                break
        
        # Get match history
        match_history_req = 'https://na1.api.riotgames.com/lol/match/v4/matchlists/by-account/' + \
            account.json()['accountId'] + '?queue=420&beginIndex=0'
        match_history = requests.get(match_history_req, params=params)
        
        if match_history.status_code != 200:
            print("Match History Request Error %d" % match_history.status_code)
            
            if match_history.status_code == 503:
                time.sleep(10)
                continue
            
            else:
                break
            
        if 'matches' not in match_history.json().keys():
            continue
            
        match_list.append(json_normalize(match_history.json()['matches']))
    
    # Concatenate and save file
    match_filename = '../data/matchlist_' + league + '_' + division + '.csv'
    match_list_df = pd.concat(match_list, ignore_index=True).drop_duplicates(subset=['gameId'])
    match_list_df.to_csv(match_filename)
    
    # Output
    print("%d matches" % len(match_list_df))
    print("Completed match history for " + league + " " + division)

In [141]:
account.json()

{u'accountId': u'3ekz5D9CXBReTkfcLF_4AJjd2FDDjjmtR09AivVCo2Mdq_GgWxcz6XBp',
 u'id': u'riwzZKHF5eBFDPQJ3K9SjFvmML5L-PXCweqr6CdAOnLlyegJ',
 u'name': u'Devil',
 u'profileIconId': 1453,
 u'puuid': u'T3QOBNhoKAGmqwHilMui_cwF9lNWkOpWoxFvcHHE8Y9BmRfSBibPp9dr1i3EhE3Xsr4Qv9_nfG0I6A',
 u'revisionDate': 1558241554000L,
 u'summonerLevel': 65}

In [145]:
players_df.summonerId.head()

0    QR20MkwwNnqbC8YJs43-sQVVruRh1__OoeEq_mbgjanASbID
1    vuV1GcxAXz2y5z3tgbGwimzwsglHJX0K2qzaAofUtedhOVOy
2    s1f9kEW7rO_s8UiP7CvRRkwYov721hrRMCmAw-13o_-fr1pu
3    WfLVGWbbplJutYN426PHcXFRrRoelDY3WxJ6JN0OOlidG5N8
4    riwzZKHF5eBFDPQJ3K9SjFvmML5L-PXCweqr6CdAOnLlyegJ
Name: summonerId, dtype: object

In [177]:
match_list_df.head()

Unnamed: 0,champion,gameId,lane,platformId,queue,role,season,timestamp
0,61,3042627899,MID,NA1,420,SOLO,13,1557871660761
1,39,3042640476,MID,NA1,420,SOLO,13,1557869435152
2,7,3042593498,MID,NA1,420,SOLO,13,1557866982096
3,13,3042609850,NONE,NA1,420,DUO_SUPPORT,13,1557865783108
4,39,3040735766,MID,NA1,420,SOLO,13,1557678704696


In [179]:
len(match_list_df)

500

In [180]:
len(match_list_df.drop_duplicates(subset=['gameId']))

500

The first round of data acquisition had some 503 errors. That is 'Service Unavailable' and completely out of my control.

The Gold I matches went smoothly until the API key expired, but I think 263k matches is enough, since I only want 1k from each division.
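
In `get_match_histories`, a 503 triggers a pause and then moves on to the next player, so that player's history is skipped. A possible alternative is to retry the same request a few times before giving up. The sketch below is an illustration rather than the code that produced the results here: `get_with_retry` and its parameters are hypothetical, and `send` stands in for something like `lambda: requests.get(url, params=params)`.

```python
import time


def get_with_retry(send, retry_statuses=(500, 503, 504), max_attempts=3, wait_seconds=10):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any zero-argument callable returning an object with .status_code.
    """
    response = None
    for attempt in range(1, max_attempts + 1):
        response = send()
        if response.status_code not in retry_statuses:
            break
        if attempt < max_attempts:
            # Transient server-side error: pause, then try the same request again
            time.sleep(wait_seconds)
    return response
```

The caller still has to check the returned status code, since the last attempt may also have failed.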

In [149]:
%%time
get_match_histories(leagues[0], divisions[0])
#get_match_histories(leagues[0], divisions[1])
#get_match_histories(leagues[0], divisions[2])

Match History Request Error 503
Completed match history for GOLD IV
Wall time: 11min


In [153]:
get_match_histories(leagues[0], divisions[1])

Match History Request Error 429
Completed match history for GOLD III


In [160]:
get_match_histories(leagues[0], divisions[2])
get_match_histories(leagues[0], divisions[3])

Match History Request Error 503
34504 matches
Completed match history for GOLD II
Match History Request Error 403
263201 matches
Completed match history for GOLD I


I'll keep the Gold I matches as they are but rerun Gold II, III, and IV match history collection.

In [169]:
%%time
get_match_histories(leagues[0], divisions[0])

Match History Request Error 503
Match History Request Error 503
Match History Request Error 503
Match History Request Error 504
803839 matches
Completed match history for GOLD IV
Wall time: 7h 9min 10s


In [166]:
%%time
get_match_histories(leagues[0], divisions[1])

Match History Request Error 503
699840 matches
Completed match history for GOLD III
Wall time: 5h 37min 21s


In [167]:
%%time
get_match_histories(leagues[0], divisions[2])

Match History Request Error 503
319759 matches
Completed match history for GOLD II
Wall time: 2h 53min 14s


Platinum Match Histories

In [170]:
%%time
get_match_histories(leagues[1], divisions[0])

Match History Request Error 503
Account Request Error 500
311029 matches
Completed match history for PLATINUM IV
Wall time: 2h 33min 31s


In [171]:
%%time
get_match_histories(leagues[1], divisions[1])

Match History Request Error 503
Match History Request Error 503
Match History Request Error 403
469265 matches
Completed match history for PLATINUM III
Wall time: 3h 41min 39s


In [174]:
%%time
get_match_histories(leagues[1], divisions[2])

Match History Request Error 503
Match History Request Error 503
Match History Request Error 429
397366 matches
Completed match history for PLATINUM II
Wall time: 3h 12min 30s


In [175]:
%%time
get_match_histories(leagues[1], divisions[3])

Match History Request Error 503
Match History Request Error 503
Match History Request Error 503
Match History Request Error 503
Match History Request Error 500
125655 matches
Completed match history for PLATINUM I
Wall time: 1h 2min 29s


Diamond Match Histories

I was having a problem where leagues[2] divisions[1] (Diamond III) was hitting a 429 error around 11 or 12 minutes in. A 429 means 'Too Many Requests' (the rate limit was exceeded), not 'Forbidden'. An expired API key shows up as a 403 instead, and I know the key was not expired here. I wondered whether there was an account I'm not allowed to look at, although a 429 points to rate limiting rather than permissions.

Waiting for the API key to expire and getting a new one helped. The 429 error popped up much later for Diamond III and once for Diamond I, and not at all for Diamond IV or Diamond II.
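
Rate-limited APIs commonly tell the client how long to wait through a `Retry-After` header on the 429 response. Assuming the Riot API does so here (an assumption, not something confirmed in this notebook), the pause could be derived from the response instead of being fixed. `seconds_to_wait` is a hypothetical helper, and the headers are passed as a plain dict so the sketch runs without network access.

```python
def seconds_to_wait(status_code, headers, default_wait=10):
    """Suggest a pause before retrying, based on the status code and headers."""
    if status_code == 429:
        # Rate limited: prefer the server's hint when it is present and numeric
        retry_after = headers.get('Retry-After')
        if retry_after is not None and str(retry_after).isdigit():
            return int(retry_after)
        return default_wait
    if status_code in (500, 503, 504):
        # Server-side trouble: fall back to a fixed pause
        return default_wait
    # Success, or an error such as 403 where waiting will not help
    return 0


print(seconds_to_wait(429, {'Retry-After': '37'}))
```

With `requests`, the same function could be called as `seconds_to_wait(response.status_code, response.headers)`.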

In [181]:
%%time
get_match_histories(leagues[2], divisions[0])

Match History Request Error 503
Match History Request Error 503
Match History Request Error 503
Match History Request Error 503
Match History Request Error 503
Match History Request Error 503
Match History Request Error 503
Match History Request Error 504
688216 matches
Completed match history for DIAMOND IV
Wall time: 7h 48min 42s


In [192]:
%%time
get_match_histories(leagues[2], divisions[1])

Match History Request Error 503
Match History Request Error 503
Match History Request Error 503
Match History Request Error 503
Match History Request Error 503
Match History Request Error 429
333816 matches
Completed match history for DIAMOND III
Wall time: 4h 4min 54s


In [193]:
%%time
get_match_histories(leagues[2], divisions[2])

Match History Request Error 500
139534 matches
Completed match history for DIAMOND II
Wall time: 1h 32min 40s


In [184]:
%%time
get_match_histories(leagues[2], divisions[3])

Match History Request Error 503
Match History Request Error 429
93045 matches
Completed match history for DIAMOND I
Wall time: 1h 6min 35s


### Select 1000 matches from the Available Data

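Repeat this for each league and division.

I only want matches that lasted at least 10 minutes, but the match list does not include the game length, so I can't check that here. Instead, I sample 1200 matches from each division and filter on duration in the final request, which should leave at least 1000.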

In [294]:
match_history_df.head()

Unnamed: 0,champion,gameId,lane,platformId,queue,role,season,timestamp
0,142,3046796305,MID,NA1,420,SOLO,13,1558322465062
1,25,3046775526,MID,NA1,420,SOLO,13,1558320478569
2,164,3046745800,TOP,NA1,420,SOLO,13,1558318166083
3,142,3046529920,MID,NA1,420,SOLO,13,1558299044778
4,99,3046495669,NONE,NA1,420,DUO,13,1558297544206


In [306]:
def get_match_samples(league, division):
    filepath = '../data/matchlist_' + league + '_' + division + '.csv'
    match_history_df = pd.read_csv(filepath, index_col=0).drop_duplicates(subset=['gameId'])
    match_sample_df = match_history_df.sample(n=1200, random_state=6)
    match_sample_df.to_csv('../data/matches_sample_' + league + '_' + division + '.csv')
    print("Samples obtained for %s %s" % (league, division))
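
Note that pandas' `sample` raises an error when a file holds fewer rows than requested. That was not an issue with the file sizes here, but a more defensive version of the same idea could look like the sketch below. It uses the standard library instead of pandas so it runs on its own; `sample_game_ids` is a hypothetical helper, and the seed of 6 simply mirrors `random_state=6` above without reproducing the same draw.

```python
import random


def sample_game_ids(game_ids, n=1200, seed=6):
    """Draw a reproducible random sample of unique game IDs."""
    unique_ids = sorted(set(game_ids))
    if len(unique_ids) <= n:
        # Not enough matches to sample from: keep them all
        return unique_ids
    return random.Random(seed).sample(unique_ids, n)
```

Sorting before sampling makes the result depend only on the set of IDs and the seed, not on the order in which the IDs were collected.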

In [307]:
%%time
for league, division in leagues_and_divisions:
    get_match_samples(league, division)

Samples obtained for GOLD IV
Samples obtained for GOLD III
Samples obtained for GOLD II
Samples obtained for GOLD I
Samples obtained for PLATINUM IV
Samples obtained for PLATINUM III
Samples obtained for PLATINUM II
Samples obtained for PLATINUM I
Samples obtained for DIAMOND IV
Samples obtained for DIAMOND III
Samples obtained for DIAMOND II
Samples obtained for DIAMOND I
Wall time: 4.95 s


### To get matches themselves

Filter on match length to make sure all the games last at least 10 minutes.

In [310]:
match_data_request = 'https://na1.api.riotgames.com/lol/match/v4/matches/'

In [311]:
def get_matches(league, division):
    
    input_filepath = '../data/matches_sample_' + league + '_' + division + '.csv'
    output_filepath = '../data/matches_' + league + '_' + division + '.csv'
    match_sample_df = pd.read_csv(input_filepath, index_col=0)
    
    match_data_list = []
    
    for game_id in match_sample_df.gameId:
        
        time.sleep(1.2)
        
        match_data = requests.get(match_data_request + str(game_id), params=params)
        match_json = match_data.json()
        
        # If the request didn't work properly, move on
        if 'participants' not in match_json:
            continue
        
        # Ignore games that are under 10 minutes (gameDuration is in seconds)
        if match_json['gameDuration'] < 600:
            continue
        
        match_data_df = json_normalize(match_json['participants']).sort_values('participantId')
        match_data_list.append(match_data_df)
        
    matches_df = pd.concat(match_data_list, axis='index', ignore_index=True, sort=False)
    matches_df.to_csv(output_filepath)
    print('%d Matches Collected for %s %s' % (len(matches_df)/10, league, division))

In [312]:
%%time
for league, division in leagues_and_divisions:
    get_matches(league, division)

1165 Matches Collected for GOLD IV
1167 Matches Collected for GOLD III
1154 Matches Collected for GOLD II
1165 Matches Collected for GOLD I
1173 Matches Collected for PLATINUM IV
1177 Matches Collected for PLATINUM III
1175 Matches Collected for PLATINUM II
1135 Matches Collected for PLATINUM I
1173 Matches Collected for DIAMOND IV
1174 Matches Collected for DIAMOND III
1178 Matches Collected for DIAMOND II
1180 Matches Collected for DIAMOND I
Wall time: 5h 44min 11s
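
As a quick check that sampling 1200 matches per division left enough after filtering, the counts printed above can be compared against the target of 1000. The numbers are copied from the output; "dropped" here covers both failed requests and games under 10 minutes, since the loop skips both.

```python
SAMPLED_PER_DIVISION = 1200
TARGET_PER_DIVISION = 1000

# Counts copied from the output above
matches_collected = {
    'GOLD IV': 1165, 'GOLD III': 1167, 'GOLD II': 1154, 'GOLD I': 1165,
    'PLATINUM IV': 1173, 'PLATINUM III': 1177, 'PLATINUM II': 1175, 'PLATINUM I': 1135,
    'DIAMOND IV': 1173, 'DIAMOND III': 1174, 'DIAMOND II': 1178, 'DIAMOND I': 1180,
}

smallest_division = min(matches_collected, key=matches_collected.get)
largest_loss = SAMPLED_PER_DIVISION - matches_collected[smallest_division]

# Every division should still clear the target after filtering
assert all(kept >= TARGET_PER_DIVISION for kept in matches_collected.values())
print('%s kept the fewest matches: %d (%d of %d sampled were dropped)'
      % (smallest_division, matches_collected[smallest_division],
         largest_loss, SAMPLED_PER_DIVISION))
```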


Summing the counts above, this project has 14,016 matches across 12 ranked divisions.

I also have a few million match IDs if I want to expand it.

### Make Sure Data is Clean

In [313]:
def check_column_number(league, division):
    
    filepath = '../data/matches_' + league + '_' + division + '.csv'
    match_df = pd.read_csv(filepath, index_col=0)
    
    print('%s %s df has %d columns.' % (league, division, len(match_df.columns)))

In [314]:
for league, division in leagues_and_divisions:
    check_column_number(league, division)

GOLD IV df has 143 columns.
GOLD III df has 143 columns.
GOLD II df has 143 columns.
GOLD I df has 143 columns.
PLATINUM IV df has 143 columns.
PLATINUM III df has 143 columns.
PLATINUM II df has 143 columns.
PLATINUM I df has 143 columns.
DIAMOND IV df has 143 columns.
DIAMOND III df has 143 columns.
DIAMOND II df has 141 columns.
DIAMOND I df has 143 columns.

Diamond 2, why you gotta be like that?

The columns missing are masteries and runes, which I don't need right now.

### Delete Them

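At some point, these two columns should be removed from every division so that all the files share the same 141 columns.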

In [2]:
def remove_columns(league, division, columns):
    input_filepath = '../data/matches_' + league + '_' + division + '.csv'
    output_filepath =  '../data/matches_' + league + '_' + division + '.csv'
    
    df = pd.read_csv(input_filepath, index_col=0)
    df = df.drop(columns=columns)
    df.to_csv(output_filepath)
    
    print("Columns removed for %s %s" % (league, division))

In [5]:
for league, division in leagues_and_divisions:
    remove_columns(league, division, ['masteries', 'runes'])

Columns removed for GOLD IV
Columns removed for GOLD III
Columns removed for GOLD II
Columns removed for GOLD I
PLATINUM IV placeholder
Columns removed for PLATINUM III
Columns removed for PLATINUM II
Columns removed for PLATINUM I
Columns removed for DIAMOND IV
Columns removed for DIAMOND III


KeyError: "['masteries' 'runes'] not found in axis"

In [7]:
remove_columns(leagues[2], divisions[3], ['masteries', 'runes'])

Columns removed for DIAMOND I
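
The `KeyError` above comes from Diamond II, whose file never had the `masteries` and `runes` columns. A variant that tolerates already-missing columns could use the `errors='ignore'` option of `DataFrame.drop`. This is a sketch on toy frames rather than the project files, and `drop_if_present` is a hypothetical name.

```python
import pandas as pd


def drop_if_present(df, columns):
    """Drop the given columns, silently skipping any that are absent."""
    return df.drop(columns=columns, errors='ignore')


with_extras = pd.DataFrame({'championId': [54], 'masteries': [None], 'runes': [None]})
without_extras = pd.DataFrame({'championId': [104]})

print(list(drop_if_present(with_extras, ['masteries', 'runes']).columns))
print(list(drop_if_present(without_extras, ['masteries', 'runes']).columns))
```

With this variant the loop over all divisions could run to completion without a separate call for Diamond I.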


### Finish Data Cleaning Checks

In [315]:
diamond_1 = pd.read_csv('../data/matches_DIAMOND_I.csv', index_col=0)
diamond_2 = pd.read_csv('../data/matches_DIAMOND_II.csv', index_col=0)
set(diamond_1.columns).difference(diamond_2.columns)

{'masteries', 'runes'}

In [10]:
def check_match_data_quality(league, division):
    
    filepath = '../data/matches_' + league + '_' + division + '.csv'
    match_df = pd.read_csv(filepath, index_col=0)
    
    # Checks
    assert type(match_df.index) == pd.core.indexes.numeric.Int64Index
    assert len(match_df.columns) == 141
    
    # Done
    print('%s %s looks good!' % (league, division))

In [11]:
for league, division in leagues_and_divisions:
    check_match_data_quality(league, division)

GOLD IV looks good!
GOLD III looks good!
GOLD II looks good!
GOLD I looks good!
PLATINUM IV looks good!
PLATINUM III looks good!
PLATINUM II looks good!
PLATINUM I looks good!
DIAMOND IV looks good!
DIAMOND III looks good!
DIAMOND II looks good!
DIAMOND I looks good!


Several of the boolean columns are listed as type 'object' because they have NaN values. However, NaN values make sense in many cases. For example, there is a column for getting an assist on First Blood. If First Blood happens, then these will be true or false. But if First Blood doesn't happen, then it is inaccurate to say that you 'didn't' help with First Blood, because First Blood didn't happen. Thus, NaN.
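
The effect is easy to reproduce on a toy column. The sketch below uses made-up values, not rows from the match files: mixing booleans with a missing value yields the `object` dtype, and any rate computed from the column has to decide what to do with the missing entries.

```python
import pandas as pd

# True / False when First Blood happened, missing when it never did (made-up values)
first_blood_assist = pd.Series([True, False, None, True])

print(first_blood_assist.dtype)          # mixed bools and a missing value
print(first_blood_assist.isna().sum())   # matches with no First Blood at all

# Assist rate among matches where First Blood actually happened
assist_rate = first_blood_assist.dropna().astype(bool).mean()
print(assist_rate)
```

Dropping the missing rows before averaging keeps "First Blood never happened" from being counted as "did not assist".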

However, I am realizing that I need to filter for matches that take at least 10-15 minutes. This may fix some other things.

In [281]:
matches_df = pd.read_csv('../data/matches_GOLD_I.csv', index_col=0)
matches_df.head()

Unnamed: 0,championId,highestAchievedSeasonTier,participantId,spell1Id,spell2Id,stats.assists,stats.champLevel,stats.combatPlayerScore,stats.damageDealtToObjectives,stats.damageDealtToTurrets,...,timeline.xpDiffPerMinDeltas.30-end,timeline.xpPerMinDeltas.0-10,timeline.xpPerMinDeltas.10-20,timeline.xpPerMinDeltas.20-30,timeline.xpPerMinDeltas.30-end,stats.statPerk0,stats.statPerk1,stats.statPerk2,masteries,runes
0,54,GOLD,1,12,4,10,17,0,1759,133,...,-25.4,359.3,548.2,476.1,566.8,,,,,
1,104,PLATINUM,2,4,11,13,17,0,33578,3332,...,176.2,341.4,494.4,561.0,562.4,,,,,
2,16,DIAMOND,3,21,4,19,14,0,1589,645,...,136.9,292.6,287.5,309.0,573.8,,,,,
3,21,GOLD,4,7,4,9,17,0,21074,7529,...,136.9,314.1,389.4,543.1,664.0,,,,,
4,90,PLATINUM,5,4,12,14,17,0,6767,2380,...,143.0,434.3,489.6,489.2,594.2,,,,,


In [282]:
matches_df.iloc[:,14:16].head()

Unnamed: 0,stats.firstBloodKill,stats.firstInhibitorAssist
0,True,True
1,False,False
2,False,True
3,False,False
4,False,False


In [283]:
matches_df.describe()

Unnamed: 0,championId,participantId,spell1Id,spell2Id,stats.assists,stats.champLevel,stats.combatPlayerScore,stats.damageDealtToObjectives,stats.damageDealtToTurrets,stats.damageSelfMitigated,...,timeline.xpDiffPerMinDeltas.10-20,timeline.xpDiffPerMinDeltas.20-30,timeline.xpDiffPerMinDeltas.30-end,timeline.xpPerMinDeltas.0-10,timeline.xpPerMinDeltas.10-20,timeline.xpPerMinDeltas.20-30,timeline.xpPerMinDeltas.30-end,stats.statPerk0,stats.statPerk1,stats.statPerk2
count,9810.0,9810.0,9810.0,9810.0,9810.0,9810.0,9810.0,9810.0,9810.0,9810.0,...,7144.0,3206.0,1198.0,9680.0,9020.0,3960.0,1440.0,8560.0,8560.0,8560.0
mean,117.49735,5.5,7.760449,7.572477,7.749847,13.781142,0.0,8629.141692,2927.53578,15406.63,...,-1.432225e-16,2.216291e-16,-8.066261e-16,353.562025,451.207494,503.039899,539.6075,5006.641939,5006.943341,5001.744977
std,119.580816,2.872428,4.443332,4.42655,5.419238,2.716553,0.0,9261.886909,3081.858623,25905.73,...,128.3471,166.3371,269.468,78.654608,102.899345,120.313196,179.04323,1.40033,2.244218,0.707172
min,1.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,...,-801.7,-573.9,-955.8,0.0,0.0,0.0,0.0,5005.0,5002.0,5001.0
25%,37.0,3.0,4.0,4.0,4.0,12.0,0.0,2066.25,573.0,6867.25,...,-81.425,-108.975,-203.775,290.875,381.775,421.275,410.95,5005.0,5008.0,5001.0
50%,83.0,5.5,4.0,4.0,7.0,14.0,0.0,5480.5,2034.5,11488.0,...,0.0,0.0,0.0,341.6,449.3,498.4,524.25,5007.0,5008.0,5002.0
75%,143.0,8.0,12.0,12.0,11.0,16.0,0.0,12209.75,4356.75,19280.5,...,81.425,108.975,203.775,419.525,519.1,578.55,654.7,5008.0,5008.0,5002.0
max,555.0,10.0,21.0,21.0,41.0,18.0,0.0,95297.0,40649.0,2250337.0,...,801.7,573.9,955.8,595.2,910.1,1053.3,1157.6,5008.0,5008.0,5003.0


In [284]:
matches_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9810 entries, 0 to 9809
Columns: 143 entries, championId to runes
dtypes: bool(1), float64(57), int64(74), object(11)
memory usage: 10.7+ MB


In [285]:
type(matches_df.index)

pandas.core.indexes.numeric.Int64Index

In [286]:
len(matches_df.columns)

143