# League of Legends Item Balancing: Further Work Edition
### Capstone Project 1: Data Wrangling

This is the same process as the original capstone project, only now on newer data.

Some data will be acquired from the Riot Games Static API: https://ddragonexplorer.com/cdn/. This doesn't require a login.

The rest will be acquired from the Riot Games API: https://developer.riotgames.com/api-methods/. This does require a login. If you have an account for League of Legends, that will work.

In [32]:
import requests
import json
import math
import numpy as np
import pandas as pd
from pandas.io.json import json_normalize
import time

In [62]:
# Do not store the API Key in a publicly available document :)
RIOT_API_KEY = ''

In [63]:
params = {'api_key': RIOT_API_KEY}

## Champion Table 

This imports champion basic data: name, role, championId.

I don't end up using it, but it was good practice for extracting the data and manipulating the JSON.

The data was extracted from the static API / Data Dragon.

Data is from patch 9.8.1

In [4]:
champions_request = requests.get('https://ddragonexplorer.com/cdn/9.8.1/data/en_US/champion.json')
champions_json = champions_request.json()
champions_json['data']['Aatrox']

{u'blurb': u'Once honored defenders of Shurima against the Void, Aatrox and his brethren would eventually become an even greater threat to Runeterra, and were defeated only by cunning mortal sorcery. But after centuries of imprisonment, Aatrox was the first to find...',
 u'id': u'Aatrox',
 u'image': {u'full': u'Aatrox.png',
  u'group': u'champion',
  u'h': 48,
  u'sprite': u'champion0.png',
  u'w': 48,
  u'x': 0,
  u'y': 0},
 u'info': {u'attack': 8, u'defense': 4, u'difficulty': 4, u'magic': 3},
 u'key': u'266',
 u'name': u'Aatrox',
 u'partype': u'Blood Well',
 u'stats': {u'armor': 33,
  u'armorperlevel': 3.25,
  u'attackdamage': 60,
  u'attackdamageperlevel': 5,
  u'attackrange': 175,
  u'attackspeed': 0.651,
  u'attackspeedperlevel': 2.5,
  u'crit': 0,
  u'critperlevel': 0,
  u'hp': 580,
  u'hpperlevel': 80,
  u'hpregen': 8,
  u'hpregenperlevel': 0.75,
  u'movespeed': 345,
  u'mp': 0,
  u'mpperlevel': 0,
  u'mpregen': 0,
  u'mpregenperlevel': 0,
  u'spellblock': 32.1,
  u'spellblockp

I build the champion table here. The tricky part is that in order for the formatting of the index to work, I needed to set it after I built the rest of the table.

In [5]:
# Get index values / champion IDs
champions_idx = [str(key) for key in champions_json['data'].keys()]

# Rest of the df
champions_df = json_normalize(champions_json['data'].values())

# Set the index
champions_df.index = champions_idx
champions_df.index = champions_df.index.rename('champion_name')
champions_df.head()

champion_name,blurb,id,image.full,image.group,image.h,image.sprite,image.w,image.x,image.y,info.attack,...,stats.movespeed,stats.mp,stats.mpperlevel,stats.mpregen,stats.mpregenperlevel,stats.spellblock,stats.spellblockperlevel,tags,title,version
MonkeyKing,Wukong is a vastayan trickster who uses his st...,MonkeyKing,MonkeyKing.png,champion,48,champion2.png,48,96,48,8,...,345,265.84,38.0,8.042,0.65,32.1,1.25,"[Fighter, Tank]",the Monkey King,9.8.1
Jax,Unmatched in both his skill with unique armame...,Jax,Jax.png,champion,48,champion1.png,48,144,48,7,...,350,338.8,32.0,7.576,0.7,32.1,1.25,"[Fighter, Assassin]",Grandmaster at Arms,9.8.1
Kayn,A peerless practitioner of lethal shadow magic...,Kayn,Kayn.png,champion,48,champion1.png,48,192,96,10,...,340,410.0,50.0,11.5,0.95,32.1,1.25,"[Fighter, Assassin]",the Shadow Reaper,9.8.1
Shaco,Crafted long ago as a plaything for a lonely p...,Shaco,Shaco.png,champion,48,champion3.png,48,384,0,8,...,350,297.2,40.0,7.156,0.45,32.1,1.25,[Assassin],the Demon Jester,9.8.1
Warwick,Warwick is a monster who hunts the gray alleys...,Warwick,Warwick.png,champion,48,champion4.png,48,48,48,9,...,335,280.0,35.0,7.466,0.575,32.1,1.25,"[Fighter, Tank]",the Uncaged Wrath of Zaun,9.8.1
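
The dotted column names in the table above (for example `info.attack`) come from flattening the nested champion JSON. As a rough illustration only, here is a standard-library sketch of that flattening idea. The `flatten` helper and the abbreviated `aatrox` record are hypothetical stand-ins, not what `json_normalize` does internally.

```python
def flatten(record, parent_key='', sep='.'):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the dotted prefix along
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat

# Toy record shaped like one entry of champions_json['data'] (values abbreviated)
aatrox = {'id': 'Aatrox', 'key': '266', 'info': {'attack': 8, 'defense': 4}}
row = flatten(aatrox)
print(sorted(row.keys()))  # ['id', 'info.attack', 'info.defense', 'key']
```

Each flattened dict corresponds to one row of the dataframe, which is why the champion name has to be attached as the index separately.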


This is a nicer way to look at the more relevant parts of this dataframe.

In [6]:
champion_cols = ['name', 'id', 'key', 'tags', 'info.attack', 'info.defense', 'info.difficulty', 'info.magic']
champions_df_min = champions_df[champion_cols]
champions_df_min.head()

champion_name,name,id,key,tags,info.attack,info.defense,info.difficulty,info.magic
MonkeyKing,Wukong,MonkeyKing,62,"[Fighter, Tank]",8,5,3,2
Jax,Jax,Jax,24,"[Fighter, Assassin]",7,5,5,7
Kayn,Kayn,Kayn,141,"[Fighter, Assassin]",10,6,8,1
Shaco,Shaco,Shaco,35,[Assassin],8,4,9,6
Warwick,Warwick,Warwick,19,"[Fighter, Tank]",9,5,3,3


## Item Table

This imports the item data. I use several columns from it for exploratory data analysis and to help assign item names to otherwise unknown data points.

The request gets the JSON for all purchasable items, acquired from the static API / Data Dragon.

Here is an example item: Targon's Brace.

Data is from patch 9.8.1

In [7]:
items_request = requests.get('https://ddragonexplorer.com/cdn/9.8.1/data/en_US/item.json')
lol_items_json = items_request.json()
lol_items_json['data'].values()[1]

{u'colloq': u";Targon's Brace;Relic Shield;Support",
 u'description': u'<stats>+60 Health<br>+50% Base Health Regen <br>+5 Gold per 10 seconds </stats><br><br><unique>UNIQUE Passive - Spoils of War:</unique> Melee basic attacks execute minions below 200 (+40 per level) Health. Killing a minion heals the owner and the nearest allied champion for 10 to 60 (based on missing health) and grants them kill Gold. 50% healing if the owner is ranged. These effects require a nearby ally. Recharges every 20 seconds. Max 3 charges.<br><br><groupLimit>Limited to 1 Starter item.</groupLimit>',
 u'effect': {u'Effect10Amount': u'0',
  u'Effect11Amount': u'5000',
  u'Effect12Amount': u'20',
  u'Effect13Amount': u'3',
  u'Effect14Amount': u'0',
  u'Effect15Amount': u'40',
  u'Effect16Amount': u'60',
  u'Effect17Amount': u'0.5',
  u'Effect18Amount': u'10',
  u'Effect1Amount': u'200',
  u'Effect2Amount': u'10',
  u'Effect3Amount': u'5',
  u'Effect4Amount': u'0',
  u'Effect5Amount': u'0',
  u'Effect6Amount'

The tricky part for this dataframe was that I only wanted certain columns. I listed them all manually, both their keys in the JSON and the names I wanted the dataframe columns to have.

In [8]:
item_cols = ['name', 'description', 'consumed', 'gold.base', 'depth', 'maps.11', 'effect.Effect1Amount', 'effect.Effect2Amount',
             'effect.Effect3Amount', 'effect.Effect4Amount', 'effect.Effect5Amount','effect.Effect6Amount',
             'effect.Effect7Amount', 'effect.Effect8Amount', 'from', 'gold.purchasable', 'gold.total', 'requiredChampion',
             'specialRecipe', 'stacks', 'stats.FlatArmorMod', 'stats.FlatCritChanceMod', 'stats.FlatHPPoolMod',
             'stats.FlatHPRegenMod', 'stats.FlatMagicDamageMod', 'stats.FlatMovementSpeedMod', 'stats.FlatPhysicalDamageMod',
             'stats.FlatSpellBlockMod', 'stats.PercentAttackSpeedMod', 'stats.PercentLifeStealMod',
             'stats.PercentMovementSpeedMod', 'tags']
item_col_names = ['name', 'description', 'consumed', 'base_gold', 'depth', 'sr', 'effect1amount', 'effect2amount',
                 'effect3amount', 'effect4amount', 'effect5amount', 'effect6amount', 'effect7amount', 'effect8amount', 'from',
                 'gold_purchasable', 'total_gold', 'req_champion', 'special_recipe', 'stacks', 'flat_armor_mod',
                 'flat_crit_chance_mod', 'flat_hp_pool_mod', 'flat_hp_regen_mod', 'flat_magic_dmg_mod', 'flat_ms_mod',
                 'flat_phys_dmg_mod', 'flat_spellblock_mod', 'flat_pct_atk_speed_mod', 'pct_lifesteal_mod', 'pct_movespeed_mod',
                  'tags']
lol_items_df = json_normalize(data=lol_items_json['data'].values())[item_cols]
lol_items_df.columns = item_col_names
lol_items_df.head()

Unnamed: 0,name,description,consumed,base_gold,depth,sr,effect1amount,effect2amount,effect3amount,effect4amount,...,flat_hp_pool_mod,flat_hp_regen_mod,flat_magic_dmg_mod,flat_ms_mod,flat_phys_dmg_mod,flat_spellblock_mod,flat_pct_atk_speed_mod,pct_lifesteal_mod,pct_movespeed_mod,tags
0,Skirmisher's Sabre,<groupLimit>Limited to 1 Gold Income or Jungle...,,300,2.0,True,80.0,30.0,5,8.0,...,,,,,,,,,,"[LifeSteal, ManaRegen, OnHit, Jungle]"
1,Heart of Targon,<stats>+60 Health<br>+50% Base Health Regen <b...,,400,,False,200.0,10.0,5,0.0,...,60.0,,,,,,,,,"[Health, HealthRegen, Aura, GoldPer, Lane]"
2,Philosopher's Medallion,<stats>+10% Cooldown Reduction<br>+50% Base He...,,450,,False,50.0,10.0,5,0.0,...,,,,,,,,,,"[HealthRegen, ManaRegen, GoldPer, CooldownRedu..."
3,Salvation,<stats><font color='#FFFFFF'>+300 Health</font...,,0,4.0,True,0.1,10.0,20,0.1,...,300.0,,,,,,,,,"[Health, HealthRegen, ManaRegen, CooldownReduc..."
4,Ghost Poro,<subtitleLeft><font color='#FFFFFF'>(Trinket)<...,True,0,,False,240.0,3.5,42,,...,,,,,,,,,,"[Vision, Trinket, Active]"


As with the champion dataframe, I need to set the index separately.

In [9]:
lol_items_df_idx = [str(key) for key in lol_items_json['data'].keys()]
lol_items_df.index = lol_items_df_idx
lol_items_df.index = lol_items_df.index.rename('item_id')

In [10]:
# Filter for only Summoner's Rift items
lol_items_df = lol_items_df[lol_items_df['sr'] == True].fillna(0)
lol_items_df = lol_items_df.sort_index()

# Here is what the data looks like now
lol_items_df.head()

item_id,name,description,consumed,base_gold,depth,sr,effect1amount,effect2amount,effect3amount,effect4amount,...,flat_hp_pool_mod,flat_hp_regen_mod,flat_magic_dmg_mod,flat_ms_mod,flat_phys_dmg_mod,flat_spellblock_mod,flat_pct_atk_speed_mod,pct_lifesteal_mod,pct_movespeed_mod,tags
1001,Boots of Speed,<groupLimit>Limited to 1 pair of boots.</group...,0,300,0.0,True,0,0,0,0,...,0.0,0.0,0.0,25.0,0.0,0.0,0.0,0.0,0.0,[Boots]
1004,Faerie Charm,<stats><mana>+25% Base Mana Regen </mana></stats>,0,125,0.0,True,0,0,0,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,[ManaRegen]
1006,Rejuvenation Bead,<stats>+50% Base Health Regen </stats>,0,150,0.0,True,0,0,0,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,[HealthRegen]
1011,Giant's Belt,<stats>+380 Health</stats>,0,600,2.0,True,0,0,0,0,...,380.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,[Health]
1018,Cloak of Agility,<stats>+20% Critical Strike Chance</stats>,0,800,0.0,True,0,0,0,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,[CriticalStrike]


This is what an item's data looks like from the dataframe.

In [11]:
lol_items_df.loc['2010', :]

name                                      Total Biscuit of Everlasting Will
description               <consumable>Click to Consume:</consumable> Res...
consumed                                                               True
base_gold                                                                75
depth                                                                     0
sr                                                                     True
effect1amount                                                            15
effect2amount                                                             0
effect3amount                                                             0
effect4amount                                                             0
effect5amount                                                             0
effect6amount                                                             0
effect7amount                                                             0
effect8amoun

To clean the item table, filling all NaNs with 0 (as done with `fillna(0)` above) appears to be enough.

## Match Data Acquisition

I need random players in the Gold, Platinum, and Diamond leagues, and random ranked games from their match histories. Then I get all of that match data and boil it down with the same logic that I used above.

To start, I need to find the players in those leagues.

I can acquire players from any league / division with this request: /lol/league/v4/entries/{queue}/{tier}/{division}

I can get EVERY player in NA1, in every league / division that I want.

In [12]:
from itertools import product

queue = 'RANKED_SOLO_5x5'
leagues = ['GOLD', 'PLATINUM', 'DIAMOND']
divisions = ['IV', 'III', 'II', 'I']

leagues_and_divisions = list(product(leagues, divisions))
leagues_and_divisions

[('GOLD', 'IV'),
 ('GOLD', 'III'),
 ('GOLD', 'II'),
 ('GOLD', 'I'),
 ('PLATINUM', 'IV'),
 ('PLATINUM', 'III'),
 ('PLATINUM', 'II'),
 ('PLATINUM', 'I'),
 ('DIAMOND', 'IV'),
 ('DIAMOND', 'III'),
 ('DIAMOND', 'II'),
 ('DIAMOND', 'I')]

Each league / division has many pages of players, so I have to keep requesting pages until no more come back.

In [13]:
req = 'https://na1.api.riotgames.com/lol/league/v4/entries/'\
        + queue + '/' + leagues[2] + '/' + divisions[3]
page = 1
req_w_page = req + "?page=" + str(page)
req_w_page

'https://na1.api.riotgames.com/lol/league/v4/entries/RANKED_SOLO_5x5/DIAMOND/I?page=1'
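
As a small sketch of the same URL construction, the pieces can be wrapped in a helper so that each league, division, and page combination is built the same way. The names `BASE_URL` and `entries_url` are hypothetical and do not appear in the notebook.

```python
BASE_URL = 'https://na1.api.riotgames.com/lol/league/v4/entries/'

def entries_url(queue, tier, division, page):
    """Build the league-entries URL for one page of one tier/division."""
    return BASE_URL + '/'.join([queue, tier, division]) + '?page=' + str(page)

url = entries_url('RANKED_SOLO_5x5', 'DIAMOND', 'I', 1)
print(url)
```

This reproduces the URL shown above for Diamond I, page 1.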

Big loop to get summoner IDs

In [86]:
%%time
# n is how many requests to run before saving
# Each request gets about 200 players, I want 10k players per division
n = 50
players_df_list = np.array(np.empty(n), dtype=pd.DataFrame)

# num_files helps me to name the files uniquely to store them all separately
num_files = 0

for league, division in leagues_and_divisions:
    # Keep track of the loop
    print('' + league + ' ' + division)
    req = 'https://na1.api.riotgames.com/lol/league/v4/entries/'\
        + queue + '/' + league + '/' + division
    page = 0
    
    # Emulate do-while loop
    while True:
        # Don't violate the rate limit of 100 req per 2 min
        time.sleep(1.2)
        page += 1
        
        # Set up req with page number
        req_w_page = req + "?page=" + str(page)
        players = requests.get(req_w_page, params=params)
        players_sub_df = json_normalize(players.json())
        
        # An empty df would be a sign of no more data; this check is
        # disabled because the loop stops after n pages anyway
        #if len(players_sub_df) == 0:
            #break
        
        # Report failed requests
        if (players.status_code != 200):
            print(players.status_code)
        
        # Store this page's df and go to the next page
        players_df_list[page - 1] = players_sub_df
        
        # Every n entries, check
        if page % n == 0:
            print("Page %f" % page)
            
            # Save sub-file
            player_df = pd.concat(players_df_list, ignore_index=True)
            player_df.to_csv('../data/players_' + league + '_' + division + '.csv', encoding='utf-8')
            
            # Make a new array for the next division
            players_df_list = np.array(np.empty(n), dtype=pd.DataFrame)
            num_files += 1
            break

GOLD IV
Page 50.000000
GOLD III
Page 50.000000
GOLD II
Page 50.000000
GOLD I
Page 50.000000
PLATINUM IV
Page 50.000000
PLATINUM III
Page 50.000000
PLATINUM II
Page 50.000000
PLATINUM I
Page 50.000000
DIAMOND IV
Page 50.000000
DIAMOND III
Page 50.000000
DIAMOND II
Page 50.000000
DIAMOND I
Page 50.000000
Wall time: 14min 5s
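
The loop above stops after a fixed 50 pages per division. The "keep requesting until nothing comes back" idea mentioned earlier can be sketched separately from the HTTP details. In this sketch, `fetch_all_pages`, `fetch_page`, and `max_pages` are hypothetical names, and the fake pages stand in for real API responses. In real use, `fetch_page` would presumably wrap the `requests.get` call and the `time.sleep(1.2)` pause.

```python
def fetch_all_pages(fetch_page, max_pages=1000):
    """Collect entries page by page until an empty page comes back.

    fetch_page is any callable taking a 1-based page number and returning a list.
    max_pages is a safety cap so a misbehaving source cannot loop forever.
    """
    entries = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:  # empty page: no more data
            break
        entries.extend(batch)
    return entries

# Stand-in for the real request: three pages of fake entries, then nothing
fake_pages = {1: ['a', 'b'], 2: ['c', 'd'], 3: ['e']}
all_entries = fetch_all_pages(lambda page: fake_pages.get(page, []))
print(all_entries)  # ['a', 'b', 'c', 'd', 'e']
```

Passing the fetch step in as a function keeps the stopping rule testable without an API key.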


Big loop to obtain account IDs, then match histories.

The match histories can be used to obtain match IDs, which are then used to request the actual match data in a final request.

If there are 10000 players per division, then this should take ~7 hours.
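
The ~7 hour figure can be checked with quick arithmetic, assuming the 2.4 second pause per player used in `get_match_histories` dominates the run time. The helper name `estimated_hours` is hypothetical, and request latency would come on top of this estimate.

```python
def estimated_hours(n_players, seconds_per_player):
    """Lower-bound run time in hours, counting only the sleep between players."""
    return n_players * seconds_per_player / 3600.0

hours = estimated_hours(10000, 2.4)
print(round(hours, 2))  # 6.67
```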

In [128]:
def get_match_histories(league, division):
    
    # Set up data structures
    players_df = pd.DataFrame()
    match_list = []

    # For each league / division, import, get match histories, concatenate
    filename = '../data/players_' + league + '_' + division + '.csv'
    players_df = pd.read_csv(filename, index_col=0).drop_duplicates(subset=['summonerId'])

    # Get account ID, then match history
    for summonerId in players_df.summonerId:
        # Two requests per player: 2.4 s keeps this under 100 requests per 2 min
        time.sleep(2.4)
        
        # Get account ID
        acct_req = 'https://na1.api.riotgames.com/lol/summoner/v4/summoners/' + summonerId
        account = requests.get(acct_req, params=params)
        
        # Skip this player if the request failed; the error body has no accountId
        if account.status_code != 200:
            print(account.status_code)
            continue
        
        # Get match history
        match_history_req = 'https://na1.api.riotgames.com/lol/match/v4/matchlists/by-account/' + \
            account.json()['accountId'] + '?queue=420&beginIndex=0'
        match_history = requests.get(match_history_req, params=params)
        
        # Skip this player if the request failed; the error body has no matches
        if match_history.status_code != 200:
            print(match_history.status_code)
            continue
            
        match_list.append(json_normalize(match_history.json()['matches']))
    
    # Concatenate and save file
    match_filename = '../data/matchlist_' + league + '_' + division + '.csv'
    match_list_df = pd.concat(match_list, ignore_index=True)
    match_list_df.to_csv(match_filename)
    
    # Output
    print("Completed match history for " + league + " " + division)

In [None]:
get_match_histories(leagues[0], divisions[0])
get_match_histories(leagues[0], divisions[1])

### To get matches themselves

/lol/match/v4/matches/{matchId}

The original project had 1103 sub dfs and 118021 total players.

This iteration has .

Save the table of players so that I don't have to do this again.

In [98]:
players_df.to_csv('../data/summonerIds.csv', encoding='utf-8')

Read table of players.

In [20]:
players_df = pd.read_csv('../data/summonerIds.csv', encoding='utf-8', index_col=0)
players_df.head()

Unnamed: 0,freshBlood,hotStreak,inactive,leaguePoints,losses,miniSeries,rank,summonerId,summonerName,veteran,wins,leagueId,tier
0,False,False,False,7,156,,V,XeLmUgisomeVZV5sxnPZ_i_rvy3tY6UzTuwWMVF_6oqDXQ4,JohnLaeE,False,169,000305f0-dc2c-11e8-b3fa-c81f66dbb56c,PLATINUM
1,False,False,False,87,90,,IV,9tt-xsc6lJWaklLQb3cZ67JjKT9as9Twe2lhC6zLZ_ZbR54,ALEDSO,False,134,000305f0-dc2c-11e8-b3fa-c81f66dbb56c,PLATINUM
2,False,True,False,0,119,,V,ArNNtm1KFNm6f0OVQAY_hHUBeoqXXYrRWloV337NZPoKjtc,Cute School Girl,False,135,000305f0-dc2c-11e8-b3fa-c81f66dbb56c,PLATINUM
3,False,False,False,34,47,,V,1904JnVQBXOoScRAwJ0_suX0YInL7kkk4dgZ_q3-XobUqIs,iHardScop3ftw,False,59,000305f0-dc2c-11e8-b3fa-c81f66dbb56c,PLATINUM
4,False,False,False,2,32,,V,5oDyD3ShWA_nuiNWGLv9DjD6R1pYvT5W19aqFGySRp2QzQw,TieuTieu,False,51,000305f0-dc2c-11e8-b3fa-c81f66dbb56c,PLATINUM


In [None]:
len(players_df)

Select a random subset of players

In [21]:
players_sample = players_df.sample(n=1000, axis='index', random_state=6)
players_sample.head()

Unnamed: 0,freshBlood,hotStreak,inactive,leaguePoints,losses,miniSeries,rank,summonerId,summonerName,veteran,wins,leagueId,tier
80474,False,False,False,17,189,,V,oTj32GWP1eKnScdkiY4JOE-ae_P8_tn8Yw3sdyOVBxxk5Ts,RedShowbiz,False,180,000305f0-dc2c-11e8-b3fa-c81f66dbb56c,PLATINUM
17929,False,False,False,0,19,,V,QemJEsvrcRlshgRKqeOye-nYdkOruPu8eyKIutboo5zj,2needles,False,29,000305f0-dc2c-11e8-b3fa-c81f66dbb56c,PLATINUM
18681,False,False,False,0,55,,V,LGexYmt0TgiRPUDZfxuONHiPHjwpWve-O8KQwgDuU9Is0c4,secret house,False,59,000305f0-dc2c-11e8-b3fa-c81f66dbb56c,PLATINUM
62432,False,False,False,20,64,,V,6yuSAEsz1st_5y5irj-TPGkdf2VQOyn0QlRbGb040f8-zYo,Xiqe,False,63,000305f0-dc2c-11e8-b3fa-c81f66dbb56c,PLATINUM
48086,False,False,False,0,102,,V,IzJh2jHnpR3yE8AnralfaBecT6tzeBi6eOi6wJ1e5baPGow,BigJiggleyBooBs,False,139,000305f0-dc2c-11e8-b3fa-c81f66dbb56c,PLATINUM


In [22]:
len(players_sample)

1000

From summonerID, get accountID. Started 1000 requests at 12:58, finished in maybe 5 min.

In [23]:
account_id_list = []
for summonerId in players_sample['summonerId']:
    account = requests.get('https://na1.api.riotgames.com/lol/summoner/v4/summoners/' + summonerId, params=params)
    account_id_list.append(json_normalize(account.json()))

In [24]:
len(account_id_list)

1000

In [25]:
account_id_df = pd.concat(account_id_list)
account_id_df.head()

Unnamed: 0,accountId,id,name,profileIconId,puuid,revisionDate,status.message,status.status_code,summonerLevel
0,jcca11rEECVt92_fO-wROSTXjfe8Hg4QsFYLawdcz51Sazo,oTj32GWP1eKnScdkiY4JOE-ae_P8_tn8Yw3sdyOVBxxk5Ts,RedShowbiz,0,1Tq_gJJqWlD3v9ZIOepd3FOLpPqWHwnZ9wtyzNBSHLQA0f...,1543955000000.0,,,61
0,trDelhyIAQafBMCRfNN1GES4ZQoosvnAy1P1acEVziw,QemJEsvrcRlshgRKqeOye-nYdkOruPu8eyKIutboo5zj,2needles,8,gxIfoAubGK4gZJ256bDo1A_alBthIF-kSMRlRzeCR5JOaa...,1541031000000.0,,,35
0,qF_ruKGLkMWEMNFEZ_zIt-1X9mJJ3v8NUNRxcyBkReccFRM,LGexYmt0TgiRPUDZfxuONHiPHjwpWve-O8KQwgDuU9Is0c4,secret house,3175,bgxEYrLbTXn8hz11Qm56qDqGhG4ngA5o0M0BVjLN2_3GHX...,1540939000000.0,,,52
0,WmjmGrn29WyhsLpkIkAqfnDDJtbA5hbrJt1qZAIiiWaBzw,6yuSAEsz1st_5y5irj-TPGkdf2VQOyn0QlRbGb040f8-zYo,Xiqe,3795,6liYKrxX-1J1gmsqpLPuR8nSw08B05MrTitAg-LF6H04fe...,1543815000000.0,,,93
0,Cq0YZy3jXTMpp8j8AccpkjSLOtn5r8go5M1sxqAR6TKFtso,IzJh2jHnpR3yE8AnralfaBecT6tzeBi6eOi6wJ1e5baPGow,BigJiggleyBooBs,7,56TM7Y9fOoMYb-BJFMIzwH8J5Hc-PzqFg7z6s2I1NbCCtw...,1543204000000.0,,,87


In [44]:
len(account_id_df)

1000
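
The `status.message` and `status.status_code` columns in `account_id_df` suggest that at least one response may have been an error body rather than a summoner record. A minimal sketch of separating the two before building the dataframe follows. The helper `split_responses` is hypothetical, and the payload values, including the error message text, are made up for illustration.

```python
def split_responses(payloads):
    """Separate successful payloads from error payloads carrying a 'status' key."""
    ok, failed = [], []
    for payload in payloads:
        (failed if 'status' in payload else ok).append(payload)
    return ok, failed

# Toy payloads: one summoner record and one error body (values are made up)
payloads = [
    {'accountId': 'abc', 'name': 'ExamplePlayer', 'summonerLevel': 61},
    {'status': {'message': 'Rate limit exceeded', 'status_code': 429}},
]
ok, failed = split_responses(payloads)
print(len(ok), len(failed))  # 1 1
```

Keeping the failed payloads around makes it possible to retry those summoner IDs later instead of silently carrying empty rows.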

Get match history for each player.

In [183]:
match_history_list = []
for accountId in account_id_df['accountId']:
    match_history = requests.get('https://na1.api.riotgames.com/lol/match/v4/matchlists/by-account/' + str(accountId) + '?queue=420',
                                params=params)
    if 'matches' in match_history.json():
        match_history_list.append(json_normalize(match_history.json()['matches']))
    time.sleep(1.2)

In [184]:
len(match_history_list)

196

In [185]:
match_history_df = pd.concat(match_history_list)
len(match_history_df)

18091

Make a set of the gameIds, then get match data for each gameId. gameId is a long integer, so it needs to be converted to int and then to str before it goes into the request URL.

In [248]:
match_history_df.head()

Unnamed: 0,champion,gameId,lane,platformId,queue,role,season,timestamp
0,126,2924316053,MID,NA1,420,SOLO,11,1543952595125
1,126,2924333379,NONE,NA1,420,DUO_SUPPORT,11,1543951246581
2,126,2924193158,MID,NA1,420,SOLO,11,1543908576060
3,126,2924177627,MID,NA1,420,DUO,11,1543906450796
4,126,2924029423,MID,NA1,420,SOLO,11,1543893366771


Take a random sample of 1000 games from the match dataframe. Make sure the game IDs are unique, in case some players in my random sample of players faced each other.

In [None]:
game_ids = match_history_df.sample(n=1000, axis='index')['gameId'].unique()
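
To make the deduplication step concrete, here is a standard-library sketch of "sample rows, then keep each game ID once". The helper `sample_unique_game_ids` and the toy `history` list are hypothetical. The point is only that the number of unique IDs can be smaller than the number of rows sampled.

```python
import random

def sample_unique_game_ids(game_ids, n, seed=None):
    """Sample n rows (with their possibly repeated game IDs), then drop repeats."""
    rng = random.Random(seed)
    sampled = rng.sample(game_ids, n)
    # dict.fromkeys keeps first-seen order while removing duplicates
    return list(dict.fromkeys(sampled))

# Toy match-history column: game 2924316053 appears in two players' histories
history = [2924316053, 2924333379, 2924316053, 2924193158, 2924177627]
unique_ids = sample_unique_game_ids(history, n=4, seed=6)
print(len(unique_ids) <= 4)  # True
```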

Make a list of dataframes, one per match, then concatenate them into one big dataframe. With the rate limit, fetching ~900 games should take about 40 min.

In [41]:
game_ids[0:10]

array([2855810690, 2894635369, 2792282086, 2842012227, 2889141116,
       2907087506, 2892832264, 2893954890, 2898065427, 2907077614], dtype=int64)

In [64]:
def get_item_timestamp(match_df, timing_df, participant_id, item_num):
    # Return the timestamp of the last purchase event for the item in the
    # given end-of-game slot, or NaN if the slot is empty or blacklisted
    col_name = 'stats.item' + str(item_num)
    blacklisted_item_ids = [2424, 3340, 2421, 3042, 2422, 3040, 2403, 3513, 2010]
    item_id = match_df[match_df['participantId'] == participant_id][col_name].values[0]

    # An item ID of 0 means the slot is empty
    if item_id == 0 or item_id in blacklisted_item_ids:
        return np.nan

    # All purchase events of this item by this participant
    timestamps = timing_df[(timing_df['type'] == 'ITEM_PURCHASED') &
                           (timing_df['participantId'] == participant_id) &
                           (timing_df['itemId'] == item_id)]['timestamp']
    if len(timestamps) == 0:
        return np.nan

    # Keep the last matching purchase
    return timestamps.values[-1]
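
The core lookup in `get_item_timestamp` can be illustrated without pandas. This is only a sketch on made-up timeline events. The helper `last_purchase_time` is hypothetical, and it returns `None` where the notebook function returns NaN.

```python
def last_purchase_time(events, participant_id, item_id):
    """Timestamp of the last ITEM_PURCHASED event for this player and item, or None."""
    times = [e['timestamp'] for e in events
             if e.get('type') == 'ITEM_PURCHASED'
             and e.get('participantId') == participant_id
             and e.get('itemId') == item_id]
    return times[-1] if times else None

# Toy timeline events (made-up values), in chronological order
events = [
    {'type': 'ITEM_PURCHASED', 'participantId': 1, 'itemId': 1001, 'timestamp': 15125},
    {'type': 'ITEM_SOLD', 'participantId': 1, 'itemId': 1001, 'timestamp': 600000},
    {'type': 'ITEM_PURCHASED', 'participantId': 1, 'itemId': 1001, 'timestamp': 763991},
    {'type': 'ITEM_PURCHASED', 'participantId': 2, 'itemId': 1001, 'timestamp': 20000},
]
print(last_purchase_time(events, 1, 1001))  # 763991
print(last_purchase_time(events, 1, 3340))  # None
```

Taking the last matching event means an item that was sold and bought again is timed by its final purchase.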

Here is the big request loop.

For each game ID, I get the match data, normalized properly, and join the timing data to it.

Also need to be sure to not violate my API key's request rate of 100 requests per 2 minutes.

In [66]:
match_data_list = []
item_timing_col_names = ['participantId', 'item0_time', 'item1_time', 'item2_time',
                         'item3_time', 'item4_time', 'item5_time', 'item6_time']

for game_id in game_ids:
    # Requests
    match_data = requests.get('https://na1.api.riotgames.com/lol/match/v4/matches/' + str(game_id), params=params)
    match_timeline = requests.get('https://na1.api.riotgames.com/lol/match/v4/timelines/by-match/' + str(game_id), params=params)
    
    # Two requests per game: sleep here so that skipped games also respect the rate limit
    time.sleep(2.5)
    
    # Skip games whose timeline was not found
    if ('status' in match_timeline.json() and match_timeline.json()['status']['status_code'] == 404):
        continue
    
    # Dataframes
    timeline_df = json_normalize(match_timeline.json(), ['frames', 'events'])
    timeline_df = timeline_df[['participantId', 'itemId', 'timestamp', 'type']]
    
    match_data_df = json_normalize(match_data.json()['participants']).sort_values('participantId')
    
    item_timing_cols = pd.DataFrame(index=np.arange(1,11), columns=item_timing_col_names)
    
    # Fill in item_timing_cols
    for participant_id in match_data_df['participantId']:
        item_timing_cols.loc[participant_id, 'participantId'] = participant_id
        # 7 possible final items, counting the ward slot
        for item_num in range(7):
            item_timing_cols.loc[participant_id, 'item%d_time' % item_num] = \
                get_item_timestamp(match_data_df, timeline_df, participant_id, item_num)
    
    match_data_with_timing_df = match_data_df.join(item_timing_cols, on='participantId', lsuffix='_l', rsuffix='_r')
    
    match_data_list.append(match_data_with_timing_df)

In [67]:
len(match_data_list)

925

In [68]:
matches_df = pd.concat(match_data_list)

Here is the final dataframe!

In [69]:
matches_df.head(10)

Unnamed: 0,championId,highestAchievedSeasonTier,item0_time,item1_time,item2_time,item3_time,item4_time,item5_time,item6_time,masteries,...,timeline.participantId,timeline.role,timeline.xpDiffPerMinDeltas.0-10,timeline.xpDiffPerMinDeltas.10-20,timeline.xpDiffPerMinDeltas.20-30,timeline.xpDiffPerMinDeltas.30-end,timeline.xpPerMinDeltas.0-10,timeline.xpPerMinDeltas.10-20,timeline.xpPerMinDeltas.20-30,timeline.xpPerMinDeltas.30-end
0,103,SILVER,1442408,1317001,1097684,,1955521.0,1859933.0,,,...,1,DUO_CARRY,-19.3,-24.55,-50.25,,332.5,458.8,424.8,
1,267,SILVER,1145366,1584846,566750,1916517.0,1956258.0,1956258.0,,,...,2,DUO_SUPPORT,-19.3,-24.55,-50.25,,306.0,294.8,424.2,
2,24,UNRANKED,704373,1326581,959447,1989154.0,,,,,...,3,NONE,-43.7,-79.2,-2.3,,285.1,459.3,558.5,
3,61,PLATINUM,1170289,805550,1953211,1656055.0,1168408.0,1959599.0,,,...,4,SOLO,13.0,-131.1,60.5,,490.0,434.0,474.4,
4,41,GOLD,1592110,644571,1142726,1834880.0,,,,,...,5,SOLO,-18.2,-190.3,69.4,,411.9,428.6,511.5,
5,80,UNRANKED,15125,763991,1318684,1185242.0,1646738.0,1777846.0,,,...,6,SOLO,18.2,190.3,-69.4,,430.1,618.9,442.1,
6,51,GOLD,1840722,604770,1507416,788646.0,1158770.0,1651397.0,,,...,7,DUO_CARRY,19.3,24.55,50.25,,392.3,503.8,623.2,
7,99,UNRANKED,939744,554210,1239105,1486997.0,1885613.0,,,,...,8,SOLO,-13.0,131.1,-60.5,,477.0,565.1,413.9,
8,111,GOLD,554507,1443795,1905387,789306.0,1907371.0,1907371.0,1240392.0,,...,9,DUO_SUPPORT,19.3,24.55,50.25,,284.8,298.9,326.3,
9,64,GOLD,1903638,977667,1623427,571074.0,1347487.0,1907239.0,,,...,10,NONE,43.7,79.2,2.3,,328.8,538.5,560.8,


In [70]:
matches_df.to_csv('../data/match_data_2.csv', encoding='utf-8')

In [71]:
matches_df = pd.read_csv('../data/match_data_2.csv', encoding='utf-8', index_col=0)

In [72]:
matches_df.head()

Unnamed: 0,championId,highestAchievedSeasonTier,item0_time,item1_time,item2_time,item3_time,item4_time,item5_time,item6_time,masteries,...,timeline.participantId,timeline.role,timeline.xpDiffPerMinDeltas.0-10,timeline.xpDiffPerMinDeltas.10-20,timeline.xpDiffPerMinDeltas.20-30,timeline.xpDiffPerMinDeltas.30-end,timeline.xpPerMinDeltas.0-10,timeline.xpPerMinDeltas.10-20,timeline.xpPerMinDeltas.20-30,timeline.xpPerMinDeltas.30-end
0,103,SILVER,1442408,1317001,1097684,,1955521.0,1859933.0,,,...,1,DUO_CARRY,-19.3,-24.55,-50.25,,332.5,458.8,424.8,
1,267,SILVER,1145366,1584846,566750,1916517.0,1956258.0,1956258.0,,,...,2,DUO_SUPPORT,-19.3,-24.55,-50.25,,306.0,294.8,424.2,
2,24,UNRANKED,704373,1326581,959447,1989154.0,,,,,...,3,NONE,-43.7,-79.2,-2.3,,285.1,459.3,558.5,
3,61,PLATINUM,1170289,805550,1953211,1656055.0,1168408.0,1959599.0,,,...,4,SOLO,13.0,-131.1,60.5,,490.0,434.0,474.4,
4,41,GOLD,1592110,644571,1142726,1834880.0,,,,,...,5,SOLO,-18.2,-190.3,69.4,,411.9,428.6,511.5,


In [73]:
len(matches_df)

9250