<img src='figures/title_slide.png'>

<p>
    <b>Project Details:</b>
    <ul>
        <li>Project title: Decoding Summoner's Rift</li>
        <li>Project description: Creating and analyzing a summoner-specific dataset of League of Legends ranked matches</li>
        <li>APIs used: <a href='https://developer.riotgames.com/'>Riot API</a></li>
        <li>Project creator: Harrison Knapp</li>
        <li>Creator background: Previous work with NASA and NOAA has focused on leveraging geospatial analytics and
remote sensing to better understand natural and human-built systems (work with Python, R, and MATLAB)</li>
        <li>Undergrad: B.S. in GeoDesign, B.A. Earth Sciences</li>
    </ul>
</p>

<h1>Introduction</h1>

<h3>What is League?</h3>

<p>
    <i>League of Legends</i> (LoL) is a multiplayer online battle arena computer game developed and published by Riot Games. Since its release in October 2009, it has become a world leader in the esports space and continues to lead global revenue charts for free-to-play games. During a standard LoL match, two teams of five players go head-to-head in a strategic battle for control of the map. Each player, referred to as a “summoner,” commands a “champion,” a character with unique abilities that occupies a specific role on the team. Together, summoners engage in PvP combat with the enemy team, push to gain control over map objectives, and gradually become stronger by collecting experience points and purchasing items to gain an edge over enemy summoners. The game is won once a team pushes their way through the opposing team’s defending turrets and destroys their nexus, a large structure that serves as the heart of the enemy team’s base.<br><br>Due to League’s high skill cap and increasingly popular “Ranked” game mode (games played to advance up a ladder, with the highest tier being professional play), there is an ever-present desire by players to learn and improve. While there certainly are well-established micro and macro strategies that bring success, there is still much to be understood about which in-game factors most significantly contribute to win rate. Machine learning techniques may be useful in this context, as clustering algorithms and neural networks can help us predict win rate given a range of input parameters. However, training a model requires a massive database of match and player statistics, something that has not yet been aggregated or centralized.
</p>

<h3>The Objective</h3>

<p>
    The original goal of this project was to create a dataset of this magnitude and a framework for future ML exploration by leveraging Riot Games' League of Legends API. By using the Riot API to query the match history of high elo players, isolating other players in said matches, and repeating the process for their own match histories, a dataset of individual games and associated in-game/player stats can be made. Such a table would have rows for each match corresponding to an individual matchId, while the columns would include information such as which team secured "first blood," how many dragons (a neutral objective that boosts stats) were slain by each team, the damage dealt and gold earned by each individual player, and much more. This dataset could then be used in various ML explorations aimed at understanding what factors lead a team on Summoner's Rift to victory.<br><img src='figures/pretty_stats.jpeg'><br>However, after conducting further research into this starting objective, I discovered a few Kaggle users who had already created such a dataset. <a href='https://www.kaggle.com/bobbyscience/league-of-legends-diamond-ranked-games-10-min'>One developer in particular</a> had created almost exactly what I had set out to make in the first place: a dataset with 10,000+ matches at Diamond rank or higher containing stats from the first 10 minutes of the game, the critical first moments of a League match where the momentum is set and the winner is typically decided. Finding this encouraged me to think about how I could differentiate myself from these already created datasets. For starters, I still wanted to write a series of scripts that would create a similar high elo match database, since the existing dataset is a year old. Each new year marks the start of a new season in League which brings along with it a host of fundamental gameplay updates, item tweaks, champion balancing, and other alterations that change how the game is played at all levels.
This year, Riot kicked off season 11 by implementing an entirely new item system, scrapping the old one and completely changing how players viewed itemization and gameplay as a whole. With that in mind, I still saw creating a large dataset of matches as an important and worthwhile task, especially for the machine learning exploration I intend on doing in the future.<br><br>But I still wanted to push things further. What else could I do with my newfound understanding of the Riot API? How else can machine learning help players win games? It was at that point that I realized I could also implement a player-centric approach to my database collection plans: create a database of ranked matches for a single top-level player (or anyone, for that matter) and extract player stats to see what wins and loses them games. I thought the perfect candidate for this would be Tyler1, the biggest name in League streaming and one of the largest content creators on Twitch. Tyler plays ranked League for 8+ hours a day, 7 days a week, and is currently only playing the top lane position. This makes him a perfect candidate for an ML exploration, as there are already a few variables being held constant. With this new direction and renewed excitement, I started working.<br><img src='figures/tyler_reformed.jpeg'>
</p>
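
<p>
    The snowball approach described above (query a seed player's matches, collect the other participants, repeat for each of them) is essentially a breadth-first crawl. Below is a minimal sketch of that idea, not the project's code. It assumes two hypothetical callables, <code>get_match_ids(player)</code> and <code>get_participants(match_id)</code>, that would wrap the actual Riot API requests; the stub data is made up for illustration.
</p>

```python
from collections import deque


def crawl_matches(seed_players, get_match_ids, get_participants, max_matches=1000):
    """Breadth-first crawl: collect unique match IDs starting from seed players.

    get_match_ids and get_participants are hypothetical callables standing in
    for the real API requests; they are not part of the project code.
    """
    queue = deque(seed_players)
    seen_players = set(seed_players)
    seen_matches = []        # list keeps discovery order
    seen_match_set = set()   # set gives fast membership checks

    while queue and len(seen_matches) < max_matches:
        player = queue.popleft()
        for match_id in get_match_ids(player):
            if match_id in seen_match_set:
                continue
            seen_match_set.add(match_id)
            seen_matches.append(match_id)
            if len(seen_matches) >= max_matches:
                break
            # Enqueue every participant that has not been visited yet
            for other in get_participants(match_id):
                if other not in seen_players:
                    seen_players.add(other)
                    queue.append(other)
    return seen_matches


# Stub data standing in for API responses (illustrative only)
history = {'A': [1, 2], 'B': [2, 3], 'C': [3]}
rosters = {1: ['A', 'B'], 2: ['A', 'B', 'C'], 3: ['B', 'C']}

match_ids = crawl_matches(['A'], lambda p: history.get(p, []), lambda m: rosters[m])
print(match_ids)
```

<p>
    Passing the fetch functions in as arguments keeps the crawl logic testable without an API key; the <code>max_matches</code> cap is one simple way to bound the crawl.
</p>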

<h1>Revision &amp; Implementation</h1>

<p>
    The code below is the complete workflow for this project and accomplishes the following tasks:
    <ul>
        <li>Compile a list of matchIds for a given summoner and time period (in this case Tyler1)</li>
        <li>Request match data from the Riot API for each of those games in order</li>
        <li>Store the player and team stats (extracted features) in a Pandas dataframe and export to CSV</li>
        <li>Process that CSV in parallel using PySpark to get average stats for each champion (in-game character) and export to CSV (e.g., win rate and average kills per game for Aatrox)</li>
        <li>Convert both the original match data and champion-specific stats CSVs to JSON documents</li>
        <li>Upload those JSON documents to a Firebase realtime database (summoner-stats) as summoner_matches and champ_stats</li>
        <li>Utilize ipywidgets to build a front-end for querying the Firebase database</li>
    </ul>
    This notebook is organized such that the database front-end comes first, since the Firebase database has already been populated with the summoner_matches and champ_stats JSON documents. The components of the project not shown are the environment setup files stored in the GitHub repo that help guide Binder and the repo2docker library when setting up the Docker image/JupyterHub cloud computing environment. By going to <a href='https://mybinder.org/'>mybinder.org</a> and pasting in the GitHub repo link, a JupyterHub instance will be spun up on a cloud server with all of the required Python libraries and Spark/Hadoop binaries installed. This way, anyone can run the code for themselves to test out the process without having to worry about setting up the correct environment on their local machine.
</p>

In [None]:
# Get all required libraries
from riotwatcher import LolWatcher

from pyspark.sql import SparkSession
import pyspark.sql.functions as fc

import math
import json
import os
import glob
import pandas as pd
import time
import requests

import ipywidgets as ipw
from ipyfilechooser import FileChooser
from IPython.display import Markdown, display

def printmd(string):
    display(Markdown(string))

<img src='figures/iblitz.jpeg'>

<h1>Database Query: Matches</h1>

<p>
    The code below allows users to query the summoner_matches component of the Firebase database, i.e. the JSON document that stores the raw feature information from each of the matches stored from the matchId list.<br><br>First, run the cell immediately below this one to initialize the ipywidgets. There will be an option to specify the search field (win, first blood takedown, etc.) and the search input (true or false). Once both of these options have been selected, run the second cell to receive the output for the query, which in this case has been converted to a string for easy printing. The resulting strings will correspond to each individual match and will print out the champion the summoner was playing as (in championId form), their KDA (kill/death/assist numbers), and whether or not they won the match.
</p>
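
<p>
    The filtering and printing described above can be separated from the widgets and the HTTP request, which makes the logic easy to check offline. The following is a minimal sketch under that assumption: <code>filter_matches</code> and <code>format_match</code> are hypothetical helper names, and the sample records are invented but use the same field names as the summoner_matches document.
</p>

```python
def filter_matches(match_list, field, wanted):
    """Return the records whose `field` equals `wanted`; records missing the field are skipped."""
    return [m for m in match_list if m.get(field) == wanted]


def format_match(match):
    """Build the one-line summary for a single match record."""
    kda = f"{match['kills']}/{match['deaths']}/{match['assists']}"
    return f"Summoner playing as champID = {match['championId']}; KDA = {kda}; Win = {match['win']}"


# Made-up records for illustration; real ones come from the Firebase document
sample = [
    {'championId': 266, 'win': True, 'kills': 7, 'deaths': 2, 'assists': 5, 'firstBloodTakedown': False},
    {'championId': 122, 'win': False, 'kills': 1, 'deaths': 6, 'assists': 3, 'firstBloodTakedown': True},
]

for m in filter_matches(sample, 'win', True):
    print(format_match(m))
```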

In [None]:
### WIDGET INIT CELL

printmd('<h2>MATCHES QUERY PARAMS:</h2>')

# Field select
field_select = ipw.Dropdown(
    options=[('Win', 1), ('First Blood Takedown', 2), ('First Tower Takedown', 3), ('First Baron', 4)],
    value=1,
    description='Number:',
)

# Search term
search_term = ipw.Dropdown(
    options=[('True', 1), ('False', 2)],
    value=1,
    description='Input:',
)

# Build accordion
accordion = ipw.Accordion(children=[field_select, search_term])
accordion.set_title(0, 'Search Field')
accordion.set_title(1, 'Search Input')
display(accordion)

In [None]:
### EXECUTE QUERY

# Take widget input and store as variables
field_options = {
    1 : 'win',
    2 : 'firstBloodTakedown',
    3 : 'firstTowerTakedown',
    4 : 'team_firstBaron'
}
term_value = search_term.value == 1  # dropdown value 1 means True, 2 means False
term_key = field_options[field_select.value]

# Execute query to firebase
url = 'https://summoner-stats-c8f6a-default-rtdb.firebaseio.com/summoner_matches.json'
r = requests.get(url)
match_list = json.loads(r.text)

for match in match_list:
    if match[term_key] == term_value:
        champID = match['championId']
        win_status = match['win']
        kills = match['kills']
        deaths = match['deaths']
        assists = match['assists']
        
        print(f"Summoner playing as champID = {champID}; KDA = {kills}/{deaths}/{assists}; Win = {win_status}")

<h1>Database Query: Champion Stats</h1>

<p>
    The code below allows users to query the champ_stats component of the Firebase database, i.e. the JSON document that stores the feature information averaged by champion from each of the matches stored from the matchId list.<br><br>First, run the cell immediately below this one to initialize the ipywidgets. There will be an option to specify the search type (champion ID or name) and the search input (either the ID as defined by Riot CommunityDragon or the name of the champion). Once both of these options have been specified, run the second cell to receive the output for the query, which in this case has been converted to a string for easy printing. The resulting string will contain all of the average stats for that summoner's champion, as well as the number of games that were taken into account for the aggregation.
</p>
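
<p>
    The name lookup described above requires the input to match the stored name exactly (capitalized, no spaces). One possible way to make it more forgiving is to compare normalized strings. The sketch below is an illustration only: <code>normalize_name</code> and <code>find_champion</code> are hypothetical helpers, and the entries are made up but follow the champ_stats field names.
</p>

```python
def normalize_name(text):
    """Lowercase the text and drop everything except letters and digits."""
    return ''.join(ch for ch in text.lower() if ch.isalnum())


def find_champion(stat_list, query):
    """Match on numeric ID when the query is all digits, otherwise on the normalized name."""
    query = query.strip()
    for entry in stat_list:
        if query.isdigit():
            if entry['championID'] == int(query):
                return entry
        elif normalize_name(entry['championName']) == normalize_name(query):
            return entry
    return None  # nothing matched


# Made-up entries for illustration
stats = [
    {'championID': 266, 'championName': 'Aatrox', 'games': 12},
    {'championID': 21, 'championName': 'MissFortune', 'games': 3},
]

print(find_champion(stats, 'miss fortune'))
print(find_champion(stats, '266'))
```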

In [None]:
### WIDGET INIT CELL

printmd('<h2>CHAMP STATS QUERY PARAMS:</h2>')

# Field select
field_select = ipw.Dropdown(
    options=[('Champion ID', 1), ('Champion Name', 2)],
    value=1,
    description='Number:',
)

# Search term
search_term = ipw.Text(
    value='Hello World',
    placeholder='ID/Name',
    description='Input:',
    disabled=False
)

# Build accordion
accordion = ipw.Accordion(children=[field_select, search_term])
accordion.set_title(0, 'Search Type')
accordion.set_title(1, 'Champion ID/Name (capitalized, no spaces)')
display(accordion)

In [None]:
### EXECUTE QUERY

# Take widget input and store as variables
field_options = {
    1 : 'championID',
    2 : 'championName',
}
if search_term.value.isnumeric():
    term_value = int(search_term.value)
else:
    term_value = search_term.value
term_key = field_options[field_select.value]

# Execute query to firebase
url = 'https://summoner-stats-c8f6a-default-rtdb.firebaseio.com/champ_stats.json'
r = requests.get(url)
stat_list = json.loads(r.text)

for entry in stat_list:
    if entry[term_key] == term_value:
        championName = entry['championName']
        avgKills = entry['avgKills']
        avgDeaths = entry['avgDeaths']
        avgAssists = entry['avgAssists']
        avgF10csPerMin = entry['avgF10csPerMin']
        avgF10goldPerMin = entry['avgF10goldPerMin']
        avgF10xpPerMin = entry['avgF10xpPerMin']
        avgTotalCS = entry['avgTotalCS']
        avgVisionScore = entry['avgVisionScore']
        games = entry['games']
        winRate = entry['winRate']
        
        print("Average stats for Summoner's " + str(championName) + ":")
        print('\tAverage Kills:\t\t\t\t' + str(avgKills))
        print('\tAverage Deaths:\t\t\t\t' + str(avgDeaths))
        print('\tAverage Assists:\t\t\t' + str(avgAssists))
        print('\tAverage CS/min (first 10 min):\t\t' + str(avgF10csPerMin))
        print('\tAverage Gold/min (first 10 min):\t' + str(avgF10goldPerMin))
        print('\tAverage XP/min (first 10 min):\t\t' + str(avgF10xpPerMin))
        print('\tAverage Total CS:\t\t\t' + str(avgTotalCS))
        print('\tAverage Vision score:\t\t\t' + str(avgVisionScore))
        print('\tGames analyzed:\t\t\t\t' + str(games))
        print('\tWin rate:\t\t\t\t' + str(winRate))

<img src='figures/heimer.jpeg'>

<h1>Dataset Creation</h1>

<p>
    The code below executes all of the steps outlined above in the Revision &amp; Implementation section. First, inputs such as your API key, region, summoner name, and input/output files must be defined. Then, each cell will accomplish a different task in the analysis process: get matchIds, ask the Riot API for match information, store in datasets, get average stats per champion using PySpark, and convert the CSVs to JSON documents. The code for uploading the resulting JSON datasets to Firebase is described in the following section.
</p>

In [None]:
### USER INPUTS (for dataset creation)

# League of Legends Watcher scrub parameters
api_key = 'INSERT_API_KEY_HERE' # Riot api key
region = 'na1'  # LoL region for analysis
summoner_name = 'HULKSMASH1337' # LoL summoner name, in this case for Tyler1
season = 13  # season as defined by CommunityDragon/Data/patches.json for MatchApiV4 query
queue = 420  # ranked solo queue type


### GLOBAL VARIABLES (for dataset creation)

# In/out file names
MIDs_name = 'MIDs'
sumInfo_name = 'summoner_info'
sumMatches_name = 'T1_example_matches'  # for routine deployment, change to 'summoner_matches'
champInfo_name = 'champion'
part_dir_name = 'pyspark-champ-stats'
champStats_name = 'champ_stats'

# Resulting paths
base_path = 'data/'  # directory where all data files are stored
MIDs_json = base_path + MIDs_name + '.json'
sumInfo_json = base_path + sumInfo_name + '.json'
sumMatches_csv = base_path + sumMatches_name + '.csv'
sumMatches_json = base_path + sumMatches_name + '.json'
part_dir = base_path + part_dir_name
champStats_file = champStats_name + '.csv'
champStats_csv = part_dir + '/' + champStats_file
champStats_json = base_path + champStats_name + '.json'
champInfo_json = champInfo_name + '.json'

<h3>Get list of matchIds (MIDs)</h3>

In [None]:
# Prepare for analysis
lol_watcher = LolWatcher(api_key=api_key)  # create watcher object

summoner = lol_watcher.summoner.by_name(region, summoner_name)  # create summoner object for T1
print(summoner)

with open(sumInfo_json, 'w', encoding='utf-8') as f:  # dump summoner info into JSON file
    json.dump(summoner, f, ensure_ascii=False, indent=4)


# Get total number of ranked games and iterations of 100 games needed to scrub API
match_list = lol_watcher.match.matchlist_by_account(region, summoner['accountId'], season=season,
                                                    queue=queue, begin_index=100)
total_games = match_list['totalGames']  # get the total number of games given by API call
iterations = math.ceil(total_games/100)  # figure out number of iterations necessary to get all game IDs


# Create list of game IDs by iterating through match history
game_ids = []
for i in range(iterations):

    begin_index = i*100
    match_list = lol_watcher.match.matchlist_by_account(region, summoner['accountId'], season=season, queue=queue,
                                                        begin_index=begin_index)
    for match in match_list['matches']:
        game_ids.append(match['gameId'])


# Remove duplicates, export list to JSON formatted file
game_ids_cleaned = list(dict.fromkeys(game_ids))  # dict keys keep first-seen order

with open(MIDs_json, 'w', encoding='utf-8') as f:
    json.dump(game_ids_cleaned, f, ensure_ascii=False, indent=4)
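
<p>
    The paging loop above relies on the total game count reported by the API. An alternative, sketched here under stated assumptions, is to keep requesting pages until one comes back empty. <code>fetch_page(begin_index)</code> is a hypothetical callable that would wrap the matchlist request and return that page's list of match dicts; the page size of 100 mirrors the loop above, and the fake history is for illustration only.
</p>

```python
def collect_game_ids(fetch_page, page_size=100):
    """Page through a match list until an empty page; return unique game IDs in order.

    fetch_page is a hypothetical callable: fetch_page(begin_index) -> list of match dicts.
    """
    game_ids = {}
    begin_index = 0
    while True:
        page = fetch_page(begin_index)
        if not page:
            break  # an empty page means the history is exhausted
        for match in page:
            game_ids.setdefault(match['gameId'], None)  # dict keys keep first-seen order
        begin_index += page_size
    return list(game_ids)


# Fake history of 250 games plus one duplicated ID, served 100 at a time (illustrative only)
fake_history = [{'gameId': i} for i in range(250)] + [{'gameId': 0}]
ids = collect_game_ids(lambda start: fake_history[start:start + 100])
print(len(ids))
```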

<h3>Get match information using MIDs list</h3>

In [None]:
# Prepare for analysis
lol_watcher = LolWatcher(api_key=api_key)  # create watcher object

# Initialize df
header = ['matchId', 'championId', 'win', 'kills', 'deaths', 'assists',
          'visionScore', 'controlWardsBought', 'totalGold', 'f10_goldPerMin',
          'creepScore', 'f10_csPerMin', 'f10_csDiffPerMin', 'f10_xpPerMin', 'f10_xpDiffPerMin',
          'firstBloodKill', 'firstBloodAssist', 'firstBloodTakedown',
          'firstTowerKill', 'firstTowerAssist', 'firstTowerTakedown',
          'team_firstDragon', 'team_firstHerald', 'team_firstBaron', 'team_dragonKills']
df = pd.DataFrame(columns=header)

# Load in match IDs from JSON
with open(MIDs_json) as f:  # context manager closes the file after loading
    matches = json.load(f)
num_matches = len(matches)


# Iterate through matches
match_counter = 0
for match_id in matches:

    time.sleep(2)
    match_counter += 1
    print('Now processing: ' + str(match_id) + ' (' + str(match_counter) + '/' + str(num_matches) + ')')

    try:
        match = lol_watcher.match.by_id(region=region, match_id=match_id)

        # Get player ID and index (assumes participantIdentities and participants are listed in the same order)
        p_ID = 0
        p_index = 0
        for p_counter, participant in enumerate(match['participantIdentities']):
            if participant['player']['summonerName'] == summoner_name:
                p_ID = participant['participantId']
                p_index = p_counter

        # Get team ID and index
        t_ID = match['participants'][p_index]['teamId']
        t_index = 0
        for t_counter, team in enumerate(match['teams']):
            if team['teamId'] == t_ID:
                t_index = t_counter

        # Check top lane
        if match['participants'][p_index]['timeline']['lane'] == 'TOP':

            # Get stats
            championId = match['participants'][p_index]['championId']

            win = match['participants'][p_index]['stats']['win']
            kills = match['participants'][p_index]['stats']['kills']
            deaths = match['participants'][p_index]['stats']['deaths']
            assists = match['participants'][p_index]['stats']['assists']

            visionScore = match['participants'][p_index]['stats']['visionScore']
            controlWardsBought = match['participants'][p_index]['stats']['visionWardsBoughtInGame']

            totalGold = match['participants'][p_index]['stats']['goldEarned']
            f10_goldPerMin = match['participants'][p_index]['timeline']['goldPerMinDeltas']['0-10']

            creepScore = match['participants'][p_index]['stats']['totalMinionsKilled']
            f10_csPerMin = match['participants'][p_index]['timeline']['creepsPerMinDeltas']['0-10']
            if 'csDiffPerMinDeltas' in match['participants'][p_index]['timeline']:
                f10_csDiffPerMin = match['participants'][p_index]['timeline']['csDiffPerMinDeltas']['0-10']
            else:
                f10_csDiffPerMin = None

            f10_xpPerMin = match['participants'][p_index]['timeline']['xpPerMinDeltas']['0-10']
            if 'xpDiffPerMinDeltas' in match['participants'][p_index]['timeline']:
                f10_xpDiffPerMin = match['participants'][p_index]['timeline']['xpDiffPerMinDeltas']['0-10']
            else:
                f10_xpDiffPerMin = None

            firstBloodKill = match['participants'][p_index]['stats']['firstBloodKill']
            firstBloodAssist = match['participants'][p_index]['stats']['firstBloodAssist']
            firstBloodTakedown = False
            if firstBloodKill or firstBloodAssist:
                firstBloodTakedown = True

            firstTowerKill = match['participants'][p_index]['stats']['firstTowerKill']
            firstTowerAssist = match['participants'][p_index]['stats']['firstTowerAssist']
            firstTowerTakedown = False
            if firstTowerKill or firstTowerAssist:
                firstTowerTakedown = True

            team_firstDragon = match['teams'][t_index]['firstDragon']
            team_firstHerald = match['teams'][t_index]['firstRiftHerald']
            team_firstBaron = match['teams'][t_index]['firstBaron']
            team_dragonKills = match['teams'][t_index]['dragonKills']

            # Add match data to pandas frame
            row = [match_id, championId, win, kills, deaths, assists,
                   visionScore, controlWardsBought, totalGold, f10_goldPerMin,
                   creepScore, f10_csPerMin, f10_csDiffPerMin, f10_xpPerMin, f10_xpDiffPerMin,
                   firstBloodKill, firstBloodAssist, firstBloodTakedown,
                   firstTowerKill, firstTowerAssist, firstTowerTakedown,
                   team_firstDragon, team_firstHerald, team_firstBaron, team_dragonKills]

            df_length = len(df)
            df.loc[df_length] = row

    except Exception as inst:
        print(inst)

# Export df to CSV
df.to_csv(sumMatches_csv, index=False)
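
<p>
    The loop above waits a fixed two seconds before each request and logs any exception before moving on to the next match. A common alternative for transient failures (such as a 504) is to retry with a growing delay. The sketch below is one way to do that, not the project's method: <code>fetch_with_retry</code> is a hypothetical helper, <code>fetch</code> is any zero-argument callable, and the <code>sleep</code> parameter exists only so the example can run without actually waiting. It catches the generic <code>Exception</code>, as the loop above does; narrowing that to the client's HTTP error type would be a reasonable refinement.
</p>

```python
import time


def fetch_with_retry(fetch, retries=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch(); on failure wait base_delay, 2*base_delay, ... and try again.

    Re-raises the last exception once the retries are used up.
    """
    for attempt in range(retries + 1):
        try:
            return fetch()
        except Exception:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)  # exponential backoff


# Illustrative stub: fails twice (like a transient 504), then succeeds
calls = {'n': 0}


def flaky():
    calls['n'] += 1
    if calls['n'] < 3:
        raise RuntimeError('simulated 504')
    return {'gameId': 123}


waits = []  # records the delays instead of sleeping
result = fetch_with_retry(flaky, sleep=waits.append)
print(result, waits)
```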

<h3>Leverage PySpark to get champ-specific average stats</h3>

In [None]:
# Initialize spark session
spark = SparkSession.builder.appName('hextech').getOrCreate()

# Process CSV in parallel using PySpark, export champion stats CSVs (each part is single row)
df = spark.read.options(header=True, inferSchema=True).csv(sumMatches_csv)
# df.printSchema()

result = df.groupBy('championID').agg(fc.avg(fc.col('win').cast('double')).alias('winRate'),
                                      fc.count('*').alias('games'),
                                      fc.mean('kills').alias('avgKills'),
                                      fc.mean('deaths').alias('avgDeaths'),
                                      fc.mean('assists').alias('avgAssists'),
                                      fc.mean('visionScore').alias('avgVisionScore'),
                                      fc.mean('creepScore').alias('avgTotalCS'),
                                      fc.mean('f10_goldPerMin').alias('avgF10goldPerMin'),
                                      fc.mean('f10_csPerMin').alias('avgF10csPerMin'),
                                      fc.mean('f10_xpPerMin').alias('avgF10xpPerMin')
                                      )

# result.show(100, False)
result.printSchema()
result.write.options(header=True, delimiter=',').csv(part_dir)

# Merge single-row parts into summary CSV (paths are built explicitly, so no chdir is needed)
# Spark names its output files part-*.csv; matching only those skips a summary CSV left by an earlier merge
all_filenames = glob.glob(os.path.join(part_dir, 'part-*.csv'))

combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])
combined_csv.to_csv(champStats_csv, index=False, encoding='utf-8-sig')
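
<p>
    To sanity-check the groupBy aggregation above on a handful of rows, the same idea can be written in plain Python. This is only a sketch for cross-checking: <code>champ_averages</code> is a hypothetical helper, it computes just three of the aggregates, and it assumes the rows have already been parsed so that <code>win</code> is a real boolean. The sample rows are made up.
</p>

```python
from collections import defaultdict


def champ_averages(rows):
    """Group match rows by championId and compute games, winRate and avgKills."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row['championId']].append(row)

    result = {}
    for champ_id, games in grouped.items():
        n = len(games)
        result[champ_id] = {
            'games': n,
            'winRate': sum(1 for g in games if g['win']) / n,  # share of games won
            'avgKills': sum(g['kills'] for g in games) / n,
        }
    return result


# Made-up rows for illustration
rows = [
    {'championId': 266, 'win': True, 'kills': 8},
    {'championId': 266, 'win': False, 'kills': 4},
    {'championId': 122, 'win': True, 'kills': 3},
]
print(champ_averages(rows))
```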

<h3>Convert output CSVs to JSON</h3>

In [None]:
# Convert matches CSV to JSON
df_matches = pd.read_csv(sumMatches_csv)
df_matches.to_json(path_or_buf=sumMatches_json, orient='records')


# Add champ name to champ_stats from champion.json, convert to JSON
df_champ_stats = pd.read_csv(champStats_csv)

with open(champInfo_json, 'r') as f:
    champs = json.loads(f.read())

data = list()  # get champion info as list of lists [Name, ID]
for item in champs['data']:
    champ_name = champs['data'][item]['id']
    champ_id = champs['data'][item]['key']

    row = [champ_name, champ_id]
    data.append(row)

df_champs = pd.DataFrame(data, columns=['championName', 'championID'])  # convert data list to pd dataframe
df_champs['championID'] = df_champs['championID'].astype(int)

df_join = pd.merge(df_champ_stats, df_champs, on='championID')  # merge champ stats and info dfs, restructure cols
cols = df_join.columns.tolist()
cols = cols[-1:] + cols[:-1]
df_join = df_join[cols]

df_join.to_json(path_or_buf=champStats_json, orient='records')

<h1>Upload to Firebase</h1>

<p>
    The following code is used for uploading the summoner_matches and champ_stats JSON documents to a specified Firebase realtime database. It has been commented out since the database has already been created and is ready to be queried using the ipywidget interface defined above.
</p>

In [None]:
# url = 'https://summoner-stats-c8f6a-default-rtdb.firebaseio.com'  # no trailing slash; paths below add it
# matches = 'data/summoner_matches.json'  # temp inputs for IDE testing
# champ_stats = 'data/champ_stats.json'

# # Load match data to the Firebase database
# with open(matches) as f:  # load file in as list
#     match_list = json.load(f)
# match_data = json.dumps(match_list)  # convert list to JSON string
# matches_url = url + '/summoner_matches.json'  # set url for upload
# r = requests.put(matches_url, data=match_data)  # execute put
# # print(r.text)  # print status if necessary

# # Load champion stats data to the Firebase database
# with open(champ_stats) as f:
#     champ_stats_list = json.load(f)
# champ_stats_data = json.dumps(champ_stats_list)
# champ_stats_url = url + '/champ_stats.json'
# r = requests.put(champ_stats_url, data=champ_stats_data)
# # print(r.text)

<p>
    <b>Process Diagram: </b>Shown below is a rough diagram of the database creation process and the technologies used, with added information about the Binder interface.
</p>

<img src='figures/diagram.png'>

<h1>Comments &amp; Conclusions</h1>

<p>
    Working on this project has been an absolute blast; I never thought that I would be able to bridge my understanding of programming and my passion for video games in such a tangible way. The possibilities for future ML explorations using this database creation code as a backbone are endless, and I can't wait to start chipping away at that original driving goal of understanding what wins and loses a game of League. In working on this project, I encountered many technologies and problems I had never worked with before: handling 504 errors and throttling my requests so I didn't exceed my API key's rate limits, working with Firebase in a production setting, properly setting up Spark/Hadoop in a Docker container/JupyterHub cloud environment, and many other fun challenges. Altogether, I'm proud of what I've accomplished with this project and am looking forward to further polishing the tool as I move into the summer. My skills with database management and Python development have improved tremendously, and I can't wait to keep building on that progress in my education and professional life.
</p>

<img src='figures/win_loss.png'>

<br>