# A League of Legends Recommender System

## Part 1: Data Scraping & Cleaning

In this notebook we'll collect the data needed to train our model from the [Riot API](https://developer.riotgames.com/) and the [op.gg](https://www.op.gg/) website. To run this notebook yourself, put your own Riot development API key in a `.env` file in the same folder as this notebook.

To obtain the details of each match (items, rune builds, etc.) for our model, we first have to scrape the encrypted summoner IDs of players on the ranked ladder using the League-V4 API. We'll only consider players ranked Diamond and higher, who make up roughly the top 4.5% of the ladder. As there are over 50,000 such players, rate limits on the API force us to scrape only a sample of them: a fraction of Challenger, Grandmaster, and Master, plus a few pages of Diamond I.

Afterwards, we'll call the Summoner-V4 API to convert the encrypted summoner IDs to account PUUIDs. Then we use the Match-V5 API to retrieve the 5 most recently played matches for each account, followed by the details of each match. This should give us several thousand Diamond I+ matches, which are filtered down to ranked games at the end of this notebook and cleaned further in the next one.

---

Start by loading the API key from the `.env` file. If you create your own, it should be of the form
```
api_key = '{YOUR_API_KEY}'
```
Personal API keys are limited to
- 20 requests every 1 second
- 100 requests every 2 minutes
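
The two limits above determine how long to wait between requests. As a minimal sketch (the names `RATE_LIMITS` and `min_delay` are illustrative and not part of the notebook), the slowest limit sets the pace: 100 requests per 120 seconds works out to one request every 1.2 seconds, which is why the cells below sleep for 1.25 seconds to leave a small margin.

```python
# Limits quoted above, as (max requests, window in seconds)
RATE_LIMITS = [(20, 1), (100, 120)]

def min_delay(limits):
    """Smallest constant delay between requests that respects every limit."""
    return max(window / max_requests for max_requests, window in limits)

delay = min_delay(RATE_LIMITS)
print(f'{delay:.2f} s between requests')   # 1.20 s between requests
```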

In [2]:
from dotenv import load_dotenv
import json
import os
import pandas as pd
import requests
import time
load_dotenv()
pd.set_option('display.max_columns', None)

In [3]:
# You will have to input your api key within your own .env file
api_key = os.environ.get('api_key')

We'll use the `requests` library to make calls to the API. To start off, we create a function that scrapes the ladder.

In [3]:
def get_ranked_players(tier, region='NA1', div_start=1, div_end=4, page_start=1, page_end=1):
    """
    Gets the encrypted summoner IDs of current ranked players of a single tier, all divisions.

    Args:
        tier (str): Tier of rankings to retrieve. Valid options are:
            - `CHALLENGER`
            - `GRANDMASTER`
            - `MASTER`
            - `DIAMOND`
            - `EMERALD`
            - `PLATINUM`
            - `GOLD`
            - `SILVER`
            - `BRONZE`
            - `IRON`
        region (str, optional): Region. Defaults to 'NA1'.
        div_start (int, optional): Only relevant for tiers `DIAMOND` and lower. Division to start retrieving on. Defaults to 1.
        div_end (int, optional): Only relevant for tiers `DIAMOND` and lower. Division to end retrieving on. Must be greater than or equal to div_start. Max of 4. Defaults to 4.
        page_start (int, optional): Only relevant for tiers `DIAMOND` and lower. Page to start retrieving on. Defaults to 1.
        page_end (int, optional): Only relevant for tiers `DIAMOND` and lower. Page to end retrieving on. Must be greater than or equal to page_start. Defaults to 1.

    Returns:
        list: Encrypted summoner IDs.

    Raises:
        ValueError: If `tier` is not valid.
        ValueError: If `div_end` is less than `div_start` or greater than 4.
        ValueError: If `page_end` is less than `page_start`.
        requests.exceptions.HTTPError: If an HTTP error occurs during the request.
        Exception: For other errors during the request.
    """

    # Validate tier
    valid_tiers = ['challenger', 'grandmaster', 'master', 'diamond', 'emerald', 'platinum', 'gold', 'silver', 'bronze', 'iron']
    if tier.lower() not in valid_tiers:
        raise ValueError('Invalid tier.')

    # Handle top 3 tiers with different URL
    if tier.lower() in ['challenger', 'grandmaster', 'master']:
        url = f'https://{region}.api.riotgames.com/lol/league/v4/{tier.lower()}leagues/by-queue/RANKED_SOLO_5x5?api_key={api_key}'
        try:
            print(f'Retrieving {tier.capitalize()}')
            response = requests.get(url)
            response.raise_for_status()             # Check if the request was successful
            return [entry['summonerId'] for entry in response.json()['entries']]

        except requests.exceptions.HTTPError as http_err:
            print(f'HTTP error occurred: {http_err}')
            raise
        except Exception as err:
            print(f'Error occurred: {err}')
            raise

    # Validate divisions and pages
    if div_start < 1 or div_start > div_end or div_end > 4:
        raise ValueError('div_start must be at least 1, and div_end must be between div_start and 4.')
    if page_start > page_end or page_start < 1:
        raise ValueError('page_start must be greater than 0 and less than or equal to page_end.')

    summoner_ids = []
    divisions = ['I', 'II', 'III', 'IV'][div_start - 1:div_end]

    # Loop through selected divisions
    for div in divisions:
        # Loop through selected pages
        for page in range(page_start, page_end + 1):
            url = f'https://{region}.api.riotgames.com/lol/league/v4/entries/RANKED_SOLO_5x5/{tier.upper()}/{div}?page={page}&api_key={api_key}'
            try:
                print(f'Retrieving {tier.capitalize()} {div} page {page}')
                api_response = requests.get(url)
                api_response.raise_for_status()     # Check if the request was successful
                data = api_response.json()
                summoner_ids.extend(entry['summonerId'] for entry in data)
                
                # To avoid hitting API limits. Do not run on final loop
                if not (div == divisions[-1] and page == page_end):
                    time.sleep(1.25)

            except requests.exceptions.HTTPError as http_err:
                print(f'HTTP error occurred: {http_err}')
                raise
            except Exception as err:
                print(f'Error occurred: {err}')
                raise

    return summoner_ids
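
The fixed `time.sleep(1.25)` above keeps the request rate under the limit, but it does not recover if the limit is hit anyway. Below is a hedged sketch of a retrying GET helper. It assumes the API answers rate-limit violations with HTTP 429 and usually includes a `Retry-After` header giving the number of seconds to wait; the name `get_json` and its parameters are hypothetical and are not used elsewhere in this notebook.

```python
import time

def get_json(url, get=None, max_retries=3, default_wait=1.25):
    """GET `url` and return the parsed JSON, waiting and retrying on HTTP 429."""
    if get is None:
        import requests              # imported lazily so a stand-in `get` can be passed in
        get = requests.get

    for attempt in range(max_retries + 1):
        response = get(url)
        if response.status_code == 429 and attempt < max_retries:
            # Retry-After is in seconds; fall back to the usual delay if it is absent
            wait = float(response.headers.get('Retry-After', default_wait))
            time.sleep(wait)
            continue
        response.raise_for_status()  # Raises on any remaining HTTP error
        return response.json()
```

With a helper like this, each `requests.get(url)` / `raise_for_status()` / `.json()` sequence in the functions of this notebook could be replaced by a single `get_json(url)` call.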

In [4]:
challenger_players = get_ranked_players(tier='challenger')
time.sleep(1.25)
grandmaster_players = get_ranked_players(tier='grandmaster')
time.sleep(1.25)
master_players = get_ranked_players(tier='master')
time.sleep(1.25)

Retrieving Challenger
Retrieving Grandmaster
Retrieving Master


As the Master tier is quite large, with 7,000 or so players, we'll keep only every eighth player and drop the other 7/8. This is simply so that we don't need to spend days requesting match data from the API. Many of the dropped players will likely still appear in the match histories of the players we do request.

We'll also cut the number of Challenger and Grandmaster players by 3/4.
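
To see why sampling is necessary, here is a rough estimate of the scraping time. This is a sketch under stated assumptions: one PUUID lookup and one match-history request per player, one details request per match, no duplicate matches (so it is an upper bound), and a 1.25 second pause between requests. The function name `estimate_hours` is illustrative.

```python
def estimate_hours(n_players, matches_per_player, delay=1.25):
    """Upper bound on scraping time in hours for a given number of players."""
    # Two requests per player (PUUID + match history), plus one per match
    n_requests = n_players * (2 + matches_per_player)
    return n_requests * delay / 3600

print(f'{estimate_hours(7000, 5):.1f} hours')   # 17.0 hours
print(f'{estimate_hours(880, 5):.1f} hours')    # 2.1 hours
```

Even at only 5 matches per player, the full Master tier alone would take most of a day, and the cost grows linearly with the number of matches requested per player.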

In [5]:
challenger_players_cut = challenger_players[::4]      # Keep every 4th player
grandmaster_players_cut = grandmaster_players[::4]    # Keep every 4th player
master_players_cut = master_players[::8]              # Keep every 8th player

print(len(challenger_players_cut))
print(len(grandmaster_players_cut))
print(len(master_players_cut))
print(f'total: {len(challenger_players_cut + grandmaster_players_cut + master_players_cut)}')

75
175
880
total: 1130


For Diamond I, we pick 5 pages, again to limit the time needed under the API constraints. The pages are staggered to give better coverage of the division. We can use https://www.op.gg/statistics/tiers to see the number of players per division. Since each page returned by the API contains 205 players, we can work out how many pages a division has and spread our picks evenly across them. We'll automate scraping the number of players per division with `Selenium`.

**You will need to specify your own browser and webdriver paths in the codeblock below to run it.**

In [6]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options

def scrape_opgg():
    # YOU WILL NEED TO SPECIFY YOUR OWN CHROME EXECUTABLE PATH
    chrome_options = Options()
    chrome_options.binary_location = r'YOUR\PATH\HERE'

    # WebDriver path
    chromedriver_path = os.path.join('utils', 'chromedriver-win64', 'chromedriver.exe')

    # Set up the WebDriver using the ChromeDriver executable at the path above
    driver = webdriver.Chrome(service=Service(chromedriver_path), options=chrome_options)

    try:
        # Open the webpage
        driver.get("https://www.op.gg/statistics/tiers")

        # Poll for up to 2 seconds when locating elements, giving the table time to load
        driver.implicitly_wait(2)

        # Find the table element
        table = driver.find_element(By.CSS_SELECTOR, "table")

        # Extract table rows
        rows = table.find_elements(By.TAG_NAME, "tr")

        # Loop through the rows and columns to get the data
        data = []
        for row in rows:
            cells = row.find_elements(By.TAG_NAME, "td")
            row_data = [cell.text for cell in cells]
            data.append(row_data)
    finally:
        # Always close the browser, even if scraping fails
        driver.quit()

    return data

In [7]:
data = scrape_opgg()

In [8]:
# Keep only the Diamond I row
data_temp = []
for entry in data[4:5]:     # Replace with your own slice if you want tiers and divisions other than just Diamond I
    data_temp.append(entry[2])

data_filtered = []
for entry in data_temp:
    # Remove the percentage part and any extra spaces
    number_str = entry.split(' ')[0]
    # Remove commas and convert to integer
    number = int(number_str.replace(',', ''))
    data_filtered.append(number)

data_filtered

[5628]

In [11]:
# Create dictionary for the next code block
diamond_i_count = data_filtered[0]
#diamond_counts = {i + 1: data_filtered[i] for i in range(4)}
#emerald_counts = {i - 3: data_filtered[i] for i in range(4, 8)}

print(diamond_i_count)
#print(diamond_counts)
#print(emerald_counts)

5628


Now we retrieve 5 pages from the Diamond I division. Because it would still be too time-consuming to request all of the matches from these players, we'll then halve the list, leaving ~500 players.
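
The page selection can be checked in isolation. This sketch reproduces the arithmetic used in the cell that follows; the function name `staggered_pages` is illustrative, and the 205 players per page and the count of 5,628 are the values quoted earlier in this notebook.

```python
def staggered_pages(player_count, n_pages=5, per_page=205):
    """Pick n_pages page numbers spread evenly across a division."""
    total_pages = player_count // per_page   # Number of full pages in the division
    step = total_pages // n_pages            # Gap between picked pages
    # Pages are 1-indexed, so the first pick is clamped from 0 up to 1
    return [max(1, step * k) for k in range(n_pages)]

print(staggered_pages(5628))   # [1, 5, 10, 15, 20]
```

Because of the integer division, the last few pages of a division are never picked (here pages 21 to 27), which is acceptable for a sample but worth keeping in mind.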

In [15]:
diamond_i_players = []
#diamond_players = []
#emerald_players = []

for page in range(0, 5):
    page_start_end = max(1, diamond_i_count // 205 // 5 * page)     # (Page 1 and 1/5, 2/5, 3/5, 4/5 of diamond_i_count//205)
    diamond_i_players.extend(get_ranked_players(tier='diamond', div_start=1, div_end=1, page_start=page_start_end, page_end=page_start_end))
    time.sleep(1.25)

#for div in range(1, 5):
#    diamond_div_count = diamond_counts[div]
#    for page in range(0, 5):
#        page_start_end = max(1, diamond_div_count // 205 // 5 * page)     # (Page 1 and 1/5, 2/5, 3/5, 4/5 of diamond_div_count//205)
#        diamond_players.extend(get_ranked_players(tier='diamond', div_start=div, div_end=div, page_start=page_start_end, page_end=page_start_end))
#        time.sleep(1.25)

#for div in range(1, 5):
#    emerald_div_count = emerald_counts[div]
#    for page in range(0, 5):
#        page_start_end = max(1, emerald_div_count // 205 // 5 * page)
#        emerald_players.extend(get_ranked_players(tier='emerald', div_start=div, div_end=div, page_start=page_start_end, page_end=page_start_end))
#        time.sleep(1.25)

Retrieving Diamond I page 1
Retrieving Diamond I page 5
Retrieving Diamond I page 10
Retrieving Diamond I page 15
Retrieving Diamond I page 20


In [16]:
# Cutting number of Diamond I players in half
diamond_i_players_cut = diamond_i_players[::2]
#diamond_players_cut = diamond_players[::2][::2]
#emerald_players_cut = emerald_players[::2][::2]

print(len(diamond_i_players_cut))
#print(len(diamond_players_cut))
#print(len(emerald_players_cut))

513


Merge everything together into one list and save it to a `.txt` file.

In [17]:
print(len(challenger_players_cut))
print(len(grandmaster_players_cut))
print(len(master_players_cut))
print(len(diamond_i_players_cut))
#print(len(diamond_players_cut))
#print(len(emerald_players_cut))

top_players = challenger_players_cut + grandmaster_players_cut + master_players_cut + diamond_i_players_cut #+ diamond_players_cut + emerald_players_cut
len(top_players)

75
175
880
513


1643

In [15]:
os.makedirs('data', exist_ok=True)

with open(os.path.join('data', 'top_players.txt'), 'w') as file:
    for player in top_players:
        file.write(f'{player}\n')

del file, player

In [14]:
# Read top_players out of the .txt file in case no longer in memory
with open(os.path.join('data', 'top_players.txt'), 'r') as file:
    lines = file.readlines()

top_players = [line.strip() for line in lines]
del file, lines

Now we create the function to retrieve the account PUUID from the encrypted summoner ID.

In [21]:
def get_puuid(summonerId, region='NA1'):
    """
    Gets the PUUID from an encrypted summoner ID.

    Args:
        summonerId (str): Encrypted Summoner ID.
        region (str, optional): Region. Defaults to 'NA1'.

    Returns:
        str: puuid

    Raises:
        ValueError: If the summonerId is None.
        requests.exceptions.HTTPError: If an HTTP error occurs during the request.
        Exception: For other errors during the request.
    """

    if summonerId is None:
        raise ValueError('Empty summoner ID.')

    url = f'https://{region}.api.riotgames.com/lol/summoner/v4/summoners/{summonerId}?api_key={api_key}'
    try:
        response = requests.get(url)
        response.raise_for_status()
        return response.json()['puuid']

    except requests.exceptions.HTTPError as http_err:
        print(f'HTTP error occurred: {http_err}')
        raise
    except Exception as err:
        print(f'Error occurred: {err}')
        raise

In [22]:
puuids = []
num_players = len(top_players)
for i, player in enumerate(top_players):
    print(f'Retrieving PUUID {i + 1} of {num_players}.')
    puuids.append(get_puuid(player))
    time.sleep(1.25)

Retrieving PUUID 1 of 1643.
Retrieving PUUID 2 of 1643.
Retrieving PUUID 3 of 1643.
Retrieving PUUID 4 of 1643.
Retrieving PUUID 5 of 1643.
Retrieving PUUID 6 of 1643.
Retrieving PUUID 7 of 1643.
Retrieving PUUID 8 of 1643.
Retrieving PUUID 9 of 1643.
Retrieving PUUID 10 of 1643.
Retrieving PUUID 11 of 1643.
Retrieving PUUID 12 of 1643.
Retrieving PUUID 13 of 1643.
Retrieving PUUID 14 of 1643.
Retrieving PUUID 15 of 1643.
Retrieving PUUID 16 of 1643.
Retrieving PUUID 17 of 1643.
Retrieving PUUID 18 of 1643.
Retrieving PUUID 19 of 1643.
Retrieving PUUID 20 of 1643.
Retrieving PUUID 21 of 1643.
Retrieving PUUID 22 of 1643.
Retrieving PUUID 23 of 1643.
Retrieving PUUID 24 of 1643.
Retrieving PUUID 25 of 1643.
Retrieving PUUID 26 of 1643.
Retrieving PUUID 27 of 1643.
Retrieving PUUID 28 of 1643.
Retrieving PUUID 29 of 1643.
Retrieving PUUID 30 of 1643.
Retrieving PUUID 31 of 1643.
Retrieving PUUID 32 of 1643.
Retrieving PUUID 33 of 1643.
Retrieving PUUID 34 of 1643.
...

In [12]:
os.makedirs('data', exist_ok=True)

with open(os.path.join('data', 'puuids.txt'), 'w') as file:
    for puuid in puuids:
        file.write(f'{puuid}\n')

del file, puuid

In [11]:
# Read puuids out of the .txt file in case no longer in memory
with open(os.path.join('data', 'puuids.txt'), 'r') as file:
    lines = file.readlines()

puuids = [line.strip() for line in lines]
del file, lines

Next, we create the function to retrieve the most recent matches played by a given PUUID.

In [28]:
def get_match_history(puuid, region='americas', start=0, count=20):
    """
    Gets the match history of player from their PUUID.

    Args:
        puuid (str): PUUID of player.
        region (str, optional): Region. Defaults to 'americas'.
        start (int, optional): Start index of matches. Defaults to 0.
        count (int, optional): Number of match IDs to return. Max 100. Defaults to 20.

    Returns:
        list: match IDs.

    Raises:
        ValueError: If the puuid is None.
        requests.exceptions.HTTPError: If an HTTP error occurs during the request.
        Exception: For other errors during the request.
    """

    if puuid is None:
        raise ValueError('Empty PUUID.')

    url = f'https://{region}.api.riotgames.com/lol/match/v5/matches/by-puuid/{puuid}/ids?start={start}&count={count}&api_key={api_key}'
    try:
        print(f'Retrieving last {count} matches for {puuid}')
        response = requests.get(url)
        response.raise_for_status()
        return response.json()

    except requests.exceptions.HTTPError as http_err:
        print(f'HTTP error occurred: {http_err}')
        raise
    except Exception as err:
        print(f'Error occurred: {err}')
        raise

In [29]:
matches = []
num_puuids = len(puuids)    # Computed here so the cell also works when puuids was reloaded from file

for i, puuid in enumerate(puuids):
    print(f'{i + 1} of {num_puuids}: ')
    matches.extend(get_match_history(puuid, count=5))       # Restrict to 5 games to save time querying the API
    time.sleep(1.25)

1 of 1643: 
Retrieving last 5 matches for Wy78TgnLmlcE9ORqWb1FF16XT_FJA5rx-UhjdWcKc-_fektTs3-utiTLLUYe_e-VEIFYXxm5cX56XA
2 of 1643: 
Retrieving last 5 matches for MN8tavlvwHP2MIjVATHRVPmoVXFSzEUC-wmBeYZh_MXWgDNr9H6Xl6pv-5JcXMtcBlbq6uMU57SqBA
3 of 1643: 
Retrieving last 5 matches for -5FeHlelVSUfKPm8yG9yZRfe25Mj4AOz-i1GcPQZ8Bd8OoHbOB9Pg8eV5zQx0PxKXACTVlmfeDHVzg
4 of 1643: 
Retrieving last 5 matches for ka3y_RQrNPrq0wg75Z2yHEvsRtVhdK8RQfEthg02J4iWUBWJZbj5yLjPm3HoJpcQH3PMh6uIa7ktgw
5 of 1643: 
Retrieving last 5 matches for xypYuWILvH6MKgyYfEkDNZgb5Vs4WKrKNjZYwYUN71dvfpaxBZ6PEO6GehHYIUFtUNuLEcVQNeAR_Q
6 of 1643: 
Retrieving last 5 matches for 8qfeh7Q2eHiyNAvFAGrtrslkaT8oKgwC8roiZtqSZi61dclUFH6npnzMu7LQcAUxaMi36RxUqE6quQ
7 of 1643: 
Retrieving last 5 matches for 08_3fiFoVvu7LPx7qOFfiDVWZ1E6l7kSuxEKJTCORP_kyZm14uJtelEX_etbikKGNS30CtqU8FgzWA
8 of 1643: 
Retrieving last 5 matches for v2rNz-h3D4s4qBLLQw-Afnz7w3BfekNGp5ydOI7PJjzX41IFaBt47nfifIUXQU5GurKFp0uwM8qSMg
9 of 1643: 
...

In [30]:
# Removing duplicate matches
matches = list(set(matches))
len(matches)

7268
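
Converting to a `set` and back removes duplicates but does not preserve the order in which the match IDs were collected. If a stable order is wanted, for example so that most recent matches of higher-ranked players come first, an order-preserving alternative can be sketched with `dict.fromkeys` (the function name `dedupe_keep_order` and the match IDs below are made up for illustration):

```python
def dedupe_keep_order(items):
    """Remove duplicates while keeping the first occurrence of each item."""
    # dict keys are unique and remember insertion order
    return list(dict.fromkeys(items))

print(dedupe_keep_order(['NA1_3', 'NA1_1', 'NA1_3', 'NA1_2']))
# ['NA1_3', 'NA1_1', 'NA1_2']
```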

In [17]:
os.makedirs('data', exist_ok=True)

with open(os.path.join('data', 'matches.txt'), 'w') as file:
    for match in matches:
        file.write(f'{match}\n')

del file, match

In [16]:
# Read matches out of the .txt file in case no longer in memory
with open(os.path.join('data', 'matches.txt'), 'r') as file:
    lines = file.readlines()

matches = [line.strip() for line in lines]
del file, lines

We now have just over 7,000 games to examine. We'll need a function that requests the details of each match from the API.

In [4]:
def get_match_details(matchId, region='americas'):
    """
    Gets the details of a match from its ID.

    Args:
        matchId (str): ID of the match.
        region (str, optional): Region. Defaults to 'americas'.

    Returns:
        dict: All of the match details.

    Raises:
        ValueError: If the matchId is None.
        requests.exceptions.HTTPError: If an HTTP error occurs during the request.
        Exception: For other errors during the request.
    """

    if matchId is None:
        raise ValueError('Empty matchId.')

    url = f'https://{region}.api.riotgames.com/lol/match/v5/matches/{matchId}?api_key={api_key}'
    try:
        response = requests.get(url)
        response.raise_for_status()
        return response.json()

    except requests.exceptions.HTTPError as http_err:
        print(f'HTTP error occurred: {http_err}')
        raise
    except Exception as err:
        print(f'Error occurred: {err}')
        raise

Now we get the details of all of our matches.

In [5]:
matches_detailed = []
num_matches = len(matches)

for i, match in enumerate(matches):
    print(f'Retrieving details for match {i + 1} of {num_matches}.')
    matches_detailed.append(get_match_details(match))
    time.sleep(1.25)

Retrieving details for match 1 of 7268.
Retrieving details for match 2 of 7268.
Retrieving details for match 3 of 7268.
Retrieving details for match 4 of 7268.
Retrieving details for match 5 of 7268.
Retrieving details for match 6 of 7268.
Retrieving details for match 7 of 7268.
Retrieving details for match 8 of 7268.
Retrieving details for match 9 of 7268.
Retrieving details for match 10 of 7268.
Retrieving details for match 11 of 7268.
Retrieving details for match 12 of 7268.
Retrieving details for match 13 of 7268.
Retrieving details for match 14 of 7268.
Retrieving details for match 15 of 7268.
Retrieving details for match 16 of 7268.
Retrieving details for match 17 of 7268.
Retrieving details for match 18 of 7268.
Retrieving details for match 19 of 7268.
Retrieving details for match 20 of 7268.
Retrieving details for match 21 of 7268.
Retrieving details for match 22 of 7268.
Retrieving details for match 23 of 7268.
Retrieving details for match 24 of 7268.
...

As this particular step takes a very long time with a rate-limited API key, we may need to stop and resume the process for various reasons, such as the Riot API crashing (as it sometimes does) or needing to refresh the API key. It also seems that some matches may end up being purged from the server. The cell below resumes the loop partway through the list of matches.
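
One way to make a long-running loop like this restartable is to key the results by ID and skip anything already fetched. The following is a sketch only: `fetch_all`, `results`, `failed`, and `skip_errors` are hypothetical names, and which exceptions are safe to skip is an assumption that depends on how the fetch function reports a purged match.

```python
def fetch_all(ids, fetch, results, failed, skip_errors):
    """Fetch every ID not yet handled; safe to call again after an interruption."""
    for item_id in ids:
        if item_id in results or item_id in failed:
            continue                          # Already handled in an earlier run
        try:
            results[item_id] = fetch(item_id)
        except skip_errors as err:
            failed[item_id] = str(err)        # Record the failure and move on
```

In this notebook, `fetch` could be `get_match_details` with `skip_errors=(requests.exceptions.HTTPError,)`. Note that this would also skip failures caused by an expired key, so it is worth inspecting `failed` and clearing it before re-running.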

In [None]:
# Resume from the first match that has not been retrieved yet
for i in range(len(matches_detailed), num_matches):
    print(f'Retrieving details for match {i + 1} of {num_matches}.')
    matches_detailed.append(get_match_details(matches[i]))
    time.sleep(1.25)

In [6]:
len(matches_detailed)

7268

In [7]:
os.makedirs('data', exist_ok=True)

with open(os.path.join('data', 'matches_detailed.txt'), 'w') as file:
    json.dump(matches_detailed, file, indent=1)

In [4]:
# Read match details out of the .txt file in case no longer in memory
with open(os.path.join('data', 'matches_detailed.txt'), 'r') as file:
    matches_detailed = json.load(file)

del file

As the file storing the data is now well over 700MB in size, it will not be included in the GitHub repository. After removing non-ranked matches and remakes, we are left with just under 5,000 games' worth of data (47,670 player rows at 10 rows per match in the run below). Unfortunately, this is far less data than we would have if we retrieved every match from every Diamond I+ player, but doing so would simply be too time-consuming and memory-heavy. We could process the data in batches to get around the memory issue, but we choose not to for this particular project.
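
For reference, batch processing could look something like the sketch below, which stores one match per line (the JSON Lines layout) so that matches can be appended as they arrive and read back one at a time. This is not what the notebook does; the function names and the `.jsonl` file are illustrative.

```python
import json

def append_jsonl(path, records):
    """Append each record to `path` as one JSON document per line."""
    with open(path, 'a', encoding='utf-8') as f:
        for record in records:
            f.write(json.dumps(record) + '\n')

def iter_jsonl(path):
    """Yield records one at a time without loading the whole file into memory."""
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```

A loop such as `for match in iter_jsonl(path): ...` would then hold only a single match in memory at any time, at the cost of a file that is no longer one valid JSON document.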

The details of each match are in JSON format, so we'll need to extract the information we want. While the match details contain a ton of information, we'll only keep what we find most relevant for the exploratory data analysis in the next part. We'll also drop games that are not ranked, are too short (remakes and other games lasting less than 15 minutes), or are bot games (which for some reason are classified as ranked).

In [10]:
# Recursively search through JSON to extract values associated with a key
def json_extract(obj, key):
    values = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                values.append(v)
            if isinstance(v, (dict, list)):
                values.extend(json_extract(v, key))
    elif isinstance(obj, list):
        for item in obj:
            values.extend(json_extract(item, key))

    return values



def process_match(match_json):
    """
    Processes the match via its JSON into a pandas DataFrame with the most relevant information.

    Args:
        match_json (dict): The JSON of match details.

    Returns:
        pd.DataFrame: A DataFrame containing the relevant match information.

    Raises:
        ValueError: If the match is not a ranked game or the game duration is less than 15 minutes.
    """

    info = match_json['info']
    players = info['participants']

    # Filter out non-ranked games and remakes
    if info['queueId'] != 420:
        raise ValueError('Not a ranked game.')
    if info['gameDuration'] < 900:
        raise ValueError('Game too short.')

    # Define columns of DataFrame
    match_data = {
        'Match ID': [match_json['metadata']['matchId']] * 10,
        'Game Duration': [info['gameDuration']] * 10,
        'Game Version': [info['gameVersion']] * 10,
        'Summoner Name': [player['riotIdGameName'] if 'riotIdGameName' in player else player['riotIdName'] for player in players],
        'Summoner Tag': [player['riotIdTagline'] for player in players],
        'Champion ID': [player['championId'] for player in players],
        'Champion Name': [player['championName'] for player in players],
        'Champion Level': [player['champLevel'] for player in players],
        'Team': [player['teamId'] for player in players],
        'Role': [player['teamPosition'] for player in players],
        'Kills': [player['kills'] for player in players],
        'Deaths': [player['challenges']['deathsByEnemyChamps'] for player in players],
        'Assists': [player['assists'] for player in players],
        'CS': [player['totalMinionsKilled'] for player in players],
        'CS (Jungle)': [(player['totalAllyJungleMinionsKilled'] + player['totalEnemyJungleMinionsKilled']) for player in players],
        'First Blood': [player['firstBloodKill'] for player in players],
        'First Tower': [player['firstTowerKill'] for player in players],
        'Objective Stolen': [player['objectivesStolen'] for player in players],
        'Total Gold Earned': [player['goldEarned'] for player in players],
        'Gold Spent': [player['goldSpent'] for player in players],
        'Gold/Minute': [player['challenges']['goldPerMinute'] for player in players],
        'Damage Dealt': [player['totalDamageDealtToChampions'] for player in players],
        '% of Team\'s Damage': [player['challenges']['teamDamagePercentage'] for player in players],
        'Damage Taken': [player['totalDamageTaken'] for player in players],
        'Damage Mitigated': [player['damageSelfMitigated'] for player in players],
        'Heal and Shielding': [player['challenges']['effectiveHealAndShielding'] for player in players],
        'CC Time Dealt': [player['totalTimeCCDealt'] for player in players],
        'Turret Plates Taken': [player['challenges']['turretPlatesTaken'] for player in players],
        'Turret Takedowns': [player['turretTakedowns'] for player in players],
        'Vision Score': [player['visionScore'] for player in players],
        'Rune 1': [player['perks']['styles'][0]['selections'][0]['perk'] for player in players],
        'Rune 2': [player['perks']['styles'][0]['selections'][1]['perk'] for player in players],
        'Rune 3': [player['perks']['styles'][0]['selections'][2]['perk'] for player in players],
        'Rune 4': [player['perks']['styles'][0]['selections'][3]['perk'] for player in players],
        'Sec Rune 1': [player['perks']['styles'][1]['selections'][0]['perk'] for player in players],
        'Sec Rune 2': [player['perks']['styles'][1]['selections'][1]['perk'] for player in players],
        'Stat 1': [player['perks']['statPerks']['offense'] for player in players],
        'Stat 2': [player['perks']['statPerks']['flex'] for player in players],
        'Stat 3': [player['perks']['statPerks']['defense'] for player in players],
        'Summ 1': [player['summoner1Id'] for player in players],
        'Summ 2': [player['summoner2Id'] for player in players],
        'Item 1': [player['item0'] for player in players],
        'Item 2': [player['item1'] for player in players],
        'Item 3': [player['item2'] for player in players],
        'Item 4': [player['item3'] for player in players],
        'Item 5': [player['item4'] for player in players],
        'Item 6': [player['item5'] for player in players],
        #'Ward': [player['item6'] for player in players],
        'Triplekills': [player['tripleKills'] for player in players],
        'Quadrakills': [player['quadraKills'] for player in players],
        'Pentakills': [player['pentaKills'] for player in players],
        'Grubs Taken (Team)': [info['teams'][0]['objectives']['horde']['kills'] if player['teamId'] == 100
                          else info['teams'][1]['objectives']['horde']['kills'] for player in players],
        'Heralds Taken (Team)': [info['teams'][0]['objectives']['riftHerald']['kills'] if player['teamId'] == 100
                            else info['teams'][1]['objectives']['riftHerald']['kills'] for player in players],
        'Barons Taken (Team)': [info['teams'][0]['objectives']['baron']['kills'] if player['teamId'] == 100
                           else info['teams'][1]['objectives']['baron']['kills'] for player in players],
        'Dragons Taken (Team)': [info['teams'][0]['objectives']['dragon']['kills'] if player['teamId'] == 100
                            else info['teams'][1]['objectives']['dragon']['kills'] for player in players],
        'Game Ended in Surrender': [player['gameEndedInSurrender'] for player in players],
        'Win': [player['win'] for player in players]
    }

    match_df = pd.DataFrame(match_data)

    return match_df
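
The team objective columns above rely on `info['teams'][0]` being team 100 and `info['teams'][1]` being team 200. A slightly more defensive variant, sketched here on toy data, looks teams up by ID instead of by position. It assumes each entry of `info['teams']` carries a `teamId` field; the helper name `team_objective_kills` is illustrative.

```python
def team_objective_kills(info, objective):
    """Map each teamId to its kill count for one objective, e.g. 'baron'."""
    return {team['teamId']: team['objectives'][objective]['kills']
            for team in info['teams']}

# Toy stand-in for the 'info' section of a match
toy_info = {'teams': [
    {'teamId': 100, 'objectives': {'baron': {'kills': 1}}},
    {'teamId': 200, 'objectives': {'baron': {'kills': 0}}},
]}

barons = team_objective_kills(toy_info, 'baron')
print(barons)   # {100: 1, 200: 0}
```

A per-player column could then be built as `[barons[player['teamId']] for player in players]`.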

In [11]:
match_dfs = []
num_matches_detailed = len(matches_detailed)

for i, match in enumerate(matches_detailed):
    try:
        print(f'Processing match {i + 1} of {num_matches_detailed}.')
        match_dfs.append(process_match(match))
    except ValueError as e:
        print(f'Error processing match {i + 1}: {e} Skipping this match.')

# Concatenate once at the end; calling pd.concat inside the loop re-copies the growing DataFrame on every iteration
all_matches_df = pd.concat(match_dfs, ignore_index=True)

Processing match 1 of 7268.
Processing match 2 of 7268.
Processing match 3 of 7268.
Error processing match 3: Not a ranked game. Skipping this match.
Processing match 4 of 7268.
Processing match 5 of 7268.
Processing match 6 of 7268.
Processing match 7 of 7268.
Processing match 8 of 7268.
Error processing match 8: Not a ranked game. Skipping this match.
Processing match 9 of 7268.
Processing match 10 of 7268.
Error processing match 10: Not a ranked game. Skipping this match.
Processing match 11 of 7268.
Processing match 12 of 7268.
Processing match 13 of 7268.
Processing match 14 of 7268.
Processing match 15 of 7268.
Processing match 16 of 7268.
Error processing match 16: Not a ranked game. Skipping this match.
Processing match 17 of 7268.
Error processing match 17: Not a ranked game. Skipping this match.
Processing match 18 of 7268.
Processing match 19 of 7268.
Processing match 20 of 7268.
Processing match 21 of 7268.
Error processing match 21: Not a ranked game. Skipping this match.


In [15]:
print(f'Shape: {all_matches_df.shape}')
all_matches_df.head(10)

Shape: (47670, 56)


Unnamed: 0,Match ID,Game Duration,Game Version,Summoner Name,Summoner Tag,Champion ID,Champion Name,Champion Level,Team,Role,Kills,Deaths,Assists,CS,CS (Jungle),First Blood,First Tower,Objective Stolen,Total Gold Earned,Gold Spent,Gold/Minute,Damage Dealt,% of Team's Damage,Damage Taken,Damage Mitigated,Heal and Shielding,CC Time Dealt,Turret Plates Taken,Turret Takedowns,Vision Score,Rune 1,Rune 2,Rune 3,Rune 4,Sec Rune 1,Sec Rune 2,Stat 1,Stat 2,Stat 3,Summ 1,Summ 2,Item 1,Item 2,Item 3,Item 4,Item 5,Item 6,Triplekills,Quadrakills,Pentakills,Grubs Taken (Team),Heralds Taken (Team),Barons Taken (Team),Dragons Taken (Team),Game Ended in Surrender,Win
0,NA1_5095396461,1404,14.16.612.449,K9 BDE,NA1,122,Darius,14,100,TOP,4,2,4,146,0,True,False,0,9688,8608,413.808623,8867,0.135032,14725,11145,0.0,41,5,2,22,8010,9111,9104,8299,8473,8242,5005,5008,5001,4,6,1055,3078,0,3053,3111,0,0,0,0,2,1,1,3,True,True
1,NA1_5095396461,1404,14.16.612.449,Hani,zzzzz,141,Kayn,14,100,JUNGLE,5,4,9,14,133,False,False,0,10510,9050,448.913791,12101,0.184288,29216,19095,0.0,362,0,1,14,8010,9111,9105,8017,8138,8105,5008,5008,5001,4,11,6694,6692,3134,3111,1036,0,0,0,0,2,1,1,3,True,True
2,NA1_5095396461,1404,14.16.612.449,bean and peanut,rina,893,Aurora,14,100,MIDDLE,2,2,13,183,0,False,False,0,9382,7950,400.702712,20446,0.311367,16188,7265,0.0,214,2,1,19,8112,8139,8138,8106,8226,8210,5005,5008,5001,4,12,1056,6653,1082,3118,3111,0,0,0,0,2,1,1,3,True,True
3,NA1_5095396461,1404,14.16.612.449,Lucius Artorius,CV1,22,Ashe,14,100,BOTTOM,8,1,8,200,4,False,False,0,12423,11200,530.618963,16308,0.248344,7652,3517,0.0,616,6,4,19,8005,9101,9104,8017,8345,8410,5005,5008,5011,4,1,1055,6672,3046,3031,1036,3006,0,0,0,2,1,1,3,True,True
4,NA1_5095396461,1404,14.16.612.449,whiteman enjoyer,2222,147,Seraphine,11,100,UTILITY,2,3,13,25,0,False,True,0,7418,6150,316.823169,7943,0.120969,11305,5905,6384.027344,132,5,4,51,8465,8463,8473,8453,8009,8017,5005,5010,5011,4,7,6620,3158,6617,3869,1004,0,0,0,0,2,1,1,3,True,True
5,NA1_5095396461,1404,14.16.612.449,gemi swift,NA1,78,Poppy,13,200,TOP,5,4,1,146,0,False,False,0,8791,8000,375.489477,11528,0.234211,19294,23368,0.0,782,1,1,13,8230,8226,8234,8237,8444,8451,5008,5008,5011,4,12,1054,3111,6660,6662,8020,1029,0,0,0,4,0,0,0,True,False
6,NA1_5095396461,1404,14.16.612.449,Giraffe Hugs,NA1,876,Lillia,13,200,JUNGLE,3,5,5,9,125,False,False,0,8660,8600,369.896944,11602,0.235707,31443,14583,0.0,437,0,0,27,8010,9111,9105,8014,8347,8304,5008,5008,5001,4,11,4633,1082,6653,3158,3916,0,0,0,0,4,0,0,0,True,False
7,NA1_5095396461,1404,14.16.612.449,Cupic,Senna,99,Lux,13,200,MIDDLE,1,5,1,188,0,False,False,0,7768,7250,331.782796,12022,0.244247,9643,5812,902.400024,387,1,0,10,8128,8126,8138,8106,8017,8009,5008,5008,5011,4,14,6655,3020,1082,1056,3145,1058,0,0,0,4,0,0,0,True,False
8,NA1_5095396461,1404,14.16.612.449,ThëBeesKnees,0001,202,Jhin,12,200,BOTTOM,2,4,6,163,0,False,False,0,7898,7175,337.335038,10315,0.209562,12305,6903,0.0,153,0,1,10,8021,8009,9103,8014,8234,8236,5008,5008,5001,21,4,1055,3087,3009,1018,1037,1038,0,0,0,4,0,0,0,True,False
9,NA1_5095396461,1404,14.16.612.449,Shiku,LMB,526,Rell,9,200,UTILITY,1,3,8,24,0,False,False,0,5788,5300,247.205844,3754,0.076274,17922,12144,0.0,53,0,1,67,8439,8446,8444,8451,8347,8306,5007,5010,5011,14,4,1001,3083,3067,3869,0,0,0,0,0,4,0,0,0,True,False


In [13]:
os.makedirs('data', exist_ok=True)

all_matches_df.to_csv(os.path.join('data', 'matches_data.csv'), index=False)

In [14]:
# Read the DataFrame out of the .csv file in case no longer in memory
all_matches_df = pd.read_csv(os.path.join('data', 'matches_data.csv'))