---
title: "Data Collection"
format:
    html: 
        code-fold: false
---

{{< include instructions.qmd >}} 


{{< include overview.qmd >}} 

{{< include methods.qmd >}} 

## Data Collection from Riot Games API

Purpose: The following code collects data from the Riot Games API using Python.

Constraints of the script:
The region is limited to NA.
I originally wanted a simple random sample of 250 players from each of the Grandmaster and Challenger ranks (the two highest player ranks), along with data from each player's 25 most recent games. However, the API's rate limits meant my code pulled only 422 players.

After some initial evaluation, I wanted to include the next lower rank (Master) to better diversify the playerbase (Challenger and Grandmaster together account for <1% of the playerbase, while Master accounts for 2.2%).
However, when I ran the script, the Riot API had no data to pull because a new season was starting. When this happens, the ranks are reset, which prevented me from pulling the Master rank players.

This script is still configured to pull Master player data.
The script's only outputs are the JSON files pulled from the API. All files are located in data/raw-data.

In [24]:
#Import the Python packages the script uses
import os
import time
import json
import random
import logging
import requests
from pathlib import Path
import pandas as pd

In [20]:
#NOTE: This code block was made with Chat-GPT 5.0

#This is a test to see if API key works
API_KEY = "PLACEHOLDER" #Note: when published, this should be PLACEHOLDER

#To get a match ID, play a full game of TFT; the match ID is available once the game ends.
match_id = "NA1_5419735432"

url = f"https://americas.api.riotgames.com/tft/match/v1/matches/{match_id}"
res = requests.get(url, params={"api_key": API_KEY})

#If STATUS = 200, then API key is good.
print("STATUS:", res.status_code)
print(res.text[:3000])

STATUS: 200
{"metadata":{"data_version":"6","match_id":"NA1_5419735432","participants":["z7tXLLhq0AA5E2G4Jdzq88KOlSkOndkZ56mWJnwqAzT5IV_6EOYhU1_ZySsxZtyudA53apw2vs_p5Q"]},"info":{"endOfGameResult":"GameComplete","gameCreation":1763881823000,"gameId":5419735432,"game_datetime":1763883928853,"game_length":2086.93115234375,"game_version":"Linux Version 15.23.728.3286 (Nov 21 2025/16:26:55) [PUBLIC] <Releases/15.23>","mapId":22,"participants":[{"companion":{"content_ID":"5897ad9f-4665-4372-8f3e-6c878adb8918","item_ID":1,"skin_ID":1,"species":"PetTFTAvatar"},"gold_left":38,"last_round":37,"level":9,"missions":{"PlayerScore2":207},"placement":1,"players_eliminated":3,"puuid":"z7tXLLhq0AA5E2G4Jdzq88KOlSkOndkZ56mWJnwqAzT5IV_6EOYhU1_ZySsxZtyudA53apw2vs_p5Q","riotIdGameName":"vornelix","riotIdTagline":"819","time_eliminated":2078.66748046875,"total_damage_to_players":267,"traits":[{"name":"TFT15_Bastion","num_units":2,"style":1,"tier_current":1,"tier_total":3},{"name":"TFT15_Captain","num_units"
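
The response above nests each player's result under `info.participants`. As a minimal sketch, assuming only the field names visible in that output (`metadata.match_id`, `puuid`, `placement`, `level`), one payload could be flattened into one row per player as follows. `summarize_match` is a hypothetical helper, not part of the script, and the example payload is hand-made rather than real API output.

```python
def summarize_match(match):
    """Flatten one match payload into one dict per participant."""
    match_id = match["metadata"]["match_id"]
    rows = []
    for p in match["info"]["participants"]:
        rows.append({
            "match_id": match_id,
            "puuid": p.get("puuid"),
            "placement": p.get("placement"),
            "level": p.get("level"),
        })
    return rows

# Hand-made payload containing only the keys used above (not real API output).
example = {
    "metadata": {"match_id": "NA1_0000000000"},
    "info": {"participants": [
        {"puuid": "player-a", "placement": 1, "level": 9},
        {"puuid": "player-b", "placement": 2, "level": 8},
    ]},
}
rows = summarize_match(example)
print(rows[0])
```

A list of rows in this shape can be passed directly to `pd.DataFrame(rows)` for later analysis.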

# RIOT API Script

In [9]:
API_KEY = "PLACEHOLDER" #Note: when publishing to GitHub, this should be PLACEHOLDER (get_riot below reads API_KEY)

#Set matches per player, and where the JSON files go.
matches = 25
save_dir = Path("data/raw-data")
save_dir.mkdir(parents=True, exist_ok=True)

#Set up the scraper log to document run history
logging.basicConfig(filename="tft_scraper.log", level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")

#Request wrapper that waits and retries when the API returns a 429 (rate limit) status code
def get_riot(url, params=None):
    if params is None:
        params = {}
    params["api_key"] = API_KEY

    while True:
        r = requests.get(url, params=params)
        if r.status_code == 429:
            retry = int(r.headers.get("Retry-After", 1))
            logging.warning(f"429 hit. Sleeping {retry}s.")
            time.sleep(retry)
            continue
        return r

#Get the PUUIDs of Master players
def get_master_puuids():
    url = "https://na1.api.riotgames.com/tft/league/v1/master"
    r = get_riot(url)
    data = r.json()
    entries = data.get("entries", [])
    return [e["puuid"] for e in entries if e.get("puuid")]

#Get the PUUIDs of Challenger players
def get_challenger_puuids():
    url = "https://na1.api.riotgames.com/tft/league/v1/challenger"
    r = get_riot(url)
    data = r.json()
    entries = data.get("entries", [])
    return [e["puuid"] for e in entries if e.get("puuid")]

#Get the PUUIDs of Grandmaster players
def get_grandmaster_puuids():
    url = "https://na1.api.riotgames.com/tft/league/v1/grandmaster"
    r = get_riot(url)
    data = r.json()
    entries = data.get("entries", [])
    return [e["puuid"] for e in entries if e.get("puuid")]

#For a given PUUID, get MATCH IDS 
def get_match_ids(puuid, count = matches):
    url = f"https://americas.api.riotgames.com/tft/match/v1/matches/by-puuid/{puuid}/ids"
    params = {"count": count, "queue": 1100}  
    r = get_riot(url, params)
    if r.status_code != 200:
        logging.warning(f"Failed match list for PUUID {puuid}")
        return []

    return r.json()

#Fetch a match JSON and save it
def save_match(match_id):
    out_path = save_dir / f"{match_id}.json"
    if out_path.exists():
        return

    url = f"https://americas.api.riotgames.com/tft/match/v1/matches/{match_id}"
    r = get_riot(url)

    if r.status_code == 200:
        with open(out_path, "w") as f:
            json.dump(r.json(), f, indent=2)
        logging.info(f"Saved {match_id}")
    else:
        logging.warning(f"Match {match_id} failed with status {r.status_code}")

def main():
    logging.info("Starting TFT scraper")
    master = get_master_puuids()
    grandmaster = get_grandmaster_puuids()
    challenger = get_challenger_puuids()

    logging.info(f"Master count: {len(master)}")
    logging.info(f"Challenger count: {len(challenger)}")
    logging.info(f"Grandmaster count: {len(grandmaster)}")

    m_sample = random.sample(master, min(250, len(master)))
    gm_sample = random.sample(grandmaster, min(250, len(grandmaster)))
    chall_sample = random.sample(challenger, min(250, len(challenger)))
    all_puuids = m_sample + gm_sample + chall_sample
    random.shuffle(all_puuids)

    for i, puuid in enumerate(all_puuids, start=1):
        logging.info(f"Player {i}/{len(all_puuids)}: {puuid}")
        match_ids = get_match_ids(puuid)

        for m in match_ids:
            save_match(m)

    logging.info("Scrape complete.")
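
The `get_riot` wrapper above retries indefinitely on a 429 response and assumes `Retry-After` is always an integer. The following is a minimal sketch of a bounded variant, assuming it is acceptable to give up after a few attempts. The names `get_with_retry`, `max_retries`, `default_wait`, and the injectable `send` and `sleep` callables are hypothetical and introduced here only so the retry logic can be exercised without network access.

```python
import time

def get_with_retry(send, max_retries=5, default_wait=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or retries run out.

    send: zero-argument callable returning an object with .status_code
          and .headers (for example, a requests.Response).
    Returns the last response received, which may still be a 429.
    """
    response = send()
    for _ in range(max_retries):
        if response.status_code != 429:
            break
        try:
            # Retry-After is given in seconds; fall back if it is missing or malformed.
            wait = float(response.headers.get("Retry-After", default_wait))
        except (TypeError, ValueError):
            wait = default_wait
        sleep(wait)
        response = send()
    return response
```

With `requests`, this could be called as `get_with_retry(lambda: requests.get(url, params=params, timeout=10))`. The caller should still check `status_code` on the result, since the function returns the final response even when every attempt was rate limited.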

# Once the scrape is complete, count the JSON files and check that they are in the expected directory.

In [22]:
json_dir = 'data/raw-data'
json_files = [f for f in os.listdir(json_dir) if f.endswith('.json')]
print(len(json_files))

6388
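
The count above includes every file ending in `.json`, whether or not it contains a usable match. As a minimal sketch, assuming each saved file carries the `metadata.match_id` field shown in the test response, the files could also be checked for readability. `check_raw_files` is a hypothetical helper, not part of the original script.

```python
import json
from pathlib import Path

def check_raw_files(json_dir):
    """Return (match_ids, unreadable_paths) for a directory of match JSON files."""
    match_ids, bad = set(), []
    for path in sorted(Path(json_dir).glob("*.json")):
        try:
            with open(path, encoding="utf-8") as f:
                match_ids.add(json.load(f)["metadata"]["match_id"])
        except (ValueError, KeyError, TypeError):
            # ValueError covers malformed JSON; KeyError and TypeError cover
            # files that parse but lack the expected metadata.match_id field.
            bad.append(path)
    return match_ids, bad
```

Calling `ids, bad = check_raw_files("data/raw-data")` and comparing `len(ids)` with the file count would show whether any saved file is truncated or holds an error payload instead of a match.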
