# Phase 1: Data Collection

In this notebook, we will fetch data from the Riot Games API to build our dataset for the Draft Predictor.

## Goals
1. Connect to the Riot API.
2. Fetch a list of high-ELO players (Challenger league) to get their PUUIDs.
3. Fetch match IDs for these players.
4. Download match details (champion picks, win/loss) for at least 100 matches.
5. Save the data to a CSV file.

In [1]:
# Install required packages in the current Jupyter kernel
%pip install requests pandas tqdm


In [2]:
import requests
import pandas as pd
import time
from tqdm import tqdm # Progress bar

# Configuration
REGION = "euw1" # Platform routing value (for summoner/league lookups)
MASS_REGION = "europe" # Regional routing value (for matches: americas, asia, europe, sea)

# PASTE YOUR API KEY HERE
# NOTE: Never publish or share this notebook with a real key in it; regenerate the key if it leaks.
API_KEY = "RGAPI-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

headers = {
    "X-Riot-Token": API_KEY
}
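
To keep the key out of the notebook file entirely, it can be read from the environment instead of being pasted into a cell. A minimal sketch, assuming an environment variable named `RIOT_API_KEY`; both that name and the helper `load_api_key` are illustrative and not part of the Riot API:

```python
import os
from getpass import getpass


def load_api_key(env_var="RIOT_API_KEY", prompt=True):
    """Return the API key from the environment, optionally prompting for it.

    `env_var` and this helper are hypothetical names chosen for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key and prompt:
        # getpass hides the typed key so it never shows up in cell output
        key = getpass("Riot API key: ").strip()
    if not key:
        raise RuntimeError(f"No API key found; set the {env_var} environment variable.")
    return key
```

With this in place, the configuration cell could use `API_KEY = load_api_key()` and build `headers` from it as before.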

In [6]:
def get_challenger_players(region, count=10):
    """
    Fetches the top 'count' players from Challenger league 
    and retrieves their PUUIDs directly.
    """
    # 1. Get the Challenger League for Ranked Solo/Duo
    url = f"https://{region}.api.riotgames.com/lol/league/v4/challengerleagues/by-queue/RANKED_SOLO_5x5"
    response = requests.get(url, headers=headers, timeout=10)
    
    if response.status_code != 200:
        print(f"Error fetching league: {response.status_code} - {response.text}")
        return []
    
    league_data = response.json()
    entries = league_data['entries']
    
    # Sort by League Points (LP)
    entries.sort(key=lambda x: x['leaguePoints'], reverse=True)
    
    # 2. Extract PUUIDs directly
    top_players = entries[:count]
    player_puuids = [player['puuid'] for player in top_players]
    
    return player_puuids

# Run the function
puuids = get_challenger_players(REGION, count=10)
print(f"Successfully collected {len(puuids)} PUUIDs.")
if puuids:  # The list is empty if the league request failed
    print(f"Sample PUUID: {puuids[0]}")

Successfully collected 10 PUUIDs.
Sample PUUID: k-qB-NXFoV2iFx8EiQ2ydfdzXuSgCCt0cwusUAEbqpG8w82qUQ-2K8AedBT7zwdDXxNdFtxcvmVcAg


In [8]:
def get_match_ids(puuids, mass_region, count=20):
    """
    Fetches recent match IDs for a list of players.
    """
    match_ids = set() # Use a set to automatically handle duplicates
    print(f"Fetching matches for {len(puuids)} players...")
    
    for puuid in tqdm(puuids):
        # Note: We use MASS_REGION (europe) here, not REGION (euw1)
        url = f"https://{mass_region}.api.riotgames.com/lol/match/v5/matches/by-puuid/{puuid}/ids"
        params = {
            "start": 0,
            "count": count,
            "queue": 420 # 420 is the ID for Ranked Solo/Duo
        }
        
        try:
            response = requests.get(url, headers=headers, params=params, timeout=10)
            if response.status_code == 200:
                new_matches = response.json()
                match_ids.update(new_matches)
            else:
                print(f"Error fetching matches: {response.status_code}")
        except Exception as e:
            print(f"Error: {e}")
            
        time.sleep(0.05)  # Caps the loop at ~20 requests/s; longer rate-limit windows are not handled
        
    return list(match_ids)

# Fetch matches
# We ask for 20 matches per player. With 10 players, we might get up to 200 matches.
match_ids = get_match_ids(puuids, MASS_REGION, count=20)

print(f"Found {len(match_ids)} unique match IDs.")
if match_ids:  # Avoid an IndexError when no matches were found
    print(f"Sample Match ID: {match_ids[0]}")

Fetching matches for 10 players...


100%|██████████| 10/10 [00:01<00:00,  6.01it/s]

Found 181 unique match IDs.
Sample Match ID: EUW1_7603432728
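
The fixed `time.sleep(0.05)` keeps the loop under roughly 20 requests per second, but it does not account for longer rate-limit windows. Development keys are commonly limited to 20 requests per second and 100 requests per 2 minutes; treat those numbers as an assumption, since the limits that apply to a given key are reported in the `X-App-Rate-Limit` response header. A minimal sketch of a client-side sliding-window limiter under those assumed limits (`SlidingWindowLimiter` is a hypothetical helper, not part of `requests` or the Riot API):

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Blocks until a request fits inside every (max_calls, period_seconds) window."""

    def __init__(self, limits=((20, 1.0), (100, 120.0)),
                 clock=time.monotonic, sleep=time.sleep):
        self.limits = list(limits)  # assumed limits; adjust to your key
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent requests, oldest first
        self.longest = max(period for _, period in self.limits)

    def wait(self):
        """Sleep as long as needed, then record one request."""
        while True:
            now = self.clock()
            # Forget timestamps that are older than the longest window
            while self.calls and now - self.calls[0] >= self.longest:
                self.calls.popleft()
            delay = 0.0
            for max_calls, period in self.limits:
                recent = [t for t in self.calls if now - t < period]
                if len(recent) >= max_calls:
                    # Wait until the oldest call in this window expires
                    delay = max(delay, recent[0] + period - now)
            if delay <= 0:
                self.calls.append(now)
                return
            self.sleep(delay)
```

Calling `limiter.wait()` right before each `requests.get(...)` would replace the fixed sleep. The server can still answer 429 (for example when another process shares the key), so handling that status remains necessary.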





In [9]:
def get_match_details(match_ids, mass_region):
    """
    Downloads game details for a list of match IDs.
    Returns a list of dictionaries (one per match).
    """
    data = []
    print(f"Downloading details for {len(match_ids)} matches...")
    
    for match_id in tqdm(match_ids):
        url = f"https://{mass_region}.api.riotgames.com/lol/match/v5/matches/{match_id}"
        
        try:
            response = requests.get(url, headers=headers, timeout=10)
            if response.status_code == 200:
                match_data = response.json()
                info = match_data['info']
                
                # Keep only Ranked Solo/Duo games (queueId 420).
                # The ID list was already filtered by queue, so this is a safety check.
                if info['queueId'] != 420:
                    continue
                
                # Extract the 10 participants
                participants = info['participants']
                
                # Create a row for our dataset
                row = {}
                
                # Participants arrive grouped by team: the first 5 are Blue (100),
                # the next 5 are Red (200), so i % 5 gives a slot within each team.
                for i, p in enumerate(participants):
                    team_prefix = "blue" if p['teamId'] == 100 else "red"
                    # These slots are not roles (TOP, JUNGLE, etc.) yet.
                    # For now we save them as player_1..player_5 per team;
                    # ideally, we sort them by role later.
                    role_key = f"{team_prefix}_player_{i % 5 + 1}"  # e.g. blue_player_1
                    row[role_key] = p['championName']
                
                # Who won? Each team object stores 'win': True/False.
                # Look up Blue (teamId 100) explicitly instead of assuming it is teams[0].
                blue_team = next(t for t in info['teams'] if t['teamId'] == 100)
                row['winner'] = 'blue' if blue_team['win'] else 'red'
                
                data.append(row)
                
            elif response.status_code == 429:
                # NOTE: after waiting we move on to the next ID, so this match is skipped, not retried.
                print("Rate limit exceeded! Waiting 10s...")
                time.sleep(10)
            else:
                print(f"Error {response.status_code} for {match_id}")
                
        except Exception as e:
            print(f"Failed to parse match {match_id}: {e}")
            
        time.sleep(0.05) # Be nice to the API
        
    return data

# Run the download
match_data_list = get_match_details(match_ids, MASS_REGION)

print(f"Successfully downloaded {len(match_data_list)} valid matches.")
if len(match_data_list) > 0:
    print("Sample game:", match_data_list[0])

Downloading details for 181 matches...

 55%|█████▌    | 100/181 [00:24<00:19,  4.16it/s]
Rate limit exceeded! Waiting 10s...
 56%|█████▌    | 101/181 [00:34<04:18,  3.23s/it]
Rate limit exceeded! Waiting 10s...
 56%|█████▋    | 102/181 [00:44<06:58,  5.30s/it]
Rate limit exceeded! Waiting 10s...
 57%|█████▋    | 103/181 [00:54<08:46,  6.75s/it]
Rate limit exceeded! Waiting 10s...
 57%|█████▋    | 104/181 [01:04<09:59,  7.78s/it]
Rate limit exceeded! Waiting 10s...
 58%|█████▊    | 105/181 [01:14<10:44,  8.49s/it]
Rate limit exceeded! Waiting 10s...
 59%|█████▊    | 106/181 [01:24<11:13,  8.98s/it]
Rate limit exceeded! Waiting 10s...
 59%|█████▉    | 107/181 [01:35<11:31,  9.34s/it]
Rate limit exceeded! Waiting 10s...
 60%|█████▉    | 108/181 [01:45<11:39,  9.58s/it]
Rate limit exceeded! Waiting 10s...
 60%|██████    | 109/181 [01:55<11:44,  9.79s/it]
Rate limit exceeded! Waiting 10s...
100%|██████████| 181/181 [02:24<00:00,  1.26it/s]

Successfully downloaded 171 valid matches.
Sample game: {'blue_player_1': 'Jax', 'blue_player_2': 'Elise', 'blue_player_3': 'Akshan', 'blue_player_4': 'AurelionSol', 'blue_player_5': 'Maokai', 'red_player_1': 'Heimerdinger', 'red_player_2': 'Naafiri', 'red_player_3': 'Irelia', 'red_player_4': 'Ezreal', 'red_player_5': 'Fiora', 'winner': 'red'}
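
The run above requested 181 matches but kept 171: each of the ten 429 responses paused the loop and then moved on to the next match ID, so that match was never fetched. One way to avoid losing matches is to retry the same request after waiting for the number of seconds given in the `Retry-After` header of the 429 response. A minimal sketch; the helper name `get_with_retry` and its parameters are hypothetical, and `get` is any `requests`-style callable such as `requests.get`:

```python
import time


def get_with_retry(get, url, max_retries=5, default_wait=10, sleep=time.sleep, **kwargs):
    """Call `get(url, **kwargs)` and retry while the server answers 429.

    Returns the last response; the caller still checks `status_code`.
    """
    response = None
    for attempt in range(max_retries + 1):
        response = get(url, **kwargs)
        if response.status_code != 429:
            return response
        if attempt < max_retries:
            # Retry-After is given in seconds; fall back to a fixed wait
            # if the header is absent or malformed
            try:
                wait = int(response.headers.get("Retry-After", default_wait))
            except (TypeError, ValueError):
                wait = default_wait
            sleep(wait)
    return response
```

Inside `get_match_details`, the request line could then become `response = get_with_retry(requests.get, url, headers=headers, timeout=10)`, leaving the existing status checks unchanged.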





In [10]:
# Convert list of dictionaries to a Pandas DataFrame
df = pd.DataFrame(match_data_list)

# Display the first 5 rows to check if it looks right
print("First 5 rows of our dataset:")
display(df.head())

# Save to CSV
csv_filename = "league_matches_raw.csv"
df.to_csv(csv_filename, index=False)

print(f"Dataset saved to {csv_filename}")

First 5 rows of our dataset:


|   | blue_player_1 | blue_player_2 | blue_player_3 | blue_player_4 | blue_player_5 | red_player_1 | red_player_2 | red_player_3 | red_player_4 | red_player_5 | winner |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Jax | Elise | Akshan | AurelionSol | Maokai | Heimerdinger | Naafiri | Irelia | Ezreal | Fiora | red |
| 1 | Teemo | Naafiri | Ryze | Yunara | Alistar | Kennen | Malphite | Zed | Kaisa | Leona | red |
| 2 | Kennen | Talon | Quinn | Kaisa | Pyke | Vayne | Qiyana | Ekko | Lucian | Rakan | blue |
| 3 | Ambessa | Nidalee | Irelia | Mel | Rell | KSante | Elise | Hwei | Smolder | Bard | blue |
| 4 | Shen | Viego | Syndra | Yunara | Nautilus | Ambessa | Zed | Zoe | Ashe | Bard | blue |


Dataset saved to league_matches_raw.csv
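
The `blue_player_1` to `red_player_5` columns depend on the order in which the API lists participants, and the download code notes that sorting by role is left for later. Match-v5 participants also carry a `teamPosition` field (`TOP`, `JUNGLE`, `MIDDLE`, `BOTTOM`, `UTILITY`), so a row could be keyed by role directly. A sketch, assuming `teamPosition` is filled in for all ten participants; it may be empty in some games, which this hypothetical helper `build_row_by_role` treats as an unusable match:

```python
ROLES = ["TOP", "JUNGLE", "MIDDLE", "BOTTOM", "UTILITY"]


def build_row_by_role(info):
    """Turn a match `info` dict into one row keyed by side and role.

    Returns None if any of the ten side/role slots is missing.
    """
    row = {}
    for p in info["participants"]:
        side = "blue" if p["teamId"] == 100 else "red"
        role = p.get("teamPosition", "")
        if role in ROLES:
            row[f"{side}_{role.lower()}"] = p["championName"]
    if len(row) != 2 * len(ROLES):
        return None
    # Look the blue team up by id rather than relying on list order
    blue = next(t for t in info["teams"] if t["teamId"] == 100)
    row["winner"] = "blue" if blue["win"] else "red"
    return row
```

Rows built this way have columns such as `blue_top` and `red_utility`, so the same column always refers to the same role across matches.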
