## Problem Statement and Background

The main domain that I will be exploring is League of Legends Esports. League of Legends is a competitive online game played by teams worldwide for money and prizes. Over the past decade, it has exploded in popularity, with its international tournament, Worlds, [drawing nearly 7 million concurrent viewers in 2024](https://escharts.com/news/2024-league-legends-worlds-record). League of Legends is distinct from other cognitive sports like poker or chess in that it features a “drafting phase” before each game, in which players pick the champions they will play. Deciding which champions to pick is an incredibly complex process: there are over 170 to choose from, and their interactions can make each one more or less favorable as the draft progresses. A better understanding of the draft is relevant not only to the teams themselves, but also to sports bettors and analysts: [in 2023, esports betting was valued at $2.5 billion](https://www.skyquestt.com/report/esports-betting-market), and [the prize pool for Worlds 2025 alone will be $5 million](https://esportsinsider.com/2025/03/league-of-legends-lol-world-championship-2025-prize-pool).

For this reason, the main question that I am interested in is: “Can machine learning models accurately predict the outcomes of professional League of Legends games using historical game data and pre-game statistics?”

Game outcome in this instance could be measured in a number of different ways. My starting point will be whether the game is won or lost, which seems to be the most straightforward target to predict. Other potential outputs to explore include a composite index of team or individual player performance, built from objectives taken, damage dealt, gold accumulated, and kills/deaths/assists.

The main pre-game factors explored will be the teams, the draft, and the date. These can in turn be used to derive a number of other statistics, such as the two teams' head-to-head win rate or the win rates of the individual champions in the draft.
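As an illustration of how a derived statistic like head-to-head win rate might be computed, here is a minimal sketch. It assumes each game is a dict with `Blue`, `Red`, and `WinTeam` keys holding team names, and that the games are already sorted chronologically; the function name `add_head_to_head` and the output key `blue_h2h_win_rate` are hypothetical. Each game only sees results from earlier games, so the feature does not leak the outcome being predicted.

```python
from collections import defaultdict


def add_head_to_head(games):
    """Annotate each game with the blue team's prior head-to-head win rate.

    Assumes `games` is sorted oldest-first and that "WinTeam" holds the
    name of the winning team (an assumption about the data layout).
    """
    # Sorted team pair -> {team name: wins against the other team so far}
    wins = defaultdict(lambda: defaultdict(int))
    rows = []
    for game in games:
        blue, red = game["Blue"], game["Red"]
        record = wins[tuple(sorted((blue, red)))]
        played = record[blue] + record[red]
        # Only earlier games count, so the current result is never used.
        h2h = record[blue] / played if played else None
        rows.append({**game, "blue_h2h_win_rate": h2h})
        record[game["WinTeam"]] += 1
    return rows


# Made-up games between two placeholder teams, oldest first.
sample_games = [
    {"Blue": "A", "Red": "B", "WinTeam": "A"},
    {"Blue": "B", "Red": "A", "WinTeam": "A"},
    {"Blue": "A", "Red": "B", "WinTeam": "B"},
]
sample_rows = add_head_to_head(sample_games)
```

The first meeting between two teams has no history, so its rate is left as `None` rather than guessed; how to fill that gap (for example, with 0.5) is a modeling choice.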

## Dataset

The dataset used is the Leaguepedia API, a third-party API maintained to support Leaguepedia, a wiki for League of Legends Esports. Although it is not an official source, the API is well-maintained with very few missing attributes, especially for the professional tiers of competition where I will be pulling most of my data. The main challenge in working with this dataset is not a lack of data, but rather accessing it. The data is stored in a highly normalized format across nearly 100 different tables, requiring several joins to get the desired information. In addition, the SQL client available for accessing it doesn't allow subqueries, imposes a limit of 500 rows returned per request, and has a rate limit. Because of this, querying data can be a slow process, so results are cached when possible.
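To make the access constraints concrete, here is a minimal sketch of one way to combine paging, rate limiting, and on-disk caching. It is not the caching client used in this notebook: `fetch_all` is a hypothetical helper, and `run_query(offset, limit)` stands in for whatever function actually issues one request and returns a list of row dicts.

```python
import json
import time
from pathlib import Path

PAGE_SIZE = 500  # per-request row limit described above


def fetch_all(run_query, cache_path, delay=1.0):
    """Return every row for a query, reading from a JSON cache when present.

    `run_query(offset, limit)` is a hypothetical callable that returns one
    page of results as a list of dicts.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Cache hit: skip the network entirely.
        return json.loads(cache_path.read_text())

    rows, offset = [], 0
    while True:
        page = run_query(offset, PAGE_SIZE)
        rows.extend(page)
        if len(page) < PAGE_SIZE:
            break  # a short page means there is nothing left to fetch
        offset += PAGE_SIZE
        time.sleep(delay)  # pause between requests to respect the rate limit

    cache_path.write_text(json.dumps(rows))
    return rows
```

Keying the cache on a file path keeps the sketch short; a real cache would more likely derive the key from the query parameters so that different queries never share a file.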

In [None]:
from mwrogue.esports_client import EsportsClient
import datetime as dt
import time

from cached_cargo_client import CachedCargoClient

site = EsportsClient("lol")
cached_client = CachedCargoClient(site.cargo_client)


def fetch_match_ids(leagues, from_date):
    """Return match IDs for the given leagues (plus international events) since from_date."""
    # Quote each league name so the list can be used in a SQL IN (...) clause.
    tournament_filter = ", ".join(f"'{t}'" for t in leagues)

    matches = cached_client.query(
        tables="Tournaments=T, MatchSchedule=MS",
        join_on="T.OverviewPage = MS.OverviewPage",
        fields="MS.MatchId",
        where=(
            f"((T.League IN ({tournament_filter})) OR T.Region = 'International')"
            f" AND T.DateStart >= '{from_date}'"
        ),
        order_by="T.DateStart DESC, MS.DateTime_UTC ASC",
    )

    return [match["MatchId"] for match in matches]


def fetch_games(match_ids, batch_size=500):
    """Fetch game-level results and drafts for the given match IDs, in batches."""
    all_games = []
    for i in range(0, len(match_ids), batch_size):
        batch = match_ids[i : i + batch_size]
        # Quote each ID so the batch can be used in a SQL IN (...) clause.
        batch_str = ",".join(f'"{mid}"' for mid in batch)

        print(batch_str)  # debug output: the IDs requested in this batch
        response = cached_client.query(
            tables="MatchScheduleGame=MSG, ScoreboardGames=SG",
            fields=(
                "MSG.Blue, MSG.Red, SG.WinTeam, SG.Team1Score, SG.Team2Score, "
                "SG.Team1Bans, SG.Team2Bans, SG.Team1Picks, SG.Team2Picks, SG.Patch"
            ),
            join_on="MSG.GameId = SG.GameId",
            where=f"MSG.MatchId IN ({batch_str})",
        )
        all_games.extend(response)
        time.sleep(1)  # pause between requests to respect the rate limit

    return all_games


major_leagues = [
    "LoL Champions Korea",
    "League of Legends Championship of The Americas North",
    "Tencent LoL Pro League",
    "LoL EMEA Championship",
    "League of Legends Championship Series",
]
today = dt.date.today()
# Look back three years from today.
start_date = today.replace(year=today.year - 3)
match_ids = fetch_match_ids(major_leagues, start_date)
games = fetch_games(match_ids)
print(len(games))

"None","2025 Season World Championship_Play-In_1","2025 Season World Championship_Round 1_1","2025 Season World Championship_Round 1_2","2025 Season World Championship_Round 1_3","2025 Season World Championship_Round 1_4","2025 Season World Championship_Round 1_5","2025 Season World Championship_Round 1_6","2025 Season World Championship_Round 1_7","2025 Season World Championship_Round 1_8","2025 Season World Championship_Round 2_1","2025 Season World Championship_Round 2_2","2025 Season World Championship_Round 2_3","2025 Season World Championship_Round 2_4","2025 Season World Championship_Round 2_5","2025 Season World Championship_Round 2_6","2025 Season World Championship_Round 2_7","2025 Season World Championship_Round 2_8","2025 Season World Championship_Round 3_1","2025 Season World Championship_Round 3_2","2025 Season World Championship_Round 3_3","2025 Season World Championship_Round 3_4","2025 Season World Championship_Round 3_5","2025 Season World Championship_Round 3_6","202