# The Study of Objective-Based Gameplay: An Analysis of League of Legends' Objectives
**Name**: Kliment Ho

**Website Link**: https://klh005.github.io/2023-LoLObjectives-Report/

In [423]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score


import plotly.express as px
pd.options.plotting.backend = 'plotly'

# from dsc80_utils import * # Feel free to uncomment and use this.

## Step 1: Introduction

In [380]:
# Question: How impactful are objectives to a game of League of Legends?
'''
In a match of League of Legends, players' decisions are often shaped by whether to take, or "secure", objectives. For this project, we define "objectives" as
neutral jungle monsters ("creeps") and structures that grant a team-wide stat increase or gold. Notable objectives include Baron, Rift Herald, Drake, and towers.
Hypothesis: I believe the team that destroys the first mid tower is more likely to win.

Relevant Columns in the Dataset
gameid: Unique identifier for each match.
side: The team's side of the map (Blue/Red).
result: The match outcome for the team (1 for win, 0 for loss).
firstdragon, dragons, firstbaron, barons, towers: Key objectives secured during the match.
teamkills, teamdeaths: Team-level performance metrics.
This study explores how these variables, particularly the objectives, correlate with a team's likelihood of winning a match, forming the basis for predictive modeling in later stages.

Potential Questions:
What is the most impactful objective in a winning game?
How impactful are objectives to a game of League of Legends? (selected)
Does securing objectives significantly increase a team's win rate?
Are objectives typically used to extend a lead, or to close the gap between the winning and losing teams?
'''

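The hypothesis above reduces to comparing win rates between teams that did and did not take the first mid tower. A minimal sketch, assuming a team-level DataFrame with the dataset's `firstmidtower` (0/1) and `result` (0/1) columns; the rows below are invented for illustration, not real match data.

```python
import pandas as pd

# Invented team-level rows (two rows per game); values are illustrative only
teams = pd.DataFrame({
    "gameid": ["g1", "g1", "g2", "g2", "g3", "g3"],
    "firstmidtower": [1, 0, 0, 1, 1, 0],
    "result": [1, 0, 0, 1, 0, 1],
})

# Win rate (mean of the 0/1 result) for teams without (0) and with (1) the first mid tower
win_rate_by_midtower = teams.groupby("firstmidtower")["result"].mean()
print(win_rate_by_midtower)
```

On the real data, the same `groupby` on `filtered_df` gives the raw comparison; the permutation tests later in the notebook address whether such a gap could arise by chance.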

## Step 2: Data Cleaning and Exploratory Data Analysis

In [381]:
'''
Data Cleaning:
Removed unnecessary columns, such as team stats recorded at time benchmarks, non-relevant score statistics, and all tournament-related information beyond the bare minimum.
'''


In [382]:
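# Loading 2023 Match Data
filepath = Path('data') / '2023_LoL_esports_match_data_from_OraclesElixir.csv'
# low_memory=False parses the file in one pass so each column gets a single inferred dtype,
# which avoids the "Columns (2) have mixed types" DtypeWarning
lol_stats = pd.read_csv(filepath, low_memory=False)
print(lol_stats.describe())
# info() prints its summary directly (and returns None), so it is not wrapped in print()
lol_stats.info()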



                year       playoffs           game          patch  \
count  130764.000000  130764.000000  130764.000000  130644.000000   
mean     2023.034505       0.209874       1.649353      13.085001   
std         0.182523       0.407220       0.940911       0.056031   
min      2023.000000       0.000000       1.000000      13.010000   
25%      2023.000000       0.000000       1.000000      13.040000   
50%      2023.000000       0.000000       1.000000      13.100000   
75%      2023.000000       0.000000       2.000000      13.130000   
max      2024.000000       1.000000       5.000000      13.240000   

       participantid     gamelength         result          kills  \
count  130764.000000  130764.000000  130764.000000  130764.000000   
mean       29.583333    1877.762045       0.500000       4.712016   
std        57.650688     335.342468       0.500002       5.772584   
min         1.000000     201.000000       0.000000       0.000000   
25%         3.750000    1643.0000

In [383]:
# Display all columns
for col in lol_stats.columns:
    print(col)

# Keep only team-level rows (champion is NaN on these rows) and the relevant columns
objective_columns = lol_stats.loc[:, "firstdragon":"opp_inhibitors"].columns.to_list()
filtered_df = lol_stats[lol_stats['champion'].isna()].loc[:, [
    "gameid", 
    'league',
    "game", 
    "patch", 
    "side", 
    "position", 
    'gamelength',
    "result", 
    "teamkills", 
    "teamdeaths",
    "team kpm",
    "totalgold",
    "minionkills"
    ] + objective_columns
    ]

# Teams that took at least one void grub; the parentheses are required because & binds tighter than >
filtered_df[filtered_df['void_grubs'].notna() & (filtered_df['void_grubs'] > 0)]['void_grubs'].head()

# prev_match = lol_stats.loc[lol_stats['gameid'] ==  'ESPORTSTMNT06_2753012'].to_html(classes='table table-striped', border=0, index=True)
# file_path = 'assets/firstmatch_data.html'
# with open(file_path, 'w') as f:
#     f.write(prev_match)

gameid
datacompleteness
url
league
year
split
playoffs
date
game
patch
participantid
side
position
playername
playerid
teamname
teamid
champion
ban1
ban2
ban3
ban4
ban5
pick1
pick2
pick3
pick4
pick5
gamelength
result
kills
deaths
assists
teamkills
teamdeaths
doublekills
triplekills
quadrakills
pentakills
firstblood
firstbloodkill
firstbloodassist
firstbloodvictim
team kpm
ckpm
firstdragon
dragons
opp_dragons
elementaldrakes
opp_elementaldrakes
infernals
mountains
clouds
oceans
chemtechs
hextechs
dragons (type unknown)
elders
opp_elders
firstherald
heralds
opp_heralds
void_grubs
opp_void_grubs
firstbaron
barons
opp_barons
firsttower
towers
opp_towers
firstmidtower
firsttothreetowers
turretplates
opp_turretplates
inhibitors
opp_inhibitors
damagetochampions
dpm
damageshare
damagetakenperminute
damagemitigatedperminute
wardsplaced
wpm
wardskilled
wcpm
controlwardsbought
visionscore
vspm
totalgold
earnedgold
earned gpm
earnedgoldshare
goldspent
gspd
gpr
total cs
minionkills
monsterkills
mon

Series([], Name: void_grubs, dtype: float64)
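In the void-grub filter, Python's `&` binds more tightly than `>`, so writing `s.notna() & s > 0` without parentheses is evaluated as `(s.notna() & s) > 0` rather than as two separate conditions. A small sketch with a hypothetical Series shows the parenthesized form, which expresses "not missing and strictly positive":

```python
import numpy as np
import pandas as pd

# Hypothetical void_grubs counts, including a missing value
void_grubs = pd.Series([np.nan, 0.0, 3.0, 6.0])

# Each comparison is parenthesized before combining with &
mask = void_grubs.notna() & (void_grubs > 0)
print(void_grubs[mask])
```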

In [419]:
fig = px.histogram(filtered_df, x='towers', nbins=20, title='Distribution of Towers Destroyed by Each Team Across All Matches',
                   labels={'towers': 'Number of Towers Destroyed'})

# Layout made for dark mode
fig.update_layout(
    xaxis_title='Number of Towers Destroyed',
    yaxis_title='Teams',
    template='plotly_dark'
)
fig.write_html('assets/dist_towers.html', include_plotlyjs='cdn')
fig

In [420]:
fig = px.histogram(filtered_df[filtered_df['result'] == 1], x='towers', nbins=20, title='Distribution of Towers Destroyed by Winners',
                   labels={'towers': 'Winners Number of Towers Destroyed'})

# Customize the layout for better readability
fig.update_layout(
    xaxis_title='Number of Towers Destroyed',
    yaxis_title='Winners',
    template='plotly_dark'
)
fig.write_html('assets/win_dist_towers.html', include_plotlyjs='cdn')
fig

In [422]:
major_objectives = ['barons', 'dragons', 'heralds', 'void_grubs']
obj_count_df = filtered_df.copy()
# Sum the counts of each objective across rows
obj_count_df['major_objectives_count'] = obj_count_df[major_objectives].sum(axis=1)
# Display the new DataFrame
print(obj_count_df.head())

# Plot histogram based on distribution of major objectives.
fig = px.histogram(obj_count_df, x='major_objectives_count', nbins=20, 
                   title='Distribution of Major Objectives Count For Each Team Across All Matches',
                   labels={'major_objectives_count': 'Major Objectives Count'})

# Customize the layout for better readability
fig.update_layout(
    xaxis_title='Number of Major Objectives Taken',
    yaxis_title='Frequency',
    template='plotly_dark'
)

# Save the plot as an HTML file
fig.write_html('assets/major_objectives_distribution.html', include_plotlyjs='cdn')
fig

                   gameid league  game  patch  side position  gamelength  \
10  ESPORTSTMNT06_2753012   LFL2     1  13.01  Blue     team        2612   
11  ESPORTSTMNT06_2753012   LFL2     1  13.01   Red     team        2612   
22  ESPORTSTMNT06_2754023   LFL2     1  13.01  Blue     team        2436   
23  ESPORTSTMNT06_2754023   LFL2     1  13.01   Red     team        2436   
34  ESPORTSTMNT06_2755035   LFL2     1  13.01  Blue     team        1980   

    result  teamkills  teamdeaths  ...  firsttower  towers  opp_towers  \
10       1         13           7  ...         1.0    11.0         2.0   
11       0          7          13  ...         0.0     2.0        11.0   
22       0         20          16  ...         0.0     5.0        11.0   
23       1         16          20  ...         1.0    11.0         5.0   
34       1         20           7  ...         0.0     7.0         4.0   

    firstmidtower  firsttothreetowers  turretplates  opp_turretplates  \
10            1.0        
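`DataFrame.sum(axis=1)` skips NaN by default (`skipna=True`), so in `major_objectives_count` a team whose `void_grubs` is unrecorded still receives a total from its other objectives rather than NaN. A minimal sketch with invented values; the `min_count` variant is shown only to contrast the behavior.

```python
import numpy as np
import pandas as pd

# Invented objective counts; void_grubs is unrecorded (NaN) for two teams
teams = pd.DataFrame({
    "barons": [1, 0, 2],
    "dragons": [3, 1, 4],
    "heralds": [1, 0, 1],
    "void_grubs": [np.nan, np.nan, 3],
})

# Default: NaN values are skipped, so they add nothing to the row total
skipped = teams.sum(axis=1)

# With min_count=4, a row needs all four values present; otherwise its total is NaN
strict = teams.sum(axis=1, min_count=4)
print(skipped.tolist(), strict.tolist())
```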

In [387]:
fig = px.histogram(obj_count_df[obj_count_df['result'] == 1], x='major_objectives_count', nbins=20, 
                   title='Distribution of Winners Major Objectives Count',
                   labels={'major_objectives_count': 'Major Objectives Count'})

# Dark-mode layout
fig.update_layout(
    xaxis_title='Number of Major Objectives Taken',
    yaxis_title='Frequency',
    template='plotly_dark'
)

# Save the plot as an HTML file
fig.write_html('assets/win_major_objectives_distribution.html', include_plotlyjs='cdn')
fig

In [388]:
# Group by game and side so that each row represents one team in one game;
# NaN objective values are filled with 0 (treated as "objective not taken")
grouped_df = filtered_df.fillna(0).groupby(['gameid', 'side']).max().loc[:, 
['result', "teamkills", "teamdeaths"] + filtered_df.loc[:, 'firstdragon':'opp_inhibitors'].columns.to_list()]


In [389]:
#correlation matrix
correlation_matrix = grouped_df.corr()
relevant_correlations = correlation_matrix.loc[
    ['result', 'teamkills', 'teamdeaths'],
    grouped_df.columns[grouped_df.columns.str.startswith('first') | grouped_df.columns.str.startswith('opp')]
]

print(relevant_correlations)

            firstdragon  opp_dragons  opp_elementaldrakes  opp_elders  \
result         0.203462    -0.614756            -0.482659   -0.114585   
teamkills      0.180140    -0.426800            -0.324150   -0.005718   
teamdeaths    -0.150218     0.556212             0.453087    0.130965   

            firstherald  opp_heralds  opp_void_grubs  firstbaron  opp_barons  \
result         0.143287    -0.240959             NaN    0.569740   -0.638985   
teamkills      0.131432    -0.194477             NaN    0.462868   -0.356137   
teamdeaths    -0.101405     0.228897             NaN   -0.419880    0.556437   

            firsttower  opp_towers  firstmidtower  firsttothreetowers  \
result        0.339589   -0.889475       0.392519            0.499866   
teamkills     0.292062   -0.600457       0.350364            0.418675   
teamdeaths   -0.261857    0.716710      -0.320213           -0.388201   

            opp_turretplates  opp_inhibitors  
result             -0.276630       -0.753096  
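Because `result` and the `first*` columns in `grouped_df` are 0/1 indicators, each Pearson correlation between them is the phi coefficient of their 2x2 table (the `opp_*` columns are counts, so this reading does not apply to them). A sketch with invented indicators checks that `Series.corr` matches the phi formula computed from cell counts:

```python
import numpy as np
import pandas as pd

# Invented 0/1 indicators: took the first baron, and won the game
first_baron = pd.Series([1, 1, 1, 0, 0, 0, 1, 0])
result = pd.Series([1, 1, 0, 0, 0, 1, 1, 0])

# Cell counts of the 2x2 table (first index: first_baron, second: result)
n11 = int(((first_baron == 1) & (result == 1)).sum())
n10 = int(((first_baron == 1) & (result == 0)).sum())
n01 = int(((first_baron == 0) & (result == 1)).sum())
n00 = int(((first_baron == 0) & (result == 0)).sum())

# phi = (n11*n00 - n10*n01) / sqrt(row1 * row0 * col1 * col0)
phi = (n11 * n00 - n10 * n01) / np.sqrt(
    (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
)

pearson = first_baron.corr(result)
print(phi, pearson)
```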

In [390]:
#Relevant objectives
objectives = [
    'firstdragon', 'dragons', 'opp_dragons', 'elementaldrakes', 'opp_elementaldrakes',
    'infernals', 'mountains', 'clouds', 'oceans', 'chemtechs', 'hextechs',
    'dragons (type unknown)', 'elders', 'opp_elders', 'firstherald', 'heralds', 
    'opp_heralds', 'void_grubs', 'opp_void_grubs', 'firstbaron', 'barons', 
    'opp_barons', 'firsttower', 'towers', 'opp_towers', 'firstmidtower', 
    'firsttothreetowers', 'turretplates', 'opp_turretplates', 'inhibitors', 
    'opp_inhibitors'
]


In [391]:
# Group by the number of towers destroyed and calculate aggregates
by_towers_df = obj_count_df.groupby('towers').agg(
    num_wins=('result', 'sum'),
    total_games=('result', 'count'),
    win_loss_ratio=('result', lambda x: x.sum() / x.count()),
    avg_teamkills=('teamkills', 'mean'),
    avg_teamdeaths=('teamdeaths', 'mean'),
    avg_totalgold=('totalgold', 'mean')
).reset_index()

# Win proportion = wins / games played (the same value as 'win_loss_ratio', which is not a wins-to-losses ratio)
by_towers_df['win_proportion'] = by_towers_df['num_wins'] / by_towers_df['total_games']
by_towers_df.head(10)


|   | towers | num_wins | total_games | win_loss_ratio | avg_teamkills | avg_teamdeaths | avg_totalgold | win_proportion |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 1 | 1143 | 0.000875 | 5.910761 | 19.391951 | 40251.934383 | 0.000875 |
| 1 | 1.0 | 0 | 1934 | 0.0 | 6.843847 | 19.066184 | 44142.608066 | 0.0 |
| 2 | 2.0 | 0 | 2423 | 0.0 | 7.600495 | 18.676847 | 48310.250929 | 0.0 |
| 3 | 3.0 | 0 | 2243 | 0.0 | 9.279091 | 19.043691 | 53022.637093 | 0.0 |
| 4 | 4.0 | 0 | 1283 | 0.0 | 10.738893 | 19.354638 | 57146.473889 | 0.0 |
| 5 | 5.0 | 13 | 811 | 0.01603 | 12.44143 | 19.437731 | 61844.991369 | 0.01603 |
| 6 | 6.0 | 108 | 587 | 0.183986 | 14.364566 | 17.775128 | 64555.148211 | 0.183986 |
| 7 | 7.0 | 933 | 1211 | 0.770438 | 17.161024 | 12.383154 | 61299.076796 | 0.770438 |
| 8 | 8.0 | 1780 | 1947 | 0.914227 | 18.561376 | 10.629173 | 61219.237288 | 0.914227 |
| 9 | 9.0 | 2992 | 3084 | 0.970169 | 18.998054 | 9.300908 | 61689.851816 | 0.970169 |
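The `win_loss_ratio` aggregate divides wins by games played, so it equals the mean of the 0/1 `result` and duplicates `win_proportion`; a ratio of wins to losses would divide by losses instead. A sketch with invented rows computes both; the column names `win_rate` and `wins_per_loss` are illustrative, not from the notebook.

```python
import pandas as pd

# Invented team rows: tower count and 0/1 result
teams = pd.DataFrame({
    "towers": [9, 9, 9, 9, 3, 3],
    "result": [1, 1, 1, 0, 0, 0],
})

summary = teams.groupby("towers").agg(
    num_wins=("result", "sum"),
    total_games=("result", "count"),
    win_rate=("result", "mean"),  # same value as num_wins / total_games
).reset_index()

# Wins per loss: 0 when a group has no wins, infinite when it has no losses
summary["wins_per_loss"] = summary["num_wins"] / (summary["total_games"] - summary["num_wins"])
print(summary)
```

Using the built-in `"mean"` aggregation avoids a Python-level lambda per group.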


In [392]:
# Group by the number of major objectives taken and calculate aggregates
by_major_objectives_df = obj_count_df.groupby('major_objectives_count').agg(
    num_wins=('result', 'sum'),
    total_games=('result', 'count'),
    win_loss_ratio=('result', lambda x: x.sum() / x.count()),
    avg_teamkills=('teamkills', 'mean'),
    avg_teamdeaths=('teamdeaths', 'mean'),
    avg_totalgold=('totalgold', 'mean')
).reset_index()

# Win proportion = wins / games played (the same value as 'win_loss_ratio', which is not a wins-to-losses ratio)
by_major_objectives_df['win_proportion'] = by_major_objectives_df['num_wins'] / by_major_objectives_df['total_games']
by_major_objectives_df


|   | major_objectives_count | num_wins | total_games | win_loss_ratio | avg_teamkills | avg_teamdeaths | avg_totalgold | win_proportion |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 3 | 1672 | 0.001794 | 6.000598 | 19.232057 | 42286.542464 | 0.001794 |
| 1 | 1.0 | 38 | 2546 | 0.014925 | 7.393166 | 19.060487 | 46408.57502 | 0.014925 |
| 2 | 2.0 | 317 | 2902 | 0.109235 | 9.430737 | 18.005169 | 51123.73501 | 0.109235 |
| 3 | 3.0 | 922 | 2827 | 0.326141 | 12.798373 | 15.945525 | 56176.487089 | 0.326141 |
| 4 | 4.0 | 1950 | 3113 | 0.626405 | 16.366849 | 12.850948 | 59249.806617 | 0.626405 |
| 5 | 5.0 | 2820 | 3428 | 0.822637 | 18.249708 | 10.808051 | 61393.306884 | 0.822637 |
| 6 | 6.0 | 2545 | 2848 | 0.89361 | 19.041081 | 9.999298 | 63323.331461 | 0.89361 |
| 7 | 7.0 | 1561 | 1670 | 0.934731 | 19.349701 | 9.84012 | 65617.610778 | 0.934731 |
| 8 | 8.0 | 595 | 631 | 0.942948 | 19.790808 | 10.866878 | 70340.63233 | 0.942948 |
| 9 | 9.0 | 120 | 130 | 0.923077 | 19.676923 | 11.730769 | 76984.476923 | 0.923077 |


In [393]:
# Specify the objectives to analyze
selected_objectives = ['barons', 'dragons', 'firstdragon', 'towers', 'void_grubs', 'elders', 'inhibitors']

objective_proportions = {}

# Calculate win/loss proportions for each selected objective
for obj in selected_objectives:
    win_loss_counts = obj_count_df.groupby([obj, 'result']).size().unstack(fill_value=0)
    win_loss_proportions = win_loss_counts.div(win_loss_counts.sum(axis=1), axis=0)
    
    # Add the objective proportions to the dictionary
    objective_proportions[obj] = win_loss_proportions

for obj in selected_objectives:
    proportions_df = objective_proportions[obj].reset_index()

    # Create a bar plot for each objective
    fig = px.bar(proportions_df, 
                 x=obj, 
                 y=[0, 1], 
                 title=f'Proportion of Wins/Losses for {obj.capitalize()}',
                 labels={obj: f'Number of {obj.capitalize()}', 'value': 'Proportion'},
                 barmode='stack')

    # Update layout for better readability
    fig.update_layout(
        yaxis=dict(tickformat=".0%"),
        template='plotly_dark'
        )
    fig.show()
    
    # Save the plot as an HTML file
    fig.write_html(f'assets/{obj.capitalize()}_prop_bar.html', include_plotlyjs='cdn')
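Counting with `groupby(...).size().unstack(fill_value=0)` and then dividing each row by its total is the same computation as `pd.crosstab` with `normalize='index'`. A minimal sketch with invented baron counts, assuming the same 0/1 `result` coding:

```python
import pandas as pd

# Invented team rows: baron count and 0/1 result
teams = pd.DataFrame({
    "barons": [0, 0, 1, 1, 1, 2],
    "result": [0, 0, 0, 1, 1, 1],
})

# Approach used above: count, unstack, divide by row totals
counts = teams.groupby(["barons", "result"]).size().unstack(fill_value=0)
by_groupby = counts.div(counts.sum(axis=1), axis=0)

# Equivalent: each row (one baron count) sums to 1
by_crosstab = pd.crosstab(teams["barons"], teams["result"], normalize="index")
print(by_crosstab)
```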

In [394]:
win_proportion_frames = []

for obj in objectives:
    # Win proportion (mean of the 0/1 result) for each observed count of the objective
    win_proportion = grouped_df.groupby(obj)['result'].mean()
    
    # Filter out zero win proportions
    win_proportion = win_proportion[win_proportion > 0]

    # Keep the objective count as a column so it can serve as the x-axis
    win_proportion_frames.append(
        win_proportion.rename('Win Proportion')
        .rename_axis('Number of Objectives')
        .reset_index()
        .assign(Objective=obj)
    )

# Long-form DataFrame for plotting: one row per (objective, count)
win_proportions_melted = pd.concat(win_proportion_frames, ignore_index=True)

# Line plot without zero points; x is the actual objective count, not a row position
fig = px.line(win_proportions_melted, 
              x='Number of Objectives', 
              y='Win Proportion', 
              color='Objective', 
              title='Win Proportion by Number of Objectives Secured (Non-Zero)')

fig.update_layout(yaxis=dict(tickformat=".0%"))
fig.show()





In [397]:
# Export the DataFrames to HTML with custom CSS so they are easy to read on the website's dark "hacker" theme
custom_css = """
<style>
    body {
        background-color: #2b2b2b;
        color: #f0f0f0;
    }
    table {
        width: 100%;
        border-collapse: collapse;
        margin: 25px 0;
        font-size: 1.2em;
        font-family: Arial, sans-serif;
        text-align: left;
        border-radius: 5px 5px 0 0;
        overflow: hidden;
    }
    thead tr {
        background-color: #333;
        color: #ffffff;
        text-align: left;
        font-weight: bold;
    }
    th, td {
        padding: 12px 15px;
        color: #f0f0f0;
    }
    tbody tr {
        border-bottom: 1px solid #444;
    }
    tbody tr:nth-of-type(even) {
        background-color: #3a3a3a;
    }
    tbody tr:nth-of-type(odd) {
        background-color: #2b2b2b;
    }
    tbody tr:hover {
        background-color: #555;
        color: #ffffff;
    }
    th, td {
        border: 1px solid #444;
    }
</style>
"""

towers_html = custom_css + by_towers_df.to_html(classes='table table-striped', border=0, index=True)
objectives_html = custom_css + by_major_objectives_df.to_html(classes='table table-striped', border=0, index=True)

with open('assets/by_towers_df.html', 'w') as f:
    f.write(towers_html)

with open('assets/by_major_objectives_df.html', 'w') as f:
    f.write(objectives_html)


In [398]:
# Interesting Aggregates
# Goal: Summarize and group data to understand collective behaviors or patterns.
# Actions:
# Calculate win/loss proportions for different counts of objectives (e.g., how often teams win when they secure 2 dragons vs. 3 dragons).
# Visualize these aggregates using bar plots to see how securing different numbers of objectives impacts win rates.

## Step 3: Assessment of Missingness

In [399]:
# For all objective columns, the NaN values we filled with 0 correspond to a team not taking that objective. We therefore treat these columns as
# Not Missing At Random (NMAR): missingness follows no trend in other columns but depends on each team's decisions. The unfilled data (filtered_df) still contains these NaN values.

In [400]:
# Missingness Dependency: What about cases where an objective column is NaN for an entire league? Below, we group by league and take the maximum
# of each objective column, so a NaN marks a league in which that column is never recorded. As the table shows, "firstdragon" can be NaN even when "dragons"
# has a recorded value (e.g., DCup). We therefore investigate whether "firstdragon" missingness depends on the "league" column.

In [401]:
# Per-league maximum of each objective column; NaN means the column is never recorded in that league
objective_columns = filtered_df.loc[:, 'firstdragon':'opp_inhibitors'].columns.to_list()
nangrouped2_df = filtered_df.groupby('league')[objective_columns].max()
nangrouped2_df.head(10)

| league | firstdragon | dragons | opp_dragons | elementaldrakes | opp_elementaldrakes | infernals | mountains | clouds | oceans | chemtechs | ... | opp_barons | firsttower | towers | opp_towers | firstmidtower | firsttothreetowers | turretplates | opp_turretplates | inhibitors | opp_inhibitors |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AL | 1.0 | 6.0 | 6.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 2.0 | 4.0 | ... | 3.0 | 1.0 | 11.0 | 11.0 | 1.0 | 1.0 | 12.0 | 12.0 | 7.0 | 7.0 |
| CBLOL | 1.0 | 5.0 | 5.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 3.0 | 4.0 | ... | 4.0 | 1.0 | 11.0 | 11.0 | 1.0 | 1.0 | 11.0 | 11.0 | 6.0 | 6.0 |
| CBLOLA | 1.0 | 6.0 | 6.0 | 4.0 | 4.0 | 4.0 | 3.0 | 3.0 | 4.0 | 3.0 | ... | 4.0 | 1.0 | 11.0 | 11.0 | 1.0 | 1.0 | 11.0 | 11.0 | 8.0 | 8.0 |
| CDF | 1.0 | 4.0 | 4.0 | 4.0 | 4.0 | 3.0 | 2.0 | 4.0 | 2.0 | 2.0 | ... | 4.0 | 1.0 | 11.0 | 11.0 | 1.0 | 1.0 | 15.0 | 15.0 | 8.0 | 8.0 |
| CT | 1.0 | 6.0 | 6.0 | 4.0 | 4.0 | 2.0 | 3.0 | 3.0 | 3.0 | 3.0 | ... | 3.0 | 1.0 | 11.0 | 11.0 | 1.0 | 1.0 | 10.0 | 10.0 | 4.0 | 4.0 |
| DCup | NaN | 4.0 | 4.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 3.0 | NaN | 11.0 | 11.0 | NaN | NaN | NaN | NaN | 4.0 | 4.0 |
| DDH | 1.0 | 5.0 | 5.0 | 4.0 | 4.0 | 3.0 | 3.0 | 3.0 | 3.0 | 2.0 | ... | 3.0 | 1.0 | 11.0 | 11.0 | 1.0 | 1.0 | 13.0 | 13.0 | 5.0 | 5.0 |
| EBL | 1.0 | 6.0 | 6.0 | 4.0 | 4.0 | 4.0 | 4.0 | 3.0 | 3.0 | 3.0 | ... | 4.0 | 1.0 | 11.0 | 11.0 | 1.0 | 1.0 | 13.0 | 13.0 | 6.0 | 6.0 |
| EL | 1.0 | 6.0 | 6.0 | 4.0 | 4.0 | 2.0 | 2.0 | 3.0 | 3.0 | 3.0 | ... | 4.0 | 1.0 | 11.0 | 11.0 | 1.0 | 1.0 | 12.0 | 12.0 | 5.0 | 5.0 |
| EM | 1.0 | 5.0 | 5.0 | 4.0 | 4.0 | 3.0 | 3.0 | 4.0 | 3.0 | 3.0 | ... | 3.0 | 1.0 | 11.0 | 11.0 | 1.0 | 1.0 | 15.0 | 15.0 | 5.0 | 5.0 |
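The per-league `.max()` view flags leagues where a column is never recorded, because the maximum of an all-NaN group is NaN. A more direct check is the share of missing values per league. A sketch with invented rows, assuming the `league` and `firstdragon` columns used above:

```python
import numpy as np
import pandas as pd

# Invented team rows: one league never records firstdragon
teams = pd.DataFrame({
    "league": ["LCK", "LCK", "DCup", "DCup", "LCS", "LCS"],
    "firstdragon": [1.0, 0.0, np.nan, np.nan, 0.0, 1.0],
})

# Fraction of rows with a missing firstdragon, per league
missing_share = teams["firstdragon"].isna().groupby(teams["league"]).mean()

# Leagues where the column is missing on every row
fully_missing = missing_share[missing_share == 1].index.tolist()
print(missing_share)
print(fully_missing)
```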


In [402]:
# FirstDragon Dependency on League
# .copy() makes an independent DataFrame, avoiding the SettingWithCopyWarning
lol_stats_with_missing = filtered_df[['league', 'firstdragon']].copy()
lol_stats_with_missing['firstdragon_missing'] = lol_stats_with_missing['firstdragon'].isna()

# Preview
lol_stats_with_missing.head(5)

league_distribution = lol_stats_with_missing.groupby(['league', 'firstdragon_missing']).size().unstack(fill_value=0)

# Normalize the distribution
league_distribution_norm = league_distribution.div(league_distribution.sum(axis=1), axis=0)
print(league_distribution_norm)


# Test statistic: sum over leagues of |P(missing | league) - P(not missing | league)|
observed_tvd = league_distribution_norm.diff(axis=1).iloc[:, -1].abs().sum()

n_permutations = 1000
permuted_tvds = []

for _ in range(n_permutations):
    shuffled = lol_stats_with_missing['firstdragon_missing'].sample(frac=1).values
    permuted_counts = lol_stats_with_missing.assign(shuffled_missing=shuffled).groupby(['league', 'shuffled_missing']).size().unstack(fill_value=0)
    permuted_counts_norm = permuted_counts.div(permuted_counts.sum(axis=1), axis=0)
    permuted_tvd = permuted_counts_norm.diff(axis=1).iloc[:, -1].abs().sum()
    permuted_tvds.append(permuted_tvd)

# Calculate p-value
p_value = np.mean(np.array(permuted_tvds) >= observed_tvd)
print(f"Observed TVD: {observed_tvd}")
print(f"P-value: {p_value:.4f}")

# Show histogram
fig = px.histogram(permuted_tvds, nbins=50, title="Permutation Test: TVD between 'firstdragon' Missingness and 'league'",
                   labels={'value': 'TVD'}, marginal="box")
fig.add_vline(x=observed_tvd, line_width=3, line_dash="dash", line_color="lime", annotation_text="Observed TVD")
fig.update_layout(yaxis_title="Frequency", xaxis_title="TVD", template='plotly_dark')
fig.write_html('assets/firstdragon_league_tvd.html', include_plotlyjs='cdn')
fig







firstdragon_missing     False      True
league                                 
AL                   1.000000  0.000000
CBLOL                1.000000  0.000000
CBLOLA               1.000000  0.000000
CDF                  1.000000  0.000000
CT                   1.000000  0.000000
DCup                 0.000000  1.000000
DDH                  1.000000  0.000000
EBL                  1.000000  0.000000
EL                   1.000000  0.000000
EM                   1.000000  0.000000
EPL                  1.000000  0.000000
ESLOL                1.000000  0.000000
GL                   1.000000  0.000000
GLL                  1.000000  0.000000
HC                   1.000000  0.000000
HM                   1.000000  0.000000
IC                   1.000000  0.000000
LAS                  1.000000  0.000000
LCK                  1.000000  0.000000
LCKC                 1.000000  0.000000
LCO                  1.000000  0.000000
LCS                  1.000000  0.000000
LDL                  0.000000  1.000000
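The statistic computed above adds one absolute difference per league, so it is not bounded by 1. The void-grub test below reports 1.99, for example. The conventional total variation distance instead compares the league distribution of rows where `firstdragon` is missing with the league distribution of rows where it is present, and always lies between 0 and 1. Below is a sketch of a permutation test using that conventional TVD in place of the notebook's statistic. The helper names `tvd_by_missingness` and `permutation_pvalue` are hypothetical, and the data are invented.

```python
import numpy as np
import pandas as pd


def tvd_by_missingness(group: pd.Series, missing: pd.Series) -> float:
    """TVD between the distribution of `group` among missing and among non-missing rows."""
    # Category shares within each missingness group; each sums to 1
    missing_dist = group[missing].value_counts(normalize=True)
    present_dist = group[~missing].value_counts(normalize=True)
    # Align categories; a category absent from one group has share 0 there
    diff = missing_dist.sub(present_dist, fill_value=0)
    return float(0.5 * diff.abs().sum())


def permutation_pvalue(group: pd.Series, missing: pd.Series,
                       n_permutations: int = 1000, seed: int = 0) -> tuple[float, float]:
    """Observed TVD and the share of label shuffles with a TVD at least as large."""
    rng = np.random.default_rng(seed)
    observed = tvd_by_missingness(group, missing)
    simulated = np.empty(n_permutations)
    for i in range(n_permutations):
        # Shuffle the missingness labels but keep the original index for alignment
        shuffled = pd.Series(rng.permutation(missing.to_numpy()), index=missing.index)
        simulated[i] = tvd_by_missingness(group, shuffled)
    return observed, float(np.mean(simulated >= observed))


# Invented data: only the DCup rows are missing
league = pd.Series(["LCK"] * 40 + ["LCS"] * 40 + ["DCup"] * 20)
missing = pd.Series([False] * 80 + [True] * 20)

observed, p_value = permutation_pvalue(league, missing, n_permutations=200)
print(observed, p_value)
```

On the real data this would be called as `permutation_pvalue(lol_stats_with_missing['league'], lol_stats_with_missing['firstdragon_missing'])`.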


In [403]:
# The distribution above shows that firstdragon missingness is concentrated in particular leagues: every DCup and LDL row is missing,
# while the other listed leagues are fully recorded.

# We also suspected an underlying meaning: a NaN firstdragon could simply mean that no dragon was slain in that game.
# Therefore, we next test whether firstdragon missingness depends on the match result.

# FirstDragon Dependency on Number of Wins
# .copy() makes an independent DataFrame, avoiding the SettingWithCopyWarning
lol_stats_with_missing = filtered_df[['result', 'firstdragon']].copy()
lol_stats_with_missing['firstdragon_missing'] = lol_stats_with_missing['firstdragon'].isna()

# Count the number of wins for missing and non-missing firstdragon
win_counts = lol_stats_with_missing.groupby('firstdragon_missing')['result'].sum()

observed_diff = win_counts.diff().iloc[-1]
n_permutations = 1000
permuted_diffs = []

for _ in range(n_permutations):
    shuffled = lol_stats_with_missing['firstdragon_missing'].sample(frac=1).values
    permuted_counts = lol_stats_with_missing.assign(shuffled_missing=shuffled).groupby('shuffled_missing')['result'].sum()
    permuted_diff = permuted_counts.diff().iloc[-1]
    permuted_diffs.append(permuted_diff)

p_value = np.mean(np.array(permuted_diffs) >= observed_diff)
print(f"Observed Difference: {observed_diff}")
print(f"P-value: {p_value:.4f}")

# Plotting the permutation test results
fig = px.histogram(permuted_diffs, nbins=50, title="Permutation Test: Difference in Wins between 'firstdragon' Missingness",
                   labels={'value': 'Difference in Wins'}, marginal="box")
fig.add_vline(x=observed_diff, line_width=3, line_dash="dash", line_color="lime", annotation_text="Observed Difference")
fig.update_layout(yaxis_title="Frequency", xaxis_title="Difference in Wins", template='plotly_dark')
fig.write_html('assets/firstdragon_result_diff.html', include_plotlyjs='cdn')
fig







Observed Difference: -7567.0
P-value: 0.4980
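Every game contributes exactly one win and one loss. So if `firstdragon` is missing for both team rows of a game, as happens for entire leagues above, the win rate among missing rows is exactly 50%. The observed difference in win counts then mostly reflects how many rows fall into each group. A sketch with invented games compares win rates rather than raw counts, under that per-game assumption:

```python
import numpy as np
import pandas as pd

# Invented games: two team rows per game, exactly one winner per game
games = pd.DataFrame({
    "gameid": ["g1", "g1", "g2", "g2", "g3", "g3", "g4", "g4"],
    "result": [1, 0, 0, 1, 1, 0, 0, 1],
    "firstdragon": [1.0, 0.0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
})
games["firstdragon_missing"] = games["firstdragon"].isna()

# Win counts scale with group size; win rates do not
win_counts = games.groupby("firstdragon_missing")["result"].sum()
win_rates = games.groupby("firstdragon_missing")["result"].mean()
print(win_counts.to_dict(), win_rates.to_dict())
```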


In [404]:
# Permutation Test 3: Missingness of void_grubs vs. result using TVD
# VoidGrubs Dependency on Result
# .copy() makes an independent DataFrame, avoiding the SettingWithCopyWarning
lol_stats_with_missing = filtered_df[['result', 'void_grubs']].copy()

# Add a new column to indicate missingness for 'void_grubs'
lol_stats_with_missing['void_grubs_missing'] = lol_stats_with_missing['void_grubs'].isna()

# Calculate the distribution of result based on missingness
result_distribution = lol_stats_with_missing.groupby(['result', 'void_grubs_missing']).size().unstack(fill_value=0)

# Normalize the distribution
result_distribution_norm = result_distribution.div(result_distribution.sum(axis=1), axis=0)

# Calculate the observed TVD
observed_tvd = result_distribution_norm.diff(axis=1).iloc[:, -1].abs().sum()

# Permutation test
n_permutations = 1000
permuted_tvds = []

for _ in range(n_permutations):
    shuffled = lol_stats_with_missing['void_grubs_missing'].sample(frac=1).values
    permuted_counts = lol_stats_with_missing.assign(shuffled_missing=shuffled).groupby(['result', 'shuffled_missing']).size().unstack(fill_value=0)
    permuted_counts_norm = permuted_counts.div(permuted_counts.sum(axis=1), axis=0)
    permuted_tvd = permuted_counts_norm.diff(axis=1).iloc[:, -1].abs().sum()
    permuted_tvds.append(permuted_tvd)

# Calculate p-value
p_value = np.mean(np.array(permuted_tvds) >= observed_tvd)
print(f"Observed TVD: {observed_tvd}")
print(f"P-value: {p_value:.4f}")

# Plotting the permutation test results
fig = px.histogram(permuted_tvds, nbins=50, title="Permutation Test: TVD between 'void_grubs' Missingness and 'result'",
                   labels={'value': 'TVD'}, marginal="box")
fig.add_vline(x=observed_tvd, line_width=3, line_dash="dash", line_color="lime", annotation_text="Observed TVD")
fig.update_layout(yaxis_title="Frequency", xaxis_title="TVD", template='plotly_dark')
fig.write_html('assets/void_grubs_result_tvd.html', include_plotlyjs='cdn')
fig.show()







Observed TVD: 1.9941268238964853
P-value: 0.6170
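An observed statistic of about 1.99 is close to 2, the largest value this calculation can reach (one absolute difference per `result` value). That happens when nearly every row falls on the same side of the missing/not-missing split. This fits the game's timeline: Void Grubs were introduced in patch 14.1, and the latest patch in this dataset is 13.24 (see the `describe()` output), so the column cannot record real void-grub takes. Checking each column's missing share before running a test can flag columns like this. A sketch with invented values; the 99% cutoff is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Invented team rows: void_grubs is never recorded
teams = pd.DataFrame({
    "result": [1, 0, 1, 0],
    "dragons": [3, 1, 2, 2],
    "void_grubs": [np.nan, np.nan, np.nan, np.nan],
})

# Share of missing values per column, largest first
missing_share = teams.isna().mean().sort_values(ascending=False)

# Columns missing in at least 99% of rows (arbitrary cutoff)
mostly_missing = missing_share[missing_share >= 0.99].index.tolist()
print(missing_share)
print(mostly_missing)
```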


## Step 4: Hypothesis Testing

In [405]:
'''
Null Hypothesis (H₀): Securing a specific objective (e.g., first dragon, first tower, or baron) does not significantly affect the probability of winning the game.

Alternative Hypothesis (H₁): Securing a specific objective (e.g., first dragon, first tower, or baron) significantly increases the probability of winning the game.

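Test Statistic: The signed difference in win proportion between the highest and second-highest values of the selected objective column (for a 0/1 column, teams that secured the objective minus teams that did not). Because H₁ is one-sided, only large positive differences count as evidence against H₀.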

Significance Level: 5%
'''


In [406]:
# Column to test; change this to test a different objective
objective_count = 'major_objectives_count'

# Calculate the observed difference in win proportions
win_proportions = obj_count_df.groupby(objective_count)['result'].mean()
observed_diff = win_proportions.diff().iloc[-1]

# Permutation test
n_permutations = 1000
permuted_diffs = []

for _ in range(n_permutations):
    # Shuffle the objective column positionally (to_numpy avoids index alignment in assign)
    shuffled = obj_count_df[objective_count].sample(frac=1).to_numpy()
    permuted_win_proportions = obj_count_df.assign(shuffled=shuffled).groupby('shuffled')['result'].mean()
    permuted_diff = permuted_win_proportions.diff().iloc[-1]
    permuted_diffs.append(permuted_diff)

# Calculate p-value
p_value = np.mean(np.array(permuted_diffs) >= observed_diff)
print(f"Observed Difference: {observed_diff}")
print(f"P-value: {p_value:.4f}")

# Plotting the permutation test results
fig = px.histogram(permuted_diffs, nbins=50, title="Permutation Test: Impact of Major Objectives Count on Winning",
                   labels={'value': 'Difference in Win Proportions'}, marginal="box")
fig.add_vline(x=observed_diff, line_width=3, line_dash="dash", line_color="lime", annotation_text="Observed Difference")
fig.update_layout(yaxis_title="Frequency", xaxis_title="Difference in Win Proportions", template='plotly_dark')
fig.write_html('assets/major_objectives_win_diff.html', include_plotlyjs='cdn')
fig.show()

Observed Difference: 0.045454545454545414
P-value: 0.4340
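The hypotheses above are phrased for a single objective that a team either secured or did not. A minimal sketch of that one-sided test, assuming a 0/1 objective column such as firstdragon (with missing values already filled) and a 0/1 result column; win_rate_diff and one_sided_permutation_test are hypothetical helper names:

```python
import numpy as np
import pandas as pd


def win_rate_diff(df, objective_col, result_col='result'):
    """Win proportion of teams that secured the objective minus those that did not."""
    secured = df[objective_col] == 1
    return df.loc[secured, result_col].mean() - df.loc[~secured, result_col].mean()


def one_sided_permutation_test(df, objective_col, result_col='result', n_permutations=1000, seed=0):
    rng = np.random.default_rng(seed)
    observed = win_rate_diff(df, objective_col, result_col)
    shuffled = df[[objective_col, result_col]].copy()
    diffs = np.empty(n_permutations)
    for i in range(n_permutations):
        # Shuffle the objective labels; match results stay in place
        shuffled[objective_col] = rng.permutation(df[objective_col].to_numpy())
        diffs[i] = win_rate_diff(shuffled, objective_col, result_col)
    # One-sided: H1 says securing the objective increases the win rate
    p_value = np.mean(diffs >= observed)
    return observed, p_value
```

Shuffling only the objective column keeps the number of teams that secured it fixed, so each permutation simulates a world where securing the objective is unrelated to winning.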


## Step 5: Framing a Prediction Problem

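Can we predict the losing team's total gold from the objectives the winning team captured? This is a regression problem: the response variable is the losing team's totalgold, and models are evaluated with mean squared error (MSE) and R² on a held-out test set.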
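Framing the problem per game assumes each gameid has exactly one winning and one losing team row. A minimal sketch of that pairing, mirroring the winner/loser merge used in the baseline step but adding pandas' validate='one_to_one' check, so a game with more than one winning or losing row raises a MergeError instead of silently producing extra rows; pair_winner_loser is a hypothetical helper:

```python
import pandas as pd


def pair_winner_loser(team_rows, cols):
    """One row per game: winner columns suffixed _winner, loser columns _loser."""
    winners = team_rows.loc[team_rows['result'] == 1, ['gameid', *cols]]
    losers = team_rows.loc[team_rows['result'] == 0, ['gameid', *cols]]
    # validate raises pandas.errors.MergeError if a gameid repeats on either side
    return winners.merge(losers, on='gameid', suffixes=('_winner', '_loser'), validate='one_to_one')
```

Games with only one team row are dropped by the inner merge rather than flagged, so comparing len(paired) with the number of unique gameid values is a quick extra check.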

In [407]:
selected_columns = [
    'firstdragon',
    'elementaldrakes',
    'elders',
    'heralds',
    'barons',
    'firsttower',
    'towers',
    'inhibitors'
]

# Creating a filtered DataFrame with only the relevant columns
filtered_df_prediction = filtered_df[selected_columns]

html_string = custom_css + filtered_df_prediction.head(10).to_html(classes='table table-striped', border=0, index=False)

# Save the HTML to a file
with open('assets/filtered_df_prediction.html', 'w') as f:
    f.write(html_string)

## Step 6: Baseline Model

In [408]:
winner_totalgold = filtered_df[filtered_df['result'] == 1]['totalgold']
winner_totalgold.median()

60577.0

In [409]:
# Split the team rows into winners and losers
winner_df = filtered_df[filtered_df['result'] == 1][['gameid', 'gamelength', 'totalgold', 'firstdragon', 'dragons']]
loser_df = filtered_df[filtered_df['result'] == 0][['gameid', 'totalgold', 'firstdragon', 'dragons']]

# Merge winner and loser rows on gameid so each game becomes a single row
merged_df = winner_df.merge(loser_df, on='gameid', suffixes=('_winner', '_loser'))

# HTML conversion for visual
html_string = custom_css + merged_df.head(10).to_html(classes='table table-striped', border=0, index=False)

html_file_path = 'assets/merged_df.html'
with open(html_file_path, 'w') as f:
    f.write(html_string)

# Training on three features: game length plus the winner's first dragon and dragon count
X = merged_df[['gamelength', 'firstdragon_winner', 'dragons_winner']]
y = merged_df['totalgold_loser']

# Impute NaN as 0: inspecting the DataFrame shows that NaN means the condition was not met (e.g., first dragon not taken)
X = X.fillna(0)
y = y.fillna(0)

# 80/20 train/test split; no hyperparameter tuning
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=80)
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('regressor', RandomForestRegressor(random_state=80))
])

# Fitting and predicting
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)

# Evaluate the MSE and R^2 scores
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'MSE: {mse:.4f}')
print(f'R2: {r2:.4f}')

MSE: 11626774.7990
R2: 0.9138
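MSE is measured in squared gold units, and R² compares the model's squared error with that of always predicting the mean of the test targets. A minimal sketch of both definitions in NumPy; mse_and_r2 is a hypothetical helper, and scikit-learn's mean_squared_error and r2_score compute the same quantities for a single output with non-constant targets:

```python
import numpy as np


def mse_and_r2(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)
    # R^2 = 1 - SS_res / SS_tot, where SS_tot is the spread of y_true around its own mean
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return mse, 1.0 - ss_res / ss_tot
```

Under this definition, the baseline's R² of 0.9138 means its squared error on the test set is about 8.6% of the squared error from predicting the test-set mean loser gold for every game.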


## Step 7: Final Model

In [415]:
fin_winner_df = filtered_df[filtered_df['result'] == 1][['gameid', 'gamelength', 'totalgold', 'firstdragon', 'dragons', 'barons', 'towers', 'teamkills', 'teamdeaths', 'minionkills']]
fin_loser_df = filtered_df[filtered_df['result'] == 0][['gameid', 'totalgold', 'firstdragon', 'dragons', 'barons', 'towers', 'teamkills', 'teamdeaths', 'minionkills']]

# Merge the two DataFrames and create final_merged_df based on each game played (merged on gameid)
final_merged_df = fin_winner_df.merge(fin_loser_df, on='gameid', suffixes=('_winner', '_loser'))

X = final_merged_df[['gamelength', 'totalgold_winner', 'firstdragon_winner', 'dragons_winner', 
                     'barons_winner', 'towers_winner', 'teamkills_winner', 'teamdeaths_winner', 'minionkills_winner']]
y = final_merged_df['totalgold_loser']  # Target: opponent's total gold

# Impute missing values with 0
X = X.fillna(0)

# Split the data into training and test sets (80/20 split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=80) # State 80 for DSC80

# Pipeline with Random Forest
pipeline = Pipeline([
    ('scaler', StandardScaler()),  # Scale the features
    ('regressor', RandomForestRegressor(random_state=80))
])

# Hyperparameter tuning was attempted but gave only a marginal improvement, so it is left commented out
# param_grid = {
#     'regressor__n_estimators': [100, 200, 300],        # Number of trees in the forest
#     'regressor__max_depth': [10, 20, 30, None],        # Maximum depth of the trees
#     'regressor__min_samples_split': [2, 5, 10],        # Minimum number of samples required to split a node
#     'regressor__min_samples_leaf': [1, 2, 4]           # Minimum number of samples required at each leaf node
# }
# grid_search = GridSearchCV(pipeline, param_grid, cv=5, scoring='neg_mean_squared_error', n_jobs=-1, verbose=2)
# grid_search.fit(X_train, y_train)

# best_model = grid_search.best_estimator_
# y_pred = best_model.predict(X_test)

pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'MSE: {mse:.4f}')
print(f'R2: {r2:.4f}')

MSE: 4591782.9952
R2: 0.9659
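Taking square roots puts the reported errors back in gold units. A short worked check using the MSE values printed by the baseline and final cells (the two models were each evaluated on their own 80/20 test split, which may not contain exactly the same games):

```python
import math

# MSE values printed by the baseline (Step 6) and final (Step 7) cells
baseline_mse = 11626774.7990
final_mse = 4591782.9952

baseline_rmse = math.sqrt(baseline_mse)       # roughly 3,410 gold
final_rmse = math.sqrt(final_mse)             # roughly 2,143 gold
mse_reduction = 1 - final_mse / baseline_mse  # roughly 0.605

print(f'Baseline RMSE: {baseline_rmse:,.0f} gold')
print(f'Final RMSE: {final_rmse:,.0f} gold')
print(f'MSE reduction: {mse_reduction:.1%}')
```

So a typical prediction of the losing team's total gold misses by roughly 2,100 gold with the final model, compared with roughly 3,400 gold for the baseline.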


In [436]:
df = pd.DataFrame({
    'y_test': y_test,
    'y_pred': y_pred
})

# Scatter plot of Actual vs Predicted
scatter_fig = px.scatter(df, x='y_test', y='y_pred', 
                         title='Actual vs Predicted Total Gold (Loser)',
                         labels={'y_test': 'Actual Total Gold (Loser)', 
                                 'y_pred': 'Predicted Total Gold (Loser)'})

# Add the perfect prediction line (diagonal)
scatter_fig.add_shape(type="line", x0=df['y_test'].min(), y0=df['y_test'].min(), 
                      x1=df['y_test'].max(), y1=df['y_test'].max(),
                      line=dict(color="Red", dash="dash"))
scatter_fig.update_layout(template='plotly_dark')

# Save the scatter plot as an HTML file
scatter_fig.write_html("assets/scatter_actual_vs_predicted.html")

# Show the plot
scatter_fig.show()

df['residuals'] = df['y_test'] - df['y_pred']

# Histogram of residuals, with a dashed line at zero error
residual_fig = px.histogram(df, x='residuals', nbins=50, title='Residual Plot',
                            labels={'residuals': 'Residuals (Actual - Predicted)'})
residual_fig.add_shape(type="line", x0=0, y0=0, x1=0, y1=1,
                       line=dict(color="Green", dash="dash"),
                       xref="x", yref="paper")

residual_fig.update_layout(template='plotly_dark')
residual_fig.write_html("assets/residual_plot.html")

# Show the plot
residual_fig.show()
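The MSE and R² in this step come from a single 80/20 split. cross_val_score, already imported at the top of the notebook, can report the error across several folds instead. A minimal sketch, assuming 5 shuffled folds and a default RandomForestRegressor; cv_rmse is a hypothetical helper, and the StandardScaler is omitted because feature scaling has no effect on tree splits:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score


def cv_rmse(X, y, n_splits=5, seed=80):
    """Mean and standard deviation of RMSE across K folds for a random forest regressor."""
    model = RandomForestRegressor(random_state=seed)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # scikit-learn maximizes scores, so MSE is reported as a negative number
    neg_mse = cross_val_score(model, X, y, cv=folds, scoring='neg_mean_squared_error')
    rmse = np.sqrt(-neg_mse)
    return rmse.mean(), rmse.std()
```

With the Step 7 feature matrix and target, this would be called as cv_rmse(X, y); a small fold-to-fold spread suggests the single-split MSE is representative.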

## Step 8: Fairness Analysis


In [416]:
# Winning team's kills per unit of game length, used as a proxy for aggression
final_merged_df['teamkills_ratio'] = final_merged_df['teamkills_winner'] / final_merged_df['gamelength']
median_ratio = final_merged_df['teamkills_ratio'].median()
print(f'Median teamkills_ratio: {median_ratio}')

# Split games at the median ratio (about 0.0100): Group X at or below, Group Y above
group_X = final_merged_df[final_merged_df['teamkills_ratio'] <= median_ratio]
group_Y = final_merged_df[final_merged_df['teamkills_ratio'] > median_ratio]

# Using identical features of final training
X_X = group_X[['gamelength', 'totalgold_winner', 'firstdragon_winner', 'dragons_winner', 'barons_winner', 
               'towers_winner', 'teamkills_winner', 'teamdeaths_winner', 'minionkills_winner']].fillna(0)
y_X = group_X['totalgold_loser']

X_Y = group_Y[['gamelength', 'totalgold_winner', 'firstdragon_winner', 'dragons_winner', 'barons_winner', 
               'towers_winner', 'teamkills_winner', 'teamdeaths_winner', 'minionkills_winner']].fillna(0)
y_Y = group_Y['totalgold_loser']

# Predict with the fitted final pipeline and compute each group's MSE (both groups include games from the training split)
y_pred_X = pipeline.predict(X_X)
y_pred_Y = pipeline.predict(X_Y)
mse_X = mean_squared_error(y_X, y_pred_X)
mse_Y = mean_squared_error(y_Y, y_pred_Y)
observed_diff = mse_Y - mse_X

print(f"Observed MSE difference: {observed_diff}")

# Lower n_permutations for a faster (less precise) run
n_permutations = 1000
permuted_diffs = []

for _ in range(n_permutations):
    # Shuffle the rows of the combined data (each row keeps its own teamkills_ratio, features, and target)
    combined_data = final_merged_df[['teamkills_ratio', 'gamelength', 'totalgold_winner', 'firstdragon_winner',
                                     'dragons_winner', 'barons_winner', 'towers_winner', 'teamkills_winner', 'teamdeaths_winner',
                                     'minionkills_winner', 'totalgold_loser']].sample(frac=1).reset_index(drop=True)

    # Re-split at the median teamkills_ratio of the shuffled data
    perm_group_X = combined_data[combined_data['teamkills_ratio'] <= combined_data['teamkills_ratio'].median()]
    perm_group_Y = combined_data[combined_data['teamkills_ratio'] > combined_data['teamkills_ratio'].median()]

    # Re-separate features and target
    perm_X_X = perm_group_X[['gamelength', 'totalgold_winner', 'firstdragon_winner', 'dragons_winner', 'barons_winner', 
                             'towers_winner', 'teamkills_winner', 'teamdeaths_winner', 'minionkills_winner']].fillna(0)
    perm_y_X = perm_group_X['totalgold_loser']

    perm_X_Y = perm_group_Y[['gamelength', 'totalgold_winner', 'firstdragon_winner', 'dragons_winner', 'barons_winner', 
                             'towers_winner', 'teamkills_winner', 'teamdeaths_winner', 'minionkills_winner']].fillna(0)
    perm_y_Y = perm_group_Y['totalgold_loser']

    # Predict for both permuted groups
    perm_y_pred_X = pipeline.predict(perm_X_X)
    perm_y_pred_Y = pipeline.predict(perm_X_Y)

    # Calculate the MSE for permuted groups
    perm_mse_X = mean_squared_error(perm_y_X, perm_y_pred_X)
    perm_mse_Y = mean_squared_error(perm_y_Y, perm_y_pred_Y)

    # Calculate the difference in MSE
    perm_diff = perm_mse_Y - perm_mse_X
    permuted_diffs.append(perm_diff)

# Calculating p-value
p_value = np.mean(np.abs(permuted_diffs) >= np.abs(observed_diff))
print(f"P-value: {p_value}")


Median teamkills_ratio: 0.010014306151645207
Observed MSE difference: -342769.32801399636
P-value: 0.776
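Because the final pipeline is already fitted, its predictions do not change under permutation; a fairness permutation test only needs to shuffle which games are labeled Group Y while each game's squared error stays fixed. A minimal sketch of that label-shuffling version, assuming arrays of targets, predictions, and a boolean Group Y indicator; fairness_permutation_test is a hypothetical helper:

```python
import numpy as np


def fairness_permutation_test(y_true, y_pred, in_group_y, n_permutations=1000, seed=0):
    """Two-sided permutation test on MSE(Group Y) - MSE(Group X)."""
    rng = np.random.default_rng(seed)
    # Squared errors are computed once; only the group labels are shuffled
    sq_err = (np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2
    labels = np.asarray(in_group_y, dtype=bool)

    def mse_diff(lab):
        return sq_err[lab].mean() - sq_err[~lab].mean()

    observed = mse_diff(labels)
    perm = np.array([mse_diff(rng.permutation(labels)) for _ in range(n_permutations)])
    p_value = np.mean(np.abs(perm) >= np.abs(observed))
    return observed, p_value
```

Applied here, y_true would be final_merged_df['totalgold_loser'], y_pred the pipeline's predictions on the nine feature columns, and in_group_y the comparison final_merged_df['teamkills_ratio'] > median_ratio.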


In [413]:
# We fail to reject the null hypothesis at the 5% significance level.

'''
Objective of Fairness Analysis
The goal of this fairness analysis is to evaluate whether the final model predicts the losing team's total gold
equally well for games won by more aggressive and less aggressive teams.
Specifically, we want to see if the model's error is consistent across these groups, ensuring that
no group of games is disadvantaged based on how aggressively the winning team played.

Groups for Comparison
Group X: Games where the winning team's kills per unit of game length (teamkills_ratio) is at or below the median (about 0.0100).
Group Y: Games where teamkills_ratio is above the median.

Evaluation Metric
Mean squared error (MSE), since the model is a regressor. The test statistic is MSE(Group Y) - MSE(Group X).

Null and Alternative Hypotheses
Null Hypothesis (H₀): The model's MSE is the same for both groups, and any observed difference is due to random chance.
Alternative Hypothesis (H₁): The model's MSE is different for these groups, suggesting potential bias.

Procedure
Model Error Calculation: The MSE was calculated separately for Group X and Group Y.
Permutation Test: A two-sided permutation test (1,000 permutations) was conducted to determine if the observed difference in MSE is statistically significant.

Results
Observed Difference in MSE (Group Y - Group X): -342,769.33
Permutation Test P-value: 0.776

Conclusion
Given the p-value of 0.776, which is well above the 0.05 significance level,
we fail to reject the null hypothesis. This indicates that there is no strong evidence that the model's
error differs between games won by more aggressive and less aggressive teams.
'''
