# Kaggle League of Legends competition - ML Models

## Team: Elden Ring

<img src="https://eldenring.wiki.fextralife.com/file/Elden-Ring/mirel_pastor_of_vow.jpg" alt="PRAISE DOG" style="width:806px;height:600px;"/>

#### PRAISE THE DOG!

## How to Win at League of Legends?

### Uninstall LoL and [install Dota 2](https://store.steampowered.com/app/570/Dota_2/), EZ.

<img src = "https://static.wikia.nocookie.net/dota2_gamepedia/images/7/78/Keyart_phoenix.jpg/revision/latest/" alt="SKREE CAW CAW IM A BIRD" style="width:800px;height:497px;">

In [1]:
import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, KFold

In [2]:
X_train = pd.read_csv('../data/participants_train.csv')
X_test = pd.read_csv('../data/participants_test.csv')
y_train = pd.read_csv('../data/train_winners.csv')

champion_mastery = pd.read_csv('../data/champion_mastery.csv')
champion = pd.read_json('../data/champion.json')

In [3]:
# to be used later to measure the accuracy!
kfold = KFold(n_splits = 10, shuffle = True, random_state = 42)

## Formulating Sample Submission as LogReg

To replicate the sample submission, I take the max summonerLevel per matchId & teamId combo and compare the two teams.

In [4]:
# first copy the original data to not accidentally change it
# (plain assignment would only create a second name for the same DataFrame)
X_train_modified = X_train.copy()

# find the max Summoner Level per each team
X_train_modified = X_train_modified.groupby(['matchId', 'teamId'])[['summonerLevel']].max().reset_index()

# mark them as positive (first team) or negative (second team), to compare the values
X_train_modified['team_summonerLevel'] = np.where(X_train_modified['teamId'] == 200,
                                                  -1* X_train_modified['summonerLevel'],
                                                  X_train_modified['summonerLevel'])

# finally, see which team has max summoner level (by adding the + and - from before)
X_train_modified = X_train_modified.groupby('matchId')[['team_summonerLevel']].sum().reset_index(drop = True)
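
To make the sign trick above concrete, here is a minimal sketch on made-up data. The `toy` frame and its values are hypothetical (two matches, two players per team instead of five); only the column names come from the notebook.

```python
import numpy as np
import pandas as pd

# hypothetical toy data: two matches, two players per team
toy = pd.DataFrame({
    'matchId':       [1, 1, 1, 1, 2, 2, 2, 2],
    'teamId':        [100, 100, 200, 200, 100, 100, 200, 200],
    'summonerLevel': [30, 120, 80, 95, 250, 40, 300, 10],
})

# max summoner level per team
team_max = toy.groupby(['matchId', 'teamId'])[['summonerLevel']].max().reset_index()

# flip the sign for team 200, so that a per-match sum becomes a difference
team_max['team_summonerLevel'] = np.where(team_max['teamId'] == 200,
                                          -1 * team_max['summonerLevel'],
                                          team_max['summonerLevel'])

# team 100 max minus team 200 max, one row per match
toy_diff = team_max.groupby('matchId')[['team_summonerLevel']].sum()
print(toy_diff)
```

A positive value means team 100 has the higher max summoner level, a negative value means team 200 does.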

In [5]:
logreg = LogisticRegression().fit(X_train_modified, y_train['winner'])

In [6]:
print(logreg.intercept_)
print(logreg.coef_)
print(- logreg.intercept_[0] / logreg.coef_[0])

[-0.03635984]
[[-9.10448626e-05]]
[-399.36181513]
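
The third printed value is the decision boundary of this one-feature model: the value of `team_summonerLevel` at which the predicted probability is exactly 0.5. A minimal sketch with the printed (rounded) intercept and coefficient follows. It assumes the usual scikit-learn convention that the larger label, here 200, is the positive class; `prob_team_200` is a helper name made up for this illustration.

```python
import math

# values copied from the printed output above (rounded, so results are approximate)
intercept = -0.03635984
coef = -9.10448626e-05

# sigmoid(intercept + coef * x) = 0.5  <=>  intercept + coef * x = 0
boundary = -intercept / coef

def prob_team_200(x):
    """Predicted probability that team 200 wins, for a given level difference x."""
    return 1.0 / (1.0 + math.exp(-(intercept + coef * x)))

print(boundary)
print(prob_team_200(boundary))
print(prob_team_200(0))
```

Read this way, the model predicts team 100 unless team 200's max summoner level exceeds team 100's by roughly 400 levels.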


In [7]:
y_predictions = logreg.predict(X_train_modified)
y_predictions

array([100, 100, 100, ..., 100, 200, 100])

In [8]:
accuracy_score(
    y_true = y_train['winner'],
    y_pred = y_predictions
)

0.50925

> NOTE: successfully replicated, with a slightly better prediction: 50.9% versus 50.4%

In [9]:
base_cv_scores = cross_val_score(
    estimator = logreg,
    X = X_train_modified,
    y = y_train['winner'],
    cv = kfold
)

print(base_cv_scores)
print(np.mean(base_cv_scores))

[0.5     0.50125 0.5     0.47375 0.5225  0.51125 0.5     0.50375 0.52875
 0.505  ]
0.504625
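
To judge how much the fold scores above say, it helps to compare their spread with plain sampling noise. A rough sketch, assuming 8000 matches in total (the support total in the classification report), so 800 validation matches per fold; the coin-flip standard error is a back-of-the-envelope yardstick, not a formal test.

```python
import numpy as np

# fold accuracies copied from the output above
base_cv_scores = np.array([0.5, 0.50125, 0.5, 0.47375, 0.5225,
                           0.51125, 0.5, 0.50375, 0.52875, 0.505])

n_matches = 8000              # assumption: support total from the classification report
fold_size = n_matches // 10   # 800 validation matches per fold

mean_score = base_cv_scores.mean()
fold_std = base_cv_scores.std(ddof=1)

# standard error of an accuracy measured on fold_size matches if the true accuracy were 0.5
coin_flip_se = np.sqrt(0.5 * 0.5 / fold_size)

print(mean_score, fold_std, coin_flip_se)
```

If the fold-to-fold standard deviation is about as large as the coin-flip standard error, and the mean is within that distance of 0.5, the model is hard to tell apart from guessing.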


## Now LogReg with Champion mastery as well

In [10]:
X_train_mastery = pd.merge(X_train, champion_mastery, how='left', on=['summonerId', 'championId']).fillna(0)
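
A small sketch of what the left merge plus `fillna(0)` does, on hypothetical rows: every participant is kept, and a player without a mastery record for the picked champion ends up with `championLevel` 0. The `participants` and `mastery` frames are made up; only the column names come from the notebook.

```python
import pandas as pd

# hypothetical participants: player 'b' has no mastery record for champion 2
participants = pd.DataFrame({
    'summonerId': ['a', 'b', 'c'],
    'championId': [1, 2, 3],
})
mastery = pd.DataFrame({
    'summonerId': ['a', 'c'],
    'championId': [1, 3],
    'championLevel': [5, 7],
})

# the left join keeps all three participants; the missing level becomes NaN, then 0
merged = pd.merge(participants, mastery, how='left',
                  on=['summonerId', 'championId']).fillna(0)
print(merged)
```

Note that `fillna(0)` is applied to the whole frame, so any other missing value in the merged data would also become 0.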

> NOTE: found out that champion level is actually more indicative

In [11]:
X_train_mastery = X_train_mastery.groupby(['matchId', 'teamId'])[['summonerLevel', 'championLevel']].agg({'summonerLevel': 'max', 'championLevel': 'sum'}).reset_index()

X_train_mastery['team_summonerLevel'] = np.where(X_train_mastery['teamId'] == 200,
                                                  -1* X_train_mastery['summonerLevel'],
                                                  X_train_mastery['summonerLevel'])

X_train_mastery['team_championLevel'] = np.where(X_train_mastery['teamId'] == 200,
                                                  -1* X_train_mastery['championLevel'],
                                                  X_train_mastery['championLevel'])

X_train_mastery = X_train_mastery.groupby('matchId')[['team_summonerLevel', 'team_championLevel']].sum().reset_index(drop = True)

In [12]:
logreg_mastery = LogisticRegression().fit(X_train_mastery, y_train['winner'])

In [13]:
accuracy_score(
    y_true = y_train['winner'],
    y_pred = logreg_mastery.predict(X_train_mastery)
)

0.5525

In [14]:
mastery_cv_scores = cross_val_score(
    estimator = logreg_mastery,
    X = X_train_mastery,
    y = y_train['winner'],
    cv = kfold
)

print(mastery_cv_scores)
print(np.mean(mastery_cv_scores))

[0.56625 0.57125 0.5325  0.545   0.545   0.55375 0.5475  0.53875 0.5525
 0.55875]
0.5511250000000001
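
Both cross-validation runs reuse the same `kfold` object (shuffled with `random_state = 42`) on 8000 rows, so the folds should line up one to one and can be compared pairwise. A minimal sketch using the printed scores:

```python
import numpy as np

# fold accuracies copied from the two cross-validation outputs
base_cv_scores = np.array([0.5, 0.50125, 0.5, 0.47375, 0.5225,
                           0.51125, 0.5, 0.50375, 0.52875, 0.505])
mastery_cv_scores = np.array([0.56625, 0.57125, 0.5325, 0.545, 0.545,
                              0.55375, 0.5475, 0.53875, 0.5525, 0.55875])

# per-fold gain from adding championLevel
fold_gain = mastery_cv_scores - base_cv_scores

print(fold_gain.mean())        # average gain in accuracy
print((fold_gain > 0).sum())   # number of folds where the mastery model is better
```

A gain that shows up in every fold is a stronger sign than the difference of the two means alone.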


In [15]:
print(classification_report(y_train['winner'], logreg_mastery.predict(X_train_mastery)))

              precision    recall  f1-score   support

         100       0.56      0.60      0.58      4071
         200       0.55      0.51      0.53      3929

    accuracy                           0.55      8000
   macro avg       0.55      0.55      0.55      8000
weighted avg       0.55      0.55      0.55      8000



In [16]:
print(confusion_matrix(y_train['winner'], logreg_mastery.predict(X_train_mastery)))

[[2425 1646]
 [1934 1995]]
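
The numbers in the classification report can be recomputed from this confusion matrix (rows are the true winner, columns the predicted winner, in the order 100, 200). The sketch also adds a majority-class baseline, which is not in the original notebook: the accuracy of always predicting team 100.

```python
import numpy as np

# confusion matrix copied from the output above
cm = np.array([[2425, 1646],
               [1934, 1995]])

accuracy = np.trace(cm) / cm.sum()
recall = np.diag(cm) / cm.sum(axis=1)      # per true class: 100, 200
precision = np.diag(cm) / cm.sum(axis=0)   # per predicted class: 100, 200

# baseline: always predict the more frequent winner
majority_baseline = cm.sum(axis=1).max() / cm.sum()

print(accuracy, recall, precision, majority_baseline)
```

The baseline comes out at about 50.9%, which is roughly what the summoner-level-only model reached on the training data, so the 55% of the mastery model is the first result clearly above it.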


## Looking into other Champion info

In [17]:
champion = pd.json_normalize(champion['data'])
champion['key'] = champion['key'].astype(int)

In [18]:
X_train_mastery_champion = pd.merge(X_train, champion_mastery, how='left', on=['summonerId', 'championId']).fillna(0)
X_train_mastery_champion = pd.merge(X_train_mastery_champion, champion, how='inner', left_on='championId', right_on='key')

X_train_mastery_champion = X_train_mastery_champion.sort_values(['matchId', 'participantId'], ascending = [True, True]).reset_index(drop=True)

In [19]:
# aggregate per team; 'summonerLevel' has to be part of the column selection,
# otherwise .agg() fails with: KeyError: "Column(s) ['summonerLevel'] do not exist"
X_train_mastery_champion = (
    X_train_mastery_champion
    .groupby(['matchId', 'teamId'])[['summonerLevel', 'championLevel', 'info.attack', 'info.defense', 'info.magic', 'info.difficulty']]
    .agg({'summonerLevel': 'max',
          'championLevel': 'sum',
          'info.attack': 'sum',
          'info.defense': 'sum',
          'info.magic': 'sum',
          'info.difficulty': 'sum'}).reset_index()
)

# mark each team feature as positive (first team) or negative (second team)
signed_columns = {
    'summonerLevel': 'team_summonerLevel',
    'championLevel': 'team_championLevel',
    'info.attack': 'team_attack',
    'info.defense': 'team_defense',
    'info.magic': 'team_magic',
    'info.difficulty': 'team_difficulty',
}

for source_column, team_column in signed_columns.items():
    X_train_mastery_champion[team_column] = np.where(X_train_mastery_champion['teamId'] == 200,
                                                     -1 * X_train_mastery_champion[source_column],
                                                     X_train_mastery_champion[source_column])

# sum the signed values per match to get the difference between the two teams
X_train_mastery_champion = X_train_mastery_champion.groupby('matchId')[list(signed_columns.values())].sum().reset_index(drop = True)

In [None]:
logreg_mastery_champion = LogisticRegression().fit(X_train_mastery_champion, y_train['winner'])

In [None]:
accuracy_score(
    y_true = y_train['winner'],
    y_pred = logreg_mastery_champion.predict(X_train_mastery_champion)
)

In [None]:
mastery_champion_cv_scores = cross_val_score(
    estimator = logreg_mastery_champion,
    X = X_train_mastery_champion,
    y = y_train['winner'],
    cv = kfold
)

print(mastery_champion_cv_scores)
print(np.mean(mastery_champion_cv_scores))

## Another idea: looking at individual players' summoner levels

So instead of taking the max per team, treat each player's summoner level as an individual variable.

In [None]:
X_train_summoner = X_train.pivot_table(values='summonerLevel', index='matchId', columns='participantId').reset_index(drop=True)

In [None]:
X_train_summoner['pos1_6'] = X_train_summoner[1] - X_train_summoner[6]
X_train_summoner['pos2_7'] = X_train_summoner[2] - X_train_summoner[7]
X_train_summoner['pos3_8'] = X_train_summoner[3] - X_train_summoner[8]
X_train_summoner['pos4_9'] = X_train_summoner[4] - X_train_summoner[9]
X_train_summoner['pos5_10'] = X_train_summoner[5] - X_train_summoner[10]

In [None]:
X_train_summoner = X_train_summoner[['pos1_6', 'pos2_7', 'pos3_8', 'pos4_9', 'pos5_10']]
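
A toy version of the pivot-and-pair idea above, with hypothetical data and only four participants per match. It pairs participant 1 with 3 and 2 with 4, standing in for the 1-6 to 5-10 pairing. That the paired participants actually face each other in the same position is an assumption carried over from the cells above, not something the data confirms here.

```python
import pandas as pd

# hypothetical toy data: two matches, participants 1-2 on one team, 3-4 on the other
toy = pd.DataFrame({
    'matchId':       [1, 1, 1, 1, 2, 2, 2, 2],
    'participantId': [1, 2, 3, 4, 1, 2, 3, 4],
    'summonerLevel': [30, 120, 80, 95, 250, 40, 300, 10],
})

# one row per match, one column per participant
# (pivot_table averages duplicate matchId/participantId pairs by default)
wide = toy.pivot_table(values='summonerLevel', index='matchId', columns='participantId')

# level difference per paired slot
wide['pos1_3'] = wide[1] - wide[3]
wide['pos2_4'] = wide[2] - wide[4]

print(wide[['pos1_3', 'pos2_4']])
```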

In [None]:
logreg_summoner = LogisticRegression().fit(X_train_summoner, y_train['winner'])

In [None]:
accuracy_score(
    y_true = y_train['winner'],
    y_pred = logreg_summoner.predict(X_train_summoner)
)

In [None]:
summoner_cv_scores = cross_val_score(
    estimator = logreg_summoner,
    X = X_train_summoner,
    y = y_train['winner'],
    cv = kfold
)

print(summoner_cv_scores)
print(np.mean(summoner_cv_scores))

Looks like accounting for individual summoners' levels does not matter.