## Machine Learning Analysis of League of Legends Drafts


### Introduction
In the popular video game League of Legends, teams of five face off against each other. Before the game begins, each team picks five unique champions to play. This collection of champions is known as a particular game's draft. 
\
Each game is also played on a particular patch: a version of the game reflecting the balance changes released roughly every two weeks. Patches through the end of 2024 follow the format `season.patch`, so patch 14.9 is the ninth patch of the 14th season.
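Because a patch label is a pair of integers rather than a decimal number, comparing labels as plain strings puts them out of order (`14.10` sorts before `14.9`). Below is a minimal sketch of a chronological sort key. It assumes labels take the `season.patch` form described above, and `patch_key` is a hypothetical helper, not part of the project's code:

```python
def patch_key(patch: str) -> tuple:
    """Split a 'season.patch' label into integers so patches sort chronologically."""
    season, number = patch.split('.')
    return int(season), int(number)

patches = ['14.10', '14.9', '13.24', '14.1']
print(sorted(patches))                 # string order: ['13.24', '14.1', '14.10', '14.9']
print(sorted(patches, key=patch_key))  # chronological: ['13.24', '14.1', '14.9', '14.10']
```

Zero-padded labels such as `14.01` parse the same way, since `int('01')` is `1`.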

This project uses the drafts of professional League of Legends games to train machine learning models that predict the winner of a game from each team's draft.
\
The results of the different models will then be compared to see which parameters lead to more accurate predictions.



### Data

The data for this project is the 2024 esports match data file from [Oracle's Elixir](https://oracleselixir.com/tools/downloads).

### Model 1
The first model is a feed-forward neural network built with PyTorch. Its inputs use one-hot encoding for the patch and multi-hot encoding for each team's draft.

### Setting Up the Data
The raw data contains far more information than the model needs, so it is first cleaned down to the patch, each team's picks, and the result of every game.

**Packages**

```python
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
```


**Data Cleaning**

```python
# Adjust this path to wherever the Oracle's Elixir CSV was downloaded
data = pd.read_csv('2024_LoL_esports_match_data_from_OraclesElixir.csv', low_memory=False)

# Oracle's Elixir stores each game as 12 consecutive rows: 10 player rows
# followed by 2 team rows. Select the two team rows of every game
# (indices 10, 11, 22, 23, 34, 35, ...).
rows_to_select = list(range(10, len(data), 12)) + list(range(11, len(data), 12))

# Sort so that each game's two team rows end up next to each other
rows_to_select.sort()

# Keep only the patch, the five picks, and the result
# (spreadsheet columns J, X through AB, and AD)
columns_to_select = ['patch', 'pick1', 'pick2', 'pick3', 'pick4', 'pick5', 'result']

# Select the team rows by position and the needed columns by name
cleaned_data = data.iloc[rows_to_select][columns_to_select]

# Display the first few rows to double-check the selection
print(cleaned_data.head())
```



```
    patch         pick1     pick2    pick3   pick4       pick5  result
10  13.24       Kalista     Senna  Orianna  Maokai      Aatrox       0
11  13.24  Renata Glasc     Varus  LeBlanc    Rell      Rumble       1
22  13.24         Neeko  Bel'Veth   Kennen   Senna  Tahm Kench       0
23  13.24       Kalista       Jax  LeBlanc    Rell   Jarvan IV       1
34  13.24         Neeko   Caitlyn      Lux     Jax    Bel'Veth       1
```
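Selecting every 12th row assumes that no game in the file ever has more or fewer than 12 rows. Below is a sketch of a sturdier alternative that filters on the `position` and `gameid` columns instead. It assumes the file marks team rows with `position == 'team'` (worth confirming against the downloaded CSV), and `select_team_rows` and the toy data are illustrative only:

```python
import pandas as pd

def select_team_rows(data: pd.DataFrame) -> pd.DataFrame:
    """Keep team-level rows, assuming they are marked with position == 'team'."""
    columns = ['gameid', 'patch', 'pick1', 'pick2', 'pick3', 'pick4', 'pick5', 'result']
    team_rows = data.loc[data['position'] == 'team', columns]
    # Keep only games that contribute exactly two team rows, so pairs stay aligned
    rows_per_game = team_rows.groupby('gameid')['gameid'].transform('count')
    return team_rows[rows_per_game == 2]

# Toy frame: game g1 has both team rows, game g2 is missing one
toy = pd.DataFrame({
    'gameid':   ['g1', 'g1', 'g1', 'g2', 'g2'],
    'position': ['top', 'team', 'team', 'team', 'mid'],
    'patch':    [14.01] * 5,
    'pick1': list('ABCDE'), 'pick2': list('FGHIJ'), 'pick3': list('KLMNO'),
    'pick4': list('PQRST'), 'pick5': list('UVWXY'),
    'result':   [0, 0, 1, 1, 1],
})
print(select_team_rows(toy))  # only the two team rows of g1 remain
```

Keeping `gameid` also makes it possible to confirm later that each pair of rows belongs to the same game.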


**Grouping Data for the Model**

```python
data_grouped = []
for i in range(0, len(cleaned_data), 2):
    # Rows i and i+1 are the two team rows of the same game
    patch = cleaned_data.iloc[i, 0]  # Patch from the first row of the pair (column 0 is 'patch')

    # Patch, then team 1's picks (row i), then team 2's picks (row i+1)
    row = [patch] + cleaned_data.iloc[i, 1:6].tolist() + cleaned_data.iloc[i+1, 1:6].tolist()

    # Append the result from the second row of the pair, i.e. team 2's result
    # (result is the last column), so a result of 1 means team 2 won
    row.append(cleaned_data.iloc[i+1, -1])

    data_grouped.append(row)

# Convert the list of grouped data into a new DataFrame
columns = ['patch', 'pick1_team1', 'pick2_team1', 'pick3_team1', 'pick4_team1', 'pick5_team1',
           'pick1_team2', 'pick2_team2', 'pick3_team2', 'pick4_team2', 'pick5_team2', 'result']

game_data = pd.DataFrame(data_grouped, columns=columns)

# Convert all columns to strings (patches become labels such as '13.24')
game_data = game_data.astype(str)

# Convert result back to an integer
game_data['result'] = game_data['result'].astype(int)

# Display the first few rows of the grouped data and the possible results
print(game_data.head())
print(game_data['result'].unique())
```

```
   patch pick1_team1 pick2_team1   pick3_team1 pick4_team1 pick5_team1  \
0  13.24     Kalista       Senna       Orianna      Maokai      Aatrox   
1  13.24       Neeko    Bel'Veth        Kennen       Senna  Tahm Kench   
2  13.24       Neeko     Caitlyn           Lux         Jax    Bel'Veth   
3  13.24      Rumble      Draven  Renata Glasc    Tristana   Jarvan IV   
4  13.24       Varus        Azir      Nautilus    Xin Zhao    Renekton   

    pick1_team2   pick2_team2 pick3_team2 pick4_team2 pick5_team2  result  
0  Renata Glasc         Varus     LeBlanc        Rell      Rumble       1  
1       Kalista           Jax     LeBlanc        Rell   Jarvan IV       1  
2       Kalista  Renata Glasc        Azir     Lee Sin      Aatrox       0  
3       Orianna       Kalista       Senna      Aatrox      Wukong       0  
4         Milio       Orianna      Lucian     Lee Sin      Aatrox       0  
[1 0]
```
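The loop builds each game row with several `iloc` calls, which gets slow on large files. A minimal vectorized sketch of the same pairing follows. It assumes, as the loop does, that `cleaned_data` alternates strictly between the two team rows of each game; `pair_team_rows` is a hypothetical name, and the toy rows are copied from the first cleaned output above:

```python
import pandas as pd

def pair_team_rows(cleaned: pd.DataFrame) -> pd.DataFrame:
    """Turn consecutive pairs of team rows into one row per game."""
    picks = ['pick1', 'pick2', 'pick3', 'pick4', 'pick5']
    first = cleaned.iloc[0::2].reset_index(drop=True)   # first team row of each game
    second = cleaned.iloc[1::2].reset_index(drop=True)  # second team row of each game
    games = pd.DataFrame({'patch': first['patch']})
    for pick in picks:
        games[f'{pick}_team1'] = first[pick]
    for pick in picks:
        games[f'{pick}_team2'] = second[pick]
    # Like the loop, take the result from the second row (1 means team 2 won)
    games['result'] = second['result']
    return games

toy = pd.DataFrame({
    'patch': [13.24, 13.24],
    'pick1': ['Kalista', 'Renata Glasc'], 'pick2': ['Senna', 'Varus'],
    'pick3': ['Orianna', 'LeBlanc'], 'pick4': ['Maokai', 'Rell'],
    'pick5': ['Aatrox', 'Rumble'], 'result': [0, 1],
})
print(pair_team_rows(toy))
```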


```python
# Spot-check the pairing: show the first game played on patch 14.01
first_game_patch_14_01 = game_data[game_data['patch'] == '14.01'].iloc[0]
print(first_game_patch_14_01)
```

```
patch                 14.01
pick1_team1         LeBlanc
pick2_team1           Varus
pick3_team1    Renata Glasc
pick4_team1        Renekton
pick5_team1        Xin Zhao
pick1_team2         Kalista
pick2_team2           Neeko
pick3_team2         Lee Sin
pick4_team2        Nautilus
pick5_team2             Jax
result                    0
Name: 15, dtype: object
```
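Looking at a single game confirms the patch label but not that every pair of rows was matched correctly. A cheap, more systematic check is sketched below, under the assumption that it runs on `cleaned_data` before grouping: in a correctly paired file every game has exactly one winner, so each consecutive pair of `result` values must sum to 1. The function name is hypothetical:

```python
import pandas as pd

def pairs_have_one_winner(cleaned: pd.DataFrame) -> bool:
    """True if every consecutive pair of team rows contains exactly one win."""
    results = cleaned['result'].to_numpy()
    if len(results) % 2 != 0:
        return False  # an odd row count means some game is missing a team row
    return bool((results.reshape(-1, 2).sum(axis=1) == 1).all())

good = pd.DataFrame({'result': [0, 1, 1, 0]})
bad = pd.DataFrame({'result': [0, 1, 1, 1]})   # second "game" has two winners
print(pairs_have_one_winner(good), pairs_have_one_winner(bad))  # True False
```

On the project's data this would be called as `pairs_have_one_winner(cleaned_data)`.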


### Preprocessing the Data
Machine learning models generally expect numeric inputs rather than strings, so categorical variables such as patches and champion names must be encoded as numbers.
\
There are many encoding methods; this model uses one-hot encoding for the patches and multi-hot encoding for each team's draft.


**A Short Explanation**

In one-hot encoding, a variable is converted into a binary vector, where each element represents a different category.

For example, say a variable `animal` can take on three possible values: `"cat"`, `"dog"`, or `"bird"`. \
Since there are three values, the one-hot vector would have three elements, and suppose that the first element represents `"cat"`, the second `"dog"`, and the last `"bird"`. \
Then every time there is a data point that is a `"cat"`, it would be encoded by the vector `[1, 0, 0]`. Similarly, `"dog"` would be encoded by `[0, 1, 0]`, and `[0, 0, 1]` for `"bird"`.

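Multi-hot encoding works similarly, except that a vector may contain more than one `1`. For example, a team's draft becomes a vector with one element per champion, where the elements for the five champions that team picked are `1` and every other element is `0`.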
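The `animal` example translates directly into a few lines of NumPy. Below is a minimal sketch (the names `one_hot` and `multi_hot` are illustrative); a team's draft is encoded like the multi-hot case, only with one element per champion instead of per animal:

```python
import numpy as np

categories = ["cat", "dog", "bird"]
index = {name: i for i, name in enumerate(categories)}  # "cat" -> 0, "dog" -> 1, "bird" -> 2

def one_hot(value):
    """Exactly one element is 1: the position of `value`."""
    vector = np.zeros(len(categories), dtype=int)
    vector[index[value]] = 1
    return vector

def multi_hot(values):
    """One element is 1 for every value present, so several 1s are possible."""
    vector = np.zeros(len(categories), dtype=int)
    for value in values:
        vector[index[value]] = 1
    return vector

print(one_hot("dog"))              # [0 1 0]
print(multi_hot(["cat", "bird"]))  # [1 0 1]
```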


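```python
# One column per patch (e.g. 'patch_13.24'), set to 1 for games on that patch
one_hot_encoded_patches = pd.get_dummies(game_data['patch'], prefix='patch')

# get_dummies returns booleans by default since pandas 2.0; cast them to 0/1 integers
one_hot_encoded_patches = one_hot_encoded_patches.astype(int)

# Add the encoded patches back to the original DataFrame
game_data = pd.concat([game_data, one_hot_encoded_patches], axis=1)

print(game_data.head())
```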


```
   patch pick1_team1 pick2_team1   pick3_team1 pick4_team1 pick5_team1  \
0  13.24     Kalista       Senna       Orianna      Maokai      Aatrox   
1  13.24       Neeko    Bel'Veth        Kennen       Senna  Tahm Kench   
2  13.24       Neeko     Caitlyn           Lux         Jax    Bel'Veth   
3  13.24      Rumble      Draven  Renata Glasc    Tristana   Jarvan IV   
4  13.24       Varus        Azir      Nautilus    Xin Zhao    Renekton   

    pick1_team2   pick2_team2 pick3_team2 pick4_team2  ... patch_14.14  \
0  Renata Glasc         Varus     LeBlanc        Rell  ...           0   
1       Kalista           Jax     LeBlanc        Rell  ...           0   
2       Kalista  Renata Glasc        Azir     Lee Sin  ...           0   
3       Orianna       Kalista       Senna      Aatrox  ...           0   
4         Milio       Orianna      Lucian     Lee Sin  ...           0   
```

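`get_dummies` only creates columns for the patches present in the data it receives. As a result, encoding a new game later (for example, one from a patch the model never saw) would produce a different set of columns. One way to keep the layout fixed is to reindex against the training columns, sketched here with toy patch labels; `encode_patches` is a hypothetical helper:

```python
import pandas as pd

# Toy training patches; in the project these would be game_data['patch']
train_patches = pd.Series(['13.24', '14.01', '14.01', '14.02'])
train_dummies = pd.get_dummies(train_patches, prefix='patch', dtype=int)

def encode_patches(patches: pd.Series, columns) -> pd.DataFrame:
    """One-hot encode patches using exactly the given training columns."""
    dummies = pd.get_dummies(patches, prefix='patch', dtype=int)
    # Add missing training columns as 0 and drop columns never seen in training
    return dummies.reindex(columns=columns, fill_value=0)

new = encode_patches(pd.Series(['14.02', '14.03']), train_dummies.columns)
print(new)
```

An unseen patch such as `14.03` becomes an all-zero row, which still has the width the model expects.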

```python
# Identify all unique champions across all ten pick columns
champions = pd.unique(game_data[['pick1_team1', 'pick2_team1', 'pick3_team1', 'pick4_team1', 'pick5_team1',
                                 'pick1_team2', 'pick2_team2', 'pick3_team2', 'pick4_team2', 'pick5_team2']].values.ravel())

# Map each champion name to a fixed position in the vector
champion_to_index = {champ: idx for idx, champ in enumerate(champions)}

# Build one team's multi-hot vector: a 1 at the position of each champion it picked
def create_multi_hot_vector(picks, champion_to_index, num_champions):
    vector = np.zeros(num_champions, dtype=int)
    for pick in picks:
        # Skip names that are not in the mapping. Because game_data was converted to
        # strings earlier, any missing pick would appear as the string 'nan', which is
        # in the mapping and would be encoded like any other champion.
        if pick in champion_to_index:
            vector[champion_to_index[pick]] = 1
    return vector

# Apply multi-hot encoding to the dataset
num_champions = len(champions)

team1_vectors = []
team2_vectors = []

for _, row in game_data.iterrows():
    team1_picks = row[['pick1_team1', 'pick2_team1', 'pick3_team1', 'pick4_team1', 'pick5_team1']]
    team2_picks = row[['pick1_team2', 'pick2_team2', 'pick3_team2', 'pick4_team2', 'pick5_team2']]

    team1_vectors.append(create_multi_hot_vector(team1_picks, champion_to_index, num_champions))
    team2_vectors.append(create_multi_hot_vector(team2_picks, champion_to_index, num_champions))

# Store each team's vector in its own column
game_data['team1_vector'] = team1_vectors
game_data['team2_vector'] = team2_vectors

print(game_data[['team1_vector', 'team2_vector']].head())
```

```
                                        team1_vector  \
0  [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...   
1  [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, ...   
2  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, ...   
3  [0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, ...   
4  [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, ...   

                                        team2_vector  
0  [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, ...  
1  [1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, ...  
2  [1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...  
3  [1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...  
4  [0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...  
```
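Each team picks five unique champions, and in tournament draft a champion can be picked by only one of the two teams. Every valid row should therefore have exactly five `1`s in each team vector and no position where both vectors are `1`. Below is a minimal check under those assumptions. `invalid_drafts` is a hypothetical helper; on the project's data it would be called with `np.array(game_data['team1_vector'].tolist())` and the matching team 2 array:

```python
import numpy as np

def invalid_drafts(team1: np.ndarray, team2: np.ndarray, picks_per_team: int = 5) -> np.ndarray:
    """Return the row indices whose drafts break the pick rules."""
    wrong_count = (team1.sum(axis=1) != picks_per_team) | (team2.sum(axis=1) != picks_per_team)
    shared_champion = (team1 & team2).any(axis=1)  # a champion picked by both teams
    return np.flatnonzero(wrong_count | shared_champion)

# Toy vectors over 11 champions; row 1's team 2 also picked champion 0
team1 = np.array([[1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                  [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]])
team2 = np.array([[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0],
                  [1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0]])
print(invalid_drafts(team1, team2))  # [1]
```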


```python
# Build the feature matrix: patch one-hot columns, then team 1's and team 2's multi-hot vectors
patch_columns = [col for col in game_data.columns if col.startswith('patch_')]
patch_encoded = game_data[patch_columns].values
team1_vectors = np.array(game_data['team1_vector'].tolist())
team2_vectors = np.array(game_data['team2_vector'].tolist())
X = np.hstack([patch_encoded, team1_vectors, team2_vectors])
y = game_data['result'].values

# Randomly shuffle and split the data: 80% for training, 20% for testing
np.random.seed(42)  # Set the random seed for reproducibility
indices = np.random.permutation(len(X))
split_index = int(0.8 * len(X))

train_indices = indices[:split_index]
test_indices = indices[split_index:]

X_train, X_test = X[train_indices], X[test_indices]
y_train, y_test = y[train_indices], y[test_indices]

# Convert to PyTorch tensors; labels get shape (n, 1) to match the model's output
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1)

# Verify the shapes
print(f"X_train_tensor shape: {X_train_tensor.shape}")
print(f"y_train_tensor shape: {y_train_tensor.shape}")
print(f"X_test_tensor shape: {X_test_tensor.shape}")
print(f"y_test_tensor shape: {y_test_tensor.shape}")
```

```
X_train_tensor shape: torch.Size([7751, 360])
y_train_tensor shape: torch.Size([7751, 1])
X_test_tensor shape: torch.Size([1938, 360])
y_test_tensor shape: torch.Size([1938, 1])
```
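Since `X` is built with `np.hstack`, every row has the same layout: patch columns first, then team 1's champions, then team 2's. The width is therefore the number of patches plus twice the number of champions. Decoding a row back into names is a quick way to confirm that nothing was shifted along the way. Below is a sketch with toy names (`decode_features` is hypothetical); on the real data it could be called as `decode_features(X_train[0], patch_columns, champions)`:

```python
import numpy as np

def decode_features(x, patch_names, champion_names):
    """Map one feature row back to its patch and the two teams' champions.

    Assumes the layout [patch one-hot | team 1 multi-hot | team 2 multi-hot].
    """
    p, c = len(patch_names), len(champion_names)
    if len(x) != p + 2 * c:
        raise ValueError(f"expected {p + 2 * c} features, got {len(x)}")
    patch = [patch_names[i] for i in np.flatnonzero(x[:p])]
    team1 = [champion_names[i] for i in np.flatnonzero(x[p:p + c])]
    team2 = [champion_names[i] for i in np.flatnonzero(x[p + c:])]
    return patch, team1, team2

patch_names = ['patch_13.24', 'patch_14.01']
champion_names = ['Kalista', 'Senna', 'Rell', 'Varus']
x = np.array([1, 0,  1, 1, 0, 0,  0, 0, 1, 1])  # 2 patch columns + 2 * 4 champion columns
print(decode_features(x, patch_names, champion_names))
# (['patch_13.24'], ['Kalista', 'Senna'], ['Rell', 'Varus'])
```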


### Defining the Model

```python
class LeagueDraftPredictor(nn.Module):
    def __init__(self, input_size):
        super(LeagueDraftPredictor, self).__init__()
        # Define the layers
        self.fc1 = nn.Linear(input_size, 128)  # Input layer to first hidden layer
        self.fc2 = nn.Linear(128, 64)          # First hidden layer to second hidden layer
        self.fc3 = nn.Linear(64, 1)            # Second hidden layer to output
        self.dropout = nn.Dropout(0.3)         # Dropout for regularization

    def forward(self, x):
        x = F.relu(self.fc1(x))              # First hidden layer with ReLU
        x = self.dropout(x)                  # Apply dropout
        x = F.relu(self.fc2(x))              # Second hidden layer with ReLU
        x = self.dropout(x)                  # Apply dropout
        x = torch.sigmoid(self.fc3(x))       # Sigmoid activation for binary classification
        return x

# Instantiate the model
input_size = X_train_tensor.shape[1]  # Number of input features
model = LeagueDraftPredictor(input_size)

# Print the model architecture
print(model)
```

```
LeagueDraftPredictor(
  (fc1): Linear(in_features=360, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=64, bias=True)
  (fc3): Linear(in_features=64, out_features=1, bias=True)
  (dropout): Dropout(p=0.3, inplace=False)
)
```
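The printed architecture fixes the size of the model. Each `Linear(n_in, n_out)` layer holds `n_in * n_out` weights plus `n_out` biases, and dropout adds no parameters, so the total can be worked out by hand. Below is a small sketch of that arithmetic (`linear_parameter_count` is a hypothetical helper):

```python
def linear_parameter_count(layer_sizes):
    """Weights plus biases of a stack of fully connected layers.

    A Linear(n_in, n_out) layer has n_in * n_out weights and n_out biases.
    """
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Layer widths of LeagueDraftPredictor for the 360 input features printed above:
# (360*128 + 128) + (128*64 + 64) + (64*1 + 1) = 46208 + 8256 + 65
print(linear_parameter_count([360, 128, 64, 1]))  # 54529
```

In PyTorch, the same count is available as `sum(p.numel() for p in model.parameters())`.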


**Loss Function and Optimizer**

```python
# Loss function and optimizer
criterion = nn.BCELoss()  # Binary Cross Entropy Loss
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam optimizer with learning rate 0.001
```
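For a predicted probability p and a label y, `nn.BCELoss` averages -[y·log(p) + (1 - y)·log(1 - p)] over the batch. The sketch below writes that out in NumPy alongside the algebraically equivalent form computed directly from raw logits, which is the numerically stable approach PyTorch's `nn.BCEWithLogitsLoss` takes. Switching to that loss would mean removing the final `torch.sigmoid` from `forward()` and thresholding logits at 0 instead of probabilities at 0.5. This is an optional variant, not what the notebook does:

```python
import numpy as np

def bce(p, y):
    """Mean binary cross-entropy on probabilities (what nn.BCELoss averages, apart from clamping)."""
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

def bce_from_logits(z, y):
    """The same loss computed from logits z, which avoids log(0) when |z| is large."""
    return float(np.mean(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))))

z = np.array([-2.0, 0.0, 3.0])   # raw outputs before the sigmoid
y = np.array([0.0, 1.0, 1.0])    # true labels
p = 1 / (1 + np.exp(-z))         # sigmoid, as applied at the end of forward()
print(bce(p, y), bce_from_logits(z, y))  # the two values agree
```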


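### Training the Model

The model is trained on shuffled mini-batches of 32 games. The logged output below comes from a run with 100 epochs (note the `/100`), not the 50 set in the code, and it is truncated partway through the run.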

```python
# Training parameters
epochs = 50
batch_size = 32

# DataLoader for shuffled mini-batches
train_dataset = torch.utils.data.TensorDataset(X_train_tensor, y_train_tensor)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

# Training loop. Running this cell again continues training the same model;
# re-create the model and optimizer to start from scratch.
for epoch in range(epochs):
    model.train()  # Set model to training mode (enables dropout)
    epoch_loss = 0

    for batch_X, batch_y in train_loader:
        optimizer.zero_grad()               # Clear gradients from the previous step
        outputs = model(batch_X)            # Forward pass
        loss = criterion(outputs, batch_y)  # Compute loss
        loss.backward()                     # Backpropagation
        optimizer.step()                    # Update weights
        epoch_loss += loss.item()           # Accumulate loss

    # Report the average loss per batch for this epoch
    print(f"Epoch {epoch+1}/{epochs}, Loss: {epoch_loss/len(train_loader):.4f}")
```


```
Epoch 1/100, Loss: 0.1203
Epoch 2/100, Loss: 0.1219
Epoch 3/100, Loss: 0.1156
Epoch 4/100, Loss: 0.1173
Epoch 5/100, Loss: 0.1129
Epoch 6/100, Loss: 0.1244
Epoch 7/100, Loss: 0.1189
Epoch 8/100, Loss: 0.1159
Epoch 9/100, Loss: 0.1127
Epoch 10/100, Loss: 0.1200
Epoch 11/100, Loss: 0.1166
Epoch 12/100, Loss: 0.1193
Epoch 13/100, Loss: 0.1159
Epoch 14/100, Loss: 0.1051
Epoch 15/100, Loss: 0.1132
Epoch 16/100, Loss: 0.1191
Epoch 17/100, Loss: 0.1078
Epoch 18/100, Loss: 0.1169
Epoch 19/100, Loss: 0.1144
Epoch 20/100, Loss: 0.1142
Epoch 21/100, Loss: 0.1063
Epoch 22/100, Loss: 0.1121
Epoch 23/100, Loss: 0.1075
Epoch 24/100, Loss: 0.1071
Epoch 25/100, Loss: 0.1108
Epoch 26/100, Loss: 0.1099
Epoch 27/100, Loss: 0.1079
Epoch 28/100, Loss: 0.1041
Epoch 29/100, Loss: 0.1039
Epoch 30/100, Loss: 0.1043
Epoch 31/100, Loss: 0.1096
Epoch 32/100, Loss: 0.1016
Epoch 33/100, Loss: 0.1053
Epoch 34/100, Loss: 0.1027
Epoch 35/100, Loss: 0.1076
Epoch 36/100, Loss: 0.1088
Epoch 37/100, Loss: 0.1019
```
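The training loss drifts downward, but on its own it cannot show whether the model is learning patterns that carry over to unseen drafts. A common remedy is to hold out part of the training data as a validation set, compute its loss after every epoch (under `model.eval()` and `torch.no_grad()`), and stop once it no longer improves. Below is a framework-agnostic sketch of that stopping rule. `EarlyStopping` is a hypothetical class, not part of PyTorch, and the loss values are toy numbers:

```python
class EarlyStopping:
    """Signal when a monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best = float('inf')
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=2)
for epoch, val_loss in enumerate([0.70, 0.68, 0.69, 0.71, 0.66], start=1):
    if stopper.step(val_loss):
        print(f"Stopping after epoch {epoch}; best validation loss {stopper.best:.2f}")
        break
```

With these toy values the loop stops after epoch 4, because two epochs in a row failed to beat 0.68.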

### Evaluating the Model

```python
# Evaluation
model.eval()  # Set model to evaluation mode (disables dropout)
with torch.no_grad():
    test_outputs = model(X_test_tensor)  # Forward pass
    test_predictions = (test_outputs > 0.5).float()  # Convert probabilities to binary predictions
    accuracy = (test_predictions == y_test_tensor).float().mean().item()  # Compute accuracy

print(f"Test Accuracy: {accuracy:.4f}")
```


```
Test Accuracy: 0.5361
```
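An accuracy of 0.5361 is easiest to judge next to a trivial predictor. If one label is more common in the test set, always predicting it already scores above 50%, and a confusion count shows whether the model leans toward one label. Below is a minimal NumPy sketch using toy labels (both function names are hypothetical). On the project's data it could be called as `majority_class_accuracy(y_test)` and `confusion_counts(y_test, test_predictions.squeeze(1).numpy())`:

```python
import numpy as np

def majority_class_accuracy(y_true):
    """Accuracy of always predicting the more common label."""
    positive_rate = float(np.mean(y_true))
    return max(positive_rate, 1 - positive_rate)

def confusion_counts(y_true, y_pred):
    """Counts of true/false positives and true/false negatives for 0/1 labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return {
        'tp': int(np.sum((y_pred == 1) & (y_true == 1))),
        'fp': int(np.sum((y_pred == 1) & (y_true == 0))),
        'tn': int(np.sum((y_pred == 0) & (y_true == 0))),
        'fn': int(np.sum((y_pred == 0) & (y_true == 1))),
    }

y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 1, 1, 0, 0, 1])
print(majority_class_accuracy(y_true))
print(confusion_counts(y_true, y_pred))
```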
