## Machine Learning Analysis of League of Legends Drafts


### Introduction
In the popular video game League of Legends, two teams of five players face off against each other. Before the game begins, each team picks five unique champions to play. The collection of champions picked in a game is known as that game's draft.

Each game is also played on a particular patch, a version of the game following biweekly balance changes. Patches through the end of 2024 follow the format `season.patch`, so patch 14.9 is the ninth patch of the 14th season.
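As a small illustration of the `season.patch` format, the sketch below splits a patch label into its two parts. The helper name `parse_patch` is hypothetical and not part of the project code. It assumes the patch is kept as a string, since reading `14.10` as a float would collapse it to `14.1`.

```python
def parse_patch(patch: str) -> tuple[int, int]:
    """Split a 'season.patch' label such as '14.9' into (season, patch_number)."""
    season, number = patch.split(".")
    return int(season), int(number)

print(parse_patch("14.9"))   # (14, 9)
print(parse_patch("13.24"))  # (13, 24)
```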

This project looks at the drafts of professional League of Legends games and uses machine learning to train models that predict the winner of a game from each team's draft.

The results of different models are then compared to see which parameters might lead to more accurate predictions.



### Data

The data for this project was obtained from [Oracle's Elixir](https://oracleselixir.com/tools/downloads).

## **Model 1**
The first model was built with PyTorch. Its inputs are encoded with one-hot encoding for the patches and multi-hot encoding for the drafts.

### Setting Up the Data
Since the raw data contains far more information than the model needs, it must first be cleaned into a compact form that the model can use efficiently.

**Packages**

In [58]:
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader
from sklearn.model_selection import train_test_split


**Data Cleaning**

In [17]:
data = pd.read_csv(r'C:\Users\Josh\Downloads\2024_LoL_esports_match_data_from_OraclesElixir.csv', low_memory=False)

# Each game occupies 12 rows; the last two rows of each block are the team rows.
# Select both team rows of every game (indices 10, 11, 22, 23, 34, 35, ...)
rows_to_select = list(range(10, len(data), 12)) + list(range(11, len(data), 12))

# Sort the rows (since they might be in an unordered sequence after the concatenation)
rows_to_select.sort()

# Column index positions: J=9, X=23, Y=24, Z=25, AA=26, AB=27, AD=29
columns_to_select = [9, 23, 24, 25, 26, 27, 29]

# Select the rows and the specific columns by index
cleaned_data = data.iloc[rows_to_select, columns_to_select]

# Display the cleaned data to double check
print(cleaned_data.head())



    patch         pick1     pick2    pick3   pick4       pick5  result
10  13.24       Kalista     Senna  Orianna  Maokai      Aatrox       0
11  13.24  Renata Glasc     Varus  LeBlanc    Rell      Rumble       1
22  13.24         Neeko  Bel'Veth   Kennen   Senna  Tahm Kench       0
23  13.24       Kalista       Jax  LeBlanc    Rell   Jarvan IV       1
34  13.24         Neeko   Caitlyn      Lux     Jax    Bel'Veth       1
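Selecting rows and columns by position works as long as every game occupies exactly 12 rows in the same column order. A more defensive variant filters by column name instead. The sketch below runs on a tiny made-up frame. The `position` column and its `'team'` value are assumptions about the Oracle's Elixir layout that should be checked against the actual file, while `patch`, `pick1` to `pick5`, and `result` match the column names printed above.

```python
import pandas as pd

# Made-up miniature of the raw file: one game = 10 player rows + 2 team rows.
rows = []
for i in range(10):
    rows.append({"position": "player", "patch": "13.24", "pick1": None, "pick2": None,
                 "pick3": None, "pick4": None, "pick5": None, "result": i // 5})
rows.append({"position": "team", "patch": "13.24", "pick1": "Kalista", "pick2": "Senna",
             "pick3": "Orianna", "pick4": "Maokai", "pick5": "Aatrox", "result": 0})
rows.append({"position": "team", "patch": "13.24", "pick1": "Renata Glasc", "pick2": "Varus",
             "pick3": "LeBlanc", "pick4": "Rell", "pick5": "Rumble", "result": 1})
raw = pd.DataFrame(rows)

# Filter by (assumed) column names instead of hard-coded positions.
wanted = ["patch", "pick1", "pick2", "pick3", "pick4", "pick5", "result"]
team_rows = raw.loc[raw["position"] == "team", wanted]
print(team_rows)
```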


**Grouping Data for the Model**

In [18]:
data_grouped = []
for i in range(0, len(cleaned_data), 2):
    # Combine the features of the two rows (picks, patch, etc.) into one row
    patch = cleaned_data.iloc[i, 0]  # Get the patch from the first row of the pair (index 0 is the patch column)
    
    # Combine picks for both teams (pick1-pick5) and the patch
    row = [patch] + cleaned_data.iloc[i, 1:6].tolist() + cleaned_data.iloc[i+1, 1:6].tolist()
    
    # Append the result from the second row of the pair (team 2's result),
    # so result == 1 means team 2 won and result == 0 means team 1 won
    row.append(cleaned_data.iloc[i+1, -1])  # Assuming result is the last column
    
    data_grouped.append(row)

# Convert the list of grouped data into a new DataFrame
columns = ['patch', 'pick1_team1', 'pick2_team1', 'pick3_team1', 'pick4_team1', 'pick5_team1',
           'pick1_team2', 'pick2_team2', 'pick3_team2', 'pick4_team2', 'pick5_team2', 'result']

game_data = pd.DataFrame(data_grouped, columns=columns)

# Convert all columns to strings (missing picks become the literal string 'nan')
game_data = game_data.astype(str)

# Convert result to an integer
game_data['result'] = game_data['result'].astype(int)

# Display the first few rows of the grouped data
print(game_data.head())
print(game_data['result'].unique())

   patch pick1_team1 pick2_team1   pick3_team1 pick4_team1 pick5_team1  \
0  13.24     Kalista       Senna       Orianna      Maokai      Aatrox   
1  13.24       Neeko    Bel'Veth        Kennen       Senna  Tahm Kench   
2  13.24       Neeko     Caitlyn           Lux         Jax    Bel'Veth   
3  13.24      Rumble      Draven  Renata Glasc    Tristana   Jarvan IV   
4  13.24       Varus        Azir      Nautilus    Xin Zhao    Renekton   

    pick1_team2   pick2_team2 pick3_team2 pick4_team2 pick5_team2  result  
0  Renata Glasc         Varus     LeBlanc        Rell      Rumble       1  
1       Kalista           Jax     LeBlanc        Rell   Jarvan IV       1  
2       Kalista  Renata Glasc        Azir     Lee Sin      Aatrox       0  
3       Orianna       Kalista       Senna      Aatrox      Wukong       0  
4         Milio       Orianna      Lucian     Lee Sin      Aatrox       0  
[1 0]
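The row pairing done by the loop above can also be expressed without an explicit loop. This is only a sketch of an alternative, not the code used to produce the results here. It rebuilds the first two games from the cleaned rows shown earlier and pairs even-positioned rows with odd-positioned ones.

```python
import pandas as pd

pick_cols = ["pick1", "pick2", "pick3", "pick4", "pick5"]

# The first four team rows shown in the cleaned data above (two games).
team_rows = pd.DataFrame(
    [
        ["13.24", "Kalista", "Senna", "Orianna", "Maokai", "Aatrox", 0],
        ["13.24", "Renata Glasc", "Varus", "LeBlanc", "Rell", "Rumble", 1],
        ["13.24", "Neeko", "Bel'Veth", "Kennen", "Senna", "Tahm Kench", 0],
        ["13.24", "Kalista", "Jax", "LeBlanc", "Rell", "Jarvan IV", 1],
    ],
    columns=["patch"] + pick_cols + ["result"],
    index=[10, 11, 22, 23],
)

# Even positions hold the first team of each game, odd positions the second.
first = team_rows.iloc[0::2].reset_index(drop=True)
second = team_rows.iloc[1::2].reset_index(drop=True)

games = pd.concat(
    [
        first[["patch"]],
        first[pick_cols].add_suffix("_team1"),
        second[pick_cols].add_suffix("_team2"),
        second[["result"]],  # result of the second team, as in the loop
    ],
    axis=1,
)
print(games.shape)  # (2, 12)
```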


In [6]:
# Checking to make sure it correctly aligned the patch numbers
first_game_patch_14_01 = game_data[game_data['patch'] == '14.01'].iloc[0]
print(first_game_patch_14_01)

patch                 14.01
pick1_team1         LeBlanc
pick2_team1           Varus
pick3_team1    Renata Glasc
pick4_team1        Renekton
pick5_team1        Xin Zhao
pick1_team2         Kalista
pick2_team2           Neeko
pick3_team2         Lee Sin
pick4_team2        Nautilus
pick5_team2             Jax
result                    0
Name: 15, dtype: object


### Preprocessing the Data
In general, machine learning models cannot take strings as inputs directly, so categorical variables must be encoded as numbers.

There are many encoding methods; this model uses one-hot encoding for the patches and multi-hot encoding for each team's draft.


**A Short Explanation**

In one-hot encoding, a variable is converted into a binary vector, where each element represents a different category.

For example, say a variable `animal` can take on three possible values: `"cat"`, `"dog"`, or `"bird"`. \
Since there are three values, the one-hot vector would have three elements, and suppose that the first element represents `"cat"`, the second `"dog"`, and the last `"bird"`. \
Then every time there is a data point that is a `"cat"`, it would be encoded by the vector `[1, 0, 0]`. Similarly, `"dog"` would be encoded by `[0, 1, 0]`, and `[0, 0, 1]` for `"bird"`.

Multi-hot encoding works similarly, except that a vector may contain more than one `1`. For a draft, the entries for each of the team's picked champions are set to `1`.
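The two encodings can be sketched in a few lines of plain Python. The helper names `one_hot` and `multi_hot` are made up for this illustration and are not the functions used in the cells that follow.

```python
def one_hot(value, categories):
    """Return a list with a single 1 at the position of `value`."""
    return [1 if category == value else 0 for category in categories]

def multi_hot(values, categories):
    """Return a list with a 1 at the position of every element of `values`."""
    chosen = set(values)
    return [1 if category in chosen else 0 for category in categories]

animals = ["cat", "dog", "bird"]
print(one_hot("dog", animals))              # [0, 1, 0]
print(multi_hot(["cat", "bird"], animals))  # [1, 0, 1]
```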


In [19]:
one_hot_encoded_patches = pd.get_dummies(game_data['patch'], prefix='patch')

one_hot_encoded_patches = one_hot_encoded_patches.astype(int)

# Add the encoded patches back to the original DataFrame
game_data = pd.concat([game_data, one_hot_encoded_patches], axis=1)

print(game_data.head())


   patch pick1_team1 pick2_team1   pick3_team1 pick4_team1 pick5_team1  \
0  13.24     Kalista       Senna       Orianna      Maokai      Aatrox   
1  13.24       Neeko    Bel'Veth        Kennen       Senna  Tahm Kench   
2  13.24       Neeko     Caitlyn           Lux         Jax    Bel'Veth   
3  13.24      Rumble      Draven  Renata Glasc    Tristana   Jarvan IV   
4  13.24       Varus        Azir      Nautilus    Xin Zhao    Renekton   

    pick1_team2   pick2_team2 pick3_team2 pick4_team2  ... patch_14.14  \
0  Renata Glasc         Varus     LeBlanc        Rell  ...           0   
1       Kalista           Jax     LeBlanc        Rell  ...           0   
2       Kalista  Renata Glasc        Azir     Lee Sin  ...           0   
3       Orianna       Kalista       Senna      Aatrox  ...           0   
4         Milio       Orianna      Lucian     Lee Sin  ...           0   

   patch_14.15  patch_14.16  patch_14.17  patch_14.18  patch_14.19  \
0            0            0            0

In [20]:
# Identify all unique champions across all picks
champions = pd.unique(game_data[['pick1_team1', 'pick2_team1', 'pick3_team1', 'pick4_team1', 'pick5_team1',
                                 'pick1_team2', 'pick2_team2', 'pick3_team2', 'pick4_team2', 'pick5_team2']].values.ravel())

# Create a mapping from champion name to index
champion_to_index = {champ: idx for idx, champ in enumerate(champions)}

# Create multi-hot encoding for each team
def create_multi_hot_vector(picks, champion_to_index, num_champions):
    vector = np.zeros(num_champions, dtype=int)
    for pick in picks:
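        # Missing picks were stringified to 'nan' above, so 'nan' is itself a key in the mapping
        if pick in champion_to_index:  # Skip only picks that are absent from the mapping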
            vector[champion_to_index[pick]] = 1
    return vector

# Apply multi-hot encoding to the dataset
num_champions = len(champions)

team1_vectors = []
team2_vectors = []

for _, row in game_data.iterrows():
    team1_picks = row[['pick1_team1', 'pick2_team1', 'pick3_team1', 'pick4_team1', 'pick5_team1']]
    team2_picks = row[['pick1_team2', 'pick2_team2', 'pick3_team2', 'pick4_team2', 'pick5_team2']]
    
    team1_vectors.append(create_multi_hot_vector(team1_picks, champion_to_index, num_champions))
    team2_vectors.append(create_multi_hot_vector(team2_picks, champion_to_index, num_champions))

# Add the multi-hot vectors to the DataFrame
game_data['team1_vector'] = team1_vectors
game_data['team2_vector'] = team2_vectors

print(game_data[['team1_vector', 'team2_vector']].head())

                                        team1_vector  \
0  [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...   
1  [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, ...   
2  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, ...   
3  [0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, ...   
4  [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, ...   

                                        team2_vector  
0  [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, ...  
1  [1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, ...  
2  [1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...  
3  [1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...  
4  [0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...  


In [9]:
# Concatenate the patch one-hots and both teams' multi-hot vectors into one feature matrix
patch_columns = [col for col in game_data.columns if col.startswith('patch_')]
patch_encoded = game_data[patch_columns].values 
team1_vectors = np.array(game_data['team1_vector'].tolist())
team2_vectors = np.array(game_data['team2_vector'].tolist())
X = np.hstack([patch_encoded, team1_vectors, team2_vectors])
y = game_data['result'].values

# Randomly shuffle and split the data
np.random.seed(42)  # Set the random seed for reproducibility
indices = np.random.permutation(len(X))
split_index = int(0.8 * len(X))

train_indices = indices[:split_index]
test_indices = indices[split_index:]

X_train, X_test = X[train_indices], X[test_indices]
y_train, y_test = y[train_indices], y[test_indices]

# Convert to PyTorch tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1)

# Verify
print(f"X_train_tensor shape: {X_train_tensor.shape}")
print(f"y_train_tensor shape: {y_train_tensor.shape}")
print(f"X_test_tensor shape: {X_test_tensor.shape}")
print(f"y_test_tensor shape: {y_test_tensor.shape}")

X_train_tensor shape: torch.Size([7751, 360])
y_train_tensor shape: torch.Size([7751, 1])
X_test_tensor shape: torch.Size([1938, 360])
y_test_tensor shape: torch.Size([1938, 1])
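The 360 input features can be accounted for as 24 + 168 + 168: the 24 one-hot patch columns plus one 168-entry multi-hot vector per team. Both counts appear in the printed outputs for the second model, which is built from the same data, and the 168 entries include the `'nan'` placeholder. The sketch below only reproduces the shape arithmetic with zero-filled arrays.

```python
import numpy as np

n_games, n_patches, n_champions = 4, 24, 168  # 4 is an arbitrary toy number of games

patch_block = np.zeros((n_games, n_patches), dtype=int)
team1_block = np.zeros((n_games, n_champions), dtype=int)
team2_block = np.zeros((n_games, n_champions), dtype=int)

# Same column-wise stacking as the cell above: patches, then team 1, then team 2.
features = np.hstack([patch_block, team1_block, team2_block])
print(features.shape)  # (4, 360)
```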


### Defining the Model

In [10]:
class LeagueDraftPredictor(nn.Module):
    def __init__(self, input_size):
        super(LeagueDraftPredictor, self).__init__()
        # Define the layers
        self.fc1 = nn.Linear(input_size, 512)  # Input layer to first hidden layer
        self.fc2 = nn.Linear(512, 512)         # First hidden layer to second hidden layer
        self.fc3 = nn.Linear(512, 128)         # Second hidden layer to third hidden layer
        self.fc4 = nn.Linear(128, 1)           # Third hidden layer to output
        self.dropout = nn.Dropout(0.3)         # Dropout for regularization

    def forward(self, x):
        x = F.relu(self.fc1(x))              # First hidden layer with ReLU
        x = self.dropout(x)                  # Apply dropout
        x = F.relu(self.fc2(x))              # Second hidden layer with ReLU
        x = self.dropout(x)                  # Apply dropout
        x = F.relu(self.fc3(x))              # Third hidden layer with ReLU
        x = self.dropout(x)                  # Apply dropout
        x = torch.sigmoid(self.fc4(x))       # Sigmoid activation for binary classification
        return x

# Instantiate the model
input_size = X_train_tensor.shape[1]  # Number of input features
model = LeagueDraftPredictor(input_size)

# Print the model architecture
print(model)

LeagueDraftPredictor(
  (fc1): Linear(in_features=360, out_features=512, bias=True)
  (fc2): Linear(in_features=512, out_features=512, bias=True)
  (fc3): Linear(in_features=512, out_features=128, bias=True)
  (fc4): Linear(in_features=128, out_features=1, bias=True)
  (dropout): Dropout(p=0.3, inplace=False)
)


**Loss Function**

In [11]:
# Loss function and optimizer
criterion = nn.BCELoss()  # Binary Cross Entropy Loss
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam optimizer with learning rate 0.001
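For reference, binary cross-entropy averaged over a batch is commonly written as below. The symbols $N$ (number of games in the batch), $y_i \in \{0, 1\}$ (true result), and $p_i$ (predicted probability from the sigmoid output) are notation introduced here for illustration. This is the mean-reduced form that `nn.BCELoss` uses by default.

```latex
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \Big]
```

A model that always outputs $p_i = 0.5$ scores $\ln 2 \approx 0.693$ under this loss, which is close to the first-epoch loss of 0.6908 in the training log for this model.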


### Training the Model

In [12]:
# Training parameters
epochs = 50
batch_size = 64

# DataLoader for batching
train_dataset = torch.utils.data.TensorDataset(X_train_tensor, y_train_tensor)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

# Training loop
for epoch in range(epochs):
    model.train()  # Set model to training mode
    epoch_loss = 0

    for batch_X, batch_y in train_loader:
        optimizer.zero_grad()          # Clear gradients from the previous step
        outputs = model(batch_X)       # Forward pass
        loss = criterion(outputs, batch_y)  # Compute loss
        loss.backward()                # Backpropagation
        optimizer.step()               # Update weights
        epoch_loss += loss.item()      # Accumulate loss

    print(f"Epoch {epoch+1}/{epochs}, Loss: {epoch_loss/len(train_loader):.4f}")


Epoch 1/50, Loss: 0.6908
Epoch 2/50, Loss: 0.6763
Epoch 3/50, Loss: 0.6388
Epoch 4/50, Loss: 0.5531
Epoch 5/50, Loss: 0.4225
Epoch 6/50, Loss: 0.3126
Epoch 7/50, Loss: 0.2589
Epoch 8/50, Loss: 0.2056
Epoch 9/50, Loss: 0.1774
Epoch 10/50, Loss: 0.1692
Epoch 11/50, Loss: 0.1493
Epoch 12/50, Loss: 0.1430
Epoch 13/50, Loss: 0.1318
Epoch 14/50, Loss: 0.1223
Epoch 15/50, Loss: 0.1203
Epoch 16/50, Loss: 0.1172
Epoch 17/50, Loss: 0.1115
Epoch 18/50, Loss: 0.1141
Epoch 19/50, Loss: 0.1080
Epoch 20/50, Loss: 0.1072
Epoch 21/50, Loss: 0.1062
Epoch 22/50, Loss: 0.0967
Epoch 23/50, Loss: 0.1026
Epoch 24/50, Loss: 0.1021
Epoch 25/50, Loss: 0.0934
Epoch 26/50, Loss: 0.0964
Epoch 27/50, Loss: 0.0961
Epoch 28/50, Loss: 0.0926
Epoch 29/50, Loss: 0.0934
Epoch 30/50, Loss: 0.0891
Epoch 31/50, Loss: 0.0947
Epoch 32/50, Loss: 0.0940
Epoch 33/50, Loss: 0.0862
Epoch 34/50, Loss: 0.0908
Epoch 35/50, Loss: 0.0914
Epoch 36/50, Loss: 0.0838
Epoch 37/50, Loss: 0.0862
Epoch 38/50, Loss: 0.0889
Epoch 39/50, Loss: 0.

### Evaluating the Model

In [13]:
# Evaluation
model.eval()  # Set model to evaluation mode
with torch.no_grad():
    test_outputs = model(X_test_tensor)  # Forward pass
    test_predictions = (test_outputs > 0.5).float()  # Convert probabilities to binary predictions
    accuracy = (test_predictions == y_test_tensor).float().mean().item()  # Compute accuracy

print(f"Test Accuracy: {accuracy:.4f}")


Test Accuracy: 0.5299


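This model reaches a test accuracy of about 53%, which is acceptable considering that it does not factor in team strength. For reference, champions in League of Legends are usually heavily nerfed if they sit at a 53-54% win rate, so this model is off to a good start. That said, the training loss falls below 0.1 while test accuracy stays near 53%, which suggests that the network is largely memorizing the training drafts rather than learning patterns that generalize.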
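One way to put an accuracy figure like this in context is to compare it with a trivial baseline that always predicts the more common outcome. The sketch below uses a short made-up label list, not the project's data, and the helper `majority_baseline_accuracy` is a hypothetical name.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

toy_results = [1, 0, 0, 1, 0, 1, 0, 0]  # made-up outcomes: five 0s, three 1s
print(majority_baseline_accuracy(toy_results))  # 0.625
```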

The model can definitely be improved, so embeddings will be used next in place of multi-hot encoding, in the hope of better representing the intricacies of drafts and potentially capturing things such as the synergy between champions and what each champion is good at.

**What are embeddings?**

Embeddings are a method of representing categorical data for ML models to use, similar to n-hot encoding. However, instead of representing data through sparse binary vectors, embeddings use dense, learned vectors in a low-dimensional space, placing objects closer together the more related they are.

For example, in n-hot encoding, apples, apple pie, and oranges would all be represented differently through binary vectors, and the model would have no way of telling which foods were more closely related. However, embeddings would allow the model to calculate the "distance" between apples and apple pie, and find that they are more closely related than apples and oranges.
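The idea of "distance" between embeddings can be illustrated with cosine similarity. The two-dimensional vectors below are chosen by hand purely for illustration. In a real model they would be learned during training and would typically have more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 2-dimensional embeddings, chosen by hand for illustration only.
apple = [0.9, 0.1]
apple_pie = [0.8, 0.3]
orange = [0.1, 0.9]

print(cosine_similarity(apple, apple_pie) > cosine_similarity(apple, orange))  # True
```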

## **Model 2**

### Mapping Champions to Indices

**Creating a new dataset**

In [None]:
data2 = pd.read_csv(r'C:\Users\Josh\Downloads\2024_LoL_esports_match_data_from_OraclesElixir.csv', low_memory=False)
rows_to_select = list(range(10, len(data2), 12)) + list(range(11, len(data2), 12))
rows_to_select.sort()
columns_to_select = [9, 23, 24, 25, 26, 27, 29]
cleaned_data2 = data2.iloc[rows_to_select, columns_to_select]

data_grouped2 = []
for i in range(0, len(cleaned_data2), 2):
    patch = cleaned_data2.iloc[i, 0]
    row = [patch] + cleaned_data2.iloc[i, 1:6].tolist() + cleaned_data2.iloc[i + 1, 1:6].tolist()
    row.append(cleaned_data2.iloc[i + 1, -1])
    data_grouped2.append(row)

columns = ['patch', 'pick1_team1', 'pick2_team1', 'pick3_team1', 'pick4_team1', 'pick5_team1',
           'pick1_team2', 'pick2_team2', 'pick3_team2', 'pick4_team2', 'pick5_team2', 'result']
game_data2 = pd.DataFrame(data_grouped2, columns=columns)
game_data2 = game_data2.astype(str)
game_data2['result'] = game_data2['result'].astype(int)

one_hot_encoded_patches2 = pd.get_dummies(game_data2['patch'], prefix='patch')
one_hot_encoded_patches2 = one_hot_encoded_patches2.astype(int)
game_data2 = pd.concat([game_data2, one_hot_encoded_patches2], axis=1)

print(game_data2.head())

   patch pick1_team1 pick2_team1   pick3_team1 pick4_team1 pick5_team1  \
0  13.24     Kalista       Senna       Orianna      Maokai      Aatrox   
1  13.24       Neeko    Bel'Veth        Kennen       Senna  Tahm Kench   
2  13.24       Neeko     Caitlyn           Lux         Jax    Bel'Veth   
3  13.24      Rumble      Draven  Renata Glasc    Tristana   Jarvan IV   
4  13.24       Varus        Azir      Nautilus    Xin Zhao    Renekton   

    pick1_team2   pick2_team2 pick3_team2 pick4_team2  ... patch_14.14  \
0  Renata Glasc         Varus     LeBlanc        Rell  ...           0   
1       Kalista           Jax     LeBlanc        Rell  ...           0   
2       Kalista  Renata Glasc        Azir     Lee Sin  ...           0   
3       Orianna       Kalista       Senna      Aatrox  ...           0   
4         Milio       Orianna      Lucian     Lee Sin  ...           0   

   patch_14.15  patch_14.16  patch_14.17  patch_14.18  patch_14.19  \
0            0            0            0

**Mapping**

In [28]:
# Extract all champion names from the pick columns in game_data2
pick_columns = ['pick1_team1', 'pick2_team1', 'pick3_team1', 'pick4_team1', 'pick5_team1',
                'pick1_team2', 'pick2_team2', 'pick3_team2', 'pick4_team2', 'pick5_team2']

unique_champions = pd.unique(game_data2[pick_columns].values.ravel())  # Get all unique champion names
champion_to_index = {champ: idx for idx, champ in enumerate(unique_champions)}  # Map champions to indices

# Add this mapping to the game_data2 DataFrame for reference (optional)
game_data2['champion_to_index_mapping'] = str(champion_to_index)

# Display the mapping
print("Champion to Index Mapping:")
print(champion_to_index)

print(f"Index for a specific champion 'Aatrox': {champion_to_index.get('Aatrox', 'Not Found')}")
print(f"Total champions mapped: {len(champion_to_index)}")



Champion to Index Mapping:
{'Kalista': 0, 'Senna': 1, 'Orianna': 2, 'Maokai': 3, 'Aatrox': 4, 'Renata Glasc': 5, 'Varus': 6, 'LeBlanc': 7, 'Rell': 8, 'Rumble': 9, 'Neeko': 10, "Bel'Veth": 11, 'Kennen': 12, 'Tahm Kench': 13, 'Jax': 14, 'Jarvan IV': 15, 'Caitlyn': 16, 'Lux': 17, 'Azir': 18, 'Lee Sin': 19, 'Draven': 20, 'Tristana': 21, 'Wukong': 22, 'Nautilus': 23, 'Xin Zhao': 24, 'Renekton': 25, 'Milio': 26, 'Lucian': 27, 'Poppy': 28, 'Xayah': 29, 'Rakan': 30, 'Ryze': 31, 'Nocturne': 32, 'Zeri': 33, 'Lulu': 34, 'Nami': 35, 'Akali': 36, 'Sejuani': 37, 'Gragas': 38, 'Sylas': 39, 'Nidalee': 40, "K'Sante": 41, 'Vi': 42, 'Ashe': 43, "Kha'Zix": 44, 'Ezreal': 45, 'Braum': 46, 'Karthus': 47, 'Jhin': 48, 'Elise': 49, 'Syndra': 50, 'Jayce': 51, 'Viego': 52, 'Cassiopeia': 53, 'Malphite': 54, 'Veigar': 55, 'nan': 56, 'Taliyah': 57, 'Leona': 58, 'Ornn': 59, 'Aphelios': 60, 'Ziggs': 61, 'Briar': 62, "Vel'Koz": 63, "Kai'Sa": 64, 'Gwen': 65, 'Yone': 66, 'Blitzcrank': 67, 'Hwei': 68, 'Alistar': 69, 'Fior

### Preparing Data for Embeddings

In [29]:
# Replace champion names in the pick columns with their indices
for col in pick_columns:
    game_data2[col] = game_data2[col].map(champion_to_index)

# Verify the changes
print("Data with champion indices:")
print(game_data2[pick_columns].head())


Data with champion indices:
   pick1_team1  pick2_team1  pick3_team1  pick4_team1  pick5_team1  \
0            0            1            2            3            4   
1           10           11           12            1           13   
2           10           16           17           14           11   
3            9           20            5           21           15   
4            6           18           23           24           25   

   pick1_team2  pick2_team2  pick3_team2  pick4_team2  pick5_team2  
0            5            6            7            8            9  
1            0           14            7            8           15  
2            0            5           18           19            4  
3            2            0            1            4           22  
4           26            2           27           19            4  


In [47]:
# Step 1: Prepare inputs (separate picks and results)
X_picks_team1 = game_data2[['pick1_team1', 'pick2_team1', 'pick3_team1', 'pick4_team1', 'pick5_team1']].values
X_picks_team2 = game_data2[['pick1_team2', 'pick2_team2', 'pick3_team2', 'pick4_team2', 'pick5_team2']].values
X_patches = game_data2[one_hot_encoded_patches2.columns].values
y = game_data2['result'].values

# Step 2: Split the data into training and testing sets (80/20 split)
X_picks_team1_train, X_picks_team1_test, X_picks_team2_train, X_picks_team2_test, X_patches_train, X_patches_test, y_train, y_test = train_test_split(
    X_picks_team1, X_picks_team2, X_patches, y, test_size=0.2, random_state=42
)

# Step 3: Convert to PyTorch tensors
X_picks_team1_train_tensor = torch.tensor(X_picks_team1_train, dtype=torch.long)
X_picks_team2_train_tensor = torch.tensor(X_picks_team2_train, dtype=torch.long)
X_patches_train_tensor = torch.tensor(X_patches_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)  # Ensure y is 2D for binary classification

X_picks_team1_test_tensor = torch.tensor(X_picks_team1_test, dtype=torch.long)
X_picks_team2_test_tensor = torch.tensor(X_picks_team2_test, dtype=torch.long)
X_patches_test_tensor = torch.tensor(X_patches_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1)

# Step 4: Verify the shapes of the tensors
print(f"Team 1 picks train tensor shape: {X_picks_team1_train_tensor.shape}")
print(f"Team 2 picks train tensor shape: {X_picks_team2_train_tensor.shape}")
print(f"Patches train tensor shape: {X_patches_train_tensor.shape}")
print(f"Result train tensor shape: {y_train_tensor.shape}")

print(f"Team 1 picks test tensor shape: {X_picks_team1_test_tensor.shape}")
print(f"Team 2 picks test tensor shape: {X_picks_team2_test_tensor.shape}")
print(f"Patches test tensor shape: {X_patches_test_tensor.shape}")
print(f"Result test tensor shape: {y_test_tensor.shape}")



Team 1 picks train tensor shape: torch.Size([7751, 5])
Team 2 picks train tensor shape: torch.Size([7751, 5])
Patches train tensor shape: torch.Size([7751, 24])
Result train tensor shape: torch.Size([7751, 1])
Team 1 picks test tensor shape: torch.Size([1938, 5])
Team 2 picks test tensor shape: torch.Size([1938, 5])
Patches test tensor shape: torch.Size([1938, 24])
Result test tensor shape: torch.Size([1938, 1])


### Defining the Model

In [None]:
class LeagueDraftEmbeddingsModel(nn.Module):
    def __init__(self, num_champions, patch_input_size, embedding_dim=16):
        super(LeagueDraftEmbeddingsModel, self).__init__()
        
        # Embedding layer for champions picks (team 1 and team 2)
        self.embedding = nn.Embedding(num_embeddings=num_champions, embedding_dim=embedding_dim)

        # Fully connected layers
        self.fc1 = nn.Linear(2 * embedding_dim + patch_input_size, 128)  # Input: 2*embedding_dim + patch_input_size
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 1)  

        self.dropout = nn.Dropout(0.3)  # Dropout for regularization

    def forward(self, team1_picks, team2_picks, patch_features):
        # Average the embeddings of team picks
        team1_embedded = self.embedding(team1_picks).mean(dim=1)  # Average the embeddings for team 1
        team2_embedded = self.embedding(team2_picks).mean(dim=1)  # Average the embeddings for team 2

        # Concatenate the averaged embeddings of both teams and patch features
        combined_features = torch.cat([team1_embedded, team2_embedded, patch_features], dim=1)

        # Pass through fully connected layers
        x = F.relu(self.fc1(combined_features))  # First hidden layer with ReLU activation
        x = self.dropout(x)  # Apply dropout
        x = F.relu(self.fc2(x))  # Second hidden layer with ReLU activation
        x = self.dropout(x)  # Apply dropout

        # Final sigmoid layer for binary classification (0 or 1)
        x = torch.sigmoid(self.fc3(x))  # Output layer

        return x

# Instantiate the model
num_champions = len(champion_to_index)  # Number of unique champions (including the 'nan' placeholder)
patch_input_size = len(one_hot_encoded_patches2.columns)  # Number of patch features (after one-hot encoding)
embedding_dim = 16  # You can experiment with this value

model = LeagueDraftEmbeddingsModel(num_champions, patch_input_size, embedding_dim)

# Print model architecture
print(model)

LeagueDraftEmbeddingsModel(
  (embedding): Embedding(168, 16)
  (fc1): Linear(in_features=56, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=64, bias=True)
  (fc3): Linear(in_features=64, out_features=1, bias=True)
  (dropout): Dropout(p=0.3, inplace=False)
)
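The forward pass above averages the five champion embeddings of each team, which makes the team representation independent of pick order. The NumPy sketch below mimics that lookup-and-average step with a random table standing in for the learned `nn.Embedding` weights. The sizes 168, 16, and 24 are taken from the printed architecture and tensor shapes, and the pick indices are those of the first game shown earlier.

```python
import numpy as np

rng = np.random.default_rng(0)
num_champions, embedding_dim, num_patches = 168, 16, 24

# Random stand-in for the learned embedding table (one row per champion index).
table = rng.normal(size=(num_champions, embedding_dim))

team1_picks = np.array([[0, 1, 2, 3, 4]])   # shape (batch=1, 5)
team2_picks = np.array([[5, 6, 7, 8, 9]])

team1_vec = table[team1_picks].mean(axis=1)  # (1, 5, 16) -> (1, 16)
team2_vec = table[team2_picks].mean(axis=1)
patch_vec = np.zeros((1, num_patches))       # placeholder patch features

combined = np.concatenate([team1_vec, team2_vec, patch_vec], axis=1)
print(combined.shape)  # (1, 56), matching fc1's in_features

# Averaging ignores pick order: a shuffled draft gives the same team vector.
shuffled = table[np.array([[4, 2, 0, 3, 1]])].mean(axis=1)
print(np.allclose(team1_vec, shuffled))  # True
```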


### Training Loop

In [68]:
# Set parameters
epochs = 100
batch_size = 32
learning_rate = 0.001

# Step 1: Prepare the dataset and DataLoader for training
train_dataset = TensorDataset(X_picks_team1_train_tensor, X_picks_team2_train_tensor, X_patches_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

# Step 2: Define the model and instantiate it
model = LeagueDraftEmbeddingsModel(num_champions=len(champion_to_index), patch_input_size=patch_input_size, embedding_dim=16)

# Step 3: Define the loss function and optimizer
criterion = nn.BCELoss()  # Binary Cross Entropy loss for binary classification
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Step 4: Training loop
for epoch in range(epochs):
    model.train()  # Set model to training mode
    epoch_loss = 0
    correct_predictions = 0
    total_predictions = 0
    
    for batch_X1, batch_X2, batch_patch, batch_y in train_loader:
        optimizer.zero_grad()  # Clear gradients from the previous step

        # Forward pass
        outputs = model(batch_X1, batch_X2, batch_patch)  # Get predictions

        # Compute the loss
        loss = criterion(outputs, batch_y)  # Ensure y is the correct shape for BCELoss
        loss.backward()  # Backpropagation
        optimizer.step()  # Update weights

        # Calculate the number of correct predictions
        predicted = (outputs > 0.5).float()
        correct_predictions += (predicted == batch_y).sum().item()
        total_predictions += batch_y.size(0)

        epoch_loss += loss.item()

    # Calculate accuracy for the epoch
    accuracy = 100 * correct_predictions / total_predictions

    # Print loss and accuracy for the current epoch
    print(f"Epoch {epoch+1}/{epochs}, Loss: {epoch_loss/len(train_loader):.4f}, Accuracy: {accuracy:.2f}%")


Epoch 1/100, Loss: 0.6915, Accuracy: 53.39%
Epoch 2/100, Loss: 0.6882, Accuracy: 54.61%
Epoch 3/100, Loss: 0.6842, Accuracy: 55.71%
Epoch 4/100, Loss: 0.6790, Accuracy: 56.34%
Epoch 5/100, Loss: 0.6729, Accuracy: 58.46%
Epoch 6/100, Loss: 0.6686, Accuracy: 58.99%
Epoch 7/100, Loss: 0.6641, Accuracy: 59.21%
Epoch 8/100, Loss: 0.6552, Accuracy: 59.95%
Epoch 9/100, Loss: 0.6485, Accuracy: 61.23%
Epoch 10/100, Loss: 0.6437, Accuracy: 62.73%
Epoch 11/100, Loss: 0.6322, Accuracy: 63.60%
Epoch 12/100, Loss: 0.6252, Accuracy: 64.26%
Epoch 13/100, Loss: 0.6172, Accuracy: 65.55%
Epoch 14/100, Loss: 0.6101, Accuracy: 65.88%
Epoch 15/100, Loss: 0.6039, Accuracy: 66.29%
Epoch 16/100, Loss: 0.6012, Accuracy: 67.64%
Epoch 17/100, Loss: 0.5921, Accuracy: 67.94%
Epoch 18/100, Loss: 0.5831, Accuracy: 68.52%
Epoch 19/100, Loss: 0.5818, Accuracy: 67.86%
Epoch 20/100, Loss: 0.5734, Accuracy: 69.60%
Epoch 21/100, Loss: 0.5580, Accuracy: 70.13%
Epoch 22/100, Loss: 0.5564, Accuracy: 70.97%
Epoch 23/100, Loss: