<center> <img src='http://www.thumbnailtemplates.com/images/thumbs/thumb-099-dota-2-2.jpg'> </center>

# Dota 2 winner prediction
    
In this Kernel I build a simple multilayer neural network (multilayer perceptron) in PyTorch to predict which team (Radiant or Dire) will win a Dota 2 match.

# Import modules

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
import copy
import datetime
import pytz
import time
import random

# PyTorch stuff
import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torch.utils import data

# Sklearn stuff
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.metrics import roc_auc_score

In [None]:
SEED = 17

# Input data files are available in the "../input/" directory.
PATH_TO_DATA = '../input'
print(os.listdir(PATH_TO_DATA))

# Load data

In [None]:
# Train dataset
df_train_features = pd.read_csv(os.path.join(PATH_TO_DATA, 'train_features.csv'), 
                                    index_col='match_id_hash')
df_train_targets = pd.read_csv(os.path.join(PATH_TO_DATA, 'train_targets.csv'), 
                                   index_col='match_id_hash')

# Test dataset
df_test_features = pd.read_csv(os.path.join(PATH_TO_DATA, 'test_features.csv'), 
                                   index_col='match_id_hash')

In [None]:
df_train_features.head()

In [None]:
# Check if there is missing data
print(df_train_features.isnull().values.any())
print(df_test_features.isnull().values.any())

Let's combine the train and test datasets into one dataset. This makes it possible to add new features to both datasets at the same time.

In [None]:
df_full_features = pd.concat([df_train_features, df_test_features])

# Index to split the training and test data sets
idx_split = df_train_features.shape[0]

# That is, 
# df_train_features == df_full_features[:idx_split]
# df_test_features == df_full_features[idx_split:]
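
As a quick illustration of the combine-then-split idea, here is a minimal sketch on two made-up toy tables (the names `toy_train`, `toy_test` and their values are assumptions for illustration, not the competition data). It shows that slicing by the number of training rows recovers both parts unchanged.

```python
import pandas as pd

# Toy stand-ins for the real train/test feature tables (values are made up).
toy_train = pd.DataFrame({'r1_gold': [100, 250, 300]}, index=['m1', 'm2', 'm3'])
toy_test = pd.DataFrame({'r1_gold': [180, 90]}, index=['m4', 'm5'])

# Stack the two tables; the training rows come first.
toy_full = pd.concat([toy_train, toy_test])
toy_split = toy_train.shape[0]  # number of training rows

# Positional slicing recovers the two original tables.
recovered_train = toy_full[:toy_split]
recovered_test = toy_full[toy_split:]
print(recovered_train.equals(toy_train), recovered_test.equals(toy_test))
```

This only works as long as no rows are dropped or reordered between the concatenation and the split.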

Drop the game-features

In [None]:
df_full_features.drop(['game_time', 'game_mode', 'lobby_type', 'objectives_len', 'chat_len'],
                      inplace=True, axis=1)

Helper function for writing to submission file

In [None]:
def write_to_submission_file(predicted_labels):
    df_submission = pd.DataFrame({'radiant_win_prob': predicted_labels}, 
                                     index=df_test_features.index)

    submission_filename = 'submission_{}.csv'.format(
        datetime.datetime.now(tz=pytz.timezone('Europe/Athens')).strftime('%Y-%m-%d_%H-%M-%S'))
    
    df_submission.to_csv(submission_filename)
    
    print('Submission saved to {}'.format(submission_filename))

# Description of the player-features

There are 245 different features in the dataset. This might look daunting at first, but there are 10 players (2 teams of 5 players each) and only 24 unique player-features per player, which gives 240 player-features in total. The remaining 5 features are general game-features, which I don't use here.
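
The feature count can be checked with a small sketch that rebuilds the column names from the naming pattern used in this kernel (`r1_gold`, `d5_kills`, ...). The column order below (game-features first) is an assumption made for illustration; only the names come from the dataset.

```python
# The 5 general game-features and the 24 per-player features named in this kernel.
game_features = ['game_time', 'game_mode', 'lobby_type', 'objectives_len', 'chat_len']
player_feature_names = [
    'hero_id', 'kills', 'deaths', 'gold', 'xp', 'lh', 'denies', 'assists',
    'health', 'max_health', 'max_mana', 'level', 'x', 'y', 'stuns',
    'creeps_stacked', 'camps_stacked', 'rune_pickups', 'firstblood_claimed',
    'teamfight_participation', 'towers_killed', 'roshans_killed',
    'obs_placed', 'sen_placed',
]

# Column names follow the pattern <team><player>_<feature>, e.g. 'r1_gold'.
player_columns = [f'{t}{i}_{f}'
                  for t in ['r', 'd']
                  for i in range(1, 6)
                  for f in player_feature_names]
all_columns = game_features + player_columns

# Stripping the 3-character prefix ('r1_', 'd5_', ...) gives the unique feature names.
unique_player_features = set(c[3:] for c in player_columns)
print(len(all_columns), len(player_columns), len(unique_player_features))
```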

The description of each player-feature is given in the following table (source: dota2.gamepedia.com). This might be helpful for people who, like me, have never played Dota 2.


|  Feature  | Description |
| ------------- |:-------------| 
| **hero_id** | ID of player's hero (int64). [Heroes](https://dota2.gamepedia.com/Heroes) are the essential element of Dota 2, as the course of the match is dependent on their intervention. During a match, two opposing teams select five out of 117 heroes that accumulate experience and gold to grow stronger and gain new abilities in order to destroy the opponent's Ancient. Most heroes have a distinct role that defines how they affect the battlefield, though many heroes can perform multiple roles. A hero's appearance can be modified with equipment.|
| **kills** | Number of killed players (int64).|
| **deaths** | Number of deaths of the player (int64).|
| **gold** | Amount of gold (int64). [Gold](https://dota2.gamepedia.com/Gold) is the currency used to buy items or instantly revive your hero. Gold can be earned from killing heroes, creeps, or buildings. |
| **xp** | Experience points (int64). [Experience](https://dota2.gamepedia.com/Experience) is an element heroes can gather by killing enemy units, or being present as enemy units get killed. On its own, experience does nothing, but when accumulated, it increases the hero's level, so that they grow more powerful.   |
| **lh** | Number of last hits (int64). [Last-hitting](https://dota2.gamepedia.com/Creep_control_techniques#Last-hitting) is a technique where you (or a creep under your control) get the 'last hit' on a neutral creep, enemy lane creep, or enemy hero. The hero that dealt the killing blow to the enemy unit will be awarded a bounty.|
| **denies** | Number of denies (int64). [Denying](https://dota2.gamepedia.com/Denying) is the act of preventing enemy heroes from getting the last hit on a friendly unit by last hitting the unit oneself. Enemies earn reduced experience if the denied unit is not controlled by a player, and no experience if it is a player controlled unit. Enemies gain no gold from any denied unit. |
| **assists** | Number of [assists](https://dota2.gamepedia.com/Gold#Assists_.28AoE_gold.29) (int64). Allied heroes within 1300 radius of a killed enemy, including the killer, receive experience and reliable gold if they assisted in the kill. To qualify for an assist, the allied hero merely has to be within the given radius of the dying enemy hero. |
| **health** | Health points (int64). [Health](https://dota2.gamepedia.com/Health) represents the life force of a unit. When a unit's current health reaches 0, it dies. Every hero has a base health pool of 200. This value exists for all heroes and cannot be altered. This means that a hero's maximum health cannot drop below 200. |
| **max_health** | Hero's maximum health pool (int64).|
| **max_mana** | Hero's maximum mana pool (float64). [Mana](https://dota2.gamepedia.com/Mana) represents the magic power of a unit. It is used as a cost for the majority of active and even some passive abilities. Every hero has a base mana pool of 75, while most non-hero units only have a set mana pool if they have abilities which require mana, with a few exceptions. These values cannot be altered. This means that a hero's maximum mana cannot drop below 75. |
| **level** | [Level](https://dota2.gamepedia.com/Experience#Leveling) of player's hero (int64). Each hero begins at level 1, with one free ability point to spend. Heroes may level up by acquiring certain amounts of experience. Upon leveling up, the hero's attributes increase by fixed amounts (unique for each hero), which makes them overall more powerful. Heroes may also gain more ability points by leveling up, allowing them to learn new spells, or to improve an already learned spell, making it more powerful. Heroes can gain a total of 24 levels, resulting in level 25 as the highest possible level a hero can reach. |
| **x** | Player's X coordinate (int64) |
| **y** | Player's Y coordinate (int64) |
| **stuns** | Total stun duration? (float64). [Stun](https://dota2.gamepedia.com/Stun) is a status effect that completely locks down affected units, disabling almost all of its capabilities. |
| **creeps_stacked** | Number of stacked creeps (int64). [Creep Stacking](https://dota2.gamepedia.com/Creep_Stacking) is the process of drawing neutral creeps away from their camps in order to increase the number of units in an area. By pulling neutral creeps beyond their camp boundaries, the game will generate a new set of creeps for the player to interact with in addition to any remaining creeps. This is incredibly time efficient, since it effectively increases the amount of gold available for a team. |
| **camps_stacked** | Number of stacked camps  (int64). |
| **rune_pickups** | Number of picked up [runes](https://dota2.gamepedia.com/Runes)  (int64).  |
| **firstblood_claimed** | boolean feature? (int64) |
| **teamfight_participation** |  Team fight participation rate? (float64) |
| **towers_killed** | Number of killed/destroyed Towers (int64). [Towers](https://dota2.gamepedia.com/Buildings#Towers) are the main line of defense for both teams, attacking any non-neutral enemy that gets within their range. Both factions have all three lanes guarded by three towers each. Additionally, each faction's Ancient has two towers as well, resulting in a total of 11 towers per faction. Towers come in 4 different tiers. |
| **roshans_killed** | Number of killed Roshans  (int64). [Roshan](https://dota2.gamepedia.com/Roshan) is the most powerful neutral creep in Dota 2. It is the first unit which spawns, right as the match is loaded. During the early to mid game, he easily outmatches almost every hero in one-on-one combat. Very few heroes can take him on alone during the mid-game. Even in the late game, lots of heroes struggle fighting him one on one, since Roshan grows stronger as time passes. |
| **obs_placed** | Number of observer-wards placed by a player (int64). [Observer Ward](https://dota2.gamepedia.com/Observer_Ward), an invisible watcher that gives ground vision in a 1600 radius to your team. Lasts 6 minutes. |
| **sen_placed** | Number of sentry-wards placed by a player (int64). [Sentry Ward](https://dota2.gamepedia.com/Sentry_Ward), an invisible watcher that grants True Sight, the ability to see invisible enemy units and wards, to any existing allied vision within a radius. Lasts 6 minutes. |

**Note**: I am not sure about the meaning of some features: `stuns`, `firstblood_claimed` and `teamfight_participation`. Also, in a few cases the number of towers killed by a team is 12, whereas according to the wiki each team has 11 towers.
Please correct me in the comments if I am wrong, and help to clarify the meaning of these features.

## Preprocess features
Clearly `hero_id` is a categorical feature, so let's one-hot encode it. Note that according to the wiki there are 117 heroes, but our dataset contains only 116 heroes, with ids `1, 2, ..., 114, 119, 120`.

In [None]:
# You will get the same result for all teams and players, here I use r1.
np.sort(np.unique(df_full_features['r1_hero_id'].values.flatten()))

In [None]:
for t in ['r', 'd']:
    for i in range(1, 6):
        df_full_features = pd.get_dummies(df_full_features, columns = [f'{t}{i}_hero_id'])
#         df_full_features = pd.concat([df_full_features,
#                                       pd.get_dummies(df_full_features[f'{t}{i}_hero_id'], prefix=f'{t}{i}_hero_id')], axis=1)
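
To make the effect of `pd.get_dummies` concrete, here is a minimal sketch on a made-up three-row table (the frame `toy` and its values are assumptions for illustration). The encoded column is replaced by one indicator column per distinct hero id, while the other columns are left as they are.

```python
import pandas as pd

# Toy table: one categorical hero id column and one numeric column (made-up values).
toy = pd.DataFrame({'r1_hero_id': [1, 3, 1], 'r1_gold': [100, 250, 300]})

# One-hot encode only the hero id column.
encoded = pd.get_dummies(toy, columns=['r1_hero_id'])
print(list(encoded.columns))

# Each row has exactly one active indicator among the new hero columns.
hero_columns = [c for c in encoded.columns if c.startswith('r1_hero_id_')]
print(encoded[hero_columns].astype(int).sum(axis=1).tolist())
```

Because the encoding is applied to the combined train and test table, both parts end up with the same set of indicator columns.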

Finally let's scale the player-features that have relatively large values, such as `gold`, `lh`, `xp` etc.

In [None]:
player_features = set(f[3:] for f in df_train_features.columns[5:])

In [None]:
features_to_scale = []
for t in ['r', 'd']:
    for i in range(1, 6):
        for f in player_features - {'hero_id', 'firstblood_claimed', 'teamfight_participation'}:
            features_to_scale.append(f'{t}{i}_{f}')

In [None]:
df_full_features_scaled = df_full_features.copy()
df_full_features_scaled[features_to_scale] = MinMaxScaler().fit_transform(df_full_features_scaled[features_to_scale])  # alternatively use StandardScaler

In [None]:
df_full_features_scaled.head()

In [None]:
df_full_features_scaled.max().sort_values(ascending=False).head(12)
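
For reference, `MinMaxScaler` with its default settings maps each column to the range [0, 1] via `(x - min) / (max - min)`. The sketch below checks this on a made-up single-column array (`toy_gold` is an assumption for illustration).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy column of gold values (made up).
toy_gold = np.array([[100.0], [250.0], [400.0]])

# Scaling with sklearn...
scaled = MinMaxScaler().fit_transform(toy_gold)

# ...and the same transformation written out by hand, column-wise.
col_min = toy_gold.min(axis=0)
col_max = toy_gold.max(axis=0)
manual = (toy_gold - col_min) / (col_max - col_min)

print(scaled.ravel(), np.allclose(scaled, manual))
```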

Let's construct X and y arrays.

In [None]:
X_train = df_full_features_scaled[:idx_split]
X_test = df_full_features_scaled[idx_split:]

y_train = df_train_targets['radiant_win'].map({True: 1, False: 0})

In [None]:
X_train.head()

# Multilayer Perceptron

<img src='https://www.vaetas.cz/img/machine-learning/multilayer-perceptron.png' >

Let's build a feedforward neural network: a multilayer perceptron (MLP for short). PyTorch provides a convenient and easy way to do it with the help of the [nn.Sequential](https://pytorch.org/docs/stable/nn.html#torch.nn.Sequential) class, which is a sequential container of different modules (`nn.Module`s), the building blocks of a neural network. Below is a simple MLP with an input layer (6 nodes), two hidden layers (4 nodes each) with ReLU activation, and an output layer with Sigmoid activation (1 node, for a binary classification problem).

In [None]:
mlp = nn.Sequential(nn.Linear(6, 4),
                    nn.ReLU(),
                    nn.Linear(4, 4),
                    nn.ReLU(),
                    nn.Linear(4, 1),
                    nn.Sigmoid()
                   )
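
As a sanity check, the sketch below rebuilds the same small network under a different name (`toy_mlp`, an illustrative name) and passes a random batch through it. The batch of 3 samples with 6 features each is an assumption chosen to match the first `nn.Linear` layer.

```python
import torch
from torch import nn

torch.manual_seed(0)  # make the random batch and initial weights reproducible

toy_mlp = nn.Sequential(nn.Linear(6, 4),
                        nn.ReLU(),
                        nn.Linear(4, 4),
                        nn.ReLU(),
                        nn.Linear(4, 1),
                        nn.Sigmoid())

toy_batch = torch.rand(3, 6)  # 3 samples, 6 input features

with torch.no_grad():  # no gradients needed for a plain forward pass
    probs = toy_mlp(toy_batch)

# Weights and biases: (6*4 + 4) + (4*4 + 4) + (4*1 + 1)
n_params = sum(p.numel() for p in toy_mlp.parameters())
print(probs.shape, n_params)
```

The sigmoid at the end guarantees that every output lies strictly between 0 and 1, so it can be read as a probability.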

Instead of using `nn.Sequential`, let's define our own MLP class, which will allow us to build an MLP just by passing the number of hidden layers and the number of nodes in each.

In [None]:
class MLP(nn.Module):
    ''' Multi-layer perceptron with ReLU hidden activations and a sigmoid output.
    
    Parameters:
    -----------
        n_input (int): number of nodes in the input layer 
        n_hidden (int list): list of number of nodes n_hidden[i] in the i-th hidden layer 
        n_output (int):  number of nodes in the output layer 
        drop_p (float): drop-out probability [0, 1]
        random_state (int): seed for random number generator (use for reproducibility of result)
    '''
    def __init__(self, n_input, n_hidden, n_output, drop_p, random_state=SEED):
        super().__init__()   
        self.random_state = random_state
        set_random_seed(random_state)  # use the seed passed to the constructor
        self.hidden_layers = nn.ModuleList([nn.Linear(n_input, n_hidden[0])])
        self.hidden_layers.extend([nn.Linear(h1, h2) for h1, h2 in zip(n_hidden[:-1], n_hidden[1:])])
        self.output_layer = nn.Linear(n_hidden[-1], n_output)       
        self.dropout = nn.Dropout(p=drop_p)  # method to prevent overfitting
                
    def forward(self, X):
        ''' Forward propagation -- computes output from input X.
        '''
        for h in self.hidden_layers:
            X = F.relu(h(X))
            X = self.dropout(X)
        X = self.output_layer(X)
        return torch.sigmoid(X)
    
    def predict_proba(self, X_test):
        return self.forward(X_test).detach().squeeze(1).numpy()
    
    

def set_random_seed(rand_seed=SEED):
    ''' Helper function for setting random seed. Use for reproducibility of results'''
    if isinstance(rand_seed, int):
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.deterministic = True
        random.seed(rand_seed)
        np.random.seed(rand_seed)
        torch.manual_seed(rand_seed)
        torch.cuda.manual_seed(rand_seed)
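
The `MLP` class above chains its linear layers by pairing consecutive layer sizes with `zip`. The following plain-Python sketch shows which `(in_features, out_features)` pairs result from that pairing. The helper `layer_shapes` is hypothetical and only mirrors the logic of the constructor; the input size 10 is a made-up value.

```python
def layer_shapes(n_input, n_hidden, n_output):
    """Return the (in_features, out_features) pair of every linear layer."""
    sizes = [n_input] + list(n_hidden) + [n_output]
    # Pair each layer size with the next one: (s0, s1), (s1, s2), ...
    return list(zip(sizes[:-1], sizes[1:]))

shapes = layer_shapes(n_input=10, n_hidden=[200, 100], n_output=1)
print(shapes)
```

The first and last pairs correspond to the input-to-hidden and hidden-to-output layers, and everything in between comes from `zip(n_hidden[:-1], n_hidden[1:])`.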
        

Let's create a function for training our MLP and a function for plotting training and validation losses with respect to the epochs.

In [None]:
def train(model, epochs, criterion, optimizer, scheduler, dataloaders, verbose=False):
    ''' 
    Train the given model...
    
    Parameters:
    -----------
        model: model (MLP) to train
        epochs (int): number of epochs
        criterion: loss function e.g. BCELoss
        optimizer: optimizer e.g SGD or Adam 
        scheduler: learning rate scheduler e.g. StepLR
        dataloaders: train and validation dataloaders
        verbose (boolean): print training details (elapsed time and losses)

    '''
    t0_tot = time.time()
    
    set_random_seed(model.random_state)
    
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f'Training on {device}...')
    model.to(device)
    
    # Best model weights (deepcopy them because model.state_dict() changes during the training)
    best_model_wts = copy.deepcopy(model.state_dict()) 
    best_loss = np.inf
    losses = {'train': [], 'valid': []}
    
    for epoch in range(epochs): 
        t0 = time.time()
        print(f'============== Epoch {epoch + 1}/{epochs} ==============')
        # Each epoch has a training and validation phase
        for phase in ['train', 'valid']:
            if phase == 'train':
                # Read the current learning rate straight from the optimizer
                if verbose: print(f"lr: {[g['lr'] for g in optimizer.param_groups]}")
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0 
            for ii, (X_batch, y_batch) in enumerate(dataloaders[phase], start=1):                               
                # Move input and label tensors to the training device (GPU if available)
                X_batch, y_batch = X_batch.to(device), y_batch.to(device)

                # Reset the gradients because they are accumulated
                optimizer.zero_grad()

                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(X_batch).squeeze(1)  # forward prop
                    loss = criterion(outputs, y_batch)  # compute loss
                    if phase == 'train':
                        loss.backward()  # backward prop
                        optimizer.step()  # update the parameters
                        
                running_loss += loss.item() * X_batch.shape[0]

            # Step the scheduler once per epoch, after the optimizer updates
            # (PyTorch >= 1.1 expects optimizer.step() before scheduler.step())
            if phase == 'train':
                scheduler.step()
                
            ep_loss = running_loss/len(dataloaders[phase].dataset)  # average loss over an epoch
            losses[phase].append(ep_loss)
            if verbose: print(f' ({phase}) Loss: {ep_loss:.5f}')
                        
            # Best model by lowest validation loss
            if phase == 'valid' and ep_loss < best_loss:
                best_loss = ep_loss
                best_model_wts = copy.deepcopy(model.state_dict())          
        if verbose: print(f'\nElapsed time: {round(time.time() - t0, 3)} sec\n')
        
    print(f'\nTraining completed in {round(time.time() - t0_tot, 3)} sec')
    
    # Load the best model weights to the trained model
    model.load_state_dict(best_model_wts)
    model.losses = losses   
    model.to('cpu')
    model.eval()
    return model


def plot_losses(train_losses, val_losses):
    y = [train_losses, val_losses]
    c = ['C7', 'C9']
    labels = ['Train loss', 'Validation loss']
    # Plot train_losses and val_losses wrt epochs
    fig, ax = plt.subplots(1, 1, figsize=(10, 5))
    x = list(range(1, len(train_losses)+1))
    for i in range(2):
        ax.plot(x, y[i], lw=3, label=labels[i], color=c[i])
        ax.set_xlabel('Epoch', fontsize=16)
        ax.set_ylabel('Loss', fontsize=16)
        ax.set_xticks(range(0, x[-1]+1, 2))  
        ax.legend(loc='best')
    plt.tight_layout()
    plt.show()
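
One detail of the `train` function above deserves a note: each batch loss is multiplied by the batch size before being accumulated, and the sum is divided by the dataset size. Since `nn.BCELoss` returns the mean loss of a batch by default, this gives the exact per-sample average even when the last batch is smaller. The sketch below illustrates this with made-up per-sample losses.

```python
import numpy as np

# Made-up per-sample losses for a dataset of 5 samples.
per_sample_losses = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
batch_size = 2  # the last batch will contain a single sample

running_loss = 0.0
batch_means = []
for start in range(0, len(per_sample_losses), batch_size):
    batch = per_sample_losses[start:start + batch_size]
    batch_mean = batch.mean()                 # what a mean-reduced loss returns
    batch_means.append(batch_mean)
    running_loss += batch_mean * len(batch)   # undo the mean, as in train()

epoch_loss = running_loss / len(per_sample_losses)  # exact per-sample average
naive_loss = np.mean(batch_means)                   # over-weights the short last batch
print(epoch_loss, naive_loss)
```

Simply averaging the batch means would give the single sample in the last batch the same weight as a full batch.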

## Data loaders

PyTorch provides [tools](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) for loading data to a model in parallel using multiprocessing workers. It also allows batching and shuffling the data. So let's create dataloaders for our training and validation datasets.

In [None]:
# Perform a train/validation split
X_train_part, X_valid, y_train_part, y_valid = train_test_split(X_train, y_train, test_size=0.2, random_state=SEED)

# Convert to pytorch tensors
X_train_tensor = torch.from_numpy(X_train_part.values).float()
X_valid_tensor = torch.from_numpy(X_valid.values).float()
y_train_tensor = torch.from_numpy(y_train_part.values).float()
y_valid_tensor = torch.from_numpy(y_valid.values).float()
X_test_tensor = torch.from_numpy(X_test.values).float()

# Create the train and validation dataloaders
train_dataset = data.TensorDataset(X_train_tensor, y_train_tensor)
valid_dataset = data.TensorDataset(X_valid_tensor, y_valid_tensor)

dataloaders = {'train': data.DataLoader(train_dataset, batch_size=1000, shuffle=True, num_workers=2), 
               'valid': data.DataLoader(valid_dataset, batch_size=1000, shuffle=False, num_workers=2)}
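
To see what a dataloader yields, here is a minimal sketch with a made-up dataset of 10 samples and a batch size of 4 (the names `toy_X`, `toy_y` and `toy_loader` are assumptions for illustration). It uses the default `num_workers=0`, so everything is loaded in the main process.

```python
import torch
from torch.utils import data

# Toy dataset: 10 samples with 2 features each, and dummy labels.
toy_X = torch.arange(20, dtype=torch.float32).reshape(10, 2)
toy_y = torch.zeros(10)

toy_loader = data.DataLoader(data.TensorDataset(toy_X, toy_y),
                             batch_size=4, shuffle=False)

# Iterating over the loader yields (features, labels) batches.
batch_sizes = [X_batch.shape[0] for X_batch, y_batch in toy_loader]
print(batch_sizes, len(toy_loader.dataset))
```

The last batch is smaller whenever the dataset size is not a multiple of the batch size.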

## Training and making predictions
Let's try an MLP with 2 hidden layers.

In [None]:
mlp = MLP(n_input=X_train.shape[1], n_hidden=[200, 100], n_output=1, drop_p=0.4)

criterion = nn.BCELoss()  # Binary cross entropy
optimizer = optim.Adam(mlp.parameters(), lr=0.01, weight_decay=0.005)  # alternatively torch.optim.SGD(mlp.parameters(), lr)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

epochs = 12
train(mlp, epochs, criterion, optimizer, scheduler, dataloaders, verbose=True)
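
To get a feel for the learning rate schedule, the sketch below writes out the step decay rule that `StepLR` follows: the learning rate is multiplied by `gamma` once every `step_size` scheduler steps. The helper `step_lr` is hypothetical and written in plain Python. The exact epoch at which the decay takes effect in training depends on where `scheduler.step()` is called.

```python
def step_lr(base_lr, n_steps, step_size, gamma):
    """Learning rate after n_steps scheduler steps under step decay."""
    return base_lr * gamma ** (n_steps // step_size)

# Same hyperparameters as above: lr=0.01, step_size=5, gamma=0.1, 12 epochs.
schedule = [step_lr(0.01, n, step_size=5, gamma=0.1) for n in range(12)]
for n, lr in enumerate(schedule):
    print(f'after {n:2d} scheduler steps: lr = {lr:.6f}')
```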

Plot the training and validation losses against the epochs to check that both are decreasing. A training loss that keeps decreasing while the validation loss increases is a sign of overfitting; in that case, try tuning the regularization hyperparameters such as the dropout probability and the optimizer's `weight_decay`.

In [None]:
plot_losses(mlp.losses['train'], mlp.losses['valid'])

In [None]:
roc_auc_score(y_valid.values, mlp.predict_proba(X_valid_tensor))
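
For readers unfamiliar with the metric: ROC AUC equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The sketch below computes it this way on made-up labels and scores. The helper `pairwise_auc` is hypothetical and meant only as an illustration of the definition, not a replacement for `roc_auc_score`.

```python
import numpy as np

def pairwise_auc(y_true, y_score):
    """ROC AUC as the fraction of correctly ordered (positive, negative) pairs."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # all positive-vs-negative score differences
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

# Made-up labels and predicted probabilities.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = pairwise_auc(toy_labels, toy_scores)
print(toy_auc)  # 3 of the 4 pairs are ordered correctly -> 0.75
```

A value of 0.5 corresponds to random ordering and 1.0 to a perfect ranking.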

It's a pretty good result obtained after just ~10 seconds of training... PyTorch provides an easy way of [saving and loading](https://pytorch.org/tutorials/beginner/saving_loading_models.html) our trained model (its state dictionary with all the learned weights and biases), so that we don't have to train it from scratch each time we want to use it.



In [None]:
# Save
torch.save(mlp.state_dict(), 'mlp.pth')

# Load
mlp =  MLP(n_input=X_train.shape[1], n_hidden=[200, 100], n_output=1, drop_p=0.4)
mlp.load_state_dict(torch.load('mlp.pth'))
mlp.eval()
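
The same save/load round trip can be checked on a tiny stand-in model. The sketch below uses a single `nn.Linear` layer and a temporary directory (both are assumptions for illustration) and verifies that the restored weights are identical to the saved ones.

```python
import os
import tempfile

import torch
from torch import nn

toy_model = nn.Linear(3, 1)  # tiny stand-in for the trained MLP

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy.pth')
    torch.save(toy_model.state_dict(), path)    # save weights and biases only

    restored = nn.Linear(3, 1)                  # must have the same architecture
    restored.load_state_dict(torch.load(path))  # copy the saved parameters in

restored.eval()

same = all(torch.equal(a, b) for a, b in
           zip(toy_model.state_dict().values(), restored.state_dict().values()))
print(same)
```

Note that the state dictionary stores only parameters, which is why the model has to be constructed with the same layer sizes before loading.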

Finally, let's make predictions on the test dataset and write them to the submission file.

In [None]:
mlp_pred = mlp.predict_proba(X_test_tensor)

write_to_submission_file(mlp_pred)

# Things to do
- Feature engineering. Create new features from the given ones (e.g. `radiant_total_gold - dire_total_gold` etc.), extract features from the provided json files, perhaps remove some features.
- Hyperparameter tuning. Try to find optimal hyperparameters such as the number of hidden layers and nodes, learning rate, number of epochs, optimizer and scheduler parameters etc.
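
As a starting point for the feature engineering item, here is a minimal sketch of the gold difference feature on a made-up table with two matches. The frame `toy` and the new column names are assumptions for illustration; only the `r{i}_gold` / `d{i}_gold` naming pattern comes from the dataset.

```python
import pandas as pd

# Made-up gold values for two matches, using the dataset's column naming pattern.
toy = pd.DataFrame({
    **{f'r{i}_gold': [100 * i, 10 * i] for i in range(1, 6)},
    **{f'd{i}_gold': [50 * i, 20 * i] for i in range(1, 6)},
})

radiant_gold_cols = [f'r{i}_gold' for i in range(1, 6)]
dire_gold_cols = [f'd{i}_gold' for i in range(1, 6)]

# Team totals and their difference (positive means Radiant is ahead).
toy['radiant_total_gold'] = toy[radiant_gold_cols].sum(axis=1)
toy['dire_total_gold'] = toy[dire_gold_cols].sum(axis=1)
toy['gold_diff'] = toy['radiant_total_gold'] - toy['dire_total_gold']
print(toy[['radiant_total_gold', 'dire_total_gold', 'gold_diff']])
```

The same pattern could be applied to other additive player-features such as `xp` or `kills`.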