<a href="https://colab.research.google.com/github/jp0502/dota2matchpredictor/blob/main/Dota_2_Match_Predictor_using_Pytorch_and_OpenDota_API.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Dota 2 Match Predictor using Pytorch and OpenDota API




To train our neural network model, we need training data. OpenDota is a 3rd party that hosts the API for public Dota 2 matches. We perform ETL by using an api key from OpenDota to extract the match data.

In [12]:
import requests
import pandas as pd
import time

In [3]:
api_key = "xxxxxx"

We want our model to use the types of heroes from a pool of 128 heroes to predict whether a team won or lost. So we create a pandas dataframe.

In [14]:
df = pd.DataFrame(columns=["Match_ID",
                           "Player 0 Hero ID","Player 1 Hero ID","Player 2 Hero ID","Player 3 Hero ID","Player 4 Hero ID",
                           "Player 5 Hero ID","Player 6 Hero ID","Player 7 Hero ID","Player 8 Hero ID","Player 9 Hero ID",
                           "Radiant_Win"])
df

Unnamed: 0,Match_ID,Player 0 Hero ID,Player 1 Hero ID,Player 2 Hero ID,Player 3 Hero ID,Player 4 Hero ID,Player 5 Hero ID,Player 6 Hero ID,Player 7 Hero ID,Player 8 Hero ID,Player 9 Hero ID,Radiant_Win


We want 10,000 matches to use as our training set, so we pick some random existing match ID like "7189298195", and start decrementing the match ID by 1 to get the match data.

But the problem is that not all match ids less than 7189298195 account for an existing match, such that the Dota 2 server decided not to assign any matches to that match id for some reason.

So, we have to declare a variable "count" as 0, and while this count is less than or equal to 10,000, we instruct a while loop to try each match id, and if it exists, fill in the dataframe "df" with the hero_id on each player.

If the match id doesn't exist, it will return a KeyError, then it prints that the match doesnt exist, and that the loop will continue by decrementing the match id by 1.

Sometimes, the matches are played with less than 10 people. This is a situation where a player on either team may have disconnected or abandoned. This returns an index error, and the while loop ignores this error to continue.

The OpenDota API, when called, returns a set of dictionaries that are assigned a match id to each call in a JSON format. What we need to do here is to extract the information from each API call and save it within our dataframe.


In [None]:
match_id = "7189298195"
count = 0;
while count <= 10000:
    try:
      match_id = str(int(match_id) -1);
      url = 'https://api.opendota.com/api/matches/' + match_id + '?api_key=xxxxxxx'
      response = requests.get(url).json()

      #Match ID
      match_id = response["match_id"]

      #Radiant Players
      player0_hero = response["players"][0]["hero_id"]
      player1_hero = response["players"][1]["hero_id"]
      player2_hero = response["players"][2]["hero_id"]
      player3_hero = response["players"][3]["hero_id"]
      player4_hero = response["players"][4]["hero_id"]

      #Dire players
      player5_hero = response["players"][5]["hero_id"]
      player6_hero = response["players"][6]["hero_id"]
      player7_hero = response["players"][7]["hero_id"]
      player8_hero = response["players"][8]["hero_id"]
      player9_hero = response["players"][9]["hero_id"]

          #Radiant win or Dire win?
      Radiant_win = response["radiant_win"]

          #Append everything to the dataframe
      df = df.append({'Match_ID': match_id,
                          'Player 0 Hero ID':player0_hero,
                          'Player 1 Hero ID':player1_hero,
                          'Player 2 Hero ID':player2_hero,
                          'Player 3 Hero ID':player3_hero,
                          'Player 4 Hero ID':player4_hero,
                          'Player 5 Hero ID':player5_hero,
                          'Player 6 Hero ID':player6_hero,
                          'Player 7 Hero ID':player7_hero,
                          'Player 8 Hero ID':player8_hero,
                          'Player 9 Hero ID':player9_hero,
                          'Radiant_Win':Radiant_win
                          },ignore_index=True)
      count = count + 1;
      print("Running... progress: " + str((count/100000)*100) + "% done:" )
    except KeyError:
      print("Match " + str(match_id) + " does not exist; skipping to next")
      continue;
    except IndexError:
      continue;
    if count > 10000:
        print("Done!")
        break;



After a long period of time calling the OpenDota API, we finally get the result of a dataframe filled in with each player's hero ID on each match ID, and whether that game was won by the Radiant or the Dire team (Radiant_Win = True meanas Radiant won, false means Dire won).

Now let's implement the neural network with tensorflow.

In [None]:
import tensorflow as tf
#iloc: Purely integer-location based indexing for selection by position.


x_train = df.iloc[:8000,:11]
x_test = df.iloc[8000:10000,:11]
y_train = df.iloc[:8000,11:12]
y_test = df.iloc[8000:10001,11:12]


x_train_np = np.array(x_train)
y_train_np = np.array(y_train)

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(10, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(10, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(2, activation=tf.nn.softmax))

model.compile(optimizer = 'adam', loss = 'binary_crossentropy',metrics = ['accuracy'])

model.fit(x_train_np,y_train_np, epochs = 3)

In [27]:
import tensorflow as tf

In [17]:
import numpy as np
x_train_np = np.array(x_train)
y_train_np = np.array(y_train)


Here we re-upload the data file to Colab.

In [21]:
from google.colab import files
uploaded = files.upload()

Saving dota_name.csv to dota_name.csv


In [22]:
import io
df2 = pd.read_csv(io.BytesIO(uploaded['dota_name.csv']))
# Dataset is now stored in a Pandas Dataframe

In [28]:
df2

Unnamed: 0.1,Unnamed: 0,Match_ID,Player 0 Hero ID,Player 1 Hero ID,Player 2 Hero ID,Player 3 Hero ID,Player 4 Hero ID,Player 5 Hero ID,Player 6 Hero ID,Player 7 Hero ID,Player 8 Hero ID,Player 9 Hero ID,Radiant_Win
0,0,7189298193,74,138,97,71,121,75,93,14,104,27,True
1,1,7189298192,109,34,4,104,72,93,123,42,22,45,False
2,2,7189298191,5,85,97,28,44,29,56,30,35,13,True
3,3,7189298190,46,9,16,75,93,105,113,98,45,70,True
4,4,7189298189,21,54,14,26,36,19,106,60,97,85,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
9996,9996,7189282285,75,80,63,42,114,113,137,90,59,67,False
9997,9997,7189282273,32,74,73,21,2,119,9,8,101,45,True
9998,9998,7189282272,101,73,57,37,99,86,106,102,105,8,False
9999,9999,7189282271,27,35,120,137,6,94,26,68,102,44,True


Let's drop the first column "Unnamed".

In [30]:
df2.drop(columns=df2.columns[0], axis=1, inplace=True)


In [31]:
df2

Unnamed: 0,Match_ID,Player 0 Hero ID,Player 1 Hero ID,Player 2 Hero ID,Player 3 Hero ID,Player 4 Hero ID,Player 5 Hero ID,Player 6 Hero ID,Player 7 Hero ID,Player 8 Hero ID,Player 9 Hero ID,Radiant_Win
0,7189298193,74,138,97,71,121,75,93,14,104,27,True
1,7189298192,109,34,4,104,72,93,123,42,22,45,False
2,7189298191,5,85,97,28,44,29,56,30,35,13,True
3,7189298190,46,9,16,75,93,105,113,98,45,70,True
4,7189298189,21,54,14,26,36,19,106,60,97,85,False
...,...,...,...,...,...,...,...,...,...,...,...,...
9996,7189282285,75,80,63,42,114,113,137,90,59,67,False
9997,7189282273,32,74,73,21,2,119,9,8,101,45,True
9998,7189282272,101,73,57,37,99,86,106,102,105,8,False
9999,7189282271,27,35,120,137,6,94,26,68,102,44,True


Now, we split the data set into the first 8000 and the last 2000 data sets.

We implement two hidden layers with ReLU (Rectified Linear Unit) on each layer, with a softmax function at the 2nd last layer.

In our model we use the Softmax activation function, because we want to classify each match as either a win or loss for Radiant team, and output their likelihood of winning as a percentage.

Softmax function works like:

$$ Softmax(Output Values) = \frac{e^{y_i}}{\sum_{i = 1. i \neq j}^{n}}  $$


In [32]:
import tensorflow as tf
#iloc: Purely integer-location based indexing for selection by position.


x_train = df2.iloc[:8000,:11]
x_test = df2.iloc[8000:10000,:11]
y_train = df2.iloc[:8000,11:12]
y_test = df2.iloc[8000:10001,11:12]


x_train_np = np.array(x_train)
y_train_np = np.array(y_train)

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(10, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(10, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(2, activation=tf.nn.softmax))

model.compile(optimizer = 'adam', loss = 'binary_crossentropy',metrics = ['accuracy'])

model.fit(x_train_np,y_train_np, epochs = 3)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7f3db8161090>

So our accuracry is only 48%, which is not a great result when you're in a game of Dota 2, and you want to pick the best heroes possible to maximize your likelihood of winning.

This result tells us that the types of heroes that we pick aren't actually that important, but perhaps the individual's ability to work together as a team is more important to a game's outcome.
