# Dota last picker

Dota is a game where two teams of 5 players compete in a MOBA-style arena with 112 different heroes to choose from. Players on each team take turns picking their heroes, and certain combinations of heroes may be stronger than others. Our program helps a player decide their last pick based on the 9 heroes already picked, hopefully giving their team a small edge toward victory.

# Members
    1. Possawat Sanorkam 6081035
    2. Phairat Lin 6080678

## What were we trying to solve?

We use Naive Bayes to analyze over 90 thousand training matches and predict possible last picks for our players. We first train a Naive Bayes predictor model, then use that model to find last picks for which it predicts a win for the player's team. The predictor may find several possible last picks, and it is up to the player to choose among them.

### The actual functions

Get all the possible last picks

### Why did we solve it this way?

First, we use Naive Bayes because we assume that the heroes are independent of each other and that all heroes are equally strong. <br/>
Second, we use Gaussian Naive Bayes because our features take the values 0, 1, and -1.
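
For reference, the standard Gaussian Naive Bayes model that these assumptions lead to can be written as follows. The notation is introduced here for illustration only: $x_i$ is the value of hero column $i$ (0, 1, or -1), $y$ is the match outcome, and $\mu_{iy}$ and $\sigma_{iy}^2$ are the per-class mean and variance of column $i$ estimated from the training data.

$$
P(y \mid x_1, \dots, x_{113}) \propto P(y) \prod_{i=1}^{113} P(x_i \mid y),
\qquad
P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_{iy}^2}} \exp\left(-\frac{(x_i - \mu_{iy})^2}{2\sigma_{iy}^2}\right)
$$

The product over the hero columns is where the independence assumption enters: each hero contributes its own factor, regardless of which other heroes were picked.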

### Where did we get this data?

We got this data from a GitHub repository created by Andrew DalPino, the creator of RubixML. <br/>
The link to the dataset is included below. <br/>
Credit: https://github.com/RubixML/Dota2#original-dataset

In [224]:
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB

In [225]:
df_train = pd.read_csv('data/train.csv')
df_test = pd.read_csv('data/test.csv')
df_hero = pd.read_json('data/heros.json', orient='records')

In [226]:
df_hero = df_hero['heroes']

In [227]:
df_hero

0      {'name': 'antimage', 'id': 1, 'localized_name'...
1      {'name': 'axe', 'id': 2, 'localized_name': 'Axe'}
2      {'name': 'bane', 'id': 3, 'localized_name': 'B...
3      {'name': 'bloodseeker', 'id': 4, 'localized_na...
4      {'name': 'crystal_maiden', 'id': 5, 'localized...
                             ...                        
107    {'name': 'phoenix', 'id': 110, 'localized_name...
108    {'name': 'techies', 'id': 105, 'localized_name...
109    {'name': 'oracle', 'id': 111, 'localized_name'...
110    {'name': 'winter_wyvern', 'id': 112, 'localize...
111    {'name': 'arc_warden', 'id': 113, 'localized_n...
Name: heroes, Length: 112, dtype: object

In [228]:
df_test

Unnamed: 0,cluster_id,game_mode,game_type,hero_1,hero_2,hero_3,hero_4,hero_5,hero_6,hero_7,...,hero_105,hero_106,hero_107,hero_108,hero_109,hero_110,hero_111,hero_112,hero_113,outcome
0,223,8,2,0,-1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,team_2
1,227,8,2,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,team_1
2,136,2,2,1,0,0,0,-1,0,0,...,0,0,0,0,0,0,0,0,0,team_2
3,227,2,2,-1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,team_1
4,184,2,3,0,0,0,-1,0,0,0,...,0,0,0,0,0,0,0,0,0,team_1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10289,121,2,2,0,0,0,0,0,0,1,...,-1,0,0,0,0,0,0,0,0,team_1
10290,154,9,2,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,team_1
10291,122,9,2,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,team_1
10292,152,2,3,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,team_1


In [229]:
df_train

Unnamed: 0,cluster_id,game_mode,game_type,hero_1,hero_2,hero_3,hero_4,hero_5,hero_6,hero_7,...,hero_105,hero_106,hero_107,hero_108,hero_109,hero_110,hero_111,hero_112,hero_113,outcome
0,223,2,2,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,team_2
1,152,2,2,0,0,0,1,0,-1,0,...,0,0,0,0,0,0,0,0,0,team_1
2,131,2,2,0,0,0,1,0,-1,0,...,0,0,0,0,0,0,0,0,0,team_1
3,154,2,2,0,0,0,0,0,0,-1,...,0,0,0,0,0,0,0,0,0,team_1
4,171,2,3,0,0,0,0,0,-1,0,...,0,0,0,0,0,0,0,0,0,team_2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
92645,154,2,3,1,0,0,-1,0,0,0,...,0,0,0,0,0,0,0,0,0,team_2
92646,154,2,2,0,0,0,0,-1,0,0,...,0,0,0,0,0,0,0,0,0,team_1
92647,111,2,3,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,team_1
92648,185,2,2,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,team_2


In [230]:
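def dataFilter(data):
    # Drop the match metadata so that only the hero columns and the outcome remain
    data = data.drop(["cluster_id", "game_mode", "game_type"], axis=1)
    features = data.drop("outcome", axis=1)
    classes = data.outcome
    # Encode the outcome as a number: team_2 -> 1, anything else (team_1) -> -1
    normalize = []
    for i in classes:
        if i == "team_2":
            normalize.append(1)
        else:
            normalize.append(-1)
    return np.array(features), normalize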
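
The label loop in dataFilter could probably also be written as a single vectorized call. The following is a minimal sketch, not the code we ran; the `outcomes` array is a made-up stand-in for the `outcome` column.

```python
import numpy as np

# Made-up stand-in for the outcome column, using the dataset's label strings.
outcomes = np.array(["team_2", "team_1", "team_1", "team_2"])

# team_2 -> 1, anything else -> -1, the same mapping as the loop in dataFilter.
labels = np.where(outcomes == "team_2", 1, -1)
print(labels.tolist())  # [1, -1, -1, 1]
```

The result is a NumPy array rather than a Python list, which GaussianNB accepts just as well.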

In [231]:
# Data preparation

df_train_features, train_classes = dataFilter(df_train)
df_test_features, test_classes = dataFilter(df_test)

In [232]:
np.array(df_train_features)[1]

array([ 0,  0,  0,  1,  0, -1,  0,  0,  0,  0,  0,  0,  0,  1,  0,  0,  0,
        0,  0,  0, -1,  0,  0,  0,  0,  1,  1,  0,  0,  0,  0,  0,  0,  0,
       -1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0, -1,  0,  0,  0,  0, -1,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0])

In [233]:
train_classes[1]

-1

In [234]:
def buildClassifier(train_data):
    features, classes = dataFilter(train_data)
    # create naive bayes classifier
    gaunb = GaussianNB()
    gaunb = gaunb.fit(features, classes)
    return gaunb

In [235]:
predictor = buildClassifier(df_train)

In [236]:
#testing predictor functionality
predictor.predict(np.array(df_train_features))

array([-1, -1, -1, ..., -1,  1, -1])

In [237]:
#analysis of predictor with its own train data

score_train = predictor.score(df_train_features, train_classes)
print(score_train)

0.5729519697787372


In [238]:
#analysis of predictor with its test data

score_test = predictor.score(df_test_features, test_classes)
print(score_test)

0.5640178744899942


#### Accuracy score on this dataset

With our predictor model, we achieve an accuracy score of 57.295% on the training dataset. <br/>
This may seem unimpressive at first, but keep in mind that Dota is a complex game with 112 different heroes and 155 purchasable items. Any slight advantage may result in a win. <br/>
Using our model on the test dataset, we achieve an accuracy score of 56.402%.
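
One way to put an accuracy score in context is to compare it with a baseline that always predicts the more common outcome. The sketch below shows the idea only; `majority_baseline_accuracy` is a hypothetical helper and the `outcomes` list is made up, so the printed number says nothing about our dataset.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)

# Made-up outcomes for illustration only (not taken from the dataset).
outcomes = ["team_1", "team_2", "team_1", "team_1", "team_2"]
baseline = majority_baseline_accuracy(outcomes)
print(baseline)  # 0.6
```

With the real data, one would pass the `outcome` column of the test set and compare the result against the model's test accuracy.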

### The actual functions

Assume that a player uses our service on our website, Lastpick.xyz.
We require the user to enter the hero names in this order:
Example: (Anti-Mage, Axe, Bane, Slark, Crystal Maiden, Ogre Magi, Queen of Pain, Beastmaster, Monkey King)

The first 5 heroes are the enemy team's picks, and the last 4 heroes are the heroes already picked on your team.

findLastPick(picks, clf) returns all the possible picks that the user can consider, given what has already been picked.

getHeroIds(draft) returns the hero ids for a list of hero names (assuming that the names are correct).

heroNameTranslate(draftIndexes) returns the hero names for a list of hero ids.
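
The way a draft becomes a feature vector can be sketched as below. `encode_draft` is a hypothetical helper written for illustration; it mirrors the encoding used in findLastPick (1 for enemy heroes, -1 for heroes on your team, 0 for everything else) and assumes 113 hero columns, as in the dataset.

```python
def encode_draft(enemy_ids, ally_ids, n_slots=113):
    """Build the feature vector: 1 = enemy hero, -1 = allied hero, 0 = not picked."""
    vector = [0] * n_slots
    for hero_id in enemy_ids:
        vector[hero_id - 1] = 1    # hero ids are 1-based, columns are 0-based
    for hero_id in ally_ids:
        vector[hero_id - 1] = -1
    return vector

# Hero ids of the first test case in this notebook.
enemy = [2, 1, 3, 5, 39]
allies = [80, 34, 108, 110]
vec = encode_draft(enemy, allies)
print(len(vec), sum(vec))  # 113 1
```

The sum is 1 because five enemy heroes contribute +1 each and four allied heroes contribute -1 each; the last pick fills the remaining allied slot.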


In [499]:
df_hero_idname = pd.json_normalize(df_hero)

In [500]:
def getHeroIds(draft):
    answer = []
    for i in draft:
        # Find the row whose display name matches, then take its id as a plain int
        answer.append(int(df_hero_idname.loc[df_hero_idname['localized_name'] == i, 'id'].iloc[0]))
    return answer

In [501]:
def heroNameTranslate(draftIndexes):
    answer = []
    for i in draftIndexes:
        answer.append(np.array(df_hero_idname.loc[df_hero_idname['id'] == i]['localized_name'])[0])
    return answer
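
An alternative to filtering the DataFrame on every lookup is to build two dictionaries once. This is a sketch, not the code used above: `get_hero_ids` and `get_hero_names` are hypothetical names, and the `heroes` list is a three-hero excerpt in the same shape as the entries of heros.json.

```python
# Three-hero excerpt in the same shape as the entries of heros.json.
heroes = [
    {"name": "antimage", "id": 1, "localized_name": "Anti-Mage"},
    {"name": "axe", "id": 2, "localized_name": "Axe"},
    {"name": "bane", "id": 3, "localized_name": "Bane"},
]

# Build both lookup directions once.
name_to_id = {h["localized_name"]: h["id"] for h in heroes}
id_to_name = {h["id"]: h["localized_name"] for h in heroes}

def get_hero_ids(draft):
    """Translate display names to ids; raises KeyError for an unknown name."""
    return [name_to_id[name] for name in draft]

def get_hero_names(ids):
    """Translate ids back to display names."""
    return [id_to_name[i] for i in ids]

ids = get_hero_ids(["Axe", "Anti-Mage", "Bane"])
print(ids)                  # [2, 1, 3]
print(get_hero_names(ids))  # ['Axe', 'Anti-Mage', 'Bane']
```

A misspelled hero name then fails with a KeyError naming the bad entry, instead of an error on an empty selection.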

In [533]:
def findLastPick(picks, clf):
    """
    Return every hero id that, as our last pick, makes clf predict class -1.
    picks holds 9 hero ids: [1,2,3,4,5] enemy team, [6,7,8,9,?] <-- your team
    Every id from 1 to 113 is tried, except
    1. heroes that are already picked
    2. id 24, which is skipped
    """
    dummy = np.zeros(113)
    for i, hero in enumerate(picks):
        if i < 5:
            dummy[hero-1] = 1    # enemy pick
        else:
            dummy[hero-1] = -1   # our pick
    print(dummy)
    possiblePicks = []
    for i in range(0, 113):
        trial = dummy.copy()
        if i+1 not in picks and i+1 != 24:
            trial[i] = -1        # try hero i+1 as our last pick
            result = clf.predict([trial])
            if result[0] == -1:
                possiblePicks.append(i+1)
    return possiblePicks
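
Because findLastPick only checks the predicted class, every accepted hero looks equally good. A possible refinement is to rank the candidates by predicted probability instead. The sketch below is hypothetical and was not run against our model: `rank_last_picks` and `toy_probability` are made-up names, and the scoring function is a toy. With the notebook's classifier, the probability could be taken from `predictor.predict_proba([trial])[0]` at the position of -1 in `predictor.classes_`.

```python
def rank_last_picks(picks, win_probability, n_slots=113, skip_ids=(24,)):
    """Return (hero_id, probability) pairs, best candidate first.

    picks: nine hero ids, the first five are the enemy team, the last four are ours.
    win_probability: callable taking a feature vector and returning a score.
    """
    base = [0] * n_slots
    for position, hero_id in enumerate(picks):
        base[hero_id - 1] = 1 if position < 5 else -1

    scored = []
    for hero_id in range(1, n_slots + 1):
        if hero_id in picks or hero_id in skip_ids:
            continue
        trial = list(base)
        trial[hero_id - 1] = -1  # try this hero as our last pick
        scored.append((hero_id, win_probability(trial)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def toy_probability(vector):
    # Toy rule for illustration only: pretend hero id 10 is a strong last pick.
    return 0.9 if vector[9] == -1 else 0.5

picks = [2, 1, 3, 5, 39, 80, 34, 108, 110]
ranking = rank_last_picks(picks, toy_probability)
print(ranking[0])  # (10, 0.9)
```

Showing only the top few entries of such a ranking would give the player a shorter list than the long outputs in the tests below.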

## Tests

In [519]:
test_case1 = ["Axe","Anti-Mage","Bane","Crystal Maiden","Queen of Pain"
              ,"Lone Druid", "Tinker","Abyssal Underlord","Phoenix"]
test_case2 = ["Axe","Juggernaut","Bane","Crystal Maiden","Queen of Pain"
              ,"Spectre", "Jakiro","Abyssal Underlord","Phoenix"]
test_case3 = ["Axe","Juggernaut","Bane","Crystal Maiden","Queen of Pain"
              ,"Spirit Breaker", "Storm Spirit","Earth Spirit","Ember Spirit"]

test_case4 = ["Earthshaker","Pudge","Lifestealer","Enigma","Viper"
              ,"Necrophos", "Tiny","Spectre","Slardar"]

test_case5 = ["Shadow Fiend","Jakiro","Dark Seer","Sven","Pugna"
              ,"Mirana", "Ember Spirit","Keeper of the Light","Arc Warden"]

In [502]:
heroNameTranslate(getHeroIds(test_case1))

['Axe',
 'Anti-Mage',
 'Bane',
 'Crystal Maiden',
 'Queen of Pain',
 'Lone Druid',
 'Tinker',
 'Abyssal Underlord',
 'Phoenix']

In [534]:
ids = getHeroIds(test_case1)
print(ids)
print(heroNameTranslate(ids))
predictions = findLastPick(ids,predictor)
print(heroNameTranslate(predictions))

[2, 1, 3, 5, 39, 80, 34, 108, 110]
['Axe', 'Anti-Mage', 'Bane', 'Crystal Maiden', 'Queen of Pain', 'Lone Druid', 'Tinker', 'Abyssal Underlord', 'Phoenix']
[ 1.  1.  1.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0. -1.  0.  0.
  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0. -1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0. -1.
  0. -1.  0.  0.  0.]
['Leshrac', 'Dark Seer', 'Enchantress', 'Chen', 'Gyrocopter', 'Wisp', 'Earth Spirit', 'Oracle']


In [535]:
ids = getHeroIds(test_case2)
print(ids)
print(heroNameTranslate(ids))
predictions = findLastPick(ids,predictor)
print(heroNameTranslate(predictions))

[2, 8, 3, 5, 39, 67, 64, 108, 110]
['Axe', 'Juggernaut', 'Bane', 'Crystal Maiden', 'Queen of Pain', 'Spectre', 'Jakiro', 'Abyssal Underlord', 'Phoenix']
[ 0.  1.  1.  0.  1.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0. -1.  0.  0. -1.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0. -1.
  0. -1.  0.  0.  0.]
['Leshrac', 'Dark Seer', 'Enchantress', 'Chen', 'Gyrocopter', 'Wisp', 'Earth Spirit', 'Oracle']


In [536]:
ids = getHeroIds(test_case3)
print(ids)
print(heroNameTranslate(ids))
predictions = findLastPick(ids,predictor)
print(heroNameTranslate(predictions))

[2, 8, 3, 5, 39, 71, 17, 107, 106]
['Axe', 'Juggernaut', 'Bane', 'Crystal Maiden', 'Queen of Pain', 'Spirit Breaker', 'Storm Spirit', 'Earth Spirit', 'Ember Spirit']
[ 0.  1.  1.  0.  1.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0. -1.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0. -1.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0. -1. -1.  0.
  0.  0.  0.  0.  0.]
['Anti-Mage', 'Bloodseeker', 'Drow Ranger', 'Earthshaker', 'Mirana', 'Morphling', 'Shadow Fiend', 'Phantom Lancer', 'Puck', 'Pudge', 'Razor', 'Sand King', 'Sven', 'Tiny', 'Vengeful Spirit', 'Windranger', 'Zeus', 'Kunkka', 'Lina', 'Lion', 'Shadow Shaman', 'Slardar', 'Tidehunter', 'Witch Doctor', 'Lich', 'Riki', 'Enigma', 'Tinker', 'Sniper', 'Necrophos', 'Warlock', 'Beastma

In [537]:
ids = getHeroIds(test_case4)
print(ids)
print(heroNameTranslate(ids))
predictions = findLastPick(ids,predictor)
print(heroNameTranslate(predictions))

[7, 14, 54, 33, 47, 36, 19, 67, 28]
['Earthshaker', 'Pudge', 'Lifestealer', 'Enigma', 'Viper', 'Necrophos', 'Tiny', 'Spectre', 'Slardar']
[ 0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.
 -1.  0.  0.  0.  0.  0.  0.  0.  0. -1.  0.  0.  0.  0.  1.  0.  0. -1.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  1.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0. -1.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.]
['Leshrac', 'Dark Seer', 'Enchantress', 'Chen', 'Gyrocopter', 'Wisp', 'Earth Spirit', 'Oracle']


In [538]:
ids = getHeroIds(test_case5)
print(ids)
print(heroNameTranslate(ids))
predictions = findLastPick(ids,predictor)
print(heroNameTranslate(predictions))

[11, 64, 55, 18, 45, 9, 106, 90, 113]
['Shadow Fiend', 'Jakiro', 'Dark Seer', 'Sven', 'Pugna', 'Mirana', 'Ember Spirit', 'Keeper of the Light', 'Arc Warden']
[ 0.  0.  0.  0.  0.  0.  0.  0. -1.  0.  1.  0.  0.  0.  0.  0.  0.  1.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  1.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0. -1.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0. -1.  0.  0.
  0.  0.  0.  0. -1.]
['Anti-Mage', 'Axe', 'Bane', 'Bloodseeker', 'Crystal Maiden', 'Drow Ranger', 'Earthshaker', 'Juggernaut', 'Morphling', 'Phantom Lancer', 'Puck', 'Pudge', 'Razor', 'Sand King', 'Storm Spirit', 'Tiny', 'Vengeful Spirit', 'Windranger', 'Zeus', 'Kunkka', 'Lina', 'Lion', 'Shadow Shaman', 'Slardar', 'Tidehunter', 'Witch Doctor', 'Lich', 'Riki', 'Enigma', 'Tinker', 'Sniper', 'Necrophos'

## Explanation

So, using a Naive Bayes classifier gives you just a suggestion, not a sure win. Since this game has so many possibilities, we think that this amount of data is not enough.

Here are some possible improvements:
1. Filtering for very high-skill games only
    There are fewer high-skill games than normal-skill games. In general, matches include smurfs, trolls, and game ruiners, so it is difficult to gauge the real potential of winning the game from the last pick alone. People take things more seriously in the high-skill bracket, but unfortunately that data is not available for free, so this is the best dataset we could use.
    
2. In-game factors
    Items, luck (critical chance, rune spawns, etc.), and player skill all affect the probability of winning, so the last pick alone does not indicate much about the potential of winning that game.
    
    


# Conclusion

We used a Naive Bayes classifier to find heroes to pick at the last moment in the hope of a higher chance of winning. The results are not that bad, but it is better not to rely on picks alone, because many factors decide game-winning moments.