<a href="https://colab.research.google.com/github/mrpintime/Dota2_TI-2016/blob/main/Dota2Game.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Dota2 TI 2016 Team Performance
Created by Mrpintime


# Dataset Description

Dota 2 is a popular computer game played by two teams of 5 players. At the start of the game each player chooses a unique hero with different strengths and weaknesses. The dataset is reasonably sparse, as only 10 of the 113 possible heroes are chosen in a given game. All games were played within a span of 2 hours on 13 August 2016.

The data was collected using: https://gist.github.com/da-steve101/1a7ae319448db431715bd75391a66e1b


Each row of the dataset is a single game with the following features (in the order they appear in the vector):
1. Team that won the game (1 or -1)
2. Cluster ID (related to location)
3. Game mode (e.g. All Pick)
4. Game type (e.g. Ranked)
5. Columns 5 to end: each element is an indicator for a hero. A value of 1 indicates that a player from team '1' played as that hero, and -1 that a player from the other team did. A hero can be selected by only one player in each game, so each row has five '1' and five '-1' values.

The hero-to-ID mapping can be found here: https://github.com/kronusme/dota2-api/blob/master/data/heroes.json

Dataset URL: https://archive.ics.uci.edu/dataset/367/dota2+games+results

# Problem
We want to answer three questions:
1. Can we predict the winner of a game from these features, before the game ends? :)
2. Which heroes have the most impact on games?
3. Which team did better but still could not win the game?

# Data Wrangling

Import the necessary libraries.

In [1]:
import pandas as pd, seaborn as sns, numpy as np, matplotlib.pyplot as plt

**Note**: You can use high-level libraries such as Dataprep for data cleaning and exploratory data analysis, but if you want to learn and really master your data science fundamentals, I suggest sticking to the basic libraries such as pandas, numpy and so on.

Download the dataset from the UCI site.

In [2]:
# ! curl https://archive.ics.uci.edu/static/public/367/dota2+games+results.zip --output dota2.zip

Save the files to Google Drive so they can be reused.

In [3]:
# !cp "/content/dota2.zip" "/content/drive/MyDrive/Dota2/"

In [4]:
# !unzip "/content/drive/MyDrive/Dota2/dota2.zip" -d "/content/drive/MyDrive/Dota2/"
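
The shell cells above can also be expressed in Python, which avoids re-downloading when the archive is already on Drive. This is only a sketch: `download_and_extract` is a hypothetical helper name, and it uses nothing beyond the standard library's `urllib.request` and `zipfile`.

```python
import urllib.request
import zipfile
from pathlib import Path


def download_and_extract(url, dest_dir, archive_name="dota2.zip"):
    """Download a zip archive into dest_dir and extract it there.

    Returns the list of file names contained in the archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive_path = dest / archive_name

    # Skip the download if the archive is already present (e.g. on Drive)
    if not archive_path.exists():
        urllib.request.urlretrieve(url, str(archive_path))

    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)
        return zf.namelist()
```

With the URL and Drive folder used in this notebook, the call would look like `download_and_extract("https://archive.ics.uci.edu/static/public/367/dota2+games+results.zip", "/content/drive/MyDrive/Dota2")`.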

We have now saved the dataset to Google Drive. As you can see, it comes as two separate files, a train set and a test set, and we have to check both of them during data preprocessing.

## Data Cleaning

We need to load the datasets and clean them.

In [39]:
df_trn = pd.read_csv('/content/drive/MyDrive/Dota2/dota2Train.csv')
df_tst = pd.read_csv('/content/drive/MyDrive/Dota2/dota2Test.csv')

In [40]:
df_trn.shape, df_tst.shape

((92649, 117), (10293, 117))

In [41]:
df_trn.isnull().sum().sum(), df_tst.isnull().sum().sum()

(0, 0)

Hooray, there are no null values in the train or test set.

In [42]:
df_trn.head()

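|   | -1 | 223 | 2 | 2.1 | 0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | ... | 0.93 | 0.94 | 0.95 | 0.96 | 0.97 | 0.98 | 0.99 | 0.100 | 0.101 | 0.102 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 152 | 2 | 2 | 0 | 0 | 0 | 1 | 0 | -1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 131 | 2 | 2 | 0 | 0 | 0 | 1 | 0 | -1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1 | 154 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | ... | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | -1 | 171 | 2 | 3 | 0 | 0 | 0 | 0 | 0 | -1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 1 | 122 | 2 | 3 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -1 |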


> Column '-1' holds the game result, i.e. which team won the game.
---
> Column '223' is the server location (cluster) where the two teams played. [Link](https://github.com/kronusme/dota2-api/blob/master/data/regions.json)
---
> Column '2' is the game mode. [Link](https://github.com/kronusme/dota2-api/blob/master/data/mods.json)
---
> Column '2.1' is the game type. [Link](https://github.com/kronusme/dota2-api/blob/master/data/lobbies.json)
---
> All other columns each indicate one hero in the game. [Link](https://github.com/kronusme/dota2-api/blob/master/data/heroes.json)
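
Column labels such as '-1', '223', '2' and '2.1' look like the values of a game rather than names, which suggests the CSV files have no header line and `pd.read_csv` used the first game as the header. A minimal sketch of the difference, using made-up rows truncated to six columns; `header=None` is the standard pandas option for headerless files:

```python
import io

import pandas as pd

# Two made-up games in the raw, headerless layout (truncated to 6 columns)
raw = "-1,223,2,2,0,1\n1,152,2,2,-1,0\n"

# Default behaviour: the first game is consumed as the column names
with_default = pd.read_csv(io.StringIO(raw))

# header=None keeps every line as data and numbers the columns 0..n-1
with_header_none = pd.read_csv(io.StringIO(raw), header=None)

print(with_default.columns.tolist())  # the duplicate '2' is renamed to '2.1'
print(with_default.shape, with_header_none.shape)
```

If the files are read this way, each frame keeps one more game than the shapes printed above, and the columns can then be renamed as usual.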

In [43]:
df_trn.iloc[0, 4:].value_counts()

 0    103
 1      5
-1      5
Name: 0, dtype: int64

As the description said, each row contains 5 heroes for one team and 5 for the other, marked with 1 and -1, while the values of all other heroes are set to 0.  
Overall, there are 113 heroes in the game.  
You can see all heroes here: [Link](https://github.com/kronusme/dota2-api/blob/master/data/heroes.json)   
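
The check above looks at a single game. A sketch of the same check for every row at once is shown below; `hero_count_violations` is a hypothetical helper, the toy frame is made up, and the only assumption is that the hero columns start at position 4, as in this dataset.

```python
import pandas as pd


def hero_count_violations(df, first_hero_col=4):
    """Return pick counts for rows that do not have exactly five 1s and five -1s."""
    heroes = df.iloc[:, first_hero_col:]
    counts = pd.DataFrame({
        "picks_team_1": (heroes == 1).sum(axis=1),
        "picks_team_minus_1": (heroes == -1).sum(axis=1),
    })
    is_bad = (counts["picks_team_1"] != 5) | (counts["picks_team_minus_1"] != 5)
    return counts[is_bad]


# Toy data (made up): 4 metadata columns followed by 12 hero columns
toy = pd.DataFrame([
    [1, 152, 2, 2, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, 0, 0],   # valid: 5 vs 5
    [-1, 131, 2, 2, 1, 1, 1, 1, 0, -1, -1, -1, -1, -1, 0, 0],  # invalid: 4 vs 5
])
violations = hero_count_violations(toy)
print(violations)
```

On the notebook's data, `hero_count_violations(df_trn)` would return an empty frame if the description holds for every game.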

In [45]:
# ['Won', 'Server', 'Game_mode', 'Game_type']+ hero_list

In [46]:
hero_list = [f'Hero_{i+1}' for i in range(113)]
new_columns_name = ['Won', 'Server', 'Game_mode', 'Game_type'] + hero_list

In [47]:
df_trn.columns = new_columns_name
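
Only `df_trn` is renamed here; the test frame presumably needs the same labels before the two can be used together. One possible way to keep them in sync is a small helper that checks the column count first. `with_named_columns` is a hypothetical name and the frame below is a placeholder filled with zeros.

```python
import numpy as np
import pandas as pd

META_COLUMNS = ['Won', 'Server', 'Game_mode', 'Game_type']
N_HEROES = 113


def with_named_columns(df):
    """Return a copy of df with the 4 metadata names followed by Hero_1..Hero_113."""
    names = META_COLUMNS + [f'Hero_{i + 1}' for i in range(N_HEROES)]
    if df.shape[1] != len(names):
        raise ValueError(f'expected {len(names)} columns, got {df.shape[1]}')
    out = df.copy()
    out.columns = names
    return out


# Placeholder frame with the same width as the dataset (117 columns)
toy = pd.DataFrame(np.zeros((2, 117), dtype=int))
named = with_named_columns(toy)
print(named.columns[:5].tolist(), named.columns[-1])
```

Applied to the notebook's frames this would read `df_trn = with_named_columns(df_trn)` and `df_tst = with_named_columns(df_tst)`.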

In [48]:
df_trn.head()

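|   | Won | Server | Game_mode | Game_type | Hero_1 | Hero_2 | Hero_3 | Hero_4 | Hero_5 | Hero_6 | ... | Hero_104 | Hero_105 | Hero_106 | Hero_107 | Hero_108 | Hero_109 | Hero_110 | Hero_111 | Hero_112 | Hero_113 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 152 | 2 | 2 | 0 | 0 | 0 | 1 | 0 | -1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 131 | 2 | 2 | 0 | 0 | 0 | 1 | 0 | -1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1 | 154 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | ... | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | -1 | 171 | 2 | 3 | 0 | 0 | 0 | 0 | 0 | -1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 1 | 122 | 2 | 3 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -1 |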


**So our dataset is now prepared.**
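
The generic `Hero_1` ... `Hero_113` labels can later be translated into hero names using the heroes.json file linked earlier. The sketch below rests on two assumptions that should be verified against the real file: that it has the shape `{"heroes": [{"id": ..., "localized_name": ...}, ...]}`, and that column `Hero_i` corresponds to hero id `i`. The three sample records are typed in by hand for illustration, and `hero_column_labels` is a hypothetical helper.

```python
# Hand-typed sample in the ASSUMED layout of heroes.json (not loaded from the file)
sample_heroes_json = {
    "heroes": [
        {"id": 1, "name": "antimage", "localized_name": "Anti-Mage"},
        {"id": 2, "name": "axe", "localized_name": "Axe"},
        {"id": 3, "name": "bane", "localized_name": "Bane"},
    ]
}


def hero_column_labels(heroes_json):
    """Map 'Hero_<id>' column names to readable hero names.

    Assumes column Hero_i holds the indicator for the hero with id i.
    """
    return {
        f"Hero_{hero['id']}": hero["localized_name"]
        for hero in heroes_json["heroes"]
    }


labels = hero_column_labels(sample_heroes_json)
print(labels)
```

With the full file loaded through `json.load`, the resulting dictionary could be passed to `df_trn.rename(columns=labels)` to get readable column names.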