---
# Transformer 3: heroes as input features

**Experiment goal:** Treat each input column as a hero, filled with:
- **1**: if the hero is on the Radiant team;
- **-1**: if the hero is on the Dire team;
- **0**: otherwise.
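
As a minimal sketch of this encoding for a single match, assuming each team is given as a comma-separated string of hero ids (the helper `encode_match` and the tiny hero pool are hypothetical, for illustration only):

```python
import numpy as np

def encode_match(radiant_team, dire_team, hero_ids):
    """Return one value per hero id: 1 (Radiant), -1 (Dire) or 0 (not picked)."""
    # Map each hero id to its column position; ids need not be contiguous
    position = {hero_id: k for k, hero_id in enumerate(hero_ids)}
    vector = np.zeros(len(hero_ids), dtype=int)
    for hero in radiant_team.split(","):
        vector[position[int(hero)]] = 1
    for hero in dire_team.split(","):
        vector[position[int(hero)]] = -1
    return vector

hero_ids = [1, 2, 3, 4, 5, 6]  # hypothetical, tiny hero pool
example = encode_match("2,5", "1,6", hero_ids)
print(example.tolist())  # [-1, 1, 0, 0, 1, -1]
```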

*Order of procedures:*

1. Load the ``hero_stats`` file to count the number of heroes;
2. Load the input files and combine them into a single dataframe;
3. Remove null and duplicate rows;
4. Create the feature dataframe;
5. Add extra columns with information from the data.

**Import Libraries**

In [1]:
data_path = '../data/raw_data'

import pandas as pd
import numpy as np
from datetime import datetime
import os

**Importing Hero stats file**

In [2]:
hero_stats_raw = pd.read_json(data_path+'/hero_stats.json')

**Searching for each .csv file in the 'raw_data' folder**

In [3]:
data_path = '../data/raw_data'
lst_df = []
for root, dirs, files in os.walk(data_path):
    for filename in files:
        # Keep only files with a .csv extension
        file_name, file_extension = os.path.splitext(filename)
        if file_extension == '.csv':
            print('.csv file found:\n')
            print(filename)
            # os.path.join builds the path portably instead of hard-coding '\\'
            file_path = os.path.join(root, filename)
            lst_df.append(pd.read_csv(file_path))

.csv file found:

21-04-24 16h14m31s.csv



**Drop NA**

In [4]:
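# ignore_index=True gives the combined dataframe a clean 0..n-1 index,
# which the row-wise steps further down rely on
concat_df = pd.concat(lst_df, ignore_index=True)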

print('Dataframe shape:', concat_df.shape)
print('Total nan: \n\n', concat_df.isna().sum())

concat_df.dropna(inplace=True)
print('\nDataframe shape:', concat_df.shape)

Dataframe shape: (46000, 9)
Total nan: 

 Unnamed: 0      0
match_id        0
radiant_win     0
avg_mmr         0
duration        0
lobby_type      0
game_mode       0
radiant_team    0
dire_team       0
dtype: int64

Dataframe shape: (46000, 9)


**Remove duplicated rows**

In [5]:
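# drop_duplicates returns a new dataframe, so the result must be assigned back;
# reset_index keeps the row labels contiguous for the steps that follow
concat_df = concat_df.drop_duplicates(subset=['match_id']).reset_index(drop=True)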
print('\nDataframe shape:', concat_df.shape)


Dataframe shape: (46000, 9)
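
One point worth keeping in mind at this step: `drop_duplicates` returns a new dataframe by default and leaves the original untouched, so a shape printed from the original cannot show whether any duplicates existed. A small sketch on made-up data (the `toy` frame is hypothetical):

```python
import pandas as pd

# Hypothetical toy data with one repeated match_id
toy = pd.DataFrame({'match_id': [10, 10, 11],
                    'radiant_win': [True, True, False]})

# The call alone does not modify `toy`; the result has to be kept
deduped = toy.drop_duplicates(subset=['match_id']).reset_index(drop=True)

print('Original shape:', toy.shape)        # (3, 2)
print('Deduplicated shape:', deduped.shape)  # (2, 2)
```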


## Creating feature dataframe

1. Create dataframe;
2. Populate with 1 (radiant) or -1 (dire) for each row;
3. Create columns from the original data into the final dataframe.

In [6]:
# Creating dataframe
number_heroes = hero_stats_raw['id'].nunique()
col_name = list(hero_stats_raw['id'].unique())
col_name.append('radiant_win')

rows = concat_df.shape[0]
cols = number_heroes+1
zero_matrix = np.zeros([rows, cols],dtype=int)

feature_df = pd.DataFrame(zero_matrix,
                          columns=col_name)
feature_df.head()

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,...,114,119,120,121,123,126,128,129,135,radiant_win
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [7]:
# Fill one row per match: 1 for each Radiant hero, -1 for each Dire hero
for i in range(feature_df.shape[0]):
    row = concat_df.iloc[i]
    # The team columns hold comma-separated hero ids
    for radiant_hero in row.radiant_team.split(","):
        feature_df.at[i, int(radiant_hero)] = 1
    for dire_hero in row.dire_team.split(","):
        feature_df.at[i, int(dire_hero)] = -1

In [8]:
feature_df.head()

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,...,114,119,120,121,123,126,128,129,135,radiant_win
0,0,1,0,0,0,0,0,0,0,0,...,0,1,0,1,0,0,0,-1,0,0
1,0,0,0,0,0,0,0,0,0,1,...,0,0,0,0,-1,0,1,0,0,0
2,0,0,0,0,1,0,0,0,0,0,...,0,0,0,-1,0,-1,0,1,-1,0
3,0,0,0,0,0,-1,0,0,1,0,...,0,0,0,0,0,0,0,0,-1,0
4,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,-1,0
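
The row-by-row loop can become slow with tens of thousands of matches. One possible alternative, sketched here under the assumption that both team columns are comma-separated strings of hero ids, is to split and explode the columns and fill a NumPy matrix in one indexing step per team. The function `build_hero_matrix` and the toy data are hypothetical and not part of the original notebook:

```python
import numpy as np
import pandas as pd

def build_hero_matrix(matches, hero_ids):
    """Vectorized 1 / -1 / 0 encoding of radiant_team and dire_team."""
    matches = matches.reset_index(drop=True)  # row labels become row positions
    col_of = pd.Series(np.arange(len(hero_ids)), index=hero_ids)
    matrix = np.zeros((len(matches), len(hero_ids)), dtype=int)
    for team_column, value in (("radiant_team", 1), ("dire_team", -1)):
        # One entry per (match, hero) pair; the index repeats the match row
        heroes = matches[team_column].str.split(",").explode().astype(int)
        rows = heroes.index.to_numpy()
        cols = col_of.loc[heroes.to_numpy()].to_numpy()
        matrix[rows, cols] = value
    return pd.DataFrame(matrix, columns=hero_ids)

# Hypothetical toy input: two matches, two heroes per team, non-contiguous ids
toy_matches = pd.DataFrame({"radiant_team": ["2,5", "1,3"],
                            "dire_team": ["1,6", "5,6"]})
toy_features = build_hero_matrix(toy_matches, hero_ids=[1, 2, 3, 5, 6])
print(toy_features)
```

A hero id that is missing from `hero_ids` raises a `KeyError` in this sketch, whereas the loop above would silently create a new column for it.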


In [9]:
columns_to_copy = ['radiant_win','avg_mmr','duration',
                  'lobby_type', 'game_mode']

# Copy the match-level columns; the placeholder radiant_win column is overwritten
for col in columns_to_copy:
    feature_df[col] = concat_df[col]
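
Assigning a column taken from another dataframe aligns on index labels, not on row position, so this copy is only correct when both dataframes share the same index. A small sketch with made-up frames (all names and values here are hypothetical) shows the difference:

```python
import pandas as pd

features = pd.DataFrame({"hero_1": [1, -1, 0]})     # default index 0, 1, 2
source = pd.DataFrame({"avg_mmr": [3000, 3500, 4000]},
                      index=[0, 2, 5])              # gaps, as after dropping rows

# Matched by label: label 1 has no counterpart in `source` and becomes NaN
features["aligned"] = source["avg_mmr"]

# Matched by position: the values are copied in row order
features["positional"] = source["avg_mmr"].to_numpy()

print(features)
```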


**Converting string to numerical**

In [10]:
feature_df['radiant_win'] = feature_df['radiant_win'].astype(int)
feature_df = feature_df.apply(pd.to_numeric)

In [11]:
feature_df.head()

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,...,123,126,128,129,135,radiant_win,avg_mmr,duration,lobby_type,game_mode
0,0,1,0,0,0,0,0,0,0,0,...,0,0,0,-1,0,1,3439,1649,7,22
1,0,0,0,0,0,0,0,0,0,1,...,-1,0,1,0,0,1,3774,1848,0,22
2,0,0,0,0,1,0,0,0,0,0,...,0,-1,0,1,-1,1,3311,1951,7,22
3,0,0,0,0,0,-1,0,0,1,0,...,0,0,0,0,-1,1,3408,995,0,22
4,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,-1,0,4621,1818,7,22


**Saving the dataframe to the 'working_data' folder**

In [12]:
working_data_path = '../data/working_data/3_TRA_'
start_file = datetime.now().strftime("%Y-%m-%d")
output_file = working_data_path + start_file + '_working_data.csv'

feature_df.to_csv(output_file, index=False)
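
Before relying on the saved file, a quick sanity check of the encoding may help. The sketch below assumes five heroes per team, which is standard for Dota 2 but is not stated in this notebook. The helper `check_team_sizes` and the toy frame are hypothetical:

```python
import pandas as pd

def check_team_sizes(features, hero_columns, team_size=5):
    """Flag rows that have exactly `team_size` entries of 1 and of -1."""
    hero_values = features[hero_columns].to_numpy()
    radiant_ok = (hero_values == 1).sum(axis=1) == team_size
    dire_ok = (hero_values == -1).sum(axis=1) == team_size
    return pd.Series(radiant_ok & dire_ok, index=features.index)

# Hypothetical toy frame with two heroes per team; the second row is malformed
toy = pd.DataFrame([[1, 1, -1, -1, 0],
                    [1, 0, -1, -1, 0]],
                   columns=[1, 2, 3, 5, 6])
valid = check_team_sizes(toy, hero_columns=[1, 2, 3, 5, 6], team_size=2)
print(valid.tolist())  # [True, False]
```

With the real data, `hero_columns` would be the hero id columns of `feature_df`, excluding `radiant_win` and the other match-level columns.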