### Classifying data by race matchup

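- This code splits the data by race matchup.
- Analyzing each matchup separately, rather than processing all matchups at once, should reduce the size of each dataset and will likely have other advantages as well.
- ※ Note on the game counts: TvsZ, not TvsP, is the most common matchup.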

| <h4> 2020.03.06 13:31 </h4> | <h5> - Classifying data by race matchup  [[ DACON link ]](https://dacon.io/competitions/official/235583/codeshare/712/) </h5>|
|:-------|:---------:
|view    | 230
|language| Python
|by Eunil| Comments: 0


In [1]:
import numpy as np
import pandas as pd

In [2]:
import warnings                             
warnings.filterwarnings("ignore")

In [3]:
"""
# Set the base directory structure here (excluding the _assets folder).
# /content/drive/My Drive/Colab Notebooks/
# dir_base = '/content/drive/My Drive/Colab Notebooks/competition/'
"""
dir_base = '/home/yk/0325_Starcraft/competition/'

raw      = 'c03_starcraft_prediction/data_raw/'
remake   = 'c03_starcraft_prediction/data_remake/'
submit   = 'c03_starcraft_prediction/data_submit/'

assets = '/home/yk/0325_Starcraft/competition/_assets'

### 2.0 Data Read / Basic exploration

In [4]:
train = pd.read_csv(dir_base + raw + 'train.csv')

In [5]:
t_games = train[train.species=='T'].game_id.unique() # game_ids that include Terran
p_games = train[train.species=='P'].game_id.unique() # game_ids that include Protoss
z_games = train[train.species=='Z'].game_id.unique() # game_ids that include Zerg

In [6]:
tp_games = np.intersect1d(t_games, p_games) # game_ids containing both Terran and Protoss: TvsP
tz_games = np.intersect1d(t_games, z_games) # game_ids containing both Terran and Zerg: TvsZ
pz_games = np.intersect1d(p_games, z_games) # game_ids containing both Protoss and Zerg: PvsZ

In [7]:
tt_games = np.setdiff1d(t_games, p_games) # Terran games minus those with Protoss
tt_games = np.setdiff1d(tt_games, z_games) # also drop those with Zerg; only Terran remains: TvsT

pp_games = np.setdiff1d(p_games, t_games) # Protoss games minus those with Terran
pp_games = np.setdiff1d(pp_games, z_games) # also drop those with Zerg; only Protoss remains: PvsP

zz_games = np.setdiff1d(z_games, t_games) # Zerg games minus those with Terran
zz_games = np.setdiff1d(zz_games, p_games) # also drop those with Protoss; only Zerg remains: ZvsZ
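
The set operations above can also be written as a single pass over the table. The following is a minimal sketch, not the author's method: `matchup_by_game` is a hypothetical helper, the toy frame stands in for `train`, and it assumes `species` only takes the values `'T'`, `'P'`, `'Z'`. Like the set-difference approach, it labels a game as a mirror match whenever only one race appears in its rows.

```python
import pandas as pd

def matchup_by_game(df: pd.DataFrame) -> pd.Series:
    """Return a Series mapping game_id -> matchup label such as 'TT' or 'PZ'."""
    order = {"T": 0, "P": 1, "Z": 2}  # fixed ordering so labels read TP, TZ, PZ

    def label(species: pd.Series) -> str:
        races = sorted(set(species), key=order.get)
        # A mirror match has one distinct race; repeat it to get e.g. 'TT'.
        return races[0] * 2 if len(races) == 1 else "".join(races)

    return df.groupby("game_id")["species"].agg(label)

# Toy data (illustrative only, not the competition data)
toy = pd.DataFrame({
    "game_id": [0, 0, 1, 1, 2, 2],
    "player":  [0, 1, 0, 1, 0, 1],
    "species": ["T", "T", "Z", "P", "Z", "T"],
})
labels = matchup_by_game(toy)
print(labels.to_dict())
```

With a label per game, `labels.value_counts()` would give the per-matchup game counts directly.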

In [8]:
# Number of games per matchup: PvsP has the fewest, TvsZ the most
len(tt_games),len(pp_games),len(zz_games),len(tp_games),len(tz_games),len(pz_games)

(5667, 2952, 4025, 8691, 10308, 7229)

In [9]:
# Check that the game counts add up
len(train.game_id.unique()),len(tt_games)+len(pp_games)+len(zz_games)+len(tp_games)+len(tz_games)+len(pz_games)

(38872, 38872)
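
Matching totals are strong evidence that the six groups cover every game, but in principle an overlap could be cancelled out by an omission. A stricter check is sketched below. `is_partition` is a hypothetical helper, and it assumes each group holds unique ids, which is the case for the outputs of `np.intersect1d` and `np.setdiff1d`.

```python
import numpy as np

def is_partition(all_ids, groups):
    """True if the groups are pairwise disjoint and together cover all_ids."""
    combined = np.concatenate(list(groups))
    # If any id appears in two groups, the combined array contains a duplicate.
    no_overlap = len(np.unique(combined)) == len(combined)
    full_cover = np.array_equal(np.unique(combined), np.unique(all_ids))
    return no_overlap and full_cover

# Toy ids (illustrative only)
all_ids = np.arange(6)
groups = [np.array([0, 1]), np.array([2]), np.array([3, 4, 5])]
ok = is_partition(all_ids, groups)
overlapping = is_partition(all_ids, groups + [np.array([0])])
print(ok, overlapping)
```

On the real data the call would look like `is_partition(train.game_id.unique(), [tt_games, pp_games, zz_games, tp_games, tz_games, pz_games])`.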

In [10]:
# Check the species in TvsT games
train[train.game_id.isin(tt_games)].species.unique()

# Check the species in PvsP games
train[train.game_id.isin(pp_games)].species.unique()

# Check the species in ZvsZ games
train[train.game_id.isin(zz_games)].species.unique()

# Check the species in TvsP games
train[train.game_id.isin(tp_games)].species.unique()

# Check the species in ZvsT games
train[train.game_id.isin(tz_games)].species.unique()

# Check the species in ZvsP games
train[train.game_id.isin(pz_games)].species.unique()

array(['Z', 'P'], dtype=object)
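
A notebook cell displays only the value of its last expression, which is why only the ZvsP result appears above. One way to see all six checks at once is to collect them in a dictionary. This is a sketch under assumptions: `species_per_group` is a hypothetical helper and the toy frame stands in for `train`.

```python
import pandas as pd

def species_per_group(df, groups):
    """Map each group name to the sorted list of species found in its games."""
    return {
        name: sorted(df.loc[df["game_id"].isin(ids), "species"].unique())
        for name, ids in groups.items()
    }

# Toy data (illustrative only)
toy = pd.DataFrame({
    "game_id": [0, 0, 1, 1],
    "species": ["T", "T", "Z", "P"],
})
result = species_per_group(toy, {"tt": [0], "pz": [1]})
for name, races in result.items():
    print(name, races)
```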

### 3.0 Saving the results (Remake)

In [11]:
%%time
train[train.game_id.isin(tt_games)].to_csv( dir_base + remake + 'train_tt.csv') # save TvsT
train[train.game_id.isin(pp_games)].to_csv( dir_base + remake + 'train_pp.csv') # save PvsP
train[train.game_id.isin(zz_games)].to_csv( dir_base + remake + 'train_zz.csv') # save ZvsZ
train[train.game_id.isin(tp_games)].to_csv( dir_base + remake + 'train_tp.csv') # save TvsP
train[train.game_id.isin(tz_games)].to_csv( dir_base + remake + 'train_tz.csv') # save TvsZ
train[train.game_id.isin(pz_games)].to_csv( dir_base + remake + 'train_pz.csv') # save PvsZ
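
The six near-identical calls could be folded into a loop over a name-to-ids mapping. The sketch below is one possible arrangement, not the author's code: `save_groups` is a hypothetical helper, it writes to a temporary directory rather than `dir_base + remake`, and it passes `index=False`, which differs from the cell above (that cell also writes the row index as an extra column).

```python
import os
import tempfile
import pandas as pd

def save_groups(df, groups, out_dir):
    """Write one CSV per matchup and return the paths that were written."""
    paths = []
    for name, ids in groups.items():
        path = os.path.join(out_dir, f"train_{name}.csv")
        # index=False drops the row index; the original cell keeps it.
        df[df["game_id"].isin(ids)].to_csv(path, index=False)
        paths.append(path)
    return paths

# Toy data (illustrative only)
toy = pd.DataFrame({"game_id": [0, 0, 1], "species": ["T", "T", "Z"]})
with tempfile.TemporaryDirectory() as tmp:
    written = save_groups(toy, {"tt": [0], "zz": [1]}, tmp)
    reloaded = pd.read_csv(written[0])
    print([os.path.basename(p) for p in written], len(reloaded))
```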

In [16]:
df_train_tt = train[train.game_id.isin(tt_games)]
df_train_pp = train[train.game_id.isin(pp_games)]
df_train_zz = train[train.game_id.isin(zz_games)]
df_train_tp = train[train.game_id.isin(tp_games)]
df_train_tz = train[train.game_id.isin(tz_games)]
df_train_pz = train[train.game_id.isin(pz_games)]

del train  # release the full table once the per-matchup frames exist

In [30]:
t_games.shape                 # number of game_ids that include Terran = (24666,)
tp_games.shape                # number of game_ids that include both T and P = (8691,)
tp_games.shape[0] / t_games.shape[0]     # about 35%

0.3523473607394794

(5667,)

In [39]:
p0_dict = dict(df_train_tt['event'][df_train_tt['player'] == 0].value_counts())
p1_dict = dict(df_train_tt['event'][df_train_tt['player'] == 1].value_counts())

In [47]:
for i, key in enumerate(p0_dict.keys(),1):
    _rate = p0_dict[key] / p1_dict[key]
    print(f"{i:02}.'{key:20}' = {_rate*100:0.5}%")

01.'Camera              ' = 99.286%
02.'GetControlGroup     ' = 98.187%
03.'Right Click         ' = 99.945%
04.'Selection           ' = 99.682%
05.'Ability             ' = 99.29%
06.'SetControlGroup     ' = 99.419%
07.'AddToControlGroup   ' = 101.93%
08.'ControlGroup        ' = 104.46%
