<a id ='up'></a>
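## Understanding the log data by time bucket
- **Basic library imports**   <a href ='#import' >go</a>
- **Split by time and game** 
    1. Choose the split interval (30 s, 10 s, 1 s, ...)<a href = '#idx1'  >go</a>
    2. Visualize the data by time<a href = '#idx2'>go</a>
        - Applied via the existing preprocessing steps
    3. Turn the existing preprocessing steps into a pipeline for each split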

<a id = 'import' ></a>
#### Library imports

In [2]:
import pandas as pd 
import pickle 
import numpy as np
import re
import seaborn as sns
import matplotlib.pyplot as plt

pd.set_option( 'display.max_rows' , 300)
pd.set_option( 'display.max_columns' , 200)


#load
def load_pickle( PATH ):
    with open(PATH ,'rb') as file:
        res = pickle.load(file)
    return res

<h2><a id = 'idx1'  href = '#up'>Splitting the data by time and game</a></h2>

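Split the data between 0 and 10 minutes into 30-second intervals: e.g. convert 3.30 (3 min 30 s) to seconds, then divide by 30 to assign a time class.
- Maximum total playtime: 10 min 59 s
- Split into 30-second units --> 22 classes
    - Class index: 0, 1, 2, ..., 21
    - **Decimal minutes 0, 0.5, 1, 1.5, 2, ...** (chosen)
    - Actual clock time? 0, 0.30, 1.00, 1.30, 2.00, ...
    
    
    
- Split into 10-second units
    - **Decimal minutes** 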
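
The conversion described above can be sketched as a small helper. This is a minimal sketch, assuming `time` is stored as `mm.ss` (so `3.30` means 3 min 30 s); the name `to_bucket` and the rounding step are assumptions, not part of the original notebook. The rounding is there because the fractional part of a float such as `2.30` can come out a hair under 30 seconds after `% 1 * 100`, which would otherwise push the row into the previous bucket.

```python
import pandas as pd


def to_bucket(time_mmss: pd.Series, interval: int = 30) -> pd.Series:
    """Map mm.ss timestamps to the start of their bucket, in decimal minutes."""
    minutes = time_mmss // 1
    # Round the seconds part before flooring to absorb floating-point error.
    seconds = (time_mmss % 1 * 100).round()
    total_seconds = minutes * 60 + seconds
    return total_seconds // interval * interval / 60


times = pd.Series([0.05, 0.30, 2.30, 3.30, 10.59])
buckets = to_bucket(times)
print(buckets.tolist())
```

With `interval=30`, a 10 min 59 s timestamp lands in bucket 10.5, the last of the 30-second classes.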

#### Split into 30-second units

In [None]:
# load data 
raw_data = pd.read_csv('data/sample/sample_game.csv' )

In [None]:
# Convert mm.ss time to a 30-second bucket expressed in decimal minutes
df = raw_data.copy()
df['sep_time']=(((df['time']%1 * 100 + df['time']//1*60)//30)*30/60) 

#### Other time splits

In [None]:
# 20-second units seem to work best (the column keeps its sep_time_10 name)
df['sep_time_10']=((((df['time']%1 * 100 + df['time']//1*60)//20)*20)/60).round(1)
# Split by minute
df['sep_time_60'] = df['time'].apply(np.floor)


<h2><a href = '#up' id = 'idx2' >Visualizing the data by time</a></h2>

- **Visualization**
    - raw_data visualization: inspect action logs by time or by game <a href = '#sub2-1'>go</a>
    - preprocessed_data visualization

- **Loading the existing preprocessing steps**<a href = '#load' >go</a>
    - base_column , worker , attack , action , action_code ....
    
    


### Visualization



<a id = 'sub2-1'></a>

#### - raw_data visualization

    - Group the values of the event column by sep_time and game_id
    - Visualize the data by game and by time
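
To make the grouping step concrete, here is a minimal sketch on an invented toy frame (the rows are made up for illustration, not taken from the dataset). It counts each `event` value per `game_id` and `sep_time`, then pivots the event names into columns. Unlike the cells below, it passes `fill_value=0` so that missing combinations show up as 0 rather than NaN.

```python
import pandas as pd

# Toy log: two games, a handful of events (invented values).
toy = pd.DataFrame({
    "game_id":  [1, 1, 1, 1, 2, 2],
    "sep_time": [0.0, 0.0, 0.5, 0.5, 0.0, 0.0],
    "event":    ["Camera", "Camera", "Ability", "Camera", "Camera", "Selection"],
})

counts = (
    toy.groupby(["game_id", "sep_time"])["event"]
       .value_counts()          # count per (game_id, sep_time, event)
       .unstack(fill_value=0)   # event values become columns
)
print(counts)
```

Each row of `counts` is one (game, time bucket) pair, which is the shape the line plots and correlation heatmaps in this section work from.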

<a href = '#idx2'>up</a>

__groupby__

In [None]:
# Get the play time of each game
play_time_id = df.groupby(['game_id'])['time'].max().sort_values(ascending = False )
sample_game= list(play_time_id.sample(n= 10 , random_state = 123).index)
sample_game

In [None]:
sample = df.set_index( 'game_id' ,drop = True ).loc[ sample_game ]
sample
grouped_df = pd.DataFrame(sample.groupby(['sep_time_60', 'game_id'])['event'].value_counts().unstack()['Camera'])
grouped_df.unstack().plot(figsize = ( 16 , 10 ))
grouped_df.unstack()

In [None]:
grouped_df = pd.DataFrame(df.groupby(['game_id','sep_time_10'])['event'].value_counts().unstack())
grouped_df.loc[17873].plot(figsize = ( 10 , 7 ))
grouped_df.loc[3663].plot(figsize = ( 10 , 7 ))
grouped_df.loc[1576].plot(figsize = ( 10 , 7 ))

In [None]:
grouped_df = pd.DataFrame(df.groupby(['sep_time'])['event'].value_counts().unstack())
plt.figure( figsize = ( 18 , 10 ))
sns.heatmap(grouped_df.corr() , annot = True ,
           linewidths = 0 ,
           fmt = '.1f' ,cmap = "Blues" )

In [None]:
grouped_df = pd.DataFrame(df.groupby(['sep_time'])['event'].value_counts().unstack())[['AddToControlGroup' , 'ControlGroup' , 'GetControlGroup' , 'SetControlGroup']]
plt.figure( figsize = ( 10 , 8 ))
sns.heatmap(grouped_df.corr() , annot = True ,
           linewidths = 1 ,
           fmt = '.1f' ,cmap = "Blues" )

In [None]:
grouped_df = pd.DataFrame(df.groupby(['sep_time'])['event'].value_counts().unstack())
plt.figure( figsize = ( 10, 3 ))
sns.heatmap(grouped_df.corr().loc[['Camera']] , annot = True ,
            linewidths = 1 ,
           fmt = '.1f' ,cmap = "Blues" )

<a id = 'load'></a>

### Loading the existing preprocessing steps 
1. **base columns**<a href = '#sub2-11'>go</a>
2. **action column**<a href = '#sub2-2'>go</a>
3. **resource column**<a href = '#sub2-3'>go</a>


<a href = '#idx2'>up</a>

<a id = 'sub2-11'></a>
**1. base columns**

<a href = '#load' >up</a>
<a href = '#sub2-run'>run</a>

In [3]:


## Count occurrences of each event type
def get_event_counts( raw_data ):
    print('start event counts! .....')
    grouped_data = raw_data.groupby([ 'game_id' ,'sep_time', 'player' ])['event'].value_counts().unstack().unstack()
    print( 'complete groupby')
    
    new_names = []
    for col in grouped_data.columns:
        name = col[0]
        player = col[1]
        new_names.append(f'p{player}_{name}')
    grouped_data.columns = new_names 
    print( 'Done')
    
    return grouped_data.reset_index().set_index('game_id')


def get_species_df( raw_data ):
    print( 'start species! ... ')
    tmp = raw_data.groupby(['game_id', 'sep_time', 'player'])['species'].value_counts().unstack().unstack()
    print('complete groupby')

    new_names = []
    for col in tmp.columns:
        name = col[0]
        player = col[1]
        new_names.append(f'p{player}_{name}')

    tmp.columns = new_names

    species_df = tmp.xs(0.0 , level= 1).fillna(0).astype('bool').astype('int')
    print('Done')
    
    return species_df

def get_label( raw_data ):
    tmp = raw_data[['game_id','winner']]
    df = {'game_id':[] , 'winner' : []}

    for id_ in tmp['game_id'].unique():
        df['winner'].append(tmp.loc[tmp.game_id == id_].winner.values[-1])
        df['game_id'].append(id_)
        
    return pd.DataFrame(df)


def get_worker_attack_df( raw_data ):
    print( 'start worker' )
    a_data = raw_data.loc[raw_data.event == 'Ability']
    worker_data = a_data[a_data['event_contents'].str.contains('1360') | a_data['event_contents'].str.contains('1820') | a_data['event_contents'].str.contains('15E0')| a_data['event_contents'].str.contains('B40')]
    worker_data = worker_data.groupby(['game_id' , 'sep_time' , 'player'])['time'].count().unstack()
    worker_data.columns = ['p0_worker' , 'p1_worker']
    worker_data = worker_data.reset_index().set_index('game_id')
    print( 'worker Done' )
    
    print( 'start attack')
    attack_data = a_data[ a_data['event_contents'].str.contains('Attack')]
    attack_data = attack_data.groupby(['game_id', 'sep_time' , 'player'])['time'].count().unstack()
    attack_data.columns = ['p0_attack' , 'p1_attack']
    attack_data = attack_data.reset_index().set_index('game_id')
    print( 'attack Done' )
    
    return worker_data , attack_data





def get_base_df ( raw_data ,winner = True , WA_col = True ):
    
#     raw_data['sep_time']=(((raw_data['time']%1 * 100 + raw_data['time']//1*60)//30)*30/60)
    event_df = get_event_counts(raw_data)
    species_df = get_species_df(raw_data)
    
    label = None
    if winner:
        label = get_label( raw_data )
        
    worker_attack_df = None 
    if WA_col:
        worker_data , attack_data = get_worker_attack_df( raw_data )
        worker_attack_df = pd.merge( worker_data ,attack_data , on = ['game_id', 'sep_time'], how = 'outer')
        
    return event_df , species_df , worker_attack_df , label
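
The column-renaming pattern used in `get_event_counts` and `get_species_df` can be seen on a small example. This is a sketch on invented rows: two `unstack()` calls turn the event name and the player into a two-level column index, which is then flattened into names of the form `p{player}_{name}`.

```python
import pandas as pd

# Toy log for one game with two players (invented values).
toy = pd.DataFrame({
    "game_id":  [1, 1, 1, 1],
    "sep_time": [0.0, 0.0, 0.0, 0.5],
    "player":   [0, 1, 1, 0],
    "event":    ["Camera", "Camera", "Ability", "Camera"],
})

wide = (
    toy.groupby(["game_id", "sep_time", "player"])["event"]
       .value_counts()
       .unstack()   # event  -> first column level
       .unstack()   # player -> second column level
)
# Flatten (event, player) tuples into single names such as "p0_Camera".
wide.columns = [f"p{player}_{name}" for name, player in wide.columns]
print(wide.reset_index().set_index("game_id"))
```

Combinations that never occur stay NaN, which matches the empty cells visible in the final `train_data` output.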
    
    



<a id = 'sub2-2'></a>

**2. action column**

<a href = '#load'>up</a>  <a href = '#sub2-run'>run</a>

In [4]:
# Build action_code from raw_data
def get_action_code(df):
    code_reg = re.compile('[0-9A-Z]{3}[0-9A-Z]?')
    data = df['event_contents']
    species = df['species']
    try:
        code = code_reg.findall(data)[0]
        return code + species
    except:
        return np.nan

def get_action( data ):
    global act_list
    try:
        return act_list[data]
    except:
        return np.nan
    
    
    
# Take raw_data and attach the action_code and action columns

def make_action_code_columns( data ):
    data = data.loc[ data['event'] == 'Ability']
    data['action_code'] = data.apply(get_action_code  ,axis = 1)
    data['action'] = data['action_code'].apply(get_action)
    return data
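
Assigning new columns to a frame that was just filtered with `.loc` is the usual trigger for pandas' "A value is trying to be set on a copy of a slice" warning. A minimal sketch of one common way to avoid it is to take an explicit `.copy()` of the filtered rows first. The `event_contents` strings below are invented for illustration, and `str.extract` stands in for the row-wise regex lookup; neither is claimed to match the real data or the original function.

```python
import pandas as pd

# Toy events; the event_contents format here is an assumption.
events = pd.DataFrame({
    "event": ["Ability", "Camera", "Ability"],
    "event_contents": ["(1360) Train", "at (10, 20)", "(15E0) Train"],
})

# .copy() makes the filtered frame independent of `events`,
# so adding a column does not write into a possible view.
abilities = events.loc[events["event"] == "Ability"].copy()

# Same pattern as code_reg, wrapped in a capture group for str.extract.
abilities["code"] = abilities["event_contents"].str.extract(
    r"([0-9A-Z]{3}[0-9A-Z]?)", expand=False
)
print(abilities)
```

The original `events` frame is left untouched, and rows with no match would get NaN in `code`.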



<a id = 'sub2-3' ></a>

**3. resource columns**

<a href='#load'>up</a> <a href = '#sub2-run'>run</a>

In [5]:
# Get the action_code
def get_action_code(df):
    code_reg = re.compile('[0-9A-Z]{3}[0-9A-Z]?')
    data = df['event_contents']
    species = df['species']
    try:
        code = code_reg.findall(data)[0]
        return code + species
    except:
        return np.nan

    
    
# Lookup functions for mineral and gas cost
def get_mi(data):
    global resource_list
    try:
        return resource_list[data][0]
    except:
        return 0
    
def get_gas(data):
    global resource_list
    try:
        return resource_list[data][1]
    except:
        return 0
    
def append_resource( data ):
    data = data.loc[data['event'] == 'Ability']
    data['action_code'] = data.apply(get_action_code, axis= 1)
    data['mineral'] = data['action_code'].apply(get_mi)
    data['gas'] = data['action_code'].apply(get_gas)
    return data
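
The cost lookup in `get_mi` and `get_gas` amounts to a dictionary lookup with a default of 0 for unknown codes. As a sketch, the same behavior can be written with `dict.get` instead of `try`/`except`. The cost table below is hypothetical: the keys follow the `code + species` shape used above, but the entries and amounts are invented.

```python
import pandas as pd

# Hypothetical cost table: action_code -> (mineral, gas).
resource_list = {"1360T": (50, 0), "15E0T": (150, 100)}

codes = pd.Series(["1360T", "15E0T", "FFFFT", None])

# Unknown or missing codes fall back to a (0, 0) cost.
mineral = codes.map(lambda c: resource_list.get(c, (0, 0))[0])
gas = codes.map(lambda c: resource_list.get(c, (0, 0))[1])

print(pd.DataFrame({"action_code": codes, "mineral": mineral, "gas": gas}))
```

Using an explicit default keeps unrelated errors visible, whereas a bare `except` would turn any failure into a cost of 0.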


<a id = 'sub2-run'></a>


### Run


In [6]:
raw_data = pd.read_csv('data/raw/train.csv')
act_list = load_pickle( 'data/pickle/act_list/CodeSpecies:Category.p' )
resource_list = load_pickle('data/pickle/cost/CodeSpecies:resource.p')

## split time
interval= 90
raw_data['sep_time']= (((raw_data['time']%1 * 100 + raw_data['time']//1*60)//interval)*interval/60)



## base_column  ######### label = True !!!! ##################
event , species , wk_att , label  = get_base_df( raw_data, winner =False )
base = pd.merge(event , species , on = 'game_id' , how = 'outer')
base = pd.merge( base , wk_att ,  on = ['game_id' , 'sep_time']  , how = 'outer')




### act_column
act_grouped_df = make_action_code_columns( raw_data )
act_grouped_df = act_grouped_df.groupby(['game_id' , 'sep_time', 'player'])['action'].value_counts().unstack().unstack()

new_names = []
for name in act_grouped_df.columns:
    new_names.append(f'p{name[1]}_{name[0]}')

act_grouped_df.columns = new_names
act_grouped_df = act_grouped_df.reset_index().set_index('game_id')




### resource_column

 
cost_df = append_resource( raw_data )
cost_grouped_df = cost_df.groupby(['game_id' , 'sep_time', 'player'])[['mineral','gas']].sum()
cost_grouped_df = cost_grouped_df.unstack()
cost_grouped_df.columns = ['p0_mineral' , 'p1_mineral' , 'p0_gas' , 'p1_gas']
cost_grouped_df = cost_grouped_df.reset_index().set_index('game_id')



### merge_all
train_data = pd.merge(base , act_grouped_df , on = ['game_id' , 'sep_time'] , how = 'outer')
train_data = pd.merge(train_data , cost_grouped_df , on = ['game_id' , 'sep_time'] , how = 'outer')

start event counts! .....
complete groupby
Done
start species! ... 
complete groupby
Done
start worker
worker Done
start attack
attack Done


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

In [7]:
train_data.to_csv('full_train_0402_inter90.csv')

In [8]:
train_data

game_id,sep_time,p0_Ability,p1_Ability,p0_AddToControlGroup,p1_AddToControlGroup,p0_Camera,p1_Camera,p0_ControlGroup,p1_ControlGroup,p0_GetControlGroup,p1_GetControlGroup,p0_Right Click,p1_Right Click,p0_Selection,p1_Selection,p0_SetControlGroup,p1_SetControlGroup,p0_P,p1_P,p0_T,p1_T,p0_Z,p1_Z,p0_worker,p1_worker,p0_attack,p1_attack,p0_build,p1_build,p0_unit,p1_unit,p0_upgrade,p1_upgrade,p0_mineral,p1_mineral,p0_gas,p1_gas
0,0.0,8.0,5.0,,,69.0,38.0,,,3.0,,8.0,,7.0,2.0,1.0,,0,0,1,1,0,0,4.0,2.0,,,3.0,3.0,4.0,2.0,,,525.0,425.0,0.0,0.0
0,1.5,4.0,3.0,,,83.0,67.0,,,2.0,,4.0,3.0,9.0,9.0,1.0,1.0,0,0,1,1,0,0,1.0,,,,2.0,2.0,1.0,1.0,1.0,,700.0,300.0,150.0,100.0
0,3.0,7.0,8.0,2.0,,109.0,132.0,,,8.0,,8.0,11.0,9.0,18.0,1.0,,0,0,1,1,0,0,2.0,1.0,,2.0,3.0,1.0,3.0,2.0,,1.0,425.0,300.0,150.0,175.0
0,4.5,7.0,7.0,,,88.0,135.0,,,6.0,,6.0,11.0,10.0,13.0,,,0,0,1,1,0,0,1.0,1.0,,1.0,4.0,2.0,2.0,2.0,,,700.0,425.0,125.0,125.0
0,6.0,8.0,11.0,,,95.0,53.0,,,5.0,3.0,9.0,3.0,15.0,15.0,,,0,0,1,1,0,0,2.0,3.0,,,2.0,1.0,3.0,4.0,1.0,1.0,450.0,550.0,225.0,300.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
38871,1.5,6.0,4.0,,,32.0,112.0,2.0,,121.0,92.0,86.0,56.0,46.0,9.0,,2.0,0,0,0,1,1,0,2.0,2.0,,,2.0,2.0,4.0,2.0,,,575.0,600.0,0.0,0.0
38871,3.0,13.0,16.0,,,65.0,109.0,4.0,,51.0,65.0,90.0,47.0,40.0,24.0,,1.0,0,0,0,1,1,0,5.0,1.0,5.0,8.0,2.0,2.0,5.0,,1.0,1.0,675.0,450.0,100.0,350.0
38871,4.5,15.0,31.0,,,82.0,107.0,3.0,,45.0,65.0,73.0,51.0,52.0,17.0,,3.0,0,0,0,1,1,0,1.0,5.0,4.0,16.0,3.0,4.0,4.0,5.0,,,700.0,1200.0,100.0,350.0
38871,6.0,22.0,25.0,,,86.0,98.0,4.0,,70.0,58.0,35.0,54.0,65.0,19.0,,2.0,0,0,0,1,1,0,8.0,3.0,5.0,12.0,1.0,1.0,11.0,3.0,1.0,3.0,675.0,800.0,100.0,450.0
