### Contents
* Setup
    * [Importing libraries](#lib)
    * [Configuring Elasticsearch](#es_setting)
    * [Querying data from Elasticsearch](#es_load)
    * [Converting the results into a table](#df)
* Parent Pipeline Aggregation
    * [Derivative Aggregation](#der)
    * [Cumulative Sum Aggregation](#cusum)
    * [Moving Avg Aggregation](#mvavg)
    * [Serial Diff Aggregation](#sd)
---

### Setup

<a name='lib'></a>
#### Importing libraries

In [1]:
import pandas as pd
from elasticsearch import Elasticsearch
import datetime

<a name='es_setting'></a>
#### Configuring Elasticsearch

In [2]:
# Replace these with values that match your environment; otherwise the calls below raise an error
es = Elasticsearch()
es_index = 'test_index'
es_type = 'test_type'

<a name='es_load'></a>
#### Querying data from Elasticsearch

In [3]:
x = es.search(
    index=es_index,
    doc_type=es_type,  # only for Elasticsearch versions that still use mapping types
    body={
        'query': {
            'match_all': {}
        },
        'size': 50  # fetch up to 50 documents
    }
)

<a name='df'></a>
#### Converting the results into a table

In [4]:
df = pd.DataFrame([hit['_source'] for hit in x['hits']['hits']])
df

Unnamed: 0,결제카드,고객ip,고객나이,고객성별,고객주소_시도,구매사이트,물건좌표,배송메모,상품가격,상품개수,상품분류,수령시간,예약여부,접수번호,주문시간,판매자평점
0,국민,87.186.104.168,28,여성,광주광역시,옥션,"36.69404479407181, 128.77849366125415",부재중,8000,7,티셔츠,2017-11-04T18:14:01,일반,7,2017-11-01T05:45:01,1
1,하나,61.25.11.51,32,남성,제주특별자치도,티몬,"37.33710638991395, 128.19801428893294",부재중,13000,1,자켓,2017-11-11T03:32:07,일반,8,2017-11-09T16:58:07,3
2,국민,229.171.146.50,35,여성,부산광역시,위메프,"35.20703842551487, 129.88814822339899",주소 오류,22000,1,자켓,2017-11-07T06:46:26,일반,11,2017-11-06T20:03:26,1
3,국민,100.11.208.255,38,여성,충청남도,GS샵,"36.383338520582114, 129.92114062561183",부재중,21000,7,니트,2017-11-05T22:54:35,일반,31,2017-11-03T18:29:35,3
4,롯데,62.73.159.47,28,여성,부산광역시,쿠팡,"37.82022943242058, 127.73799530036115",부재중,21000,1,블라우스,2017-11-10T16:02:36,일반,34,2017-11-09T22:37:36,4
5,신한,28.170.18.252,30,남성,전라북도,티몬,"37.92452138343442, 128.67772065089744",관리실에 맡김,13000,7,티셔츠,2017-11-08T11:11:55,예약,37,2017-11-05T00:41:55,3
6,신한,197.247.246.203,41,여성,충청남도,옥션,"36.663065088906095, 127.92452258110524",환불 요청,15000,7,원피스,2017-11-09T04:28:44,일반,39,2017-11-06T00:27:44,1
7,국민,191.96.106.19,38,여성,인천광역시,쿠팡,"35.42099073543264, 126.40755694075982",환불 요청,12000,1,가디건,2017-11-11T15:26:16,일반,47,2017-11-09T00:40:16,3
8,우리,212.72.222.135,39,여성,전라북도,11번가,"36.204083843135464, 129.24012691531485",부재중,10000,7,셔츠,2017-11-09T08:59:26,일반,48,2017-11-06T10:28:26,5
9,국민,62.36.170.71,39,여성,대구광역시,GS샵,"35.56868760610102, 128.3793660808611",환불 요청,17000,1,스커트,2017-11-11T15:36:16,일반,1,2017-11-08T02:00:16,3


### Parent Pipeline Aggregation

In [6]:
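# Use the order time (주문시간) as a DatetimeIndex so the rows can be resampled by day
df.index = pd.to_datetime(df['주문시간'])
# count() returns one non-null count per column; keep the first column as the daily order count
df_daily_count = df.resample('D').count().iloc[:, 0]
df_daily_count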

주문시간
2017-11-01    2
2017-11-02    4
2017-11-03    5
2017-11-04    7
2017-11-05    4
2017-11-06    6
2017-11-07    5
2017-11-08    4
2017-11-09    8
2017-11-10    5
Freq: D, Name: 결제카드, dtype: int64
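
The daily bucketing can also be sketched without pandas. The example below is a minimal illustration that uses only the ten `주문시간` values visible in the table preview, not the full 50 documents, so its counts differ from the output above. The helper name `daily_counts` is hypothetical. Like `resample('D').count()`, it reports days without orders as 0.

```python
from collections import Counter
from datetime import date, datetime, timedelta

# The ten 주문시간 values shown in the table preview (a subset of the 50 documents).
order_times = [
    "2017-11-01T05:45:01", "2017-11-09T16:58:07", "2017-11-06T20:03:26",
    "2017-11-03T18:29:35", "2017-11-09T22:37:36", "2017-11-05T00:41:55",
    "2017-11-06T00:27:44", "2017-11-09T00:40:16", "2017-11-06T10:28:26",
    "2017-11-08T02:00:16",
]


def daily_counts(timestamps):
    """Count ISO-8601 timestamps per calendar day, filling empty days with 0."""
    days = [datetime.fromisoformat(ts).date() for ts in timestamps]
    counts = Counter(days)
    first, last = min(days), max(days)
    n_days = (last - first).days + 1
    # Walk every day in the range so that days without orders still get a bucket.
    return {
        first + timedelta(days=i): counts.get(first + timedelta(days=i), 0)
        for i in range(n_days)
    }


buckets = daily_counts(order_times)
for day, count in buckets.items():
    print(day.isoformat(), count)
```

Each key plays the role of one Date Histogram bucket, and the count is the value that the pipeline aggregations in the following sections operate on.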

<a name="der"></a>
#### Derivative Aggregation 

: Computes the difference between the aggregated values of two consecutive buckets in a (Date) Histogram (current value - previous value).

In [7]:
# diff() subtracts the previous day's count from each day's count
df_daily_count_diff = df_daily_count.diff()
df_derivative = pd.concat([df_daily_count, df_daily_count_diff], axis=1)
# Column labels: "daily sales count", "change in sales count vs. the previous day"
df_derivative.columns = ['일별 판매 개수', '전일 대비 판매 개수 변화']
df_derivative

주문시간,일별 판매 개수,전일 대비 판매 개수 변화
2017-11-01,2,
2017-11-02,4,2.0
2017-11-03,5,1.0
2017-11-04,7,2.0
2017-11-05,4,-3.0
2017-11-06,6,2.0
2017-11-07,5,-1.0
2017-11-08,4,-1.0
2017-11-09,8,4.0
2017-11-10,5,-3.0
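
On the Elasticsearch side, the same calculation might be requested with a `derivative` aggregation nested inside a `date_histogram`. The sketch below only builds the request body and does not send it. The aggregation names `orders_per_day` and `count_change` and the helper `build_derivative_body` are hypothetical. The `interval` key assumes an older Elasticsearch version; newer versions use `calendar_interval` instead.

```python
def build_derivative_body(date_field="주문시간", interval="day"):
    """Build a search body that asks for the day-over-day change in document count."""
    return {
        "size": 0,  # only the aggregation result is needed, not the documents
        "aggs": {
            "orders_per_day": {  # hypothetical aggregation name
                "date_histogram": {"field": date_field, "interval": interval},
                "aggs": {
                    "count_change": {  # hypothetical aggregation name
                        # "_count" refers to the document count of each bucket
                        "derivative": {"buckets_path": "_count"}
                    }
                },
            }
        },
    }


body = build_derivative_body()
print(body["aggs"]["orders_per_day"]["aggs"])
```

Passing this body to `es.search(index=es_index, body=body)` should return one bucket per day. The first bucket has no `count_change` value because it has no previous bucket, which matches the empty first row in the pandas result.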


<a name="cusum"></a>

#### Cumulative Sum Aggregation
: Returns the cumulative sum of the aggregated values of the buckets in a (Date) Histogram.

In [8]:
# cumsum() adds up the daily counts from the first day through each day
df_daily_count_cumsum = df_daily_count.cumsum()
df_cumsum = pd.concat([df_daily_count, df_daily_count_cumsum], axis=1)
# Column labels: "daily sales count", "cumulative sales count"
df_cumsum.columns = ['일별 판매 개수', '누적 판매 개수']
df_cumsum

주문시간,일별 판매 개수,누적 판매 개수
2017-11-01,2,2
2017-11-02,4,6
2017-11-03,5,11
2017-11-04,7,18
2017-11-05,4,22
2017-11-06,6,28
2017-11-07,5,33
2017-11-08,4,37
2017-11-09,8,45
2017-11-10,5,50
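
As a quick cross-check that does not depend on pandas, the running total can be recomputed with `itertools.accumulate`. The daily counts are copied by hand from the table above; the variable names are illustrative only.

```python
from itertools import accumulate

# Daily order counts copied from the table above (2017-11-01 to 2017-11-10).
daily_counts = [2, 4, 5, 7, 4, 6, 5, 4, 8, 5]

# Each bucket's cumulative value is the sum of all buckets up to and including it.
running_total = list(accumulate(daily_counts))
print(running_total)
```

The last cumulative value equals the total number of orders, 50, which is also the `size` requested in the search.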


<a name="mvavg"></a>
#### Moving Avg Aggregation

: Returns the moving average of the aggregated values of the buckets in a (Date) Histogram.

In [9]:
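# rolling(window=2).mean() averages each day's count with the previous day's count
df_daily_count_rolling = df_daily_count.rolling(window=2).mean()
df_rolling = pd.concat([df_daily_count, df_daily_count_rolling], axis=1)
# Column labels: "daily sales count", "two-day moving average of sales count"
df_rolling.columns = ['일별 판매 개수', '이틀 간 판매 개수의 이동 평균']
df_rolling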

주문시간,일별 판매 개수,이틀 간 판매 개수의 이동 평균
2017-11-01,2,
2017-11-02,4,3.0
2017-11-03,5,4.5
2017-11-04,7,6.0
2017-11-05,4,5.5
2017-11-06,6,5.0
2017-11-07,5,5.5
2017-11-08,4,4.5
2017-11-09,8,6.0
2017-11-10,5,6.5
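
The window logic can be spelled out in plain Python to show why the first row is empty. This is a minimal sketch of a trailing moving average that mirrors `rolling(window=2).mean()`; the helper name `moving_average` is hypothetical and the counts are copied from the table above. It returns `None` where pandas shows `NaN`.

```python
def moving_average(values, window):
    """Trailing moving average; positions without a full window yield None."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            # Fewer than `window` values are available so far.
            result.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]  # current value and the ones before it
            result.append(sum(chunk) / window)
    return result


# Daily order counts copied from the table above.
daily_counts = [2, 4, 5, 7, 4, 6, 5, 4, 8, 5]
print(moving_average(daily_counts, window=2))
```

With a larger window, more leading positions are empty: `window=3` leaves the first two days without a value. Note also that Elasticsearch deprecated `moving_avg` in version 6.4 in favor of the more general `moving_fn` aggregation, so the aggregation to use depends on the cluster version.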


<a name="sd"></a>
#### Serial Diff Aggregation 

: Returns the difference between the aggregated value of the current bucket and that of the bucket {n} positions earlier in a (Date) Histogram (current - previous).

In [10]:
# diff(periods=2) subtracts the count from two days earlier
df_daily_count_lag2 = df_daily_count.diff(periods=2)
df_serial_diff = pd.concat([df_daily_count, df_daily_count_lag2], axis=1)
# Column labels: "daily sales count", "change in sales count vs. two days earlier"
df_serial_diff.columns = ['일별 판매 개수', '전전일 대비 판매 개수 변화']
df_serial_diff

주문시간,일별 판매 개수,전전일 대비 판매 개수 변화
2017-11-01,2,
2017-11-02,4,
2017-11-03,5,3.0
2017-11-04,7,3.0
2017-11-05,4,-1.0
2017-11-06,6,-1.0
2017-11-07,5,1.0
2017-11-08,4,-2.0
2017-11-09,8,3.0
2017-11-10,5,1.0
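
The relationship between the lag and the result can be sketched in plain Python. The helper `lagged_diff` is hypothetical and the counts are copied from the table above. With `lag=1` it reproduces the Derivative result, and with `lag=2` it reproduces this section's table. The dictionary at the end shows what the matching Elasticsearch clause might look like; it is assumed to sit inside the `aggs` of a `date_histogram` and is not sent anywhere here.

```python
def lagged_diff(values, lag):
    """Difference between each value and the value `lag` positions earlier."""
    return [
        None if i < lag else values[i] - values[i - lag]
        for i in range(len(values))
    ]


# Daily order counts copied from the table above.
daily_counts = [2, 4, 5, 7, 4, 6, 5, 4, 8, 5]

print(lagged_diff(daily_counts, lag=1))  # same values as the Derivative section
print(lagged_diff(daily_counts, lag=2))  # same values as this section

# Sketch of the Elasticsearch clause; "lag" plays the role of {n}.
serial_diff_clause = {"serial_diff": {"buckets_path": "_count", "lag": 2}}
```

The first `lag` buckets have no value because there is no bucket that far back to subtract from.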
