### Contents
* Setup
    * [Importing the libraries](#lib)
    * [Configuring Elasticsearch](#es_setting)
    * [Querying data from Elasticsearch](#es_load)
    * [Turning the results into a table](#df)
* Bucket Aggregation
    * [Date Histogram Aggregation](#dh)
    * [Date Range Aggregation](#dr)
    * [Histogram Aggregation](#ha)
    * [Range Aggregation](#ra)
    * [Terms Aggregation](#ta)
    * [Filters Aggregation](#fa)
---

### Setup

<a name='lib'></a>
#### Importing the libraries

```python
import pandas as pd
from elasticsearch import Elasticsearch
import datetime
```

<a name='es_setting'></a>
#### Configuring Elasticsearch

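```python
# Adjust the connection settings for your cluster; otherwise this raises an error.
# With no arguments, 7.x and older clients connect to localhost:9200;
# 8.x clients need the URL, e.g. Elasticsearch('http://localhost:9200').
es = Elasticsearch()
es_index = 'test_index'
es_type = 'test_type'
```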

<a name='es_load'></a>
#### Querying data from Elasticsearch

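```python
x = es.search(
    index=es_index,
    doc_type=es_type,  # mapping types are deprecated in Elasticsearch 7 and removed in 8; drop this argument there
    body={
        'query': {
            'match_all': {}
        },
        'size': 50  # return up to 50 documents
    }
)
```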

<a name='df'></a>
#### Turning the results into a table

```python
# Keep only each hit's _source document; one DataFrame row per hit
df = pd.DataFrame([hit['_source'] for hit in x['hits']['hits']])
df
```

(Only the first 10 of the 50 rows are shown.)

| | 결제카드 | 고객ip | 고객나이 | 고객성별 | 고객주소_시도 | 구매사이트 | 물건좌표 | 배송메모 | 상품가격 | 상품개수 | 상품분류 | 수령시간 | 예약여부 | 접수번호 | 주문시간 | 판매자평점 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 국민 | 87.186.104.168 | 28 | 여성 | 광주광역시 | 옥션 | 36.69404479407181, 128.77849366125415 | 부재중 | 8000 | 7 | 티셔츠 | 2017-11-04T18:14:01 | 일반 | 7 | 2017-11-01T05:45:01 | 1 |
| 1 | 하나 | 61.25.11.51 | 32 | 남성 | 제주특별자치도 | 티몬 | 37.33710638991395, 128.19801428893294 | 부재중 | 13000 | 1 | 자켓 | 2017-11-11T03:32:07 | 일반 | 8 | 2017-11-09T16:58:07 | 3 |
| 2 | 국민 | 229.171.146.50 | 35 | 여성 | 부산광역시 | 위메프 | 35.20703842551487, 129.88814822339899 | 주소 오류 | 22000 | 1 | 자켓 | 2017-11-07T06:46:26 | 일반 | 11 | 2017-11-06T20:03:26 | 1 |
| 3 | 국민 | 100.11.208.255 | 38 | 여성 | 충청남도 | GS샵 | 36.383338520582114, 129.92114062561183 | 부재중 | 21000 | 7 | 니트 | 2017-11-05T22:54:35 | 일반 | 31 | 2017-11-03T18:29:35 | 3 |
| 4 | 롯데 | 62.73.159.47 | 28 | 여성 | 부산광역시 | 쿠팡 | 37.82022943242058, 127.73799530036115 | 부재중 | 21000 | 1 | 블라우스 | 2017-11-10T16:02:36 | 일반 | 34 | 2017-11-09T22:37:36 | 4 |
| 5 | 신한 | 28.170.18.252 | 30 | 남성 | 전라북도 | 티몬 | 37.92452138343442, 128.67772065089744 | 관리실에 맡김 | 13000 | 7 | 티셔츠 | 2017-11-08T11:11:55 | 예약 | 37 | 2017-11-05T00:41:55 | 3 |
| 6 | 신한 | 197.247.246.203 | 41 | 여성 | 충청남도 | 옥션 | 36.663065088906095, 127.92452258110524 | 환불 요청 | 15000 | 7 | 원피스 | 2017-11-09T04:28:44 | 일반 | 39 | 2017-11-06T00:27:44 | 1 |
| 7 | 국민 | 191.96.106.19 | 38 | 여성 | 인천광역시 | 쿠팡 | 35.42099073543264, 126.40755694075982 | 환불 요청 | 12000 | 1 | 가디건 | 2017-11-11T15:26:16 | 일반 | 47 | 2017-11-09T00:40:16 | 3 |
| 8 | 우리 | 212.72.222.135 | 39 | 여성 | 전라북도 | 11번가 | 36.204083843135464, 129.24012691531485 | 부재중 | 10000 | 7 | 셔츠 | 2017-11-09T08:59:26 | 일반 | 48 | 2017-11-06T10:28:26 | 5 |
| 9 | 국민 | 62.36.170.71 | 39 | 여성 | 대구광역시 | GS샵 | 35.56868760610102, 128.3793660808611 | 환불 요청 | 17000 | 1 | 스커트 | 2017-11-11T15:36:16 | 일반 | 1 | 2017-11-08T02:00:16 | 3 |
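
Each entry in `x['hits']['hits']` stores the document under `_source`, alongside metadata such as `_index` and `_id`. Only `_source` becomes a DataFrame row. The toy response below uses hypothetical values and is trimmed to those keys. It shows that shape, together with a plain-Python version of the list comprehension above. `hits_to_records` is an illustrative name and is not part of the Elasticsearch client.

```python
# Toy search response, trimmed to the keys used above (hypothetical values)
response = {
    "hits": {
        "hits": [
            {"_index": "test_index", "_id": "1", "_source": {"결제카드": "국민", "상품가격": 8000}},
            {"_index": "test_index", "_id": "2", "_source": {"결제카드": "하나", "상품가격": 13000}},
        ]
    }
}


def hits_to_records(resp):
    """Return the _source document of each hit, in the order returned."""
    return [hit["_source"] for hit in resp["hits"]["hits"]]


records = hits_to_records(response)
print(len(records), records[0])  # 2 {'결제카드': '국민', '상품가격': 8000}
```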


### Bucket Aggregation

<a name='dh'></a>
#### Date Histogram Aggregation

: Creates buckets (or groups) by splitting a date field into fixed-size intervals.

```python
new_df = df.copy()
# Use the order time as the index so rows can be bucketed by date
new_df.index = pd.to_datetime(new_df['주문시간'])
# One bucket per calendar day; count the orders in each
new_df.resample('D').count().iloc[:, 0]
```

```
주문시간
2017-11-01    2
2017-11-02    4
2017-11-03    5
2017-11-04    7
2017-11-05    4
2017-11-06    6
2017-11-07    5
2017-11-08    4
2017-11-09    8
2017-11-10    5
Freq: D, Name: 결제카드, dtype: int64
```
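
For comparison, the Elasticsearch side of this cell might use a request body like the one below. This is a sketch that assumes `주문시간` is mapped as a `date` field. `calendar_interval` is the parameter name from Elasticsearch 7.2 onward; older versions use `interval`. The aggregation name `orders_per_day` is arbitrary. The hypothetical `bucket_by_day` helper is a plain-Python stand-in for the bucketing step: it truncates each timestamp to its day and counts documents per day. Unlike `resample`, it skips days that have no orders.

```python
from collections import Counter
from datetime import datetime

# Request body for a daily date_histogram (assumes 주문시간 is a date field).
# "size": 0 skips the hits and returns only the aggregation result.
date_histogram_body = {
    "size": 0,
    "aggs": {
        "orders_per_day": {  # arbitrary aggregation name
            "date_histogram": {
                "field": "주문시간",
                "calendar_interval": "day",  # "interval" on Elasticsearch < 7.2
            }
        }
    },
}


def bucket_by_day(timestamps):
    """Hypothetical helper: count ISO-8601 timestamps per calendar day."""
    counts = Counter(datetime.fromisoformat(ts).date().isoformat() for ts in timestamps)
    return dict(sorted(counts.items()))


sample = ["2017-11-01T05:45:01", "2017-11-01T13:52:01", "2017-11-02T04:33:20"]
print(bucket_by_day(sample))  # {'2017-11-01': 2, '2017-11-02': 1}
```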

<a name="dr"></a>
#### Date Range Aggregation
: Creates buckets from date ranges whose start and end points you specify.

##### November 1 to November 3

```python
# new_df's index is not sorted by time, and pandas 2.x raises a KeyError when a
# non-monotonic DatetimeIndex is sliced with date strings that are not in the index,
# so select the range with a boolean mask instead.
# Nov 1 00:00 (inclusive) up to Nov 4 00:00 (exclusive) covers all of Nov 1 to Nov 3.
nov_1_to_3 = new_df[(new_df.index >= '2017-11-01') & (new_df.index < '2017-11-04')]
nov_1_to_3
```

(Only the first 10 of the 11 matching rows are shown.)

| 주문시간 (index) | 결제카드 | 고객ip | 고객나이 | 고객성별 | 고객주소_시도 | 구매사이트 | 물건좌표 | 배송메모 | 상품가격 | 상품개수 | 상품분류 | 수령시간 | 예약여부 | 접수번호 | 주문시간 | 판매자평점 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2017-11-01 05:45:01 | 국민 | 87.186.104.168 | 28 | 여성 | 광주광역시 | 옥션 | 36.69404479407181, 128.77849366125415 | 부재중 | 8000 | 7 | 티셔츠 | 2017-11-04T18:14:01 | 일반 | 7 | 2017-11-01T05:45:01 | 1 |
| 2017-11-03 18:29:35 | 국민 | 100.11.208.255 | 38 | 여성 | 충청남도 | GS샵 | 36.383338520582114, 129.92114062561183 | 부재중 | 21000 | 7 | 니트 | 2017-11-05T22:54:35 | 일반 | 31 | 2017-11-03T18:29:35 | 3 |
| 2017-11-02 05:28:18 | 우리 | 50.130.83.231 | 27 | 남성 | 경기도 | 위메프 | 37.78723366594554, 127.99751085613701 | 상품 이상 | 17000 | 7 | 팬츠 | 2017-11-05T09:26:18 | 일반 | 20 | 2017-11-02T05:28:18 | 1 |
| 2017-11-03 20:15:22 | 신한 | 110.126.36.108 | 20 | 남성 | 부산광역시 | 티몬 | 35.73286207547757, 128.237959511983 | 주소 오류 | 28000 | 1 | 팬츠 | 2017-11-06T13:32:22 | 일반 | 28 | 2017-11-03T20:15:22 | 4 |
| 2017-11-03 10:37:56 | 국민 | 153.183.125.58 | 29 | 여성 | 강원도 | 11번가 | 36.02446595440512, 128.37338819474155 | 상품 이상 | 17000 | 7 | 블라우스 | 2017-11-05T21:01:56 | 일반 | 2 | 2017-11-03T10:37:56 | 3 |
| 2017-11-02 04:33:20 | 롯데 | 165.17.49.145 | 42 | 여성 | 전라북도 | g마켓 | 35.418743037985074, 126.58583972979557 | 주소 오류 | 10000 | 7 | 블라우스 | 2017-11-04T18:20:20 | 일반 | 14 | 2017-11-02T04:33:20 | 4 |
| 2017-11-02 19:50:09 | 하나 | 96.7.37.144 | 43 | 남성 | 광주광역시 | g마켓 | 37.40789435127714, 126.05827428381714 | 상품 이상 | 10000 | 7 | 셔츠 | 2017-11-07T00:43:09 | 일반 | 10 | 2017-11-02T19:50:09 | 4 |
| 2017-11-01 13:52:01 | 하나 | 114.136.56.2 | 17 | 여성 | 서울특별시 | 11번가 | 36.48704797751413, 126.6969369579494 | 상품 이상 | 27000 | 1 | 코트 | 2017-11-04T01:56:01 | 일반 | 22 | 2017-11-01T13:52:01 | 5 |
| 2017-11-03 20:24:56 | 신한 | 47.144.234.168 | 22 | 남성 | 경상남도 | GS샵 | 35.065572757040094, 129.03874692593698 | 부재중 | 22000 | 7 | 수트 | 2017-11-04T17:49:56 | 일반 | 4 | 2017-11-03T20:24:56 | 3 |
| 2017-11-02 13:19:26 | 우리 | 27.147.153.211 | 26 | 남성 | 경상북도 | 옥션 | 36.529942877901696, 127.35651155748744 | 부재중 | 23000 | 7 | 팬츠 | 2017-11-05T05:03:26 | 일반 | 19 | 2017-11-02T13:19:26 | 2 |

```python
# Number of orders in this bucket (its doc_count)
len(nov_1_to_3)
```

```
11
```

##### November 4 to November 10

```python
# Boolean mask for the same reason as above (the index is not sorted).
# Nov 4 00:00 (inclusive) up to Nov 11 00:00 (exclusive) covers all of Nov 4 to Nov 10.
nov_4_to_10 = new_df[(new_df.index >= '2017-11-04') & (new_df.index < '2017-11-11')]
nov_4_to_10
```

(Only the first 10 of the 39 matching rows are shown.)

| 주문시간 (index) | 결제카드 | 고객ip | 고객나이 | 고객성별 | 고객주소_시도 | 구매사이트 | 물건좌표 | 배송메모 | 상품가격 | 상품개수 | 상품분류 | 수령시간 | 예약여부 | 접수번호 | 주문시간 | 판매자평점 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2017-11-09 16:58:07 | 하나 | 61.25.11.51 | 32 | 남성 | 제주특별자치도 | 티몬 | 37.33710638991395, 128.19801428893294 | 부재중 | 13000 | 1 | 자켓 | 2017-11-11T03:32:07 | 일반 | 8 | 2017-11-09T16:58:07 | 3 |
| 2017-11-06 20:03:26 | 국민 | 229.171.146.50 | 35 | 여성 | 부산광역시 | 위메프 | 35.20703842551487, 129.88814822339899 | 주소 오류 | 22000 | 1 | 자켓 | 2017-11-07T06:46:26 | 일반 | 11 | 2017-11-06T20:03:26 | 1 |
| 2017-11-09 22:37:36 | 롯데 | 62.73.159.47 | 28 | 여성 | 부산광역시 | 쿠팡 | 37.82022943242058, 127.73799530036115 | 부재중 | 21000 | 1 | 블라우스 | 2017-11-10T16:02:36 | 일반 | 34 | 2017-11-09T22:37:36 | 4 |
| 2017-11-05 00:41:55 | 신한 | 28.170.18.252 | 30 | 남성 | 전라북도 | 티몬 | 37.92452138343442, 128.67772065089744 | 관리실에 맡김 | 13000 | 7 | 티셔츠 | 2017-11-08T11:11:55 | 예약 | 37 | 2017-11-05T00:41:55 | 3 |
| 2017-11-06 00:27:44 | 신한 | 197.247.246.203 | 41 | 여성 | 충청남도 | 옥션 | 36.663065088906095, 127.92452258110524 | 환불 요청 | 15000 | 7 | 원피스 | 2017-11-09T04:28:44 | 일반 | 39 | 2017-11-06T00:27:44 | 1 |
| 2017-11-09 00:40:16 | 국민 | 191.96.106.19 | 38 | 여성 | 인천광역시 | 쿠팡 | 35.42099073543264, 126.40755694075982 | 환불 요청 | 12000 | 1 | 가디건 | 2017-11-11T15:26:16 | 일반 | 47 | 2017-11-09T00:40:16 | 3 |
| 2017-11-06 10:28:26 | 우리 | 212.72.222.135 | 39 | 여성 | 전라북도 | 11번가 | 36.204083843135464, 129.24012691531485 | 부재중 | 10000 | 7 | 셔츠 | 2017-11-09T08:59:26 | 일반 | 48 | 2017-11-06T10:28:26 | 5 |
| 2017-11-08 02:00:16 | 국민 | 62.36.170.71 | 39 | 여성 | 대구광역시 | GS샵 | 35.56868760610102, 128.3793660808611 | 환불 요청 | 17000 | 1 | 스커트 | 2017-11-11T15:36:16 | 일반 | 1 | 2017-11-08T02:00:16 | 3 |
| 2017-11-07 15:29:40 | 국민 | 133.45.118.127 | 33 | 남성 | 대전광역시 | GS샵 | 37.87201865369285, 126.75507159240954 | 주소 오류 | 6000 | 1 | 티셔츠 | 2017-11-09T21:12:40 | 일반 | 3 | 2017-11-07T15:29:40 | 1 |
| 2017-11-09 17:35:41 | 국민 | 47.181.162.243 | 25 | 남성 | 경상남도 | g마켓 | 35.34373921692427, 129.82903178684617 | 상품 이상 | 5000 | 7 | 점퍼 | 2017-11-13T02:55:41 | 일반 | 12 | 2017-11-09T17:35:41 | 3 |

```python
# Number of orders in this bucket (its doc_count)
len(nov_4_to_10)
```

```
39
```
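
The two ranges above correspond to a single `date_range` aggregation with two entries. Elasticsearch includes `from` and excludes `to`, so the sketch below uses the following day's midnight as each upper bound. That selects the same orders as the Nov 1 to Nov 3 and Nov 4 to Nov 10 selections. The sketch assumes `주문시간` is a `date` field. `order_periods` and `in_date_range` are hypothetical names; `in_date_range` applies the same inclusive/exclusive rule to a single timestamp.

```python
from datetime import datetime

# date_range request body; assumes 주문시간 is mapped as a date field
date_range_body = {
    "size": 0,
    "aggs": {
        "order_periods": {  # arbitrary aggregation name
            "date_range": {
                "field": "주문시간",
                "ranges": [
                    {"from": "2017-11-01", "to": "2017-11-04"},  # Nov 1 to Nov 3
                    {"from": "2017-11-04", "to": "2017-11-11"},  # Nov 4 to Nov 10
                ],
            }
        }
    },
}


def in_date_range(ts, start, end):
    """Hypothetical helper: start <= ts < end, i.e. 'from' inclusive and 'to' exclusive."""
    return datetime.fromisoformat(start) <= datetime.fromisoformat(ts) < datetime.fromisoformat(end)


for r in date_range_body["aggs"]["order_periods"]["date_range"]["ranges"]:
    print(r["from"], in_date_range("2017-11-03T20:24:56", r["from"], r["to"]))
# 2017-11-01 True
# 2017-11-04 False
```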

<a name="ha"></a>
#### Histogram Aggregation
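: Creates buckets from fixed-width intervals of a numeric field. Elasticsearch puts each value into the bucket [key, key + interval), which includes the lower bound and excludes the upper one. `pd.cut`, however, closes intervals on the right by default, so with the bins below a 20-year-old is counted in `10대`. Pass `right=False` to match Elasticsearch.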

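```python
# pd.cut builds the intervals (10, 20], (20, 30], (30, 40], (40, 50] (right-closed by default)
bins = range(10, 51, 10)
group_names = ['10대', '20대', '30대', '40대']
df['연령대'] = pd.cut(df['고객나이'], bins, labels=group_names)
# Number of customers per age group
df.groupby(by='연령대').count().iloc[:, 0]
```

```
연령대
10대     6
20대    17
30대    18
40대     9
Name: 결제카드, dtype: int64
```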
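
Elasticsearch computes each value's bucket key as `floor((value - offset) / interval) * interval + offset`. The sketch below shows a `histogram` request body, assuming `고객나이` is numeric, with a hypothetical `histogram_key` helper that applies that formula. It illustrates the boundary rule: age 20 goes into the 20 bucket, not the 10 bucket.

```python
import math

# histogram request body; assumes 고객나이 is a numeric field
histogram_body = {
    "size": 0,
    "aggs": {
        "age_groups": {  # arbitrary aggregation name
            "histogram": {"field": "고객나이", "interval": 10}
        }
    },
}


def histogram_key(value, interval, offset=0):
    """Bucket key: floor((value - offset) / interval) * interval + offset."""
    return math.floor((value - offset) / interval) * interval + offset


print([histogram_key(age, 10) for age in (17, 19, 20, 29, 30, 44)])
# [10, 10, 20, 20, 30, 40]
```

On the pandas side, `pd.cut(df['고객나이'], bins, labels=group_names, right=False)` yields the same left-closed intervals [10, 20), [20, 30), and so on.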

<a name="ra"></a>
#### Range Aggregation
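: Creates buckets from numeric ranges that you define explicitly in the request.

* Up to 10,000 won (`만원 이하`): [0, 10,000)
* Up to 20,000 won (`이만원 이하`): [10,000, 20,000)
* Up to 30,000 won (`삼만원 이하`): [20,000, 30,000)

Elasticsearch range buckets include `from` and exclude `to`, as in the intervals above. `pd.cut` closes intervals on the right by default, so the cell below actually builds (0, 10,000], (10,000, 20,000] and (20,000, 30,000]. Pass `right=False` to reproduce the Elasticsearch ranges.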

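```python
# pd.cut builds (0, 10000], (10000, 20000], (20000, 30000] (right-closed by default)
bins = range(0, 30001, 10000)
group_names = ['만원 이하', '이만원 이하', '삼만원 이하']
df['가격대'] = pd.cut(df['상품가격'], bins, labels=group_names)
# Number of orders per price band
df.groupby(by='가격대').count().iloc[:, 0]
```

```
가격대
만원 이하     13
이만원 이하    20
삼만원 이하    17
Name: 결제카드, dtype: int64
```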
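
A `range` aggregation with the three intervals listed above might look like the sketch below, assuming `상품가격` is numeric. The optional `key` on each range names the bucket. `price_ranges`, `price_bands` and `range_bucket` are hypothetical names. `range_bucket` applies the `from <= value < to` rule. These ranges do not overlap, so each price lands in at most one bucket, and a price of exactly 10,000 goes to the second one.

```python
# (label, from, to) triples mirroring the list above
price_ranges = [
    ("만원 이하", 0, 10000),
    ("이만원 이하", 10000, 20000),
    ("삼만원 이하", 20000, 30000),
]

# range request body; assumes 상품가격 is a numeric field
range_body = {
    "size": 0,
    "aggs": {
        "price_bands": {  # arbitrary aggregation name
            "range": {
                "field": "상품가격",
                "ranges": [
                    {"key": label, "from": lo, "to": hi} for label, lo, hi in price_ranges
                ],
            }
        }
    },
}


def range_bucket(value, ranges):
    """Hypothetical helper: label of the range with lo <= value < hi, else None."""
    for label, lo, hi in ranges:
        if lo <= value < hi:
            return label
    return None


print(range_bucket(8000, price_ranges))   # 만원 이하
print(range_bucket(10000, price_ranges))  # 이만원 이하
```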

<a name="ta"></a>
#### Terms Aggregation
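: Creates one bucket per distinct value (term) of the chosen field.

* Count the payments per card,
* take the two cards with the most payments,
* and find the maximum payment amount (`상품가격`) for each.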

```python
# Count the payments per card...
card = df.groupby(by='결제카드').size()
card
```

```
결제카드
국민    13
롯데     6
삼성     1
시티     1
신한     8
우리    14
하나     7
dtype: int64
```

```python
# ...take the two cards with the most payments...
card_sort = card.sort_values(ascending=False).head(2)
card_sort
```

```
결제카드
우리    14
국민    13
dtype: int64
```

```python
# ...and find the maximum payment amount for each (우리)
df.loc[df['결제카드'] == '우리', '상품가격'].max()
```

```
27000
```

```python
# ...and find the maximum payment amount for each (국민)
df.loc[df['결제카드'] == '국민', '상품가격'].max()
```

```
26000
```
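
The two cells above hard-code the card names `우리` and `국민`. In Elasticsearch the same question can be answered with one `terms` aggregation using `size: 2`, since buckets are ordered by document count by default, plus a `max` sub-aggregation. The sketch assumes the field can be aggregated directly. Under default dynamic mapping, a string field would need its `.keyword` sub-field instead. `top_cards_max_price` is a hypothetical plain-Python equivalent over a list of `_source` dicts, and the records are toy data, not the notebook's.

```python
from collections import Counter

# terms request body with a max sub-aggregation (field mapping is an assumption)
terms_body = {
    "size": 0,
    "aggs": {
        "top_cards": {
            "terms": {"field": "결제카드", "size": 2},  # "결제카드.keyword" under default dynamic mapping
            "aggs": {"max_price": {"max": {"field": "상품가격"}}},
        }
    },
}


def top_cards_max_price(docs, n=2):
    """Hypothetical helper: max 상품가격 for the n most frequent 결제카드 values."""
    counts = Counter(doc["결제카드"] for doc in docs)
    return {
        card: max(doc["상품가격"] for doc in docs if doc["결제카드"] == card)
        for card, _ in counts.most_common(n)
    }


# Toy records: 롯데 has the highest price but too few payments to make the top 2
card_docs = [
    {"결제카드": "우리", "상품가격": 12000},
    {"결제카드": "우리", "상품가격": 15000},
    {"결제카드": "우리", "상품가격": 7000},
    {"결제카드": "국민", "상품가격": 8000},
    {"결제카드": "국민", "상품가격": 9000},
    {"결제카드": "롯데", "상품가격": 21000},
]
print(top_cards_max_price(card_docs))  # {'우리': 15000, '국민': 9000}
```

In pandas, `df[df['결제카드'].isin(card_sort.index)].groupby('결제카드')['상품가격'].max()` likewise returns both maxima in one step, without one cell per card.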

<a name="fa"></a>
#### Filters Aggregation
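: Creates one bucket for each condition written as a query string, holding the documents that satisfy it.

* Among purchases made on 옥션 or 티몬,
* find the minimum purchase price for each site.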

```python
# Among purchases made on 옥션 or 티몬...
# (isin matches the site name exactly; str.contains would also match substrings)
df_filter = df[df['구매사이트'].isin(['옥션', '티몬'])]
df_filter
```

(Only the first 10 rows are shown.)

| | 결제카드 | 고객ip | 고객나이 | 고객성별 | 고객주소_시도 | 구매사이트 | 물건좌표 | 배송메모 | 상품가격 | 상품개수 | 상품분류 | 수령시간 | 예약여부 | 접수번호 | 주문시간 | 판매자평점 | 연령대 | 가격대 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 국민 | 87.186.104.168 | 28 | 여성 | 광주광역시 | 옥션 | 36.69404479407181, 128.77849366125415 | 부재중 | 8000 | 7 | 티셔츠 | 2017-11-04T18:14:01 | 일반 | 7 | 2017-11-01T05:45:01 | 1 | 20대 | 만원 이하 |
| 1 | 하나 | 61.25.11.51 | 32 | 남성 | 제주특별자치도 | 티몬 | 37.33710638991395, 128.19801428893294 | 부재중 | 13000 | 1 | 자켓 | 2017-11-11T03:32:07 | 일반 | 8 | 2017-11-09T16:58:07 | 3 | 30대 | 이만원 이하 |
| 5 | 신한 | 28.170.18.252 | 30 | 남성 | 전라북도 | 티몬 | 37.92452138343442, 128.67772065089744 | 관리실에 맡김 | 13000 | 7 | 티셔츠 | 2017-11-08T11:11:55 | 예약 | 37 | 2017-11-05T00:41:55 | 3 | 20대 | 이만원 이하 |
| 6 | 신한 | 197.247.246.203 | 41 | 여성 | 충청남도 | 옥션 | 36.663065088906095, 127.92452258110524 | 환불 요청 | 15000 | 7 | 원피스 | 2017-11-09T04:28:44 | 일반 | 39 | 2017-11-06T00:27:44 | 1 | 40대 | 이만원 이하 |
| 14 | 국민 | 161.124.192.177 | 44 | 여성 | 서울특별시 | 옥션 | 37.89167509773231, 129.89861013896265 | 상품 이상 | 21000 | 1 | 니트 | 2017-11-04T18:14:51 | 일반 | 26 | 2017-11-04T08:22:51 | 5 | 40대 | 삼만원 이하 |
| 15 | 신한 | 110.126.36.108 | 20 | 남성 | 부산광역시 | 티몬 | 35.73286207547757, 128.237959511983 | 주소 오류 | 28000 | 1 | 팬츠 | 2017-11-06T13:32:22 | 일반 | 28 | 2017-11-03T20:15:22 | 4 | 10대 | 삼만원 이하 |
| 18 | 우리 | 15.236.59.18 | 35 | 여성 | 전라남도 | 옥션 | 37.509031233865585, 127.79764126292338 | 상품 이상 | 23000 | 1 | 청바지 | 2017-11-07T20:21:07 | 일반 | 43 | 2017-11-05T04:57:07 | 4 | 30대 | 삼만원 이하 |
| 23 | 하나 | 148.19.203.84 | 33 | 남성 | 강원도 | 옥션 | 35.1888202093495, 128.58916079461932 | 상품 이상 | 19000 | 1 | 코트 | 2017-11-10T22:46:55 | 일반 | 5 | 2017-11-06T04:46:55 | 2 | 30대 | 이만원 이하 |
| 32 | 롯데 | 230.250.155.4 | 41 | 여성 | 충청북도 | 옥션 | 35.29706664216842, 128.3612301978423 | 시간 내에 배송 못함 | 27000 | 7 | 코트 | 2017-11-13T23:19:29 | 일반 | 6 | 2017-11-10T23:15:29 | 1 | 40대 | 삼만원 이하 |
| 33 | 신한 | 246.158.150.193 | 19 | 남성 | 충청북도 | 티몬 | 35.779125040380016, 127.75099764954581 | 부재중 | 23000 | 1 | 점퍼 | 2017-11-05T00:16:58 | 일반 | 9 | 2017-11-04T20:38:58 | 1 | 10대 | 삼만원 이하 |


```python
# ...find the minimum purchase price for each site (티몬)
df_filter.loc[df_filter['구매사이트'] == '티몬', '상품가격'].min()
```

```
5000
```

```python
# ...find the minimum purchase price for each site (옥션)
df_filter.loc[df_filter['구매사이트'] == '옥션', '상품가격'].min()
```

```
8000
```
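
In Elasticsearch this could be written as a `filters` aggregation with one named filter per site and a `min` sub-aggregation. That returns both minimums in one request rather than one cell per site. This sketch expresses the filters as `match` queries; those queries and the field names are assumptions about the index mapping. `min_price_per_site` is a hypothetical plain-Python equivalent, and the records are toy data. An empty bucket maps to `None`, mirroring the null value a `min` over no documents returns.

```python
# filters request body with a min sub-aggregation (queries and fields are assumptions)
filters_body = {
    "size": 0,
    "aggs": {
        "sites": {
            "filters": {
                "filters": {
                    "옥션": {"match": {"구매사이트": "옥션"}},
                    "티몬": {"match": {"구매사이트": "티몬"}},
                }
            },
            "aggs": {"min_price": {"min": {"field": "상품가격"}}},
        }
    },
}


def min_price_per_site(docs, sites):
    """Hypothetical helper: minimum 상품가격 per site, or None if no document matches."""
    result = {}
    for site in sites:
        prices = [doc["상품가격"] for doc in docs if doc["구매사이트"] == site]
        result[site] = min(prices) if prices else None
    return result


# Toy records; 쿠팡 matches neither filter and is ignored
site_docs = [
    {"구매사이트": "옥션", "상품가격": 12000},
    {"구매사이트": "옥션", "상품가격": 9000},
    {"구매사이트": "티몬", "상품가격": 14000},
    {"구매사이트": "쿠팡", "상품가격": 3000},
]
sites = list(filters_body["aggs"]["sites"]["filters"]["filters"])
print(min_price_per_site(site_docs, sites))  # {'옥션': 9000, '티몬': 14000}
```

In pandas, `df_filter.groupby('구매사이트')['상품가격'].min()` similarly returns both minimums at once.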