# Aggregation

## Load Data

```r
library(dplyr)
# Load the hotel reservation data; this provides reserve_tb, used in the cells below.
source('preprocess/load_data/data_loader.R')
load_hotel_reserve()
```

## Counting rows and distinct values

Group the reservations by `hotel_id` with [group_by()](https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/group_by), then count the reservations with `n()` and the distinct customers with [n_distinct()](https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/n_distinct) inside [summarise()](https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/summarise).

```r
# One row per hotel: number of reservations and number of distinct customers
result <- reserve_tb %>%
  group_by(hotel_id) %>%
  summarise(rsv_cnt = n(),
            cus_cnt = n_distinct(customer_id))

head(result, 10)
```
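The difference between `n()` and `n_distinct()` is easiest to see on a tiny table. The sketch below uses hypothetical toy data (not the book's data set) in which one customer books the same hotel twice.

```r
library(dplyr)

# Hypothetical toy reservations: customer c1 books hotel h1 twice,
# so h1 has 3 reservations but only 2 distinct customers.
toy <- data.frame(
  hotel_id    = c("h1", "h1", "h1", "h2"),
  customer_id = c("c1", "c1", "c2", "c3"),
  stringsAsFactors = FALSE
)

counts <- toy %>%
  group_by(hotel_id) %>%
  summarise(rsv_cnt = n(),                      # rows per hotel
            cus_cnt = n_distinct(customer_id))  # unique customers per hotel

counts
```

Here `rsv_cnt` is 3 for h1 and 1 for h2, while `cus_cnt` is 2 for h1 and 1 for h2.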

## Sum

Group by both `hotel_id` and `people_num`, then total `total_price` within each group with [sum()](https://www.rdocumentation.org/packages/base/versions/3.6.0/topics/sum).

```r
# Total price for each (hotel_id, people_num) combination
result <- reserve_tb %>%
  group_by(hotel_id, people_num) %>%
  summarise(price_sum = sum(total_price))

head(result, 10)
```
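When you group by two columns, `summarise()` removes only the last grouping variable, so the result is still grouped by `hotel_id`. The sketch below shows this on hypothetical toy data and calls `ungroup()` before any further work. This is a precaution of ours, not a step from the original notebook.

```r
library(dplyr)

# Hypothetical toy data: two party sizes at hotel h1, one at hotel h2.
toy <- data.frame(
  hotel_id    = c("h1", "h1", "h1", "h2"),
  people_num  = c(1, 1, 2, 2),
  total_price = c(10000, 12000, 30000, 25000),
  stringsAsFactors = FALSE
)

sums <- toy %>%
  group_by(hotel_id, people_num) %>%
  summarise(price_sum = sum(total_price))  # 22000, 30000, 25000

# Only people_num was dropped from the grouping; hotel_id remains.
still_grouped <- group_vars(sums)

# Remove the leftover grouping so later mutate()/filter() act on all rows.
sums <- ungroup(sums)
```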

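## Representative values (max, min, mean, median, percentile)

Compute several summary statistics of `total_price` per hotel at once: `max()`, `min()`, `mean()`, `median()`, and the 20th percentile with `quantile()`.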

```r
# Summary statistics of total_price for each hotel
result <- reserve_tb %>%
  group_by(hotel_id) %>%
  summarise(price_max = max(total_price),
            price_min = min(total_price),
            price_avg = mean(total_price),
            price_median = median(total_price),
            price_20per = quantile(total_price, 0.2))  # 20th percentile

head(result, 10)
```
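`quantile()` interpolates between observations rather than returning an observed value. A small worked check on a hypothetical price vector shows where the 20th percentile lands under R's default method (`type = 7`).

```r
# Hypothetical prices, already sorted for readability.
prices <- c(10000, 20000, 30000, 40000, 50000)

max(prices)     # 50000
min(prices)     # 10000
mean(prices)    # 30000
median(prices)  # 30000

# Default type = 7: position h = (n - 1) * p + 1 = 4 * 0.2 + 1 = 1.8,
# i.e. 80% of the way from the 1st value (10000) to the 2nd (20000).
p20 <- unname(quantile(prices, 0.2))
p20             # 18000
```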

## Variance and standard deviation

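`var()` and `sd()` compute the sample variance and standard deviation, dividing by n - 1, so they return `NA` for a hotel with only one reservation. [coalesce()](https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/coalesce) replaces those `NA` values with 0.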

```r
# Variance and standard deviation of total_price per hotel; NA (single row) becomes 0
result <- reserve_tb %>%
  group_by(hotel_id) %>%
  summarise(price_var = coalesce(var(total_price), 0),
            price_std = coalesce(sd(total_price), 0))

head(result, 10)
```
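The sketch below shows the single-row case directly. With one observation the sample variance is undefined, and R returns `NA`. `coalesce()` fills only the `NA` entries and leaves the other values unchanged. The numbers are hypothetical.

```r
library(dplyr)

var(5000)           # NA: only one observation, so n - 1 = 0
sd(5000)            # NA
var(c(4000, 6000))  # 2e6 = ((-1000)^2 + 1000^2) / (2 - 1)

# coalesce() replaces each NA with the fallback 0; 2e6 is kept as-is.
vals <- coalesce(c(var(5000), var(c(4000, 6000))), 0)  # c(0, 2e6)
```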

## Mode

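Round `total_price` to the nearest 1,000 with `round(x, -3)`, count how often each rounded value occurs with `table()`, and pick the most frequent one with `which.max()`. `names()` returns that value as a character string. If several values tie, `which.max()` keeps the first, which is the smallest value because `table()` sorts its entries.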

```r
# Mode of total_price after rounding to the nearest 1,000 (returned as a string)
names(which.max(table(round(reserve_tb$total_price, -3))))
```
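The one-liner is easier to follow when it is split into steps. The sketch below runs each step on hypothetical prices and converts the result back to a number, which is an extra step the original line does not take.

```r
# Hypothetical prices; round(x, -3) rounds to the nearest 1,000.
prices  <- c(9800, 10200, 10400, 20100, 19900, 15000)
rounded <- round(prices, -3)          # 10000 10000 10000 20000 20000 15000
counts  <- table(rounded)             # 10000: 3, 15000: 1, 20000: 2
mode_chr <- names(which.max(counts))  # "10000" (a character string)
mode_num <- as.numeric(mode_chr)      # 10000, if a numeric value is needed
```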

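## Ranking (1): reservation order per customer

Convert `reserve_datetime` to `POSIXct`, then number each customer's reservations in chronological order with `row_number()`. Reservations with identical timestamps are numbered in the order they appear.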

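```r
# Parse the reservation timestamp so it sorts chronologically
reserve_tb$reserve_datetime <-
  as.POSIXct(reserve_tb$reserve_datetime, format = '%Y-%m-%d %H:%M:%S')

# Number each customer's reservations from oldest (1) to newest
result <- reserve_tb %>%
  group_by(customer_id) %>%
  mutate(log_no = row_number(reserve_datetime))

head(result, 10)
```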
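Because the call uses `mutate()` rather than `summarise()`, it keeps every row and only adds the rank within each customer. The sketch below uses hypothetical toy data with a timestamp tie to show how `row_number()` numbers rows per group.

```r
library(dplyr)

# Hypothetical reservations; c1 has two bookings with the same timestamp.
toy <- data.frame(
  customer_id = c("c1", "c1", "c1", "c2"),
  reserve_datetime = as.POSIXct(
    c("2016-03-01 10:00:00", "2016-01-15 09:00:00",
      "2016-01-15 09:00:00", "2016-02-01 12:00:00"),
    format = "%Y-%m-%d %H:%M:%S", tz = "UTC"),
  stringsAsFactors = FALSE
)

numbered <- toy %>%
  group_by(customer_id) %>%
  mutate(log_no = row_number(reserve_datetime)) %>%  # rank within customer
  ungroup()

# Row order is unchanged: the March booking is c1's 3rd, the tied January
# bookings get 1 and 2 by position, and c2's only booking gets 1.
numbered$log_no  # 3 1 2 1
```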

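## Ranking (2): hotels by reservation count

Count the reservations per hotel, then rank the hotels by that count in descending order with `min_rank(desc(rsv_cnt))`. Tied hotels share the smallest rank, and the following ranks are skipped (1, 1, 3, ...).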

```r
result <- reserve_tb %>%
  group_by(hotel_id) %>%
  summarise(rsv_cnt = n()) %>%
  # Rank 1 = most reservations; ties share the minimum rank
  transmute(hotel_id, rsv_cnt, rsv_cnt_rank = min_rank(desc(rsv_cnt)))

# Show the top 10 hotels by rank
result %>% arrange(rsv_cnt_rank) %>% head(10)
```
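`min_rank()` is one of several dplyr ranking functions, and they differ only in how they handle ties. The sketch below compares three of them on hypothetical counts with a tie for first place.

```r
library(dplyr)

# Hypothetical reservation counts: two hotels tie with 8.
rsv_cnt <- c(5, 8, 8, 3)

r_min   <- min_rank(desc(rsv_cnt))    # 3 1 1 4: ties share rank 1, rank 2 skipped
r_dense <- dense_rank(desc(rsv_cnt))  # 2 1 1 3: ties share rank 1, no gap
r_row   <- row_number(desc(rsv_cnt))  # 3 1 2 4: ties broken by position
```

`min_rank()` matches the usual "competition" ranking. `dense_rank()` suits cases where consecutive rank numbers are needed.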