# Group by
- Visual Python: Data Analysis > Groupby

1. Group the rows by the key column(s)
2. Select the columns to aggregate
3. Compute an aggregate (sum, mean, variance, etc.) of the selected columns for each group
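
The three steps above can be sketched on a small in-memory table. The data below is a hypothetical stand-in for `tips.csv` (the values are invented for illustration), and the variable names `grouped`, `selected`, and `result` are only there to label each step.

```python
import pandas as pd

# Hypothetical toy data standing in for tips.csv (values invented).
df = pd.DataFrame({
    'day': ['Thur', 'Thur', 'Fri', 'Fri', 'Sat'],
    'total_bill': [10.0, 20.0, 30.0, 50.0, 40.0],
    'tip': [1.0, 3.0, 4.0, 6.0, 5.0],
})

# 1. Group the rows by the key column.
grouped = df.groupby('day')

# 2. Select the columns to aggregate.
selected = grouped[['total_bill', 'tip']]

# 3. Compute one aggregate value per group and column.
result = selected.mean()
print(result)
```

In practice the three steps are usually chained into one expression, as in the cells that follow.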

---

# Import Python packages

In [None]:
# Visual Python: Data Analysis > Import
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

# Load files

In [None]:
# Visual Python: Data Analysis > File
df_tips = pd.read_csv('./data/tips.csv')
df_tips

In [None]:
# Visual Python: Data Analysis > File
df_w = pd.read_csv('./data/washing_machine.csv')
df_w

# 1. Single aggregate function

## 1.1. Group by `day`, Agg. function: `mean()`

In [None]:
# Visual Python: Data Analysis > Groupby
df_tips.groupby('day').mean(numeric_only=True)
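
A minimal sketch of what `numeric_only=True` does, using a hypothetical table with one text column (`sex`). The text column is left out of the result. Without the flag, recent pandas versions are expected to raise an error on such a column rather than drop it silently.

```python
import pandas as pd

# Hypothetical data: 'sex' is text, the other value columns are numeric.
df = pd.DataFrame({
    'day': ['Thur', 'Thur', 'Fri'],
    'sex': ['Female', 'Male', 'Male'],
    'total_bill': [10.0, 20.0, 30.0],
    'size': [2, 4, 3],
})

# Only the numeric columns are averaged; 'sex' does not appear in the result.
means = df.groupby('day').mean(numeric_only=True)
print(means.columns.tolist())
```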

## 1.2. Group by `day`, `smoker`, Agg. function: `mean()`

In [None]:
# Visual Python: Data Analysis > Groupby
df_tips.groupby(['day','smoker']).mean(numeric_only=True)
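
Grouping by two columns yields a result indexed by `(day, smoker)` pairs. The sketch below, on hypothetical data, shows one way to read a single group and to pivot the inner key into columns with `unstack`. The names `by_two`, `fri_yes`, and `wide` are illustrative.

```python
import pandas as pd

# Hypothetical data with two grouping keys.
df = pd.DataFrame({
    'day': ['Fri', 'Fri', 'Sat', 'Sat'],
    'smoker': ['No', 'Yes', 'No', 'Yes'],
    'tip': [2.0, 4.0, 3.0, 5.0],
})

by_two = df.groupby(['day', 'smoker']).mean(numeric_only=True)

# Look up one group by its (day, smoker) key pair.
fri_yes = by_two.loc[('Fri', 'Yes'), 'tip']

# Move the 'smoker' level into the columns: a day x smoker table.
wide = by_two['tip'].unstack('smoker')
print(wide)
```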

## 1.3. Group by `day`, Agg. column: `total_bill`, Agg. function: `mean()`

In [None]:
# Visual Python: Data Analysis > Groupby
df_tips.groupby('day')['total_bill'].mean(numeric_only=True)

## 1.4. Group by `day`, Agg. columns: `total_bill`, `tip`, Agg. function: `mean()`

In [None]:
# Visual Python: Data Analysis > Groupby
df_tips.groupby('day')[['total_bill','tip']].mean(numeric_only=True)

# 2. Multiple aggregate functions

## 2.1. Group by `day`, Agg. columns: `total_bill`, `tip`, Agg. functions: `std()`, `mean()`

In [None]:
# Visual Python: Data Analysis > Groupby
df_tips.groupby('day')[['total_bill','tip']].agg(['std', 'mean'])

## 2.2. Group by `day`, Agg. columns: `total_bill`, `tip`, Agg. functions: `std()`, `mean()`, Rename: `STD`, `AVG`

In [None]:
# Visual Python: Data Analysis > Groupby
df_tips.groupby('day')[['total_bill','tip']].agg([('STD', 'std'), ('AVG', 'mean')])
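
Passing `(name, function)` tuples renames the inner level of the column index, so each result column is addressed by a `(column, name)` pair. A small sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical toy data (values invented).
df = pd.DataFrame({
    'day': ['Thur', 'Thur', 'Fri', 'Fri', 'Sat'],
    'total_bill': [10.0, 20.0, 30.0, 50.0, 40.0],
    'tip': [1.0, 3.0, 4.0, 6.0, 5.0],
})

out = df.groupby('day')[['total_bill', 'tip']].agg([('STD', 'std'), ('AVG', 'mean')])

# The columns form a two-level index: (original column, new name).
print(out.columns.tolist())

# Select one aggregated column with its (column, name) pair.
tip_avg = out[('tip', 'AVG')]
print(tip_avg)
```

Groups with a single row (here `Sat`) have no sample standard deviation, so their `STD` entry is `NaN`.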

# 3. Multiple aggregate functions for each column

## 3.1. Group by `day`, Agg. columns: `total_bill`, `tip`, Agg. functions: `total_bill - sum()`, `tip - mean()`

In [None]:
# Visual Python: Data Analysis > Groupby
df_tips.groupby('day')[['total_bill','tip']].agg({'total_bill': 'sum', 'tip': 'mean'})

## 3.2. Group by `day`, Agg. columns: `total_bill`, `tip`, Agg. functions: `total_bill - sum()`, `tip - std(), max(), mean(), median()`

In [None]:
# Visual Python: Data Analysis > Groupby
df_tips.groupby('day')[['total_bill','tip']].agg({'total_bill': 'sum', 'tip': ['std','max','mean','median']})

## 3.3. Group by `day`, Agg. columns: `total_bill`, `tip`, Agg. functions: `total_bill - sum()`, `tip - std(), mean()`, Rename: `SUM`, `STD`, `AVG`

In [None]:
# Visual Python: Data Analysis > Groupby
df_tips.groupby('day')[['total_bill','tip']].agg({'total_bill': [('SUM', 'sum')], 'tip': [('STD', 'std'),('AVG', 'mean')]})
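
If flat column names are preferred over a two-level column index, pandas named aggregation is one alternative to the dictionary form. This is not what Visual Python generates; it is shown here only as a sketch on hypothetical data.

```python
import pandas as pd

# Hypothetical toy data (values invented).
df = pd.DataFrame({
    'day': ['Thur', 'Thur', 'Fri', 'Fri', 'Sat'],
    'total_bill': [10.0, 20.0, 30.0, 50.0, 40.0],
    'tip': [1.0, 3.0, 4.0, 6.0, 5.0],
})

# Each keyword is an output column name; the value is (input column, function).
summary = df.groupby('day').agg(
    SUM=('total_bill', 'sum'),
    STD=('tip', 'std'),
    AVG=('tip', 'mean'),
)
print(summary)
```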

# 4. reset_index
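Check the `reset_index` option. The generated code passes `as_index=False`, so the group keys are returned as regular columns instead of the index.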

In [None]:
# Visual Python: Data Analysis > Groupby
df_tips.groupby(['day','smoker'], as_index=False).mean(numeric_only=True)

In [None]:
# Visual Python: Data Analysis > Groupby
df_tips.groupby('day', as_index=False)['tip'].mean(numeric_only=True)
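
For comparison, the sketch below (hypothetical data) computes the same table two ways: with `as_index=False`, and by calling `reset_index()` on the default result. Both give a `day` column and a `tip` column.

```python
import pandas as pd

# Hypothetical toy data (values invented).
df = pd.DataFrame({
    'day': ['Thur', 'Thur', 'Fri', 'Fri', 'Sat'],
    'tip': [1.0, 3.0, 4.0, 6.0, 5.0],
})

# Group key kept as a regular column from the start.
a = df.groupby('day', as_index=False)['tip'].mean()

# Group key first becomes the index, then is moved back to a column.
b = df.groupby('day')['tip'].mean().reset_index()

print(a)
print(b)
```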

# 5. Result: Series to DataFrame
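Check the `To DataFrame` option. The generated code selects the column with double brackets (`[['tip']]`), so the result is a DataFrame rather than a Series.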

In [None]:
# Visual Python: Data Analysis > Groupby
df_tips.groupby('day')[['tip']].mean(numeric_only=True)
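
The difference between single and double brackets can be checked directly. The sketch below uses hypothetical data; `to_frame()` is shown as another way to turn the Series result into a DataFrame.

```python
import pandas as pd

# Hypothetical toy data (values invented).
df = pd.DataFrame({
    'day': ['Thur', 'Thur', 'Fri'],
    'tip': [1.0, 3.0, 4.0],
})

s = df.groupby('day')['tip'].mean()     # single brackets: Series
d = df.groupby('day')[['tip']].mean()   # double brackets: DataFrame
d2 = s.to_frame()                       # Series converted afterwards

print(type(s).__name__, type(d).__name__, type(d2).__name__)
```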

# 6. Grouper

## 6.1. As type (object --> datetime)

In [None]:
# Visual Python: Data Analysis > Frame
df_w = df_w.astype({'create_dt_utc': 'datetime64[ns]'})
df_w
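
When a timestamp column may contain values that cannot be parsed, `pd.to_datetime` with `errors='coerce'` is one alternative to `astype`: unparseable entries become `NaT` instead of stopping the conversion. The rows below are hypothetical and are not taken from `washing_machine.csv`.

```python
import pandas as pd

# Hypothetical raw strings; the last one is deliberately invalid.
raw = pd.DataFrame({
    'create_dt_utc': ['2021-03-01 00:15:00', '2021-03-01 05:30:00', 'not a date'],
})

# Invalid entries are converted to NaT rather than raising an error.
raw['create_dt_utc'] = pd.to_datetime(raw['create_dt_utc'], errors='coerce')

print(raw.dtypes)
print(raw['create_dt_utc'].isna().sum())
```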

## 6.2. Group by `Grouper(create_dt_utc)`, Agg. function: `size()`

In [None]:
# Visual Python: Data Analysis > Groupby
# Lowercase 'h' is the hourly alias; uppercase 'H' is deprecated in recent pandas
df_count = df_w.groupby(pd.Grouper(key='create_dt_utc', freq='4h')).size()
df_count
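
A small sketch of how `pd.Grouper` bins timestamps, using three hypothetical events. It uses the lowercase `4h` alias, which recent pandas versions prefer over `4H`. Bins start at midnight of the first day, and a bin with no rows still appears with a count of 0.

```python
import pandas as pd

# Hypothetical event times (not from washing_machine.csv).
events = pd.DataFrame({
    'create_dt_utc': pd.to_datetime([
        '2021-03-01 00:15',
        '2021-03-01 01:40',
        '2021-03-01 09:05',
    ]),
})

# Count rows per 4-hour bin: 00:00-04:00, 04:00-08:00, 08:00-12:00.
counts = events.groupby(pd.Grouper(key='create_dt_utc', freq='4h')).size()
print(counts)
```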

## 6.3. Bar chart

In [None]:
# Visual Python: Visualization > Chart Style
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
plt.rc('figure', figsize=(8, 6))

from matplotlib import rcParams
rcParams['font.size'] = 10
rcParams['axes.unicode_minus'] = False

In [None]:
# Visual Python: Library > Pandas Plot
df_count.plot(kind='barh')
plt.show()

---

# [Practice] Try it yourself

## Practice 1. Review with the tips data

In [None]:
# Visual Python: Data Analysis > File
tips = pd.read_csv('./data/tips.csv')
tips

#### Q. Using `time` as the group key, compute the sum of `total_bill` and of `tip` in the tips data.

In [None]:
tips.groupby('time')[['total_bill','tip']].sum(numeric_only=True)

#### Q. Using `time` as the group key, compute the sum and mean of `total_bill` and `tip` in the tips data, and name the resulting columns `합계` (sum) and `평균` (mean).

In [None]:
tips.groupby('time')[['total_bill','tip']].agg([('합계', 'sum'), ('평균', 'mean')])

#### Q. Using `time` as the group key, compute the sum of `total_bill` and the sum and mean of `tip` in the tips data, and name the resulting columns `합계` (sum) and `평균` (mean).

In [None]:
tips.groupby('time')[['total_bill','tip']].agg({'total_bill': [('합계', 'sum')], 'tip': [('합계', 'sum'),('평균', 'mean')]})

#### Q. Group the `df_w` data by `create_dt_utc` in one-day (`1D`) bins and print the number of rows (size) in each bin.

In [None]:
df_w.groupby(pd.Grouper(key='create_dt_utc', freq='1D')).size()

## Practice 2. Aggregating the titanic data

In [None]:
# Visual Python: Data Analysis > File
df_titanic = pd.read_csv('./data/titanic.csv')
df_titanic

#### Q. Using `Pclass` as the group key, compute the mean of the `Age` and `Fare` columns.

In [None]:
# Visual Python: Data Analysis > Groupby
df_titanic.groupby('Pclass')[['Age','Fare']].mean(numeric_only=True)

#### Q. Using `Survived` as the group key, compute the mean and median of the `Age`, `Pclass`, and `Fare` columns.

In [None]:
# Visual Python: Data Analysis > Groupby
df_titanic.groupby('Survived')[['Pclass','Age','Fare']].agg(['mean', 'median'])

#### Q. Using `Survived` and `Pclass` as the group keys, compute the min, max, and mean of `Age` and the mean, sum, and quantile of `Fare`.

In [None]:
# Visual Python: Data Analysis > Groupby
df_titanic.groupby(['Survived','Pclass']).agg({'Age': ['max','mean','min'], 'Fare': ['sum','mean','quantile']})
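
Note that the string `'quantile'` uses the default `q=0.5`, so it returns the median. To request another quantile inside `agg`, one option is a small helper function. The helper name `q25` and the data below are hypothetical.

```python
import pandas as pd

# Hypothetical fares for two passenger classes (values invented).
df = pd.DataFrame({
    'Pclass': [1, 1, 1, 3, 3, 3],
    'Fare': [10.0, 20.0, 30.0, 1.0, 2.0, 3.0],
})

def q25(s):
    """Return the 25th percentile of a Series."""
    return s.quantile(0.25)

# 'quantile' (default q=0.5) matches 'median'; q25 gives the lower quartile.
out = df.groupby('Pclass')['Fare'].agg(['quantile', 'median', q25])
print(out)
```

The output column for the helper takes the function's name, here `q25`.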

#### Q. Using `Survived` and `Pclass` as the group keys, compute the min, max, and mean of `Age`, and label the resulting columns `최소나이` (min age), `최대나이` (max age), and `평균나이` (mean age).

In [None]:
# Visual Python: Data Analysis > Groupby
df_titanic.groupby(['Survived','Pclass']).agg({'Age': [('최대나이', 'max'),('평균나이', 'mean'),('최소나이', 'min')]})

---
