# 🚩 DataFrame Aggregation & Reshape
## Key Topics
1. Grouping Columns
2. Multi-Index DataFrames
3. Aggregating Groups
4. Pivot Tables
5. Melting DataFrames
## Objectives
1. Group DataFrames by one or more columns and calculate aggregate statistics for each group
2. Learn to access a multi-index DataFrame and reset it to a single index
3. Create pivot tables to summarize data
4. Melt wide tables into a long, tabular form

In [1]:
import pandas as pd
import numpy as np

## 1. DataFrame Aggregation
### > The groupby method
- To group data, call the groupby method and specify the column to group by
    - By default, the grouping column becomes the index of the result
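
As a minimal sketch of the default behavior described above, the snippet below groups a small hand-made DataFrame and checks that the grouping column ends up as the index. The `games` data is a hypothetical stand-in, not the Premier League file used in this notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for the match results used in this notebook.
games = pd.DataFrame({
    "HomeTeam": ["Arsenal", "Chelsea", "Arsenal", "Chelsea"],
    "HomeGoals": [0, 2, 3, 1],
})

# Group by one column, select the column to aggregate, then aggregate.
avg_goals = games.groupby("HomeTeam")["HomeGoals"].mean()

# The grouping column is now the index of the result.
print(avg_goals.index.name)   # HomeTeam
print(avg_goals["Arsenal"])   # 1.5
```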

In [2]:
league = pd.read_excel('./data/premier_league_games.xlsx')
league.head()

Unnamed: 0,id,league_name,season,HomeTeam,AwayTeam,HomeGoals,AwayGoals
0,4389,England Premier League,2015/2016,Arsenal,West Ham United,0,2
1,4390,England Premier League,2015/2016,Bournemouth,Aston Villa,0,1
2,4391,England Premier League,2015/2016,Chelsea,Swansea City,2,2
3,4392,England Premier League,2015/2016,Everton,Watford,2,2
4,4393,England Premier League,2015/2016,Leicester City,Sunderland,4,2


In [3]:
# Selecting with single brackets returns a Series
league.groupby('HomeTeam')['HomeGoals'].mean()

HomeTeam
Arsenal                 1.631579
Aston Villa             0.736842
Bournemouth             1.210526
Chelsea                 1.684211
Crystal Palace          1.000000
Everton                 1.842105
Leicester City          1.842105
Liverpool               1.736842
Manchester City         2.473684
Manchester United       1.421053
Newcastle United        1.684211
Norwich City            1.368421
Southampton             2.052632
Stoke City              1.157895
Sunderland              1.210526
Swansea City            1.052632
Tottenham Hotspur       1.842105
Watford                 1.052632
West Bromwich Albion    1.052632
West Ham United         1.789474
Name: HomeGoals, dtype: float64

In [4]:
# Selecting with double brackets (a list of columns) returns a DataFrame
league.groupby('HomeTeam')[['HomeGoals']].mean()

HomeTeam,HomeGoals
Arsenal,1.631579
Aston Villa,0.736842
Bournemouth,1.210526
Chelsea,1.684211
Crystal Palace,1.0
Everton,1.842105
Leicester City,1.842105
Liverpool,1.736842
Manchester City,2.473684
Manchester United,1.421053
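
The difference between the two cells above comes down to how the column is selected after `groupby`. A small sketch on hypothetical toy data (the `games` frame is an assumption for illustration only):

```python
import pandas as pd

# Hypothetical toy data, not the notebook's Excel file.
games = pd.DataFrame({
    "HomeTeam": ["Arsenal", "Chelsea", "Arsenal"],
    "HomeGoals": [0, 2, 4],
})

grouped = games.groupby("HomeTeam")

as_series = grouped["HomeGoals"].mean()    # single brackets -> Series
as_frame = grouped[["HomeGoals"]].mean()   # double brackets -> DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
```

The DataFrame form is usually the more convenient one when further columns or aggregations will be added later.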


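## 2. Grouping by Multiple Columns
### > The groupby method
- Pass a list of column names to group by more than one column
- The result is indexed by a MultiIndex, with one level per grouping column
- Use the as_index parameter (as_index=False) to keep the grouping columns as regular columns instead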
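
The three points above can be sketched on a small example. The `games` frame below is hypothetical toy data; only `groupby`, `as_index`, and `reset_index` are real pandas names.

```python
import pandas as pd

# Hypothetical toy data, not the notebook's Excel file.
games = pd.DataFrame({
    "season": ["2014/2015", "2014/2015", "2015/2016", "2015/2016"],
    "HomeTeam": ["Arsenal", "Arsenal", "Arsenal", "Chelsea"],
    "HomeGoals": [1, 2, 0, 3],
})

# A list of columns produces a MultiIndex with one level per column.
multi = games.groupby(["season", "HomeTeam"])[["HomeGoals"]].sum()
print(multi.index.nlevels)  # 2

# as_index=False keeps the grouping columns as ordinary columns.
flat = games.groupby(["season", "HomeTeam"], as_index=False)[["HomeGoals"]].sum()
print(list(flat.columns))   # ['season', 'HomeTeam', 'HomeGoals']

# reset_index() is another way to flatten an existing MultiIndex result.
flattened = multi.reset_index()
```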

In [5]:
league = pd.read_excel('./data/premier_league_games_full.xlsx')
league.head()

Unnamed: 0,id,league_name,season,HomeTeam,AwayTeam,HomeGoals,AwayGoals
0,1729,England Premier League,2008/2009,Manchester United,Newcastle United,1,1
1,1730,England Premier League,2008/2009,Arsenal,West Bromwich Albion,1,0
2,1731,England Premier League,2008/2009,Sunderland,Liverpool,0,1
3,1732,England Premier League,2008/2009,West Ham United,Wigan Athletic,2,1
4,1733,England Premier League,2008/2009,Aston Villa,Manchester City,4,2


In [6]:
league.groupby(['HomeTeam', 'season'])[['HomeGoals']].sum()

HomeTeam,season,HomeGoals
Arsenal,2008/2009,31
Arsenal,2009/2010,48
Arsenal,2010/2011,33
Arsenal,2011/2012,39
Arsenal,2012/2013,47
...,...,...
Wigan Athletic,2011/2012,22
Wigan Athletic,2012/2013,26
Wolverhampton Wanderers,2009/2010,13
Wolverhampton Wanderers,2010/2011,30


In [7]:
league.groupby(['season', 'HomeTeam'])[['HomeGoals']].sum()

season,HomeTeam,HomeGoals
2008/2009,Arsenal,31
2008/2009,Aston Villa,27
2008/2009,Blackburn Rovers,22
2008/2009,Bolton Wanderers,21
2008/2009,Chelsea,33
...,...,...
2015/2016,Swansea City,20
2015/2016,Tottenham Hotspur,35
2015/2016,Watford,20
2015/2016,West Bromwich Albion,20
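
The order of the columns in the list sets the order of the index levels, which in turn affects how rows are looked up. The following is a rough sketch of accessing a multi-index result with `.loc` and `.xs`, using hypothetical toy data rather than the notebook's file:

```python
import pandas as pd

# Hypothetical toy data, not the notebook's Excel file.
games = pd.DataFrame({
    "season": ["2014/2015", "2014/2015", "2014/2015", "2015/2016", "2015/2016"],
    "HomeTeam": ["Arsenal", "Arsenal", "Chelsea", "Arsenal", "Chelsea"],
    "HomeGoals": [1, 2, 1, 4, 3],
})

multi = games.groupby(["season", "HomeTeam"])[["HomeGoals"]].sum()

# Select every row for one value of the outer level (season).
one_season = multi.loc["2015/2016"]

# Select a single cell by passing the full key as a tuple.
arsenal_1516 = multi.loc[("2015/2016", "Arsenal"), "HomeGoals"]
print(arsenal_1516)  # 4

# Select by the inner level (HomeTeam) with xs.
arsenal_all = multi.xs("Arsenal", level="HomeTeam")
print(list(arsenal_all["HomeGoals"]))  # [3, 4]
```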


In [8]:
(
    league
    .groupby(['season', 'HomeTeam'], as_index=False)[['HomeGoals']]
    .sum()
    .query("HomeTeam == 'Arsenal'")
)

Unnamed: 0,season,HomeTeam,HomeGoals
0,2008/2009,Arsenal,31
20,2009/2010,Arsenal,48
40,2010/2011,Arsenal,33
60,2011/2012,Arsenal,39
80,2012/2013,Arsenal,47
100,2013/2014,Arsenal,36
120,2014/2015,Arsenal,41
140,2015/2016,Arsenal,31
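
Note that the filtered rows keep the row labels they had before `query` (0, 20, 40, ...). If a fresh 0-based index is wanted, one option is `reset_index(drop=True)`. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data, not the notebook's Excel file.
games = pd.DataFrame({
    "season": ["2014/2015", "2014/2015", "2014/2015", "2015/2016", "2015/2016"],
    "HomeTeam": ["Arsenal", "Arsenal", "Chelsea", "Arsenal", "Chelsea"],
    "HomeGoals": [1, 2, 1, 4, 3],
})

totals = (
    games
    .groupby(["season", "HomeTeam"], as_index=False)[["HomeGoals"]]
    .sum()
)

# Filtering keeps the original row labels of the aggregated frame.
arsenal = totals.query("HomeTeam == 'Arsenal'")
print(list(arsenal.index))     # [0, 2]

# drop=True discards the old labels instead of adding them as a column.
renumbered = arsenal.reset_index(drop=True)
print(list(renumbered.index))  # [0, 1]
```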
