<img width=200 src="https://camo.githubusercontent.com/903f3cc51db134b8c9faed2ba2b18ffedff67ff2aafe75259cbde477b27d9b4f/68747470733a2f2f75706c6f61642e77696b696d656469612e6f72672f77696b6970656469612f636f6d6d6f6e732f7468756d622f652f65642f50616e6461735f6c6f676f2e7376672f3132303070782d50616e6461735f6c6f676f2e7376672e706e673f7261773d74727565"></img>

# Day-15 Pandas: Building Pivot Tables

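* Goals:
  1. Convert between column levels and index levels
  2. Reshape data
* Key points:
  1. Both column-to-index (stack) and index-to-column (unstack) conversions move the innermost (last, level=-1) level by default
  2. When reshaping data, make sure you understand each parameter, and experiment with different values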

## Import packages

In [None]:
# Load the NumPy and Pandas packages
import numpy as np
import pandas as pd

# Check that both loaded correctly and print their versions
print(np)
print(np.__version__)
print(pd)
print(pd.__version__)

<module 'numpy' from 'D:\\anaconda3\\lib\\site-packages\\numpy\\__init__.py'>
1.19.2
<module 'pandas' from 'D:\\anaconda3\\lib\\site-packages\\pandas\\__init__.py'>
1.1.3


## from_product

In [None]:
index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                   names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])

# mock some data (random values, so the numbers change on every run)
data = np.round(np.random.randn(4, 6), 1)
df = pd.DataFrame(data, index=index, columns=columns)
df

subject    Bob       Guido     Sue
type         HR Temp   HR Temp   HR Temp
year visit
2013 1     -2.0 -0.1 -0.2 -0.8 -0.7  0.6
     2     -1.1 -0.1 -1.8  0.6 -1.3 -0.5
2014 1      0.9 -2.5  1.0 -1.7  2.3 -0.1
     2      0.2 -2.6 -1.5  0.2 -3.4  0.7


### Columns to index

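* [pandas.DataFrame.stack](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.stack.html)
  * Parameters
    * level: int, str, or list (default -1, the last/innermost level); which column level(s) to move into the index
    * dropna: bool (default True); whether to drop result rows whose values are all missing

* To move a column level into the index, call .stack(). By default it takes the innermost column level, type, so the index levels become year, visit, type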

In [None]:
df.stack()

subject           Bob  Guido   Sue
year visit type
2013 1     HR    -0.4   -1.3  -1.6
           Temp   0.0   -0.7  -1.4
     2     HR    -0.1    1.1   1.5
           Temp   1.9    2.3  -1.0
2014 1     HR    -3.3    0.4  -0.1
           Temp   0.7   -1.0  -1.0
     2     HR    -0.7    0.4  -0.0
           Temp   1.3   -0.4   1.0


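* Calling .stack() again moves the remaining column level, subject, into the index as well, so the index levels become year, visit, type, subject; with no column level left, the result is a Series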

In [None]:
df.stack().stack()

year  visit  type  subject
2013  1      HR    Bob       -0.4
                   Guido     -1.3
                   Sue       -1.6
             Temp  Bob        0.0
                   Guido     -0.7
                   Sue       -1.4
      2      HR    Bob       -0.1
                   Guido      1.1
                   Sue        1.5
             Temp  Bob        1.9
                   Guido      2.3
                   Sue       -1.0
2014  1      HR    Bob       -3.3
                   Guido      0.4
                   Sue       -0.1
             Temp  Bob        0.7
                   Guido     -1.0
                   Sue       -1.0
      2      HR    Bob       -0.7
                   Guido      0.4
                   Sue       -0.0
             Temp  Bob        1.3
                   Guido     -0.4
                   Sue        1.0
dtype: float64
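The level parameter listed above also accepts a level name or a position, not just the default -1. A minimal sketch, assuming a rebuilt copy of the frame above with a fixed random seed (the seed and the names by_default, by_name, by_position are illustrative additions, not part of the original notebook):

```python
import numpy as np
import pandas as pd

# Rebuild a frame with the same layout as df above; the fixed seed is an
# added assumption so that the numbers are reproducible
rng = np.random.default_rng(0)
index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                   names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])
df = pd.DataFrame(np.round(rng.standard_normal((4, 6)), 1),
                  index=index, columns=columns)

# Default level=-1: the innermost column level ('type') moves into the index
by_default = df.stack()
# level can also be a level name or a position: move the outer level 'subject'
by_name = df.stack(level='subject')
by_position = df.stack(level=0)

print(list(by_default.index.names))  # ['year', 'visit', 'type']
print(list(by_name.index.names))     # ['year', 'visit', 'subject']
print(by_name.columns.tolist())      # ['HR', 'Temp']
print(by_name.equals(by_position))   # True
```

Stacking the outer level instead leaves type as the only column level, so the result is still a DataFrame with HR and Temp columns.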

### Index to columns

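* [pandas.DataFrame.unstack](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.unstack.html#pandas.DataFrame.unstack)
  * Parameters
    * level: int, str, or list (default -1, the last/innermost level); which index level(s) to move into the columns
    * fill_value: int, str, or dict; the value that replaces missing values produced by the unstack
* To move an index level into the columns, call .unstack(). By default it takes the innermost index level, visit, so the column levels become subject, type, visit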

In [None]:
df.unstack()

subject Bob                 Guido               Sue
type    HR        Temp      HR        Temp      HR        Temp
visit      1    2    1    2    1    2    1    2    1    2    1    2
year
2013    -0.4 -0.1  0.0  1.9 -1.3  1.1 -0.7  2.3 -1.6  1.5 -1.4 -1.0
2014    -3.3 -0.7  0.7  1.3  0.4  0.4 -1.0 -0.4 -0.1 -0.0 -1.0  1.0
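A sketch of both unstack parameters on the same layout. It assumes deterministic values from np.arange instead of random ones, and keeps only the first three rows with iloc so that fill_value has a gap to fill; wide, partial and filled are illustrative names:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                   names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])
# Deterministic values 0.0 .. 23.0 instead of random ones
df = pd.DataFrame(np.arange(24, dtype=float).reshape(4, 6),
                  index=index, columns=columns)

# level selects which index level moves into the columns: here the outer 'year'
wide = df.unstack(level='year')
print(list(wide.columns.names))  # ['subject', 'type', 'year']
print(wide.index.tolist())       # [1, 2]

# Keep only the first three rows, so the (2014, 2) combination is missing
partial = df.iloc[:3]
# Unstacking 'visit' leaves a gap for 2014 / visit 2; fill_value fills it with 0
filled = partial.unstack(level='visit', fill_value=0)
print(filled.loc[2014, ('Bob', 'HR', 2)])  # 0.0
```

Without fill_value, the same cell would hold NaN.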


## from_tuples

In [None]:
# Build a two-level column MultiIndex for the DataFrame below
multicol1 = pd.MultiIndex.from_tuples([('weight', 'kg'),
                                       ('weight', 'pounds')])

# Create the sample DataFrame
df_multi_level_cols1 = pd.DataFrame([[1, 2], [2, 4]],
                                    index=['cat', 'dog'],
                                    columns=multicol1)
df_multi_level_cols1

        weight
        kg pounds
cat      1      2
dog      2      4


### Columns to index

In [None]:
# Default .stack() (level=-1): moves the bottom (innermost) column level into the index
df_multi_level_cols1.stack()

            weight
cat kg           1
    pounds       2
dog kg           2
    pounds       4


In [None]:
df_multi_level_cols1.index, df_multi_level_cols1.stack().index

(Index(['cat', 'dog'], dtype='object'),
 MultiIndex([('cat',     'kg'),
             ('cat', 'pounds'),
             ('dog',     'kg'),
             ('dog', 'pounds')],
            ))

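Notes:
* Column levels are numbered from the top: the top (outermost) level is 0, and each level below it adds 1
* The level argument of df.stack() also accepts a list of level positions (or names) to move into the index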
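A short sketch of both notes, reusing df_multi_level_cols1 (rebuilt here so the block runs on its own); top and both are illustrative names:

```python
import pandas as pd

multicol1 = pd.MultiIndex.from_tuples([('weight', 'kg'),
                                       ('weight', 'pounds')])
df_multi_level_cols1 = pd.DataFrame([[1, 2], [2, 4]],
                                    index=['cat', 'dog'],
                                    columns=multicol1)

# Level 0 is the top column level ('weight'); stacking it leaves kg/pounds as columns
top = df_multi_level_cols1.stack(level=0)
print(top.columns.tolist())    # ['kg', 'pounds']

# A list stacks several levels at once; with no column level left, the result is a Series
both = df_multi_level_cols1.stack(level=[0, 1])
print(type(both).__name__)     # Series
print(both.loc[('dog', 'weight', 'pounds')])  # 4
```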

In [None]:
df_multi_level_cols1.stack().unstack()

        weight
        kg pounds
cat      1      2
dog      2      4


In [None]:
df_multi_level_cols1.stack(), df_multi_level_cols1.stack().unstack().index

(            weight
 cat kg           1
     pounds       2
 dog kg           2
     pounds       4,
 Index(['cat', 'dog'], dtype='object'))

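Note: .unstack() can be chained repeatedly; the levels keep moving between the index and the columns, and in the cell below the layout cycles back to the same result as .stack()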

In [None]:
df_multi_level_cols1.stack().unstack().unstack().unstack().unstack().unstack().unstack()

            weight
cat kg           1
    pounds       2
dog kg           2
    pounds       4


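Caveats:
* df.stack(): once every column level has been stacked, the result is a Series, and a Series has no .stack() method, so stacking again raises an error
* df.unstack(): when the index has only one level, unstacking moves it into the columns as well, and the result is a Series whose index holds all the former column levels followed by the former index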
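The two caveats, and the cycling seen in the chained .unstack() cell, can be traced step by step. A sketch, assuming the same df_multi_level_cols1; s, step and result are illustrative names:

```python
import pandas as pd

multicol1 = pd.MultiIndex.from_tuples([('weight', 'kg'),
                                       ('weight', 'pounds')])
df_multi_level_cols1 = pd.DataFrame([[1, 2], [2, 4]],
                                    index=['cat', 'dog'],
                                    columns=multicol1)

# The index has a single level, so unstack() moves it into the columns too
# and returns a Series indexed by (column levels..., old index)
s = df_multi_level_cols1.unstack()
print(type(s).__name__)      # Series
print(s.index.tolist())
# [('weight', 'kg', 'cat'), ('weight', 'kg', 'dog'),
#  ('weight', 'pounds', 'cat'), ('weight', 'pounds', 'dog')]

# A Series has no stack() method, which is why stacking past the last
# column level fails
print(hasattr(s, 'stack'))   # False

# Six unstack() calls after stack() come back to the stacked frame,
# matching the chained cell above
step = df_multi_level_cols1.stack()
result = step
for _ in range(6):
    result = result.unstack()
print(result.equals(step))   # True
```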

## Column names to values (melt)

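* [pandas.melt](https://pandas.pydata.org/docs/reference/api/pandas.melt.html)
  * Parameters
    * id_vars: column(s) to keep as identifiers; they are not unpivoted
    * value_vars: column(s) to unpivot; if omitted, every column not listed in id_vars is unpivoted
* [Pandas.melt() usage and code examples](https://vimsky.com/zh-tw/examples/usage/python-pandas-melt.html)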

In [None]:
df = pd.DataFrame({'Name':{0:'John', 1:'Bob', 2:'Shiela'}, 
                   'Course':{0:'Masters', 1:'Graduate', 2:'Graduate'}, 
                   'Age':{0:27, 1:23, 2:21}}) 
df

     Name    Course  Age
0    John   Masters   27
1     Bob  Graduate   23
2  Shiela  Graduate   21


### Unpivot all columns

In [None]:
df.melt()

   variable     value
0      Name      John
1      Name       Bob
2      Name    Shiela
3    Course   Masters
4    Course  Graduate
5    Course  Graduate
6       Age        27
7       Age        23
8       Age        21


### Keep Name and unpivot the other columns into values

In [None]:
df.melt(id_vars='Name')

     Name  variable     value
0    John    Course   Masters
1     Bob    Course  Graduate
2  Shiela    Course  Graduate
3    John       Age        27
4     Bob       Age        23
5  Shiela       Age        21


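### Unpivot only the Name column

Pass value_vars='Name' (without id_vars): only Name is unpivoted, and the other columns are dropped from the result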

In [None]:
df.melt(value_vars='Name')

   variable   value
0      Name    John
1      Name     Bob
2      Name  Shiela
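id_vars and value_vars can also be combined, and melt accepts var_name and value_name to rename the two generated columns instead of the default variable/value. A sketch on the same data (field and amount are arbitrary names chosen for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Bob', 'Shiela'],
                   'Course': ['Masters', 'Graduate', 'Graduate'],
                   'Age': [27, 23, 21]})

# Keep Name as the identifier and unpivot only Age; Course is in neither
# id_vars nor value_vars, so it is dropped from the result
long = df.melt(id_vars='Name', value_vars='Age',
               var_name='field', value_name='amount')
print(long.columns.tolist())    # ['Name', 'field', 'amount']
print(long['amount'].tolist())  # [27, 23, 21]
```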


## Reshaping data

In [None]:
df = pd.DataFrame({'fff': ['one', 'one', 'one', 'two', 'two',
                           'two'],
                   'bbb': ['P', 'Q', 'R', 'P', 'Q', 'R'],
                   'baa': [2, 3, 4, 5, 6, 7],
                   'zzz': ['h', 'i', 'j', 'k', 'l', 'm']})
df

   fff  bbb  baa  zzz
0  one    P    2    h
1  one    Q    3    i
2  one    R    4    j
3  two    P    5    k
4  two    Q    6    l
5  two    R    7    m


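### .pivot() reshapes a DataFrame using the given index and column values

* [pandas.DataFrame.pivot](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html)
  * Parameters
    * index: column whose values become the new index
    * columns: column whose values become the new column labels
    * values: column(s) used to fill the new frame; if omitted, all remaining columns are used and the result has hierarchical (MultiIndex) columns
* [DataFrame: pivot() function](https://www.w3resource.com/pandas/dataframe/dataframe-pivot.php)
* [Quick overview of Pivot Table and its applications](https://medium.com/%E6%95%B8%E6%93%9A%E4%B8%8D%E6%AD%A2-not-only-data/pandas-%E5%BF%AB%E9%80%9F%E7%9E%AD%E8%A7%A3-pivot-table-%E8%88%87%E6%87%89%E7%94%A8-21e4c37b9216)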
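pivot only reshapes, so every (index, columns) pair must be unique. A sketch with hypothetical duplicated data showing the failure, and pandas' pivot_table, which aggregates duplicates instead (aggfunc='sum' is a choice made here; pivot_table's default is the mean):

```python
import pandas as pd

# Hypothetical data: the ('one', 'P') pair appears twice
dup = pd.DataFrame({'fff': ['one', 'one', 'two'],
                    'bbb': ['P', 'P', 'Q'],
                    'baa': [2, 3, 5]})

pivot_failed = False
try:
    dup.pivot(index='fff', columns='bbb', values='baa')
except ValueError as err:
    # pivot() only reshapes, so it cannot choose between the values 2 and 3
    pivot_failed = True
    print('pivot failed:', err)

# pivot_table() aggregates duplicates instead; aggfunc='sum' adds 2 + 3
table = dup.pivot_table(index='fff', columns='bbb', values='baa', aggfunc='sum')
print(table)
```

Combinations that never occur, such as ('one', 'Q'), show up as NaN in the table.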

In [None]:
df.pivot(index='fff', columns='bbb', values='baa')

bbb  P  Q  R
fff
one  2  3  4
two  5  6  7


In [None]:
df.pivot(index='fff', columns='bbb')['baa']

bbb  P  Q  R
fff
one  2  3  4
two  5  6  7


In [None]:
df.pivot(index='fff', columns='bbb', values=['baa', 'zzz'])

      baa       zzz
bbb   P  Q  R   P  Q  R
fff
one   2  3  4   h  i  j
two   5  6  7   k  l  m
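pivot works roughly as the inverse of melt. A sketch that unpivots the earlier Name/Course/Age frame and pivots it back (long and wide are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Bob', 'Shiela'],
                   'Course': ['Masters', 'Graduate', 'Graduate'],
                   'Age': [27, 23, 21]})

# melt: wide -> long, one row per (Name, original column) pair
long = df.melt(id_vars='Name')
# pivot: long -> wide again; Name becomes the index, variable the column labels
wide = long.pivot(index='Name', columns='variable', values='value')
print(wide)
print(wide.loc['Bob', 'Course'])  # Graduate
```

The rows and columns come back sorted, and because the melted value column mixed strings and integers, Age may come back with object dtype; reset_index() and astype(int) restore the original shape and types if needed.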
