Reshaping means transforming the structure of a table or vector (**DataFrame/Series**) so that it is suitable for data analysis.

* pivot
* pivot_table
* stack
* unstack

# pivot (common in Excel, where it is known as a pivot table)
The `pivot` function takes three parameters:
* index
* columns
* values

In [1]:
from collections import OrderedDict
from pandas import DataFrame
import pandas as pd
import numpy as np

In [2]:
# An OrderedDict is used here so that the column order is fixed
table = OrderedDict((
    ('Item', ['Item0', 'Item0', 'Item1', 'Item1']),
    ('CType', ['Gold', 'Bronze', 'Gold', 'Silver']),
    ('USD', [1, 2, 3, 4]),
    ('EU', [1, 2, 3, 4])
))
df = DataFrame(table)
df

    Item   CType  USD  EU
0  Item0    Gold    1   1
1  Item0  Bronze    2   2
2  Item1    Gold    3   3
3  Item1  Silver    4   4


In [4]:
p = df.pivot(index='Item', columns='CType', values='USD')
p    # p contains no EU information; in a sense, pivoting simplifies the original data.

CType  Bronze  Gold  Silver
Item
Item0     2.0   1.0     NaN
Item1     NaN   3.0     4.0


In [6]:
# Original DataFrame: look up the USD value for Item0 with customer type Gold
print(df[(df.Item=='Item0') & (df.CType=='Gold')].USD.values)

[1]


In [7]:
# The same lookup on the pivoted DataFrame
print(p[p.index=='Item0'].Gold.values)

[ 1.]


# Pivoting multiple value columns

In [8]:
p = df.pivot(index='Item', columns='CType')
p     # MultiIndex = hierarchical columns

         USD                 EU
CType Bronze Gold Silver Bronze Gold Silver
Item
Item0    2.0  1.0    NaN    2.0  1.0    NaN
Item1    NaN  3.0    4.0    NaN  3.0    4.0


In [9]:
print(df[(df.Item=='Item0') & (df.CType=='Gold')].USD.values)

[1]


In [10]:
print(p.USD[p.USD.index=='Item0'].Gold.values)

[ 1.]
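The lookups above filter with boolean masks. As an alternative, label-based indexing with `.loc` addresses the same cell directly. The following is a minimal sketch that rebuilds the sample data; the names `p_single`, `p_multi`, `gold_single` and `gold_multi` are illustrative and not part of the original notebook.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Item0', 'Item0', 'Item1', 'Item1'],
    'CType': ['Gold', 'Bronze', 'Gold', 'Silver'],
    'USD': [1, 2, 3, 4],
    'EU': [1, 2, 3, 4],
})

# Single value column: the columns are a plain Index of CType labels
p_single = df.pivot(index='Item', columns='CType', values='USD')
gold_single = p_single.loc['Item0', 'Gold']

# No `values` given: the columns become a MultiIndex of (value column, CType)
p_multi = df.pivot(index='Item', columns='CType')
gold_multi = p_multi.loc['Item0', ('USD', 'Gold')]

print(gold_single, gold_multi)
```

With hierarchical columns, the column label is a tuple with one entry per level, which is why the second lookup uses `('USD', 'Gold')`.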


## Common pivot error: several rows share the same index/columns pair
**`The fix for this problem is pivot_table!`**

In [12]:
table = OrderedDict((
    ('Item', ['Item0', 'Item0', 'Item0', 'Item1']),
    ('CType', ['Gold', 'Bronze', 'Gold', 'Silver']),
    ('USD', [1, 2, 3, 4]),
    ('EU', [1, 2, 3, 4])
))
df = DataFrame(table)
print(df)
p = df.pivot(index='Item', columns='CType', values='USD')

    Item   CType  USD  EU
0  Item0    Gold    1   1
1  Item0  Bronze    2   2
2  Item0    Gold    3   3
3  Item1  Silver    4   4


ValueError: Index contains duplicate entries, cannot reshape
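Before pivoting, it can help to check which rows share an (index, columns) pair. The sketch below uses `DataFrame.duplicated` for that check on the same sample data; the variable names `dup_mask`, `dup_rows` and `pivot_failed` are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Item0', 'Item0', 'Item0', 'Item1'],
    'CType': ['Gold', 'Bronze', 'Gold', 'Silver'],
    'USD': [1, 2, 3, 4],
    'EU': [1, 2, 3, 4],
})

# keep=False flags every row whose (Item, CType) pair occurs more than once
dup_mask = df.duplicated(subset=['Item', 'CType'], keep=False)
dup_rows = df[dup_mask]
print(dup_rows)

# pivot cannot decide which of the duplicate values to place in the cell
try:
    df.pivot(index='Item', columns='CType', values='USD')
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print('pivot failed:', exc)
```

Here rows 0 and 2 both map to the cell (Item0, Gold), which is what triggers the error.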

## pivot_table
> **`pivot_table`** resolves this by aggregating the duplicate values; by default it takes their mean.

In [14]:
table = OrderedDict((
    ('Item', ['Item0', 'Item0', 'Item0', 'Item1']),
    ('CType', ['Gold', 'Bronze', 'Gold', 'Silver']),
    ('USD', [1, 2, 3, 4]),
    ('EU', [1, 2, 3, 4])
))
df = DataFrame(table)
p = df.pivot_table(index='Item', columns='CType', values='USD')
p

CType  Bronze  Gold  Silver
Item
Item0     2.0   2.0     NaN
Item1     NaN   NaN     4.0


In [15]:
df

    Item   CType  USD  EU
0  Item0    Gold    1   1
1  Item0  Bronze    2   2
2  Item0    Gold    3   3
3  Item1  Silver    4   4
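To see where the 2.0 in the (Item0, Gold) cell comes from, the default behaviour can be reproduced by hand: group by the two keys, take the mean, and move `CType` into the columns. This is a sketch of the equivalence for this sample data, not a description of how pandas implements `pivot_table` internally; `by_pivot_table` and `by_groupby` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Item0', 'Item0', 'Item0', 'Item1'],
    'CType': ['Gold', 'Bronze', 'Gold', 'Silver'],
    'USD': [1, 2, 3, 4],
    'EU': [1, 2, 3, 4],
})

# Default aggregation of pivot_table is the mean
by_pivot_table = df.pivot_table(index='Item', columns='CType', values='USD')

# Same result by hand: group on both keys, average, then unstack CType
by_groupby = df.groupby(['Item', 'CType'])['USD'].mean().unstack()

print(by_pivot_table)
print(by_groupby)
```

The two Gold rows for Item0 have USD values 1 and 3, so their mean is 2.0.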


> The **`aggfunc`** parameter of `pivot_table` specifies how duplicate values are combined into a single value.

In [16]:
table = OrderedDict((
    ('Item', ['Item0', 'Item0', 'Item0', 'Item1']),
    ('CType', ['Gold', 'Bronze', 'Gold', 'Silver']),
    ('USD', [1, 2, 3, 4]),
    ('EU', [1, 2, 3, 4])
))
df = DataFrame(table)
p = df.pivot_table(index='Item', columns='CType', values='USD', aggfunc=np.max)    # aggfunc specifies the aggregation function
p

CType  Bronze  Gold  Silver
Item
Item0     2.0   3.0     NaN
Item1     NaN   NaN     4.0
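`aggfunc` also accepts the name of an aggregation as a string. The sketch below, on the same sample data, uses `'sum'` to add up the duplicates and `'count'` to show how many rows were collapsed into each cell; `p_sum` and `p_count` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Item0', 'Item0', 'Item0', 'Item1'],
    'CType': ['Gold', 'Bronze', 'Gold', 'Silver'],
    'USD': [1, 2, 3, 4],
    'EU': [1, 2, 3, 4],
})

# Sum the duplicate values instead of averaging them
p_sum = df.pivot_table(index='Item', columns='CType', values='USD', aggfunc='sum')

# Count how many source rows fall into each cell
p_count = df.pivot_table(index='Item', columns='CType', values='USD', aggfunc='count')

print(p_sum)
print(p_count)
```

A count greater than 1 marks exactly the cells where plain `pivot` would have failed.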


### stack/unstack
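`pivot` is essentially a special case of stack/unstack reshaping on a DataFrame. `stack` moves the innermost column level into the innermost level of the row index, while `unstack` moves the innermost row index level into the columns.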
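The relationship can be sketched on the earlier duplicate-free sample data: setting `Item` and `CType` as a two-level row index and unstacking the inner level gives the same table as `pivot`, and `stack` reverses the move. The names `by_pivot`, `by_unstack`, `stacked` and `restored` are illustrative. Whether `stack` keeps or drops the missing cells depends on the pandas version, so the sketch does not rely on either behaviour.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Item0', 'Item0', 'Item1', 'Item1'],
    'CType': ['Gold', 'Bronze', 'Gold', 'Silver'],
    'USD': [1, 2, 3, 4],
})

# pivot on USD
by_pivot = df.pivot(index='Item', columns='CType', values='USD')

# The same table built by hand: (Item, CType) as row MultiIndex, then
# unstack moves the innermost row level (CType) into the columns
by_unstack = df.set_index(['Item', 'CType'])['USD'].unstack()

# stack moves the column level back into the row index, giving a Series
stacked = by_pivot.stack()

# unstack on the stacked Series rebuilds the wide table
restored = stacked.unstack()

print(by_unstack)
print(stacked)
```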