## Motivating Pivot Tables

> For the examples in this section, we'll use the database of passengers on the *Titanic*, available through the Seaborn library:

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
titanic = sns.load_dataset('titanic')
titanic.head()

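|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | True | NaN | Southampton | no | True |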


## Pivot Tables by Hand

> To start learning more about this data, we might begin by grouping according to gender, survival status, or some combination thereof.
If you have read the previous section, you might be tempted to apply a ``GroupBy`` operation. For example, let's look at survival rate by gender:

In [5]:
titanic.groupby('sex')[['survived']].mean()

# Overall, about three of every four females on board survived (74%), while only about one in five males survived (19%).

| sex | survived |
| --- | --- |
| female | 0.742038 |
| male | 0.188908 |
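> The mean of a 0/1 column is the fraction of ones, which is why ``mean()`` on ``survived`` reads as a survival rate. A minimal sketch on a hypothetical toy frame (``toy`` and its values are made up for illustration, not taken from the Titanic data):

```python
import pandas as pd

# Hypothetical toy data: 1 = survived, 0 = did not survive.
toy = pd.DataFrame({
    'sex': ['female', 'female', 'female', 'female',
            'male', 'male', 'male', 'male'],
    'survived': [1, 1, 1, 0, 1, 0, 0, 0],
})

by_sex = toy.groupby('sex')['survived']

# The mean of a 0/1 column is the share of ones in each group.
rate = by_sex.mean()

# The same quantity computed explicitly as survivors / group size.
explicit = by_sex.sum() / by_sex.count()

print(rate)
print(explicit)
```

> Both Series should hold the same per-group fractions; the ``mean`` form is simply the shorter way to write it.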


> We might like to go one step deeper and look at survival by both sex and, say, class.
Using the vocabulary of ``GroupBy``, we might proceed like this:
we *group by* class and gender, *select* survival, *apply* a mean aggregate, *combine* the resulting groups, and then *unstack* the hierarchical index to reveal the hidden multidimensionality. In code:

In [6]:
titanic.groupby(['sex', 'class'])['survived'].aggregate('mean').unstack()

| class | First | Second | Third |
| --- | --- | --- | --- |
| **sex** | | | |
| female | 0.968085 | 0.921053 | 0.500000 |
| male | 0.368852 | 0.157407 | 0.135447 |


> This gives us a better idea of how both gender and class affected survival, but the code is starting to look a bit garbled.
While each step of this pipeline makes sense in light of the tools we've previously discussed, the long string of code is not particularly easy to read or use.
This two-dimensional ``GroupBy`` is common enough that Pandas includes a convenience routine, ``pivot_table``, which succinctly handles this type of multidimensional aggregation. Here is the equivalent operation using the ``pivot_table`` method:

In [7]:
titanic.pivot_table('survived', index='sex', columns='class')

| class | First | Second | Third |
| --- | --- | --- | --- |
| **sex** | | | |
| female | 0.968085 | 0.921053 | 0.500000 |
| male | 0.368852 | 0.157407 | 0.135447 |


> This is eminently more readable than the ``groupby`` approach, and produces the same result.
As you might expect of an early 20th-century transatlantic cruise, the survival gradient favors both women and higher classes.
First-class women survived with near certainty (hi, Rose!), while only about one in seven third-class men survived (sorry, Jack!).
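> To make the equivalence concrete, here is a small sketch that builds the same table both ways, once with the ``groupby``/``unstack`` pipeline and once with ``pivot_table``. The frame ``toy`` and its values are hypothetical; the only pandas behavior relied on is that ``pivot_table`` aggregates with the mean by default.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic columns used above.
toy = pd.DataFrame({
    'sex':      ['female', 'female', 'female', 'male',
                 'male', 'male', 'male', 'female'],
    'class':    ['First', 'Third', 'Third', 'First',
                 'First', 'Third', 'Third', 'First'],
    'survived': [1, 1, 0, 1, 0, 0, 0, 1],
})

# By hand: group by two keys, select a column, aggregate,
# then move the inner index level ('class') into the columns.
grouped = toy.groupby(['sex', 'class'])['survived'].mean()
by_hand = grouped.unstack()

# With pivot_table: the mean is the default aggregation.
by_pivot = toy.pivot_table('survived', index='sex', columns='class')

print(by_hand)
print(by_pivot)
```

> The intermediate ``grouped`` object is a Series with a two-level index; ``unstack`` is the step that turns it into the two-dimensional grid that ``pivot_table`` produces directly.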

### Multi-level pivot tables

> Just as in a ``GroupBy``, the grouping in pivot tables can be specified with multiple levels, and via a number of options.
For example, we might be interested in looking at age as a third dimension.
We'll bin the age using the ``pd.cut`` function:

In [8]:
age = pd.cut(titanic['age'], [0, 18, 80])
titanic.pivot_table('survived', ['sex', age], 'class')

|   | class | First | Second | Third |
| --- | --- | --- | --- | --- |
| **sex** | **age** | | | |
| female | (0, 18] | 0.909091 | 1.000000 | 0.511628 |
| female | (18, 80] | 0.972973 | 0.900000 | 0.423729 |
| male | (0, 18] | 0.800000 | 0.600000 | 0.215686 |
| male | (18, 80] | 0.375000 | 0.071429 | 0.133663 |


> We can apply the same strategy when working with the columns as well; let's add info on the fare paid using ``pd.qcut`` to automatically compute quantiles:

In [9]:
fare = pd.qcut(titanic['fare'], 2)
titanic.pivot_table('survived', ['sex', age], [fare, 'class'])

|   | fare | (-0.001, 14.454] | (-0.001, 14.454] | (-0.001, 14.454] | (14.454, 512.329] | (14.454, 512.329] | (14.454, 512.329] |
| --- | --- | --- | --- | --- | --- | --- | --- |
|   | **class** | First | Second | Third | First | Second | Third |
| **sex** | **age** | | | | | | |
| female | (0, 18] | NaN | 1.000000 | 0.714286 | 0.909091 | 1.000000 | 0.318182 |
| female | (18, 80] | NaN | 0.880000 | 0.444444 | 0.972973 | 0.914286 | 0.391304 |
| male | (0, 18] | NaN | 0.000000 | 0.260870 | 0.800000 | 0.818182 | 0.178571 |
| male | (18, 80] | 0.0 | 0.098039 | 0.125000 | 0.391304 | 0.030303 | 0.192308 |


> The result is a four-dimensional aggregation with hierarchical indices (see [Hierarchical Indexing](03.05-Hierarchical-Indexing.ipynb)), shown in a grid demonstrating the relationship between the values.
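> Where ``pd.cut`` takes explicit bin edges, ``pd.qcut`` chooses the edges from the data so that each bin holds roughly the same number of values. A minimal sketch with hypothetical fares (``fares`` is made up for illustration):

```python
import pandas as pd

# Hypothetical fares with a long right tail, loosely like ticket prices.
fares = pd.Series([5.0, 7.0, 8.0, 10.0, 30.0, 50.0, 80.0, 500.0])

# Split into 2 quantile-based bins: the boundary is placed at the median,
# not at the midpoint of the value range.
halves = pd.qcut(fares, 2)

# Each bin receives the same number of observations here.
counts = halves.value_counts()

print(halves)
print(counts)
```

> With evenly spaced edges, almost all of these values would land in the lowest bin; quantile-based edges keep the groups balanced, which matters when each cell of a pivot table needs enough rows to give a meaningful mean.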