# Hierarchical Indexing

## The Pandas Index Object

> We have seen that both the ``Series`` and ``DataFrame`` objects contain an explicit *index* that lets you reference and modify data.
This ``Index`` object is an interesting structure in itself, and it can be thought of either as an *immutable array* or as an *ordered set* (technically a multi-set, as ``Index`` objects may contain repeated values).
Those views have some interesting consequences for the operations available on ``Index`` objects.
As a simple example, let's construct two ``Index`` objects from lists of integers:

In [1]:
import numpy as np
import pandas as pd

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

In [2]:
indA.size                           # number of elements
indA.shape                          # shape tuple
indA.ndim                           # number of dimensions
indA.dtype                          # data type of the values
indA.nbytes                         # number of bytes used by the values
indA.empty                          # True if the index has no elements
indA.hasnans                        # True if the index contains any NaN

indA.intersection(indB)             # intersection (in older pandas also written indA & indB)
indA.union(indB)                    # union (in older pandas also written indA | indB)
indA.symmetric_difference(indB)     # symmetric difference (in older pandas also written indA ^ indB)

Int64Index([1, 2, 9, 11], dtype='int64')
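
The two views of an ``Index`` described above (immutable array and ordered multi-set) can be sketched as follows. This is only an illustration: the names `ind` and `dup` are made up for the example and do not appear elsewhere in this notebook.

```python
import pandas as pd

ind = pd.Index([2, 3, 5, 7, 11])

# Array-like view: positional indexing and slicing work as on a NumPy array.
second = ind[1]
every_other = ind[::2]

# Immutable: item assignment is rejected with a TypeError.
try:
    ind[1] = 0
    mutated = True
except TypeError:
    mutated = False

# Set-like view, except that repeated values are allowed (a multi-set).
dup = pd.Index([1, 1, 2])
has_duplicates = not dup.is_unique
```

Immutability is what makes it safe to share one ``Index`` between several ``Series`` or ``DataFrame`` objects.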

### UFuncs: Index Alignment

> For binary operations on two ``Series`` or ``DataFrame`` objects, Pandas will align indices in the process of performing the operation.
This is very convenient when working with incomplete data, as we'll see in some of the examples that follow.

In [3]:
# Suppose two different data sources give us the top three US states by area and by population
area = pd.Series({'Alaska': 1723337, 'Texas': 695662, 'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193, 'New York': 19651127}, name='population')

In [4]:
area

Alaska        1723337
Texas          695662
California     423967
Name: area, dtype: int64

In [5]:
population

California    38332521
Texas         26448193
New York      19651127
Name: population, dtype: int64

In [6]:
area.index.union(population.index)    # union of the two indices

Index(['Alaska', 'California', 'New York', 'Texas'], dtype='object')

> Any item for which one or the other does not have an entry is marked with ``NaN``, or "Not a Number," which is how Pandas marks missing data (see further discussion of missing data in [Handling Missing Data](03.04-Missing-Values.ipynb)). If using ``NaN`` values is not the desired behavior, the fill value can be modified using the appropriate object methods in place of the operators.
For example, calling ``A.add(B)`` is equivalent to calling ``A + B``, but allows optional explicit specification of the fill value for any elements in ``A`` or ``B`` that might be missing:

In [7]:
area + population

Alaska               NaN
California    38756488.0
New York             NaN
Texas         27143855.0
dtype: float64

In [8]:
area.add(population, fill_value=0)

Alaska         1723337.0
California    38756488.0
New York      19651127.0
Texas         27143855.0
dtype: float64

> The following table lists Python operators and their equivalent Pandas object methods:

| Python operator | Pandas method(s)                       | Python operator | Pandas method(s) |
|-----------------|----------------------------------------|-----------------|------------------|
| ``+``           | ``add()``                              | ``//``          | ``floordiv()``   |
| ``-``           | ``sub()``, ``subtract()``              | ``%``           | ``mod()``        |
| ``*``           | ``mul()``, ``multiply()``              | ``**``          | ``pow()``        |
| ``/``           | ``truediv()``, ``div()``, ``divide()`` |                 |                  |
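
As a small sketch of how the method forms differ from the operators, the example below uses two made-up ``Series`` (`A` and `B`, not part of the state data above) whose indices only partly overlap, and passes a different ``fill_value`` depending on the operation.

```python
import pandas as pd

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

# Operator form: labels present on only one side (0 and 3) become NaN.
plain = A - B

# Method form: the missing side is replaced by fill_value before computing.
filled = A.sub(B, fill_value=0)    # missing entries treated as 0
scaled = A.mul(B, fill_value=1)    # missing entries treated as 1
```

Choosing the identity element of the operation (0 for addition and subtraction, 1 for multiplication) keeps the unmatched values unchanged in magnitude.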

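### Reindexing Pandas Objects

``reindex`` conforms an object to a new set of labels; labels absent from the original are filled with ``NaN`` unless a ``method`` or ``fill_value`` is supplied.

In [14]:
ob = pd.Series([1, 2, 3, 6], index=['a', 'b', 'c', 'd'])
ob.reindex(index=['a', 'b', 'c', 'd', 'e'], method='ffill')   # forward-fill the new label 'e' from 'd', so the dtype stays int64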

a    1
b    2
c    3
d    6
e    6
dtype: int64

In [15]:
ob3 = pd.DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'b', 'c'], columns=['Andhra', 'Tamilnadu', 'Kerala'])
ob3.reindex(index=['a', 'b', 'c', 'd'])
ob3.reindex(columns=['Andhra', 'Tamilnadu', 'Kerala', 'Telangana'])

|   | Andhra | Tamilnadu | Kerala | Telangana |
|---|--------|-----------|--------|-----------|
| a | 0      | 1         | 2      | NaN       |
| b | 3      | 4         | 5      | NaN       |
| c | 6      | 7         | 8      | NaN       |


## A Multiply Indexed Series

> Let's start by considering how we might represent two-dimensional data within a one-dimensional ``Series``.
For concreteness, we will consider a series of data where each point has a character and numerical key.

In [55]:
index = [('California', 2000), ('California', 2010),('New York', 2000), ('New York', 2010),('Texas', 2000), ('Texas', 2010)]
populations = [33871648, 37253956, 18976457, 19378102, 20851820, 25145561]

betw = pd.MultiIndex.from_tuples(index)
pop = pd.Series(populations, index=index)
pop = pop.reindex(betw)

pop[:, 2010]   # select every state's entry for the year 2010

California    37253956
New York      19378102
Texas         25145561
dtype: int64

### MultiIndex as extra dimension: unstack()

> We could easily have stored the same data using a simple ``DataFrame`` with index and column labels. In fact, Pandas is built with this equivalence in mind. The ``unstack()`` method will quickly convert a multiply indexed ``Series`` into a conventionally indexed ``DataFrame``:

In [56]:
pop    # the multiply indexed Series

pop_df = pop.unstack()     # Series -> DataFrame (states as rows, years as columns)
pop_df = pop_df.stack()    # DataFrame -> Series; stack() is the opposite operation
pop_df

California  2000    33871648
            2010    37253956
New York    2000    18976457
            2010    19378102
Texas       2000    20851820
            2010    25145561
dtype: int64

## A Multiply Indexed DataFrame

> We can also use a ``MultiIndex`` to represent data of three or more dimensions in a ``Series`` or ``DataFrame``.
Each extra level in a multi-index represents an extra dimension of data; taking advantage of this property gives us much more flexibility in the types of data we can represent. Concretely, we might want to add another column of demographic data for each state at each year (say, population under 18); with a ``MultiIndex`` this is as easy as adding another column to the ``DataFrame``:

In [57]:
index = [('California', 2000), ('California', 2010),('New York', 2000), ('New York', 2010),('Texas', 2000), ('Texas', 2010)]
populations = [33871648, 37253956, 18976457, 19378102, 20851820, 25145561]

betw = pd.MultiIndex.from_tuples(index)
pop = pop.reindex(betw)

In [58]:
pop_df = pd.DataFrame({'total': pop, 'under18': [9267089, 9284094,4687374, 4318033,5906301, 6879014]})
pop_df

|            |      | total    | under18 |
|------------|------|----------|---------|
| California | 2000 | 33871648 | 9267089 |
| California | 2010 | 37253956 | 9284094 |
| New York   | 2000 | 18976457 | 4687374 |
| New York   | 2010 | 19378102 | 4318033 |
| Texas      | 2000 | 20851820 | 5906301 |
| Texas      | 2010 | 25145561 | 6879014 |


In [59]:
f_u18 = pop_df['under18'] / pop_df['total']   # fraction of the population under 18
f_u18.unstack()

|            | 2000     | 2010     |
|------------|----------|----------|
| California | 0.273594 | 0.249211 |
| New York   | 0.24701  | 0.222831 |
| Texas      | 0.283251 | 0.273568 |


## Methods of MultiIndex Creation

> The most straightforward way to construct a multiply indexed ``Series`` or ``DataFrame`` is to simply pass a list of two or more index arrays to the constructor. For example:

In [35]:
df = pd.DataFrame(np.random.rand(4, 2),index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                                       columns=['data1', 'data2'])
df

|   |   | data1    | data2    |
|---|---|----------|----------|
| a | 1 | 0.164493 | 0.105049 |
| a | 2 | 0.104183 | 0.007193 |
| b | 1 | 0.353614 | 0.992624 |
| b | 2 | 0.597977 | 0.801245 |


### Explicit MultiIndex constructors

> For more flexibility in how the index is constructed, you can instead use the class method constructors available in ``pd.MultiIndex``.
For example, as we did before, you can construct the ``MultiIndex`` from a simple list of arrays giving the index values within each level:

In [40]:
pd.MultiIndex.from_arrays([['a', 'a', 'b', 'b'], [1, 2, 1, 2]])      # Method A: list of arrays, one per level
pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1), ('b', 2)])  # Method B: list of tuples, one per row
pd.MultiIndex.from_product([['a', 'b'], [1, 2]])                     # Method C: Cartesian product of single indices

MultiIndex([('a', 1),
            ('a', 2),
            ('b', 1),
            ('b', 2)],
           )

> Similarly, you can construct the ``MultiIndex`` directly using its internal encoding by passing ``levels`` (a list of lists containing the available index values for each level) and ``codes`` (a list of lists of integer positions that reference these values; this argument was called ``labels`` in older Pandas releases):

In [None]:
pd.MultiIndex(levels=[['a', 'b'], [1, 2]], codes=[[0, 0, 1, 1], [0, 1, 0, 1]])

> Any of these objects can be passed as the ``index`` argument when creating a ``Series`` or ``DataFrame``, or be passed to the ``reindex`` method of an existing ``Series`` or ``DataFrame``.
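
A minimal sketch of both uses follows, with a made-up index `mi` and made-up values. It also prints nothing; it simply exposes the ``levels`` and ``codes`` attributes so the internal encoding mentioned above can be inspected.

```python
import pandas as pd

mi = pd.MultiIndex.from_product([['a', 'b'], [1, 2]])

# Use the MultiIndex as the index argument of a new Series.
s = pd.Series([10, 20, 30, 40], index=mi)

# Internal encoding: the distinct values per level, and integer positions into them.
levels = [list(level) for level in mi.levels]
codes = [list(code) for code in mi.codes]

# Pass a (here: reversed) MultiIndex to reindex on an existing Series.
reordered = s.reindex(mi[::-1])
```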

### MultiIndex level names

> Sometimes it is convenient to name the levels of the ``MultiIndex``.
This can be accomplished by passing the ``names`` argument to any of the above ``MultiIndex`` constructors, or by setting the ``names`` attribute of the index after the fact:

In [None]:
pop.index.names = ['state', 'year']
pop
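
The other route, passing ``names`` at construction time, might look like the sketch below. It rebuilds a two-state subset of the population data so that it runs on its own; the variable names `s`, `by_year`, and `renamed_index` are illustrative only.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [['California', 'Texas'], [2000, 2010]],
    names=['state', 'year'],          # level names given at construction
)
s = pd.Series([33871648, 37253956, 20851820, 25145561], index=index)

# Named levels can be referred to by name instead of by position.
by_year = s.unstack(level='year')

# set_names returns a new index with different level names; s is unchanged.
renamed_index = s.index.set_names(['region', 'census_year'])
```

Naming levels tends to pay off later, because methods such as ``unstack``, ``reset_index``, and ``groupby`` accept level names.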

### MultiIndex for columns

> In a ``DataFrame``, the rows and columns are completely symmetric, and just as the rows can have multiple levels of indices, the columns can have multiple levels as well.
Consider the following, which is a mock-up of some (somewhat realistic) medical data:

In [None]:
# hierarchical indices for both rows and columns
index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                   names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])

# mock some data
data = np.round(np.random.randn(4, 6), 1)
data[:, ::2] *= 10
data += 37

# DataFrame
health_data = pd.DataFrame(data, index=index, columns=columns)
health_data

> Here we see where the multi-indexing for both rows and columns can come in *very* handy.
This is fundamentally four-dimensional data, where the dimensions are the subject, the measurement type, the year, and the visit number.
With this in place we can, for example, index the top-level column by the person's name and get a full ``DataFrame`` containing just that person's information:

In [None]:
health_data['Guido']

> For complicated records containing multiple labeled measurements across multiple times for many subjects (people, countries, cities, etc.), use of hierarchical rows and columns can be extremely convenient!

## Indexing and Slicing a MultiIndex

> Indexing and slicing on a ``MultiIndex`` is designed to be intuitive, and it helps if you think about the indices as added dimensions.

> We'll first look at indexing multiply indexed ``Series``, and then multiply indexed ``DataFrame``s.

### Multiply indexed Series

> Consider the multiply indexed ``Series`` of state populations we saw earlier. We can access single elements by indexing with multiple terms:

In [None]:
pop

In [None]:
pop['California', 2000]

> The ``MultiIndex`` also supports *partial indexing*, or indexing just one of the levels in the index.
The result is another ``Series``, with the lower-level indices maintained:

In [None]:
pop['California']

> Partial slicing is available as well, as long as the ``MultiIndex`` is sorted (see discussion in [Sorted and Unsorted Indices](#Sorted-and-unsorted-indices)):

In [None]:
pop.loc['California':'New York']
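
To illustrate why the sorting condition matters, here is a hedged sketch using a deliberately unsorted, made-up index (`'a', 'c', 'b'`). In the Pandas versions I am aware of, slicing it raises ``UnsortedIndexError``, which is a subclass of ``KeyError``; the exact behavior may vary between releases, so the failure is only recorded, not relied upon.

```python
import pandas as pd

# First level is deliberately out of order: 'a', 'c', 'b'.
unsorted_index = pd.MultiIndex.from_product([['a', 'c', 'b'], [1, 2]])
data = pd.Series(range(6), index=unsorted_index)

try:
    data.loc['a':'b']
    slice_failed = False
except KeyError:                 # UnsortedIndexError subclasses KeyError
    slice_failed = True

# Sorting the index makes partial slicing well defined.
sorted_data = data.sort_index()
part = sorted_data.loc['a':'b']
```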

> With sorted indices, partial indexing can be performed on lower levels by passing an empty slice in the first index:

In [None]:
pop[:, 2000]

> Other types of indexing and selection (discussed in [Data Indexing and Selection](03.02-Data-Indexing-and-Selection.ipynb)) work as well; for example, selection based on Boolean masks:

In [None]:
pop[pop > 22000000]

> Selection based on fancy indexing also works:

In [None]:
pop[['California', 'Texas']]

### Multiply indexed DataFrames

> A multiply indexed ``DataFrame`` behaves in a similar manner.
Consider our toy medical ``DataFrame`` from before:

In [None]:
health_data

> Remember that columns are primary in a ``DataFrame``, and the syntax used for multiply indexed ``Series`` applies to the columns.
For example, we can recover Guido's heart rate data with a simple operation:

In [None]:
health_data['Guido', 'HR']

> Also, as with the single-index case, we can use the ``loc`` and ``iloc`` indexers introduced in [Data Indexing and Selection](03.02-Data-Indexing-and-Selection.ipynb). (The ``ix`` indexer found in older material has been removed from Pandas.) For example:

In [None]:
health_data.iloc[:2, :2]

> These indexers provide an array-like view of the underlying two-dimensional data, but each individual index in ``loc`` or ``iloc`` can be passed a tuple of multiple indices. For example:

In [None]:
health_data.loc[:, ('Bob', 'HR')]

> Working with slices inside these index tuples is not especially convenient: writing a slice within a tuple, as in ``health_data.loc[(:, 1), (:, 'HR')]``, is a syntax error.
You could get around this by building the desired slice explicitly using Python's built-in ``slice()`` function, but a better way in this context is to use an ``IndexSlice`` object, which Pandas provides for precisely this situation.
For example:

In [None]:
idx = pd.IndexSlice
health_data.loc[idx[:, 1], idx[:, 'HR']]
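
For comparison, the two approaches can be placed side by side. The sketch below rebuilds a frame with the same row and column structure as ``health_data`` but with deterministic, made-up values (`np.arange`) so that it is self-contained; `df` is a hypothetical stand-in, not the notebook's random data.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                   names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])
df = pd.DataFrame(np.arange(24).reshape(4, 6), index=index, columns=columns)

# IndexSlice lets the usual colon syntax appear inside the tuple.
idx = pd.IndexSlice
with_indexslice = df.loc[idx[:, 1], idx[:, 'HR']]

# The same selection with the built-in slice(); slice(None) stands for ':'.
with_slice = df.loc[(slice(None), 1), (slice(None), 'HR')]
```

Both select the first visit of every year and the heart-rate column of every subject; ``IndexSlice`` is simply easier to read.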

### Stacking and unstacking indices

> As we saw briefly before, it is possible to convert a dataset from a stacked multi-index to a simple two-dimensional representation, optionally specifying the level to use:

In [None]:
pop.unstack(level=0)

In [None]:
pop.unstack(level=1)

> The opposite of ``unstack()`` is ``stack()``, which here can be used to recover the original series:

In [None]:
pop.unstack().stack()

### Index setting and resetting

> Another way to rearrange hierarchical data is to turn the index labels into columns; this can be accomplished with the ``reset_index`` method.
Calling this on the population ``Series`` will result in a ``DataFrame`` with a *state* and *year* column holding the information that was formerly in the index.
For clarity, we can optionally specify the name of the data for the column representation:

In [None]:
pop_flat = pop.reset_index(name='population')
pop_flat

> Often when working with data in the real world, the raw input data looks like this and it's useful to build a ``MultiIndex`` from the column values.
This can be done with the ``set_index`` method of the ``DataFrame``, which returns a multiply indexed ``DataFrame``:

In [None]:
pop_flat.set_index(['state', 'year'])
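
The round trip between the two representations can be sketched on a small, self-contained subset of the population data. The names `flat` and `restored` are illustrative and are not used elsewhere in this notebook.

```python
import pandas as pd

index = pd.MultiIndex.from_product([['California', 'Texas'], [2000, 2010]],
                                   names=['state', 'year'])
pop = pd.Series([33871648, 37253956, 20851820, 25145561], index=index)

# Index levels become ordinary columns; the values get the given column name.
flat = pop.reset_index(name='population')

# Going back: the chosen columns become the levels of a MultiIndex again.
restored = flat.set_index(['state', 'year'])['population']
```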

## Data Aggregations on Multi-Indices

> We've previously seen that Pandas has built-in data aggregation methods, such as ``mean()``, ``sum()``, and ``max()``.
For hierarchically indexed data, these aggregations can be computed over a chosen index level. Older Pandas releases accepted a ``level`` parameter on the aggregation method itself; from pandas 2.0 onward that parameter is gone and the same result is obtained with ``groupby(level=...)``.

In [None]:
health_data

> Perhaps we'd like to average out the measurements in the two visits each year. We can do this by naming the index level we'd like to explore, in this case the year:

In [None]:
data_mean = health_data.groupby(level='year').mean()   # older pandas also accepted health_data.mean(level='year')
data_mean

> The same kind of aggregation can be taken among levels on the columns as well (in older Pandas, by combining the ``axis`` keyword with ``level``):

In [None]:
data_mean.T.groupby(level='type').mean().T   # older pandas: data_mean.mean(axis=1, level='type')
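
As a self-contained sketch of per-level aggregation written with ``groupby(level=...)``, the example below uses a frame shaped like ``health_data`` but filled with deterministic, made-up values so the results can be checked by hand. The transpose trick for the column levels is one possible approach, not the only one.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                   names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])
df = pd.DataFrame(np.arange(24, dtype=float).reshape(4, 6),
                  index=index, columns=columns)

# Average the two visits within each year (row level 'year').
yearly = df.groupby(level='year').mean()

# Average across subjects for each measurement type (column level 'type'):
# transpose, group the former columns by level, then transpose back.
by_type = yearly.T.groupby(level='type').mean().T
```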

## Aside: Panel Data

> Older versions of Pandas provided a few other fundamental data structures that we have not yet discussed, namely the ``pd.Panel`` and ``pd.Panel4D`` objects; both have since been removed from the library (``Panel`` as of pandas 0.25).
These can be thought of, respectively, as three-dimensional and four-dimensional generalizations of the (one-dimensional) ``Series`` and (two-dimensional) ``DataFrame`` structures.
Once you are familiar with indexing and manipulation of data in a ``Series`` and ``DataFrame``, ``Panel`` and ``Panel4D`` were relatively straightforward to use.
In particular, the ``loc`` and ``iloc`` indexers discussed in [Data Indexing and Selection](03.02-Data-Indexing-and-Selection.ipynb) extended readily to these higher-dimensional structures.

> We won't cover these panel structures further in this text, as I've found in the majority of cases that multi-indexing is a more useful and conceptually simpler representation for higher-dimensional data.
Additionally, panel data is fundamentally a dense data representation, while multi-indexing is fundamentally a sparse data representation.
As the number of dimensions increases, the dense representation can become very inefficient for the majority of real-world datasets.
For the occasional specialized application, however, these structures can be useful.
If you'd like to read more about the ``Panel`` and ``Panel4D`` structures, see the references listed in [Further Resources](03.13-Further-Resources.ipynb).
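
The dense-versus-sparse point can be made concrete with a tiny, made-up example: a multiply indexed ``Series`` stores only the combinations that were observed, while the equivalent rectangular layout must hold a cell for every combination. The counts below are for this toy data only.

```python
import pandas as pd

# Only four (key, number) combinations were actually observed.
index = pd.MultiIndex.from_tuples([('A', 1), ('A', 2), ('B', 1), ('C', 3)])
sparse = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)

# The dense layout has one cell per combination of keys and numbers.
dense = sparse.unstack()

n_stored_sparse = len(sparse)                # observed entries only
n_cells_dense = dense.size                   # 3 keys x 3 numbers
n_missing = int(dense.isna().sum().sum())    # cells holding no data
```

With more dimensions the number of dense cells grows multiplicatively, while the multi-indexed form still grows only with the number of observations.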

## Importance of Hierarchical Indexing

In [None]:
# pd.Series.index?
# pd.Series.unstack?
# pd.MultiIndex.names?
# pd.MultiIndex?

In [None]:
import pandas as pd
import numpy as np
data_hi = pd.Series(np.random.randn(9),
          index=[['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
                 [1, 2, 3, 1, 4, 1, 2, 2, 4]])
data_hi

In [None]:
data_hi.index

In [None]:
data_hi['A']
# data_hi['A':'C']
# data_hi[['A', 'C']]
# data_hi.loc[:, 1]

In [None]:
data_hi.unstack()
# data_hi.unstack(fill_value=0)
# data_hi.unstack().stack()

In [None]:
df_hi = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                     columns=[['one', 'one', 'three'],['Green', 'Red', 'Green']])
df_hi.index.names = ['val1', 'val2']
df_hi.columns.names = ['number', 'color']
df_hi['one']
df_hi

### Reordering and Sorting Index Levels

In [None]:
# pd.DataFrame.swaplevel?
# pd.DataFrame.sort_index?

In [None]:
import pandas as pd
import numpy as np

df_hi

In [None]:
df_hi.swaplevel('val1', 'val2', axis=0)
df_hi.swaplevel('number', 'color', axis=1)
df_hi.swaplevel(0, 1).sort_index(level=0) 

In [None]:
df_hi.sort_index(level=0)
df_hi.sort_index(level=1)

In [None]:
import pandas as pd
import numpy as np
df_c = pd.DataFrame({'a': range(7), 'b': range(14, 7, -1),
                     'c': ['one', 'one', 'one', 'two', 'two','two', 'two'],
                     'd': [0, 1, 2, 0, 1, 2, 3]})
df_c

In [None]:
df_si = df_c.set_index(['c', 'd'])
df_si

In [None]:
df_c.set_index(['c', 'd'], drop=False)

In [None]:
df_si.reset_index()

