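## Pandas MultiIndex
Why learn the hierarchical index, MultiIndex?
+ A hierarchical index puts several index levels on a single axis, so higher-dimensional data can be represented in a Series or DataFrame.
+ It makes filtering data more convenient, and lookups perform better when the index is sorted.
+ You rarely need to build a hierarchical index by hand; operations such as `groupby` and `set_index` create one for you.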
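
To make the first and last points concrete, here is a minimal sketch on hypothetical toy data (the values are illustrative and are not read from the Excel file used below). It lets `set_index` build the two-level index, then uses `unstack` to show that the same values form a 2-D table.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the stock table used in this section.
df = pd.DataFrame({
    "公司": ["BIDU", "BIDU", "JD", "JD"],
    "日期": ["2019-10-01", "2019-10-02", "2019-10-01", "2019-10-02"],
    "收盘": [102.00, 102.62, 28.19, 28.06],
})

# set_index builds the MultiIndex; no manual construction is needed.
s = df.set_index(["公司", "日期"])["收盘"]
print(s.index.nlevels)  # 2

# The same values viewed as a 2-D table: one row per 公司, one column per 日期.
table = s.unstack("日期")
print(table.shape)                     # (2, 2)
print(table.loc["JD", "2019-10-02"])   # 28.06
```

A two-level Series and a 2-D table carry the same information here, which is the sense in which extra index levels add dimensions.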

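### Goals for this section
+ Hierarchical indexes on a Series
+ Filtering a Series that has a multi-level index
+ MultiIndex on a DataFrame
+ Filtering a DataFrame that has a multi-level index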

In [1]:
import pandas as pd
%matplotlib inline
stocks=pd.read_excel("./datas/stocks/互联网公司股票.xlsx")
stocks.head()

Unnamed: 0,日期,公司,收盘,开盘,高,低,交易量,涨跌幅
0,2019-10-03,BIDU,104.32,102.35,104.73,101.15,2.24,0.02
1,2019-10-02,BIDU,102.62,100.85,103.24,99.5,2.69,0.01
2,2019-10-01,BIDU,102.0,102.8,103.26,101.0,1.78,-0.01
3,2019-10-03,BABA,169.48,166.65,170.18,165.0,10.39,0.02
4,2019-10-02,BABA,165.77,162.82,166.88,161.9,11.6,0.0


In [2]:
stocks.shape

(12, 8)

In [3]:
# Average opening price (开盘) per company
stocks.groupby("公司")["开盘"].mean()

公司
BABA    165.826667
BIDU    102.000000
IQ       15.900000
JD       28.110000
Name: 开盘, dtype: float64

### MultiIndex on a Series


In [4]:
s=stocks.groupby(["公司","日期"])["收盘"].mean()
type(s)  

pandas.core.series.Series

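Why is the result a Series?  
Because 公司 and 日期 become the two levels of the index, and the only values left are those of the single selected column, "收盘".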
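
A short sketch of that reasoning on hypothetical toy data: selecting one column with a plain string yields a Series, while selecting it with a one-element list yields a DataFrame. Both results carry the same two-level index.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the stock table.
df = pd.DataFrame({
    "公司": ["BIDU", "BIDU", "JD", "JD"],
    "日期": ["2019-10-01", "2019-10-02", "2019-10-01", "2019-10-02"],
    "收盘": [102.00, 102.62, 28.19, 28.06],
})

# A single column name gives a Series indexed by (公司, 日期).
as_series = df.groupby(["公司", "日期"])["收盘"].mean()
# A list of column names gives a DataFrame with the same index.
as_frame = df.groupby(["公司", "日期"])[["收盘"]].mean()

print(type(as_series).__name__)                # Series
print(type(as_frame).__name__)                 # DataFrame
print(as_series.index.equals(as_frame.index))  # True
```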

In [5]:
s.reset_index()  # move the index levels back into ordinary columns

Unnamed: 0,公司,日期,收盘
0,BABA,2019-10-01,165.15
1,BABA,2019-10-02,165.77
2,BABA,2019-10-03,169.48
3,BIDU,2019-10-01,102.0
4,BIDU,2019-10-02,102.62
5,BIDU,2019-10-03,104.32
6,IQ,2019-10-01,15.92
7,IQ,2019-10-02,15.72
8,IQ,2019-10-03,16.06
9,JD,2019-10-01,28.19


### Filtering a Series that has a MultiIndex


In [6]:
s

公司    日期        
BABA  2019-10-01    165.15
      2019-10-02    165.77
      2019-10-03    169.48
BIDU  2019-10-01    102.00
      2019-10-02    102.62
      2019-10-03    104.32
IQ    2019-10-01     15.92
      2019-10-02     15.72
      2019-10-03     16.06
JD    2019-10-01     28.19
      2019-10-02     28.06
      2019-10-03     28.80
Name: 收盘, dtype: float64

In [7]:
s.loc["BIDU"]

日期
2019-10-01    102.00
2019-10-02    102.62
2019-10-03    104.32
Name: 收盘, dtype: float64

In [8]:
# To select across several levels, pass the keys as a tuple
s.loc[("BIDU","2019-10-02")]

102.62

In [9]:
# Filter on the second level only: ":" keeps every 公司, the second key picks the 日期
s.loc[:,"2019-10-01"]

公司
BABA    165.15
BIDU    102.00
IQ       15.92
JD       28.19
Name: 收盘, dtype: float64
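
As a cross-check on the second-level selection, the sketch below uses hypothetical toy data and compares `s.loc[:, key]` with `Series.xs(key, level=...)`, an alternative that names the level explicitly. This is an illustration, not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy Series with a sorted two-level index.
index = pd.MultiIndex.from_product(
    [["BIDU", "JD"], ["2019-10-01", "2019-10-02"]],
    names=["公司", "日期"],
)
s = pd.Series([102.00, 102.62, 28.19, 28.06], index=index, name="收盘")

# Both calls select every company on one date and drop the 日期 level.
by_loc = s.loc[:, "2019-10-01"]
by_xs = s.xs("2019-10-01", level="日期")

print(by_loc.tolist())        # [102.0, 28.19]
print(by_xs.tolist())         # [102.0, 28.19]
print(by_loc.index.tolist())  # ['BIDU', 'JD']
```

`xs` can be easier to read when the index has more than two levels, because the level is referred to by name rather than by position.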

### MultiIndex on a DataFrame

In [10]:
stocks.head()

Unnamed: 0,日期,公司,收盘,开盘,高,低,交易量,涨跌幅
0,2019-10-03,BIDU,104.32,102.35,104.73,101.15,2.24,0.02
1,2019-10-02,BIDU,102.62,100.85,103.24,99.5,2.69,0.01
2,2019-10-01,BIDU,102.0,102.8,103.26,101.0,1.78,-0.01
3,2019-10-03,BABA,169.48,166.65,170.18,165.0,10.39,0.02
4,2019-10-02,BABA,165.77,162.82,166.88,161.9,11.6,0.0


In [11]:
# Use 公司 and 日期 as a two-level index
stocks.set_index(["公司","日期"],inplace=True)
stocks

公司,日期,收盘,开盘,高,低,交易量,涨跌幅
BIDU,2019-10-03,104.32,102.35,104.73,101.15,2.24,0.02
BIDU,2019-10-02,102.62,100.85,103.24,99.5,2.69,0.01
BIDU,2019-10-01,102.0,102.8,103.26,101.0,1.78,-0.01
BABA,2019-10-03,169.48,166.65,170.18,165.0,10.39,0.02
BABA,2019-10-02,165.77,162.82,166.88,161.9,11.6,0.0
BABA,2019-10-01,165.15,168.01,168.23,163.64,14.19,-0.01
IQ,2019-10-03,16.06,15.71,16.38,15.32,10.08,0.02
IQ,2019-10-02,15.72,15.85,15.87,15.12,8.1,-0.01
IQ,2019-10-01,15.92,16.14,16.22,15.5,11.65,-0.01
JD,2019-10-03,28.8,28.11,28.97,27.82,8.77,0.03


In [13]:
stocks.index

MultiIndex([('BIDU', '2019-10-03'),
            ('BIDU', '2019-10-02'),
            ('BIDU', '2019-10-01'),
            ('BABA', '2019-10-03'),
            ('BABA', '2019-10-02'),
            ('BABA', '2019-10-01'),
            (  'IQ', '2019-10-03'),
            (  'IQ', '2019-10-02'),
            (  'IQ', '2019-10-01'),
            (  'JD', '2019-10-03'),
            (  'JD', '2019-10-02'),
            (  'JD', '2019-10-01')],
           names=['公司', '日期'])

In [14]:
stocks.sort_index(inplace=True)
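
Why sort? The sketch below, on a hypothetical toy frame whose index is deliberately out of order, checks `is_monotonic_increasing` before and after `sort_index` and then takes a label slice across the first level. Label slicing on a MultiIndex generally expects a sorted index, so sorting first is the safe habit.

```python
import pandas as pd

# Hypothetical toy frame with an unsorted two-level index.
df = pd.DataFrame(
    {"收盘": [104.32, 102.00, 165.77, 28.80]},
    index=pd.MultiIndex.from_tuples(
        [("BIDU", "2019-10-03"), ("BIDU", "2019-10-01"),
         ("BABA", "2019-10-02"), ("JD", "2019-10-03")],
        names=["公司", "日期"],
    ),
)
print(df.index.is_monotonic_increasing)         # False

# sort_index orders by 公司 first, then by 日期 within each company.
sorted_df = df.sort_index()
print(sorted_df.index.is_monotonic_increasing)  # True

# A label slice over the first level; both endpoints are included.
print(sorted_df.loc["BABA":"BIDU"].shape)       # (3, 1)
```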

In [15]:
stocks

公司,日期,收盘,开盘,高,低,交易量,涨跌幅
BABA,2019-10-01,165.15,168.01,168.23,163.64,14.19,-0.01
BABA,2019-10-02,165.77,162.82,166.88,161.9,11.6,0.0
BABA,2019-10-03,169.48,166.65,170.18,165.0,10.39,0.02
BIDU,2019-10-01,102.0,102.8,103.26,101.0,1.78,-0.01
BIDU,2019-10-02,102.62,100.85,103.24,99.5,2.69,0.01
BIDU,2019-10-03,104.32,102.35,104.73,101.15,2.24,0.02
IQ,2019-10-01,15.92,16.14,16.22,15.5,11.65,-0.01
IQ,2019-10-02,15.72,15.85,15.87,15.12,8.1,-0.01
IQ,2019-10-03,16.06,15.71,16.38,15.32,10.08,0.02
JD,2019-10-01,28.19,28.22,28.57,27.97,10.64,0.0


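### Filtering a DataFrame that has a MultiIndex
**Key points**   
+ A tuple `(key1, key2)` selects across index levels: `key1` matches the first level and `key2` the second.
+ A list `[key1, key2]` selects several keys at the same level.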
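
The two rules can be combined. The following sketch uses a hypothetical toy frame rather than the real stock data to show a tuple, a list, a list nested inside a tuple, and `slice(None)` as a "match everything at this level" placeholder.

```python
import pandas as pd

# Hypothetical toy frame: 3 companies x 2 dates, index already sorted.
index = pd.MultiIndex.from_product(
    [["BABA", "BIDU", "JD"], ["2019-10-01", "2019-10-02"]],
    names=["公司", "日期"],
)
df = pd.DataFrame(
    {"收盘": [165.15, 165.77, 102.00, 102.62, 28.19, 28.06]}, index=index
)

# Tuple: one key per level, so this addresses a single row.
one_value = df.loc[("BIDU", "2019-10-02"), "收盘"]
print(one_value)              # 102.62

# List: several keys at the same (first) level.
two_companies = df.loc[["BIDU", "JD"], "收盘"]
print(len(two_companies))     # 4

# List inside a tuple: several first-level keys, one second-level key.
same_day = df.loc[(["BIDU", "JD"], "2019-10-02"), "收盘"]
print(same_day.tolist())      # [102.62, 28.06]

# slice(None) keeps every key at the first level.
all_companies = df.loc[(slice(None), "2019-10-02"), "收盘"]
print(len(all_companies))     # 3
```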

In [16]:
stocks.loc["BIDU"]

日期,收盘,开盘,高,低,交易量,涨跌幅
2019-10-01,102.0,102.8,103.26,101.0,1.78,-0.01
2019-10-02,102.62,100.85,103.24,99.5,2.69,0.01
2019-10-03,104.32,102.35,104.73,101.15,2.24,0.02


In [19]:
# All columns for BIDU on 2019-10-01
s2=stocks.loc[("BIDU","2019-10-01"),:]
type(s2)

pandas.core.series.Series

In [21]:
s2.index

Index(['收盘', '开盘', '高', '低', '交易量', '涨跌幅'], dtype='object')

In [22]:
# Data for BIDU and JD on a given day (here, the 开盘 column)
stocks.loc[(["BIDU","JD"],"2019-10-03"),"开盘"]

公司    日期        
BIDU  2019-10-03    102.35
JD    2019-10-03     28.11
Name: 开盘, dtype: float64

In [23]:
stocks.reset_index()

Unnamed: 0,公司,日期,收盘,开盘,高,低,交易量,涨跌幅
0,BABA,2019-10-01,165.15,168.01,168.23,163.64,14.19,-0.01
1,BABA,2019-10-02,165.77,162.82,166.88,161.9,11.6,0.0
2,BABA,2019-10-03,169.48,166.65,170.18,165.0,10.39,0.02
3,BIDU,2019-10-01,102.0,102.8,103.26,101.0,1.78,-0.01
4,BIDU,2019-10-02,102.62,100.85,103.24,99.5,2.69,0.01
5,BIDU,2019-10-03,104.32,102.35,104.73,101.15,2.24,0.02
6,IQ,2019-10-01,15.92,16.14,16.22,15.5,11.65,-0.01
7,IQ,2019-10-02,15.72,15.85,15.87,15.12,8.1,-0.01
8,IQ,2019-10-03,16.06,15.71,16.38,15.32,10.08,0.02
9,JD,2019-10-01,28.19,28.22,28.57,27.97,10.64,0.0
