# Hierarchical Indexing in Pandas: MultiIndex

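Why learn hierarchical indexing (MultiIndex)?

* Hierarchical indexing means having several index levels on a single axis, which lets a Series or DataFrame represent higher-dimensional data;
* It makes selecting data more convenient, and lookups are faster when the index is sorted;
* Operations such as groupby return a hierarchical index when given multiple keys, so you need to know how to work with one;
* You rarely need to build a MultiIndex yourself (MultiIndex has constructors, but they are seldom used directly).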
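Direct construction is rarely needed, but it helps to see what the constructors look like. The following is a minimal sketch, assuming a small hand-built sample (two companies, two dates, with closing prices taken from the demo data) rather than the Excel file used in this notebook.

```python
import pandas as pd

# Small hand-built sample: two companies, two trading days.
companies = ["BABA", "BIDU"]
dates = pd.to_datetime(["2023-07-10", "2023-07-11"])

# from_product builds every (company, date) combination.
index = pd.MultiIndex.from_product([companies, dates], names=["公司", "日期"])
ser = pd.Series([90.56, 91.79, 142.95, 143.33], index=index, name="收盘")

# from_tuples builds the same index from explicit (company, date) pairs.
same_index = pd.MultiIndex.from_tuples(list(index), names=["公司", "日期"])

print(ser)
print(index.equals(same_index))
```

In practice the same structure usually comes out of `groupby` or `set_index`, as the rest of this notebook shows.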

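Demo data: stock data for four companies: Baidu, Alibaba, iQIYI, and JD.com<br>
Columns: 日期 (date), 公司 (company), 收盘 (close), 开盘 (open), 高 (high), 低 (low), 交易量 (volume), 涨跌幅 (change)<br>
Data source: Investing.com (英为财经)<br>
https://cn.investing.com/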

---
Outline:<br>
I. MultiIndex on a Series<br>
II. Selecting data from a Series with a MultiIndex<br>
III. MultiIndex on a DataFrame<br>
IV. Selecting data from a DataFrame with a MultiIndex

---

In [1]:
import pandas as pd
%matplotlib inline

In [3]:
stocks = pd.read_excel('../datas/股票.xlsx')

In [4]:
stocks.shape

(12, 8)

In [5]:
stocks.head()

,日期,公司,收盘,开盘,高,低,交易量,涨跌幅
0,2023-07-12,JD,37.41,37.34,37.83,36.91,11.42M,0.0386
1,2023-07-11,JD,36.02,35.9,36.39,35.31,6.95M,0.0019
2,2023-07-10,JD,35.95,35.3,36.15,35.06,8.33M,0.0053
3,2023-07-12,BABA,94.0,94.11,95.03,92.55,23.78M,0.0241
4,2023-07-11,BABA,91.79,91.02,92.32,89.01,19.74M,0.0136


In [6]:
stocks["公司"].unique()

array(['JD', 'BABA', 'BIDU', 'IQ'], dtype=object)

In [7]:
stocks.index

RangeIndex(start=0, stop=12, step=1)

In [8]:
stocks.groupby('公司')["收盘"].mean()

公司
BABA     92.116667
BIDU    145.036667
IQ        5.350000
JD       36.460000
Name: 收盘, dtype: float64

## I. MultiIndex on a Series

In [10]:
ser = stocks.groupby(['公司','日期'])['收盘'].mean()
ser

公司    日期        
BABA  2023-07-10     90.56
      2023-07-11     91.79
      2023-07-12     94.00
BIDU  2023-07-10    142.95
      2023-07-11    143.33
      2023-07-12    148.83
IQ    2023-07-10      5.12
      2023-07-11      5.23
      2023-07-12      5.70
JD    2023-07-10     35.95
      2023-07-11     36.02
      2023-07-12     37.41
Name: 收盘, dtype: float64
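To make the difference between grouping by one key and by several keys concrete, here is a small sketch. It assumes a four-row hand-built frame with the same column names as the demo data, not the full Excel file.

```python
import pandas as pd

# Four rows with the same column names as the demo data.
df = pd.DataFrame({
    "公司": ["JD", "JD", "BABA", "BABA"],
    "日期": pd.to_datetime(["2023-07-12", "2023-07-11", "2023-07-12", "2023-07-11"]),
    "收盘": [37.41, 36.02, 94.00, 91.79],
})

one_key = df.groupby("公司")["收盘"].mean()            # grouped by one key
two_keys = df.groupby(["公司", "日期"])["收盘"].mean()  # grouped by two keys

print(type(one_key.index).__name__)   # a flat index
print(type(two_keys.index).__name__)  # a hierarchical index
print(two_keys.index.names)
```

Grouping by a list of keys produces one index level per key, in the order the keys were given.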

---
In the displayed output of a MultiIndex, a blank cell means the value is the same as the one above it.

---
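The blanks are only a display convention: each row still carries a complete label. The sketch below illustrates this, assuming a hand-built Series whose dates are stored as plain strings for simplicity (the notebook's index uses datetimes).

```python
import pandas as pd

# Assumption: dates are plain strings here, unlike the datetime level above.
index = pd.MultiIndex.from_product(
    [["BABA", "BIDU"], ["2023-07-10", "2023-07-11"]], names=["公司", "日期"]
)
ser = pd.Series([90.56, 91.79, 142.95, 143.33], index=index, name="收盘")

# Every row has a full (company, date) tuple as its label.
print(ser.index.tolist())

# get_level_values returns one level's label for every row, repeats included.
print(ser.index.get_level_values("公司").tolist())
```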

In [11]:
ser.index

MultiIndex(levels=[['BABA', 'BIDU', 'IQ', 'JD'], [2023-07-10 00:00:00, 2023-07-11 00:00:00, 2023-07-12 00:00:00]],
           labels=[[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]],
           names=['公司', '日期'])
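The repr above comes from an older pandas release, where the integer positions were called `labels`. In pandas 0.24 and later the same attribute is named `codes`. The following sketch, assuming a hand-built index with string dates, shows how `levels` and `codes` together encode each row's label.

```python
import pandas as pd

# Assumption: a small index with string dates, built by hand for illustration.
index = pd.MultiIndex.from_product(
    [["BABA", "BIDU"], ["2023-07-10", "2023-07-11"]], names=["公司", "日期"]
)

print(index.levels)  # the unique values of each level
print(index.codes)   # per-row integer positions into each level
print(index.names)   # the level names

# Rebuild the first level's row labels from levels + codes.
company_level = index.levels[0]
company_codes = index.codes[0]
rebuilt = [company_level[c] for c in company_codes]
print(rebuilt)
```

Storing each unique value once and referring to it by position is what keeps a MultiIndex compact even when labels repeat many times.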

In [12]:
# unstack turns the second (innermost) index level into columns
ser.unstack()

日期,2023-07-10 00:00:00,2023-07-11 00:00:00,2023-07-12 00:00:00
公司,,,
BABA,90.56,91.79,94.0
BIDU,142.95,143.33,148.83
IQ,5.12,5.23,5.7
JD,35.95,36.02,37.41
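By default `unstack` pivots the innermost level, but the `level` argument lets you choose another one. Here is a minimal sketch, assuming a hand-built Series with string dates.

```python
import pandas as pd

# Assumption: a small Series with string dates, built by hand for illustration.
index = pd.MultiIndex.from_product(
    [["BABA", "BIDU"], ["2023-07-10", "2023-07-11"]], names=["公司", "日期"]
)
ser = pd.Series([90.56, 91.79, 142.95, 143.33], index=index, name="收盘")

# Default: the innermost level (日期) becomes the columns.
wide = ser.unstack()
print(wide)

# Pivot the company level instead, so dates stay on the rows.
by_company = ser.unstack(level="公司")
print(by_company)
```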


In [13]:
ser

公司    日期        
BABA  2023-07-10     90.56
      2023-07-11     91.79
      2023-07-12     94.00
BIDU  2023-07-10    142.95
      2023-07-11    143.33
      2023-07-12    148.83
IQ    2023-07-10      5.12
      2023-07-11      5.23
      2023-07-12      5.70
JD    2023-07-10     35.95
      2023-07-11     36.02
      2023-07-12     37.41
Name: 收盘, dtype: float64

In [14]:
ser.reset_index()

,公司,日期,收盘
0,BABA,2023-07-10,90.56
1,BABA,2023-07-11,91.79
2,BABA,2023-07-12,94.0
3,BIDU,2023-07-10,142.95
4,BIDU,2023-07-11,143.33
5,BIDU,2023-07-12,148.83
6,IQ,2023-07-10,5.12
7,IQ,2023-07-11,5.23
8,IQ,2023-07-12,5.7
9,JD,2023-07-10,35.95


## II. Selecting data from a Series with a MultiIndex

In [15]:
ser

公司    日期        
BABA  2023-07-10     90.56
      2023-07-11     91.79
      2023-07-12     94.00
BIDU  2023-07-10    142.95
      2023-07-11    143.33
      2023-07-12    148.83
IQ    2023-07-10      5.12
      2023-07-11      5.23
      2023-07-12      5.70
JD    2023-07-10     35.95
      2023-07-11     36.02
      2023-07-12     37.41
Name: 收盘, dtype: float64

In [16]:
ser.loc['BIDU']

日期
2023-07-10    142.95
2023-07-11    143.33
2023-07-12    148.83
Name: 收盘, dtype: float64

In [18]:
# With a MultiIndex, a tuple selects one key per level
ser.loc[('BIDU','2023-07-11')]

143.33

In [19]:
ser.loc[:,'2023-07-11']

公司
BABA     91.79
BIDU    143.33
IQ        5.23
JD       36.02
Name: 收盘, dtype: float64
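Selecting on the second level with `ser.loc[:, key]` can also be written with `Series.xs`, which names the level explicitly. The sketch below assumes a hand-built Series with string dates; it is one possible alternative, not something the original notebook uses.

```python
import pandas as pd

# Assumption: a small Series with string dates, built by hand for illustration.
index = pd.MultiIndex.from_product(
    [["BABA", "BIDU"], ["2023-07-10", "2023-07-11"]], names=["公司", "日期"]
)
ser = pd.Series([90.56, 91.79, 142.95, 143.33], index=index, name="收盘")

# Positional style: ":" for the first level, a key for the second.
by_loc = ser.loc[:, "2023-07-11"]

# Named style: xs takes a cross-section at the given level.
by_xs = ser.xs("2023-07-11", level="日期")

print(by_loc)
print(by_xs)
```

Naming the level tends to read more clearly once an index has more than two levels.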

## III. MultiIndex on a DataFrame

In [20]:
stocks.head()

,日期,公司,收盘,开盘,高,低,交易量,涨跌幅
0,2023-07-12,JD,37.41,37.34,37.83,36.91,11.42M,0.0386
1,2023-07-11,JD,36.02,35.9,36.39,35.31,6.95M,0.0019
2,2023-07-10,JD,35.95,35.3,36.15,35.06,8.33M,0.0053
3,2023-07-12,BABA,94.0,94.11,95.03,92.55,23.78M,0.0241
4,2023-07-11,BABA,91.79,91.02,92.32,89.01,19.74M,0.0136


In [22]:
stocks.set_index(['公司','日期'],inplace=True)
stocks

公司,日期,收盘,开盘,高,低,交易量,涨跌幅
JD,2023-07-12,37.41,37.34,37.83,36.91,11.42M,0.0386
JD,2023-07-11,36.02,35.9,36.39,35.31,6.95M,0.0019
JD,2023-07-10,35.95,35.3,36.15,35.06,8.33M,0.0053
BABA,2023-07-12,94.0,94.11,95.03,92.55,23.78M,0.0241
BABA,2023-07-11,91.79,91.02,92.32,89.01,19.74M,0.0136
BABA,2023-07-10,90.56,90.05,92.04,89.6,25.19M,0.0001
BIDU,2023-07-12,148.83,147.44,150.42,145.5,2.41M,0.0384
BIDU,2023-07-11,143.33,143.23,144.45,140.01,960.45K,0.0027
BIDU,2023-07-10,142.95,140.72,143.95,140.12,887.82K,0.002
IQ,2023-07-12,5.7,5.43,5.78,5.41,19.24M,0.0899


In [23]:
stocks.index

MultiIndex(levels=[['BABA', 'BIDU', 'IQ', 'JD'], [2023-07-10 00:00:00, 2023-07-11 00:00:00, 2023-07-12 00:00:00]],
           labels=[[3, 3, 3, 0, 0, 0, 1, 1, 1, 2, 2, 2], [2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0]],
           names=['公司', '日期'])

In [24]:
stocks.sort_index(inplace=True)
stocks

公司,日期,收盘,开盘,高,低,交易量,涨跌幅
BABA,2023-07-10,90.56,90.05,92.04,89.6,25.19M,0.0001
BABA,2023-07-11,91.79,91.02,92.32,89.01,19.74M,0.0136
BABA,2023-07-12,94.0,94.11,95.03,92.55,23.78M,0.0241
BIDU,2023-07-10,142.95,140.72,143.95,140.12,887.82K,0.002
BIDU,2023-07-11,143.33,143.23,144.45,140.01,960.45K,0.0027
BIDU,2023-07-12,148.83,147.44,150.42,145.5,2.41M,0.0384
IQ,2023-07-10,5.12,5.1,5.19,5.02,8.52M,-0.0078
IQ,2023-07-11,5.23,5.15,5.3,5.12,5.23M,0.0215
IQ,2023-07-12,5.7,5.43,5.78,5.41,19.24M,0.0899
JD,2023-07-10,35.95,35.3,36.15,35.06,8.33M,0.0053
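Sorting matters because label-based slicing and fast lookups rely on a sorted index. The following sketch, assuming a four-row hand-built frame with string dates, checks the sort state before and after `sort_index`.

```python
import pandas as pd

# Assumption: four rows in the same unsorted order as the raw data (JD first).
df = pd.DataFrame({
    "公司": ["JD", "JD", "BABA", "BABA"],
    "日期": ["2023-07-12", "2023-07-11", "2023-07-12", "2023-07-11"],
    "收盘": [37.41, 36.02, 94.00, 91.79],
}).set_index(["公司", "日期"])

sorted_before = df.index.is_monotonic_increasing

df_sorted = df.sort_index()
sorted_after = df_sorted.index.is_monotonic_increasing

print(sorted_before, sorted_after)

# With a sorted index, a label slice over the first level works as expected.
sliced = df_sorted.loc["BABA":"JD"]
print(sliced)
```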


## IV. Selecting data from a DataFrame with a MultiIndex

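**Key point:** when selecting data:
* A tuple (key1, key2) selects across index levels: key1 applies to the first level and key2 to the second, e.g. key1=JD, key2=2023-07-11;
* A list [key1, key2] selects several keys within the same level: key1 and key2 are sibling labels, e.g. key1=JD, key2=BIDU;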
---
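The tuple-versus-list rule can be tried out on a small frame. This is a minimal sketch, assuming a hand-built frame whose dates are plain strings; with the datetime level used in this notebook, the shape of some results can differ (for example, a one-row Series instead of a scalar).

```python
import pandas as pd

# Assumption: string dates and a single column, built by hand for illustration.
index = pd.MultiIndex.from_product(
    [["BABA", "BIDU", "JD"], ["2023-07-10", "2023-07-11"]], names=["公司", "日期"]
)
df = pd.DataFrame(
    {"收盘": [90.56, 91.79, 142.95, 143.33, 35.95, 36.02]}, index=index
)

# Tuple: one key per level (company, then date).
one_cell = df.loc[("JD", "2023-07-11"), "收盘"]

# List: several keys within the same (first) level.
two_companies = df.loc[["BIDU", "JD"], :]

# Tuple containing a list: several companies, one date.
mixed = df.loc[(["BIDU", "JD"], "2023-07-10"), :]

print(one_cell)
print(two_companies)
print(mixed)
```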

In [26]:
stocks.loc['BIDU']

日期,收盘,开盘,高,低,交易量,涨跌幅
2023-07-10,142.95,140.72,143.95,140.12,887.82K,0.002
2023-07-11,143.33,143.23,144.45,140.01,960.45K,0.0027
2023-07-12,148.83,147.44,150.42,145.5,2.41M,0.0384


In [27]:
stocks.loc[('BIDU','2023-07-10'),:]

公司,日期,收盘,开盘,高,低,交易量,涨跌幅
BIDU,2023-07-10,142.95,140.72,143.95,140.12,887.82K,0.002


In [28]:
stocks.loc[('BIDU','2023-07-10'),'开盘']

公司    日期        
BIDU  2023-07-10    140.72
Name: 开盘, dtype: float64

In [29]:
stocks.loc[['BIDU','JD'],:]

公司,日期,收盘,开盘,高,低,交易量,涨跌幅
BIDU,2023-07-10,142.95,140.72,143.95,140.12,887.82K,0.002
BIDU,2023-07-11,143.33,143.23,144.45,140.01,960.45K,0.0027
BIDU,2023-07-12,148.83,147.44,150.42,145.5,2.41M,0.0384
JD,2023-07-10,35.95,35.3,36.15,35.06,8.33M,0.0053
JD,2023-07-11,36.02,35.9,36.39,35.31,6.95M,0.0019
JD,2023-07-12,37.41,37.34,37.83,36.91,11.42M,0.0386


In [30]:
stocks.loc[(['BIDU','JD'],'2023-07-10'),:]

公司,日期,收盘,开盘,高,低,交易量,涨跌幅
BIDU,2023-07-10,142.95,140.72,143.95,140.12,887.82K,0.002
JD,2023-07-10,35.95,35.3,36.15,35.06,8.33M,0.0053


In [31]:
stocks.loc[(['BIDU','JD'],'2023-07-10'),'收盘']

公司    日期        
BIDU  2023-07-10    142.95
JD    2023-07-10     35.95
Name: 收盘, dtype: float64

In [32]:
stocks.loc[('BIDU',['2023-07-10','2023-07-11']),'收盘']

公司    日期        
BIDU  2023-07-10    142.95
      2023-07-11    143.33
Name: 收盘, dtype: float64

In [34]:
# slice(None) selects everything at that index level
stocks.loc[(slice(None),['2023-07-10','2023-07-11']),:]

公司,日期,收盘,开盘,高,低,交易量,涨跌幅
BABA,2023-07-10,90.56,90.05,92.04,89.6,25.19M,0.0001
BABA,2023-07-11,91.79,91.02,92.32,89.01,19.74M,0.0136
BIDU,2023-07-10,142.95,140.72,143.95,140.12,887.82K,0.002
BIDU,2023-07-11,143.33,143.23,144.45,140.01,960.45K,0.0027
IQ,2023-07-10,5.12,5.1,5.19,5.02,8.52M,-0.0078
IQ,2023-07-11,5.23,5.15,5.3,5.12,5.23M,0.0215
JD,2023-07-10,35.95,35.3,36.15,35.06,8.33M,0.0053
JD,2023-07-11,36.02,35.9,36.39,35.31,6.95M,0.0019
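`slice(None)` can be verbose when an index has several levels. pandas also provides `pd.IndexSlice`, which lets you write the same selection with the familiar `:` syntax. The sketch below assumes a hand-built, sorted frame with string dates and compares the two spellings.

```python
import pandas as pd

# Assumption: string dates and a single column, built by hand for illustration.
index = pd.MultiIndex.from_product(
    [["BABA", "BIDU", "JD"], ["2023-07-10", "2023-07-11", "2023-07-12"]],
    names=["公司", "日期"],
)
closes = [90.56, 91.79, 94.00, 142.95, 143.33, 148.83, 35.95, 36.02, 37.41]
df = pd.DataFrame({"收盘": closes}, index=index)

wanted = ["2023-07-10", "2023-07-11"]

# Explicit form: slice(None) means "all companies".
with_slice = df.loc[(slice(None), wanted), :]

# Same selection written with IndexSlice.
idx = pd.IndexSlice
with_index_slice = df.loc[idx[:, wanted], :]

print(with_index_slice)
print(with_slice.equals(with_index_slice))
```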


In [35]:
stocks.reset_index()

,公司,日期,收盘,开盘,高,低,交易量,涨跌幅
0,BABA,2023-07-10,90.56,90.05,92.04,89.6,25.19M,0.0001
1,BABA,2023-07-11,91.79,91.02,92.32,89.01,19.74M,0.0136
2,BABA,2023-07-12,94.0,94.11,95.03,92.55,23.78M,0.0241
3,BIDU,2023-07-10,142.95,140.72,143.95,140.12,887.82K,0.002
4,BIDU,2023-07-11,143.33,143.23,144.45,140.01,960.45K,0.0027
5,BIDU,2023-07-12,148.83,147.44,150.42,145.5,2.41M,0.0384
6,IQ,2023-07-10,5.12,5.1,5.19,5.02,8.52M,-0.0078
7,IQ,2023-07-11,5.23,5.15,5.3,5.12,5.23M,0.0215
8,IQ,2023-07-12,5.7,5.43,5.78,5.41,19.24M,0.0899
9,JD,2023-07-10,35.95,35.3,36.15,35.06,8.33M,0.0053
