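# Reindexing

### 1. Reindexing changes a DataFrame's row and column labels to produce a specified set of rows and columns
#### reindex supports several operations
- Reorder existing data to match a new set of labels
- Insert missing-value (NA) markers at label positions where no data exists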
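
Both behaviors listed above can be seen on a small Series. This is a minimal sketch; the data and the names `s` and `reordered` are made up for illustration only.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# Request existing labels in a new order, plus one label ('d') that does not exist.
reordered = s.reindex(['c', 'a', 'd'])

# 'c' and 'a' keep their values in the requested order; 'd' is NaN.
print(reordered)
```

Because NaN is a floating-point value, the integer Series is converted to float64 once a missing label is introduced.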

In [1]:
import pandas as pd
import numpy as np

N = 20

df = pd.DataFrame({
        'A':pd.date_range(start='2016-01-01',periods=N,freq='D'),
        'x':np.linspace(0,stop=N-1,num=N),
        'y':np.random.rand(N),
        'C':np.random.choice(['Low','Medium','High'],N).tolist(),
    })

df.head(5)

| | A | C | x | y |
|---|---|---|---|---|
| 0 | 2016-01-01 | Medium | 0.0 | 0.086927 |
| 1 | 2016-01-02 | High | 1.0 | 0.363968 |
| 2 | 2016-01-03 | Medium | 2.0 | 0.83958 |
| 3 | 2016-01-04 | Low | 3.0 | 0.12213 |
| 4 | 2016-01-05 | Medium | 4.0 | 0.393594 |


In [2]:
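# Reindex the DataFrame: keep rows 0, 2, 5 and columns 'A', 'C', 'B'
df_reindexed = df.reindex(index=[0,2,5],columns=['A','C','B'])

# The output shows that df_reindexed is a subset of the rows of df.
# Column 'B' does not exist in df, so all of its values are NaN.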
df_reindexed

| | A | C | B |
|---|---|---|---|
| 0 | 2016-01-01 | Medium | NaN |
| 2 | 2016-01-03 | Medium | NaN |
| 5 | 2016-01-06 | Medium | NaN |


### 2. Reindexing to align with another object  
Sometimes you want the axis labels of one object to match those of another object. Calling reindex_like(other_obj) simplifies this.
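
As a rough illustration of what reindex_like does, the sketch below compares it with an explicit reindex call that passes the other object's index and columns. The frames `left` and `right` are hypothetical and use deterministic values so the comparison is reproducible.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: 'left' has 5 rows and 3 columns, 'right' has 3 rows and 2 columns.
left = pd.DataFrame(np.arange(15).reshape(5, 3), columns=['col1', 'col2', 'col3'])
right = pd.DataFrame(np.zeros((3, 2)), columns=['col1', 'col3'])

# Take the row and column labels from 'right' in one call...
via_like = left.reindex_like(right)

# ...or spell the same request out with reindex.
via_reindex = left.reindex(index=right.index, columns=right.columns)

# Both results have the labels of 'right' and the values of 'left'.
print(via_like.equals(via_reindex))
```

Only the labels come from the other object; the values always come from the object being reindexed.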

In [3]:
df1 = pd.DataFrame(np.random.randn(5,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(3,2),columns=['col1','col3'])

df1

| | col1 | col2 | col3 |
|---|---|---|---|
| 0 | 0.316469 | -0.087153 | 0.60896 |
| 1 | 2.386163 | 0.178221 | 1.025812 |
| 2 | -1.733944 | 0.370595 | -0.194925 |
| 3 | 0.268836 | 0.201425 | -0.395783 |
| 4 | -1.328632 | -1.378002 | -0.347446 |


In [4]:
df2

| | col1 | col3 |
|---|---|---|
| 0 | -0.963156 | -0.841239 |
| 1 | 3.06234 | 0.91475 |
| 2 | 1.670868 | 0.24785 |


In [5]:
df3 = df1.reindex_like(df2)

# The result shows that the reindexed df1 has the same index and columns as df2
df3

| | col1 | col3 |
|---|---|---|
| 0 | 0.316469 | 0.60896 |
| 1 | 2.386163 | 1.025812 |
| 2 | -1.733944 | -0.194925 |


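### 3. Fill methods when reindexing
- pad/ffill -- fill forward: propagate the last valid value to the following labels
- bfill/backfill -- fill backward: use the next valid value for the preceding labels
- nearest -- fill from the nearest index label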
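
The three methods listed above give different results when the new labels fall between existing ones. The sketch below uses a hypothetical Series with index labels 0, 5 and 10; pandas requires the index to be monotonically increasing or decreasing for these methods.

```python
import pandas as pd

# Hypothetical data: values exist only at labels 0, 5 and 10.
s = pd.Series([1.0, 2.0, 3.0], index=[0, 5, 10])
target = [0, 2, 4, 6, 8, 10]

# ffill: each new label takes the value of the last existing label at or before it.
forward = s.reindex(target, method='ffill')

# bfill: each new label takes the value of the next existing label at or after it.
backward = s.reindex(target, method='bfill')

# nearest: each new label takes the value of the closest existing label.
nearest = s.reindex(target, method='nearest')

print(pd.DataFrame({'ffill': forward, 'bfill': backward, 'nearest': nearest}))
```

Label 4, for example, receives 1.0 with ffill (from label 0) but 2.0 with bfill and nearest (from label 5).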

In [6]:
df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])

# Rows that do not exist in df2 are filled with NaN automatically
df3 = df2.reindex_like(df1)
df3

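| | col1 | col2 | col3 |
|---|---|---|---|
| 0 | -0.628605 | 0.351178 | -1.489034 |
| 1 | -2.557107 | -0.059435 | -0.324165 |
| 2 | NaN | NaN | NaN |
| 3 | NaN | NaN | NaN |
| 4 | NaN | NaN | NaN |
| 5 | NaN | NaN | NaN |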


In [7]:
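# Now forward-fill the NaN values.
# With this data the result happens to match method='nearest', because row 1
# is both the last preceding row and the nearest existing row for rows 2-5.
print('DataFrame with Forward Fill:')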
df3 = df2.reindex_like(df1,method='ffill') 
df3

DataFrame with Forward Fill:


| | col1 | col2 | col3 |
|---|---|---|---|
| 0 | -0.628605 | 0.351178 | -1.489034 |
| 1 | -2.557107 | -0.059435 | -0.324165 |
| 2 | -2.557107 | -0.059435 | -0.324165 |
| 3 | -2.557107 | -0.059435 | -0.324165 |
| 4 | -2.557107 | -0.059435 | -0.324165 |
| 5 | -2.557107 | -0.059435 | -0.324165 |


In [8]:
# To cap the number of consecutive rows that get filled, add the limit parameter
print('DataFrame with Forward Fill(only 2 indexes row):')
df3 = df2.reindex_like(df1,method='ffill',limit=2)
df3

DataFrame with Forward Fill(only 2 indexes row):


| | col1 | col2 | col3 |
|---|---|---|---|
| 0 | -0.628605 | 0.351178 | -1.489034 |
| 1 | -2.557107 | -0.059435 | -0.324165 |
| 2 | -2.557107 | -0.059435 | -0.324165 |
| 3 | -2.557107 | -0.059435 | -0.324165 |
| 4 | NaN | NaN | NaN |
| 5 | NaN | NaN | NaN |


In [9]:
# Use the index of another object
print('Using index of other object:')
df3 = df2.reindex(df1.index,method='nearest')
df3

Using index of other object:


| | col1 | col2 | col3 |
|---|---|---|---|
| 0 | -0.628605 | 0.351178 | -1.489034 |
| 1 | -2.557107 | -0.059435 | -0.324165 |
| 2 | -2.557107 | -0.059435 | -0.324165 |
| 3 | -2.557107 | -0.059435 | -0.324165 |
| 4 | -2.557107 | -0.059435 | -0.324165 |
| 5 | -2.557107 | -0.059435 | -0.324165 |


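### 4. set_index() and reset_index()
set_index() sets one or more existing columns as the DataFrame index (the row labels). By default it returns a new object and leaves the original unchanged.
The examples below use the DataFrame created in In [1]: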

In [10]:
df.head()

| | A | C | x | y |
|---|---|---|---|---|
| 0 | 2016-01-01 | Medium | 0.0 | 0.086927 |
| 1 | 2016-01-02 | High | 1.0 | 0.363968 |
| 2 | 2016-01-03 | Medium | 2.0 | 0.83958 |
| 3 | 2016-01-04 | Low | 3.0 | 0.12213 |
| 4 | 2016-01-05 | Medium | 4.0 | 0.393594 |


In [11]:
# Create the index from a single column
df4 = df.set_index('A')
df4.head()

| A | C | x | y |
|---|---|---|---|
| 2016-01-01 | Medium | 0.0 | 0.086927 |
| 2016-01-02 | High | 1.0 | 0.363968 |
| 2016-01-03 | Medium | 2.0 | 0.83958 |
| 2016-01-04 | Low | 3.0 | 0.12213 |
| 2016-01-05 | Medium | 4.0 | 0.393594 |


In [12]:
# Set the index name to None to remove the 'A' label
df4.index.name = None
df4.head()

| | C | x | y |
|---|---|---|---|
| 2016-01-01 | Medium | 0.0 | 0.086927 |
| 2016-01-02 | High | 1.0 | 0.363968 |
| 2016-01-03 | Medium | 2.0 | 0.83958 |
| 2016-01-04 | Low | 3.0 | 0.12213 |
| 2016-01-05 | Medium | 4.0 | 0.393594 |


In [13]:
# Create the index from multiple columns (this produces a MultiIndex)
df4 = df.set_index(['x','y'])
df4.head()

| x | y | A | C |
|---|---|---|---|
| 0.0 | 0.086927 | 2016-01-01 | Medium |
| 1.0 | 0.363968 | 2016-01-02 | High |
| 2.0 | 0.83958 | 2016-01-03 | Medium |
| 3.0 | 0.12213 | 2016-01-04 | Low |
| 4.0 | 0.393594 | 2016-01-05 | Medium |
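
The section heading also names reset_index(), which reverses set_index() by moving the index back into an ordinary column. The following is a minimal sketch of that round trip; the small `frame` is a hypothetical stand-in for the DataFrame from In [1], not the original data.

```python
import pandas as pd

# Hypothetical three-row frame, shaped like the one from In [1].
frame = pd.DataFrame({
    'A': pd.date_range(start='2016-01-01', periods=3, freq='D'),
    'x': [0.0, 1.0, 2.0],
    'C': ['Low', 'Medium', 'High'],
})

# set_index returns a new object in which 'A' is the index; 'frame' is unchanged.
indexed = frame.set_index('A')

# reset_index moves 'A' back to a regular column and restores a default integer index.
restored = indexed.reset_index()

print(list(indexed.columns))   # 'A' is no longer a column here
print(list(restored.columns))  # 'A' is a column again
```

Because set_index returns a new object by default, the result has to be assigned to a name, as with `df4` in the cells above.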
