# Reindexing

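### 1. Reindexing changes a DataFrame's row and column labels to select a specified set of rows and columns
#### `reindex` can be used to:
- reorder existing data to match a new set of labels
- insert missing-value (NA) markers at label positions for which no data exists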
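A minimal sketch of both behaviours on a small `Series` (the labels and values are hypothetical): the existing labels `'c'` and `'a'` are reordered, and the label `'d'`, which has no data, receives NaN.

```python
import pandas as pd

# Hypothetical example data with string labels
s = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])

# Reorder existing labels and request one label ('d') that does not exist
r = s.reindex(['c', 'a', 'd'])

# 'c' and 'a' keep their values; 'd' has no data, so it is NaN
print(r)
```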

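```python
import pandas as pd
import numpy as np

N = 20

df = pd.DataFrame({
    'A': pd.date_range(start='2016-01-01', periods=N, freq='D'),
    'x': np.linspace(0, stop=N - 1, num=N),
    'y': np.random.rand(N),
    'C': np.random.choice(['Low', 'Medium', 'High'], N).tolist(),
    'D': np.random.normal(100, 10, size=N).tolist(),
})

# Note: on Python 3.6+ with pandas >= 0.23, columns keep the dict's insertion
# order (A, x, y, C, D); the output below comes from an older version that
# sorted them alphabetically.
print(df.head(5), '\n')

# Reindex: keep rows 0, 2, 5 and columns A, C, B
df_reindexed = df.reindex(index=[0, 2, 5], columns=['A', 'C', 'B'])

# df_reindexed holds a subset of df's rows and columns;
# column 'B' does not exist in df, so all of its values are NaN
print(df_reindexed)
```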

```
           A       C           D    x         y
0 2016-01-01     Low  102.470835  0.0  0.045899
1 2016-01-02    High  104.717665  1.0  0.069420
2 2016-01-03     Low  112.804351  2.0  0.804350
3 2016-01-04     Low   97.201832  3.0  0.115151
4 2016-01-05  Medium  122.393591  4.0  0.086764

           A    C   B
0 2016-01-01  Low NaN
2 2016-01-03  Low NaN
5 2016-01-06  Low NaN
```
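If NaN is not wanted for labels that do not exist, `reindex` also accepts a `fill_value` argument. A minimal sketch on a small hypothetical numeric frame, where the missing row 5 and the missing column `'B'` are filled with 0 instead of NaN:

```python
import pandas as pd

# Hypothetical two-column numeric frame with index 0, 1, 2
df = pd.DataFrame({'A': [10, 20, 30], 'C': [1.5, 2.5, 3.5]})

# Row 5 and column 'B' do not exist; fill them with 0 instead of NaN
out = df.reindex(index=[0, 2, 5], columns=['A', 'C', 'B'], fill_value=0)
print(out)
```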


### 2. Reindexing to align with another object

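```python
import pandas as pd
import numpy as np

# DataFrame.reindex_like: conform df1 to df2's index and columns
df1 = pd.DataFrame(np.random.randn(10, 3), columns=['col1', 'col2', 'col3'])
df2 = pd.DataFrame(np.random.randn(7, 2), columns=['col1', 'col3'])

print(df1, '\n')
print(df2, '\n')

df1 = df1.reindex_like(df2)
# The reindexed df1 now has the same index (0-6) and columns (col1, col3) as df2
print(df1)
```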

```
       col1      col2      col3
0 -0.557205 -0.198954  0.712611
1  0.085433  0.249148  0.406792
2 -0.199009 -1.615648  1.119526
3  0.002479  0.644308  0.856208
4 -0.004188 -1.365420 -0.523404
5  0.636174 -1.903351  0.681883
6  0.866583 -0.180536  0.444527
7  0.335972 -0.549129  0.572933
8  0.250375  1.470547  1.271760
9 -0.521196  0.493034  0.111265

       col1      col3
0  0.525968  0.242134
1  0.709595  0.213514
2 -1.829976  0.121692
3  0.001433 -0.953921
4 -0.621043  0.065054
5 -0.210973 -0.691195
6 -0.257499 -1.741614

       col1      col3
0 -0.557205  0.712611
1  0.085433  0.406792
2 -0.199009  1.119526
3  0.002479  0.856208
4 -0.004188 -0.523404
5  0.636174  0.681883
6  0.866583  0.444527
```
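The pandas documentation describes `reindex_like(other)` as a shorthand for `reindex(index=other.index, columns=other.columns)`. A small sketch that checks this on random frames shaped like the ones above (the seeded generator is only there to make the run repeatable):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.standard_normal((10, 3)), columns=['col1', 'col2', 'col3'])
df2 = pd.DataFrame(rng.standard_normal((7, 2)), columns=['col1', 'col3'])

a = df1.reindex_like(df2)
b = df1.reindex(index=df2.index, columns=df2.columns)

# Both calls keep rows 0-6 and columns col1, col3 of df1
print(a.equals(b))
```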


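### 3. Fill methods when reindexing
- pad/ffill -- fill values forward (propagate the last existing value)
- bfill/backfill -- fill values backward (use the next existing value)
- nearest -- fill from the nearest index label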
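To see how the three methods differ, here is a minimal sketch on a hypothetical `Series` that only has values at labels 0, 3 and 6, reindexed to labels 0 through 6. Note that `method` requires a monotonically increasing or decreasing index.

```python
import pandas as pd

# Hypothetical sparse series: values only at labels 0, 3 and 6
s = pd.Series([10.0, 40.0, 70.0], index=[0, 3, 6])
new_index = list(range(7))

ffill = s.reindex(new_index, method='ffill')      # carry the last value forward
bfill = s.reindex(new_index, method='bfill')      # take the next value backward
nearest = s.reindex(new_index, method='nearest')  # take the value at the closest label

print(pd.DataFrame({'ffill': ffill, 'bfill': bfill, 'nearest': nearest}))
```

Label 2, for example, gets 10.0 with `ffill`, but 40.0 with both `bfill` and `nearest`, because label 3 is closer to it than label 0.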

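```python
import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(6, 3), columns=['col1', 'col2', 'col3'])
df2 = pd.DataFrame(np.random.randn(2, 3), columns=['col1', 'col2', 'col3'])

# By default, rows that exist only in df1's index are filled with NaN
print(df2.reindex_like(df1), '\n')

# Now fill those missing values forward.
# Here the result is the same as method='nearest', because label 1 is the
# closest existing label to each of the new labels 2-5
print('DataFrame with Forward Fill:')
print(df2.reindex_like(df1, method='ffill'), '\n')

# To cap how many consecutive missing rows are filled, pass limit
print('DataFrame with Forward Fill(only 2 indexes row):')
print(df2.reindex_like(df1, method='ffill', limit=2))
```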

```
       col1      col2      col3
0 -1.793199  1.009479 -0.550967
1  1.142674 -0.800785  1.976298
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN

DataFrame with Forward Fill:
       col1      col2      col3
0 -1.793199  1.009479 -0.550967
1  1.142674 -0.800785  1.976298
2  1.142674 -0.800785  1.976298
3  1.142674 -0.800785  1.976298
4  1.142674 -0.800785  1.976298
5  1.142674 -0.800785  1.976298

DataFrame with Forward Fill(only 2 indexes row):
       col1      col2      col3
0 -1.793199  1.009479 -0.550967
1  1.142674 -0.800785  1.976298
2  1.142674 -0.800785  1.976298
3  1.142674 -0.800785  1.976298
4       NaN       NaN       NaN
5       NaN       NaN       NaN
```
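`limit` counts consecutive filled labels from each existing value, and it applies in whichever direction the method fills. A minimal sketch with a hypothetical gap of three missing labels between two existing ones:

```python
import pandas as pd

# Hypothetical series with a gap at labels 1, 2 and 3
s = pd.Series([10.0, 50.0], index=[0, 4])
new_index = list(range(5))

# Forward fill: only label 1 is filled from label 0
ffill_lim = s.reindex(new_index, method='ffill', limit=1)
# Backward fill: only label 3 is filled from label 4
bfill_lim = s.reindex(new_index, method='bfill', limit=1)

print(pd.DataFrame({'ffill, limit=1': ffill_lim, 'bfill, limit=1': bfill_lim}))
```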
