#### Reindexing

Reindexing changes the row labels and column labels of a DataFrame. To reindex means to conform the data to match a given set of labels along a particular axis.

Reindexing can accomplish several things:

1. Reorder the existing data to match a new set of labels.
2. Insert missing-value (NaN) markers at label locations where no data for that label existed.
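Both effects can be seen on a small, deterministic frame. The sketch below is only an illustration; the frame, its labels and the variable names are made up for this example and are not part of the notebook's data.

```python
import pandas as pd

# Small hand-written frame so the result is easy to check by eye.
df_small = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]},
                        index=['x', 'y', 'z'])

# 'z' and 'x' exist, so their rows are simply reordered.
# Row 'w' and column 'c' do not exist, so they are filled with NaN.
result = df_small.reindex(index=['z', 'x', 'w'], columns=['b', 'c'])
print(result)
```

Note that row 'y' and column 'a' are dropped because they are not in the requested labels.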


In [2]:
import pandas as pd
import numpy as np

N=20

df = pd.DataFrame({
   'A': pd.date_range(start='2016-01-01',periods=N,freq='D'),
   'x': np.linspace(0,stop=N-1,num=N),
   'y': np.random.rand(N),
   'C': np.random.choice(['Low','Medium','High'],N).tolist(),
   'D': np.random.normal(100, 10, size=(N)).tolist()
})

print(df)

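# Reindex the DataFrame: keep rows 0, 2 and 5 and columns 'A' and 'C'.
# Column 'B' does not exist in df, so it is added and filled with NaN.
df_reindexed = df.reindex(index=[0,2,5], columns=['A', 'C', 'B'])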

print(df_reindexed)

            A     x         y       C           D
0  2016-01-01   0.0  0.577828     Low   94.133983
1  2016-01-02   1.0  0.288539     Low   82.772606
2  2016-01-03   2.0  0.683746     Low   97.408563
3  2016-01-04   3.0  0.787850    High   79.782049
4  2016-01-05   4.0  0.834927  Medium   77.590439
5  2016-01-06   5.0  0.632270     Low   90.722540
6  2016-01-07   6.0  0.884251  Medium   91.891968
7  2016-01-08   7.0  0.134866    High   91.011383
8  2016-01-09   8.0  0.516954     Low  101.160012
9  2016-01-10   9.0  0.481512    High   76.754845
10 2016-01-11  10.0  0.611139    High  100.637506
11 2016-01-12  11.0  0.215090  Medium   94.453828
12 2016-01-13  12.0  0.937988  Medium  112.279840
13 2016-01-14  13.0  0.104707    High   93.999147
14 2016-01-15  14.0  0.237414  Medium  112.562033
15 2016-01-16  15.0  0.985195  Medium  101.577778
16 2016-01-17  16.0  0.998506  Medium   97.092308
17 2016-01-18  17.0  0.637261  Medium  102.950779
18 2016-01-19  18.0  0.337014     Low  116.025620


##### Reindex to Align with Other Objects

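The reindex_like() method takes an object and reindexes its axes so that they carry the same labels as another object. Rows and columns that exist only in the original are dropped, and labels that exist only in the other object are filled with NaN.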
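As a rough illustration, reindex_like() behaves like calling reindex() with the other object's index and columns. The frames left and right below are hypothetical and chosen only to keep the check short.

```python
import pandas as pd
import numpy as np

# Hypothetical frames: 'left' holds the data, 'right' supplies the labels.
left = pd.DataFrame(np.arange(12).reshape(4, 3),
                    columns=['col1', 'col2', 'col3'])
right = pd.DataFrame(np.zeros((2, 2)), columns=['col1', 'col3'])

# Conform 'left' to the labels of 'right' in two equivalent ways.
via_like = left.reindex_like(right)
via_reindex = left.reindex(index=right.index, columns=right.columns)

print(via_like)
print(via_like.equals(via_reindex))
```

Only the labels of right matter here; its values are never used.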

In [10]:
df1 = pd.DataFrame(np.random.randn(10,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(7,3),columns=['col1','col2','col3'])
df3 = pd.DataFrame(np.random.randn(5,3),columns=['col11','col12','col3'])

print (df1)
print ("-------------------------------")
print (df2)
print ("-------------------------------")
df1 = df1.reindex_like(df2)
print (df1)
print ("-------------------------------")
df1 = df1.reindex_like(df3)
print (df1)

       col1      col2      col3
0 -0.787444 -1.312562  0.788321
1 -0.270340  0.177002 -2.198987
2 -0.725080 -0.377264  0.358572
3  0.082100 -0.805059  0.328545
4 -1.246865 -0.577428 -0.405740
5 -0.481195  0.619955 -2.264124
6 -1.171488  2.890562  0.889989
7 -0.132826  0.791751 -2.360088
8  0.810019 -0.071424 -0.469512
9 -0.601577 -1.253190 -1.507150
-------------------------------
       col1      col2      col3
0 -0.008254 -0.706958  0.745468
1  1.434887  2.102366  1.275282
2  0.211964  0.707461  0.765294
3 -1.134213  0.109203 -0.173895
4  1.657193 -0.090760 -0.322789
5  0.536147 -1.193563  0.086529
6  0.797502  0.588491  0.520362
-------------------------------
       col1      col2      col3
0 -0.787444 -1.312562  0.788321
1 -0.270340  0.177002 -2.198987
2 -0.725080 -0.377264  0.358572
3  0.082100 -0.805059  0.328545
4 -1.246865 -0.577428 -0.405740
5 -0.481195  0.619955 -2.264124
6 -1.171488  2.890562  0.889989
-------------------------------
   col11  col12      col3
0    NaN    NaN  0.788321
1    NaN    NaN -2.198987
2    NaN    NaN  0.358572
3    NaN    NaN  0.328545
4    NaN    NaN -0.405740

##### Filling while Reindexing

reindex() takes an optional parameter, method, which selects how missing values are filled. The accepted values are:

1. pad/ffill: fill values forward from the previous valid label
2. bfill/backfill: fill values backward from the next valid label
3. nearest: fill from the nearest index value
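The three methods can be compared side by side on a small Series. This is a minimal sketch with made-up values; it relies on the original index being monotonic (sorted), which pandas requires when method is used.

```python
import pandas as pd

# Hypothetical Series with gaps between its integer labels.
s = pd.Series([10.0, 20.0, 30.0], index=[0, 5, 10])
new_index = [0, 2, 4, 5, 8, 10]

forward = s.reindex(new_index, method='ffill')     # take the previous label's value
backward = s.reindex(new_index, method='bfill')    # take the next label's value
nearest = s.reindex(new_index, method='nearest')   # take the closest label's value

# Put the three results next to each other for comparison.
print(pd.DataFrame({'ffill': forward, 'bfill': backward, 'nearest': nearest}))
```

For label 2, for instance, ffill and nearest both pick the value at label 0, while bfill picks the value at label 5.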


In [12]:
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])

print(df1)
print("-----------------------------")
print(df2)
print("-----------------------------")

# Without a fill method, the rows missing from df2 become NaN
print(df2.reindex_like(df1))

# Fill the NaN rows with the values of the last valid row
print("Data Frame with Forward Fill:")
print(df2.reindex_like(df1,method='ffill'))

       col1      col2      col3
0  0.170535  0.639968 -0.826371
1  1.176821 -0.027197 -0.865446
2 -0.980751  1.625008  0.484655
3  0.579718  1.043432  0.441946
4  0.159697  0.037367 -0.544570
5 -0.781826 -0.311292  0.177583
-----------------------------
       col1      col2      col3
0  0.241973  1.060727  0.667641
1  0.263910 -1.631842 -1.144406
-----------------------------
       col1      col2      col3
0  0.241973  1.060727  0.667641
1  0.263910 -1.631842 -1.144406
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN
Data Frame with Forward Fill:
       col1      col2      col3
0  0.241973  1.060727  0.667641
1  0.263910 -1.631842 -1.144406
2  0.263910 -1.631842 -1.144406
3  0.263910 -1.631842 -1.144406
4  0.263910 -1.631842 -1.144406
5  0.263910 -1.631842 -1.144406


##### Limits on Filling while Reindexing
The limit argument provides additional control over filling while reindexing. It specifies the maximum number of consecutive missing labels that are filled; any further missing labels stay NaN.
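A short, deterministic sketch of the same idea, using reindex() on a hypothetical two-element Series:

```python
import pandas as pd

# Hypothetical Series with labels 0 and 1 only.
s = pd.Series([1.0, 2.0], index=[0, 1])

# Extend to labels 0..4, forward-filling at most one consecutive gap.
limited = s.reindex(range(5), method='ffill', limit=1)
print(limited)
```

Label 2 receives the value from label 1, while labels 3 and 4 remain NaN because the limit of one consecutive fill has been used up.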

In [18]:
df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])

# Without a fill method, the rows missing from df2 become NaN
print(df2.reindex_like(df1))

# Forward-fill, but only one row past the last valid row
print("Data Frame with Forward Fill limiting to 1:")
print(df2.reindex_like(df1,method='ffill',limit=1))

       col1      col2      col3
0  1.036803 -0.173416 -0.955774
1  2.213998 -1.071261 -1.517364
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN
Data Frame with Forward Fill limiting to 1:
       col1      col2      col3
0  1.036803 -0.173416 -0.955774
1  2.213998 -1.071261 -1.517364
2  2.213998 -1.071261 -1.517364
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN


##### Renaming

The rename() method allows you to relabel an axis based on some mapping (a dict or Series) or an arbitrary function.

In [23]:
df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
print (df1)

print ("After renaming the rows and columns:")
print (df1.rename(columns={'col1' : 'c1', 'col2' : 'c2'},index = {0 : 'apple', 1 : 'banana', 2 : 'durian'}))

       col1      col2      col3
0  1.095798 -1.071453 -0.259364
1  0.877684  0.558646 -0.785188
2  0.055288 -0.608171  0.706043
3 -0.369797  0.914071  0.571440
4 -1.230212 -0.752459 -0.372494
5  1.807578 -2.742196  0.178473
After renaming the rows and columns:
              c1        c2      col3
apple   1.095798 -1.071453 -0.259364
banana  0.877684  0.558646 -0.785188
durian  0.055288 -0.608171  0.706043
3      -0.369797  0.914071  0.571440
4      -1.230212 -0.752459 -0.372494
5       1.807578 -2.742196  0.178473


The rename() method accepts an inplace keyword argument. It defaults to False, in which case rename() returns a new, relabeled object and leaves the original unchanged. Pass inplace=True to relabel the original object directly; the call then returns None.
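The function form of the mapping and the effect of inplace can be illustrated as follows. The frame and variable names are hypothetical and serve only as a sketch.

```python
import pandas as pd

df_demo = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# A function is applied to every label on the chosen axis.
upper = df_demo.rename(columns=str.upper)
print(list(upper.columns))    # ['COL1', 'COL2']
print(list(df_demo.columns))  # ['col1', 'col2'], the original is unchanged

# With inplace=True the original is modified and the call returns None.
ret = df_demo.rename(columns={'col1': 'c1'}, inplace=True)
print(ret)                    # None
print(list(df_demo.columns))  # ['c1', 'col2']
```

Because the in-place call returns None, its result should not be assigned back to the variable holding the DataFrame.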

