In [1]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

### series reindex

In [2]:
s1 = Series([1,2,3,4], index=['A','B','C','D'])

> Here the index consists of labels, not integer positions.
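The sketch below shows the difference between labels and positions on the same `s1`. It uses the standard pandas accessors `.loc`, which looks up by label, and `.iloc`, which looks up by integer position. These accessors are not used elsewhere in this notebook.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=['A', 'B', 'C', 'D'])

print(s1.index.tolist())  # the labels: ['A', 'B', 'C', 'D']
print(s1.loc['B'])        # look up by label 'B'
print(s1.iloc[1])         # look up by position 1 (the second element)
```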

In [3]:
s1

A    1
B    2
C    3
D    4
dtype: int64

To view a function's help (its docstring) in Jupyter, put the cursor inside the call and press `Shift + Tab`.
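Outside Jupyter, the same docstring is available through Python's built-in `help()`. In IPython, `pd.Series.reindex?` also works. A minimal sketch:

```python
import pandas as pd

# In a plain Python session, help() prints the docstring that Shift + Tab shows in Jupyter
help(pd.Series.reindex)

# The same text is available as a string, e.g. to search it for a parameter name
doc = pd.Series.reindex.__doc__
print('fill_value' in doc)
```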

In [4]:
s1.reindex(index=['A','B','C','D','E'])

A    1.0
B    2.0
C    3.0
D    4.0
E    NaN
dtype: float64

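> The code above returns a new Series, and `s1` itself is unchanged. Row E becomes NaN. Because NaN is a floating-point value, the dtype changes from int64 to float64.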
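The following sketch checks both points: the original Series is untouched, and the reindexed Series is upcast to float64 because it now holds a NaN.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=['A', 'B', 'C', 'D'])
s_new = s1.reindex(index=['A', 'B', 'C', 'D', 'E'])

print(s1.dtype, len(s1))        # original: integer dtype, still 4 rows
print(s_new.dtype, len(s_new))  # float64, 5 rows
print(s_new[s_new.isna()])      # only the new label E is missing
```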

In [5]:
s1.reindex(index=['A','B','C','D','E'], fill_value=10)

A     1
B     2
C     3
D     4
E    10
dtype: int64

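> `fill_value` sets the value used for labels that reindexing introduces, here E. It does not replace NaN values that were already present in the original data. Since no NaN is produced, the result keeps the integer dtype.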
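To see that `fill_value` only applies to newly introduced labels, the sketch below uses a hypothetical Series `s` that already contains a NaN before reindexing. It then chains `fillna()` to replace the pre-existing NaN as well.

```python
import numpy as np
import pandas as pd

# Hypothetical Series that already contains a NaN before reindexing
s = pd.Series([1.0, np.nan], index=['A', 'B'])

filled = s.reindex(index=['A', 'B', 'C'], fill_value=10)
print(filled)  # B stays NaN (it was already missing); only the new label C gets 10

# To replace the pre-existing NaN as well, chain fillna()
print(filled.fillna(10))
```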

#### Another way to reindex: filling with `method`

In [6]:
s2 = Series(['A','B','C'], index=[1,5,10])

In [7]:
s2

1     A
5     B
10    C
dtype: object

In [8]:
s2.reindex(index=range(15))

0     NaN
1       A
2     NaN
3     NaN
4     NaN
5       B
6     NaN
7     NaN
8     NaN
9     NaN
10      C
11    NaN
12    NaN
13    NaN
14    NaN
dtype: object

> Labels that did not exist in the original index get NaN.
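A short sketch that lists exactly which labels were introduced, using `Index.difference`, and counts the missing values:

```python
import pandas as pd

s2 = pd.Series(['A', 'B', 'C'], index=[1, 5, 10])
r = s2.reindex(index=range(15))

# Labels added by reindex are those not present in the original index
new_labels = r.index.difference(s2.index)
print(list(new_labels))
print(r.isna().sum())  # 12 of the 15 labels have no value
```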

In [9]:
s2.reindex(index=range(15),method='ffill')

0     NaN
1       A
2       A
3       A
4       A
5       B
6       B
7       B
8       B
9       B
10      C
11      C
12      C
13      C
14      C
dtype: object

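With `method='ffill'` (forward fill), each new label takes the value of the closest existing label before it. Labels 2 to 4 get A (from label 1), 6 to 9 get B (from 5), and 11 to 14 get C (from 10).

- Label 0 stays NaN because no earlier label exists to carry forward. Each value is carried forward until the next existing label.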
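`reindex` also accepts `method='bfill'` (backward fill) and a `limit` on how many consecutive new labels get filled. In this sketch, note that `method` only works when the original index is monotonic (sorted increasing or decreasing), which `[1, 5, 10]` is.

```python
import pandas as pd

s2 = pd.Series(['A', 'B', 'C'], index=[1, 5, 10])

# bfill (backward fill): each new label takes the value of the next existing label
b = s2.reindex(index=range(15), method='bfill')

# limit: fill at most this many consecutive new labels after each existing one
f1 = s2.reindex(index=range(15), method='ffill', limit=1)

print(b)   # 0-1 -> A, 2-5 -> B, 6-10 -> C, 11-14 stay NaN (nothing after 10)
print(f1)  # 2 -> A, 6 -> B, 11 -> C; other new labels stay NaN
```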


### DataFrame reindex

In [10]:
df1 = DataFrame(np.random.rand(25).reshape([5,5]), index=['A','B','D','E','F'], columns=['c1','c2','c3','c4','c5'])

In [11]:
df1

         c1        c2        c3        c4        c5
A  0.310392  0.177483  0.594355  0.408403  0.569392
B  0.569535  0.784958  0.605090  0.886101  0.073290
D  0.198628  0.606571  0.529456  0.370306  0.563255
E  0.658877  0.709507  0.328060  0.576185  0.993285
F  0.237877  0.804067  0.778489  0.604893  0.109284


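When creating this DataFrame we deliberately left out row C, so the index is incomplete.

Our task now is to correct the index.

- The docstring of `DataFrame.reindex` shows one parameter that `Series.reindex` does not have:

> besides `index`, you can also pass `columns`.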
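A minimal sketch, assuming pandas 0.21 or later, where `reindex` also accepts a "labels + `axis`" form. The small frame `df` is a hypothetical stand-in for `df1`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(2, 3), index=['A', 'B'], columns=['c1', 'c2', 'c3'])

# Keyword form, as used in this notebook
by_keyword = df.reindex(columns=['c1', 'c4'])

# Equivalent "labels + axis" form
by_axis = df.reindex(['c1', 'c4'], axis='columns')

print(by_axis)
print(by_keyword.equals(by_axis))
```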

In [12]:
df1.reindex(index=['A','B','C','D','E','F'])

         c1        c2        c3        c4        c5
A  0.310392  0.177483  0.594355  0.408403  0.569392
B  0.569535  0.784958  0.605090  0.886101  0.073290
C       NaN       NaN       NaN       NaN       NaN
D  0.198628  0.606571  0.529456  0.370306  0.563255
E  0.658877  0.709507  0.328060  0.576185  0.993285
F  0.237877  0.804067  0.778489  0.604893  0.109284


Row C, which did not exist before, is filled with NaN.
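If NaN is not wanted for the new row, the same `fill_value` and `method` options from the Series examples work on a DataFrame. This sketch uses a small deterministic frame instead of the random `df1`. Forward fill works here because the string index `['A', 'B', 'D']` is sorted.

```python
import pandas as pd

# Small deterministic stand-in for df1 (row C deliberately missing, as above)
df = pd.DataFrame({'c1': [1.0, 2.0, 4.0], 'c2': [10.0, 20.0, 40.0]},
                  index=['A', 'B', 'D'])
new_index = ['A', 'B', 'C', 'D']

zero_filled = df.reindex(index=new_index, fill_value=0)        # row C becomes 0.0
forward_filled = df.reindex(index=new_index, method='ffill')   # row C copies row B

print(zero_filled)
print(forward_filled)
```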

- Next, let's change the columns.

In [13]:
df1.reindex(columns=['c1','c2','c3','c4','c5','c6'])

         c1        c2        c3        c4        c5   c6
A  0.310392  0.177483  0.594355  0.408403  0.569392  NaN
B  0.569535  0.784958  0.605090  0.886101  0.073290  NaN
D  0.198628  0.606571  0.529456  0.370306  0.563255  NaN
E  0.658877  0.709507  0.328060  0.576185  0.993285  NaN
F  0.237877  0.804067  0.778489  0.604893  0.109284  NaN


We can also change the index and the columns at the same time.
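When the target rows and columns come from another DataFrame, `reindex_like` is a shorthand for passing both its `index` and `columns`. In this sketch, `template` is a hypothetical frame used only for its labels.

```python
import pandas as pd

df = pd.DataFrame({'c1': [1, 2], 'c2': [3, 4]}, index=['A', 'B'])

# Hypothetical "template" frame whose labels we want df to follow
template = pd.DataFrame(index=['A', 'B', 'C'], columns=['c1', 'c2', 'c3'])

both = df.reindex(index=template.index, columns=template.columns)
like = df.reindex_like(template)  # shorthand for the line above

print(both)
print(both.equals(like))
```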

In [14]:
df1.reindex(index=['A','B','C','D','E','F'], columns=['c1','c2','c3','c4','c5','c6'])

         c1        c2        c3        c4        c5   c6
A  0.310392  0.177483  0.594355  0.408403  0.569392  NaN
B  0.569535  0.784958  0.605090  0.886101  0.073290  NaN
C       NaN       NaN       NaN       NaN       NaN  NaN
D  0.198628  0.606571  0.529456  0.370306  0.563255  NaN
E  0.658877  0.709507  0.328060  0.576185  0.993285  NaN
F  0.237877  0.804067  0.778489  0.604893  0.109284  NaN


All the examples above add labels. What happens if we pass fewer labels than the object has?

> The following cells explore this.

In [15]:
s1

A    1
B    2
C    3
D    4
dtype: int64

In [16]:
s1.reindex(index=['A','B'])

A    1
B    2
dtype: int64

In [17]:
df1.reindex(index=['A','B'])

         c1        c2        c3        c4        c5
A  0.310392  0.177483  0.594355  0.408403  0.569392
B  0.569535  0.784958  0.605090  0.886101  0.073290


In [18]:
df1.reindex(columns=['c1','c2'])

         c1        c2
A  0.310392  0.177483
B  0.569535  0.784958
D  0.198628  0.606571
E  0.658877  0.709507
F  0.237877  0.804067
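As the cells show, passing fewer labels keeps only the listed rows or columns. When every label exists, this selects the same data as `.loc`. The two differ for labels that do not exist: `reindex` inserts NaN, while `.loc` with a list raises `KeyError`. That `.loc` behaviour applies to pandas 1.0 and later. A minimal sketch:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=['A', 'B', 'C', 'D'])

# With existing labels only, reindex selects the same rows as .loc
print(s1.reindex(['A', 'B']).equals(s1.loc[['A', 'B']]))

# With a missing label, reindex inserts NaN ...
print(s1.reindex(['A', 'Z']))

# ... while .loc raises KeyError (behaviour of pandas 1.0 and later)
try:
    s1.loc[['A', 'Z']]
except KeyError as exc:
    print('KeyError:', exc)
```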


### Removing labels with drop

In [19]:
s1.drop('A')

B    2
C    3
D    4
dtype: int64
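`drop` also accepts a list to remove several labels at once. By default (`inplace=False`) it returns a new object and leaves the original untouched, as this sketch checks:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=['A', 'B', 'C', 'D'])

# A list drops several labels at once
dropped = s1.drop(['A', 'C'])
print(dropped)

# drop returns a new object by default (inplace=False); s1 is unchanged
print(len(s1))
```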

In [20]:
df1.drop('A', axis=0)

         c1        c2        c3        c4        c5
B  0.569535  0.784958  0.605090  0.886101  0.073290
D  0.198628  0.606571  0.529456  0.370306  0.563255
E  0.658877  0.709507  0.328060  0.576185  0.993285
F  0.237877  0.804067  0.778489  0.604893  0.109284


```
Signature: df1.drop(labels, axis=0, level=None, inplace=False, errors='raise')
```

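Whether `drop` removes rows (index labels) or columns is controlled by `axis`: `axis=0` for rows, `axis=1` for columns.

Because `axis` defaults to 0, it can be omitted when dropping rows.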

In [21]:
df1.drop('c1', axis=1)

         c2        c3        c4        c5
A  0.177483  0.594355  0.408403  0.569392
B  0.784958  0.605090  0.886101  0.073290
D  0.606571  0.529456  0.370306  0.563255
E  0.709507  0.328060  0.576185  0.993285
F  0.804067  0.778489  0.604893  0.109284
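The signature shown earlier comes from an older pandas release. Since pandas 0.21, `drop` also accepts `index=` and `columns=` keywords, so you do not have to remember axis numbers. By default, dropping a label that does not exist raises `KeyError`, and `errors='ignore'` skips such labels instead. A sketch on a small hypothetical frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((2, 3)), index=['A', 'B'], columns=['c1', 'c2', 'c3'])

# columns= / index= keywords (pandas 0.21+) avoid having to remember axis numbers
a = df.drop(columns='c1')
b = df.drop('c1', axis=1)
print(a.equals(b))

# By default, dropping a missing label raises KeyError;
# errors='ignore' skips labels that are not present
c = df.drop(columns=['c1', 'c9'], errors='ignore')
print(list(c.columns))
```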
