Reindexing: the reindex method

In [4]:
import pandas as pd
import numpy as np

In [5]:
obj = pd.Series([2.2,4.3,-1.2,5],index=["b","v","s","a"])
obj

b    2.2
v    4.3
s   -1.2
a    5.0
dtype: float64

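Calling reindex on this Series rearranges the data according to the new index. If an index
value is not already present, a missing value (NaN) is introduced.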

In [6]:
obj2 = obj.reindex(["a","b","c","v","s"])
obj2

a    5.0
b    2.2
c    NaN
v    4.3
s   -1.2
dtype: float64

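index_need = ["s","v","a","b",'w']

# This assignment raises ValueError: 5 labels cannot be assigned to a 4-element Series
obj.index = index_need

obj

Assigning a new index to a Series directly requires the new index to match the Series length, whereas reindex adds missing values automatically.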
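
The difference between the two operations can be sketched as follows. This is a minimal illustration only; the names `s`, `labels`, `assignment_failed`, and `reindexed` are made up for the example.

```python
import pandas as pd

s = pd.Series([2.2, 4.3, -1.2, 5], index=["b", "v", "s", "a"])
labels = ["s", "v", "a", "b", "w"]  # five labels for four values

# Direct assignment only relabels the existing values, so the lengths must match.
try:
    s.index = labels
    assignment_failed = False
except ValueError:
    assignment_failed = True

# reindex aligns on labels instead and fills the unmatched label with NaN.
reindexed = s.reindex(labels)

print(assignment_failed)
print(reindexed)
```

Direct assignment changes the labels but never moves values, while reindex keeps each value attached to its original label.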





For ordered data such as time series, you may need to fill or interpolate values when reindexing.
The method option handles this.

In [7]:
obj3 = pd.Series(["blue","purple","yellow"],index = [0,2,4])
obj3

0      blue
2    purple
4    yellow
dtype: object

In [8]:
obj3.reindex(range(6),method='ffill')
# method='ffill' forward-fills: each new label takes the last preceding value

0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
dtype: object
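
As a small companion sketch, forward fill can be compared with backward fill (`method="bfill"`) on the same data. The variable names are illustrative, and the example assumes a sorted index, because reindex only accepts `method` when the index is monotonic.

```python
import pandas as pd

colors = pd.Series(["blue", "purple", "yellow"], index=[0, 2, 4])

# ffill carries the last seen value forward to the new labels.
forward = colors.reindex(range(6), method="ffill")

# bfill pulls the next available value backward; label 5 has no later value.
backward = colors.reindex(range(6), method="bfill")

print(pd.DataFrame({"ffill": forward, "bfill": backward}))
```

With backward fill, the last label stays missing because no later value exists to borrow from.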

In [9]:
frame = pd.DataFrame(np.arange(9).reshape((3,3)),index = ["a","c","d"],columns=["ohio","texas","cafa"])
frame

Unnamed: 0,ohio,texas,cafa
a,0,1,2
c,3,4,5
d,6,7,8


With a DataFrame, reindex can change the (row) index, the columns, or both. When only a
sequence is passed, it reindexes the rows of the result.

In [10]:
frame2 = frame.reindex(["a","b","c","d"])
frame2


Unnamed: 0,ohio,texas,cafa
a,0.0,1.0,2.0
b,,,
c,3.0,4.0,5.0
d,6.0,7.0,8.0


Columns can be reindexed with the columns keyword:

In [11]:
states =['texas','utah',"ohio",'cafa']
frame.reindex(columns=states)


Unnamed: 0,texas,utah,ohio,cafa
a,1,,0,2
c,4,,3,5
d,7,,6,8
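
Rows and columns can also be reindexed in a single call by passing both the `index` and `columns` keywords. The sketch below rebuilds the same `frame`; the name `both` is illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=["a", "c", "d"],
                     columns=["ohio", "texas", "cafa"])

# New row label "b" and new column "utah" have no source data, so they become NaN.
both = frame.reindex(index=["a", "b", "c", "d"],
                     columns=["texas", "utah", "ohio", "cafa"])
print(both)
```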


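Parameters of the reindex function

fill_value: the substitute value to use when reindexing introduces missing data

limit: when forward- or backward-filling, the maximum number of consecutive labels to fill

tolerance: when filling inexact matches, the maximum distance allowed between a new label and the original label it is matched to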
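
The three parameters can be tried on a small numeric Series. This is a sketch under assumed data: the Series `s` and its index `[0, 5, 10]` are invented here to make the gaps visible.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=[0, 5, 10])

# fill_value replaces the NaN that would otherwise appear for new labels.
filled = s.reindex([0, 1, 5, 6], fill_value=0.0)

# limit caps how many consecutive new labels a forward fill may cover:
# labels 1 and 2 are filled from label 0, labels 3 and 4 stay NaN.
limited = s.reindex(range(5), method="ffill", limit=2)

# tolerance caps the distance between a new label and the label it borrows from:
# 1 is within 1 of 0, 4 is within 1 of 5, but 7 is 2 away from 5.
tolerant = s.reindex([1, 4, 7], method="nearest", tolerance=1)

print(filled, limited, tolerant, sep="\n\n")
```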



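**Dropping entries from an axis**
Dropping one or more entries from an axis is simple once you have an index array or list of
the labels to remove. Because this involves some data munging and set logic, the drop method
returns a new object with the specified values removed from the given axis: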

In [12]:
obj = pd.Series(np.arange(5.),index=['a','b','c','d','e'])
obj


a    0.0
b    1.0
c    2.0
d    3.0
e    4.0
dtype: float64

In [13]:
new_obj = obj.drop('c')
new_obj

a    0.0
b    1.0
d    3.0
e    4.0
dtype: float64

In [14]:
obj.drop(["d","c"])

a    0.0
b    1.0
e    4.0
dtype: float64

In [15]:
obj

a    0.0
b    1.0
c    2.0
d    3.0
e    4.0
dtype: float64
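
One related detail is what happens when a label is absent. As a hedged sketch: drop raises `KeyError` for a missing label by default, and passing `errors="ignore"` skips it. The label `"z"` and the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(5.), index=["a", "b", "c", "d", "e"])

# By default a label that is not in the index raises KeyError.
try:
    obj.drop("z")
    raised = False
except KeyError:
    raised = True

# errors="ignore" drops the labels that exist and silently skips the rest.
skipped = obj.drop(["c", "z"], errors="ignore")

print(raised)
print(skipped)
```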

With a DataFrame, index values can be deleted from either axis.

In [47]:
data = pd.DataFrame(np.arange(16).reshape(4,4),index=["ohio","colorado","utah","newyork"],
                    columns=["one","two","three","four"])
data



Unnamed: 0,one,two,three,four
ohio,0,1,2,3
colorado,4,5,6,7
utah,8,9,10,11
newyork,12,13,14,15


In [48]:
# Calling drop with a label or a sequence of labels removes values from the row labels (axis 0)
data.drop("ohio")


Unnamed: 0,one,two,three,four
colorado,4,5,6,7
utah,8,9,10,11
newyork,12,13,14,15


In [42]:

data.drop("one",axis=1)
data.drop("one",axis='columns')
# The two calls above are equivalent

Unnamed: 0,two,three,four
ohio,1,2,3
colorado,5,6,7
utah,9,10,11
newyork,13,14,15


In [49]:
data.drop("ohio",inplace=True)

In [51]:
data

Unnamed: 0,one,two,three,four
colorado,4,5,6,7
utah,8,9,10,11
newyork,12,13,14,15
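
The two calling styles of drop behave differently, which the following sketch makes explicit. It rebuilds the four-row `data` frame; the names `copy` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape(4, 4),
                    index=["ohio", "colorado", "utah", "newyork"],
                    columns=["one", "two", "three", "four"])

# Without inplace, drop returns a new object and leaves data untouched.
copy = data.drop("ohio")

# With inplace=True, data itself is modified and the call returns None.
result = data.drop("utah", inplace=True)

print(copy.index.tolist())
print(result)
print(data.index.tolist())
```

Because the in-place form destroys the dropped rows, re-running such a cell in a notebook fails once the label is gone.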


Indexing, selection, and filtering

In [56]:
obj = pd.Series(np.arange(4.),index=('a','b','c','d'))
obj

obj[['a','c']]

a    0.0
c    2.0
dtype: float64

In [58]:
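# Positional lookup through [] is deprecated in recent pandas; obj.iloc[[0,1]] is the explicit form
obj[[0,1]]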

a    0.0
b    1.0
dtype: float64

In [60]:
obj['a':"c"]


a    0.0
b    1.0
c    2.0
dtype: float64
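
Note that the label slice above includes its endpoint `'c'`, unlike ordinary Python slicing. A short sketch of the contrast, using `loc` and `iloc` to make the two kinds of slice explicit (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.), index=["a", "b", "c", "d"])

# Slicing by label includes the end label.
by_label = obj.loc["a":"c"]

# Slicing by position follows Python rules and excludes the end position.
by_position = obj.iloc[0:2]

print(by_label.index.tolist())
print(by_position.index.tolist())
```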

In [62]:
data


Unnamed: 0,one,two,three,four
colorado,4,5,6,7
utah,8,9,10,11
newyork,12,13,14,15


In [64]:
# Select data with a slice or a boolean array
data[:2]

Unnamed: 0,one,two,three,four
colorado,4,5,6,7
utah,8,9,10,11


In [66]:
data[data["three"]>5]

Unnamed: 0,one,two,three,four
colorado,4,5,6,7
utah,8,9,10,11
newyork,12,13,14,15


Another option is to index with a boolean DataFrame, such as the one below produced by a
scalar comparison:

In [68]:
data<5

Unnamed: 0,one,two,three,four
colorado,True,False,False,False
utah,False,False,False,False
newyork,False,False,False,False


In [72]:
data[data<5] = 0
data

Unnamed: 0,one,two,three,four
colorado,0,5,6,7
utah,8,9,10,11
newyork,12,13,14,15
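
The assignment above changes `data` itself. If the original values should be kept, one alternative is `DataFrame.mask`, which returns a new object. The sketch below rebuilds the three-row frame; `masked` is an illustrative name.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(4, 16).reshape(3, 4),
                    index=["colorado", "utah", "newyork"],
                    columns=["one", "two", "three", "four"])

# mask returns a new DataFrame with 0 wherever the condition is True.
masked = data.mask(data < 5, 0)
print(data.loc["colorado", "one"])    # data is unchanged at this point

# Boolean assignment writes the replacement into data directly.
data[data < 5] = 0
print(masked)
print(data)
```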


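**Selecting data with loc and iloc**

For label-based indexing of DataFrame rows, pandas provides the special indexing operators loc and iloc.
They let you select a subset of rows and columns from a DataFrame with NumPy-like notation,
using either axis labels (loc) or integer positions (iloc).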

In [79]:
data

Unnamed: 0,one,two,three,four
colorado,0,5,6,7
utah,8,9,10,11
newyork,12,13,14,15


In [80]:
data.loc['newyork',["one",'two']]

one    12
two    13
Name: newyork, dtype: int32

In [83]:
data.iloc[2,[3,0,1]]

four    15
one     12
two     13
Name: newyork, dtype: int32

In [85]:
data.iloc[2]

one      12
two      13
three    14
four     15
Name: newyork, dtype: int32

In [87]:
data.iloc[[1,2],[1,2]]

Unnamed: 0,two,three
utah,9,10
newyork,13,14
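
The same selection can be written with either operator, as this sketch shows. It rebuilds the current state of `data` by hand; the variable names are illustrative.

```python
import pandas as pd

data = pd.DataFrame([[0, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
                    index=["colorado", "utah", "newyork"],
                    columns=["one", "two", "three", "four"])

# loc addresses rows and columns by label.
by_label = data.loc[["utah", "newyork"], ["two", "three"]]

# iloc addresses the same cells by integer position.
by_position = data.iloc[[1, 2], [1, 2]]

print(by_label.equals(by_position))
```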


Both indexing operators also work with slices, in addition to single labels or lists of labels.


In [90]:
data

Unnamed: 0,one,two,three,four
colorado,0,5,6,7
utah,8,9,10,11
newyork,12,13,14,15


In [92]:
data.loc[:'utah','two']

colorado    5
utah        9
Name: two, dtype: int32

In [96]:
# data.three is attribute-style access to the same column as data["three"]
data.iloc[:,:3][data["three"]>5]
data.iloc[:,:3][data.three>5]

Unnamed: 0,one,two,three
colorado,0,5,6
utah,8,9,10
newyork,12,13,14
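
The chained selection above can also be written as a single `loc` call that takes a boolean row mask and a column label slice together. This is a sketch on a hand-built copy of `data`; `chained` and `single` are illustrative names.

```python
import pandas as pd

data = pd.DataFrame([[0, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
                    index=["colorado", "utah", "newyork"],
                    columns=["one", "two", "three", "four"])

# Two steps: take the first three columns, then filter rows with a boolean Series.
chained = data.iloc[:, :3][data["three"] > 5]

# One step: boolean mask for the rows, label slice (end label included) for the columns.
single = data.loc[data["three"] > 5, :"three"]

print(single)
print(chained.equals(single))
```

A single indexing call is usually easier to read, and it is the safer form when the selection is later used for assignment.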
