# Handling Missing Values

pandas can work with data that contains missing values. Start by importing the required modules:

In [1]:
import numpy as np

In [2]:
import pandas as pd

In [3]:
df = pd.DataFrame(np.random.randn(4, 5), columns=list("abcde"))

A DataFrame's row labels (its index) can be changed by assignment:

In [4]:
df.index = pd.date_range("20000101", periods=4)

Assigning `np.nan` to a block of cells marks those values as missing:

In [5]:
df.iloc[[2, 3], [3, 4]] = np.nan

In [6]:
df

,a,b,c,d,e
2000-01-01,2.428019,2.317165,1.186934,-0.208601,-1.278155
2000-01-02,1.040802,-2.269797,0.580195,-0.24162,-0.620173
2000-01-03,-2.3925,0.298097,0.407541,NaN,NaN
2000-01-04,-0.945231,-0.258435,-0.290456,NaN,NaN


The `.dropna()` method removes every row that contains a missing value and returns a new DataFrame:

In [7]:
df.dropna(how="any")

,a,b,c,d,e
2000-01-01,2.428019,2.317165,1.186934,-0.208601,-1.278155
2000-01-02,1.040802,-2.269797,0.580195,-0.24162,-0.620173


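Setting `how` to `"any"` drops a row as soon as it contains one missing value; with `"all"`, a row is dropped only when every value in it is missing. `.dropna()` also takes an `axis` parameter that selects whether rows or columns are dropped. The default is 0 (rows); to drop columns instead, set `axis` to 1: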

In [8]:
df.dropna(axis=1, how="any")

,a,b,c
2000-01-01,2.428019,2.317165,1.186934
2000-01-02,1.040802,-2.269797,0.580195
2000-01-03,-2.3925,0.298097,0.407541
2000-01-04,-0.945231,-0.258435,-0.290456

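The difference between `how="any"` and `how="all"` is easier to see on a frame that has both a partly missing row and a fully missing row. The following is a minimal sketch; the `demo` frame and its values are made up for illustration and are not the `df` used above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 is partly missing, row 2 is entirely missing.
demo = pd.DataFrame(
    {
        "a": [1.0, 2.0, np.nan],
        "b": [4.0, np.nan, np.nan],
        "c": [7.0, 8.0, np.nan],
    }
)

rows_any = demo.dropna(axis=0, how="any")  # drop rows with at least one NaN
rows_all = demo.dropna(axis=0, how="all")  # drop rows that are entirely NaN
cols_any = demo.dropna(axis=1, how="any")  # drop columns with at least one NaN
cols_all = demo.dropna(axis=1, how="all")  # drop columns that are entirely NaN

print(rows_any.index.tolist())    # labels of the rows that survive how="any"
print(rows_all.index.tolist())    # labels of the rows that survive how="all"
print(cols_any.columns.tolist())  # columns that survive how="any"
print(cols_all.columns.tolist())  # columns that survive how="all"
```

Here only row 0 survives `how="any"`, while `how="all"` removes just the fully missing row 2. Because every column contains at least one NaN, `axis=1` with `how="any"` removes all of them, and `how="all"` removes none.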

The `.fillna()` method replaces missing values with a default value:

In [9]:
df.fillna(value=100)

,a,b,c,d,e
2000-01-01,2.428019,2.317165,1.186934,-0.208601,-1.278155
2000-01-02,1.040802,-2.269797,0.580195,-0.24162,-0.620173
2000-01-03,-2.3925,0.298097,0.407541,100.0,100.0
2000-01-04,-0.945231,-0.258435,-0.290456,100.0,100.0

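A single constant is not always the right fill value. As a sketch that goes slightly beyond the example above, `.fillna()` also accepts a dict or a Series keyed by column label, so each column can get its own replacement. The `demo` frame and the chosen fill values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with missing values in both columns.
demo = pd.DataFrame({"d": [1.0, 3.0, np.nan], "e": [np.nan, 2.0, np.nan]})

# A dict maps each column label to its own fill value.
per_column = demo.fillna(value={"d": 0.0, "e": -1.0})

# demo.mean() is a Series indexed by column label (NaN is skipped by default),
# so each column's gaps are filled with that column's mean.
with_means = demo.fillna(demo.mean())

print(per_column)
print(with_means)
```

Filling with the column mean keeps each column's average unchanged, which is sometimes preferable to an arbitrary sentinel such as 100; whether it is appropriate depends on the data.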

Both methods return a new DataFrame and leave the original object unchanged:

In [10]:
df

,a,b,c,d,e
2000-01-01,2.428019,2.317165,1.186934,-0.208601,-1.278155
2000-01-02,1.040802,-2.269797,0.580195,-0.24162,-0.620173
2000-01-03,-2.3925,0.298097,0.407541,NaN,NaN
2000-01-04,-0.945231,-0.258435,-0.290456,NaN,NaN

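One way to check this behavior programmatically is to count the missing values before and after each call. This is a minimal sketch on a made-up `demo` frame; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each row.
demo = pd.DataFrame({"d": [1.0, np.nan], "e": [np.nan, 2.0]})

filled = demo.fillna(value=100)   # new frame with the gaps filled
dropped = demo.dropna(how="any")  # new frame without the affected rows

# Count the NaN entries that remain in each object.
missing_in_demo = int(demo.isna().sum().sum())
missing_in_filled = int(filled.isna().sum().sum())

print(missing_in_demo, missing_in_filled)
print(filled is demo)  # the result is a distinct object
print(len(dropped))    # number of rows left after dropping
```

To keep a cleaned result, bind it to a name, for example `demo = demo.fillna(value=100)`; calling the method without using its return value has no lasting effect.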