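# Handling Missing Values in Pandas
- pandas marks missing values with `np.nan` and, for datetime data, with `NaT`.
- An empty string `""` represents an empty value. pandas treats it as an ordinary string, not as a missing value, so `isna()` returns `False` for it.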
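As a minimal sketch of the distinction above, the following compares how `isna()` treats the missing-value markers and an empty string. The sample values are made up for illustration, and replacing `""` with `np.nan` is just one possible way to make empty strings count as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values: three missing-value markers, an empty string, and a normal string
s = pd.Series([np.nan, None, pd.NaT, "", "text"], dtype=object)

# isna() flags NaN, None and NaT, but not the empty string
mask = s.isna()
print(mask.tolist())

# One option for treating empty strings as missing: replace them with NaN first
cleaned = s.replace("", np.nan)
print(cleaned.isna().tolist())
```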
## Creating data with missing values

In [20]:
import numpy as np
import pandas as pd

df = pd.DataFrame({ "name": ['zs', 'ls', 'wu'],
                    "bc": [np.nan, 'b', 'c'],
                    "time": [pd.NaT, pd.Timestamp("1940-04-25"),pd.NaT]})

df  

  name   bc       time
0   zs  NaN        NaT
1   ls    b 1940-04-25
2   wu    c        NaT


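## Dropping missing values
- `DataFrame.dropna()` returns a new DataFrame with the rows containing missing values removed; the original DataFrame is left unchanged.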

In [25]:
print(df.dropna())
print("\n\n")
print(df)
print("\n\n")

print("Drop a row only if all of its values are missing")
print(df.dropna(how="all"))
print("\n\n")

print("Keep only rows with at least two non-missing values")
print(df.dropna(thresh=2))
print("\n\n")

print("Drop rows with a missing value in the subset columns")
print(df.dropna(subset=["bc"]))


  name bc       time
1   ls  b 1940-04-25



  name   bc       time
0   zs  NaN        NaT
1   ls    b 1940-04-25
2   wu    c        NaT



Drop a row only if all of its values are missing
  name   bc       time
0   zs  NaN        NaT
1   ls    b 1940-04-25
2   wu    c        NaT



Keep only rows with at least two non-missing values
  name bc       time
1   ls  b 1940-04-25
2   wu  c        NaT



Drop rows with a missing value in the subset columns
  name bc       time
1   ls  b 1940-04-25
2   wu  c        NaT
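The `thresh` argument is easy to misread: it counts non-missing values, not missing ones. The sketch below rebuilds the same DataFrame and checks `dropna(thresh=2)` against a manual filter; the helper names `non_missing`, `kept` and `manual` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["zs", "ls", "wu"],
                   "bc": [np.nan, "b", "c"],
                   "time": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})

# Number of non-missing values in each row
non_missing = df.notna().sum(axis=1)
print(non_missing.tolist())

# dropna(thresh=2) keeps the rows whose non-missing count is at least 2
kept = df.dropna(thresh=2)
manual = df[non_missing >= 2]
print(kept.equals(manual))

# dropna() returned a new DataFrame; the original still has all of its rows
print(len(df), len(kept))
```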


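## Filling missing values
- `DataFrame.fillna()` replaces missing values with a given value. Forward filling along an axis is available as `DataFrame.ffill()`; the older `fillna(method='ffill')` spelling is deprecated in recent pandas releases.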

In [31]:
df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5],
                   [np.nan, 3, np.nan, 4]],
                  columns=list('ABCD'))

print(df)

print("\n\n")
print("Row-wise: fill each missing value with the value to its left")
# Same result as the older df.fillna(axis=1, method='ffill'), whose `method` argument is deprecated
print(df.ffill(axis=1))

print("\n\n")
print("Column-wise: fill each missing value with the value above it")
print(df.ffill(axis=0))

print("\n\n")
print("Fill missing values with a specified value")
print(df.fillna(-1))

     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5
3  NaN  3.0 NaN  4



Row-wise: fill each missing value with the value to its left
     A    B    C    D
0  NaN  2.0  2.0  0.0
1  3.0  4.0  4.0  1.0
2  NaN  NaN  NaN  5.0
3  NaN  3.0  3.0  4.0



Column-wise: fill each missing value with the value above it
     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  3.0  4.0 NaN  5
3  3.0  3.0 NaN  4



Fill missing values with a specified value
     A    B    C  D
0 -1.0  2.0 -1.0  0
1  3.0  4.0 -1.0  1
2 -1.0 -1.0 -1.0  5
3 -1.0  3.0 -1.0  4
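Beyond a single fill value, `fillna()` also accepts a dict that maps column names to fill values, and `bfill()` fills from the next valid value instead of the previous one. The following is a small sketch on the same data; the per-column fill values are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5],
                   [np.nan, 3, np.nan, 4]],
                  columns=list("ABCD"))

# A dict gives each column its own fill value (arbitrary choices here):
# 0 for A, the column mean for B, -1 for C
filled = df.fillna({"A": 0, "B": df["B"].mean(), "C": -1})
print(filled)

# bfill() fills each gap with the next valid value below it;
# gaps with no valid value below them stay missing
backfilled = df.bfill()
print(backfilled)
```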


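## Boolean masks for missing values
- `DataFrame.isna()` returns a DataFrame of booleans that is `True` wherever a value is missing; `DataFrame.isnull()` is an alias that returns the same result.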

In [32]:
df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5],
                   [np.nan, 3, np.nan, 4]],
                  columns=list('ABCD'))
df.isna()

       A      B     C      D
0   True  False  True  False
1  False  False  True  False
2   True   True  True  False
3   True  False  True  False


In [34]:
df.isnull()

       A      B     C      D
0   True  False  True  False
1  False  False  True  False
2   True   True  True  False
3   True  False  True  False
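The boolean mask is most useful when aggregated. As a brief sketch on the same data, the mask can be summed to count missing values per column, averaged to get the missing share, or used to select rows; the variable names below are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5],
                   [np.nan, 3, np.nan, 4]],
                  columns=list("ABCD"))

# True counts as 1, so summing the mask counts missing values per column
per_column = df.isna().sum()
print(per_column)

# The mean of the mask is the fraction of missing values per column
share = df.isna().mean()
print(share)

# Select the rows that contain at least one missing value
rows_with_missing = df[df.isna().any(axis=1)]
print(rows_with_missing)

# isnull() produces the same mask as isna()
print(df.isna().equals(df.isnull()))
```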
