# Missing Data

Let's look at a few convenient methods for dealing with missing data in pandas:

## Three options to deal with missing data

1. Keep the missing data as NaN
2. Drop (remove) the missing data, which discards the entire row or column that contains it (for time series, this includes its timestamp)
3. Fill the missing data with some other value (for example, a best estimate of the true value)

In [7]:
import numpy as np
import pandas as pd

In [8]:
df = pd.DataFrame({'A':[1,2,np.nan],
                  'B':[5,np.nan,np.nan],
                  'C':[1,2,3]})

In [9]:
df

     A    B  C
0  1.0  5.0  1
1  2.0  NaN  2
2  NaN  NaN  3
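
Before choosing among the three options, it can help to see how much data is actually missing. The following is a minimal sketch using the same `df` as above; the variable names `mask` and `missing_per_column` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# Boolean mask: True wherever a value is missing
mask = df.isna()

# Summing the mask counts the missing values in each column
missing_per_column = mask.sum()
print(missing_per_column)
```

For this frame the counts are 1 for `A`, 2 for `B`, and 0 for `C`, which is why dropping columns keeps only `C`.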


In [5]:
# dropna: by default, drop every row that contains at least one missing value.
df.dropna()

     A    B  C
0  1.0  5.0  1


In [6]:
# axis=0 (default) drops rows; axis=1 drops every column that contains a missing value.
df.dropna(axis=1)

   C
0  1
1  2
2  3


In [14]:
# thresh (int, optional): keep only the rows that have at least this many non-NA values.
df.dropna(thresh=2)

     A    B  C
0  1.0  5.0  1
1  2.0  NaN  2
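
Besides `axis` and `thresh`, `dropna` also accepts `how` and `subset`, which give finer control over what gets removed. The sketch below reuses the same `df`; the result names (`by_all`, `by_subset`, `cols_kept`) are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# how='all': drop a row only if every value in it is missing.
# No row here is entirely NaN (column C is complete), so nothing is dropped.
by_all = df.dropna(how='all')

# subset: only look at the listed columns when deciding what to drop.
# Row 2 is missing 'A', so it is removed; row 1 stays even though 'B' is NaN.
by_subset = df.dropna(subset=['A'])

# thresh also works column-wise: keep columns with at least 2 non-NA values.
cols_kept = df.dropna(axis=1, thresh=2)

print(by_all.shape, list(by_subset.index), list(cols_kept.columns))
```

None of these calls modify `df` itself; each returns a new DataFrame.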


In [15]:
# fillna: fill missing values.
df.fillna(value='FILL VALUE')

            A           B  C
0           1           5  1
1           2  FILL VALUE  2
2  FILL VALUE  FILL VALUE  3


In [14]:
df.mean()

A    1.5
B    5.0
C    2.0
dtype: float64

In [13]:
df['A'].mean()

1.5
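
The mean of column `A` is 1.5 rather than NaN because pandas reductions skip missing values by default. A small sketch of that behavior, assuming the same column values as above (`default_mean` and `strict_mean` are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# Default behavior (skipna=True): NaN is ignored, so this is (1 + 2) / 2
default_mean = df['A'].mean()

# With skipna=False, a single NaN makes the whole result NaN
strict_mean = df['A'].mean(skipna=False)

print(default_mean, strict_mean)
```

This is what makes option 1 (keeping the NaN values) workable in practice: many aggregations still return a usable number.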

In [17]:
# Fill the missing values in column 'A' with the mean of that column.
df['A'].fillna(value=df['A'].mean())

0    1.0
1    2.0
2    1.5
Name: A, dtype: float64
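
The same idea extends to the whole DataFrame. Passing the Series returned by `df.mean()` to `fillna` fills each column with its own mean. This is a sketch of one possible approach, not the only way to impute values; the name `filled` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# df.mean() is a Series indexed by column name, so fillna matches
# each column with its own mean (A -> 1.5, B -> 5.0, C -> 2.0).
filled = df.fillna(df.mean())

# fillna returns a new DataFrame; df still contains its NaN values
# unless the result is assigned back (df = df.fillna(...)).
print(filled)
print(df.isna().sum().sum())
```

Note that `B` has only one observed value, so both of its gaps are filled with 5.0; whether the mean is a reasonable estimate depends on the data.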