# Objective : 9. Working on Missing Data
<hr>

1. Detect missing & existing values.
2. Return a new Series with missing values removed.
3. Fill NA/NaN values using the specified method.
4. Interpolate values according to different methods.

<hr>

### 1. Detect missing and existing (non-missing) values
1. `None` and `numpy.nan` are treated as missing values.
2. An empty string is not treated as missing; it still counts as a non-null value.
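
The two rules above can be checked on single values with `pd.isna`. This is a minimal sketch; the `candidates` dictionary and its labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative values: None and NaN should be flagged as missing,
# while an empty string and zero should not.
candidates = {"None": None, "NaN": np.nan, "empty string": "", "zero": 0}

# pd.isna works on scalars as well as on Series and DataFrames.
missing_flags = {label: bool(pd.isna(value)) for label, value in candidates.items()}
print(missing_flags)
```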

In [2]:
import pandas as pd
import numpy as np

# np.nan is the portable spelling; the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame({'A':[1,2,None], 'B':[2, np.nan, 3]})

In [3]:
df

Unnamed: 0,A,B
0,1.0,2.0
1,2.0,
2,,3.0


In [4]:
df.isna()

Unnamed: 0,A,B
0,False,False
1,False,True
2,True,False


* `isnull` is an alias of `isna`; both return the same result

In [5]:
df.isnull()

Unnamed: 0,A,B
0,False,False
1,False,True
2,True,False


In [11]:
# Column A now holds an empty string, which isna() does not flag as missing
df = pd.DataFrame({'A':[1,'',None], 'B':[2, np.nan, 3]})

In [12]:
df.isna()

Unnamed: 0,A,B
0,False,False
1,False,True
2,True,False


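* Handling empty strings: replace them with NaN so that they are detected as missing (`replace` returns a new DataFrame and leaves `df` unchanged)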

In [14]:
df.replace('', np.nan)

Unnamed: 0,A,B
0,1.0,2.0
1,,
2,,3.0
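
Real data often contains whitespace-only strings as well as empty ones. The sketch below extends the replacement above with a regular expression. The frame `raw` and the pattern are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column A mixes a number, an empty string,
# a whitespace-only string and None.
raw = pd.DataFrame({"A": [1, "", "   ", None], "B": [2, np.nan, 3, 4]})

# With regex=True the pattern is matched against string values only,
# so numbers are left alone. ^\s*$ matches empty or all-whitespace strings.
cleaned = raw.replace(r"^\s*$", np.nan, regex=True)

print(cleaned["A"].isna().tolist())
```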


* Finding non-null values

In [15]:
df.notna()

Unnamed: 0,A,B
0,True,True
1,True,False
2,False,True


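* Filtering rows with a boolean Series: keep only the rows where `A` is not missing (the empty string in row 1 still counts as non-null)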

In [17]:
df[df.A.notna()]

Unnamed: 0,A,B
0,1.0,2.0
1,,
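
The same boolean masks can be aggregated to summarise where values are missing. This is a small sketch on the frame used above; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, "", None], "B": [2, np.nan, 3]})

# True counts as 1 when summed, so this gives the number of missing values per column.
missing_per_column = df.isna().sum()

# any(axis=1) marks rows that contain at least one missing value.
rows_with_any_missing = df[df.isna().any(axis=1)]

print(missing_per_column)
print(rows_with_any_missing)
```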


### 2. Return a new Series with missing values removed.

* Dropping rows which have any missing values

In [21]:
df = pd.DataFrame({'A':[1,'',None], 'B':[2, np.nan, 3], 'C':[3,4,5]})
df

Unnamed: 0,A,B,C
0,1.0,2.0,3
1,,,4
2,,3.0,5


In [22]:
df.dropna()

Unnamed: 0,A,B,C
0,1,2.0,3


* Dropping columns which have null values

In [23]:
df.dropna(axis=1)

Unnamed: 0,C
0,3
1,4
2,5
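
`dropna` also accepts `how`, `thresh` and `subset` to control which rows are removed. The sketch below uses a hypothetical all-numeric frame (not the one from the cells above) so the non-missing counts are easy to follow.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is complete, row 1 has one valid value, row 2 has two.
df = pd.DataFrame({"A": [1, None, None], "B": [2, np.nan, 3], "C": [3, 4, 5]})

# how="all" drops a row only if every value in it is missing (no such row here).
drop_if_all_missing = df.dropna(how="all")

# thresh=2 keeps rows that have at least two non-missing values.
keep_min_two_values = df.dropna(thresh=2)

# subset restricts the check to the listed columns.
drop_if_a_missing = df.dropna(subset=["A"])

print(keep_min_two_values)
```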


### 3. Filling missing values

In [24]:
df.fillna(0)

Unnamed: 0,A,B,C
0,1.0,2.0,3
1,,0.0,4
2,0.0,3.0,5


In [27]:
df.fillna({'A':10,'B':11})

Unnamed: 0,A,B,C
0,1.0,2.0,3
1,,11.0,4
2,10.0,3.0,5


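* Missing values can also be filled from neighbouring rows: backward fill uses the next valid value, forward fill uses the previous valid value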

In [29]:
# df.bfill() is equivalent to fillna(method='bfill'), which is deprecated in recent pandas versions
df.bfill()

Unnamed: 0,A,B,C
0,1.0,2.0,3
1,,3.0,4
2,,3.0,5
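
Forward fill works the same way in the opposite direction. This is a minimal sketch comparing the two on a hypothetical frame. Values with no valid neighbour in the fill direction stay missing.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: D starts and ends with a missing value.
df = pd.DataFrame({"B": [2, np.nan, 3], "D": [np.nan, 7, np.nan]})

# Forward fill copies the previous valid value downwards;
# the leading NaN in D has nothing before it and stays missing.
forward = df.ffill()

# Backward fill copies the next valid value upwards;
# the trailing NaN in D has nothing after it and stays missing.
backward = df.bfill()

print(forward)
print(backward)
```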


### 4. Interpolate missing values based on different methods

In [30]:
df = pd.DataFrame({'Name':['Rush','Riba','Kunal','Pruthvi'],
                   'Email':['rush@edyoda.com','riba@edyoda.com','kunal@edyoda.com','pruthvi@edyoda.com'],
                   'Age':[33,31,None,18]})

In [31]:
df

Unnamed: 0,Name,Email,Age
0,Rush,rush@edyoda.com,33.0
1,Riba,riba@edyoda.com,31.0
2,Kunal,kunal@edyoda.com,
3,Pruthvi,pruthvi@edyoda.com,18
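
The missing `Age` can then be estimated from its neighbours. The sketch below applies the default linear method to the numeric column only, which avoids depending on how `interpolate` treats the text columns. Other methods exist (some of them require SciPy) and are not shown here.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Rush', 'Riba', 'Kunal', 'Pruthvi'],
                   'Email': ['rush@edyoda.com', 'riba@edyoda.com',
                             'kunal@edyoda.com', 'pruthvi@edyoda.com'],
                   'Age': [33, 31, None, 18]})

# Linear interpolation places the missing value halfway between
# its neighbours 31 and 18, i.e. (31 + 18) / 2 = 24.5.
df['Age'] = df['Age'].interpolate(method='linear')

print(df['Age'].tolist())  # [33.0, 31.0, 24.5, 18.0]
```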
