# Working With Missing Values
Missing data occurs when no information is provided for one or more items, or for a whole unit. It is a common problem in real-world datasets. In pandas, missing data is referred to as **NA (Not Available)** values.

In [47]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [48]:
dataframe = pd.read_csv('dataset/circle_employee.csv',index_col='user_id')  # Load data
dataframe.iloc[:,:6].head()

user_id,name,age,blood_group,gender,experience,designation
1,Sharif,,B+,male,1.5,Jr Software Engineer
2,Kanan Mahmud,28.0,,Male,7.5,Sr Software Engineer
3,Md. Shakil,27.0,B-,Male,3.5,Software Engineer
4,Imran Sheikh,25.0,B-,Male,1.8,Jr Software Engineer
5,Farsan Rashid,27.0,O+,Male,4.2,Software Engineer


# Checking for missing values using isnull() and notnull()

To check for missing values in a pandas DataFrame, we use the isnull() and notnull() functions. isnull() returns True where a value is missing (NaN), and notnull() returns True where a value is present. Both functions can also be used on a pandas Series to find null values in that series.

In [49]:
# using isnull() function  
dataframe.head(10).isnull()

user_id,name,age,blood_group,gender,experience,designation,salary
1,False,True,False,False,False,False,False
2,False,False,True,False,False,False,False
3,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False
5,False,False,False,False,False,False,False
6,False,False,True,False,False,False,False
7,False,False,True,False,False,False,False
8,False,False,True,False,False,False,False
9,False,False,False,False,False,False,False
10,False,False,False,False,False,False,False


In [50]:
# using notnull() function
dataframe.head(10).notnull()

user_id,name,age,blood_group,gender,experience,designation,salary
1,True,False,True,True,True,True,True
2,True,True,False,True,True,True,True
3,True,True,True,True,True,True,True
4,True,True,True,True,True,True,True
5,True,True,True,True,True,True,True
6,True,True,False,True,True,True,True
7,True,True,False,True,True,True,True
8,True,True,False,True,True,True,True
9,True,True,True,True,True,True,True
10,True,True,True,True,True,True,True


In [51]:
age = dataframe['age']
age.head(10).isnull()

user_id
1      True
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
Name: age, dtype: bool
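
The boolean masks above are often more useful when summarised. The following is a minimal sketch, assuming a small hypothetical `toy` frame (not the employee dataset itself), of how `isnull()` can be combined with `sum()` and `any()` to count missing values per column and to select the rows that contain at least one missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the employee data (values are made up)
toy = pd.DataFrame(
    {
        "name": ["A", "B", "C", "D"],
        "age": [np.nan, 28.0, 27.0, 25.0],
        "blood_group": ["B+", None, "B-", None],
    },
    index=pd.Index([1, 2, 3, 4], name="user_id"),
)

# True counts as 1, so summing the mask gives the number of missing values per column
missing_per_column = toy.isnull().sum()

# any(axis=1) flags every row that has at least one missing value
rows_with_missing = toy[toy.isnull().any(axis=1)]

print(missing_per_column)
print(rows_with_missing)
```

The same calls should work on the full `dataframe` loaded earlier, since they do not depend on the column names.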

# Filling missing values using fillna(), replace() and interpolate()
To fill null values in a dataset, we use the fillna(), replace() and interpolate() functions. Each of these functions replaces NaN values with other values.
- **interpolate():** fills NA values in the DataFrame by estimating them from neighbouring values with an interpolation technique (linear by default), rather than with a hard-coded value.
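
The notebook only demonstrates fillna(), so here is a minimal sketch of the other two functions. It assumes a short hypothetical Series `s` (made-up values, not taken from the dataset): `replace()` swaps NaN for a fixed value, while `interpolate()` with its default linear method fills each gap from the surrounding values.

```python
import numpy as np
import pandas as pd

# Hypothetical series with one single gap and one double gap
s = pd.Series([25.0, np.nan, 27.0, np.nan, np.nan, 30.0])

# replace(): substitute every NaN with a fixed value (0.0 is an arbitrary choice)
filled_constant = s.replace(np.nan, 0.0)

# interpolate(): the default method is linear, so the values between
# 27 and 30 are spaced evenly across the gap
filled_linear = s.interpolate()

print(filled_constant.tolist())
print(filled_linear.tolist())
```

Linear interpolation only makes sense when the row order carries meaning (for example, a time series); for an unordered column such as age, a mean or median fill is usually the safer choice.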

In [52]:
# fill null values in the age column with the column mean

mean = age.mean()
print(mean)
age.fillna(mean).head(10)

28.302325581395348


user_id
1     28.302326
2     28.000000
3     27.000000
4     25.000000
5     27.000000
6     25.000000
7     27.000000
8     25.000000
9     23.000000
10    30.000000
Name: age, dtype: float64
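
Note that fillna() returns a new object and leaves the original untouched, so the result has to be assigned if it is to be kept. A minimal sketch, assuming a hypothetical `ages` Series with made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical ages; the first entry is missing
ages = pd.Series([np.nan, 28.0, 27.0, 25.0], name="age")

# mean() skips NaN by default, so only the three known ages are averaged
mean_age = ages.mean()

# fillna() returns a new Series; `ages` itself still contains the NaN
filled = ages.fillna(mean_age)

print(ages.isnull().sum())    # missing values remaining in the original
print(filled.isnull().sum())  # missing values remaining in the filled copy
```

To keep the filled values in the DataFrame from this notebook, the result would be assigned back, for example `dataframe['age'] = age.fillna(mean)`.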

# Dropping missing values using dropna()
To drop null values from a DataFrame, we use the dropna() function. It can drop the rows or the columns that contain null values, in several different ways.

In [53]:
# using dropna() function  
age.dropna().head(10)

user_id
2     28.0
3     27.0
4     25.0
5     27.0
6     25.0
7     27.0
8     25.0
9     23.0
10    30.0
11    25.0
Name: age, dtype: float64
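
On a DataFrame, dropna() takes a few parameters that control what is dropped. The sketch below assumes a small hypothetical `toy` frame with made-up values and shows three common variants: dropping incomplete rows, dropping incomplete columns with `axis=1`, and restricting the check to chosen columns with `subset`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame (values are made up for illustration)
toy = pd.DataFrame(
    {
        "age": [np.nan, 28.0, 27.0],
        "blood_group": ["B+", None, "B-"],
        "salary": [100, 200, 300],
    },
    index=pd.Index([1, 2, 3], name="user_id"),
)

# Default: drop every row that has at least one missing value
complete_rows = toy.dropna()

# axis=1: drop every column that has at least one missing value
complete_cols = toy.dropna(axis=1)

# subset: only look at the listed columns when deciding which rows to drop
has_age = toy.dropna(subset=["age"])

print(complete_rows)
print(complete_cols)
print(has_age)
```

Like fillna(), dropna() returns a new object, so the original frame is unchanged unless the result is assigned back.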