# Handling Missing Data

<span>There are many ways to handle missing data, and some people come up with very creative solutions. This notebook covers a few basic methods. Again, the right strategy will differ depending on the context of the problem and your domain expertise.</span>

<span style="text-decoration:underline">**Basic Strategies**</span>

1. Removing observations (rows) with NaN values
2. Filling NaN values with a chosen value
3. Filling NaN values with the mean
4. Filling NaN values with the median
5. Dropping features (columns) with NaN values


The best strategy is usually context-specific, so the more contextual knowledge you have, the better.

### Import Preliminaries

In [4]:
# Import modules
import pandas as pd
import numpy as np

### Create Data

In [5]:
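# Create some student data (scores are random, so values change on each run)
students = pd.DataFrame({'Name': ['Justin', 'Kim', 'Stephen', 'Paul', 'Jean', 'Brian', 'John'],
             'Midterm_Score': np.random.randint(70, 100, size=7),
             'Final_Score': np.random.randint(90, 100, size=7)
             })
# Replace final scores from 92 to 96 with NaN to simulate missing values.
# Assign the result back instead of calling replace(inplace=True) on a
# selected column, which may not modify the original DataFrame.
students['Final_Score'] = students['Final_Score'].replace(
    to_replace=list(range(92, 97)), value=np.nan)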

# View our dataframe
students

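|   | Name | Midterm_Score | Final_Score |
|---|------|---------------|-------------|
| 0 | Justin | 87 | NaN |
| 1 | Kim | 84 | 91.0 |
| 2 | Stephen | 72 | NaN |
| 3 | Paul | 94 | NaN |
| 4 | Jean | 72 | NaN |
| 5 | Brian | 87 | 99.0 |
| 6 | John | 70 | NaN |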
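
Before picking a strategy, it can help to count how much is actually missing in each column. The sketch below is one way to do that; it uses a small hand-written frame named `demo` (an assumption for illustration, with fixed values rather than the random data above) so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Illustrative data with fixed values (hypothetical, not the notebook's random data)
demo = pd.DataFrame({
    'Name': ['Justin', 'Kim', 'Stephen', 'Paul'],
    'Midterm_Score': [87, 84, 72, 94],
    'Final_Score': [np.nan, 91.0, np.nan, 99.0],
})

# Number of missing values in each column
missing_counts = demo.isna().sum()

# Fraction of missing values in each column
missing_share = demo.isna().mean()

print(missing_counts)
print(missing_share)
```

A column that is mostly empty may be a candidate for dropping, while a column with only a few gaps is often easier to fill.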


### 1. Removing Rows with NaN Values

In [6]:
# Drop rows that contain null values (returns a new DataFrame)
students.dropna()

|   | Name | Midterm_Score | Final_Score |
|---|------|---------------|-------------|
| 1 | Kim | 84 | 91.0 |
| 5 | Brian | 87 | 99.0 |
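
Dropping every row with any NaN can discard a lot of data. As a rough sketch, `dropna` also accepts `subset` (only look at certain columns) and `thresh` (minimum number of non-null values a row needs to be kept). The `demo` frame and its values are assumptions made up for this example.

```python
import numpy as np
import pandas as pd

# Illustrative data (hypothetical values)
demo = pd.DataFrame({
    'Name': ['Justin', 'Kim', 'Paul'],
    'Midterm_Score': [87.0, 84.0, np.nan],
    'Final_Score': [np.nan, 91.0, np.nan],
})

# Drop a row only when Final_Score is missing; other columns are ignored
has_final = demo.dropna(subset=['Final_Score'])

# Keep rows that have at least 2 non-null values
mostly_complete = demo.dropna(thresh=2)

print(has_final)
print(mostly_complete)
```

Here `has_final` keeps only Kim, while `mostly_complete` keeps Justin and Kim and drops Paul, whose row has a single non-null value.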


### 2. Filling NaN Values with Another Value

In [7]:
# Fill null values with another value (here, 0)
students.fillna(0)

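|   | Name | Midterm_Score | Final_Score |
|---|------|---------------|-------------|
| 0 | Justin | 87 | 0.0 |
| 1 | Kim | 84 | 91.0 |
| 2 | Stephen | 72 | 0.0 |
| 3 | Paul | 94 | 0.0 |
| 4 | Jean | 72 | 0.0 |
| 5 | Brian | 87 | 99.0 |
| 6 | John | 70 | 0.0 |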
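
Once NaN is replaced by a constant such as 0, a filled value looks the same as a real score of 0. One possible way to keep that information is to record a flag column before filling. This is only a sketch; the `demo` frame and the column name `Final_Score_missing` are hypothetical.

```python
import numpy as np
import pandas as pd

# Illustrative data (hypothetical values)
demo = pd.DataFrame({
    'Name': ['Justin', 'Kim', 'Stephen', 'Brian'],
    'Final_Score': [np.nan, 91.0, np.nan, 99.0],
})

# Record which scores were missing, then fill the gaps with 0
flagged = demo.assign(
    Final_Score_missing=demo['Final_Score'].isna(),
    Final_Score=demo['Final_Score'].fillna(0),
)

print(flagged)
```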


### 3. Filling NaN Values with Mean

In [8]:
# Fill null values in Final_Score with that column's mean.
# Passing a dict limits the fill to the named column.
students.fillna({'Final_Score': students['Final_Score'].mean()})

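|   | Name | Midterm_Score | Final_Score |
|---|------|---------------|-------------|
| 0 | Justin | 87 | 95.0 |
| 1 | Kim | 84 | 91.0 |
| 2 | Stephen | 72 | 95.0 |
| 3 | Paul | 94 | 95.0 |
| 4 | Jean | 72 | 95.0 |
| 5 | Brian | 87 | 99.0 |
| 6 | John | 70 | 95.0 |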


### 4. Filling NaN Values with Median

In [9]:
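# Fill null values in Final_Score with that column's median.
# Passing a dict limits the fill to the named column.
students.fillna({'Final_Score': students['Final_Score'].median()})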

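|   | Name | Midterm_Score | Final_Score |
|---|------|---------------|-------------|
| 0 | Justin | 87 | 95.0 |
| 1 | Kim | 84 | 91.0 |
| 2 | Stephen | 72 | 95.0 |
| 3 | Paul | 94 | 95.0 |
| 4 | Jean | 72 | 95.0 |
| 5 | Brian | 87 | 99.0 |
| 6 | John | 70 | 95.0 |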
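
When several numeric columns have gaps, each column can be filled with its own statistic in one call. The sketch below assumes a hand-written `demo` frame (hypothetical values) and uses the median; swapping in `mean` works the same way.

```python
import numpy as np
import pandas as pd

# Illustrative data with gaps in two numeric columns (hypothetical values)
demo = pd.DataFrame({
    'Name': ['Justin', 'Kim', 'Stephen', 'Brian'],
    'Midterm_Score': [87.0, np.nan, 72.0, 94.0],
    'Final_Score': [np.nan, 91.0, np.nan, 99.0],
})

# Median of each numeric column, returned as a Series indexed by column name
medians = demo.median(numeric_only=True)

# fillna matches the Series labels to column names, so each column
# is filled with its own median; 'Name' is left untouched
filled = demo.fillna(medians)

print(filled)
```

With these values, the missing midterm score becomes 87.0 and the missing final scores become 95.0.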


### 5. Dropping Features with NaN

In [10]:
# Drop features (columns) that contain null values
students.dropna(axis=1)

|   | Name | Midterm_Score |
|---|------|---------------|
| 0 | Justin | 87 |
| 1 | Kim | 84 |
| 2 | Stephen | 72 |
| 3 | Paul | 94 |
| 4 | Jean | 72 |
| 5 | Brian | 87 |
| 6 | John | 70 |
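
Dropping a column because of a single NaN may be too aggressive. As a sketch, `thresh` can be combined with `axis=1` to drop only columns that fall below a minimum number of non-null values. The `demo` frame and the cutoff `min_non_null` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data (hypothetical values)
demo = pd.DataFrame({
    'Name': ['Justin', 'Kim', 'Stephen', 'Paul'],
    'Midterm_Score': [87.0, 84.0, np.nan, 94.0],
    'Final_Score': [np.nan, 91.0, np.nan, np.nan],
})

# Hypothetical cutoff: keep columns with at least 3 non-null values
min_non_null = 3
kept = demo.dropna(axis=1, thresh=min_non_null)

print(kept.columns.tolist())
```

Here `Midterm_Score` survives despite one gap, while `Final_Score`, with only one observed value, is dropped.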


Author: Kavi Sekhon