# Handling Missing Data

<span>There are many ways to handle missing data, and some of them are quite creative. This notebook covers a few basic methods. The right strategy depends on the context of the problem and on your domain expertise.</span> 

<span style="text-decoration:underline">**Basic Strategies**</span>

1. Removing observations
2. Filling in NaN values with a fixed value (such as zero)
3. Filling in NaN values with the mean
4. Filling in NaN values with the median
5. Dropping columns with missing values

### Preliminaries

In [4]:
import pandas as pd
import numpy as np

### Generating Data

In [5]:
students = pd.DataFrame({'Name' : ['Justin','Kim', 'Stephen','Paul', 'Jean', 'Brian','John'],
             'Midterm_Score' : np.random.randint(70, 100, size=7),
             'Final_Score' : np.random.randint(90, 100, size=7)
             })
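# Replace final scores from 92 to 96 with NaN to simulate missing values.
# Assigning the result back avoids relying on a chained inplace replace.
students['Final_Score'] = students['Final_Score'].replace(
    to_replace=list(range(92, 97)), value=np.nan)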

students

Unnamed: 0,Name,Midterm_Score,Final_Score
0,Justin,90,98.0
1,Kim,86,
2,Stephen,86,98.0
3,Paul,95,
4,Jean,75,
5,Brian,80,
6,John,81,99.0
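
Before choosing a strategy, it can help to check how much data is actually missing. The following is a minimal sketch, not part of the original notebook: `demo` is a hypothetical stand-in for `students` with fixed values so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `students`, with fixed values instead of random ones
demo = pd.DataFrame({
    'Name': ['Justin', 'Kim', 'Stephen', 'Paul'],
    'Midterm_Score': [90, 86, 86, 95],
    'Final_Score': [98.0, np.nan, 98.0, np.nan],
})

# Number of missing values in each column
missing_counts = demo.isna().sum()

# Fraction of missing values in each column
missing_share = demo.isna().mean()

print(missing_counts)
print(missing_share)
```

A column with only a few gaps is usually a candidate for filling, while a column that is mostly empty may be better dropped.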


### 1. Removing Rows with NaN Values

In [6]:
students.dropna()

Unnamed: 0,Name,Midterm_Score,Final_Score
0,Justin,90,98.0
2,Stephen,86,98.0
6,John,81,99.0
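
Calling `dropna()` with no arguments removes a row if any column is missing. When that is too aggressive, the `subset` and `thresh` parameters of `DataFrame.dropna` narrow the rule. The sketch below uses a hypothetical `demo` frame with fixed values; the column names mirror `students`, but the data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: each row has a different number of missing values
demo = pd.DataFrame({
    'Name': ['Justin', 'Kim', 'Stephen', 'Paul'],
    'Midterm_Score': [90.0, np.nan, 86.0, np.nan],
    'Final_Score': [98.0, 95.0, np.nan, np.nan],
})

# Drop a row only when Final_Score is missing; NaN in other columns is tolerated
has_final = demo.dropna(subset=['Final_Score'])

# Keep rows that have at least 2 non-missing values across all columns
mostly_complete = demo.dropna(thresh=2)

print(has_final)
print(mostly_complete)
```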


### 2. Filling NaN Values with Zeros

In [7]:
students.fillna(0)

Unnamed: 0,Name,Midterm_Score,Final_Score
0,Justin,90,98.0
1,Kim,86,0.0
2,Stephen,86,98.0
3,Paul,95,0.0
4,Jean,75,0.0
5,Brian,80,0.0
6,John,81,99.0
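
A single fill value is applied to every column, which may not suit mixed data. `fillna` also accepts a dictionary that maps column names to fill values. A minimal sketch, assuming a hypothetical `demo` frame in which both a name and a score are missing:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing name and a missing score
demo = pd.DataFrame({
    'Name': ['Justin', None, 'Stephen'],
    'Final_Score': [98.0, np.nan, 99.0],
})

# Use a different fill value for each column
filled = demo.fillna({'Name': 'Unknown', 'Final_Score': 0})

print(filled)
```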


### 3. Filling NaN Values with the Mean

In [8]:
# Fill only Final_Score, using that column's own mean
students.fillna({'Final_Score': students['Final_Score'].mean()})

Unnamed: 0,Name,Midterm_Score,Final_Score
0,Justin,90,98.0
1,Kim,86,98.333333
2,Stephen,86,98.0
3,Paul,95,98.333333
4,Jean,75,98.333333
5,Brian,80,98.333333
6,John,81,99.0


### 4. Filling NaN Values with the Median

In [9]:
# Fill only Final_Score, using that column's own median
students.fillna({'Final_Score': students['Final_Score'].median()})

Unnamed: 0,Name,Midterm_Score,Final_Score
0,Justin,90,98.0
1,Kim,86,98.0
2,Stephen,86,98.0
3,Paul,95,98.0
4,Jean,75,98.0
5,Brian,80,98.0
6,John,81,99.0
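
In the data above the mean and median are close, so the two fills look almost identical. They can diverge when the column contains an outlier. The sketch below uses a hypothetical `scores` series with one unusually low value to illustrate the difference; the numbers are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical scores: one missing value and one low outlier (40)
scores = pd.Series([98.0, np.nan, 98.0, 99.0, 40.0])

# The mean is pulled down by the outlier
mean_filled = scores.fillna(scores.mean())

# The median is not affected by a single extreme value
median_filled = scores.fillna(scores.median())

print(mean_filled[1], median_filled[1])
```

When a column is skewed like this, the median is often the more representative fill value.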


### 5. Dropping Features with NaN Values

In [14]:
students.dropna(axis=1)

Unnamed: 0,Name,Midterm_Score
0,Justin,90
1,Kim,86
2,Stephen,86
3,Paul,95
4,Jean,75
5,Brian,80
6,John,81
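
`dropna(axis=1)` removes a column even if only one value is missing. To drop only columns that are mostly empty, the `thresh` parameter sets the minimum number of non-missing values a column needs to be kept. A minimal sketch with a hypothetical `demo` frame:

```python
import numpy as np
import pandas as pd

# Hypothetical data: Midterm_Score has one gap, Final_Score is mostly missing
demo = pd.DataFrame({
    'Midterm_Score': [90.0, 86.0, np.nan, 95.0],
    'Final_Score': [98.0, np.nan, np.nan, np.nan],
})

# Keep only columns with at least 3 non-missing values
kept = demo.dropna(axis=1, thresh=3)

print(kept)
```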


Author: Kavi Sekhon