In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab12.ipynb")

# Lab 12: Missing Data

Welcome to Lab 12 of Data Wrangling and Visualization!

## Overview
Missing data is a common and serious problem in data science. Fully complete datasets are rare, and when a dataset is complete, it has often already been cleaned rather than being in its raw form.

There are three main types of missing data:
- Missing completely at random (MCAR)
- Missing at random (MAR)
- Missing not at random (MNAR)

The methods we use to handle missing data depend on the type of missingness. MCAR data is usually the easiest to handle: because every observation has the same probability of being missing, we can drop the incomplete observations without distorting trends in the other variables (we only lose sample size). MAR data occurs when the probability that a value is missing depends on another, observed variable in the dataset. We can handle some types of MAR data by filling in (imputing) the missing values, but we must be careful not to introduce bias. MNAR data occurs when the probability that a value is missing depends on the unobserved value itself, and it is typically the most difficult form of missing data to work with. You will learn about techniques for MNAR data in future classes.
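To make the MCAR/MAR distinction concrete, here is a small simulation on synthetic data. The column names and the rule that hot days are more likely to lose their humidity reading are illustrative assumptions, not properties of the lab's weather file. The sketch drops the missing rows under each mechanism and compares the mean of what remains with the true mean.

```python
import numpy as np
import pandas as pd

# Synthetic data (hypothetical): humidity rises with temperature
rng = np.random.default_rng(0)
n = 10_000
temperature = rng.normal(20, 5, n)
humidity = 50 + 3 * (temperature - 20) + rng.normal(0, 3, n)
df = pd.DataFrame({"temperature": temperature, "humidity": humidity})

# MCAR: every humidity value has the same 20% chance of being missing
mcar = df.copy()
mcar.loc[rng.random(n) < 0.2, "humidity"] = np.nan

# MAR (assumed rule): humidity goes missing far more often on hot days,
# so missingness depends on an observed column (temperature)
mar = df.copy()
p_missing = np.where(df["temperature"] > 25, 0.8, 0.05)
mar.loc[rng.random(n) < p_missing, "humidity"] = np.nan

# Drop the missing rows and compare the remaining mean with the true mean
true_mean = df["humidity"].mean()
mcar_mean = mcar["humidity"].dropna().mean()
mar_mean = mar["humidity"].dropna().mean()
print(f"true mean:          {true_mean:.2f}")
print(f"MCAR, rows dropped: {mcar_mean:.2f}")
print(f"MAR, rows dropped:  {mar_mean:.2f}")
```

Under MCAR the mean of the remaining rows should stay very close to the true mean; under this MAR rule it comes out noticeably lower, because the dropped rows are systematically the hotter, more humid days.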

Some common techniques for handling missing data include:
- Drop missing observations
- Impute data
    - Mean
    - Median
    - Mode
    - Forward fill (best for time series)
    - Backward fill (best for time series)
    - Interpolate (best for time series)
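The techniques listed above map onto a handful of pandas methods. The sketch below applies each one to a small made-up series so the results can be compared side by side; the values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings with two gaps (observed values: 10, 10, 14, 20, 26)
s = pd.Series([10.0, 10.0, np.nan, 14.0, 20.0, np.nan, 26.0])

dropped = s.dropna()                         # 10, 10, 14, 20, 26 (index labels 2 and 5 removed)
mean_filled = s.fillna(s.mean())             # gaps -> 16.0 (mean of the observed values)
median_filled = s.fillna(s.median())         # gaps -> 14.0
mode_filled = s.fillna(s.mode().iloc[0])     # gaps -> 10.0 (mode() returns a Series)
forward = s.ffill()                          # 10, 10, 10, 14, 20, 20, 26
backward = s.bfill()                         # 10, 10, 14, 14, 20, 26, 26
interpolated = s.interpolate()               # 10, 10, 12, 14, 20, 23, 26
```

Mean, median, and mode ignore the order of the rows, whereas forward fill, backward fill, and interpolation use neighboring values, which is why they suit time series.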
    
## In today's lab, we will
- work on understanding types of missing data
- handle missing values using a variety of techniques

In [None]:
import pandas as pd
import numpy as np

**Question 1.1:** Import the `temp_weather_data.csv` file. 

In [None]:
# Import data
weather = ...
weather

In [None]:
grader.check("q1_1")

<!-- BEGIN QUESTION -->

**Question 1.2:** Check the weather dataframe's info to look at the data types and the number of non-null (and therefore null) values in each column.
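As an illustration with a hypothetical toy frame (not the lab data), `DataFrame.info()` reports each column's dtype and *non-null* count, while `isna().sum()` counts nulls directly. A placeholder string such as `"---"` is not treated as null, so the affected column shows zero nulls and a non-numeric dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; "---" stands in for a missing rainfall reading
df = pd.DataFrame({
    "temperature": [21.5, np.nan, 23.0, 22.1],
    "rainfall": ["0.0", "---", "1.2", "0.4"],
})

df.info()                      # dtype and non-null count for each column
null_counts = df.isna().sum()  # null count for each column
print(null_counts)             # rainfall shows 0 nulls: "---" is just a string to pandas
```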

In [None]:
...

<!-- END QUESTION -->

**Question 1.3:** If there are any null values in the dataset that do not show up as a NaN, convert them to `np.nan`. Make sure all numeric columns are type float.
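Two common ways to turn placeholders into real missing values are sketched below on a made-up series; `"---"` and `"n/a"` are example placeholders. Both return a new Series, so the result must be assigned back to the column to take effect.

```python
import numpy as np
import pandas as pd

# Hypothetical raw column read from a CSV; "---" and "n/a" are placeholder strings
raw = pd.Series(["0.0", "---", "1.2", "n/a"])

# Option 1: replace every known placeholder, then cast to float.
# replace() returns a new Series, so keep (or assign back) the result.
opt1 = raw.replace(["---", "n/a"], np.nan).astype(float)

# Option 2: coerce anything that does not parse as a number to NaN
opt2 = pd.to_numeric(raw, errors="coerce")

print(opt1.dtype, opt1.isna().sum())
print(opt2.dtype, opt2.isna().sum())
```

`pd.to_numeric(..., errors="coerce")` is convenient when you do not know every placeholder, but it also silently turns genuine typos into `NaN`, so it is worth inspecting the unique non-numeric values first.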

In [None]:
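# replace() returns a new Series, so assign the result back and cast to float
weather['rainfall'] = weather['rainfall'].replace('---', np.nan).astype(float)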

In [None]:
weather

In [None]:
grader.check("q1_3")

<!-- BEGIN QUESTION -->

**Question 1.4:** Create several visualizations to find any patterns in null values. After you've looked at your visualizations, determine which column has data that is most likely MCAR.
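Plots such as a heatmap of `df.isna()` are one way to spot patterns. A numeric check that complements them is to compare another column's values between rows where a given column is missing and rows where it is present. The toy data below is hypothetical and built so that rainfall is missing only on hot days.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: rainfall is missing only on the hottest days
df = pd.DataFrame({
    "temperature": [18.0, 19.5, 31.0, 20.0, 33.5, 30.5, 17.0, 21.0],
    "humidity":    [55.0, np.nan, 40.0, 60.0, 38.0, 42.0, np.nan, 58.0],
    "rainfall":    [0.2, 0.0, np.nan, 1.1, np.nan, np.nan, 0.4, 0.0],
})

# Share of missing values per column
print(df.isna().mean())

# Compare another column's average when rainfall is missing (True) vs. present (False).
# A large gap suggests the missingness depends on that column (MAR), not MCAR.
temp_by_rain_missing = df.groupby(df["rainfall"].isna())["temperature"].mean()
print(temp_by_rain_missing)
```

Here the average temperature is much higher in rows with missing rainfall, which would point toward MAR rather than MCAR. A column whose missingness shows no such relationship with the other columns is a better MCAR candidate, although this check alone cannot rule out MNAR.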

_Type your answer here, replacing this text._

In [None]:
# Create visualizations here

<!-- END QUESTION -->

**Question 1.5:** Drop any rows from the dataframe that have a null value in the `humidity` column.
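`DataFrame.dropna` accepts a `subset` argument that restricts which columns are checked. A minimal sketch on a toy frame with hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with gaps in both columns
df = pd.DataFrame({
    "humidity": [55.0, np.nan, 60.0, np.nan],
    "rainfall": [np.nan, 0.0, 1.1, 0.3],
})

# Only look at humidity: rows 1 and 3 go, row 0 stays despite its missing rainfall
humidity_complete = df.dropna(subset=["humidity"])

# Default: drop a row if any column is missing (rows 0, 1, and 3 go)
fully_complete = df.dropna()

print(humidity_complete.index.tolist())  # [0, 2]
print(fully_complete.index.tolist())     # [2]
```

`dropna` returns a new frame (assign it back, or pass `inplace=True`), and it keeps the original index labels, so the surviving rows may have gaps in their index.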

In [None]:
...
weather

In [None]:
grader.check("q1_5")

**Question 1.6:** Use linear interpolation to fill in the missing values in the `temperature` column. 

*NOTE:* First create a copy of the weather dataframe called `weather_filled`. Then fill in the missing values.
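Two details matter here, shown on a small hypothetical frame with a datetime index. First, `weather_filled = weather` would only create a second name for the same object, so `.copy()` is needed to keep the original untouched. Second, `interpolate()` defaults to `method="linear"`, which treats rows as equally spaced and ignores the index; `method="time"` instead weights by the actual time gaps and requires a `DatetimeIndex`.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with an uneven gap: 1 hour, then 3 hours
idx = pd.to_datetime(["2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 04:00"])
weather_toy = pd.DataFrame({"temperature": [10.0, np.nan, 16.0]}, index=idx)

# .copy() makes an independent frame; plain assignment would share the same object
filled = weather_toy.copy()
filled["temperature"] = filled["temperature"].interpolate()  # linear: midpoint by position -> 13.0

# Time-aware alternative: 1 of 4 hours elapsed -> 10 + 0.25 * (16 - 10) = 11.5
by_time = weather_toy["temperature"].interpolate(method="time")

print(weather_toy["temperature"].isna().sum())  # 1: original left untouched
```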

In [None]:
weather_filled = ...
weather_filled

In [None]:
grader.check("q1_6")

**Question 1.7:** Use the backward fill method to fill in the missing values in the `rainfall` column.
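Backward fill copies the next observed value into each gap, so a gap at the very end of the data has nothing to copy and stays `NaN`. The optional `limit` argument caps how many consecutive missing values are filled. The toy values below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical rainfall readings with a leading gap, a two-value gap, and a trailing gap
s = pd.Series([np.nan, 0.5, np.nan, np.nan, 1.0, np.nan])

back_filled = s.bfill()     # 0.5, 0.5, 1.0, 1.0, 1.0, NaN (nothing after the last gap)
limited = s.bfill(limit=1)  # 0.5, 0.5, NaN, 1.0, 1.0, NaN (at most one value per gap)
```

Whether copying a later rainfall reading backward is reasonable is a modeling judgment rather than a given.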

In [None]:
weather_filled["rainfall"] = ...
weather_filled

In [None]:
grader.check("q1_7")

<!-- BEGIN QUESTION -->

**Question 1.8:** Perform a short exploratory data analysis (EDA) on this dataset.

_Type your answer here, replacing this text._

In [None]:
# EDA here

<!-- END QUESTION -->

## You're done! 

Yay! Great job making it through your final DATA 271 lab! At the beginning of the semester, you were new data wranglers, like baby Gus.

<img src="gus_smol.JPG" alt="drawing" width="500"/>

Now you're like fully grown Gus with all your data wrangling and visualization knowledge. Congratulations!

<img src="gus_another_loaf_of_bread.JPG" alt="drawing" width="500"/>

Run the cell below and submit to Canvas. 

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(run_tests=True)