# Imputing Missing Data with MICE
Multiple Imputation by Chained Equations

In [2]:
# Suppress library warnings to keep the notebook output readable
import warnings
warnings.filterwarnings('ignore')

# Add the parent directory to the path so the project's src package can be imported
import sys
sys.path.append('../')

from src.visualization import visualize
from src.processing import impute

# Automatically reload edited src modules before each cell runs
%load_ext autoreload
%autoreload 2

# Plotting
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.colors import LogNorm, ListedColormap, LinearSegmentedColormap

import seaborn as sns

# Data handling and statistics
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', 200)
import scipy
import math
import statsmodels.api as sm

from datetime import datetime, timedelta

# Data Import
For each imputation method, we consider two datasets:
1. Example data used to test the accuracy of the model.
2. The remaining participant data, to which the model can then be applied.

In [21]:
imp = impute.Impute("rnse61g4", "../", consecutive=True)

Percent: 10
Parameter: co2
Period (in minutes): 120
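The printed parameters suggest that the test data are built by removing 10% of the `co2` readings in consecutive 120-minute gaps (`consecutive=True`). How `Impute` actually chooses those gaps is not shown in this notebook, so the sketch below is only one plausible way to do it. The function name `mask_consecutive` is hypothetical, and the block length in rows assumes the 2-minute sampling seen in the timestamps here (120 minutes = 60 rows).

```python
import numpy as np

def mask_consecutive(values, percent=10, period=60, seed=0):
    """Hypothetical helper: return a copy of `values` with roughly `percent`%
    of entries set to NaN in non-overlapping runs of `period` rows."""
    rng = np.random.default_rng(seed)
    out = np.asarray(values, dtype=float).copy()
    n = len(out)
    # Number of whole blocks needed to remove about `percent`% of the rows
    n_blocks = max(1, int(round(n * percent / 100 / period)))
    # Candidate block starts, spaced `period` apart so blocks never overlap
    starts = np.arange(0, n - period + 1, period)
    chosen = rng.choice(starts, size=min(n_blocks, len(starts)), replace=False)
    for s in chosen:
        out[s : s + period] = np.nan
    return out
```

Removing whole blocks rather than scattered points is the harder test: a gap of 60 rows cannot be filled by simply interpolating between neighbouring readings.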


## Missing Data

In [22]:
imp.missing.head()

| timestamp | pm2p5_mass | tvoc | temperature_c | rh | co2 |
|---|---|---|---|---|---|
| 2020-06-11 13:52:00 | 22.713558 | 123.859636 | 24.037599 | 42.064286 | 1412.201879 |
| 2020-06-11 13:54:00 | 22.796582 | 126.481339 | 24.077255 | 42.061364 | 1411.594738 |
| 2020-06-11 13:56:00 | 22.721861 | 129.038577 | 24.113462 | 42.058696 | 1410.994147 |
| 2020-06-11 13:58:00 | 22.634113 | 131.53941 | 24.146652 | 42.05625 | 1410.628738 |
| 2020-06-11 14:00:00 | 22.695872 | 133.80765 | 24.177187 | 42.054 | 1409.933588 |


## Base Data
The same dataset with no values removed, used as the ground truth to compare the imputations against.

In [23]:
imp.base.head()

| timestamp | pm2p5_mass | tvoc | temperature_c | rh | co2 |
|---|---|---|---|---|---|
| 2020-06-11 13:52:00 | 22.713558 | 123.859636 | 24.037599 | 42.064286 | 1412.201879 |
| 2020-06-11 13:54:00 | 22.796582 | 126.481339 | 24.077255 | 42.061364 | 1411.594738 |
| 2020-06-11 13:56:00 | 22.721861 | 129.038577 | 24.113462 | 42.058696 | 1410.994147 |
| 2020-06-11 13:58:00 | 22.634113 | 131.53941 | 24.146652 | 42.05625 | 1410.628738 |
| 2020-06-11 14:00:00 | 22.695872 | 133.80765 | 24.177187 | 42.054 | 1409.933588 |


---

# Imputing

## MICE

In [31]:
imp.mice()
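The code behind `imp.mice()` is not shown here, so the following is only a minimal NumPy sketch of the chained-equations idea: start every missing value at its column mean, then repeatedly regress each incomplete column on all the others (ordinary least squares with an intercept, an assumption; MICE allows any per-variable model) and overwrite the missing entries with the predictions. The function name `chained_equations_impute` is hypothetical. Adding residual noise only approximates drawing from the predictive distribution; full MICE also repeats the whole procedure several times to produce multiple completed datasets and pools the results.

```python
import numpy as np

def chained_equations_impute(X, n_iter=10, seed=0, add_noise=True):
    """Hypothetical sketch: impute NaNs in a 2-D array by cycling linear
    regressions over the incomplete columns (one chained-equations run)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)  # True where a value is missing
    filled = X.copy()
    # Initial guess: each missing entry gets its column mean
    filled[mask] = np.take(np.nanmean(X, axis=0), np.where(mask)[1])

    for _ in range(n_iter):
        for j in range(X.shape[1]):
            miss = mask[:, j]
            if not miss.any():
                continue
            obs = ~miss
            # Design matrix: intercept plus every other (currently filled) column
            design = np.column_stack([np.ones(len(X)), np.delete(filled, j, axis=1)])
            coef, *_ = np.linalg.lstsq(design[obs], filled[obs, j], rcond=None)
            pred = design[miss] @ coef
            if add_noise:
                # Perturb predictions with the residual spread of the fit
                resid = filled[obs, j] - design[obs] @ coef
                pred = pred + rng.normal(0.0, resid.std(), size=pred.shape)
            filled[miss, j] = pred
    return filled
```

For production use, scikit-learn's `IterativeImputer` (enabled via `sklearn.experimental.enable_iterative_imputer`) and statsmodels' `statsmodels.imputation.mice.MICEData` implement the same scheme with more options.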

In [32]:
imp.mice_imputed

| timestamp | pm2p5_mass | tvoc | temperature_c | rh | co2 |
|---|---|---|---|---|---|
| 2020-06-11 13:52:00 | 22.713558 | 123.859636 | 24.037599 | 42.064286 | 1412.201879 |
| 2020-06-11 13:54:00 | 22.796582 | 126.481339 | 24.077255 | 42.061364 | 1411.594738 |
| 2020-06-11 13:56:00 | 22.721861 | 129.038577 | 24.113462 | 42.058696 | 1410.994147 |
| 2020-06-11 13:58:00 | 22.634113 | 131.539410 | 24.146652 | 42.056250 | 1410.628738 |
| 2020-06-11 14:00:00 | 22.695872 | 133.807650 | 24.177187 | 42.054000 | 1409.933588 |
| ... | ... | ... | ... | ... | ... |
| 2020-08-21 13:56:00 | 22.967623 | 482.896681 | 24.910023 | 39.131579 | 722.798423 |
| 2020-08-21 13:58:00 | 23.259189 | 481.608827 | 24.910023 | 39.111111 | 722.588880 |
| 2020-08-21 14:00:00 | 23.578982 | 480.518230 | 24.910023 | 39.117647 | 722.321311 |
| 2020-08-21 14:02:00 | 23.668638 | 479.886335 | 24.910023 | 39.125000 | 722.248674 |
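Because `imp.base` holds the complete data, imputation accuracy can be scored only at the positions that were removed. The sketch below assumes that `imp.missing` marks removed values as NaN and that `imp.base`, `imp.missing`, and `imp.mice_imputed` share the same row and column order; the function name `masked_errors` is hypothetical.

```python
import numpy as np

def masked_errors(base, missing, imputed):
    """Hypothetical helper: per-column RMSE and MAE, computed only where
    `missing` is NaN. Columns with nothing missing get NaN scores."""
    base = np.asarray(base, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    mask = np.isnan(np.asarray(missing, dtype=float))
    err = np.where(mask, imputed - base, 0.0)  # zero error outside the gaps
    n = mask.sum(axis=0)                        # number of imputed cells per column
    with np.errstate(invalid="ignore", divide="ignore"):
        rmse = np.sqrt((err ** 2).sum(axis=0) / n)
        mae = np.abs(err).sum(axis=0) / n
    return rmse, mae
```

In this notebook the call would be `masked_errors(imp.base, imp.missing, imp.mice_imputed)`, which returns one score per column in the order `pm2p5_mass, tvoc, temperature_c, rh, co2`.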


## missForest

In [35]:
imp.miss_forest()

KeyboardInterrupt: 

In [None]:
imp.rf_imputed
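missForest follows the same column-by-column loop as MICE but fits a random forest for each incomplete variable, visits the variables from least to most missing, and stops as soon as the normalized change between successive imputations increases, returning the previous imputation. The sketch below is not the code behind `imp.miss_forest()`. To stay dependency-free it takes the per-column model as a `fit_predict(X_train, y_train, X_test)` callable; that argument and the function name `miss_forest_style` are hypothetical.

```python
import numpy as np

def miss_forest_style(X, fit_predict, max_iter=10):
    """Hypothetical sketch of the missForest loop with a pluggable regressor."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    if not mask.any():
        return X.copy()
    filled = X.copy()
    filled[mask] = np.take(np.nanmean(X, axis=0), np.where(mask)[1])  # mean start
    order = np.argsort(mask.sum(axis=0), kind="stable")  # fewest missing first
    prev_delta = np.inf
    for _ in range(max_iter):
        old = filled.copy()
        for j in order:
            miss = mask[:, j]
            if not miss.any():
                continue
            others = np.delete(filled, j, axis=1)
            filled[miss, j] = fit_predict(others[~miss], filled[~miss, j], others[miss])
        # Normalized squared change over the imputed entries only
        delta = np.sum((filled[mask] - old[mask]) ** 2) / np.sum(filled[mask] ** 2)
        if delta > prev_delta:
            return old  # change grew: keep the previous imputation
        prev_delta = delta
    return filled
```

With scikit-learn installed, a random-forest model can be plugged in as `lambda Xtr, ytr, Xte: RandomForestRegressor(n_estimators=100).fit(Xtr, ytr).predict(Xte)` (from `sklearn.ensemble`). Each iteration fits one forest per incomplete column, so on a long 2-minute series lowering `n_estimators` or `max_iter` is a simple way to shorten run time.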

## GANs
Some nice resources:
- [GANs with PyTorch](https://realpython.com/generative-adversarial-networks/)
- [GANs with TensorFlow](https://stackabuse.com/introduction-to-gans-with-python-and-tensorflow/)

In [None]:
imp.gans()

<div class="alert-block alert alert-success">
    
* `co2`: There is significant autocorrelation for about half of the beacons, which is also evident from their individual heatmaps.
* `tvoc`: The same autocorrelations appear, to a lesser extent.
* `pm2p5_mass`: Some beacons show a small signal at about 24 hours, similar to the `co2` and `tvoc` plots.
* `co`: Very few, weak autocorrelations; many of the plots drop off after a few lags, indicative of more _random_ series.
* `temperature_c`: Significant autocorrelations again, especially for **Beacon 26**, which has clear 12-hour cycles whereas most others have 24-hour cycles.
* `rh`: Trends similar to `temperature_c`, with less pronounced correlation.
    
</div>
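
The notes above compare beacons by the lag at which their autocorrelation peaks (12 versus 24 hours). The sketch below computes the sample autocorrelation $r_k = \sum_{t}(x_t - \bar{x})(x_{t+k} - \bar{x}) / \sum_t (x_t - \bar{x})^2$ and picks the strongest peak beyond the first few lags. It assumes a regularly sampled series with no gaps; at the 2-minute resolution seen earlier in this notebook, 12 hours is lag 360 and 24 hours is lag 720. The function names `sample_acf` and `dominant_lag` are hypothetical; `statsmodels.tsa.stattools.acf(x, nlags=...)` provides the same computation.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag of a 1-D series without NaNs."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[: len(d) - k], d[k:]) / denom for k in range(max_lag + 1)])

def dominant_lag(acf, min_lag):
    """Lag (>= min_lag) with the largest autocorrelation, skipping the trivial short-lag decay."""
    return min_lag + int(np.argmax(acf[min_lag:]))
```

For example, a sine wave with a period of 24 samples gives its largest autocorrelation beyond lag 5 at lag 24, and a strongly negative value at lag 12, half a cycle out of phase.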