# IAQ and Survey Response Analysis
Exploring the relationship between the categorical survey responses and the IAQ measurements.

In [1]:
import warnings
warnings.filterwarnings('ignore')

## IAQ Distributions per Reported Restfulness Score
Here we look at the distributions of IAQ measurements for each of the four ratings of restfulness.

In [2]:
import os
import sys
sys.path.append('../')

from src.features import build_features
from src.visualization import visualize
from src.reports import make_report

import pandas as pd
import numpy as np

from datetime import datetime, timedelta

import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.dates as mdates

import statsmodels.api as sm

## Data Import
We will need just the morning survey results and the beacon data.

### Morning EMAs

In [3]:
ema = pd.read_csv('../data/processed/bpeace2-morning-survey.csv',index_col=0,parse_dates=True)
ema.head()

Unnamed: 0,ID,Content,Stress,Lonely,Sad,Energy,TST,SOL,NAW,Restful
2020-07-31 09:25:41,hfttkth7,1.0,2.0,1.0,3.0,0.0,6.0,10.0,3.0,0.0
2020-08-19 22:49:04,hfttkth7,1.0,1.0,0.0,1.0,2.0,7.0,20.0,1.0,1.0
2020-08-23 10:58:26,hfttkth7,1.0,1.0,1.0,2.0,0.0,6.0,25.0,0.0,0.0
2020-07-17 09:52:16,hfttkth7,1.0,1.0,0.0,1.0,2.0,7.0,20.0,3.0,2.0
2020-08-12 12:32:54,hfttkth7,1.0,3.0,1.0,2.0,2.0,6.0,20.0,0.0,1.0


### Beacon IAQ

In [4]:
beacon = pd.read_csv('../data/processed/bpeace2-beacon.csv',index_col=0,parse_dates=True)
beacon.head()

Timestamp,TVOC,eCO2,Lux,Visible,Infrared,NO2,T_NO2,RH_NO2,CO,T_CO,...,PM_N_10,PM_C_1,PM_C_2p5,PM_C_4,PM_C_10,z,Beacon,Beiwe,Fitbit,REDCap
2020-06-11 00:00:00,,,,,,,,,,,...,,,,,,,30,idbkjh8u,22,4
2020-06-11 00:05:00,,,,,,,,,,,...,,,,,,,30,idbkjh8u,22,4
2020-06-11 00:10:00,,,,,,,,,,,...,,,,,,,30,idbkjh8u,22,4
2020-06-11 00:15:00,,,,,,,,,,,...,,,,,,,30,idbkjh8u,22,4
2020-06-11 00:20:00,,,,,,,,,,,...,,,,,,,30,idbkjh8u,22,4


## Pre-Processing

### Getting IAQ measurements for sleep survey responses
We have to cut the beacon data down to the nights preceding mornings on which participants submitted a survey. This differs slightly from cutting the beacon data down based on the Fitbit night recordings, since there might be a few mornings when participants submitted a survey but did not wear their Fitbit to bed. However, even in those instances we cannot summarize the preceding night's IAQ measurements because we don't know when the participant was asleep. Therefore, we **can** use the Fitbit-reduced IAQ data as a starting point.

In [5]:
evening_iaq = pd.read_csv('../data/processed/bpeace2-fitbit-beacon-iaq-evening-full.csv',
                          index_col=0,parse_dates=['Timestamp','start_time','end_time'])
evening_iaq.head()

Timestamp,TVOC,eCO2,Lux,Visible,Infrared,NO2,T_NO2,RH_NO2,CO,T_CO,...,PM_C_4,PM_C_10,z,Beacon,Beiwe,Fitbit,REDCap,start_time,end_time,beiwe
2020-07-03 06:20:00,192.84,405.36,0.0,0.0,0.0,,27.0,53.4,-0.47992,,...,16.400084,16.604594,0.162959,44,4i7679py,38,37,2020-07-03 06:20:00,2020-07-03 16:10:00,4i7679py
2020-07-03 06:25:00,178.866667,400.0,0.0,0.0,0.0,,27.0,52.833333,-0.3036,,...,20.737945,21.006936,0.088227,44,4i7679py,38,37,2020-07-03 06:20:00,2020-07-03 16:10:00,4i7679py
2020-07-03 06:30:00,188.76,401.4,0.9792,0.48,0.0,,27.0,53.0,-0.22436,,...,16.946462,17.167753,0.114067,44,4i7679py,38,37,2020-07-03 06:20:00,2020-07-03 16:10:00,4i7679py
2020-07-03 06:35:00,193.84,406.04,2.04,1.0,0.0,,27.0,53.0,-0.26328,,...,16.101073,16.309091,0.149835,44,4i7679py,38,37,2020-07-03 06:20:00,2020-07-03 16:10:00,4i7679py
2020-07-03 06:40:00,198.84,409.44,2.04,1.0,0.0,,27.0,53.0,-0.40936,,...,19.142773,19.383442,0.163325,44,4i7679py,38,37,2020-07-03 06:20:00,2020-07-03 16:10:00,4i7679py


Now we compare each participant's sleep survey submission dates to the end times of their sleep periods and remove the nights that don't line up.

In [6]:
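survey_iaq_list = []
for pt in evening_iaq['Beiwe'].unique():
    # getting pt-specific dfs
    evening_iaq_pt = evening_iaq[evening_iaq['Beiwe'] == pt]
    ema_pt = ema[ema['ID'] == pt]
    # dates on which this participant submitted a morning survey
    survey_dates = ema_pt.index.date
    # keep only measurements from sleep periods that ended on a survey date
    survey_only_iaq = evening_iaq_pt[evening_iaq_pt['end_time'].dt.date.isin(survey_dates)]
    survey_iaq_list.append(survey_only_iaq)

# DataFrame.append was removed in pandas 2.0, so concatenate once instead
survey_iaq = pd.concat(survey_iaq_list)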
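
The per-participant loop above could also be written as a single merge on participant and date. The following is only a sketch on made-up data: the `*_demo` frames and their values are hypothetical stand-ins for `evening_iaq` and `ema`, and the temporary `date` column is introduced purely for the join. Note that `merge` discards the original index, so the `Timestamp` index would have to be reset beforehand if it is needed afterwards.

```python
import pandas as pd

# Hypothetical stand-ins for evening_iaq and ema (values are made up)
evening_iaq_demo = pd.DataFrame({
    'Beiwe': ['aaa', 'aaa', 'bbb'],
    'end_time': pd.to_datetime(['2020-07-03 08:00', '2020-07-04 07:30', '2020-07-03 09:00']),
    'eCO2': [405.0, 410.0, 400.0],
})
ema_demo = pd.DataFrame(
    {'ID': ['aaa', 'bbb'], 'Restful': [2.0, 1.0]},
    index=pd.to_datetime(['2020-07-03 09:25', '2020-07-05 10:00']),
)

# One row per (participant, survey submission date)
survey_keys = pd.DataFrame({
    'Beiwe': ema_demo['ID'].values,
    'date': ema_demo.index.date,
}).drop_duplicates()

# Tag each IAQ row with the date its sleep period ended on
iaq_keyed = evening_iaq_demo.assign(date=evening_iaq_demo['end_time'].dt.date)

# The inner merge keeps only rows whose sleep period ended on a survey date
# for the same participant
survey_iaq_demo = iaq_keyed.merge(survey_keys, on=['Beiwe', 'date'], how='inner').drop(columns='date')
print(survey_iaq_demo)
```

In this toy example only participant `aaa`'s night ending on 2020-07-03 survives, since it is the only sleep period that ends on a date with a survey from the same participant.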

Now we see how many measurements we lost:

In [7]:
print(f'Number of Fitbit-Restricted Points: {len(evening_iaq)}')
print(f'Number of Fitbit- and EMA-Restricted Points: {len(survey_iaq)}')

Number of Fitbit-Restricted Points: 69611
Number of Fitbit- and EMA-Restricted Points: 30572
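
The counts above are individual measurements rather than nights. To count nights, one option is to count distinct participant and sleep-period pairs. The helper below is a hypothetical sketch (the name `count_nights` is not part of this project) and assumes that each night is uniquely identified by the `Beiwe` and `end_time` columns; it is shown on made-up data.

```python
import pandas as pd

def count_nights(iaq, id_col='Beiwe', end_col='end_time'):
    """Count distinct (participant, sleep-period end) pairs.

    Assumes one night corresponds to exactly one end_time per participant.
    """
    return len(iaq[[id_col, end_col]].drop_duplicates())

# Made-up data: two measurements from one night, plus two other nights
demo = pd.DataFrame({
    'Beiwe': ['aaa', 'aaa', 'aaa', 'bbb'],
    'end_time': pd.to_datetime(['2020-07-03 08:00', '2020-07-03 08:00',
                                '2020-07-04 07:30', '2020-07-03 08:00']),
})
n_nights = count_nights(demo)
print(f'Number of nights: {n_nights}')
```

Applied to the real data, this would be called as `count_nights(evening_iaq)` and `count_nights(survey_iaq)`.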


We save this to the processed directory since it could be useful later:

In [8]:
survey_iaq.to_csv('../data/processed/bpeace2-fitbit-beacon-iaq-evening-restricted.csv')

Now we have the measurements for nights when participants submitted surveys. The final piece is to include the restfulness scores for those evenings.
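
One possible way to attach the restfulness scores is sketched below on made-up data. It rests on two assumptions that the notebook does not confirm: each night is summarized by the mean of a measurement (here `eCO2`), and a survey is matched to the night whose `end_time` falls on the survey's submission date. The `*_demo` frames and the `date` column are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for survey_iaq and ema (values are made up)
iaq_demo = pd.DataFrame({
    'Beiwe': ['aaa', 'aaa', 'aaa'],
    'end_time': pd.to_datetime(['2020-07-03 08:00', '2020-07-03 08:00', '2020-07-04 07:30']),
    'eCO2': [400.0, 410.0, 500.0],
})
ema_demo = pd.DataFrame(
    {'ID': ['aaa', 'aaa'], 'Restful': [2.0, 0.0]},
    index=pd.to_datetime(['2020-07-03 09:25', '2020-07-04 10:10']),
)

# Assumption: summarize each night by its mean eCO2
nightly = iaq_demo.groupby(['Beiwe', 'end_time'], as_index=False)['eCO2'].mean()
nightly['date'] = nightly['end_time'].dt.date

# Survey scores keyed by participant and submission date
scores = ema_demo.reset_index(drop=True).assign(date=ema_demo.index.date)
scores = scores.rename(columns={'ID': 'Beiwe'})[['Beiwe', 'date', 'Restful']]

# Match each night to the survey submitted on the date the sleep period ended
nightly_scored = nightly.merge(scores, on=['Beiwe', 'date'], how='inner')
print(nightly_scored)
```

If a participant submitted more than one survey on the same date, this merge would duplicate that night, so duplicates in `scores` may need to be resolved first.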