# IEQ and Survey Response Analysis
Exploring the relationship between the categorical survey responses from the ecological momentary assessments (EMAs) and the indoor environmental quality (IEQ) measurements from the Beacon.

In [1]:
import warnings
warnings.filterwarnings('ignore')

# Extreme IEQ's Effect on Mood Reports
Do more extreme measurements influence survey results?

In [2]:
import os
import sys
sys.path.append('../')

from src.visualization import visualize

import pandas as pd
pd.set_option('display.max_columns', 200)
import numpy as np

from datetime import datetime, timedelta

import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.dates as mdates

from scipy import stats
from sklearn.linear_model import LinearRegression

<a id='toc'></a>

# Table of Contents

1. [Data Import](#data_import)
2. [Pre-Processing](#pre_processing)
3. [Inspection](#inspection)
4. [Analysis](#analysis)

---

<a id='data_import'></a>

[Back to ToC](#toc)
# Data Import
We have two datasets to import:

## EMAs Completed at Home
EMAs that were completed while the participant's GPS coordinates placed them at their home address.

In [3]:
ema = pd.read_csv("../data/processed/beiwe-ema_at_home_v2-ux_s20.csv",index_col="timestamp",parse_dates=["timestamp"],infer_datetime_format=True)
for column in ema.columns:
    if column != "beiwe":
        ema[column] = pd.to_numeric(ema[column])
ema["discontent"] = 3 - ema["content"]
ema.head()

timestamp,beiwe,content,stress,lonely,sad,energy,redcap,beacon,time_at_home,discontent
2020-05-15 09:21:05,mm69prai,1.0,0.0,1.0,1.0,1.0,62,13.0,10412.0,2.0
2020-05-15 09:25:04,vr9j5rry,2.0,0.0,0.0,0.0,3.0,34,25.0,29405.0,1.0
2020-05-15 12:02:43,kyj367pi,2.0,0.0,1.0,0.0,2.0,10,1.0,3774.0,1.0
2020-05-15 12:59:31,lkkjddam,1.0,1.0,2.0,1.0,2.0,12,21.0,5536.0,2.0
2020-05-15 17:28:54,9jtzsuu8,2.0,1.0,0.0,0.0,2.0,36,15.0,31643.0,1.0


## IEQ Data
We will be using all of the IEQ data to identify these extreme events.

In [4]:
ieq = pd.read_csv('../data/processed/beacon-ux_s20.csv',index_col="timestamp",parse_dates=["timestamp"],infer_datetime_format=True)
ieq.drop(["beacon","redcap","pm1_number","pm2p5_number","pm10_number","pm1_mass","pm10_mass","no2","lux","co"],axis=1,inplace=True)
for column in ieq.columns:
    if column != "beiwe":
        ieq[column] = pd.to_numeric(ieq[column])
ieq.head()

timestamp,tvoc,co2,pm2p5_mass,temperature_c,rh,beiwe,fitbit
2020-06-08 13:00:00,65.783786,,24.767709,16.2447,46.586667,kyj367pi,24.0
2020-06-08 13:02:00,65.973889,,25.379307,16.257887,46.58125,kyj367pi,24.0
2020-06-08 13:04:00,66.832566,,24.874103,16.269523,46.597059,kyj367pi,24.0
2020-06-08 13:06:00,67.746837,,24.503767,16.279865,46.619444,kyj367pi,24.0
2020-06-08 13:08:00,68.488233,,24.824221,16.289119,46.639474,kyj367pi,24.0


[Back to Data Import](#data_import)

---

<a id='pre_processing'></a>

[Back to ToC](#toc)
# Pre-Processing

## Getting Periods with Extreme IEQ Measurements
Looking for periods in the IEQ data where a measurement is more than a set number of standard deviations above or below the participant's own mean.
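
As a rough illustration of such a threshold, the sketch below flags readings that sit more than `n_std` standard deviations above each participant's own mean. The `toy` dataframe and its CO2 values are made up for the example; only the `beiwe` and `co2` column names come from the data above.

```python
import pandas as pd

# Hypothetical readings for two participants; the last reading of each is a spike
toy = pd.DataFrame({
    "beiwe": ["a"] * 6 + ["b"] * 6,
    "co2": [400, 410, 405, 395, 400, 900, 800, 810, 805, 795, 800, 1300],
})

n_std = 2
grouped = toy.groupby("beiwe")["co2"]
# Broadcast each participant's mean and population standard deviation to their rows
avg = grouped.transform("mean")
std = grouped.transform(lambda s: s.std(ddof=0))

# A reading is "extreme" when it exceeds the participant-specific upper threshold
toy["extreme_high"] = toy["co2"] > avg + n_std * std
print(toy[toy["extreme_high"]])
```

The population standard deviation (`ddof=0`) is used so the result lines up with `np.nanstd`, which the function in the next cell relies on.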

In [13]:
def get_extreme(df_in,ieq_param,n_std=2,above=True):
    """
    Gets the periods of IEQ measurements for the given parameter above or below a certain threshold
    
    Parameters
    ----------
    df_in : DataFrame
        IEQ data for all participants
    ieq_param : string
        specifies which column in df_in to consider
    n_std : int or float
        number of standard deviations to consider for extreme
    above : boolean
        whether to look for measurements above or below the mean - if None, considers both
        
    Returns
    -------
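    df_extreme : DataFrame
        one row per extreme event, indexed by the event's end_time, with the mean of
        ieq_param over the event, the participant (beiwe), the event duration
        (extreme_minutes), and the participant's overall mean and std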
    """
    df = df_in.copy()
    if ieq_param not in df.columns:
        print("IEQ parameter not given in dataframe")
        return
    
    extreme_events = [] # aggregated events, one dataframe per participant
    for pt in df["beiwe"].unique():
        df_pt_ieq = df.loc[df["beiwe"] == pt, [ieq_param,"beiwe"]]
        
        # thresholds are participant-specific: mean +/- n_std standard deviations
        avg = np.nanmean(df_pt_ieq[ieq_param])
        std = np.nanstd(df_pt_ieq[ieq_param])
        if above is None:
            df_pt_extreme = df_pt_ieq[(df_pt_ieq[ieq_param] < avg - n_std*std) | (df_pt_ieq[ieq_param] > avg + n_std*std)]
        elif above:
            df_pt_extreme = df_pt_ieq[df_pt_ieq[ieq_param] > avg + n_std*std]
        else:
            df_pt_extreme = df_pt_ieq[df_pt_ieq[ieq_param] < avg - n_std*std]
            
        # consecutive extreme readings (no gap longer than 2 minutes) form one event
        df_agg = df_pt_extreme.reset_index()
        df_agg["start"] = df_agg["timestamp"]
        df_agg.drop("beiwe",axis="columns",inplace=True)
        df_agg = (
            df_agg.groupby(
                df_agg['timestamp'].diff().gt(pd.Timedelta(minutes=2)).cumsum()
            ).agg({
                'timestamp': 'last', ieq_param: 'mean','start': 'first'
            }).set_index('timestamp').rename_axis(index="end_time")
        )
        df_agg["beiwe"] = pt
        df_agg["extreme_minutes"] = (df_agg.index - df_agg["start"]).dt.total_seconds()/60
        df_agg["mean"] = avg
        df_agg["std"] = std
        df_agg.drop("start",axis="columns",inplace=True)
        extreme_events.append(df_agg)
    
    # DataFrame.append was removed in pandas 2.0, so the pieces are concatenated once
    if len(extreme_events) == 0:
        return pd.DataFrame()
    return pd.concat(extreme_events)
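
The grouping step inside `get_extreme` can be hard to read at a glance, so here is a minimal sketch of the same `diff().gt().cumsum()` idiom on its own. The timestamps and CO2 values are hypothetical, and the column is named `co2` only for illustration.

```python
import pandas as pd

# Hypothetical extreme readings: two bursts separated by a long gap
readings = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-06-08 13:00", "2020-06-08 13:02", "2020-06-08 13:04",
        "2020-06-08 15:30", "2020-06-08 15:32",
    ]),
    "co2": [1500.0, 1520.0, 1510.0, 1600.0, 1620.0],
})

# A gap longer than 2 minutes starts a new event; the running sum of those
# "new event" flags gives every row an event label (0, 0, 0, 1, 1 here)
event_id = (
    readings["timestamp"].diff().gt(pd.Timedelta(minutes=2)).cumsum().rename("event_id")
)

events = readings.groupby(event_id).agg(
    start=("timestamp", "first"),
    end_time=("timestamp", "last"),
    co2=("co2", "mean"),
)
events["extreme_minutes"] = (events["end_time"] - events["start"]).dt.total_seconds() / 60
print(events)
```

Because the duration is measured from the first to the last reading, an event made up of a single reading has `extreme_minutes` equal to 0.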

## Combining Extreme IEQ events with Surveys
Now that we have periods with the extreme IEQ measurements, we can merge them with the surveys to get a sense of how the two relate.

In [20]:
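def combine_ieq_with_ema(ieq_in, ema_in, day_threshold=0.25):
    """
    Combines ieq and ema data based on timestamps of extreme events and ema submissions
    
    Each extreme event is paired with the first listed EMA that the same participant
    submitted after the event ended and less than day_threshold days later.
    """
    ieq = ieq_in.reset_index()
    ema = ema_in.reset_index()
    matches = []
    for pt in ieq["beiwe"].unique():
        ieq_pt = ieq[ieq["beiwe"] == pt]
        ema_pt = ema[ema["beiwe"] == pt]
        for event in ieq_pt["end_time"]:
            for submission in ema_pt["timestamp"]:
                delay = (submission - event).total_seconds() / 60 / 60 / 24
                if submission > event and delay < day_threshold:
                    # copy the event row so the delay is written to this match only
                    event_row = ieq_pt[ieq_pt["end_time"] == event].copy()
                    event_row["delay_days"] = delay
                    matches.append(event_row.merge(ema_pt[ema_pt["timestamp"] == submission],on=["beiwe"]))
                    break
    
    if len(matches) == 0:
        return pd.DataFrame()
    df = pd.concat(matches)
    # a survey can follow several events; keep the last listed event for each survey
    df.drop_duplicates(subset=["timestamp"],keep="last",inplace=True)
    return df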

In [21]:
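# extreme CO2 events (the counts printed further down match the CO2 results in the Analysis section)
extreme = get_extreme(ieq,"co2")
combine_test = combine_ieq_with_ema(extreme,ema)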
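
The nested loops in `combine_ieq_with_ema` could likely be replaced by `pd.merge_asof`, which pairs each row with the nearest later row inside a tolerance. The sketch below is an alternative technique rather than the method used in this notebook, and the `events` and `surveys` frames are hypothetical stand-ins for the extreme events and EMAs. One difference to keep in mind: the tolerance here is inclusive, while the loop uses a strict `delay < day_threshold`.

```python
import pandas as pd

# Hypothetical extreme events (by end time) and survey submissions
events = pd.DataFrame({
    "beiwe": ["a", "a", "b"],
    "end_time": pd.to_datetime(["2020-06-01 08:00", "2020-06-02 08:00", "2020-06-01 08:00"]),
})
surveys = pd.DataFrame({
    "beiwe": ["a", "a", "b"],
    "timestamp": pd.to_datetime(["2020-06-01 09:00", "2020-06-02 20:00", "2020-06-01 10:00"]),
    "stress": [1, 2, 0],
})

# For each event, take the participant's first survey strictly after the event
# and no more than 0.25 days (6 hours) later; both frames must be sorted on their keys
matched = pd.merge_asof(
    events.sort_values("end_time"),
    surveys.sort_values("timestamp"),
    left_on="end_time",
    right_on="timestamp",
    by="beiwe",
    direction="forward",
    tolerance=pd.Timedelta(days=0.25),
    allow_exact_matches=False,
)

# Events without a survey inside the window come back with a missing timestamp
matched = matched.dropna(subset=["timestamp"])
matched["delay_days"] = (matched["timestamp"] - matched["end_time"]).dt.total_seconds() / 86400
print(matched)
```

In this toy data the second event for participant `a` is dropped because the next survey arrives 12 hours later, outside the 6-hour window.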

## Evaluating
Now that we have the link between the mood and the extreme IEQ events, we can loop through each of the IEQ parameters and see if the _extreme_ events lead to any differences in the reported mood scores.

Some tests to evaluate the differences in the responses:
* Mann-Whitney U-Test
* Ordinal Regression
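
As a small illustration of the first test in the list, the sketch below runs a Mann-Whitney U-test on two hypothetical sets of ordinal mood scores. The score values are invented for the example. The `alternative` argument is set explicitly because its default has changed across SciPy releases, so stating it avoids ambiguity about whether the p-value is one-sided or two-sided.

```python
import numpy as np
from scipy import stats

# Hypothetical 0-3 mood scores for surveys after normal and after extreme conditions
normal_scores = np.array([0, 0, 1, 1, 0, 1, 0, 1, 0, 0])
extreme_scores = np.array([2, 3, 2, 3, 3, 2, 3, 2])

# The test compares ranks, so it suits ordinal responses that are not normally distributed
u, p = stats.mannwhitneyu(normal_scores, extreme_scores, alternative="two-sided")
print(f"U = {u}, p = {round(p, 4)}")
```

A small p-value here only indicates that one group tends to rank higher than the other; it does not say by how much.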

In [23]:
def compare_mood_scores(extreme_in,ema_in,moods=["discontent","stress","sad","lonely","energy"],f=np.nanmean):
    """
    Compares the mood scores between the extreme and non-extreme cases
    
    Prints the number of surveys in each group and a LaTeX table with the
    average (standard deviation) of each mood score and the Mann-Whitney U
    p-value: * marks p < 0.05 and ** marks 0.05 <= p < 0.1. The average is
    computed with f.
    """
    res = {"mean_normal":[],"mean_ext":[],"p":[]}
    extreme = extreme_in.copy()
    ema = ema_in.copy()
    # "normal" surveys: not matched to an extreme event, from participants with at least one extreme event
    normal_ema = ema[~ema.index.isin(extreme["timestamp"])]
    normal_ema = normal_ema[normal_ema["beiwe"].isin(extreme["beiwe"].unique())].sort_values("beiwe")
    print(f"Normal: \t{len(normal_ema)}\nExtreme:\t{len(extreme)}")
    for mood in moods:
        ext_mean = round(f(extreme[mood]),2)
        norm_mean = round(f(normal_ema[mood]),2)
        ext_std = round(np.nanstd(extreme[mood]),2)
        norm_std = round(np.nanstd(normal_ema[mood]),2)
        # drop missing responses so they cannot distort the ranks
        u, p = stats.mannwhitneyu(normal_ema[mood].dropna().values,extreme[mood].dropna().values)
        if p < 0.05:
            p = f"{round(p,3)}*"
        elif p < 0.1:
            p = f"{round(p,3)}**"
        else:
            p = f"{round(p,3)}"
        res["mean_normal"].append(f"{norm_mean} ({norm_std})")
        res["mean_ext"].append(f"{ext_mean} ({ext_std})")
        res["p"].append(p)
    print(pd.DataFrame(data=res,index=moods).to_latex())

In [24]:
compare_mood_scores(combine_test,ema)

Normal: 	495
Extreme:	43
\begin{tabular}{llll}
\toprule
{} &  mean\_normal &     mean\_ext &        p \\
\midrule
discontent &  0.96 (0.89) &  1.16 (0.96) &  0.094** \\
stress     &  0.81 (0.85) &   1.0 (0.89) &  0.082** \\
sad        &  0.45 (0.77) &  0.47 (0.69) &    0.298 \\
lonely     &  0.44 (0.75) &   0.3 (0.59) &     0.16 \\
energy     &  2.19 (1.03) &  2.26 (0.92) &    0.361 \\
\bottomrule
\end{tabular}



[Back to Pre-Processing](#pre_processing)

---

<a id='inspection'></a>

[Back to ToC](#toc)
# Inspection
Various functions and initial looks at the raw and pre-processed data.

[Back to Inspection](#inspection)

---

<a id='analysis'></a>

[Back to ToC](#toc)
# Analysis


<a id='effecs'></a>

## Effect of Extreme IEQ on Mood

In [25]:
for ieq_param in ["co2","pm2p5_mass","tvoc","temperature_c"]:
    print(ieq_param.upper(),"\n")
    extreme = get_extreme(ieq,ieq_param,above=True)
    combined = combine_ieq_with_ema(extreme,ema)
    compare_mood_scores(combined,ema)

CO2 

Normal: 	495
Extreme:	43
\begin{tabular}{llll}
\toprule
{} &  mean\_normal &     mean\_ext &        p \\
\midrule
discontent &  0.96 (0.89) &  1.16 (0.96) &  0.094** \\
stress     &  0.81 (0.85) &   1.0 (0.89) &  0.082** \\
sad        &  0.45 (0.77) &  0.47 (0.69) &    0.298 \\
lonely     &  0.44 (0.75) &   0.3 (0.59) &     0.16 \\
energy     &  2.19 (1.03) &  2.26 (0.92) &    0.361 \\
\bottomrule
\end{tabular}

PM2P5_MASS 

Normal: 	541
Extreme:	52
\begin{tabular}{llll}
\toprule
{} &  mean\_normal &     mean\_ext &       p \\
\midrule
discontent &  0.93 (0.87) &  1.31 (0.89) &  0.002* \\
stress     &  0.76 (0.84) &  1.02 (0.89) &  0.016* \\
sad        &   0.39 (0.7) &  0.65 (0.92) &  0.011* \\
lonely     &  0.42 (0.72) &   0.6 (0.79) &  0.042* \\
energy     &  2.14 (1.05) &  1.92 (1.02) &     0.1 \\
\bottomrule
\end{tabular}

TVOC 

Normal: 	500
Extreme:	42
\begin{tabular}{llll}
\toprule
{} &  mean\_normal &     mean\_ext &       p \\
\midrule
discontent &  0.93 (0.89) &  0.79 (

[Back to Analysis](#analysis)

---

<a id='ieq_and_mood'></a>