<a id='top'></a>

In [2]:
import warnings
warnings.filterwarnings('ignore')

# Data Availability for Predicting Nightly Mood Scores
This notebook explores the available data that we can use to predict the mood scores as reported on the evening EMA. We focus on the overlap between the Fitbit and Beiwe data.

In [3]:
import sys
sys.path.append('../')
%load_ext autoreload
%autoreload 2

import pandas as pd
pd.set_option('display.max_columns', 200)
import numpy as np

from datetime import datetime, timedelta

import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.colors import ListedColormap, LinearSegmentedColormap
import seaborn as sns
import matplotlib.dates as mdates

# Table of Contents
1. [Data Import](#data_import)
    1. [Targets](#targets)
    2. [Features](#features)
2. [Pre-Processing](#preprocessing)
3. [Summary](#summary)

---

<a id='data_import'></a>

# Data Import
Data are imported from the Beiwe EMA datasets and from the Fitbit daily, intraday, and sleep summary datasets. The Fitbit daily and sleep summary datasets both include data summarized on a daily basis.

<a id='targets'></a>

## Target Data

### Evening EMAs

In [4]:
ema_e = pd.read_csv("../data/processed/beiwe-evening_ema-ux_s20.csv",parse_dates=["timestamp"])
ema_e["date"] = ema_e["timestamp"].dt.date
ema_e.head()

Unnamed: 0,timestamp,beiwe,content,stress,lonely,sad,energy,redcap,beacon,date
0,2020-05-13 19:00:23,awa8uces,1.0,1.0,1.0,3.0,2.0,28,26.0,2020-05-13
1,2020-05-13 19:21:32,lkkjddam,0.0,2.0,3.0,1.0,0.0,12,21.0,2020-05-13
2,2020-05-13 19:30:38,rvhdl2la,2.0,1.0,0.0,0.0,1.0,29,,2020-05-13
3,2020-05-13 20:07:04,xxvnhauv,1.0,3.0,1.0,2.0,0.0,21,22.0,2020-05-13
4,2020-05-13 20:25:07,xdbdrk6e,2.0,1.0,2.0,1.0,3.0,23,24.0,2020-05-13


In [5]:
print("Number of Datapoints:", len(ema_e))

Number of Datapoints: 2460


<a id='features'></a>

## Feature Data
The feature data are composed of:
* morning EMA mood (Beiwe)
* morning EMA sleep (Beiwe)
* current daily activity (Fitbit)
* previous night's sleep (Fitbit)

In [6]:
ema_m = pd.read_csv("../data/processed/beiwe-morning_ema-ux_s20.csv",parse_dates=["timestamp"])
ema_m["date"] = ema_m["timestamp"].dt.date
ema_m.head()

Unnamed: 0,timestamp,beiwe,content,stress,lonely,sad,energy,tst,sol,naw,restful,redcap,beacon,date
0,2020-05-13 09:10:27,qh34m4r9,3.0,0.0,0.0,0.0,1.0,8.0,20.0,2.0,3.0,68,19.0,2020-05-13
1,2020-05-13 09:15:49,awa8uces,0.0,2.0,1.0,1.0,1.0,2.0,10.0,3.0,1.0,28,26.0,2020-05-13
2,2020-05-13 09:42:19,xxvnhauv,1.0,1.0,1.0,3.0,0.0,6.0,30.0,3.0,1.0,21,22.0,2020-05-13
3,2020-05-13 09:43:27,rvhdl2la,1.0,1.0,2.0,3.0,0.0,5.3,5.0,2.0,2.0,29,,2020-05-13
4,2020-05-13 10:11:51,q5y11ytm,3.0,1.0,0.0,1.0,2.0,2.0,10.0,0.0,0.0,48,,2020-05-13


In [7]:
print("Number of Datapoints:", len(ema_m))

Number of Datapoints: 2470


### Fitbit Activity
We need two sets of activity data:
1. **Daily Activity**: so we can see if the participant was wearing their Fitbit the entire day
2. **Hourly Activity**: to total the activity logged on a given day before the evening EMA submission time

In [8]:
act_daily = pd.read_csv("../data/processed/fitbit-daily-ux_s20.csv",parse_dates=["timestamp"],infer_datetime_format=True)
act_daily["date"] = act_daily["timestamp"].dt.date
act_daily.head()

Unnamed: 0,timestamp,calories,bmr,steps,distance,sedentary_minutes,lightly_active_minutes,fairly_active_minutes,very_active_minutes,calories_from_activities,bmi,fat,weight,food_calories_logged,water_logged,beiwe,date
0,2020-05-13,2781.0,1876.0,9207,4.396294,1241,70,118,11,1097.0,23.754,0.0,180.0,0.0,0.0,hfttkth7,2020-05-13
1,2020-05-14,3727.0,1876.0,15207,7.261114,614,263,134,23,2234.0,23.754,0.0,180.0,0.0,0.0,hfttkth7,2020-05-14
2,2020-05-15,3909.0,1876.0,14556,8.028501,577,205,57,108,2381.0,23.754,0.0,180.0,0.0,0.0,hfttkth7,2020-05-15
3,2020-05-16,3927.0,1876.0,18453,8.74867,760,176,24,151,2364.0,23.754,0.0,180.0,0.0,0.0,hfttkth7,2020-05-16
4,2020-05-17,4180.0,1876.0,15425,7.973149,605,207,50,131,2652.0,23.754,0.0,180.0,0.0,0.0,hfttkth7,2020-05-17


In [9]:
act_hourly = pd.read_csv("../data/processed/fitbit-intraday-ux_s20.csv",parse_dates=["timestamp"],infer_datetime_format=True)
act_hourly["date"] = act_hourly["timestamp"].dt.date
act_hourly.head()

Unnamed: 0,timestamp,calories,steps,distance,heartrate,beiwe,date
0,2020-05-13 17:39:00,1.69377,,,77.333333,hfttkth7,2020-05-13
1,2020-05-13 17:40:00,1.69377,,,77.6,hfttkth7,2020-05-13
2,2020-05-13 17:41:00,4.42986,33.0,0.015721,79.0,hfttkth7,2020-05-13
3,2020-05-13 17:42:00,1.69377,,,83.416667,hfttkth7,2020-05-13
4,2020-05-13 17:43:00,1.43319,,,63.666667,hfttkth7,2020-05-13


### Fitbit Sleep

In [10]:
sleep_fb = pd.read_csv("../data/processed/fitbit-sleep_summary-ux_s20.csv",
                       parse_dates=["start_date","end_date","end_time","start_time"],infer_datetime_format=True)
sleep_fb["date"] = sleep_fb["end_date"].dt.date
sleep_fb.head()

Unnamed: 0,start_date,end_date,deep_count,deep_minutes,light_count,light_minutes,rem_count,rem_minutes,wake_count,wake_minutes,beiwe,efficiency,end_time,start_time,redcap,beacon,nrem_count,nrem_minutes,rem2nrem,tst_fb,rem_percent,nrem_percent,light_percent,deep_percent,waso,sol_fb,wol_fb,date
0,2020-05-14,2020-05-14,5.0,84.0,20.0,213.0,10.0,82.0,21.0,27.0,hfttkth7,93.349754,2020-05-14 07:13:00,2020-05-14 00:27:00,,,25.0,297.0,0.276094,6.316667,0.216359,0.783641,0.562005,0.221636,18.5,8.5,0.0,2020-05-14
1,2020-05-14,2020-05-15,4.0,95.0,31.0,250.0,6.0,47.0,33.0,101.0,hfttkth7,79.513185,2020-05-15 08:06:30,2020-05-14 23:53:30,,,35.0,345.0,0.136232,6.533333,0.119898,0.880102,0.637755,0.242347,68.5,0.0,32.5,2020-05-15
2,2020-05-15,2020-05-16,2.0,47.0,17.0,190.0,8.0,50.0,20.0,42.0,hfttkth7,87.234043,2020-05-16 04:57:00,2020-05-15 23:28:00,,,19.0,237.0,0.21097,4.783333,0.174216,0.825784,0.662021,0.163763,19.5,11.5,11.0,2020-05-16
3,2020-05-17,2020-05-17,5.0,78.0,21.0,242.0,11.0,83.0,25.0,44.0,hfttkth7,90.1566,2020-05-17 09:28:30,2020-05-17 02:01:30,,,26.0,320.0,0.259375,6.716667,0.205955,0.794045,0.600496,0.193548,23.0,12.5,8.5,2020-05-17
4,2020-05-18,2020-05-18,5.0,96.0,20.0,167.0,14.0,88.0,28.0,65.0,hfttkth7,84.375,2020-05-18 07:20:00,2020-05-18 00:24:00,,,25.0,263.0,0.334601,5.85,0.250712,0.749288,0.475783,0.273504,53.5,11.5,0.0,2020-05-18
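Keying each sleep episode to the date it ends on is what lines it up with that day's EMAs as the "previous night's sleep". A minimal sketch of the idea, using one episode with the same start and end times as a row in the table above; the participant ID `aaa` and the two evening timestamps are made up for illustration, and `end_time` stands in for `end_date`:

```python
import pandas as pd

# One sleep episode that starts late on May 14 and ends on the morning of May 15
toy_sleep = pd.DataFrame({
    "beiwe": ["aaa"],  # hypothetical participant ID
    "start_time": pd.to_datetime(["2020-05-14 23:53:30"]),
    "end_time": pd.to_datetime(["2020-05-15 08:06:30"]),
})
# Key the episode to the calendar date on which it ends
toy_sleep["date"] = toy_sleep["end_time"].dt.date

# Two hypothetical evening EMAs, one on each calendar day
toy_evening = pd.DataFrame({
    "beiwe": ["aaa", "aaa"],
    "timestamp": pd.to_datetime(["2020-05-14 19:00:00", "2020-05-15 19:00:00"]),
})
toy_evening["date"] = toy_evening["timestamp"].dt.date

# The inner merge pairs the episode with the May 15 EMA only
matched = toy_evening.merge(toy_sleep, on=["beiwe", "date"])
print(matched[["beiwe", "timestamp", "start_time", "end_time"]])
```

The May 14 evening EMA finds no match here because the episode that began that night is keyed to May 15.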


---

<a id='preprocessing'></a>

# Pre-Processing

## Combining EMAs
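We pair each evening EMA with the morning EMA that the same participant completed on the same date. The merge is an inner join, so days with only one of the two surveys are dropped.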

In [11]:
ema = ema_e.merge(ema_m,on=["beiwe","redcap","beacon","date"],suffixes=["_e","_m"])
ema.head()

Unnamed: 0,timestamp_e,beiwe,content_e,stress_e,lonely_e,sad_e,energy_e,redcap,beacon,date,timestamp_m,content_m,stress_m,lonely_m,sad_m,energy_m,tst,sol,naw,restful
0,2020-05-13 19:00:23,awa8uces,1.0,1.0,1.0,3.0,2.0,28,26.0,2020-05-13,2020-05-13 09:15:49,0.0,2.0,1.0,1.0,1.0,2.0,10.0,3.0,1.0
1,2020-05-13 19:21:32,lkkjddam,0.0,2.0,3.0,1.0,0.0,12,21.0,2020-05-13,2020-05-13 12:30:38,1.0,1.0,3.0,3.0,2.0,7.0,45.0,2.0,1.0
2,2020-05-13 19:30:38,rvhdl2la,2.0,1.0,0.0,0.0,1.0,29,,2020-05-13,2020-05-13 09:43:27,1.0,1.0,2.0,3.0,0.0,5.3,5.0,2.0,2.0
3,2020-05-13 20:07:04,xxvnhauv,1.0,3.0,1.0,2.0,0.0,21,22.0,2020-05-13,2020-05-13 09:42:19,1.0,1.0,1.0,3.0,0.0,6.0,30.0,3.0,1.0
4,2020-05-13 20:25:07,xdbdrk6e,2.0,1.0,2.0,1.0,3.0,23,24.0,2020-05-13,2020-05-13 18:16:29,2.0,1.0,2.0,1.0,2.0,8.0,20.0,3.0,2.0
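Because the merge is keyed on participant and date, a participant who submits more than one EMA on the same day contributes every evening/morning pairing to the result. The sketch below uses made-up rows to show the effect; the IDs, the scores, and the rule of keeping the latest evening submission are assumptions for illustration, not necessarily how the study handles repeats.

```python
import pandas as pd

# Toy data: participant "aaa" submitted two evening EMAs on the same day
toy_e = pd.DataFrame({
    "beiwe": ["aaa", "aaa", "bbb"],
    "date": ["2020-05-14", "2020-05-14", "2020-05-14"],
    "timestamp": pd.to_datetime(
        ["2020-05-14 11:56", "2020-05-14 18:04", "2020-05-14 19:00"]),
    "content": [1.0, 2.0, 3.0],
})
toy_m = pd.DataFrame({
    "beiwe": ["aaa", "bbb"],
    "date": ["2020-05-14", "2020-05-14"],
    "timestamp": pd.to_datetime(["2020-05-14 09:00", "2020-05-14 09:30"]),
    "content": [0.0, 2.0],
})

# Same style of merge as above: one output row per evening/morning pairing
toy = toy_e.merge(toy_m, on=["beiwe", "date"], suffixes=["_e", "_m"])
print("Rows after merge:", len(toy))

# Assumed rule: keep only the latest evening submission per participant-day
toy_dedup = (
    toy.sort_values("timestamp_e")
       .drop_duplicates(subset=["beiwe", "date"], keep="last")
       .reset_index(drop=True)
)
print("Rows after collapsing to one per participant-day:", len(toy_dedup))
```

Whether repeats should be kept, averaged, or reduced to a single submission is a modeling decision; the point is only that the raw row count can exceed the number of participant-days.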


## Integrating Activity Data

In [12]:
def add_fb_activity(df,fb_df,fb_label="steps"):
    """
    Adds the given Fitbit metric, summed over each day up to the EMA submission time, to the provided df
    
    """
    fb_df["date"] = fb_df["timestamp"].dt.date
    merged = fb_df.merge(df,on=["beiwe","date"],how="inner",suffixes=["_fb","_ema"])
    data = {"beiwe":[],"date":[],fb_label:[]}
    for pt in merged["beiwe"].unique():
        merged_pt = merged[merged["beiwe"] == pt]
        for d in merged_pt["date"].unique():
            merged_pt_d = merged_pt[merged_pt["date"] == d]
            try:
                # df holds the combined EMAs, so only the Fitbit data has a "timestamp" column
                merged_pt_d = merged_pt_d.set_index("timestamp")
                merged_pt_d = merged_pt_d[:merged_pt_d["timestamp_e"].iloc[0]]
            except KeyError:
                # both dataframes had a "timestamp" column, so the merge added suffixes
                merged_pt_d = merged_pt_d.set_index("timestamp_fb")
                merged_pt_d = merged_pt_d[:merged_pt_d["timestamp_ema"].iloc[0]]
            for key, val in zip(data.keys(),[pt,d,merged_pt_d[fb_label].sum()]):
                data[key].append(val)
                
    return df.merge(right=pd.DataFrame(data=data),on=["beiwe","date"])

In [13]:
def add_fb_percentage(df,df_fb):
    """
    Adds the fraction of the day (out of 1440 minutes) that Fitbit assigned to an activity level, used as a proxy for the time the participant was wearing their Fitbit
    
    """
    # exclude the derived column so that re-running the cell does not double count it
    minute_cols = [col for col in df_fb.columns if col.endswith("minutes") and col != "active_minutes"]
    df_fb["active_minutes"] = df_fb[minute_cols].sum(axis=1)
    df_fb["active_percent"] = df_fb["active_minutes"] / 1440
    
    return df.merge(right=df_fb[["beiwe","date","active_percent"]],on=["beiwe","date"])
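As a quick check of the calculation, the sketch below applies the same sum-and-divide step to the minute columns of the first two rows of the daily activity table shown earlier. The participant ID `aaa` and the column name `logged_fraction` are placeholders chosen for this example.

```python
import pandas as pd

# Minute columns copied from the first two rows of the daily activity table
toy_daily = pd.DataFrame({
    "beiwe": ["aaa", "aaa"],  # hypothetical participant ID
    "date": ["2020-05-13", "2020-05-14"],
    "sedentary_minutes": [1241, 614],
    "lightly_active_minutes": [70, 263],
    "fairly_active_minutes": [118, 134],
    "very_active_minutes": [11, 23],
})

# Sum every activity-level column, then divide by the minutes in a day
minute_cols = [c for c in toy_daily.columns if c.endswith("_minutes")]
toy_daily["logged_fraction"] = toy_daily[minute_cols].sum(axis=1) / 1440
print(toy_daily[["date", "logged_fraction"]])
```

On the first day the four columns account for all 1440 minutes, while on the second they account for 1034 minutes, or roughly 72% of the day. The remainder is time that was not assigned to any activity level, which could reflect logged sleep as well as time off the wrist, so the value is best read as an approximate indicator.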

In [14]:
ema_activity = ema.copy()
steps = []
dists = []
for d, t, pt in zip(ema["date"],ema["timestamp_e"],ema["beiwe"]):
    print(pt, d, t)
    act_hourly_by_day = act_hourly[act_hourly["date"] == d]
    act_hourly_by_day_by_pt = act_hourly_by_day[act_hourly_by_day["beiwe"] == pt]
    act_hourly_by_day_by_pt = act_hourly_by_day_by_pt[act_hourly_by_day_by_pt["timestamp"] < t]
    if len(act_hourly_by_day_by_pt) > 0:
        steps.append(act_hourly_by_day_by_pt["steps"].sum())
        dists.append(act_hourly_by_day_by_pt["distance"].sum())
    else:
        steps.append(0)
        dists.append(0)
        
ema_activity["steps"] = steps
ema_activity["distance"] = dists
ema_activity = ema_activity[ema_activity["steps"] > 0]

awa8uces 2020-05-13 2020-05-13 19:00:23
lkkjddam 2020-05-13 2020-05-13 19:21:32
rvhdl2la 2020-05-13 2020-05-13 19:30:38
xxvnhauv 2020-05-13 2020-05-13 20:07:04
xdbdrk6e 2020-05-13 2020-05-13 20:25:07
o6xwrota 2020-05-13 2020-05-13 20:37:09
qh34m4r9 2020-05-13 2020-05-13 21:00:18
hxj6brwj 2020-05-13 2020-05-13 21:48:32
tmexej5v 2020-05-13 2020-05-13 22:03:30
vpy1a985 2020-05-13 2020-05-13 23:05:02
i31pt4b4 2020-05-14 2020-05-14 00:23:01
9jtzsuu8 2020-05-14 2020-05-14 00:34:45
745vq78e 2020-05-14 2020-05-14 00:49:41
idbkjh8u 2020-05-14 2020-05-14 01:19:58
9xmhtq74 2020-05-14 2020-05-14 08:01:38
5fvmg226 2020-05-14 2020-05-14 11:56:25
5fvmg226 2020-05-14 2020-05-14 11:56:25
5fvmg226 2020-05-14 2020-05-14 18:04:20
5fvmg226 2020-05-14 2020-05-14 18:04:20
itmylz3g 2020-05-14 2020-05-14 16:42:35
15tejjtw 2020-05-14 2020-05-14 20:37:44
2xtqkfz1 2020-05-15 2020-05-15 00:43:13
2xtqkfz1 2020-05-15 2020-05-15 23:30:31
qh34m4r9 2020-05-15 2020-05-15 19:00:32
idbkjh8u 2020-05-15 2020-05-15 19:00:55


qh34m4r9 2020-05-27 2020-05-27 19:25:06
tlmlq19s 2020-05-27 2020-05-27 19:50:44
ewvz3zm1 2020-05-27 2020-05-27 20:10:37
lkkjddam 2020-05-27 2020-05-27 20:19:58
hxj6brwj 2020-05-27 2020-05-27 21:18:28
vpy1a985 2020-05-27 2020-05-27 22:55:04
kyj367pi 2020-05-27 2020-05-27 23:27:50
idbkjh8u 2020-05-28 2020-05-28 06:02:30
rvhdl2la 2020-05-28 2020-05-28 07:26:59
745vq78e 2020-05-28 2020-05-28 07:37:39
itmylz3g 2020-05-28 2020-05-28 11:26:30
xxvnhauv 2020-05-28 2020-05-28 12:05:58
15tejjtw 2020-05-28 2020-05-28 16:03:02
5fvmg226 2020-05-28 2020-05-28 16:17:13
olaxadz5 2020-05-28 2020-05-28 16:20:26
vpy1a985 2020-05-29 2020-05-29 02:10:01
4i7679py 2020-05-29 2020-05-29 09:50:51
4i7679py 2020-05-29 2020-05-29 09:50:51
lkkjddam 2020-05-29 2020-05-29 12:11:24
lkkjddam 2020-05-29 2020-05-29 23:32:34
y1tvkx14 2020-05-29 2020-05-29 13:26:29
y1tvkx14 2020-05-29 2020-05-29 23:31:58
51opds1x 2020-05-29 2020-05-29 14:53:38
o6xwrota 2020-05-29 2020-05-29 17:44:50
o6xwrota 2020-05-29 2020-05-29 21:15:10


37sb8wql 2020-06-08 2020-06-08 00:58:23
itmylz3g 2020-06-08 2020-06-08 02:44:05
itmylz3g 2020-06-08 2020-06-08 19:00:42
zdpffrox 2020-06-08 2020-06-08 07:23:51
zdpffrox 2020-06-08 2020-06-08 20:40:50
4i7679py 2020-06-08 2020-06-08 10:07:16
tmexej5v 2020-06-08 2020-06-08 10:47:50
tlmlq19s 2020-06-08 2020-06-08 11:33:10
tlmlq19s 2020-06-08 2020-06-08 20:25:51
o6xwrota 2020-06-08 2020-06-08 12:16:05
axk49ssu 2020-06-08 2020-06-08 13:12:46
7dhu3pn7 2020-06-08 2020-06-08 13:20:01
7dhu3pn7 2020-06-08 2020-06-08 19:00:31
i4w8dx6l 2020-06-08 2020-06-08 13:28:59
i4w8dx6l 2020-06-08 2020-06-08 13:28:59
i4w8dx6l 2020-06-08 2020-06-08 21:56:09
i4w8dx6l 2020-06-08 2020-06-08 21:56:09
pgvvwyvh 2020-06-08 2020-06-08 14:01:53
pgvvwyvh 2020-06-08 2020-06-08 19:02:09
y4m7yv2u 2020-06-08 2020-06-08 15:05:03
idbkjh8u 2020-06-08 2020-06-08 17:07:22
olaxadz5 2020-06-08 2020-06-08 19:00:34
hrqrneay 2020-06-08 2020-06-08 19:00:58
1a9udoc5 2020-06-08 2020-06-08 19:02:45
qh34m4r9 2020-06-08 2020-06-08 19:06:19


zdpffrox 2020-06-17 2020-06-17 09:01:14
bayw6h9b 2020-06-17 2020-06-17 09:53:06
rnse61g4 2020-06-17 2020-06-17 14:31:45
rkem5aou 2020-06-17 2020-06-17 16:31:00
wi5p38l6 2020-06-17 2020-06-17 17:17:05
wi5p38l6 2020-06-17 2020-06-17 21:11:36
axk49ssu 2020-06-17 2020-06-17 18:24:48
axk49ssu 2020-06-17 2020-06-17 20:44:31
51opds1x 2020-06-17 2020-06-17 19:00:18
6rxyg4rp 2020-06-17 2020-06-17 19:00:27
5fvmg226 2020-06-17 2020-06-17 19:02:46
1a9udoc5 2020-06-17 2020-06-17 19:05:54
awa8uces 2020-06-17 2020-06-17 19:15:34
qh34m4r9 2020-06-17 2020-06-17 19:16:37
2xtqkfz1 2020-06-17 2020-06-17 19:24:23
rjcs3hyw 2020-06-17 2020-06-17 19:25:55
hxj6brwj 2020-06-17 2020-06-17 19:44:29
mm69prai 2020-06-17 2020-06-17 19:46:49
tmexej5v 2020-06-17 2020-06-17 19:59:42
eyf8oqwl 2020-06-17 2020-06-17 20:16:54


KeyboardInterrupt: 
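The row-by-row loop above filters the full intraday table once per EMA, which becomes slow as the tables grow. A possible alternative, sketched here on toy data under the assumption that the frames have the same column names as `ema` and `act_hourly`, does the same "sum everything logged before the evening timestamp" step with a merge and a groupby. The IDs and values are made up, and this is one way to restructure the step rather than the notebook's method.

```python
import pandas as pd

toy_ema = pd.DataFrame({
    "beiwe": ["aaa", "bbb"],
    "date": ["2020-05-13", "2020-05-13"],
    "timestamp_e": pd.to_datetime(["2020-05-13 19:00", "2020-05-13 20:00"]),
})
toy_fb = pd.DataFrame({
    "beiwe": ["aaa", "aaa", "aaa", "bbb"],
    "date": ["2020-05-13"] * 4,
    "timestamp": pd.to_datetime(["2020-05-13 17:41", "2020-05-13 18:30",
                                 "2020-05-13 19:30", "2020-05-13 21:00"]),
    "steps": [33.0, 100.0, 50.0, 10.0],
    "distance": [0.016, 0.05, 0.02, 0.005],
})

keys = ["beiwe", "date", "timestamp_e"]
# Use unique keys so repeated EMA rows do not inflate the sums
paired = toy_ema[keys].drop_duplicates().merge(toy_fb, on=["beiwe", "date"])
# Keep only the readings logged before the evening EMA was submitted
before = paired[paired["timestamp"] < paired["timestamp_e"]]
totals = before.groupby(keys, as_index=False)[["steps", "distance"]].sum()

# Attach the totals; EMAs with no earlier readings get 0 and are then dropped
toy_activity = (
    toy_ema.merge(totals, on=keys, how="left")
           .fillna({"steps": 0, "distance": 0})
)
toy_activity = toy_activity[toy_activity["steps"] > 0]
print(toy_activity)
```

Here participant `aaa` keeps the two readings logged before 19:00, while `bbb` is dropped because the only reading comes after the EMA.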

## Integrating Objective Sleep Data

In [None]:
def add_fb_sleep(df,df_fb):
    """
    Adds the objective sleep data from Fitbit to the given dataframe
    
    """
    return df.merge(right=df_fb,on=["beiwe","redcap","beacon","date"])

In [None]:
data = add_fb_sleep(ema_activity,sleep_fb)

<div class="alert alert-block alert-success">
    
We have 974 observations when combining all datasets together: previous mood, activity, and sleep before filtering of any kind.  
    
</div>

---

<a id='summary'></a>

# Summary
The following cells summarize the data available for each combination of datasets and save each combination to the `interim/mood_prediction` directory for future analysis.

## Daily EMAs
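These data have already been saved to a file in the `processed` directory. As a sanity check, we compare the number of rows in that file with the number of rows in the combined EMA dataframe.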

In [None]:
emas = pd.read_csv("../data/processed/beiwe-daily_ema-ux_s20.csv")
print(len(ema) == len(emas))
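Matching lengths do not guarantee that the two tables hold the same participant-days. A stricter check, sketched below on toy frames with made-up IDs, is an outer merge on the key columns with `indicator=True`, which labels each key as present in the left frame, the right frame, or both.

```python
import pandas as pd

# Two toy tables with the same number of rows but different participants
left = pd.DataFrame({"beiwe": ["aaa", "bbb", "ccc"], "date": ["2020-05-13"] * 3})
right = pd.DataFrame({"beiwe": ["aaa", "bbb", "ddd"], "date": ["2020-05-13"] * 3})
print("Same length:", len(left) == len(right))

# The "_merge" column records where each key was found
check = left.merge(right, on=["beiwe", "date"], how="outer", indicator=True)
mismatched = check[check["_merge"] != "both"]
print("Same keys:", mismatched.empty)
print(mismatched)
```

When comparing against a table read back from CSV, the key columns may need to be converted to a common dtype first, for example by casting `date` to strings on both sides.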

<div class="alert alert-block alert-success">
    
Just considering the EMAs, we have a total of **2149** observations constituting both mood and morning sleep reports.
    
</div>

In [None]:
ema.to_csv("../data/interim/mood_prediction/beiwe-beiwe-ema_morning-ema_evening.csv")

## Activity and Evening EMA

In [None]:
act_ema_e = add_fb_activity(ema_e,act_hourly)
act_ema_e = add_fb_activity(act_ema_e,act_hourly,"distance")

<div class="alert alert-block alert-success">
    
Now looking at the activity data and evening EMAs, we have **1999** observations.
    
</div>

In [None]:
act_ema_e.to_csv("../data/interim/mood_prediction/fitbit-beiwe-activity-ema_evening.csv")

## Sleep and Evening EMA

In [21]:
sleep_ema_e = add_fb_sleep(ema_e,sleep_fb)

<div class="alert alert-block alert-success">
    
Combining the Fitbit sleep data with the evening EMAs gives **1108** observations.
    
</div>

In [22]:
sleep_ema_e.to_csv("../data/interim/mood_prediction/fitbit-beiwe-sleep-ema_evening.csv")

## Activity and EMAs

In [23]:
print("Number of Observations:",len(ema_activity))
print("Number of Participants:",len(ema_activity["beiwe"].unique()))

Number of Observations: 1201
Number of Participants: 51


<div class="alert alert-block alert-success">
    
We have **1402** observations.
    
</div>

In [24]:
ema_activity.to_csv("../data/interim/mood_prediction/fitbit-beiwe-beiwe-activity-ema_morning-ema_evening.csv")

## Sleep and EMAs

In [25]:
ema_sleep = add_fb_sleep(ema,sleep_fb)

<div class="alert alert-block alert-success">
    
Combining the Fitbit sleep data with both the morning and evening EMAs gives **976** observations.
    
</div>

In [26]:
ema_sleep.to_csv("../data/interim/mood_prediction/fitbit-beiwe-beiwe-sleep-ema_morning-ema_evening.csv")

## Activity, Sleep, and Evening EMA

In [27]:
act_sleep_ema_e = add_fb_sleep(act_ema_e,sleep_fb)

<div class="alert alert-block alert-success">
    
We have **1106** observations - only two fewer than when we consider just sleep and the evening EMA.
    
</div>

In [28]:
act_sleep_ema_e.to_csv("../data/interim/mood_prediction/fitbit-fitbit-beiwe-activity-sleep-ema_evening.csv")

## Combined
All of the datasets combined: morning and evening EMAs, Fitbit activity, and Fitbit sleep.

In [29]:
print("Number of Observations:",len(data))

Number of Observations: 876


<div class="alert alert-block alert-success">
    
We have **974** observations - again just two fewer than when we include activity with the EMAs and sleep.
    
</div>

In [30]:
data.to_csv("../data/interim/mood_prediction/fitbit-fitbit-beiwe-beiwe-activity-sleep-ema_morning-ema_evening.csv")

---

[Back to Top](#top)