# Fitbit Exploration
For an explanation of the variables, see the [data dictionary created by Fitabase](https://www.fitabase.com/media/1546/fitabasedatadictionary.pdf).

## Sleep Sensitivity - 1 Variable
In this notebook we look at the individual variables that might affect sleep.

In [1]:
import os
import sys
sys.path.append('../')

import pandas as pd
import numpy as np

from datetime import datetime, timedelta

import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.dates as mdates
from matplotlib import cm
from matplotlib.colors import ListedColormap, LinearSegmentedColormap

from joypy import joyplot

# Data Import
Sleep data are divided into two primary datasets:

1. Sleep Summaries by Day (daily sleep)
2. Sleep Data by Minute (sleep stages)

In [2]:
daily_sleep = pd.read_csv("../data/processed/fitbit-sleep_daily-ux_s20.csv",parse_dates=["date","start_time","end_time"],infer_datetime_format=True)
# convert duration from milliseconds to hours (3,600,000 ms per hour)
daily_sleep['tst'] = daily_sleep['duration_ms'] / 3600000
daily_sleep = daily_sleep[daily_sleep["main_sleep"] == True]
daily_sleep.drop(["minutes_to_sleep","main_sleep"],axis=1,inplace=True)
daily_sleep = daily_sleep[['beiwe', 'start_time', 'end_time', 'date','tst','duration_ms','minutes_after_wakeup', 'minutes_asleep', 'minutes_awake', 'time_in_bed', 'efficiency']]
daily_sleep.head()

Unnamed: 0,beiwe,start_time,end_time,date,tst,duration_ms,minutes_after_wakeup,minutes_asleep,minutes_awake,time_in_bed,efficiency
0,hfttkth7,2020-05-14 00:27:00,2020-05-14 07:13:00,2020-05-14,6.766667,24360000,0,379,27,406,97
1,hfttkth7,2020-05-14 23:53:30,2020-05-15 08:06:30,2020-05-15,8.216667,29580000,8,392,101,493,87
2,hfttkth7,2020-05-15 23:28:00,2020-05-16 04:57:00,2020-05-16,5.483333,19740000,7,287,42,329,95
3,hfttkth7,2020-05-17 02:01:30,2020-05-17 09:28:30,2020-05-17,7.45,26820000,8,403,44,447,96
4,hfttkth7,2020-05-18 00:24:00,2020-05-18 07:20:00,2020-05-18,6.933333,24960000,0,351,65,416,92
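
The conversion above can be sanity-checked against the other duration columns. The sketch below retypes the first three rows of the output by hand (an illustrative frame, not the real file) and checks that `duration_ms` expressed in minutes matches `time_in_bed`, and that `minutes_asleep + minutes_awake` adds up to the same figure. Whether this holds for every row of the dataset is an assumption that would need checking.

```python
import pandas as pd

# First three rows of the daily_sleep output shown above, retyped by hand
sample = pd.DataFrame({
    "duration_ms": [24360000, 29580000, 19740000],
    "minutes_asleep": [379, 392, 287],
    "minutes_awake": [27, 101, 42],
    "time_in_bed": [406, 493, 329],
})

# milliseconds -> hours, the same arithmetic as the 'tst' column
sample["tst"] = sample["duration_ms"] / 3_600_000

# milliseconds -> minutes should reproduce time_in_bed
duration_matches = (sample["duration_ms"] / 60_000 == sample["time_in_bed"]).all()
# asleep + awake should also add up to time_in_bed
parts_match = (sample["minutes_asleep"] + sample["minutes_awake"] == sample["time_in_bed"]).all()
print(sample["tst"].round(6).tolist(), duration_matches, parts_match)
```

In these three rows, `tst` therefore corresponds to time in bed expressed in hours rather than to `minutes_asleep`.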


In [3]:
sleep_stages = pd.read_csv("../data/processed/fitbit-sleep_stages-ux_s20.csv",parse_dates=["start_date","end_date","time"],infer_datetime_format=True)
sleep_stages.head()

Unnamed: 0,start_date,end_date,time,stage,time_at_stage,beiwe,value
0,2020-05-14,2020-05-14,2020-05-14 00:27:00,wake,510,hfttkth7,0
1,2020-05-14,2020-05-14,2020-05-14 00:35:30,light,420,hfttkth7,1
2,2020-05-14,2020-05-14,2020-05-14 00:42:30,deep,1590,hfttkth7,2
3,2020-05-14,2020-05-14,2020-05-14 01:09:00,light,1290,hfttkth7,1
4,2020-05-14,2020-05-14,2020-05-14 01:30:30,rem,840,hfttkth7,3
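
The stage-level rows can be rolled up into per-night totals, similar in spirit to the `deep_minutes`, `light_minutes`, `rem_minutes`, and `wake_minutes` columns in the summary files loaded later. The sketch below assumes `time_at_stage` is in seconds, which is consistent with the gaps between consecutive `time` values in the rows shown (510 s between 00:27:00 and 00:35:30). It runs on hand-typed rows and is not necessarily how the summary file was produced.

```python
import pandas as pd

# The five sleep_stages rows shown above, retyped by hand
stages = pd.DataFrame({
    "beiwe": ["hfttkth7"] * 5,
    "end_date": ["2020-05-14"] * 5,
    "stage": ["wake", "light", "deep", "light", "rem"],
    "time_at_stage": [510, 420, 1590, 1290, 840],  # assumed to be seconds
})

grouped = stages.groupby(["beiwe", "end_date", "stage"])["time_at_stage"]
# total minutes spent in each stage, one column per stage
stage_minutes = (grouped.sum() / 60).unstack("stage")
# number of separate episodes of each stage
stage_counts = grouped.count().unstack("stage")
print(stage_minutes)
print(stage_counts)
```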


The [data dictionary](https://www.fitabase.com/media/1546/fitabasedatadictionary.pdf) for these variables is worth reading, since many of the variables are not useful for this analysis.

# Getting Features
Here we combine datasets from the Fitbit, the EMAs, and the beacon.

In [4]:
beacon = pd.read_csv("../data/processed/beacon-fb_ema_and_gps_filtered-ux_s20.csv",index_col=0,parse_dates=True,infer_datetime_format=True)
beacon.columns

Index(['lat', 'long', 'altitude', 'accuracy', 'tvoc', 'lux', 'no2', 'co',
       'co2', 'pm1_number', 'pm2p5_number', 'pm10_number', 'pm1_mass',
       'pm2p5_mass', 'pm10_mass', 'temperature_c', 'rh', 'beacon', 'beiwe',
       'fitbit', 'redcap', 'start_time', 'end_time'],
      dtype='object')

In [5]:
# For each participant and sleep period, compute the range (max - min) of every beacon variable
frames = []
for pt in beacon["beiwe"].unique():
    beacon_by_pt = beacon[beacon["beiwe"] == pt]
    ids = beacon_by_pt[["end_time","beacon","beiwe","fitbit","redcap"]]
    # drop without inplace returns a new frame, which avoids the SettingWithCopyWarning
    measurements = beacon_by_pt.drop(["beiwe","fitbit","redcap","end_time"],axis=1)
    little = measurements.groupby("start_time").min()
    big = measurements.groupby("start_time").max()
    beacon_mean_by_pt = big - little
    beacon_mean_by_pt["end_time"] = ids["end_time"].unique()
    for col in ids.columns[1:]:
        # the index holds timestamps, so take the first value by position
        beacon_mean_by_pt[col] = ids[col].iloc[0]
    frames.append(beacon_mean_by_pt)
# DataFrame.append is deprecated; concatenate once at the end instead
beacon_mean = pd.concat(frames)
    
beacon_mean


start_time,lat,long,altitude,accuracy,tvoc,lux,no2,co,co2,pm1_number,...,pm1_mass,pm2p5_mass,pm10_mass,temperature_c,rh,beacon,end_time,beiwe,fitbit,redcap
2020-08-10 04:42:30,0.00018,0.00011,37.31855,40.48316,91.900000,0.1360,,0.731650,304.035761,2.009295,...,0.202818,0.620298,0.627820,1.000,0.950000,21,2020-08-10 12:35:30,lkkjddam,25,12
2020-08-12 02:59:30,0.00010,0.00005,12.98383,0.53336,108.600000,0.2040,,0.512533,146.009840,5.033866,...,0.429567,0.702017,0.657346,0.000,1.500000,21,2020-08-12 10:52:30,lkkjddam,25,12
2020-08-14 03:05:00,0.00034,0.00014,2.83618,18.97467,81.800000,2.0400,,5.108383,233.372161,3.178799,...,0.254569,0.736714,0.783074,1.175,1.250000,21,2020-08-14 11:23:30,lkkjddam,25,12
2020-08-16 04:21:30,0.00007,0.00008,32.37683,11.82401,88.216667,2.0400,,0.312050,96.153213,3.125266,...,0.216821,0.471061,0.445183,0.500,0.583333,21,2020-08-16 11:53:00,lkkjddam,25,12
2020-08-17 03:00:00,0.00027,0.00032,41.06684,47.69561,249.066667,2.0400,,0.819100,331.097623,6.091469,...,0.488183,1.358703,1.371396,1.750,1.500000,21,2020-08-17 11:30:30,lkkjddam,25,12
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-08-24 00:25:00,0.00019,0.00027,4.35970,1402.95731,95.916667,4.5424,,0.805100,58.582825,14.461965,...,0.956106,1.626959,1.629446,1.000,5.500000,36,2020-08-24 07:15:30,tlmlq19s,9,47
2020-08-25 23:46:30,0.00897,0.01069,7.93757,9551.28418,140.400000,8.0784,,2.340050,115.775437,8.941261,...,0.702543,1.098767,1.033726,2.000,6.000000,36,2020-08-26 08:03:00,tlmlq19s,9,47
2020-08-30 01:30:00,0.00131,0.00253,11.62827,1409.35997,98.950000,11.5600,,1.313850,166.573677,5.201494,...,0.370893,0.974239,1.013575,1.000,11.000000,36,2020-08-30 08:36:30,tlmlq19s,9,47
2020-08-30 23:42:30,0.00023,0.00028,11.86171,1409.31074,88.700000,4.0800,,2.661700,80.088870,56.705639,...,3.394869,3.543439,2.882672,1.250,7.000000,36,2020-08-31 07:26:00,tlmlq19s,9,47
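
The per-participant loop above computes, for each sleep period, the range (maximum minus minimum) of every beacon variable; despite the name `beacon_mean`, no mean is taken. The same quantity can be sketched without a loop by grouping on participant and `start_time` together. The frame below is made-up toy data that borrows two column names from the beacon file.

```python
import pandas as pd

# Toy readings: three for participant "a" in one sleep period, two for "b"
toy = pd.DataFrame({
    "beiwe": ["a", "a", "a", "b", "b"],
    "start_time": pd.to_datetime(["2020-08-10 04:42:30"] * 3 + ["2020-08-24 00:25:00"] * 2),
    "co2": [900.0, 1100.0, 1200.0, 600.0, 650.0],
    "temperature_c": [24.0, 24.5, 25.0, 22.0, 23.0],
})

grouped = toy.groupby(["beiwe", "start_time"])[["co2", "temperature_c"]]
# range of each variable within each (participant, sleep period) group
beacon_range = grouped.max() - grouped.min()
print(beacon_range)
```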


In [6]:
beacon = pd.read_csv("../data/processed/fitbit_beiwe_beacon-sleep_summary-ux_s20.csv")
beacon.columns

Index(['date', 'start_date', 'end_date', 'deep_count', 'deep_minutes',
       'light_count', 'light_minutes', 'rem_count', 'rem_minutes',
       'wake_count', 'wake_minutes', 'beiwe', 'tst_fb', 'efficiency',
       'end_time', 'minutes_after_wakeup', 'minutes_asleep', 'minutes_awake',
       'minutes_to_sleep', 'start_time', 'time_in_bed', 'redcap_x', 'beacon_x',
       'tst_ema', 'sol_ema', 'naw_ema', 'restful_ema', 'beacon_y', 'fitbit',
       'redcap_y'],
      dtype='object')

In [7]:
beacon = pd.read_csv("../data/processed/beacon-fb_ema_and_gps_filtered-ux_s20.csv")

In [8]:
beacon.columns

Index(['timestamp', 'lat', 'long', 'altitude', 'accuracy', 'tvoc', 'lux',
       'no2', 'co', 'co2', 'pm1_number', 'pm2p5_number', 'pm10_number',
       'pm1_mass', 'pm2p5_mass', 'pm10_mass', 'temperature_c', 'rh', 'beacon',
       'beiwe', 'fitbit', 'redcap', 'start_time', 'end_time'],
      dtype='object')

# Reviewing Datasets

In [9]:
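# morning and evening must be loaded before they can be merged
morning = pd.read_csv("../data/processed/beiwe-morning_ema-ux_s20.csv",parse_dates=["timestamp"])
morning["date"] = morning["timestamp"].dt.date
evening = pd.read_csv("../data/processed/beiwe-evening_ema-ux_s20.csv",parse_dates=["timestamp"])
evening["date"] = evening["timestamp"].dt.date
ema_frames = []
for pt in morning["beiwe"].unique():
    morning_by_pt = morning[morning["beiwe"] == pt]
    evening_by_pt = evening[evening["beiwe"] == pt]
    ema_by_pt = morning_by_pt.merge(evening_by_pt,left_on=["date","beiwe"],right_on=["date","beiwe"],suffixes=('_morning', '_evening'))
    ema_frames.append(ema_by_pt)
emas = pd.concat(ema_frames)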

In [None]:
emas

In [None]:
emas = morning.merge(evening,left_on=["date","beiwe"],right_on=["date","beiwe"],suffixes=('_morning', '_evening'))
emas
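
Merging on `date` and `beiwe` pairs each morning survey with the evening survey submitted on the same calendar day, and `suffixes` disambiguates the mood columns that exist in both surveys. A toy sketch with made-up values (the participant IDs and scores are hypothetical):

```python
import pandas as pd

morning_toy = pd.DataFrame({
    "beiwe": ["a", "a", "b"],
    "date": ["2020-05-13", "2020-05-14", "2020-05-13"],
    "stress": [0.0, 1.0, 2.0],
    "tst": [8.0, 7.0, 6.0],  # sleep questions appear only in the morning survey
})
evening_toy = pd.DataFrame({
    "beiwe": ["a", "b"],
    "date": ["2020-05-13", "2020-05-13"],
    "stress": [1.0, 3.0],
})

# Inner join: the morning survey from 2020-05-14 has no evening partner and is dropped
emas_toy = morning_toy.merge(evening_toy, on=["date", "beiwe"], suffixes=("_morning", "_evening"))
print(emas_toy)
```

Only columns present in both frames receive a suffix, so `tst` keeps its name while `stress` becomes `stress_morning` and `stress_evening`.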

In [10]:
df1 = pd.read_csv("../data/processed/fitbit-sleep_data_summary-ux_s20.csv",parse_dates=["end_date"],infer_datetime_format=True)
df1.columns

Index(['start_date', 'end_date', 'deep_count', 'deep_minutes', 'light_count',
       'light_minutes', 'rem_count', 'rem_minutes', 'wake_count',
       'wake_minutes', 'beiwe', 'duration_ms', 'efficiency', 'end_time',
       'main_sleep', 'minutes_after_wakeup', 'minutes_asleep', 'minutes_awake',
       'minutes_to_sleep', 'start_time', 'time_in_bed', 'redcap', 'beacon'],
      dtype='object')

In [None]:
df2 = pd.read_csv("../data/processed/fitbit-daily-ux_s20.csv",parse_dates=["timestamp"],infer_datetime_format=True)
df2

In [None]:
df1.merge(df2,left_on=["end_date","beiwe"],right_on=["timestamp","beiwe"])

In [None]:
df3 = pd.read_csv("../data/processed/fitbit_beiwe_beacon-sleep_summary-ux_s20.csv")
df3

In [18]:
df4 = pd.read_csv("../data/processed/beacon-fb_and_gps_filtered_summary-ux_s20.csv")
df4.describe()


Unnamed: 0,lat_mean,long_mean,altitude_mean,accuracy_mean,tvoc_mean,lux_mean,no2_mean,co_mean,co2_mean,pm1_number_mean,...,co_delta_percent,co2_delta_percent,pm1_number_delta_percent,pm2p5_number_delta_percent,pm10_number_delta_percent,pm1_mass_delta_percent,pm2p5_mass_delta_percent,pm10_mass_delta_percent,temperature_c_delta_percent,rh_delta_percent
count,226.0,226.0,226.0,226.0,225.0,224.0,120.0,226.0,226.0,212.0,...,226.0,226.0,212.0,212.0,212.0,212.0,212.0,226.0,226.0,226.0
mean,31.629575,-97.223753,182.485283,194.578042,265.534761,3.332562,1.079495,2.609954,969.410538,8.507479,...,54.221409,22.092101,194996.9,15185.57,9279.687057,780.383287,122.637226,56.10226,5.000845,7.908389
std,1.362328,0.591867,29.62705,244.599722,144.383798,7.050839,0.326953,2.614681,376.737318,9.678663,...,285.667824,17.20967,2001081.0,82217.12,62619.940045,2386.167281,89.571312,46.607221,3.243378,5.079784
min,30.28013,-97.751836,63.792565,4.742241,19.016547,0.0,-0.276564,-0.031249,421.230566,0.116044,...,-1301.118454,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,30.357352,-97.644297,175.716933,32.119923,193.786395,0.007692,0.949357,1.010439,732.437325,2.388951,...,4.759512,12.290426,88.37229,89.88501,95.291799,89.822538,84.067439,37.321422,3.464912,4.892089
50%,30.588365,-97.425958,178.101419,75.802632,239.581167,0.514521,1.083818,2.187965,889.380722,6.536288,...,44.894217,18.616984,231.9995,226.3937,230.549436,170.986867,111.87587,51.377256,3.847875,6.984876
75%,33.142101,-96.874433,190.61855,314.072679,312.516468,2.034121,1.19041,3.252632,1163.209247,10.236842,...,85.750103,27.267137,1476.159,1868.23,1536.445868,535.733222,144.584847,71.765016,7.430364,9.919974
max,33.162805,-94.41777,256.76236,1891.25,1331.825231,50.91024,2.032779,11.05362,1992.805586,88.538117,...,3336.955371,183.183303,23976640.0,1087498.0,885146.771987,21616.81763,763.579487,479.628399,16.993464,38.947368


### Combining all features

In [262]:
beacon_summary = pd.read_csv("../data/processed/beacon-fb_and_gps_filtered_summary-ux_s20.csv",parse_dates=["start_time","end_time"])
beacon_summary.head()

Unnamed: 0,start_time,lat_mean,long_mean,altitude_mean,accuracy_mean,tvoc_mean,lux_mean,no2_mean,co_mean,co2_mean,...,co_delta_percent,co2_delta_percent,pm1_number_delta_percent,pm2p5_number_delta_percent,pm10_number_delta_percent,pm1_mass_delta_percent,pm2p5_mass_delta_percent,pm10_mass_delta_percent,temperature_c_delta_percent,rh_delta_percent
0,2020-08-10 04:42:30,30.588263,-97.42594,137.223617,24.191533,131.430556,2.033737,,10.382481,1621.787836,...,7.308497,20.102194,1988.835796,1537.24747,2876.913111,743.505163,95.586287,37.868365,4.347826,2.918587
1,2020-08-11 03:35:00,30.58849,-97.426007,139.133097,15.21447,227.082389,2.031387,,6.945255,1629.768064,...,96.387159,13.312037,100284.234526,31311.555734,7488.907906,3309.270445,109.687766,37.248983,6.875,6.060606
2,2020-08-12 02:59:30,30.58826,-97.425955,142.766695,8.60431,251.788636,2.010636,,10.264054,1600.857712,...,5.133101,9.517651,4856.056983,55213.832386,2256.822629,4736.118611,104.033351,38.959233,0.0,4.615385
3,2020-08-14 03:05:00,30.588369,-97.426019,151.448031,14.086097,226.733677,1.092907,,8.290977,1291.110197,...,75.329702,20.016621,6251.943821,110420.396383,8474.565463,2092.818246,139.357113,51.725844,4.921466,3.90625
4,2020-08-15 06:51:30,30.588487,-97.425897,150.130393,13.397817,212.337528,2.035143,,10.892455,1274.717405,...,6.512018,10.322555,38402.240034,21141.654778,885146.771987,6240.666227,410.901595,103.241479,4.347826,7.686567


In [263]:
ema_morning = pd.read_csv("../data/processed/beiwe-morning_ema-ux_s20.csv",parse_dates=["timestamp"])
ema_morning

Unnamed: 0,timestamp,beiwe,content,stress,lonely,sad,energy,tst,sol,naw,restful,redcap,beacon
0,2020-05-13 09:10:27,qh34m4r9,3,0.0,0.0,0.0,1.0,8.0,20.0,2.0,3.0,68,19.0
1,2020-05-13 09:15:49,awa8uces,0.0,2.0,1.0,1.0,1.0,2.0,10.0,3.0,1.0,28,26.0
2,2020-05-13 09:42:19,xxvnhauv,1,1.0,1.0,3.0,0.0,6.0,30.0,3.0,1.0,21,22.0
3,2020-05-13 09:43:27,rvhdl2la,1,1.0,2.0,3.0,0.0,5.3,5.0,2.0,2.0,29,
4,2020-05-13 10:11:51,q5y11ytm,3,1.0,0.0,1.0,2.0,2.0,10.0,0.0,0.0,48,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2465,2020-09-01 13:10:16,7dhu3pn7,2,2.0,0.0,0.0,2.0,8.0,5.0,0.0,3.0,64,
2466,2020-09-01 14:14:17,745vq78e,3,0.0,0.0,0.0,2.0,7.6,0.0,1.0,2.0,55,5.0
2467,2020-09-01 16:56:03,rkem5aou,2,1.0,2.0,1.0,2.0,6.0,10.0,0.0,2.0,85,
2468,2020-09-01 17:28:26,axk49ssu,2,2.0,0.0,1.0,1.0,7.0,40.0,3.0,1.0,52,


In [264]:
# combine each beacon sleep-period summary with the morning EMA submitted on the day the sleep period ended
beacon_summary["date"] = beacon_summary["end_time"].dt.date
ema_morning["date"] = ema_morning["timestamp"].dt.date
ema_morning.rename({"timestamp":"timestamp_ema_morning","content":"content_morning","stress":"stress_morning","lonely":"lonely_morning","sad":"sad_morning","energy":"energy_morning"},axis=1,inplace=True)
combined = beacon_summary.merge(right=ema_morning,left_on=["beiwe","redcap","beacon","date"],right_on=["beiwe","redcap","beacon","date"])

In [265]:
combined

Unnamed: 0,start_time,lat_mean,long_mean,altitude_mean,accuracy_mean,tvoc_mean,lux_mean,no2_mean,co_mean,co2_mean,...,timestamp_ema_morning,content_morning,stress_morning,lonely_morning,sad_morning,energy_morning,tst,sol,naw,restful
0,2020-08-10 04:42:30,30.588263,-97.425940,137.223617,24.191533,131.430556,2.033737,,10.382481,1621.787836,...,2020-08-10 12:56:38,1,2.0,0.0,0.0,2.0,6.0,15.0,5.0,0.0
1,2020-08-12 02:59:30,30.588260,-97.425955,142.766695,8.604310,251.788636,2.010636,,10.264054,1600.857712,...,2020-08-12 11:11:42,2,3.0,0.0,0.0,2.0,8.0,10.0,3.0,2.0
2,2020-08-14 03:05:00,30.588369,-97.426019,151.448031,14.086097,226.733677,1.092907,,8.290977,1291.110197,...,2020-08-14 15:02:44,2,3.0,1.0,2.0,3.0,8.0,10.0,3.0,3.0
3,2020-08-16 04:21:30,30.588345,-97.426070,160.691105,10.360485,271.814000,0.926160,,10.485822,1322.883169,...,2020-08-16 12:25:03,1,2.0,1.0,2.0,1.0,6.0,20.0,4.0,1.0
4,2020-08-17 03:00:00,30.588360,-97.425990,151.134656,19.618770,318.570173,1.077987,,11.053620,1397.884229,...,2020-08-17 11:41:47,1,2.0,1.0,1.0,1.0,6.0,25.0,2.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
127,2020-08-24 00:25:00,30.305750,-97.722996,183.878215,174.374705,231.406713,0.755933,,3.236772,735.855429,...,2020-08-24 14:05:57,3,0.0,0.0,0.0,3.0,7.0,10.0,2.0,3.0
128,2020-08-25 23:46:30,30.305907,-97.723398,184.739720,352.468168,234.280011,0.753636,,2.947662,745.332146,...,2020-08-26 21:31:55,2,1.0,0.0,0.0,1.0,7.0,20.0,3.0,2.0
129,2020-08-30 01:30:00,30.305750,-97.722936,184.636819,202.872929,39.635714,1.408868,,2.433042,622.872286,...,2020-08-30 13:36:48,3,0.0,0.0,0.0,3.0,7.0,15.0,3.0,2.0
130,2020-08-30 23:42:30,30.305737,-97.723005,185.106642,264.704557,19.016547,0.249041,,1.451173,549.733238,...,2020-08-31 09:02:26,2,0.0,0.0,0.0,2.0,7.0,10.0,2.0,2.0
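
`merge` defaults to an inner join, so sleep periods without a matching morning EMA are dropped silently (the `describe()` output above reports a count of 226 for the summary file, while `combined` ends at index 130). A sketch, on made-up rows, of how `how="left"` with `indicator=True` can show which nights failed to match:

```python
import pandas as pd

# Hypothetical nights and surveys; values are illustrative only
nights = pd.DataFrame({
    "beiwe": ["a", "a", "b"],
    "date": ["2020-08-10", "2020-08-11", "2020-08-24"],
    "co2_mean": [1621.8, 1629.8, 735.9],
})
surveys = pd.DataFrame({
    "beiwe": ["a", "b"],
    "date": ["2020-08-10", "2020-08-24"],
    "restful": [0.0, 3.0],
})

# Keep every night and record whether a survey was found for it
checked = nights.merge(surveys, on=["beiwe", "date"], how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched[["beiwe", "date"]])
```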


In [266]:
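def get_ema_distribution(row):
    """
    Returns 19:00 on the evening before the sleep period in row began,
    the time from which an evening EMA is matched to that sleep period.

    If sleep started before 19:00 (for example after midnight), that evening
    falls on the previous calendar day; otherwise it is the same day as start_time.
    """
    if row["start_time"].hour < 19:
        d = row["start_time"] - timedelta(days=1)
    else:
        d = row["start_time"]
        
    return datetime(d.year,d.month,d.day,19,0,0)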
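
`get_ema_distribution` is applied row by row. A vectorized sketch of the same rule is shown below on two of the `start_time` values from the `combined` output; the 19:00 cutoff is taken from the function above, and the helper expression is an alternative, not what the notebook runs.

```python
import pandas as pd

start = pd.Series(pd.to_datetime([
    "2020-08-10 04:42:30",  # fell asleep after midnight
    "2020-07-05 23:24:30",  # fell asleep before midnight
]))

# Shift back 19 hours, truncate to midnight, then add the 19 hours back.
# Times before 19:00 land on the previous day; times from 19:00 on stay on the same day.
distribution = (start - pd.Timedelta(hours=19)).dt.normalize() + pd.Timedelta(hours=19)
print(distribution.tolist())
```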

In [267]:
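def insert_ema_timestamp(row, ema):
    """
    Returns the timestamp of the first evening EMA in ema, from the same
    participant as row, that was submitted before the sleep period started.
    Returns NaN if no such EMA exists.
    """
    bid = row["beiwe"]
    ema_by_id = ema[ema["beiwe"] == bid]
    for ts in ema_by_id["timestamp"]:
        # date is a method and must be called; without the parentheses the method objects are compared, not the dates
        if ts.date() == row["start_time"].date() and ts.hour < row["start_time"].hour:
            return ts
        elif ts > row["ema_evening_distribution"] and ts < row["start_time"]:
            return ts
        
    return np.nan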
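
The row-wise search above can also be sketched with `pd.merge_asof`, which finds, per participant, the latest EMA submitted before each `start_time`. This is an alternative formulation under stated assumptions, not the notebook's method: it picks the latest earlier EMA rather than the first match in file order, and the toy rows and the `window_open` column name are hypothetical.

```python
import pandas as pd

nights = pd.DataFrame({
    "beiwe": ["a", "a", "b"],
    "start_time": pd.to_datetime(["2020-08-10 04:42:30", "2020-08-17 03:00:00", "2020-07-05 23:24:30"]),
})
evening_toy = pd.DataFrame({
    "beiwe": ["a", "a", "b"],
    "timestamp": pd.to_datetime(["2020-08-09 21:11:43", "2020-08-15 20:00:00", "2020-07-05 20:28:07"]),
})

# 19:00 on the evening before sleep began (same rule as get_ema_distribution)
nights["window_open"] = (nights["start_time"] - pd.Timedelta(hours=19)).dt.normalize() + pd.Timedelta(hours=19)

# merge_asof requires both frames to be sorted by their time key
matched = pd.merge_asof(
    nights.sort_values("start_time"),
    evening_toy.sort_values("timestamp"),
    left_on="start_time", right_on="timestamp", by="beiwe",
    direction="backward", allow_exact_matches=False,
)

# Discard matches submitted before the evening window opened
too_early = matched["timestamp"] <= matched["window_open"]
matched.loc[too_early, "timestamp"] = pd.NaT
print(matched[["beiwe", "start_time", "timestamp"]])
```

The sleep period starting 2020-08-17 03:00 ends up unmatched here because the only earlier EMA for that participant was submitted two evenings before.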

In [268]:
ema_evening = pd.read_csv("../data/processed/beiwe-evening_ema-ux_s20.csv",parse_dates=["timestamp"])
#ema_evening["date"] = ema_evening["timestamp"].dt.date
ema_evening.rename({"content":"content_evening","stress":"stress_evening","lonely":"lonely_evening","sad":"sad_evening","energy":"energy_evening"},axis=1,inplace=True)
combined["ema_evening_distribution"] = combined.apply(get_ema_distribution, axis="columns")
combined["ema_evening_timestamp"] = combined.apply(lambda x: insert_ema_timestamp(x,ema_evening), axis="columns")
more_combined = combined.merge(right=ema_evening,left_on=["beiwe","redcap","beacon","ema_evening_timestamp"],right_on=["beiwe","redcap","beacon","timestamp"])
more_combined.drop(["timestamp","date","ema_evening_distribution"],axis="columns",inplace=True)

In [269]:
more_combined

Unnamed: 0,start_time,lat_mean,long_mean,altitude_mean,accuracy_mean,tvoc_mean,lux_mean,no2_mean,co_mean,co2_mean,...,tst,sol,naw,restful,ema_evening_timestamp,content_evening,stress_evening,lonely_evening,sad_evening,energy_evening
0,2020-08-10 04:42:30,30.588263,-97.42594,137.223617,24.191533,131.430556,2.033737,,10.382481,1621.787836,...,6.0,15.0,5.0,0.0,2020-08-09 21:11:43,0.0,3,1.0,0,0
1,2020-08-17 03:00:00,30.58836,-97.42599,151.134656,19.61877,318.570173,1.077987,,11.05362,1397.884229,...,6.0,25.0,2.0,1.0,2020-08-16 21:05:35,2.0,1,0.0,1,1
2,2020-06-29 00:15:30,33.161306,-96.874194,178.062775,637.135298,190.825114,0.0,1.083512,-0.007507,747.519295,...,6.0,15.0,1.0,2.0,2020-06-28 19:28:10,3.0,0,0.0,0,3
3,2020-07-05 23:24:30,33.161414,-96.874709,177.223688,562.217737,265.916821,0.0,1.091302,-0.011322,862.979704,...,7.0,20.0,2.0,2.0,2020-07-05 20:28:07,3.0,0,0.0,0,4
4,2020-07-26 22:50:30,33.161554,-96.874795,177.498635,419.860099,216.897193,0.0,1.077144,-0.024787,867.012079,...,7.0,20.0,3.0,2.0,2020-07-26 21:39:50,3.0,0,0.0,0,2
5,2020-08-02 23:25:00,33.161385,-96.874547,179.54179,386.303089,312.516468,0.0,1.085811,-0.028164,1004.128575,...,7.0,20.0,3.0,2.0,2020-08-02 20:36:13,2.0,1,0.0,0,2
6,2020-08-09 23:28:30,33.161302,-96.87473,177.697848,373.488192,319.088829,0.049757,1.095666,-0.001468,762.428713,...,7.0,20.0,2.0,3.0,2020-08-09 19:18:53,3.0,1,0.0,0,3
7,2020-08-16 22:57:00,33.161676,-96.87434,178.188998,372.880521,236.472279,0.0,1.087414,-0.023875,974.974803,...,7.0,30.0,6.0,3.0,2020-08-16 19:17:32,3.0,0,0.0,0,3
8,2020-06-14 22:35:00,30.396225,-97.644259,209.661404,47.07358,239.581167,0.221301,,3.362322,977.369967,...,7.0,15.0,2.0,2.0,2020-06-14 19:00:42,3.0,0,0.0,0,3
9,2020-06-27 23:26:30,30.396236,-97.644256,211.023729,55.878203,420.157917,1.083652,,3.068292,1094.804764,...,10.0,5.0,2.0,3.0,2020-06-27 21:38:49,3.0,0,0.0,0,4


In [270]:
activity = pd.read_csv("../data/processed/fitbit-daily-ux_s20.csv",parse_dates=["timestamp"])
more_combined["date"] = pd.to_datetime(more_combined["end_time"].dt.date - timedelta(days=1))
more_more_combined = more_combined.merge(right=activity,left_on=["date","beiwe"],right_on=["timestamp","beiwe"])

In [271]:
more_more_combined

Unnamed: 0,start_time,lat_mean,long_mean,altitude_mean,accuracy_mean,tvoc_mean,lux_mean,no2_mean,co_mean,co2_mean,...,sedentary_minutes,lightly_active_minutes,fairly_active_minutes,very_active_minutes,calories_from_activities,bmi,fat,weight,food_calories_logged,water_logged
0,2020-08-10 04:42:30,30.588263,-97.42594,137.223617,24.191533,131.430556,2.033737,,10.382481,1621.787836,...,715,320,13,0,1381.0,24.58577,23.756001,166.18,0.0,0.0
1,2020-08-17 03:00:00,30.58836,-97.42599,151.134656,19.61877,318.570173,1.077987,,11.05362,1397.884229,...,687,235,2,65,1487.0,24.956938,23.667,168.69,0.0,0.0
2,2020-06-29 00:15:30,33.161306,-96.874194,178.062775,637.135298,190.825114,0.0,1.083512,-0.007507,747.519295,...,804,2,0,0,6.0,26.785088,0.0,185.0,0.0,0.0
3,2020-07-05 23:24:30,33.161414,-96.874709,177.223688,562.217737,265.916821,0.0,1.091302,-0.011322,862.979704,...,501,97,4,73,1122.0,26.785088,0.0,185.0,0.0,0.0
4,2020-07-26 22:50:30,33.161554,-96.874795,177.498635,419.860099,216.897193,0.0,1.077144,-0.024787,867.012079,...,1353,87,0,0,364.0,26.785088,0.0,185.0,0.0,0.0
5,2020-08-02 23:25:00,33.161385,-96.874547,179.54179,386.303089,312.516468,0.0,1.085811,-0.028164,1004.128575,...,589,165,18,37,1126.0,26.785088,0.0,185.0,0.0,0.0
6,2020-08-09 23:28:30,33.161302,-96.87473,177.697848,373.488192,319.088829,0.049757,1.095666,-0.001468,762.428713,...,635,103,15,99,1433.0,26.785088,0.0,185.0,0.0,0.0
7,2020-08-16 22:57:00,33.161676,-96.87434,178.188998,372.880521,236.472279,0.0,1.087414,-0.023875,974.974803,...,815,101,20,10,636.0,26.785088,0.0,185.0,0.0,0.0
8,2020-06-14 22:35:00,30.396225,-97.644259,209.661404,47.07358,239.581167,0.221301,,3.362322,977.369967,...,664,224,46,14,942.0,21.203255,0.0,108.0,0.0,0.0
9,2020-06-27 23:26:30,30.396236,-97.644256,211.023729,55.878203,420.157917,1.083652,,3.068292,1094.804764,...,331,82,19,1,295.0,21.203255,0.0,108.0,0.0,0.0
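
The merge above pairs each sleep period with the Fitbit daily activity from the day before the sleep period ended. A small sketch of that date shift on two hypothetical `end_time` values; `.dt.normalize()` keeps the datetime dtype, so no round trip through `pd.to_datetime` is needed before merging on a parsed `timestamp` column.

```python
import pandas as pd

end_time = pd.Series(pd.to_datetime(["2020-08-10 12:35:30", "2020-08-31 07:26:00"]))

# Midnight of the wake-up day, minus one day = the day whose activity preceded the sleep
activity_date = end_time.dt.normalize() - pd.Timedelta(days=1)
print(activity_date.tolist())
```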


In [272]:
fb_all = pd.read_csv("../data/processed/fitbit-sleep_data_summary-ux_s20.csv",parse_dates=["start_date","end_date","start_time","end_time"])

In [275]:
fb_all.columns

Index(['start_date', 'end_date', 'deep_count', 'deep_minutes', 'light_count',
       'light_minutes', 'rem_count', 'rem_minutes', 'wake_count',
       'wake_minutes', 'beiwe', 'duration_ms', 'efficiency', 'end_time',
       'main_sleep', 'minutes_after_wakeup', 'minutes_asleep', 'minutes_awake',
       'minutes_to_sleep', 'start_time', 'time_in_bed', 'redcap', 'beacon'],
      dtype='object')

In [278]:
more_more_more_combined = more_more_combined.merge(right=fb_all,left_on=["start_time","end_time","beiwe","redcap","beacon"],right_on=["start_time","end_time","beiwe","redcap","beacon"])

In [279]:
more_more_more_combined.drop(["date","timestamp","start_date","end_date","bmi","bmr","fat","weight","food_calories_logged","water_logged"],axis="columns",inplace=True)

In [304]:
study_suffix="ux_s20"
morning = pd.read_csv(f"../data/processed/beiwe-morning_ema-{study_suffix}.csv",parse_dates=["timestamp"],infer_datetime_format=True)
morning["date"] = morning["timestamp"].dt.date
evening = pd.read_csv(f"../data/processed/beiwe-evening_ema-{study_suffix}.csv",parse_dates=["timestamp"],infer_datetime_format=True)
evening["date"] = evening["timestamp"].dt.date
emas = morning.merge(evening,left_on=["date","beiwe","redcap","beacon"],right_on=["date","beiwe","redcap","beacon"],suffixes=('_morning', '_evening'))

In [305]:
for c in ["content","stress","lonely","sad","energy"]:
    emas[f"{c}_mean"] = emas[[f"{c}_morning",f"{c}_evening"]].mean(axis=1)

In [307]:
emas.columns

Index(['timestamp_morning', 'beiwe', 'content_morning', 'stress_morning',
       'lonely_morning', 'sad_morning', 'energy_morning', 'tst', 'sol', 'naw',
       'restful', 'redcap', 'beacon', 'date', 'timestamp_evening',
       'content_evening', 'stress_evening', 'lonely_evening', 'sad_evening',
       'energy_evening', 'content', 'stress', 'lonely', 'sad', 'energy'],
      dtype='object')
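
The row-wise mean of the morning and evening scores skips missing values by default, so a day with only one answered survey still gets a value, and a day with neither stays missing. A toy sketch with made-up scores:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "stress_morning": [0.0, 2.0, np.nan],
    "stress_evening": [2.0, np.nan, np.nan],
})

# mean(axis=1) averages across the two columns and ignores NaN (skipna=True by default)
toy["stress_mean"] = toy[["stress_morning", "stress_evening"]].mean(axis=1)
print(toy)
```

Passing `skipna=False` would instead leave the mean missing whenever either survey is missing.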