# Fitbit Exploratory Analysis
This notebook explores Fitbit metrics as outcomes of some explanatory variable. As before, we explore all the different modalities, but always with a Fitbit metric as the target variable, and in some cases with Fitbit metrics as both the explanatory and target variables.

The notebook is organized as follows:
1. [Data Import and Pre-Processing](#import)
2. [Processing](#process)
3. [Visualization and Analysis](#visualize)

<a id='import'></a>
# Data Import
The following cells import the data used for visualization and analysis.

In [165]:
import pandas as pd
import numpy as np
import os
from datetime import datetime

## Fitbit Data
Files from the ut1000 and ut2000 studies are read in individually, tagged with their study name, and merged into one dataframe.

In [214]:
def combine_across_studies(dir_string='fitbit',file_string='dailySteps'):
    '''
    Imports data from the ut1000 and ut2000 studies and combines it into one dataframe.
    
    Each row is tagged with its study name and merged with that study's ID
    crossover table, matching on Id (Fitbit files) or pid (Beiwe files).
    '''
    frames = []
    for study in ['ut1000','ut2000']:
        temp = pd.read_csv(f'../data/raw/{study}/{dir_string}/{file_string}.csv')
        temp['study'] = study
        
        # Left merge keeps every data row, even if its ID is missing from the crossover
        crossover = pd.read_csv(f'../data/raw/{study}/admin/id_crossover.csv')
        if 'Id' in temp.columns:
            temp = pd.merge(left=temp,right=crossover,left_on='Id',right_on='record',how='left')
        elif 'pid' in temp.columns:
            temp = pd.merge(left=temp,right=crossover,left_on='pid',right_on='beiwe',how='left')
        
        frames.append(temp)
        
    # Concatenate once instead of growing a dataframe inside the loop
    return pd.concat(frames)
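
To make the ID tagging concrete, the sketch below reproduces the left-merge step on small in-memory frames instead of the CSV files. The two `record`/`beiwe` pairs are taken from the outputs in this notebook; the helper name `attach_ids`, the step values, and the unmatched `Id` 9999 are hypothetical and only serve the illustration.

```python
import pandas as pd

# Crossover table mapping `record` IDs to `beiwe` IDs
# (pairs taken from the outputs in this notebook).
crossover = pd.DataFrame({
    'record': [1025, 1065],
    'beiwe': ['2qki3fim', '11i3mr4n'],
})

def attach_ids(temp, crossover, study):
    """Hypothetical helper: tag a raw frame with its study and merge in the crossover IDs."""
    temp = temp.copy()
    temp['study'] = study
    if 'Id' in temp.columns:      # Fitbit files identify participants by `Id`
        return temp.merge(crossover, left_on='Id', right_on='record', how='left')
    if 'pid' in temp.columns:     # Beiwe files identify participants by `pid`
        return temp.merge(crossover, left_on='pid', right_on='beiwe', how='left')
    return temp

# Hypothetical Fitbit rows; Id 9999 has no crossover entry.
fitbit = pd.DataFrame({'Id': [1025, 9999], 'StepTotal': [3989, 1200]})
tagged = attach_ids(fitbit, crossover, 'ut1000')
print(tagged)
```

Because the merge is a left join, rows whose ID is missing from the crossover table are kept, with the crossover columns left as NaN.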

### Steps
Steps can be used as a proxy for activity, up to a point. While step counts don't tell us how vigorous the activity was, we can at least see whether the person had an active or a "lazy" day.

In [215]:
steps = combine_across_studies('fitbit','dailySteps_merged')
steps['ActivityDay'] = pd.to_datetime(steps['ActivityDay'])
steps.set_index('ActivityDay',inplace=True)
steps.head()

ActivityDay,Id,StepTotal,study,record,beiwe,beacon
2018-10-29,1025,3989,ut1000,1025,2qki3fim,
2018-10-30,1025,7633,ut1000,1025,2qki3fim,
2018-10-31,1025,5497,ut1000,1025,2qki3fim,
2018-11-01,1025,8534,ut1000,1025,2qki3fim,
2018-11-02,1025,6512,ut1000,1025,2qki3fim,
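
One simple way to separate active from "lazy" days is to compare each day with the participant's own median. The sketch below applies that idea to the five rows shown above; the median cutoff, the `steps_demo` frame, and the `active` column are illustrative assumptions, not a definition used elsewhere in this analysis.

```python
import pandas as pd

# The five daily totals for participant 1025 shown above.
steps_demo = pd.DataFrame(
    {'Id': 1025, 'StepTotal': [3989, 7633, 5497, 8534, 6512]},
    index=pd.to_datetime(['2018-10-29', '2018-10-30', '2018-10-31',
                          '2018-11-01', '2018-11-02']),
)

# Assumption: a day counts as "active" when it exceeds the participant's own median.
median_steps = steps_demo.groupby('Id')['StepTotal'].transform('median')
steps_demo['active'] = steps_demo['StepTotal'] > median_steps
print(steps_demo)
```

A per-participant cutoff avoids labeling a habitually sedentary or habitually active person the same way every day, which a single fixed threshold would tend to do.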


### Sleep Summary
The sleep summary gives insight into how the participant slept each night. We can look at each individual night if we desire, but for now we consider only the summary.

In [209]:
sleep_summary = combine_across_studies('fitbit','sleepStagesDay_merged')
# Drop nights with no light-sleep minutes; copy so new columns are added to an independent frame
sleep_summary = sleep_summary[sleep_summary['TotalMinutesLight'] > 0].copy()
# NREM sleep is the sum of the light and deep stages
sleep_summary['TotalMinutesNREM'] = sleep_summary['TotalMinutesLight'] + sleep_summary['TotalMinutesDeep']
sleep_summary['SleepDay'] = pd.to_datetime(sleep_summary['SleepDay'])
sleep_summary.set_index('SleepDay',inplace=True)
sleep_summary.head()

SleepDay,Id,TotalSleepRecords,TotalMinutesAsleep,TotalTimeInBed,TotalTimeAwake,TotalMinutesLight,TotalMinutesDeep,TotalMinutesREM,study,record,beiwe,beacon,TotalMinutesNREM
2018-10-30,1025,1,293,346,53,203,45,45,ut1000,1025,2qki3fim,,248
2018-10-31,1025,1,523,575,52,308,113,102,ut1000,1025,2qki3fim,,421
2018-11-01,1025,1,288,326,38,169,47,72,ut1000,1025,2qki3fim,,216
2018-11-02,1025,1,529,592,63,316,99,114,ut1000,1025,2qki3fim,,415
2018-11-03,1025,1,415,480,65,237,96,82,ut1000,1025,2qki3fim,,333
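
The stage columns can be sanity-checked against the totals. The sketch below rebuilds the five rows shown above (the frame name `sleep_demo` is hypothetical) and checks two relationships that hold in them: the stage minutes sum to `TotalMinutesAsleep`, and time asleep plus time awake equals `TotalTimeInBed`. Whether these identities hold for every row of the full dataset is an assumption worth testing rather than a guarantee.

```python
import pandas as pd

# The five nights for participant 1025 shown above.
sleep_demo = pd.DataFrame({
    'TotalMinutesAsleep': [293, 523, 288, 529, 415],
    'TotalTimeInBed':     [346, 575, 326, 592, 480],
    'TotalTimeAwake':     [53, 52, 38, 63, 65],
    'TotalMinutesLight':  [203, 308, 169, 316, 237],
    'TotalMinutesDeep':   [45, 113, 47, 99, 96],
    'TotalMinutesREM':    [45, 102, 72, 114, 82],
}, index=pd.to_datetime(['2018-10-30', '2018-10-31', '2018-11-01',
                         '2018-11-02', '2018-11-03']))

# NREM is light plus deep sleep, as in the cell above.
sleep_demo['TotalMinutesNREM'] = sleep_demo['TotalMinutesLight'] + sleep_demo['TotalMinutesDeep']

# Check 1: NREM + REM accounts for all minutes asleep.
stages_ok = (sleep_demo['TotalMinutesNREM'] + sleep_demo['TotalMinutesREM']) == sleep_demo['TotalMinutesAsleep']
# Check 2: minutes asleep + minutes awake accounts for all time in bed.
in_bed_ok = (sleep_demo['TotalMinutesAsleep'] + sleep_demo['TotalTimeAwake']) == sleep_demo['TotalTimeInBed']

print(stages_ok.all(), in_bed_ok.all())
```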


## Beiwe Mood Data
Mood data comes from surveys taken by the participants and has already been summarized in CSV files by Peter Wu. Here we read those files in and combine them across studies.

In [217]:
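def process_mood(mood='content'):
    '''
    Imports the survey answers for one mood from both studies, indexes them
    by date, and stores the answers in a column named after the mood.
    '''
    df = combine_across_studies('beiwe',mood)
    df['date'] = pd.to_datetime(df['date'])
    df.set_index('date',inplace=True)
    # Drop the unnamed index column left over from the CSV export
    df.drop('Unnamed: 0',axis=1,inplace=True)
    # Move the generic 'answer' column to one named after the mood
    df[mood] = df['answer']
    df.drop('answer',axis=1,inplace=True)
    
    return df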

### Contentment

In [218]:
content = process_mood('content')
content.head()

date,datetime,pid,study,record,beiwe,beacon,content
2018-10-17,2018-10-17 14:15:51,11i3mr4n,ut1000,1065.0,11i3mr4n,,2
2018-10-19,2018-10-19 19:06:20,11i3mr4n,ut1000,1065.0,11i3mr4n,,2
2018-10-23,2018-10-23 13:01:16,11i3mr4n,ut1000,1065.0,11i3mr4n,,2
2018-10-24,2018-10-25 02:35:39,11i3mr4n,ut1000,1065.0,11i3mr4n,,2
2018-10-27,2018-10-27 23:29:35,11i3mr4n,ut1000,1065.0,11i3mr4n,,2


### Sadness

In [219]:
sad = process_mood('sad')
sad.head()

date,datetime,pid,study,record,beiwe,beacon,sad
2018-10-17,2018-10-17 14:15:51,11i3mr4n,ut1000,1065.0,11i3mr4n,,0
2018-10-19,2018-10-19 19:06:20,11i3mr4n,ut1000,1065.0,11i3mr4n,,0
2018-10-23,2018-10-23 13:01:16,11i3mr4n,ut1000,1065.0,11i3mr4n,,0
2018-10-24,2018-10-25 02:35:39,11i3mr4n,ut1000,1065.0,11i3mr4n,,0
2018-10-27,2018-10-27 23:29:35,11i3mr4n,ut1000,1065.0,11i3mr4n,,0


### Loneliness

In [220]:
lonely = process_mood('lonely')
lonely.head()

date,datetime,pid,study,record,beiwe,beacon,lonely
2018-10-17,2018-10-17 14:15:51,11i3mr4n,ut1000,1065.0,11i3mr4n,,1
2018-10-19,2018-10-19 19:06:20,11i3mr4n,ut1000,1065.0,11i3mr4n,,1
2018-10-23,2018-10-23 13:01:16,11i3mr4n,ut1000,1065.0,11i3mr4n,,0
2018-10-24,2018-10-25 02:35:39,11i3mr4n,ut1000,1065.0,11i3mr4n,,0
2018-10-27,2018-10-27 23:29:35,11i3mr4n,ut1000,1065.0,11i3mr4n,,0


### Stress

In [222]:
stress = process_mood('stress')
stress.head()

date,datetime,pid,study,record,beiwe,beacon,stress
2018-10-17,2018-10-17 14:15:51,11i3mr4n,ut1000,1065.0,11i3mr4n,,1
2018-10-19,2018-10-19 19:06:20,11i3mr4n,ut1000,1065.0,11i3mr4n,,1
2018-10-23,2018-10-23 13:01:16,11i3mr4n,ut1000,1065.0,11i3mr4n,,1
2018-10-24,2018-10-25 02:35:39,11i3mr4n,ut1000,1065.0,11i3mr4n,,0
2018-10-27,2018-10-27 23:29:35,11i3mr4n,ut1000,1065.0,11i3mr4n,,0


### Energy

In [224]:
energy = process_mood('energy')
energy.head()

date,datetime,pid,study,record,beiwe,beacon,energy
2018-10-17,2018-10-17 14:15:51,11i3mr4n,ut1000,1065.0,11i3mr4n,,2
2018-10-19,2018-10-19 19:06:20,11i3mr4n,ut1000,1065.0,11i3mr4n,,2
2018-10-23,2018-10-23 13:01:16,11i3mr4n,ut1000,1065.0,11i3mr4n,,2
2018-10-27,2018-10-27 23:29:35,11i3mr4n,ut1000,1065.0,11i3mr4n,,2
2018-10-28,2018-10-29 01:07:32,11i3mr4n,ut1000,1065.0,11i3mr4n,,2


In [228]:
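mood_strs = ['content','sad','lonely','stress','energy']
moods = [content,sad,lonely,stress,energy]
per_participant = []
for bid in content['beiwe'].dropna().unique():
    # Collect this participant's answers, one series per mood
    frames = []
    for df,mood_str in zip(moods,mood_strs):
        df = df[df['beiwe'] == bid]
        frames.append(df[mood_str])
        
    # The inner join keeps only dates on which all five moods were answered
    mood_comp = pd.concat(frames,axis=1,join='inner')
    if len(mood_comp) > 0:
        mood_comp['beiwe'] = bid
        # Positional lookup: the index holds dates, so [0] is not a valid label
        mood_comp['study'] = content.loc[content['beiwe'] == bid,'study'].iloc[0]
        per_participant.append(mood_comp)

mood = pd.concat(per_participant)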

In [229]:
mood

date,content,sad,lonely,stress,energy,beiwe,study
2018-10-17,2,0,1,1,2,11i3mr4n,ut1000
2018-10-19,2,0,1,1,2,11i3mr4n,ut1000
2018-10-23,2,0,0,1,2,11i3mr4n,ut1000
2018-10-27,2,0,0,0,2,11i3mr4n,ut1000
2018-10-28,2,0,0,0,2,11i3mr4n,ut1000
...,...,...,...,...,...,...,...
2019-03-05,1,0,0,0,2,srwzz5e3,ut2000
2019-03-07,1,1,1,1,1,srwzz5e3,ut2000
2019-03-09,2,1,2,1,3,srwzz5e3,ut2000
2019-03-11,1,1,2,2,2,srwzz5e3,ut2000
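
The inner join explains why some survey dates are missing from the combined frame: a date is kept only if every mood has an answer on it. The sketch below shows this with the first five stress and energy rows for participant 11i3mr4n, copied from the outputs above; the variable names are hypothetical. Because only those five rows per mood are used here, 2018-10-28 also drops out of the demo, even though it appears in the full result.

```python
import pandas as pd

# First five stress answers shown above (includes 2018-10-24).
stress_demo = pd.Series(
    [1, 1, 1, 0, 0],
    index=pd.to_datetime(['2018-10-17', '2018-10-19', '2018-10-23',
                          '2018-10-24', '2018-10-27']),
    name='stress',
)
# First five energy answers shown above (no entry for 2018-10-24).
energy_demo = pd.Series(
    [2, 2, 2, 2, 2],
    index=pd.to_datetime(['2018-10-17', '2018-10-19', '2018-10-23',
                          '2018-10-27', '2018-10-28']),
    name='energy',
)

# join='inner' keeps only the dates present in both series.
combined = pd.concat([stress_demo, energy_demo], axis=1, join='inner')
print(combined)
```

Switching to `join='outer'` would keep every date and fill the missing answers with NaN, at the cost of rows that cannot be compared across all moods.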


<a id='process'></a>
# Data Processing
The next cells further process the imported data or create new dataframes from it.

## Mood and Sleep
The following cells create a dataframe that allows us to look more deeply at the relationship between mood and sleep.

In [100]:
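mood_and_sleep = pd.DataFrame()
for bid in sleep_summary['beiwe'].unique():
    # Placeholder: per-participant merge of each mood with the sleep summary.
    # The loop variable is mood_df so the combined `mood` dataframe is not overwritten.
    for mood_df, mood_str in zip(moods, mood_strs):
        pass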
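
One possible way to build such a dataframe is to join the mood and sleep frames on participant and date. The sketch below is an assumption about the intended result, not this notebook's implementation: it pairs each mood report with the sleep record carrying the same date, and the function name `merge_mood_and_sleep`, the participant IDs, and the demo values are hypothetical. Whether a mood report should instead be matched to the previous or the following night is a separate design decision.

```python
import pandas as pd

def merge_mood_and_sleep(mood_df, sleep_df):
    """Hypothetical helper: inner-join mood and sleep rows sharing a participant and a date."""
    # Both frames are indexed by date; expose the index as a regular column to merge on.
    left = mood_df.rename_axis('date').reset_index()
    right = sleep_df.rename_axis('date').reset_index()
    merged = left.merge(right, on=['beiwe', 'date'], how='inner',
                        suffixes=('_mood', '_sleep'))
    return merged.set_index('date')

# Hypothetical demo data: two mood reports from one participant.
mood_rows = pd.DataFrame(
    {'content': [2, 1], 'beiwe': ['aaaa1111', 'aaaa1111']},
    index=pd.to_datetime(['2018-10-30', '2018-10-31']),
)
# Two sleep records; only the first belongs to the same participant.
sleep_rows = pd.DataFrame(
    {'TotalMinutesAsleep': [293, 523], 'beiwe': ['aaaa1111', 'bbbb2222']},
    index=pd.to_datetime(['2018-10-30', '2018-10-31']),
)

mood_and_sleep_demo = merge_mood_and_sleep(mood_rows, sleep_rows)
print(mood_and_sleep_demo)
```

Merging on both keys at once avoids the explicit loop over participants, since rows can only pair up when the `beiwe` ID and the date both match.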

<a id='visualize'></a>
# Visualization and Analysis

In [45]:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns