In [1]:
import warnings
warnings.filterwarnings('ignore')

# Activity Comparison
In this notebook we compare the activity levels of participants across the UT1000, UT2000, and UTx000 studies.

In [1]:
import pandas as pd
pd.set_option('display.max_columns', 200)
import numpy as np

from datetime import datetime, timedelta

import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.colors import ListedColormap, LinearSegmentedColormap
import seaborn as sns
import matplotlib.dates as mdates

from scipy import stats
from scipy.stats import linregress

# Table of Contents
1. [Data Import](#data_import)
    1. [Target Data: Fitbit Sleep](#targets)
    2. [Feature Data: Fitbit Activity](#features)
2. [Pre-Processing](#preprocessing)
    1. [Target Data](#target_data)
        1. Summary
        1. Scaling
    2. [Feature Data](#feature_data)
3. [Analyzing Relationships](#analysis)

In [2]:
import sys
sys.path.append('../')
%load_ext autoreload
%autoreload 2

<a id='data_import'></a>

# Constant Participation
We first check whether any participants took part in all three studies.

## Importing EIDs
The only identifier that links participants across studies _for certain_ is their EID. The REDCap database holds this information for each study.

In [25]:
eids_1 = pd.read_csv("../data/raw/ut1000/admin/participant_eids.csv")
eids_2 = pd.read_csv("../data/raw/ut2000/admin/participant_eids.csv")
eids_x = pd.read_csv("../data/raw/utx000/admin/participant_eids.csv")
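The comparisons in this section match on `uteid` exactly, so stray whitespace or a difference in letter case would silently hide a shared participant. Whether the EID files actually contain such inconsistencies is not something this notebook establishes; the following is only a minimal sketch of a defensive clean-up step. The helper name `normalize_eids` and the toy values are hypothetical and are not taken from the study data.

```python
import pandas as pd

def normalize_eids(df, col="uteid"):
    """Return a copy with the EID column stripped, lower-cased, and de-duplicated.

    Hypothetical helper: assumes the EID column holds string-like values.
    """
    out = df.copy()
    out[col] = out[col].astype(str).str.strip().str.lower()
    return out.drop_duplicates(subset=col).reset_index(drop=True)

# Toy values for illustration only (not real participants)
toy = pd.DataFrame({"uteid": [" ABC123", "abc123", "xyz789 "]})
clean = normalize_eids(toy)
print(clean["uteid"].tolist())
```

If a step like this were used, it would be applied to `eids_1`, `eids_2`, and `eids_x` before any merge.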

## Comparing Participants Across Studies
Now we can compare the EIDs across studies and see what overlap we have. 

In [26]:
from functools import reduce

# Inner join on uteid: keeps only participants who appear in both studies
dfs = [eids_1, eids_x]
one_and_x = reduce(lambda left, right: pd.merge(left, right, on='uteid'), dfs)
one_and_x

Unnamed: 0,email,uteid,first,last
0,aaron.alterman@utexas.edu,aja3295,Aaron,Alterman
1,eddie.flores183@gmail.com,ef8856,Eduardo,Flores
2,PHOEBEKHONG@YAHOO.COM,pgk286,Phoebe,Khong
3,RANE.PRAK@GMAIL.COM,rp33699,Rane,Prak


<div class="alert alert-block alert-success">
    
We have **four** participants who took part in both the UT1000 and UTx000 studies.
    
</div>

In [23]:
dfs = [eids_1,eids_2]
one_and_two = reduce(lambda left,right: pd.merge(left,right,on='uteid'), dfs)
one_and_two

Unnamed: 0,email_x,uteid,email_y
0,angelalaniz101@gmail.com,raa3436,angelalaniz101@gmail.com
1,alexalem3010@yahoo.com,aa33982,alexalem3010@yahoo.com
2,joea934@gmail.com,jaa4948,joea934@gmail.com
3,MIZZU9797@utexas.edu,mc63696,MIZZU9797@utexas.edu
4,jtdahill@utexas.edu,jtd2437,jtdahill@utexas.edu
5,ggarceau@yahoo.com,gmg2555,ggarceau@yahoo.com
6,ANALI0614@GMAIL.COM,cag5634,ANALI0614@GMAIL.COM
7,mariam.imam@gmail.com,mi4944,mariam.imam@gmail.com
8,gabriellajarwin@gmail.com,grj365,gabriellajarwin@gmail.com
9,niharikajetty@yahoo.com,nj4966,niharikajetty@yahoo.com


<div class="alert alert-block alert-success">
    
We have **27** participants who took part in both the UT1000 and UT2000 studies.
    
</div>

In [24]:
dfs = [eids_2,eids_x]
two_and_x = reduce(lambda left,right: pd.merge(left,right,on='uteid'), dfs)
two_and_x.head()

Unnamed: 0,email,uteid,first,last


<div class="alert alert-block alert-danger">
    
We have **NO** overlap between the UT2000 and UTx000 studies, which also means that no participant took part in all three studies.
    
</div>
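
The three pairwise merges above can also be expressed as set intersections, which makes the three-way check explicit. This is a sketch under assumptions, not the method used in this notebook: `eid_overlaps` is a hypothetical helper, and the toy frames stand in for `eids_1`, `eids_2`, and `eids_x`.

```python
from itertools import combinations

import pandas as pd

def eid_overlaps(studies, col="uteid"):
    """Map each pair of study names to the sorted list of EIDs they share."""
    ids = {name: set(df[col].dropna()) for name, df in studies.items()}
    return {(a, b): sorted(ids[a] & ids[b]) for a, b in combinations(ids, 2)}

# Toy frames for illustration only (not real participants)
toy_studies = {
    "ut1000": pd.DataFrame({"uteid": ["a1", "b2", "c3"]}),
    "ut2000": pd.DataFrame({"uteid": ["b2", "d4"]}),
    "utx000": pd.DataFrame({"uteid": ["a1", "e5"]}),
}

overlaps = eid_overlaps(toy_studies)

# Participants present in every study at once
in_all = set.intersection(*(set(df["uteid"]) for df in toy_studies.values()))

for pair, shared in overlaps.items():
    print(pair, len(shared))
print("all three:", len(in_all))
```

Unlike `pd.merge`, this returns only the shared IDs and drops the other columns, so it suits counting overlap rather than inspecting participant records.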

<a id='aggregate'></a>

# Comparing Aggregate Measures
We start by comparing aggregate measurements of activity between studies.

In [30]:
u1_raw = pd.read_csv("../data/raw/ut1000/fitbit/dailyActivity_merged.csv")
u2_raw = pd.read_csv("../data/raw/ut2000/fitbit/dailyActivity_merged.csv")
u2_raw.head()

Unnamed: 0,Id,ActivityDate,TotalSteps,TotalDistance,TrackerDistance,LoggedActivitiesDistance,VeryActiveDistance,ModeratelyActiveDistance,LightActiveDistance,SedentaryActiveDistance,VeryActiveMinutes,FairlyActiveMinutes,LightlyActiveMinutes,SedentaryMinutes,Calories,Floors,CaloriesBMR,MarginalCalories,RestingHeartRate
0,1025,10/29/2018,3989,2.95,2.95,0.0,0.4,0.42,2.11,0.0,5,6,100,1329,2028,1,1653,246,84.0
1,1025,10/30/2018,7633,5.64,5.64,0.0,0.45,0.71,4.43,0.04,6,12,176,900,2416,7,1653,481,81.0
2,1025,10/31/2018,5497,4.06,4.06,0.0,0.53,0.09,3.43,0.0,7,3,184,671,2436,11,1653,464,79.0
3,1025,11/1/2018,8534,6.31,6.31,0.0,0.9,0.47,4.89,0.05,12,9,215,878,2560,18,1653,561,78.0
4,1025,11/2/2018,6512,4.81,4.81,0.0,0.0,0.0,4.8,0.0,0,0,211,637,2346,15,1653,434,80.0


In [29]:
ux_raw = pd.read_csv("../data/processed/fitbit-daily-ux_s20.csv")
ux_raw.head()

Unnamed: 0,timestamp,calories,bmr,steps,distance,sedentary_minutes,lightly_active_minutes,fairly_active_minutes,very_active_minutes,calories_from_activities,bmi,fat,weight,food_calories_logged,water_logged,beiwe
0,2020-05-13,2781.0,1876.0,9207,4.396294,1241,70,118,11,1097.0,23.754,0.0,180.0,0.0,0.0,hfttkth7
1,2020-05-14,3727.0,1876.0,15207,7.261114,614,263,134,23,2234.0,23.754,0.0,180.0,0.0,0.0,hfttkth7
2,2020-05-15,3909.0,1876.0,14556,8.028501,577,205,57,108,2381.0,23.754,0.0,180.0,0.0,0.0,hfttkth7
3,2020-05-16,3927.0,1876.0,18453,8.74867,760,176,24,151,2364.0,23.754,0.0,180.0,0.0,0.0,hfttkth7
4,2020-05-17,4180.0,1876.0,15425,7.973149,605,207,50,131,2652.0,23.754,0.0,180.0,0.0,0.0,hfttkth7
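
The raw UT2000 export and the processed UTx000 file use different column names for the same quantities (for example `TotalSteps` versus `steps`), so a direct comparison needs a shared schema first. The sketch below renames the overlapping columns to the UTx000 names. The mapping is inferred from the two outputs shown above, `harmonize_raw` is a hypothetical helper, and it assumes the UT1000 file has the same layout as the UT2000 one. Distance units are not stated in either file and may differ, so `distance` should be checked before it is compared.

```python
import pandas as pd

# Raw export name -> processed UTx000 name (inferred from the displayed headers)
COLUMN_MAP = {
    "ActivityDate": "timestamp",
    "Calories": "calories",
    "TotalSteps": "steps",
    "TotalDistance": "distance",
    "SedentaryMinutes": "sedentary_minutes",
    "LightlyActiveMinutes": "lightly_active_minutes",
    "FairlyActiveMinutes": "fairly_active_minutes",
    "VeryActiveMinutes": "very_active_minutes",
}

def harmonize_raw(df, study):
    """Rename raw columns to the shared schema and tag rows with the study name."""
    out = df.rename(columns=COLUMN_MAP)[list(COLUMN_MAP.values())].copy()
    # Raw dates look like 10/29/2018 (month/day/year)
    out["timestamp"] = pd.to_datetime(out["timestamp"], format="%m/%d/%Y")
    out["study"] = study
    return out

# First row of the u2_raw output above, reduced to the mapped columns
toy_raw = pd.DataFrame({
    "ActivityDate": ["10/29/2018"],
    "Calories": [2028],
    "TotalSteps": [3989],
    "TotalDistance": [2.95],
    "SedentaryMinutes": [1329],
    "LightlyActiveMinutes": [100],
    "FairlyActiveMinutes": [6],
    "VeryActiveMinutes": [5],
})

tidy = harmonize_raw(toy_raw, "ut2000")
print(tidy.columns.tolist())
```

With all three studies in this shape, the frames could be stacked with `pd.concat` and grouped by `study` to compute aggregate measures.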


# Comparing UT1000 to UTx000
We now look at the four participants who were in both the UT1000 and UTx000 studies.