# Importing Data from Fitbit 
Fitbit offers an export option for all of a user's data. 
This data is exported in multiple folders, each containing multiple JSON and .csv files. 
For this project, I am interested in the following data: 
- bpm 
- sleep type (deep, light, REM) and length of each sleep type
- sleep score 
- step data
- active minutes (light, moderate, and very active minutes) 
- resting heartrate
- outside temperature at bedtime (10pm local) 

Each section ends with a two-row sample of the resulting dataframe. 

All of the data is time-based. The goal is to reduce each data type to a summary for each day. 
For example, the bpm data is recorded every 5 seconds, and I will reduce it to a daily summary. 

NOTE: All date / time columns will be converted to datetime datatypes if needed. 

All of the data is then pickled (as loading from the pickle will be much faster than re-reading the JSON). 
Resulting files used in the analysis will be: 
- bpm.pkl 
- sleep_detail.pkl
- sleep_score.pkl
- sedentary_minutes.pkl
- lightly_active.pkl
- moderately_active.pkl
- very_active.pkl


** When you're all done, make sure each dataframe has the expected number of rows and the expected datatypes. 
Make sure each pickled file exists, and also that it's being sourced NOT from the test dir. 
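
A minimal sketch of how these final checks could look. The helper names `check_frame` and `missing_pickles`, the toy dataframe, and the temporary directory are assumptions made for illustration, not part of the project. Passing the directory explicitly makes it obvious which folder the pickles are sourced from.

```python
import os
import tempfile

import pandas as pd
from pandas.api import types as ptypes


def check_frame(df, min_rows, dtype_checks):
    """Return a list of problems found in df (an empty list means all checks passed)."""
    problems = []
    if len(df) < min_rows:
        problems.append(f"expected at least {min_rows} rows, found {len(df)}")
    for column, is_expected_dtype in dtype_checks.items():
        if column not in df.columns:
            problems.append(f"missing column {column!r}")
        elif not is_expected_dtype(df[column]):
            problems.append(f"column {column!r} has unexpected dtype {df[column].dtype}")
    return problems


def missing_pickles(filenames, directory):
    """Return the names in filenames that do not exist as files in directory."""
    return [name for name in filenames
            if not os.path.isfile(os.path.join(directory, name))]


# Toy dataframe standing in for one of the real tables
toy = pd.DataFrame({
    "dateTime": pd.to_datetime(["2022-03-10", "2022-03-11"]),
    "value": [10, 20],
})
problems = check_frame(
    toy,
    min_rows=2,
    dtype_checks={
        "dateTime": ptypes.is_datetime64_any_dtype,
        "value": ptypes.is_integer_dtype,
    },
)

# Pickle the toy frame into a throwaway directory and look for two files there
with tempfile.TemporaryDirectory() as tmp:
    toy.to_pickle(os.path.join(tmp, "toy.pkl"))
    missing = missing_pickles(["toy.pkl", "absent.pkl"], tmp)

print(problems, missing)
```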

In [236]:
import os
import pandas as pd
import numpy as np 
import datetime as dt 

### IMPORT FROM JSON FUNCTION
This function will help import and concatenate the many JSON files that comprise each data type. 

In [297]:
def import_data_from_dir(file_prefix, directory):
    """Reads JSON file(s) in a folder and returns a single dataframe. 
    Takes strings of file_prefix and directory as input. 
    """
    dfs = []
    for file in os.listdir(directory):
        if file_prefix in file: 
            dfs.append(pd.read_json(f"{directory}/{file}"))
    return pd.concat(dfs)
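
To see what the function returns, here is a small sketch that runs it on throwaway files in a temporary directory. The file names and values are made up for the example, and the function is repeated (with `os.path.join` building the path) so the block runs on its own.

```python
import json
import os
import tempfile

import pandas as pd


def import_data_from_dir(file_prefix, directory):
    """Reads JSON file(s) in a folder and returns a single dataframe."""
    dfs = []
    for file in os.listdir(directory):
        if file_prefix in file:
            dfs.append(pd.read_json(os.path.join(directory, file)))
    return pd.concat(dfs)


# Made-up file contents: two matching files and one that should be ignored
toy_files = {
    "steps-2022-01-01.json": [{"dateTime": "2022-01-01 07:00:00", "value": 1},
                              {"dateTime": "2022-01-01 07:01:00", "value": 2}],
    "steps-2022-01-02.json": [{"dateTime": "2022-01-02 07:00:00", "value": 3},
                              {"dateTime": "2022-01-02 07:01:00", "value": 4}],
    "sleep-2022-01-01.json": [{"dateTime": "2022-01-01 23:00:00", "value": 99}],
}

with tempfile.TemporaryDirectory() as tmp:
    for name, records in toy_files.items():
        with open(os.path.join(tmp, name), "w") as handle:
            json.dump(records, handle)
    combined = import_data_from_dir("steps-", tmp)

print(len(combined))             # rows from the two matching files only
print(combined.index.is_unique)  # each file restarts its index at 0
```

Two things are worth noting. `file_prefix in file` is a substring match, so the prefix can appear anywhere in the file name. Also, because every file brings its own 0-based index, the concatenated dataframe has repeated index values.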

To do: 
(1) Keep tables separate until I get a daily summary for each activity. 
Tables:
- bpm
- sleep levels (done)
- sleep score (done)
- steps
- sedentary minutes (done)
- lightly active minutes (done)
- moderately active minutes (done)
- very active minutes (done)
- resting heartrate (done)
- temperature 
- etc.?

Merge all the tables together using date as the primary key. 
Daily sleep will be the response variable. 
Everything else will be explanatory variables. 
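
A rough sketch of what that merge could look like once every table has one row per day. The three toy dataframes and the shared `date` column name are assumptions for illustration. The sleep table goes first and left joins are used, so every day with a sleep record (the response variable) is kept even when an explanatory variable is missing.

```python
from functools import reduce

import pandas as pd

# Hypothetical daily summaries, each keyed by a 'date' column
sleep_daily = pd.DataFrame({
    "date": pd.to_datetime(["2022-03-10", "2022-03-11"]),
    "minutes_asleep": [408, 423],
})
steps_daily = pd.DataFrame({
    "date": pd.to_datetime(["2022-03-10", "2022-03-11"]),
    "steps": [9000, 12000],
})
bpm_daily = pd.DataFrame({
    "date": pd.to_datetime(["2022-03-10"]),
    "average_bpm": [61.5],
})

# Merge the tables one after another on the date key
daily = reduce(
    lambda left, right: left.merge(right, on="date", how="left"),
    [sleep_daily, steps_daily, bpm_daily],
)
print(daily)
```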



### BPM
Import the bpm data from JSON. 
The JSON data contains a date field and a 'value' field. 
The 'value' field contains a dictionary with 'bpm' and 'confidence'. 
After the data is imported, the nested 'value' column is unnested. 
The index is also reset, as the index values are not unique.
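
The unnesting step can be sketched on a toy dataframe (the values are made up). The repeated index mimics what concatenating several files produces. The reset matters because `pd.concat(..., axis=1)` lines rows up by index, so both pieces need the same unique index.

```python
import pandas as pd

# Toy stand-in for the imported data, with a repeated index value
nested_toy = pd.DataFrame(
    {
        "dateTime": pd.to_datetime(
            ["2022-03-10 08:00:08", "2022-03-10 08:00:13", "2022-03-11 09:00:00"]
        ),
        "value": [
            {"bpm": 54, "confidence": 3},
            {"bpm": 53, "confidence": 2},
            {"bpm": 60, "confidence": 3},
        ],
    },
    index=[0, 1, 0],
)

# Give every row a unique position before combining columns
nested_toy = nested_toy.reset_index(drop=True)

# Turn the list of dictionaries into one column per key
exploded = pd.json_normalize(nested_toy["value"].tolist())

# Both pieces now share the same 0..n-1 index, so rows line up
flat = pd.concat([nested_toy["dateTime"], exploded], axis=1)
print(flat)
```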

The data is taken every 5 seconds. The data will be reduced to get daily values for max_bpm and average_bpm. 
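
A minimal sketch of that reduction, using a toy dataframe with the same columns as the bpm table. The name `bpm_daily` and the output column `date` are assumptions for the example.

```python
import pandas as pd

bpm_toy = pd.DataFrame({
    "dateTime": pd.to_datetime(
        ["2022-03-10 08:00:08", "2022-03-10 08:00:13", "2022-03-11 09:00:00"]
    ),
    "bpm": [54, 53, 60],
    "confidence": [3, 2, 3],
})

# Group by calendar day (time set to midnight) and summarise the bpm readings
bpm_daily = (
    bpm_toy.groupby(bpm_toy["dateTime"].dt.normalize())["bpm"]
    .agg(max_bpm="max", average_bpm="mean")
    .reset_index()
    .rename(columns={"dateTime": "date"})
)
print(bpm_daily)
```

Using `dt.normalize()` keeps the day as a datetime value, which should make a later merge with the other daily tables simpler than grouping on plain `date` objects.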

In [None]:
bpm_nested = import_data_from_dir('heart_rate-', '/Users/jackiekinsler/projects/sleep_analysis_py/physical_data/heart_rate')
bpm_nested.to_pickle("bpm_nested.pkl")

In [121]:
# Index needs to be reset as there are repeated values
bpm_nested.reset_index(inplace = True)

In [153]:
bpm_explode = pd.json_normalize(bpm_nested['value'])

In [157]:
# Here, two pieces are brought together: the dateTime column from bpm_nested, 
# and the two exploded columns (bpm, confidence) that make up bpm_explode 
bpm = pd.concat([bpm_nested['dateTime'], bpm_explode], axis = 1)

In [221]:
bpm.head(2)

Unnamed: 0,dateTime,bpm,confidence
0,2022-03-10 08:00:08,54,3
1,2022-03-10 08:00:13,53,2


In [159]:
bpm.to_pickle("bpm.pkl")

### SLEEP DETAIL
Import the detailed sleep data from JSON.  
The raw JSON has many columns, but the 'levels' column is perhaps the most interesting.  
The 'levels' column contains a dictionary of data about the amount of time spent in each sleep type.  
Sleep types include deep, wake, light, and REM. 

The 'levels' column will be unnested and added back to the dataframe.  
It is important to note that there may be multiple sleep entries for a given day (for example: if there was a long waking period in the middle of sleeping). 
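
One possible way to handle days with several sleep entries, sketched on made-up rows with the same column names as the sleep data. Whether to sum all entries or keep only the main sleep is a choice for the analysis, so both are shown.

```python
import pandas as pd

sleep_toy = pd.DataFrame({
    "dateOfSleep": pd.to_datetime(["2019-05-09", "2019-05-09", "2019-05-08"]),
    "minutesAsleep": [408, 45, 423],
    "mainSleep": [True, False, True],
})

# Find the days that have more than one sleep entry
entries_per_day = sleep_toy.groupby("dateOfSleep").size()
days_with_multiple = entries_per_day[entries_per_day > 1]

# Option 1: add up all entries for each day
total_per_day = sleep_toy.groupby("dateOfSleep")["minutesAsleep"].sum()

# Option 2: keep only the entry flagged as the main sleep
main_only = sleep_toy.loc[sleep_toy["mainSleep"]]

print(days_with_multiple)
print(total_per_day)
```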

In [None]:
sleep_nested = import_data_from_dir('sleep-', '/Users/jackiekinsler/projects/sleep_analysis_py/physical_data/Sleep')
sleep_nested.reset_index(inplace=True, drop=True)

In [286]:
levels = pd.json_normalize(sleep_nested['levels'])

In [287]:
# Here, desired columns from the sleep_nested dataframe and the levels dataframe are combined  
sleep_detail = pd.concat([sleep_nested.loc[:,['dateOfSleep','minutesAsleep','mainSleep']], levels.loc[:,'summary.deep.count':'summary.rem.thirtyDayAvgMinutes']], axis = 1)
sleep_detail['dateOfSleep'] = pd.to_datetime(sleep_detail['dateOfSleep'])
sleep_detail.to_pickle("sleep_detail.pkl")

In [288]:
sleep_detail.head(2)

Unnamed: 0,dateOfSleep,minutesAsleep,mainSleep,summary.deep.count,summary.deep.minutes,summary.deep.thirtyDayAvgMinutes,summary.wake.count,summary.wake.minutes,summary.wake.thirtyDayAvgMinutes,summary.light.count,summary.light.minutes,summary.light.thirtyDayAvgMinutes,summary.rem.count,summary.rem.minutes,summary.rem.thirtyDayAvgMinutes
0,2019-05-09,408,True,7.0,77.0,90.0,31.0,74.0,58.0,34.0,288.0,227.0,7.0,57.0,91.0
1,2019-05-08,423,True,5.0,119.0,89.0,30.0,69.0,58.0,29.0,215.0,227.0,8.0,128.0,89.0


### SLEEP SCORE
Sleep score comes in a .csv with seemingly one row per day (this still needs to be checked). 

To understand more about the sleep score: https://help.fitbit.com/articles/en_US/Help_article/2439.htm
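
A quick sketch of how the "one row per day" assumption could be checked. The toy rows and timestamps are made up; only the column names follow the sleep score file.

```python
import pandas as pd

score_toy = pd.DataFrame({
    "timestamp": ["2022-12-19T07:01:30Z", "2022-12-18T06:45:00Z", "2022-12-18T14:10:00Z"],
    "overall_score": [86, 84, 70],
})

# Drop the time of day so rows from the same day share one value
score_toy["timestamp"] = pd.to_datetime(score_toy["timestamp"]).dt.normalize()

# All rows that share their day with at least one other row
dupes = score_toy[score_toy["timestamp"].duplicated(keep=False)]

# True only if no day appears more than once
one_row_per_day = not score_toy["timestamp"].duplicated().any()
print(one_row_per_day)
print(dupes)
```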

In [276]:
sleep_score = pd.read_csv('/Users/jackiekinsler/projects/sleep_analysis_py/physical_data/Sleep/sleep_score.csv')
# Normalize the timestamp (set the time to midnight): the time of day is meaningless here, and this simplifies things later
sleep_score['timestamp'] = pd.to_datetime(sleep_score['timestamp']).dt.normalize()
sleep_score.to_pickle("sleep_score.pkl")

In [278]:
sleep_score.head(2)

Unnamed: 0,sleep_log_entry_id,timestamp,overall_score,composition_score,revitalization_score,duration_score,deep_sleep_in_minutes,resting_heart_rate,restlessness
0,39497287745,2022-12-19 00:00:00+00:00,86,22,22,42,102,53,0.080976
1,39483974879,2022-12-18 00:00:00+00:00,84,21,21,42,105,53,0.080969


### STEPS
Import steps data from JSON.  
Steps data is recorded at short intervals (one minute apart in the sample below). The number of steps for that period of time is recorded.   
The data will be reduced to the total number of steps for each day.   

In [None]:
step_detail = import_data_from_dir('steps-', '/Users/jackiekinsler/projects/sleep_analysis_py/physical_data/Physical_Activity')

In [303]:
step_detail.head(2)

Unnamed: 0,dateTime,value
0,2019-05-09 07:00:00,0
1,2019-05-09 07:01:00,0


In [290]:
step_detail.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1968017 entries, 0 to 15363
Data columns (total 2 columns):
 #   Column    Dtype         
---  ------    -----         
 0   dateTime  datetime64[ns]
 1   value     int64         
dtypes: datetime64[ns](1), int64(1)
memory usage: 45.0 MB


In [117]:
step_detail.loc[step_detail['value'] != 0]

Unnamed: 0,dateTime,value
2,2017-09-16 11:24:00,16
15,2017-09-16 11:37:00,41
21,2017-09-16 11:43:00,8
22,2017-09-16 11:44:00,2
28,2017-09-16 11:50:00,6
...,...,...
39133,2017-11-15 03:08:00,31
39140,2017-11-15 03:15:00,4
39145,2017-11-15 03:20:00,7
39159,2017-11-15 03:34:00,14


In [319]:
# Sum only the step counts per calendar day; the dates become the index of the result
test = step_detail.groupby(step_detail['dateTime'].dt.date)[['value']].sum()

In [322]:
test.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1893 entries, 2017-07-19 to 2022-12-22
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   value   1893 non-null   int64
dtypes: int64(1)
memory usage: 29.6+ KB


In [321]:
test['dateTime']

KeyError: 'dateTime'
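
The `KeyError` appears because the groupby moves the dates into the index, so there is no 'dateTime' column left to select. A small sketch on toy data (the values are made up) shows two ways to get the dates back.

```python
import pandas as pd

steps_toy = pd.DataFrame({
    "dateTime": pd.to_datetime(
        ["2019-05-09 07:00:00", "2019-05-09 07:01:00", "2019-05-10 07:00:00"]
    ),
    "value": [5, 7, 11],
})

# Total steps per calendar day; the grouping key becomes the index
daily_steps = steps_toy.groupby(steps_toy["dateTime"].dt.date)["value"].sum()

# Option 1: read the dates straight from the index
dates = daily_steps.index

# Option 2: turn the index back into a regular 'dateTime' column
daily_steps_df = daily_steps.reset_index()
print(daily_steps_df)
```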

### ACTIVITY MINUTES

In [298]:
# Importing activity minutes 

sedentary_minutes = import_data_from_dir('sedentary_minutes', '/Users/jackiekinsler/projects/sleep_analysis_py/physical_data/Physical_Activity')
lightly_active = import_data_from_dir('lightly_active', '/Users/jackiekinsler/projects/sleep_analysis_py/physical_data/Physical_Activity')
moderately_active = import_data_from_dir('moderately_active', '/Users/jackiekinsler/projects/sleep_analysis_py/physical_data/Physical_Activity')
very_active = import_data_from_dir('very_active_minutes', '/Users/jackiekinsler/projects/sleep_analysis_py/physical_data/Physical_Activity')

# Sort ascending by date
sedentary_minutes.sort_values(by='dateTime', inplace=True)
lightly_active.sort_values(by='dateTime', inplace=True)
moderately_active.sort_values(by='dateTime', inplace=True)
very_active.sort_values(by='dateTime', inplace=True)

# Pickle the data for future use 
sedentary_minutes.to_pickle("sedentary_minutes.pkl")
lightly_active.to_pickle("lightly_active.pkl")
moderately_active.to_pickle("moderately_active.pkl")
very_active.to_pickle("very_active.pkl")

In [299]:
very_active.head(2)

Unnamed: 0,dateTime,value
0,2017-07-18,0
1,2017-07-19,0


### RESTING HEARTRATE

In [300]:
# Importing resting heartrate data 
resting_heartrate_nested = import_data_from_dir('resting_heart_rate', '/Users/jackiekinsler/projects/sleep_analysis_py/physical_data/Physical_Activity')
# The data in this dataframe is nested, and only the last column ('value') has the needed data 
resting_heartrate = pd.json_normalize(resting_heartrate_nested['value'])
# 'date' comes in as a string, convert it to datetime 
resting_heartrate['date'] = pd.to_datetime(resting_heartrate['date'])
resting_heartrate.sort_values(by='date', inplace=True)
resting_heartrate.to_pickle("resting_heartrate.pkl")

In [301]:
resting_heartrate.head(2)

Unnamed: 0,date,value,error
1097,2017-07-20,56.0,100.0
1098,2017-07-21,55.524886,55.113122
