In this exercise, we'll use data from my first month with my Fitbit to determine whether time spent doing moderate and intense cardio affected my quality of sleep the following night.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime as dt
from scipy.stats import pearsonr  # scipy.stats.stats is a deprecated private namespace

# import the cardio activity data, which is separated by month
activeSepData = pd.read_csv('/Users/madelineyoung/Desktop/FitBitData/Active Zone Minutes - 2023-09-12.csv')
activeOctData = pd.read_csv('/Users/madelineyoung/Desktop/FitBitData/Active Zone Minutes - 2023-10-01.csv')

# make sure that these both have the same columns
# (a cell only displays its last expression, so print each one explicitly)
print(activeSepData.head())
print(activeOctData.head())

# concatenate the dataframes
activeData = pd.concat([activeSepData, activeOctData], ignore_index=True, sort=False)
print(activeData.head())
print(activeData.tail())

In [None]:
# import the sleep quality data
sleepData = pd.read_csv('/Users/madelineyoung/Desktop/FitBitData/sleepScore_Sep12-Oct11.csv')

sleepData.head()

We notice that the sleep dataset has one timestamp per day, and we are only interested in extracting the date from it. Let's see whether the September dates are formatted "09" or "9", so we can determine whether string indexing will work for extracting the date.

In [None]:
print(sleepData.tail())
sleepData.info()

Because single-digit months are formatted with two digits, it looks like we can use string indexing to extract the dates.

In [None]:
timestamps = sleepData['timestamp']
dates = []

for t in timestamps:
    year_month_day = t[0:10]
    dates.append(year_month_day)

sleepData['timestamp'] = dates

sleepData.head()

For the activity datasets, the timestamps correspond to each minute with cardio activity. Again, we are only interested in extracting the date.

In [None]:
activeData.rename(columns={'date_time': 'timestamp'}, inplace=True)

timestamps2 = activeData['timestamp']
dates = []

for t in timestamps2:
    year_month_day = t[0:10]
    dates.append(year_month_day)

activeData['timestamp'] = dates

activeData.head()
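
The two loops above slice the first ten characters of each timestamp. The same step can be sketched in vectorized form. This assumes the timestamps are strings that begin with `YYYY-MM-DD`; the sample values and the `sample` frame are made up for illustration and are not taken from the Fitbit export.

```python
import pandas as pd

# Hypothetical sample; the real export is only assumed to start with YYYY-MM-DD
sample = pd.DataFrame({'timestamp': ['2023-09-12T06:41:30Z', '2023-10-01T07:02:00Z']})

# .str[:10] applies the same slice as t[0:10] to every row at once
sample['date'] = sample['timestamp'].str[:10]

# Parsing the sliced string yields datetimes and raises if a row is malformed
sample['date_parsed'] = pd.to_datetime(sample['date'], format='%Y-%m-%d')
print(sample)
```

Parsing to datetimes at this point would also catch any row whose timestamp does not match the expected format, which plain slicing silently lets through.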

In [None]:
print(activeData['heart_zone_id'].unique())

Now that we have the timestamps formatted the same way in each dataset, let's sum the minutes spent in each zone per day. Above, we obtained the unique zone names to make sure we don't miss any categories.

In [None]:
unique_dates = activeData['timestamp'].unique()

activeDataSums = {'timestamp': [], 'fat_burn': [], 'cardio': [], 'peak': []}

for u in unique_dates:
    # sum the minutes for each zone (FAT_BURN, CARDIO, PEAK) on this date
    fat_burn_bool = (activeData['heart_zone_id'] == 'FAT_BURN') & (activeData['timestamp'] == u)
    fat_burn = 0 + sum(activeData[fat_burn_bool]['total_minutes'])
    cardio_bool = (activeData['heart_zone_id'] == 'CARDIO') & (activeData['timestamp'] == u)
    cardio = 0 + sum(activeData[cardio_bool]['total_minutes'])
    peak_bool = (activeData['heart_zone_id'] == 'PEAK') & (activeData['timestamp'] == u)
    peak = 0 + sum(activeData[peak_bool]['total_minutes'])
    activeDataSums['timestamp'].append(u)
    activeDataSums['fat_burn'].append(fat_burn)
    activeDataSums['cardio'].append(cardio)
    activeDataSums['peak'].append(peak)
    
activeDataSums = pd.DataFrame(data=activeDataSums)
print(activeDataSums.tail(10))

# check that all data is accounted for
sum1 = sum(activeData['total_minutes'])
sum2 = sum(activeDataSums['fat_burn']) + sum(activeDataSums['cardio']) + sum(activeDataSums['peak'])
print(sum1)
print(sum2)
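
The per-date loop above can also be written as a single `groupby`. The sketch below is one possible alternative, not the method used in this notebook; the `toy` frame is hypothetical and only mimics the three columns used above.

```python
import pandas as pd

# Hypothetical rows mimicking the columns used above
toy = pd.DataFrame({
    'timestamp': ['2023-09-12', '2023-09-12', '2023-09-12', '2023-09-13'],
    'heart_zone_id': ['FAT_BURN', 'FAT_BURN', 'CARDIO', 'PEAK'],
    'total_minutes': [1, 1, 1, 1],
})

# Sum minutes per (day, zone), then spread the zones into columns;
# a zone that is absent on a given day becomes 0 instead of NaN
toy_sums = (
    toy.groupby(['timestamp', 'heart_zone_id'])['total_minutes']
       .sum()
       .unstack(fill_value=0)
       .reindex(columns=['FAT_BURN', 'CARDIO', 'PEAK'], fill_value=0)
       .rename(columns=str.lower)
       .reset_index()
)
toy_sums.columns.name = None
print(toy_sums)

# Same sanity check as above: no minutes are lost in the reshaping
assert toy_sums[['fat_burn', 'cardio', 'peak']].to_numpy().sum() == toy['total_minutes'].sum()
```

The `reindex` call fixes the column order and guarantees all three zone columns exist even if one zone never appears in the data.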

Below, we account for the fact that we want to compare each morning's sleep report against the exercise data from the previous day. We could either add a day to the exercise dates or subtract a day from the sleep dates. We do the former:

In [None]:
# adjust the dates so that the exercise data is one day ahead. This way when we call the 
# merge function, the previous day of exercise will correspond to each night of sleep
activeDataSums['timestamp'] = pd.to_datetime(activeDataSums['timestamp'], format='%Y-%m-%d')
activeDataSums['timestamp'] = activeDataSums['timestamp'] + dt.timedelta(days=1)

# convert the dates in sleepData to datetimes as well
sleepData['timestamp'] = pd.to_datetime(sleepData['timestamp'], format='%Y-%m-%d')

# merge the datasets on date (inner join: only dates present in both are kept)
combined = pd.merge(left=sleepData, right=activeDataSums, on='timestamp')
combined['total heart zones'] = combined['fat_burn'] + combined['cardio'] + combined['peak']
combined['upper heart zones'] = combined['cardio'] + combined['peak']
combined.tail()
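
One thing worth checking here: `pd.merge` defaults to an inner join, so any night whose previous day has no rows in the activity data is dropped rather than counted as zero minutes. The sketch below shows one way to inspect that with `how='left'` and `indicator=True`. The `sleep` and `active` frames are hypothetical, and treating a missing day as zero minutes is an assumption about how the export works.

```python
import pandas as pd

# Hypothetical data: three nights of sleep, activity logged on only two days
sleep = pd.DataFrame({
    'timestamp': pd.to_datetime(['2023-09-13', '2023-09-14', '2023-09-15']),
    'overall_score': [80, 75, 78],
})
active = pd.DataFrame({
    'timestamp': pd.to_datetime(['2023-09-12', '2023-09-14']),
    'cardio': [12, 5],
})

# Shift exercise forward one day so it lines up with the next morning's report
active['timestamp'] = active['timestamp'] + pd.Timedelta(days=1)

# how='left' keeps every night; indicator=True adds a _merge column
# showing which rows found a matching activity date
merged = pd.merge(sleep, active, on='timestamp', how='left', indicator=True)

# Assumption: a day with no activity rows means zero minutes in that zone
merged['cardio'] = merged['cardio'].fillna(0)
print(merged)
```

Rows marked `left_only` in the `_merge` column are the nights an inner join would have removed.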

Now that the datasets are combined, let's compute the Pearson correlation coefficient between each Fitbit sleep evaluation category and two exercise measures: total minutes in all elevated heart rate zones, and minutes in the upper (cardio and peak) zones.

In [None]:
print('total heart rate zones and overall score:', pearsonr(combined['total heart zones'], combined['overall_score']))
print('upper heart rate zones and overall score:', pearsonr(combined["upper heart zones"], combined['overall_score']))
print('total heart rate zones and composition score:', pearsonr(combined['total heart zones'], combined['composition_score']))
print('upper heart rate zones and composition score:', pearsonr(combined["upper heart zones"], combined['composition_score']))
print('total heart rate zones and revitalization score:', pearsonr(combined['total heart zones'], combined['revitalization_score']))
print('upper heart rate zones and revitalization score:', pearsonr(combined["upper heart zones"], combined['revitalization_score']))
print('total heart rate zones and duration score:', pearsonr(combined['total heart zones'], combined['duration_score']))
print('upper heart rate zones and duration score:', pearsonr(combined["upper heart zones"], combined['duration_score']))
print('total heart rate zones and minutes in deep sleep:', pearsonr(combined['total heart zones'], combined['deep_sleep_in_minutes']))
print('upper heart rate zones and minutes in deep sleep:', pearsonr(combined["upper heart zones"], combined['deep_sleep_in_minutes']))
print('total heart rate zones and resting heart rate:', pearsonr(combined['total heart zones'], combined['resting_heart_rate']))
print('upper heart rate zones and resting heart rate:', pearsonr(combined["upper heart zones"], combined['resting_heart_rate']))
print('total heart rate zones and restlessness:', pearsonr(combined['total heart zones'], combined['restlessness']))
print('upper heart rate zones and restlessness:', pearsonr(combined["upper heart zones"], combined['restlessness']))

There was a negative correlation between my sleep composition score and time spent in upper heart zone ranges, and a negative correlation between time spent in all elevated heart zones and my revitalization score. There was also a positive correlation between time spent in all elevated heart zones and my restlessness.
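
Because `pearsonr` returns both the coefficient and a p-value, it can help to collect every pair into one table and sort by p-value before deciding which correlations to trust. The sketch below uses a synthetic `demo` frame filled with random numbers as a stand-in for `combined`, so its output says nothing about the real data. With fourteen tests on roughly a month of nights, a few small p-values can be expected by chance alone.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Synthetic stand-in for `combined`; values are random, not real Fitbit data
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'total heart zones': rng.integers(0, 90, size=30),
    'upper heart zones': rng.integers(0, 40, size=30),
    'overall_score': rng.integers(60, 95, size=30),
    'restlessness': rng.random(30),
})

rows = []
for zone in ['total heart zones', 'upper heart zones']:
    for metric in ['overall_score', 'restlessness']:
        # pearsonr's result unpacks into (coefficient, p-value)
        r, p = pearsonr(demo[zone], demo[metric])
        rows.append({'zones': zone, 'metric': metric, 'r': r, 'p': p})

# One row per pair, smallest p-value first
results = pd.DataFrame(rows).sort_values('p').reset_index(drop=True)
print(results)
```

Extending the `metric` list to the remaining sleep columns would cover the same fourteen pairs as the print statements above.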

Let's plot these datasets to visualize these findings.

In [None]:
# the three panels below share a 7x2 grid; the stray 4x2 subplot calls
# only created empty, overlapping axes and have been removed
plt.figure(figsize=(12, 26))

plt.subplot(7,2,1)
plt.scatter(combined['total heart zones'], combined['revitalization_score'])
plt.xlabel('Minutes in all zones')
plt.ylabel('revitalization_score')

plt.subplot(7,2,2)
plt.scatter(combined['total heart zones'], combined['restlessness'])
plt.xlabel('Minutes in all zones')
plt.ylabel('restlessness')

plt.subplot(7,2,3)
plt.scatter(combined["upper heart zones"], combined['composition_score'])
plt.xlabel('Minutes in upper zones')
plt.ylabel('composition_score')

It's possible that a few outlying data points are skewing these results, especially for revitalization score vs. minutes in all elevated zones and composition score vs. minutes in upper zones. I'll need to rerun these tests when I have more data, and look for any inflection points where the trend reverses beyond a certain number of total minutes in elevated heart rate zones.
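
One simple way to gauge how much a single point drives a correlation is to recompute Pearson's r with each observation left out in turn. The sketch below is a rough illustration of that idea, not an analysis of the Fitbit data: `leave_one_out_r` is a hypothetical helper, and the `x` and `y` values are made up.

```python
import numpy as np

def leave_one_out_r(x, y):
    """Pearson r recomputed with each observation removed in turn."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~np.eye(len(x), dtype=bool)  # row i is True everywhere except position i
    return np.array([np.corrcoef(x[m], y[m])[0, 1] for m in keep])

# Made-up numbers: five clustered points plus one extreme point at the end
x = np.array([10, 12, 11, 13, 12, 60])
y = np.array([80, 78, 81, 79, 80, 60])

full_r = np.corrcoef(x, y)[0, 1]
loo = leave_one_out_r(x, y)
print(round(full_r, 2), np.round(loo, 2))
```

If dropping one night changes r substantially, as it does for the last point in this toy example, that night deserves a closer look before drawing conclusions.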