## Load Libraries

In [1]:
import pandas as pd
import numpy as np
from pandas import Timestamp
import os
from datetime import datetime, timedelta

## Download raw data

* Log in to the **Garmin Connect** web app with your own credentials.
* The web app lets you download day-by-day data in 7-day windows. In our case, 6 files were downloaded: 1 for the week before the trip, 4 covering the days of the trip, and 1 for the week after the trip.
* [This](https://connect.garmin.com/modern/report/60/wellness/last_seven_days) is the download page for the **resting heart rate** data and [this](https://connect.garmin.com/modern/report/26/wellness/last_seven_days) is the one for the **sleep** data.

## Load data files for both sleep and RHR


In [25]:
# Read the data directories and save the file names in two lists
path_to_sleep = 'new_data/Garmin_Sleep/'
path_to_rhr = 'new_data/Garmin_HeartRate/'

# os.listdir returns entries in arbitrary order, so sort to keep the weeks in sequence
csv_files_sleep = sorted(single_csv for single_csv in os.listdir(path_to_sleep) if single_csv.endswith('.csv'))
csv_files_rhr = sorted(single_csv for single_csv in os.listdir(path_to_rhr) if single_csv.endswith('.csv'))

In [26]:
# Check if filenames are parsed correctly
print(csv_files_sleep)
print(csv_files_rhr)

['1_SLEEP_DURATION_3006_0607.csv', '2_SLEEP_DURATION_0707_1307.csv', '3_SLEEP_DURATION_1407_2007.csv', '4_SLEEP_DURATION_2107_2707.csv', '5_SLEEP_DURATION_2807_0308.csv', '6_SLEEP_DURATION_0408_1008.csv']
['1_RESTING_HEART_RATE_3006_0607.csv', '2_RESTING_HEART_RATE_0707_1307.csv', '3_RESTING_HEART_RATE_1407_2007.csv', '4_RESTING_HEART_RATE_2107_2707.csv', '5_RESTING_HEART_RATE_2807_0308.csv', '6_RESTING_HEART_RATE_0408_1008.csv']
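The visual check above could also be automated. The sketch below assumes that the trailing `DDMM_DDMM` token in each file name encodes the start and end dates of the week (an inference from the listing, not something Garmin documents), and verifies that the sleep and RHR lists cover the same weeks in the same order. The helper `week_key` and the shortened lists are hypothetical names used only for illustration.

```python
import re

# File names copied from the printed listing above (first two weeks only).
sleep_files = ['1_SLEEP_DURATION_3006_0607.csv', '2_SLEEP_DURATION_0707_1307.csv']
rhr_files = ['1_RESTING_HEART_RATE_3006_0607.csv', '2_RESTING_HEART_RATE_0707_1307.csv']

def week_key(file_name):
    """Return the trailing 'DDMM_DDMM' token (assumed to be start_end dates)."""
    match = re.search(r'(\d{4}_\d{4})\.csv$', file_name)
    if match is None:
        raise ValueError('unexpected file name: ' + file_name)
    return match.group(1)

# The two lists should describe the same weeks, in the same order.
sleep_weeks = [week_key(f) for f in sleep_files]
rhr_weeks = [week_key(f) for f in rhr_files]
weeks_match = sleep_weeks == rhr_weeks
print(weeks_match)
```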


# Build the sleep dataframe

In [9]:
# Sleep df
# Read every weekly file and concatenate them in a single call
df_sleep = pd.concat([pd.read_csv(os.path.join(path_to_sleep, file_name))
                      for file_name in csv_files_sleep])

In [10]:
df_sleep

Unnamed: 0,Sleep Time,Hrs,Hrs.1
0,Sat,6.0,6:02 hrs
1,Sun,5.3,5:15 hrs
2,Mon,6.8,6:45 hrs
3,Tue,9.8,9:47 hrs
4,Thu,10.6,10:35 hrs
0,Sat,5.2,5:13 hrs
1,Sun,7.6,7:38 hrs
2,Mon,3.7,3:39 hrs
3,Tue,5.9,5:52 hrs
4,Thu,9.1,9:07 hrs


As far as we can see, Garmin's sleep data is missing some days (the output above has no Wednesday or Friday rows), and at first glance the recorded sleeping hours do not look very reliable. We should probably use the data from the **MiFit** I was also wearing during the trip. The problem is that **Xiaomi** doesn't provide a web app for downloading data, so the only option is to create the data files manually, in a format similar to the one provided by **Garmin**, and then parse them the same way.

To avoid keeping useless duplicate files, I edited the existing files in place, so the cell below reads the same file names again, this time with different contents.

In [30]:
# Sleep df (Manually Modified Version)
# Read every weekly file and concatenate them in a single call
df_sleep = pd.concat([pd.read_csv(os.path.join(path_to_sleep, file_name))
                      for file_name in csv_files_sleep])

In [31]:
df_sleep

Unnamed: 0,day,sleep_duration,deep,light,awake
0,Jun 30,8:51 hrs,3:35 hrs,5:16 hrs,0:00 hrs
1,Jul 1,8:43 hrs,4:08 hrs,4:35 hrs,0:00 hrs
2,Jul 2,5:52 hrs,2:40 hrs,3:12 hrs,0:00 hrs
3,Jul 3,5:52 hrs,2:09 hrs,3:43 hrs,0:00 hrs
4,Jul 4,10:25 hrs,4:11 hrs,6:14 hrs,0:00 hrs
5,Jul 5,1:01 hrs,0:00 hrs,1:01 hrs,0:00 hrs
6,Jul 6,10:31 hrs,3:46 hrs,6:45 hrs,0:00 hrs
0,Jul 7,6:47 hrs,1:29 hrs,5:18 hrs,0:00 hrs
1,Jul 8,8:20 hrs,2:21 hrs,5:59 hrs,0:00 hrs
2,Jul 9,7:07 hrs,2:12 hrs,4:55 hrs,0:00 hrs


## Useful functions

In [3]:
# Modify the date to look like the rest
def dayTransformer(s):
    # Map three-letter month abbreviations to two-digit month numbers
    months = {'Jan': '01', 'Feb': '02', 'Mar': '03', 'Apr': '04',
              'May': '05', 'Jun': '06', 'Jul': '07', 'Aug': '08',
              'Sep': '09', 'Oct': '10', 'Nov': '11', 'Dec': '12'}

    parts = s.split(' ')
    month = parts[0]
    day = parts[1]
    year = '2017'  # the source files omit the year, so it is hard-coded

    # Unknown month strings pass through unchanged
    month = months.get(month, month)

    # Pad single-digit days with a leading zero
    return year + '-' + month + '-' + day.zfill(2)
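As an alternative sketch, the same conversion can be delegated to the standard library's `datetime.strptime`. The function name `day_transformer_strptime` and its `year` parameter are hypothetical; the year still has to be supplied because the files omit it. Note that `%b` matches month abbreviations in the current locale, so this assumes an English locale.

```python
from datetime import datetime

def day_transformer_strptime(s, year=2017):
    """Convert a string such as 'Jul 1' to 'YYYY-MM-DD' for the given year."""
    # Parse month, day and year together so that dates like 'Feb 29'
    # are validated against the intended year rather than the default 1900.
    parsed = datetime.strptime('{} {}'.format(s, year), '%b %d %Y')
    return parsed.strftime('%Y-%m-%d')

print(day_transformer_strptime('Jul 1'))
```

Unlike the lookup-based version, this raises a `ValueError` on an unrecognised month or an impossible day, which may be preferable for hand-typed files.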

In [4]:
# Remove 'hrs' from sleep_duration
def removeItem(s):
    hrs = s.split(' ')[0]
    hr = hrs.split(':')[0]
    mm = hrs.split(':')[1]
    if len(hr)<2:
        hr = '0'+hr
    return hr+':'+mm

In [5]:
# Transform hh:mm to minutes
def hoursToMins(s):
    hr = int(s.split(':')[0])
    mm = int(s.split(':')[1])
    
    # Return an integer so the column can be used in numeric operations
    return hr*60 + mm
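`removeItem` and `hoursToMins` process one string at a time. A vectorized sketch of the same idea, using `Series.str.extract` on the raw `'H:MM hrs'` strings, is shown below; the `durations` sample is copied from values in the sleep table, and the variable names are illustrative only.

```python
import pandas as pd

# Raw duration strings in the same format as the sleep files.
durations = pd.Series(['8:51 hrs', '10:25 hrs', '1:01 hrs'])

# Capture hours and minutes as two columns (labelled 0 and 1), then cast to int.
parts = durations.str.extract(r'(\d+):(\d+)').astype(int)

# Total minutes as a numeric Series.
sleep_min = parts[0] * 60 + parts[1]
print(sleep_min.tolist())
```

The same expression could be applied to each duration column in turn, which avoids the intermediate zero-padded string step.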

### Modify sleep data

In [6]:
# Drop first line
df = df.iloc[1:].reset_index()

# Rename the columns
df.rename(columns={'level_1': 'sleep_min','level_0': 'day','Sleep Time': 'sleep_duration' }, inplace=True)

# Transform day to (YYYY-MM-DD) format
df['day'] = df['day'].apply(dayTransformer)

# Remove the 'hrs' string and zero-pad the hours
df['sleep_duration'] = df['sleep_duration'].apply(removeItem)

# Create a column holding the total number of minutes
df['sleep_min'] = df['sleep_duration'].apply(hoursToMins)

In [7]:
df

Unnamed: 0,day,sleep_min,sleep_duration
0,2017-04-22,488,08:08
1,2017-04-23,684,11:24
2,2017-04-24,509,08:29
3,2017-04-25,517,08:37
4,2017-04-26,497,08:17
5,2017-04-27,455,07:35
6,2017-04-28,645,10:45
7,2017-04-29,672,11:12
8,2017-04-30,683,11:23
9,2017-05-01,627,10:27


# Build the RHR (resting HR) dataframe

In [32]:
# Resting HR df
# Read every weekly file and concatenate them in a single call
df_rhr = pd.concat([pd.read_csv(os.path.join(path_to_rhr, file_name))
                    for file_name in csv_files_rhr])

In [33]:
df_rhr

Unnamed: 0,day,bpm
0,Jun 30,72
1,Jul 1,79
2,Jul 2,72
3,Jul 3,66
4,Jul 4,73
5,Jul 5,71
6,Jul 6,70
0,Jul 7,69
1,Jul 8,65
2,Jul 9,68


### Modify HR data

In [9]:
# Drop first line
df_2 = df_2.iloc[1:].reset_index()

# Rename the columns
df_2.rename(columns={'index': 'day', 'Resting Heart Rate': 'rest_HR'}, inplace=True)

# Transform day to (YYYY-MM-DD) format
df_2['day'] = df_2['day'].apply(dayTransformer)

In [10]:
df_2

Unnamed: 0,day,rest_HR
0,2017-04-30,71
1,2017-05-07,68
2,2017-05-14,73


# Discussion

The plan from here on is to:
* Rename each column (except `day`) to `<column>_angelos`
* Merge the HR and sleep dataframes
* Then merge the Andreas and Angelos dataframes into a single CSV with all data included
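The steps listed above might be sketched as follows. This is only an outline under stated assumptions: the cleaned frames are assumed to share a `day` column in `YYYY-MM-DD` format, the helper `add_person_suffix` is a hypothetical name, and the Andreas frame is an empty placeholder (NaN values) because his data is not shown here. The sample values are copied from the tables earlier in the notebook.

```python
import pandas as pd

# Tiny stand-ins for the cleaned frames; values copied from the tables above.
df_sleep = pd.DataFrame({'day': ['2017-06-30', '2017-07-01'],
                         'sleep_min': [531, 523]})
df_rhr = pd.DataFrame({'day': ['2017-06-30', '2017-07-01'],
                       'bpm': [72, 79]})

def add_person_suffix(df, person, key='day'):
    """Append '_<person>' to every column name except the join key."""
    return df.rename(columns=lambda c: c if c == key else '{}_{}'.format(c, person))

# Steps 1 and 2: suffix the columns, then join sleep and RHR on the day.
df_angelos = pd.merge(add_person_suffix(df_sleep, 'angelos'),
                      add_person_suffix(df_rhr, 'angelos'),
                      on='day', how='outer')

# Placeholder for the second participant: same layout, no real values.
df_andreas = pd.DataFrame({'day': ['2017-06-30', '2017-07-01'],
                           'sleep_min_andreas': [float('nan')] * 2,
                           'bpm_andreas': [float('nan')] * 2})

# Step 3: one row per day with both participants side by side.
combined = pd.merge(df_angelos, df_andreas, on='day', how='outer')

# to_csv without a path returns the CSV text; pass a file name to write to disk.
csv_text = combined.to_csv(index=False)
print(csv_text)
```

An outer join keeps days that appear in only one of the frames, which seems appropriate given that some days are missing from the tracker exports.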