# Final Year Project - Intelligent Health Monitoring System
#### by: Koo Chia Meng, A0165275Y

### FEATURES EXTRACTION (PERSON 2)

This notebook continues from the Data Acquisition and Cleaning stage for the heart rate, sleep and step activity data fetched from the respective APIs. The processed DFs are processed further here to derive more features for visualisation and, later, analysis.

Features such as the Heart Rate Variability (HRV) will also be derived here, with explanations below.

From the previous stage, the data and data frames are all saved under a subject object (OOP style) and dumped into a pickle file.


In [1]:
#Import relevant Dependencies

import datetime
import time
import pandas as pd # Pandas Version 1.0.1
import numpy as np
import matplotlib.pyplot as plt
import pickle
import os

import tsfresh # Dependency for Time Series Feature Extraction

In [2]:
# Needed to reuse the SUBJECT class created for OOP:
%run FYP_SUBJECTS_CLASS.ipynb

### FEATURES EXTRACTION

##### Features Derivable from the 5 sec interval IntraDay Heart Rates:

With the time series data, we can derive the following features:
- Estimated Heart Rate Variability (HRV)
    - Root Mean Square of Successive Differences (RMSSD)
    - Standard deviation of the R-R interval (SDRR)
    - Standard deviation of the averaged R-R interval per 5 min segment (SDARR, also known as SDANN)
    - Derivatives of R-R interval (1st & 2nd derivatives)

- P-P Interval per day
    - Max, Min and Mean
    - Difference in Max and Min
    - Standard deviation

- Heart Rate During Sleep (joining the sleep timestamps with the intraday HR timestamps; for visualisation only, as the Resting Heart Rate feature already covers this.)
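For reference, the HRV features above can be written in terms of the estimated pulse-to-pulse (PP) interval. This is a sketch of the definitions as the cells below compute them, assuming the heart rate $HR_i$ (bpm) has been resampled to a fixed step $\Delta t = 5$ s and that the first difference of each day is set to 0. Because the 5-second samples are heart-rate readings rather than individual beats, these are estimates. In the code, RMSSD is `RMS_PP_diff`, SDRR is `SD_PP_interval`, SDARR is `SD_PP_5min` (commented there as SDANN), and the derivatives are `pp_1st_derivative` and `pp_2nd_derivative`.

```latex
\begin{aligned}
PP_i &= \frac{60000}{HR_i}\ \text{ms}, \qquad i = 1, \dots, N \\
\Delta PP_i &= \lvert PP_i - PP_{i-1} \rvert, \qquad \Delta PP_1 = 0 \\
\mathrm{RMSSD} &= \sqrt{\frac{1}{N} \sum_{i=1}^{N} \Delta PP_i^{2}} \\
\mathrm{SDRR} &= \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} \left(PP_i - \overline{PP}\right)^{2}} \\
\mathrm{SDARR} &= \operatorname{SD}\left(\overline{PP}^{(1)}, \dots, \overline{PP}^{(K)}\right), \quad \overline{PP}^{(k)} = \text{mean } PP_i \text{ in 5-min segment } k \\
PP'_i &= \frac{\Delta PP_i}{\Delta t}, \qquad PP''_i = \frac{\lvert PP'_i - PP'_{i-1} \rvert}{\Delta t}, \quad PP''_1 = 0
\end{aligned}
```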

### Loading Subjects from PICKLE

In [5]:
current_directory = os.getcwd()
folder = "PERSON2" # CHANGE HERE
file = "PERSON2_DATA_31JUL.pickle" # CHANGE HERE
f_path = os.path.join(current_directory,folder,file)
                      
with open(f_path,'rb') as f:
    PERSON2 = pickle.load(file=f)

In [6]:
PERSON2.profile

{'age': 27, 'gender': 'MALE', 'height': 176.0, 'weight': 65.0}

### Loading Processed DF from PICKLE

Establish folder path:

In [7]:
current_directory = os.getcwd()
folder = os.path.join("PERSON2","31JUL") # CHANGE HERE
folder_path = os.path.join(current_directory,folder)


Load from Pickle file:

In [8]:
# LOAD FROM PICKLE:

unpickle = {}
unpickle_list = [
                # Daily Resting Heart Rate DF:
                'df_resting_hr',

                # Daily Heart Rate Zones (HR Activities) DF:
                'df_hr_calories',

                # Intra-day Heart Rate DF:
                'df_intraday_hr',

                # Sleep Durations DF
                'df_sleep_durations',

                # Detailed intra Sleep Stages DF
                'df_intra_sleep_stage',

                # Daily Step activities DF:
                'df_activity_steps']

for i in unpickle_list:
    file = os.path.join(folder_path, str(i) + '.pickle')
    with open(file,'rb') as f:
        unpickle[i] = pickle.load(file=f)

PERSON2.df_resting_hr = unpickle['df_resting_hr']
PERSON2.df_hr_calories = unpickle['df_hr_calories']
PERSON2.df_intraday_hr = unpickle['df_intraday_hr']
PERSON2.df_sleep_durations = unpickle['df_sleep_durations']
PERSON2.df_intra_sleep_stage = unpickle['df_intra_sleep_stage']
PERSON2.df_activity_steps = unpickle['df_activity_steps']
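The six assignment lines above can also be folded into the loading loop with `setattr`. This is a minimal sketch, assuming the subject object accepts arbitrary attributes; `Subject` and `load_dfs_into_subject` are hypothetical names used only for illustration, and the demo writes two tiny DataFrames to a temporary folder instead of the real `PERSON2/31JUL` folder.

```python
import os
import pickle
import tempfile

import pandas as pd


class Subject:
    """Hypothetical stand-in for the SUBJECT class from FYP_SUBJECTS_CLASS.ipynb."""


def load_dfs_into_subject(subject, folder_path, names):
    """Load '<name>.pickle' from folder_path and attach each object to subject as attribute <name>."""
    for name in names:
        with open(os.path.join(folder_path, name + '.pickle'), 'rb') as f:
            setattr(subject, name, pickle.load(f))
    return subject


# Usage on a throwaway folder holding two tiny DataFrames:
names = ['df_resting_hr', 'df_activity_steps']
with tempfile.TemporaryDirectory() as tmp:
    for name in names:
        with open(os.path.join(tmp, name + '.pickle'), 'wb') as f:
            pickle.dump(pd.DataFrame({'value': [60, 61]}), f)
    person = load_dfs_into_subject(Subject(), tmp, names)

print(person.df_resting_hr['value'].tolist())  # [60, 61]
```

Adding a new DF then only needs its name appended to the list.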


In [9]:
PERSON2.df_intraday_hr

d_time,time,value,date,timestamp,d_timestamp
2020-04-05 12:40:20,12:40:20,93,2020-04-05,-2208943180000000000,1586090420000000000
2020-04-05 12:40:25,12:40:25,70,2020-04-05,-2208943175000000000,1586090425000000000
2020-04-05 12:40:30,12:40:30,70,2020-04-05,-2208943170000000000,1586090430000000000
2020-04-05 12:40:45,12:40:45,70,2020-04-05,-2208943155000000000,1586090445000000000
2020-04-05 12:40:50,12:40:50,72,2020-04-05,-2208943150000000000,1586090450000000000
...,...,...,...,...,...
2020-07-31 23:58:26,23:58:26,52,2020-07-31,-2208902494000000000,1596239906000000000
2020-07-31 23:58:41,23:58:41,53,2020-07-31,-2208902479000000000,1596239921000000000
2020-07-31 23:58:46,23:58:46,54,2020-07-31,-2208902474000000000,1596239926000000000
2020-07-31 23:58:51,23:58:51,55,2020-07-31,-2208902469000000000,1596239931000000000


### Estimated Heart Rate Variability Parameters

#### Checking for days with fewer than 20 and fewer than 10 hours of records, then splitting the data into "AM" and "PM".

In [10]:
Subject = PERSON2 # Change here

# The following code will be repeated for the other subjects.

In [11]:
# Returns True if the records for a day span fewer than h hours
def check_hours_less_than(x, h):
    return (x.index[-1] - x.index[0]) < datetime.timedelta(hours=h)

# Less than Full Day records:
df_less_than_20Hr = Subject.df_intraday_hr.groupby('date').apply(check_hours_less_than,20)
count_less_than_20 = len(df_less_than_20Hr[df_less_than_20Hr])

# Less than Half Day records:
df_less_than_10Hr = Subject.df_intraday_hr.groupby('date').apply(check_hours_less_than,10)
count_less_than_10 = len(df_less_than_10Hr[df_less_than_10Hr])

print('Total days recorded: ', len(Subject.df_intraday_hr.groupby('date')))
print('Number of days of records less than 20 Hours: ', count_less_than_20)
print('Number of Full day of records: ', len(Subject.df_intraday_hr.groupby('date'))-count_less_than_20)
print('Number of days of records less than 10 Hours: ', count_less_than_10)

Total days recorded:  101
Number of days of records less than 20 Hours:  32
Number of Full day of records:  69
Number of days of records less than 10 Hours:  11
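The same check can be done without a Python function per group, by taking the time span of each day's timestamps in one pass. A sketch on toy data, assuming (as with `df_intraday_hr`) a DatetimeIndex plus a `date` column; `days_shorter_than` is a hypothetical helper, and it uses max minus min of the timestamps, which equals `index[-1] - index[0]` when each day is sorted.

```python
import pandas as pd


def days_shorter_than(df, hours):
    """Hypothetical helper: True for each date whose records span fewer than `hours` hours."""
    timestamps = df.index.to_series()
    # Group by the date values positionally, so duplicated timestamps do not matter here.
    grouped = timestamps.groupby(df['date'].to_numpy())
    span = grouped.max() - grouped.min()
    return span < pd.Timedelta(hours=hours)


# Toy data: one day spanning 21 hours, one spanning 3 hours.
idx = pd.to_datetime(['2020-04-05 00:00', '2020-04-05 21:00',
                      '2020-04-06 08:00', '2020-04-06 11:00'])
toy = pd.DataFrame({'value': [60, 62, 70, 71],
                    'date': ['2020-04-05', '2020-04-05', '2020-04-06', '2020-04-06']},
                   index=idx)

short_20 = days_shorter_than(toy, 20)
print(short_20.tolist())    # [False, True]
print(int(short_20.sum()))  # 1 day shorter than 20 hours
```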


In [12]:
# Sieving out Full Day Records:
df_HR_Full_Days = pd.concat(
    [Subject.df_intraday_hr.groupby('date').get_group(x) 
     for x in df_less_than_20Hr[~df_less_than_20Hr].index])

len(df_HR_Full_Days.groupby('date'))

69

In [13]:
# Sieving out Half Days Records:
df_HR_Half_Days = pd.concat(
    [Subject.df_intraday_hr.groupby('date').get_group(x) 
     for x in df_less_than_10Hr[~df_less_than_10Hr].index])
len(df_HR_Half_Days.groupby('date'))


90
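The full-day and half-day rows could also be selected with a boolean mask built from `isin`, instead of concatenating one `get_group` result per day. A sketch on a toy frame, assuming a boolean Series indexed by date of the kind `groupby('date').apply(check_hours_less_than, h)` returns. One practical difference: `pd.concat` raises a `ValueError` when given an empty list (no qualifying days), whereas the mask simply returns an empty DataFrame.

```python
import pandas as pd

# Toy frame with a 'date' column (stand-in for df_intraday_hr).
toy = pd.DataFrame({'value': [60, 62, 70, 71, 80],
                    'date': ['2020-04-05', '2020-04-05', '2020-04-06',
                             '2020-04-06', '2020-04-07']})

# Boolean Series indexed by date, True where the day is too short.
is_short = pd.Series({'2020-04-05': False, '2020-04-06': True, '2020-04-07': False})

# One vectorised mask keeps every row whose date is not flagged as short.
keep_dates = is_short[~is_short].index
df_kept = toy[toy['date'].isin(keep_dates)]

print(df_kept['value'].tolist())  # [60, 62, 80]
```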

In [14]:
# Splitting the half-day records into AM and PM, then keeping only the halves that span at least 5 hours:

def split_half_day(x,AM_or_PM):    
    condition = x.time <=  '12:00:00'
    if AM_or_PM == 'AM':
        return x[condition]
    
    elif AM_or_PM == 'PM':
        return x[~condition]

# Records in the AM (time from 00:00:00H to 12:00:00H)
# and using the function "check_hours_less_than" defined before:

df_HR_Half_AM = df_HR_Half_Days.groupby('date').apply(
    lambda x: split_half_day(x,'AM')).reset_index(level=0,drop=True) 
    #reset multi-index to reuse the previous func

df_HR_less_than_5 = df_HR_Half_AM.groupby('date').apply(lambda x: check_hours_less_than(x,5))

df_HR_Half_AM = pd.concat(
    [df_HR_Half_AM.groupby('date').get_group(x) 
     for x in df_HR_less_than_5[~df_HR_less_than_5].index])

# Repeating for PM (12:00:01H to 23:59:59H):

df_HR_Half_PM = df_HR_Half_Days.groupby('date').apply(
    lambda x: split_half_day(x,'PM')).reset_index(level=0,drop=True) 
    #reset multi-index to reuse the previous func

df_HR_less_than_5 = df_HR_Half_PM.groupby('date').apply(lambda x: check_hours_less_than(x,5))

df_HR_Half_PM = pd.concat(
    [df_HR_Half_PM.groupby('date').get_group(x) 
     for x in df_HR_less_than_5[~df_HR_less_than_5].index])

print('Number of days that meets the half day (AM) criteria :',len(df_HR_Half_AM.date.unique()))
print('Dates for AM :\n',df_HR_Half_AM.date.unique())
print('\nNumber of days that meets the half day (PM) criteria :',len(df_HR_Half_PM.date.unique()))
print('Dates for PM :\n',df_HR_Half_PM.date.unique())

Number of days that meets the half day (AM) criteria : 75
Dates for AM :
 ['2020-04-08' '2020-04-09' '2020-04-12' '2020-04-16' '2020-04-19'
 '2020-04-21' '2020-04-23' '2020-04-24' '2020-04-25' '2020-04-26'
 '2020-05-18' '2020-05-19' '2020-05-23' '2020-05-24' '2020-05-25'
 '2020-05-26' '2020-05-27' '2020-05-28' '2020-05-29' '2020-05-30'
 '2020-05-31' '2020-06-01' '2020-06-02' '2020-06-03' '2020-06-04'
 '2020-06-05' '2020-06-06' '2020-06-07' '2020-06-08' '2020-06-09'
 '2020-06-10' '2020-06-11' '2020-06-12' '2020-06-13' '2020-06-14'
 '2020-06-15' '2020-06-16' '2020-06-17' '2020-06-18' '2020-06-19'
 '2020-06-24' '2020-06-25' '2020-06-26' '2020-06-27' '2020-06-28'
 '2020-06-29' '2020-06-30' '2020-07-01' '2020-07-02' '2020-07-03'
 '2020-07-05' '2020-07-06' '2020-07-07' '2020-07-08' '2020-07-09'
 '2020-07-10' '2020-07-11' '2020-07-13' '2020-07-14' '2020-07-15'
 '2020-07-16' '2020-07-17' '2020-07-18' '2020-07-19' '2020-07-21'
 '2020-07-22' '2020-07-23' '2020-07-24' '2020-07-25' '2020-07-26'
 '
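`split_half_day` compares the `time` strings lexicographically, which works because they are zero-padded `HH:MM:SS`. An alternative is to compare the DatetimeIndex times directly. A sketch on toy records, assuming the index is a DatetimeIndex (as `d_time` is) and keeping the same boundary: AM runs up to and including 12:00:00.

```python
import datetime

import pandas as pd

# Toy records on one day (stand-in for df_HR_Half_Days); the index is a DatetimeIndex.
idx = pd.to_datetime(['2020-04-08 06:00:00', '2020-04-08 12:00:00',
                      '2020-04-08 12:00:05', '2020-04-08 18:30:00'])
toy = pd.DataFrame({'value': [60, 65, 66, 80]}, index=idx)

# Same boundary as split_half_day: AM is 00:00:00 up to and including 12:00:00.
am_mask = toy.index.time <= datetime.time(12, 0, 0)
df_am = toy[am_mask]
df_pm = toy[~am_mask]

print(df_am['value'].tolist())  # [60, 65]
print(df_pm['value'].tolist())  # [66, 80]
```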

In [15]:
# Assigning back to save the Full Day, AM, PM DFs under the subject

# Change the SUBJECT here:
PERSON2.df_HR_Full_Days = df_HR_Full_Days
PERSON2.df_HR_Half_AM = df_HR_Half_AM
PERSON2.df_HR_Half_PM = df_HR_Half_PM


#### Calculating the HRV statistics

In [21]:

def Derived_HRV(df):
    '''
    This function resamples each day of the intraday HR data frame (input df) to a fixed
    5-second interval and derives the PP intervals and their 1st and 2nd derivatives.
    It returns a tuple of (processed df, dict of days that failed to resample and their errors).
    '''
    resample_error = {}
    list_HRV_resampled_all = []

    df_HRV_grouped = df.groupby('date')

    day_list = df.date.unique().tolist()

    for day in day_list:
        try:
            df_process = df_HRV_grouped.get_group(day).resample('5S').ffill().bfill()
            # after the forward fill, the first record may still be missing, hence the backward fill.

            # convert heart rate (bpm) to PP interval (ms): 60000 ms per minute / beats per minute
            df_process['pp_interval'] = 60000.0 / df_process.value

            # absolute difference between successive PP intervals (first value set to 0)
            df_process['pp_diff'] = df_process.pp_interval.diff(periods=1).fillna(0).abs()

            # first derivative: pp_diff divided by 5 seconds, as the interval is now fixed
            df_process['pp_1st_derivative'] = df_process.pp_diff / 5.0

            # second derivative: absolute difference of the 1st derivative divided by 5 seconds
            df_process['pp_2nd_derivative'] = df_process.pp_1st_derivative.diff(periods=1).fillna(0).abs() / 5.0

            list_HRV_resampled_all.append(df_process)

        except Exception as e:
            resample_error[day] = str(e)

    # Concatenating all the processed days and returning the error information as a tuple:
    return (pd.concat(list_HRV_resampled_all), resample_error)
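To see what each derived column holds, here is a sketch of the same per-day steps on a made-up three-sample day. It computes the PP interval with plain vectorised division; the toy timestamps and heart rates are invented for illustration only.

```python
import pandas as pd

# Toy day: heart rate samples at irregular times (stand-in for one day of df_intraday_hr).
idx = pd.to_datetime(['2020-04-08 00:00:05', '2020-04-08 00:00:15', '2020-04-08 00:00:20'])
day = pd.DataFrame({'value': [60, 75, 50]}, index=idx)

# Resample onto a fixed 5-second grid; ffill carries the last reading forward into 00:00:10,
# bfill would cover any leading gap before the first reading.
grid = day.resample('5s').ffill().bfill()

# Estimated pulse-to-pulse interval in ms: 60000 ms per minute / beats per minute.
grid['pp_interval'] = 60000.0 / grid['value']
grid['pp_diff'] = grid['pp_interval'].diff().fillna(0).abs()
grid['pp_1st_derivative'] = grid['pp_diff'] / 5.0
grid['pp_2nd_derivative'] = grid['pp_1st_derivative'].diff().fillna(0).abs() / 5.0

print(grid['pp_interval'].tolist())        # [1000.0, 1000.0, 800.0, 1200.0]
print(grid['pp_1st_derivative'].tolist())  # [0.0, 0.0, 40.0, 80.0]
print(grid['pp_2nd_derivative'].tolist())  # [0.0, 0.0, 8.0, 8.0]
```

The ffilled 00:00:10 row is why repeated values (and zero differences) appear in the real output whenever the original samples are more than 5 seconds apart.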


In [22]:

Subject = PERSON2 # Change here


In [23]:
# NOTE: THIS STEP WILL TAKE A WHILE TO RUN.

# Calling the function to process the full-day and AM/PM dfs:
# Full Day:
df_derived_HRV_full_day,resample_error = Derived_HRV(Subject.df_HR_Full_Days)
df_derived_HRV_full_day

d_time,time,value,date,timestamp,d_timestamp,pp_interval,pp_diff,pp_1st_derivative,pp_2nd_derivative
2020-04-08 00:00:05,00:00:05,65.0,2020-04-08,-2.208989e+18,1.586304e+18,923.076923,0.000000,0.000000,0.000000
2020-04-08 00:00:10,00:00:05,65.0,2020-04-08,-2.208989e+18,1.586304e+18,923.076923,0.000000,0.000000,0.000000
2020-04-08 00:00:15,00:00:15,66.0,2020-04-08,-2.208989e+18,1.586304e+18,909.090909,13.986014,2.797203,0.559441
2020-04-08 00:00:20,00:00:20,72.0,2020-04-08,-2.208989e+18,1.586304e+18,833.333333,75.757576,15.151515,2.470862
2020-04-08 00:00:25,00:00:25,73.0,2020-04-08,-2.208989e+18,1.586304e+18,821.917808,11.415525,2.283105,2.573682
...,...,...,...,...,...,...,...,...,...
2020-07-31 23:58:35,23:58:26,52.0,2020-07-31,-2.208902e+18,1.596240e+18,1153.846154,0.000000,0.000000,1.709402
2020-07-31 23:58:40,23:58:26,52.0,2020-07-31,-2.208902e+18,1.596240e+18,1153.846154,0.000000,0.000000,0.000000
2020-07-31 23:58:45,23:58:41,53.0,2020-07-31,-2.208902e+18,1.596240e+18,1132.075472,21.770682,4.354136,0.870827
2020-07-31 23:58:50,23:58:46,54.0,2020-07-31,-2.208902e+18,1.596240e+18,1111.111111,20.964361,4.192872,0.032253


In [24]:
# NOTE: THIS STEP WILL TAKE A WHILE TO RUN.

# Half Day (AM):
df_derived_HRV_AM,resample_error_AM = Derived_HRV(Subject.df_HR_Half_AM)
df_derived_HRV_AM

d_time,time,value,date,timestamp,d_timestamp,pp_interval,pp_diff,pp_1st_derivative,pp_2nd_derivative
2020-04-08 00:00:05,00:00:05,65.0,2020-04-08,-2.208989e+18,1.586304e+18,923.076923,0.000000,0.000000,0.000000
2020-04-08 00:00:10,00:00:05,65.0,2020-04-08,-2.208989e+18,1.586304e+18,923.076923,0.000000,0.000000,0.000000
2020-04-08 00:00:15,00:00:15,66.0,2020-04-08,-2.208989e+18,1.586304e+18,909.090909,13.986014,2.797203,0.559441
2020-04-08 00:00:20,00:00:20,72.0,2020-04-08,-2.208989e+18,1.586304e+18,833.333333,75.757576,15.151515,2.470862
2020-04-08 00:00:25,00:00:25,73.0,2020-04-08,-2.208989e+18,1.586304e+18,821.917808,11.415525,2.283105,2.573682
...,...,...,...,...,...,...,...,...,...
2020-07-31 11:59:30,11:59:30,61.0,2020-07-31,-2.208946e+18,1.596197e+18,983.606557,16.393443,3.278689,0.000000
2020-07-31 11:59:35,11:59:35,62.0,2020-07-31,-2.208946e+18,1.596197e+18,967.741935,15.864622,3.172924,0.021153
2020-07-31 11:59:40,11:59:35,62.0,2020-07-31,-2.208946e+18,1.596197e+18,967.741935,0.000000,0.000000,0.634585
2020-07-31 11:59:45,11:59:35,62.0,2020-07-31,-2.208946e+18,1.596197e+18,967.741935,0.000000,0.000000,0.000000


In [25]:
# NOTE: THIS STEP WILL TAKE A WHILE TO RUN.

# Half Day (PM):
df_derived_HRV_PM,resample_error_PM = Derived_HRV(Subject.df_HR_Half_PM)
df_derived_HRV_PM

d_time,time,value,date,timestamp,d_timestamp,pp_interval,pp_diff,pp_1st_derivative,pp_2nd_derivative
2020-04-06 13:14:45,13:14:47,70.0,2020-04-06,-2.208941e+18,1.586179e+18,857.142857,0.000000,0.000000,0.000000
2020-04-06 13:14:50,13:14:47,70.0,2020-04-06,-2.208941e+18,1.586179e+18,857.142857,0.000000,0.000000,0.000000
2020-04-06 13:14:55,13:14:47,70.0,2020-04-06,-2.208941e+18,1.586179e+18,857.142857,0.000000,0.000000,0.000000
2020-04-06 13:15:00,13:14:57,80.0,2020-04-06,-2.208941e+18,1.586179e+18,750.000000,107.142857,21.428571,4.285714
2020-04-06 13:15:05,13:15:02,107.0,2020-04-06,-2.208941e+18,1.586179e+18,560.747664,189.252336,37.850467,3.284379
...,...,...,...,...,...,...,...,...,...
2020-07-31 23:58:35,23:58:26,52.0,2020-07-31,-2.208902e+18,1.596240e+18,1153.846154,0.000000,0.000000,1.709402
2020-07-31 23:58:40,23:58:26,52.0,2020-07-31,-2.208902e+18,1.596240e+18,1153.846154,0.000000,0.000000,0.000000
2020-07-31 23:58:45,23:58:41,53.0,2020-07-31,-2.208902e+18,1.596240e+18,1132.075472,21.770682,4.354136,0.870827
2020-07-31 23:58:50,23:58:46,54.0,2020-07-31,-2.208902e+18,1.596240e+18,1111.111111,20.964361,4.192872,0.032253


In [26]:
print("There were errors in resampling the following days in the Full Day DF :")
for x,y in resample_error.items():
    print(x,':',y)

print("\nThere were errors in resampling the following days in the AM DF :")
for x,y in resample_error_AM.items():
    print(x,':',y)
    
print("\nThere were errors in resampling the following days in the PM DF :")
for x,y in resample_error_PM.items():
    print(x,':',y)

There were errors in resampling the following days in the Full Day DF :
2020-04-12 : cannot reindex a non-unique index with a method or limit

There were errors in resampling the following days in the AM DF :
2020-04-12 : cannot reindex a non-unique index with a method or limit
2020-04-16 : cannot reindex a non-unique index with a method or limit
2020-06-19 : cannot reindex a non-unique index with a method or limit

There were errors in resampling the following days in the PM DF :
2020-04-12 : cannot reindex a non-unique index with a method or limit
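The resampling errors above mean that those days contain duplicated timestamps in the `d_time` index, which the forward-fill reindex cannot handle. One possible workaround, not applied in this notebook, is to drop duplicated index entries before resampling. A sketch on toy data, assuming that keeping the first reading at a duplicated timestamp is acceptable (averaging them with `day.groupby(level=0).mean()` would be another option).

```python
import pandas as pd

# Toy day with a duplicated timestamp at 10:00:05, the situation behind the errors above.
idx = pd.to_datetime(['2020-04-12 10:00:00', '2020-04-12 10:00:05',
                      '2020-04-12 10:00:05', '2020-04-12 10:00:15'])
day = pd.DataFrame({'value': [60, 62, 63, 65]}, index=idx)

# Keep the first reading at each duplicated timestamp, then resample as before.
deduped = day[~day.index.duplicated(keep='first')]
grid = deduped.resample('5s').ffill().bfill()

print(grid['value'].tolist())  # [60, 62, 62, 65]
```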


In [27]:
def Compiled_HRV_Stats(df_derived_HRV):
    '''
    This function compiles the stats like max,min,sd,mean and derivative of pp_intervals derived 
    from the previous HRV df, "df_derived_HRV"
    '''
    new_df = pd.DataFrame()
    
    # Max, Min, Mean, Range of PP intervals and Standard Deviation (SDRR)
    new_df['Max_PP_interval'] = df_derived_HRV.groupby('date').pp_interval.max()
    new_df['Min_PP_interval'] = df_derived_HRV.groupby('date').pp_interval.min()
    new_df['SD_PP_interval'] = df_derived_HRV.groupby('date').pp_interval.std() # SDRR
    new_df['Mean_PP_interval'] = df_derived_HRV.groupby('date').pp_interval.mean()
    new_df['PP_interval_range'] = new_df.Max_PP_interval - new_df.Min_PP_interval

    # Calculating SDANN
    # Resample to 5 minutes and calculate the Standard Deviation of the average of each group
    df_derived_HRV_5mins = df_derived_HRV.groupby('date').resample('5Min').mean()
    new_df['SD_PP_5min'] = df_derived_HRV_5mins.groupby('date').pp_interval.std()

    # Calculating RMSSD: root mean square of the successive PP interval differences
    df_derived_HRV['Squared_PP_diff'] = np.square(df_derived_HRV.pp_diff) # Square of diff (adds a column to the input df)
    new_df['RMS_PP_diff'] = df_derived_HRV.groupby('date').Squared_PP_diff.mean()
    new_df['RMS_PP_diff'] = np.sqrt(new_df.RMS_PP_diff)

    # Calculating the Derivatives of the PP Interval
    # the minimum of the derivatives is excluded, as it is always 0 (the first value of each day is filled with 0).
    new_df['Max_PP_1st_dydt'] = df_derived_HRV.groupby('date').pp_1st_derivative.max()
    new_df['SD_PP_1st_dydt'] = df_derived_HRV.groupby('date').pp_1st_derivative.std()

    #Second Derivatives
    new_df['Max_PP_2nd_dydt'] = df_derived_HRV.groupby('date').pp_2nd_derivative.max()
    new_df['SD_PP_2nd_dydt'] = df_derived_HRV.groupby('date').pp_2nd_derivative.std()

    return new_df
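A quick hand-checkable example of the two main statistics, SDRR (`SD_PP_interval`) and RMSSD (`RMS_PP_diff`), on a made-up two-day frame. This is a sketch that mirrors the formulas above without adding a helper column to the frame; the dates and PP values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy derived-HRV frame: two days, three 5-second samples each (values are made up).
idx = pd.to_datetime(['2020-04-08 00:00:00', '2020-04-08 00:00:05', '2020-04-08 00:00:10',
                      '2020-04-09 00:00:00', '2020-04-09 00:00:05', '2020-04-09 00:00:10'])
df = pd.DataFrame({'date': ['2020-04-08'] * 3 + ['2020-04-09'] * 3,
                   'pp_interval': [1000.0, 800.0, 1000.0, 900.0, 900.0, 900.0]},
                  index=idx)
# Successive absolute differences within each day, first value of the day set to 0.
df['pp_diff'] = df.groupby('date')['pp_interval'].diff().fillna(0).abs()

sdrr = df.groupby('date')['pp_interval'].std()                      # sample SD (ddof=1)
rmssd = np.sqrt(df['pp_diff'].pow(2).groupby(df['date']).mean())   # sqrt of mean squared diff

# 2020-04-08: SDRR = 200/sqrt(3), about 115.470; RMSSD = 200*sqrt(2/3), about 163.299
# 2020-04-09: both 0, since the interval never changes
print(sdrr.round(3).tolist(), rmssd.round(3).tolist())
```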


In [28]:
# Calling the function for Full day, AM and PM DFs:

# Full Day:
PERSON2.df_hrv_stats_full_day = Compiled_HRV_Stats(df_derived_HRV_full_day)
PERSON2.df_hrv_stats_full_day


date,Max_PP_interval,Min_PP_interval,SD_PP_interval,Mean_PP_interval,PP_interval_range,SD_PP_5min,RMS_PP_diff,Max_PP_1st_dydt,SD_PP_1st_dydt,Max_PP_2nd_dydt,SD_PP_2nd_dydt
2020-04-08,1363.636364,357.142857,177.634318,926.929113,1006.493506,170.298670,18.543749,56.680162,3.226561,11.336032,0.641768
2020-04-19,1428.571429,759.493671,177.886429,976.378766,669.077758,172.862153,18.936038,46.451613,3.508955,9.290323,0.683171
2020-04-23,1333.333333,304.568528,134.947045,1021.203943,1028.764805,127.862328,18.644547,54.421769,3.378597,10.884354,0.651528
2020-04-24,1463.414634,292.682927,231.299563,666.218883,1170.731707,223.255921,18.355795,54.728370,3.132521,10.945674,0.648022
2020-04-25,1428.571429,377.358491,212.062484,984.796456,1051.212938,206.373552,22.400786,53.798257,3.770579,9.364063,0.763926
...,...,...,...,...,...,...,...,...,...,...,...
2020-07-27,1224.489796,405.405405,144.380578,875.621372,819.084391,135.990810,17.399029,47.965116,2.973053,9.285714,0.590291
2020-07-28,1304.347826,540.540541,146.731985,960.120183,763.807286,138.823238,18.900641,70.033670,3.312163,13.256617,0.667132
2020-07-29,1224.489796,468.750000,158.327652,839.728338,755.739796,152.504730,15.970956,53.864169,2.769679,10.772834,0.542198
2020-07-30,1395.348837,560.747664,147.259411,912.955853,834.601174,139.225385,16.730060,58.748404,2.845715,11.035714,0.567696


In [29]:
# AM:
PERSON2.df_hrv_stats_AM = Compiled_HRV_Stats(df_derived_HRV_AM)
PERSON2.df_hrv_stats_AM

date,Max_PP_interval,Min_PP_interval,SD_PP_interval,Mean_PP_interval,PP_interval_range,SD_PP_5min,RMS_PP_diff,Max_PP_1st_dydt,SD_PP_1st_dydt,Max_PP_2nd_dydt,SD_PP_2nd_dydt
2020-04-08,1363.636364,357.142857,213.602546,973.185308,1006.493506,206.490802,21.880158,56.680162,3.763035,11.336032,0.742534
2020-04-09,1304.347826,526.315789,166.781255,1026.636080,778.032037,159.029452,20.570084,45.495094,3.551960,8.688903,0.700701
2020-04-19,1428.571429,759.493671,91.009870,1156.684710,669.077758,67.012670,27.908136,46.451613,4.644222,9.290323,0.871292
2020-04-21,1304.347826,645.161290,95.411922,942.527916,659.186536,76.969733,18.751873,48.176648,3.165411,9.635330,0.631839
2020-04-23,1333.333333,645.161290,87.656605,1088.090290,688.172043,67.592088,25.998233,54.421769,4.237544,10.884354,0.795851
...,...,...,...,...,...,...,...,...,...,...,...
2020-07-27,1224.489796,530.973451,157.617607,923.443172,693.516345,149.336810,20.207451,47.965116,3.402547,9.285714,0.670726
2020-07-28,1304.347826,652.173913,141.477715,1032.926606,652.173913,132.436265,22.507679,70.033670,3.908144,13.256617,0.778669
2020-07-29,1224.489796,468.750000,185.695940,884.264459,755.739796,180.185359,18.825232,48.970074,3.117066,9.794015,0.594125
2020-07-30,1395.348837,560.747664,146.869321,988.962768,834.601174,138.910756,18.858339,58.748404,3.165575,11.035714,0.631958


In [30]:
# PM:
PERSON2.df_hrv_stats_PM = Compiled_HRV_Stats(df_derived_HRV_PM)
PERSON2.df_hrv_stats_PM

date,Max_PP_interval,Min_PP_interval,SD_PP_interval,Mean_PP_interval,PP_interval_range,SD_PP_5min,RMS_PP_diff,Max_PP_1st_dydt,SD_PP_1st_dydt,Max_PP_2nd_dydt,SD_PP_2nd_dydt
2020-04-06,1153.846154,500.000000,115.618000,826.576017,653.846154,104.676728,13.600202,55.583127,2.428827,11.116625,0.515469
2020-04-07,1153.846154,521.739130,143.628426,748.063179,632.107023,135.943674,12.224098,38.787879,2.188668,7.757576,0.420454
2020-04-08,1250.000000,540.540541,112.253831,879.338872,709.459459,103.412068,14.319113,35.181237,2.491991,5.630336,0.502077
2020-04-09,1153.846154,588.235294,100.697748,885.844379,565.610860,87.033906,15.479583,29.605263,2.525850,5.734767,0.480275
2020-04-19,1333.333333,821.917808,62.401763,833.259218,511.415525,61.410178,5.010009,34.371643,0.997195,6.874329,0.200551
...,...,...,...,...,...,...,...,...,...,...,...
2020-07-27,1153.846154,405.405405,110.715117,827.693047,748.440748,101.190834,14.034240,38.577456,2.412382,7.326007,0.482252
2020-07-28,1250.000000,540.540541,111.477194,887.210573,709.459459,102.419618,14.412261,49.944506,2.514519,9.988901,0.520359
2020-07-29,1224.489796,582.524272,108.055887,795.158097,641.965524,101.312918,12.477462,53.864169,2.254542,10.772834,0.462824
2020-07-30,1176.470588,582.524272,101.318844,827.903827,593.946316,88.612196,14.956717,36.231884,2.521192,7.246377,0.495963


### SAVING THE PROCESSED DF TO PICKLE

Establish folder path:

In [31]:
current_directory = os.getcwd()
folder = os.path.join("PERSON2","31JUL") # CHANGE HERE
folder_path = os.path.join(current_directory,folder)


Save to Pickle file:

In [32]:
# DUMP THE DERIVED HRV DFs INTO PICKLE:

pickle_list = {
                'df_hrv_stats_full_day':PERSON2.df_hrv_stats_full_day,
                'df_hrv_stats_AM':PERSON2.df_hrv_stats_AM,
                'df_hrv_stats_PM':PERSON2.df_hrv_stats_PM

                }

# Save to Pickle:
for i in pickle_list.keys():
    file = os.path.join(folder_path, str(i) + '.pickle')
    with open(file,'wb') as f:
        pickle.dump(obj=pickle_list[i],file=f)
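pandas also provides `DataFrame.to_pickle` and `pd.read_pickle`, which wrap the open, dump and load steps in a single call each. A sketch of a save-and-reload check like the one in the next cell, using a temporary folder and made-up values instead of the project folder and the real stats DF.

```python
import os
import tempfile

import pandas as pd

# Stand-in for one of the stats frames, e.g. df_hrv_stats_full_day (values are made up).
stats = pd.DataFrame({'RMS_PP_diff': [18.5, 18.9]},
                     index=pd.Index(['2020-04-08', '2020-04-19'], name='date'))

with tempfile.TemporaryDirectory() as folder_path:
    path = os.path.join(folder_path, 'df_hrv_stats_full_day.pickle')
    stats.to_pickle(path)            # write the DataFrame as a pickle file
    reloaded = pd.read_pickle(path)  # read it back into memory

print(reloaded.equals(stats))  # True
```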


In [33]:

file = os.path.join(folder_path, 'df_hrv_stats_full_day.pickle')
with open(file,'rb') as f:
    test = pickle.load(file=f)
    
print(test)

            Max_PP_interval  Min_PP_interval  SD_PP_interval  \
date                                                           
2020-04-08      1363.636364       357.142857      177.634318   
2020-04-19      1428.571429       759.493671      177.886429   
2020-04-23      1333.333333       304.568528      134.947045   
2020-04-24      1463.414634       292.682927      231.299563   
2020-04-25      1428.571429       377.358491      212.062484   
...                     ...              ...             ...   
2020-07-27      1224.489796       405.405405      144.380578   
2020-07-28      1304.347826       540.540541      146.731985   
2020-07-29      1224.489796       468.750000      158.327652   
2020-07-30      1395.348837       560.747664      147.259411   
2020-07-31      1395.348837       458.015267      167.621873   

            Mean_PP_interval  PP_interval_range  SD_PP_5min  RMS_PP_diff  \
date                                                                       
2020-04-08     

### Next Step: Visualisation of Derived Features

This will be done in another Jupyter Notebook.