# Heart Rate Estimation using PPG (Random Forests classifier)

Download the data from [here](https://archive.ics.uci.edu/ml/datasets/PPG-DaLiA) and unzip it into the current folder. The per-subject folders (S1 to S15) must be in the same directory as this notebook.

The code below loads each subject's data and cuts the wrist PPG (BVP) signal into overlapping 512-sample windows for all subjects.

In [1]:
import pickle
import numpy as np
from tqdm import tqdm

WINDOW = 512  # samples per PPG window
STEP = 128    # samples between the starts of consecutive windows

lengths = []     # number of windows per subject
activities = []  # one activity ID per window
lines = []       # heart-rate labels, one per window
X = []           # PPG windows, each of length WINDOW
for i in tqdm(range(1, 16)):
    # Each subject's data is stored in S<i>/S<i>.pkl
    st = 'S' + str(i) + '/' + 'S' + str(i) + '.pkl'
    with open(st, 'rb') as f:
        temp = pickle.load(f, encoding='latin1')
    # Flatten the (n, 1) BVP array into a 1-D signal
    ppg = np.asarray(temp['signal']['wrist']['BVP']).ravel()
    n_windows = len(ppg) // STEP - 3
    lengths.append(n_windows)
    # The code assumes 8 activity samples per window step (32 per window)
    # and keeps the largest activity ID seen in each window.
    ac = temp['activity']
    for l in range(ac.shape[0] // 8 - 3):
        activities.append(ac[l*8:(l+4)*8].max())
    # Cut the PPG signal into overlapping windows
    for p in range(n_windows):
        X.append(ppg[p*STEP:p*STEP + WINDOW])
    lines.extend(temp['label'])


100%|██████████| 15/15 [01:33<00:00,  6.25s/it]
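
The windowing done in the loading loop can also be written without an explicit Python loop. The following is a minimal sketch on a synthetic signal, assuming the same 512-sample window and 128-sample step; `make_windows` is a hypothetical helper name and the ramp signal is a stand-in, not real BVP data.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(signal, window=512, step=128):
    """Return overlapping windows of `signal` as rows of a 2-D array."""
    signal = np.asarray(signal).ravel()
    # Build every window at stride 1, then keep every `step`-th one.
    return sliding_window_view(signal, window)[::step].copy()

# Synthetic stand-in for one subject's BVP channel (not real data).
demo_signal = np.arange(128 * 10, dtype=float)
demo_windows = make_windows(demo_signal)
print(demo_windows.shape)  # (7, 512)
```

For a signal of 1280 samples this yields 1280/128 - 3 = 7 windows, the same count the loading loop computes.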


In [2]:
import numpy as np
X=np.array(X)

In [3]:
Y = lines
Y = np.array(Y)

In [4]:
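from math import floor
y_class = []
sort_dict = {}  # class label -> class index
for rate in Y:
    fl = floor(rate/5)
    # Label each 5 bpm bin as "lower-upper"; using fl+1 for the upper edge
    # keeps exact multiples of 5 in a valid bin (e.g. 100 -> "100-105").
    st = str(fl*5) + "-" + str((fl+1)*5)
    y_class.append(st)
    # Bin 40-45 gets index 0, so the 30 bins up to 185-190 map to 0..29.
    sort_dict[st] = fl-8

y_class = np.array(y_class)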
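
The binning can also be done in a vectorized way. The sketch below assumes, as the cell above does, that heart rates fall between 40 and 190 bpm so that class indices run from 0 to 29. `hr_to_class` and the sample rates are hypothetical. Unlike a floor/ceil pair, it assigns an exact multiple of 5 to the bin that starts at that value.

```python
import numpy as np

def hr_to_class(rates, bin_width=5, lowest_bin_start=40):
    """Map heart rates (bpm) to integer bin indices and 'lo-hi' labels."""
    rates = np.asarray(rates, dtype=float)
    # Lower edge of the bin each rate falls into.
    lower = (np.floor(rates / bin_width) * bin_width).astype(int)
    idx = (lower - lowest_bin_start) // bin_width
    labels = np.array([f"{lo}-{lo + bin_width}" for lo in lower])
    return idx, labels

demo_idx, demo_labels = hr_to_class([42.3, 99.9, 100.0, 187.5])
print(demo_idx.tolist())     # [0, 11, 12, 29]
print(demo_labels.tolist())  # ['40-45', '95-100', '100-105', '185-190']
```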

In [5]:
np.unique(y_class)

array(['100-105', '105-110', '110-115', '115-120', '120-125', '125-130',
       '130-135', '135-140', '140-145', '145-150', '150-155', '155-160',
       '160-165', '165-170', '170-175', '175-180', '180-185', '185-190',
       '40-45', '45-50', '50-55', '55-60', '60-65', '65-70', '70-75',
       '75-80', '80-85', '85-90', '90-95', '95-100'], dtype='<U7')

In [6]:
y_one_hot = []
for rate in y_class:
    temp = np.zeros(30)
    temp[sort_dict[rate]] = 1
    y_one_hot.append(temp)

In [7]:
y_one_hot = np.array(y_one_hot)
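
If integer class indices are available, the same one-hot matrix can be built by indexing an identity matrix. This is a minimal sketch with made-up indices, not the notebook's data.

```python
import numpy as np

n_classes = 30
demo_class_idx = np.array([0, 11, 29])  # hypothetical bin indices
# Row i of the identity matrix is the one-hot vector for class i.
demo_one_hot = np.eye(n_classes)[demo_class_idx]
print(demo_one_hot.shape)  # (3, 30)

# argmax along each row recovers the original class index.
recovered = demo_one_hot.argmax(axis=1)
```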

X and y_one_hot hold the input and output data for training the model. The dataset contains 64697 samples, and each input sample has 512 values.

In [8]:
print(X.shape,y_one_hot.shape)

(64697, 512) (64697, 30)


The input and output data are split into training and test sets using sklearn's train_test_split, with 10% of the samples held out for testing.

In [9]:
import numpy as np
from sklearn import model_selection
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y_one_hot, test_size=0.10, random_state=42)
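
As a quick sanity check on what a 10% hold-out means for the sample count reported above, the sketch below runs the same split on zero-filled placeholder arrays with the same number of rows. The placeholder arrays and variable names are illustrative only.

```python
import numpy as np
from sklearn import model_selection

n_samples = 64697                   # sample count reported above
demo_X = np.zeros((n_samples, 1))   # placeholder features
demo_y = np.zeros((n_samples, 1))   # placeholder targets
tr_X, te_X, tr_y, te_y = model_selection.train_test_split(
    demo_X, demo_y, test_size=0.10, random_state=42)
print(tr_X.shape[0], te_X.shape[0])  # 58227 6470
```

These sizes are consistent with the accuracy ratios printed at the end of the notebook.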

We use RandomForestClassifier from sklearn's ensemble module to train the model, with 300 trees.

In [None]:
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators=300, random_state=0)
classifier.fit(X_train,y_train)
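
When RandomForestClassifier is given a 2-D one-hot target, scikit-learn fits it as a multi-output problem with one binary output per column, so predict returns one 0/1 value per class and a row can come back all zeros. The sketch below uses small random data rather than the PPG windows to contrast that with fitting on 1-D integer class indices. All names and sizes are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
demo_X = rng.normal(size=(200, 16))          # random stand-in features
demo_labels = rng.integers(0, 5, size=200)   # 5 hypothetical classes
demo_one_hot = np.eye(5)[demo_labels]

# 2-D one-hot target: multi-output fit, one prediction column per class.
multi = RandomForestClassifier(n_estimators=20, random_state=0)
multi.fit(demo_X, demo_one_hot)
multi_pred = multi.predict(demo_X)
print(multi_pred.shape)   # (200, 5)

# 1-D integer target: a single multiclass fit, one class per sample.
single = RandomForestClassifier(n_estimators=20, random_state=0)
single.fit(demo_X, demo_labels)
single_pred = single.predict(demo_X)
print(single_pred.shape)  # (200,)
```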

Dump the model with joblib so that it can be loaded later and used for evaluation.

In [11]:
import joblib
joblib.dump(classifier,"random_forests_classifer_model")

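Load the model using joblib. The file name passed to joblib.load must be one written by joblib.dump; the cell below loads train-classifier-300, which is not the name used in the dump cell above.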

In [13]:
classifier = joblib.load('train-classifier-300')
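
A minimal round-trip sketch of saving and reloading a model with joblib, using one path for both calls. It trains a tiny forest on random data and writes to a temporary directory; the file name `demo_model.joblib` is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
demo_X = rng.normal(size=(50, 4))       # random stand-in features
demo_y = rng.integers(0, 2, size=50)    # random binary labels
model = RandomForestClassifier(n_estimators=5, random_state=0).fit(demo_X, demo_y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "demo_model.joblib")  # hypothetical name
    joblib.dump(model, model_path)       # write the fitted model to disk
    restored = joblib.load(model_path)   # read it back from the same path

# The restored model should reproduce the original predictions.
same_predictions = np.array_equal(model.predict(demo_X), restored.predict(demo_X))
print(same_predictions)  # True
```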

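Compute the accuracy on the test set, and then on the training set, using the code below. A prediction counts as correct when the argmax of the predicted vector matches the argmax of the true one-hot vector.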

In [17]:
y_pred = classifier.predict(X_test)

In [18]:
counter = 0
for pred, actual in zip(y_pred,y_test):
    if np.argmax(pred)==np.argmax(actual):
        counter+=1
    
print(counter,counter/y_pred.shape[0])

5628 0.8698608964451314


In [19]:
y_pred = classifier.predict(X_train)

In [22]:
counter = 0
for pred, actual in zip(y_pred,y_train):
    if np.argmax(pred)==np.argmax(actual):
        counter+=1
    
print(counter,counter/y_pred.shape[0])

52404 0.8999948477510433
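
The counting loops above can be condensed into a vectorized helper. This is a sketch on a small made-up example; `argmax_accuracy` is a hypothetical name. With argmax, a predicted row that is all zeros is counted as class 0, so it helps to check how many such rows there are.

```python
import numpy as np

def argmax_accuracy(y_pred, y_true):
    """Return (number of hits, accuracy) comparing row-wise argmax."""
    y_pred = np.asarray(y_pred)
    y_true = np.asarray(y_true)
    hits = y_pred.argmax(axis=1) == y_true.argmax(axis=1)
    return int(hits.sum()), float(hits.mean())

demo_true = np.eye(3)[[0, 1, 2, 1]]
demo_pred = np.array([[1, 0, 0],
                      [0, 1, 0],
                      [0, 0, 0],   # all-zero row: argmax falls back to index 0
                      [0, 0, 1]])
demo_result = argmax_accuracy(demo_pred, demo_true)
print(demo_result)  # (2, 0.5)

# Count predicted rows in which no class was selected.
empty_rows = int((demo_pred.sum(axis=1) == 0).sum())
print(empty_rows)   # 1
```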
