
Heart Rate Detection Using Remote Photoplethysmography

David Haas, Spencer Mullinix, Hogan Pope

Fall 2020 ECE 4554/5554 Computer Vision: Course Project

Virginia Tech


Abstract

Currently, heart rate is an attribute that can be difficult to measure without being in close proximity to the patient. However, using modern computer vision techniques, a close approximation of heart rate can be obtained from nothing more than a live video feed. This paper details a remote photoplethysmography implementation that uses Fourier techniques and Independent Component Analysis to estimate a subject's BVP signal and, from it, their heart rate. We report moderate success, with a root mean square error of 6.76. Further work should explore real-time implementations of our algorithm, along with reducing the use of priors within our work.

Introduction

Finding a client's heart rate, whether for health, polygraph, or other reasons, is a task that classically requires close proximity. With recent developments in remote photoplethysmography, this is no longer necessarily the case. A stable version of this technology would enable better remote healthcare and improvements in other areas where working remotely can lower costs or increase availability. Our work requires only an RGB camera capable of recording video, since part of our goal is to make this capability available to as wide an array of people as possible, ideally deployable on nearly all modern laptops as well as smartphones. One way this problem has been approached in the past, particularly on smartphones, is through fingerprint scanners. A benefit of doing this entirely via camera is that, while fingerprint scanners are becoming increasingly common in smartphones, they are all but non-existent in laptops and many other devices that already have integrated cameras. Using only a camera therefore increases the range of devices that can be supported.

Approach

We implemented an image processing pipeline aimed at extracting a subject's blood volume pulse (BVP) signal and, from that, their pulse rate, using a technique called remote photoplethysmography (rPPG). The algorithm is fed a video of a subject and processes each frame to extract time-indexed RGB vectors. These vectors then pass through a pipeline of spectral and statistical analysis algorithms to extract a BVP signal and, from it, the subject's heart rate.

The spectral method we implemented is inspired by Poh et al. [1] and consists of roughly three stages: ROI detection, preprocessing and extraction, and pulse rate calculation. The first stage locates the subject's face so that the BVP signal can be measured. To maintain a robust sequence of measurements on a relatively static portion of the subject's face, we used the dlib facial landmark detector to extract the area between the subject's cheeks [2], illustrated in Figure 6. After the face has been segmented, each RGB channel in the face image is averaged, producing one measurement per channel per frame of the video and thus three signals. These signals are illustrated in Figure 1.
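To make this stage concrete, below is a minimal sketch of the per-frame extraction, assuming the standard 68-point dlib landmark model; the proportions used to shrink the jawline box toward the cheeks are illustrative placeholders, not the exact values from our pipeline.

```python
import cv2
import dlib
import numpy as np

# Assumes the standard dlib 68-point landmark model file is available locally.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def mean_rgb_per_frame(video_path):
    """Return an (n_frames, 3) array of mean R, G, B values over a cheek ROI."""
    cap = cv2.VideoCapture(video_path)
    samples = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector(gray, 0)
        if len(faces) == 0:
            continue  # skip frames where no face is detected
        shape = predictor(gray, faces[0])
        # Jawline landmarks are indices 0-16 in the 68-point model.
        jaw = np.array([(shape.part(i).x, shape.part(i).y) for i in range(17)])
        x0, x1 = jaw[:, 0].min(), jaw[:, 0].max()
        y0, y1 = jaw[:, 1].min(), jaw[:, 1].max()
        w, h = x1 - x0, y1 - y0
        # Shrink the jawline bounding box toward the skin below the eyes
        # (the 0.1 / 0.55 / 0.15 factors are placeholders for illustration).
        roi = frame[y0 + int(0.10 * h): y0 + int(0.55 * h),
                    x0 + int(0.15 * w): x1 - int(0.15 * w)]
        b, g, r = cv2.split(roi)          # OpenCV loads frames as BGR
        samples.append([r.mean(), g.mean(), b.mean()])
    cap.release()
    return np.array(samples)
```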

Because heart rate signals are non-stationary, we then detrended these signals using a smoothness priors approach (Figure 2) with a cutoff frequency of 0.33 Hz [3]. After the RGB signals have been detrended and z-normalized, we use Independent Component Analysis (ICA) to decompose them into three independent source signals, shown in Figure 3. To perform this step robustly and quickly, we opted to use scikit-learn's FastICA implementation [4]. ICA separates color variations due to BVP from variations caused by motion, lighting, or other sources. One of the returned components represents the fluctuations in color caused by variations in blood volume; this is assumed to be the component with the largest peak in its power spectrum. An example of an extracted BVP signal is shown in Figure 4.
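A condensed sketch of this stage is shown below. The detrending follows the smoothness-priors formulation of Tarvainen et al. [3]; the regularization parameter `lam` is a placeholder whose value, together with the frame rate, sets the effective cutoff frequency, so it would need to be tuned to match the 0.33 Hz cutoff we used.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve
from sklearn.decomposition import FastICA

def detrend_smoothness_priors(signal, lam=300.0):
    """Smoothness-priors detrending: z_detrended = z - (I + lam^2 D2' D2)^-1 z."""
    n = len(signal)
    eye = sparse.identity(n, format="csc")
    # Second-order difference operator D2, shape (n-2, n).
    d2 = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n), format="csc")
    trend = spsolve((eye + lam ** 2 * (d2.T @ d2)).tocsc(), signal)
    return signal - trend

def extract_bvp(rgb, lam=300.0):
    """Detrend, z-normalize, and unmix the three color traces; pick the BVP component."""
    detrended = np.column_stack(
        [detrend_smoothness_priors(rgb[:, c], lam) for c in range(3)]
    )
    z = (detrended - detrended.mean(axis=0)) / detrended.std(axis=0)
    sources = FastICA(n_components=3, random_state=0).fit_transform(z)
    # Choose the component with the largest peak in its power spectrum.
    power = np.abs(np.fft.rfft(sources, axis=0)) ** 2
    return sources[:, int(np.argmax(power.max(axis=0)))]
```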

We then filter the signal in the time and frequency domains with a 5-point moving-average filter and a Hamming-window bandpass filter whose cut-off frequencies depend on the user's reported state. We allow the user to choose between resting, recovery, and active, each of which has different cut-off frequencies that incorporate prior estimates of their heart rate. Once the BVP signal is calculated, we use the interbeat-interval estimation implementation described in van Gent et al. [5] to estimate the heart rate of the subject. We chose these authors' implementation because the core focus of our project is remotely estimating the BVP signal, not interbeat-interval estimation.
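The filtering step can be sketched as follows; the per-state cut-off bands in `STATE_BANDS` are illustrative placeholders rather than the exact priors used in our code.

```python
import numpy as np
from scipy.signal import firwin, filtfilt

# Illustrative cut-off bands in Hz for each activity state (placeholders).
STATE_BANDS = {"resting": (0.7, 2.0), "recovery": (1.0, 2.5), "active": (1.5, 4.0)}

def filter_bvp(bvp, fps, state="resting", numtaps=127):
    """5-point moving average followed by a Hamming-window FIR bandpass filter."""
    smoothed = np.convolve(bvp, np.ones(5) / 5, mode="same")
    low, high = STATE_BANDS[state]
    taps = firwin(numtaps, [low, high], pass_zero=False, window="hamming", fs=fps)
    # Zero-phase filtering; assumes the signal is longer than ~3 * numtaps samples.
    return filtfilt(taps, [1.0], smoothed)
```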

Experiments

Experimental Methodology

Our testing approach consists of two phases. The first phase consists of preliminary tests that establish the initial operating capability of our project. Phase one tests the algorithm on data from people of the same skin tone and gender, with no facial hair, under ideal lighting conditions, with the test subject facing the camera straight on from a fixed distance. This allows us to determine the algorithm's operating capacity before other variables are introduced. Ideally, the second phase of testing would have introduced the variables that were fixed in the first phase, changing each variable independently of the rest to isolate its effect. This stage could not be completed due to a lack of diverse data: the subjects available to us, for reasons explained in the data set section, were all white males between the ages of 18 and 25. One variable that could be added was facial hair.

This methodology would have allowed for complete testing of variables while isolating faults. The algorithm will be considered successful if it achieves an error within ±5% in phase one.

Data Sets

HCI tagging database: This database contains videos and images of 30 test subjects' reactions to stimuli, along with biometric data for each subject.

OSF rPPG: This dataset provides RGB images and videos tagged with the foreground and background of each image, as well as the biometrics of the people in it.

COHFACE dataset: This dataset consists of 160 minutes of video of 40 individuals of varying genders with tagged biometric data. The only downside to this database is that we would need assistance gaining access.

Collected Data: Eleven videos were taken of ten different people in ten different lighting conditions. Their heart rates were measured using established PPG algorithms, and these readings were used as ground truth for training.

Unfortunately, after further processing of the HCI data, including unpacking the EEG data, we discovered that while this database has a lot of useful biological information about patients, along with videos, it does not include the patients' heart rates, rendering it useless for our purposes. The other two major databases proved inaccessible given our lack of credentials as undergraduate students, so we had to rely entirely on the data we were able to collect ourselves. While useful, the PPG heart rate measurements we took of our subjects were only accurate to within two BPM, leading to potentially significant error in our data, as well as the danger of overfitting due to the small number of subjects. Additionally, due to COVID-19 restrictions during the time window we had to complete this project, we were unable to reach a diverse pool of subjects, limiting us to white males between the ages of 18 and 25.

Phase 1

Shown in Figure 7 is the mapping of our heart rate measurements to the ground truth values and the trend between them. Ideally, the trendline would follow y = x.

Figure 7 - Experimental Results

Additionally, we compared our method to other popular rPPG (remote photoplethysmography) methods, as seen in Figure 8 [6].

| Method | Standard Deviation | Mean Absolute Error | Root Mean Square Error |
|---|---|---|---|
| Poh2011 | 13.5 | - | 13.6 |
| CHROM | - | 13.49 | 22.36 |
| LI2014 | 6.88 | - | 7.62 |
| SAMC | 5.81 | 4.96 | 6.23 |
| SynRhythm | 10.88 | - | 11.08 |
| HR-CNN | - | 7.25 | 9.24 |
| DeepPhys | - | 4.57 | - |
| rPPGNet | 7.82 | 5.51 | 7.82 |
| Our Method | 4.26 | 5.25 | 6.76 |

Figure 8 - Error rates of our method compared with other common methods

Compared to many other methods, our method has outstanding results; however, it is important to note that, due to limiting factors, our data set was substantially smaller than those used to obtain the other values, consisting of only 10 subjects recorded under similar conditions. Even so, this does seem to indicate the effectiveness of our method.

The method we have presented has an error rate of approximately 7%. While this is higher than the goal of 5%, it still indicates that this is a solid approach and that, given more data, and specifically more accurate data, the approach could drop below 5% error.
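For reference, the three error metrics reported in Figure 8 can be computed from per-video heart rate estimates and ground-truth readings as follows; this is a hypothetical helper for illustration, not part of our released code.

```python
import numpy as np

def error_metrics(estimated_bpm, true_bpm):
    """Standard deviation of the error, mean absolute error, and RMSE."""
    err = np.asarray(estimated_bpm, dtype=float) - np.asarray(true_bpm, dtype=float)
    return {
        "sd": float(np.std(err)),
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
    }
```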

Special Test Cases

Test Case 1: Description

We performed an experiment measuring the effect of the subject's distance from the camera on the accuracy of the derived PPG signal, compared to a reference measurement from a pulse oximeter sensor. We believe that distance may influence accuracy because the pixel density of the face image decreases as the subject moves further from the camera. To test this, we recorded 30-second videos of the subject 0.5 meters, 1 meter, and 2 meters from the camera. Variables such as camera location, lighting, and video quality were held constant. We expected that the derived PPG signals of subjects further away would have a lower signal-to-noise ratio and thus be less accurate.

Test Case 1: Results

| Distance from Camera | 0.5 m | 1.0 m | 2.0 m |
|---|---|---|---|
| Mean Absolute Error | 4.99 | 2.1 | 8.8 |

There does not appear to be much of a difference within 1 meter, but there is a noticeable change beyond 1 meter.

Test Case 2: Description

An experiment was performed to determine the image quality necessary to obtain valid data, since a general weakness of image processing methods is their lack of robustness to lower-resolution images. The test involves downsampling the testing videos to different resolutions and taking accuracy metrics at each step. The expected outcome is decreased accuracy at lower resolutions; the goal is to determine how robust the algorithm is, as sketched below.
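A minimal sketch of the downsampling step, assuming OpenCV is used to rescale each frame (to a hypothetical target height) before it enters the rest of the pipeline:

```python
import cv2

def downsample_frame(frame, target_height):
    """Rescale a frame to a target height, preserving aspect ratio."""
    h, w = frame.shape[:2]
    scale = target_height / h
    # INTER_AREA is generally preferred for shrinking images.
    return cv2.resize(frame, (int(w * scale), target_height), interpolation=cv2.INTER_AREA)
```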

Test Case 2: Results

| Image Resolution | 540p | 720p | 1080p |
|---|---|---|---|
| Mean Absolute Error | 4.5 | 2.6 | 4.8 |

There appears to be a sweet spot for accuracy at 720p, where the method is more effective than at either 540p or 1080p. This was unexpected. A possible reason is that downsampling removes some noise while maintaining enough information; more testing would be needed to determine the cause.

Initial Experiments: Facial Recognition

The project revolves around facial recognition, so it was imperative that this portion be prioritized. Currently, the software takes as input XML data from the HCI Tagging Dataset. This data is then parsed for faces using OpenCV, and a bounding box is drawn around the center of the face. We chose to use the center of the face based on previous projects in the same field, as it seemed to provide good data while limiting outside factors. Once the bounding box is drawn, the average value of each color channel within the box is saved. This process is repeated for every frame of the video, and each color channel is saved as a signal after the video has finished processing, as shown below.

Figure 1 - Raw RGB Signals
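The sketch below illustrates this initial experiment. It assumes OpenCV's bundled Haar cascade for face detection, and the fraction used to crop toward the face center is a placeholder rather than the exact value from our code.

```python
import cv2

# OpenCV ships Haar cascade XML files with the opencv-python package.
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def channel_means(frame):
    """Mean R, G, B over a box around the center of the first detected face."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    cx, cy = x + w // 2, y + h // 2
    half = int(0.25 * min(w, h))  # placeholder crop fraction
    roi = frame[cy - half:cy + half, cx - half:cx + half]
    b, g, r = cv2.split(roi)      # OpenCV frames are BGR
    return r.mean(), g.mean(), b.mean()
```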

Initial Experiments: Signal Filtering

Detrended RGB signal data

Figure 2 - Detrended and normalized RGB signals

Independent component analysis (ICA) was performed on the color channel signals. ICA is a method of decomposing input signals into the independent components that make them up. The data above, labeled "Detrended RGB signal data," was input into an ICA algorithm, and the output is shown below.

Figure 3 - ICA components

Below is the ICA component that our algorithm determined to be the most likely BVP signal. The chosen signal has the largest magnitude within our bandpass frequencies, as discussed in the Approach section.

Figure 4 - The ICA component selected as our BVP Signal

After a signal is chosen from ICA, a bandpass filter is applied to eliminate high- and low-frequency noise. This is possible because heart rate typically falls between 0.7 and 4.0 Hz [1].

Figure 5 - Bandpass filtered BVP signal

Once the bandpass-filtered BVP signal is created, we use the heartpy library, which was designed for PPG-type signals. It uses the root mean square of successive differences (RMSSD) along with the standard deviation of successive differences (SDSD) to find the heart rate hidden in a signal [7].
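A hedged usage example of the heartpy call we rely on is shown below; `bvp_filtered` stands in for the bandpass-filtered BVP signal from the previous step, and a synthetic 72 bpm sinusoid is used only so the snippet runs on its own.

```python
import numpy as np
import heartpy as hp

# Synthetic stand-in for the filtered BVP signal: 30 s of a 1.2 Hz (72 bpm) wave at 30 fps.
fps = 30.0
t = np.arange(0, 30, 1 / fps)
bvp_filtered = np.sin(2 * np.pi * 1.2 * t)

working_data, measures = hp.process(bvp_filtered, sample_rate=fps)
print(measures["bpm"])     # estimated heart rate in beats per minute
print(measures["rmssd"])   # root mean square of successive differences
```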

Qualitative results

We detail many of our results in the experiments section. However, there are a few results worth discussing in this section as well.

Figure 6 shows a visualization we developed that runs while our algorithm extracts RGB data. The red box shows the ROI from which the average red, green, and blue signals are extracted. We calculate this ROI using the dlib library [8] for facial detection and facial landmark prediction: we take landmarks identified along the jawline and then shrink the region to an area below the eyes that is all skin, which should eliminate as much noise as possible.

Figure 6 - A screenshot of the algorithm extracting color data from the subject

Conclusion

This report has described an overview of our approach for remote photoplethysmography. We utilize a variety of signal processing and statistical techniques to remotely extract a heart rate signal from a patient. The basic pipeline is as follows: extract color signals from a face within a video, filter them and separate them into independent source signals, determine which source signal is the subject’s BVP, and estimate their heart rate from that signal. To accomplish this, we wrote a program in Python that incorporates much of our own work mixed with supporting libraries from reputable authors.

Although our implementation is complete, it is not perfect. Future work should investigate estimating heart rate without the use of priors about the subject’s activity levels, as one’s heart rate may not always fall within the bounds suggested by their activity level. Furthermore, our algorithm processes video at about 3 frames per second, so it would be difficult to implement it online. Real-time heart rate estimates are of much use to physicians and other concerned parties, so this would be a useful expansion of our work.

Citations

1. Poh, M.-Z., McDuff, D.J., Picard, R.W.: Advancements in noncontact, multiparameter physiological measurements using a webcam. IEEE Trans. Biomed. Eng. 58, 7–11 (2011)

2. Davis E. King. Dlib-ml: A Machine Learning Toolkit. Journal of Machine Learning Research 10, pp. 1755-1758, 2009

3. M. P. Tarvainen, P. O. Ranta-Aho, and P. A. Karjalainen, “An advanced detrending method with application to HRV analysis,” IEEE Trans. Biomed. Eng., vol. 49, no. 2, pp. 172–175, Feb. 2002.

4. Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.

5. van Gent, P., Farah, H., van Nes, N. and van Arem, B., 2019. Analysing Noisy Driver Physiology Real-Time Using Off-the-Shelf Sensors: Heart Rate Analysis Software from the Taking the Fast Lane Project. Journal of Open Research Software, 7(1), p.32. DOI: http://doi.org/10.5334/jors.241

6. Zitong Yu, Wei Peng, Xiaobai Li, Xiaopeng Hong, Guoying Zhao, "Remote Heart Rate Measurement from Highly Compressed Facial Videos: An End-to-End Deep Learning Solution with Video Enhancement," https://arxiv.org/pdf/1907.11921.pdf

7. van Gent, Paul & Farah, Haneen & Nes, Nicole & Arem, B.. (2018). Heart Rate Analysis for Human Factors: Development and Validation of an Open Source Toolkit for Noisy Naturalistic Heart Rate Data.

8. Davis E. King. Dlib-ml: A Machine Learning Toolkit. Journal of Machine Learning Research 10, pp. 1755-1758, 2009


© David Haas, Spencer Mullinix, Hogan Pope