ecg_peaks (method='neurokit') sometimes fails on ECGs with missing data #823
Comments
Interesting. @JanCBrammer what do you think about allowing for median instead of mean? I reckon we should probably do some benchmarking and some tests to see whether the median option is as good as the mean in most cases |
@DominiqueMakowski, exactly my thoughts. |
Do we have benchmarking set up? |
Not really (aside from this old thread). It seems like we've all done bits of work on our side at some point; it would be really cool to assemble a "task force" project (also with @cbrnr, as they did some cool work) for a cross-software benchmarking initiative... |
If you worry about median speed, the bottleneck package helps a lot |
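To illustrate the suggestion above, here is a minimal sketch of using bottleneck's median as a drop-in replacement for np.median, with a fallback if the (optional) package is not installed. The array of QRS lengths is made-up example data, not taken from the issue:

```python
import numpy as np

# bottleneck provides a faster C-accelerated median for large float arrays;
# fall back to NumPy if the optional package is absent.
try:
    import bottleneck as bn
    fast_median = bn.median
except ImportError:
    fast_median = np.median

rng = np.random.default_rng(0)
qrs_lengths = rng.integers(20, 40, size=100_000).astype(float)  # toy data
result = fast_median(qrs_lengths)
```

Both functions return the same value; only the runtime differs on large inputs.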
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
Hi!

Context

I identified a similar problem while working with long-term ECGs (several hours of recording) with missing data: QRS detection fails or does not even terminate. I try to reproduce it here with simulated data:

import neurokit2 as nk
import numpy as np

# This is roughly what the long-term ECGs look like.
signal = np.concatenate([
    nk.ecg_simulate(duration=10*60, sampling_rate=128, random_state=2, random_state_distort=2),
    [-4.0] * (360 * 60 * 128),
    nk.ecg_simulate(duration=90*60, sampling_rate=128, random_state=2, random_state_distort=2),
    [-4.0] * (30 * 60 * 128)
])
nk.signal_plot(signal)

Problem

QRS detection fails on raw and cleaned data:

_, info = nk.ecg_peaks(signal, sampling_rate=128, correct_artifacts=True)
info
cleaned = nk.ecg_clean(signal, sampling_rate=128)
_, info = nk.ecg_peaks(cleaned, sampling_rate=128, correct_artifacts=True)
info
Proposed Solution

I addressed the problem by doing flatline detection (similar to signal_flatline.py) and then running QRS detection only on the parts without flatline:

import numpy as np
def moving_average(signal, window_size):
"""Moving window average on a signal.
Parameters
----------
signal : Union[list, np.array]
The signal as a vector of values.
window_size : int
How many consecutive samples are used for averaging.
Returns
-------
np.array
Returns a signal of averages from the original signal. Note: The returned signal is shorter
than the original signal by window_size - 1.
"""
return np.convolve(signal, np.ones(window_size), 'valid') / window_size
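As a quick sanity check of the helper above, here is a self-contained usage example (the function is repeated so the snippet runs on its own) showing that the output is window_size - 1 samples shorter than the input:

```python
import numpy as np

def moving_average(signal, window_size):
    # Same helper as above: 'valid' convolution drops window_size - 1 samples.
    return np.convolve(signal, np.ones(window_size), "valid") / window_size

out = moving_average([1.0, 2.0, 3.0, 4.0], 2)
# out is [1.5, 2.5, 3.5]: one sample shorter than the 4-sample input
```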
def no_flatline(signal, sampling_rate, threshold=0.01, tolerance=60):
"""Finds areas in a signal that are not flatline.
Parameters
----------
signal : Union[list, np.array]
The signal as a vector of values.
sampling_rate : int
The sampling rate of the signal, i.e. how many samples (values) per second.
threshold : float, optional
Flatline threshold relative to the biggest change in the signal.
This is the percentage of the maximum value of absolute consecutive
differences. Default: 0.01.
tolerance : int, optional
Determines how fine-grained the resulting signal is, i.e. how long (in seconds) can a
flatline part be without being recognised as such. Default: 60.
Returns
-------
np.array
Returns a signal that is True/1 where there are usable data in the signal and False/0 where
there is a sufficiently long flatline interval. Note: Returned signal is shorter than the
original signal by (sampling_rate * tolerance) - 1.
"""
abs_diff = np.abs(np.diff(signal))
threshold = threshold * np.max(abs_diff)
return moving_average(abs_diff >= threshold, sampling_rate * tolerance) >= threshold
def no_flatline_intervals(signal, sampling_rate, threshold=0.01, tolerance=60):
"""Finds areas in a signal that are not flatline.
Parameters
----------
signal : Union[list, np.array]
The signal as a vector of values.
sampling_rate : int
The sampling rate of the signal, i.e. how many samples (values) per second.
threshold : float, optional
Flatline threshold relative to the biggest change in the signal.
This is the percentage of the maximum value of absolute consecutive
differences. Default: 0.01.
tolerance : int, optional
Determines how fine-grained the resulting signal is, i.e. how long (in seconds) can a
flatline part be without being recognised as such. Default: 60.
Returns
-------
tuple
Returns a tuple of np.arrays:
signal_starts: Indices where a usable part of a signal starts.
signal_ends: Indices where a usable part of a signal ends.
"""
# Identify flanks: +1 for beginning plateau; -1 for ending plateau.
flanks = np.diff(no_flatline(signal, sampling_rate, threshold, tolerance).astype(int))
signal_starts = np.flatnonzero(flanks > 0)
signal_ends = np.flatnonzero(flanks < 0)
# Correct offsets from moving average
signal_starts = signal_starts + sampling_rate * tolerance
# Insert start marker at signal start if a start marker is missing
if len(signal_starts) < len(signal_ends):
signal_starts = np.insert(signal_starts, 0, 0)
# Insert end marker at signal end if an end marker is missing
if len(signal_ends) < len(signal_starts):
signal_ends = np.append(signal_ends, [len(signal) - 1])
# Return instances where start < end (start >= end might occur due to offset correction).
    return [(start, end) for start, end in zip(signal_starts, signal_ends) if start < end]

Results

Flatline detection results look like this:

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
fig.set_size_inches(20,10)
ax.plot(cleaned)
ax.plot(no_flatline(cleaned, sampling_rate=128))

Then, the usable intervals are correctly identified:

intervals = no_flatline_intervals(cleaned, sampling_rate=128)
intervals
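The flank-detection idea at the heart of no_flatline_intervals can be sketched in isolation on a toy usable/flatline mask (a minimal illustration, not the author's exact offset-corrected code):

```python
import numpy as np

# Toy mask: 1 where the signal is usable, 0 where it is flatline.
mask = np.array([0, 0, 1, 1, 1, 0, 0, 1, 1, 0], dtype=int)

# np.diff turns transitions into flanks: +1 marks an interval start,
# -1 marks an interval end.
flanks = np.diff(mask)
starts = np.flatnonzero(flanks > 0) + 1  # index of first usable sample
ends = np.flatnonzero(flanks < 0) + 1    # index one past the last usable sample

pairs = [(int(s), int(e)) for s, e in zip(starts, ends)]
# pairs is [(2, 5), (7, 9)]: mask[2:5] and mask[7:9] are the usable runs
```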
And detection can be done like this (we might need to merge the arrays and DataFrames later on):

import pandas as pd
for start, end in intervals:
signals, info = nk.ecg_peaks(cleaned[start:end], sampling_rate=128)
signals.index = signals.index + start
info['ECG_R_Peaks'] = info['ECG_R_Peaks'] + start
print(signals)
    print(info)

Next steps?

What do you think about this approach? If you like it, I could try to turn this into a PR 🤷 |
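The "merge the arrays later on" step mentioned above could look like the following sketch, which shifts per-interval peak indices back into the coordinate system of the full recording. The interval bounds and peak indices here are made-up placeholders standing in for the nk.ecg_peaks results:

```python
import numpy as np

# Hypothetical per-interval R-peak indices, relative to each interval start.
intervals = [(0, 1000), (5000, 6000)]
peaks_per_interval = [np.array([120, 380, 640]), np.array([50, 310])]

# Shift each interval's peaks into full-recording coordinates, then
# concatenate into a single array of global R-peak positions.
all_peaks = np.concatenate(
    [peaks + start for (start, _end), peaks in zip(intervals, peaks_per_interval)]
)
# all_peaks is [120, 380, 640, 5050, 5310]
```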
Interesting approach. But then the question would be how to determine the optimal thresholds for flatline detection. How do you define "flat" in the presence of noise or random artifacts (large voltage spikes)? How does this approach compare to the simple median change? |
Very cool indeed. I also had some issues recently with bad chunks of data, it would be good to have a robust method to either ignore them and/or adapt the parameters in them... However, I'm not sure what the best API for that should look like, because it's not only peak detection that should be adapted to handle bad intervals. @danibene already started to make some functions resistant to missing values (#838). Maybe:
|
My thoughts about this topic:
|
Describe the bug

On certain ECGs with missing data or longer periods of constant values (e.g. zeros), ecg_peaks with method="neurokit" fails to find all but two peaks. Below I attached a screenshot with show=True:

[Screenshot: ecg_peaks plot showing only two detected R-peaks]

The cause of this issue lies in the way NeuroKit calculates min_len. This variable defines the minimum QRS length (len_qrs) required to accept a valid R-peak.

NeuroKit/neurokit2/ecg/ecg_findpeaks.py, lines 249 to 252 (commit 3d5ecfc)

Changing np.mean() to np.median() would provide a more robust estimate of the most common QRS length and would prevent min_len from becoming too high. The following screenshot shows the detected R-peaks with np.median():

[Screenshot: R-peaks detected with np.median()]

Because the median calculation is a bit slower, I saw a 16% increase in runtime on a 24h ECG.
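The robustness argument can be illustrated with toy numbers (hypothetical values, not from the issue): a single flatline stretch that gets counted as one enormous "QRS" drags the mean far away from the typical QRS length, while the median is barely affected:

```python
import numpy as np

# Five plausible QRS lengths (in samples) plus one huge outlier
# produced by a long flatline stretch.
len_qrs = np.array([28, 30, 29, 31, 30, 27648])

mean_len = np.mean(len_qrs)      # blown up by the outlier (~4632.7)
median_len = np.median(len_qrs)  # stays at the typical QRS length (30.0)
```

With the mean, min_len would exceed every real QRS complex, so nearly all peaks are rejected; the median keeps min_len in a sensible range.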
System Specifications
OS: Linux (64-bit)
Python: 3.8.10
NeuroKit2: 0.2.0
NumPy: 1.23.4
Pandas: 1.5.3
SciPy: 1.10.1
sklearn: 1.2.2
matplotlib: 3.6.2