# 0 (Optional but recommended) Set up Virtual Environment
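
One possible way to create an isolated environment is Python's standard-library `venv` module. The sketch below is only an illustration: the helper name `create_environment` and the directory name `.venv` are assumptions, not part of the pipeline.

```python
import venv
from pathlib import Path


def create_environment(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment in env_dir and return its path."""
    path = Path(env_dir)
    # EnvBuilder is the standard-library API behind `python -m venv`.
    venv.EnvBuilder(with_pip=with_pip).create(path)
    return path
```

Calling `create_environment(".venv")` would create the environment in the current directory. It still has to be activated (or selected as the notebook kernel) before installing the dependencies in the next step.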

# 1 Install Pipeline Dependencies
To use the pipeline, you first need to install the dependencies it relies on. Run the following command to install the dependencies defined in `requirements_rpeak2hrv_pipeline.txt`:

In [None]:
%pip install -r requirements_rpeak2hrv_pipeline.txt

# 2 Instantiate Pipeline

In [20]:
from transformers import pipeline

rpeak2hrv_pipeline = pipeline(model = "hubii-world/rpeaks-to-hrv-pipeline", trust_remote_code=True)

A new version of the following files was downloaded from https://huggingface.co/hubii-world/rpeaks-to-hrv-pipeline:
- rpeaks_2_hrv_pipeline.py
- rpeaks2hrv.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.



Device set to use cpu


# 3 Pipeline Parameters & Supported File Formats

## Overview: Parameters
The pipeline provides several parameters that adjust its preprocessing behavior. The following sections explain each parameter in detail and provide illustrative examples.


### Mandatory Parameters
The pipeline relies on two mandatory parameters that the user has to set for every pipeline execution:
| Parameter name | Type | Default value | Description |
|----------------|------|---------------|-------------|
| `inputs`        | _str_ or _DataFrame_ | No default value | The input that should be processed by the pipeline. This can either be a path to a file containing the data to process or the data itself |
| `sampling_rate` | _int_  | 1000          | The sampling rate (in Hz) of the continuous cardiac signal in which peaks occur |


### Optional Parameters
Besides the mandatory parameters, the pipeline offers several optional parameters that may need to be set in order to compute correct HRV features:
| Parameter name | Type | Default value | Description |
|----------------|------|---------------|-------------|
| `time_header`  | _str_| 'SystemTime'  | The name of the column in the data that contains the timestamp at which the values in the same row were recorded |
| `rri_header`   | _str_| 'interbeat_interval' | The name of the column in the data that contains the RR-Intervals in msec |
| `windowing_method` | _str_| None |  The method that should be applied to divide the raw data into windows. Default setting is None, so no windowing is applied |
| `window_size`  | _str_| '60s' | The size of a window in terms of a time frame. Only relevant if windowing should be applied to the data |


## 3.1 `inputs`

The `inputs` parameter represents the data the pipeline should process into HRV features. The pipeline supports values of type _str_ and _DataFrame_ as input.

When providing the `inputs` as string, it has to represent a file path to a file containing the data to process. Supported file formats are .csv and .txt.

Alternatively, you can also provide the data directly to the pipeline in form of a _DataFrame_.

### Example: Provide input as file path

In [21]:
file_path = "./Example_data/RRIntervalExample.csv"
result = rpeak2hrv_pipeline(inputs=file_path, sampling_rate=2000)
result.head()

Unnamed: 0,HRV_MeanNN,HRV_SDNN,HRV_SDANN1,HRV_SDNNI1,HRV_SDANN2,HRV_SDNNI2,HRV_SDANN5,HRV_SDNNI5,HRV_RMSSD,HRV_SDSD,...,HRV_IQRNN,HRV_SDRMSSD,HRV_Prc20NN,HRV_Prc80NN,HRV_pNN50,HRV_pNN20,HRV_MinNN,HRV_MaxNN,HRV_HTI,HRV_TINN
0,1006.894005,159.530641,72.830794,137.277955,56.912177,143.4821,45.675812,152.633402,107.280546,102.397785,...,180.664062,1.487042,886.71875,1109.375,50.490196,77.941176,273.4375,2649.414062,33.376923,0.0


## 3.2 `sampling_rate`
The `sampling_rate` (Hz) represents the rate at which the sensor sampled data from the patient. It has to be provided as an integer. In the example above, you can see a configuration where the `sampling_rate` is set to 2000.

The default rate is 1000 Hz, meaning that the sensor sampled 1000 values per second.

## 3.3 `time_header` & `rri_header`
`time_header` and `rri_header` are important settings to define the structure of the data the pipeline has to process. In general, the pipeline supports two possible data formats:
- R Peak Flags
- RR-Intervals with timestamps

### 3.3.1 R Peak Flags
The first format option is defined by a _DataFrame_ with one column named `'ECG_R_Peaks'`. The column values are simple binary flags indicating whether an R peak occurred or not.

This is the standard data format used by neurokit2 to represent R peaks. If you use this data format, you do not need to specify `time_header` and `rri_header`.

__Important__: Make sure that the column has the correct name and that you specify the correct `sampling_rate`, as this is indispensable information to compute the correct HRV-Features.
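
To illustrate why the `sampling_rate` matters for this format, the following minimal sketch converts binary R peak flags into RR intervals in milliseconds. It is not the pipeline's implementation; the helper name `flags_to_rr_intervals_ms` and the synthetic flags are assumptions made for this example.

```python
def flags_to_rr_intervals_ms(flags, sampling_rate=1000):
    """Convert binary R peak flags to RR intervals in milliseconds."""
    # Sample indices at which an R peak was flagged.
    peak_indices = [i for i, flag in enumerate(flags) if flag == 1]
    # Distance between consecutive peaks in samples, scaled to milliseconds.
    return [
        (later - earlier) * 1000 / sampling_rate
        for earlier, later in zip(peak_indices, peak_indices[1:])
    ]


# Synthetic signal with peaks at samples 10, 80 and 146.
flags = [0] * 150
for index in (10, 80, 146):
    flags[index] = 1

print(flags_to_rr_intervals_ms(flags, sampling_rate=100))   # [700.0, 660.0]
print(flags_to_rr_intervals_ms(flags, sampling_rate=1000))  # [70.0, 66.0]
```

The same flags yield intervals that differ by a factor of ten depending on the assumed sampling rate, so a wrong `sampling_rate` scales every interval-based HRV feature accordingly.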

#### Example: R Peak Flags
Execute the following cell to inspect an example of a _DataFrame_ containing R Peak Flags:

In [22]:
import pandas as pd
df = pd.read_csv("./Example_data/RPeaksDataExample.csv")
df.head()

Unnamed: 0,ECG_R_Peaks
0,0
1,0
2,0
3,0
4,0


You can process this data without setting `time_header` and `rri_header`:

In [23]:
result = rpeak2hrv_pipeline(inputs=df, sampling_rate=100)
result.head()

Unnamed: 0,HRV_MeanNN,HRV_SDNN,HRV_SDANN1,HRV_SDNNI1,HRV_SDANN2,HRV_SDNNI2,HRV_SDANN5,HRV_SDNNI5,HRV_RMSSD,HRV_SDSD,...,HRV_IQRNN,HRV_SDRMSSD,HRV_Prc20NN,HRV_Prc80NN,HRV_pNN50,HRV_pNN20,HRV_MinNN,HRV_MaxNN,HRV_HTI,HRV_TINN
0,696.395349,62.135891,10.060728,60.275036,,,,,69.697983,69.779109,...,60.0,0.891502,660.0,740.0,14.651163,49.302326,470.0,1420.0,7.962963,234.375


### 3.3.2 RR-Intervals with timestamps
The second format option is defined by a _DataFrame_ with two columns containing the RR-Intervals in milliseconds and the corresponding timestamps at which the RR-Intervals were recorded by the sensor. Here, `time_header` specifies the name of the column containing the timestamps and `rri_header` specifies the name of the column containing the RR-Intervals.
The default column names are `'SystemTime'` and `'interbeat_interval'`.

#### Example: RR-Intervals with timestamps
Execute the following cell to inspect an example of a _DataFrame_ containing RR-Intervals and their timestamps:

In [24]:
import pandas as pd
df = pd.read_csv("./Example_data/RRIntervalExample.csv")
df.head()

Unnamed: 0,SystemTime,interbeat_interval
0,2025-03-17 16:20:54.760848,1334.9609375
1,2025-03-17 16:20:54.762717,1034.1796875
2,2025-03-17 16:20:55.1747236,
3,2025-03-17 16:20:56.2737142,968.75
4,2025-03-17 16:20:57.3727371,981.4453125


Because the column names in this example match the default values of `time_header` and `rri_header`, you do not need to specify them explicitly to process the data:

In [25]:
result = rpeak2hrv_pipeline(inputs=df, sampling_rate=1000)
result.head()

Unnamed: 0,HRV_MeanNN,HRV_SDNN,HRV_SDANN1,HRV_SDNNI1,HRV_SDANN2,HRV_SDNNI2,HRV_SDANN5,HRV_SDNNI5,HRV_RMSSD,HRV_SDSD,...,HRV_IQRNN,HRV_SDRMSSD,HRV_Prc20NN,HRV_Prc80NN,HRV_pNN50,HRV_pNN20,HRV_MinNN,HRV_MaxNN,HRV_HTI,HRV_TINN
0,1006.894005,159.530641,72.830794,137.277955,56.912177,143.4821,45.675812,152.633402,107.280546,102.397785,...,180.664062,1.487042,886.71875,1109.375,50.490196,77.941176,273.4375,2649.414062,33.376923,0.0
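
The rules for the two data formats can be summarized in a small sketch. The helper `detect_input_format` is hypothetical and only mirrors the description above; it makes no claim about how the pipeline decides internally.

```python
def detect_input_format(columns, time_header="SystemTime",
                        rri_header="interbeat_interval"):
    """Return which of the two documented formats a set of column names matches."""
    columns = set(columns)
    # Format 1: a single flag column using the neurokit2 naming convention.
    if "ECG_R_Peaks" in columns:
        return "r_peak_flags"
    # Format 2: timestamps plus RR intervals under the configured headers.
    if time_header in columns and rri_header in columns:
        return "rr_intervals"
    raise ValueError(
        f"columns {sorted(columns)} match neither format; "
        "check time_header and rri_header"
    )


print(detect_input_format(["ECG_R_Peaks"]))                       # r_peak_flags
print(detect_input_format(["SystemTime", "interbeat_interval"]))  # rr_intervals
print(detect_input_format(["timestamp", "rr_ms"],
                          time_header="timestamp", rri_header="rr_ms"))  # rr_intervals
```

For data with non-default column names, the corresponding pipeline call would pass both headers, for example `rpeak2hrv_pipeline(inputs=df, time_header="timestamp", rri_header="rr_ms", sampling_rate=1000)`, where `timestamp` and `rr_ms` are made-up column names.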


## 3.4 `windowing_method`
The `windowing_method` defines the method to be used to divide the raw data into windows. The supported settings are:
| Parameter value | Description |
|-----------------|-------------|
|'rolling'        | Creates a window rolling over the data. For more information see [pandas.DataFrame.rolling()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rolling.html) |
|'first_interval' | Keeps the data values that are recorded within the __first__ timeframe defined by _window_size_ and omits the rest |
|'last_interval'  | Keeps the data values that are recorded within the __last__ timeframe defined by _window_size_ and omits the rest |
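
The effect of `'first_interval'` and `'last_interval'` can be sketched with plain Python. This is an illustration under assumptions, not the pipeline's code: the helper `select_interval` is hypothetical, the data is synthetic, and treating the window boundary as inclusive is a guess.

```python
from datetime import datetime, timedelta


def select_interval(rows, window, which="first_interval"):
    """Keep the (timestamp, rri) rows in the first or last `window` of a recording."""
    if not rows:
        return []
    rows = sorted(rows, key=lambda row: row[0])
    if which == "first_interval":
        end = rows[0][0] + window
        return [row for row in rows if row[0] <= end]
    if which == "last_interval":
        start = rows[-1][0] - window
        return [row for row in rows if row[0] >= start]
    raise ValueError(f"unsupported windowing method: {which!r}")


start = datetime(2025, 3, 17, 16, 20, 54)
# Synthetic recording: one RR interval of 1000 ms per second for 20 minutes.
rows = [(start + timedelta(seconds=i), 1000.0) for i in range(1200)]

first = select_interval(rows, timedelta(minutes=5), "first_interval")
last = select_interval(rows, timedelta(minutes=5), "last_interval")
print(first[0][0], "to", first[-1][0])
print(last[0][0], "to", last[-1][0])
```

In both cases only the rows inside the selected time frame remain, so a single set of HRV features would be computed from them.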


### Example: 'first_interval'-windowing
The following code snippet shows an example of `first_interval` windowing. In this example, only the values recorded within the first 5 minutes of the data collection are used to compute the HRV features.

In [26]:
file_path = "./Example_data/RRIntervalExample.csv"
result = rpeak2hrv_pipeline(inputs=file_path, windowing_method="first_interval", window_size="5m", sampling_rate=1000)
result.head()



Unnamed: 0,window_start,window_end,HRV_MeanNN,HRV_SDNN,HRV_SDANN1,HRV_SDNNI1,HRV_SDANN2,HRV_SDNNI2,HRV_SDANN5,HRV_SDNNI5,...,HRV_IQRNN,HRV_SDRMSSD,HRV_Prc20NN,HRV_Prc80NN,HRV_pNN50,HRV_pNN20,HRV_MinNN,HRV_MaxNN,HRV_HTI,HRV_TINN
0,2025-03-17 16:20:54.760848,2025-03-17 16:25:54.300729033,972.108115,125.671439,79.452647,99.909422,80.799289,106.578732,,,...,128.417969,1.260327,876.5625,1058.007812,50.0,81.25,714.84375,1424.804688,17.222222,234.375


## 3.5 `window_size`
The `window_size` defines the size of the windows the data should be divided into. In general, the definition follows this pattern: '{any positive integer}{t}', where t is an element of {'d', 'h', 'm', 's'}.

For example: the setting '20m' represents a window size of 20 minutes.

The default setting is '60s', corresponding to a window size of one minute.

Setting this parameter is only necessary if you want to apply windowing.
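
The pattern described above can be expressed as a short parser. The function `parse_window_size` is a hypothetical helper written for this explanation; how the pipeline itself interprets the string may differ.

```python
import re
from datetime import timedelta

# Map the unit letters from the documented pattern to timedelta keywords.
_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def parse_window_size(window_size: str) -> timedelta:
    """Parse strings such as '20m' or '60s' into a timedelta."""
    match = re.fullmatch(r"(\d+)([dhms])", window_size)
    if match is None or int(match.group(1)) <= 0:
        raise ValueError(f"invalid window size: {window_size!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


print(parse_window_size("20m"))  # 0:20:00
print(parse_window_size("60s"))  # 0:01:00
```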

### Example: Window size

In the following cell, a rolling window of 5 minutes is applied to the data. For each window, the pipeline then calculates the HRV-Features and creates a new row in the result _DataFrame_. The pipeline returns a _DataFrame_ in which each row represents a specific window.
For each window, the corresponding starting and ending timestamps are included in the result.

In [27]:
file_path = "./Example_data/RRIntervalExample.csv"
result = rpeak2hrv_pipeline(inputs=file_path, windowing_method="rolling", window_size="5m", sampling_rate=1000)
result.head()



Unnamed: 0,window_start,window_end,HRV_MeanNN,HRV_SDNN,HRV_SDANN1,HRV_SDNNI1,HRV_SDANN2,HRV_SDNNI2,HRV_SDANN5,HRV_SDNNI5,...,HRV_IQRNN,HRV_SDRMSSD,HRV_Prc20NN,HRV_Prc80NN,HRV_pNN50,HRV_pNN20,HRV_MinNN,HRV_MaxNN,HRV_HTI,HRV_TINN
0,2025-03-17 16:20:54.760848,2025-03-17 16:20:54.760848000,1334.960938,,,,,,,,...,0.0,,1334.960938,1334.960938,0.0,0.0,1334.960938,1334.960938,1.0,0.0
1,2025-03-17 16:20:54.760848,2025-03-17 16:20:54.762717000,1184.570312,212.684462,,,,,,,...,150.390625,0.707107,1094.335938,1274.804688,50.0,50.0,1034.179688,1334.960938,2.0,0.0
2,2025-03-17 16:20:54.760848,2025-03-17 16:20:56.273714200,1112.630208,195.303548,,,,,,,...,183.105469,0.897294,994.921875,1214.648438,66.666667,66.666667,968.75,1334.960938,3.0,0.0
3,2025-03-17 16:20:54.760848,2025-03-17 16:20:57.372737100,1079.833984,172.42782,,,,,,,...,131.103516,0.969412,976.367188,1154.492188,50.0,50.0,968.75,1334.960938,4.0,0.0
4,2025-03-17 16:20:54.760848,2025-03-17 16:20:58.471694000,1106.835938,161.071544,,,,,,,...,233.398438,0.833476,978.90625,1238.867188,60.0,60.0,968.75,1334.960938,5.0,0.0
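
The growing windows in this output can be reproduced with a small sketch that computes only the mean RR interval per window. It assumes a time-based window of the form (end - window, end], which is the default behavior of `pandas.DataFrame.rolling()` for offset windows; the helper `rolling_mean_rri` is hypothetical. The four rows are the first valid RR intervals of the example file, with timestamps truncated to microseconds.

```python
from datetime import datetime, timedelta
from statistics import mean

rows = [
    (datetime(2025, 3, 17, 16, 20, 54, 760848), 1334.9609375),
    (datetime(2025, 3, 17, 16, 20, 54, 762717), 1034.1796875),
    (datetime(2025, 3, 17, 16, 20, 56, 273714), 968.75),
    (datetime(2025, 3, 17, 16, 20, 57, 372737), 981.4453125),
]


def rolling_mean_rri(rows, window):
    """Return (window_start, window_end, mean RR interval) for each row.

    Rows must be sorted by timestamp.
    """
    results = []
    for index, (end, _) in enumerate(rows):
        # Keep earlier samples whose timestamp lies in (end - window, end].
        in_window = [(t, rri) for t, rri in rows[: index + 1] if t > end - window]
        results.append((in_window[0][0], end, mean(rri for _, rri in in_window)))
    return results


for window_start, window_end, mean_rri in rolling_mean_rri(rows, timedelta(minutes=5)):
    print(window_start.time(), window_end.time(), round(mean_rri, 6))
```

Because all four timestamps lie within the first 5 minutes, every window starts at the first sample and simply grows by one interval. The resulting means (about 1334.96, 1184.57, 1112.63 and 1079.83 ms) agree with the `HRV_MeanNN` values of the first four rows above.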


## 3.6 Supported file formats

As mentioned in Section 3.1, the pipeline can process two file formats when given a file path: .csv and .txt.
When using a .csv file, the pipeline supports two column separators: ',' and ';'.

The pipeline recognizes the column separator in the .csv file automatically.

When using a .txt file, the pipeline only supports the column separator '\t'. Make sure your data file matches this requirement before providing it to the pipeline.
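
The separator rules above can be summarized in a short sketch. The helper `guess_separator` and its counting heuristic are assumptions chosen for illustration; the pipeline's actual detection logic is not shown in this notebook.

```python
from pathlib import Path


def guess_separator(file_name: str, header_line: str) -> str:
    """Guess the column separator from the file extension and the header line."""
    suffix = Path(file_name).suffix.lower()
    if suffix == ".txt":
        # .txt files are expected to be tab-separated.
        return "\t"
    if suffix == ".csv":
        # Pick whichever candidate separator occurs more often in the header.
        return ";" if header_line.count(";") > header_line.count(",") else ","
    raise ValueError(f"unsupported file format: {suffix!r}")


print(repr(guess_separator("data.csv", "SystemTime;interbeat_interval")))   # ';'
print(repr(guess_separator("data.csv", ",SystemTime,interbeat_interval")))  # ','
print(repr(guess_separator("data.txt", "SystemTime\tinterbeat_interval")))  # '\t'
```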

### Example: Provide .csv file to pipeline
The following example provides a .csv file to the pipeline and lets it calculate the HRV-Features on the first 10 minutes of the data

In [15]:
file_path = "./Example_data/RRIntervalExample.csv"
result = rpeak2hrv_pipeline(inputs=file_path, windowing_method="first_interval", window_size="10m", sampling_rate=1000)
result.head()



Unnamed: 0,window_start,window_end,HRV_MeanNN,HRV_SDNN,HRV_SDANN1,HRV_SDNNI1,HRV_SDANN2,HRV_SDNNI2,HRV_SDANN5,HRV_SDNNI5,...,HRV_IQRNN,HRV_SDRMSSD,HRV_Prc20NN,HRV_Prc80NN,HRV_pNN50,HRV_pNN20,HRV_MinNN,HRV_MaxNN,HRV_HTI,HRV_TINN
0,2025-03-17 16:20:54.760848,2025-03-17 16:30:54.600700666,999.90429,148.548482,66.769493,130.324972,62.538421,133.508714,,,...,166.503906,1.835071,884.960938,1101.367188,48.387097,74.193548,708.984375,1510.742188,22.296296,242.1875


### Example: Provide .txt file to pipeline
The same can be done using a .txt file

In [16]:
file_path = "./Example_data/RRIntervalExample.txt"
result = rpeak2hrv_pipeline(inputs=file_path, windowing_method="first_interval", window_size="10m", sampling_rate=1000)
result.head()



Unnamed: 0,window_start,window_end,HRV_MeanNN,HRV_SDNN,HRV_SDANN1,HRV_SDNNI1,HRV_SDANN2,HRV_SDNNI2,HRV_SDANN5,HRV_SDNNI5,...,HRV_IQRNN,HRV_SDRMSSD,HRV_Prc20NN,HRV_Prc80NN,HRV_pNN50,HRV_pNN20,HRV_MinNN,HRV_MaxNN,HRV_HTI,HRV_TINN
0,2025-03-17 16:20:54.760848,2025-03-17 16:30:54.600700666,999.90429,148.548482,66.769493,130.324972,62.538421,133.508714,,,...,166.503906,1.835071,884.960938,1101.367188,48.387097,74.193548,708.984375,1510.742188,22.296296,242.1875
