__BIOBSS - PPG Pipeline__

_This notebook provides guidelines for using the pipeline module for PPG signal processing and feature extraction._

In [None]:
#Import BIOBSS and other required packages
import os
import sys

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#Add the parent directory to the path to import biobss locally without installing it
sys.path.append("../")
import biobss

# Table of Contents
1. [PPG Sample Data](#sampledata)<br>
2. [Peak Detection](#ppg_peak)<br>
3. [Create Bio Process Objects](#create_bioprocess)<br>
4. [Create Feature Objects](#create_feature)<br>
5. [Create Bio Pipeline Object](#create_pipeline)<br>
6. [Create Bio Channel Objects](#create_biochannel)<br>
7. [Set Pipeline Inputs](#set_input)<br>
8. [Add Bio Process Objects to Pipeline](#add_bioprocess)<br>
9. [Add Feature Objects to Pipeline](#add_feature)<br>
10. [Run Pipeline](#run_pipeline)<br>
11. [Extract Features](#extract_features)<br>

### __PPG Sample Data__
<a id="sampledata"></a>

PPG sample data is provided as a CSV file in BIOBSS\sample data. The data file contains 100 PPG segments, each 10 seconds long. The sampling rate is 64 Hz for all segments.

In [None]:
#Load the sample data
data, info = biobss.utils.load_sample_data(data_type='PPG_short')
sig = np.asarray(data['PPG'])
fs = info['sampling_rate']
L = info['signal_length']

### __Peak Detection__
<a id="ppg_peak"></a>

BIOBSS provides a peak detection function that supports several detection methods. These methods are suitable for PPG signals; however, the parameters should be chosen carefully if the second (diastolic) peak is observable in the signal. The ___peak_detection___ function returns a dictionary containing peak locations, peak amplitudes, trough locations and trough amplitudes.
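
To illustrate why the threshold parameter matters, the following is a minimal pure-Python sketch of a delta-threshold detector in the spirit of the 'peakdet' approach. It is not the BIOBSS implementation: the function name `peakdet_sketch` and its exact rules are assumptions made for illustration. A peak is confirmed only after the signal falls at least `delta` below the running maximum, and a trough only after it rises at least `delta` above the running minimum.

```python
def peakdet_sketch(signal, delta):
    """Return (peak_indices, trough_indices) using a delta-threshold rule.

    Hypothetical helper for illustration; not part of BIOBSS.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    peaks, troughs = [], []
    cur_max, cur_min = float("-inf"), float("inf")
    max_pos = min_pos = 0
    looking_for_max = True
    for i, value in enumerate(signal):
        if value > cur_max:
            cur_max, max_pos = value, i
        if value < cur_min:
            cur_min, min_pos = value, i
        if looking_for_max:
            # A peak is confirmed once the signal has dropped by more than delta.
            if value < cur_max - delta:
                peaks.append(max_pos)
                cur_min, min_pos = value, i
                looking_for_max = False
        else:
            # A trough is confirmed once the signal has risen by more than delta.
            if value > cur_min + delta:
                troughs.append(min_pos)
                cur_max, max_pos = value, i
                looking_for_max = True
    return peaks, troughs


# A single pulse with a small ripple (0.05) near its crest
ripple = [0.0, 1.0, 0.95, 1.0, 0.0]
peaks_large_delta, _ = peakdet_sketch(ripple, delta=0.3)   # [1]
peaks_small_delta, _ = peakdet_sketch(ripple, delta=0.01)  # [1, 3]
```

With `delta=0.3` the ripple is ignored and one peak is reported; with `delta=0.01` the ripple is treated as a second peak. This is why delta should be chosen relative to the signal amplitude.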

For further analysis, a peak control step is required to prevent errors caused by incorrect peak detection results (missing or duplicate peaks). The ___peak_control___ function checks the relative locations of peaks and troughs, ensuring that only a single peak lies between consecutive troughs.

In [None]:
#Detect peaks using the 'peakdet' method (delta=0.01). The delta parameter should be adjusted according to the amplitude of the signal.
info=biobss.preprocess.peak_detection(sig,fs,'peakdet',delta=0.01)

locs_peaks=info['Peak_locs']
peaks=info['Peaks']
locs_onsets=info['Trough_locs']
onsets=info['Troughs']

#Correct the peak detection results by considering the order of peaks and troughs
info=biobss.ppgtools.peak_control(sig=sig, peaks_locs=locs_peaks, troughs_locs=locs_onsets)

locs_peaks=info['Peak_locs']
peaks=sig[locs_peaks]
locs_onsets=info['Trough_locs']
onsets=sig[locs_onsets]
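
The "single peak between consecutive troughs" rule can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper name `one_peak_per_cycle` is hypothetical, and keeping the tallest candidate is an assumed tie-breaking rule that may differ from what ___peak_control___ actually does. Peaks outside the first and last trough are dropped in this sketch.

```python
def one_peak_per_cycle(signal, peak_locs, trough_locs):
    """Keep at most one peak between each pair of consecutive troughs.

    Hypothetical helper; the tallest candidate in each interval is kept.
    """
    troughs = sorted(trough_locs)
    kept = []
    for start, end in zip(troughs[:-1], troughs[1:]):
        # Candidate peaks located strictly inside the current trough interval
        candidates = [p for p in peak_locs if start < p < end]
        if candidates:
            kept.append(max(candidates, key=lambda p: signal[p]))
    return kept


# Two duplicate peaks (indices 1 and 3) fall between the troughs at 0 and 4
example_signal = [0.0, 1.0, 0.8, 0.9, 0.0, 1.2, 0.0]
cleaned_peaks = one_peak_per_cycle(example_signal, [1, 3, 5], [0, 4, 6])  # [1, 5]
```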

### <font color='Green'>__Create Bio Process Objects__ </font>
<a id="create_bioprocess"></a>

Any process can be added to the pipeline if its input is a signal and its output is a signal or a collection of signals. A process is added by passing the method to the Bio_Process constructor, which takes the following arguments:

- process_method: The method to be added to the pipeline.
- inplace: If True, the result of the process replaces the input signal. If False, the result is returned as a new signal and added as a new channel.
- prefix: The prefix to be added to the output channel name. If inplace is True, prefix is ignored.
  - If inplace is False, the output channel name is prefix + input channel name.
  - <code> (Default: None) (if inplace = False  and prefix = None, prefix is set to 'processed') </code>
- return_index: If return_index is set, only the element of the result at that index is used as the result. If return_index is None, the whole result is used. (Default: None)
- argmap: A dictionary that maps argument names. For example, Bio_Channel already has a sampling_rate attribute. If the process method expects the sampling rate under a different name, that name can be mapped to the Bio_Channel attribute. <code> EXAMPLE : argmap = {'sampling_rate': 'fs'} or argmap = {'sampling_rate': 'sample_rate'} </code>
- **kwargs: Keyword arguments to be passed to the process method.
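
The role of argmap, inplace and prefix can be made concrete with a small stand-alone sketch. It does not reproduce the Bio_Process internals: `call_with_argmap`, `output_name` and the underscore separator between prefix and channel name are all assumptions introduced here for illustration.

```python
def call_with_argmap(method, signal, channel_attrs, argmap=None, **kwargs):
    """Call method(signal, ...) after renaming channel attributes via argmap.

    Hypothetical helper: argmap maps a channel attribute name (key) to the
    keyword name expected by the process method (value).
    """
    argmap = argmap or {}
    mapped = {target: channel_attrs[source] for source, target in argmap.items()}
    mapped.update(kwargs)  # explicitly passed keyword arguments take precedence
    return method(signal, **mapped)


def output_name(channel_name, inplace, prefix=None, sep="_"):
    """Name of the channel that receives the result (separator is assumed)."""
    if inplace:
        return channel_name  # prefix is ignored when the input is modified
    return (prefix or "processed") + sep + channel_name


def duration_seconds(sig, fs):
    """Toy process method that expects the sampling rate under the name 'fs'."""
    return len(sig) / fs


channel_attrs = {"sampling_rate": 64}
result = call_with_argmap(
    duration_seconds, [0.0] * 128, channel_attrs, argmap={"sampling_rate": "fs"}
)  # 2.0
```

Here the channel's `sampling_rate` attribute reaches `duration_seconds` as `fs`, which is the renaming that argmap describes.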






The pipeline processes the given input sequentially: the output of each process is passed as the input to the next.
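
Sequential processing means that the order of the steps changes the result. The sketch below shows this with two toy steps in plain Python; `run_queue`, `remove_offset` and `scale_to_unit_peak` are hypothetical names and are not BIOBSS functions.

```python
def run_queue(signal, steps):
    """Apply each step to the output of the previous one (hypothetical helper)."""
    for step in steps:
        signal = step(signal)
    return signal


def remove_offset(sig):
    """Subtract the mean so the signal is centered on zero."""
    mean = sum(sig) / len(sig)
    return [x - mean for x in sig]


def scale_to_unit_peak(sig):
    """Divide by the largest absolute value (no-op for an all-zero signal)."""
    peak = max(abs(x) for x in sig)
    return [x / peak for x in sig] if peak else list(sig)


raw = [1.0, 2.0, 3.0]
centered_then_scaled = run_queue(raw, [remove_offset, scale_to_unit_peak])
scaled_then_centered = run_queue(raw, [scale_to_unit_peak, remove_offset])
# centered_then_scaled is [-1.0, 0.0, 1.0], while scaled_then_centered
# spans roughly -1/3 to 1/3: same steps, different order, different output.
```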

In [None]:
# Create steps for the pipeline

# Filtering step: returns the filtered signal for each input signal
clean=biobss.pipeline.Bio_Process(
    process_method=biobss.ppgtools.filter_ppg, argmap={"sampling_rate":"sampling_rate"})

# Normalization step: returns the normalized signal for each input signal
normalize = biobss.pipeline.Bio_Process(
    process_method=biobss.preprocess.normalize_signal)



### <font color='Green'>__Create Feature Objects__ </font>
<a id="create_feature"></a>

In [None]:
# Create feature extraction steps

# Extract frequency-domain, time-domain and statistical features from the PPG segment
# The input_signals dictionary maps feature prefixes (keys) to input signals (values)
# Here, features computed from the PPG_Raw channel use the prefix 'PPG_Raw'
segment_features = biobss.pipeline.Feature(
    name="time_features",
    function=biobss.ppgtools.from_segment,
    input_signals={'PPG_Raw':'PPG_Raw'},
    feature_types=['Freq','Time','Stat'],
    sampling_rate=fs,
    argmap={'sampling_rate':'sampling_rate'})
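
The prefix-to-signal mapping in input_signals can be illustrated with a stand-alone sketch. The helpers `stat_features` and `extract`, the three statistics chosen, and the `prefix_name` naming pattern are assumptions for illustration only; the actual feature set and names produced by BIOBSS may differ.

```python
import statistics


def stat_features(segment, prefix):
    """Compute a few statistics and prefix each feature name (hypothetical)."""
    values = {
        "mean": statistics.fmean(segment),
        "std": statistics.pstdev(segment),
        "rms": (sum(x * x for x in segment) / len(segment)) ** 0.5,
    }
    return {f"{prefix}_{name}": value for name, value in values.items()}


def extract(input_signals, channels):
    """input_signals maps a feature prefix (key) to a channel name (value)."""
    features = {}
    for prefix, channel_name in input_signals.items():
        features.update(stat_features(channels[channel_name], prefix))
    return features


channels = {"PPG_Raw": [1.0, -1.0, 1.0, -1.0]}
features = extract({"PPG_Raw": "PPG_Raw"}, channels)
# Keys: 'PPG_Raw_mean', 'PPG_Raw_std', 'PPG_Raw_rms'
```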


### __Create Bio Pipeline Object__
<a id="create_pipeline"></a>

In [None]:
# Create the BIOBSS pipeline
# windowed_process=False means that the pipeline processes the whole signal at once, without windowing
# With windowed_process=True, the pipeline processes the signal in windows:
# window_size=60 would process 60 seconds of data at a time and step_size=20 would shift the window by 20 seconds each time
pipeline = biobss.pipeline.Bio_Pipeline(windowed_process=False)
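
For reference, the window arithmetic behind window_size and step_size can be sketched as follows. The helper `window_bounds` is hypothetical, and dropping a trailing partial window is an assumption rather than documented BIOBSS behavior.

```python
def window_bounds(n_samples, sampling_rate, window_size, step_size):
    """Return (start, end) sample indices of each full window.

    window_size and step_size are given in seconds. Hypothetical helper;
    a trailing partial window is dropped.
    """
    win = int(window_size * sampling_rate)
    step = int(step_size * sampling_rate)
    if win <= 0 or step <= 0:
        raise ValueError("window_size and step_size must be positive")
    return [(start, start + win) for start in range(0, n_samples - win + 1, step)]


# 120 s of data at 64 Hz, 60 s windows shifted by 20 s
long_windows = window_bounds(120 * 64, 64, window_size=60, step_size=20)
# [(0, 3840), (1280, 5120), (2560, 6400), (3840, 7680)]

# A 10 s segment at 64 Hz is shorter than a single 60 s window
short_windows = window_bounds(10 * 64, 64, window_size=60, step_size=20)  # []
```

A 10-second segment yields no full 60-second window under these settings, which is consistent with windowing being disabled for the short sample data used here.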

### __Create Bio Channel Objects__
<a id="create_biochannel"></a>

In [None]:
# The input is created as a Bio_Channel object
# A Bio_Channel object requires the following parameters: name, signal, sampling_rate
# Optionally, a timestamp can be provided
# If a timestamp is provided, its resolution should also be provided. Possible resolutions are: 'min', 's', 'ms', 'us'
# If a timestamp is not provided, it is created using sampling_rate. The default resolution is 's'
bio_channel=biobss.pipeline.Bio_Channel(signal=sig,name="PPG_Raw",sampling_rate=fs)

# A Bio_Channel object is created from the sample data, with the name PPG_Raw and a sampling rate of 64 Hz
# No timestamp is provided, so the timestamp is created using sampling_rate

# Evaluating bio_channel on its own displays the channel properties
bio_channel
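
When no timestamp is supplied, one can be derived from the sampling rate alone. The sketch below shows the arithmetic; `make_timestamps`, the unit table, and the choice to start at zero are assumptions for illustration and do not describe the Bio_Channel internals.

```python
# Number of resolution units per second (hypothetical lookup table)
UNITS_PER_SECOND = {"min": 1 / 60, "s": 1, "ms": 1_000, "us": 1_000_000}


def make_timestamps(n_samples, sampling_rate, resolution="s"):
    """Timestamps for evenly sampled data, starting at zero (assumed)."""
    if resolution not in UNITS_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    scale = UNITS_PER_SECOND[resolution]
    return [i / sampling_rate * scale for i in range(n_samples)]


timestamps_s = make_timestamps(4, 64)         # [0.0, 0.015625, 0.03125, 0.046875]
timestamps_ms = make_timestamps(4, 64, "ms")  # [0.0, 15.625, 31.25, 46.875]
```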

In [None]:
# bio_channel.channel accesses the data in the channel object
bio_channel.channel

### __Set Pipeline Inputs__
<a id="set_input"></a>

In [None]:
# Pipeline input can be set from an array, a dataframe, pandas series, list, Bio_Channel, Bio_Data

# In this case, the pipeline input is set with an array
pipeline.set_input(sig,sampling_rate=fs,name='PPG_Raw')

# Alternatively, the pipeline input can be set with a Bio_Channel object. In this case, the name and sampling rate are not required because the Bio_Channel object already provides them
pipeline.set_input(bio_channel)

pipeline.input

### __Add Bio Process Objects to Pipeline__
<a id="add_bioprocess"></a>

In [None]:
# Steps are added to the pipeline sequentially. The order matters because the output of one step is the input of the next
# The steps are processed in the order in which they are added to the pipeline
pipeline.preprocess_queue.add_process(clean)
pipeline.preprocess_queue.add_process(normalize)


# Currently all the steps are added to the preprocess_queue
# After the preprocess_queue is processed, data will be segmented into windows (if windowed_process=True)
# For running process in windows, the process_queue is used

## This structure will change in the future

### __Add Feature Objects to Pipeline__
<a id="add_feature"></a>

In [None]:
# Features are added to the pipeline
pipeline.add_feature_step(segment_features)


In [None]:
# Representation of the pipeline
pipeline

### __Run Pipeline__
<a id="run_pipeline"></a>

In [None]:
# Run the pipeline. This processes the input data through the steps added to the pipeline
pipeline.run_pipeline()

In [None]:
# Representation of the pipeline data after running
pipeline.data

# Data is cleaned (filtered)
# Data is normalized
# Data is not segmented into windows because windowed_process=False

### __Extract Features__
<a id="extract_features"></a>

In [None]:
# Frequency-domain, time-domain and statistical features are extracted from the PPG_Raw channel
# Supplied prefix is added to the feature name
pipeline.extract_features()



In [None]:
# Representation of the pipeline features after extraction
# Each row is a feature vector for a window; with windowed_process=False the signal is not split into windows
# The index is the timestamp of the window (this can be selected as the start, end or center of the window)
pipeline.features