__BIOBSS - GSR Pipeline using different Python packages__

_This notebook shows how to use the pipeline module for GSR signal processing and feature extraction, combining functions from different Python packages._

In [1]:
# imports
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
# Add the parent directory to the path so that biobss can be imported locally without installing it
sys.path.append("../")
import biobss
import neurokit2
import flirt


# Table of Contents
1. [GSR Sample Data](#sampledata)<br>
2. [Create Bio Process Objects](#create_bioprocess)<br>
3. [Create Feature Objects](#create_feature)<br>
4. [Create Bio Pipeline Object](#create_pipeline)<br>
5. [Create Bio Channel Objects](#create_biochannel)<br>
6. [Set Pipeline Inputs](#set_input)<br>
7. [Add Bio Process Objects to Pipeline](#add_bioprocess)<br>
8. [Add Feature Objects to Pipeline](#add_feature)<br>
9. [Run Pipeline](#run_pipeline)<br>
10. [Extract Features](#extract_features)<br>

### __GSR Sample Data__
<a id="sampledata"></a>

In [2]:
#Load the sample data
data, info = biobss.utils.load_sample_data(data_type='EDA')
sample_data = np.asarray(data['EDA'])
sampling_rate = info['sampling_rate']
L = info['signal_length']

# Create a timestamp for plotting, this is not necessary for the pipeline
timestamp = np.arange(0, len(sample_data)/sampling_rate, 1/sampling_rate)

signals={'Raw':sample_data}
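
Building a time axis with a floating-point step can be sensitive to rounding, so it is worth checking that the result has exactly one value per sample. The sketch below is one possible alternative, not part of BIOBSS, that derives the time axis from integer sample indices. The value of `n_samples` is a small stand-in chosen for illustration, not the length of the sample data.

```python
import numpy as np

sampling_rate = 700   # Hz, same rate as the sample data
n_samples = 2100      # illustrative length (3 seconds), not the real signal length

# Dividing integer sample indices by the sampling rate yields exactly one
# time value per sample, independent of rounding in a floating-point step.
timestamp = np.arange(n_samples) / sampling_rate

# Signal duration in seconds
duration = n_samples / sampling_rate
```

With this form, `len(timestamp)` always equals the number of samples, which keeps the time axis aligned with the signal when plotting.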

### <font color='Green'>__Create Bio Process Objects__ </font>
<a id="create_bioprocess"></a>

Any process can be added to the pipeline if its input is a signal and its output is a signal or a collection of signals. A process is added by passing the method to the Bio_Process constructor, which takes the following arguments:

- process_method: The method to be added to the pipeline
- inplace: If True, result of the process will modify the input signal. If False, the result of the process will be returned as a new signal and added as a new channel.
- prefix: The prefix to be added to the output channel name. If inplace is True, prefix will be ignored.
  - If inplace is False, the output channel name will be prefix + input channel name.
  - <code> (Default: None) (if inplace = False  and prefix = None, prefix will be set to 'processed') </code>
- return_index: If set, only the element of the result at this index is kept as the result. If None, the whole result is kept. (Default: None)
- argmap: A dictionary that maps argument names. For example, Bio_Channel already has a sampling_rate attribute. If the process method expects the sampling rate under a different name, that name can be mapped to the Bio_Channel attribute.<code> EXAMPLE : argmap = {'sampling_rate': 'fs'} or argmap = {'sampling_rate': 'sample_rate'} </code>
- **kwargs: Keyword arguments to be passed to the process method
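
To make the roles of `argmap`, `return_index` and `**kwargs` more concrete, here is a minimal sketch of how such a wrapper could behave. It is not the BIOBSS implementation: `call_process`, `moving_average` and `channel_attrs` are hypothetical names introduced only for this illustration.

```python
def call_process(method, signal, channel_attrs, argmap=None, return_index=None, **kwargs):
    """Hypothetical helper (not part of BIOBSS) that mimics the described arguments."""
    call_kwargs = dict(kwargs)
    # argmap maps a channel attribute name to the parameter name the method expects
    for attr_name, param_name in (argmap or {}).items():
        call_kwargs[param_name] = channel_attrs[attr_name]
    result = method(signal, **call_kwargs)
    # return_index keeps only one element of a multi-valued result
    if return_index is not None:
        result = result[return_index]
    return result


def moving_average(x, fs, window_s=1.0):
    """Toy process whose sampling-rate parameter is named `fs`."""
    n = max(1, int(fs * window_s))
    smoothed = []
    for i in range(len(x)):
        chunk = x[max(0, i - n + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed, n  # returns the signal and the window length in samples


channel_attrs = {"sampling_rate": 2}  # stand-in for a channel's attributes
smoothed = call_process(
    moving_average,
    [1.0, 3.0, 5.0, 7.0],
    channel_attrs,
    argmap={"sampling_rate": "fs"},  # channel attribute -> method parameter
    return_index=0,                  # keep only the smoothed signal
    window_s=1.0,                    # forwarded unchanged as a keyword argument
)
```

Here the toy method receives `fs=2` through the mapping, and only the first element of its returned tuple is kept.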






The pipeline processes the given input sequentially: the output of each process is passed on to the next one.

BIOBSS provides functionality to generate a pipeline using desired functions from different packages. In this example, bio process objects are created from Neurokit2 and BIOBSS packages as defined below.

In [4]:
# Create steps for the pipeline
# EDA decomposition step: returns a dictionary with the keys EDA_Tonic and EDA_Phasic
decompose = biobss.pipeline.Bio_Process(process_name ='decompose' ,process_method=biobss.edatools.eda_decompose,method="highpass")


In [5]:
# Create steps for the pipeline

# Filtering step: returns the filtered signal for all input signals
clean=biobss.pipeline.Bio_Process(process_name= 'clean',
    process_method=neurokit2.eda_clean, sampling_rate = sampling_rate)

In [7]:
# Signal normalization step: returns the normalized signal for all input signals
normalize = biobss.pipeline.Bio_Process(process_name= 'normalize',
    process_method=biobss.preprocess.normalize_signal)

In [8]:
# Signal resampling step: returns the resampled signal for all input signals (resampled to 350 Hz)
resample = biobss.pipeline.Bio_Process(process_name= 'resample',
    process_method=biobss.preprocess.resample_signal_object,target_sample_rate=350)

### <font color='Green'>__Create Feature Objects__ </font>
<a id="create_feature"></a>

Similarly, a feature object is created from the Flirt package by wrapping the ___get_stats___ function. Note that the required parameters should be provided when the feature object is created.

In [9]:
# Create feature extraction steps

# Extract signal features. input_signals dictionary contains feature prefixes as keys and input signals as values
# For example rms of EDA_Tonic is extracted as Tonic_rms in this case
signal_features = biobss.pipeline.Feature(name="signal_features", function=flirt.stats.common.get_stats,add_prefix_after=True,input_type=pd.Series)

In [10]:
flirt.stats.common.get_stats(sample_data)

{'mean': 6.491609817674261,
 'std': 0.8103523210948972,
 'min': 5.057525634765625,
 'max': 8.211135864257812,
 'ptp': 3.1536102294921875,
 'sum': 26901231.08444214,
 'energy': 177353539.959844,
 'skewness': 0.004636156463354328,
 'kurtosis': -1.4695282218969485,
 'peaks': 1,
 'rms': 6.541992732281867,
 'lineintegral': 18854.150390625,
 'n_above_mean': 1949163,
 'n_below_mean': 2194837,
 'n_sign_changes': 0,
 'iqr': 1.4865875244140625,
 'iqr_5_95': 2.2777557373046875,
 'pct_5': 5.323028564453125,
 'pct_95': 7.6007843017578125,
 'entropy': 15.22935103074018,
 'perm_entropy': 0.9998711817324639,
 'svd_entropy': 0.006630583507182337}

Another feature object is created from the BIOBSS package by wrapping the ___eda_stat_features___ function.

In [11]:
# Extract statistical features.
# For example mean of EDA_Tonic is extracted as Tonic_mean in this case
stat_features = biobss.pipeline.Feature(name="stat_features", function=biobss.edatools.eda_stat_features)

### __Create Bio Pipeline Object__
<a id="create_pipeline"></a>

In [12]:
# BIOBSS Pipeline is created
# windowed_process=True means that the pipeline will process the signal in windows
# window_size=60 means that the pipeline will process 60 seconds of data at a time
# step_size=60 means that the window is shifted by 60 seconds each time (no overlap between windows)
pipeline = biobss.pipeline.Bio_Pipeline(windowed_process=True,window_size=60,step_size=60)
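
The relation between `window_size`, `step_size` and the number of windows can be sketched as follows. This is an illustration under the assumption that only complete windows are kept. That assumption is consistent with the 98 windows reported in the pipeline output later in this notebook, but it is not taken from the BIOBSS source. `window_starts` is a hypothetical helper.

```python
def window_starts(n_samples, sampling_rate, window_size, step_size):
    """Hypothetical helper: first sample index of every complete window."""
    window_len = int(window_size * sampling_rate)  # window length in samples
    step_len = int(step_size * sampling_rate)      # shift between windows in samples
    return list(range(0, n_samples - window_len + 1, step_len))


n_samples = 4144000   # length of the sample signal shown in this notebook
sampling_rate = 700   # Hz

# Non-overlapping 60 s windows, as configured in the pipeline above
starts_no_overlap = window_starts(n_samples, sampling_rate, window_size=60, step_size=60)

# Overlapping windows: a 20 s step means consecutive windows share 40 s of data
starts_overlap = window_starts(n_samples, sampling_rate, window_size=60, step_size=20)

n_windows_no_overlap = len(starts_no_overlap)
n_windows_overlap = len(starts_overlap)
```

A step size smaller than the window size produces overlapping windows and therefore more feature rows for the same recording.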

### __Create Bio Channel Objects__
<a id="create_biochannel"></a>

In [13]:
# Input is created as Bio_Channel object
# Bio_Channel object requires following parameters: name, signal, sampling_rate
# Optionally timestamp can be provided
# If timestamp is provided, resolution of the timestamp should be provided. Possible resolutions are:'min', 's', 'ms', 'us'
# If timestamp is not provided, timestamp will be created using sampling_rate. Default resolution is 's'
bio_channel=biobss.pipeline.Bio_Channel(signal=sample_data,name="EDA_Raw",sampling_rate=700)

# A Bio_Channel object is created from the sample data, with the name EDA_Raw and a sampling rate of 700 Hz
# No timestamp is provided, so one will be created using sampling_rate

# Evaluating bio_channel on its own prints the channel properties
bio_channel

EDA_Raw (700Hz) ((5920.0,)s) (1 windows) ((4144000,)) ((4144000,)) (float64)[5.40046692 5.40885925 5.40161133 ... 7.26280212 7.26966858 7.2681427 ]

In [14]:
# bio_channel.channel accesses the data in the channel object
bio_channel.channel

array([5.40046692, 5.40885925, 5.40161133, ..., 7.26280212, 7.26966858,
       7.2681427 ])

### __Set Pipeline Inputs__
<a id="set_input"></a>

In [15]:
# Pipeline input can be set from an array, a dataframe, pandas series, list, Bio_Channel, Bio_Data

# In this case, the pipeline input is set with an array
pipeline.set_input(sample_data,sampling_rate=700,name='EDA_Raw')

# Alternatively, the pipeline input can be set with a Bio_Channel object, in this case the name and sampling rate are not required as they are already provided in the Bio_Channel object
pipeline.set_input(bio_channel)

pipeline.input

Signal object with 1 channel(s)
EDA_Raw (700Hz) ((5920.0,)s) (1 windows) ((4144000,))

#### __Add Bio Process Objects to Pipeline__
<a id="add_bioprocess"></a>

In [16]:
# Pipeline steps are added to the pipeline sequentially, the order of the steps is important as the output of one step is the input of the next step
# These steps will be processed in the order they are added to the pipeline
pipeline.preprocess_queue.add_process(clean,input_signals=['EDA_Raw'],output_signals=['EDA_Clean'])
pipeline.preprocess_queue.add_process(normalize,input_signals=['EDA_Clean'],output_signals=['EDA_Normalized'])
pipeline.preprocess_queue.add_process(decompose,input_signals=['EDA_Normalized'],output_signals=['EDA_Tonic','EDA_Phasic'])
pipeline.preprocess_queue.add_process(resample,input_signals=['EDA_Normalized'],output_signals=['EDA_Normalized'])
pipeline.preprocess_queue.add_process(resample,input_signals=['EDA_Tonic'],output_signals=['EDA_Tonic'])
pipeline.preprocess_queue.add_process(resample,input_signals=['EDA_Phasic'],output_signals=['EDA_Phasic'])

# Currently all the steps are added to the preprocess_queue
# After the preprocess_queue is processed, data will be segmented into windows (if windowed_process=True)
# For running process in windows, the process_queue is used

## This structure will change in the future
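
The idea of a queue whose steps read from and write to named channels can be sketched in plain Python. This is a simplified illustration, not the BIOBSS implementation: `run_queue`, `center` and `split_by_sign` are hypothetical stand-ins for the queue and for the clean, normalize and decompose steps.

```python
def run_queue(channels, queue):
    """Hypothetical helper: apply each step in order on named channels."""
    for func, input_names, output_names in queue:
        result = func(*(channels[name] for name in input_names))
        # A single-output step returns one signal; wrap it so that single-
        # and multi-output steps are stored in the same way.
        if len(output_names) == 1:
            result = (result,)
        for name, signal in zip(output_names, result):
            channels[name] = signal
    return channels


def center(signal):
    """Toy preprocessing step: subtract the mean."""
    mean = sum(signal) / len(signal)
    return [value - mean for value in signal]


def split_by_sign(signal):
    """Toy two-output step, standing in for a decomposition."""
    positive = [max(value, 0.0) for value in signal]
    negative = [min(value, 0.0) for value in signal]
    return positive, negative


channels = {"Raw": [2.0, 4.0, 6.0]}
queue = [
    (center, ["Raw"], ["Centered"]),
    (split_by_sign, ["Centered"], ["Positive", "Negative"]),
]
channels = run_queue(channels, queue)
```

Because the second step reads the `Centered` channel, it has to be queued after the step that creates it. Reversing the order in this sketch would raise a `KeyError`, which is why the order of the steps matters.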

#### __Add Feature Objects to Pipeline__
<a id="add_feature"></a>

In [17]:
pipeline.add_feature_step(signal_features,feature_prefix = 'Normalized_f',input_signals=['EDA_Normalized'])
pipeline.add_feature_step(stat_features,feature_prefix = 'Normalized',input_signals=['EDA_Normalized'])
pipeline.add_feature_step(signal_features,feature_prefix = 'Tonic_f',input_signals=['EDA_Tonic'])
pipeline.add_feature_step(stat_features,feature_prefix = 'Tonic',input_signals=['EDA_Tonic'])
pipeline.add_feature_step(signal_features,feature_prefix = 'Phasic_f',input_signals=['EDA_Phasic'])
pipeline.add_feature_step(stat_features,feature_prefix = 'Phasic',input_signals=['EDA_Phasic'])
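
As a rough illustration of what a feature step does, the sketch below applies a function that returns a dictionary of features to each window and prepends a prefix to every feature name. The helper names `basic_stats` and `extract` are hypothetical. The exact naming rules used by BIOBSS (for example the effect of `add_prefix_after`) are not reproduced here.

```python
import statistics


def basic_stats(window):
    """Toy feature function returning a dictionary, similar in shape to get_stats."""
    return {
        "mean": statistics.fmean(window),
        "min": min(window),
        "max": max(window),
        "ptp": max(window) - min(window),
    }


def extract(windows, feature_func, prefix):
    """Hypothetical helper: one row of prefixed features per window."""
    rows = []
    for window in windows:
        features = feature_func(window)
        rows.append({f"{prefix}_{name}": value for name, value in features.items()})
    return rows


windows = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]  # two illustrative windows
rows = extract(windows, basic_stats, prefix="Tonic")
```

Each element of `rows` corresponds to one window, which mirrors the one-row-per-window layout of the feature table produced by the pipeline.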


In [18]:
# Representation of the pipeline
pipeline

Bio_Pipeline:
	Preprocessors: Process list:
	1: clean(EDA_Raw) -> EDA_Clean
	2: normalize(EDA_Clean) -> EDA_Normalized
	3: decompose(EDA_Normalized) -> EDA_Tonic,EDA_Phasic
	4: resample(EDA_Normalized) -> EDA_Normalized
	5: resample(EDA_Tonic) -> EDA_Tonic
	6: resample(EDA_Phasic) -> EDA_Phasic

	Processors: Process list:

	Postprocessors: Process list:

	Window Size(Seconds): 60
	Step Size: 60

#### __Run Pipeline__
<a id="run_pipeline"></a>

In [19]:
# Pipeline is run, this will process the input data in the pipeline
pipeline.run_pipeline()

  warn("If output of the process is Bio_Data, output signals argument must be a dictionary or it will bi ignored")


In [20]:
# Representation of the pipeline data after running
pipeline.data

# Data is cleaned
# Data is normalized
# Data is decomposed into tonic and phasic components; these components are added to the pipeline data as the EDA_Tonic and EDA_Phasic channels
# Lastly, the normalized, tonic and phasic channels are resampled to 350 Hz (target_sample_rate=350)
# Data is segmented into windows of 60 seconds with a step size of 60 seconds

Signal object with 5 channel(s)
EDA_Raw (700Hz) (60.0s) (98 windows) ((98, 42000))
EDA_Clean (700Hz) (60.0s) (98 windows) ((98, 42000))
EDA_Normalized (350Hz) (60.0s) (98 windows) ((98, 21000))
EDA_Tonic (350Hz) (60.0s) (98 windows) ((98, 21000))
EDA_Phasic (350Hz) (60.0s) (98 windows) ((98, 21000))

#### __Extract Features__
<a id="extract_features"></a>

In [21]:
# Statistical and signal features are extracted from the EDA_Normalized, EDA_Tonic and EDA_Phasic channels
# Supplied prefix is added to the feature name
pipeline.extract_features()

In [22]:
# Representation of the pipeline features after extraction
# Each row is a feature vector for a window
# index is the timestamp of the window (this can be selected as start, end or center of the window)
pipeline.features

Unnamed: 0,Normalized_mean,Normalized_std,Normalized_min,Normalized_max,Normalized_ptp,Normalized_sum,Normalized_energy,Normalized_skewness,Normalized_kurtosis,Normalized_peaks,...,Phasic_perm_entropy,Phasic_svd_entropy,Phasic_signal_mean,Phasic_signal_std,Phasic_signal_max,Phasic_signal_min,Phasic_signal_range,Phasic_signal_kurtosis,Phasic_signal_skew,Phasic_signal_momentum
0.0,-1.306393,0.062686,-1.503240,-0.770217,0.733024,-27434.243861,35922.413714,2.298977,4.608311,0,...,0.991289,0.043767,0.001101,0.026432,0.107763,-0.089481,0.197244,5.162175,0.612499,0.000699
60000.0,-1.127855,0.064259,-1.292727,-0.968544,0.324184,-23684.947286,26799.891590,-0.636777,0.436994,0,...,0.974122,0.060710,-0.001521,0.022604,0.099171,-0.040446,0.139617,3.444132,1.518407,0.000511
120000.0,-1.403079,0.048873,-1.458883,-1.288423,0.170460,-29464.654566,41391.391665,0.829802,-0.541948,0,...,0.995749,0.110980,0.000281,0.004254,0.015477,-0.011377,0.026854,0.403755,-0.036546,0.000018
180000.0,-1.448536,0.010259,-1.470541,-1.421197,0.049344,-30419.245682,44065.567772,0.266085,-0.256773,0,...,0.990984,0.099563,0.000212,0.004630,0.016339,-0.011875,0.028214,0.381608,0.544795,0.000021
240000.0,-1.462740,0.010572,-1.482210,-1.435018,0.047192,-30717.537805,44934.115233,0.580621,-0.220862,0,...,0.984896,0.126958,-0.000241,0.003221,0.009511,-0.008324,0.017835,-0.160100,0.095751,0.000010
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5580000.0,0.653533,0.011884,0.628771,0.689395,0.060624,13724.189489,8972.174383,0.473253,0.081926,0,...,0.985763,0.092923,-0.000084,0.007905,0.026500,-0.021184,0.047684,0.492056,0.145790,0.000062
5640000.0,0.759860,0.128804,0.625495,1.010939,0.385444,15957.054468,12473.521712,0.659615,-1.226944,0,...,0.980562,0.048939,-0.000182,0.036746,0.117735,-0.137439,0.255175,2.675637,-0.191880,0.001350
5700000.0,0.890010,0.018847,0.858397,0.940686,0.082289,18690.201830,16641.918990,0.611062,-0.389302,0,...,0.981126,0.103008,-0.000183,0.006742,0.019091,-0.016013,0.035104,-0.443534,0.370807,0.000045
5760000.0,0.896646,0.010765,0.872314,0.920338,0.048024,18829.574093,16885.903004,-0.038059,-0.681742,0,...,0.979104,0.117435,0.000213,0.005863,0.016819,-0.013719,0.030538,-0.679443,0.090121,0.000034
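
The index values in the table above (0.0, 60000.0, 120000.0, ...) are consistent with window start times expressed in milliseconds for a 60-second step. The sketch below shows how such an index could be derived for the start, center or end of each window. It is an assumption-based illustration with a hypothetical helper, `window_index_ms`, and not the BIOBSS implementation.

```python
def window_index_ms(n_windows, window_size, step_size, position="start"):
    """Hypothetical helper: timestamp in milliseconds for each window."""
    # Offset inside the window, in seconds, for the chosen reference point
    offsets = {"start": 0.0, "center": window_size / 2, "end": float(window_size)}
    offset = offsets[position]
    return [(i * step_size + offset) * 1000.0 for i in range(n_windows)]


# First three windows of a 60 s window / 60 s step configuration
start_index = window_index_ms(3, window_size=60, step_size=60)
center_index = window_index_ms(3, window_size=60, step_size=60, position="center")
end_index = window_index_ms(3, window_size=60, step_size=60, position="end")
```

Choosing the center or end of the window shifts every index value by a constant offset, so the spacing between rows stays equal to the step size.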
