[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/obss/BIOBSS/blob/tutorials/examples/acc_processing.ipynb)

__BIOBSS - ACC Signal Processing__

_This notebook provides guidelines for using BIOBSS to calculate activity metrics and extract features from ACC signals._

In [None]:
%%bash
git clone https://github.com/obss/biobss.git
cd biobss
pip install .

In [None]:
#Import BIOBSS and the other required packages

import biobss
import os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Table of Contents
1. [ACC Sample Data](#sampledata)<br>
2. [ACC Signal Preprocessing](#acc_pre)<br>
    2.1. [Filtering](#acc_filter)<br>
    2.2. [Peak Detection](#acc_peaks)<br>
    2.3. [Plotting](#acc_plot)<br>
3. [Activity Metrics from ACC Signals](#acc_actind)<br>
    3.1. [Dataset Generation](#acc_dataset)<br>
    3.2. [Calculation of Activity Metrics](#act_ind)<br>
4. [ACC Feature Extraction](#acc_features)<br>

### __ACC Sample Data__
<a id="sampledata"></a>

ACC sample data is provided as a csv file in BIOBSS\sample data. The data file contains 3-axis ACC signals that are 5 minutes long and sampled at 32 Hz.

In [None]:
#Load the sample data
data, info = biobss.utils.load_sample_data(data_type='ACC')
accx = np.asarray(data['ACCx'])
accy = np.asarray(data['ACCy'])
accz = np.asarray(data['ACCz'])
fs = info['sampling_rate']
L = info['signal_length']
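As a quick sanity check on the loaded arrays, the expected number of samples per axis follows from the description above (5 minutes at 32 Hz). The sketch below does not call BIOBSS, and its variable names are illustrative only.

```python
import numpy as np

fs_expected = 32                        # sampling rate stated above (Hz)
duration_s = 5 * 60                     # 5 minutes expressed in seconds
n_expected = fs_expected * duration_s   # expected samples per axis

# Time axis in seconds, useful for plotting or windowing
t_expected = np.arange(n_expected) / fs_expected
print(n_expected, t_expected[-1])
```

Comparing `len(accx)` with `n_expected` is one way to confirm that the file was read completely.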

### __ACC Signal Preprocessing__
<a id="acc_pre"></a>

#### __Filtering__
<a id="acc_filter"></a>

BIOBSS provides the __filter_signal__ function, which filters signals with a Butterworth filter designed using __SciPy__. The filter parameters (filter type, filter order, cutoff frequencies) should be defined as shown below.

In [None]:
#Filtering ACC signals by defining the filter parameters

f_accx= biobss.preprocess.filter_signal(sig=accx, sampling_rate=fs, filter_type='lowpass', N=2, f_upper=10)
f_accy= biobss.preprocess.filter_signal(sig=accy, sampling_rate=fs, filter_type='lowpass', N=2, f_upper=10)
f_accz= biobss.preprocess.filter_signal(sig=accz, sampling_rate=fs, filter_type='lowpass', N=2, f_upper=10)

As an alternative, pre-defined filters can be used for each signal type. For this purpose, set _signal_type_ to 'ACC' and select a _method_.

In [None]:
#Filter ACC signal by using predefined filters
filtered_accx=biobss.preprocess.filter_signal(sig=accx, sampling_rate=fs, signal_type='ACC', method='lowpass')
filtered_accy=biobss.preprocess.filter_signal(sig=accy, sampling_rate=fs, signal_type='ACC', method='lowpass')
filtered_accz=biobss.preprocess.filter_signal(sig=accz, sampling_rate=fs, signal_type='ACC', method='lowpass')
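For intuition about what a second-order low-pass Butterworth filter with a 10 Hz cutoff does to a 32 Hz signal, here is a minimal sketch written directly against SciPy. It is not the BIOBSS implementation: the helper name `butter_lowpass_demo`, the use of second-order sections and the zero-phase `sosfiltfilt` call are all assumptions made for illustration.

```python
import numpy as np
from scipy import signal

def butter_lowpass_demo(sig, fs, cutoff_hz=10.0, order=2):
    """Zero-phase Butterworth low-pass filter (illustrative helper)."""
    sos = signal.butter(order, cutoff_hz, btype="lowpass", fs=fs, output="sos")
    return signal.sosfiltfilt(sos, sig)

fs_demo = 32
t_demo = np.arange(10 * fs_demo) / fs_demo
slow = np.sin(2 * np.pi * 1.0 * t_demo)          # 1 Hz movement component
fast = 0.5 * np.sin(2 * np.pi * 14.0 * t_demo)   # 14 Hz noise-like component
noisy = slow + fast

filtered_demo = butter_lowpass_demo(noisy, fs_demo)

# The residual after filtering should be much smaller than the added 14 Hz term
print(np.std(noisy - slow), np.std(filtered_demo - slow))
```

The 1 Hz component passes almost unchanged, while the 14 Hz component, which lies above the cutoff, is strongly attenuated.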

#### __Peak Detection__
<a id="acc_peaks"></a>

The ___peak_detection___ function can be used to detect peaks in the ACC signal(s). The _method_ can be 'peakdet', 'heartpy' or 'scipy'. The function returns a dictionary containing arrays of peak and trough locations.

In [None]:
#Detect peaks using 'peakdet' method (delta=0.01).
#The delta parameter should be adjusted according to the amplitude of the signal.

#Store the result under a separate name so that the sample data info loaded earlier is not overwritten
peak_info=biobss.preprocess.peak_detection(sig=filtered_accx, sampling_rate=fs, method='peakdet', delta=0.01)

locs_peaks=peak_info['Peak_locs']
peaks=filtered_accx[locs_peaks]
locs_onsets=peak_info['Trough_locs']
onsets=filtered_accx[locs_onsets]
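To see why _delta_ has to be matched to the signal amplitude, the following sketch implements a delta-threshold peak picker in the spirit of the classic "peakdet" idea: a local maximum is accepted as a peak only after the signal has dropped by more than _delta_ below it, and likewise for troughs. The function `delta_peaks` is a hypothetical stand-in written for this illustration and may differ from what BIOBSS does internally.

```python
import numpy as np

def delta_peaks(sig, delta):
    """Return (peak_indices, trough_indices) using a delta-threshold rule."""
    peaks, troughs = [], []
    cur_max, cur_min = -np.inf, np.inf
    max_pos, min_pos = 0, 0
    looking_for_max = True
    for i, value in enumerate(sig):
        # Track the running extremes since the last accepted peak or trough
        if value > cur_max:
            cur_max, max_pos = value, i
        if value < cur_min:
            cur_min, min_pos = value, i
        if looking_for_max and value < cur_max - delta:
            peaks.append(max_pos)          # signal fell by more than delta
            cur_min, min_pos = value, i
            looking_for_max = False
        elif not looking_for_max and value > cur_min + delta:
            troughs.append(min_pos)        # signal rose by more than delta
            cur_max, max_pos = value, i
            looking_for_max = True
    return peaks, troughs

fs_demo = 32
t_demo = np.arange(2 * fs_demo) / fs_demo
demo = np.sin(2 * np.pi * 1.0 * t_demo)    # 1 Hz sine, two full cycles
peak_idx, trough_idx = delta_peaks(demo, delta=0.5)
print(peak_idx, trough_idx)
```

With a _delta_ that is too small relative to the noise level, small ripples are reported as peaks. With a _delta_ larger than the peak-to-trough swing, nothing is detected.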

#### __Plotting__
<a id="acc_plot"></a>

BIOBSS provides plotting functions specific to each signal type. To plot ACC signals, use the ___plot_acc___ function. The _signals_ and _peaks_ arguments should be dictionaries with keys chosen as shown below. The plots can be generated using either __Matplotlib__ or __Plotly__.

In [None]:
#Generate inputs as dictionaries
signals = {'x-axis':{'Raw': accx , 'Filtered': filtered_accx}, 'y-axis':{'Raw': accy , 'Filtered': filtered_accy}, 'z-axis':{'Raw': accz , 'Filtered': filtered_accz}}
peaks = {'x-axis':{'Raw':{'Peaks': locs_peaks}, 'Filtered':{'Peaks': locs_peaks}}}

In [None]:
#Plot ACC signals using Matplotlib (default)
biobss.imutools.plot_acc(signals=signals, peaks=peaks, sampling_rate=fs, method='matplotlib', show_peaks=True)

In [None]:
#Plot ACC signals using Plotly
biobss.imutools.plot_acc(signals=signals, peaks=peaks, sampling_rate=fs, method='plotly', show_peaks=True)

### __Activity Metrics from ACC Signals__
<a id="acc_actind"></a>

BIOBSS provides a set of functions to calculate activity metrics defined in the literature. These activity metrics are:

- Proportional Integration Method (PIM)
- Zero Crossing Method (ZCM)
- Time Above Threshold (TAT)
- Mean Amplitude Deviation (MAD)
- Euclidean Norm Minus One (ENMO)
- High-pass Filtered Euclidean Norm (HFEN)
- Activity Index (AI)

Reference: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0261718
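As a rough illustration of two of the metrics listed above, the sketch below computes ENMO and MAD from their commonly cited definitions. It assumes the acceleration is expressed in units of g and treats the whole input as a single window. The helper names are hypothetical, and the BIOBSS implementation may differ in detail.

```python
import numpy as np

def vector_magnitude(x, y, z):
    """Euclidean norm of the three acceleration axes, sample by sample."""
    return np.sqrt(np.asarray(x) ** 2 + np.asarray(y) ** 2 + np.asarray(z) ** 2)

def enmo_demo(x, y, z):
    """Euclidean Norm Minus One: mean of max(|a| - 1 g, 0) over the window."""
    r = vector_magnitude(x, y, z)
    return float(np.mean(np.maximum(r - 1.0, 0.0)))

def mad_demo(x, y, z):
    """Mean Amplitude Deviation: mean absolute deviation of |a| from its mean."""
    r = vector_magnitude(x, y, z)
    return float(np.mean(np.abs(r - r.mean())))

# Device at rest: only gravity (1 g) on the z axis
rest = (np.zeros(4), np.zeros(4), np.ones(4))
# Toy movement: z alternates between 1 g and 2 g
moving = (np.zeros(4), np.zeros(4), np.array([1.0, 2.0, 1.0, 2.0]))

print(enmo_demo(*rest), mad_demo(*rest))
print(enmo_demo(*moving), mad_demo(*moving))
```

Both metrics are zero for a stationary device, which is why they are used as movement intensity indicators.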


#### __Dataset Generation__
<a id="acc_dataset"></a>

The preprocessing steps that should be applied to the raw acceleration signal differ for each of the activity metrics listed above. In other words, each activity index can be calculated only from specific datasets. BIOBSS provides the ___generate_dataset___ function, which applies the appropriate preprocessing steps (dataset generation) to prevent errors and simplify the process for users.

The generated datasets are:
- UFXYZ: unfiltered acc signals 
- UFM: magnitude of unfiltered acc signals 
- UFM_modified: modified magnitude of unfiltered signals (absolute(UFM-length(UFM)))
- UFNM: normalized magnitude of unfiltered acc signals 
- FXYZ: filtered acc signals
- FXYZ_modified: modified filtered acc signals (absolute(FXYZ))   
- FMpre: magnitude of filtered acc signals
- SpecialXYZ: filtered acc signals (special filter parameters)  
- SpecialM: magnitude of filtered acc signals (special filter parameters)
- FMpost: filtered magnitude of acc signals
- FMpost_modified: modified filtered magnitude of acc signals (absolute(FMpost))
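The difference between FMpre (magnitude of the filtered axes) and FMpost (filtered magnitude) comes down to the order of two operations that do not commute. The sketch below illustrates this with a plain moving average standing in for the actual filter. The moving average, the random test data and the helper names are assumptions made for illustration, not the BIOBSS implementation.

```python
import numpy as np

def moving_average(sig, width=5):
    """Simple smoothing filter used here as a stand-in for a low-pass filter."""
    kernel = np.ones(width) / width
    return np.convolve(sig, kernel, mode="same")

def magnitude(x, y, z):
    """Euclidean norm of the three axes, sample by sample."""
    return np.sqrt(x ** 2 + y ** 2 + z ** 2)

rng = np.random.default_rng(0)
x_demo, y_demo, z_demo = rng.normal(size=(3, 320))

# 'pre': filter each axis first, then take the magnitude
fm_pre = magnitude(moving_average(x_demo), moving_average(y_demo), moving_average(z_demo))
# 'post': take the magnitude first, then filter it
fm_post = moving_average(magnitude(x_demo, y_demo, z_demo))

print(np.allclose(fm_pre, fm_post))
```

Because the two results differ, a metric defined on one dataset cannot simply be computed from the other.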


The dataset generation step is part of the activity index calculation pipeline, and the ___calc_activity_index___ function handles it by calling ___generate_dataset___. However, ___generate_dataset___ can also be used independently by defining the input arguments as shown below.

In [None]:
#Generate the datasets
ufxyz           = biobss.imutools.generate_dataset(accx,accy,accz,fs,filtering=False,filtering_order=None,magnitude=False,normalize=False,modify=False)
ufm             = biobss.imutools.generate_dataset(accx,accy,accz,fs,filtering=False,filtering_order=None,magnitude=True,normalize=False,modify=False)
ufm_modified    = biobss.imutools.generate_dataset(accx,accy,accz,fs,filtering=False,filtering_order=None,magnitude=True,normalize=False,modify=True)
ufnm            = biobss.imutools.generate_dataset(accx,accy,accz,fs,filtering=False,filtering_order=None,magnitude=True,normalize=True,modify=False)
fxyz            = biobss.imutools.generate_dataset(accx,accy,accz,fs,filtering=True,filtering_order='pre',magnitude=False,normalize=False,modify=False)
fxyz_modified   = biobss.imutools.generate_dataset(accx,accy,accz,fs,filtering=True,filtering_order='pre',magnitude=False,normalize=False,modify=True)
fmpre           = biobss.imutools.generate_dataset(accx,accy,accz,fs,filtering=True,filtering_order='pre',magnitude=True,normalize=False,modify=False)
specialxyz      = biobss.imutools.generate_dataset(accx,accy,accz,fs,filtering=True,filtering_order='pre',magnitude=False,normalize=False,modify=False,filter_type='highpass',N=4,f_lower=0.2)
specialm        = biobss.imutools.generate_dataset(accx,accy,accz,fs,filtering=True,filtering_order='pre',magnitude=True,normalize=False,modify=False,filter_type='highpass',N=4,f_lower=0.2)
fmpost          = biobss.imutools.generate_dataset(accx,accy,accz,fs,filtering=True,filtering_order='post',magnitude=True,normalize=False,modify=False)
fmpost_modified = biobss.imutools.generate_dataset(accx,accy,accz,fs,filtering=True,filtering_order='post',magnitude=True,normalize=False,modify=True)

#### __Calculation of Activity Metrics__
<a id="act_ind"></a>

The ___calc_activity_index___ function is used to calculate an activity index defined by _metric_ for the selected _input_types_. Note that if _input_types_ is not passed to the function, the activity metric is calculated for all of the valid input types.

In [None]:
#Calculate activity metrics
pim = biobss.imutools.calc_activity_index(accx, accy, accz, signal_length=60, sampling_rate=fs, metric='PIM')
zcm = biobss.imutools.calc_activity_index(accx, accy, accz, signal_length=60, sampling_rate=fs, metric='ZCM')
tat = biobss.imutools.calc_activity_index(accx, accy, accz, signal_length=60, sampling_rate=fs, metric='TAT')
mad = biobss.imutools.calc_activity_index(accx, accy, accz, signal_length=60, sampling_rate=fs, metric='MAD')
enmo = biobss.imutools.calc_activity_index(accx, accy, accz, signal_length=60, sampling_rate=fs, metric='ENMO')
hfen = biobss.imutools.calc_activity_index(accx, accy, accz, signal_length=60, sampling_rate=fs, metric='HFEN')
ai = biobss.imutools.calc_activity_index(accx, accy, accz, signal_length=60, sampling_rate=fs, metric='AI', baseline_variance=[0.5,0.5,0.5])
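The three count-style metrics (PIM, ZCM and TAT) can be illustrated on a single preprocessed signal with a few lines of NumPy. This is a simplified sketch of the general ideas only: the rectangle-rule integration, the threshold handling and the function names are assumptions, and the BIOBSS functions may use different conventions.

```python
import numpy as np

def pim_demo(sig, fs):
    """Proportional Integration Method: area under |signal| (rectangle rule)."""
    return float(np.sum(np.abs(sig)) / fs)

def zcm_demo(sig, threshold=0.0):
    """Zero Crossing Method: number of times the signal crosses the threshold."""
    above = np.asarray(sig) > threshold
    return int(np.count_nonzero(above[1:] != above[:-1]))

def tat_demo(sig, fs, threshold=0.0):
    """Time Above Threshold: seconds spent above the threshold."""
    return float(np.count_nonzero(np.asarray(sig) > threshold) / fs)

toy = np.array([0.0, 1.0, -1.0, 1.0, -1.0, 0.0])
toy_fs = 2  # Hz, chosen only to keep the arithmetic easy to follow

print(pim_demo(toy, toy_fs), zcm_demo(toy), tat_demo(toy, toy_fs))
```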

### __ACC Feature Extraction__
<a id="acc_features"></a>

The features which are used for analysis of ACC signals can be categorized as statistical features, frequency domain features and correlation features. 

Statistical features:
- mean: mean of the signal amplitude
- std: standard deviation of the signal amplitude
- mad: mean absolute deviation of the signal amplitude
- min: minimum value of the signal amplitude
- max: maximum value of the signal amplitude
- range: difference of maximum and minimum values of the signal amplitude
- median: median value of the signal amplitude
- medad: median absolute deviation of the signal amplitude
- iqr: interquartile range of the signal amplitude
- ncount: number of negative values 
- pcount: number of positive values 
- abmean: number of values above mean
- npeaks: number of peaks
- skew: skewness of the signal
- kurtosis: kurtosis of the signal
- energy: signal energy (the mean of sum of squares of the values in a window)
- momentum: signal momentum
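Several of the statistical features above can be written directly from their descriptions. The sketch below covers a subset of them for a single window. The function name `basic_stat_features` and the choice of population standard deviation are assumptions, so the values may not match the BIOBSS output exactly.

```python
import numpy as np

def basic_stat_features(sig):
    """Compute a subset of the statistical features for one signal window."""
    sig = np.asarray(sig, dtype=float)
    mean = sig.mean()
    median = np.median(sig)
    return {
        "mean": float(mean),
        "std": float(sig.std()),                       # population std (assumption)
        "mad": float(np.mean(np.abs(sig - mean))),     # mean absolute deviation
        "range": float(sig.max() - sig.min()),
        "median": float(median),
        "medad": float(np.median(np.abs(sig - median))),
        "iqr": float(np.percentile(sig, 75) - np.percentile(sig, 25)),
        "ncount": int(np.sum(sig < 0)),
        "pcount": int(np.sum(sig > 0)),
        "abmean": int(np.sum(sig > mean)),
        "energy": float(np.mean(sig ** 2)),            # mean of squared values
    }

feats_demo = basic_stat_features([-2, -1, 0, 1, 2])
print(feats_demo)
```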

Frequency domain features:
- fft_mean: mean of fft peaks
- fft_std: standard deviation of fft peaks
- fft_mad: mean absolute deviation of fft peaks
- fft_min: minimum value of fft peaks
- fft_max: maximum value of fft peaks
- fft_range: difference of maximum and minimum values of fft peaks
- fft_median: median value of fft peaks
- fft_medad: median absolute deviation of fft peaks
- fft_iqr: interquartile range of fft peaks
- fft_abmean: number of fft peaks above mean
- fft_npeaks: number of fft peaks
- fft_skew: skewness of fft peaks
- fft_kurtosis: kurtosis of fft peaks
- fft_energy: energy of fft peaks
- fft_entropy: entropy of fft peaks
- f1sc: signal power in the range of 0.1 to 0.2 Hz
- f2sc: signal power in the range of 0.2 to 0.3 Hz
- f3sc: signal power in the range of 0.3 to 0.4 Hz
- max_freq: frequency of maximum fft peak
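Two of the frequency domain features, the frequency of the largest spectral peak and the power in a narrow band, can be sketched with NumPy's real FFT. The helper names and the exact power definition (sum of squared FFT magnitudes) are assumptions for illustration and are not taken from BIOBSS.

```python
import numpy as np

def dominant_frequency(sig, fs):
    """Frequency (Hz) of the largest FFT magnitude, ignoring the DC offset."""
    sig = np.asarray(sig, dtype=float)
    spectrum = np.abs(np.fft.rfft(sig - sig.mean()))
    freqs = np.fft.rfftfreq(sig.size, d=1.0 / fs)
    return float(freqs[np.argmax(spectrum)])

def band_power(sig, fs, f_low, f_high):
    """Sum of squared FFT magnitudes for f_low <= f < f_high."""
    sig = np.asarray(sig, dtype=float)
    spectrum = np.abs(np.fft.rfft(sig - sig.mean())) ** 2
    freqs = np.fft.rfftfreq(sig.size, d=1.0 / fs)
    mask = (freqs >= f_low) & (freqs < f_high)
    return float(np.sum(spectrum[mask]))

fs_demo = 32
t_demo = np.arange(10 * fs_demo) / fs_demo      # 10 s window
tone = np.sin(2 * np.pi * 2.0 * t_demo)         # 2 Hz test tone
print(dominant_frequency(tone, fs_demo))
```

The frequency resolution of the FFT is the reciprocal of the window duration, so separating bands that are only 0.1 Hz wide requires windows of at least 10 seconds.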

Correlation features:
- accx_accy_corr: correlation coefficient for x and y axes
- accx_accz_corr: correlation coefficient for x and z axes
- accy_accz_corr: correlation coefficient for y and z axes
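The correlation features are pairwise correlation coefficients between axes. A minimal sketch using `numpy.corrcoef` is given below. The function name `axis_correlations` is hypothetical, and the use of the Pearson coefficient is an assumption about what "correlation coefficient" means here.

```python
from itertools import combinations

import numpy as np

def axis_correlations(signals, signal_names):
    """Pearson correlation for every pair of signals, keyed as '<a>_<b>_corr'."""
    features = {}
    for i, j in combinations(range(len(signals)), 2):
        r = np.corrcoef(signals[i], signals[j])[0, 1]
        features[f"{signal_names[i]}_{signal_names[j]}_corr"] = float(r)
    return features

x_demo = np.arange(10, dtype=float)
y_demo = 2.0 * x_demo + 1.0      # perfectly correlated with x
z_demo = -x_demo                 # perfectly anti-correlated with x

corr_demo = axis_correlations([x_demo, y_demo, z_demo], ["accx", "accy", "accz"])
print(corr_demo)
```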

Reference: https://towardsdatascience.com/feature-engineering-on-time-series-data-transforming-signal-data-of-a-smartphone-accelerometer-for-72cbe34b8a60

These features can be calculated separately for each domain using the functions ___acc_freq_features___, ___acc_stat_features___ and ___acc_corr_features___. For multi-axis signals, the signal arrays and signal names should be passed as lists. The features are then calculated for each signal in the list. Note that the _signals_ and _signal_names_ lists should have the same order.

In [None]:
#Generate list of signals and signal_names to be used in feature calculation.
signals = [accx, accy, accz]
signal_names = ['accx', 'accy', 'accz']

In [None]:
#Calculate frequency domain, statistical and correlation features for the signals in the list (x, y and z axes)
features_freq = biobss.imutools.acc_features.acc_freq_features(signals=signals, signal_names=signal_names, sampling_rate=fs)
features_stat = biobss.imutools.acc_features.acc_stat_features(signals=signals, signal_names=signal_names, sampling_rate=fs)
features_corr = biobss.imutools.acc_features.acc_corr_features(signals=signals, signal_names=signal_names, sampling_rate=fs)

The ___get_acc_features___ function is used to calculate features for multiple domains at a time. 

In [None]:
features_acc = biobss.imutools.get_acc_features(signals=signals, signal_names=signal_names, sampling_rate=fs)

In order to calculate the features for the signal vector magnitude ($\sqrt{accx^2 + accy^2 + accz^2}$), _magnitude_ should be True.

In [None]:
features_acc = biobss.imutools.get_acc_features(signals, signal_names, sampling_rate=fs, magnitude=True)
features_freq = biobss.imutools.acc_features.acc_freq_features(signals=signals, signal_names=signal_names, sampling_rate=fs, magnitude=True)
features_stat = biobss.imutools.acc_features.acc_stat_features(signals=signals, signal_names=signal_names, sampling_rate=fs, magnitude=True)
features_corr = biobss.imutools.acc_features.acc_corr_features(signals=signals, signal_names=signal_names, sampling_rate=fs)