# How to Use the Feature Extraction Toolbox

In [None]:
import pandas as pd

from package_name.feature_extraction import settings, extraction

In [None]:
# Load the USC-HAD dataset (update the path to point to your local copy of USCHAD_data.csv)
filename = "/home/edumaba/SCAI_lab/proj-adl-classification/Public Dataset/USCHAD_data.csv"
dataset = pd.read_csv(filename)

# Keep only the data from subject 1 as a small test subset
dataset = dataset.loc[dataset["subject"] == 1]

### Parameter Settings for Feature Extraction

There are three categories of features: Statistical, Spectral, and Time-Frequency. Each category has its own class holding the parameters required to calculate its features, and a given configuration can be saved in either JSON or YAML format.

In [None]:
# Initialization of the feature extraction parameters
statistical_params = settings.StatisticalFeatureParams(25)
spectral_params = settings.SpectralFeatureParams(25)
time_freq_params = settings.TimeFrequencyFeatureParams(25)

# Save and load the parameters
statistical_params.to_json("statistical_params.json")
statistical_params_2 = settings.StatisticalFeatureParams.from_json("statistical_params.json")

assert statistical_params.get_settings_as_dict() == statistical_params_2.get_settings_as_dict()
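To illustrate what the JSON round trip above involves, here is a minimal sketch of a parameter class that serializes its settings to a dictionary and back. `ExampleFeatureParams` and its fields are hypothetical stand-ins, not the toolbox's implementation. This notebook does not document what the positional argument (25 above) means, so the sketch keeps it as an opaque `base_param`. YAML support is left out because it would require a third-party parser.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class ExampleFeatureParams:
    """Hypothetical parameter container, loosely modelled on StatisticalFeatureParams."""

    base_param: int  # opaque stand-in for the positional argument (25 above)
    calculators: Optional[List[str]] = None  # None is assumed to mean "all calculators"

    def get_settings_as_dict(self) -> dict:
        # asdict rebuilds nested lists, so the returned dict is independent of the instance
        return asdict(self)

    def to_json(self, path: str) -> None:
        with open(path, "w") as f:
            json.dump(self.get_settings_as_dict(), f, indent=2)

    @classmethod
    def from_json(cls, path: str) -> "ExampleFeatureParams":
        with open(path) as f:
            return cls(**json.load(f))


# Round trip through a temporary file and compare the settings
params = ExampleFeatureParams(25, calculators=["mean", "std"])
path = os.path.join(tempfile.mkdtemp(), "params.json")
params.to_json(path)
restored = ExampleFeatureParams.from_json(path)
assert restored.get_settings_as_dict() == params.get_settings_as_dict()
```

Comparing plain dictionaries, as the notebook's own `assert` does, keeps the check independent of how the class implements equality.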

### Data Format for Feature Calculators

The individual statistical feature calculator functions accept univariate series inputs, such as the values of a single column of the dataset. See the example below.

In [None]:
from package_name.feature_extraction.statistical_feature_calculators import calculate_area_under_squared_curve

area = calculate_area_under_squared_curve(dataset["accx"].values)
print(f"Area Under Squared Curve: {area}")
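For intuition about what a univariate calculator does, here is a minimal sketch of an area-under-the-squared-curve feature. It assumes the feature is the trapezoidal integral of `x**2` with a uniform sample spacing `dx`. The toolbox's exact definition (for example, a plain sum of squares, or scaling by the sampling rate) is not stated in this notebook, so treat `area_under_squared_curve` as a hypothetical illustration, not `calculate_area_under_squared_curve` itself.

```python
import numpy as np


def area_under_squared_curve(x, dx: float = 1.0) -> float:
    """Trapezoidal area under x**2 for a 1D signal (an assumed definition)."""
    x = np.asarray(x, dtype=float)
    if x.ndim != 1:
        raise ValueError("expected a univariate (1D) series")
    squared = x ** 2
    # Trapezoidal rule: average each pair of neighbouring samples, then multiply by the spacing
    return float(np.sum((squared[:-1] + squared[1:]) / 2.0) * dx)


print(area_under_squared_curve([1.0, 2.0, 3.0]))  # 9.0
```

The explicit 1D check mirrors the univariate-only contract described above: a multi-column array is rejected, not silently flattened.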

### Using Feature Extraction Functions

`package_name.feature_extraction.extraction` provides a function for each feature category (Statistical, Spectral, and Time-Frequency), as well as a function that extracts features from all three categories at once. The `signal_name` parameter specifies a name for the dataset that is prepended to every sub-signal name in the resulting DataFrame, and `njobs` specifies the number of CPU cores to use. A basic example of each function is shown below.


In [None]:
# Calculate statistical features
features = extraction.calculate_statistical_features(dataset, statistical_params, columns=["accx", "accy", "accz"], signal_name="test", njobs=1)

print(features.head())

In [None]:
# Calculate spectral features
features = extraction.calculate_spectral_features(dataset, spectral_params, columns=["accx", "accy", "accz"], signal_name="test", njobs=1)

In [None]:
pd.set_option('display.max_columns', None)
features

In [None]:
# Calculate time-frequency features (first 3000 samples only)
features = extraction.calculate_time_frequency_features(dataset.iloc[:3000], time_freq_params, columns=["accx", "accy", "accz"], signal_name="test", njobs=1)

In [None]:
pd.set_option('display.max_columns', None)
features

In [None]:
# Calculate all features
features = extraction.calculate_all_features(dataset, statistical_params, spectral_params, time_freq_params, columns=["accx", "accy", "accz"], signal_name="test", njobs=6)

In [None]:
pd.set_option('display.max_columns', None)
features
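The roles of `signal_name` and `njobs` can be sketched as follows: each column is processed as a separate job, and every resulting feature name is prefixed with the signal name. This is a simplified illustration under stated assumptions. The naming scheme `<signal_name>_<column>_<feature>`, the two toy features, and the single output row per signal are all hypothetical, and the sketch uses threads for simplicity, whereas the toolbox may distribute work across processes.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd


def example_features(x: np.ndarray) -> dict:
    """Two toy features for one sub-signal (hypothetical)."""
    return {"mean": float(np.mean(x)), "std": float(np.std(x))}


def extract_with_prefix(df: pd.DataFrame, columns, signal_name: str, njobs: int = 1) -> pd.DataFrame:
    """Compute features per column using njobs workers and prefix names with signal_name."""

    def one_column(col):
        feats = example_features(df[col].to_numpy())
        # Assumed naming scheme: <signal_name>_<column>_<feature>
        return {f"{signal_name}_{col}_{name}": value for name, value in feats.items()}

    # pool.map returns results in the same order as `columns`
    with ThreadPoolExecutor(max_workers=njobs) as pool:
        results = list(pool.map(one_column, columns))

    row = {}
    for partial in results:
        row.update(partial)
    return pd.DataFrame([row])


demo = pd.DataFrame({"accx": [1.0, 2.0, 3.0], "accy": [0.0, 0.0, 0.0]})
result = extract_with_prefix(demo, ["accx", "accy"], signal_name="test", njobs=2)
print(result)
```

For CPU-heavy NumPy work, a process pool usually scales better than threads, at the cost of requiring picklable, module-level worker functions.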

Arrays, DataFrames, and Series are all acceptable input formats. If the input is a DataFrame, the `columns` parameter specifies which columns to analyze (as in the previous examples). Otherwise, `columns` gives the names of the signal's components, in the same order as they appear in the input.

In [None]:
# 2D array input: one column per signal component
print(dataset[["accx", "accy", "accz"]].values.shape)  # (n_samples, 3)
features = extraction.calculate_statistical_features(dataset[["accx", "accy", "accz"]].values, statistical_params, columns=["accx", "accy", "accz"], signal_name="test", njobs=1)

print(features.head())

# 1D array input (the values of a single column)
features = extraction.calculate_statistical_features(dataset["accx"].values, statistical_params, columns=["accx"], signal_name="test", njobs=1)

print(features.head())
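One way to accept all three input formats is to normalize them into a DataFrame before any features are computed. The sketch below shows that idea; `to_dataframe` is a hypothetical helper, not part of the toolbox. It assumes that for array and Series inputs the `columns` names are assigned positionally, and that a 1D input is treated as a single component.

```python
import numpy as np
import pandas as pd


def to_dataframe(data, columns) -> pd.DataFrame:
    """Normalize an array, Series, or DataFrame into a DataFrame with the given columns."""
    columns = list(columns)
    if isinstance(data, pd.DataFrame):
        return data[columns]  # select the requested columns by name
    if isinstance(data, pd.Series):
        data = data.to_numpy()
    arr = np.asarray(data, dtype=float)
    if arr.ndim == 1:
        arr = arr.reshape(-1, 1)  # a univariate input becomes a single column
    if arr.shape[1] != len(columns):
        raise ValueError(f"got {arr.shape[1]} signal components but {len(columns)} column names")
    # For array input, names are assigned in order to the array's columns
    return pd.DataFrame(arr, columns=columns)


frame = pd.DataFrame({"accx": [1.0, 2.0], "accy": [3.0, 4.0], "accz": [5.0, 6.0]})
print(to_dataframe(frame.values, ["accx", "accy", "accz"]).shape)  # (2, 3)
print(to_dataframe(frame["accx"], ["accx"]).shape)  # (2, 1)
```

Checking the number of names against the number of components catches a common mistake: passing a 2D array with a `columns` list that does not match its width.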

### Extracting a Subset of Features

Optionally, you can extract a subset of the available features by passing a list of feature calculator names to the `calculators` argument of each parameter class.

In [None]:
statistical_params = settings.StatisticalFeatureParams(25, calculators=["mean", "mode", "std"])
spectral_params = settings.SpectralFeatureParams(25, calculators=["spectral_variance"])
time_freq_params = settings.TimeFrequencyFeatureParams(25, calculators=["tkeo_features"], tkeo_sf_params=statistical_params)

features = extraction.calculate_all_features(dataset, statistical_params, spectral_params, time_freq_params, columns=["accx", "accy", "accz"], signal_name="test", njobs=1)
print(features.head())
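Selecting calculators by name is commonly implemented as a lookup in a registry that maps names to functions. The sketch below shows that pattern under stated assumptions. `CALCULATORS`, `select_calculators`, and `run` are hypothetical, and only the names `"mean"`, `"mode"`, and `"std"` come from the example above. The sketch also assumes that an unknown name is an error, that `None` selects every calculator, and that `std` is the population standard deviation (NumPy's default `ddof=0`).

```python
import numpy as np


def _mode(x):
    values, counts = np.unique(np.asarray(x), return_counts=True)
    return values[np.argmax(counts)]  # ties resolve to the smallest value


# Hypothetical registry mapping calculator names to functions
CALCULATORS = {
    "mean": lambda x: float(np.mean(x)),
    "std": lambda x: float(np.std(x)),  # population std (ddof=0)
    "mode": lambda x: float(_mode(x)),
    "max": lambda x: float(np.max(x)),
}


def select_calculators(names=None):
    """Return the requested calculators; None selects all of them."""
    if names is None:
        return dict(CALCULATORS)
    unknown = [n for n in names if n not in CALCULATORS]
    if unknown:
        raise KeyError(f"unknown calculators: {unknown}")
    return {n: CALCULATORS[n] for n in names}


def run(x, names=None):
    return {name: func(x) for name, func in select_calculators(names).items()}


x = np.array([1.0, 2.0, 2.0, 5.0])
print(run(x, ["mean", "mode", "std"]))  # {'mean': 2.5, 'mode': 2.0, 'std': 1.5}
```

Failing fast on unknown names prevents a typo in the `calculators` list from silently yielding fewer features than expected.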