# 1. Tsfresh: Univariate Feature Engineering with Robot Failure Dataset (LP1)

This basic example shows how to use [tsfresh](https://tsfresh.readthedocs.io/) to extract useful features from a univariate time series and use them to improve classification performance.

We use the robot failure dataset (LP1) as an example.

In [6]:
import pandas as pd
from tsfresh import extract_features, extract_relevant_features, select_features
from tsfresh.utilities.dataframe_functions import impute
from tsfresh.feature_extraction import ComprehensiveFCParameters

#load dataset
lp1 = pd.read_csv(r'C:\Users\seren\Documents\RR Lab Documents\lp1_fulldata.csv')

y = lp1['class']
lp1 = lp1.drop(['class'], axis=1)
lp1 = lp1.iloc[:, :3] #univariate

In [7]:
lp1.head()

Unnamed: 0,id,time,var1
0,1,1,-1
1,1,2,0
2,1,3,-1
3,1,4,-1
4,1,5,-1


In [8]:
extraction_settings = ComprehensiveFCParameters()

X = extract_features(lp1, column_id='id', column_sort='time',
                     default_fc_parameters=extraction_settings,
                     # impute fills NaN and infinite feature values rather than removing features
                     impute_function=impute)

Feature Extraction: 100%|██████████████████████████████████████████████████████████████| 18/18 [00:07<00:00,  2.35it/s]
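To make the effect of `impute_function=impute` in the call above more concrete, here is a minimal pandas sketch of the documented behaviour of tsfresh's `impute`: NaN is replaced by the column median, `+inf` by the column maximum, `-inf` by the column minimum, and a column with no finite values is filled with zeros. The function name `impute_sketch` and the toy data are assumptions made for illustration; this is not the library's implementation.

```python
import numpy as np
import pandas as pd


def impute_sketch(df):
    """Illustrative re-implementation of column-wise imputation (not tsfresh code)."""
    out = df.astype(float).copy()
    for col in out.columns:
        values = out[col]
        finite = values[np.isfinite(values)]
        if finite.empty:
            # No usable value in this column: fall back to zeros.
            out[col] = 0.0
            continue
        out[col] = (
            values.mask(values == np.inf, finite.max())    # +inf -> column max
                  .mask(values == -np.inf, finite.min())   # -inf -> column min
                  .fillna(finite.median())                 # NaN  -> column median
        )
    return out


# Toy feature matrix (hypothetical values).
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.inf],
    "b": [np.nan, np.nan, np.nan, np.nan],
})
clean = impute_sketch(toy)
print(clean)
```

Because values are filled rather than dropped, the number of feature columns is unchanged by imputation.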


In [9]:
print('Extracted Feature set shape: ', X.shape)
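# X has one row per id, while y has one row per time step: reduce y to one label per id
X['class'] = y.groupby(lp1['id']).first()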
print('Extracted Feature set shape with class labels: ', X.shape)

Extracted Feature set shape:  (88, 779)
Extracted Feature set shape with class labels:  (88, 780)
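The feature matrix has one row per `id`, whereas the raw label column has one row per time step, so the labels have to be reduced to one per series before they are attached. The sketch below shows one way to do this on toy data; the class names, the values, and the `mean`/`max` stand-in features are hypothetical and only illustrate the index alignment.

```python
import pandas as pd

# Toy long-format data: 2 series (ids 1 and 2), 3 time steps each.
raw = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2],
    "time":  [1, 2, 3, 1, 2, 3],
    "var1":  [-1, 0, -1, 2, 3, 2],
    "class": ["normal"] * 3 + ["collision"] * 3,
})

# Stand-in for an extracted feature matrix: one row per id, indexed by id.
features = raw.groupby("id")["var1"].agg(["mean", "max"])

# Sanity check: the label must be constant within each series.
assert (raw.groupby("id")["class"].nunique() == 1).all()

# One label per series, indexed by id, so pandas aligns it with `features`.
labels = raw.groupby("id")["class"].first()
features["class"] = labels
print(features)
```

Grouping by `id` does not depend on every series having the same length, which a fixed-stride slice of the label column would require.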


# 2. Tsfresh: Multivariate Feature Engineering with Robot Failure Dataset (LP1)

This basic example shows how to use [tsfresh](https://tsfresh.readthedocs.io/) to extract useful features from multiple timeseries and use them to improve classification performance.

We use the robot failure data set as an example.

In [10]:
import pandas as pd
from tsfresh import extract_features, extract_relevant_features, select_features
from tsfresh.utilities.dataframe_functions import impute
from tsfresh.feature_extraction import ComprehensiveFCParameters

## Extract Features

In [11]:
#load dataset
lp1 = pd.read_csv(r'C:\Users\seren\Documents\RR Lab Documents\lp1_fulldata.csv')

y = lp1['class']
lp1 = lp1.drop(['class'], axis=1)

In [12]:
lp1.head()

Unnamed: 0,id,time,var1,var2,var3,var4,var5,var6
0,1,1,-1,0,57,-5,-3,0
1,1,2,0,-3,63,-1,0,0
2,1,3,-1,1,51,-4,-1,-1
3,1,4,-1,-2,68,-2,-2,0
4,1,5,-1,-1,65,-6,1,0


In [13]:
extraction_settings = ComprehensiveFCParameters()

X = extract_features(lp1, column_id='id', column_sort='time',
                     default_fc_parameters=extraction_settings,
                     # impute fills NaN and infinite feature values rather than removing features
                     impute_function=impute)

Feature Extraction: 100%|██████████████████████████████████████████████████████████████| 20/20 [00:23<00:00,  1.15s/it]


Because the dataset has 6 variables and tsfresh extracts 779 features from each one, the resulting feature matrix has 779 × 6 = 4674 columns.

In [14]:
print('Extracted Feature set shape: ', X.shape)
print('Number of features extracted per sensor: ', X.shape[1]/6)
X = X.reset_index(drop = True)
# keep one label per series (this assumes every id has exactly 15 rows)
X['class'] = y[::15].reset_index(drop = True)
print('Extracted Feature set shape with class labels: ', X.shape)

Extracted Feature set shape:  (88, 4674)
Number of features extracted per sensor:  779.0
Extracted Feature set shape with class labels:  (88, 4675)
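Dividing the column count by 6 assumes every variable contributes the same number of features. Since the extracted columns are named `<variable>__<feature>...` (for example `var4__sum_values`), the per-variable count can also be read directly from the column names. The sketch below does this on a small hand-made frame; the four column names are taken from the output in this notebook, while the frame itself is a toy stand-in for `X`.

```python
from collections import Counter

import pandas as pd

# Toy stand-in for the extracted feature matrix, using tsfresh-style column names.
columns = [
    "var4__sum_values",
    "var4__abs_energy",
    "var3__fourier_entropy__bins_3",
    "var3__permutation_entropy__dimension_3__tau_1",
]
toy = pd.DataFrame([[0.0] * len(columns)], columns=columns)

# Split only on the first "__": everything before it is the source variable,
# everything after it is the feature name plus its parameters.
per_variable = Counter(name.split("__", 1)[0] for name in toy.columns)
print(dict(per_variable))
```

Run against the real feature matrix (before the `class` column is added), each of the 6 variables should report 779 features if the division above holds.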


In [15]:
#check for nan
is_NaN = X.isnull()
row_has_NaN = is_NaN.any(axis=1)
rows_with_NaN = X[row_has_NaN]

rows_with_NaN

Unnamed: 0,var4__variance_larger_than_standard_deviation,var4__has_duplicate_max,var4__has_duplicate_min,var4__has_duplicate,var4__sum_values,var4__abs_energy,var4__mean_abs_change,var4__mean_change,var4__mean_second_derivative_central,var4__median,...,var3__fourier_entropy__bins_3,var3__fourier_entropy__bins_5,var3__fourier_entropy__bins_10,var3__fourier_entropy__bins_100,var3__permutation_entropy__dimension_3__tau_1,var3__permutation_entropy__dimension_4__tau_1,var3__permutation_entropy__dimension_5__tau_1,var3__permutation_entropy__dimension_6__tau_1,var3__permutation_entropy__dimension_7__tau_1,class


In [16]:
X.to_csv(r'C:\Users\seren\Documents\oct\tsfresh_multi.csv', index = False)