# Experiments on KuHar Resampled to 20Hz

This notebook runs basic experiments on the balanced KuHar dataset, resampled to 20 Hz, in the following steps:
1. Quickly load the train, validation and test CSV subsets of the balanced KuHar dataset using the `PandasDatasetsIO` helper
2. Wrap the dataframes in librep's `Dataset` interface using `PandasMultiModalDataset`
3. Apply the Fourier transform to the dataset
4. Train and evaluate Random Forest, SVM and KNN classification models in both the time and frequency domains

Each model is trained on the combined train and validation subsets and evaluated on the test subset, with three runs per configuration.

## Common imports and definitions

In [1]:
from pathlib import Path  # For defining dataset paths
import sys                # For adding the librep package to the import path

# This must be done if librep is not installed via pip,
# as this directory (examples) is apart from the librep package root
sys.path.append("..")

# Third party imports
import pandas as pd
import numpy as np

# Librep imports
from librep.utils.dataset import PandasDatasetsIO          # For quickly loading train, test and validation CSVs
from librep.datasets.multimodal import PandasMultiModalDataset # Wrap CSVs to librep's `Dataset` interface

2022-09-12 01:26:01.158485: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-09-12 01:26:01.158506: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.


## Loading data
Change the path below to use a different dataset.

In [2]:
# Path to the KuHar view resampled to 20 Hz, with the same activities (and label numbers)
# The directory is assumed to contain train.csv, test.csv and validation.csv
dataset_path = Path("../data/views/KuHar/resampled_view_20Hz")

Once the path is defined, we can load the CSVs as pandas dataframes.

In [3]:
# Kuhar dataframes
train, validation, test = PandasDatasetsIO(dataset_path).load()
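
The internals of `PandasDatasetsIO` are not shown in this notebook. As a rough plain-pandas equivalent, and assuming only the three file names mentioned in the comment above, loading could be sketched as follows. The helper `load_splits` and the tiny stand-in directory are hypothetical and exist only so the sketch runs without the KuHar files.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_splits(root: Path):
    """Read train.csv, validation.csv and test.csv from `root` as dataframes."""
    names = ("train", "validation", "test")
    return tuple(pd.read_csv(root / f"{name}.csv") for name in names)


# Tiny stand-in directory (hypothetical data) so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name, n_rows in (("train", 3), ("validation", 2), ("test", 1)):
        frame = pd.DataFrame({"accel-x-0": [0.0] * n_rows, "activity code": [0] * n_rows})
        frame.to_csv(root / f"{name}.csv", index=False)
    train_df, validation_df, test_df = load_splits(root)

split_sizes = (len(train_df), len(validation_df), len(test_df))
print(split_sizes)
```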

Let's take a look at the train dataframe.

In [4]:
train.head()

| Unnamed: 0.1 | Unnamed: 0 | accel-x-0 | accel-x-1 | accel-x-2 | accel-x-3 | accel-x-4 | accel-x-5 | accel-x-6 | accel-x-7 | accel-x-8 | ... | gyro-z-59 | accel-start-time | gyro-start-time | accel-end-time | gyro-end-time | activity code | length | serial | index | user |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.001911 | -0.014536 | 0.005845 | 0.003675 | -0.014972 | 0.025607 | 0.000478 | -0.031141 | -0.014827 | ... | 0.004456 | 23.235 | 23.223 | 26.26 | 26.249 | 0 | 300 | 1 | 2100 | 1051 |
| 1 | 1 | 0.004114 | -0.003186 | 0.000759 | 0.01245 | -0.032074 | 0.00727 | -0.00047 | 0.00698 | 0.0214 | ... | 0.002979 | 56.292 | 56.292 | 59.245 | 59.245 | 0 | 300 | 1 | 5700 | 1037 |
| 2 | 2 | -0.011282 | -0.002432 | -0.003199 | 0.008152 | -0.021763 | 0.000309 | -0.004968 | -0.009551 | 0.001497 | ... | 0.003343 | 27.268 | 27.267 | 30.29 | 30.291 | 0 | 300 | 1 | 2700 | 1075 |
| 3 | 3 | -0.009241 | -0.004666 | 0.021606 | -0.0072 | 0.003091 | 0.00163 | 0.005057 | -0.008149 | 0.013167 | ... | -0.002053 | 39.421 | 39.42 | 42.441 | 42.44 | 0 | 300 | 6 | 3900 | 1008 |
| 4 | 4 | -0.013083 | -0.005612 | 0.001645 | 0.006823 | -0.004159 | 0.000415 | 0.008178 | 0.002637 | -0.000827 | ... | 0.002603 | 23.703 | 23.703 | 26.656 | 26.656 | 0 | 300 | 1 | 2400 | 1038 |


## Creating a Librep dataset from pandas dataframes

Change the feature prefixes below to match the columns of other datasets.

In [5]:
# Kuhar features to select
features = [
    "accel-x",
    "accel-y",
    "accel-z",
    "gyro-x",
    "gyro-y",
    "gyro-z"
]

# Creating the datasets

# Train
train_dataset = PandasMultiModalDataset(
    train,
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)

# Validation
validation_dataset = PandasMultiModalDataset(
    validation,
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)

# Test
test_dataset = PandasMultiModalDataset(
    test,
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)
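
How `PandasMultiModalDataset` selects columns is not shown here. One plausible reading of `feature_prefixes` is that every column whose name starts with a given prefix is collected, prefix by prefix, into one flat feature vector per row. The sketch below illustrates that idea with plain pandas; the helper `select_by_prefix`, the `prefix + "-"` matching rule and the toy dataframe are assumptions made for illustration, not librep code.

```python
import numpy as np
import pandas as pd


def select_by_prefix(df, prefixes, label_column):
    """Return (X, y): columns grouped by prefix, in prefix order, as arrays."""
    columns = []
    for prefix in prefixes:
        # Match "accel-x-0", "accel-x-1", ... but not "accel-start-time".
        columns.extend(c for c in df.columns if c.startswith(prefix + "-"))
    return df[columns].to_numpy(), df[label_column].to_numpy()


# Toy frame: 2 samples, 2 sensors with 3 time steps each, plus metadata columns.
toy = pd.DataFrame(
    {
        "Unnamed: 0": [0, 1],
        "accel-x-0": [0.1, 0.2], "accel-x-1": [0.3, 0.4], "accel-x-2": [0.5, 0.6],
        "gyro-z-0": [1.0, 2.0], "gyro-z-1": [3.0, 4.0], "gyro-z-2": [5.0, 6.0],
        "accel-start-time": [23.2, 56.3],
        "activity code": [0, 1],
    }
)

X, y = select_by_prefix(toy, ["accel-x", "gyro-z"], "activity code")
print(X.shape, y)
```

Under this reading, metadata columns such as `accel-start-time` or `user` never reach the classifier, because they do not match any of the listed prefixes.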

## Inspect sample

In [6]:
# Let's print the first sample of train_dataset.
# It is a tuple: the first element is a vector of 360 values (6 feature prefixes x 60 time steps,
# columns -0 to -59 as in gyro-z-59 above) and the second element is the label
x = train_dataset[0]
print(x)

(array([ 1.91093286e-03, -1.45361925e-02,  5.84452385e-03,  3.67495627e-03,
       -1.49718059e-02,  2.56068907e-02,  4.77538088e-04, -3.11405362e-02,
       -1.48270261e-02,  7.69834863e-03,  1.06101665e-02, -5.96475630e-02,
       -3.35511310e-03, -1.65885925e-03,  3.94389738e-02, -4.28711994e-02,
       -4.65577088e-03, -1.44686791e-02, -7.36948774e-03, -3.87024460e-03,
        6.24744252e-02, -1.79626835e-02,  3.22744928e-03, -3.75961022e-03,
        1.46163449e-02, -1.07502353e-02, -9.27218103e-03,  5.06417325e-03,
        1.40691624e-02,  1.60138354e-02, -5.34838152e-02, -3.29858611e-03,
        2.31031426e-02,  2.27906805e-02,  2.54595798e-03,  1.75255266e-02,
       -5.10498318e-03, -2.07463519e-02,  1.32902011e-02,  1.37572046e-02,
        7.17675958e-03, -2.01445217e-02,  5.47817384e-03, -7.66570074e-04,
        1.94831071e-02, -1.11694213e-03,  2.27235363e-02, -1.49616813e-02,
       -9.71672954e-03, -7.12839038e-03,  9.02811373e-03, -1.57676951e-03,
       -5.51378813e-03, 

In [7]:
# Inspecting sample
print(f"The sample 0: {x[0]}")
print(f"Shape of sample 0: {x[0].shape}")
print(f"The label of sample 0: {x[1]}")

The sample 0: [ 1.91093286e-03 -1.45361925e-02  5.84452385e-03  3.67495627e-03
 -1.49718059e-02  2.56068907e-02  4.77538088e-04 -3.11405362e-02
 -1.48270261e-02  7.69834863e-03  1.06101665e-02 -5.96475630e-02
 -3.35511310e-03 -1.65885925e-03  3.94389738e-02 -4.28711994e-02
 -4.65577088e-03 -1.44686791e-02 -7.36948774e-03 -3.87024460e-03
  6.24744252e-02 -1.79626835e-02  3.22744928e-03 -3.75961022e-03
  1.46163449e-02 -1.07502353e-02 -9.27218103e-03  5.06417325e-03
  1.40691624e-02  1.60138354e-02 -5.34838152e-02 -3.29858611e-03
  2.31031426e-02  2.27906805e-02  2.54595798e-03  1.75255266e-02
 -5.10498318e-03 -2.07463519e-02  1.32902011e-02  1.37572046e-02
  7.17675958e-03 -2.01445217e-02  5.47817384e-03 -7.66570074e-04
  1.94831071e-02 -1.11694213e-03  2.27235363e-02 -1.49616813e-02
 -9.71672954e-03 -7.12839038e-03  9.02811373e-03 -1.57676951e-03
 -5.51378813e-03 -3.61854449e-03 -9.16178207e-03  1.69776410e-02
 -3.65341848e-03 -2.29361283e-02 -2.51271512e-03  3.50588067e-02
  1.3409791

## Fourier Transform

In [8]:
from librep.datasets.multimodal import TransformMultiModalDataset
from librep.transforms.fft import FFT

In [9]:
fft_transform = FFT(centered=True)
transformer = TransformMultiModalDataset(transforms=[fft_transform], new_window_name_prefix="fft.")

### Apply the FFT to KuHar

In [10]:
train_dataset_fft = transformer(train_dataset)
validation_dataset_fft = transformer(validation_dataset)
test_dataset_fft = transformer(test_dataset)
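
The internals of `FFT(centered=True)` are not shown in this notebook. As a rough illustration of what a frequency-domain representation of one window looks like, the sketch below takes the magnitude of the discrete Fourier transform with NumPy and keeps only the first half of the bins, since the magnitude spectrum of a real signal is symmetric. The helper `magnitude_spectrum`, its `keep_half` flag and the 2 Hz test signal are illustrative assumptions, not librep code.

```python
import numpy as np


def magnitude_spectrum(window, keep_half=True):
    """Magnitude of the DFT of a 1-D real window, optionally only the first half."""
    spectrum = np.abs(np.fft.fft(window))
    if keep_half:
        spectrum = spectrum[: len(window) // 2]
    return spectrum


sampling_rate = 20.0   # Hz, as in the resampled view
n_samples = 60         # one window per sensor axis (columns -0 to -59)

# Synthetic window: a pure 2 Hz sine wave.
t = np.arange(n_samples) / sampling_rate
signal = np.sin(2 * np.pi * 2.0 * t)

spec = magnitude_spectrum(signal)
freqs = np.fft.fftfreq(n_samples, d=1.0 / sampling_rate)[: n_samples // 2]
peak_hz = freqs[np.argmax(spec)]
print(spec.shape, peak_hz)
```

With 60 samples at 20 Hz, each bin covers 1/3 Hz, so periodic motion shows up as a few large bins regardless of where in the window it starts. That shift insensitivity is one possible reason classifiers behave differently on the two representations.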

In [11]:
train_dataset[:][0]

array([[ 1.91093286e-03, -1.45361925e-02,  5.84452385e-03, ...,
         3.45654902e-03,  2.32869360e-03,  4.45589801e-03],
       [ 4.11395657e-03, -3.18646610e-03,  7.58931558e-04, ...,
        -9.94428406e-04, -1.82853273e-03,  2.97903419e-03],
       [-1.12820040e-02, -2.43180090e-03, -3.19908050e-03, ...,
         3.56838998e-03,  4.38234273e-03,  3.34301636e-03],
       ...,
       [-3.59406279e-01, -1.19101056e+00, -8.73361291e-01, ...,
         3.80289216e-01,  2.18364177e-01,  2.06985897e-01],
       [-2.03856397e+00,  1.39151562e+00,  1.99333519e+00, ...,
         4.45907215e-01,  5.24997532e-01,  6.63564268e-01],
       [-3.72664939e+00, -1.19035790e+01, -4.04218490e+00, ...,
         4.09155302e-01,  3.60494266e-01, -1.44081320e-04]])

In [12]:
train_dataset_fft[:][0]

array([[2.08923330e-02, 1.12081089e-01, 6.03699767e-02, ...,
        6.86907330e-03, 1.25349286e-02, 1.69158661e-02],
       [1.53802877e-02, 8.24343989e-02, 4.18766153e-02, ...,
        3.72912157e-03, 3.98584265e-03, 1.71193131e-02],
       [5.21272671e-02, 4.82816195e-02, 8.93573044e-02, ...,
        9.75422945e-03, 2.66463902e-02, 7.84359780e-03],
       ...,
       [2.46594280e+00, 2.97792077e+01, 2.58438841e+01, ...,
        1.43625028e+00, 7.98405975e-01, 2.66617405e-01],
       [3.12703194e+00, 2.12859482e+01, 9.31637610e+00, ...,
        4.71804217e-01, 9.06413206e-01, 6.75740676e-01],
       [1.48890233e+01, 6.63385820e+00, 8.58902995e+00, ...,
        3.19953749e-01, 1.27100790e-01, 3.83110579e-01]])

## Train and evaluate Random Forest classifier

In [13]:
from librep.utils.workflow import SimpleTrainEvalWorkflow, MultiRunWorkflow
from librep.estimators import RandomForestClassifier
from librep.metrics.report import ClassificationReport
import yaml

reporter = ClassificationReport(
    use_accuracy=True, use_f1_score=True, use_classification_report=False,
    use_confusion_matrix=False, plot_confusion_matrix=False,
)
experiment = SimpleTrainEvalWorkflow(
    estimator=RandomForestClassifier, estimator_creation_kwags={"n_estimators": 100},
    do_not_instantiate=False, do_fit=True, evaluator=reporter,
)
multi_run_experiment = MultiRunWorkflow(workflow=experiment, num_runs=3, debug=False)
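
The reporter is configured to output accuracy and F1 scores, which appear in the results as macro, micro and weighted averages. As a reminder of how those three averages are conventionally defined, here is a minimal pure-Python sketch. It follows the textbook definitions and is not taken from `ClassificationReport`; the function `f1_scores` and the toy labels are illustrative.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Macro, micro and weighted F1 for single-label multiclass predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))

    per_class = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        per_class[label] = 2 * tp / denom if denom else 0.0

    # Macro: unweighted mean over classes.
    macro = sum(per_class.values()) / len(labels)
    # Weighted: mean over classes, weighted by the number of true samples.
    weighted = sum(per_class[l] * support[l] for l in labels) / len(y_true)
    # Micro: pooled counts; for single-label data this equals accuracy.
    micro = sum(1 for t, p in pairs if t == p) / len(y_true)
    return {"macro": macro, "micro": micro, "weighted": weighted}


y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
scores = f1_scores(y_true, y_pred)
print(scores)
```

Because every sample has exactly one label, micro F1 and accuracy coincide, so the two values should agree in each run up to floating-point rounding.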

In [14]:
combined_train_dset = PandasMultiModalDataset(
    pd.concat([train, validation]),
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)


result = multi_run_experiment(combined_train_dset, test_dataset)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662945968.2656271
    result:
    -   accuracy: 0.6772486772486772
        f1 score (macro): 0.6678879234801423
        f1 score (micro): 0.6772486772486772
        f1 score (weighted): 0.6866094310172122
    run id: 1
    start: 1662945963.8253286
    time taken: 4.440298557281494
-   end: 1662945972.564448
    result:
    -   accuracy: 0.708994708994709
        f1 score (macro): 0.7020199931651915
        f1 score (micro): 0.708994708994709
        f1 score (weighted): 0.7159694248242263
    run id: 2
    start: 1662945968.2656293
    time taken: 4.298818826675415
-   end: 1662945976.8605897
    result:
    -   accuracy: 0.7037037037037037
        f1 score (macro): 0.6929442374103094
        f1 score (micro): 0.7037037037037037
        f1 score (weighted): 0.714463169997098
    run id: 3
    start: 1662945972.56445
    time taken: 4.296139717102051
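
The three runs differ slightly, so a mean and standard deviation are easier to compare than individual runs. The sketch below assumes the result is a dictionary shaped like the YAML printed above (`runs` → `result` → metric name) and copies the three accuracy values from that output; the helper `summarize` is hypothetical.

```python
from statistics import mean, stdev

# Same shape as the dictionary printed above, keeping only the accuracy field.
result = {
    "runs": [
        {"run id": 1, "result": [{"accuracy": 0.6772486772486772}]},
        {"run id": 2, "result": [{"accuracy": 0.708994708994709}]},
        {"run id": 3, "result": [{"accuracy": 0.7037037037037037}]},
    ]
}


def summarize(result, metric="accuracy"):
    """Mean and sample standard deviation of one metric across runs."""
    values = [run["result"][0][metric] for run in result["runs"]]
    return mean(values), stdev(values)


acc_mean, acc_std = summarize(result)
print(f"accuracy: {acc_mean:.4f} +/- {acc_std:.4f}")
```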



In [15]:
combined_train_dset_fft = transformer(combined_train_dset)

result = multi_run_experiment(combined_train_dset_fft, test_dataset_fft)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662945979.565638
    result:
    -   accuracy: 0.8174603174603174
        f1 score (macro): 0.8138427799981984
        f1 score (micro): 0.8174603174603176
        f1 score (weighted): 0.8210778549224366
    run id: 1
    start: 1662945977.0002391
    time taken: 2.565398931503296
-   end: 1662945982.1550117
    result:
    -   accuracy: 0.8518518518518519
        f1 score (macro): 0.8497917699922641
        f1 score (micro): 0.8518518518518519
        f1 score (weighted): 0.8539119337114395
    run id: 2
    start: 1662945979.56564
    time taken: 2.589371681213379
-   end: 1662945984.7115672
    result:
    -   accuracy: 0.8201058201058201
        f1 score (macro): 0.8172024871252938
        f1 score (micro): 0.8201058201058201
        f1 score (weighted): 0.8230091530863464
    run id: 3
    start: 1662945982.1550133
    time taken: 2.556553840637207



## Train and evaluate Support Vector Machine classifier

In [16]:
#from librep.estimators import SVC
from sklearn.svm import SVC

experiment = SimpleTrainEvalWorkflow(
    estimator=SVC, estimator_creation_kwags={"C": 3.0, "kernel": "rbf"},
    do_not_instantiate=False, do_fit=True, evaluator=reporter,
)
multi_run_experiment = MultiRunWorkflow(workflow=experiment, num_runs=3, debug=False)

result = multi_run_experiment(combined_train_dset, test_dataset)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662945986.1398034
    result:
    -   accuracy: 0.47883597883597884
        f1 score (macro): 0.4577020753647976
        f1 score (micro): 0.4788359788359789
        f1 score (weighted): 0.49996988230716005
    run id: 1
    start: 1662945984.7173824
    time taken: 1.4224209785461426
-   end: 1662945987.5187905
    result:
    -   accuracy: 0.47883597883597884
        f1 score (macro): 0.4577020753647976
        f1 score (micro): 0.4788359788359789
        f1 score (weighted): 0.49996988230716005
    run id: 2
    start: 1662945986.1398056
    time taken: 1.3789849281311035
-   end: 1662945988.8941143
    result:
    -   accuracy: 0.47883597883597884
        f1 score (macro): 0.4577020753647976
        f1 score (micro): 0.4788359788359789
        f1 score (weighted): 0.49996988230716005
    run id: 3
    start: 1662945987.5187924
    time taken: 1.375321865081787



In [17]:
result = multi_run_experiment(combined_train_dset_fft, test_dataset_fft)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662945989.320505
    result:
    -   accuracy: 0.7936507936507936
        f1 score (macro): 0.7760122723591896
        f1 score (micro): 0.7936507936507936
        f1 score (weighted): 0.8112893149423979
    run id: 1
    start: 1662945988.8994958
    time taken: 0.4210090637207031
-   end: 1662945989.7350264
    result:
    -   accuracy: 0.7936507936507936
        f1 score (macro): 0.7760122723591896
        f1 score (micro): 0.7936507936507936
        f1 score (weighted): 0.8112893149423979
    run id: 2
    start: 1662945989.3205068
    time taken: 0.4145195484161377
-   end: 1662945990.1553805
    result:
    -   accuracy: 0.7936507936507936
        f1 score (macro): 0.7760122723591896
        f1 score (micro): 0.7936507936507936
        f1 score (weighted): 0.8112893149423979
    run id: 3
    start: 1662945989.7350285
    time taken: 0.4203519821166992



## Train and evaluate K-Nearest Neighbors classifier

In [18]:
#from librep.estimators import KNeighborsClassifier
from sklearn.neighbors import KNeighborsClassifier

experiment = SimpleTrainEvalWorkflow(
    estimator=KNeighborsClassifier, estimator_creation_kwags={"n_neighbors": 1},
    do_not_instantiate=False, do_fit=True, evaluator=reporter,
)
multi_run_experiment = MultiRunWorkflow(workflow=experiment, num_runs=3, debug=False)

result = multi_run_experiment(combined_train_dset, test_dataset)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662945990.3258417
    result:
    -   accuracy: 0.4470899470899471
        f1 score (macro): 0.44767892953100336
        f1 score (micro): 0.4470899470899471
        f1 score (weighted): 0.44650096464889066
    run id: 1
    start: 1662945990.161067
    time taken: 0.16477465629577637
-   end: 1662945990.3509066
    result:
    -   accuracy: 0.4470899470899471
        f1 score (macro): 0.44767892953100336
        f1 score (micro): 0.4470899470899471
        f1 score (weighted): 0.44650096464889066
    run id: 2
    start: 1662945990.3258452
    time taken: 0.025061368942260742
-   end: 1662945990.3748505
    result:
    -   accuracy: 0.4470899470899471
        f1 score (macro): 0.44767892953100336
        f1 score (micro): 0.4470899470899471
        f1 score (weighted): 0.44650096464889066
    run id: 3
    start: 1662945990.350909
    time taken: 0.023941516876220703
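
With `n_neighbors=1`, each test sample simply receives the label of its closest training sample. A minimal NumPy sketch of that rule, using Euclidean distance (scikit-learn's default metric for this estimator), is shown below; the function `predict_1nn` and the toy points are illustrative and much simpler than the scikit-learn implementation.

```python
import numpy as np


def predict_1nn(X_train, y_train, X_test):
    """Label each test row with the label of its nearest training row."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    diffs = X_test[:, None, :] - X_train[None, :, :]
    distances = np.sum(diffs ** 2, axis=2)
    return y_train[np.argmin(distances, axis=1)]


X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
y_train = np.array([0, 0, 1])
X_test = np.array([[0.2, 0.1], [4.0, 4.5]])

predictions = predict_1nn(X_train, y_train, X_test)
print(predictions)
```

The rule involves no randomness, which is consistent with the three runs above reporting identical scores.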



In [19]:
result = multi_run_experiment(combined_train_dset_fft, test_dataset_fft)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662945990.402439
    result:
    -   accuracy: 0.8333333333333334
        f1 score (macro): 0.8318913869639928
        f1 score (micro): 0.8333333333333334
        f1 score (weighted): 0.8347752797026738
    run id: 1
    start: 1662945990.383144
    time taken: 0.019295215606689453
-   end: 1662945990.4158952
    result:
    -   accuracy: 0.8333333333333334
        f1 score (macro): 0.8318913869639928
        f1 score (micro): 0.8333333333333334
        f1 score (weighted): 0.8347752797026738
    run id: 2
    start: 1662945990.402441
    time taken: 0.013454198837280273
-   end: 1662945990.4288878
    result:
    -   accuracy: 0.8333333333333334
        f1 score (macro): 0.8318913869639928
        f1 score (micro): 0.8333333333333334
        f1 score (weighted): 0.8347752797026738
    run id: 3
    start: 1662945990.4158971
    time taken: 0.012990713119506836

