# Experiments on KuHar Resampled to 50Hz

This notebook runs basic experiments on the balanced KuHar dataset in the following steps:
1. Load the train, validation and test CSV subsets of the balanced KuHar dataset with the `PandasDatasetsIO` helper
2. Wrap the dataframes in librep's `Dataset` interface using `PandasMultiModalDataset`
3. Apply the Fourier transform to the dataset
4. Train and evaluate SVM, KNN and Random Forest (RF) classification models in both the time and frequency domains

The goal is to compare how the SVM, KNN and RF models perform on the time-domain and frequency-domain representations of the same data.

## Common imports and definitions

In [1]:
from pathlib import Path  # For defining dataset paths
import sys                # For adding the librep package to the import path

# This is required when librep is not installed via pip,
# because this directory (examples) sits apart from the librep package root
sys.path.append("..")

# Third party imports
import pandas as pd
import numpy as np

# Librep imports
from librep.utils.dataset import PandasDatasetsIO          # Quickly loads the train, validation and test CSVs
from librep.datasets.multimodal import PandasMultiModalDataset # Wraps dataframes in librep's `Dataset` interface



## Loading data
To use another dataset, change the path below.

In [2]:
# Path to the KuHar view resampled to 50Hz, with the same activities (and label numbers)
# The directory is assumed to contain train.csv, validation.csv and test.csv
dataset_path = Path("../data/views/KuHar/resampled_view_50Hz")

Once the path is defined, we can load the CSVs as pandas dataframes.

In [3]:
# Kuhar dataframes
train, validation, test = PandasDatasetsIO(dataset_path).load()
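
For readers who do not have librep at hand, the loading step can be approximated with plain pandas. This is only a sketch under the assumption stated above, namely that the directory holds `train.csv`, `validation.csv` and `test.csv`. The function `load_splits` is a hypothetical helper, not part of librep, and `PandasDatasetsIO` may do more than this (for example, validation of the files).

```python
from pathlib import Path

import pandas as pd


def load_splits(root):
    """Hypothetical helper: read the three CSV splits found in `root`.

    Returns the dataframes in the order (train, validation, test).
    """
    root = Path(root)
    names = ("train", "validation", "test")
    return tuple(pd.read_csv(root / f"{name}.csv") for name in names)
```

With the path defined earlier, `train, validation, test = load_splits(dataset_path)` would yield three dataframes in the same order as the cell above.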

Let's take a look at the train dataframe.

In [4]:
train.head()

| Unnamed: 0.1 | Unnamed: 0 | accel-x-0 | accel-x-1 | accel-x-2 | accel-x-3 | accel-x-4 | accel-x-5 | accel-x-6 | accel-x-7 | accel-x-8 | ... | gyro-z-149 | accel-start-time | gyro-start-time | accel-end-time | gyro-end-time | activity code | length | serial | index | user |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | -0.002824 | -0.012195 | -0.023101 | -0.014948 | 0.014606 | 0.007602 | 0.000329 | -0.011798 | 0.020713 | ... | 0.00305 | 23.235 | 23.223 | 26.26 | 26.249 | 0 | 300 | 1 | 2100 | 1051 |
| 1 | 1 | -0.001939 | 0.005255 | 0.01016 | -0.005391 | -0.015235 | -0.007979 | 0.028104 | 0.018456 | -0.000599 | ... | 0.002945 | 56.292 | 56.292 | 59.245 | 59.245 | 0 | 300 | 1 | 5700 | 1037 |
| 2 | 2 | -0.021353 | -0.051341 | 0.018199 | 0.042849 | -0.038259 | -0.024216 | 0.019591 | 0.017296 | 0.005607 | ... | 0.002912 | 27.268 | 27.267 | 30.29 | 30.291 | 0 | 300 | 1 | 2700 | 1075 |
| 3 | 3 | -0.003025 | -0.011433 | -0.020292 | 0.012551 | 0.016344 | 0.015364 | 0.015261 | -0.001294 | -0.008576 | ... | 0.002076 | 39.421 | 39.42 | 42.441 | 42.44 | 0 | 300 | 6 | 3900 | 1008 |
| 4 | 4 | 0.010373 | 0.024726 | -0.031148 | -0.010458 | -0.016808 | 0.035965 | -0.005237 | 0.001135 | -0.001181 | ... | 0.006516 | 23.703 | 23.703 | 26.656 | 26.656 | 0 | 300 | 1 | 2400 | 1038 |


## Creating a Librep dataset from pandas dataframes

To use another dataset, change the feature prefixes below.

In [5]:
# Kuhar features to select
features = [
    "accel-x",
    "accel-y",
    "accel-z",
    "gyro-x",
    "gyro-y",
    "gyro-z"
]

# Creating the datasets

# Train
train_dataset = PandasMultiModalDataset(
    train,
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)

# Validation
validation_dataset = PandasMultiModalDataset(
    validation,
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)

# Test
test_dataset = PandasMultiModalDataset(
    test,
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)
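
To make the role of `feature_prefixes` more concrete, the sketch below selects columns by name prefix with plain pandas and concatenates them into one feature matrix. It assumes that columns follow the `<prefix>-<index>` naming seen in the dataframe above. The function `select_by_prefixes` and the toy dataframe are hypothetical, and the sketch is not a description of how `PandasMultiModalDataset` is implemented.

```python
import pandas as pd


def select_by_prefixes(df, prefixes, label_column):
    """Hypothetical helper: gather the columns that start with each prefix.

    Columns are kept in prefix order, so the resulting matrix lays the
    channels side by side (all of prefix 1, then all of prefix 2, ...).
    """
    columns = [c for p in prefixes for c in df.columns if c.startswith(p + "-")]
    X = df[columns].to_numpy()
    y = df[label_column].to_numpy()
    return X, y


# Toy dataframe with two time steps per channel (illustrative values only)
toy = pd.DataFrame({
    "accel-x-0": [0.1, 0.2], "accel-x-1": [0.3, 0.4],
    "gyro-z-0": [1.0, 2.0], "gyro-z-1": [3.0, 4.0],
    "activity code": [0, 1], "user": [7, 8],
})
X, y = select_by_prefixes(toy, ["accel-x", "gyro-z"], "activity code")
print(X.shape, y)
```

Metadata columns such as `user` are left out because they match none of the prefixes.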

## Inspect sample

In [6]:
# Let's print the first sample of the train dataset.
# It is a tuple: the first element is a vector of 900 values (6 channels x 150 time steps)
# and the second element is the label
x = train_dataset[0]
print(x)

(array([-2.82401087e-03, -1.21949303e-02, -2.31008138e-02, -1.49481153e-02,
        1.46058665e-02,  7.60185134e-03,  3.29308743e-04, -1.17979330e-02,
        2.07128886e-02, -4.33668742e-03, -2.11764558e-02, -1.03425461e-02,
        1.17034974e-03,  5.47544131e-02,  2.45609306e-02,  4.64435586e-03,
       -4.26413009e-02, -4.71307500e-02,  8.80317933e-03, -1.47481121e-02,
       -4.08809867e-02, -1.00304229e-02,  6.60626173e-03,  1.00813444e-02,
        4.22378752e-02,  6.10609459e-03, -3.34549912e-02, -6.59750932e-02,
       -4.79929482e-02, -7.50992609e-03, -4.01805192e-03, -2.60514624e-02,
       -1.55780352e-02,  3.90942169e-02,  5.30392222e-02, -4.88866464e-03,
        1.13713488e-03, -1.94442502e-03, -2.00040876e-02, -6.22008498e-02,
       -3.05527161e-02,  2.39413161e-02,  2.91927789e-02, -4.40679074e-02,
       -4.89667227e-02,  2.39612481e-02,  9.14101238e-03, -2.83117714e-02,
       -5.88288613e-03,  5.68090291e-02,  6.02793637e-02,  5.13853161e-02,
       -5.60998865e-03, 

In [7]:
# Inspecting sample
print(f"The sample 0: {x[0]}")
print(f"Shape of sample 0: {x[0].shape}")
print(f"The label of sample 0: {x[1]}")

The sample 0: [-2.82401087e-03 -1.21949303e-02 -2.31008138e-02 -1.49481153e-02
  1.46058665e-02  7.60185134e-03  3.29308743e-04 -1.17979330e-02
  2.07128886e-02 -4.33668742e-03 -2.11764558e-02 -1.03425461e-02
  1.17034974e-03  5.47544131e-02  2.45609306e-02  4.64435586e-03
 -4.26413009e-02 -4.71307500e-02  8.80317933e-03 -1.47481121e-02
 -4.08809867e-02 -1.00304229e-02  6.60626173e-03  1.00813444e-02
  4.22378752e-02  6.10609459e-03 -3.34549912e-02 -6.59750932e-02
 -4.79929482e-02 -7.50992609e-03 -4.01805192e-03 -2.60514624e-02
 -1.55780352e-02  3.90942169e-02  5.30392222e-02 -4.88866464e-03
  1.13713488e-03 -1.94442502e-03 -2.00040876e-02 -6.22008498e-02
 -3.05527161e-02  2.39413161e-02  2.91927789e-02 -4.40679074e-02
 -4.89667227e-02  2.39612481e-02  9.14101238e-03 -2.83117714e-02
 -5.88288613e-03  5.68090291e-02  6.02793637e-02  5.13853161e-02
 -5.60998865e-03 -2.56385963e-02 -3.84203554e-02  1.57150393e-02
  3.16060820e-02 -8.13746441e-03 -3.35782692e-02  2.21092590e-02
  2.2256436

## Fourier Transform

In [8]:
from librep.datasets.multimodal import TransformMultiModalDataset
from librep.transforms.fft import FFT

In [9]:
fft_transform = FFT(centered=True)
transformer = TransformMultiModalDataset(
    transforms=[fft_transform],
    new_window_name_prefix="fft.",
)

### Apply the FFT to KuHar

In [10]:
train_dataset_fft = transformer(train_dataset)
validation_dataset_fft = transformer(validation_dataset)
test_dataset_fft = transformer(test_dataset)
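
As an illustration of what a per-channel frequency-domain representation looks like, the sketch below computes FFT magnitudes with NumPy. It is a swapped-in approximation: it uses `np.fft.rfft` (the one-sided spectrum) and assumes each row is a concatenation of equally long channels. librep's `FFT(centered=True)` may select, scale or order the bins differently. The function `fft_magnitude_per_channel` is a hypothetical name.

```python
import numpy as np


def fft_magnitude_per_channel(X, n_channels):
    """Hypothetical helper: magnitude of the one-sided FFT of each channel.

    X has shape (n_samples, n_channels * window_length); every row is split
    into channels, transformed independently and flattened again.
    """
    n_samples = X.shape[0]
    windows = X.reshape(n_samples, n_channels, -1)
    spectrum = np.abs(np.fft.rfft(windows, axis=-1))
    return spectrum.reshape(n_samples, -1)


# Synthetic check: a 5 Hz sine sampled at 50 Hz for 3 s, plus a silent channel
fs = 50
t = np.arange(150) / fs
sine = np.sin(2 * np.pi * 5 * t)
X_toy = np.concatenate([sine, np.zeros(150)])[np.newaxis, :]

F_toy = fft_magnitude_per_channel(X_toy, n_channels=2)
peak_bin = int(np.argmax(F_toy[0, :76]))
print(peak_bin * fs / 150)  # frequency (Hz) of the strongest bin in channel 1
```

Each 150-sample channel yields 76 one-sided bins spaced `fs / 150` Hz apart, so the energy of the sine concentrates in a single bin while the silent channel stays at zero.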

In [11]:
train_dataset[:][0]

array([[-2.82401087e-03, -1.21949303e-02, -2.31008138e-02, ...,
         4.14369740e-03,  3.15500087e-03,  3.04970315e-03],
       [-1.93932487e-03,  5.25495953e-03,  1.01601449e-02, ...,
         3.43516972e-03,  3.02233629e-03,  2.94516169e-03],
       [-2.13525119e-02, -5.13413715e-02,  1.81990906e-02, ...,
         2.51475200e-03,  3.29431362e-03,  2.91202148e-03],
       ...,
       [-3.70108604e-02, -5.97871689e-02, -1.12466276e+00, ...,
         1.81797785e-01,  1.26340727e-01,  1.96693217e-01],
       [-1.81267617e+00, -1.30214149e+00,  3.49151724e-01, ...,
         6.41425859e-01,  5.44806134e-01,  6.06248542e-01],
       [-4.92305269e+00, -8.59666781e+00, -8.98209319e+00, ...,
        -9.90499837e-02, -1.56576412e-01,  4.87221382e-01]])

In [12]:
train_dataset_fft[:][0]

array([[5.22308324e-02, 2.80202721e-01, 1.50924942e-01, ...,
        4.81010693e-03, 3.37252282e-03, 1.47039145e-03],
       [3.84507191e-02, 2.06085997e-01, 1.04691538e-01, ...,
        5.69264944e-03, 3.24664438e-03, 5.17484831e-03],
       [1.30318168e-01, 1.20704049e-01, 2.23393261e-01, ...,
        2.99293465e-03, 1.37581610e-02, 4.52250695e-03],
       ...,
       [6.16485700e+00, 7.44480193e+01, 6.46097103e+01, ...,
        2.19120689e-01, 4.98279344e-01, 2.21144625e-02],
       [7.81757985e+00, 5.32148704e+01, 2.32909403e+01, ...,
        5.51705451e-01, 3.98979192e-01, 6.44197516e-01],
       [3.72225584e+01, 1.65846455e+01, 2.14725749e+01, ...,
        7.53616114e-01, 1.32405440e+00, 8.49919225e-01]])

## Train and evaluate Random Forest classifier

In [13]:
from librep.utils.workflow import SimpleTrainEvalWorkflow, MultiRunWorkflow
from librep.estimators import RandomForestClassifier
from librep.metrics.report import ClassificationReport
import yaml

reporter = ClassificationReport(
    use_accuracy=True, use_f1_score=True, use_classification_report=False,
    use_confusion_matrix=False, plot_confusion_matrix=False,
)
experiment = SimpleTrainEvalWorkflow(
    estimator=RandomForestClassifier, estimator_creation_kwags={"n_estimators": 100},
    do_not_instantiate=False, do_fit=True, evaluator=reporter,
)
multi_run_experiment = MultiRunWorkflow(workflow=experiment, num_runs=3, debug=False)
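
Conceptually, this setup fits a fresh estimator several times and scores each fit on the test set. A rough scikit-learn-only equivalent is sketched below on synthetic data. The function `run_many` and the synthetic dataset are hypothetical and only mirror the idea. The librep workflow classes may differ in details such as timing, reporting and how estimators are instantiated.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier as SkRandomForest
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split


def run_many(make_estimator, X_train, y_train, X_test, y_test, num_runs=3):
    """Hypothetical helper: train and evaluate a new estimator `num_runs` times."""
    runs = []
    for run_id in range(1, num_runs + 1):
        model = make_estimator()  # new, untrained estimator for every run
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        runs.append({
            "run id": run_id,
            "accuracy": float(accuracy_score(y_test, y_pred)),
            "f1 score (macro)": float(f1_score(y_test, y_pred, average="macro")),
        })
    return runs


# Synthetic stand-in for the real features and labels
X_syn, y_syn = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_classes=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=0, stratify=y_syn
)
toy_runs = run_many(lambda: SkRandomForest(n_estimators=100), X_tr, y_tr, X_te, y_te)
print(toy_runs)
```

Repeating the run matters for Random Forest because its training is randomized. Deterministic estimators return the same scores on every run.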

In [14]:
combined_train_dset = PandasMultiModalDataset(
    pd.concat([train, validation], ignore_index=True),
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)


result = multi_run_experiment(combined_train_dset, test_dataset)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1663015869.8204503
    result:
    -   accuracy: 0.6931216931216931
        f1 score (macro): 0.6831705102233666
        f1 score (micro): 0.6931216931216931
        f1 score (weighted): 0.7030728760200194
    run id: 1
    start: 1663015862.6682813
    time taken: 7.1521689891815186
-   end: 1663015876.9129007
    result:
    -   accuracy: 0.6984126984126984
        f1 score (macro): 0.6888534590143475
        f1 score (micro): 0.6984126984126984
        f1 score (weighted): 0.7079719378110493
    run id: 2
    start: 1663015869.8204525
    time taken: 7.0924482345581055
-   end: 1663015884.008612
    result:
    -   accuracy: 0.7195767195767195
        f1 score (macro): 0.7116944371721674
        f1 score (micro): 0.7195767195767196
        f1 score (weighted): 0.7274590019812717
    run id: 3
    start: 1663015876.9129028
    time taken: 7.0957090854644775



In [15]:
combined_train_dset_fft = transformer(combined_train_dset)

result = multi_run_experiment(combined_train_dset_fft, test_dataset_fft)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1663015888.332773
    result:
    -   accuracy: 0.7936507936507936
        f1 score (macro): 0.7880984847854101
        f1 score (micro): 0.7936507936507936
        f1 score (weighted): 0.7992031025161773
    run id: 1
    start: 1663015884.208875
    time taken: 4.123898029327393
-   end: 1663015892.4836802
    result:
    -   accuracy: 0.7936507936507936
        f1 score (macro): 0.7877085470973502
        f1 score (micro): 0.7936507936507936
        f1 score (weighted): 0.7995930402042372
    run id: 2
    start: 1663015888.3327744
    time taken: 4.1509058475494385
-   end: 1663015896.6064703
    result:
    -   accuracy: 0.7962962962962963
        f1 score (macro): 0.7911838817945395
        f1 score (micro): 0.7962962962962963
        f1 score (weighted): 0.801408710798053
    run id: 3
    start: 1663015892.4836817
    time taken: 4.122788667678833
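
Since each experiment prints one block per run, it can help to reduce the runs to a mean and a standard deviation. The sketch below assumes the result is a dictionary laid out as the printed YAML suggests (a `runs` list whose entries hold a one-element `result` list). The function `summarize_runs` is a hypothetical helper, and the example dictionary copies only the accuracies of the three frequency-domain Random Forest runs shown above.

```python
import numpy as np


def summarize_runs(result, metric="accuracy"):
    """Hypothetical helper: mean and standard deviation of a metric over runs."""
    values = [run["result"][0][metric] for run in result["runs"]]
    return float(np.mean(values)), float(np.std(values))


# Accuracies copied from the frequency-domain Random Forest output above
example_result = {
    "runs": [
        {"run id": 1, "result": [{"accuracy": 0.7936507936507936}]},
        {"run id": 2, "result": [{"accuracy": 0.7936507936507936}]},
        {"run id": 3, "result": [{"accuracy": 0.7962962962962963}]},
    ]
}
mean_acc, std_acc = summarize_runs(example_result)
print(f"accuracy: {mean_acc:.4f} +/- {std_acc:.4f}")
```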



## Train and evaluate Support Vector Machine classifier

In [16]:
#from librep.estimators import SVC
from sklearn.svm import SVC

experiment = SimpleTrainEvalWorkflow(
    estimator=SVC, estimator_creation_kwags={"C": 3.0, "kernel": "rbf"},
    do_not_instantiate=False, do_fit=True, evaluator=reporter,
)
multi_run_experiment = MultiRunWorkflow(workflow=experiment, num_runs=3, debug=False)

result = multi_run_experiment(combined_train_dset, test_dataset)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1663015899.1041205
    result:
    -   accuracy: 0.4470899470899471
        f1 score (macro): 0.42227025143691815
        f1 score (micro): 0.4470899470899471
        f1 score (weighted): 0.471909642742976
    run id: 1
    start: 1663015896.6118753
    time taken: 2.4922451972961426
-   end: 1663015901.5986407
    result:
    -   accuracy: 0.4470899470899471
        f1 score (macro): 0.42227025143691815
        f1 score (micro): 0.4470899470899471
        f1 score (weighted): 0.471909642742976
    run id: 2
    start: 1663015899.104123
    time taken: 2.4945175647735596
-   end: 1663015904.089568
    result:
    -   accuracy: 0.4470899470899471
        f1 score (macro): 0.42227025143691815
        f1 score (micro): 0.4470899470899471
        f1 score (weighted): 0.471909642742976
    run id: 3
    start: 1663015901.5986433
    time taken: 2.490924596786499



In [17]:
result = multi_run_experiment(combined_train_dset_fft, test_dataset_fft)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1663015904.832401
    result:
    -   accuracy: 0.791005291005291
        f1 score (macro): 0.7730636518669046
        f1 score (micro): 0.791005291005291
        f1 score (weighted): 0.8089469301436776
    run id: 1
    start: 1663015904.0944638
    time taken: 0.7379372119903564
-   end: 1663015905.5662603
    result:
    -   accuracy: 0.791005291005291
        f1 score (macro): 0.7730636518669046
        f1 score (micro): 0.791005291005291
        f1 score (weighted): 0.8089469301436776
    run id: 2
    start: 1663015904.8324027
    time taken: 0.7338576316833496
-   end: 1663015906.3043766
    result:
    -   accuracy: 0.791005291005291
        f1 score (macro): 0.7730636518669046
        f1 score (micro): 0.791005291005291
        f1 score (weighted): 0.8089469301436776
    run id: 3
    start: 1663015905.566262
    time taken: 0.738114595413208



## Train and evaluate K-Nearest Neighbors classifier

In [18]:
#from librep.estimators import KNeighborsClassifier
from sklearn.neighbors import KNeighborsClassifier

experiment = SimpleTrainEvalWorkflow(
    estimator=KNeighborsClassifier, estimator_creation_kwags={"n_neighbors": 1},
    do_not_instantiate=False, do_fit=True, evaluator=reporter,
)
multi_run_experiment = MultiRunWorkflow(workflow=experiment, num_runs=3, debug=False)

result = multi_run_experiment(combined_train_dset, test_dataset)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1663015906.4811482
    result:
    -   accuracy: 0.3941798941798942
        f1 score (macro): 0.39473861320525794
        f1 score (micro): 0.3941798941798942
        f1 score (weighted): 0.39362117515453043
    run id: 1
    start: 1663015906.3100052
    time taken: 0.1711430549621582
-   end: 1663015906.5214093
    result:
    -   accuracy: 0.3941798941798942
        f1 score (macro): 0.39473861320525794
        f1 score (micro): 0.3941798941798942
        f1 score (weighted): 0.39362117515453043
    run id: 2
    start: 1663015906.4811528
    time taken: 0.040256500244140625
-   end: 1663015906.560912
    result:
    -   accuracy: 0.3941798941798942
        f1 score (macro): 0.39473861320525794
        f1 score (micro): 0.3941798941798942
        f1 score (weighted): 0.39362117515453043
    run id: 3
    start: 1663015906.521412
    time taken: 0.03949999809265137



In [19]:
result = multi_run_experiment(combined_train_dset_fft, test_dataset_fft)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1663015906.5856428
    result:
    -   accuracy: 0.8201058201058201
        f1 score (macro): 0.8196555490468773
        f1 score (micro): 0.8201058201058201
        f1 score (weighted): 0.8205560911647629
    run id: 1
    start: 1663015906.5672026
    time taken: 0.01844024658203125
-   end: 1663015906.6026359
    result:
    -   accuracy: 0.8201058201058201
        f1 score (macro): 0.8196555490468773
        f1 score (micro): 0.8201058201058201
        f1 score (weighted): 0.8205560911647629
    run id: 2
    start: 1663015906.5856447
    time taken: 0.016991138458251953
-   end: 1663015906.6196291
    result:
    -   accuracy: 0.8201058201058201
        f1 score (macro): 0.8196555490468773
        f1 score (micro): 0.8201058201058201
        f1 score (weighted): 0.8205560911647629
    run id: 3
    start: 1663015906.602638
    time taken: 0.016991138458251953

