# Experiments KuHar

This notebook runs basic experiments on the balanced KuHar dataset in the following steps:
1. Quickly load the train, test and validation CSV subsets of the balanced KuHar dataset using the `PandasDatasetsIO` helper
2. Wrap the dataframes in librep's `Dataset` interface using `PandasMultiModalDataset`
3. Apply the Fourier transform to the dataset
4. Train and evaluate SVM, KNN and Random Forest classification models in both the time and frequency domains

Each model is trained on the combined train and validation subsets and evaluated on the test subset, with three runs per configuration.

## Common imports and definitions

In [1]:
from pathlib import Path  # For defining dataset paths
import sys                # For adding the librep package to the import path

# This must be done if librep is not installed via pip,
# as this directory (examples) is apart from the librep package root
sys.path.append("..")

# Third party imports
import pandas as pd
import numpy as np

# Librep imports
from librep.utils.dataset import PandasDatasetsIO          # For quick load train, test and validation CSVs
from librep.datasets.multimodal import PandasMultiModalDataset # Wrap CSVs to librep's `Dataset` interface



## Loading data
Change the path below to use a different dataset.

In [2]:
# Path to the KuHar balanced view with the same activities (and label numbers)
# The directory is assumed to contain train.csv, test.csv and validation.csv
dataset_path = Path("../data/views/KuHar/balanced_view")

Once the path is defined, we can load the CSVs as pandas dataframes.

In [3]:
# KuHar dataframes
train, validation, test = PandasDatasetsIO(dataset_path).load()
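
For readers without librep at hand, the loading step can be approximated with plain pandas. The sketch below assumes only what the comment above states, namely that the directory holds `train.csv`, `validation.csv` and `test.csv`. The helper name `load_splits` is hypothetical and is not part of librep.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_splits(root: Path):
    """Read train/validation/test CSVs from `root` (hypothetical helper)."""
    names = ("train", "validation", "test")
    return tuple(pd.read_csv(root / f"{name}.csv") for name in names)


# Self-contained demo on a temporary directory with tiny toy CSVs
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name, rows in (("train", 3), ("validation", 1), ("test", 2)):
        toy = pd.DataFrame({"accel-x-0": [0.0] * rows, "activity code": [0] * rows})
        toy.to_csv(root / f"{name}.csv", index=False)
    toy_train, toy_validation, toy_test = load_splits(root)

print(len(toy_train), len(toy_validation), len(toy_test))
```

Writing the toy files with `index=False` keeps pandas from storing the row index as an extra column; columns named like `Unnamed: 0` typically appear when a CSV was saved with its index.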

Let's take a look at the train dataframe.

In [4]:
train.head()

| Unnamed: 0.1 | Unnamed: 0 | accel-x-0 | accel-x-1 | accel-x-2 | accel-x-3 | accel-x-4 | accel-x-5 | accel-x-6 | accel-x-7 | accel-x-8 | ... | gyro-z-299 | accel-start-time | gyro-start-time | accel-end-time | gyro-end-time | activity code | length | serial | index | user |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | -0.007251 | -0.016431 | -0.0019 | -0.020529 | -0.027133 | -0.019558 | -0.014525 | -0.002541 | 0.016369 | ... | 0.002956 | 23.235 | 23.223 | 26.26 | 26.249 | 0 | 300 | 1 | 2100 | 1051 |
| 1 | 1 | -0.008128 | -0.006837 | 0.008597 | 0.014337 | 0.006973 | 0.00325 | -0.005086 | -0.014379 | -0.007034 | ... | 0.001709 | 56.292 | 56.292 | 59.245 | 59.245 | 0 | 300 | 1 | 5700 | 1037 |
| 2 | 2 | -0.033081 | -0.037222 | -0.043654 | -0.038211 | 0.014246 | 0.063478 | 0.043582 | -0.013673 | -0.029928 | ... | 0.00255 | 27.268 | 27.267 | 30.29 | 30.291 | 0 | 300 | 1 | 2700 | 1075 |
| 3 | 3 | -0.00974 | -0.016656 | 0.002454 | -0.023503 | -0.023115 | -0.006241 | 0.017415 | 0.014765 | 0.019231 | ... | 0.002969 | 39.421 | 39.42 | 42.441 | 42.44 | 0 | 300 | 6 | 3900 | 1008 |
| 4 | 4 | 0.029113 | 0.042745 | 0.017337 | -0.015903 | -0.027398 | -0.010438 | -0.026766 | -0.013397 | -0.008499 | ... | 0.006943 | 23.703 | 23.703 | 26.656 | 26.656 | 0 | 300 | 1 | 2400 | 1038 |


## Creating a Librep dataset from pandas dataframes

Change the feature prefixes below to use a different dataset.

In [5]:
# KuHar feature prefixes to select
features = [
    "accel-x",
    "accel-y",
    "accel-z",
    "gyro-x",
    "gyro-y",
    "gyro-z"
]

# Creating the datasets

# Train
train_dataset = PandasMultiModalDataset(
    train,
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)

# Validation
validation_dataset = PandasMultiModalDataset(
    validation,
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)

# Test
test_dataset = PandasMultiModalDataset(
    test,
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)
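
The role of `feature_prefixes` can be illustrated with plain pandas. The sketch below assumes that a prefix such as `accel-x` selects every column named `accel-x-0`, `accel-x-1`, and so on (as in the dataframe above), and that the selected windows are concatenated in the order given. This matches the 1800-element samples seen in this notebook but is not librep's actual implementation; the function `select_windows` is hypothetical.

```python
import numpy as np
import pandas as pd


def select_windows(df: pd.DataFrame, prefixes, label_column: str):
    """Return (X, y): columns matching each prefix, concatenated in order."""
    columns = []
    for prefix in prefixes:
        # Keep columns like "accel-x-0", "accel-x-1", ... for prefix "accel-x"
        columns += [c for c in df.columns if c.startswith(prefix + "-")]
    return df[columns].to_numpy(), df[label_column].to_numpy()


# Toy dataframe: two sensor axes, three time steps each, plus metadata columns
toy = pd.DataFrame({
    "accel-x-0": [0.1, 0.4], "accel-x-1": [0.2, 0.5], "accel-x-2": [0.3, 0.6],
    "gyro-z-0": [1.0, 4.0], "gyro-z-1": [2.0, 5.0], "gyro-z-2": [3.0, 6.0],
    "activity code": [0, 1], "user": [1051, 1037],
})
X, y = select_windows(toy, ["accel-x", "gyro-z"], "activity code")
print(X.shape)  # (2, 6)
print(y)        # [0 1]
```

Metadata columns such as `user` are left out of `X` because they match none of the prefixes.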

## Inspect sample

In [6]:
# Let's print the first sample of the train dataset.
# It is a tuple with a vector of 1800 elements as its first element and the label as its second
x = train_dataset[0]
print(x)

(array([-0.00725079, -0.01643086, -0.00189972, ...,  0.00295611,
        0.00295611,  0.00295611]), 0)


In [7]:
# Inspecting sample
print(f"The sample 0: {x[0]}")
print(f"Shape of sample 0: {x[0].shape}")
print(f"The label of sample 0: {x[1]}")

The sample 0: [-0.00725079 -0.01643086 -0.00189972 ...  0.00295611  0.00295611
  0.00295611]
Shape of sample 0: (1800,)
The label of sample 0: 0
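
Since each sample is flat, it can help to view it per sensor axis. Assuming the 1800 values are the six windows listed in `features` laid out one after another, 300 time steps each (the `length` column in the dataframe is 300), a reshape recovers a `(6, 300)` array. The random vector below is only a stand-in for `x[0]`.

```python
import numpy as np

feature_names = ["accel-x", "accel-y", "accel-z", "gyro-x", "gyro-y", "gyro-z"]
window_length = 300

# Stand-in for a flat sample such as train_dataset[0][0]
rng = np.random.default_rng(0)
flat_sample = rng.normal(size=len(feature_names) * window_length)

# One row per sensor axis, assuming windows are stored back to back
windows = flat_sample.reshape(len(feature_names), window_length)
per_axis = dict(zip(feature_names, windows))

print(windows.shape)             # (6, 300)
print(per_axis["gyro-z"].shape)  # (300,)
```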


## Fourier Transform

In [8]:
from librep.datasets.multimodal import TransformMultiModalDataset
from librep.transforms.fft import FFT

In [9]:
fft_transform = FFT(centered=True)
transformer = TransformMultiModalDataset(
    transforms=[fft_transform],
    new_window_name_prefix="fft."
)

### Apply FFT to KuHar

In [10]:
train_dataset_fft = transformer(train_dataset)
validation_dataset_fft = transformer(validation_dataset)
test_dataset_fft = transformer(test_dataset)

In [11]:
train_dataset[:][0]

array([[-7.2507860e-03, -1.6430855e-02, -1.8997192e-03, ...,
         2.9561063e-03,  2.9561063e-03,  2.9561063e-03],
       [-8.1281660e-03, -6.8368910e-03,  8.5973740e-03, ...,
         3.8299560e-03,  2.7618408e-03,  1.7089844e-03],
       [-3.3081055e-02, -3.7221910e-02, -4.3654440e-02, ...,
         3.6152380e-03,  2.5499745e-03,  2.5499745e-03],
       ...,
       [ 3.5006000e-01,  2.6262000e-01, -1.7055000e-01, ...,
         1.5103000e-01,  1.5636000e-01,  1.6275000e-01],
       [-1.6479000e+00, -1.6806000e+00, -1.3551000e+00, ...,
         5.5797000e-01,  5.4838000e-01,  5.3240000e-01],
       [-6.2135534e+00, -7.1491690e+00, -8.0634130e+00, ...,
         1.2653333e-01,  4.5037340e-01,  7.2521144e-01]])

In [12]:
train_dataset_fft[:][0]

array([[1.04461665e-01, 5.60405443e-01, 3.01849883e-01, ...,
        1.65037529e-02, 1.63466728e-02, 1.24300339e-02],
       [7.69014383e-02, 4.12171994e-01, 2.09383077e-01, ...,
        2.63120409e-03, 9.48805261e-03, 5.18175730e-03],
       [2.60636336e-01, 2.41408098e-01, 4.46786522e-01, ...,
        3.69022318e-03, 1.16167546e-02, 1.45708099e-03],
       ...,
       [1.23297140e+01, 1.48896039e+02, 1.29219421e+02, ...,
        2.89959395e-01, 2.99217590e-01, 3.00465532e-01],
       [1.56351597e+01, 1.06429741e+02, 4.65818805e+01, ...,
        4.51856425e-01, 3.94662803e-01, 4.98044870e-01],
       [7.44451167e+01, 3.31692910e+01, 4.29451498e+01, ...,
        1.67725714e+00, 1.48620599e+00, 1.40784994e+00]])
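
To make the transformation concrete, the following NumPy sketch computes a magnitude spectrum for each window of a flat sample. It is an illustration rather than librep's `FFT` code: it assumes the transform is applied window by window and keeps only the non-negative frequency bins via `numpy.fft.rfft`. How `centered=True` treats the spectrum in librep is not shown in this notebook, and the function `magnitude_spectrum` is hypothetical.

```python
import numpy as np


def magnitude_spectrum(flat_sample: np.ndarray, n_windows: int) -> np.ndarray:
    """Apply a real FFT to each window and return concatenated magnitudes."""
    windows = flat_sample.reshape(n_windows, -1)
    # rfft keeps the non-negative frequency bins of a real-valued signal
    spectra = np.abs(np.fft.rfft(windows, axis=1))
    return spectra.reshape(-1)


# Toy sample: two windows of 8 points, each a pure sinusoid
t = np.arange(8)
window_a = np.sin(2 * np.pi * 1 * t / 8)  # 1 cycle per window
window_b = np.sin(2 * np.pi * 3 * t / 8)  # 3 cycles per window
sample = np.concatenate([window_a, window_b])

spectrum = magnitude_spectrum(sample, n_windows=2).reshape(2, -1)
print(spectrum.shape)           # (2, 5)
print(spectrum.argmax(axis=1))  # [1 3]
```

Each window peaks at the bin matching its number of cycles. Magnitudes are non-negative, which is consistent with the all-positive values in `train_dataset_fft[:][0]` above.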

## Train and evaluate Random Forest classifier

In [13]:
from librep.utils.workflow import SimpleTrainEvalWorkflow, MultiRunWorkflow
from librep.estimators import RandomForestClassifier
from librep.metrics.report import ClassificationReport
import yaml

reporter = ClassificationReport(
    use_accuracy=True, use_f1_score=True, use_classification_report=False,
    use_confusion_matrix=False, plot_confusion_matrix=False
)
experiment = SimpleTrainEvalWorkflow(
    estimator=RandomForestClassifier, estimator_creation_kwags={'n_estimators': 100},
    do_not_instantiate=False, do_fit=True, evaluator=reporter
)
multi_run_experiment = MultiRunWorkflow(workflow=experiment, num_runs=3, debug=False)
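
The reporter above is configured to emit accuracy and F1 scores, and the outputs in this notebook list macro, micro and weighted F1. As a reminder of how these averages differ, here is a small NumPy sketch written from the standard definitions. It does not call librep or scikit-learn, and the function name `f1_scores` is made up for this example.

```python
import numpy as np


def f1_scores(y_true, y_pred):
    """Return accuracy and macro/micro/weighted F1 for single-label data."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    labels = np.unique(np.concatenate([y_true, y_pred]))
    tp = np.array([np.sum((y_pred == c) & (y_true == c)) for c in labels])
    fp = np.array([np.sum((y_pred == c) & (y_true != c)) for c in labels])
    fn = np.array([np.sum((y_pred != c) & (y_true == c)) for c in labels])
    # Per-class F1 = 2TP / (2TP + FP + FN), set to 0 if the denominator is 0
    denom = 2 * tp + fp + fn
    per_class = np.divide(2 * tp, denom, out=np.zeros(len(labels)), where=denom > 0)
    support = tp + fn  # number of true samples per class
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "f1 score (macro)": float(per_class.mean()),
        "f1 score (micro)": float(2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())),
        "f1 score (weighted)": float(np.average(per_class, weights=support)),
    }


scores = f1_scores(y_true=[0, 0, 1, 1, 2, 2], y_pred=[0, 1, 1, 1, 2, 0])
print(scores)
```

For single-label multiclass predictions, micro F1 equals accuracy, which is why those two values coincide in every run reported in this notebook.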

In [22]:
combined_train_dset = PandasMultiModalDataset(
    pd.concat([train, validation], ignore_index=True),
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)

x = combined_train_dset[0]
print(x)
print(f"The sample 0: {x[0]}")
print(f"Shape of sample 0: {x[0].shape}")
print(f"The label of sample 0: {x[1]}")
print(train_dataset)
print(validation_dataset)
print(combined_train_dset)

result = multi_run_experiment(combined_train_dset, test_dataset)
print(yaml.dump(result, sort_keys=True, indent=4))

(array([-0.00725079, -0.01643086, -0.00189972, ...,  0.00295611,
        0.00295611,  0.00295611]), 0)
The sample 0: [-0.00725079 -0.01643086 -0.00189972 ...  0.00295611  0.00295611
  0.00295611]
Shape of sample 0: (1800,)
The label of sample 0: 0
PandasMultiModalDataset: samples=3330, features=1800, no. window=6
PandasMultiModalDataset: samples=108, features=1800, no. window=6
PandasMultiModalDataset: samples=3438, features=1800, no. window=6
runs:
-   end: 1662384844.76248
    result:
    -   accuracy: 0.3888888888888889
        f1 score (macro): 0.3883578776298696
        f1 score (micro): 0.3888888888888889
        f1 score (weighted): 0.3894199001479082
    run id: 1
    start: 1662384844.695297
    time taken: 0.06718301773071289
-   end: 1662384844.8289225
    result:
    -   accuracy: 0.3888888888888889
        f1 score (macro): 0.3883578776298696
        f1 score (micro): 0.3888888888888889
        f1 score (weighted): 0.3894199001479082
    run id: 2
    start: 1662384844.762

In [15]:
combined_train_dset_fft = transformer(combined_train_dset)

result = multi_run_experiment(combined_train_dset_fft, test_dataset_fft)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662384418.4568903
    result:
    -   accuracy: 0.783068783068783
        f1 score (macro): 0.7808929070599591
        f1 score (micro): 0.7830687830687831
        f1 score (weighted): 0.785244659077607
    run id: 1
    start: 1662384412.5481153
    time taken: 5.908775091171265
-   end: 1662384424.4287639
    result:
    -   accuracy: 0.7883597883597884
        f1 score (macro): 0.7848541859483434
        f1 score (micro): 0.7883597883597884
        f1 score (weighted): 0.7918653907712332
    run id: 2
    start: 1662384418.4568925
    time taken: 5.971871376037598
-   end: 1662384430.420034
    result:
    -   accuracy: 0.7777777777777778
        f1 score (macro): 0.7730427641500177
        f1 score (micro): 0.7777777777777778
        f1 score (weighted): 0.7825127914055378
    run id: 3
    start: 1662384424.428767
    time taken: 5.991266965866089
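
Because the Random Forest runs differ slightly, it is convenient to summarize them. The sketch below assumes `result` is a plain dictionary shaped like the YAML printed above (a `runs` list whose entries hold a one-element `result` list). The accuracies are copied from that output, and `summarize_runs` is a hypothetical helper.

```python
import statistics


def summarize_runs(result: dict, metric: str = "accuracy"):
    """Return (mean, population std) of `metric` across all runs."""
    values = [run["result"][0][metric] for run in result["runs"]]
    return statistics.fmean(values), statistics.pstdev(values)


# Accuracies copied from the frequency-domain Random Forest output above
rf_fft_result = {
    "runs": [
        {"run id": 1, "result": [{"accuracy": 0.783068783068783}]},
        {"run id": 2, "result": [{"accuracy": 0.7883597883597884}]},
        {"run id": 3, "result": [{"accuracy": 0.7777777777777778}]},
    ]
}
mean_acc, std_acc = summarize_runs(rf_fft_result)
print(f"accuracy: {mean_acc:.4f} +/- {std_acc:.4f}")
```

The same helper can be pointed at another metric key, for example `summarize_runs(result, "f1 score (macro)")`, provided the key is present in every run.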



## Train and evaluate Support Vector Machine classifier

In [16]:
#from librep.estimators import SVC
from sklearn.svm import SVC

experiment = SimpleTrainEvalWorkflow(
    estimator=SVC, estimator_creation_kwags={'C': 3.0, 'kernel': "rbf"},
    do_not_instantiate=False, do_fit=True, evaluator=reporter
)
multi_run_experiment = MultiRunWorkflow(workflow=experiment, num_runs=3, debug=False)

result = multi_run_experiment(combined_train_dset, test_dataset)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662384436.1339922
    result:
    -   accuracy: 0.4497354497354497
        f1 score (macro): 0.4260833681093106
        f1 score (micro): 0.4497354497354497
        f1 score (weighted): 0.4733875313615887
    run id: 1
    start: 1662384430.4254065
    time taken: 5.708585739135742
-   end: 1662384441.7608743
    result:
    -   accuracy: 0.4497354497354497
        f1 score (macro): 0.4260833681093106
        f1 score (micro): 0.4497354497354497
        f1 score (weighted): 0.4733875313615887
    run id: 2
    start: 1662384436.133994
    time taken: 5.626880168914795
-   end: 1662384447.3747075
    result:
    -   accuracy: 0.4497354497354497
        f1 score (macro): 0.4260833681093106
        f1 score (micro): 0.4497354497354497
        f1 score (weighted): 0.4733875313615887
    run id: 3
    start: 1662384441.7608767
    time taken: 5.613830804824829



In [17]:
result = multi_run_experiment(combined_train_dset_fft, test_dataset_fft)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662384448.734352
    result:
    -   accuracy: 0.791005291005291
        f1 score (macro): 0.7733134329979361
        f1 score (micro): 0.791005291005291
        f1 score (weighted): 0.808697149012646
    run id: 1
    start: 1662384447.3800087
    time taken: 1.3543434143066406
-   end: 1662384450.080104
    result:
    -   accuracy: 0.791005291005291
        f1 score (macro): 0.7733134329979361
        f1 score (micro): 0.791005291005291
        f1 score (weighted): 0.808697149012646
    run id: 2
    start: 1662384448.7343543
    time taken: 1.345749855041504
-   end: 1662384451.447714
    result:
    -   accuracy: 0.791005291005291
        f1 score (macro): 0.7733134329979361
        f1 score (micro): 0.791005291005291
        f1 score (weighted): 0.808697149012646
    run id: 3
    start: 1662384450.080106
    time taken: 1.3676080703735352



## Train and evaluate K-Nearest Neighbors classifier

In [18]:
#from librep.estimators import KNeighborsClassifier
from sklearn.neighbors import KNeighborsClassifier

experiment = SimpleTrainEvalWorkflow(
    estimator=KNeighborsClassifier, estimator_creation_kwags={'n_neighbors': 1},
    do_not_instantiate=False, do_fit=True, evaluator=reporter
)
multi_run_experiment = MultiRunWorkflow(workflow=experiment, num_runs=3, debug=False)

result = multi_run_experiment(combined_train_dset, test_dataset)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662384451.6437302
    result:
    -   accuracy: 0.3888888888888889
        f1 score (macro): 0.3883578776298696
        f1 score (micro): 0.3888888888888889
        f1 score (weighted): 0.3894199001479082
    run id: 1
    start: 1662384451.4532526
    time taken: 0.19047760963439941
-   end: 1662384451.7111523
    result:
    -   accuracy: 0.3888888888888889
        f1 score (macro): 0.3883578776298696
        f1 score (micro): 0.3888888888888889
        f1 score (weighted): 0.3894199001479082
    run id: 2
    start: 1662384451.6437347
    time taken: 0.06741762161254883
-   end: 1662384451.778556
    result:
    -   accuracy: 0.3888888888888889
        f1 score (macro): 0.3883578776298696
        f1 score (micro): 0.3888888888888889
        f1 score (weighted): 0.3894199001479082
    run id: 3
    start: 1662384451.7111547
    time taken: 0.06740140914916992



In [19]:
result = multi_run_experiment(combined_train_dset_fft, test_dataset_fft)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662384451.8095093
    result:
    -   accuracy: 0.8253968253968254
        f1 score (macro): 0.8247611415029943
        f1 score (micro): 0.8253968253968254
        f1 score (weighted): 0.8260325092906565
    run id: 1
    start: 1662384451.7847493
    time taken: 0.024760007858276367
-   end: 1662384451.8335404
    result:
    -   accuracy: 0.8253968253968254
        f1 score (macro): 0.8247611415029943
        f1 score (micro): 0.8253968253968254
        f1 score (weighted): 0.8260325092906565
    run id: 2
    start: 1662384451.8095114
    time taken: 0.024029016494750977
-   end: 1662384451.8584514
    result:
    -   accuracy: 0.8253968253968254
        f1 score (macro): 0.8247611415029943
        f1 score (micro): 0.8253968253968254
        f1 score (weighted): 0.8260325092906565
    run id: 3
    start: 1662384451.8335423
    time taken: 0.024909019470214844
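
With `n_neighbors=1`, prediction reduces to copying the label of the closest training sample. This is deterministic, so all three runs report the same scores. A minimal NumPy sketch of this rule, assuming Euclidean distance (scikit-learn's default for `KNeighborsClassifier` is the Minkowski metric with `p=2`, which is Euclidean), could look like this. `predict_1nn` is a hypothetical name and the brute-force search is for illustration only.

```python
import numpy as np


def predict_1nn(X_train, y_train, X_test):
    """Label each test row with the label of its nearest training row."""
    X_train, X_test = np.asarray(X_train, float), np.asarray(X_test, float)
    # Pairwise squared Euclidean distances, shape (n_test, n_train)
    diff = X_test[:, None, :] - X_train[None, :, :]
    dist2 = np.sum(diff ** 2, axis=2)
    return np.asarray(y_train)[dist2.argmin(axis=1)]


X_train = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
y_train = [0, 0, 1]
X_test = [[0.2, 0.1], [4.0, 4.5]]
print(predict_1nn(X_train, y_train, X_test))  # [0 1]
```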

