# Experiments on KuHar Resampled to 30Hz

This notebook runs basic experiments on the balanced KuHar dataset, resampled to 30Hz, in the following steps:
1. Load the train, validation and test CSV subsets of the balanced KuHar dataset with the `PandasDatasetsIO` helper
2. Wrap the dataframes in librep's `Dataset` interface using `PandasMultiModalDataset`
3. Apply the Fourier transform to the datasets
4. Train and evaluate Random Forest, SVM and KNN classification models in both the time and frequency domains

The goal is to compare how the three models perform on time-domain features versus frequency-domain features.

## Common imports and definitions

In [3]:
from pathlib import Path  # For defining dataset Paths
import sys                # For adding the librep package to the import path

# This must be done if librep is not installed via pip,
# as this directory (examples) is apart from the librep package root
sys.path.append("..")

# Third party imports
import pandas as pd
import numpy as np

# Librep imports
from librep.utils.dataset import PandasDatasetsIO          # Quickly loads train, test and validation CSVs
from librep.datasets.multimodal import PandasMultiModalDataset # Wraps dataframes in librep's `Dataset` interface



## Loading data
Change the path below to use a different dataset.

In [4]:
# Path to the KuHar view resampled to 30Hz, with the same activities (and label numbers)
# The directory is assumed to contain train.csv, validation.csv and test.csv
dataset_path = Path("../data/views/KuHar/resampled_view_30Hz")

Once the path is defined, we can load the CSVs as pandas dataframes.

In [5]:
# Kuhar dataframes
train, validation, test = PandasDatasetsIO(dataset_path).load()
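
For readers without librep at hand, the loading step can be approximated with plain pandas. This is only a sketch: it assumes that `PandasDatasetsIO(...).load()` reads `train.csv`, `validation.csv` and `test.csv` from the given directory and returns them in that order. The helper name `load_splits` is hypothetical and not part of librep, and the CSV files are synthetic stand-ins for the real data.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_splits(root: Path):
    """Hypothetical helper: read the train, validation and test CSVs from `root`."""
    names = ("train.csv", "validation.csv", "test.csv")
    return tuple(pd.read_csv(root / name) for name in names)


# Tiny demonstration with synthetic files (not the real KuHar data)
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("train", "validation", "test"):
        frame = pd.DataFrame({"accel-x-0": [0.1, 0.2], "activity code": [0, 1]})
        frame.to_csv(root / f"{name}.csv", index=False)
    demo_train, demo_validation, demo_test = load_splits(root)

print(demo_train.shape)
```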

Let's take a look at the train dataframe.

In [6]:
train.head()

| Unnamed: 0.1 | Unnamed: 0 | accel-x-0 | accel-x-1 | accel-x-2 | accel-x-3 | accel-x-4 | accel-x-5 | accel-x-6 | accel-x-7 | accel-x-8 | ... | gyro-z-89 | accel-start-time | gyro-start-time | accel-end-time | gyro-end-time | activity code | length | serial | index | user |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.00362 | -0.023688 | -0.002657 | 0.010371 | -0.005858 | 0.010423 | -0.019593 | 0.000303 | 0.045433 | ... | 0.003377 | 23.235 | 23.223 | 26.26 | 26.249 | 0 | 300 | 1 | 2100 | 1051 |
| 1 | 1 | -0.005823 | 0.012494 | -0.012503 | -0.002116 | 0.025957 | -0.012833 | -0.025845 | -0.011941 | 0.012807 | ... | 0.003056 | 56.292 | 56.292 | 59.245 | 59.245 | 0 | 300 | 1 | 5700 | 1037 |
| 2 | 2 | -0.039278 | 0.003864 | 0.008927 | -0.024887 | 0.022435 | 0.003431 | -0.038931 | 0.003359 | 0.009394 | ... | 0.003442 | 27.268 | 27.267 | 30.29 | 30.291 | 0 | 300 | 1 | 2700 | 1075 |
| 3 | 3 | -0.001728 | -0.018312 | 0.013927 | 0.015426 | 0.007332 | -0.012372 | 0.006893 | -0.002433 | 0.012821 | ... | -0.001294 | 39.421 | 39.42 | 42.441 | 42.44 | 0 | 300 | 6 | 3900 | 1008 |
| 4 | 4 | -0.022981 | 0.014871 | -0.03631 | 0.033512 | -0.016733 | 0.01993 | -0.016637 | 0.007568 | -0.002753 | ... | 0.00456 | 23.703 | 23.703 | 26.656 | 26.656 | 0 | 300 | 1 | 2400 | 1038 |


## Creating a Librep dataset from pandas dataframes

Change the feature prefixes below to use a different dataset.

In [7]:
# Kuhar features to select
features = [
    "accel-x",
    "accel-y",
    "accel-z",
    "gyro-x",
    "gyro-y",
    "gyro-z"
]

# Creating the datasets

# Train
train_dataset = PandasMultiModalDataset(
    train,
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)

# Validation
validation_dataset = PandasMultiModalDataset(
    validation,
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)

# Test
test_dataset = PandasMultiModalDataset(
    test,
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)
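
To make the role of `feature_prefixes` and `label_columns` more concrete, the sketch below reproduces the idea with plain pandas: columns whose names start with a given prefix are gathered, in prefix order, into one flat feature vector per row. It assumes this is roughly what `PandasMultiModalDataset` does with `as_array=True`; the exact matching rule used by librep may differ, and the dataframe here is synthetic.

```python
import pandas as pd

# Synthetic frame with 2 time steps per axis (the real view has 90 per axis)
df = pd.DataFrame({
    "accel-x-0": [1.0, 2.0], "accel-x-1": [3.0, 4.0],
    "gyro-z-0": [5.0, 6.0], "gyro-z-1": [7.0, 8.0],
    "activity code": [0, 1],
    "user": [1051, 1037],
})
prefixes = ["accel-x", "gyro-z"]

# Keep every column whose name starts with one of the prefixes, in prefix order.
# Matching on "<prefix>-" is an assumption made for this illustration.
feature_columns = [
    column
    for prefix in prefixes
    for column in df.columns
    if column.startswith(prefix + "-")
]

X = df[feature_columns].to_numpy()   # one flat feature vector per row
y = df["activity code"].to_numpy()   # one label per row

print(X.shape, y.shape)
```

Metadata columns such as `user` match no prefix, so they are left out of the feature vector.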

## Inspect sample

In [8]:
# Let's print the first sample of train_dataset.
# It is a tuple: the first element is a vector of 540 values (6 features x 90 time steps) and the second is the label
x = train_dataset[0]
print(x)

(array([ 3.62008887e-03, -2.36884079e-02, -2.65723909e-03,  1.03708716e-02,
       -5.85826359e-03,  1.04230261e-02, -1.95931668e-02,  3.02941467e-04,
        4.54334895e-02, -4.61346122e-03, -4.32958544e-02, -2.11811327e-03,
       -3.24331746e-02, -9.64189479e-04,  2.38423378e-02,  1.46571666e-02,
       -6.33010786e-02, -3.09536555e-02, -6.09642095e-03, -2.23327124e-02,
        4.73811398e-02,  8.10745709e-03, -1.46291624e-03, -3.52547565e-02,
       -3.65486640e-02,  3.89222511e-02, -5.64525903e-02,  1.71065679e-02,
       -1.88765233e-02,  1.56273664e-02,  6.68170176e-02,  1.86759574e-02,
       -4.15165877e-02,  1.55675209e-02,  4.57480490e-03, -1.56470432e-02,
        2.86984361e-02, -3.88362938e-03, -1.85153493e-02, -9.91992429e-03,
        1.05702633e-02, -5.34487311e-03,  2.19452625e-02,  2.31337120e-02,
       -7.65234924e-03, -5.43440699e-02, -2.68535392e-02,  1.13130045e-02,
        2.85217945e-02,  1.84220024e-02,  1.95164920e-02,  4.88774164e-03,
        6.26222956e-03, 

In [9]:
# Inspecting sample
print(f"The sample 0: {x[0]}")
print(f"Shape of sample 0: {x[0].shape}")
print(f"The label of sample 0: {x[1]}")

The sample 0: [ 3.62008887e-03 -2.36884079e-02 -2.65723909e-03  1.03708716e-02
 -5.85826359e-03  1.04230261e-02 -1.95931668e-02  3.02941467e-04
  4.54334895e-02 -4.61346122e-03 -4.32958544e-02 -2.11811327e-03
 -3.24331746e-02 -9.64189479e-04  2.38423378e-02  1.46571666e-02
 -6.33010786e-02 -3.09536555e-02 -6.09642095e-03 -2.23327124e-02
  4.73811398e-02  8.10745709e-03 -1.46291624e-03 -3.52547565e-02
 -3.65486640e-02  3.89222511e-02 -5.64525903e-02  1.71065679e-02
 -1.88765233e-02  1.56273664e-02  6.68170176e-02  1.86759574e-02
 -4.15165877e-02  1.55675209e-02  4.57480490e-03 -1.56470432e-02
  2.86984361e-02 -3.88362938e-03 -1.85153493e-02 -9.91992429e-03
  1.05702633e-02 -5.34487311e-03  2.19452625e-02  2.31337120e-02
 -7.65234924e-03 -5.43440699e-02 -2.68535392e-02  1.13130045e-02
  2.85217945e-02  1.84220024e-02  1.95164920e-02  4.88774164e-03
  6.26222956e-03  1.74586067e-02  4.62809500e-03 -4.00069874e-02
  1.28236155e-03  1.63634494e-02 -7.79265175e-04  3.31124115e-02
 -6.2073271

## Fourier Transform

In [10]:
from librep.datasets.multimodal import TransformMultiModalDataset
from librep.transforms.fft import FFT

In [11]:
# Wrap the FFT so it can be applied to a whole dataset; new windows are named with the "fft." prefix
fft_transform = FFT(centered=True)
transformer = TransformMultiModalDataset(transforms=[fft_transform], new_window_name_prefix="fft.")

### Apply the FFT to KuHar

In [12]:
train_dataset_fft = transformer(train_dataset)
validation_dataset_fft = transformer(validation_dataset)
test_dataset_fft = transformer(test_dataset)
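
As a rough illustration of what the frequency-domain features look like, the NumPy sketch below splits a flat sample into its per-sensor windows, takes the FFT magnitude of each window and keeps the first half of the spectrum. Two things are assumptions rather than confirmed librep behaviour: that `FFT(centered=True)` keeps the magnitude of the first half of the spectrum, and that the transform is applied window by window. The function name `fft_magnitude_per_window` is hypothetical.

```python
import numpy as np


def fft_magnitude_per_window(sample, num_windows, centered=True):
    """Hypothetical sketch: FFT magnitude of each equally sized window of `sample`."""
    windows = np.split(np.asarray(sample, dtype=float), num_windows)
    spectra = []
    for window in windows:
        spectrum = np.abs(np.fft.fft(window))
        if centered:
            # Assumption: keep only the first half of the spectrum
            spectrum = spectrum[: len(window) // 2]
        spectra.append(spectrum)
    return np.concatenate(spectra)


# Synthetic sample: six identical 3-second windows at 30Hz holding a 3Hz sine
t = np.arange(90) / 30.0
window = np.sin(2 * np.pi * 3.0 * t)
sample = np.concatenate([window] * 6)

features = fft_magnitude_per_window(sample, num_windows=6)
peak_bin = int(np.argmax(features[:45]))

print(features.shape, peak_bin)
```

With 90 points sampled at 30Hz, each frequency bin is 1/3 Hz wide, so the 3Hz sine peaks at bin 9 of the first window.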

In [13]:
train_dataset[:][0]

array([[ 3.62008887e-03, -2.36884079e-02, -2.65723909e-03, ...,
         2.38507494e-03,  4.01598763e-03,  3.37713756e-03],
       [-5.82278332e-03,  1.24938221e-02, -1.25032413e-02, ...,
        -3.11192249e-03,  2.45944768e-03,  3.05583499e-03],
       [-3.92784081e-02,  3.86433489e-03,  8.92735084e-03, ...,
         4.96789877e-03,  3.30600047e-03,  3.44192435e-03],
       ...,
       [-3.16901498e-01, -5.51916541e-01, -1.59211721e+00, ...,
         2.68874703e-01,  1.47700475e-01,  1.98234010e-01],
       [-1.88898149e+00, -2.94286322e-01,  2.80438536e+00, ...,
         5.70111149e-01,  5.77753056e-01,  6.43595137e-01],
       [-4.83722709e+00, -9.36896540e+00, -1.10340631e+01, ...,
         4.34685719e-01, -2.58935273e-02,  4.36265954e-02]])

In [16]:
train_dataset_fft[:][0]

array([[3.13384994e-02, 1.68121633e-01, 9.05549650e-02, ...,
        5.92594004e-03, 1.10474894e-02, 7.03162882e-03],
       [2.30704315e-02, 1.23651598e-01, 6.28149230e-02, ...,
        8.26119084e-03, 3.20484735e-03, 5.24385686e-03],
       [7.81909007e-02, 7.24224293e-02, 1.34035957e-01, ...,
        1.17237633e-02, 8.03545237e-03, 2.37891069e-02],
       ...,
       [3.69891420e+00, 4.46688116e+01, 3.87658262e+01, ...,
        5.11105207e-02, 4.00290331e-01, 3.62113724e-01],
       [4.69054791e+00, 3.19289222e+01, 1.39745642e+01, ...,
        9.98049805e-01, 1.14453200e-01, 7.21716051e-02],
       [2.23335350e+01, 9.95078730e+00, 1.28835449e+01, ...,
        1.85874699e+00, 4.77287638e-01, 6.15996013e-01]])

## Train and evaluate Random Forest classifier

In [17]:
from librep.utils.workflow import SimpleTrainEvalWorkflow, MultiRunWorkflow
from librep.estimators import RandomForestClassifier
from librep.metrics.report import ClassificationReport
import yaml

reporter = ClassificationReport(use_accuracy=True, use_f1_score=True, use_classification_report=False, use_confusion_matrix=False, plot_confusion_matrix=False)
experiment = SimpleTrainEvalWorkflow(estimator=RandomForestClassifier, estimator_creation_kwags={"n_estimators": 100}, do_not_instantiate=False, do_fit=True, evaluator=reporter)
multi_run_experiment = MultiRunWorkflow(workflow=experiment, num_runs=3, debug=False)
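
The workflow objects above can be read as "create a fresh estimator, fit it, score it on the test set, and repeat". The scikit-learn sketch below mimics that loop on synthetic data and returns a dictionary shaped like the outputs printed in this notebook. It is not librep's implementation: the function names `train_eval_once` and `multi_run` are hypothetical, and the data comes from `make_classification` rather than KuHar.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split


def train_eval_once(estimator_cls, estimator_kwargs, X_train, y_train, X_test, y_test):
    """Hypothetical stand-in for a single train/evaluate run."""
    start = time.time()
    model = estimator_cls(**estimator_kwargs)  # fresh estimator for every run
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    end = time.time()
    scores = {"accuracy": float(accuracy_score(y_test, y_pred))}
    for average in ("macro", "micro", "weighted"):
        scores[f"f1 score ({average})"] = float(f1_score(y_test, y_pred, average=average))
    return {"start": start, "end": end, "time taken": end - start, "result": [scores]}


def multi_run(num_runs, *args):
    """Hypothetical stand-in for repeating the same experiment several times."""
    runs = []
    for run_id in range(1, num_runs + 1):
        run = train_eval_once(*args)
        run["run id"] = run_id
        runs.append(run)
    return {"runs": runs}


# Synthetic 3-class problem, used only to exercise the loop
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

demo_result = multi_run(
    3, RandomForestClassifier, {"n_estimators": 100}, X_train, y_train, X_test, y_test
)
for run in demo_result["runs"]:
    print(run["run id"], round(run["result"][0]["accuracy"], 3))
```

Because no random seed is passed to the Random Forest, each run can give a slightly different score, which is consistent with the run-to-run variation in the Random Forest results in this notebook.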

In [18]:
# Train on train + validation combined; evaluate on the test set
combined_train_dset = PandasMultiModalDataset(
    pd.concat([train, validation]),
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)


result = multi_run_experiment(combined_train_dset, test_dataset)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662143860.0500042
    result:
    -   accuracy: 0.6957671957671958
        f1 score (macro): 0.6827279495242926
        f1 score (micro): 0.6957671957671958
        f1 score (weighted): 0.7088064420100988
    run id: 1
    start: 1662143854.5040376
    time taken: 5.545966625213623
-   end: 1662143865.4628685
    result:
    -   accuracy: 0.7063492063492064
        f1 score (macro): 0.69706378505107
        f1 score (micro): 0.7063492063492064
        f1 score (weighted): 0.7156346276473424
    run id: 2
    start: 1662143860.050006
    time taken: 5.412862539291382
-   end: 1662143870.9336047
    result:
    -   accuracy: 0.6825396825396826
        f1 score (macro): 0.6730676135431849
        f1 score (micro): 0.6825396825396826
        f1 score (weighted): 0.6920117515361803
    run id: 3
    start: 1662143865.4628704
    time taken: 5.470734357833862



In [19]:
combined_train_dset_fft = transformer(combined_train_dset)

result = multi_run_experiment(combined_train_dset_fft, test_dataset_fft)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662143874.2717953
    result:
    -   accuracy: 0.8174603174603174
        f1 score (macro): 0.815147840828324
        f1 score (micro): 0.8174603174603176
        f1 score (weighted): 0.819772794092311
    run id: 1
    start: 1662143871.1246927
    time taken: 3.1471025943756104
-   end: 1662143877.4467125
    result:
    -   accuracy: 0.8201058201058201
        f1 score (macro): 0.815379357566093
        f1 score (micro): 0.8201058201058201
        f1 score (weighted): 0.8248322826455473
    run id: 2
    start: 1662143874.2717967
    time taken: 3.1749157905578613
-   end: 1662143880.600804
    result:
    -   accuracy: 0.828042328042328
        f1 score (macro): 0.8258187826871999
        f1 score (micro): 0.8280423280423279
        f1 score (weighted): 0.8302658733974558
    run id: 3
    start: 1662143877.4467144
    time taken: 3.1540896892547607



## Train and evaluate Support Vector Machine classifier

In [20]:
#from librep.estimators import SVC
from sklearn.svm import SVC

experiment = SimpleTrainEvalWorkflow(estimator=SVC, estimator_creation_kwags={"C": 3.0, "kernel": "rbf"}, do_not_instantiate=False, do_fit=True, evaluator=reporter)
multi_run_experiment = MultiRunWorkflow(workflow=experiment, num_runs=3, debug=False)

result = multi_run_experiment(combined_train_dset, test_dataset)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662143882.5887048
    result:
    -   accuracy: 0.4523809523809524
        f1 score (macro): 0.42561036660072527
        f1 score (micro): 0.4523809523809524
        f1 score (weighted): 0.47915153816117956
    run id: 1
    start: 1662143880.6069896
    time taken: 1.981715202331543
-   end: 1662143884.5418885
    result:
    -   accuracy: 0.4523809523809524
        f1 score (macro): 0.42561036660072527
        f1 score (micro): 0.4523809523809524
        f1 score (weighted): 0.47915153816117956
    run id: 2
    start: 1662143882.5887065
    time taken: 1.9531819820404053
-   end: 1662143886.4903188
    result:
    -   accuracy: 0.4523809523809524
        f1 score (macro): 0.42561036660072527
        f1 score (micro): 0.4523809523809524
        f1 score (weighted): 0.47915153816117956
    run id: 3
    start: 1662143884.5418909
    time taken: 1.9484279155731201



In [21]:
result = multi_run_experiment(combined_train_dset_fft, test_dataset_fft)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662143887.08961
    result:
    -   accuracy: 0.7936507936507936
        f1 score (macro): 0.7753814134375384
        f1 score (micro): 0.7936507936507936
        f1 score (weighted): 0.8119201738640489
    run id: 1
    start: 1662143886.4954395
    time taken: 0.5941705703735352
-   end: 1662143887.6801007
    result:
    -   accuracy: 0.7936507936507936
        f1 score (macro): 0.7753814134375384
        f1 score (micro): 0.7936507936507936
        f1 score (weighted): 0.8119201738640489
    run id: 2
    start: 1662143887.0896118
    time taken: 0.5904889106750488
-   end: 1662143888.2695642
    result:
    -   accuracy: 0.7936507936507936
        f1 score (macro): 0.7753814134375384
        f1 score (micro): 0.7936507936507936
        f1 score (weighted): 0.8119201738640489
    run id: 3
    start: 1662143887.6801023
    time taken: 0.5894618034362793



## Train and evaluate K-Nearest Neighbors classifier

In [22]:
#from librep.estimators import KNeighborsClassifier
from sklearn.neighbors import KNeighborsClassifier

experiment = SimpleTrainEvalWorkflow(estimator=KNeighborsClassifier, estimator_creation_kwags={"n_neighbors": 1}, do_not_instantiate=False, do_fit=True, evaluator=reporter)
multi_run_experiment = MultiRunWorkflow(workflow=experiment, num_runs=3, debug=False)

result = multi_run_experiment(combined_train_dset, test_dataset)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662143888.6856935
    result:
    -   accuracy: 0.3994708994708995
        f1 score (macro): 0.4002607778347177
        f1 score (micro): 0.3994708994708994
        f1 score (weighted): 0.39868102110708126
    run id: 1
    start: 1662143888.2751548
    time taken: 0.4105386734008789
-   end: 1662143888.7158413
    result:
    -   accuracy: 0.3994708994708995
        f1 score (macro): 0.4002607778347177
        f1 score (micro): 0.3994708994708994
        f1 score (weighted): 0.39868102110708126
    run id: 2
    start: 1662143888.6856966
    time taken: 0.030144691467285156
-   end: 1662143888.7453344
    result:
    -   accuracy: 0.3994708994708995
        f1 score (macro): 0.4002607778347177
        f1 score (micro): 0.3994708994708994
        f1 score (weighted): 0.39868102110708126
    run id: 3
    start: 1662143888.7158434
    time taken: 0.029490947723388672



In [23]:
result = multi_run_experiment(combined_train_dset_fft, test_dataset_fft)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662143888.7702076
    result:
    -   accuracy: 0.8253968253968254
        f1 score (macro): 0.8239473002863725
        f1 score (micro): 0.8253968253968254
        f1 score (weighted): 0.8268463505072783
    run id: 1
    start: 1662143888.7514968
    time taken: 0.018710851669311523
-   end: 1662143888.784826
    result:
    -   accuracy: 0.8253968253968254
        f1 score (macro): 0.8239473002863725
        f1 score (micro): 0.8253968253968254
        f1 score (weighted): 0.8268463505072783
    run id: 2
    start: 1662143888.7702096
    time taken: 0.01461648941040039
-   end: 1662143888.798944
    result:
    -   accuracy: 0.8253968253968254
        f1 score (macro): 0.8239473002863725
        f1 score (micro): 0.8253968253968254
        f1 score (weighted): 0.8268463505072783
    run id: 3
    start: 1662143888.784828
    time taken: 0.014116048812866211
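
To compare the six settings side by side, the per-run accuracies can be averaged. The sketch below copies the accuracy values from the outputs above into a plain dictionary (the dictionary layout and variable names are chosen for this illustration) and prints the mean per model and domain.

```python
from statistics import mean

# Accuracies copied from the run outputs above (three runs per setting)
accuracies = {
    ("Random Forest", "time"): [0.6957671957671958, 0.7063492063492064, 0.6825396825396826],
    ("Random Forest", "frequency"): [0.8174603174603174, 0.8201058201058201, 0.828042328042328],
    ("SVM", "time"): [0.4523809523809524] * 3,
    ("SVM", "frequency"): [0.7936507936507936] * 3,
    ("KNN", "time"): [0.3994708994708995] * 3,
    ("KNN", "frequency"): [0.8253968253968254] * 3,
}

summary = {key: mean(values) for key, values in accuracies.items()}

for (model, domain), accuracy in summary.items():
    print(f"{model:<14} {domain:<10} mean accuracy = {accuracy:.4f}")
```

For all three models, the mean accuracy in the frequency domain is higher than in the time domain on this view.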

