# MotionSense Experiments

This notebook will perform basic experiments on the balanced MotionSense dataset with the following steps:
1. Quickly load the train, test and validation CSV subsets of the balanced MotionSense dataset using the `PandasDatasetsIO` helper
2. Wrap the dataframes in librep's `Dataset` interface using `PandasMultiModalDataset`
3. Apply the Fourier transform to the datasets
4. Train and evaluate SVM, KNN and Random Forest classification models in both time and frequency domains

The experiments compare the SVM, KNN and RF models on the balanced MotionSense test set in both domains, reporting accuracy and macro, micro and weighted F1 score for three runs of each configuration.

## Common imports and definitions

In [1]:
from pathlib import Path  # For defining dataset paths
import sys                # To make the librep package importable

# This is needed if librep is not installed via pip,
# as this directory (examples) is separate from the librep package root
sys.path.append("..")

# Third party imports
import pandas as pd
import numpy as np

# Librep imports
from librep.utils.dataset import PandasDatasetsIO          # To quickly load the train, test and validation CSVs
from librep.datasets.multimodal import PandasMultiModalDataset # Wraps dataframes in librep's `Dataset` interface



## Loading data
Change the path to use another dataset.

In [2]:
# Path to the MotionSense balanced view (same activities and label numbers)
# The directory is assumed to contain train.csv, test.csv and validation.csv
dataset_path = Path("../data/views/MotionSense/balanced_view")

Once the path is defined, we can load the CSVs as pandas dataframes.

In [3]:
# MotionSense dataframes
train, validation, test = PandasDatasetsIO(dataset_path).load()
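If librep is not at hand, the loading step can be approximated with plain pandas. The sketch below is a hypothetical stand-in (`load_splits` is our own name, not part of librep), assuming the directory holds `train.csv`, `validation.csv` and `test.csv`; it returns the splits in the same order the cell above unpacks them and is not necessarily how `PandasDatasetsIO` works internally.

```python
from pathlib import Path
import pandas as pd

def load_splits(directory):
    """Hypothetical stand-in: read train/validation/test CSVs from one directory."""
    directory = Path(directory)
    # One DataFrame per split, in the order (train, validation, test)
    return tuple(
        pd.read_csv(directory / f"{split}.csv")
        for split in ("train", "validation", "test")
    )
```

Usage would be `train, validation, test = load_splits(dataset_path)`. The `Unnamed: 0` and `Unnamed: 0.1` columns shown by `train.head()` are most likely row indices saved by earlier `to_csv` calls: a CSV written with its index and read back without `index_col` gets an `Unnamed: 0` column.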

Let's take a look at the train dataframe

In [4]:
train.head()

| Unnamed: 0.1 | Unnamed: 0 | attitude.roll-0 | attitude.roll-1 | attitude.roll-2 | attitude.roll-3 | attitude.roll-4 | attitude.roll-5 | attitude.roll-6 | attitude.roll-7 | attitude.roll-8 | ... | userAcceleration.z-145 | userAcceleration.z-146 | userAcceleration.z-147 | userAcceleration.z-148 | userAcceleration.z-149 | activity code | length | trial_code | index | user |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1.962846 | 1.921332 | 1.877961 | 1.828619 | 1.773968 | 1.719602 | 1.66524 | 1.616507 | 1.579558 | ... | -0.07677 | -0.410893 | -0.349788 | 0.020158 | 0.236074 | 0 | 150 | 1 | 150 | 11 |
| 1 | 1 | -0.458128 | -0.503994 | -0.52522 | -0.556961 | -0.619681 | -0.728183 | -0.84422 | -0.937235 | -1.018289 | ... | 0.080776 | 0.209356 | 0.045844 | -0.171495 | -0.279159 | 0 | 150 | 1 | 900 | 12 |
| 2 | 2 | 0.854208 | 0.887741 | 0.94539 | 1.018196 | 1.072981 | 1.099024 | 1.117173 | 1.14571 | 1.176665 | ... | 0.143923 | 0.035212 | -0.023136 | -0.038015 | 0.040352 | 0 | 150 | 1 | 1050 | 21 |
| 3 | 3 | 1.030491 | 1.065353 | 1.093455 | 1.097724 | 1.071357 | 1.038327 | 1.004383 | 0.972705 | 0.970985 | ... | -0.183913 | -0.296363 | -0.286543 | -0.514901 | -0.449945 | 0 | 150 | 2 | 150 | 17 |
| 4 | 4 | -2.674791 | -2.548334 | -2.361486 | -2.109103 | -1.826062 | -1.564544 | -1.341344 | -1.179871 | -1.083587 | ... | 0.060069 | -0.048874 | -0.267644 | -0.392915 | -0.291261 | 0 | 150 | 11 | 450 | 21 |


## Creating a Librep dataset from pandas dataframes

Change the feature prefixes to use another dataset.

In [5]:
# MotionSense features to select
features = [
    "userAcceleration.x",
    "userAcceleration.y",
    "userAcceleration.z",
    "rotationRate.x",
    "rotationRate.y",
    "rotationRate.z"
]

# Creating the datasets

# Train
train_dataset = PandasMultiModalDataset(
    train,
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)

# Validation
validation_dataset = PandasMultiModalDataset(
    validation,
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)

# Test
test_dataset = PandasMultiModalDataset(
    test,
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)
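`PandasMultiModalDataset` is given column *prefixes* rather than column names. A minimal sketch of what prefix-based selection could look like, assuming each prefix matches the 150 columns `<prefix>-0` to `<prefix>-149` and that the matched blocks are concatenated in the order the prefixes are listed (`select_by_prefixes` is a hypothetical helper, not librep's API):

```python
import pandas as pd

def select_by_prefixes(df: pd.DataFrame, prefixes, label_column):
    """Hypothetical helper: flatten prefix-matched columns into one vector per row."""
    # Blocks follow the order of `prefixes`; inside a block, the CSV column order is kept
    columns = [col for prefix in prefixes for col in df.columns if col.startswith(prefix)]
    X = df[columns].to_numpy(dtype=float)
    y = df[label_column].to_numpy()
    return X, y
```

With the six prefixes above, every row becomes a 6 × 150 = 900-element vector, while columns such as `attitude.roll-*` or `user` are left out. Plain `startswith` matching would double-count columns if one prefix were a prefix of another, which does not happen with these six names.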

## Inspect sample

In [6]:
# Let's print the first sample of the train dataset.
# It is a tuple: a vector of 900 elements (6 features x 150 time steps) first, then the label
x = train_dataset[0]
print(x)

(array([-2.547700e-01, -1.947490e-01, -1.761130e-01, -1.938550e-01,
       -2.660080e-01, -2.217610e-01, -2.494890e-01, -2.760730e-01,
       -2.373710e-01, -2.373490e-01, -2.313310e-01, -1.722840e-01,
       -1.461070e-01, -2.276050e-01, -3.829260e-01, -3.187740e-01,
       -1.111440e-01,  3.439900e-02,  1.214240e-01, -1.570880e-01,
       -4.630930e-01, -3.902580e-01, -1.376360e-01,  1.734600e-02,
        9.102500e-02,  2.303000e-01,  6.665400e-02, -8.479700e-02,
       -8.206900e-02, -1.534800e-02,  2.010200e-02,  2.014600e-02,
       -8.539600e-02, -1.337190e-01, -9.849300e-02, -3.688900e-01,
       -4.145870e-01, -4.104480e-01,  3.891700e-02,  2.787010e-01,
       -1.036980e-01,  6.972800e-02,  1.535100e-01,  1.390830e-01,
        1.127850e-01, -1.264410e-01, -2.113060e-01, -2.880330e-01,
       -3.026940e-01, -2.229810e-01, -1.907640e-01, -2.740590e-01,
       -3.405890e-01, -3.028110e-01, -2.228760e-01, -2.063950e-01,
       -1.629620e-01, -1.781440e-01, -2.476790e-01, -3.032110

In [7]:
# Inspecting sample
print(f"The sample 0: {x[0]}")
print(f"Shape of sample 0: {x[0].shape}")
print(f"The label of sample 0: {x[1]}")

The sample 0: [-2.547700e-01 -1.947490e-01 -1.761130e-01 -1.938550e-01 -2.660080e-01
 -2.217610e-01 -2.494890e-01 -2.760730e-01 -2.373710e-01 -2.373490e-01
 -2.313310e-01 -1.722840e-01 -1.461070e-01 -2.276050e-01 -3.829260e-01
 -3.187740e-01 -1.111440e-01  3.439900e-02  1.214240e-01 -1.570880e-01
 -4.630930e-01 -3.902580e-01 -1.376360e-01  1.734600e-02  9.102500e-02
  2.303000e-01  6.665400e-02 -8.479700e-02 -8.206900e-02 -1.534800e-02
  2.010200e-02  2.014600e-02 -8.539600e-02 -1.337190e-01 -9.849300e-02
 -3.688900e-01 -4.145870e-01 -4.104480e-01  3.891700e-02  2.787010e-01
 -1.036980e-01  6.972800e-02  1.535100e-01  1.390830e-01  1.127850e-01
 -1.264410e-01 -2.113060e-01 -2.880330e-01 -3.026940e-01 -2.229810e-01
 -1.907640e-01 -2.740590e-01 -3.405890e-01 -3.028110e-01 -2.228760e-01
 -2.063950e-01 -1.629620e-01 -1.781440e-01 -2.476790e-01 -3.032110e-01
 -4.050310e-01 -6.022220e-01 -5.712880e-01 -3.343420e-01  4.065160e-01
  5.664790e-01  3.743940e-01  3.247700e-02  7.595600e-02  9.021
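Because each sample is one flat vector, it can help to view it per sensor channel. Assuming the channels are concatenated one after another in the order of `features` (an assumption about the layout, not something the output above confirms), a flat sample can be split back like this (`to_channels` is a hypothetical helper):

```python
import numpy as np

def to_channels(flat_sample, n_channels=6):
    """Split one flat sample into an (n_channels, window_length) array."""
    flat_sample = np.asarray(flat_sample)
    if flat_sample.size % n_channels:
        raise ValueError("sample length is not a multiple of n_channels")
    # Assumes channel-major order: all of channel 0, then all of channel 1, ...
    return flat_sample.reshape(n_channels, -1)
```

Under that assumption, `to_channels(x[0])[0]` would be the 150 `userAcceleration.x` values of sample 0.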

## Fourier Transform

In [8]:
from librep.datasets.multimodal import TransformMultiModalDataset
from librep.transforms.fft import FFT

In [9]:
fft_transform = FFT(centered=True)
transformer = TransformMultiModalDataset(transforms=[fft_transform], new_window_name_prefix="fft.")
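The exact computation behind librep's `FFT(centered=True)` is not shown in this notebook. As a rough illustration of a frequency-domain representation, the sketch below computes the one-sided magnitude spectrum of each channel of a flat sample with NumPy. It assumes 6 channels of equal length, and its output length (6 × 76 = 456 for 150-sample windows) is not claimed to match librep's.

```python
import numpy as np

def fft_magnitude(flat_sample, n_channels=6):
    """Per-channel one-sided FFT magnitude of a flat multichannel window."""
    windows = np.asarray(flat_sample, dtype=float).reshape(n_channels, -1)
    # rfft keeps only the non-negative frequency bins: n // 2 + 1 of them per channel
    spectra = np.abs(np.fft.rfft(windows, axis=-1))
    return spectra.reshape(-1)
```

Taking magnitudes discards phase, so a circular shift of a window in time changes the time-domain vector but leaves this spectrum unchanged.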

### Apply the FFT to the MotionSense datasets

In [10]:
train_dataset_fft = transformer(train_dataset)
validation_dataset_fft = transformer(validation_dataset)
test_dataset_fft = transformer(test_dataset)

In [11]:
train_dataset[:][0]

array([[-0.25477 , -0.194749, -0.176113, ...,  1.039189,  0.477649,
         0.014129],
       [ 0.06514 ,  0.024016,  0.417193, ...,  0.933631,  0.902924,
         0.53028 ],
       [-0.136331, -0.006953,  0.023562, ..., -0.116957,  0.074175,
         0.160067],
       ...,
       [-0.324891,  0.195969,  0.245713, ...,  0.240055,  0.17508 ,
        -1.293666],
       [ 0.515124,  1.270492,  2.024342, ...,  2.610587,  0.046535,
        -1.336136],
       [-1.010466,  0.04417 ,  0.265844, ..., -4.286964, -3.586218,
        -1.559694]])

In [12]:
train_dataset_fft[:][0]

array([[13.087418  ,  3.28751578,  1.55737448, ...,  1.52864569,
         0.4901568 ,  0.2201563 ],
       [ 1.048889  ,  3.69454719,  1.88881149, ...,  3.09439567,
         3.50719884,  1.94845805],
       [ 5.321311  ,  2.5960347 ,  1.00393161, ...,  0.10128641,
         0.38533303,  0.35604925],
       ...,
       [ 3.55285   ,  1.62902057,  3.14263605, ...,  2.37068496,
         3.0819737 ,  2.4680785 ],
       [12.190217  ,  8.4662952 ,  8.59820939, ...,  3.62581486,
         1.30981604,  4.15834285],
       [20.498708  ,  3.84238355,  7.756693  , ...,  1.96280194,
         6.38683171,  2.45728023]])

## Train and evaluate Random Forest classifier

In [13]:
from librep.utils.workflow import SimpleTrainEvalWorkflow, MultiRunWorkflow
from librep.estimators import RandomForestClassifier
from librep.metrics.report import ClassificationReport
import yaml

reporter = ClassificationReport(use_accuracy=True, use_f1_score=True,
                                use_classification_report=False,
                                use_confusion_matrix=False, plot_confusion_matrix=False)
experiment = SimpleTrainEvalWorkflow(estimator=RandomForestClassifier,
                                     estimator_creation_kwags={"n_estimators": 100},
                                     do_not_instantiate=False, do_fit=True, evaluator=reporter)
multi_run_experiment = MultiRunWorkflow(workflow=experiment, num_runs=3, debug=False)
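Judging by the printed results, `MultiRunWorkflow` repeats the wrapped train/evaluate workflow `num_runs` times and records a run id, start and end timestamps, the time taken and the evaluator's result for each run. The sketch below is a hypothetical re-implementation of that pattern in plain Python, with names of our own choosing (`multi_run`, `mean_metric`), not librep's code; `train_and_eval` stands for any function that fits a fresh model and returns a metrics dictionary.

```python
import time

def multi_run(train_and_eval, num_runs=3):
    """Hypothetical sketch: repeat a train/evaluate function and log each run."""
    runs = []
    for run_id in range(1, num_runs + 1):
        start = time.time()
        result = train_and_eval()  # e.g. fit a new estimator and score it on the test set
        end = time.time()
        runs.append({
            "run id": run_id,
            "start": start,
            "end": end,
            "time taken": end - start,
            "result": [result],  # a one-element list, mirroring the printed YAML
        })
    return {"runs": runs}

def mean_metric(result, name):
    """Average one metric (e.g. "accuracy") over all recorded runs."""
    values = [run["result"][0][name] for run in result["runs"]]
    return sum(values) / len(values)
```

Assuming librep's result dictionary has the structure printed below, `mean_metric(result, "accuracy")` would summarize the three Random Forest accuracies in one number.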

In [14]:
combined_train_dset = PandasMultiModalDataset(
    pd.concat([train, validation], ignore_index=True),
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)


result = multi_run_experiment(combined_train_dset, test_dataset)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1663014290.6413767
    result:
    -   accuracy: 0.8137254901960784
        f1 score (macro): 0.8129686006257989
        f1 score (micro): 0.8137254901960784
        f1 score (weighted): 0.8144823797663582
    run id: 1
    start: 1663014284.0247872
    time taken: 6.616589546203613
-   end: 1663014297.2178366
    result:
    -   accuracy: 0.8009803921568628
        f1 score (macro): 0.7990411019470333
        f1 score (micro): 0.8009803921568628
        f1 score (weighted): 0.8029196823666923
    run id: 2
    start: 1663014290.6413789
    time taken: 6.576457738876343
-   end: 1663014303.8647568
    result:
    -   accuracy: 0.807843137254902
        f1 score (macro): 0.8082327009760811
        f1 score (micro): 0.807843137254902
        f1 score (weighted): 0.8074535735337227
    run id: 3
    start: 1663014297.2178385
    time taken: 6.646918296813965
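The cell above trains on the union of the train and validation splits, which seems reasonable here because no hyperparameters appear to be tuned on the validation split. `ignore_index=True` gives the concatenated dataframe a fresh index; without it, row labels from the two splits would repeat. A small illustration with toy data:

```python
import pandas as pd

train = pd.DataFrame({"x": [1.0, 2.0], "activity code": [0, 1]})
validation = pd.DataFrame({"x": [3.0], "activity code": [2]})

# Without ignore_index, the original row labels (0, 1 and 0) are kept, so label 0 repeats
stacked = pd.concat([train, validation])
# With ignore_index=True the result gets a fresh 0..n-1 RangeIndex
combined = pd.concat([train, validation], ignore_index=True)
```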



In [15]:
combined_train_dset_fft = transformer(combined_train_dset)

result = multi_run_experiment(combined_train_dset_fft, test_dataset_fft)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1663014308.6999614
    result:
    -   accuracy: 0.8127450980392157
        f1 score (macro): 0.8162926817627025
        f1 score (micro): 0.8127450980392157
        f1 score (weighted): 0.809197514315729
    run id: 1
    start: 1663014304.4078693
    time taken: 4.2920920848846436
-   end: 1663014313.0252461
    result:
    -   accuracy: 0.7970588235294118
        f1 score (macro): 0.8012467519814369
        f1 score (micro): 0.7970588235294119
        f1 score (weighted): 0.7928708950773865
    run id: 2
    start: 1663014308.699963
    time taken: 4.325283050537109
-   end: 1663014317.281268
    result:
    -   accuracy: 0.7980392156862746
        f1 score (macro): 0.8029778144271248
        f1 score (micro): 0.7980392156862746
        f1 score (weighted): 0.7931006169454242
    run id: 3
    start: 1663014313.0252483
    time taken: 4.256019592285156
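In every run above, `f1 score (micro)` equals `accuracy` up to floating-point rounding (for example 0.7970588235294118 vs 0.7970588235294119 in run 2). That is expected for single-label multiclass classification: micro averaging pools true positives, false positives and false negatives over all classes, and every misclassified sample adds exactly one false positive and one false negative, so micro F1 reduces to the fraction of correct predictions. A NumPy sketch of the three averages (`f1_scores` is our own helper following the usual definitions, not librep's `ClassificationReport`):

```python
import numpy as np

def f1_scores(y_true, y_pred):
    """Macro, micro and weighted F1 for single-label multiclass predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    labels = np.unique(np.concatenate([y_true, y_pred]))
    per_class, support = [], []
    tp_sum = fp_sum = fn_sum = 0
    for c in labels:
        tp = int(np.sum((y_pred == c) & (y_true == c)))
        fp = int(np.sum((y_pred == c) & (y_true != c)))
        fn = int(np.sum((y_pred != c) & (y_true == c)))
        # Per-class F1 = 2TP / (2TP + FP + FN); denominator > 0 since c occurs somewhere
        per_class.append(2 * tp / (2 * tp + fp + fn))
        support.append(int(np.sum(y_true == c)))
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    return {
        "macro": float(np.mean(per_class)),                        # unweighted mean over classes
        "micro": 2 * tp_sum / (2 * tp_sum + fp_sum + fn_sum),       # pooled counts
        "weighted": float(np.average(per_class, weights=support)),  # weighted by true support
    }
```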



## Train and evaluate Support Vector Machine classifier

In [16]:
#from librep.estimators import SVC
from sklearn.svm import SVC

experiment = SimpleTrainEvalWorkflow(estimator=SVC, estimator_creation_kwags={"C": 3.0, "kernel": "rbf"},
                                     do_not_instantiate=False, do_fit=True, evaluator=reporter)
multi_run_experiment = MultiRunWorkflow(workflow=experiment, num_runs=3, debug=False)

result = multi_run_experiment(combined_train_dset, test_dataset)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1663014322.409154
    result:
    -   accuracy: 0.6009803921568627
        f1 score (macro): 0.5769043035929738
        f1 score (micro): 0.6009803921568627
        f1 score (weighted): 0.6250564807207517
    run id: 1
    start: 1663014317.2869122
    time taken: 5.122241735458374
-   end: 1663014327.80782
    result:
    -   accuracy: 0.6009803921568627
        f1 score (macro): 0.5769043035929738
        f1 score (micro): 0.6009803921568627
        f1 score (weighted): 0.6250564807207517
    run id: 2
    start: 1663014322.4091563
    time taken: 5.398663759231567
-   end: 1663014332.95849
    result:
    -   accuracy: 0.6009803921568627
        f1 score (macro): 0.5769043035929738
        f1 score (micro): 0.6009803921568627
        f1 score (weighted): 0.6250564807207517
    run id: 3
    start: 1663014327.807822
    time taken: 5.150667905807495
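The three SVM runs are identical, unlike the Random Forest runs, because fitting an `SVC` involves no random sampling. Since the cell sets only `C` and `kernel`, scikit-learn's default `gamma="scale"` applies, that is gamma = 1 / (n_features · Var(X)), where Var(X) is the variance of all training values. A NumPy sketch of the resulting RBF kernel (`rbf_kernel_scale` is a hypothetical helper for illustration, not a scikit-learn function):

```python
import numpy as np

def rbf_kernel_scale(X, Y=None):
    """RBF kernel exp(-gamma * ||x - y||^2) with gamma set the way gamma="scale" defines it."""
    X = np.asarray(X, dtype=float)
    Y = X if Y is None else np.asarray(Y, dtype=float)
    # gamma="scale": 1 / (n_features * variance of all entries of the training matrix X)
    gamma = 1.0 / (X.shape[1] * X.var())
    # Squared Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y, clipped at 0 for rounding
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0)), gamma
```

Because gamma is computed from the training data, the effective kernel width adapts to whichever representation (time domain or FFT) the SVM is trained on.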



In [17]:
result = multi_run_experiment(combined_train_dset_fft, test_dataset_fft)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1663014333.8880372
    result:
    -   accuracy: 0.8382352941176471
        f1 score (macro): 0.841576408889408
        f1 score (micro): 0.8382352941176471
        f1 score (weighted): 0.8348941793458862
    run id: 1
    start: 1663014332.9636707
    time taken: 0.9243664741516113
-   end: 1663014334.8069131
    result:
    -   accuracy: 0.8382352941176471
        f1 score (macro): 0.841576408889408
        f1 score (micro): 0.8382352941176471
        f1 score (weighted): 0.8348941793458862
    run id: 2
    start: 1663014333.8880389
    time taken: 0.9188742637634277
-   end: 1663014335.7310727
    result:
    -   accuracy: 0.8382352941176471
        f1 score (macro): 0.841576408889408
        f1 score (micro): 0.8382352941176471
        f1 score (weighted): 0.8348941793458862
    run id: 3
    start: 1663014334.8069148
    time taken: 0.9241578578948975



## Train and evaluate K-Nearest Neighbors classifier

In [18]:
#from librep.estimators import KNeighborsClassifier
from sklearn.neighbors import KNeighborsClassifier

experiment = SimpleTrainEvalWorkflow(estimator=KNeighborsClassifier, estimator_creation_kwags={"n_neighbors": 1},
                                     do_not_instantiate=False, do_fit=True, evaluator=reporter)
multi_run_experiment = MultiRunWorkflow(workflow=experiment, num_runs=3, debug=False)

result = multi_run_experiment(combined_train_dset, test_dataset)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1663014336.660901
    result:
    -   accuracy: 0.5480392156862746
        f1 score (macro): 0.5371743230532967
        f1 score (micro): 0.5480392156862746
        f1 score (weighted): 0.5589041083192523
    run id: 1
    start: 1663014335.7364957
    time taken: 0.9244053363800049
-   end: 1663014336.7394373
    result:
    -   accuracy: 0.5480392156862746
        f1 score (macro): 0.5371743230532967
        f1 score (micro): 0.5480392156862746
        f1 score (weighted): 0.5589041083192523
    run id: 2
    start: 1663014336.6609044
    time taken: 0.07853293418884277
-   end: 1663014336.8154297
    result:
    -   accuracy: 0.5480392156862746
        f1 score (macro): 0.5371743230532967
        f1 score (micro): 0.5480392156862746
        f1 score (weighted): 0.5589041083192523
    run id: 3
    start: 1663014336.739439
    time taken: 0.07599067687988281
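The KNN runs are also identical across repetitions: with `n_neighbors=1`, each test sample simply takes the label of its closest training sample, and nothing is random. scikit-learn's default metric for `KNeighborsClassifier` is Minkowski with `p=2`, i.e. Euclidean distance. A minimal NumPy sketch of 1-NN under that metric (`one_nn_predict` is a hypothetical helper; ties go to the first closest training sample, which may differ from scikit-learn's tie-breaking):

```python
import numpy as np

def one_nn_predict(X_train, y_train, X_test):
    """1-nearest-neighbour classification with Euclidean distance."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    # Squared distances, shape (n_test, n_train); the square root does not change the argmin
    d2 = (
        np.sum(X_test**2, axis=1)[:, None]
        + np.sum(X_train**2, axis=1)[None, :]
        - 2.0 * X_test @ X_train.T
    )
    # Each test sample takes the label of its closest training sample (first one on ties)
    return np.asarray(y_train)[np.argmin(d2, axis=1)]
```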



In [19]:
result = multi_run_experiment(combined_train_dset_fft, test_dataset_fft)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1663014336.8645668
    result:
    -   accuracy: 0.7225490196078431
        f1 score (macro): 0.724365497314137
        f1 score (micro): 0.722549019607843
        f1 score (weighted): 0.7207325419015493
    run id: 1
    start: 1663014336.821772
    time taken: 0.04279470443725586
-   end: 1663014336.905426
    result:
    -   accuracy: 0.7225490196078431
        f1 score (macro): 0.724365497314137
        f1 score (micro): 0.722549019607843
        f1 score (weighted): 0.7207325419015493
    run id: 2
    start: 1663014336.8645685
    time taken: 0.040857553482055664
-   end: 1663014336.9461477
    result:
    -   accuracy: 0.7225490196078431
        f1 score (macro): 0.724365497314137
        f1 score (micro): 0.722549019607843
        f1 score (weighted): 0.7207325419015493
    run id: 3
    start: 1663014336.9054275
    time taken: 0.040720224380493164

