# Experiments on MotionSense resampled to 30Hz

This notebook runs basic experiments on the balanced MotionSense dataset in the following steps:
1. Quickly load the train, validation and test CSV subsets of the balanced MotionSense dataset using the `PandasDatasetsIO` helper
2. Wrap the dataframes in librep's `Dataset` interface using `PandasMultiModalDataset`
3. Apply the Fourier transform to the dataset
4. Train and evaluate SVM, KNN and Random Forest classification models in both the time and frequency domains

The goal is to compare how the SVM, KNN and Random Forest (RF) models perform on this dataset when trained on time-domain versus frequency-domain features.

## Common imports and definitions

In [1]:
from pathlib import Path  # For defining dataset paths
import sys                # For adding the librep package to the import path

# This must be done if librep is not installed via pip,
# as this directory (examples) is apart from the librep package root
sys.path.append("..")

# Third party imports
import pandas as pd
import numpy as np

# Librep imports
from librep.utils.dataset import PandasDatasetsIO          # For quickly loading the train, validation and test CSVs
from librep.datasets.multimodal import PandasMultiModalDataset # Wraps dataframes in librep's `Dataset` interface



## Loading data
Change the path to use other datasets.

In [2]:
# Path to the MotionSense balanced view resampled to 30Hz, with the same activities (and label numbers)
# The directory is assumed to contain train.csv, validation.csv and test.csv
dataset_path = Path("../data/views/MotionSense/resampled_view_30Hz")

Once the path is defined, we can load the CSVs as pandas dataframes.

In [3]:
# MotionSense dataframes
train, validation, test = PandasDatasetsIO(dataset_path).load()

Let's take a look at the train dataframe.

In [4]:
train.head()

Unnamed: 0.1,Unnamed: 0,userAcceleration.x-0,userAcceleration.x-1,userAcceleration.x-2,userAcceleration.x-3,userAcceleration.x-4,userAcceleration.x-5,userAcceleration.x-6,userAcceleration.x-7,userAcceleration.x-8,...,rotationRate.z-85,rotationRate.z-86,rotationRate.z-87,rotationRate.z-88,rotationRate.z-89,activity code,length,trial_code,index,user
0,0,-0.186833,-0.179195,-0.226435,-0.234763,-0.267824,-0.234534,-0.235421,-0.133759,-0.297125,...,0.153844,0.456858,0.898804,1.139253,0.275556,0,150,1,150,11
1,1,-0.054442,0.260099,0.022933,0.019339,0.148599,-0.036896,-0.125777,-0.110877,0.01626,...,0.848919,0.559802,0.253026,0.858864,0.799075,0,150,1,900,12
2,2,-0.007696,-0.009515,0.051284,-0.082342,0.046316,0.062557,-0.032338,-0.108787,-0.09048,...,0.789453,0.49527,-0.042529,-0.16111,0.129157,0,150,1,1050,21
3,3,-0.435023,-0.557701,-0.284523,0.142448,0.545683,0.363495,0.006622,-0.042397,-0.412486,...,-1.073667,-0.531939,-0.302297,0.086762,0.60079,0,150,2,150,17
4,4,0.098066,0.398057,0.321284,-0.024039,0.372623,0.302234,0.199685,0.285311,0.319281,...,0.884322,1.476072,1.630557,0.927239,0.672827,0,150,11,450,21


## Creating a Librep dataset from pandas dataframes

Change the feature list to use other datasets.

In [5]:
# MotionSense features to select
features = [
    "userAcceleration.x",
    "userAcceleration.y",
    "userAcceleration.z",
    "rotationRate.x",
    "rotationRate.y",
    "rotationRate.z"
]

# Creating the datasets

# Train
train_dataset = PandasMultiModalDataset(
    train,
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)

# Validation
validation_dataset = PandasMultiModalDataset(
    validation,
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)

# Test
test_dataset = PandasMultiModalDataset(
    test,
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)
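
To make the role of `feature_prefixes` more concrete, the sketch below shows one way columns such as `userAcceleration.x-0` ... `rotationRate.z-89` could be grouped by prefix and ordered into a single flat feature vector per row. It only illustrates the idea and is not librep's implementation; the helper name `select_by_prefix` and the toy column list are hypothetical.

```python
def select_by_prefix(columns, prefixes):
    """Return the column names matching each prefix, keeping the prefix order."""
    selected = []
    for prefix in prefixes:
        # Match "<prefix>-<index>" so that one prefix cannot swallow a longer one
        matches = [c for c in columns if c.startswith(prefix + "-")]
        # Sort on the numeric suffix: "x-2" must come before "x-10"
        matches.sort(key=lambda c: int(c.rsplit("-", 1)[1]))
        selected.extend(matches)
    return selected


# Toy header: 3 time steps per feature, plus metadata columns (hypothetical)
columns = [
    "Unnamed: 0",
    "rotationRate.x-0", "rotationRate.x-1", "rotationRate.x-2",
    "userAcceleration.x-0", "userAcceleration.x-1", "userAcceleration.x-2",
    "activity code", "user",
]
prefixes = ["userAcceleration.x", "rotationRate.x"]

feature_columns = select_by_prefix(columns, prefixes)
print(feature_columns)
```

Metadata columns such as `activity code` or `user` match no prefix, so they stay out of the feature vector.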

## Inspect sample

In [6]:
# Let's print the first sample of the train dataset.
# It is a tuple: a vector of 540 elements (6 features x 90 time steps) first, and the label second
x = train_dataset[0]
print(x)

(array([-1.86833389e-01, -1.79195359e-01, -2.26435131e-01, -2.34763436e-01,
       -2.67823557e-01, -2.34534448e-01, -2.35421098e-01, -1.33759301e-01,
       -2.97124572e-01, -3.19298325e-01,  8.95068153e-03,  4.13667107e-02,
       -4.45930376e-01, -2.28045572e-01,  5.05584862e-02,  1.88233135e-01,
       -3.05589990e-02, -7.44910767e-02,  3.59969568e-02, -5.25122747e-02,
       -1.20435540e-01, -2.99962061e-01, -4.50567186e-01,  1.46823014e-01,
        1.64267921e-02,  1.00866929e-01,  1.56137079e-01, -9.38140351e-02,
       -2.79031325e-01, -2.71325809e-01, -1.99438544e-01, -3.22355437e-01,
       -2.81747706e-01, -1.88338075e-01, -1.75930166e-01, -2.58367241e-01,
       -4.13807163e-01, -6.38969102e-01, -6.66223601e-02,  6.10918606e-01,
        1.23116570e-01,  7.50942481e-02,  1.32656873e-01,  1.36065249e-01,
       -7.60303034e-02, -1.99809749e-01,  8.69800753e-03,  3.54754458e-01,
       -2.36973734e-01, -3.67774663e-01, -2.45674023e-01,  1.50712010e-01,
        1.44587441e-01, 

In [7]:
# Inspecting sample
print(f"The sample 0: {x[0]}")
print(f"Shape of sample 0: {x[0].shape}")
print(f"The label of sample 0: {x[1]}")

The sample 0: [-1.86833389e-01 -1.79195359e-01 -2.26435131e-01 -2.34763436e-01
 -2.67823557e-01 -2.34534448e-01 -2.35421098e-01 -1.33759301e-01
 -2.97124572e-01 -3.19298325e-01  8.95068153e-03  4.13667107e-02
 -4.45930376e-01 -2.28045572e-01  5.05584862e-02  1.88233135e-01
 -3.05589990e-02 -7.44910767e-02  3.59969568e-02 -5.25122747e-02
 -1.20435540e-01 -2.99962061e-01 -4.50567186e-01  1.46823014e-01
  1.64267921e-02  1.00866929e-01  1.56137079e-01 -9.38140351e-02
 -2.79031325e-01 -2.71325809e-01 -1.99438544e-01 -3.22355437e-01
 -2.81747706e-01 -1.88338075e-01 -1.75930166e-01 -2.58367241e-01
 -4.13807163e-01 -6.38969102e-01 -6.66223601e-02  6.10918606e-01
  1.23116570e-01  7.50942481e-02  1.32656873e-01  1.36065249e-01
 -7.60303034e-02 -1.99809749e-01  8.69800753e-03  3.54754458e-01
 -2.36973734e-01 -3.67774663e-01 -2.45674023e-01  1.50712010e-01
  1.44587441e-01 -1.08431514e-01 -1.82670965e-02 -8.24199145e-02
 -3.54580921e-02 -2.73086366e-01 -3.19687428e-01 -1.72529165e-01
  2.8114138
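
Because each sample is a flat vector, it can help to view it as one row per sensor axis. The sketch below assumes each of the six features contributes a contiguous block of 90 values (inferred from the `-0` to `-89` column suffixes shown by `train.head()`), and uses synthetic data in place of a real sample.

```python
import numpy as np

n_features, window = 6, 90  # assumed layout: 6 sensor axes x 90 time steps

# Synthetic stand-in for x[0]; a real sample would come from train_dataset[0][0]
flat_sample = np.arange(n_features * window, dtype=float)

# Each row now holds the time series of one feature, in `features` order
per_feature = flat_sample.reshape(n_features, window)

print(per_feature.shape)   # (6, 90)
print(per_feature[1, 0])   # first value of the second feature block: 90.0
```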

## Fourier Transform

In [8]:
from librep.datasets.multimodal import TransformMultiModalDataset
from librep.transforms.fft import FFT

In [9]:
fft_transform = FFT(centered=True)
transformer = TransformMultiModalDataset(
    transforms=[fft_transform], new_window_name_prefix="fft."
)
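
As a rough illustration of what a per-window frequency transform does, the NumPy sketch below takes the magnitude of the discrete Fourier transform of each window in a flat sample and keeps the lower half of the bins. Whether librep's `FFT(centered=True)` behaves exactly this way is an assumption; the function `fft_magnitude` and its parameters are hypothetical names used only for this example.

```python
import numpy as np


def fft_magnitude(sample, n_windows, keep_half=True):
    """Magnitude spectrum of each equally sized window in a flat sample."""
    windows = np.asarray(sample, dtype=float).reshape(n_windows, -1)
    spectrum = np.abs(np.fft.fft(windows, axis=1))  # magnitude per frequency bin
    if keep_half:
        # For a real signal, the bins above N/2 mirror the lower ones
        spectrum = spectrum[:, : windows.shape[1] // 2]
    return spectrum.reshape(-1)


# Two windows of 90 samples: sines completing 5 and 12 cycles per window
t = np.arange(90)
signal = np.concatenate([
    np.sin(2 * np.pi * 5 * t / 90),
    np.sin(2 * np.pi * 12 * t / 90),
])

out = fft_magnitude(signal, n_windows=2)
print(out.shape)                             # (90,): 45 bins per window
print(out[:45].argmax(), out[45:].argmax())  # 5 12
```

Each window's spectrum peaks at the bin matching the number of cycles in that window, which is the kind of periodic structure a classifier can pick up more directly in the frequency domain.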

### Apply FFT to MotionSense

In [10]:
train_dataset_fft = transformer(train_dataset)
validation_dataset_fft = transformer(validation_dataset)
test_dataset_fft = transformer(test_dataset)

In [11]:
train_dataset[:][0]

array([[-0.18683339, -0.17919536, -0.22643513, ...,  0.89880418,
         1.13925318,  0.27555584],
       [-0.05444212,  0.26009924,  0.0229331 , ...,  0.25302617,
         0.85886433,  0.79907488],
       [-0.00769597, -0.00951478,  0.05128415, ..., -0.04252899,
        -0.16110965,  0.12915699],
       ...,
       [ 0.12156633,  0.08784327,  0.22650777, ..., -0.33646327,
         0.188585  , -0.40013492],
       [ 0.30457243,  1.98117781,  1.05075516, ...,  4.15423467,
         3.18971673, -0.32404159],
       [-0.82321128,  0.18495934, -0.01894481, ..., -3.27345907,
        -4.03518556, -2.9964158 ]])

In [12]:
train_dataset_fft[:][0]

array([[ 7.8524508 ,  1.97250947,  0.93442469, ...,  1.89050909,
         0.83709156,  1.20154337],
       [ 0.6293334 ,  2.21672832,  1.13328689, ...,  1.16967075,
         3.53071988,  1.06557213],
       [ 3.1927866 ,  1.55762082,  0.60235897, ...,  0.55560622,
         0.71680657,  0.39816055],
       ...,
       [ 2.13171   ,  0.97741234,  1.88558163, ...,  3.39217532,
         0.16116226,  2.08539881],
       [ 7.3141302 ,  5.07977712,  5.15892564, ...,  3.59260974,
         5.23529476,  4.98941257],
       [12.2992248 ,  2.30543013,  4.6540158 , ...,  4.96081329,
         6.97650088,  4.14331   ]])

## Train and evaluate Random Forest classifier

In [13]:
from librep.utils.workflow import SimpleTrainEvalWorkflow, MultiRunWorkflow
from librep.estimators import RandomForestClassifier
from librep.metrics.report import ClassificationReport
import yaml

reporter = ClassificationReport(
    use_accuracy=True, use_f1_score=True, use_classification_report=False,
    use_confusion_matrix=False, plot_confusion_matrix=False,
)
experiment = SimpleTrainEvalWorkflow(
    estimator=RandomForestClassifier, estimator_creation_kwags={"n_estimators": 100},
    do_not_instantiate=False, do_fit=True, evaluator=reporter,
)
multi_run_experiment = MultiRunWorkflow(workflow=experiment, num_runs=3, debug=False)
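
The general pattern behind a multi-run train/evaluate workflow can be sketched in plain Python: create a fresh estimator for each run, fit it, score it on the test split, and record timing. This is an illustration of the pattern only, not librep's code; `run_many` and the tiny `NearestCentroid` estimator are hypothetical stand-ins, and the result layout loosely mimics the YAML printed in this notebook.

```python
import time
import numpy as np


class NearestCentroid:
    """Tiny stand-in estimator (hypothetical) exposing fit/predict."""

    def fit(self, X, y):
        self.labels_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.labels_])
        return self

    def predict(self, X):
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.labels_[dists.argmin(axis=1)]


def run_many(estimator_cls, estimator_kwargs, train, test, num_runs=3):
    """Fit a fresh estimator per run; record accuracy and wall-clock time."""
    (X_train, y_train), (X_test, y_test) = train, test
    runs = []
    for run_id in range(1, num_runs + 1):
        start = time.time()
        model = estimator_cls(**estimator_kwargs).fit(X_train, y_train)
        accuracy = float((model.predict(X_test) == y_test).mean())
        end = time.time()
        runs.append({
            "run id": run_id, "start": start, "end": end,
            "time taken": end - start,
            "result": [{"accuracy": accuracy}],
        })
    return {"runs": runs}


# Synthetic two-class data, split into alternating train/test rows
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, (50, 4)), rng.normal(5, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)

result = run_many(NearestCentroid, {}, (X[::2], y[::2]), (X[1::2], y[1::2]))
print([run["result"][0]["accuracy"] for run in result["runs"]])
```

Repeating the run matters mainly for estimators with internal randomness, such as Random Forest; a deterministic estimator returns the same score every time.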

In [14]:
combined_train_dset = PandasMultiModalDataset(
    pd.concat([train, validation]),
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)


result = multi_run_experiment(combined_train_dset, test_dataset)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662310524.6330519
    result:
    -   accuracy: 0.8147058823529412
        f1 score (macro): 0.8156302339334225
        f1 score (micro): 0.8147058823529412
        f1 score (weighted): 0.8137815307724598
    run id: 1
    start: 1662310519.609394
    time taken: 5.02365779876709
-   end: 1662310529.6589525
    result:
    -   accuracy: 0.7901960784313725
        f1 score (macro): 0.7904209664487559
        f1 score (micro): 0.7901960784313725
        f1 score (weighted): 0.7899711904139892
    run id: 2
    start: 1662310524.633054
    time taken: 5.025898456573486
-   end: 1662310534.7272208
    result:
    -   accuracy: 0.8009803921568628
        f1 score (macro): 0.8019091572586419
        f1 score (micro): 0.8009803921568628
        f1 score (weighted): 0.8000516270550835
    run id: 3
    start: 1662310529.6589544
    time taken: 5.06826639175415
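
In the report above, `f1 score (micro)` matches `accuracy` in every run. That is expected for single-label multiclass problems, where pooling true positives, false positives and false negatives over all classes reduces micro F1 to accuracy, while macro F1 averages the per-class scores. The NumPy sketch below computes both on a toy example; `f1_scores` is a hypothetical helper, not part of librep.

```python
import numpy as np


def f1_scores(y_true, y_pred):
    """Return (accuracy, micro F1, macro F1) for single-label predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    labels = np.unique(np.concatenate([y_true, y_pred]))
    tp = np.array([np.sum((y_pred == c) & (y_true == c)) for c in labels])
    fp = np.array([np.sum((y_pred == c) & (y_true != c)) for c in labels])
    fn = np.array([np.sum((y_pred != c) & (y_true == c)) for c in labels])

    # Per-class F1 = 2TP / (2TP + FP + FN)
    per_class = 2 * tp / (2 * tp + fp + fn)

    macro = per_class.mean()  # every class counts equally
    micro = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())  # pooled counts
    accuracy = np.mean(y_true == y_pred)
    return float(accuracy), float(micro), float(macro)


y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
accuracy, micro, macro = f1_scores(y_true, y_pred)
print(f"accuracy={accuracy:.4f} micro={micro:.4f} macro={macro:.4f}")
```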



In [15]:
combined_train_dset_fft = transformer(combined_train_dset)

result = multi_run_experiment(combined_train_dset_fft, test_dataset_fft)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662310538.1584
    result:
    -   accuracy: 0.8294117647058824
        f1 score (macro): 0.833632982239653
        f1 score (micro): 0.8294117647058825
        f1 score (weighted): 0.8251905471721116
    run id: 1
    start: 1662310534.921592
    time taken: 3.2368080615997314
-   end: 1662310541.34553
    result:
    -   accuracy: 0.8264705882352941
        f1 score (macro): 0.8301586681355534
        f1 score (micro): 0.826470588235294
        f1 score (weighted): 0.8227825083350346
    run id: 2
    start: 1662310538.1584015
    time taken: 3.1871285438537598
-   end: 1662310544.5052915
    result:
    -   accuracy: 0.8382352941176471
        f1 score (macro): 0.8427912324107038
        f1 score (micro): 0.8382352941176471
        f1 score (weighted): 0.8336793558245903
    run id: 3
    start: 1662310541.3455317
    time taken: 3.159759759902954
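
Since Random Forest scores vary between runs, it is easier to compare the two domains through the mean and spread of each metric. The sketch below assumes the `result` dictionary has the layout printed above (`runs` → `result` → metric name) and uses the accuracies reported in the two Random Forest outputs; `summarize` is a hypothetical helper.

```python
from statistics import mean, stdev


def summarize(result, metric="accuracy"):
    """Mean and sample standard deviation of one metric across runs."""
    values = [run["result"][0][metric] for run in result["runs"]]
    return mean(values), stdev(values)


# Accuracies copied from the two Random Forest outputs above
time_domain = {"runs": [{"result": [{"accuracy": a}]} for a in
               (0.8147058823529412, 0.7901960784313725, 0.8009803921568628)]}
freq_domain = {"runs": [{"result": [{"accuracy": a}]} for a in
               (0.8294117647058824, 0.8264705882352941, 0.8382352941176471)]}

for name, res in (("time", time_domain), ("frequency", freq_domain)):
    avg, sd = summarize(res)
    print(f"{name:>9}: mean accuracy = {avg:.4f}, sample std = {sd:.4f}")
```

With only three runs the standard deviation is a rough indication at best; more runs would give a more reliable picture of the variance.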



## Train and evaluate Support Vector Machine classifier

In [16]:
#from librep.estimators import SVC
from sklearn.svm import SVC

experiment = SimpleTrainEvalWorkflow(
    estimator=SVC, estimator_creation_kwags={"C": 3.0, "kernel": "rbf"},
    do_not_instantiate=False, do_fit=True, evaluator=reporter,
)
multi_run_experiment = MultiRunWorkflow(workflow=experiment, num_runs=3, debug=False)

result = multi_run_experiment(combined_train_dset, test_dataset)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662310547.456629
    result:
    -   accuracy: 0.6058823529411764
        f1 score (macro): 0.5853038254714807
        f1 score (micro): 0.6058823529411764
        f1 score (weighted): 0.6264608804108721
    run id: 1
    start: 1662310544.5108726
    time taken: 2.945756435394287
-   end: 1662310550.239782
    result:
    -   accuracy: 0.6058823529411764
        f1 score (macro): 0.5853038254714807
        f1 score (micro): 0.6058823529411764
        f1 score (weighted): 0.6264608804108721
    run id: 2
    start: 1662310547.4566312
    time taken: 2.7831509113311768
-   end: 1662310553.237906
    result:
    -   accuracy: 0.6058823529411764
        f1 score (macro): 0.5853038254714807
        f1 score (micro): 0.6058823529411764
        f1 score (weighted): 0.6264608804108721
    run id: 3
    start: 1662310550.2397845
    time taken: 2.998121500015259



In [17]:
result = multi_run_experiment(combined_train_dset_fft, test_dataset_fft)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662310553.8998082
    result:
    -   accuracy: 0.8362745098039216
        f1 score (macro): 0.8398093236898054
        f1 score (micro): 0.8362745098039216
        f1 score (weighted): 0.8327396959180376
    run id: 1
    start: 1662310553.2429883
    time taken: 0.6568198204040527
-   end: 1662310554.5566177
    result:
    -   accuracy: 0.8362745098039216
        f1 score (macro): 0.8398093236898054
        f1 score (micro): 0.8362745098039216
        f1 score (weighted): 0.8327396959180376
    run id: 2
    start: 1662310553.8998103
    time taken: 0.6568074226379395
-   end: 1662310555.2221253
    result:
    -   accuracy: 0.8362745098039216
        f1 score (macro): 0.8398093236898054
        f1 score (micro): 0.8362745098039216
        f1 score (weighted): 0.8327396959180376
    run id: 3
    start: 1662310554.5566194
    time taken: 0.6655058860778809



## Train and evaluate K-Nearest Neighbors classifier

In [18]:
#from librep.estimators import KNeighborsClassifier
from sklearn.neighbors import KNeighborsClassifier

experiment = SimpleTrainEvalWorkflow(
    estimator=KNeighborsClassifier, estimator_creation_kwags={"n_neighbors": 1},
    do_not_instantiate=False, do_fit=True, evaluator=reporter,
)
multi_run_experiment = MultiRunWorkflow(workflow=experiment, num_runs=3, debug=False)

result = multi_run_experiment(combined_train_dset, test_dataset)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662310555.3926013
    result:
    -   accuracy: 0.5588235294117647
        f1 score (macro): 0.5519621171043617
        f1 score (micro): 0.5588235294117647
        f1 score (weighted): 0.5656849417191676
    run id: 1
    start: 1662310555.2281358
    time taken: 0.16446542739868164
-   end: 1662310555.447947
    result:
    -   accuracy: 0.5588235294117647
        f1 score (macro): 0.5519621171043617
        f1 score (micro): 0.5588235294117647
        f1 score (weighted): 0.5656849417191676
    run id: 2
    start: 1662310555.3926063
    time taken: 0.05534076690673828
-   end: 1662310555.501964
    result:
    -   accuracy: 0.5588235294117647
        f1 score (macro): 0.5519621171043617
        f1 score (micro): 0.5588235294117647
        f1 score (weighted): 0.5656849417191676
    run id: 3
    start: 1662310555.4479487
    time taken: 0.054015398025512695



In [19]:
result = multi_run_experiment(combined_train_dset_fft, test_dataset_fft)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662310555.546325
    result:
    -   accuracy: 0.7235294117647059
        f1 score (macro): 0.7257034370955838
        f1 score (micro): 0.7235294117647059
        f1 score (weighted): 0.7213553864338279
    run id: 1
    start: 1662310555.5103478
    time taken: 0.03597712516784668
-   end: 1662310555.5785568
    result:
    -   accuracy: 0.7235294117647059
        f1 score (macro): 0.7257034370955838
        f1 score (micro): 0.7235294117647059
        f1 score (weighted): 0.7213553864338279
    run id: 2
    start: 1662310555.5463269
    time taken: 0.03222990036010742
-   end: 1662310555.6105347
    result:
    -   accuracy: 0.7235294117647059
        f1 score (macro): 0.7257034370955838
        f1 score (micro): 0.7235294117647059
        f1 score (weighted): 0.7213553864338279
    run id: 3
    start: 1662310555.5785582
    time taken: 0.03197646141052246
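
To compare the models side by side, the short script below collects the accuracies reported in the outputs of this notebook and prints the mean per model and domain, together with the gain from moving to the frequency domain. The numbers are copied from the runs above; the dictionary layout and model labels are just one convenient way to organize them.

```python
from statistics import mean

# Accuracies copied from the outputs above (three runs per configuration)
accuracies = {
    "Random Forest": {
        "time": [0.8147058823529412, 0.7901960784313725, 0.8009803921568628],
        "frequency": [0.8294117647058824, 0.8264705882352941, 0.8382352941176471],
    },
    "SVM (RBF, C=3)": {
        "time": [0.6058823529411764] * 3,
        "frequency": [0.8362745098039216] * 3,
    },
    "KNN (k=1)": {
        "time": [0.5588235294117647] * 3,
        "frequency": [0.7235294117647059] * 3,
    },
}

gains = {}
for model, by_domain in accuracies.items():
    t, f = mean(by_domain["time"]), mean(by_domain["frequency"])
    gains[model] = f - t
    print(f"{model:<16} time={t:.3f}  frequency={f:.3f}  gain={f - t:+.3f}")
```

In these runs all three models score higher on the FFT features, with the largest gain for the SVM and the smallest for Random Forest.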

