# Experiments on MotionSense resampled to 20 Hz

This notebook runs basic experiments on the balanced MotionSense dataset in the following steps:
1. Quickly load the train, validation and test CSV subsets of the balanced MotionSense dataset using the `PandasDatasetsIO` helper
2. Wrap the dataframes in librep's `Dataset` interface using `PandasMultiModalDataset`
3. Apply the Fourier transform to the dataset
4. Train and evaluate SVM, KNN and Random Forest classification models in both time and frequency domains

Each model is trained on the combined train and validation subsets and evaluated on the test subset over three runs, reporting accuracy and F1 scores (macro, micro and weighted).

## Common imports and definitions

In [1]:
from pathlib import Path  # For defining dataset paths
import sys                # For adding the librep package to the import path

# This must be done if librep is not installed via pip,
# as this directory (examples) is separate from the librep package root
sys.path.append("..")

# Third party imports
import pandas as pd
import numpy as np

# Librep imports
from librep.utils.dataset import PandasDatasetsIO          # For quickly loading train, validation and test CSVs
from librep.datasets.multimodal import PandasMultiModalDataset # Wrap CSVs to librep's `Dataset` interface

2022-09-12 01:30:53.298122: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-09-12 01:30:53.298143: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.


## Loading data
Change the path to use other datasets.

In [2]:
# Path for the MotionSense balanced view resampled to 20 Hz, with the same activities (and label numbers)
# The directory is assumed to contain train.csv, validation.csv and test.csv
dataset_path = Path("../data/views/MotionSense/resampled_view_20Hz")

Once the path is defined, we can load the CSVs as pandas dataframes.

In [3]:
# MotionSense dataframes
train, validation, test = PandasDatasetsIO(dataset_path).load()
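
For readers without librep at hand, the loading step can be approximated with plain pandas. This is a minimal sketch, assuming the helper does nothing more than read `train.csv`, `validation.csv` and `test.csv` from the directory; `load_splits` is a hypothetical name, not part of librep, and the tiny CSVs are written to a temporary directory only to make the example runnable.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_splits(dataset_dir):
    """Read the three split CSVs from a directory (hypothetical helper)."""
    dataset_dir = Path(dataset_dir)
    # Same order as in the notebook: train, validation, test
    return tuple(
        pd.read_csv(dataset_dir / f"{name}.csv")
        for name in ("train", "validation", "test")
    )


# Toy data standing in for the real MotionSense view
with tempfile.TemporaryDirectory() as tmp:
    toy = pd.DataFrame({"userAcceleration.x-0": [0.1, 0.2], "activity code": [0, 1]})
    for name in ("train", "validation", "test"):
        toy.to_csv(Path(tmp) / f"{name}.csv", index=False)
    train_df, validation_df, test_df = load_splits(tmp)

print(train_df.shape, validation_df.shape, test_df.shape)
```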

Let's take a look at the train dataframe.

In [4]:
train.head()

Unnamed: 0.1,Unnamed: 0,userAcceleration.x-0,userAcceleration.x-1,userAcceleration.x-2,userAcceleration.x-3,userAcceleration.x-4,userAcceleration.x-5,userAcceleration.x-6,userAcceleration.x-7,userAcceleration.x-8,...,rotationRate.z-55,rotationRate.z-56,rotationRate.z-57,rotationRate.z-58,rotationRate.z-59,activity code,length,trial_code,index,user
0,0,-0.101581,-0.221355,-0.234016,-0.264552,-0.200991,-0.208962,-0.308408,0.089943,-0.382516,...,-0.099006,-0.17113,0.292608,0.932535,0.782147,0,150,1,150,11
1,1,-0.082527,0.201136,-0.017408,0.120404,-0.179599,-0.01396,-0.014233,0.253264,0.684288,...,-0.838042,0.241819,0.71842,0.336507,0.936563,0,150,1,900,12
2,2,0.108323,-0.045941,0.01741,0.010881,0.019035,-0.157225,0.016889,-0.076276,0.149599,...,0.497304,0.732298,0.667477,-0.025386,-0.037093,0,150,1,1050,21
3,3,-0.370755,-0.49585,0.205895,0.455012,0.113117,-0.327512,0.11176,0.001935,-0.844532,...,0.919908,0.037076,-0.993926,-0.182277,0.317828,0,150,2,150,17
4,4,-0.005683,0.45237,0.028475,0.402016,0.168378,0.353346,0.182684,0.042545,0.00153,...,-0.060734,0.496173,1.265226,1.502311,0.857408,0,150,11,450,21


## Creating a Librep dataset from pandas dataframes

Change the features to use other datasets.

In [5]:
# MotionSense features to select
features = [
    "userAcceleration.x",
    "userAcceleration.y",
    "userAcceleration.z",
    "rotationRate.x",
    "rotationRate.y",
    "rotationRate.z"
]

# Creating the datasets

# Train
train_dataset = PandasMultiModalDataset(
    train,
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)

# Validation
validation_dataset = PandasMultiModalDataset(
    validation,
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)

# Test
test_dataset = PandasMultiModalDataset(
    test,
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)
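
The prefix-based selection above can be pictured as follows: for each prefix, gather the columns named `<prefix>-0`, `<prefix>-1`, ... and place the resulting blocks side by side. This is a sketch under that assumption, not librep's actual implementation; `select_by_prefixes` and the toy dataframe are hypothetical.

```python
import numpy as np
import pandas as pd


def select_by_prefixes(df, prefixes, label_column):
    """Build a feature matrix from columns named '<prefix>-<step>' (hypothetical helper)."""
    blocks = []
    for prefix in prefixes:
        cols = [c for c in df.columns if c.startswith(prefix + "-")]
        # Order columns by their numeric step suffix, so x-2 comes before x-10
        cols.sort(key=lambda c: int(c.rsplit("-", 1)[1]))
        blocks.append(df[cols].to_numpy())
    X = np.concatenate(blocks, axis=1)  # one block per prefix, side by side
    y = df[label_column].to_numpy()
    return X, y


toy = pd.DataFrame({
    "userAcceleration.x-0": [0.1, 0.4],
    "userAcceleration.x-1": [0.2, 0.5],
    "rotationRate.z-0": [1.0, 3.0],
    "rotationRate.z-1": [2.0, 4.0],
    "activity code": [0, 1],
    "user": [11, 12],  # metadata column, ignored by the selection
})
X, y = select_by_prefixes(toy, ["userAcceleration.x", "rotationRate.z"], "activity code")
print(X.shape, y)
```

Metadata columns such as `user` or `trial_code` match no prefix and therefore never reach the feature matrix.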

## Inspect sample

In [6]:
# Let's print the first sample of the train dataset.
# It is a tuple with a vector of 360 elements (6 features x 60 samples each) as the first element and the label as the second
x = train_dataset[0]
print(x)

(array([-1.01580660e-01, -2.21355124e-01, -2.34015566e-01, -2.64551900e-01,
       -2.00990656e-01, -2.08961907e-01, -3.08408466e-01,  8.99426214e-02,
       -3.82515554e-01, -7.75533852e-02,  1.46318283e-01, -4.93770905e-02,
       -2.09640201e-02, -3.17307189e-02, -3.74259464e-01, -1.52182310e-01,
        1.05233312e-01,  1.11319758e-01, -5.38330158e-02, -3.14136807e-01,
       -2.08472528e-01, -3.23492282e-01, -1.91815150e-01, -1.88533126e-01,
       -4.71212234e-01, -3.96779870e-01,  5.07970584e-01,  5.27352561e-02,
        1.59006497e-01,  3.15998485e-02, -1.98139909e-01,  2.08691866e-01,
       -1.08432318e-01, -4.06142257e-01,  1.66143634e-01,  7.70059198e-06,
       -5.77953517e-02, -5.52501402e-02, -2.20866812e-01, -3.14509600e-01,
        2.84967949e-01,  1.34890424e-01,  6.60941177e-02,  6.91335728e-02,
       -3.98362725e-01,  6.46615090e-02,  1.24072879e-01, -2.26239499e-01,
       -3.26437858e-01, -1.39575067e-01, -1.09824066e-02, -7.93132478e-02,
       -4.57364775e-01, 

In [7]:
# Inspecting sample
print(f"The sample 0: {x[0]}")
print(f"Shape of sample 0: {x[0].shape}")
print(f"The label of sample 0: {x[1]}")

The sample 0: [-1.01580660e-01 -2.21355124e-01 -2.34015566e-01 -2.64551900e-01
 -2.00990656e-01 -2.08961907e-01 -3.08408466e-01  8.99426214e-02
 -3.82515554e-01 -7.75533852e-02  1.46318283e-01 -4.93770905e-02
 -2.09640201e-02 -3.17307189e-02 -3.74259464e-01 -1.52182310e-01
  1.05233312e-01  1.11319758e-01 -5.38330158e-02 -3.14136807e-01
 -2.08472528e-01 -3.23492282e-01 -1.91815150e-01 -1.88533126e-01
 -4.71212234e-01 -3.96779870e-01  5.07970584e-01  5.27352561e-02
  1.59006497e-01  3.15998485e-02 -1.98139909e-01  2.08691866e-01
 -1.08432318e-01 -4.06142257e-01  1.66143634e-01  7.70059198e-06
 -5.77953517e-02 -5.52501402e-02 -2.20866812e-01 -3.14509600e-01
  2.84967949e-01  1.34890424e-01  6.60941177e-02  6.91335728e-02
 -3.98362725e-01  6.46615090e-02  1.24072879e-01 -2.26239499e-01
 -3.26437858e-01 -1.39575067e-01 -1.09824066e-02 -7.93132478e-02
 -4.57364775e-01 -1.14570859e-01  2.62746376e-01 -1.53196322e-01
  3.57891496e-01 -2.31041538e-01 -2.26239680e-01  3.22787316e-01
 -1.4633490
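
The flat sample starts with the `userAcceleration.x` values shown in the dataframe above, so the features appear to be stored one sensor axis after another. Assuming 6 axes with 60 time steps each (the column suffixes run from 0 to 59), a flat vector can be reshaped into one row per axis. The vector below is synthetic, not real data.

```python
import numpy as np

n_features, n_steps = 6, 60  # assumed layout: 6 sensor axes x 60 time steps

# Synthetic stand-in for train_dataset[0][0]
flat_sample = np.arange(n_features * n_steps, dtype=float)

# Row i holds the 60 consecutive values of the i-th feature prefix
per_axis = flat_sample.reshape(n_features, n_steps)

print(per_axis.shape)
print(per_axis[0, :3])  # first three time steps of the first axis
```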

## Fourier Transform

In [8]:
from librep.datasets.multimodal import TransformMultiModalDataset
from librep.transforms.fft import FFT

In [9]:
fft_transform = FFT(centered=True)
transformer = TransformMultiModalDataset(transforms=[fft_transform], new_window_name_prefix="fft.")

### Use FFT in MotionSense

In [10]:
train_dataset_fft = transformer(train_dataset)
validation_dataset_fft = transformer(validation_dataset)
test_dataset_fft = transformer(test_dataset)

In [11]:
train_dataset[:][0]

array([[-0.10158066, -0.22135512, -0.23401557, ...,  0.29260805,
         0.932535  ,  0.78214699],
       [-0.08252731,  0.20113614, -0.017408  , ...,  0.71842028,
         0.33650671,  0.93656304],
       [ 0.10832292, -0.04594076,  0.01741017, ...,  0.66747733,
        -0.0253858 , -0.03709332],
       ...,
       [ 0.19266143,  0.10970878,  0.28033436, ..., -0.39853684,
        -0.17416418, -0.04600792],
       [ 0.69634911,  1.57399343, -0.05390784, ...,  2.05972956,
         3.92681916,  1.76309344],
       [-0.69489245,  0.10504003,  0.50498891, ..., -2.74187909,
        -3.1827423 , -3.9243559 ]])

In [12]:
train_dataset_fft[:][0]

array([[ 5.2349672 ,  1.31500631,  0.62294979, ...,  1.51949579,
         1.40946664,  2.67359214],
       [ 0.4195556 ,  1.47781888,  0.7555246 , ...,  3.16244675,
         1.15978979,  1.87787114],
       [ 2.1285244 ,  1.03841388,  0.40157264, ...,  0.31071895,
         2.31689217,  1.23138236],
       ...,
       [ 1.42114   ,  0.65160823,  1.25705442, ...,  1.84423902,
         3.08233644,  6.43174028],
       [ 4.8760868 ,  3.38651808,  3.43928376, ..., 10.83724915,
        19.81342536,  2.8192645 ],
       [ 8.1994832 ,  1.53695342,  3.1026772 , ...,  6.69953404,
        20.53527595,  6.08226567]])
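
The visible frequency-domain values are all non-negative, which is what a magnitude spectrum looks like. The sketch below shows one way to compute such a spectrum per sensor axis with NumPy. It is an illustration only: whether librep's `FFT` takes the absolute value, and what `centered=True` does exactly, are assumptions here (the `keep_half` flag is a hypothetical stand-in that keeps the first half of each spectrum).

```python
import numpy as np


def fft_magnitude_per_axis(flat_sample, n_axes=6, keep_half=True):
    """Magnitude spectrum of each axis of a flat sample (hypothetical helper)."""
    windows = np.asarray(flat_sample, dtype=float).reshape(n_axes, -1)
    spectra = np.abs(np.fft.fft(windows, axis=1))  # one FFT per axis
    if keep_half:
        # For real input the spectrum is symmetric, so the first half carries the information
        spectra = spectra[:, : windows.shape[1] // 2]
    return spectra.reshape(-1)  # back to a flat feature vector


rng = np.random.default_rng(0)
sample = rng.normal(size=360)  # synthetic: 6 axes x 60 time steps
spectrum = fft_magnitude_per_axis(sample)
print(sample.shape, "->", spectrum.shape)
```

In this sketch the first value of each axis block is the magnitude of the sum of that axis's samples (the zero-frequency component).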

## Train and evaluate Random Forest classifier

In [13]:
from librep.utils.workflow import SimpleTrainEvalWorkflow, MultiRunWorkflow
from librep.estimators import RandomForestClassifier
from librep.metrics.report import ClassificationReport
import yaml

reporter = ClassificationReport(use_accuracy=True, use_f1_score=True, use_classification_report=False, use_confusion_matrix=False, plot_confusion_matrix=False)
experiment = SimpleTrainEvalWorkflow(estimator=RandomForestClassifier, estimator_creation_kwags={"n_estimators": 100}, do_not_instantiate=False, do_fit=True, evaluator=reporter)
multi_run_experiment = MultiRunWorkflow(workflow=experiment, num_runs=3, debug=False)
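
Conceptually, the two workflows amount to "instantiate, fit, predict, score" repeated a fixed number of times. The following sketch reproduces that loop with scikit-learn only; it is not librep's code. `train_eval_once` and `multi_run` are hypothetical helpers, scikit-learn's `RandomForestClassifier` stands in for the librep wrapper, and the data is synthetic.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split


def train_eval_once(estimator_cls, estimator_kwargs, X_train, y_train, X_test, y_test):
    """Create a fresh estimator, fit it and score it on the test split."""
    start = time.time()
    model = estimator_cls(**estimator_kwargs)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    end = time.time()
    scores = {"accuracy": float(accuracy_score(y_test, y_pred))}
    for avg in ("macro", "micro", "weighted"):
        scores[f"f1 score ({avg})"] = float(f1_score(y_test, y_pred, average=avg))
    return {"start": start, "end": end, "time taken": end - start, "result": [scores]}


def multi_run(num_runs, **kwargs):
    """Repeat the train/eval step and collect one entry per run."""
    runs = []
    for run_id in range(1, num_runs + 1):
        run = train_eval_once(**kwargs)
        run["run id"] = run_id
        runs.append(run)
    return {"runs": runs}


# Synthetic stand-in for the MotionSense train/test datasets
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

sketch_result = multi_run(
    3, estimator_cls=RandomForestClassifier, estimator_kwargs={"n_estimators": 100},
    X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test)
print([round(r["result"][0]["accuracy"], 3) for r in sketch_result["runs"]])
```

Because a new estimator is created in every run and no random seed is fixed, a randomized model such as Random Forest can score slightly differently from run to run.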

In [14]:
combined_train_dset = PandasMultiModalDataset(
    pd.concat([train, validation]),
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)


result = multi_run_experiment(combined_train_dset, test_dataset)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662946259.7251973
    result:
    -   accuracy: 0.7941176470588235
        f1 score (macro): 0.7945803591043178
        f1 score (micro): 0.7941176470588235
        f1 score (weighted): 0.7936549350133294
    run id: 1
    start: 1662946255.7958398
    time taken: 3.9293575286865234
-   end: 1662946263.6127312
    result:
    -   accuracy: 0.8058823529411765
        f1 score (macro): 0.8052478115067453
        f1 score (micro): 0.8058823529411765
        f1 score (weighted): 0.8065168943756078
    run id: 2
    start: 1662946259.7251995
    time taken: 3.8875317573547363
-   end: 1662946267.4796536
    result:
    -   accuracy: 0.807843137254902
        f1 score (macro): 0.8083873125009667
        f1 score (micro): 0.807843137254902
        f1 score (weighted): 0.8072989620088372
    run id: 3
    start: 1662946263.6127326
    time taken: 3.8669209480285645
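
The printed structure is a dictionary with a `runs` list, where each run carries a `result` list of metric dictionaries. A small helper can condense it into a mean and standard deviation. This is a sketch: `summarize_runs` is a hypothetical name, and the dictionary below copies only the accuracies from the output above.

```python
from statistics import mean, stdev


def summarize_runs(result, metric="accuracy"):
    """Mean and sample standard deviation of one metric across runs (hypothetical helper)."""
    values = [run["result"][0][metric] for run in result["runs"]]
    return {"mean": mean(values), "std": stdev(values), "n": len(values)}


# Accuracies copied from the Random Forest time-domain output above
rf_time_domain = {
    "runs": [
        {"run id": 1, "result": [{"accuracy": 0.7941176470588235}]},
        {"run id": 2, "result": [{"accuracy": 0.8058823529411765}]},
        {"run id": 3, "result": [{"accuracy": 0.807843137254902}]},
    ]
}

summary = summarize_runs(rf_time_domain)
print(f"accuracy: {summary['mean']:.4f} +/- {summary['std']:.4f} over {summary['n']} runs")
```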



In [15]:
combined_train_dset_fft = transformer(combined_train_dset)

result = multi_run_experiment(combined_train_dset_fft, test_dataset_fft)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662946270.1721408
    result:
    -   accuracy: 0.8401960784313726
        f1 score (macro): 0.8450572294039255
        f1 score (micro): 0.8401960784313726
        f1 score (weighted): 0.8353349274588198
    run id: 1
    start: 1662946267.6317942
    time taken: 2.540346622467041
-   end: 1662946272.7333715
    result:
    -   accuracy: 0.846078431372549
        f1 score (macro): 0.8506697695433227
        f1 score (micro): 0.846078431372549
        f1 score (weighted): 0.8414870932017754
    run id: 2
    start: 1662946270.1721425
    time taken: 2.5612289905548096
-   end: 1662946275.2740645
    result:
    -   accuracy: 0.8441176470588235
        f1 score (macro): 0.8487874396226154
        f1 score (micro): 0.8441176470588234
        f1 score (weighted): 0.8394478544950316
    run id: 3
    start: 1662946272.7333732
    time taken: 2.540691375732422



## Train and evaluate Support Vector Machine classifier

In [16]:
#from librep.estimators import SVC
from sklearn.svm import SVC

experiment = SimpleTrainEvalWorkflow(estimator=SVC, estimator_creation_kwags={"C": 3.0, "kernel": "rbf"}, do_not_instantiate=False, do_fit=True, evaluator=reporter)
multi_run_experiment = MultiRunWorkflow(workflow=experiment, num_runs=3, debug=False)

result = multi_run_experiment(combined_train_dset, test_dataset)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662946277.0871634
    result:
    -   accuracy: 0.6166666666666667
        f1 score (macro): 0.5991285350102015
        f1 score (micro): 0.6166666666666667
        f1 score (weighted): 0.6342047983231319
    run id: 1
    start: 1662946275.2794306
    time taken: 1.8077328205108643
-   end: 1662946278.9083006
    result:
    -   accuracy: 0.6166666666666667
        f1 score (macro): 0.5991285350102015
        f1 score (micro): 0.6166666666666667
        f1 score (weighted): 0.6342047983231319
    run id: 2
    start: 1662946277.0871654
    time taken: 1.8211352825164795
-   end: 1662946280.6965876
    result:
    -   accuracy: 0.6166666666666667
        f1 score (macro): 0.5991285350102015
        f1 score (micro): 0.6166666666666667
        f1 score (weighted): 0.6342047983231319
    run id: 3
    start: 1662946278.9083028
    time taken: 1.7882847785949707



In [17]:
result = multi_run_experiment(combined_train_dset_fft, test_dataset_fft)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662946281.1805005
    result:
    -   accuracy: 0.8303921568627451
        f1 score (macro): 0.8346168154346197
        f1 score (micro): 0.8303921568627451
        f1 score (weighted): 0.8261674982908706
    run id: 1
    start: 1662946280.7017505
    time taken: 0.47874999046325684
-   end: 1662946281.6543446
    result:
    -   accuracy: 0.8303921568627451
        f1 score (macro): 0.8346168154346197
        f1 score (micro): 0.8303921568627451
        f1 score (weighted): 0.8261674982908706
    run id: 2
    start: 1662946281.1805022
    time taken: 0.4738423824310303
-   end: 1662946282.1283693
    result:
    -   accuracy: 0.8303921568627451
        f1 score (macro): 0.8346168154346197
        f1 score (micro): 0.8303921568627451
        f1 score (weighted): 0.8261674982908706
    run id: 3
    start: 1662946281.6543462
    time taken: 0.47402310371398926



## Train and evaluate K-Nearest Neighbors classifier

In [18]:
#from librep.estimators import KNeighborsClassifier
from sklearn.neighbors import KNeighborsClassifier

experiment = SimpleTrainEvalWorkflow(estimator=KNeighborsClassifier, estimator_creation_kwags={"n_neighbors": 1}, do_not_instantiate=False, do_fit=True, evaluator=reporter)
multi_run_experiment = MultiRunWorkflow(workflow=experiment, num_runs=3, debug=False)

result = multi_run_experiment(combined_train_dset, test_dataset)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662946282.318519
    result:
    -   accuracy: 0.5813725490196079
        f1 score (macro): 0.57652308813541
        f1 score (micro): 0.5813725490196079
        f1 score (weighted): 0.5862220099038056
    run id: 1
    start: 1662946282.1341512
    time taken: 0.18436789512634277
-   end: 1662946282.3659525
    result:
    -   accuracy: 0.5813725490196079
        f1 score (macro): 0.57652308813541
        f1 score (micro): 0.5813725490196079
        f1 score (weighted): 0.5862220099038056
    run id: 2
    start: 1662946282.318521
    time taken: 0.04743146896362305
-   end: 1662946282.4086316
    result:
    -   accuracy: 0.5813725490196079
        f1 score (macro): 0.57652308813541
        f1 score (micro): 0.5813725490196079
        f1 score (weighted): 0.5862220099038056
    run id: 3
    start: 1662946282.3659544
    time taken: 0.04267716407775879



In [19]:
result = multi_run_experiment(combined_train_dset_fft, test_dataset_fft)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662946282.4466975
    result:
    -   accuracy: 0.7294117647058823
        f1 score (macro): 0.7322267426617582
        f1 score (micro): 0.7294117647058823
        f1 score (weighted): 0.7265967867500063
    run id: 1
    start: 1662946282.4166644
    time taken: 0.030033111572265625
-   end: 1662946282.4746978
    result:
    -   accuracy: 0.7294117647058823
        f1 score (macro): 0.7322267426617582
        f1 score (micro): 0.7294117647058823
        f1 score (weighted): 0.7265967867500063
    run id: 2
    start: 1662946282.4466996
    time taken: 0.02799820899963379
-   end: 1662946282.502856
    result:
    -   accuracy: 0.7294117647058823
        f1 score (macro): 0.7322267426617582
        f1 score (micro): 0.7294117647058823
        f1 score (weighted): 0.7265967867500063
    run id: 3
    start: 1662946282.4746995
    time taken: 0.028156518936157227

