# Experiments on MotionSense resampled to 30 Hz

This notebook runs basic experiments on the balanced MotionSense dataset in the following steps:
1. Load the train, validation and test CSV subsets of the balanced MotionSense dataset with the `PandasDatasetsIO` helper
2. Wrap the dataframes in librep's `Dataset` interface using `PandasMultiModalDataset`
3. Apply the Fourier transform to the dataset
4. Train and evaluate SVM, KNN and Random Forest (RF) classification models in both the time and frequency domains

## Common imports and definitions

In [1]:
from pathlib import Path  # For defining dataset Paths
import sys                # For adding the librep package to the import path

# This must be done if librep is not installed via pip,
# as this directory (examples) is separate from the librep package root
sys.path.append("..")

# Third party imports
import pandas as pd
import numpy as np

# Librep imports
from librep.utils.dataset import PandasDatasetsIO          # For quick load train, test and validation CSVs
from librep.datasets.multimodal import PandasMultiModalDataset # Wrap CSVs to librep's `Dataset` interface



## Loading data
Change the path to use other datasets.

In [2]:
# Path to the MotionSense balanced view with the same activities (and label numbers)
# The directory is assumed to contain train.csv, validation.csv and test.csv
dataset_path = Path("../data/views/MotionSense/balanced_view")

Once the path is defined, we can load the CSVs as pandas dataframes.

In [3]:
# MotionSense dataframes
train, validation, test = PandasDatasetsIO(dataset_path).load()
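
For readers without librep at hand, the loading step can be approximated with plain pandas. This is a minimal sketch, not librep's implementation: the helper name `load_splits` is hypothetical, and it assumes the directory holds `train.csv`, `validation.csv` and `test.csv` as stated in the comment above. The demo writes throwaway files so it runs without the real dataset.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_splits(view_dir):
    """Read train/validation/test CSVs from one directory (assumed file names)."""
    view_dir = Path(view_dir)
    return tuple(
        pd.read_csv(view_dir / f"{name}.csv")
        for name in ("train", "validation", "test")
    )


# Tiny demo on temporary files standing in for the real view directory.
with tempfile.TemporaryDirectory() as tmp:
    for name, rows in (("train", 3), ("validation", 2), ("test", 1)):
        pd.DataFrame({"x-0": range(rows), "activity code": [0] * rows}).to_csv(
            Path(tmp) / f"{name}.csv", index=False
        )
    demo_train, demo_validation, demo_test = load_splits(tmp)

print(len(demo_train), len(demo_validation), len(demo_test))
```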

Let's take a look at the train dataframe.

In [4]:
train.head()

Unnamed: 0.1,Unnamed: 0,attitude.roll-0,attitude.roll-1,attitude.roll-2,attitude.roll-3,attitude.roll-4,attitude.roll-5,attitude.roll-6,attitude.roll-7,attitude.roll-8,...,userAcceleration.z-145,userAcceleration.z-146,userAcceleration.z-147,userAcceleration.z-148,userAcceleration.z-149,activity code,length,trial_code,index,user
0,0,1.3118,1.309805,1.294033,1.259262,1.214031,1.174594,1.150417,1.126066,1.071678,...,0.198949,-0.241833,-0.228292,-0.409867,-0.227758,0,150,11,300,16
1,1,0.979769,0.853751,0.724747,0.620533,0.563019,0.546236,0.540058,0.531511,0.509747,...,0.061945,0.108357,0.042498,-0.119922,-0.535207,0,150,1,750,7
2,2,2.457231,2.508876,2.562549,2.610262,2.64626,2.662423,2.66341,2.662757,2.656153,...,0.389712,-0.012963,-0.117823,-0.242463,-0.520011,0,150,1,750,11
3,3,-0.816211,-0.847936,-0.773849,-0.642674,-0.511272,-0.443049,-0.422701,-0.404203,-0.357625,...,1.096083,0.919155,0.980044,0.167161,0.291327,0,150,1,450,12
4,4,0.093224,0.153045,0.230516,0.32971,0.430513,0.511403,0.596036,0.68903,0.762821,...,0.559331,0.268818,0.286077,0.244404,0.149644,0,150,1,150,22


## Creating a Librep dataset from pandas dataframes

Change the features to use other datasets.

In [5]:
# MotionSense features to select
features = [
    "userAcceleration.x",
    "userAcceleration.y",
    "userAcceleration.z",
    "rotationRate.x",
    "rotationRate.y",
    "rotationRate.z"
]

# Creating the datasets

# Train
train_dataset = PandasMultiModalDataset(
    train,
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)

# Validation
validation_dataset = PandasMultiModalDataset(
    validation,
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)

# Test
test_dataset = PandasMultiModalDataset(
    test,
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)
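
To make the role of `feature_prefixes` concrete, the sketch below selects columns by prefix with plain pandas and flattens them into one array per sample. It is an illustration under assumptions, not librep's code: the function `select_by_prefix` and the toy column names are hypothetical, and matching on `prefix + "-"` is assumed from the `name-index` column pattern shown in the dataframe above.

```python
import pandas as pd

# Toy frame: two "sensors" with three time steps each, plus a label column.
df = pd.DataFrame({
    "acc.x-0": [0.1, 0.4], "acc.x-1": [0.2, 0.5], "acc.x-2": [0.3, 0.6],
    "gyr.x-0": [1.0, 4.0], "gyr.x-1": [2.0, 5.0], "gyr.x-2": [3.0, 6.0],
    "activity code": [0, 1],
})


def select_by_prefix(frame, prefixes, label_column):
    """Return (X, y): feature columns grouped by prefix, and the labels."""
    columns = [
        col
        for prefix in prefixes
        for col in frame.columns
        if col.startswith(prefix + "-")
    ]
    return frame[columns].to_numpy(), frame[label_column].to_numpy()


X, y = select_by_prefix(df, ["acc.x", "gyr.x"], "activity code")
print(X.shape, y.shape)
```

Each row of `X` is the concatenation of the windows of all selected prefixes, which is why a sample becomes a single flat vector.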

## Inspect sample

In [6]:
# Let's print the first sample of the train dataset.
# It is a tuple: a vector of 900 elements (6 features x 150 time steps) first and the label second
x = train_dataset[0]
print(x)

(array([ 1.850310e-01,  1.323820e-01,  8.863600e-02,  8.935600e-02,
        1.302640e-01,  1.320740e-01,  1.471270e-01,  1.316310e-01,
       -1.874900e-02, -7.266600e-02,  2.075100e-02, -1.878720e-01,
       -3.334510e-01, -4.484450e-01, -1.459800e-01,  2.271810e-01,
        2.504250e-01,  2.776350e-01,  1.516580e-01, -2.107200e-02,
        5.909000e-02, -2.750900e-02,  4.955300e-02,  1.133940e-01,
        1.298460e-01,  3.934200e-02, -2.773300e-02, -4.913900e-02,
       -1.986110e-01, -3.969930e-01, -5.211840e-01, -3.820190e-01,
       -2.566590e-01, -6.048700e-02,  1.571870e-01,  2.215400e-01,
        2.895000e-01,  2.309840e-01, -1.455400e-01, -6.067200e-02,
        3.323650e-01,  5.399100e-02, -7.359100e-02, -2.317620e-01,
       -1.721370e-01, -7.184400e-02, -1.080600e-01,  5.770800e-02,
        2.087240e-01,  2.547080e-01,  2.270680e-01,  1.223900e-02,
       -8.920400e-02, -1.924300e-01, -2.729570e-01, -1.280950e-01,
       -1.683400e-02, -8.303300e-02,  1.845200e-02, -1.829600

In [7]:
# Inspecting sample
print(f"The sample 0: {x[0]}")
print(f"Shape of sample 0: {x[0].shape}")
print(f"The label of sample 0: {x[1]}")

The sample 0: [ 1.850310e-01  1.323820e-01  8.863600e-02  8.935600e-02  1.302640e-01
  1.320740e-01  1.471270e-01  1.316310e-01 -1.874900e-02 -7.266600e-02
  2.075100e-02 -1.878720e-01 -3.334510e-01 -4.484450e-01 -1.459800e-01
  2.271810e-01  2.504250e-01  2.776350e-01  1.516580e-01 -2.107200e-02
  5.909000e-02 -2.750900e-02  4.955300e-02  1.133940e-01  1.298460e-01
  3.934200e-02 -2.773300e-02 -4.913900e-02 -1.986110e-01 -3.969930e-01
 -5.211840e-01 -3.820190e-01 -2.566590e-01 -6.048700e-02  1.571870e-01
  2.215400e-01  2.895000e-01  2.309840e-01 -1.455400e-01 -6.067200e-02
  3.323650e-01  5.399100e-02 -7.359100e-02 -2.317620e-01 -1.721370e-01
 -7.184400e-02 -1.080600e-01  5.770800e-02  2.087240e-01  2.547080e-01
  2.270680e-01  1.223900e-02 -8.920400e-02 -1.924300e-01 -2.729570e-01
 -1.280950e-01 -1.683400e-02 -8.303300e-02  1.845200e-02 -1.829600e-02
  3.238000e-02  8.430000e-04 -1.690000e-03  1.066490e-01  1.603170e-01
  1.546510e-01  1.547520e-01  1.644770e-01  1.941880e-01  1.728

## Fourier Transform

In [8]:
from librep.datasets.multimodal import TransformMultiModalDataset
from librep.transforms.fft import FFT

In [9]:
fft_transform = FFT(centered = True)
transformer = TransformMultiModalDataset(transforms=[fft_transform], new_window_name_prefix="fft.")

### Use FFT in MotionSense

In [10]:
train_dataset_fft = transformer(train_dataset)
validation_dataset_fft = transformer(validation_dataset)
test_dataset_fft = transformer(test_dataset)
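
As a rough picture of what a frequency-domain view of one window looks like, the NumPy sketch below computes the magnitude spectrum of a synthetic signal. It assumes a 30 Hz sampling rate (from the title) and 150 samples per window (from the `length` column). Whether librep's `FFT(centered=True)` applies exactly this post-processing is an assumption; the sketch only shows the general technique.

```python
import numpy as np

fs = 30.0   # sampling rate in Hz (assumed from the notebook title)
n = 150     # samples per window (assumed from the `length` column)
t = np.arange(n) / fs
window = np.sin(2 * np.pi * 3.0 * t)   # synthetic 3 Hz signal

spectrum = np.abs(np.fft.fft(window))  # magnitude of each frequency bin
half = spectrum[: n // 2]              # real input: the second half mirrors the first
freqs = np.fft.fftfreq(n, d=1 / fs)[: n // 2]

peak_hz = freqs[np.argmax(half)]
print(f"Strongest component near {peak_hz:.1f} Hz")
```

For real-valued signals the spectrum is symmetric, so keeping only the first half halves the feature count without losing magnitude information.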

In [11]:
train_dataset[:][0]

array([[ 0.185031,  0.132382,  0.088636, ...,  0.522936,  0.93236 ,
         0.720122],
       [-0.367936, -0.464473, -0.331398, ...,  1.419287,  0.899386,
         0.875637],
       [ 0.268881,  0.285772,  0.109782, ..., -0.164199,  0.357488,
         0.615993],
       ...,
       [ 0.434857,  0.423192,  0.699041, ...,  1.644392,  2.491744,
         5.096907],
       [ 1.300639,  1.112505,  1.404574, ...,  0.288699,  0.259291,
         0.476109],
       [ 0.864961,  0.063222, -0.40488 , ...,  1.268209, -0.937557,
         0.557948]])

In [12]:
train_dataset_fft[:][0]

array([[ 0.4051    ,  1.02004708,  1.07645137, ...,  0.81610676,
         1.46963384,  0.38973439],
       [12.27775   ,  4.13506303,  2.19762137, ...,  2.24368189,
         3.28249292,  0.97427267],
       [ 9.830147  ,  4.4287738 ,  5.70037324, ...,  0.62088193,
         0.92075927,  0.48567778],
       ...,
       [ 1.303547  ,  4.4969612 ,  3.74262557, ...,  3.49895555,
         0.98426705,  7.3453344 ],
       [ 2.758783  ,  2.02576464,  0.89622788, ...,  3.34989765,
         2.32798646,  1.80279953],
       [ 9.720902  ,  1.55661245,  3.92530048, ...,  5.51747107,
         1.05656395,  2.57233132]])

## Train and evaluate Random Forest classifier

In [13]:
from librep.utils.workflow import SimpleTrainEvalWorkflow, MultiRunWorkflow
from librep.estimators import RandomForestClassifier
from librep.metrics.report import ClassificationReport
import yaml

reporter = ClassificationReport(use_accuracy=True, use_f1_score=True, use_classification_report=False, use_confusion_matrix=False, plot_confusion_matrix=False)
experiment = SimpleTrainEvalWorkflow(estimator=RandomForestClassifier, estimator_creation_kwags ={'n_estimators':100} , do_not_instantiate=False, do_fit=True, evaluator=reporter)
multi_run_experiment = MultiRunWorkflow(workflow=experiment, num_runs=3, debug=False)
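
For orientation, the fit-then-evaluate loop that these workflow objects set up can be sketched with scikit-learn alone. This is an assumption about the overall pattern, not a description of librep's internals: it uses `sklearn.ensemble.RandomForestClassifier` rather than the librep estimator, synthetic data rather than MotionSense, and a hypothetical helper `run_once`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)
# Synthetic stand-in data: two classes separated by a shift in the mean.
X_train = np.vstack([rng.normal(0, 1, (50, 10)), rng.normal(3, 1, (50, 10))])
y_train = np.array([0] * 50 + [1] * 50)
X_test = np.vstack([rng.normal(0, 1, (20, 10)), rng.normal(3, 1, (20, 10))])
y_test = np.array([0] * 20 + [1] * 20)


def run_once(seed):
    """Fit a fresh model and report accuracy and macro F1 on the test split."""
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1 score (macro)": f1_score(y_test, y_pred, average="macro"),
    }


# Three independent runs, mirroring num_runs=3.
runs = [run_once(seed) for seed in range(3)]
for run in runs:
    print(run)
```

Repeating the run matters mostly for estimators with internal randomness, such as a random forest; a deterministic estimator would be expected to return the same scores every time.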

In [14]:
combined_train_dset = PandasMultiModalDataset(
    pd.concat([train, validation]),
    feature_prefixes=features,
    label_columns="activity code",
    as_array=True
)


result = multi_run_experiment(combined_train_dset, test_dataset)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662058691.958451
    result:
    -   accuracy: 0.8274161735700197
        f1 score (macro): 0.8265047544211411
        f1 score (micro): 0.8274161735700197
        f1 score (weighted): 0.8283275927188983
    run id: 1
    start: 1662058685.637917
    time taken: 6.320533990859985
-   end: 1662058698.2649539
    result:
    -   accuracy: 0.814595660749507
        f1 score (macro): 0.8132132157540464
        f1 score (micro): 0.814595660749507
        f1 score (weighted): 0.8159781057449675
    run id: 2
    start: 1662058691.9584527
    time taken: 6.306501150131226
-   end: 1662058704.5732756
    result:
    -   accuracy: 0.8185404339250493
        f1 score (macro): 0.816785371802641
        f1 score (micro): 0.8185404339250493
        f1 score (weighted): 0.8202954960474576
    run id: 3
    start: 1662058698.2649558
    time taken: 6.308319807052612
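
The per-run numbers are easier to compare once aggregated. The sketch below computes the mean and sample standard deviation of a metric across runs, assuming `result` has the nested structure visible in the YAML dump above; the helper `summarize` is hypothetical, and the accuracies are copied from the three time-domain Random Forest runs.

```python
import statistics

# Accuracies copied from the three time-domain Random Forest runs above.
result = {
    "runs": [
        {"run id": 1, "result": [{"accuracy": 0.8274161735700197}]},
        {"run id": 2, "result": [{"accuracy": 0.814595660749507}]},
        {"run id": 3, "result": [{"accuracy": 0.8185404339250493}]},
    ]
}


def summarize(result, metric="accuracy"):
    """Return (mean, sample standard deviation) of a metric over all runs."""
    values = [run["result"][0][metric] for run in result["runs"]]
    return statistics.mean(values), statistics.stdev(values)


mean_acc, std_acc = summarize(result)
print(f"accuracy: {mean_acc:.4f} +/- {std_acc:.4f}")
```

The same helper can be pointed at any other metric key in the dump, for example `"f1 score (macro)"`, once those values are present in the dictionary.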



In [15]:
combined_train_dset_fft = transformer(combined_train_dset)

result = multi_run_experiment(combined_train_dset_fft, test_dataset_fft)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662058709.0840907
    result:
    -   accuracy: 0.8224852071005917
        f1 score (macro): 0.8249361596193524
        f1 score (micro): 0.8224852071005917
        f1 score (weighted): 0.8200342545818312
    run id: 1
    start: 1662058704.8035216
    time taken: 4.280569076538086
-   end: 1662058713.3284512
    result:
    -   accuracy: 0.8284023668639053
        f1 score (macro): 0.8309968376624209
        f1 score (micro): 0.8284023668639053
        f1 score (weighted): 0.8258078960653898
    run id: 2
    start: 1662058709.0840921
    time taken: 4.244359016418457
-   end: 1662058717.5715759
    result:
    -   accuracy: 0.8175542406311637
        f1 score (macro): 0.8207150680020541
        f1 score (micro): 0.8175542406311638
        f1 score (weighted): 0.8143934132602735
    run id: 3
    start: 1662058713.3284528
    time taken: 4.2431230545043945



## Train and evaluate Support Vector Machine classifier

In [16]:
#from librep.estimators import SVC
from sklearn.svm import SVC

experiment = SimpleTrainEvalWorkflow(estimator=SVC, estimator_creation_kwags ={'C':3.0, 'kernel':"rbf"} , do_not_instantiate=False, do_fit=True, evaluator=reporter)
multi_run_experiment = MultiRunWorkflow(workflow=experiment, num_runs=3, debug=False)

result = multi_run_experiment(combined_train_dset, test_dataset)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662058722.2264612
    result:
    -   accuracy: 0.6390532544378699
        f1 score (macro): 0.6133353886470502
        f1 score (micro): 0.6390532544378699
        f1 score (weighted): 0.6647711202286894
    run id: 1
    start: 1662058717.5770545
    time taken: 4.649406671524048
-   end: 1662058727.040984
    result:
    -   accuracy: 0.6390532544378699
        f1 score (macro): 0.6133353886470502
        f1 score (micro): 0.6390532544378699
        f1 score (weighted): 0.6647711202286894
    run id: 2
    start: 1662058722.226463
    time taken: 4.814520835876465
-   end: 1662058731.7785027
    result:
    -   accuracy: 0.6390532544378699
        f1 score (macro): 0.6133353886470502
        f1 score (micro): 0.6390532544378699
        f1 score (weighted): 0.6647711202286894
    run id: 3
    start: 1662058727.0409856
    time taken: 4.7375171184539795



In [17]:
result = multi_run_experiment(combined_train_dset_fft, test_dataset_fft)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662058732.6979492
    result:
    -   accuracy: 0.8648915187376726
        f1 score (macro): 0.864679230820583
        f1 score (micro): 0.8648915187376726
        f1 score (weighted): 0.8651038066547622
    run id: 1
    start: 1662058731.7835605
    time taken: 0.9143886566162109
-   end: 1662058733.6290216
    result:
    -   accuracy: 0.8648915187376726
        f1 score (macro): 0.864679230820583
        f1 score (micro): 0.8648915187376726
        f1 score (weighted): 0.8651038066547622
    run id: 2
    start: 1662058732.6979506
    time taken: 0.9310710430145264
-   end: 1662058734.5476532
    result:
    -   accuracy: 0.8648915187376726
        f1 score (macro): 0.864679230820583
        f1 score (micro): 0.8648915187376726
        f1 score (weighted): 0.8651038066547622
    run id: 3
    start: 1662058733.6290236
    time taken: 0.9186296463012695



## Train and evaluate K-Nearest Neighbors classifier

In [18]:
#from librep.estimators import KNeighborsClassifier
from sklearn.neighbors import KNeighborsClassifier

experiment = SimpleTrainEvalWorkflow(estimator=KNeighborsClassifier, estimator_creation_kwags ={'n_neighbors' :1} , do_not_instantiate=False, do_fit=True, evaluator=reporter)
multi_run_experiment = MultiRunWorkflow(workflow=experiment, num_runs=3, debug=False)

result = multi_run_experiment(combined_train_dset, test_dataset)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662058734.8791718
    result:
    -   accuracy: 0.5029585798816568
        f1 score (macro): 0.49825668094091674
        f1 score (micro): 0.5029585798816568
        f1 score (weighted): 0.5076604788223971
    run id: 1
    start: 1662058734.5530987
    time taken: 0.32607316970825195
-   end: 1662058734.9548812
    result:
    -   accuracy: 0.5029585798816568
        f1 score (macro): 0.49825668094091674
        f1 score (micro): 0.5029585798816568
        f1 score (weighted): 0.5076604788223971
    run id: 2
    start: 1662058734.8791757
    time taken: 0.07570552825927734
-   end: 1662058735.0284004
    result:
    -   accuracy: 0.5029585798816568
        f1 score (macro): 0.49825668094091674
        f1 score (micro): 0.5029585798816568
        f1 score (weighted): 0.5076604788223971
    run id: 3
    start: 1662058734.9548829
    time taken: 0.0735175609588623



In [19]:
result = multi_run_experiment(combined_train_dset_fft, test_dataset_fft)
print(yaml.dump(result, sort_keys=True, indent=4))

runs:
-   end: 1662058735.080295
    result:
    -   accuracy: 0.7889546351084813
        f1 score (macro): 0.791422475633086
        f1 score (micro): 0.7889546351084813
        f1 score (weighted): 0.7864867945838767
    run id: 1
    start: 1662058735.0344865
    time taken: 0.04580855369567871
-   end: 1662058735.11995
    result:
    -   accuracy: 0.7889546351084813
        f1 score (macro): 0.791422475633086
        f1 score (micro): 0.7889546351084813
        f1 score (weighted): 0.7864867945838767
    run id: 2
    start: 1662058735.080297
    time taken: 0.03965306282043457
-   end: 1662058735.158503
    result:
    -   accuracy: 0.7889546351084813
        f1 score (macro): 0.791422475633086
        f1 score (micro): 0.7889546351084813
        f1 score (weighted): 0.7864867945838767
    run id: 3
    start: 1662058735.119952
    time taken: 0.03855109214782715

