# Testing Dataset Loaders

This notebook shows how to load the following dataset views (loader classes):

- KuHar_BalancedView20HzMotionSenseEquivalent
- MotionSense_BalancedView20HZ
- CHARM_BalancedView20Hz
- WISDM_UnbalancedView20Hz
- UCIHAR_UnbalancedView20Hz

To load a dataset, you must:

- Pass the dataset path to one of the classes above (`root_dir` argument). To download the dataset first, set the `download` argument to `True`.
- Call the loader's `load` method. It returns `PandasMultiModalDataset` objects.
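
The two steps above can be wrapped in a small helper. This is a minimal sketch, not part of librep: `load_view` is a hypothetical name, and it relies only on the constructor arguments, the `load` keywords, and the `standard_label` attribute that appear in the cells of this notebook.

```python
def load_view(loader_cls, root_dir, download=False, use_standard_labels=True):
    """Instantiate a loader class and return (train+validation, test).

    Hypothetical helper: it assumes the loader exposes the interface used
    in this notebook (root_dir/download constructor arguments, a
    standard_label attribute, and a load() method).
    """
    loader = loader_cls(root_dir=root_dir, download=download)
    load_kwargs = {"concat_train_validation": True}
    if use_standard_labels:
        # Use the standardized activity codes shared across datasets
        load_kwargs["label"] = loader.standard_label
    return loader.load(**load_kwargs)
```

For example, `train_val, test = load_view(MotionSense_BalancedView20HZ, "../../data/views/MotionSense/balanced_20Hz-v1")` would mirror the MotionSense cells further down.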

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import sys
sys.path.append("..")

In [3]:
from librep.datasets.har.loaders import (
    KuHar_BalancedView20HzMotionSenseEquivalent,
    MotionSense_BalancedView20HZ,
    ExtraSensorySense_UnbalancedView20HZ,
    CHARM_BalancedView20Hz,
    WISDM_UnbalancedView20Hz,
    UCIHAR_UnbalancedView20Hz
)



## KuHar

In [14]:
loader = KuHar_BalancedView20HzMotionSenseEquivalent(
    root_dir="../../data/views/KuHar/balanced_20Hz_motionsense_equivalent-v1",
    download=False)

loader.print_readme()

# Balanced KuHar View Resampled to 20Hz

This is a view from [KuHar v5](https://data.mendeley.com/datasets/45f952y38r/5) that was split into 3s windows and resampled to 20Hz using the [FFT method](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.resample.html#scipy.signal.resample).

The data was first split into three sets: train, validation and test, with the following proportions:
- Train: 70% of samples
- Validation: 10% of samples
- Test: 20% of samples

After the split, each set was balanced with respect to the activity code column, that is, each subset has the same number of samples per activity.

**NOTE**: Each subset contains samples from distinct users, that is, the samples of any one user belong exclusively to one of the three subsets.

## Activity codes
- 0: stair down (485 train, 34 validation, 41 test) 
- 1: stair up (485 train, 34 validation, 41 test) 
- 2: sit (485 train, 34 validation, 41 test) 
- 3: stand (485 train, 34 validation, 41 test) 
- 4: walk (485 train, 34 validation, 41 test) 
- 5: run (485 train, 34 validation, 41 test) 
 

## Standardized activity codes
- 0: sit (485 train, 34 validation, 41 test) 
- 1: stand (485 train, 34 validation, 41 test) 
- 2: walk (485 train, 34 validation, 41 test) 
- 3: stair up (485 train, 34 validation, 41 test) 
- 4: stair down (485 train, 34 validation, 41 test) 
- 5: run (485 train, 34 validation, 41 test) 
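
The README above says each 3 s window was resampled to 20 Hz with SciPy's FFT-based `scipy.signal.resample`. Here is a minimal sketch of that step on a synthetic signal. The 100 Hz input rate and the sine wave are assumptions for illustration, not values taken from the dataset.

```python
import numpy as np
from scipy.signal import resample

WINDOW_SECONDS = 3
SOURCE_HZ = 100  # assumed original sampling rate (illustrative only)
TARGET_HZ = 20

# Synthetic 3 s window: a 1 Hz sine sampled at the assumed source rate
t = np.arange(WINDOW_SECONDS * SOURCE_HZ) / SOURCE_HZ
window = np.sin(2 * np.pi * 1.0 * t)

# FFT-based resampling down to the target number of samples
resampled = resample(window, WINDOW_SECONDS * TARGET_HZ)

print(len(window), len(resampled))  # 300 60
```

Each resampled window therefore holds 60 samples per sensor channel.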
      




In [11]:
train_val, test = loader.load(concat_train_validation=True, label=loader.standard_label)
train_val

PandasMultiModalDataset: samples=3114, features=360, no. window=6, label_columns='standard activity code'
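
The printed shape can be cross-checked against the README by hand. With `concat_train_validation=True`, the first dataset should hold the train plus validation samples of all six classes, and 3 s at 20 Hz gives 60 samples per window. Reading `no. window=6` as six sensor channels of 60 samples each is an assumption here, though it matches `features=360`.

```python
classes = 6
train_per_class, validation_per_class = 485, 34

# Train and validation are concatenated into one dataset
samples = classes * (train_per_class + validation_per_class)

window_seconds, sample_rate_hz, windows = 3, 20, 6
features = windows * window_seconds * sample_rate_hz

print(samples, features)  # 3114 360
```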

In [12]:
loader.standard_activity_codes

{0: 'sit', 1: 'stand', 2: 'walk', 3: 'stair up', 4: 'stair down', 5: 'run'}
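
The README lists the same six activities under both numbering schemes, so the correspondence between KuHar's native codes and the standardized ones can be rebuilt by matching names. This small sketch uses only the two lists printed above. The loader may well do this differently internally.

```python
# Native KuHar codes, as printed by the README
native = {0: "stair down", 1: "stair up", 2: "sit", 3: "stand", 4: "walk", 5: "run"}
# Standardized codes, as returned by loader.standard_activity_codes
standard = {0: "sit", 1: "stand", 2: "walk", 3: "stair up", 4: "stair down", 5: "run"}

# Invert the standard mapping, then translate each native code by name
name_to_standard = {name: code for code, name in standard.items()}
native_to_standard = {code: name_to_standard[name] for code, name in native.items()}

print(native_to_standard)  # {0: 4, 1: 3, 2: 0, 3: 1, 4: 2, 5: 5}
```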

## MotionSense

In [15]:
# MotionSense Loader
loader = MotionSense_BalancedView20HZ(
    root_dir="../../data/views/MotionSense/balanced_20Hz-v1", 
    download=False
)

# Print the readme (optional)
loader.print_readme()

# Balanced MotionSense View Resampled to 20Hz

This is a view from [KuHar v5](https://data.mendeley.com/datasets/45f952y38r/5) that was split into 3s windows and resampled to 20Hz using the [FFT method](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.resample.html#scipy.signal.resample).

The data was first split into three sets: train, validation and test, with the following proportions:
- Train: 70% of samples
- Validation: 10% of samples
- Test: 20% of samples

After the split, each set was balanced with respect to the activity code column, that is, each subset has the same number of samples per activity.

**NOTE**: Each subset contains samples from distinct users, that is, the samples of any one user belong exclusively to one of the three subsets.

## Activity codes
- 0: downstairs (569 train, 101 validation, 170 test) 
- 1: upstairs (569 train, 101 validation, 170 test) 
- 2: sitting (569 train, 101 validation, 170 test) 
- 3: standing (569 train, 101 validation, 170 test) 
- 4: walking (569 train, 101 validation, 170 test) 
- 5: jogging (569 train, 101 validation, 170 test) 
 

## Standardized activity codes
- 0: sit (569 train, 101 validation, 170 test) 
- 1: stand (569 train, 101 validation, 170 test) 
- 2: walk (569 train, 101 validation, 170 test) 
- 3: stair up (569 train, 101 validation, 170 test) 
- 4: stair down (569 train, 101 validation, 170 test) 
- 5: run (569 train, 101 validation, 170 test) 
      




In [17]:
# Load the dataset
# If concat_train_validation is true, return a tuple (train+validation, test)
train_val, test = loader.load(concat_train_validation=True, label=loader.standard_label)
train_val, test

(PandasMultiModalDataset: samples=4020, features=360, no. window=6, label_columns='standard activity code',
 PandasMultiModalDataset: samples=1020, features=360, no. window=6, label_columns='standard activity code')
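
Because the subsets are balanced after splitting, the realized split ratios can drift from the nominal 70/10/20. A quick check from the per-class counts printed in the READMEs (`split_shares` is a hypothetical helper name):

```python
def split_shares(train, validation, test):
    """Return each subset's share of the total, as percentages."""
    total = train + validation + test
    return tuple(round(100 * n / total, 1) for n in (train, validation, test))

print(split_shares(569, 101, 170))  # MotionSense per-class counts
print(split_shares(485, 34, 41))    # KuHar per-class counts
```

MotionSense stays close to nominal at about 67.7/12.0/20.2, while KuHar ends up near 86.6/6.1/7.3.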

## CHARM

In [18]:
# CHARM Loader
loader = CHARM_BalancedView20Hz(
    "../../data/views/CHARM/balanced_20Hz_train_test-v1", 
    download=False
)

# Print the readme (optional)
loader.print_readme()

# Balanced CHARM View

This is a view from the [CHARM dataset](https://zenodo.org/record/4642560) that was split into 3s windows. The sample rate was 20Hz.

The data was first split into two sets: train and test, with the following proportions:
- Train: 70% of samples
- Test: 30% of samples

After the split, each set was balanced with respect to the activity code column, that is, each subset has the same number of samples per activity.

**NOTE**: Each subset contains samples from distinct users, that is, the samples of any one user belong exclusively to one of the two subsets.

## Activity codes
- 0: sitting on a chair (105 train, 0 validation, 33 test) 
- 1: sitting on a couch (105 train, 0 validation, 33 test) 
- 2: standing (105 train, 0 validation, 33 test) 
- 6: walking (105 train, 0 validation, 33 test) 
- 7: running (105 train, 0 validation, 33 test) 
- 8: walking upstairs (105 train, 0 validation, 33 test) 
- 9: walking downstairs (105 train, 0 validation, 33 test) 
 

## Standardized activity codes
- 0: sit (210 train, 0 validation, 66 test) 
- 1: stand (105 train, 0 validation, 33 test) 
- 2: walk (105 train, 0 validation, 33 test) 
- 3: stair up (105 train, 0 validation, 33 test) 
- 4: stair down (105 train, 0 validation, 33 test) 
- 5: run (105 train, 0 validation, 33 test) 
      




In [19]:
# Load the dataset
# If concat_train_validation is true, return a tuple (train+validation, test)
train_val, test = loader.load(concat_train_validation=True, label=loader.standard_label)
train_val, test

(PandasMultiModalDataset: samples=735, features=360, no. window=6, label_columns='standard activity code',
 PandasMultiModalDataset: samples=231, features=360, no. window=6, label_columns='standard activity code')
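
Note that the standardized labels merge CHARM's two sitting activities into a single `sit` class, which is why that class has twice as many samples as the others (210 train, 66 test). Here is a sketch of that many-to-one mapping over the train counts. The code pairs are inferred from the activity names in the README, not read from the library.

```python
from collections import Counter

# Native CHARM code -> standardized code, matched by activity name
native_to_standard = {0: 0, 1: 0, 2: 1, 6: 2, 7: 5, 8: 3, 9: 4}
train_per_native_code = 105

standard_counts = Counter()
for native_code, standard_code in native_to_standard.items():
    standard_counts[standard_code] += train_per_native_code

print(dict(sorted(standard_counts.items())))
# {0: 210, 1: 105, 2: 105, 3: 105, 4: 105, 5: 105}
```

So a view balanced on the native codes is no longer balanced once the standardized labels are used.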

## WISDM

In [20]:
# WISDM Loader
loader = WISDM_UnbalancedView20Hz(
    "../../data/views/WISDM/unbalanced_20Hz_train_test-v1", 
    download=False
)

# Print the readme (optional)
loader.print_readme()

# Unbalanced WISDM View Resampled to 20Hz

This view contains only the train and test files for the [WISDM dataset](https://archive.ics.uci.edu/ml/datasets/WISDM+Smartphone+and+Smartwatch+Activity+and+Biometrics+Dataset) (70% of samples for train and 30% for test).
The dataset was sampled at 20Hz and interpolated using the cubic spline method due to unstable sampling.

## Activity codes
- 0: walking (2188 train, 0 validation, 886 test) 
- 1: jogging (2070 train, 0 validation, 887 test) 
- 2: stairs (2187 train, 0 validation, 827 test) 
- 3: sitting (2189 train, 0 validation, 886 test) 
- 4: standing (2189 train, 0 validation, 887 test) 
 

## Standardized activity codes
- 0: sit (2189 train, 0 validation, 886 test) 
- 1: stand (2189 train, 0 validation, 887 test) 
- 2: walk (2188 train, 0 validation, 886 test) 
- 5: run (2070 train, 0 validation, 887 test) 
- 6: stair up and down (2187 train, 0 validation, 827 test) 
      




In [21]:
# Load the dataset
# If concat_train_validation is true, return a tuple (train+validation, test)
train_val, test = loader.load(concat_train_validation=True, label=loader.standard_label)
train_val, test

(PandasMultiModalDataset: samples=10823, features=360, no. window=6, label_columns='standard activity code',
 PandasMultiModalDataset: samples=4373, features=360, no. window=6, label_columns='standard activity code')
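
The README attributes the interpolation to unstable sampling in the raw WISDM recordings. Here is a minimal sketch of that idea with `scipy.interpolate.CubicSpline`: jittered timestamps are interpolated onto a uniform 20 Hz grid. The synthetic signal and the jitter size are assumptions for illustration. This is not the code used to build the view.

```python
import numpy as np
from scipy.interpolate import CubicSpline

rng = np.random.default_rng(0)
TARGET_HZ = 20

# Irregular timestamps: nominally 20 Hz over 3 s, with small jitter (assumed)
nominal = np.arange(0, 3.0, 1 / TARGET_HZ)
timestamps = nominal + rng.uniform(-0.01, 0.01, size=nominal.size)
values = np.sin(2 * np.pi * 0.5 * timestamps)  # synthetic sensor axis

# Fit a cubic spline, then evaluate it on a uniform grid inside the observed range
spline = CubicSpline(timestamps, values)
uniform_t = np.arange(timestamps[0], timestamps[-1], 1 / TARGET_HZ)
uniform_values = spline(uniform_t)

print(uniform_values.shape)
```

The jitter is kept smaller than half the nominal spacing so the timestamps stay strictly increasing, which `CubicSpline` requires.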

## UCI-HAR

In [24]:
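# UCI-HAR Loader
# NOTE: root_dir below points at the CHARM view, so the readme and sample
# counts printed by this cell are CHARM's. Point it at the UCI-HAR view
# directory to load UCI-HAR data.
loader = UCIHAR_UnbalancedView20Hz(
    "../../data/views/CHARM/balanced_20Hz_train_test-v1", 
    download=False
)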

# Print the readme (optional)
loader.print_readme()

# Balanced CHARM View

This is a view from the [CHARM dataset](https://zenodo.org/record/4642560) that was split into 3s windows. The sample rate was 20Hz.

The data was first split into two sets: train and test, with the following proportions:
- Train: 70% of samples
- Test: 30% of samples

After the split, each set was balanced with respect to the activity code column, that is, each subset has the same number of samples per activity.

**NOTE**: Each subset contains samples from distinct users, that is, the samples of any one user belong exclusively to one of the two subsets.

## Activity codes
- 0: sitting on a chair (105 train, 0 validation, 33 test) 
- 1: sitting on a couch (105 train, 0 validation, 33 test) 
- 2: standing (105 train, 0 validation, 33 test) 
- 6: walking (105 train, 0 validation, 33 test) 
- 7: running (105 train, 0 validation, 33 test) 
- 8: walking upstairs (105 train, 0 validation, 33 test) 
- 9: walking downstairs (105 train, 0 validation, 33 test) 
 

## Standardized activity codes
- 0: sit (210 train, 0 validation, 66 test) 
- 1: stand (105 train, 0 validation, 33 test) 
- 2: walk (105 train, 0 validation, 33 test) 
- 3: stair up (105 train, 0 validation, 33 test) 
- 4: stair down (105 train, 0 validation, 33 test) 
- 5: run (105 train, 0 validation, 33 test) 
      




In [25]:
# Load the dataset
# If concat_train_validation is true, return a tuple (train+validation, test)
train_val, test = loader.load(concat_train_validation=True)
train_val, test

(PandasMultiModalDataset: samples=735, features=360, no. window=6, label_columns='activity code',
 PandasMultiModalDataset: samples=231, features=360, no. window=6, label_columns='activity code')