# Testing Dataset Loaders

This notebook shows how to load the following datasets (one loader class each):

- KuHarResampledView20HZ
- MotionSenseResampledView20HZ
- CHARMUnbalancedView
- WISDMInterpolatedUnbalancedView
- UCIHARUnbalancedView

To load a dataset:

- Pass the dataset path to one of the classes above (the `root_dir` argument). To download the dataset, set the `download` argument to `True`.
- Call the `load` method, which returns `PandasMultiModalDataset` objects.

In [1]:
import sys
sys.path.append("..")

In [2]:
from librep.datasets.har.loaders import (
    KuHarResampledView20HZ,
    MotionSenseResampledView20HZ,
    CHARMUnbalancedView,
    WISDMInterpolatedUnbalancedView,
    UCIHARUnbalancedView,
)



## KuHar

In [3]:
# KuHar Loader
loader = KuHarResampledView20HZ(
    "../data/views/KuHar/resampled_view_20Hz", download=False
)

# Print the readme (optional)
loader.print_readme()

# Balanced KuHar View Resampled to 20Hz

This view contains train, validation and test subsets in the following proportions:
- Train: 70% of samples
- Validation: 10% of samples
- Test: 20% of samples

After splitting, each subset was balanced on the activity code column, that is, every activity has the same number of samples within a subset.

## Activities:
- 0: Sit (185 train, 6 validation, 21 test)
- 1: Stand (185 train, 6 validation, 21 test)
- 2: Walk (185 train, 6 validation, 21 test)
- 3: Stair-up (185 train, 6 validation, 21 test)
- 4: Stair-down (185 train, 6 validation, 21 test)
- 5: Run (185 train, 6 validation, 21 test)
- 6: Talk-sit (185 train, 6 validation, 21 test)
- 7: Talk-stand (185 train, 6 validation, 21 test)
- 8: Stand-sit (185 train, 6 validation, 21 test)
- 9: Lay (185 train, 6 validation, 21 test)
- 10: Lay-stand (185 train, 6 validation, 21 test)
- 11: Pick (185 train, 6 validation, 21 test)
- 12: Jump (185 train, 6 validation, 21 test)
- 13: Push-up (185 train, 6 validation, 21 test)
- 14: Sit-up (185 train, 6 validation, 21 test)
- 15: Walk-backwards (185 train, 6 validation, 21 test)
- 16: Walk-circle (185 train, 6 validation, 21 test)
- 17: Table-tennis (185 train, 6 validation, 21 test)

## Users
- 62 users train dataset: 1003 (29 samples), 1004 (58 samples), 1005 (25 samples), 1008 (71 samples), 1011 (24 samples), 1013 (54 samples), 1014 (120 samples), 1015 (56 samples), 1016 (39 samples), 1017 (24 samples), 1018 (35 samples), 1020 (32 samples), 1021 (39 samples), 1022 (102 samples), 1023 (63 samples), 1024 (117 samples), 1025 (39 samples), 1026 (89 samples), 1027 (64 samples), 1029 (39 samples), 1031 (42 samples), 1032 (21 samples), 1033 (18 samples), 1034 (138 samples), 1035 (7 samples), 1037 (67 samples), 1038 (48 samples), 1039 (103 samples), 1040 (92 samples), 1041 (96 samples), 1042 (85 samples), 1043 (87 samples), 1046 (82 samples), 1047 (37 samples), 1048 (38 samples), 1049 (36 samples), 1051 (28 samples), 1053 (29 samples), 1054 (8 samples), 1055 (36 samples), 1058 (29 samples), 1060 (31 samples), 1061 (33 samples), 1063 (27 samples), 1064 (19 samples), 1067 (16 samples), 1068 (32 samples), 1069 (25 samples), 1070 (33 samples), 1073 (15 samples), 1074 (14 samples), 1075 (17 samples), 1076 (31 samples), 1078 (20 samples), 1079 (26 samples), 1081 (51 samples), 1083 (30 samples), 1084 (29 samples), 1085 (29 samples), 1087 (32 samples), 1090 (42 samples), 1101 (532 samples).
- 9 users validation dataset: 1002 (58 samples), 1006 (5 samples), 1019 (6 samples), 1062 (6 samples), 1065 (3 samples), 1071 (13 samples), 1072 (1 samples), 1082 (10 samples), 1086 (6 samples).
- 18 users test dataset: 1001 (12 samples), 1007 (19 samples), 1009 (8 samples), 1010 (6 samples), 1028 (10 samples), 1030 (29 samples), 1036 (45 samples), 1044 (66 samples), 1045 (58 samples), 1050 (10 samples), 1052 (14 samples), 1056 (22 samples), 1057 (10 samples), 1066 (15 samples), 1077 (23 samples), 1080 (10 samples), 1088 (10 samples), 1089 (11 samples).

**NOTE**: Each subset contains samples from distinct users, that is, the samples of a given user belong exclusively to one of the three subsets.
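The note above can be checked from the readme text itself. The sketch below assumes the `user (N samples)` format printed above, parses the KuHar validation and test "Users" lines with a regular expression, and intersects the user IDs. `parse_users` is a hypothetical helper, not part of librep:

```python
import re

# "Users" lines copied from the KuHar readme printed above.
VALIDATION_LINE = (
    "1002 (58 samples), 1006 (5 samples), 1019 (6 samples), 1062 (6 samples), "
    "1065 (3 samples), 1071 (13 samples), 1072 (1 samples), 1082 (10 samples), "
    "1086 (6 samples)."
)
TEST_LINE = (
    "1001 (12 samples), 1007 (19 samples), 1009 (8 samples), 1010 (6 samples), "
    "1028 (10 samples), 1030 (29 samples), 1036 (45 samples), 1044 (66 samples), "
    "1045 (58 samples), 1050 (10 samples), 1052 (14 samples), 1056 (22 samples), "
    "1057 (10 samples), 1066 (15 samples), 1077 (23 samples), 1080 (10 samples), "
    "1088 (10 samples), 1089 (11 samples)."
)

# Matches "user_id (count samples)"; "samples?" also accepts the singular form.
USER_PATTERN = re.compile(r"(\d+) \((\d+) samples?\)")


def parse_users(line):
    """Return {user_id: sample_count} parsed from a readme "Users" line."""
    return {int(user): int(count) for user, count in USER_PATTERN.findall(line)}


validation_users = parse_users(VALIDATION_LINE)
test_users = parse_users(TEST_LINE)

# No user may appear in more than one subset.
shared = validation_users.keys() & test_users.keys()
print(len(validation_users), sum(validation_users.values()))  # 9 108
print(len(test_users), sum(test_users.values()))  # 18 378
print(shared)  # set()
```

The sample totals also agree with the activity list: 108 = 18 activities × 6 validation samples, and 378 = 18 × 21 test samples.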



In [4]:
# Load the dataset
# If concat_train_validation is true, return a tuple (train+validation, test)
train_val, test = loader.load(concat_train_validation=True)
train_val, test

(PandasMultiModalDataset: samples=3438, features=360, no. window=6,
 PandasMultiModalDataset: samples=378, features=360, no. window=6)

## MotionSense

In [5]:
# MotionSense Loader
loader = MotionSenseResampledView20HZ(
    "../data/views/MotionSense/resampled_view_20Hz", download=False
)

# Print the readme (optional)
loader.print_readme()

# Resampled to 20Hz MotionSense View

This view contains train, validation and test subsets in the following proportions:
- Train: 70% of samples
- Validation: 10% of samples
- Test: 20% of samples

After splitting, each subset was balanced on the activity code column, that is, every activity has the same number of samples within a subset.

## Activities:
- sit: 0 (569 train, 101 validation, 170 test)
- std: 1 (569 train, 101 validation, 170 test)
- wlk: 2 (569 train, 101 validation, 170 test)
- ups: 3 (569 train, 101 validation, 170 test)
- dws: 4 (569 train, 101 validation, 170 test)
- jog: 5 (569 train, 101 validation, 170 test)

## Users
- 16 users train dataset: 1 (218 samples), 2 (219 samples), 5 (185 samples), 6 (218 samples), 8 (233 samples), 9 (202 samples), 10 (218 samples), 11 (211 samples), 12 (197 samples), 13 (183 samples), 15 (208 samples), 16 (246 samples), 17 (209 samples), 21 (254 samples), 22 (200 samples), 23 (213 samples).
- 3 users validation dataset: 4 (190 samples), 7 (211 samples), 20 (205 samples).
- 5 users test dataset: 3 (222 samples), 14 (183 samples), 18 (223 samples), 19 (233 samples), 24 (159 samples).

**NOTE**: Each subset contains samples from distinct users, that is, the samples of a given user belong exclusively to one of the three subsets.
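Neither the KuHar nor the MotionSense readme says how the balancing was done. One common technique, assumed here, is random undersampling: within a split, every activity is reduced to the size of its rarest activity. A minimal sketch on toy data; `balance_by_activity` is a hypothetical helper, not a librep function:

```python
import random
from collections import defaultdict


def balance_by_activity(samples, seed=0):
    """Randomly undersample so every activity keeps the same number of samples.

    `samples` is a list of (activity_code, payload) pairs. Hypothetical helper
    illustrating undersampling; it is not the code used to build these views.
    """
    by_activity = defaultdict(list)
    for activity, payload in samples:
        by_activity[activity].append((activity, payload))
    # The smallest activity group sets how many samples each activity keeps.
    target = min(len(group) for group in by_activity.values())
    rng = random.Random(seed)  # fixed seed makes the selection reproducible
    balanced = []
    for activity in sorted(by_activity):
        # Sample without replacement from this activity's group.
        balanced.extend(rng.sample(by_activity[activity], target))
    return balanced


# Toy split: activity 0 has 5 windows, activity 1 has 3, activity 2 has 4.
toy = (
    [(0, f"w{i}") for i in range(5)]
    + [(1, f"w{i}") for i in range(3)]
    + [(2, f"w{i}") for i in range(4)]
)
balanced = balance_by_activity(toy)
print([activity for activity, _ in balanced])  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Undersampling discards data from the larger activities in exchange for equal per-activity counts, like the ones listed in both readmes.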



In [6]:
# Load the dataset
# If concat_train_validation is true, return a tuple (train+validation, test)
train_val, test = loader.load(concat_train_validation=True)
train_val, test

(PandasMultiModalDataset: samples=4020, features=360, no. window=6,
 PandasMultiModalDataset: samples=1020, features=360, no. window=6)
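Since `concat_train_validation=True` merges train and validation, the sizes printed for the two balanced views follow directly from their readmes: activities × (train + validation) and activities × test. A small arithmetic check with the per-activity counts copied from the KuHar and MotionSense readmes; `expected_sizes` is a hypothetical helper:

```python
# (activities, train, validation, test) per-activity counts from the readmes.
BALANCED_VIEWS = {
    "KuHar": (18, 185, 6, 21),
    "MotionSense": (6, 569, 101, 170),
}


def expected_sizes(n_activities, train, validation, test):
    """(train+validation, test) sample counts for a balanced view."""
    return n_activities * (train + validation), n_activities * test


for name, counts in BALANCED_VIEWS.items():
    print(name, expected_sizes(*counts))
# KuHar (3438, 378)
# MotionSense (4020, 1020)
```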

## CHARM

In [7]:
# CHARM Loader
loader = CHARMUnbalancedView(
    "../data/views/CHARM/unbalanced_view_train_test-v1", download=False
)

# Print the readme (optional)
loader.print_readme()

# Unbalanced CHARM View (a.k.a. V0)

This view contains only the train and test files for the [CHARM dataset](https://zenodo.org/record/4642560) (70% of the samples for train, 30% for test).
The dataset was sampled at 20Hz.

## Activities:

- 0: Sitting in a Chair (258 train, 105 test)
- 1: Sitting in a Couch (260 train, 105 test)
- 2: Standing (105 train, 36 test)
- 3: Lying up (237 train, 85 test)
- 4: Lying side (232 train, 87 test)
- 5: Device on surface (155 train, 65 test)
- 6: Walking (226 train, 64 test)
- 7: Running (237 train, 84 test)
- 8: Walking Upstairs (113 train, 33 test)
- 9: Walking Downstairs (229 train, 83 test)

## Users

There are 30 users in total. Each sample is from a single user.

- Samples from users 0-19 are in the train file
- Samples from users 21-30 are in the test file


In [8]:
# Load the dataset
# If concat_train_validation is true, return a tuple (train+validation, test)
train_val, test = loader.load(concat_train_validation=True)
train_val, test

(PandasMultiModalDataset: samples=2052, features=360, no. window=6,
 PandasMultiModalDataset: samples=747, features=360, no. window=6)
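CHARM is unbalanced, so the per-activity counts differ, but they should still add up to the sizes returned by `load`. Summing them also gives the actual train share, about 73% rather than exactly the nominal 70%, presumably because the split is made by user rather than by sample (an inference from the readme's user ranges). The counts are copied from the readme above:

```python
# (train, test) counts per activity, copied from the CHARM readme above.
CHARM_COUNTS = {
    "Sitting in a Chair": (258, 105),
    "Sitting in a Couch": (260, 105),
    "Standing": (105, 36),
    "Lying up": (237, 85),
    "Lying side": (232, 87),
    "Device on surface": (155, 65),
    "Walking": (226, 64),
    "Running": (237, 84),
    "Walking Upstairs": (113, 33),
    "Walking Downstairs": (229, 83),
}

train_total = sum(train for train, _ in CHARM_COUNTS.values())
test_total = sum(test for _, test in CHARM_COUNTS.values())
# Fraction of all samples that ended up in the train file.
train_share = train_total / (train_total + test_total)
print(train_total, test_total)  # 2052 747
print(f"{train_share:.1%}")  # 73.3%
```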

## WISDM

In [9]:
# WISDM Loader
loader = WISDMInterpolatedUnbalancedView(
    "../data/views/WISDM/interpolated_unbalanced_view_train_test-v1", download=False
)

# Print the readme (optional)
loader.print_readme()

# Interpolated unbalanced WISDM View (a.k.a. V2)

This view contains only the train and test files for the [WISDM dataset](https://archive.ics.uci.edu/ml/datasets/WISDM+Smartphone+and+Smartwatch+Activity+and+Biometrics+Dataset) (70% of the samples for train, 30% for test).
The dataset was sampled at 20Hz and interpolated using the cubic spline method because the original sampling was not stable.

## Activities:

- 0: Walking (2188 train, 886 test)
- 1: Jogging (2070 train, 887 test)
- 2: Stairs (2187 train, 827 test)
- 3: Sitting (2189 train, 886 test)
- 4: Standing (2189 train, 887 test)


## Users

There are 51 users in total. Each sample is from a single user.

- Samples from users 1600-1635 are in the train file
- Samples from users 1631-1650 are in the test file
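The readme says the WISDM view is at 20Hz and was interpolated with cubic splines because the original sampling was not stable, without further details. The sketch below illustrates the idea under stated assumptions: irregular timestamps in seconds, a *natural* cubic spline (zero second derivative at both ends; the boundary condition used for this view is not stated), and evaluation on a uniform 20 Hz grid. It is pure Python to stay dependency-free; in practice `scipy.interpolate.CubicSpline` is the usual choice. `natural_cubic_spline`, `resample_uniform`, and the sample timestamps are hypothetical:

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a callable evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing. "Natural" means the second derivative is
    zero at both end points (an assumed boundary condition).
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two points and equal-length inputs")
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    m = [0.0] * n  # second derivatives at the knots; m[0] = m[-1] = 0
    if n > 2:
        # Tridiagonal system for the interior second derivatives.
        sub, diag, sup, rhs = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
        for i in range(1, n - 1):
            sub[i] = h[i - 1]
            diag[i] = 2.0 * (h[i - 1] + h[i])
            sup[i] = h[i]
            rhs[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
        # Thomas algorithm: forward elimination, then back substitution.
        for i in range(2, n - 1):
            w = sub[i] / diag[i - 1]
            diag[i] -= w * sup[i - 1]
            rhs[i] -= w * rhs[i - 1]
        m[n - 2] = rhs[n - 2] / diag[n - 2]
        for i in range(n - 3, 0, -1):
            m[i] = (rhs[i] - sup[i] * m[i + 1]) / diag[i]

    def evaluate(x):
        # Index of the interval [xs[i], xs[i + 1]] containing x (clamped).
        i = min(max(bisect_right(xs, x) - 1, 0), n - 2)
        width = h[i]
        a = (xs[i + 1] - x) / width
        b = (x - xs[i]) / width
        return (a * ys[i] + b * ys[i + 1]
                + ((a ** 3 - a) * m[i] + (b ** 3 - b) * m[i + 1]) * width ** 2 / 6.0)

    return evaluate


def resample_uniform(timestamps, values, rate_hz=20):
    """Evaluate a spline fit of irregular samples on a uniform rate_hz grid."""
    spline = natural_cubic_spline(timestamps, values)
    # Small epsilon guards against float round-off dropping the last grid point.
    count = int((timestamps[-1] - timestamps[0]) * rate_hz + 1e-9) + 1
    grid = [timestamps[0] + k / rate_hz for k in range(count)]
    return grid, [spline(t) for t in grid]


# Irregular timestamps (seconds) for a hypothetical accelerometer axis.
t = [0.0, 0.03, 0.11, 0.14, 0.22, 0.31, 0.37, 0.45, 0.5]
x = [2.0 * s + 1.0 for s in t]  # linear signal, reproduced exactly by a natural spline
grid, resampled = resample_uniform(t, x, rate_hz=20)
print(len(grid))  # 11
```

A linear signal makes a convenient sanity check: every right-hand side of the tridiagonal system is zero, so the spline collapses to linear interpolation and the resampled values equal `2 * t + 1` on the grid.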


In [10]:
# Load the dataset
# If concat_train_validation is true, return a tuple (train+validation, test)
train_val, test = loader.load(concat_train_validation=True)
train_val, test

(PandasMultiModalDataset: samples=10823, features=360, no. window=6,
 PandasMultiModalDataset: samples=4373, features=360, no. window=6)

## UCI-HAR

In [11]:
# UCI-HAR Loader
loader = UCIHARUnbalancedView(
    "../data/views/UCI-HAR/unbalanced_view_train_test-v1", download=False
)

# Print the readme (optional)
loader.print_readme()

# Unbalanced UCI-HAR View (a.k.a. V0)

This view contains only the train and test files for the [UCI-HAR dataset](https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones#) (70% of the samples for train, 30% for test).
The dataset was sampled at 50Hz.

## Activities:

- 1: Walking (506 train, 204 test)
- 2: Walking Upstairs (439 train, 189 test)
- 3: Walking Downstairs (395 train, 173 test)
- 4: Sitting (544 train, 204 test)
- 5: Standing (575 train, 227 test)
- 6: Laying (590 train, 227 test)


In [12]:
# Load the dataset
# If concat_train_validation is true, return a tuple (train+validation, test)
train_val, test = loader.load(concat_train_validation=True)
train_val, test

(PandasMultiModalDataset: samples=3049, features=900, no. window=6,
 PandasMultiModalDataset: samples=1224, features=900, no. window=6)
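In every output above, the feature count divides evenly by the 6 windows. Assuming each window is one sensor axis stored as a contiguous time series (an interpretation; the output does not say so), the number of steps per window divided by the sampling rate gives the window length. `window_seconds` is a hypothetical helper:

```python
def window_seconds(features, windows, rate_hz):
    """Steps per window and window duration, assuming an even feature split."""
    if features % windows:
        raise ValueError("features do not split evenly across windows")
    steps = features // windows
    return steps, steps / rate_hz


print(window_seconds(360, 6, 20))  # (60, 3.0)  the 20Hz views
print(window_seconds(900, 6, 50))  # (150, 3.0)  UCI-HAR at 50Hz
```

Under that assumption, both the 20Hz views and UCI-HAR cover 3 seconds per window, so combining UCI-HAR with the other views would presumably require resampling it to 20Hz (150 to 60 steps per window).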