[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/open-pack/openpack-torch/blob/main/examples/notebooks/U-Net_Change-Input-Data.ipynb)

# U-Net | Change Input Data

This is a tutorial for the [OpenPack Challenge 2022](https://open-pack.github.io/challenge2022/).

This notebook explains how to change the input sensor modality for U-Net.
[U-Net_Train-Model-and-Make-Submission-File.ipynb](./U-Net_Train-Model-and-Make-Submission-File.ipynb) uses only the acceleration data from atr02 (left wrist).
Adding other sensors may improve the scores.

## [0] Initial Setup

### 0-1: Download Code and Install `openpack-torch`
NOTE: You can also install `openpack-torch` from PyPI with `pip install openpack-torch`.

In [1]:
! cd /content && git clone https://github.com/open-pack/openpack-torch.git

Cloning into 'openpack-torch'...
remote: Enumerating objects: 1249, done.
remote: Counting objects: 100% (100/100), done.
remote: Compressing objects: 100% (41/41), done.
remote: Total 1249 (delta 68), reused 59 (delta 59), pack-reused 1149
Receiving objects: 100% (1249/1249), 55.56 MiB | 23.18 MiB/s, done.
Resolving deltas: 100% (511/511), done.
Updating files: 100% (163/163), done.


In [2]:
! pip install openpack-torch

Collecting openpack-torch
  Downloading openpack_torch-1.0.1-py3-none-any.whl (29 kB)
Collecting hydra-core<2.0.0,>=1.3.1 (from openpack-torch)
  Downloading hydra_core-1.3.2-py3-none-any.whl (154 kB)
Collecting omegaconf<3.0.0,>=2.3.0 (from openpack-torch)
  Downloading omegaconf-2.3.0-py3-none-any.whl (79 kB)
Collecting openpack-toolkit==1.0.1 (from openpack-torch)
  Downloading openpack_toolkit-1.0.1-py3-none-any.whl (25 kB)
Collecting pytorch-lightning<3.0,>=2.1 (from openpack-torch)
  Downloading pytorch_lightning-2.1.4-py3-none-any.whl (778 kB)
Collecting antlr4-python3-runtime==4.9.* (from hydra-core<2.0.0,>=1.3.1->

### 0-2: Mount Your Google Drive

Follow the instructions in [Tutorial - Download OpenPack Dataset to Google Drive.ipynb](https://colab.research.google.com/drive/1YOnegl9L6UnlfermwJpevWLQ43anwwGd?usp=sharing) to download the OpenPack Dataset (v1.0.0) to your Google Drive.

After the download finishes, mount your Google Drive in this notebook and create a symbolic link at `/content/data`.

In [3]:
from google.colab import drive
drive.mount('/content/drive')
! ln -s "/content/drive/MyDrive/Colab Notebooks/openpack/data/" "/content/data"

Mounted at /content/drive
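
If the link target is wrong, later cells fail with file-not-found errors deep inside the data loader. A quick check such as the sketch below can catch that early. `list_dataset_dirs` is a hypothetical helper written for this tutorial and is not part of `openpack-torch`.

```python
from pathlib import Path
from typing import List


def list_dataset_dirs(data_root: str) -> List[str]:
    """Return the sorted names of the sub-directories under ``data_root``.

    Raises FileNotFoundError when the path (or the symlink target) is missing.
    """
    root = Path(data_root)
    if not root.is_dir():  # also False for a dangling symlink
        raise FileNotFoundError(f"{data_root} does not point to a directory")
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```

In Colab, `print(list_dataset_dirs("/content/data"))` should list the dataset folders you downloaded to Google Drive.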


### 0-3: Import Modules

In [4]:
import os
import shutil
import logging
from pathlib import Path
from typing import Dict, List, Optional, Tuple

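import hydra
import numpy as np
import openpack_toolkit as optk
import openpack_torch as optorch
import pandas as pd
import torch
from omegaconf import DictConfig, OmegaConf, open_dict

from openpack_toolkit import OPENPACK_OPERATIONS, ActSet

# `torch`, `open_dict`, and `logger` are used by the dataset classes defined later in this notebook.
logger = logging.getLogger(__name__)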

In [5]:
optorch.configs.register_configs()

In [6]:
! cp -r /content/openpack-torch/examples/configs /content/

## [1] Customize Input Data Stream (IMU Data)

Input modalities are controlled by a **DatasetConfig** and a **DataStreamConfig**.
With the default files shown below, only acceleration data from atr02 is loaded.
In this section, we change the config files to load acceleration data from four IMUs (atr01 to atr04).

**DatasetConfig**:  [configs/dataset/atr-left-wrist.yaml](https://github.com/open-pack/openpack-torch/blob/main/examples/unet/configs/dataset/atr-left-wrist.yaml)

This file defines the annotation data (`annotation` property), the input sensor data configuration (`stream` property), the data split (`split` property), and the activity set (`classes` property).
To change the input stream, change the `stream` property. Its value is the name of a file in the `stream/` folder without the `.yaml` extension (e.g., `atr-acc-left-wrist` for `configs/dataset/stream/atr-acc-left-wrist.yaml`).

```yaml
defaults:
  - annotation: activity-1s
  - stream: atr-acc-left-wrist
  - split: openpack-challenge-2022
  - classes: OPENPACK_OPERATIONS
name: "atr-acc-left-wrist"
```

**DataStreamConfig**: [configs/dataset/stream/atr-acc-left-wrist.yaml](https://github.com/open-pack/openpack-torch/blob/main/examples/unet/configs/dataset/stream/atr-acc-left-wrist.yaml)

This file defines the sensor nodes and sensor types (i.e., acc, gyro, quat). It is loaded by the parent config file (i.e., `configs/dataset/atr-left-wrist.yaml`).

```yaml
defaults:
  - atr-qags
  - _self_
name: atr-acc-left-wrist
super_stream: atr-qags
devices:
  - atr02
acc: true
gyro: false
quat: false
```
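
The `defaults` list places `atr-qags` before `_self_`, so the keys in this file override the values inherited from the parent stream config. The sketch below imitates that ordering with plain dictionaries. It is only an illustration: the contents of `BASE_STREAM` are assumed (only `frame_rate: 30` appears in the config printed later in this notebook), and `merge_defaults` is a hypothetical helper, not Hydra's actual merge logic.

```python
from typing import Any, Dict

# Hypothetical stand-in for the parent stream config (`atr-qags`).
# Apart from `frame_rate`, these values are assumptions for illustration.
BASE_STREAM: Dict[str, Any] = {
    "name": "atr-qags",
    "frame_rate": 30,
    "devices": ["atr01", "atr02", "atr03", "atr04"],
    "acc": True,
    "gyro": True,
    "quat": True,
}

# Keys set in `atr-acc-left-wrist.yaml` (the `_self_` entry).
LEFT_WRIST: Dict[str, Any] = {
    "name": "atr-acc-left-wrist",
    "super_stream": "atr-qags",
    "devices": ["atr02"],
    "acc": True,
    "gyro": False,
    "quat": False,
}


def merge_defaults(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Apply ``override`` on top of ``base``; later entries win, lists are replaced."""
    merged = dict(base)
    merged.update(override)
    return merged


stream = merge_defaults(BASE_STREAM, LEFT_WRIST)
```

After the merge, `stream["devices"]` holds only `atr02`, while `stream["frame_rate"]` is still the inherited value.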

Starting from these two files, let's add input modalities.

### 1-1: Create New `DataStreamConfig`

Create a YAML file at `./configs/dataset/stream/atr-acc-all.yaml` and paste in the following contents.


```yaml
defaults:
  - atr-qags
  - _self_
name: atr-acc-all # Set data stream name
super_stream: atr-qags
devices: # Add sensor nodes here.
  - atr01
  - atr02
  - atr03
  - atr04
acc: true
gyro: false # Set to true to use gyro data in addition to acc.
quat: false
```
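
In Colab it can be convenient to create this file from a code cell instead of the file browser. The following is a minimal sketch that writes the same contents under the `dataset/stream/` folder of a config directory. `write_stream_config` is a hypothetical helper name, and the config root you pass in (for example `/content/configs`) is up to you.

```python
from pathlib import Path

# Same contents as the YAML shown above, without the inline comments.
STREAM_CONFIG = """\
defaults:
  - atr-qags
  - _self_
name: atr-acc-all
super_stream: atr-qags
devices:
  - atr01
  - atr02
  - atr03
  - atr04
acc: true
gyro: false
quat: false
"""


def write_stream_config(config_root: str) -> Path:
    """Write the stream config to ``<config_root>/dataset/stream/atr-acc-all.yaml``."""
    path = Path(config_root) / "dataset" / "stream" / "atr-acc-all.yaml"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(STREAM_CONFIG)
    return path
```

For example, `write_stream_config("/content/configs")` creates the file next to `atr-acc-left-wrist.yaml`.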

### 1-2: Create New `DatasetConfig`

Create a YAML file at `./configs/dataset/atr-acc.yaml` and paste in the following contents.

```yaml
defaults:
  - annotation: activity-1s
  - stream: atr-acc-all # Set the filename that you created in the previous step.
  - split: openpack-challenge-2022
  - classes: OPENPACK_OPERATIONS
name: "atr-acc" # Set the dataset config name. This value is included in the log directory path.
```

### 1-3: Update Root Config  (`unet.yaml`)

Update the `dataset` entry in the root config file ([./configs/unet.yaml](https://github.com/open-pack/openpack-torch/blob/main/examples/unet/configs/unet.yaml)).

```yaml
defaults:
  - dataset: atr-acc # <= EDIT HERE!! Set the filename of DatasetConfig that you created in the previous step.
  - override hydra/job_logging: custom
  - _self_
...
```
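
Hydra's `compose()` also accepts an `overrides` list, so the dataset group can be switched at load time without editing the root config. The sketch below assumes the `dataset` config group used in this tutorial. `dataset_overrides` and `compose_with_dataset` are hypothetical helper names, and Hydra is imported lazily so that the first helper can be used on its own.

```python
from typing import List


def dataset_overrides(dataset_name: str) -> List[str]:
    """Build a Hydra override that selects ``configs/dataset/<dataset_name>.yaml``."""
    return [f"dataset={dataset_name}"]


def compose_with_dataset(config_dir: str, config_name: str, dataset_name: str):
    """Compose the root config with the dataset group replaced by ``dataset_name``."""
    import hydra  # imported here so dataset_overrides() works without Hydra installed

    with hydra.initialize_config_dir(version_base=None, config_dir=config_dir):
        return hydra.compose(
            config_name=config_name,
            overrides=dataset_overrides(dataset_name),
        )
```

For example, `compose_with_dataset("/content/configs", "unet.yaml", "atr-acc")` would load the root config with the new DatasetConfig selected.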

After completing the steps above, the config directory should look like this.

```bash
configs/
├── dataset
│   ├── atr-acc.yaml
│   ├── atr-left-wrist.yaml
│   └── stream
│       ├── atr-acc-left-wrist.yaml
│       └── atr-acc-all.yaml
├── hydra
│   └── job_logging
│       └── custom.yaml
└── unet.yaml
```
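
A misplaced file usually only surfaces as a Hydra error when the config is composed. As a quick sanity check, the sketch below reports which of the files touched in this section are missing. `missing_config_files` is a hypothetical helper, and the list of expected paths is taken from the tree above.

```python
from pathlib import Path
from typing import List

# Files created or edited in steps 1-1 to 1-3, relative to the config root.
EXPECTED_FILES = (
    "dataset/atr-acc.yaml",
    "dataset/stream/atr-acc-all.yaml",
    "unet.yaml",
)


def missing_config_files(config_root: str) -> List[str]:
    """Return the expected config files that do not exist under ``config_root``."""
    root = Path(config_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

An empty list from `missing_config_files("/content/configs")` means all three files are in place.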

### 1-4: Load Config Files

In [None]:
with hydra.initialize_config_dir(version_base=None, config_dir="/content/configs"):
    cfg = hydra.compose(
        # config_name="unet.yaml",
        config_name="unet-tutorial2.yaml",
    )
cfg.dataset.annotation.activity_sets = dict()  # Cleared only to keep the printed config short.
cfg.dataset.split = optk.configs.datasets.splits.DEBUG_SPLIT

In [None]:
print(OmegaConf.to_yaml(cfg.dataset.stream))

schema: ImuConfig
name: atr-acc-all
description: null
super_stream: atr-qags
path:
  dir: ${path.openpack.rootdir}/${user.name}/atr/${device}
  fname: ${session}.csv
file_format: null
frame_rate: 30
devices:
- atr01
- atr02
- atr03
- atr04
acc: true
gyro: false
quat: false



### 1-5: Load Dataset!

In [None]:
class OpenPackImuDataModule(optorch.data.OpenPackBaseDataModule):
    dataset_class = optorch.data.datasets.OpenPackImu

    def get_kwargs_for_datasets(self, stage: Optional[str] = None) -> Dict:
        kwargs = {
            "window": self.cfg.train.window,
            "debug": self.cfg.debug,
        }
        return kwargs

In [None]:
datamodule = OpenPackImuDataModule(cfg)
datamodule.setup("test")
dataloaders = datamodule.test_dataloader()

batch = dataloaders[0].dataset.__getitem__(0)
print(batch.keys())

No preprocessing is applied.


dict_keys(['x', 't', 'ts'])


In [None]:
batch['x'].shape

torch.Size([12, 1800, 1])

The shape of the input tensor is (`NUM_OF_INPUT_CHANNELS`, `TIMESTEPS`, 1).
Here `NUM_OF_INPUT_CHANNELS = 12`, which means that 3 channels (x-, y-, and z-axis) from each of the 4 sensor nodes are included.
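
The channel count follows directly from the stream config. The sketch below computes it under the assumption that acc and gyro contribute 3 axes each and quat contributes 4 components per device. `count_input_channels` is a hypothetical helper, not an `openpack-torch` function.

```python
from typing import Sequence

# Assumed number of channels per sensor type:
# acc (x, y, z), gyro (x, y, z), quat (w, x, y, z).
AXES = {"acc": 3, "gyro": 3, "quat": 4}


def count_input_channels(
    devices: Sequence[str],
    acc: bool = True,
    gyro: bool = False,
    quat: bool = False,
) -> int:
    """Return the expected first dimension of ``batch['x']`` for a stream config."""
    per_device = (
        AXES["acc"] * int(acc)
        + AXES["gyro"] * int(gyro)
        + AXES["quat"] * int(quat)
    )
    return per_device * len(devices)
```

With the four devices and `acc: true` only, this gives 12, matching the tensor shape printed above. Enabling `gyro` as well would double it to 24 under the same assumption.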

### Tips
If you create your own config files, I recommend storing them in your Google Drive, because files under `/content` are lost when the Colab runtime is reset.

## [2] Load Preprocessed Data

You can load other sensor streams or preprocessed data by customizing the Dataset class.
If you create a preprocessed dataset, please split it by session (e.g., "U0101-S0100").

Here is an example that loads HT data.

In [None]:
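# NOTE: `load_imu_wrapper`, `load_annot_wrapper`, and `get_segment_head_and_tail` are assumed to be
# the helpers used by `optorch.data.datasets.OpenPackImu`; import them from your installed
# openpack-torch version. `load_system_ht_wrapper` must be implemented by you.
class OpenPackImuHt(optorch.data.datasets.OpenPackImu):
    def load_dataset(
        self,
        cfg: DictConfig,
        user_session_list: Tuple[Tuple[int, int], ...],
        window: Optional[int] = None,
        submission: bool = False,
    ) -> None:
        """Called in ``__init__()`` to load the required data.

        Args:
            cfg (DictConfig): Root config. ``cfg.user`` and ``cfg.session`` are set per session.
            user_session_list (Tuple[Tuple[str, str], ...]): (user, session) pairs to load.
            window (int, optional): Segment length in samples, used to build the index.
                Defaults to None.
            submission (bool, optional): Passed through to ``load_annot_wrapper``.
                Defaults to False.
        """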
        data, index = [], []
        for seq_idx, (user, session) in enumerate(user_session_list):
            with open_dict(cfg):
                cfg.user = {"name": user}
                cfg.session = session

            # >>>>> EDIT HERE >>>>>
            # Add functions that load the corresponding session.
            # -- IMU --
            ts_sess, x_sess = load_imu_wrapper(cfg)
            # -- HT & Label Printer --
            # Function to load System/HT data (please implement it yourself).
            anchor_sess = load_system_ht_wrapper(cfg, ts_sess)
            # <<<<<<<<<<<<<<<<<<<<<

            # -- annotation --
            label = load_annot_wrapper(cfg, ts_sess, submission, self.classes)

            data.append({
                "user": user,
                "session": session,
                "data": x_sess,
                "data/anchor": anchor_sess.astype(x_sess.dtype), # << ADD loaded sequence!!
                "label": label,
                "unixtime": ts_sess,
            })

            seq_len = ts_sess.shape[0]
            index += [dict(seq=seq_idx, seg=seg_idx, pos=pos)
                      for seg_idx, pos in enumerate(range(0, seq_len, window))]
        self.data = data
        self.index = tuple(index)

    def __str__(self) -> str:
        s = (
            "OpenPackImuHt("
            f"index={len(self.index)}, "
            f"num_sequence={len(self.data)}, "
            f"submission={self.submission}, "
            f"random_crop={self.random_crop}"
            ")"
        )
        return s

    def __getitem__(self, index: int) -> Dict:
        seq_idx, seg_idx = self.index[index]["seq"], self.index[index]["seg"]
        seq_dict = self.data[seq_idx]
        seq_len = seq_dict["data"].shape[1]

        head, tail, pad_tail = get_segment_head_and_tail(
            seg_idx, self.window, seq_len, self.random_crop)

        # extract a segment
        x = seq_dict["data"][:, head:tail, np.newaxis]
        x_anchor = seq_dict["data/anchor"][:, head:tail, np.newaxis] # << ADD
        t = seq_dict["label"][head:tail]
        ts = seq_dict["unixtime"][head:tail]

        if pad_tail > 0:
            x = np.pad(x, [(0, 0), (0, pad_tail), (0, 0)],
                       mode="constant", constant_values=0)
            x_anchor = np.pad(x_anchor, [(0, 0), (0, pad_tail), (0, 0)],
                              mode="constant", constant_values=0)  # << ADD
            t = np.pad(t, [(0, pad_tail)], mode="constant",
                       constant_values=self.classes.get_ignore_class_index())
            ts = np.pad(ts, [(0, pad_tail)],
                        mode="constant", constant_values=ts[-1])

        x = torch.from_numpy(x)
        x_anchor = torch.from_numpy(x_anchor) # << ADD
        t = torch.from_numpy(t)
        ts = torch.from_numpy(ts)
        return {"x": x, "x/anchor": x_anchor, "t": t, "ts": ts}
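
`load_system_ht_wrapper` is left for you to implement. One possible building block is sketched below: it turns a list of event timestamps into a binary sequence aligned with the IMU timestamps, with shape `(1, T)` so that it can be sliced as `[:, head:tail]` like `data/anchor` in the class above. It assumes the event timestamps are already loaded and use the same unit as `ts_sess`, and that `ts_sess` is sorted in ascending order. `events_to_anchor` is a hypothetical name, and no file loading is shown.

```python
import numpy as np


def events_to_anchor(ts_sess: np.ndarray, event_ts) -> np.ndarray:
    """Mark, for each event, the first IMU sample whose timestamp is >= the event time.

    Returns a float32 array of shape (1, T), where T = len(ts_sess).
    Events later than the last IMU sample are ignored.
    """
    anchor = np.zeros((1, ts_sess.shape[0]), dtype=np.float32)
    idx = np.searchsorted(ts_sess, np.asarray(event_ts))
    idx = idx[idx < ts_sess.shape[0]]  # drop events after the end of the session
    anchor[0, idx] = 1.0
    return anchor
```

Whether a single-sample marker is the right encoding depends on your model. Widening each marker over a short interval is a common alternative.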

In [None]:
class OpenPackImuHtDataModule(optorch.data.OpenPackBaseDataModule):
    # NOTE: Change Dataset Class
    dataset_class = OpenPackImuHt

    def get_kwargs_for_datasets(self, stage: Optional[str] = None) -> Dict:
        kwargs = {
            "window": self.cfg.train.window,
            "debug": self.cfg.debug,
        }
        return kwargs

    def setup(self, stage: Optional[str] = None):
        super().setup(stage=stage)
        if self.op_train is not None:
            self.op_train.random_crop = True
            logger.debug(
                f"enable random_crop in training dataset: {self.op_train}")
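
The dataset classes above cut each session into windows of `cfg.train.window` samples and zero-pad the last one. The sketch below shows that bookkeeping for the fixed (non-random) case. It is a simplified stand-in written for this tutorial, not the library's `get_segment_head_and_tail`, and `fixed_windows` is a hypothetical name.

```python
from typing import List, Tuple


def fixed_windows(seq_len: int, window: int) -> List[Tuple[int, int, int]]:
    """Return (head, tail, pad_tail) for consecutive windows covering ``seq_len`` samples."""
    segments = []
    # Same stepping as the index built in load_dataset(): range(0, seq_len, window).
    for head in range(0, seq_len, window):
        tail = min(head + window, seq_len)
        pad_tail = window - (tail - head)  # samples to pad so every segment has length `window`
        segments.append((head, tail, pad_tail))
    return segments
```

For a session of 4000 samples and a window of 1800, this gives three segments, and only the last one needs padding (1400 samples).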

## [3] Apply Preprocessing (Online Preprocessing)

The Dataset class has a `preprocessing()` method, so you only need to implement the preprocessing logic there.

Here is an example that normalizes acceleration data from [-3G, +3G] into [0, 1].

In [None]:
class OpenPackImuNormalize(optorch.data.datasets.OpenPackImu):
    """Dataset class for IMU data with normalized acceleration.
    """
    def preprocessing(self) -> None:
        """
        * Normalize [-3G, +3G] into [0, 1].
        """
        # NOTE: Clip ACC data to [-3G, +3G], then rescale to [0, 1].
        # NOTE: Described in Appendix Sec.3.2.
        for seq_dict in self.data:
            x = seq_dict.get("data")
            x = np.clip(x, -3, +3)
            x = (x + 3.) / 6.
            seq_dict["data"] = x
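
To see what this transform does to individual values, the same two steps can be applied to a small array outside the dataset class. This is a minimal sketch: `normalize_acc` is a hypothetical standalone function that mirrors the clip-and-rescale logic above.

```python
import numpy as np


def normalize_acc(x: np.ndarray, limit: float = 3.0) -> np.ndarray:
    """Clip to [-limit, +limit] and rescale linearly to [0, 1]."""
    x = np.clip(x, -limit, +limit)
    return (x + limit) / (2.0 * limit)


samples = np.array([-5.0, -3.0, 0.0, 3.0, 5.0])
normalized = normalize_acc(samples)
```

Values outside the range are clipped first, so -5 G and -3 G both map to 0, 0 G maps to 0.5, and +3 G and +5 G both map to 1.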
