## Idea
Bounding boxes can be generated automatically from the saliency maps. The
bounding box generation and selection process is explained in
[the original paper](https://arxiv.org/abs/2204.11535).

## File structure

First, download the scene-specific parameters from [Google Drive](https://drive.google.com/drive/folders/1jWjJDcpYJ6cNECV1USC0dg0xRwMTSRWb?usp=sharing).

In your PVDN dataset (day cycle) directory, store the downloaded parameter config files like this:
```
/path/to/PVDN/day/<split>/labels/kpbms_params
├── S*****.json
├── S*****.json
├── ....
└── S*****.json
```

For example, for the **val** split, this would look like this:
```
/path/to/PVDN/day/val/labels/kpbms_params
├── S00071.json
├── S00092.json
├── S00100.json
├── S00101.json
├── S00121.json
├── S00123.json
├── S00126.json
├── S00132.json
├── S00135.json
├── S00164.json
├── S00168.json
├── S00192.json
├── S00195.json
├── S00260.json
├── S00284.json
├── S00294.json
├── S00309.json
├── S00355.json
├── S00370.json
└── S00372.json
```
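
Before running the generation, it can help to confirm that the parameter files ended up in the right place. The following is a minimal sketch using only the standard library; the helper name `list_scene_ids` is hypothetical and not part of the `kpsaliency` package.

```python
from pathlib import Path


def list_scene_ids(params_dir):
    """Return the sorted scene IDs (file stems) of all .json files in params_dir.

    Hypothetical helper for illustration; not part of kpsaliency.
    """
    params_dir = Path(params_dir)
    if not params_dir.is_dir():
        raise NotADirectoryError(f"{params_dir} not found.")
    # Only .json files count as scene configs; anything else is ignored.
    return sorted(p.stem for p in params_dir.glob("*.json"))
```

Calling `list_scene_ids("/path/to/PVDN/day/val/labels/kpbms_params")` on a correctly populated **val** split should return the scene IDs listed above, starting with `S00071`.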

A scene config file looks like this (example: S00071):
```
{
    "kernel_size": 300,
    "lower_direct": 0.7000000000000001,
    "lower_indirect": 0.9,
    "n": 6,
    "selem_size": 13,
    "sigma": 0,
    "upper_direct": 1.0,
    "upper_indirect": 1.0
}
```
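
A scene config can be sanity-checked before it is passed to the generator. The sketch below is only an illustration: the key set is copied from the example above, while the check that each lower threshold does not exceed its upper threshold is an assumption about how the thresholds are meant to be used, not something the package is known to enforce. The name `check_scene_config` is hypothetical.

```python
import json

# Keys taken from the example scene config (S00071).
EXPECTED_KEYS = {
    "kernel_size", "lower_direct", "lower_indirect", "n",
    "selem_size", "sigma", "upper_direct", "upper_indirect",
}


def check_scene_config(config):
    """Return a list of problems found in a scene config dict (empty if none)."""
    problems = []
    missing = EXPECTED_KEYS - set(config)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    # Assumption: a lower threshold should not exceed its upper threshold.
    for kind in ("direct", "indirect"):
        lower = config.get(f"lower_{kind}")
        upper = config.get(f"upper_{kind}")
        if lower is not None and upper is not None and lower > upper:
            problems.append(f"lower_{kind} exceeds upper_{kind}")
    return problems


example = json.loads("""
{
    "kernel_size": 300,
    "lower_direct": 0.7000000000000001,
    "lower_indirect": 0.9,
    "n": 6,
    "selem_size": 13,
    "sigma": 0,
    "upper_direct": 1.0,
    "upper_indirect": 1.0
}
""")
print(check_scene_config(example))  # prints []
```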

## Generate the dataset
The following code will generate the dataset. For each image, a `.json` file
containing the bounding box annotations as well as the labels will be stored. Note
that as soon as a bounding box contains at least one direct keypoint annotation, it
is labeled as direct.
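
The labeling rule can be sketched as follows. This is not the package's implementation: the corner format of the box, the `(x, y, is_direct)` keypoint layout, and the string labels are all assumptions made for illustration. How boxes without any keypoint are handled is not covered here; the sketch simply labels every box without a direct keypoint as indirect.

```python
def label_box(box, keypoints):
    """Label a box "direct" if it contains at least one direct keypoint.

    box: (x_min, y_min, x_max, y_max), an assumed corner format.
    keypoints: iterable of (x, y, is_direct) tuples, a hypothetical layout.
    """
    x_min, y_min, x_max, y_max = box
    for x, y, is_direct in keypoints:
        inside = x_min <= x <= x_max and y_min <= y <= y_max
        # A single direct keypoint inside the box decides the label.
        if inside and is_direct:
            return "direct"
    return "indirect"


# One indirect and one direct keypoint inside the same box.
print(label_box((0, 0, 10, 10), [(5, 5, False), (6, 6, True)]))  # prints direct
```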

The annotations are written to a new directory __kpbms_boxes/__ inside the
__labels/__ directory of your dataset. This directory holds one annotation file
per image. An annotation file looks something like this:

```
{
    "bounding_boxes": [
        [567, 537, 573, 541],
        [592, 537, 599, 542],
        [565, 554, 573, 560],
        [589, 554, 599, 561],
        [695, 544, 727, 555]],
    "labels": [1, 1, 2, 2, 2]
}
```
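
An annotation file can be read with the standard `json` module. The sketch below pairs each box with its label and derives the box size. It assumes the four numbers are corner coordinates `[x_min, y_min, x_max, y_max]`, which matches the example values but is not stated explicitly here; the helper name `iter_boxes` is hypothetical.

```python
import json

annotation = json.loads("""
{
    "bounding_boxes": [
        [567, 537, 573, 541],
        [592, 537, 599, 542],
        [565, 554, 573, 560],
        [589, 554, 599, 561],
        [695, 544, 727, 555]],
    "labels": [1, 1, 2, 2, 2]
}
""")


def iter_boxes(annotation):
    """Yield one dict per bounding box together with its label and size."""
    boxes = annotation["bounding_boxes"]
    labels = annotation["labels"]
    if len(boxes) != len(labels):
        raise ValueError("Number of boxes and labels differ.")
    for (x1, y1, x2, y2), label in zip(boxes, labels):
        # Assumption: boxes are stored as [x_min, y_min, x_max, y_max].
        yield {"box": (x1, y1, x2, y2), "label": label,
               "width": x2 - x1, "height": y2 - y1}


for row in iter_boxes(annotation):
    print(row)
```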

### Run the code

**Note:** Make sure to adjust the dataset paths in the second cell accordingly.

In [None]:
!pip install pip --upgrade
!pip install -e ..  # install our package

import os
import numpy as np

from kpsaliency.datasets import SaliencyBoxDataset
from kpsaliency.generators import KPBMSBoxGenerator, KPBMSGenerator


In [2]:
## setup all paths

dataset_dir = "/path/to/PVDN/day"   # adapt this to your
                                    # specific path
splits = ("train", "val", "test")

assert os.path.exists(dataset_dir)
for split in splits:
    assert os.path.exists(os.path.join(dataset_dir, split))


In [None]:
## run the dataset generation process

for split in splits:
    # setup and check paths
    split_dir = os.path.join(dataset_dir, split)
    params_dir = os.path.join(split_dir, "labels/kpbms_params")
    if not os.path.isdir(params_dir):
        raise NotADirectoryError(f"{params_dir} not found. Please check that you set"
                                 f" up the parameter directories properly.")

    # setup dataset
    dataset = SaliencyBoxDataset(split_dir)

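    # setup scene-specific generators (one per scene config, keyed by scene ID)
    generators = {
        os.path.splitext(s)[0]: KPBMSBoxGenerator(
            KPBMSGenerator.from_json(os.path.join(params_dir, s)))
        for s in sorted(os.listdir(params_dir))
        if s.endswith(".json")  # skip anything that is not a scene config
    }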

    # run generation
    print(split)
    with np.errstate(invalid='ignore'):
        dataset.generate_dataset(box_generator=generators, n_workers=4, verbose=True)

You can also generate a dataset from a single parameter file. In that case, the same parameter setting is used for every scene:

In [None]:
# adjust these variables to your specific case
split = "train"
param_path = "/path/to/params.json"

# setup dataset
split_dir = os.path.join(dataset_dir, split)
dataset = SaliencyBoxDataset(split_dir)

# setup generator (the same parameters are used for all scenes)
generator = KPBMSBoxGenerator(KPBMSGenerator.from_json(param_path))

# run generation
print(split)
with np.errstate(invalid='ignore'):
    dataset.generate_dataset(box_generator=generator, n_workers=8, verbose=True)

## Evaluating the quality of the generated dataset

Once you have generated the dataset, you can assess its quality based on the metrics described in the paper. A script version of this evaluation is also provided in [scripts/evaluate_kpbms_groundtruth.py](../scripts/evaluate_kpbms_groundtruth.py).

In [None]:
import os

from kpsaliency.datasets import SaliencyBoxDataset
from kpsaliency.metrics.bboxes import DatasetEvaluator


for split in ("train", "val", "test"):
    print("Split:", split)
    split_dir = os.path.join(dataset_dir, split)
    box_dir = os.path.join(
        split_dir, "labels/kpbms_boxes"
    )
    dataset = SaliencyBoxDataset(path=split_dir, load_images=False,
                                 bbox_path=box_dir)
    evaluator = DatasetEvaluator()
    (prec, rec, fsc, kp_quality, kp_quality_std,
     box_quality, box_quality_std, combined) = evaluator.evaluate_dataset(
        dataset, verbose=True)
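
To compare splits afterwards, the eight returned values can be collected in a dictionary per split. The following is a minimal sketch: the metric names are taken from the variable names in the cell above, the helper `as_metric_dict` is hypothetical, and the zeros are placeholders, not real results.

```python
METRIC_NAMES = (
    "prec", "rec", "fsc", "kp_quality", "kp_quality_std",
    "box_quality", "box_quality_std", "combined",
)


def as_metric_dict(values):
    """Map the tuple returned by the evaluation to a name -> value dict."""
    values = tuple(values)
    if len(values) != len(METRIC_NAMES):
        raise ValueError(
            f"Expected {len(METRIC_NAMES)} values, got {len(values)}.")
    return dict(zip(METRIC_NAMES, values))


# Placeholder zeros for illustration only; in the loop above you would pass
# the result of evaluator.evaluate_dataset(dataset, verbose=True) instead.
summary = {"val": as_metric_dict((0.0,) * 8)}
print(summary["val"]["fsc"])
```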
