## Setup


### Environment
Make sure the active Miniconda environment is `eai-repair` when you run this notebook.
If a different environment is active, activate `eai-repair` and then reopen this notebook.

In [None]:
!conda info -e

### Prepare dataset and model

Several folders are required to run this notebook.
Below, `docs/bdd_image_extract/README.adoc` in this repository is referred to as Document-1 and `docs/user_manual/README.adoc` as Document-2.
- Folder containing label files in scalabel format
   - Using the terminology of https://doc.scalabel.ai/format.html, the label file must be a JSON file that satisfies the following conditions.
      - The content of the JSON file is an array of frames.
      - Each frame element contains a name and attributes.
   - This document uses BDD2020; the folder is obtained by unpacking `bdd100k_det_20_labels_trainval.zip` as described in `2. Berkley DeepDrive (BDD) Dataset` in Document-1.
   - Used for option `scalabel_format_label_path`.
- Execution result of `repair create_image_subset`
   - For the target dataset (in this document, the BDD2020), this is the result of executing `3.3. Create image subset` in Document-1.
   - Used for option `create_image_subset_output_path`.
- Execution result of `repair prepare`
   - For the target dataset (in this document, the BDD2020), this is the result of executing `2.6. Preparation` in Document-2.
   - Used for option `h5_dataset_path`.
- Model used for inference
   - In this document, `datasets/BDD100K-Classification/model/VGG16` in repository `eAI-Repair-exp` is used.
   - Used for option `model_dir`.
   
The following tree shows the files required by this notebook and their layout.
The subsequent cells assume that at least these files exist.
They also work when the folders contain additional files and folders.

```bash
$ cd ../../ && tree ./outputs --filelimit 5

./outputs
├── VGG16
│   ├── assets/
│   ├── keras_metadata.pb
│   ├── saved_model.pb
│   └── variables/
├── bdd100k
│   └── labels
│       └── det_20
│           ├── det_train.json
│           └── det_val.json
├── create_image_subset_result
│   ├── train [288964 entries exceeds filelimit, not opening dir]
│   └── val [104918 entries exceeds filelimit, not opening dir]
└── prepare_result
    ├── repair.h5
    ├── test.h5
    └── train.h5
```

In [None]:
# If the output of the command below contains the files listed above, the subsequent cells should work.
!cd ../../ && tree ./outputs --filelimit 5
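
The same check can be scripted instead of read off the `tree` output by eye. The following is a minimal sketch, assuming the layout shown in the tree listing above; `REQUIRED_PATHS` and `find_missing` are hypothetical names introduced for illustration and are not part of the `repair` tool.

```python
from pathlib import Path

# Paths copied from the tree listing above, relative to the repository top directory.
# (Hypothetical constant for illustration; not part of the repair tool.)
REQUIRED_PATHS = [
    "outputs/VGG16/assets",
    "outputs/VGG16/keras_metadata.pb",
    "outputs/VGG16/saved_model.pb",
    "outputs/VGG16/variables",
    "outputs/bdd100k/labels/det_20/det_train.json",
    "outputs/bdd100k/labels/det_20/det_val.json",
    "outputs/create_image_subset_result/train",
    "outputs/create_image_subset_result/val",
    "outputs/prepare_result/repair.h5",
    "outputs/prepare_result/test.h5",
    "outputs/prepare_result/train.h5",
]


def find_missing(root, required=REQUIRED_PATHS):
    """Return the entries of `required` that do not exist under `root`."""
    root = Path(root)
    return [p for p in required if not (root / p).exists()]
```

From this notebook, `find_missing("../../")` would return an empty list when every required file and folder is in place.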

## Execute tool
Without a GPU, TensorFlow logs CUDA-related warnings and errors (visible in the outputs below), but the tool still runs on the CPU.
The following commands are prefixed with `cd ../../ &&` so that they are run from the top directory of the `eAI-Repair` repository.

In [None]:
!cd ../../ && pwd

### `calc_target=scene_prob`
Calculate the probability that a specific scene occurs in the dataset.
In this example, `--format_json=True` is specified so that the output is formatted for readability.

In [4]:
# Execute
!cd ../../ && \
mkdir -p ./outputs/scene_prob/ && \
repair utils --dataset=BDD-Objects \
    --call risk_calculation_tool \
    --calc_target=scene_prob \
    --h5_dataset_path=./outputs/prepare_result/ \
    --create_image_subset_output_path=./outputs/create_image_subset_result/ \
    --output_dir=./outputs/scene_prob/ \
    --format_json=True \
    --scalabel_format_label_path=./outputs/bdd100k/labels/det_20/ \
    --label=car \
    --attributes=weather=rainy,timeofday=dawn/dusk

2022-12-15 04:58:20.887554: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
reading folder of create_image_subset
finished reading
[Query]
{
    "label": "2",
    "weather": "rainy",
    "timeofday": "dawn/dusk"
}
[Summary of scene prob]
{
    "total_image_count": 393880,
    "image_count_matched_to_query": 1184,
    "existence_rate": 0.003005991672590637
}
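
The `[Query]` block above echoes the `--attributes=weather=rainy,timeofday=dawn/dusk` option as key/value pairs, and shows `--label=car` as `"2"`. How the tool parses the option internally is not shown in this notebook; the sketch below only illustrates one way to split such a string into the same pairs. `parse_attributes` is a hypothetical helper, not a function of the `repair` tool.

```python
def parse_attributes(text):
    """Split 'key=value,key=value' into a dict.

    Each item is split on its first '=' only, so values such as
    'dawn/dusk' are kept unchanged.
    (Hypothetical helper; the repair tool's own parsing may differ.)
    """
    result = {}
    for item in text.split(","):
        key, _, value = item.partition("=")
        result[key.strip()] = value.strip()
    return result


attributes = parse_attributes("weather=rainy,timeofday=dawn/dusk")
print(attributes)
```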


In [5]:
# Read result
!cd ../../ && cat ./outputs/scene_prob/results.json

{
    "response": {
        "query": {
            "label": "2",
            "weather": "rainy",
            "timeofday": "dawn/dusk"
        },
        "scene_prob": {
            "summary": {
                "total_image_count": 393880,
                "image_count_matched_to_query": 1184,
                "existence_rate": 0.003005991672590637
            }
        }
    }
}
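
The numbers in this result are consistent with `existence_rate` being `image_count_matched_to_query` divided by `total_image_count`. The sketch below recomputes the rate from the JSON structure shown above, under that assumption; `existence_rate` and `SAMPLE_RESULT` are names introduced here for illustration.

```python
import json

# Copy of the results.json content shown above (sample data for the sketch).
SAMPLE_RESULT = json.loads("""
{
    "response": {
        "query": {"label": "2", "weather": "rainy", "timeofday": "dawn/dusk"},
        "scene_prob": {
            "summary": {
                "total_image_count": 393880,
                "image_count_matched_to_query": 1184,
                "existence_rate": 0.003005991672590637
            }
        }
    }
}
""")


def existence_rate(result):
    """Recompute matched / total from a scene_prob result (assumed definition)."""
    summary = result["response"]["scene_prob"]["summary"]
    total = summary["total_image_count"]
    matched = summary["image_count_matched_to_query"]
    # Guard against an empty dataset to avoid division by zero.
    return matched / total if total else 0.0


recomputed = existence_rate(SAMPLE_RESULT)
reported = SAMPLE_RESULT["response"]["scene_prob"]["summary"]["existence_rate"]
print(recomputed, reported)
```

To check a real run, load `./outputs/scene_prob/results.json` with `json.load` and pass the resulting dictionary to `existence_rate`.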

In [6]:
# Read result: A folder containing images that match the query
!cd ../../ && ls -1 ./outputs/scene_prob/matched_data | wc -l

1184


In [7]:
!cd ../../ && ls -1 ./outputs/scene_prob/matched_data | head

1000810_8cc99f48-ca215966.jpg
1000811_8cc99f48-ca215966.jpg
1002699_8d18f11d-2d2299e7.jpg
1008112_8dd654ee-75143ef0.jpg
1008113_8dd654ee-75143ef0.jpg
100958_0e587038-3a0073a1.jpg
1015320_8ec8ab14-277df119.jpg
1015322_8ec8ab14-277df119.jpg
1015334_8ec8ab14-277df119.jpg
1035386_9195066c-31518c7b.jpg
ls: write error: Broken pipe
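
The `ls: write error: Broken pipe` line is harmless: `head` exits after printing ten lines and closes the pipe while `ls` is still writing. If the message is distracting, the first few entries can be listed from Python without a pipe. This is a minimal sketch; `first_entries` is a hypothetical helper, and its ordering (plain string sort) may differ from the locale-dependent ordering of `ls`.

```python
import os
from itertools import islice


def first_entries(directory, n=10):
    """Return the first `n` entry names of `directory` in sorted order.

    Names are sorted as plain strings, which may not match the
    locale-dependent order used by `ls`.
    """
    return list(islice(sorted(os.listdir(directory)), n))
```

For example, `first_entries("../../outputs/scene_prob/matched_data")` would return up to ten file names from the folder listed above.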


### `calc_target=miss_rate`
Calculate the misrecognition rate for a given trained model on a specific dataset.

In [5]:
# Execute
!cd ../../ && \
mkdir -p ./outputs/miss_rate/ && \
CUDA_VISIBLE_DEVICES=-1 repair utils --dataset=BDD-Objects \
    --call risk_calculation_tool \
    --calc_target=miss_rate \
    --h5_dataset_path=./outputs/prepare_result \
    --create_image_subset_output_path=./outputs/create_image_subset_result/ \
    --output_dir=./outputs/miss_rate/ \
    --format_json=True \
    --model_dir=./outputs/VGG16/

2022-12-15 06:24:06.037811: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
reading folder of create_image_subset
finished reading
start predict
2022-12-15 06:25:31.813602: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
100%|██████████████████████████████████| 393880/393880 [31:27<00:00, 254.61it/s]finished predict
[Summary of miss rate]
{
    "0": {
        "total_image_count": 6980,
        "misrecognized_image_count": 1366,
        "misrecognized_rate": 0.19570200573065902
    },
    "1": {
        "total_image_count": 12139,
        "misrecognized_image_count": 5378,
        "misrecognized_rate": 0.44303484636296236
    },
    "2": {
        "total_image_count": 225941,
        "misrecognized_image_count": 11883,
        "misrecognized_rate": 0.

In [6]:
# Read result
!cd ../../ && fold ./outputs/miss_rate/results.json

{
    "response": {
        "miss_rate": {
            "summary": {
                "0": {
                    "total_image_count": 6980,
                    "misrecognized_image_count": 1366,
                    "misrecognized_rate": 0.19570200573065902
                },
                "1": {
                    "total_image_count": 12139,
                    "misrecognized_image_count": 5378,
                    "misrecognized_rate": 0.44303484636296236
                },
                "2": {
                    "total_image_count": 225941,
                    "misrecognized_image_count": 11883,
                    "misrecognized_rate": 0.0525933761468702
                },
                "3": {
                    "total_image_count": 2709,
                    "misrecognized_image_count": 1621,
                    "misrecognized_rate": 0.598375784422296
                },
                "4": {
                    "total_image_count": 0,
              

In [7]:
# Read result: A folder containing misrecognized images
!cd ../../ && tree ./outputs/miss_rate/misrecognision_data -L 2

./outputs/miss_rate/misrecognision_data
├── [01;34m0[00m
│   ├── [01;34m1[00m
│   ├── [01;34m12[00m
│   ├── [01;34m2[00m
│   ├── [01;34m3[00m
│   ├── [01;34m6[00m
│   ├── [01;34m7[00m
│   ├── [01;34m8[00m
│   └── [01;34m9[00m
├── [01;34m1[00m
│   ├── [01;34m0[00m
│   ├── [01;34m12[00m
│   ├── [01;34m2[00m
│   ├── [01;34m3[00m
│   ├── [01;34m6[00m
│   ├── [01;34m7[00m
│   ├── [01;34m8[00m
│   └── [01;34m9[00m
├── [01;34m12[00m
│   ├── [01;34m0[00m
│   ├── [01;34m1[00m
│   ├── [01;34m2[00m
│   ├── [01;34m3[00m
│   ├── [01;34m6[00m
│   ├── [01;34m7[00m
│   ├── [01;34m8[00m
│   └── [01;34m9[00m
├── [01;34m2[00m
│   ├── [01;34m0[00m
│   ├── [01;34m1[00m
│   ├── [01;34m12[00m
│   ├── [01;34m3[00m
│   ├── [01;34m6[00m
│   ├── [01;34m7[00m
│   ├── [01;34m8[00m
│   └── [01;34m9[00m
├── [01;34m3[00m
│   ├── [01;34m0[00m
│   ├── [01;34m1[00m
│   ├── [01;34m12[00m
│

In [15]:
# Read result: contents of a folder containing misrecognized images
!cd ../../ && ls -1 ./outputs/miss_rate/misrecognision_data/0/1 | head -n 5

1059039_9542db5e-dea288c8.jpg
1240245_addf601d-df742bd4.jpg
1240877_adf9e7c8-17566506.jpg
125749_c2ab5734-0f552875.jpg
152273_c605bc6e-a4a48e5c.jpg
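
The `misrecognision_data` folder has two levels of numbered subfolders with image files below them. This notebook does not state what the two levels mean; one plausible reading, which is an assumption here, is that the outer folder is the correct label and the inner folder is the predicted label. Under that reading, the sketch below counts the images in each folder pair. `count_by_folder_pair` is a hypothetical helper, not part of the `repair` tool.

```python
from pathlib import Path


def count_by_folder_pair(root):
    """Count files in each <outer>/<inner> subfolder of `root`.

    Returns a dict mapping (outer_name, inner_name) to the number of files.
    The interpretation of outer/inner as correct/predicted label is an
    assumption, not something confirmed by the tool's documentation here.
    """
    counts = {}
    for outer in sorted(Path(root).iterdir()):
        if not outer.is_dir():
            continue
        for inner in sorted(outer.iterdir()):
            if inner.is_dir():
                counts[(outer.name, inner.name)] = sum(
                    1 for entry in inner.iterdir() if entry.is_file()
                )
    return counts
```

Calling `count_by_folder_pair("../../outputs/miss_rate/misrecognision_data")` would give one count per folder pair shown in the tree above.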


### `calc_target=scene_prob_and_miss_rate`
Output the results of both `calc_target=scene_prob` and `calc_target=miss_rate`.
When the result is consumed by other tools, `--format_json=True` is not needed, so this example omits the option to show the unformatted output.

In [9]:
# Execute
!cd ../../ && \
mkdir -p ./outputs/scene_prob_and_miss_rate/ && \
CUDA_VISIBLE_DEVICES=-1 repair utils --dataset=BDD-Objects \
    --call risk_calculation_tool \
    --calc_target=scene_prob_and_miss_rate \
    --h5_dataset_path=./outputs/prepare_result \
    --create_image_subset_output_path=./outputs/create_image_subset_result/ \
    --output_dir=./outputs/scene_prob_and_miss_rate/ \
    --scalabel_format_label_path=./outputs/bdd100k/labels/det_20/ \
    --label=car \
    --attributes=weather=rainy,timeofday=dawn/dusk \
    --model_dir=./outputs/VGG16/

2022-12-15 07:00:54.633310: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
reading folder of create_image_subset
finished reading
[Query]
{
    "label": "2",
    "weather": "rainy",
    "timeofday": "dawn/dusk"
}
[Summary of scene prob]
{
    "total_image_count": 393880,
    "image_count_matched_to_query": 1184,
    "existence_rate": 0.003005991672590637
}
start predict
2022-12-15 07:02:20.989787: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
100%|███████████████████████████████████████| 1184/1184 [00:45<00:00, 20.91it/s]finished predict
[Summary of miss rate]
{
    "0": {
        "total_image_count": 0,
        "misrecognized_image_count": 0,
        "misrecognized_rate": 0
    },
    "1": {
        "total_image_count": 0,
        "misrecognized_

In [10]:
# Read result
!cd ../../ && fold ./outputs/scene_prob_and_miss_rate/results.json

{"response": {"query": {"label": "2", "weather": "rainy", "timeofday": "dawn/dus
k"}, "scene_prob": {"summary": {"total_image_count": 393880, "image_count_matche
d_to_query": 1184, "existence_rate": 0.003005991672590637}}, "miss_rate": {"summ
ary": {"0": {"total_image_count": 0, "misrecognized_image_count": 0, "misrecogni
zed_rate": 0}, "1": {"total_image_count": 0, "misrecognized_image_count": 0, "mi
srecognized_rate": 0}, "2": {"total_image_count": 1184, "misrecognized_image_cou
nt": 64, "misrecognized_rate": 0.05405405405405406}, "3": {"total_image_count": 
0, "misrecognized_image_count": 0, "misrecognized_rate": 0}, "4": {"total_image_
count": 0, "misrecognized_image_count": 0, "misrecognized_rate": 0}, "5": {"tota
l_image_count": 0, "misrecognized_image_count": 0, "misrecognized_rate": 0}, "6"
: {"total_image_count": 0, "misrecognized_image_count": 0, "misrecognized_rate":
 0}, "7": {"total_image_count": 0, "misrecognized_image_count": 0, "misrecognize
d_rate": 0}, "8"

In [17]:
# Read result: A folder containing images that match the query
!cd ../../ && ls -1 ./outputs/scene_prob_and_miss_rate/matched_data | head -n 5

1000810_8cc99f48-ca215966.jpg
1000811_8cc99f48-ca215966.jpg
1002699_8d18f11d-2d2299e7.jpg
1008112_8dd654ee-75143ef0.jpg
1008113_8dd654ee-75143ef0.jpg
ls: write error: Broken pipe
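
In the combined `results.json` shown above, only label `"2"` has a non-zero `total_image_count`, and its `misrecognized_rate` is consistent with `misrecognized_image_count` divided by `total_image_count`; labels with no images report a rate of `0`. The sketch below reproduces that arithmetic under this assumed definition. `misrecognized_rate` and `SUMMARY_COUNTS` are names introduced for illustration.

```python
def misrecognized_rate(total, misrecognized):
    """Return misrecognized / total, or 0 when the label has no images.

    Assumed definition, inferred from the numbers in results.json.
    """
    return misrecognized / total if total else 0


# (total_image_count, misrecognized_image_count) for a few labels,
# copied from the combined results.json above.
SUMMARY_COUNTS = {
    "0": (0, 0),
    "1": (0, 0),
    "2": (1184, 64),
    "3": (0, 0),
}

rates = {
    label: misrecognized_rate(total, missed)
    for label, (total, missed) in SUMMARY_COUNTS.items()
}
print(rates)
```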


In [12]:
# Read result: A folder containing misrecognized images
!cd ../../ && tree ./outputs/scene_prob_and_miss_rate/misrecognision_data

./outputs/scene_prob_and_miss_rate/misrecognision_data
└── 2
    ├── 0
    │   ├── 130946_c359e7b1-5b68aea4.jpg
    │   └── 16780_b41ace08-830c808c.jpg
    ├── 1
    │   ├── 154046_15f89ba0-d8a70cb4.jpg
    │   ├── 160017_c70a8ace-931896fd.jpg
    │   ├── 48831_b83d28bd-a9ab3f1d.jpg
    │   ├── 517041_491a8e99-8665cd7b.jpg
    │   ├── 610894_56180d13-fc15b5bf.jpg
    │   └── 818523_73760969-aea6a396.jpg
    ├── 12
    │   ├── 1042_b1e1a7b8-0aec80e8.jpg
    │   ├── 1044590_93395406-2777c722.jpg
    │   ├── 1046_b1e1a7b8-0aec80e8.jpg
    │   ├── 1093181_99e9d816-7fdd0a07.jpg
    │   ├── 1179119_a5a7b92e-3448f13d.jpg
    │   ├── 1192304_a7825612-cceb24d2.jpg
    │   ├── 1253079_af5e53ae-49d487fa.jpg
    │   ├── [