# Collecting out-of-distribution images using OODTrigger

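Out-of-distribution (OOD) images are images that differ substantially from the data a model was trained on, so the model's predictions for them may be unreliable. Collecting such images is useful because they can be reviewed and used for further analysis or training.

In this notebook, we create a deployment for a trained project, build a Combined Out-of-Distribution (COOD) model for it, and attach a post inference hook with an `OODTrigger` so that images detected as OOD are sent to a dedicated dataset in the Geti project.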



In [1]:
# As usual, we will connect to the platform first, using the server details from the .env file

from geti_sdk import Geti
from geti_sdk.utils import get_server_details_from_env

geti_server_configuration = get_server_details_from_env(
    env_file_path="/Users/rgangire/workspace/code/repos/Geti-SDK/dev/geti-sdk/notebooks/use_cases/.env"
)

geti = Geti(server_config=geti_server_configuration)

## Selecting a project

We'll use the `CUB6` project that is already created. This project contains 6 classes of birds with each class having 50 images. The project is already trained and ready for deployment.


In [2]:
from geti_sdk.rest_clients import ModelClient, ProjectClient

PROJECT_NAME = "CUB6"

project_client = ProjectClient(session=geti.session, workspace_id=geti.workspace_id)
project = project_client.get_project_by_name(project_name=PROJECT_NAME)
model_client = ModelClient(
    session=geti.session, workspace_id=geti.workspace_id, project=project
)

## Creating a deployment for the project

The OOD detection model uses feature vectors from the trained model to detect out-of-distribution images. Therefore, we need to create a deployment with a model that has an XAI head.


In [3]:
from geti_sdk.detect_ood.utils import get_deployment_with_xai_head

deployment = get_deployment_with_xai_head(geti=geti, model_client=model_client)


## Creating the Combined Out-of-Distribution (COOD) Model

COOD is a framework for OOD detection that combines individual OOD measures into a single combined OOD (COOD) measure using a supervised model.

The COOD model uses the images from pre-determined datasets in the Geti project to learn in-distribution and out-of-distribution patterns.
If out-of-distribution images are not present in the project, they are created by applying strong corruptions to the in-distribution images.

The model runs inference on all the in-distribution and out-of-distribution images and trains a random forest classifier to combine the individual OOD measures into one COOD measure.
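
The combination step can be sketched in plain Python. This is not the `COODModel` implementation: the two individual measures used here (one minus the maximum class probability, and normalised entropy) are assumptions made for illustration, the class probabilities are made-up values, and a small logistic regression stands in for the random forest so that the example has no dependencies.

```python
import math


def max_probability_measure(probs):
    """One minus the top class probability: higher means more OOD-like."""
    return 1.0 - max(probs)


def entropy_measure(probs):
    """Shannon entropy normalised to [0, 1]: higher means more OOD-like."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))


def individual_measures(probs):
    """Feature vector of individual OOD measures (assumed for this sketch)."""
    return [max_probability_measure(probs), entropy_measure(probs)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_combiner(features, labels, learning_rate=0.5, epochs=2000):
    """Fit a logistic regression (stand-in for the random forest) by gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n_samples = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def cood_score(probs, weights, bias):
    """Combined score in (0, 1); closer to 1 means more OOD-like."""
    x = individual_measures(probs)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Toy class probabilities for a 6-class model (made-up values)
in_distribution = [
    [0.95, 0.01, 0.01, 0.01, 0.01, 0.01],
    [0.90, 0.02, 0.02, 0.02, 0.02, 0.02],
    [0.80, 0.04, 0.04, 0.04, 0.04, 0.04],
]
out_of_distribution = [
    [0.20, 0.20, 0.15, 0.15, 0.15, 0.15],
    [0.25, 0.15, 0.15, 0.15, 0.15, 0.15],
    [0.30, 0.14, 0.14, 0.14, 0.14, 0.14],
]

features = [individual_measures(p) for p in in_distribution + out_of_distribution]
labels = [0] * len(in_distribution) + [1] * len(out_of_distribution)
weights, bias = train_combiner(features, labels)

confident = cood_score([0.92, 0.02, 0.02, 0.02, 0.01, 0.01], weights, bias)
uncertain = cood_score([0.22, 0.18, 0.15, 0.15, 0.15, 0.15], weights, bias)
print(f"confident prediction: {confident:.2f}, uncertain prediction: {uncertain:.2f}")
```

The point of the supervised combiner is that no single measure has to be thresholded by hand: the classifier learns how to weigh them from labelled in-distribution and out-of-distribution examples.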

In [4]:
from geti_sdk.detect_ood.ood_model import COODModel

ood_model = COODModel(geti=geti, project=project, deployment=deployment)

2024-08-20 17:22:42,537 - INFO - Reading model /var/folders/f1/7tc0dfks1j590_v7st0w309r0000gn/T/tmpr2k860cf/deployment/Classification/model/model.xml
2024-08-20 17:22:42,833 - INFO - The model /var/folders/f1/7tc0dfks1j590_v7st0w309r0000gn/T/tmpr2k860cf/deployment/Classification/model/model.xml is loaded to CPU
2024-08-20 17:22:42,835 - INFO - 	Number of model infer requests: 1
2024-08-20 17:22:42,836 - INFO - Inference model wrapper initialized, force reloading model on device `CPU` to finalize inference model initialization process.
2024-08-20 17:22:43,028 - INFO - The model /var/folders/f1/7tc0dfks1j590_v7st0w309r0000gn/T/tmpr2k860cf/deployment/Classification/model/model.xml is loaded to CPU
2024-08-20 17:22:43,030 - INFO - 	Number of model infer requests: 1
2024-08-20 17:22:43,031 - INFO - Inference models loaded on device `CPU` successfully.
2024-08-20 17:22:43,031 - INFO - Building Combined OOD detection model for Intel® Geti™ project `CUB6`.
2024-08-20 17:22:46,331 - INFO - Down

Downloading images:   0%|          | 0/300 [00:00<?, ?it/s]

2024-08-20 17:22:55,565 - INFO - Downloaded 300 images in 9.2 seconds.
2024-08-20 17:22:55,795 - INFO - Downloading 0 images from project 'CUB6' and dataset 'OOD Images Collected' to folder /var/folders/f1/7tc0dfks1j590_v7st0w309r0000gn/T/tmpj6_lx8m6/ood_detection/CUB6/data/Dataset/images/OOD Images Collected...


Downloading images: 0it [00:00, ?it/s]

2024-08-20 17:22:55,803 - INFO - No images were downloaded.


Downloading image annotations:   0%|          | 0/300 [00:00<?, ?it/s]

[ WARN:0@45.560] global loadsave.cpp:241 findDecoder imread_('/var/folders/f1/7tc0dfks1j590_v7st0w309r0000gn/T/tmpj6_lx8m6/ood_detection/CUB6/data/Dataset/images/Artic_Tern_0011_143355_66c31f404d51394108c0d039.jpg'): can't open/read file: check file path/integrity


error: OpenCV(4.10.0) /Users/xperience/GHA-Actions-OpenCV/_work/opencv-python/opencv-python/opencv/modules/imgproc/src/color.cpp:196: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'


## Configuring a post inference hook to send the detected OOD images to Geti

The COOD model is now trained and ready to detect images that are OOD or anomalous relative to the images the user uploaded to Geti.
For a given image, the COOD model outputs a score between 0 and 1. A score closer to 1 indicates that the image is more likely to be OOD.

We will now use this COOD model to detect OOD images and upload them to a separate dataset in the Geti project. The idea is to collect these images and use them for further analysis or training.

For this, we will add a post inference hook to the deployment. The hook will be configured to behave as follows:

- If the image has a COOD score greater than 0.5:
  - Send the image to the Geti project, to a dedicated dataset named `OOD Images Collected`

For the first part, we will use the `OODTrigger`, which activates if the COOD score is greater than 0.5.
For the second part, we will use the `GetiDataCollection` action, which sends the image to the Geti project.

More about triggers and post inference hooks can be found in `012_post_inference_hooks.ipynb`.
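
The split between a trigger and an action can be illustrated without the SDK. In the sketch below, `ScoreThresholdTrigger`, `CollectAction` and `run_hook` are hypothetical names invented for this example, and the scores are made up; they mimic the behaviour described in this section and are not geti-sdk classes.

```python
class ScoreThresholdTrigger:
    """Hypothetical trigger: activates when a score is strictly greater than the threshold."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def __call__(self, score):
        return score > self.threshold


class CollectAction:
    """Hypothetical action: stands in for an upload by remembering what was collected."""

    def __init__(self, dataset_name):
        self.dataset_name = dataset_name
        self.collected = []

    def __call__(self, image_name, score):
        self.collected.append((image_name, score))


def run_hook(trigger, action, image_name, score):
    """Run the action only if the trigger activates for this score."""
    if trigger(score):
        action(image_name, score)


sketch_trigger = ScoreThresholdTrigger(threshold=0.5)
sketch_action = CollectAction(dataset_name="OOD Images Collected")

# Made-up COOD scores for three images
for name, score in [("bird.jpg", 0.12), ("car.jpg", 0.91), ("blurry.jpg", 0.50)]:
    run_hook(sketch_trigger, sketch_action, name, score)

print([name for name, _ in sketch_action.collected])  # prints ['car.jpg']
```

Because the comparison is strict, an image with a score of exactly 0.5 is not collected in this sketch.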


In [None]:
from geti_sdk.post_inference_hooks import (
    GetiDataCollection,
    OODTrigger,
    PostInferenceHook,
)

trigger = OODTrigger(
    ood_model=ood_model
)  # The Trigger will activate whenever the COOD score is greater than 0.5


action = GetiDataCollection(  # the Action will send the detected OOD images to a new `OOD Images Collected` dataset in the Geti project
    session=geti.session,
    workspace_id=geti.workspace_id,
    project=project,
    dataset="OOD Images Collected",
    log_level="info",
)

hook = PostInferenceHook(  # The Hook attaches the action to the trigger
    trigger=trigger, action=action
)

# Add the hook to the deployment
deployment.add_post_inference_hook(hook)

## Inferring a few images and detecting OOD images

In [None]:
import cv2

from geti_sdk import Visualizer
from geti_sdk.demos import EXAMPLE_IMAGE_PATH

numpy_image = cv2.imread(EXAMPLE_IMAGE_PATH)
numpy_rgb = cv2.cvtColor(numpy_image, cv2.COLOR_BGR2RGB)

prediction = deployment.infer(numpy_rgb)

visualizer = Visualizer()
result = visualizer.draw(numpy_rgb, prediction)
visualizer.show_in_notebook(result)

In [None]:
prediction = deployment.infer(numpy_rgb)
print(f"Prediction contains objects with labels: {prediction.get_label_names()}")

From the cell above, you should get a printout with the list of labels in the prediction. If the COOD score for the image is greater than 0.5, you should also see a log line stating that the image was uploaded to the Geti project.

## Adding multiple hooks

We can add as many hooks as we like, each with different triggers and actions. Suppose we are primarily interested in images with dogs in them, for some reason. At the same time, we know that we are feeding our model images containing dogs, so any prediction that does not contain any dog-objects is suspicious. Those images might need to be added to the training set, in order to improve the model. So we want to sort the inferred images into a 'dogs' and a 'no dogs' category.

In the next cell, we'll create two hooks to achieve both these goals and add them to the deployment.
The hooks we'll create are the following:

**The 'dogs' hook**:
- Checks if the predictions contain 1 or more `dog`s.
- If so, then:
  - Save the image, the prediction and the score that triggered the action to a folder `dogs` on disk. In this case, the score is the number of predicted dogs.

**The 'no dogs' hook**:
- Checks if the predictions do not contain any dogs.
- If so, send the image to the Geti server, to a separate dataset called `Inferred images - no dogs`.
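
The two checks described above boil down to comparing a label count against a threshold. The sketch below is a plain-Python illustration, not the SDK's `ObjectCountTrigger`: `count_trigger` is a hypothetical helper, the label lists are made up, and strict comparisons are assumed so that a threshold of 0 with mode `greater` means "one or more" and a threshold of 1 with mode `lower` means "none".

```python
def count_trigger(predicted_labels, label_name, threshold, mode):
    """Return True if the count of `label_name` is strictly above or below the threshold."""
    count = predicted_labels.count(label_name)
    if mode == "greater":
        return count > threshold
    if mode == "lower":
        return count < threshold
    raise ValueError(f"Unsupported mode: {mode}")


# Made-up predictions: image name -> list of predicted label names
predictions = {
    "two_dogs.jpg": ["dog", "dog", "person"],
    "no_dogs.jpg": ["cat"],
}

for name, labels in predictions.items():
    dogs = count_trigger(labels, "dog", threshold=0, mode="greater")
    no_dogs = count_trigger(labels, "dog", threshold=1, mode="lower")
    print(f"{name}: 'dogs' hook fires: {dogs}, 'no dogs' hook fires: {no_dogs}")
```

With these two settings, exactly one of the two checks is true for any prediction, so every inferred image ends up in one of the two categories.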

In [None]:
from geti_sdk.post_inference_hooks import FileSystemDataCollection, ObjectCountTrigger

NUMBER_OF_THREADS_PER_HOOK = 10

# First, remove any hooks that were added previously
deployment.clear_inference_hooks()

# Create the 'dogs' trigger, action and hook
dogs_trigger = ObjectCountTrigger(
    threshold=0, label_names=["dog"], mode="greater"
)  # Trigger will activate whenever a prediction contains one or more objects labelled 'dog'

dogs_action = FileSystemDataCollection(
    target_folder="hook_data/dogs",
    file_name_prefix="image",
    save_predictions=True,
    save_scores=True,
    save_overlays=True,
    log_level="debug",
)  # Action will store the image, prediction data, trigger score and the images with prediction overlays to the `dogs` folder on disk

dogs_hook = PostInferenceHook(
    trigger=dogs_trigger, action=dogs_action, max_threads=NUMBER_OF_THREADS_PER_HOOK
)

# Create the 'no_dogs' trigger, action and hook
no_dogs_trigger = ObjectCountTrigger(
    threshold=1, label_names=["dog"], mode="lower"
)  # Trigger will activate whenever a prediction does not contain any objects labelled 'dog'

no_dogs_action = GetiDataCollection(  # the Action will send data to a new `Inferred images - no dogs` dataset in the Geti project
    session=geti.session,
    workspace_id=geti.workspace_id,
    project=project,
    dataset="Inferred images - no dogs",
)

no_dogs_hook = PostInferenceHook(
    trigger=no_dogs_trigger,
    action=no_dogs_action,
    max_threads=NUMBER_OF_THREADS_PER_HOOK,
)

# Add both hooks to the deployment
deployment.add_post_inference_hook(dogs_hook)
deployment.add_post_inference_hook(no_dogs_hook)

Now that the hooks are created and added to the deployment, we can run the inference again.

We will run it on 50 images from the COCO dataset. The images are selected such that each of them contains at least one dog. 

In the cell below, we first get a list of file paths to images with `dog`s in them.

In [None]:
import os

from geti_sdk.annotation_readers import DatumAnnotationReader
from geti_sdk.demos import get_coco_dataset

n_images = 50

path = get_coco_dataset()
ar = DatumAnnotationReader(path, annotation_format="coco")
ar.filter_dataset(labels=["dog"])
dog_image_filenames = ar.get_all_image_names()
dog_image_filepaths = [
    os.path.join(path, "images", "val2017", fn + ".jpg") for fn in dog_image_filenames
][0:n_images]
print(f"Selected the first {n_images} images containing dogs from COCO dataset")

Now we can run inference on the images and measure the time required, as shown in the cell below.

In [None]:
import time

from tqdm import tqdm

t_start = time.time()
for image_path in tqdm(dog_image_filepaths):
    image = cv2.imread(image_path)
    numpy_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    deployment.infer(numpy_rgb)
t_elapsed = time.time() - t_start
print(
    f"Inference on {n_images} images with 2 post-inference hooks completed in {t_elapsed:.2f} seconds."
)

You should now see a new folder `hook_data` in your working directory. Inside this folder, you'll find a folder named `dogs`, which contains four subfolders: `images`, `predictions`, `scores` and `overlays`. The contents of these folders are the following:
- `images` contains the image files which triggered the hook
- `predictions` contains the prediction data in .json format
- `scores` contains txt files with the score for each image that caused the hook to trigger
- `overlays` contains the images with the predictions visualized on top of them. This can be useful for checking the output visually.

The file names are consistent across the subfolders, i.e. the prediction for a certain image can be found in the .json file with the same name, in the `predictions` folder.

The 'no dogs' hook does not write anything to disk: it uses the `GetiDataCollection` action, so any images that trigger it are sent to the `Inferred images - no dogs` dataset in the Geti project instead.

If you take a look in the `dogs` folder now, you'll find that its subfolders are populated with images, predictions, score files and overlay images.

### What about overhead?

Because post inference hooks are executed in separate threads, adding them to your deployment adds minimal overhead to the inference process. Let's clear the hooks and measure the inference time again, to get an estimate of the impact.

In [None]:
# Remove any post-inference hooks
deployment.clear_inference_hooks()

# Now run the inference loop without any hooks
t_start = time.time()
for image_path in tqdm(dog_image_filepaths):
    image = cv2.imread(image_path)
    numpy_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    deployment.infer(numpy_rgb)
t_elapsed = time.time() - t_start
print(
    f"Inference on {n_images} images without post-inference hooks completed in {t_elapsed:.2f} seconds."
)

Most likely, you will notice that the inference time without any hooks is lower than with the 2 hooks applied. Nevertheless, the additional time required is much smaller than if you carried out the actions defined in the post-inference hooks synchronously after each inferred image.
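
The difference between synchronous and threaded execution can be illustrated with a generic thread pool. This is only a sketch of the general idea, not how the SDK schedules hooks internally: `slow_action` is a hypothetical stand-in for an upload or a file write, and the sleep duration is arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_action(frame_id):
    """Pretend to upload or save a frame; the sleep stands in for I/O."""
    time.sleep(0.1)
    return frame_id


N_FRAMES = 5

# Synchronous: every action blocks the loop until it finishes
t_start = time.perf_counter()
for frame_id in range(N_FRAMES):
    slow_action(frame_id)
sync_seconds = time.perf_counter() - t_start

# Threaded: the loop only submits the work and moves on to the next frame
with ThreadPoolExecutor(max_workers=N_FRAMES) as executor:
    t_start = time.perf_counter()
    futures = [executor.submit(slow_action, frame_id) for frame_id in range(N_FRAMES)]
    submit_seconds = time.perf_counter() - t_start
    # Waiting for the results happens outside the timed loop
    results = [future.result() for future in futures]

print(f"synchronous loop: {sync_seconds:.2f}s, threaded submit loop: {submit_seconds:.2f}s")
```

The threaded loop still pays a small cost for handing work to the pool, which matches the observation that hooks add some, but limited, overhead.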

## Saving a deployment with post inference hooks
If you save a deployment with post inference hooks, the hook configuration will be saved with it. The cell below shows how to do this.

In [None]:
target_folder = os.path.join("deployments", PROJECT_NAME)
deployment.save(target_folder);

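Once saved, the deployment can be recreated from the folder, and any post inference hooks that were attached when it was saved are added back automatically. Note that the previous section cleared the hooks with `clear_inference_hooks()`, so add `dogs_hook` and `no_dogs_hook` to the deployment again before saving if you want to see the two post inference hooks being added when executing the cell below.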

In [None]:
from geti_sdk.deployment import Deployment

offline_deployment = Deployment.from_folder(target_folder)

## Limiting hook execution rate

Suppose that we are running inference on a video stream. In that case, we might get many sequential frames which activate a hook trigger, because frames that appear shortly after one another may look very similar. To avoid filling up our data collection folder with such near-duplicate frames, we can limit the rate at which an action is allowed to run. This can be configured in the `PostInferenceHook` constructor, using the `limit_action_rate` and `max_frames_per_second` parameters.

To give an example of this, in the last demo of this notebook we'll run inference 50 times *on the same image*, to simulate a video stream. We'll create a hook with the `AlwaysTrigger`, which activates after every inferred image or frame, and have it send the data to Geti using the `GetiDataCollection` action. However, to avoid filling up our dataset with 50 duplicate images, we'll limit the action execution rate to 1 frame per second.

The cell below shows how to create this hook.

In [None]:
from geti_sdk.post_inference_hooks import AlwaysTrigger

trigger = AlwaysTrigger()
action = GetiDataCollection(
    session=geti.session,
    workspace_id=geti.workspace_id,
    project=project,
    dataset="Inferred video frames",
    log_level="debug",
)
geti_hook = PostInferenceHook(
    trigger=trigger,
    action=action,
    max_threads=5,
    limit_action_rate=True,
    max_frames_per_second=1,
)

Let's first clear the existing hooks, and then add our new hook to the deployment.

In [None]:
offline_deployment.clear_inference_hooks()
offline_deployment.add_post_inference_hook(geti_hook)
offline_deployment.load_inference_models()

Now, let's run inference 50 times again, each time on the same image.

In [None]:
image = cv2.imread(EXAMPLE_IMAGE_PATH)
numpy_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

t_start = time.time()
for ind in tqdm(range(50)):
    offline_deployment.infer(numpy_rgb)

t_elapsed = time.time() - t_start
print(
    f"50 inference iterations with rate-limited Geti I/O hook completed in {t_elapsed:.2f} seconds."
)

Your Geti project should now contain a new dataset called `Inferred video frames`, which should contain as many images as the number of seconds the benchmark took to run (plus one, because the action fires immediately on the first frame). So if it took 8 seconds to infer 50 times, the hook should have uploaded 9 images to Geti.

Note that the trigger that we use is the `AlwaysTrigger`, which activates on every inferred image or video frame, regardless of the prediction outcome. The rate limiting happens in the `Action` phase of the hook: it ensures that the action does not run more frequently than allowed by the rate limit, even if the trigger fires much more often.
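
The rate limiting behaviour can be sketched with a few lines of plain Python. `ActionRateLimiter` is a hypothetical helper written for this illustration, not part of geti-sdk, and the frame timestamps are made up; the sketch only shows the idea of letting the trigger fire on every frame while the action runs at most once per interval.

```python
class ActionRateLimiter:
    """Hypothetical helper: allows at most `max_per_second` actions per second."""

    def __init__(self, max_per_second):
        self.min_interval = 1.0 / max_per_second
        self.last_run = None

    def allow(self, timestamp):
        """Return True and record the timestamp if enough time has passed."""
        if self.last_run is None or timestamp - self.last_run >= self.min_interval:
            self.last_run = timestamp
            return True
        return False


limiter = ActionRateLimiter(max_per_second=1)

# The trigger fires on every frame; timestamps are in seconds (made-up values)
frame_timestamps = [0.0, 0.2, 0.5, 1.0, 1.1, 2.5]
executed = [t for t in frame_timestamps if limiter.allow(t)]
print(executed)  # prints [0.0, 1.0, 2.5]
```

The first frame always passes because nothing has run yet, which is why the number of collected images is roughly the elapsed time in seconds plus one.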