# Post inference hooks for inference model data collection
In this notebook we will have a look at how to set up post inference hooks for your inference models. The Geti SDK provides several basic triggers and actions that can be combined into pipelines for tasks such as data collection, alerting, or any other action that needs to take place based on inference results.

These pipelines are referred to as `post inference hooks` and can be added to any `Deployment` for any project. In this notebook we will show how to configure them, and use them with existing deployments.

To start off, we will create a post inference hook that implements the following behaviour:

*For every inferred frame or image, check if the prediction contains any objects labelled `dog`. If it contains at least 1 dog, we want to collect it and send the image to the Geti server. The image will be stored in a new dataset called `Inferred images`, within the original project.*

In [None]:
# As usual we will connect to the platform first, using the server details from the .env file

from geti_sdk import Geti
from geti_sdk.utils import get_server_details_from_env

geti_server_configuration = get_server_details_from_env()

geti = Geti(server_config=geti_server_configuration)

## Selecting a project

We'll use the `COCO animal detection demo` project that we created in [notebook 002](002_create_project_from_dataset.ipynb).

In [None]:
from geti_sdk.demos import ensure_trained_example_project

PROJECT_NAME = "COCO animal detection demo"
project = ensure_trained_example_project(geti=geti, project_name=PROJECT_NAME)

## Create deployment for the project

In [None]:
deployment = geti.deploy_project(PROJECT_NAME)

## Checking deployment output
Let's quickly load the inference models and check the inference output on a sample image.

In [None]:
deployment.load_inference_models()

In [None]:
import cv2

from geti_sdk.demos import EXAMPLE_IMAGE_PATH
from geti_sdk import Visualizer

numpy_image = cv2.imread(EXAMPLE_IMAGE_PATH)
numpy_rgb = cv2.cvtColor(numpy_image, cv2.COLOR_BGR2RGB)

prediction = deployment.infer(numpy_rgb)

visualizer = Visualizer()
result = visualizer.draw(numpy_rgb, prediction)
visualizer.show_in_notebook(result)

## Configuring a post inference hook to send image data to Geti

With the deployment all set up and ready, let's go ahead and add a post inference hook! We will configure it to behave as follows:

For each inferred image or frame:

- If and only if the prediction contains at least one object labelled `dog`:
  - Send the image to the Geti project, to a dedicated dataset named `Inferred images`

This behaviour can be separated into two parts: a **Trigger** and an **Action**. The first part, in which we check if the prediction contains at least one dog, is the Trigger. If the trigger activates, the Action is carried out: sending the data to the Intel Geti server.
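
To make the Trigger/Action split concrete, here is a minimal plain-Python sketch of the pattern. It does not use the Geti SDK: `SimpleHook`, `contains_label` and the dictionary-shaped prediction are hypothetical stand-ins chosen purely for illustration, and the "action" only appends to a list instead of uploading anything.

```python
from typing import Callable, List


class SimpleHook:
    """Hypothetical stand-in for a hook: runs `action` only if `trigger` activates."""

    def __init__(
        self, trigger: Callable[[dict], bool], action: Callable[[dict], None]
    ):
        self.trigger = trigger
        self.action = action

    def run(self, prediction: dict) -> bool:
        """Evaluate the trigger and, if it activates, carry out the action."""
        activated = self.trigger(prediction)
        if activated:
            self.action(prediction)
        return activated


def contains_label(label: str) -> Callable[[dict], bool]:
    """Build a trigger that activates when the prediction holds `label`."""
    return lambda prediction: label in prediction["labels"]


collected: List[dict] = []  # stands in for "send the image to the server"
demo_hook = SimpleHook(trigger=contains_label("dog"), action=collected.append)

demo_hook.run({"labels": ["dog", "cat"]})  # trigger activates, action runs
demo_hook.run({"labels": ["horse"]})  # trigger stays silent, nothing is collected
print(f"Collected {len(collected)} prediction(s)")
```

Keeping the condition and the side effect in separate objects is what makes it possible to mix and match them freely, which is the idea behind the SDK classes used in the rest of this notebook.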

The reasoning here is that if the prediction contains a dog, we want to collect the image in our animal detection project so that we can include it in the next training round. To achieve this, we will use the `LabelTrigger`: It will activate if the prediction contains any objects labelled `dog`. 

Of course, many other triggers can be defined. For example, the `ObjectCountTrigger` can be used to activate only when a prediction contains a certain number of objects, the `EmptyLabelTrigger` will activate when the prediction does not contain any objects, and the `ConfidenceTrigger` will activate when the probability for any of the predictions is below a certain threshold.

The cell below shows how to define the hook outlined above. 

In [None]:
from geti_sdk.post_inference_hooks import (
    GetiDataCollection,
    LabelTrigger,
    PostInferenceHook,
)

trigger = LabelTrigger(
    label_names=["dog"]
)  # the Trigger will activate whenever a prediction contains any object labelled `dog`

action = GetiDataCollection(  # the Action will send data to a new `Inferred images` dataset in the Geti project
    session=geti.session,
    workspace_id=geti.workspace_id,
    project=project,
    dataset="Inferred images",
    log_level="info",
)

hook = PostInferenceHook(  # The Hook attaches the action to the trigger
    trigger=trigger, action=action
)

Now, we just need to add the hook to the deployment.

In [None]:
deployment.add_post_inference_hook(hook)

Once added, whenever we run inference on an image or video frame, the hook will execute automatically.

In [None]:
prediction = deployment.infer(numpy_rgb)
print(f"Prediction contains objects with labels: {prediction.get_label_names()}")

From the cell above, you should get a printout with the list of labels in the prediction. If the label `dog` is among them, you should also see a log line stating that the image was uploaded to the Geti project.

## Adding multiple hooks

We can add as many hooks as we like, each with different triggers and actions. Suppose we are primarily interested in images with dogs in them, for some reason. At the same time, we know that we are feeding our model images containing dogs, so any prediction that does not contain any dog-objects is suspicious. Those images might need to be added to the training set, in order to improve the model. So we want to sort the inferred images into a 'dogs' and a 'no dogs' category.

In the next cell, we'll create two hooks to achieve both these goals and add them to the deployment.
The hooks we'll create are the following:

**The 'dogs' hook**:
- Checks if the predictions contain 1 or more `dog`s.
- If so, then:
  - Save the image, the prediction and the score that triggered the action to a folder `dogs` on disk. In this case, the score is the number of predicted dogs.

**The 'no dogs' hook**:
- Checks if the predictions do not contain any dogs.
- If so, send the image to the Geti server, to a separate dataset called `Inferred images - no dogs`.
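
Both conditions can be expressed as a count compared against a threshold. The sketch below restates that comparison in plain Python. It is not the SDK's implementation: `count_trigger_activates` is a hypothetical helper, and the strict `>` and `<` comparisons are an assumption based on how the two triggers in this notebook are described ("greater" than 0 meaning one or more dogs, "lower" than 1 meaning no dogs).

```python
from typing import List


def count_trigger_activates(
    labels: List[str], label_name: str, threshold: int, mode: str
) -> bool:
    """Hypothetical restatement of a count-based trigger condition.

    Assumes strict comparisons: "greater" means count > threshold and
    "lower" means count < threshold.
    """
    count = labels.count(label_name)
    if mode == "greater":
        return count > threshold
    if mode == "lower":
        return count < threshold
    raise ValueError(f"Unsupported mode: {mode!r}")


predicted_labels = ["dog", "dog", "cat"]

# 'dogs' condition: more than 0 dogs, i.e. at least one dog
dogs_active = count_trigger_activates(predicted_labels, "dog", 0, "greater")

# 'no dogs' condition: fewer than 1 dog, i.e. no dogs at all
no_dogs_active = count_trigger_activates(predicted_labels, "dog", 1, "lower")

print(f"dogs: {dogs_active}, no dogs: {no_dogs_active}")
```

Under this assumption the two conditions are mutually exclusive, so every inferred image activates exactly one of the two hooks.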

In [None]:
from geti_sdk.post_inference_hooks import FileSystemDataCollection, ObjectCountTrigger

NUMBER_OF_THREADS_PER_HOOK = 10

# First, remove any hooks that were added previously
deployment.clear_inference_hooks()

# Create the 'dogs' trigger, action and hook
dogs_trigger = ObjectCountTrigger(
    threshold=0, label_names=["dog"], mode="greater"
)  # Trigger will activate whenever a prediction contains one or more objects labelled 'dog'

dogs_action = FileSystemDataCollection(
    target_folder="hook_data/dogs",
    file_name_prefix="image",
    save_predictions=True,
    save_scores=True,
    save_overlays=True,
    log_level="debug",
)  # Action will store the image, prediction data, trigger score and the images with prediction overlays to the `dogs` folder on disk

dogs_hook = PostInferenceHook(
    trigger=dogs_trigger, action=dogs_action, max_threads=NUMBER_OF_THREADS_PER_HOOK
)

# Create the 'no_dogs' trigger, action and hook
no_dogs_trigger = ObjectCountTrigger(
    threshold=1, label_names=["dog"], mode="lower"
)  # Trigger will activate whenever a prediction does not contain any objects labelled 'dog'

no_dogs_action = GetiDataCollection(  # the Action will send data to a new `Inferred images - no dogs` dataset in the Geti project
    session=geti.session,
    workspace_id=geti.workspace_id,
    project=project,
    dataset="Inferred images - no dogs",
)

no_dogs_hook = PostInferenceHook(
    trigger=no_dogs_trigger,
    action=no_dogs_action,
    max_threads=NUMBER_OF_THREADS_PER_HOOK,
)

# Add both hooks to the deployment
deployment.add_post_inference_hook(dogs_hook)
deployment.add_post_inference_hook(no_dogs_hook)

Now that the hooks are created and added to the deployment, we can run the inference again.

We will run it on 50 images from the COCO dataset. The images are selected such that each of them contains at least one dog. 

In the cell below, we first get a list of filepaths to images with `dog`s in them.

In [None]:
import os

from geti_sdk.annotation_readers import DatumAnnotationReader
from geti_sdk.demos import get_coco_dataset

n_images = 50

path = get_coco_dataset()
ar = DatumAnnotationReader(path, annotation_format="coco")
ar.filter_dataset(labels=["dog"])
dog_image_filenames = ar.get_all_image_names()
dog_image_filepaths = [
    os.path.join(path, "images", "val2017", fn + ".jpg") for fn in dog_image_filenames
][0:n_images]
print(f"Selected the first {n_images} images containing dogs from COCO dataset")

Now, we can run inference on the images and measure the time required in the cell below.

In [None]:
import time

from tqdm import tqdm

t_start = time.time()
for image_path in tqdm(dog_image_filepaths):
    image = cv2.imread(image_path)
    numpy_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    deployment.infer(numpy_rgb)
t_elapsed = time.time() - t_start
print(
    f"Inference on {n_images} images with 2 post-inference hooks completed in {t_elapsed:.2f} seconds."
)

You should now see a new folder `hook_data` in your working directory. Inside this folder, you'll find a folder named `dogs`, matching the `target_folder` we configured for the 'dogs' action. The `dogs` folder contains four subfolders: `images`, `predictions`, `scores` and `overlays`. The contents of these folders are the following:
- `images` contains the image files which triggered the hook
- `predictions` contains the prediction data in .json format
- `scores` contains txt files with the score for each image that caused the hook to trigger
- `overlays` contains the images with the predictions visualized on top of them. This can be useful for checking the output visually.

The file names are consistent across the subfolders, i.e. the prediction for a certain image can be found in the .json file with the same name, in the `predictions` folder.
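
Because the file names match across subfolders, collected images can be paired with their predictions by file stem. The helper below is a small sketch of that idea using only the standard library. `pair_images_with_predictions` is a hypothetical name, and the code assumes nothing beyond the layout described above: an `images` and a `predictions` subfolder, with predictions stored as `.json` files.

```python
from pathlib import Path
from typing import Dict, Tuple


def pair_images_with_predictions(collection_root: str) -> Dict[str, Tuple[Path, Path]]:
    """Map each shared file stem to its (image path, prediction path) pair.

    Assumes `collection_root` contains `images` and `predictions` subfolders,
    with one `.json` prediction file per collected image.
    """
    root = Path(collection_root)
    images = {p.stem: p for p in (root / "images").glob("*") if p.is_file()}
    predictions = {p.stem: p for p in (root / "predictions").glob("*.json")}
    shared_stems = sorted(images.keys() & predictions.keys())
    return {stem: (images[stem], predictions[stem]) for stem in shared_stems}
```

For example, `pair_images_with_predictions("hook_data/dogs")` would return one entry per collected image, which can then be opened with `cv2.imread` and `json.load` respectively.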

Note that no `no_dogs` folder is created on disk: the 'no dogs' hook uses the `GetiDataCollection` action, so any images it collects are uploaded to the `Inferred images - no dogs` dataset in the Geti project instead.

If you take a look in those folders now, you'll find that they are populated with images, predictions, score files and overlay images.

### What about overhead?

Because post inference hooks are executed in separate threads, adding them to your deployment will add minimal overhead to the inference process. Let's clear the hooks and measure the inference time again, to get an estimate of the impact.

In [None]:
# Remove any post-inference hooks
deployment.clear_inference_hooks()

# Now run the inference loop without any hooks
t_start = time.time()
for image_path in tqdm(dog_image_filepaths):
    image = cv2.imread(image_path)
    numpy_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    deployment.infer(numpy_rgb)
t_elapsed = time.time() - t_start
print(
    f"Inference on {n_images} images without post-inference hooks completed in {t_elapsed:.2f} seconds."
)

You will most likely notice that inference without any hooks is faster than with the 2 hooks applied. Nevertheless, the additional time is much smaller than it would be if the actions defined in the post inference hooks were carried out synchronously after each inferred image.
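
The difference between synchronous and threaded execution can be illustrated without the SDK. The sketch below is an assumption-laden toy, not a description of how the SDK schedules hooks internally: `slow_action` is a hypothetical action that simulates an upload with a short sleep, and a standard `ThreadPoolExecutor` stands in for the hook's worker threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor

N_FRAMES = 10


def slow_action(frame_id: int) -> int:
    """Hypothetical slow action (e.g. an upload), simulated with a short sleep."""
    time.sleep(0.05)
    return frame_id


# Synchronous: the loop waits for every action before moving to the next frame
t_start = time.perf_counter()
for frame_id in range(N_FRAMES):
    slow_action(frame_id)
sync_seconds = time.perf_counter() - t_start

# Threaded: the loop only submits the work and moves on immediately
with ThreadPoolExecutor(max_workers=10) as pool:
    t_start = time.perf_counter()
    futures = [pool.submit(slow_action, frame_id) for frame_id in range(N_FRAMES)]
    loop_seconds = time.perf_counter() - t_start  # time the loop itself was blocked
    uploaded = [future.result() for future in futures]  # wait for background work

print(f"Loop blocked for {sync_seconds:.3f} s synchronously")
print(f"Loop blocked for {loop_seconds:.3f} s when using threads")
```

The work still has to be done, but it no longer holds up the loop that produces the next prediction. In this toy the action only sleeps; actions that do real work compete with inference for resources, which would explain why some overhead remains measurable in practice.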

## Saving a deployment with post inference hooks
If you save a deployment with post inference hooks, the hook configuration will be saved with it. The cell below shows how to do this.

In [None]:
target_folder = os.path.join("deployments", PROJECT_NAME)
deployment.save(target_folder);

Once saved, the deployment can be recreated and the post inference hooks will be added automatically. Upon executing the cell below, you should see the two post inference hooks being added to the deployment.

In [None]:
from geti_sdk.deployment import Deployment

offline_deployment = Deployment.from_folder(target_folder)

## Limiting hook execution rate

Suppose that we are running inference on a video stream. In that case, we might get many sequential frames which activate a hook trigger, because frames that appear shortly after one another may look very similar. To avoid filling up our data collection folder with such near-duplicate frames, we can choose to limit the rate at which an action is allowed to run. This can be configured in the `PostInferenceHook` constructor, using the `limit_action_rate` and `max_frames_per_second` parameters.

To give an example of this, in the last demo of this notebook we'll run inference 50 times *on the same image*, to simulate a video stream. We'll create a hook with the `AlwaysTrigger`, which activates after every inferred image or frame, and have it send the data to Geti using the `GetiDataCollection` action. However, to avoid filling up our dataset with 50 duplicate images, we'll limit the action execution rate to 1 frame per second.

The cell below shows how to create this hook.

In [None]:
from geti_sdk.post_inference_hooks import AlwaysTrigger

trigger = AlwaysTrigger()
action = GetiDataCollection(
    session=geti.session,
    workspace_id=geti.workspace_id,
    project=project,
    dataset="Inferred video frames",
    log_level="debug",
)
geti_hook = PostInferenceHook(
    trigger=trigger,
    action=action,
    max_threads=5,
    limit_action_rate=True,
    max_frames_per_second=1,
)

Let's first clear the existing hooks, and then add our new hook to the deployment.

In [None]:
offline_deployment.clear_inference_hooks()
offline_deployment.add_post_inference_hook(geti_hook)
offline_deployment.load_inference_models()

Now, let's run inference 50 times again, each time on the same image.

In [None]:
image = cv2.imread(EXAMPLE_IMAGE_PATH)
numpy_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

t_start = time.time()
for ind in tqdm(range(50)):
    offline_deployment.infer(numpy_rgb)

t_elapsed = time.time() - t_start
print(
    f"50 inference iterations with rate-limited Geti I/O hook completed in {t_elapsed:.2f} seconds."
)

Your Geti project should now contain a new dataset called `Inferred video frames`. It should contain roughly as many images as the number of seconds the benchmark took to run, plus one because the action fires immediately on the first frame. So if it took 8 seconds to infer 50 times, the hook should have uploaded about 9 images to Geti.

Note that the trigger that we use is the `AlwaysTrigger`, which activates on every inferred image or video frame, regardless of the prediction outcome. The rate limiting happens in the `Action` phase of the hook: it ensures that the action does not run more frequently than allowed by the rate limit, even if the trigger fires much more often.
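
The general idea of limiting an action's rate can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the SDK's implementation: `RateLimitedAction` and `FakeClock` are hypothetical names, and the policy shown (skip the action unless at least `1 / max_per_second` seconds have passed since it last ran) is one simple way to enforce such a limit. The clock is injected so that the example can simulate a video stream instantly.

```python
import time
from typing import Callable, List


class RateLimitedAction:
    """Hypothetical wrapper: runs `action` at most `max_per_second` times per second."""

    def __init__(
        self,
        action: Callable[[int], None],
        max_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.action = action
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self._last_run = None  # time of the most recent run, None if never run

    def __call__(self, data: int) -> bool:
        """Run the action unless it ran too recently. Returns True if it ran."""
        now = self.clock()
        if self._last_run is not None and now - self._last_run < self.min_interval:
            return False  # too soon after the previous run: skip this frame
        self._last_run = now
        self.action(data)
        return True


class FakeClock:
    """Manually advanced clock, so the demo is instant and deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


# Simulate 20 frames arriving at 4 frames per second (one frame every 0.25 s)
clock = FakeClock()
uploaded_frames: List[int] = []
limited_upload = RateLimitedAction(uploaded_frames.append, max_per_second=1, clock=clock)

for frame_index in range(20):
    clock.now = 0.25 * frame_index
    limited_upload(frame_index)  # the "trigger" fires on every frame

print(f"Frames passed on to the action: {uploaded_frames}")
```

In this simulation the action runs on the very first frame and then once per simulated second, while all frames in between are skipped.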