# Geti model benchmarking

This notebook shows how to measure and compare inference rates on your local hardware for the algorithms available in a Geti project. The Geti SDK provides a `Benchmarker` class that lets you quickly set up a series of benchmarking experiments for a specific project. With it, you can compare the framerates achieved by deployed models of different algorithms, which helps in selecting a suitable architecture for model deployment.

In [None]:
# As usual we will connect to the platform first, using the server details from the .env file

from geti_sdk import Geti
from geti_sdk.utils import get_server_details_from_env

geti_server_configuration = get_server_details_from_env()

geti = Geti(server_config=geti_server_configuration)

## Selecting a project
The `Benchmarker` runs experiments for one project at a time, which can be either a single-task or a task-chain project. As an example, we'll pick the `COCO animal detection demo` project that we created in [notebook 002](002_create_project_from_dataset.ipynb).

In [None]:
PROJECT_NAME = "COCO animal detection demo"
project = geti.get_project(project_name=PROJECT_NAME)

## Setting up the Benchmarker
Now that we know which project to pick, we can initialize the `Benchmarker`. We need to decide on a couple of things:
1. Which algorithms to benchmark
2. What media to use for benchmarking
3. The precision levels of the models we want to benchmark (i.e. FP32, FP16, INT8)

### Algorithms
For the algorithms, let's use three of the algorithms that are available for the project we selected. The `COCO animal detection demo` project is a single-task project with a Detection task. The `algorithms_to_benchmark` variable holds the names of the Detection algorithms we want to compare: `MobileNetV2-ATSS`, `SSD` and `YOLOX-S`.

In [None]:
algorithms_to_benchmark = ["MobileNetV2-ATSS", "SSD", "YOLOX-S"]

### Media
The experiments can run on either images or a video. There are several ways to specify the input media: you can supply a list of image filepaths, a path to a video, a list of Geti `Image` objects or numpy arrays, or a Geti `Video` object. In this case, we'll simply use all images that are already in the project.

In [None]:
from geti_sdk.rest_clients import ImageClient

image_client = ImageClient(
    session=geti.session, workspace_id=geti.workspace_id, project=project
)
images = image_client.get_all_images()
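
As an alternative to using the project images, the text above mentions that a list of image filepaths can be supplied. The sketch below shows one way such a list might be built from a local folder. The helper `collect_image_paths` and the set of extensions are hypothetical and not part of `geti_sdk`; the demo uses empty placeholder files in a temporary folder.

```python
import tempfile
from pathlib import Path

# Assumption: these extensions cover the images in your folder; adjust as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def collect_image_paths(folder):
    """Hypothetical helper: return a sorted list of image filepaths found directly in `folder`."""
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )


# Demo on a temporary folder containing empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.jpg", "b.PNG", "notes.txt"):
        (Path(tmp) / name).touch()
    image_paths = collect_image_paths(tmp)
    print([Path(p).name for p in image_paths])
```

A list like `image_paths`, built from a real folder, could then be passed as `benchmark_images` in place of `images`.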

### Precision levels
Geti allows deploying models with different precision levels. Typically, deploying a model with `INT8` precision results in a considerable increase in throughput compared to running an `FP32` or even `FP16` model. Using the `Benchmarker`, we can measure the inference framerate for each of these precision levels and quantify the difference. We simply have to pass a `precision_levels` variable that contains the model precisions we want to deploy and measure.

In [None]:
precision_levels = ["FP32", "FP16", "INT8"]

### Benchmarker initialization
With the project, algorithms, media and precision levels sorted out, we can initialize the Benchmarker.

In [None]:
from geti_sdk.benchmarking import Benchmarker

benchmarker = Benchmarker(
    geti=geti,
    project=project,
    algorithms=algorithms_to_benchmark,
    precision_levels=precision_levels,
    benchmark_images=images,
)

## Preparing the Intel® Geti™ project to run the benchmark
Now that the `Benchmarker` is initialized, we need to make sure that all the algorithms that we'd like to benchmark have a model trained in the project. To do so, the Benchmarker provides a `prepare_benchmark` method that we can call. If we call it, the method will make sure of three things:
1. It will check whether every algorithm we want to benchmark has a trained model in the project. If not, it will start model training and wait for it to complete.
2. It will check whether an optimized model is available, in each of the specified precision levels, for every trained model that we want to benchmark. If not, it will trigger model optimization in the required precision levels and wait for it to complete.
3. It will create and download deployments for all the specified algorithms and precision levels.
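
To get a feel for how much work the preparation involves, the sketch below enumerates the experiments implied by the lists defined earlier. It assumes one deployment per (algorithm, precision level) pair, which is how we read step 3 above; it does not call the `Benchmarker` and is not its actual implementation.

```python
from itertools import product

algorithms_to_benchmark = ["MobileNetV2-ATSS", "SSD", "YOLOX-S"]
precision_levels = ["FP32", "FP16", "INT8"]

# Assumption: one deployment is created per (algorithm, precision) pair
combinations = list(product(algorithms_to_benchmark, precision_levels))

for algorithm, precision in combinations:
    print(f"{algorithm} @ {precision}")
print(f"Total deployments: {len(combinations)}")
```

Under that assumption, three algorithms and three precision levels give nine deployments, so adding an algorithm or precision level increases the preparation and benchmarking time accordingly.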

When calling the `prepare_benchmark` method, we just have to pass a path to a directory on the local disk. The method will save the deployments that it creates to this folder.

> NOTE: Preparing the benchmark may take some time, especially if not all algorithms have a trained model yet. In that case we have to wait for model training to complete. Please run the cell below and wait for all jobs to complete. Progress will be reported as training and optimization advance.

In [None]:
import os

benchmark_folder = os.path.join("benchmarks", PROJECT_NAME)
benchmarker.prepare_benchmark(working_directory=benchmark_folder)

## Running the benchmark
At this point all models are trained, optimized and deployed! This means that the benchmark is ready to go. You can run the benchmark by calling the `run_throughput_benchmark` method that the `Benchmarker` provides. It accepts the following arguments:

- `working_directory`: The folder in which the deployments are stored that should be benchmarked. Benchmarking results will also be saved to this directory.
- `results_filename`: The name of the file in which the benchmarking results will be saved.
- `target_device`: The hardware that the inference models should run on. This defaults to `"CPU"`, but any device that is supported by OpenVINO can be used. More details can be found in the [OpenVINO supported devices documentation](https://docs.openvino.ai/2023.2/openvino_docs_OV_UG_supported_plugins_Supported_Devices.html).
- `frames`: The number of video frames or images to use in the benchmark. These will be selected from the media that we provided when initializing the `Benchmarker` earlier in this notebook. Note that all frames/images will be loaded into memory for the benchmarking, so don't make this number too large or you may run into out-of-memory issues.
- `repeats`: The number of times the benchmark needs to run on all frames. Increasing this number will give a more accurate estimate of the framerate, but increases the time required for the experiments. The total number of frames that are inferred for each model is `frames * repeats`.

Run the cell below to execute the benchmark. 

In [None]:
results = benchmarker.run_throughput_benchmark(
    working_directory=benchmark_folder,
    results_filename="results",
    target_device="CPU",
    frames=100,
    repeats=2,
)
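
The relation between `frames`, `repeats` and a measured framerate can be sketched as follows. This is a simplified illustration, not the `Benchmarker`'s implementation: `measure_fps` and `dummy_infer` are hypothetical names, and a short `time.sleep` stands in for real model inference.

```python
import time


def measure_fps(infer, images, repeats):
    """Run `infer` on every image `repeats` times; return (fps, total_frames)."""
    total_frames = len(images) * repeats  # corresponds to frames * repeats
    start = time.perf_counter()
    for _ in range(repeats):
        for image in images:
            infer(image)
    elapsed = time.perf_counter() - start
    return total_frames / elapsed, total_frames


def dummy_infer(image):
    # Placeholder for model inference (assumption: roughly 1 ms per frame)
    time.sleep(0.001)


frames = 20
repeats = 2
images = [None] * frames  # stand-ins for frames that are already loaded in memory

fps, total_frames = measure_fps(dummy_infer, images, repeats)
print(f"Inferred {total_frames} frames at {fps:.1f} fps")
```

Because the average is taken over `frames * repeats` inferences, a higher `repeats` value smooths out timing noise at the cost of a longer run.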

## Inspecting the results
The benchmark results are stored in a `results.csv` file in the working directory that we specified earlier. In addition, the `run_throughput_benchmark` method returns the results as a list of dictionaries. This is captured in the `results` variable in the cell above. Using pandas we can easily visualize the results in the notebook.

Displaying `results` as a pandas DataFrame, as done in the last cell of this notebook, shows a table containing all results from the benchmark experiments. Each row represents one of the deployments for which the benchmark ran.

The most important columns are the ones labelled `model 1 score`, `success` and `fps`. `model 1 score` contains the model accuracy (or F-measure in case of a detection project) for the model used in the deployment. The `success` column indicates whether the deployment was able to run inference on all frames. It is either `1` or `0`, with `1` indicating success. Finally, the `fps` column shows the measured average frames per second for the deployment.

In addition, the table contains some details about the system, indicating the operating system, some info regarding the target device and the python, geti-sdk and openvino versions. This is useful when comparing benchmark results across different hardware setups.

### Visual predictions comparison
Although the model scores give statistical insight into model performance, it is also useful to compare the models' predictions visually. The `Benchmarker` exposes a `compare_predictions` method, which runs the saved deployments on a provided image and returns their predictions combined into a single image for comparison.

In [None]:
from IPython.display import display
from PIL import Image

from geti_sdk.demos import EXAMPLE_IMAGE_PATH

prediction_comparison = benchmarker.compare_predictions(
    working_directory=benchmark_folder,
    image=EXAMPLE_IMAGE_PATH,
    throughput_benchmark_results=results,
    include_online_prediction_for_active_model=True,
)
display(Image.fromarray(prediction_comparison))

## Conclusion
The table below should help in selecting a model for production deployment. The optimal model has a sufficiently high `model 1 score`, while still reaching the desired `fps`.

In [None]:
import pandas as pd

df = pd.DataFrame(results)
df
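
The selection rule described in the conclusion could be sketched in plain Python as follows. The `select_model` helper and the `name` key are hypothetical, and the rows are illustrative placeholders rather than measured results; only the `model 1 score`, `success` and `fps` keys come from the results described above. Picking the highest score among the deployments that reach the desired framerate is one possible design choice.

```python
def select_model(results, min_score, min_fps):
    """Hypothetical helper: return the successful row with the highest score
    among those meeting both thresholds, or None if no row qualifies."""
    candidates = [
        row
        for row in results
        if row["success"] == 1
        and row["model 1 score"] >= min_score
        and row["fps"] >= min_fps
    ]
    return max(candidates, key=lambda row: row["model 1 score"], default=None)


# Placeholder rows for illustration only; these are NOT benchmark measurements
example_results = [
    {"name": "deployment A", "model 1 score": 0.80, "fps": 15.0, "success": 1},
    {"name": "deployment B", "model 1 score": 0.75, "fps": 40.0, "success": 1},
    {"name": "deployment C", "model 1 score": 0.90, "fps": 5.0, "success": 1},
    {"name": "deployment D", "model 1 score": 0.85, "fps": 30.0, "success": 0},
]

best = select_model(example_results, min_score=0.7, min_fps=10.0)
print(best["name"] if best else "No deployment meets the requirements")
```

To apply this to a real run, pass the `results` list returned by `run_throughput_benchmark` instead of `example_results`, with thresholds that match your application.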