# Benchmarking a method

When developing a new method, you will often want to compare it against existing methods
on the dataset you are evaluating. Photoholmes includes implementations of several
methods from the literature, covering a diverse array of approaches to forgery detection,
that you can use as baselines.

It also includes a Benchmark object to easily evaluate the performance of a method over
a dataset. This ensures a fair and reproducible comparison between methods.
This short tutorial shows how to use the Benchmark object with the included method
DQ and with a custom method that we will define, both evaluated on the Columbia dataset.

1. [ Running an included method ](#running-an-included-method)
2. [ Running a custom method ](#running-a-custom-method)


## Running an included method


To run an included method, we first import and instantiate the method. You can do this by using the method factory.


In [None]:
from photoholmes.methods.method_factory import MethodFactory, MethodName


dq, dq_preprocessing = MethodFactory.load(MethodName.DQ)

Or by importing the method directly.


In [None]:
from photoholmes.methods.dq import DQ, dq_preprocessing

dq = DQ()
dq

It's important to import the dq_preprocessing pipeline, as we need to pass it to the Dataset. But first, we need to download the [Columbia Uncompressed Image Splicing Detection](https://www.ee.columbia.edu/ln/dvmm/downloads/authsplcuncmp/) dataset. Keep in mind this dataset is under a [ research-only use License ](https://www.ee.columbia.edu/ln/dvmm/downloads/authsplcuncmp/dlform.html). You can download the dataset [ here ](https://www.dropbox.com/sh/786qv3yhvc7s9ki/AACbEEzGPrD3_y38bpWHzgdqa?e=1&dl=0).

Once downloaded, unzip the files and update the following variable with the path to the dataset folder.


In [None]:
columbia_dataset_path: str = (
    "data/Columbia Uncompressed Image Splicing Detection"  # UPDATE WITH THE PATH ON YOUR COMPUTER
)

As with the methods, we can load a dataset by direct import:


In [None]:
from photoholmes.datasets.columbia import ColumbiaDataset

dataset = ColumbiaDataset(
    img_dir=columbia_dataset_path,
    item_data=["image", "dct_coefficients", "qtables"],
    transform=dq_preprocessing,
)
print("Total images: ", len(dataset))

Or by using the factory:


In [None]:
from photoholmes.datasets.dataset_factory import DatasetFactory, DatasetName


dataset = DatasetFactory.load(
    DatasetName.COLUMBIA,
    dataset_dir=columbia_dataset_path,
    item_data=["image", "dct_coefficients", "qtables"],
    transform=dq_preprocessing,
)
print("Total images: ", len(dataset))

For more information on the datasets, see the Datasets section of the README.md.

Lastly, we need to select the metrics to evaluate. We will load the AUROC, IoU and F1 metrics using the MetricFactory. To see how to use metrics outside the factory or custom metrics, see the documentation.


In [None]:
from photoholmes.metrics.metric_factory import MetricFactory, MetricName

metrics = MetricFactory.load([MetricName.AUROC, MetricName.F1, MetricName.IoU])
print(metrics)
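
For intuition about what two of these metrics measure, here is a minimal pure-Python sketch of the standard IoU and F1 definitions on a tiny binary mask. The helper names (`confusion_counts`, `iou`, `f1`) are illustrative and are not part of Photoholmes; the library's own metrics may differ in how they threshold predictions and aggregate over images.

```python
from typing import List, Tuple

Mask = List[List[int]]  # rows of 0/1 pixel labels


def confusion_counts(pred: Mask, target: Mask) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives pixel by pixel."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    return tp, fp, fn


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, target)
    union = tp + fp + fn
    return tp / union if union else 0.0


def f1(pred: Mask, target: Mask) -> float:
    """F1 score: 2 * TP / (2 * TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy masks invented for this example.
predicted = [[1, 1, 0], [0, 1, 0]]
ground_truth = [[1, 0, 0], [0, 1, 1]]
print("IoU:", iou(predicted, ground_truth))
print("F1:", f1(predicted, ground_truth))
```

Both functions return 0.0 when there are no positive pixels at all, which is one possible convention among several.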

Now we are ready to run the benchmark. First, we create a Benchmark object. The constructor lets you tune the following parameters:

- **save_method_outputs:** Whether to save the method outputs.
- **save_extra_outputs:** Whether to save extra outputs.
- **save_metrics_flag:** Whether to save metrics.
- **output_folder:** Path to the output folder.
- **device:** torch Device for computation.
- **use_existing_output:** Whether to use existing saved outputs.
- **verbose:** Verbosity level.


In [None]:
from photoholmes.benchmark import Benchmark

benchmark = Benchmark(
    output_folder="example_output",
)

We are ready to go! The following cell runs the evaluation. It should take around two minutes to complete.


In [None]:
dq_results = benchmark.run(
    method=dq,
    dataset=dataset,
    metrics=metrics,
)
print(dq_results)

Notice that a folder example_output has been created. There, the benchmark will create the following folder structure:

```terminal
example_output/
└── dq
    └── columbiadataset
        ├── metrics
        │   └── 20240308_01:34_tampered_and_pristine
        │       ├── heatmap_report.json
        │       └── heatmap_state.pt
        └── outputs
            ├── canong3_02_sub_01
            │   └── output.npz
            ├── canong3_02_sub_02
            ...
```

Inside the _<method>/<dataset>_ folder, in this case _dq/columbiadataset_, you will find two folders: _outputs/_ and _metrics/_. Inside the _outputs_
folder you will find the saved method outputs, so they can be reused to save compute time. There are three kinds of output file:

1. _output.npz_: saves the benchmark outputs (heatmap, mask and/or detection).
2. _output_extra.npz_: saves any extra output arrays that were included in the benchmark output. It is written only if the benchmark was created with
   _save_extra_outputs=True_.
3. _output_extra_json.json_: saves any extra output that isn't an array.
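
To check which of these files were written for each image, a small standard-library sketch like the following can help. The function name `summarize_outputs` is hypothetical, and the path is assumed from the folder tree shown earlier; adjust both to your own run.

```python
from pathlib import Path
from typing import Dict, List

# File names taken from the list above.
KNOWN_OUTPUT_FILES = ("output.npz", "output_extra.npz", "output_extra_json.json")


def summarize_outputs(outputs_dir: Path) -> Dict[str, List[str]]:
    """Map each image folder name to the known output files it contains."""
    summary: Dict[str, List[str]] = {}
    if not outputs_dir.is_dir():
        return summary  # nothing saved yet, or wrong path
    for image_dir in sorted(outputs_dir.iterdir()):
        if not image_dir.is_dir():
            continue  # skip stray files next to the image folders
        summary[image_dir.name] = [
            name for name in KNOWN_OUTPUT_FILES if (image_dir / name).is_file()
        ]
    return summary


# Path assumed from the folder tree shown earlier.
outputs_dir = Path("example_output") / "dq" / "columbiadataset" / "outputs"
for image_name, files in summarize_outputs(outputs_dir).items():
    print(image_name, files)
```

If the folder does not exist, the function returns an empty dictionary and the loop prints nothing.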

In the _metrics_ folder, you will find the benchmarking results. Every time you run a benchmark, a folder is created whose name combines the timestamp and the
type of run (tampered only or tampered and pristine). Inside that folder, you will find metric files for each type of output your method produces.
Photoholmes divides method outputs into three types:

1. **heatmap:** a probability map.
2. **mask:** a binary mask.
3. **detection:** a score for detection.

Your method can predict only one output per type. Inside the metrics folder you will find a report for each type your method produced, in this case
only _heatmap_report.json_.
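
When several runs accumulate, it can be handy to pick the most recent one programmatically. The sketch below assumes the folder name format `<YYYYMMDD>_<HH:MM>_<run type>`, inferred from the single example in the tree above; the helper names and the second folder name are made up for illustration.

```python
from datetime import datetime
from typing import Iterable, Tuple


def parse_run_folder(name: str) -> Tuple[datetime, str]:
    """Split '<YYYYMMDD>_<HH:MM>_<run type>' into a timestamp and a run type."""
    date_part, time_part, run_type = name.split("_", 2)
    timestamp = datetime.strptime(f"{date_part}_{time_part}", "%Y%m%d_%H:%M")
    return timestamp, run_type


def latest_run(names: Iterable[str]) -> str:
    """Return the folder name with the most recent timestamp."""
    return max(names, key=lambda name: parse_run_folder(name)[0])


runs = [
    "20240308_01:34_tampered_and_pristine",  # from the tree above
    "20240307_23:10_tampered_only",  # hypothetical name, for illustration only
]
print(parse_run_folder(runs[0]))
print("Latest run:", latest_run(runs))
```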


Congrats! You have benchmarked DQ on the Columbia dataset.


## Running a custom method


First, let's implement a basic method to run our benchmark on. We won't do anything fancy: we simply predict a random array with the same height and width
as the image, along with a random detection score.


In [None]:
import random
from typing import Tuple

import numpy as np
import torch
from numpy.typing import NDArray

from photoholmes.methods.base import BaseMethod, BenchmarkOutput
from photoholmes.preprocessing.image import ToNumpy
from photoholmes.preprocessing.input import InputSelection
from photoholmes.preprocessing.pipeline import PreProcessingPipeline


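class RandomMethod(BaseMethod):
    """Toy method that outputs random predictions, for demonstration only."""

    def __init__(self, threshold=0.5):
        super().__init__()
        # Stored for illustration; predict() below does not use it.
        self.threshold = threshold

    def predict(self, image: NDArray) -> Tuple[NDArray, float]:
        # Random heatmap with the image's height and width, plus a random score.
        heatmap = np.random.random(size=image.shape[:2])
        detection = random.random()
        return heatmap, detection

    def benchmark(self, image: NDArray) -> BenchmarkOutput:
        # Wrap the predictions as tensors, one entry per output type.
        heatmap, detection = self.predict(image)
        return {
            "heatmap": torch.from_numpy(heatmap),
            "mask": None,
            "detection": torch.tensor([detection]),
        }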


method = RandomMethod()
random_method_preprocessing = PreProcessingPipeline(
    [ToNumpy(), InputSelection(["image"])]
)

Let's test it out on a sample image:


In [None]:
! curl https://media.taringa.net/knn/fit:550/Z3M6Ly9rbjMvdGFyaW5nYS9ELzIvNy9GLzQvRi9veWVjb21vdmFhLzQzNy5qcGc -o data/paul.webp

In [None]:
from photoholmes.utils.image import read_image, plot

img = read_image("data/paul.webp")
plot(img)

In [None]:
import matplotlib.pyplot as plt

input_preprocessed = random_method_preprocessing(image=img)
heatmap, score = method.predict(**input_preprocessed)
print("Detection score:", score)
plt.imshow(heatmap)
plt.show()

Our method clearly isn't good, but it serves as a good enough example.
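
Note that `RandomMethod` stores a `threshold` but never uses it. One possible use, sketched here with NumPy only and outside Photoholmes, is to binarize a heatmap into a mask. The function name `heatmap_to_mask` is hypothetical, and the random array merely stands in for a predicted heatmap.

```python
import numpy as np


def heatmap_to_mask(heatmap: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Mark as tampered every pixel whose score is strictly above the threshold."""
    return heatmap > threshold


rng = np.random.default_rng(seed=0)  # seeded so the example is reproducible
heatmap = rng.random(size=(4, 6))  # stand-in for a predicted heatmap
mask = heatmap_to_mask(heatmap, threshold=0.5)
print(mask.shape, mask.dtype, "tampered fraction:", mask.mean())
```

Such a mask could be returned under the `"mask"` key of the benchmark output instead of `None`, after converting it to a tensor in the same way as the heatmap.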

Let's load our datasets and metrics again and run the benchmark.

In [None]:
from photoholmes.datasets.dataset_factory import DatasetFactory, DatasetName
from photoholmes.metrics.metric_factory import MetricFactory, MetricName


columbia_dataset_path: str = (
    "data/Columbia Uncompressed Image Splicing Detection"  # UPDATE WITH THE PATH ON YOUR COMPUTER
)

dataset = DatasetFactory.load(
    DatasetName.COLUMBIA,
    dataset_dir=columbia_dataset_path,
    item_data=["image", "dct_coefficients", "qtables"],
    transform=random_method_preprocessing,
)
print("Total images: ", len(dataset))

metrics = MetricFactory.load([MetricName.AUROC, MetricName.F1, MetricName.IoU])
print(metrics)

In [None]:
from photoholmes.benchmark import Benchmark

benchmark = Benchmark(output_folder="example_output")

In [None]:
random_method_results = benchmark.run(method=method, dataset=dataset, metrics=metrics)
print(random_method_results)

If you check the output folder, you will notice a folder _randommethod/columbiadataset_ with the evaluation results.