
Add COCO evaluation metrics #111

Open
NielsRogge opened this issue May 3, 2021 · 12 comments

Comments

@NielsRogge
Contributor

NielsRogge commented May 3, 2021

I'm currently working on adding Facebook AI's DETR model (end-to-end object detection with Transformers) to HuggingFace Transformers. The model is working fine, but regarding evaluation, I'm currently relying on external CocoEvaluator and PanopticEvaluator objects which are defined in the original repository (here and here respectively).

Running these in a notebook gives you nice summaries like this:
[screenshot: the standard COCO AP/AR summary table printed by CocoEvaluator]

It would be great if we could import these metrics from the Datasets library, something like this:

import datasets

metric = datasets.load_metric('coco')

for model_inputs, gold_references in evaluation_dataset:
    model_predictions = model(model_inputs)
    metric.add_batch(predictions=model_predictions, references=gold_references)

final_score = metric.compute()

I think this would be great for object detection and semantic/panoptic segmentation in general, not just for DETR. Reproducing results of object detection papers would be way easier.

However, object detection and panoptic segmentation evaluation is a bit more complex than accuracy (it's more a summary of metrics at different thresholds than a single number). I'm not sure how to proceed here, but I'm happy to help make this possible.
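
For illustration, the output of metric.compute() for such a metric could be a dictionary mirroring pycocotools' 12-value summary rather than a single number. This is only a sketch of a possible return value, not an existing Datasets API, and the numbers are placeholders:

# Hypothetical return value of metric.compute() for COCO-style bbox evaluation;
# keys mirror pycocotools' summary table, values are made up.
final_score = {
    "AP": 0.421,        # AP @ IoU=0.50:0.95, area=all, maxDets=100
    "AP50": 0.623,      # AP @ IoU=0.50
    "AP75": 0.443,      # AP @ IoU=0.75
    "AP_small": 0.205,
    "AP_medium": 0.458,
    "AP_large": 0.611,
    "AR_1": 0.334,      # AR with at most 1 detection per image
    "AR_10": 0.528,
    "AR_100": 0.557,
    "AR_small": 0.312,
    "AR_medium": 0.602,
    "AR_large": 0.742,
}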

@NielsRogge added the "enhancement" label on May 3, 2021
@bhavitvyamalik

Hi @NielsRogge,
I'd like to contribute these metrics to datasets. Shall we start with CocoEvaluator first? Currently, how are you sending the ground truths and predictions to coco_evaluator?
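
For context, pycocotools works with the standard COCO formats: one dict per detection for predictions, and a full COCO annotation file for the ground truth. A minimal sketch (file name, IDs and boxes below are made up):

from pycocotools.coco import COCO

# Ground truth: a COCO-style JSON file with "images", "annotations" and "categories".
coco_gt = COCO("instances_val2017.json")  # placeholder path

# Predictions: the standard COCO results format, one dict per detection,
# boxes as [x, y, width, height] in absolute pixel coordinates.
predictions = [
    {"image_id": 42, "category_id": 18, "bbox": [258.2, 41.3, 348.3, 243.6], "score": 0.94},
    {"image_id": 42, "category_id": 1, "bbox": [61.0, 22.8, 504.0, 609.7], "score": 0.55},
]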

@NielsRogge
Contributor Author

NielsRogge commented Jun 2, 2021

Great!

Here's a notebook that illustrates how I'm using CocoEvaluator: https://drive.google.com/file/d/1VV92IlaUiuPOORXULIuAdtNbBWCTCnaj/view?usp=sharing

The evaluation is near the end of the notebook.
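
In short, the evaluation loop drives DETR's CocoEvaluator roughly as sketched below; the dataloader, the "pixel_values" key and the postprocess helper are placeholders for the notebook's own post-processing of model outputs into the COCO API format:

from coco_eval import CocoEvaluator  # copied from the DETR repo (datasets/coco_eval.py)

coco_evaluator = CocoEvaluator(coco_gt, ["bbox"])  # coco_gt is a pycocotools COCO object

for batch in val_dataloader:
    outputs = model(batch["pixel_values"])
    # convert raw outputs to {image_id: {"scores", "labels", "boxes"}} (placeholder helper)
    results = postprocess(outputs, batch)
    coco_evaluator.update(results)

coco_evaluator.synchronize_between_processes()
coco_evaluator.accumulate()
coco_evaluator.summarize()  # prints the AP/AR summary shown in the screenshot above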

@bhavitvyamalik

bhavitvyamalik commented Jun 3, 2021

I went through the code you mentioned and I think there are 2 options for how we can go ahead:

  1. Implement it the way the DETR authors have done (they rely heavily on the official implementation and focus on a torch dataset here). I feel ours should be something generic instead of PyTorch-specific.
  2. Do an implementation where the user converts their outputs and ground-truth annotations to a pre-defined format and then feeds them into our function to calculate the metrics (this looks very similar to what you proposed above).

In my opinion, the 2nd option looks very clean, but I'm still figuring out how it transforms the box coordinates of coco_gt, which you passed to CocoEvaluator (the ground truth for evaluation). Since your model output was already converted to the COCO API format, I faced few problems there.

@NielsRogge
Contributor Author

Ok, thanks for the update.

Indeed, the metrics API of Datasets is framework agnostic, so we can't rely on a PyTorch-only implementation.

This file is probably what we need to implement.
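
At its core, a framework-agnostic implementation would mostly wrap pycocotools' COCOeval, which only needs plain lists/dicts and JSON files. A minimal sketch (file names are placeholders):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("instances_val2017.json")       # ground truth, COCO annotation format (placeholder path)
coco_dt = coco_gt.loadRes("predictions.json")  # detections, COCO results format (placeholder path)

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()    # prints AP/AR at the standard IoU thresholds and object sizes

stats = coco_eval.stats  # numpy array with the 12 summary values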

@mariosasko transferred this issue from huggingface/datasets on Jun 2, 2022
@lvwerra added the "metric request" label and removed the "enhancement" label on Aug 3, 2022
@kadirnar
Contributor

kadirnar commented Aug 8, 2022

Hi @lvwerra

Do you plan to add a 3rd party application for the COCO mAP metric?

@roboserg

roboserg commented Aug 7, 2023

Is there any update on this? What would be the recommended way of doing COCO eval with Huggingface?

@NielsRogge
Contributor Author

NielsRogge commented Aug 7, 2023

Yes there's an update on this. @rafaelpadilla has been working on adding native support for COCO metrics in the evaluate library, check the Space here: https://huggingface.co/spaces/rafaelpadilla/detection_metrics. For now you have to load the metric as follows:

import evaluate

evaluator = evaluate.load("rafaelpadilla/detection_metrics", json_gt=ground_truth_annotations, iou_type="bbox")

but this one is going to be integrated into the main evaluate library.

This is then leveraged to create the open object detection leaderboard: https://huggingface.co/spaces/rafaelpadilla/object_detection_leaderboard.

@rafaelpadilla

rafaelpadilla commented Aug 8, 2023

Yep, we intend to integrate it into the evaluate library.

Meanwhile, you can use it from here: https://huggingface.co/spaces/rafaelpadilla/detection_metrics

Update: the code with the evaluate AP metric and its variations was transferred to https://huggingface.co/spaces/hf-vision/detection_metrics

@maltelorbach

Hi,
running

import evaluate
evaluator = evaluate.load("hf-vision/detection_metrics", json_gt=ground_truth_annotations, iou_type="bbox")

results in the following error:

ImportError: To be able to use hf-vision/detection_metrics, you need to install the following dependencies['detection_metrics'] using 'pip install detection_metrics' for instance'

How do I load the metric from the hub? Do I need to download the content of that repository manually first?

I'm running evaluate==0.4.1.

@sushil-bharati

Ran into the same issue @maltelorbach posted on 12/14/2023

@sklum

sklum commented Jun 25, 2024

I spent some time digging into this. The issue is that the hf-vision/detection_metrics metric uses a local module for some COCO-related dependencies (it's called detection_metrics, which is why you get an ImportError of that flavor). I tried to restructure the Space to have a flat directory structure, but then ran into #189 because certain dependencies weren't being loaded (or downloaded). I gave up after that. It seems telling that the object detection example just rolls its own metric code with torchmetrics, so it's probably easiest to do that.

@NielsRogge
Contributor Author

NielsRogge commented Jun 28, 2024

Yes, for now we switched to using Torchmetrics, as it already provides a performant implementation with support for distributed training etc., so there's no need to duplicate it. cc @qubvel
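
For anyone landing here, a minimal sketch of that Torchmetrics route (the boxes, scores and labels below are placeholder values):

import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

metric = MeanAveragePrecision(iou_type="bbox")

# One dict per image; boxes are (xmin, ymin, xmax, ymax) in absolute coordinates.
preds = [{
    "boxes": torch.tensor([[258.0, 41.0, 606.0, 285.0]]),
    "scores": torch.tensor([0.54]),
    "labels": torch.tensor([18]),
}]
target = [{
    "boxes": torch.tensor([[214.0, 41.0, 562.0, 285.0]]),
    "labels": torch.tensor([18]),
}]

metric.update(preds, target)
results = metric.compute()  # dict with map, map_50, map_75, map_small, mar_100, ...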
