
fix(coco): emit 1-indexed category_id in COCO export #2276

Merged
Borda merged 4 commits into
roboflow:develop from
madhavcodez:fix/coco-category-id-one-indexed-1181
May 27, 2026

Conversation

@madhavcodez
Contributor

Before submitting
  • Self-reviewed the code
  • Updated documentation, follow Google-style
  • Added docs entry for autogeneration (if new functions/classes)
  • Added/updated tests
  • All tests pass locally

Description

DetectionDataset.as_coco() wrote COCO category_id (and categories[].id) starting at 0. The COCO specification and tools such as CVAT require category ids to start at 1, so exported annotations failed to import into CVAT (the problem reported in #1181).

save_coco_annotations already enforces 1-indexed image_id and annotation_id ("COCO spec requires 1-indexed ids"), which left category_id as the only 0-indexed inconsistency. This PR makes category_id 1-indexed as well, so the exported ids are internally consistent and spec-compliant by default.

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)

Motivation and Context

The internal Detections.class_id is 0-indexed, but COCO category ids are expected to be 1-indexed. Exporting the 0-indexed value directly produced files CVAT rejects. Aligning category_id with the already-1-indexed image_id/annotation_id fixes CVAT import and matches the COCO spec.
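One way to check whether an existing export is affected is to scan the JSON for zero-valued category ids. The snippet below is a minimal sketch and not part of supervision: `find_zero_ids` is a hypothetical helper, and both sample dictionaries are made up for illustration.

```python
from typing import Any


def find_zero_ids(coco: dict[str, Any]) -> list[str]:
    """Describe every category id field in a COCO dict that equals 0."""
    problems: list[str] = []
    for category in coco.get("categories", []):
        if category["id"] == 0:
            problems.append(f"categories[].id == 0 ({category['name']})")
    for annotation in coco.get("annotations", []):
        if annotation["category_id"] == 0:
            problems.append(f"annotation {annotation['id']} has category_id == 0")
    return problems


# Illustrative data only: the shape of a 0-indexed export (before this PR).
old_export = {
    "categories": [{"id": 0, "name": "cat"}, {"id": 1, "name": "dog"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 0}],
}
# The same content with category ids shifted by one (after this PR).
new_export = {
    "categories": [{"id": 1, "name": "cat"}, {"id": 2, "name": "dog"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1}],
}

print(find_zero_ids(old_export))  # two entries: one category, one annotation
print(find_zero_ids(new_export))  # []
```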

Closes #1181

Changes Made

  • classes_to_coco_categories: emit id = class_index + 1 (1-indexed).
  • detections_to_coco_annotations: emit category_id = class_id + 1, kept consistent with the category ids above (internal class_id = k → COCO category_id = k + 1).
  • The read path is unchanged: coco_categories_to_classes / build_coco_class_index_mapping map categories to sequential internal class ids by name, not by absolute id value, so as_coco → from_coco round-trips remain lossless. Detections.class_id stays 0-indexed in memory.
  • Added Google-style docstrings with examples to both serialization helpers.
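
The write-path change described above can be sketched in plain Python. This is an illustration of the `k → k + 1` mapping only, not the code in `coco.py`: the `sketch_*` names are hypothetical, the category entries carry only `id` and `name`, and raising `ValueError` for a missing class id is an assumption (the PR only states that a `class_id=None` guard exists).

```python
from typing import Optional


def sketch_classes_to_coco_categories(classes: list[str]) -> list[dict]:
    """Build category entries whose ids start at 1 instead of 0."""
    return [
        {"id": class_index + 1, "name": class_name}
        for class_index, class_name in enumerate(classes)
    ]


def sketch_category_id(class_id: Optional[int]) -> int:
    """Map an internal 0-indexed class_id k to COCO category_id k + 1."""
    if class_id is None:
        # Assumption: a missing class id is rejected before the + 1 step.
        raise ValueError("class_id is required to export COCO annotations")
    return class_id + 1


classes = ["cat", "dog"]
categories = sketch_classes_to_coco_categories(classes)
category_ids = [sketch_category_id(k) for k in [0, 1, 1]]

print(categories)    # [{'id': 1, 'name': 'cat'}, {'id': 2, 'name': 'dog'}]
print(category_ids)  # [1, 2, 2]
```

Because both helpers apply the same offset, every `category_id` written for an annotation refers to an entry that exists in `categories`.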

Testing

  • I have tested this code locally
  • I have added unit tests that prove my fix is effective or that my feature works
  • All new and existing tests pass

pytest tests/dataset/formats/test_coco.py → 71 passed. ruff check and ruff format --check clean on the changed files.

New/updated tests:

  • test_classes_to_coco_categories_ids_start_at_one: category ids start at 1.
  • test_detections_to_coco_annotations_category_id_is_one_indexed: internal class_id k → category_id k + 1.
  • test_coco_round_trip_preserves_class_ids_and_writes_one_indexed_categories: as_coco writes 1-indexed ids on disk, and from_coco reads back the original 0-indexed internal class ids losslessly.
  • Updated the existing test_detections_to_coco_annotations parametrize cases that asserted the old 0-indexed category_id.

Additional Notes

This is a behavior change for the exported value of category_id, but it brings the field in line with the COCO spec and with image_id/annotation_id, which are already 1-indexed in save_coco_annotations. Since reading is name-based, existing supervision-written datasets still load correctly.
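
To illustrate why name-based reading keeps older files loadable, here is a minimal sketch of such a read path. It is not the library's implementation: `sketch_class_index_mapping` is a hypothetical helper, and ordering classes by ascending category id is an assumption made for the example.

```python
def sketch_class_index_mapping(
    coco_categories: list[dict],
) -> tuple[list[str], dict[int, int]]:
    """Map on-disk category ids to sequential internal class ids by name."""
    # Assumption: internal class order follows ascending category id.
    ordered = sorted(coco_categories, key=lambda category: category["id"])
    classes = [category["name"] for category in ordered]
    name_to_index = {name: index for index, name in enumerate(classes)}
    mapping = {
        category["id"]: name_to_index[category["name"]] for category in ordered
    }
    return classes, mapping


# Illustrative category lists: a legacy 0-indexed file and a 1-indexed one.
legacy_categories = [{"id": 0, "name": "cat"}, {"id": 1, "name": "dog"}]
current_categories = [{"id": 1, "name": "cat"}, {"id": 2, "name": "dog"}]

legacy_classes, legacy_mapping = sketch_class_index_mapping(legacy_categories)
current_classes, current_mapping = sketch_class_index_mapping(current_categories)

# "dog" is category 1 in the legacy file and category 2 in the new one,
# yet both resolve to the same internal class id.
print(legacy_classes == current_classes)         # True
print(legacy_mapping[1], current_mapping[2])     # 1 1
```

Only the relative order and the names matter in this scheme, so the absolute starting value of the ids on disk does not affect the loaded class ids.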

as_coco wrote category_id (and categories[].id) starting at 0, but the
COCO spec and tools like CVAT require category ids to start at 1, so
imports into CVAT failed. save_coco_annotations already enforces 1-indexed
image_id/annotation_id, which left category_id as the only 0-indexed
inconsistency.

Serialize the internal 0-indexed Detections.class_id as class_id + 1 in
both classes_to_coco_categories and detections_to_coco_annotations, keeping
the category ids consistent. The read path maps categories to internal
class ids by name, so as_coco -> from_coco round-trips stay lossless.

Closes roboflow#1181

madhavcodez requested a review from SkalskiP as a code owner May 26, 2026 06:06
@codecov

codecov Bot commented May 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78%. Comparing base (fb2dec9) to head (60ffbc8).
⚠️ Report is 1 commits behind head on develop.

Additional details and impacted files
@@           Coverage Diff           @@
##           develop   #2276   +/-   ##
=======================================
  Coverage       78%     78%           
=======================================
  Files           66      66           
  Lines         8410    8412    +2     
=======================================
+ Hits          6552    6585   +33     
+ Misses        1858    1827   -31     

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes COCO export compliance in DetectionDataset.as_coco() by making exported COCO categories[].id and annotation category_id 1-indexed (instead of 0-indexed), aligning with the COCO spec and tooling expectations (e.g., CVAT).

Changes:

  • Updated COCO category serialization to emit id = class_index + 1.
  • Updated COCO annotation serialization to emit category_id = class_id + 1.
  • Added/updated unit tests to assert 1-indexed IDs on disk while preserving 0-indexed internal Detections.class_id after round-trip load.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/supervision/dataset/formats/coco.py | Adjusts COCO export helpers to emit 1-indexed category IDs and documents the mapping behavior. |
| tests/dataset/formats/test_coco.py | Updates existing expectations and adds regression tests to confirm 1-indexed COCO output and lossless round-trip behavior. |

Borda and others added 3 commits May 27, 2026 07:48
Documents the behavior-observable change from roboflow#2276/roboflow#1181 in the
UnReleased section, consistent with project convention used for roboflow#2267.

---
Co-authored-by: Claude Code <noreply@anthropic.com>
- test_from_coco_loads_legacy_zero_indexed_category_ids: verifies
  supervision <=0.28.x files (category_id starting at 0) still load
  and map to correct internal class_ids via name-based read path
- test_save_coco_annotations_rejects_zero_starting_ids: pins ValueError
  guard at L500-505 for starting_image_id=0 / starting_annotation_id=0
- test_detections_to_coco_annotations_raises_when_class_id_is_none:
  validates class_id=None guard is reached before +1 arithmetic path
- test_coco_round_trip_multi_class_single_image: adversarial round-trip
  with two classes in one image (vs one class per image in existing test)

---
Co-authored-by: Claude Code <noreply@anthropic.com>
…ired

Raw COCO category_id values (now 1-based) are stored as class_id in the
returned Detections; callers must apply map_detections_class_id before
the field is meaningful. load_coco_annotations does this correctly;
documents the requirement so external callers are not surprised.

---
Co-authored-by: Claude Code <noreply@anthropic.com>
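
The remapping requirement in the commit message above can be illustrated with a small sketch. `remap_class_ids` is a hypothetical stand-in for the role that `map_detections_class_id` plays; its real signature is not shown in this PR, and the data below is made up.

```python
def remap_class_ids(raw_ids: list[int], mapping: dict[int, int]) -> list[int]:
    """Translate raw COCO category ids into internal 0-indexed class ids."""
    return [mapping[raw_id] for raw_id in raw_ids]


classes = ["cat", "dog"]
raw_category_ids = [1, 2, 2]   # as stored in a 1-indexed COCO file
id_to_index = {1: 0, 2: 1}     # category id -> internal class id

# Using the raw ids as indices is off by one: category 1 is "cat",
# but classes[1] is "dog", and classes[2] would raise IndexError.
print(classes[raw_category_ids[0]])  # dog (wrong)

class_ids = remap_class_ids(raw_category_ids, id_to_index)
print([classes[class_id] for class_id in class_ids])  # ['cat', 'dog', 'dog']
```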
@Borda Borda merged commit 81218c5 into roboflow:develop May 27, 2026
26 checks passed

Development

Successfully merging this pull request may close these issues.

Request: as_coco Parameter Adjustment for sv.DetectionDataset in Supervision API

3 participants