
perf(eval): vectorize HOTA per-frame alpha loop and id remapping #462

Merged
Borda merged 4 commits into roboflow:develop from RubenHaisma:perf/vectorize-hota on Jun 22, 2026
@RubenHaisma

Contributor

Description

compute_hota_metrics is the slowest of the eval metrics, and its cost is dominated by per-frame Python overhead rather than the actual math. Two steps ran one element or one threshold at a time inside the per-frame loops:

  1. The 19 alpha thresholds were scored in a Python for a, alpha in ... loop (per frame), each iteration fancy-indexing and scattering into a separate matches_counts[a] matrix.
  2. The gt/tracker id → row-index mapping ran a Python dict lookup per id inside a list comprehension every frame (np.array([gt_id_to_idx[int(id_)] for id_ in ...])), twice: once in each pass.

Both are invariant transforms that numpy can do in a single vectorized op:

  • Alpha loop → broadcast. Score all thresholds at once: matched_sim[None, :] >= (ALPHA_THRESHOLDS[:, None] - EPS), then accumulate TP/FN/FP/LocA with array reductions and scatter the per-alpha co-occurrence counts into one (num_alphas, num_gt, num_tracker) array. (Each (gt, tracker) pair is unique within a frame, so the advanced-index in-place add has no duplicate destinations.)
  • Dict lookup → np.searchsorted. np.unique already returns sorted ids, so an id's index is its position by binary search — no per-frame dict.
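The alpha-loop change above can be sketched on a single frame as follows. This is a minimal illustration, not the code in hota.py: the function names, the toy match data, and the choice of EPS are assumptions, and the 19 thresholds are assumed to be 0.05 through 0.95 in steps of 0.05. It only tracks TP and the co-occurrence counts, and checks that the broadcast version reproduces the looped one.

```python
import numpy as np

# Assumed values for illustration: 19 thresholds (0.05 .. 0.95) and a tiny tolerance.
ALPHA_THRESHOLDS = np.arange(0.05, 0.99, 0.05)
EPS = np.finfo(float).eps


def score_frame_loop(matched_sim, match_rows, match_cols, num_gt, num_tracker):
    """Reference version: one Python iteration per alpha threshold."""
    num_alphas = len(ALPHA_THRESHOLDS)
    tp = np.zeros(num_alphas)
    matches_counts = np.zeros((num_alphas, num_gt, num_tracker))
    for a, alpha in enumerate(ALPHA_THRESHOLDS):
        keep = matched_sim >= alpha - EPS
        tp[a] += keep.sum()
        matches_counts[a, match_rows[keep], match_cols[keep]] += 1
    return tp, matches_counts


def score_frame_broadcast(matched_sim, match_rows, match_cols, num_gt, num_tracker):
    """Vectorized version: score all thresholds at once via broadcasting."""
    num_alphas = len(ALPHA_THRESHOLDS)
    # passed has shape (num_alphas, num_matches)
    passed = matched_sim[None, :] >= (ALPHA_THRESHOLDS[:, None] - EPS)
    tp = passed.sum(axis=1).astype(float)
    matches_counts = np.zeros((num_alphas, num_gt, num_tracker))
    a_idx, m_idx = np.nonzero(passed)
    # Safe as a plain in-place add only because no (alpha, gt, tracker)
    # destination repeats within one frame.
    matches_counts[a_idx, match_rows[m_idx], match_cols[m_idx]] += 1
    return tp, matches_counts


# Hypothetical frame: 3 GT rows, 4 tracker columns, 3 matched pairs.
match_rows = np.array([0, 1, 2])
match_cols = np.array([2, 0, 3])
matched_sim = np.array([0.92, 0.51, 0.30])

tp_loop, counts_loop = score_frame_loop(matched_sim, match_rows, match_cols, 3, 4)
tp_vec, counts_vec = score_frame_broadcast(matched_sim, match_rows, match_cols, 3, 4)
```

In this sketch FN and FP would follow from the same tp array (for example num_gt - tp and num_tracker - tp), so no extra per-alpha loop is needed for them either.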

No public API or output change.

Correctness

Output is identical to the previous implementation. Verified by a differential test against the pre-change code across 600 randomized sequences, including empty / single-detection / no-GT / no-tracker frames and both contiguous and non-contiguous ids (rtol=atol=1e-11, all fields incl. the per-alpha arrays).

  • Existing tests/eval/test_hota.py and the compute_hota_metrics doctest pass unchanged (HOTA/DetA/AssA = 0.745/0.816/0.691).
  • Adds test_metrics_invariant_to_id_relabeling: HOTA must be invariant to a non-monotonic id relabeling (ids unsorted within a frame), which directly exercises the new searchsorted remapping.
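A small sketch of why the searchsorted remapping is insensitive to id order, using made-up ids (all variable names here are illustrative, not taken from hota.py). It assumes every per-frame id is present in the unique-id array; for an absent id, np.searchsorted would return an insertion point rather than raise.

```python
import numpy as np

# Hypothetical ids seen across a whole sequence: non-contiguous, with repeats.
all_gt_ids = np.array([907, 12, 450, 12, 907, 33])
unique_gt_ids = np.unique(all_gt_ids)  # sorted unique ids: [12, 33, 450, 907]

# Ids in one frame, deliberately unsorted.
frame_ids = np.array([907, 12, 450])

# Previous approach: Python dict lookup per id.
gt_id_to_idx = {int(id_): i for i, id_ in enumerate(unique_gt_ids)}
idx_from_dict = np.array([gt_id_to_idx[int(id_)] for id_ in frame_ids])

# Vectorized approach: binary search into the sorted unique ids.
idx_from_searchsorted = np.searchsorted(unique_gt_ids, frame_ids)
```

Both arrays hold the row index of each frame id in unique_gt_ids, so the order of ids within a frame does not matter.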

Benchmark

500 frames, ~40 GT / ~45 tracker per frame, non-contiguous ids (20 iters, warm):

|         | ms/call |
|---------|---------|
| before  | 79.7    |
| after   | 27.5    |
| speedup | ~2.9×   |

Remaining time is the per-frame Hungarian solve and first-pass IoU, which are inherently per-frame.
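For readers who want to reproduce a "warm, N iterations" measurement of this kind, a generic timing helper might look like the sketch below. The helper name, the warm-up count, and the stand-in workload are all assumptions; this is not the benchmark script used for the numbers above.

```python
import timeit

import numpy as np


def time_ms_per_call(fn, iters=20, warmup=3):
    """Return mean milliseconds per call after a few untimed warm-up calls."""
    for _ in range(warmup):
        fn()
    total_seconds = timeit.timeit(fn, number=iters)
    return 1000.0 * total_seconds / iters


# Stand-in workload (hypothetical): score 40 similarities against 19 thresholds.
sims = np.random.default_rng(0).random(40)
alphas = np.arange(0.05, 0.99, 0.05)


def workload():
    return (sims[None, :] >= alphas[:, None]).sum(axis=1)


ms_per_call = time_ms_per_call(workload)
```

To time the real function, workload would be replaced by a zero-argument callable wrapping compute_hota_metrics on a fixed sequence.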

compute_hota_metrics did two things per frame that numpy can do at once:

  1. scored the 19 alpha thresholds in a Python for-loop
  2. rebuilt a Python dict comprehension to map gt/tracker ids to row
     indices (twice per frame)

Replace (1) by broadcasting the matched-pair similarities against
ALPHA_THRESHOLDS and scattering the per-alpha co-occurrence counts into a
single (num_alphas, num_gt, num_tracker) array; replace (2) with
np.searchsorted on the already-sorted unique-id arrays.

Output is identical to the previous implementation (verified across 600
randomized sequences incl. empty/single/no-gt/no-tracker frames and both
contiguous and non-contiguous ids). ~2.9x faster on a 500-frame, 40x45
sequence.

Also adds an id-relabeling invariance test exercising non-contiguous,
within-frame-unsorted ids.
RubenHaisma requested a review from SkalskiP as a code owner on June 21, 2026, 11:01
@CLAassistant

CLAassistant commented Jun 21, 2026

CLA assistant check
All committers have signed the CLA.

Copilot AI left a comment

Contributor


Pull request overview

This PR speeds up HOTA evaluation by removing per-frame Python overhead in compute_hota_metrics, while keeping metric outputs unchanged. It does so by vectorizing the per-alpha threshold scoring and replacing per-frame ID→index dict lookups with np.searchsorted on the globally-unique sorted ID lists.

Changes:

  • Vectorized per-frame alpha-threshold evaluation via broadcasting to score all 19 thresholds at once and accumulate TP/FN/FP/LocA.
  • Replaced per-frame Python dict ID remapping with np.searchsorted on np.unique’s sorted ID outputs.
  • Added a regression test ensuring metrics are invariant under non-monotonic ID relabeling (including unsorted IDs within a frame).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
|------|-------------|
| src/trackers/eval/hota.py | Vectorizes per-alpha scoring and uses np.searchsorted for ID remapping to reduce per-frame Python overhead. |
| tests/eval/test_hota.py | Adds invariance test to validate correctness of the new ID remapping logic. |

Borda and others added 3 commits June 22, 2026 14:40
- Add test_output_matches_sequential_reference (4 parametrized cases: contiguous
  ids, id-switch, non-monotonic ids, partial asymmetric match): runs vectorized
  implementation against pre-PR sequential reference (dict map + per-alpha Python
  loop) and asserts bit-identical per-alpha arrays — guards hot path from future
  silent drift (resolves M1 finding from /oss:review)
- Add _sequential_hota_reference module-level helper implementing pre-vectorization
  logic for use by the differential test
- Extend test_metrics_invariant_to_id_relabeling allclose loop with AssRe_array
  and AssPr_array (previously checked at scalar level only)

---
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
- Reword 'no duplicate destinations' comment (hota.py:214): replaces false
  premise ('each pair is unique') with accurate MOT-convention invariant
  ('each id appears at most once per frame') — prevents future maintainer from
  silently switching to np.add.at believing it changes behavior
- Add searchsorted precondition comment (hota.py:136): states that all per-frame
  IDs are guaranteed present in unique_*_ids (silent wrong-index impossible)
- Tighten TrackEval reference range from hota.py:72-101 to hota.py:72-88 (the
  alpha loop at 91-101 no longer exists after vectorization)

---
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
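The distinction behind the 'no duplicate destinations' comment can be shown on a toy array (the arrays below are illustrative only). With repeated indices, a fancy-indexed in-place add applies each destination once, while np.add.at accumulates every occurrence; with unique indices the two agree, which is the situation the per-frame invariant guarantees.

```python
import numpy as np

dup_idx = np.array([0, 0, 2])     # index 0 appears twice
unique_idx = np.array([0, 1, 2])  # every index appears once

# Fancy-indexed in-place add: the repeated destination is only incremented once.
buffered = np.zeros(3)
buffered[dup_idx] += 1            # -> [1., 0., 1.]

# np.add.at is unbuffered: each occurrence of an index is accumulated.
unbuffered = np.zeros(3)
np.add.at(unbuffered, dup_idx, 1)  # -> [2., 0., 1.]

# With unique destinations the two forms give the same result.
buffered_unique = np.zeros(3)
buffered_unique[unique_idx] += 1
unbuffered_unique = np.zeros(3)
np.add.at(unbuffered_unique, unique_idx, 1)
```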
@Borda Borda merged commit e42d40a into roboflow:develop Jun 22, 2026
18 checks passed
Borda added a commit that referenced this pull request Jun 23, 2026
- Stamp CHANGELOG [Unreleased] → [2.5.0] — 2026-06-22; add missing entries: CBIoUTracker (#417), py.typed, Tuner params (#427), HOTA eval fixes (#462, #466)
- Bump version 2.4.0 → 2.5.0 in pyproject.toml
- Add C-BIoU row to README algorithms table and intro text (MOT17=63.0, SportsMOT=73.1, SoccerNet=82.6, DanceTrack=56.7)

---

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>