perf(eval): vectorize HOTA per-frame alpha loop and id remapping #462
Merged
compute_hota_metrics did two things per frame that numpy can do at once:
1. scored the 19 alpha thresholds in a Python for-loop
2. rebuilt a Python dict comprehension to map gt/tracker ids to row
indices (twice per frame)
Replace (1) by broadcasting the matched-pair similarities against
ALPHA_THRESHOLDS and scattering the per-alpha co-occurrence counts into a
single (num_alphas, num_gt, num_tracker) array; replace (2) with
np.searchsorted on the already-sorted unique-id arrays.
Output is identical to the previous implementation (verified across 600
randomized sequences incl. empty/single/no-gt/no-tracker frames and both
contiguous and non-contiguous ids). ~2.9x faster on a 500-frame, 40x45
sequence.
Also adds an id-relabeling invariance test exercising non-contiguous,
within-frame-unsorted ids.
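
The id remapping described above can be sketched as follows. This is a minimal illustration, not the code from `hota.py`: the helper names `remap_with_dict` and `remap_with_searchsorted` and the sample ids are hypothetical. It assumes, as the commit message states, that the unique-id array is already sorted and that every per-frame id is present in it.

```python
import numpy as np


def remap_with_dict(unique_ids, frame_ids):
    """Sequential-style remapping: build a dict, then look up each id."""
    id_to_idx = {int(id_): i for i, id_ in enumerate(unique_ids)}
    return np.array([id_to_idx[int(id_)] for id_ in frame_ids], dtype=np.intp)


def remap_with_searchsorted(unique_ids, frame_ids):
    """Vectorized remapping: binary search into the sorted unique ids.

    Only valid when unique_ids is sorted and contains every id in frame_ids;
    a missing id would silently map to an insertion position instead.
    """
    return np.searchsorted(unique_ids, frame_ids)


# Hypothetical ids: non-contiguous, and unsorted within the frame.
all_ids = np.array([907, 12, 450, 12, 33, 907])
unique_ids = np.unique(all_ids)        # sorted: [12, 33, 450, 907]
frame_ids = np.array([450, 12, 907])

dict_rows = remap_with_dict(unique_ids, frame_ids)
fast_rows = remap_with_searchsorted(unique_ids, frame_ids)
print(dict_rows, fast_rows)            # both give [2 0 3]
```

Because the row index depends only on an id's rank among the unique ids, any relabeling that keeps ids distinct changes at most the row order, which is the property the invariance test relies on.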
Pull request overview
This PR speeds up HOTA evaluation by removing per-frame Python overhead in compute_hota_metrics, while keeping metric outputs unchanged. It does so by vectorizing the per-alpha threshold scoring and replacing per-frame ID→index dict lookups with np.searchsorted on the globally-unique sorted ID lists.
Changes:
- Vectorized per-frame alpha-threshold evaluation via broadcasting to score all 19 thresholds at once and accumulate TP/FN/FP/LocA.
- Replaced per-frame Python dict ID remapping with np.searchsorted on np.unique's sorted ID outputs.
- Added a regression test ensuring metrics are invariant under non-monotonic ID relabeling (including unsorted IDs within a frame).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/trackers/eval/hota.py | Vectorizes per-alpha scoring and uses np.searchsorted for ID remapping to reduce per-frame Python overhead. |
| tests/eval/test_hota.py | Adds invariance test to validate correctness of the new ID remapping logic. |
- Add test_output_matches_sequential_reference (4 parametrized cases: contiguous ids, id-switch, non-monotonic ids, partial asymmetric match): runs the vectorized implementation against the pre-PR sequential reference (dict map + per-alpha Python loop) and asserts bit-identical per-alpha arrays; guards the hot path from future silent drift (resolves M1 finding from /oss:review)
- Add _sequential_hota_reference module-level helper implementing the pre-vectorization logic for use by the differential test
- Extend the test_metrics_invariant_to_id_relabeling allclose loop with AssRe_array and AssPr_array (previously checked at scalar level only)
---
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
- Reword 'no duplicate destinations' comment (hota.py:214): replaces false
premise ('each pair is unique') with accurate MOT-convention invariant
('each id appears at most once per frame') — prevents future maintainer from
silently switching to np.add.at believing it changes behavior
- Add searchsorted precondition comment (hota.py:136): states that all per-frame
IDs are guaranteed present in unique_*_ids (silent wrong-index impossible)
- Tighten TrackEval reference range from hota.py:72-101 to hota.py:72-88 (the
alpha loop at 91-101 no longer exists after vectorization)
---
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
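
The "no duplicate destinations" point in the commit above rests on a general NumPy behavior, illustrated below. This is a sketch with made-up index arrays and hypothetical helper names (`scatter_inplace`, `scatter_add_at`), not code from `hota.py`: a fancy-indexed in-place add is buffered, so it matches `np.add.at` only when no destination is repeated.

```python
import numpy as np


def scatter_inplace(shape, rows, cols):
    """Buffered fancy-index add: repeated destinations are counted once."""
    out = np.zeros(shape, dtype=np.int64)
    out[rows, cols] += 1
    return out


def scatter_add_at(shape, rows, cols):
    """Unbuffered add: repeated destinations accumulate."""
    out = np.zeros(shape, dtype=np.int64)
    np.add.at(out, (rows, cols), 1)
    return out


# Each id appears at most once (the per-frame invariant cited above),
# so every (row, col) destination is distinct and both forms agree.
rows = np.array([0, 1, 2])
cols = np.array([2, 0, 1])
same = np.array_equal(
    scatter_inplace((3, 3), rows, cols), scatter_add_at((3, 3), rows, cols)
)

# With a repeated destination the two forms diverge.
dup_rows = np.array([0, 0])
dup_cols = np.array([1, 1])
buffered = scatter_inplace((3, 3), dup_rows, dup_cols)[0, 1]    # 1
unbuffered = scatter_add_at((3, 3), dup_rows, dup_cols)[0, 1]   # 2
```

Under the stated invariant the two forms are interchangeable in output, so switching to `np.add.at` would not change results.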
Borda approved these changes on Jun 22, 2026
Borda added a commit that referenced this pull request on Jun 23, 2026
- Stamp CHANGELOG [Unreleased] → [2.5.0] - 2026-06-22; add missing entries: CBIoUTracker (#417), py.typed, Tuner params (#427), HOTA eval fixes (#462, #466)
- Bump version 2.4.0 → 2.5.0 in pyproject.toml
- Add C-BIoU row to README algorithms table and intro text (MOT17=63.0, SportsMOT=73.1, SoccerNet=82.6, DanceTrack=56.7)
---
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Description
`compute_hota_metrics` is the slowest of the eval metrics, and the cost is per-frame Python overhead rather than the actual math. Two things were done one element/threshold at a time inside the per-frame loops:
- Alpha thresholds: a Python `for a, alpha in ...` loop (per frame), each iteration fancy-indexing and scattering into a separate `matches_counts[a]` matrix.
- ID remapping: a per-id Python dict lookup (`np.array([gt_id_to_idx[int(id_)] for id_ in ...])`), done twice, once in each pass.
Both are invariant transforms that numpy can do in a single vectorized op:
- Alpha thresholds: broadcast `matched_sim[None, :] >= (ALPHA_THRESHOLDS[:, None] - EPS)`, then accumulate TP/FN/FP/LocA with array reductions and scatter the per-alpha co-occurrence counts into one `(num_alphas, num_gt, num_tracker)` array. (Each `(gt, tracker)` pair is unique within a frame, so the advanced-index in-place add has no duplicate destinations.)
- ID remapping: `np.searchsorted`. `np.unique` already returns sorted ids, so an id's index is its position by binary search, with no per-frame dict.
No public API or output change.
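
The alpha-threshold vectorization can be sketched as below. This is an illustration under stated assumptions rather than the merged implementation: the function names `score_frame_loop` and `score_frame_vectorized` are hypothetical, `ALPHA_THRESHOLDS` is assumed to be the 19 values 0.05 to 0.95, and `EPS` is assumed to be float machine epsilon. Only the co-occurrence scatter and the per-alpha TP count are shown; FN/FP/LocA are omitted.

```python
import numpy as np

# Assumed constants; the real module may define them differently.
ALPHA_THRESHOLDS = np.arange(0.05, 0.99, 0.05)  # 19 thresholds
EPS = np.finfo(float).eps


def score_frame_loop(matched_sim, gt_idx, tr_idx, counts):
    """Sequential form: one fancy-index scatter per alpha threshold."""
    for a, alpha in enumerate(ALPHA_THRESHOLDS):
        keep = matched_sim >= alpha - EPS
        counts[a][gt_idx[keep], tr_idx[keep]] += 1


def score_frame_vectorized(matched_sim, gt_idx, tr_idx, counts):
    """Broadcast form: score all thresholds at once, then scatter once."""
    # keep[a, m] is True when matched pair m passes threshold a.
    keep = matched_sim[None, :] >= (ALPHA_THRESHOLDS[:, None] - EPS)
    a_idx, m_idx = np.nonzero(keep)
    # Destinations are distinct because each id appears at most once
    # among the matched pairs of a frame.
    counts[a_idx, gt_idx[m_idx], tr_idx[m_idx]] += 1
    return keep.sum(axis=1)  # true positives per alpha
```

Here `counts` is a `(num_alphas, num_gt, num_tracker)` array updated in place, and `gt_idx` / `tr_idx` are the row and column indices of the matched pairs in one frame.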
Correctness
Output is identical to the previous implementation.
- Verified by a differential test against the pre-change code across 600 randomized sequences, including empty / single-detection / no-GT / no-tracker frames and both contiguous and non-contiguous ids (`rtol=atol=1e-11`, all fields incl. the per-alpha arrays).
- `tests/eval/test_hota.py` and the `compute_hota_metrics` doctest pass unchanged (HOTA/DetA/AssA = 0.745/0.816/0.691).
- `test_metrics_invariant_to_id_relabeling`: HOTA must be invariant to a non-monotonic id relabeling (ids unsorted within a frame), which directly exercises the new `searchsorted` remapping.
Benchmark
500 frames, ~40 GT / ~45 tracker per frame, non-contiguous ids (20 iters, warm):
Remaining time is the per-frame Hungarian solve and first-pass IoU, which are inherently per-frame.