Releases · NoeFontana/vernier

11 Jun 22:48

v0.2.0

e38eb03

0.2.0 — 2026-06-09 Latest

Release Notes

Real-prediction parity follow-up to 0.1.0. The headline work is six
new SOTA-harness cells that drive every kernel — bbox, segm, boundary,
keypoints, panoptic PQ, semantic mIoU, LVIS, and calibration — through
a frozen real-model prediction cache so the parity surface no longer
relies solely on synthetic fixtures. Two strict-mode behavioural fixes
ride along: the TIDE Missed-bin rewrite (previously a no-op under
parity_mode="strict") and the accumulator's n_d==0 precision/scores
write. The minor bump signals those output-value changes; the kernel
surface is otherwise unchanged from 0.1.0.

Added

Real-prediction SOTA harness — six new cells. Each cell drives a
pinned upstream checkpoint through the existing _harness_common
scaffolding (full-SHA cache key, _ensure_pinned_revision preflight,
torch.set_num_threads(1), int64 target_sizes, loud-fail on
unmapped class names) and asserts vernier-vs-oracle parity on real
output distributions:
- DETR-R50 (#265) — instance bbox / segm against the
  facebook/detr-resnet-50 checkpoint on COCO val2017. Aligned tier
  loosens dtScores to rtol = 2 * eps to absorb the documented
  serde_json vs Python strtod 1-ULP score-parser drift; all
  integer-reduction surfaces (precision, recall, counts, 12-stat AP/AR
  summary) stay bit-equal.
- Mask2Former panoptic + ADE-semantic (#266) — panoptic PQ
  against facebook/mask2former-swin-large-coco-panoptic on COCO
  panoptic val2017; semantic mIoU against the ADE checkpoint on
  ADE20K val. Both bit-equal to their oracles on integer-reduction
  surfaces.
- DETR-R50 calibration (#267) — reuses the #265 prediction
  cache to validate ADR-0018 ECE / MCE / reliability against the
  NumPy oracle at full distribution scale.
- rfdetr-segnano boundary (#269) — boundary IoU against
  bowenc0221's boundary_iou_api over the rfdetr-segnano TIDE cache;
  no new inference (boundary IoU is a different metric over the same
  RLE masks).
- LVIS detector (#270) — federated LVIS evaluation against the
  LVIS API. Reuses the TIDE cache pattern; gates the K=168/817
  full-val divergence currently tracked in open_followups.md.
- ViTPose keypoints (#271) — keypoints OKS evaluation against
  the usyd-community/vitpose-base-coco checkpoint on COCO val2017.
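The two tolerance tiers described for the DETR-R50 cell can be sketched as a pair of checks: a relative tolerance of 2 * eps on float scores, and byte-level equality on integer reductions. This is a minimal sketch assuming float64 scores; the arrays and helper names (`scores_match`, `bit_equal`) are hypothetical and are not taken from the vernier harness.

```python
import numpy as np

EPS = np.finfo(np.float64).eps  # float64 machine epsilon
RTOL = 2 * EPS                  # aligned-tier tolerance quoted for dtScores

# Hypothetical stand-ins for oracle output and the output under test.
oracle_scores = np.array([0.1, 0.7311, 0.998], dtype=np.float64)
test_scores = oracle_scores.copy()
# Simulate a 1-ULP drift from a different decimal-to-float parser.
test_scores[1] = np.nextafter(test_scores[1], np.inf)

oracle_counts = np.array([12, 0, 7], dtype=np.int64)
test_counts = oracle_counts.copy()


def scores_match(a, b, rtol=RTOL):
    """Relative comparison: |a - b| <= rtol * |b| elementwise."""
    return bool(np.all(np.abs(a - b) <= rtol * np.abs(b)))


def bit_equal(a, b):
    """Strict comparison: same dtype and identical bytes."""
    return a.dtype == b.dtype and a.tobytes() == b.tobytes()


print(scores_match(test_scores, oracle_scores))  # scores agree within 2 * eps
print(bit_equal(test_scores, oracle_scores))     # but are not bit-identical
print(bit_equal(test_counts, oracle_counts))     # integer surfaces stay exact
```

A 1-ULP difference has a relative size of at most eps, so a 2 * eps gate absorbs it without admitting larger drift.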

Fixed

TIDE Missed-bin strict-mode parity (#273) — the rewrite-layer
Missed fix was setting ignore_flag = Some(true) on missed GTs and
relying on effective_ignore to resolve under both parity modes.
Quirk D1's strict disposition discards ignore_flag entirely and
reads only is_crowd, so under parity_mode="strict" the rewrite
was a no-op: the AP denominator stayed unchanged and the per-bin
delta collapsed to exactly 0.0 (vs the ADR-0021 NumPy oracle's
specified 0.119 on DETR-R50). The fix deletes missed GTs from the
corrected dataset entirely, which is parity-mode-independent and
AP-equivalent to ignoring them under the oracle's semantics. Validated to
within 1 ULP against the oracle on COCO val2017 + DETR-R50
(~150k detections, 8 ULP gate). Closes the ADR-0022 follow-up on
t_b = 0.1 for set-prediction transformer detectors.
n_d == 0 precision/scores write (#272) — the accumulator path
for classes with zero detections now writes 0.0 (not -1) into
the precision and scores tensors. Downstream consumers comparing
raw tensor values across releases will see this change; the public
AP / AR summary statistics are unaffected (they already skipped
-1 sentinel entries).
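The Missed-bin fix above can be illustrated with a toy recall denominator. This is a sketch under stated assumptions: the `GT` dataclass, `effective_ignore`, and `denominator` are hypothetical stand-ins, and the "corrected" branch assumes that mode honours `ignore_flag`, which the notes imply but do not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GT:
    gt_id: int
    is_crowd: bool = False
    ignore_flag: bool = False


def effective_ignore(gt, parity_mode):
    # Strict disposition (quirk D1 as described): only is_crowd is read.
    if parity_mode == "strict":
        return gt.is_crowd
    # Assumed corrected disposition: ignore_flag is honoured as well.
    return gt.is_crowd or gt.ignore_flag


def denominator(gts, parity_mode):
    """Number of GTs that count towards recall / the AP denominator."""
    return sum(not effective_ignore(g, parity_mode) for g in gts)


gts = [GT(1), GT(2), GT(3), GT(4, is_crowd=True)]
missed = {2}  # GT ids the Missed-bin rewrite wants to neutralise

# Old approach: flag missed GTs as ignored.
flagged = [GT(g.gt_id, g.is_crowd, g.ignore_flag or g.gt_id in missed) for g in gts]
# New approach: delete missed GTs from the corrected dataset.
deleted = [g for g in gts if g.gt_id not in missed]

for mode in ("strict", "corrected"):
    print(mode, denominator(gts, mode), denominator(flagged, mode), denominator(deleted, mode))
```

Under "strict" the flagged dataset keeps the original denominator, which is why the per-bin delta collapsed to zero; deletion shrinks it in both modes.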

Install vernier-cli 0.2.0

Install prebuilt binaries via shell script

curl --proto '=https' --tlsv1.2 -LsSf https://github.com/NoeFontana/vernier/releases/download/v0.2.0/vernier-cli-installer.sh | sh

Install prebuilt binaries via powershell script

powershell -ExecutionPolicy Bypass -c "irm https://github.com/NoeFontana/vernier/releases/download/v0.2.0/vernier-cli-installer.ps1 | iex"

Download vernier-cli 0.2.0

File	Platform	Checksum
vernier-cli-aarch64-apple-darwin.tar.xz	Apple Silicon macOS	checksum
vernier-cli-x86_64-pc-windows-msvc.zip	x64 Windows	checksum
vernier-cli-aarch64-unknown-linux-gnu.tar.xz	ARM64 Linux	checksum
vernier-cli-x86_64-unknown-linux-gnu.tar.xz	x64 Linux	checksum

Assets 16

File	SHA-256	Size	Timestamp
dist-manifest.json	7982234f911ebf7c7eb7c2b607ddc099a09b999a78ddf31b73b3401c10b48cf3	24.3 KB	2026-06-11T22:48:03Z
sha256.sum	b8149be52d1148e9aa3abd6cb76aebfb3769e679ebae6389e872aba87ec18f5c	513 Bytes	2026-06-11T22:48:03Z
source.tar.gz	522020fabcfd1bf88b674ea4b2d2258fbeb6c42bbed723547bb5cea8c792f829	2.38 MB	2026-06-11T22:48:02Z
source.tar.gz.sha256	fcdd4e74b26f724a59246c51a7838556a87ca4f2864789895134f06ce29856c7	81 Bytes	2026-06-11T22:48:02Z
vernier-cli-aarch64-apple-darwin.tar.xz	f7e55515e3e6312a826074377ad3beb6347fd9b771b3fd66f0ca68c1e6435182	641 KB	2026-06-11T22:48:03Z
vernier-cli-aarch64-apple-darwin.tar.xz.sha256	03f0bfc985a4f7782897cef5d5cfac7d15bd198387a655923211751787369206	107 Bytes	2026-06-11T22:48:03Z
vernier-cli-aarch64-unknown-linux-gnu.tar.xz	19ef709a664e86f088f8dc062df3e85c4c8dfe6d99ef9f569c256448830ae97a	674 KB	2026-06-11T22:48:03Z
vernier-cli-aarch64-unknown-linux-gnu.tar.xz.sha256	b506ab7484530cd46a5f20e4c79f67f36d409ff0ccd02f87872caf112dc6b275	112 Bytes	2026-06-11T22:48:03Z
vernier-cli-installer.ps1	bf8e63e4d44b4ac6583f7a3dd35cf0104dd9c76e319410b0e2e450f4d164f52a	21.2 KB	2026-06-11T22:48:03Z
vernier-cli-installer.sh	79da186af389d573ff3f04f35ff0635f723ec3d79819c1484f6645972c33a2c4	51.8 KB	2026-06-11T22:48:03Z

Source code (zip)	2026-06-11T22:42:53Z
Source code (tar.gz)	2026-06-11T22:42:53Z

19 May 03:57

github-actions

v0.1.0

bbfdea7

0.1.0 — 2026-05-19

Release Notes

First release out of the 0.0.x line. Mostly a performance + parallelism
follow-up to 0.0.4 — no new evaluation paradigms, every shipped kernel
keeps strict bit-equal parity with its oracle. The cross-paradigm
benchmark page is refreshed against the post-0.0.4 SHA on the same
machine fingerprint as the 0.0.4 snapshot (37652a58e939).

Added

num_threads parallelism (ADR-0047)
(#251, #253, #254, #256) — opt-in num_threads: int | None = None
on every public evaluate surface across all four paradigms: instance
(bbox / segm / boundary / keypoints), semantic, panoptic, and LVIS,
on batch + streaming + background entry points (Evaluator.evaluate,
Evaluator.background, submit / submit_png). The sequential
path (num_threads=None or 1) is byte-for-byte unchanged from
0.0.4; no rayon symbol is entered. parity_threads parity tests
assert bit-equal results across num_threads ∈ {None, 1, 2, 4, 8}
on every paradigm. CLI gains vernier eval --threads N.
bench-timings Cargo feature (#256) — atomic (par_iter, serial_post) split + build_*_anns call counter on
evaluate_with_parallel, attributed via the new BenchCounterSet
shared helper (#258). Off by default and stripped from the shipped
wheel; powers the bbox-scaling attribution at
docs/engineering/benchmarking/2026-05-bbox-cdf.md.
mimalloc-global Cargo feature on vernier-ffi (#256) —
allocator A/B knob, off by default; lets users opt into mimalloc
for hot-allocation workloads without it being a default cost.
Semantic divan microbench (#261) —
crates/vernier-semantic/benches/accumulate_confusion.rs
exercises three input distributions (realistic_perfect,
realistic_jittered, uniform_random) at the val2017
panoptic-semantic geometry; prereq for the chunked-u8 kernel work.
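The num_threads parity claim above rests on integer reductions being order-independent, so splitting work across threads cannot change the result. The sketch below shows the shape of such a parity check on a toy histogram; `count_labels` is a hypothetical stand-in and is not the vernier evaluate surface.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

N_BINS = 8


def count_labels(labels, num_threads=None):
    """Toy integer reduction: histogram of labels, optionally chunked across threads."""
    if num_threads is None or num_threads == 1:
        # Sequential path: a single pass, no thread pool involved.
        return np.bincount(labels, minlength=N_BINS)
    chunks = np.array_split(labels, num_threads)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = list(pool.map(lambda c: np.bincount(c, minlength=N_BINS), chunks))
    # Integer addition is associative, so chunking does not affect the totals.
    return np.sum(partials, axis=0)


rng = np.random.default_rng(0)
labels = rng.integers(0, N_BINS, size=10_000)

reference = count_labels(labels)
results = {nt: count_labels(labels, nt) for nt in (None, 1, 2, 4, 8)}
parity_ok = all(r.tobytes() == reference.tobytes() for r in results.values())
print(parity_ok)
```

Floating-point reductions do not have this property in general, which is why a fixed fold order matters for any float surface that must stay bit-equal.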

Changed

bbox AP perf (#256, #258, #259) — KernelScratch per-worker
annotation pool + direct-write parallel runner (replaces the
per-image Vec<CellOutput> intermediate with par_chunks_mut);
in-place image-major → canonical transpose via cycle-following
(eliminates a 26 MB intermediate buffer pair on val2017); the
eval_imgs + eval_imgs_meta transposes fuse into a single
cycle walk (halves index arithmetic, drops one of two 1.6 MB
visited-bitset allocations). Net val2017 nt=4: par_iter region
42 → 32 ms, serial_post 45 → 19 ms, peak working-set
−24 MB. The remaining Amdahl floor on --num-threads for bbox is
the ~200 ms single-threaded dataset_build (HashMap validation in
CocoDataset::from_parts), attributed via bench-timings.
Panoptic PQ perf (#260) — sparse-remap adjacent-pixel cache on
build_dense_intersections and build_dense_boundary_intersections.
COCO panoptic always hits the sparse branch (RGB-packed ids exceed
the 1 M dense cap) and panoptic segments are spatially contiguous,
so consecutive (g, d) pairs are usually identical; a 4-state
(last_g, last_d, last_gi, last_di) cache skips the FxHashMap
lookup on adjacent-pixel matches. Dense branch is deliberately
uncached (Vec::get is cheap enough that the miss overhead
regresses synthetic by ~70%). SSSE3 RGB→u32 pack on the panoptic
PNG decode path. New coco_like_rgb microbench arm exercises the
sparse-RGB path that the existing coco_like arms missed
(their ids 1..=50 took the dense path).
Semantic mIoU perf (#261) — decode buffer pool + chunked u8
kernel on accumulate_confusion for the T = u8 PNG fused-decode
path that drives Semantic — mIoU (val2017). The pool reuses the
per-image decode Vec<u8> across submissions; the chunked kernel
keeps the strict-mode u64-additive fold but processes pixels in
cache-line-sized batches.
Background-evaluator threading wired (#253, #254) —
BackgroundConfig.num_threads is no longer hardcoded None on the
panoptic and semantic FFI ctors; BackgroundCapable gains a
default-method apply_update_parallel that the panoptic and
semantic streaming impls override. Panoptic submit_png defers
PNG decode into the worker pool (PyBackedBytes zero-copy) so
libpng decode parallelises across submissions; the single-threaded
path keeps inline decode and is byte-for-byte unchanged.
vernier-pixel-pack folded into vernier-panoptic — the
SSSE3 RGB→u32 pack primitive added in #260 lived briefly as a
standalone workspace crate. With a single consumer
(vernier-panoptic::decode) and 172 LOC, it sat below the
leaf-crate threshold and the audited-unsafe carveout fits cleanly
inside the host crate (#![deny(unsafe_code)] at root, module-local
#[allow(unsafe_code)] on the SSSE3 pshufb fn). Folding it
back keeps the published crate set at the six 0.0.4 crates and
avoids the registry-reservations + Trusted-Publisher loop in the
release runbook for a non-reusable internal SIMD primitive.
Bench harness --num-threads (#251, #252) — bench run --num-threads "1,2,4,8" overrides the workload's pinned
num_threads tuple; the panoptic + semantic spawn helpers now forward
the flag (it was previously dropped, so every panoptic / semantic cell
ran with args.num_threads = None regardless of what the CLI
swept).
Bench page refreshed against 3a509df6c525 on the same
37652a58e939 fingerprint as the 0.0.4 snapshot, so the speedup
deltas are not confounded by host change. Per-cell movements
(vernier median, 0.0.4 → HEAD):
- panoptic PQ: 12.59 s → 10.53 s (−16.4%; speedup
  2.73× → 3.30× vs panopticapi). IQR also narrows from 21.22%
  to 9.78% (still over the 5% gate — PNG decode is chronically
  noisy on this host).
- semantic mIoU val2017: 5.00 s → 2.82 s (−43.6%;
  speedup 4.12× → 7.40× vs mmsegmentation).
- instance bbox / segm / boundary / keypoints / synth-semantic /
  LVIS move within VPS noise of their 0.0.4 numbers; speedups
  widen by 0.1×–0.5× as baselines drift slightly slower on this
  run.
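The in-place transpose via cycle-following mentioned under bbox AP perf is a textbook technique, and a compact version looks roughly like this. The sketch is plain Python over a flat row-major list and is not the Rust implementation; the function name and the bytearray standing in for the visited bitset are illustrative choices.

```python
def transpose_in_place(buf, rows, cols):
    """Transpose a row-major rows x cols matrix stored flat in buf, in place.

    The element at flat index i moves to (i * rows) % (n - 1); the first and
    last elements are fixed points. Each permutation cycle is walked once.
    """
    n = rows * cols
    if n != len(buf):
        raise ValueError("buffer length does not match rows * cols")
    if n <= 2:
        return  # nothing to move
    visited = bytearray(n)  # stand-in for the visited bitset
    for start in range(1, n - 1):
        if visited[start]:
            continue
        carried = buf[start]
        i = start
        while True:
            dest = (i * rows) % (n - 1)
            # Drop the carried value at its destination, pick up what was there.
            buf[dest], carried = carried, buf[dest]
            visited[dest] = 1
            i = dest
            if i == start:
                break


matrix = [0, 1, 2, 3, 4, 5]  # 2 x 3, row-major: [[0, 1, 2], [3, 4, 5]]
transpose_in_place(matrix, rows=2, cols=3)
print(matrix)                # now 3 x 2, row-major
```

The only auxiliary storage is one flag per element, which is the kind of allocation the fused cycle walk described above halves by handling two arrays in one pass.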

Fixed

bench run --impl all on non-instance paradigms — impls_for_iou
raised KeyError for the paradigm-specific impls
(vernier_panoptic, panopticapi, mmsegmentation,
vernier_lvis, lvis-api) that #252 widened ALL_IMPLS to
include. Falls back to an empty IoU set for impls that aren't
registered for the instance paradigm.
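The shape of this fix is a lookup that falls back to an empty set instead of raising. The sketch below uses a hypothetical registry and a hypothetical `select_impls` helper; only the paradigm-specific impl names come from the notes above, and the real `impls_for_iou` signature is not shown in them.

```python
# Hypothetical registry: only instance-paradigm impls declare IoU types.
INSTANCE_IOUS = {
    "instance_impl_a": frozenset({"bbox", "segm", "boundary", "keypoints"}),
    "instance_impl_b": frozenset({"bbox", "segm"}),
}

ALL_IMPLS = [
    "instance_impl_a",
    "instance_impl_b",
    "vernier_panoptic",
    "panopticapi",
    "mmsegmentation",
    "vernier_lvis",
    "lvis-api",
]


def select_impls(iou, impls=ALL_IMPLS):
    """Return the impls registered for `iou`; unregistered impls contribute nothing."""
    # dict.get with an empty-set default replaces the INSTANCE_IOUS[impl]
    # lookup that would raise KeyError for paradigm-specific impls.
    return [impl for impl in impls if iou in INSTANCE_IOUS.get(impl, frozenset())]


print(select_impls("boundary"))
```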

Install vernier-cli 0.1.0

Install prebuilt binaries via shell script

curl --proto '=https' --tlsv1.2 -LsSf https://github.com/NoeFontana/vernier/releases/download/v0.1.0/vernier-cli-installer.sh | sh

Install prebuilt binaries via powershell script

powershell -ExecutionPolicy Bypass -c "irm https://github.com/NoeFontana/vernier/releases/download/v0.1.0/vernier-cli-installer.ps1 | iex"

Download vernier-cli 0.1.0

File	Platform	Checksum
vernier-cli-aarch64-apple-darwin.tar.xz	Apple Silicon macOS	checksum
vernier-cli-x86_64-pc-windows-msvc.zip	x64 Windows	checksum
vernier-cli-aarch64-unknown-linux-gnu.tar.xz	ARM64 Linux	checksum
vernier-cli-x86_64-unknown-linux-gnu.tar.xz	x64 Linux	checksum

Assets 16

16 May 20:39

github-actions

v0.0.4

6ad13d2

0.0.4 — 2026-05-16

Release Notes

Robustness follow-up to 0.0.3. No new evaluation paradigms or kernel
changes — this release widens the typed-error surface, adds a fuzz
harness, and pins the platform-compat matrix in CI. The cross-paradigm
benchmark page is refreshed against the post-0.0.3 SHA on a single
fingerprint (no more dual-SHA LVIS caveat).

Added

Typed Python error surface (#249) — four new PyValueError
subclasses (InvalidAnnotationError, NonFiniteError,
DimensionMismatchError, InvalidConfigError) with the public
surface pinned by tests/python/test_error_matrix.py and documented
at docs/reference/errors.md. Previously these all surfaced as bare
ValueError; existing except ValueError: catches still match the
new subclasses.
Fuzz harness (#249) — tools/fuzz/ cargo-fuzz targets for the
COCO / manifest / RLE / segmentation parsers (non-workspace crate so
the nightly cargo-fuzz toolchain stays out of the publishable
workspace). The vernier_core::fuzz_regressions integration test
replays minimised crashes on every cargo nextest run; CI's
slow.yml carries a 120 s/target smoke that builds once and exits.
Platform-compat matrix (#249) — slow.yml adds a
py3.10 × py3.13 × py3.14 ladder crossed with numpy / torch combos,
exercising the BackgroundEvaluator tutorial end-to-end. Catches
ABI / DLPack regressions that the single-version ci.yml matrix
doesn't surface.
bench-histogram Cargo feature (#249) — opt-in (G, D, wall_ns)
per-call recorder on match_image, off by default and stripped from
the shipped wheel. Powers the 10× val2017 scaling proof at
docs/engineering/matching-scaling.md. Gated on
vernier-core / vernier-ffi / vernier-mask; no production cost.
Stress-matrix workloads (#249) — 6 named regimes
(coco-baseline, detr-output, lvis-crowded, open-images-cats,
satellite-4k, pathology-8k) plus per-axis sweeps in
bench/workloads/stress_matrix.py; runner at
bench/runners/stress_runner.py. Catalogue and expected behaviour
per axis in docs/engineering/stress-matrix.md.
Memory-under-training-load runner (#249) —
bench/bench/runners/memory_bench.py (reuses
bench.harness.rss.RSSSampler); methodology and reading guide at
docs/engineering/memory-under-training.md.
Colab smoke notebook (#249) — free-tier platform-check entry
point; README badge links to it.
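The compatibility property of the typed error surface follows from ordinary subclassing. The classes below are local re-definitions that mirror the four names listed above purely for illustration; they are not imported from vernier, and `validate_scores` is a hypothetical validator.

```python
import math


class InvalidAnnotationError(ValueError):
    """Local stand-in mirroring the name in the release notes."""


class NonFiniteError(ValueError):
    """Local stand-in mirroring the name in the release notes."""


class DimensionMismatchError(ValueError):
    """Local stand-in mirroring the name in the release notes."""


class InvalidConfigError(ValueError):
    """Local stand-in mirroring the name in the release notes."""


def validate_scores(scores):
    """Hypothetical validator: rejects NaN / inf with the narrow error type."""
    for i, score in enumerate(scores):
        if not math.isfinite(score):
            raise NonFiniteError(f"score at index {i} is not finite: {score!r}")
    return list(scores)


def new_style(scores):
    """Caller that distinguishes the narrow subclass."""
    try:
        validate_scores(scores)
    except NonFiniteError:
        return "non-finite"
    except ValueError:
        return "other value error"
    return "ok"


def legacy_style(scores):
    """Pre-existing caller that only knows about ValueError."""
    try:
        validate_scores(scores)
    except ValueError:
        return "rejected"
    return "ok"


print(new_style([0.9, float("nan")]), legacy_style([0.9, float("nan")]))
```

Ordering matters in the new-style caller: the narrow `except` clause has to come before the `except ValueError` clause, or the broad one wins.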

Changed

Tutorial smoke now ingests via the DLPack array path —
fake_model(image_ids) -> list[Detections] returning numpy arrays
submitted batch-mode (matches torchvision's detection-API
convention). Notebook cell-3 stays byte-identical to the .py body
modulo the module docstring and __main__ guard.
Bench page refreshed against the AMD EPYC-Milan host on a fresh
machine fingerprint (37652a58e939). Speedups hold within VPS
variance; absolute medians shift by ±3% versus the 0.0.3 snapshot.
The dual-SHA LVIS caveat retires — every section now lives at the
same SHA / fingerprint. The panoptic and synthetic-semantic cells
exceeded the 5% relative-IQR gate (chronically noisy on this host —
PNG decode dominates panoptic wall time, mmseg synthetic sits at the
noise floor at 200-image scale); flagged inline with * per the
renderer's existing convention.

Install vernier-cli 0.0.4

Install prebuilt binaries via shell script

curl --proto '=https' --tlsv1.2 -LsSf https://github.com/NoeFontana/vernier/releases/download/v0.0.4/vernier-cli-installer.sh | sh

Install prebuilt binaries via powershell script

powershell -ExecutionPolicy Bypass -c "irm https://github.com/NoeFontana/vernier/releases/download/v0.0.4/vernier-cli-installer.ps1 | iex"

Download vernier-cli 0.0.4

File	Platform	Checksum
vernier-cli-aarch64-apple-darwin.tar.xz	Apple Silicon macOS	checksum
vernier-cli-x86_64-pc-windows-msvc.zip	x64 Windows	checksum
vernier-cli-aarch64-unknown-linux-gnu.tar.xz	ARM64 Linux	checksum
vernier-cli-x86_64-unknown-linux-gnu.tar.xz	x64 Linux	checksum

Assets 16

15 May 04:21

github-actions

v0.0.3

cc6cbc2

0.0.3 — 2026-05-15

Release Notes

This is the diagnostic-surfaces and scenario-slicing release: instance
gains an oLRP error decomposition (Oksuz et al.), a detection-family
calibration summarizer (ECE / MCE / reliability), and a manifest-driven
slice-and-aggregate lane that runs one matching pass across N scenario
cells. Panoptic picks up boundary PQ. No paradigm shifts, no
crates.io additions — every kernel slots into the existing
vernier-core / vernier-panoptic / vernier-semantic surface.

Added

LRP / oLRP error decomposition (ADR-0043, ADR-0044, ADR-0045) —
Oksuz et al. (ECCV 2018 / TPAMI 2021) Localization Recall Precision
as an opt-in metric alongside AP. vernier.instance.optimal_lrp(gt, dt, iou=Bbox()|Segm()|Boundary()|Keypoints()) decomposes detection
performance into oLRP_Loc + oLRP_FP + oLRP_FN, minimised over a
per-class confidence threshold tau. CLI gains --metric {ap,olrp}
with ap preserving the existing headline-table contract. The Rust
core lives in crates/vernier-core/src/lrp/; the ADR-0005 firewall
is held (no edits to matching.rs / accumulate.rs / evaluate.rs).
Pure-NumPy oracle is the correctness contract (ADR-0043);
kemaloksuz/LRP-Error is an opt-in tripwire, not a parity gate.
vernier.panoptic.optimal_lrp is a typed NotImplementedError stub
— panoptic predictions carry no per-segment score so the tau sweep
has nothing to scan; extension is a follow-up ADR.
Boundary Panoptic Quality (ADR-0025 §Z1/Z2 amendment) —
PanopticEvaluator(boundary=True, dilation_ratio=0.02) now ships
under both parity_mode="strict" (bit-exact reproduction of
bowenc0221/boundary-iou-api's coco_panoptic_api/evaluation.py
at SHA 37d25586a677) and parity_mode="corrected" (deterministic,
snapshot-based; segment-id-sorted iteration). Composition is
iou = min(mask_iou, boundary_iou) — identical to the instance
Boundary case (the prior Q3 row of boundary-iou-quirks.md
described this incorrectly; corrected in the same amendment). FN/FP attribution
is unchanged; U6/U7/V1-V7/W1/W7 stand. The streaming runner threads
boundary state per image with BoundaryScratch reuse, and
distributed-eval partials hash the dilation_ratio into
params_hash so silent boundary/instance partial mixing is rejected
at envelope-validation time. No FORMAT_VERSION bump. Cityscapes
panoptic (Z3) remains deferred.
Detection-family calibration summarizer (ADR-0018) —
ECE / MCE / reliability table for bbox / segm / boundary /
keypoints. Opt-in via Evaluator.evaluate(..., calibration=True);
the lazy result.calibration(iou=..., n_bins=15, binning="quantile", min_score=0.05, per_class=False, ...) re-fold
returns a vernier.calibration.CalibrationResult (polars
reliability / per_class plus scalar ece / mce). Re-folding
with different params does not re-run matching. Streaming pairing:
BackgroundEvaluator.finalize_with_cells() plus the
vernier.calibration.StreamingSnapshot wrapper. Clean-room NumPy
oracle is the correctness contract; 16/16 parity bit-equal at
strict mode. Panoptic and semantic calibration are deferred
(data-model prerequisites per the ADR's per-paradigm shape map).
Slice-and-aggregate (ADR-0046) — manifest-driven scenario
slicing across all three paradigms plus the vernier aggregate
fan-in verb. Python:
Evaluator.evaluate(..., manifest=..., cross_axes=...) accepts a
dict, JSON / CSV path, or Arrow PyCapsule manifest and returns
EvalResult.slices as a polars DataFrame (one row per
(axis, value) cell). CLI: vernier eval --manifest weather.json [--cross weather,time_of_day] [--label NAME] [--metric {ap,olrp}]
emits a v2 envelope; un-partitioned vernier eval keeps emitting
v1 verbatim. vernier.aggregate(results, manifest, *, baseline=None, metric=None) and vernier aggregate result1.json result2.json --manifest runs.json --baseline clean fan N runs
into a comparative table with <metric> (mPC) and
<metric>__rpc (rPC) columns when --baseline is set. The
tables= + manifest= cross product is a deliberate non-feature
with a client-side recipe at
docs/how-to/per-class-by-slice.md.
New reference schemas: manifest-schema.md,
aggregate-schema.md.
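For readers unfamiliar with the calibration summary, ECE and MCE over quantile bins can be computed along these lines. This is a sketch of the standard definitions, assuming per-bin "accuracy" means the fraction of detections matched as true positives; it is not the ADR-0018 oracle, and `calibration_summary` is a hypothetical name. The defaults mirror the `n_bins=15` and `min_score=0.05` values quoted above.

```python
import numpy as np


def calibration_summary(scores, is_tp, n_bins=15, min_score=0.05):
    """Return (ECE, MCE) using equal-mass (quantile) bins over detection scores."""
    scores = np.asarray(scores, dtype=np.float64)
    is_tp = np.asarray(is_tp, dtype=np.float64)
    keep = scores >= min_score
    scores, is_tp = scores[keep], is_tp[keep]
    total = scores.size
    if total == 0:
        return float("nan"), float("nan")
    # Quantile binning: sort by score, then split into near-equal-sized groups.
    order = np.argsort(scores, kind="stable")
    bins = [idx for idx in np.array_split(order, n_bins) if idx.size]
    ece, mce = 0.0, 0.0
    for idx in bins:
        # Gap between observed TP rate and mean confidence in this bin.
        gap = abs(is_tp[idx].mean() - scores[idx].mean())
        ece += idx.size / total * gap  # weighted by bin mass
        mce = max(mce, gap)            # worst bin
    return float(ece), float(mce)


ece, mce = calibration_summary([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0], n_bins=1)
print(ece, mce)
```

Because the binning only needs scores and match flags, different `n_bins` or `min_score` values can be re-folded from stored cells, consistent with the note that re-folding does not re-run matching.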

Install vernier-cli 0.0.3

Install prebuilt binaries via shell script

curl --proto '=https' --tlsv1.2 -LsSf https://github.com/NoeFontana/vernier/releases/download/v0.0.3/vernier-cli-installer.sh | sh

Install prebuilt binaries via powershell script

powershell -ExecutionPolicy Bypass -c "irm https://github.com/NoeFontana/vernier/releases/download/v0.0.3/vernier-cli-installer.ps1 | iex"

Download vernier-cli 0.0.3

File	Platform	Checksum
vernier-cli-aarch64-apple-darwin.tar.xz	Apple Silicon macOS	checksum
vernier-cli-x86_64-pc-windows-msvc.zip	x64 Windows	checksum
vernier-cli-aarch64-unknown-linux-gnu.tar.xz	ARM64 Linux	checksum
vernier-cli-x86_64-unknown-linux-gnu.tar.xz	x64 Linux	checksum

Assets 16

12 May 02:01

github-actions

v0.0.2

39950ab

0.0.2 — 2026-05-12

Release Notes

This is the three-paradigm release: instance gains panoptic and semantic
siblings, distributed eval lands across all three, and the bench harness
brings real-model + alternatives numbers to the docs site. Two new
crates ship to crates.io (vernier-panoptic, vernier-semantic) plus
the vernier-partial leaf that holds the shared partial wire envelope.

Added

Distributed-eval entry points on Evaluator (ADR-0035) — each
paradigm's public Evaluator gains
evaluate_to_partial(..., *, rank_id) -> bytes and a
classmethod from_partials(...) -> Summary. Per-paradigm shapes:
instance takes JSON bytes, semantic takes Dataset/Predictions,
panoptic takes per-image tuples + categories= (the one
asymmetry — PanopticDataset doesn't yet expose per-image
accessors; closing that gap is a follow-up). The streaming
substrate, the vernier-partial wire format, FORMAT_VERSION,
partition-disjointness invariant, and the five paradigm-shared
Partial* exception classes are all unchanged. The same DDP
recipe works on instance, semantic, and panoptic.
Distributed evaluation wire format (ADR-0031, ADR-0032) — new
vernier-partial workspace crate holds the shared partial-envelope
(magic + FORMAT_VERSION + framing + the five Partial* typed
errors) used by all three paradigms. FORMAT_VERSION is a 1→2
hard break (pre-1.0 policy). Cross-paradigm merge is structurally
rejected (paradigm tag in the envelope). Determinism contract is
paradigm-specific: instance preserves bit-exactness, semantic
preserves it for any partition, panoptic only when the partition
order matches the original GT order. BackgroundEvaluator reuses
the same substrate via finalize_to_partial.
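The envelope ideas above (magic, format version, a paradigm tag that blocks cross-paradigm merges, and order-independent merging for the semantic case) can be sketched with a toy wire format. Everything in the layout is hypothetical: the magic bytes, header fields, tag values, and the `to_partial` / `merge_partials` helpers are illustrative and do not describe the vernier-partial format.

```python
import struct

import numpy as np

MAGIC = b"DEMO"                  # hypothetical magic, not vernier's
FORMAT_VERSION = 2
HEADER = struct.Struct("<4sHHQ")  # magic, version, paradigm tag, n_classes
PARADIGMS = {"instance": 0, "panoptic": 1, "semantic": 2}


def to_partial(paradigm, counts):
    """Serialise a square confusion matrix with a small tagged header."""
    counts = np.ascontiguousarray(counts, dtype="<u8")
    header = HEADER.pack(MAGIC, FORMAT_VERSION, PARADIGMS[paradigm], counts.shape[0])
    return header + counts.tobytes()


def merge_partials(paradigm, partials):
    """Validate each envelope, then sum the payloads."""
    total = None
    for blob in partials:
        magic, version, tag, n = HEADER.unpack_from(blob)
        if magic != MAGIC or version != FORMAT_VERSION:
            raise ValueError("unrecognised partial envelope")
        if tag != PARADIGMS[paradigm]:
            raise ValueError("cross-paradigm merge rejected")
        counts = np.frombuffer(blob, dtype="<u8", offset=HEADER.size).reshape(n, n)
        total = counts.copy() if total is None else total + counts
    if total is None:
        raise ValueError("no partials to merge")
    return total


rank0 = to_partial("semantic", [[5, 1], [0, 4]])
rank1 = to_partial("semantic", [[2, 0], [3, 7]])
merged = merge_partials("semantic", [rank0, rank1])
print(merged)
```

Summing unsigned integer counts gives the same result for any rank order, which matches the note that the semantic paradigm stays bit-exact under any partition.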

Changed

Public-surface consolidation (ADR-0035, supersedes the public
StreamingEvaluator portion of ADR-0013; amends ADR-0014, ADR-0031,
ADR-0032). Each paradigm now exposes two classes: Evaluator
(frozen config dataclass; batch + DDP entry points) and
BackgroundEvaluator (in-training entry point; submit /
finalize / finalize_with_tables / finalize_to_partial /
context manager). The streaming pyclasses are removed from Python
entirely; the Rust substrate stays and is reachable via new
PyO3 functions (evaluate_*_to_partial, merge_*_partials) and
via BackgroundEvaluator.

Removed

vernier.{instance,panoptic,semantic}.StreamingEvaluator — the
three streaming pyclasses are removed from Python entirely. They no
longer appear on vernier._core, on any paradigm namespace, or
under a vernier._impl shim. The Rust streaming substrate
(vernier_core::stream::StreamingEvaluator<K>,
StreamingPanopticEvaluator, StreamingSemanticEvaluator) remains
as the implementation behind the new
evaluate_*_to_partial / merge_*_partials PyO3 functions and
BackgroundEvaluator's worker. No public deprecation shim — pre-1.0
hard break.
Evaluator.stream(...) factory on vernier.{panoptic,semantic} —
removed alongside the public streaming class. Use
BackgroundEvaluator(...) directly, or Evaluator.evaluate_to_partial
/ Evaluator.from_partials for DDP.
StreamingEvaluator.snapshot(running=True) and its Rust-side
snapshot_running() method — the biased fast path that ADR-0013
itself flagged as inappropriate for quality gates.
StreamingEvaluator.checkpoint() / restore() — these were
NotImplemented thin wrappers around snapshot_to_partial /
from_partials. The persistence story is now exclusively
evaluate_to_partial → store bytes → from_partials on resume.
BackgroundEvaluator.snapshot(), snapshot(peek=True),
snapshot_with_tables(), and the non-finalize to_partial() on
all three paradigms. Public surface is the consuming
finalize / finalize_with_tables / finalize_to_partial only.
BackgroundPanopticEvaluator.from_partials /
BackgroundSemanticEvaluator.from_partials — vestigial (return-
type bug carried them; no caller used them).
Semantic-segmentation user docs (ADR-0028 PR-B10) — three new
pages in docs/: migrate/from-mmsegmentation.md (semantic-side
migration recipe with preset / streaming / NaN-vs-0.0 /
binary-mask coverage), and explanation/three-paradigms.md (paradigm
picker — when to reach for instance vs panoptic vs semantic, why
they're sibling submodules rather than a single evaluator with a
knob). README updated to
feature the three-paradigm surface in a top-level section
alongside the install commands; mkdocs.yml nav surfaces both
new pages plus the previously-orphaned panoptic migration
guide.
Semantic-segmentation streaming evaluator (ADR-0028 PR-B9
partial — streaming only; Breakdown / result-tables follow-ups
scoped to a future PR). New
vernier_semantic::StreamingSemanticEvaluator is a flat
O(n_classes²) accumulator over ConfusionMatrix: update(image_id, gt, dt) folds via the same accumulate_confusion kernel the batch
path uses; snapshot() is constant-time relative to image count
(per ADR-0013, no fast-vs-running mode distinction needed). FFI
pyclass vernier._core.StreamingSemanticEvaluator is registered
on the module; the Python Evaluator.stream(n_classes, ignore_label=None) factory returns a fresh streaming evaluator
carrying the parent's parity_mode. Load-bearing invariant
(pinned by tests/python/test_semantic_streaming.py::test_streaming_finalize_bit_equals_batch_evaluate):
streaming finalize() is bit-equal to batch evaluate(...) over
the same images on f64 outputs. 10 new Python tests + 7 new Rust
tests; total workspace 472 Rust + 376 Python tests pass.
Semantic-segmentation Python wrapper + per-dataset presets
(ADR-0028 PR-B5) — new vernier.semantic submodule (per ADR-0029)
exposing Dataset / Predictions / Evaluator frozen dataclasses
plus Summary / ClassSemanticStats / ConfusionMatrix
re-exports of the FFI pyclasses (under their unprefixed names).
Dataset.from_arrays and Predictions.from_arrays accept any
unsigned-integer dtype; the wrapper preserves the input dtype and
the FFI/kernel walks at native dtype (since ADR-0037).
Dataset.from_files / Predictions.from_files decode single-
channel PNG label maps via lazy-imported Pillow (raises a
structured ImportError if Pillow is missing); RGB-encoded panoptic
PNGs are rejected with a typed message pointing at
vernier.panoptic.Dataset. Predictions.from_binary_masks
implements the AN2 per-class binary-mask merge with explicit
merge ∈ {"argmax", "first", "highest_class_id"} selector and
unlabeled_class parameter (quirks AN3, AN4). Per-dataset
presets Dataset.cityscapes / ade20k / pascal_voc bake the
canonical (n_classes, ignore_label) constants from
vernier_semantic::parity::*. 23 new Python tests cover the
wrapper round-trip, dtype handling, ignore-label / label-remap
propagation, binary-mask merge rules, RGB rejection, and
end-to-end PNG decode + evaluate.
Semantic-segmentation FFI surface (ADR-0028 PR-B4) —
vernier._core.evaluate_semantic_from_arrays(gt_label_maps, dt_label_maps, n_classes, parity_mode, *, ignore_label=None, label_remap=None) is the load-bearing pyfunction that drives the
Rust kernel + summarize pass under py.detach (ADR-0006). Inputs
are dicts mapping image_id (int) → 2-D numpy.ndarray of dtype
uint32. New pyclasses SemanticSummary, ClassSemanticStats,
ConfusionMatrix expose the per-class and global metrics; the
confusion matrix is materialized as a 2-D numpy.uint64 array
via ConfusionMatrix.counts() (ADR-0028 §F1 first-class output).
GT image-id ordering is sorted for deterministic accumulation
(quirk AM5 aligned). label_remap is pre-applied to DT
buffers at the FFI boundary (quirk AK2) so the hot kernel
loop avoids per-pixel dict lookups. PNG-decode (from_files) and
binary-mask (from_binary_masks) variants land in PR-B5
alongside the per-dataset preset constructors that drive them.
14 Python smoke tests pass; full workspace 465 Rust + 343 Python
green.
Semantic-segmentation kernel + summarize (ADR-0028 PR-B3) —
vernier_semantic::kernel::accumulate_confusion per-image
histogram fold (one pass over flattened (H, W) slices into a
u64 (n_classes, n_classes) matrix; ignore-label mask before
the bincount per quirk AJ2; out-of-range DT silent-skip per
AI4 strict-MS path). ConfusionMatrix is a flat-Vec<u64>
row-major shape that doubles as the FFI (N, N) numpy-view
source. vernier_semantic::summarize::summarize derives the
seven headline outputs (mIoU, FWIoU, pixel accuracy, mean
accuracy, per-class IoU/accuracy/precision, plus the confusion
matrix as a first-class output per AL8). parity_mode
selects NaN vs. 0.0 for zero-support per-class entries (quirk
AL2); means skip zero-support classes regardless of mode
(AL3, mirroring panopticapi W2 and LVIS AB3). 16
unit tests (kernel + summarize) on hand-computed fixtures, all
pass in --release and debug. No SIMD per ADR-0028 §"Numerical
layout" — the kernel is integer/memory-bandwidth bound. Dataset
constructors and FFI surface land in PR-B5 / PR-B4 respectively.
Semantic-segmentation crate scaffold (ADR-0028 PR-B2) — new
workspace member crates/vernier-semantic/ with Cargo.toml /
lib.rs / error.rs / parity.rs. Re-exports
vernier_core::parity::ParityMode per ADR-0028 §"Workspace and
dependency direction" — the first dep-edge asymmetry vs.
vernier-panoptic ⊥ vernier-core, justified by concrete reuse.
Pins the per-dataset ignore-label conventions
(CITYSCAPES_IGNORE_LABEL=255, ADE20K_IGNORE_LABEL=0,
PASCAL_VOC_IGNORE_LABEL=255), class counts, and
SEMANTIC_PARITY_EPS placeholder. SemanticError enum surfaces
the corrected-disposition...
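The semantic kernel and summarize entries above describe a confusion-matrix fold followed by metrics derived from it. A NumPy sketch of that shape follows, under stated assumptions: rows index GT and columns index DT, out-of-range GT pixels are skipped along with out-of-range DT pixels, "zero support" means an empty union, and the function names and the `zero_support_value` parameter are hypothetical rather than the Rust API.

```python
import numpy as np


def accumulate_confusion(cm, gt, dt, ignore_label=None):
    """Fold one image's label maps into cm (an (n, n) uint64 matrix), in place."""
    n = cm.shape[0]
    gt = np.asarray(gt).ravel().astype(np.int64)
    dt = np.asarray(dt).ravel().astype(np.int64)
    keep = (gt < n) & (dt < n)  # silently skip out-of-range labels
    if ignore_label is not None:
        keep &= gt != ignore_label  # mask ignored pixels before the bincount
    flat = np.bincount(gt[keep] * n + dt[keep], minlength=n * n)
    cm += flat.reshape(n, n).astype(np.uint64)


def per_class_iou(cm, zero_support_value=np.nan):
    """Return (per-class IoU, mIoU); the mean skips classes with an empty union."""
    counts = cm.astype(np.float64)
    tp = np.diag(counts)
    union = counts.sum(axis=0) + counts.sum(axis=1) - tp
    has_support = union > 0
    iou = np.full(tp.shape, zero_support_value, dtype=np.float64)
    iou[has_support] = tp[has_support] / union[has_support]
    miou = float(iou[has_support].mean()) if has_support.any() else float("nan")
    return iou, miou


gt = np.array([[0, 0], [1, 255]], dtype=np.uint8)
dt = np.array([[0, 1], [1, 1]], dtype=np.uint8)

cm = np.zeros((3, 3), dtype=np.uint64)
accumulate_confusion(cm, gt, dt, ignore_label=255)
iou, miou = per_class_iou(cm)
print(cm.tolist(), miou)
```

Because the accumulator is a plain integer matrix, folding images one at a time and folding them in a single batch give identical counts, which is the property behind the streaming-equals-batch invariant noted above.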

Assets 16
