Releases: NoeFontana/vernier
0.2.0 — 2026-06-09
Release Notes
Real-prediction parity follow-up to 0.1.0. The headline work is six
new SOTA-harness cells that drive every kernel — bbox, segm, boundary,
keypoints, panoptic PQ, semantic mIoU, LVIS, and calibration — through
a frozen real-model prediction cache so the parity surface no longer
relies solely on synthetic fixtures. Two strict-mode behavioural fixes
ride along: the TIDE Missed-bin rewrite (previously a no-op under
parity_mode="strict") and the accumulator's n_d==0 precision/scores
write. The minor bump signals those output-value changes; the kernel
surface is otherwise unchanged from 0.1.0.
Added
- Real-prediction SOTA harness — six new cells. Each cell drives a
pinned upstream checkpoint through the existing _harness_common
scaffolding (full-SHA cache key, _ensure_pinned_revision preflight,
torch.set_num_threads(1), int64 target_sizes, loud-fail on
unmapped class names) and asserts vernier-vs-oracle parity on real
output distributions:
  - DETR-R50 (#265) — instance bbox / segm against the
    facebook/detr-resnet-50 checkpoint on COCO val2017. Aligned tier
    loosens dtScores to rtol = 2 * eps to absorb the documented
    serde_json vs Python strtod 1-ULP score-parser drift; all
    integer-reduction surfaces (precision, recall, counts, 12-stat AP/AR
    summary) stay bit-equal.
  - Mask2Former panoptic + ADE-semantic (#266) — panoptic PQ
    against facebook/mask2former-swin-large-coco-panoptic on COCO
    panoptic val2017; semantic mIoU against the ADE checkpoint on
    ADE20K val. Both bit-equal to their oracles on integer-reduction
    surfaces.
  - DETR-R50 calibration (#267) — reuses the #265 prediction
    cache to validate ADR-0018 ECE / MCE / reliability against the
    NumPy oracle at full distribution scale.
  - rfdetr-segnano boundary (#269) — boundary IoU against
    bowenc0221's boundary_iou_api over the rfdetr-segnano TIDE cache;
    no new inference (boundary IoU is a different metric over the same
    RLE masks).
  - LVIS detector (#270) — federated LVIS evaluation against the
    LVIS API. Reuses the TIDE cache pattern; gates the K=168/817
    full-val divergence currently tracked in open_followups.md.
  - ViTPose keypoints (#271) — keypoints OKS evaluation against
    the usyd-community/vitpose-base-coco checkpoint on COCO val2017.
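The rtol = 2 * eps tolerance on dtScores can be pictured with a small check. This is a minimal sketch, assuming float64 scores and a symmetric relative comparison; the helper name scores_close is hypothetical and is not part of the harness.

```python
import math
import sys

EPS = sys.float_info.epsilon  # float64 machine epsilon, 2**-52
RTOL = 2 * EPS                # the rtol = 2 * eps quoted in the notes


def scores_close(a: float, b: float, rtol: float = RTOL) -> bool:
    """True when a and b differ by at most rtol relative to the larger magnitude.

    Hypothetical helper: the real harness comparison may be defined differently.
    """
    return abs(a - b) <= rtol * max(abs(a), abs(b))


score = 0.7312345678901234
one_ulp_up = math.nextafter(score, 1.0)   # neighbouring float64, 1 ULP away

print(scores_close(score, one_ulp_up))    # 1-ULP parser drift is absorbed
print(scores_close(score, score + 1e-12)) # a larger gap is still rejected
```

Since one ULP is at most eps in relative terms, a 2 * eps tolerance leaves room for a single-ULP parser disagreement without admitting visibly different scores.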
Fixed
- TIDE Missed-bin strict-mode parity (#273) — the rewrite-layer
Missed fix was setting ignore_flag = Some(true) on missed GTs and
relying on effective_ignore to resolve under both parity modes.
Quirk D1's strict disposition discards ignore_flag entirely and
reads only is_crowd, so under parity_mode="strict" the rewrite
was a no-op: the AP denominator stayed unchanged and the per-bin
delta collapsed to exactly 0.0 (vs the ADR-0021 NumPy oracle's
spec'd 0.119 on DETR-R50). Fixed by deleting missed GTs from the
corrected dataset entirely — parity-mode-independent and
AP-equivalent to ignoring on the oracle's semantics. Validated to
within 1 ULP against the oracle on COCO val2017 + DETR-R50
(~150k detections, 8 ULP gate). Closes the ADR-0022 follow-up on
t_b = 0.1 for set-prediction transformer detectors.
- n_d == 0 precision/scores write (#272) — the accumulator path
for classes with zero detections now writes 0.0 (not -1) into
the precision and scores tensors. Downstream consumers comparing
raw tensor values across releases will see this change; the public
AP / AR summary statistics are unaffected (they already skipped
-1 sentinel entries).
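The "within 1 ULP" and "8 ULP gate" figures refer to distances counted in representable float64 steps. The following is a minimal sketch of one common way to measure that distance, assuming float64 values; ulp_distance and ULP_GATE are illustrative names, not vernier's own test helpers.

```python
import math
import struct

ULP_GATE = 8  # the gate quoted in the notes; the helper below is hypothetical


def _ordered_bits(x: float) -> int:
    """Map a float64 to an integer whose ordering matches the float ordering."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative floats are sign-magnitude; negate the magnitude so the map is monotonic.
    return bits if bits >= 0 else -(bits & 0x7FFF_FFFF_FFFF_FFFF)


def ulp_distance(a: float, b: float) -> int:
    """Number of representable float64 steps between a and b."""
    if math.isnan(a) or math.isnan(b):
        raise ValueError("ULP distance is undefined for NaN")
    return abs(_ordered_bits(a) - _ordered_bits(b))


oracle = 0.119
observed = math.nextafter(oracle, 1.0)  # pretend the kernel is one step off
print(ulp_distance(oracle, observed), ulp_distance(oracle, observed) <= ULP_GATE)
```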
Install vernier-cli 0.2.0
Install prebuilt binaries via shell script
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/NoeFontana/vernier/releases/download/v0.2.0/vernier-cli-installer.sh | sh
Install prebuilt binaries via powershell script
powershell -ExecutionPolicy Bypass -c "irm https://github.com/NoeFontana/vernier/releases/download/v0.2.0/vernier-cli-installer.ps1 | iex"
Download vernier-cli 0.2.0
| File | Platform | Checksum |
|---|---|---|
| vernier-cli-aarch64-apple-darwin.tar.xz | Apple Silicon macOS | checksum |
| vernier-cli-x86_64-pc-windows-msvc.zip | x64 Windows | checksum |
| vernier-cli-aarch64-unknown-linux-gnu.tar.xz | ARM64 Linux | checksum |
| vernier-cli-x86_64-unknown-linux-gnu.tar.xz | x64 Linux | checksum |
0.1.0 — 2026-05-19
Release Notes
First release out of the 0.0.x line. Mostly a performance + parallelism
follow-up to 0.0.4 — no new evaluation paradigms, every shipped kernel
keeps strict bit-equal parity with its oracle. The cross-paradigm
benchmark page is refreshed against the post-0.0.4 SHA on the same
machine fingerprint as the 0.0.4 snapshot (37652a58e939).
Added
- num_threads parallelism (ADR-0047)
(#251, #253, #254, #256) — opt-in num_threads: int | None = None
on every public evaluate surface across all four paradigms: instance
(bbox / segm / boundary / keypoints), semantic, panoptic, and LVIS,
on batch + streaming + background entry points (Evaluator.evaluate,
Evaluator.background, submit / submit_png). The sequential
path (num_threads=None or 1) is byte-for-byte unchanged from
0.0.4; no rayon symbol is entered. parity_threads parity tests
assert bit-equal results across num_threads ∈ {None, 1, 2, 4, 8}
on every paradigm. CLI gains vernier eval --threads N.
- bench-timings Cargo feature (#256) — atomic (par_iter, serial_post)
split + build_*_anns call counter on
evaluate_with_parallel, attributed via the new BenchCounterSet
shared helper (#258). Off by default and stripped from the shipped
wheel; powers the bbox-scaling attribution at
docs/engineering/benchmarking/2026-05-bbox-cdf.md.
- mimalloc-global Cargo feature on vernier-ffi (#256) —
allocator A/B knob, off by default; lets users opt into mimalloc
for hot-allocation workloads without it being a default cost.
- Semantic divan microbench (#261) —
crates/vernier-semantic/benches/accumulate_confusion.rs
exercises three input distributions (realistic_perfect,
realistic_jittered, uniform_random) at the val2017
panoptic-semantic geometry; prereq for the chunked-u8 kernel work.
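The bit-equal guarantee across num_threads values is easiest to see on an integer fold: integer addition is exact and associative, so the way pixels are chunked across workers cannot change the summed counts. Below is a toy sketch of that property using a confusion-matrix fold. fold_chunk and fold_parallel are hypothetical stand-ins written for illustration; they are not vernier's kernels or its Python API.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fold_chunk(gt, dt, n_classes):
    """Fold one chunk of (gt, dt) label pairs into a flat row-major count matrix."""
    counts = [0] * (n_classes * n_classes)
    for g, d in zip(gt, dt):
        counts[g * n_classes + d] += 1
    return counts


def fold_parallel(gt, dt, n_classes, num_threads=None):
    """Split the pixels into chunks, fold each chunk, then sum the partial matrices."""
    workers = num_threads or 1
    step = max(1, -(-len(gt) // workers))  # ceiling division
    chunks = [(gt[i:i + step], dt[i:i + step]) for i in range(0, len(gt), step)]
    if workers == 1:
        partials = [fold_chunk(g, d, n_classes) for g, d in chunks]
    else:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            partials = list(pool.map(lambda c: fold_chunk(c[0], c[1], n_classes), chunks))
    total = [0] * (n_classes * n_classes)
    for part in partials:
        for i, value in enumerate(part):
            total[i] += value  # exact integer add: order of partials is irrelevant
    return total


rng = random.Random(0)
gt = [rng.randrange(5) for _ in range(10_000)]
dt = [rng.randrange(5) for _ in range(10_000)]
baseline = fold_parallel(gt, dt, 5)
results = {n: fold_parallel(gt, dt, 5, num_threads=n) for n in (None, 1, 2, 4, 8)}
print(all(r == baseline for r in results.values()))
```

Floating-point reductions do not share this property, which is why parity tests of this kind are worth pinning per paradigm.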
Changed
- bbox AP perf (#256, #258, #259) — KernelScratch per-worker
annotation pool + direct-write parallel runner (replaces the
per-image Vec<CellOutput> intermediate with par_chunks_mut);
in-place image-major → canonical transpose via cycle-following
(eliminates a 26 MB intermediate buffer pair on val2017); the
eval_imgs + eval_imgs_meta transposes fuse into a single
cycle walk (halves index arithmetic, drops one of two 1.6 MB
visited-bitset allocations). Net val2017 nt=4: par_iter region
42 → 32 ms, serial_post 45 → 19 ms, peak working-set
−24 MB. The remaining Amdahl floor on --num-threads for bbox is
the ~200 ms single-threaded dataset_build (HashMap validation in
CocoDataset::from_parts), attributed via bench-timings.
- Panoptic PQ perf (#260) — sparse-remap adjacent-pixel cache on
build_dense_intersections and build_dense_boundary_intersections.
COCO panoptic always hits the sparse branch (RGB-packed ids exceed
the 1 M dense cap) and panoptic segments are spatially contiguous,
so consecutive (g, d) pairs are usually identical; a 4-state
(last_g, last_d, last_gi, last_di) cache skips the FxHashMap
lookup on adjacent-pixel matches. Dense branch is deliberately
uncached (Vec::get is cheap enough that the miss overhead
regresses synthetic by ~70%). SSSE3 RGB→u32 pack on the panoptic
PNG decode path. New coco_like_rgb microbench arm exercises the
sparse-RGB path that the existing coco_like arms missed
(their ids 1..=50 took the dense path).
- Semantic mIoU perf (#261) — decode buffer pool + chunked u8
kernel on accumulate_confusion for the T = u8 PNG fused-decode
path that drives Semantic — mIoU (val2017). The pool reuses the
per-image decode Vec<u8> across submissions; the chunked kernel
keeps the strict-mode u64-additive fold but processes pixels in
cache-line-sized batches.
- Background-evaluator threading wired (#253, #254) —
BackgroundConfig.num_threads is no longer hardcoded None on the
panoptic and semantic FFI ctors; BackgroundCapable gains a
default-method apply_update_parallel that the panoptic and
semantic streaming impls override. Panoptic submit_png defers
PNG decode into the worker pool (PyBackedBytes zero-copy) so
libpng decode parallelises across submissions; the single-threaded
path keeps inline decode and is byte-for-byte unchanged.
- vernier-pixel-pack folded into vernier-panoptic — the
SSSE3 RGB→u32 pack primitive added in #260 lived briefly as a
standalone workspace crate. With a single consumer
(vernier-panoptic::decode) and 172 LOC, it sat below the
leaf-crate threshold and the audited-unsafe carveout fits cleanly
inside the host crate (#![deny(unsafe_code)] at root, module-local
#[allow(unsafe_code)] on the SSSE3 pshufb fn). Folding it
back keeps the published crate set at the six 0.0.4 crates and
avoids the registry-reservations + Trusted-Publisher loop in the
release runbook for a non-reusable internal SIMD primitive.
- Bench harness --num-threads (#251, #252) —
bench run --num-threads "1,2,4,8" overrides the workload's pinned
num_threads tuple; panoptic + semantic spawn helpers now forward
the flag (previously dropped, so every panoptic / semantic cell
ran with args.num_threads = None regardless of what the CLI
swept).
- Bench page refreshed against 3a509df6c525 on the same
37652a58e939 fingerprint as the 0.0.4 snapshot, so the speedup
deltas are not confounded by host change. Per-cell movements
(vernier median, 0.0.4 → HEAD):
  - panoptic PQ: 12.59 s → 10.53 s (−16.4%; speedup
    2.73× → 3.30× vs panopticapi). IQR also narrows from 21.22%
    to 9.78% (still over the 5% gate — PNG decode is chronically
    noisy on this host).
  - semantic mIoU val2017: 5.00 s → 2.82 s (−43.6%;
    speedup 4.12× → 7.40× vs mmsegmentation).
  - instance bbox / segm / boundary / keypoints / synth-semantic /
    LVIS move within VPS noise of their 0.0.4 numbers; speedups
    widen by 0.1×–0.5× as baselines drift slightly slower on this
    run.
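For readers unfamiliar with the cycle-following transpose mentioned in the bbox item, here is a minimal sketch of the general technique on a flat row-major buffer. It is a textbook version written for illustration, with a bytearray standing in for the visited bitset; it makes no claim about how vernier's Rust implementation is laid out.

```python
def transpose_in_place(buf, rows, cols):
    """Transpose a rows x cols row-major buffer into cols x rows without a second buffer.

    Each element at flat index r * cols + c moves to c * rows + r. Those moves
    form disjoint cycles; every cycle is walked once, carrying one value.
    """
    n = rows * cols
    assert len(buf) == n
    visited = bytearray(n)  # one flag per slot; a real bitset would use n / 8 bytes
    for start in range(n):
        if visited[start]:
            continue
        carried = buf[start]
        src = start
        while True:
            visited[src] = 1
            r, c = divmod(src, cols)
            dst = c * rows + r
            # Drop the carried value into its destination and pick up what was there.
            carried, buf[dst] = buf[dst], carried
            src = dst
            if src == start:
                break


grid = [1, 2, 3,
        4, 5, 6]          # 2 x 3, row-major
transpose_in_place(grid, 2, 3)
print(grid)               # now 3 x 2, row-major
```

The only extra memory is the visited flags, which is the trade the notes describe: a small bitset in exchange for dropping a full-size intermediate buffer.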
Fixed
- bench run --impl all on non-instance paradigms — impls_for_iou
raised KeyError for the paradigm-specific impls
(vernier_panoptic, panopticapi, mmsegmentation,
vernier_lvis, lvis-api) that #252 widened ALL_IMPLS to
include. Falls back to an empty IoU set for impls that aren't
registered for the instance paradigm.
Install vernier-cli 0.1.0
Install prebuilt binaries via shell script
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/NoeFontana/vernier/releases/download/v0.1.0/vernier-cli-installer.sh | sh
Install prebuilt binaries via powershell script
powershell -ExecutionPolicy Bypass -c "irm https://github.com/NoeFontana/vernier/releases/download/v0.1.0/vernier-cli-installer.ps1 | iex"
Download vernier-cli 0.1.0
| File | Platform | Checksum |
|---|---|---|
| vernier-cli-aarch64-apple-darwin.tar.xz | Apple Silicon macOS | checksum |
| vernier-cli-x86_64-pc-windows-msvc.zip | x64 Windows | checksum |
| vernier-cli-aarch64-unknown-linux-gnu.tar.xz | ARM64 Linux | checksum |
| vernier-cli-x86_64-unknown-linux-gnu.tar.xz | x64 Linux | checksum |
0.0.4 — 2026-05-16
Release Notes
Robustness follow-up to 0.0.3. No new evaluation paradigms or kernel
changes — this release widens the typed-error surface, adds a fuzz
harness, and pins the platform-compat matrix in CI. The cross-paradigm
benchmark page is refreshed against the post-0.0.3 SHA on a single
fingerprint (no more dual-SHA LVIS caveat).
Added
- Typed Python error surface (#249) — four new PyValueError
subclasses (InvalidAnnotationError, NonFiniteError,
DimensionMismatchError, InvalidConfigError) with the public
surface pinned by tests/python/test_error_matrix.py and documented
at docs/reference/errors.md. Previously these all surfaced as bare
ValueError; existing except ValueError: catches still match the
new subclasses.
- Fuzz harness (#249) — tools/fuzz/ cargo-fuzz targets for the
COCO / manifest / RLE / segmentation parsers (non-workspace crate so
the nightly cargo-fuzz toolchain stays out of the publishable
workspace). The vernier_core::fuzz_regressions integration test
replays minimised crashes on every cargo nextest run; CI's
slow.yml carries a 120 s/target smoke that builds once and exits.
- Platform-compat matrix (#249) — slow.yml adds a
py3.10 × py3.13 × py3.14 ladder crossed with numpy / torch combos,
exercising the BackgroundEvaluator tutorial end-to-end. Catches
ABI / DLPack regressions that the single-version ci.yml matrix
doesn't surface.
- bench-histogram Cargo feature (#249) — opt-in (G, D, wall_ns)
per-call recorder on match_image, off by default and stripped from
the shipped wheel. Powers the 10× val2017 scaling proof at
docs/engineering/matching-scaling.md. Gated on
vernier-core / vernier-ffi / vernier-mask; no production cost.
- Stress-matrix workloads (#249) — 6 named regimes
(coco-baseline, detr-output, lvis-crowded, open-images-cats,
satellite-4k, pathology-8k) plus per-axis sweeps in
bench/workloads/stress_matrix.py; runner at
bench/runners/stress_runner.py. Catalogue and expected behaviour
per axis in docs/engineering/stress-matrix.md.
- Memory-under-training-load runner (#249) —
bench/bench/runners/memory_bench.py (reuses
bench.harness.rss.RSSSampler); methodology and reading guide at
docs/engineering/memory-under-training.md.
- Colab smoke notebook (#249) — free-tier platform-check entry
point; README badge links to it.
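The backward-compatibility claim for the typed error surface follows from ordinary Python exception subclassing. A minimal sketch, using a locally defined stand-in class rather than importing from vernier (the class body and the check_score validator are hypothetical):

```python
import math


class NonFiniteError(ValueError):
    """Local stand-in for the typed error of the same name; not imported from vernier."""


def check_score(score: float) -> float:
    # Hypothetical validator, used here only to trigger the typed error.
    if math.isnan(score) or math.isinf(score):
        raise NonFiniteError(f"score must be finite, got {score!r}")
    return score


def new_style(score: float) -> str:
    try:
        check_score(score)
    except NonFiniteError:   # narrower handler made possible by the typed surface
        return "non-finite"
    return "ok"


def legacy_style(score: float) -> str:
    try:
        check_score(score)
    except ValueError:       # pre-existing handler still matches the subclass
        return "value-error"
    return "ok"


print(new_style(float("nan")), legacy_style(float("inf")), legacy_style(0.5))
```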
Changed
- Tutorial smoke now ingests via the DLPack array path —
fake_model(image_ids) -> list[Detections] returning numpy arrays
submitted batch-mode (matches torchvision's detection-API
convention). Notebook cell-3 stays byte-identical to the .py body
modulo the module docstring and __main__ guard.
- Bench page refreshed against the AMD EPYC-Milan host on a fresh
machine fingerprint (37652a58e939). Speedups hold within VPS
variance; absolute medians shift by ±3% versus the 0.0.3 snapshot.
The dual-SHA LVIS caveat retires — every section now lives at the
same SHA / fingerprint. The panoptic and synthetic-semantic cells
exceeded the 5% relative-IQR gate (chronically noisy on this host —
PNG decode dominates panoptic wall time, mmseg synthetic sits at the
noise floor at 200-image scale); flagged inline with * per the
renderer's existing convention.
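A rough sketch of how a relative-IQR gate with an inline * flag could work. It assumes relative IQR means (Q3 - Q1) / median over the timing samples; that definition, the function names, and the sample values are all illustrative assumptions rather than the bench renderer's actual code.

```python
import statistics

IQR_GATE = 0.05  # the 5% gate quoted in the notes


def relative_iqr(samples):
    """(Q3 - Q1) / median: an assumed definition of relative IQR."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    return (q3 - q1) / statistics.median(samples)


def render_median(samples):
    """Format the median wall time, appending '*' when the cell is over the gate."""
    flag = "*" if relative_iqr(samples) > IQR_GATE else ""
    return f"{statistics.median(samples):.2f} s{flag}"


steady = [10.0, 10.1, 10.05, 9.95, 10.02]   # made-up timings, in seconds
noisy = [10.0, 12.5, 9.0, 11.8, 10.4]       # made-up timings, in seconds
print(render_median(steady), render_median(noisy))
```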
Install vernier-cli 0.0.4
Install prebuilt binaries via shell script
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/NoeFontana/vernier/releases/download/v0.0.4/vernier-cli-installer.sh | sh
Install prebuilt binaries via powershell script
powershell -ExecutionPolicy Bypass -c "irm https://github.com/NoeFontana/vernier/releases/download/v0.0.4/vernier-cli-installer.ps1 | iex"
Download vernier-cli 0.0.4
| File | Platform | Checksum |
|---|---|---|
| vernier-cli-aarch64-apple-darwin.tar.xz | Apple Silicon macOS | checksum |
| vernier-cli-x86_64-pc-windows-msvc.zip | x64 Windows | checksum |
| vernier-cli-aarch64-unknown-linux-gnu.tar.xz | ARM64 Linux | checksum |
| vernier-cli-x86_64-unknown-linux-gnu.tar.xz | x64 Linux | checksum |
0.0.3 — 2026-05-15
Release Notes
This is the diagnostic-surfaces and scenario-slicing release: instance
gains an oLRP error decomposition (Oksuz et al.), a detection-family
calibration summarizer (ECE / MCE / reliability), and a manifest-driven
slice-and-aggregate lane that runs one matching pass across N scenario
cells. Panoptic picks up boundary PQ. No paradigm shifts, no
crates.io additions — every kernel slots into the existing
vernier-core / vernier-panoptic / vernier-semantic surface.
Added
- LRP / oLRP error decomposition (ADR-0043, ADR-0044, ADR-0045) —
Oksuz et al. (ECCV 2018 / TPAMI 2021) Localization Recall Precision
as an opt-in metric alongside AP.
vernier.instance.optimal_lrp(gt, dt, iou=Bbox()|Segm()|Boundary()|Keypoints())
decomposes detection
performance into oLRP_Loc + oLRP_FP + oLRP_FN, minimised over a
per-class confidence threshold tau. CLI gains --metric {ap,olrp}
with ap preserving the existing headline-table contract. The Rust
core lives in crates/vernier-core/src/lrp/; the ADR-0005 firewall
is held (no edits to matching.rs / accumulate.rs / evaluate.rs).
Pure-NumPy oracle is the correctness contract (ADR-0043);
kemaloksuz/LRP-Error is an opt-in tripwire, not a parity gate.
vernier.panoptic.optimal_lrp is a typed NotImplementedError stub
— panoptic predictions carry no per-segment score so the tau sweep
has nothing to scan; extension is a follow-up ADR.
- Boundary Panoptic Quality (ADR-0025 §Z1/Z2 amendment) —
PanopticEvaluator(boundary=True, dilation_ratio=0.02) now ships
under both parity_mode="strict" (bit-exact reproduction of
bowenc0221/boundary-iou-api's coco_panoptic_api/evaluation.py
at SHA 37d25586a677) and parity_mode="corrected" (deterministic,
snapshot-based; segment-id-sorted iteration). Composition is
iou = min(mask_iou, boundary_iou) — identical to the instance
Boundary case (the prior Q3 row of boundary-iou-quirks.md had
miscalled this; corrected in the same amendment). FN/FP attribution
is unchanged; U6/U7/V1-V7/W1/W7 stand. The streaming runner threads
boundary state per image with BoundaryScratch reuse, and
distributed-eval partials hash the dilation_ratio into
params_hash so silent boundary/instance partial mixing is rejected
at envelope-validation time. No FORMAT_VERSION bump. Cityscapes
panoptic (Z3) remains deferred.
- Detection-family calibration summarizer (ADR-0018) —
ECE / MCE / reliability table for bbox / segm / boundary /
keypoints. Opt-in via Evaluator.evaluate(..., calibration=True);
the lazy
result.calibration(iou=..., n_bins=15, binning="quantile", min_score=0.05, per_class=False, ...)
re-fold
returns a vernier.calibration.CalibrationResult (polars
reliability / per_class plus scalar ece / mce). Re-folding
with different params does not re-run matching. Streaming pairing:
BackgroundEvaluator.finalize_with_cells() plus the
vernier.calibration.StreamingSnapshot wrapper. Clean-room NumPy
oracle is the correctness contract; 16/16 parity bit-equal at
strict mode. Panoptic and semantic calibration are deferred
(data-model prerequisites per the ADR's per-paradigm shape map).
- Slice-and-aggregate (ADR-0046) — manifest-driven scenario
slicing across all three paradigms plus the vernier aggregate
fan-in verb. Python:
Evaluator.evaluate(..., manifest=..., cross_axes=...) accepts a
dict, JSON / CSV path, or Arrow PyCapsule manifest and returns
EvalResult.slices as a polars DataFrame (one row per
(axis, value) cell). CLI:
vernier eval --manifest weather.json [--cross weather,time_of_day] [--label NAME] [--metric {ap,olrp}]
emits a v2 envelope; un-partitioned vernier eval keeps emitting
v1 verbatim.
vernier.aggregate(results, manifest, *, baseline=None, metric=None)
and
vernier aggregate result1.json result2.json --manifest runs.json --baseline clean
fan N runs
into a comparative table with <metric> (mPC) and
<metric>__rpc (rPC) columns when --baseline is set. The
tables= + manifest= cross product is a deliberate non-feature
with a client-side recipe at
docs/how-to/per-class-by-slice.md.
New reference schemas: manifest-schema.md,
aggregate-schema.md.
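For orientation on the calibration summarizer, the sketch below computes ECE and MCE from their textbook definitions: per-bin |accuracy - mean confidence|, averaged with bin weights for ECE and maximised for MCE. Equal-count bins over score-sorted detections stand in for binning="quantile". This is an assumption-laden illustration, not ADR-0018's fold or vernier's oracle, and calibration_summary is a hypothetical name.

```python
def calibration_summary(scores, hits, n_bins=15):
    """Return (ece, mce) over equal-count bins of detections sorted by score.

    scores: confidence per detection; hits: 1 if the detection matched a GT, else 0.
    """
    order = sorted(range(len(scores)), key=scores.__getitem__)
    n = len(order)
    ece, mce = 0.0, 0.0
    for b in range(n_bins):
        lo, hi = b * n // n_bins, (b + 1) * n // n_bins
        if lo == hi:
            continue  # empty bin when there are fewer detections than bins
        idx = order[lo:hi]
        conf = sum(scores[i] for i in idx) / len(idx)   # mean confidence in the bin
        acc = sum(hits[i] for i in idx) / len(idx)      # fraction of true positives
        gap = abs(acc - conf)
        ece += gap * len(idx) / n                       # weight by bin population
        mce = max(mce, gap)
    return ece, mce


# Four fully confident detections, only half of which are correct.
print(calibration_summary([1.0, 1.0, 1.0, 1.0], [1, 0, 1, 0], n_bins=1))
```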
Install vernier-cli 0.0.3
Install prebuilt binaries via shell script
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/NoeFontana/vernier/releases/download/v0.0.3/vernier-cli-installer.sh | sh
Install prebuilt binaries via powershell script
powershell -ExecutionPolicy Bypass -c "irm https://github.com/NoeFontana/vernier/releases/download/v0.0.3/vernier-cli-installer.ps1 | iex"
Download vernier-cli 0.0.3
| File | Platform | Checksum |
|---|---|---|
| vernier-cli-aarch64-apple-darwin.tar.xz | Apple Silicon macOS | checksum |
| vernier-cli-x86_64-pc-windows-msvc.zip | x64 Windows | checksum |
| vernier-cli-aarch64-unknown-linux-gnu.tar.xz | ARM64 Linux | checksum |
| vernier-cli-x86_64-unknown-linux-gnu.tar.xz | x64 Linux | checksum |
0.0.2 — 2026-05-12
Release Notes
This is the three-paradigm release: instance gains panoptic and semantic
siblings, distributed eval lands across all three, and the bench harness
brings real-model + alternatives numbers to the docs site. Two new
crates ship to crates.io (vernier-panoptic, vernier-semantic) plus
the vernier-partial leaf that holds the shared partial wire envelope.
Added
- Distributed-eval entry points on Evaluator (ADR-0035) — each
paradigm's public Evaluator gains
evaluate_to_partial(..., *, rank_id) -> bytes and a
classmethod from_partials(...) -> Summary. Per-paradigm shapes:
instance takes JSON bytes, semantic takes Dataset / Predictions,
panoptic takes per-image tuples + categories= (the one
asymmetry — PanopticDataset doesn't yet expose per-image
accessors; closing that gap is a follow-up). The streaming
substrate, the vernier-partial wire format, FORMAT_VERSION,
partition-disjointness invariant, and the five paradigm-shared
Partial* exception classes are all unchanged. The same DDP
recipe works on instance, semantic, and panoptic.
- Distributed evaluation wire format (ADR-0031, ADR-0032) — new
vernier-partial workspace crate holds the shared partial-envelope
(magic + FORMAT_VERSION + framing + the five Partial* typed
errors) used by all three paradigms. FORMAT_VERSION is a 1→2
hard break (pre-1.0 policy). Cross-paradigm merge is structurally
rejected (paradigm tag in the envelope). Determinism contract is
paradigm-specific: instance preserves bit-exactness, semantic
preserves it for any partition, panoptic only when the partition
order matches the original GT order. BackgroundEvaluator reuses
the same substrate via finalize_to_partial.
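To make the envelope idea concrete, here is a toy sketch of a header carrying magic bytes, a format version, and a paradigm tag, with validation that rejects a cross-paradigm merge. The byte layout, the magic value, the tag numbers, and every name below are hypothetical; the real vernier-partial wire format is not described in these notes beyond the fields it contains.

```python
import struct

MAGIC = b"VPRT"                      # hypothetical magic bytes
FORMAT_VERSION = 2                   # the notes mention a 1 -> 2 hard break
HEADER = struct.Struct("<4sHB")      # magic, version, paradigm tag (assumed layout)
PARADIGMS = {"instance": 0, "semantic": 1, "panoptic": 2}  # assumed tag values


class PartialFormatError(ValueError):
    """Hypothetical stand-in for the Partial* typed errors."""


def pack_partial(paradigm: str, payload: bytes) -> bytes:
    """Prefix the payload with the envelope header."""
    return HEADER.pack(MAGIC, FORMAT_VERSION, PARADIGMS[paradigm]) + payload


def unpack_partial(blob: bytes, expected: str) -> bytes:
    """Validate the header and return the payload, or raise on any mismatch."""
    if len(blob) < HEADER.size:
        raise PartialFormatError("truncated envelope")
    magic, version, tag = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise PartialFormatError("bad magic")
    if version != FORMAT_VERSION:
        raise PartialFormatError(f"unsupported format version {version}")
    if tag != PARADIGMS[expected]:
        raise PartialFormatError("cross-paradigm merge rejected")
    return blob[HEADER.size:]


blob = pack_partial("semantic", b"confusion-counts")
print(unpack_partial(blob, "semantic"))
```

Putting the paradigm tag in the header means a mismatched merge fails at validation time, before any payload is decoded.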
Changed
- Public-surface consolidation (ADR-0035, supersedes the public
StreamingEvaluator portion of ADR-0013; amends ADR-0014, ADR-0031,
ADR-0032). Each paradigm now exposes two classes: Evaluator
(frozen config dataclass; batch + DDP entry points) and
BackgroundEvaluator (in-training entry point; submit /
finalize / finalize_with_tables / finalize_to_partial /
context manager). The streaming pyclasses are removed from Python
entirely; the Rust substrate stays and is reachable via new
PyO3 functions (evaluate_*_to_partial, merge_*_partials) and
via BackgroundEvaluator.
Removed
- vernier.{instance,panoptic,semantic}.StreamingEvaluator — the
three streaming pyclasses are removed from Python entirely. They no
longer appear on vernier._core, on any paradigm namespace, or
under a vernier._impl shim. The Rust streaming substrate
(vernier_core::stream::StreamingEvaluator<K>,
StreamingPanopticEvaluator, StreamingSemanticEvaluator) remains
as the implementation behind the new
evaluate_*_to_partial / merge_*_partials PyO3 functions and
BackgroundEvaluator's worker. No public deprecation shim — pre-1.0
hard break.
- Evaluator.stream(...) factory on vernier.{panoptic,semantic} —
removed alongside the public streaming class. Use
BackgroundEvaluator(...) directly, or Evaluator.evaluate_to_partial
/ Evaluator.from_partials for DDP.
- StreamingEvaluator.snapshot(running=True) and its Rust-side
snapshot_running() method — the biased fast path that ADR-0013
itself flagged as inappropriate for quality gates.
- StreamingEvaluator.checkpoint() / restore() — these were
NotImplemented thin wrappers around snapshot_to_partial /
from_partials. The persistence story is now exclusively
evaluate_to_partial → store bytes → from_partials on resume.
- BackgroundEvaluator.snapshot(), snapshot(peek=True),
snapshot_with_tables(), and the non-finalize to_partial() on
all three paradigms. Public surface is the consuming
finalize / finalize_with_tables / finalize_to_partial only.
- BackgroundPanopticEvaluator.from_partials /
BackgroundSemanticEvaluator.from_partials — vestigial (return-type
bug carried them; no caller used them).
- Semantic-segmentation user docs (ADR-0028 PR-B10) — three new
pages in docs/: migrate/from-mmsegmentation.md (semantic-side
migration recipe with preset / streaming / NaN-vs-0.0 /
binary-mask coverage), and explanation/three-paradigms.md (paradigm
picker — when to reach for instance vs panoptic vs semantic, why
they're sibling submodules rather than a single evaluator with a
knob). README updated to
feature the three-paradigm surface in a top-level section
alongside the install commands; mkdocs.yml nav surfaces both
new pages plus the previously-orphaned panoptic migration
guide.
- Semantic-segmentation streaming evaluator (ADR-0028 PR-B9
partial — streaming only; Breakdown / result-tables follow-ups
scoped to a future PR). New
vernier_semantic::StreamingSemanticEvaluator is a flat
O(n_classes²) accumulator over ConfusionMatrix:
update(image_id, gt, dt) folds via the same accumulate_confusion
kernel the batch
path uses; snapshot() is constant-time relative to image count
(per ADR-0013, no fast-vs-running mode distinction needed). FFI
pyclass vernier._core.StreamingSemanticEvaluator is registered
on the module; the Python
Evaluator.stream(n_classes, ignore_label=None) factory returns a
fresh streaming evaluator
carrying the parent's parity_mode. Load-bearing invariant
(pinned by tests/python/test_semantic_streaming.py::test_streaming_finalize_bit_equals_batch_evaluate):
streaming finalize() is bit-equal to batch evaluate(...) over
the same images on f64 outputs. 10 new Python tests + 7 new Rust
tests; total workspace 472 Rust + 376 Python tests pass.
- Semantic-segmentation Python wrapper + per-dataset presets
(ADR-0028 PR-B5) — new vernier.semantic submodule (per ADR-0029)
exposing Dataset / Predictions / Evaluator frozen dataclasses
plus Summary / ClassSemanticStats / ConfusionMatrix
re-exports of the FFI pyclasses (under their unprefixed names).
Dataset.from_arrays and Predictions.from_arrays accept any
unsigned-integer dtype; the wrapper preserves the input dtype and
the FFI/kernel walks at native dtype (since ADR-0037).
Dataset.from_files / Predictions.from_files decode single-channel
PNG label maps via lazy-imported Pillow (raises a
structured ImportError if Pillow is missing); RGB-encoded panoptic
PNGs are rejected with a typed message pointing at
vernier.panoptic.Dataset. Predictions.from_binary_masks
implements the AN2 per-class binary-mask merge with explicit
merge ∈ {"argmax", "first", "highest_class_id"} selector and
unlabeled_class parameter (quirks AN3, AN4). Per-dataset
presets Dataset.cityscapes / ade20k / pascal_voc bake the
canonical (n_classes, ignore_label) constants from
vernier_semantic::parity::*. 23 new Python tests cover the
wrapper round-trip, dtype handling, ignore-label / label-remap
propagation, binary-mask merge rules, RGB rejection, and
end-to-end PNG decode + evaluate.
- Semantic-segmentation FFI surface (ADR-0028 PR-B4) —
vernier._core.evaluate_semantic_from_arrays(gt_label_maps, dt_label_maps, n_classes, parity_mode, *, ignore_label=None, label_remap=None)
is the load-bearing pyfunction that drives the
Rust kernel + summarize pass under py.detach (ADR-0006). Inputs
are dicts mapping image_id (int) → 2-D numpy.ndarray of dtype
uint32. New pyclasses SemanticSummary, ClassSemanticStats,
ConfusionMatrix expose the per-class and global metrics; the
confusion matrix is materialized as a 2-D numpy.uint64 array
via ConfusionMatrix.counts() (ADR-0028 §F1 first-class output).
GT image-id ordering is sorted for deterministic accumulation
(quirk AM5 aligned). label_remap is pre-applied to DT
buffers at the FFI boundary (quirk AK2) so the hot kernel
loop avoids per-pixel dict lookups. PNG-decode (from_files) and
binary-mask (from_binary_masks) variants land in PR-B5
alongside the per-dataset preset constructors that drive them.
14 Python smoke tests pass; full workspace 465 Rust + 343 Python
green.
- Semantic-segmentation kernel + summarize (ADR-0028 PR-B3) —
vernier_semantic::kernel::accumulate_confusion per-image
histogram fold (one pass over flattened (H, W) slices into a
u64 (n_classes, n_classes) matrix; ignore-label mask before
the bincount per quirk AJ2; out-of-range DT silent-skip per
AI4 strict-MS path). ConfusionMatrix is a flat-Vec<u64>
row-major shape that doubles as the FFI (N, N) numpy-view
source. vernier_semantic::summarize::summarize derives the
seven headline outputs (mIoU, FWIoU, pixel accuracy, mean
accuracy, per-class IoU/accuracy/precision, plus the confusion
matrix as a first-class output per AL8). parity_mode
selects NaN vs. 0.0 for zero-support per-class entries (quirk
AL2); means skip zero-support classes regardless of mode
(AL3, mirroring panopticapi W2 and LVIS AB3). 16
unit tests (kernel + summarize) on hand-computed fixtures, all
pass in --release and debug. No SIMD per ADR-0028 §"Numerical
layout" — the kernel is integer/memory-bandwidth bound. Dataset
constructors and FFI surface land in PR-B5 / PR-B4 respectively.
- Semantic-segmentation crate scaffold (ADR-0028 PR-B2) — new
workspace member crates/vernier-semantic/ with Cargo.toml /
lib.rs / error.rs / parity.rs. Re-exports
vernier_core::parity::ParityMode per ADR-0028 §"Workspace and
dependency direction" — the first dep-edge asymmetry vs.
vernier-panoptic ⊥ vernier-core, justified by concrete reuse.
Pins the per-dataset ignore-label conventions
(CITYSCAPES_IGNORE_LABEL=255, ADE20K_IGNORE_LABEL=0,
PASCAL_VOC_IGNORE_LABEL=255), class counts, and
SEMANTIC_PARITY_EPS placeholder. SemanticError enum surfaces
the corrected-disposition...