
Release v1728


github-actions released this 13 Jun 20:11
· 7 commits to main since this release
9b07dff

Automated release from CI pipeline

Changes:
feat(beyond-sota): ADR-155 metric unification + ADR-156 RaBitQ Pass-2 (honest negative + latent topk bugfix) (#1053)

  • refactor(train): hoist canonical PCK/OKS to un-gated metrics_core; fold test_metrics onto production (ADR-155 M1 §8)

ADR-155 §8 deferred item: the test_metrics.rs reference kernels validated
production code against the tests' OWN reimplementation, so the test could not
catch a bug in the canonical implementation (both could be wrong the same way).

  • Extract canonical_torso_size / pck_canonical / oks_canonical / sigmas /
    bounding_box_diagonal into a new NON-tch-gated metrics_core module, so
    the single metric definition is reachable under
    cargo test --no-default-features (the metrics module is tch-gated).
    metrics re-exports every item → still exactly ONE implementation.
  • Rewrite tests/test_metrics.rs to assert the PRODUCTION pck_canonical /
    oks_canonical equal hand-computed fixtures (not a reimplementation):
    canonical_pck_matches_hand_computed_fixture (corr=3/total=4/pck=0.75),
    hip↔hip normalizer pin, zero-visible⇒0.0, OKS perfect⇒1.0, fake-Gold pin.
  • Keep an INDEPENDENT raw-threshold reference kernel only as a differential
    cross-check: test_kernel_agrees_with_canonical asserts it AGREES with
    canonical where torso==1.0 (genuine cross-check, not duplication).
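The canonical-vs-reference split described above can be sketched as follows. This is a minimal, hypothetical illustration: the `Kp` type and the names `pck_torso_normalized` and `pck_raw` are assumptions rather than the metrics_core API, and the torso normalizer is passed in instead of being derived by canonical_torso_size. It shows why a raw-threshold kernel is only a valid cross-check where torso == 1.0.

```rust
/// Hypothetical 2-D keypoint with a visibility flag (not the metrics_core type).
#[derive(Clone, Copy, Debug)]
pub struct Kp {
    pub x: f64,
    pub y: f64,
    pub visible: bool,
}

/// Euclidean distance between a predicted and a ground-truth keypoint.
fn dist(a: Kp, b: Kp) -> f64 {
    ((a.x - b.x).powi(2) + (a.y - b.y).powi(2)).sqrt()
}

/// Torso-normalized PCK over ground-truth-visible keypoints: a keypoint is
/// correct when its error is at most `threshold * torso`.
/// Returns (correct, total, pck); pck is 0.0 when nothing is visible.
pub fn pck_torso_normalized(pred: &[Kp], gt: &[Kp], torso: f64, threshold: f64) -> (usize, usize, f64) {
    let (mut correct, mut total) = (0usize, 0usize);
    for (p, g) in pred.iter().zip(gt) {
        if !g.visible {
            continue;
        }
        total += 1;
        if dist(*p, *g) <= threshold * torso {
            correct += 1;
        }
    }
    let pck = if total == 0 { 0.0 } else { correct as f64 / total as f64 };
    (correct, total, pck)
}

/// Independent raw-threshold reference kernel with no normalizer at all.
/// Written differently (filter/count) on purpose so that it is a genuine
/// differential check rather than a copy of the function above.
pub fn pck_raw(pred: &[Kp], gt: &[Kp], threshold: f64) -> f64 {
    let pairs: Vec<(Kp, Kp)> = pred
        .iter()
        .copied()
        .zip(gt.iter().copied())
        .filter(|(_, g)| g.visible)
        .collect();
    if pairs.is_empty() {
        return 0.0;
    }
    let hits = pairs.iter().filter(|(p, g)| dist(*p, *g) <= threshold).count();
    hits as f64 / pairs.len() as f64
}
```

With torso = 1.0 the two kernels must agree; with any other normalizer they legitimately diverge (a larger torso loosens the canonical threshold while the raw kernel stays fixed), which is why a differential test can only pin the torso == 1.0 case.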

Grade: MEASURED. test_metrics 10→12 tests, 0 failed.

Co-Authored-By: claude-flow ruv@ruv.net

  • fix(sensing-server): relabel divergent live PCK/OKS so they're never conflated with canonical (ADR-155 M1 §2.1/§8 Goal C)

Goal C named training_api.rs:804 (torso-HEIGHT PCK). Auditing it surfaced
TWO findings the ADR-155 §1 table missed:

  1. training_api.rs is an ORPHAN file: it is not declared as a mod in lib.rs OR
    main.rs, so it does NOT compile into the crate and does not drive the live server.
  2. The REAL live best_pck/best_oks (main.rs training path → RVF metadata
    JSON read by model_manager.rs) come from trainer.rs:
    • pck_at_threshold = RAW-threshold PCK, NO torso normalization (the most
      divergent kind), printed/serialized as bare "PCK@0.2".
    • oks_map calls oks_single(area=1.0), the EXACT fake-Gold pattern that
      ADR-155 §2.1 claimed was closed elsewhere; it is still live here, inflating best_oks.
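To see how a fixed area=1.0 can inflate OKS, here is a COCO-style OKS sketch. It assumes keypoints are in normalized [0,1] image coordinates (an assumption: the notes do not state trainer.rs's coordinate units), and `oks` is a hypothetical stand-in for oks_single. The per-keypoint falloff follows the COCO convention k_i = 2σ_i, with `area` playing the role of the object scale s².

```rust
/// COCO-style Object Keypoint Similarity averaged over visible keypoints.
/// `area` is the object scale s^2 in the same units as the squared errors
/// (hypothetical signature; not the sensing-server oks_single API).
pub fn oks(pred: &[(f64, f64)], gt: &[(f64, f64)], visible: &[bool], sigmas: &[f64], area: f64) -> f64 {
    let mut sum = 0.0;
    let mut n = 0usize;
    for i in 0..gt.len() {
        if !visible[i] {
            continue;
        }
        let dx = pred[i].0 - gt[i].0;
        let dy = pred[i].1 - gt[i].1;
        let d2 = dx * dx + dy * dy;
        let k = 2.0 * sigmas[i]; // COCO: k_i = 2 * sigma_i
        // Per-keypoint similarity exp(-d^2 / (2 s^2 k^2)).
        sum += (-d2 / (2.0 * area * k * k)).exp();
        n += 1;
    }
    if n == 0 { 0.0 } else { sum / n as f64 }
}
```

For a person box of 0.2 × 0.4 (area 0.08) and an error of 0.05 per keypoint with σ = 0.079 (COCO's shoulder value), this scores roughly 0.53 with the true area but roughly 0.95 with area=1.0: under the stated assumption, a unit area treats every person as if they filled the whole frame, so the same error looks much smaller.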

Resolution = RELABEL (the torso-normalized and raw math are each load-bearing on
different data; the pub fns can't be renamed without breaking the API; and
sensing-server has no train/ndarray dependency). A genuine unification is tracked
as a §8 backlog item.

  • training_api.rs: compute_pck → compute_pck_torso_height + divergence doc;
    val_pck/best_pck/val_oks struct fields documented as torso-HEIGHT proxies;
    logs say pck_torso_h@0.2. Test torso_pck_is_labelled_distinctly_from_canonical.
  • trainer.rs (LIVE): pck_at_threshold documented raw-unnormalized; oks_map
    area=1.0 flagged fake-Gold; test pck_at_threshold_is_raw_unnormalized_not_canonical.
  • main.rs: live print relabelled pck_raw@0.2 / oks_map(area=1.0 proxy).

No wire-format field renames (back-compat); no pub-API rename (no silent break).
Grade: MEASURED (relabel + divergence pinned). sensing-server 450→451 lib tests, 0 failed.

Co-Authored-By: claude-flow ruv@ruv.net

  • docs(adr-155): mark §8 metric items RESOLVED + audit map + honest §1 under-count correction (M1b Goals A/D)
  • §8.1: full PCK/OKS audit map (every def: file:line, basis, canonical/
    legacy/distinct), the two §8 items marked RESOLVED with resolution+why.
  • Honest finding: §1's "seven divergent metrics" was an UNDER-count —
    sensing-server's LIVE trainer.rs has a raw-unnormalized PCK and an
    area=1.0 fake-Gold OKS the table omitted, and the file §8 named
    (training_api.rs) is orphaned dead code. §9 honest-limits updated.
  • Goal D: metrics.rs *_v2 variants confirmed caller-less + deprecated;
    noted for future cleanup, NOT deleted (public API, tch-gated).
  • CHANGELOG [Unreleased] Fixed entry.

Co-Authored-By: claude-flow ruv@ruv.net

  • feat(ruvector): RaBitQ Pass-2 randomized rotation + topk bugfix (ADR-156 §8)

Implements the deferred "Multi-bit / Extended RaBitQ Pass 2" backlog item
from ADR-156 §8: a deterministic randomized orthogonal rotation applied
before sign-quantization, the published RaBitQ construction (Gao & Long,
SIGMOD 2024).

Rotation construction: Fast Hadamard Transform + seeded ±1 sign flips
("HD" / randomized Hadamard), O(d log d) time and O(d) memory. A dense
d×d rotation would need O(d²) and is infeasible at the 65,535 dimensions the
wire format provisions for. Inputs are padded to the next power of two;
SplitMix64 seeds the sign stream so index-time and query-time rotations are
bit-identical.
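The HD construction above can be sketched in std-only Rust. This is an illustrative assumption of how such a rotation is typically built, not the ruvector code: the names `SplitMix64`, `fwht`, and `rotate_hd`, the order (sign flips, then transform), and zero padding are all assumptions. It exhibits the two properties the new tests pin: determinism for a given seed, and norm preservation.

```rust
/// SplitMix64 PRNG: a tiny, fully deterministic 64-bit generator.
pub struct SplitMix64(pub u64);

impl SplitMix64 {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// In-place unnormalized fast Walsh-Hadamard transform, O(n log n).
/// `v.len()` must be a power of two.
fn fwht(v: &mut [f32]) {
    let n = v.len();
    let mut h = 1;
    while h < n {
        for i in (0..n).step_by(2 * h) {
            for j in i..i + h {
                let (a, b) = (v[j], v[j + h]);
                v[j] = a + b;
                v[j + h] = a - b;
            }
        }
        h *= 2;
    }
}

/// Randomized Hadamard rotation x -> (1/sqrt(n)) * H * D * x, where D is a
/// seeded diagonal of ±1 and n is len(x) rounded up to a power of two.
/// Both factors are orthogonal, so the Euclidean norm is preserved.
pub fn rotate_hd(x: &[f32], seed: u64) -> Vec<f32> {
    let n = x.len().next_power_of_two();
    let mut v = vec![0.0f32; n]; // zero padding keeps the norm unchanged
    v[..x.len()].copy_from_slice(x);
    let mut rng = SplitMix64(seed);
    for val in v.iter_mut() {
        if rng.next_u64() & 1 == 1 {
            *val = -*val; // D: seeded sign flip
        }
    }
    fwht(&mut v); // H
    let scale = 1.0 / (n as f32).sqrt();
    for val in v.iter_mut() {
        *val *= scale;
    }
    v
}
```

Because the sign stream depends only on the seed, index time and query time can each call the same function and obtain bit-identical rotations, and only the seed needs to be stored.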

API is additive and backward-compatible: Pass 1 (from_embedding) is
untouched; Pass 2 is opt-in via Sketch::from_embedding_rotated and
SketchBank::with_rotation (+ insert_embedding / topk_embedding /
novelty_embedding helpers that rotate consistently). Default behaviour
is unchanged.

While building the Pass-2 coverage harness, found and fixed a PRE-EXISTING
correctness bug in SketchBank::topk: the n>k heap path used
BinaryHeap<Reverse<(d,id)>> (a min-heap) but treated its peek as the
max, so it returned the k FARTHEST sketches as "nearest". The shipped unit
tests only exercised the n≤k fast path, so it went unnoticed. Fixed to a
plain max-heap; pinned by topk_heap_path_returns_nearest and
tight_clusters_give_high_coverage_with_overfetch (the latter measured
0.072 on the old code).
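The failure mode and the fix can be reproduced with std's `BinaryHeap`, which is a max-heap. The functions below are hypothetical, operate on (distance, id) pairs rather than sketches, and `topk_buggy` only reproduces the reported symptom; the actual buggy code path may have been structured differently.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Correct bounded top-k: a MAX-heap holds the k best (distance, id) pairs
/// seen so far; whenever it grows past k, pop() evicts the current worst.
pub fn topk_nearest(dists: &[(u32, usize)], k: usize) -> Vec<(u32, usize)> {
    let mut heap = BinaryHeap::with_capacity(k + 1);
    for &item in dists {
        heap.push(item);
        if heap.len() > k {
            heap.pop(); // removes the largest distance
        }
    }
    heap.into_sorted_vec() // ascending: nearest first
}

/// Same loop over a Reverse-wrapped (MIN-)heap: pop() now removes the
/// SMALLEST distance, so the k FARTHEST pairs survive.
pub fn topk_buggy(dists: &[(u32, usize)], k: usize) -> Vec<(u32, usize)> {
    let mut heap = BinaryHeap::with_capacity(k + 1);
    for &item in dists {
        heap.push(Reverse(item));
        if heap.len() > k {
            heap.pop();
        }
    }
    let mut out: Vec<(u32, usize)> = heap.into_iter().map(|Reverse(x)| x).collect();
    out.sort();
    out
}
```

The bug only surfaces once n > k, because the eviction branch never runs otherwise; this matches the note that the shipped tests only covered the n≤k fast path.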

New tests (+17, 100→117 in the crate): rotation determinism/norm-preservation
(rotation_is_deterministic_for_seed, rotation_preserves_norm), Pass-2
shape-compatibility, pass2_coverage_not_worse_than_pass1, and a
deterministic coverage report.

MEASURED top-K coverage (anisotropic planted-cluster fixture, cosine ground
truth; dim=128, N=2048, K=8, 64 clusters, noise=0.35, 128 queries):
candidate_k    Pass1     Pass2     Note
K=8            36.13%    46.39%    both << 90% bar
24             83.89%    91.60%    Pass2 clears 90%
32             100%      100%
Honest result: rotation consistently helps (+10pp at strict K), but neither
pass clears the ADR-084 90% bar at candidate_k==K on this distribution.
Pass 2 reaches 90% only with ~3x over-fetch (the ADR-084 "candidate set"
deployment pattern). Multi-bit Pass 3 evaluated separately.
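One plausible reading of "top-K coverage" with over-fetch is sketched below, under the assumption that it is the fraction of the exact (cosine) top-K ids found among the first candidate_k ids of the sketch ranking, averaged over queries. The notes do not spell out the definition, and `coverage` and `mean_coverage` are hypothetical names.

```rust
use std::collections::HashSet;

/// Coverage for one query: fraction of the exact top-K ids that appear among
/// the first `candidate_k` ids of the approximate (sketch) ranking.
pub fn coverage(exact_topk: &[usize], approx_ranking: &[usize], candidate_k: usize) -> f64 {
    if exact_topk.is_empty() {
        return 0.0;
    }
    let candidates: HashSet<usize> = approx_ranking.iter().take(candidate_k).copied().collect();
    let hits = exact_topk.iter().filter(|&&id| candidates.contains(&id)).count();
    hits as f64 / exact_topk.len() as f64
}

/// Mean coverage over queries, each given as (exact top-K, approximate ranking).
pub fn mean_coverage(queries: &[(Vec<usize>, Vec<usize>)], candidate_k: usize) -> f64 {
    if queries.is_empty() {
        return 0.0;
    }
    let total: f64 = queries
        .iter()
        .map(|(exact, approx)| coverage(exact, approx, candidate_k))
        .sum();
    total / queries.len() as f64
}
```

Under this reading, candidate_k=24 with K=8 is the ~3x over-fetch quoted above: the coarse sketch ranking only has to place each true neighbour somewhere in its top 24, presumably followed by an exact re-rank of that candidate set.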

Co-Authored-By: claude-flow ruv@ruv.net

  • feat(ruvector): multi-bit Pass-3 experiment + ADR-156/084 measured results

Adds the multi-bit half of the ADR-156 §8 "Multi-bit / Extended RaBitQ"
item as a MEASURED experiment (coverage::measure_multibit): rotate, then
b-bit uniform scalar-quantize each coord, and rank by L1 distance over the
codes (the natural multi-bit generalization of Hamming distance). This measures
the bit/coverage tradeoff the backlog item asked for.
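The Pass-3 quantizer can be sketched as follows, assuming a fixed clamp range [lo, hi] shared by all coordinates (the notes do not say how the range is chosen). `quantize_bbit`, `l1_codes`, and `bytes_per_vec` are hypothetical names. At b=1 with a symmetric range the code is just a sign bit and L1 over codes reduces to Hamming distance, which is why 1-bit Pass 3 should approximately match Pass 2.

```rust
/// b-bit uniform scalar quantization: clamp each coordinate to [lo, hi] and
/// map it to one of 2^b equal-width buckets (codes 0..=2^b - 1).
pub fn quantize_bbit(v: &[f32], bits: u32, lo: f32, hi: f32) -> Vec<u8> {
    assert!((1..=8).contains(&bits), "codes must fit in a u8");
    assert!(hi > lo, "empty quantization range");
    let levels = (1u32 << bits) as f32;
    let max_code = (1u32 << bits) - 1;
    v.iter()
        .map(|&x| {
            let t = ((x - lo) / (hi - lo)).clamp(0.0, 1.0);
            // t == 1.0 would land one past the top bucket, so cap it.
            ((t * levels).floor() as u32).min(max_code) as u8
        })
        .collect()
}

/// L1 distance between two code vectors; on 0/1 codes this is Hamming distance.
pub fn l1_codes(a: &[u8], b: &[u8]) -> u32 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| (x as i32 - y as i32).unsigned_abs())
        .sum()
}

/// Packed storage per vector: dim * bits, rounded up to whole bytes.
pub fn bytes_per_vec(dim: usize, bits: usize) -> usize {
    (dim * bits + 7) / 8
}
```

At dim=128, `bytes_per_vec` gives 16, 32, 48 and 64 bytes for b = 1..4, matching the memory column of the measured table.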

MEASURED at the strict bar (candidate_k=K=8, anisotropic planted-cluster
fixture, cosine ground truth):
Variant                  Coverage   Memory
Pass1 (1-bit, no rot)    36.13%     16 B/vec
Pass2 (1-bit, rot)       46.39%     16 B/vec
Pass3 (rot, 2-bit)       54.39%     32 B/vec
Pass3 (rot, 3-bit)       66.70%     48 B/vec
Pass3 (rot, 4-bit)       74.22%     64 B/vec
Honest: multi-bit helps monotonically, but even 4-bit (4x memory) reaches
only 74% at the strict bar. Neither rotation nor <=4-bit multi-bit clears
the strict-K 90% bar on this distribution; the bar is met via over-fetch
(Pass2 @ candidate_k=24). Tests: multibit_tradeoff_report,
multibit_1bit_matches_pass2_approx (+ sanity check that 1-bit ~= Pass-2).

Docs:

  • ADR-156 §8 item #2 marked RESOLVED-PARTIAL; §5 #2 grade CLAIMED ->
    MEASURED-on-our-hardware; new §10 with full measured tables, the topk
    bugfix disclosure, and graded deferred sub-items.
  • ADR-084: "Pass 2" section answering the rotation open-question with
    measured numbers + the topk bug note.
  • CHANGELOG [Unreleased]: Added (Pass-2 milestone) + Fixed (topk heap).

Co-Authored-By: claude-flow ruv@ruv.net

Docker Image:
ghcr.io/ruvnet/RuView:9b07dff29868336a1659c996e312fb9f5c9cc1d0