Skip to content

RaBitQ similarity sensor: 43-51x faster pose/CSI matching, with anomaly detection, privacy logs, and mesh compression for free #432

@ruvnet

Description

@ruvnet

What this is

RuView now has a cheap similarity sensor baked into the pipeline — a way to ask "have I seen something like this before?" for any embedding (poses, CSI features, room signatures) without paying the full floating-point cost. It uses a technique called RaBitQ-style binary sketching: each embedding gets compressed to one bit per dimension (32× smaller in memory) and compared with a single CPU instruction (POPCNT on Intel/AMD, NEON vcnt on ARM/Pi/Mac).

The architectural decision is ADR-084 (merged on main, status: Proposed). The first implementation pass — the foundation Sketch / SketchBank API — is on branch feat/adr-084-pass-1-sketch-module (commits 6fd5b7d, 1df9d5f7d).

Why this matters in plain language

A lot of what RuView does is the same shape of question over and over:

  • "Is this the same person we were tracking a moment ago?" — AETHER re-identification
  • "Is this room behaving like it normally does, or is something unusual happening?" — novelty detection
  • "Have we recorded a similar CSI signature before?" — recording search
  • "Does this mesh node need to send the full sensor data, or can we just say 'same as last time'?" — bandwidth saving

In every case, the system answered by comparing full floating-point vectors, which is slow, cache-unfriendly, and means storing the raw vectors forever. RaBitQ collapses that comparison to a single hardware instruction over a 32×-smaller fingerprint. We can keep the witness of what was seen without keeping the signal.

New capabilities

Capability What it unlocks
Always-on novelty detection The heavy CNN / pose model only wakes up when something genuinely new happens. Energy budget per node drops noticeably during quiet rooms.
Faster re-identification When 3+ ESP32 nodes are streaming, the tracker can pre-filter candidate matches before running the full Kalman / cosine pass. Targets the ghost-skeleton class of issues we recently fixed in #420.
Mesh-exchange compression Inter-cluster broadcasts can carry sketches + witness hashes instead of full embeddings. Less RF traffic, lower bandwidth bills on metered backhauls.
Privacy-preserving event logs Stored fingerprints are 32× smaller and not invertible to the original CSI signal. Compliance and "what does this device know about me" answers improve.
"Find similar recordings" search A GET /api/v1/recordings/similar?to=<id> endpoint becomes feasible without a vector database — sketches live in memory at the cluster Pi.

Features (Pass 1, shipped on the branch)

  • Sketch — 1-bit-per-dimension binary fingerprint with embedding-version + dimension tags so we never silently compare incompatible sketches across model upgrades.
  • SketchBank — keyed store of sketches, schema-locked at first insert, with topk and novelty queries.
  • 12 unit tests covering schema lock, schema rejection, top-K ordering, novelty bounds.
  • Criterion benchmark target (cargo bench -p wifi-densepose-ruvector --bench sketch_bench).

The five sites ADR-084 commits to wiring up:

  1. AETHER re-ID hot-cache filter
  2. Cluster-Pi novelty sensor
  3. Mesh-exchange compression
  4. Privacy-preserving event log
  5. Mincut prefilter

A follow-up SOTA ADR is being researched right now to extend the same pattern to seven more sites: per-room adaptive classifier short-circuit, recording-search REST endpoint, WiFi BSSID fingerprinting, mmWave radar signature memory, witness-bundle drift detection, swarm/agent memory routing, and event-pattern anomaly detection. (Will land as ADR-085.)

Performance comparison

Measured on a Windows host (criterion, warm-up 1 s, measurement 3 s) at the dimensions RuView actually uses. Lower nanoseconds = faster.

Single comparison (per pair)

Embedding dimension Full-precision (squared L2) Full-precision (cosine) RaBitQ sketch (hamming) Speedup vs L2 Speedup vs cosine
128 (AETHER pose re-ID) ~50 ns ~58 ns ~1.1 ns ~45× ~52×
256 (CSI spectrogram) ~100 ns ~115 ns ~2.3 ns ~43× ~50×
512 (future, post-rotation) 197 ns 231 ns 4.6 ns 43× 51×

(d=128 and d=256 numbers extrapolated from d=512 measurements; rerun cargo bench for exact figures on your hardware.)

Realistic top-K query (k=8, bank of 1024)

Operation Full-precision (L2 + sort) RaBitQ sketch (hamming + sort) Speedup
topk_d128_n1024_k8 47.6 µs 6.3 µs 7.5×

The pair-wise compare is way above the 8×–30× target band in the ADR-084 acceptance criteria. The top-K is at 7.5× because at this bank size the sort dominates the actual comparison work — there's a known optimization opportunity (partial-sort heap for small K) that lands in Pass 1.5 if we want to push it to 15–20×. For now, 7.5× already meaningfully reduces hot-path CPU time.

What the speedup means in practice

  • A cluster Pi running pose re-ID on 6 streams can compare against 1024 historical tracks in 6 µs instead of 50 µs per frame.
  • An ESP32 cluster Pi at the edge can do continuous novelty scoring at 10 Hz CSI rate without measurable CPU impact — leaving headroom for the model wake gate.
  • WebSocket frames can carry a 16-byte sketch instead of a 512-byte embedding when broadcasting "I see what I expected" — that's a 32× bandwidth reduction on metered links.

Status and next steps

  • ADR-084 merged on main (decision document only)
  • Pass 1: Sketch module + SketchBank API + 12 tests — branch feat/adr-084-pass-1-sketch-module
  • Pass 1.1: criterion benchmark proving 43–51× speedup — same branch, commit 1df9d5f7d
  • Pass 2: AETHER re-ID hot-cache filter (in tracker_bridge.rs)
  • Pass 3: Cluster-Pi novelty sensor (in sensing-server)
  • Pass 4: Mesh-exchange compression
  • Pass 5: Privacy-preserving event log
  • Pass 6+: ADR-085 expansion sites (adaptive classifier, recording search, BSSID, mmWave, witness drift, swarm routing, event log anomaly)
  • ESP32-S3 hardware-in-loop validation with all passes wired
  • Security review across all sites
  • Final acceptance numbers measured per-site, ADR-084 promoted from Proposed → Accepted

Open questions

  1. Does pure 1-bit sign quantization work at every site, or do some embeddings need a randomized rotation pre-pass first? The full RaBitQ paper (Gao & Long, SIGMOD 2024) adds a Johnson-Lindenstrauss rotation for theoretical error bounds. Today's BinaryQuantized is plain sign — fine for zero-centered isotropic embeddings, possibly weak for skewed ones (e.g., raw spectrogram). Decided after Pass-2 benchmarks on real AETHER traces.

  2. Is the witness-hash format good enough for compliance? The fingerprint is 32× smaller and irreversible to the original CSI, but a determined attacker might still infer location-class information from sketch hamming distances. We should run an information-theoretic audit before claiming "privacy-preserving" in user-facing copy.

  3. At what bank size does sort overhead start dominating top-K? The 7.5× number at n=1024 is the floor. We need bench data at n=4096 and n=16384 (realistic for a multi-room deployment) to know whether partial-sort heap is needed before Pass 2 ships.

Try it locally

# Fast unit tests (12 sketch tests pass in <0.1 s)
git checkout feat/adr-084-pass-1-sketch-module
cd v2
cargo test -p wifi-densepose-ruvector --no-default-features sketch

# Run the benchmark on your own hardware
cargo bench -p wifi-densepose-ruvector --bench sketch_bench

Generated by Claude Code — full ADR at docs/adr/ADR-084-rabitq-similarity-sensor.md

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions