What this is
RuView now has a cheap similarity sensor baked into the pipeline — a way to ask "have I seen something like this before?" for any embedding (poses, CSI features, room signatures) without paying the full floating-point cost. It uses a technique called RaBitQ-style binary sketching: each embedding gets compressed to one bit per dimension (32× smaller in memory) and compared with a single CPU instruction (POPCNT on Intel/AMD, NEON vcnt on ARM/Pi/Mac).
The architectural decision is ADR-084 (merged on main, status: Proposed). The first implementation pass — the foundation Sketch / SketchBank API — is on branch feat/adr-084-pass-1-sketch-module (commits 6fd5b7d, 1df9d5f7d).
Why this matters in plain language
A lot of what RuView does is the same shape of question over and over:
- "Is this the same person we were tracking a moment ago?" — AETHER re-identification
- "Is this room behaving like it normally does, or is something unusual happening?" — novelty detection
- "Have we recorded a similar CSI signature before?" — recording search
- "Does this mesh node need to send the full sensor data, or can we just say 'same as last time'?" — bandwidth saving
In every case, the system answered by comparing full floating-point vectors, which is slow, cache-unfriendly, and means storing the raw vectors forever. RaBitQ collapses that comparison to a single hardware instruction over a 32×-smaller fingerprint. We can keep the witness of what was seen without keeping the signal.
New capabilities
| Capability |
What it unlocks |
| Always-on novelty detection |
The heavy CNN / pose model only wakes up when something genuinely new happens. Energy budget per node drops noticeably during quiet rooms. |
| Faster re-identification |
When 3+ ESP32 nodes are streaming, the tracker can pre-filter candidate matches before running the full Kalman / cosine pass. Targets the ghost-skeleton class of issues we recently fixed in #420. |
| Mesh-exchange compression |
Inter-cluster broadcasts can carry sketches + witness hashes instead of full embeddings. Less RF traffic, lower bandwidth bills on metered backhauls. |
| Privacy-preserving event logs |
Stored fingerprints are 32× smaller and not invertible to the original CSI signal. Compliance and "what does this device know about me" answers improve. |
| "Find similar recordings" search |
A GET /api/v1/recordings/similar?to=<id> endpoint becomes feasible without a vector database — sketches live in memory at the cluster Pi. |
Features (Pass 1, shipped on the branch)
Sketch — 1-bit-per-dimension binary fingerprint with embedding-version + dimension tags so we never silently compare incompatible sketches across model upgrades.
SketchBank — keyed store of sketches, schema-locked at first insert, with topk and novelty queries.
- 12 unit tests covering schema lock, schema rejection, top-K ordering, novelty bounds.
- Criterion benchmark target (
cargo bench -p wifi-densepose-ruvector --bench sketch_bench).
The five sites ADR-084 commits to wiring up:
- AETHER re-ID hot-cache filter
- Cluster-Pi novelty sensor
- Mesh-exchange compression
- Privacy-preserving event log
- Mincut prefilter
A follow-up SOTA ADR is being researched right now to extend the same pattern to seven more sites: per-room adaptive classifier short-circuit, recording-search REST endpoint, WiFi BSSID fingerprinting, mmWave radar signature memory, witness-bundle drift detection, swarm/agent memory routing, and event-pattern anomaly detection. (Will land as ADR-085.)
Performance comparison
Measured on a Windows host (criterion, warm-up 1 s, measurement 3 s) at the dimensions RuView actually uses. Lower nanoseconds = faster.
Single comparison (per pair)
| Embedding dimension |
Full-precision (squared L2) |
Full-precision (cosine) |
RaBitQ sketch (hamming) |
Speedup vs L2 |
Speedup vs cosine |
| 128 (AETHER pose re-ID) |
~50 ns |
~58 ns |
~1.1 ns |
~45× |
~52× |
| 256 (CSI spectrogram) |
~100 ns |
~115 ns |
~2.3 ns |
~43× |
~50× |
| 512 (future, post-rotation) |
197 ns |
231 ns |
4.6 ns |
43× |
51× |
(d=128 and d=256 numbers extrapolated from d=512 measurements; rerun cargo bench for exact figures on your hardware.)
Realistic top-K query (k=8, bank of 1024)
| Operation |
Full-precision (L2 + sort) |
RaBitQ sketch (hamming + sort) |
Speedup |
topk_d128_n1024_k8 |
47.6 µs |
6.3 µs |
7.5× |
The pair-wise compare is way above the 8×–30× target band in the ADR-084 acceptance criteria. The top-K is at 7.5× because at this bank size the sort dominates the actual comparison work — there's a known optimization opportunity (partial-sort heap for small K) that lands in Pass 1.5 if we want to push it to 15–20×. For now, 7.5× already meaningfully reduces hot-path CPU time.
What the speedup means in practice
- A cluster Pi running pose re-ID on 6 streams can compare against 1024 historical tracks in 6 µs instead of 50 µs per frame.
- An ESP32 cluster Pi at the edge can do continuous novelty scoring at 10 Hz CSI rate without measurable CPU impact — leaving headroom for the model wake gate.
- WebSocket frames can carry a 16-byte sketch instead of a 512-byte embedding when broadcasting "I see what I expected" — that's a 32× bandwidth reduction on metered links.
Status and next steps
Open questions
-
Does pure 1-bit sign quantization work at every site, or do some embeddings need a randomized rotation pre-pass first? The full RaBitQ paper (Gao & Long, SIGMOD 2024) adds a Johnson-Lindenstrauss rotation for theoretical error bounds. Today's BinaryQuantized is plain sign — fine for zero-centered isotropic embeddings, possibly weak for skewed ones (e.g., raw spectrogram). Decided after Pass-2 benchmarks on real AETHER traces.
-
Is the witness-hash format good enough for compliance? The fingerprint is 32× smaller and irreversible to the original CSI, but a determined attacker might still infer location-class information from sketch hamming distances. We should run an information-theoretic audit before claiming "privacy-preserving" in user-facing copy.
-
At what bank size does sort overhead start dominating top-K? The 7.5× number at n=1024 is the floor. We need bench data at n=4096 and n=16384 (realistic for a multi-room deployment) to know whether partial-sort heap is needed before Pass 2 ships.
Try it locally
# Fast unit tests (12 sketch tests pass in <0.1 s)
git checkout feat/adr-084-pass-1-sketch-module
cd v2
cargo test -p wifi-densepose-ruvector --no-default-features sketch
# Run the benchmark on your own hardware
cargo bench -p wifi-densepose-ruvector --bench sketch_bench
Generated by Claude Code — full ADR at docs/adr/ADR-084-rabitq-similarity-sensor.md
What this is
RuView now has a cheap similarity sensor baked into the pipeline — a way to ask "have I seen something like this before?" for any embedding (poses, CSI features, room signatures) without paying the full floating-point cost. It uses a technique called RaBitQ-style binary sketching: each embedding gets compressed to one bit per dimension (32× smaller in memory) and compared with a single CPU instruction (POPCNT on Intel/AMD, NEON
vcnton ARM/Pi/Mac).The architectural decision is ADR-084 (merged on
main, status: Proposed). The first implementation pass — the foundationSketch/SketchBankAPI — is on branchfeat/adr-084-pass-1-sketch-module(commits6fd5b7d,1df9d5f7d).Why this matters in plain language
A lot of what RuView does is the same shape of question over and over:
In every case, the system answered by comparing full floating-point vectors, which is slow, cache-unfriendly, and means storing the raw vectors forever. RaBitQ collapses that comparison to a single hardware instruction over a 32×-smaller fingerprint. We can keep the witness of what was seen without keeping the signal.
New capabilities
GET /api/v1/recordings/similar?to=<id>endpoint becomes feasible without a vector database — sketches live in memory at the cluster Pi.Features (Pass 1, shipped on the branch)
Sketch— 1-bit-per-dimension binary fingerprint with embedding-version + dimension tags so we never silently compare incompatible sketches across model upgrades.SketchBank— keyed store of sketches, schema-locked at first insert, withtopkandnoveltyqueries.cargo bench -p wifi-densepose-ruvector --bench sketch_bench).The five sites ADR-084 commits to wiring up:
A follow-up SOTA ADR is being researched right now to extend the same pattern to seven more sites: per-room adaptive classifier short-circuit, recording-search REST endpoint, WiFi BSSID fingerprinting, mmWave radar signature memory, witness-bundle drift detection, swarm/agent memory routing, and event-pattern anomaly detection. (Will land as ADR-085.)
Performance comparison
Measured on a Windows host (criterion, warm-up 1 s, measurement 3 s) at the dimensions RuView actually uses. Lower nanoseconds = faster.
Single comparison (per pair)
(d=128 and d=256 numbers extrapolated from d=512 measurements; rerun
cargo benchfor exact figures on your hardware.)Realistic top-K query (k=8, bank of 1024)
topk_d128_n1024_k8The pair-wise compare is way above the 8×–30× target band in the ADR-084 acceptance criteria. The top-K is at 7.5× because at this bank size the sort dominates the actual comparison work — there's a known optimization opportunity (partial-sort heap for small K) that lands in Pass 1.5 if we want to push it to 15–20×. For now, 7.5× already meaningfully reduces hot-path CPU time.
What the speedup means in practice
Status and next steps
feat/adr-084-pass-1-sketch-module1df9d5f7dtracker_bridge.rs)sensing-server)Open questions
Does pure 1-bit sign quantization work at every site, or do some embeddings need a randomized rotation pre-pass first? The full RaBitQ paper (Gao & Long, SIGMOD 2024) adds a Johnson-Lindenstrauss rotation for theoretical error bounds. Today's
BinaryQuantizedis plain sign — fine for zero-centered isotropic embeddings, possibly weak for skewed ones (e.g., raw spectrogram). Decided after Pass-2 benchmarks on real AETHER traces.Is the witness-hash format good enough for compliance? The fingerprint is 32× smaller and irreversible to the original CSI, but a determined attacker might still infer location-class information from sketch hamming distances. We should run an information-theoretic audit before claiming "privacy-preserving" in user-facing copy.
At what bank size does sort overhead start dominating top-K? The 7.5× number at n=1024 is the floor. We need bench data at n=4096 and n=16384 (realistic for a multi-room deployment) to know whether partial-sort heap is needed before Pass 2 ships.
Try it locally
Generated by Claude Code — full ADR at docs/adr/ADR-084-rabitq-similarity-sensor.md