hailo: NPU pipeline pool exploration + bridge cache/health parity (iter 234-249)#418
Merged
Queued post-iter-227 baseline. Single-pipeline HefEmbedder caps
cluster throughput at ~70 RPS because every gRPC request serializes
on a single Mutex<Inner>. Hailo-8 + PCIe DMA can overlap — ~14ms per
inference is mostly PCIe transfer (~12ms), only ~2ms NPU compute. A
multi-pipeline pool should unlock 2-4× throughput.
# Baseline (iter 227, single pipeline, cognitum-v0)
| concurrency | throughput | p50 | p99 |
|-------------|------------|--------|--------|
| 1 | 70.6 RPS | 14.1ms | 15.8ms |
| 4 | 70.7 RPS | 56.7ms | 74.7ms |
| 8 | 70.7 RPS | 112.7ms| 170.7ms|
Throughput plateaus regardless of concurrency while p50 scales
linearly with it, confirming the lock is the choke point.
# Skeleton (this commit)
- `HefEmbedderPool` mirroring CpuEmbedder's Vec<Mutex<Slot>> pattern.
- N independent HefPipeline instances on the shared vdevice;
HailoRT's network-group scheduler arbitrates NPU access.
- `embed()`: try_lock each slot in turn; first free wins; fall back
to blocking on slot 0 if all busy (matches cpu_embedder.rs).
- DEFAULT_POOL_SIZE = 4 (overlap PCIe write / NPU / PCIe read /
host pre-post-processing without scheduler exhaustion).
- Compile-only test asserts Send + Sync so worker can hand out
Arc<HefEmbedderPool> across tokio tasks.
# Iter 235 plan (next)
- Wire HefEmbedderPool into ruvector-hailo-worker as a feature-flag.
- Deploy to cognitum-v0; rerun cluster-bench at concurrency 1/4/8.
- Sweep pool_size ∈ {2,4,8} to find the throughput knee.
- Document delta vs iter-227 baseline.
# Why a separate type, not a HefEmbedder field
Single-pipeline path stays cheaper for low-load deploys (init time,
RAM, no scheduler overhead). Solo Pi running mmwave-bridge keeps
HefEmbedder; cluster workers handling many concurrent gRPC streams
switch to HefEmbedderPool.
Co-Authored-By: claude-flow <ruv@ruv.net>
… 235)
Builds on iter-234's pool skeleton. HailoEmbedder now picks between
single-pipeline and pool-of-pipelines NPU dispatch at open() time
via a new private `HefBackend` enum. Selector is the
`RUVECTOR_NPU_POOL_SIZE` env var:
  unset / = 1 → Single (preserves iter-162 default)
  >= 2        → Pool with N pipelines on the shared vdevice
  bad value   → falls back to Single (logging to be added later)
Default behavior unchanged — operators must opt into the pool. This
keeps the iter-227 baseline as the regression-floor: bench numbers
without RUVECTOR_NPU_POOL_SIZE set should match exactly.
# Baseline (re-stating from iter 234, single pipeline, cognitum-v0)
| concurrency | throughput | p50 | p99 |
|-------------|------------|--------|--------|
| 1 | 70.6 RPS | 14.1ms | 15.8ms |
| 4 | 70.7 RPS | 56.7ms | 74.7ms |
| 8 | 70.7 RPS | 112.7ms| 170.7ms|
# Next (iter 236)
- Cross-compile the worker for aarch64 with the hailo feature
- Deploy to cognitum-v0 with `RUVECTOR_NPU_POOL_SIZE=4`
- Re-run cluster-bench at concurrency 1/4/8
- Document the throughput delta in the iter-236 commit
- Sweep pool_size ∈ {2,4,8} to find the knee
Co-Authored-By: claude-flow <ruv@ruv.net>
…iter 236)
Deployed iter-235's HefEmbedderPool to cognitum-v0 with
RUVECTOR_NPU_POOL_SIZE=4. Re-ran cluster-bench at concurrency 1/4/8
plus pool-size sweep at {2,4,8}. Throughput ceiling holds at 70.7 RPS
across every configuration — identical to iter-227 baseline.
# Before (iter 227, single pipeline)
| concurrency | throughput | p50 | p99 |
|-------------|------------|--------|--------|
| 1 | 70.6 RPS | 14.1ms | 15.8ms |
| 4 | 70.7 RPS | 56.7ms | 74.7ms |
| 8 | 70.7 RPS | 112.7ms| 170.7ms|
# After (iter 235 deployed, RUVECTOR_NPU_POOL_SIZE=4)
| concurrency | throughput | p50 | p99 |
|-------------|------------|--------|--------|
| 1 | 70.6 RPS | 14.1ms | 16.7ms |
| 4 | 70.7 RPS | 43.5ms | 84.9ms |
| 8 | 70.7 RPS | 112.9ms| 211.7ms|
# Pool-size sweep at fixed concurrency
| pool | concurrency | throughput | p50 |
|------|-------------|------------|--------|
| 2 | 4 | 70.7 RPS | 43.3ms |
| 4 | 4 | 70.7 RPS | 43.5ms |
| 8 | 8 | 70.7 RPS | 112.9ms|
Delta: 0% throughput. p50 at c=4 dropped from 56.7ms → 43.5ms (a 23%
median-latency improvement) because each request gets its own
host-side queue slot — but the NPU itself remains the choke point.
# Why the pool doesn't help
HailoRT's network-group scheduler serializes inferences at the vdevice
level. The Hailo-8 has one inference engine per chip and HailoRT does
NOT pipeline DMA-write / NPU-compute / DMA-read across configured
network groups. The 70 RPS = 1000ms / 14ms-per-inference ceiling is
a hard NPU+PCIe limit per single-batch HEF.
# What stays
- HefEmbedderPool kept in tree (no regression at pool=1 default;
marginal p50 win at concurrency > 1).
- RUVECTOR_NPU_POOL_SIZE env knob remains operator-controlled.
- Pi systemd env reverted to RUVECTOR_NPU_POOL_SIZE=1 (matches the
iter-227 acceptance baseline).
- Module docstring updated to record the negative result so the next
optimizer doesn't waste another iteration on the same hypothesis.
# Iter 237 candidates (real throughput unlock)
- Async vstreams via hailo_vstream_recv_async — should overlap DMA
with NPU compute *within* one network group.
- Batch-compiled HEF (--batch-size 4 via DFC) — needs Hailo SDK on
a host machine; multi-day fork.
Co-Authored-By: claude-flow <ruv@ruv.net>
…237)
iter-236 confirmed pool size doesn't affect throughput (NPU-bound at
70 RPS regardless), but pool=2 at concurrency=4 cuts p50 latency 23%
vs single-pipeline (43.5ms vs 56.7ms baseline). The win is real for
multi-bridge deploys: cognitum-v0 runs ruvector-mmwave-bridge,
ruview-csi-bridge, and ruvllm-bridge all hitting the same worker, so
in-flight concurrency > 1 is the steady state, not the exception.
# After (iter 237 deployed default)
| concurrency | throughput | p50    | p99    | vs baseline |
|-------------|------------|--------|--------|-------------|
| 1           | 70.6 RPS   | 14.1ms | 16.7ms | -           |
| 4           | 70.7 RPS   | 43.3ms | 84.7ms | -23% p50    |
Pool=2 chosen over pool=4: the latency win saturates at 2 (pool=4
gives the same p50). Each extra slot costs ~20 MB host-side
(tokenizer + embedding table copy); 2 slots is the floor that
captures the win without paying for unused capacity.
Cognitum-v0 systemd env updated to pool=2. Default in
ruvector-hailo.env.example bumped from "no entry" to
RUVECTOR_NPU_POOL_SIZE=2 so future deploys get the latency win out
of the box. Operators who want the iter-227 baseline (single
pipeline) can set =1.
Co-Authored-By: claude-flow <ruv@ruv.net>
The bridge previously constructed `HailoClusterEmbedder::new(...)`
without the existing coordinator-side LRU cache. RAG workloads
through ruvllm repeat the same context strings constantly (system
prompt, tool descriptions, frequently-cited docs) so the cache hit
rate is naturally high — but operators couldn't opt in without
re-coding the bridge.
# Cache-hit speedup measured iter-237 prep on cognitum-v0
| configuration                       | throughput  | p50    | hit_rate |
|-------------------------------------|-------------|--------|----------|
| no cache (NPU bound, iter-227 base) | 70.7 RPS    | 43.5ms | n/a      |
| --cache 4096 --cache-keyspace 64    | 2305282 RPS | 0us    | 1.000    |
Delta: 32500x throughput, ~all latency removed at 100% hit rate.
The cache lives in-process so the bridge resolves a hit before the
gRPC call to the worker, which is why the speedup is so dramatic —
it doesn't touch the NPU at all.
# What ships
- New `--cache <N>` flag (default 0 = disabled, backward compat).
- ADR-172 section 2a guard: refuses cache > 0 with empty fingerprint
  unless --allow-empty-fingerprint is set (mirrors embed.rs +
  bench.rs gates — without a fingerprint binding, a stale cache
  could leak vectors across worker fleets that don't share the same
  model).
- --help updated with the iter-238 measurement.
- Operator-controlled, opt-in. No deploy default change.
Same cache implementation already exposed via embed.rs's --cache and
HailoClusterEmbedder::with_cache. The mmwave-bridge and
ruview-csi-bridge consume mostly-unique sensor data so they don't
benefit; deferring those bridges to a separate iter if measured hit
rates ever justify it.
Co-Authored-By: claude-flow <ruv@ruv.net>
iter-237's commit message claimed pool=2 cost "~20 MB per extra
slot". Direct ps measurement on cognitum-v0 showed the real cost is
much higher — ~55 MB per slot, dominated by HailoRT's
per-network-group DMA and ring buffers, not the host-side state I'd
assumed:
  pool=1 →  87 MB RSS (baseline)
  pool=2 → 142 MB RSS (+55 MB / +64%)
  pool=4 → 251 MB RSS (+164 MB / nearly 3x baseline)
The shared safetensors mmap (~90 MB) and HEF (~4 MB) ARE
deduplicated by the kernel page cache, but each HailoRT-configured
network group allocates its own DMA + ring-buffer set on top of the
shared mmaps.
# What changes
- env example explains the actual measured cost so operators can
  budget RAM correctly. Pi 5 8 GB → pool=2 fits comfortably; 4 GB
  Pi 5 should run pool=1 to leave room for bridges + system.
- DEFAULT_POOL_SIZE constant in hef_embedder_pool.rs corrected from
  4 to 2, matching the iter-237 deploy default and the iter-236
  measurement that proved pool=4 buys nothing extra.
The iter-237 deployed default (pool=2) was already right
empirically — this iter just makes the docs match reality so the
next reader doesn't get the wrong picture.
Co-Authored-By: claude-flow <ruv@ruv.net>
Symmetric to iter-238 (ruvllm-bridge --cache). The CSI summary text
is a fixed-template NL string interpolating seven small-cardinality
fields (node_id, channel, rssi, noise, antennas, subcarriers,
magic-kind). In steady-state radar deploys these fields have low
entropy — channel and antenna counts are board constants, rssi/noise
float in narrow ranges, n_subcarriers is fixed by the WiFi standard.
Many frames produce identical NL strings, which is exactly the
workload where iter-238's cluster-bench measurement showed 32500x
speedup at full hit rate.
# What ships
- New `--cache <N>` flag (default 0 = disabled, backward compat).
- Same ADR-172 section 2a guard as ruvllm-bridge / embed.rs /
  bench.rs: refuses cache > 0 with empty fingerprint unless
  explicit opt-out.
- Startup banner reports cache size when enabled.
- --help updated with the iter-240 rationale.
Cache hit rate in real radar deploys is workload-specific and needs
operator measurement; a small `--cache 1024` is enough to cover the
discrete (channel, antenna, rssi-bucket) cross product for a typical
mmwave-paired CSI setup.
mmwave-bridge stays cache-less — radar packets carry continuous
timestamps + range/doppler bins so the per-packet text is unique per
frame; cache hit rate there would be near zero, paying memory for
nothing. Defer to a separate iter if measured radar traffic ever
shows duplicate strings.
Co-Authored-By: claude-flow <ruv@ruv.net>
Four cross-crate doc strings still pointed at "once iteration X
lands" milestones that have already shipped:
  ruvector-hailo/src/lib.rs:5    "once iter 3 lands the path dep"
  ruvector-hailo/src/lib.rs:424  "once iter 4 brings Mutex<Device>"
  ruvector-hailo-cluster/src/lib.rs:141  "once iter 14 brings ruvector-core"
  ruvector-hailo-cluster/src/bin/worker.rs:380  "later iters pipeline NPU"
The first three were closed by iter-218 (ADR-178 Gap B path-dep +
EmbeddingProvider impl). The fourth was partially addressed by the
iter-234..236 pool work — confirmed empirically that NPU dispatch
serializes at the vdevice level so concurrent embed_stream fan-out
can't help today.
Each docstring now records the iter that resolved the milestone (so
a future reader knows whether to trust the comment or chase the
wrong rabbit). Same anti-staleness pattern as iter-217's ADR-167
status-block collapse — the stratigraphy of in-flight comments rots
faster than the code, and a fresh reader doesn't know which TODOs
are real until they've audited the git history.
No behavioral change.
Co-Authored-By: claude-flow <ruv@ruv.net>
Corrects iter-240's incorrect claim that mmwave radar packets
produce unique strings per frame. The radar payload carries
timestamps but the NL summary template *discards* them — only
four templates exist:
"breathing rate {N} bpm at radar sensor"
"heart rate {N} bpm at radar sensor"
"nearest target distance {N} cm at radar sensor"
"(no )?person detected at radar sensor"
The {N} integers live in narrow physiological ranges (breathing
10-30, heart rate 60-100, distance 0-500 cm), giving roughly 200
unique strings total across the entire mmwave domain. After the
warmup window every packet is a cache hit — exactly the workload
where iter-238's cluster-bench measured 32500x speedup.
# What ships
- New `--cache <N>` flag (default 0 = disabled, backward compat).
- Same ADR-172 section 2a guard as ruvllm-bridge / ruview-csi-bridge /
embed.rs / bench.rs.
- Startup banner reports cache size when enabled.
- --help updated with the iter-242 rationale.
All three sensor bridges now expose --cache symmetrically:
ruvllm-bridge iter 238 (RAG context repeats)
ruview-csi-bridge iter 240 (CSI summary low-cardinality)
mmwave-bridge iter 242 (radar templates low-cardinality)
Co-Authored-By: claude-flow <ruv@ruv.net>
embed.rs and bench.rs already supported `--cache-ttl <secs>` for ops
who want a max-staleness bound on cached vectors; the bridges exposed
only `--cache` (TTL=0, LRU eviction only). Closes the parity gap.
# Why TTL matters operationally
With LRU only, an entry that keeps getting hit lives forever in the
cache — even if the worker fleet has silently drifted (config change
that doesn't bump the HEF hash, NPU recalibration, etc.). The
fingerprint gate prevents *new* entries from being inserted across a
fleet split, but pre-existing entries persist. A finite TTL bounds
that worst-case staleness: every entry is re-fetched at least once
per TTL window, so a silent worker drift self-heals after one TTL
cycle of latency cost.
Recommended deploy default for long-running bridges:
--cache-ttl 300 (5 min) — short enough to bound drift, long enough
to amortise the cache hit across the steady-state workload.
# What ships
- All three bridges: ruvllm-bridge, ruview-csi-bridge, mmwave-bridge.
- New `--cache-ttl <secs>` flag (default 0 = no TTL, LRU only).
- Wired through the same `with_cache_ttl(cap, Duration)` API embed.rs
  uses, so the flag's semantics are bit-identical across all four
  cluster CLIs.
- Backward compatible: omitting --cache-ttl behaves exactly as
  iter-238/240/242 (LRU-only cache).
Co-Authored-By: claude-flow <ruv@ruv.net>
The cluster crate has had a Criterion microbench at
`benches/dispatch.rs` since iter-80 (P2cPool RNG path,
HashShardRouter content hashing, full embed_one_blocking against
in-memory transport) but it never ran in CI — it's only triggered
when an operator types `cargo bench --bench dispatch` locally.
Adding `cargo bench --bench dispatch -- --test` to the audit
workflow's test job. The `--test` flag runs each bench function
exactly once instead of criterion's default (~100 iterations +
warmup), so the cost is ~30 seconds in CI but the smoke catches:
* bench harness panic from a removed dep or API change
* imports broken by a refactor of the cluster surface
* a hot-path function renamed without updating the bench
This is the fast variant of regression-gating — it doesn't detect
*numerical* regressions (a 2x slowdown that still completes
successfully). True regression detection needs baseline-file
comparison (criterion-perf-events / cargo-codspeed / similar) and
is parked as a separate iter when the hailo branch produces enough
historical data points to define meaningful thresholds.
Local verification (cognitum-v0 wasn't needed):
cargo bench --bench dispatch -- --test
→ "Testing ..." for each bench function, all "Success"
Co-Authored-By: claude-flow <ruv@ruv.net>
embed.rs and bench.rs already supported background health checking
via spawn_health_checker since iter-99 — periodic fingerprint probes
with automatic ejection of mismatched workers and cache
clear-on-event. The bridges (mmwave, ruview-csi, ruvllm) didn't,
which is exactly the wrong place to skip it: bridges are the
*long-running* CLIs (mmwave deploys run for days), so silent worker
drift goes uncaught the longest there.
# Threat closed
Worker A is deployed with HEF X and fingerprint x-hash. Bridge
starts, validates fp at startup, hands out vectors. Operator
re-deploys worker A with HEF Y (new model) and fingerprint y-hash.
Bridge keeps dispatching, gets vectors back from a worker that no
longer matches its expected fp — silently producing wrong embeddings
until the bridge restarts.
With --health-check 30, the bridge probes every 30s, ejects the
drifted worker from the dispatch pool, clears any cached entries
keyed on the old fp, and stops poisoning downstream consumers within
~one probe interval.
# What ships
- All three bridges: ruvllm-bridge, ruview-csi-bridge, mmwave-bridge.
- New `--health-check <secs>` flag (default 0 = disabled, backward
  compat with iter-238/240/242 behavior).
- When set, spawns a single-thread tokio runtime named
  "health-check" for the lifetime of main, hands its handle to
  spawn_health_checker, and retains both via a let-bound _keepalive
  so dropping the runtime aborts the checker cleanly on Ctrl-C.
- Same HealthCheckerConfig as embed.rs (interval override, all other
  defaults from health_checker_config()).
- --help text updated with the iter-245 rationale.
Recommended deploy interval for long-running bridges: 30-60 seconds.
Stricter (every 5s) is fine if the bridge is the only load on the
worker; looser (every 5 min) is the floor — anything beyond that,
the threat window dominates over the CPU savings.
Co-Authored-By: claude-flow <ruv@ruv.net>
…ter 246)
iter-238 (ruvllm-bridge --cache), iter-240/242 (other bridges
--cache), iter-243 (--cache-ttl), iter-245 (--health-check) all
shipped CLI flags but didn't update the deploy env templates.
Operators following the install scripts get a fresh
/etc/ruvector-mmwave-bridge.env that has no hint these knobs even
exist.
Closing the doc gap by adding annotated suggestions to all three
RUVECTOR_*_EXTRA_ARGS sections:
  ruvector-mmwave-bridge.env.example → --cache + --cache-ttl + --health-check
  ruview-csi-bridge.env.example      → --cache + --cache-ttl + --health-check
  ruvllm-bridge.env.example          → --cache + --cache-ttl
Each example shows the recommended hardened deploy line so operators
can copy-paste:
  RUVECTOR_*_EXTRA_ARGS=--cache 4096 --cache-ttl 300 --health-check 30
(ruvllm-bridge omits --health-check from the typical deploy because
ruvllm typically forks the bridge per-session — health checking a
sub-second-lifetime process is a no-op.)
No code change. No behavioral change. Deploy parity /
discoverability fix only.
Co-Authored-By: claude-flow <ruv@ruv.net>
The audit-log Full mode rendered text verbatim — for an embed
request the iter-180 byte cap allows up to 64 KB. An operator
who flips RUVECTOR_LOG_TEXT_CONTENT=full to debug in prod could
push 64 KB × 70 RPS = 4.5 MB/s of journald traffic, which:
* burns journal disk fast (10s of GB/hour)
* produces single-line entries that break most ops tooling
(long-line scanners, journalctl --grep regex backtracking)
* makes individual entries unscannable by humans anyway
Capping at 200 chars per text preserves the debug utility — you
can still grep for content correlations against request_id — at
1/300th the worst-case journald volume. The cut is char-boundary-
safe (counted via str::chars()) so multi-byte UTF-8 doesn't panic
the rendering path.
# Worst case before vs after
Request: 64 KB UTF-8 text @ 70 RPS, RUVECTOR_LOG_TEXT_CONTENT=full
Before: 64 KB × 70 = 4.5 MB/s journal volume per worker
After: 600 B × 70 = 42 KB/s (200 chars + UTF-8 + framing)
Three tests added: short (≤cap, unchanged), long (truncated +
ellipsis marker), multi-byte (300×U+1F980 emoji = 1.2 KB,
truncates on a char boundary not byte boundary).
iter-180 capped REQUEST size; iter-190 capped RESPONSE size;
iter-247 caps the LOG-LINE size for the same defense-in-depth
reason. Full-mode logging stays the operator's footgun (per the
existing docstring) — but it's now a footgun that doesn't
exhaust the disk in 10 minutes.
Co-Authored-By: claude-flow <ruv@ruv.net>
iter-235 added the env-var knob for the HefEmbedderPool selector,
but the worker never logged the resolved value at startup. An
operator who flipped pool=2→4 (or back to 1 on a memory-constrained
4 GB Pi) had no confirmation the change actually took effect short
of inspecting RSS via `ps`.
Now the worker emits an info-level log line alongside the existing
iter-180/181/182/183/184 DoS-gate startup banner:
  NPU pipeline pool size pool_size=2 (iter 235; >=2 enables ...)
Same disclosure pattern as RUVECTOR_LOG_TEXT_CONTENT,
RUVECTOR_RATE_LIMIT_RPS, RUVECTOR_MAX_BATCH_SIZE, etc. — every
operator-tunable env knob ends up in the journal at startup so
post-incident review can reconstruct the running config without
needing /etc/ruvector-hailo.env as it was at the time of the
incident.
No behavior change. Pure observability.
Co-Authored-By: claude-flow <ruv@ruv.net>
`Event::Unknown { frame_type, payload_len }` carried a u8 payload_len
even though the MR60BHA2 protocol uses a 2-byte length field. The
current parser caps payloads at MAX_PAYLOAD=64 (well within u8) so
this was never a runtime truncation, but:
- Type didn't match the protocol's intent — operators reading the
emitted JSONL had to remember the implicit cap.
- `clippy::cast_possible_truncation` fired at the construction
site (`payload.len() as u8`) and the bridge's emission site.
Pedantic, but the alternative — silencing with `#[allow]` — is
worse than just using the right type.
Now the construction site uses `u16::try_from(...).unwrap_or(u16::MAX)`,
which honestly handles any future MAX_PAYLOAD bump up to 65535
bytes. The mmwave-bridge JSONL formatter already prints the value
via `{}` so emission stays unchanged.
Test added that locks the field width: an unknown frame with a
60-byte payload must report payload_len=60. (300 bytes would
exercise the formerly-truncating path but the parser rejects
anything > MAX_PAYLOAD before the Event is constructed, so the
test stays inside the parser's contract.)
Surfaced by an iter-249 cargo clippy --pedantic sweep; same
audit pass also flagged stylistic warnings (missing backticks,
implicit format args) which are out of scope.
Co-Authored-By: claude-flow <ruv@ruv.net>
… 250)
Closes the doc gap surfaced by the iter-234..249 PR review:
ruvector-hailo-cluster had a 424-line operator README, but the 3
sibling crates (ruvector-hailo, ruvector-mmwave, hailort-sys)
shipped without one — `cargo doc --open` was the only on-ramp.
# What ships
- crates/ruvector-hailo/README.md — embedding backend, 3
  feature-gated build paths, architecture diagram, iter-235+ pool
  benchmark table, security posture summary, env vars
- crates/ruvector-mmwave/README.md — MR60BHA2 wire format, parser
  API, criterion benchmark numbers, proptest fuzz suite
- crates/hailort-sys/README.md — FFI binding scope, build
  requirements, why no safe wrapper at this layer
- crates/ruvector-hailo-cluster/README.md — added the iter-238
  cache-hit measurement table + the iter-234..237 pool benchmark
  table; refreshed the CLI section to enumerate all four cluster
  CLIs + the three bridges with their iter-243/245 flags
All builds verified clean:
  cargo build -p ruvector-hailo --no-default-features
  cargo build -p ruvector-hailo --features cpu-fallback
  cargo build -p ruvector-mmwave
  cargo build -p hailort-sys
  cargo build -p ruvector-hailo-cluster --bins
No code change. Documentation parity only.
Co-Authored-By: claude-flow <ruv@ruv.net>
Summary
Sixteen iterations on the hailo-backend follow-up branch, covering
NPU pool exploration, bridge feature parity with embed.rs/bench.rs,
observability gaps, and a mmwave parser type-safety fix.
What ships
NPU pipeline pool (iter 234-237, 239)
- HefEmbedderPool: N independent network-group + vstream pairs on the shared vdevice
- RUVECTOR_NPU_POOL_SIZE env knob (default 1; deploys default 2)
- Negative result documented in hef_embedder_pool.rs so the next optimizer doesn't re-run the experiment
Bridge feature parity (iter 238, 240, 242-245)
All three bridges (ruvllm, ruview-csi, mmwave) now expose:
- --cache <N> (32500× speedup at full hit rate; iter-238 measurement)
- --cache-ttl <secs> (max-staleness bound; defense-in-depth against silent worker drift)
- --health-check <secs> (background fingerprint probe; closes the silent-drift threat for long-running bridges)
ADR-172 §2a fingerprint+cache gate enforced uniformly across all five cluster CLIs (embed.rs, bench.rs, all three bridges).
Observability + log hygiene (iter 247, 248)
- RUVECTOR_LOG_TEXT_CONTENT=full capped at 200 chars + ellipsis marker. Worst-case journal volume drops 100× (4.5 MB/s → 42 KB/s @ 70 RPS, 64 KB requests). Char-boundary-safe with multi-byte UTF-8.
- Worker logs the resolved RUVECTOR_NPU_POOL_SIZE at startup alongside the iter-180+ DoS-gate banner
CI + docs (iter 241, 244, 246)
- Criterion dispatch bench smoke-run added to hailo-backend-audit.yml CI (~30s, catches harness/API regressions)
Type safety (iter 249)
- Event::Unknown.payload_len widened u8 → u16 to match the protocol's 2-byte length field
Test plan
🤖 Generated with claude-flow