feat: 20.4 cli & cluster inter-node migration to binary protocol#161
Merged
vieiralucas merged 13 commits intomainfrom Apr 3, 2026
Merged
feat: 20.4 cli & cluster inter-node migration to binary protocol#161vieiralucas merged 13 commits intomainfrom
vieiralucas merged 13 commits intomainfrom
Conversation
There was a problem hiding this comment.
2 issues found across 27 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="_bmad-output/implementation-artifacts/stories/20-4-cli-cluster-inter-node-migration.md">
<violation number="1" location="_bmad-output/implementation-artifacts/stories/20-4-cli-cluster-inter-node-migration.md:3">
P3: Story status should remain `review` while the PR is open; `ready-for-dev` doesn’t match the repo’s story workflow for open PRs.</violation>
</file>
<file name="crates/fila-cli/src/main.rs">
<violation number="1" location="crates/fila-cli/src/main.rs:375">
P2: The mTLS path rebuilds the root certificate store with bare `.unwrap()` calls, re-reading the CA cert file that was already parsed above. If any step fails (e.g., file permission change, invalid cert), the CLI panics instead of printing a clean error. Build the `ClientConfig` once by deferring the `.with_root_certificates(root_store)` call until you know whether `.with_client_auth_cert()` or `.with_no_client_auth()` is needed, avoiding both the duplicate I/O and the panics.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| @@ -0,0 +1,59 @@ | |||
| # Story 20.4: CLI & Cluster Inter-Node Migration | |||
|
|
|||
| Status: ready-for-dev | |||
There was a problem hiding this comment.
P3: Story status should remain review while the PR is open; ready-for-dev doesn’t match the repo’s story workflow for open PRs.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At _bmad-output/implementation-artifacts/stories/20-4-cli-cluster-inter-node-migration.md, line 3:
<comment>Story status should remain `review` while the PR is open; `ready-for-dev` doesn’t match the repo’s story workflow for open PRs.</comment>
<file context>
@@ -0,0 +1,59 @@
+# Story 20.4: CLI & Cluster Inter-Node Migration
+
+Status: ready-for-dev
+
+## Story
</file context>
ef72426 to
6d7d5fe
Compare
2dbd6fc to
b9c1f03
Compare
- add cluster write forwarding to enqueue/ack/nack handlers so writes go through Raft consensus (matches gRPC service behavior) - add cluster-aware create/delete queue through meta Raft - enrich get_stats and list_queues with Raft leader/replication info - fix opcode shift: ConsumeOk at 0x13, Delivery at 0x14, etc. - validate ACL permission kinds in CLI (produce/consume/admin only) - fix lua and visibility_timeout tests to use binary_addr - all 52 e2e tests pass including 7 cluster tests
There was a problem hiding this comment.
1 issue found across 7 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="crates/fila-server/src/binary_handlers.rs">
<violation number="1" location="crates/fila-server/src/binary_handlers.rs:146">
P2: Error description text is leaked into the `message_id` field. Clients parsing this response will receive an error string where they expect a UUID (or empty string). The single-node error paths and the wildcard arm in the same function all correctly use `String::new()`.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
vieiralucas
commented
Apr 3, 2026
- implement AsyncRead/AsyncWrite on CLI Stream enum (same pattern as server and SDK — third time this feedback was given) - remove bare .unwrap() in mTLS path, reuse root_store instead of re-reading CA cert file (cubic P2) - fix error description leaked into message_id field in cluster enqueue error path (cubic P2) - validate ACL permission kinds in CLI
- use try_send instead of send.await for delivery frames in the background reader to avoid blocking when the channel is full, which prevented processing of response frames (enqueue, ack) and caused deadlocks in benchmarks and drain-then-consume patterns - server: drain all existing consumers when a new Consume arrives on the same connection (one active consumer per connection) - consume(): cancel existing subscription inline before starting a new one (no sleep, proper state cleanup) - remove state cleanup from ConsumeStream::Drop (raced with next consume() call) - add double_consume_does_not_hang test - fix latency bench to reuse same stream for drain and measurement
replace try_send (drops messages) with overflow buffer architecture: - delivery frames: try_send to channel, on Full push to VecDeque overflow - overflow flushed at start of each reader loop iteration - response frames (enqueue/ack/nack results) always go to oneshots immediately — never blocked by delivery backpressure - TCP reads always continue so response frames are never starved - server-side delivery channel provides natural backpressure this fixes: - deadlock when delivery channel is full and client tries enqueue/ack - message loss from try_send dropping deliveries - benchmark hangs on latency drain and fairness accuracy
There was a problem hiding this comment.
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="crates/fila-sdk/src/client.rs">
<violation number="1" location="crates/fila-sdk/src/client.rs:876">
P1: `DELIVERY_OVERFLOW_HIGH_WATER` is declared but never checked in any code path. The doc comment promises TCP reads are paused when overflow exceeds this threshold, but the step-3 comment says "We always read, even if the overflow is large." The backpressure mechanism described in the doc is not implemented, so the overflow `VecDeque` can grow without bound under sustained consumer slowness.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Test plan
cargo clippy --workspace -- -D warningsclean🤖 Generated with Claude Code
Summary by cubic
Migrates
fila-cliand cluster inter-node traffic from gRPC tofila-fibpover TCP to unify transport and reduce overhead. Completes Linear 20.4; TLS/mTLS preserved; all writes flow through Raft;fila-sdkdelivery handling is backpressure-safe.New Features
fila-clitofila-fibpwith TLS/mTLS and existing flags; updatedfila-sdktests, e2e, benches, and profile-workload to usebinary_addr.Bug Fixes
fila-sdkdeadlocks and delivery loss under backpressure with an overflow buffer; response frames never starve; consume() cancels inline; server enforces one active subscription per connection.Written for commit 9854a89. Summary will update on new commits.
Benchmark Results (median of 3 runs, no baseline yet)
Commit:
1a770f5Time: 2026-04-03T18:54:57Z