Skip to content

Conversation

@thlorenz
Copy link
Collaborator

@thlorenz thlorenz commented Nov 25, 2025

Summary

Improve subscription reliability by requiring consensus from multiple clients. Instead of
accepting a subscription as successful after the first client confirms, now require 2/3 of
available clients to confirm before considering the subscription active. This makes the system
more robust to individual client failures.

Details

The subscription multiplexer now requires a configurable number of clients to confirm
subscriptions before treating them as successful. The required confirmation count is
automatically calculated as 2/3 of the total number of clients (with a minimum of 1).

Testing

Added comprehensive test coverage for the new behavior, including:

  • Single and multiple confirmation scenarios
  • Partial client failures that still meet the confirmation requirement
  • Total failure when not enough clients succeed
  • Edge cases like zero clients and zero confirmations
  • Verification that unsubscribe and shutdown operations ignore confirmation counts

Summary by CodeRabbit

  • New Features

    • Subscriptions now support configurable confirmation requirements with improved error handling for partial client failures and enhanced validation.
  • Tests

    • Expanded test coverage for multi-confirmation scenarios, partial failures, and edge cases.

✏️ Tip: You can customize this high-level summary in your review settings.

@github-actions
Copy link

Manual Deploy Available

You can trigger a manual deploy of this PR branch to testnet:

Deploy to Testnet 🚀

Alternative: Comment /deploy on this PR to trigger deployment directly.

⚠️ Note: Manual deploy requires authorization. Only authorized users can trigger deployments.

Comment updated automatically when the PR is synchronized.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 25, 2025

📝 Walkthrough

Walkthrough

This change introduces multi-client confirmation requirements for subscriptions in the SubMux module. A new required_subscription_confirmations field is added to SubMuxClient<T>, initialized using Byzantine majority logic (⌊2N/3⌋ where N is the number of inner clients). The AccountSubscriptionTask::Subscribe variant signature changes to accept the confirmation count parameter. The subscription task logic now waits for multiple client confirmations before signaling success, with improved error handling for partial client failures and validation to ensure at least one client and valid confirmation counts.

Possibly related PRs

Suggested reviewers

  • bmuddha
  • GabrielePicco
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch thlorenz/more-reliable-subs

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7eaaaf3 and a0b6f10.

📒 Files selected for processing (2)
  • magicblock-chainlink/src/submux/mod.rs (4 hunks)
  • magicblock-chainlink/src/submux/subscription_task.rs (6 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-10-21T14:00:54.642Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579

Applied to files:

  • magicblock-chainlink/src/submux/mod.rs
  • magicblock-chainlink/src/submux/subscription_task.rs
📚 Learning: 2025-11-07T14:20:31.457Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.

Applied to files:

  • magicblock-chainlink/src/submux/mod.rs
  • magicblock-chainlink/src/submux/subscription_task.rs
📚 Learning: 2025-11-19T09:34:37.917Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: test-integration/test-chainlink/tests/ix_remote_account_provider.rs:62-63
Timestamp: 2025-11-19T09:34:37.917Z
Learning: In test-integration/test-chainlink/tests/ix_remote_account_provider.rs and similar test files, the `_fwd_rx` receiver returned by `init_remote_account_provider()` is intentionally kept alive (but unused) to prevent "receiver dropped" errors on the sender side. The pattern `let (remote_account_provider, _fwd_rx) = init_remote_account_provider().await;` should NOT be changed to `let (remote_account_provider, _) = ...` because dropping the receiver would cause send() operations to fail.

Applied to files:

  • magicblock-chainlink/src/submux/subscription_task.rs
🧬 Code graph analysis (1)
magicblock-chainlink/src/submux/subscription_task.rs (1)
magicblock-chainlink/src/submux/mod.rs (2)
  • clients (228-231)
  • new (156-167)
🔇 Additional comments (11)
magicblock-chainlink/src/submux/mod.rs (4)

135-137: LGTM!

Clean documentation for the new field. The semantics are clear.


199-202: Verify: floor division yields sub-2/3 majority for some client counts.

The formula (n * 2) / 3 uses integer floor division. For certain values, this produces less than a true 2/3 majority:

n floor(2n/3) Percentage
3 2 67% ✓
4 2 50%
5 3 60%
6 4 67% ✓

If strict 2/3+ majority is required (e.g., for Byzantine fault tolerance semantics), consider ceiling division:

 let required_subscription_confirmations = {
     let n = clients.len();
-    cmp::max(1, (n * 2) / 3)
+    cmp::max(1, (n * 2 + 2) / 3) // ceiling division
 };

If floor division is intentional for leniency, the current implementation is correct.


203-214: LGTM!

The struct initialization properly incorporates the new field.


591-601: LGTM!

The precomputed confirmation count is correctly propagated to the subscription task.

magicblock-chainlink/src/submux/subscription_task.rs (7)

13-18: LGTM!

Clean variant design. The confirmation count is appropriately scoped to Subscribe only.


20-29: LGTM!

Clean utility method for consistent operation naming in error messages and logs.


40-70: LGTM!

Good defensive programming:

  • Early validation prevents spawning tasks for invalid inputs
  • Capping target_successes at total_clients handles edge cases gracefully
  • Clear error messages with operation context via op_name()

90-109: LGTM!

The early-success pattern is well implemented:

  • Success is signaled immediately when target is reached (unblocking the caller)
  • Loop continues to completion to collect all client results for comprehensive logging
  • Option::take() ensures the success channel is sent exactly once

135-140: LGTM!

Proper handling of the edge case where the spawned task panics or drops the channel unexpectedly.


150-159: LGTM!

Clean test helper. Good practice keeping _abort_sndr and _abort_rcvr as named bindings to prevent premature drops.


161-333: Excellent test coverage!

The test suite comprehensively covers:

  • Happy paths (single/multiple confirmations)
  • Partial failures meeting threshold
  • Total failures
  • Edge cases (zero clients, zero confirmations)
  • Non-Subscribe operations ignoring confirmation count
  • Threshold not met scenario

This provides strong confidence in the implementation correctness.

Tip

📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

  • Provide your own instructions using the high_level_summary_instructions setting.
  • Format the summary however you like (bullet lists, tables, multi-section layouts, contributor stats, etc.).
  • Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example instruction:

"Divide the high-level summary into five sections:

  1. 📝 Description — Summarize the main change in 50–60 words, explaining what was done.
  2. 📓 References — List relevant issues, discussions, documentation, or related PRs.
  3. 📦 Dependencies & Requirements — Mention any new/updated dependencies, environment variable changes, or configuration updates.
  4. 📊 Contributor Summary — Include a Markdown table showing contributions:
    | Contributor | Lines Added | Lines Removed | Files Changed |
  5. ✔️ Additional Notes — Add any extra reviewer context.
    Keep each section concise (under 200 words) and use bullet or numbered lists for clarity."

Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@bmuddha bmuddha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some unnecessary defensive programming (with target sucess count checks), but LGTM otherwise

successes += 1;
if successes >= target_successes {
if let Some(tx) = tx.take() {
let _ = tx.send(Ok(()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this sends the Ok(()) multiple times potentially, on every success after target has been reached

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it won't since it takes the tx, so it is only Some the first time.

Copy link
Collaborator

@GabrielePicco GabrielePicco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
We can evaluate whether this behaves as expected in production, since it depends on the configuration. For example, if we use ten different providers, the optimal strategy might be to wait for the first two responses, or to combine the 2/3 strategy with a timeout that still allows for minority confirmation.

Just a note: LGTM as a first version that we can monitor.

@thlorenz thlorenz merged commit 6a4791c into master Nov 25, 2025
19 of 20 checks passed
@thlorenz thlorenz deleted the thlorenz/more-reliable-subs branch November 25, 2025 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants