
fix: use bounded incoming message buffers for all protocols#2268

Merged
gilcu3 merged 4 commits into main from 2247-remote-dos----node-oom-crash
Mar 4, 2026
Conversation

@gilcu3
Contributor

@gilcu3 gilcu3 commented Feb 27, 2026

Closes #2247

Added per-protocol incoming message buffer capacity constants throughout threshold-signatures. Comms::with_buffer_capacity(max) rejects messages for new headers once the cap
is reached; messages for existing entries still flow. Honest participants always use the same buffer capacity for each protocol.
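The push-side behavior described above can be sketched as follows. This is a minimal illustration only: `MessageBuffer`, `MessageError`, and the method names here are illustrative stand-ins, not the crate's actual API.

```rust
use std::collections::{HashMap, VecDeque};

// Illustrative sketch only: the names below are stand-ins, not the
// crate's actual API.
#[derive(Debug, PartialEq)]
enum MessageError {
    BufferFull,
}

struct MessageBuffer {
    max_entries: usize,
    messages: HashMap<u64, VecDeque<Vec<u8>>>, // header -> queued payloads
}

impl MessageBuffer {
    fn with_capacity(max_entries: usize) -> Self {
        Self { max_entries, messages: HashMap::new() }
    }

    // Reject a message that would create a new header entry once the cap
    // is reached; messages for headers that already have an entry flow.
    fn push(&mut self, header: u64, payload: Vec<u8>) -> Result<(), MessageError> {
        if !self.messages.contains_key(&header) && self.messages.len() >= self.max_entries {
            return Err(MessageError::BufferFull);
        }
        self.messages.entry(header).or_default().push_back(payload);
        Ok(())
    }
}
```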

Each protocol now declares its own maximum:

  • Simple protocols (sign, presign, CKD, DKG): small constants (0–7), derived by counting waitpoints
  • Triple generation: $131 \cdot N \cdot (P-1) + 7$, derived from the sub-protocol structure (and empirical tests), where N is the number of triples (used in batch triple generation) and P is the number of participants. The formula is exact for N = 1 but only an upper bound for N > 1: an optimization in one of the sub-protocols uses hashing to decide some execution branches, which makes computing the exact total with a closed formula impossible.
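The triple-generation bound above, 131 · N · (P - 1) + 7, can be computed with checked arithmetic (a later review comment notes the real code uses safe arithmetic in the formula computation). The function name here is an illustrative sketch, not the crate's API.

```rust
// Hedged sketch: computes 131 * N * (P - 1) + 7 with checked arithmetic
// so an absurd participant count cannot overflow. Illustrative name only.
fn triple_gen_max_entries(n_triples: usize, participants: usize) -> Option<usize> {
    let others = participants.checked_sub(1)?;
    131usize
        .checked_mul(n_triples)?
        .checked_mul(others)?
        .checked_add(7)
}
```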

The majority of the code consists of added tests that guarantee the new buffer capacity bounds do not break the existing implementations.

This solves the issue because the buffer is always bounded. A malicious participant can still stall the protocol but can no longer cause an OOM, which is a strict improvement. A better solution might be possible, but it would certainly require breaking changes, so this one seems the best we can get for now.

As a byproduct, the computed numbers make it extremely clear how many rounds each protocol has, which is something we did not know previously (for the triples, for example).

@gilcu3 gilcu3 linked an issue Feb 27, 2026 that may be closed by this pull request
@gilcu3 gilcu3 force-pushed the 2247-remote-dos----node-oom-crash branch 3 times, most recently from b5605db to 362dfd8 on February 27, 2026 19:35
@gilcu3 gilcu3 marked this pull request as ready for review February 27, 2026 19:39
@claude

claude bot commented Feb 27, 2026

Code Review

Solid fix for the OOM DoS vector (issue #2247). The approach of bounding per-protocol incoming message buffers with protocol-specific constants is well-designed. The formula derivation for triple generation is clearly documented, and the test coverage across all protocol variants is thorough.

One observation worth noting (not blocking):

pop creates entries beyond max_entries: MessageBuffer::pop (internal.rs:252-262) unconditionally calls messages_lock.entry(header).or_default(), which can grow the HashMap beyond the stated max_entries. This is by design (the honest protocol needs to receive on its own waitpoints), and the bounding in push prevents attacker-driven unbounded growth. The tests confirm that honest execution matches the constant. Just noting this for future maintainers: max_entries bounds attacker-created entries, not total entries. A clarifying comment on the max_entries field or the pop method would help.
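A minimal sketch of the observation above (illustrative names, not the actual internal.rs code): `pop` unconditionally creates an entry so the honest protocol can register interest in its own waitpoints, which is exactly why `max_entries` bounds attacker-created entries rather than total entries.

```rust
use std::collections::{HashMap, VecDeque};

// Illustrative sketch, not the crate's actual code.
struct MessageBuffer {
    max_entries: usize, // enforced by push, deliberately not by pop
    messages: HashMap<u64, VecDeque<Vec<u8>>>,
}

impl MessageBuffer {
    fn pop(&mut self, header: u64) -> Option<Vec<u8>> {
        // May grow the map past max_entries, by design: the honest
        // protocol must be able to wait on its own headers.
        self.messages.entry(header).or_default().pop_front()
    }
}
```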

No critical issues found.

✅ Approved

@gilcu3 gilcu3 force-pushed the 2247-remote-dos----node-oom-crash branch from e8ce1ed to 0b9e5d5 on March 3, 2026 13:34
@gilcu3
Contributor Author

gilcu3 commented Mar 3, 2026

@claude review

@claude

claude bot commented Mar 3, 2026

PR title type suggestion: This PR introduces bounded message buffers as a new defensive feature/systematic improvement across all protocols, rather than fixing an existing bug. The type should probably be feat: or refactor: instead of fix:.

Suggested title: feat: use bounded incoming message buffers for all protocols or refactor: add bounded incoming message buffers for all protocols

@claude

claude bot commented Mar 3, 2026

Code Review

This is a well-designed fix for the OOM DoS vector. The bounded buffer approach with protocol-specific constants is sound, and the test coverage is thorough. A few observations:

BufferFull error propagation terminates the protocol: in crates/node/src/protocol.rs:131, the ? on protocol.message() propagates MessageError::BufferFull as an anyhow::Error, aborting the entire protocol run. The PR description acknowledges this trade-off (a malicious participant can still stall the protocol). This is strictly better than an OOM, but it is worth confirming that this is the desired production behavior rather than silently dropping the message and logging a warning. If a node receives a burst of legitimate messages that happen to arrive before the protocol has started polling (e.g., during a slow initialization), the buffer could fill and terminate the run prematurely. Consider whether a warn! + continue (drop the message) would be more resilient than hard-failing the protocol, since the protocol would simply stall on missing messages rather than crash. Not blocking if the team has considered this.
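The warn-and-drop alternative suggested above could look roughly like this. It is a hedged sketch: the error type and handling site are simplified stand-ins for the real node code, and eprintln! stands in for a tracing warn! macro.

```rust
// Hedged sketch of warn-and-drop handling; names are stand-ins for the
// real node code.
#[derive(Debug)]
enum MessageError {
    BufferFull,
    Other(String),
}

fn handle_incoming(result: Result<(), MessageError>) -> Result<(), MessageError> {
    match result {
        // Drop the message and keep the protocol alive: it will stall on
        // missing messages instead of aborting the whole run.
        Err(MessageError::BufferFull) => {
            eprintln!("incoming buffer full; dropping message");
            Ok(())
        }
        other => other,
    }
}
```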

No other critical issues found. Safe arithmetic is used in the formula computation, the From<MessageError> for ProtocolError wiring is correct for test utils, and all Comms::new() callers are migrated to with_buffer_capacity().

✅ Approved

kevindeforth previously approved these changes Mar 3, 2026
Contributor

@kevindeforth kevindeforth left a comment


Thank you!
Only nits.

pub(crate) const REDJUBJUB_SIGN_MAX_INCOMING_COORDINATOR_ENTRIES: usize = 1;
/// Maximum incoming buffer entries for non-coordinator participants in the `RedJubjub` sign protocol.
#[cfg(test)]
pub(crate) const REDJUBJUB_SIGN_MAX_INCOMING_PARTICIPANT_ENTRIES: usize = 1;
Contributor

do we need this somewhere else, or could we define it inside the test module?

Contributor Author

Left it there just to keep some uniformity. At the moment it could be just in the test module, as you mention.

Contributor

@SimonRastikian SimonRastikian left a comment

I reviewed most of the PR.
As explained in the meeting, we all know this is not the cleanest solution. I hope you could open an Issue describing a cleaner way to solve it. (This is independent of my approval)

The changes I hope to see are more helper functions that clean up the code. Otherwise, thanks for tackling this.


for &p in &participants {
let comms = Comms::with_buffer_capacity(usize::MAX);
let comms_ref = comms.clone();
Contributor

Nit: Please add a comment as discussed in our meeting about cloning Arc.

Contributor Author

I believe this is not something you normally find in code, but I can add it in one place if that would help the future reader. Maybe I can refactor a bit to make it clearer.

@gilcu3
Contributor Author

gilcu3 commented Mar 3, 2026

I reviewed most of the PR. As explained in the meeting, we all know this is not the cleanest solution. I hope you could open an Issue describing a cleaner way to solve it. (This is independent of my approval)

Opened #2285

@gilcu3 gilcu3 force-pushed the 2247-remote-dos----node-oom-crash branch from 0b9e5d5 to 3cf34a6 on March 3, 2026 16:47
Contributor

@SimonRastikian SimonRastikian left a comment


Thank you! :)

@gilcu3 gilcu3 added this pull request to the merge queue Mar 4, 2026
Merged via the queue into main with commit c4dcc06 Mar 4, 2026
10 checks passed
@gilcu3 gilcu3 deleted the 2247-remote-dos----node-oom-crash branch March 4, 2026 06:43
Successfully merging this pull request may close these issues: Remote DoS -- Node OOM crash