Skip to content

fix: handle big messages in P2P layer#264

Merged
pablodeymo merged 2 commits intomainfrom
error-on-too-big-messages
Apr 8, 2026
Merged

fix: handle big messages in P2P layer#264
pablodeymo merged 2 commits intomainfrom
error-on-too-big-messages

Conversation

@MegaRedHand
Copy link
Copy Markdown
Collaborator

This PR adds additional "big message" checks in the P2P layer:

  • when encoding a payload, check it doesn't go over the max uncompressed size
  • when responding to BlocksByRoot requests, skip big blocks

None of these code paths should be triggered during normal operation, but they improve network stability by keeping errors local.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 8, 2026

🤖 Kimi Code Review

General Assessment
The PR improves stream integrity by validating payload sizes before writing the SUCCESS response code, preventing corrupted streams when oversized messages are encountered. The logic is sound, but there are maintainability and observability issues to address.

Specific Feedback

1. Magic Number Maintainability (codec.rs:118, encoding.rs:55)
The literal 1024 is used in both files to define a safety margin. This should be extracted as a named constant (e.g., PAYLOAD_SIZE_MARGIN or MAX_PAYLOAD_SIZE_HEADROOM) in encoding.rs and imported where needed. This prevents drift if the margin needs adjustment.

2. Integer Underflow Risk (codec.rs:118, encoding.rs:55)
While MAX_PAYLOAD_SIZE is presumably large (≥ 1MB), defensive code should avoid potential underflow:

// Suggested:
if encoded.len().saturating_add(1024) > MAX_PAYLOAD_SIZE {

Or ensure MAX_PAYLOAD_SIZE is defined as const MAX_PAYLOAD_SIZE: usize = 1 << 20; with a static assertion that it's > 1024.

3. Silent Data Omission (codec.rs:116-122)
Skipping oversized blocks with only a warning log is risky for consensus:

  • The peer requested specific roots and receives an incomplete set without indication of why specific blocks are missing
  • Valid blocks should never exceed this limit; encountering one indicates database corruption, buggy SSZ serialization, or malicious data
  • Recommendation: Include the block root in the warning for debugging (warn!(root = ?block.hash_tree_root(), size = encoded.len(), "...")). Consider whether this should return a ResponseCode::ERROR chunk instead of silent omission, or at least increment a metrics counter for monitoring.

4. Error Handling Consistency (encoding.rs:55-59)
The error returned uses io::ErrorKind::InvalidData. Consider if this should be a more specific error type or if the message should include the actual size vs. limit for debugging: "payload size {} exceeds maximum {}".

5. Documentation
The comment in codec.rs:113-116 explaining why we encode-before-write is excellent and correctly identifies the stream corruption risk being mitigated.

Summary
The core fix is correct (encode-then-check-then-write), but address the magic number (Item 1) and improve observability (Item 3) before merging. The underflow check (Item 2) is good defensive practice but low priority given typical constants.


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 8, 2026

Greptile Summary

This PR adds defensive size checks to the P2P request/response layer: write_payload now rejects payloads exceeding MAX_PAYLOAD_SIZE - 1024 bytes, and the BlocksByRoot response writer pre-encodes each block and skips oversized ones with a warning instead of writing the SUCCESS byte first (which would corrupt the stream if encoding later failed).

Confidence Score: 4/5

Safe to merge; the only finding is a minor ordering inconsistency in the Status response path that has no practical impact.

Changes are small and well-reasoned. The BlocksByRoot fix correctly prevents stream corruption. The write_payload guard is consistent with the receiver's decode limit. The one P2 note (Status path still writes the byte before encoding) is a theoretical concern only — Status payloads are always tiny.

No files require special attention.

Vulnerabilities

No security concerns identified. The new size bounds prevent sending malformed frames over the wire; no input from untrusted sources reaches these new paths in a way that could be exploited.

Important Files Changed

Filename Overview
crates/net/p2p/src/req_resp/codec.rs Moves BlocksByRoot encoding before writing the SUCCESS byte, adding a pre-check to skip oversized blocks; Status response path retains old write-then-check ordering (low-risk but inconsistent).
crates/net/p2p/src/req_resp/encoding.rs Adds a guard in write_payload rejecting uncompressed payloads larger than MAX_PAYLOAD_SIZE - 1024; logic is correct and consistent with the receiver's decode limit.

Sequence Diagram

sequenceDiagram
    participant Sender as Codec (write_response)
    participant WP as write_payload
    participant Peer as Remote Peer

    Note over Sender,Peer: BlocksByRoot response (new flow)
    loop for each block
        Sender->>Sender: encoded = block.to_ssz()
        alt encoded.len() > MAX_PAYLOAD_SIZE - 1024
            Sender->>Sender: warn + continue (skip block)
        else size ok
            Sender->>Peer: write SUCCESS byte
            Sender->>WP: write_payload(encoded)
            WP->>WP: size guard check
            WP->>Peer: varint + snappy-compressed payload
        end
    end

    Note over Sender,Peer: write_payload guard (all callers)
    Sender->>WP: write_payload(encoded)
    alt uncompressed > MAX_PAYLOAD_SIZE - 1024
        WP-->>Sender: Err(InvalidData)
    else size ok
        WP->>Peer: varint + snappy payload
    end
Loading

Comments Outside Diff (1)

  1. crates/net/p2p/src/req_resp/codec.rs, line 105-110 (link)

    P2 Status response path still writes SUCCESS byte before write_payload

    The PR correctly moves the SUCCESS byte write after the size check for BlocksByRoot, but the Status response path retains the old ordering: the SUCCESS code byte is written before write_payload is called. If write_payload now returns an InvalidData error for any reason, that byte is already on the wire and the stream is technically corrupted. In practice Status messages are tiny so this path can't be triggered, but for consistency with the pattern introduced here consider encoding before writing:

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: crates/net/p2p/src/req_resp/codec.rs
    Line: 105-110
    
    Comment:
    **Status response path still writes SUCCESS byte before `write_payload`**
    
    The PR correctly moves the SUCCESS byte write after the size check for `BlocksByRoot`, but the `Status` response path retains the old ordering: the SUCCESS code byte is written before `write_payload` is called. If `write_payload` now returns an `InvalidData` error for any reason, that byte is already on the wire and the stream is technically corrupted. In practice Status messages are tiny so this path can't be triggered, but for consistency with the pattern introduced here consider encoding before writing:
    
    
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: crates/net/p2p/src/req_resp/codec.rs
Line: 105-110

Comment:
**Status response path still writes SUCCESS byte before `write_payload`**

The PR correctly moves the SUCCESS byte write after the size check for `BlocksByRoot`, but the `Status` response path retains the old ordering: the SUCCESS code byte is written before `write_payload` is called. If `write_payload` now returns an `InvalidData` error for any reason, that byte is already on the wire and the stream is technically corrupted. In practice Status messages are tiny so this path can't be triggered, but for consistency with the pattern introduced here consider encoding before writing:

```suggestion
                    ResponsePayload::Status(status) => {
                        let encoded = status.to_ssz();
                        io.write_all(&[ResponseCode::SUCCESS.into()]).await?;
                        write_payload(io, &encoded).await
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix: skip big payloads in BlocksByRoot r..." | Re-trigger Greptile

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 8, 2026

🤖 Codex Code Review

  1. crates/net/p2p/src/req_resp/encoding.rs:57 regresses the advertised wire limit. MAX_PAYLOAD_SIZE is the uncompressed payload cap, and decode_payload still accepts any payload up to that exact value. Subtracting 1024 here makes this node reject valid payloads in (MAX_PAYLOAD_SIZE - 1024, MAX_PAYLOAD_SIZE], so sender and receiver limits are now inconsistent. If the concern is snappy/frame overhead, that should be checked against MAX_COMPRESSED_PAYLOAD_SIZE after compression, not by shrinking the protocol payload limit.

  2. crates/net/p2p/src/req_resp/codec.rs:119 silently drops oversized BlocksByRoot results and then returns Ok(()). On the receiving side, an empty BlocksByRoot response is treated as a fetch failure and the peer is added to failed_peers for retry/backoff (crates/net/p2p/src/req_resp/handlers.rs:141, crates/net/p2p/src/req_resp/handlers.rs:271). That means a peer with the requested block can now be penalized as if it were missing the block. This should return an explicit non-success chunk or fail the response before writing anything, not silently convert “present but unsendable” into “not found”.

No consensus-state-transition or fork-choice logic is touched here; the review risk is in req/resp interoperability and peer-handling semantics. I couldn’t run cargo test in this environment because the sandbox blocks Rust toolchain download/init.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 8, 2026

🤖 Claude Code Review

Here is the review:


PR 264 Review — Handle big messages in P2P layer

Summary

The approach is sound: encoding before writing the SUCCESS byte prevents stream corruption, and the layered guards (pre-check + write_payload safety net) are logically consistent. A few things worth addressing:


Issues

Silent skip instead of RESOURCE_UNAVAILABLE (correctness / spec)

codec.rs:119-125 — When a block is oversized, the chunk is silently omitted. ResponseCode::RESOURCE_UNAVAILABLE (code 3) is already defined in messages.rs:60 for exactly this situation. Per the multi-chunk protocol semantics, sending an error chunk per slot allows the requester to distinguish "peer lacks the block" from "peer has it but cannot transmit it." Silent omission conflates the two.

The fix is straightforward — instead of continue, send an error chunk:

io.write_all(&[ResponseCode::RESOURCE_UNAVAILABLE.into()]).await?;
// (then write an error message payload with write_payload)

The existing error_message() helper in messages.rs makes this easy. The requester's decode_blocks_by_root_response already handles non-SUCCESS chunks gracefully (codec.rs:238-244).


Duplicate magic threshold — extract a named constant

encoding.rs:57 and codec.rs:119 both hard-code MAX_PAYLOAD_SIZE - 1024. The comment says "leeway for response codes and varint encoding," but the actual overhead is at most 6 bytes (1 response code + 5 varint bytes), not 1024. The value is conservative rather than wrong, but two callsites with the same magic number will drift. Suggest:

// in encoding.rs
/// Conservative headroom to prevent write_payload from being called
/// with a payload that would exceed MAX_PAYLOAD_SIZE after protocol framing.
pub const MAX_SAFE_PAYLOAD_SIZE: usize = MAX_PAYLOAD_SIZE - 1024;

Then both checks become > MAX_SAFE_PAYLOAD_SIZE.


Misleading comment in write_payload

encoding.rs:55-56 — "Leave some leeway for response codes and the varint encoding of the size." write_payload does not write response codes; that is the caller's concern. The comment is accurate only for codec.rs's perspective, not for this function in isolation. Suggest dropping "response codes" from the comment.


Warning log is missing the block slot

codec.rs:120-124 — The warning logs only size. Since block is available in the loop, adding slot = block.block.slot (or equivalent field path) costs nothing and makes post-mortem diagnosis possible:

warn!(
    size = encoded.len(),
    slot = %block.block.slot,
    "Skipping oversized block in BlocksByRoot response"
);

Minor / Non-issues

  • The redundancy between the codec.rs pre-check and the write_payload guard is intentional and correct: for BlocksByRoot the pre-check fires first (clean continue), while the guard in write_payload catches other callers. No action needed.
  • The error returned from write_payload uses io::ErrorKind::InvalidData, which is appropriate.
  • The comment explaining "encode-first to avoid orphaned SUCCESS byte" is clear and accurate.

Verdict

Two items worth addressing before merge: Item 1 (silent skip vs. RESOURCE_UNAVAILABLE) is the most meaningful — it's a correctness gap vs. the req/resp spec. Item 2 (named constant) eliminates a silent drift risk. Items 3 and 4 are cosmetic but cheap to fix.


Automated review by Claude (Anthropic) · sonnet · custom prompt

@pablodeymo pablodeymo merged commit f00fbb0 into main Apr 8, 2026
7 checks passed
@pablodeymo pablodeymo deleted the error-on-too-big-messages branch April 8, 2026 18:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants