bugc: Add calldata, returndata, code, and transient storage to EVM codegen #181

Merged
gnidan merged 2 commits into main from compiler-calldata-returndata on Mar 9, 2026

gnidan commented Mar 9, 2026

Summary

  • Extends bugc's generateRead in storage.ts to handle calldata, returndata, code, and transient storage locations (previously only storage and memory were supported)
  • Adds transient storage support to bugc's generateWrite via TSTORE
  • Calldata reads use CALLDATALOAD with shift+mask for partial reads
  • Returndata and code reads use scratch memory at 0x60 with RETURNDATACOPY/CODECOPY + MLOAD
  • Adds 6 tests covering all new read/write paths

…codegen

The read/write instruction handlers in storage.ts previously only
supported storage and memory locations. This adds full support for:

- calldata reads via CALLDATALOAD (with shift+mask for partial reads)
- returndata reads via RETURNDATACOPY to scratch memory + MLOAD
- code reads via CODECOPY to scratch memory + MLOAD
- transient storage reads via TLOAD
- transient storage writes via TSTORE

Includes tests covering all new read/write paths.
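As a rough illustration of the read paths listed above, the following TypeScript sketch renders each location's read as a list of EVM mnemonics. It is hypothetical: readMnemonics, the Location type, and the exact stack ordering are assumptions for illustration, not bugc's actual instruction types or API. It assumes the read offset (or slot) is already on top of the stack and models only full-slot storage/transient reads.

```typescript
// Hypothetical sketch: bugc's real generateRead emits typed instructions.
// Here each read becomes a list of mnemonic strings. The read offset (or
// storage slot) is assumed to already be on top of the stack.
type Location = "storage" | "transient" | "calldata" | "returndata" | "code";

const hex = (n: number): string => "0x" + n.toString(16).padStart(2, "0");
const SCRATCH = `PUSH1 ${hex(0x60)}`; // scratch word for copy-based reads

function readMnemonics(location: Location, length = 32): string[] {
  if (!Number.isInteger(length) || length < 1 || length > 32) {
    throw new RangeError("length must be an integer in 1..32");
  }
  // Left-aligned partial values are right-aligned with SHR by (32 - length) * 8.
  const rightAlign =
    length < 32 ? [`PUSH1 ${hex((32 - length) * 8)}`, "SHR"] : [];

  switch (location) {
    case "storage":
    case "transient":
      // Partial slot reads also need the in-slot offset; not modeled here.
      if (length !== 32) throw new RangeError("only full-slot reads are modeled");
      return [location === "storage" ? "SLOAD" : "TLOAD"];
    case "calldata":
      return ["CALLDATALOAD", ...rightAlign];
    case "returndata":
    case "code": {
      const copy = location === "returndata" ? "RETURNDATACOPY" : "CODECOPY";
      return [
        // MSTORE(0x60, 0): zero the scratch word so short copies are zero-padded.
        "PUSH1 0x00", SCRATCH, "MSTORE",
        // Stack (top last): [offset] -> [size, offset, 0x60], then copy.
        `PUSH1 ${hex(length)}`, "SWAP1", SCRATCH, copy,
        SCRATCH, "MLOAD",
        ...rightAlign,
      ];
    }
  }
}
```

For example, a 4-byte calldata read such as a function selector, readMnemonics("calldata", 4), yields CALLDATALOAD, PUSH1 0xe0, SHR, since (32 - 4) * 8 = 224 = 0xe0.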

gnidan commented Mar 9, 2026

Self-review

Changes:

storage.ts was refactored from handling only storage/memory reads and storage writes into a comprehensive handler for all IR data locations:

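  1. Calldata reads (generateCalldataRead): Uses CALLDATALOAD for full 32-byte reads. For partial reads (length < 32), applies SHR + AND mask, because CALLDATALOAD returns the 32 bytes starting at the given offset, so the requested bytes sit left-aligned in the high-order end of the word.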

  2. Returndata/code reads (generateCopyBasedRead): These locations lack a direct load opcode, so we zero scratch memory at 0x60, copy data there via RETURNDATACOPY/CODECOPY, then MLOAD. The zero-clearing ensures partial copies are properly zero-padded.

  3. Transient storage: Simple TLOAD/TSTORE wrappers matching the existing SLOAD/SSTORE pattern.
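The calldata path in item 1 can be checked against a small model of the opcode semantics. This TypeScript sketch assumes 256-bit words modeled as bigint and calldata as a Uint8Array; calldataload and readCalldata are illustrative names, not bugc functions. It reproduces the SHR + AND sequence for a partial read, such as a 4-byte function selector at offset 0.

```typescript
// Hypothetical model: 256-bit words as bigint, calldata as a Uint8Array.

// CALLDATALOAD: the 32 bytes starting at `offset`, read big-endian; bytes
// past the end of calldata read as zero.
function calldataload(calldata: Uint8Array, offset: number): bigint {
  let word = 0n;
  for (let i = 0; i < 32; i++) {
    const index = offset + i;
    const byte = index < calldata.length ? calldata[index] : 0;
    word = (word << 8n) | BigInt(byte);
  }
  return word;
}

// Partial read of `length` bytes at `offset`: the wanted bytes occupy the
// high-order end of the loaded word, so shift right by (32 - length) * 8.
function readCalldata(calldata: Uint8Array, offset: number, length: number): bigint {
  const word = calldataload(calldata, offset);
  if (length === 32) return word;
  const shifted = word >> BigInt((32 - length) * 8);
  // SHR is a logical shift, so the high bits are already zero here; the
  // AND mirrors the SHR + AND sequence and leaves the value unchanged.
  const mask = (1n << BigInt(length * 8)) - 1n;
  return shifted & mask;
}
```

With calldata bytes a9 05 9c bb 01, readCalldata(cd, 0, 4) gives 0xa9059cbb, and a read running past the end of calldata is zero-padded on the right before the shift.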

Design decisions:

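  • Scratch memory at 0x60 is used as a temporary buffer for single-word reads. This is not Solidity's scratch convention: Solidity uses 0x00..0x3f as scratch space and reserves 0x60..0x7f as the "zero slot", which it never writes. Reusing 0x60 is safe only as long as bugc-generated code does not depend on that slot staying zero.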
  • The shift+mask pattern for partial calldata reads mirrors the existing partial storage read logic
  • CALLDATACOPY is not used: CALLDATALOAD reads a 32-byte word from any byte offset straight onto the stack, avoiding the copy-to-memory + MLOAD round trip.

Test coverage: 6 new tests verify mnemonic sequences for each path.

github-actions bot commented Mar 9, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-03-09 20:45 UTC

gnidan changed the title "Add calldata, returndata, code, and transient storage to EVM codegen" to "bugc: Add calldata, returndata, code, and transient storage to EVM codegen" on Mar 9, 2026

gnidan left a comment

Reviewed from the pointers/format perspective. The overall refactoring is clean — extracting generateStorageRead, generateCalldataRead, generateCopyBasedRead, and generateStorageWrite into separate functions improves readability a lot.

Two issues I noticed:

1. Returndata/code partial reads are missing shift+mask (potential bug)

generateCopyBasedRead copies length bytes to scratch at 0x60, then does MLOAD(0x60). When length < 32, the copied bytes are left-aligned in the 32-byte word (bytes at 0x60..0x60+length, rest is zero). But there's no shift+mask step to right-align the result.

Compare with calldata, which correctly shifts right by (32 - length) * 8 for partial reads, and storage, which shifts by (32 - offset - length) * 8. Returndata/code would return a left-aligned value while everything else returns right-aligned.

If the compiler never generates partial reads for these locations today, it's not a live bug, but it's a latent one. At minimum worth a comment; ideally add the same shift+mask treatment.
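To make the alignment problem concrete, here is a hypothetical TypeScript model of the copy-based path: zero the 32-byte scratch word at 0x60, copy length bytes into it, then MLOAD. copyBasedRead and its rightAlign flag are illustrative only; the flag toggles between the behavior flagged in this review and a shift-right fix. The copy is modeled with CODECOPY's zero-padding; RETURNDATACOPY instead reverts when offset + length exceeds RETURNDATASIZE.

```typescript
// Hypothetical model of the copy-based read path (returndata/code).
const SCRATCH = 0x60;

// MLOAD: the 32 bytes at `offset`, read big-endian.
function mload(memory: Uint8Array, offset: number): bigint {
  let word = 0n;
  for (let i = 0; i < 32; i++) word = (word << 8n) | BigInt(memory[offset + i]);
  return word;
}

function copyBasedRead(
  source: Uint8Array,
  offset: number,
  length: number,
  rightAlign: boolean,
): bigint {
  const memory = new Uint8Array(SCRATCH + 32);
  // MSTORE(0x60, 0). A fresh Uint8Array is already zeroed, but real memory
  // may hold stale data, which is why the generated code clears the word.
  memory.fill(0, SCRATCH, SCRATCH + 32);
  // CODECOPY-style copy: source bytes past the end read as zero.
  for (let i = 0; i < length; i++) {
    const index = offset + i;
    memory[SCRATCH + i] = index < source.length ? source[index] : 0;
  }
  const word = mload(memory, SCRATCH); // copied bytes are left-aligned
  if (!rightAlign || length === 32) return word;
  return word >> BigInt((32 - length) * 8); // right-align the partial value
}
```

With rightAlign set to false, a 2-byte read of 12 34 comes back as 0x1234 shifted left by 240 bits, which is the left-aligned value described above; with the shift it comes back as 0x1234.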

2. Transient storage reads don't handle offset/length

The IR Read interface allows offset and length for any location, but generateRead for transient just does TLOAD without partial-slot extraction. Storage reads handle this correctly via generateStorageRead with the shift+mask path. A partial transient read would silently return the full slot value.

Same suggestion: at minimum document the limitation, or reuse the storage partial-read logic (TLOAD is semantically identical to SLOAD for extraction purposes).
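The extraction suggested for reuse can be written as one helper applied to either an SLOAD or a TLOAD result. This sketch is hypothetical (extractFromSlot is not a bugc function) and assumes, as the (32 - offset - length) * 8 shift implies, that offset counts bytes from the high-order end of the slot.

```typescript
// Hypothetical helper shared by SLOAD and TLOAD results. `offset` counts
// bytes from the high-order end of the 32-byte slot, matching the
// (32 - offset - length) * 8 shift used for partial storage reads.
function extractFromSlot(slot: bigint, offset: number, length: number): bigint {
  if (offset < 0 || length < 1 || offset + length > 32) {
    throw new RangeError("require offset >= 0, length >= 1, offset + length <= 32");
  }
  // Move the field down to the low-order end of the word.
  const shift = BigInt((32 - offset - length) * 8);
  // Keep only `length` bytes: this clears the `offset` bytes above the field.
  const mask = (1n << BigInt(length * 8)) - 1n;
  return (slot >> shift) & mask;
}
```

With offset = 0 the shift reduces to the (32 - length) * 8 used for calldata; unlike that case, the mask is required here to drop the bytes that precede the field.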


Everything else looks good. The scratch memory approach at 0x60 is sound, zeroing before copy handles partial-length padding correctly, and the CALLDATALOAD shift math is right. Tests cover the happy paths well.

Copy-based reads (returndata, code) returned left-aligned values
from MLOAD without right-aligning for partial reads (length < 32).
Add shift+mask path matching calldata's partial read handling.

Transient storage reads used TLOAD without offset/length handling.
Add the same partial-slot extraction as regular storage reads.

gnidan left a comment

Both issues addressed correctly:

  1. generateCopyBasedRead now has separate full/partial paths — partial reads shift right by (32 - length) * 8 after MLOAD, matching calldata's right-alignment convention.

  2. generateTransientRead mirrors generateStorageRead with (32 - offset - length) * 8 shift for partial slot extraction.

Shift math checks out, test coverage looks good. LGTM.

gnidan merged commit ccaf689 into main on Mar 9, 2026
4 checks passed
gnidan deleted the compiler-calldata-returndata branch March 9, 2026 20:41