
feat: rv64 alu_w shift_w #2484

Merged
shuklaayush merged 16 commits into rv64-word-opcode-copy-existing-modules from rv64-alu-w-shift-w
Apr 17, 2026

Conversation

Contributor

@876pol commented Mar 4, 2026

Resolves INT-6340, INT-6343.

Summary:

  • base_alu_w and shift_w reuse the existing generic core AIRs (BaseAluCoreAir<RV64_WORD_NUM_LIMBS, RV64_CELL_BITS> and ShiftCoreAir<RV64_WORD_NUM_LIMBS, RV64_CELL_BITS>) via type aliases, eliminating the need for duplicated core code.
  • A new Rv64BaseAluWAdapter (alu_w.rs) bridges 64-bit register memory with the 4-byte core interface: it reads full 8-byte registers but only passes the low 4 bytes to the core, and sign-extends the 4-byte core output back to 8 bytes on write.
  • The test suite is largely carried over from the original RV32 code and primarily exercises the reused core AIR logic rather than the newly written adapter.

@876pol 876pol marked this pull request as ready for review April 16, 2026 18:33
@shuklaayush
Collaborator

@claude can you review the Rv64BaseAluWAdapter air thoroughly and ensure it's not missing any constraints?

@claude

claude bot commented Apr 17, 2026

Claude finished @shuklaayush's task in 4m 40s


Review of Rv64BaseAluWAdapterAir Constraints

  • Read the Rv64BaseAluWAdapter AIR code
  • Read existing adapter patterns for comparison (Rv64BaseAluAdapterAir, imm_to_bytes)
  • Analyze constraints for completeness and soundness
  • Post detailed review

I performed a thorough constraint-by-constraint audit of Rv64BaseAluWAdapterAir in extensions/riscv/circuit/src/adapters/alu_w.rs, comparing against the base Rv64BaseAluAdapterAir in alu.rs.

Verdict: The AIR constraints appear complete and sound. No missing constraints were identified. Below is the detailed analysis.


1. Immediate handling (lines 101–123) ✅

When rs2_as = 0 (immediate operand):

  • Reconstruction: rs2_imm = limbs[0] + limbs[1] * 2^8 + rs2_sign * 2^16 is constrained to equal local.rs2 — correctly reconstructs the 24-bit immediate from its byte encoding.
  • Sign byte: rs2_sign * (255 - rs2_sign) == 0 constrains the sign byte to be exactly 0x00 or 0xFF.
  • Sign extension within word: for i in 3..RV64_WORD_NUM_LIMBS constrains limb 3 to equal the sign byte. This correctly sign-extends to the 32-bit boundary. (The base ALU extends to RV64_REGISTER_NUM_LIMBS = 8 because its core sees 8 bytes — here only 4 bytes reach the core, so extending to 4 is sufficient.)
  • Range check: send_range(limbs[0], limbs[1]) with multiplicity is_valid - rs2_as ensures the low two bytes are valid bytes only when the immediate path is active. The multiplicity is never negative because when(rs2_as).assert_one(is_valid) guarantees rs2_as = 1 → is_valid = 1, so when is_valid = 0, rs2_as must be 0.
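The immediate-path arithmetic above can be checked in isolation. The sketch below is illustrative only (helper names like `reconstruct_imm` are not from the codebase): it rebuilds the 24-bit immediate from its byte encoding and confirms that the sign-byte constraint admits exactly 0x00 and 0xFF.

```rust
/// Rebuild the 24-bit immediate from its byte encoding:
/// limbs[0] + limbs[1] * 2^8 + sign_byte * 2^16 (hypothetical helper).
fn reconstruct_imm(limb0: u32, limb1: u32, sign_byte: u32) -> u32 {
    limb0 + (limb1 << 8) + (sign_byte << 16)
}

/// The sign-byte constraint s * (255 - s) == 0 admits exactly 0x00 and 0xFF.
fn sign_byte_ok(s: u32) -> bool {
    s * (255 - s) == 0
}

fn main() {
    // -1 as a 24-bit sign-extended immediate: limbs 0xFF, 0xFF, sign 0xFF.
    assert_eq!(reconstruct_imm(0xFF, 0xFF, 0xFF), 0xFF_FFFF);
    assert!(sign_byte_ok(0x00) && sign_byte_ok(0xFF));
    assert!(!sign_byte_ok(0x01));
}
```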

2. rs1 memory read (lines 125–139) ✅

  • Full 8-byte rs1_data is constructed: low 4 bytes from ctx.reads[0] (core interface) and upper 4 bytes from local.rs1_high (adapter witness).
  • The memory bridge read constrains the full 8-byte value against memory state, so rs1_high is implicitly constrained — a malicious prover cannot set arbitrary upper bytes without violating the memory bridge.
  • Gated by is_valid. Correct.

3. rs2 memory read (lines 141–159) ✅

  • Guard: when(rs2_as).assert_one(is_valid) ensures the read interaction only fires when the instruction is valid.
  • Full 8-byte rs2_data constructed from core's 4 bytes + rs2_high.
  • Memory read gated by rs2_as (multiplicity = rs2_as). When rs2_as = 0 (immediate), the read is disabled and rs2_high is unconstrained — this is safe because the values don't participate in any active constraint.
  • Address: MemoryAddress::new(local.rs2_as, local.rs2) — when rs2_as = 1 = RV64_REGISTER_AS, this correctly reads from the register address space.

4. Sign-extension of result (lines 161–191) ✅ — The novel part of this adapter

Sign bit extraction via XOR (lines 164–174):

  • result_sign is boolean-constrained.
  • sign_mask = 128 (bit 7).
  • send_xor(a, 128, c) where c = a + 128 - 2 * result_sign * 128 and a = writes[0][3] (MSB of 32-bit result).
  • Analysis: a XOR 128 flips bit 7. If result_sign = s equals the actual bit 7 of a:
    • s = 0: a ∈ [0, 127], c = a + 128 ∈ [128, 255] ✅ valid byte
    • s = 1: a ∈ [128, 255], c = a - 128 ∈ [0, 127] ✅ valid byte
    • Wrong s: c falls outside [0, 255] → XOR table rejects. ✅ Sound.
  • The XOR lookup also implicitly range-checks result_word_msl to [0, 255].
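The case analysis above can be verified exhaustively. The following sketch (illustrative code, not the circuit itself) confirms that the claimed lookup value c = a + 128 - 2·s·128 is a valid byte equal to a XOR 128 exactly when s is bit 7 of a:

```rust
/// Returns true iff the XOR lookup (a, 128, c) would accept for this (a, s),
/// where c = a + 128 - 2 * s * 128 as in the constraint above.
fn xor_lookup_accepts(a: u32, s: u32) -> bool {
    let c = a as i64 + 128 - 2 * (s as i64) * 128;
    (0..=255).contains(&c) && (c as u32) == (a ^ 128)
}

fn main() {
    for a in 0u32..=255 {
        for s in 0u32..=1 {
            // Accepted exactly when s matches the actual sign bit of a.
            assert_eq!(xor_lookup_accepts(a, s), s == (a >> 7));
        }
    }
}
```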

Sign-extended write (lines 175–191):

  • sign_extend_limb = 255 * result_sign: 0x00 for positive, 0xFF for negative.
  • write_data[0..4] = core's 4-byte output.
  • write_data[4..8] = sign_extend_limb (replicated).
  • Full 8-byte write via memory bridge, gated by is_valid. Correct.
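As a sanity check on the write layout described above, the sketch below (hypothetical helper name) confirms that replicating the limb 255 · result_sign over the upper four bytes agrees with ordinary i32 → i64 sign extension:

```rust
/// Sign-extend a 4-byte little-endian word to 8 bytes by replicating the
/// sign limb, mirroring the adapter's write_data layout (illustrative).
fn sign_extend_via_limbs(word: [u8; 4]) -> [u8; 8] {
    let result_sign = word[3] >> 7;            // bit 7 of the 32-bit MSB
    let sign_extend_limb = 255 * result_sign;  // 0x00 or 0xFF
    let mut out = [sign_extend_limb; 8];
    out[..4].copy_from_slice(&word);
    out
}

fn main() {
    for v in [0i32, 1, -1, i32::MIN, i32::MAX, -123_456] {
        // Reference: plain integer sign extension to 64 bits.
        let expected = (v as i64 as u64).to_le_bytes();
        assert_eq!(sign_extend_via_limbs(v.to_le_bytes()), expected);
    }
}
```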

5. Execution bridge (lines 193–208) ✅

  • Operands: (rd_ptr, rs1_ptr, rs2, RV64_REGISTER_AS, rs2_as) — matches instruction encoding.
  • timestamp_delta = 3 (from three timestamp_pp() calls) — correctly accounts for rs1 read, rs2 read, write.
  • PC increment: DEFAULT_PC_STEP with ctx.to_pc. Standard pattern.

6. Timestamp consistency between AIR and filler ✅

The filler (fill_trace_row, lines 342–402) sets TIMESTAMP_DELTA = 2 and fills operations in reverse order:

  • Write at from_timestamp + 2
  • rs2 read at from_timestamp + 1
  • rs1 read at from_timestamp + 0

This matches the AIR's timestamp progression (0, 1, 2) and total delta of 3. Consistent.

7. Padding rows (is_valid = 0) ✅

When is_valid = 0:

  • rs2_as must be 0 (from when(rs2_as).assert_one(is_valid) + boolean constraint).
  • All memory interactions have multiplicity 0.
  • The when(not(rs2_as)) immediate constraints fire but are satisfied with all-zero witnesses.
  • The XOR for sign extraction fires with multiplicity 0.

Summary

The Rv64BaseAluWAdapter correctly adapts the 64-bit register interface to a 32-bit core by:

  1. Reading full 8-byte registers, passing only the low 4 bytes to the core.
  2. Handling immediates with proper sign-extension and range checking within the 4-byte view.
  3. Sign-extending the 4-byte core result back to 8 bytes via a sound XOR-based sign bit extraction.

All memory interactions, execution bridge, timestamp management, and edge cases (padding rows, immediate vs. register) are properly constrained. The constraint set is consistent with the patterns established by the base Rv64BaseAluAdapterAir.


@shuklaayush
Collaborator

@claude can you check if the shift operations mask the shift properly - 0x3F for rv64? also check if it was masked properly before for rv32 - 0x1F

@claude

claude bot commented Apr 17, 2026

Claude finished @shuklaayush's task in 2m 55s


Shift Amount Masking Analysis

  • Read the shared shift core AIR (shift/core.rs) to check masking logic
  • Read the deleted shift_w/core.rs to see the previous RV64 masking
  • Read shift_w/execution.rs to see how the core is reused for RV64
  • Read shift/execution.rs for RV64 shift execution
  • Analyze whether 0x1F (RV32) and 0x3F (RV64) masks are correctly applied
  • Post findings

Verdict: Both masks are correct, both before and after this PR.

How the circuit masks the shift amount

The generic ShiftCoreAir<NUM_LIMBS, LIMB_BITS> enforces the mask via the constraint at shift/core.rs:177-185:

let num_bits = AB::F::from_canonical_usize(NUM_LIMBS * LIMB_BITS);
self.range_bus
    .range_check(
        (c[0] - limb_shift * LIMB_BITS - bit_shift) * num_bits.inverse(),
        LIMB_BITS - log2(NUM_LIMBS * LIMB_BITS),
    )
    .eval(builder, is_valid);

This decomposes c[0] (the low byte of rs2) as:

c[0] = quotient × (NUM_LIMBS × LIMB_BITS) + limb_shift × LIMB_BITS + bit_shift

The effective shift amount used by all other constraints is limb_shift × LIMB_BITS + bit_shift, which equals c[0] % (NUM_LIMBS × LIMB_BITS). The quotient is range-checked to ensure consistency.
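The decomposition above can be spot-checked directly. This sketch (illustrative, not the circuit code) recomputes the effective shift for every byte value of c[0] and confirms it equals the masked amount for both parameterizations:

```rust
/// Effective shift amount under the decomposition
/// c0 = q * (num_limbs * limb_bits) + limb_shift * limb_bits + bit_shift.
fn effective_shift(c0: u32, num_limbs: u32, limb_bits: u32) -> u32 {
    let num_bits = num_limbs * limb_bits;
    let limb_shift = (c0 % num_bits) / limb_bits;
    let bit_shift = c0 % limb_bits;
    // The decomposition reassembles c0 exactly.
    assert_eq!(c0, (c0 / num_bits) * num_bits + limb_shift * limb_bits + bit_shift);
    limb_shift * limb_bits + bit_shift
}

fn main() {
    for c0 in 0u32..=255 {
        assert_eq!(effective_shift(c0, 8, 8), c0 & 0x3F); // 64-bit: mask 0x3F
        assert_eq!(effective_shift(c0, 4, 8), c0 & 0x1F); // word:   mask 0x1F
    }
}
```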

Constants (riscv.rs:13-16)

| Constant | Value |
| --- | --- |
| RV64_REGISTER_NUM_LIMBS | 8 |
| RV64_CELL_BITS | 8 |
| RV64_WORD_NUM_LIMBS | 4 |

Per-operation analysis

| Operation | Core instantiation | NUM_LIMBS × LIMB_BITS | Circuit mask | Execution mask | RISC-V spec | Status |
| --- | --- | --- | --- | --- | --- | --- |
| SLL/SRL/SRA (64-bit) | ShiftCoreAir<8, 8> | 64 | c[0] % 64 = & 0x3F | rs2 & 0x3F | rs2[5:0] | ✅ |
| SLLW/SRLW/SRAW (32-bit word) | ShiftCoreAir<4, 8> | 32 | c[0] % 32 = & 0x1F | rs2 & 0x1F | rs2[4:0] | ✅ |

Quotient range check validity

The quotient must fit in LIMB_BITS - log2(NUM_LIMBS × LIMB_BITS) bits:

  • 64-bit shifts: 8 - log2(64) = 8 - 6 = 2 bits → quotient ∈ [0, 3] → max c[0] = 3×64 + 63 = 255 ✅ (full byte)
  • 32-bit word shifts: 8 - log2(32) = 8 - 5 = 3 bits → quotient ∈ [0, 7] → max c[0] = 7×32 + 31 = 255 ✅ (full byte)
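The arithmetic in both bullets can be checked mechanically. This sketch (constants mirror the analysis above, not the source code) computes the maximum c[0] reachable under the quotient range check and confirms it covers the full byte range:

```rust
/// Maximum c[0] admitted by the quotient range check: the quotient fits in
/// limb_bits - log2(num_limbs * limb_bits) bits, and the remainder is at
/// most num_bits - 1 (illustrative helper).
fn max_c0(num_limbs: u32, limb_bits: u32) -> u32 {
    let num_bits = num_limbs * limb_bits;           // power of two
    let q_bits = limb_bits - num_bits.trailing_zeros();
    let q_max = (1u32 << q_bits) - 1;
    q_max * num_bits + (num_bits - 1)
}

fn main() {
    assert_eq!(max_c0(8, 8), 255); // 64-bit shifts: quotient in [0, 3]
    assert_eq!(max_c0(4, 8), 255); // 32-bit word shifts: quotient in [0, 7]
}
```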

Execution trace matches circuit

Before vs. after this PR

The old shift_w/core.rs (now deleted) was an identical copy of the generic ShiftCoreAir<NUM_LIMBS, LIMB_BITS>. The masking logic was the same — this PR simply removes the duplicate. No behavioral change.



let rd_word = u32::from_le_bytes(<OP as ShiftWOp>::compute(rs1, rs2));
let rd = (rd_word as i32 as i64 as u64).to_le_bytes();
// Write the result back to memory
exec_state.vm_write(RV32_REGISTER_AS, pre_compute.a as u32, &rd);
Collaborator

can we make a helper for this sign extension?
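One possible shape for such a helper (hypothetical name and placement; a sketch, not the merged implementation) is to wrap the `as i32 as i64 as u64` cast chain behind a byte-array interface:

```rust
/// Sign-extend a 32-bit little-endian word to a 64-bit little-endian word
/// (hypothetical helper suggested by the review comment above).
fn sign_extend_word(word: [u8; 4]) -> [u8; 8] {
    let rd_word = u32::from_le_bytes(word);
    // i32 -> i64 replicates the sign bit into the upper 32 bits.
    (rd_word as i32 as i64 as u64).to_le_bytes()
}

fn main() {
    assert_eq!(sign_extend_word([0xFF; 4]), [0xFF; 8]);
    assert_eq!(sign_extend_word(7u32.to_le_bytes()), 7u64.to_le_bytes());
}
```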

Resolves INT-6341, INT-6342.

Summary:
- `mul_w` and `divrem_w` reuse the existing generic core AIRs
(`MultiplicationCoreAir<RV64_WORD_NUM_LIMBS, RV64_CELL_BITS>` and
`DivRemCoreAir<RV64_WORD_NUM_LIMBS, RV64_CELL_BITS>`) via type aliases,
eliminating the need for duplicated core code.
- A new `Rv64MultWAdapter` (`mul_w.rs`) bridges 64-bit register memory
with the 4-byte core interface: it reads full 8-byte registers but only
passes the low 4 bytes to the core, and sign-extends the 4-byte core
output back to 8 bytes on write.
- The test suite is largely carried over from the original RV32 code and
primarily exercises the reused core AIR logic rather than the newly
written adapter.

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Ayush Shukla <shuklaayush@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@shuklaayush shuklaayush merged commit ea0a7a4 into rv64-word-opcode-copy-existing-modules Apr 17, 2026
4 of 54 checks passed
@shuklaayush shuklaayush deleted the rv64-alu-w-shift-w branch April 17, 2026 20:47
shuklaayush added a commit that referenced this pull request Apr 17, 2026
Resolves INT-6340, INT-6343.

Summary:
- `base_alu_w` and `shift_w` reuse the existing generic core AIRs
(`BaseAluCoreAir<RV64_WORD_NUM_LIMBS, RV64_CELL_BITS>` and
`ShiftCoreAir<RV64_WORD_NUM_LIMBS, RV64_CELL_BITS>`) via type aliases,
eliminating the need for duplicated core code.
- A new `Rv64BaseAluWAdapter` (`alu_w.rs`) bridges 64-bit register
memory with the 4-byte core interface: it reads full 8-byte registers
but only passes the low 4 bytes to the core, and sign-extends the 4-byte
core output back to 8 bytes on write.
- The test suite is largely carried over from the original RV32 code and
primarily exercises the reused core AIR logic rather than the newly
written adapter.

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Ayush Shukla <shuklaayush@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
