
feat: rv64 alu_w shift_w #2484

Merged
shuklaayush merged 16 commits into rv64-word-opcode-copy-existing-modules from rv64-alu-w-shift-w
Apr 17, 2026

Conversation

Contributor

@876pol commented Mar 4, 2026

Resolves INT-6340, INT-6343.

Summary:

  • base_alu_w and shift_w reuse the existing generic core AIRs (BaseAluCoreAir<RV64_WORD_NUM_LIMBS, RV64_CELL_BITS> and ShiftCoreAir<RV64_WORD_NUM_LIMBS, RV64_CELL_BITS>) via type aliases, eliminating the need for duplicated core code.
  • A new Rv64BaseAluWAdapter (alu_w.rs) bridges 64-bit register memory with the 4-byte core interface: it reads full 8-byte registers but only passes the low 4 bytes to the core, and sign-extends the 4-byte core output back to 8 bytes on write.
  • The test suite is largely carried over from the original RV32 code and primarily exercises the reused core AIR logic rather than the newly written adapter.

@876pol 876pol marked this pull request as ready for review April 16, 2026 18:33
@shuklaayush
Collaborator

@claude can you review the Rv64BaseAluWAdapter air thoroughly and ensure it's not missing any constraints?

@claude

claude bot commented Apr 17, 2026

Claude finished @shuklaayush's task in 4m 40s


Review of Rv64BaseAluWAdapterAir Constraints

  • Read the Rv64BaseAluWAdapter AIR code
  • Read existing adapter patterns for comparison (Rv64BaseAluAdapterAir, imm_to_bytes)
  • Analyze constraints for completeness and soundness
  • Post detailed review

I performed a thorough constraint-by-constraint audit of Rv64BaseAluWAdapterAir in extensions/riscv/circuit/src/adapters/alu_w.rs, comparing against the base Rv64BaseAluAdapterAir in alu.rs.

Verdict: The AIR constraints appear complete and sound. No missing constraints were identified. Below is the detailed analysis.


1. Immediate handling (lines 101–123) ✅

When rs2_as = 0 (immediate operand):

  • Reconstruction: rs2_imm = limbs[0] + limbs[1] * 2^8 + rs2_sign * 2^16 is constrained to equal local.rs2 — correctly reconstructs the 24-bit immediate from its byte encoding.
  • Sign byte: rs2_sign * (255 - rs2_sign) == 0 constrains the sign byte to be exactly 0x00 or 0xFF.
  • Sign extension within word: for i in 3..RV64_WORD_NUM_LIMBS constrains limb 3 to equal the sign byte. This correctly sign-extends to the 32-bit boundary. (The base ALU extends to RV64_REGISTER_NUM_LIMBS = 8 because its core sees 8 bytes — here only 4 bytes reach the core, so extending to 4 is sufficient.)
  • Range check: send_range(limbs[0], limbs[1]) with multiplicity is_valid - rs2_as ensures the low two bytes are valid bytes only when the immediate path is active. The multiplicity is never negative because when(rs2_as).assert_one(is_valid) guarantees rs2_as = 1 → is_valid = 1, so when is_valid = 0, rs2_as must be 0.
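The immediate-path arithmetic above can be checked in isolation. The sketch below is illustrative only (helper names like `reconstruct_imm` are not from the codebase): it rebuilds the 24-bit immediate from its byte encoding and confirms that the sign-byte constraint admits exactly 0x00 and 0xFF.

```rust
/// Rebuild the 24-bit immediate from its byte encoding:
/// limbs[0] + limbs[1] * 2^8 + sign_byte * 2^16 (hypothetical helper).
fn reconstruct_imm(limb0: u32, limb1: u32, sign_byte: u32) -> u32 {
    limb0 + (limb1 << 8) + (sign_byte << 16)
}

/// The sign-byte constraint s * (255 - s) == 0 admits exactly 0x00 and 0xFF.
fn sign_byte_ok(s: u32) -> bool {
    s * (255 - s) == 0
}

fn main() {
    // -1 as a 24-bit sign-extended immediate: limbs 0xFF, 0xFF, sign 0xFF.
    assert_eq!(reconstruct_imm(0xFF, 0xFF, 0xFF), 0xFF_FFFF);
    assert!(sign_byte_ok(0x00) && sign_byte_ok(0xFF));
    assert!(!sign_byte_ok(0x01));
}
```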

2. rs1 memory read (lines 125–139) ✅

  • Full 8-byte rs1_data is constructed: low 4 bytes from ctx.reads[0] (core interface) and upper 4 bytes from local.rs1_high (adapter witness).
  • The memory bridge read constrains the full 8-byte value against memory state, so rs1_high is implicitly constrained — a malicious prover cannot set arbitrary upper bytes without violating the memory bridge.
  • Gated by is_valid. Correct.

3. rs2 memory read (lines 141–159) ✅

  • Guard: when(rs2_as).assert_one(is_valid) ensures the read interaction only fires when the instruction is valid.
  • Full 8-byte rs2_data constructed from core's 4 bytes + rs2_high.
  • Memory read gated by rs2_as (multiplicity = rs2_as). When rs2_as = 0 (immediate), the read is disabled and rs2_high is unconstrained — this is safe because the values don't participate in any active constraint.
  • Address: MemoryAddress::new(local.rs2_as, local.rs2) — when rs2_as = 1 = RV64_REGISTER_AS, this correctly reads from the register address space.

4. Sign-extension of result (lines 161–191) ✅ — The novel part of this adapter

Sign bit extraction via XOR (lines 164–174):

  • result_sign is boolean-constrained.
  • sign_mask = 128 (bit 7).
  • send_xor(a, 128, c) where c = a + 128 - 2 * result_sign * 128 and a = writes[0][3] (MSB of 32-bit result).
  • Analysis: a XOR 128 flips bit 7. If result_sign = s equals the actual bit 7 of a:
    • s = 0: a ∈ [0, 127], c = a + 128 ∈ [128, 255] ✅ valid byte
    • s = 1: a ∈ [128, 255], c = a - 128 ∈ [0, 127] ✅ valid byte
    • Wrong s: c falls outside [0, 255] → XOR table rejects. ✅ Sound.
  • The XOR lookup also implicitly range-checks result_word_msl to [0, 255].
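The case analysis above can be verified exhaustively. The following sketch (illustrative code, not the circuit itself) confirms that the claimed lookup value c = a + 128 - 2·s·128 is a valid byte equal to a XOR 128 exactly when s is bit 7 of a:

```rust
/// Returns true iff the XOR lookup (a, 128, c) would accept for this (a, s),
/// where c = a + 128 - 2 * s * 128 as in the constraint above.
fn xor_lookup_accepts(a: u32, s: u32) -> bool {
    let c = a as i64 + 128 - 2 * (s as i64) * 128;
    (0..=255).contains(&c) && (c as u32) == (a ^ 128)
}

fn main() {
    for a in 0u32..=255 {
        for s in 0u32..=1 {
            // Accepted exactly when s matches the actual sign bit of a.
            assert_eq!(xor_lookup_accepts(a, s), s == (a >> 7));
        }
    }
}
```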

Sign-extended write (lines 175–191):

  • sign_extend_limb = 255 * result_sign: 0x00 for positive, 0xFF for negative.
  • write_data[0..4] = core's 4-byte output.
  • write_data[4..8] = sign_extend_limb (replicated).
  • Full 8-byte write via memory bridge, gated by is_valid. Correct.
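As a sanity check on the write layout described above, the sketch below (hypothetical helper name) confirms that replicating the limb 255 · result_sign over the upper four bytes agrees with ordinary i32 → i64 sign extension:

```rust
/// Sign-extend a 4-byte little-endian word to 8 bytes by replicating the
/// sign limb, mirroring the adapter's write_data layout (illustrative).
fn sign_extend_via_limbs(word: [u8; 4]) -> [u8; 8] {
    let result_sign = word[3] >> 7;            // bit 7 of the 32-bit MSB
    let sign_extend_limb = 255 * result_sign;  // 0x00 or 0xFF
    let mut out = [sign_extend_limb; 8];
    out[..4].copy_from_slice(&word);
    out
}

fn main() {
    for v in [0i32, 1, -1, i32::MIN, i32::MAX, -123_456] {
        // Reference: plain integer sign extension to 64 bits.
        let expected = (v as i64 as u64).to_le_bytes();
        assert_eq!(sign_extend_via_limbs(v.to_le_bytes()), expected);
    }
}
```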

5. Execution bridge (lines 193–208) ✅

  • Operands: (rd_ptr, rs1_ptr, rs2, RV64_REGISTER_AS, rs2_as) — matches instruction encoding.
  • timestamp_delta = 3 (from three timestamp_pp() calls) — correctly accounts for rs1 read, rs2 read, write.
  • PC increment: DEFAULT_PC_STEP with ctx.to_pc. Standard pattern.

6. Timestamp consistency between AIR and filler ✅

The filler (fill_trace_row, lines 342–402) sets TIMESTAMP_DELTA = 2 and fills operations in reverse order:

  • Write at from_timestamp + 2
  • rs2 read at from_timestamp + 1
  • rs1 read at from_timestamp + 0

This matches the AIR's timestamp progression (0, 1, 2) and total delta of 3. Consistent.

7. Padding rows (is_valid = 0) ✅

When is_valid = 0:

  • rs2_as must be 0 (from when(rs2_as).assert_one(is_valid) + boolean constraint).
  • All memory interactions have multiplicity 0.
  • The when(not(rs2_as)) immediate constraints fire but are satisfied with all-zero witnesses.
  • The XOR for sign extraction fires with multiplicity 0.

Summary

The Rv64BaseAluWAdapter correctly adapts the 64-bit register interface to a 32-bit core by:

  1. Reading full 8-byte registers, passing only the low 4 bytes to the core.
  2. Handling immediates with proper sign-extension and range checking within the 4-byte view.
  3. Sign-extending the 4-byte core result back to 8 bytes via a sound XOR-based sign bit extraction.

All memory interactions, execution bridge, timestamp management, and edge cases (padding rows, immediate vs. register) are properly constrained. The constraint set is consistent with the patterns established by the base Rv64BaseAluAdapterAir.


@shuklaayush
Collaborator

@claude can you check if the shift operations mask the shift properly - 0x3F for rv64? also check if it was masked properly before for rv32 - 0x1F

@claude

claude bot commented Apr 17, 2026

Claude finished @shuklaayush's task in 2m 55s


Shift Amount Masking Analysis

  • Read the shared shift core AIR (shift/core.rs) to check masking logic
  • Read the deleted shift_w/core.rs to see the previous RV64 masking
  • Read shift_w/execution.rs to see how the core is reused for RV64
  • Read shift/execution.rs for RV64 shift execution
  • Analyze whether 0x1F (RV32) and 0x3F (RV64) masks are correctly applied
  • Post findings

Verdict: Both masks are correct, both before and after this PR.

How the circuit masks the shift amount

The generic ShiftCoreAir<NUM_LIMBS, LIMB_BITS> enforces the mask via the constraint at shift/core.rs:177-185:

let num_bits = AB::F::from_canonical_usize(NUM_LIMBS * LIMB_BITS);
self.range_bus
    .range_check(
        (c[0] - limb_shift * LIMB_BITS - bit_shift) * num_bits.inverse(),
        LIMB_BITS - log2(NUM_LIMBS * LIMB_BITS),
    )
    .eval(builder, is_valid);

This decomposes c[0] (the low byte of rs2) as:

c[0] = quotient × (NUM_LIMBS × LIMB_BITS) + limb_shift × LIMB_BITS + bit_shift

The effective shift amount used by all other constraints is limb_shift × LIMB_BITS + bit_shift, which equals c[0] % (NUM_LIMBS × LIMB_BITS). The quotient is range-checked to ensure consistency.
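The decomposition above can be spot-checked directly. This sketch (illustrative, not the circuit code) recomputes the effective shift for every byte value of c[0] and confirms it equals the masked amount for both parameterizations:

```rust
/// Effective shift amount under the decomposition
/// c0 = q * (num_limbs * limb_bits) + limb_shift * limb_bits + bit_shift.
fn effective_shift(c0: u32, num_limbs: u32, limb_bits: u32) -> u32 {
    let num_bits = num_limbs * limb_bits;
    let limb_shift = (c0 % num_bits) / limb_bits;
    let bit_shift = c0 % limb_bits;
    // The decomposition reassembles c0 exactly.
    assert_eq!(c0, (c0 / num_bits) * num_bits + limb_shift * limb_bits + bit_shift);
    limb_shift * limb_bits + bit_shift
}

fn main() {
    for c0 in 0u32..=255 {
        assert_eq!(effective_shift(c0, 8, 8), c0 & 0x3F); // 64-bit: mask 0x3F
        assert_eq!(effective_shift(c0, 4, 8), c0 & 0x1F); // word:   mask 0x1F
    }
}
```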

Constants (riscv.rs:13-16)

| Constant | Value |
| --- | --- |
| RV64_REGISTER_NUM_LIMBS | 8 |
| RV64_CELL_BITS | 8 |
| RV64_WORD_NUM_LIMBS | 4 |

Per-operation analysis

| Operation | Core instantiation | NUM_LIMBS × LIMB_BITS | Circuit mask | Execution mask | RISC-V spec | Status |
| --- | --- | --- | --- | --- | --- | --- |
| SLL/SRL/SRA (64-bit) | ShiftCoreAir<8, 8> | 64 | c[0] % 64 = & 0x3F | rs2 & 0x3F | rs2[5:0] | ✅ |
| SLLW/SRLW/SRAW (32-bit word) | ShiftCoreAir<4, 8> | 32 | c[0] % 32 = & 0x1F | rs2 & 0x1F | rs2[4:0] | ✅ |

Quotient range check validity

The quotient must fit in LIMB_BITS - log2(NUM_LIMBS × LIMB_BITS) bits:

  • 64-bit shifts: 8 - log2(64) = 8 - 6 = 2 bits → quotient ∈ [0, 3] → max c[0] = 3×64 + 63 = 255 ✅ (full byte)
  • 32-bit word shifts: 8 - log2(32) = 8 - 5 = 3 bits → quotient ∈ [0, 7] → max c[0] = 7×32 + 31 = 255 ✅ (full byte)
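The arithmetic in both bullets can be checked mechanically. This sketch (constants mirror the analysis above, not the source code) computes the maximum c[0] reachable under the quotient range check and confirms it covers the full byte range:

```rust
/// Maximum c[0] admitted by the quotient range check: the quotient fits in
/// limb_bits - log2(num_limbs * limb_bits) bits, and the remainder is at
/// most num_bits - 1 (illustrative helper).
fn max_c0(num_limbs: u32, limb_bits: u32) -> u32 {
    let num_bits = num_limbs * limb_bits;           // power of two
    let q_bits = limb_bits - num_bits.trailing_zeros();
    let q_max = (1u32 << q_bits) - 1;
    q_max * num_bits + (num_bits - 1)
}

fn main() {
    assert_eq!(max_c0(8, 8), 255); // 64-bit shifts: quotient in [0, 3]
    assert_eq!(max_c0(4, 8), 255); // 32-bit word shifts: quotient in [0, 7]
}
```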

Execution trace matches circuit

Before vs. after this PR

The old shift_w/core.rs (now deleted) was an identical copy of the generic ShiftCoreAir<NUM_LIMBS, LIMB_BITS>. The masking logic was the same — this PR simply removes the duplicate. No behavioral change.



let rd_word = u32::from_le_bytes(<OP as ShiftWOp>::compute(rs1, rs2));
let rd = (rd_word as i32 as i64 as u64).to_le_bytes();
// Write the result back to memory
exec_state.vm_write(RV32_REGISTER_AS, pre_compute.a as u32, &rd);
Collaborator

can we make a helper for this sign extension?
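One possible shape for such a helper (hypothetical name and placement; a sketch, not the merged implementation) is to wrap the `as i32 as i64 as u64` cast chain behind a byte-array interface:

```rust
/// Sign-extend a 32-bit little-endian word to a 64-bit little-endian word
/// (hypothetical helper suggested by the review comment above).
fn sign_extend_word(word: [u8; 4]) -> [u8; 8] {
    let rd_word = u32::from_le_bytes(word);
    // i32 -> i64 replicates the sign bit into the upper 32 bits.
    (rd_word as i32 as i64 as u64).to_le_bytes()
}

fn main() {
    assert_eq!(sign_extend_word([0xFF; 4]), [0xFF; 8]);
    assert_eq!(sign_extend_word(7u32.to_le_bytes()), 7u64.to_le_bytes());
}
```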

Resolves INT-6341, INT-6342.

Summary:
- `mul_w` and `divrem_w` reuse the existing generic core AIRs
(`MultiplicationCoreAir<RV64_WORD_NUM_LIMBS, RV64_CELL_BITS>` and
`DivRemCoreAir<RV64_WORD_NUM_LIMBS, RV64_CELL_BITS>`) via type aliases,
eliminating the need for duplicated core code.
- A new `Rv64MultWAdapter` (`mul_w.rs`) bridges 64-bit register memory
with the 4-byte core interface: it reads full 8-byte registers but only
passes the low 4 bytes to the core, and sign-extends the 4-byte core
output back to 8 bytes on write.
- The test suite is largely carried over from the original RV32 code and
primarily exercises the reused core AIR logic rather than the newly
written adapter.

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Ayush Shukla <shuklaayush@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@shuklaayush shuklaayush merged commit ea0a7a4 into rv64-word-opcode-copy-existing-modules Apr 17, 2026
4 of 54 checks passed
@shuklaayush shuklaayush deleted the rv64-alu-w-shift-w branch April 17, 2026 20:47
shuklaayush added a commit that referenced this pull request Apr 17, 2026
Resolves INT-6340, INT-6343.

Summary:
- `base_alu_w` and `shift_w` reuse the existing generic core AIRs
(`BaseAluCoreAir<RV64_WORD_NUM_LIMBS, RV64_CELL_BITS>` and
`ShiftCoreAir<RV64_WORD_NUM_LIMBS, RV64_CELL_BITS>`) via type aliases,
eliminating the need for duplicated core code.
- A new `Rv64BaseAluWAdapter` (`alu_w.rs`) bridges 64-bit register
memory with the 4-byte core interface: it reads full 8-byte registers
but only passes the low 4 bytes to the core, and sign-extends the 4-byte
core output back to 8 bytes on write.
- The test suite is largely carried over from the original RV32 code and
primarily exercises the reused core AIR logic rather than the newly
written adapter.

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Ayush Shukla <shuklaayush@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
