perf(rv64): split base_alu into add_sub and bitwiselogic chips #2883

Merged: shuklaayush merged 6 commits into develop-v2.1.0-rv64 from perf/split-base-alu-u16 on Jun 19, 2026

@GunaDD GunaDD commented Jun 12, 2026

Re-do of PR #2777 (base_alu part only), now on top of the u16 memory-bus limbs change. Summary of the changes:

  • Split the base_alu chip into add_sub and xor_or_and chips.
  • The new xor_or_and chip is the old base_alu minus ADD/SUB.
  • The new add_sub chip handles the ADD and SUB opcodes and stores 2 bytes per field element in its columns.
  • This lets us remove the interactions that the previous base_alu chip needed to range check that each individual field element is a byte.
  • The core width of the add_sub chip drops to 14 columns, down from the 29 columns of the base_alu chip.
  • Rewrote the add_sub chip's tests.rs for the new u16 column layout.
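
To make the "2 bytes per field element" layout concrete, here is a minimal sketch of ADD/SUB over four 16-bit limbs with a per-limb carry relation. It is an illustration only, not the chip's AIR: the names `NUM_LIMBS`, `to_limbs`, `add_limbs`, `check_add`, and `check_sub` are hypothetical, and the actual column layout and constraint form in the PR may differ.

```rust
// Assumption for this sketch: a 64-bit value is held as four 16-bit limbs,
// least-significant limb first.
const NUM_LIMBS: usize = 4;
const LIMB_BITS: u32 = 16;

/// Split a 64-bit value into 16-bit limbs (little-endian limb order).
fn to_limbs(x: u64) -> [u16; NUM_LIMBS] {
    let mut out = [0u16; NUM_LIMBS];
    for i in 0..NUM_LIMBS {
        out[i] = (x >> (LIMB_BITS * i as u32)) as u16;
    }
    out
}

/// Recombine 16-bit limbs into a 64-bit value.
fn from_limbs(l: [u16; NUM_LIMBS]) -> u64 {
    let mut acc = 0u64;
    for i in 0..NUM_LIMBS {
        acc |= (l[i] as u64) << (LIMB_BITS * i as u32);
    }
    acc
}

/// Limb-wise addition modulo 2^64; the final carry is discarded.
fn add_limbs(b: [u16; NUM_LIMBS], c: [u16; NUM_LIMBS]) -> [u16; NUM_LIMBS] {
    let mut a = [0u16; NUM_LIMBS];
    let mut carry = 0u32;
    for i in 0..NUM_LIMBS {
        let sum = b[i] as u32 + c[i] as u32 + carry;
        a[i] = sum as u16; // keep the low 16 bits
        carry = sum >> LIMB_BITS; // 0 or 1
    }
    a
}

/// Carry relation for a = b + c: at every limb,
/// b[i] + c[i] + carry_in - a[i] must equal carry_out * 2^16 with carry_out in {0, 1}.
fn check_add(a: [u16; NUM_LIMBS], b: [u16; NUM_LIMBS], c: [u16; NUM_LIMBS]) -> bool {
    let mut carry: i64 = 0;
    for i in 0..NUM_LIMBS {
        let diff = b[i] as i64 + c[i] as i64 + carry - a[i] as i64;
        if diff != 0 && diff != (1i64 << LIMB_BITS) {
            return false;
        }
        carry = diff >> LIMB_BITS;
    }
    true
}

/// SUB reuses the same relation: a = b - c holds exactly when b = a + c.
fn check_sub(a: [u16; NUM_LIMBS], b: [u16; NUM_LIMBS], c: [u16; NUM_LIMBS]) -> bool {
    check_add(b, a, c)
}
```

With four limbs instead of eight bytes, each operand needs half as many columns, which is consistent with the narrower core described above; the exact breakdown of the 14 columns is not given here.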

Improves perf by 6% on the reth benchmark: https://github.com/axiom-crypto/openvm-eth/actions/runs/27436476879

Closes INT-8102

@github-actions

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

@GunaDD GunaDD requested a review from shuklaayush June 15, 2026 17:44
@GunaDD GunaDD changed the title from "perf(rv64): split base_alu into add_sub and xor_or_and chips" to "perf(rv64): split base_alu into add_sub and bitwiselogic chips" Jun 15, 2026
@shuklaayush shuklaayush force-pushed the develop-v2.1.0-rv64 branch 2 times, most recently from 316c914 to 0b8d705 Compare June 19, 2026 10:24
GunaDD and others added 4 commits June 19, 2026 12:47
Re-do of PR #2777 (base_alu part only) on top of the u16 memory-bus limbs
change. The 64-bit BaseAlu chip is split into:

- add_sub: ADD/SUB with carry constraints, send_xor(a,a,0) result range
  checks, and the paired send_range(b,c) read-byte bounds required now
  that the memory bus only checks packed u16 values
- xor_or_and: XOR/OR/AND via the bitwise lookup, which already bounds the
  read bytes

The BaseAlu core (cols/AIR/executor/filler) is kept since the bigint
INT256 extension still uses it; its rv64-specific execution/cuda/tests
move to the new chips. base_alu_w is left untouched for now.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
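
The commit message above relies on two properties of a bitwise lookup: a lookup on (x, y, x ^ y) only succeeds when both operands fit the table, and sending (a, a, 0) therefore acts as a range check on a. The sketch below illustrates that idea under stated assumptions: the table is modeled as byte-sized (`LOOKUP_BITS = 8`), and `xor_lookup`, `in_range_via_xor`, `Op`, and `bitwise` are hypothetical names, not the repository's API. Deriving OR and AND from XOR uses the identity x + y = (x ^ y) + 2(x & y).

```rust
// Assumption: the lookup table covers operands of LOOKUP_BITS bits each.
const LOOKUP_BITS: u32 = 8;

/// Model of an XOR lookup: succeeds only if both operands are in the table.
fn xor_lookup(x: u32, y: u32) -> Option<u32> {
    let bound = 1u32 << LOOKUP_BITS;
    if x < bound && y < bound {
        Some(x ^ y)
    } else {
        None
    }
}

/// Range check in the style of send_xor(a, a, 0): a ^ a is always 0,
/// so the lookup succeeds exactly when a is within the table's range.
fn in_range_via_xor(a: u32) -> bool {
    xor_lookup(a, a) == Some(0)
}

#[derive(Clone, Copy)]
enum Op {
    Xor,
    Or,
    And,
}

/// XOR/OR/AND on one operand pair using a single XOR lookup.
/// From x + y = (x ^ y) + 2 * (x & y):
///   x & y = (x + y - xor) / 2
///   x | y = (x + y + xor) / 2
fn bitwise(op: Op, x: u32, y: u32) -> Option<u32> {
    let xor = xor_lookup(x, y)?; // also bounds x and y
    Some(match op {
        Op::Xor => xor,
        Op::Or => (x + y + xor) / 2,
        Op::And => (x + y - xor) / 2,
    })
}
```

Because the lookup itself bounds its operands, a chip built this way needs no separate range check on the values it reads, which matches the note that the bitwise lookup "already bounds the read bytes".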
Builds on `perf/split-base-alu-u16`, which already split the RV64
`base_alu` chip into separate `add_sub` (ADD, SUB) and `bitwise_logic`
(XOR, OR, AND) chips with u16 columns. This PR extends that split to the
remaining consumers of the old combined ALU core, namely the `base_alu_w`
(ADDW/SUBW) chip and the 256-bit bigint (INT256) extension, and then
removes the now-orphaned `base_alu` core.

- **`base_alu_w` (ADDW/SUBW) reuses the `add_sub` core** (`561463700`)
  - Adds a u16 ALU-W adapter (`adapters/alu_w_u16.rs` + `alu_w_u16.cuh`).
  - `base_alu_w` now reuses `add_sub/core.rs` instead of the full ALU
    core; adds `base_alu_w/preflight.rs` and updates its execution /
    cuda / tests.

- **bigint (INT256): split `base_alu` → `add_sub` + `bitwise_logic`**
  (`41ab11af6`)
  - `base_alu.rs` → `bitwise_logic.rs`, plus a new `add_sub.rs`.
  - The extension now registers `Rv64AddSub256` and `Rv64BitwiseLogic256`
    separately (`AddSubCoreAir` / `BitwiseLogicCoreAir`), with matching
    CUDA kernels, ABI, and tests.

- **Delete the dead `base_alu` core** (`2f0137386`), now that both
  `base_alu_w` and bigint have migrated off it.

Nothing depends on the old combined `base_alu` core anymore: ADD/SUB and
bitwise ops are proven by separate, narrower chips across RV64, RV64-W,
and INT256, and the orphaned core is gone.

Closes INT-8379

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
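
The commit message above has the ADDW/SUBW chip reuse the `add_sub` core. As background, RISC-V defines ADDW/SUBW as a 32-bit add or subtract whose result is sign-extended to 64 bits. The sketch below shows those semantics and one way a two-limb u16 addition plus sign-extension limbs could produce the same result. The function names are hypothetical, and how the `alu_w_u16` adapter actually splits this work is not described in this thread.

```rust
const LIMB_BITS: u32 = 16;

/// ADDW reference semantics: add the low 32 bits, sign-extend to 64 bits.
fn addw(b: u64, c: u64) -> u64 {
    (b as u32).wrapping_add(c as u32) as i32 as i64 as u64
}

/// SUBW reference semantics: subtract the low 32 bits, sign-extend to 64 bits.
fn subw(b: u64, c: u64) -> u64 {
    (b as u32).wrapping_sub(c as u32) as i32 as i64 as u64
}

/// Hypothetical limb-level view: add only the two low 16-bit limbs,
/// then fill the two high limbs with the sign of the 32-bit result.
fn addw_limbs(b: u64, c: u64) -> [u16; 4] {
    let mut out = [0u16; 4];
    let mut carry = 0u32;
    for i in 0..2 {
        let bl = (b >> (LIMB_BITS * i as u32)) as u16 as u32;
        let cl = (c >> (LIMB_BITS * i as u32)) as u16 as u32;
        let sum = bl + cl + carry;
        out[i] = sum as u16; // low 16 bits of the limb sum
        carry = sum >> LIMB_BITS; // the carry out of bit 31 is dropped after the loop
    }
    let ext: u16 = if out[1] >> 15 == 1 { 0xFFFF } else { 0 };
    out[2] = ext;
    out[3] = ext;
    out
}

/// Recombine four little-endian 16-bit limbs into a 64-bit value.
fn limbs_to_u64(l: [u16; 4]) -> u64 {
    let mut acc = 0u64;
    for i in 0..4 {
        acc |= (l[i] as u64) << (LIMB_BITS * i as u32);
    }
    acc
}
```

The upper 32 bits of both inputs are ignored, so only two limbs per operand take part in the carry chain.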
@shuklaayush shuklaayush force-pushed the perf/split-base-alu-u16 branch from 917f93a to c056190 Compare June 19, 2026 10:53
@shuklaayush shuklaayush force-pushed the perf/split-base-alu-u16 branch from 1ad8909 to 3fe8036 Compare June 19, 2026 12:53
@shuklaayush shuklaayush merged commit a7ff92c into develop-v2.1.0-rv64 Jun 19, 2026
55 checks passed
@shuklaayush shuklaayush deleted the perf/split-base-alu-u16 branch June 19, 2026 14:14
@github-actions

| group | app.proof_time_ms | app.cycles | leaf.proof_time_ms |
| --- | ---: | ---: | ---: |
| fibonacci | 1,030 | 4,000,051 | 399 |
| keccak | 16,380 | 14,365,133 | 3,023 |
| sha2_bench | 8,230 | 11,167,961 | 1,002 |
| regex | 1,235 | 4,090,656 | 358 |
| ecrecover | 435 | 112,210 | 279 |
| pairing | 598 | 592,827 | 296 |
| kitchen_sink | 3,886 | 1,979,971 | 861 |

Note: cells_used metrics omitted because CUDA tracegen does not expose unpadded trace heights.

Commit: ad9a0f8

Benchmark Workflow

shuklaayush added a commit that referenced this pull request Jun 19, 2026
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Co-authored-by: Ayush Shukla <ayush@axiom.xyz>