feat(riscv): i64 div/rem (Phase 3) — inline software long division by avrabe · Pull Request #131 · pulseengine/synth

avrabe · 2026-05-22T18:00:23Z

Summary

Implements I64DivS, I64DivU, I64RemS, I64RemU in the RV32IMAC selector — the four ops deferred from i64 Phase 2 (#128). RV32 i64 integer arithmetic is now complete.

Approach: inline software long division

RV32IMAC's M extension only has 32-bit div/divu/rem/remu, with no 64-bit divide. Rather than invent a __divdi3 runtime-library contract (synth produces a self-contained bare-metal ELF with no runtime), this PR inlines a restoring binary long division.

emit_i64_udiv_inline is the shared unsigned core — a 64-iteration shift-subtract loop with the 64-bit <<1, unsigned compare, and subtract open-coded over the lo/hi pair. All four ops route through it:

div_u / rem_u — operands straight to the core.
div_s / rem_s — derive each operand's sign via sra hi, 31, reduce to magnitudes with branchless (x^mask)-mask, run the core, fix the result sign (quotient sign = sign(dividend) ^ sign(divisor); remainder sign = sign(dividend), per wasm truncated division).
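The unsigned core and the signed wrappers can be modeled on the host to make the data flow concrete. The Rust sketch below models the computation, not emit_i64_udiv_inline's emitted RV32 sequence. It keeps 64-bit values as (lo, hi) u32 halves and open-codes the shift, unsigned compare and subtract over the pair. The quotient register starts out holding the dividend. All names (Pair, shl1, geu, sub, udiv_core, magnitude, apply_sign, div_u, rem_u, div_s, rem_s) are hypothetical, and trapping is left out.

```rust
/// A 64-bit value split into two 32-bit halves, as it sits in an RV32 register pair.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Pair {
    lo: u32,
    hi: u32,
}

impl Pair {
    fn from_u64(x: u64) -> Self {
        Pair { lo: x as u32, hi: (x >> 32) as u32 }
    }
    fn to_u64(self) -> u64 {
        ((self.hi as u64) << 32) | self.lo as u64
    }
}

/// 64-bit `x << 1` over the pair; also returns the bit shifted out of `hi`.
fn shl1(x: Pair) -> (Pair, u32) {
    let out = x.hi >> 31;
    (Pair { lo: x.lo << 1, hi: (x.hi << 1) | (x.lo >> 31) }, out)
}

/// Unsigned 64-bit `a >= b`: decide on the high halves, fall back to the low halves.
fn geu(a: Pair, b: Pair) -> bool {
    if a.hi != b.hi { a.hi > b.hi } else { a.lo >= b.lo }
}

/// 64-bit `a - b` with the borrow out of the low half propagated into the high half.
fn sub(a: Pair, b: Pair) -> Pair {
    let borrow = (a.lo < b.lo) as u32;
    Pair { lo: a.lo.wrapping_sub(b.lo), hi: a.hi.wrapping_sub(b.hi).wrapping_sub(borrow) }
}

/// Unsigned shift-subtract long division, 64 iterations. Returns (quotient, remainder).
/// The caller must already have trapped on a zero divisor.
fn udiv_core(n: Pair, d: Pair) -> (Pair, Pair) {
    let mut quot = n; // starts out holding the dividend
    let mut rem = Pair { lo: 0, hi: 0 };
    for _ in 0..64 {
        // Shift (rem:quot) left as one 128-bit value: the dividend's top bit moves into rem.
        let (q, top) = shl1(quot);
        let (r, lost) = shl1(rem);
        // rem < 2^63 before every shift (it never exceeds the dividend bits consumed so far).
        debug_assert_eq!(lost, 0);
        quot = q;
        rem = Pair { lo: r.lo | top, hi: r.hi };
        if geu(rem, d) {
            rem = sub(rem, d);
            quot.lo |= 1; // record a 1 quotient bit in the slot freed by the shift
        }
    }
    (quot, rem)
}

fn div_u(n: u64, d: u64) -> u64 {
    udiv_core(Pair::from_u64(n), Pair::from_u64(d)).0.to_u64()
}

fn rem_u(n: u64, d: u64) -> u64 {
    udiv_core(Pair::from_u64(n), Pair::from_u64(d)).1.to_u64()
}

/// Branchless magnitude: mask is 0 or all-ones (like `sra hi, 31`), |x| = (x ^ mask) - mask.
fn magnitude(x: i64) -> (u64, u64) {
    let mask = (x >> 63) as u64;
    (((x as u64) ^ mask).wrapping_sub(mask), mask)
}

/// Apply a sign mask to a magnitude: negate iff the mask is all-ones.
fn apply_sign(v: u64, mask: u64) -> i64 {
    ((v ^ mask).wrapping_sub(mask)) as i64
}

fn div_s(n: i64, d: i64) -> i64 {
    let (un, sn) = magnitude(n);
    let (ud, sd) = magnitude(d);
    let (q, _) = udiv_core(Pair::from_u64(un), Pair::from_u64(ud));
    apply_sign(q.to_u64(), sn ^ sd) // quotient sign = sign(n) ^ sign(d)
}

fn rem_s(n: i64, d: i64) -> i64 {
    let (un, sn) = magnitude(n);
    let (ud, _) = magnitude(d);
    let (_, r) = udiv_core(Pair::from_u64(un), Pair::from_u64(ud));
    apply_sign(r.to_u64(), sn) // remainder sign = sign(n)
}
```

Because INT64_MIN's magnitude (2^63) still fits in a u64, the signed wrappers need no special case for it. Comparing against Rust's native `/`, `%`, `wrapping_div` and `wrapping_rem` is a cheap cross-check.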

Trap semantics

Divide by zero — ORs the full 64-bit divisor (or zo, dl, dh; bne zo, zero, ok; ebreak; ok:), traps iff both halves are zero. Matches the i32 div trap style.
INT64_MIN / -1 overflow — guarded for div_s only, gated on options.signed_div_overflow_trap like the i32 path. rem_s deliberately omits it — INT64_MIN % -1 == 0 must not trap; i64_rem_s_does_not_emit_overflow_trap pins this.
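The Rust model below sketches the trap rules above. It decides whether an op traps before any division happens, then falls back to native arithmetic. Op, Trap, check_traps and eval are hypothetical names. signed_div_overflow_trap stands in for the option of the same name. The wrapping result when that option is off is an assumption of this sketch, not documented behavior.

```rust
/// The four ops (hypothetical enum for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Op {
    DivS,
    DivU,
    RemS,
    RemU,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Trap {
    DivideByZero,
    SignedOverflow,
}

/// Trap decision taken before the division loop runs. Operands are raw 64-bit patterns.
fn check_traps(op: Op, n: u64, d: u64, signed_div_overflow_trap: bool) -> Result<(), Trap> {
    // Mirrors `or zo, dl, dh; bne zo, zero, ok; ebreak`: trap only if BOTH halves are zero.
    let (d_lo, d_hi) = (d as u32, (d >> 32) as u32);
    if (d_lo | d_hi) == 0 {
        return Err(Trap::DivideByZero);
    }
    // INT64_MIN / -1 overflows i64; only div_s guards it. rem_s of the same operands is 0.
    let is_overflow = n == i64::MIN as u64 && d == u64::MAX;
    if op == Op::DivS && signed_div_overflow_trap && is_overflow {
        return Err(Trap::SignedOverflow);
    }
    Ok(())
}

/// Reference semantics: trap checks, then native arithmetic
/// (wrapping when the overflow trap is disabled is an assumption of this sketch).
fn eval(op: Op, n: u64, d: u64, signed_div_overflow_trap: bool) -> Result<u64, Trap> {
    check_traps(op, n, d, signed_div_overflow_trap)?;
    Ok(match op {
        Op::DivU => n / d,
        Op::RemU => n % d,
        Op::DivS => (n as i64).wrapping_div(d as i64) as u64,
        Op::RemS => (n as i64).wrapping_rem(d as i64) as u64,
    })
}
```

A divisor whose low half is zero but whose high half is not, such as 1 << 32, must not trap. The full-width OR handles this.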

Note for review — `emit_parallel_move`

The long-division loop holds 7+ values live across the loop body, but the selector's alloc_temp is liveness-unaware round-robin. The core therefore claims a fixed register file (t0-t6, s1-s3) and copies inputs in via a new alias-safe emit_parallel_move helper (cycle-breaking through s7, outside the temp pool). This is the one structural addition beyond plain codegen — worth a look.
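A parallel move assigns several destinations from several sources simultaneously, so copying in arbitrary order can overwrite a source before it is read. A common way to sequentialize one, sketched here in Rust, works in two steps. First, emit any copy whose destination no pending copy still reads. Second, when only cycles remain, save one source in a scratch register (s7 in this PR) and redirect its readers. parallel_move, Reg and run are hypothetical names, not emit_parallel_move's real signature. The sketch assumes every destination appears at most once and that the scratch register is not part of any move.

```rust
/// Register number in this sketch (hypothetical; any small integer id works).
type Reg = u8;

/// Sequentialize the parallel move `dst <- src` for every (dst, src) pair, all at once,
/// into ordinary copies. Cycles are broken through `scratch`, which must not appear in
/// any move; each destination is assumed to appear at most once.
fn parallel_move(moves: &[(Reg, Reg)], scratch: Reg) -> Vec<(Reg, Reg)> {
    // Self-moves are no-ops and are dropped up front.
    let mut pending: Vec<(Reg, Reg)> = moves.iter().copied().filter(|&(d, s)| d != s).collect();
    let mut out = Vec::new();
    while !pending.is_empty() {
        // A copy is safe once no pending copy still needs to read its destination.
        let ready = (0..pending.len()).find(|&i| {
            let dst = pending[i].0;
            pending.iter().all(|&(_, src)| src != dst)
        });
        match ready {
            Some(i) => out.push(pending.remove(i)),
            None => {
                // Only cycles remain: save one source in `scratch` and redirect its readers,
                // which frees that register to be overwritten.
                let src = pending[0].1;
                out.push((scratch, src));
                for m in pending.iter_mut() {
                    if m.1 == src {
                        m.1 = scratch;
                    }
                }
            }
        }
    }
    out
}

/// Execute a copy sequence on a toy register file (for checking a sequence).
fn run(seq: &[(Reg, Reg)], regs: &mut [u32]) {
    for &(dst, src) in seq {
        regs[dst as usize] = regs[src as usize];
    }
}
```

With this scheme an acyclic group costs one copy per move, and a k-register cycle costs k + 1 copies.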

Tests

11 new tests, net +10 because one old test is replaced (148 → 158): one shape test per op (sequence shape + zero-divisor trap presence), the signed-overflow trap path, the rem_s no-trap distinction, the 64-iteration loop counter, i64-typed result plumbing, and 2 emit_parallel_move unit tests. The old i64_div_rem_are_unsupported_phase3 test is replaced by i64_div_rem_no_longer_unsupported.

Validation

cargo test --package synth-backend-riscv — 158 pass, 0 fail, 1 ignored.
cargo clippy --package synth-backend-riscv --all-targets -- -D warnings — clean.
cargo fmt --check — clean.

Cost / follow-ups

~30+ emitted instructions per div/rem op (signed wrappers add ~20 for sign handling); at run time the loop body executes 64 times. Acceptable for AOT embedded; a future optimization could share one routine across call sites.
Still out of scope (unchanged): sub-word sign-extending i64 loads (i64.load8_s…), f32/f64.

🤖 Generated with Claude Code

Implements I64DivS / I64DivU / I64RemS / I64RemU in the RV32 instruction selector, the four ops deferred from Phase 2. RV32IMAC's M extension only has 32-bit div/rem, so a 64-bit divide is lowered to an inline restoring binary long-division loop (64 iterations) rather than a call to a runtime helper, keeping synth's output self-contained.

- emit_i64_udiv_inline: the unsigned long-division core (shared by all four ops). The quotient register starts out holding the dividend, whose bits shift out as the remainder shifts in, with an open-coded 64-bit unsigned compare and subtract.
- lower_i64_div: single entry point. div_u/rem_u feed the core directly; div_s/rem_s reduce operands to magnitudes via sign masks, divide, then fix the result sign (quotient = sign(n)^sign(d), remainder = sign(n)).
- Zero-divisor trap checks the full 64-bit divisor (lo | hi == 0).
- Signed INT64_MIN/-1 overflow trap is emitted for div_s only; rem_s of the same operands yields 0 and correctly does not trap.
- emit_parallel_move: alias-safe copy-in of the long-division inputs to the core's fixed register file (the temp pool has no liveness tracking, so it is unsafe across the loop body).

11 new tests; 148 -> 158 passing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

codecov · 2026-05-22T18:21:26Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


v0.5.0: verification & robustness
- #133 validator pattern to full i32 + i64 surface (#76)
- #131 RV32 i64 div/rem (Phase 3 completes i64 integer)
- #132 panic-free ir_to_arm + macro fix + gating fuzz restored

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

avrabe merged commit 167b7b1 into main May 23, 2026
10 of 13 checks passed

avrabe deleted the feat/riscv-i64-divrem branch May 23, 2026 12:24

avrabe mentioned this pull request May 23, 2026

docs(changelog): cut v0.5.0 section #134

Merged

avrabe merged 1 commit into main from feat/riscv-i64-divrem
