feat(riscv): i64 div/rem (Phase 3) - inline software long division #131
Merged
Implements I64DivS / I64DivU / I64RemS / I64RemU in the RV32 instruction selector, the four ops deferred from Phase 2. RV32IMAC's M extension only has 32-bit div/rem, so a 64-bit divide is lowered to an inline restoring binary long-division loop (64 iterations) rather than a call to a runtime helper, keeping synth's output self-contained.

- emit_i64_udiv_inline: the unsigned long-division core (shared by all four ops); quotient holds the dividend and shifts out as the remainder shifts in, with an open-coded 64-bit unsigned compare and subtract.
- lower_i64_div: single entry point. div_u/rem_u feed the core directly; div_s/rem_s reduce operands to magnitudes via sign masks, divide, then fix the result sign (quotient = sign(n)^sign(d), remainder = sign(n)).
- Zero-divisor trap checks the full 64-bit divisor (lo | hi == 0).
- Signed INT64_MIN/-1 overflow trap is emitted for div_s only; rem_s of the same operands yields 0 and correctly does not trap.
- emit_parallel_move: alias-safe copy-in of the long-division inputs to the core's fixed register file (the temp pool has no liveness tracking, unsafe across the loop body).

11 new tests; 148 -> 158 passing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
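For reviewers who want something to check the emitted sequence against, here is a minimal Rust reference model of the lowering described in the commit message. It is a sketch under stated assumptions, not the selector's code: `udiv64_restoring`, `sdivrem64`, `Trap`, `split`, and `join` are hypothetical names introduced for illustration, values are modelled as (lo, hi) pairs of `u32` to mirror RV32, traps are modelled as `Err` values, and the signed-overflow trap is assumed to be enabled.

```rust
// Reference model only. All names below are hypothetical, not from the PR.
#[derive(Debug, PartialEq)]
enum Trap {
    DivByZero,
    Overflow,
}

// Split a 64-bit value into (lo, hi) 32-bit halves and back.
fn split(x: u64) -> (u32, u32) {
    (x as u32, (x >> 32) as u32)
}
fn join(p: (u32, u32)) -> u64 {
    (p.0 as u64) | ((p.1 as u64) << 32)
}

/// Restoring binary long division over (lo, hi) pairs.
/// Returns (quotient, remainder). The caller must rule out d == 0.
fn udiv64_restoring(n: (u32, u32), d: (u32, u32)) -> ((u32, u32), (u32, u32)) {
    let (mut q_lo, mut q_hi) = n; // quotient register starts as the dividend
    let (mut r_lo, mut r_hi) = (0u32, 0u32);
    let (d_lo, d_hi) = d;
    for _ in 0..64 {
        // Shift the 128-bit (remainder:quotient) pair left by one bit.
        // This cannot overflow: after k steps the remainder is below 2^k.
        r_hi = (r_hi << 1) | (r_lo >> 31);
        r_lo = (r_lo << 1) | (q_hi >> 31);
        q_hi = (q_hi << 1) | (q_lo >> 31);
        q_lo <<= 1;
        // Open-coded 64-bit unsigned compare: remainder >= divisor.
        if r_hi > d_hi || (r_hi == d_hi && r_lo >= d_lo) {
            // 64-bit subtract with borrow from the low half.
            let borrow = (r_lo < d_lo) as u32;
            r_lo = r_lo.wrapping_sub(d_lo);
            r_hi = r_hi.wrapping_sub(d_hi).wrapping_sub(borrow);
            q_lo |= 1; // record a 1 bit in the quotient
        }
    }
    ((q_lo, q_hi), (r_lo, r_hi))
}

/// Signed div/rem with wasm truncated-division semantics.
fn sdivrem64(n: i64, d: i64, want_rem: bool) -> Result<i64, Trap> {
    if d == 0 {
        return Err(Trap::DivByZero); // both halves of the divisor are zero
    }
    if !want_rem && n == i64::MIN && d == -1 {
        return Err(Trap::Overflow); // div_s only; rem_s yields 0 instead
    }
    // Sign masks: 0 for non-negative, -1 for negative.
    let n_mask = n >> 63;
    let d_mask = d >> 63;
    // Branchless magnitude: (x ^ mask) - mask.
    let n_mag = (n ^ n_mask).wrapping_sub(n_mask) as u64;
    let d_mag = (d ^ d_mask).wrapping_sub(d_mask) as u64;
    let (q, r) = udiv64_restoring(split(n_mag), split(d_mag));
    // Quotient sign = sign(n) ^ sign(d); remainder sign = sign(n).
    let (val, mask) = if want_rem {
        (join(r) as i64, n_mask)
    } else {
        (join(q) as i64, n_mask ^ d_mask)
    };
    Ok((val ^ mask).wrapping_sub(mask))
}
```

In this model, `sdivrem64(i64::MIN, -1, true)` returns `Ok(0)` while the same operands with `want_rem = false` return `Err(Trap::Overflow)`, which is the div_s/rem_s distinction the commit message calls out.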
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Summary

Implements I64DivS, I64DivU, I64RemS, I64RemU in the RV32IMAC selector: the four ops deferred from i64 Phase 2 (#128). RV32 i64 integer arithmetic is now complete.

Approach: inline software long division

RV32IMAC's M extension only has 32-bit div/divu/rem/remu; there is no 64-bit divide. Rather than invent a __divdi3 runtime-library contract (synth produces self-contained bare-metal ELF with no runtime), this inlines a restoring binary long division.

emit_i64_udiv_inline is the shared unsigned core: a 64-iteration shift-subtract loop with the 64-bit <<1, unsigned compare, and subtract open-coded over the lo/hi pair. All four ops route through it:

- div_u / rem_u: operands go straight to the core.
- div_s / rem_s: derive each operand's sign via sra hi, 31, reduce to magnitudes with branchless (x^mask)-mask, run the core, then fix the result sign (quotient sign = sign(dividend) ^ sign(divisor); remainder sign = sign(dividend), per wasm truncated division).

Trap semantics

- Zero divisor: checks the full 64-bit divisor (or zo, dl, dh; bne zo, zero, ok; ebreak; ok:), traps iff both halves are zero. Matches the i32 div trap style.
- INT64_MIN / -1 overflow: guarded for div_s only, gated on options.signed_div_overflow_trap like the i32 path. rem_s deliberately omits it, because INT64_MIN % -1 == 0 must not trap; i64_rem_s_does_not_emit_overflow_trap pins this.

Note for review: emit_parallel_move

The long-division loop holds 7+ values live across the loop body, but the selector's alloc_temp is liveness-unaware round-robin. The core therefore claims a fixed register file (t0-t6, s1-s3) and copies inputs in via a new alias-safe emit_parallel_move helper (cycle-breaking through s7, outside the temp pool). This is the one structural addition beyond plain codegen and is worth a look.

Tests

+11 tests (148 → 158): one shape test per op (sequence shape + zero-divisor trap presence), the signed-overflow trap path, the rem_s no-trap distinction, the 64-iteration loop counter, i64-typed result plumbing, and 2 emit_parallel_move unit tests. The old i64_div_rem_are_unsupported_phase3 test is replaced by i64_div_rem_no_longer_unsupported.

Validation

- cargo test --package synth-backend-riscv: 158 pass, 0 fail, 1 ignored.
- cargo clippy --package synth-backend-riscv --all-targets -- -D warnings: clean.
- cargo fmt --check: clean.

Cost / follow-ups

i64.load8_s…), f32/f64.

🤖 Generated with Claude Code
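As background for the emit_parallel_move review note, the following is a minimal Rust sketch of how a set of simultaneous register copies can be sequentialised without clobbering a source that is still needed. It illustrates the general technique only and is not the helper from this PR: `parallel_move`, the `u8` register ids, and the `scratch` parameter (standing in for the role s7 plays in the description) are assumptions made for illustration. It assumes all destinations are distinct and that `scratch` appears in none of the moves.

```rust
/// Turn a parallel move { dst <- src } into a safe sequence of single moves.
/// Each pair is (dst, src). Hypothetical model, not the PR's helper.
fn parallel_move(moves: &[(u8, u8)], scratch: u8) -> Vec<(u8, u8)> {
    // Drop no-op moves (dst == src); they need no instruction.
    let mut pending: Vec<(u8, u8)> =
        moves.iter().copied().filter(|&(d, s)| d != s).collect();
    let mut out = Vec::new();

    while !pending.is_empty() {
        // A move is safe to emit when no pending move still reads its destination.
        let safe = (0..pending.len()).find(|&i| {
            let dst = pending[i].0;
            pending.iter().all(|&(_, s)| s != dst)
        });
        match safe {
            Some(i) => out.push(pending.remove(i)),
            None => {
                // Only cycles remain: park one source in the scratch register
                // and redirect every reader of that source to the scratch copy.
                let src = pending[0].1;
                out.push((scratch, src));
                for m in pending.iter_mut() {
                    if m.1 == src {
                        m.1 = scratch;
                    }
                }
            }
        }
    }
    out
}
```

For a plain swap, `parallel_move(&[(1, 2), (2, 1)], 9)` yields three moves that go through register 9, which is why the scratch register has to live outside any pool the moves themselves can draw from.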