Add small-value delayed reduction with Barrett algorithm #111
wu-s-john wants to merge 1 commit into microsoft:main from
Add support for accumulating field × small_int products (i32, i64, i128) with delayed modular reduction using generic Barrett reduction:
- SmallValueField<V> trait for small integer ↔ field conversion
- WideMul trait for widening multiplication
- BarrettReductionConstants with compile-time computed μ = ⌊2^512/p⌋
- SignedWideLimbs<N> accumulator for signed product sums
- DelayedReduction<i32/i64/i128> implementations for all fields
PR: Add small-value delayed reduction with Barrett algorithm
Summary
This PR adds infrastructure for delayed modular reduction when accumulating field × small-integer products. This is a prerequisite for the upcoming small-value sumcheck optimization, which is a significantly larger change.
Motivation
In the sumcheck protocol, we frequently accumulate sums of products: Σ(field × value). Previously, we supported delayed reduction only for field × field products.
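A scaled-down sketch of what delayed reduction buys, assuming a toy 31-bit prime in place of a 256-bit field and plain `u64` values in place of Montgomery-form limbs (the modulus and function names are hypothetical). The eager version reduces after every multiply-add; the delayed version sums raw products in a wider `u128` and reduces once.

```rust
/// Toy prime modulus (2^31 - 1), standing in for a ~256-bit field prime.
const P: u64 = (1 << 31) - 1;

/// Eager: reduce after every multiply-add. Inputs must already be in [0, P).
fn dot_eager(a: &[u64], b: &[u64]) -> u64 {
    a.iter()
        .zip(b)
        .fold(0, |acc, (&x, &y)| (acc + x * y % P) % P)
}

/// Delayed: accumulate unreduced products in a wider integer, reduce once.
/// Each product is < 2^62, so a u128 accumulator can absorb 2^66 terms.
fn dot_delayed(a: &[u64], b: &[u64]) -> u64 {
    let acc: u128 = a
        .iter()
        .zip(b)
        .map(|(&x, &y)| (x * y) as u128)
        .sum();
    (acc % P as u128) as u64
}
```

Both functions return the same residue; the delayed one trades per-term reductions for a wider accumulator, which is the same trade the multi-limb `WideLimbs` accumulators make at 256-bit scale.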
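However, the small-value sumcheck uses coefficients that fit in native integers (`i32`, `i64`, `i128`). These are plain integers, not field elements in Montgomery form, so the existing field × field path cannot be reused directly.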
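Because these coefficients can be negative, the accumulator has to be signed. The sketch below is a toy illustration under stated assumptions: a 31-bit modulus instead of a 256-bit field, a hypothetical `small_to_field` standing in for `F::from(small)` (a negative value maps to `p - |v|`), and an `i128` accumulator standing in for a signed multi-limb accumulator such as `SignedWideLimbs<N>`. All names here are hypothetical.

```rust
/// Toy prime modulus (2^31 - 1).
const P: i128 = (1 << 31) - 1;

/// Hypothetical analogue of F::from(v) for a signed small integer:
/// maps v into [0, P), so negative values become P - |v| (mod P).
fn small_to_field(v: i64) -> u64 {
    (v as i128).rem_euclid(P) as u64
}

/// Accumulates Σ field_i × small_i in a signed i128 and reduces once.
/// Assumes field_i in [0, P) and |small_i| < 2^32, so |product| < 2^63.
fn signed_dot_delayed(field: &[u64], small: &[i64]) -> u64 {
    let acc: i128 = field
        .iter()
        .zip(small)
        .map(|(&f, &s)| f as i128 * s as i128)
        .sum();
    acc.rem_euclid(P) as u64
}

/// Reference: convert each small value to a field element first,
/// then multiply and reduce term by term.
fn signed_dot_reference(field: &[u64], small: &[i64]) -> u64 {
    let p = P as u64;
    field
        .iter()
        .zip(small)
        .fold(0, |acc, (&f, &s)| (acc + f * small_to_field(s) % p) % p)
}
```

The reference function mirrors the property the PR's tests check, `reduce(Σ field_i × small_i) == Σ (field_i * F::from(small_i))`, at toy scale.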
Why Barrett Reduction?
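Montgomery reduction computes x × R⁻¹ mod p. When both operands are in Montgomery form, (a·R) × (b·R) = ab·R², and REDC gives ab·R, exactly what we want.

Barrett reduction computes x mod p directly, without any R factor. When accumulating (field·R) × small_int, the product is field·R·small_int, so Barrett reduction yields (field·small_int)·R mod p: the product field·small_int in Montgomery form, exactly right. REDC, by contrast, would strip the R factor and leave a result that is no longer in Montgomery form.

The algorithm replaces the expensive division by a multiplication with the precomputed reciprocal μ = ⌊2^512/p⌋: it estimates the quotient q ≈ ⌊x/p⌋ from x·μ, computes r = x − q·p, and then corrects r with a final conditional subtraction.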
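A minimal sketch of the idea at toy scale, assuming a 32-bit modulus so every intermediate fits in `u64`/`u128`. Here μ = ⌊2^64/P⌋ plays the role of μ = ⌊2^512/p⌋, and R = 2^32 plays the role of the Montgomery factor. The names are hypothetical, and this simple quotient estimate is one common Barrett variant, not necessarily the exact one used in `barrett_reduce_6/7`.

```rust
/// Toy modulus 2^32 - 5, standing in for a 256-bit field prime.
const P: u64 = (1 << 32) - 5;
/// Barrett constant μ = ⌊2^64 / P⌋, the toy analogue of ⌊2^512 / p⌋.
const MU: u64 = ((1u128 << 64) / P as u128) as u64;

/// Reduces any x < 2^64 modulo P with one multiplication by μ
/// instead of a division.
fn barrett_reduce(x: u64) -> u64 {
    // Quotient estimate q = ⌊x·μ / 2^64⌋, which satisfies
    // ⌊x/P⌋ - 1 ≤ q ≤ ⌊x/P⌋ for every x < 2^64.
    let q = ((x as u128 * MU as u128) >> 64) as u64;
    // q·P ≤ x, so this cannot underflow; r lands in [0, 2P).
    let mut r = x - q * P;
    debug_assert!(r < 2 * P);
    // A single conditional subtraction finalizes the result.
    if r >= P {
        r -= P;
    }
    r
}

/// Toy Montgomery encoding with R = 2^32: a is stored as a·R mod P.
/// Requires a < P, so a·2^32 fits in u64.
fn to_mont(a: u64) -> u64 {
    (a << 32) % P
}
```

Because `to_mont(a) * s` equals (a·s)·R before reduction, `barrett_reduce(to_mont(a) * s)` equals `to_mont(a * s % P)`: plain reduction of a Montgomery-form value times a raw small integer stays in Montgomery form, with no R⁻¹ correction.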
Implementation Efficiency
The implementation needs at most one conditional subtraction to finalize the result:
- When 2p < 2^256: the remainder fits in 4 limbs, enabling a fast path with `mul_3x4_lo4` and a single `if r ≥ p: r -= p`.
- When 2p ≥ 2^256: the implementation uses 5-limb arithmetic with one correction.

The bound is tight because:
- the quotient estimate q satisfies q ≤ ⌊x/p⌋ ≤ q + 2,
- so in general r = x − q·p satisfies 0 ≤ r < 3p;
- for our field sizes, 0 ≤ r < 2p;
- after one conditional subtraction, 0 ≤ r < p ✓

In practice, for our field sizes a single correction suffices (verified by `debug_assert!`).

Verification of Barrett Constants
All precomputed constants are verified at test time against reference implementations using `num-bigint`:

| Constant | Verification |
|---|---|
| `BARRETT_MU` | ⌊2^512 / p⌋ computed with `BigUint`, compared against hardcoded limbs |
| `R384_MOD` | 2^384 mod p computed with `BigUint`, compared against hardcoded limbs |
| `USE_4_LIMB_BARRETT` | `debug_assert!` |

The test helpers in `field_reduction_constants.rs` implement this. Each field provider (BN254, Pasta, P256, T256) runs these tests via macros.
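A stdlib-only sketch of the same verification pattern at toy scale, assuming `u128` arithmetic as a stand-in for `BigUint` and a 32-bit modulus. The constant and helper names are hypothetical, not the ones in `field_reduction_constants.rs`. The block also includes a hypothetical `fits_4_limb_path` predicate for the 2p < 2^256 condition that selects the 4-limb fast path; whether `USE_4_LIMB_BARRETT` is derived exactly this way is an assumption.

```rust
/// Toy modulus 2^32 - 5 with constants "hardcoded" by hand, the way the
/// real BARRETT_MU / R384_MOD limbs are written out for each field.
const P: u64 = 0xFFFF_FFFB;
const HARDCODED_MU: u64 = 0x1_0000_0005; // claimed ⌊2^64 / P⌋
const HARDCODED_POW64_MOD: u64 = 25; // claimed 2^64 mod P

/// Reference value of ⌊2^64 / p⌋ via u128 (stands in for BigUint); needs p ≥ 2.
fn reference_mu(p: u64) -> u64 {
    ((1u128 << 64) / p as u128) as u64
}

/// Reference value of 2^exp mod p by repeated doubling; needs p < 2^63.
fn reference_pow2_mod(exp: u32, p: u64) -> u64 {
    let mut acc = 1 % p;
    for _ in 0..exp {
        acc = (acc << 1) % p; // acc < p < 2^63, so the shift cannot overflow
    }
    acc
}

/// For a 4-limb little-endian prime, 2p < 2^256 holds exactly when
/// p < 2^255, i.e. when the top bit of limb[3] is clear.
fn fits_4_limb_path(p: &[u64; 4]) -> bool {
    p[3] >> 63 == 0
}

/// Hypothetical test helper: recompute each constant and compare it
/// with the hardcoded value.
fn verify_constants() -> Result<(), String> {
    let mu = reference_mu(P);
    if mu != HARDCODED_MU {
        return Err(format!("mu mismatch: computed {:#x}, hardcoded {:#x}", mu, HARDCODED_MU));
    }
    let pow = reference_pow2_mod(64, P);
    if pow != HARDCODED_POW64_MOD {
        return Err(format!("2^64 mod P mismatch: computed {}, hardcoded {}", pow, HARDCODED_POW64_MOD));
    }
    Ok(())
}
```

For example, the Pallas modulus (top limb 0x4000000000000000) passes `fits_4_limb_path`, while P-256 (top limb 0xFFFFFFFF00000001) does not and falls back to the 5-limb path.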
Additionally, the Barrett reduction itself is tested against field arithmetic:
- `reduce(field × small) == field * F::from(small)`
- `reduce(Σ field_i × small_i) == Σ (field_i * F::from(small_i))`

Future Work: Specialized Pasta Reduction (Already Implemented)
The Pasta curves (Pallas and Vesta) have primes in a special pseudo-Mersenne form that enables significantly faster reduction. The moduli are defined in `src/provider/pasta.rs`, and their structure becomes clear when broken into 64-bit limbs (little-endian).
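For reference, a sketch that writes the published Pallas and Vesta moduli as little-endian limbs and extracts ε. The hex values are the standard Pasta moduli; the constant and function names are hypothetical and need not match `src/provider/pasta.rs`.

```rust
/// Pallas modulus 0x40000000000000000000000000000000224698fc094cf91b992d30ed00000001
/// as little-endian 64-bit limbs.
const PALLAS_P: [u64; 4] = [
    0x992d30ed00000001,
    0x224698fc094cf91b,
    0x0000000000000000,
    0x4000000000000000,
];

/// Vesta modulus 0x40000000000000000000000000000000224698fc0994a8dd8c46eb2100000001
/// as little-endian 64-bit limbs.
const VESTA_P: [u64; 4] = [
    0x8c46eb2100000001,
    0x224698fc0994a8dd,
    0x0000000000000000,
    0x4000000000000000,
];

/// Returns ε = limb[0] + limb[1]·2^64 after checking the 2^254 + ε shape.
fn epsilon(p: &[u64; 4]) -> u128 {
    assert_eq!(p[3], 1 << 62, "limb[3] must contribute exactly 2^254");
    assert_eq!(p[2], 0, "bits 128 to 191 must be empty");
    (p[1] as u128) << 64 | p[0] as u128
}
```

For both moduli the shape checks pass and ε < 2^126, so the low part of each prime fits in two limbs.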
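Both primes have the form p = 2^254 + ε, where:
- `limb[3] = 0x4000000000000000` contributes exactly 2^62 × 2^192 = 2^254
- `limb[2] = 0` (bits 128 to 191 are empty)
- ε = limb[0] + limb[1] × 2^64 spans only ~125 bits

This structure enables Solinas reduction, which exploits the congruence 2^254 ≡ −ε (mod p): writing x = x_hi·2^254 + x_lo gives x ≡ x_lo − x_hi·ε (mod p), so the high part is folded in with a multiplication by the short constant ε instead of a full-width quotient estimate. This is faster than generic Barrett reduction.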
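A toy sketch of Solinas-style folding, assuming the hypothetical modulus P = 2^32 + 15, which has the same 2^k + ε shape as the Pasta primes (k = 32 and ε = 15 instead of k = 254 and a ~125-bit ε). It is not `pasta_reduce_6`, which is not part of this PR; it only illustrates how the congruence 2^k ≡ −ε (mod p) replaces division with shifts, masks, and a small multiplication.

```rust
/// Toy modulus with the 2^k + ε shape: P = 2^32 + EPS, so 2^32 ≡ -EPS (mod P).
const EPS: i128 = 15;
const P: u64 = (1 << 32) + 15;

/// One folding step: write t = hi·2^32 + lo and replace 2^32 by -EPS,
/// giving t ≡ lo - hi·EPS (mod P) with a much smaller magnitude.
fn fold(t: i128) -> i128 {
    let hi = t >> 32; // arithmetic shift: floor(t / 2^32), may be negative
    let lo = t & 0xFFFF_FFFF; // t mod 2^32, always in [0, 2^32)
    lo - hi * EPS
}

/// Reduces any x < 2^64 modulo P without a division.
fn solinas_reduce(x: u64) -> u64 {
    // After fold 1: t in (-EPS·2^32, 2^32). After fold 2: t in [0, 2^32 + EPS^2).
    let t = fold(fold(x as i128));
    debug_assert!(t >= 0 && t < 2 * P as i128);
    let mut r = t as u64;
    if r >= P {
        r -= P; // single conditional subtraction
    }
    r
}
```

Two folds suffice here because ε is tiny; with a ~125-bit ε and 6-limb inputs, as in the Pasta case, the number of folds and the final correction bound may differ.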
Note: The specialized `pasta_reduce_6` implementation is already complete and tested. It will be included in a follow-up PR after this one is approved and merged, to keep the review scope manageable.

What's Included
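- `BarrettReductionConstants`: per-field constants, including μ = ⌊2^512/p⌋ computed at compile time
- `barrett_reduce_6/7`: Barrett reduction of 6- and 7-limb wide values
- `SignedWideLimbs<N>`: accumulator for signed product sums
- `SmallValueField<V>`: conversion between small integers and field elements
- `DelayedReduction<i32/i64/i128>`: delayed-reduction implementations for all fields

Overflow Capacity
- `WideLimbs<6>`
- `SignedWideLimbs<7>`

Sumcheck polynomials are bounded by practical sizes (≤ 2^40), so overflow cannot occur in practice.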
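A back-of-the-envelope check of that claim, under assumptions the PR does not state: field elements below 2^256, `WideLimbs<6>` summing products with 64-bit small values, and `SignedWideLimbs<7>` summing products with `i128` values (magnitude counted as 128 bits). If each of n terms is below 2^b in magnitude, the sum stays below n·2^b, so an accumulator that holds magnitudes below 2^c absorbs 2^(c−b) terms. The helper name is hypothetical.

```rust
/// Hypothetical helper: log2 of how many products can be summed into an
/// accumulator of `limbs` 64-bit limbs without overflow, when each
/// product's magnitude is below 2^product_bits. A signed accumulator
/// gives up one bit for the sign. Requires product_bits ≤ capacity.
const fn headroom_bits(limbs: u32, signed: bool, product_bits: u32) -> u32 {
    let capacity_bits = 64 * limbs - signed as u32;
    capacity_bits - product_bits
}
```

Under these assumptions the two accumulators leave 64 and 63 bits of headroom respectively, well above the 40 bits needed for 2^40 terms.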
Why Split This Out?
The small-value sumcheck PR that uses this infrastructure is substantially larger. Splitting the Barrett reduction foundation into its own PR makes review more manageable and establishes a clean abstraction boundary.