Conversation
Promote the static fmpz_mat_is_diagonal from acb_theta/sp2gz_decompose.c (marked "todo: move out?") to a public function in the fmpz_mat module.
Skip the expensive HNF/Iliopoulos pipeline when the input is already diagonal by detecting this case and calling fmpz_mat_snf_diagonal directly. Benchmarks show 13x-360x speedup on diagonal matrices.
Compute S = U * A * V where S is the Smith normal form and U, V are unimodular. Uses iterative Hermite normal form (alternating row and column HNF until diagonal), then xgcd-based row/column operations to fix the divisibility chain. Generalized to non-square matrices. Based on Jean Kieffer's implementation in acb_theta, with Phase 2 optimized from O(n^3) to O(n) per pair via direct row/column updates.
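The two-phase scheme described above can be sketched in pure Python. This is an illustration only, not FLINT's C implementation: `snf` and `_row_echelon` are hypothetical names, and a plain integer row-echelon reduction (unimodular row operations) stands in for FLINT's HNF.

```python
from math import gcd

def _row_echelon(A):
    """Phase 1 helper: integer row-echelon form via unimodular row
    operations (swap, negate, add integer multiples) -- a simplified
    stand-in for a Hermite normal form routine."""
    A = [row[:] for row in A]
    m, n = len(A), len(A[0])
    r = 0
    for c in range(n):
        piv = next((i for i in range(r, m) if A[i][c]), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        for i in range(r + 1, m):
            # Euclidean elimination of A[i][c] against the pivot A[r][c]
            while A[i][c]:
                q = A[r][c] // A[i][c]
                A[r] = [a - q * b for a, b in zip(A[r], A[i])]
                A[r], A[i] = A[i], A[r]
        if A[r][c] < 0:
            A[r] = [-x for x in A[r]]
        r += 1
    return A

def _transpose(A):
    return [list(col) for col in zip(*A)]

def snf(A):
    """Smith normal form of an integer matrix, returned as the list of
    diagonal entries.  Sketch of the two-phase algorithm: alternate
    row/column reduction until diagonal, then fix the divisibility chain."""
    def is_diagonal(M):
        return all(M[i][j] == 0 for i in range(len(M))
                   for j in range(len(M[0])) if i != j)
    # Phase 1: alternate row reduction and transposition until diagonal.
    while not is_diagonal(A):
        A = _transpose(_row_echelon(A))
    # Row negation is unimodular, so a non-negative diagonal is free.
    d = [abs(A[i][i]) for i in range(min(len(A), len(A[0])))]
    # Phase 2: enforce d[0] | d[1] | ... using diag(a, b) ~ diag(gcd, lcm).
    changed = True
    while changed:
        changed = False
        for i in range(len(d) - 1):
            a, b = d[i], d[i + 1]
            if (a == 0 and b == 0) or (a != 0 and b % a == 0):
                continue  # chain already holds at this pair
            g = gcd(a, b)
            d[i], d[i + 1] = g, (abs(a * b) // g if g else 0)
            changed = True
    return d
```

For example, `snf([[2, 1], [0, 2]])` gives `[1, 4]`: the gcd of the entries is 1 and the determinant is 4, so the chain is 1 | 4.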
Remove the static copies of fmpz_mat_is_diagonal and fmpz_mat_snf_transform from acb_theta/sp2gz_decompose.c, now that they are public functions in the fmpz_mat module.
Compute elementary divisors d1|d2|...|dr without full SNF using Luebeck's algorithm: compute HNF, factor the product of leading entries, then for each prime p iteratively find the left nullspace mod p to determine p-adic valuations. Falls back to full SNF when prime factors do not fit in ulong.
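For intuition (and as the kind of oracle a randomized cross-check could use), here is a pure-Python sketch of a different, textbook method — determinantal divisors, not Luebeck's algorithm: d_k = g_k / g_{k-1}, where g_k is the gcd of all k x k minors and g_0 = 1. `elementary_divisors_naive` is an illustrative name and is only practical for tiny matrices.

```python
from itertools import combinations
from math import gcd

def _det(M):
    # Laplace expansion; fine for the tiny minors used here
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * _det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def elementary_divisors_naive(A):
    """Elementary divisors d1 | d2 | ... via determinantal divisors:
    d_k = g_k / g_{k-1}, with g_k = gcd of all k x k minors (g_0 = 1)."""
    m, n = len(A), len(A[0])
    divisors, g_prev = [], 1
    for k in range(1, min(m, n) + 1):
        g = 0
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                g = gcd(g, _det([[A[i][j] for j in cols] for i in rows]))
        if g == 0:
            break  # rank reached; the remaining divisors are zero
        divisors.append(g // g_prev)
        g_prev = g
    return divisors  # nonzero divisors only; rank == len(divisors)
```

For `[[2, 1], [0, 2]]` this gives `[1, 4]`, matching the Smith normal form diag(1, 4).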
Benchmark fmpz_mat_snf across matrix sizes, bit sizes, and types (square nonsingular, square singular, non-square).
Test edge cases (0x0, 0xn, mx0, 1x1, non-square diagonal) and randomized tests for both diagonal and non-diagonal matrices.
…ivisors Test cases generated with Magma's SmithForm and ElementaryDivisors, covering structured matrices, prescribed divisibility chains, singular matrices, non-square matrices, single row/column, and negative diagonal entries.
t-snf: 5000 -> 3000 iterations
t-elementary_divisors: 3000 -> 2000 iterations
Replace the KB-vs-Iliopoulos cross-check with a comprehensive test that compares all five methods on small square nonsingular matrices: fmpz_mat_snf, snf_kannan_bachem, snf_iliopoulos, snf_transform, and elementary_divisors.
The old dispatcher used Kannan-Bachem for small matrices and det + Iliopoulos for larger ones. The iterative Hermite approach (alternating row/column HNF until diagonal) is faster across the board: 2-30x for square nonsingular matrices, with the biggest gains at larger bit sizes.
- Allow U and/or V to be NULL in fmpz_mat_snf_transform to skip computing unneeded transformation matrices
- Replace bounded for-loop with while(!is_diagonal) in SNF Phase 1 (convergence is mathematically guaranteed; FLINT_ASSERT is a no-op in release builds, so a bounded loop with an assert was unsafe)
- Move Xt/Mc allocation outside the Phase 1 loop in snf.c and snf_transform.c; defer Phase 2 temporaries to their scope
- Use fmpz_mat_window for R in elementary_divisors.c (avoids a copy)
- Use _fmpz_vec_scalar_divexact_ui instead of an element-by-element loop
- Add FLINT_ASSERT guards after Phase 1 and for the pivot row search
- Extract _check_snf_rand helper in t-snf.c to deduplicate 4 test loops
- Refactor cross-check tests to loop over methods with a switch
- Add NULL-parameter tests for snf_transform (NULL U, NULL V, both)
- Add documentation for is_diagonal, snf_transform, elementary_divisors
Add nmod_mat_left_nullspace which computes the left nullspace by row-reducing [A | I] directly, without transposing. Use it in fmpz_mat_elementary_divisors to simplify the left nullspace computation. Fix snf_transform.c Phase 2 indentation to properly nest inside its scoping block, remove redundant fmpz_set in the U update loop, and remove dead T1/T2 variables from the NULL-transform test.
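The augmentation trick described above can be illustrated in pure Python over GF(p). This is a sketch with a hypothetical name, not the nmod_mat code: row-reduce [A | I]; any row whose A-part becomes zero yields a left-nullspace vector in its I-part.

```python
def left_nullspace_mod_p(A, p):
    """Left nullspace of A over GF(p), p prime: rows x with x A = 0 mod p.
    Row-reduces the augmented matrix [A | I] without transposing."""
    m, n = len(A), len(A[0])
    # Augment each row of A (reduced mod p) with the matching row of I
    M = [[a % p for a in row] + [int(i == j) for j in range(m)]
         for i, row in enumerate(A)]
    r = 0
    for c in range(n):  # eliminate on the first n columns only
        piv = next((i for i in range(r, m) if M[i][c]), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        inv = pow(M[r][c], -1, p)           # normalize the pivot to 1
        M[r] = [x * inv % p for x in M[r]]
        for i in range(m):
            if i != r and M[i][c]:
                f = M[i][c]
                M[i] = [(x - f * y) % p for x, y in zip(M[i], M[r])]
        r += 1
    # Rows below the rank have a zero A-part; their I-part records the
    # row operations that produced the zero row, i.e. a nullspace vector.
    return [row[n:] for row in M[r:]]
```

For `[[1, 2], [2, 4]]` mod 5 the second row is twice the first, and the basis vector `[3, 1]` satisfies (3, 1) · A = (5, 10) = (0, 0) mod 5.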
Wouldn't it be more natural to return a matrix of row vectors such that `X A = 0`? Also, this function can be implemented in terms of `nmod_mat_nullspace`. Maybe we should have a function …
Not that this needs to be done in this PR, but it would at least be nice if …
void fmpz_mat_elementary_divisors(fmpz * ed, slong * rank_out,
rank_out could be the return value instead of being passed as a pointer?
I guess the 32-bit test failure is due to overflow in the hardcoded test case.
Thank you! I appreciate the feedback! Indeed, I first had the left nullspace via transpose, but, thinking I was being lazy, I rewrote it via the augmentation... I can see its issues now.
In … Also, since the factorisation isn't used when a prime doesn't fit in a `ulong` …
The hardcoded expected diagonal `3395595701880` overflows a 32-bit `slong`, causing t-snf_transform to fail on 32-bit platforms. Change the expected-SNF parameter of the helper to accept decimal strings and build the target `fmpz_t` via `fmpz_set_str`, making every case portable across 32/64-bit.
Changes addressing Fredrik Johansson's review on flintlib#2627:

* `nmod_mat_left_nullspace`: return row vectors (`X A = 0`) and a non-reduced basis, matching the convention of `nmod_mat_nullspace`. Implement via transpose + `nmod_mat_nullspace` instead of augmenting `[A | I]` and row-reducing, which is faster and reuses existing infrastructure.
* `fmpz_mat_elementary_divisors`: return the rank as the function value rather than via an out-parameter pointer.
* `fmpz_mat_elementary_divisors`: factor each HNF pivot individually rather than factoring their product. Use `fmpz_factor_smooth` with a `FLINT_BITS` smoothness bound and abort early (falling back to full SNF) as soon as a pivot has a prime factor that does not fit in a `ulong`. Guard against pathological inputs by also falling back when a pivot exceeds `2 * FLINT_BITS` bits, avoiding effectively unbounded factoring on very large entries. Collect the union of ulong-sized primes so each is passed to the Luebeck inner loop only once.
* The caller of `nmod_mat_left_nullspace` inside the Luebeck step is updated to allocate and index the nullspace matrix with rows as basis vectors, matching the new API.
…ary_divisors_p

FLINT_ASSERT is a no-op in release builds. The pivot-column identification loop relies on nmod_mat_nullspace returning an RREF-derived basis where each vector has a unique free-variable column; if that invariant ever breaks (future refactor of src/nmod_mat/nullspace.c), the subsequent fmpz_mat_row(R, -1) would be undefined behaviour. Fail loudly instead.
The Phase 1 alternating row/column HNF loop in fmpz_mat_snf and fmpz_mat_snf_transform was unbounded after 4e5d379; if a future bug in fmpz_mat_hnf or fmpz_mat_hnf_transform broke convergence the loop would hang silently. Add a defensive iteration ceiling derived from Kannan & Bachem (1979, SIAM J. Comput. 8, Theorem 5): O(mmn^2 * log(mmn * ||A||)) HNF passes, padded 4x and saturated at WORD_MAX. Practical iteration counts are orders of magnitude below this bound, so the throw only triggers on true runaways.
The previous wording called the returned basis "(non-reduced)", which is misleading because the backing nmod_mat_nullspace computes RREF and reads the basis off from it. State the actual structural property (unique free-variable column per row) that downstream callers such as fmpz_mat_elementary_divisors rely on.
State both fallback triggers explicitly: pivots above 2*FLINT_BITS bits, and pivots whose smooth factorization at FLINT_BITS leaves a composite cofactor. The previous wording collapsed them into a single sentence that was imprecise about which condition actually triggers the fallback.
Commit 3a2fab7 described two fallback triggers but the implementation at src/fmpz_mat/elementary_divisors.c:284, :295, and :307 has three distinct triggers: bit-size above 2*FLINT_BITS, a composite cofactor left by fmpz_factor_smooth, and a prime factor exceeding ulong (which arises separately because fmpz_factor_smooth may return 1 while still yielding a probable-prime factor too large for ulong). Name all three, and replace the non-ASCII em-dash with a plain "--" for consistency with the rest of doc/source/.
The Kannan-Bachem iteration bound was duplicated between snf.c and snf_transform.c (added in c31b103). Move it into its own file so both callers share a single implementation.
… fmpz_mat_

CLAUDE.md calls for internal helpers to follow the _module_foo() naming convention. Rename _elementary_divisors_p and _elementary_divisors_via_snf accordingly.
…tions

Move t-elementary_divisors after the t-det_* block in fmpz_mat/test/main.c and move t-left_nullspace after t-invert_rows_cols in nmod_mat/test/main.c (both the #include list and the TEST_FUNCTION array).
Spell out that S may alias A, while U and V must be distinct from A, from S, and from each other (U aliasing A is unsafe because _fmpz_mat_snf_iter_bound(A) reads A after U and V have been initialised).
fmpz_factor_smooth trial-divides only up to 15 bits and uses ECM above that, so saying "trial-dividing primes of up to FLINT_BITS bits" misnamed the mechanism. Name fmpz_factor_smooth explicitly and describe the bound.
Smith normal form requires a non-negative diagonal; snf_transform negates rows of U at the end to enforce it. Mention this post-step alongside the alternating HNF and divisibility-chain phases.
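Why this post-step is safe can be seen in a few lines of illustrative Python (`fix_signs` is a hypothetical name): negating row i of S is multiplication on the left by diag(1, ..., -1, ..., 1), which is unimodular, so negating the same row of U preserves S = U A V.

```python
def fix_signs(S, U):
    """Post-step sketch: make the diagonal of S non-negative by negating
    the offending row of both S and U.  A row negation is multiplication
    on the left by a unimodular diag(+-1) matrix, so S = U A V still holds."""
    for i in range(min(len(S), len(S[0]))):
        if S[i][i] < 0:
            S[i] = [-x for x in S[i]]
            U[i] = [-x for x in U[i]]
    return S, U
```

With A = diag(-2, 3) and U = V = I, the step flips one row of U and leaves U A equal to the corrected S = diag(2, 3).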
- Short-circuit fmpz_mat_snf_transform to fmpz_mat_snf when both U and V are NULL; this closes the transform-vs-no-transform gap benchmarked in the PR body.
- Replace hand-rolled U row-copy/update/negate loops in snf_transform with _fmpz_vec_set, _fmpz_vec_scalar_mul_fmpz, _fmpz_vec_scalar_addmul_fmpz, and _fmpz_vec_neg.
- Hoist nmod_mat_init/clear for Rp out of the Luebeck nullspace loop in elementary_divisors (single init/clear pair instead of per level).
- Drop the unused int section field from snf_params_t in p-snf.c.
- Drop a stale narration comment in acb_theta/sp2gz_decompose.c.
the commits got a bit out of hand, but to avoid rewriting history right now, I let them be. |
Building on #2596, this PR adds `fmpz_mat_snf_transform` (the feature request in flint2#204), `fmpz_mat_elementary_divisors`, and replaces the SNF dispatcher with a faster algorithm.

`fmpz_mat_snf_transform(S, U, V, A)` computes S = UAV where S is the Smith normal form and U, V are unimodular, for arbitrary m x n matrices. Either U or V (or both) may be NULL to skip computing the corresponding transform. The algorithm alternates row and column HNF until the matrix is diagonal, then fixes the divisibility chain with xgcd-based row/column operations (O(n) per pair rather than the O(n^3) matrix multiplications in the original `acb_theta` implementation by Jean Kieffer). This also removes ~100 lines of static duplicates from `acb_theta/sp2gz_decompose.c`.

`fmpz_mat_elementary_divisors(ed, rank, A)` computes d_1 | d_2 | ... | d_r without full SNF using Luebeck's algorithm: compute HNF, factor the product of the pivots, then for each prime p iteratively compute the nullspace mod p to determine p-adic valuations. It falls back to full SNF if any prime factor does not fit in a `ulong`. This uses the new `nmod_mat_left_nullspace`, which computes the left nullspace directly by row-reducing [A | I] rather than transposing.

`fmpz_mat_is_diagonal` is promoted from a static function in `acb_theta/sp2gz_decompose.c` (which had a `/* todo: move out? */` comment).

The `fmpz_mat_snf` dispatcher now uses the iterative Hermite algorithm instead of Kannan-Bachem/Iliopoulos, avoiding the cost of determinant computation and Iliopoulos' modular reduction. There is also a fast path for already-diagonal matrices (44-360x speedup).

Timings (us) for square nonsingular matrices, comparing the new dispatcher (`snf`), `snf_transform` without and with U/V, Kannan-Bachem (KB), Iliopoulos (Ilio), and `elementary_divisors` (`ed`). KB/Ilio skipped for n > 20, `ed` skipped for 200-bit entries where factorization dominates:

The new dispatcher is 5-37000x faster than Kannan-Bachem and 3-9x faster than Iliopoulos at sizes 10-20. Passing NULL for U/V saves 15-40% at larger sizes. `elementary_divisors` is competitive for small-bit entries but depends on factorization.

Diagonal fast path (us):

Tests include Magma-verified hardcoded cases and randomized cross-checks between all five SNF implementations (dispatcher, Kannan-Bachem, Iliopoulos, `snf_transform`, `elementary_divisors`). Documentation added for all new functions.