Add flint_mpn_mul et al.; use in fmpz_mul, fmpz_addmul and other places #1141
I'm not sure why there is a test failure for fmpz_mod_mul with MinGW GCC. However, valgrind reports errors even without this patch, so there may be a preexisting bug in fmpz_mul_mod or elsewhere that we have to fix.

I believe fmpz_add_ui/fmpz_sub_ui are buggy: they can write two limbs even when only one is allocated. This will need fixing in 2.9...

It's sad that we are so far behind GMP up to 32000 limbs. I guess the MPIR stuff was the reason. Otherwise I could believe the crossover had moved up to 2000 limbs, but 32000 is far too high. And yes, the buggy add/sub_ui will have to be fixed for 2.9-beta3.

Perhaps @albinahlback could write test code that exercises every case the new code introduces, so that this kind of bug is caught by the test suite. I think it is very important to get those functions 100% correct.

Presumably that's mostly due to the missing mpn_sumdiff_n, which doubles the memory accesses in the butterflies? BTW, I looked at this function in MPIR, and the generic version seems to be defined differently from the one in mpn_extras.h (it performs an extra negation). I don't understand how the FFT can be correct with both...
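To make the memory-access point concrete, here is a minimal portable sketch of a fused sum/difference on limb arrays. `toy_sumdiff_n` is a hypothetical name, not FLINT's or MPIR's `mpn_sumdiff_n`, and its return convention (2 × carry + borrow) is an assumption chosen for illustration; conventions like this, including which operand is subtracted from which, are exactly where two generic versions can quietly differ. Each limb of `x` and `y` is loaded once and used for both results, whereas separate `mpn_add_n` and `mpn_sub_n` calls read both inputs twice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;

/* Hypothetical one-pass sum/difference: s = x + y and d = x - y on n limbs.
   Returns 2 * carry + borrow (an illustrative convention, not MPIR's).
   Both input limbs are loaded before any store, so s or d may alias x or y. */
static limb_t toy_sumdiff_n(limb_t *s, limb_t *d,
                            const limb_t *x, const limb_t *y, size_t n)
{
    assert(n >= 1);  /* mpn-style routines assume a nonzero length */
    limb_t cy = 0, bw = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t a = x[i], b = y[i];

        /* sum with carry in; c1 and c2 cannot both be 1 */
        limb_t t = a + b;
        limb_t c1 = (t < a);
        limb_t u = t + cy;
        limb_t c2 = (u < t);
        s[i] = u;
        cy = c1 | c2;

        /* difference with borrow in; b1 and b2 cannot both be 1 */
        limb_t v = a - b;
        limb_t b1 = (a < b);
        limb_t w = v - bw;
        limb_t b2 = (v < bw);
        d[i] = w;
        bw = b1 | b2;
    }
    return 2 * cy + bw;
}
```

A unit test for the real function would pin down the return value and the sign of `d` on inputs with `x < y`, which is where an extra negation would show up.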
|
I'm inclined just to re-rewrite add/sub_ui. Without gotos. |
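A minimal sketch of the invariant such a rewrite has to respect, on a hypothetical toy limb-array type (`toy_int`, `toy_fit_length` and `toy_add_ui` are illustrative names, not FLINT's API or its fmpz representation): a carry out of the top limb is the only case that needs an extra limb, so storage must be grown before that limb is written. It uses a plain loop and early returns instead of gotos.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t limb_t;

/* Hypothetical toy unsigned bignum: little-endian limbs, zero has size 0. */
typedef struct {
    limb_t *d;
    size_t size;   /* limbs in use */
    size_t alloc;  /* limbs allocated */
} toy_int;

/* Ensure at least n limbs are allocated. Returns 0 on success, -1 on OOM. */
static int toy_fit_length(toy_int *r, size_t n)
{
    if (r->alloc >= n)
        return 0;
    limb_t *p = realloc(r->d, n * sizeof(limb_t));
    if (p == NULL)
        return -1;
    r->d = p;
    r->alloc = n;
    return 0;
}

/* r += x. The carry may ripple through every limb; only a carry out of
   the top limb needs one more limb, and that limb is allocated before it
   is written. Returns 0 on success, -1 on OOM. */
static int toy_add_ui(toy_int *r, limb_t x)
{
    limb_t carry = x;
    for (size_t i = 0; i < r->size && carry != 0; i++) {
        limb_t s = r->d[i] + carry;
        carry = (s < carry);  /* the sum wrapped iff it is below an addend */
        r->d[i] = s;
    }
    if (carry != 0) {
        if (toy_fit_length(r, r->size + 1) != 0)
            return -1;
        r->d[r->size] = carry;
        r->size++;
    }
    assert(r->size <= r->alloc);  /* never write past the allocation */
    return 0;
}
```

The interesting test inputs are the ones that make the carry ripple out of the top limb, e.g. a single all-ones limb plus 1, stored with no spare allocation.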
|
|
Trying with rewritten fmpz_addmul_ui/fmpz_submul_ui. |
|
As MinGW is the OS where
|
I wouldn't merge this until some profiling has been done. I don't see any use of
|
|
A systematic reason why we often don't catch bugs like this is that most test code doesn't randtest output variables before use, so we often just test the case where the output variable is 0. Setting output variables to random values is something I started doing in Arb a while ago and definitely helps. Plus wanting more FLINT_ASSERTs of course. I should also get into the habit of valgrinding the entire test suite more frequently. |
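The pattern is easy to sketch outside FLINT. Below, `ref_add_n`, `buggy_add_n` and `finds_bug` are hypothetical names; `buggy_add_n` is deliberately wrong (it accumulates into the output instead of overwriting it), the kind of bug a test that zeroes the output first can never see. In FLINT itself one would presumably call `fmpz_randtest` on the output variable before calling the function under test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t limb_t;
typedef limb_t (*add_fn)(limb_t *, const limb_t *, const limb_t *, size_t);

/* Reference: r = x + y on n limbs, returns the carry out. */
static limb_t ref_add_n(limb_t *r, const limb_t *x, const limb_t *y, size_t n)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t t = x[i] + y[i];
        limb_t c1 = (t < x[i]);
        limb_t u = t + cy;
        cy = c1 | (u < t);
        r[i] = u;
    }
    return cy;
}

/* Deliberately buggy: adds into r[i] instead of overwriting it, so it is
   only correct when the output happens to start out as zero. */
static limb_t buggy_add_n(limb_t *r, const limb_t *x, const limb_t *y, size_t n)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t t = x[i] + y[i];
        limb_t c1 = (t < x[i]);
        limb_t u = t + cy;
        cy = c1 | (u < t);
        r[i] += u;  /* BUG: should be r[i] = u */
    }
    return cy;
}

/* rand() only guarantees 15 random bits, so combine several calls. */
static limb_t rand_limb(void)
{
    limb_t v = 0;
    for (int k = 0; k < 5; k++)
        v = (v << 15) ^ (limb_t) rand();
    return v;
}

/* Compare fn against ref_add_n on random inputs. If randomize_output is
   nonzero, the output is filled with garbage before the call (the habit
   described above); otherwise it is zeroed. Returns 1 on a mismatch. */
static int finds_bug(add_fn fn, int randomize_output, int iters)
{
    assert(iters > 0);
    enum { N = 4 };
    limb_t x[N], y[N], got[N], want[N];
    for (int it = 0; it < iters; it++) {
        for (size_t i = 0; i < N; i++) {
            x[i] = rand_limb();
            y[i] = rand_limb();
            got[i] = randomize_output ? rand_limb() : 0;
        }
        limb_t c_got = fn(got, x, y, N);
        limb_t c_want = ref_add_n(want, x, y, N);
        if (c_got != c_want)
            return 1;
        for (size_t i = 0; i < N; i++)
            if (got[i] != want[i])
                return 1;
    }
    return 0;
}
```

With a zeroed output, `finds_bug(buggy_add_n, 0, 100)` reports nothing; with a randomized output, the same bug is caught on the first iteration.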
|
We should probably enable asserts on CI.
|
What is the speed with 10 or fewer limbs?
|
I will try to profile small operands this afternoon. |
|
I revised the profiling code to try a vector of different inputs. This might be more meaningful than recycling the same operand in a loop since these are branch-heavy functions. Some high-level benchmarks would be even better though. |
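A minimal sketch of the idea, assuming a stand-in workload: `operand`, `op`, `fill_random` and `bench_vector` are hypothetical names, and `op` merely imitates a routine whose control flow depends on operand length. Timing a pass over a vector of varied operands keeps the branch predictor from learning a single path, which a loop over one recycled operand allows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

typedef uint64_t limb_t;

/* Hypothetical operand with 1 to 4 limbs. */
typedef struct {
    limb_t d[4];
    size_t n;
} operand;

/* Stand-in for a branch-heavy small-size routine: the path taken depends
   on the operand length, as with size-specialized fast paths. */
static limb_t op(const operand *x)
{
    limb_t s = 0;
    switch (x->n) {
    case 4: s += x->d[3]; /* fall through */
    case 3: s += x->d[2]; /* fall through */
    case 2: s += x->d[1]; /* fall through */
    case 1: s += x->d[0]; break;
    default: break;
    }
    return s;
}

/* Fill v with operands of random length so that no single path dominates. */
static void fill_random(operand *v, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        v[i].n = 1 + (size_t) (rand() % 4);
        for (size_t j = 0; j < 4; j++)
            v[i].d[j] = (limb_t) rand();
    }
}

/* Average seconds per call over reps passes through the whole vector.
   The checksum is written to *sum so the work stays observable. */
static double bench_vector(const operand *v, size_t count, int reps, limb_t *sum)
{
    assert(count > 0 && reps > 0);
    limb_t acc = 0;
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        for (size_t i = 0; i < count; i++)
            acc += op(&v[i]);
    clock_t t1 = clock();
    *sum = acc;
    return (double) (t1 - t0) / CLOCKS_PER_SEC / ((double) reps * (double) count);
}
```

To reproduce the old style of measurement for comparison, pass a vector in which every entry is the same operand; the gap between the two timings hints at how much branch prediction flattered the single-operand loop.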
Don't think so. The double memory accesses are much less of a problem than you might imagine. |
|
Where is the commit that fixes fmpz_add_ui and friends, so that it can be applied to the 2.9 branch?
|
It would be better if there were some test code that would have failed with the previous version of add/sub_ui. But the profiles look good. I don't recall anything about the extra negation in MPIR. Both Flint and MPIR are tested pretty well, so I'd be surprised if there weren't some compensating change somewhere.

This needs more profiling.
fmpz_poly_mul_KS had a cutoff of 1000 limbs for switching to the FLINT FFT, which looks too low. I can't get a consistent speedup over mpn_mul below 32000 limbs on my machine. Perhaps that's because the FFT is missing assembly optimizations without MPIR? Anyway, this puts the FFT threshold in one obvious place (mpn_extras.h), where we can easily replace the SS-FFT with Dan's FFT when that's available.
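A minimal sketch of what keeping the cutoff in one place buys: every caller asks one function, so swapping in a different FFT or re-tuning means editing a single constant. `TOY_FFT_MUL_THRESHOLD`, `mul_algorithm` and `choose_mul` are hypothetical names, not the macro this PR adds to mpn_extras.h, and keying the decision on the smaller operand is an illustrative assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single tuning knob, in limbs. 32000 reflects the
   measurement above; it is machine-dependent and would change again
   if the SS-FFT were replaced by a faster FFT. */
#define TOY_FFT_MUL_THRESHOLD 32000

typedef enum { MUL_WITH_MPN_MUL, MUL_WITH_FFT } mul_algorithm;

/* Pick the algorithm for an n1 x n2 limb product, assuming n1 >= n2 >= 1
   (the same precondition as GMP's mpn_mul). Keying on the smaller operand
   keeps very unbalanced products off the FFT path. */
static mul_algorithm choose_mul(size_t n1, size_t n2)
{
    assert(n1 >= n2 && n2 >= 1);
    return (n2 >= TOY_FFT_MUL_THRESHOLD) ? MUL_WITH_FFT : MUL_WITH_MPN_MUL;
}
```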
I haven't profiled fmpz_mul, fmpz_addmul and fmpz_submul carefully at small sizes. There could be some GMP tricks that I'm missing.
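One small-size trick worth checking (an assumption about where time might go, not a claim about what GMP or this PR does) is routing a single-limb multiplier to a dedicated multiply-by-one-limb loop, the job GMP's `mpn_mul_1` does, instead of the general product. Below is a portable sketch; `umul_64` and `toy_mul_1` are hypothetical names, and real code would use a compiler intrinsic or assembly for the 64×64→128-bit product rather than 32-bit halves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;

/* Full 64x64 -> 128-bit product from four 32x32 -> 64-bit products. */
static void umul_64(limb_t *hi, limb_t *lo, limb_t a, limb_t b)
{
    uint64_t a0 = (uint32_t) a, a1 = a >> 32;
    uint64_t b0 = (uint32_t) b, b1 = b >> 32;
    uint64_t p00 = a0 * b0, p01 = a0 * b1;
    uint64_t p10 = a1 * b0, p11 = a1 * b1;
    /* Middle column: at most 3 * (2^32 - 1), so it cannot overflow. */
    uint64_t mid = (p00 >> 32) + (uint32_t) p01 + (uint32_t) p10;
    *lo = (mid << 32) | (uint32_t) p00;
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}

/* r = x * b for an n-limb x and a single limb b; returns the high limb.
   One pass over x, with no temporary space and no general dispatch.
   x[i] is read before r[i] is written, so r may equal x. */
static limb_t toy_mul_1(limb_t *r, const limb_t *x, size_t n, limb_t b)
{
    assert(n >= 1);
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t hi, lo;
        umul_64(&hi, &lo, x[i], b);
        lo += carry;
        hi += (lo < carry);  /* hi <= 2^64 - 2 here, so this cannot wrap */
        r[i] = lo;
        carry = hi;
    }
    return carry;
}
```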