implement fast linear algebra modulo n < 2^31 #4968
Comments
Replying to @malb:
I am not sure using long here is a good idea, since on various platforms we want to support sizeof(long) == sizeof(int). You might want to replace it with some typedef so we can make it long long on those platforms. Cheers, Michael
comment:2
Is your point that the maximum prime for which we use …
comment:3
Won't that have negative implications? For example, on a 32-bit machine all matrices will take twice as much memory to store, and all arithmetic will be an order of magnitude slower. That certainly wouldn't make me happy!
comment:4
Replying to @williamstein:
Well, assuming we will transparently switch over to another implementation (using gmpz's for example), we should use the "fast" integer type, i.e. 32-bit ints and a 64-bit int type on 64-bit boxen. But my original point remains: on some platforms sizeof(long) == sizeof(int) == 4, so using either a C99 type or some platform-dependent typedef is what we need here to get optimum performance on all 64-bit platforms. Since we are already requiring C99 due to FLINT, we might as well go for that solution. Cheers, Michael
comment:5
I don't know what …
comment:7
I'd say this is now a duplicate of #4260 and should be closed?
Dependencies: #10281
comment:11
Replying to @burcin:
I elected not to do this because: …
comment:12
matrix_window_modn_dense_uint64.pyx looks an awful lot like matrix_window_modn_dense.pyx (and the same could be said of the non-window files); is there any reason to keep both?
comment:13
William, are you still working on this code? I was about to start reviewing, but then saw that you didn't reply to Robert's comment, hence I am wondering how high a priority it is for you at the moment.
comment:14
Replying to @malb:
Yes! Please review! Regarding Robert's comment: no, there is no reason to keep both. But maybe I can put all the cleanup in another patch, just in case there is a reason, and to keep things simple. I care very much about this code.
Apply only this patch.
comment:15
Attachment: 4968-squashed.patch.gz

I've squashed this into just changing matrix_modn_dense directly, which should make it both easier to see the changes and avoid adding a (modified) second copy of everything.
comment:16
It's also rebased on top of sage-5.0.beta15. |
comment:17
32-bit vs 64-bit

```
if use_32bit_type(p):
    raise RuntimeError, "BUG: p (=%s) is too small to use this matrix type"%p
```

Why raise an error? Of course, the LinBox matrices take care of that by default; I am just confused by this. Also, this seems to contradict "If the prime is small enough to fit in a byte" later.

mod_int vs. mod_int_uint64

Other stuff
comment:18
The unsigned long thing seems OK (int_fast64_t is a C type, defined in stdint.h).

For consistency it would be good if MAX_MODULUS was INTEGER_MOD_INT64_LIMIT, to be found in sage/rings/finite_rings/stdint.h; note this is one less. In integer_mod.pyx the modulus is tested using <=, and in the patch < is used.

Apart from this, I more or less agree with all points made previously. If you don't see a good reason to change the Strassen cutoff, I would leave it as is; file a new ticket instead. We should optimize it, but changing it based on one observation is daring.

Honestly, I am against the if use_32bit_type(p) test. The matrix works perfectly for such p. Even though it shouldn't be called, it may be called.

If anybody has time to make the changes, I will test it within the next two weeks and it will have a positive review.
comment:19
Replying to @sagetrac-mraum:
You are right, I got confused about Cython ctypedefs. "unsigned long" does not appear in a meaningful way in the generated C files. Sorry for the noise. Still, the documentation is wrong in various places and should be fixed (machine-sized ints).
Fair enough, so let's make more observations. This code is about performance. I am not saying we should run a comprehensive optimisation suite, but not even trying to use cutoffs which make the code reasonably fast seems wrong.
comment:20
I don't think this code is about performance. The new code will only be used in cases that were not available previously (see the cases treated in matrix_int_dense). This is why I would like to separate this patch and the optimization. Also, please keep in mind that William is not the only one desperate to have Sage use larger moduli (me certainly; John and his student recently posted on sage-devel; and most likely many more).
comment:21
Replying to @sagetrac-mraum:
Sage can compute with larger primes; it's just terribly slow. This patch fixes that.
Fair enough. I am not asking anyone to spend days and days improving the performance, but I am asking those who care about this patch to spend 10 minutes testing various cutoff parameters so that we have a reasonable default. If there are that many people who care about this, it should be easy.
comment:22
I acknowledge that we really see this ticket from different perspectives. Would you be so kind as to run some of the tests that you think give a reasonable cutoff? If you post the commands, I can run them myself (on AMD and Intel). Based on this we can choose one.
comment:23
I'd say just running … should probably do the trick. I posted timings on my machine above. I'll run some timings on other machines as well. I'm really not suggesting we spend hours and hours on this, just that we should pick a decent default value; my impression is that 20 is bad on most machines. I might be wrong though.
comment:24
I'm completely with mraum here. In fact, I have run tests like you suggest to set the current parameters. You'll find optimal parameters differ dramatically depending on the computer and the OS: e.g., Mac OS X could be totally different from Linux, Intel versus AMD, etc. I can see almost no point in doing what you suggest above. Either do things right (with a database for many machines), or leave it.
comment:25
sage.math: 24 bits
** 31 bits **
comment:26
Replying to @williamstein:
I find it odd to claim this before even trying (perhaps you did and I missed that, though).
comment:27
I did a lot of timings like the above on my laptop when writing this patch...
comment:28
So, was 20 a good cutoff on your laptop? (I am asking because the 20 cutoff is older than your patch.) FWIW, I'll try this on bsd.math and cicero.skynet. If those two also suggest increasing the cutoff to > 20, I suggest we do that. If only bsd.math and sage.math agree, I'd still say it's worth increasing the limit, because these are more mainstream machines than cicero.skynet.
comment:29
Btw., it seems something went wrong with the squashing; there are still mentions of …
comment:30
bsd.math
So 100 is much better than 20, but not necessarily optimal. |
comment:31
Cicero is less conclusive, as it's too loaded for benchmarking. Anyway, here are the numbers:

2^24
2^31

What I take away from these numbers: increasing the default cutoff from 20 to 100 makes Sage twice as fast on some mainstream architectures (64-bit Linux, Xeon), about 1.7x faster on others (64-bit OS X), and probably slightly faster on less mainstream architectures (32-bit Linux on Pentium 4). Am I missing something here?
Attachment: 4968-review.patch.gz
comment:32
I made this work; only a very few points had been overlooked when squashing. I changed the cutoff back to 100. Since William and I both think one should do it properly or not at all, I thought it would be best to keep the old cutoff, which had been 100. The biggest advantage from my perspective is that the release manager will have to deal with only one change (32-bit -> 64-bit), and not two, when investigating potential speed regressions. Not that I found any, but there are more exotic architectures than mine.
comment:33
Replying to @sagetrac-mraum:
Fair enough; 100 is what I was arguing for regardless. But I have to say being so blatantly ignored leaves a bitter aftertaste. Anyway, it seems you have this patch covered; I won't intervene any more.
comment:34
Please, don't take it like that. I was happy to see that your suggestion and the old cutoff were the same. But consider us changing this and the underlying implementation in parallel: what would the release manager do if regressions showed up? I would unmerge the whole thing (and we would feel like we were stealing his time). This way you can open a new ticket and do the optimization independently.
comment:35
By the way, doctesting this patch left several (5-10) Python processes running on a patchbot for more than 13 hours, which is probably abnormal...
comment:36
I'm reviewing the linbox update (#12883), where malb removed linbox-related stuff from …
comment:37
This ticket has been in needs_review for three years... putting it into needs_info status, since obviously sagemath has moved forward in the intervening time.
The switch would allow the specialised code to cover larger primes (and I currently work with primes ~ 2^32). The switch requires changing all mentions of `int` to `long`, and probably implementing `arith_long`.

Depends on #10281
CC: @simon-king-jena @sagetrac-mraum
Component: linear algebra
Issue created by migration from https://trac.sagemath.org/ticket/4968