faster zero matrix creation #11589
Comments
Author: Martin Albrecht, Simon King
comment:1
Timings on my machine:

With patch:

```
sage: for n in (50,500,1000): timeit('A=matrix(GF(2),%d,%d)'%(n,n))
....:
625 loops, best of 3: 104 µs per loop
625 loops, best of 3: 108 µs per loop
625 loops, best of 3: 115 µs per loop
sage: for n in (50,500,1000): timeit('A=matrix(GF(3),%d,%d)'%(n,n))
....:
625 loops, best of 3: 100 µs per loop
625 loops, best of 3: 448 µs per loop
125 loops, best of 3: 1.65 ms per loop
```

Without patch:

```
sage: for n in (50,500,1000): timeit('A=matrix(GF(2),%d,%d)'%(n,n))
....:
625 loops, best of 3: 97.6 µs per loop
625 loops, best of 3: 108 µs per loop
625 loops, best of 3: 128 µs per loop
sage: for n in (50,500,1000): timeit('A=matrix(GF(3),%d,%d)'%(n,n))
....:
625 loops, best of 3: 94.1 µs per loop
625 loops, best of 3: 502 µs per loop
125 loops, best of 3: 2.48 ms per loop
```
comment:2
Hi Martin,

Thank you for creating the patch! I don't like that the timings became a little slower for small matrices. We should try to find out why that happens. I think we should use calloc only when the argument

Did you carefully determine the threshold for which calloc becomes faster than copying? Or is "nrows, ncols > 100" just a guess?

Cheers,
Simon
comment:3
Replying to @simon-king-jena:
We'll also have to do more careful timings to establish there really was a regression. I'm not 100% sure it's not a fluke.
Good point.
It was an educated guess, i.e., I did look at some timings, but it's not carefully tuned (and it is probably machine dependent).
comment:4
Attachment: trac_11589.patch.gz

I updated the patch to use malloc() if entries is not a list.
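The malloc/calloc distinction being discussed can be sketched in plain Python. This is only an illustration (the helper name `allocate_storage` is hypothetical, and lists stand in for C buffers; Sage's real code works at the C level): when the caller supplies a full list of entries, every slot gets overwritten anyway, so uninitialized (malloc-like) storage suffices; otherwise the storage must start zeroed (calloc-like).

```python
def allocate_storage(n, entries):
    """Toy model of the allocation choice for an n-by-n matrix's flat storage."""
    if isinstance(entries, list):
        # malloc-like: initial contents are irrelevant because we overwrite
        # every slot from the supplied (assumed complete) entry list.
        data = [None] * (n * n)
        for i, e in enumerate(entries):
            data[i] = e
        return data
    # calloc-like: no entries supplied, so the buffer must start zeroed.
    return [0] * (n * n)
```

For example, `allocate_storage(2, [1, 2, 3, 4])` fills all four slots, while `allocate_storage(2, None)` yields four zeros.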
comment:5
Replying to @malb:
You mean: Use malloc if (and only if) it is a list. At least that is what you do in the patch (and it coincides with what I had done as well).
comment:6
Sorry, yes ... where's my head
comment:7
Hi Martin,

I have a guess why the patch isn't as quick as it should be: it is the line

If you pass the value 0, then the matrix is initialised as a diagonal matrix with 0 on the diagonal -- which means that time is wasted by initialising the diagonal twice. Better would be to pass
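The point about the wasted diagonal pass can be made concrete with a toy class. This is a minimal sketch under stated assumptions (the class `ZeroAwareMatrix` is hypothetical, not Sage's actual implementation): with `entries=0` the scalar code path still walks the diagonal even though the freshly zeroed storage already holds zeros, while a `None` sentinel skips that work entirely.

```python
class ZeroAwareMatrix:
    """Toy model: None means 'leave the zeroed storage alone'; a scalar fills the diagonal."""
    def __init__(self, n, entries=None):
        self.n = n
        self.data = [0] * (n * n)      # freshly zeroed storage (calloc-like)
        self.diagonal_writes = 0       # counts the redundant work we want to avoid
        if entries is not None:        # scalar path: fill the diagonal
            for i in range(n):
                self.data[i * n + i] = entries
                self.diagonal_writes += 1

# Passing 0 still performs n diagonal writes; passing None performs none.
assert ZeroAwareMatrix(4, 0).diagonal_writes == 4
assert ZeroAwareMatrix(4).diagonal_writes == 0
```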
comment:8
I was wrong in assuming that

However, using None as default argument does have a tiny advantage: The test

With

However, with the original patch, I find a big surprise: Over GF(2), copying is always faster! For GF(p), p>2, I can confirm your statement that the threshold seems to be around 100.

There is one problem with default

Question: Is it worth it? Using None, I obtain for example:
Using 0, I obtain:
The difference is small, but it seems quite reproducible (not flaky) for me. So, my suggestion is to go through the matrix classes and allow
comment:9
I made all matrices accept "None" as an argument. I tried to determine the threshold for copying versus creating a zero matrix, using the following test function:
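The actual test function is in the attached patch. As a rough stand-alone illustration only (plain Python, hypothetical helper name, lists standing in for matrix storage rather than Sage matrices), such a copy-versus-create comparison could look like:

```python
import timeit

def compare_copy_vs_create(n, number=2000):
    """Return (time to copy a cached zero buffer, time to create a fresh one)
    for an n*n flat buffer, each measured over `number` repetitions."""
    cached_zero = [0] * (n * n)   # stands in for the cached zero matrix
    t_copy = timeit.timeit(lambda: list(cached_zero), number=number)
    t_create = timeit.timeit(lambda: [0] * (n * n), number=number)
    return t_copy, t_create

# Scanning dimensions shows where fresh creation overtakes copying:
# for n in (5, 10, 30, 100, 300):
#     print(n, compare_copy_vs_create(n))
```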
It turns out that with both patches applied, copying is not a good idea over GF(2) and over the rationals:
Note that I observed that the memory consumption for the test over the rationals constantly increased: Perhaps there is a memory leak. However, in all other cases I studied, copying was better, for matrices of up to 30 rows and columns:
Therefore, I modified the threshold in

Here are timings similar to the ones you did above (adding another test case). With both patches:
Without patches:
So, with both patches applied, the timings consistently improve.

Reviewing: I am not sure how the reviewing should be done, as we are both authors. Do you think it is OK if I review the first patch and you review the second patch? I need to run all long doctests, though; so far, I have only done the tests in sage/matrix.

TODO: On a different ticket, the potential memory leak for rational dense matrices should be studied. I recall that in some arithmetic routines of matrices, zero matrices are created at the beginning. I am not sure whether that is done "the official way" or by always copying a zero matrix, but I think it was the latter. That might be worthwhile to fix as well.
comment:10
PS: It seems to me that several doc tests from sage/matrix are considerably faster with the patches. I hope that can be confirmed. |
comment:11
Replying to @simon-king-jena:
Cool, you are fast!
Yes, as far as I know it's okay if we cross-review. So I'll take a look at your patch.
comment:12
Here are timings on my machine (Macbook Pro, i7):

No patch:

```
sage: for n in (50,500,1000): timeit('A=matrix(GF(2),%d,%d)'%(n,n))
....:
625 loops, best of 3: 98.1 µs per loop
625 loops, best of 3: 115 µs per loop
625 loops, best of 3: 130 µs per loop
sage: for n in (50,500,1000): timeit('A=matrix(GF(3),%d,%d)'%(n,n))
....:
625 loops, best of 3: 93.3 µs per loop
625 loops, best of 3: 538 µs per loop
125 loops, best of 3: 2.41 ms per loop
sage: for n in (50,500,1000): timeit('A=matrix(GF(625,"a"),%d,%d)'%(n,n))
....:
625 loops, best of 3: 189 µs per loop
125 loops, best of 3: 3.7 ms per loop
25 loops, best of 3: 14.1 ms per loop
```

First patch only:

```
sage: for n in (50,500,1000): timeit('A=matrix(GF(2),%d,%d)'%(n,n))
....:
625 loops, best of 3: 104 µs per loop
625 loops, best of 3: 108 µs per loop
625 loops, best of 3: 112 µs per loop
sage: for n in (50,500,1000): timeit('A=matrix(GF(3),%d,%d)'%(n,n))
....:
625 loops, best of 3: 95.5 µs per loop
625 loops, best of 3: 442 µs per loop
625 loops, best of 3: 1.48 ms per loop
sage: for n in (50,500,1000): timeit('A=matrix(GF(625,"a"),%d,%d)'%(n,n))
....:
625 loops, best of 3: 184 µs per loop
125 loops, best of 3: 3.7 ms per loop
25 loops, best of 3: 13.8 ms per loop
```

With both patches:

```
sage: for n in (50,500,1000): timeit('A=matrix(GF(2),%d,%d)'%(n,n))
....:
625 loops, best of 3: 94.5 µs per loop
625 loops, best of 3: 98.9 µs per loop
625 loops, best of 3: 108 µs per loop
sage: for n in (50,500,1000): timeit('A=matrix(GF(3),%d,%d)'%(n,n))
....:
625 loops, best of 3: 95 µs per loop
625 loops, best of 3: 438 µs per loop
625 loops, best of 3: 1.49 ms per loop
sage: for n in (50,500,1000): timeit('A=matrix(GF(625,"a"),%d,%d)'%(n,n))
....:
625 loops, best of 3: 176 µs per loop
125 loops, best of 3: 1.74 ms per loop
125 loops, best of 3: 6.24 ms per loop
```

My take: it's definitely faster for large-ish matrices and undecided for small matrices, i.e. my timings for 50x50 are rather unstable.
Some fine tuning for the creation of zero matrices |
comment:13
Attachment: trac_11589_copy_vs_creation_of_zero.patch.gz

Sorry, I had to replace my patch, since apparently I forgot to change the doctests of `_copy_zero`. Actually, the new patch version adds some tests, as there is now a special case for matrices over the rationals. FWIW, with the old patch version all long tests except matrix_space.py pass, and with the new patch version (which only fixes the failing doctest) matrix_space.py passes as well.
comment:14
Replying to @simon-king-jena:
I found it. It is the
I think that ought to be improved, but that would better be done on another ticket.
comment:15
I added a third patch that applies the faster way of creating a zero matrix to

I think that the third patch is small enough to be discussed here. However, note that there is already an open ticket on improving matrix multiplication: #8096. There has been no activity for 16 months now! I think it would be worthwhile to invest some work there. It seems to me that some of the issues of that old ticket are already resolved.

Concerning reviews: I believe that the first patch is fine, and I give it a positive review. I have already mentioned that the tests pass, and the improvement of the timings (in particular when the second patch is added) is clear.
comment:16
I give your two patches a positive review as well.
Reviewer: Simon King, Martin Albrecht
comment:17
Note that the patch for Strassen-Winograd multiplication at #11610 benefits from the patches here as well, because many matrix spaces are created during the iteration. |
comment:18
Please state which patches have to be applied... |
Use the mechanisms from the preceding patches in Strassen multiplication
comment:19
Attachment: trac_11589_apply_zero_creation_to_strassen.patch.gz

Replying to @jdemeyer:

If nothing is stated, then, I thought, all patches are to be applied. I used the occasion to replace the third patch: a commit message was missing.
comment:20
... and back to positive review. |
comment:21
Replying to @simon-king-jena:
For safety, I prefer not to make any such assumptions. In the few cases where it is unclear (like here), I prefer simply asking.
Merged: sage-4.7.2.alpha1 |
… type MA_ENTRIES_ZERO

There are many ways to specify entries when creating a matrix; they are handled in `args.pyx` with the `MatrixArgs` class. It has been discussed before why allowing `entries=None` in the matrix creation methods can be beneficial in terms of performance (sagemath#11589, sagemath#12020). This input `entries=None` is required to yield the zero matrix. For example, `matrix(QQ, 4, 4)` creates the zero matrix; this corresponds to the type `MA_ENTRIES_ZERO` in `args.pyx`. When one passes a value, e.g. `matrix(QQ, 4, 4, 2)`, one gets a diagonal matrix with `2` on the diagonal; this corresponds to the type `MA_ENTRIES_SCALAR` in `args.pyx`.

Currently, doing something like `matrix(QQ, 4, 4, 0)` will pick `MA_ENTRIES_SCALAR`, and therefore will build the matrix and fill the diagonal with zero. [Behind the scenes, there is still some acknowledgement that this is not the usual scalar matrix case, since this will not fail if the matrix is not square (`matrix(QQ, 4, 5, 0)` will not fail, but `matrix(QQ, 4, 5, 1)` will). But this is still not seen as `MA_ENTRIES_ZERO`.]

This PR ensures the type `MA_ENTRIES_ZERO` is picked in this situation. This improves performance and solves some issues, as noted below. This PR also fixes the related issue sagemath#36065. In fact, `entries=None` is the default in the `__init__` of all matrix subclasses presently implemented. It also used to be the default when constructing a matrix by "calling" a matrix space, until sagemath#31078 and https://github.com/sagemath/sage/commit/cef613a0a57b85c1ebc5747185213ae4f5ec35f2 changed this default value from `None` to `0`, bringing some performance regression.

Regarding this "calling the matrix space", this PR also solves the performance issue noted in sagemath#35961 (comment), where it was observed that in the following piece of code:

```
sage: space = MatrixSpace(GF(9001), 10000, 10000)
sage: %timeit zmat = space.matrix()
18.3 µs ± 130 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
sage: %timeit zmat = space()
12.1 ms ± 65.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

the called default is not the same. `space.matrix()` directly initializes the matrix through `entries=None`, but `space()` actually calls the constructor with `entries=0`. This performance regression comes from sagemath#31078 and https://github.com/sagemath/sage/commit/cef613a0a57b85c1ebc5747185213ae4f5ec35f2, where the default for construction from the matrix space was changed from `None` to `0`. This cannot be easily reverted: this is now the default in the `__call__` of the `Parent` class. So this PR does not directly revert the call default to `None`, but the result is very close in effect (read: the overhead is small). Unchanged: through the parent call, `0` is converted to the base ring zero, which is passed to the constructor's `entries`, which is then analyzed as `MA_ENTRIES_SCALAR`. Changed: the code modifications ensure that soon enough it will be detected that this is in fact `MA_ENTRIES_ZERO`. The small overhead explains why, despite the improvements, construction with `None` is still sometimes slightly faster than with `0`.

Below are some timings showing the improvements for some fields. Also, this PR merged with the improvements discussed in sagemath#35961 will make the above timings of `space.matrix()` and `space()` be the same (which means a speed-up of a factor >500 for this call `space()`...). The measurement is for construction via calling the matrix space: `None` is `space(None)`, `Empty` is `space()`, `Zero` is `space(0)`, and `Nonzero` is `space(some nonzero value)`.

```
NEW TIMES
field    dim  None     Empty    Zero     Nonzero
GF(2)    5    2.3e-06  3.2e-06  3.6e-06  3.2e-06
GF(2)    50   2.4e-06  3.3e-06  3.6e-06  5.8e-06
GF(2)    500  3.6e-06  4.5e-06  4.8e-06  3.1e-05
GF(512)  5    2.6e-05  2.8e-05  2.9e-05  2.9e-05
GF(512)  50   2.6e-05  2.9e-05  2.9e-05  4.0e-05
GF(512)  500  3.7e-05  3.8e-05  3.9e-05  1.6e-04
QQ       5    2.2e-06  3.3e-06  3.4e-06  3.2e-06
QQ       50   8.0e-06  9.2e-06  9.4e-06  1.2e-05
QQ       500  6.1e-04  6.3e-04  6.4e-04  6.7e-04

OLD TIMES
field    dim  None     Empty    Zero     Nonzero
GF(2)    5    2.3e-06  3.5e-06  3.6e-06  3.7e-06
GF(2)    50   2.4e-06  6.0e-06  6.1e-06  6.0e-06
GF(2)    500  3.6e-06  3.0e-05  3.0e-05  3.0e-05
GF(512)  5    2.5e-05  2.8e-05  2.9e-05  2.9e-05
GF(512)  50   2.5e-05  3.9e-05  4.0e-05  4.0e-05
GF(512)  500  3.5e-05  1.5e-04  1.5e-04  1.6e-04
QQ       5    2.2e-06  3.5e-06  3.7e-06  3.7e-06
QQ       50   7.9e-06  1.2e-05  1.2e-05  1.2e-05
QQ       500  6.4e-04  6.9e-04  6.9e-04  6.9e-04
```

Code used for the timings:

```
time_kwds = dict(seconds=True, number=20000, repeat=7)
fields = [GF(2), GF(2**9), QQ]
names = ["GF(2)", "GF(512)", "QQ"]
vals = [GF(2)(1), GF(2**9).gen(), 5/2]
print(f"field\tdim\tNone\tEmpty\tZero\tNonzero")
for field, name, val in zip(fields, names, vals):
    for dim in [5, 50, 500]:
        space = MatrixSpace(field, dim, dim)
        tnone = timeit("mat = space(None)", **time_kwds)
        tempty = timeit("mat = space()", **time_kwds)
        tzero = timeit("mat = space(0)", **time_kwds)
        tnonz = timeit("mat = space(1)", **time_kwds)
        print(f"{name}\t{dim}\t{tnone:.1e}\t{tempty:.1e}\t{tzero:.1e}\t{tnonz:.1e}")
```

### 📝 Checklist

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [x] I have created tests covering the changes.
- [x] I have updated the documentation accordingly.

### ⌛ Dependencies

URL: sagemath#36068
Reported by: Vincent Neiger
Reviewer(s): Matthias Köppe, Vincent Neiger
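The entries-type dispatch described above can be illustrated with a toy version. This is a sketch only: the constant and function names here are hypothetical stand-ins, while the real logic lives in Sage's `args.pyx` (`MatrixArgs`). The key change is that a scalar equal to zero is recognized as the zero matrix instead of taking the fill-the-diagonal scalar path.

```python
# Hypothetical stand-ins for the type tags in Sage's args.pyx.
MA_ENTRIES_ZERO = "zero"
MA_ENTRIES_SCALAR = "scalar"

def classify_entries(entries):
    """Toy model of the dispatch: None and a zero scalar both map to ZERO."""
    if entries is None:
        return MA_ENTRIES_ZERO
    # The change sketched here: detect a scalar that equals zero and route it
    # to the zero-matrix path rather than the scalar (diagonal-filling) path.
    if entries == 0:
        return MA_ENTRIES_ZERO
    return MA_ENTRIES_SCALAR

assert classify_entries(None) == MA_ENTRIES_ZERO
assert classify_entries(0) == MA_ENTRIES_ZERO   # previously SCALAR in this toy model
assert classify_entries(2) == MA_ENTRIES_SCALAR
```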
Currently, we perform a matrix copy whenever we create a new zero matrix. This is undesirable for two reasons:
See also the discussion on [sage-devel]:
http://groups.google.com/group/sage-devel/browse_thread/thread/478fb19ecb724be0
Apply all three patches.
CC: @simon-king-jena
Component: linear algebra
Author: Martin Albrecht, Simon King
Reviewer: Simon King, Martin Albrecht
Merged: sage-4.7.2.alpha1
Issue created by migration from https://trac.sagemath.org/ticket/11589