Accelerating the construction of matrices of type Matrix_modn_dense #35961

vneiger · 2023-07-17T14:09:41Z

Problem Description

This follows on from the issue #28432 , which was created in 2019 and was targetting the construction of the zero matrix from the Matrix_modn_dense_template (classes Matrix_modn_dense_{double,float}). The observation was that a lot of (useless) modular reductions were performed upon creation, even though no such reduction is necessary when all the wanted entries are zero.

Here the issue targets more generally the construction of matrices of this type. There are several cases where creation is obviously too slow (beyond avoidable modular reductions) and there are also some inconsistencies.

For creating the zero matrix, some examples of slowliness/inconsistencies are already given in #28432 (comment) . There one notices that several natural ways of creating the zero matrix are significantly slower than copying the cached zero matrix, and that creating a zero matrix over the rationals is much faster than in the present case (over modular integers).

Consider also the identity matrix and diagonal matrices:

sage: field = GF(9001)
sage: space = MatrixSpace(field, 10000, 10000)
sage: _ = space.zero_matrix(); _ = space.identity_matrix()
sage: %timeit zmat = space.zero()
1.2 s ± 9.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
sage: %timeit zmat = space.one()
217 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
sage: %timeit zmat = space(1)
1.23 s ± 3.75 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
sage: d = [field.random_element() for k in range(10000)]
sage: %timeit dmat = space.diagonal_matrix(d)
199 ms ± 1.34 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Creating a diagonal matrix or the identity matrix with space.one() is faster than creating the the identity with space(1). This is because some of them rely on copies of cached matrices, whereas space.zero() does not. Indeed, copying is about 200ms:

sage: %timeit cmat = copy(space.identity_matrix())
199 ms ± 1.82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Creating a dense random matrix takes as long as some of the above (very special and sparse) cases:

sage: %timeit rmat = space.random_element()
1.16 s ± 5.46 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Proposed Solution

The constructor seems to parse all the entries of the matrix, whatever the type of matrix (MA_ENTRIES_* from args.pyx). Many other classes (rationals, integers, GF(2), ..) use an iterator with sparse=True, which should bring great improvements at least for the zero matrix and diagonal matrices. It seems this could be adapted here.

A more complete study of the performance of the different methods of matrix construction (types MA_ENTRIES_*) should be conducted, to find out if other cases have to be tackled. Also, one should be careful not to introduce performance regressions, e.g. for some types of constructions or for small matrix sizes.

Alternatives Considered

At least, one should use copies of the cached zero/identity matrices whenever this is faster.

Additional Information

No response

Is there an existing issue for this?

I have searched the existing issues for a bug report that matches the one I want to file, without success.

The text was updated successfully, but these errors were encountered:

vneiger · 2023-07-17T14:10:50Z

@marizee @ClementPernet

marizee · 2023-07-25T10:13:15Z

Using a sparse iterator as suggested is indeed speeding up the creation of the zero matrix :

sage: field = GF(9001)
sage: space = MatrixSpace(field, 10000, 10000)
sage: _ = space.zero_matrix(); _ = space.identity_matrix()
sage: %timeit zmat = space.zero()
16.8 ms ± 62.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This also accelerates the constructor for the matrix argument of type MA_ENTRIES_SCALAR (creating a diagonal of ones, in this case):

sage: %timeit zmat = space(1)
16.5 ms ± 7.89 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

However, there are improvements that still need to be made, for example regarding the creation of a diagonal matrix and the identity matrix :

sage: %timeit zmat = space.one()
374 ms ± 1.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
sage: d = [field.random_element() for k in range(10000)]
sage: %timeit dmat = space.diagonal_matrix(d)
340 ms ± 1.19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Indeed we expect the timings to be closer than they are currently, in particular for space(1) versus space.one().

marizee · 2023-08-01T15:40:22Z

The problem (slow identity) seems to be due to copying the zero matrix. In matrix_space.py, the lazy attribute _copy_zero indicates whether it is faster to copy the zero matrix or to create a new matrix from scratch. In our case, it returns False (at least for matrices that are not very small).

I've added a condition in identity_matrix and diagonal_matrix, to avoid making the copy:

        if self._copy_zero:
            A = self.zero_matrix().__copy__()
        else:
            A = self.matrix()

This does improve the timings and there is no surprising discrepancy anymore (compare this to the previous message):

sage: field = GF(9001)
sage: space = MatrixSpace(field, 10000, 10000)
sage: _ = space.zero_matrix(); _ = space.identity_matrix()
sage: %timeit idmat = space.one()
20.7 ms ± 67.9 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
sage: d = [field.random_element() for k in range(10000)]
sage: %timeit dmat = space.diagonal_matrix(d)
18.5 ms ± 11.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

I noticed that _copy_zero is not used anywhere in Sage. Is it relevant? Some further benchmarks are needed to see what dimension threshold is for _copy_zero for Matrix_modn_dense matrices.

On another note: the __cinit__ method is only doing a malloc, not zeroing the allocated memory for the new matrix. Thus, when creating e.g. a diagonal matrix with the iterator with sparse=True, the entries outside are not necessarily all zero in the matrix. One solution is to add a memset at the beginning of __init__, yet this turns out to be quite time-consuming. A faster solution is to use a calloc instead of malloc in __cinit__. For this solution the only downside seems to be that this slows down the __copy__ method (the others are not really impacted). I will investigate this further to see how to avoid this slowdown of __copy__.

vneiger · 2023-08-02T13:13:48Z

Good catch, with _copy_zero. For these modn_dense matrices it is not clear that this is ever a good idea to copy zero rather than doing a malloc/calloc... This brings back the discussion from sage-devel/c/R4-xnstyS-A, and the related #11589 which also mentions constructing matrices with entries = None versus entries = 0.

Some tests for when to use _copy_zero had been done previously, see #11589 (comment) . I cannot find another issue mentioning why _copy_zero is not used anymore... one should probably search in the git history...

Regarding None versus zero, it seems that the current code is not always behaving best (in these timings below, I used the sparse iterator and calloc as you suggest):

sage: space = MatrixSpace(GF(9001), 10000, 10000)

sage: %timeit zmat = space.matrix()
18.3 µs ± 130 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
sage: %timeit zmat = space(None)
18.1 µs ± 122 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

These two calls build the zero matrix using entries = None. But the following ones call the constructor with entries=0, which is much slower for large matrices:

sage: %timeit zmat = space()
12.1 ms ± 65.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
sage: %timeit zmat = zero_matrix(GF(9001), 10000, 10000)
12.3 ms ± 89.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
sage: %timeit zmat = space(0)
12.1 ms ± 64.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

It is expected for the last one, but for the two others one could expect the same efficiency as above when using entries=None. For the first one, space(), it seems tricky at first, I'm not sure where the 0 appears. For the second one, zero_matrix, we may simply change the line

    return matrix_space.MatrixSpace(ring, nrows, ncols, sparse)(0)

(in special.py, 994) into something with None instead of 0, since all matrix spaces are supposedly supporting None and depending on the matrix space this should either not change performances, or improve them.

vneiger · 2023-08-03T15:12:45Z

For the issue of slowing down copy when using calloc instead of malloc in the initialization, I would suggest adding an optional parameter to __cinit__ so that it does calloc by default (so we keep the great speedup for creating zero), but allows a malloc if wanted (to avoid the calloc overhead when copying). So, something along these lines:

    def __cinit__(self, *args, bint zeroed_alloc=True, **kwds):
        [...]
        if zeroed_alloc :
            self._entries = <celement *>check_calloc(self._nrows * self._ncols, sizeof(celement))
            self._matrix = <celement **>check_calloc(self._nrows, sizeof(celement*))
        else:
            self._entries = <celement *>check_allocarray(self._nrows * self._ncols, sizeof(celement))
            self._matrix = <celement **>check_allocarray(self._nrows, sizeof(celement*))
        [...]

and then, in __copy__,

        A = self.__class__.__new__(self.__class__, self._parent, 0, 0, 0, zeroed_alloc=False)

marizee · 2023-08-06T08:52:52Z

I also suggest to add zeroed_alloc=False in the other methods calling __new__, which are __neg__, _lmul_, _add_ and _sub_.
Zeroing them beforehand doesn't seem to be necessary in these cases so this might be useless computing.

In the methods mentioned above, some arguments given to __new__ are None whereas in __copy__, the matching ones are 0:

def __neg__(self):
    [...]
    M = self.__class__.__new__(self.__class__, self._parent,None,None,None, zeroed_alloc=False)
    [...]

def __copy__(self):
    [...]
    A = self.__class__.__new__(self.__class__, self._parent, 0, 0, 0, zeroed_alloc=False)
    [...]

Is there any reason for them to be different ? I tried replacing the 0 into None and the timings were not affected.

vneiger · 2023-08-07T09:44:44Z

also suggest to add zeroed_alloc=False in the other methods calling new, which are neg, lmul, add and sub.
Zeroing them beforehand doesn't seem to be necessary in these cases so this might be useless computing.

Yes, indeed. There also may be other places where this __new__ is used (e.g. _mod_int_c in matrix_integer_dense).

Is there any reason for them to be different ? I tried replacing the 0 into None and the timings were not affected.

I do not think there is a difference since these arguments don't play a role in the initialization/allocation. Using consistently the same one (e.g. None which is used more often) would be better.

… type MA_ENTRIES_ZERO There are many ways to specify entries when creating a matrix, which are handled in `args.pyx` with `MatrixArgs` class. It has been discussed before why allowing `entries=None` in the matrix creation methods can be beneficial in terms of performance (sagemath#11589 , sagemath#12020). This input `entries=None` is required to yield the zero matrix. For example: `matrix(QQ, 4, 4)` creates the zero matrix. This corresponds to the type `MA_ENTRIES_ZERO` in `args.pyx`. When one passes a value, e.g. `matrix(QQ, 4, 4, 2)`, then one gets a diagonal matrix with `2` on the diagonal; this corresponds to the type `MA_ENTRIES_SCALAR` in `args.pyx`. Currently, doing something like `matrix(QQ, 4, 4, 0)` will pick `MA_ENTRIES_SCALAR`, and therefore will build the matrix and fill the diagonal with zero. [Behind the scenes, there is still some acknowledgement that this is not the usual scalar matrix case, since this will not fail if the matrix is not square (`matrix(QQ, 4, 5, 0)` will not fail, but `matrix(QQ, 4, 5, 1)` will). But this is still not seen as `MA_ENTRIES_ZERO`.] This PR ensures the type `MA_ENTRIES_ZERO` is picked in this situation. This improves performance and solves some issues, as noted below. This PR also fixes the related issue sagemath#36065 . In fact, `entries=None` is the default in the `__init__` of all matrix subclasses presently implemented. It also used to be the default when constructing a matrix by "calling" a matrix space, until sagemath#31078 and https://github.com/sag emath/sage/commit/cef613a0a57b85c1ebc5747185213ae4f5ec35f2 which changed this default value from `None` to `0`, bringing some performance regression. Regarding this "calling the matrix space", this PR also solves the performance issue noted in sagemath#35961 (comment) , where it was observed that in the following piece of code: ``` sage: space = MatrixSpace(GF(9001), 10000, 10000) sage: %timeit zmat = space.matrix() 18.3 µs ± 130 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each) sage: %timeit zmat = space() 12.1 ms ± 65.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) ``` the called default is not the same. `space.matrix()` directly initializes the matrix through `entries=None`, but on the other hand `space()` actually calls the constructor with `entries=0`. This performance regression comes from sagemath#31078 and https://github.com/sag emath/sage/commit/cef613a0a57b85c1ebc5747185213ae4f5ec35f2 , where the default for construction from the matrix space was changed from `None` to `0`. This cannot be easily reverted: this is now the default in the `__call__` of the `Parent` class. So this PR does not directly revert the call default to `None` somehow, but the result is very close in effect (read: the overhead is small). Unchanged: through the parent call, `0` is converted to the base ring zero, which is passed to the constructor's `entries` which is then analyzed as `MA_ENTRIES_SCALAR`. Changed: the code modifications ensure that soon enough it will be detected that this is in fact `MA_ENTRIES_ZERO`. The small overhead explains why, despite the improvements, construction with `None` is still sometimes slightly faster than with `0`. Below are some timings showing the improvements for some fields. Also, this PR merged with the improvements discussed in sagemath#35961 will make the above timings of `space.matrix()` and `space()` be the same (which means a speed-up of a factor >500 for this call `space()`...). The measurement is for construction via calling the matrix space: `None` is `space(None)`, `Empty` is `space()`, `Zero` is `space(0)`, and `Nonzero` is `space(some nonzero value)`. ``` NEW TIMES field dim None Empty Zero Nonzero GF(2) 5 2.3e-06 3.2e-06 3.6e-06 3.2e-06 GF(2) 50 2.4e-06 3.3e-06 3.6e-06 5.8e-06 GF(2) 500 3.6e-06 4.5e-06 4.8e-06 3.1e-05 GF(512) 5 2.6e-05 2.8e-05 2.9e-05 2.9e-05 GF(512) 50 2.6e-05 2.9e-05 2.9e-05 4.0e-05 GF(512) 500 3.7e-05 3.8e-05 3.9e-05 1.6e-04 QQ 5 2.2e-06 3.3e-06 3.4e-06 3.2e-06 QQ 50 8.0e-06 9.2e-06 9.4e-06 1.2e-05 QQ 500 6.1e-04 6.3e-04 6.4e-04 6.7e-04 OLD TIMES field dim None Empty Zero Nonzero GF(2) 5 2.3e-06 3.5e-06 3.6e-06 3.7e-06 GF(2) 50 2.4e-06 6.0e-06 6.1e-06 6.0e-06 GF(2) 500 3.6e-06 3.0e-05 3.0e-05 3.0e-05 GF(512) 5 2.5e-05 2.8e-05 2.9e-05 2.9e-05 GF(512) 50 2.5e-05 3.9e-05 4.0e-05 4.0e-05 GF(512) 500 3.5e-05 1.5e-04 1.5e-04 1.6e-04 QQ 5 2.2e-06 3.5e-06 3.7e-06 3.7e-06 QQ 50 7.9e-06 1.2e-05 1.2e-05 1.2e-05 QQ 500 6.4e-04 6.9e-04 6.9e-04 6.9e-04 ``` Code used for the timings: ``` time_kwds = dict(seconds=True, number=20000, repeat=7) fields = [GF(2), GF(2**9), QQ] names = ["GF(2)", "GF(512)", "QQ"] vals = [GF(2)(1), GF(2**9).gen(), 5/2] print(f"field\tdim\tNone\tEmpty\tZero\tNonzero") for field,name,val in zip(fields,names,vals): for dim in [5, 50, 500]: space = MatrixSpace(field, dim, dim) tnone = timeit("mat = space(None)", **time_kwds) tempty = timeit("mat = space()", **time_kwds) tzero = timeit("mat = space(0)", **time_kwds) tnonz = timeit("mat = space(1)", **time_kwds) print(f"{name}\t{dim}\t{tnone:.1e}\t{tempty:.1e}\t{tzero:.1e}\t{ tnonz:.1e}") ``` ### 📝 Checklist - [x] The title is concise, informative, and self-explanatory. - [x] The description explains in detail what this PR is about. - [x] I have linked a relevant issue or discussion. - [x] I have created tests covering the changes. - [x] I have updated the documentation accordingly. ### ⌛ Dependencies URL: sagemath#36068 Reported by: Vincent Neiger Reviewer(s): Matthias Köppe, Vincent Neiger

vneiger added the t: enhancement label Jul 17, 2023

vneiger self-assigned this Jul 17, 2023

vneiger added the c: linear algebra label Jul 17, 2023

marizee mentioned this issue Aug 10, 2023

Speed up the creation of submatrices of Matrix_modn_dense_template matrices #36059

Merged

5 tasks

vneiger mentioned this issue Aug 10, 2023

Speed-up matrix construction by ensuring MatrixArgs type MA_ENTRIES_ZERO #36068

Merged

5 tasks

marizee mentioned this issue Aug 16, 2023

Speed-up the creation of a zero matrix of type Matrix_modn_dense_template #36093

Merged

5 tasks

vbraun closed this as completed in #36093 Aug 27, 2023

mkoeppe added this to the sage-10.2 milestone Aug 27, 2023

vneiger mentioned this issue Aug 28, 2023

Speed-up constructor of Matrix_modn_dense_template #28432

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Accelerating the construction of matrices of type Matrix_modn_dense #35961

Accelerating the construction of matrices of type Matrix_modn_dense #35961

vneiger commented Jul 17, 2023

vneiger commented Jul 17, 2023

marizee commented Jul 25, 2023

marizee commented Aug 1, 2023

vneiger commented Aug 2, 2023

vneiger commented Aug 3, 2023

marizee commented Aug 6, 2023

vneiger commented Aug 7, 2023 •

edited

Accelerating the construction of matrices of type Matrix_modn_dense #35961

Accelerating the construction of matrices of type Matrix_modn_dense #35961

Comments

vneiger commented Jul 17, 2023

Problem Description

Proposed Solution

Alternatives Considered

Additional Information

Is there an existing issue for this?

vneiger commented Jul 17, 2023

marizee commented Jul 25, 2023

marizee commented Aug 1, 2023

vneiger commented Aug 2, 2023

vneiger commented Aug 3, 2023

marizee commented Aug 6, 2023

vneiger commented Aug 7, 2023 • edited

vneiger commented Aug 7, 2023 •

edited