New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up the creation of submatrices of Matrix_modn_dense_template
matrices
#36059
Conversation
Here, some examples of timings with the current version of Sage:
And the same measurements with the proposed changes:
Most of the time taken by
The improvement of |
During review, I will add some timings to highlight some tested performance (the goal being to check that there is no regression). One caveat is that I have had to compile with the changes in #35961 otherwise comparison is too difficult due to long matrix creation. So these changes should probably be listed as a dependency of this PR, and be finalized first. This is to check that, even when selecting many small pieces of the storage far away from each other in the
|
cdef Matrix_modn_dense_template M = self.new_matrix(nrows=nrows, ncols=self._ncols) | ||
memcpy(M._entries, self._entries+row*ncols, sizeof(celement)*ncols*nrows) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two lines could be kept, under a condition if col == 0 and ncols == self._ncols
. This would make the copying a bit faster (singly memcpy) in this situation where one wants to retrieve a (contiguous) block of complete rows.
this 3 times. | ||
|
||
:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be, on a single line, this 3 times::
. This should also be fixed in matrix1.pyx
.
for i, row in enumerate(rows): | ||
if row < 0 or row >= self._nrows: | ||
raise IndexError("row index out of range") | ||
memcpy(A._entries+(i*self._ncols), self._entries+(row*self._ncols), sizeof(celement)*self._ncols) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to use _matrix
:
memcpy(A._matrix[i], self._matrix[row], sizeof(celement)*self._ncols)
Apart from the above suggestions for minor modifications, this looks good to me. The performance improvements are particularly significant. Below is a table which shows speedups for various sizes.
I did not compare thoroughly more heterogeneous cases (taking subsets of rows and columns, possibly in nonincreasing orders, etc.), but tests suggest improvements there as well. This is expected since the code is the same as before (for the In the table we see ratios old vs. new, showing that the new code is always faster, often by a large factor. The only case where it is slower (by a small margin, factor is between 0.9 and 1), is related to the suggestion in my review, to keep a separate case for retrieving a submatrix consisting of a block of complete rows. In the table we also see (column
|
In case for future reference, the code that was used for doing the timings (this was run and results stored in dicts in two versions, before and after modifications, and then both dicts were loaded to compare the values).
|
|
||
cdef Py_ssize_t i,r | ||
for i,r in enumerate(range(row, row+nrows)) : | ||
memcpy(M._entries + (i*ncols), self._entries+self._ncols*r+col, sizeof(celement)*ncols) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_matrix
could be used here as well, probably with
memcpy(M._matrix[i], self._matrix[r]+col, sizeof(celement)*ncols)
cdef Matrix_modn_dense_template M = self.new_matrix(nrows=nrows, ncols=ncols) | ||
|
||
if col == 0 and ncols == self._ncols: | ||
memcpy(M._entries, self._entries+row*ncols, sizeof(celement)*ncols*nrows) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using _matrix
:
memcpy(M._entries, self._matrix[row], sizeof(celement)*ncols*nrows)
Looks good to me, thanks! |
Documentation preview for this PR (built with commit a52a33f; changes) is ready! 🎉 |
Merge looks good. |
…n_dense_template` matrices <!-- ^^^^^ Please provide a concise, informative and self-explanatory title. Don't put issue numbers in there, do this in the PR body below. For example, instead of "Fixes sagemath#1234" use "Introduce new method to calculate 1+1" --> <!-- Describe your changes here in detail --> Modified the method `submatrix()` to do a systematic call to `memcpy()`. Overrode methods `matrix_from_rows()`, `matrix_from_columns()` and `matrix_from_rows_and_columns()`. <!-- Why is this change required? What problem does it solve? --> In the current version of `submatrix()`, only the case when the submatrix has as many columns as the matrix, relies on the contiguous memory storage of `Matrix_modn_dense_template` and call `memcpy()`. In fact, this can be applied to any set of arguments given to the method and enables better performances. Overriding the other methods mentioned above allows to avoid calling `get_unsafe()` and `set_unsafe()` that are doing time-consuming casts/conversions that are unnecessary in this case. <!-- If this PR resolves an open issue, please link to it here. For example "Fixes sagemath#12345". --> <!-- If your change requires a documentation PR, please link it appropriately. --> ### 📝 Checklist <!-- Put an `x` in all the boxes that apply. --> <!-- If your change requires a documentation PR, please link it appropriately --> <!-- If you're unsure about any of these, don't hesitate to ask. We're here to help! --> <!-- Feel free to remove irrelevant items. --> - [x] The title is concise, informative, and self-explanatory. - [x] The description explains in detail what this PR is about. - [ ] I have linked a relevant issue or discussion. - [x] I have created tests covering the changes. - [ ] I have updated the documentation accordingly. ### ⌛ Dependencies <!-- List all open PRs that this PR logically depends on - sagemath#12345: short description why this is a dependency - sagemath#34567: ... --> <!-- If you're unsure about any of these, don't hesitate to ask. We're here to help! --> URL: sagemath#36059 Reported by: Marie Bonboire Reviewer(s): Vincent Neiger
Modified the method
submatrix()
to do a systematic call tomemcpy()
.Overrode methods
matrix_from_rows()
,matrix_from_columns()
andmatrix_from_rows_and_columns()
.In the current version of
submatrix()
, only the case when the submatrix has as many columns as the matrix, relies on the contiguous memory storage ofMatrix_modn_dense_template
and callmemcpy()
.In fact, this can be applied to any set of arguments given to the method and enables better performances.
Overriding the other methods mentioned above allows to avoid calling
get_unsafe()
andset_unsafe()
that are doing time-consuming casts/conversions that are unnecessary in this case.📝 Checklist
⌛ Dependencies