Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Optimise _cs_matrix._set_many when new entries are all zero #11603

Closed
wants to merge 2 commits into from

Conversation

huonw
Copy link

@huonw huonw commented Feb 27, 2020

Reference issue

Closes: gh-11600

What does this implement/fix?

This adds a special case for doing a scatter-set (arrayXarray) on a compressed matrices where there's many zeros, and, in particular, where the only new values outside the existing sparsity structure are zero.

Changing the sparsity structure of a csr_matrix or csc_matrix is expensive, and so it's a nice idea to check whether that's actually required, before doing anything. This accelerates functionality like m.setdiag(0) and m[i, j] = 0 significantly, because it never has to change sparsity.

For the latter case, the Getset.track_fancy_setitem benchmark is extended to also measure setting some indices to zero, in addition to setting it to random values. On csr and csc matrices, with a different sparsity structure, setting to zero is 3.7-67× faster than random.

The example in #11600 (setdiag(0) on a 10000x10000 random CSR matrix) goes from 147ms to ~5ms; essentially the same as the "manual" one described there.

Changing the sparsity structure of a csr_matrix or csc_matrix is expensive, and so it's a nice idea
to check whether that's actually required, before doing anything. This accelerates functionality
like `m.setdiag(0)` and `m[i, j] = 0` significantly, because it never has to change sparsity.

For the latter case, the `Getset.track_fancy_setitem` benchmark is extended to also measure setting
some indices to zero, in addition to setting it to random values. On csr and csc matrices, with a
different sparsity structure, setting to zero is 3.7-67× faster than random.

The example in scipy#11600 (`setdiag(0)` on a 10000x10000 random CSR matrix) goes from 147ms to ~5ms;
essentially the same as the "manual" one described there.
@huonw
Copy link
Author

huonw commented Feb 27, 2020

(I wrote this before I saw #11600 (comment) , which probably changes whether this is a reasonable approach.)

@perimosocordiae
Copy link
Member

Thanks for opening a PR! As you saw in my comment, I'm not sure we want to change behavior so directly, but I do think that there's a valuable enhancement to be made along these lines. Let's try to find an approach that allows users to opt in. That could be a separate method, or it could be a keyword argument on an existing method.

@huonw
Copy link
Author

huonw commented Mar 3, 2020

it could be a keyword argument on an existing method

Are there any existing methods for this sort of operation? I think the only ones are setdiag and [] indexing? One possibility would be to add an ignore_zeros=True parameter to setdiag, and add a set(i, j, x, ignore_zeros=False) method that is equivalent to m[i, j] = x with ignore_zeros=False, and equivalent to this new implementation with ignore_zeros=True.

@perimosocordiae
Copy link
Member

There are also in-place math operators (like +=), but I don't think we need to worry about those here. To keep things simple, let's focus on setdiag for now.

I like your proposal of a keyword argument, though I think something like insert_zeros=True may be more clear about the intent. Regardless of naming, we would want to have the parameter default to the current zero-inserting behavior.

In a later change, we could more safely update the default behavior by issuing warnings to users who don't explicitly set the keyword argument for a few releases, then swapping the default to the more efficient approach. That may be more noise than it's worth, though.

@lucascolley lucascolley added enhancement A new feature or improvement needs-decision Items that need further discussion before they are merged or closed labels Dec 23, 2023
@huonw
Copy link
Author

huonw commented May 16, 2024

This has been sitting around for 4 years: I'm clearly not going to have a chance to revisit.

Thanks for the reviews so far, maybe they'll help someone else in future if they pick it up.

@huonw huonw closed this May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement needs-decision Items that need further discussion before they are merged or closed scipy.sparse
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sparse.csr_matrix.setdiag(0) is slow due to changing sparsity structure
4 participants