ENH: sparse: speedup csr/csc setdiag by converting to coo #19962

dschult · 2024-01-25T02:39:54Z

Reference issue

Fixes gh-19943

What does this implement/fix?

Implement setdiag for compressed format that avoids index value setting by converting to coo format, calling setdiag, and then converting back.

The return conversion took two steps -- converting to csr/csc and then updating the data on self. By creating a private convenience function to shortcut the tocsr/tocsc function, we avoid one of those steps. This also helps a few places in _compressed.__init__ where we convert to coo and back again. We can avoid some of the input checking.

Additional information

Some crude benchmarks. New code is 5-125 times faster. All have linear dependence on nnz. The convert-to-coo version has three curves on top of each other with a lower slope. Below is the picture with just the convert-to-coo curves to show that they do increase, just at a lower slope.

[Edit -- legend in the second plot should indicate the new branch -- not main]

perimosocordiae

Could you also run the benchmarks for the case where the diagonal is already set in the sparsity pattern? This would involve calling .setdiag(1) once in the setup code, before the timing loop.

perimosocordiae · 2024-01-25T16:18:08Z

scipy/sparse/_coo.py

                x.sum_duplicates()
            return x

+    def _coo_to_compressed(self, swap):  #major_dim, minor_dim, major, minor, data):


Seems like we should be calling this inside the tocsc() and tocsr() methods above.

Also, I assume we don't need the comment here.

To call the helper function inside tocsr and tocsc, I made '_swap' a staticmethod in csr and csc.

I could have defined a new function swap in each of tocsr and tocsc
Let me know if you'd prefer not to make '_swap' a staticmethod in csr/csc.

perimosocordiae · 2024-01-25T16:20:50Z

scipy/sparse/_compressed.py

+        coo = self.tocoo()
+        coo._setdiag(values, k)
+        arrays = coo._coo_to_compressed(self._swap)
+        self.indptr, self.indices, self.data, _ = arrays


This will always make a new copy of the index and data arrays, while (I think) the old version would update those arrays in-place if possible.

This might not be an issue in practice, but it's something to keep in mind.

scipy/sparse/_compressed.py

dschult

I've implemented your review suggestions.

The benchmarks when values on the diagonal are already set show that the original method is faster (see below). Looks like number of new values being set is key.

The time needed for that case is quite small, maybe it's not a crunch point... but the conversion approach is a factor of >4 slower for that quite small value. I'll see if there is a clear line in terms of the number of new entries.

scipy/sparse/_compressed.py

dschult · 2024-01-25T20:09:22Z

scipy/sparse/_coo.py

                x.sum_duplicates()
            return x

+    def _coo_to_compressed(self, swap):  #major_dim, minor_dim, major, minor, data):


To call the helper function inside tocsr and tocsc, I made '_swap' a staticmethod in csr and csc.

I could have defined a new function swap in each of tocsr and tocsc
Let me know if you'd prefer not to make '_swap' a staticmethod in csr/csc.

dschult · 2024-01-26T03:55:31Z

Which is faster when? converting-to-coo or indexing-csr?
Most cases for any N with small density show converting-to-coo as much faster -- often orders of magnitude faster. But as the nonzero structure of the particular diagonal updated becomes dense, there is a linear decrease in time. Indeed indexing-csr wins when the diagonal of interest is 100% dense. Thus there are some cases (dense diagonal being set) when the indexing approach is faster.

The nice linear dependence on filled diagonal entries is shown in the plot below. The convert-to-coo method is flat, while the indexing method slopes downward. For sparse matrixes (0.01 or 0.001 density) essentially all entries must be already filled for indexing to work well. But for densities of 0.1 or higher, the trade-off occurs at a lower rate of filled diagonal entries. The speed-up is not huge (< factor of 2 unless 100% dense diagonal), but it is there.

I don't see a good way to find a cutoff automatically. The crossing point seems to depend on nnz and N. We can certainly have both codes in the same function if we can find a formula for when to use which.

The code used for benchmarking is in details. It uses a version of this branch with both indexing-csr (version=0) and convert-tocoo (version=1) available with a version kwarg added to _setdiag.

import numpy as np
import matplotlib.pyplot as plt
import scipy.sparse
import time
plt.ion()

def measure_times(M, N, density, preset):
    A = scipy.sparse.random(M, N, density=density, format='csc',dtype=np.float64)
    A.setdiag(values=0, k=0)
    if preset:
        A.setdiag(values=np.ones(preset, dtype=np.float64), k=0)
    A.eliminate_zeros()

    runs = 2
    time_csc = 0
    for i in range(runs + 1):
      A2 = A.copy()
      start = time.time()
      A2._setdiag(np.asarray(1), k=0, version=0)
      end = time.time()
      if i>=2:
        time_csc += end-start

    time_coo = 0
    for i in range(runs + 1):
      A2 = A.copy()
      start = time.time()
      A2._setdiag(np.array(1), k=0, version=1)
      end = time.time()
      if i>=2:
        time_coo += end-start

    count_replacements = len(A.diagonal(0).nonzero()[0])
    print("number of replacement entries: ", count_replacements, time_csc, time_coo)

    return (time_csc, time_coo)

dens = [0.001, 0.01, 0.1, 0.3]
N = 1000

filled = [int(i * N / 10) for i in range(1,9)]
filled += [int(0.8 * N + i * N/50) for i in range(1,11)]
print("N: ",N, " filled: ", filled)
M = len(filled)

all = np.zeros((0,2))
for fig, density in enumerate(dens):
    l = []
    for fill in filled:
      l.append( measure_times(N, N, density, preset=fill) )
    all = np.concatenate((all, l))

data = np.concatenate((all, np.atleast_2d(np.tile(filled, len(dens))).T), axis=1)
print(data.shape)
print(repr(data))

plt.figure(1)
plt.clf()
plt.plot(data[0: M, 2], data[0: M, 0], '-ob')
plt.plot(data[M: 2*M, 2], data[M: 2*M, 0], '-og')
plt.plot(data[2*M: 3*M, 2], data[2*M: 3*M, 0], '-ok')
plt.plot(data[3*M: 4*M, 2], data[3*M: 4*M, 0], '-ob')
#plt.plot(data[4*M: 5*M, 2], data[4*M: 5*M, 0], '-og')
plt.plot(data[0: M, 2], data[0: M, 1], '-om')
plt.plot(data[M: 2*M, 2], data[M: 2*M, 1], '-oc')
plt.plot(data[2*M: 3*M, 2], data[2*M: 3*M, 1], '-or')
plt.plot(data[3*M: 4*M, 2], data[3*M: 4*M, 1], '-om')
#plt.plot(data[4*M: 5*M, 2], data[4*M: 5*M, 1], '-oc')

plt.xlabel(f"filled diag entries (N = {N})")
plt.ylabel("run-time in s")
plt.legend([
    f"main(density {dens[0]})",
    f"main(density {dens[1]})",
    f"main(density {dens[2]})",
    f"main(density {dens[3]})",
#    f"main(density {dens[4]})",
    f"csr_setdiag(density {dens[0]})",
    f"csr_setdiag(density {dens[1]})",
    f"csr_setdiag(density {dens[2]})",
    f"csr_setdiag(density {dens[3]})",
#    f"csr_setdiag(density {dens[4]})",
])

dschult · 2024-02-01T04:38:46Z

As @perimosocordiae suspected, it is faster to use CSC indexing for setdiag when the diagonal already has nonzero values. When I set up a comparison between "converting to coo" and "using csc indexing", I was able to find good parameters to examine for a cutoff formula. It turns out NNZ is better than density and "new values" needed on the diagonal is better than "pct of values being new". The time tradeoff seems to be a simple linear relationship that doesn't change very much with N the size of the matrix. My first estimates had a quadratic relationship between nnz and newvals. But linear is all that was needed. In fact the intercept is zero. And the slope is a simple (approximate) value. The condition that seems to work is:

newvals <= 0.001 * nnz

The following subplots show which method is faster dependent on the two parameter axes nnz and newvals. Blue dot means converting to COO to use setdiag and convert back is faster. Yellow means using CSC indexing is faster. The line between blue and yellow is the boundary we seek to make a cutoff criteria. The magenta line represents the formula above. It seems to match the data quite well for a range of N values. The boundary is not quite linear for larger N and denser matrices -- with a region of blue extending below the line. But that level of density is close to 30%, so I haven't explored it much.

Based on these results I created a version that goes through CSC indexing enough to know how many new values there are. Luckily that part of the calculation is a small portion of the total time. At thatpoint the code decides whether to create the new entires or convert to coo. The following plots show results for the new chimera version. By cutting a linear path through the pictures above and plotting the time for various methods we can see that the chimera version (red -- labeled "new") tracks along close to the faster method -- and a little above due to the extra checking. It then switches to the other method as that crosses to become the faster method. The red dots should be close to the lowest for any x-coordinate, and they seem to be.

The other methods compared to are:

"old"(green) the current main branch version
"csc"(cyan) a sped up version of "old" doing the routines manually to avoid overhead and revealing newvals in the process.
"coo"(magenta) conversion to coo and back. Best choice for empty diagonals.
"new"(red) choose between csc and coo methods based on the above formula.

The lines through parameter space are noted under the subplot. Increasing nnz for low value of newvals. Increaing nnz while decreasing newvals (down the diagonal of the first plot above). Increasing newvals for high values of nnz and increasing newvals for a low value of nnz. In each case, the red curve is close to the shortest time and switches to to the new shortest time branch when they cross. Also note that the "old" and "csc" methods are about the same. The time is well dominated by creating the new entries in the diagonal.

The new commits implement previous suggestions and then implement this heuristic functionality.

perimosocordiae

That analysis was excellent, thanks @dschult ! The changes look good from here.

scipy/sparse/_compressed.py

scipy/sparse/_coo.py

…essed

This test creates a new mixin class for testing the `_cs_matrix._setdiag` function branches from scipygh-19962. As-is, the `_setdiag` function has a bug in the convert to COO branch that does not sort the indices if `self.has_sorted_indices` is set. The test is marked with an `xfail`.

dschult requested a review from perimosocordiae as a code owner January 25, 2024 02:39

github-actions bot added scipy.sparse enhancement A new feature or improvement labels Jan 25, 2024

perimosocordiae requested changes Jan 25, 2024

View reviewed changes

dschult commented Jan 25, 2024

View reviewed changes

dschult force-pushed the csr_setdiag branch from f363f18 to be7188b Compare February 1, 2024 05:04

perimosocordiae approved these changes Feb 1, 2024

View reviewed changes

scipy/sparse/_compressed.py Show resolved Hide resolved

dschult force-pushed the csr_setdiag branch 2 times, most recently from 814d6ba to 97938e6 Compare February 6, 2024 01:51

perimosocordiae reviewed Feb 8, 2024

View reviewed changes

scipy/sparse/_coo.py Outdated Show resolved Hide resolved

dschult added 6 commits March 4, 2024 16:01

csr _setdiag now uses coo._setdiag

1b13802

implement review suggestions

6f6d6f5

use heuristic cutoff for csr setdiag

e68dd24

add comment with link to discussion of formula

dab8585

fix intc error with pythran. Add idx_dtype conversion in coo_to_compr…

f42bb16

…essed

revert idx_dtype based on both M and N to just N

7ea24e9

dschult force-pushed the csr_setdiag branch from 26291e5 to 7ea24e9 Compare March 4, 2024 21:01

perimosocordiae merged commit df5044c into scipy:main Mar 6, 2024

j-bowhay added this to the 1.13.0 milestone Mar 6, 2024

perimosocordiae mentioned this pull request Mar 7, 2024

BUG: Regression in main creating sparse matrix from float16 raises ValueError: Output dtype not compatible with inputs. #20200

Closed

dschult deleted the csr_setdiag branch April 11, 2024 14:30

This was referenced May 10, 2024

BUG: sparse matrix creation in 1.13 with indices not summing entries any more #20670

Closed

BUG: sparse: Fix summing duplicates for CSR/CSC creation from (data,coords) #20687

Merged

EarlMilktea mentioned this pull request Sep 25, 2024

Examine and fix scipy-related test failures cupy/cupy#8621

Closed

dschult mentioned this pull request Dec 17, 2024

[0.25.0] test_random_walker.py tests fail with float16 scikit-image/scikit-image#7635

Closed

broesler mentioned this pull request Sep 19, 2025

BUG: sparse: csc_array.setdiag() makes indices unsorted without setting has_sorted_indices to False #23644

Open

broesler mentioned this pull request Sep 25, 2025

BUG: Fixes #23644: sparse: csc_array.setdiag() makes indices unsorted #23676

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

ENH: sparse: speedup csr/csc setdiag by converting to coo #19962

ENH: sparse: speedup csr/csc setdiag by converting to coo #19962

Uh oh!

dschult commented Jan 25, 2024 •

edited

Loading

Uh oh!

perimosocordiae left a comment

Uh oh!

perimosocordiae Jan 25, 2024

Uh oh!

perimosocordiae Jan 25, 2024

Uh oh!

dschult Jan 25, 2024

Uh oh!

perimosocordiae Jan 25, 2024

Uh oh!

Uh oh!

dschult left a comment

Uh oh!

Uh oh!

dschult Jan 25, 2024

Uh oh!

dschult commented Jan 26, 2024 •

edited

Loading

Uh oh!

dschult commented Feb 1, 2024

Uh oh!

perimosocordiae left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ENH: sparse: speedup csr/csc setdiag by converting to coo #19962

ENH: sparse: speedup csr/csc setdiag by converting to coo #19962

Uh oh!

Conversation

dschult commented Jan 25, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reference issue

What does this implement/fix?

Additional information

Uh oh!

perimosocordiae left a comment

Choose a reason for hiding this comment

Uh oh!

perimosocordiae Jan 25, 2024

Choose a reason for hiding this comment

Uh oh!

perimosocordiae Jan 25, 2024

Choose a reason for hiding this comment

Uh oh!

dschult Jan 25, 2024

Choose a reason for hiding this comment

Uh oh!

perimosocordiae Jan 25, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

dschult left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

dschult Jan 25, 2024

Choose a reason for hiding this comment

Uh oh!

dschult commented Jan 26, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dschult commented Feb 1, 2024

Uh oh!

perimosocordiae left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dschult commented Jan 25, 2024 •

edited

Loading

dschult commented Jan 26, 2024 •

edited

Loading