Skip to content

Conversation

dschult
Copy link
Contributor

@dschult dschult commented Jan 25, 2024

Reference issue

Fixes gh-19943

What does this implement/fix?

Implement setdiag for compressed format that avoids index value setting by converting to coo format, calling setdiag, and then converting back.

The return conversion took two steps -- converting to csr/csc and then updating the data on self. By creating a private convenience function to shortcut the tocsr/tocsc function, we avoid one of those steps. This also helps a few places in _compressed.__init__ where we convert to coo and back again. We can avoid some of the input checking.

Additional information

Some crude benchmarks. New code is 5-125 times faster. All have linear dependence on nnz. The convert-to-coo version has three curves on top of each other with a lower slope. Below is the picture with just the convert-to-coo curves to show that they do increase, just at a lower slope.

cross_branch

new_code

[Edit -- legend in the second plot should indicate the new branch -- not main]

@github-actions github-actions bot added scipy.sparse enhancement A new feature or improvement labels Jan 25, 2024
Copy link
Member

@perimosocordiae perimosocordiae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also run the benchmarks for the case where the diagonal is already set in the sparsity pattern? This would involve calling .setdiag(1) once in the setup code, before the timing loop.

x.sum_duplicates()
return x

def _coo_to_compressed(self, swap): #major_dim, minor_dim, major, minor, data):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we should be calling this inside the tocsc() and tocsr() methods above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I assume we don't need the comment here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To call the helper function inside tocsr and tocsc, I made '_swap' a staticmethod in csr and csc.

I could have defined a new function swap in each of tocsr and tocsc
Let me know if you'd prefer not to make '_swap' a staticmethod in csr/csc.

coo = self.tocoo()
coo._setdiag(values, k)
arrays = coo._coo_to_compressed(self._swap)
self.indptr, self.indices, self.data, _ = arrays
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will always make a new copy of the index and data arrays, while (I think) the old version would update those arrays in-place if possible.

This might not be an issue in practice, but it's something to keep in mind.

Copy link
Contributor Author

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented your review suggestions.

The benchmarks when values on the diagonal are already set show that the original method is faster (see below). Looks like number of new values being set is key.

The time needed for that case is quite small, maybe it's not a crunch point... but the conversion approach is a factor of >4 slower for that quite small value. I'll see if there is a clear line in terms of the number of new entries.

DiagRand

DiagPreset

x.sum_duplicates()
return x

def _coo_to_compressed(self, swap): #major_dim, minor_dim, major, minor, data):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To call the helper function inside tocsr and tocsc, I made '_swap' a staticmethod in csr and csc.

I could have defined a new function swap in each of tocsr and tocsc
Let me know if you'd prefer not to make '_swap' a staticmethod in csr/csc.

@dschult
Copy link
Contributor Author

dschult commented Jan 26, 2024

Which is faster when? converting-to-coo or indexing-csr?
Most cases for any N with small density show converting-to-coo as much faster -- often orders of magnitude faster. But as the nonzero structure of the particular diagonal updated becomes dense, there is a linear decrease in time. Indeed indexing-csr wins when the diagonal of interest is 100% dense. Thus there are some cases (dense diagonal being set) when the indexing approach is faster.

The nice linear dependence on filled diagonal entries is shown in the plot below. The convert-to-coo method is flat, while the indexing method slopes downward. For sparse matrixes (0.01 or 0.001 density) essentially all entries must be already filled for indexing to work well. But for densities of 0.1 or higher, the trade-off occurs at a lower rate of filled diagonal entries. The speed-up is not huge (< factor of 2 unless 100% dense diagonal), but it is there.

I don't see a good way to find a cutoff automatically. The crossing point seems to depend on nnz and N. We can certainly have both codes in the same function if we can find a formula for when to use which.

bench_preset_values

The code used for benchmarking is in details. It uses a version of this branch with both indexing-csr (version=0) and convert-tocoo (version=1) available with a version kwarg added to _setdiag.

import numpy as np
import matplotlib.pyplot as plt
import scipy.sparse
import time
plt.ion()

def measure_times(M, N, density, preset):
    A = scipy.sparse.random(M, N, density=density, format='csc',dtype=np.float64)
    A.setdiag(values=0, k=0)
    if preset:
        A.setdiag(values=np.ones(preset, dtype=np.float64), k=0)
    A.eliminate_zeros()

    runs = 2
    time_csc = 0
    for i in range(runs + 1):
      A2 = A.copy()
      start = time.time()
      A2._setdiag(np.asarray(1), k=0, version=0)
      end = time.time()
      if i>=2:
        time_csc += end-start

    time_coo = 0
    for i in range(runs + 1):
      A2 = A.copy()
      start = time.time()
      A2._setdiag(np.array(1), k=0, version=1)
      end = time.time()
      if i>=2:
        time_coo += end-start

    count_replacements = len(A.diagonal(0).nonzero()[0])
    print("number of replacement entries: ", count_replacements, time_csc, time_coo)

    return (time_csc, time_coo)

dens = [0.001, 0.01, 0.1, 0.3]
N = 1000

filled = [int(i * N / 10) for i in range(1,9)]
filled += [int(0.8 * N + i * N/50) for i in range(1,11)]
print("N: ",N, " filled: ", filled)
M = len(filled)

all = np.zeros((0,2))
for fig, density in enumerate(dens):
    l = []
    for fill in filled:
      l.append( measure_times(N, N, density, preset=fill) )
    all = np.concatenate((all, l))

data = np.concatenate((all, np.atleast_2d(np.tile(filled, len(dens))).T), axis=1)
print(data.shape)
print(repr(data))

plt.figure(1)
plt.clf()
plt.plot(data[0: M, 2], data[0: M, 0], '-ob')
plt.plot(data[M: 2*M, 2], data[M: 2*M, 0], '-og')
plt.plot(data[2*M: 3*M, 2], data[2*M: 3*M, 0], '-ok')
plt.plot(data[3*M: 4*M, 2], data[3*M: 4*M, 0], '-ob')
#plt.plot(data[4*M: 5*M, 2], data[4*M: 5*M, 0], '-og')
plt.plot(data[0: M, 2], data[0: M, 1], '-om')
plt.plot(data[M: 2*M, 2], data[M: 2*M, 1], '-oc')
plt.plot(data[2*M: 3*M, 2], data[2*M: 3*M, 1], '-or')
plt.plot(data[3*M: 4*M, 2], data[3*M: 4*M, 1], '-om')
#plt.plot(data[4*M: 5*M, 2], data[4*M: 5*M, 1], '-oc')

plt.xlabel(f"filled diag entries (N = {N})")
plt.ylabel("run-time in s")
plt.legend([
    f"main(density {dens[0]})",
    f"main(density {dens[1]})",
    f"main(density {dens[2]})",
    f"main(density {dens[3]})",
#    f"main(density {dens[4]})",
    f"csr_setdiag(density {dens[0]})",
    f"csr_setdiag(density {dens[1]})",
    f"csr_setdiag(density {dens[2]})",
    f"csr_setdiag(density {dens[3]})",
#    f"csr_setdiag(density {dens[4]})",
])

@dschult
Copy link
Contributor Author

dschult commented Feb 1, 2024

As @perimosocordiae suspected, it is faster to use CSC indexing for setdiag when the diagonal already has nonzero values. When I set up a comparison between "converting to coo" and "using csc indexing", I was able to find good parameters to examine for a cutoff formula. It turns out NNZ is better than density and "new values" needed on the diagonal is better than "pct of values being new". The time tradeoff seems to be a simple linear relationship that doesn't change very much with N the size of the matrix. My first estimates had a quadratic relationship between nnz and newvals. But linear is all that was needed. In fact the intercept is zero. And the slope is a simple (approximate) value. The condition that seems to work is:

newvals <= 0.001 * nnz

The following subplots show which method is faster dependent on the two parameter axes nnz and newvals. Blue dot means converting to COO to use setdiag and convert back is faster. Yellow means using CSC indexing is faster. The line between blue and yellow is the boundary we seek to make a cutoff criteria. The magenta line represents the formula above. It seems to match the data quite well for a range of N values. The boundary is not quite linear for larger N and denser matrices -- with a region of blue extending below the line. But that level of density is close to 30%, so I haven't explored it much.

fig_newvals_nnz_faster

Based on these results I created a version that goes through CSC indexing enough to know how many new values there are. Luckily that part of the calculation is a small portion of the total time. At thatpoint the code decides whether to create the new entires or convert to coo. The following plots show results for the new chimera version. By cutting a linear path through the pictures above and plotting the time for various methods we can see that the chimera version (red -- labeled "new") tracks along close to the faster method -- and a little above due to the extra checking. It then switches to the other method as that crosses to become the faster method. The red dots should be close to the lowest for any x-coordinate, and they seem to be.

The other methods compared to are:

  • "old"(green) the current main branch version
  • "csc"(cyan) a sped up version of "old" doing the routines manually to avoid overhead and revealing newvals in the process.
  • "coo"(magenta) conversion to coo and back. Best choice for empty diagonals.
  • "new"(red) choose between csc and coo methods based on the above formula.

fig_time_over_param_lines

The lines through parameter space are noted under the subplot. Increasing nnz for low value of newvals. Increaing nnz while decreasing newvals (down the diagonal of the first plot above). Increasing newvals for high values of nnz and increasing newvals for a low value of nnz. In each case, the red curve is close to the shortest time and switches to to the new shortest time branch when they cross. Also note that the "old" and "csc" methods are about the same. The time is well dominated by creating the new entries in the diagonal.

The new commits implement previous suggestions and then implement this heuristic functionality.

Copy link
Member

@perimosocordiae perimosocordiae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That analysis was excellent, thanks @dschult ! The changes look good from here.

@dschult dschult force-pushed the csr_setdiag branch 2 times, most recently from 814d6ba to 97938e6 Compare February 6, 2024 01:51
@perimosocordiae perimosocordiae merged commit df5044c into scipy:main Mar 6, 2024
@j-bowhay j-bowhay added this to the 1.13.0 milestone Mar 6, 2024
@dschult dschult deleted the csr_setdiag branch April 11, 2024 14:30
broesler added a commit to broesler/scipy that referenced this pull request Sep 25, 2025
This test creates a new mixin class for testing the
`_cs_matrix._setdiag` function branches from scipygh-19962. As-is, the
`_setdiag` function has a bug in the convert to COO branch that does not
sort the indices if `self.has_sorted_indices` is set. The test is marked
with an `xfail`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.sparse
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: sparse: CSC.setdiag is slower than converting to LIL and back
3 participants