-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
ENH: sparse: speedup csr/csc setdiag by converting to coo #19962
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also run the benchmarks for the case where the diagonal is already set in the sparsity pattern? This would involve calling .setdiag(1)
once in the setup code, before the timing loop.
scipy/sparse/_coo.py
Outdated
x.sum_duplicates() | ||
return x | ||
|
||
def _coo_to_compressed(self, swap): #major_dim, minor_dim, major, minor, data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we should be calling this inside the tocsc()
and tocsr()
methods above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I assume we don't need the comment here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To call the helper function inside tocsr
and tocsc
, I made '_swap' a staticmethod
in csr
and csc
.
I could have defined a new function swap in each of tocsr
and tocsc
Let me know if you'd prefer not to make '_swap' a staticmethod in csr/csc.
scipy/sparse/_compressed.py
Outdated
coo = self.tocoo() | ||
coo._setdiag(values, k) | ||
arrays = coo._coo_to_compressed(self._swap) | ||
self.indptr, self.indices, self.data, _ = arrays |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will always make a new copy of the index and data arrays, while (I think) the old version would update those arrays in-place if possible.
This might not be an issue in practice, but it's something to keep in mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've implemented your review suggestions.
The benchmarks when values on the diagonal are already set show that the original method is faster (see below). Looks like number of new values being set is key.
The time needed for that case is quite small, maybe it's not a crunch point... but the conversion approach is a factor of >4 slower for that quite small value. I'll see if there is a clear line in terms of the number of new entries.
scipy/sparse/_coo.py
Outdated
x.sum_duplicates() | ||
return x | ||
|
||
def _coo_to_compressed(self, swap): #major_dim, minor_dim, major, minor, data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To call the helper function inside tocsr
and tocsc
, I made '_swap' a staticmethod
in csr
and csc
.
I could have defined a new function swap in each of tocsr
and tocsc
Let me know if you'd prefer not to make '_swap' a staticmethod in csr/csc.
Which is faster when? converting-to-coo or indexing-csr? The nice linear dependence on filled diagonal entries is shown in the plot below. The convert-to-coo method is flat, while the indexing method slopes downward. For sparse matrixes ( I don't see a good way to find a cutoff automatically. The crossing point seems to depend on nnz and N. We can certainly have both codes in the same function if we can find a formula for when to use which. The code used for benchmarking is in details. It uses a version of this branch with both indexing-csr (version=0) and convert-tocoo (version=1) available with a version kwarg added to import numpy as np
import matplotlib.pyplot as plt
import scipy.sparse
import time
plt.ion()
def measure_times(M, N, density, preset):
A = scipy.sparse.random(M, N, density=density, format='csc',dtype=np.float64)
A.setdiag(values=0, k=0)
if preset:
A.setdiag(values=np.ones(preset, dtype=np.float64), k=0)
A.eliminate_zeros()
runs = 2
time_csc = 0
for i in range(runs + 1):
A2 = A.copy()
start = time.time()
A2._setdiag(np.asarray(1), k=0, version=0)
end = time.time()
if i>=2:
time_csc += end-start
time_coo = 0
for i in range(runs + 1):
A2 = A.copy()
start = time.time()
A2._setdiag(np.array(1), k=0, version=1)
end = time.time()
if i>=2:
time_coo += end-start
count_replacements = len(A.diagonal(0).nonzero()[0])
print("number of replacement entries: ", count_replacements, time_csc, time_coo)
return (time_csc, time_coo)
dens = [0.001, 0.01, 0.1, 0.3]
N = 1000
filled = [int(i * N / 10) for i in range(1,9)]
filled += [int(0.8 * N + i * N/50) for i in range(1,11)]
print("N: ",N, " filled: ", filled)
M = len(filled)
all = np.zeros((0,2))
for fig, density in enumerate(dens):
l = []
for fill in filled:
l.append( measure_times(N, N, density, preset=fill) )
all = np.concatenate((all, l))
data = np.concatenate((all, np.atleast_2d(np.tile(filled, len(dens))).T), axis=1)
print(data.shape)
print(repr(data))
plt.figure(1)
plt.clf()
plt.plot(data[0: M, 2], data[0: M, 0], '-ob')
plt.plot(data[M: 2*M, 2], data[M: 2*M, 0], '-og')
plt.plot(data[2*M: 3*M, 2], data[2*M: 3*M, 0], '-ok')
plt.plot(data[3*M: 4*M, 2], data[3*M: 4*M, 0], '-ob')
#plt.plot(data[4*M: 5*M, 2], data[4*M: 5*M, 0], '-og')
plt.plot(data[0: M, 2], data[0: M, 1], '-om')
plt.plot(data[M: 2*M, 2], data[M: 2*M, 1], '-oc')
plt.plot(data[2*M: 3*M, 2], data[2*M: 3*M, 1], '-or')
plt.plot(data[3*M: 4*M, 2], data[3*M: 4*M, 1], '-om')
#plt.plot(data[4*M: 5*M, 2], data[4*M: 5*M, 1], '-oc')
plt.xlabel(f"filled diag entries (N = {N})")
plt.ylabel("run-time in s")
plt.legend([
f"main(density {dens[0]})",
f"main(density {dens[1]})",
f"main(density {dens[2]})",
f"main(density {dens[3]})",
# f"main(density {dens[4]})",
f"csr_setdiag(density {dens[0]})",
f"csr_setdiag(density {dens[1]})",
f"csr_setdiag(density {dens[2]})",
f"csr_setdiag(density {dens[3]})",
# f"csr_setdiag(density {dens[4]})",
]) |
As @perimosocordiae suspected, it is faster to use CSC indexing for
The following subplots show which method is faster dependent on the two parameter axes Based on these results I created a version that goes through CSC indexing enough to know how many new values there are. Luckily that part of the calculation is a small portion of the total time. At thatpoint the code decides whether to create the new entires or convert to coo. The following plots show results for the new chimera version. By cutting a linear path through the pictures above and plotting the time for various methods we can see that the chimera version (red -- labeled "new") tracks along close to the faster method -- and a little above due to the extra checking. It then switches to the other method as that crosses to become the faster method. The red dots should be close to the lowest for any x-coordinate, and they seem to be. The other methods compared to are:
The lines through parameter space are noted under the subplot. Increasing The new commits implement previous suggestions and then implement this heuristic functionality. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That analysis was excellent, thanks @dschult ! The changes look good from here.
814d6ba
to
97938e6
Compare
This test creates a new mixin class for testing the `_cs_matrix._setdiag` function branches from scipygh-19962. As-is, the `_setdiag` function has a bug in the convert to COO branch that does not sort the indices if `self.has_sorted_indices` is set. The test is marked with an `xfail`.
Reference issue
Fixes gh-19943
What does this implement/fix?
Implement setdiag for compressed format that avoids index value setting by converting to coo format, calling setdiag, and then converting back.
The return conversion took two steps -- converting to csr/csc and then updating the data on self. By creating a private convenience function to shortcut the tocsr/tocsc function, we avoid one of those steps. This also helps a few places in
_compressed.__init__
where we convert to coo and back again. We can avoid some of the input checking.Additional information
Some crude benchmarks. New code is 5-125 times faster. All have linear dependence on nnz. The convert-to-coo version has three curves on top of each other with a lower slope. Below is the picture with just the convert-to-coo curves to show that they do increase, just at a lower slope.
[Edit -- legend in the second plot should indicate the new branch -- not
main
]