# Checking the speed of pointwise comparison in numpy and scipy.sparse

The authors of **Fast Multiresolution Image Querying** threshold the wavelet coefficients and keep only a handful of them for comparison (paragraph 2.2 4). One of the motivations is to speed up search.

We want to check whether, for vectors of size 1000 with only a handful of nonzero entries, pointwise comparison is faster with dense numpy arrays or with scipy.sparse CSR matrices.

In [1]:
import numpy as np
from scipy import sparse

In [2]:
N = 1000
vd = np.random.randn(N)

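vds = vd.copy()  # a real copy; vd[:] would only be a view, so zeroing it would also zero vd
vds[:950] = 0  # sparsified version of vd (50 nonzero entries), still a dense numpy array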
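
A side note on how `vds` is derived from `vd`: basic slicing of a numpy array returns a view onto the same memory, not a copy, so writes made through a slice alias also change the original array. The sketch below checks this with `np.shares_memory`; the names `a`, `view` and `copy` are hypothetical and used only for illustration.

```python
import numpy as np

a = np.arange(5, dtype=float)

view = a[:]       # basic slicing: a view onto the same buffer as `a`
copy = a.copy()   # independent array with its own buffer

view[:3] = 0      # writes through the view, so `a` changes as well

print(np.shares_memory(a, view))  # True
print(np.shares_memory(a, copy))  # False
print(a)     # first three entries are now zero
print(copy)  # still holds the original values
```

The timings are unaffected either way, since an elementwise comparison of dense arrays touches every entry regardless of its value.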

In [3]:
vs = sparse.csr_matrix(vds)
vs

<1x1000 sparse matrix of type '<class 'numpy.float64'>'
	with 50 stored elements in Compressed Sparse Row format>

In [4]:
%timeit vs != vs

180 µs ± 12.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [5]:
%timeit vd != vd

1.36 µs ± 34.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [6]:
%timeit vds != vds

1.43 µs ± 118 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


It seems that sparse vectors don't make much sense for this task: the comparison is over 100x slower for them (about 180 µs versus about 1.4 µs per call).
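
The measurement can also be reproduced outside IPython with the standard-library `timeit` module. This is a minimal sketch, not the notebook's exact setup: the seeded generator, the helper `per_call` and the loop counts are assumptions, and absolute numbers will differ between machines and library versions.

```python
import timeit

import numpy as np
from scipy import sparse

N = 1000
rng = np.random.default_rng(0)    # seeded generator (assumption, for repeatability)

dense = rng.standard_normal(N)
truncated = dense.copy()
truncated[:950] = 0               # only the last 50 entries stay nonzero
csr = sparse.csr_matrix(truncated)

def per_call(fn, number=1000, repeat=3):
    """Best-of-`repeat` average time per call, in seconds."""
    return min(timeit.repeat(fn, number=number, repeat=repeat)) / number

t_dense = per_call(lambda: truncated != truncated)
t_csr = per_call(lambda: csr != csr)

print(f"dense: {t_dense * 1e6:.2f} µs per call")
print(f"csr:   {t_csr * 1e6:.2f} µs per call")
print(f"ratio: {t_csr / t_dense:.0f}x")
```

Taking the minimum over repeats, rather than the mean that `%timeit` reports, reduces the influence of unrelated system load on the comparison.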

To sum up, for vectors of this size plain numpy arrays are both faster and easier to work with than a structure that stores only the nonzero coefficients, so that is what we are going to use.
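
As an illustration of the dense approach, the sketch below keeps the `m` largest-magnitude coefficients of a vector in an ordinary numpy array and counts the positions where two truncated vectors agree in sign. The helpers `truncate` and `count_matches`, the value `m = 50` and the random test data are hypothetical; this is not the paper's exact scoring function, only one simple way to compare truncated dense vectors.

```python
import numpy as np

def truncate(coeffs, m):
    """Return a copy of `coeffs` with all but the m largest-magnitude entries zeroed."""
    out = np.zeros_like(coeffs)
    keep = np.argpartition(np.abs(coeffs), -m)[-m:]  # indices of the m largest |values|
    out[keep] = coeffs[keep]
    return out

def count_matches(a, b):
    """Count positions where both vectors are nonzero and have the same sign."""
    return int(np.count_nonzero((a != 0) & (np.sign(a) == np.sign(b))))

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = x + 0.1 * rng.standard_normal(1000)   # slightly perturbed version of x

tx, ty = truncate(x, 50), truncate(y, 50)
print(count_matches(tx, tx))  # a vector matches itself on all 50 kept entries
print(count_matches(tx, ty))  # somewhere between 0 and 50
```

`np.argpartition` selects the top `m` entries without fully sorting the vector, which is enough here because the order among the kept coefficients does not matter.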