# Benchmarking sparse data format conversion
*Comparing scipy.sparse, pandas.arrays.SparseArray, and xarray + pydata/sparse*

Dependencies:
 - scipy: numpy
 - pandas: a few (numpy, python-dateutil, pytz)
 - xarray + pydata/sparse: scipy, **numba**, **llvmlite**

In [9]:
import xarray

import numpy as np
import pandas as pd

import sparse
import scipy.sparse

In [50]:
print('numpy', np.__version__)
print('pandas', pd.__version__)
print('scipy', scipy.__version__)
print('xarray', xarray.__version__)
print('sparse', sparse.__version__)

numpy 1.18.1
pandas 1.0.1
scipy 1.4.1
xarray 0.15.0
sparse 0.9.1


## Creating a sparse scipy array

In [12]:
%%time

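rng = np.random.RandomState(0)

# Pass the seeded generator so the matrix is reproducible; density=0.01 is the default.
X = scipy.sparse.rand(1000, 100000, density=0.01, format='csr', random_state=rng)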

CPU times: user 3.97 s, sys: 64.1 ms, total: 4.03 s
Wall time: 4.03 s


In [38]:
X.nnz

1000000

In [13]:
%timeit X.copy()

1.08 ms ± 339 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [33]:
# Size in MB of the stored values and column indices (the small indptr array is not counted).
(X.data.nbytes + X.indices.nbytes) / 1e6

12.0
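
The figure above counts only `data` and `indices`. As a minimal sketch, the full CSR footprint can be computed by also including the row pointer array `indptr`. The helper name `csr_nbytes` and the small matrix `small` are illustrative and not part of the original benchmark.

```python
import numpy as np
import scipy.sparse


def csr_nbytes(m):
    """Total bytes held by the three arrays backing a CSR matrix."""
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes


# Small illustrative matrix (hypothetical sizes, not the benchmark matrix).
rng = np.random.RandomState(0)
small = scipy.sparse.random(100, 1000, density=0.01, format='csr', random_state=rng)

# indptr has one entry per row plus one, so its cost is negligible
# whenever the number of stored values is much larger than the number of rows.
print(small.nnz, small.indptr.shape, csr_nbytes(small))
```

For the benchmark matrix, `indptr` holds only 1001 entries, which is why leaving it out barely changes the 12 MB total.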

In [14]:
%timeit X.asformat('csc')

11.3 ms ± 54.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [54]:
%timeit X.asformat('coo')

3.14 ms ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


## Creating a sparse frame with pandas

In [21]:
%%time

X_df = pd.DataFrame.sparse.from_spmatrix(X)

CPU times: user 16.7 s, sys: 88.1 ms, total: 16.8 s
Wall time: 16.8 s


In [34]:
X_df.dtypes.head()

0    Sparse[float64, 0.0]
1    Sparse[float64, 0.0]
2    Sparse[float64, 0.0]
3    Sparse[float64, 0.0]
4    Sparse[float64, 0.0]
dtype: object

In [28]:
X_df.memory_usage(deep=True).sum() / 1e6

12.000128

In [37]:
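%%time
# The recorded run used the line magic `%time`, which times an empty statement,
# so the timing shown below does not measure this conversion.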

X_df.sparse.to_coo()

CPU times: user 2 µs, sys: 0 ns, total: 2 µs
Wall time: 4.77 µs


<1000x100000 sparse matrix of type '<class 'numpy.float64'>'
	with 1000000 stored elements in COOrdinate format>
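
A quick way to check that the pandas round trip preserves the data is to run it on a small matrix and compare the dense forms. This is a sketch under assumed sizes (20 x 50, density 0.1); the names `small`, `df` and `back` are illustrative and not part of the original benchmark.

```python
import numpy as np
import pandas as pd
import scipy.sparse

# Small illustrative matrix so the dense comparison stays cheap.
rng = np.random.RandomState(0)
small = scipy.sparse.random(20, 50, density=0.1, format='csr', random_state=rng)

# scipy -> pandas sparse frame -> scipy COO -> CSR
df = pd.DataFrame.sparse.from_spmatrix(small)
back = df.sparse.to_coo().tocsr()

# Compare dense copies; only reasonable for small matrices.
roundtrip_ok = np.array_equal(small.toarray(), back.toarray())

# Fraction of explicitly stored values across the whole frame.
density = df.sparse.density

print(roundtrip_ok, back.shape, density)
```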

## Creating sparse xarray arrays via pydata/sparse

In [42]:
%%time

X_xr = xarray.DataArray(sparse.COO.from_scipy_sparse(X))

CPU times: user 15.3 ms, sys: 8 ms, total: 23.3 ms
Wall time: 22.1 ms


In [45]:
%%time

X_xr.data.to_scipy_sparse().asformat('csr')

CPU times: user 18.4 ms, sys: 21 µs, total: 18.4 ms
Wall time: 17.1 ms


<1000x100000 sparse matrix of type '<class 'numpy.float64'>'
	with 1000000 stored elements in Compressed Sparse Row format>