# Comparitive benchmarks

Here, we perform benchmarks to compare anndata's performance vs other packages (Loom, Seurat). We'll be looking into IO time, IO memory, on-disk size and in-memory size.

In [1]:
import anndata as ad
import scanpy as sc

In [2]:
adata = sc.datasets.pbmc3k()

In [3]:
adata

AnnData object with n_obs × n_vars = 2700 × 32738
    var: 'gene_ids'

## Reading & writing

Let us start by writing & reading anndata's native HDF5 file format: `.h5ad`:

In [4]:
%%time

adata.write('test.h5ad')

CPU times: user 97.9 ms, sys: 31.6 ms, total: 130 ms
Wall time: 165 ms


In [5]:
%%time

adata = ad.read('test.h5ad')

CPU times: user 64 ms, sys: 35.4 ms, total: 99.5 ms
Wall time: 99.4 ms


We see that reading and writing is much faster than for loom files. The efficiency gain here is due to explicit storage of the sparse matrix structure.

In [6]:
%%time

adata.write_loom('test.loom')

CPU times: user 4.29 s, sys: 1.41 s, total: 5.7 s
Wall time: 7.43 s


In [7]:
%%time

adata_loom = ad.read_loom('test.loom')

CPU times: user 1.27 s, sys: 227 ms, total: 1.5 s
Wall time: 1.55 s




In [8]:
%%time 

adata.write_zarr('test.zarr')

  warn('ignoring keyword argument %r' % k)


CPU times: user 173 ms, sys: 41.9 ms, total: 215 ms
Wall time: 391 ms


In [9]:
%%time 

adata_zarr = ad.read_zarr('test.zarr')

CPU times: user 55.8 ms, sys: 0 ns, total: 55.8 ms
Wall time: 65.2 ms


## Memory

Next, we benchmark the IO memory, On disk size and In memory size

In [10]:
%load_ext memory_profiler

In [11]:
%memit

adata = ad.read('test.h5ad')

peak memory: 403.57 MiB, increment: 1.28 MiB


In [12]:
%memit

adata_loom = ad.read_loom('test.loom')

peak memory: 387.79 MiB, increment: 0.00 MiB




In [13]:
%memit

adata_zarr = ad.read_zarr('test.zarr')

peak memory: 382.39 MiB, increment: 0.75 MiB


In [14]:
adata_size = sys.getsizeof(adata)/1024/1024
adata_loom_size = sys.getsizeof(adata_loom)/1024/1024
adata_zarr_size = sys.getsizeof(adata_zarr)/1024/1024

print(adata_size, "MiB")
print(adata_loom_size, "MiB")
print(adata_zarr_size,"MiB")



22.992788314819336 MiB
27.154497146606445 MiB
22.992788314819336 MiB


  return X.__sizeof__()
  return X.__sizeof__()


## Benchmark summary

A summary of the tests performed above for datasets of different sizes.

<table>
<tr>
    <th> Package </th>
    <th> Dataset size </th>
    <th> IO Time</th>
    <th> IO Memory </th>
    <th> On disk size </th>
    <th> In memory size </th> 
</tr>

<tr>
    <td>Anndata</td> 
    <td>5 MB</td> 
    <td> 516 ms</td>
    <td> 442.MiB </td> 
    <td> X </td>
    <td> X </td>
</tr>
<tr>
    <td>Anndata</td> 
    <td>20 MB</td> 
    <td> 516 ms</td>
    <td> 442.MiB </td> 
    <td> X </td>
    <td> X </td>
</tr>

<tr>
    <td>Loom</td> 
    <td>5 MB</td> 
    <td> 516 ms</td>
    <td> 442.MiB </td> 
    <td> X </td>
    <td> X </td>
</tr>
<tr>
    <td>Loom</td> 
    <td>20 MB</td> 
    <td> 516 ms</td>
    <td> 442.MiB </td> 
    <td> X </td>
    <td> X </td>
</tr>

</table>