# Comparative benchmarks

Here, we benchmark anndata against other packages (Loom, Seurat), comparing IO time, IO memory, on-disk size, and in-memory size.

In [1]:
import anndata as ad
import scanpy as sc


In [2]:
adata = sc.datasets.pbmc3k()

In [3]:
adata

AnnData object with n_obs × n_vars = 2700 × 32738
    var: 'gene_ids'

## Reading & writing

Let us start by writing and reading anndata's native HDF5-based file format, `.h5ad`:

In [4]:
%%time

adata.write('test.h5ad')

CPU times: user 106 ms, sys: 55.7 ms, total: 162 ms
Wall time: 167 ms


In [5]:
%%time

adata = ad.read_h5ad('test.h5ad')

CPU times: user 88.8 ms, sys: 15.8 ms, total: 105 ms
Wall time: 125 ms


Reading and writing `.h5ad` is much faster than reading and writing Loom files, as the timings below show. The efficiency gain is due to explicit storage of the sparse matrix structure.
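To make "explicit storage of the sparse matrix structure" concrete, here is a minimal pure-Python sketch of compressed sparse row (CSR) storage, assuming the count matrix is kept in CSR layout. The helper names `to_csr` and `from_csr` are hypothetical and only illustrate the idea; in practice one would use `scipy.sparse`.

```python
# Pure-Python illustration of compressed sparse row (CSR) storage.
# `to_csr` and `from_csr` are hypothetical helpers, not part of anndata.

def to_csr(dense):
    """Convert a list-of-lists matrix to CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # the non-zero value itself
                indices.append(col)   # its column index
        indptr.append(len(data))      # where the next row starts in `data`
    return data, indices, indptr


def from_csr(data, indices, indptr, n_cols):
    """Rebuild the dense matrix to check that no information was lost."""
    dense = []
    for r in range(len(indptr) - 1):
        row = [0] * n_cols
        for k in range(indptr[r], indptr[r + 1]):
            row[indices[k]] = data[k]
        dense.append(row)
    return dense


# Toy count matrix: 3 cells x 5 genes, mostly zeros
counts = [
    [0, 0, 3, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 2],
]
data, indices, indptr = to_csr(counts)
dense_entries = len(counts) * len(counts[0])
stored_numbers = len(data) + len(indices) + len(indptr)
print(data, indices, indptr)
print(dense_entries, "dense entries vs", stored_numbers, "stored numbers")
```

Only the non-zero values and their positions are kept, so the saving grows with the fraction of zeros in the matrix.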

In [6]:
%%time

adata.write_loom('test.loom')

CPU times: user 4.68 s, sys: 898 ms, total: 5.58 s
Wall time: 6.08 s


In [7]:
%%time

adata_loom = ad.read_loom('test.loom')

CPU times: user 1.49 s, sys: 542 ms, total: 2.03 s
Wall time: 2.08 s




In [8]:
%%time 

adata.write_zarr('test.zarr')

CPU times: user 156 ms, sys: 44.2 ms, total: 201 ms
Wall time: 294 ms


  warn('ignoring keyword argument %r' % k)


In [9]:
%%time 

adata_zarr = ad.read_zarr('test.zarr')

CPU times: user 75.8 ms, sys: 7.91 ms, total: 83.7 ms
Wall time: 149 ms
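The `%%time` magic is only available in IPython and times a single run. Outside a notebook, a comparable measurement can be sketched with `time.perf_counter`, repeating the call to reduce noise. The helper `time_call` and the stand-in `workload` are hypothetical; in this notebook the workload would be something like `lambda: adata.write('test.h5ad')`.

```python
import time
from statistics import median


def time_call(func, repeats=5):
    """Run `func` several times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def workload():
    # Stand-in for an IO call such as adata.write('test.h5ad')
    sum(i * i for i in range(10_000))


durations = time_call(workload, repeats=5)
print(f"median {median(durations) * 1e3:.3f} ms, min {min(durations) * 1e3:.3f} ms")
```

Reporting the median of several runs makes the comparison less sensitive to one-off effects such as a cold file-system cache.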


## Memory

Next, we measure the IO memory, on-disk size and in-memory size.

In [10]:
%load_ext memory_profiler

`%memit` measures the memory use of a single statement written on the same line; the cell magic `%%memit` measures a whole cell.
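For use outside IPython, a rough equivalent can be sketched with the standard-library `tracemalloc` module. This is an assumption-laden stand-in rather than what `memory_profiler` does: `tracemalloc` only sees allocations made through Python's memory allocators, so memory used inside native libraries may be missed, and the numbers are not directly comparable to the process-level peaks reported in this notebook. The helper `peak_allocation_mib` is hypothetical.

```python
import tracemalloc


def peak_allocation_mib(func):
    """Return (result, peak) where peak is the peak traced allocation in MiB."""
    tracemalloc.start()
    try:
        result = func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 1024 / 1024


# Stand-in workload; in this notebook it could be
# lambda: ad.read_zarr('test.zarr')
result, peak = peak_allocation_mib(lambda: [0] * 1_000_000)
print(f"peak: {peak:.2f} MiB")
```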

In [11]:
%%memit

adata.write('test.h5ad')

peak memory: 529.79 MiB, increment: 0.49 MiB


In [12]:
%%memit

adata = ad.read_h5ad('test.h5ad')

peak memory: 530.32 MiB, increment: 0.00 MiB


In [13]:
%%time

adata.write_loom('test.loom')

CPU times: user 3.35 s, sys: 687 ms, total: 4.04 s
Wall time: 4.99 s


In [14]:
%%memit

adata_loom = ad.read_loom('test.loom')

peak memory: 526.47 MiB, increment: 0.00 MiB




In [15]:
%%memit

adata.write_zarr('test.zarr')

peak memory: 522.41 MiB, increment: 0.00 MiB


  warn('ignoring keyword argument %r' % k)


In [16]:
%%memit

adata_zarr = ad.read_zarr('test.zarr')

peak memory: 522.43 MiB, increment: 0.00 MiB


Next, we look at the in-memory size of the different objects. The object read back from the Loom file has a larger in-memory size than the other two.

In [17]:
import sys

# Convert the reported size from bytes to MiB
adata_size = sys.getsizeof(adata) / 1024 / 1024
adata_loom_size = sys.getsizeof(adata_loom) / 1024 / 1024
adata_zarr_size = sys.getsizeof(adata_zarr) / 1024 / 1024

print(adata_size, "MiB")
print(adata_loom_size, "MiB")
print(adata_zarr_size, "MiB")



22.992788314819336 MiB
27.154497146606445 MiB
22.992788314819336 MiB


  return X.__sizeof__()
  return X.__sizeof__()
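`sys.getsizeof` reports what an object's `__sizeof__` method returns (plus garbage-collector overhead where applicable), so the result is only as deep as that method makes it. The sketch below uses a hypothetical `Table` class to show the difference between a shallow and a payload-aware size; it does not reproduce how any particular library computes its size.

```python
import sys


class Table:
    """Hypothetical container that reports the size of its payload."""

    def __init__(self, rows):
        self.rows = rows

    def __sizeof__(self):
        # Own footprint plus the payload list and each element it holds
        own = object.__sizeof__(self)
        payload = sys.getsizeof(self.rows) + sum(sys.getsizeof(r) for r in self.rows)
        return own + payload


rows = [bytes(1024) for _ in range(100)]
table = Table(rows)

shallow_mib = sys.getsizeof(rows) / 1024 / 1024  # the list only, not its elements
deep_mib = sys.getsizeof(table) / 1024 / 1024    # uses Table.__sizeof__
print(f"{shallow_mib:.4f} MiB shallow vs {deep_mib:.4f} MiB via __sizeof__")
```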


## Benchmark summary

A summary of the tests performed above for datasets of different sizes.

<table>
<tr>
    <th> Package </th>
    <th> Dataset size </th>
    <th> IO time </th>
    <th> IO memory </th>
    <th> On-disk size </th>
    <th> In-memory size </th>
</tr>

<tr>
    <td>Anndata</td> 
    <td>22.99 MiB</td> 
    <td> 230 ms</td>
    <td> 501.57 MiB</td> 
    <td> X </td>
    <td> 22.99 MiB </td>
</tr>
<tr>
    <td>Loom</td> 
    <td>22.99 MiB</td> 
    <td> 4.71 s</td>
    <td> 484.86 MiB </td> 
    <td> X </td>
    <td> 27.15 MiB</td>
</tr>
<tr>
    <td>Anndata</td> 
    <td>169.30 MiB</td> 
    <td> 456 ms</td>
    <td> 1267.84 MiB </td> 
    <td> X </td>
    <td> 173.76 MiB </td>
</tr>


<tr>
    <td>Loom</td> 
    <td>169.30 MiB</td> 
    <td> 5.76 s</td>
    <td> 1274.40 MiB </td> 
    <td> X </td>
    <td> 180.84 MiB  </td>
</tr>

</table>
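On-disk size can be measured by summing file sizes. The sketch below assumes that `.h5ad` and `.loom` outputs are single files, while a Zarr store written to a path such as `test.zarr` is a directory of many files. The helper `on_disk_size_mib` is hypothetical, and the demo uses temporary files rather than the files written above.

```python
import os
import tempfile


def on_disk_size_mib(path):
    """Size of a file, or total size of all files under a directory, in MiB."""
    if os.path.isfile(path):
        total = os.path.getsize(path)
    else:
        total = 0
        for root, _dirs, files in os.walk(path):
            for name in files:
                total += os.path.getsize(os.path.join(root, name))
    return total / 1024 / 1024


# Self-contained demo with temporary files
with tempfile.TemporaryDirectory() as tmp:
    single = os.path.join(tmp, "single.bin")
    with open(single, "wb") as fh:
        fh.write(bytes(2 * 1024 * 1024))      # one 2 MiB file

    store = os.path.join(tmp, "store")
    os.makedirs(os.path.join(store, "chunks"))
    for i in range(3):
        with open(os.path.join(store, "chunks", f"{i}.bin"), "wb") as fh:
            fh.write(bytes(1024 * 1024))      # three 1 MiB chunk files

    single_mib = on_disk_size_mib(single)
    store_mib = on_disk_size_mib(store)

print(single_mib, "MiB")
print(store_mib, "MiB")
```

With the files from this notebook, the calls would be `on_disk_size_mib('test.h5ad')`, `on_disk_size_mib('test.loom')`, and `on_disk_size_mib('test.zarr')`.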

In [None]:
install.packages('Seurat')
library(Seurat)