# Comparative benchmarks

Here, we run simple benchmarks of reading, writing, and memory use on the PBMC 3k dataset.

In [13]:
import anndata as ad
import scanpy as sc

In [14]:
adata = sc.datasets.pbmc3k()

In [15]:
adata

AnnData object with n_obs × n_vars = 2700 × 32738
    var: 'gene_ids'

## Reading & writing

Let us start by writing and reading anndata's native HDF5-based file format, `.h5ad`:

In [29]:
%%time

adata.write('test.h5ad')

CPU times: user 164 ms, sys: 30.3 ms, total: 194 ms
Wall time: 238 ms


In [30]:
%%time

adata = ad.read('test.h5ad')

CPU times: user 78 ms, sys: 27 ms, total: 105 ms
Wall time: 109 ms


Reading and writing `.h5ad` files is much faster than reading and writing loom files, as the timings below show. The efficiency gain comes from storing the sparse matrix structure explicitly.
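
To make "explicit storage of the sparse matrix structure" concrete, here is a minimal pure-Python sketch of the compressed sparse row (CSR) layout, in which only the nonzero values and their positions are kept. The helper names `dense_to_csr` and `csr_row` are hypothetical and written for illustration only; this is not anndata's code, and treating CSR as the layout the sentence refers to is an assumption.

```python
def dense_to_csr(rows):
    """Convert a dense list-of-lists matrix to CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # nonzero value
                indices.append(col)   # its column index
        indptr.append(len(data))      # where the next row starts in `data`
    return data, indices, indptr


def csr_row(data, indices, indptr, i, n_cols):
    """Rebuild dense row i from the CSR arrays."""
    row = [0] * n_cols
    for k in range(indptr[i], indptr[i + 1]):
        row[indices[k]] = data[k]
    return row


dense = [
    [0, 0, 3, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 2, 0, 4],
]
data, indices, indptr = dense_to_csr(dense)
print(data)     # [3, 1, 2, 4]
print(indices)  # [2, 0, 1, 3]
print(indptr)   # [0, 1, 2, 2, 4]
```

For a mostly empty matrix such as a single-cell count matrix, the three arrays are far smaller than the dense grid, so a format that writes them directly has much less data to move.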

In [31]:
%%time

adata.write_loom('test.loom')

CPU times: user 2.86 s, sys: 785 ms, total: 3.64 s
Wall time: 3.67 s


In [33]:
%%time

adata_loom = ad.read_loom('test.loom')

CPU times: user 1.57 s, sys: 234 ms, total: 1.81 s
Wall time: 1.82 s


For comparison, we time the same operations for the zarr format:

In [34]:
%%time 

adata.write_zarr('test.zarr')

CPU times: user 176 ms, sys: 68 ms, total: 244 ms
Wall time: 270 ms




In [35]:
%%time 

adata_zarr = ad.read_zarr('test.zarr')

CPU times: user 52.4 ms, sys: 8.15 ms, total: 60.6 ms
Wall time: 66.5 ms
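
Outside a notebook, the `%%time` measurements above can be approximated with the standard library. The sketch below is one possible way to do it; the helper name `time_call` is hypothetical, and the workload is a stand-in for the actual read and write calls.

```python
import time


def time_call(func, *args, **kwargs):
    """Run func once and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    result = func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


# Stand-in workload; in this benchmark it would be a call such as
# time_call(adata.write, 'test.h5ad').
result, wall, cpu = time_call(sum, range(1_000_000))
print(f"wall: {wall * 1e3:.1f} ms, cpu: {cpu * 1e3:.1f} ms")
```

A single run is noisy, particularly for I/O, where the operating system's file cache can make a second read much faster than the first. Repeating the call and reporting the spread gives a fairer comparison.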


## Memory

Next, we benchmark I/O memory usage, on-disk size, and in-memory size.

In [28]:
%load_ext memory_profiler

The memory_profiler extension is already loaded. To reload it, use:
  %reload_ext memory_profiler


In [50]:
%memit

adata = ad.read('test.h5ad')

peak memory: 469.88 MiB, increment: 0.00 MiB


In [51]:
%memit

adata_loom = ad.read_loom('test.loom')

peak memory: 475.45 MiB, increment: 0.00 MiB




In [52]:
%memit

adata_zarr = ad.read_zarr('test.zarr')

peak memory: 467.76 MiB, increment: 0.01 MiB
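
Note that `%memit` used as a line magic measures only a statement written on the same line, whereas `%%memit` measures the whole cell. As a rough cross-check that does not depend on `memory_profiler`, peak allocations can also be tracked with the standard library's `tracemalloc`. This is a sketch under assumptions: the helper name `peak_python_memory` is hypothetical, and `tracemalloc` only sees allocations that go through Python's allocator, so it may miss memory used by native libraries and will not match the process-level figures reported by `%memit`.

```python
import tracemalloc


def peak_python_memory(func, *args, **kwargs):
    """Return (result, peak_bytes) for Python allocations made while func runs."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


# Stand-in workload; in this benchmark it would be a read call such as
# peak_python_memory(ad.read_zarr, 'test.zarr').
result, peak = peak_python_memory(lambda: list(range(100_000)))
print(f"peak: {peak / 1024**2:.2f} MiB")
```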


In [53]:
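import math
import sys


def convert_size(size_bytes):
    """Format a byte count as a human-readable string (1024-based units)."""
    if size_bytes == 0:
        return "0B"
    size_name = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
    # Index of the largest unit that does not exceed size_bytes.
    i = int(math.floor(math.log(size_bytes, 1024)))
    p = math.pow(1024, i)
    s = round(size_bytes / p, 2)
    return "%s %s" % (s, size_name[i])


# In-memory size of each object as reported by sys.getsizeof.
print(convert_size(sys.getsizeof(adata)))
print(convert_size(sys.getsizeof(adata_loom)))
print(convert_size(sys.getsizeof(adata_zarr)))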

22.99 MB
27.15 MB
22.99 MB
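
The on-disk size can be measured in a similar spirit. The sketch below assumes that `test.h5ad` and `test.loom` are single files and that `test.zarr` is a directory of chunk files, so directory sizes are summed recursively; the helper name `on_disk_size` is hypothetical.

```python
import os


def on_disk_size(path):
    """Return the size in bytes of a file, or the total size of a directory tree."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


# Assumes the files written earlier in this notebook are in the working
# directory; paths that do not exist are skipped.
for path in ("test.h5ad", "test.loom", "test.zarr"):
    if os.path.exists(path):
        print(path, on_disk_size(path), "bytes")
```

The byte counts can be passed to `convert_size` to print them in the same units as the in-memory sizes.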




## Benchmark summary

We repeated the tests on datasets of different sizes.

<table>
<tr>
    <th> Package </th>
    <th> Dataset size </th>
    <th> I/O time </th>
    <th> I/O memory </th>
    <th> On-disk size </th>
    <th> In-memory size </th>
</tr>

<tr>
    <td>Anndata</td> 
    <td>5 MB</td> 
    <td> 516 ms</td>
    <td> 442.MiB </td> 
    <td> X </td>
    <td> X </td>
</tr>
<tr>
    <td>Anndata</td> 
    <td>20 MB</td> 
    <td> 516 ms</td>
    <td> 442.MiB </td> 
    <td> X </td>
    <td> X </td>
</tr>

<tr>
    <td>Loom</td> 
    <td>5 MB</td> 
    <td> 516 ms</td>
    <td> 442.MiB </td> 
    <td> X </td>
    <td> X </td>
</tr>
<tr>
    <td>Loom</td> 
    <td>20 MB</td> 
    <td> 516 ms</td>
    <td> 442.MiB </td> 
    <td> X </td>
    <td> X </td>
</tr>

</table>