This notebook analyzes the performance of BTrees. BTrees.LOBTree is a B-tree mapping long integer keys to objects, which makes it a natural id-to-object mapping.
But it takes a long time to build: 1M objects take about 8 seconds. A dict is much faster, so unless you need ordered key comparisons (`>`, `<`) there's no reason to use a BTree.

Still, SortedDict might be slower than a BTree, so that is worth checking.

Nope: SortedDict is better on both build time and access time. The only advantage of the BTree is that it uses about half the RAM, which is not enough to justify using it.
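
To make the `>` / `<` case concrete, here is a minimal sketch of a key-range query, which is the kind of lookup an ordered mapping supports and a plain dict does not. It uses only the standard library (`bisect` over a sorted key list next to a dict) rather than BTrees or SortedDict; the helper `range_query` and the toy data are hypothetical, for illustration only.

```python
from bisect import bisect_left, bisect_right

def range_query(sorted_keys, mapping, lo, hi):
    """Return (key, value) pairs with lo <= key <= hi, in key order."""
    start = bisect_left(sorted_keys, lo)   # first index whose key is >= lo
    stop = bisect_right(sorted_keys, hi)   # first index whose key is > hi
    return [(k, mapping[k]) for k in sorted_keys[start:stop]]

# Toy data (hypothetical): a dict plus a separately maintained sorted key list.
mapping = {5: 'e', 1: 'a', 9: 'i', 3: 'c', 7: 'g'}
sorted_keys = sorted(mapping)

result = range_query(sorted_keys, mapping, 3, 7)
print(result)
```

If all lookups are by exact key, none of this ordering machinery is needed and a plain dict suffices.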

In [2]:
from bisect import bisect_left
import random
import time
import sys
import numpy as np
from sortedcontainers import SortedDict, SortedSet
from pympler.asizeof import asizeof
import sortednp as snp
from cykhash import Int64Set
from operator import itemgetter
from typing import Callable, Union, List, Any, Tuple
from collections import Counter, namedtuple
from dataclasses import dataclass
from BTrees.LOBTree import LOBTreePy


In [3]:
n=10**5
items = [random.random() for _ in range(n)]
ids = [id(item) for item in items]

In [9]:
t0 = time.time()
bt = LOBTreePy()
for i in range(n):
    bt[ids[i]] = items[i]
t1 = time.time()
print('btree build', t1-t0)

btree build 0.6621558666229248


In [10]:
t0 = time.time()
sd = SortedDict()
for i in range(n):
    sd[ids[i]] = items[i]
t1 = time.time()
print('sorteddict build', t1-t0)

sorteddict build 0.1623368263244629


In [19]:
sd_size = asizeof(sd)
bt_size = 51.4*n  # 51.4 bytes per item, from a separate benchmark; asizeof doesn't work on bt since it's a C object
print(f'sorteddict is {round(sd_size / bt_size, 2)}x the size of btree')

sorteddict is 2.11x the size of btree


In [20]:
rand_ids = list(random.choice(ids) for _ in range(10**4))

In [21]:
%%timeit -n 3 -r 3
list(bt.get(r) for r in rand_ids)

7.37 ms ± 2.67 ms per loop (mean ± std. dev. of 3 runs, 3 loops each)


In [22]:
%%timeit -n 3 -r 3
list(sd.get(r) for r in rand_ids)

4.53 ms ± 2.19 ms per loop (mean ± std. dev. of 3 runs, 3 loops each)
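
The spreads above are large relative to the means (3 runs of 3 loops), so the gap between the two timings may be partly noise. One possible way to get a steadier number is to take the minimum over several repeats with the standard library's `timeit.repeat`. The sketch below is an assumption about how that could look, not part of the original measurements: `best_lookup_time` is a hypothetical helper, and a plain dict stands in for `bt` / `sd` so the snippet runs without extra packages.

```python
import random
import timeit

def best_lookup_time(mapping, keys, repeat=5, number=3):
    """Return the fastest per-loop time in seconds for looking up every key."""
    timings = timeit.repeat(
        lambda: [mapping.get(k) for k in keys],
        repeat=repeat,   # number of independent measurements
        number=number,   # loops per measurement
    )
    return min(timings) / number

# Stand-in data (hypothetical): a plain dict instead of a BTree or SortedDict.
n = 10**4
data = {i: random.random() for i in range(n)}
probe_keys = random.choices(list(data), k=10**3)

elapsed = best_lookup_time(data, probe_keys)
print(f'dict lookup, best of 5: {elapsed * 1e3:.3f} ms per loop')
```

Any object with a `.get` method can be passed as `mapping`, so the same helper should work for `bt` and `sd` from the cells above.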
