#### Multi-attribute indexing

The one place SQLite will still have a speed edge is in multidimensional range queries using a multi-attribute index. For equality, no problem: concatenate the values into a tuple, index on that, and you're good to go. That beats SQLite by a lot, and works on both index types. But a range query like `a < 5 and b < 6`, not so much.
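To make the equality case concrete, here is a minimal sketch of the tuple trick. A plain `dict` stands in for the index, and the toy `objs` are made up for illustration; this is not the FilterBox API.

```python
from collections import defaultdict

# Toy objects; 'a' and 'b' are the two attributes being matched.
objs = [
    {'a': 1, 'b': 'x'},
    {'a': 1, 'b': 'y'},
    {'a': 2, 'b': 'x'},
    {'a': 1, 'b': 'x'},
]

# One index keyed on the (a, b) tuple instead of one index per attribute.
index = defaultdict(list)
for o in objs:
    index[(o['a'], o['b'])].append(o)

# "a == 1 and b == 'x'" becomes a single lookup on the combined key.
matches = index.get((1, 'x'), [])
print(len(matches))
```

The whole tuple is treated as one opaque key, which is exactly why this works for equality and gives no help for a range on just one part.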

Here, let's demo.

In [24]:

import random
from litebox import LiteBox
from filterbox import FilterBox, FrozenFilterBox

objs = [{'a': random.random(), 'b': random.random()} for _ in range(10**6)]
lb = LiteBox(objs, {'a': float, 'b': float})
lb_multi = LiteBox(objs, {'a': float, 'b': float}, index=[('a', 'b')])
fb = FilterBox(objs, ['a', 'b'])
ffb = FrozenFilterBox(objs, ['a', 'b'])


In [25]:
%%timeit
lb.find("a < 0.001 and b < 0.001")

528 µs ± 10.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [26]:
%%timeit
fb[{'a': {'<': 0.001}, 'b': {'<': 0.001}}]

677 µs ± 12.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [27]:
%%timeit
ffb[{'a': {'<': 0.001}, 'b': {'<': 0.001}}]

99.1 µs ± 780 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [29]:
# and now the multi-attribute indexing, blam

In [28]:
%%timeit
lb_multi.find("a < 0.001 and b < 0.001")

46.7 µs ± 322 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [23]:
%%timeit
# not gonna beat it with something naive either
[o for o in ffb[{'a': {'<': 0.001}}] if o['b'] < 0.001]

204 µs ± 3.33 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


Unfortunately, there's not really a good way to implement a multi-attr index here. 
BTree doesn't support multi-attribute lookups afaik.

So we're kinda stuck.

Contrary to popular belief, you can't just "concatenate the keys and use a regular BTree". At least, not with this implementation; it doesn't support separate lookups for "parts" of a key, so you'd be treating the whole key as one object. 

But! We can make a BTree of `{obj: obj}`, so `BTree{key1: BTree{key2: values}}` could work. Except that when `key1`'s values are all unique... you get a whole ton of BTrees.
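A quick way to see the blow-up, sketched with plain dicts standing in for BTrees (an assumption for illustration; the container count is the point, not the container type):

```python
import random

random.seed(0)
objs = [{'a': random.random(), 'b': random.random()} for _ in range(1000)]

# Outer mapping on 'a'; each entry holds its own inner mapping on 'b'.
nested = {}
for o in objs:
    inner = nested.setdefault(o['a'], {})
    inner.setdefault(o['b'], []).append(o)

# With continuous values nearly every 'a' is distinct,
# so there is roughly one inner container per object.
print(len(nested), "inner containers for", len(objs), "objects")
```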

OK, so we still don't have a good idea. Making a multi-attribute BTree out of a single-attribute one doesn't seem doable.

The best hack I can think of is:
 - Build the tree on concatenated keys `{(key1, key2): values}` 
 - Get `{keys: values}` in the range `(k1_min, -inf) < (k1, k2) < (k1_max, inf)`. 
 - Post-filter keys that don't match the k2 constraint.
 - Return only the values with matching keys.
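The steps above can be sketched as follows. This is a rough illustration, not the library's implementation: a sorted list plus `bisect` stands in for the BTree, the `find` helper is hypothetical, and `float('inf')` only works as a sentinel because the keys here are numeric.

```python
import bisect
import random

random.seed(0)
objs = [{'a': random.random() * 10, 'b': random.random()} for _ in range(1000)]

# Step 1: one sorted structure on concatenated keys.
# The trailing position i points back to the object and breaks ties.
keys = sorted((o['a'], o['b'], i) for i, o in enumerate(objs))

def find(a_min, a_max, b_min, b_max):
    """Return objs with a_min < a < a_max and b_min < b < b_max."""
    inf = float('inf')
    # Step 2: range scan bounded on the first key part only.
    lo = bisect.bisect_right(keys, (a_min, inf, inf))
    hi = bisect.bisect_left(keys, (a_max, -inf, -inf))
    # Steps 3 and 4: post-filter on b, return the matching objects.
    return [objs[i] for _, b, i in keys[lo:hi] if b_min < b < b_max]

results = find(1, 2, 0.5, 0.6)
```

Every key with `a` in range gets visited, whether or not its `b` matches, so the second key part only saves the cost of fetching `b` from the object.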

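The order bound isn't living up to tree standards: the scan visits every key whose k1 is in range, so the cost grows with the number of k1 matches rather than the number of final results. But I bet it would be passable most of the time anyway. It probably beats doing a separate search on key2 and intersecting the results.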

The `-inf / inf` values would need to be some type-independent thing. `None` is always small in BTrees so that could be the lower bound.

Could cram it in at the value level instead? `BTree({key1: [(key2, val), (key2, val), ...]})`. Avoids the awkward comparisons. Burns some RAM though. And it's really equivalent to just using one index and doing the rest in a list comprehension outside the container.

### todo
Think about the frozen arrays and how you would implement it there. That might give good insights.
Sparse ndarrays maybe? Quad / octrees?

In [3]:
from filterbox.btree import BTree
from random import random


In [5]:
objs = [
    {'i': i, 'a': random()*10, 'b': random()} for i in range(10**3)
]

# Task: Find objs where 1 < a < 2 and 0.5 < b < 0.6.

In [6]:
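# Index on 'a' and stash 'b' next to the object so it can be post-filtered.
# Assumes 'a' values are unique; a duplicate key would overwrite the earlier entry.
tree = BTree()
for o in objs:
    tree[o['a']] = (o['b'], o)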

In [10]:
for b, o in tree.get_range_expr({'>': 1, '<': 2}):
    if 0.5 < b < 0.6:
        print(o)

{'i': 191, 'a': 1.0570674936431124, 'b': 0.5649662294471903}
{'i': 437, 'a': 1.3542792185455155, 'b': 0.5753256982901156}
{'i': 185, 'a': 1.401839984653963, 'b': 0.5310477476841865}
{'i': 772, 'a': 1.44039489179562, 'b': 0.5176671572926902}
{'i': 457, 'a': 1.469287583082859, 'b': 0.5475469700864543}
{'i': 943, 'a': 1.5722080241319658, 'b': 0.5615369447345585}
{'i': 231, 'a': 1.6165395202332788, 'b': 0.5551452004632332}
{'i': 92, 'a': 1.7698873658963565, 'b': 0.5834212111319615}
{'i': 392, 'a': 1.834056255838259, 'b': 0.545838844154715}
{'i': 691, 'a': 1.8549647165079397, 'b': 0.5517766855664482}


In [12]:
# attributes are hashable, so this could work as a dict too. But that's less general.
# Or parallel arrays, one for each attribute, plus one for the object ID. Nah, too hard to add/remove items.
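For reference, a minimal sketch of what the parallel-arrays idea might look like, in pure Python with `bisect`. The array names (`a_vals`, `b_vals`, `ids`) are made up for illustration.

```python
import bisect
import random

random.seed(0)
objs = [{'i': i, 'a': random.random() * 10, 'b': random.random()} for i in range(1000)]

# Sort positions by 'a' once, then keep one array per attribute plus an ID array.
order = sorted(range(len(objs)), key=lambda i: objs[i]['a'])
a_vals = [objs[i]['a'] for i in order]
b_vals = [objs[i]['b'] for i in order]
ids = order  # position in objs, aligned with a_vals and b_vals

# 1 < a < 2 via binary search on a_vals, then filter on the aligned b_vals.
lo = bisect.bisect_right(a_vals, 1)
hi = bisect.bisect_left(a_vals, 2)
hits = [objs[ids[k]] for k in range(lo, hi) if 0.5 < b_vals[k] < 0.6]
```

The add/remove pain shows up right away: inserting one object means shifting every array at the same position to keep them aligned.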
