Don't mark it obsolete, speed tests for sortedcontainers are mistaken #1

netch80 · 2017-02-01T22:01:18Z

Bintrees are substantially faster in case of inexact key and value search is used. The performance tests (reported at http://www.grantjenks.com/docs/sortedcontainers/performance.html) mainly ignore key ordering in containers. Only iter() test has something common with ordering; others can be faster satisfied on any non-sorted store like hash map. Moreover, the API of SortedDict with grounding on absolute index and additional operations to retrieve real data is conceptually misdesigned.

With inexact search of ceiling key or key-value pair, bintrees performance is much better. My results on 100000 key set (FreeBSD 10.3/amd64, bintrees with Cython, Python 2.7.13):

$ ./bench
Time for AVLTree (key only): 3.88829088211
Time for RBTree (key only): 3.60424399376
Time for SortedDict (key only): 5.31183195114
Time for AVLTree (KVP): 3.37197518349
Time for RBTree (KVP): 3.34165000916
Time for SortedDict (KVP): 5.43365883827

So, bintrees are almost twice faster for this task class.

The test code follows.



N = 100000*10
if name == "main":

avld = bintrees.avltree.AVLTree()

rbd = bintrees.rbtree.RBTree()

sd = sortedcontainers.SortedDict()

for k in range(0, N, 10):

avld[k] = rbd[k] = sd[k] = k * 11

t0 = time.time()

for k in range(N):

try:

t = avld.ceiling_key(k)

except KeyError:

pass

t1 = time.time()

print "Time for AVLTree (key only):", t1 - t0

t0 = time.time()

for k in range(N):

try:

t = rbd.ceiling_key(k)

except KeyError:

pass

t1 = time.time()

print "Time for RBTree (key only):", t1 - t0

t0 = time.time()

for k in range(N):

try:

t = sd.iloc[sd.bisect_left(k)]

except (KeyError, IndexError):

pass

t1 = time.time()

print "Time for SortedDict (key only):", t1 - t0

t0 = time.time()

for k in range(N):

try:

t = avld.ceiling_item(k)

except KeyError:

pass

t1 = time.time()

print "Time for AVLTree (KVP):", t1 - t0

t0 = time.time()

for k in range(N):

try:

t = rbd.ceiling_item(k)

except KeyError:

pass

t1 = time.time()

print "Time for RBTree (KVP):", t1 - t0

t0 = time.time()

for k in range(N):

try:

t = sd.iloc[sd.bisect_left(k)]

t2 = sd[t]

except (KeyError, IndexError):

pass

t1 = time.time()

print "Time for SortedDict (KVP):", t1 - t0

The text was updated successfully, but these errors were encountered:

mozman · 2017-02-02T03:43:32Z

bintrees wont't go away, but no further development is planned, just bugfixes and check for compatibility with newer python releases.

mozman closed this as completed Feb 2, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't mark it obsolete, speed tests for sortedcontainers are mistaken #1

Don't mark it obsolete, speed tests for sortedcontainers are mistaken #1

netch80 commented Feb 1, 2017 •

edited

Loading

mozman commented Feb 2, 2017

Don't mark it obsolete, speed tests for sortedcontainers are mistaken #1

Don't mark it obsolete, speed tests for sortedcontainers are mistaken #1

Comments

netch80 commented Feb 1, 2017 • edited Loading

mozman commented Feb 2, 2017

netch80 commented Feb 1, 2017 •

edited

Loading