A Persistent Map Implementation based on Hash Array Mapped Tries
Rust Python Shell
Latest commit 7bb893a Oct 25, 2016 @michaelwoerister Bump version

README.md

hamt-rs Build Status

A Hash Array Mapped Trie implementation based on the Ideal Hash Trees paper by Phil Bagwell. This is the persistent map datastructure used in Scala's and Clojure's standard libraries. The idea to use a special collision node to deal with hash collisions is taken from Clojure's implementation.

Usage

let mut map = HamtMap::new();

for i in range(0, size) {
    map = map.plus(i, i);
}

if map.find(&0) == Some(1) {
    ...
}

let (without_10, size_changed_10) = map.remove(&10);
let (without_20, size_changed_20) = map.remove(&20);

for (k, v) in map.iter() {
    ...
}

Performance

Looks pretty good so far, for a fully persistent data structure. The benchmarks below were done on a Core i7-4712MQ, with random numbers and the compile flags -C lto -C opt-level=3 -C target-feature=+popcnt.

Lookup

Times (in microseconds) for one thousand lookups in a collection with ELEMENT COUNT elements (where key and value types are u64).

ELEMENT COUNT HAMT HASHMAP
10 34 32
1000 49 37
100000 72 44

In percent over std::HashMap (less than 100% means faster, more means slower than std::HashMap).

ELEMENT COUNT HAMT HASHMAP
10 106% 100%
1000 130% 100%
100000 160% 100%

The HAMT is in the same ballpark as the std::HashMap, even for larger collections. Also, LLVM unfortunately does not (yet) properly translate the cntpop intrinsic function (which could be just one CPU instruction on many architectures, but is translated to a much more expensive instruction sequence currently). As pointed out on reddit, properly configuring LLVM (e.g. by setting the target-cpu option) is necessary for it to issue the popcnt instruction.

Insertion

Times (in microseconds) for one thousand insertions into a collection with ELEMENT COUNT elements (again, key and value type is u64).

ELEMENT COUNT HAMT HASHMAP
10 133 48
1000 185 76
100000 1521 99

In percent over std::HashMap (less than 100% means faster, more means slower than std::HashMap).

ELEMENT COUNT HAMT HASHMAP
10 279% 100%
1000 242% 100%
100000 1537% 100%

As can be seen, the HAMT holds up pretty well against the non-persistent std::HashMap.