Implement the collection views API for TrieMap. #18287

michaelsproul · 2014-10-24T19:10:07Z

I've implemented the new collection views API for TrieMap. I more or less followed the approach set out by @gankro in BTreeMap, by using a SearchStack. There's quite a bit of unsafe code, but I've wrapped it safely where I think is appropriate. I've added tests to ensure everything works, and performance seems quite good.

test trie::bench_map::bench_find                           ... bench:     67879 ns/iter (+/- 4192)
test trie::bench_map::bench_find_entry                     ... bench:    186814 ns/iter (+/- 18748)
test trie::bench_map::bench_insert_large                   ... bench:    716612 ns/iter (+/- 160121)
test trie::bench_map::bench_insert_large_entry             ... bench:    851219 ns/iter (+/- 20331)
test trie::bench_map::bench_remove                         ... bench:    838856 ns/iter (+/- 27998)
test trie::bench_map::bench_remove_entry                   ... bench:    981711 ns/iter (+/- 53046)

Using an entry is slow compared to a plain find, but is only ~15% slower for inserts and removes, which is where this API is most useful. I'm tempted to remove the standalone remove function in favour of an entry-based approach (to cut down on complexity).

I've added some more comments to the general part of the code-base, which will hopefully help the next person looking over this. I moved the three key structures to the top of the file so that the nesting structure is clearly visible, and renamed Child<T> to TrieNode<T> and TrieNode<T> to InternalNode<T> to improve clarity. If these changes are creeping, I'm happy to revert them.

Let me know if my use of fail! is ok, I was a little unsure of how specific to be. Some of the data-structures have various invariants that shouldn't be broken, so using fail! seemed appropriate.

Still to do

Modernise iterators (make them double-ended).
Make the keys generic, or rename this data-structure (see: libcollections: TrieMap is not a general Trie #14902).
Possibly move this code out of libcollections. Searching Github for TrieMap turns up very few real results.

Related issues: #18009 and #17320

rust-highfive · 2014-10-24T19:10:14Z

Warning

These commits modify unsafe code. Please review it carefully!

Gankra · 2014-10-24T21:01:35Z

Thanks a ton! On my phone will dig into this in a bit.

michaelsproul · 2014-10-24T23:12:54Z

Fixed some unused import warnings and removed the XXX comment that was making the Travis build fail.

Gankra · 2014-10-25T16:34:36Z

src/libcollections/trie.rs

+/// for very small numbers. For example, 1 and 2 are identical in all but their least significant
+/// 4 bits. If both numbers are used as keys, a chain of maximum length will be created to
+/// differentiate them.
+///


Is there a name for this kind of Radix Trie? The Radix Trie I know allocates exactly one node per-element. Just enough to distinguish it from everything else. This implementation is like a hybrid xfast-radix trie, as described. I can see that it could be more efficient to have fixed-width chunks, though.

Or maybe wikipedia's being dumb and radix/prefix/patricia aren't all synonyms?

I wondered exactly this and I didn't find anything online until just now. These notes from CMU refer to this data structure simply as a "trie".

http://www.cs.cmu.edu/afs/cs/academic/class/15451-f04/www/lectures/lect0928.txt

I think the " radix" prefix is erroneous unless we implement node compression. Perhaps @thestinger, who seems to have written most of this, could shed some light?

It's quite tempting to implement a Patricia Trie or one of the X/Y trees to see how fast they are. That said, we already have Btrees for ordered maps, and I suspect the main use case for a trie would be semantic (finding prefix nodes, etc). This is what I designed my "generic trie" for, but the performance is pretty awful (about 10x slower than a HashMap, depending on key length).

Gankra · 2014-10-25T18:17:46Z

Ack, got some stuff I need to take care of, might not be able to get back into this for a bit. If no one else picks this up in a few days, feel free to ping me as a reminder.

michaelsproul · 2014-10-25T18:28:57Z

Ok, I'll fix up the things you caught and play around with boxing a bit. I'm also suddenly very busy (3 midterms next week, yay!).

michaelsproul · 2014-10-26T23:12:39Z

Just fixed an awful bug in OccupiedEntry::take, and cleaned up a lot of stuff by using push and pop.

Working on performance now.

Gankra · 2014-10-31T04:14:44Z

@michaelsproul Sorry for the delay. Unfortunately Collections Reform just passed, which means everything is getting refactored. This might mess up your PR. @alexcrichton has volunteered to review this PR in the hopes that we can get this in before the really harsh breakage kicks in, though. (my reviews can't get things merged, anyway)

Beyond tons of methods getting renamed and all the traits getting removed, TrieMap is getting moved into a trie directory and split into map.rs and set.rs. I'm trying to do this in a way that git can maybe understand and handle gracefully, but it'll probably cause a merge conflict. Ideally we can get this in first, and I can deal with the merge conflict for you. Worst-case scenario, you might have to do some patch-fiddling or copy-pasting. Again, I hope to avoid this.

michaelsproul · 2014-10-31T04:23:15Z

Yeah, I don't mind a bit of merge conflict resolution.

Gankra · 2014-11-07T04:58:54Z

Sorry for the delay! Just landed The Big Break. We can start reasonably handling normal collections PRs again. Ready to review once you rebase!

michaelsproul · 2014-11-07T22:47:10Z

Hey! All done!

Quite a painless merge thanks to the magic of Git.

I gave up working on performance, as I think our energy would be better spent implementing a Patricia Trie with the collection views API in mind from the start. I've started digging in to the literature and I'll be starting soon (other workloads permitting).

Here are some benchmarks for TrieMap:

# Without optimisations
test trie::map::bench::bench_get                           ... bench:     68591 ns/iter (+/- 2629)
test trie::map::bench::bench_get_entry                     ... bench:    217937 ns/iter (+/- 5618)
test trie::map::bench::bench_insert_large                  ... bench:    689835 ns/iter (+/- 117282)
test trie::map::bench::bench_insert_large_entry            ... bench:    879627 ns/iter (+/- 15893)
test trie::map::bench::bench_insert_large_low_bits         ... bench:    814859 ns/iter (+/- 113885)
test trie::map::bench::bench_insert_small                  ... bench:    350994 ns/iter (+/- 6301)
test trie::map::bench::bench_insert_small_low_bits         ... bench:    490604 ns/iter (+/- 21042)
test trie::map::bench::bench_lower_bound                   ... bench:      5268 ns/iter (+/- 150)
test trie::map::bench::bench_remove                        ... bench:    815333 ns/iter (+/- 22148)
test trie::map::bench::bench_remove_entry                  ... bench:   1033120 ns/iter (+/- 95156)

# With optimisations
test trie::map::bench::bench_get                           ... bench:      7129 ns/iter (+/- 485)
test trie::map::bench::bench_get_entry                     ... bench:     38037 ns/iter (+/- 5207)
test trie::map::bench::bench_insert_large                  ... bench:    546967 ns/iter (+/- 16281)
test trie::map::bench::bench_insert_large_entry            ... bench:    591403 ns/iter (+/- 34081)
test trie::map::bench::bench_insert_large_low_bits         ... bench:    442047 ns/iter (+/- 14441)
test trie::map::bench::bench_insert_small                  ... bench:    204656 ns/iter (+/- 16639)
test trie::map::bench::bench_insert_small_low_bits         ... bench:    133403 ns/iter (+/- 6810)
test trie::map::bench::bench_lower_bound                   ... bench:       556 ns/iter (+/- 20)
test trie::map::bench::bench_remove                        ... bench:    168592 ns/iter (+/- 89249)
test trie::map::bench::bench_remove_entry                  ... bench:    200875 ns/iter (+/- 2574)

Gankra · 2014-11-08T15:27:10Z

src/libcollections/trie/map.rs

-                         key, value, 1);
-        if ret.is_none() { self.length += 1 }
-        ret
+        let (_, old_val) = insert(  &mut self.root.count,


Absolutely trivial style nit: any reason everything in the parens has an extra space? tab-alignment?

Yeah, that was for tab alignment. Could just flow it.

Gankra · 2014-11-08T19:20:50Z

The transmutes are at best unidiomatic, I would convert them to only casts where possible (or whatever macro crazyness was suggested).

Gankra · 2014-11-09T03:10:45Z

src/libcollections/trie/map.rs

+        // Extract the value from the leaf-node of interest.
+        let leaf_node = mem::replace(search_stack.pop_ref(), Nothing);
+
+        let value: T = match leaf_node {


Is this type annotation necessary?

Nope, but I like an occasional type annotation...

shrug It's probably fine.

Gankra · 2014-11-09T03:43:33Z

@michaelsproul I've reviewed everything. If you're uncomfortable with fiddling with the transmutes, I think it's fine if you just put in a FIXME to the effect of "these transmutes might be invoking UB, and are unidiomatic at best; find a way to remove them".

When you think you've addressed everything, post a new comment.

Gankra · 2014-11-09T03:46:22Z

Oh: and I'm not super confident about all the re-checking you do on the key. I think it's fine for the entries to assume they were initialized correctly as long as failing to do so only produces wrong results, and not unsafe results.

bstrie · 2014-11-09T17:24:26Z

src/libcollections/lib.rs

@@ -24,7 +24,7 @@

 #![allow(unknown_features)]
 #![feature(macro_rules, default_type_params, phase, globs)]
-#![feature(unsafe_destructor, import_shadowing, slicing_syntax)]
+#![feature(unsafe_destructor, import_shadowing, slicing_syntax, if_let)]


In general I'd prefer to avoid introducing even more gated features to the stdlib for the moment. I mean, yeah, it's already got a huge pile of gated features and we're probably going to ungate if-let for 1.0, so I'm not going to reject on these grounds... but it's still something to keep in mind.

Ok, I'll take it out.

Gankra · 2014-11-09T18:40:51Z

OK everything LGTM now, I think. @bstrie your call. @bstrie can take it from here.

Gankra · 2014-11-09T19:40:27Z

Punting transmute stuff to #18817

bstrie · 2014-11-10T00:56:29Z

src/libcollections/trie/map.rs

+        // While no appropriate slot is found, keep descending down the Trie,
+        // adding nodes to the search stack.
+        let search_successful: bool;
+        unsafe {


Which of the calls in the following block are actually unsafe? If nothing else, could we move this inside the loop?

The call to next_child is unsafe, so we could only put the unsafe block inside the loop.

Alternatively, we could make next_child provide a safe interface...

bstrie · 2014-11-10T01:31:40Z

Looks good to me, r+ waiting on getting an answer to those last two questions.

michaelsproul · 2014-11-10T04:26:58Z

Waiting on your thoughts re: unsafety of next_child before I do a rebase.

bstrie · 2014-11-10T13:59:16Z

src/libcollections/trie/map.rs

+        let search_successful: bool;
+        loop {
+            unsafe {
+                match next_child(search_stack.peek(), key, search_stack.length) {


If next_child is the only unsafe call, then I'd prefer to reduce the scope of the unsafe like so:

match unsafe { next_child(search_stack.peek(), key, search_stack.length) } {

This makes it much more obvious where the unsafe bits lie, and precludes more unsafety from sneaking into the match body later. Also, I'm a bit of a stickler for shrinking unsafe scopes. :) This is my last criticism, I swear!

bstrie · 2014-11-10T14:49:23Z

I lied! One more minor nit: I think it's a shame that changing those panic!()s to unreachable!()s caused us to lose information regarding the invariants that were violated to reach the unreachable code. I've filed #18842 to see if we can give unreachable!() the same custom error string reporting capability as panic!(). In the meantime, can you take the former panic!() strings (e.g. "Invalid SearchStack") and move them into comments next to the unreachable!() lines, for the benefit of future developers?

bstrie · 2014-11-10T18:24:19Z

Thanks!

michaelsproul · 2014-11-10T18:28:26Z

Thank you!

Should I do a rebase? There are lots of messy commits.

bstrie · 2014-11-10T18:31:03Z

Sure thing.

Gankra · 2014-11-10T18:39:32Z

Thanks a bunch for persisting on this!

😍

michaelsproul · 2014-11-10T18:42:15Z

@gankro: It was an absolute pleasure 😄

bstrie · 2014-11-10T18:59:59Z

Thanks! 💃

…=bstrie I've implemented the new collection views API for TrieMap. I more or less followed the approach set out by @gankro in BTreeMap, by using a `SearchStack`. There's quite a bit of unsafe code, but I've wrapped it safely where I think is appropriate. I've added tests to ensure everything works, and performance seems quite good. ``` test trie::bench_map::bench_find ... bench: 67879 ns/iter (+/- 4192) test trie::bench_map::bench_find_entry ... bench: 186814 ns/iter (+/- 18748) test trie::bench_map::bench_insert_large ... bench: 716612 ns/iter (+/- 160121) test trie::bench_map::bench_insert_large_entry ... bench: 851219 ns/iter (+/- 20331) test trie::bench_map::bench_remove ... bench: 838856 ns/iter (+/- 27998) test trie::bench_map::bench_remove_entry ... bench: 981711 ns/iter (+/- 53046) ``` Using an entry is slow compared to a plain find, but is only ~15% slower for inserts and removes, which is where this API is most useful. I'm tempted to remove the standalone `remove` function in favour of an entry-based approach (to cut down on complexity). I've added some more comments to the general part of the code-base, which will hopefully help the next person looking over this. I moved the three key structures to the top of the file so that the nesting structure is clearly visible, and renamed `Child<T>` to `TrieNode<T>` and `TrieNode<T>` to `InternalNode<T>` to improve clarity. If these changes are creeping, I'm happy to revert them. Let me know if my use of `fail!` is ok, I was a little unsure of how specific to be. Some of the data-structures have various invariants that shouldn't be broken, so using `fail!` seemed appropriate. ## Still to do * Modernise iterators (make them double-ended). * Make the keys generic, or rename this data-structure (see: #14902). * Possibly move this code out of libcollections. [Searching Github for TrieMap turns up very few real results.][triemap-search] Related issues: #18009 and #17320 [triemap-search]: https://github.com/search?utf8=%E2%9C%93&q=TrieMap+language%3ARust&type=Code&ref=searchresults

Gankra reviewed Oct 25, 2014
View reviewed changes

Gankra reviewed Nov 8, 2014
View reviewed changes

Gankra reviewed Nov 9, 2014
View reviewed changes

bstrie reviewed Nov 9, 2014
View reviewed changes

bstrie reviewed Nov 10, 2014
View reviewed changes

bstrie mentioned this pull request Nov 10, 2014

Add an optional string parameter to the unreachable!() macro to communicate which invariant was violated #18842

Closed

Implement collection views API for TrieMap.

f52e2bd

bors closed this Nov 10, 2014

bors merged commit f52e2bd into rust-lang:master Nov 10, 2014

michaelsproul deleted the triemap-collection-views branch November 10, 2014 22:44

Implement the collection views API for TrieMap. #18287

Implement the collection views API for TrieMap. #18287

Conversation

michaelsproul commented Oct 24, 2014

Still to do

rust-highfive commented Oct 24, 2014

Gankra commented Oct 24, 2014

michaelsproul commented Oct 24, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Gankra commented Oct 25, 2014

michaelsproul commented Oct 25, 2014

michaelsproul commented Oct 26, 2014

Gankra commented Oct 31, 2014

michaelsproul commented Oct 31, 2014

Gankra commented Nov 7, 2014

michaelsproul commented Nov 7, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Gankra commented Nov 8, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Gankra commented Nov 9, 2014

Gankra commented Nov 9, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Gankra commented Nov 9, 2014

Gankra commented Nov 9, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bstrie commented Nov 10, 2014

michaelsproul commented Nov 10, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bstrie commented Nov 10, 2014

bstrie commented Nov 10, 2014

michaelsproul commented Nov 10, 2014

bstrie commented Nov 10, 2014

Gankra commented Nov 10, 2014

michaelsproul commented Nov 10, 2014

bstrie commented Nov 10, 2014