Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upImplement append and split_off for BTreeMap and BTreeSet #26227
Conversation
rust-highfive
assigned
aturon
Jun 11, 2015
This comment has been minimized.
This comment has been minimized.
|
r? @aturon (rust_highfive has picked a reviewer for you, use r? to override) |
bluss
reviewed
Jun 12, 2015
| reason = "recently added as part of collections reform 2")] | ||
| pub fn append(&mut self, other: &mut Self) { | ||
| // Read all values from `other` into `self`. | ||
| // The use of `ptr::read` is safe as we clear `other` afterwards |
This comment has been minimized.
This comment has been minimized.
bluss
Jun 12, 2015
Contributor
This doesn't look safe -- clear will run drop on all keys & values, so it leads to double drop.
This comment has been minimized.
This comment has been minimized.
bluss
reviewed
Jun 12, 2015
| /// ``` | ||
| #[unstable(feature = "btree_append_split_off", | ||
| reason = "recently added as part of collections reform 2")] | ||
| pub fn split_off<B: Borrow<K>>(&mut self, at: B) -> Self { |
This comment has been minimized.
This comment has been minimized.
bluss
Jun 12, 2015
Contributor
The convention in the file is to use Q for this type parameter, and take &Q. Consistency will make it easier for users to understand docs too.
Actually let it be K: Borrow<Q>. For each comparison, call .borrow() on the K side, i.e the maps' own keys.
This comment has been minimized.
This comment has been minimized.
jooert
Jun 12, 2015
Author
Contributor
Ok, I used the form specified in RFC 509. I'm happy to change it, but I don't quite get how your proposed signature would look like?
This comment has been minimized.
This comment has been minimized.
bluss
Jun 12, 2015
Contributor
Hm then I wonder why the RFC specified it like that. Maybe old BorrowFrom origins?
I propose
pub fn split_off<Q: ?Sized>(&mut self, at: &Q) -> Self
where K: Borrow<Q>, Q: OrdThis should be just the same as what fn get does. If it's not, then I typed it out wrong.
bluss
reviewed
Jun 12, 2015
| let mut should_swap = false; | ||
| { | ||
| // Using `unwrap` is safe because `self.len()` > 0. | ||
| if at <= self.keys().next().unwrap() { |
This comment has been minimized.
This comment has been minimized.
bluss
Jun 12, 2015
Contributor
Documenting unwraps like this is great, but I would avoid using “safe”. Panic is well defined and memory safe, and there's a lot of unsafe blocks in the file, so it's better to avoid the word safe when it's not about memory safety.
This comment has been minimized.
This comment has been minimized.
|
I'm not familiar enough with the search stack code, so I'd love if that part were reviewed by some one else. |
jooert
force-pushed the
jooert:btree_append_split_off
branch
from
bd1f270
to
6d959cc
Jun 12, 2015
This comment has been minimized.
This comment has been minimized.
|
Updated with fixes to bluss's remarks |
This comment has been minimized.
This comment has been minimized.
|
r? @bluss (I'm handing this off officially, since you've started reviewing it anyway!) |
rust-highfive
assigned
bluss
and unassigned
aturon
Jun 12, 2015
This comment has been minimized.
This comment has been minimized.
|
@glaebhoerl is the most experienced with the current design, but I'll be happy to review this a bit later. |
This comment has been minimized.
This comment has been minimized.
|
@Gankro I think that must have been a typo? |
This comment has been minimized.
This comment has been minimized.
|
Oops, yes I meant @gereeter |
This comment has been minimized.
This comment has been minimized.
|
Thanks, the parts I looked at look great now |
gereeter
reviewed
Jun 13, 2015
| { | ||
| // `unwrap` won't panic because `self.len()` > 0. | ||
| if at <= self.keys().next().unwrap().borrow() { | ||
| should_swap = true; |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
This looks like this should work fine, but I'm not sure about the algorithms. |
This comment has been minimized.
This comment has been minimized.
|
First, thanks everyone for review and input, this is great! |
This comment has been minimized.
This comment has been minimized.
|
I can't seem to find any references talking about splitting or merging B-Trees, unfortunately - it just isn't an operation that most users have to do. For For Suppose we want to split the following tree at 7:
We start at the root, looking for where seven would go:
Since that point is in the middle of the root, we split the root into two piece, one larger and one smaller:
Once we've done that, we move on to the next node searching for 7:
At that point, as in the root node, we split the node into a greater part and a lesster part:
Since that node we just split was a leaf node (also, because we actually found our splitting key), we don't need to do any more splitting, and we are left with two trees, one greater than our key and the other less than our key. There is still some more work involved, as this splitting process probably left many of the nodes that we split underfull, requiring steps to recoalesce nodes, but once that is done, the B-Tree is split. Note that since this just goes up and down to search path the the node, it only take |
This comment has been minimized.
This comment has been minimized.
|
Note: Since the splitting and merging algorithms are fairly involved and badly documented, I would not at all be opposed to merging this PR as is and opening performance issues to use the better algorithms. |
This comment has been minimized.
This comment has been minimized.
|
That sounds interesting, thank you for the explanation! I will try to implement something like that. |
This comment has been minimized.
This comment has been minimized.
|
|
jooert
force-pushed the
jooert:btree_append_split_off
branch
from
6d959cc
to
8026390
Jun 27, 2015
This comment has been minimized.
This comment has been minimized.
|
I pushed a linear time implementation of |
gereeter
reviewed
Jun 27, 2015
|
|
||
| // Second, we build a tree from the sorted sequence in linear time. | ||
| self.length = elements.len(); | ||
| let (depth, root) = Node::from_sorted_iter(elements.into_iter(), self.length, self_b); |
This comment has been minimized.
This comment has been minimized.
gereeter
Jun 27, 2015
Contributor
If you can build from an iterator, it would be more efficient to make an iterator that merges two iterators instead of going through a Vec.
This comment has been minimized.
This comment has been minimized.
jooert
Jun 27, 2015
Author
Contributor
Yeah, I don't like allocating a new Vec here, too, but for the algorithm in from_sorted_iter to work, I have to know how many elements I have after the merge.
This comment has been minimized.
This comment has been minimized.
gereeter
Jun 27, 2015
Contributor
Bleh, I hadn't though of the equal key case. It still might be possible and not too difficult, as in the worst case, only the "right edge" of the BTree will be left underfull, and every underfull node except for the root (which, if I remember correctly, is allowed to be underfull) is adjacent to a completely full node, allowing a simple steal to fix things.
gereeter
reviewed
Jun 27, 2015
| (Some(self_element), Some(other_element)) => { | ||
| // Check which elements comes first and only advance the corresponding iterator. | ||
| // If two keys are equal, take the value from `other`. | ||
| if self_element.0 < other_element.0 { |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
gereeter
reviewed
Jun 27, 2015
| impl<L, U> Level<L, U> { | ||
| fn new_node(&mut self, node: Node<L, U>, capacity: usize, minimum: usize) { | ||
| // Make the new node the current node for this level. | ||
| mem::replace(&mut self.cur_node, Some(node)); |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
gereeter
reviewed
Jun 27, 2015
| // Compute level structure | ||
| let capacity = capacity_from_b(b); | ||
| let minimum = min_load_from_capacity(capacity); | ||
| let mut num_elements = len; |
This comment has been minimized.
This comment has been minimized.
gereeter
Jun 27, 2015
Contributor
Instead of passing this in, you could just require that I: ExactSizeIterator and call .len().
This comment has been minimized.
This comment has been minimized.
gereeter
reviewed
Jun 27, 2015
| // Allocate memory for the worst-case number of levels. | ||
| // let max_height = (0.5*((num_elements + 1) as f64)).log(minimum as f64) as usize; | ||
| // let mut levels = Vec::with_capacity(max_height); | ||
| let mut levels = Vec::new(); |
This comment has been minimized.
This comment has been minimized.
gereeter
Jun 27, 2015
Contributor
It might help to have a comment explaining that the leaf nodes come first and the root comes last.
gereeter
reviewed
Jun 27, 2015
|
|
||
| // Extract current node to insert it as edge on the level above | ||
| let new_edge = unsafe { | ||
| mem::replace(&mut levels.get_unchecked_mut(num_level).cur_node, |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
gereeter
reviewed
Jun 27, 2015
|
|
||
| loop { | ||
| // Determine how many nodes we need on this level. | ||
| let num_nodes = num_elements / (capacity + 1) + 1; |
This comment has been minimized.
This comment has been minimized.
arthurprs
reviewed
Jul 18, 2015
| let mut num_elements = len; | ||
| // Allocate memory for the worst-case number of levels. | ||
| // let max_height = (0.5*((num_elements + 1) as f64)).log(minimum as f64) as usize; | ||
| // let mut levels = Vec::with_capacity(max_height); |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
jooert
Jul 20, 2015
Author
Contributor
Yes, the idea was to minimize the number of allocations. But this isn't possible with the changes I've pushed just yet.
This comment has been minimized.
This comment has been minimized.
|
@jooert what's the status of this PR? |
jooert
added some commits
May 21, 2015
jooert
force-pushed the
jooert:btree_append_split_off
branch
from
8026390
to
5f1a116
Jul 20, 2015
This comment has been minimized.
This comment has been minimized.
|
@Gankro Sorry for the very long delay and thanks for the kind words! I've just pushed a new version of I haven't started implementing the |
This comment has been minimized.
This comment has been minimized.
|
@jooert So I've become increasingly convinced that BTree needs to be refactored to use parent pointers (parent_ptr, edge_index). This would remove all allocations except for Is this something that you think would simplify your code? Something that you'd be interested in doing? Note that one can actually implement parent pointers but not bother to replace all the search stacks to start. That is it's theoretically possible for the two to co-exist temporarily. |
This comment has been minimized.
This comment has been minimized.
|
Parent pointers would simplify my code in so far as I wouldn't need to keep track of the different levels of the tree using the |
This comment has been minimized.
This comment has been minimized.
|
To my knowledge, it's a performance slam dunk. This is the strategy used in Google's https://code.google.com/p/cpp-btree/ |
This comment has been minimized.
This comment has been minimized.
|
@jooert Possibly stupid question. You said two pointers because we'd need not only a "parent pointer" but also an "index in parent"? We can always use u8 for len/cap/parent_index and move those to the end of the struct. That'd save quite a bit of space. The google implementation linked by @Gankro uses u8 as default. |
This comment has been minimized.
This comment has been minimized.
|
@arthurprs Yes, exactly. |
This comment has been minimized.
This comment has been minimized.
|
Rust will pad the struct to have a size that's a multiple of its alignment, though. So having using a u8 doesn't actually save you anything if it will just be rounded up to what a u64 would have done. |
This comment has been minimized.
This comment has been minimized.
|
@Gankro not really, it depends where in the struct you want to have your smaller types. If you stick a u8 in between 2 usizes it'll have 7 bytes of padding to allow aligned access on the second. At least in Cish standard ABI. What I'm proposing is sticking them at the end. Example: http://is.gd/b7ChEQ This way the new Node will have the same size as the current one (40 bytes in 64bit builds), the only limitation is having a B <= 255. If that's not enough we can use u16s instead and still keep the 40 bytes size, not true for 32bit builds though. |
This comment has been minimized.
This comment has been minimized.
|
Yes if you fold other values to be smaller, you will get savings. I was assuming you were just suggesting only making one field a u8 (which would be useless). |
This comment has been minimized.
This comment has been minimized.
|
Note that in a defunct PR I removed the ability to set B at all, so we can safely just make it a constant that "happens" to work. (the current default has always seemed fine, and gains from changing it are small to trivial). |
This comment has been minimized.
This comment has been minimized.
|
Cool, so if we ever go this route (parent pointers) we should consider using these smaller integers types to save space. |
This comment has been minimized.
This comment has been minimized.
|
Is anyone working on the parent pointer implementation? |
This comment has been minimized.
This comment has been minimized.
|
Am 08.08.2015 um 21:25 schrieb Andrew Paseltiner:
|
apasel422
referenced this pull request
Aug 17, 2015
Closed
Change `BTreeMap` to use parent pointers #27865
This comment has been minimized.
This comment has been minimized.
|
|
This comment has been minimized.
This comment has been minimized.
|
What's the status of this? Blocked on the parent pointer rewrite? Ready to go with a rebase? Outstanding concerns? |
This comment has been minimized.
This comment has been minimized.
|
I think this should wait pending the rewrite at https://github.com/gereeter/btree-rewrite. |
This comment has been minimized.
This comment has been minimized.
|
Ah ok, in that case I'm gonna close this for now (clearing out the queue). |
jooert commentedJun 11, 2015
Changes the internal SearchStack API to return the key on removal as well.