
Tracking issue for collections reform part 2 (RFC 509) #19986

Closed
aturon opened this Issue Dec 18, 2014 · 55 comments


aturon commented Dec 18, 2014


aturon commented Dec 18, 2014

cc @Gankro


Gankro commented Dec 19, 2014

TODO:

Easy

Moderate

  • add append to:
    • DList
    • Vec
    • RingBuf
    • BTreeMap
    • BTreeSet
    • Bitv
    • BitvSet
    • VecMap
  • add split_off:
    • DList
    • Vec
    • RingBuf
    • BTreeMap
    • BTreeSet
    • Bitv
    • BitvSet
    • VecMap
  • Upgrade the Entry API - has PR
  • Add truncate and resize to RingBuf (not specified, but makes sense to do) - has PR
  • Add swap_pop to RingBuf (not specified, but makes sense to do) - has PR

Gankro commented Dec 19, 2014


aturon commented Dec 19, 2014

Note: I'm about to post a PR that does the deprecation for Vec methods.


cgaebel commented Dec 19, 2014

cc @bfops I think you were working on upgrading the entry api?


csouth3 commented Dec 19, 2014

I can take care of resize for Vec.


bfops commented Dec 20, 2014

Yeah I'm working on the entry API. Seems to be coming along!


pczarn commented Dec 21, 2014

I'll start work on these methods for RingBuf:

Add truncate and resize to RingBuf (not specified, but makes sense to do)
Add swap_pop to RingBuf (not specified, but makes sense to do)

Did you mean swap_remove?


Gankro commented Dec 21, 2014

Yes, sorry.


gamazeps commented Dec 21, 2014

I'm starting the misc stabilization, should be done in 2-3 days :)


csouth3 commented Dec 21, 2014

So, some of the misc stuff is done in #20053, but I didn't touch Bitv::{get,set}, and I didn't touch HashMap's and HashSet's iterators (neither the documentation nor actually implementing DoubleEndedIterator and ExactSizeIterator for all of the appropriate iterators, using the next_back-implemented-as-next strategy noted in the RFC, if that is even what the RFC intends). I also didn't attempt the tedium of moving std::vec to std::collections::vec, so there's certainly plenty of misc left to do. Just wanted to give you a heads up so your time doesn't end up getting wasted duplicating work :)


gamazeps commented Dec 21, 2014

thx :)


pczarn commented Dec 22, 2014

Oh, did you mean swap_back_remove and swap_front_remove? What about grow for ringbuf?


Gankro commented Dec 22, 2014

@pczarn That's a good point. I'm inclined to say "follow your heart"? Definitely not grow, since that's being deprecated on Vec itself. Maybe. Probably.


bfops commented Dec 23, 2014

#20163 mostly upgrades the entry API for HashMap


bluss commented Dec 24, 2014

Wait why would you want to impl .next_back for the hashmap iterators? That doesn't make sense.


pczarn commented Dec 24, 2014

@bluss Implementing ExactSize requires DoubleEndedIterator. See #19327


bluss commented Dec 24, 2014

ExactSizeIterator is a hack I proposed so that .enumerate() could be used back to front on slices. I'm unhappy to see it infect everything. Why impl that trait, and if we want it in general, isn't it time to untie it from double ended iteration?


csouth3 commented Dec 24, 2014

Yeah, that is actually straight from the RFC:

Explicitly specify HashMap's iterators to be non-deterministic between iterations.
This would allow e.g. next_back to be implemented as next, reducing code complexity.

The reason is so we can impl ExactSizeIterator (maybe?) as pczarn says (I suppose because we want rposition and len?). Though if so, I agree that it would probably be nice to untie it from DoubleEnded if possible, since the bound in this case imposes an undesirable restriction (of course, that would mean we'd have to find a new home for rposition, since that uses next_back()).


bluss commented Dec 24, 2014

It's not explicitly deciding anything about ExactSizeIterator


csouth3 commented Dec 24, 2014

You are correct. To be completely clear, I just speculated that because the referenced PR is the origin of the next_back/next discussion. Ultimately, beyond deciding on that issue, the RFC doesn't detail exactly what to do with that though, so I defer to the powers that be for what action to actually take here.


TimDumol commented Jan 1, 2015

I gave a shot at implementing append() and split_off() for DList over here: #20406

bors added a commit that referenced this issue Jan 5, 2015

auto merge of #20197 : pczarn/rust/ring_buf-collections-reform, r=Gankro
Part of collections reform part 1 and 2, #18424 and #19986

* shrink_to_fit
* swap_back_remove
* swap_front_remove
* truncate
* resize

alexcrichton added a commit to alexcrichton/rust that referenced this issue Jan 6, 2015

rollup merge of rust-lang#20197: pczarn/ring_buf-collections-reform
Part of collections reform part 1 and 2, rust-lang#18424 and rust-lang#19986

* shrink_to_fit
* swap_back_remove
* swap_front_remove
* truncate
* resize

bors added a commit that referenced this issue Jan 11, 2015

auto merge of #20406 : TimDumol/rust/dlist-append-split-off, r=Gankro
Implements the `append()` and `split_off()` methods proposed by the collections reform part 2 RFC.

RFC: rust-lang/rfcs#509
Tracking issue: #19986

alexcrichton commented Feb 18, 2016

I believe this has all since been implemented, so closing.


jooert commented Feb 18, 2016

Actually append and split_off have not been implemented for BTreeMap and BTreeSet yet.


alexcrichton commented Feb 18, 2016

Ah good point.

@alexcrichton alexcrichton reopened this Feb 18, 2016


xosmig commented Apr 22, 2016

I've started working on append and split_off methods for BTreeMap and BTreeSet.


apasel422 commented Apr 22, 2016

@xosmig #32466 already covers BTreeMap::append.


xosmig commented Apr 22, 2016

@apasel422 OK. Then only split_off.


xosmig commented Apr 28, 2016

Hello!
I've run into some problems.
To recalculate the sizes of the trees after split_off using O(log n) operations, it's necessary to know the exact size of each subtree.
So there are a few different solutions:

  1. Add a size field to the LeafNode structure.

I think the benefits are great.

First, it lets us do very flexible splits and merges with only O(log n) operations. It's already possible to merge trees using O(log n) operations, but that is not so interesting without a fast split.

Second, some other interesting and useful functions need it, for example kth_element or "get the number of elements between two values", both in O(log n) operations.

Third, as I understand it (but I am not totally sure), it would let us implement a rope ( https://en.wikipedia.org/wiki/Rope_(data_structure) ) in an easy way, just as BTreeMap<(), V>.

But the overhead is notable (4 or 8 bytes for each node, depending on machine word size). I think it is possible to embed these sizes without memory overhead on 64-bit machines (due to alignment).

  2. Implement split_off in O(log n + size of one of the results).

2.1) Split in O(log n) operations and then calculate the size of one of the results.

2.2) There is another algorithm with the same complexity that doesn't use the fast approach, but (as I understand it) with a greater constant factor.

  3. Implement split_off in O(log n + size of the smaller result), but with a greater constant factor.

Split in O(log n) operations and then calculate the sizes of the trees via two alternating iterators, stopping when one of the sizes is fully counted.

Which implementation should I use?


xosmig commented Apr 29, 2016

@apasel422, what do you think?


apasel422 commented Apr 29, 2016

@xosmig This isn't my area of expertise, but I'd recommend opening a discussion thread on https://internals.rust-lang.org/ or submitting a pull request for whichever option is easiest and CCing @jooert and @gereeter. You might also want to take a look at #26227.


xosmig commented Apr 29, 2016

@apasel422 The main question is "Is this overhead okay or not?". So, who can decide it?


gereeter commented Apr 29, 2016

First, note that option one can be slightly optimized by only storing composite sizes in internal nodes.

I'm not a big fan of option one - I expect split_off to be a fairly rare operation, and slowing down everything else and increasing the memory footprint solely for its benefit does not seem like a good idea. Since we explicitly support large collections (as large as can fit in memory, basically), the size field would have to be a usize, which cannot be combined with other fields. The application to ropes is interesting, but really wants a separate implementation - the BTreeMap interface is totally wrong for the job (though the node module might be useful). kth_element and measuring the length of a range seem useful, but again, I see them as niche use cases that really want a specialized sequence data structure.

I don't have strong feelings between options two and three. I think that a variation on option two, where the choice of which side to count is made based on which edge was descended at the root node, would probably be the best. It doesn't have the overhead of option three, but still won't take long if you split near the edges.

There is also a fourth option, namely dropping support for constant-time calls to len(). This is a somewhat more radical change, but it would easily allow for split_off in O(log n) and avoid the question of how to calculate size completely. Moreover, while it might inconvenience users, it is easy for users to wrap a BTreeMap with something that does track length, recovering the behavior for anyone who actually needs accurate lengths. I suspect that there is much higher demand for is_empty than there is for len.


xosmig commented Apr 30, 2016

@gereeter
About ropes: a rope can be implemented as a B-tree with an implicit key. It needs just 2 more useful methods on BTreeMap: split_nth (splits off the first n elements) and erase_nth.


xosmig commented Apr 30, 2016

@gereeter
"First, note that option one can be slightly optimized by only storing composite sizes in internal nodes."
Yes, that's a really great point! Roughly (a rough estimate; in fact, on average, more) 6/7 of all nodes are leaves, so only 1/7 of all nodes would need an extra usize. Also, there are about n / (1.5B) = n / 9 nodes on average, where n is the number of elements in the container. So only about n/63 extra usizes: about 1 bit per element on a 64-bit machine and half a bit on a 32-bit machine. That's nothing, isn't it?
Right now the size of InternalNode is about 11 * (size_of::<V>() + size_of::<K>()) + 112 bytes on a 64-bit architecture, so 8 extra bytes would be okay, I think.


xosmig commented Apr 30, 2016

Unless I've made a blunder, I'm convinced that each internal node would need to contain its own subtree's size.


xosmig commented Apr 30, 2016

The values above are averages. In the worst case there are n/5 nodes, of which 1/6 are internal, so n/30 extra usizes: about 2 extra bits per key on a 64-bit architecture and about 1 extra bit per key on a 32-bit architecture. That's also nothing.

bors added a commit that referenced this issue Jun 2, 2016

Auto merge of #33947 - xosmig:btree_split_off, r=gereeter
Implement split_off for BTreeMap and BTreeSet (RFC 509)

Fixes #19986 and refactors common with append methods.
It splits the tree with O(log n) operations and then calculates sizes by traversing the lower one.

CC @gereeter

@bors bors closed this in #33947 Jun 2, 2016
