Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drain() doc of vector (and others) is still unclear #92765

Closed
ssomers opened this issue Jan 11, 2022 · 9 comments · Fixed by #92902
Closed

drain() doc of vector (and others) is still unclear #92765

ssomers opened this issue Jan 11, 2022 · 9 comments · Fixed by #92902
Labels
A-collections Area: std::collections. A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@ssomers
Copy link
Contributor

ssomers commented Jan 11, 2022

The description of Vec::drain:

  • states that it "creates a draining iterator", leading to expectations that each iteration drains one element, that stopping the iteration should drain fewer elements because iterators are lazy (witnessed in Tracking issue for Vec::extract_if and LinkedList::extract_if #43244). The rest of the description counters that expectation, but by saying "when the iterator is dropped", there's again the suggestion that removal happens at that late stage. While conceptually it's simpler to state that the removal happens unconditionally and immediately, which corresponds to the verb "drain", like BinaryHeap::drain and HashMap::drain do.
  • says "unspecified how many elements are removed" if you circumvent dropping, which suggests it's only removing elements in the range like the rest of the text, but in reality the whole vector is slaughtered (Improve the documentation for Vec::drain #74652 clarified one aspect of this case but slightly muddled this one).
  • lists the possibility to use mem::forget as if that is just one of the ways to use the iterator, which makes the description harder to understand if you're not advanced in Rust.
  • does not say what happens to the elements removed but not yielded if the iterator is dropped before it is fully consumed. It makes most sense that they get dropped rather than leaked, but given what leeway Vec::drain grants itself elsewhere, it might be helpful to clarify. None of the other drain methods say this either (Perhaps improve the documentation of drain members further #93769). Vec::into_iter also doesn't but I believe the fact that the signature says it consumes the container implies it.

VecDeque::drain has the same issue (and got missed by #74652). String::drain probably too, it doesn't mention what may happen if you mem::forget the iterator.

@rustbot label A-collections A-docs C-enhancement T-libs-api

@rustbot rustbot added A-collections Area: std::collections. A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Jan 11, 2022
@steffahn
Copy link
Member

steffahn commented Feb 8, 2022

Follow-up to my remark and question here: #93769 (comment)

I feel like "creates a draining iterator" does more accurately describe the situation. On second thought, and looking at the existing documentation you linked, I do actually agree that those are easier to understand. The tracking issue you link is quite long, so I don't feel like reading through all of it, but I would be interested if you could point out (some of) the concrete comments you're referring to where you witnessed any particular incorrect expectations.

Maybe the best solution could be the high-level "Removes the specified range from the vector, returning all removed elements as an iterator"-style documentation in combination with an additional remark later in the documentation that explains why the iterator does nonetheless keep a mutable borrow on the original data-structure for the duration of its existence. So feel free to ignore this point in your PRs if you want, if I come up with a good way of phrasing such an additional remark, I can make a follow-up PR later.

@steffahn
Copy link
Member

steffahn commented Feb 8, 2022

does not say what happens to the elements removed but not yielded if the iterator is dropped before it is fully consumed.

That seems like a non-issue to me. Conventionally, dropping one thing should never result in leakage of a different thing, unless (that's explicitly documented). To quote myself in a different reply in your PR:

AFAIK memory leaks in Rust are only supposed to happen when other things are already leaked (i.e. leaking one thing can result in more leakage) or by creating a reference-cycle in an Arc/Rc or similar structure, or by explicitly calling a function that is meant to leak data like Box::leak or mem::forget or using ManuallyDrop, etc.


This doesn't mean that I dislike your changes that explicitly call out the dropping, it doesn't hurt to be more explicit, I'm just saying the status quo is less imprecise that you might have assumed.

@ssomers
Copy link
Contributor Author

ssomers commented Feb 9, 2022

where you witnessed any particular incorrect expectations

It starts even earlier than I thought: e.g. it's mentioned as an argument to let drain_filter do drain-on-drop and lead to rust-lang/rfcs#2369. The currently last post in #43244 most clearly lists the expectation. I feel this is due to the wording of "draining iterator".

an additional remark later in the documentation that explains why the iterator does nonetheless keep a mutable borrow

Good point.

@ssomers
Copy link
Contributor Author

ssomers commented Feb 9, 2022

I'm just saying the status quo is less imprecise that you might have assumed.

For my 3rd bullet, I think it's imprecise (I'd use vague) because leaking is listed as one of the normal ways to use the iterator.

@ssomers
Copy link
Contributor Author

ssomers commented Feb 9, 2022

For clarity, I'm not claiming with any authority that Vec::Drain does not do drain-on-drop. I'm saying its description (contract, design) does not require it to do anything on drop, apart from dropping unconsumed items and a vector buffer just like any IntoIter. The fact it (currently) takes a mutable borrow, allows it to do something more on drop, suggests that it does something more on drop, and the current implementation indeed does something more on drop, but what it does is hardly draining but rather patching the vector internals and returning their ownership to the vector itself. The implementation could be changed to not need the borrow at all, at the cost of an allocation (just not sure if the stable signature could be changed to remove the mutable borrow in a compatible way). I think it's a stretch to call this drain-on-drop but who knows what the term actually stands for.

My proposed BTreeMap::Drain is definitely not drain-on-drop by any standard, it doesn't need a mutable borrow (because a tree is split up in nodes whereas a vector has a single huge buffer).

@steffahn
Copy link
Member

steffahn commented Feb 9, 2022

It starts even earlier than I thought: e.g. it's mentioned as an argument to let drain_filter do drain-on-drop and lead to rust-lang/rfcs#2369. The currently last post in #43244 most clearly lists the expectation. I feel this is due to the wording of "draining iterator".

Interesting… maybe I should at some point take the time to actually look through that discussion in the tracking issue. There’s a lot of topics being discussed a once though.

For my 3rd bullet, I think it's imprecise (I'd use vague) because leaking is listed as one of the normal ways to use the iterator.

Leaking should never be a normal way to use the iterator. The current documentation of Vec's drain with the whole "if the iterator is dropped..., if the iterator is not dropped" does implicitly suggest that leaking the iterator is a normal interaction, so I guess my personal opinion would be that that documentation should be changed, in line with what I said here #93769 (comment)

Unlike other data types, of course Vec has no validity constraints (beyond the memory-safety relevant ones that can’t be broken by safe API anyways), so of course this option to leave the collection in an invalid state is actually not an option. Whether it is an option for those other data types with validity constraints is a different question (which – as you know – we’re discussing a bit in #93769).

For clarity, I'm not claiming with any authority that Vec::Drain does not do drain-on-drop. I'm saying its description (contract, design) does not require it to do anything on drop, apart from dropping unconsumed items and a vector buffer just like any IntoIter. The fact it (currently) takes a mutable borrow, allows it to do something on drop, suggests that it does something on drop, and the current implementation indeed does something on drop, but what it does is hardly draining but rather patching the vector internals and returning their ownership to the vector itself. The implementation could be changed to not need the borrow at all, at the cost of an allocation (just not sure if the stable signature could be changed to remove the mutable borrow in a compatible way). I think it's a stretch to call this drain-on-drop but who knows what the term actually stands for.

I fully agree with the observation, and as noted above, I also agree that the wording in terms of "Removes the specified range from the vector, returning all removed elements as an iterator" is easier to understand. I now also agree that this describes the intention of what drain does, and the fact that it can avoid additional allocations by having the iterator access the data structure internals (and patch up the remainder on drop) is basically just an implementation detail. I would not agree that we should ever actually consider to change the implementation by introducing an allocation. I feel like .drain methods are a great example of the strength of Rust to provide zero-overhead abstractions. The abstraction says "remove elements and return an iterator containing them", but the actual implementation saves all the allocation and extra copying steps that you would need for this kind of API in any programming language that isn’t Rust.

Of course for a hypothetical BTreeMap drain method, this kind of optimization is not possible, because there isn’t actually any memory that can be preserved either. E.g. its clear method just does *self = BTreeMap::new(), while clearing a Vec, HashMap, or BinaryHeap preserves the existing allocation and capacity. Arguably BTreeMap doesn’t even need a drain method, because you could just as well do mem::take(&mut foo).into_iter(). But by the same token it doesn’t need a clear method either (you can do *self = BTreeMap::new() yourself), so a drain method can make sense for convenience.

@ssomers
Copy link
Contributor Author

ssomers commented Feb 10, 2022

Of course for a hypothetical BTreeMap drain method, this kind of optimization is not possible, because there isn’t actually any memory that can be preserved either.

Not true.

  • A map can have an allocated, empty root node or no root node at all. It's just a possibly tiny part and that's probably why it receives no attention.
  • Doing the same trick as in Vec::drain is possible, you can in theory avoid all allocations while owning the map internals. But it's excruciatingly difficult. And the biggest optimization is reusing subtrees with ranges of keys, which doesn't need any trickery.

E.g. its clear method just does *self = BTreeMap::new(), while clearing a Vec, HashMap, or BinaryHeap preserves the existing allocation and capacity.

BTreeMap's clear could preserve the root, just like HashMap's clear describes and does.

Arguably BTreeMap doesn’t even need a drain method, because you could just as well do mem::take(&mut foo).into_iter().

That's true for all drain methods without arguments. My proposed BTreeMap::drain (rotting away in #81075) takes a range argument, just like Vec's.

@steffahn
Copy link
Member

  • A map can have an allocated, empty root node or no root node at all.

Ah, okay I see.

ranges of keys

Oh, was this “proposed BTreeMap::drain” one that accepts a range?


That's true for all drain methods without arguments.

Well, but e.g. HashMap::drain explicitly documents that the underlying allocated memory is kept, which is not what happens on mem::take(&mut foo).into_iter().

@ssomers
Copy link
Contributor Author

ssomers commented Feb 10, 2022

Yes, BTreeMap::drain takes a range (it's buried as #81075).

Well, but e.g. HashMap::drain explicitly documents that the underlying allocated memory is kept

Another good point, I can't remember anyone bringing that up all the way since my first attempt at a rangeless drain, #66747.

@bors bors closed this as completed in 4fa71ed Feb 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections Area: std::collections. A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants