fallible collection allocation 1.0 #2116
Conversation
Gankro changed the title from "fallible allocation 1.0" to "fallible collection allocation 1.0" on Aug 18, 2017
I'm really sorry this isn't perfect. I am just deeply exhausted with working on this problem right now, and need to push out what I have just to get it out there and focus on something else for a bit. I'm not 100% convinced by all my "don't rock the boat" rationales for CollectionAllocErr, and could probably be very easily convinced to change that. It's just that my default stance on this kinda stuff is "don't touch anything, because the Servo team is probably relying on it in 17 different ways that will make me sad".
daurnimator reviewed Aug 18, 2017
> This strategy is used on many *nix variants/descendants, including Android, iOS, MacOS, and Ubuntu.
>
> Some developers will try to use this as an argument for never *trying* to handle allocation failure. This RFC does not consider this to be a reasonable stance. First and foremost: Windows doesn't do it. So anything that's used a lot on windows (e.g. Firefox) can reasonably try to handle allocation failure there. Similarly, overcommit can be disabled completely or partially on many OSes. For instance the default for Linux is to actually fail on allocations that are "obviously" too large to handle.
daurnimator commented Aug 18, 2017

Worth mentioning that allocations can fail for reasons other than running out of physical memory, e.g. running out of address space, or running into a ulimit/setrlimit.
I think the plan for unwinding is terrible. I could support the …

There is no evidence of this not being considered practical. More generally, the server use case seems a little thin. I've already mentioned in the internals thread that there's a lot more to be considered there: servers come in different shapes and sizes. Considering one of Rust's 2017 goals is “Rust should be well-equipped for writing robust, high-scale servers”, I think this use case (or, I'd like to argue, use cases) should be explored in more detail.

Depending on unwinding for error handling is a terrible idea and entirely contrary to Rust best practices. This by itself should be listed under the “drawbacks” section. Besides being counteridiomatic, recovering from unwinding doesn't work well in at least three cases, two of which are not currently considered by the RFC: …
rpjohnst commented Aug 18, 2017
Using unwinding to contain errors at task granularity is completely idiomatic. It's why Rust bothers to have unwinding at all. Allowing OOMs to panic in addition to their current behavior is totally in line with this. It's not a full solution, but it is a necessary part of one.
Update: The suggestion was followed. No need to read the rest of this comment (which I have left below the line).

I suggest that the filename for this RFC be changed to something that isn't quite so subtle. (The current filename, "alloc-me-like-one-of-your-french-girls.md", is a meme/quote from the movie "Titanic"; I infer the reference is meant to bring to mind "fallibility", but I needed some help along the way.)
Gankro force-pushed the Gankro:if-at-first-you-dont-alloc branch from 0687133 to c1da9a1 on Aug 18, 2017
pnkfelix reviewed Aug 18, 2017
> This strategy is used on many *nix variants/descendants, including Android, iOS, MacOS, and Ubuntu.
>
> Some developers will try to use this as an argument for never *trying* to handle allocation failure. This RFC does not consider this to be a reasonable stance. First and foremost: Windows doesn't do it. So anything that's used a lot on windows (e.g. Firefox) can reasonably try to handle allocation failure there. Similarly, overcommit can be disabled completely or partially on many OSes. For instance the default for Linux is to actually fail on allocations that are "obviously" too large to handle.
pnkfelix (Member) commented Aug 18, 2017

Making sure that this previous comment on the earlier filename does not get lost in the shuffle; quoting here for completeness:

> Worth mentioning that allocations can fail for reasons other than running out of physical memory, e.g. running out of address space, or running into a ulimit/setrlimit.
pnkfelix reviewed Aug 18, 2017
> Here unwinding is available, and seems to be the preferred solution, as it maximizes the chances of allocation failures bubbling out of whatever libraries are used. This is unlikely to be totally robust, but that's ok.
>
> With unwinding there isn't any apparent use for an infallible allocation checker.
pnkfelix (Member) commented Aug 18, 2017

I didn't know what "infallible allocation checker" meant when I first read through this. The term does not occur, at least not as written, elsewhere in the document.

Is it meant to be the hypothetical tool listed above under "User Profile: Embedded", namely:

> some system to prevent infallible allocations from ever being used

If so, maybe just add "we'll call this an 'infallible allocation checker'" when the idea is first introduced, to define the local terminology?
Gankro (Author, Contributor) commented Aug 18, 2017

Yeah, good catch (it's what I end up proposing in the first Future Work section).
pnkfelix reviewed Aug 18, 2017
> ## User Profile: Runtime
>
> A garbage-collected runtime (such as SpiderMonkey or the Microsoft CLR), is generally expected to avoid crashing due to out-of-memory conditions. Different strategies and allocators are used for different situations here. Most notably, there are allocations on the GC heap for the running script, and allocations on the global heap for the actual runtime's own processing (e.g. performing a JIT compilation).
pnkfelix (Member) commented Aug 18, 2017

Are you using the word "crash" here in a narrow sense that refers only to undefined behavior and low-level errors like segmentation faults, etc.? Or is it meant to include unchecked exceptions, which Java's OutOfMemoryError qualifies as? (I understand that you did not include the JVM in your list of example garbage-collected runtimes, but I think most reasonable people would include it as an example of one...)

But maybe I misunderstand the real point being made in this sentence, since you draw a distinction between allocations made for the script versus allocations for the internals of the runtime. I.e. is your point that even Java avoids crashing from out-of-memory conditions that arise from the runtime internals (like JIT compilation)?
pnkfelix (Member) commented Aug 18, 2017

(I guess my previous comment is implicitly suggesting that you use a more specific word than "crash" in the first sentence. That, or add text elsewhere to the document that specifies what "crash" denotes in the context of this RFC.)
Gankro (Author, Contributor) commented Aug 18, 2017

Yes, I should be clearer. I mostly meant crashing due to runtime internals, but script stuff should also try to recover (e.g. triggering a GC when a script allocation fails and retrying). I ended up cutting all focus from the script side because, as I note just below, the script allocations aren't actually relevant to this RFC (AFAICT).

I didn't include the JVM because I had only found the time to interview SM and CLR people.
Only as a last resort, so that one assertion failure doesn't accidentally take down your whole process. Unwinding should not be used for errors that are more or less expected and that you know how to deal with. https://doc.rust-lang.org/stable/book/second-edition/ch09-03-to-panic-or-not-to-panic.html
rpjohnst commented Aug 18, 2017
Yes, precisely. In many situations, allocation failure is unexpected and has no meaningful response at a granularity smaller than a task. This is a reason to support oom=panic rather than just abort.
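To make the task-granularity idea concrete, here is a minimal sketch (an illustration added here, not part of the RFC) of what isolation could look like if an `oom=panic` build made allocation failure unwind; whether OOM panics would actually be catchable this way depends on the final design:

```rust
use std::panic;

// One request = one logical task. If an OOM inside the closure unwinds
// (under a hypothetical oom=panic build), only this request fails; the
// process keeps serving other requests.
fn handle_request(input: Vec<u8>) -> Result<Vec<u8>, ()> {
    panic::catch_unwind(move || {
        // Allocation-heavy work that could hit OOM.
        input.iter().rev().copied().collect::<Vec<u8>>()
    })
    .map_err(|_| ())
}
```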
pnkfelix reviewed Aug 18, 2017
> ## try_reserve
>
> `try_reserve` and `try_reserve_exact` would be added to `HashMap`, `Vec`, `String`, and `VecDeque`. These would have the exact same APIs as their infallible counterparts, except that OOM would be exposed as an error case, rather than a call to `Alloc::oom()`. They would have the following signatures:
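The concrete signatures did not survive in the quote above. As a rough sketch of their shape, using a local stand-in for the proposed `CollectionAllocErr` (variant payloads elided) and a trait only so the snippet is self-contained; in the RFC these are inherent methods on the collections:

```rust
// Stand-in for the RFC's proposed error type; payload details elided here.
pub enum CollectionAllocErr {
    CapacityOverflow,
    AllocErr,
}

// Written as a trait only so this sketch compiles on its own; in the RFC
// these are inherent methods on Vec, String, HashMap, and VecDeque.
pub trait TryReserve {
    /// Fallible counterpart of `reserve`.
    fn try_reserve(&mut self, additional: usize) -> Result<(), CollectionAllocErr>;
    /// Fallible counterpart of `reserve_exact`.
    fn try_reserve_exact(&mut self, additional: usize) -> Result<(), CollectionAllocErr>;
}
```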
pnkfelix (Member) commented Aug 18, 2017

Clarification request: you'll only be adding `fn try_reserve_exact` to the types that already supply `fn reserve_exact`, right?
Gankro (Author, Contributor) commented Aug 18, 2017

I'll be honest: I was working from memory and thought `reserve` and `reserve_exact` always came together. If not, then yeah, what you said.
pnkfelix reviewed Aug 18, 2017

> ```rust
> /// Tries to reserve capacity for at least `additional` more elements to be inserted
> /// in the given `Vec<T>`. The collection may reserve more space to avoid
> /// frequent reallocations. After calling `reserve`, capacity will be
> ```
pnkfelix (Member) commented Aug 18, 2017

The doc-comment should be further revised, since this is the doc for `fn try_reserve`, not `fn reserve`. In particular, instead of saying "After calling `reserve`, capacity will ..." (which is not relevant to this function), you could instead say:

> If `try_reserve` returns `Ok`, capacity will be greater than or equal to `self.len() + additional`. If capacity is already sufficient, then returns `Ok` (with no side-effects to this collection).
pnkfelix reviewed Aug 18, 2017
> ## Eliminate the CapacityOverflow distinction
>
> Collections could potentially just create an `AllocErr::Unsupported("capacity overflow")` and feed it to their allocator. Presumably this wouldn't do something bad to the allocator? Then the oom=abort flag could be used to completely control whether allocation failure is a panic or abort (for participating allocators).
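To make the trade-off concrete, here is a minimal editor's sketch of the two error shapes being compared; these are local illustrative enums (payload details elided), not the actual `alloc` or `std` definitions:

```rust
// Shape the RFC proposes: the collection layer distinguishes the two cases.
enum CollectionAllocErr {
    CapacityOverflow, // requested capacity overflowed the collection's maximum
    AllocErr,         // the allocator itself reported failure
}

// Shape suggested in the quoted alternative: fold capacity overflow into the
// allocator's own error, leaving a single failure path for oom=abort/panic
// to act on.
enum FoldedAllocErr {
    Unsupported(&'static str), // e.g. Unsupported("capacity overflow")
    Exhausted,
}
```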
pnkfelix (Member) commented Aug 18, 2017

clever! (don't know if I like it, but had to give it credit nonetheless)
withoutboats added the T-libs and T-lang labels on Aug 18, 2017
@Gankro Thanks for all the hard work! However, I also don't like unwinding as error handling. Unwinding should only happen when I have signaled that I don't have anything to do to help; IMHO, it is damage control, rather than error handling. Have there been any proposals to add some sort of OOM handler? Something like:

```rust
enum OOMOutcome {
    Resolved(*const u8), // OOM was resolved and here is the allocation
    // Could not resolve the OOM, so do what you have to
    #[cfg(oom = "panic")]
    Panic,
    #[cfg(oom = "abort")]
    Abort,
}

fn set_oom_handler<H>(handler: H)
    where H: Fn(/* failed operation args... */) -> OOMOutcome;
```

You would then call `set_oom_handler` to register your handler.

The benefit of this approach is that the existing interface doesn't have to change at all, and applications don't have to choose to use a different API. An alternate approach would be to make the … Yet another approach would be to make the OOM handler a type that implements a trait. Allocators would then be generic over their OOMHandler type.
elahn commented Aug 19, 2017

Great idea, @mark-i-m. In a server app, I'd grab a chunk of memory on startup, then in my OOM handler: …

On operating systems that error on allocation as opposed to access, this would remove the need to manually tune the app for request load as a function of memory consumption.

Thanks @Gankro for your work on this RFC.
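A minimal sketch of the "rainy-day fund" pattern described above; the OOM-handler hook that would call `release_reserve` is hypothetical, and only the reserve bookkeeping is shown:

```rust
use std::sync::Mutex;

// Memory grabbed at startup and given back under memory pressure.
static RESERVE: Mutex<Option<Vec<u8>>> = Mutex::new(None);

/// Call once at startup to set aside the rainy-day fund.
fn init_reserve(bytes: usize) {
    *RESERVE.lock().unwrap() = Some(vec![0u8; bytes]);
}

/// Intended to be called from the (hypothetical) OOM handler: free the
/// reserve so in-flight requests can finish, and tell the caller whether
/// there was anything left to release (if not, it may have to shed load).
fn release_reserve() -> bool {
    RESERVE.lock().unwrap().take().is_some()
}
```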
@Gankro I have mixed feelings about this RFC. I think this is a … design; it provides me with the ability to write a ….

Still, it's much better than the current situation, and it doesn't have any drawbacks I can see as long as the guts are unstable (taking into account that application/firmware code will be written against this), so I'm in favor of merging this even as-is. Thanks for writing this!
Perhaps this could be an experimental RFC until we get experience with what doesn't work so well? That's seemed to work well in the past for the allocator interfaces...
Given the constraints, I'd say this is good enough. I have only one question: what are the semantics of `try_reserve` for `HashMap`?
aturon self-assigned this on Aug 22, 2017
arthurprs commented Aug 23, 2017

@gnzlbg that's an implementation detail. The stdlib version, for example, is able to guarantee room for any combination of items in advance. Some variants, though, can't 100% guarantee room without knowing the dataset. Hopscotch and cuckoo hash tables come to my mind as examples.
@arthurprs IIUC the whole point of `try_reserve` is this guarantee(*). Without this guarantee, one cannot avoid OOM at all, so why would I actually call `try_reserve`?

So... I must be missing something, because as I understand it, this guarantee is not an implementation detail of `try_reserve`.

(*) unless we add …
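For reference, the guarantee under discussion corresponds roughly to this usage pattern (a sketch assuming `try_reserve` lands on `HashMap` as proposed, and that a successful reservation means the following inserts do not allocate):

```rust
use std::collections::HashMap;

fn insert_all_or_nothing(
    map: &mut HashMap<u32, String>,
    items: Vec<(u32, String)>,
) -> Result<(), ()> {
    // Reserve room for every item up front; bail out before mutating the
    // map at all if the reservation fails.
    map.try_reserve(items.len()).map_err(|_| ())?;

    // Under the guarantee discussed above, none of these inserts should
    // need to allocate, and therefore none of them can hit OOM.
    for (k, v) in items {
        map.insert(k, v);
    }
    Ok(())
}
```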
arthurprs commented Aug 23, 2017

I don't disagree with your overall idea; I'm just saying that it can't be done in ALL cases.
@arthurprs This is why I was asking (and why I chose the HashMap as an example). I agree with you on this. This cannot be guaranteed for all collections.
eternaleye commented Jan 5, 2018

Yes, but I was pointing out that the performance degradation warning sign would be absent. Instead, this is a system where the race condition I described instantly results in data loss.
stepancheg commented Jan 5, 2018

For "some" allocations, calling the `alloc` crate explicitly and/or implementing "FallibleVec" outside of the stdlib would probably be enough.
That's actually not entirely true. So long as there are no loops where a type … In other words, an …
And why exactly is this better than having …?
The `try_reserve` way has dead branches with panics, versus no panics; a minor disadvantage. But we don't have to pay the price of a second crate and a duplicated implementation to have fallible collections!
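As an illustration of that point, a fallible push can be a thin extension over `try_reserve` rather than a second collection type; the `FallibleVecExt` and `try_push` names below are hypothetical, for illustration only:

```rust
trait FallibleVecExt<T> {
    /// Push that reports allocation failure instead of aborting/panicking,
    /// handing the value back to the caller on failure.
    fn try_push(&mut self, value: T) -> Result<(), T>;
}

impl<T> FallibleVecExt<T> for Vec<T> {
    fn try_push(&mut self, value: T) -> Result<(), T> {
        // Ensure the subsequent push cannot trigger an infallible allocation.
        if self.try_reserve(1).is_err() {
            return Err(value);
        }
        self.push(value);
        Ok(())
    }
}
```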
stepancheg commented Jan 5, 2018

I think it's too expensive to duplicate all of, or a large part of, the Rust stdlib with fallible versions. Even worse, as …

(And I agree with @gnzlbg that the RFC should be split into at least a part which panics on OOM and a part which adds `try_reserve`.)

About the "minor ideological point", I'd like to insert the famous quote: "Simple things should be simple, complex things should be possible." Most programs (and libraries) won't ever need and won't support fallible allocations; crashing on OOM is the best strategy for them, and having to deal with fallible allocations would be too much of a burden. And for the specific situations when you need to gracefully handle OOM, a low-level API to allocate memory should be enough, and kernel-like developers could create specific libraries to use that API.
This is reasonable. In general, I disagree vehemently with catching unwinding as an error-handling strategy, but it's silly that OOM is somehow a different type of failure. It's even sillier that, currently, we also abort on invalid layout because of the laxity of …

You're arguing against your own point? It is now demonstrated that this is not expensive at all: eddyb/rust@future-box...QuiltOS:allocator-error

Totally agreed! This is why I use the allocator type to enforce that this won't happen. (I do want to add a nicer way to zero-cost cast between the ….)
No, this is solved by having a lint, and then using …

Every …

You are not a developer working in an embedded, RTOS, or OS kernel context. Please stop talking for us, because you do not have any knowledge or understanding of what needs we have, by your very own admission, and you have shown a complete lack of empathy or interest in understanding use cases other than "hosted Linux with overcommit".
Mod note: Please calm down, everyone. Phrases like "Instead of bragging" are not constructive. In general, talk about proposals, not people. It helps nobody to nitpick on who has the authority to talk about what. Instead, make your point, and explain the context as to why it is relevant. If someone makes a point that is inapplicable to a kind of system, talk about why it is inapplicable; don't focus on the credentials of the people involved.
rust-lang deleted a comment from stepancheg on Jan 5, 2018
rust-lang deleted a comment from whitequark on Jan 5, 2018
Further mod note: Two comments deleted. If you have complaints about how someone is engaging in a discussion, please talk to the mods and we can address it privately, rather than bringing them into the thread itself.
Today I finally girded myself to wade back into this thread :-)

To be honest, I think the key problem with this RFC is its title. Fallible allocation as a general topic has a huge set of stakeholders with divergent needs, and this RFC makes very clear that it is not trying to solve the general problem -- but its title perhaps suggests otherwise. The fact of the matter is that …

However, I think part of the frustration on the thread is that others feel they can see their way to a fully general solution that obviates the need for `try_reserve`.

This is a classic situation in the Rust world. The danger, though, is that by always looking to the "more perfect" solution we never ship. The classic way we handle this in Rust is by considering "forward compatibility". In this case: to what extent does …

Now, another concern is that …

By all means, let's discuss a more perfect solution -- on a separate RFC proposing it in detail. But in the meantime, let's ship a useful iteration that is forward-compatible, with the shared understanding that it's an imperfect approach that we don't want to apply universally across `std`.
We agreed before that for …

But this is not true. In my opinion, the best platform-agnostic guarantees that we can provide for `try_reserve` …

In particular, given that …

Also, revisiting the systems in which the … On embedded and Linux/MacOSX/*BSDs the user can provide allocators that go both ways. As @whitequark correctly points out, in embedded it makes little sense in general for users to provide allocators that overcommit, but it is still something that can be done. I've repeated many times that we should split …

I don't know. Hopefully others have better ideas, but …
This seems extreme. Adding UB to this case would not be helpful for the use cases mentioned in the original RFC. Firefox, for example, is okay with not being notified of OOMs that were delayed by overcommit, but wants to catch as many OOMs as possible. Like, 32-bit systems exist, and even with overcommit there can be OOM-on-allocation on those pretty easily. Hobbling an API completely just because it isn't perfect for some platforms seems extreme to me.

Rather, it makes more sense to define it as "try_reserve is defined to reserve X memory. If the OS commits memory on allocation, try_reserve is guaranteed to produce an error when it is unable to allocate the memory. If not, it is guaranteed to produce an error in case the allocation operation somehow failed, but may succeed when there isn't actually enough memory to satisfy it".
@Manishearth makes sense, what about: …

Do you know how to remove the undefined behavior for …?
(We discussed this on IRC; @gnzlbg was misusing "undefined behavior".)

Reasoning about the OOM killer is out of scope for Rust's safety model; the OOM killer can kill a Rust program even when the Rust program wasn't allocating. One could call this "implementation-defined behavior", but it's not even that; the OOM killer is out of scope for Rust, as is the robustness of Rust programs in the presence of an attached debugger mutating the process, or /proc/self/mem, or …
@gnzlbg Quick clarification: the RFC does not propose to add …

The specification you and @Manishearth are hashing out seems just fine to me; this entire enterprise is about a "best effort" API anyway.
@aturon I totally agree with the principle, but two things. First, the "more perfect solution"'s difficulty is vastly overestimated. As soon as someone fixes the Windows errors in rust-lang/rust#47043, I am confident I can make most collections allocator- and fallibility-polymorphic in 2 weeks, tops, seeing that I already did a couple in 2-3 days in https://github.com/QuiltOS/rust/commits/allocator-error. Second, at the very least, …
Gankro commented Aug 18, 2017 (edited by Kimundi)

Add minimal support for fallible allocations to the standard collection APIs. This is done in two ways:

- An `oom=panic` configuration is added to make global allocators panic on OOM.
- A `try_reserve() -> Result<(), CollectionAllocErr>` method is added.

The former is sufficient for unwinding users, but the latter is insufficient for the others (although it is a decent 80/20 solution). Completing the no-unwinding story is left for future work.
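As a caller-side sketch of the `try_reserve` half (an illustration added here, assuming only the proposed method and no particular error type):

```rust
// Try to make room for a whole batch up front and report failure to the
// caller instead of letting the collection abort/panic on OOM.
fn collect_batch(batch: &[u64]) -> Option<Vec<u64>> {
    let mut out = Vec::new();
    if out.try_reserve(batch.len()).is_err() {
        return None;
    }
    out.extend_from_slice(batch);
    Some(out)
}
```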
Rendered
Updated link: Rendered