
fallible collection allocation 1.0 #2116

Merged
merged 2 commits into rust-lang:master from Gankro:if-at-first-you-dont-alloc on Feb 7, 2018

Conversation

@Gankro
Contributor

commented Aug 18, 2017

Add minimal support for fallible allocations to the standard collection APIs. This is done in two ways:

  • For users with unwinding, an oom=panic configuration is added to make global allocators panic on oom.
  • For users without unwinding, a try_reserve() -> Result<(), CollectionAllocErr> method is added.

The former is sufficient for unwinding users, but the latter is insufficient for the others (although it is a decent 80/20 solution). Completing the no-unwinding story is left for future work.
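For orientation, here is a minimal sketch of the shape of what is being proposed, based on the summary above. The exact definitions live in the RFC text; the variant names here approximate it, and the trait exists only so the snippet is self-contained (the RFC adds these as inherent methods on the collections).

```
// Sketch only: approximates the API surface summarized above.
pub enum CollectionAllocErr {
    /// The computed capacity overflowed the collection's limits.
    CapacityOverflow,
    /// The allocator reported that it could not satisfy the request.
    AllocErr,
}

// Shown as a trait purely so this compiles on its own; the RFC proposes
// inherent methods on Vec, String, HashMap, and VecDeque.
pub trait TryReserve {
    fn try_reserve(&mut self, additional: usize) -> Result<(), CollectionAllocErr>;
    fn try_reserve_exact(&mut self, additional: usize) -> Result<(), CollectionAllocErr>;
}
```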

Rendered


Updated link:

Rendered

@Gankro Gankro changed the title from fallible allocation 1.0 to fallible collection allocation 1.0 Aug 18, 2017

@Gankro

Contributor Author

commented Aug 18, 2017

I'm really sorry this isn't perfect. I am just deeply exhausted with working on this problem right now, and need to push out what I have just to get it out there and focus on something else for a bit.

I'm not 100% convinced with all my "don't rock the boat" rationales for CollectionAllocErr, and could probably be very easily convinced to change that. It's just that my default stance on this kinda stuff is "don't touch anything, because the Servo team probably is relying on it in 17 different ways that will make me sad".


This strategy is used on many *nix variants/descendants, including Android, iOS, MacOS, and Ubuntu.
Some developers will try to use this as an argument for never *trying* to handle allocation failure. This RFC does not consider this to be a reasonable stance. First and foremost: Windows doesn't do it. So anything that's used a lot on Windows (e.g. Firefox) can reasonably try to handle allocation failure there. Similarly, overcommit can be disabled completely or partially on many OSes. For instance, the default for Linux is to actually fail on allocations that are "obviously" too large to handle.

@daurnimator

daurnimator Aug 18, 2017

Worth mentioning that allocations can fail for reasons other than running out of physical memory, e.g. running out of address space, or running into a ulimit/setrlimit.

@jethrogb

Contributor

commented Aug 18, 2017

I think the plan for unwinding is terrible. I could support the try_reserve part of this RFC separately as a stop-gap measure on the way to full support.

Similar to the embedded case, handling allocation failure at the granularity of tasks is ideal for quality-of-implementation purposes. However, unlike embedded development, it isn't considered practical (in terms of cost) to properly take control of everything and ensure allocation failure is handled robustly.

There is no evidence of this not being considered practical.

More generally, the server use case seems a little thin. I've already mentioned in the internals thread that there's a lot more to be considered there: servers come in different shapes and sizes. Considering one of Rust's 2017 goals is “Rust should be well-equipped for writing robust, high-scale servers”, I think this use case (or, I'd like to argue, use cases) should be explored in more detail.

Depending on unwinding for error handling is a terrible idea and entirely contrary to Rust best practices. This by itself should be listed under the “drawbacks” section. Besides being counteridiomatic, recovering from unwinding doesn't work well in at least three cases, two of which are not currently considered by the RFC:

  1. Platforms without unwinding support. Not really any need to go into detail here, as the RFC describes it pretty well.
  2. FFI. Unwinding is not supported across FFI boundaries. Allocation errors currently result in a relatively clean abort; with this RFC, unwinding from allocation errors through FFI can result in weird/non-deterministic/undefined behavior.
  3. Synchronization primitives. You can't use any of the standard synchronization primitives such as Once, Mutex, and RwLock if you expect your code to unwind, because of the possibility of lock poisoning. This was also already mentioned in the internals thread.
@rpjohnst


commented Aug 18, 2017

Using unwinding to contain errors at task granularity is completely idiomatic. It's why Rust bothers to have unwinding at all. Allowing OOMs to panic in addition to their current behavior is totally in line with this. It's not a full solution, but it is a necessary part of one.

@pnkfelix

Member

commented Aug 18, 2017

Update: The suggestion was followed. No need to read the rest of this comment (which I have left below the line).


I suggest that the filename for this RFC be changed to something that isn't quite so subtle.

(The current filename, "alloc-me-like-one-of-your-french-girls.md", is a meme/quote from the movie "Titanic"; I infer that reference is meant to bring to mind "fallibility", but I needed some help along the way.)

@Gankro Gankro force-pushed the Gankro:if-at-first-you-dont-alloc branch from 0687133 to c1da9a1 Aug 18, 2017


This strategy is used on many *nix variants/descendants, including Android, iOS, MacOS, and Ubuntu.
Some developers will try to use this as an argument for never *trying* to handle allocation failure. This RFC does not consider this to be a reasonable stance. First and foremost: Windows doesn't do it. So anything that's used a lot on Windows (e.g. Firefox) can reasonably try to handle allocation failure there. Similarly, overcommit can be disabled completely or partially on many OSes. For instance, the default for Linux is to actually fail on allocations that are "obviously" too large to handle.

@pnkfelix

pnkfelix Aug 18, 2017

Member

make sure that this previous comment on the earlier filename does not get lost in the shuffle; quoting here for completeness:

Worth mentioning that allocations can fail for reasons other than running out of physical memory, e.g. running out of address space, or running into a ulimit/setrlimit.


Here unwinding is available, and seems to be the preferred solution, as it maximizes the chances of allocation failures bubbling out of whatever libraries are used. This is unlikely to be totally robust, but that's ok.

With unwinding there isn't any apparent use for an infallible allocation checker.

@pnkfelix

pnkfelix Aug 18, 2017

Member

I didn't know what "infallible allocation checker" meant when I first read through this. The term does not occur, at least not as written, elsewhere in the document.

Is it meant to be a hypothetical tool listed above with "User Profile: Embedded", namely:

some system to prevent infallible allocations from ever being used

If so, maybe just add "we'll call this an 'infallible allocation checker'" when the idea is first introduced, just to define local terminology?

@Gankro

Gankro Aug 18, 2017

Author Contributor

Yeah good catch (it's what I end up proposing in the first Future Work section)


## User Profile: Runtime

A garbage-collected runtime (such as SpiderMonkey or the Microsoft CLR) is generally expected to avoid crashing due to out-of-memory conditions. Different strategies and allocators are used for different situations here. Most notably, there are allocations on the GC heap for the running script, and allocations on the global heap for the actual runtime's own processing (e.g. performing a JIT compilation).

@pnkfelix

pnkfelix Aug 18, 2017

Member

Are you using the word "crash" here in a narrow sense that just refers to undefined behavior and low-level errors like segmentation faults, etc.? Or is it meant to include unchecked exceptions, which Java's OutOfMemoryError qualifies as? (I understand that you did not include the JVM in your list of example garbage-collected runtimes, but I think most reasonable people would include it as an example of one...)

But maybe I misunderstand the real point being made in this sentence, since you draw a distinction between allocations made for the script versus allocations for the internals of the runtime. I.e. is your point that even Java avoids crashing from out-of-memory conditions that arise from the runtime internals (like JIT compilation) ?

@pnkfelix

pnkfelix Aug 18, 2017

Member

(I guess my previous comment is implicitly suggesting that you use a more specific word than "crash" in the first sentence. That, or add text elsewhere to the document that specifies what "crash" denotes in the context of this RFC.)

@Gankro

Gankro Aug 18, 2017

Author Contributor

Yes, I should be more clear. I mostly meant crashing due to runtime internals, but script stuff should also try to recover (e.g. triggering a GC when a script allocation fails and retrying). I ended up cutting all focus from the script side because, as I note just below, the script allocations aren't actually relevant to this RFC (AFAICT).

I didn't include the JVM because I had only found the time to interview SM and CLR people.

@jethrogb

Contributor

commented Aug 18, 2017

Using unwinding to contain errors at task granularity is completely idiomatic.

Only as a last resort, so that one assertion failure doesn't take down your whole process accidentally. Unwinding should not be used for errors that are more or less expected and that you know how to deal with. https://doc.rust-lang.org/stable/book/second-edition/ch09-03-to-panic-or-not-to-panic.html

@rpjohnst


commented Aug 18, 2017

Yes, precisely. In many situations, allocation failure is unexpected and has no meaningful response at a granularity smaller than a task. This is a reason to support oom=panic rather than just abort.


## try_reserve

`try_reserve` and `try_reserve_exact` would be added to `HashMap`, `Vec`, `String`, and `VecDeque`. These would have the exact same APIs as their infallible counterparts, except that OOM would be exposed as an error case, rather than a call to `Alloc::oom()`. They would have the following signatures:

@pnkfelix

pnkfelix Aug 18, 2017

Member

Clarification request: You'll only be adding fn try_reserve_exact to the types that already supply fn reserve_exact, right?

@Gankro

Gankro Aug 18, 2017

Author Contributor

I'll be honest, I was working off memory and thought reserve and reserve_exact always came together. If not, then yeah, what you said.

```
/// Tries to reserve capacity for at least `additional` more elements to be inserted
/// in the given `Vec<T>`. The collection may reserve more space to avoid
/// frequent reallocations. After calling `reserve`, capacity will be
/// greater than or equal to `self.len() + additional`.
```

@pnkfelix

pnkfelix Aug 18, 2017

Member

The doc-comment should be further revised since this is the doc for fn try_reserve, not fn reserve. In particular, instead of saying "After calling reserve, capacity will ..." (which is not relevant to this function), you could instead say:

If try_reserve returns Ok, capacity will be greater than or equal to self.len() + additional. If capacity is already sufficient, then returns Ok (with no side-effects to this collection).


## Eliminate the CapacityOverflow distinction

Collections could potentially just create an `AllocErr::Unsupported("capacity overflow")` and feed it to their allocator. Presumably this wouldn't do something bad to the allocator? Then the oom=abort flag could be used to completely control whether allocation failure is a panic or abort (for participating allocators).

@pnkfelix

pnkfelix Aug 18, 2017

Member

clever! (don't know if I like it, but had to give it credit nonetheless)

@mark-i-m

Contributor

commented Aug 18, 2017

@Gankro Thanks for all the hard work!

However, I also don't like unwinding as error handling. Unwinding should only happen when I have signaled that I don't have anything to do to help; IMHO, it is damage control, rather than error handling.

Have there been any proposals to add some sort of set_oom_handler to the allocator interface? I am imagining something like the following:

```
enum OOMOutcome {
    Resolved(*const u8), // OOM was resolved and here is the allocation

    // Could not resolve the OOM, so do what you have to
    #[cfg(oom = "panic")]
    Panic,
    #[cfg(oom = "abort")]
    Abort,
}

fn set_oom_handler<H>(handler: H)
    where H: Fn(/* failed operation args... */) -> OOMOutcome;
```

You would then call set_oom_handler with your handler at the beginning of your program. Your handler can then choose what it wants to happen, including triggering a GC or whatever...

The benefit of this approach is that the existing interface doesn't have to change at all, and applications don't have to choose between try_reserve and the existing methods.

An alternate approach would be to make the oom_handler a language item or something (but that seems more suitable for the global allocator than collection allocators).

Yet another approach would be to make the OOM handler a type that implements a trait. Allocators would then be generic over their OOMHandler type.
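A rough sketch of that last, trait-based variant; every name here is invented for illustration and none of it is proposed API:

```
use core::marker::PhantomData;

/// Hypothetical: an OOM handler chosen statically, as a type parameter.
pub trait OomHandler {
    /// Called when an allocation of `size` bytes fails. Return a pointer to
    /// salvaged memory, or `None` to let the allocator panic or abort.
    fn handle_oom(size: usize) -> Option<*mut u8>;
}

/// An allocator wrapper that would defer to `H` whenever the inner
/// allocator fails; the actual allocation plumbing is omitted.
pub struct HandledAlloc<A, H: OomHandler> {
    pub inner: A,
    pub _handler: PhantomData<H>,
}
```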

@elahn


commented Aug 19, 2017

Great idea, @mark-i-m. In a server app, I'd grab a chunk of memory on startup, then in my OOM handler:

  • set max_inflight_requests = inflight_requests - 1
  • send back-pressure/notify the monitoring, traffic and instance manager
  • resolve the allocation using some of the chunk, so the request can succeed

On operating systems that error on allocation as opposed to access, this would remove the need to manually tune the app for request load as a function of memory consumption.
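A minimal sketch of that kind of handler, assuming some registration hook along the lines of the set_oom_handler idea above; the reserve buffer and the counter are invented for illustration:

```
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

// Invented names; not a real API.
static RAINY_DAY: Mutex<Vec<u8>> = Mutex::new(Vec::new());
static MAX_INFLIGHT_REQUESTS: AtomicUsize = AtomicUsize::new(1024);

/// Grab a chunk of memory at startup so there is something to give back later.
fn init_reserve(bytes: usize) {
    *RAINY_DAY.lock().unwrap() = vec![0u8; bytes];
}

/// What a registered OOM handler could do: shed load, signal back-pressure,
/// and release the reserve so the in-flight request can still succeed.
fn on_oom() {
    MAX_INFLIGHT_REQUESTS.fetch_sub(1, Ordering::SeqCst);
    // (send back-pressure / notify monitoring here)
    let mut reserve = RAINY_DAY.lock().unwrap();
    reserve.clear();
    reserve.shrink_to_fit(); // actually hand the memory back to the allocator
}

fn main() {
    init_reserve(8 * 1024 * 1024); // e.g. an 8 MiB cushion
    // set_oom_handler(on_oom);    // hypothetical registration, per the proposal above
    let _ = on_oom; // silence unused warnings in this standalone sketch
}
```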

Thanks @Gankro for your work on this RFC.

@whitequark

Member

commented Aug 19, 2017

@Gankro I have mixed feelings about this RFC. I think that, as a design, it provides me with the ability to write a VecAllocExt trait that gives me try_push and friends, and it would be ergonomic enough to provide a short path to eventual stability; these are the parts that I like. It's useful! But also, it's very imperfect, and (without going into details) I would like to see most of the internals completely redone before it's stabilized.

Still, it's much better than the current situation, and it doesn't have any drawbacks I can see as long as the guts are unstable (taking into account that application/firmware code will be written against this), so I'm in favor of merging this even as-is.

Thanks for writing this!

@mark-i-m

Contributor

commented Aug 20, 2017

Perhaps this could be an experimental RFC until we get experience with what doesn't work so well? That seems to have worked well in the past for the allocator interfaces...

@gnzlbg

Contributor

commented Aug 21, 2017

Given the constraints, I'd say this is good enough.

I have only one question: what are the semantics of try_reserve on a HashMap? That is, if I want to insert an element into a HashMap and I do something like hash_map.try_reserve(hash_map.len() + 1), am I guaranteed that on insertion the HashMap won't try to allocate?
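For concreteness, the pattern being asked about looks roughly like this; a sketch only, shown with the TryReserveError name that this error type eventually stabilized under (the RFC calls it CollectionAllocErr):

```
use std::collections::{HashMap, TryReserveError};

// Reserve fallibly, then insert. Whether the insert is then guaranteed not
// to allocate is exactly the question raised above.
fn insert_fallibly(
    map: &mut HashMap<String, u32>,
    key: String,
    value: u32,
) -> Result<(), TryReserveError> {
    map.try_reserve(1)?; // room for one more entry; OOM comes back as Err
    map.insert(key, value);
    Ok(())
}

fn main() {
    let mut map = HashMap::new();
    insert_fallibly(&mut map, "answer".to_string(), 42).expect("allocation failed");
    assert_eq!(map["answer"], 42);
}
```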

@aturon aturon self-assigned this Aug 22, 2017

@arthurprs


commented Aug 23, 2017

@gnzlbg that's an implementation detail. The stdlib version, for example, is able to guarantee room for any combination of items in advance. Some variants, though, can't 100% guarantee room without knowing the dataset. Hopscotch and cuckoo hash tables come to mind as examples.

@gnzlbg

Contributor

commented Aug 23, 2017

@arthurprs IIUC the whole point of try_reserve is to let you use std::collections without panic!ing on OOM, with the guarantee that "if try_reserve(N) succeeds, the collection can grow up to size N without allocating new memory", so that OOMs cannot happen. Avoiding panics is neither easy, nor ergonomic, nor reliable, but if you are careful, it is doable.

Without this guarantee, one cannot avoid OOM at all, so why would I actually call try_reserve instead of reserve if I can get an OOM panic! anyway? E.g. try_reserve(N) succeeds, ok, now what? I can still get an OOM panic. Is there anything useful that I can do with this information if I cannot use it to avoid OOM panics?

So... I must be missing something, because as I understand it, this guarantee is not an implementation detail of try_reserve, but its reason for existing. Without this guarantee, I really don't see how the method can be used to do anything useful (*).

(*) unless we add try_... variants of other insertion methods.

@arthurprs


commented Aug 23, 2017

I don't disagree with your overall idea, I'm just saying that it can't be done in ALL cases.

@gnzlbg

Contributor

commented Aug 23, 2017

I don't disagree with your overall idea, I'm just saying that it can't be done for ALL data structures.

@arthurprs This is why I was asking (and why I choose the HashMap as an example). I agree with you on this. This cannot be guaranteed for all collections.

@eternaleye


commented Jan 5, 2018

@comex:

Assuming you're talking about Android, doesn't that have overcommit enabled by default?

Yes, but I was pointing out that the performance degradation warning sign would be absent. Instead, this is a system where the race condition I described instantly results in data loss.

@stepancheg


commented Jan 5, 2018

@comex

how about your favorite OS kernel?
Even on Linux, kernel code is expected to gracefully handle out-of-memory conditions… at least some of the time

For “some” allocations, calling the alloc crate explicitly and/or implementing a “FallibleVec” outside of the stdlib would probably be enough.

@whitequark

Member

commented Jan 5, 2018

@eternaleye

Hard real-time systems usually don't use reference-counting, because deallocating data can cause an unbounded delay as anything it referred to is possibly deallocated

That's actually not entirely true. So long as there are no loops where a type T stores an Rc<T> directly or indirectly, the amount of elementary deallocation operations is bounded by the depth of the tree rooted in Rc<T>. In practice, often T would not store any pointers, e.g. consider network buffers that are reference-counted. (You can see that in lwip.) You would also need to use pool allocators to avoid the overhead of merging and splitting free blocks, as opposed to free-list allocators or something similar, in order to have the delay also bounded in time.

In other words, an Rc::try_new() that uses a pool allocator is a perfectly reasonable thing to have on a hard real-time system and there are indeed examples of it today in wide use.
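To illustrate the bounded-work point with a toy example (nothing here is taken from lwip; it is just the shape of the argument):

```
use std::rc::Rc;

// A reference-counted network-style buffer that stores no further Rc
// pointers, so dropping the last handle does a fixed amount of work:
// one refcount decrement plus one deallocation.
struct PacketBuf {
    bytes: [u8; 1536],
}

fn main() {
    let first = Rc::new(PacketBuf { bytes: [0u8; 1536] });
    let second = Rc::clone(&first); // refcount bump, no allocation
    assert_eq!(second.bytes.len(), 1536);
    drop(first);  // refcount decrement only
    drop(second); // last owner: one bounded deallocation
}
```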

@whitequark

Member

commented Jan 5, 2018

@stepancheg

For “some” allocations calling alloc crate explicitly and/or implementing “FallibleVec” outside of stdlib would probably be enough.

And why exactly is this better than having try_reserve? You've just split the crates that work with collections in two incompatible universes, in exchange for... what? A minor ideological point?

@Ericson2314

Contributor

commented Jan 5, 2018

The try_reserve approach has dead branches that panic, versus no panics at all: a minor disadvantage. But we don't have to pay the price of a second crate and a duplicated implementation to have fallible collections!

@stepancheg


commented Jan 5, 2018

@whitequark

And why exactly is this better than having try_reserve? You've just split the crates that work with collections in two incompatible universes, in exchange for... what? A minor ideological point?

I think it's too expensive to duplicate all of, or a large part of, the Rust stdlib with try_ functions. try_reserve on a vector is not enough; you also need:

  • Rc::try_new
  • String::try_push
  • BufWriter::try_new
  • ToString::try_to_string
  • Mutex::try_new
  • Channel::try_enqueue_with_oom
  • thread-local get, and so on.

Even worse, as try_ functions are not required to be called, it would be hard to know whether some particular library (or just a module inside a program) doesn't accidentally use a function which panics/crashes on OOM instead of returning Result.

(And I agree with @gnzlbg that the RFC should at least be split into a part which panics on OOM and a part which adds try_ functions.)

About "minor ideological point". I'd like to insert famous quote: "Simple things should be simple, complex things should be possible."

Most programs (and libraries) won't ever need or support fallible allocations; crashing on OOM is the best strategy for them, and having to deal with fallible allocations would be too much of a burden.

And for the specific situations when you need to gracefully handle OOM, a low-level API to allocate memory should be enough, and kernel-like developers could create specific libraries to use that API.

@Ericson2314

Contributor

commented Jan 5, 2018

And I agree with @gnzlbg that the RFC should at least be split into a part which panics on OOM and a part which adds try_ functions

This is reasonable. In general, I disagree vehemently with catching unwinding as an error handling strategy, but it's silly that OOM is somehow a different type of failure. It's even sillier that currently we also abort on invalid layout because of the laxity of Alloc::oom's type. Invalid use of the allocator API has nothing to do with OOM.

I think it's too expensive to duplicate all of, or a large part of, the Rust stdlib with try_ functions

You're arguing against your own point? It is now demonstrated that this is not expensive at all: eddyb/rust@future-box...QuiltOS:allocator-error

Even worse, as try_ functions are not required to be called, it would be hard to know whether some particular library (or just a module inside a program) doesn't accidentally use a function which panics/crashes on OOM instead of returning Result.

Totally agreed! This is why I use the allocator type to enforce that this won't happen. (I do want to add a nicer way to zero-cost cast between the A and AbortAlloc<A> variants, but by virtue of it still being an extra method call, accidental usage is far less likely.)
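A rough sketch of that idea, using a simplified stand-in trait rather than the real (then-unstable) Alloc trait; only the shape matters here:

```
use core::convert::Infallible;

/// Simplified stand-in for a fallible allocator interface (not std's Alloc).
pub trait FallibleAlloc {
    type Err;
    fn alloc_bytes(&mut self, size: usize) -> Result<*mut u8, Self::Err>;
}

/// Wrapper whose error type can never be constructed: code written against
/// `AbortAlloc<A>` cannot observe OOM as a `Result`, because the wrapper
/// aborts instead of returning an error.
pub struct AbortAlloc<A>(pub A);

impl<A: FallibleAlloc> FallibleAlloc for AbortAlloc<A> {
    type Err = Infallible; // stable stand-in for `!`
    fn alloc_bytes(&mut self, size: usize) -> Result<*mut u8, Self::Err> {
        match self.0.alloc_bytes(size) {
            Ok(ptr) => Ok(ptr),
            Err(_) => std::process::abort(),
        }
    }
}
```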

@whitequark

Member

commented Jan 5, 2018

Even worse, as try_ functions are not required to be called, it would be hard to know whether some particular library (or just a module inside a program) doesn't accidentally use a function which panics/crashes on OOM instead of returning Result.

No, this is solved by having a lint, and then using #[deny(infallible_allocation)] (until collections are parameterized with allocators) or by having an allocator type with an associated error type specified as ! (once collections are parameterized with allocators, as @Ericson2314 correctly suggests).

Most programs (and libraries) won't ever need or support fallible allocations

Every #![no_std] library that uses liballoc today can and should support fallible allocations to allow using it in embedded contexts. If implemented as you propose, all these libraries will have to migrate to FallibleVec for this to happen, which means that using them in hosted contexts with panic-on-OOM becomes unnecessarily unwieldy.

And for the specific situations when you need to gracefully handle OOM, a low-level API to allocate memory should be enough, and kernel-like developers could create specific libraries to use that API.

You are not a developer working in an embedded, RTOS, or OS kernel context. Please stop talking for us, because you do not have any knowledge or understanding of what needs we have by your very own admission, and you have shown a complete lack of empathy or interest in understanding use cases other than "hosted Linux with overcommit".

@Manishearth

Member

commented Jan 5, 2018

Mod note: Please calm down, everyone. Phrases like "Instead of bragging" are not constructive.

In general, talk about proposals, not people. It helps nobody to nitpick on who has the authority to talk about what. Instead, make your point, and explain the context as to why it is relevant. If someone makes a point that is inapplicable to a kind of system, talk about why it is inapplicable; don't focus on the credentials of the people involved.

@rust-lang rust-lang deleted a comment from stepancheg Jan 5, 2018

@rust-lang rust-lang deleted a comment from whitequark Jan 5, 2018

@mbrubeck

Contributor

commented Jan 5, 2018

further mod note: Two comments deleted. If you have complaints about how someone is engaging in a discussion, please talk to the mods and we can address it privately, rather than bringing them into the thread itself.

@aturon

Member

commented Feb 1, 2018

Today I finally girded myself to wade back into this thread :-)

To be honest, I think the key problem with this RFC is its title. Fallible allocation as a general topic has a huge set of stakeholders with divergent needs, and this RFC makes very clear that it is not trying to solve the general problem -- but its title perhaps suggests otherwise.

The fact of the matter is that try_reserve solves some problems for some people. This is indisputable, and should not be litigated further on this thread.

However, I think part of the frustration on the thread is that others feel they can see their way to a fully general solution that obviates the need for try_reserve, and serves a larger set of use-cases.

This is a classic situation in the Rust world. The danger, though, is that by always looking to the "more perfect" solution we never ship.

The classic way we handle this in Rust is by considering "forward compatibility". In this case: to what extent does try_reserve preclude "more perfect" solutions in the future? And the answer is clearly "it does not". At worst, we may eventually want to deprecate it in favor of something better.

Now, another concern is that try_reserve is the tip of the iceberg, and soon we will have try_XXX functions cropping up all over std. I agree that this would be an unfortunate outcome, but that goes back to the goals of this RFC. If you see the proposal here as specifically handling a sort of "best effort" situation for particular applications, we really only need this for a couple of core data structures (as proposed). @Gankro took great pains to limit the amount of API bloat needed.

By all means, let's discuss a more perfect solution-- on a separate RFC proposing it in detail. But in the meantime, let's ship a useful iteration that is forward-compatible, with the shared understanding that it's an imperfect approach that we don't want to apply universally across std.

@gnzlbg

Contributor

commented Feb 2, 2018

@aturon

By all means, let's discuss a more perfect solution-- on a separate RFC proposing it in detail. But in the meantime, let's ship a useful iteration that is forward-compatible, with the shared understanding that it's an imperfect approach that we don't want to apply universally across std.

We agreed before that for try_push to be useful it would need to correctly report error on OOM and for try_reserve to be useful it would need to guarantee that subsequent calls to push that do not increase the capacity cannot fail. You wrote:

all of the collections try_reserve would be added to support a strong guarantee for its behavior

But this is not true. Vec has no control over this at all: only Vec's allocator has a say on whether this can or can't work.

In my opinion, the best platform-agnostic guarantees that we can provide for Vec::try_{push,reserve} is something like this:

  • try_reserve: iff Vec's allocator commits memory on allocation then a successful try_reserve guarantees that subsequent calls to push that do not increase the Vec's capacity cannot fail; otherwise, the behavior of try_reserve is undefined.

  • try_push: iff Vec's allocator commits memory on allocation then try_push returns error if the allocation fails; otherwise, the behavior of try_push is undefined.

In particular, given that Vec cannot query whether its allocator overcommits memory or not (this is a problem that we might be able to fix though), these functions might need to be unsafe.

Also, revisiting the systems on which the System allocator guarantees that these functions work, I can only find one answer: Windows. On Linux, MacOSX, and *BSD, they will generally not work unless the user changes the system's default settings to disable overcommit (a terrible idea) or uses a Linux/*BSD distro tailored for being used with overcommit disabled.

On embedded and Linux/MacOSX/*BSDs the user can provide allocators that go both ways. As @whitequark correctly points out, in embedded it makes little sense in general for users to provide allocators that overcommit, but it is still something that can be done.

I've repeated many times that we should split try_xxx into its own RFC so that we can make progress on it without delaying progress on oom_panic, but since it seems that it is "all or nothing", in my opinion the quickest ways to stabilize this are:

    1. make the Vec::try_xxx methods unsafe (we can always remove the unsafe keyword later in a backwards compatible way)
    2. expose them safely only behind a feature flag in the standard library that is enabled by default on Windows and disabled otherwise (this allows users using xargo on embedded to enable these as well).
    3. make them safe, but add an unstable method to the Allocator trait that returns true if the allocator overcommits, and make the Vec::try_xxx methods unconditionally panic if this method returns true. We could set this method to return true by default and then on Windows make it return false (no overcommit). Embedded users writing their own allocators can specify whether their allocator overcommits or not. This can be useful for those on MacOSX/Linux/*BSD who write their own allocator that does not overcommit as well.
    4. have a NonOvercommitingAlloc (for lack of a better name) trait that refines Alloc, and implement these methods on Vec for NonOvercommitingAllocators only. That way Windows and embedded users can mark their allocators as being NonOvercommitingAlloc by just adding an impl.

I don't know. Hopefully others have better ideas but 3 and 4 don't look that bad to me.
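A bare-bones sketch of option 4, again with a simplified stand-in trait instead of the real Alloc trait (all names here are placeholders):

```
use core::marker::PhantomData;

/// Simplified stand-in for a fallible allocator interface (not std's Alloc).
pub trait FallibleAlloc {
    type Err;
    fn alloc_bytes(&mut self, size: usize) -> Result<*mut u8, Self::Err>;
}

/// Marker trait for allocators that commit memory at allocation time,
/// i.e. a successful allocation will not later fault on first touch.
pub trait NonOvercommitingAlloc: FallibleAlloc {}

/// A collection generic over its allocator could then offer try_reserve's
/// strong guarantee only where that guarantee can actually hold.
pub struct SketchVec<T, A: FallibleAlloc> {
    alloc: A,
    _marker: PhantomData<T>,
}

impl<T, A: NonOvercommitingAlloc> SketchVec<T, A> {
    pub fn try_reserve(&mut self, additional: usize) -> Result<(), A::Err> {
        // Real growth logic omitted; the point is that this method only
        // exists when the allocator is known not to overcommit.
        let _ = (additional, &mut self.alloc);
        Ok(())
    }
}
```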

@Manishearth

Member

commented Feb 2, 2018

otherwise, the behavior of try_reserve is undefined.

This seems extreme. Adding UB to this case would not be helpful for the use cases mentioned in the original RFC. Firefox, for example, is okay with not being notified of OOMs that were delayed by overcommit, but wants to catch as many OOMs as possible.

Like, 32 bit systems exist, and even with overcommit there can be OOM-on-allocation on those pretty easily.

Hobbling an API completely just because it isn't perfect for some platforms seems extreme to me.

Rather, it makes more sense to define it as "try_reserve is defined to reserve X memory. If the OS commits memory on allocation, try_reserve is guaranteed to produce an error when it is unable to allocate the memory. If not, it is guaranteed to produce an error in case the allocation operation somehow failed, but may succeed when there isn't actually enough memory to satisfy it"

@gnzlbg

Contributor

commented Feb 2, 2018

@Manishearth makes sense, what about:

  • try_reserve: iff Vec's allocator commits memory on allocation, then a successful try_reserve guarantees that subsequent calls to push that do not increase the Vec's capacity cannot fail. Otherwise, try_reserve might fail or succeed; if it succeeds, subsequent calls to push and try_push are not guaranteed to succeed.

Do you know how to remove the undefined behavior for try_push? I think we actually should add it to push as well. Without this, the API is still pretty broken, because the only reason to call try_reserve is to insert something into the Vec afterwards :/

@Manishearth

Member

commented Feb 2, 2018

(We discussed this in IRC, @gnzlbg was misusing "undefined behavior")

Reasoning about the OOM killer is out of scope for Rust's safety model; the OOM killer can kill a Rust program even when the Rust program wasn't allocating.

One could call this "implementation-defined behavior" but it's not even that; the OOM killer is out of scope for Rust, as is the robustness of Rust programs in presence of an attached debugger mutating the process, or /proc/self/mem, or kill -9.

@aturon

Member

commented Feb 2, 2018

@gnzlbg Quick clarification: the RFC does not propose to add try_push to std, it only notes that you can define it externally.

The specification you and @Manishearth are hashing out seems just fine to me; this entire enterprise is about a "best effort" API anyway.

@Ericson2314

Contributor

commented Feb 2, 2018

@aturon I totally agree with the principle, but two things. First, the "more perfect solution"'s difficulty is vastly overestimated. As soon as someone fixes the windows errors in rust-lang/rust#47043 I am confident I can make most collections allocator- and fallibility- polymorphic in 2 weeks, tops, seeing that I already did a couple in 2-3 days in https://github.com/QuiltOS/rust/commits/allocator-error.

Second, at the very least, try_reserve should be deprecated as soon as a better solution is available. It's the equivalent of if and unwrap vs. pattern matching for allocation: unergonomic and unreasonably difficult to write correct code with, yet easy to understand and so unreasonably attractive to newcomers. So yes, don't let the perfect be the enemy of the good, but also keep in mind that inferior APIs, by their mere existence, can harm pedagogy and ergonomics.

@aturon aturon merged commit 2b1d50f into rust-lang:master Feb 7, 2018

@aturon

Member

commented Feb 7, 2018

This RFC has been merged; the tracking issue is here. See the summary here.
