Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upRFC: Never allow reads from uninitialized memory in safe Rust #837
Conversation
This comment has been minimized.
This comment has been minimized.
aturon
referenced this pull request
Feb 13, 2015
Closed
Safe methods in std::io allow reading uninit memory #20314
aturon
self-assigned this
Feb 13, 2015
quantheory
reviewed
Feb 13, 2015
| will serve to set an explicit policy that: | ||
|
|
||
| **Uninitialized memory can ever be exposed in safe Rust, even when it | ||
| would not lead to undefined behavior**. |
This comment has been minimized.
This comment has been minimized.
quantheory
reviewed
Feb 13, 2015
| # Summary | ||
|
|
||
| Set an explicit policy that uninitialized memory can ever be exposed | ||
| in safe Rust, even when it would not lead to undefined behavior. |
This comment has been minimized.
This comment has been minimized.
quantheory
Feb 13, 2015
Contributor
I noticed the one below first, but this also should say "can never".
This comment has been minimized.
This comment has been minimized.
|
@quantheory Shortest "Detailed Design" ever and I still managed to insert crucial typos! |
quantheory
reviewed
Feb 13, 2015
| memory and typesafe, but they carry security risks. | ||
|
|
||
| In particular, it may be possible to exploit a bug in safe Rust code | ||
| that leads that code to reveal the contents of memory. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
I'm pretty good at making typos myself! I do want to take a crack at copyediting all the RFCS one of these days... Regarding the proposal itself, |
This comment has been minimized.
This comment has been minimized.
|
So, what would this RFC mean in practical terms? Sounds like |
This comment has been minimized.
This comment has been minimized.
|
Yes, and (b) in particular means that we would have to zero out memory or something similar in the implementation of |
This comment has been minimized.
This comment has been minimized.
|
|
This comment has been minimized.
This comment has been minimized.
vadimcn
commented on text/0000-uninit-memory-policy.md in 229bca6
Feb 14, 2015
|
Would be nice to spell out exactly what constitutes "known" and "unknown" safe code. |
This comment has been minimized.
This comment has been minimized.
|
Given that there are really not too many valid uses for reading uninitialized memory even in unsafe code, this seems like a welcome change for consistency. |
This comment has been minimized.
This comment has been minimized.
|
I'm personally not quite in favor of this proposal. I think that it's a fine line in how we leverage Taken to an extreme, I would find it quite unfortunate if the "I/O rule of thumb" was to use Put another way, the motivation here seems to solely be "uninitialized memory is a security risk," but I think there is a much broader issues about the role of |
This comment has been minimized.
This comment has been minimized.
|
I think that there's a problem of semantics here. Most discussions of "memory safety" that I can recall consider reads of uninitialized data to be unsafe by definition, because of issues like:
The usual safety guarantees in Rust protect you from problem 1. They don't protect you from problem 2; an uninitialized If you define memory safety as meaning only that you avoid undefined behavior in the C/C++ sense, then you can get away with some reads of uninitialized data in safe code. But it's tricky business, and a lot of people would still not consider that to be consistent with what they expect "memory safe" code to allow. |
This comment has been minimized.
This comment has been minimized.
Note that the motivation for this RFC is clear in that this is not a question of memory safety. The only cases being considered are uninitialized arrays of scalars which cannot result in memory safety violations if read. In this sense points 1/2 are somewhat moot as the "objects" are scalars which basically can't be corrupt. Points 3/4/5 do not actually affect memory safety but play into unsafety when related to security. |
This comment has been minimized.
This comment has been minimized.
Absolutely. Which is exactly why I felt this topic merited an RFC, since this is a gray area for which there's not complete agreement, and we need to make a clear policy decision.
That does sound bad! But I'll note that if this policy turns out to be too expensive in practice -- that is, if people end up reaching for
I understand where you're coming from here, but as a rule, I generally prefer to avoid arguments from extremes/slippery slope arguments. We as a community get to decide exactly what safety covers, and thanks to the RFC process we can ensure that changes to this policy are deliberate and relatively clear. I feel like this is a way of framing the issue in much more extreme terms than the RFC is actually talking about. Yes, there are many other security issues, but this RFC is about a specific policy decision with concrete, quantifiable tradeoffs. In the case of IO -- which seems to be the main culprit at the moment -- we can and should measure the performance impact here. But as we're seeing with hashing, it's possible to provide some localized mitigation against security/DoS risks based on a specific cost/benefit analysis. If the local downsides are not too great, why not take steps where we can to be even more ambitious with the problems that Rust helps you catch? I know that in general we've been moving away from opinionation (green threading, high-level IO abstractions, etc), but we shouldn't go into full retreat! Especially in cases like this, and hashing, where there are clear and relatively simple ways to opt out of the guarantee for performance-critical scenarios, without paying any extra performance costs. |
This comment has been minimized.
This comment has been minimized.
That is what I was getting at. If I wasn't clear, I think that we are in perfect agreement about all of the concrete technical details and what the RFC is intended to do. (Though I'd add a nitpick that "scalar" is not entirely the right word; references/pointers, and in some languages strings are considered scalars, after all.)
I think we've miscommunicated. My point is that this is true for some definitions of "memory safety", and trivially false for others, and there is (to my knowledge) no authoritative source for what the "true" definition is. A lot of people would consider a program to be memory-unsafe any time it has a read of uninitialized data which causes non-deterministic behavior, even if undefined behavior is not possible. (See for instance this blog post bemoaning the fact that the definition of memory safety really is not universal and black-and-white. Some of its sources consider reads of uninitialized to be memory errors, while others don't, though unfortunately I think some of the academic stuff is paywalled. Note also that many developer groups de facto consider anything flagged by valgrind's Memcheck or similar tools to be memory-unsafe, except for memory leaks and edge cases not covered by the tool. Typically this includes use of uninitialized data.) |
This comment has been minimized.
This comment has been minimized.
|
I don't think this RFC is about the definition of "memory safety". I think it's about what developers expect from a "safe" language. This is a fuzzy notion, driven by humans rather than computer science, but it's a vital one. In C nobody would bat an eye at this design. But Rust does the safe thing 99.8% of the time. |
This comment has been minimized.
This comment has been minimized.
If that is the case then you might as well not zero to begin with. People who want it zeroed for security reasons cannot use it because it might change at any time unless it is documented. |
This comment has been minimized.
This comment has been minimized.
I wasn't talking about leaving it undocumented; the RFC is (I think) quite clearly setting an explicit policy. I'm just saying that, at some later point, it may be possible to re-evaluate and change this policy -- but we would presumably only do so if there was a strong consensus that the security benefits were minimal and performance drawbacks onerous. Regardless, I don't think this particular point has much bearing on the debate; it was responding to a hypothetical doomsday scenario. The question is, what is the best policy decision we can make given what we know today? |
This comment has been minimized.
This comment has been minimized.
If read_to_end is documented to zero then this cannot be changed post 1.0. |
This comment has been minimized.
This comment has been minimized.
I see what you're getting at, but I don't think we should document it as literally zeroing. Rather, this RFC would establish a global policy about uninitialized memory and safe code. That is, no code would be allowed to rely (in the spec sense) on the values being literally zeros. |
This comment has been minimized.
This comment has been minimized.
|
@alexcrichton BTW, I seem to remember you taking some measurements about the perf hit for IO. Any chance you could dig those up? |
This comment has been minimized.
This comment has been minimized.
|
Consider the following rule:
I thought this was what this RFC was trying to accomplish and the way it talks about read_to_end it seems that way. But what this RFC really says is that the writer of an application cannot even directly pass uninitialized memory to Furthermore: If this is the rule now then it cannot be changed after 1.0 (and maybe not even in 2.0.) There must be no confusion about this: You're still talking like there is an option here ("perhaps so", "it would be difficult") but there is no such option. If this is part of the 1.0 manual then it will never ever be changed and you cannot make with_extra safe either. |
This comment has been minimized.
This comment has been minimized.
|
There's several sorts of "undefined behaviour", i.e. things that are assumed to never happen:
The section of the manual is focusing on 3, so the title makes perfect sense even if not everything in it falls into 1 or 2. This RFC is effectively making reading from uninit memory undefined behaviour in Rust, which is (somewhat) orthogonal to whether it is UB in LLVM/C/C++. |
This comment has been minimized.
This comment has been minimized.
|
(I think the RFC could probably clarify its use of terminology to disambiguate.) |
This comment has been minimized.
This comment has been minimized.
|
I don't believe there is a need for such a distinction. |
This comment has been minimized.
This comment has been minimized.
|
It doesn't make sense for undefined behaviour in Rust to be directly tied to the back-end the main implementation happens to be using now. |
This comment has been minimized.
This comment has been minimized.
|
That may be so but the current set of undefined behaviors seems to be derived from things that cause UB in LLVM or are expected to cause UB in LLVM once better optimizations become available. Is there any justification at all for making this UB except to stop people from doing it? |
This comment has been minimized.
This comment has been minimized.
|
I retract my vote in favor of this RFC if we're actually treating reads of uninitialized memory as UB, not just "don't do this in safe code." There are legitimate reasons to do this in unsafe code. |
This comment has been minimized.
This comment has been minimized.
jmesmon
commented
Feb 18, 2015
|
@huonw : reading from undef or uninitialized memory is already undefined behavior. This rule goes further than limiting reading from undef: it eliminates our ability to pass undef outside of unsafe even if that undef is never read. |
This comment has been minimized.
This comment has been minimized.
|
META: I'm going to need to step away from this debate for the next couple of days to help with the alpha2 push. I do think that several good questions/concerns have been raised and that the text needs to be more clear. |
This comment has been minimized.
This comment has been minimized.
|
@pythonesque I tend to think that calling such reads undefined behavior may be too strong, but it's worth pointing out that some such cases are undefined behavior in C (and I have no doubt that LLVM will exploit that). See e.g. https://www.securecoding.cert.org/confluence/display/seccode/EXP33-C.+Do+not+read+uninitialized+memory. In any case, I'm curious to know what legit reason you have in mind for reading uninitialized memory? I think that at minimum it makes sense to say that the standard library will not expose uninitialized memory to its clients without an unsafe keyword being required somewhere (and moreover that all libraries are encouraged to follow a similar rule). The exact rules that should apply to third party code -- and the minimum rules required for unsafe code to be considered stable and well-defined -- seem like a somewhat separate (but entangled) topic. At least that's my current feeling. |
This comment has been minimized.
This comment has been minimized.
jmesmon
commented
Feb 18, 2015
|
If we consider accepting this RFC, we need a mechanism to maintain our ability to create efficient apis in a manner transparent to the user of the API. For the specific case of Read and read_to_end, we could probably add a MaybeUndef type that wraps the Frankly, we might need a more general way to indicate that arguments are intended to be used as |
This comment has been minimized.
This comment has been minimized.
jmesmon
commented
Feb 18, 2015
|
@nikomatsakis the problem is that this RFC isn't restricting "C/LLVM undef behavior", it's making it so that bugs in other code don't trigger "C/LLVM undef behavior". The RFC goes much further than forbidding the reading of uninitialized memory (which is already forbidden/undefined). |
This comment has been minimized.
This comment has been minimized.
|
Ah, I just remembered something we used to do in a past life that was a (perhaps?) legitimate use of uninitialized memory. The idea was you wanted to construct a set of integers of fixed-size domain but not pay Because you do no want to pay O(n) cost, you can't even take the time to zero out any memory. The way we did this was to have two arrays, both of the same length, neither initialized, and a counter with the number of entries. One array contains, for each possible value, the length of the set at the time it was added (its "index"). And the other contains, for each possible index, the value that was added at that time. This way you can "check-and-double-check" to find out if a value was indeed added without every initializing the full arrays. This does require reading uninitialized memory, though, because you must read from the struct IntSet {
len: usize,
capacity: usize,
values: Box<[usize]>,
indices: Box<[usize]>,
}
fn insert(&mut self, value: usize) {
assert!(value < self.capacity);
if self.contains(value) { return; }
let index = self.len;
self.len += 1;
self.indices[value] = index;
self.values[index] = value;
}
fn contains(&self, value: usize) -> bool {
let index = self.indices[value];
if index >= self.len { return false; }
self.values[index] == value
}Now, for all I know this violates some C rules. I haven't bothered to check of course. |
This comment has been minimized.
This comment has been minimized.
|
@nikomatsakis Yes, that is the example I was thinking of. |
This comment has been minimized.
This comment has been minimized.
|
Sorry I have been a little slow to respond to this thread everyone!
It looks like a TCP stream is slowed down by about 25%. Here I'm just writing tons of data on one thread to another over a TCP socket and printing out the amount of data received each second. The numbers at the bottom show the measurements for me at least. I wanted to get at least one "real-ish world" example (hence the TCP stream). For a more "raw benchmark" I wrote some benchmarks for just reading some repeated data with zeroing both before and afterwards. It looks like the data is roughly the same in that the performance hit is about 20%.
I think this is a good point, and I definitely agree. I think I just wanted to make sure that we were aware of the situation it may put us in. When it comes down to measurements, though, 20% isn't all that much...
Ah yes good point, I think I see where you're coming from now! |
This comment has been minimized.
This comment has been minimized.
eternaleye
commented
Feb 24, 2015
For a systems language where the standard (IO?) library is focused on "zero-cost abstractions" I rather disagree... In addition, I think @mahkoh's point about "relying on" this behavior is subtler than the point that was addressed in responses. Specifically, this proposal would incur costs without the benefits for security, aside from avoiding the true nasal demons of deeply undefined behavior on reading uninitialized memory.
Basically, I see this as well-intentioned but misguided - in the pursuit of stronger security guarantees in specification, it'll result in weaker security guarantees in practice. I'd suggest that straight |
This comment has been minimized.
This comment has been minimized.
I don't think there is anything subtle about it. It has all been spelled out in this issue but people keep ignoring or evading it.
People using unsafe will still pay this penalty. I've already said this above: There is no way to use the |
This comment has been minimized.
This comment has been minimized.
eternaleye
commented
Feb 24, 2015
|
@mahkoh, I'm not saying the point is particularly subtle - just that it's subtler than the one that the answers addressed. And yes, one of the places where it'd incentivize adding unsafe APIs is in the Read trait itself. |
This comment has been minimized.
This comment has been minimized.
l0kod
commented
Feb 24, 2015
|
This seems quite close to the RAII property.
This RFC should open the door for more general memory sanitization. cc rust-lang/rust#17046 |
This comment has been minimized.
This comment has been minimized.
|
This formulation is weak anyway as @aturon has later explained that it will be treated as UB. The real formulation should be
|
This comment has been minimized.
This comment has been minimized.
|
Just realized another funny thing: Since unsafe blocks end at safe function calls but not at unsafe function calls, if this proposal is accepted, you can never change unsafe functions to safe because some users might pass uninitialized memory to them. If you change them to safe, the behavior becomes undefined. |
This comment has been minimized.
This comment has been minimized.
|
Thanks everyone who has participated in this discussion so far. I'm reasonably convinced that the wording of the original RFC is problematic, and would like to take a step back and have a broader discussion about these issues as a community -- and hopefully hash out a policy that we can all agree on. To that end, I'm closing this RFC in favor of a new internals thread. I hope to hear from strong proponents on both sides. |
aturon
closed this
Feb 24, 2015
This comment has been minimized.
This comment has been minimized.
|
I have a few points to make First, I am somewhat on the fence about this RFC, but leaning towards a
Maintaining this invariant is possible in However, this RFC doesn't actually address rust-lang/rust#20314. The original issue says that So.... If I misunderstood the original issue, please forgive me, but this request seems very unreasonable to me. Virtually every single type in the rust ecosystem can expose uninitialized memory to the user given a bug in the implementation. And yes, this means that bugs in the implementation could be exploited by an attacker to access secrets in memory like passwords etc... but this is true for all code everywhere. So, to recap, this RFC does not require any zeroing out of memory in EDIT: I did in fact misunderstand the original issue. I have posted an update in the forum thread. |
This comment has been minimized.
This comment has been minimized.
reem
commented
Feb 25, 2015
|
The The flaw in this design is obvious if you try to generalize Reader and Writer to general types instead of just I think this special casing is actually incorrect, and just like with the placement new issues, we actually need more advanced language features to properly deal with this, particularly some form of An alternative fix/workaround for this specific case is for read to take a |
This comment has been minimized.
This comment has been minimized.
eternaleye
commented
Feb 25, 2015
|
@reem: Well, a "safe" option with a more forgiving contract might be to pass in an In order to be efficient, the callee trusts that there's at least some capacity reserved, and the caller trusts that But if the contract is violated, the cost is "Oh no, we had an annoyingly short outbuffer" or "Oh no, we allocated" rather than "Oh no, we read uninitialized memory." Of course, that doesn't permit windowing a larger buffer for multiple reads like Heck, if someone makes a |
This comment has been minimized.
This comment has been minimized.
jmesmon
commented
Feb 25, 2015
|
@eternaleye yes, type wrapping (in some yet to determned way) is probably a sane solution. That said: I don't think we need to mix Vec in here, as They key problem with the type wrapping is that there doesn't seem to be a good way to say for the caller that "if x, then the data I passed in will certainly now be initialized", and have the type system check it. |
aturon commentedFeb 13, 2015
Set an explicit policy that uninitialized memory can never be exposed
in safe Rust, even when it would not lead to undefined behavior.
See rust-lang/rust#20314.
Rendered