Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safe uninitialized byte arrays. #1222

Closed
wants to merge 2 commits into from

Conversation

Projects
None yet
5 participants
@Stebalien
Copy link
Contributor

Stebalien commented Jul 21, 2015

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Jul 21, 2015

Using uninitialized memory is undefined behavior in terms of the optimization passes in LLVM. For example:

pub fn byte() -> u8 {                             
    unsafe {                                      
        let mut v = Vec::<u8>::with_capacity(100);
        v.set_len(100);                           
        v[10]                                     
    }                                             
}                                                 

Will generate this IR:

define i8 @_ZN4byte20hcb9dde8b210369c6eaaE() unnamed_addr #0 {
"_ZN31collections..vec..Vec$LT$u8$GT$9drop.115617hdcd6a559cef13decE.exit3":
  ret i8 undef
}

This undef can leak into other functions it's inlined into, possibly eliminating bounds checks or other various usages which can lead to undefined behavior.

I think it's possible to get around this by forcing LLVM to consider the memory initialized, but it's unfortunately not as simple as "just making it safe"


Every addressable byte in allocated memory is a valid u8 (byte) by definition.
On Linux at least, one can read `/proc/self/mem` into a buffer so the following
two functions are (virtually) indistinguishable at runtime (on Linux):

This comment has been minimized.

@eddyb

eddyb Jul 21, 2015

Member

Except for the fact that /proc is not necessarily mounted, e.g. a strict chroot might opt out of it.

This comment has been minimized.

@Stebalien

Stebalien Jul 21, 2015

Author Contributor

Good point. But my point was that uninitialized bytes are still bytes.

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Jul 21, 2015

I don't understand the "fast IO code" argument. In what cases do you have to expose uninitialized memory for efficiency?

@Stebalien

This comment has been minimized.

Copy link
Contributor Author

Stebalien commented Jul 21, 2015

@huonw

This comment has been minimized.

Copy link
Member

huonw commented Jul 21, 2015

@eddyb, also, there's been a few discussion threads about it recently, e.g. https://users.rust-lang.org/t/reading-from-stdin-performance/2025 (and maybe some on reddit too).

@Stebalien

This comment has been minimized.

Copy link
Contributor Author

Stebalien commented Jul 21, 2015

@alexcrichton I updated the RFC to introduce an intrinsic for asking LLVM to treat some memory as initialized. Unfortunately, I can't find LLVM documentation explaining how to do this.

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Jul 21, 2015

Oh, I see, the read method takes &mut [u8] instead of some kind of fixed-capacity buffer.
I remember some discussion about zeroing back when the new I/O traits were introduced, but no mentions of how the read method had a suboptimal signature.

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Jul 21, 2015

@Stebalien You don't need an intrinsic, the inline assembly in test::black_box should work just fine: it tells LLVM that anything could happen, including initialization of that memory.

@Stebalien

This comment has been minimized.

Copy link
Contributor Author

Stebalien commented Jul 21, 2015

@Gankro

This comment has been minimized.

Copy link
Contributor

Gankro commented Jul 21, 2015

IIRC the I/O design discussion featured this problem as a security issue: if you don't zero bytes, then you can leak arbitrary things like private keys that were previously allocated. As such, even if this intrinsic was introduced, it's not clear to me that the standard library would be able to leverage it.

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Jul 21, 2015

@Stebalien no, that's for lifetimes of allocas within a function, for stack allocation.

@Stebalien

This comment has been minimized.

Copy link
Contributor Author

Stebalien commented Jul 22, 2015

IIRC the I/O design discussion featured this problem as a security issue: if you don't zero bytes, then you can leak arbitrary things like private keys that were previously allocated. As such, even if this intrinsic was introduced, it's not clear to me that the standard library would be able to leverage it.

@Gankro, I see. I remember seeing some security argument but I thought it was about reading one's own uninitialized memory (which didn't make sense), not accidentally leaking it. I agree, that's a very good reason not to allow this.

@Stebalien Stebalien closed this Jul 22, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.