Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reading on uninitialized memory may cause UB #26

Open
JOE1994 opened this issue Jan 8, 2021 · 6 comments
Open

Reading on uninitialized memory may cause UB #26

JOE1994 opened this issue Jan 8, 2021 · 6 comments

Comments

@JOE1994
Copy link

JOE1994 commented Jan 8, 2021

Hello 🦀 ,
we (Rust group @sslab-gatech) found a memory-safety/soundness issue in this crate while scanning Rust code on crates.io for potential vulnerabilities.

Issue Description

  • metadata::read_vorbis_comment_block() method

claxon/src/metadata.rs

Lines 432 to 438 in 2f05385

let mut vendor_bytes = Vec::with_capacity(vendor_len as usize);
// We can safely set the lenght of the vector here; the uninitialized memory
// is not exposed. If `read_into` succeeds, it will have overwritten all
// bytes. If not, an error is returned and the memory is never exposed.
unsafe { vendor_bytes.set_len(vendor_len as usize); }
try!(input.read_into(&mut vendor_bytes));

claxon/src/metadata.rs

Lines 473 to 475 in 2f05385

let mut comment_bytes = Vec::with_capacity(comment_len as usize);
unsafe { comment_bytes.set_len(comment_len as usize); }
try!(input.read_into(&mut comment_bytes));

  • metadata::read_application_block() method

claxon/src/metadata.rs

Lines 544 to 546 in 2f05385

let mut data = Vec::with_capacity(length as usize - 4);
unsafe { data.set_len(length as usize - 4); }
try!(input.read_into(&mut data));

Methods metadata::read_vorbis_comment_block() & metadata::read_application_block() create an uninitialized buffer and passes it to user-provided ReadBytes implementation. This is unsound, because it allows safe Rust code to exhibit an undefined behavior (read from uninitialized memory).

This part from the Read trait documentation explains the issue:

It is your responsibility to make sure that buf is initialized before calling read. Calling read with an uninitialized buf (of the kind one obtains via MaybeUninit<T>) is not safe, and can lead to undefined behavior.

Suggested Fix

It is safe to zero-initialize the newly allocated u8 buffer before read_into(), in order to prevent user-provided Read from accessing old contents of the newly allocated heap memory.

Also, there are two nightly features for handling such cases.

Thank you for checking out this issue 👍

@memoryruins
Copy link

#19 was opened a few months ago with a proposed a fix for the related parts of code, but it hasn't seen much activity since.

@ruuda
Copy link
Owner

ruuda commented Jan 11, 2021

Thank you for the detailed write-up. I understand that a pathological ReadBytes implementation could inspect the buffer, and that LLVM might do bad things when it does. However, both of the provided ReadBytes implementations only call copy_from_slice on the buffer. They do not pass the buffer directly to io::Read::read. As far as I can tell, the usage by those two provided implementations is sound, but the problem is that a user-provided implementation of ReadBytes might observe the uninitialized memory. (But please explain if I am misunderstanding something.)

My first attempt to prevent this was to replace the with_capacity/set_length pairs with vec![0; length]. I’ve prepared a patch in 199958f that does this. To evaluate the performance impact of that I ran Musium in scanning mode. This code relies heavily on the metadata reading in Claxon, so it’s a good benchmark for this change. On my i7-6700HQ, the number of instructions executed goes up significantly by 4.8%, but there is no evidence that total running time is affected. This makes sense to me on a high-end superscalar CPU, but I fear that on other targets with simpler CPUs such as Raspberry Pi's (which I care about), the extra instructions do translate to extra running time. I haven’t measured that yet, but I would prefer a zero-overhead solution.

I think a zero-overhead solution is possible. All call sites that you pointed out have a length known ahead of time. They pass the uninitialized vector to read_into, but we could turn this around: make read_into allocate the vector. It would just be moving some lines around, so it shouldn’t affect the generated code too much, but it would prevent uninitialized buffers from being passed to user-provided ReadBytes implementations. I went ahead and implemented this in 18ae310, but I haven’t benchmarked it yet, I hope to get around to that soon.

Do you think that 18ae310 would be an adequate solution?

@JOE1994
Copy link
Author

JOE1994 commented Jan 19, 2021

@ruuda Thank you so much for your attention on this issue! 🦀
The fix you made in 18ae310 looks sound & reasonable to me 👍
I apologize for keeping you waiting..

@ruuda
Copy link
Owner

ruuda commented Feb 11, 2021

I wanted to go and benchmark this, but there is a complication: the branch I use in the application is a different one (that adds support for picture metadata), which mostly deletes ReadBytes in favor of just io::Read ... but io::Read can’t provide an implementation for read_vec that is both fast and sound, even though Cursor<[u8]> and BufReader can. I think it can be fixed with a new trait for read_vec, but it requires some larger changes.

@Shnatsel
Copy link
Contributor

Could a 5% performance hit be worth it in exchange for safety in the short term, until the proper solution that requires refactoring actually lands?

The reference FLAC implementation already provides everything but memory safety; I feel memory safety is the main selling point of Claxon, and it's not something that should be compromised on.

@zopsicle
Copy link

zopsicle commented Apr 23, 2023

but io::Read can’t provide an implementation for read_vec that is both fast and sound

This can be implemented as follows:

pub fn read_vec<R>(r: &mut R, onto: &mut Vec<u8>, len: usize) -> Result<usize>
    where R: Read
{
    onto.reserve(len);
    r.take(len as u64).read_to_end(onto)
}

This is obviously sound (no unsafe). read_to_end uses read_buf, so this is fast if the underlying reader implements read_buf to read into uninitialized memory, which is the case for fs::File, io::Take, BufReader, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants