Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zeroize: zeroize entire capacity of Vec #341

Merged
merged 3 commits into from Jan 30, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
1 change: 1 addition & 0 deletions AUTHORS.md
Expand Up @@ -15,5 +15,6 @@ contributions under the terms of the [Apache License, Version 2.0]
* David Tolnay ([@dtolnay](https://github.com/dtolnay))
* Kai Ren ([@tyranron](https://github.com/tyranron))
* Murarth ([@murarth](https://github.com/murarth))
* Niclas Schwarzlose ([@aticu](https://github.com/aticu))
* Yin Guanhao ([@sopium](https://github.com/sopium))
* Will Speak ([@iwillspeak](https://github.com/iwillspeak))
63 changes: 63 additions & 0 deletions zeroize/src/lib.rs
Expand Up @@ -325,6 +325,32 @@ where
/// previous reallocations did not leave values on the heap.
fn zeroize(&mut self) {
self.iter_mut().zeroize();

// Zero the capacity of the `Vec` that is not initialized.
{
// Safety:
//
// This is safe, because `Vec` never allocates more than `isize::MAX` bytes.
// This exact use case is even mentioned in the documentation of `pointer::add`.
let extra_capacity_start = unsafe { self.as_mut_ptr().add(self.len()) as *mut u8 };
let extra_capacity_len = self.capacity().saturating_sub(self.len());

for i in 0..(extra_capacity_len * core::mem::size_of::<Z>()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a simpler option than casting to a *mut u8, which is using mem::zeroed to produce the all-zero byte pattern for Z, and ptr::write_volatile to write it, e.g.

for i in 0..extra_capacity_len {
    unsafe { ptr::write_volatile(extra_capacity_start.add(i), mem::zeroed()); }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that, but it would result in undefined behavior, when Z is not a type where an all-zero pattern is a valid value. However now that I think about it again, it wouldn't make much sense to implement Zeroize for such a type. I'll adjust that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually on second thought, if it were implemented that way, it would be possible to invoke undefined behavior from safe code. For example this would then result in undefined behavior:

use core::num::NonZeroU8;
use zeroize::Zeroize;

struct NonZero(NonZeroU8);

impl Zeroize for NonZero {
    fn zeroize(&mut self) {
        self.0 = NonZeroU8::new(1).unwrap();
    }
}

fn main() {
    let mut vec = vec![NonZero(NonZeroU8::new(2).unwrap())];
    vec.clear();
    // undefined behavior: this would create a `NonZeroU8` with a
    // memory-representation of all-zeroes while zeroing the
    // uninitialized memory
    vec.zeroize();
}

This may not be a very useful implementation of Zeroize, but it would be a way to invoke undefined behavior in safe code. The other version does not have this problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would the uninitialized memory be exposed to safe code?

Regardless, the net effect is the same: you are writing zeroed bytes to the excess capacity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would the uninitialized memory be exposed to safe code?

It doesn't need to. According to the doc of mem::zeroed:

There is no guarantee that an all-zero byte-pattern represents a valid value of some type T. For example, the all-zero byte-pattern is not a valid value for reference types (&T and &mut T). Using zeroed on such types causes immediate undefined behavior because the Rust compiler assumes that there always is a valid value in a variable it considers initialized.

Which links to a section in the docs about undefined behavior:

It is the programmer's responsibility when writing unsafe code to ensure that any safe code interacting with the unsafe code cannot trigger these behaviors. unsafe code that satisfies this property for any safe client is called sound; if unsafe code can be misused by safe code to exhibit undefined behavior, it is unsound.

The same page also states that

Rust code is incorrect if it exhibits any of the behaviors in the following list.

  • [...]
  • Invalid values for a type with a custom definition of invalid values. In the standard library, this affects NonNull<T> and NonZero*.

The example above would therefore be both unsound and incorrect, which is possible from safe code. The values do not need to be accessible by safe code to create the problem here.

With the method which zeroes the memory using u8 this would not be the case, as u8 is guaranteed to be able to hold a value of 0.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is a better solution?

The implementation, as currently written in this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still seems like 6 of one, half dozen of another to me, except looping a byte-at-a-time and doing pointer arithmetic around core::mem::zeroed::<NonNull<T>>() is more complex and potentially slower.

It seems the crux of this is...

Memory behind a raw pointer can hold any value without any problem, because Rust does not consider it to be initialized. The problem only arises when an invalid value of type T exists, which Rust considers initialized, which mem::zeroed() does.

...but we're talking about a buffer which is defined as being:

What does a normal Vec::with_capacity call initialize it to?

Nothing, it does not create a value of type T though and just leaves the memory completely uninitialized.

The contract of Vec is always to initialize this capacity in some way before reading from it.

It seems the claim is it might be considered initialized by the Rust compiler when it is uninitialized, but before its API permits any reads, it will be initialized again.

Can either of you give a concrete example of a tractable problem which is possible given the contract of Vec to always (re)initialize this memory which would not occur with pointer-based zeroing?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you are doing to the excess buffer of the vector is and never was the problem. You are indeed free to do with the extra capacity of bytes whatever you want. But not however you want. The code that writes invalid or uninitialized values into it must still be UB-free and executing mem::zeroed::<T> is not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can either of you give a concrete example of a tractable problem which is possible given the contract of Vec to always (re)initialize this memory which would not occur with pointer-based zeroing?

The problem is not the Vec or its memory, the problem is simply that the code could call mem::zeroed<NonNull<()>>, which is UB. The compiler might make optimizations that would crash the program if such a value ever exists. This would be the same problem in every program, even without a Vec being involved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think I get it now. Thank you for the explanation.

// Safety:
//
// This is safe, because `Vec` never allocates more than `isize::MAX` bytes.
let current_ptr = unsafe { extra_capacity_start.add(i) };
// Safety:
//
// `current_ptr` is valid, because it lies within the allocation of the `Vec`.
// It is also properly aligned, because we write a `u8`, which has an alignment of
// 1.
unsafe { ptr::write_volatile(current_ptr, 0) };
}

atomic_fence();
}

self.clear();
}
}
Expand Down Expand Up @@ -458,6 +484,43 @@ mod tests {
assert!(vec.is_empty());
}

#[cfg(feature = "alloc")]
#[test]
fn zeroize_vec_entire_capacity() {
#[derive(Clone)]
struct PanicOnNonZeroDrop(u64);

impl Zeroize for PanicOnNonZeroDrop {
fn zeroize(&mut self) {
self.0 = 0;
}
}

impl Drop for PanicOnNonZeroDrop {
fn drop(&mut self) {
if self.0 != 0 {
panic!("dropped non-zeroized data");
}
}
}

// Ensure that the entire capacity of the vec is zeroized and that no unitinialized data
// is ever interpreted as initialized
let mut vec = vec![PanicOnNonZeroDrop(42); 2];

unsafe {
vec.set_len(1);
}

vec.zeroize();

unsafe {
vec.set_len(2);
}

drop(vec);
}

#[cfg(feature = "alloc")]
#[test]
fn zeroize_string() {
Expand Down