Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread locals do not guard against recursive initialization. #30228

Closed
eddyb opened this Issue Dec 6, 2015 · 17 comments

Comments

Projects
None yet
6 participants
@eddyb
Copy link
Member

eddyb commented Dec 6, 2015

use std::cell::Cell;
struct Foo;
thread_local!(static INIT: Cell<bool> = Cell::new(false));
thread_local!(static FOO: Foo = Foo::init());
impl Foo {
    fn init() -> Foo {
        INIT.with(|i| {
            let init = i.get();
            i.set(true);
            if !init {
                // Trigger recursive initialization, just once.
                FOO.with(|_| {});
            }
        });
        Foo
    }
}
impl Drop for Foo {
    fn drop(&mut self) {
        FOO.with(|foo| alias_check(self, foo));
    }
}

fn alias_check<T>(m: &mut T, r: &T) {
    println!("alias_check({:p}, {:p})", m, r);
    if m as *mut _ as usize == r as *const _ as usize {
        panic!("aliasing detected!");
    }
}

fn main() {
    FOO.with(|_| {});
}

In current stable (1.4), beta, and nightly the above snippet produces aliased references instead of triggering library asserts.

Currently, libstd assumes there is no value already in the slot when assigning the one produced by the latest initializer call.

It should either drop one of the two values (perhaps after putting the TLS slot in "destructor being called" mode) or panic, if a multiple initialization condition is detected.

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Dec 7, 2015

triage: I-nominated

Seems bad! Haven't had too much of a chance to digest yet, but doesn't seem earth-shatteringly bad on the surface at least.

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Dec 8, 2015

I... actually don't think this is related to thread locals. The line you've highlighted I believe is indeed the "one at fault", but this seems like it may be a codegen problem?

Currently, for code like this:

pub fn foo(a: &mut Box<i32>, b: Box<i32>) {
    *a = b;
}

This is codegen'd as:

  1. Drop the contents of a
  2. Copy the contents of b into a

Note that this is precisely what's happening on that line of *ptr = Some(value). The destructor is running first, and then the value is being moved into place. This means that the destructor for the TLS value is running without setting the "I'm running a destructor flag", which is not intended! This means that destructors can access the value while it's being destroyed, causing the unsafety here.

This seems like an interesting phenomenon, however, because if the destructor in the function above panicked, that means that we've possibly got a hole in memory! As a proof of concept, let's take a look at this:

struct A(Box<i32>);
struct B<'a>(&'a mut A);

impl Drop for A {
    fn drop(&mut self) {
        if *self.0 == 0 {
            panic!("oh no");
        } else {
            // ...
        }
    }
}

impl<'a> Drop for B<'a> {
    fn drop(&mut self) {
        let a = &self.0;
        println!("{}", a.0);
    }
}

fn foo(b: &mut B) {
    *b.0 = A(Box::new(1));
}

fn main() {
    let mut bomb = A(Box::new(0));
    let mut b = B(&mut bomb);
    foo(&mut b);
}

Here we create a bomb that's "ready to go off" via A(Box::new(0)), and then it's packaged up into another value with a destructor that can later witness its value (e.g. B(&mut bomb)). We then pass it off to a function where b's value is overwritten with a non-bomb (e.g. A(Box::new(1))). Due to the way this is codegen'd, however, the destructor for A is run before the pointer is overwritten, which means that the pointer is never actually overwritten, which in turn means that the destructor for B can later witness this, causing a use-after-free.

When running this program, I get:

thread '<main>' panicked at 'oh no', <anon>:7
-337192192

(of course results may vary).

tl;dr; I think this is a codegen bug, not a TLS bug. Recategorizing as T-compiler, but we should probably still fix thread::local in the meantime to protect against this.

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Dec 8, 2015

Ah, cc @rust-lang/compiler, tagging this with I-unsound due to my code example above.

@alexcrichton alexcrichton removed the I-wrong label Dec 8, 2015

@eddyb

This comment has been minimized.

Copy link
Member Author

eddyb commented Dec 8, 2015

That sounds like there are two problems here:

  • TLS not guarding certain kinds of assignment drops (initialization)
  • panicking not guarding assignment drops

I agree that the more general problem is that assignment drops need special handling, but doing it, via swaps for example, in every case seems like overkill and can potentially ruin performance.

It might be worth splitting this into two issues (or 3, one of them tracking the various instances of the general case).

As for the panic case, it might be worth implementing *a = b; as:

try {
    drop_in_place(a);
} finally {
    ptr::write(a, b);
}

alexcrichton added a commit to alexcrichton/rust that referenced this issue Dec 8, 2015

std: Use mem::replace in TLS initialization
Due to rust-lang#30228 it's not currently sound to do `*ptr = Some(value)`, so instead
use `mem::replace` which fixes the soundness hole for now.
@arielb1

This comment has been minimized.

Copy link
Contributor

arielb1 commented Dec 8, 2015

The panicking destructor problem is present in MIR.

@arielb1

This comment has been minimized.

Copy link
Contributor

arielb1 commented Dec 8, 2015

#[rustc_mir(graphviz="file.gv")]
pub fn foo<T>(a: &mut T, b: T) {
    *a = b;
}

generates the MIR at https://gist.github.com/arielb1/952593c64e4fffdbf9ed#file-mir-svg

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Dec 8, 2015

Interesting =)

It seems like the way we should codegen *ptr = ... is let mut new_value = ...; swap(ptr, &mut new_value); drop(new_value);, eh?

@arielb1

This comment has been minimized.

Copy link
Contributor

arielb1 commented Dec 8, 2015

@nikomatsakis

We can have the landing pad write the rvalue into the destination. That would however require a drop terminator. This feels like it would be better than introducing a copy to every write.

@arielb1

This comment has been minimized.

Copy link
Contributor

arielb1 commented Dec 8, 2015

That would basically be the MIR

  %0 = $b
  %1 = $a
  drop $a uwto uw0
  *%1 = %0
  %ret = ()
  return
uw0:
  *%1 = %0
  unwind
@nagisa

This comment has been minimized.

Copy link
Contributor

nagisa commented Dec 8, 2015

It seems like the way we should codegen *ptr = ... is let mut new_value = ...; swap(ptr, &mut new_value); drop(new_value);, eh?

Not sure how relevant it is, but this is similar to what Swift does (according to this presentation ~38 minute onward)

alexcrichton added a commit to alexcrichton/rust that referenced this issue Dec 8, 2015

std: Use mem::replace in TLS initialization
Due to rust-lang#30228 it's not currently sound to do `*ptr = Some(value)`, so instead
use `mem::replace` which fixes the soundness hole for now.
@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Dec 10, 2015

On Tue, Dec 08, 2015 at 08:05:03AM -0800, arielb1 wrote:

@nikomatsakis

We can have the landing pad write the rvalue into the destination. That would however require a drop terminator. This feels like it would be better than introducing a copy to every write.

Clever. Worth measuring. Obviously this is somewhat observable (But
you have to have wacky *mut and UnsafeCell usage.)

bors added a commit that referenced this issue Dec 10, 2015

Auto merge of #30267 - alexcrichton:tls-init-oh-my, r=nikomatsakis
Due to #30228 it's not currently sound to do `*ptr = Some(value)`, so instead
use `mem::replace` which fixes the soundness hole for now.
@arielb1

This comment has been minimized.

Copy link
Contributor

arielb1 commented Dec 10, 2015

@nikomatsakis

That kind of wacky UnsafeCall abuse is UB.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Dec 14, 2015

@arielb1

That kind of wacky UnsafeCall abuse is UB.

Hmm, perhaps it is. I'm trying to come up with an example of something reasonable but haven't been able to yet.

@arielb1

This comment has been minimized.

Copy link
Contributor

arielb1 commented Dec 14, 2015

@nikomatsakis

The point here is that the destructor is run "in the middle of the assignment", and can try to observe the value in-mid-assignment. If the assignment was to a dereference of a &mut reference (as happened in this case explicitly) that is impossible/UB because that would mean the &mut reference is doubly-borrowed.

However, you can also assign to a dereference of a *mut pointer (within an unsafe block). That case would be just underdefined, but these assignments are uncommon enough I suppose we should just give them the same semantics as if the pointer was reborrowed to a &mut for the duration of the assignment (remembering that "the assignment" here is just the call to the destructor and the memcpy, and does not include the evaluation of the RHS).

@arielb1

This comment has been minimized.

Copy link
Contributor

arielb1 commented Dec 14, 2015

I think I will just close this issue and reopen it under the appropriate title because the issues are separate enough.

@arielb1 arielb1 closed this Dec 14, 2015

@pnkfelix

This comment has been minimized.

Copy link
Member

pnkfelix commented Dec 15, 2015

Hmm I think that if we fix #30380 we will still want an independent regression test for this issue.

Reopening (with expectation that it will be flagged as needstest after other other bug in depends sues fixed)

@pnkfelix pnkfelix reopened this Dec 15, 2015

@arielb1

This comment has been minimized.

Copy link
Contributor

arielb1 commented Dec 15, 2015

@pnkfelix

@alexcrichton's PR definitely did come with a test. The TLS problem was aliasing violation (UB) triggered by accessing a value in the middle of its destructor.

@arielb1 arielb1 closed this Dec 15, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.