Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New destructor semantics #8861

Closed
5 of 7 tasks
nikomatsakis opened this issue Aug 29, 2013 · 11 comments · Fixed by #21972
Closed
5 of 7 tasks

New destructor semantics #8861

nikomatsakis opened this issue Aug 29, 2013 · 11 comments · Fixed by #21972
Assignees
Labels
A-destructors Area: destructors (Drop, ..) A-typesystem Area: The type system P-medium Medium priority

Comments

@nikomatsakis
Copy link
Contributor

Tracking issue for RFC #769, Sound Generic Drop.

Original description follows:

We agreed in a meeting to replace the unsafe destructor code with the rule that a value with a destructor must only contain values of lifetime strictly greater than the value to be destructed. The idea is to prevent values from referencing one another. Permitting borrowed values in destructors enables a number of RAII causes and helps to eliminate the need for once fns. This also relies on the change to prevent & pointers from being placed within managed boxes (which I think has already been made).

I have to figure out precisely how to formalize this rule still. =)

Nominating for backwards compat.


Update from @pnkfelix : Much of this is implemented, and so pnkfelix is listing out specific subtasks, issues, and/or PRs, to try to track what is done and what is left to do:

Far future / No longer deemed necessary

(See also #22321)

@catamorphism
Copy link
Contributor

Accepted for well-defined

@nikomatsakis
Copy link
Contributor Author

Here is a specific example, from #13246, of unsound code we currently accept due to inadequate rules:

This program:

struct Foo { a: int }

impl Drop for Foo {
    fn drop(&mut self) {
        println!("{}", self.a);
    }
}

fn main() {
    {
        let _1 = Foo { a: 1 };
        let _2 = Foo { a: 2 };
        match Foo { a: 3 } {
            _ => {}
        }
    }
    let _4 = Foo { a: 4 };
}

should output

3
2
1
4

but instead outputs

2
1
3
4

It appears that if the bare block around the first part of main is removed, everything runs in the correct order. Moving the "3" Foo into a variable also makes everything run in the right order.

This was originally reported to me in sfackler/rust-postgres#31 as a segfault, but it looks like that's probably just due to heap corruption from a use-after-free caused by this bug.

Update

IR:

join:                                             ; preds = %case_body
  call void @_ZN3Foo14glue_drop.146917h02f639dd107d06daE(%struct.Foo* %_2)
  call void @_ZN3Foo14glue_drop.146917h02f639dd107d06daE(%struct.Foo* %_1)
  call void @_ZN3Foo14glue_drop.146917h02f639dd107d06daE(%struct.Foo* %0) ; <-- 3
  %7 = getelementptr inbounds %struct.Foo* %_4, i32 0, i32 1
  store i8 1, i8* %7
  %8 = getelementptr inbounds %struct.Foo* %_4, i32 0, i32 0
  store i64 4, i64* %8
  call void @_ZN3Foo14glue_drop.146917h02f639dd107d06daE(%struct.Foo* %_4)
  ret void

@pcwalton
Copy link
Contributor

Another problem:

use std::cell::RefCell;
fn main() {
    let b = {
        let a = box RefCell::new(4);
        *a.borrow() + 1
    };
    println!("{}", b);
}
$ valgrind ./foo
==9880== Memcheck, a memory error detector
==9880== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==9880== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==9880== Command: ./foo
==9880== 
==9880== Invalid read of size 8
==9880==    at 0x403AC2: cell::Cell$LT$T$GT$::get::h14749520627650063431::v0.0 (in /home/alex/foo)
==9880==    by 0x403C2A: cell::Ref$LT$$x27b$C$$x20T$GT$.Drop::drop::h5517615880807730945::v0.0 (in /home/alex/foo)
==9880==    by 0x40406A: core..cell..Ref$LT$$x27_$C$int$GT$::glue_drop.1537::h7a74728680c53b3a (in /home/alex/foo)
==9880==    by 0x4031E1: main::hfd2b07b8610388fafaa::v0.0 (in /home/alex/foo)
==9880==    by 0x447512: start::closure.7199 (in /home/alex/foo)
==9880==    by 0x463862: task::Task::run::closure.5303 (in /home/alex/foo)
==9880==    by 0x466C4B: rust_try (in /home/alex/foo)
==9880==    by 0x465565: unwind::try::h7d720262cba71fd0fGd::v0.11.0.pre (in /home/alex/foo)
==9880==    by 0x463714: task::Task::run::hac578e5b353eedcbVVc::v0.11.0.pre (in /home/alex/foo)
==9880==    by 0x4472CD: start::h0149aeff7c3a4b69Jme::v0.11.0.pre (in /home/alex/foo)
==9880==    by 0x447098: lang_start::h3e00b03c17efac073le::v0.11.0.pre (in /home/alex/foo)
==9880==    by 0x40332E: main (in /home/alex/foo)
==9880==  Address 0x6014058 is 8 bytes inside a block of size 16 free'd
==9880==    at 0x471379: je_dallocx (in /home/alex/foo)
==9880==    by 0x40371F: heap::deallocate::h7bc897bb314e6fc3Bfa::v0.0 (in /home/alex/foo)
==9880==    by 0x403699: heap::exchange_free::h3a6062ae495b91fcefa::v0.0 (in /home/alex/foo)
==9880==    by 0x403FA0: _$UP$i8::glue_drop.1534::h08c160eb1ce0fd99 (in /home/alex/foo)
==9880==    by 0x4031D4: main::hfd2b07b8610388fafaa::v0.0 (in /home/alex/foo)
==9880==    by 0x447512: start::closure.7199 (in /home/alex/foo)
==9880==    by 0x463862: task::Task::run::closure.5303 (in /home/alex/foo)
==9880==    by 0x466C4B: rust_try (in /home/alex/foo)
==9880==    by 0x465565: unwind::try::h7d720262cba71fd0fGd::v0.11.0.pre (in /home/alex/foo)
==9880==    by 0x463714: task::Task::run::hac578e5b353eedcbVVc::v0.11.0.pre (in /home/alex/foo)
==9880==    by 0x4472CD: start::h0149aeff7c3a4b69Jme::v0.11.0.pre (in /home/alex/foo)
==9880==    by 0x447098: lang_start::h3e00b03c17efac073le::v0.11.0.pre (in /home/alex/foo)
==9880== 
==9880== Invalid write of size 8
==9880==    at 0x403B4A: cell::Cell$LT$T$GT$::set::h11020375487951902910::v0.0 (in /home/alex/foo)
==9880==    by 0x403DA8: cell::Ref$LT$$x27b$C$$x20T$GT$.Drop::drop::h5517615880807730945::v0.0 (in /home/alex/foo)
==9880==    by 0x40406A: core..cell..Ref$LT$$x27_$C$int$GT$::glue_drop.1537::h7a74728680c53b3a (in /home/alex/foo)
==9880==    by 0x4031E1: main::hfd2b07b8610388fafaa::v0.0 (in /home/alex/foo)
==9880==    by 0x447512: start::closure.7199 (in /home/alex/foo)
==9880==    by 0x463862: task::Task::run::closure.5303 (in /home/alex/foo)
==9880==    by 0x466C4B: rust_try (in /home/alex/foo)
==9880==    by 0x465565: unwind::try::h7d720262cba71fd0fGd::v0.11.0.pre (in /home/alex/foo)
==9880==    by 0x463714: task::Task::run::hac578e5b353eedcbVVc::v0.11.0.pre (in /home/alex/foo)
==9880==    by 0x4472CD: start::h0149aeff7c3a4b69Jme::v0.11.0.pre (in /home/alex/foo)
==9880==    by 0x447098: lang_start::h3e00b03c17efac073le::v0.11.0.pre (in /home/alex/foo)
==9880==    by 0x40332E: main (in /home/alex/foo)
==9880==  Address 0x6014058 is 8 bytes inside a block of size 16 free'd
==9880==    at 0x471379: je_dallocx (in /home/alex/foo)
==9880==    by 0x40371F: heap::deallocate::h7bc897bb314e6fc3Bfa::v0.0 (in /home/alex/foo)
==9880==    by 0x403699: heap::exchange_free::h3a6062ae495b91fcefa::v0.0 (in /home/alex/foo)
==9880==    by 0x403FA0: _$UP$i8::glue_drop.1534::h08c160eb1ce0fd99 (in /home/alex/foo)
==9880==    by 0x4031D4: main::hfd2b07b8610388fafaa::v0.0 (in /home/alex/foo)
==9880==    by 0x447512: start::closure.7199 (in /home/alex/foo)
==9880==    by 0x463862: task::Task::run::closure.5303 (in /home/alex/foo)
==9880==    by 0x466C4B: rust_try (in /home/alex/foo)
==9880==    by 0x465565: unwind::try::h7d720262cba71fd0fGd::v0.11.0.pre (in /home/alex/foo)
==9880==    by 0x463714: task::Task::run::hac578e5b353eedcbVVc::v0.11.0.pre (in /home/alex/foo)
==9880==    by 0x4472CD: start::h0149aeff7c3a4b69Jme::v0.11.0.pre (in /home/alex/foo)
==9880==    by 0x447098: lang_start::h3e00b03c17efac073le::v0.11.0.pre (in /home/alex/foo)
==9880== 
5
==9880== 
==9880== HEAP SUMMARY:
==9880==     in use at exit: 0 bytes in 0 blocks
==9880==   total heap usage: 18 allocs, 18 frees, 2,016 bytes allocated
==9880== 
==9880== All heap blocks were freed -- no leaks are possible
==9880== 
==9880== For counts of detected and suppressed errors, rerun with: -v
==9880== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

As much as I'd love to punt on things for 1.0, this seems like a serious issue we need to fix.

pcwalton added a commit to pcwalton/rust that referenced this issue Sep 16, 2014
to have strictly greater lifetime than the values themselves.

Additionally, remove `#[unsafe_destructor]` in favor of these new
restrictions.

This broke a fair amount of code that had to do with `RefCell`s. We will
probably need to do some work to improve the resulting ergonomics. Most
code that broke looked like:

    match foo.bar.borrow().baz { ... }
    use_foo(&mut foo);

Change this code to:

    {
        let bar = foo.bar.borrow();
        match bar.baz { ... }
    }
    use_foo(&mut foo);

This fixes an important memory safety hole relating to destructors,
represented by issue rust-lang#8861.

[breaking-change]
bors added a commit that referenced this issue Nov 20, 2014
…ctor, r=nikomatsakis

(Previously, scopes were solely identified with NodeId's; this
refactoring prepares for a future where that does not hold.)

Ground work for a proper fix to #8861.
@pythonesque
Copy link
Contributor

Does this rule mean code like this would be invalid?

extern crate arena;

fn main() {
    struct Foo<'a> { foo: ::std::cell::Cell<Option<&'a Foo<'a>>> }
    let foo = arena::TypedArena::new();
    let bar = foo.alloc(Foo { foo: ::std::cell::Cell::new(None) });
    bar.foo.set(Some(&*bar));
}

If so, I'm not really on board with it, as this is a very useful pattern (and is perfectly safe in this case), as this change seems to force the use of *const or *mut instead of lifetimes if you ever want cycles. This makes these cases less safe because the borrow checker will no longer verify that the stored collection doesn't move and isn't dropped until you're done with it.

@pythonesque
Copy link
Contributor

As an addendum: this only seems to be a memory safety issue when two things occur together:

  • A struct field has lifetime equal to that of the struct itself.
  • The struct has an explicit destructor.

If the struct does not have an explicit destructor, then for each field, either the field is a reference (in which case there is no drop glue, so it can't cause memory unsafety), the field is a non-reference and has a lifetime equal to itself (in which case we can recursively perform the same check), or the field is a non-reference with a lifetime strictly greater than itself (in which case it's fine and we can move on to the next field).

This may not apply to TypedArena, depending on how the rule is interpreted, but it allows for many other safe patterns.

@aturon
Copy link
Member

aturon commented Jan 8, 2015

Nominating for -beta milestone.

@pnkfelix
Copy link
Member

pnkfelix commented Jan 8, 2015

Moving ahead to the 1.0-beta milestone.

@pnkfelix pnkfelix added this to the 1.0 beta milestone Jan 8, 2015
@pnkfelix pnkfelix removed this from the 1.0 milestone Jan 8, 2015
pcwalton referenced this issue in servo/rust-cssparser Jan 8, 2015
The tokenizer is now fully incremental, rather than yielding a block/function at a time.

The convention is now to take `&mut Parser` as input for parsing, and only consume as necessary.
pnkfelix added a commit to pnkfelix/rust that referenced this issue Jan 27, 2015
This new variant introduces finer-grain code extents, i.e. we now
track that a binding lives only for a suffix of a block, and
(importantly) will be dropped when it goes out of scope *before* the
bindings that occurred earlier in the block.

Both of these notions are neatly captured by marking the block (and
each suffix) as an enclosing scope of the next suffix beneath it.

This is work that is part of the foundation for issue rust-lang#8861.

(It actually has been seen in earlier posted pull requests; I have
just factored it out into its own PR to ease my own rebasing.)

----

These finer grained scopes do mean that some code is newly rejected by
`rustc`; for example:

```rust
let mut map : HashMap<u8, &u8> = HashMap::new();
let tmp = Box::new(2);
map.insert(43, &*tmp);
```

This will now fail to compile with a message that `*tmp` does not live
long enough, because the scope of `tmp` is now strictly smaller than
that of `map`, and the use of `&u8` in map's type requires that the
borrowed references are all to data that live at least as long as the
map.

The usual fix for a case like this is to move the binding for `tmp`
up above that of `map`; note that you can still leave the initialization
in the original spot, like so:

```rust
let tmp;
let mut map : HashMap<u8, &u8> = HashMap::new();
tmp = box 2;
map.insert(43, &*tmp);
```

Similarly, one can encounter an analogous situation with `Vec`: one
would need to rewrite:

```rust
let mut vec = Vec::new();
let tmp = 'c';
vec.push(&tmp);
```

as:

```
let tmp;
let mut vec = Vec::new();
tmp = 'c';
vec.push(&tmp);
```

----

In some corner cases, it does not suffice to reorder the bindings; in
particular, when the types for both bindings need to reflect exactly
the *same* code extent, and a parent/child relationship between them
does not work.

In pnkfelix's experience this has arisen most often when mixing uses
of cyclic data structures while also allowing a lifetime parameter
`'a` to flow into a type parameter context where the type is
*invariant* with respect to the type parameter. An important instance
of this is `arena::TypedArena<T>`, which is invariant with respect
to `T`.

(The reason that variance is relevant is this: *if* `TypedArena` were
covariant with respect to its type parameter, then we could assign it
the longer lifetime when it is initialized, and then convert it to a
subtype (via covariance) with a shorter lifetime when necessary.  But
`TypedArena` is invariant with respect to its type parameter, and thus
if `S` is a subtype of `T` (in particular, if `S` has a lifetime
parameter that is shorter than that of `T`), then a `TypedArena<S>` is
unrelated to `TypedArena<T>`.)

Concretely, consider code like this:

```rust
struct Node<'a> { sibling: Option<&'a Node<'a>> }
struct Context<'a> {
    // because of this field, `Context<'a>` is invariant with respect to `'a`.
    arena: &'a TypedArena<Node<'a>>,
    ...
}
fn new_ctxt<'a>(arena: &'a TypedArena<Node<'a>>) -> Context<'a> { ... }
fn use_ctxt<'a>(fcx: &'a Context<'a>) { ... }

let arena = TypedArena::new();
let ctxt = new_ctxt(&arena);

use_ctxt(&ctxt);
```

In these situations, if you try to introduce two bindings via two
distinct `let` statements, each is (with this commit) assigned a
distinct extent, and the region inference system cannot find a single
region to assign to the lifetime `'a` that works for both of the
bindings. So you get an error that `ctxt` does not live long enough;
but moving its binding up above that of `arena` just shifts the error
so now the compiler complains that `arena` does not live long enough.

SO: What to do? The easiest fix in this case is to ensure that the two
bindings *do* get assigned the same static extent, by stuffing both
bindings into the same let statement, like so:

```rust
let (arena, ctxt): (TypedArena, Context);
arena = TypedArena::new();
ctxt = new_ctxt(&arena);

use_ctxt(&ctxt);
```

Due to the new code rejections outlined above, this is a ...

[breaking-change]
bors added a commit that referenced this issue Jan 27, 2015
 Add `CodeExtent::Remainder` variant; pre-req for new scoping/drop rules.

This new enum variant introduces finer-grain code extents, i.e. we now track that a binding lives only for a suffix of a block, and (importantly) will be dropped when it goes out of scope *before* the bindings that occurred earlier in the block.

Both of these notions are neatly captured by marking the block (and each suffix) as an enclosing scope of the next suffix beneath it.

This is work that is part of the foundation for issue #8861.

(It actually has been seen in earlier posted pull requests, in particular #21022; I have just factored it out into its own PR to ease my own near-future rebasing, and also get people used to the new rules.)

----

These finer grained scopes do mean that some code is newly rejected by `rustc`; for example:

```rust
let mut map : HashMap<u8, &u8> = HashMap::new();
let tmp = Box::new(2);
map.insert(43, &*tmp);
```

This will now fail to compile with a message that `*tmp` does not live long enough, because the scope of `tmp` is now strictly smaller than
that of `map`, and the use of `&u8` in map's type requires that the borrowed references are all to data that live at least as long as the map.

The usual fix for a case like this is to move the binding for `tmp` up above that of `map`; note that you can still leave the initialization in the original spot, like so:

```rust
let tmp;
let mut map : HashMap<u8, &u8> = HashMap::new();
tmp = box 2;
map.insert(43, &*tmp);
```

Similarly, one can encounter an analogous situation with `Vec`: one would need to rewrite:

```rust
let mut vec = Vec::new();
let tmp = 'c';
vec.push(&tmp);
```

as:

```rust
let tmp;
let mut vec = Vec::new();
tmp = 'c';
vec.push(&tmp);
```

----

In some corner cases, it does not suffice to reorder the bindings; in particular, when the types for both bindings need to reflect exactly the *same* code extent, and a parent/child relationship between them does not work.

In pnkfelix's experience this has arisen most often when mixing uses of cyclic data structures while also allowing a lifetime parameter `'a` to flow into a type parameter context where the type is *invariant* with respect to the type parameter. An important instance of this is `arena::TypedArena<T>`, which is invariant with respect to `T`.

(The reason that variance is relevant is this: *if* `TypedArena` were covariant with respect to its type parameter, then we could assign it
the longer lifetime when it is initialized, and then convert it to a subtype (via covariance) with a shorter lifetime when necessary.  But `TypedArena` is invariant with respect to its type parameter, and thus if `S` is a subtype of `T` (in particular, if `S` has a lifetime parameter that is shorter than that of `T`), then a `TypedArena<S>` is unrelated to `TypedArena<T>`.)

Concretely, consider code like this:

```rust
struct Node<'a> { sibling: Option<&'a Node<'a>> }
struct Context<'a> {
    // because of this field, `Context<'a>` is invariant with respect to `'a`.
    arena: &'a TypedArena<Node<'a>>,
    ...
}
fn new_ctxt<'a>(arena: &'a TypedArena<Node<'a>>) -> Context<'a> { ... }
fn use_ctxt<'a>(fcx: &'a Context<'a>) { ... }

let arena = TypedArena::new();
let ctxt = new_ctxt(&arena);

use_ctxt(&ctxt);
```

In these situations, if you try to introduce two bindings via two distinct `let` statements, each is (with this commit) assigned a distinct extent, and the region inference system cannot find a single region to assign to the lifetime `'a` that works for both of the bindings. So you get an error that `ctxt` does not live long enough; but moving its binding up above that of `arena` just shifts the error so now the compiler complains that `arena` does not live long enough.

 * SO: What to do? The easiest fix in this case is to ensure that the two bindings *do* get assigned the same static extent, by stuffing both
bindings into the same let statement, like so:

```rust
let (arena, ctxt): (TypedArena, Context);
arena = TypedArena::new();
ctxt = new_ctxt(&arena);

use_ctxt(&ctxt);
```

----

Due to the new code restrictions outlined above, this is a ...

[breaking-change]
@pnkfelix
Copy link
Member

pnkfelix commented Feb 9, 2015

for people who come to this issue url due to its many references on github, the code base, and IRC: I suspect the most relevant thing for you to look at right now is the Sound Generic Drop RFC PR: rust-lang/rfcs#769

bors added a commit that referenced this issue Feb 11, 2015
This is a resurrection and heavy revision/expansion of a PR that pcwalton did to resolve #8861.

The most relevant, user-visible semantic change is this: #[unsafe_destructor] is gone. Instead, if a type expression for some value has a destructor, then any lifetimes referenced within that type expression must strictly outlive the scope of the value.

See discussion on rust-lang/rfcs#769
@pnkfelix pnkfelix reopened this Feb 12, 2015
@pnkfelix
Copy link
Member

The description for #21972 accidentally closed this ticket, but there is still work to be done; reopening.

Also, assigning to myself since I pretty much own this topic at this point.

@pnkfelix pnkfelix assigned pnkfelix and unassigned nikomatsakis Feb 12, 2015
@pnkfelix
Copy link
Member

reclassifying P-high, not 1.0 since everything for 1.0 is pretty much done (hopefully `unsafe_destructor will indeed be gone).

@pnkfelix pnkfelix removed this from the 1.0 beta milestone Mar 26, 2015
@pnkfelix pnkfelix added P-medium Medium priority and removed P-backcompat-lang labels Mar 26, 2015
@pnkfelix
Copy link
Member

closing; the remaining (hypothetical or far-future) work items are tracked on #22321

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-destructors Area: destructors (Drop, ..) A-typesystem Area: The type system P-medium Medium priority
Projects
None yet
6 participants