Box<dyn FnOnce> doesn't respect self alignment #68304

programmerjake · 2020-01-17T05:10:35Z

The following code fails on stable:
When compiled in release mode, the "optimized out" assert is optimized out by LLVM even though it checks the exact same condition as the other assert, as can be verified by testing in debug mode.

Note that it's theoretically possible for the stack to be aligned correctly such that the bug is suppressed, but that's not likely. The 256's can be replaced with larger alignments if that happens.

The bug is caused by the impl of Box<dyn FnOnce> copying self to the stack with the default alignment of 16.

#![allow(dead_code)]
#[repr(align(256))]
struct A {
    v: u8
}

impl A {
    fn f(&self) -> *const A {
        assert_eq!(self as *const A as usize % 256, 0, "optimized out");
        self
    }
}

fn f2(v: u8) -> Box<dyn FnOnce() -> *const A> {
    let a = A { v };
    Box::new(move || {
        a.f()
    })
}

#[test]
fn test() {
    let addr = f2(0)();
    assert_eq!(addr as usize % 256, 0, "addr: {:?}", addr);
}

(Playground)

The text was updated successfully, but these errors were encountered:

programmerjake · 2020-01-17T05:36:56Z

#61042 and #48055 seem related

programmerjake · 2020-01-17T05:43:17Z

This is a soundness hole.

RalfJung · 2020-01-17T12:12:11Z

So... I guess the problem is that the alloca that rustc emits for the unsized local doesn't use align_of_val? That's... kind of bad, because it doesn't look like alloca even supports dynamically setting the alignment: https://llvm.org/docs/LangRef.html#alloca-instruction

If this is correct, the issue should be renamed to something like "unsized locals do not respect alignment".

programmerjake · 2020-01-17T12:16:45Z

A way to fix this would be to change the ABI for dynamic trait calls that take self by value to always pass self by address. When the trait object is created is where the trampoline function would be generated, so there is no additional cost for static calls.

programmerjake · 2020-01-17T12:28:11Z

This can also be fixed by not allowing dynamically-sized local variables with an alignment more than some maximum value and having all dynamically-sized locals use that maximum alignment unless optimizations can prove that a smaller alignment suffices.

hanna-kruppe · 2020-01-17T12:39:55Z

A way to fix this would be to change the ABI for dynamic trait calls that take self by value to always pass self by address. When the trait object is created is where the trampoline function would be generated, so there is no additional cost for static calls.

Messing with the ABI wouldn't help with unsized locals in general (assuming @RalfJung is correct), since e.g. let f: dyn FnOnce() = ...; will also need an alloca with dynamic alignment in general.

This can also be fixed by not allowing dynamically-sized local variables with an alignment more than some maximum value and having all dynamically-sized locals use that maximum alignment unless optimizations can prove that a smaller alignment suffices.

Given only a Box<dyn Trait> value, there is no static way to say anything about its alignment, so how would you distinguish whether calling a self method on it is OK?

oli-obk · 2020-01-17T13:26:06Z

Can we call alloca(align + size) and then align the pointer for the local ourselves as a workaround until we come up with a good solution?

crlf0710 · 2020-01-17T14:07:44Z

In clang there's a function __builtin_alloca_with_align() which might or might not be useful in this case.

KamilaBorowska · 2020-01-17T14:20:29Z

@crlf0710

In clang there's a function __builtin_alloca_with_align() which might or might not be useful in this case.

__builtin_alloca_with_align requires a constant alignment. Non constant alignments are rejected by Clang. This matches semantics of alloca LLVM instruction, so it's not helping, as alignment of value in dyn FnOnce is unknown at compile time when moving from Box<dyn FnOnce>.

int main(void) {
    int alignment = 16;
    __builtin_alloca_with_align(16, alignment);
}

KamilaBorowska · 2020-01-17T14:32:35Z

I think that overallocating for unsized locals of unknown alignment may make sense. You can alloca(sizeof + alignof - 1) and then make use of <*>::align_offset to compute the offset.

Sure, it increases stack requirements, but it seems acceptable for Box<dyn FnOnce>, and in most cases you end up with extra 7 bytes on stack or less. 256 alignment seems like an edge case where wasting 255 bytes doesn't matter.

oli-obk · 2020-01-17T15:00:56Z

if we keep setting 16 as the default alignment we can skip the overallocating and offset dance for types with an alignment below that.

programmerjake · 2020-01-17T16:20:38Z

Given only a Box<dyn Trait> value, there is no static way to say anything about its alignment, so how would you distinguish whether calling a self method on it is OK?

My idea was that rustc would prohibit coercing overaligned types to a dyn trait that takes self by value. So:

#[repr(align(1048576))] // 1MiB
struct Overaligned(u8);

#[repr(align(16))]
struct Normal(u8);

trait TakeSelfByValue {
    fn f(self) {
        todo!()
    }
}

impl TakeSelfByValue for Overaligned {}
impl TakeSelfByValue for Normal {}

fn make_overaligned() -> Box<dyn TakeSelfByValue> {
    Box::new(Overaligned(0)) // errors due to too strict alignment
}

fn make_normal() -> Box<dyn TakeSelfByValue> {
    Box::new(Normal(0)) // doesn't error -- alignment is under limit
}

RalfJung · 2020-01-17T18:55:21Z

if we keep setting 16 as the default alignment we can skip the overallocating and offset dance for types with an alignment below that.

The alignment is dynamically determined though, not sure if saving 15 bytes of stack space is worth the conditional branch?

oli-obk · 2020-01-17T19:20:24Z

Oh right. Yea that's probably not helpful.

hanna-kruppe · 2020-01-17T20:01:41Z

Given only a Box<dyn Trait> value, there is no static way to say anything about its alignment, so how would you distinguish whether calling a self method on it is OK?

My idea was that rustc would prohibit coercing overaligned types to a dyn trait that takes self by value. So:

Since Fn : FnMut : FnOnce, this would be backwards incompatible (at latest) once trait object upcasting is finally implemented: dyn Fn and dyn FnMut trait objects would have to have the same alignment restriction (because you can upcast to dyn FnOnce), but today you can freely create such objects with any alignment.

eddyb · 2020-03-04T21:19:45Z

I was under the impression that calling Box<dyn FnOnce()> didn't require dynamic alloca, is that related to missing MIR optimizations, meaning we have unnecessary moves which end up duplicating the data on the stack using a dynamic alloca?

RalfJung · 2020-03-04T22:06:27Z

Correctness shouldn't rely on optimizations to fire...

eddyb · 2020-03-05T05:39:40Z

@RalfJung Sure, I was just hoping we weren't relying on "unsized locals", only unsized parameters (i.e. self), for what we're exposing as stable. And I was hoping an optimization wouldn't need to be involved to get there.

I wonder if it would even be possible to restrict "unsized locals" to the subset where we can handle it entirely through ABI, and not actually have dynamic allocas anywhere, since that would make me far more comfortable with Box<dyn FnOnce()>'s implementation or even stabilization.

Ironically, if that works, it's also the subset we could've implemented years earlier than we did, since most of the complexity, unknowns (and apparently bugs) are around dynamic allocas, not calling trait object methods on a dereference of Box.

bjorn3 · 2020-03-05T07:03:03Z

I wonder if it would even be possible to restrict "unsized locals" to the subset where we can handle it entirely through ABI, and not actually have dynamic allocas anywhere

That should be possible. I think everything that is necessary is performing the deref of the box as argument (call FnOnce::call_once(move *_1, _2)) instead of moving the box contents to a local first (_3 = move _1; call FnOnce::call_once(move _3, _2)) That would also make it possible to call Box<FnOnce> in cg_clif. Cranelift doesn't have alloca support, so the current MIR is untranslatable.

eddyb · 2020-03-05T07:21:24Z

@bjorn3 Hmm, but that's not trivially sound because it could lead to MIR calls where argument and return places overlap, which is UB.

We'd potentially have to special-case it to Box derefs, but even then what's to stop someone from doing foo = (*foo.boxed_closure)();?

Maybe we could introduce a temporary for the return value instead, for these kinds of virtual calls? That should remove any potential overlap while allowing the call.

(Unrelatedly, I just realize we use a copying-out shim for vtable entries that take self by value, even when Self would be passed as *Self at the call ABI level. We would probably have to make computing the call ABI a query for this microopt to not be expensive)

bjorn3 · 2020-03-05T08:35:40Z

We'd potentially have to special-case it to Box derefs, but even then what's to stop someone from doing foo = (*foo.boxed_closure)();?

That would result in foo.boxed_closure: Box<FnOnce> being moved to a temp first before dropping foo and calling the boxed closure. If it wasn't moved first before the drop of foo, then borrowck would give an error.

(I meant for this change to be done at mir construction level, not as an optimization pass, so borrowck should catch any soundness issues)

nikomatsakis · 2020-04-03T19:02:06Z

Can you not reinitialize *(x.y) today?

Hmm, that's a good point, I think you can do that. Sigh. I knew that seemed too easy. I remember we used to try and prevent that, but we never did it fully, and in the move to NLL we loosened those rules back up, so this compiles:

fn foo() {
    let mut x = Box::new(vec![22]);
    let y = take(*x);
    *x = vec![44];
}

fn take<T>(x: T) -> T { x }

Still, you could do my proposal for unsized types, though it would introduce an incongruity.

UPDATE: Oh, wait, I remember now. For unsized types, reinitialization is not possible, because you can't do *x = ... with an unsized rvalue (we wouldn't know how many bytes to move). So in fact there isn't really the same incongruity.

nikomatsakis · 2020-04-03T19:11:35Z

@eddyb

At the very least it seems like it would be easier to implement a change only to MIR building, and it should fix problems with Box<dyn FnOnce()> for the time being, even without a separate feature-gate.

To be clear, I wasn't proposing changing anything apart from MIR building either, I don't think.

IIUC, what you proposed is that foo(*x, ...) would be compiled to:

let tmp1 = ..;
..
let tmpN = ..;
foo(*x, tmp1 .. tmpN)

whereas under my proposal (limited to apply to unsized parameters, instead of moved parameters) would yield:

let tmp0 = x;
...
let tmpN = ...;
foo(*tmp0, tmp1..tmpN)

I think I mis-stated the problem earlier. It's not so much that your proposal will introduce compilation errors, it's that I think it changes the semantics of code like this in what I see as a surprising way:

x.foo({ x = Box::new(...); ... })

I believe that this code would compile to the following under your proposal:

// evaluate argument1 into tmp1:
x = Box::new(...);
tmp1 = ...;

// invoke:
foo(*x, tmp1)

which means that foo would be invoked with the new value of x, right?

Under my proposal, the code would still compile, but it would behave in what I see as the expected fashion. Specifically, it would generate:

tmp0 = x;

x = Box::new(...);
tmp1 = ...;

foo(*tmp0, tmp1)

eddyb · 2020-04-03T19:45:50Z

which means that foo would be invoked with the new value of x, right?

Isn't there a way to "lock" that value through the borrow-checker, so further accesses would be disallowed, until the call returns? I'd rather be stricter here.

UPDATE: Oh, wait, I remember now. For unsized types, reinitialization is not possible, because you can't do *x = ... with an unsized rvalue (we wouldn't know how many bytes to move). So in fact there isn't really the same incongruity.

My point was that if your proposal affects sized moves, you're breaking stable reinitialization.
If it doesn't affect sized moves, I don't see how this applies anymore:

I think what I would like is that our overall behavior is as consistent as possible between Sized and Unsized parameters.

RalfJung · 2020-04-04T07:35:25Z

UPDATE: Oh, wait, I remember now. For unsized types, reinitialization is not possible, because you can't do *x = ... with an unsized rvalue (we wouldn't know how many bytes to move). So in fact there isn't really the same incongruity.

Let me just chime in here and say that this is a pain in Miri. Unsized locals are pretty different from sized locals, I had to introduce a bunch of special hacks to make them work. MIR locals even needed a new state just for them:

rust/src/librustc_mir/interpret/eval_context.rs

Lines 123 to 127 in 6050e52

    
           /// This local is alive but not yet initialized. It can be written to 
        
           /// but not read from or its address taken. Locals get initialized on 
        
           /// first write because for unsized locals, we do not know their size 
        
           /// before that. 
        
           Uninitialized,

Lazily allocating locals seems odd to me, but at least we can consistently do that for all locals. Elsewhere however we needed some special treatment to make this lazy initialization actually work out and catch mistakes where an unsized local is "overwritten" with a value of different size:

rust/src/librustc_mir/interpret/place.rs

Lines 891 to 901 in 6050e52

    
           // This interprets `src.meta` with the `dest` local's layout, if an unsized local 
        
           // is being initialized! 
        
           let (dest, size) = self.force_allocation_maybe_sized(dest, src.meta)?; 
        
           let size = size.unwrap_or_else(|| { 
        
               assert!( 
        
                   !dest.layout.is_unsized(), 
        
                   "Cannot copy into already initialized unsized place" 
        
               ); 
        
               dest.layout.size 
        
           }); 
        
           assert_eq!(src.meta, dest.meta, "Can only copy between equally-sized instances");

This all feels really unprincipled, and it would be great if at some point we could come up with a coherent semantics for unsized locals in MIR.
The worst part is that the size of a local is implicitly determined on the first write to it. It would be much better to be explicit about that, e.g. at StorageLive time.

nikomatsakis · 2020-04-06T15:02:02Z

@eddyb

Isn't there a way to "lock" that value through the borrow-checker, so further accesses would be disallowed, until the call returns? I'd rather be stricter here.

I don't think we have anything like this today, but presumably we could add such a thing, but I'm not quite sure what you have in mind. When does this prohibition on further accesses take effect?

In particular, the IR for foo(*x, ...) under your proposal, if I understood, would be

... // evaluate remaining arguments
foo(*x, ...) // invoke `foo` with `*x` directly, no temporary

But the access to x that (I believe) we wish to prevent takes place in the "evaluate remaining arguments" part of things. So I think we'd have to add some sort of "pseudo-move" that tells borrowck that x should not be assignable from that point forward, right?

My point was that if your proposal affects sized moves, you're breaking stable reinitialization. If it doesn't affect sized moves, I don't see how this applies anymore: [unsized should be congruent with sized]

Well, it depends on your perspective. It's true that, given the need to permit reinitialization, the check is now specific to unsized values -- i.e., MIR construction will be different depending on whether we know the type to be sized or not. That's unfortunate.

However, I think we still maintain a degree of congruence, from an end-user perspective. In particular, users could not have reinitialized a Box<T: ?Sized> anyhow.

However, under my proposal, users can write things like

x.foo({x = ...; ...})

and the code works as expected (versus either having an unexpected result or getting an error).

In any case, it seems like we're sort of "narrowing in" on a proposal here:

when we see a function call with a parameter that is ?Sized, and the value being passed is a deref, we want to modify the IR to not introduce a temporary and instead pass that parameter directly from its location,
either we do some sort of "no more assignments" lock,
or we move the pointer and introduce a temporary (and then pass in a deref of that temporary)

Does that sound right so far?

eddyb · 2020-04-06T15:31:22Z

If it's easier to do the more flexible approach (by moving the pointer) than the "locking" approach (which is a bit like borrowing the pointer until all arguments are evaluated, then releasing the borrow and doing a move instead), I don't mind it, I was only arguing with the "uniformity" aspect.

However, under my proposal, users can write things like
x.foo({x = ...; ...})
and the code works as expected (versus either having an unexpected result or getting an error).

Something important here: your approach is stabilizable, while mine is meant more for that one impl<A> FnOnce<A> for Box<dyn FnOnce<A>> in the standard library, and not much else.

nikomatsakis · 2020-04-07T14:38:13Z

Yes, I was kind of shooting for something that we could conceivably stabilize.

spastorino · 2020-04-08T19:06:34Z

Removing nomination, this was already discussed in triage meeting and it's under control.

nikomatsakis · 2020-04-10T19:23:21Z

So should we maybe try out the approach I proposed and see how well it works?

spastorino · 2020-04-13T14:58:31Z

@nikomatsakis yes, going to try that out.

nikomatsakis · 2020-04-14T14:21:36Z

For the record, the "new plan" is this.

Change to MIR construction

When generating the arguments to a call, we ordinarily will make a temporary tmp containing the value of each argument:

tmp0 = <arg expr>
...
tmpN = <arg expr>
call(tmp0, .., tmpN)

But we will modify this such that, for arguments that meet the following conditions:

The argument is moved (not copied),
and its type is not known to be Sized,
and the "place" has outer derefs (e.g., we are passing *P where P is some place),

then we assign the place P (instead of *P) to a temporary tmp and we make the argument to the function be *tmp (instead of tmp).

This means that given x(...) where x: Box<dyn FnOnce(...)>, we used to generate

let tmp0 = *x; // tmp0: dyn FnOnce(), requires an alloca
...
FnOnce::call(tmp0, ...)

but now we generate

let tmp0 = x; // tmp0: Box<dyn FnOnce()>, no alloca
...
FnOnce::call(*tmp0, ...)

Rationale

We are now moving the box itself, which is slightly different than the semantics for values known to be sized. In particular, it does not permit "re-initialization" of the box after having moved its contents. But that is not supported for unsized types anyway.

The change does permit modifications to the callee x during argument evaluation and they will work in an analogous way to how they would work with sized values. So given something like:

x.foo({ x = Box::new(...); ... })

we will find that we (a) save the old value of x to a temporary and then (b) overwrite x while evaluating the arguments and then (c) invoke the FnOnce with the old value of x.

nikomatsakis · 2020-04-14T14:36:48Z

Update: There is also some discussion of implementation details on Zulip

RalfJung · 2020-04-14T21:31:41Z

FnOnce::call(*tmp0, ...)

This needs MIR datastructure changes, right? It is not currently representable I think as fn arguments are Operand. (Cc #71117 (comment) where @eddyb argues that these "operands" should already have a weird semantics in case of move.)

eddyb · 2020-04-14T21:51:35Z

move *tmp0 is a perfectly normal operand, I don't know what you mean.
It's the same kind of Operand as we currently have in @nikomatsakis' let tmp0 = *x; (on the RHS)

RalfJung · 2020-04-15T06:40:42Z

Oh sorry, somehow I thought operands could only refer directly to locals, not also dereference them.

I think I am beginning to see the pieces here -- together with @eddyb-style Operand::Move function argument handling, this entirely avoids having to dynamically determine the size of a local. I have to agree that is quite elegant, even though baking "pass-by-reference" into MIR semantics still seems like a heavy hammer to me.

On the other hand this places a severe restriction on now unsized locals may be used, basically taking back large parts of rust-lang/rfcs#1909. Is there some long-term plan to bring that back (probably requires alloca with dynamic alignment in LLVM)? Or should we amend the RFC or at least the tracking issue with the extra restrictions?

eddyb · 2020-04-15T12:14:21Z

I would keep the more general feature but use two different feature-gates (if we can).

And just like with a few other feature-gates (#![feature(const_generics)]) we could warn that #![feature(unsized_locals)] (the general one) is incomplete and could lead to misaligned variables (or something along those lines).

We could also add runtime asserts that the (dynamic) alignment is respected, once we know Box<dyn FnOnce()> could never trigger them (e.g. from stable code).

nikomatsakis · 2020-04-15T14:08:55Z

Is there some long-term plan to bring that back (probably requires alloca with dynamic alignment in LLVM)?

I was wondering about that too. It seems like it'd be good to review the more general feature in any case.

This is partly why I brought up the idea of wanting a guaranteed "no alloca" path -- this was something I remember us discussing long ago, and so in some sense I feel like the work we're doing here isn't just working around LLVM's lack of dynamic alignment support (though it is doing that, too) but also working to complete the feature as originally envisioned.

nikomatsakis · 2020-04-15T14:11:21Z

I added two notes to the tracking issue #48055 so we don't overlook this in the future.

RalfJung · 2020-04-22T06:12:46Z

This is probably expected, but the fix failed to actually make unsized_locals sound: #71416.

jonas-schievink added C-bug Category: This is a bug. I-nominated I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 17, 2020

Centril added the P-high High priority label Jan 17, 2020

jonas-schievink added A-dst Area: Dynamically Sized Types F-unsized_locals `#![feature(unsized_locals)]` labels Jan 17, 2020

oli-obk closed this as completed Jan 17, 2020

oli-obk reopened this Jan 17, 2020

spastorino removed the I-nominated label Apr 8, 2020

nikomatsakis mentioned this issue Apr 14, 2020

Semantics of MIR function calls #71117

Open

aturon mentioned this issue Apr 15, 2020

Tracking issue for RFC #1909: Unsized Rvalues (unsized_locals, unsized_fn_params) #48055

Open

13 tasks

spastorino mentioned this issue Apr 15, 2020

Make Box<dyn FnOnce> respect self alignment #71170

Merged

bors closed this as completed in 45d050c Apr 21, 2020

RalfJung mentioned this issue Apr 22, 2020

unsized_locals fails to uphold alignment requirements #71416

Closed

Mark-Simulacrum mentioned this issue Apr 26, 2020

Implement Fn traits for Rc and Arc #71570

Closed

pnkfelix mentioned this issue Apr 29, 2020

Separate unsized_locals into "sound" and "unsound" (i.e. incorrectly implemented) parts #71694

Closed

spastorino mentioned this issue Apr 30, 2020

Dynamically select alignment of a dyn Trait local value #71695

Closed

RalfJung mentioned this issue May 19, 2020

Separate unsized fn param from unsized locals #72029

Closed

Box<dyn FnOnce> doesn't respect self alignment #68304

Box<dyn FnOnce> doesn't respect self alignment #68304

Comments

programmerjake commented Jan 17, 2020

programmerjake commented Jan 17, 2020

programmerjake commented Jan 17, 2020

RalfJung commented Jan 17, 2020

programmerjake commented Jan 17, 2020

programmerjake commented Jan 17, 2020

hanna-kruppe commented Jan 17, 2020

oli-obk commented Jan 17, 2020

crlf0710 commented Jan 17, 2020

KamilaBorowska commented Jan 17, 2020 • edited Loading

KamilaBorowska commented Jan 17, 2020 • edited Loading

oli-obk commented Jan 17, 2020

programmerjake commented Jan 17, 2020 • edited Loading

RalfJung commented Jan 17, 2020

oli-obk commented Jan 17, 2020

hanna-kruppe commented Jan 17, 2020

eddyb commented Mar 4, 2020

RalfJung commented Mar 4, 2020

eddyb commented Mar 5, 2020

bjorn3 commented Mar 5, 2020 • edited Loading

eddyb commented Mar 5, 2020

bjorn3 commented Mar 5, 2020 • edited Loading

nikomatsakis commented Apr 3, 2020 • edited Loading

nikomatsakis commented Apr 3, 2020

eddyb commented Apr 3, 2020 • edited Loading

RalfJung commented Apr 4, 2020

nikomatsakis commented Apr 6, 2020

eddyb commented Apr 6, 2020

nikomatsakis commented Apr 7, 2020

spastorino commented Apr 8, 2020

nikomatsakis commented Apr 10, 2020

spastorino commented Apr 13, 2020

nikomatsakis commented Apr 14, 2020

Change to MIR construction

Rationale

nikomatsakis commented Apr 14, 2020

RalfJung commented Apr 14, 2020

eddyb commented Apr 14, 2020

RalfJung commented Apr 15, 2020

eddyb commented Apr 15, 2020

nikomatsakis commented Apr 15, 2020

nikomatsakis commented Apr 15, 2020

RalfJung commented Apr 22, 2020

KamilaBorowska commented Jan 17, 2020 •

edited

Loading

KamilaBorowska commented Jan 17, 2020 •

edited

Loading

programmerjake commented Jan 17, 2020 •

edited

Loading

bjorn3 commented Mar 5, 2020 •

edited

Loading

bjorn3 commented Mar 5, 2020 •

edited

Loading

nikomatsakis commented Apr 3, 2020 •

edited

Loading

eddyb commented Apr 3, 2020 •

edited

Loading