Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle ZST pointer alignment properly #204

Closed
RalfJung opened this issue Jun 21, 2017 · 41 comments
Closed

Handle ZST pointer alignment properly #204

RalfJung opened this issue Jun 21, 2017 · 41 comments

Comments

@RalfJung
Copy link
Member

Rust nowadays uses align as *const _ for ZST pointers. The reason for this is that some ZSTs do care about their alignment, namely, empty arrays inherit the alignment of their element type: [i32; 0] must be 4-aligned. Using, e.g., a pointer to () as a pointer to [i32; 0] is not, in general, valid.

Miri currently treats all ZST pointers equal, so it does not properly detect this.

I suggest to fix this by getting rid of the ZST allocation and doing what rustc does: Using the alignment cast to a pointer. If you agree, I'd be happy to implement this.

@eddyb
Copy link
Member

eddyb commented Jun 21, 2017

What does miri use the ZST allocation for? MIR Box? It'd be nice if any special ZST behavior would be unobservable from CTFE.

@RalfJung
Copy link
Member Author

It is sued when the memory subsystem is asked for a 0-sized allocation.
It is also used as return pointer for functions that don't return -- I feel that should rather be an Undef ptr.

@RalfJung
Copy link
Member Author

It'd be nice if any special ZST behavior would be unobservable from CTFE.

I don't think this is entirely possible. There is code in libstd and liballoc that treats ZSTs special, like code for slices, arrays and Vec -- if CTFE is allowed to use that code, it can observe some of this special behavior.

@eddyb
Copy link
Member

eddyb commented Jun 22, 2017

Yes but that code is library code that chose to use certain values and that's fine.
I'm mostly worried about what the language itself does - if it goes through a library, we don't care.

@oli-obk
Copy link
Contributor

oli-obk commented Jun 22, 2017

We have the zst alloc so we can be sure that references to zsts don't have an integral address, which would create a behaviour that differs from rustc's. We already differ by treating all zsts as being on the same address, so this doesn't hold anyway...

@eddyb
Copy link
Member

eddyb commented Jun 22, 2017

@oli-obk Yes but anything that's not dynamic allocation already can handle ZSTs like anything else, right? Just allocations with a size of 0 - the new non-allocated locals make that reasonable IMO.

@RalfJung
Copy link
Member Author

@oli-obk So you are saying ZSTs having non-integer address is a deliberate difference to the rustc semantics?

@eddyb I don't understand what you are suggesting here -- to keep things the way they are, or to make every ZST allocation a distinct 0-sized allocation, or something else?

@eddyb
Copy link
Member

eddyb commented Jun 22, 2017

@RalfJung To not special-case ZSTs at all (other than error in __rust_allocate).
Should also invoke the appropriate lang item for the MIR Box instead of implementing it in miri.

@RalfJung
Copy link
Member Author

Okay, so in particular, you suggest to get rid of the ZST allocation?

Should also invoke the appropriate lang item for the MIR Box instead of implementing it in miri.

Oh there's a lang item for this? That sounds like a great solution.

@eddyb
Copy link
Member

eddyb commented Jun 22, 2017

Now that you ask it out loud I'm not sure if trans bypasses it, but either way the logic exists somewhere in a library.

@RalfJung
Copy link
Member Author

Okay so the plan is:

  • Remove the ZST allocation.
  • Check for size 0 in __rust_allocate and friends, and error out.
  • Implement the box MIR operator by calling the exchange_malloc lang item.

Only trouble is, that lang item is only available if we have full MIR for libstd...

@RalfJung
Copy link
Member Author

Okay so @eddyb is fine with special-casing ZST when MIR is missing we can just "fall back" to the current special-casing. That's all a band-aid anyway. Sounds like we got a plan then.

@RalfJung
Copy link
Member Author

I noticed there is another case where miri allocates things (besides the box operator): When making a local ByRef. Not sure how ZSTs should be handled here -- in principle we could dispatch to the lang item here as well, but considering that force_allocation is called like a normal function, that would make things pretty messy.

@oli-obk
Copy link
Contributor

oli-obk commented Jun 23, 2017

We can delay making force_allocation correct until something is decided on the sizeof::<&()>() == 0 front.

If all fails, we can push the exchange_malloc stack frame and start stepping inside force_allocation until said frame returns.

@eddyb
Copy link
Member

eddyb commented Jun 23, 2017

@RalfJung ByRef local should not require any special-casing. There ain't any in rustc_trans.

EDIT: Also, this is literally the & operator, calling exchange_malloc for that is just wrong.

@eddyb
Copy link
Member

eddyb commented Jun 23, 2017

@oli-obk size_of::<&()>() will never be 0, not without breaking a lot of code (mostly unsafe).

@oli-obk
Copy link
Contributor

oli-obk commented Jun 23, 2017

Then we either can't remove the zst alloc, or we need to decide on a constant value for &() as *const () as usize

@eddyb
Copy link
Member

eddyb commented Jun 23, 2017

As discussed on IRC, I'd rather pointers from different allocations not be comparable, at least in the CTFE subset, as there are many situations in which we don't guarantee they're always identical.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 23, 2017

ByRef local should not require any special-casing. There ain't any in rustc_trans.

So it makes them alloca(0)? I mean, rustc doesn't have have to be concerned with the ByRef/ByVal distinction, that's purely a miri thing.

As discussed on IRC, I'd rather pointers from different allocations not be comparable, at least in the CTFE subset, as there are many situations in which we don't guarantee they're always identical.

All pointers are comparable (as in, you can compare them with == and !=). I assume you mean they should not compare equal? That's fair enough, but then, guaranteeing that they will compare inequal is also a strong call.

@eddyb
Copy link
Member

eddyb commented Jun 23, 2017

The ByRef/ByVal distinction was copied from rustc_trans which has had it before MIR was a plan.

@eddyb
Copy link
Member

eddyb commented Jun 23, 2017

I mean that the comparison should error as an undecideable operation.

@RalfJung
Copy link
Member Author

Are we still talking just about ZSTs or about all pointers?

The ByRef/ByVal distinction was copied from rustc_trans which has had it before MIR was a plan.

Oh, that's somewhat surprising. I thought such detail would be LLVM's job.
So a ByRef translates to an alloca? And for ZSTs, it will be alloca(0)? Is that correct?

@eddyb
Copy link
Member

eddyb commented Jun 23, 2017

LLVM will turn allocas back into SSA variables but at a great cost, as many things fold quicker and waste less (bloated) IR RAM when they're in SSA form.

@eddyb
Copy link
Member

eddyb commented Jun 23, 2017

No two pointers which are constant allocations can be known to be unequal unless they have distinct contents. Emulated heap, locals and static globals have strong uniqueness for non-ZSTs, while ZSTs may alias (except the heap has its own rules because it's library code and statics may have uniqueness even for ZSTs, but I'm not sure).

@RalfJung
Copy link
Member Author

So &3 != &3 is guaranteed, but &() != &() is not ("strong uniqueness for non-ZSTs")? Sounds like ZSTs are a special case...

@eddyb
Copy link
Member

eddyb commented Jun 23, 2017

Well those are constant allocations so neither should be allowed - or we can guarantee deduplication but that is tricky cross-crate.

@oli-obk
Copy link
Contributor

oli-obk commented Jun 23, 2017

I have implemented this in https://github.com/oli-obk/miri/commits/compile_time_feature_elimination along with some changes to allow comparing pointers to heap allocs and statics (since we know these can't alias. Only stack allocations and constants can alias and still be distinct). Still needs some work around constants, and we need to decide what to do about the issue of slice comparisons, since the stdlib compares pointers as an optimization. We could state that == errors if true and != errors if false since the issue is multiple things being the same alloc, and not one thing having multiple addresses. But that's still very surprising and annoying behaviour

@RalfJung
Copy link
Member Author

the stdlib compares pointers as an optimization

Pointers from different allocations?

To back up a little, is the goal here to guarantee that if miri does not error, then the behavior will agree with rustc? That's quite tough around pointer comparison. There's a reason I made the result of the comparison non-deterministic for some cases in my formal Rust model. ;)
Another source of possible disagreement is re-using allocations:

fn main() {
  let x = Box::new(0u32);
  let xaddr = &*x as uint;
  drop(x);
  let y = Box::new(032);
  let yaddr = &*y as uint;
  if xaddr == yaddr { ... }
}

The comparison actually can return true for some allocators. One way for miri to catch this would be to error out when comparing out-of-bounds or dangling pointers.

Oh of course and there's the fact that pointers in "real rustc" will overflow quicker than pointers in miri, because miri offsets start at 0, so you have to literally add 2^64 to overflow them (on 64bit). This is observable by programs that cast the pointer to usize and then do checked_add.

@oli-obk
Copy link
Contributor

oli-obk commented Jun 24, 2017

We could make all pointers overflow at their allocation's size.

Rustc compares pointers from different allocs in the slice comparison impl in https://github.com/rust-lang/rust/blob/master/src/libcore/slice/mod.rs#L2531

Nondeterminism that we can make deterministic by deciding to always be true or false and still correct is fine in my book. E.g. ensuring that heap allocations never alias is a fine abstraction in miri imo, even if a real allocator impl might decide differently.

@eddyb
Copy link
Member

eddyb commented Jun 24, 2017

I think we should really not allow using deallocated pointers for anything, many things (including equality comparisons?) are UB on them, at least in C/LLVM.

@oli-obk
Copy link
Contributor

oli-obk commented Jun 24, 2017

I think we can also compare pointers to mutable locals (always unequal if pointing to different locals) and pointers to mutable locals with pointers to immutable locals (always unequal).

Of course on a real hardware they can alias, but depending on the result causes nondeterministic behaviour.

@RalfJung
Copy link
Member Author

I think we should really not allow using deallocated pointers for anything, many things (including equality comparisons?) are UB on them, at least in C/LLVM.

In C, they become indeterminate values, true. However, last time I brought this up (can't find it right now), people assured me that this is fine to do in LLVM. This is important because Rust considers it safe.

@oli-obk
Copy link
Contributor

oli-obk commented Jun 24, 2017

In miri we need to make the value derterministic, and it already is. A pointer to an allocation is always unequal to a dangling pointer. Dangling pointers can alias if they were produced from the same pointer.

@RalfJung
Copy link
Member Author

Right. That's a legal determinization of (what I consider to be) Rust's behavior. Sounds like a good choice to me. Which is why I don't understand why you want to change the rules now for some allocations, and make the equality test UB.

@oli-obk
Copy link
Contributor

oli-obk commented Jun 24, 2017

The issue is comparing the addresses of locals and constants. Their equality or inequality isn't guaranteed in Rust, no matter what the value at the pointed to location is. Pointers to constants may alias depending on their pointee value or may be different even if the pointee value is the same. If we ensure that constants of the same alignment and bit representation have the same allocation, we could make this deterministic in a way that an ultimate llvm optimizer could also produce.

@oli-obk
Copy link
Contributor

oli-obk commented Jun 24, 2017

My issue is not with UB but nondeterminism.

@RalfJung
Copy link
Member Author

But how is "equality or inequality between locals is not guaranteed" different from "equality or inequality between a dangling pointer and a live heap allocation is not guaranteed"? Both are non-deterministic in Rust, aren't they?

@oli-obk
Copy link
Contributor

oli-obk commented Jun 24, 2017

I think it's much clearer with the heap, because of the uniqueness guarantee (ignoring dangling raw pointers, since an integer pointer in can also alias a heap allocation in Rust). The stack only has such guarantees for mutable allocations.

@RalfJung
Copy link
Member Author

I don't think that's enough of a difference to warrant extra treatment, extra complication. As far as I am concerned, this bug has been solved by merging #212.

That said, I don't feel too strongly about this.

@oli-obk
Copy link
Contributor

oli-obk commented Jun 25, 2017

Well. We need talk about this, because it will mean that the result of a call to a const fn will differ between calling it at compile time and calling it at runtime even if given the same arguments. If Rust starts to automatically make things constant that formerly only an llvm optimization made constant, this will change the behaviour of the code.

Edit: this decision has further consequences: if we allow leaking this information why don't we also allow comparing pointers into different allocs with <?

@oli-obk
Copy link
Contributor

oli-obk commented Jun 26, 2017

I filed #217 to discus the const eval case

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants