Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression from stable: pointer to usize conversion no longer compiles #54709

Closed
mjbshaw opened this issue Oct 1, 2018 · 44 comments
Closed

Regression from stable: pointer to usize conversion no longer compiles #54709

mjbshaw opened this issue Oct 1, 2018 · 44 comments
Labels
A-const-eval Area: constant evaluation (mir interpretation) disposition-close This PR / issue is in PFCP or FCP with a disposition to close it. finished-final-comment-period The final comment period is finished for this PR / Issue. regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Milestone

Comments

@mjbshaw
Copy link
Contributor

mjbshaw commented Oct 1, 2018

The following code works with Rust 1.29.0:

union TransmuteHack<T: Copy, U: Copy> {
    from: T,
    to: U,
}

pub static VALUE: usize = 42;
pub static ADDRESS: usize = unsafe { TransmuteHack { from: &VALUE }.to };

But it fails with the current beta (and nightly):

error[E0080]: this static likely exhibits undefined behavior

 --> <source>:7:1

  |

7 | pub static ADDRESS: usize = unsafe { TransmuteHack { from: &VALUE }.to };

  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ type validation failed: encountered a pointer, but expected the type usize

  |

  = note: The rules on what exactly is undefined behavior aren't clear, so this check might be overzealous. Please open an issue on the rust compiler repository if you believe it should not be considered undefined behavior

error: aborting due to previous error

For more information about this error, try `rustc --explain E0080`.

Compiler returned: 1

I think this code should still be valid. I don't see any reason for this to be undefined behavior. This is a hack, yes, but it's very useful for FFI-related work.

@dtolnay
Copy link
Member

dtolnay commented Oct 1, 2018

This check was added in #51361 so mentioning @oli-obk and @RalfJung. It looks quite intentional and justifiable to reject this code. The same code without the unsafe hack is already disallowed in 1.29.0.

pub static ADDRESS: usize = &VALUE as *const usize as usize;
error[E0018]: raw pointers cannot be cast to integers in statics
 --> src/lib.rs:7:29
  |
7 | pub static ADDRESS: usize = &VALUE as *const usize as usize;
  |                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I would expect this to be reconsidered after stabilizing #51910. Here is the relevant message on nightly:

error[E0658]: casting pointers to integers in statics is unstable (see issue #51910)
 --> src/lib.rs:7:29
  |
7 | pub static ADDRESS: usize = &VALUE as *const usize as usize;
  |                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |
  = help: add #![feature(const_raw_ptr_to_usize_cast)] to the crate attributes to enable

@oli-obk oli-obk added regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. A-const-eval Area: constant evaluation (mir interpretation) labels Oct 1, 2018
@oli-obk
Copy link
Contributor

oli-obk commented Oct 1, 2018

Yea, this kind of constant is very dangerous. I personally don't think there can be a good enough use case to outweigh the problems with it.

You can always use a *const () for FFI, which also makes your intent clearer. Pointers are not integers and treating them like that will get you into trouble not just in Rust

@RalfJung RalfJung added the regression-from-stable-to-beta Performance or correctness regression from stable to beta. label Oct 1, 2018
@RalfJung
Copy link
Member

RalfJung commented Oct 1, 2018

Agreed, this was a deliberate change in behavior (and we cratered it, I assume?). The union hack was more of a loophole that we closed.

Now, to be fair, this is a static and not a const. For const there are plenty of examples of such a "pointer disguising as an integer" causing problems. For static, however, the situation is less clear: Consts cannot refer to static, so the final value being bad for CTFE is not necessarily a big problem. And the final value is fine for run-time consumption. @oli-obk what do you think? Would it make sense to relax the validity check for statics? (The validity check needs an argument anyway to switch between checking const and runtime validity, because I want to use it in miri.)

@oli-obk
Copy link
Contributor

oli-obk commented Oct 1, 2018

A static can refer to another static's value though. and then we have the same problem.

@oli-obk
Copy link
Contributor

oli-obk commented Oct 1, 2018

And I don't think we want to make referring to other statics be dependent on the other static's validity checks succeeding

@RalfJung
Copy link
Member

RalfJung commented Oct 1, 2018

Oh right, good point. Also, promoted code can refer to statics I assume. Same problem.

So, I think this issue should be closed as "this regression is a bugfix". The work-around is to no longer lie about the actual type of the data stored in that static, and make it e.g. of type *const usize.

@oli-obk oli-obk closed this as completed Oct 1, 2018
@mjbshaw
Copy link
Contributor Author

mjbshaw commented Oct 1, 2018

A static can refer to another static's value though.

Can they? I thought a static could only refer to another static's address, not its value. Which is pretty different.

@oli-obk
Copy link
Contributor

oli-obk commented Oct 1, 2018

Works just fine on stable: https://play.rust-lang.org/?gist=a07a6558a87a86b0ed2e23855f2b4c44&version=stable&mode=debug&edition=2015

This was implemented half a year ago by lifting a bunch of restrictions of the old const evaluator

@RalfJung
Copy link
Member

RalfJung commented Oct 1, 2018

And here's an example of how the static it complains about can cause a problem. There's no reasonable way to implement "divide a pointer by 2" in CTFE.

@mjbshaw
Copy link
Contributor Author

mjbshaw commented Oct 1, 2018

Works just fine on stable: https://play.rust-lang.org/?gist=a07a6558a87a86b0ed2e23855f2b4c44&version=stable&mode=debug&edition=2015

This was implemented half a year ago by lifting a bunch of restrictions of the old const evaluator

Interesting! I wasn't aware of those changes. Thanks for the demo.

And here's an example of how the static it complains about can cause a problem. There's no reasonable way to implement "divide a pointer by 2" in CTFE.

I guess I don't really see what's wrong in that example. It fails to compile (which is correct) with a reasonable error message in a reasonable location. Seems okay to me. I don't see why this has to be preemptively prevented by disabling the union transmute hack.

@RalfJung
Copy link
Member

RalfJung commented Oct 1, 2018

Things get more tricky if the static with the "bad" usize is in a different crate than the place where it is used.

Also, if you write e.g. &(STATIC / 2) in normal code, it will get promoted to 'static lifetime. That will cause trouble if the STATIC does not follow the rules for const safety.

@oli-obk
Copy link
Contributor

oli-obk commented Oct 1, 2018

Also, if you write e.g. &(STATIC / 2) in normal code, it will get promoted to 'static lifetime. That will cause trouble if the STATIC does not follow the rules for const safety.

That only works for constants, I hope. Accessing a static's value only works in statics.

But yes. Creating a bad constant and using it in runtime code can cause everything from weird errors to undefined behaviour if used in runtime places.

@RalfJung
Copy link
Member

RalfJung commented Oct 1, 2018

Still this is probably sth. that some team should look at. AFAIK we do not have a team decision on this so far, and it seems prudent to get one. So maybe we should reopen and tag for, eh, T-lang?

@mjbshaw
Copy link
Contributor Author

mjbshaw commented Oct 1, 2018

Creating a bad constant and using it in runtime code can cause everything from weird errors to undefined behaviour if used in runtime places.

Yes, but usize doesn't have any "bad" values (I'm assuming "bad" means undefined behavior values (e.g. a zero NonZeroUsize) or a trap representation (e.g. a signaling NaN)), does it?

@RalfJung
Copy link
Member

RalfJung commented Oct 1, 2018

It doesn't at run-time (except for Uninit aka undef/poison, i.e., except for uninitialized data). It has for constant evaluation. See this blog post for some more thoughts on that.

@mjbshaw
Copy link
Contributor Author

mjbshaw commented Oct 2, 2018

So, @RalfJung, after going through the blog post (you definitely understand this better than I do, so let me know if I've misunderstood this), I'm still not sure why this has to be disallowed for CTFE. The expression &VALUE is not a constant expression and should not be (and indeed cannot be) (fully) evaluated at compile time. The address of VALUE is determined by the linker (at best) or the executable loader at runtime. I don't think this expression should have all the strict const eval rules applied to it because it's not a constant expression.

@RalfJung
Copy link
Member

RalfJung commented Oct 2, 2018

Every expression you write in the initializer of a static or const is a constant expression and evaluated at compile time. If the value of a static or const is a pointer, we emit appropriate relocations for LLVM to handle, but that doesn't change the fact that all this is a constant expression -- e.g. if you do ptr.wrapping_offset(4).wrapping_offset(8), we will compute the offset 12 at compile time (well, we would if we allowed those methods to be called, but the underlying engine can handle all that).

What is the problem with using a pointer type here? Pointers are not integers, and for CTFE this is apparent even by looking at what we send to LLVM (for pointers, we need relocations). This distinction is important to describe optimizing compilers even if you ignore constant evaluation, but it surfaces even more in const context.

@oli-obk
Copy link
Contributor

oli-obk commented Oct 2, 2018

The problem is that you can already do

let x: &'static u32 = &42;

on stable Rust since a year or so. We need to keep supporting it, so we're very careful to not allow any oddball features like pointer values with integer types. Since &SOME_CONSTANT is (almost) always promoted to a static, we also need to ensure that constants are somewhat sane (which they were by definition until miri came along).

It was an accident that we ever allwoed it.

@mjbshaw
Copy link
Contributor Author

mjbshaw commented Oct 2, 2018

Some context about my use case: Objective-C type encoding uses different encodings for integers vs pointers. Some FFI-related data structures use usize rather than *mut T (because they pack extra tag flags in the low two bits) (and to be clear: I'm just writing the Rust FFI bindings; I'm not the author of the native library that I'm interacting with).

I could use *mut T but it would mean I would have to special-case the Objective-C type encodings for each struct that does this, and I'd probably have to create a union (of usize and *mut T) so I can assign the correct pointer to the static variable (and later at run time twiddle the bits as needed).

I'll stop pushing for this, though.

@oli-obk
Copy link
Contributor

oli-obk commented Oct 2, 2018

so I can assign the correct pointer to the static variable (and later at run time twiddle the bits as needed).

Why can't you just use the *mut T and during your runtime twiddling convert to a usize, do the twiddling, and then convert back? That seems strictly more correct than storing a pointer in a usize just because you sometimes need to modify its bits direclty.

@RalfJung
Copy link
Member

RalfJung commented Oct 2, 2018

It might also be an option to turn this into a deny-by-default lint for cases when people really are asking for a footgun?

The interaction with promotion is ugly, though.

@mjbshaw
Copy link
Contributor Author

mjbshaw commented Oct 5, 2018

Why can't you just use the *mut T and during your runtime twiddling convert to a usize, do the twiddling, and then convert back?

That's what I would need to do. But storing the value in *mut T means that the type encoding for the struct can no longer be automatically derived, since *mut T and usize are encoded differently.

It might also be an option to turn this into a deny-by-default lint for cases when people really are asking for a footgun?

I personally would love for this to be a lint. It can be a bit of a footgun, yes, but low level FFI programming is full of footguns.

@RalfJung
Copy link
Member

RalfJung commented Oct 5, 2018

The footgun here, however, is of a different nature: If you ever write &(ADDRESS/2), that code will fail to compile because the compiler thinks it can execute it at compile-time, and when it finds out it cannot, it is too late.

@mjbshaw
Copy link
Contributor Author

mjbshaw commented Oct 6, 2018

Forgive my ignorance, but why is that a problem? Failure to compile is exactly what I'd expect in that scenario.

@RalfJung
Copy link
Member

RalfJung commented Oct 6, 2018

Do you really? This code compiles:

union TransmuteHack<T: Copy> {
    from: T,
    to: usize,
}

pub static VALUE: usize = 42;

fn main() {
    println!("{:?}", unsafe { TransmuteHack { from: &VALUE }.to } / 2);
}

But with promotion, this would not:

union TransmuteHack<T: Copy> {
    from: T,
    to: usize,
}

pub static VALUE: usize = 42;
pub static ADDRESS: usize = unsafe { TransmuteHack { from: &VALUE }.to };

fn main() {
    println!("{:?}", ADDRESS / 2);
}

Notice that println! adds an & in front of the arguments. So in the second case, we have &(ADDRESS/2), which gets promoted and then fails during CTFE. And working around this is quite hard actually, you have to write something like (|| ADDRESS)() / 2 to disable promotion.

@oli-obk Why does the last example compile on stable? Shouldn't it promote and then fail...?

EDIT: Seems like we don't promote uses of a static. Interesting.

@oli-obk
Copy link
Contributor

oli-obk commented Oct 8, 2018

For these I don't see the point in warning. If having a runtime local of reference type with NULL as its value is UB, why isn't a static once the program runs?

@RalfJung
Copy link
Member

RalfJung commented Oct 8, 2018

Oh I see, you are saying if the info is any good we should make it a hard error. Makes sense.

@pietroalbini pietroalbini added this to Triaged in 1.30 regressions via automation Oct 11, 2018
@pietroalbini pietroalbini added this to the 1.30 milestone Oct 11, 2018
@pietroalbini pietroalbini removed the regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. label Oct 11, 2018
@pietroalbini pietroalbini modified the milestone: 1.30 Oct 11, 2018
@nikomatsakis
Copy link
Contributor

@RalfJung

So, as long as we decide we will never want to promote expressions depending on the value of a static

I didn't quite understand this part. You did say that "It seems that it is not possible to refer to statics in constants or promoteds", so I guess you mean that -- today -- everything should be fine, since the tainted value stays in "runtime land". (Also, since constants cannot refer to statics (e.g., const X: u32 = Y), so they can't sneak into a promoted indirectly that way.)

But, if we accept these sorts of statics, then in the future we could not lift those restrictions without the results depending on the precise value of the static?

@oli-obk
Copy link
Contributor

oli-obk commented Oct 11, 2018

It's an either or. Either we allow referring to statics in constants and promoteds or we allow creating statics with const-unsafe values. We can't have both.

allow referring to statics don't allow referring to statics
allow statics with compile-time bad values can turn runtime code into hard errors without it being a problem at runtime OK
check statics' initializer value like const values are checked OK OK (today)

@RalfJung
Copy link
Member

Indeed. However, the bottom-left corner of that square has some other quirks -- we have to be very careful in what we let you do with the statics. Taking the address is always okay, but that remains the case even if we allow bad statics! So what remains is read-access to immutable statics (their type must be Freeze). We cannot allow read-access to mutable statics (static mut, but also static _: Cell<T>) because they could read differently at run-time.

So we don't lose all that much here.

@oli-obk
Copy link
Contributor

oli-obk commented Oct 11, 2018

Addresses are not ok if we ever want to allow dereferencing. or do we already do that? Dereferencing a reference to a static is just as bad as directly accessing it

@RalfJung
Copy link
Member

Ah right. I think it'd be fine to never allow dereferencing in promoteds, but for const that seems awfully restrictive.

@mjbshaw
Copy link
Contributor Author

mjbshaw commented Oct 11, 2018

These comments have helped clarify some of my initial confusion as to why ADDRESS/2 would be problematic. I didn't realize that:

  1. Constant promotion is a one-way street. That is, once the compiler decides to promote something to a constant and evaluate it at compile time, the compiler won't fall back and "demote" it to a runtime value if it determines it can't actually compute the value at compile time.
  2. Constant promotion is done in non-constant statements and functions (i.e. regular fns, not const fns).

I don't really understand why either of these things are the case. Item 1 would be taken care of if the compiler could "demote" something it had previously promoted and just fall back to evaluating it at run time. Item 2 seems to be unnecessary? The constant propagation optimization pass already exists, so I'm not sure what additional performance benefits are to be gained by doing item 2, and I can't see any additional non-performance benefits that come from constant promotion in regular fns.

I'm just a user of Rust, though, and obviously lack much of the big-picture design efforts and goals that are ongoing right now, so forgive me if my reply is super naive or outright wrong.

@RalfJung
Copy link
Member

RalfJung commented Oct 11, 2018

I'm not sure what additional performance benefits are to be gained by doing item 2

Promotion is not at all about performance. It's about letting you write

fn foo() -> &'static i32 { &42 } // note the lifetime!

I suggest you have a look at the RFC for some further motivation.

This also connects to item 1: The order of events is "we decide what to promote", "we do borrow checking", "we do const evaluation". Promoted things get the 'static lifetime, which is exploited by the borrow checker. If later const evaluation fails... well, bad luck, we already did borrow checking assuming it had 'static lifetime so we can't take that back.

@aturon
Copy link
Member

aturon commented Oct 11, 2018

@rfcbot fcp close

@rfcbot
Copy link

rfcbot commented Oct 11, 2018

Team member @aturon has proposed to close this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-close This PR / issue is in PFCP or FCP with a disposition to close it. labels Oct 11, 2018
@RalfJung
Copy link
Member

Just for context, the lang team is suggesting to close this because it got accidentally stabilized. This could still be a useful feature, but then it should go through the usual process for new features (i.e., an RFC), and not happen by accident.

@nrc nrc removed the I-nominated label Oct 11, 2018
@mjbshaw
Copy link
Contributor Author

mjbshaw commented Oct 12, 2018

Promotion is not at all about performance. It's about letting you write

fn foo() -> &'static i32 { &42 } // note the lifetime!

I see; that makes sense. Thank you. Sorry I kept missing the significance of this.

Okay, well with that in mind I really can't argue against closing this. Thank you all (especially @RalfJung!) for looking into this thoroughly (and for your patience with me).

@pnkfelix
Copy link
Member

discussed at compiler team meeting. FCP (to close) in process.

@rfcbot rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Oct 18, 2018
@rfcbot
Copy link

rfcbot commented Oct 18, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@pnkfelix
Copy link
Member

closing ahead of actual release in a few hours.

@pietroalbini pietroalbini moved this from Triaged to Won't fix in 1.30 regressions Oct 26, 2018
@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Oct 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-const-eval Area: constant evaluation (mir interpretation) disposition-close This PR / issue is in PFCP or FCP with a disposition to close it. finished-final-comment-period The final comment period is finished for this PR / Issue. regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
No open projects
1.30 regressions
  
Won't fix
Development

No branches or pull requests

10 participants