Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix variant index / discriminant confusion in uninhabited enum branching #89764

Merged
merged 3 commits into from
Jan 20, 2022

Conversation

tmiasko
Copy link
Contributor

@tmiasko tmiasko commented Oct 11, 2021

Fix confusion between variant index and variant discriminant. The pass
incorrectly assumed that for Variants::Single variant index is the same as
variant discriminant.

r? @wesleywiser

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 11, 2021
@tmiasko
Copy link
Contributor Author

tmiasko commented Oct 11, 2021

@RalfJung could you check if this makes sense from MIR semantics perspective. Thanks.

@tmiasko
Copy link
Contributor Author

tmiasko commented Oct 12, 2021

I was looking it if is possible to construct an example demonstrating bugs in
the previous implementation, but that seems to be surprisingly non-trivial,
due to minute detail.

The use of variant index instead of discriminant for single variant layout
seems the most problematic, but in that case MIR building wouldn't have a
reason to check the discriminant in the first place.

Alternatively, one could construct a case were one variant becomes uninhabited
after a substitution during inlining, but the pass never runs again after that.
(EDIT: ah, this would also require variants with data and repr annotation,
which in turn will result in Variants::Multiple which works fine).

The user written match on discriminant doesn't trigger the optimization either
due to mismatch in operand move vs operand copy and extra storage markers:

#![feature(core_intrinsics)]
use core::intrinsics::discriminant_value;

pub enum E { A = 1 }

pub fn f(e: &E) -> u32 {
    match {discriminant_value(e)} {
        1 => 1,
        _ => 0,
    }
}

fn main() {
    assert_eq!(f(&E::A), 1);
}

So this might not be an immediate soundness concern, but who knows ...

@RalfJung
Copy link
Member

RalfJung commented Oct 13, 2021

@RalfJung could you check if this makes sense from MIR semantics perspective. Thanks.

I'm sorry, but it is rather hard to figure out from a diff of a MIR optimization what the semantic question is. Could you be more explicit?

Sadly I won't have time to review this PR.

The pass incorrectly assumed
that Variants::Single indicates that there is a single inhabited variant.

This (for the 2nd time today^^) reminds me of discussions with @eddyb about enum layouts... yeah, Single really means "no more than one inhabited variant", which is indeed rather confusing. Do you want to submit a PR that updates the doc comment? That might help a bit, at least.

The pass
incorrectly assumed that for Variants::Single variant index is the same as
variant discriminant.

The read_discriminant and write_discriminant functions in the CTFE interpreter might be illuminating pieces of code to read. They are pretty well commented (feel free to complain if more comments would be helpful), and I think they are mostly getting things right these days, though there are still corner cases (#89765).

@wesleywiser
Copy link
Member

I'm sorry, but it is rather hard to figure out from a diff of a MIR optimization what the semantic question is. Could you be more explicit?

I think question is: is it valid to turn

bb1: {
    _2 = discriminant(_1);  // typeof(_1) is an uninhabited enum
    switchInt(move _2) -> [0_isize: bb3, 1_isize: bb4, otherwise: bb2]; 
}

into

bb1: {
    _2 = discriminant(_1);  // typeof(_1) is an uninhabited enum
    unreachable;
}

At a glance, that seems valid to me since if the type is uninhabited, we can't have a valid value in _1 anyway so this code should be entirely dead.

@RalfJung
Copy link
Member

RalfJung commented Oct 13, 2021

I guess the question is if we want to consider MIR like this to be UB:

enum Void {}

_1: Void

bb1 {
  _2 = discriminant(_1);
  return;
}

Due to #89765, Miri would currently ICE on such MIR. My preferred solution in that issue would say that asking for the discriminant of Void is like asking for the discriminant of (): both types do not really have discriminants, so we just return 0. That would mean it is not UB, and the optimization is not valid.

But I take it from your question that you would prefer to make this UB? What about the case where _1 has type !, or some other non-enum uninhabited type? What about the case where it has type bool but is currently initialized to an invalid value? We could say the Discriminant MIR primitive requires the place it operates on to be fully initialized, that would trivially give you what you want -- but it might be too aggressive?

@tmiasko
Copy link
Contributor Author

tmiasko commented Oct 13, 2021

The semantics question would be: if MIR reads an enum discriminant and switches on its value, do we require the discriminant value to be valid, and are valid discriminants those that correspond to inhabited enum variants only? (Note that this pass is currently concerned only with enums).

As far as I understand, the LLVM code generation already makes this assumption. If type is uninhabited, trying to read a discriminant produces undef and subsequently branching introduces undefined behaviour. If an enum is inhabited, there is a range metadata attached when loading the encoded discriminant and it accounts for uninhabited variants as well (although it is only a single range).

The main motivation for this PR was a fix mismatch between variant index and variant discriminant use, but given the current behavior of discriminant_for_variant it also forced me to make a decision what to do about enums without variants.

I also wanted to make sure we are on the same page with regards to the semantics, especially given #89765. It seems perfectly fine to me to make this more conservative if needs be.

@RalfJung
Copy link
Member

As far as I understand, the LLVM code generation already makes this assumption. If type is uninhabited, trying to read a discriminant produces undef

It does? Is that some explicit case in our codegen backend?

@tmiasko
Copy link
Contributor Author

tmiasko commented Oct 13, 2021

As far as I understand, the LLVM code generation already makes this assumption. If type is uninhabited, trying to read a discriminant produces undef

It does? Is that some explicit case in our codegen backend?

/// Obtain the actual discriminant of a value.
pub fn codegen_get_discr<Bx: BuilderMethods<'a, 'tcx, Value = V>>(
self,
bx: &mut Bx,
cast_to: Ty<'tcx>,
) -> V {
let cast_to = bx.cx().immediate_backend_type(bx.cx().layout_of(cast_to));
if self.layout.abi.is_uninhabited() {
return bx.cx().const_undef(cast_to);
}

@RalfJung
Copy link
Member

Okay, that does not even check if the type is an enum. So either we need to special-case uninhabited types in Miri (and the semantics) or we need to say that the operand needs to satisfy the validity invariant. I think the former is a bad idea, the only thing that is special about uninhabited types is that their validity invariant is never satisfied.

@apiraino apiraino added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Oct 14, 2021
@tmiasko
Copy link
Contributor Author

tmiasko commented Oct 16, 2021

If the current state is that there is a basis for the code generation and this MIR pass (in the valididy requirement for operand), but we have yet to decided this more formally, I think, it would make sense to land this, since it is a bug fix and does not make any new assumptions.

At the same time we can independently consider if we want to continue relying on this semantics, and whether the code generation and this MIR pass need to be changed both.

@RalfJung
Copy link
Member

Fair -- but then Miri should also be changed to actually check the validity invariant in read_discriminant, to match the (for now) intended semantics.

@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 8, 2021
@wesleywiser
Copy link
Member

Sorry for the delay, just getting caught up on GH notifications now.

Fair -- but then Miri should also be changed to actually check the validity invariant in read_discriminant, to match the (for now) intended semantics.

@RalfJung do you think we should change Miri before/in conjunction with this PR or are you simply saying we should do that at some point in the future?

@RalfJung
Copy link
Member

I am doing the Miri change in #90895.

@wesleywiser
Copy link
Member

Thanks @RalfJung!

JohnTitor added a commit to JohnTitor/rust that referenced this pull request Nov 18, 2021
…=oli-obk

require full validity when determining the discriminant of a value

This resolves (for now) the semantic question that came up in rust-lang#89764: arguably, reading the discriminant of a value is 'using' that value, so we are in our right to demand full validity. Reading a discriminant is somewhat special in that it works for values of *arbitrary* type; all the other primitive MIR operations work on specific types (e.g. `bool` or an integer) and basically implicitly require validity as part of just "doing their job".

The alternative would be to just require that the discriminant itself is valid, if any -- but then what do we do for types that do not have a discriminant, which kind of validity do we check? [This code](https://github.com/rust-lang/rust/blob/81117ff930fbf3792b4f9504e3c6bccc87b10823/compiler/rustc_codegen_ssa/src/mir/place.rs#L206-L215) means we have to at least reject uninhabited types, but I would rather not special case that.

I don't think this can be tested in CTFE (since validity is not enforced there), I will add a compile-fail test to Miri:
```rust
#[allow(enum_intrinsics_non_enums)]
fn main() {
    let i = 2u8;
    std::mem::discriminant(unsafe { &*(&i as *const _ as *const bool) }); // UB
}
```

(I tried running the check even on the CTFE machines, but then it runs during ConstProp and that causes all sorts of problems. We could run it for ConstEval but not ConstProp, but that simply does not seem worth the effort currently.)

r? `@oli-obk`
JohnTitor added a commit to JohnTitor/rust that referenced this pull request Nov 18, 2021
…=oli-obk

require full validity when determining the discriminant of a value

This resolves (for now) the semantic question that came up in rust-lang#89764: arguably, reading the discriminant of a value is 'using' that value, so we are in our right to demand full validity. Reading a discriminant is somewhat special in that it works for values of *arbitrary* type; all the other primitive MIR operations work on specific types (e.g. `bool` or an integer) and basically implicitly require validity as part of just "doing their job".

The alternative would be to just require that the discriminant itself is valid, if any -- but then what do we do for types that do not have a discriminant, which kind of validity do we check? [This code](https://github.com/rust-lang/rust/blob/81117ff930fbf3792b4f9504e3c6bccc87b10823/compiler/rustc_codegen_ssa/src/mir/place.rs#L206-L215) means we have to at least reject uninhabited types, but I would rather not special case that.

I don't think this can be tested in CTFE (since validity is not enforced there), I will add a compile-fail test to Miri:
```rust
#[allow(enum_intrinsics_non_enums)]
fn main() {
    let i = 2u8;
    std::mem::discriminant(unsafe { &*(&i as *const _ as *const bool) }); // UB
}
```

(I tried running the check even on the CTFE machines, but then it runs during ConstProp and that causes all sorts of problems. We could run it for ConstEval but not ConstProp, but that simply does not seem worth the effort currently.)

r? ``@oli-obk``
@RalfJung
Copy link
Member

It seems like MIR building does not agree with the idea that only valid values have their discriminant read: #91029.

@RalfJung
Copy link
Member

Furthermore, this triggers a Stacked Borrows violation here because optional.is_some() now reads the entire Option value, not just the discriminant, which is in conflict with the mutable reference to the contents of that Option.

This seems like a pretty clear sign that reading a discriminant should do the minimal amount of work necessary to determine the discriminant value -- but then there is no justification for this part of codegen. Cc @eddyb

@tmiasko
Copy link
Contributor Author

tmiasko commented Nov 19, 2021

It seems like MIR building does not agree with the idea that only valid values have their discriminant read: #91029.

Ah, after a partial move from a variant an elaborated drop still need to read the enum discriminant to determine which variant is active.

This seems like a pretty clear sign that reading a discriminant should do the minimal amount of work necessary to determine the discriminant value

This suggest an approach which requires only a discriminant to be valid?

@RalfJung
Copy link
Member

Ah, after a partial move from a variant an elaborated drop still need to read the enum discriminant to determine which variant is active.

I did not realize we support partial moves out of enums. In principle we do not need to re-read the discriminant since the after the partial move we know which variant it is in, I think... but that is probably tricky to do.

This suggest an approach which requires only a discriminant to be valid?

Well, that's what we did before. But enums without variants do not have a discriminant so it is trivially valid. Thus the optimization you want to do is unsound under this approach.

@tmiasko
Copy link
Contributor Author

tmiasko commented Nov 19, 2021

This suggest an approach which requires only a discriminant to be valid?

Well, that's what we did before. But enums without variants do not have a discriminant so it is trivially valid.

Another way to look at this would be to say that they are never valid.

Sorry for the delay, just getting caught up on GH notifications now.

Fair -- but then Miri should also be changed to actually check the validity invariant in read_discriminant, to match the (for now) intended semantics.

RalfJung do you think we should change Miri before/in conjunction with this PR or are you simply saying we should do that at some point in the future?

Anyway, I am worried that this discussion distracted us from the actual purpose of this pull request, and delayed fixing the issues it was intended to fix, while being completely about preexisting state of the matter.

@RalfJung
Copy link
Member

Another way to look at this would be to say that they are never valid.

That would be a peculiar special case. Many types do not have discriminants and asking for their discriminant is always fine. I am not a fan if special-casing zero-variant enums. The codegen backend goes even further and specializes uninhbaited types. But again it seems entirely ad-hoc that (i32, ()) has an always-valid discriminant and (i32, !) has an always-invalid discriminant -- I don't like such special cases, they make everything more complicated and harder to reason about.

If we really want to treat zero-variant enums separately we need a Variants::None layout so we can properly reflect this in a type's layout. Currently, zero-variant enums are Variants::Single and we need a lot of ad-hoc special casing all over the place because of this.

Anyway, I am worried that this discussion distracted us from the actual purpose of this pull request, and delayed fixing the issues it was intended to fix, while being completely about preexisting state of the matter.

Well, we learned that the preeexistng semantics are weird at best, and IMO we should change the codegen backend to no longer special-case uninhabited types.

For some time CTFE has been using a dedicated MIR which is never
optimized, so the check for promoted became redundant.
The change is limited to the iteration over indices instead of using
`basic_blocks_mut()` directly, in the case the previous implementation
intentionally avoided invalidating the caches stored in MIR body.
Previously for enums using the `Variants::Single` layout, the variant
index was being confused with its discriminant. For example, in the case
of `enum E { A = 1 }`.

Use `discriminant_for_variant` to avoid the issue.
@tmiasko
Copy link
Contributor Author

tmiasko commented Nov 20, 2021

@tmiasko tmiasko changed the title Fix uninhabited enum branching Fix variant index / discriminant confusion in uninhabited enum branching Nov 20, 2021
RalfJung added a commit to RalfJung/rust that referenced this pull request Nov 20, 2021
…value"

This reverts commit 0a2b7d7, reversing
changes made to 47c1bd1.
This caused several unforeseen problems:
- rust-lang#91029
- rust-lang#89764 (comment)
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Nov 20, 2021
Revert "require full validity when determining the discriminant of a value"

This reverts commit 0a2b7d7, reversing
changes made to 47c1bd1.
This caused several unforeseen problems:
- rust-lang#91029
- rust-lang#89764 (comment)

So I think it's best to revert for now while we keep discussing the MIR semantics of getting a discriminant.

r? `@oli-obk`
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Nov 20, 2021
Revert "require full validity when determining the discriminant of a value"

This reverts commit 0a2b7d7, reversing
changes made to 47c1bd1.
This caused several unforeseen problems:
- rust-lang#91029
- rust-lang#89764 (comment)

So I think it's best to revert for now while we keep discussing the MIR semantics of getting a discriminant.

r? `@oli-obk`
@RalfJung
Copy link
Member

I opened #91095 for the MIR semantics issue. This already exists in master so it does not have to block this PR.

@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 12, 2021
Copy link
Member

@wesleywiser wesleywiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add a test that shows the effect of this change but I don't think it needs to block merging this.

@wesleywiser
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented Jan 20, 2022

📌 Commit c3e71d8 has been approved by wesleywiser

@bors
Copy link
Contributor

bors commented Jan 20, 2022

🌲 The tree is currently closed for pull requests below priority 600. This pull request will be tested once the tree is reopened.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 20, 2022
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 20, 2022
…askrgr

Rollup of 13 pull requests

Successful merges:

 - rust-lang#89747 (Add MaybeUninit::(slice_)as_bytes(_mut))
 - rust-lang#89764 (Fix variant index / discriminant confusion in uninhabited enum branching)
 - rust-lang#91606 (Stabilize `-Z print-link-args` as `--print link-args`)
 - rust-lang#91694 (rustdoc: decouple stability and const-stability)
 - rust-lang#92183 (Point at correct argument when async fn output type lifetime disagrees with signature)
 - rust-lang#92582 (improve `_` constants in item signature handling)
 - rust-lang#92680 (intra-doc: Use the impl's assoc item where possible)
 - rust-lang#92704 (Change lint message to be stronger for &T -> &mut T transmute)
 - rust-lang#92861 (Rustdoc mobile: put out-of-band info on its own line)
 - rust-lang#92992 (Help optimize out backtraces when disabled)
 - rust-lang#93038 (Fix star handling in block doc comments)
 - rust-lang#93108 (:arrow_up: rust-analyzer)
 - rust-lang#93112 (Fix CVE-2022-21658)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit d188287 into rust-lang:master Jan 20, 2022
@rustbot rustbot added this to the 1.60.0 milestone Jan 20, 2022
@tmiasko tmiasko deleted the uninhabited-enums branch January 20, 2022 23:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants