Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document the current MIR semantics that are clear from existing code #95320

Merged
merged 11 commits into from
Apr 12, 2022

Conversation

JakobDegen
Copy link
Contributor

This PR adds documentation to places, operands, rvalues, statementkinds, and terminatorkinds that describes their existing semantics and requirements. In many places the semantics depend on the Rust memory model or other T-Lang decisions - when this is the case, it is just noted as such with links to UCG issues where possible. I'm hopeful that none of the documentation added here can be used to justify optimizations that depend on the memory model. The documentation for places and operands probably comes closest to running afoul of this - if people think that it cannot be merged as is, it can definitely also be taken out.

The goal here is to only document parts of MIR that seem to be decided already, or are at least depended on by existing code. That leaves quite a number of open questions - those are marked as "needs clarification." I'm not sure what to do with those in this PR - we obviously can't decide all these questions here. Should I just leave them in as is? Take them out? Keep them in but as // instead of /// comments?

If this is too big to review at once, I can split this up.

r? rust-lang/mir-opt

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Mar 26, 2022
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 26, 2022
@JakobDegen
Copy link
Contributor Author

JakobDegen commented Mar 26, 2022

I've left a bunch of FIXME's in here. I may try and get to some of them before merging, but they should all be non-critical and can just be added later.

The test change has a relevant Zulip conversation

Also, I didn't add anything for SetDiscriminant pending #95125 . If we decide against such a change, I'll put up a PR documenting the current semantics

/// pointed to type.
///
/// **Needs clarification**: What about metadata resulting from dereferencing wide pointers (and
/// possibly from accessing unsized locals - not sure how those work)? That probably deserves to go
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A place stores an address and an optional meta value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is probably the right thing to have. That being said, I'm going to leave this open until I get a chance to more thoroughly go through all of the documentation and deal with DSTs (or probably someone else should do that because I know very little about it)

/// Not all of these are allowed at every [`MirPhase`] - when this is the case, it's stated below.
///
/// Computing any rvalue begins by evaluating the places and operands in the rvalue in the order in
/// which they appear. These are then used to produce a "value" - the same kind of value that an
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be left undefined in which order theybare evaluated.

Copy link
Contributor Author

@JakobDegen JakobDegen Mar 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the ordering question to a "needs clarification" section in an effort to only document things which are already decided.

That being said, I don't think this is a good option. First of all, I'm not sure "order is left indeterminate" implies "any attempt on depending on the order is UB" - it might be the case that we instead get two distinct well-defined possibilities. More importantly though, this doesn't feel like a thing that MIRI can implement, which is concerning.

I also can't produce an example, but I would not be surprised if this is actually required for current Rust, since the evaluation order in Rust is well-defined. I can't get MIR building to produce such MIR, but it would at least imply that the temporaries produced by this are required:

pub fn foo(a: &mut i32) -> i32 {
    let derived = &mut *a;
    *derived + *a
}

// terribly important that they pass the validator. However, I think other passes might
// still see them, in which case they might be surprised. It would probably be better if we
// didn't put this through the MIR pipeline at all.
if matches!(body.source.instance, InstanceDef::Intrinsic(..) | InstanceDef::Virtual(..)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intrinsics may be codegened in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. I think at that point this check should be refined to verify those bodies that do get codegened only. Even better would be if code elsewhere got changed to stop passing intrinsics that don't get codegened through the MIR validator. cc @oli-obk

/// executed `StorageDead`? How about two `StorageLive`s without an intervening `StorageDead`?
/// Two `StorageDead`s without an intervening `StorageLive`? LLVM says yes, poison, yes. If the
/// answer to any of these is "no," is breaking that rule UB or is it an error to have a path in
/// the CFG that might do this?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Miri should be made to match whatever is decided. It is currently very relaxed.

///
/// **Needs clarification**: We currently require that the LHS place not overlap with any place
/// read as part of computation of the RHS. This requirement is under discussion in [#68364]. As
/// a part of this discussion, it is also unclear in what order the components are evaluated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think order should be left indetermimate so any attempt on depending on the order is UB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(see the comment above - basically the same concerns apply here)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think leaving evaluation order indeterminate is a bad idea. We should specify some order eventually, and probably it should be left-to-right. (That's what Miri does.)

The disjointness thing can be enforced without leaving the order indeterminate, using the aliasing model instead. That's better because it can be checked by Miri.

@@ -187,7 +248,7 @@ pub enum TerminatorKind<'tcx> {
/// This allows the memory occupied by "by-value" arguments to be
/// reused across function calls without duplicating the contents.
args: Vec<Operand<'tcx>>,
/// Destination for the return value. If some, the call is converging.
/// Destination for the return value. If none, the call necessarily diverges.
destination: Option<(Place<'tcx>, BasicBlock)>,
Copy link
Member

@bjorn3 bjorn3 Mar 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must not alias the place given to any Operand::Move passed as argument. All codegen backends depend on this property to allow passing Operand::Move by reference in case of a type not fitting inside registers without having to male a copy on the stack or having to restrict the callee to not mutate the value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added in a mention of that in the "needs clarification" section above. Based on the status of the linked issue, it seems that it is still unclear how we are justifying that optimization

@JakobDegen
Copy link
Contributor Author

cc @RalfJung - want to check that everything written here (of specific concern is the place and operand stuff) is at least compatible with your model of MIR

/// Assign statements roughly correspond to an assignment in Rust proper (`x = ...`) except
/// without the possibility of dropping the previous value (that must be done separately, if at
/// all). The *exact* way this works is undecided. It probably does something like evaluating
/// the LHS and RHS, and then doing the inverse of a place to value conversion to write the
Copy link
Member

@RalfJung RalfJung Apr 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I think "place to value conversion" is a terrible term, though I know it is standard in C. This operation accesses memory! It shouldn't be described as if it was a simple cast.

So I'd say something more along the lines of: it evaluates the LHS and RHS to places, loads a value from the place denoted by the RHS, and then stores that value into the place denoted by the LHS.

But also, this doesn't even do place-to-value conversion? The RHS is evaluated to a value, and that is stored to the place denoted by the LHS.

/// **Needs clarification**: Is it permitted to have two `StorageLive`s without an intervening
/// `StorageDead`? Two `StorageDead`s without an intervening `StorageLive`? LLVM says poison,
/// yes. If the answer to any of these is "no," is breaking that rule UB or is it an error to
/// have a path in the CFG that might do this?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think CFG paths are relevant. What's relevant is the actual sequence of statements that are executed when the function runs. There's basically a boolean flag dead-or-alive for each local that starts out 'dead' for locals with annotations, and that is set appropriately by these annotations. Accessing (evaluating the place expression) of a local is UB if it is currently 'dead'.

Actually it's more than that, each StorageLive allows the local to be put at a different location in memory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think CFG paths are relevant

Well, because of the MaybeStorageLiveLocals in the validator, this is already the case. Certainly CFG paths should not be relevant in the spec/miri, but the docs here have to also mention what is required by the validator, which might have a stronger opinion

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the storage check currently present in the validator should be considered to be merely a debugging aid. It is possible to write unsafe code that uses local when it does not have storage. With additional optimizations present such code would fail the check. When we get to that point the check should be disabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is a good point. With the current setup, we can't turn *r into x even if we know that r = &mut x;, because that might turn UB into an ICE. I'll mention this in the documentation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Come to think of it, we should probably mention that we often don't want to "remove" UB in MIR optimizations in general, since keeping it visible to the codegen backend is important.

For example, we don't want to lower switchInt [0 => bb3] otherwise: bbUnreachable to goto bb3, since that keeps LLVM from knowing about the impossible condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, these kinds of things should definitely be documented, but imo not here. I do want to start an organized way of defining "canonicalized MIR" at some point, because I think that will be increasingly important, and this seems like the kind of thing to include there

/// complicated. They are only legal on parent places that are references, pointers, or `Box`. A
/// `Deref` projection begins by creating a value from the parent place, as if by
/// [`Operand::Copy`]. It then dereferences the resulting pointer, creating a place of the
/// pointed to type.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this operation can cause UB: the crated place must be dereferencable and suitably aligned, as determined by its type.

/// One possible model that I believe makes sense is that "part of memory" is actually just the
/// address of the beginning of the referred to range of bytes. For sized types, the size of the
/// range is then stored in the type, and for unsized types it's stored (possibly indirectly,
/// through a vtable) in the metadata.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what Miri does, and I think it is the right model.

///
/// [value-def]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/value-domain.md
///
/// The most common way to create values is via a place to value conversion. A place to value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah now this is the right place for the rant that I put earlier. ;)

I don't think we should all this a "conversion". That sounds like an as cast, like just putting something into a different representation. That's not at all what happens though. This is a load from memory, and should be described as that.

@oli-obk
Copy link
Contributor

oli-obk commented Apr 11, 2022

@bors r+

@bors
Copy link
Contributor

bors commented Apr 11, 2022

📌 Commit 8732bf5 has been approved by oli-obk

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 11, 2022
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this pull request Apr 12, 2022
Document the current MIR semantics that are clear from existing code

This PR adds documentation to places, operands, rvalues, statementkinds, and terminatorkinds that describes their existing semantics and requirements. In many places the semantics depend on the Rust memory model or other T-Lang decisions - when this is the case, it is just noted as such with links to UCG issues where possible. I'm hopeful that none of the documentation added here can be used to justify optimizations that depend on the memory model. The documentation for places and operands probably comes closest to running afoul of this - if people think that it cannot be merged as is, it can definitely also be taken out.

The goal here is to only document parts of MIR that seem to be decided already, or are at least depended on by existing code. That leaves quite a number of open questions - those are marked as "needs clarification." I'm not sure what to do with those in this PR - we obviously can't decide all these questions here. Should I just leave them in as is? Take them out? Keep them in but as `//` instead of `///` comments?

If this is too big to review at once, I can split this up.

r? rust-lang/mir-opt
bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 12, 2022
…askrgr

Rollup of 7 pull requests

Successful merges:

 - rust-lang#95320 (Document the current MIR semantics that are clear from existing code)
 - rust-lang#95722 (pre-push.sh: Use python3 if python is not found)
 - rust-lang#95881 (Use `to_string` instead of `format!`)
 - rust-lang#95909 (rustdoc: Reduce allocations in a `theme` function)
 - rust-lang#95910 (Fix crate_type attribute to not warn on duplicates)
 - rust-lang#95920 (use `Span::find_ancestor_inside` to get right span in CastCheck)
 - rust-lang#95936 (Fix a bad error message for `relative paths are not supported in visibilities` error)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 1d35179 into rust-lang:master Apr 12, 2022
@rustbot rustbot added this to the 1.62.0 milestone Apr 12, 2022
@JakobDegen JakobDegen deleted the mir-docs branch April 17, 2022 19:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet