Better documentation of our MIR dialects #86299

RalfJung · 2021-06-14T16:43:33Z

We like to pretend that MIR is a single language, but that is not actually true. There are at least two different languages represented by MIR; I will call them MIR "dialects" in the following:

Pre-drop-elaboration MIR: this dialect implicitly keeps a drop flag for each local variable. Move operands and calling Drop sets that flag. The Drop and DropAndReplace terminators honor that flag and do not call drop_in_place when the drop flag indicates this variable has already been moved/dropped.
Post-drop-elaboration MIR: this dialect has no implicit drop flags, and each Drop terminator indicates an unconditional call to drop_in_place. (DropAndReplace terminators AFAIK cannot exist any more in this dialect.)

The problem with multiple dialects is that they can have different rules for interpreters, code generators, and optimizations, so we need to be careful not to use the wrong optimization with the wrong dialect.

Miri can only interpret the latter dialect. All our optimizations run post-drop-elaboration, so I assume they are all correct only for that second dialect (though some might also be correct for the first one).

I'd like to propose that we treat these dialects more explicitly: a mir::Body should indicate which dialect it is written in, such that optimizations, codegen, and Miri can ensure they are working with MIR of the right dialect. This becomes more pressing since #86155 is suggesting to add yet another dialect.

Cc @oli-obk -- there is some overlap here with the "phase" in the MIR validator, but I don't think this is entirely the same concept. Different dialects correspond to different phases, but we might have phases within a dialect where certain instructions are being lowered without changing the operational semantics of the language.

Cc @bjorn3 @rust-lang/wg-mir-opt

The text was updated successfully, but these errors were encountered:

bjorn3 · 2021-06-14T16:48:28Z

For drop-elaboration: Maybe split Drop into Drop and DropIfNecessary. The former is used only after drop-elaboration and the latter only before drop-elaboration.

For AbortUnwindingCalls: Why can't this be done before all optimizations and maybe even during mir building? Doesn't running it at the end make MIR inlining remove aborts for unwinding past an inlined extern "C" function?

RalfJung · 2021-06-14T16:59:49Z

Why can't this be done before all optimizations and maybe even during mir building?

I asked @alexcrichton exactly that question in #86155. :)

For drop-elaboration: Maybe split Drop into Drop and DropIfNecessary. The former is used only after drop-elaboration and the latter only before drop-elaboration.

That would be another option, indeed. However, the question of whether the Abstract Machine keeps track of drop flags seems like a rather global one, where a dialect switch seems appropriate to me. Otherwise we have to make slightly subtle arguments based on the fact that yes, drop flags exist in the Abstract Machine, and they are always updated by moving or dropping, but if there are no more DropIfNecessary and DropAndReplace in the code then the drop flags have no effects so we can ignore their existence. IOW, we'd still rely on a "dialect-like" global property, namely that only Drop is used. Optimizations that work on the post-drop-elaboration dialect could still easily be wrong when run on the earlier dialect -- e.g. imagine replacing a move by a copy (which should be a correct transformation post-drop-elaboration).

bjorn3 · 2021-06-14T17:32:10Z

e.g. imagine replacing a move by a copy (which should be a correct transformation post-drop-elaboration).

That can make code slower. In case of move for large arguments only a pointer to the local is passed. For copy and const a copy is first made to an anonymous stackslot and then a pointer to this copy is passed. The callee directly uses the memory pointer to by the passed pointer argument as backing storage for the argument local and may change it. This is not something we can change. It is part of the C calling convention on most systems.

rust/compiler/rustc_codegen_ssa/src/mir/block.rs

Lines 755 to 765 in a216131

    
           // The callee needs to own the argument memory if we pass it 
        
           // by-ref, so make a local copy of non-immediate constants. 
        
           match (arg, op.val) { 
        
               (&mir::Operand::Copy(_), Ref(_, None, _)) 
        
               | (&mir::Operand::Constant(_), Ref(_, None, _)) => { 
        
                   let tmp = PlaceRef::alloca(&mut bx, op.layout); 
        
                   op.val.store(&mut bx, tmp); 
        
                   op.val = Ref(tmp.llval, None, tmp.align); 
        
               } 
        
               _ => {} 
        
           }

I have been meaning to change the call arguments from Operand to Place to make this clearer, but that is currently blocked on simd_* taking a const generic instead of a const argument.

tschuett · 2021-06-14T19:54:49Z

https://mlir.llvm.org encourages and supports to have more dialects.

RalfJung · 2021-06-15T08:14:32Z

That can make code slower. In case of move for large arguments only a pointer to the local is passed. For copy and const a copy is first made to an anonymous stackslot and then a pointer to this copy is passed. The callee directly uses the memory pointer to by the passed pointer argument as backing storage for the argument local and may change it.

How is that observable? Making an extra copy seems fine here.
(There is some kind of typo in the last sentence so I cannot entirely parse it.)

In MIR, as far as I am aware (i.e. in the preliminary spec in my head), move and copy make no difference. Miri also implements it like that. If replacing one by the other doesn't work then Miri must treat them differently somehow. But while I could imagine turning copy into move being problematic (even though, according to Miri -- the closest thing to a MIR spec we have -- this is a fine transformation), I can't see how the inverse, turning a move into a copy, can be problematic. Cc #71117 rust-lang/unsafe-code-guidelines#188

EDIT: Oh, I now realize you just said it can make code slower, not that it is wrong. Sure, but that's not relevant. I didn't say that it is a smart or useful change to make. I just said that it is a correct change to make. That is all that matters.

RalfJung · 2021-06-16T14:31:17Z

https://mlir.llvm.org encourages and supports to have more dialects.

I think dialects are fine if they are treated explicitly in a principled way. Possibly we could take some ideas from MLIR there, are you familiar with the details?

What I am worried about is "accidental" dialects without proper documentation and care.

tschuett · 2021-06-16T14:36:41Z

https://mlir.llvm.org encourages and supports to have more dialects.

I think dialects are fine if they are treated explicitly in a principled way. Possibly we could take some ideas from MLIR there, are you familiar with the details?

What I am worried about is "accidental" dialects without proper documentation and care.

Dialects in MLIR are almost for free. You can define your own operations independent of other dialects.

It sounds as if you are talking about two implicit dialects (pre and post drop-elaboration). Maybe make the distinction explicit and call them out as two dialects with different names. I assume the second dialect would be simpler than the first one.

It smells, if you distinguish dialects by a bit or an enum instead of types.

inquisitivecrystal · 2021-06-20T07:51:40Z

@rustbot label +C-enhancement +T-compiler +T-doc

bjorn3 · 2021-06-20T08:02:01Z

This is not user-facing documentation, so I don't think it it relevant to the doc team.

@rustbot label -T-doc

RalfJung · 2021-08-06T14:20:37Z

#86155 landed, so we now also have a phase during which "unwinding from a fn call inside a nounwind function" implicitly aborts, and another phase during which it is UB.

pierwill · 2022-03-24T18:11:10Z

Would this be better in the rustc API docs or the rustc dev guide, or maybe parts of both?

RalfJung · 2022-03-24T18:43:09Z

Parts of it is actual code (like a mir::Body "knowing" which dialect it is written in), for the rest -- it matters less where the docs are than that they are in some canonical place. :)

RalfJung · 2022-09-02T09:40:20Z

#99102 made some progress here (Cc @JakobDegen).

One thing I am a bit confused by is that the transition from "conditional drop + implicit abort-on-unwind" to making both of these explicit seems to happen atomically when doing from analysis MIR to runtime MIR, but AFAIK these are two separate passes. So what do we say about the MIR in between those passes?

JakobDegen · 2022-09-02T10:00:10Z

My current approach is basically that we say nothing, and that the fact that this atomic transition occurs over two implementations of MirPass is an implementation detail. In practice, what this means is that there is no expectation that these passes can be reordered or that executing some but not others won't cause breakage. That's not ideal, but I think the way to fix this is to have fewer semantic differences, and not to define more Mir dialects

RalfJung mentioned this issue Jun 14, 2021

rustc: Fill out remaining parts of C-unwind ABI #86155

Merged

bjorn3 added the A-mir Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html label Jun 14, 2021

rustbot removed the A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools label Jun 20, 2021

RalfJung mentioned this issue Jun 27, 2022

Revise how MIR variants are distinguished rust-lang/compiler-team#522

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better documentation of our MIR dialects #86299

Better documentation of our MIR dialects #86299

RalfJung commented Jun 14, 2021

bjorn3 commented Jun 14, 2021

RalfJung commented Jun 14, 2021

bjorn3 commented Jun 14, 2021 •

edited

Loading

tschuett commented Jun 14, 2021

RalfJung commented Jun 15, 2021 •

edited

Loading

RalfJung commented Jun 16, 2021

tschuett commented Jun 16, 2021 •

edited

Loading

inquisitivecrystal commented Jun 20, 2021 •

edited

Loading

bjorn3 commented Jun 20, 2021

RalfJung commented Aug 6, 2021

pierwill commented Mar 24, 2022

RalfJung commented Mar 24, 2022

RalfJung commented Sep 2, 2022

JakobDegen commented Sep 2, 2022

Better documentation of our MIR dialects #86299

Better documentation of our MIR dialects #86299

Comments

RalfJung commented Jun 14, 2021

bjorn3 commented Jun 14, 2021

RalfJung commented Jun 14, 2021

bjorn3 commented Jun 14, 2021 • edited Loading

tschuett commented Jun 14, 2021

RalfJung commented Jun 15, 2021 • edited Loading

RalfJung commented Jun 16, 2021

tschuett commented Jun 16, 2021 • edited Loading

inquisitivecrystal commented Jun 20, 2021 • edited Loading

bjorn3 commented Jun 20, 2021

RalfJung commented Aug 6, 2021

pierwill commented Mar 24, 2022

RalfJung commented Mar 24, 2022

RalfJung commented Sep 2, 2022

JakobDegen commented Sep 2, 2022

bjorn3 commented Jun 14, 2021 •

edited

Loading

RalfJung commented Jun 15, 2021 •

edited

Loading

tschuett commented Jun 16, 2021 •

edited

Loading

inquisitivecrystal commented Jun 20, 2021 •

edited

Loading