Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Semantics of MIR assignments, around aliasing, ordering, and primitives. #68364

Open
eddyb opened this issue Jan 19, 2020 · 29 comments
Open

Semantics of MIR assignments, around aliasing, ordering, and primitives. #68364

eddyb opened this issue Jan 19, 2020 · 29 comments
Labels
A-mir Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html A-miri Area: The miri tool T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@eddyb
Copy link
Member

eddyb commented Jan 19, 2020

In today's MIR, an indirect assignment like *p = *q; is similar to, but not exactly the same as:

tmp = *q;
*p = tmp;

The differences are:

  • performance: they only produce the same codegen for primitives
    • this is assuming tmp isn't used elsewhere, allowing codegen to treat it like an SSA value, resulting in store(p, load(q)), which is also what *p = *q codegens to
    • for non-primitives, the amount of data being copied doubles, as tmp must be in memory
  • correctness: only *p = *q; is UB (AFAIK) if p..p+size overlaps q..q+size
    • this also likely only affects non-primitives, which have to copy data in memory, but we could decide to ignore the type and always make it UB when they overlap

For the purposes of this discussion, a primitive is:

  • scalar (bool, char, integer, float, or pointer/reference)
  • vector (SIMD-enabled array of scalars)

Scalar pairs likely also should/need to be included, due to how easy they are to support in any code that already handles scalars, and also due to their use in wide pointers/references.


What's interesting about primitives, though, is that some kinds of Rvalues (the RHS of the assignment) always produce primitive values, because they're primitive operations.

The Rvalue variants which are always primitive, today, are:

  • Ref (&T / &mut T - may become dependent on custom DSTs in the future)
  • AddressOf (*const T / *mut T - may become dependent on custom DSTs in the future)
  • Len (usize)
  • Cast, other than unsizing (scalar)
  • BinaryOp, UnaryOp (scalar, or maybe also vector)
  • CheckedBinaryOp (pair of integer and bool - only if we consider scalar pairs to be primitive)
  • NullaryOp(SizeOf) (usize)
  • NullaryOp(Box) (Box<T>)
  • Discriminant (integer)

Which leaves these variants as potentially relying on memory operations to write the result:

  • Use (any type, one copy)
  • Repeat ([T; N], N copies)
  • Cast, specifically unsizing (any type implementing CoerceUnsized, per-field copies)
  • Aggregate (any ADT, per-field copies)

If we want to remain conservative, we could ignore types for the latter, and just assume that the destination of the assignment cannot overlap any memory read in the Operands of the Rvalue.

We could even cement the distinction by moving the always-primitive operations into a new PrimOp enum, and/or move the other Rvalues to their own statements (e.g. introduce Copy(*p, *q)), but that's more aesthetic than anything for the time being.

At the very least, we should probably document these differences, and make sure that miri only allows overlaps in the cases we don't consider UB (either abstractly, or due to our choice of codegen).


Another interesting aspect of the always-primitive ops is that they're "pure functions" of their operands (other than NullaryOp(Box), I suppose, but that could be replaced with a call to a lang item returning a Box<MaybeUninit<T>>, instead).

This means that if we wanted to, we could replace some of the intermediary locals with an PrimOp DAG, a bit like SSA but without φ (phi) nodes or a strict instruction stream.
All of the necessary ordering would still happen at the statement level (so this is nowhere near as complex as VSDG), but we might see some benefits in scalar-heavy code.


Asides aside, cc @RalfJung @rust-lang/wg-mir-opt


The latest proposal is here

@jonas-schievink jonas-schievink added A-mir Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 19, 2020
@Centril Centril added the T-lang Relevant to the language team, which will review and decide on the PR/issue. label Jan 19, 2020
@nikomatsakis
Copy link
Contributor

@eddyb I can't quite tell if there is a proposal here. I think you're proposing that we alter MIR construction for some kinds of rvalues to avoid the temporary?

@eddyb
Copy link
Member Author

eddyb commented Jan 24, 2020

@nikomatsakis Sorry, I didn't know how to phrase it but...
I don't think the current (implicit) semantics are clearly defined, and there is some amount of implicitly relying on whatever LLVM does.

The only immediate proposal here in terms of changing the MIR would be that split of Rvalue into "produces some primitive" vs "shuffles data around".

But the discussion I want to start is on the actual semantics, maybe also diving into how explicit we can make the purity, ordering, etc. of various MIR concepts.

EDIT: maybe I should've focused more on how Operand behaves like a Place in some situations, which complicates its semantics (i.e. it's only not Place-like for primitive types, because those can be eagerly loaded).

@RalfJung
Copy link
Member

RalfJung commented Jan 28, 2020

correctness: only *p = *q; is UB (AFAIK) if p..p+size overlaps q..q+size

I agree. We should make sure Miri actually correctly catches this in all cases. We are doing the right thing for "big" copies, but Miri has some optimizations for scalars where we might fail to check things properly. There even is a FIXME for that. ;)

maybe I should've focused more on how Operand behaves like a Place in some situations, which complicates its semantics (i.e. it's only not Place-like for primitive types, because those can be eagerly loaded).

That's just an optimization in Miri though, not part of the spec.

MIR values, the simple way

Let us ignore types for a minute. Conceptually, if I were to formally specify MIR, I'd say that an Rvalue (or I should probably rather say, a Value) is a sequence of bytes (Rust bytes -- so, they track initializedness and pointer provenance). p = q; is specified as first evaluating p to a Place, then evaluating q to a Value (this reads from memory!), and then storing the latter into the former (with a check to make sure there is no overlap). At least I think that is the evaluation order...

In the special case you considered, *p = *q, we should be careful to remember that *q is a place that needs to be coerced to a value to actually appear on the RHS of an assignment. If we call that coercion load (because it has the effect of loading a sequence of bytes from memory), the assignment should really be written *p = load(*q). The load is responsible for the "read" part of this, the assignment itself only performs a write.

In Miri, we don't want to carry around these temporary sequences of bytes, that's why Miri's Operand has a "lazy" case that delays the load until the time the bytes are actually stored into some place. But that is supposed to be an implementation detail of the interpreter, not anything that actually shows up in the spec -- and thus not something that anyone outside of the interpreter should be concerned with.

typed MIR values

Reality is more complex than that, though -- for example, the spec above fails to explain that padding gets "reset" to Undef on a copy. So, I think the proper way to specify this is to say that a MIR Value is something like the enum Value from this document.

When evaluating a Value like the RHS of an assignment, we load the (Rust) bytes from memory, then perform the conversion of that byte sequence to a Value according to the representation relation of the type that is used for this assignment. When storing a Value into a place, we consult the representation relation again to figure out how to represent this Value as a byte sequence.

@eddyb
Copy link
Member Author

eddyb commented Jan 28, 2020

That's just an optimization in Miri though, not part of the spec.

Eagerly loading is unavoidable in codegen, and I wasn't thinking of miri when I wrote that.

In Miri, we don't want to carry around these temporary sequences of bytes, that's why Miri's Operand has a "lazy" case that delays the load until the time the bytes are actually stored into some place. But that is supposed to be an implementation detail of the interpreter, not anything that actually shows up in the spec -- and thus not something that anyone outside of the interpreter should be concerned with.

This is the same in codegen, for anything you can't eagerly load.
There's no real alternative short of introducing copies and temporaries which aren't in MIR, AFAIK.

When evaluating a Value like the RHS of an assignment, we load the (Rust) bytes from memory, then perform the conversion of that byte sequence to a Value according to the representation relation of the type that is used for this assignment. When storing a Value into a place, we consult the representation relation again to figure out how to represent this Value as a byte sequence.

But... there is no place to keep an arbitrary sequence of bytes other than memory.
If there was, almost none of what I wrote would be relevant.

@RalfJung
Copy link
Member

RalfJung commented Jan 28, 2020

codegen should then better make sure that what it does is indistinguishable from the spec I proposed. :) I think it would be a big bad mess to have to deal with that distinction of "small" and "big" values in the spec.

And I think that indeed this is indistinguishable for UB-free programs, because of the requirement that the two regions of memory must not overlap. If this wasn't the case, it would not be correct for Miri to perform this internal optimization.

But... there is no place to keep an arbitrary sequence of bytes other than memory.

We are writing a spec. Math has all the space we could want. :D

@eddyb
Copy link
Member Author

eddyb commented Jan 28, 2020

But... there is no place to keep an arbitrary sequence of bytes other than memory.

We are writing a spec. Math has all the space we could want. :D

Sure, but it sounded like an argument for not requiring "no overlap", because if you can read all of the Operands in the RHS before writing anything to the LHS Place, an overlap wouldn't cause any issues.

What you want spec-wise, IMO, if you're going down that route, is that Rvalue is computing "into" the destination Place and so can begin writing before reading any Operands.

@RalfJung
Copy link
Member

Sure, but it sounded like an argument for not requiring "no overlap", because if you can read all of the Operands in the RHS before writing anything to the LHS Place, an overlap wouldn't cause any issues.

The argument for not allowing overlap is to give implementations (codegen, Miri) more freedom. But at the same time the spec should be as simple as possible to make reasoning as simple as we can make it.

What you want spec-wise, IMO, if you're going down that route, is that Rvalue is computing "into" the destination Place and so can begin writing before reading any Operands.

I am not sure what you have in mind, but it sounds a lot more complicated than even what I proposed.^^ I'd rather separate concerns. There is no reason that Value computation has to even mention Places.

@eddyb
Copy link
Member Author

eddyb commented Jan 28, 2020

The argument for not allowing overlap is to give implementations (codegen, Miri) more freedom. But at the same time the spec should be as simple as possible to make reasoning as simple as we can make it.

The spec is not realistically implementable without that "freedom", so I can't think of it as "a nice thing on top".
I also can't think of "overlaps disallowed" as "simple" if it doesn't arise from the definition of assignment.

Alright, "value computation" doesn't have to mention Place, but maybe Assign can be defined to "lock/invalidate the destination" before computing the value?


In terms of being able to reason about possible optimizations/lowerings of MIR constructs, I highly dislike the possibility of more than one Place being interacted with where some of the Places are in Operands.

Ignoring unsizing coercions (which we could make its own statement or rely on a shim etc.), the only situations which ever need to require non-overlap, involve bulk copying of data, either to create an aggregate, or just between two Places.

And in my experience, optimizing away bulk copies, even if only ones between locals (let alone anything involving alias analysis), is fundamentally much harder than anything SSA-like over primitives, because there is no way to reason about any one past value, without reintroducing more of the bulk copies you're trying to remove.

I mean, it's literally graph coloring, heh.

So you can see why I don't want that kind of NP "conflict reasoning" where avoidable.


Or maybe MIR suffers from the LLVM problem of being too high-level for its own good, just at a different level, and/or I shouldn't care about non-Place optimizations on it.

@eddyb
Copy link
Member Author

eddyb commented Jan 30, 2020

Alright, talking to @RalfJung in PM a bit made me realize there is a path forward in terms of making MIR more optimization-friendly without complicating the spec.

In terms of having "two MIRs", there's more than one way:

  • "borrowck MIR" and "optimization MIR", both inside the compiler
    • this is the only path I saw at first, for dealing with irreconcilable differences
    • there are a lot of downsides (and a few upsides) to actually doing this, but we don't even need to discuss them, because it's entirely unnecessary!
  • "spec MIR" and "rustc MIR", the latter expanding to the former
    • that is, "rustc MIR" would have "sugar" or "compact representations" on top on the "spec MIR" concepts, but they would remain semantically compatible
    • we can optimize "spec MIR" for simplicity, and "rustc MIR" for optimizations, as long as there is a simple mapping from the latter to the former
    • "rustc MIR" could distinguish between these while using @RalfJung's spec:
      • non-primitives, where it needs to rely on the non-overlap rule (for codegen that doesn't introduce unnecessary copies)
      • primitives, where it can just use the value computation directly, because the byte sequences are small enough

So for example, if we wanted to start using DAGs for pure operations involving primitives, we might have something like this in "rustc MIR":

  x     y
╭─┴─╮ ╭─┴─╮
╰(*)╯ ╰(*)╯
  ╰─(+)─╯
 z = ╯

(I could name the nodes, but I wanted to emphasize the DAG nature instead)

That would expand (or "serialize") into this "spec MIR":

tmp0 = x
tmp1 = Mul(tmp0, tmp0)
tmp2 = y
tmp3 = Mul(tmp2, tmp2)
tmp4 = Add(tmp1, tmp3)
z = tmp4

(6 Assigns with the semantics @RalfJung described earlier, trivially no conflicts)

Note that even if you had rustc use MIR that looked like this (which I think it actually does today, or only in some situations), it would produce the same codegen, because of all the tmp{0,1,2,3,4} locals would be deemed SSA-like.
So the DAG version would just be an optimized representation, avoiding explicit MIR locals when they would fit a certain subset of SSA.

Am I understanding your position on this correctly, @RalfJung?

@nikomatsakis
Copy link
Contributor

Let me see if I can rephrase things to make sure I'm following.

First, @eddyb was saying that they wish to be sure that they can copy bytes from the source to the destination without any intermediates in codegen if necessary (i.e., for non-scaler values). One way to do that would be to define the "spec" to carry around a "destination".

But @RalfJung proposed instead that we impose the rule that the source/destination cannot overlap and define the spec in terms of loading a value, and then show that this permits the implementation the freedom to do the more optimal thing.

Next the question was raised whether we want more than one internal MIR, or if we just want to have "rustc MIR" so long as it has a canonical conversion to "spec MIR". It seems like the latter is a pretty standard construct and quite reasonable.

One caveat is that I could also imagine that we might want to have MIR construction generate things (e.g., the false edges that borrowck relies on) that we eventually strip out or compile down to simpler things (but perhaps after borrowck). We already do this for DROP terminators, for example, which effectively change their semantics after drop elaboration. This also seems relevant to the discussion we've been having about how to represent more complex places. So basically we already have "pre-optimization" and "post-optimization" IRs, in my opinion, even if they share rustc data structures.

If those things wind up being important to "spec MIR", of course, and we wish for miri to view them, then we either have to retain them throughout optimizations or to execute miri on "pre-optimization" code.

@RalfJung
Copy link
Member

RalfJung commented Feb 15, 2020

@eddyb

So you can see why I don't want that kind of NP "conflict reasoning" where avoidable.

Not really... I am not saying the compiler has to make any analysis like that, all I am saying is that we should give it the freedom to do so.

I think it is a mistake to make the spec "maximally tight" wrt any given implementation. We should leave more freedom where it makes sense so that we can adjust our implementation strategy in the future, or add more optimizations.

I highly dislike the possibility of more than one Place being interacted with where some of the Places are in Operands.

There are no Places in Operands -- as I (think I) said, that's just a Miri implementation detail.
In a "proper" reference interpreter, Operand would be defined as Vec<MiriByte>, with MiriByte being something like the Byte enum from this document.

Well, actually, in a "really proper" interpreter Operand would be replaced by the Value type from this document.


The spec is not realistically implementable without that "freedom", so I can't think of it as "a nice thing on top".
I also can't think of "overlaps disallowed" as "simple" if it doesn't arise from the definition of assignment.

These are good points. I agree that adding the "overlap" rule as something like an afterthought is ugly, and I would prefer if it would arise more naturally. I just have not found a way to do that, given my constraints of also keeping the evaluation of places and values independent of each other.

Alright, "value computation" doesn't have to mention Place, but maybe Assign can be defined to "lock/invalidate the destination" before computing the value?

So, basically "C-style sequencing points"-light? I am not sure if that's really less ugly than the "overlap" rule...


that is, "rustc MIR" would have "sugar" or "compact representations" on top on the "spec MIR" concepts, but they would remain semantically compatible

Yes that sounds very reasonable.

we can optimize "spec MIR" for simplicity, and "rustc MIR" for optimizations, as long as there is a simple mapping from the latter to the former

Agreed. :)

Am I understanding your position on this correctly, @RalfJung?

I... think so. I am not entirely sure what your proposed lowering from "rustc MIR" to "spec MIR" is relative to the "overlap" rule, but I agree with your high-level statements and that is always a good start. :D

@RalfJung
Copy link
Member

RalfJung commented Feb 15, 2020

@eddyb as a follow-up to "it would be better for non-overlap to arise naturally":

One alternative I thought of is to define Operand/Value as "something that can be stored in memory", so basically a "thunked computation" that, when given a location (the result of computing a place), puts all its stuff into it. That corresponds slightly better to what Miri actually does than the Vec<MiriByte> (the "indirect" Operand variant basically represents such a thunk, remembering where to copy from when the final destination location is known). I think this is close to what you meant when you said "Rvalue is computing into the destination Place". Somehow I missed that before and just realized it right now when re-reading your posts.

But as far as I can see this does not naturally give rise to a non-overlap rule either, and it fails to explain why padding is "lost" on typed copies and why copying value that violate the validity invariant is UB, so I ditched this in favor of the simpler, first-order approach of directly representing the value.

You seem to think that this "destination-passing style" solves the overlap problem, could you explain how? The evaluation of places and operands would still perform a particular sequence of memory operations in a particular order, there is no UB if any of these operations overlap.

@eddyb
Copy link
Member Author

eddyb commented Feb 15, 2020

There are no Places in Operands -- as I (think I) said, that's just a Miri implementation detail.

To be clear, I meant "syntactically", in MIR, mir::Operand may contain a mir::Place, and I was talking about multiple mir::Places being involved in some operation (and therefore not being allowed to overlap) with at least one of them being from a mir::Operand.
That is, to me mir::Operand as "terminology" implies being fully evaluatable to a value (not just in the mathematical sense of the spec, but also in real implementations) before whatever operation it's an operand of, is performed.

So to give a perhaps more explicit example, I would think that a borrow (mir::Rvalue::Ref) taking a mir::Operand would be nonsensical (and indeed we use a mir::Place).

That's probably less relevant given all the other stuff that has been discussed since, but I wanted to clarify that I wasn't talking about the dynamic semantics.


I think this is close to what you meant when you said "Rvalue is computing into the destination Place". Somehow I missed that before and just realized it right now when re-reading your posts.
...
You seem to think that this "destination-passing style" solves the overlap problem, could you explain how? The evaluation of places and operands would still perform a particular sequence of memory operations in a particular order, there is no UB if any of these operations overlap.

Glad that proposal makes sense! (even if it's less necessary now that I grasp the overall picture better)

I agree there is a potential defined outcome of overlaps (after all, memmove exists). AFAIK, the main reason the memcpy/memmove distinction exists is a "forward loop" implementation will do the wrong thing for memcpy(p, p+x, n) if p..p+n overlaps with p+x..p+x+n, because at least one source byte will be destroyed by a write to the destination, replacing it with a (different) source byte.
So memmove takes the penalty of a check + moving to a "reverse loop" implementation (which may also be less efficient, depending on the platform).

Something we could do is use memmove unless the mir::Places are "obviously disjoint", allowing the spec to omit the overlap rule (not that it would be a smart move to leave it out completely), but I think that would hurt writing non-primitives to &mut T pointers.

To answer your question more directly: even with destination-passing, if everything is defined in terms of "byte sequences", spec-wise something would probably have to say "destination can't overlap with any operands" or "first thing we do is write undef to the destination" (which can be just as well in Assign, without destination-passing into Rvalue. and also undef might not be strong enough to imply the UB of overlap, poison is more appropriate?).


I... think so. I am not entirely sure what your proposed lowering from "rustc MIR" to "spec MIR" is relative to the "overlap" rule, but I agree with your high-level statements and that is always a good start. :D

I was hoping to get that across with the example in #68364 (comment), but here it is: "expand constructs (which don't want to take advantage of overlap), into multiple statements, mentioning at most one non-temporary mir::Place per statement, meaning the overlap rule doesn't kick in".
In that example, all of the temporaries are SSA-like scalars, so codegen wouldn't be penalized even if it was working with the temporary-laden spec MIR.

@RalfJung
Copy link
Member

I agree there is a potential defined outcome of overlaps

I am not just thinking of a potential outcome. Such a "destination-passing style operand/value" does not have to do the entire copy in a single go -- in fact, considering things like padding, it seems reasonable to assume that the many fields of a struct are all copied separately. That means that even if memcpy semantics are used, an overlap might go undetected -- the two structs can overlap without any pair of matching fields overlapping.

So this, too, would require some explicit additional ad-hoc clause that determines the "overall memory range" read by the operand/value, and a clause saying that this must not overlap with the destination place. In other words, this is just a ad-hoc and hacky as the variant I proposed (the one you didn't like).

which can be just as well in Assign, without destination-passing into Rvalue. and also undef might not be strong enough to imply the UB of overlap, poison is more appropriate?

We don't have undef and poison, we have one thing that matches poison (Miri calls it Undef, though). Also, both of them can be overwritten in LLVM, so neither raises the UB we want. For that we'd need C-style "sequence points", i.e., some mechanism to "lock" memory itself.

expand constructs (which don't want to take advantage of overlap), into multiple statements [...]

I don't understand the goal of this -- even if codegen does not want to take advantage of the overlap rule for some constructs, there is no harm in applying the rule anyway. It's not an if-and-only-if rule.

@RalfJung
Copy link
Member

According to @eddyb, discussion at #71005 (comment) is relevant here. In particular, maybe whatever we use to model "return place must not alias with anything" could also be used to model "left-hand side of assignment must not alias with anything".

@RalfJung RalfJung added the A-miri Area: The miri tool label Apr 14, 2020
@RalfJung
Copy link
Member

Inspired by #71117 and rust-lang/miri#1330, here is another possible semantics for MIR assignment place = value:

  • Evaluate the place expression to a place value (i.e., a location).
  • Retag the place to obtain a new tag with unique permission.
  • Evaluate value expression to a value (i.e., an element of this enum).
  • Write the evaluated value into the evaluated place using the fresh tag.

This avoids explicitly talking about "overlap" or values having location in memory, i.e., this entirely maintains that the only thing you can do with a place expression is evaluating it to a place, and the only thing you can do with a value expression is evaluating it to an (abstract) value. The fact that Miri avoids explicitly manifesting that abstract value is entirely hidden. If evaluating the value "overlaps" with the place, the fresh tag we added will get removed, which means the write at the end will cause UB. Assignment always terminates, so there is no way to avoid the UB if any overlap happens (thus, we do not need to add protectors).

This is in fact eerily similar to formal models of C sequence points that I have seen before.

That is, to me mir::Operand as "terminology" implies being fully evaluatable to a value (not just in the mathematical sense of the spec, but also in real implementations) before whatever operation it's an operand of, is performed.

I think this latest proposal almost achieves that. Evaluating an assignment uses "evaluate the RHS to a value" an opaque step. The step does not happen before anything specific to assignments, but that seems fine to me -- this is still compositional, even if composition is not a simple eval-lhs; eval-rhs; post-processing. "The behavior of the whole is a function of the behavior of its pieces" is still satisfied by eval-lhs; pre-processing; eval-rhs; post-processing.

@nikomatsakis
Copy link
Contributor

@RalfJung that model seems very elegant

@JakobDegen
Copy link
Contributor

JakobDegen commented Apr 5, 2022

@RalfJung this is possible, but this model is slightly more restrictive than I think it needs to be. In particular, it doesn't only affect the computed places on the RHS, but also their inputs. Because of this, I'd suggest instead this model:

  • Evaluate the LHS place expression to a place value.
  • Do the same for all place expressions appearing on the RHS.
  • Retag the LHS place
  • Retag the RHS place
  • Compute the RHS expression to a value
  • Store this to the LHS using the retagged place

Specifically, this allows things like _1 = _2[_1] where _1: usize, _2: [usize; 10]. I believe backends are ok with this version, although maybe they're not

@RalfJung
Copy link
Member

RalfJung commented Apr 9, 2022

Do the same for all place expressions appearing on the RHS.

This is non-compositional. The RHS is a value expression. I am rather opposed to anything that treats this not as an opaque value expression that is being evaluated to a value. In particular, we should not have to syntactically look "into" that expression -- that's just a disaster to reason about.

That said, @tmandry points out that we generate things like _1 = Add(_1, ...) today under some conditions. Given that, I don't see how anyone can say that our assignments require LHS and RHS to be disjoint. I think it would be very confusing and fragile to somehow only do this for some types, or some kinds of value expressions (the latter would also not be compositional).

@bjorn3
Copy link
Member

bjorn3 commented Apr 9, 2022

For calls the LHS and RHS have to be disjoint lest we miscompile them with all existing backends. I think @tmandry's example is a bad example. It only happens because of a special case for AddAssign for integers and floats (where it doesn't hurt) and can easily fixed using an extra temporary (which doesn't have any runtime cost with cg_clif and I think also with cg_llvm even without optimizations).

@JakobDegen
Copy link
Contributor

JakobDegen commented Apr 9, 2022

This is non-compositional. The RHS is a value expression. I am rather opposed to anything that treats this not as an opaque value expression that is being evaluated to a value. In particular, we should not have to syntactically look "into" that expression -- that's just a disaster to reason about.

Yeah, this is a good point.

I've thought about this a bunch, and have a new suggestion that I think is compatible with existing code and not too bad to reason about. For rvalues Use, Repeat, and Aggregate, we use the model Ralf suggested above. For all other rvalues, we first evaluate the RHS to a value, then the LHS to a place, and store the value in the place (consequently allowing overlaps). Importantly, this is not type dependent as suggested before; it is dependent only on the rvalue variant.

Distinguishing those three rvalues might seem arbitrary, but I'd actually like to argue that it is not. Fundamentally, when we worry about overlap we are concerned with the implementation running into a very particular type of bug: It computes one part of the value first, stores it to the place, and then goes and computes another part of the value which is now incorrect because the store aliased. Importantly, this type of bug is only ever possible if the value can be computed in a piecewise fashion. For example, an implementation might do the copying for a Use rvalue in a piecewise manner (this is what memcpy is). However, it can't do any such thing for addition - computing the result of an addition requires reading the entire input first.

Furthermore, although this suggestion might technically be considered non-compositional, I do not think that this case is particularly interesting or concerning. If we really wanted to, we could recover the compositional nature by splitting the rvalue enum into two parts and having two assign statement kinds to clearly show that the semantics differ. Of course I don't actually think we should do this - right now, we can recover the same information by checking which variant the rvalue has.

@RalfJung
Copy link
Member

RalfJung commented Apr 9, 2022

For calls the LHS and RHS have to be disjoint lest we miscompile them with all existing backends.

This issue is about MIR assignments. Calls are not MIR assignments, they are a totally separate language construct (TerminatorKind::Call vs StatementKind::Assign) that is just confusingly printed with the same = operator. See #71117 for the discussion around calls.

@RalfJung
Copy link
Member

For rvalues Use, Repeat, and Aggregate, we use the model Ralf suggested above. For all other rvalues, we first evaluate the RHS to a value, then the LHS to a place, and store the value in the place (consequently allowing overlaps). Importantly, this is not type dependent as suggested before; it is dependent only on the rvalue variant.

I think a "canonical" thing to do here would be to have both kinds of Assign and encode via bool-like flag which one we want. But I can view your proposal as simply implicitly indicating the value of the bool based on the syntax of the rvalue, rather than explicitly storing it, so I could live with this. However, it means MIR transformations have to be careful when turning one kind of rvalue into another, as that might change this implicit flag -- so maybe an explicit flag is better. We could still use your proposed strategy to initialize the flag when MIR is constructed.

@JakobDegen
Copy link
Contributor

JakobDegen commented Apr 20, 2022

MIR transformations have to be careful when turning one kind of rvalue into another, as that might change this implicit flag

Indeed. InstCombine is actually broken under this model today, so I would want to fix that before we let anything make use of this. I'm undecided on whether we should introduce this flag now though. I think that its use cases can probably better be served by utility functions that do the right thing. This has the benefit of 1) simpler MIR in which all temporaries are explicit, and 2) those utility functions can be written to be more precise and less likely to introduce unneeded temporaries than we would get if some pass always conservatively sets the may_overlap flag.

@JakobDegen
Copy link
Contributor

@bjorn3 (sorry for picking on you, I never know who to ask codegen questions to). To the extent that you know, are current implementations of codegen compatible with the above semantics? I would of course go check this myself before I open a PR that suggests documenting these as the semantics

@bjorn3
Copy link
Member

bjorn3 commented Apr 20, 2022

Cg_ssa and cg_clif both use memcpy for assignments of values not stored as SSA values I believe. Memcpy doesn't allow overlapping.

@JakobDegen
Copy link
Contributor

The version I'm referring to requires MIR to be non-overlapping for Rvalue::Use, Rvalue::Aggregate, and Rvalue::Repeat, so the most obvious places to use memcpy are fine.

@bjorn3
Copy link
Member

bjorn3 commented Apr 21, 2022

In that case I think current codegen is compatible with your proposal.

@tmandry
Copy link
Member

tmandry commented Apr 23, 2022

That said, @tmandry points out that we generate things like _1 = Add(_1, ...) today under some conditions.

The person you're referring to as @tmandry is, in fact, @tmiasko.

bors added a commit to rust-lang-ci/rust that referenced this issue Nov 27, 2022
Fix Dest Prop

Closes rust-lang#82678, rust-lang#79191 .

This was not originally a total re-write of the pass but is has gradually turned into one. Notable changes:

 1. Significant improvements to documentation all around. The top of the file has been extended with a more precise argument for soundness. The code should be fairly readable, and I've done my best to add useful comments wherever possible. I would very much like for the bus factor to not be one on this code.
 3. Improved handling of conflicts that are not visible in normal dataflow.  This was the cause of rust-lang#79191. Handling this correctly requires us to make decision about the semantics and specifically evaluation order of basically all MIR constructs (see specifically rust-lang#68364 rust-lang#71117.  The way this is implemented is based on my preferred resolution to these questions around the semantics of assignment statements.
 4. Some re-architecting to improve performance. More details below.
 5. Possible future improvements to this optimization are documented, and the code is written with the needs of those improvements in mind. The hope is that adding support for more precise analyses will not require a full re-write of this opt, but just localized changes.

### Regarding Performance

The previous approach had some performance issues; letting `l` be the number of locals and `s` be the number of statements/terminators, the runtime of the pass was `O(l^2 * s)`, both in theory and in practice. This version is smarter about not calculating unnecessary things and doing more caching. Our runtime is now dominated by one invocation of `MaybeLiveLocals` for each "round," and the number of rounds is less than 5 in over 90% of cases. This means it's linear-ish in practice.

r? `@oli-obk` who reviewed the last version of this, but review from anyone else would be more than welcome
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-mir Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html A-miri Area: The miri tool T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants