RFC: hint::black_box #2360

Open
gnzlbg wants to merge 12 commits into master from gnzlbg:black_box
@gnzlbg
Contributor

gnzlbg commented Mar 12, 2018

Adds black_box to core::hint.

Rendered.

[motivation]: #motivation

The implementation of these functions is backend-specific, and must be provided
by the standard library.


@sfackler

sfackler Mar 12, 2018

Member

Probably worth adding why these are useful for benchmarks to the motivation section.


@gnzlbg

gnzlbg Mar 12, 2018

Author Contributor

Yeah, that was a bit scarce. I've expanded the motivation.

@gnzlbg gnzlbg force-pushed the gnzlbg:black_box branch 4 times, most recently from b3f4441 to 8a9ae3f Mar 12, 2018


Here, the compiler can simplify the expression `2 + x` into `2 + 2` and then
`4`, but it is not allowed to discard `4`. Instead, it must store `4` into a
register even though it is not used by anything afterwards.
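For concreteness, the situation described in this paragraph looks roughly like the sketch below (illustrative only; it uses the `std::hint::black_box` path that was eventually stabilized, while this draft of the RFC still places the function in `mem`):

```rust
use std::hint::black_box;

#[inline(never)]
fn foo(x: i32) {
    // Without black_box, `2 + x` is dead and the whole body can be removed.
    // With it, the (possibly constant-folded) result must still be
    // materialized, even though nothing reads it afterwards.
    black_box(2 + x);
}

fn main() {
    foo(2);
}
```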


@rkruppe

rkruppe Mar 12, 2018

Member

Nit, but this doesn't really match what the implementation of black_box does (and what the linked godbolt shows). It forces the value into memory, and any register traffic is due to that (e.g. in the linked godbolt, the add edi, 2 is not a dead definition, it's immediately stored to [rbp - 4], and that store is dead).


@gnzlbg

gnzlbg Mar 12, 2018

Author Contributor

@rkruppe how would you formulate this for the guide-level explanation? Maybe I can just say that 4 must be stored into memory and leave it at that.


@rkruppe

rkruppe Mar 12, 2018

Member

That seems decent. But it's tricky to say anything about this sort of feature, because it's inherently very tied to optimizer capabilities and the resulting machine code =/


@gnzlbg

gnzlbg Mar 13, 2018

Author Contributor

So I've reworded this part a bit, but maybe for the guide-level explanation we might want to be even more vague and just leave it at: "the value cannot be discarded" or "prevents the value x from being optimized away".

@gnzlbg gnzlbg force-pushed the gnzlbg:black_box branch from 8a9ae3f to ec42fd0 Mar 12, 2018

pub fn clobber() -> ();
```

flushes all pending writes to memory. Memory managed by block scope objects must


@rkruppe

rkruppe Mar 12, 2018

Member

This wording (flushing pending writes) makes me uncomfortable because it's reminiscent of memory consistency models, including hardware ones, when this is just a single-threaded and compiler-level restriction. Actually, come to think of it, I'd like to know the difference between this and compiler_fence(SeqCst). I can't think of any off-hand.


@gnzlbg

gnzlbg Mar 12, 2018

Author Contributor

when this is just a single-threaded and compiler-level restriction

This is correct.

compiler_fence(SeqCst). I can't think of any off-hand.

I can't either, but for some reason they do generate different code: https://godbolt.org/g/G2UoZC

I'll give this some more thought.


@nagisa

nagisa Mar 12, 2018

Contributor

The difference between asm! with a memory clobber and compiler_fence exists in the fact that a memory clobber requires the compiler to actually reload the memory if it wants to use it again (as memory is… clobbered – considered changed), whereas compiler_fence only enforces that memory accesses are not reordered, and the compiler may still use the usual rules to figure out that it needn't reload stuff.
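A sketch of the distinction @nagisa describes, restated with today's `compiler_fence` and `asm!` syntax (the thread-era code used the old `asm!` form; the comments describe what the optimizer is allowed to assume, not what it must emit):

```rust
use core::arch::asm;
use core::sync::atomic::{compiler_fence, Ordering};

// Single-threaded illustration only.
static mut COUNTER: u32 = 0;

fn with_fence() -> u32 {
    unsafe {
        COUNTER = 1;
        // Only forbids reordering memory accesses across this point; the
        // compiler may still forward the stored value 1 to the load below.
        compiler_fence(Ordering::SeqCst);
        COUNTER
    }
}

fn with_memory_clobber() -> u32 {
    unsafe {
        COUNTER = 1;
        // An asm block without `nomem`/`readonly` is assumed to read and
        // write any memory it can reach (globals included), so the compiler
        // must assume COUNTER may have changed and reload it.
        asm!("", options(nostack));
        COUNTER
    }
}
```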


@gnzlbg

gnzlbg Mar 13, 2018

Author Contributor

@nagisa the only thing that clobber should do is flush pending writes to memory. It doesn't need to require that the compiler reloads memory on reuse. Maybe the volatile asm! with memory clobber is not the best way to implement that.


@rkruppe

rkruppe Mar 13, 2018

Member

@nagisa Thank you. I was misled by the fact that fences prohibit some load-store optimization into thinking they'd also impact things like store-load forwarding on the same address with no intervening writes.

@gnzlbg asm! with memory is simply assumed to read from and write to all memory, and all its effects follow from that. However, to be precise, this does not mean any reloads are introduced after a clobbering inline asm; it just means that all the loads that are already there (of which there are a lot, given that every local and many temporaries are stack slots) can't be replaced with values loaded from the same address before the clobber.

If you want something less strong, you need to be precise. "Flushing pending writes" is not really something that makes intuitive sense at the compiler level (and it surely has no effect on runtime, for example caches or store buffers?).


@gnzlbg

gnzlbg Mar 13, 2018

Author Contributor

@rkruppe mem::clobber() should be assumed to read from all memory, with the effect that any pending memory stores must have completed before the clobber. All the loads that are already there in registers, temporary stack slots, etc., should not be invalidated by the clobber.


@rkruppe

rkruppe Mar 13, 2018

Member

Okay that is a coherent concept. I'm not sure off-hand how to best implement that in LLVM. (compiler_fence is probably not enough, since it permits dead store elimination if the memory location is known to not escape.)


@rkruppe

rkruppe Mar 13, 2018

Member

However, come to think of it, what's the difference between black_box(x) (which is currently stated to just force a write of x to memory) and let tmp = x; clobber(); (which writes x to the stack slot of tmp and then forces that store to be considered live)?


@gnzlbg

gnzlbg Mar 13, 2018

Author Contributor

That's a good question, and this relates to what I meant by "block scope". In

```rust
{
    let tmp = x;
    clobber();
}
```

this {} block is the only part of the program that knows the address of tmp so no other code can actually read it without invoking undefined behavior. So in this case, clobber does not make the store of tmp live, because nothing outside of this scope is able to read from it, and this scope doesn't read from it but executes clobber instead (clobbering "memory" does not clobber temporaries AFAICT).

However, if one shares the address of tmp with the rest of the program:

```rust
{
    let tmp = x;
    black_box(&tmp);
    clobber();
}
```

then clobber will force the store of tmp to be considered live.

let mut v = Vec::with_capacity(n);
bench.iter(|| {
// Escape the vector pointer:
mem::black_box(v.as_ptr());


@leodasvacas

leodasvacas Mar 12, 2018

does the black box need to be inside each iteration?


@gnzlbg

gnzlbg Mar 13, 2018

Author Contributor

Not in this case, because the vector does not grow. If the vector were allowed to grow on push, then the answer would be yes. Roughly, the full pattern looks like the sketch below.
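A fuller, hedged reconstruction of the benchmark this excerpt comes from (the push loop and the reset are assumptions added to make the sketch read end-to-end; `mem::black_box` and `mem::clobber` are the APIs proposed by this draft of the RFC, not functions that exist in std):

```rust
let mut v: Vec<u8> = Vec::with_capacity(n);
bench.iter(|| {
    // Escape the vector's data pointer. Since the vector never reallocates,
    // this could equally be hoisted out of the closure, as noted above.
    mem::black_box(v.as_ptr());
    for i in 0..n {
        v.push(i as u8);
    }
    // Force the writes performed by `push` to be treated as observable.
    mem::clobber();
    v.clear();
});
```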

@Centril Centril added the T-libs label Mar 12, 2018

@rkruppe

Member

rkruppe commented Mar 13, 2018

I was already concerned about the interaction with the memory model, and this comment basically confirmed my worst suspicions: there are extremely subtle interactions between the memory model and what these functions can do.

I'm starting to believe that we'll be unable to give any guarantees that are of any use to benchmark authors. But if this is the case, these functions become a matter of chasing the optimizer, and benchmarking reliably requires checking the asm to see your work wasn't optimized out. That's clearly very unappealing, but has always been true of micro benchmarks, so maybe that's just unavoidable.

This also raises the question of why this needs to be in std, if it's not compiler magic (like compiler_fence is) but a natural consequence of the memory model and inline asm. Aside from the stability of inline asm, which is not a very convincing argument to me.

@gnzlbg

Contributor Author

gnzlbg commented Mar 13, 2018

I'm temporarily closing this till we resolve the semantics of these functions in the memory model repo.

@gnzlbg

Contributor Author

gnzlbg commented Mar 13, 2018

The issue in the memory model repo is: nikomatsakis/rust-memory-model#45

@gnzlbg

Contributor Author

gnzlbg commented Jun 11, 2018

Update. The summary of the discussion on the memory model repo is that clobber should be scrapped, and the RFC updated with the definition of black_box agreed on there.

@Manishearth

Member

Manishearth commented Aug 28, 2018

Any updates on opening the new RFC?

@gnzlbg

Contributor Author

gnzlbg commented Aug 29, 2018

I've updated the RFC with the discussion of the memory model repo.

It now just proposes pub unsafe fn core::hint::black_box<T>(x: T) -> T, which is specified as an unknown unsafe function that returns x. It is a no-op in the virtual machine, but the compiler has to assume that it can perform any valid operation on x that unsafe Rust code is allowed to perform.
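For illustration, a hedged sketch of how the proposed function might be used in a benchmark loop (the `unsafe` calls follow the signature proposed in this comment; the loop and names are made up):

```rust
use core::hint;

fn sum_to(n: u64) -> u64 {
    (0..n).sum()
}

fn bench_sum_to() {
    for _ in 0..1_000 {
        unsafe {
            // Hide the input from the optimizer so the whole computation
            // cannot be constant-folded away...
            let n = hint::black_box(1_000_u64);
            // ...and pretend the result is used in some unknown way so the
            // call to `sum_to` is not removed as dead code.
            hint::black_box(sum_to(n));
        }
    }
}
```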

Other changes:

  • moved the function from core::mem to core::hint since that appears to be a more suitable place for this functionality which is a no-op.

Open questions:

  • do we need a "safe" alternative? What safe and unsafe Rust are allowed to do differs, e.g., if the user passes it a raw pointer unsafe Rust is allowed to dereference it while safe Rust is not.

  • In the memory model repo discussion it was a bit unclear (to me) whether this function should take a reference or not. I've kept the interface from the test crate, which takes a value; this allows users to pass it a &/&mut/*mut... or whatever they want, but I am not sure this is the right call.

  • should we call this black_box or something else? (e.g. unknown)

cc @RalfJung @ubsan @rkruppe

@gnzlbg gnzlbg reopened this Aug 29, 2018

@Manishearth

Member

Manishearth commented Aug 29, 2018

Unsure if we should be opening a new RFC or reopening this one, cc @rust-lang/libs

```

In the call to `foo(2)` the compiler is allowed to simplify the expression `2 + x`
down to `4`, but `4` must be stored into memory even though it is not read by


@rkruppe

rkruppe Aug 29, 2018

Member

"4 must be stored into memory" does not follow from the definition of black_box given here, it only has to be materialized in some way (e.g., in a register specified by the calling convention). Of course, the current (and likely most practical) implementation using inline assembly does force the value into memory, but that's neither here nor there.

`unsafe` Rust code is allowed to, and requires the compiler to be maximally
pessimistic in terms of optimizations. The compiler is still allowed to optimize
the expression generating `x`. This function returns `x` and is a no-op in the
virtual machine.


@RalfJung

RalfJung Aug 30, 2018

Member

Which virtual machine?

I know what you mean but many readers will likely be confused.


@gnzlbg

gnzlbg Aug 30, 2018

Author Contributor

Yeah I don't think there is a good way to put it right now beyond maybe saying "abstract" instead of "virtual".

If we had a memory model, that memory model would specify the abstract machine that Rust runs on and on which the memory model is valid, and I could just refer to that abstract machine here.

Maybe I can rephrase that as "is a no-op in the abstract machine of Rust's memory model (whatever that memory model might end up being)" or something like that ?


@RalfJung

RalfJung Aug 31, 2018

Member

I don't think most people will know what this means.

I would say something like "You can rely on black_box being a NOP just returning x, but the compiler will optimize under the pessimistic assumption that black_box might do anything with the data it got".
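For instance, as a doc comment (wording adapted from the suggestion above; the placeholder body is not the real, backend-specific implementation):

```rust
/// An identity function that the compiler treats pessimistically.
///
/// You can rely on `black_box` being a no-op that just returns `x`, but the
/// compiler optimizes under the assumption that `black_box` might use `x` in
/// any way valid Rust code could.
pub fn black_box<T>(x: T) -> T {
    x
}
```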


is an _unknown_ function, that is, a function that the compiler cannot make any
assumptions about. It can potentially use `x` in any possible valid way that
`unsafe` Rust code is allowed to, and requires the compiler to be maximally


@RalfJung

RalfJung Aug 30, 2018

Member

The emphasis here is on allowed to -- as opposed to "any possible way that unsafe Rust could use x". I think this should be made clearer. Maybe link to the nomicon?

Actually, "any possible way that safe Rust could use x" is almost more precise, but may also give the wrong impression.


@gnzlbg

gnzlbg Aug 30, 2018

Author Contributor

The emphasis here is on allowed to -- as opposed to "any possible way that unsafe Rust could use x". I think this should be made clearer. Maybe link to the nomicon?

Thanks, I think this makes more sense.

Actually, "any possible way that safe Rust could use x" is almost more precise, but may also give the wrong impression.

How so? I thought unsafe Rust was allowed to, e.g., dereference raw pointers, while if this were a safe function, it couldn't do so. In any case, whatever this function does, it cannot invoke undefined behavior, and the compiler is allowed to assume that.

@RalfJung

Member

RalfJung commented Aug 30, 2018

Seems fine to me! I was worried you'd try to talk about which memory black_box could touch, but you actually are using the spec I was hoping for. :)

@gnzlbg

Contributor Author

gnzlbg commented Feb 12, 2019

If that's true, then I'd prefer we not have a black_box function and instead provide functions that satisfy the individual requirements of different use-cases

That's what the original version of this RFC actually did (it provided clobber to flush pending writes to memory, and do_not_optimize(x) to force a write of x to memory).

@gnzlbg

Contributor Author

gnzlbg commented Feb 13, 2019

But that's not the case anymore, and I think it's for the better.

black_box solves one problem: it allows you to write benchmarks on stable Rust that are not removed even if you don't perform an expensive side-effecting operation that could significantly alter the benchmark's results (performing a volatile write, printing the result to stdout, etc.).

Leaving the RFC / spec to the side, in practice, black_box solves this problem by doing two things:

  • In the optimization pipeline, black_box(T) -> T performs a volatile write of its argument to global memory, reads a T back from global memory, and returns it; the return value and the argument are treated as unrelated (a rough sketch of this model follows the list).
  • In the Rust language itself, however, black_box is the identity function: it has no side effects, it is a no-op in the abstract machine, and it generates no machine instructions.
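A rough sketch of the first bullet's model (an illustration of the mental model only, not an implementation: it uses a local slot instead of the "global memory" described above, and the real black_box does none of this in the abstract machine):

```rust
use core::mem::MaybeUninit;
use core::ptr;

// Model: volatile-write the argument somewhere, volatile-read a value back,
// and return that value as if it were unrelated to the input.
pub fn black_box_model<T>(x: T) -> T {
    unsafe {
        let mut slot = MaybeUninit::<T>::uninit();
        ptr::write_volatile(slot.as_mut_ptr(), x);
        ptr::read_volatile(slot.as_ptr())
    }
}
```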

From a specification / RFC point-of-view, we can't specify both, they are incompatible. Either black_box has side-effects or it does not, but it can't both have and not have side-effects.

This means that black_box is unreliable. From a specification point-of-view, since we can't specify it both ways, either extra instructions are introduced, or the benchmark might be optimized away. From an implementation point-of-view, we can't implement the Rust pipeline part of black_box for all targets / backends, so in some applications, black_box is not going to be useful.

The current RFC specifies black_box as being a nop in the abstract machine (compiles to nothing). This is something that we can guarantee.

It then leaves it as a QoI issue whether, somehow magically, the program is processed by the Rust pipeline according to the volatile read / write semantics. Documentation-wise, we would document on which targets the "Rust pipeline semantics" are upheld when optimizing code, and users can follow the volatile read / write model when reasoning about optimizations. The spec does not allow you to rely on this; users can do so at their own risk.

The volatile read / write model is very coarse grained, but relatively simple for users to understand. It allows users to write simple benchmarks that are able to discover many performance issues. This won't be enough for users that will try to squeeze everything out of their CPUs, but the intent of Rust #[bench] (and criterion) is to be accurate enough while still being as convenient as possible for the majority of benchmarks.

A convenient API to disable optimizations in a sensible way for the majority of benchmarks is an important use case. This is the use case that this RFC addresses, and for this use case black_box does not need to be 100% reliable.


@cramertj

If that's true, then I'd prefer we not have a black_box function and instead provide functions that satisfy the individual requirements of different use-cases, or otherwise capture as many of these use-cases as we can in the semantics of black_box itself.

I agree that more fine-grained APIs are worth exploring, but the use cases these address are not the same as the use case that black_box addresses AFAICT.

@cramertj

Member

cramertj commented Feb 13, 2019

@gnzlbg

I agree that more fine-grained APIs are worth exploring, but the use cases these address are not the same as the use case that black_box addresses AFAICT.

I agree that they're not the use-cases addressed by the black_box proposed in this change. These use-cases are important, though, and people are familiar with the name black_box to refer to this thing. The function proposed here is more like dont_elide_this_value_for_benchmarking_maybe. Sorry if it feels like I'm dragging this more minimal use-case into a battle around the semantics of memory models and unsafe code, but I think if we offer a function called black_box then it's important that it do the thing people want.

@eddyb

Member

eddyb commented Feb 13, 2019

I would be fine with hint::pessimize or hint::unoptimize or similar, but some of the more verbose names seem like they would make it annoying to find/use these tools.

@comex


comex commented Feb 13, 2019

I don't think adding statements about "pointers escaped" helps in the slightest with getting us out of the "language legalese limbo". I have not seen any way in which such statements are useful to actually make precise statements about what your code does (in the sense that you would need to e.g. argue why an optimization is correct, or to perform a proof). In fact, I think it is exactly statements of this kind that got us very deep into a language legalese jungle in, e.g., LLVM. A jungle that we are now lost in and have a hard time navigating -- because there is no one clear language with a certain well-behaved semantics, but instead just a whole lot of (frequently mutually incompatible) constraints floating around in people's heads for what effect different things have on the current optimizer. Then someone writes a new pass and everything falls apart. This is not compositional, and it won't lead us to a reliable language.

I'd argue that black_box's semantics can be defined reasonably precisely within the context of a certain compilation model: compiling to some sort of low-level assembly code, where memory consists of bytes (which always have concrete numeric values), and there is a concept of a call to an unknown foreign function (i.e. there is some sort of ABI, as opposed to guaranteed whole program optimization). In this case, the definition is something like "generate code as if this argument were passed to an unknown foreign function". As long as you stay within that compilation model, I think we actually could determine whether or not any given optimization pass correctly respects those semantics.

However, that still leaves ambiguity as soon as you step outside that model. You don't need to go all the way to miri for that, either. Consider MemorySanitizer, an LLVM pass (which rustc does support) that modifies codegen to detect reads from uninitialized memory. In @cramertj's example, a region of uninitialized memory is frozen before being passed to a method. That method is not expected to actually read from the memory; the freezing is essentially a hack to prevent other (largely hypothetical) compiler optimizations from breaking things. But based on the semantics of black_box as I understand them, it could read from it if it wanted to; the result would be the aforementioned "arbitrary-but-defined bytes". However, if you turn on MemorySanitizer, it will (AFAIK) still complain about the uninitialized read! Ultimately, this discrepancy exists because the compilation model no longer fits the above definition. With MemorySanitizer, memory (as observed by loads and stores prior to the transformation) is no longer a series of concrete bytes; there is a separate "undefined" state that doesn't correspond to any numeric value.

Since MemorySanitizer is just one environment Rust wants to target where black_box is inadequate (miri, of course, is another), we definitely need more structured primitives to address specific use cases. However, as I said in an earlier comment, I think Rust can and should define black_box's semantics from the perspective of that compilation model, and require that implementations which fit that model (as currently configured) must uphold those semantics, whereas ones that don't may have looser requirements.

@gnzlbg

Contributor Author

gnzlbg commented Feb 13, 2019

Sorry if it feels like i'm dragging this more minimal usecase into a battle around the semantics of memory models and unsafe code, but I think if we offer a function called black_box then it's important that it do the thing people want.

I think I was just misinterpreting your concern. Is your only concern about this API the name this RFC gives it, or do you have other concerns ?


@eddyb

hint::pessimize or hint::unoptimize

In Google Benchmark, this API is called DoNotOptimize, but I prefer the names you propose.

@Ixrec

Contributor

Ixrec commented Feb 13, 2019

I thought we wanted to avoid names like "unoptimize" because they imply the code inside the proverbial black box doesn't get optimized at all. What we actually want is to prevent certain kinds of benchmark-invalidating optimizations, like throwing out the entire benchmark as dead code or constant-folding using test values outside the box, without preventing any of the optimizations "fully inside the box" that we're probably actively trying to benchmark.

I've never heard of "black box" having any precise meaning related to UB or constant-time-ness or any of the other use cases described, but if that is a real thing, this seems like a case where every name has problems and we'll just have to pick which one is least bad in practice (both options seem equally guessable to me for users interested in the intended benchmarking use case).

@Centril

Contributor

Centril commented Feb 13, 2019

Fwiw, I had never heard of black_box before Rust, so I associate it solely with benchmarking. We could also recommend that users write the hint:: prefix so that it is clearer that it's a hint.

@cramertj

Member

cramertj commented Feb 13, 2019

@gnzlbg

I think I was just misinterpreting your concern. Is your only concern about this API the name this RFC gives it, or do you have other concerns ?

If we give the API this name, then IMO it should do the other things. If there's another name that we can use that is clearer about the fact that this is not guaranteed to do anything at all, then I'm fine with it (though I think the other thing is also useful and something that we should offer in some form).

@RalfJung

Member

RalfJung commented Feb 13, 2019

I'd argue that black_box's semantics can be defined reasonably precisely within the context of a certain compilation model: compiling to some sort of low-level assembly code, where memory consists of bytes (which always have concrete numeric values), and there is a concept of a call to an unknown foreign function (i.e. there is some sort of ABI, as opposed to guaranteed whole program optimization). In this case, the definition is something like "generate code as if this argument were passed to an unknown foreign function". As long as you stay within that compilation model, I think we actually could determine whether or not any given optimization pass correctly respects those semantics.

I understand how this affects the results returned by various analyses performed inside a compiler. But what this does not do is define what executing black_box does. You say it is like a "call to an unknown foreign function", but of course there is no assembly instruction or language construct that means "call an unknown foreign function" -- all function calls must and will be known when the program actually executes.

We could specify this as "generate some random piece of code and run it", but of course that will do no good -- the piece of code could do the wrong thing.^^ I think it might work to say that we angelically pick a random piece of code and run it, but, uh, that seems very weird and I have no idea if it works and it is certainly out-of-scope for this benchmarking utility.

("Angelic non-determinism" means that the programmer gets to make the choice during program execution, so for the program to be well-behaved it is sufficient if there is a way to make a good choice. This is in contrast to "daemonic non-determinism" where the compiler/OS/CPU/... gets to make the choice, so for the program to be well-behaved all possible choices must be considered. The usual non-determinism of the allocator or the scheduler is daemonic.)

@hdevalence


hdevalence commented Feb 13, 2019

@gnzlbg

@hdevalence
I'm very disappointed with the direction that this RFC went.

I know zero about writing constant-time code, but to ensure that your code is constant time, wouldn't you need to track secrets and ensure that they do not appear on any branches, etc.? That sounds very different from "do not optimize this code" (whatever that might mean).

Yes, constant-time protections are a very different problem, using black_box is not a solution to them, and, TO BE VERY CLEAR, my code DOES NOT USE IT FOR THAT PURPOSE [1]. I made the mistake of trying to keep track of my own work on a public repo and linked to this discussion from a placeholder issue there, which caused Github to add a cross-reference.

What I'm disappointed about is that at that point, a number of people in this discussion decided to express unsolicited opinions about what I was doing, that it was wrong, and then inserted language into the RFC that's specifically derogatory to my application, even though nobody actually bothered to ask what I was doing, make a good-faith effort to understand it, or trust that I had any domain expertise of my own.

There aren't "twenty different intents" behind black_box, there was a pretty clear one (nicely re-summarized by @comex a few comments ago), before it was removed.

[1]: Just so there is no misunderstanding about intent, I'm putting this in bold all-caps not because I'm upset with @gnzlbg, but because I've said it before, months ago, and nobody seemed to listen or notice, probably because RFC discussions are huge walls of text that nobody actually reads and at least this way that fact will stand out.

@Ixrec

Contributor

Ixrec commented Feb 13, 2019

I'm confused now. While there is certainly a question up-thread about whether @hdevalence was after timing guarantees, this appears to have been cleared up by @hdevalence's first reply and never brought up again until just now. The motivation for specifying black_box as the "identity function" instead of as an "unknown function" as I understand it is simply that "unknown function" is not a concept we have any way to rigorously specify in a portable way (looking at the commit history, I don't think this was ever not the intent; the wording merely got clearer over time). I think @comex is arguing that we can rigorously specify it in a non-portable way, and @RalfJung's reply is saying that even within those non-portable assumptions the problem remains that we can't rigorously specify it. Finally, everyone seems to agree that implementations should make black_box actually inhibit certain kinds of optimizations on platforms where it makes sense to do so.

So, unless someone suggested a concrete way to go about rigorously specifying this and we all missed it, I'm not sure what we're even disagreeing about now, if anything.

@cramertj

Member

cramertj commented Feb 13, 2019

Finally, everyone seems to agree that implementations should make black_box actually inhibit certain kinds of optimizations on platforms where it makes sense to do so.

I think there's another camp here which is suggesting that it not even reliably do that much, and just maybe inhibit an optimization in order to help with benchmarks. If that's what's wanted here, then I'd be fine moving forward with e.g. bench_used or something rather than black_box, in order to indicate the more limited scope here.

@rkruppe

Member

rkruppe commented Feb 13, 2019

@hdevalence wrote:

There aren't "twenty different intents" behind black_box, there was a pretty clear one (nicely re-summarized by @comex a few comments ago), before it was removed.

I have no skin in the "constant time" kerfuffle, but since I was quoted I should state for the record that I did not count whatever they're doing in dalek-cryptography among the figurative "twenty" because I had in fact forgotten entirely about the whole affair. I was instead thinking more of things that might be broadly categorized as:

  • wanting to hinder a specific optimization that is considered undesirable for some reason
  • wanting to hinder a broad category of optimizations (e.g., for benchmarking)
  • wanting to get away with doing something that is UB by hiding it from "The Optimizer"

... with the additional dimension of "how serious" it would be if they didn't achieve it.

What @comex described is an outline of how an implementation should behave in one common scenario. It is OK as such, though for reasons @RalfJung explained eloquently, it is not at all sufficient for actual semantics or a specification. It would be great to have a non-normative note that describes how it's implemented in rustc, though! But I also want to point out that what they described is a mechanism, a means to an end, not an end in itself.

The actual intents I am referring to are, for example, "I don't want this microbenchmark to be trivialized" or "I want to launder this memory so that it is considered initialized" or "I want to write to memory behind a &T without The Optimizer messing it up". Some of these (most, I would hope) are reasonable things to want, but how appropriate the mechanism @comex described is for them varies. It's decent for benchmarking, for other things a more fine-grained tool is desirable, and for yet other things it's insufficient because what is desired clashes completely with other language semantics or the "escape hatch" being a nop is unacceptable. For example, I don't think any precise meaning black_box put forth in this discussion would really cover the proposed ptr::freeze, and it can't help someone who wants to do UB and get away with it.

@Ixrec

Contributor

Ixrec commented Feb 13, 2019

Finally, everyone seems to agree that implementations should make black_box actually inhibit certain kinds of optimizations on platforms where it makes sense to do so.

I think there's another camp here which is suggesting that it not even reliably do that much, and just maybe inhibit an optimization in order to help with benchmarks.

Interesting. Unfortunately I don't understand what distinction you're trying to draw here, so maybe this is the crux of my confusion. Is there some meaningful status in between the binary "normative part of the specification required for an implementation to be correct" and "non-normative suggestion/recommendation/encouragement/QoI issue"?

@Manishearth

Member

Manishearth commented Feb 13, 2019

I think the constant time stuff is mostly a red herring, I apologize for using it as an example. Folks were talking about it and I included it, I should have recalled the full context.

--

I like @rkruppe's analysis:

  1. wanting to hinder a specific optimization that is considered undesirable for some reason
  2. wanting to hinder a broad category of optimizations (e.g., for benchmarking)
  3. wanting to get away with doing something that is UB by hiding it from "The Optimizer"

3 is what we're trying to avoid providing in this RFC, and 2 is what the goal of the RFC was. 1 is interesting because it's very similar to 2; however, 1 requires more spec work and is tricky to do in a portable manner (though @comex's comment is a step).

As I understand it most use cases of 1 are after someone has looked at the generated assembly and optimization pipeline anyway, i.e. it's somewhat compiler-specific. People using black_box for use case 1 may not care as much about alternative rust compilers doing the same thing.

With that in mind, a feasible middle ground is that we specify black_box to not have any special properties, while rustc documents black_box as doing X on platform Y, etc., allowing people to rely on it for compiler optimization hinting purposes (but still not for UB purposes).

@Centril

Contributor

Centril commented Feb 13, 2019

I don't think we should introduce any implementation-defined behavior here, because rustc, as the only compliant Rust implementation, will make this behavior de facto in a way which all other implementations will eventually have to follow, since some folks will expect it. The non-portable nature doesn't help either.

Moreover, I don't think it's helpful for the use-case of benchmarking. To me, the fact that this RFC doesn't guarantee anything, beyond being the identity function, makes it more fit for benchmarking as this allows us to change how hint::black_box achieves its purpose on varying platforms depending on how LLVM and other backends change their optimization pipelines. We have plenty of similarly vague hints guaranteeing nothing: #[inline], likely, #[optimize(...)], hint::spin_loop, and hint::unreachable_unchecked.

For the use-cases that require guarantees, we've already started work on freeze, and at the All Hands we also discussed inline assembly (which should fit well given "[..] after someone has looked at the generated assembly and optimization pipeline anyway, [..]"). I find those tailored mechanisms to provide more clarity wrt. semantics.

Given existing constraints as well as:

If that's what's wanted here, then i'd be fine moving forward with e.g. bench_used or something rather than black_box, in order to indicate the more limited scope here.

I think the next steps here are to enter a bikeshed phase, find a replacement name for black_box (which could be bench_used) that we are all happy with, and then :shipit:.

@Manishearth

Member

Manishearth commented Feb 13, 2019

will make this behavior de-facto in a way which all other implementations will eventually have to follow since some folks will expect it

I think there are ways of documenting this that avoid this effect; this can be explicitly documented as a "don't rely on this for unsafe code", and the people relying on this to prevent optimizations are at a stage where they're looking at llvm output anyway.

Use case 1 is something that's pretty valuable -- I've certainly had to do stuff like that in the past, though usually through other means -- and supporting that use case would be nice if we can do so without requiring a proper spec.

@Centril

Contributor

Centril commented Feb 13, 2019

I think there are ways of documenting this that avoid this effect; this can be explicitly documented as a "don't rely on this for unsafe code", and the people relying on this to prevent optimizations are at a stage where they're looking at llvm output anyway.

That still leaves us with people relying on this for non-unsafe cases (e.g. security) and makes it harder to change how hint::black_box operates later.

Use case 1 is something that's pretty valuable -- I've certainly had to do stuff like that in the past, though usually through other means -- and supporting that use case would be nice if we can do so without requiring a proper spec.

Sure; but we can provide mechanisms to satisfy those needs separately without baking them into black_box. If you are going to look at the generated assembly in any case, it seems more readable to use inline assembly for that or some other tailored mechanism.

@comex


comex commented Feb 13, 2019

I understand how this affects the results returned by various analyses performed inside a compiler. But what this does not do is define what executing black_box does. You say it is like a "call to an unknown foreign function", but of course there is no assembly instruction or language construct that means "call an unknown foreign function" -- all function calls must and will be known when the program actually executes.

I'll try answering this in two ways.

First, it's true that my proposed definition describes the behavior of a compiler rather than explicitly defining the semantics of the program itself. However, since the definition is already restricted to compilation models involving a compiler that produces assembly code, that doesn't make it incomplete. It's easier to specify with respect to a compiler because the framework we usually use to describe program semantics doesn't include an explicit notion of a low-level machine. But since the definition is complete, if we had a well-defined framework with such a notion, it necessarily could be described in terms of that framework.

Second, if I do try to word it in terms of semantics, it would be based on the idea of synchronizing the low-level machine state with the high-level language state.

After all… do you agree that asm! can at least theoretically be given well-defined semantics? (I say theoretically, because asm! is definitely complicated and subtle enough that fully describing its semantics would be a significant undertaking. However, it's also a feature that low-level Rust programs need, so unlike black_box it can't just be punted. :)

The current definition of black_box is:

```rust
pub fn black_box<T>(dummy: T) -> T {
    // we need to "use" the argument in some way LLVM can't
    // introspect.
    unsafe { asm!("" : : "r"(&dummy)) }
    dummy
}
```

The assembly inside the asm! block is allowed to assume that the low-level state reflects the high-level language state: it can do a low-level memory read and expect to read any value that was previously (in program order) written there by high-level code. Thus, before executing the assembly, the compiler-generated code must flush any pending writes, and generally transfer the high-level state to the low-level state. Similarly, the assembly is allowed to modify the low-level state and expect later high-level code to see those modifications, so the generated code must perform the reverse process after executing the assembly.

The assembly is restricted in one way: it must only access data through its specified inputs and possibly[1] through global variables. It's not allowed to guess where on the stack or in registers some data will be stored; if it guesses (even correctly) and modifies such data, the result is undefined behavior, because the compiler is allowed to assume that "unobservable" low-level state is unaffected by the asm! block.

Now, the current implementation makes somewhat stronger guarantees about the low-level state than necessary. It:

  1. forces the value into memory (implied by materializing &dummy);
  2. forces &dummy itself into a register (which then may go unused); and
  3. possibly should[1] make the compiler assume that global data could have been modified, in addition to data accessible via pointers in dummy.

Ideally, it should be enough to guarantee that at the time the low-level and high-level states are synchronized, the value exists somewhere in the low-level state – in memory, in a register, or even in multiple places – as long as all those bits of state, along with any memory reachable via pointers in the value (but not globals), are then copied back from low-level state to high-level state.

In reality, the inline assembly block is empty and thus does nothing. However, the low-level state at that moment must be considered externally observable and modifiable. If I attach a debugger to the program, set a breakpoint at the right PC, and modify that state at that moment, the program must continue to behave correctly (which is of course not guaranteed in general for modifications from a debugger).

[1] I'm a bit confused about whether the asm block is actually buggy and ought to include the memory clobber. The LLVM language reference suggests it is not buggy: "The one exception is that a clobber string of “~{memory}” indicates that the assembly writes to arbitrary undeclared memory locations – not only the memory pointed to by a declared indirect output." However, this is meant to be a replica of GCC inline assembly, and the [GCC manual] says that the memory clobber "tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example, accessing the memory pointed to by one of the input parameters)." At least based on the documentation, it does seem like LLVM is intentionally implementing stronger semantics, but I'm not sure why.

@comex


comex commented Feb 14, 2019

By the way, what I wrote above sounds really complicated, but I really don't think it should be a blocker to stabilizing black_box. Those semantics are already implemented, and have been for years. The only target where they're not implemented is WebAssembly, and that's just a bug, really (LLVM should just support inline assembly for wasm). We know how Cranelift or any future backend should implement it. We even know how to translate it to C (at least GCC-flavored C – just translate the asm block).

The only question is what miri should do if it wants to implement it. I do not think it should go around freezing memory or otherwise try to simulate its high-level effects on a native code target. There would have to be a special-purpose MIR construct to replace the existing asm block, which would inhibit MIR optimizations in the same way, following the contract that black_box inhibits optimizations. But miri itself should then just do the obvious thing and treat it as a no-op. (Possibly it should freeze the bytes of the argument itself, but not anything else reachable from it.) black_box is fundamentally a low-level construct whose effects depend on the compilation target. That's okay. The documentation can warn about that.

The main potential downside I can see would be that applications might inadvertently rely on guarantees about black_box's high-level effects which wouldn't apply on all targets. We could instead stabilize black_box with weaker semantics and expect applications that need the stronger semantics to use asm! – at least that makes it obvious that low-level behavior is involved.

The problem with that argument, though, is that just having the documentation say X behavior is not guaranteed is not particularly effective at preventing people from writing code that relies on that behavior. The best we can do is point people in the right direction, where "right" in this case includes higher-level primitives that freeze memory and such. And that can happen regardless of what black_box itself guarantees.

@gnzlbg

Contributor Author

gnzlbg commented Feb 14, 2019

Thus, before executing the assembly, the compiler-generated code must flush any pending writes,

LLVM currently has to do this, but what if a new optimization pass is added to LLVM that understands the semantics of this asm! block? That pass (a pattern match on empty assembly blocks) could prove that this does nothing, and remove the block.

@gnzlbg

Contributor Author

gnzlbg commented Feb 14, 2019

I'm a bit confused about whether the asm block is actually buggy

FWIW the implementation used in the RFC examples does use the "memory" clobber (e.g. https://rust.godbolt.org/z/wDckJF); whether you use it or not (and volatile) depends on what you want to do with the intrinsic. These aren't equivalent, but for the problem this RFC solves that is not important.
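For reference, hedged sketches of the two variants being contrasted, restated in today's `asm!` syntax (the thread-era code used the old `asm!("" : : "r"(&dummy) : "memory" : "volatile")` form; the operand and option choices below are approximations, not the exact RFC implementation):

```rust
use core::arch::asm;

// Variant 1: let the value's address escape into the asm, which may read the
// pointed-to memory but is assumed (via `readonly`) not to write any memory,
// so values the compiler has cached in registers stay valid afterwards.
pub fn black_box_escape<T>(dummy: T) -> T {
    unsafe {
        asm!("/* {0} */", in(reg) &dummy as *const T, options(readonly, nostack));
    }
    dummy
}

// Variant 2: without `readonly` the asm is additionally assumed to write
// arbitrary reachable memory -- the rough counterpart of the old "memory"
// clobber -- so such cached values must be re-read after the block.
pub fn black_box_clobber<T>(dummy: T) -> T {
    unsafe {
        asm!("/* {0} */", in(reg) &dummy as *const T, options(nostack));
    }
    dummy
}
```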

@RalfJung

Member

RalfJung commented Feb 14, 2019

First, it's true that my proposed definition describes the behavior of a compiler rather than explicitly defining the semantics of the program itself. However, since the definition is already restricted to compilation models involving a compiler that produces assembly code, that doesn't make it incomplete. It's easier to specify with respect to a compiler because the framework we usually use to describe program semantics doesn't include an explicit notion of a low-level machine. But since the definition is complete, if we had a well-defined framework with such a notion, it necessarily could be described in terms of that framework.

I am afraid I strongly disagree. It is far from complete. Languages are not specified by a compiler; the compiler is justified against the language specification. You described the implementation of black_box in the compiler backend, but that does not make it a useful specification. I am also at a loss for words, I only know so many ways to say the same thing.^^

If you make the compiler part of the spec, you have lost any ability to judge whether the compiler is correct -- it is part of the spec and correct-by-definition. That is not a useful criterion.

Just to give an example, assume we change our compilation pipeline such that it optimizes the final assembly code after linking. Assume static linking so that there is no unknown code. Without black_box, it is quite clear what we have to be careful about: do not change program behavior, period. With black_box, we now have to also magically inhibit optimizations around a construct that doesn't even exist any more at this stage in the pipeline. We don't even have a way to answer the question "what makes an optimization here correct", and that's a pretty bad place to be in.

We need a way to specify what this function does without mentioning a compiler. And the more I think about it, the more I am convinced that it is a maximally angelic effect from the compiler's perspective, with the additional out-of-band requirement that the only effect the programmer may choose is no effect.

do you agree that asm! can at least theoretically be given well-defined semantics?

Yes. But under that semantics, black_box would indeed be a NOP, because the semantics would say that (after some mapping between high-level state and low-level state as you suggested), you run the assembly code. That code does nothing. Then we map back. So, under any semantics for asm! that actually models what happens (executing the empty assembly code), black_box is still a NOP.

The only reason black_box happens to work is that LLVM does not analyze the assembly code inside the asm!. But there is nothing in the hypothetical asm! spec that prevents it from doing so. These annotations only serve to describe an "upper bound" to what the asm! block does, to aid the compiler in its analysis. IOW, it is UB for an asm! block to do things that are not covered by the annotations -- but throwing away the annotations and instead doing a proper analysis of the included assembly code is always correct. In that sense, the annotations work just like some of the attributes that C compilers have, such as noreturn or readonly.

@withoutboats

Contributor

withoutboats commented Feb 14, 2019

Checking my box and unsubscribing. I think this thread is an example of the failure of balancing the abstract goal of a well specified language against the practical goal of satisfying our users needs. I am much more interested in having the black_box defined in rustc, which does what we all know it does, be stable than worrying about what semantics it has in hypothetical future Rust implementations that may or may never happen. I think the blasé attitude with which these hypothetical concerns have been centered, and the disrespect and disregard shown to active users of this feature, is a prime example of callousness and heartlessness that our RFC process brings out. This thread disappoints me.

@comex


comex commented Feb 14, 2019

If you make the compiler part of the spec, you have lost any ability to judge whether the compiler is correct -- it is part of the spec and correct-by-definition. That is not a useful criterion.

In theory, I'm not talking about any specific compiler or even any specific optimizations – only a general description of a compiler that produces some form of low-level assembly code.

The rest of your comment is about the idea of the compiler optimizing or analyzing the contents of an inline assembly block. That would definitely violate the assumptions of some existing use cases of inline assembly in C, including, among others, uses that implement the equivalent of black_box, and uses where the assembly is expected to be patched at runtime (the Linux kernel likes to do this). Therefore, it would not be compatible with the current semantics of asm in C, and most likely asm! in Rust.

Indeed, the runtime patching example shows that with asm, the machine code itself is part of the observable (and mutable) state of the program. There is no way to specify this without a compiler.

It's probably not necessary to go that far just for black_box, but I don't see much reason not to, either.

@RalfJung

Member

RalfJung commented Feb 14, 2019

I think this thread is an example of the failure of balancing the abstract goal of a well specified language against the practical goal of satisfying our users needs. I am much more interested in having the black_box defined in rustc, which does what we all know it does, be stable than worrying about what semantics it has in hypothetical future Rust implementations that may or may never happen.

I am arguing strongly against people claiming that black_box is reasonably specified. I'd argue much less strongly against people saying that it is worth having an unspecifiable primitive, but that is not the argument that was made. In fact it seems like freeze + a renamed version of what is being discussed here covers a vast amount of use-cases, and the process of identifying use-cases not covered by this is still on-going (in parallel with at least two other discussions all happening in the same thread :/ ).

Also notice that even freeze, which is vastly less powerful than black_box, was discussed and argued against by users, not people doing language specs, years ago. The opposition is far from just being hypothetical.

I think it is an attitude of ignoring specifiability concerns that has led C/C++ to where they are today. And maybe being extremely widely used but impossible to reason about counts as success, but I think we can do better than that -- and we should at least explicitly acknowledge when we are deliberately repeating such mistakes.

Finally, (assuming that part of your comment is pointed, amongst others, at me), I think it is unfair to claim I would disrespect users' needs. I inquired for more details about what @cramertj needs, which probably got lost in the discussion (there was no reply to this). I also provided concrete arguments for what I think is wrong with black_box, which you chose not to reply to.

There is no precedent in Rust for adding an unspecifiable primitive. This goes against the core value of offering safety guarantees, of being able to (informally) reason about what your code does. Giving up on this and dismissing concerns of the kind "LLVM could break this at any time by doing any one of the following things that seems perfectly in-contract for a compiler to do" as purely hypothetical is not helpful. Playing users and language specifiers against each other as if they didn't have a shared interest is not helpful, either. The helpful thing to do here is to try and figure out what kinds of guarantees are needed.

@RalfJung

Member

RalfJung commented Feb 14, 2019

The rest of your comment is about the idea of the compiler optimizing or analyzing the contents of an inline assembly block. That would definitely violate the assumptions of some existing use cases of inline assembly in C, including, among others, uses that implement the equivalent of black_box, and uses where the assembly is expected to be patched at runtime (the Linux kernel likes to do this). Therefore, it would not be compatible with the current semantics of asm in C, and most likely asm! in Rust.

If that is indeed part of the semantics of asm!, then I cannot say I am convinced it is specifiable. Clearly the specification is not "run this assembly code", it is something much more complex than that.

Notice, however, that one part of my argument was assuming we'd optimize the assembly code, after everything LLVM does. That is certainly something we are able to do, because CPUs do that! So if the expectation is that the assembly file will be patched before execution, relying on not even semantics-preserving transformations having happened, then clearly the semantics of the file generated by LLVM is... uh, something really weird, but not "this is an assembly file". This seems like a really hard situation to specify properly. Does anyone know of any work in this direction?

Lucky enough, we don't have to solve this problem here and now, I think.^^

@eddyb

Member

eddyb commented Feb 14, 2019

Maybe we should shrink this in scope even further and have two primitives?

  • bench_obscure (or bench_input)
    • fn(T) -> T (identity-like)
    • may prevent some optimizations from seeing through the valid T value
      • more specifically, things like const/load-folding and range-analysis
    • miri would still check the argument, and so it couldn't be e.g. uninitialized
    • the argument computation can be optimized-out (unlike bench_used)
    • mostly implementable today with the same strategy as black_box
  • bench_used (or bench_output)
    • fn(T) -> () (drop-like)
    • may prevent some optimizations from optimizing out the computation of its argument
    • the argument is not treated as "escaping into unknown code"
      • i.e. you can't implement bench_obscure(x) as { bench_used(&mut x); x }, what that would likely prevent is placing x into a register instead of memory, but optimizations might still see the old value of x, as if it couldn't have been mutated
    • potentially implementable like black_box but readonly/readnone

This seems specific enough to benchmarks/performance profiling to be less of a liability, miri and the documentation can be clear about this being much unlike freeze, and more things can be optimized.
(and it certainly doesn't need to depend on the semantics of asm!)

It also feels significantly better than black_box in that we're not reusing a primitive for two subtly distinct sides ("input" and "output") of a computation that we want to measure performance characteristics of, with subtly different (de)optimization needs.

You could even imagine having a hint::bench(T, impl FnOnce(T) -> U) -> U abstraction which handles the asymmetry while requiring less manual control.

Bonus points for integrating the input/output with a bench framework so users never even have to use these hints directly.
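A hedged sketch of what the hint::bench shape mentioned above could look like, written against the hypothetical bench_obscure/bench_used primitives (none of these exist; the bodies are placeholders that only show the intended contract):

```rust
// Placeholder: would hide the concrete value of `x` from the optimizer while
// still requiring it to be a valid T.
pub fn bench_obscure<T>(x: T) -> T {
    x
}

// Placeholder: would force the computation of `x` to be considered live,
// without treating it as escaping into unknown code.
pub fn bench_used<T>(x: T) {
    drop(x);
}

// Combined helper: obscure the input, run the closure, and mark its output
// as used before handing it back.
pub fn bench<T, U>(input: T, f: impl FnOnce(T) -> U) -> U {
    let out = f(bench_obscure(input));
    bench_used(&out);
    out
}
```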

@comex


comex commented Feb 14, 2019

@RalfJung I agree it's not a problem we have to solve for black_box, so I'm going to resist the temptation to reply, to avoid cluttering this thread more than I already have. But, since stabilizing asm! is also a goal, I'd be interested in talking more about it somewhere else. :)

The freeze example is interesting. It seems like the opposition is along very different lines from what this thread has discussed: not that it's hard to specify, or that future implementations might have trouble defining or upholding the semantics, but basically "there is no good reason to ever use this". I suppose you might make the same argument about the implicit freezing that black_box can do, but overall it seems like a fairly different debate.

Anyway, I don't mind @eddyb's specification; it seems potentially more user-friendly than the current version, which can be confusing. I still think it would also be fine to stabilize black_box as is, though.
