RFC: hint::black_box #2360
sfackler
reviewed
Mar 12, 2018
[motivation]: #motivation

The implementation of these functions is backend-specific, and must be provided
by the standard library.
sfackler
Mar 12, 2018
Member
Probably worth adding why these are useful for benchmarks to the motivation section.
gnzlbg force-pushed the gnzlbg:black_box branch 4 times, most recently from b3f4441 to 8a9ae3f Mar 12, 2018
rkruppe
reviewed
Mar 12, 2018
Here, the compiler can simplify the expression `2 + x` into `2 + 2` and then
`4`, but it is not allowed to discard `4`. Instead, it must store `4` into a
register even though it is not used by anything afterwards.
rkruppe
Mar 12, 2018
Member
Nit, but this doesn't really match what the implementation of black_box does (and what the linked godbolt shows). It forces the value into memory, and any register traffic is due to that (e.g. in the linked godbolt, the add edi, 2 is not a dead definition, it's immediately stored to [rbp - 4], and that store is dead).
gnzlbg
Mar 12, 2018
Contributor
@rkruppe how would you formulate this for the guide-level explanation? Maybe I can just say that 4 must be stored into memory and leave it at that.
rkruppe
Mar 12, 2018
Member
That seems decent. But it's tricky to say anything about this sort of feature, because it's inherently very tied to optimizer capabilities and the resulting machine code =/
gnzlbg
Mar 13, 2018
Contributor
So I've reworded this part a bit, but maybe for the guide level explanation we might want to be even more vague and just leave it at: "the value cannot be discarded" or "prevents the value x from being optimized away".
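A minimal sketch of the behaviour being discussed, assuming today's `std::hint::black_box` as a stand-in for the function this RFC proposes. The name `foo` and the expression `2 + x` follow the RFC text under review; `foo_opaque_input` is a hypothetical variant added for illustration. What the optimizer actually emits still has to be checked in the generated assembly.

```rust
use std::hint::black_box;

/// Mirrors the `foo` example from the RFC text under review.
/// The compiler may still fold `2 + x` when `x` is known, but the result is
/// handed to `black_box`, so it is not expected to be discarded as unused.
pub fn foo(x: i32) -> i32 {
    black_box(2 + x)
}

/// Hypothetical variant that also hides the input from the optimizer, so the
/// addition itself is less likely to be constant-folded at the call site.
pub fn foo_opaque_input(x: i32) -> i32 {
    black_box(2 + black_box(x))
}
```

Because `black_box` behaves as an identity function, `foo(2)` still evaluates to `4`; only the optimizer's assumptions about the value change.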
gnzlbg force-pushed the gnzlbg:black_box branch from 8a9ae3f to ec42fd0 Mar 12, 2018
rkruppe
reviewed
Mar 12, 2018
pub fn clobber() -> ();

flushes all pending writes to memory. Memory managed by block scope objects must
rkruppe
Mar 12, 2018
Member
This wording (flushing pending writes) makes me uncomfortable because it's reminiscent of memory consistency models, including hardware ones, when this is just a single-threaded and compiler-level restriction. Actually, come to think of it, I'd like to know the difference between this and compiler_fence(SeqCst). I can't think of any off-hand.
gnzlbg
Mar 12, 2018
Contributor
> when this is just a single-threaded and compiler-level restriction
This is correct.
> compiler_fence(SeqCst). I can't think of any off-hand.
I can't either, but for some reason they do generate different code: https://godbolt.org/g/G2UoZC
I'll give this some more thought.
nagisa
Mar 12, 2018
Contributor
The difference between asm! with a memory clobber and compiler_fence is that a memory clobber requires the compiler to actually reload memory if it wants to use it again (the memory is clobbered, i.e. considered changed), whereas compiler_fence only enforces that memory accesses are not reordered; the compiler may still use its usual rules to conclude that it need not reload anything.
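To make the contrast concrete, here is a hedged sketch that places the two primitives side by side. It uses the currently stable `asm!` syntax rather than the 2018 syntax the thread refers to, and the names `fence`, `clobber`, and `store_then_load` are hypothetical. The sketch assumes that an `asm!` block without the `nomem` option is treated as possibly reading and writing memory; on architectures other than the two listed it falls back to the weaker fence.

```rust
use std::sync::atomic::{compiler_fence, Ordering};

/// Compiler-only fence: restricts reordering of memory accesses across this
/// point, but does not by itself mark memory as modified.
#[inline(always)]
pub fn fence() {
    compiler_fence(Ordering::SeqCst);
}

/// Hypothetical `clobber` helper (not a std API): an empty asm block that
/// does not declare `nomem`, so the compiler has to assume it may read and
/// write memory.
#[inline(always)]
pub fn clobber() {
    #[cfg(any(target_arch = "x86_64", target_arch = "aarch64"))]
    unsafe {
        std::arch::asm!("", options(nostack, preserves_flags));
    }
    // Assumption: on other targets this sketch degrades to the fence.
    #[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
    compiler_fence(Ordering::SeqCst);
}

/// Stores to `slot`, applies one of the two barriers, then loads it again.
/// The run-time result is the same either way; what may differ is whether
/// the optimizer is permitted to forward the stored value to the load.
pub fn store_then_load(slot: &mut u32, use_clobber: bool) -> u32 {
    *slot = 7;
    if use_clobber {
        clobber();
    } else {
        fence();
    }
    *slot
}
```

Comparing the generated assembly of the two paths (for example on godbolt, as done in the thread) is the only reliable way to see whether a given compiler version treats them differently.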
gnzlbg
Mar 13, 2018
Contributor
@nagisa the only thing that clobber should do is flush pending writes to memory. It doesn't need to require that the compiler reloads memory on reuse. Maybe the volatile asm! with memory clobber is not the best way to implement that.
rkruppe
Mar 13, 2018
Member
@nagisa Thank you. I was misled by the fact that fences prohibit some load-store optimization into thinking they'd also impact things like store-load forwarding on the same address with no intervening writes.
@gnzlbg asm! with memory is simply assumed to read from and write to all memory and all its effects follow from that. However, to be precise, this does not mean any reloads are introduced after a clobbering inline asm, it just means that all the loads that are already there (of which there are a lot, given that every local and many temporaries are stack slots) can't be replaced with values loaded from the same address before the clobber.
If you want something less strong, you need to be precise. "Flushing loads writes" is not really something that makes intuitive sense at the compiler level (and it surely has no effect on runtime, for example caches or store buffers?).
gnzlbg
Mar 13, 2018
Contributor
@rkruppe mem::clobber() should be assumed to read from all memory with effects, so that any pending memory stores must have completed before the clobber. All the loads that are already there in registers, temporary stack slots, etc, should not be invalidated by the clobber.
rkruppe
Mar 13, 2018
Member
Okay that is a coherent concept. I'm not sure off-hand how to best implement that in LLVM. (compiler_fence is probably not enough, since it permits dead store elimination if the memory location is known to not escape.)
rkruppe
Mar 13, 2018
Member
However, come to think of it, what's the difference between black_box(x) (which is currently stated to just force a write of x to memory) and let tmp = x; clobber(); (which writes x to the stack slot of tmp and then forces that store to be considered live)?
gnzlbg
Mar 13, 2018
Contributor
That's a good question and this relates to what I meant with "block scope". In
{
    let tmp = x;
    clobber();
}
this {} block is the only part of the program that knows the address of tmp so no other code can actually read it without invoking undefined behavior. So in this case, clobber does not make the store of tmp live, because nothing outside of this scope is able to read from it, and this scope doesn't read from it but executes clobber instead (clobbering "memory" does not clobber temporaries AFAICT).
However, if one shares the address of tmp with the rest of the program:
{
    let tmp = x;
    black_box(&tmp);
    clobber();
}
then clobber will force the store of tmp to be considered live.
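The two blocks above can be written as compilable functions. This is only a sketch: `clobber` is a hypothetical helper built from an empty `asm!` block in the currently stable syntax (with a `compiler_fence` fallback on other architectures), `std::hint::black_box` stands in for the proposed function, and `tmp` is returned purely so the functions have an observable result.

```rust
use std::hint::black_box;

/// Hypothetical `clobber` helper (not a std API): an empty asm block without
/// `nomem`, which the compiler has to assume may access memory.
#[inline(always)]
fn clobber() {
    #[cfg(any(target_arch = "x86_64", target_arch = "aarch64"))]
    unsafe {
        std::arch::asm!("", options(nostack, preserves_flags));
    }
    // Assumption: weaker fallback for architectures not listed above.
    #[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
    std::sync::atomic::compiler_fence(std::sync::atomic::Ordering::SeqCst);
}

/// The address of `tmp` never leaves this function, so, following the
/// argument in the comment, the clobber is not expected to keep a store to
/// `tmp` alive.
pub fn local_only(x: u64) -> u64 {
    let tmp = x;
    clobber();
    tmp
}

/// Here the address of `tmp` escapes through `black_box` first, so the
/// clobber is expected to treat the store to `tmp` as observable.
pub fn escaped(x: u64) -> u64 {
    let tmp = x;
    black_box(&tmp);
    clobber();
    tmp
}
```

Both functions return their argument unchanged at run time; any difference shows up only in the generated machine code.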
gnzlbg referenced this pull request Mar 12, 2018
Closed: Move test::black_box to std and stabilize it #1484
leodasvacas
reviewed
Mar 12, 2018
let mut v = Vec::with_capacity(n);
bench.iter(|| {
    // Escape the vector pointer:
    mem::black_box(v.as_ptr());
gnzlbg
Mar 13, 2018
Contributor
Not in this case because the vector does not grow. If the vector were allowed to grow on push, then the answer would be yes.
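A runnable approximation of the benchmark under review, written without the unstable `test` crate. The function name `bench_push` and the timing loop are assumptions made for illustration; `std::hint::black_box` replaces the RFC draft's `mem::black_box`, and because the draft's `mem::clobber` has no std equivalent, the sketch passes a reference to the vector to `black_box` after each push instead.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Pushes `n` values into a vector that was pre-allocated with capacity `n`,
/// so no push reallocates and the buffer pointer stays valid throughout.
pub fn bench_push(n: usize) -> (Vec<u32>, Duration) {
    let mut v: Vec<u32> = Vec::with_capacity(n);
    let start = Instant::now();
    for i in 0..n {
        // Escape the vector's buffer pointer, as in the RFC snippet.
        black_box(v.as_ptr());
        // Hide the pushed value so it is not treated as a known constant.
        v.push(black_box(i as u32));
        // Stand-in for the draft's `clobber()`: expose the vector again so
        // the write is treated as potentially observed.
        black_box(&v);
    }
    (v, start.elapsed())
}
```

If the vector were created with a smaller capacity, a push could reallocate and the previously escaped pointer would no longer refer to the live buffer, which is the case the reply above distinguishes.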
Centril added the T-libs label Mar 12, 2018
I was already concerned about the interaction with the memory model, and this comment basically confirmed my worst suspicions: there are extremely subtle interactions between the memory model and what these functions can do. I'm starting to believe that we'll be unable to give any guarantees that are of any use to benchmark authors. But if this is the case, these functions become a matter of chasing the optimizer, and benchmarking reliably requires checking the asm to see that your work wasn't optimized out. That's clearly very unappealing, but it has always been true of micro benchmarks, so maybe that's just unavoidable. This also raises the question of why this needs to be in std, if it's not compiler magic (like
I'm temporarily closing this till we resolve the semantics of these functions in the memory model repo.
gnzlbg closed this Mar 13, 2018
gnzlbg referenced this pull request Mar 13, 2018
Open: semantics of black_blox and clobber in the memory model #45
The issue in the memory model repo is: nikomatsakis/rust-memory-model#45
Update. The summary of the discussion on the memory model repo is that
Any updates on opening the new RFC?
I've updated the RFC with the discussion of the memory model repo. It now just proposes
Other changes:
Open questions:
gnzlbg reopened this Aug 29, 2018
Unsure if we should be opening a new RFC or reopening this one, cc @rust-lang/libs
rkruppe
reviewed
Aug 29, 2018
In the call to `foo(2)` the compiler is allowed to simplify the expression `2 + x`
down to `4`, but `4` must be stored into memory even though it is not read by
rkruppe
Aug 29, 2018
Member
"4 must be stored into memory" does not follow from the definition of black_box given here, it only has to be materialized in some way (e.g., in a register specified by the calling convention). Of course, the current (and likely most practical) implementation using inline assembly does force the value into memory, but that's neither here nor there.
RalfJung
reviewed
Aug 30, 2018
`unsafe` Rust code is allowed to, and requires the compiler to be maximally
pessimistic in terms of optimizations. The compiler is still allowed to optimize
the expression generating `x`. This function returns `x` and is a no-op in the
virtual machine.
RalfJung
Aug 30, 2018
Member
Which virtual machine?
I know what you mean but many readers will likely be confused.
gnzlbg
Aug 30, 2018
Contributor
Yeah I don't think there is a good way to put it right now beyond maybe saying "abstract" instead of "virtual".
If we had a memory model, that memory model would specify the abstract machine that Rust runs on and on which the memory model is valid, and I could just refer to that abstract machine here.
Maybe I can rephrase that as "is a no-op in the abstract machine of Rust's memory model (whatever that memory model might end up being)" or something like that ?
RalfJung
Aug 31, 2018
Member
I don't think most people will know what this means.
I would say something like "You can rely on black_box being a NOP just returning x, but the compiler will optimize under the pessimistic assumption that black_box might do anything with the data it got".
RalfJung
reviewed
Aug 30, 2018
is an _unknown_ function, that is, a function that the compiler cannot make any
assumptions about. It can potentially use `x` in any possible valid way that
`unsafe` Rust code is allowed to, and requires the compiler to be maximally
RalfJung
Aug 30, 2018
Member
The emphasis here is on allowed to -- as opposed to "any possible way that unsafe Rust could use x". I think this should be made clearer. Maybe link to the nomicon?
Actually, "any possible way that safe Rust could use x" is almost more precise, but may also give the wrong impression.
gnzlbg
Aug 30, 2018
Contributor
> The emphasis here is on allowed to -- as opposed to "any possible way that unsafe Rust could use x". I think this should be made clearer. Maybe link to the nomicon?
Thanks, I think this makes more sense.
> Actually, "any possible way that safe Rust could use x" is almost more precise, but may also give the wrong impression.
How so? I thought unsafe Rust was allowed to use unsafe for, e.g., dereferencing raw pointers, while if this were a safe function, it couldn't do so. In any case, whatever this function does, it cannot invoke undefined behavior and the compiler is allowed to assume that.
Seems fine to me! I was worried you'd try to talk about which memory
comex
commented
Oct 10, 2018
A good point of comparison might be There are other hypothetical targets where it's ambiguous what the attributes should mean. For example, in a Rust implementation targeting the JVM, should Similarly, you could have an implementation where the meaning is unambiguous but the implementation strategy makes the effect impossible to achieve. For example, a portable Rust-to-assembly compiler could be implemented by compiling to C and then calling edit: In the case of
foo(x * 2);
->
if x == 3 {
    foo(6);
} else {
    foo(x * 2);
}
which could cause observable timing effects depending on the value. And then of course there's the hardware, which does who knows what.
Centril reviewed Oct 10, 2018
Centril added T-compiler and removed T-lang labels Oct 10, 2018
With the recent round of changes in ea2dfeb, satisfying my notes in #2360 (comment) and #2360 (comment) with respect to guarantees, I believe the language team's interest and business with respect to this RFC is concluded.
@Centril I hope the last commit incorporates all of your feedback. @RalfJung it might be worth going through this again :/ @hdevalence Is the requirement for constant-time machine code binary (yes/no) or does the requirement "a certain degree of constant-timeness" also exist? In practice, In the spec, however,
@gnzlbg Yes, thank you; I think you've captured the important points :)
hdevalence
commented
Oct 10, 2018
As @gnzlbg described originally,
The proposal seems to now instead be that, "in practice" (i.e., in the Rust implementation which actually exists), this identity function will really act as an unknown function, and users are encouraged to reason about this identity function as if it were an unknown function, i.e., as if it had properties which it doesn't actually have. Specifying Finally, I'm sorry to have referenced this issue from an external repo, which has apparently caused the discussion to be derailed into whether
The reference-level explanation states the semantics of That is, the Rust implementation that actually exists does not need to, and cannot provide, the same special effects for all targets (e.g. NVPTX, WASM, ASM.js, etc. already define What this means for users is that they cannot rely on
Our formal language is unfortunately still further from a "small MIR" than I'd like. It's on my TODO list but unlikely to happen before my thesis.
I am not aware of any papers, it's an open problem after all. ;) The various special problems crypto code has sure have their individual descriptions somewhere, here's one for zeroing memory.
It's fine and I don't think it was very derailed, most of the discussion was totally relevant. Links are great :)
I did, and it looks good!
RalfJung
reviewed
Oct 11, 2018
text/0000-bench-utils.md Outdated
Since
GabrielMajeri
commented
Oct 12, 2018
@liigo The term black box is often used when talking about testing or software development. The fact that it seems to link up with And "black hole" doesn't suggest the value goes through (an unspecified process) unharmed, just that it goes to get destroyed.
Google Benchmark calls this
On the bikeshed side, I think
comex
commented
Oct 12, 2018
cc @japaric -- How would stabilizing this affect
It would allow them to be more reliable on stable Rust. Currently, criterion uses
Centril added A-hint A-profiling labels Nov 22, 2018
bheisler commented Nov 25, 2018
added a commit to mellium/xmpp-addr that referenced this pull request Jan 18, 2019
Any progress on moving this forward? What is blocking it?
gnzlbg commented Mar 12, 2018 (edited)
Adds black_box to core::hint. Rendered.