
Stabilize volatile copy and set functions #2728

Open · wants to merge 1 commit into base: master
Conversation

@dancrossnyc commented Jul 20, 2019

Rendered

Stabilize the volatile_copy_memory, volatile_copy_nonoverlapping_memory
and volatile_set_memory intrinsics as ptr::copy_volatile,
ptr::copy_nonoverlapping_volatile and ptr::write_bytes_volatile,
respectively.

I initiated discussion on this back in June here: https://internals.rust-lang.org/t/pre-rfc-stabilize-volatile-copy-and-set-functions/9872

Unfortunately, it fell off my plate, but I'd like to dust it off and move it forward.

Signed-off-by: Dan Cross cross@gajendra.net

@Centril (Member) commented Jul 20, 2019 [comment minimized]

@dancrossnyc dancrossnyc force-pushed the dancrossnyc:master branch from 777739b to 8c5dc1b Jul 21, 2019

@RalfJung (Member) commented Jul 21, 2019

Can't these new methods be implemented (in user code) in terms of the ones we are already exposing? If yes, the RFC should say why new APIs should be added to libstd, given the motivation is an extremely niche use-case. If no, the RFC should explain why the "obvious" implementation in terms of read_volatile and write_volatile is not correct.

EDIT: Ah, I think I found the answer in the "alternatives" section:

inelegant, less likely to perform well, and opens up the possibility of bugs

At least the first and third point can easily be taken care of by writing this in a crate once and auditing it properly. In terms of performance, compilers don't touch volatile accesses anyway (that's their point), so this seems a questionable argument to me.
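The "obvious" open-coded implementation in terms of `read_volatile` and `write_volatile` that this comment alludes to might look roughly like the sketch below. The function name `volatile_copy_nonoverlapping` is hypothetical; only the already-stable raw-pointer methods are used, not the proposed API:

```rust
/// Sketch of an open-coded volatile copy for non-overlapping regions,
/// built only from the stable `read_volatile`/`write_volatile` methods.
/// Safety requirements mirror those of `ptr::copy_nonoverlapping`.
unsafe fn volatile_copy_nonoverlapping<T>(src: *const T, dst: *mut T, count: usize) {
    for i in 0..count {
        // Each element is moved with a volatile read and a volatile write;
        // the compiler may not elide these accesses or reorder them
        // relative to other volatile accesses.
        dst.add(i).write_volatile(src.add(i).read_volatile());
    }
}

fn main() {
    let src = [1u8, 2, 3, 4];
    let mut dst = [0u8; 4];
    unsafe { volatile_copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), 4) };
    assert_eq!(dst, [1, 2, 3, 4]);
}
```

Whether this loop is an adequate substitute for the intrinsics is precisely what the rest of the thread debates.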


Finally, RFC 1467 was in mild error in asserting that no analogue
for the proposed intrinsics exist in C. `memcpy`, `memmove` and `memset`
on volatile-qualified data provide this functionality in standard C.

@RalfJung (Member) Jul 21, 2019

Please provide a source for this. A quick search only turns up information that says the opposite, e.g. here:

memcpy is incompatible with volatile objects

@RalfJung (Member) Jul 21, 2019

Also looking at this LLVM IR it does not seem like the memcpy is volatile in any sense, or am I missing something?

@dancrossnyc (Author) Jul 21, 2019

You are right and I was wrong.

Indeed, the situation in C is even worse than me simply being incorrect: if I'm reading the standard right, accessing a volatile-qualified datum via a non-volatile-qualified lvalue is undefined behavior. So, for example, a single-element struct containing a volatile array, or casting away the volatile, might be UB (in the C sense); yuck. One of the many reasons we are switching to Rust....

I added that language to try to make this proposal independent of the details of LLVM's implementation. I've attempted a revision to make the intent clearer; I'm still not quite happy with it, but I'm also going to be out of town for a few days, so forgive me if I don't revise it right away. :-)

Regarding the other comment:

Can't these new methods be implemented (in user code) in terms of the ones we are already exposing? If yes, the RFC should say why new APIs should be added to libstd, given the motivation is an extremely niche use-case. If no, the RFC should explain why the "obvious" implementation in terms of read_volatile and write_volatile is not correct.

EDIT: Ah, I think I found the answer in the "alternatives" section:

inelegant, less likely to perform well, and opens up the possibility of bugs

At least the first and third point can easily be taken care of by writing this in a crate once and auditing it properly. In terms of performance, compilers don't touch volatile accesses anyway (that's their point), so this seems a questionable argument to me.

I removed the mention of efficiency. I think the main issue is preventing duplication and potential bugs. Memset and memcpy are easy; overlapping memmove is a bit weirder.
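The "weirder" overlapping case comes down to picking a copy direction so that no source element is overwritten before it is read. A hedged sketch (the name `volatile_copy` mirrors the proposed API, but the body is illustrative only, again built from the stable pointer methods):

```rust
/// Sketch of an open-coded volatile `memmove` analogue for possibly
/// overlapping regions: copy forwards when dst precedes src, backwards
/// otherwise, so unread source elements are never clobbered.
unsafe fn volatile_copy<T>(src: *const T, dst: *mut T, count: usize) {
    if (dst as usize) <= (src as usize) {
        // dst at or before src: a forward copy is safe.
        for i in 0..count {
            dst.add(i).write_volatile(src.add(i).read_volatile());
        }
    } else {
        // dst after src: copy backwards to avoid clobbering the tail of src.
        for i in (0..count).rev() {
            dst.add(i).write_volatile(src.add(i).read_volatile());
        }
    }
}

fn main() {
    // Overlapping shift within one buffer: move buf[0..4] into buf[1..5].
    let mut buf = [1u8, 2, 3, 4, 0];
    let p = buf.as_mut_ptr();
    unsafe { volatile_copy(p, p.add(1), 4) };
    assert_eq!(buf, [1, 1, 2, 3, 4]);
}
```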

@dancrossnyc (Author) Jul 21, 2019

Oh one other small note about performance: I think the compiler is in an interesting position where it can do things like loop unrolling while respecting icache alignment and things like that. As long as the semantics of the abstract machine are preserved, I don't see any reason why it couldn't do things like that in a way that would be, perhaps, more challenging with a hand-rolled loop: particularly if that loop lived in another crate.

@RalfJung (Member) Jul 22, 2019

I added that language to try and make this proposal independent of the details of LLVM's implementation.

That's totally great, but the RFC shouldn't state incorrect things about C/C++. ;)

Thanks for updating the RFC!

Regarding the other comment:

Discussion is way easier if you keep the topics within a thread instead of jumping between them.

@dancrossnyc (Author) Jul 26, 2019

I added that language to try and make this proposal independent of the details of LLVM's implementation.

That's totally great, but the RFC shouldn't state incorrect things about C/C++. ;)

Precisely why I acknowledged my error and removed that wording. :-)

Thanks for updating the RFC!

Regarding the other comment:

Discussion is way easier if you keep the topics within a thread instead of jumping between them.

Indeed. Unfortunately, the GitHub UI would not allow me to respond to that thread.

@dancrossnyc dancrossnyc force-pushed the dancrossnyc:master branch from 8c5dc1b to a4c2cba Jul 21, 2019

@spunit262 commented Jul 22, 2019

In terms of performance, compilers don't touch volatile accesses anyway (that's their point), so this seems a questionable argument to me.

Which is exactly why these intrinsics are needed. They are a single operation, so there's nothing to optimize, but a hand-written loop does need to be optimized and can't be, as the writes are volatile.


Finally, it is important that this proposal not tie the Rust language
to the semantics of any given implementation, such as thos defined by

@RalfJung (Member) Jul 22, 2019
Suggested change
to the semantics of any given implementation, such as thos defined by
to the semantics of any given implementation, such as those defined by

@dancrossnyc [comment minimized]
and `write_volatile`: resulting loads and stores cannot be elided, and
loads and stores cannot be reordered with respect to one another,
though other operations can be reordered with respect to volatile
operations.

@RalfJung (Member) Jul 22, 2019

I am not entirely sure what this long-winded paragraph is supposed to tell me. :) The open questions around volatile are not new, but can't we specify the new operations entirely in terms of the old? That would avoid even having to talk about volatile semantics in this RFC.

@dancrossnyc (Author) Jul 26, 2019

I'm not entirely sure what you mean. Do you mean that we should define the semantics of these operations to match those of write_volatile and read_volatile? Or that we should mention the LLVM semantics? When I put the proto-RFC on internals, I was asked specifically to avoid tying it to LLVM.

@RalfJung (Member) Jul 26, 2019

Do you mean that we should define the semantics of these operations to match those of write_volatile and read_volatile?

That. Volatile accesses are not new to Rust, so this RFC shouldn't have to deal with the complexity of specifying them.

interfaces for the `volatile_copy_memory` or `volatile_set_memory`
intrinsics, as they were "not used often" nor provided in C.
However, when writing low-level code, it is sometimes also useful
to be able to execute volatile copy and set operations.

@RalfJung (Member) Jul 22, 2019

This should state, early in the motivation, that volatile copy and set operations are already possible in Rust, but have to be open-coded. The way this is written now, it sounds as if there were currently no way to do a volatile copy or set in Rust.

This RFC, if I am understanding correctly, is entirely a "convenience"-RFC: it's about adding APIs to libcore that could already be implemented in a user library. So this should be clear from the start. Then the motivation should argue why this is an API that should move to libcore.

For such APIs it helps tremendously if there is an (ideally already tested and used) implementation in a user crate that it can point to. Even better if you can point at instances where people tried to implement this but got it wrong.

@RalfJung (Member) commented Jul 22, 2019

@spunit262

Which is exactly why these intrinsics are needed. They are a single operation so there's nothing to optimized, but a hand written loop does need to be optimized, but can't as the writes are volatile.

The intrinsic is a single operation in the IR that lowers to lots of assembly, and likely pretty much the same assembly as a hand-coded loop. I see no reason why the intrinsic should perform any better. Without an experiment demonstrating a performance difference between using the intrinsic and using the open-coded version, I do not think you can reasonably claim that there will be any difference.

@KrishnaSannasi commented Jul 22, 2019

@RalfJung what about unaligned volatile operations? Right now {read, write}_volatile assume that the input pointer is aligned, and you would need copy_volatile at least to get unaligned volatile reads and writes (and copy_nonoverlapping_volatile to get efficient unaligned volatile reads and writes).

@rkruppe (Member) commented Jul 22, 2019

While it's not properly documented (nothing about these intrinsics is), they do in fact lower to IR that requires the pointers to be aligned. This is also consistent with how the non-volatile equivalents, and pointer functions in general, work.

Furthermore, you can also get a version without alignment requirements (no matter whether it's an intrinsic or implemented in library code) by treating the memory as having n * size_of::<T>() elements of type u8 instead (or MaybeUninit<u8> to be on the safe side w.r.t. uninitialized memory). That does byte-sized accesses instead of T-sized ones (if the latter are possible at all on the hardware), but nothing about volatile_{copy,set}_memory (neither the current docs, nor LLVM's docs, nor this RFC) guarantees the access size anyway, so if you're worried about tearing, those functions don't help you.
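The byte-wise workaround described here might look roughly like the following sketch. `volatile_copy_bytes` is a hypothetical helper name, and the sketch ignores the `MaybeUninit<u8>` refinement for simplicity:

```rust
/// Sketch of an alignment-free volatile copy: view the region as
/// count * size_of::<T>() bytes of u8 and move it with byte-sized
/// volatile accesses, so no T-alignment is required of either pointer.
unsafe fn volatile_copy_bytes<T>(src: *const T, dst: *mut T, count: usize) {
    let n = count * std::mem::size_of::<T>();
    let (src, dst) = (src as *const u8, dst as *mut u8);
    for i in 0..n {
        dst.add(i).write_volatile(src.add(i).read_volatile());
    }
}

fn main() {
    let src = [0x1122u16, 0x3344];
    let mut dst = [0u16; 2];
    unsafe { volatile_copy_bytes(src.as_ptr(), dst.as_mut_ptr(), 2) };
    assert_eq!(dst, src);
}
```

As the comment notes, this trades T-sized accesses for byte-sized ones, which only matters if you care about access width, and nothing in the intrinsics guarantees that anyway.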

@spunit262 commented Jul 23, 2019

@RalfJung
I've already looked at the assembly, these intrinsics lower to a call to memset/memcpy/memmove or a small number of loads and stores.

sequence of sending IPIs makes no reference to `proto_sipi_page` and
since the `bytes` field is not visible outside of `new`, this
illustrates a situation in which the compiler _could_ theoretically
elect to elide the copy.

@parched Jul 25, 2019

It seems like the problem isn't with the memcpy, but with sending the IPI. The IPI needs to be a release operation (from the hardware's view and the compiler's). I'm not familiar with x86 and you haven't given the implementation of send_sipi so I don't know if it is already. If it's not, you might just need some sort of barrier between the mempcy and sending the IPI.

@comex Jul 26, 2019

Indeed, the motivating example is flawed.

In this case, copy_proto_page_to_phys_mem is apparently an extern function that's called with a pointer to proto_sipi_page, so the compiler has no choice but to ensure the contents of proto_sipi_page were actually written before the call. The fact that the pointer is first cast to usize doesn't change that – since it's first cast to a raw pointer, Stacked Borrows would still allow copy_proto_page_to_phys_mem to access it. So the example as written is 100% well-defined even without a volatile copy.

On the other hand, if this were done in a manner incompatible with Stacked Borrows, such as (I think?) by transmuting the reference to usize, then making the copy volatile would not help.

For one thing, the example code does not copy directly from INIT_CODE to proto_sipi_page. It copies to bytes, a local variable in the new function; that variable is then moved into a SIPIPage, which is a memcpy, and then moved to the caller, which is another memcpy. If you switched to a volatile copy, you'd be ensuring the data was actually written to some stack allocation representing bytes... but not that bytes was actually copied to proto_sipi_page!

This distinction is not just theoretical but, unfortunately, real. If you compile the example in debug mode, the assembly emitted for SIPIPage::new contains a call to core::intrinsics::copy followed by two compiler-inserted calls to memcpy. Ideally LLVM's optimizer would be able to elide both of those memcpys, but if you compile in release mode, it actually only elides one of them. There is still an intermediate copy between bytes and proto_sipi_page.

You could fix the example in practice by directly copying from INIT_CODE to proto_sipi_page. But that wouldn't fix the UB in theory (that would result if you violated Stacked Borrows; as I said, the example as written is fine). If the optimizer for some reason thought that proto_sipi_page was unused, it would be free to, e.g., reuse the stack slot for other data, even if you had previously made volatile writes into it.

More generally, I'm not sure there's any way to redeem the example to actually require volatile memcpy. If the Rust code mapped the physical memory and copied the data directly to that mapping, then you'd finally be at a point where release operations matter (as mentioned by @parched). But those are easily obtained. If the implementation of send_sipi is an extern function, calling it is a compiler release operation. If it's based on asm!, you would have to mark the assembly sequence with the memory clobber (to indicate that, in the words of the GCC manual, it "performs memory reads or writes to items other than those listed in the input and output operands"), which is also a compiler release operation. The one catch is that you'd have to make sure you didn't hold a mutable borrow of the data across the call, since then it would be UB for the function or asm block to read from it. However, using volatile to write the data in the first place wouldn't fix that UB. At best it would make it slightly harder for the compiler to exploit.

I'm happy to see volatile memcpy stabilized, but it seems more useful for something like IPC.
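The "compiler release operation" pattern described here, an ordinary copy followed by a barrier before signaling, can be sketched with `compiler_fence`. Both `publish_then_signal` and the stubbed `send_sipi` are hypothetical stand-ins for illustration, not code from the RFC's example:

```rust
use std::sync::atomic::{compiler_fence, Ordering};

/// Sketch: publish data with a plain (non-volatile) copy, then issue a
/// compiler release fence so the compiler may not sink the writes past
/// the point where the interrupt is sent. On real hardware a CPU fence
/// may also be needed; this only constrains compiler reordering.
fn publish_then_signal(page: &mut [u8; 4096], code: &[u8]) {
    page[..code.len()].copy_from_slice(code);
    // All prior writes are ordered (for the compiler) before the signal.
    compiler_fence(Ordering::Release);
    send_sipi();
}

fn send_sipi() {
    // Stand-in: the real thing would write the APIC ICR via MMIO or asm!.
}

fn main() {
    let mut page = [0u8; 4096];
    publish_then_signal(&mut page, b"\x90\x90\xf4");
    assert_eq!(&page[..3], b"\x90\x90\xf4");
}
```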

@dancrossnyc (Author) Jul 26, 2019

This is why I dislike contrived code examples in places like this. :-)

What do you mean about utility for IPC, precisely? As in shared-memory message passing or something like that?

@comex Jul 26, 2019

Yes. Caveat: In this recent thread, there was no real consensus on whether or not using volatile for shared memory in Rust is recommended - or even legal. But it is what C++ people recommend. If volatile is suited for shared memory, then volatile_copy would be an ideal way to safely copy large buffers into or out of shared memory.

Dan Cross
Stabilize volatile copy and set functions
Stabilize the `volatile_copy_memory`, `volatile_copy_nonoverlapping_memory`
and `volatile_set_memory` intrinsics as `ptr::copy_volatile`,
`ptr::copy_nonoverlapping_volatile` and `ptr::write_bytes_volatile`,
respectively.

Signed-off-by: Dan Cross <cross@gajendra.net>

@dancrossnyc dancrossnyc force-pushed the dancrossnyc:master branch from a4c2cba to 71e99e2 Jul 26, 2019
