
RFC: Interfacing LLVM's "atomic volatile" in Rust #212

Closed · wants to merge 42 commits into from

Conversation


@HadrienG2 HadrienG2 commented Oct 16, 2019

This is basically my proposal for moving forward on volatile support in Rust by interfacing LLVM's atomic volatile, which I consider a better match for volatile's intended "defer to hardware memory model" semantics than today's ptr::[read|write]_volatile in every single respect.

It emerges from the #152 discussion and is a step toward resolving it, but it will not fully resolve that problem because, as far as I know, that's impossible without LLVM involvement.

Having this RFC reviewed here is intended as a way to weed out obvious problems in a small and focused group before going for the broader audience of a pre-RFC on IRLO.

Rendered

@HadrienG2 HadrienG2 changed the title Interfacing LLVM's "atomic volatile" in Rust RFC: Interfacing LLVM's "atomic volatile" in Rust Oct 16, 2019
@gnzlbg gnzlbg left a comment

This reads really nicely. Good job!

(10 inline review threads on rfcs/0000-atomic-volatile.md, all resolved)
@Amanieu

Amanieu commented Oct 17, 2019

In the case of shared memory, there is no reason why volatile accesses should be limited to primitive types. In fact, it is very common to want to read/write an entire struct in the shared memory. With this API you would have to read/write each field of the struct individually, which makes the code more complicated for no benefit.

Shared memory tends to have two components:

  • Control signals, which act as a sort of "lock" and indicate whether a process is currently reading/writing a portion of the shared memory. This is generally done using atomic types such as AtomicUsize. Note that there is no need for volatile atomics here, plain atomics are fine.
  • Data, access to which is governed by the control signals. If the memory is shared between trusted processes then we just use plain loads and stores since we "know" that the other process isn't writing to data that we may be concurrently reading (or vice versa). However if the processes are untrusted then we have to use volatile accesses in case the other process is concurrently writing to data that we are accessing (whether reading or writing). Volatile writes allow us to "fire and forget": it's the other process's problem if it corrupts the data we are sending it. Volatile reads allow us to read a "snapshot" of the data in shared memory, which we can then validate before processing it.
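
A minimal sketch of the pattern described in the two bullets above (the region layout, process identifier, and validation step are all hypothetical):

use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical layout of a region shared with another, untrusted process.
#[repr(C)]
struct SharedRegion {
    // Control signal: plain atomics are sufficient here.
    owner: AtomicUsize,
    // Data area: accessed with volatile operations, then validated.
    payload: [u8; 64],
}

const OUR_ID: usize = 1; // hypothetical process identifier

// Read a snapshot of the payload once the control signal says it is ours.
// Safety: `region` must point to a live, properly mapped shared region.
unsafe fn snapshot(region: *const SharedRegion) -> Option<[u8; 64]> {
    // Control signal: a plain (non-volatile) atomic load.
    if (*region).owner.load(Ordering::Acquire) != OUR_ID {
        return None;
    }
    // Data: a volatile read, since the other process may be writing concurrently.
    let snap = ptr::read_volatile(ptr::addr_of!((*region).payload));
    // ...the caller should validate `snap` before trusting its contents...
    Some(snap)
}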

At no point do we ever need volatile atomics. In fact, volatile atomics would only be useful for hypothetical MMIO hardware that must be accessed with atomic instructions instead of normal loads and stores, but AFAIK no such hardware exists.

Also, memory orderings don't really make sense in the context of MMIO. It is sometimes necessary to ensure ordering of writes to MMIO, but the fence instructions emitted by atomics will only ensure synchronization with other CPU cores. A specialized set of fence instructions is needed to ensure synchronization with other memory-mapped hardware (on ARM this is the difference between DMB ISH and DMB SY).


1. Concurrent use of volatile memory operations on a given memory location is
considered to be Undefined Behavior, and may therefore result in unintended
compilation output if detected by the optimizer.
Member

I disagree with this definition of volatile accesses: concurrent volatile operations do not create data races (assuming both sides of the concurrent access are volatile); they just produce garbage data (i.e. whatever happened to be in that memory location at the time). This is what is meant by volatile acting just like a load or store instruction.

I'll accept that my interpretation is controversial, but this is how compilers implement volatile because it matches how all of the existing C/C++ code uses volatile.

Contributor

This is not a definition; it just documents that we currently do not guarantee the behavior of programs where this happens, which is true: our read_volatile/write_volatile functions are implemented on top of LLVM, and LLVM does not provide this guarantee.

I think everybody agrees that everything would be much better if LLVM were to guarantee this, but we opened a documentation bug and it doesn't look like LLVM will be guaranteeing this any time soon: https://bugs.llvm.org/show_bug.cgi?id=42435

@HadrienG2 HadrienG2 Oct 17, 2019

In fact, LLVM actively warns in its atomics documentation that it considers concurrent volatile accesses to be data races:

The rule is essentially that all memory accessed with basic loads and stores by multiple threads should be protected by a lock or other synchronization; otherwise, you are likely to run into undefined behavior. If your frontend is for a “safe” language like Java, use Unordered to load and store any shared variable. Note that NotAtomic volatile loads and stores are not properly atomic; do not try to use them as a substitute. (Per the C/C++ standards, volatile does provide some limited guarantees around asynchronous signals, but atomics are generally a better solution.)

Member

I'll accept that my interpretation is controversial, but this is how compilers implement volatile because it matches how all of the existing C/C++ code uses volatile.

We tried to convince LLVM to make this part of their official docs, and they refused. So, while what you say is true for current LLVM, this may change at any time, and LLVM does not seem willing to commit to a proper guarantee here. Until LLVM changes its stance, our hands are tied.

@asajeffrey

cc @elast0ny

@HadrienG2

HadrienG2 commented Oct 17, 2019

In the case of shared memory, there is no reason why volatile accesses should be limited to primitive types. In fact, it is very common to want to read/write an entire struct in the shared memory. With this API you would have to read/write each field of the struct individually, which makes the code more complicated for no benefit.

This is definitely a case where the MMIO and shared memory use cases are at odds with each other. In principle, the shared memory use case can be handled on top of primitives for MMIO (guaranteed absence of tearing) by building a memcpy operation on top of those. But in practice, I could accept this as a motivation for keeping ptr::[read|write]_volatile around.

Shared memory tends to have two components:

  • Control signals, which act as a sort of "lock" and indicate whether a process is currently reading/writing a portion of the shared memory. This is generally done using atomic types such as AtomicUsize. Note that there is no need for volatile atomics here, plain atomics are fine.

I believe that this assertion is not generally true.

The single most central assumption of the Rust Abstract Machine, which almost every single compiler optimization relies on for correctness, is that the contents of memory can only change when the program allows it to.

Atomics are not exempt from this rule. They simply ask rustc to consider the possibility of interaction between threads instead of restricting the analysis to the current thread. But even with atomics, it should be okay for rustc to optimize this program...

use std::sync::atomic::{AtomicBool, Ordering};

static x: AtomicBool = AtomicBool::new(true);

fn main() {
    x.store(true, Ordering::Relaxed);
    println!("{}", x.load(Ordering::Relaxed));
}

...into this one...

use std::sync::atomic::AtomicBool;

static x: AtomicBool = AtomicBool::new(true);

fn main() {
    println!("{}", true);
}

...and indeed, rustc already performs this sort of optimization today when presented with a very simple program like the one above. For sure, today, the optimizations will go away as soon as your program becomes just a tad more complex. But there's no fundamental reason why that is the case; it's just a limitation of the optimizer's current program analysis capabilities.

Future-proofing against this is why we need atomic volatile operations in shared memory scenarios: to tell the compiler that its knowledge of memory accesses in a certain region, and of their visibility to the outside world, is incomplete even when there's no obvious side effect going on, and that a conservative stance towards optimization must be adopted. What rustc's optimizer does on toy programs today, it may do on non-toy programs in the future...

  • Data, access to which is governed by the control signals. If the memory is shared between trusted processes then we just use plain loads and stores since we "know" that the other process isn't writing to data that we may be concurrently reading (or vice versa). However if the processes are untrusted then we have to use volatile accesses in case the other process is concurrently writing to data that we are accessing (whether reading or writing). Volatile writes allow us to "fire and forget": it's the other process's problem if it corrupts the data we are sending it. Volatile reads allow us to read a "snapshot" of the data in shared memory, which we can then validate before processing it.

Using plain loads and stores for interprocess communication is subtly incorrect for the same reason that using non-volatile atomics is:

Unless you get lucky with the aforementioned poor ability of rustc and LLVM to optimize atomics-based code, these effects are likely to break shared memory communication based on plain loads and stores. The underlying issue is that the compiler's model of an isolated address space is incorrect in the case of shared memory, and must be amended by disabling the kind of optimization described above through use of volatile.

As @RalfJung pointed out in the context of seqlocks, using volatile memory accesses in an attempt to turn data races into well-defined behavior is a hack which isn't guaranteed to continue working in the future by either the current Rust definition of undefined behavior or LLVM's current implementation of non-atomic volatile. Which is why LLVM has atomic volatile, and I think Rust should have it too.

At no point do we ever need volatile atomics. In fact, volatile atomics would only be useful for theoretical MMIO which must be interacted with atomic instructions instead of normal loads and stores, but AFAIK no such hardware exists.

As explained by this RFC, volatile atomics bring two very useful properties with respect to current volatile operations, even if the only thing that one uses them for is Relaxed loads and stores:

  • They have well-defined data race semantics, just like hardware loads and stores. This property is something which everyone expects of volatile, but which non-atomic volatile doesn't actually guarantee. It is more relevant to the shared memory use case than it is to the MMIO use case.
  • They are guaranteed not to be translated into multiple hardware instructions. Trying to do a volatile load which is too large for hardware is a compilation error. This latter property is very useful for catching MMIO errors at compile time.

Also, memory orderings don't really make sense in the context of MMIO. It is sometimes necessary to ensure ordering of writes to MMIO, but the fence instructions emitted by atomics will only ensure synchronization with other CPU cores. A specialized set of fence instructions are needed to ensure synchronization with other memory-mapped hardware (on ARM this is the difference between DMB ISH and DMB SY).

Indeed, orderings stronger than Relaxed are not generally enough for MMIO. As you correctly point out, different synchronization mechanisms are required by some CPU architectures when communicating with memory-mapped hardware. Where the stronger orderings become useful is in the shared memory use case, where, as previously explained, non-volatile atomic operations are not strong enough and are at risk of breaking in the future as rustc's optimizer becomes smarter about whole-program analysis.

@RalfJung

RalfJung commented Oct 17, 2019

This is definitely a case where the MMIO and shared memory use cases are at odds with each other. In principle, the shared memory use case can be handled on top of primitives for MMIO (guaranteed absence of tearing) by building a memcpy operation on top of those. But in practice, I could accept this as a motivation for keeping ptr::[read|write]_volatile around.

IMO ptr::[read|write]_volatile should be implemented via "elementwise volatile copy", similar to rust-lang/compiler-builtins#311.

This avoids relying on LLVM to do the tearing for us, meaning we can quite clearly document what is happening.
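
For concreteness, a minimal sketch of what such an element-wise volatile copy could look like (the name and signature are illustrative, not the actual compiler-builtins interface):

use std::ptr;

// Copy `len` elements with one volatile access per element, so that tearing can
// only happen at element boundaries.
// Safety: `src` and `dst` must be valid for `len` reads/writes and must not overlap.
unsafe fn volatile_copy_elementwise<T>(dst: *mut T, src: *const T, len: usize) {
    for i in 0..len {
        let v = ptr::read_volatile(src.add(i));
        ptr::write_volatile(dst.add(i), v);
    }
}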

As @RalfJung pointed out in the context of seqlocks, using volatile memory accesses in an attempt to turn data races into well-defined behavior is a hack which isn't guaranteed to continue working in the future by either the current Rust definition of undefined behavior or LLVM's current implementation of non-atomic volatile. Which is why LLVM has atomic volatile, and I think Rust should have it too.

I don't agree with the "which is why" part here. seqlocks shouldn't use any volatile, period -- that's what I said above, so I don't know how that got construed into an argument in support of volatile. ;)

Atomic volatile are useful for shared-memory concurrency, to answer questions like "what happens on a data race". They are more well-behaved than non-atomic volatile because their behavior is better specified. I'd love it if we could get rid of non-atomic volatile, as that would side-step a lot of thorny questions, but alas, some hardware disagrees... but I feel it would be fine for such hardware to have to jump through hoops and not use the normal convenient APIs.

@HadrienG2

HadrienG2 commented Oct 17, 2019

As @RalfJung pointed out in the context of seqlocks, using volatile memory accesses in an attempt to turn data races into well-defined behavior is a hack which isn't guaranteed to continue working in the future by either the current Rust definition of undefined behavior or LLVM's current implementation of non-atomic volatile. Which is why LLVM has atomic volatile, and I think Rust should have it too.

I don't agree with the "which is why" part here. seqlocks shouldn't use any volatile, period -- that's what I said above, so I don't know how that got construed into an argument in support of volatile. ;)

Bad wording, I suspect. This was intended to respond to @Amanieu's repeated assertions that using volatile to handle data races is okay because it would just return garbage torn data.

Volatile and atomic are orthogonal. You need volatile when your memory accesses are visible to someone else (or conversely you are seeing someone else's memory accesses), and you need atomics when one thread is racing with another. If you are racing with a foreign process in shared memory, you need both. Ergo, atomic volatile is useful.

I'd love it if we could get rid of non-atomic volatile, as that would side-step a lot of thorny questions, but alas, some hardware disagrees... but I feel it would be fine for such hardware to have to jump through hoops and not use the normal convenient APIs.

You may like the section where I mention that we could actually support more hardware with atomic volatile if we expanded our model of atomic operations beyond global cache coherence (aka Relaxed).

@Amanieu

Amanieu commented Oct 17, 2019

The single most central assumption of the Rust Abstract Machine, which almost every single compiler optimization relies on for correctness, is that the contents of memory can only change when the program allows it to.

This doesn't sound right. If that were true then any memory that interacts with a system call would need to be accessed with volatile, because anything that happens in the kernel is outside the Rust Abstract Machine. Consider the case of a buffer being filled from a file: the compiler clearly cannot assume that the contents of the buffer remain the same across read calls, even though there is nothing in the program modifying the contents of that buffer.

But even with atomics, it should be okay for rustc to optimize this program...

rustc can optimize this program because it can prove that the address of x never escapes the current program. This would not be the case if the AtomicBool came from a shared memory region that was allocated with a system call. In that situation the compiler cannot assume that it has exclusive access to the memory region and therefore cannot optimize the store away.

Again, you are misunderstanding why the compiler is allowed to perform these optimizations: there is no synchronization operation (i.e. an atomic operation or a fence) between the loads/stores, so it is allowed to assume the memory contents have not changed. However, any shared data structure (whether intra-process or in shared memory) will use atomic operations on the control signals to control access to the data, which prevents merging the loads/stores across synchronization operations.

@comex

comex commented Nov 23, 2019

Now, I will readily admit that I do not know enough about compiler optimizers to precisely evaluate whether the above wish is merely ridiculous or downright insane.

Not impossible, but likely very unreliable in practice, at least in the general case where you have some arbitrary data structures containing Arcs somewhere in them. It would only take one function call the compiler couldn't track (e.g. to a trait object), where an Arc field is transitively reachable from any parameter, to kill the optimization – for all objects which are ever assigned to that field, or which the compiler can't prove aren't (so in practice, probably all objects of the same type).

In the future, it might be possible to relax this constraint with suitable annotations on function signatures, either manual or (preferably) automatically inserted by the compiler during analysis.
This will never work in the presence of annotation-free opaque FFI, like syscalls, where the compiler must assume the worst... unless we can somehow bound what FFI is allowed to do without requesting special permission, much like LLVM bounds what volatile is allowed to do even if it's supposed to be a "defer to hardware semantics and all bets are off" optimization barrier.

First, I don't know why you'd be passing an Arc, or a pointer to a structure containing one, to a system call in the first place.

Second, I don't see why you'd want the default for FFI functions to be more conservative than the default for non-FFI functions. In your scenario, if you had a Rust function where:

  • a parameter was an Arc, or a reference to a structure containing one, etc.;
  • the compiler (for whatever reason) couldn't automatically insert a "single-threaded only" annotation, e.g. due to trait object dynamism; and
  • the user didn't manually add such an annotation;

the compiler would have to assume that it was multithreaded. The same should apply to FFI functions with pointer parameters. If anything, FFI code is more likely to do "weird things" than Rust code, not less.

There could of course be unsafe annotations for FFI functions to make assumptions about how they behave: not just single-threadedness, but stronger constraints like "it won't keep using this pointer after it's done executing". But they shouldn't be the default.

@HadrienG2

HadrienG2 commented Nov 23, 2019

Not impossible, but likely very unreliable in practice, at least in the general case where you have some arbitrary data structures containing Arcs somewhere in them. It would only take one function call the compiler couldn't track (e.g. to a trait object), where an Arc field is transitively reachable from any parameter, to kill the optimization – for all objects which are ever assigned to that field, or which the compiler can't prove aren't (so in practice, probably all objects of the same type).

Indeed, making this work reliably on trait objects would probably require either of...

  • Manual annotations which forbid implementations of a trait to do certain things.
  • Semi-automated proof that all implementations of a trait are visible and don't do these things (would probably require some sort of language-level sealed trait support).
  • Compiler proof that only certain implementations of a trait with the right properties may be used as a trait object in this particular part of the code (seems close to the devirtualization problem in C++, which doesn't bode well for practical applicability).

I guess of those three, a compiler understanding of sealed traits would be the option that's closest to being actually feasible in the medium term... and it definitely wouldn't resolve the general problem, only the problem for sealed traits.


First, I don't know why you'd be passing an Arc, or a pointer to a structure containing one, to a system call in the first place.

For Arc I'm not sure if I can think of a good example, but for the others I could totally see myself dumping a struct with an AtomicXyz or Mutex member to some kind of I/O interface...

There could of course be unsafe annotations for FFI functions to make assumptions about how they behave: not just single-threadedness, but stronger constraints like "it won't keep using this pointer after it's done executing". But they shouldn't be the default.

Fair enough, that's also what I remember the #215 conclusion being. If we accept it, then we go back to the "we don't really have a use for atomic volatile beyond atomic loads and stores" starting point above. To answer your underlying question about why I find this conclusion somewhat unsatisfying (but probably not so unsatisfying as to feel like relitigating it too much)...

Second, I don't see why you'd want the default for FFI functions to be more conservative than the default for non-FFI functions.

FFI is special in how opaque it is. With Rust code, as it's currently compiled at least, the compiler has a lot of freedom to look through the code of the crate that is being compiled and its dependencies and to infer many useful properties from it, whereas with FFI we only get to do that in very specific compilation scenarios (cross-language LTO) and at a very low level (LLVM only).

This means that in the FFI case, how we set the default interface contract is very important, because often it's the only interface contract we're going to get, unlike in Rust, where the compiler can potentially infer a lot more than what the developer spelled out in the code.

You could reasonably argue, however, that writing FFI bindings is already so much work that it wouldn't hurt too much to ask performance-conscious people to add "may not do X or Y" manual annotations to the FFI bindings which are used in the hot parts of their code.

(Note also that there is some argument in there for being careful if we ever try to stabilize some form of Rust ABI for e.g. dynamic linking, as it will have mostly the same problems as C FFI if done in the most straightforward way of mapping functions to ELF/PE symbols without any extra metadata.)


By the way, if I didn't manage to convince you that I'm insane yet...

There could of course be unsafe annotations for FFI functions to make assumptions about how they behave: not just single-threadedness, but stronger constraints like "it won't keep using this pointer after it's done executing". But they shouldn't be the default.

Would this allow things like excluding memory which could not have possibly been touched by any other thread from Acquire and Release barriers, occasionally to the point of eliding those barriers altogether if they can't possibly have an observable effect on program behavior? 😋

That intuitively seems even more far-fetched than eliding atomics in single-threaded code, but if doable, it could be a way to resolve the longstanding Great Atomics Ordering expert debate by making the "safe choice" of using SeqCst no more costly than the "explicit/fast choice" of using the simplest Ordering that fits the application's constraints.

@HadrienG2

HadrienG2 commented Dec 8, 2019

Alright, I'm finally done giving this a good read and updating it with respect to the outcomes of the recent discussion with @RalfJung. In the end, the magnitude of the changes made this pretty much a full rewrite, which, together with the shortage of my volunteer time, is why it took so long.

Things that I'm still wondering about:

To answer the latter question, I could use the opinion of people who implemented compiler backends for those kind of targets. Any suggestion of a good place to ask the question?

@RalfJung

To answer the latter question, I could use the opinion of people who implemented compiler backends for those kind of targets. Any suggestion of a good place to ask the question?

I'd start on IRLO.

@comex

comex commented Dec 27, 2019

(Replying to @RalfJung's comment which GitHub is now hiding under a "show resolved" button...)

I could also totally imagine an MMIO situation where a concurrent data structure (say, a concurrent ring buffer) is shared between the CPU and an external device, and atomic MMIO is needed for coordination between the two. Or is that not needed because CPUs treat MMIO specially and the caching effects (that usually mandate synchronization and fences even on the CPU level) don't show up there?

Overly long answer:

Yeah, it’s common to have a sequence like:

  1. Write some data to a buffer in RAM
  2. Write to an MMIO register to tell a device to start the operation
  3. Device starts reading from that buffer using DMA

You often do need some kind of synchronization between steps 1 and 2, but it’s dependent on the hardware and how you’ve configured it. It might be any of:

  • Nothing at all (x86, maybe, sometimes)
  • The same type of memory barrier used for inter-CPU synchronization
  • A stronger type of barrier (e.g. ARM’s Inner Shareable vs. Outer Shareable distinction)
  • Flushing the cache (on systems where DMA ignores the cache and accesses the underlying RAM directly)

Traditionally the synchronization is done as a separate operation; prepping for DMA is usually considered a more heavyweight activity that doesn’t need to be optimized as tightly as atomics. But, for example, ARMv8 has a dedicated store-release instruction, and AFAIK it can provide the necessary guarantees for this, depending on how the memory is configured. In other words, it’s a use case where ‘volatile atomic release store’ would have meaningful semantics, and where you can’t get the same instruction sequence a different way (as opposed to architectures where the assembly for store-release is just a barrier followed by a normal store). It’s just… not really necessary and not usually done. But it’s at least plausible that some future system would have a stronger demand for it.
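
To make that sequence concrete, here is a rough sketch of what such a driver-side path might look like in today's Rust (the addresses and register layout are made up, and the Release fence stands in for whatever device-appropriate barrier the platform actually requires):

use std::ptr;
use std::sync::atomic::{fence, Ordering};

// Made-up addresses; a real driver would obtain these from its device mapping.
const DMA_BUFFER: *mut u8 = 0x4000_0000 as *mut u8;
const DOORBELL_REG: *mut u32 = 0x5000_0000 as *mut u32;

// Safety: the caller must ensure the mappings exist and the device is idle.
unsafe fn start_dma(data: &[u8]) {
    // 1. Write the payload into the DMA buffer in RAM.
    ptr::copy_nonoverlapping(data.as_ptr(), DMA_BUFFER, data.len());
    // 2. Order the buffer writes before the doorbell write. On some platforms a
    //    CPU-level Release fence is not enough for device visibility (e.g. ARM may
    //    need DMB SY rather than DMB ISH), which is exactly the gap discussed above.
    fence(Ordering::Release);
    // 3. Ring the doorbell: a volatile write to the MMIO register.
    ptr::write_volatile(DOORBELL_REG, data.len() as u32);
}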

@HadrienG2

I opened an IRLO thread to discuss the remaining issue of applicability of unordered atomics to super-weak memory models (language virtual machines, GPUs...)

@HadrienG2

HadrienG2 commented Jan 4, 2020

New concern from the GPU discussion, which is also relevant on x86 IIRC: some hardware architectures define SIMD loads and stores to be element-wise atomic, not atomic as a whole. If I'm aiming for total deprecation of ptr::[read|write]_volatile as the end goal, I need to figure out how to handle this in the proposed formalism. Maybe I should take some inspiration from LLVM's element-wise atomic memcpy here...

@Diggsey

Diggsey commented Jan 5, 2020

some hardware architectures define SIMD loads and stores to be element-wise atomic, not atomic as a whole.

This is what we were discussing here too: #209

@HadrienG2

HadrienG2 commented Jan 5, 2020

Ah, yes, thank you for reminding me about this topic and linking to it!

Reading through it again gives me the impression that volatile SIMD loads and stores are a rather tricky beast, because...

  • They are not, and cannot be, available in atomic form at the backend level (it wouldn't make sense, since they can tear), only in non-atomic volatile form.
  • The most frequently discussed fix for exposing their element-wise atomic nature at the language level, which is to handle them as element-wise unordered atomic memcpys and expect LLVM to optimize the memcpy into a SIMD load, does not work for volatile accesses.
    • At the LLVM level, the @llvm.memcpy.element.unordered.atomic intrinsic does not have an isvolatile parameter like the base @llvm.memcpy intrinsic.
    • Even if it did have a volatile variant, optimization of e.g. *const u8 element-wise atomic memcpys into SIMD loads is the opposite of everything that volatile stands for.

As far as I can tell, our options here are to...

  1. Keep non-atomic volatile read/write operations around and use them for SIMD types, since we don't have anything better available. Hopefully they'll keep compiling into something sensible even in multi-threaded environments, as they do today even if no written contract mandates that.
  2. Implement volatile SIMD load/stores in terms of a loop of other atomic volatile operations, knowing that the generated code will be much slower than a hardware SIMD load or store operation and that it goes against the user desire of generating the right load/store instruction.
  3. Somehow integrate the hardware notion of "well-defined data race that can tear" into the language memory model, as an alternative to atomics for well-defined synchronization, and use that for SIMD volatile loads & stores.
  4. Not support volatile SIMD loads and stores at all.

@HadrienG2

HadrienG2 commented Jan 6, 2020

So, I've been having a very interesting chat with @mcy, who had some reservations about the currently proposed API. Here's my biased and incomplete summary.

First of all, the concerns:

  • At a conceptual level, this API feels wrong because it features volatile objects (if only as an abstract concept that can only be manipulated behind a pointer) rather than volatile accesses only (which are generally agreed to be the best mental model for volatile).
  • At a practical level, this API feels unpleasant because it doesn't deal very well with the classic MMIO scenario of having a whole *mut [u8] memory-mapped I/O range full of memory-mapped registers with a hardware-specific layout.

We felt that it should be possible to do better on these fronts by trying to stay closer to the original design of <*mut T>::[read|write]_volatile(), and that VolatileT deviated a bit too much from it just for the sake of expressing that T can be read and written atomically.


With this in mind, a possible alternative API design would be to use a trait to express volatile operations...

// This trait is unsafe to implement, because implementors must guarantee that the
// operations are indeed atomic and volatile, with semantics as described by this RFC.
unsafe trait VolatileLoadStore {
    // This operation is unsafe to use because the pointer might not be valid
    unsafe fn load_volatile(self: *const Self) -> Self;

    // This operation is unsafe to use because the pointer might not be valid
    unsafe fn store_volatile(self: *mut Self, val: Self);
}

...and then we would implement that trait for every type which has native machine load and store operations.
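
For illustration, an implementation for a type with a native machine load/store might look roughly like this (the lowering to ptr::read_volatile/write_volatile is just a stand-in for whatever intrinsic the real implementation would use, and the raw-pointer receivers assume the arbitrary self types feature discussed below):

// Illustrative only, not a proposed implementation.
unsafe impl VolatileLoadStore for u32 {
    unsafe fn load_volatile(self: *const Self) -> Self {
        core::ptr::read_volatile(self)
    }

    unsafe fn store_volatile(self: *mut Self, val: Self) {
        core::ptr::write_volatile(self, val);
    }
}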

Speaking personally, I do like this design direction better than the current design, but that's another major RFC rewrite so I wanted to test waters here before changing all the text.


Here's my quick biased evaluation.

Advantages:

  • Simpler abstractions, it's just vanilla use of traits without weird wrapper types.
  • Not based on NonNull, so messing around with pointers in MMIO regions is easier.
  • No notion of atomic objects, it's atomic accesses all the way down.
  • Trait-based formalism is more elegant than one VolatileXyz per Xyz machine type.

Drawbacks:

  • Naming conflict with ptr::[read|write]_volatile(), hard to find a good way to differentiate between the atomic version and the non-atomic version without making things confusing. @mcy was thinking about [read|write]_volatile_once(), but I'm not sure if this convention would be very clear outside of those few people who ever dealt with Linux's internal synchronization primitives. The above [load|store] vs [read|write] proposal is not great either.
  • Strong typing barrier between volatile and non-volatile accesses must now be provided by external crates instead of being a built-in core/std feature.

Neutral:

  • Does not resolve the arbitrary_self_type issue.
  • Does not resolve the dependency on raw pointers.
  • No impact on semantics.
  • Compatible with all envisioned extensions of the VolatileT concept.

@HadrienG2

HadrienG2 commented Feb 27, 2020

While I do not have that much time to dedicate to this RFC at the moment, I just wanted to mention that the ongoing discussion about non-dereferenceable references is extremely relevant to its API design.

@HadrienG2

Alright, 2-3 months later I think it's fair to say that I will not manage to finish this one myself anytime soon, as life got busy with too many other things.

I hope that the notes which I left at the end of this PR thread will be enough to help anyone interested in pushing this forward into taking things over from where I left them. Otherwise, feel free to ask me any question about whatever seems unclear.
