
Support for Linux kernel memory model #348

Open
DemiMarie opened this issue Jul 7, 2022 · 22 comments


@DemiMarie

Rust should support the Linux kernel memory model in addition to the C/C++ memory model. This means that all of the primitives provided and used by Linux should be able to be implemented efficiently, and that practices used by the Linux kernel (such as using non-atomic reads/writes + barriers instead of atomic operations) should not be UB.

@RalfJung
Member

RalfJung commented Jul 7, 2022

I don't think I agree. Or more specifically, to my knowledge, there does not exist something that I would call a Linux kernel memory model -- as in, some sort of definition that takes as input a program source code and produces as output a portable set of allowed behaviors of that program (including whether it has UB).

The LKMM that I have seen is a bunch of rules that you have to follow to "get what you want", and is basically defined by "whatever the target hardware does". That works a lot better than I would ever have expected, but I don't see a way to extract from that any kind of operational or even axiomatic semantics. It should be possible in principle to actually prove that a Rust compiler is correct, and that is AFAIK not possible with the LKMM since there is no suitable spec to prove anything against.

practices used by the Linux kernel (such as using non-atomic reads/writes + barriers instead of atomic operations) should not be UB

I strongly disagree with this one. All those atomic accesses were introduced for a reason. Linux gets away with this only because it pre-dates the C++ model and can now force GCC to keep supporting whatever it is that it relies on. But I don't even know of a precise definition that one could use to judge whether GCC is a correct compiler for the LKMM or not, other than "ask Paul McKenney".

To be clear, what Linux and specifically Paul are doing is quite amazing -- but we have memory models now that can actually be defined formally and provably have properties we want (such as enabling a whole lot of optimizations on non-atomic accesses and allowing efficient compilation to real hardware). The LKMM does not meet these criteria, as far as I know. It is probably broken under various optimizations that the C++ memory model lets us do, and that Rust hence should do for the sake of code that works properly against that model. (For example reordering a release fence followed by a non-atomic write.)
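The reordering mentioned in parentheses can be made concrete. This is an illustrative sketch (the function name is made up), not a claim about what any particular compiler does today:

```rust
use std::sync::atomic::{fence, Ordering};

// Under the C++/Rust model, a release fence orders *earlier* accesses
// before *later atomic* writes. A later NON-atomic write is not ordered by
// the fence, so a conforming compiler may legally hoist it above the fence.
// Under the LKMM's barrier-based reasoning, that reordering is surprising.
fn fence_then_plain_write(slot: &mut i32) {
    fence(Ordering::Release);
    // This plain write may be moved before the fence by the optimizer.
    *slot = 1;
}
```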

@DemiMarie
Author

I don't think I agree. Or more specifically, to my knowledge, there does not exist something that I would call a Linux kernel memory model -- as in, some sort of definition that takes as input a program source code and produces as output a portable set of allowed behaviors of that program (including whether it has UB).

That is a valid point, but I still think there is some value in what the LKMM provides. Specifically, it should be possible to translate code that uses the LKMM to code that follows Rust’s memory model, and to do so without a performance penalty. Right now, there is no way to do so in at least some cases, such as rcu_dereference().

@RalfJung
Member

RalfJung commented Jul 7, 2022

Now, given that Rust is on its way into the Linux kernel, clearly something has to happen. But I feel strongly that that something is not "officially support the LKMM". I think it should be more along the lines of what happens in C: begrudgingly accept that Linux is Too Big To Break, and provide best-effort support. This may entail a special flag to change compiler behavior so that we don't have to pessimize programs that use the official model. Or maybe there are other ways. But I think we want to do whatever we can to ensure that Linux remains the only program using this memory model.

That is a valid point, but I still think there is some value in what the LKMM provides. Specifically, it should be possible to translate code that uses the LKMM to code that follows Rust’s memory model, and to do so without a performance penalty. Right now, there is no way to do so in at least some cases, such as rcu_dereference().

I will absolutely accept that the C++ memory model has its flaws. :) And I am not terribly surprised to hear that it has gaps, where some idioms you want to express, cannot be expressed. I am all for better understanding those gaps and working towards fixing them.
We even have an example of that: seqlocks cannot be written in the current model, but hopefully they can be in the future once we have bytewise atomic memcpy. I don't know what exactly needs to happen for full RCU support, but I would hope it's not "overhaul the entire model".
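To make the seqlock gap concrete, here is a hedged sketch of the seqlock shape (all names are made up). The payload here is itself an atomic, which dodges the real problem: for an arbitrary `[u8; N]` payload, the racy read in the reader loop is a data race under today's model, and that is exactly the gap a bytewise atomic memcpy would close:

```rust
use std::sync::atomic::{fence, AtomicU32, AtomicU64, Ordering};

struct SeqLock {
    seq: AtomicU32,  // odd while a writer is active
    data: AtomicU64, // stand-in for the real (non-atomic) payload
}

impl SeqLock {
    fn new(v: u64) -> Self {
        SeqLock { seq: AtomicU32::new(0), data: AtomicU64::new(v) }
    }

    // Single writer assumed, as in the kernel's seqlock.
    fn write(&self, v: u64) {
        let s = self.seq.load(Ordering::Relaxed);
        self.seq.store(s.wrapping_add(1), Ordering::Relaxed); // enter: seq odd
        fence(Ordering::Release); // keep the data store after the odd mark
        self.data.store(v, Ordering::Relaxed);
        self.seq.store(s.wrapping_add(2), Ordering::Release); // leave: seq even
    }

    fn read(&self) -> u64 {
        loop {
            let s1 = self.seq.load(Ordering::Acquire);
            let v = self.data.load(Ordering::Relaxed);
            fence(Ordering::Acquire);
            let s2 = self.seq.load(Ordering::Relaxed);
            if s1 == s2 && s1 % 2 == 0 {
                return v; // no writer interfered; the snapshot is consistent
            }
        }
    }
}
```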

@fbq

fbq commented Aug 17, 2022

Hi, my name is Boqun Feng, and I'm one of the maintainers of the Linux Kernel Memory Model. Since I'm also working on the Rust-for-Linux project, I'm very happy to see efforts to make Rust's memory model work for Linux kernel development. Yes, I agree with @RalfJung: "officially supporting the LKMM" is not something I want from Rust. I'm more interested in whether our existing idioms can be expressed correctly and efficiently (and, in the best case, elegantly ;-)) in Rust, and I expect that in the process something will happen to Rust (introducing new APIs, e.g. core::arch::load, clarification of corner cases, better documentation, etc.). Likewise, something will also happen to Linux on both the C and Rust sides, e.g. adopting more compiler-friendly (or optimization-friendly) patterns in parallel programming code. It should be beneficial to both Rust and the Linux kernel.

Currently, one of the biggest concerns I hear from other kernel developers is this passage in the Rust reference:

Rust does not yet have a defined memory model. Various academics and industry professionals are working on various proposals, but for now, this is an under-defined place in the language.

They are afraid that Rust still doesn't have a memory (ordering) model... So besides saying hi, I also want to ask: what can I show kernel developers about the status and progress of Rust's memory model formalization? My guess is that it will eventually happen in MiniRust, if I understand the purpose of that project. Btw, I like that project, and I'm going to keep an eye on it ;-)

A few comments below:

Or more specifically, to my knowledge, there does not exist something that I would call a Linux kernel memory model -- as in, some sort of definition that takes as input a program source code and produces as output a portable set of allowed behaviors of that program (including whether it has UB).

Not sure it fulfils your expectation, but we do support checking small pieces of code with herd; you can find the checking rules and samples in tools/memory-model of the Linux kernel source.

The LKMM that I have seen is a bunch of rules that you have to follow to "get what you want", and is basically defined by "whatever the target hardware does". That works a lot better than I would ever have expected, but I don't see a way to extract from that any kind of operational or even axiomatic semantics. It should be possible in principle to actually prove that a Rust compiler is correct, and that is AFAIK not possible with the LKMM since there is no suitable spec to prove anything against.

We do have an "AN OPERATIONAL MODEL" section in our tools/memory-model/Documentation/explanation.txt, but I guess it's just Alan's brief sketch of an operational model. Given that we define the formal checking rules in herd, though, our operational model should be an extension (or specialization) of the operational model used by herd in the paper.

I don't know what exactly needs to happen for full RCU support, but I would hope it's not "overhaul the entire model".

Below immediately comes to my mind:

  • RCU lists and other data structures can read pointer fields while others modify them, so core::arch::load may be needed. That said, I can't think of a problem if we make all the pointers AtomicPtr; it will need some coding to figure out.

  • RCU's rcu_dereference is a consume atomic load, as @DemiMarie mentioned. Actually, in Linux, all volatile atomic loads (i.e. READ_ONCE) provide address dependencies. And I'm not sure how to express that in Rust.

  • rcu_read_lock and rcu_read_unlock imply compiler barriers, because they need to form a critical section so that the compiler cannot reorder accesses from inside the critical section to outside. And "Does the concept of a compiler fence make any sense?" (#347) shows that compiler barriers are at least hard to model, if I understand correctly.
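On the second point, since `consume` is unavailable, the portable fallback for rcu_dereference() in Rust today is an Acquire load of an AtomicPtr; this is a hypothetical sketch (function name made up). On architectures like arm64 and Power this emits a stronger instruction or fence than the address dependency the kernel relies on, which is exactly the performance gap this thread is about:

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// Conservative stand-in for rcu_dereference(): an Acquire load where the
// LKMM would want a Relaxed load plus a hardware address dependency.
fn rcu_dereference_fallback<T>(slot: &AtomicPtr<T>) -> *mut T {
    slot.load(Ordering::Acquire)
}
```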

Anyway, I'm sure there are a lot of "interesting" places left to discover. Paul also wrote a blog series about "cases and opportunities", and it's worth reading.

Just saying hi! And if there is any question about the LKMM or the Linux kernel overall, feel free to ask!

@DemiMarie
Author

@fbq So a few questions:

  1. Would core::arch::{load, store, load_unaligned, store_unaligned} be helpful? Kernel writers are definitely one of the intended targets for these functions, but I have never done anything with MMIO, so I am not the right person to answer that question.
  2. Does Linux basically need a way to prevent a data dependency from turning into a control dependency?
  3. How should the various places that rely on ordinary stores/loads and barriers be handled?

@DemiMarie
Author

To elaborate on the last one: A pattern that (IIUC) is common in kernel code is a bunch of ordinary stores followed by a barrier and a final store that publishes everything. The compiler should be free to reorder stores before the barrier, but not to reorder across the barrier.
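This publish pattern can be sketched with today's Rust primitives (names are made up): the plain initializing stores may be reordered among themselves, and the release fence keeps them from sinking past the publishing store:

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{fence, AtomicBool, Ordering};

struct Shared {
    a: UnsafeCell<i32>, // plain data, written before publication
    ready: AtomicBool,  // the publishing flag
}
unsafe impl Sync for Shared {}

fn publish(s: &Shared) {
    unsafe { *s.a.get() = 42; }             // ordinary store(s)
    fence(Ordering::Release);               // ~ the barrier in the pattern
    s.ready.store(true, Ordering::Relaxed); // the publishing store
}

fn try_read(s: &Shared) -> Option<i32> {
    if s.ready.load(Ordering::Acquire) {
        // Release fence + Relaxed store synchronizes-with this Acquire load,
        // so the plain data is visible here.
        Some(unsafe { *s.a.get() })
    } else {
        None
    }
}
```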

@fbq

fbq commented Aug 18, 2022

@fbq So a few questions:

  1. Would core::arch::{load, store, load_unaligned, store_unaligned} be helpful? Kernel writers are definitely one of the intended targets for these functions, but I have never done anything with MMIO, so I am not the right person to answer that question.

load and load_unaligned are definitely helpful, since at least the watchdogs (softlockup, RCU stall warning) need to read some data without caring that others are writing it at the same time.

Also, there is always the read-ahead-lock "optimization", as follows:

if (READ_ONCE(shared_a) == SOME_STATE) { // check the state without lock, race is ok.
    spin_lock();
    if (shared_a == SOME_STATE) { // recheck after lock held.
        <do something>
    }
    spin_unlock();
}
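The same check-then-lock-then-recheck pattern, sketched with today's Rust primitives (names and the use of a Relaxed atomic for the unlocked peek are my assumptions): the racy read is only a hint, and the decision is rechecked under the lock:

```rust
use std::sync::{atomic::{AtomicI32, Ordering}, Mutex};

const SOME_STATE: i32 = 1; // made-up state value

fn maybe_do_something(shared_a: &AtomicI32, lock: &Mutex<()>) -> bool {
    if shared_a.load(Ordering::Relaxed) == SOME_STATE { // racy hint, no lock
        let _guard = lock.lock().unwrap();
        if shared_a.load(Ordering::Relaxed) == SOME_STATE { // recheck under lock
            // <do something>
            return true;
        }
    }
    false
}
```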

As for store and store_unaligned, I don't think they are as useful as the loads, as long as all the code is in Rust. If a write races with another write or a normal/plain read, there is probably something wrong.

However, in Rust-for-Linux, Rust code needs to work with C code. For example, a struct may be defined in C with one of its fields declared as int:

struct some_struct {
...
    int foo;
...
};

but all accesses to some_struct are done via READ_ONCE() or WRITE_ONCE(), i.e. atomic operations. Now if we use bindgen to generate the same struct for the Rust side, foo will be i32 instead of AtomicI32, and if the Rust side wants to write it, core::arch::store should be used, IIUC.
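One way to sketch this today, without core::arch::store, is to reinterpret the plain field's address as an atomic (the struct and function names are made up; `AtomicI32::from_ptr` is the modern spelling of this cast since Rust 1.75):

```rust
use std::sync::atomic::{AtomicI32, Ordering};

#[repr(C)]
struct SomeStruct {
    foo: i32, // only ever accessed atomically on the C side
}

fn write_foo(s: *mut SomeStruct, v: i32) {
    // Safety assumption: i32 and AtomicI32 have the same size and
    // alignment, and the pointed-to memory stays valid for the access.
    let a: &AtomicI32 =
        unsafe { &*(std::ptr::addr_of_mut!((*s).foo) as *const AtomicI32) };
    a.store(v, Ordering::Relaxed); // ~ WRITE_ONCE(s->foo, v)
}
```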

  2. Does Linux basically need a way to prevent a data dependency from turning into a control dependency?

Could you provide an example? I'm not sure how a data dependency can be turned into a control dependency.

  3. How should the various places that rely on ordinary stores/loads and barriers be handled?

If the code fits the acquire+release pattern, then it's better to use acquire+release, and "ordinary stores followed by a publish store" falls into this category. An interesting case is "published by load":

// all locations are initialized as zeros.

void P0(int *x, int *y, int *a) {
    int r0;

    *x = 1;
    *y = 2;
    smp_mb(); // a full barrier, similar to fence(SeqCst);
    r0 = READ_ONCE(*a); // Load s
}

void P1(int *x, int *y, int *a) {
    int r1;
    int r2;

    WRITE_ONCE(*a, 1); // Store t
    smp_mb();
    r1 = *x;
    r2 = *y;
}

The Linux memory model guarantees that if r0 is 0, then r1 == 1 and r2 == 2: if Load s fails to read the result of Store t, then t propagates later than s executes, and the two smp_mb() order the accesses before and after them. If we want to use this pattern in Rust, I guess we will have to use atomic::fence() and replace WRITE_ONCE() and READ_ONCE() with atomics.
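That transliteration might look like the following sketch: plain accesses become Relaxed atomics (to avoid data-race UB) and smp_mb() becomes fence(SeqCst). A single-threaded run only shows the shape; actually exploring the interleavings would need a checker such as loom or Miri:

```rust
use std::sync::atomic::{fence, AtomicI32, Ordering};

fn p0(x: &AtomicI32, y: &AtomicI32, a: &AtomicI32) -> i32 {
    x.store(1, Ordering::Relaxed);
    y.store(2, Ordering::Relaxed);
    fence(Ordering::SeqCst);  // smp_mb()
    a.load(Ordering::Relaxed) // Load s, ~ READ_ONCE(*a)
}

fn p1(x: &AtomicI32, y: &AtomicI32, a: &AtomicI32) -> (i32, i32) {
    a.store(1, Ordering::Relaxed); // Store t, ~ WRITE_ONCE(*a, 1)
    fence(Ordering::SeqCst);       // smp_mb()
    (x.load(Ordering::Relaxed), y.load(Ordering::Relaxed))
}
```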

There is also a paper describing how to map the Linux kernel's atomics and fences onto C/C++ atomics and fences. Hope that is helpful.

@fbq

fbq commented Aug 18, 2022

Note that I'm not that used to MMIO either, but here is what I know: the Linux kernel has two sets of APIs for MMIO:

  • read{b,w,l,q}() and write{b,w,l,q}(): strongly ordered; they cannot be reordered with other memory accesses by the compiler or the hardware, and on some architectures these APIs contain the necessary barriers.
  • read{b,w,l,q}_relaxed() and write{b,w,l,q}_relaxed(): relaxed ordering; they can be reordered with other memory accesses by the compiler, but they have volatile semantics, so technically they cannot be reordered with each other by the compiler. But I think this is not what we need.

There are also four barriers that work with the _relaxed() accessors for fine-grained ordering control:

  • __io_bw(): before-write barrier, prevents an MMIO write from being reordered with earlier memory accesses
  • __io_aw(): after-write barrier, prevents an MMIO write from being reordered with later memory accesses
  • __io_br(): before-read barrier, prevents an MMIO read from being reordered with earlier memory accesses
  • __io_ar(): after-read barrier, prevents an MMIO read from being reordered with later memory accesses

None of the above are modeled by the LKMM, so their formal semantics are unclear (they are only informally described in code).

IIUC, core::arch::{load,store} roughly map to the _relaxed() versions of the MMIO operations, which means the barriers and the strongly ordered APIs are still missing (or we can implement them with asm!).

@comex

comex commented Aug 18, 2022

A few points:

First, keep in mind that you could simply transliterate the C versions of READ_ONCE/WRITE_ONCE, barriers, etc. directly to Rust, using ptr::read_volatile/ptr::write_volatile in place of C volatile loads and stores, and asm! in place of C asm blocks. If you do, you'll end up with the same LLVM IR instructions (or GCC equivalent with rustc_codegen_gcc), which will get passed to the same optimizer, and which ultimately will work or not work to the same extent as the C versions.

As you say, there may be room to improve things on both the C and Rust sides, but I think it's important to establish this as a baseline possibility.
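That baseline transliteration might look like this sketch (function names are made up; whether these are *defined* when they race is exactly what the rest of the thread debates):

```rust
// ~ READ_ONCE(): a volatile load, lowering to the same LLVM IR as the
// kernel's C macro.
pub unsafe fn read_once<T: Copy>(p: *const T) -> T {
    std::ptr::read_volatile(p)
}

// ~ WRITE_ONCE(): a volatile store.
pub unsafe fn write_once<T: Copy>(p: *mut T, v: T) {
    std::ptr::write_volatile(p, v)
}
```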

  • RCU's rcu_dereference is a consume atomic load, as @DemiMarie mentioned. Actually, in Linux, all volatile atomic loads (i.e. READ_ONCE) provide address dependencies. And I'm not sure how to express that in Rust.

You can't. LLVM and GCC are unable to guarantee address dependencies will be respected in the face of arbitrary compiler optimizations. Hence, for C11/C++11 atomics, they fall back to implementing memory_order_consume as memory_order_acquire – that is, they emit a (runtime) barrier so that the code will be correct even if address dependencies are broken. rustc is ultimately passing code through the same optimizers and so cannot do better.

The only guaranteed way to do a dependent read without a barrier would be to write the entire sequence of read followed by dependent read in assembly (they must be together in a single asm! block).

This is all true in both Rust and C. But Linux's C code relies on address dependencies despite the lack of a guarantee, taking advantage of the fact that in practice, GCC and LLVM will only break them in rather pathological cases. Linux's Rust code could do the same, and again, it'll work or not work to the same extent as the C code.

A lack of guarantee may be a concern when it comes to safety. Rust normally attempts to guarantee memory safety for safe code even if that code is pathological. And many use cases for atomics rely on correct ordering for memory safety. So you could imagine a fully-safe function calling another function (in another module) which is safe but with unsafe code inside, where the callee is expected to be inlined; the caller function then includes pathological code designed to trick the optimizer into dropping address dependencies in the inlined copy of the callee.

But then, the same concern may apply to Rust code that calls C functions, especially if cross-language LTO is enabled (in which case C code can be inlined into Rust).

It doesn't seem to me to be worth worrying that much about, but YMMV.

@sdroege

sdroege commented Aug 18, 2022

Now if we use bindgen to generate the same struct for the Rust side, foo will be i32 instead of AtomicI32, and if the Rust side wants to write it, core::arch::store should be used, IIUC.

This seems like something that should probably be solved at the bindgen level and either with annotations in the C code that are understood by bindgen (say, a /* ATOMIC */ comment in the header file next to the struct field), or by separate overrides for bindgen. The former also has the advantage that it works as documentation for C programmers.

@DemiMarie
Author

  • RCU's rcu_dereference is a consume atomic load, as @DemiMarie mentioned. Actually, in Linux, all volatile atomic loads (i.e. READ_ONCE) provide address dependencies. And I'm not sure how to express that in Rust.

You can't. LLVM and GCC are unable to guarantee address dependencies will be respected in the face of arbitrary compiler optimizations. Hence, for C11/C++11 atomics, they fall back to implementing memory_order_consume as memory_order_acquire – that is, they emit a (runtime) barrier so that the code will be correct even if address dependencies are broken. rustc is ultimately passing code through the same optimizers and so cannot do better.

The only guaranteed way to do a dependent read without a barrier would be to write the entire sequence of read followed by dependent read in assembly (they must be together in a single asm! block).

Has there been progress on this on the LLVM/GCC side? There is a reason that Linux behaves as it does, and that is that doing so is critical for best performance on modern hardware. And no, doing the whole thing in assembler is not an option.

It is worth noting that for Spectre v1 mitigation, it is actually necessary for the compiler to promote control dependencies to address dependencies in many cases.

@bjorn3
Member

bjorn3 commented Aug 18, 2022

There is a reason that Linux behaves as it does, and that is that doing so is critical for best performance on modern hardware.

The Linux kernel memory model predates C++11. The C++11 memory model was designed with compiler optimization in mind, while the Linux kernel memory model was designed under the assumption that it is possible to disable certain optimizations in compilers, even though neither GCC nor LLVM actually supports this. And even then, the Linux kernel memory model isn't as efficient as possible on all architectures: Alpha doesn't respect "read-to-read address dependencies" (i.e. consume atomics still need a fence), while Itanium doesn't guarantee consistent "read-to-read ordering for normal loads from the same variable", according to https://lwn.net/Articles/718628/.

It is worth noting that for Spectre v1 mitigation, it is actually necessary for the compiler to promote control dependencies to address dependencies in many cases.

That happens after all optimizations that break address dependencies are already performed.

@DemiMarie
Author

There is a reason that Linux behaves as it does, and that is that doing so is critical for best performance on modern hardware.

The linux kernel memory model predates C++11. C++11's memory model was designed with compiler optimization in mind, while the linux kernel memory model was designed under the assumption that it is possible to disable certain optimizations in compilers even though neither GCC nor LLVM actually supports this.

That is fair, but I am still not aware of a way to get similar performance without exploiting address dependencies.

And even then the linux kernel memory model isn't as efficient as possible on all architectures. Alpha doesn't respect "read-to-read address dependencies" (aka consume atomics still need a fence) while Itanium doesn't guarantee consistent "read-to-read ordering for normal loads from the same variable" according to https://lwn.net/Articles/718628/.

Both of these architectures are pretty much dead.

@fbq

fbq commented Aug 18, 2022

A few points:

First, keep in mind that you could simply transliterate the C versions of READ_ONCE/WRITE_ONCE, barriers, etc. directly to Rust, using ptr::read_volatile/ptr::write_volatile in place of C volatile loads and stores, and asm! in place of C asm blocks. If you do, you'll end up with the same LLVM IR instructions (or GCC equivalent with rustc_codegen_gcc), which will get passed to the same optimizer, and which ultimately will work or not work to the same extent as the C versions.

As you say, there may be room to improve things on both the C and Rust sides, but I think it's important to establish this as a baseline possibility.

Let's see if I understand your point ;-)

So your point is "whatever you ask from the Rust memory model, it must be something supported by LLVM/GCC, because currently you get the same guarantees as C"? If so, I must say I expect more:

So the LKMM describes the conceptual model that kernel developers have for parallel programming, and of course I understand there are a lot of fairy tales behind it; no real-world compiler can guarantee everything required by the model. We are lucky because Linux is too big to break (or compilers have mercy on us), so the model can be used ;-) But it's a model describing programmers' ideas, and there are tons of examples using them inside the kernel source, so why not look into them and see how to support the idioms (not the model itself) without any fairy tale?

My (maybe juvenile) idea is that while Rust is refining its memory model, some idioms of the LKMM can be taken into consideration for support, and sometimes that may mean changing the backend (LLVM/GCC). IIUC, the design of LLVM's memory model today is mostly a result of influence from the C++11 memory model, so given that the Linux kernel is probably one of the biggest projects involved in parallel programming, maybe we get a chance to affect the LLVM memory model too ;-)

To be clear, by "supporting idioms" I don't mean changing C code into Rust line by line; if Rust already has a pattern, we should fit ourselves in.

There must be a way that two groups of smart people (language/compiler people and OS kernel developers) can work together instead of one saying "UB" and the other saying "volatile".

@fbq

fbq commented Aug 18, 2022

Now if we use bindgen to generate the same struct for the Rust side, foo will be i32 instead of AtomicI32, and if the Rust side wants to write it, core::arch::store should be used, IIUC.

This seems like something that should probably be solved at the bindgen level and either with annotations in the C code that are understood by bindgen (say, a /* ATOMIC */ comment in the header file next to the struct field), or by separate overrides for bindgen. The former also has the advantage that it works as documentation for C programmers.

Yeah, that also works. And for the Linux kernel, we'd probably better fix the places where an int is actually an atomic_t. As I said, I do see uses for core::arch::load, but I don't know of much use for core::arch::store. FWIW, we even have ASSERT_EXCLUSIVE_WRITER with KCSAN to detect races between writers. But other kernel developers may disagree with me...

@fbq

fbq commented Aug 19, 2022

There is a reason that Linux behaves as it does, and that is that doing so is critical for best performance on modern hardware.

The linux kernel memory model predates C++11. C++11's memory model was designed with compiler optimization in mind, while the linux kernel memory model was designed under the assumption that it is possible to disable certain optimizations in compilers even though neither GCC nor LLVM actually supports this. And even then the linux kernel memory model isn't as efficient as possible on all architectures. Alpha doesn't respect "read-to-read address dependencies" (aka consume atomics still need a fence) while Itanium doesn't guarantee consistent "read-to-read ordering for normal loads from the same variable" according to https://lwn.net/Articles/718628/.

That's true; we use a "stronger than a volatile read" implementation for READ_ONCE() on Alpha and Itanium. But also, as @DemiMarie said, these are unpopular architectures and there are no new machines using them. Maybe one of the reasons they were deprecated is the lack of address dependencies for normal reads? ;-) Joking aside, since you brought up optimization: do you happen to know of any research or benchmarks showing the performance improvement of optimizations based on the C++11 memory model, e.g. a project that switched to the C++11 memory model and gained performance? Just curious.

To be clear, kernel developers (including me) love optimization; in any case, we want to enable optimizations rather than disable them.

I think it's somewhat unfair to say that the Linux kernel memory model is designed under the assumption that we can disable certain compiler optimizations, because most of the idioms the LKMM supports come from abstracting patterns in assembly code: we know x86 provides address dependencies, we know ARM64 provides address dependencies, we know PPC provides address dependencies, etc. So we abstract the idea of address dependencies up to a high level. It's simply "if we are going to write similar patterns of assembly code for all architectures, why don't we abstract this pattern into a high-level language?". And abstracting patterns of assembly code is, IIUC, one of the reasons C was invented and widely used. Disabling optimizations (assuming we even can) is unfortunately a side effect of this kind of abstraction. If a language memory model can cover the abstraction for these patterns, I think I'd be happy to switch to it ;-)

@RalfJung
Member

Hi @fbq, nice to hear from you! I am super excited about Rust coming to the Linux kernel, so I very much hope we can find something here that works for both sides. :)

Currently, one of the biggest concern I receive from other kernel developers is in Rust reference:

Note that this does not refer to a concurrency memory model, this refers to the sequential part of the memory model. Arguably, C does not have a defined memory model in that sense either -- there is an implicit model hiding in the C spec, but it has not been properly described yet. Some people have tried, but there are still a lot of open questions.

Rust has very high standards for what it calls a defined memory model (basically it has to be mathematically unambiguous), which is why we haven't committed to anything official yet. We are working on it though, and we do hope to form an official "operational semantics team" soon that has this as one of its core objectives. :)

But for the purpose of this thread, we are talking about the concurrency part of the memory model, and there Rust (for now) just does exactly what C and C++ do. So if you consider the C/C++ concurrency memory models defined, then so is the Rust model.

Not sure it fulfils your expectation, but we do support checking small pieces of code with herd; you can find the checking rules and samples in tools/memory-model of the Linux kernel source.

Thanks for pointing that out, I was not aware of this! This file does look like the heart of a graph-based memory model indeed. Is there any systematic work on comparing this with the C++ model?

RCU's rcu_dereference is a consume atomic load, as @DemiMarie mentioned. Actually, in Linux, all volatile atomic loads (i.e. READ_ONCE) provide address dependencies. And I'm not sure how to express that in Rust.

Yeah, consume is basically unworkable in any high-level language. To my knowledge nobody has managed to come up with a satisfying definition in any of C, C++, LLVM, or Rust. The very notion of a "data dependency" is meaningless when the compiler is allowed to replace "X * 0" by "0", and thus remove dependencies (to give just a simple example).

So, consume-like reasoning can pretty much only be done in inline assembly, and by using the hardware notion of a data dependency.
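The "X * 0" point can be shown concretely (names made up). The source has a syntactic data dependency from `loaded` to the indexing, but a compiler may fold `loaded * 0` to `0`, after which no dependency remains, and the hardware ordering that consume was supposed to exploit is gone:

```rust
fn dependent_index(data: &[i32; 4], loaded: usize) -> i32 {
    // A compiler may legally fold this to `let idx = 0;`,
    // erasing the "dependency" on `loaded`.
    let idx = loaded * 0;
    data[idx]
}
```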

but all accesses to some_struct are done via READ_ONCE() or WRITE_ONCE(), i.e. atomic operations. Now if we use bindgen to generate the same struct to Rust side, foo will be i32 instead of AtomicI32, and if Rust side wants to write,

So the requirement here would be that whatever the Rust side does is ABI compatible with what the C side does.

Honestly, you are probably best off defining your own AtomicI32 and then taking responsibility for ensuring compatibility. The official Rust AtomicI32 type is only guaranteed to interact well with code that follows the C++ memory model, and it seems hard to guarantee that this will be compatible with the Linux kernel memory model.
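A hedged sketch of what "your own AtomicI32" could look like for Rust-for-Linux (the type and method names are made up): a #[repr(transparent)] wrapper around the C int, with LKMM-style accessors built on volatile operations, so the Rust side matches whatever the kernel's C side does rather than the C++ model:

```rust
use std::cell::UnsafeCell;

#[repr(transparent)]
pub struct KernelAtomicI32(UnsafeCell<i32>);

// Safety assumption: all concurrent access goes through the volatile
// accessors below, mirroring the kernel's READ_ONCE()/WRITE_ONCE() discipline.
unsafe impl Sync for KernelAtomicI32 {}

impl KernelAtomicI32 {
    pub const fn new(v: i32) -> Self {
        KernelAtomicI32(UnsafeCell::new(v))
    }

    // ~ READ_ONCE()
    pub fn read_once(&self) -> i32 {
        unsafe { std::ptr::read_volatile(self.0.get()) }
    }

    // ~ WRITE_ONCE()
    pub fn write_once(&self, v: i32) {
        unsafe { std::ptr::write_volatile(self.0.get(), v) }
    }
}
```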

There must be a way that two groups of smart people (language/compiler people and OS kernel developers) can work together instead of one saying "UB" and the other saying "volatile".

I surely hope there is such a way. :) But Rust is currently quite far away from attempting to define its own concurrency memory model. We don't have the expertise for that, and we are busy specifying other parts of the language (and we hold what we call a 'specification' to a higher standard than C/C++ do).

And specifically consume has some very fundamental problems where there haven't even been proposals for how to overcome them. As a start, someone would need to figure out how to specify 'consume' without uttering the words 'data dependency'. That seems hard. ;)

(I think it will involve some explicit annotations, where you give names to statements and say "ensure that that statement is suitably ordered with that other statement".)

So I am curious now what the goal of this thread here is:

  • Do we 'just' want to ensure that Rust can be used in the Linux kernel as reliably as C can? Then I think we should follow the existing approach of the kernel violating the language spec, and rely on the real-world evidence that this happens to work in practice. This should use exactly the same idioms as what C uses, generating the same LLVM IR, to ensure that if this breaks, that means LLVM is also breaking the C side of the Linux kernel, which I assume they will be reluctant to do even if their specification allows it. I assume the same will probably happen for rustc itself, if it ever starts to optimize atomics. I am not happy with declaring a spec-violating program as "too big to fail", but with the current state of the art, I also do not know a better approach. After all, Linux is extremely systematic in just how it violates the spec, and it has a real-world need that you claim cannot be satisfied in the realm of the spec, so as long as we ensure we have a good line of communication to diagnose any problem that comes up here, I think this can be made to work. It works for C, after all.
  • Or do we want to go well beyond what happens in C, and actually define a memory model that both has the performance characteristics required by Linux, and can be defined precisely as a language-level graph-based (or ideally operational) concurrency memory model? This is a monumental task and a large open problem, and should not be on the list of blocking issues for anything that needs to get done in the next 5 years. ;)

And abstracting patterns of assembly code is, IIUC, one of the reasons that C was invented and widely used.

Sadly, the C of today is unsuited for this task, and so is Rust. Neither of these languages is a 'macro assembler', even if C started out intending to be one. The moment the C specification started to talk about an 'abstract machine' and the 'as-if rule', it stopped being a language to abstract over assembly, and started being a high-level language with its own abstract notion of memory.

The goals of having strong compiler optimizations and abstracting over assembly patterns are fundamentally at odds with each other. It is impossible to have both. Therefore what we do instead is we aim to provide operations that can be described entirely in the abstract, but that lead to the kind of assembly code that is needed for maximal performance. This leads to a lot of complexity such as pointer provenance, a very funny notion of "uninitialized memory" -- and a concurrency memory model that needs to be considered in its own terms, and where thinking of it like "x86 does this, ARM does that, Power does that" quickly leads to incorrect results. Any time a programmer comes and says "but here I have this assembly pattern that I want to generate and I cannot write well-defined C or Rust code that does that", that's a gap in the language that, ideally, should be fixed by extending the abstract language such that the programmer can get what they want and need. Sadly, for consume and data dependencies, nobody has figured out how to do that yet...

There has been some interesting work on an alternative approach to relaxed memory, one of my favorites is this paper; there is a talk available here. This calculus is defined in a more abstract way than the C++ memory model, leaving it to the compiler to ensure the required ordering in the most efficient way -- so if the compiler can generate assembly code with a certain data dependency, it can use that and doesn't have to also generate fences or anything like that.

@DemiMarie

To elaborate on the last one: A pattern that (IIUC) is common in kernel code is a bunch of ordinary stores followed by a barrier and a final store that publishes everything. The compiler should be free to reorder stores before the barrier, but not to reorder across the barrier.

That is what 'relaxed' accesses are designed to allow.

That is fair, but I am still not aware of a way to get similar performance without exploiting address dependencies.

There is a way, it is to use inline assembly. :)

@comex

First, keep in mind that you could simply transliterate the C versions of READ_ONCE/WRITE_ONCE, barriers, etc. directly to Rust, using ptr::read_volatile/ptr::write_volatile in place of C volatile loads and stores, and asm! in place of C asm blocks. If you do, you'll end up with the same LLVM IR instructions (or GCC equivalent with rustc_codegen_gcc), which will get passed to the same optimizer, and which ultimately will work or not work to the same extent as the C versions.

Indeed I think that is probably the best approach. It avoids accidentally limiting how we can evolve the Rust memory model (that underlies our official Atomic types) in the future, and while it does technically lead to UB in Rust, this is already the exact same situation that the Linux kernel finds itself in for C.

Moving to the official Atomic types in Rust should be done together with moving to the official atomic operations in C, if at all. I see little reason for using volatile in C but then using Atomic in Rust.

@DemiMarie

That is fair, but I am still not aware of a way to get similar performance without exploiting address dependencies.

There is a way, it is to use inline assembly. :)

That would work for one or two uses. As you can probably imagine, it will not work for every single use of rcu_dereference() in the entire kernel. There are just too many of them.

@Lokathor

Without knowing the precise details of that function in C, usually in Rust you would write it as an #[inline(always)] function that internally uses the correct asm for the build target.

You can even make it a macro_rules that expands to an asm block if you want each specific use to perform its own dedicated register allocation.

@fbq

fbq commented Aug 21, 2022

Hi @fbq, nice to hear from you! I am super excited about Rust coming to the Linux kernel, so I very much hope we can find something here that works for both sides. :)

Likewise ;-) Thanks for the reply. I will leave my reply to the "goal of the thread" to a separate comment because that may need to get more people's attention and opinions.

Currently, one of the biggest concerns I hear from other kernel developers is this passage in the Rust reference:

Note that this does not refer to a concurrency memory model, this refers to the sequential part of the memory model. Arguably, C does not have a defined memory model in that sense either -- there is an implicit model hiding in the C spec, but it has not been properly described yet. Some people have tried, but there are still a lot of open questions.

Rust has very high standards for what it calls a defined memory model (basically it has to be mathematically unambiguous), which is why we haven't committed to anything official yet. We are working on it though, and we do hope to form an official "operational semantics team" soon that has this as one of its core objectives. :)

Thanks for the explanation. I'm all for models with high standards; that's one of the things I love about the Rust community.

But for the purpose of this thread, we are talking about the concurrency part of the memory model, and there Rust (for now) just does exactly what C and C++ do. So if you consider the C/C++ concurrency memory models defined, then so is the Rust model.

Not sure whether it fulfils your expectation, but we do support checking small pieces of code with herd; you can find the rules of the checking and samples in tools/memory-model of the Linux kernel source.

Thanks for pointing that out, I was not aware of this! This file does look like the heart of a graph-based memory model indeed. Is there any systematic work on comparing this with the C++ model?

The academic work on the Linux Kernel Memory Model can be found here, and there is a "Comparison with the C11 model" section. However, the comparison is more about the different results of litmus tests under the two models, rather than the models themselves. Plus, LKMM models RCU ordering, which was not part of C++ until recently.

RCU's rcu_dereference is a consume atomic load as @DemiMarie mentioned. Actually, in Linux, all volatile atomic loads (i.e. READ_ONCE) provide address dependencies. And I'm not sure how to express that in Rust.

Yeah, consume is basically unworkable in any high-level language. To my knowledge nobody has managed to come up with a satisfying definition in any of C, C++, LLVM, or Rust. The very notion of a "data dependency" is meaningless when the compiler is allowed to replace "X * 0" by "0", and thus remove dependencies (to give just a simple example).

This is one proposal to improve the case. But I don't know whether the committee loves it or not ;-)

So, consume-like reasoning can pretty much only be done in inline assembly, and by using the hardware notion of a data dependency.

but all accesses to some_struct are done via READ_ONCE() or WRITE_ONCE(), i.e. atomic operations. Now if we use bindgen to generate the same struct to Rust side, foo will be i32 instead of AtomicI32, and if Rust side wants to write,

So the requirement here would be that whatever the Rust side does is ABI compatible with what the C side does.

Honestly you are probably best off by defining your own AtomicI32 and then taking responsibility of ensuring compatibility. The official Rust AtomicI32 type is only guaranteed to interact well with code that follows the C++ memory model, and it seems hard to guarantee that this will be compatible with the Linux kernel memory model.

I will leave this to the "goal of the thread" comment.

There must be a way that two groups of smart people (language/compiler people and OS kernel developers) can work together instead of one saying "UB" and the other saying "volatile".

I surely hope there is such a way. :) But Rust is currently quite far away from attempting to define its own concurrency memory model. We don't have the expertise for that, and we are busy specifying other parts of the language (and we hold what we consider a 'specification' to a higher standard than C/C++ do).

Understood. And good things are worth waiting ;-) I'm simply here to say: if Rust is going to define its own concurrency memory model or refine LLVM's model, please do take Linux kernel parallel idioms into consideration, and if answers from kernel developers are needed, I'm always here ;-)

The goals of having strong compiler optimizations and abstracting over assembly patterns are fundamentally at odds with each other. It is impossible to have both. Therefore what we do instead is we aim to provide operations that can be described entirely in the abstract, but that lead to the kind of assembly code that is needed for maximal performance. This leads to a lot of complexity such as pointer provenance, a very funny notion of "uninitialized memory" -- and a concurrency memory model that needs to be considered in its own terms, and where thinking of it like "x86 does this, ARM does that, Power does that" quickly leads to incorrect results. Any time a programmer comes and says "but here I have this assembly pattern that I want to generate and I cannot write well-defined C or Rust code that does that", that's a gap in the language that, ideally, should be fixed by extending the abstract language such that the programmer can get what they want and need.

Totally agreed! The abstract language should be designed to help programmers rather than trick them ;-)

Sadly, for consume and data dependencies, nobody has figured out how to do that yet...

There has been some interesting work on an alternative approach to relaxed memory, one of my favorites is this paper; there is a talk available here. This calculus is defined in a more abstract way than the C++ memory model, leaving it to the compiler to ensure the required ordering in the most efficient way -- so if the compiler can generate assembly code with a certain data dependency, it can use that and doesn't have to also generate fences or anything like that.

Thanks for the pointer, I will certainly take a look.

@RalfJung

Understood. And good things are worth waiting ;-) I'm simply here to say: if Rust is going to define its own concurrency memory model or refine LLVM's model, please do take Linux kernel parallel idioms into consideration, and if answers from kernel developers are needed, I'm always here ;-)

Yes we should absolutely do that! And thanks for the offer. :)

I am also excited about the work on the C++ side to support some of the LKMM idioms, and curious what will come out of that. Rust people are closely watching what C++ does in this space.

@fbq

fbq commented Aug 21, 2022

@RalfJung

So I am curious now what the goal of this thread here is:

  • Do we 'just' want to ensure that Rust can be used in the Linux kernel as reliably as C can? Then I think we should follow the existing approach of the kernel violating the language spec, and rely on the real-world evidence that this happens to work in practice. This should use exactly the same idioms as what C uses, generating the same LLVM IR, to ensure that if this breaks, that means LLVM is also breaking the C side of the Linux kernel, which I assume they will be reluctant to do even if their specification allows it. I assume the same will probably happen for rustc itself, if it ever starts to optimize atomics. I am not happy with declaring a spec-violating program as "too big to fail", but with the current state of the art, I also do not know a better approach. After all, Linux is extremely systematic in just how it violates the spec, and it has a real-world need that you claim cannot be satisfied in the realm of the spec, so as long as we ensure we have a good line of communication to diagnose any problem that comes up here, I think this can be made to work. It works for C, after all.
  • Or do we want to go well beyond what happens in C, and actually define a memory model that both has the performance characteristics required by Linux, and can be defined precisely as a language-level graph-based (or ideally operational) concurrency memory model? This is a monumental task and a large open problem, and should not be on the list of blocking issues for anything that needs to get done in the next 5 years. ;)

I understand that the memory model for the second option is indeed challenging (though I would be happy to see us eventually have one), and that it requires a lot of smart people and many years. So I would say let's not block experimenting with Rust for Linux kernel development on that.

However, if we choose the first option, I think I do have some concerns as a developer of the Rust-for-Linux project, so I'm adding other maintainers and reviewers to the discussion:

@ojeda @wedsonaf @alex @bjorn3 @nbdd0121

The first option basically means that if Rust-side code wants to synchronize with C-side code, there will be UB, and if I understand correctly we actually try very hard to prevent UB in the Rust-for-Linux project. I'm not sure how others feel about this...

If we are going to allow UB in the Rust-for-Linux project, there is more work to do: we will have to track each instance, so that we can remove them once we can. One idea that comes to mind is an unsound!() macro to mark the code:

unsound! {
    // Assuming a data race on this volatile read is safe.
    some_ptr.read_volatile()
}

or we could define unsound_read_volatile and friends, or find some other way to track UB ;-(

Also if we use the asm!() solution for atomics and others as below:

but all accesses to some_struct are done via READ_ONCE() or WRITE_ONCE(), i.e. atomic operations. Now if we use bindgen to generate the same struct to Rust side, foo will be i32 instead of AtomicI32, and if Rust side wants to write,

So the requirement here would be that whatever the Rust side does is ABI compatible with what the C side does.

Honestly you are probably best off by defining your own AtomicI32 and then taking responsibility of ensuring compatibility. The official Rust AtomicI32 type is only guaranteed to interact well with code that follows the C++ memory model, and it seems hard to guarantee that this will be compatible with the Linux kernel memory model.

and

@comex

First, keep in mind that you could simply transliterate the C versions of READ_ONCE/WRITE_ONCE, barriers, etc. directly to Rust, using ptr::read_volatile/ptr::write_volatile in place of C volatile loads and stores, and asm! in place of C asm blocks. If you do, you'll end up with the same LLVM IR instructions (or GCC equivalent with rustc_codegen_gcc), which will get passed to the same optimizer, and which ultimately will work or not work to the same extent as the C versions.

Indeed I think that is probably the best approach. It avoids accidentally limiting how we can evolve the Rust memory model (that underlies our official Atomic types) in the future, and while it does technically lead to UB in Rust, this is already the exact same situation that the Linux kernel finds itself in for C.

Moving to the official Atomic types in Rust should be done together with moving to the official atomic operations in C, if at all. I see little reason for using volatile in C but then using Atomic in Rust.

then not only will we have to implement them, but we will also need documentation or guidelines describing their usage, for example:

  • If the Rust side wants to synchronize with the C side, the asm!-based version of the atomics should be used.
  • If synchronization only happens on the Rust side, Rust atomics can be used.
  • A Rust synchronization mechanism based purely on Rust atomics cannot be used to protect a memory object that also gets parallel accesses from the C side.
  • ...

Anything else we should do?
