Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

llvm libc (and musl) are incompatible with llvm memcpy assumptions #73516

Open
RalfJung opened this issue Nov 27, 2023 · 23 comments
Open

llvm libc (and musl) are incompatible with llvm memcpy assumptions #73516

RalfJung opened this issue Nov 27, 2023 · 23 comments

Comments

@RalfJung
Copy link
Contributor

The libc llvm seems to declare its memcpy with the restrict qualifier, matching the signature in the C standard:

LLVM_LIBC_FUNCTION(void *, memcpy,
(void *__restrict dst, const void *__restrict src,
size_t size)) {

This means it will be compiled in a way such that when the two pointers are used to access the same memory, and at least one access is a write, there is UB. This is UB both in the C source semantics and in LLVM IR (via noalias).

However, LLVM itself assumes that the libc memcpy is always defined behavior when dst==src. (This comes up frequently on this issue tracker, e.g. at #60734, #55399. But as far as I know it is not mentioned in the LLVM documentation.)

Reading the llvm-libc sources, I did not find any guards that would short-circuit the function when dst==src. It seems very much like the usual copy loop will be executed, loading from src and storing to dst. This is UB due to the restrict keyword, making LLVM incompatible with its own libc even when said libc is compiled with LLVM -- unless I misunderstood something. (There are so many macros in the libc it's very hard to be sure what the actually compiled code will be.^^)

musl also uses restrict in its memcpy, so it is likewise incompatible with LLVM.

@lntue
Copy link
Contributor

lntue commented Nov 27, 2023

@gchatelet : Do you mind taking a look at this? Thanks!

@SchrodingerZhu
Copy link
Contributor

SchrodingerZhu commented Nov 27, 2023

POSIX and the C standards are explicit that employing memcpy() with overlapping areas produces undefined behavior.

Shouldn't this be UB by standards?

@RalfJung
Copy link
Contributor Author

Yes it is, but when the compiler calls a particular memcpy implementation, that implementation could make more guarantees than what the standard prescribes. LLVM assumes this to be the case. However, llvm-libc does not seem to actually provide such guarantees due to the restrict (and it's not documented as a guarantee either AFAIK). See #12135 for some of the history here.

@SchrodingerZhu
Copy link
Contributor

Ah, I see. llvm.memcpy behaves differently with __llvm_libc::memcpy. This may introduce complications.

@gchatelet gchatelet added the bug Indicates an unexpected problem or unintended behavior label Nov 27, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Nov 27, 2023

@llvm/issue-subscribers-bug

Author: Ralf Jung (RalfJung)

The libc llvm seems to declare its `memcpy` with the `restrict` qualifier, matching the signature in the C standard:

LLVM_LIBC_FUNCTION(void *, memcpy,
(void *__restrict dst, const void *__restrict src,
size_t size)) {

This means it will be compiled in a way such that when the two pointers are used to access the same memory, and at least one access is a write, there is UB. This is UB both in the C source semantics and in LLVM IR (via noalias).

However, LLVM itself assumes that the libc memcpy is always defined behavior when dst==src. (This comes up frequently on this issue tracker, e.g. at #60734, #55399. But as far as I know it is not mentioned in the LLVM documentation.)

Reading the llvm-libc sources, I did not find any guards that would short-circuit the function when dst==src. It seems very much like the usual copy loop will be executed, loading from src and storing to dst. This is UB due to the restrict keyword, making LLVM incompatible with its own libc even when said libc is compiled with LLVM -- unless I misunderstood something. (There are so many macros in the libc it's very hard to be sure what the actually compiled code will be.^^)

musl also uses restrict in its memcpy, so it is likewise incompatible with LLVM.

@gchatelet
Copy link
Contributor

@RalfJung thx for bringing this issue to my attention.

However, LLVM itself assumes that the libc memcpy is always defined behavior when dst==src. (This comes up frequently on this issue tracker, e.g. at #60734, #55399. But as far as I know it is not mentioned in the LLVM documentation.)

I believe this is stated here for the IR but I haven't seen anything regarding requirements for the libc. Let me run a bunch of tests and get back to you with a proper answer.

@RalfJung
Copy link
Contributor Author

RalfJung commented Nov 27, 2023

Yes, the semantics of the builtin are documented. But that's guarantees LLVM makes to its users: if you call the builtin that way, all is good.

The fact that this becomes a libcall to some libc symbol means that LLVM also makes similar assumptions, where it is then someone else's responsibility to really ensure that memcpy (the library function, not the builtin) behaves that way. For clang there is an old WIP patch to document this at https://reviews.llvm.org/D86993, but I am not aware of anything for LLVM itself.

(For Rust, what I am really hoping for here is that we can get a guarantee that if the LLVM IR only calls the builtin in a way that the argument ranges are disjoint, then LLVM will also only ever emit libcalls to memcpy with disjoint arguments. That is, LLVM will never itself do any transformation that would rely on the fact that the memcpy builtin is well-defined for src==dst. That would be great news for Rust since then Rust could avoid making the src==dst assumption about memcpy -- an assumption that is problematic as demonstrated by the restrict attribute issue for musl and llvm-libc.)

@RalfJung
Copy link
Contributor Author

Cc @nikic who's been involved in a bunch of these discussions.

@nikic
Copy link
Contributor

nikic commented Nov 27, 2023

Just so we're clear, when Ralf says "incompatible" he means "theoretically incompatible". There is no practical issue here, because this usage of restrict will not lead to an actual miscompile even if both pointers are equal (only if there is non-exact overlap, which is excluded).

There is no action that needs to be taken on the part of LLVM libc. If any action is taken, it will be on the side of LLVM.

@SchrodingerZhu
Copy link
Contributor

If llvm.memcpy does have a looser requirement, isn't it the case that LLVM should emit the check when lowering the code? (Either dynamically or statically if the libc target is known)

@RalfJung
Copy link
Contributor Author

RalfJung commented Nov 28, 2023

It is theoretical in the sense that I am not aware of a compiler actually compiling memcpy in a way that this would be an issue. But I also have not checked the assembly.

Re-reading the definitions of restrict, I think this depends on whether "memory locations that are modified, by any means, during the execution of the function" includes memory locations that have their original value written back into them. Is doing a write always a modification, or is it only a modification if the write changes the value stored there?

EDIT: it was pointed out to me that the standard also contains this non-normative note:

"Modify" includes the case where the new value being stored is the same as the previous value.

So probably this is indeed intended to be UB.

@gchatelet
Copy link
Contributor

I was reading more of the linked bugs and this GCC comment caught my attention. We don't currently use such strategies and I don't see it happening in the future but there's probably some hypothetical implementation of memcpy that will break when src==dst. That being said, by C standard definition "If copying takes place between objects that overlap, the behavior is undefined" : so I agree with @nikic #73516 (comment) that if something is to be done this would be on LLVM side (e.g., guard lowering to libc memcpy call with a check that src!=dst).

@gchatelet gchatelet removed the bug Indicates an unexpected problem or unintended behavior label Nov 28, 2023
@RalfJung
Copy link
Contributor Author

I was reading more of the linked bugs and this GCC comment caught my attention.

Ah, that's an interesting one. And with restrict compilers would even be allowed to introduce such "prefetch with write intent" as part of the optimizations.

@gchatelet
Copy link
Contributor

@RalfJung how do you want to move forward with this issue? It seems to me that this would be an issue for llvm itself. Maybe just documentation improvement?

@RalfJung
Copy link
Contributor Author

RalfJung commented Nov 30, 2023

Right, I think there's two issues here:

  1. LLVM's assumptions should be documented somewhere in the official docs, someplace that has a URL one can reference.
  2. LLVM's assumptions should be re-evaluated given that de jure, both llvm-libc and musl (when interpreted according to the C standard, and even when building with clang and interpreting the generated LLVM IR according to the LLVM LangRef) do not actually satisfy the assumptions. To justify the status quo, one has to do reasoning based on "but compilers don't actually do such optimizations even though they are allowed to", which is a fragile argument to make.

An alternative to (2) would have been to remove restrict from llvm-libc, but it seems the general consensus is that that is not the way to go.

@michaelrj-google
Copy link
Contributor

If you want a copy that's safe even if the ranges overlap then you want memmove.
From the man page:

DESCRIPTION
       The memmove() function copies n bytes from memory area src to memory area dest.  The memory areas may overlap:
       copying  takes  place as though the bytes in src are first copied into a temporary array that does not overlap
       src or dest, and the bytes are then copied from the temporary array to dest.

@gchatelet
Copy link
Contributor

memmove has a cost though, it cannot be as fast as memcpy. IMHO, in order to have correct and efficient code we need to guard memcpy calls with a pointer inequality check.

@jyknight
Copy link
Member

jyknight commented Dec 8, 2023

LLVM requires that memcpy of the same address work, because it uses memcpy for struct copies, e.g. struct X { char n[1000]; }; void foo(struct X*a, struct X*b) { *a = *b; }, and this must still work if a and b are equal. GCC does the same, BTW.

I'm certain we don't want to add a pointer inequality check to the struct copy codegen, nor switch it to call memmove.

@RalfJung
Copy link
Contributor Author

RalfJung commented Dec 8, 2023

I'm certain we don't want to add a pointer inequality check to the struct copy codegen, nor switch it to call memmove.

The only alternative is to have a pointer inequality check in memcpy, or remove the restrict from memcpy (and document in the libc that exact overlap is allowed). Are you certain that's better? To me this sounds like a case that needs actual data rather than speculation.

GCC does the same, BTW.

I am aware: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667, https://sourceware.org/bugzilla/show_bug.cgi?id=31055

In those discussions, the proposal came up of having a __memcpy_lax that explicitly allows exact overlap, to support this common need of a C compiler without compromising the performance of code that uses memcpy according to its documented contract.


Also, what you are describing is a concern of clang, not LLVM. LLVM is not concerned with C semantics, only with LLVM IR semantics. In Rust we'd ideally like to use LLVM without making this assumption about memcpy. We are taking care to call the LLVM memcpy builtin only with strictly non-overlapping pointers. It would be great to get a guarantee that LLVM will then also never perform any memcpy libcalls where the pointers overlap. In other words, even though the LLVM builtin says that equal pointers are allowed, it would be great to get a guarantee that LLVM transformations will never exploit this and introduce new memcpy with equal pointers. Therefore, all libcalls to memcpy with equal pointers correspond to the frontend generating a builtin memcpy with equal pointers, and if the frontend generates no such call, one can use LLVM with a memcpy implementation that does not allow equal pointers.

Basically, frontends should be able to choose whether they require a non-standard memcpy that allows exact overlap or not. clang is obviously free to make its own decisions, but in Rust I'd rather not rely on "yeah it's UB but compilers don't optimize restrict like that".

@jyknight
Copy link
Member

jyknight commented Dec 8, 2023

The only alternative is to have a pointer inequality check in memcpy, or remove the restrict from memcpy (and document in the libc that exact overlap is allowed)

Those aren't the only available alternatives. The assembly code generated by today's compilers already works with exact overlap, with restrict, and without an extra branch.

I don't know if removing the restrict qualifier affects the performance of the generated code. If it doesn't, simply remove it to make the code semantically correct. But if it does: memcpy is performance-critical enough that it's worth it to find a different solution which preserves the current codegen -- without being theoretically invalid.

One option there would be to document an extended guarantee of "restrict" as not being invalid on exact overlap.

@jyknight
Copy link
Member

jyknight commented Dec 8, 2023

We are taking care to call the LLVM memcpy builtin only with strictly non-overlapping pointers.

Does that automatically fall out of Rust's model, or do you explicitly take extra care? (That is: could it improve codegen for Rust to stop taking such care, given the guarantee that memcpy does work with equal pointers?)

@RalfJung
Copy link
Contributor Author

RalfJung commented Dec 15, 2023

The assembly code generated by today's compilers already works with exact overlap, with restrict, and without an extra branch.

I agree with this factual statement.

However, which compilers guarantee this to be the case? clang doesn't, as far as I can tell, given the LLVM IR semantics. So in the status quo we now have to audit the generated assembly after each compiler update, or something like that? Certainly not a great solution.

One option there would be to document an extended guarantee of "restrict" as not being invalid on exact overlap.

restrict is defined in terms of the memory being accessed through this pointer vs other pointers. It applies certain restrictions on each memory access. I don't see any way to incorporate a concept of "exact overlap" into that definition.

The only way out I can see for restrict is to to say that when a write writes the exact same value that is already stored at that location in memory, then it doesn't count as "mutation". In that case memcpy on exact overlap would not perform any mutation and therefore restrict doesn't impose any restrictions. However, defining "the exact same value" is tricky (e.g., do we mean the same abstract value -- at which type even? -- or the same underlying representation). I don't know if all compilers would be happy to commit so such a changed restrict. If compilers want to use the presence of "write to restrict-qualified pointer" to do things like "tell the CPU that certain memory can be discarded since it will be overwritten soon", then that would no longer be possible. (Those are ofc exactly the kind of optimizations that are also incompatible with the exact-overlap assumption.)

Does that automatically fall out of Rust's model, or do you explicitly take extra care? (That is: could it improve codegen for Rust to stop taking such care, given the guarantee that memcpy does work with equal pointers?)

This automatically falls out of Rust's model, specifically the disjointness information in Rust's &mut type. I'm not aware of a case where Rust would benefit from allowing exact overlap. The Rust intrinsic that corresponds to the LLVM memcpy builtin is called copy_nonoverlapping and it does require proper disjointness. Neither the Rust compiler nor the Rust standard library have a situation where the copy is on two regions that are "either disjoint or equal". We always either know it is disjoint (so we use copy_nonoverlapping aka memcpy), or we know nothing (so we use copy aka memmove).

For the specific situation where this happens in C, *to = *from, in Rust if these are raw pointers then we do allow partial overlap so we have to use memmove anyway. This is a rare situation; almost all code uses references and then to must be a mutable reference so we know it cannot alias from and we can use memcpy and we can prove full disjointness.

@frederick-vs-ja
Copy link
Contributor

One option there would be to document an extended guarantee of "restrict" as not being invalid on exact overlap.

No, this definitely defeats the intent of restrict.

The only way out I can see for restrict is to to say that when a write writes the exact same value that is already stored at that location in memory, then it doesn't count as "mutation". In that case memcpy on exact overlap would not perform any mutation and therefore restrict doesn't impose any restrictions.

Hmm... it seems that we can say "An implementation may assume reading via expression E based on a restricted pointer p obtains the same value as (...), and the behavior is undefined if such assumption is violated." (which is not perfect, see below).

However, defining "the exact same value" is tricky (e.g., do we mean the same abstract value -- at which type even? -- or the same underlying representation).

Note that the formal defintion (located at, e.g., WG14 N3096 6.7.3.1) uses "expression" that should have a well-defined type, so it shouldn't be hard to specify this.

The major problem I see is that just modifying restrict seemly inevitably weakens alias analyzation. E.g. in this example

void fun(int * restrict p, int * restrict q)
{
    *p = *q;
    *q = *p;
}

IIUC an implementation can assume that p != q per current rules. But it seems that any method that makes memcpy(p, p, n) well-defined by modifying restrict would defeat such alias analyzation.

So I guess a possible option would be weakening the preconditions of memcpy which needs to drop restrict (while a weaker variant of restrict can still be used).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants