Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libc][setjmp] fix setjmp test #87837

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

nickdesaulniers
Copy link
Member

This would consistently fail for me locally, to the point where I could not run
ninja libc-unit-tests without ninja libc_setjmp_unittests failing.

Turns out that since I enabled -ftrivial-auto-var-init=pattern in
commit 1d5c16d ("[libc] default enable -ftrivial-auto-var-init=pattern (#78776)")
this has been a problem. Our x86_64 setjmp definition disabled -Wuninitialized,
so we wound up clobbering these registers and instead backing up
0xAAAAAAAAAAAAAAAA rather than the actual register value.

This would consistently fail for me locally, to the point where I could not run
`ninja libc-unit-tests` without `ninja libc_setjmp_unittests` failing.

Turns out that since I enabled -ftrivial-auto-var-init=pattern in
commit 1d5c16d ("[libc] default enable -ftrivial-auto-var-init=pattern (llvm#78776)")
this has been a problem. Our x86_64 setjmp definition disabled -Wuninitialized,
so we wound up clobbering these registers and instead backing up
0xAAAAAAAAAAAAAAAA rather than the actual register value.
@llvmbot
Copy link
Collaborator

llvmbot commented Apr 5, 2024

@llvm/pr-subscribers-libc

Author: Nick Desaulniers (nickdesaulniers)

Changes

This would consistently fail for me locally, to the point where I could not run
ninja libc-unit-tests without ninja libc_setjmp_unittests failing.

Turns out that since I enabled -ftrivial-auto-var-init=pattern in
commit 1d5c16d ("[libc] default enable -ftrivial-auto-var-init=pattern (#78776)")
this has been a problem. Our x86_64 setjmp definition disabled -Wuninitialized,
so we wound up clobbering these registers and instead backing up
0xAAAAAAAAAAAAAAAA rather than the actual register value.


Full diff: https://github.com/llvm/llvm-project/pull/87837.diff

1 Files Affected:

  • (modified) libc/src/setjmp/x86_64/setjmp.cpp (+7-16)
diff --git a/libc/src/setjmp/x86_64/setjmp.cpp b/libc/src/setjmp/x86_64/setjmp.cpp
index 8b6981d4cc34a2..bbc08d42bfd617 100644
--- a/libc/src/setjmp/x86_64/setjmp.cpp
+++ b/libc/src/setjmp/x86_64/setjmp.cpp
@@ -16,22 +16,13 @@
 namespace LIBC_NAMESPACE {
 
 LLVM_LIBC_FUNCTION(int, setjmp, (__jmp_buf * buf)) {
-  register __UINT64_TYPE__ rbx __asm__("rbx");
-  register __UINT64_TYPE__ r12 __asm__("r12");
-  register __UINT64_TYPE__ r13 __asm__("r13");
-  register __UINT64_TYPE__ r14 __asm__("r14");
-  register __UINT64_TYPE__ r15 __asm__("r15");
-
-  // We want to store the register values as is. So, we will suppress the
-  // compiler warnings about the uninitialized variables declared above.
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Wuninitialized"
-  LIBC_INLINE_ASM("mov %1, %0\n\t" : "=m"(buf->rbx) : "r"(rbx) :);
-  LIBC_INLINE_ASM("mov %1, %0\n\t" : "=m"(buf->r12) : "r"(r12) :);
-  LIBC_INLINE_ASM("mov %1, %0\n\t" : "=m"(buf->r13) : "r"(r13) :);
-  LIBC_INLINE_ASM("mov %1, %0\n\t" : "=m"(buf->r14) : "r"(r14) :);
-  LIBC_INLINE_ASM("mov %1, %0\n\t" : "=m"(buf->r15) : "r"(r15) :);
-#pragma GCC diagnostic pop
+  asm("mov %%rbx, %[rbx]\n\t"
+      "mov %%r12, %[r12]\n\t"
+      "mov %%r13, %[r13]\n\t"
+      "mov %%r14, %[r14]\n\t"
+      "mov %%r15, %[r15]"
+      : [rbx] "=m"(buf->rbx), [r12] "=m"(buf->r12), [r13] "=m"(buf->r13),
+        [r14] "=m"(buf->r14), [r15] "=m"(buf->r15));
 
   // We want the rbp of the caller, which is what __builtin_frame_address(1)
   // should return. But, compilers generate a warning that calling

Copy link
Contributor

@gchatelet gchatelet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx for cleaning this up and fixing the failing test.

LIBC_INLINE_ASM("mov %1, %0\n\t" : "=m"(buf->r14) : "r"(r14) :);
LIBC_INLINE_ASM("mov %1, %0\n\t" : "=m"(buf->r15) : "r"(r15) :);
#pragma GCC diagnostic pop
asm("mov %%rbx, %[rbx]\n\t"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be asm volatile?

Copy link
Member Author

@nickdesaulniers nickdesaulniers Apr 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No; volatile just keeps the inline asm blob around in case the outputs are DCE'd. In this case, because we write through the function parameter, we can't DCE the inline asm blob. You can verify this by compiling this change and disassembling this function and observing that the inline asm is not discarded.

Theoretically, if were to inline setjmp (perhaps due to LTO) and someone accidentally called setjmp but then did not use the jmp_buf parameter ever then we could DCE this inline asm blob. Which we would want.

But because this inline asm would not be safe to inline under LTO, and because this function also uses dangerous constructs like __builtin_frame_address, we should add [[noinline]] to this function definition.

@jyknight
Copy link
Member

jyknight commented Apr 8, 2024

As I've said before, https://discourse.llvm.org/t/hand-written-in-assembly-in-libc-setjmp-longjmp/73249/13 the setjmp function cannot be implemented in C.

This change addresses ONE way in which things can go wrong by trying to have a C implementation, but does not address the underlying problem: it shouldn't be in C in the first place.

@nickdesaulniers
Copy link
Member Author

nickdesaulniers commented Apr 8, 2024

As I've said before, https://discourse.llvm.org/t/hand-written-in-assembly-in-libc-setjmp-longjmp/73249/13 the setjmp function cannot be implemented in C.

Careful James, only a Sith deals in absolutes.

With my change, the disassembly of setjmp is near ideal (modulo one extra push/pop of the frame pointer, which I think is because we recently reintroduced frame pointers but could probably be fixed by building this TU with -fomit-frame-pointers).

This change addresses ONE way in which things can go wrong by trying to have a C implementation, but does not address the underlying problem: it shouldn't be in C in the first place.

Ah! It seems that I'm late to the party, and this has already been discussed elsewhere. (Replying to points from that thread, though perhaps I should instead be making those points on that thread)

Adding any .s files probably requires significant infrastructure which currently doesn’t exist in libc

I don't think so. But I think the compiler provides significant controls that can be used before resorting to out of line assembly, and not all of those controls have been discussed or explored yet. More so opening the door for out of line asm is something we'd like to avoid as then it can be used in future arguments for "here's memcpy in out of line asm; look, you use out of line asm here and here and here so I should be able to use it, too."

I wanted to double-check my understanding of this policy and its aims, as to me it seems like writing out the desired code in a function with the naked attribute would be a better solution

You cannot refer to function parameters of naked functions: https://godbolt.org/z/b963seKxd

I guess that's why @efriedma-quic was using offsetof: https://discourse.llvm.org/t/hand-written-in-assembly-in-libc-setjmp-longjmp/73249/6?u=nickdesaulniers

@jrtc27 's point about aarch64 GCC appears still valid. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77882 It wont matter for this function in this PR; but I should revisit the other implementations of setjmp/longjmp beyond just x86.

You also can't split a function into sub-functions which are naked then force them to inline: https://godbolt.org/z/rxzszr1PK (thank god, that would have been a war crime; I agree with points in that discord that wrapper functions here would be problematic).

But modulo that one extra push/pop (which I think is fixed with -fomit-frame-pointer), I don't see what benefit naked would provide us here; we already get optimal assembly. (Just tested -fomit-frame-pointer and -momit-leaf-frame-pointer which did not omit the push/pop from the prolog/epilog, so perhaps then naked fn attrs with @efriedma-quic 's pattern is the final optimization we can do here).

And to a point made here:

it seems pretty fragile that you need to compile this with very specific optimization flags in order for it to work

As long as llvm-project requires the usage of cmake, we have tight controls of the compiler flags. Downstreams that don't use cmake will need to match compiler flags when certain TUs use specific compiler flags in controlled and well understood ways to control code gen. Having code be correct under any possible combination of compiler flags is something we will not support.

These are so magic, that basically any instrumentation that the compiler might add, like sanitizers, profiling, etc, will break them.

That is a valid point, and I agree 100% with you. But that doesn't preclude the usage of C here.

We solved that in the Linux kernel with a small handful of function attributes:

At the very least, seeing __builtin_frame_address without either always_inline or noinline fn attrs is immediately a red flag. While that might not matter for individual TUs, we 100% want to support LTO'ing the libc. So adding noinline here seems like the most basic addition, but we'll probably want to add MORE function attributes to this implementation.

That's orthogonal to fixing this test though, since the failure is well understood.

@efriedma-quic
Copy link
Collaborator

As noted on the Discourse thread, the issue isn't what the compiler generates right now; it's what the compiler promises to generate. For setjmp() in particular, no matter how you write the code, there's no way reliably get the value of RBX on entry to the function in a non-naked function using inline asm.

I guess there's nothing stopping us from adding a clang intrinsic to return the value of rbx on entry, but it would only be useful for writing setjmp() implementations. Just marking the function naked seems easier.

You cannot refer to function parameters of naked functions: https://godbolt.org/z/b963seKxd . I guess that's why @efriedma-quic was using offsetof

Right.

@efriedma-quic
Copy link
Collaborator

As long as llvm-project requires the usage of cmake, we have tight controls of the compiler flags. Downstreams that don't use cmake will need to match compiler flags when certain TUs use specific compiler flags in controlled and well understood ways to control code gen. Having code be correct under any possible combination of compiler flags is something we will not support.

This is true to some extent, but depending on the exact form of the function prologue is excessively fragile.

@nickdesaulniers
Copy link
Member Author

nickdesaulniers commented Apr 8, 2024

I've implemented setjmp here using a naked fn attr just to get a feel for it: #88054

I think it would be easier if clang had __builtin_stack_address. #82632 is tagged a good beginner bugs, but unless we already have a corresponding intrinsic in IR, I don't think it's a good beginner bug.

@jyknight
Copy link
Member

jyknight commented Apr 8, 2024

Implementing the set of functions I mentioned in the discourse-thread as pure-asm is required for correctness -- that can either be via a "naked" function or in a separate asm file. There is really no other way to guarantee the proper behavior, vs just "happens to work today".

More so opening the door for out of line asm is something we'd like to avoid as then it can be used in future arguments for "here's memcpy in out of line asm; look, you use out of line asm here and here and here so I should be able to use it, too."

I don't think that needs to be a concern. Document why this small number of functions must be implemented fully in asm (that it's not about performance!), and that solves that problem.

I've implemented setjmp here using a naked fn attr just to get a feel for it

That is a definite improvement. Using a naked function could be theoretically correct. However, this correctness applies ONLY if your code has a single asm statement. That significantly limits the usefulness of being in a C file.

Compared to a standalone-asm file, a naked function has other downsides:

  1. There is a risk of a developer inadvertently putting non-asm code into the function, thus breaking the correctness (e.g. the __builtin_return_address call in the draft PR, or a theoretical call to a proposed __builtin_stack_address, or something else)
  2. Compilers often fail to implement the attribute correctly, because it's such a rarely-used feature. GCC only supports it on a subset of platforms.

The primary advantage I see of using inline-asm is the ability to use offsetof to get the offsets of jmp_buf members. If you instead write the function in a pure asm file, you'll need to get the applicable type-offsets into asm-usable #define statements. But, IMO that's not a huge problem (for these specific functions), because they are so extremely platform-specific already. It's not a big lift to write such offset defines manually, and then assert correspondence (that the #define values match the actual struct offsets) via static_assert on the C side.

...actually, I now note that even the offsetof constant operands of the inline-asm is something GCC's docs explicitly say you cannot validly do in a naked function. I'm not sure if there's a good reason for that or not. But maybe that eliminates that advantage, too...

@nickdesaulniers
Copy link
Member Author

Document why this small number of functions must be implemented fully in asm (that it's not about performance!), and that solves that problem.

I agree and I think having the project's stance on out of line asm written out and agreed to in advance would make these discussions easier and blameless in the future. This should be link-able from https://libc.llvm.org/dev/code_style.html.

I've implemented setjmp here using a naked fn attr just to get a feel for it

That is a definite improvement. Using a naked function could be theoretically correct. However, this correctness applies ONLY if your code has a single asm statement. That significantly limits the usefulness of being in a C file.

Perhaps having the compiler generate debug info is a small consolation, but I agree the points in favor of maintaining these implementations are dwindling.

Compared to a standalone-asm file, a naked function has other downsides:

  1. There is a risk of a developer inadvertently putting non-asm code into the function, thus breaking the correctness (e.g. the __builtin_return_address call in the draft PR, or a theoretical call to a proposed __builtin_stack_address, or something else)
  2. Compilers often fail to implement the attribute correctly, because it's such a rarely-used feature. GCC only supports it on a subset of platforms.

To me, 2 is the larger issue. We'll hit that when we revisit setjmp/longjmp for aarch64. While we could drop support when using GCC as the host compiler for building those two functions for that one target, I'd rather not.

The primary advantage I see of using inline-asm is the ability to use offsetof to get the offsets of jmp_buf members. If you instead write the function in a pure asm file, you'll need to get the applicable type-offsets into asm-usable #define statements. But, IMO that's not a huge problem (for these specific functions), because they are so extremely platform-specific already. It's not a big lift to write such offset defines manually, and then assert correspondence (that the #define values match the actual struct offsets) via static_assert on the C side.

I'll give it a shot in a new PR tomorrow. That should give us a few options where we can better quantify the tradeoffs.

nickdesaulniers added a commit to nickdesaulniers/llvm-project that referenced this pull request Apr 9, 2024
This fixes libc_setjmp_unittests for me.

This would consistently fail for me locally, to the point where I could not run
ninja libc-unit-tests without ninja libc_setjmp_unittests failing.

Turns out that since I enabled -ftrivial-auto-var-init=pattern in
commit 1d5c16d ("[libc] default enable -ftrivial-auto-var-init=pattern (llvm#78776)")
this has been a problem. Our x86_64 setjmp definition disabled -Wuninitialized,
so we wound up clobbering these registers and instead backing up
0xAAAAAAAAAAAAAAAA rather than the actual register value.

Adding out of line asm to llvm-libc is opening pandora's box. This should not
be merged without discussion and buy in from maintainers. We should have a
policy in place for _when_ it's acceptable to use out of line asm or not.

Link: llvm#87837
Link: llvm#88054
Link: https://discourse.llvm.org/t/hand-written-in-assembly-in-libc-setjmp-longjmp/73249
nickdesaulniers added a commit to nickdesaulniers/llvm-project that referenced this pull request Apr 9, 2024
It would be helpful in future code reviews to document a policy with regards to
where and when Assembler sources are appropriate. That way when reviewers point
out infractions, they can point to this written policy, which may help
contributors understand that it's not the solely personal preferences of
reviewers but rather a previously agreed upon rule by maintainers.

Link: llvm#87837
Link: llvm#88157
Link: https://discourse.llvm.org/t/hand-written-in-assembly-in-libc-setjmp-longjmp/73249/12
@nickdesaulniers
Copy link
Member Author

FWIW: here's a quick reimplementation in out of line assembler: #88157

That MUST NOT land without updating our coding style to reflect the policy around usage of out of line assembler.

So three different contrasting fixes:

  1. [libc][setjmp] fix setjmp test #87837 (this PR): play whack-a-mole with disabling compiler instrumentation. We can keep going and add function attributes to disable inlining (for LTO), sanitizers, and coverage.
  2. [libc][setjmp] fix setjmp test via naked fn attr #88054 : naked fn + 100% inline asm. I kind of like this because we can lean on the register allocator, and get debug info. But this approach will not work for aarch64 with gcc.
  3. [libc][setjmp][x86] implement setjmp in terms of out of line asm #88157 : 100% out of line asm. A bit more brittle than [libc][setjmp] fix setjmp test via naked fn attr #88054 (not super happy with the offsets).

Either of the asm approaches will not be fun to extend in the future, when we'll want to do something fancier in our implementation. IIUC, bionic seems to store a cookie and mangle registers to harden the impl.

I've also written up a draft of a policy for assembler sources here which will be useful no matter which way we go, but especially useful if we go with 3.

@nickdesaulniers nickdesaulniers marked this pull request as draft April 9, 2024 20:10
nickdesaulniers added a commit to nickdesaulniers/llvm-project that referenced this pull request Apr 22, 2024
It would be helpful in future code reviews to document a policy with regards to
where and when Assembler sources are appropriate. That way when reviewers point
out infractions, they can point to this written policy, which may help
contributors understand that it's not the solely personal preferences of
reviewers but rather a previously agreed upon rule by maintainers.

Link: llvm#87837
Link: llvm#88157
Link: https://discourse.llvm.org/t/hand-written-in-assembly-in-libc-setjmp-longjmp/73249/12
nickdesaulniers added a commit that referenced this pull request Apr 22, 2024
It would be helpful in future code reviews to document a policy with
regards to
where and when Assembler sources are appropriate. That way when
reviewers point
out infractions, they can point to this written policy, which may help
contributors understand that it's not solely the personal preferences of
individual reviewers but instead rather a previously agreed upon rule by
maintainers.

Link: #87837
Link: #88157
Link:
https://discourse.llvm.org/t/hand-written-in-assembly-in-libc-setjmp-longjmp/73249/12
nickdesaulniers added a commit that referenced this pull request May 20, 2024
This would consistently fail for me locally, to the point where I could not run
ninja libc-unit-tests without ninja libc_setjmp_unittests failing.

Turns out that since I enabled -ftrivial-auto-var-init=pattern in
commit 1d5c16d ("[libc] default enable -ftrivial-auto-var-init=pattern (#78776)")
this has been a problem. Our x86_64 setjmp definition disabled -Wuninitialized,
so we wound up clobbering these registers and instead backing up
0xAAAAAAAAAAAAAAAA rather than the actual register value.

The implemenation should be rewritten entirely. I've proposed three different
ways to do so (linked below). Until we decide which way to go, at least disable
this hardening feature for this function for now so that the unit tests go back
to green.

Link: #87837
Link: #88054
Link: #88157
Fixes: #91164
ilovepi added a commit that referenced this pull request Jul 23, 2024
We want to avoid any possibility that the compiler will insert a
prologue/epilogue violating the calling contracts for these special
functions, potentially clobbering registers that must be preserved. To
do that they should be marked naked, as is already the case on ARM.
See #87837 for further context.
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
Summary:
We want to avoid any possibility that the compiler will insert a
prologue/epilogue violating the calling contracts for these special
functions, potentially clobbering registers that must be preserved. To
do that they should be marked naked, as is already the case on ARM.
See #87837 for further context.

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60251407
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants