Please merge r331925 and r331928 into 6.0.1 #36860
Comments
assigned to @tstellar |
This adds a new feature to Clang, which I'm not certain is appropriate for a bugfix release like this, especially given that the feature has been in the product for less than two weeks. Granted, it's not a particularly complex new feature. Can you give more background on why this is needed? |
This is required to fix a Linux kernel crash reported here: https://lkml.org/lkml/2018/5/7/534 We also reproduced this issue on Android and Chrome OS kernels. |
I'll leave it to Tom to decide whether to move this over or not. The changes in the patch are pretty direct and obvious (most of the real work for this happens in the backend), but at the same time, it is a new Clang feature that hasn't had time to get much feedback from the community. I'm not opposed to moving the patch on technical grounds, but I do worry about having new features appear in bugfix releases. |
I can appreciate not wanting features added to bugfix releases. At the same time, the Linux kernel community really appreciates the responsiveness with which the LLVM community can fix reported issues. Getting this fix into users' hands sooner helps the Linux community continue adding Clang support. |
Nick - is this a regression from a previous release of Clang? |
Richard, what do you think about this? |
I don't think so, but I've asked the original reporter to clarify. But here's a link from today where an arm64 kernel maintainer asked what released version of Clang folks should be using to test patches. Linux kernel maintainers frown upon requests to build their compiler from source to get the latest fixes. Backporting these fixes allows us to iterate faster, rather than waiting for the next major compiler release, where we get to "try again", which is exhausting. |
I don't think there's any technical reason we couldn't take this patch. [Adding an attribute breaks binary compatibility for our serialized AST files. We had intended to retain binary compatibility for those files across patch releases, but it looks like we never actually removed the repository version check for releases, so patch releases end up being spuriously considered incompatible.] So it's a question of policy: do we firmly say "no new features in patch releases", or is there a bar for a feature being so important that we're prepared to ship it in a patch release? And if so, does this patch meet that bar? I can see valid arguments in both directions. |
Please, I beg the release managers to reconsider. The Linux kernel is the largest open-source C codebase, and being able to compile a working Linux kernel with Clang is a feather in Clang's cap. Every regression/bug in Clang impedes the work of folks trying to get the Linux kernel to be compatible with Clang. We need a way to get fixes into developers' hands faster than every 6 months; otherwise, we can't send patches to kernel maintainers, because they're unwilling to compile their compiler from source. Meanwhile, 6 months go by with more regressions or incompatibilities appearing in the kernel, because we can't get continuous integration up with Clang while we wait for a new release. Then the cycle repeats. It can feel Sisyphean. I'm tired of being told "come back in 6 months with a 'real compiler'" by kernel maintainers. We have a real compiler, and I think it's the best, but I need to convince a lot of other people. We need this as an olive branch. I appreciate the hard line of being able to simply say "no" and walk away, maintaining a form of purity in the release process. Just know that doing so impedes a major effort that takes significant coordination between two different large open source communities. |
Does gcc's stack protector code also clobber RCX or is this something that only happens with clang? |
Would it work to use the no_caller_saved_registers attribute? |
I've looked at this issue a little more closely, and to me it seems like __attribute__((no_stack_protector)) is just working around a bug in the kernel source. My understanding is that the problem here is that callers of native_save_fl() expect it to preserve more registers than the standard x86 calling convention requires, but this information is never communicated to the compiler. There is a macro in the kernel source, PV_CALLEE_SAVE_REGS_THUNK, that wraps a function with a thunk that preserves all of the caller's registers, but it is not used for native_save_fl(). Maybe this is for performance reasons? Using __attribute__((no_stack_protector)) doesn't actually fix the bug; it just prevents clang from emitting code that clobbers registers the caller didn't expect (in this case RCX). There is nothing to guarantee that some other change to clang won't emit code that clobbers RCX. I think a better fix would be to use PV_CALLEE_SAVE_REGS_THUNK, or maybe to try the attribute I mentioned in my previous comment; these seem like more robust solutions. |
Hi Tom, Thanks for the suggestions. I have asked Matthias, who was working on this crash, to look into them. Nevertheless, __nostackprotector is needed in many places in the Linux kernel, and having a clang equivalent is going to be helpful. |
Hi Tom, Here is the detailed reply from our kernel dev, Alistair Strachan, who debugged this issue on Android. I hope this clarifies the need for this patch. Verbatim msg:
GCC's stack guard code does normally clobber RCX; however, GCC does not emit a stack guard for functions that are marked "static inline", even if it is forced to create an out-of-line copy. The example here is something like:

    static inline unsigned long native_save_fl(void)

    static struct pv_irq_ops irq_ops = {

Clang will always emit a stack guard for out-of-lined functions, regardless of the "static inline" annotation, so it will always clobber RCX in this case.
Yes (a stack guard is still generated). The kernel already has an equivalent / more efficient macro called PV_CALLEE_SAVE_REGS_THUNK(). Neither is a particularly good solution because both have to save many more registers than are actually used by the function, so they end up generating a lot of additional code. As the pv_irq_ops are used in performance critical paths with assembler patches, neither would be an acceptable solution.
You could definitely argue that making assumptions about whether the compiler will inject stack guards is a bug in the kernel. However, GCC does not inject a stack guard here, and clang does, and that is the only reason this code does not work. The only way to get the necessary control otherwise would be to write all of these functions in assembler.
Sort of. It really just assumes %rcx won't be clobbered, because it's not expecting stack guard code to be generated, because GCC does not generate stack guard code.
To me, that's the definition of fixing this bug.
These functions are supposed to all be written as "asm volatile", so the kernel programmer normally knows exactly which registers are clobbered or not. However, because the compiler generates the stack guard code any time a stack is allocated by a C function, and this unconditionally clobbers %rcx, the programmer cannot control this clobber with clang.
See above. PV_CALLEE_SAVE_REGS_THUNK will generate code that is ~10x larger than the existing code. |
Posting another note from Alistair: This is based on the observed behavior of -fstack-protector and -fstack-protector-strong, which both apply a heuristic to decide whether to inject a stack guard. GCC will inject a stack guard if -fstack-protector-all is specified (which the kernel does not use). |
Can you give us an idea what the callee code looks like? Rather than working around this by trying to turn off the parts of the compiler that happen to use RCX, if you want to implement a custom calling convention, the right solution is probably to mark the function as __attribute__((naked)) and use inline assembly to implement the custom calling convention. However, there's too little information on this bug to be at all sure about that. |
Callee Code |
Hi, I was able to boot into a paravirtualized Linux v4.14.42 (LTS) kernel with (strong) stack-protector enabled on Debian/testing AMD64. I used LLVM/Clang version 7-svn332830 packages from <apt.llvm.org> repositories and have added Clang's "no_stack_protector" attribute to the Linux sources and marked native_save_fl() accordingly. For more details please see the thread "Clang patch stacks for LTS kernels (v4.4 and v4.9) and status update" on LKML. I would like to see the patches in this bug-report also for LLVM/Clang v6.0.1. Thanks to all involved people. Regards,
[1] https://marc.info/?t=150344380900002&r=1&w=2 |
Would it be possible to change the constraint in the inline assembly from =rm to =r? In my reduced test case that prevents clang from emitting the stack protector. |
https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#Simple-Constraints "'m': A memory operand is allowed, with any kind of address that the machine supports in general. Note that the letter used for the general memory constraint can be re-defined by a back end using the TARGET_MEM_CONSTRAINT macro." Does that mean that without specifying 'm', the operand can only be passed in a register?
From what I understand, if you just supply 'r', then the compiler will only be allowed to pass a register value into the inline assembly string. However, this is just a work-around to make the code work with clang 6.0.0. The kernel code is still wrong in my opinion, because it relies on implementation specific behavior to work correctly. Even gcc could decide to do something different and this code would break. I think Richard's suggestion in comment #17 is the best solution to actually fix this issue correctly without making performance worse. |
Note that Tom's suggestion in comment #22 is effectively reverting kernel commit ab94fcf528d127fcb490175512a8910f37e5b346:
I'm curious what [ Impact: performance ] means in detail. |
Sedat, can you please test reverting f1f029c7bfbf4ee1918b90a431ab823bed812504 rather than your commits: [PATCH 1/2] compiler-clang.h: Add __nostackprotector function attribute |
Starting upstream thread: https://lkml.org/lkml/2018/5/23/1003 |
I have reverted "x86: allow "=rm" in native_save_fl()" [1] and this boots a paravirtualized/strong-stackprotected Linux-kernel in QEMU. Linux-kernel: 4.14.43 [1] https://git.kernel.org/linus/ab94fcf528d127fcb490175512a8910f37e5b346 |
A test case showing gcc generates no different disassembly for "=r" vs "=rm". |
[RFC] kernel patch Please let me know if I'm missing anyone, or need to reword anything. |
[RFC] kernel patch v2 |
[RFC] kernel patch v3 Add testing links. |
Re: naked attribute in comment #17, seems to be a few issues with it: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85927 |
Looks like someone mentioned on the GCC Bug that the contents of the naked function must be purely inline asm. I expect the non-asm return statement could be causing problems. |
Is it safe (not UB) to have something like __attribute__((naked)) here? (I haven't done a ton of inline assembly programming; seeing a C function with a return type other than void and no return statement is kind of jarring.) I mean, it looks good in godbolt (gcc 8 still generates a ud2). |
Why is that safe? It doesn't have a return instruction, it would seem. I'd expect you to need a 'ret' in the inline asm, potentially? And that certainly seems to fix the problem in GCC's code gen (it has a ud2 to make sure it's a hard crash when you run off the end of a function without a ret (LLVM does something similar at -O0 for code like: "int f() { }")).
"good" except that it executes a ud2. That's what I mean - seems like a possible/reasonable interpretation that the inline asm for a naked function should include the ret instruction itself. |
Oops.
Did you verify this? Quick testing with godbolt shows that it now generates:

    0000000000000000 <save_flags4>:

which, uh, I guess works, but I wouldn't expect it to pass any form of code review. |
I'm guessing that's the intended use/output from GCC, but I don't know for sure. I assume it doesn't look at the assembly at all and puts the ud2 in there unconditionally as a backstop. That's potentially what LLVM does too: if it can't prove that the function returns on all paths, it puts a ud2 there. It's quite possible that the program logic does ensure the function always returns through other paths that can't readily be proved by LLVM. It doesn't seem unreasonable to treat the inline asm as opaque/unanalyzed, though. |
Given that this is a kernel bug and there are several possible fixes that can be applied to the kernel, I don't think this is enough to justify stretching the release rules and accepting a new feature like this. |
Tom, re comment #37: I agree. I appreciate and am blown away by your help; you've really gone above and beyond to help us here. I owe you one. Sedat, I've attached 3 patches (and a cover letter). Can you please help verify by testing them before I send them upstream? As soon as I get your green light, I'll add your Tested-by tag and send them upstream. I still want to revisit upstreaming your patch that adds no_stack_protector to compiler-clang.h, but I don't want to combine it with this patch set. |
Also, please let me know if I should reword anything or add anyone to the commit messages. |
[RFC] kernel patch v5 0000-cover-letter.patch |
[RFC] kernel patch v5 0003-x86-paravirt-make-native_save_fl-extern-inline.patch Fixed "hueristic" -> "heuristic"; changed Wording-mostly-by: tags to Debugged-by: tags. |
[RFC] kernel patch v6 0003-x86-paravirt-make-native_save_fl-extern-inline.patch |
Hi Nick, First, it's interesting to see what the root cause is: the linux-kbuild system not inheriting/modifying the x86 KBUILD_CFLAGS. I had to slightly modify 0001 and 0002 against Linux v4.14.43; the two other solutions were tested against the same version. Not sure if a "YES, it boots." in QEMU/KVM and on bare metal is sufficient for you? Do you need any logs and/or objdumps? Some comments on the patch series: The most interesting part for me is seeing the live collaboration between the LLVM/Clang and Linux kernel folks. [1] ClangBuiltLinux/linux#16 |
Yep, thanks for verifying.
No. I can build+objdump. The disassembly looks good, but a boot test makes me more confident.
Depends on the version of Clang. Most versions you'd use today define __STDC_VERSION__ as 201112L, so it looks like -std=c11 is the default.
Me too. ;) |
I have tested this new patchset against Linux v4.14.47 (with different llvmlinux-related extra patches I had in my testing before) and can boot into QEMU/KVM and on bare metal. |
Fixed in the upstream Linux kernel with commit torvalds@d03db2b
Extended Description
r331925 adds support for __attribute__((no_stack_protector)), needed for a Linux kernel crash issue.
r331928 is a fix for a test case that was missed in r331925.
https://reviews.llvm.org/rL331925
https://reviews.llvm.org/rL331928