Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Please merge r331925 and r331928 into 6.0.1 #36860

Closed
m-gupta opened this issue May 17, 2018 · 56 comments
Closed

Please merge r331925 and r331928 into 6.0.1 #36860

m-gupta opened this issue May 17, 2018 · 56 comments
Assignees
Labels
bugzilla Issues migrated from bugzilla wontfix Issue is real, but we can't or won't fix it. Not invalid

Comments

@m-gupta
Copy link
Contributor

m-gupta commented May 17, 2018

Bugzilla Link 37512
Resolution WONTFIX
Resolved on Jul 16, 2018 16:08
Version 6.0
OS Linux
Blocks #35997
CC @AaronBallman,@dwblaikie,@nickdesaulniers,@tstellar

Extended Description

r331925 adds support for __attribute((no_stack_protector)) needed for a Linux kernel crash issue.
r331928 is fix for a test case that was missed in r331925.

https://reviews.llvm.org/rL331925
https://reviews.llvm.org/rL331928

@m-gupta
Copy link
Contributor Author

m-gupta commented May 17, 2018

assigned to @tstellar

@AaronBallman
Copy link
Collaborator

This adds a new feature to Clang, which I'm not certain is appropriate for a bugfix release like this, especially given that the feature has only been in the product for less than two weeks. Granted, it's not a particularly complex new feature.

Can you give more background on why this is needed?

@m-gupta
Copy link
Contributor Author

m-gupta commented May 17, 2018

This is required to fix a Linux kernel crash reported here:

https://lkml.org/lkml/2018/5/7/534

We also reproduced this issue on Android and Chrome OS kernels.

@AaronBallman
Copy link
Collaborator

I'll leave it to Tom to decide whether to move this over or not. The changes in the patch are pretty direct and obvious (most of the real work for this happens in the backend), but at the same time, it is a new Clang feature that hasn't had time to get much feedback from the community. I'm not opposed to moving the patch on technical grounds, but I do worry about having new features appear in bugfix releases.

@nickdesaulniers
Copy link
Member

I can appreciate not wanting features added to bugfix releases. At the same time, the Linux kernel community really appreciates the responsiveness to which the llvm community can fix reported issues. Getting this fix into users hands sooner helps the linux community continue adding clang support.

@dwblaikie
Copy link
Collaborator

Nick - is this a regression from a previous release of Clang?

@tstellar
Copy link
Collaborator

Richard, what do you think about this?

@nickdesaulniers
Copy link
Member

Nick - is this a regression from a previous release of Clang?

I don't think so, but I've asked the original reporter to clarify:
https://lkml.org/lkml/2018/5/18/1191

But here's a link from today where an arm64 kernel maintainer asked what released version of clang folks should be using to test patches. Linux kernel maintainers frown upon requests to build their compiler from source to have the latest fixes. Backporting these fixes allows us to iterate faster, rather than wait for the next major compiler rev where we get to "try again" which is exhausting.

@nickdesaulniers
Copy link
Member

@zygoloid
Copy link
Mannequin

zygoloid mannequin commented May 18, 2018

I don't think there's any technical reason we couldn't take this patch. [Adding an attribute breaks binary compatibility for our serialized AST files. We had intended to retain binary compatibility for those files across patch releases, but it looks like we never actually removed the repository version check for releases, so patch releases end up being spuriously considered incompatible.]

So it's a question of policy: do we firmly say "no new features in patch releases", or is there a bar for a feature being so important that we're prepared to ship it in a patch release? And if so, does this patch meet that bar? I can see valid arguments in both directions.

@nickdesaulniers
Copy link
Member

Please, I beg the release managers to reconsider.

The Linux kernel is the largest open source source C codebase, and being able to compile a working Linux kernel with Clang is a feather in Clang's cap. Every regression/bug in Clang impedes the work of folks trying to get the Linux kernel to be compatible with Clang.

We need a way to get fixes into developers hands faster than 6 months.

Otherwise, we can't send patches to kernel maintainers because they're unwilling to compile their compiler from source. Meanwhile 6 months go by of more regressions or incompatibilities in the kernel because we cant get the continuous integration up with Clang because we have to wait 6 months for a new release. Then this cycle repeats. It can feel Sisyphean.

I'm tired of being told "come back in 6 months with a 'real compiler'" from kernel maintainers. We have a real compiler, and I think it's the best, but I need to convince a lot of other people.

We need this as an olive branch. I appreciate the hardline of being able to simply say "no" and walk away, maintaining a form of purity in release process. Just know that doing so impedes a major effort that takes significant coordination amongst two different large open source communities.

@tstellar
Copy link
Collaborator

Does gcc's stack protector code also clobber RCX or is this something that only happens with clang?

@tstellar
Copy link
Collaborator

Would it work to use the no_caller_saved_registers attribute?

https://clang.llvm.org/docs/AttributeReference.html#no-caller-saved-registers-gnu-no-caller-saved-registers

@tstellar
Copy link
Collaborator

I've looked at this issue a little more closely and to me it seems like __attribute((no_stack_protector)) is just working around a bug in the kernel source.

My understanding is that the problem here is callers of native_save_fl() expect it to preserve more registers than required by the standard x86 calling convention, however, nowhere is the information communicated to the compiler. There is a macro in the kernel source, PV_CALLEE_SAVE_REGS_THUNK, that will wrap the functions with a thunk that preserves all the caller's registers when calling a function, but this is not used for native_save_fl(). Maybe this is for performance reasons? Using __attribute((no_stack_protector)) doesn't actually fix the bug, it just prevents clang from emitting code that clobbers registers that the caller didn't expect(in this case RCX). There is nothing to guarantee that some other change to clang won't emit code that clobbers RCX.

I think a better fix would be to use PV_CALLEE_SAVE_REGS_THUNK or maybe try the attribute I mentioned in my previous comment, these seem like more robust solutions.

@m-gupta
Copy link
Contributor Author

m-gupta commented May 19, 2018

Hi Tom,

Thanks for the suggestions. I have asked Matthias, who was working on this crash to look into them.

Nevertheless, __nostackprotector is needed in many places in Linux kernel and having a clang equivalent is going to be helpful.

@m-gupta
Copy link
Contributor Author

m-gupta commented May 21, 2018

Hi Tom,

Here is the detailed reply from our kernel dev, Alistair Strachan, who debugged this issue on Android. I hope this clarifies the need for this patch.

Verbatim msg:

Does gcc's stack protector code also clobber RCX or is this something that only happens with clang?

GCC's stack guard code does normally clobber RCX, however GCC does not emit a stack guard for functions that are marked "static inline" even if it is forced to create an out of line copy. The example here is something like:

static inline unsigned long native_save_fl(void)
{
// something very trivial and controllable that we know won't use a bazillion registers
return x;
}

static struct pv_irq_ops irq_ops = {
.save_fl = native_save_fl; // compiler must now emit out of line copy
}

Clang will always emit a stack guard for out of lined functions, regardless of the "static inline" annotation, so it will always clobber rcx in this case.

Would it work to use the no_caller_saved_registers attribute?

Yes (a stack guard is still generated). The kernel already has an equivalent / more efficient macro called PV_CALLEE_SAVE_REGS_THUNK(). Neither is a particularly good solution because both have to save many more registers than are actually used by the function, so they end up generating a lot of additional code.

As the pv_irq_ops are used in performance critical paths with assembler patches, neither would be an acceptable solution.

I've looked at this issue a little more closely and to me it seems like __attribute((no_stack_protector)) is just working around a bug in the kernel source.

You could definitely argue that making assumptions about how the compiler will inject stack guards or not, is a bug in the kernel. However, GCC does not inject a stack guard, and clang does, and that is the only reason this code does not work. The only way to get the necessary control otherwise would be to write all of these functions in assembler.

My understanding is that the problem here is callers of native_save_fl() expect it to preserve more registers than required by the standard x86 calling convention, however, nowhere is the information communicated to the compiler.

Sort of. It really just assumes %rcx won't be clobbered, because it's not expecting stack guard code to be generated, because GCC does not generate stack guard code.

Using __attribute((no_stack_protector)) doesn't actually fix the bug, it just prevents clang from emitting code that clobbers registers that the caller didn't expect(in this case RCX).

To me, that's the definition of fixing this bug.

There is nothing to guarantee that some other change to clang won't emit code that clobbers RCX.

These functions are supposed to all be written as "asm volatile", so the kernel programmer normally knows exactly which registers are clobbered or not. However, because the compiler generates the stack guard code any time a stack is allocated by a C function, and this unconditionally clobbers %rcx, the programmer cannot control this clobber with clang.

I think a better fix would be to use PV_CALLEE_SAVE_REGS_THUNK or maybe try the attribute I mentioned in my previous comment, these seem like more robust solutions.

See above. PV_CALLEE_SAVE_REGS_THUNK will generate code that is ~10x larger than the existing code.

@m-gupta
Copy link
Contributor Author

m-gupta commented May 21, 2018

Posting another note from Alistair:

This is based on on the observed behavior of -fstack-protector and -fstack-protector-strong, which are both applying a heuristic to decide whether to inject a stack guard. GCC will inject a stack guard if -fstack-protector-all is specified (which the kernel does not use).

@zygoloid
Copy link
Mannequin

zygoloid mannequin commented May 21, 2018

Can you give us an idea what the callee code looks like? Rather than working around this by trying to turn off the parts of the compiler that happen to use rcx, if you want to implement a custom calling convention the right solution is probably to mark the function as attribute((naked)) and use inline assembly to implement the custom calling convention. However, there's too little information on this bug to be at all sure about that.

@tstellar
Copy link
Collaborator

Callee Code
Here are snippets of the callee code extracted into a single file.

@llvmbot
Copy link
Collaborator

llvmbot commented May 22, 2018

Hi,

I was able to boot into a paravirtualized Linux v4.14.42 (LTS) kernel with (strong) stack-protector enabled on Debian/testing AMD64. I used LLVM/Clang version 7-svn332830 packages from <apt.llvm.org> repositories and have added Clang's "no_stack_protector" attribute to the Linux sources and marked native_save_fl() accordingly.

For more details please see the thread "Clang patch stacks for LTS kernels (v4.4 and v4.9) and status update" on LKML.

I would like to see the patches in this bug-report also for LLVM/Clang v6.0.1.

Thanks to all involved people.

Regards,

  • Sedat -

[1] https://marc.info/?t=150344380900002&r=1&w=2
[2] https://marc.info/?l=linux-kernel&m=152699137621116&w=2
[3] https://marc.info/?l=linux-kernel&m=152699147621230&w=2

@tstellar
Copy link
Collaborator

Would it be possible to change the constraint in the inline assembly from =rm to =r? In my reduced test case that prevents clang from emitting the stack protector.

@nickdesaulniers
Copy link
Member

https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#Simple-Constraints

"'m': A memory operand is allowed, with any kind of address that the machine supports in general. Note that the letter used for the general memory constraint can be re-defined by a back end using the TARGET_MEM_CONSTRAINT macro."

Does that mean that without specifying m, memory operands are disallowed?

@tstellar
Copy link
Collaborator

https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#Simple-Constraints

"'m': A memory operand is allowed, with any kind of address that the machine
supports in general. Note that the letter used for the general memory
constraint can be re-defined by a back end using the TARGET_MEM_CONSTRAINT
macro."

Does that mean that without specifying m, memory operands are disallowed?

From what I understand, if you just supply 'r', then the compiler will only be allowed to pass a register value into the inline assembly string.

However, this is just a work-around to make the code work with clang 6.0.0. The kernel code is still wrong in my opinion, because it relies on implementation specific behavior to work correctly. Even gcc could decide to do something different and this code would break.

I think Richard's suggestion in comment #​17 is the best solution to actually fix this issue correctly without making performance worse.

@nickdesaulniers
Copy link
Member

Note that Tom's suggestion in commment #​22 is effectively reverting kernel commit :

commit ab94fcf528d127fcb490175512a8910f37e5b346
Author: H. Peter Anvin hpa@zytor.com
Date: Tue Aug 25 16:47:16 2009 -0700

x86: allow "=rm" in native_save_fl()

This is a partial revert of f1f029c7bfbf4ee1918b90a431ab823bed812504.

"=rm" is allowed in this context, because "pop" is explicitly defined
to adjust the stack pointer *before* it evaluates its effective
address, if it has one.  Thus, we do end up writing to the correct
address even if we use an on-stack memory argument.

The original reporter for f1f029c7bfbf4ee1918b90a431ab823bed812504 was
apparently using a broken x86 simulator.

[ Impact: performance ]

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Gabe Black <spamforgabe@umich.edu>

I'm curious what [ Impact: performance ] means in detail.

@nickdesaulniers
Copy link
Member

Sedat, can you please test reverting f1f029c7bfbf4ee1918b90a431ab823bed812504 rather than your commits:

[PATCH 1/2] compiler-clang.h: Add __nostackprotector function attribute
[PATCH 2/2] x86/paravirt: Mark native_save_fl() with __nostackprotector attribute

@nickdesaulniers
Copy link
Member

Starting upstream thread: https://lkml.org/lkml/2018/5/23/1003

@llvmbot
Copy link
Collaborator

llvmbot commented May 24, 2018

I have reverted "x86: allow "=rm" in native_save_fl()" [1] and this boots a paravirtualized/strong-stackprotected Linux-kernel in QEMU.

Linux-kernel: 4.14.43
LLVM/Clang: 7~svn332830
Kernel-configs: CONFIG_PARAVIRT=y and CONFIG_CC_STACKPROTECTOR_STRONG=y

[1] https://git.kernel.org/linus/ab94fcf528d127fcb490175512a8910f37e5b346

@nickdesaulniers
Copy link
Member

A test case showing gcc generates no different disassembly for "=r" vs "=rm".
Disassembly of combinations of CONFIG_STACK_PROTECTOR=*, gcc vs clang, and "=r" vs "=rm", showing gcc generates no different disassembly for "=r" vs "=rm".

@nickdesaulniers
Copy link
Member

[RFC] kernel patch
[RFC] kernel patch

Please let me know if I'm missing anyone, or need to reword anything.

@nickdesaulniers
Copy link
Member

[RFC] kernel patch v2
Added wording from Matthias Kaehlcke.

@nickdesaulniers
Copy link
Member

[RFC] kernel patch v3
Remove jlama from commit message, as per ClangBuiltLinux/linux#16 (comment)

Add testing links.

@nickdesaulniers
Copy link
Member

Re: naked attribute in comment #​17, seems to be a few issues with it:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85927
https://lkml.org/lkml/2018/5/25/636

@dwblaikie
Copy link
Collaborator

Re: naked attribute in comment #​17, seems to be a few issues with it:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85927
https://lkml.org/lkml/2018/5/25/636

Looks like someone mentioned on the GCC Bug that the contents of the naked function must be purely inline asm. I expect the non-asm return statement could be causing problems.

@nickdesaulniers
Copy link
Member

Is it safe (not UB) to have something like:

attribute((naked))
unsigned long save_flags4(void) {
asm volatile("pushf; pop %rax;");
}

(I haven't done a ton of inline assembly programming; seeing a C function with a return type other than void and no return statement is kind of jarring).

I mean, looks good in godbolt (gcc 8 still generate ud2 though):
https://godbolt.org/g/f2hLs6

@dwblaikie
Copy link
Collaborator

Is it safe (not UB) to have something like:

attribute((naked))
unsigned long save_flags4(void) {
asm volatile("pushf; pop %rax;");
}

Why is that safe? It doesn't have a return instruction, it would seem. I'd expect you to need a 'ret' in the inline asm, potentially? And that certainly seems to fix the problem in GCC's code gen (it has a ud2 to make sure it's a hard crash when you run off the end of a function without a ret (LLVM does something similar at -O0 for code like: "int f() { }")).

(I haven't done a ton of inline assembly programming; seeing a C function
with a return type other than void and no return statement is kind of
jarring).

I mean, looks good in godbolt (gcc 8 still generate ud2 though):
https://godbolt.org/g/f2hLs6

"good" except that it executes a ud2. That's what I mean - seems like a possible/reasonable interpretation that the inline asm for a naked function should include the ret instruction itself.

@nickdesaulniers
Copy link
Member

Why is that safe? It doesn't have a return instruction, it would seem.

Oops.

I'd
expect you to need a 'ret' in the inline asm, potentially? And that
certainly seems to fix the problem in GCC's code gen (it has a ud2 to make
sure it's a hard crash when you run off the end of a function without a ret
(LLVM does something similar at -O0 for code like: "int f() { }")).

Did you verify this? Quick testing with godbolt shows that it now generates:

0000000000000000 <save_flags4>:
0: 9c pushfq
1: 58 pop %rax
2: c3 retq
3: 0f 0b ud2

which, uh, I guess works, but I wouldn't expect it to pass any form of code review.

@dwblaikie
Copy link
Collaborator

Why is that safe? It doesn't have a return instruction, it would seem.

Oops.

I'd
expect you to need a 'ret' in the inline asm, potentially? And that
certainly seems to fix the problem in GCC's code gen (it has a ud2 to make
sure it's a hard crash when you run off the end of a function without a ret
(LLVM does something similar at -O0 for code like: "int f() { }")).

Did you verify this? Quick testing with godbolt shows that it now generates:

0000000000000000 <save_flags4>:
0: 9c pushfq
1: 58 pop %rax
2: c3 retq
3: 0f 0b ud2

which, uh, I guess works, but I wouldn't expect it to pass any form of code
review.

I'm guessing that's the intended use/output from GCC, but don't know for sure. I assume it doesn't look at the assembly at all, and puts the ud2 in there unconditionally as a backstop.

That's potentially what LLVM does - if it can't prove that the function returns on all paths, puts a ud2 there - but it's quite possible that the program logic does ensure the program always returns through other paths that can't be readily proved by LLVM. It doesn't seem unreasonable to treat the inline asm as opaque/unanalyzed, though.

@tstellar
Copy link
Collaborator

Given that this is a kernel bug and there are several possible fixes that can be applied to the kernel, I don't think this is enough to justify stretching the release rules and accepting a new feature like this.

@nickdesaulniers
Copy link
Member

@nickdesaulniers
Copy link
Member

@nickdesaulniers
Copy link
Member

Tom, re comment #​37, I agree. I appreciate and am blown away by your help; you've really gone above and beyond to help us here. I owe you one.

Sedat, I've attached 3 patches (and a cover letter). Can you please help verify by testing them before I send them upstream? As soon as I get your green light, I'll add your tested by and send them upstream.

I still want to revisit upstreaming your patch that adds no_stack_protector to compiler-clang.h, but I don't want to combine it with this patch set.

@nickdesaulniers
Copy link
Member

Also, please let me know if I should reword anything or add anyone to the commit messages.

@nickdesaulniers
Copy link
Member

[RFC] kernel patch v5 0000-cover-letter.patch
fixed typo: extremly -> extremely

@nickdesaulniers
Copy link
Member

[RFC] kernel patch v5 0003-x86-paravirt-make-native_save_fl-extern-inline.patch
typo:

hueristic -> heuristic

changed Wording-mostly-by: tags to Debugged-by: tags.

@nickdesaulniers
Copy link
Member

@llvmbot
Copy link
Collaborator

llvmbot commented Jun 1, 2018

Hi Nick,

1st, it's interesting to see what the root cause is - linux-kbuild system which is not inheriting/modifies x86/kbuild_cflags.

I had to slightly modify 0001 and 0002 against Linux v4.14.43 - the two other solutions were tested against the same version.
0003 was applicable without modifications.
Compiler: LLVM/Clang version 7~svn333018.

Not sure if a "YES, it boots." in QEMU/KVM and on bare metal is sufficient for you?

Do you need any logs and/or objdumps?
Shall I send you this in private or attach in [1]?

Some comments on the patch-series:
CC Linux/x86 folks or use "get_maintainer.pl" script.
Can you clarify in the patches Clang's default is C99 when '-std=gnu89' comaptibility is not forced (and results in different compiler behaviour)?

The most interesting for me is living of co-working between LLVM/Clang and Linux-kernel folks.

[1] ClangBuiltLinux/linux#16
[2] https://clang.llvm.org/compatibility.html
[3] https://clang.llvm.org/compatibility.html#inline

@nickdesaulniers
Copy link
Member

@nickdesaulniers
Copy link
Member

Not sure if a "YES, it boots." in QEMU/KVM and on bare metal is sufficient
for you?

Yep, thanks for verifying.

Do you need any logs and/or objdumps?
Shall I send you this in private or attach in [1]?

No. I can build+objdump. The disassembly looks good, but a boot test makes me more confident.

Can you clarify in the patches Clang's default is C99 when '-std=gnu89'
comaptibility is not forced (and results in different compiler behaviour)?

Depends on the version of Clang. Most versions you'd use today define STDC_VERSION to 201112 so looks like -std=c11 is the default.

The most interesting for me is living of co-working between LLVM/Clang and
Linux-kernel folks.

Me too. ;)

@nickdesaulniers
Copy link
Member

@nickdesaulniers
Copy link
Member

@nickdesaulniers
Copy link
Member

@llvmbot
Copy link
Collaborator

llvmbot commented Jun 5, 2018

I have tested this new patchset against Linux v4.14.47 (with different llvmlinux-related extra patches I had in my testing before) and can boot into QEMU/KVM and on bare metal.

@nickdesaulniers
Copy link
Member

Fix in the upstream linux kernel with commits:

torvalds@d03db2b
torvalds@0e2e160
torvalds@d0a8d93

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
@Quuxplusone Quuxplusone added the wontfix Issue is real, but we can't or won't fix it. Not invalid label Jan 20, 2022
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla wontfix Issue is real, but we can't or won't fix it. Not invalid
Projects
None yet
Development

No branches or pull requests

7 participants