[AMDGPU] -fcf-protection=full should not be applied to GPU targets in heterogenous code #86450

AngryLoki · 2024-03-24T20:07:54Z

In Gentoo 23.0 (upcoming) and the hardened profile, `-fcf-protection=full` is added automatically via `/etc/clang/x86_64-pc-linux-gnu-clang++.cfg` (along with [other flags](https://wiki.gentoo.org/wiki/Hardened/Toolchain#Changes)). However, this flag does not work well with heterogeneous HIP code:

cd /tmp && wget https://raw.githubusercontent.com/ROCm-Developer-Tools/HIP-CPU/master/examples/vadd_hip/vadd_hip.cpp

# -fcf-protection=full is added manually for demonstration
/usr/lib/llvm/18/bin/clang++ --offload-arch=native -x hip vadd_hip.cpp -o vadd_hip \
-fno-stack-protector --hip-link -fcf-protection=full -nogpulib

error: option 'cf-protection=return' cannot be specified on this target

It is possible to override this with -fcf-protection=full -Xarch_device -fcf-protection=none, but that is irritating: it cannot be added to all files, because for non-HIP files it produces "warning: argument unused during compilation: '-Xarch_device -fcf-protection=none'".

In #70799 you added code to "not emit the stack protector metadata on unsupported architectures". Can you do the same for -fcf-protection=..., to apply CET only for host code? Thanks!

AngryLoki · 2024-03-24T20:10:05Z

CC @jhuber6 as the author of the stack-protector PR.

llvmbot · 2024-03-24T20:19:51Z

@llvm/issue-subscribers-backend-amdgpu

jhuber6 · 2024-03-24T20:22:32Z

The stack protector stuff was kind of hacky. I remember a similar case that @yxsamliu and @MaskRay looked into, but I forget which flag it was.

AngryLoki · 2024-03-24T20:54:40Z

Huh, actually, after submitting this bug, I remembered the -Xarch_host flag. So I filed an additional bug with Gentoo asking them to consider rewriting /etc/clang/gentoo-hardened.cfg as:

-Xarch_host -fstack-clash-protection
-Xarch_host -fstack-protector-strong
-Xarch_host -fPIE
-include "/usr/include/gentoo/fortify.h"
-Xarch_host -fcf-protection=full

However, I still think the best solution is to solve this in Clang. As far as I know, there are already many flags that are not passed to GPU codegen as-is (some are even changed, such as -O2 turning into -O3), and -fcf-protection is just one of them.

(edit: fix copy&paste mistake)
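As a rough illustration of what the -Xarch_host prefix in the config above is meant to achieve, the following C++ sketch models the splitting of one command line into host and device flags. It is a toy model, not Clang's driver code: `Side` and `effectiveArgs` are hypothetical names, and the only behavior assumed is the documented one, namely that the argument after -Xarch_host (or -Xarch_device) is passed to the host (or device) compilation only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: this is NOT how Clang's driver is implemented.
// It mimics the documented meaning of -Xarch_host / -Xarch_device:
// the argument that follows is seen by just one side of the compilation.
enum class Side { Host, Device };

// Hypothetical helper: returns the flags that `side` would see.
std::vector<std::string> effectiveArgs(const std::vector<std::string> &args,
                                       Side side) {
  std::vector<std::string> out;
  for (std::size_t i = 0; i < args.size(); ++i) {
    const bool hostOnly = args[i] == "-Xarch_host";
    const bool deviceOnly = args[i] == "-Xarch_device";
    if (!hostOnly && !deviceOnly) {
      out.push_back(args[i]); // unprefixed flags reach both sides
      continue;
    }
    if (i + 1 >= args.size())
      break; // dangling prefix without an argument: ignored in this sketch
    ++i;     // move on to the prefixed argument
    if ((hostOnly && side == Side::Host) ||
        (deviceOnly && side == Side::Device))
      out.push_back(args[i]);
  }
  return out;
}
```

With the flags from the config above, `effectiveArgs(cfg, Side::Device)` drops every -Xarch_host entry, so the device side never sees -fcf-protection=full, while unprefixed flags such as -include still reach both sides.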

jhuber6 · 2024-03-24T20:58:06Z

> Huh, actually, after submitting this bug, I remembered flag -Xarch_device. So I reported additional bug to Gentoo to consider rewriting /etc/clang/gentoo-hardened.cfg with:
>
> -Xarch_device -fstack-clash-protection
> -Xarch_device -fstack-protector-strong
> -Xarch_device -fPIE
> -include "/usr/include/gentoo/fortify.h"
> -Xarch_device -fcf-protection=full

I'm assuming you mean -Xarch_host? That will apply all of those to the device build.

> However I still think the best solution is to solve this in Clang. As far as I know, there are already many flags that are not passed to GPU codegen (even turning -O2 into -O3) and -fcf-protection is just one of them.

Currently we kind of just treat this on a case-by-case basis. There's a lot of pain that comes from these single-source languages, where we try to mash two separate targets into a single compiler invocation. We have the same problem with all of our numerous hacks around the glibc headers when they are included from the GPU.

AngryLoki · 2024-03-24T21:08:57Z

> I'm assuming you mean -Xarch_host?

Yes, copy&paste mistake.

Add -Xarch_host to CPU-specific flags, so that they do not affect heterogeneous code (e.g. HIP).

For stack-protector flags: fixes compiler crashes like llvm/llvm-project#83777. Clang 18.1.0 does not try to apply these flags to GPU code, but current ROCm libraries use Clang 17, so add "-Xarch_host" there too. This will eventually allow dropping the "-fno-stack-protector" patches from rocm-comgr, hip and hipcc.

For -fcf-protection: fixes "error: option 'cf-protection=return' cannot be specified on this target".

For -fPIE: do not touch, as at least since Clang 15 it only affects the host relocation model. See also: https://github.com/llvm/llvm-project/blob/llvmorg-15.0.7/clang/test/Driver/hip-fpie-option.hip

Bug: llvm/llvm-project#86450
Closes: https://bugs.gentoo.org/927752
Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
Closes: #35926
Signed-off-by: Michał Górny <mgorny@gentoo.org>

stalkerg · 2024-04-07T15:35:54Z

The same issue occurs with the dev-libs/libclc package.

stalkerg · 2024-04-09T09:57:57Z

https://bugs.gentoo.org/928961

jhuber6 · 2024-04-09T20:47:36Z

These things are permanent issues caused by pretending that two separate compilations for two separate architectures are a single compiler invocation. The easiest solution would be the following patch, which simply prevents the option from being forwarded to the offloading device, essentially making it behave as if -Xarch_host were always added.

diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 766a9b91e3c0..0d19c67778e0 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -6866,9 +6866,11 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
   if (Args.hasArg(options::OPT_nogpulib))
     CmdArgs.push_back("-nogpulib");
 
-  if (Arg *A = Args.getLastArg(options::OPT_fcf_protection_EQ)) {
-    CmdArgs.push_back(
-        Args.MakeArgString(Twine("-fcf-protection=") + A->getValue()));
+  if (JA.getOffloadingDeviceKind() == Action::OFK_None) {
+    if (Arg *A = Args.getLastArg(options::OPT_fcf_protection_EQ)) {
+      CmdArgs.push_back(
+          Args.MakeArgString(Twine("-fcf-protection=") + A->getValue()));
+    }
   }
 
   if (Arg *A = Args.getLastArg(options::OPT_mfunction_return_EQ))

However, this would prevent users from enabling it intentionally or overriding that behavior.

yxsamliu · 2024-04-11T15:00:01Z

> However, this would prevent users from enabling it intentionally or overriding that behavior.

It seems currently we do not have a better way to handle this. If a target starts to support these options we could re-enable them for that target.

Summary: This patch prevents the `-fcf-protection=` flag from being passed to the device compilation during offloading. The flag is not supported on CUDA and AMD devices, so a build that enables CF protection for the host currently fails to compile. We have a lot of these cases with various hacked-together solutions. It would be nice to have a single way to detect from the driver whether a feature like this can be used for offloading, but for now this should resolve the issue. Fixes: llvm#86450

arsenm · 2024-04-12T12:55:23Z

> It seems currently we do not have a better way to handle this. If a target starts to support these options we could re-enable them for that target.

We could just implement this, but it's probably not what anyone wants by default. I think we should interpret the flag as host-only, but it would be good to allow enabling it specifically for the device.
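One way to picture this suggestion (host-only by default, explicit opt-in for the device) is the small C++17 sketch below. It is not the patch from this thread and not an existing Clang API: `CfProtectionRequest` and `cfProtectionFor` are hypothetical, and spelling the opt-in as -Xarch_device is an assumption.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical summary of what the user asked for on the command line.
struct CfProtectionRequest {
  // Value of a plain -fcf-protection=<value>, if given.
  std::optional<std::string> global;
  // Value requested explicitly for the device, if given
  // (assumed spelling: -Xarch_device -fcf-protection=<value>).
  std::optional<std::string> deviceOnly;
};

// Returns the value to forward to one compile job,
// or std::nullopt when nothing should be forwarded.
std::optional<std::string> cfProtectionFor(const CfProtectionRequest &req,
                                           bool isDeviceJob) {
  if (!isDeviceJob)
    return req.global;   // host job: behaves as it does today
  return req.deviceOnly; // device job: only on explicit opt-in
}
```

Under this policy a hardened config that sets only the plain flag would no longer break device compilation, while a target that gains support later could still be opted in per invocation.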

github-actions bot added the new issue label Mar 24, 2024

AngryLoki changed the title ~~[AMDGPU] Do not emit -fcf-protection=full on GPU architectures~~ [AMDGPU] -fcf-protection=full should not be applied to GPU targets in heterogenous code Mar 24, 2024

AngryLoki mentioned this issue Mar 24, 2024

media-gfx/blender: Bump Blender ebuilds and dependencies gentoo/gentoo#34869

Closed

EugeneZelenko added backend:AMDGPU and removed new issue labels Mar 24, 2024

jhuber6 assigned MaskRay, yxsamliu, jhuber6 and arsenm Mar 24, 2024

yretenai added a commit to yretenai/neptune-overlay that referenced this issue Mar 25, 2024

use osl 1.13 (on cg repo) for blender, fix oidn dying

7952930

oidn needs llvm/llvm-project#86450 (comment) to be applied

AngryLoki mentioned this issue Mar 26, 2024

sys-devel/clang-common: add -Xarch_host to CET and stack-protector flags to fix GPU compilation gentoo/gentoo#35926

Closed

jhuber6 linked a pull request Apr 11, 2024 that will close this issue

[Offload] Do not pass -fcf-protection= for offloading #88402

Open

AngryLoki mentioned this issue May 20, 2024

How to insert cxx flag '-fno-stack-protector' for clang when using rtc? ROCm/clr#21

Closed
