Skip to content

Conversation

@kees
Copy link
Contributor

@kees kees commented Oct 24, 2025

The KCFI_CHECK pseudo-instruction size for ARM got miscalculated. These should represent worst-case expansion to ensure correct branch range calculations and code layout.

Update the Size field for each ARM sub-architecture:

  • ARM: 28 → 40 bytes (10 instructions @ 4 bytes when r3 spill needed)
  • Thumb2: 32 → 34 bytes (mixed 16/32-bit instructions with r3 spill)
  • Thumb1: 50 → 38 bytes (19 instructions @ 2 bytes with r2+r3 spills)

The ARM and Thumb2 sizes were underestimating the case where the target register is r12, requiring r3 to be used as scratch and spilled/restored. The Thumb1 size was overestimated and has been corrected to the actual worst-case of 19 instructions.

cc @nathanchance

The KCFI_CHECK pseudo-instruction size for ARM got miscalculated. These
should represent worst-case expansion to ensure correct branch range
calculations and code layout.

Update the Size field for each ARM sub-architecture:

- ARM: 28 → 40 bytes (10 instructions @ 4 bytes when r3 spill needed)
- Thumb2: 32 → 34 bytes (mixed 16/32-bit instructions with r3 spill)
- Thumb1: 50 → 38 bytes (19 instructions @ 2 bytes with r2+r3 spills)

The ARM and Thumb2 sizes were underestimating the case where the target
register is r12, requiring r3 to be used as scratch and spilled/restored.
The Thumb1 size was overestimated and has been corrected to the actual
worst-case of 19 instructions.
@llvmbot
Copy link
Member

llvmbot commented Oct 24, 2025

@llvm/pr-subscribers-backend-arm

Author: Kees Cook (kees)

Changes

The KCFI_CHECK pseudo-instruction size for ARM got miscalculated. These should represent worst-case expansion to ensure correct branch range calculations and code layout.

Update the Size field for each ARM sub-architecture:

  • ARM: 28 → 40 bytes (10 instructions @ 4 bytes when r3 spill needed)
  • Thumb2: 32 → 34 bytes (mixed 16/32-bit instructions with r3 spill)
  • Thumb1: 50 → 38 bytes (19 instructions @ 2 bytes with r2+r3 spills)

The ARM and Thumb2 sizes were underestimating the case where the target register is r12, requiring r3 to be used as scratch and spilled/restored. The Thumb1 size was overestimated and has been corrected to the actual worst-case of 19 instructions.

cc @nathanchance


Full diff: https://github.com/llvm/llvm-project/pull/164917.diff

1 Files Affected:

  • (modified) llvm/lib/Target/ARM/ARMInstrInfo.td (+5-4)
diff --git a/llvm/lib/Target/ARM/ARMInstrInfo.td b/llvm/lib/Target/ARM/ARMInstrInfo.td
index 53be1677b96e3..ce79029b17d12 100644
--- a/llvm/lib/Target/ARM/ARMInstrInfo.td
+++ b/llvm/lib/Target/ARM/ARMInstrInfo.td
@@ -6546,7 +6546,8 @@ def KCFI_CHECK_ARM
     : PseudoInst<(outs), (ins GPR:$ptr, i32imm:$type), NoItinerary, []>,
       Sched<[]>,
       Requires<[IsARM]> {
-  let Size = 28; // 7 instructions (bic, ldr, 4x eor, beq, udf)
+  let Size = 40; // worst-case 10 instructions @ 4 bytes each
+                 // (push, bic, ldr, 4x eor, pop, beq, udf)
 }
 
 def KCFI_CHECK_Thumb2
@@ -6554,15 +6555,15 @@ def KCFI_CHECK_Thumb2
       Sched<[]>,
       Requires<[IsThumb2]> {
   let Size =
-      32; // worst-case 9 instructions (push, bic, ldr, 4x eor, pop, beq.w, udf)
+      34; // worst-case (push.w[2], bic[4], ldr[4], 4x eor[16], pop.w[2], beq.w[4], udf[2])
 }
 
 def KCFI_CHECK_Thumb1
     : PseudoInst<(outs), (ins GPR:$ptr, i32imm:$type), NoItinerary, []>,
       Sched<[]>,
       Requires<[IsThumb1Only]> {
-  let Size = 50; // worst-case 25 instructions (pushes, bic helper, type
-                 // building, cmp, pops)
+  let Size = 38; // worst-case 19 instructions @ 2 bytes each
+                 // (2x push, 3x bic-helper, subs+ldr, 13x type-building, cmp, 2x pop, beq, bkpt)
 }
 
 //===----------------------------------------------------------------------===//

@nathanchance
Copy link
Member

Thanks, this survives allmodconfig with and without CONFIG_THUMB2_KERNEL

@kees kees merged commit c9a45d3 into llvm:main Oct 25, 2025
10 checks passed
@kees kees deleted the arm-kcfi branch October 25, 2025 03:27
@llvm-ci
Copy link
Collaborator

llvm-ci commented Oct 25, 2025

LLVM Buildbot has detected a new failure on builder mlir-nvidia-gcc7 running on mlir-nvidia while building llvm at step 7 "test-build-check-mlir-build-only-check-mlir".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/116/builds/20148

Here is the relevant piece of the build log for the reference
Step 7 (test-build-check-mlir-build-only-check-mlir) failure: test (failure)
******************** TEST 'MLIR :: Integration/GPU/CUDA/async.mlir' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-kernel-outlining  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary="format=fatbin"  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-runner    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_cuda_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_async_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_runner_utils.so    --entry-point-result=void -O0  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-kernel-outlining
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt '-pass-pipeline=builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary=format=fatbin
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-runner --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_cuda_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_async_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_runner_utils.so --entry-point-result=void -O0
# .---command stderr------------
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventSynchronize(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# `-----------------------------
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# .---command stderr------------
# | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir:68:12: error: CHECK: expected string not found in input
# |  // CHECK: [84, 84]
# |            ^
# | <stdin>:1:1: note: scanning from here
# | Unranked Memref base@ = 0x597b8e5ccd00 rank = 1 offset = 0 sizes = [2] strides = [1] data = 
# | ^
# | <stdin>:2:1: note: possible intended match here
# | [42, 42]
# | ^
# | 
# | Input file: <stdin>
# | Check file: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             1: Unranked Memref base@ = 0x597b8e5ccd00 rank = 1 offset = 0 sizes = [2] strides = [1] data =  
# | check:68'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |             2: [42, 42] 
# | check:68'0     ~~~~~~~~~
# | check:68'1     ?         possible intended match
...

@llvm-ci
Copy link
Collaborator

llvm-ci commented Oct 25, 2025

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-android running on sanitizer-buildbot-android while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/186/builds/13441

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
[       OK ] AddressSanitizer.AtoiAndFriendsOOBTest (2290 ms)
[ RUN      ] AddressSanitizer.HasFeatureAddressSanitizerTest
[       OK ] AddressSanitizer.HasFeatureAddressSanitizerTest (0 ms)
[ RUN      ] AddressSanitizer.CallocReturnsZeroMem
[       OK ] AddressSanitizer.CallocReturnsZeroMem (10 ms)
[ DISABLED ] AddressSanitizer.DISABLED_TSDTest
[ RUN      ] AddressSanitizer.IgnoreTest
[       OK ] AddressSanitizer.IgnoreTest (0 ms)
[ RUN      ] AddressSanitizer.SignalTest
[       OK ] AddressSanitizer.SignalTest (207 ms)
[ RUN      ] AddressSanitizer.ReallocTest
[       OK ] AddressSanitizer.ReallocTest (53 ms)
[ RUN      ] AddressSanitizer.WrongFreeTest
[       OK ] AddressSanitizer.WrongFreeTest (112 ms)
[ RUN      ] AddressSanitizer.LongJmpTest
[       OK ] AddressSanitizer.LongJmpTest (0 ms)
[ RUN      ] AddressSanitizer.ThreadStackReuseTest
[       OK ] AddressSanitizer.ThreadStackReuseTest (6 ms)
[ DISABLED ] AddressSanitizer.DISABLED_MemIntrinsicUnalignedAccessTest
[ DISABLED ] AddressSanitizer.DISABLED_LargeFunctionSymbolizeTest
[ DISABLED ] AddressSanitizer.DISABLED_MallocFreeUnwindAndSymbolizeTest
[ RUN      ] AddressSanitizer.UseThenFreeThenUseTest
[       OK ] AddressSanitizer.UseThenFreeThenUseTest (144 ms)
[ RUN      ] AddressSanitizer.FileNameInGlobalReportTest
[       OK ] AddressSanitizer.FileNameInGlobalReportTest (118 ms)
[ DISABLED ] AddressSanitizer.DISABLED_StressStackReuseAndExceptionsTest
[ RUN      ] AddressSanitizer.MlockTest
[       OK ] AddressSanitizer.MlockTest (0 ms)
[ DISABLED ] AddressSanitizer.DISABLED_DemoThreadedTest
[ DISABLED ] AddressSanitizer.DISABLED_DemoStackTest
[ DISABLED ] AddressSanitizer.DISABLED_DemoThreadStackTest
[ DISABLED ] AddressSanitizer.DISABLED_DemoUAFLowIn
[ DISABLED ] AddressSanitizer.DISABLED_DemoUAFLowLeft
[ DISABLED ] AddressSanitizer.DISABLED_DemoUAFLowRight
[ DISABLED ] AddressSanitizer.DISABLED_DemoUAFHigh
[ DISABLED ] AddressSanitizer.DISABLED_DemoOOM
[ DISABLED ] AddressSanitizer.DISABLED_DemoDoubleFreeTest
[ DISABLED ] AddressSanitizer.DISABLED_DemoNullDerefTest
[ DISABLED ] AddressSanitizer.DISABLED_DemoFunctionStaticTest
[ DISABLED ] AddressSanitizer.DISABLED_DemoTooMuchMemoryTest
[ RUN      ] AddressSanitizer.LongDoubleNegativeTest
[       OK ] AddressSanitizer.LongDoubleNegativeTest (0 ms)
[----------] 19 tests from AddressSanitizer (28020 ms total)

[----------] Global test environment tear-down
[==========] 22 tests from 2 test suites ran. (28025 ms total)
[  PASSED  ] 22 tests.

  YOU HAVE 1 DISABLED TEST

Step 34 (run instrumented asan tests [aarch64/bluejay-userdebug/TQ3A.230805.001]) failure: run instrumented asan tests [aarch64/bluejay-userdebug/TQ3A.230805.001] (failure)
...
[ RUN      ] AddressSanitizer.HasFeatureAddressSanitizerTest
[       OK ] AddressSanitizer.HasFeatureAddressSanitizerTest (0 ms)
[ RUN      ] AddressSanitizer.CallocReturnsZeroMem
[       OK ] AddressSanitizer.CallocReturnsZeroMem (10 ms)
[ DISABLED ] AddressSanitizer.DISABLED_TSDTest
[ RUN      ] AddressSanitizer.IgnoreTest
[       OK ] AddressSanitizer.IgnoreTest (0 ms)
[ RUN      ] AddressSanitizer.SignalTest
[       OK ] AddressSanitizer.SignalTest (207 ms)
[ RUN      ] AddressSanitizer.ReallocTest
[       OK ] AddressSanitizer.ReallocTest (53 ms)
[ RUN      ] AddressSanitizer.WrongFreeTest
[       OK ] AddressSanitizer.WrongFreeTest (112 ms)
[ RUN      ] AddressSanitizer.LongJmpTest
[       OK ] AddressSanitizer.LongJmpTest (0 ms)
[ RUN      ] AddressSanitizer.ThreadStackReuseTest
[       OK ] AddressSanitizer.ThreadStackReuseTest (6 ms)
[ DISABLED ] AddressSanitizer.DISABLED_MemIntrinsicUnalignedAccessTest
[ DISABLED ] AddressSanitizer.DISABLED_LargeFunctionSymbolizeTest
[ DISABLED ] AddressSanitizer.DISABLED_MallocFreeUnwindAndSymbolizeTest
[ RUN      ] AddressSanitizer.UseThenFreeThenUseTest
[       OK ] AddressSanitizer.UseThenFreeThenUseTest (144 ms)
[ RUN      ] AddressSanitizer.FileNameInGlobalReportTest
[       OK ] AddressSanitizer.FileNameInGlobalReportTest (118 ms)
[ DISABLED ] AddressSanitizer.DISABLED_StressStackReuseAndExceptionsTest
[ RUN      ] AddressSanitizer.MlockTest
[       OK ] AddressSanitizer.MlockTest (0 ms)
[ DISABLED ] AddressSanitizer.DISABLED_DemoThreadedTest
[ DISABLED ] AddressSanitizer.DISABLED_DemoStackTest
[ DISABLED ] AddressSanitizer.DISABLED_DemoThreadStackTest
[ DISABLED ] AddressSanitizer.DISABLED_DemoUAFLowIn
[ DISABLED ] AddressSanitizer.DISABLED_DemoUAFLowLeft
[ DISABLED ] AddressSanitizer.DISABLED_DemoUAFLowRight
[ DISABLED ] AddressSanitizer.DISABLED_DemoUAFHigh
[ DISABLED ] AddressSanitizer.DISABLED_DemoOOM
[ DISABLED ] AddressSanitizer.DISABLED_DemoDoubleFreeTest
[ DISABLED ] AddressSanitizer.DISABLED_DemoNullDerefTest
[ DISABLED ] AddressSanitizer.DISABLED_DemoFunctionStaticTest
[ DISABLED ] AddressSanitizer.DISABLED_DemoTooMuchMemoryTest
[ RUN      ] AddressSanitizer.LongDoubleNegativeTest
[       OK ] AddressSanitizer.LongDoubleNegativeTest (0 ms)
[----------] 19 tests from AddressSanitizer (28020 ms total)

[----------] Global test environment tear-down
[==========] 22 tests from 2 test suites ran. (28025 ms total)
[  PASSED  ] 22 tests.

  YOU HAVE 1 DISABLED TEST
program finished with exit code 0
elapsedTime=2330.827130

dvbuka pushed a commit to dvbuka/llvm-project that referenced this pull request Oct 27, 2025
…4917)

The KCFI_CHECK pseudo-instruction size for ARM got miscalculated. These
should represent worst-case expansion to ensure correct branch range
calculations and code layout.

Update the Size field for each ARM sub-architecture:

- ARM: 28 → 40 bytes (10 instructions @ 4 bytes when r3 spill needed)
- Thumb2: 32 → 34 bytes (mixed 16/32-bit instructions with r3 spill)
- Thumb1: 50 → 38 bytes (19 instructions @ 2 bytes with r2+r3 spills)

The ARM and Thumb2 sizes were underestimating the case where the target
register is r12, requiring r3 to be used as scratch and
spilled/restored. The Thumb1 size was overestimated and has been
corrected to the actual worst-case of 19 instructions.
Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Oct 29, 2025
…4917)

The KCFI_CHECK pseudo-instruction size for ARM got miscalculated. These
should represent worst-case expansion to ensure correct branch range
calculations and code layout.

Update the Size field for each ARM sub-architecture:

- ARM: 28 → 40 bytes (10 instructions @ 4 bytes when r3 spill needed)
- Thumb2: 32 → 34 bytes (mixed 16/32-bit instructions with r3 spill)
- Thumb1: 50 → 38 bytes (19 instructions @ 2 bytes with r2+r3 spills)

The ARM and Thumb2 sizes were underestimating the case where the target
register is r12, requiring r3 to be used as scratch and
spilled/restored. The Thumb1 size was overestimated and has been
corrected to the actual worst-case of 19 instructions.
aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025
…4917)

The KCFI_CHECK pseudo-instruction size for ARM got miscalculated. These
should represent worst-case expansion to ensure correct branch range
calculations and code layout.

Update the Size field for each ARM sub-architecture:

- ARM: 28 → 40 bytes (10 instructions @ 4 bytes when r3 spill needed)
- Thumb2: 32 → 34 bytes (mixed 16/32-bit instructions with r3 spill)
- Thumb1: 50 → 38 bytes (19 instructions @ 2 bytes with r2+r3 spills)

The ARM and Thumb2 sizes were underestimating the case where the target
register is r12, requiring r3 to be used as scratch and
spilled/restored. The Thumb1 size was overestimated and has been
corrected to the actual worst-case of 19 instructions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants