[ubsan][pgo] Pass to remove ubsan checks based on profile data #83471

Merged
merged 6 commits into from
Mar 7, 2024

Conversation

vitalybuka
Collaborator

UBSAN checks can be too expensive to be used
in release binaries. However, not all code affects
performance in the same way. By removing a small
number of checks in hot code, we can reduce the
performance loss while preserving most of the checks.
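
To make the trade-off above concrete, here is a minimal standalone sketch of how a percentile cutoff can be turned into a "hot" count threshold. It assumes the cutoff is expressed in parts per million (the tests in this patch pass `700000` for the `HOT70` prefix). The helper names `exampleCounts`, `hotCountThreshold` and `numHotChecks` are hypothetical and the counts are made up; this is a simplified model, not LLVM's `ProfileSummaryInfo` implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Illustrative execution counts, one per check (made-up numbers).
std::vector<uint64_t> exampleCounts() {
  return {9000, 500, 300, 100, 50, 30, 10, 5, 3, 2};
}

// Hypothetical helper, not an LLVM API: the smallest count among the hottest
// entries that together cover `cutoffPpm` millionths of all executions.
uint64_t hotCountThreshold(std::vector<uint64_t> counts, uint64_t cutoffPpm) {
  assert(cutoffPpm <= 1000000 && "cutoff is given in parts per million");
  std::sort(counts.begin(), counts.end(), std::greater<uint64_t>());

  uint64_t total = 0;
  for (uint64_t c : counts)
    total += c;

  // floor(total * cutoffPpm / 1e6), split to keep the intermediate product
  // small.
  const uint64_t scale = 1000000;
  uint64_t desired =
      (total / scale) * cutoffPpm + (total % scale) * cutoffPpm / scale;

  uint64_t running = 0;
  for (uint64_t c : counts) {
    running += c;
    if (running >= desired)
      return c;
  }
  return UINT64_MAX; // No counts at all: nothing qualifies as hot.
}

// Number of checks whose count reaches the threshold, i.e. the ones a
// profile-guided pass would drop in this model.
std::size_t numHotChecks(const std::vector<uint64_t> &counts,
                         uint64_t threshold) {
  return static_cast<std::size_t>(
      std::count_if(counts.begin(), counts.end(),
                    [threshold](uint64_t c) { return c >= threshold; }));
}
```

With the made-up counts above, a cutoff of `990000` gives a threshold of 100, so 4 of the 10 checks count as hot while accounting for 99% of all executions. A cutoff of `700000` raises the threshold to 9000 and only the single hottest check qualifies.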

@llvmbot
Collaborator

llvmbot commented Feb 29, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Vitaly Buka (vitalybuka)

Changes


Patch is 22.47 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/83471.diff

6 Files Affected:

  • (added) llvm/include/llvm/Transforms/Instrumentation/RemoveTrapsPass.h (+32)
  • (modified) llvm/lib/Passes/PassBuilder.cpp (+1)
  • (modified) llvm/lib/Passes/PassRegistry.def (+1)
  • (modified) llvm/lib/Transforms/Instrumentation/CMakeLists.txt (+1)
  • (added) llvm/lib/Transforms/Instrumentation/RemoveTrapsPass.cpp (+95)
  • (added) llvm/test/Transforms/RemoveTraps/remove-traps.ll (+397)
diff --git a/llvm/include/llvm/Transforms/Instrumentation/RemoveTrapsPass.h b/llvm/include/llvm/Transforms/Instrumentation/RemoveTrapsPass.h
new file mode 100644
index 00000000000000..58f6bbcec5dc9d
--- /dev/null
+++ b/llvm/include/llvm/Transforms/Instrumentation/RemoveTrapsPass.h
@@ -0,0 +1,32 @@
+//===- RemoveTrapsPass.h ----------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+/// \file
+/// This file provides the interface for the pass responsible for removing
+/// expensive ubsan checks.
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_TRANSFORMS_INSTRUMENTATION_UBSANOPTIMIZATIONPASS_H
+#define LLVM_TRANSFORMS_INSTRUMENTATION_UBSANOPTIMIZATIONPASS_H
+
+#include "llvm/IR/Function.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/Pass.h"
+
+namespace llvm {
+
+// This pass is responsible for removing optional traps, like llvm.ubsantrap
+// from the hot code.
+class RemoveTrapsPass : public PassInfoMixin<RemoveTrapsPass> {
+public:
+  PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
+};
+
+} // namespace llvm
+
+#endif
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index f26d95ab1e479c..d6f7130dbfe396 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -175,6 +175,7 @@
 #include "llvm/Transforms/Instrumentation/PGOForceFunctionAttrs.h"
 #include "llvm/Transforms/Instrumentation/PGOInstrumentation.h"
 #include "llvm/Transforms/Instrumentation/PoisonChecking.h"
+#include "llvm/Transforms/Instrumentation/RemoveTrapsPass.h"
 #include "llvm/Transforms/Instrumentation/SanitizerBinaryMetadata.h"
 #include "llvm/Transforms/Instrumentation/SanitizerCoverage.h"
 #include "llvm/Transforms/Instrumentation/ThreadSanitizer.h"
diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def
index 093c1f8aaad438..a7f51bbf41a613 100644
--- a/llvm/lib/Passes/PassRegistry.def
+++ b/llvm/lib/Passes/PassRegistry.def
@@ -421,6 +421,7 @@ FUNCTION_PASS("print<uniformity>", UniformityInfoPrinterPass(dbgs()))
 FUNCTION_PASS("reassociate", ReassociatePass())
 FUNCTION_PASS("redundant-dbg-inst-elim", RedundantDbgInstEliminationPass())
 FUNCTION_PASS("reg2mem", RegToMemPass())
+FUNCTION_PASS("remove-traps", RemoveTrapsPass())
 FUNCTION_PASS("safe-stack", SafeStackPass(TM))
 FUNCTION_PASS("scalarize-masked-mem-intrin", ScalarizeMaskedMemIntrinPass())
 FUNCTION_PASS("scalarizer", ScalarizerPass())
diff --git a/llvm/lib/Transforms/Instrumentation/CMakeLists.txt b/llvm/lib/Transforms/Instrumentation/CMakeLists.txt
index ee9aa73ff03403..b23a6ed1f08415 100644
--- a/llvm/lib/Transforms/Instrumentation/CMakeLists.txt
+++ b/llvm/lib/Transforms/Instrumentation/CMakeLists.txt
@@ -17,6 +17,7 @@ add_llvm_component_library(LLVMInstrumentation
   PGOInstrumentation.cpp
   PGOMemOPSizeOpt.cpp
   PoisonChecking.cpp
+  RemoveTrapsPass.cpp
   SanitizerCoverage.cpp
   SanitizerBinaryMetadata.cpp
   ValueProfileCollector.cpp
diff --git a/llvm/lib/Transforms/Instrumentation/RemoveTrapsPass.cpp b/llvm/lib/Transforms/Instrumentation/RemoveTrapsPass.cpp
new file mode 100644
index 00000000000000..7a7c604741e92c
--- /dev/null
+++ b/llvm/lib/Transforms/Instrumentation/RemoveTrapsPass.cpp
@@ -0,0 +1,95 @@
+//===- RemoveTrapsPass.cpp --------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Transforms/Instrumentation/RemoveTrapsPass.h"
+
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/Statistic.h"
+#include "llvm/Analysis/ProfileSummaryInfo.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Intrinsics.h"
+#include <cstdint>
+
+using namespace llvm;
+
+#define DEBUG_TYPE "remove-traps"
+
+static constexpr unsigned MaxRandomRate = 1000;
+
+static cl::opt<int> HotPercentileCutoff(
+    "remove-traps-percentile-cutoff-hot", cl::init(0),
+    cl::desc("Alternative hot percentile cutoff. By default "
+             "`-profile-summary-cutoff-hot` is used."));
+static cl::opt<float> RandomRate(
+    "remove-traps-random-rate", cl::init(0.0),
+    cl::desc(
+        "Probability to use for pseudorandom unconditional checks removal."));
+
+STATISTIC(NumChecksTotal, "Number of checks");
+STATISTIC(NumChecksRemoved, "Number of removed checks");
+
+static SmallVector<IntrinsicInst *, 16>
+removeUbsanTraps(Function &F, FunctionAnalysisManager &FAM,
+                 ProfileSummaryInfo *PSI) {
+  SmallVector<IntrinsicInst *, 16> Remove;
+
+  if (F.isDeclaration())
+    return {};
+
+  auto &BFI = FAM.getResult<BlockFrequencyAnalysis>(F);
+
+  int BBCounter = 0;
+  for (BasicBlock &BB : F) {
+    for (Instruction &I : BB) {
+      IntrinsicInst *II = dyn_cast<IntrinsicInst>(&I);
+      if (!II)
+        continue;
+      auto ID = II->getIntrinsicID();
+      if (ID != Intrinsic::ubsantrap)
+        continue;
+      ++NumChecksTotal;
+
+      bool IsHot = false;
+      if (PSI) {
+        uint64_t Count = 0;
+        for (const auto *PR : predecessors(&BB))
+          Count += BFI.getBlockProfileCount(PR).value_or(0);
+
+        IsHot = HotPercentileCutoff.getNumOccurrences()
+                    ? PSI->isHotCountNthPercentile(HotPercentileCutoff, Count)
+                    : PSI->isHotCount(Count);
+      }
+
+      if ((IsHot) || ((F.getGUID() + BBCounter++) % MaxRandomRate) <
+                         RandomRate * RandomRate) {
+        Remove.push_back(II);
+        ++NumChecksRemoved;
+      }
+    }
+  }
+  return Remove;
+}
+
+PreservedAnalyses RemoveTrapsPass::run(Function &F,
+                                       FunctionAnalysisManager &AM) {
+  if (F.isDeclaration())
+    return PreservedAnalyses::all();
+  auto &MAMProxy = AM.getResult<ModuleAnalysisManagerFunctionProxy>(F);
+  ProfileSummaryInfo *PSI =
+      MAMProxy.getCachedResult<ProfileSummaryAnalysis>(*F.getParent());
+
+  auto Remove = removeUbsanTraps(F, AM, PSI);
+  for (auto *I : Remove)
+    I->eraseFromParent();
+
+  return Remove.empty() ? PreservedAnalyses::all() : PreservedAnalyses::none();
+}
diff --git a/llvm/test/Transforms/RemoveTraps/remove-traps.ll b/llvm/test/Transforms/RemoveTraps/remove-traps.ll
new file mode 100644
index 00000000000000..af9cdab2f29204
--- /dev/null
+++ b/llvm/test/Transforms/RemoveTraps/remove-traps.ll
@@ -0,0 +1,397 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt < %s -passes='function(remove-traps)' -S | FileCheck %s --check-prefixes=NOPROFILE
+; RUN: opt < %s -passes='function(remove-traps)' -remove-traps-random-rate=999999 -S | FileCheck %s --check-prefixes=ALL
+; RUN: opt < %s -passes='require<profile-summary>,function(remove-traps)' -S | FileCheck %s --check-prefixes=HOT
+; RUN: opt < %s -passes='require<profile-summary>,function(remove-traps)' -remove-traps-percentile-cutoff-hot=700000 -S | FileCheck %s --check-prefixes=HOT70
+
+target triple = "x86_64-pc-linux-gnu"
+
+declare void @llvm.ubsantrap(i8 immarg)
+
+define dso_local noundef i32 @simple(ptr noundef readonly %0) {
+; NOPROFILE-LABEL: define dso_local noundef i32 @simple(
+; NOPROFILE-SAME: ptr noundef readonly [[TMP0:%.*]]) {
+; NOPROFILE-NEXT:    [[TMP2:%.*]] = icmp eq ptr [[TMP0]], null
+; NOPROFILE-NEXT:    br i1 [[TMP2]], label [[TMP3:%.*]], label [[TMP4:%.*]]
+; NOPROFILE:       3:
+; NOPROFILE-NEXT:    tail call void @llvm.ubsantrap(i8 22)
+; NOPROFILE-NEXT:    unreachable
+; NOPROFILE:       4:
+; NOPROFILE-NEXT:    [[TMP5:%.*]] = load i32, ptr [[TMP0]], align 4
+; NOPROFILE-NEXT:    ret i32 [[TMP5]]
+;
+; ALL-LABEL: define dso_local noundef i32 @simple(
+; ALL-SAME: ptr noundef readonly [[TMP0:%.*]]) {
+; ALL-NEXT:    [[TMP2:%.*]] = icmp eq ptr [[TMP0]], null
+; ALL-NEXT:    br i1 [[TMP2]], label [[TMP3:%.*]], label [[TMP4:%.*]]
+; ALL:       3:
+; ALL-NEXT:    unreachable
+; ALL:       4:
+; ALL-NEXT:    [[TMP5:%.*]] = load i32, ptr [[TMP0]], align 4
+; ALL-NEXT:    ret i32 [[TMP5]]
+;
+; HOT-LABEL: define dso_local noundef i32 @simple(
+; HOT-SAME: ptr noundef readonly [[TMP0:%.*]]) {
+; HOT-NEXT:    [[TMP2:%.*]] = icmp eq ptr [[TMP0]], null
+; HOT-NEXT:    br i1 [[TMP2]], label [[TMP3:%.*]], label [[TMP4:%.*]]
+; HOT:       3:
+; HOT-NEXT:    tail call void @llvm.ubsantrap(i8 22)
+; HOT-NEXT:    unreachable
+; HOT:       4:
+; HOT-NEXT:    [[TMP5:%.*]] = load i32, ptr [[TMP0]], align 4
+; HOT-NEXT:    ret i32 [[TMP5]]
+;
+; HOT70-LABEL: define dso_local noundef i32 @simple(
+; HOT70-SAME: ptr noundef readonly [[TMP0:%.*]]) {
+; HOT70-NEXT:    [[TMP2:%.*]] = icmp eq ptr [[TMP0]], null
+; HOT70-NEXT:    br i1 [[TMP2]], label [[TMP3:%.*]], label [[TMP4:%.*]]
+; HOT70:       3:
+; HOT70-NEXT:    tail call void @llvm.ubsantrap(i8 22)
+; HOT70-NEXT:    unreachable
+; HOT70:       4:
+; HOT70-NEXT:    [[TMP5:%.*]] = load i32, ptr [[TMP0]], align 4
+; HOT70-NEXT:    ret i32 [[TMP5]]
+;
+  %2 = icmp eq ptr %0, null
+  br i1 %2, label %3, label %4
+
+3:
+  tail call void @llvm.ubsantrap(i8 22)
+  unreachable
+
+4:
+  %5 = load i32, ptr %0, align 4
+  ret i32 %5
+}
+
+
+define dso_local noundef i32 @hot(ptr noundef readonly %0) !prof !36 {
+; NOPROFILE-LABEL: define dso_local noundef i32 @hot(
+; NOPROFILE-SAME: ptr noundef readonly [[TMP0:%.*]]) !prof [[PROF16:![0-9]+]] {
+; NOPROFILE-NEXT:    [[TMP2:%.*]] = icmp eq ptr [[TMP0]], null
+; NOPROFILE-NEXT:    br i1 [[TMP2]], label [[TMP3:%.*]], label [[TMP4:%.*]]
+; NOPROFILE:       3:
+; NOPROFILE-NEXT:    tail call void @llvm.ubsantrap(i8 22)
+; NOPROFILE-NEXT:    unreachable
+; NOPROFILE:       4:
+; NOPROFILE-NEXT:    [[TMP5:%.*]] = load i32, ptr [[TMP0]], align 4
+; NOPROFILE-NEXT:    ret i32 [[TMP5]]
+;
+; ALL-LABEL: define dso_local noundef i32 @hot(
+; ALL-SAME: ptr noundef readonly [[TMP0:%.*]]) !prof [[PROF16:![0-9]+]] {
+; ALL-NEXT:    [[TMP2:%.*]] = icmp eq ptr [[TMP0]], null
+; ALL-NEXT:    br i1 [[TMP2]], label [[TMP3:%.*]], label [[TMP4:%.*]]
+; ALL:       3:
+; ALL-NEXT:    unreachable
+; ALL:       4:
+; ALL-NEXT:    [[TMP5:%.*]] = load i32, ptr [[TMP0]], align 4
+; ALL-NEXT:    ret i32 [[TMP5]]
+;
+; HOT-LABEL: define dso_local noundef i32 @hot(
+; HOT-SAME: ptr noundef readonly [[TMP0:%.*]]) !prof [[PROF16:![0-9]+]] {
+; HOT-NEXT:    [[TMP2:%.*]] = icmp eq ptr [[TMP0]], null
+; HOT-NEXT:    br i1 [[TMP2]], label [[TMP3:%.*]], label [[TMP4:%.*]]
+; HOT:       3:
+; HOT-NEXT:    unreachable
+; HOT:       4:
+; HOT-NEXT:    [[TMP5:%.*]] = load i32, ptr [[TMP0]], align 4
+; HOT-NEXT:    ret i32 [[TMP5]]
+;
+; HOT70-LABEL: define dso_local noundef i32 @hot(
+; HOT70-SAME: ptr noundef readonly [[TMP0:%.*]]) !prof [[PROF16:![0-9]+]] {
+; HOT70-NEXT:    [[TMP2:%.*]] = icmp eq ptr [[TMP0]], null
+; HOT70-NEXT:    br i1 [[TMP2]], label [[TMP3:%.*]], label [[TMP4:%.*]]
+; HOT70:       3:
+; HOT70-NEXT:    tail call void @llvm.ubsantrap(i8 22)
+; HOT70-NEXT:    unreachable
+; HOT70:       4:
+; HOT70-NEXT:    [[TMP5:%.*]] = load i32, ptr [[TMP0]], align 4
+; HOT70-NEXT:    ret i32 [[TMP5]]
+;
+  %2 = icmp eq ptr %0, null
+  br i1 %2, label %3, label %4
+
+3:
+  tail call void @llvm.ubsantrap(i8 22)
+  unreachable
+
+4:
+  %5 = load i32, ptr %0, align 4
+  ret i32 %5
+}
+
+define dso_local noundef i32 @veryHot(ptr noundef readonly %0) !prof !39 {
+; NOPROFILE-LABEL: define dso_local noundef i32 @veryHot(
+; NOPROFILE-SAME: ptr noundef readonly [[TMP0:%.*]]) !prof [[PROF17:![0-9]+]] {
+; NOPROFILE-NEXT:    [[TMP2:%.*]] = icmp eq ptr [[TMP0]], null
+; NOPROFILE-NEXT:    br i1 [[TMP2]], label [[TMP3:%.*]], label [[TMP4:%.*]]
+; NOPROFILE:       3:
+; NOPROFILE-NEXT:    tail call void @llvm.ubsantrap(i8 22)
+; NOPROFILE-NEXT:    unreachable
+; NOPROFILE:       4:
+; NOPROFILE-NEXT:    [[TMP5:%.*]] = load i32, ptr [[TMP0]], align 4
+; NOPROFILE-NEXT:    ret i32 [[TMP5]]
+;
+; ALL-LABEL: define dso_local noundef i32 @veryHot(
+; ALL-SAME: ptr noundef readonly [[TMP0:%.*]]) !prof [[PROF17:![0-9]+]] {
+; ALL-NEXT:    [[TMP2:%.*]] = icmp eq ptr [[TMP0]], null
+; ALL-NEXT:    br i1 [[TMP2]], label [[TMP3:%.*]], label [[TMP4:%.*]]
+; ALL:       3:
+; ALL-NEXT:    unreachable
+; ALL:       4:
+; ALL-NEXT:    [[TMP5:%.*]] = load i32, ptr [[TMP0]], align 4
+; ALL-NEXT:    ret i32 [[TMP5]]
+;
+; HOT-LABEL: define dso_local noundef i32 @veryHot(
+; HOT-SAME: ptr noundef readonly [[TMP0:%.*]]) !prof [[PROF17:![0-9]+]] {
+; HOT-NEXT:    [[TMP2:%.*]] = icmp eq ptr [[TMP0]], null
+; HOT-NEXT:    br i1 [[TMP2]], label [[TMP3:%.*]], label [[TMP4:%.*]]
+; HOT:       3:
+; HOT-NEXT:    unreachable
+; HOT:       4:
+; HOT-NEXT:    [[TMP5:%.*]] = load i32, ptr [[TMP0]], align 4
+; HOT-NEXT:    ret i32 [[TMP5]]
+;
+; HOT70-LABEL: define dso_local noundef i32 @veryHot(
+; HOT70-SAME: ptr noundef readonly [[TMP0:%.*]]) !prof [[PROF17:![0-9]+]] {
+; HOT70-NEXT:    [[TMP2:%.*]] = icmp eq ptr [[TMP0]], null
+; HOT70-NEXT:    br i1 [[TMP2]], label [[TMP3:%.*]], label [[TMP4:%.*]]
+; HOT70:       3:
+; HOT70-NEXT:    unreachable
+; HOT70:       4:
+; HOT70-NEXT:    [[TMP5:%.*]] = load i32, ptr [[TMP0]], align 4
+; HOT70-NEXT:    ret i32 [[TMP5]]
+;
+  %2 = icmp eq ptr %0, null
+  br i1 %2, label %3, label %4
+
+3:
+  tail call void @llvm.ubsantrap(i8 22)
+  unreachable
+
+4:
+  %5 = load i32, ptr %0, align 4
+  ret i32 %5
+}
+
+
+define dso_local noundef i32 @branchColdFnHot(i32 noundef %0, ptr noundef readonly %1) !prof !39 {
+; NOPROFILE-LABEL: define dso_local noundef i32 @branchColdFnHot(
+; NOPROFILE-SAME: i32 noundef [[TMP0:%.*]], ptr noundef readonly [[TMP1:%.*]]) !prof [[PROF17]] {
+; NOPROFILE-NEXT:    [[TMP3:%.*]] = icmp eq i32 [[TMP0]], 0
+; NOPROFILE-NEXT:    br i1 [[TMP3]], label [[TMP9:%.*]], label [[TMP4:%.*]], !prof [[PROF18:![0-9]+]]
+; NOPROFILE:       4:
+; NOPROFILE-NEXT:    [[TMP5:%.*]] = icmp eq ptr [[TMP1]], null
+; NOPROFILE-NEXT:    br i1 [[TMP5]], label [[TMP6:%.*]], label [[TMP7:%.*]]
+; NOPROFILE:       6:
+; NOPROFILE-NEXT:    tail call void @llvm.ubsantrap(i8 22)
+; NOPROFILE-NEXT:    unreachable
+; NOPROFILE:       7:
+; NOPROFILE-NEXT:    [[TMP8:%.*]] = load i32, ptr [[TMP1]], align 4
+; NOPROFILE-NEXT:    br label [[TMP9]]
+; NOPROFILE:       9:
+; NOPROFILE-NEXT:    [[TMP10:%.*]] = phi i32 [ [[TMP8]], [[TMP7]] ], [ 0, [[TMP2:%.*]] ]
+; NOPROFILE-NEXT:    ret i32 [[TMP10]]
+;
+; ALL-LABEL: define dso_local noundef i32 @branchColdFnHot(
+; ALL-SAME: i32 noundef [[TMP0:%.*]], ptr noundef readonly [[TMP1:%.*]]) !prof [[PROF17]] {
+; ALL-NEXT:    [[TMP3:%.*]] = icmp eq i32 [[TMP0]], 0
+; ALL-NEXT:    br i1 [[TMP3]], label [[TMP9:%.*]], label [[TMP4:%.*]], !prof [[PROF18:![0-9]+]]
+; ALL:       4:
+; ALL-NEXT:    [[TMP5:%.*]] = icmp eq ptr [[TMP1]], null
+; ALL-NEXT:    br i1 [[TMP5]], label [[TMP6:%.*]], label [[TMP7:%.*]]
+; ALL:       6:
+; ALL-NEXT:    unreachable
+; ALL:       7:
+; ALL-NEXT:    [[TMP8:%.*]] = load i32, ptr [[TMP1]], align 4
+; ALL-NEXT:    br label [[TMP9]]
+; ALL:       9:
+; ALL-NEXT:    [[TMP10:%.*]] = phi i32 [ [[TMP8]], [[TMP7]] ], [ 0, [[TMP2:%.*]] ]
+; ALL-NEXT:    ret i32 [[TMP10]]
+;
+; HOT-LABEL: define dso_local noundef i32 @branchColdFnHot(
+; HOT-SAME: i32 noundef [[TMP0:%.*]], ptr noundef readonly [[TMP1:%.*]]) !prof [[PROF17]] {
+; HOT-NEXT:    [[TMP3:%.*]] = icmp eq i32 [[TMP0]], 0
+; HOT-NEXT:    br i1 [[TMP3]], label [[TMP9:%.*]], label [[TMP4:%.*]], !prof [[PROF18:![0-9]+]]
+; HOT:       4:
+; HOT-NEXT:    [[TMP5:%.*]] = icmp eq ptr [[TMP1]], null
+; HOT-NEXT:    br i1 [[TMP5]], label [[TMP6:%.*]], label [[TMP7:%.*]]
+; HOT:       6:
+; HOT-NEXT:    tail call void @llvm.ubsantrap(i8 22)
+; HOT-NEXT:    unreachable
+; HOT:       7:
+; HOT-NEXT:    [[TMP8:%.*]] = load i32, ptr [[TMP1]], align 4
+; HOT-NEXT:    br label [[TMP9]]
+; HOT:       9:
+; HOT-NEXT:    [[TMP10:%.*]] = phi i32 [ [[TMP8]], [[TMP7]] ], [ 0, [[TMP2:%.*]] ]
+; HOT-NEXT:    ret i32 [[TMP10]]
+;
+; HOT70-LABEL: define dso_local noundef i32 @branchColdFnHot(
+; HOT70-SAME: i32 noundef [[TMP0:%.*]], ptr noundef readonly [[TMP1:%.*]]) !prof [[PROF17]] {
+; HOT70-NEXT:    [[TMP3:%.*]] = icmp eq i32 [[TMP0]], 0
+; HOT70-NEXT:    br i1 [[TMP3]], label [[TMP9:%.*]], label [[TMP4:%.*]], !prof [[PROF18:![0-9]+]]
+; HOT70:       4:
+; HOT70-NEXT:    [[TMP5:%.*]] = icmp eq ptr [[TMP1]], null
+; HOT70-NEXT:    br i1 [[TMP5]], label [[TMP6:%.*]], label [[TMP7:%.*]]
+; HOT70:       6:
+; HOT70-NEXT:    tail call void @llvm.ubsantrap(i8 22)
+; HOT70-NEXT:    unreachable
+; HOT70:       7:
+; HOT70-NEXT:    [[TMP8:%.*]] = load i32, ptr [[TMP1]], align 4
+; HOT70-NEXT:    br label [[TMP9]]
+; HOT70:       9:
+; HOT70-NEXT:    [[TMP10:%.*]] = phi i32 [ [[TMP8]], [[TMP7]] ], [ 0, [[TMP2:%.*]] ]
+; HOT70-NEXT:    ret i32 [[TMP10]]
+;
+  %3 = icmp eq i32 %0, 0
+  br i1 %3, label %9, label %4, !prof !38
+
+4:
+  %5 = icmp eq ptr %1, null
+  br i1 %5, label %6, label %7
+
+6:
+  tail call void @llvm.ubsantrap(i8 22) #2
+  unreachable
+
+7:
+  %8 = load i32, ptr %1, align 4
+  br label %9
+
+9:
+  %10 = phi i32 [ %8, %7 ], [ 0, %2 ]
+  ret i32 %10
+}
+
+define dso_local noundef i32 @branchHotFnCold(i32 noundef %0, ptr noundef readonly %1) !prof !36 {
+; NOPROFILE-LABEL: define dso_local noundef i32 @branchHotFnCold(
+; NOPROFILE-SAME: i32 noundef [[TMP0:%.*]], ptr noundef readonly [[TMP1:%.*]]) !prof [[PROF16]] {
+; NOPROFILE-NEXT:    [[TMP3:%.*]] = icmp eq i32 [[TMP0]], 0
+; NOPROFILE-NEXT:    br i1 [[TMP3]], label [[TMP9:%.*]], label [[TMP4:%.*]], !prof [[PROF19:![0-9]+]]
+; NOPROFILE:       4:
+; NOPROFILE-NEXT:    [[TMP5:%.*]] = icmp eq ptr [[TMP1]], null
+; NOPROFILE-NEXT:    br i1 [[TMP5]], label [[TMP6:%.*]], label [[TMP7:%.*]]
+; NOPROFILE:       6:
+; NOPROFILE-NEXT:    tail call void @llvm.ubsantrap(i8 22)
+; NOPROFILE-NEXT:    unreachable
+; NOPROFILE:       7:
+; NOPROFILE-NEXT:    [[TMP8:%.*]] = load i32, ptr [[TMP1]], align 4
+; NOPROFILE-NEXT:    br label [[TMP9]]
+; NOPROFILE:       9:
+; NOPROFILE-NEXT:    [[TMP10:%.*]] = phi i32 [ [[TMP8]], [[TMP7]] ], [ 0, [[TMP2:%.*]] ]
+; NOPROFILE-NEXT:    ret i32 [[TMP10]]
+;
+; ALL-LABEL: define dso_local noundef i32 @branchHotFnCold(
+; ALL-SAME: i32 noundef [[TMP0:%.*]], ptr noundef readonly [[TMP1:%.*]]) !prof [[PROF16]] {
+; ALL-NEXT:    [[TMP3:%.*]] = icmp eq i32 [[TMP0]], 0
+; ALL-NEXT:    br i1 [[TMP3]], label [[TMP9:%.*]], label [[TMP4:%.*]], !prof [[PROF19:![0-9]+]]
+; ALL:       4:
+; ALL-NEXT:    [[TMP5:%.*]] = icmp eq ptr [[TMP1]], null
+; ALL-NEXT:    br i1 [[TMP5]], label [[TMP6:%.*]], label [[TMP7:%.*]]
+; ALL:       6:
+; ALL-NEXT:    unreachable
+; ALL:       7:
+; ALL-NEXT:    [[TMP8:%.*]] = load i32, ptr [[TMP1]], align 4
+; ALL-NEXT:    br label [[TMP9]]
+; ALL:       9:
+; ALL-NEXT:    [[TMP10:%.*]] = phi i32 [ [[TMP8]], [[TMP7]] ], [ 0, [[TMP2:%.*]] ]
+; ALL-NEXT:    ret i32 [[TMP10]]
+;
+; HOT-LABEL: define dso_local noundef i32 @branchHotFnCold(
+; HOT-SAME: i32 noundef [[TMP0:%.*]], ptr noundef readonly [[TMP1:%.*]]) !prof [[PROF16]] {
+; HOT-NEXT:    [[TMP3:%.*]] = icmp eq i32 [[TMP0]], 0
+; HOT-NEXT:    br i1 [[TMP3]], label [[TMP9:%.*]], label [[TMP4:%.*]], !prof [[PROF19:![0-9]+]]
+; HOT:       4:
+; HOT-NEXT:    [[TMP5:%.*]] = icmp eq ptr [[TMP1]], null
+; HOT-NEXT:    br i1 [[TMP5]], label [[TMP6:%.*]], label [[TMP7:%.*]]
+; HOT:       6:
+; HOT-NEXT:    unreachable
+; HOT:       7:
+; HOT-NEXT:    [[TMP8:%.*]] = load i32, ptr [[TMP1]], align 4
+; HOT-NEXT:    br label [[TMP9]]
+; HOT:       9:
+; HOT-NEXT:    [[TMP10:%.*]] = phi i32 [ [[TMP8]], [[TMP7]] ], [ 0, [[TMP2:%.*]] ]
+; HOT-NEXT:    ret i32 [[TMP10]]
+;
+; HOT70-LABEL: define dso_local noundef i32 @branchHotFnCold(
+; HOT70-SAME: i32 noundef [[TMP0:%.*]], ptr noundef readonly [[TMP1:%.*]]) !prof [[PROF16]] {
+; HOT70-NEXT:    [[TMP3:%.*]] = icmp eq i32 [[TMP0]], 0
+; HOT70-NEXT:    br i1 [[TMP3]], label [[TMP9:%.*]], label [[TMP4:%.*]], !prof [[PROF19:![0-9]+]]
+; HOT70:       4:
+; HOT70-NEXT:    [[TMP5:%.*]] = icmp eq ptr [[TMP1]], null
+; HOT70-NEXT:    br i1 [...
[truncated]
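
The profile-based part of the decision in `removeUbsanTraps` can be modelled without any LLVM types. The sketch below is an illustration only: `Block`, `predecessorCount`, `shouldRemoveTrap` and `guardedTrapIsRemoved` are hypothetical names, and the comparison against a single threshold stands in for `PSI->isHotCount`. As in the diff, the hotness of a trap is estimated from the summed counts of the predecessors of its block, and nothing is treated as hot when no profile summary is available. The pseudorandom removal path is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a basic block; not an LLVM type.
struct Block {
  std::optional<uint64_t> profileCount; // empty when the block has no count
  std::vector<const Block *> preds;     // CFG predecessors
  bool hasTrap = false;                 // contains a ubsantrap call
};

// Sum of the predecessors' counts; missing counts contribute zero.
uint64_t predecessorCount(const Block &b) {
  uint64_t sum = 0;
  for (const Block *p : b.preds)
    sum += p->profileCount.value_or(0);
  return sum;
}

// `hotThreshold` is empty when no profile summary is available; in that case
// nothing is treated as hot and the trap stays.
bool shouldRemoveTrap(const Block &b, std::optional<uint64_t> hotThreshold) {
  assert(b.hasTrap && "only meaningful for blocks that contain a trap");
  if (!hotThreshold)
    return false;
  return predecessorCount(b) >= *hotThreshold;
}

// Builds the two-block shape used by the simple tests in the diff: a guard
// block that branches to a trap block.
bool guardedTrapIsRemoved(std::optional<uint64_t> guardCount,
                          std::optional<uint64_t> hotThreshold) {
  Block guard;
  guard.profileCount = guardCount;
  Block trap;
  trap.preds.push_back(&guard);
  trap.hasTrap = true;
  return shouldRemoveTrap(trap, hotThreshold);
}
```

In this model `guardedTrapIsRemoved(1000, 100)` is true, while a cold guard (`10`) or a missing profile summary (`std::nullopt` as the threshold) keeps the trap, which mirrors the difference between the `HOT` and `NOPROFILE` expectations in the test file.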


github-actions bot commented Mar 5, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@vitalybuka vitalybuka merged commit a6a6fca into llvm:main Mar 7, 2024
4 checks passed
vitalybuka added a commit that referenced this pull request Mar 11, 2024
Together with #83471, this reduces UBSAN overhead from 44% to 6%,
measured as "Geomean difference" on "test-suite/MultiSource/Benchmarks"
with a PGO build.

On a real large server binary, 95% of the code is still instrumented,
and UBSAN overhead improves from 10% to 1.5%. We can pass this test only
with a subset of UBSAN, so the base overhead is smaller.

We have follow-up patches to improve it even further.
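
For readers unfamiliar with geomean-based overhead figures, the following is a minimal sketch of one common way to compute such a number. It assumes the overhead is the geometric mean of per-benchmark time ratios (instrumented time divided by baseline time) minus one; the exact definition used by the test-suite comparison tooling is not stated here, and `geomeanOverhead` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of per-benchmark ratios (instrumented time / baseline time),
// reported as overhead: 0.06 means 6% slower on geometric average.
double geomeanOverhead(const std::vector<double> &ratios) {
  assert(!ratios.empty() && "need at least one benchmark");
  double logSum = 0.0;
  for (double r : ratios) {
    assert(r > 0.0 && "time ratios must be positive");
    logSum += std::log(r);
  }
  return std::exp(logSum / static_cast<double>(ratios.size())) - 1.0;
}
```

Averaging in log space means a benchmark that becomes twice as slow and one that becomes twice as fast cancel out: `geomeanOverhead({2.0, 0.5})` is zero up to rounding.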