Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added feature in llvm-profdata merge to filter functions from the profile #78378

Merged
merged 3 commits into from
Jan 23, 2024

Conversation

huangjd
Copy link
Contributor

@huangjd huangjd commented Jan 17, 2024

--function=<regex> Include functions matching regex in the output
--no-function=<regex> Exclude functions matching regex from the output

If both are specified, --no-function has a higher precedence if a function name matches both filters

profile

--function=<regex> Include functions matching regex in the output
--no-function=<regex> Exclude functions matching regex from the output

If both are specified, --no-function has a higher precedence if a
function name matches both regexes
@huangjd huangjd added the PGO Profile Guided Optimizations label Jan 17, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 17, 2024

@llvm/pr-subscribers-llvm-binary-utilities

@llvm/pr-subscribers-pgo

Author: William Junda Huang (huangjd)

Changes

--function=&lt;regex&gt; Include functions matching regex in the output
--no-function=&lt;regex&gt; Exclude functions matching regex from the output

If both are specified, --no-function has a higher precedence if a function name matches both filters


Full diff: https://github.com/llvm/llvm-project/pull/78378.diff

3 Files Affected:

  • (modified) llvm/include/llvm/ProfileData/SampleProf.h (+2)
  • (added) llvm/test/tools/llvm-profdata/merge-filter.test (+62)
  • (modified) llvm/tools/llvm-profdata/llvm-profdata.cpp (+53-3)
diff --git a/llvm/include/llvm/ProfileData/SampleProf.h b/llvm/include/llvm/ProfileData/SampleProf.h
index 66aaf602d0e1d9..8ac84d4b933f20 100644
--- a/llvm/include/llvm/ProfileData/SampleProf.h
+++ b/llvm/include/llvm/ProfileData/SampleProf.h
@@ -1330,6 +1330,8 @@ class SampleProfileMap
   }
 
   size_t erase(const key_type &Key) { return base_type::erase(Key); }
+
+  iterator erase(iterator It) { return base_type::erase(It); }
 };
 
 using NameFunctionSamples = std::pair<hash_code, const FunctionSamples *>;
diff --git a/llvm/test/tools/llvm-profdata/merge-filter.test b/llvm/test/tools/llvm-profdata/merge-filter.test
new file mode 100644
index 00000000000000..8b24cbd6d54110
--- /dev/null
+++ b/llvm/test/tools/llvm-profdata/merge-filter.test
@@ -0,0 +1,62 @@
+Test llvm-profdata merge with function filters.
+
+RUN: llvm-profdata merge --sample %p/Inputs/sample-profile.proftext --text --function="_Z3.*" | FileCheck %s --check-prefix=CHECK-FILTER1
+RUN: llvm-profdata merge --sample %p/Inputs/sample-profile.proftext --text --no-function="main" | FileCheck %s --check-prefix=CHECK-FILTER1
+CHECK-FILTER1: _Z3bari:20301:1437
+CHECK-NEXT:  1: 1437
+CHECK-NEXT: _Z3fooi:7711:610
+CHECK-NEXT:  1: 610
+CHECK-NOT: main
+
+RUN: llvm-profdata merge --sample %p/Inputs/sample-profile.proftext --text --function="_Z3.*" --no-function="fooi$" | FileCheck %s --check-prefix=CHECK-FILTER2
+CHECK-FILTER2: _Z3bari:20301:1437
+CHECK-NEXT:  1: 1437
+CHECK-NOT: main
+CHECK-NOT: _Z3fooi
+
+RUN: llvm-profdata merge --instr %p/Inputs/basic.proftext --text --function="foo" | FileCheck %s --check-prefix=CHECK-FILTER3
+RUN: llvm-profdata merge --instr %p/Inputs/basic.proftext --text --no-function="main" | FileCheck %s --check-prefix=CHECK-FILTER3
+CHECK-FILTER3: foo
+CHECK-NEXT: # Func Hash:
+CHECK-NEXT: 10
+CHECK-NEXT: # Num Counters:
+CHECK-NEXT: 2
+CHECK-NEXT: # Counter Values:
+CHECK-NEXT: 499500
+CHECK-NEXT: 179900
+CHECK-NEXT: 
+CHECK-NEXT: foo2
+CHECK-NEXT: # Func Hash:
+CHECK-NEXT: 10
+CHECK-NEXT: # Num Counters:
+CHECK-NEXT: 2
+CHECK-NEXT: # Counter Values:
+CHECK-NEXT: 500500
+CHECK-NEXT: 180100
+
+RUN: llvm-profdata merge --instr %p/Inputs/basic.proftext --text --function="foo" --no-function="^foo$" | FileCheck %s --check-prefix=CHECK-FILTER4
+CHECK-FILTER4: foo2
+CHECK-NEXT: # Func Hash:
+CHECK-NEXT: 10
+CHECK-NEXT: # Num Counters:
+CHECK-NEXT: 2
+CHECK-NEXT: # Counter Values:
+CHECK-NEXT: 500500
+CHECK-NEXT: 180100
+
+RUN: llvm-profdata merge --sample %p/Inputs/cs-sample.proftext --text --function="main.*@.*_Z5funcBi" | FileCheck %s --check-prefix=CHECK-FILTER5
+CHECK-FILTER5: [main:3.1 @ _Z5funcBi:1 @ _Z8funcLeafi]:500853:20
+CHECK-NEXT:  0: 15
+CHECK-NEXT:  1: 15
+CHECK-NEXT:  3: 74946
+CHECK-NEXT:  4: 74941 _Z3fibi:82359
+CHECK-NEXT:  10: 23324
+CHECK-NEXT:  11: 23327 _Z3fibi:25228
+CHECK-NEXT:  15: 11
+CHECK-NEXT:  !Attributes: 1
+CHECK-NEXT: [main:3.1 @ _Z5funcBi]:120:19
+CHECK-NEXT:  0: 19
+CHECK-NEXT:  1: 19 _Z8funcLeafi:20
+CHECK-NEXT:  3: 12
+CHECK-NEXT:  !Attributes: 1
+
diff --git a/llvm/tools/llvm-profdata/llvm-profdata.cpp b/llvm/tools/llvm-profdata/llvm-profdata.cpp
index 12b81d411cfa91..4c7bf90adb9226 100644
--- a/llvm/tools/llvm-profdata/llvm-profdata.cpp
+++ b/llvm/tools/llvm-profdata/llvm-profdata.cpp
@@ -35,6 +35,7 @@
 #include "llvm/Support/MD5.h"
 #include "llvm/Support/MemoryBuffer.h"
 #include "llvm/Support/Path.h"
+#include "llvm/Support/Regex.h"
 #include "llvm/Support/ThreadPool.h"
 #include "llvm/Support/Threading.h"
 #include "llvm/Support/VirtualFileSystem.h"
@@ -132,9 +133,11 @@ cl::opt<std::string>
                    cl::sub(MergeSubcommand));
 cl::opt<std::string> FuncNameFilter(
     "function",
-    cl::desc("Details for matching functions. For overlapping CSSPGO, this "
-             "takes a function name with calling context."),
-    cl::sub(ShowSubcommand), cl::sub(OverlapSubcommand));
+    cl::desc("Only functions matching the filter are shown in the output. For "
+             "overlapping CSSPGO, this takes a function name with calling "
+             "context."),
+    cl::sub(ShowSubcommand), cl::sub(OverlapSubcommand),
+    cl::sub(MergeSubcommand));
 
 // TODO: Consider creating a template class (e.g., MergeOption, ShowOption) to
 // factor out the common cl::sub in cl::opt constructor for subcommand-specific
@@ -244,6 +247,10 @@ cl::opt<uint64_t> TemporalProfMaxTraceLength(
     cl::sub(MergeSubcommand),
     cl::desc("The maximum length of a single temporal profile trace "
              "(default: 10000)"));
+cl::opt<std::string> FuncNameNegativeFilter(
+    "no-function", cl::init(""),
+    cl::sub(MergeSubcommand),
+    cl::desc("Exclude functions matching the filter from the output."));
 
 cl::opt<FailureMode>
     FailMode("failure-mode", cl::init(failIfAnyAreInvalid),
@@ -760,6 +767,45 @@ static void mergeWriterContexts(WriterContext *Dst, WriterContext *Src) {
   });
 }
 
+static StringRef
+getFuncName(const StringMap<InstrProfWriter::ProfilingData>::value_type &Val) {
+  return Val.first();
+}
+
+static std::string
+getFuncName(const SampleProfileMap::value_type &Val) {
+  return Val.second.getContext().toString();
+}
+
+template <typename T>
+static void filterFunctions(T &ProfileMap) {
+  bool hasFilter = !FuncNameFilter.empty();
+  bool hasNegativeFilter = !FuncNameNegativeFilter.empty();
+  if (hasFilter || hasNegativeFilter) {
+    size_t Count = ProfileMap.size();
+
+    llvm::Regex Pattern(FuncNameFilter);
+    llvm::Regex NegativePattern(FuncNameNegativeFilter);
+    std::string Error;
+    if (hasFilter && !Pattern.isValid(Error))
+      exitWithError(Error);
+    if (hasNegativeFilter && !NegativePattern.isValid(Error))
+      exitWithError(Error);
+
+    for (auto I = ProfileMap.begin(); I != ProfileMap.end();) {
+      auto Tmp = I++;
+      const auto &FuncName = getFuncName(*Tmp);
+      // Negative filter has higher precedence than positive filter.
+      if ((hasNegativeFilter && NegativePattern.match(FuncName)) ||
+          (hasFilter && !Pattern.match(FuncName)))
+        ProfileMap.erase(Tmp);
+    }
+
+    llvm::dbgs() << Count - ProfileMap.size() << " of " << Count << " functions"
+                 << " in the original profile are filtered.\n";
+  }
+}
+
 static void writeInstrProfile(StringRef OutputFilename,
                               ProfileFormat OutputFormat,
                               InstrProfWriter &Writer) {
@@ -879,6 +925,8 @@ static void mergeInstrProfile(const WeightedFileVector &Inputs,
       (NumErrors > 0 && FailMode == failIfAnyAreInvalid))
     exitWithError("no profile can be merged");
 
+  filterFunctions(Contexts[0]->Writer.getProfileData());
+
   writeInstrProfile(OutputFilename, OutputFormat, Contexts[0]->Writer);
 }
 
@@ -1459,6 +1507,8 @@ static void mergeSampleProfile(const WeightedFileVector &Inputs,
     ProfileIsCS = FunctionSamples::ProfileIsCS = false;
   }
 
+  filterFunctions(ProfileMap);
+
   auto WriterOrErr =
       SampleProfileWriter::create(OutputFilename, FormatMap[OutputFormat]);
   if (std::error_code EC = WriterOrErr.getError())

Copy link

github-actions bot commented Jan 17, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff a01b58aef0e42fb1b52e358adf4c56678a884d37 979e27ab6f4e23522c40a8bdbf5ccda48b2a3067 -- llvm/include/llvm/ProfileData/SampleProf.h llvm/tools/llvm-profdata/llvm-profdata.cpp
View the diff from clang-format here.
diff --git a/llvm/tools/llvm-profdata/llvm-profdata.cpp b/llvm/tools/llvm-profdata/llvm-profdata.cpp
index d00ce49701..4d8c8126fa 100644
--- a/llvm/tools/llvm-profdata/llvm-profdata.cpp
+++ b/llvm/tools/llvm-profdata/llvm-profdata.cpp
@@ -248,8 +248,7 @@ cl::opt<uint64_t> TemporalProfMaxTraceLength(
     cl::desc("The maximum length of a single temporal profile trace "
              "(default: 10000)"));
 cl::opt<std::string> FuncNameNegativeFilter(
-    "no-function", cl::init(""),
-    cl::sub(MergeSubcommand),
+    "no-function", cl::init(""), cl::sub(MergeSubcommand),
     cl::desc("Exclude functions matching the filter from the output."));
 
 cl::opt<FailureMode>
@@ -772,13 +771,11 @@ getFuncName(const StringMap<InstrProfWriter::ProfilingData>::value_type &Val) {
   return Val.first();
 }
 
-static std::string
-getFuncName(const SampleProfileMap::value_type &Val) {
+static std::string getFuncName(const SampleProfileMap::value_type &Val) {
   return Val.second.getContext().toString();
 }
 
-template <typename T>
-static void filterFunctions(T &ProfileMap) {
+template <typename T> static void filterFunctions(T &ProfileMap) {
   bool hasFilter = !FuncNameFilter.empty();
   bool hasNegativeFilter = !FuncNameNegativeFilter.empty();
   if (!hasFilter && !hasNegativeFilter)

@huangjd
Copy link
Contributor Author

huangjd commented Jan 17, 2024

Purpose of this feature is to manually exclude certain functions from the profile for investigating compilation bug caused by the profile, or using this as a workaround

@wlei-llvm
Copy link
Contributor

Thanks for the change.
IIUC, it only supports filtering top-level functions, wondering if it's also useful for excluding the inlined(nested children) functions? thinking of some "investigating compilation bug" scenario, it might be necessary to ensure all the function copies are not loaded with a profile.

Copy link
Contributor

@david-xl david-xl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please also update llvm-profdata.rst file in CommandGuide.

static void filterFunctions(T &ProfileMap) {
bool hasFilter = !FuncNameFilter.empty();
bool hasNegativeFilter = !FuncNameNegativeFilter.empty();
if (hasFilter || hasNegativeFilter) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use early return to avoid the nesting.

Added documentation
@huangjd
Copy link
Contributor Author

huangjd commented Jan 22, 2024

Thanks for the change. IIUC, it only supports filtering top-level functions, wondering if it's also useful for excluding the inlined(nested children) functions? thinking of some "investigating compilation bug" scenario, it might be necessary to ensure all the function copies are not loaded with a profile.

When specifying a positive match filter, this will basically strip all inlinees of a matched function because its inlinees are likely to have different names so they will not be matched.

We will need another way to specify the filter for inlined functions, as it is probably not intuitive to make the logic of positive and negative filter unexpectedly different.

@huangjd huangjd merged commit 2b8649f into llvm:main Jan 23, 2024
3 of 5 checks passed
@huangjd huangjd deleted the llvmprofdata-filter branch January 23, 2024 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:binary-utilities PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants