Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BOLT] Fix runOnEachFunctionWithUniqueAllocId #90039

Merged

Conversation

kbeyls
Copy link
Collaborator

@kbeyls kbeyls commented Apr 25, 2024

When runOnEachFunctionWithUniqueAllocId is invoked with ForceSequential=true, then the current implementation runs the function with AllocId==0, which is the Id for the shared, non-unique, default AnnotationAllocator.

However, the documentation for runOnEachFunctionWithUniqueAllocId states:

/// Perform the work on each BinaryFunction except those that are rejected
/// by SkipPredicate, and create a unique annotation allocator for each
/// task. This should be used whenever the work function creates annotations to
/// allow thread-safe annotation creation.

Therefore, even when ForceSequential==true, a unique AllocId should be used, i.e. different from 0.

In the current upstream BOLT this is presumably not depended on, but it is needed to reduce memory usage for analyses that use a lot of memory/annotations. Examples are the pac-ret and stack-clash analyses that currently have prototype implementations as described in https://discourse.llvm.org/t/rfc-bolt-based-binary-analysis-tool-to-verify-correctness-of-security-hardening/78148 These analyses use the DataFlowAnalysis framework to sometimes store quite a lot of information on each MCInst. They run in parallel on each function. When the dataflow analysis is finished, the annotations on each MCInst can be removed, hugely saving on memory consumption. The only annotations that need to remain are those that indicate some unexpected properties somewhere in the binary.

Fixing this bug enables implementing the deletion of the memory used by those huge number of DataFlowAnalysis annotations (by invoking BC.MIB->freeValuesAllocator(AllocatorId)), even when run with --no-threads. Without this bug fixed, the invocation of BC.MIB->freeValuesAllocator(AllocatorId) results in also the memory for all other annotations to be deleted, as AllocatorId is 0.

Copy link

github-actions bot commented Apr 25, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@kbeyls
Copy link
Collaborator Author

kbeyls commented Apr 25, 2024

The one precommit test failure on buildkite/github-pull-requests looks exactly the same as on this other recent BOLT-related PR #90004 (RISCV/fake-label-no-entry.c). Therefore, It seems that the precommit test failure is not related to the patch in this PR.

@kbeyls kbeyls force-pushed the bolt-fix-runOnEachFunctionWithUniqueAllocId branch from ce4621d to 5870590 Compare April 30, 2024 07:59
@llvmbot llvmbot added the BOLT label Apr 30, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Apr 30, 2024

@llvm/pr-subscribers-bolt

Author: Kristof Beyls (kbeyls)

Changes

When runOnEachFunctionWithUniqueAllocId is invoked with ForceSequential=true, then the current implementation runs the function with AllocId==0, which is the Id for the shared, non-unique, default AnnotationAllocator.

However, the documentation for runOnEachFunctionWithUniqueAllocId states:
/// Perform the work on each BinaryFunction except those that are rejected /// by SkipPredicate, and create a unique annotation allocator for each /// task. This should be used whenever the work function creates annotations to /// allow thread-safe annotation creation.

Therefore, even when ForceSequential==true, a unique AllocId should be used, i.e. different from 0.

In the current upstream BOLT this is presumably not depended on, but it is needed to reduce memory usage for analyses that use a lot of memory/annotations. Examples are the pac-ret and stack-clash analyses that currently have prototype implementations as described in https://discourse.llvm.org/t/rfc-bolt-based-binary-analysis-tool-to-verify-correctness-of-security-hardening/78148 These analyses use the DataFlowAnalysis framework to sometimes store quite a lot of information on each MCInst. They run in parallel on each function. When the dataflow analysis is finished, the annotations on each MCInst can be removed, hugely saving on memory consumption. The only annotations that need to remain are those that indicate some unexpected properties somewhere in the binary.

Fixing this bug enables implementing the deletion of the memory used by those huge number of DataFlowAnalysis annotations (by invoking BC.MIB->freeValuesAllocator(AllocatorId)), even when run with --no-threads. Without this bug fixed, the invocation of BC.MIB->freeValuesAllocator(AllocatorId) results in also the memory for all other annotations to be deleted, as AllocatorId is 0.


Full diff: https://github.com/llvm/llvm-project/pull/90039.diff

1 Files Affected:

  • (modified) bolt/lib/Core/ParallelUtilities.cpp (+15-8)
diff --git a/bolt/lib/Core/ParallelUtilities.cpp b/bolt/lib/Core/ParallelUtilities.cpp
index 5f5e96e0e7881c..8e5f2f0f1f23d3 100644
--- a/bolt/lib/Core/ParallelUtilities.cpp
+++ b/bolt/lib/Core/ParallelUtilities.cpp
@@ -164,6 +164,15 @@ void runOnEachFunction(BinaryContext &BC, SchedulingPolicy SchedPolicy,
   Pool.wait();
 }
 
+static void EnsureAllocatorExists(BinaryContext &BC, unsigned AllocId) {
+  if (!BC.MIB->checkAllocatorExists(AllocId)) {
+    MCPlusBuilder::AllocatorIdTy Id =
+        BC.MIB->initializeNewAnnotationAllocator();
+    (void)Id;
+    assert(AllocId == Id && "unexpected allocator id created");
+  }
+}
+
 void runOnEachFunctionWithUniqueAllocId(
     BinaryContext &BC, SchedulingPolicy SchedPolicy,
     WorkFuncWithAllocTy WorkFunction, PredicateTy SkipPredicate,
@@ -188,8 +197,12 @@ void runOnEachFunctionWithUniqueAllocId(
     LLVM_DEBUG(T.stopTimer());
   };
 
+  unsigned AllocId = 1;
+
   if (opts::NoThreads || ForceSequential) {
-    runBlock(BC.getBinaryFunctions().begin(), BC.getBinaryFunctions().end(), 0);
+    EnsureAllocatorExists(BC, AllocId);
+    runBlock(BC.getBinaryFunctions().begin(), BC.getBinaryFunctions().end(),
+             AllocId);
     return;
   }
   // This lock is used to postpone task execution
@@ -205,19 +218,13 @@ void runOnEachFunctionWithUniqueAllocId(
   ThreadPoolInterface &Pool = getThreadPool();
   auto BlockBegin = BC.getBinaryFunctions().begin();
   unsigned CurrentCost = 0;
-  unsigned AllocId = 1;
   for (auto It = BC.getBinaryFunctions().begin();
        It != BC.getBinaryFunctions().end(); ++It) {
     BinaryFunction &BF = It->second;
     CurrentCost += computeCostFor(BF, SkipPredicate, SchedPolicy);
 
     if (CurrentCost >= BlockCost) {
-      if (!BC.MIB->checkAllocatorExists(AllocId)) {
-        MCPlusBuilder::AllocatorIdTy Id =
-            BC.MIB->initializeNewAnnotationAllocator();
-        (void)Id;
-        assert(AllocId == Id && "unexpected allocator id created");
-      }
+      EnsureAllocatorExists(BC, AllocId);
       Pool.async(runBlock, BlockBegin, std::next(It), AllocId);
       AllocId++;
       BlockBegin = std::next(It);

When runOnEachFunctionWithUniqueAllocId is invoked with
ForceSequential=true, then the current implementation runs the function
with AllocId==0, which is the Id for the shared, non-unique, default
AnnotationAllocator.

However, the documentation for runOnEachFunctionWithUniqueAllocId
states:
/// Perform the work on each BinaryFunction except those that are rejected
/// by SkipPredicate, and create a unique annotation allocator for each
/// task. This should be used whenever the work function creates annotations to
/// allow thread-safe annotation creation.

Therefore, even when ForceSequential==true, a unique AllocId should be
used, i.e. different from 0.

In the current upstream BOLT this is presumably not depended on, but it
is needed to reduce memory usage for analyses that use a lot of
memory/annotations. Examples are the pac-ret and stack-clash analyses
that currently have prototype implementations as described in
https://discourse.llvm.org/t/rfc-bolt-based-binary-analysis-tool-to-verify-correctness-of-security-hardening/78148
These analyses use the DataFlowAnalysis framework to sometimes store
quite a lot of information on each MCInst. They run in parallel on each
function. When the dataflow analysis is finished, the annotations on
each MCInst can be removed, hugely saving on memory consumption.
The only annotations that need to remain are those that indicate some
unexpected properties somewhere in the binary.

Fixing this bug enables implementing the deletion of the memory used by
those huge number of DataFlowAnalysis annotations (by invoking
BC.MIB->freeValuesAllocator(AllocatorId)), even when run with
--no-threads. Without this bug fixed, the invocation of
BC.MIB->freeValuesAllocator(AllocatorId) results in also the memory
for all other annotations to be deleted, as AllocatorId is 0.
@kbeyls kbeyls force-pushed the bolt-fix-runOnEachFunctionWithUniqueAllocId branch from 5870590 to 47f2dbf Compare April 30, 2024 10:15
Copy link
Contributor

@maksfb maksfb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix. Looks good overall.

bolt/lib/Core/ParallelUtilities.cpp Outdated Show resolved Hide resolved
bolt/lib/Core/ParallelUtilities.cpp Outdated Show resolved Hide resolved
kbeyls and others added 3 commits May 3, 2024 11:51
Co-authored-by: Maksim Panchenko <maks@meta.com>
Co-authored-by: Maksim Panchenko <maks@meta.com>
@kbeyls kbeyls merged commit 554459a into llvm:main May 4, 2024
4 checks passed
@kbeyls kbeyls deleted the bolt-fix-runOnEachFunctionWithUniqueAllocId branch May 4, 2024 07:26
kbeyls added a commit to kbeyls/llvm-project that referenced this pull request Jun 5, 2024
When runOnEachFunctionWithUniqueAllocId is invoked with
ForceSequential=true, then the current implementation runs the function
with AllocId==0, which is the Id for the shared, non-unique, default
AnnotationAllocator.

However, the documentation for runOnEachFunctionWithUniqueAllocId
states:
```
/// Perform the work on each BinaryFunction except those that are rejected
/// by SkipPredicate, and create a unique annotation allocator for each
/// task. This should be used whenever the work function creates annotations to
/// allow thread-safe annotation creation.
```

Therefore, even when ForceSequential==true, a unique AllocId should be
used, i.e. different from 0.

In the current upstream BOLT this is presumably not depended on, but it
is needed to reduce memory usage for analyses that use a lot of
memory/annotations. Examples are the pac-ret and stack-clash analyses
that currently have prototype implementations as described in
https://discourse.llvm.org/t/rfc-bolt-based-binary-analysis-tool-to-verify-correctness-of-security-hardening/78148
These analyses use the DataFlowAnalysis framework to sometimes store
quite a lot of information on each MCInst. They run in parallel on each
function. When the dataflow analysis is finished, the annotations on
each MCInst can be removed, hugely saving on memory consumption. The
only annotations that need to remain are those that indicate some
unexpected properties somewhere in the binary.

Fixing this bug enables implementing the deletion of the memory used by
those huge number of DataFlowAnalysis annotations (by invoking
BC.MIB->freeValuesAllocator(AllocatorId)), even when run with
--no-threads. Without this bug fixed, the invocation of
BC.MIB->freeValuesAllocator(AllocatorId) results in also the memory for
all other annotations to be deleted, as AllocatorId is 0.

---------

Co-authored-by: Maksim Panchenko <maks@meta.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants