Implement reserveAllocationSpace for SectionMemoryManager #71968
base: main
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter.
```cpp
    return;

  // Get space required for each section.
  uint64_t RequiredCodeSize = requiredPageSize(CodeSize, CodeAlign);
```
I have concerns about this and line 46:

```cpp
uintptr_t RequiredSize = Alignment * ((Size + Alignment - 1) / Alignment + 1);
```

That calculation buffers by one extra Alignment, to ensure we always reserve enough space to align the start address. It's inefficient, but not terribly so.

If this sums up many calls to allocateCodeSection/allocateDataSection, does the caller also sum appropriately to cover those buffers?
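As a standalone sketch (hypothetical helper name), the quoted formula reserves one extra Alignment per request; note that summing the raw sizes of several requests undercounts, since each call carries its own buffer:

```cpp
#include <cstdint>

// Buffered reservation size from the quoted line: align Size up, then add
// one extra Alignment so the start address can always be aligned in-place.
constexpr std::uintptr_t requiredSize(std::uintptr_t Size,
                                      std::uintptr_t Alignment) {
  return Alignment * ((Size + Alignment - 1) / Alignment + 1);
}
```

For example, two separate 100-byte requests at alignment 16 reserve more in total than a single 200-byte request would, which is why a caller summing section sizes must sum the buffered sizes, not the raw ones.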
I've tested a fairly similar change (based on these changes) in numba/llvmlite#1009, and the only issue I came across was that the alignment requested for the code segment while reserving the allocation can be smaller than the alignment requested when actually allocating it. This is because the alignment at allocation time also takes into account the alignment of the stub (from here).

The ultimate effect is that the preallocation can sometimes be too small for the later allocation if the code size is right up close to a boundary (a page-size boundary?). For example, I saw a preallocation request for 16380 bytes with alignment 4 resulting in an actual preallocation of 16384 bytes, then a later code allocation for 16379 bytes with alignment 8, which ended up trying to use 16392 bytes, slightly larger than the preallocation, and failing.

I hacked around this by increasing the code alignment to 8 if it was less than 8 (simply because that seems to be the biggest stub alignment potentially used across all targets), and that seemed to resolve the issue. Ideally I would have queried the stub alignment, but I don't think there's an easy way to do that from within an RTDyldMemoryManager.

Perhaps the most correct fix is for RuntimeDyldImpl::computeTotalAllocSize() to take the stub alignment into consideration when computing the code segment alignment?

Also, I should have said earlier: many thanks for all your efforts and the trouble you've gone to in putting this together. It is certainly looking promising from the perspective of Numba on AArch64 platforms!
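The numbers in the failure described above can be reproduced directly. In this sketch, `roundToPage` stands in for the preallocation's page rounding and `requiredSize` for the buffered allocation-time computation; both names are hypothetical, not llvmlite's actual code:

```cpp
#include <cstdint>

// Preallocation rounds the requested size up to page granularity.
constexpr std::uintptr_t roundToPage(std::uintptr_t Size,
                                     std::uintptr_t PageSize) {
  return (Size + PageSize - 1) / PageSize * PageSize;
}

// Allocation time uses the buffered formula with the (larger) stub alignment.
constexpr std::uintptr_t requiredSize(std::uintptr_t Size,
                                      std::uintptr_t Alignment) {
  return Alignment * ((Size + Alignment - 1) / Alignment + 1);
}
```

With 4K pages, a 16380-byte reservation request at alignment 4 yields a 16384-byte preallocation, while the later 16379-byte allocation at stub alignment 8 needs 16392 bytes, exceeding the preallocation and failing, exactly the boundary case reported.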
```cpp
void SectionMemoryManager::reserveAllocationSpace(
    uintptr_t CodeSize, Align CodeAlign, uintptr_t RODataSize,
    Align RODataAlign, uintptr_t RWDataSize, Align RWDataAlign) {
  if (CodeSize == 0 && RODataSize == 0 && RWDataSize == 0)
```
Perhaps we should clear the FreeMem vectors before returning early here? Otherwise, we could reserve no space but still have some free memory left over from a previous reservation, and then end up using that to satisfy a later erroneous request instead of hitting the assert that fires when we try to allocate memory without having reserved it first.
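A minimal self-contained sketch of that suggestion, using mock types rather than LLVM's actual SectionMemoryManager (the real class keeps a FreeMem list inside each memory group):

```cpp
#include <cstdint>
#include <vector>

// Mock of the relevant SectionMemoryManager state (hypothetical types).
struct FreeBlock {
  std::uintptr_t Start;
  std::uintptr_t Size;
};
struct MemoryGroup {
  std::vector<FreeBlock> FreeMem;
};

struct MockSectionMemoryManager {
  MemoryGroup CodeMem, RODataMem, RWDataMem;

  void reserveAllocationSpace(std::uintptr_t CodeSize,
                              std::uintptr_t RODataSize,
                              std::uintptr_t RWDataSize) {
    if (CodeSize == 0 && RODataSize == 0 && RWDataSize == 0) {
      // Suggested fix: drop free blocks left over from a previous
      // reservation, so a later allocation that was never reserved for
      // hits the "no reservation" assert instead of silently reusing them.
      CodeMem.FreeMem.clear();
      RODataMem.FreeMem.clear();
      RWDataMem.FreeMem.clear();
      return;
    }
    // ... the real implementation would reserve one contiguous block here ...
  }
};
```

After a zero-size reservation request, any stale free blocks are gone, so an unreserved allocation cannot accidentally succeed.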
Or, is this OK to return early without clearing free memory, on the same grounds that we reuse a previous contiguous block below if there's enough space in the existing reservation?
Erroneous requests would negate any benefits from using reserveAllocationSpace. And yes, if I removed this early return it would still return early when checking for sufficient existing free space to satisfy the whole request.
This is done now, I think? Or are you planning to add more tests?
Implements `reserveAllocationSpace` and provides an option to enable `needsToReserveAllocationSpace` for large-memory environments with AArch64.

The [AArch64 ABI](https://github.com/ARM-software/abi-aa/blob/main/sysvabi64/sysvabi64.rst) limits the distance between sections, because the instructions that reference them can only address offsets of 2 or 4 GB. Allocating sections in multiple blocks can result in distances greater than that on systems with lots of memory. In those environments, several projects using SectionMemoryManager with MCJIT have run across assertion failures for the R_AARCH64_ADR_PREL_PG_HI21 relocation as it attempts to address across distances greater than 2 GB (an int32).

Fixes llvm#71963 by allocating all sections in a single contiguous memory allocation, limiting the distance required for instruction offsets, similar to how pre-compiled binaries are loaded into memory. Does not change the default behavior of SectionMemoryManager.
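The core idea of the fix, a single contiguous reservation sized for all three segments, can be sketched like this (helper names are hypothetical; the actual patch also pads for alignment and queries the real system page size):

```cpp
#include <cstdint>

// Round a segment size up to page granularity (hypothetical helper).
constexpr std::uintptr_t roundToPage(std::uintptr_t Size,
                                     std::uintptr_t PageSize) {
  return (Size + PageSize - 1) / PageSize * PageSize;
}

// Total size of the single contiguous block to reserve. With all three
// segments carved from one allocation, no pair of sections can end up
// farther apart than the reservation itself, which keeps ADRP-style
// relocations such as R_AARCH64_ADR_PREL_PG_HI21 in range.
constexpr std::uintptr_t totalReservation(std::uintptr_t CodeSize,
                                          std::uintptr_t RODataSize,
                                          std::uintptr_t RWDataSize,
                                          std::uintptr_t PageSize) {
  return roundToPage(CodeSize, PageSize) + roundToPage(RODataSize, PageSize) +
         roundToPage(RWDataSize, PageSize);
}
```

Each segment still starts on its own page so permissions (RX vs RO vs RW) can be applied per segment, but all pages come from one mapping.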
The implementation of `reserveAllocationSpace()` now more closely follows that in llvm/llvm-project#71968, following some changes made there. The changes here include:

- Improved readability of debugging output.
- Using a default alignment of 8 in `allocateSection()` to match the default alignment provided by the stub alignment during preallocation.
- Replacing the "bespoke" `requiredPageSize()` function with computations using the LLVM `alignTo()` function.
- Returning early from preallocation when no space is requested.
- Reusing existing preallocations if there is enough space left over from the previous preallocation for all the required segments. This can happen quite frequently, because allocations for each segment get rounded up to page sizes (usually either 4K or 16K), and many Numba-jitted functions require a lot less than this.
- Removal of setting the "near" hints for memory blocks; this doesn't really have any use when all memory is preallocated, and thus forced to be "near" to other memory.
- Addition of extra asserts to validate the alignment of allocated sections.
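To illustrate the `alignTo()` substitution mentioned in that list, the bespoke page-size computation reduces to a standard round-up. Shown here as a local stand-in with the same semantics as LLVM's `llvm::alignTo` for power-of-two alignments (an assumption of this sketch, not the exact library code):

```cpp
#include <cstdint>

// Local stand-in for llvm::alignTo: round Value up to the next multiple of
// Align. This bit-mask form requires Align to be a power of two.
constexpr std::uintptr_t alignTo(std::uintptr_t Value, std::uintptr_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}
```

With a 16K page size, `alignTo(16380, 16384)` gives 16384, the same result the removed `requiredPageSize()` helper was computing by hand.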
Now that numba/llvmlite#1009 (which essentially implements the same changes made here, just inside llvmlite) is merged and an RC has been produced, I've received various reports that this fixes issues related to #71963 on both macOS and Linux AArch64 systems, and no reports of adverse effects - so FWIW, my confidence in this patch being a good fix is quite high.
Diffs originated from llvm/llvm-project#71968 and were modified to target the specific version of LLVM in use by CUDA Quantum (16.0.6).
Implements `reserveAllocationSpace` and provides an option to enable `needsToReserveAllocationSpace` for large-memory environments with AArch64.

The AArch64 ABI has restrictions on the distance between TEXT and GOT sections, as the instructions to reference them are limited to 2 or 4 GB. Allocating sections in multiple blocks can result in distances greater than that on systems with lots of memory. In those environments, several projects using SectionMemoryManager with MCJIT have run across assertion failures for the R_AARCH64_ADR_PREL_PG_HI21 relocation as it attempts to address across distances greater than 2 GB (an int32).

Fixes #71963 by allocating all sections in a single contiguous memory allocation, limiting the distance required for instruction offsets, similar to how pre-compiled binaries are loaded into memory.