Implement reserveAllocationSpace for SectionMemoryManager #71968
base: main
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter.
```cpp
    return;

  // Get space required for each section.
  uint64_t RequiredCodeSize = requiredPageSize(CodeSize, CodeAlign);
```
I have concerns about this and line 46:

```cpp
uintptr_t RequiredSize = Alignment * ((Size + Alignment - 1) / Alignment + 1);
```

That calculation buffers by one extra Alignment, to ensure we always reserve enough space to align the start address. It's inefficient, but not terribly so.

If this sums up many calls to allocateCodeSection/allocateDataSection, does the caller also sum appropriately to cover those buffers?
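As a standalone sketch (hypothetical helper name), the quoted formula reserves one extra Alignment per request; note that summing the raw sizes of several requests undercounts, since each call carries its own buffer:

```cpp
#include <cstdint>

// Buffered reservation size from the quoted line: align Size up, then add
// one extra Alignment so the start address can always be aligned in-place.
constexpr std::uintptr_t requiredSize(std::uintptr_t Size,
                                      std::uintptr_t Alignment) {
  return Alignment * ((Size + Alignment - 1) / Alignment + 1);
}
```

For example, two separate 100-byte requests at alignment 16 reserve more in total than a single 200-byte request would, which is why a caller summing section sizes must sum the buffered sizes, not the raw ones.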
I've tested a fairly similar change (based on these changes) in numba/llvmlite#1009, and the only issue I came across was that the alignment requested for the code segment while reserving the allocation can be smaller than the alignment requested when actually allocating it. This is because the alignment at allocation time also takes into account the alignment of the stub (from here).

The ultimate effect is that the preallocation can sometimes be too small for the later allocation if the code size is right up close to a boundary (a page-size boundary?). For example, I saw a preallocation request for 16380 bytes with alignment 4 resulting in an actual preallocation of 16384 bytes, then a later code allocation for 16379 bytes with alignment 8, which ended up trying to use 16392 bytes, slightly larger than the preallocation, and failing.

I hacked around this by increasing the code alignment to 8 if it was less than 8 (simply because that seems to be the biggest stub alignment potentially used across all targets), and that seemed to resolve the issue. Ideally I would have queried the stub alignment, but I don't think there's an easy way to do that from within an RTDyldMemoryManager.

Perhaps the most correct fix is for RuntimeDyldImpl::computeTotalAllocSize() to take the stub alignment into consideration when computing the code segment alignment?

Also, I should have said earlier: many thanks for all your efforts and the trouble you've gone to in putting this together. It is certainly looking promising from the perspective of Numba on AArch64 platforms!
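The numbers in the failure described above can be reproduced directly. In this sketch, `roundToPage` stands in for the preallocation's page rounding and `requiredSize` for the buffered allocation-time computation; both names are hypothetical, not llvmlite's actual code:

```cpp
#include <cstdint>

// Preallocation rounds the requested size up to page granularity.
constexpr std::uintptr_t roundToPage(std::uintptr_t Size,
                                     std::uintptr_t PageSize) {
  return (Size + PageSize - 1) / PageSize * PageSize;
}

// Allocation time uses the buffered formula with the (larger) stub alignment.
constexpr std::uintptr_t requiredSize(std::uintptr_t Size,
                                      std::uintptr_t Alignment) {
  return Alignment * ((Size + Alignment - 1) / Alignment + 1);
}
```

With 4K pages, a 16380-byte reservation request at alignment 4 yields a 16384-byte preallocation, while the later 16379-byte allocation at stub alignment 8 needs 16392 bytes, exceeding the preallocation and failing, exactly the boundary case reported.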
```cpp
void SectionMemoryManager::reserveAllocationSpace(
    uintptr_t CodeSize, Align CodeAlign, uintptr_t RODataSize,
    Align RODataAlign, uintptr_t RWDataSize, Align RWDataAlign) {
  if (CodeSize == 0 && RODataSize == 0 && RWDataSize == 0)
```
Perhaps we should clear the FreeMem vectors before returning early here? Otherwise, we could reserve no space but still have some free memory left over from a previous reservation, and then end up using that to satisfy a later erroneous request instead of hitting the assert that fires when we try to allocate memory without having reserved it first.
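A minimal self-contained sketch of that suggestion, using mock types rather than LLVM's actual SectionMemoryManager (the real class keeps a FreeMem list inside each memory group):

```cpp
#include <cstdint>
#include <vector>

// Mock of the relevant SectionMemoryManager state (hypothetical types).
struct FreeBlock {
  std::uintptr_t Start;
  std::uintptr_t Size;
};
struct MemoryGroup {
  std::vector<FreeBlock> FreeMem;
};

struct MockSectionMemoryManager {
  MemoryGroup CodeMem, RODataMem, RWDataMem;

  void reserveAllocationSpace(std::uintptr_t CodeSize,
                              std::uintptr_t RODataSize,
                              std::uintptr_t RWDataSize) {
    if (CodeSize == 0 && RODataSize == 0 && RWDataSize == 0) {
      // Suggested fix: drop free blocks left over from a previous
      // reservation, so a later allocation that was never reserved for
      // hits the "no reservation" assert instead of silently reusing them.
      CodeMem.FreeMem.clear();
      RODataMem.FreeMem.clear();
      RWDataMem.FreeMem.clear();
      return;
    }
    // ... the real implementation would reserve one contiguous block here ...
  }
};
```

After a zero-size reservation request, any stale free blocks are gone, so an unreserved allocation cannot accidentally succeed.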
Or, is this OK to return early without clearing free memory, on the same grounds that we reuse a previous contiguous block below if there's enough space in the existing reservation?
Erroneous requests would negate any benefits from using reserveAllocationSpace. And yes, if I removed this early return it would still return early when checking for sufficient existing free space to satisfy the whole request.
This is done now, I think? Or are you planning to add more tests?
Implements `reserveAllocationSpace` and provides an option to enable `needsToReserveAllocationSpace` for large-memory environments with AArch64.

The [AArch64 ABI](https://github.com/ARM-software/abi-aa/blob/main/sysvabi64/sysvabi64.rst) limits the distance between sections, because the instructions that reference them can only address offsets of 2 or 4 GB. Allocating sections in multiple blocks can result in distances greater than that on systems with lots of memory. In those environments, several projects using SectionMemoryManager with MCJIT have run across assertion failures for the R_AARCH64_ADR_PREL_PG_HI21 relocation as it attempts to address across distances greater than 2 GB (an int32).

Fixes llvm#71963 by allocating all sections in a single contiguous memory allocation, limiting the distance required for instruction offsets, similar to how pre-compiled binaries are loaded into memory. Does not change the default behavior of SectionMemoryManager.
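The core idea of the fix, a single contiguous reservation sized for all three segments, can be sketched like this (helper names are hypothetical; the actual patch also pads for alignment and queries the real system page size):

```cpp
#include <cstdint>

// Round a segment size up to page granularity (hypothetical helper).
constexpr std::uintptr_t roundToPage(std::uintptr_t Size,
                                     std::uintptr_t PageSize) {
  return (Size + PageSize - 1) / PageSize * PageSize;
}

// Total size of the single contiguous block to reserve. With all three
// segments carved from one allocation, no pair of sections can end up
// farther apart than the reservation itself, which keeps ADRP-style
// relocations such as R_AARCH64_ADR_PREL_PG_HI21 in range.
constexpr std::uintptr_t totalReservation(std::uintptr_t CodeSize,
                                          std::uintptr_t RODataSize,
                                          std::uintptr_t RWDataSize,
                                          std::uintptr_t PageSize) {
  return roundToPage(CodeSize, PageSize) + roundToPage(RODataSize, PageSize) +
         roundToPage(RWDataSize, PageSize);
}
```

Each segment still starts on its own page so permissions (RX vs RO vs RW) can be applied per segment, but all pages come from one mapping.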
The implementation of `reserveAllocationSpace()` now more closely follows that in llvm/llvm-project#71968, following some changes made there. The changes here include:

- Improved readability of debugging output.
- Using a default alignment of 8 in `allocateSection()` to match the default alignment provided by the stub alignment during preallocation.
- Replacing the "bespoke" `requiredPageSize()` function with computations using the LLVM `alignTo()` function.
- Returning early from preallocation when no space is requested.
- Reusing existing preallocations if there is enough space left over from the previous preallocation for all the required segments. This can happen quite frequently, because allocations for each segment get rounded up to page sizes (usually either 4K or 16K), and many Numba-jitted functions require a lot less than this.
- Removal of setting the "near" hints for memory blocks; this doesn't really have any use when all memory is preallocated, and thus forced to be "near" to other memory.
- Addition of extra asserts to validate the alignment of allocated sections.
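To illustrate the `alignTo()` substitution mentioned in that list, the bespoke page-size computation reduces to a standard round-up. Shown here as a local stand-in with the same semantics as LLVM's `llvm::alignTo` for power-of-two alignments (an assumption of this sketch, not the exact library code):

```cpp
#include <cstdint>

// Local stand-in for llvm::alignTo: round Value up to the next multiple of
// Align. This bit-mask form requires Align to be a power of two.
constexpr std::uintptr_t alignTo(std::uintptr_t Value, std::uintptr_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}
```

With a 16K page size, `alignTo(16380, 16384)` gives 16384, the same result the removed `requiredPageSize()` helper was computing by hand.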
Now that numba/llvmlite#1009 (which essentially implements the same changes made here, just inside llvmlite) is merged and an RC has been produced, I've received various reports that this fixes issues related to #71963 on both macOS and Linux AArch64 systems, and no reports of adverse effects - so FWIW, my confidence in this patch being a good fix is quite high.
Diffs originated from llvm/llvm-project#71968 and were modified to target the specific version of LLVM in use by CUDA Quantum (16.0.6).
Implements `reserveAllocationSpace` and provides an option to enable `needsToReserveAllocationSpace` for large-memory environments with AArch64.

The AArch64 ABI has restrictions on the distance between TEXT and GOT sections, as the instructions to reference them are limited to 2 or 4 GB. Allocating sections in multiple blocks can result in distances greater than that on systems with lots of memory. In those environments, several projects using SectionMemoryManager with MCJIT have run across assertion failures for the R_AARCH64_ADR_PREL_PG_HI21 relocation as it attempts to address across distances greater than 2 GB (an int32).

Fixes #71963 by allocating all sections in a single contiguous memory allocation, limiting the distance required for instruction offsets, similar to how pre-compiled binaries are loaded into memory.