
Implement reserveAllocationSpace for SectionMemoryManager #71968

Open, wants to merge 1 commit into base: main
Conversation

@MikaelSmith commented Nov 10, 2023

Implements reserveAllocationSpace and provides an option to enable needsToReserveAllocationSpace for large-memory environments with AArch64.

The AArch64 ABI restricts the distance between the TEXT and GOT sections, as the instructions that reference them can only encode offsets of 2 or 4 GB. Allocating sections in multiple blocks can result in distances greater than that on systems with lots of memory. In those environments, several projects using SectionMemoryManager with MCJIT have hit assertion failures for the R_AARCH64_ADR_PREL_PG_HI21 relocation as it attempts to address across distances greater than 2 GB (an int32).

Fixes #71963 by allocating all sections in a single contiguous memory reservation, limiting the distance required for instruction offsets, much as a pre-compiled binary is laid out when loaded into memory.
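For illustration, the failing check can be modeled in isolation. This is a hedged sketch, not the actual RuntimeDyldELF code; the helper name is hypothetical. It only mimics the idea that the relocation encodes a page-granular delta between the instruction and its target, which must fit a signed 32-bit range per the 2 GB limit described above.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, illustrative model of the range check that fails in
// llvm#71963: R_AARCH64_ADR_PREL_PG_HI21 encodes the page-granular delta
// between the place of the instruction and its target, and the resolver
// asserts that the delta fits the encodable range (modeled here as a
// signed 32-bit value, matching the ~2 GB limit mentioned above).
static bool pageDeltaIsEncodable(uint64_t PlaceAddr, uint64_t TargetAddr) {
  int64_t Delta = static_cast<int64_t>(TargetAddr & ~UINT64_C(0xFFF)) -
                  static_cast<int64_t>(PlaceAddr & ~UINT64_C(0xFFF));
  return Delta >= INT32_MIN && Delta <= INT32_MAX;
}
```

Two sections placed within the same contiguous reservation stay well inside the range; two independent allocations that happen to land, say, 5 GiB apart do not, which is the scenario the reserved-allocation approach avoids.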

github-actions bot commented Nov 10, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

return;

// Get space required for each section.
uint64_t RequiredCodeSize = requiredPageSize(CodeSize, CodeAlign);
MikaelSmith (Author) commented:
I have concerns about this and line 46:

uintptr_t RequiredSize = Alignment * ((Size + Alignment - 1) / Alignment + 1);

That calculation pads by one extra Alignment to ensure we always reserve enough space to align the block at an arbitrary address. It's slightly inefficient, but not terribly so.

If this sums up many calls to allocateCodeSection/allocateDataSection, does the caller also sum appropriately to cover those buffers?
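To make the concern concrete, the formula can be reproduced in isolation (a sketch for illustration; the function name here is not the patch's): it rounds Size up to a multiple of Alignment and then pads by one more Alignment, so every call over-reserves by up to one Alignment, and a caller summing many section sizes must budget one such pad per section.

```cpp
#include <cstdint>

// The reservation formula under discussion, reproduced for illustration:
// round Size up to a multiple of Alignment, then pad by one extra
// Alignment so the block can still be aligned when the underlying
// allocation starts at an arbitrary address.
static uintptr_t requiredSize(uintptr_t Size, uintptr_t Alignment) {
  return Alignment * ((Size + Alignment - 1) / Alignment + 1);
}
```

Note that even an exactly aligned Size pays the pad: requiredSize(128, 16) is 144, not 128. That per-call buffer is exactly what the comment asks the caller to sum correctly.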

@gmarkall (Contributor) left a comment:

I've tested a fairly similar change (based on these changes) in numba/llvmlite#1009. The only issue I came across was that the alignment for the code segment requested while reserving the allocation can be smaller than the alignment requested when allocating the code segment; this is because the alignment at allocation time takes the alignment of the stub into account (from here).

The ultimate effect of this is that the preallocation can sometimes be too small for the later allocation when the code size is right up against a boundary (a page size boundary?). For example, I saw a preallocation request for 16380 bytes with alignment 4 result in an actual preallocation of 16384 bytes, then a later code allocation for 16379 bytes with alignment 8 try to use 16392 bytes, slightly larger than the preallocation, and fail.

I hacked around this by increasing the code alignment to 8 if it was less than 8 (simply because that seems to be the biggest stub alignment potentially used across all targets) and that seemed to resolve the issue. Ideally I would have queried the stub alignment, but I don't think there's an easy way to do that from within an RTDyldMemoryManager.

Perhaps the most correct fix is for RuntimeDyldImpl::computeTotalAllocSize() to take the stub alignment into consideration when computing the code segment alignment?
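The numbers in the report above can be replayed with two small helpers (a sketch; the real code paths differ): the preallocation rounds the requested size up to a page, while the later allocation rounds up to the section alignment and pads by one extra alignment, so a size sitting near a page boundary can overflow the reservation.

```cpp
#include <cstdint>

// llvm::alignTo-style rounding, reimplemented here so the example is
// self-contained.
static uint64_t roundUp(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Size actually consumed at allocation time: the aligned size plus one
// extra Alignment of padding (the buffering formula discussed earlier in
// this thread).
static uint64_t allocSize(uint64_t Size, uint64_t Alignment) {
  return Alignment * ((Size + Alignment - 1) / Alignment + 1);
}
```

With a 16 KiB page: roundUp(16380, 16384) reserves 16384 bytes, but allocSize(16379, 8) later consumes 16392 bytes, reproducing the overflow described above.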

@gmarkall (Contributor) commented:
Also, I should have said earlier: many thanks for all your efforts and the trouble you've gone to to put this together - it is certainly looking promising, from the perspective of Numba on AArch64 platforms!

void SectionMemoryManager::reserveAllocationSpace(
uintptr_t CodeSize, Align CodeAlign, uintptr_t RODataSize,
Align RODataAlign, uintptr_t RWDataSize, Align RWDataAlign) {
if (CodeSize == 0 && RODataSize == 0 && RWDataSize == 0)
A contributor commented:

Perhaps we should clear the FreeMem vectors before returning early here? Otherwise, we could reserve no space, but still have some free memory left over from a previous reservation, then end up using that to satisfy a later erroneous request instead of hitting the assert that fails when we try to allocate memory without having reserved it first.

The contributor added:

Or, is this OK to return early without clearing free memory, on the same grounds that we reuse a previous contiguous block below if there's enough space in the existing reservation?

MikaelSmith (Author) replied:

Erroneous requests would negate any benefits from using reserveAllocationSpace. And yes, if I removed this early return it would still return early when checking for sufficient existing free space to satisfy the whole request.

@gmarkall (Contributor) commented:
TODO: add tests to MCJITMemoryManagerTest

This is done now, I think? Or are you planning to add more tests?

Implements `reserveAllocationSpace` and provides an option to enable
`needsToReserveAllocationSpace` for large-memory environments with
AArch64.

The [AArch64 ABI](https://github.com/ARM-software/abi-aa/blob/main/sysvabi64/sysvabi64.rst)
has limits on the distance between sections as the instructions to
reference them are limited to 2 or 4GB. Allocating sections in multiple
blocks can result in distances greater than that on systems with lots of
memory. In those environments several projects using
SectionMemoryManager with MCJIT have run across assertion failures for
the R_AARCH64_ADR_PREL_PG_HI21 instruction as it attempts to address
across distances greater than 2GB (an int32).

Fixes llvm#71963 by allocating all sections in a single contiguous memory
allocation, limiting the distance required for instruction offsets
similar to how pre-compiled binaries would be loaded into memory. Does
not change the default behavior of SectionMemoryManager.
gmarkall added a commit to gmarkall/llvmlite that referenced this pull request Nov 22, 2023
The implementation of `reserveAllocationSpace()` now more closely
follows that in llvm/llvm-project#71968,
following some changes made there.

The changes here include:

- Improved readability of debugging output
- Using a default alignment of 8 in `allocateSection()` to match the
  default alignment provided by the stub alignment during preallocation.
- Replacing the "bespoke" `requiredPageSize()` function with
  computations using the LLVM `alignTo()` function.
- Returning early from preallocation when no space is requested.
- Reusing existing preallocations if there is enough space left over
  from the previous preallocation for all the required segments - this
  can happen quite frequently because allocations for each segment get
  rounded up to page sizes, which are usually either 4K or 16K, and many
  Numba-jitted functions require a lot less than this.
- Removal of setting the near hints for memory blocks - this doesn't
  really have any use when all memory is preallocated, and forced to be
  "near" to other memory.
- Addition of extra asserts to validate alignment of allocated sections.
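The early-return and reuse bullets above can be sketched as follows. This is a hypothetical model for illustration only; the struct, field, and function names are not llvmlite's or LLVM's actual internals.

```cpp
#include <cstdint>

// Hypothetical model of the reservation policy described above: return
// early when nothing is requested, reuse leftover space from a previous
// page-granular reservation when it covers all three segments, and
// otherwise make a fresh page-rounded reservation.
struct ReservationModel {
  uint64_t FreeCode = 0, FreeRO = 0, FreeRW = 0;

  // Returns true if a fresh reservation was made, false when the request
  // was empty or the previous reservation could be reused.
  bool reserve(uint64_t Code, uint64_t RO, uint64_t RW,
               uint64_t PageSize = 4096) {
    auto ToPages = [PageSize](uint64_t S) {
      return (S + PageSize - 1) / PageSize * PageSize;
    };
    if (Code == 0 && RO == 0 && RW == 0)
      return false; // nothing requested: return early
    if (Code <= FreeCode && RO <= FreeRO && RW <= FreeRW)
      return false; // leftover space suffices: reuse it
    FreeCode = ToPages(Code);
    FreeRO = ToPages(RO);
    FreeRW = ToPages(RW);
    return true;
  }
};
```

Because each segment is rounded up to a page, a small jitted function needing a few hundred bytes per segment will usually fit in the leftover of the previous 4 K or 16 K reservation, which is why the reuse case fires so frequently for Numba workloads.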
gmarkall commented Jan 2, 2024

Now that numba/llvmlite#1009 (which essentially implements the same changes made here, just inside llvmlite) is merged and an RC has been produced, I've received various reports that this fixes issues related to #71963 on both macOS and Linux AArch64 systems, and no reports of adverse effects - so FWIW, my confidence in this patch being a good fix is quite high.

bmhowe23 added a commit to bmhowe23/cuda-quantum that referenced this pull request Mar 26, 2024
Diffs originated from llvm/llvm-project#71968
and were modified to target the specific version of LLVM in use by CUDA
Quantum (16.0.6).
Successfully merging this pull request may close these issues.

Assertion failure in RuntimeDyldELF::resolveAArch64Relocation
3 participants