Linux AArch64 RuntimeDyld relocation overflows (#8567 specific to Linux only) #9001
It turns out I can even hit the assertion with a smaller test than the original reproducer from #8567 on Linux, in a docker container:
Witnessed while running the whole test suite (`python -m numba.runtests -m`) in a desperate attempt to trigger the bug on the Ampere 80 core machine:
One failure mode is that it seems to be attempting to resolve a relocation for an ADRP instruction to
The target to write the relocation:
The value to insert:
the distance between them:
... almost 6 GB apart, which is too far even for the large code model - see the table in https://github.com/ARM-software/abi-aa/blob/main/sysvabi64/sysvabi64.rst#code-models
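To make the "too far" point concrete, here is a minimal sketch of the range check that an ADRP page-relative relocation has to pass. ADRP encodes a signed 21-bit page offset, so the 4 KiB page of the target must lie within ±4 GiB of the page of the instruction. The addresses below are hypothetical placeholders, not the ones observed in this issue, and the helper names are made up for illustration.

```python
PAGE_MASK = ~0xFFF          # ADRP works on 4 KiB pages
ADRP_REACH = 1 << 32        # signed 21-bit page count * 4 KiB = +/- 4 GiB


def adrp_page_delta(place, target):
    """Byte distance between the page of `target` and the page of `place`."""
    return (target & PAGE_MASK) - (place & PAGE_MASK)


def fits_adrp(place, target):
    """True if an ADRP at `place` can reach the page containing `target`."""
    delta = adrp_page_delta(place, target)
    return -ADRP_REACH <= delta < ADRP_REACH


# Hypothetical addresses, chosen only to mimic a gap of about 6 GiB.
place = 0xFFFF_8000_1000
target = place + 6 * 1024**3

print(hex(adrp_page_delta(place, target)), fits_adrp(place, target))
```

Any gap of roughly 6 GB fails this check regardless of the exact addresses, which is consistent with the assertion firing.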
I now have a simpler reproducer, which works with Numba and llvmlite:

    from numba import njit

    @njit
    def f(x, y):
        return x + y

    i = 0
    while True:
        print(i)
        t = tuple(range(i))
        f(t, (1j,))
        i += 1

executed with:
gives:
Repository for reproducer work: https://github.com/gmarkall/numba-issue-9001

I now have a repro that only uses NumPy and llvmlite, not Numba, which is the present state recorded in that repo.
Hi, I don't know if this is helpful, but I thought I'd share my experience of this issue. I have come across the issue with my code both on the Ampere 80 core machine running Ubuntu and on AWS Graviton arm64 instances running Amazon Linux 2. On smaller AWS instances (like 8 cores) the problem is less frequent than on larger instances (like 32/64 cores or the Ampere 80). Perhaps this is because smaller instances have less RAM (8 GB vs 128 GB)? I can reproduce the numba-issue-9001 code on AWS m7gd.2xlarge after only 5 iterations.
Thanks for the input - in our experience too, the issue seems to manifest more easily when there's a lot of RAM. There's also the start of a discussion on the LLVM Discourse regarding a potential fix: https://discourse.llvm.org/t/llvm-rtdyld-aarch64-abi-relocation-restrictions/74616
Based on the Impala memory manager: MikaelSmith/impala@ac8561b This allows the test suite to pass but still does not fix numba/numba#9001.
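As a toy illustration of why the memory manager matters here: if every section of a JIT-compiled object is carved out of one contiguous reservation, the distance between any two sections is bounded by the size of that reservation, so page-relative relocations between them stay in range. The sketch below assumes that general idea only. It is not the code from the commit or PR referenced in this thread, and the class, base address, and section sizes are all hypothetical.

```python
class Arena:
    """Toy bump allocator over one contiguous, pre-reserved address range."""

    def __init__(self, base, size):
        self.base = base
        self.end = base + size
        self.cursor = base

    def allocate(self, size, align=16):
        # Round the cursor up to the requested (power-of-two) alignment.
        start = (self.cursor + align - 1) & ~(align - 1)
        if start + size > self.end:
            raise MemoryError("arena exhausted")
        self.cursor = start + size
        return start


ADRP_REACH = 1 << 32  # +/- 4 GiB reach of an ADRP page-relative relocation

# Hypothetical 64 MiB reservation and section sizes.
arena = Arena(base=0xFFFF_0000_0000, size=64 * 1024 * 1024)
text = arena.allocate(0x4000, align=0x1000)
rodata = arena.allocate(0x800, align=0x1000)
data = arena.allocate(0x2000, align=0x1000)

sections = [text, rodata, data]
max_gap = max(sections) - min(sections)
print(hex(max_gap), max_gap < ADRP_REACH)
```

By contrast, when each section comes from a separate allocation, nothing bounds how far apart the operating system may place them, which would fit the observation that machines with more RAM hit the problem more readily.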
@zansibal It's early work at the moment, but if you're able to build llvmlite from source with the PR numba/llvmlite#1009, and let me know whether you still observe the issue with it (or observe any other issues) that would be good feedback - hopefully this resolves the issue, but there's a lot of testing / review to be done to have confidence in the strategy.
This was fixed by numba/llvmlite#1009.
I'm breaking out this issue from #8567 to avoid spamming everyone who participated in that issue with notifications as I add notes and debugging info. In summary, on Linux AArch64,
test_standalone.py
results in: