M1 LLVM Runtimedyld Invalid page reloc value assertion error #8567
Dear users, I'm trying to run a Python 3 script and I get this error about 9 times out of 10. My script works, but only intermittently: when the code is long, sometimes the error appears and sometimes it doesn't, so I have to run the script multiple times in the hope that it reaches the end. This is the error I get: Assertion failed: (isInt<33>(Addend) && "Invalid page reloc value."), function encodeAddend, file /Users/ci/miniconda3-arm64/conda-bld/llvmdev_1643905487494/work/lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldMachOAArch64.h, line 210. I'm running on a MacBook Pro with an M1 Pro. |
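For context on what the failing assertion means, here is a hedged, illustrative Python sketch (this is not LLVM's actual code, and the helper names are hypothetical): the addend of an AArch64 page-relative (ADRP-style) relocation must fit in a signed 33-bit integer, i.e. the relocated instruction and its target must lie within roughly +/-4 GiB of each other.

```python
# Illustrative only: mimics the isInt<33>(Addend) check from the assertion
# message; function names here are hypothetical, not LLVM's API.
def is_int_n(bits, value):
    """True if `value` fits in a signed `bits`-bit integer."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))

def page(addr):
    """ADRP addresses 4 KiB pages, so relocations work page-to-page."""
    return addr & ~0xFFF

def page_reloc_addend(pc, target):
    """Page delta between the relocated instruction and its target."""
    return page(target) - page(pc)

# A target within 4 GiB of the instruction: the check passes.
assert is_int_n(33, page_reloc_addend(0x1000, 0x7FFF_F000))
# A target more than 4 GiB away: the check fails, which is what the
# "Invalid page reloc value" assertion reports.
assert not is_int_n(33, page_reloc_addend(0x1000, 0x2_0000_0000))
```

In other words, when the JIT's allocations for code and the data they reference drift more than 4 GiB apart, this is the check that fires.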
@Francyrad thank you for asking about this. You are indeed encountering the same error reported in this issue. You may consider Numba (more precisely, LLVM) to be broken on M1. There is currently no known fix or workaround, and we are not sure whether this has been reported upstream to LLVM or whether a fix is in progress. IIRC @sklam also checked LLVM 14 and it appears this has not been fixed there either. My only remaining suggestion would be to try running your script in a Docker container on the M1. TL;DR: Running Numba on an M1 may cause the segfaults you see above, and the only known workaround is to use different hardware. |
@esc thank you for your answer, I hope someone will be able to fix this. Please let me know when it's fixed by commenting on this issue. Thank you again |
Yes, we hope so too. If you subscribe to this issue, you will receive updates regarding this quest. |
Hi, is there any update on this? I'm on Python 3.9, LLVM 11.1.0, and an M1 Mac, and am hitting the same issue right now when running multiprocessing of a forecast model (AutoCES) from the statsforecast package. The temporary fix in #8583 doesn't seem to work for me. I have tested other models without issues, but they all use Numba in the backend to speed up the computation. The only difference I can think of is that this specific model uses complex values rather than real values. With numba (0.46) and llvmlite (0.39), exactly the same error is raised when running. However, with the dev version of numba (0.57.0.dev0+1257.gce69f3010) and llvmlite (0.40.0.dev0+70.ge6901e0), the multiprocessing just gets stuck in the terminal without any error raised (but I'm pretty sure it's still the same issue). |
I still have the issue. Sometimes I waste more of my time trying to run my scripts than actually working. |
#8583 only disables the tests so that we can complete the test suite and ship the package, so it won't actually help with the issue. No, unfortunately there is no known workaround; it's broken in LLVM 11 and LLVM 14 (supported by the next Numba/llvmlite release). I am not aware of anyone working on a fix at present, so your best bet for now will be to use non-M1/Apple silicon, i.e. change hardware. So sorry I don't have better news for you. @sklam for reference, was this ever reported to the LLVM issue tracker, and if so, can you post the issue ID please? Thank you. |
Just wanted to mention that I'm having the same issue on Mac M1, llvm-openmp 16.0.2 and llvmlite 0.40.0! I run into this issue when solving systems of PDEs using py-pde. I've subscribed to this issue and fingers crossed that it will get fixed in the near future. |
@iamlll another bug that I don't understand is that parallelisation with OpenMP doesn't work with the following chips: M1 Pro, M1 Max and M1 Ultra. It works only with the plain M1. Is there an LLVM tracker where it's possible to file a report? |
@iamlll The reason you are seeing this with llvmlite 0.40.0 is that it is based on LLVM 14, and that is indeed buggy. |
So how can we solve the problem? |
This is a problem of the LLVM JIT that we are using (MCJIT); we need to migrate to OrcJIT (numba/llvmlite#919) so we can use JITLink, and hopefully that will fix it. |
This issue is about |
The issue is not limited to Apple M1 or macOS. We're seeing it on Neoverse-N1 running Ubuntu 20.04 ever since we upgraded to Numba 0.57. This is a server machine, and not just one. Unfortunately, we cannot downgrade Numba because we need CUDA 12.1 support. Error message:
System info:
|
I can confirm being able to reproduce a similar issue on a non-M1 AArch64 system. In general we can overflow relocations; the assertion is a little different because Linux on AArch64 uses RuntimeDyldELF and not RuntimeDyldMachO, but I think the principle (and the root cause) is the same. I need to investigate further to be sure. At present I'm reproducing with DALI like:
and I need to figure out how to make a Numba-only reproducer. I'm working on a system very similar to the one reported by @mzient in #8567 (comment), with just some minor OS / kernel version differences. |
I couldn't trigger this issue with @sklam's script from #8567 (comment), even after hundreds of runs on a Linux AArch64 system. However, the following (still using DALI, but without needing a test harness) does reproduce the issue pretty reliably:

test_standalone.py:

```python
import numpy as np
from nvidia.dali import pipeline_def
import nvidia.dali as dali
import nvidia.dali.fn as fn
import nvidia.dali.types as dali_types
from nvidia.dali.plugin.numba.fn.experimental import numba_function


def set_all_values_to_255_batch(out0, in0):
    out0[0][:] = 255


def set_all_values_to_255_sample(out0, in0):
    out0[:] = 255


def set_all_values_to_float_batch(out0, in0):
    out0[0][:] = 0.5


def set_all_values_to_float_sample(out0, in0):
    out0[:] = 0.5


def setup_change_out_shape(out_shape, in_shape):
    out0_shape = out_shape[0]
    in0_shape = in_shape[0]
    perm = [1, 2, 0]
    for sample_idx in range(len(out0_shape)):
        for d in range(len(perm)):
            out0_shape[sample_idx][d] = in0_shape[sample_idx][perm[d]]


def change_out_shape_batch(out0, in0):
    for sample_id in range(len(out0)):
        out0[sample_id][:] = 42


def change_out_shape_sample(out0, in0):
    out0[:] = 42


def get_data(shapes, dtype):
    return [np.empty(shape, dtype=dtype) for shape in shapes]


def get_data_zeros(shapes, dtype):
    return [np.zeros(shape, dtype=dtype) for shape in shapes]


@pipeline_def
def numba_func_pipe(shapes, dtype, run_fn=None, out_types=None, in_types=None,
                    outs_ndim=None, ins_ndim=None, setup_fn=None, batch_processing=None):
    data = fn.external_source(lambda: get_data(shapes, dtype), batch=True, device="cpu")
    return numba_function(
        data, run_fn=run_fn, out_types=out_types, in_types=in_types,
        outs_ndim=outs_ndim, ins_ndim=ins_ndim, setup_fn=setup_fn,
        batch_processing=batch_processing)


def _testimpl_numba_func(shapes, dtype, run_fn, out_types, in_types,
                         outs_ndim, ins_ndim, setup_fn, batch_processing, expected_out):
    batch_size = len(shapes)
    pipe = numba_func_pipe(
        batch_size=batch_size, num_threads=1, device_id=0,
        shapes=shapes, dtype=dtype,
        run_fn=run_fn, setup_fn=setup_fn, out_types=out_types,
        in_types=in_types, outs_ndim=outs_ndim, ins_ndim=ins_ndim,
        batch_processing=batch_processing)
    pipe.build()
    for _ in range(3):
        outs = pipe.run()
        for i in range(batch_size):
            out_arr = np.array(outs[0][i])
            assert np.array_equal(out_arr, expected_out[i])


def test_numba_func():
    # shape, dtype, run_fn, out_types,
    # in_types, out_ndim, in_ndim, setup_fn, batch_processing,
    # expected_out
    args = [
        ([(10, 10, 10)], np.uint8, set_all_values_to_255_batch, [dali_types.UINT8],
         [dali_types.UINT8], [3], [3], None, True,
         [np.full((10, 10, 10), 255, dtype=np.uint8)]),
        ([(10, 10, 10)], np.uint8, set_all_values_to_255_sample, [dali_types.UINT8],
         [dali_types.UINT8], [3], [3], None, None,
         [np.full((10, 10, 10), 255, dtype=np.uint8)]),
        ([(10, 10, 10)], np.float32, set_all_values_to_float_batch, [dali_types.FLOAT],
         [dali_types.FLOAT], [3], [3], None, True,
         [np.full((10, 10, 10), 0.5, dtype=np.float32)]),
        ([(10, 10, 10)], np.float32, set_all_values_to_float_sample, [dali_types.FLOAT],
         [dali_types.FLOAT], [3], [3], None, None,
         [np.full((10, 10, 10), 0.5, dtype=np.float32)]),
        ([(10, 20, 30), (20, 10, 30)], np.int64, change_out_shape_batch, [dali_types.INT64],
         [dali_types.INT64], [3], [3], setup_change_out_shape, True,
         [np.full((20, 30, 10), 42, dtype=np.int32),
          np.full((10, 30, 20), 42, dtype=np.int32)]),
        ([(10, 20, 30), (20, 10, 30)], np.int64, change_out_shape_sample, [dali_types.INT64],
         [dali_types.INT64], [3], [3], setup_change_out_shape, None,
         [np.full((20, 30, 10), 42, dtype=np.int32),
          np.full((10, 30, 20), 42, dtype=np.int32)]),
    ]
    for (shape, dtype, run_fn, out_types, in_types, outs_ndim, ins_ndim,
         setup_fn, batch_processing, expected_out) in args:
        _testimpl_numba_func(
            shape, dtype, run_fn, out_types, in_types, outs_ndim, ins_ndim,
            setup_fn, batch_processing, expected_out
        )


test_numba_func()
```

which gives this on almost every run:
|
Yes please! |
Please write to me at francyrad.info@gmail.com. The script and the file that you will read are quite big. |
Alright, that means we have two large cases to reproduce, then. We will focus on reducing them as much as possible. |
Another thought I think worth sharing: it should be possible to get to a reproducer that doesn't depend on Numba at all. If it's minimised as much as possible, it would just involve calls to llvmlite. (Or, even simpler than that, a small C++ source that links to LLVM only, to take llvmlite out of the loop as well; but I think the "just llvmlite" case would already be a good starting point.) |
It might take a while to get there, as our developers naturally have a strong Python background. We will start with a minimal nixtla setup, which is where this popped up for us, and from there we will work our way down. |
Bump. I am consistently seeing this on M1 Pro and M2. It's a bit involved, but it occurs with ~30% probability in my code. Are you still looking for a reproducer @gmarkall ? |
FYI, by googling I noticed that when porting Julia to ARM they also hit the same bug. Look at JuliaLang/julia#36617 and search in the page for "Assertion failed: (isInt<33>(Addend) && "Invalid page reloc value."),". Apparently, if this can help at all, the PR that fixed the issue was JuliaLang/julia#43664 ... |
The problem is still present |
Luckily, and coincidentally, I was working on this today, and I now have a pretty good one, which I'm going to add to #9001 because I'm tackling the issue on Linux AArch64 at present. In case you want to try it, it's:

```python
from numba import njit


@njit
def f(x, y):
    return x + y


i = 0
while True:
    print(i)
    t = tuple(range(i))
    f(t, (1j,))
    i += 1
```

executed with:

gives:
It'd be interesting to know if that also triggers the error on your Mac. You might need to do something similar to my |
I can't set the ulimit: value exceeds hard limit The largest ulimit I can set is but it is not crashing for now... |
What number did it get to before you stopped it? |
340. I can let it run the whole night if you tell me it can be useful for you. |
That would be great if you could give it a go! |
@PhilipVinc Is it still running? :-) |
@gmarkall it crashes at 1001, but I think this is due to some check in Numba itself?

```
999
1000
1001
Traceback (most recent call last):
  File "/Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/repro.py", line 12, in <module>
  File "/Users/filippo.vicentini/Documents/pythonenvs/netket/python-3.11.2/lib/python3.11/site-packages/numba/core/dispatcher.py", line 471, in _compile_for_args
    error_rewrite(e, 'unsupported_error')
  File "/Users/filippo.vicentini/Documents/pythonenvs/netket/python-3.11.2/lib/python3.11/site-packages/numba/core/dispatcher.py", line 409, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.UnsupportedError: Failed in nopython mode pipeline (step: ensure features that are in use are in a valid form)
Tuple 'x' length must be smaller than 1000.
Large tuples lead to the generation of a prohibitively large LLVM IR which causes excessive memory pressure and large compile times.
As an alternative, the use of a 'list' is recommended in place of a 'tuple' as lists do not suffer from this problem.

File "repro.py", line 3:
<source missing, REPL/exec in use?>
```

EDIT: This is with |
@PhilipVinc Thanks - indeed, that was a Numba limitation. I think in #9001 and https://github.com/gmarkall/numba-issue-9001 we're getting close to a really good reproducer now, so there's probably no need for additional testing here - thanks for everything you've looked into so far :-) |
An LLVM Discourse discussion has been started about a potential fix: https://discourse.llvm.org/t/llvm-rtdyld-aarch64-abi-relocation-restrictions/74616 |
@Francyrad @PhilipVinc @carstenr It's early work at the moment, but if you're able to build llvmlite from source with the PR numba/llvmlite#1009 and let me know whether you still observe the issue with it (or observe any other issues), that would be good feedback. Hopefully this resolves the issue, but there's a lot of testing / review to be done to have confidence in the strategy. |
I have experienced this issue repeatedly over the past month, getting errors similar to the following for my ~150 line code for solving a specific PDE:
@gmarkall, I'm not quite sure how to build from source but am happy to try and test it out. |
@jacobjivanov Thanks for sharing this info - fortunately you don't need to build from source to test the fix now, as it's part of the llvmlite 0.42 / Numba 0.59 release candidates. You can follow the instructions here to install the Numba and llvmlite release candidates: https://numba.discourse.group/t/ann-numba-0-59-0rc1-and-llvmlite-0-42-0rc1/2329 If you try this, I'd really appreciate if you can let me know whether it appears to have solved the issue for you. |
@gmarkall, I can't confirm whether it'll ever fail, but it no longer fails for the particular script that would fail roughly 50% of the time previously. Ran it ~20 times with different initial conditions. |
@gmarkall Your work is greatly appreciated! Switching to the release candidate also solved the issue for one of our packages which would occasionally fail. |
With llvmlite now at 0.42.0 and the new memory manager merged, can we close this? |
I've not heard of any reports of this issue manifesting in llvmlite 0.42, so I think so. |
Alright, let's put a proverbial checkmark behind this issue. We always have the option to re-open in case. @gmarkall thank you again for the fix for this, it is much appreciated! |
We are seeing an LLVM assertion error occurring randomly in our build farm.
The error message is:
The earliest report is from Gitter on July 15, 2022.
The error can be triggered with the below script on bdb2384. The error usually occurs within 10 iterations.
The error occurs in both LLVM 11 and LLVM 14.
The current hypothesis is that LLVM RuntimeDyld is mishandling far jumps. To relate this to the reproducer above, the situation can be created by:
- test_too_big_to_freeze (the compilation and execution bits in the tests can be commented out and it will still trigger the error)
- test_fill_diagonal_basic. The assertion error occurs here. The guess is that JITed code emitted for the stencil tests is reused here. The large allocation in between helps make sure there is a gap/fragmentation in the memory space, such that the fill_diagonal functions are JITed somewhere far away.

Julia devs are pointing to a broken large code model in LLVM RuntimeDyld for MachO AArch64. See JuliaLang/julia#42295 (comment), JuliaLang/julia#43664.
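This fragmentation hypothesis can be sketched in plain Python (the addresses and helper names below are hypothetical and purely illustrative, not LLVM's): code JITed early sits in one region, a large allocation leaves a gap, a later function lands more than 4 GiB away, and the page-relative delta between the two regions overflows the signed 33-bit addend that RuntimeDyld asserts on.

```python
PAGE = 4096

def page_of(addr):
    # ADRP works on 4 KiB pages, so relocations are computed page-to-page.
    return addr & ~(PAGE - 1)

def fits_page_reloc(pc, target, bits=33):
    # Mirrors the isInt<33>(Addend) check on the page delta.
    delta = page_of(target) - page_of(pc)
    return -(1 << (bits - 1)) <= delta < (1 << (bits - 1))

# Hypothetical layout: code JITed early in the process...
early_code = 0x0000_0001_0000_0000
# ...then a multi-GiB allocation leaves a gap, so a later function
# lands more than 4 GiB away from the earlier code.
late_code = 0x0000_0003_4000_0000

assert fits_page_reloc(early_code, early_code + 0x10_0000)  # nearby reference: fine
assert not fits_page_reloc(late_code, early_code)           # far reference: asserts
```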