M1 LLVM Runtimedyld Invalid page reloc value assertion error #8567

Closed
sklam opened this issue Nov 2, 2022 · 47 comments
Labels
bug - segfault Bugs that cause SIGSEGV, SIGABRT, SIGILL, SIGBUS ISA: ARM Issue related to ARM ISA llvm LLVM related issues

Comments

@sklam
Member

sklam commented Nov 2, 2022

We are seeing an LLVM assertion error occurring randomly in our build farm.

The error message is:

Assertion failed: (isInt<33>(Addend) && "Invalid page reloc value."), function encodeAddend, file /path/to/conda-bld/llvmdev_1643905487494/work/lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldMachOAArch64.h, line 210

Earliest report is from gitter on July 15, 2022

The error can be triggered with the script below on bdb2384. The error usually occurs within 10 iterations.

!python setup.py build_ext --inplace 
c = 0
_exit_code = 0
tests = """
numba.tests.test_stencils.TestManyStencils.test_basic40
numba.tests.test_stencils.TestManyStencils.test_basic70
numba.tests.test_array_constants.TestConstantArray.test_too_big_to_freeze
numba.tests.test_array_manipulation.TestArrayManipulation.test_fill_diagonal_basic
""".split()
cmdarg = ' '.join(tests)
while _exit_code == 0 and c < 150:
    print(f"c={c}".center(80, '='))
    !NUMBA_OPT=0 python -m unittest -vb $cmdarg $cmdarg
    c += 1
    print(f"exit={_exit_code}")
    assert _exit_code == 0

The error occurs in both LLVM 11 and LLVM 14.
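
The loop above relies on IPython's `!` syntax and `_exit_code`. As a rough sketch only, the same loop could be written in plain Python with `subprocess`. The helper names `build_command` and `run_until_failure` are hypothetical, `sys.executable` stands in for `python`, and the `build_ext` step is assumed to have been run already.

```python
import os
import subprocess
import sys

# Test IDs copied from the script above.
TESTS = (
    "numba.tests.test_stencils.TestManyStencils.test_basic40",
    "numba.tests.test_stencils.TestManyStencils.test_basic70",
    "numba.tests.test_array_constants.TestConstantArray.test_too_big_to_freeze",
    "numba.tests.test_array_manipulation.TestArrayManipulation.test_fill_diagonal_basic",
)


def build_command(tests=TESTS):
    """Return (argv, env) that runs every test twice with NUMBA_OPT=0."""
    argv = [sys.executable, "-m", "unittest", "-vb", *tests, *tests]
    env = dict(os.environ, NUMBA_OPT="0")
    return argv, env


def run_until_failure(argv, max_iterations=150, env=None):
    """Run argv repeatedly and stop at the first non-zero exit code.

    Returns (successful_iterations, last_exit_code).
    """
    for i in range(max_iterations):
        print(f"c={i}".center(80, "="))
        result = subprocess.run(argv, env=env)
        print(f"exit={result.returncode}")
        if result.returncode != 0:
            # On POSIX a negative code means the child was killed by a
            # signal, e.g. -signal.SIGABRT after a failed assertion.
            return i, result.returncode
    return max_iterations, 0
```

Calling `run_until_failure(*build_command())` would then mirror the original loop; nothing is executed until that call is made.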

The current hypothesis is that LLVM's RuntimeDyld is mishandling far jumps. To relate this to the reproducer above, the situation can be created by:

  • first JITing some stencil kernels, which tend to be large, especially when OPT=0
  • allocating a large amount of memory as in test_too_big_to_freeze (the compilation and execution bits in the tests can be commented out and it will still trigger the error)
  • JITing more array operations as in test_fill_diagonal_basic. The assertion error occurs here. The guess is that JITed code emitted for the stencil tests is reused here. The large allocation in between helps make sure there is a gap/fragmentation in the memory space such that the fill_diagonal functions are JITed somewhere far away.

Julia devs are pointing to a broken large code model in LLVM's RuntimeDyld for MachO AArch64. See JuliaLang/julia#42295 (comment), JuliaLang/julia#43664.
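
To make the "far away" hypothesis concrete, here is a minimal Python sketch of the range check behind the assertion. It assumes the asserted value is the distance between the 4 KiB pages of the target and of the instruction being patched, which is what an AArch64 ADRP instruction can encode (a signed 21-bit page count, so roughly ±4 GiB). The function names are hypothetical and this is not LLVM's code.

```python
PAGE_SIZE = 1 << 12  # ADRP addresses memory in 4 KiB pages


def is_int(n_bits, value):
    """Python analogue of LLVM's isInt<N>: fits in an N-bit signed integer?"""
    return -(1 << (n_bits - 1)) <= value < (1 << (n_bits - 1))


def page_delta(target_addr, fixup_addr):
    """Byte distance between the pages containing the two addresses."""
    page_mask = ~(PAGE_SIZE - 1)
    return (target_addr & page_mask) - (fixup_addr & page_mask)


def page_reloc_in_range(target_addr, fixup_addr):
    """True if the page distance would pass an isInt<33> check."""
    return is_int(33, page_delta(target_addr, fixup_addr))


# Illustrative addresses only (assumed, not taken from a real crash).
near = page_reloc_in_range(0x1_4000_0000, 0x1_0000_0000)  # 1 GiB apart
far = page_reloc_in_range(0x3_0000_0000, 0x1_0000_0000)   # 8 GiB apart
```

Under these assumptions `near` passes and `far` does not, which matches the picture of code JITed on the other side of a large allocation.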

@sklam sklam added llvm LLVM related issues ISA: ARM Issue related to ARM ISA bug - segfault Bugs that cause SIGSEGV, SIGABRT, SIGILL, SIGBUS labels Nov 2, 2022
@Francyrad

Francyrad commented Nov 14, 2022

Dear users, I'm trying to run a script for a Python 3 program and I get this error 9 times out of 10. My script works, but only randomly. When the code is too long, sometimes the error appears, sometimes it doesn't. So I have to run the script multiple times in the hope that it runs to the end. Here is the error that I get:

Assertion failed: (isInt<33>(Addend) && "Invalid page reloc value."), function encodeAddend, file /Users/ci/miniconda3-arm64/conda-bld/llvmdev_1643905487494/work/lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldMachOAArch64.h, line 210.
zsh: abort python3 test2.py

I'm running on a MacBook Pro with an M1 Pro.
I have no idea how to solve this error; I don't even know whether I can really solve it or whether it depends on LLVM. Do you know anything about that?
Thanks in advance, I hope for an answer...

@esc
Member

esc commented Nov 15, 2022

Dear users, I'm trying to run a script for a Python 3 program and I get this error 9 times out of 10. My script works, but only randomly. When the code is too long, sometimes the error appears, sometimes it doesn't. So I have to run the script multiple times in the hope that it runs to the end. Here is the error that I get:

Assertion failed: (isInt<33>(Addend) && "Invalid page reloc value."), function encodeAddend, file /Users/ci/miniconda3-arm64/conda-bld/llvmdev_1643905487494/work/lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldMachOAArch64.h, line 210. zsh: abort python3 test2.py

I'm running on a MacBook Pro with an M1 Pro. I have no idea how to solve this error; I don't even know whether I can really solve it or whether it depends on LLVM. Do you know anything about that? Thanks in advance, I hope for an answer...

@Francyrad thank you for asking about this. You are indeed encountering the same error as reported in this issue. You may consider Numba (more precisely, LLVM) to be broken on M1. There is currently no known fix or workaround, and we are not sure if this has been reported upstream to LLVM or if there is a fix in progress. IIRC @sklam also checked LLVM 14 and it appears as though this has not been fixed. My only remaining guess here would be to try to run your script in a Docker container on the M1 using a linux-aarch64 Docker image. Performance should not be too bad as the hardware will not be simulated in this case. Note, however, that I am guessing at this and it may very well also not work.

TL;DR: Running Numba on an M1 may cause the segfaults you see above and the only known workaround is to use different hardware.

@Francyrad

@esc thank you for your answer, I hope someone will be able to fix that. Please let me know when it is fixed by commenting on this issue.

Thank you again

@esc
Member

esc commented Nov 15, 2022

@esc thank you for your answer, I hope someone will be able to fix that. Please let me know when it is fixed by commenting on this issue.

Thank you again

Yes, we hope so too. If you subscribe to this issue, you will receive updates regarding this quest.

@sfc-gh-jhu

sfc-gh-jhu commented Mar 11, 2023

Hi, is there any update on this? I'm on Python 3.9, LLVM 11.1.0 and an M1 Mac, and am having the same issue right now when running multiprocessing of a forecast model (AutoCES) from the statsforecast package.
I've tried to bootstrap dev versions of both numba (0.57.0.dev0+1257.gce69f3010) and llvmlite (0.40.0.dev0+70.ge6901e0) from the GitHub repos, and I still keep facing this issue.

It seems like the temporary fix by #8583 is not working for me.

I have tested other models without issues, and they all use numba in the backend to speed up the computation. The only difference that I can think of is that this specific model uses complex values rather than real values.

With numba (0.46) and llvmlite (0.39), exactly the same error is raised when running. However, with the dev versions of numba (0.57.0.dev0+1257.gce69f3010) and llvmlite (0.40.0.dev0+70.ge6901e0), the multiprocessing just gets stuck in the terminal without any errors raised. (But I'm pretty sure it's still the same issue.)

Can anyone help here? Thanks @esc @sklam

@Francyrad

I still have the issue. Sometimes I waste more of my time trying to run my scripts than working.

@esc
Member

esc commented Mar 11, 2023

It seems like the temporary fix by #8583 is not working for me.

#8583 only disables the tests so that we can complete the test-suite and ship the package, so it won't actually help with the issue.

Can anyone help here? Thanks @esc @sklam

No, unfortunately not. There is no known workaround; it's broken in LLVM 11 and 14 (the latter is supported by the next Numba/llvmlite release). I am not aware of anyone working on a fix at present, so your best bet for now will be to use non-M1, non-Apple-silicon hardware. So sorry I don't have better news for you.

@sklam for reference, was this ever reported to the LLVM issue tracker and if so, can you post the issue ID please? Thank you.

@iamlll

iamlll commented May 2, 2023

Just wanted to mention that I'm having the same issue on Mac M1, llvm-openmp 16.0.2 and llvmlite 0.40.0! I run into this issue when solving systems of PDEs using py-pde. I've subscribed to this issue and fingers crossed that it will get fixed in the near-future.

@Francyrad

@iamlll another bug is that parallelisation with OpenMP doesn't work with the following chips:

M1Pro, M1Max and M1Ultra

It works just with M1

Is there some LLVM tracker where it is possible to report this?

@esc
Member

esc commented May 4, 2023

Just wanted to mention that I'm having the same issue on Mac M1, llvm-openmp 16.0.2 and llvmlite 0.40.0! I run into this issue when solving systems of PDEs using py-pde. I've subscribed to this issue and fingers crossed that it will get fixed in the near-future.

@iamlll The reason you are seeing this with llvmlite 0.40.0 is because it is based on LLVM 14 and that is indeed buggy.

@Francyrad

buggy.

So how can we solve the problem
With OpenMP?

@sklam
Member Author

sklam commented May 4, 2023

This is a problem with the LLVM JIT that we are using (MCJIT), and we need to migrate to OrcJIT (numba/llvmlite#919) so we can use JITLink; hopefully that will fix it.

@esc
Member

esc commented May 4, 2023

buggy.

So how can we solve the problem With OpenMP?

This issue is about the M1 LLVM Runtimedyld "Invalid page reloc value" assertion error -- you are inquiring about a different issue here. In order to keep the signal-to-noise ratio high, please open a new issue with the OpenMP issues you are seeing, thank you!

@mzient

mzient commented May 22, 2023

The issue is not limited to Apple M1 or macOS. We're seeing it on Neoverse-N1 running Ubuntu 20.04 ever since we upgraded to Numba 0.57. This is a server machine, and not just one. Unfortunately, we cannot downgrade Numba because we need CUDA 12.1 support.

Error message:

python: /root/miniconda3/envs/buildenv/conda-bld/llvmdev_1680642098205/work/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:507: void llvm::RuntimeDyldELF::resolveAArch64Relocation(const llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t): Assertion `isInt<33>(Result) && "overflow check failed for relocation"' failed.

System info:

uname -a: Linux <hostname_redacted> 5.15.0-46-generic #49~20.04.1-Ubuntu SMP Mon Aug 8 18:51:21 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux

cat /etc/os-release | grep PRETTY
  PRETTY_NAME="Ubuntu 20.04.4 LTS"


lscpu
  Architecture:                    aarch64
  CPU op-mode(s):                  32-bit, 64-bit
  Byte Order:                      Little Endian
  CPU(s):                          80
  On-line CPU(s) list:             0-79
  Thread(s) per core:              1
  Core(s) per socket:              80
  Socket(s):                       1
  NUMA node(s):                    1
  Vendor ID:                       ARM
  Model:                           1
  Model name:                      Neoverse-N1
  Stepping:                        r3p1
  Frequency boost:                 disabled
  CPU max MHz:                     3000.0000
  CPU min MHz:                     1000.0000
  BogoMIPS:                        50.00
  L1d cache:                       5 MiB
  L1i cache:                       5 MiB
  L2 cache:                        80 MiB
  NUMA node0 CPU(s):               0-79
  Vulnerability Itlb multihit:     Not affected
  Vulnerability L1tf:              Not affected
  Vulnerability Mds:               Not affected
  Vulnerability Meltdown:          Not affected
  Vulnerability Mmio stale data:   Not affected
  Vulnerability Retbleed:          Not affected
  Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
  Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
  Vulnerability Spectre v2:        Mitigation; CSV2, BHB
  Vulnerability Srbds:             Not affected
  Vulnerability Tsx async abort:   Not affected
  Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs

free -m
                total        used        free      shared  buff/cache   available
  Mem:         514318        5282       73564           6      435471      504537
  Swap:          2047         213        1834

@gmarkall
Member

gmarkall commented Jun 2, 2023

I can confirm being able to reproduce a similar issue on a non-M1 AArch64 system. In general we can overflow relocations. The assertion is a little different because Linux on AArch64 is using RuntimeDyldELF and not RuntimeDyldMachO, but I think the principle (and the root cause) is the same; I need to investigate further to be sure. At present I'm reproducing with DALI like:

/opt/dali/dali/test/python# DALI_EXTRA_PATH=/opt/dali_extra python -m nose2 --verbose --plugin=nose2_test_timer.plugin --with-timer --timer-color --timer-top-n 20 -A '!slow' -s operator_1 test_numba_func
test_numba_func.test_multiple_ins ... ok
test_numba_func.test_split_images_col ... ok
test_numba_func.test_numba_func:1
[(10, 10, 10)], <class 'numpy.uint8'>, <function set_all_values_to_255_batch at ... ok
test_numba_func.test_numba_func:2
[(10, 10, 10)], <class 'numpy.uint8'>, <function set_all_values_to_255_sample a ... python: /root/miniconda3/envs/buildenv/conda-bld/llvmdev_1680642098205/work/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:507: void llvm::RuntimeDyldELF::resolveAArch64Relocation(const llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t): Assertion `isInt<33>(Result) && "overflow check failed for relocation"' failed.
Aborted (core dumped)

and I need to figure out how to make a Numba-only reproducer.

I'm working on a system very similar to the one reported by @mzient in #8567 (comment), with just some minor OS / kernel version differences.
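
Since the same overflow surfaces with two different assertion texts, a small helper can tell them apart when scanning CI logs. This is only a sketch: the patterns are copied from the messages quoted in this thread, and the names `ASSERTION_PATTERNS` and `classify_assertion` are made up for illustration.

```python
import re

# Assertion texts as quoted in this thread, keyed by the RuntimeDyld
# backend whose source file they were reported from.
ASSERTION_PATTERNS = {
    "RuntimeDyldMachOAArch64": re.compile(
        r'isInt<33>\(Addend\) && "Invalid page reloc value\."'
    ),
    "RuntimeDyldELF": re.compile(
        r'isInt<33>\(Result\) && "overflow check failed for relocation"'
    ),
}


def classify_assertion(stderr_text):
    """Return the backend name whose assertion appears in the text, else None."""
    for backend, pattern in ASSERTION_PATTERNS.items():
        if pattern.search(stderr_text):
            return backend
    return None
```

Feeding it the captured stderr of a crashed run returns the MachO name for the macOS reports and the ELF name for the Linux AArch64 reports.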

@gmarkall
Member

gmarkall commented Jun 2, 2023

I couldn't trigger this issue with @sklam's script from #8567 (comment), even after hundreds of runs on a Linux AArch64 system. However, the following (still using DALI, but without needing a test harness) does reproduce the issue pretty reliably:

test_standalone.py
import numpy as np
from nvidia.dali import pipeline_def
import nvidia.dali as dali
import nvidia.dali.fn as fn
import nvidia.dali.types as dali_types
from nvidia.dali.plugin.numba.fn.experimental import numba_function


def set_all_values_to_255_batch(out0, in0):
    out0[0][:] = 255


def set_all_values_to_255_sample(out0, in0):
    out0[:] = 255


def set_all_values_to_float_batch(out0, in0):
    out0[0][:] = 0.5


def set_all_values_to_float_sample(out0, in0):
    out0[:] = 0.5


def setup_change_out_shape(out_shape, in_shape):
    out0_shape = out_shape[0]
    in0_shape = in_shape[0]
    perm = [1, 2, 0]
    for sample_idx in range(len(out0_shape)):
        for d in range(len(perm)):
            out0_shape[sample_idx][d] = in0_shape[sample_idx][perm[d]]


def change_out_shape_batch(out0, in0):
    for sample_id in range(len(out0)):
        out0[sample_id][:] = 42


def change_out_shape_sample(out0, in0):
    out0[:] = 42


def get_data(shapes, dtype):
    return [np.empty(shape, dtype=dtype) for shape in shapes]


def get_data_zeros(shapes, dtype):
    return [np.zeros(shape, dtype=dtype) for shape in shapes]

@pipeline_def
def numba_func_pipe(shapes, dtype, run_fn=None, out_types=None, in_types=None,
                    outs_ndim=None, ins_ndim=None, setup_fn=None, batch_processing=None):
    data = fn.external_source(lambda: get_data(shapes, dtype), batch=True, device="cpu")
    return numba_function(
        data, run_fn=run_fn, out_types=out_types, in_types=in_types,
        outs_ndim=outs_ndim, ins_ndim=ins_ndim, setup_fn=setup_fn,
        batch_processing=batch_processing)


def _testimpl_numba_func(shapes, dtype, run_fn, out_types, in_types,
                         outs_ndim, ins_ndim, setup_fn, batch_processing, expected_out):
    batch_size = len(shapes)
    pipe = numba_func_pipe(
        batch_size=batch_size, num_threads=1, device_id=0,
        shapes=shapes, dtype=dtype,
        run_fn=run_fn, setup_fn=setup_fn, out_types=out_types,
        in_types=in_types, outs_ndim=outs_ndim, ins_ndim=ins_ndim,
        batch_processing=batch_processing)
    pipe.build()
    for _ in range(3):
        outs = pipe.run()
        for i in range(batch_size):
            out_arr = np.array(outs[0][i])
            assert np.array_equal(out_arr, expected_out[i])

def test_numba_func():
    # shape, dtype, run_fn, out_types,
    # in_types, out_ndim, in_ndim, setup_fn, batch_processing,
    # expected_out
    args = [
        ([(10, 10, 10)], np.uint8, set_all_values_to_255_batch, [dali_types.UINT8],
         [dali_types.UINT8], [3], [3], None, True,
         [np.full((10, 10, 10), 255, dtype=np.uint8)]),
        ([(10, 10, 10)], np.uint8, set_all_values_to_255_sample, [dali_types.UINT8],
         [dali_types.UINT8], [3], [3], None, None,
         [np.full((10, 10, 10), 255, dtype=np.uint8)]),
        ([(10, 10, 10)], np.float32, set_all_values_to_float_batch, [dali_types.FLOAT],
         [dali_types.FLOAT], [3], [3], None, True,
         [np.full((10, 10, 10), 0.5, dtype=np.float32)]),
        ([(10, 10, 10)], np.float32, set_all_values_to_float_sample, [dali_types.FLOAT],
         [dali_types.FLOAT], [3], [3], None, None,
         [np.full((10, 10, 10), 0.5, dtype=np.float32)]),
        ([(10, 20, 30), (20, 10, 30)], np.int64, change_out_shape_batch, [dali_types.INT64],
         [dali_types.INT64], [3], [3], setup_change_out_shape, True,
         [np.full((20, 30, 10), 42, dtype=np.int32),
          np.full((10, 30, 20), 42, dtype=np.int32)]),
        ([(10, 20, 30), (20, 10, 30)], np.int64, change_out_shape_sample, [dali_types.INT64],
         [dali_types.INT64], [3], [3], setup_change_out_shape, None,
         [np.full((20, 30, 10), 42, dtype=np.int32),
          np.full((10, 30, 20), 42, dtype=np.int32)]),
    ]

    for shape, dtype, run_fn, out_types, in_types, outs_ndim, ins_ndim, \
            setup_fn, batch_processing, expected_out in args:
        _testimpl_numba_func(
            shape, dtype, run_fn, out_types, in_types, outs_ndim, ins_ndim,
            setup_fn, batch_processing, expected_out
        )

test_numba_func()

which gives this on almost every run:

$ python test_standalone.py 
python: /root/miniconda3/envs/buildenv/conda-bld/llvmdev_1680642098205/work/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:507: void llvm::RuntimeDyldELF::resolveAArch64Relocation(const llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t): Assertion `isInt<33>(Result) && "overflow check failed for relocation"' failed.
Aborted (core dumped)

@gmarkall
Member

I can give you a script that is able to reproduce it quite often if that can help

Yes please!

@Francyrad

Francyrad commented Aug 24, 2023

I can give you a script that is able to reproduce it quite often if that can help

Yes please!

please write me in francyrad.info@gmail.com

The script and the file that you will read is quite big

@carstenr

Alright, that means we have two large cases to reproduce then. We will focus on reducing them as much as possible.

@gmarkall
Member

Alright, that means we have two large cases to reproduce then. We will focus on reducing them as much as possible.

Another thought I think worth sharing - it should be possible to get to a reproducer that doesn't depend on Numba at all - if it's minimised as much as possible, it would just involve calls to llvmlite. (Or even simpler than that, a small C++ source that links to LLVM only, to even take llvmlite out of the loop - but I think the "just llvmlite" case would already be a good starting point)

@carstenr

Might take a while to get there as our developers naturally have a strong python background. We will start with a minimal nixtla setup, which is where this popped up for us. And from there on we will work our way down.

@PhilipVinc

PhilipVinc commented Oct 19, 2023

Bump.

I am consistently seeing this on M1 Pro and M2. It's a bit involved, but it occurs with ~30% probability in my code.

Are you still looking for a reproducer @gmarkall ?

@PhilipVinc

FYI by googling I noticed that when porting Julia to ARM they also hit the same bug. Look at JuliaLang/julia#36617 and search in the page for "Assertion failed: (isInt<33>(Addend) && "Invalid page reloc value."),".

Apparently, if this can help at all, the PR that fixed the issue was JuliaLang/julia#43664 ...

@Francyrad

FYI by googling I noticed that when porting Julia to ARM they also hit the same bug. Look at JuliaLang/julia#36617 and search in the page for "Assertion failed: (isInt<33>(Addend) && "Invalid page reloc value."),".

Apparently, if this can help at all, the PR that fixed the issue was JuliaLang/julia#43664 ...

The problem is still present

@gmarkall
Member

Are you still looking for a reproducer @gmarkall ?

Luckily, and coincidentally, I was working on this today, and I now have a pretty good one, which I'm going to add to #9001 because I'm tackling the issue on Linux AArch64 at present.

In case you want to try it, it's:

from numba import njit

@njit
def f(x, y):
    return x + y

i = 0

while True:
    print(i)
    t = tuple(range(i))
    f(t, (1j,))
    i += 1

executed with:

$ ulimit -s 1048576
$ python repro.py

gives:

0
1
2
3
4
5
6
7
8
9
python: /opt/conda/conda-bld/llvmdev_1684517249134/work/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:507: void llvm::RuntimeDyldELF::resolveAArch64Relocation(const llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t): Assertion `isInt<33>(Result) && "overflow check failed for relocation"' failed.
Aborted (core dumped)

It'd be interesting to know if that also triggers the error on your Mac. You might need to do something similar to my ulimit invocation above to increase the stack limit.

@PhilipVinc

I can't set the ulimit to such large numbers on Mac. It errors with:

ulimit: value exceeds hard limit

The largest ulimit I can set is ulimit -s 65520

but it is not crashing for now...
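
Whether a given `ulimit -s` value can be set depends on the hard limit, which Python exposes through the standard `resource` module. The sketch below only inspects limits and changes nothing. It assumes the shell's `ulimit -s` counts in KiB while `resource` reports bytes, and the helper names are hypothetical.

```python
import resource


def current_stack_limits():
    """Return the (soft, hard) stack limits of this process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_STACK)


def stack_request_allowed(requested_kib, hard_limit_bytes):
    """Would `ulimit -s requested_kib` stay within the given hard limit?"""
    if hard_limit_bytes == resource.RLIM_INFINITY:
        return True  # no hard cap at all
    return requested_kib * 1024 <= hard_limit_bytes
```

For example, `stack_request_allowed(1048576, current_stack_limits()[1])` indicates whether the value used in the Linux reproducer would be rejected with "value exceeds hard limit" on the current machine.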

@gmarkall
Member

What number did it get to before you stopped it?

@PhilipVinc

PhilipVinc commented Oct 19, 2023 via email

@gmarkall
Member

That would be great if you could give it a go!

@gmarkall
Member

@PhilipVinc Is it still running? :-)

@PhilipVinc

PhilipVinc commented Oct 24, 2023

@gmarkall it crashes at 1001 but I think this is due to some check in numba itself?

999
1000
1001
Traceback (most recent call last):
  File "/Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/repro.py", line 12, in <module>
  File "/Users/filippo.vicentini/Documents/pythonenvs/netket/python-3.11.2/lib/python3.11/site-packages/numba/core/dispatcher.py", line 471, in _compile_for_args
    error_rewrite(e, 'unsupported_error')
  File "/Users/filippo.vicentini/Documents/pythonenvs/netket/python-3.11.2/lib/python3.11/site-packages/numba/core/dispatcher.py", line 409, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.UnsupportedError: Failed in nopython mode pipeline (step: ensure features that are in use are in a valid form)
Tuple 'x' length must be smaller than 1000.
Large tuples lead to the generation of a prohibitively large LLVM IR which causes excessive memory pressure and large compile times.
As an alternative, the use of a 'list' is recommended in place of a 'tuple' as lists do not suffer from this problem.

File "repro.py", line 3:
<source missing, REPL/exec in use?>

EDIT: This is with ulimit -s 65520

@gmarkall
Member

@PhilipVinc Thanks - indeed, that was a Numba limitation. I think in #9001 and https://github.com/gmarkall/numba-issue-9001 we're getting close to a really good reproducer now, so there's probably no need for additional testing here - thanks for everything you've looked into so far :-)

@gmarkall
Member

gmarkall commented Nov 3, 2023

I've started an LLVM Discourse discussion about a potential fix: https://discourse.llvm.org/t/llvm-rtdyld-aarch64-abi-relocation-restrictions/74616

@gmarkall
Member

@Francyrad @PhilipVinc @carstenr It's early work at the moment, but if you're able to build llvmlite from source with the PR numba/llvmlite#1009, and let me know whether you still observe the issue with it (or observe any other issues) that would be good feedback - hopefully this resolves the issue, but there's a lot of testing / review to be done to have confidence in the strategy.

@jacobjivanov

I have experienced this issue repeatedly over the past month, getting errors similar to the following for my ~150 line code for solving a specific PDE:

Assertion failed: (isInt<33>(Addend) && "Invalid page reloc value."), function encodeAddend, file /Users/ci/miniconda3-arm64/conda-bld/llvmdev_1643905487494/work/lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldMachOAArch64.h, line 210.

@gmarkall, I'm not quite sure how to build from source but am happy to try and test it out.

@gmarkall
Member

@jacobjivanov Thanks for sharing this info - fortunately you don't need to build from source to test the fix now, as it's part of the llvmlite 0.42 / Numba 0.59 release candidates. You can follow the instructions here to install the Numba and llvmlite release candidates: https://numba.discourse.group/t/ann-numba-0-59-0rc1-and-llvmlite-0-42-0rc1/2329

If you try this, I'd really appreciate if you can let me know whether it appears to have solved the issue for you.
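
To check whether an environment already has an llvmlite at or above the 0.42 line mentioned here, something like the following sketch could be used. It relies only on the standard library; the helper names and the simple version parsing are assumptions made for illustration, not part of Numba or llvmlite.

```python
import re
from importlib import metadata

# First llvmlite release line reported in this thread to carry the fix.
FIXED_LLVMLITE = (0, 42)


def release_tuple(version):
    """Leading numeric components of a version string, e.g. '0.42.0rc1' -> (0, 42, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_reported_fix(version):
    """True if the version is at least 0.42 (release candidates included)."""
    return release_tuple(version)[:2] >= FIXED_LLVMLITE


def installed_llvmlite_version():
    """Installed llvmlite version string, or None if it is not installed."""
    try:
        return metadata.version("llvmlite")
    except metadata.PackageNotFoundError:
        return None
```

A quick check is then `v = installed_llvmlite_version()` followed by `has_reported_fix(v)` when `v` is not `None`.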

@jacobjivanov

@gmarkall, I can't confirm whether it'll ever fail, but it no longer fails for the particular script that would fail roughly 50% of the time previously. Ran it ~20 times with different initial conditions.

@GeorgWa

GeorgWa commented Dec 22, 2023

@gmarkall Your work is greatly appreciated! Switching to the release candidate also solved the issue for one of our packages which would occasionally fail.

@esc
Member

esc commented Feb 5, 2024

With llvmlite now at 0.42.0 and the new memory manager merged, can we close this?

@gmarkall
Member

gmarkall commented Feb 5, 2024

I've not heard of any reports of this issue manifesting in llvmlite 0.42, so I think so.

@esc
Member

esc commented Feb 5, 2024

I've not heard of any reports of this issue manifesting in llvmlite 0.42, so I think so.

Alright, let's put a proverbial checkmark behind this issue. We always have the option to re-open it if needed.

@gmarkall thank you again for the fix for this, it is much appreciated!
