Segfault when using assert with parallel=True #9029

Closed
99991 opened this issue Jun 21, 2023 · 8 comments · Fixed by #9048
Labels
bug - segfault (Bugs that cause SIGSEGV, SIGABRT, SIGILL, SIGBUS), ParallelAccelerator

Comments

@99991

99991 commented Jun 21, 2023

The following code triggers a segfault. Tested on Ubuntu 22.04 with Numba 0.56.4 and Numba 0.57.0.

from numba import njit
import numpy as np

@njit(parallel=True)
def foo():
    n = 1000

    for _ in range(n):
        for _ in range(n):
            values = np.zeros(3)

            np.max(values)

            assert n >= 0

foo()
  • I have tried using the latest released version of Numba (the most recent is
    visible in the change log: https://github.com/numba/numba/blob/main/CHANGE_LOG).
  • I have included a self-contained code sample to reproduce the problem,
    i.e. it's possible to run as 'python bug.py'.
@esc
Member

esc commented Jun 21, 2023

This happens when n is too big. On my machine, n = 323 still works, but 324 segfaults.

@esc added the bug - segfault and ParallelAccelerator labels Jun 21, 2023
@esc
Member

esc commented Jun 21, 2023

ping @DrTodd13

@DrTodd13
Collaborator

@99991 You can do the body of your nested for loops in another function and that should solve the problem. I wonder if this MWR obscures the issue a bit because parallelizing a "max" over 3 elements isn't very useful and in fact will slow the program down. Can the outer loops be collapsed into one loop with pndindex or one of the loops here made into a prange? If so, then the "max" inside the loop nest will be lowered sequentially and the problem will also go away. We think we know why this segfault is occurring; the fix is probably moderately difficult and we will be working on it, but we wanted to give some workarounds in the meantime.
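
A minimal sketch of the first workaround (moving the loop body into a separate njit helper; the helper name inner_body is illustrative, not from this issue):

from numba import njit
import numpy as np

@njit
def inner_body(n):
    # Same body as the reproducer, now compiled in its own (non-parallel) function.
    values = np.zeros(3)
    np.max(values)
    assert n >= 0

@njit(parallel=True)
def foo():
    n = 1000

    for _ in range(n):
        for _ in range(n):
            inner_body(n)

foo()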

@99991
Author

99991 commented Jun 22, 2023

> wanted to give some workarounds in the meantime.

Thank you!

> I wonder if this MWR obscures the issue a bit because parallelizing a "max" over 3 elements isn't very useful and in fact will slow the program down.

Definitely. But I doubt that anyone would read the 700 lines of filter kernels that this has been condensed from, and removing the prange was required for the M in MWR.

> You can do the body of your nested for loops in another function and that should solve the problem.

Perhaps, but the original code would then need to pass about 16 parameters, which is a bit much I think.

> Can the outer loops be collapsed into one loop with pndindex or one of the loops here made into a prange? If so, then the "max" inside the loop nest will be lowered sequentially and the problem will also go away.

The original code computes something for every pixel of an image without any pixel dependencies (embarrassingly parallel), so both of the loops could make use of prange. I hadn't heard of pndindex yet.
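
For reference, a sketch of what collapsing the two pixel loops with pndindex might look like (the per_pixel name and the per-pixel work are illustrative, not from the original code):

from numba import njit, pndindex
import numpy as np

@njit(parallel=True)
def per_pixel(image):
    h, w = image.shape
    out = np.empty((h, w))
    # pndindex iterates the collapsed (y, x) index space in parallel.
    for idx in pndindex(h, w):
        out[idx] = 2.0 * image[idx]
    return out

per_pixel(np.zeros((4, 4)))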

Anyway, when using prange for any loop and changing the assert to i >= 0, there are still segfaults:

import os

# Set the diagnostics level before importing numba so it is picked up at import time.
os.environ["NUMBA_PARALLEL_DIAGNOSTICS"] = "4"

from numba import njit, prange
import numpy as np

@njit(parallel=True)
def foo():
    for y in prange(1000):
        for x in range(1000):
            values = np.zeros(3)

            i = np.argmax(values)

            assert i >= 0

foo()

I do not know how to read the diagnostics, but it seems that something is still happening with argmax.
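
(Side note, not from the thread: once foo has been compiled, the same report can also be printed from the dispatcher, which avoids relying on the environment variable.)

foo()  # trigger compilation first
foo.parallel_diagnostics(level=4)  # print the parallel transform report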

@gmarkall
Member

@DrTodd13 Is it likely we can fix this by 0.58?

@DrTodd13
Collaborator

@99991 @stuartarchibald This doesn't appear to be documented anywhere, but the parallelism code will only convert a prange to run in parallel if it is able to rewrite the loop into a parfor, and it can only do that if there is only one exit from the loop. Because you have an assert in there, there is more than one exit from the loop. To some extent, an assert here is ill-defined anyway: sequentially you would hit the first failing assert, but in parallel you might hit a different assert, or a different element might trigger it. I'm not sure this was planned, but it is somewhat fortunate, because the parfor is turned into a gufunc and gufuncs swallow exceptions anyway.

So, if you take out the assert, the parfors in the loop body (the zeros and the argmax) are lowered sequentially and you don't get a segfault. Suggestions, Stuart?
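
One way to illustrate the single-exit point (a sketch, not from this thread): accumulate a failure count via a supported reduction inside the prange loop and assert only after the parallel region, so the loop body itself has no extra exit:

from numba import njit, prange
import numpy as np

@njit(parallel=True)
def foo():
    bad = 0
    for y in prange(1000):
        for x in range(1000):
            values = np.zeros(3)
            i = np.argmax(values)
            bad += i < 0  # sum reduction; no early exit from the loop body
    # Single exit from the prange loop; check the invariant afterwards.
    assert bad == 0

foo()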

@stuartarchibald
Contributor

@DrTodd13 maybe:

  • Document the lack of support (or if it's easy, which I don't think it necessarily will be, catch the issue with >1 loop exits at compile time)?
  • Fix the stack overflow issue (which seems to be addressed by "Dynamically allocate parfor schedule", #9048)?

How does that sound?

@DrTodd13
Collaborator

DrTodd13 commented Jul 7, 2023

> @DrTodd13 maybe:
>
>   • Document the lack of support (or if it's easy, which I don't think it necessarily will be, catch the issue with >1 loop exits at compile time)?
>   • Fix the stack overflow issue (which seems to be addressed by "Dynamically allocate parfor schedule", #9048)?
>
> How does that sound?

I added documentation and a warning for when a prange loop isn't parallelized (also in PR #9048). I also fixed the stack overflow issue in #9048.
