Numba appears to merge consecutive prange calls #8515

adetunji-david · 2022-10-16T09:13:45Z

Reporting a bug

I have tried using the latest released version of Numba (the most recent is
visible in the change log: https://github.com/numba/numba/blob/main/CHANGE_LOG).
I have included a self-contained code sample to reproduce the problem,
i.e. it's possible to run it as 'python bug.py'.

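Consider the following code. It uses parallel loops to fill an array of counters so that counters[i] == i for i in 0 to n - 1 (each counter is incremented i times, with an expensive call on every increment). It then subtracts from each element the element before it.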

import numpy as np
from numba import njit, prange


@njit
def do_long_calculation():
    x = 0.0
    for k in range(1000):
        x += np.random.standard_normal()
    return x


@njit
def serial_f(n):
    results = np.zeros(n, dtype=np.intp)
    counters = np.zeros(n, dtype=np.intp)
    for i in range(n):
        for j in range(i):
            do_long_calculation()
            counters[i] += 1

    for i in range(n):
        if i == 0:
            continue
        results[i] = counters[i] - counters[i - 1]

    return results[1:]


@njit(parallel=True)
def parallel_f(n):
    results = np.zeros(n, dtype=np.intp)
    counters = np.zeros(n, dtype=np.intp)
    for i in prange(n):
        for j in range(i):
            do_long_calculation()
            counters[i] += 1

    for i in prange(n):
        if i == 0:
            continue
        results[i] = counters[i] - counters[i - 1]

    return results[1:]


@njit(parallel=True)
def parallel_f2(n):
    results = np.zeros(n, dtype=np.intp)
    counters = np.zeros(n, dtype=np.intp)
    for i in prange(n):
        for j in range(i):
            do_long_calculation()
            counters[i] += 1

    do_long_calculation()

    for i in prange(n):
        if i == 0:
            continue
        results[i] = counters[i] - counters[i - 1]

    return results[1:]


if __name__ == "__main__":
    n = 15
    print("serial")
    print(serial_f(n))
    print("parallel with intervening code")
    print(parallel_f2(n))
    print("parallel")
    print(parallel_f(n))

This prints

serial
[1 1 1 1 1 1 1 1 1 1 1 1 1 1]
parallel with intervening code
[1 1 1 1 1 1 1 1 1 1 1 1 1 1]
parallel
[ 1  1  1  3  1  4  1  6  1  6  1 12  1  6]

The expected result is an array of 1s, but the parallelized variant without any intervening code between the prange blocks gives an incorrect result that changes with each invocation.
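For reference, the expected values can be reproduced with plain NumPy, without Numba. This is a minimal sketch that is not part of the original report. It leaves out do_long_calculation(), whose return value is discarded and which therefore only adds work. The helper name expected_differences is hypothetical.

```python
import numpy as np


def expected_differences(n):
    # Each counters[i] is incremented once per iteration of range(i),
    # so after the first loop it holds exactly i.
    counters = np.arange(n, dtype=np.intp)
    # results[i] = counters[i] - counters[i - 1] for i >= 1, which is what
    # np.diff computes; it has n - 1 entries, matching results[1:].
    return np.diff(counters)


print(expected_differences(15))
```

For n = 15 this gives fourteen 1s, matching the serial output above.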


stuartarchibald · 2022-10-17T10:02:06Z

Thanks for the report @adetunji-david. Which Numba version are you using? I think that #7864 may have fixed this issue; the first version it is present in is Numba 0.56.0.

adetunji-david · 2022-10-17T12:57:48Z

@stuartarchibald pip list shows I am running version 0.56.3.

stuartarchibald · 2022-10-18T11:49:36Z

Thanks for confirming @adetunji-david. I've been using this as a possible reproducer:

from numba import njit, prange
import numpy as np

@njit
def work():
    x = 0.0
    for k in range(100):
        x += np.random.standard_normal()
    return x

@njit(parallel=True)
def foo(n):
    r = np.zeros(n, dtype=np.intp)
    c = np.zeros(n, dtype=np.intp)
    for i in prange(n):
        for j in range(i):
            work()
            c[i] += 1

    for i in prange(n):
        if i == 0:
            continue
        r[i] = c[i] - c[i - 1]
    return r[1:]

print(foo(15))
foo.parallel_diagnostics(level=4)

Running this on main, here is the relevant part of the parallel diagnostics output:

================================================================================
====== Parallel Accelerator Optimizing:  Function foo, bug.py (11)  ======
================================================================================


Parallel loop listing for  Function foo, bug.py (11) 
--------------------------------------|loop #ID
@njit(parallel=True)                  | 
def foo(n):                           | 
    r = np.zeros(n, dtype=np.intp)----| #0
    c = np.zeros(n, dtype=np.intp)----| #1
    for i in prange(n):---------------| #2
        for j in range(i):            | 
            work()                    | 
            c[i] += 1                 | 
                                      | 
    for i in prange(n):---------------| #3
        if i == 0:                    | 
            continue                  | 
        r[i] = c[i] - c[i - 1]        | 
    return r[1:]                      | 
--------------------------------- Fusing loops ---------------------------------
Attempting fusion of parallel loops (combines loops with similar properties)...
  Trying to fuse loops #0 and #1:
    - fusion succeeded: parallel for-loop #1 is fused into for-loop #0.
  Trying to fuse loops #2 and #3:
    - fusion failed: cross iteration dependency found between loops #2 and #3
  Trying to fuse loops #0 and #2:
    - fusion succeeded: parallel for-loop #2 is fused into for-loop #0.
  Trying to fuse loops #0 and #3:
    - fusion failed: cross iteration dependency found between loops #0 and #3
  Trying to fuse loops #0 and #3:
    - fusion failed: cross iteration dependency found between loops #0 and #3
----------------------------- Before Optimisation ------------------------------
Parallel region 0:
+--0 (parallel)
+--1 (parallel)
+--2 (parallel)


--------------------------------------------------------------------------------
------------------------------ After Optimisation ------------------------------
Parallel region 0:
+--0 (parallel, fused with loop(s): 1, 2)


 
Parallel region 0 (loop #0) had 2 loop(s) fused.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

The loop fusion algorithm reports that #1 and #2 are fused into #0, but that the fused loop #0 and loop #3 have a cross-iteration dependency and so are not fused further.
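To illustrate why loops #2 and #3 must not be fused, here is a minimal serial Python sketch. It is not Numba's implementation. It runs the body of loop #2 followed by the body of loop #3 for the same index i, as a fused loop would, and feeds the iterations in different orders. Iteration i of the second body reads c[i - 1], which iteration i - 1 writes, so the result depends on the order. The names fused_body and run_fused are hypothetical.

```python
import numpy as np


def fused_body(i, c, r):
    # Body of loop #2: increment c[i] once per j in range(i).
    for _ in range(i):
        c[i] += 1
    # Body of loop #3 for the same i, executed immediately afterwards.
    if i == 0:
        return
    # Reads c[i - 1], which a *different* iteration writes.
    r[i] = c[i] - c[i - 1]


def run_fused(n, order):
    r = np.zeros(n, dtype=np.intp)
    c = np.zeros(n, dtype=np.intp)
    for i in order:
        fused_body(i, c, r)
    return r[1:]


n = 15
# Ascending order happens to respect the dependency: all 1s.
print(run_fused(n, range(n)))
# Descending order runs iteration i before i - 1, so c[i - 1] is still 0
# and r[i] == i.
print(run_fused(n, reversed(range(n))))
```

In a real parallel run, iterations are split across threads. A given read of c[i - 1] may see a finished, partially incremented, or untouched counter. That would be consistent with the run-to-run variation in the reported output.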

stuartarchibald · 2022-10-21T10:55:19Z

During the issue triage meeting this week, @guilhermeleobas managed to reproduce the OP's issue locally against the main branch. @DrTodd13 any chance you could take a look at this please? Thanks!

guilhermeleobas added the needtriage label Oct 17, 2022

stuartarchibald added the ParallelAccelerator label Oct 17, 2022

stuartarchibald added the bug - incorrect behavior label and removed the needtriage label Oct 21, 2022

DrTodd13 mentioned this issue Oct 24, 2022

Fix fusion bug. #8536

Merged

sklam closed this as completed in #8536 Oct 26, 2022
