Numba appears to merge consecutive prange calls #8515

Closed
2 tasks done
adetunji-david opened this issue Oct 16, 2022 · 4 comments · Fixed by #8536

Comments

@adetunji-david

Reporting a bug

  • I have tried using the latest released version of Numba (most recent is
    visible in the change log: https://github.com/numba/numba/blob/main/CHANGE_LOG).
  • I have included a self contained code sample to reproduce the problem.
    i.e. it's possible to run as 'python bug.py'.

Consider the following code, which uses parallelization to fill an array with the values 0 to 14 (for n = 15) and then subtracts the preceding element from each element of the array.

import numpy as np
from numba import njit, prange


@njit
def do_long_calculation():
    x = 0.0
    for k in range(1000):
        x += np.random.standard_normal()
    return x


@njit
def serial_f(n):
    results = np.zeros(n, dtype=np.intp)
    counters = np.zeros(n, dtype=np.intp)
    for i in range(n):
        for j in range(i):
            do_long_calculation()
            counters[i] += 1

    for i in range(n):
        if i == 0:
            continue
        results[i] = counters[i] - counters[i - 1]

    return results[1:]


@njit(parallel=True)
def parallel_f(n):
    results = np.zeros(n, dtype=np.intp)
    counters = np.zeros(n, dtype=np.intp)
    for i in prange(n):
        for j in range(i):
            do_long_calculation()
            counters[i] += 1

    for i in prange(n):
        if i == 0:
            continue
        results[i] = counters[i] - counters[i - 1]

    return results[1:]


@njit(parallel=True)
def parallel_f2(n):
    results = np.zeros(n, dtype=np.intp)
    counters = np.zeros(n, dtype=np.intp)
    for i in prange(n):
        for j in range(i):
            do_long_calculation()
            counters[i] += 1

    do_long_calculation()

    for i in prange(n):
        if i == 0:
            continue
        results[i] = counters[i] - counters[i - 1]

    return results[1:]


if __name__ == "__main__":
    n = 15
    print("serial")
    print(serial_f(n))
    print("parallel with intervening code")
    print(parallel_f2(n))
    print("parallel")
    print(parallel_f(n))

This prints

serial
[1 1 1 1 1 1 1 1 1 1 1 1 1 1]
parallel with intervening code
[1 1 1 1 1 1 1 1 1 1 1 1 1 1]
parallel
[ 1  1  1  3  1  4  1  6  1  6  1 12  1  6]

The expected result is an array of 1s, but the parallelized variant without any intervening code between the prange blocks gives an incorrect result that changes with each invocation.
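Because the incorrect result changes from run to run, a single call can pass by chance. The following is a minimal sketch of a repeat check, assuming only that the function under test takes `n` and returns a sequence of length `n - 1`. `count_mismatches` and `reference` are hypothetical names introduced for illustration; they are not part of Numba or of the report.

```python
def count_mismatches(func, n, runs=20):
    """Call func(n) `runs` times and count the calls whose result is not all ones."""
    expected = [1] * (n - 1)
    mismatches = 0
    for _ in range(runs):
        # list() also accepts a NumPy array, such as the ones returned in the report
        if list(func(n)) != expected:
            mismatches += 1
    return mismatches


def reference(n):
    """Plain-Python stand-in with the same contract as serial_f (hypothetical)."""
    counters = list(range(n))  # counters[i] ends up equal to i
    return [counters[i] - counters[i - 1] for i in range(1, n)]


print(count_mismatches(reference, 15))  # 0
```

Passing `parallel_f` from the script above in place of `reference` would show how often the parallel variant deviates on a given machine.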

@stuartarchibald
Contributor

Thanks for the report @adetunji-david. Which Numba version are you using? I think that #7864 may have fixed this issue; the first version it is present in is Numba 0.56.0.

@adetunji-david
Author

@stuartarchibald pip list shows I am running version 0.56.3

@stuartarchibald
Contributor

Thanks for confirming @adetunji-david. I've been using this as a possible reproducer:

from numba import njit, prange
import numpy as np

@njit
def work():
    x = 0.0
    for k in range(100):
        x += np.random.standard_normal()
    return x

@njit(parallel=True)
def foo(n):
    r = np.zeros(n, dtype=np.intp)
    c = np.zeros(n, dtype=np.intp)
    for i in prange(n):
        for j in range(i):
            work()
            c[i] += 1

    for i in prange(n):
        if i == 0:
            continue
        r[i] = c[i] - c[i - 1]
    return r[1:]

print(foo(15))
foo.parallel_diagnostics(level=4)

Running this on main, this is the relevant part of the parallel diagnostics output:

================================================================================
====== Parallel Accelerator Optimizing:  Function foo, bug.py (11)  ======
================================================================================


Parallel loop listing for  Function foo, bug.py (11) 
--------------------------------------|loop #ID
@njit(parallel=True)                  | 
def foo(n):                           | 
    r = np.zeros(n, dtype=np.intp)----| #0
    c = np.zeros(n, dtype=np.intp)----| #1
    for i in prange(n):---------------| #2
        for j in range(i):            | 
            work()                    | 
            c[i] += 1                 | 
                                      | 
    for i in prange(n):---------------| #3
        if i == 0:                    | 
            continue                  | 
        r[i] = c[i] - c[i - 1]        | 
    return r[1:]                      | 
--------------------------------- Fusing loops ---------------------------------
Attempting fusion of parallel loops (combines loops with similar properties)...
  Trying to fuse loops #0 and #1:
    - fusion succeeded: parallel for-loop #1 is fused into for-loop #0.
  Trying to fuse loops #2 and #3:
    - fusion failed: cross iteration dependency found between loops #2 and #3
  Trying to fuse loops #0 and #2:
    - fusion succeeded: parallel for-loop #2 is fused into for-loop #0.
  Trying to fuse loops #0 and #3:
    - fusion failed: cross iteration dependency found between loops #0 and #3
  Trying to fuse loops #0 and #3:
    - fusion failed: cross iteration dependency found between loops #0 and #3
----------------------------- Before Optimisation ------------------------------
Parallel region 0:
+--0 (parallel)
+--1 (parallel)
+--2 (parallel)


--------------------------------------------------------------------------------
------------------------------ After Optimisation ------------------------------
Parallel region 0:
+--0 (parallel, fused with loop(s): 1, 2)


 
Parallel region 0 (loop #0) had 2 loop(s) fused.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

The loop fusion algorithm reports that loops #1 and #2 are fused into #0, but the fused loop #0 and loop #3 have a cross-iteration dependency, so they are not fused further.
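To illustrate why a cross-iteration dependency rules out fusing the two `prange` loops, here is a pure-Python sketch. It is not Numba's implementation: the iteration order passed in merely stands in for whatever order parallel threads happen to run in, and `run_separately` and `run_fused` are hypothetical names. It also does not show where the wrong result in the report comes from, only what an unsafe fusion of these two loops would produce.

```python
def run_separately(n, order):
    """Two distinct loops: every counter is written before any is read."""
    counters = [0] * n
    results = [0] * n
    for i in order:          # first prange loop
        counters[i] = i      # stands in for the inner `c[i] += 1` loop
    for i in order:          # second prange loop
        if i == 0:
            continue
        results[i] = counters[i] - counters[i - 1]
    return results[1:]


def run_fused(n, order):
    """Both loop bodies merged into one iteration (unsafe for this code)."""
    counters = [0] * n
    results = [0] * n
    for i in order:
        counters[i] = i
        if i == 0:
            continue
        # counters[i - 1] is written by a different iteration,
        # which may not have run yet
        results[i] = counters[i] - counters[i - 1]
    return results[1:]


n = 6
forward = list(range(n))
backward = forward[::-1]

print(run_separately(n, backward))  # [1, 1, 1, 1, 1]
print(run_fused(n, forward))        # [1, 1, 1, 1, 1]
print(run_fused(n, backward))       # [1, 2, 3, 4, 5]
```

The separate loops give the same answer for any iteration order, while the fused version is only correct when iteration `i - 1` happens to finish before iteration `i`.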

@stuartarchibald
Contributor

stuartarchibald commented Oct 21, 2022

During the issue triage meeting this week, @guilhermeleobas managed to reproduce this locally using the OP's code against the main branch. @DrTodd13, any chance you could take a look at this please? Thanks!

@stuartarchibald stuartarchibald added bug - incorrect behavior Bugs: incorrect behavior and removed needtriage labels Oct 21, 2022