Bad Parfor loop fusion side-effects? #6920

sklam · 2021-04-12T17:19:24Z

Reporting a bug

I have tried using the latest ~~released version~~ HEAD (476d192) of Numba (the most recent release is
visible in the change log: https://github.com/numba/numba/blob/master/CHANGE_LOG).
I have included a self-contained code sample to reproduce the problem,
i.e. it's possible to run as 'python bug.py'.

Example:

import numpy as np
from numba import njit, prange


@njit
def array_add(arr, idx, val):
    old = arr[idx]
    arr[idx] += val
    return old


@njit(parallel=dict(prange=True, fusion=True))  # the problem goes away when `fusion=False`
def foo(arr):
    accum = np.zeros(1, dtype=arr.dtype) 
    for i in prange(arr.size):
        array_add(accum, 0, arr[i])

    return accum[0]


total = foo(np.arange(10, dtype=np.intp))
print(total) 

foo.parallel_diagnostics(level=4)

This always prints 0 when fusion=True and prints other values (random, due to the race condition) when fusion=False. No fusion is valid in this code because the loop domain of np.zeros(1, ...) is not equivalent to that of the for i in prange(arr.size) loop.

FYI, what I am trying to do is add an atomic add intrinsic: https://gist.github.com/sklam/e5496e412fccac6acc0e96b4413ed977


sklam · 2021-04-13T22:25:27Z

The problem seems to be coming from

numba/numba/parfors/parfor.py

Line 2827 in 835e56c

maximize_fusion(self.func_ir, self.func_ir.blocks, self.typemap)

The self.func_ir.render_dot() before that statement:

The one after that statement:

I highlighted the static_getitem associated with return accum[0]; it shows that maximize_fusion() moved the static_getitem before the prange loop. Is it not recognizing the potential side effect from array_add()?

(CC @DrTodd13)
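
The reordering described above can be reproduced in miniature without Numba. What follows is only a toy model, not Numba's IR: statements are plain Python functions acting on a dictionary, and every name in it (`run`, `loop`, `read_accum`) is hypothetical. The "loop" mutates `accum` through its body the way `array_add` does, and hoisting the read of `accum[0]` above it yields 0, the value that foo() prints with fusion enabled.

```python
def run(statements):
    """Execute toy statements in order against a fresh environment."""
    env = {"arr": list(range(10)), "accum": [0], "result": None}
    for stmt in statements:
        stmt(env)
    return env["result"]


def loop(env):
    # Stands in for the prange loop: every iteration writes to accum,
    # like array_add(accum, 0, arr[i]) does in the reproducer.
    for value in env["arr"]:
        env["accum"][0] += value


def read_accum(env):
    # Stands in for the static_getitem behind `return accum[0]`.
    env["result"] = env["accum"][0]


original_order = run([loop, read_accum])
hoisted_order = run([read_accum, loop])  # the read moved before the loop

print(original_order)  # 45
print(hoisted_order)   # 0
```

The toy runs serially, so it shows only the effect of the reordering and not the race condition seen with fusion=False.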

stuartarchibald · 2021-04-21T13:16:44Z

Think this might fix it:

diff --git a/numba/core/ir_utils.py b/numba/core/ir_utils.py
index cf2080c..68b1cbd 100644
--- a/numba/core/ir_utils.py
+++ b/numba/core/ir_utils.py
@@ -727,6 +727,8 @@ def has_no_side_effect(rhs, lives, call_table):
         return False
     if isinstance(rhs, ir.Expr) and rhs.op == 'inplace_binop':
         return rhs.lhs.name not in lives
+    if isinstance(rhs, ir.Expr) and rhs.op in ('static_getitem', 'getitem'):
+        return rhs.value.name not in lives
     if isinstance(rhs, ir.Yield):
         return False
     if isinstance(rhs, ir.Expr) and rhs.op == 'pair_first':
diff --git a/numba/parfors/parfor.py b/numba/parfors/parfor.py
index 89a582e..a798475 100644
--- a/numba/parfors/parfor.py
+++ b/numba/parfors/parfor.py
@@ -3855,7 +3855,7 @@ def _can_reorder_stmts(stmt, next_stmt, func_ir, call_table,
         and not isinstance(next_stmt, Parfor)
         and not isinstance(next_stmt, ir.Print)
         and (not isinstance(next_stmt, ir.Assign)
-            or has_no_side_effect(next_stmt.value, set(), call_table)
+            or has_no_side_effect(next_stmt.value, [x.name for x in next_stmt.list_vars()], call_table)
             or guard(is_assert_equiv, func_ir, next_stmt.value))):
         stmt_accesses = expand_aliases({v.name for v in stmt.list_vars()},
                                        alias_map, arg_aliases)

ehsantn · 2021-04-21T13:49:11Z

Assuming getitems have side effects doesn't match the semantics. maximize_fusion should understand that array_add can potentially write to accum.

stuartarchibald · 2021-04-21T14:17:28Z

> Assuming getitems have side effects doesn't match the semantics. maximize_fusion should understand that array_add can potentially write to accum.

Ah yes, this is true, it fixes the effect and not the cause. Any idea where in parfors the write association can be expressed?

ehsantn · 2021-04-21T17:04:36Z

I think get_stmt_writes could be conservative and assume that mutable arguments of user function calls are writes. However, we should be careful not to include internal calls, like NumPy calls that don't change their inputs, since that could hurt optimizations.
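
A minimal sketch of that idea, using a toy statement representation rather than Numba's IR. `Call`, `KNOWN_PURE` and `conservative_writes` are hypothetical names chosen for illustration; the real `get_stmt_writes` operates on Numba IR nodes and is not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical allow-list of internal calls assumed not to mutate their inputs.
KNOWN_PURE = {"np.sum", "np.zeros", "len"}


@dataclass(frozen=True)
class Call:
    func: str            # callee name
    args: tuple          # names of the argument variables
    target: str = None   # variable assigned from the call, if any


def conservative_writes(stmt, mutable_vars):
    """Return the set of variable names the call may write to."""
    writes = set()
    if stmt.target is not None:
        writes.add(stmt.target)
    if stmt.func not in KNOWN_PURE:
        # Unknown (user) function: assume every mutable argument is written.
        writes.update(a for a in stmt.args if a in mutable_vars)
    return writes


mutable = {"accum", "arr"}
user_call = Call("array_add", ("accum", "idx", "val"))
numpy_call = Call("np.sum", ("arr",), target="total")

print(conservative_writes(user_call, mutable))   # {'accum'}
print(conservative_writes(numpy_call, mutable))  # {'total'}
```

With `accum` in the write set of the loop body, a later read of `accum[0]` conflicts with the loop and cannot be moved ahead of it, while the allow-listed NumPy call still leaves `arr` free for reordering.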

DrTodd13 · 2021-04-21T17:07:57Z

Yes, this is the correct approach.

> I think get_stmt_writes could be conservative and assume that mutable arguments of user function calls are writes. However, we should be careful and not include internal calls like Numpy calls that don't change input, since it could hurt optimizations.

sklam added the "bug - incorrect behavior" and "ParallelAccelerator" labels on Apr 12, 2021

sklam mentioned this issue on Apr 12, 2021 in: Support For CPU Atomics #2988 (open)
