Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad Parfor loop fusion side-effects? #6920

Open
2 tasks done
sklam opened this issue Apr 12, 2021 · 6 comments
Open
2 tasks done

Bad Parfor loop fusion side-effects? #6920

sklam opened this issue Apr 12, 2021 · 6 comments

Comments

@sklam
Copy link
Member

sklam commented Apr 12, 2021

Reporting a bug

Example:

import numpy as np
from numba import njit, prange


@njit
def array_add(arr, idx, val):
    old = arr[idx]
    arr[idx] += val
    return old


@njit(parallel=dict(prange=True, fusion=True))  # the problem goes away where `fusion=False`
def foo(arr):
    accum = np.zeros(1, dtype=arr.dtype) 
    for i in prange(arr.size):
        array_add(accum, 0, arr[i])

    return accum[0]


total = foo(np.arange(10, dtype=np.intp))
print(total) 

foo.parallel_diagnostics(level=4)

This always print 0 when fusion=True and print other values (random due to race condition) when fusion=False. The code has no valid fusion because the loop domain in np.zeros(1, ...) is not equivalent to the for i in prange(arr.size).

FYI, what I am trying to do is adding an atomic add intrinsic: https://gist.github.com/sklam/e5496e412fccac6acc0e96b4413ed977

@sklam
Copy link
Member Author

sklam commented Apr 13, 2021

The problem seems to be coming from

maximize_fusion(self.func_ir, self.func_ir.blocks, self.typemap)

The self.func_ir.render_dot() before that statement:

after firstfoo$1

The one after that statement:

after maxfoo$1 dot

I highlighted the static_getitem associated with return accum[0] and it shows that the maximize_fusion() moved the static_getitem before the prange loop. Is it not recognizing the potential side effect from array_add().

(CC @DrTodd13)

@stuartarchibald
Copy link
Contributor

Think this might fix it:

diff --git a/numba/core/ir_utils.py b/numba/core/ir_utils.py
index cf2080c..68b1cbd 100644
--- a/numba/core/ir_utils.py
+++ b/numba/core/ir_utils.py
@@ -727,6 +727,8 @@ def has_no_side_effect(rhs, lives, call_table):
         return False
     if isinstance(rhs, ir.Expr) and rhs.op == 'inplace_binop':
         return rhs.lhs.name not in lives
+    if isinstance(rhs, ir.Expr) and rhs.op in ('static_getitem', 'getitem'):
+        return rhs.value.name not in lives
     if isinstance(rhs, ir.Yield):
         return False
     if isinstance(rhs, ir.Expr) and rhs.op == 'pair_first':
diff --git a/numba/parfors/parfor.py b/numba/parfors/parfor.py
index 89a582e..a798475 100644
--- a/numba/parfors/parfor.py
+++ b/numba/parfors/parfor.py
@@ -3855,7 +3855,7 @@ def _can_reorder_stmts(stmt, next_stmt, func_ir, call_table,
         and not isinstance(next_stmt, Parfor)
         and not isinstance(next_stmt, ir.Print)
         and (not isinstance(next_stmt, ir.Assign)
-            or has_no_side_effect(next_stmt.value, set(), call_table)
+            or has_no_side_effect(next_stmt.value, [x.name for x in next_stmt.list_vars()], call_table)
             or guard(is_assert_equiv, func_ir, next_stmt.value))):
         stmt_accesses = expand_aliases({v.name for v in stmt.list_vars()},
                                        alias_map, arg_aliases)

@ehsantn
Copy link
Collaborator

ehsantn commented Apr 21, 2021

Assuming getitems have side-effect doesn't match the semantics. maximize_fusion fusion should understand that array_add can potentially write to accum.

@stuartarchibald
Copy link
Contributor

Assuming getitems have side-effect doesn't match the semantics. maximize_fusion fusion should understand that array_add can potentially write to accum.

Ah yes, this is true, it fixes the effect and not the cause. Any idea where in parfors the write association can be expressed?

@ehsantn
Copy link
Collaborator

ehsantn commented Apr 21, 2021

I think get_stmt_writes could be conservative and assume that mutable arguments of user function calls are writes. However, we should be careful and not include internal calls like Numpy calls that don't change input, since it could hurt optimizations.

@DrTodd13
Copy link
Collaborator

Yes, this is the correct approach.

I think get_stmt_writes could be conservative and assume that mutable arguments of user function calls are writes. However, we should be careful and not include internal calls like Numpy calls that don't change input, since it could hurt optimizations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants