Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorder block traversal order for correct hoisting. #9397

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

DrTodd13
Copy link
Contributor

Resolves #9379

@DrTodd13
Copy link
Contributor Author

Please suggest a testing strategy. The original example is quite complex and it isn't even clear how to simplify it. I think the original bug generates a race condition and too many simplifications may cause the bug not to manifest. Alternatively, we could try to look at the diagnostics and see if any build_lists are being hoisted.

@netomenoci
Copy link

Hi @DrTodd13 .
My original code was much more complex than the one shared in the reproducible, and I have simplified it the most I could. Further simplifications don't produce the error, which makes this error particularly difficult to be found. If you want , I can add a test similar to the one I have shared in the reproducible. If so, would you want me to make a PR into this branch?

@sklam sklam self-requested a review January 23, 2024 15:16
@DrTodd13
Copy link
Contributor Author

The prior code was actually okay but requires the blocks to be processed in the right order. The critical thing is that if there is a getattr on an object and then a call of that getattr, the getattr has to precede the call for the code that detects multiple writes to an object to detect this scenario as a second write. My attempt at copying the offending code into a Numba test didn't work because the block order there was one that didn't manifest the problem. Any test you could write now that used normal code I suppose you'd have no guarantee that the block order would continue to manifest the problem with any change later.

@DrTodd13 DrTodd13 changed the title Forbid hoisting of mutable types in parfor lowering. Reorder block traversal order for correct hoisting. Jan 25, 2024
@DrTodd13
Copy link
Contributor Author

@sklam Do you think these test failures are related to the PR? Seems unrelated to me.

@DrTodd13
Copy link
Contributor Author

DrTodd13 commented Feb 6, 2024

/azurepipelines run

Copy link
Contributor

Azure Pipelines successfully started running 1 pipeline(s).

@DrTodd13
Copy link
Contributor Author

DrTodd13 commented Feb 6, 2024

/azurepipelines run

Copy link
Contributor

Azure Pipelines successfully started running 1 pipeline(s).

@netodw
Copy link

netodw commented May 6, 2024

@sklam , would you be able to review this PR? The issue is still there.

@thomasfrederikhoeck
Copy link

might be related to #9582 but I'm not sure to be honest.

@sklam
Copy link
Member

sklam commented May 28, 2024

Confirmed that this fixes #9379 but it has also been fixed by #9532.
This doesn't fix #9586.

@DrTodd13
Copy link
Contributor Author

Confirmed that this fixes #9379 but it has also been fixed by #9532. This doesn't fix #9586.

So, 9532 fixes 9379 and 9586? 9397 only fixes 9379?

@sklam
Copy link
Member

sklam commented May 28, 2024

Confirmed that this fixes #9379 but it has also been fixed by #9532. This doesn't fix #9586.

So, 9532 fixes 9379 and 9586? 9397 only fixes 9379?

#9532 fixes #9379
#9532 does NOT fix #9586.
#9586 as a PR is needed for #9581, #9490.

I am trying to understand what remaining cases are fixed by this PR uniquely. The original problem in #9379 seems to come from build_tuple hiding dependency of mutable object in code like:

func([<listcomp>], *args)

where the build_tuple comes from handling of *args, which is translated to func( ( [<listcomp>] ,) + args ). I tested this by changing the original reproducer from

        res = func([data[start_idx:end_idx, ...] for data in dataargs], *funcargs) 

to

        res = func([data[start_idx:end_idx, ...] for data in dataargs]) 

and confirmed that the bug goes away.

I am unable to find a case that depends on the block traversal order. I believe such a case may appear from Numba internal changes; e.g. changes to inline-closure, changes to parfors details.

I think this patch will prevent non-deterministic bug that may be introduced from randomized block insertion from other passes or changes to internal details. Since it seems unlikely for users to create such a case, and I am slightly concerned that the reordering may expose some unforeseen bugs, I suggest we merge it for 0.61 milestone.

@sklam sklam added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 3 - Ready for Review labels May 28, 2024
@sklam sklam added this to the 0.61.0-rc1 milestone May 28, 2024
@sklam sklam added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 4 - Waiting on reviewer Waiting for reviewer to respond to author labels May 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4 - Waiting on reviewer Waiting for reviewer to respond to author ParallelAccelerator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

njit(parallel = True) gives wrong output
5 participants