./lib/llvmopencl/Kernel.cc:129: pocl::ParallelRegion* pocl::Kernel::createParallelRegionBefore(llvm::BasicBlock*): Assertion `region_entry_barrier != NULL' failed #1435
Comments
Hi. If you can create a reproducer in C/C++ with as small a kernel as possible that triggers it, that would be helpful in tackling the underlying problem. Meanwhile, you can use CBS for this test case.
Here is some information: it seems that this is an LLVM issue... LLVM 15 is OK.
I simplified the culprit kernel to a certain extent ... This does not do any sensible thing anymore but crashes the compiler:
It is the last barrier which is the cause of the crash. If one removes the |
And this compiles with later LLVMs, but not older than 15?
This compiles with LLVM 15 and older (as far back as we remember ... the code is from 2017), but it fails with LLVM 16.
This is unrelated to the LLVM version being used.
I bisected it to commit b165b29 "LLVM 17 support", unfortunately that has |
minimized nonsensical kernel to reproduce the problem (reduced with cvise and some manual postprocessing)
Thanks! Very helpful. Also a great pointer to the cvise tool, it looks very useful. @franz this is likely something to do with the move to the new PM by default? Do we still have the ability to run with the old PM to test?
Since SVMOffset pass was added without support for legacy PM, main branch does not support the legacy PM anymore. PoCL 5.0 should be able to run with old PM, it's enough to change |
The SVMOffset pass is not used in the client-side kernel compilation pipeline; it is only called from the command line on the pocld server side, so it shouldn't affect this.
@picca @kif - could either of you test the "real" kernel in silx with this branch? It should fix the crash, but I'm interested in whether it produces correct results.
Hi Michal,
I tested the default ICD (probably the NVIDIA one):
```
(py311) lintaillefer:~/workspace/silx % ./run_tests.py src/silx/image/test/test_medianfilter.py
INFO:silx.setup:Install requires: numpy >=1.26.4
Building silx to /users/kieffer/workspace-400/silx/build/lib.linux-x86_64-3.11
INFO:silx.setup:Install requires: numpy >=1.26.4
running build
running build_py
running build_ext
INFO: Disabling color, you really want to install colorlog.
INFO:pythran:Disabling color, you really want to install colorlog.
Patched sys.path, added: '/users/kieffer/workspace-400/silx/build/lib.linux-x86_64-3.11'
==================================================================================== test session starts =====================================================================================
platform linux -- Python 3.11.0, pytest-7.2.2, pluggy-1.0.0
rootdir: /users/kieffer/workspace-400/silx/build/lib.linux-x86_64-3.11/silx, configfile: ../../../pytest.ini
plugins: anyio-3.7.1, xvfb-2.0.0
collected 2 items
build/lib.linux-x86_64-3.11/silx/image/test/test_medianfilter.py .. [100%]
===================================================================================== 2 passed in 0.46s ======================================================================================
```
Then PoCL v5, which is known to crash the process:
```
(py311) lintaillefer:~/workspace/silx % OCL_ICD_VENDORS=/opt/pocl5/etc/OpenCL/vendors/pocl.icd ./run_tests.py src/silx/image/test/test_medianfilter.py
INFO:silx.setup:Install requires: numpy >=1.26.4
Building silx to /users/kieffer/workspace-400/silx/build/lib.linux-x86_64-3.11
INFO:silx.setup:Install requires: numpy >=1.26.4
running build
running build_py
running build_ext
INFO: Disabling color, you really want to install colorlog.
INFO:pythran:Disabling color, you really want to install colorlog.
Patched sys.path, added: '/users/kieffer/workspace-400/silx/build/lib.linux-x86_64-3.11'
==================================================================================== test session starts =====================================================================================
platform linux -- Python 3.11.0, pytest-7.2.2, pluggy-1.0.0
rootdir: /users/kieffer/workspace-400/silx/build/lib.linux-x86_64-3.11/silx, configfile: ../../../pytest.ini
plugins: anyio-3.7.1, xvfb-2.0.0
collected 2 items
build/lib.linux-x86_64-3.11/silx/image/test/test_medianfilter.py . [crashed here]
```
And finally the patched version from branch fix 1435:
```
(py311) lintaillefer:~/workspace/silx % OCL_ICD_VENDORS=/opt/pocl6/etc/OpenCL/vendors/pocl.icd ./run_tests.py src/silx/image/test/test_medianfilter.py
INFO:silx.setup:Install requires: numpy >=1.26.4
Building silx to /users/kieffer/workspace-400/silx/build/lib.linux-x86_64-3.11
INFO:silx.setup:Install requires: numpy >=1.26.4
running build
running build_py
running build_ext
INFO: Disabling color, you really want to install colorlog.
INFO:pythran:Disabling color, you really want to install colorlog.
Patched sys.path, added: '/users/kieffer/workspace-400/silx/build/lib.linux-x86_64-3.11'
==================================================================================== test session starts =====================================================================================
platform linux -- Python 3.11.0, pytest-7.2.2, pluggy-1.0.0
rootdir: /users/kieffer/workspace-400/silx/build/lib.linux-x86_64-3.11/silx, configfile: ../../../pytest.ini
plugins: anyio-3.7.1, xvfb-2.0.0
collected 2 items
build/lib.linux-x86_64-3.11/silx/image/test/test_medianfilter.py .. [100%]
===================================================================================== 2 passed in 0.13s ======================================================================================
```
So apparently the result looks good.
Thanks for the patch. It looks like it fixes the issue.
Jérôme Kieffer
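
The three runs above differ only in the `OCL_ICD_VENDORS` value, so the comparison can be scripted. What follows is a minimal Python sketch, assuming a POSIX system; the helper names `run_with_icd` and `describe` are hypothetical and belong to neither silx nor PoCL. It relies on the fact that `subprocess` reports a child killed by a signal as a negative return code, which is how an aborting assertion would show up.

```python
import os
import signal
import subprocess
import sys


def run_with_icd(cmd, icd_path=None):
    """Run cmd, optionally selecting an OpenCL ICD via OCL_ICD_VENDORS.

    icd_path is used the same way as in the shell sessions above
    (a path to a .icd file); None keeps the default ICD lookup.
    """
    env = dict(os.environ)  # copy, so the parent environment is untouched
    if icd_path is not None:
        env["OCL_ICD_VENDORS"] = icd_path
    return subprocess.run(cmd, env=env).returncode


def describe(returncode):
    """Turn a subprocess return code into a short human-readable status."""
    if returncode < 0:
        # POSIX only: a negative value is the number of the killing signal.
        return "killed by " + signal.Signals(-returncode).name
    if returncode == 0:
        return "ok"
    return "exit code %d" % returncode
```

For example, `describe(run_with_icd(["./run_tests.py", "src/silx/image/test/test_medianfilter.py"], "/opt/pocl5/etc/OpenCL/vendors/pocl.icd"))` would be expected to report a signal rather than `ok` if the process aborts on the assertion, though the exact signal depends on how the crash happens.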
Fixed in main branch.
Hello, I am affected by this bug while preparing the Debian package of silx:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1060318
Reading through a number of your issues, I found that adding
POCL_WORK_GROUP_METHOD=cbs helps solve the issue, or at least does not trigger the core dump.
So my question is: what is the right way to solve this issue?
Thanks for considering.
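
For a test suite launched from Python, the workaround mentioned above can be applied to a child process only. This is a minimal sketch: the helper name `pocl_cbs_env` is hypothetical, and the only PoCL-specific detail is the `POCL_WORK_GROUP_METHOD=cbs` setting quoted in this report.

```python
import os


def pocl_cbs_env(base=None):
    """Return a copy of an environment with PoCL's work-group method set to CBS.

    base defaults to the current process environment; it is never modified.
    """
    env = dict(os.environ if base is None else base)
    env["POCL_WORK_GROUP_METHOD"] = "cbs"  # workaround quoted in this issue
    return env
```

A test run could then be started with `subprocess.run(["./run_tests.py", "src/silx/image/test/test_medianfilter.py"], env=pocl_cbs_env())`. Setting the variable before the process starts avoids any question of when PoCL reads it. This only sidesteps the crash; the maintainers' answer in this thread is that the fix is in the main branch.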