Many testcases SEGV #463
You have used MPICH, right?
@yurivict I have managed to reproduce your failure.
Please report to MPICH GitHub since it might be a bug there.
Not sure if it is a genuine MPICH bug or if it is caused by the way MPICH was configured and compiled on FreeBSD. I believe the setting involved is ZE_IPC_MEMORY_FLAG_TBD.
Per https://spec.oneapi.io/level-zero/1.0.4/core/api.html, it is defined in that version of the spec. This isn't the first ZE-related issue in BSD MPICH, which is why I think it's worth reporting to one or both of those entities.
@raffenet do you think this is an MPICH issue?
Well, for one, the latest Level Zero docs don't define ZE_IPC_MEMORY_FLAG_TBD. Setting the flags argument to 0 should work instead.
ZE_IPC_MEMORY_FLAG_TBD is no longer listed as a valid flag in the Level Zero documentation. The zeMemOpenIpcHandle documentation states that 0 is the default memory flag, so just use that. See nwchemgit/nwchem#463.
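For reference, the change described in that commit message would amount to something like the following. This is a sketch, not the actual patch: the file path and surrounding code are assumptions based on where MPICH's MPL layer keeps its ZE support, and only the replacement of ZE_IPC_MEMORY_FLAG_TBD with 0 in the zeMemOpenIpcHandle call comes from the thread itself.

```diff
--- a/src/mpl/src/gpu/mpl_gpu_ze.c   (path illustrative)
+++ b/src/mpl/src/gpu/mpl_gpu_ze.c
-    ret = zeMemOpenIpcHandle(ze_context, device, ipc_handle,
-                             ZE_IPC_MEMORY_FLAG_TBD, &ptr);
+    /* ZE_IPC_MEMORY_FLAG_TBD was removed from the Level Zero spec;
+       0 is the documented default for the flags argument. */
+    ret = zeMemOpenIpcHandle(ze_context, device, ipc_handle,
+                             0, &ptr);
```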
Thanks Ken. I think the BSD package needs to disable GPU support, since that's not a common use case, especially on BSD.
Agreed.
Posted a comment to the FreeBSD Bugzilla website.
I have built MPICH using the latest updates to the FreeBSD mpich port. NWChem is still crashing with a SEGV, with the following valgrind stack:
Is there a CUDA installation by chance? Or HIP (AMD)?
I am using a VirtualBox virtual image and, as far as I can tell (but I am not a FreeBSD expert), neither CUDA nor HIP is installed.
With OpenMPI, NWChem works fine.
Thanks for confirming my findings.
MPICH is currently broken at runtime; see nwchemgit/nwchem#463. It works with OPENMPI=yes, but this can't be made the default because math/scalapack and devel/ga need the same choice of MPI, and dependencies of math/scalapack fail with OPENMPI=yes.
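On FreeBSD, one way to keep the MPI choice consistent across dependent ports is per-port option overrides in /etc/make.conf. A minimal sketch, assuming each of these ports actually exposes an OPENMPI option (the option name and port list are illustrative, taken from the ports mentioned above):

```
# /etc/make.conf -- keep the MPI flavor consistent across dependent ports
science_nwchem_SET=	OPENMPI
math_scalapack_SET=	OPENMPI
devel_ga_SET=		OPENMPI
```

The `${category}_${port}_SET` spelling is the standard OPTIONSng mechanism for enabling a port option non-interactively; whether it resolves the scalapack dependency failure described above is a separate question.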
Could you try adding …
The code dies in a slightly different place and in a different way. I need to investigate what is going on.
@raffenet The following patch has fixed this crash. At the same time, most of the MPICH-related memory leaks spotted by valgrind have disappeared.
This should be submitted to MPICH as a PR.
The ZE code in our …
Patch suggested in nwchemgit/nwchem#463 (comment) is added. science/nwchem now works with mpich.
I added the above patch to the FreeBSD port net/mpich, and NWChem now works with MPICH.
@raffenet I think this scenario was caused by the fact that hwloc was linked with the oneAPI Level Zero Loader, and the following patch avoids that linkage.
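To check whether an installed hwloc actually pulls in the Level Zero loader, inspecting its shared-library dependencies is usually enough. A minimal sketch (the library path is the usual FreeBSD ports location and is an assumption; adjust for your system):

```shell
# Report whether libhwloc links against the oneAPI Level Zero loader.
# Path below is the typical FreeBSD ports install location (assumption).
for lib in /usr/local/lib/libhwloc.so; do
  if [ -e "$lib" ]; then
    # Print any ze_loader dependency, or say there is none.
    ldd "$lib" | grep -i ze_loader || echo "no ze_loader linkage in $lib"
  else
    echo "$lib not present"
  fi
done
```

If the loop prints a `libze_loader` line, hwloc was built with Level Zero support, which matches the scenario described above.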
Describe the bug
Describe settings used
Environment:
Built with python support.
OS: FreeBSD 13
Attach log files