
Many testcases SEGV #463

Closed
yurivict opened this issue Oct 23, 2021 · 22 comments

yurivict (Contributor) commented Oct 23, 2021

Describe the bug

     NWChem execution failed
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x804a1f6db in ???
#1  0x804a1e896 in ???
#2  0x803729f6f in handle_signal
	at /disk-samsung/freebsd-src/lib/libthr/thread/thr_sig.c:303
#3  0x80372953e in thr_sighandler
	at /disk-samsung/freebsd-src/lib/libthr/thread/thr_sig.c:246
#4  0x7ffffffff192 in ???
#5  0x803dd91b1 in ???
#6  0x803badeb5 in ???
#7  0x803298434 in ???
#8  0x80329901d in ???
#9  0x802f076a9 in ???
#10  0x802e70b74 in ???
#11  0x9f45dc in ???
#12  0x9e382f in ???
#13  0x7f38de in ???
#14  0x7e7ba4 in grad_force_
	at /disk-samsung/freebsd-ports/science/nwchem/work/nwchem-7.0.2-release/src/gradients/grad_force.F:990
#15  0x7fdbf4 in gradients_
	at /disk-samsung/freebsd-ports/science/nwchem/work/nwchem-7.0.2-release/src/gradients/gradients.F:121
#16  0x7ec987 in scf_gradient_
	at /disk-samsung/freebsd-ports/science/nwchem/work/nwchem-7.0.2-release/src/gradients/scf_gradient.F:39
#17  0x4219d7 in task_gradient_doit_
	at /disk-samsung/freebsd-ports/science/nwchem/work/nwchem-7.0.2-release/src/task/task_gradient.F:354
#18  0x422db0 in task_gradient_
	at /disk-samsung/freebsd-ports/science/nwchem/work/nwchem-7.0.2-release/src/task/task_gradient.F:120
#19  0x513c87 in driver_
	at /disk-samsung/freebsd-ports/science/nwchem/work/nwchem-7.0.2-release/src/driver/opt_drv.F:76
#20  0x4237c7 in task_optimize_
	at /disk-samsung/freebsd-ports/science/nwchem/work/nwchem-7.0.2-release/src/task/task_optimize.F:162
#21  0x4149f3 in task_
	at /disk-samsung/freebsd-ports/science/nwchem/work/nwchem-7.0.2-release/src/task/task.F:384
#22  0x40e0e1 in nwchem
	at /disk-samsung/freebsd-ports/science/nwchem/work/nwchem-7.0.2-release/src/nwchem.F:305
#23  0x40c71c in main
	at /disk-samsung/freebsd-ports/science/nwchem/work/nwchem-7.0.2-release/src/nwchem.F:404

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 40997 RUNNING AT yv.noip.me
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

Describe settings used
Environment:

NWCHEM_TOP=/disk-samsung/freebsd-ports/science/nwchem/work/nwchem-7.0.2-release/src/.. 
NWCHEM_MODULES=all 
NWCHEM_LONG_PATHS=Y 
NWCHEM_TARGET=LINUX64 
USE_MPI=Y USE_INTERNALBLAS=Y 
EXTERNAL_GA_PATH=/usr/local 
 BLAS_SIZE=4 USE_64TO32=y 
PYTHONVERSION=3.8 
NWCHEM_MODULES="all python" 
F77="gfortran10" FC="gfortran10" 
FFLAGS="-O -Wl,-rpath=/usr/local/lib/gcc10" 
FCFLAGS="-Wl,-rpath=/usr/local/lib/gcc10" PERL_USE_UNSAFE_INC=1 XDG_DATA_HOME=/disk-samsung/freebsd-ports/science/nwchem/work  XDG_CONFIG_HOME=/disk-samsung/freebsd-ports/science/nwchem/work  XDG_CACHE_HOME=/disk-samsung/freebsd-ports/science/nwchem/work/.cache  HOME=/disk-samsung/freebsd-ports/science/nwchem/work PATH=/disk-samsung/freebsd-ports/science/nwchem/work/.bin:/home/yuri/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin NO_PIE=yes MK_DEBUG_FILES=no MK_KERNEL_SYMBOLS=no SHELL=/bin/sh NO_LINT=YES ADDR2LINE="/usr/local/bin/addr2line" AR="/usr/local/bin/ar" AS="/usr/local/bin/as" CPPFILT="/usr/local/bin/c++filt" GPROF="/usr/local/bin/gprof" LD="/usr/local/bin/ld" NM="/usr/local/bin/nm" OBJCOPY="/usr/local/bin/objcopy" OBJDUMP="/usr/local/bin/objdump" RANLIB="/usr/local/bin/ranlib" READELF="/usr/local/bin/readelf" SIZE="/usr/local/bin/size" STRINGS="/usr/local/bin/strings" PREFIX=/usr/local  LOCALBASE=/usr/local  
CC="cc" 
CFLAGS="-O2 -pipe -fno-omit-frame-pointer  -fstack-protector-strong -fno-strict-aliasing "  
CPP="cpp" CPPFLAGS="-fno-omit-frame-pointer"  
LDFLAGS=" -Wl,-rpath=/usr/local/lib/gcc10  -L/usr/local/lib/gcc10 -B/usr/local/bin -fstack-protector-strong " 
LIBS=""  
CXX="c++" CXXFLAGS="-O2 -pipe -fno-omit-frame-pointer -fstack-protector-strong -fno-strict-aliasing -fno-omit-frame-pointer  "  MANPREFIX="/usr/local" BSD_INSTALL_PROGRAM="install  -s -m 555"  BSD_INSTALL_LIB="install  -s -m 0644"  BSD_INSTALL_SCRIPT="install  -m 555"  BSD_INSTALL_DATA="install  -m 0644"  BSD_INSTALL_MAN="install  -m 444"

Built with python support.
OS: FreeBSD 13

Attach log files

edoapra (Collaborator) commented Oct 23, 2021

You have used mpich, right?

edoapra (Collaborator) commented Oct 25, 2021

@yurivict I have managed to reproduce your failure.
It only occurs with mpich (it does not occur with openmpi or mpich2).
The workaround for mpich is to set the following

export MPIR_CVAR_ENABLE_GPU=0
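As a sketch of how this workaround is applied (the `nwchem input.nw` command and the process count are placeholders, not from this thread), the variable can either be exported once or passed per run through Hydra's `-genv` option:

```shell
# Disable MPICH's GPU path at runtime before launching NWChem
export MPIR_CVAR_ENABLE_GPU=0
mpirun -np 4 nwchem input.nw

# Or set it for a single run only, via Hydra's -genv option
mpirun -np 4 -genv MPIR_CVAR_ENABLE_GPU 0 nwchem input.nw
```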

jeffhammond (Collaborator) commented Oct 25, 2021 via email

edoapra (Collaborator) commented Oct 25, 2021

Not sure if it is a genuine MPICH bug or if it is caused by the way MPICH was configured to compile on FreeBSD.

I believe the setting ZE_IPC_MEMORY_FLAG_TBD=ZE_IPC_MEMORY_FLAG_BIAS_CACHED (from the output of mpirun -info reported below) is crucial to get MPICH to compile on FreeBSD 13. Any idea what this does?

$  mpirun -info
HYDRA build details:
    Version:                                 3.4.2
    Release Date:                            Wed May 26 15:51:40 CDT 2021
    CC:                              cc  -I/usr/local/include/json-c  -L/usr/local/lib -lepoll-shim -ljson-c -lm
    Configure options:                       '--disable-option-checking' '--prefix=/usr/local' '--enable-fast=' '--with-hwloc-prefix=/usr/local' '--with-libfabric=/usr/local' 'pkgconfigdir=/usr/local/libdata/pkgconfig' 'MPICHLIB_CFLAGS=-O2 -pipe -fstack-protector-strong -fno-strict-aliasing ' 'CFLAGS=-I/usr/local/include/json-c -O2 -pipe -fstack-protector-strong -fno-strict-aliasing ' 'MPICHLIB_CPPFLAGS=-DZE_IPC_MEMORY_FLAG_TBD=ZE_IPC_MEMORY_FLAG_BIAS_CACHED' 'CPPFLAGS= -DZE_IPC_MEMORY_FLAG_TBD=ZE_IPC_MEMORY_FLAG_BIAS_CACHED -I/usr/local/include -DNETMOD_INLINE=__netmod_inline_ofi__ -I/wrkdirs/usr/ports/net/mpich/work/mpich-3.4.2/src/mpl/include -I/wrkdirs/usr/ports/net/mpich/work/mpich-3.4.2/src/mpl/include -I/wrkdirs/usr/ports/net/mpich/work/mpich-3.4.2/modules/yaksa/src/frontend/include -I/wrkdirs/usr/ports/net/mpich/work/mpich-3.4.2/modules/yaksa/src/frontend/include -I/usr/local/include -D_REENTRANT -I/wrkdirs/usr/ports/net/mpich/work/mpich-3.4.2/src/mpi/romio/include' 'MPICHLIB_CXXFLAGS=-O2 -pipe -fstack-protector-strong -fno-strict-aliasing ' 'CXXFLAGS= -O2 -pipe -fstack-protector-strong -fno-strict-aliasing ' 'MPICHLIB_FFLAGS=-O -Wl,-rpath=/usr/local/lib/gcc10' 'FFLAGS= -O -Wl,-rpath=/usr/local/lib/gcc10 -fallow-argument-mismatch' 'MPICHLIB_FCFLAGS=-Wl,-rpath=/usr/local/lib/gcc10' 'FCFLAGS= -Wl,-rpath=/usr/local/lib/gcc10' 'MPICHLIB_LDFLAGS= -Wl,-rpath=/usr/local/lib/gcc10 -L/usr/local/lib/gcc10 -B/usr/local/bin -fstack-protector-strong ' 'LDFLAGS= -L/usr/local/lib -L/usr/local/lib' 'MPICHLIB_LIBS=' 'LIBS=-L/usr/local/lib -lepoll-shim -ljson-c -lm' 'BASH_SHELL=/usr/local/bin/bash' '--enable-fortran' 'MPICH_MPICC_LDFLAGS=-Wl,-rpath=/usr/local/lib/gcc10 -L/usr/local/lib/gcc10 -B/usr/local/bin' 'MPICH_MPICXX_LDFLAGS=-Wl,-rpath=/usr/local/lib/gcc10 -L/usr/local/lib/gcc10 -B/usr/local/bin' 'MPICH_MPIF77_LDFLAGS=-Wl,-rpath=/usr/local/lib/gcc10 -L/usr/local/lib/gcc10 -B/usr/local/bin' 'MPICH_MPIFORT_LDFLAGS=-Wl,-rpath=/usr/local/lib/gcc10 -L/usr/local/lib/gcc10 
-B/usr/local/bin' '--with-pm=hydra,gforker' '--localstatedir=/var' '--mandir=/usr/local/man' '--disable-silent-rules' '--infodir=/usr/local/share/info/' '--build=amd64-portbld-freebsd13.0' 'build_alias=amd64-portbld-freebsd13.0' 'CC=cc' 'CPP=cpp' 'CXX=c++' 'FC=gfortran10' 'F77=gfortran10' 'PKG_CONFIG=pkgconf' '--cache-file=/dev/null' '--srcdir=.' 'MPLLIBNAME=mpl'
    Process Manager:                         pmi
    Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
    Topology libraries available:            hwloc
    Resource management kernels available:   user slurm ll lsf sge pbs cobalt
    Demux engines available:                 poll select

jeffhammond (Collaborator) commented:

Per https://spec.oneapi.io/level-zero/1.0.4/core/api.html, it is reserved for future use. But the ZE stuff is for Intel GPUs, and it does not make sense for Intel GPU support to be compiled unconditionally into BSD packages of MPICH when there are no multi-Intel GPU systems in the wild to use GPU IPC.

This isn't the first ZE-related issue in BSD MPICH, which is why I think it's worth reporting to one or both of those entities.

jeffhammond (Collaborator) commented:

@raffenet do you think this is an MPICH issue?

raffenet commented:

Well, for one, the latest Level Zero docs don't define ZE_IPC_MEMORY_FLAG_TBD as a valid IPC memory flag. So that's an MPICH problem. MPICH should be using 0, since the documentation states that is the default. https://spec.oneapi.io/level-zero/latest/core/api.html#ze-ipc-memory-flags-t

Setting -DZE_IPC_MEMORY_FLAG_TBD=ZE_IPC_MEMORY_FLAG_BIAS_CACHED effectively does what I propose, so the runtime issues need to be investigated on an Intel GPU system to determine what is going on. You could add --without-ze to your configure to fully compile out the Level Zero support instead of using the CVAR. Note that if GPU support is built/enabled in MPICH, many MPI operations incur extra cost to determine buffer location (device vs. host memory).
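For reference, disabling Level Zero at build time means adding `--without-ze` to the MPICH configure invocation. The sketch below shows only that flag next to a couple of options taken from the port's configure line reported above; it is not the full port recipe:

```shell
# Rebuild MPICH with Level Zero (Intel GPU) support compiled out entirely
./configure --prefix=/usr/local --with-pm=hydra,gforker --without-ze
make && make install
```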

raffenet added a commit to raffenet/mpich that referenced this issue Oct 27, 2021
ZE_IPC_MEMORY_FLAG_TBD is no longer listed as flag in the Level Zero
documentation. The zeMemOpenIpcHandle documentation states that 0 is the
default memory flag, so just use that. See nwchemgit/nwchem#463.
jeffhammond (Collaborator) commented:

Thanks Ken. I think the BSD package needs to disable GPU support since that's not a common use case, especially with BSD.

raffenet commented:

> Thanks Ken. I think the BSD package needs to disable GPU support since that's not a common use case, especially with BSD.

Agreed.

raffenet added a commit to raffenet/mpich that referenced this issue Oct 27, 2021
ZE_IPC_MEMORY_FLAG_TBD is no longer listed as a valid flag in the Level
Zero documentation. The zeMemOpenIpcHandle documentation states that 0
is the default, so just use that. See nwchemgit/nwchem#463.
edoapra (Collaborator) commented Oct 27, 2021

Posted a comment to the FreeBSD Bugzilla:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252536#c10

edoapra (Collaborator) commented Oct 29, 2021

I have built mpich using the latest updates to the FreeBSD mpich port
https://cgit.freebsd.org/ports/commit/?id=a46966bb3496e0cf8100f6acdd671d4fb90c9cdb
https://cgit.freebsd.org/ports/commit/?id=dea82318648e46c157874bd1079f50b23c9c08d0

NWChem is still crashing with a SIGSEGV, with the following valgrind stack:

==88293== Invalid read of size 4
==88293==    at 0xB902191: MPIDI_GPU_get_ipc_attr (gpu_post.c:203)
==88293==    by 0xB6D6E65: MPIDI_IPCI_try_lmt_isend (ipc_send.h:38)
==88293==    by 0xB6D6E65: MPIDI_IPC_mpi_isend (ipc_send.h:74)
==88293==    by 0xB6D6E65: MPIDI_SHM_mpi_isend (shm_p2p.h:74)
==88293==    by 0xB6D6E65: MPIDI_isend_unsafe (ch4_send.h:109)
==88293==    by 0xB6D6E65: MPIDI_isend_safe (ch4_send.h:334)
==88293==    by 0xB6D6E65: MPID_Isend (ch4_send.h:516)
==88293==    by 0xB6D6E65: PMPI_Isend (isend.c:130)
==88293==    by 0xD6D7C8: _my_isend (comex.c:129)
==88293==    by 0xD6D7C8: _mq_push (comex.c:188)
==88293==    by 0xD70155: comex_rmw (comex.c:1730)
==88293==    by 0xD0AA97: pnga_read_inc (in /usr/home/edo/nwchem/bin/LINUX64/nwchem)
==88293==    by 0xC623CC: NGA_Read_inc (in /usr/home/edo/nwchem/bin/LINUX64/nwchem)
==88293==    by 0xA0F481: util_gnxtval_ (in /usr/home/edo/nwchem/bin/LINUX64/nwchem)
==88293==    by 0x9F819F: nxtask0_ (in /usr/home/edo/nwchem/bin/LINUX64/nwchem)
==88293==    by 0x7C1D9E: grad2_ (in /usr/home/edo/nwchem/bin/LINUX64/nwchem)
==88293==    by 0x7B5693: grad_force_ (grad_force.F:990)
==88293==    by 0x7AF0E9: gradients_ (gradients.F:121)
==88293==    by 0x7AE951: scf_gradient_ (scf_gradient.F:39)
==88293==  Address 0x4 is not stack'd, malloc'd or (recently) free'd

raffenet commented:

Is there a CUDA installation by chance? Or HIP (AMD)?

edoapra (Collaborator) commented Oct 29, 2021

I am using a VirtualBox virtual image and, as far as I can tell (I am not a FreeBSD expert), neither CUDA nor HIP is present.
The weird thing is that if I change CFLAGS from
-O2 -pipe -fstack-protector-strong -fno-strict-aliasing -g
to
-O1 -pipe -fstack-protector-strong -fno-strict-aliasing -g
the error vanishes.
I wonder if this is somewhat of a compiler-related problem, since the FreeBSD recipe for MPICH defines all the compiler options as shown in the mpirun -info (formatted) output below.

HYDRA build details:
    Version:                                 3.4.2
    Release Date:                            Wed May 26 15:51:40 CDT 2021
    CC:                              cc  -I/usr/local/include/json-c  -L/usr/local/lib -lepoll-shim -ljson-c -lm
    Configure options:                       '--disable-option-checking' '--prefix=/usr/local' '--enable-fast=' '--with-hwloc-prefix=/usr/local' '--with-libfabric=/usr/local' 'pkgconfigdir=/usr/local/libdata/pkgconfig' 
'MPICHLIB_CFLAGS=-O2 -pipe -fstack-protector-strong -fno-strict-aliasing ' 
'CFLAGS=-I/usr/local/include/json-c -O2 -pipe -fstack-protector-strong -fno-strict-aliasing ' 
'MPICHLIB_CPPFLAGS=' 
'CPPFLAGS= -I/usr/local/include -DNETMOD_INLINE=__netmod_inline_ofi__ -I/usr/ports/net/mpich/work/mpich-3.4.2/src/mpl/include -I/usr/ports/net/mpich/work/mpich-3.4.2/src/mpl/include -I/usr/ports/net/mpich/work/mpich-3.4.2/modules/yaksa/src/frontend/include -I/usr/ports/net/mpich/work/mpich-3.4.2/modules/yaksa/src/frontend/include -I/usr/local/include -D_REENTRANT -I/usr/ports/net/mpich/work/mpich-3.4.2/src/mpi/romio/include' 
'MPICHLIB_CXXFLAGS=-O2 -pipe -fstack-protector-strong -fno-strict-aliasing ' 
'CXXFLAGS= -O2 -pipe -fstack-protector-strong -fno-strict-aliasing ' 
'MPICHLIB_FFLAGS=-O -Wl,-rpath=/usr/local/lib/gcc10' 
'FFLAGS= -O -Wl,-rpath=/usr/local/lib/gcc10 -fallow-argument-mismatch' 
'MPICHLIB_FCFLAGS=-Wl,-rpath=/usr/local/lib/gcc10' 
'FCFLAGS= -Wl,-rpath=/usr/local/lib/gcc10' 
'MPICHLIB_LDFLAGS= -Wl,-rpath=/usr/local/lib/gcc10 -L/usr/local/lib/gcc10 -B/usr/local/bin -fstack-protector-strong ' 
'LDFLAGS= -L/usr/local/lib -L/usr/local/lib' 'MPICHLIB_LIBS=' 'LIBS=-L/usr/local/lib -lepoll-shim -ljson-c -lm' 'BASH_SHELL=/usr/local/bin/bash' '--enable-fortran' 
'MPICH_MPICC_LDFLAGS=-Wl,-rpath=/usr/local/lib/gcc10 -L/usr/local/lib/gcc10 -B/usr/local/bin' 
'MPICH_MPICXX_LDFLAGS=-Wl,-rpath=/usr/local/lib/gcc10 -L/usr/local/lib/gcc10 -B/usr/local/bin' 
'MPICH_MPIF77_LDFLAGS=-Wl,-rpath=/usr/local/lib/gcc10 -L/usr/local/lib/gcc10 -B/usr/local/bin' 
'MPICH_MPIFORT_LDFLAGS=-Wl,-rpath=/usr/local/lib/gcc10 -L/usr/local/lib/gcc10 -B/usr/local/bin' '--with-pm=hydra,gforker' '--localstatedir=/var' '--mandir=/usr/local/man' '--disable-silent-rules' '--infodir=/usr/local/share/info/' '--build=amd64-portbld-freebsd13.0' 'build_alias=amd64-portbld-freebsd13.0' 
'CC=cc' 'CPP=cpp' 'CXX=c++' 'FC=gfortran10' 'F77=gfortran10' 'PKG_CONFIG=pkgconf' '--cache-file=/dev/null' '--srcdir=.' 'MPLLIBNAME=mpl'
    Process Manager:                         pmi
    Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
    Topology libraries available:            hwloc
    Resource management kernels available:   user slurm ll lsf sge pbs cobalt
    Demux engines available:                 poll select

yurivict (Contributor, Author) commented:

With OpenMPI nwchem works fine.

edoapra (Collaborator) commented Oct 29, 2021

> With OpenMPI nwchem works fine.

Thanks for confirming my findings. mpich2 works as well.

freebsd-git pushed a commit to freebsd/freebsd-ports that referenced this issue Oct 30, 2021
MPICH is currently broken at runtime, see nwchemgit/nwchem#463
It works with OPENMPI=yes but this can't be made the default because
math/scalapack and devel/ga need to have the same choice of MPI,
and dependencies of math/scalapack fail with OPENMPI=yes.
raffenet commented Nov 3, 2021

> I am using a VirtualBox virtual image and, as far as I can tell (I am not a FreeBSD expert), neither CUDA nor HIP is present. The weird thing is that if I change CFLAGS from -O2 -pipe -fstack-protector-strong -fno-strict-aliasing -g to -O1 -pipe -fstack-protector-strong -fno-strict-aliasing -g the error vanishes. I wonder if this is somewhat of a compiler-related problem, since the FreeBSD recipe for MPICH defines all the compiler options as shown in the mpirun -info (formatted) output below.

Could you try adding MPIR_CVAR_CH4_IPC_GPU_HANDLE_CACHE=0 to your environment and see if the error goes away?

edoapra (Collaborator) commented Nov 4, 2021

> Could you try adding MPIR_CVAR_CH4_IPC_GPU_HANDLE_CACHE=0 to your environment and see if the error goes away?

The code dies in a slightly different place and in a different way. I need to investigate what is going on.
One more data point: before reaching the place where the crash occurs, valgrind detects several memory leaks in fastpath_memcpy at the following lines (not sure why these lines should be executed if a GPU is not present):

       if ((inattr.type == MPL_GPU_POINTER_UNREGISTERED_HOST ||
             inattr.type == MPL_GPU_POINTER_REGISTERED_HOST) &&
            (outattr.type == MPL_GPU_POINTER_UNREGISTERED_HOST ||
             outattr.type == MPL_GPU_POINTER_REGISTERED_HOST)) {

edoapra (Collaborator) commented Nov 4, 2021

@raffenet The following patch has fixed this crash. At the same time, most of the MPICH-related memory leaks spotted by valgrind have disappeared.
The problem was that attr->type and attr->device were left uninitialized when ZE_ERR_CHECK(ret) took its early exit.

--- mpich-3.4.2/src/mpl/src/gpu/mpl_gpu_ze.c.old	2021-11-03 18:01:44.613889000 -0700
+++ mpich-3.4.2/src/mpl/src/gpu/mpl_gpu_ze.c	2021-11-03 17:58:29.367226000 -0700
@@ -191,6 +191,8 @@
     ze_device_handle_t device;
     memset(&ptr_attr, 0, sizeof(ze_memory_allocation_properties_t));
     ret = zeMemGetAllocProperties(global_ze_context, ptr, &ptr_attr, &device);
+    attr->type = 0;
+    attr->device = 0;
     ZE_ERR_CHECK(ret);
     attr->device = device;
     switch (ptr_attr.type) {

yurivict (Contributor, Author) commented Nov 4, 2021

This should be submitted to mpich as a PR.

raffenet commented Nov 4, 2021

The ZE code in our main branch has diverged a bit from 3.4.2, so this probably doesn't cleanly apply there. Even if main is already fixed, we can use this diff as a starting point for fixing 3.4.x.

yzgyyang pushed a commit to yzgyyang/freebsd-ports that referenced this issue Nov 4, 2021
Patch suggested in
nwchemgit/nwchem#463 (comment)
is added.

science/nwchem now works with mpich.
yurivict (Contributor, Author) commented Nov 4, 2021

I added the above patch to the FreeBSD port net/mpich and nwchem now works with mpich.

yurivict closed this as completed Nov 4, 2021
edoapra reopened this Nov 4, 2021
edoapra (Collaborator) commented Nov 5, 2021

@raffenet I think this scenario was caused by the fact that hwloc was linked with the oneAPI Level Zero Loader; the following patch avoids the "MPIR_Init_thread gpu_init failed" failure.
https://cgit.freebsd.org/ports/commit/net/mpich?id=b5815e7648a8e5307a20a234befa00e34306319d

edoapra closed this as completed Nov 5, 2021
ocochard pushed a commit to ocochard/freebsd-ports that referenced this issue Nov 10, 2021
Patch suggested in
nwchemgit/nwchem#463 (comment)
is added.

science/nwchem now works with mpich.