warp_test fails in mpi on frontier with reducedThrust branch #54

Closed
cwsmith opened this issue Jun 27, 2023 · 1 comment

cwsmith commented Jun 27, 2023

warp_test crashes with SIGBUS (bus error) when built following these instructions:

https://github.com/SCOREC/omega_h/wiki/Build-and-Run-on-OLCF-Frontier#build-with-mpi-enabled-using-cray-compiler-wrappers-and-amd-compilers

core was generated by `src/warp_test'.
Program terminated with signal SIGBUS, Bus error.

warning: Section `.reg-xstate/76107' in core file too small.
#0  0x00007fffe817ed4f in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7fffed81a300 (LWP 76107))]
Missing separate debuginfos, use: zypper install glibc-debuginfo-2.31-150300.41.1.x86_64 krb5-debuginfo-1.19.2-150400.3.3.1.x86_64 libbrotlicommon1-debuginfo-1.0.7-3.3.1.x86_64 libbrotlidec1-debuginfo-1.0.7-3.3.1.x86_64 libcom_err2-debuginfo-1.46.4-150400.3.3.1.x86_64 libcurl4-debuginfo-7.79.1-150400.5.15.1.x86_64 libdrm2-debuginfo-2.4.107-150400.1.8.x86_64 libdrm_amdgpu1-debuginfo-2.4.107-150400.1.8.x86_64 libelf1-debuginfo-0.185-150400.5.3.1.x86_64 libgcc_s1-debuginfo-12.2.1+git416-150000.1.5.1.x86_64 libidn2-0-debuginfo-2.2.0-3.6.1.x86_64 libjson-c3-debuginfo-0.13-3.3.1.x86_64 libkeyutils1-debuginfo-1.6.3-5.6.1.x86_64 libldap-2_4-2-debuginfo-2.4.46-150200.14.11.2.x86_64 libncurses6-debuginfo-6.1-150000.5.12.1.x86_64 libnghttp2-14-debuginfo-1.40.0-6.1.x86_64 libnl3-200-debuginfo-3.3.0-1.29.x86_64 libnuma1-debuginfo-2.0.14.20.g4ee5e0c-150400.1.24.x86_64 libopenssl1_1-debuginfo-1.1.1l-150400.7.22.1.x86_64 libpcre1-debuginfo-8.45-150000.20.13.1.x86_64 libpsl5-debuginfo-0.20.1-150000.3.3.1.x86_64 libselinux1-debuginfo-3.1-150400.1.69.x86_64 libssh4-debuginfo-0.9.6-150400.1.5.x86_64 libstdc++6-debuginfo-12.2.1+git416-150000.1.5.1.x86_64 libunistring2-debuginfo-0.9.10-1.1.x86_64 libyaml-0-2-debuginfo-0.1.7-1.17.x86_64 libz1-debuginfo-1.2.11-150000.3.39.1.x86_64 libzstd1-debuginfo-1.5.0-150400.1.71.x86_64
(ins)(gdb) where
#0  0x00007fffe817ed4f in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
#1  0x00007fffe9729b6c in MPIR_Localcopy () from /opt/cray/pe/lib64/libmpi_amd.so.12
#2  0x00007fffeb479223 in MPIDI_CRAY_Common_lmt_unpack () from /opt/cray/pe/lib64/libmpi_amd.so.12
#3  0x00007fffeb498a08 in MPIDI_CRAY_Common_lmt_ctrl_send_rts_cb () from /opt/cray/pe/lib64/libmpi_amd.so.12
#4  0x00007fffeb4716c8 in MPIDI_SHMI_progress () from /opt/cray/pe/lib64/libmpi_amd.so.12
#5  0x00007fffe9f6b7e9 in MPIR_Waitall_impl () from /opt/cray/pe/lib64/libmpi_amd.so.12
#6  0x00007fffe9fd19b1 in MPIR_Waitall () from /opt/cray/pe/lib64/libmpi_amd.so.12
#7  0x00007fffe9fd2eae in PMPI_Waitall () from /opt/cray/pe/lib64/libmpi_amd.so.12
#8  0x000000000114cb9d in Omega_h::Comm::alltoallv<int> (this=0x1b04f80, sendbuf_dev=..., sdispls_dev=..., rdispls_dev=..., width=1) at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_comm.cpp:557
#9  0x000000000117989c in Omega_h::Dist::exch<int> (this=this@entry=0x7fffffff6538, data=..., width=width@entry=1) at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_dist.cpp:118
#10 0x0000000001174221 in Omega_h::Dist::set_dest_idxs (this=this@entry=0x7fffffff6538, fitems2rroots=..., nrroots=nrroots@entry=3000) at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_dist.cpp:78
#11 0x00000000011734bb in Omega_h::Dist::Dist (this=0x7fffffff6538, comm_in=..., fitems2rroots=..., nrroots=3000) at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_dist.cpp:23
#12 0x0000000001199098 in Omega_h::bi_partition (comm=..., marks=...) at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_bipart.cpp:32
#13 0x0000000001193657 in Omega_h::inertia::recursively_bisect (comm=..., tolerance=<error reading variable: That operation is not available on integers of more than 8 bytes.>, 
    p_coords=p_coords@entry=0x7fffffff6658, p_masses=p_masses@entry=0x7fffffff6640, p_owners=p_owners@entry=0x7fffffff6670, p_hints=p_hints@entry=0x7fffffff66d0)
    at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_inertia.cpp:181
#14 0x00000000011e0665 in Omega_h::Mesh::balance (this=0x7fffffff70c0, predictive=<optimized out>) at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_mesh.cpp:560
#15 0x00000000010f23b8 in Omega_h::build_box (comm=..., family=family@entry=OMEGA_H_SIMPLEX, x=<error reading variable: That operation is not available on integers of more than 8 bytes.>, 
    y=<error reading variable: That operation is not available on integers of more than 8 bytes.>, z=<error reading variable: That operation is not available on integers of more than 8 bytes.>, nx=nx@entry=10, 
    ny=ny@entry=10, nz=nz@entry=10, symmetric=<optimized out>) at /ccs/home/cwsmith/omegahKk/test/omega_h/src/Omega_h_build.cpp:147
#16 0x000000000104af37 in main (argc=<optimized out>, argv=<optimized out>) at /ccs/home/cwsmith/omegahKk/test/omega_h/src/warp_test.cpp:71
(ins)(gdb) 

cwsmith commented Mar 6, 2024

That branch has been merged and this issue is fixed: #60
