
Python APIs not working with both MPI and offload #295

Open
chuckyount opened this issue May 9, 2024 · 0 comments
The C++ APIs work with MPI and offload, and the Python APIs work with offload without MPI, but the combination of all three does not work. This is likely a bug in the software stack; last tested with Intel MPI 2021.12 and oneAPI 2024.1.

%make clean; make -j -C src/kernel/ YK_CXXOPT=-O1 offload=1 mpi=1 ranks=2 py-yk-api-test
[0] MPI startup(): Number of NICs: 1
[0] MPI startup(): ===== NIC pinning on sdp7814 =====
[0] MPI startup(): Rank Pin nic
[0] MPI startup(): 0 enp1s0
Error: failure in zeMemGetAllocProperties 78000001
[0#908140:908140@sdp7814] MPI startup(): I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.12
[0#908140:908140@sdp7814] MPI startup(): ONEAPI_ROOT=/opt/intel/oneapi
[0#908140:908140@sdp7814] MPI startup(): I_MPI_HYDRA_BOOTSTRAP=ssh
[0#908140:908140@sdp7814] MPI startup(): I_MPI_OFFLOAD=2
[0#908140:908140@sdp7814] MPI startup(): I_MPI_DEBUG=+5
[0#908140:908140@sdp7814] MPI startup(): I_MPI_PRINT_VERSION=1
Error: failure in zeMemGetAllocProperties 78000001
Abort(881416975) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Comm_split_type: Unknown error class, error stack:
PMPI_Comm_split_type(468)..................: MPI_Comm_split(MPI_COMM_WORLD, color=1, key=0, new_comm=0x5563a6824b5c) failed
PMPI_Comm_split_type(448)..................:
MPIR_Comm_split_type_impl(90)..............:
MPIDI_Comm_split_type(114).................:
MPIR_Comm_split_type_node_topo(262)........:
compare_info_hint(329).....................:
MPIDI_Allreduce_intra_composition_beta(788):
MPIDI_NM_mpi_allreduce(147)................:
MPIR_Allreduce_intra_auto(60)..............:
MPIR_Allreduce_intra_recursive_doubling(56):
MPIR_Localcopy(56).........................:
MPIDI_GPU_Localcopy(1135)..................:
MPIDI_GPU_ILocalcopy(1040).................: Error returned from GPU API
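
For what it's worth, the two zeMemGetAllocProperties failures report Level Zero error 0x78000001, which appears to be ZE_RESULT_ERROR_UNINITIALIZED if I'm reading ze_api.h correctly. A minimal standalone check like the sketch below (assuming mpi4py is installed; it is not part of YASK) exercises the same MPI_Comm_split_type call that appears in the error stack, which may help show whether the failure is in the IMPI GPU path rather than in the YASK Python bindings:

# Minimal standalone check, assuming mpi4py is available (not part of YASK).
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Mirror the MPI_Comm_split_type call in the error stack above:
# split COMM_WORLD into node-local (shared-memory) communicators.
node_comm = comm.Split_type(MPI.COMM_TYPE_SHARED, key=rank)
print(f"rank {rank}: node-local rank {node_comm.Get_rank()}", flush=True)

node_comm.Free()

Running this under the same environment (e.g., 2 ranks with I_MPI_OFFLOAD=2 set) should show whether the GPU-localcopy error reproduces without YASK in the loop.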

chuckyount added the bug label May 9, 2024