UCX ERROR no active messages transport #4742
Comments
The errors come from ofiwg/libfabric#5281.
@yimin-zhao, please see https://openucx.readthedocs.io/en/master/running.html for details on how to run OpenMPI with UCX.
So, if I understand correctly, OFI removed the Mellanox (UCX-based) provider in the latest release. But then how was I able to run Intel MPI just before I installed OpenMPI? Anyway, thanks for the info.
Actually, mixing two MPI implementations is not a good idea; it is better to run them in separate terminals to avoid potential conflicts. @yimin-zhao, do you have any other questions regarding OMPI or UCX? If not, is it OK to close the issue?
Sure, no more questions for now. I will close this issue.
Thank you!
IMPI 2019 U6 uses the dc transport by default. Unfortunately, we ran into a set of issues related to unexpected fallback from dc to ud at scale, so we had to force it. According to the ucx_info output you provided, you don't have that transport available. You may try setting UCX_TLS=ud,sm,self.
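As a rough sketch of the check described above, the snippet below scans saved ucx_info -d output for the transports UCX detected and, if no DC transport appears, suggests the UCX_TLS=ud,sm,self fallback. It assumes that ucx_info -d lists each transport on a line of the form "#      Transport: <name>" and that DC transport names start with "dc" (e.g. dc_mlx5). The sample text and the helper names (list_transports, has_dc, suggest_ucx_tls) are hypothetical and for illustration only; the sample is not the reporter's actual output.

```python
import re
from typing import Optional, Set

# Assumption: each transport appears on its own line such as
# "#      Transport: ud_verbs" in `ucx_info -d` output.
_TRANSPORT_RE = re.compile(r"^#\s*Transport:\s*(\S+)", re.MULTILINE)

# Hypothetical, abbreviated sample output (not from this issue).
SAMPLE = """\
#
# Memory domain: posix
#     Component: posix
#
#      Transport: posix
#         Device: memory
#
# Memory domain: mlx5_0
#      Transport: rc_verbs
#         Device: mlx5_0:1
#      Transport: ud_verbs
#         Device: mlx5_0:1
"""


def list_transports(ucx_info_output: str) -> Set[str]:
    """Return the set of transport names found in `ucx_info -d` output."""
    return set(_TRANSPORT_RE.findall(ucx_info_output))


def has_dc(transports: Set[str]) -> bool:
    """True if any DC transport (assumed to be named dc*, e.g. dc_mlx5) is present."""
    return any(name.startswith("dc") for name in transports)


def suggest_ucx_tls(transports: Set[str]) -> Optional[str]:
    """Hypothetical helper: suggest the fallback from this thread when DC is missing."""
    if has_dc(transports):
        return None
    return "ud,sm,self"


if __name__ == "__main__":
    found = list_transports(SAMPLE)
    print("Detected transports:", sorted(found))
    tls = suggest_ucx_tls(found)
    if tls is not None:
        print(f"No DC transport found; try: export UCX_TLS={tls}")
```

To use it on a real system, save the output with ucx_info -d > ucx_info.txt, pass the file contents to list_transports, and, if a value is suggested, export UCX_TLS with it before launching the MPI job.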
I see, I will try it later. Thank you!
FOR INFORMATION: I tried to run a Co-array Fortran program with Intel oneAPI (2021.1.10.2477) and hit a problem that led me to this issue. Since I found my solution on this page, I thought I would post this message in case it helps other people. Please remove it if inappropriate. On a Fedora 31 computer, I tried to run a simple distributed Co-array Fortran program on 4 images and got the following error message:
SOLUTION:
Describe the bug
When I try to run an MPI application (ior), it throws the following error messages:
Steps to Reproduce
UCX 1.4 and UCX 1.7 (I found a similar question in this repo, so I switched to UCX 1.7, but got the same errors.)
No
Setup and versions
Additional information (depending on the issue)
OpenMPI version
Intel Mpi (Intel(R) MPI Library for Linux* OS, Version 2019 Update 6 Build 20191024)
Output of ucx_info -d to show transports and devices recognized by UCX: ucx_info.txt.txt
At the very beginning, UCX 1.4 worked fine with Intel MPI. I hit this error after I uninstalled OpenMPI and switched to Intel MPI. I don't know whether it's because I removed some necessary components during this procedure. In any case, the Intel MPI installer did not warn me about anything.
Tell me if you have any ideas.
Thanks in advance.