MPI_Comm_spawn fails without correct mpiexec in PATH #6335
Since this is singleton init, how would the program find out the correct …?
It needs the corresponding …
I don't follow why the … A reliable spawn enables IDE-integrated unit testing and doc testing of MPI-based software, which improves the reliability and accessibility of the whole ecosystem, so I think it's worth some attention.
For MPI-enabled unit testing, the other thing I'm exploring is to fork and use accept/connect to get parents and children connected. That might be more reliable with respect to the environment, but it's POSIX-only.
Spawning MPI processes needs the functionality of a process manager, because it is not just launching processes: the processes need to be coordinated. It is in principle possible to embed the process manager into the program itself, but we lose the flexibility of a system process manager that may be specifically designed for a particular HPC environment. Either way, users will be expected to provide some configuration, e.g. telling MPICH that the embedded process manager should be used. The default may make sense on a laptop, but does not apply to other systems. Compared with the current approach that requires users to set PATH, I don't think the trouble of embedding a process manager is worth it. But as an enhancement, I think we can accept other options, such as specifying an …
I do not disagree. But I think the solution is cleaner on the IDE side rather than asking MPI to figure out its system. While convenience is important, it shouldn't overshadow the first priority of ensuring the best performance on an HPC system, where defaults and fallbacks rarely work due to the cutting-edge nature of such systems.
Be careful that you are moving toward a more remote area of the MPI world. While the functionality is there in both the standard and the implementations, it is the least significant and the least performant or reliable. Even comm spawn is a niche area, with few usages by HPC applications. We are starting to pay more attention and effort to comm spawn as nontraditional applications show up more often. EDIT: The concern is, your tests may not reflect eventual production.
Thanks. I don't follow why linking the process manager into libmpi would break functionality when an external process manager is in use.
For example, on most HPC systems, the process manager is a separate program that is different from e.g. hydra. Using the embedded process manager may not work with the system job scheduler very well, or at all. I am not saying it is not feasible. I am saying that the extra trouble of embedding a process manager may not make much sense for most HPC applications.
Okay, but you know at …
How is the IDE/language server better positioned to know how to spawn processes than the MPI library itself? This sounds like extra configuration and special-case handling, which means lots of potential developers won't get it right and will dislike developing MPI-based software, or will write it more slowly and with more bugs. We should think about the development and documentation (doc testing) experience as essential features of any programming model. Reliable, portable spawn is one of the most basic building blocks.
The IDE only needs to set the correct path for finding …
That's an extra configuration that needs to be exposed and kept in sync with which implementation is chosen by the build system. This is pushing incidental complexity onto tool builders and users, which could be avoided if …
The IDE also needs configuration for finding …
That is done automatically with existing tooling, and is nothing specific to the IDE. For example:

$ ./configure --whatever-args
$ bear -- make -j60   # or ninja, etc.; works with any build system

This creates …
I am sure … The solution of embedding a process manager is feasible. But it will be a much more complex solution, in both implementation and policy, than simply figuring out a way of setting PATH or finding the correct MPIEXEC. We'll need more evidence of necessity before committing effort.
Just want to note that this is a faulty comparison. No other library does distributed parallel processes the way MPI does. I understand the desire to treat MPI as just a normal single-process application, but that pretension needs to be managed by the tools. We will do everything reasonable to facilitate the tools. PS: The …
Thanks for the background. I do not disagree with the motivation, but with the solution. Maybe we can have an offline discussion to better understand the problem?
In debugging broken MPI_Comm_spawn on a Debian system, I reproduced the following issue with latest main (68b574e) and a vanilla configuration. It looks like -pmi_args is being passed to /usr/bin/mpiexec (which is OMPI on this system). I think it should be able to spawn without a specially crafted PATH or running within a matching mpiexec. The source spawnintra.c is from the MPICH test suite (modified to be stand-alone).
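The reporter's modified spawnintra.c is not reproduced in the thread. For readers unfamiliar with the failure mode, a minimal stand-alone sketch in the same spirit is below (an assumption, not the actual reproducer): run the parent directly as a singleton, without mpiexec, and the spawn path must locate a matching process manager on its own.

/* Minimal singleton-init spawn sketch (illustrative, not the
 * reporter's spawnintra.c).  Build with mpicc and run the binary
 * directly, without mpiexec, to exercise the reported code path. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    MPI_Comm parent, intercomm;
    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* Parent: spawn two more copies of this same executable. */
        int errcodes[2];
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &intercomm, errcodes);
        printf("parent: spawn returned\n");
    } else {
        printf("child: connected to parent\n");
    }
    MPI_Finalize();
    return 0;
}

In a singleton-init run, MPI_Comm_spawn has to launch an mpiexec behind the scenes, which is exactly where picking up a mismatched /usr/bin/mpiexec (OMPI here) and passing it MPICH-specific arguments like -pmi_args goes wrong.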