Get rid of MPI launcher #52
Comments
Regarding SLURM and ...
I think these are generally pretty valid comments; this stuff worked well for my simple use case but could definitely be improved. Having said that, I'm not sure 'get rid of it' is super useful without some kind of vision of an alternative! Possible improvements with regard to your points:
In general, this functionality is aimed at conveniently using ... If you have some ideas/API of how this could look, let me know.
I thought a little bit about it. I think the first step is to get rid of ... I'd see the next step as a careful description of how `slepc` should be run with MPI but without ... Then, I'd prefer to unmount ...

The first part is determining the MPI setup strictly through ... The second part is to make ... The third part is, probably, to get rid of the spawning code. As far as I understand, you make a synchronous gateway to a potentially non-synchronous MPI setup. The non-synchronous setup looks like legacy code that you started with and do not need any more. Unless there are solid performance reasons for the non-synchronous setup, I suggest firmly nailing ...
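For concreteness, here is a minimal sketch of what "determining the MPI setup strictly through mpi4py" could look like. This is not quimb's actual code; it only uses mpi4py calls and assumes a synchronous (SPMD) setup where every rank runs the same script.

```python
# Minimal sketch, not quimb code: the MPI setup is queried only via mpi4py,
# with no reliance on launcher-specific environment variables such as PMI_SIZE.
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()      # how many ranks the job was actually started with
rank = comm.Get_rank()      # this process's index within COMM_WORLD

# A synchronous (SPMD) setup: every rank runs the same script, and the
# size-1 case is simply the serial case, with no spawning involved.
if size == 1:
    print("running serially")
else:
    print(f"rank {rank} of {size} running under MPI")
```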
Yes, this seems like a sensible thing to do. And in general I agree that it would be worthwhile to have a more explicit way to control various aspects of the MPI stuff. Are you wanting just more control over `n_procs` and `n_threads`, or also to manage computations with various different COMMs? Just to explain the current way things work:
This would be good to keep for ease of use in e.g. notebooks: no thinking about HPC, ...
I think it would be good to keep this design if possible (i.e. the script here, which doesn't need to be modified when moving to MPI); a rough sketch of that idea follows below. At least initially, some steps might be: ...
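As an illustration of that "same script, different backend" design goal, here is a sketch in which the user-facing code is identical whether it runs locally or under MPI. The `make_pool` factory, the `USE_MPI` environment variable, and `heavy` are assumptions for the sketch, not quimb's API; `ProcessPoolExecutor` and `mpi4py.futures.MPIPoolExecutor` are real.

```python
# Sketch only: the user script sees a standard futures-style pool, and the
# choice between a local process pool and an MPI pool is made in one place.
import os
from concurrent.futures import ProcessPoolExecutor

def make_pool(n_workers):
    # Hypothetical opt-in switch; quimb's real mechanism may differ.
    if os.environ.get("USE_MPI") == "1":
        from mpi4py.futures import MPIPoolExecutor
        return MPIPoolExecutor(max_workers=n_workers)
    return ProcessPoolExecutor(max_workers=n_workers)

def heavy(x):
    return x ** 2

if __name__ == "__main__":
    # This block is identical whether run as `python script.py` or as
    # `mpiexec -n 4 python -m mpi4py.futures script.py` with USE_MPI=1.
    with make_pool(4) as pool:
        print(list(pool.map(heavy, range(8))))
```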
I think these are two independent, worthwhile changes.
Unfortunately, this does not really work: a number of things may break down along the way. I think it is better to run it in serial by default, until an explicit import or a setting enables MPI. At least then the user will already know the entry point for MPI (which is currently not the case until some sort of exception is raised) and will be able to debug it. I think you do a reasonable job regarding the choice of linear algebra solvers, to "just make it work with what is available"; however, I am afraid MPI is a bit too fragile for this approach.
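A minimal sketch of that "serial by default, explicit opt-in" idea follows. All names here (`enable_mpi`, `parallel_sum`, the module-level `_comm` switch) are hypothetical, not quimb's API; only the mpi4py calls are real.

```python
# Hypothetical sketch: nothing MPI-related happens unless the user explicitly
# opts in, so there is one obvious entry point to debug if MPI misbehaves.
_comm = None  # module-level switch: None means "run serially"

def enable_mpi():
    """Explicit opt-in: MPI is imported and initialised only here."""
    global _comm
    from mpi4py import MPI   # imported only here, so failures are localised
    _comm = MPI.COMM_WORLD
    return _comm

def parallel_sum(values):
    if _comm is None:                    # default: plain serial execution
        return sum(values)
    rank, size = _comm.Get_rank(), _comm.Get_size()
    local = sum(values[rank::size])      # each rank sums a strided slice
    return _comm.allreduce(local)        # combine partial sums on all ranks

if __name__ == "__main__":
    data = list(range(100))
    # enable_mpi()                       # uncomment to opt in under mpiexec
    print(parallel_sum(data))
```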
I have several complaints about the interaction with `mpi4py`. It could be worth splitting this into separate issues.

- `quimb-mpi-python` and `mpi_launcher.py` do not really belong to this package. As far as I understand, they solve some sort of issues with `slepc` and are not needed otherwise.
- The `mpi` environment is not picked up if run with SLURM `srun`, because the latter does not set `PMI_SIZE`. This causes all sorts of unexpected outcomes.
- `mpi_launcher.py` is hard to understand. The `get_mpi_pool` function returns non-MPI stuff as well, for example. Most of the code seems to deal with everything except `mpi4py`. Does this mean that `quimb` can be MPI-parallelized without `mpi4py`? Does this mean that non-MPI `concurrent.futures` parallelization is implemented?
- `mpi_launcher.py` is hard to debug. It is not clear how `get_mpi_pool` behaves and how it is expected to behave. The use of the `@CachedPoolWithShutdown` decorator is ambiguous: it replaces `num_workers`, up to the point where the only reliable way to understand what is going on at runtime is to check the type of the returned pool. The most ridiculous thing is that for production this means writing another wrapper over `get_mpi_pool`!
- `quimb-mpi-python` is enforcing. My original intention was to run multiple calculations on a grid (for example, a phase diagram of a spin model), which means that I do not necessarily need `quimb`'s parallelization. But I have to use it because of that check in `mpi_launcher.py`! Or, again, I have to wrap some scripts and invocations to clean up environment variables and to make `quimb` act as if there is no MPI interface at all.
- Even where `mpi_launcher.py` does well with its purpose, it seems to be nailed firmly to `COMM_WORLD` while totally neglecting the possibility of nested parallelization, etc.
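To illustrate the last two points, here is a plain mpi4py sketch (not quimb code, and the group count is just an illustrative choice): splitting `COMM_WORLD` into sub-communicators is one way to run a grid of independent calculations, and it is exactly the kind of nested parallelism a launcher hard-wired to `COMM_WORLD` cannot express.

```python
# Plain mpi4py sketch, independent of quimb: split COMM_WORLD into groups so
# that each group can run its own calculation (e.g. one point of a phase
# diagram), instead of every solver being tied to the full COMM_WORLD.
from mpi4py import MPI

world = MPI.COMM_WORLD
n_groups = 2                                   # illustrative choice
color = world.Get_rank() % n_groups            # which group this rank joins
subcomm = world.Split(color, key=world.Get_rank())

print(f"world rank {world.Get_rank()}/{world.Get_size()} -> "
      f"group {color}, sub-rank {subcomm.Get_rank()}/{subcomm.Get_size()}")

# Each sub-communicator could now be handed to a solver; here we just
# demonstrate an independent reduction within each group.
group_sum = subcomm.allreduce(world.Get_rank())
if subcomm.Get_rank() == 0:
    print(f"group {color}: sum of world ranks = {group_sum}")
subcomm.Free()
```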