
Compiling xmipp has hardcoded mpirun with 4 slots #306

Closed
MohamadHarastani opened this issue Jul 24, 2020 · 17 comments
Assignees
Labels
bug (Something isn't working), help wanted (Extra attention is needed)

Comments

@MohamadHarastani
Collaborator

Hi,
While compiling xmipp on a personal laptop, I hit the following error:
"There are not enough slots available in the system to satisfy the 4 slots"
I have exactly 4 MPI slots on this processor (Intel® Core™ i7-4500U CPU @ 1.80GHz × 4).
The mpirun call with 4 slots only prints a test sentence, yet it ended up breaking the compilation.
I fixed this issue by replacing '4' with '2' in lines 692 to 698 here:
https://github.com/I2PC/xmipp/blob/devel/xmipp#L692
Couldn't we test whether MPI runs during installation in another way, or turn the error into a warning?

Cheers
Mohamad

@DStrelak added the "bug" and "help wanted" labels on Jul 27, 2020
@DStrelak
Collaborator

Hi @MohamadHarastani ,
can you try if this works for you?

@MohamadHarastani
Collaborator Author

Thanks @DStrelak , I will check it out asap

@MohamadHarastani
Collaborator Author

can you try if this works for you?

Hi @DStrelak ,
I checked it and it worked.
Here is the corresponding output of xmipp compile:

 mpirun -np 4 --oversubscribe echo '    > This sentence should be printed 4 times if mpi runs fine (by mpirun).'
    > This sentence should be printed 4 times if mpi runs fine (by mpirun).
    > This sentence should be printed 4 times if mpi runs fine (by mpirun).
    > This sentence should be printed 4 times if mpi runs fine (by mpirun).
    > This sentence should be printed 4 times if mpi runs fine (by mpirun).

These are the lines of code that I edited (starting from this line):

    if checkProgram("mpirun",False):
        ok=(runJob("mpirun -np 4 --oversubscribe echo '%s (by mpirun).'" % echoString) or
            runJob("mpirun -np 4 --allow-run-as-root --oversubscribe echo '%s (by mpirun).'" % echoString))
    elif checkProgram("mpiexec",False):
        ok=(runJob("mpiexec -np 4 --oversubscribe echo '%s (by mpiexec).'" % echoString) or
            runJob("mpiexec -np 4 --oversubscribe --allow-run-as-root echo '%s (by mpiexec).'" % echoString))

I can't verify whether the --oversubscribe flag also works with mpiexec.
I also tried increasing 4 to 10 and it still worked, which confirms that the --oversubscribe flag is what solves the issue (since my 4 MPI slots could by chance have been free).

This is what I used to test:

[mohamad@localhost ~]$ which mpirun
/usr/lib64/openmpi3/bin/mpirun
[mohamad@localhost ~]$ mpirun --version
mpirun (Open MPI) 3.1.3

Regards

@DStrelak
Collaborator

DStrelak commented Jul 29, 2020

'--oversubscribe' was added in MPI 2.1, and e.g. Travis uses version 1.6.
@dmaluenda , do we / can we detect MPI version?

@dmaluenda
Member

dmaluenda commented Jul 29, 2020

We can check the MPI version by parsing 'mpirun --version'. However, we can also add the '--oversubscribe' flag only to the 'or' fallback string, so that the check doesn't fail just because the flag is unsupported.

Alternatively, we can use 'mpirun -np 2 ...'. 2 should always be fine, shouldn't it?
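A minimal sketch of the version-parsing idea, assuming Open MPI's output format (e.g. "mpirun (Open MPI) 3.1.3", as shown earlier in this thread); the helper names and the exact version cutoff are assumptions, not the actual xmipp code:

```python
import re
import subprocess

def parse_mpi_version(version_text):
    """Extract (major, minor) from 'mpirun --version' output.

    Open MPI prints e.g. 'mpirun (Open MPI) 3.1.3' on the first line;
    returns None if no version number can be found.
    """
    match = re.search(r"(\d+)\.(\d+)", version_text)
    return (int(match.group(1)), int(match.group(2))) if match else None

def supports_oversubscribe():
    """True when the installed mpirun looks new enough for --oversubscribe
    (added around version 2.1, as discussed above)."""
    try:
        out = subprocess.run(["mpirun", "--version"],
                             capture_output=True, text=True).stdout
    except FileNotFoundError:
        return False
    version = parse_mpi_version(out)
    return version is not None and version >= (2, 1)
```

With something like this, the installer could append --oversubscribe only when the detected version supports it, instead of relying on the 'or' fallback alone.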

@DStrelak
Collaborator

Alternatively, we can use 'mpirun -np 2 ...'. 2 should always be fine, shouldn't it?

In theory, yes. I doubt that anybody would be brave enough to use xmipp with fewer than two cores.
How about linking it to the number of jobs used for the build?

@dmaluenda
Member

I thought the same: "Who wants to run Xmipp with less than 4 cores?" But the answer is: "What about the login node on clusters?" Damn!

I agree on linking the number of MPI jobs to the number of cores used for the compilation. Indeed, in the hypothetical case that N=1, if

mpirun -np 1 echo whatever

works, it's fine. We are checking that mpirun works, and this proves that.
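Tying -np to the machine rather than a hardcoded 4 could look roughly like this (a sketch with hypothetical names, not the actual xmipp installer code):

```python
import multiprocessing

def mpi_test_command(build_jobs=None):
    """Build the MPI sanity-check command, using at most as many ranks
    as there are cores (and no more than the build job count), so the
    check cannot oversubscribe a small laptop or a cluster login node."""
    cores = multiprocessing.cpu_count()
    ranks = max(1, min(build_jobs or cores, cores))
    return "mpirun -np %d echo 'mpi runs fine'" % ranks
```

Even ranks == 1 is a valid check, as noted above: it still proves that mpirun itself works.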

@DStrelak
Collaborator

Can we close this issue?

@dmaluenda
Member

I think so. I forced it to use only 2 cores, which is the minimum that makes sense...

If the problem persists, please don't hesitate to reopen this so we can work out a more accurate approach.

@MohamadHarastani
Collaborator Author

No objection. Thank you both, @dmaluenda @DStrelak

@MohamadHarastani
Collaborator Author

Hello again,
I have just hit another, similar issue. I am trying to compile xmipp on a supercomputer, where we usually run MPI jobs through a script submitted with sbatch. Here is what I get:

[uhj53dz@jean-zay4: xmipp]$ ./xmipp 
 'xmipp.conf' detected.
Checking configuration ------------------------------
Checking compiler configuration ...
 g++ 8 detected
 g++ -c -w -mtune=native -march=native -std=c++11 -O3 xmipp_test_main.cpp -o xmipp_test_main.o -I../ -I/gpfswork/rech/nvo/uhj53dz/miniconda3/include -I/gpfswork/rech/nvo/uhj53dz/miniconda3/include/python3.8 -I/gpfswork/rech/nvo/uhj53dz/miniconda3/lib/python3.8/site-packages/numpy/core/include
 g++  -L/gpfswork/rech/nvo/uhj53dz/miniconda3/lib xmipp_test_main.o -o xmipp_test_main -lfftw3 -lfftw3_threads -lhdf5  -lhdf5_cpp -ltiff -ljpeg -lsqlite3 -lpthread
 rm xmipp_test_main*
Checking MPI configuration ...
 mpicxx -c -w -I../ -I/gpfswork/rech/nvo/uhj53dz/miniconda3/include -mtune=native -march=native -std=c++11 -O3  xmipp_mpi_test_main.cpp -o xmipp_mpi_test_main.o
 mpicxx   -L/gpfswork/rech/nvo/uhj53dz/miniconda3/lib xmipp_mpi_test_main.o -o xmipp_mpi_test_main -lfftw3 -lfftw3_threads -lhdf5  -lhdf5_cpp -ltiff -ljpeg -lsqlite3 -lpthread
 rm xmipp_mpi_test_main*
 mpirun -np 1 echo '    > This sentence should be printed 2 times if mpi runs fine.'
This version of Spack (openmpi ~legacylaunchers schedulers=slurm)
is installed without the mpiexec/mpirun commands to prevent
unintended performance issues. See https://github.com/spack/spack/pull/10340
for more details.
If you understand the potential consequences of a misconfigured mpirun, you can
use spack to install 'openmpi+legacylaunchers' to restore the executables.
Otherwise, use srun to launch your MPI executables.
 mpirun -np 1 --allow-run-as-root echo '    > This sentence should be printed 2 times if mpi runs fine.'
This version of Spack (openmpi ~legacylaunchers schedulers=slurm)
is installed without the mpiexec/mpirun commands to prevent
unintended performance issues. See https://github.com/spack/spack/pull/10340
for more details.
If you understand the potential consequences of a misconfigured mpirun, you can
use spack to install 'openmpi+legacylaunchers' to restore the executables.
Otherwise, use srun to launch your MPI executables.
 mpirun or mpiexec have failed.
 Cannot compile with MPI or use it
 rm xmipp_mpi_test_main*
rm: cannot remove 'xmipp_mpi_test_main*': No such file or directory

I will try to work around this by commenting out this test. I will report my progress here.

Regards,
Mohamad

@DStrelak
Collaborator

DStrelak commented Oct 22, 2020

Hi @MohamadHarastani ,
Thanks for reporting this problem.
However, I don't think there's anything we can / should do about this particular case (unless, of course, it turns out to be a widespread issue).
As you surely understand, we can't prepare our script for all possible environments.
The admin of your machine should be able to resolve this problem. I'm, however, a bit worried about how / whether Scipion will work in that environment (we should have Slurm support, but AFAIK it's not often used [read: not well tested]).
If in doubt, feel free to contact us, and we'll gladly help!
KR,
David

@MohamadHarastani
Collaborator Author

Thanks @DStrelak for your reply.
I commented out the lines that require mpirun and the installation continued. We have been using the previous xmipp version (compatible with Scipion 2) on the same supercomputer, and now I am trying to compile the new version. Of course, we don't need support for all environments or for Slurm; we prepare our Slurm scripts manually, and all we need is a successful xmipp compilation (it is just Linux Red Hat, compiling with conda).
I have limited experience with Slurm, but it is used on both supercomputers we have access to here.
I have passed this step now. Just for the record, I commented out these lines, starting from here:

    # if not (runJob("%s -np 2 echo '%s.'" % (configDict['MPI_RUN'], echoString)) or
    #         runJob("%s -np 2 --allow-run-as-root echo '%s.'" % (configDict['MPI_RUN'], echoString))):
    #     print(red("mpirun or mpiexec have failed."))
    #     return False

We can close this issue and rediscuss a solution if needed (maybe a flag to skip this mpirun test, with the error message mentioning that flag as an option).

Regards,
Mohamad

@DStrelak
Collaborator

Hi @MohamadHarastani ,
I'm glad that it was the only hurdle you've met.
The flag to skip (a specific) config test sounds good to me. What do you think, @dmaluenda ?

@dmaluenda
Member

I agree on a bypass flag. I vote for an environment variable like XMIPP_NOMPICHECK=True or something like that. That way we can add it to the https://github.com/I2PC/xmipp/wiki/Xmipp-configuration-(version-20.07) guide.
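Reading such a bypass from the environment could be sketched as follows (XMIPP_NOMPICHECK is the name proposed above; the set of accepted values is an assumption):

```python
import os

# Values treated as "enabled" (an assumption; only "True" is mentioned above).
TRUTHY = ("1", "true", "yes", "on")

def skip_mpi_check(environ=None):
    """Return True when the user set XMIPP_NOMPICHECK to a truthy value,
    meaning the mpirun/mpiexec sanity test should be skipped."""
    environ = os.environ if environ is None else environ
    return environ.get("XMIPP_NOMPICHECK", "").strip().lower() in TRUTHY
```

The config check would then print a warning instead of failing whenever skip_mpi_check() is true.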

By the way, note that the whole config check can be skipped just by stepping the build manually:

./xmipp config
./xmipp compileAndInstall

(note the missing ./xmipp checkConfig in between)

@MohamadHarastani
Collaborator Author

By the way, note that the whole config check can be skipped just by stepping the build manually:

./xmipp config
./xmipp compileAndInstall

(note the missing ./xmipp checkConfig in between)

Thanks a lot for this hint. I don't think a flag is necessary in this case. I will try this option soon and comment on the result.

@DStrelak
Collaborator

Should be resolved.
