Script for running GENESIS/MDSPACE in parallel using MPI #169

Merged: 8 commits merged into devel on Mar 13, 2023

Conversation

@mms29 (Collaborator) commented Feb 14, 2023

New way of running GENESIS in parallel using MPI. The new script simplifies the previous one and is better suited to big clusters.
The protocols md-nmmd-genesis and mdspace are modified accordingly.

In addition:

  • I changed the name of MD-NMMD-Genesis to MDTools to remove the hyphens (related to Pablo's comment).
    We should move these changes to master ASAP: if someone installs master now, they will not find MD-NMMD-Genesis since I changed the name of the repo.

  • Explained variance replaces "singular values" in the PDB dimensionality reduction viewer. EV is a standard measure for comparing PCA components and is more commonly reported than singular values, for example in a paper (although the two are directly related; see the sketch below).
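
As a side note, here is a minimal NumPy sketch (not part of the PR; the arrays are illustrative) of how explained variance relates to singular values: each component's explained variance is proportional to the square of its singular value, and the explained variance ratio is obtained by normalizing the squared singular values.

import numpy as np

# Illustrative data matrix (rows = samples, columns = features), centered per column.
X = np.random.rand(100, 30)
Xc = X - X.mean(axis=0)

# Singular values of the centered data.
s = np.linalg.svd(Xc, compute_uv=False)

# Explained variance of component i is s_i**2 / (n_samples - 1);
# the explained variance ratio normalizes the squared singular values.
explained_variance = s**2 / (Xc.shape[0] - 1)
explained_variance_ratio = s**2 / np.sum(s**2)

print(explained_variance_ratio[:5])  # fraction of variance carried by the first 5 components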

@@ -166,10 +149,10 @@ def _defineParams(self, form):
group.addParam('nm_dt', params.FloatParam, label='NM time step', default=0.001,
help="Time step of normal modes integration. Should be equal to MD time step. Could be increase "
"to accelerate NM integration, however can make the simulation unstable.",
condition="simulationType==2 or simulationType==4",expertLevel=params.LEVEL_ADVANCED)
condition="simulationType==2 or simulationType==4")

Collaborator:
Can you extract the constants?

Collaborator (Author):
sure

help="Resolution parameter of the simulated map. This is usually set to the half of the resolution"
" of the target map. For example, if the target map resolution is 5 Å, emfit_sigma=2.5",
condition="EMfitChoice!=0",expertLevel=params.LEVEL_ADVANCED)
condition="EMfitChoice!=0")

Collaborator:
EMfitChoice!=0 is repeated many times, you may want to consider extracting it as a constant for clarity

Collaborator (Author):
yes
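
To illustrate the suggestion, a minimal sketch of how the repeated condition strings could be extracted as module-level constants. The constant names (and the emfit_sigma parameter shape) are illustrative assumptions, not necessarily the code adopted in the PR:

# Hypothetical constants for the repeated condition strings (names are illustrative).
EMFIT_CONDITION = "EMfitChoice!=0"
NMMD_CONDITION = "simulationType==2 or simulationType==4"

# Inside _defineParams(self, form), the parameters quoted in the diffs above would then read:
form.addParam('emfit_sigma', params.FloatParam, label='EM fit sigma', default=2.0,
              help="Resolution parameter of the simulated map. Usually set to half the"
                   " resolution of the target map.",
              condition=EMFIT_CONDITION)
form.addParam('nm_dt', params.FloatParam, label='NM time step', default=0.001,
              help="Time step of normal modes integration. Should be equal to the MD time step.",
              condition=NMMD_CONDITION)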

@@ -313,6 +296,53 @@ def _defineParams(self, form):
label="projection angle image set ", help='Image set containing projection alignement parameters',
condition="EMfitChoice==%i and projectAngleChoice==%i"%(EMFIT_IMAGES,PROJECTION_ANGLE_IMAGE))

form.addSection(label='MPI parallelization')

form.addParam('parallelType', params.EnumParam, label="How to process EM data ?", default=PARALLEL_MPI,

Collaborator:
Regarding this section, it seems that you are trying to locally reproduce the sections in host.conf in case parallel MPI is used. I am sure it works, but 1) it may be difficult to maintain, and 2) it might be tricky to combine with the queueing option offered in Scipion. Let's discuss that soon.

@MohamadHarastani (Collaborator):

Hi Remi,
I see the great effort you have put into making this MPI parallelization. I assume you tried to go through the queue system of Scipion and it was not giving you enough freedom? Anyway, I don't see a reason why this could harm anyone.
Cheers,
Mohamad

@mms29 (Author) commented Feb 14, 2023

> Regarding this section, it seems that you are trying to locally reproduce the sections in host.conf in case parallel MPI is used. I am sure it works, but 1) it may be difficult to maintain, and 2) it might be tricky to combine with the queueing option offered in Scipion. Let's discuss that soon.

No, actually it is meant to work with the queuing option of Scipion. What I had before was a script that ran on one local machine, and for each cluster I had to adapt and write a new dedicated script to make it work. Now I have found a way that should work on every cluster and local machine (at least all the ones I have tested so far). The thing is that for running on clusters I need some information such as the number of nodes, cores per node, sockets per node, etc. Although that information may be present in host.conf, it is not always the case, and it strongly depends on the queuing system of the cluster (SLURM, PBS, etc.). Therefore, to avoid any problems, the best solution I have found so far is to define the parameters I need myself; in some cases the user may have to define those parameters twice, but at least we have something that works for sure. But that's definitely something we can discuss. For example, another idea could be to try to read the CPU information on the machine, or to locate the MPI flags defined in the environment when running on a cluster, which probably contain those parameters (a rough sketch of that idea is below). It might be more automated but also more error-prone...
Cheers
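
Just to sketch that last idea (not code from this PR; a rough example assuming a SLURM or PBS job, using the schedulers' standard environment variables, with a fallback to the local machine):

import os

def guess_topology():
    """Best-effort guess of (nodes, cores per node) from the scheduler environment.

    Falls back to the local machine when no scheduler variables are found.
    """
    # SLURM exports these for every job.
    if "SLURM_JOB_NUM_NODES" in os.environ:
        nodes = int(os.environ["SLURM_JOB_NUM_NODES"])
        # CPUs available to the job on this node; falls back to all visible CPUs.
        cores_per_node = int(os.environ.get("SLURM_CPUS_ON_NODE", os.cpu_count()))
        return nodes, cores_per_node

    # PBS provides the allocated hosts through a node file
    # (on most PBS setups, one line per allocated slot).
    node_file = os.environ.get("PBS_NODEFILE")
    if node_file and os.path.exists(node_file):
        with open(node_file) as f:
            hosts = [line.strip() for line in f if line.strip()]
        nodes = len(set(hosts))
        cores_per_node = len(hosts) // max(nodes, 1)
        return nodes, cores_per_node

    # Local machine: a single node with all visible CPUs.
    return 1, os.cpu_count()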

@mms29 (Author) commented Feb 15, 2023

Can you try uninstalling/reinstalling the plugin and running test_workflow_GENESIS and test_workflow_MDSPACE?

@MohamadHarastani (Collaborator):

I will test again, but regarding "another idea could be to try to read the CPU information on the machine, or to locate the MPI flags defined in the environment when running on a cluster": it might not be easy if you are logged in to a login node and want to run on other nodes. With your current configuration, are you able to run on Idris/Cines?

@mms29 (Author) commented Feb 15, 2023

Yes, I can run on Idris and on Gadi (Australia), which uses PBS instead of SLURM and was a pain. We don't have access to Cines anymore. I think this solution is OK for now, but I can improve it in the future.
Tell me when you have time to test so that I can merge.
Cheers

@mms29 merged commit 254e1dc into devel on Mar 13, 2023