
Merge pull request #4 from jade-hpc-gpu/mihai-readthedocs
Update to NAMD and Gromacs guide docs.
mcduta committed Oct 6, 2017
2 parents b89eb62 + 0eabb66 commit 0916d30
Showing 2 changed files with 60 additions and 43 deletions.
49 changes: 29 additions & 20 deletions software/apps/Gromacs.rst
Gromacs
=======

Job scripts
-----------

Gromacs is a versatile package for molecular dynamics simulations, solving the Newtonian equations of motion for systems with hundreds to millions of particles. Although the software scales well to hundreds of cores for typical simulations, Gromacs calculations are restricted to at most a single node on the JADE service.

The following is an example Slurm script to run the code using one of the regression tests from the installation:

::

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH -J testGromacs
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:4

module load gromacs/2016.3

mpirun -np $SLURM_NTASKS gmx_mpi mdrun -s topol.tpr -noconfout -resethway -nsteps 10000 -ntomp 10 -pin on &> run-gromacs.out

The example utilises half the GPU resources on a JADE node, requesting a single node with 4 tasks and 4 GPUs. Gromacs is started with one MPI process per GPU, i.e. the number of processes matches the number of GPUs requested. Each process runs multiple OpenMP threads, the thread count being set via ``-ntomp``, and process pinning is requested via ``-pin on``.
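
Assuming the script above is saved as, for example, ``run-gromacs.sh`` (the filename is arbitrary and used here only for illustration), the job can be submitted and monitored with the standard Slurm commands:

::

# submit the job script to the batch system
sbatch run-gromacs.sh

# check the state of your jobs in the queue
squeue -u $USER

# follow the Gromacs output redirected by the script
tail -f run-gromacs.out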

To read more about Gromacs processing on GPUs, please visit https://www.nvidia.com/en-us/data-center/gpu-accelerated-applications/gromacs/ .


Installation notes
------------------

The latest version of the source was used, following the build instructions at http://www.nvidia.com/object/gromacs-installation.html

The code was compiled using OpenMPI v1.10.5a1 and GCC v4.8.4 with the following command:
-DCMAKE_BUILD_TYPE=Release
-DGMX_BUILD_UNITTESTS=ON
-DCMAKE_INSTALL_PREFIX=/jmain01/home/atostest/gromacs-2016.3
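
As a quick check of the resulting installation (a sketch, not part of the original build instructions), the module can be loaded and the MPI binary queried for its build configuration, which reports the compiler, MPI library and GPU support it was built with:

::

# load the centrally installed module and print the build configuration
module load gromacs/2016.3
gmx_mpi --version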

54 changes: 31 additions & 23 deletions software/apps/NAMD.rst
NAMD
====

Job scripts
-----------

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD scales to hundreds of cores for typical simulations; however, NAMD calculations are restricted to at most a single node on the JADE service.

Below is an example of a NAMD job script:

::

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20
#SBATCH -J testNAMD
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:4

module load NAMD/2.12

$NAMDROOT/namd2 +p$SLURM_NTASKS_PER_NODE +setcpuaffinity +devices $CUDA_VISIBLE_DEVICES ./input.conf &> run.log

The above example utilises half the resources on a JADE node, requesting a single node with 20 tasks and 4 GPUs.

Because the job runs on a single node, NAMD can be started directly, avoiding the use of the ``charmrun`` launcher. The application is set to run on the allocated resources using the ``+p`` and ``+devices`` command line options. Additionally, CPU affinity is requested via the ``+setcpuaffinity`` option.

The general recommendation is to have no more than one process per GPU in multi-node runs, allowing full utilisation of multiple cores via multi-threading. For single-node jobs, the use of multiple GPUs per process is permitted.
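
To confirm the mapping described above for a given allocation, the relevant variables can be echoed from the job script before NAMD is launched; this is an optional sketch, not part of the original example:

::

# optional: record the resources granted by Slurm before starting NAMD
echo "Tasks per node: $SLURM_NTASKS_PER_NODE"
echo "Visible GPUs:   $CUDA_VISIBLE_DEVICES"
nvidia-smi --query-gpu=index,name --format=csv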

To read more about NAMD processing on GPUs, please visit https://www.nvidia.com/en-us/data-center/gpu-accelerated-applications/namd/ .


Installation notes
------------------
The latest version of the source code was used and built using OpenMPI v1.10.5a1 and GCC v4.8.4, following the instructions from http://www.nvidia.com/object/gpu-accelerated-applications-namd-installation.html

Charm++ was built using:

::

./build charm++ verbs-linux-x86_64 gcc smp --with-production

NAMD was built using the following:

::

./config Linux-x86_64-g++ --charm-arch verbs-linux-x86_64-smp-gcc --with-cuda --cuda-prefix /usr/local/cuda-8.0
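
The ``config`` step only generates a build directory; compiling NAMD then typically means changing into that directory and running ``make``. The following is a sketch assuming the default directory name created for the architecture above:

::

# the config step creates a build directory named after the chosen architecture
cd Linux-x86_64-g++

# build NAMD (a parallel make, e.g. "make -j8", speeds this up)
make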


For the decision on the number of threads to use per node, take a look at http://www.nvidia.com/object/gpu-accelerated-applications-namd-running-jobs.html
