
Merge pull request #4 from jade-hpc-gpu/mihai-readthedocs
Update to NAMD and Gromacs guide docs.
mcduta committed Oct 6, 2017
2 parents b89eb62 + 0eabb66 commit 0916d30
Showing 2 changed files with 60 additions and 43 deletions.
49 changes: 29 additions & 20 deletions software/apps/Gromacs.rst
Gromacs
=======

Job scripts
-----------

Gromacs is a versatile package for molecular dynamics simulations, solving the Newtonian equations of motion for systems with hundreds to millions of particles. Although the software scales well to hundreds of cores for typical simulations, Gromacs calculations are restricted to at most a single node on the JADE service.

The following is an example Slurm script to run the code using one of the regression tests from the installation:

::

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH -J testGromacs
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:4

module load gromacs/2016.3

mpirun -np $SLURM_NTASKS gmx_mpi mdrun -s topol.tpr -noconfout -resethway -nsteps 10000 -ntomp 10 -pin on &> run-gromacs.out

The example utilises half the GPU resources on a JADE node, requesting a single node with 4 tasks and 4 GPUs. Gromacs is started with one MPI process per GPU, i.e. the number of processes matches the number of GPUs requested. Each process runs multiple OpenMP threads, the thread count being set via ``-ntomp``, and process pinning is requested via ``-pin on``.
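
Assuming the script above is saved as, for example, ``run-gromacs.sh`` (the filename is arbitrary and used here only for illustration), the job can be submitted and monitored with the standard Slurm commands:

::

# submit the job script to the batch system
sbatch run-gromacs.sh

# check the state of your jobs in the queue
squeue -u $USER

# follow the Gromacs output redirected by the script
tail -f run-gromacs.out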

To read more about Gromacs processing on GPUs, please visit https://www.nvidia.com/en-us/data-center/gpu-accelerated-applications/gromacs/ .


Installation notes
------------------

The latest version of the source was used, following the build instructions at http://www.nvidia.com/object/gromacs-installation.html

The code was compiled using OpenMPI v1.10.5a1 and GCC v4.8.4 with the following command:
-DCMAKE_BUILD_TYPE=Release
-DGMX_BUILD_UNITTESTS=ON
-DCMAKE_INSTALL_PREFIX=/jmain01/home/atostest/gromacs-2016.3
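
As a quick check of the resulting installation (a sketch, not part of the original build instructions), the module can be loaded and the MPI binary queried for its build configuration, which reports the compiler, MPI library and GPU support it was built with:

::

# load the centrally installed module and print the build configuration
module load gromacs/2016.3
gmx_mpi --version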

54 changes: 31 additions & 23 deletions software/apps/NAMD.rst
NAMD
====

Job scripts
-----------

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD scales to hundreds of cores for typical simulations; however, NAMD calculations are restricted to at most a single node on the JADE service.

Below is an example of a NAMD job script:

::

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20
#SBATCH -J testNAMD
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:4

module load NAMD/2.12

$NAMDROOT/namd2 +p$SLURM_NTASKS_PER_NODE +setcpuaffinity +devices $CUDA_VISIBLE_DEVICES ./input.conf &> run.log

The above example utilises half the resources on a JADE node, requesting a single node with 20 tasks and 4 GPUs.

Because the job runs on a single node, NAMD can be started directly, avoiding the use of the ``charmrun`` launcher. The application is set to run on the allocated resources using the ``+p`` and ``+devices`` command line options. Additionally, CPU affinity is requested via the ``+setcpuaffinity`` option.

The general recommendation is to have no more than one process per GPU in multi-node runs, allowing full utilisation of multiple cores via multi-threading. For single-node jobs, the use of multiple GPUs per process is permitted.
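
To confirm the mapping described above for a given allocation, the relevant variables can be echoed from the job script before NAMD is launched; this is an optional sketch, not part of the original example:

::

# optional: record the resources granted by Slurm before starting NAMD
echo "Tasks per node: $SLURM_NTASKS_PER_NODE"
echo "Visible GPUs:   $CUDA_VISIBLE_DEVICES"
nvidia-smi --query-gpu=index,name --format=csv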

To read more about NAMD processing on GPUs, please visit https://www.nvidia.com/en-us/data-center/gpu-accelerated-applications/namd/ .


Installation notes
------------------
The latest version of the source code was used and built using OpenMPI v1.10.5a1 and GCC v4.8.4, following the instructions from http://www.nvidia.com/object/gpu-accelerated-applications-namd-installation.html

Charm++ was built using:

::

./build charm++ verbs-linux-x86_64 gcc smp --with-production

NAMD was built using the following:

::

./config Linux-x86_64-g++ --charm-arch verbs-linux-x86_64-smp-gcc --with-cuda --cuda-prefix /usr/local/cuda-8.0
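
The ``config`` step only generates a build directory; compiling NAMD then typically means changing into that directory and running ``make``. The following is a sketch assuming the default directory name created for the architecture above:

::

# the config step creates a build directory named after the chosen architecture
cd Linux-x86_64-g++

# build NAMD (a parallel make, e.g. "make -j8", speeds this up)
make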


For the decision on the number of threads to use per node, take a look at http://www.nvidia.com/object/gpu-accelerated-applications-namd-running-jobs.html
