Merge pull request #8 from jade-hpc-gpu/dev__mihai
update to readthedocs (using JADE + cuda)
mcduta committed Oct 10, 2017
2 parents e7691d6 + 15aee84 commit a24b45f
Showing 7 changed files with 172 additions and 31 deletions.
47 changes: 47 additions & 0 deletions cuda/index.rst
@@ -0,0 +1,47 @@
.. _software:

CUDA
====

.. sidebar:: CUDA

:URL: http://www.nvidia.co.uk/object/cuda-parallel-computing-uk.html

CUDA is a parallel computing platform and API model created and developed by NVIDIA, which enables dramatic increases in computing performance by harnessing the power of GPUs.


Versions
--------
Multiple CUDA versions are available through the module system.


Environment
-----------
The CUDA environment is managed through environment modules, which set all the required environment variables. The available versions can be checked with ::

module avail cuda

The environment set by a particular module can be inspected, *e.g.* ::

module show cuda/9.0

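As a quick sketch of a typical workflow (the module version is only an example, and the source file name is hypothetical), a module can be loaded and a code compiled with ``nvcc`` ::

# load a CUDA module (check "module avail cuda" for the versions actually installed)
module load cuda/9.0

# confirm the compiler provided by the module
nvcc --version

# compile a hypothetical CUDA source file
# (-arch=sm_60 targets Pascal GPUs such as the P100; adjust for other hardware)
nvcc -O3 -arch=sm_60 my_kernel.cu -o my_kernel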

Learn more
----------
To learn more about CUDA programming, either talk to your local RSE
support, or visit Mike Giles' CUDA Programming course page at

http://people.maths.ox.ac.uk/gilesm/cuda/

This one-week course is taught in Oxford at the end of July each year,
but all of the lecture notes and practicals are provided online for
self-study at other times.




.. toctree::
:maxdepth: 2
:glob:

learn/index
58 changes: 58 additions & 0 deletions cuda/learn/index.rst
@@ -0,0 +1,58 @@
.. _learn:

CUDA documentation
==================

NVIDIA provides extensive documentation, both online and in downloadable form:

* `Online CUDA documentation <http://docs.nvidia.com/cuda/index.html>`_
* `CUDA homepage <http://www.nvidia.com/object/cuda_home.html>`_
* `CUDA Runtime API <http://docs.nvidia.com/cuda/pdf/CUDA_Runtime_API.pdf>`_
* `CUDA C Best Practices Guide <http://docs.nvidia.com/cuda/pdf/CUDA_C_Best_Practices_Guide.pdf>`_
* `CUDA Compiler Driver NVCC <http://docs.nvidia.com/cuda/pdf/CUDA_Compiler_Driver_NVCC.pdf>`_
* `CUDA Visual Profiler <http://docs.nvidia.com/cuda/pdf/CUDA_Profiler_Users_Guide.pdf>`_
* `CUDA-gdb debugger <http://docs.nvidia.com/cuda/pdf/CUDA_GDB.pdf>`_
* `CUDA-memcheck memory checker <http://docs.nvidia.com/cuda/pdf/CUDA_Memcheck.pdf>`_
* `CUDA maths library <http://docs.nvidia.com/pdf/CUDA_Math_API.pdf>`_
* `CUBLAS library <http://docs.nvidia.com/cuda/pdf/CUDA_CUBLAS_Users_Guide.pdf>`_
* `CUFFT library <http://docs.nvidia.com/cuda/pdf/CUDA_CUFFT_Users_Guide.pdf>`_
* `CUSPARSE library <http://docs.nvidia.com/cuda/pdf/CUDA_CUSPARSE_Users_Guide.pdf>`_
* `CURAND library <http://docs.nvidia.com/cuda/pdf/CURAND_Library.pdf>`_
* `NCCL multi-GPU communications library <https://developer.nvidia.com/nccl>`_
* `NVIDIA blog article <https://devblogs.nvidia.com/parallelforall/fast-multi-gpu-collectives-nccl/>`_
* `GTC 2015 presentation on NCCL <http://images.nvidia.com/events/sc15/pdfs/NCCL-Woolley.pdf>`_
* `PTX (low-level instructions) <http://docs.nvidia.com/cuda/pdf/ptx_isa_4.1.pdf>`_


Nsight is NVIDIA's integrated development environment:

* `Nsight Visual Studio <https://developer.nvidia.com/nvidia-nsight-visual-studio-edition>`_
* `Nsight Eclipse <https://developer.nvidia.com/nsight-eclipse-edition>`_
* `Nsight Eclipse -- Getting Started <http://docs.nvidia.com/cuda/nsight-eclipse-edition-getting-started-guide/index.html>`_


Other useful resources on GPU programming:

* `Floating point accuracy on NVIDIA GPUs <http://docs.nvidia.com/cuda/pdf/Floating_Point_on_NVIDIA_GPU_White_Paper.pdf>`_
* `CUDA SDK examples <http://developer.nvidia.com/object/cuda_sdk_samples.html>`_
* `OpenACC <http://www.openacc.org>`_
* `OpenMP 4.5 <http://on-demand.gputechconf.com/gtc/2016/presentation/s6510-jeff-larkin-targeting-gpus-openmp.pdf>`_


NVIDIA also provides helpful guides on the Pascal architecture:

* `Pascal Tuning Guide <http://docs.nvidia.com/cuda/pascal-tuning-guide/>`_
* `Pascal P100 White Paper <https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf>`_


Useful presentations from NVIDIA's 2017 GTC conference include:

* `Cooperative Groups <http://on-demand.gputechconf.com/gtc/2017/presentation/s7622-Kyrylo-perelygin-robust-and-scalable-cuda.pdf>`_
* `NCCL 2.0 <http://on-demand.gputechconf.com/gtc/2017/presentation/s7155-jeaugey-nccl.pdf>`_
* `Multi-GPU Programming <http://on-demand.gputechconf.com/gtc/2017/presentation/s7142-jiri-kraus-multi-gpu-programming-models.pdf>`_
* `The Making of Saturn-V <http://on-demand.gputechconf.com/gtc/2017/presentation/s7750-louis-capps-making-of-dgx-saturnv.pdf>`_


.. toctree::
:maxdepth: 1
:glob:
1 change: 1 addition & 0 deletions index.rst
@@ -55,5 +55,6 @@ JADE hardware consists of:

jade/index
software/index
cuda/index
more_info
troubleshooting
37 changes: 14 additions & 23 deletions jade/connecting.rst
@@ -3,16 +3,7 @@
Connecting to the cluster using SSH
===================================

To log onto the JADE cluster you must use `SSH <https://en.wikipedia.org/wiki/Secure_Shell>`_, which is a common way of remotely logging in to computers running the Linux operating system. To do this, you need to have an SSH *client* program installed on your machine. macOS and Linux come with a command-line (text-only) SSH client pre-installed. On Windows there are various graphical SSH clients you can use, including *MobaXTerm*.


SSH client software on Windows
@@ -52,29 +43,29 @@ Open a terminal (e.g. *Gnome Terminal* on Linux or *Terminal* on macOS) and then
Establishing a SSH connection
-----------------------------


Once you have a terminal open, run the following command to log into one of the JADE front-end nodes:
::
ssh -l $USER jade.hartree.stfc.ac.uk

Here you need to replace ``$USER`` with your username (e.g. ``te1st-test``).

.. note::

JADE has multiple front-end systems, and because of this some SSH software operating under stringent security settings might give **warnings about possible man-in-the-middle attacks** because of apparent changes in machine settings. This is a known issue and is being addressed, but in the meantime **these warnings can be safely ignored**.

To ignore the warning, add the option ``-o StrictHostKeyChecking=no`` to your SSH command, *e.g.* ::

ssh -o StrictHostKeyChecking=no -l $USER jade.hartree.stfc.ac.uk

or add the following line to your ``~/.ssh/config`` file ::

StrictHostKeyChecking no

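As a convenience, a sketch of an entry in ``~/.ssh/config`` is shown below; the ``jade`` alias is arbitrary and the user name is a placeholder to be replaced with your own ::

# ~/.ssh/config -- "jade" is an arbitrary alias, "te1st-test" a placeholder user name
Host jade
    HostName jade.hartree.stfc.ac.uk
    User te1st-test
    StrictHostKeyChecking no

With such an entry in place, ``ssh jade`` is enough to open a connection.
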
.. note::

**macOS users**: if this fails then:

* Check that your `XQuartz <https://www.xquartz.org/>`_ is up to date then try again *or*
* Try again with ``-Y`` instead of ``-X``

This should give you a prompt resembling the one below: ::

te1st-test@dgj223:~$

.. note::

When you log in to the cluster you reach one of two login nodes.
You **should not** run applications on the login nodes.
Running ``srun`` gives you an interactive terminal
on one of the many worker nodes in the cluster.
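As the note says, ``srun`` provides an interactive terminal on one of the compute nodes. A minimal sketch is shown below; the GPU request and the absence of an explicit partition are assumptions rather than JADE-specific documented settings ::

# request one GPU and an interactive shell on a compute node
# (the resource options shown are assumptions -- adjust to the JADE Slurm configuration)
srun --gres=gpu:1 --pty /bin/bash
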
39 changes: 35 additions & 4 deletions jade/index.rst
@@ -3,17 +3,48 @@
Using the JADE Facility
=======================

If you have not used a High Performance Computing (HPC) cluster, the Linux operating system or even a command line before, this is the place to start. This guide will get you set up using the JADE cluster fairly quickly.
The JADE facility consists of 2 head nodes and 22 NVIDIA `DGX-1 <https://www.scan.co.uk/3xs/info/nvidia-dgx-1>`_ servers, each with 8 GPUs and 40 CPU cores.

.. The whole system from the user perspective is interacted with via the Slurm Workload Manager on the login nodes. Via this scheduler, access to the compute nodes, can be interactive or batch. The installed application software consists of a mixture of docker container images, supplied by Nvidia, and executables built from source. Both container images and executables can use the system either interactively or in batch mode.

.. It is only possible to ssh onto a node which has been allocated to the user. Once the session completes the ssh access is removed. Access to the global parallel file system is from the login nodes and all compute nodes. Any data on this file system is retained after a session on the nodes completes. There is also access to local disc space on each node. Access to this file system is only possible during a Slurm session. Once the session completes the local disc data is removed.
**Accounts**

Users get accounts on the system by following the directions in the **Getting an account** section, on the left.

New users must provide a public SSH key and are given a user ID on JADE. Using this ID, they will then be able to log in to one of the head nodes using an SSH command like
::
ssh -l account_name jade.hartree.stfc.ac.uk

Further details are in the section **Connecting to the cluster using SSH**.


**Software**

The software packages already installed on JADE come in two kinds: *standard applications* (primarily Molecular Dynamics) and *containerised applications* (various Machine Learning applications in particular). These are described further in the **Software on JADE** section on the left.

The *module* system is used to control your working environment and the particular version of software which you want to use; details are given in the section **The module tool** on the left.

If not using the installed software, you are also welcome to build your own applications. Codes can be built on the head node, and a number of compilers and MPI library stacks are available via the modules.
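As an illustration only (the module and file names below are assumptions, not a list of what is actually installed on JADE), building a simple MPI code on the head node might look like ::

# see which compiler and MPI modules are available (names are site-specific)
module avail

# load an illustrative compiler and MPI stack
module load gcc openmpi

# build a hypothetical MPI source file
mpicc -O2 hello_mpi.c -o hello_mpi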


**Running applications**

Applications can only be run on the compute nodes by submitting jobs to the Slurm batch queuing system. Examples of Slurm submission scripts are given in the relevant sections for each of the main software packages.

It is also possible to obtain an interactive session through Slurm on one of the compute nodes. This is usually only for code development
purposes; submitting batch jobs is the standard way of working.
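A minimal sketch of a batch script is given below; the resource requests, module name and executable are illustrative assumptions, and the package-specific sections contain the definitive examples ::

#!/bin/bash
# minimal illustrative Slurm script -- the resource options are assumptions
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
#SBATCH --job-name=example

# load the software environment the job needs (module name is illustrative)
module load cuda/9.0

# run the application (placeholder executable)
./my_application

The script is then submitted from the head node with ``sbatch``, *e.g.* ``sbatch myjob.sh``.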


**Storage**

The global file system is accessible from both the head nodes and the compute nodes. Any files written during the job execution on the compute nodes will be found on the file system after the job has completed.

There is also access to local disc space on each compute node, but this access is only possible during a Slurm job, and once the job is completed the local disc data is removed automatically. In machine learning applications, for example, this local disc space (provided by fast SSD) may be useful as a staging point for very large training sets.
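As an illustration only (the node-local path is an assumption -- check the JADE documentation for the actual location), staging a data set to local disk inside a job might look like ::

# inside a Slurm job script: copy input data from the global file system
# to node-local disk (the path below is an assumed example)
LOCAL_DIR=/tmp/$USER/$SLURM_JOB_ID
mkdir -p $LOCAL_DIR
cp -r $HOME/training_data $LOCAL_DIR/

# run against the local copy; anything worth keeping must be copied back,
# because the local data is removed when the job ends
./train --data $LOCAL_DIR/training_data
cp results.out $HOME/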



.. toctree::
:maxdepth: 2
:glob:

getting-account
connecting
10 changes: 8 additions & 2 deletions software/apps/gromacs.rst
@@ -3,11 +3,17 @@
Gromacs
=======

.. sidebar:: Gromacs

:URL: http://www.gromacs.org/
:URL: https://www.nvidia.com/en-us/data-center/gpu-accelerated-applications/gromacs/


Gromacs is a versatile package for molecular dynamics simulations, which solves the Newtonian equations of motion for systems with hundreds to millions of particles. Although the software scales well to hundreds of cores for typical simulations, Gromacs calculations are restricted to at most a single node on the JADE service.

Job scripts
-----------

The following is an example Slurm script to run the code using one of the regression tests from the installation:

::
11 changes: 9 additions & 2 deletions software/apps/namd.rst
@@ -3,11 +3,18 @@
NAMD
====

.. sidebar:: NAMD

:URL: http://www.ks.uiuc.edu/Research/namd/
:URL: https://www.nvidia.com/en-us/data-center/gpu-accelerated-applications/namd/


NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD scales to hundreds of cores for typical simulations; however, NAMD calculations are restricted to at most a single node on the JADE service.


Job scripts
-----------

Below is an example of a NAMD job script:

::
