Merge pull request #8 from jade-hpc-gpu/dev__mihai
update to readthedocs (using JADE + cuda)
mcduta committed Oct 10, 2017
2 parents e7691d6 + 15aee84 commit a24b45f
Showing 7 changed files with 172 additions and 31 deletions.
47 changes: 47 additions & 0 deletions cuda/index.rst
@@ -0,0 +1,47 @@
.. _software:

CUDA
====

.. sidebar:: CUDA

:URL: http://www.nvidia.co.uk/object/cuda-parallel-computing-uk.html

CUDA is a parallel computing platform and API model created and developed by NVIDIA, which enables dramatic increases in computing performance by harnessing the power of GPUs.


Versions
--------
Multiple CUDA versions are available through the module system.


Environment
-----------
The CUDA environment is managed through environment modules, which set all the required environment variables. The available versions can be checked with ::

module avail cuda

The environment set by a particular module can be inspected, *e.g.* ::

module show cuda/9.0

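As a quick sketch of a typical workflow (the module version is only an example, and the source file name is hypothetical), a module can be loaded and a code compiled with ``nvcc`` ::

# load a CUDA module (check "module avail cuda" for the versions actually installed)
module load cuda/9.0

# confirm the compiler provided by the module
nvcc --version

# compile a hypothetical CUDA source file
# (-arch=sm_60 targets Pascal GPUs such as the P100; adjust for other hardware)
nvcc -O3 -arch=sm_60 my_kernel.cu -o my_kernel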

Learn more
----------
To learn more about CUDA programming, either talk to your local RSE
support, or visit Mike Giles' CUDA Programming course page at

http://people.maths.ox.ac.uk/gilesm/cuda/

This one-week course is taught in Oxford at the end of July each year,
but all of the lecture notes and practicals are provided online for
self-study at other times.




.. toctree::
:maxdepth: 2
:glob:

learn/index
58 changes: 58 additions & 0 deletions cuda/learn/index.rst
@@ -0,0 +1,58 @@
.. _learn:

CUDA documentation
==================

NVIDIA provides extensive documentation, both online and in downloadable form:

* `Online CUDA documentation <http://docs.nvidia.com/cuda/index.html>`_
* `CUDA homepage <http://www.nvidia.com/object/cuda_home.html>`_
* `CUDA Runtime API <http://docs.nvidia.com/cuda/pdf/CUDA_Runtime_API.pdf>`_
* `CUDA C Best Practices Guide <http://docs.nvidia.com/cuda/pdf/CUDA_C_Best_Practices_Guide.pdf>`_
* `CUDA Compiler Driver NVCC <http://docs.nvidia.com/cuda/pdf/CUDA_Compiler_Driver_NVCC.pdf>`_
* `CUDA Visual Profiler <http://docs.nvidia.com/cuda/pdf/CUDA_Profiler_Users_Guide.pdf>`_
* `CUDA-gdb debugger <http://docs.nvidia.com/cuda/pdf/CUDA_GDB.pdf>`_
* `CUDA-memcheck memory checker <http://docs.nvidia.com/cuda/pdf/CUDA_Memcheck.pdf>`_
* `CUDA maths library <http://docs.nvidia.com/pdf/CUDA_Math_API.pdf>`_
* `CUBLAS library <http://docs.nvidia.com/cuda/pdf/CUDA_CUBLAS_Users_Guide.pdf>`_
* `CUFFT library <http://docs.nvidia.com/cuda/pdf/CUDA_CUFFT_Users_Guide.pdf>`_
* `CUSPARSE library <http://docs.nvidia.com/cuda/pdf/CUDA_CUSPARSE_Users_Guide.pdf>`_
* `CURAND library <http://docs.nvidia.com/cuda/pdf/CURAND_Library.pdf>`_
* `NCCL multi-GPU communications library <https://developer.nvidia.com/nccl>`_
* `NVIDIA blog article <https://devblogs.nvidia.com/parallelforall/fast-multi-gpu-collectives-nccl/>`_
* `GTC 2015 presentation on NCCL <http://images.nvidia.com/events/sc15/pdfs/NCCL-Woolley.pdf>`_
* `PTX (low-level instructions) <http://docs.nvidia.com/cuda/pdf/ptx_isa_4.1.pdf>`_


Nsight is NVIDIA's integrated development environment:

* `Nsight Visual Studio <https://developer.nvidia.com/nvidia-nsight-visual-studio-edition>`_
* `Nsight Eclipse <https://developer.nvidia.com/nsight-eclipse-edition>`_
* `Nsight Eclipse -- Getting Started <http://docs.nvidia.com/cuda/nsight-eclipse-edition-getting-started-guide/index.html>`_


Other useful resources on GPU programming:

* `Floating point accuracy on NVIDIA GPUs <http://docs.nvidia.com/cuda/pdf/Floating_Point_on_NVIDIA_GPU_White_Paper.pdf>`_
* `CUDA SDK examples <http://developer.nvidia.com/object/cuda_sdk_samples.html>`_
* `OpenACC <http://www.openacc.org>`_
* `OpenMP 4.5 <http://on-demand.gputechconf.com/gtc/2016/presentation/s6510-jeff-larkin-targeting-gpus-openmp.pdf>`_


NVIDIA also provides helpful guides on the Pascal architecture:

* `Pascal Tuning Guide <http://docs.nvidia.com/cuda/pascal-tuning-guide/>`_
* `Pascal P100 White Paper <https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf>`_


Useful presentations from NVIDIA's 2017 GTC conference include:

* `Cooperative Groups <http://on-demand.gputechconf.com/gtc/2017/presentation/s7622-Kyrylo-perelygin-robust-and-scalable-cuda.pdf>`_
* `NCCL 2.0 <http://on-demand.gputechconf.com/gtc/2017/presentation/s7155-jeaugey-nccl.pdf>`_
* `Multi-GPU Programming <http://on-demand.gputechconf.com/gtc/2017/presentation/s7142-jiri-kraus-multi-gpu-programming-models.pdf>`_
* `The Making of Saturn-V <http://on-demand.gputechconf.com/gtc/2017/presentation/s7750-louis-capps-making-of-dgx-saturnv.pdf>`_


.. toctree::
:maxdepth: 1
:glob:
1 change: 1 addition & 0 deletions index.rst
@@ -55,5 +55,6 @@ JADE hardware consists of:

jade/index
software/index
cuda/index
more_info
troubleshooting
37 changes: 14 additions & 23 deletions jade/connecting.rst
@@ -3,16 +3,7 @@
Connecting to the cluster using SSH
===================================

To log onto the JADE cluster you must use `SSH <https://en.wikipedia.org/wiki/Secure_Shell>`_, which is a common way of remotely logging in to computers running the Linux operating system. To do this, you need to have an SSH *client* program installed on your machine. macOS and Linux come with a command-line (text-only) SSH client pre-installed. On Windows there are various graphical SSH clients you can use, including *MobaXTerm*.


SSH client software on Windows
@@ -52,29 +43,29 @@ Open a terminal (e.g. *Gnome Terminal* on Linux or *Terminal* on macOS) and then
Establishing a SSH connection
-----------------------------


Once you have a terminal open, run the following command to log into one of the JADE front-end nodes:
::
ssh -l $USER jade.hartree.stfc.ac.uk

Here you need to replace ``$USER`` with your username (e.g. ``te1st-test``).

.. note::

JADE has multiple front-end systems, and because of this some SSH software operating under stringent security settings might give **warnings about possible man-in-the-middle attacks** because of apparent changes in machine settings. This is a known issue and is being addressed, but in the meantime **these warnings can be safely ignored**.

To ignore the warning, add the option ``-o StrictHostKeyChecking=no`` to your SSH command, *e.g.* ::

ssh -o StrictHostKeyChecking=no -l $USER jade.hartree.stfc.ac.uk

or add the following line to your ``~/.ssh/config`` file ::

StrictHostKeyChecking no

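As a convenience, a sketch of an entry in ``~/.ssh/config`` is shown below; the ``jade`` alias is arbitrary and the user name is a placeholder to be replaced with your own ::

# ~/.ssh/config -- "jade" is an arbitrary alias, "te1st-test" a placeholder user name
Host jade
    HostName jade.hartree.stfc.ac.uk
    User te1st-test
    StrictHostKeyChecking no

With such an entry in place, ``ssh jade`` is enough to open a connection.
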
.. note::

**macOS users**: if this fails then:

* Check that your `XQuartz <https://www.xquartz.org/>`_ is up to date then try again *or*
* Try again with ``-Y`` instead of ``-X``

This should give you a prompt resembling the one below: ::

te1st-test@dgj223:~$

.. note::

When you log in to the cluster you reach one of two login nodes.
You **should not** run applications on the login nodes.
Running ``srun`` gives you an interactive terminal
on one of the many worker nodes in the cluster.
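As the note says, ``srun`` provides an interactive terminal on one of the compute nodes. A minimal sketch is shown below; the GPU request and the absence of an explicit partition are assumptions rather than JADE-specific documented settings ::

# request one GPU and an interactive shell on a compute node
# (the resource options shown are assumptions -- adjust to the JADE Slurm configuration)
srun --gres=gpu:1 --pty /bin/bash
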
39 changes: 35 additions & 4 deletions jade/index.rst
@@ -3,17 +3,48 @@
Using the JADE Facility
=======================

If you have not used a High Performance Computing (HPC) cluster, the Linux operating system or even a command line before, this is the place to start. This guide will get you set up using the JADE cluster fairly quickly.
The JADE facility consists of 2 head nodes and 22 NVIDIA `DGX-1 <https://www.scan.co.uk/3xs/info/nvidia-dgx-1>`_ servers, each with 8 GPUs and 40 CPU cores.

.. The whole system from the user perspective is interacted with via the Slurm Workload Manager on the login nodes. Via this scheduler, access to the compute nodes, can be interactive or batch. The installed application software consists of a mixture of docker container images, supplied by Nvidia, and executables built from source. Both container images and executables can use the system either interactively or in batch mode.

.. It is only possible to ssh onto a node which has been allocated to the user. Once the session completes the ssh access is removed. Access to the global parallel file system is from the login nodes and all compute nodes. Any data on this file system is retained after a session on the nodes completes. There is also access to local disc space on each node. Access to this file system is only possible during a Slurm session. Once the session completes the local disc data is removed.
**Accounts**

Users get accounts on the system by following the directions in the **Getting an account** section, on the left.

New users must provide a public SSH key and are given a user ID on JADE. Using this ID, they will then be able to log in to one of the head nodes using an SSH command like
::
ssh -l account_name jade.hartree.stfc.ac.uk

Further details are in the section **Connecting to the cluster using SSH**.


**Software**

The software packages already installed on JADE come in two kinds: *standard applications* (primarily Molecular Dynamics) and *containerised applications* (various Machine Learning applications in particular). These are described further in the **Software on JADE** section on the left.

The *module* system is used to control your working environment and the particular version of software which you want to use; details are given in the section **The module tool** on the left.

If not using the installed software, you are also welcome to build your own applications. Codes can be built on the head node, and a number of compilers and MPI library stacks are available via the modules.
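As an illustration only (the module and file names below are assumptions, not a list of what is actually installed on JADE), building a simple MPI code on the head node might look like ::

# see which compiler and MPI modules are available (names are site-specific)
module avail

# load an illustrative compiler and MPI stack
module load gcc openmpi

# build a hypothetical MPI source file
mpicc -O2 hello_mpi.c -o hello_mpi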


**Running applications**

Applications can only be run on the compute nodes by submitting jobs to the Slurm batch queuing system. Examples of Slurm submission scripts are given in the relevant sections for each of the main software packages.

It is also possible to obtain an interactive session through Slurm on one of the compute nodes. This is usually only for code development
purposes; submitting batch jobs is the standard way of working.
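A minimal sketch of a batch script is given below; the resource requests, module name and executable are illustrative assumptions, and the package-specific sections contain the definitive examples ::

#!/bin/bash
# minimal illustrative Slurm script -- the resource options are assumptions
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
#SBATCH --job-name=example

# load the software environment the job needs (module name is illustrative)
module load cuda/9.0

# run the application (placeholder executable)
./my_application

The script is then submitted from the head node with ``sbatch``, *e.g.* ``sbatch myjob.sh``.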


**Storage**

The global file system is accessible from both the head nodes and the compute nodes. Any files written during the job execution on the compute nodes will be found on the file system after the job has completed.

There is also access to local disc space on each compute node, but this access is only possible during a Slurm job, and once the job is completed the local disc data is removed automatically. In machine learning applications, for example, this local disc space (provided by fast SSD) may be useful as a staging point for very large training sets.
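As an illustration only (the node-local path is an assumption -- check the JADE documentation for the actual location), staging a data set to local disk inside a job might look like ::

# inside a Slurm job script: copy input data from the global file system
# to node-local disk (the path below is an assumed example)
LOCAL_DIR=/tmp/$USER/$SLURM_JOB_ID
mkdir -p $LOCAL_DIR
cp -r $HOME/training_data $LOCAL_DIR/

# run against the local copy; anything worth keeping must be copied back,
# because the local data is removed when the job ends
./train --data $LOCAL_DIR/training_data
cp results.out $HOME/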



.. toctree::
:maxdepth: 2
:glob:

getting-account
connecting
10 changes: 8 additions & 2 deletions software/apps/gromacs.rst
@@ -3,11 +3,17 @@
Gromacs
=======

.. sidebar:: Gromacs

:URL: http://www.gromacs.org/
:URL: https://www.nvidia.com/en-us/data-center/gpu-accelerated-applications/gromacs/


Gromacs is a versatile package for molecular dynamics simulations, which solves the Newtonian equations of motion for systems with hundreds to millions of particles. Although the software scales well to hundreds of cores for typical simulations, Gromacs calculations are restricted to at most a single node on the JADE service.

Job scripts
-----------

The following is an example Slurm script to run the code using one of the regression tests from the installation:

::
11 changes: 9 additions & 2 deletions software/apps/namd.rst
@@ -3,11 +3,18 @@
NAMD
====

.. sidebar:: NAMD

:URL: http://www.ks.uiuc.edu/Research/namd/
:URL: https://www.nvidia.com/en-us/data-center/gpu-accelerated-applications/namd/


NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD scales to hundreds of cores for typical simulations; however, NAMD calculations are restricted to at most a single node on the JADE service.


Job scripts
-----------

Below is an example of a NAMD job script:

::
