Commit
Merge pull request #3 from twinkarma/master
Reorganised documentation into correct sections
twinkarma committed Oct 6, 2017
2 parents 2b55662 + 25774d1 commit b89eb62
Showing 17 changed files with 293 additions and 90 deletions.
Binary file added images/mobaxterm-terminal.png
Binary file added images/mobaxterm-welcome.png
5 changes: 3 additions & 2 deletions index.rst
@@ -13,7 +13,8 @@ Run by Research Software Engineering' Research Computing Group with additional s
.. toctree::
:maxdepth: -1
:hidden:

jade/index
software/index
more_info
troubleshooting
98 changes: 98 additions & 0 deletions jade/connecting.rst
@@ -0,0 +1,98 @@
.. _connecting:

Connecting to the cluster using SSH
===================================

The most versatile way to **run commands and submit jobs** on one of the clusters is to
use a mechanism called `SSH <https://en.wikipedia.org/wiki/Secure_Shell>`__,
which is a common way of remotely logging in to computers
running the Linux operating system.

To connect to another machine using SSH you need to
have an SSH *client* program installed on your machine.
macOS and Linux come with a command-line (text-only) SSH client pre-installed.
On Windows there are various graphical SSH clients you can use,
including *MobaXterm*.


SSH client software on Windows
------------------------------

Download and install the *Installer edition* of `MobaXterm <https://mobaxterm.mobatek.net/download-home-edition.html>`_.

After starting MobaXterm you should see something like this:

.. image:: /images/mobaxterm-welcome.png
:width: 50%
:align: center

Click *Start local terminal* and, if you see something like the following, continue to :ref:`ssh`.

.. image:: /images/mobaxterm-terminal.png
:width: 50%
:align: center

Running commands from a terminal (from the command line) may initially be
unfamiliar to Windows users, but this is the recommended approach for
running commands on JADE as
it is the idiomatic way of interfacing with Linux clusters.

SSH client software on macOS and Linux
--------------------------------------

Linux and macOS (OS X) both typically come with a command-line SSH client pre-installed.

If you are using macOS and want to be able to run graphical applications on the clusters then
you need to install the latest version of the `XQuartz <https://www.xquartz.org/>`_ *X Windows server*.

Open a terminal (e.g. *Gnome Terminal* on Linux or *Terminal* on macOS) and then go to :ref:`ssh`.

.. _ssh:

Establishing an SSH connection
------------------------------

Once you have a terminal open, run the following command to
log in to a cluster: ::

ssh -l $USER jade.hartree.stfc.ac.uk

Here you need to replace ``$USER`` with your username (e.g. ``te1st-test``).
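
If you want to run graphical applications on the cluster, add the ``-X`` flag to request X11 forwarding (the note below refers to this flag): ::

   ssh -X -l $USER jade.hartree.stfc.ac.uk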

.. note::

**macOS users**: if this fails then:

* Check that your `XQuartz <https://www.xquartz.org/>`_ is up to date then try again *or*
* Try again with ``-Y`` instead of ``-X``

This should give you a prompt resembling the one below: ::

te1st-test@dgj223:~$

At this prompt, to run ``bash`` on an interactive worker node, type: ::

srun --pty bash

Like this: ::

te1st-test@dgj223:~$ srun --pty bash

Notice that you have been moved from the login node ``dgj223`` to the worker node ``dgj113``, ready to run jobs interactively: ::

te1st-test@dgj113:~$


.. note::

When you log in to a cluster you reach one of two login nodes.
You **should not** run applications on the login nodes.
Running ``srun`` gives you an interactive terminal
on one of the many worker nodes in the cluster.
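
``srun`` accepts the same resource options as batch jobs. For example, to request a single GPU for your interactive session you could run something like the following (a sketch; the exact ``--gres`` value depends on your project's allocation): ::

   srun --gres=gpu:1 --pty bash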



What Next?
----------

Now that you have connected to a cluster, you can look at how to submit jobs with :ref:`Slurm <slurm>` or look at the software installed on :ref:`JADE <software>`.
16 changes: 8 additions & 8 deletions jade/containers.rst
@@ -12,7 +12,7 @@ This is 6.6TB in size but any data will be lost once the interactive session is
1. Interactive Mode
-------------------

All the applications in containers can be launched interactively in the same way using 1 compute node at a time. The number of GPUs to be used per node is requested using the ``gres`` option. To request an interactive session on a compute node the following command is issued from the login node:

::

@@ -23,22 +23,22 @@ This command will show the following, which is now running on a compute node:
::

================
==NVIDIA Caffe==
================

NVIDIA Release 17.04 (build 26740)

Container image Copyright (c) 2017, NVIDIA CORPORATION. All rights reserved.
Copyright (c) 2014, 2015, The Regents of the University of California (Regents)
All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

groups: cannot find name for group ID 1002
I have no name!@124cf0e3582e:/home_directory$

Note: the warnings in the last two lines can be ignored. To exit the container, issue the ``exit`` command. To launch the other containers the commands are:

::

@@ -48,7 +48,7 @@ Note. The warnings in the last two lines can be ignored. To exit the container,
2. Batch Mode
-------------

There are wrappers for launching the containers in batch mode. For example, to launch the Torch application change directory to where the launching script is, in this case called ``submit-char.sh``:

::

60 changes: 60 additions & 0 deletions jade/getting-account.rst
@@ -0,0 +1,60 @@
.. _getting-account:

Getting an account
==================

As a regular user, getting started involves 3 steps:

1) Apply for a Hartree SAFE account
-----------------------------------

This is a web account which will show you which projects you belong to, and the accounts which you have in them.

Before applying for a SAFE account, you should first have an SSH key-pair, and be ready to provide your public key as part of the SAFE registration process. Information on generating and using SSH keys is available here:
http://yukon.dl.ac.uk:8080/wiki/site/admin/SAFE%20User%20Guide.html#ssh
but for any help you should contact your local university IT support staff.
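
On Linux or macOS a key pair can typically be generated with OpenSSH's ``ssh-keygen`` (the key type and size below are a common choice rather than a Hartree requirement): ::

   # generate a new key pair, accepting the default location (~/.ssh/id_rsa)
   ssh-keygen -t rsa -b 4096

   # print the public key; this is what you provide during SAFE registration
   cat ~/.ssh/id_rsa.pub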

Once you have your public SSH key ready, apply for your SAFE account by going here:
https://um.hartree.stfc.ac.uk/hartree/login.jsp
and providing all of the required information.

When your account has been approved, you will receive an email giving your initial password. When you log in for the first time you will be asked to change it to a new one.

Further details on the registration process are available here:
http://community.hartree.stfc.ac.uk/wiki/site/admin/safe%20user%20guide.html

2) Apply for a JADE project account
-----------------------------------

Once your SAFE account is established, log in to it and click on "Request Join Project".

From the drop-down list select the appropriate project, enter the signup code which you should have been given by the project PI or manager, and then click "Request".

The approval process goes through several steps:
a) approval by the PI or project manager -- once this is done the SAFE status changes to Pending
b) initial account setup -- once this is done the SAFE status changes to Active
c) completion of account setup -- once this is done you will get an email confirming you are all set, and your SAFE account will have full details on your new project account

This process shouldn't take more than 2 working days. If it does, check that the PI or project manager is aware of your application, since it needs their approval through the SAFE system before it can proceed.

If your SAFE userid is xyz, and your project suffix is abc, then your project account username will be xyz-abc and you will log in to JADE using the command: ::

   ssh -l xyz-abc jade.hartree.stfc.ac.uk

Note that some users may belong to more than one project, in which case they will have different account usernames for each project, and all of them will be listed on their SAFE web account.

Each project account will have a separate file structure, and separate quotas for GPU time, filestore and other system resources.

Note also that JADE has multiple front-end systems, and because of this some SSH software operating under stringent security settings might give warnings about possible man-in-the-middle attacks due to apparent changes in machine settings. This is a known issue and is being addressed, but in the meantime these warnings can be safely ignored.
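
If your SSH client refuses to connect outright rather than just warning (typical of OpenSSH when a cached host key changes), a generic OpenSSH workaround, not a Hartree-specific instruction, is to remove the cached key and accept the new one on your next login: ::

   ssh-keygen -R jade.hartree.stfc.ac.uk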

3) Apply for a Hartree ServiceNow account
-----------------------------------------

This is a web account used for reporting any operational issues with JADE.

To obtain an account follow the directions here:
http://community.hartree.stfc.ac.uk/wiki/site/admin/servicenow.html

Note the guidance which explains that the first time you try to log in you will not have a password, so you need to click on the link which says "reset your password here".

Due to a problem with synchronising userids between ServiceNow and JADE, it is possible that ServiceNow may say that your email address is not recognised. If this happens, please send an email to hartree@stfc.ac.uk and ask them to add you to the ServiceNow database.
58 changes: 8 additions & 50 deletions jade/index.rst
@@ -1,64 +1,22 @@
.. _getting-started:

Using the JADE Facility
***********************
=======================

The whole system from the user perspective is interacted with via the Slurm Workload Manager on the login nodes. Via this scheduler, access to the compute nodes can be interactive or batch. The installed application software consists of a mixture of Docker container images, supplied by Nvidia, and executables built from source. Both container images and executables can use the system either interactively or in batch mode.
If you have not used a High Performance Computing (HPC) cluster, the Linux operating system or even a command line before, this is the place to start. This guide will get you set up using the JADE cluster fairly quickly.

It is only possible to ssh onto a node which has been allocated to the user. Once the session completes the ssh access is removed. Access to the global parallel file system is from the login nodes and all compute nodes. Any data on this file system is retained after a session on the nodes completes. There is also access to local disc space on each node. Access to this file system is only possible during a Slurm session. Once the session completes the local disc data is removed.
.. The whole system from the user perspective is interacted with via the Slurm Workload Manager on the login nodes. Via this scheduler, access to the compute nodes can be interactive or batch. The installed application software consists of a mixture of Docker container images, supplied by Nvidia, and executables built from source. Both container images and executables can use the system either interactively or in batch mode.
The software initially installed on the machine is listed in the following table:
.. It is only possible to ssh onto a node which has been allocated to the user. Once the session completes the ssh access is removed. Access to the global parallel file system is from the login nodes and all compute nodes. Any data on this file system is retained after a session on the nodes completes. There is also access to local disc space on each node. Access to this file system is only possible during a Slurm session. Once the session completes the local disc data is removed.
.. csv-table::
:header: Application,Version,Note
:widths: 20, 20, 10
GNU compiler suite, 4.8.4, part of O/S
PGI compiler suite, 17.4,
OpenMPI, 1.10.2, Supplied with PGI
OpenMPI, 1.10.5a1, Supplied with PGI
Gromacs, 2016.3, Supplied by Nvidia
NAMD, 2.12,

This software has been built from source and installed as modules. To list the source built applications do:

::

$ module avail
----------------------------------------------------- /jmain01/apps/modules -----------------------------
gromacs/2016.3 openmpi/1.10.2/2017 pgi/17.4(default) pgi64/17.4(default)
PrgEnv-pgi/17.4(default) NAMD/2.12 openmpi/1.10.5a1/GNU pgi/2017 pgi64/2017

The applications initially supplied by Nvidia as containers are listed in the following table:

.. csv-table::
:header: Application,Version
:widths: 20, 20

Caffe, 17.04
Theano, 17.04
Torch, 17.04

To list the containers and version available on the system do:

::

$ containers
REPOSITORY TAG IMAGE ID CREATED SIZE
nvidia/cuda latest 15e5dedd88c5 4 weeks ago 1.67 GB
nvcr.io/nvidia/caffe 17.04 87c288427f2d 6 weeks ago 2.794 GB
nvcr.io/nvidia/theano 17.04 24943feafc9b 8 weeks ago 2.386 GB
nvcr.io/nvidia/torch 17.04 a337ffb42c8e 9 weeks ago 2.9 GB

The following brief notes explain how to run the various applications.
.. toctree::
:maxdepth: -1
:hidden:

Modules
Slurm
Gromacs
NAMD
getting-account
connecting
scheduler/index
modules
containers
4 changes: 2 additions & 2 deletions jade/Modules.rst → jade/modules.rst
@@ -1,4 +1,4 @@
.. _modules
.. _modules:

The ``module`` tool
===================
@@ -8,7 +8,7 @@ Introduction

The Linux operating system makes extensive use of the *working environment*, which is a collection of individual environment variables. An environment variable is a named object in the Linux shell that contains information used by one or more applications; two of the most used such variables are ``$HOME``, which defines a user's home directory name, and ``$PATH``, which represents a list of paths to different executables. A large number of environment variables are already defined when a Linux shell is opened, but the environment can be customised, either by defining new environment variables relevant to certain applications or by modifying existing ones (e.g. adding a new path to ``$PATH``).
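
As a quick illustration (generic shell behaviour, not specific to JADE), environment variables can be inspected and extended from the command line: ::

   # print the value of an individual variable
   echo $HOME
   echo $PATH

   # append a (hypothetical) directory to the list of executable search paths
   export PATH=$PATH:/opt/myapp/bin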

``module`` is a Software Environment Management tool, which is used to manage of working environment in preparation for running the applications installed on the JADE. By loading the module for a certain installed application, the environment variables that are relevant for that application are automatically defined or modified.
``module`` is a Software Environment Management tool, which is used to manage the working environment in preparation for running the applications installed on JADE. By loading the module for a certain installed application, the environment variables that are relevant for that application are automatically defined or modified.

Useful commands
---------------
14 changes: 6 additions & 8 deletions jade/Slurm.rst → jade/scheduler/index.rst
@@ -1,16 +1,14 @@
.. _slurm
.. _slurm:

Slurm
=====
The Slurm Scheduler
===================

Introduction
------------

Running software on the JADE system is accomplished via batch jobs, *i.e.* in an unattended, non-interactive manner. Typically a user logs in to the JADE login nodes, prepares a job script and submits it to the job queue.

Jobs on JADE are managed by the Slurm_ batch system, which is in charge of

.. _Slurm: https://slurm.schedmd.com/
Jobs on JADE are managed by the `Slurm <https://slurm.schedmd.com>`_ batch system, which is in charge of:

* allocating the computer resources requested for the job,
* running the job and
@@ -81,7 +79,7 @@ A submission script is a Linux shell script that
* describes the processing to carry out (e.g. the application, its input and output, etc.) and
* requests computer resources (number of cpus, amount of memory, etc.) to use for processing.

The simplest case is that of a job that requires a single node (this is the smallest unit we allocate on arcus-b) with the following requirements:
The simplest case is that of a job that requires a single node with the following requirements:

* the job uses 1 node,
* the application is a single process,
@@ -121,7 +119,7 @@ The script continues with a series of lines starting with ``#``, which represent

The resource request ``#SBATCH --nodes=n`` determines how many compute nodes the scheduler allocates to a job; only 1 node is allocated for this job.

The maximum walltime is specified by ``#SBATCH --time=T``, where ``T`` has format **h:m:s**. Normally, a job is expected to finish before the specified maximum walltime. After the walltime reaches the maximum, the job terminates regardless of whether the job processes are still running or not.

The name of the job can be specified too with ``#SBATCH --job-name=name``.
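
Putting these directives together, a minimal submission script might look like the sketch below (the module and application names are placeholders rather than JADE-specific values); it would be submitted to the queue with ``sbatch``: ::

   #!/bin/bash
   #SBATCH --nodes=1
   #SBATCH --time=01:00:00
   #SBATCH --job-name=my_first_job

   # load the environment for the (placeholder) application
   module load myapplication

   # run a single process on the allocated node
   myapplication input.dat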

2 changes: 1 addition & 1 deletion more_info.rst
@@ -1,6 +1,6 @@

More Information
################
================

JADE Web site: http://www.arc.ox.ac.uk/content/jade

26 changes: 13 additions & 13 deletions jade/Gromacs.rst → software/apps/Gromacs.rst
@@ -1,4 +1,4 @@
.. _gromacs
.. _gromacs:

Gromacs
=======
@@ -9,18 +9,18 @@ The code was compiled using OpenMPI v1.10.5a1 and GCC v4.8.4 with the following

::

   CC=mpicc CXX=mpicxx cmake /jmain01/home/atostest/Building/gromacs-2016.3 \
      -DGMX_OPENMP=ON \
      -DGMX_GPU=ON \
      -DGPU_DEPLOYMENT_KIT_ROOT_DIR=/usr/local/cuda-8.0/targets/x86_64-linux \
      -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-8.0/targets/x86_64-linux \
      -DNVML_INCLUDE_DIR=/usr/local/cuda-8.0/targets/x86_64-linux/include \
      -DNVML_LIBRARY=/usr/lib/nvidia-375/libnvidia-ml.so \
      -DHWLOC_INCLUDE_DIRS=/usr/mpi/gcc/openmpi-1.10.5a1/include/openmpi/opal/mca/hwloc/hwloc191/hwloc/include \
      -DGMX_BUILD_OWN_FFTW=ON \
      -DGMX_PREFER_STATIC_LIBS=ON \
      -DCMAKE_BUILD_TYPE=Release \
      -DGMX_BUILD_UNITTESTS=ON \
      -DCMAKE_INSTALL_PREFIX=/jmain01/home/atostest/gromacs-2016.3

The following is an example Slurm script to run the code using one of the regression tests from the installation:
