Job arrays and ssh keypair
matteosecli committed May 22, 2020
1 parent ea8df6a commit 943bcbe
Showing 2 changed files with 118 additions and 1 deletion.
69 changes: 68 additions & 1 deletion doc/source/examples.rst
@@ -216,7 +216,74 @@ This is the same example, but with an explicit setup to ask for 2 full nodes.
Job Arrays
----------
.. warning:: **WORK IN PROGRESS!**

Job arrays are a handy way to submit multiple jobs that differ only by, e.g., some parameters of the calculation. SLURM's documentation has a very well-written page on job arrays, and I suggest you take a look at it for more details and examples: https://slurm.schedmd.com/job_array.html. Here I'll just show a couple of examples.

A job array is specified via the :data:`--array=<value>` option (see :ref:`Partition, Walltime and Output`), which takes a set of integers as ``<value>``. This set can be specified as an interval, e.g. ``1-10`` (numbers from 1 to 10), as a list, e.g. ``3,5,23``, or as a combination of both, e.g. ``1-5,13`` (numbers from 1 to 5, then 13).

For example, if one uses the option :data:`--array=1-5,13`, then SLURM will generate 6 different jobs, each one with the following environment variables set:

.. table::
   :width: 100%
   :widths: auto

   +-----------------------------+-----------------------------------------+
   | Environment Variable        | Value                                   |
   +=============================+=========================================+
   | ``$SLURM_ARRAY_TASK_ID``    | One of the following: ``1,2,3,4,5,13``. |
   +-----------------------------+-----------------------------------------+
   | ``$SLURM_ARRAY_TASK_COUNT`` | ``6`` (number of jobs in the array)     |
   +-----------------------------+-----------------------------------------+
   | ``$SLURM_ARRAY_TASK_MAX``   | ``13`` (max of given range)             |
   +-----------------------------+-----------------------------------------+
   | ``$SLURM_ARRAY_TASK_MIN``   | ``1`` (min of given range)              |
   +-----------------------------+-----------------------------------------+

In other words, each of these 6 different jobs will have a variable ``$SLURM_ARRAY_TASK_ID`` containing one (and only one) of the numbers given to :data:`--array`. This variable can then be used to generate one or more parameters of the simulation, in a way that's completely up to you.
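
As a minimal sketch of one way to turn the task ID into a parameter, the snippet below reads line number ``$SLURM_ARRAY_TASK_ID`` from a text file holding one parameter per line. The file name ``params.txt``, its contents and the helper ``pick_param`` are hypothetical and only serve as illustration; the fallback to task 1 is there so that the snippet can be tried outside SLURM, where the variable is not set.

.. code-block:: bash

   # Hypothetical parameter file: one parameter per line, line N belongs to task N.
   PARAMS_FILE="params.txt"
   printf '%s\n' 0.5 1.0 1.5 2.0 2.5 > "$PARAMS_FILE"

   # Print the parameter stored on line number $1 of the file $2.
   pick_param() {
       sed -n "${1}p" "$2"
   }

   # Outside SLURM the variable is unset, so fall back to task 1 for local testing.
   TASK_ID="${SLURM_ARRAY_TASK_ID:-1}"
   PARAM="$(pick_param "$TASK_ID" "$PARAMS_FILE")"
   echo "Task ${TASK_ID} uses parameter ${PARAM}"

With such a file, a matching array would be submitted with :data:`--array=1-5`, i.e. one task per line of the file.
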
.. note:: Array ranges can additionally be specified with a step. For example, to generate multiples of 3 up to 21, you can use :data:`--array=0-21:3`.
.. note:: You can also specify a maximum number of jobs in that array that are allowed to run at the same time. For example, :data:`--array=1-20%4` generates 20 jobs but only 4 of them are allowed to run at the same time.
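
If you are unsure which task IDs a stepped range produces, you can preview them on any machine with ``seq``. This is only a sketch: the helper ``preview_array`` is hypothetical and not part of SLURM, and it mimics just the ``first-last:step`` form, not the ``%`` limit.

.. code-block:: bash

   # Hypothetical helper: list the task IDs of a stepped range FIRST-LAST:STEP.
   # Arguments: $1 = first, $2 = last, $3 = step.
   preview_array() {
       # seq expects its arguments in the order FIRST STEP LAST
       seq "$1" "$3" "$2" | paste -sd, -
   }

   preview_array 0 21 3   # same IDs as --array=0-21:3, prints 0,3,6,9,12,15,18,21
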
Serial Job Array
^^^^^^^^^^^^^^^^

.. code-block:: bash
   :caption: Serial job array asking for 2 threads (bound to 1 physical core), 990 MB of memory per CPU and 6 hours for each of the 32 jobs, on ``regular2``. The output and error filenames are in TORQUE style.
   :linenos:

   #!/usr/bin/env bash
   #
   #SBATCH --job-name=Array_Job
   #SBATCH --mail-type=ALL
   #SBATCH --mail-user=jdoe@sissa.it
   #
   #SBATCH --ntasks=1
   #SBATCH --cpus-per-task=2
   #SBATCH --ntasks-per-core=1
   #
   #SBATCH --mem-per-cpu=990mb
   #
   #SBATCH --array=1-32
   #SBATCH --partition=regular2
   #SBATCH --time=06:00:00
   #SBATCH --output=%x.o%A-%a
   #SBATCH --error=%x.e%A-%a
   #

   ## YOUR CODE GOES HERE (load the modules and do the calculations)
   ## Sample code:

   # Make sure it's the same module used at compile time
   module load intel

   # Calculate the parameter of the calculation based on the array index,
   # e.g. in this case as 5 times the array index
   PARAM=$((${SLURM_ARRAY_TASK_ID}*5))

   # Run calculation
   ./my_program.x $PARAM

.. note:: This workload is tailored to the specifications of the ``regular2`` nodes. With these numbers, the whole array may end up occupying just a single node, if one is available; but hey, you are nonetheless running 32 calculations at the same time! 😄
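
To see what the array asks for as a whole, you can multiply out the ``#SBATCH`` values of the script above. The helper ``array_totals`` below is a hypothetical sketch that only does this arithmetic; how the result compares with a ``regular2`` node depends on the node specifications, which are not repeated here.

.. code-block:: bash

   # Hypothetical helper: total CPUs and total memory (in MB) requested by a job array.
   # Arguments: $1 = number of jobs, $2 = CPUs per task, $3 = memory per CPU in MB.
   array_totals() {
       echo "$(( $1 * $2 )) $(( $1 * $2 * $3 ))"
   }

   # 32 jobs, --cpus-per-task=2, --mem-per-cpu=990mb
   array_totals 32 2 990   # prints "64 63360"
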
Dependencies
------------
50 changes: 50 additions & 0 deletions doc/source/extra-tips.rst
@@ -48,6 +48,56 @@ If you want to **totally disable** Hyper-Threading, you can use

.. code-block:: console

   $ sbatch --hint=nomultithread --cpu-bind=cores send_job.sh

Automatic Login
---------------

If you're on a **trusted computer**, you can avoid entering your password every time you log in to Ulysses.

First, generate an SSH keypair via:

.. code-block:: console

   $ ssh-keygen

Then, upload your public key to Ulysses:

.. code-block:: console

   $ ssh-copy-id username@frontend2.hpc.sissa.it

You'll be asked for your password for the last time. 🙃

You can further shorten the login procedure by opening (or creating) the file ``~/.ssh/config`` and adding the following lines (replace ``username`` with your SISSA username and ``sissacluster2`` with the name you prefer):

.. code-block:: text

   Host sissacluster2
       User username
       HostName frontend2.hpc.sissa.it
       IdentityFile ~/.ssh/id_rsa
       ServerAliveInterval 120
       ServerAliveCountMax 60

Then, to log in, you will just need to run

.. code-block:: console

   $ ssh sissacluster2

An even shorter way to log in is to open (or create) the file ``~/.bash_profile`` and add the following line at the end (replace ``cluster2`` with a name you like):

.. code-block:: bash

   alias cluster2='ssh sissacluster2'

At this point, logging in to Ulysses becomes a matter of executing the command

.. code-block:: console

   $ cluster2

in a terminal (you might need to close and reopen the terminal, first).
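
If you prefer to add the alias from the command line rather than with an editor, a small helper can make sure the line is not added twice. This is only a sketch: ``add_line_once`` is a hypothetical function, and the example writes to a scratch file created with ``mktemp`` instead of your real ``~/.bash_profile``.

.. code-block:: bash

   # Hypothetical helper: append line $1 to file $2 unless that exact line is already there.
   add_line_once() {
       touch "$2"
       grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
   }

   # Try it on a scratch file first; use ~/.bash_profile once you are happy with it.
   PROFILE="$(mktemp)"
   add_line_once "alias cluster2='ssh sissacluster2'" "$PROFILE"
   add_line_once "alias cluster2='ssh sissacluster2'" "$PROFILE"   # no effect the second time
   grep -c 'cluster2' "$PROFILE"   # prints 1

Checking for the exact line before appending keeps the file clean even if you run the command several times.
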

Explore Files in a User-Friendly Way
------------------------------------