# How to Queue your jobs on a shared compute system: SLURM

While our HPC systems are large, they support a large number of users.
This means you cannot immediately access the hardware to run codes.
All jobs must be submitted to a queue that prioritizes jobs on a number of factors: your priority, the amount of resource requested, and how best to maximize facility utilization.
In essence, a short single-node job from a user with high priority will start quickly, as it will fit in around other jobs; in contrast, a long multi-node job from a user with low priority will be at the back of the queue.

This system is hugely preferable to every person having their own system, a large system with no queue, or departmental servers. 
It gives all users the potential to run much larger jobs, it is economically very effective because we can ensure high utilization, and it is more environmentally friendly.
To address the 'no queue' idea: if all users could launch jobs to start immediately, it is likely that a few users would have jobs constantly running, your job's run time would depend on the number of other jobs running, and the extra workload of context switching between jobs would make the total time taken to finish all jobs longer.

## SLURM concepts (HPC)


Creation of SLURM submission scripts requires some understanding of the terminology used to refer to elements of the system hardware. It also requires some understanding of how a computational workload is constructed in terms of processes and often threads.

* **Job**: When a user submits a job script to SLURM this creates a job, which is given a unique ID and placed into a queue until the requested resources are available. 

* **Node**: SLURM refers to a single server within the cluster as a *node*.

* **Partition**: A group of nodes to which jobs can be submitted, sometimes referred to as a queue for legacy reasons. The default partition in SCRTP HPC clusters is called `compute` and is available to all users. Depending on the system there may also be `gpu` (GPU accelerators) and `hmem` (high memory) partitions available.

* **Socket**: SLURM refers to the node's compute processors (which contain multiple processing cores) by the *socket* they are plugged into. For example, in Avon the `compute` nodes each contain two Intel Xeon processors, one per socket.

* **CPU**: SLURM refers to each processor core as a *CPU*. Again, using Avon as an example, each Xeon processor in the `compute` partition contains `24` processor cores, and hence there are `48` CPUs total per node.

* **Task**: A task represents an instance of a running program/executable. Many HPC jobs will launch multiple tasks collectively performing a single calculation using [MPI](https://en.wikipedia.org/wiki/Message_Passing_Interface) or other mechanisms.

In turn, each task might make use of multiple CPUs via threading, e.g. [OpenMP](https://www.openmp.org/), or by spawning child processes, e.g. [Python multiprocessing](https://docs.python.org/3/library/multiprocessing.html).
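As a rough illustration of one task using several CPUs through threading, the sketch below requests four CPUs for a single task and passes that number to OpenMP through `OMP_NUM_THREADS`. The resource values and the commented-out executable name `./my_threaded_program` are placeholders chosen for this example, not values from the course material.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --time=00:05:00

# SLURM_CPUS_PER_TASK is only set when --cpus-per-task is requested,
# so fall back to a single thread if it is missing.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"

echo "Each task may use ${OMP_NUM_THREADS} thread(s)"

# A threaded program would be launched here, for example:
# srun ./my_threaded_program    # hypothetical executable name
```

Deriving the thread count from the allocation, rather than hard-coding it, keeps the program from starting more threads than the CPUs SLURM has actually granted to the task.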

The maximum number of CPUs that can be used by any one task (without oversubscribing the node) is the number of CPUs in a node.

Jobs that launch multiple tasks may make use of resources across multiple nodes (servers) simultaneously, with tasks communicating over the cluster network to implement a single calculation. The high-speed, low-latency networking in the HPC clusters makes this viable, in contrast to the `taskfarm`, where such calculations are discouraged and unlikely to be performant.
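To make the relationship between tasks, CPUs and nodes concrete, here is a small sketch that estimates how many nodes a given layout would need. The helper `nodes_needed` is hypothetical (it is not a SLURM command), and the default of 48 CPUs per node is simply the Avon `compute` figure quoted above; other partitions and systems will differ.

```shell
#!/bin/bash
# nodes_needed TASKS CPUS_PER_TASK [CPUS_PER_NODE]
# Hypothetical helper: prints the number of nodes a layout would occupy.
nodes_needed() {
    local tasks=$1
    local cpus_per_task=$2
    local cpus_per_node=${3:-48}   # assumed: Avon compute nodes

    # A single task cannot span nodes, so it cannot use more CPUs
    # than one node provides.
    if [ "$cpus_per_task" -gt "$cpus_per_node" ]; then
        echo "error: ${cpus_per_task} CPUs per task exceeds ${cpus_per_node} CPUs per node" >&2
        return 1
    fi

    # Whole tasks that fit on one node, then round the node count up.
    local tasks_per_node=$(( cpus_per_node / cpus_per_task ))
    echo $(( (tasks + tasks_per_node - 1) / tasks_per_node ))
}

nodes_needed 48 1    # 48 single-CPU tasks fill one node: prints 1
nodes_needed 96 1    # 96 single-CPU tasks: prints 2
nodes_needed 8 12    # 8 tasks of 12 CPUs, 4 tasks fit per node: prints 2
nodes_needed 3 32    # 3 tasks of 32 CPUs, only 1 fits per node: prints 3
```

The last case shows why the count is not simply total CPUs divided by CPUs per node: 96 CPUs are requested in total, but because each 32-CPU task must sit entirely on one node, three nodes are occupied.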


## Writing SLURM Scripts

In the previous section we used a SLURM script to launch a job that ran the test on the environment we created.

As part of the course prep you should have brought a piece of software that you want to run on the HPC, preferably something that uses some form of parallelization or GPUs.

If you do not have this, we provide a few scripts that you can launch with `sbatch`. They simply check that your submission was correct and then terminate, so you can use them to practice.
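For orientation, a script of this kind might look roughly like the sketch below. This is an illustrative example only, not one of the scripts provided for the course: the job name, the resource values and the `report` helper are all assumptions, and only the `compute` partition name is taken from the text above.

```shell
#!/bin/bash
#SBATCH --job-name=check-submission
#SBATCH --partition=compute
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:01:00

# Print one line of the allocation report. "unset" means the variable
# was not provided (for example when the script is run outside SLURM).
report() {
    local label=$1
    local value=${2:-unset}
    printf '%-16s %s\n' "${label}:" "${value}"
}

# Environment variables that SLURM sets inside a running job.
report "Job ID"        "${SLURM_JOB_ID:-}"
report "Partition"     "${SLURM_JOB_PARTITION:-}"
report "Node list"     "${SLURM_JOB_NODELIST:-}"
report "Nodes"         "${SLURM_JOB_NUM_NODES:-}"
report "Tasks"         "${SLURM_NTASKS:-}"
report "CPUs per task" "${SLURM_CPUS_PER_TASK:-}"
report "Running on"    "${HOSTNAME:-}"
```

Saved under a name such as `check.slurm` (a hypothetical filename), it could be submitted with `sbatch check.slurm`. Running `squeue -u $USER` then lists your queued and running jobs, and by default the report is written to `slurm-<jobid>.out` in the directory you submitted from. Comparing the reported values with the `#SBATCH` lines is a quick way to confirm that the request was interpreted as intended.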

For this session we encourage you to look at [the documentation](https://docs.scrtp.warwick.ac.uk/hpc-pages/hpc-jobscript.html).

### Worked Examples

TODO