# HPC intro

## Jobs on an HPC system

When you log in via the terminal to an HPC system, you typically end up on a login node, i.e., a Linux system that gives you access to the HPC infrastructure.  Although you can run data processing scripts that take little time on a login node, this infrastructure is not intended for substantial computations.  Those are performed on compute nodes of the actual compute cluster.

To run an application on one or more compute nodes, you have two options:
  * a batch job,
  * an interactive job.

The vast majority of jobs are batch jobs, i.e., they run without user intervention.  These jobs are specified as a Bash script that contains some extra information on how you want to run your job.

This job script is submitted to a scheduler.  A scheduler is a software system that knows
  * the hardware characteristics (number of cores, memory, GPUs) of each node in the cluster;
  * the requirements of the jobs that have been submitted (requested number of nodes, cores, memory and run time);
  * the jobs that are currently running on the cluster.

Using this information, the scheduler can efficiently schedule jobs on the cluster and make sure your computations are executed.

In this tutorial, you will learn how to write a job script, how to submit it to the scheduler, and how to monitor and manage your jobs.

## Job scripts

A job script describes the work you want to do, and the resources that are required to do that.  Here you will learn about the typical anatomy of such a script.

### Shebang

A job script is a Bash script, so like any such script, its first line is a shebang.

```bash
#!/usr/bin/env bash
```

This line tells the scheduler that this is a job script that should be executed using Bash.

### Scheduler directives

In order for the scheduler to handle your job correctly, you will have to provide some information as scheduler directives.

#### Credit account

If your HPC center uses a credit system, you have to specify a credit account you can access.  If the name of that account is `lp_multiscale_physics`, you would specify it as follows.

```bash
#SBATCH --account=lp_multiscale_physics
```

#### Nodes and tasks

Next, you specify the resources you need using scheduler directives, i.e., lines that start with `#SBATCH`.  For example, suppose you want your job to run on a single node, with a single task that requires a single core.

```bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
```
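Inside a running job, the resources you were granted are available as environment variables, which lets the commands in your script adapt to the request.  The following is a minimal sketch, assuming you want to pass the core count on to a multithreaded program: `SLURM_CPUS_PER_TASK` is set by Slurm when `--cpus-per-task` is specified, while the variable name `n_threads` and the fallback value of 1 are illustrative choices.

```bash
# Number of cores granted per task; falls back to 1 when the variable
# is unset, e.g., when the script is tested outside of a Slurm job.
n_threads="${SLURM_CPUS_PER_TASK:-1}"
echo "running with ${n_threads} thread(s)"
```

With the fallback in place, the same script can be tried out on a login node before it is submitted.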

#### Memory requirements

You may also want to specify the memory usage of your application.  Do not underestimate it, but also keep in mind the hardware characteristics of the compute nodes.  If you request more memory than a node has, your job will not run.  For instance, if you are sure your job can run with 3 GB of RAM, you can specify that as follows.

```bash
#SBATCH --mem=3g
```
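When no unit suffix is given, Slurm interprets the value of `--mem` as megabytes, so it can help to see how a request in GB translates.  The sketch below assumes Slurm's convention of 1024 MB per GB; `gb_to_mb` is a hypothetical helper, not a Slurm command.

```bash
# Convert a memory request in GB to the equivalent number of MB
# (hypothetical helper; assumes 1024 MB per GB).
gb_to_mb() {
    echo $(( $1 * 1024 ))
}

# Print the directive equivalent to a 3 GB request, expressed in MB.
echo "#SBATCH --mem=$(gb_to_mb 3)M"
```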

#### Walltime

You also have to specify the maximum time your script should take to run.  Keep in mind that the execution of your script will be terminated once that time has elapsed, so do not underestimate it.  Time is specified as "HH:MM:SS".  For example, if your computation takes at most 2 hours, you would specify that as follows.

```bash
#SBATCH --time=02:00:00
```
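If you estimate run times in minutes, converting them to the hours, minutes and seconds notation by hand is error-prone.  A small sketch of such a conversion follows; `to_walltime` is a hypothetical helper written for this illustration, and it assumes a whole number of minutes.

```bash
# Format a duration given in whole minutes as HH:MM:SS
# (hypothetical helper, not part of Slurm).
to_walltime() {
    local minutes="$1"
    printf '%02d:%02d:00\n' $(( minutes / 60 )) $(( minutes % 60 ))
}

to_walltime 120   # 2 hours
to_walltime 90    # 1.5 hours
```

The resulting string can be used as the value of the `--time` directive.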

### Commands

Up to this point, you have only specified what the scheduler should know to handle your job, but not yet what is supposed to be computed.  Since this is a very general introduction to the job system, you will first experiment with something really simple, and after that, you can move on to domain-specific computations.

You want to compute the product of each pair of integers in the range 1 to 10.  In Bash, that can be done using the following code.

```bash
for i in $(seq 1 10)
do
    for j in $(seq 1 10)
    do
        echo $(( $i * $j ))
    done
done
```

Clearly, this is not exactly an HPC-caliber computation, but again, this is just a generic example.

### Putting it all together

You can put all of this together in a single file `jobscript.slurm`.

```bash
#!/usr/bin/env bash
#SBATCH --account=lp_multiscale_physics
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1g
#SBATCH --time=00:02:00

for i in $(seq 1 10)
do
    for j in $(seq 1 10)
    do
        echo $(( $i * $j ))
    done
done
```

Note that
  * you have to replace `lp_multiscale_physics` by an account you have access to;
  * the memory has been adjusted to 1 GB;
  * the walltime has been adjusted to 2 minutes.
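Before submitting, it can be worthwhile to check the script for Bash syntax errors and to review the directives it contains.  The following sketch assumes the file is named `jobscript.slurm` and sits in the current directory; `list_directives` is a hypothetical helper.  Note that `bash -n` only parses the Bash code; it does not validate the `#SBATCH` lines, since those are comments as far as Bash is concerned.

```bash
# Print the scheduler directives found in a job script
# (hypothetical helper, not part of Slurm).
list_directives() {
    grep '^#SBATCH' "$1"
}

# Guarded so that nothing happens when the file does not exist.
if [ -f jobscript.slurm ]; then
    # Parse the script without executing any of its commands.
    bash -n jobscript.slurm && echo "syntax OK"
    list_directives jobscript.slurm
fi
```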

## Job submission

Your job script is now ready to be submitted to the scheduler.  You can do that using `sbatch`.  You should specify the cluster you want the job to run on, `wice` in this case.

```bash
sbatch  --cluster=wice  jobscript.slurm
```

If all goes well, `sbatch` will write the job ID to standard output.  Job IDs are unique, and you can use them to monitor the status of your job, to cancel a job if necessary, or to retrieve information about it while or after it finishes.
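When you submit jobs from a script, it is convenient to capture the job ID in a variable.  The sketch below relies on the `--parsable` option of `sbatch`, which prints only the job ID, followed by a semicolon and the cluster name when a cluster is specified.  `extract_job_id` is a hypothetical helper, and the string `12345;wice` is made-up sample output so that the example also runs without a scheduler.

```bash
# Strip the optional ";cluster" suffix from `sbatch --parsable` output
# (hypothetical helper).
extract_job_id() {
    echo "${1%%;*}"
}

# On a cluster you would capture the real output instead, e.g.:
#   output="$(sbatch --parsable --cluster=wice jobscript.slurm)"
output='12345;wice'   # made-up sample output for illustration
job_id="$(extract_job_id "$output")"
echo "submitted job ${job_id}"
```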

In this case, all the information required by Slurm, the scheduler, is present in the job script as Slurm directives (`#SBATCH`).  However, if that is not the case, or if you want different values, you can add those as command line options when you invoke `sbatch`.  For instance, to give your job a name, say "hello from R", you can use the `--job-name` option.

```bash
sbatch  --cluster=wice  --job-name 'hello from R'  jobscript.slurm
```

## Job status

Of course, you would like to keep an eye on your job(s).  You want to know whether they are still waiting to start, are running, or are completed.  The command to get this information is `squeue`.  Note that you have to specify the cluster you want to check.

```bash
squeue  --cluster=wice
```

You will get a list of jobs that are
  * pending (status `PD`): these jobs are not yet running, the scheduler will start them when resources are available;
  * running (status `R`): these jobs are executing.

If you have submitted a job and it no longer appears in this list, it has finished.
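`squeue` reports job states as compact codes, such as `PD` for pending and `R` for running.  If you process its output in a script, a small lookup can make the codes easier to read.  This is only a sketch: `describe_state` is a hypothetical helper, and it covers just a few of the codes Slurm defines.

```bash
# Translate a compact squeue state code into a short description
# (hypothetical helper covering only a few common codes).
describe_state() {
    case "$1" in
        PD) echo "pending" ;;
        R)  echo "running" ;;
        CG) echo "completing" ;;
        *)  echo "other ($1)" ;;
    esac
}

describe_state PD
describe_state R
```

To restrict the list to your own jobs, `squeue` accepts the `--user` option, e.g., `squeue --cluster=wice --user="$USER"`.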

## Summary

## Where to go from here?