# Introduction

Deneb is one of several SCITAS clusters at EPFL.

See the documentation at https://scitas-data.epfl.ch/kb.

# Shared storage

## Cluster

### /scratch
* high-performance **temporary space**
* not backed up
* local to each cluster
* for disposable files: intermediate results, temporary files
* files older than 2 weeks are deleted automatically (earlier if occupancy is high)

## Global

### /home

* 100 GB per user
* backed up to a remote site
* available on all clusters
* for important files: source code, final results

### /work

* per group quotas
* 50 GB free of charge
* ...

# Commands

Connect:

    ssh <username>@deneb1.epfl.ch
    
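Basic commands:

    id                        # show your user name, UID and groups
    pwd                       # print the current directory
    ls /scratch/<username>    # list your scratch directory
    cd /scratch/<username>    # move into it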

## Batch vs Interactive

* Interactive: open the software and work with it directly.
* Batch: script the work to be done and put it in a queue.
* Scheduler: decides when and where to run each job (depending on the requested resources and priority).

### sbatch

`sbatch` is the fundamental command to submit jobs. Workflow:

1. create a job-script
1. submit it to the batch system
1. *gets executed at some point*
1. look at the output
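
The four steps above can be sketched in bash. This is a minimal example under stated assumptions: `hello.sh` is a hypothetical file name, the resource values are placeholders, and the submission step only runs on a machine where `sbatch` is installed.

```shell
#!/bin/bash
# 1. Create a job script (hello.sh is a hypothetical file name).
#    The quoted 'EOF' keeps $(hostname) unexpanded until the job runs.
cat > hello.sh <<'EOF'
#!/bin/bash
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --time 00:05:00
echo "hello from $(hostname)"
EOF

# 2. Submit it to the batch system, but only if SLURM is installed here.
#    --parsable makes sbatch print just the job ID instead of a sentence.
if command -v sbatch >/dev/null 2>&1; then
  jobid=$(sbatch --parsable hello.sh)
  # 3. The job runs whenever the scheduler finds the requested resources.
  # 4. By default its output goes to slurm-<JOB_ID>.out in the job's
  #    working directory (the submission directory unless changed).
  echo "Submitted job ${jobid}; output will appear in slurm-${jobid}.out"
fi
```

Without `--parsable`, `sbatch` prints a line such as `Submitted batch job <JOB_ID>`; the parsable form is simply easier to capture in a variable.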

### Interactive access

* `salloc`: standard SLURM tool for interactive allocations, including multi-node jobs
* `Sinteract`: custom tool to access a node

Behind the scenes, both use the same mechanism as `sbatch` to obtain resources.
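
As a sketch of `salloc` for a multi-node interactive session: the node count, task count and time limit below are placeholder values, and on the SCITAS clusters you may also need `--account` and `--partition`, as in the exercises.

```shell
#!/bin/bash
# Placeholder request: 2 nodes, 4 tasks in total, 30 minutes.
alloc_args=(--nodes 2 --ntasks 4 --time 00:30:00)

if command -v salloc >/dev/null 2>&1; then
  # salloc waits for the allocation, then runs the given command inside it;
  # srun starts one "hostname" per task on the allocated nodes.
  salloc "${alloc_args[@]}" srun hostname
else
  echo "salloc not found; would run: salloc ${alloc_args[*]} srun hostname"
fi
```

Without a trailing command, `salloc` instead opens a shell with the allocation active, from which you can call `srun` repeatedly; `exit` releases the resources.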

## Cancelling jobs

* Specific job: `scancel <JOB_ID>`
* All jobs: `scancel -u <username>`
* All jobs that are not yet running: `scancel -u <username> -t PENDING`
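
Before cancelling everything that is pending, it can help to see which jobs are affected. A minimal sketch (the function name `cancel_pending` is hypothetical) that lists your pending job IDs with `squeue` and then runs the same `scancel` command as above:

```shell
#!/bin/bash
# List, then cancel, all of the current user's PENDING jobs.
cancel_pending() {
  if ! command -v squeue >/dev/null 2>&1; then
    echo "SLURM tools not available on this machine" >&2
    return 1
  fi
  local ids
  # -h: no header line; -o "%i": print only the job ID column
  ids=$(squeue -u "$USER" -t PENDING -h -o "%i")
  if [ -z "$ids" ]; then
    echo "No pending jobs for $USER"
    return 0
  fi
  echo "Cancelling these pending jobs:"
  echo "$ids"
  # Filtering by state again means a job that started meanwhile is kept.
  scancel -u "$USER" -t PENDING
}
```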



## List queues

`squeue` without arguments lists all jobs currently in the queue, from all users. `Squeue` is a custom wrapper that shows only your own jobs, with extra useful information.


## S tools

* `Sinteract`: custom tool to access a node
* `Sshare`: show fairshare information
* `Squeue`: show your pending and running jobs
* `Sjob`: show information about a job


# Partitions

* debug: `--partition=debug` or `Sinteract -p debug`
* build: `Sinteract -p build`

# SLURM directives

`#SBATCH --something` lines at the top of a job script tell SLURM which resources the job needs.

* Number of nodes per job: `--nodes XX`. Default is 1.
* Number of MPI tasks per job: `--ntasks XX`. Default is 1.
* Number of CPUs per task (multithreaded apps): `--cpus-per-task XX`. Default is 1. Cannot be greater than the number of cores in a compute node.
* Memory per node: `--mem 120G`, `--mem 4096M`. Default is 4096 MB per CPU.
* Wall-clock time limit: `--time 2-23` (2 days and 23 hours), `--time 06:00:00` (6 hours). Default is 15 min.
* Partition to send the job to: `--partition debug`, `--partition serial`. Default is `parallel` on Fidis; on Deneb it varies.
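
To see these directives together, here is a sketch that writes a job-script header using them. Every value is an illustrative placeholder (in particular, check the time limit of the partition you choose), and `template.sh` is a hypothetical file name.

```shell
#!/bin/bash
# Write a job script whose header uses the directives listed above.
# Every value is an illustrative placeholder, not a recommendation.
cat > template.sh <<'EOF'
#!/bin/bash
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 4
#SBATCH --mem 16G
#SBATCH --time 00:30:00
#SBATCH --partition debug
# SLURM exports the granted values as environment variables inside the job.
echo "tasks: ${SLURM_NTASKS}, CPUs per task: ${SLURM_CPUS_PER_TASK}"
EOF

# Check the generated script for bash syntax errors without running it.
bash -n template.sh && echo "template.sh: syntax OK"
```

Submit it with `sbatch template.sh`; an option given on the command line (e.g. `sbatch --time 01:00:00 template.sh`) overrides the matching `#SBATCH` line in the file.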

# Modules

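Packages are organized hierarchically (compiler / MPI / BLAS): modules built with a given compiler only become visible once that compiler is loaded.

    module av                    # list the modules you can load right now
    module load <module-name>
    module unload <module-name>
    module spider <name>         # search all modules, including ones not loadable yet
    module purge                 # unload all modules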
    
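Python:

    module load intel
    module load python           # python built with the intel compiler
    module list
    module load python/2.7.14    # select a specific version
    module load gcc              # switch the compiler
    module list                  # compare with the previous list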
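
Because of the hierarchy, an MPI library can only be loaded after a compiler. A minimal sketch for the top of a job script, assuming an MPI module named `mvapich2` exists (a hypothetical choice here; `module spider mvapich2` shows the real names and which compiler they require). `load_toolchain` is likewise a hypothetical helper name.

```shell
#!/bin/bash
# Load a compiler first, then an MPI library built with it.
# "mvapich2" is an assumed module name; check it with: module spider mvapich2
load_toolchain() {
  # module is usually a shell function set up by the environment, not a binary
  if ! type module >/dev/null 2>&1; then
    echo "module command not available in this shell" >&2
    return 1
  fi
  module purge            # start from a clean environment
  module load gcc         # level 1: compiler
  module load mvapich2    # level 2: MPI, visible only once a compiler is loaded
  module list             # show what ended up loaded
}

load_toolchain || echo "toolchain not loaded"
```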

# Jupyter

The key is to install the tools in a Python virtual environment.

    module load gcc
    module load python
    
Create a virtual environment:

    virtualenv --system-site-packages opt/$SYS_TYPE/venv-gcc
    
Activate it:

    source opt/$SYS_TYPE/venv-gcc/bin/activate

Install Jupyter and ipyparallel:

    pip install jupyter ipyparallel
    
Run IPython on a compute node:

    Sinteract
    source opt/$SYS_TYPE/venv-gcc/bin/activate
    ipython

Running Jupyter is more complicated: the notebook server runs on the login node, while the computations run on compute nodes, which is why we installed ipyparallel. After activating the virtual environment, use `ipcluster` to start worker engines on the compute nodes through SLURM:

    # First find the name of your account
    sacctmgr -Pn show assoc where user=$USER format=account
    
    # Then start the engines in the background, replacing the <...> placeholders
    ipcluster start --init --profile=default --ip="*" -n=<ntasks> \
        --engines=Slurm \
        --SlurmEngineSetLauncher.timelimit=<timelimit> \
        --SlurmEngineSetLauncher.queue=<partition> \
        --SlurmEngineSetLauncher.account=<account> &

Then run the Jupyter notebook server on the login node:

    jupyter notebook --ip="$(hostname -s).epfl.ch"
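
If you belong to several accounts, the `sacctmgr` query prints one per line. A small sketch that keeps only the first one so it can fill the `<account>` placeholder; `get_account` is a hypothetical helper name, and taking the first line is an arbitrary choice.

```shell
#!/bin/bash
# Print the first SLURM account associated with the current user.
# -P: '|'-separated output without a trailing '|'; -n: no header line.
get_account() {
  if ! command -v sacctmgr >/dev/null 2>&1; then
    echo "sacctmgr not found" >&2
    return 1
  fi
  sacctmgr -Pn show assoc where user="$USER" format=account | head -n 1
}

account=$(get_account) && echo "Using account: ${account}"
```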



# Exercises

Copy the examples into your working directory:

    cp -r /scratch/examples/using-the-clusters /scratch/<username>/
    
## Exercise 1

Open `ex1.sh` with an editor. 
```
#!/bin/bash
#SBATCH --workdir /scratch/<put-your-username-here>
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --mem 1G
#SBATCH --account <put-your-account-here>
#SBATCH --reservation using
sleep 10
echo "hello from $(hostname)"
sleep 10
```

Submit the job with `sbatch ex1.sh` and **note the job ID** it prints. Then look at the output:

    cat /scratch/<username>/slurm-<JOB_ID>.out

## Exercise 2

Run the Linpack benchmark on a single node, using 28 CPUs and 120 GB of memory:

```
#!/bin/bash
#SBATCH --workdir /scratch/<username>
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 28
#SBATCH --mem 120G
#SBATCH --time 00:30:00
#SBATCH --account <your account>
#SBATCH --reservation intro2clusters
cd /scratch/examples/linpack/
./runme_xeon64
```


## Exercise 3

Use module files in a job script to run MATLAB:

```
#!/bin/bash
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 4
#SBATCH --nodes 1
#SBATCH --mem 16G
#SBATCH --time 00:15:00
#SBATCH --account <your account>
#SBATCH --reservation using
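echo STARTING AT $(date)
module purge          # start from a clean environment
module load matlab
# run mymfile.m from the current directory, then quit MATLAB
matlab -nodesktop -nojvm -r "mymfile; exit"
echo FINISHED AT $(date)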
```