## Slurm Tutorial

- [Getting cluster information using _sinfo_](#Getting-cluster-information-using-sinfo)
- [Submitting Jobs using _srun_ and _salloc_](#Submitting-Jobs-using-srun-and-salloc)
- [Monitoring Jobs using _squeue_](#Monitoring-Jobs-using-squeue)
- [Cancelling Jobs using _scancel_](#Cancelling-Jobs-using-scancel)
 
More information on Slurm is available in the [official documentation](https://slurm.schedmd.com/documentation.html).

## Getting cluster information using sinfo

## sinfo help

In [1]:
!sinfo --help

Usage: sinfo [OPTIONS]
  -a, --all                  show all partitions (including hidden and those
                             not accessible)
  -b, --bg                   show bgblocks (on Blue Gene systems)
  -d, --dead                 show only non-responding nodes
  -e, --exact                group nodes only on exact match of configuration
      --federation           Report federated information if a member of one
  -h, --noheader             no headers on output
  --hide                     do not show hidden or non-accessible partitions
  -i, --iterate=seconds      specify an iteration period
      --local                show only local cluster in a federation.
                             Overrides --federation.
  -l, --long                 long output - displays more information
  -M, --clusters=names       clusters to issue commands to. Implies --local.
                             NOTE: SlurmDBD must be up.
  -n, --nodes=NODES          report on specific node(s)
  --noconvert                d

In [2]:
!sinfo -a

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      2   idle testnewvmzbl[000001,000003]
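
The whitespace-aligned table that `sinfo` prints can be turned into Python records for further processing. The sketch below is one minimal way to do that; `parse_sinfo` is a hypothetical helper name, and the sample text is copied from the `sinfo -a` output above rather than read from a live cluster.

```python
def parse_sinfo(text):
    """Turn whitespace-aligned sinfo output into a list of dicts keyed by header."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # NODELIST is the last column, so cap the number of splits
        # at len(header) - 1 to keep it in one piece.
        fields = line.split(None, len(header) - 1)
        rows.append(dict(zip(header, fields)))
    return rows


# Sample copied from the `sinfo -a` output above.
sample = """\
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      2   idle testnewvmzbl[000001,000003]
"""

partitions = parse_sinfo(sample)
for p in partitions:
    # sinfo marks the default partition with a trailing "*".
    name = p["PARTITION"].rstrip("*")
    print(name, p["STATE"], p["NODES"], p["NODELIST"])
```

On a real cluster the text could come from `subprocess.run(["sinfo", "-a"], capture_output=True, text=True).stdout` instead of the hard-coded sample.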


## Getting socket, core, thread, and memory information for each compute node

![mc_support](images/mc_support.gif)

Reference: [Support for Multi-core/Multi-thread Architectures](https://slurm.schedmd.com/mc_support.html)

In [3]:
!sinfo -o %X

SOCKETS
1


In [4]:
!sinfo -o %Y

CORES
2


In [5]:
!sinfo -o %Z

THREADS
1


In [6]:
!sinfo -o %e

FREE_MEM
6891-6893
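
The four queries above can be combined into a single `sinfo` call with several format fields, and the result parsed per node. The following is a sketch under assumptions: the command uses `-N` (one line per node), `-h` (no header) and `%N` (node name) alongside `%X`, `%Y`, `%Z` and `%e`; `parse_node_topology` is a hypothetical helper, and the node names and per-node memory values in the sample are illustrative, chosen to match the counts and the `6891-6893` range shown above.

```python
# Assumed command: one line per node, no header, "|" between fields.
SINFO_CMD = ["sinfo", "-N", "-h", "-o", "%N|%X|%Y|%Z|%e"]


def parse_node_topology(text):
    """Map node name -> sockets, cores, threads, free memory, logical CPUs."""
    nodes = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, sockets, cores, threads, free_mem = line.strip().split("|")
        s, c, t = int(sockets), int(cores), int(threads)
        nodes[name] = {
            "sockets": s,
            "cores_per_socket": c,
            "threads_per_core": t,
            # Free memory may be reported as a non-numeric value such as "N/A".
            "free_mem_mb": int(free_mem) if free_mem.isdigit() else None,
            "logical_cpus": s * c * t,
        }
    return nodes


# Illustrative sample, not captured from a real cluster.
sample = """\
testnewvmzbl000001|1|2|1|6891
testnewvmzbl000003|1|2|1|6893
"""

nodes = parse_node_topology(sample)
for name, info in nodes.items():
    print(name, info["logical_cpus"], info["free_mem_mb"])
```

Multiplying sockets, cores per socket and threads per core gives the number of logical CPUs on a node, which is the quantity that task counts are scheduled against.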


In [46]:
# %all prints every available field, separated by "|"
!sinfo -o %all > sinfo_all.txt

In [45]:
import pandas as pd
%matplotlib inline

In [48]:
# Load the "|"-separated sinfo output into a DataFrame
df = pd.read_table("sinfo_all.txt", sep="|")
df

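| Unnamed: 0 | AVAIL | ACTIVE_FEATURES | CPUS | TMP_DISK | FREE_MEM | AVAIL_FEATURES | GROUPS | OVERSUBSCRIBE | TIMELIMIT | MEMORY | ... | CPU_LOAD | PARTITION | PARTITION .1 | ALLOCNODES | STATE | USER | CLUSTER | SOCKETS | CORES | THREADS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | up | (null) | 4 | 49569 | 15082 | (null) | all | NO | infinite | 16046 | ... | 0.0 | compute* | compute | all | idle | root(0) |  | 4 | 1 | 1 |
| 1 | up | (null) | 4 | 49569 | 15081 | (null) | all | NO | infinite | 16046 | ... | 0.0 | compute* | compute | all | idle | root(0) |  | 4 | 1 | 1 |
| 2 | up | (null) | 4 | 49569 | 15081 | (null) | all | NO | infinite | 16046 | ... | 0.0 | compute* | compute | all | idle | root(0) |  | 4 | 1 | 1 |
| 3 | up | (null) | 4 | 49569 | 15082 | (null) | all | NO | infinite | 16046 | ... | 0.0 | compute* | compute | all | idle | root(0) |  | 4 | 1 | 1 |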
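
Column names such as `PARTITION .1` in the table above suggest that the header fields carry padding and that one name appears twice. As an alternative to pandas, the sketch below reads the same `|`-separated text with the standard `csv` module and strips the padding. `read_sinfo_all` is a hypothetical helper, and the sample is an illustrative subset of columns, not the real `%all` header.

```python
import csv
import io


def read_sinfo_all(stream):
    """Read "|"-separated sinfo output into a list of dicts with stripped keys."""
    reader = csv.reader(stream, delimiter="|")
    header = [name.strip() for name in next(reader)]
    rows = []
    for values in reader:
        if not values:
            continue
        # If a column name occurs twice (as PARTITION does above),
        # the later column overwrites the earlier one in the dict.
        rows.append({key: value.strip() for key, value in zip(header, values)})
    return rows


# Illustrative subset of columns; padding mimics aligned sinfo output.
sample = """\
AVAIL|CPUS|FREE_MEM  |PARTITION |PARTITION |STATE
up|4|15082|compute*|compute|idle
up|4|15081|compute*|compute|idle
"""

rows = read_sinfo_all(io.StringIO(sample))
for row in rows:
    print(row["PARTITION"], row["STATE"], row["CPUS"], row["FREE_MEM"])
```

To read the file written earlier, pass an open file handle instead of the `StringIO` object, for example `with open("sinfo_all.txt") as f: rows = read_sinfo_all(f)`.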


## Submitting Jobs using srun and salloc

![submit_jobs](images/submit_jobs.png)

In [51]:
# Run hostname as 2 tasks across 2 nodes; each task prints one line
!srun -N 2 -n 2 hostname

ip-172-31-37-10
ip-172-31-46-20


In [60]:
# Allocate 2 nodes for 2 tasks, run hostname once in that allocation, then release it
!salloc -N 2 -n 2 hostname

salloc: Granted job allocation 9
ip-172-31-33-28
salloc: Relinquishing job allocation 9
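
The `salloc` output above contains a single hostname because `salloc` runs the given command once, whereas `srun` launches one copy per task. Prefixing the command with `srun` inside the allocation launches it on every allocated task. The sketch below only builds the argument lists; `srun_cmd` and `salloc_cmd` are hypothetical helper names, and nothing is submitted to a cluster.

```python
import shlex


def srun_cmd(nodes, tasks, command):
    """Build an srun argument list for `tasks` tasks on `nodes` nodes."""
    return ["srun", "-N", str(nodes), "-n", str(tasks), *command]


def salloc_cmd(nodes, tasks, command, launch_with_srun=True):
    """Build an salloc argument list.

    With launch_with_srun=True the command is wrapped in srun, so it runs
    once per task inside the allocation instead of only once.
    """
    inner = ["srun", *command] if launch_with_srun else list(command)
    return ["salloc", "-N", str(nodes), "-n", str(tasks), *inner]


print(shlex.join(srun_cmd(2, 2, ["hostname"])))
print(shlex.join(salloc_cmd(2, 2, ["hostname"])))
print(shlex.join(salloc_cmd(2, 2, ["hostname"], launch_with_srun=False)))
```

The returned lists can be passed to `subprocess.run(...)` on a machine where the Slurm client commands are installed.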


## Monitoring Jobs using squeue

In [50]:
!squeue

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
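
The queue above is empty, so only the header is printed. When jobs are present, an explicit `squeue` format string makes the output easier to parse than the aligned default. The sketch below assumes the command `squeue -h -o "%i|%P|%j|%u|%T|%M|%D|%R"`; `parse_squeue` and `jobs_in_state` are hypothetical helpers, and the sample jobs, user and job names are invented for illustration.

```python
# Assumed format: job ID, partition, name, user, state, time used,
# node count, and node list (or pending reason), separated by "|".
SQUEUE_CMD = ["squeue", "-h", "-o", "%i|%P|%j|%u|%T|%M|%D|%R"]
FIELDS = ["job_id", "partition", "name", "user", "state",
          "time", "nodes", "nodelist_or_reason"]


def parse_squeue(text):
    """Parse "|"-separated squeue output into a list of dicts."""
    jobs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        values = line.strip().split("|", len(FIELDS) - 1)
        jobs.append(dict(zip(FIELDS, values)))
    return jobs


def jobs_in_state(jobs, state):
    """Return the IDs of all jobs whose state equals `state`."""
    return [job["job_id"] for job in jobs if job["state"] == state]


# Invented sample; not captured from a real cluster.
sample = """\
12|debug|train|alice|RUNNING|0:42|2|testnewvmzbl[000001,000003]
13|debug|eval|alice|PENDING|0:00|1|(Resources)
"""

jobs = parse_squeue(sample)
print(jobs_in_state(jobs, "RUNNING"))
print(jobs_in_state(jobs, "PENDING"))
```

This simple split assumes that job names do not themselves contain a `|` character.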


## Cancelling Jobs using scancel

In [None]:
!scancel --help
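
Beyond the help text, typical uses of `scancel` are cancelling specific job IDs or filtering by user and job state. The sketch below builds such command lines without running them; `scancel_cmd` is a hypothetical helper, and the job IDs and user name are invented for illustration.

```python
def scancel_cmd(job_ids=(), user=None, state=None):
    """Build an scancel argument list from job IDs and optional filters."""
    cmd = ["scancel"]
    if user:
        cmd.append(f"--user={user}")
    if state:
        cmd.append(f"--state={state}")
    cmd.extend(str(job_id) for job_id in job_ids)
    if len(cmd) == 1:
        # Guard against building a command with nothing to select.
        raise ValueError("give at least one job ID or filter")
    return cmd


print(" ".join(scancel_cmd(job_ids=[12, 13])))
print(" ".join(scancel_cmd(user="alice", state="PENDING")))
```

Requiring at least one job ID or filter is a design choice of this sketch, intended to make accidental broad cancellations harder to write.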