
Hyak: Executing job via sbatch


sbatch is the main execution command for the slurm scheduler. It spools up an execute node for long-term or compute-intensive tasks such as assemblies, BLAST runs, or other things of that nature.

sbatch can be run from a login node with the command sbatch -p srlab -A srlab shell.script
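For example, a submission and a quick check that it was accepted might look like the following; the script name is the placeholder from above, and the job ID shown in the comment is illustrative only:

sbatch -p srlab -A srlab shell.script
# slurm replies with something like: Submitted batch job 123456
squeue -u $USER    # list your pending and running jobs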

sbatch requires a shell script to function, with two main parts: the header and the execute portion.

The Header

#!/bin/bash
## Job Name
#SBATCH --job-name=myjob
## Allocation Definition 
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes (We only get 1, so this is fixed)
#SBATCH --nodes=1   
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=dd-hh:mm:ss
## Memory per node
#SBATCH --mem=XXXG
## Specify the working directory for this job
#SBATCH --workdir=/gscratch/srlab/xxx/yyy

The placeholder values (job name, wall time, memory, and working directory) must be changed prior to execution.

  • job-name=myjob is just an identifier for the system. It's what shows up in scontrol and squeue calls (see the sketch after this list).

  • --time=dd-hh:mm:ss is the "wall" time, or how long we are reserving the node for our use. This argument requires some consideration and knowledge of the program you're running prior to execution. Selecting too little wall time will cause the scheduler to kill your process mid-run when time runs out, and it is nearly impossible to add time mid-execution. Selecting too much time limits others' ability to use Hyak, but since the scheduler usually releases a node upon program completion, this is a secondary consideration.

  • --mem=XXXG specifies how much memory to allocate to the process. This helps slurm decide which node to give you, since the nodes have different amounts of memory. We are allowed up to 512 GB of RAM, but to be neighborly, it's good to select a number closer to your actual requirement.

  • --workdir=/gscratch/srlab/xxx/yyy indicates the working directory for the job: output will be written there, and relative paths in the script are resolved against it. Ideally this will be somewhere under /gscratch/srlab/, but there's no requirement for this.
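Once a job is submitted, the values set in the header can be confirmed with squeue and scontrol. A minimal sketch, assuming a job ID of 123456 (the ID and the format string are illustrative, not from the example below):

squeue -u $USER -o "%.10i %.20j %.10M %.10l %.6D %.8m"    # job ID, name, run time, time limit, nodes, memory
scontrol show job 123456 | grep -E "JobName|TimeLimit|MinMemory|WorkDir"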

The Execute Portion

This section contains the commands you want executed. You can treat it like the command line: commands are executed sequentially, in the order they appear. These can include program calls, module loading, making directories, etc.
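A minimal sketch of an execute portion is shown below; the output directory and program path follow the xxx/yyy placeholder convention above and are not real srlab locations (the full script in the next section shows actual values):

module load anaconda2_4.3.1                # load any modules your program needs
mkdir -p /gscratch/srlab/xxx/yyy/output    # create any directories the job expects
/gscratch/srlab/xxx/programs/my_program -t 28 -o /gscratch/srlab/xxx/yyy/output    # hypothetical program call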

Full Example Script

[seanb80@n2193 Oly_Plat_Illu2]$ cat Plat_Illu_Run2.sh 
#!/bin/bash
## Job Name
#SBATCH --job-name=Oly_Platanus_Illu
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (720 hours)
#SBATCH --time=720:00:00
## Memory per node
#SBATCH --mem=500G
## Specify the working directory for this job
#SBATCH --workdir=/gscratch/srlab/data/Oly_Plat_Illu2/

module load anaconda2_4.3.1

mkdir -p /scr/srlab/seanb80/plat_illu_tmp

/gscratch/srlab/programs/platanus_1.2.4/platanus assemble -f /gscratch/srlab/data/OlyData/Illumina/trimmed/*.fq.fq -t 28 -k 20 -u 0.2 -o Oly_Out_ -m 500

/gscratch/srlab/programs/redundans/redundans.py -t 28 -v \
  -l /gscratch/srlab/data/OlyData/PacBio/170210_PCB-CC_MS_EEE_20kb_P6v2_D01_1_filtered_subreads.fastq \
     /gscratch/srlab/data/OlyData/PacBio/170228_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fastq \
     /gscratch/srlab/data/OlyData/PacBio/170228_PCB-CC_AL_20kb_P6v2_D01_1_filtered_subreads.fastq \
     /gscratch/srlab/data/OlyData/PacBio/170228_PCB-CC_AL_20kb_P6v2_E01_1_filtered_subreads.fastq \
     /gscratch/srlab/data/OlyData/PacBio/170307_PCB-CC_AL_20kb_P6v2_C01_1_filtered_subreads.fastq \
     /gscratch/srlab/data/OlyData/PacBio/170307_PCB-CC_AL_20kb_P6v2_C02_1_filtered_subreads.fastq \
     /gscratch/srlab/data/OlyData/PacBio/170314_PCB-CC_20kb_P6v2_A01_1_filtered_subreads.fastq \
     /gscratch/srlab/data/OlyData/PacBio/170314_PCB-CC_20kb_P6v2_A02_1_filtered_subreads.fastq \
     /gscratch/srlab/data/OlyData/PacBio/170314_PCB-CC_20kb_P6v2_A03_1_filtered_subreads.fastq \
     /gscratch/srlab/data/OlyData/PacBio/170314_PCB-CC_20kb_P6v2_A04_1_filtered_subreads.fastq \
  -f /gscratch/srlab/data/Oly_Plat_Illu/Oly_Out__contig.fa \
  -o /gscratch/srlab/data/Oly_Redundans_Run2
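This script would then be submitted from a login node as described above, e.g. sbatch -p srlab -A srlab Plat_Illu_Run2.sh, with the partition and account supplied on the command line since they are not set in this script's header.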