*Press `Space` to proceed.*

Before we begin, let's take a second to familiarize you with this environment. When you're ready, press `Space`.


Pressing `Space` will always move you one frame *forward*.

*(See what I mean?)*


`Shift`+`Space` will move you one frame *backward* (give it a try).


Pressing `Enter` while a code-cell is highlighted puts the cell into *edit-mode.* In this mode you can modify the contents, allowing you to do simple tasks.

In [None]:
# TASK: put something inside the quotes.
message=""

# use SHIFT+ENTER to exit and
#   don't forget to press SPACE.

Pressing `Shift`+`Enter` while editing or highlighting a code-cell causes the contents of the cell to be executed and the output displayed.

In [None]:
# TASK: execute this cell (also don't forget about SPACE.)
echo "IMPORTANT: ${message}!"

Excellent! Let's get to the real thing!


# SLURM: Jobs & Batch Scripts

## What Is A Job?

Fundamentally, a **job** consists of two things: an allotment of resources and snippets of code to execute using that allotment.

*This means that when we request that SLURM perform some task (say, executing a snippet of code) we are also requesting a resource allotment which will be reserved for the execution of that task.*

## How Do We Request Resources & Perform Tasks?

There are a couple of ways we can request a resource allotment and perform tasks.  

We might use the command `salloc` to request an allotment (by providing arguments on the command-line) and then interactively run code using the resources once they become available.

Alternatively, if we want the task to be performed without needing us to interact with it, we can use the command `sbatch`.

Learning the basics of the `sbatch` command is the purpose of this tutorial.

### The Command `sbatch`


When provided the path of a shell-script, the `sbatch` command searches the script for lines beginning with `#SBATCH`. Any line starting with `#SBATCH` occurring in the preamble (*before any commands*) is treated as meta-data specifying some parameters for a job we want to create.

If these parameters describe a *feasible* job, that job is created and queued until an appropriate resource allotment (matching the provided parameters) can be made.

Once an appropriate allotment becomes available, the provided shell-script is executed using the allotted resources.

---
Any shell-script containing meta-data that `sbatch` can understand *(say, describing a resource request or providing other ambient details)* is called a **batch-script** (or job-script).

---
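
To make the preamble rule concrete, here is a rough sketch in plain bash that mimics the scan described above: it prints `#SBATCH` lines until it reaches the first line that is neither blank nor a comment. The function name `list_directives` and the throwaway script are made up for illustration, and the real parser inside `sbatch` is certainly more involved.

```bash
#!/bin/bash
# Hypothetical helper (not part of SLURM): print the #SBATCH lines that
# appear before the first command of a script.
list_directives() {
    local line
    while IFS= read -r line; do
        case "$line" in
            '#SBATCH'*)     printf '%s\n' "$line" ;;  # directive in the preamble
            '#'*)           ;;                        # any other comment line
            *[![:space:]]*) break ;;                  # first command ends the preamble
        esac
    done < "$1"
}

# A throwaway script used only for this demonstration.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/bash
#SBATCH --job-name "Demo"
#SBATCH --time 0:05:00

echo "hello"
#SBATCH --mem-per-cpu 1024
EOF

list_directives "$demo"
```

Only the first two directives are printed; the `--mem-per-cpu` line comes after a command, so the sketch (like the rule it imitates) skips it.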

### **Example 1**

#### Script: `example-one.sh`

In [None]:
%%file example-one.sh
#!/bin/bash

### Job Parameters:  
#SBATCH --job-name "My Letter"     # job's name
#SBATCH --output   "letter.%j.log" # output log (%j is replaced with job's id)
#SBATCH --comment  "A humble job." # comment about job

### Script To Execute:
amount=$((15 + RANDOM % 16)) # number between 15 and 30 (inclusive)
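# NOTE: these variables may be unset when the matching options were not
#       requested; the fallbacks (0 MB, 1 cpu) are assumptions for that case.
memory=$((${SLURM_MEM_PER_CPU:-0} * ${SLURM_CPUS_PER_TASK:-1} * 1024 * 1024))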
draft=(" Dear ${USER},\n\n"
       "  I hope this letter finds you well; it is a glorious $(date +%A)\n"
       "in $(date +%B) and I find myself executing on $(hostname -s) with\n"
       "access to ${memory} bytes of memory and $(nproc) cpu(s). Knowing that\n"
       "these resources are shared, I have chosen to surrender ${amount}\n"
       "seconds of my allotted time to processes less fortunate than I.\n\n"
       "Cheers,\n"
       "  Job ${SLURM_JOB_ID}\n")

# write letter & take a nap.
echo -e "${draft[@]}" && sleep ${amount} 

Use `Shift`+`Enter` to create the script `example-one.sh`.

#### Submitting The Script & Getting Results

To submit the job to SLURM, we use the `sbatch` command:

In [None]:
# execute this cell using SHIFT+ENTER
sbatch example-one.sh 

Once a job has been successfully submitted, it will appear in the SLURM queue until it completes. To view jobs currently in the queue, we use the command `squeue`:

In [None]:
# execute this cell using SHIFT+ENTER
squeue

Let's finish this example by taking a look at the output from the job.

In [None]:
# TASK:
#    Use the "Job Parameters" section of `example-one.sh` and 
#  common sense to determine the filename of the output log. Put this
#  filename in the quotes below and run the cell.

cat ""
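
For scripts that need to locate a log file without a human reading the queue, the `%j` substitution can be imitated in bash. The sketch below is only an illustration: `expand_log_pattern`, the pattern `result.%j.out`, and the job id `12345` are all made up, and only `%j` is handled even though `sbatch` understands more placeholders.

```bash
#!/bin/bash
# Hypothetical helper: substitute a job id for every %j in an --output pattern.
expand_log_pattern() {
    local pattern=$1 jobid=$2
    printf '%s\n' "${pattern//\%j/$jobid}"
}

# 12345 is a made-up job id used only for this demonstration.
expand_log_pattern "result.%j.out" 12345
```

In practice the job id could come from `sbatch --parsable`, which prints the bare id (followed by `;` and a cluster name on multi-cluster setups) instead of the usual confirmation message.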

(pressing `SHIFT`+`SPACE` allows you to move in reverse, and `SPACE` lets you move forward.)

### **Example 2**

#### Script: `example-two.sh`

In [None]:
%%file "example-two.sh"
#!/bin/bash
### Job Parameters:
# Generic job info
#SBATCH --job-name  "Arguments" # display name
#SBATCH --output    "out.log"   # where to log terminal output 
#SBATCH --error     "err.log"   #  .. and error messages
#SBATCH --open-mode truncate    # always overwrite log files 

# Resources to request
#SBATCH --ntasks 1          # number of tasks we can perform (in parallel)
#SBATCH --cpus-per-task 1   # num. cpus each task will require
#SBATCH --mem-per-cpu 1024  # memory required per cpu (in megabytes)
#SBATCH --time 1:00:00      # time the job should be allowed to run (HH:MM:SS) 

### Script To Execute:

# loops through provided arguments.
for item in "$@"; do
    ## prints the length of the argument (as a string)
    echo "The argument '${item}' is ${#item} characters long."
    
    ## and then does something random...
    amount=$((RANDOM % 10)) # choose `amount` to sleep
    sleep 0.${amount}       # sleep for `amount/10` seconds
    echo "Slept for 0.${amount} seconds."
done

Use `Shift`+`Enter` to create the script `example-two.sh`.

#### Submitting The Script & Getting Results

The cool thing about job-scripts is they are still shell-scripts. This means we can execute them without needing to go through SLURM, so the normal testing & development workflow is available.

In [None]:
# let's test the script (execute this cell using SHIFT+ENTER)
bash example-two.sh "first" "quoted" third{0..2} "another argument" 

If you want to experiment a bit, you can modify the script `example-two.sh` by navigating to the relevant code-cell, modifying its content, and saving the changes by running the cell.
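
One thing to keep in mind when testing a job-script this way: outside of SLURM the `SLURM_*` environment variables are normally unset. A minimal sketch of how a script might cope with both situations follows; the function name `run_context` and the fallback of one cpu are assumptions made for this example.

```bash
#!/bin/bash
# Report whether we appear to be running inside a SLURM job.
# SLURM sets SLURM_JOB_ID for a job's processes; outside a job it is
# normally unset.
run_context() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        echo "slurm"
    else
        echo "local"
    fi
}

# Fall back to a default when a SLURM variable is missing (assumed default: 1).
cpus=${SLURM_CPUS_PER_TASK:-1}
echo "context=$(run_context) cpus=${cpus}"
```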

(pressing `SHIFT`+`SPACE` allows you to move in reverse and `SPACE` lets you move forward.)

Once satisfied with the script, let's submit the job to SLURM *(like in the previous example)* and provide a sequence of command-line arguments which we want passed to the script when it's executed.

In [None]:
# execute this cell using SHIFT+ENTER
sbatch example-two.sh "lion" "tiger" "bear" argument{1..100} 

With the job submitted, let's inspect the queue to verify it's listed. However, instead of looking at all jobs in the queue, we'll restrict to only jobs associated with our user.  To do this we need to use the `--user` argument for `squeue`.

In [None]:
# execute this cell using SHIFT+ENTER
squeue --user $USER

*How did we know about the `--user` argument?*

We used `man`. The system command `man` is an important resource when using UNIX/Linux based systems. It provides detailed documentation (or *manuals*) on various topics, including "*usage and runtime behavior*" information for a large portion of the commands available at a terminal.

In the case of `squeue`, we can learn a lot about what sort of arguments it accepts by skimming the documentation presented by running `man squeue`. For example, a passage detailing how to use the argument `--user` is contained within.

In [None]:
# if you want to see an example manual, press SHIFT+ENTER
man squeue

To finish off this example, let's take a look at the output generated by the job so far.

In [None]:
# execute this cell using SHIFT+ENTER
cat out.log

### **Example 3**

#### Script: `example-three.sh`

In [None]:
%%file "example-three.sh"
#!/bin/bash
### Job Parameters:
#SBATCH --job-name "Recursive"     # job name
#SBATCH --output   "recursive.log" # place to log output
#SBATCH --open-mode append         # always append to logs
#SBATCH --begin now+10             # specify when the job should start
                                   # .. (waits 10 seconds after submission)
### Script To Execute:
# print job details
echo "Running job $SLURM_JOB_ID"
echo "Batch script file: $0"

# submit new job using *this* script (recursion)
sbatch "$0"

Use `Shift`+`Enter` to create the script `example-three.sh`.

#### Submitting The Script & Getting Results

We'll submit the job to SLURM as normal; however, this time we will request that it be considered for execution immediately.

In [None]:
# execute this cell using SHIFT+ENTER
sbatch --begin now example-three.sh

As it turns out, any piece of information which can be provided in a batch-script's `#SBATCH` lines can be passed directly to `sbatch` as a command-line argument (and vice versa). When the same option appears in both places, the command-line value takes precedence, which is why `--begin now` above overrides the script's own `--begin now+10`.

As an example, let's submit a job using the same script, but have the initial job request 1 GB of memory per cpu, and not start running until after midnight tomorrow.

In [None]:
# execute this cell using SHIFT+ENTER
sbatch --mem-per-cpu 1024 --begin tomorrow example-three.sh 

Let's check the status of the jobs we've submitted. This time, we'll only request information about jobs with the name "Recursive" associated with our user.

In [None]:
# execute this cell using SHIFT+ENTER
squeue --user $USER --name "Recursive" 

Since these jobs are recursive, they will keep resubmitting themselves *forever*. 

As a result, it's important for us to stop (or *cancel*) any instances which appear in the queue.  To accomplish this, we'll use the command `scancel`. 

This command can handle some of the same arguments as `squeue`. In particular, to cancel all jobs with the name "Recursive" associated with our user, we just run:

In [None]:
# execute this cell using SHIFT+ENTER
scancel --user $USER --name "Recursive"
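
Cancelling by hand works, but a recursive job-script can also be written so that it stops on its own. The following is only a sketch of one way to do that: the names `should_resubmit`, `DEPTH`, and `MAX_DEPTH` are invented for this example and are not SLURM variables, and the actual submission is left as a comment so the snippet can be run anywhere.

```bash
#!/bin/bash
# Hypothetical guard: succeed only while the generation counter is below the limit.
should_resubmit() {
    local depth=${1:-0} max=${2:-3}
    [ "$depth" -lt "$max" ]
}

depth=${DEPTH:-0}          # generation counter handed down by the previous job
max_depth=${MAX_DEPTH:-3}  # assumed limit for this illustration

if should_resubmit "$depth" "$max_depth"; then
    echo "generation ${depth}: would resubmit"
    # Inside a real batch-script, the next generation could be submitted with:
    #   sbatch --export=ALL,DEPTH=$((depth + 1)) "$0"
else
    echo "generation ${depth}: stopping"
fi
```

Passing the counter through `--export` keeps the limit visible in the submission command itself rather than relying on whatever environment the job happens to inherit.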

Finally, if you want historical information about the jobs you've submitted, you can use the `sacct` command. Its arguments are similar to those of `squeue`.

For example, to see a list of jobs (past and present) which have the name "Recursive" and are associated with your user, we can use the following:

In [None]:
# execute this cell using SHIFT+ENTER
sacct --user $USER --name "Recursive"

# That's all I got. GG.

### **Example 4**


#### Stage 1: Job Script: `example-four-stage1.sh`

In [None]:
%%file "example-four-stage1.sh"
#!/bin/bash
## Generic Info
#SBATCH --job-name "Stage 1: Example 4"  # job name
#SBATCH --output   "logs/stage1.%j.log"  # output file pattern

# create virtual environment
python3 -m venv "env"

# activate environment
source "env/bin/activate"

# install packages into environment
python3 -m pip install numpy

# With python environment setup,
#    ...  we request "stage 2" be scheduled
sbatch "example-four-stage2.sh"

#### Stage 2: Job Script: `example-four-stage2.sh`

In [None]:
%%file "example-four-stage2.sh"
#!/bin/bash
## Request Job-Array
#SBATCH --array 1-20%10 # the array has 20 sub-jobs (labeled 1 to 20) 
                        # .. with at most 10 sub-jobs running at 
                        # .. any given point.

## Generic Info for "sub-job"
#SBATCH --job-name "Stage 2: Example 4"    # sub-job name
#SBATCH --output   "logs/stage2.%A.%a.log" # output file pattern
                                           # .. %A -> array-job-id
                                           # .. %a -> array-task-id 

## Resources To Request For Each "sub-job"
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --mem-per-cpu 500
#SBATCH --time 00:45:00  

## Commands Executed By Each "sub-job"
source "env/bin/activate"         # activate environment from "stage 1"
python3 "scripts/example-four.py" # execute python script
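
The stage-2 script above runs the same command in every sub-job. When sub-jobs need to differ, the usual handle is the environment variable `SLURM_ARRAY_TASK_ID`, which SLURM sets to the sub-job's index. Here is a small sketch of the idea; the helper `task_seed` and the base value `1000` are assumptions made purely for illustration.

```bash
#!/bin/bash
# Hypothetical helper: derive a distinct seed from an array index.
task_seed() {
    local task_id=$1 base=${2:-1000}
    echo $((base + task_id))
}

# SLURM_ARRAY_TASK_ID is set by SLURM inside array sub-jobs; default to 1
# so the snippet also runs outside of SLURM.
task_id=${SLURM_ARRAY_TASK_ID:-1}
echo "sub-job ${task_id} would use seed $(task_seed "$task_id")"
```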

#### Stage 2: Python Script: `example-four.py`

```python
"""
 File: `example-four.py`
 Synopsis:
    estimate condition number and print results for 
  a single random perturbation of the nxn identity matrix 
  (where n is fixed).
"""
from numpy import eye
from numpy.random import uniform
from numpy.linalg import cond

n = 1000
eps = 0.01

I = eye(n) # nxn identity matrix
X = uniform(size=[n,n])

# generate perturbation `Z`
Z = I + eps*uniform()*X 

# estimate condition number of `Z`
c = cond(Z)

# print results
print(f'{c}', flush=True)
```

#### Submitting & Results

To see what's happening we'll need to switch to a terminal and use the following script.

In [None]:
%%file "watch-queue.sh"
#!/bin/bash

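# make sure the log directory exists; SLURM does not create
# missing directories for --output files
mkdir -p "logs"

# submit seed job (stage1)
sbatch "example-four-stage1.sh"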

# watch queue
ii=0 
while [ $ii -lt 1000 ]; do
    squeue --user $USER --name "Stage 1: Example 4","Stage 2: Example 4"
    sleep 0.7 && clear
    ii=$((ii+1))
done
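
Once the array has finished, each stage-2 log should hold a single condition number. A rough sketch of how those numbers might be summarized is shown below; the helper `mean_of_logs` is hypothetical and the demonstration uses made-up values in a temporary directory rather than real results.

```bash
#!/bin/bash
# Hypothetical helper: average the first field of every line in the given files.
mean_of_logs() {
    awk '{ sum += $1; n++ } END { if (n > 0) printf "%.4f\n", sum / n }' "$@"
}

# Demonstration with made-up numbers; with real results one would run
#   mean_of_logs logs/stage2.*.log
tmp=$(mktemp -d)
printf '1.5\n' > "$tmp/stage2.100.1.log"
printf '2.5\n' > "$tmp/stage2.100.2.log"
mean_of_logs "$tmp"/stage2.*.log
rm -r "$tmp"
```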

----

*NOTE TO SELF*: This notebook requires that the "Calysto Bash" kernel and the RISE notebook extension be installed. You can install both by opening a terminal and executing
```bash
pip install --user calysto_bash RISE
jupyter nbextension install --user --py rise
jupyter nbextension enable --user --py rise
```
Once installed, refresh this page and select "Kernel > Change kernel > Calysto Bash" from the menu-bar.
