# Job arrays and dependencies

In this section, we'll look at how to submit a script instead of a single command. We'll then use a script to submit a large number of identical tasks as a single job array with a single JOBID. Finally, we'll look at how to submit a job that depends on another job.

***

## Writing and submitting scripts

Earlier on, we were submitting a single command as a job.

In [None]:
bsub "sleep 60"

But what if we wanted to submit several commands in a single job? For this, we can use a shell script: a file containing a series of commands. Our script reads in a DNA sequence and substitutes each letter (base) to return the complementary sequence (A -> T, T -> A, C -> G and G -> C).

Here, we've written these commands into a file called `myscript.sh`. We can use `cat` to print the contents of the script to the terminal.

In [7]:
cat myscript.sh

#!/bin/bash

input=$(<data/sequence.txt)
echo "Input sequence: $input"

output=$(echo "$input" | tr 'ATGC' 'TACG')
echo "Complementary sequence: $output"


Notice that at the top we've added the line `#!/bin/bash`, which tells the system to run the script with bash rather than another shell.

Before we can submit our script, we need to make sure the system recognises the file as something it can execute. We can do this using `chmod` to make the script executable for our user.

In [None]:
chmod u+x myscript.sh

To test whether this works, we can run the script directly. If it's successful, you should see the input sequence followed by its complementary sequence.

In [8]:
./myscript.sh

Input sequence: AAAGGTTC
Complementary sequence: TTTCCAAG
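
The substitution step can also be tried on its own, without an input file. The sketch below wraps the same `tr` call in a function. The name `complement` is our own and is not part of `myscript.sh`.

```shell
#!/bin/bash

# Hypothetical helper (not part of myscript.sh): print the complementary
# sequence of the DNA string given as the first argument.
complement() {
    # tr maps each character in the first set to the character at the
    # same position in the second set: A->T, T->A, G->C, C->G.
    printf '%s\n' "$1" | tr 'ATGC' 'TACG'
}

complement "AAAGGTTC"
```

Running this should print `TTTCCAAG`, the same complementary sequence the script returned above.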


To submit this script as a job, we use `bsub`.

In [None]:
bsub -o myscript.o -e myscript.e "./myscript.sh"

You can check the status of your job using `bjobs`. Once the job has finished, take a look at the output file which was generated. At the top you should see the input sequence and its complementary sequence, which means that your script executed as we'd expect.

In [None]:
cat myscript.o

    Input sequence: AAAGGTTC
    Complementary sequence: TTTCCAAG

    ------------------------------------------------------------
    Sender: LSF System <lsfadmin@pcs5e>
    Subject: Job 4019973: <./myscript.sh> in cluster <pcs5> Done

    ...
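
When many jobs are involved, opening each output file by hand becomes tedious. As a rough sketch, the function below looks for the `Subject:` line of the LSF report shown above and checks that it ends in `Done`. The function name `job_succeeded` is our own, and the exact report format may differ between LSF versions and sites, so treat the pattern as an assumption.

```shell
#!/bin/bash

# Hypothetical helper: succeed if the LSF report in the given output file
# has a Subject line ending in "Done" (as in the example report above).
# The pattern is an assumption based on that example.
job_succeeded() {
    grep -q '^[[:space:]]*Subject: Job [0-9][0-9]*: .* Done$' "$1"
}

# Example use once the job has finished and myscript.o exists.
if [ -f myscript.o ]; then
    if job_succeeded myscript.o; then
        echo "myscript.sh finished successfully"
    else
        echo "myscript.sh did not report Done" >&2
    fi
fi
```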

***

## Submitting an array of jobs

Sometimes, you may want to run the same commands across lots of files. This is common in pipelines where your jobs will be identical except for the input data they are working on.

For example, let's say we want to find the complementary sequences for five different sequence files. We've named these files so that each one ends with a number from 1 to 5 as a suffix (`sequence.txt.[1-5]`).

In [12]:
ls data/*

data/sequence.txt	data/sequence.txt.2	data/sequence.txt.4
data/sequence.txt.1	data/sequence.txt.3	data/sequence.txt.5


We've provided an updated script, `myarrayscript.sh`, which uses an environment variable, **$LSB_JOBINDEX**, to tell the script which file to use as input. You can take a look at the contents of the script using `cat`.

In [None]:
cat myarrayscript.sh
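
The contents of `myarrayscript.sh` are not reproduced here. As a hedged sketch, a script of this kind might look like the following. It is our own illustration, written to be consistent with the output shown later in this section, and is not the actual file. The names `process_sequence` and `DATA_DIR` are hypothetical; only `LSB_JOBINDEX` comes from LSF.

```shell
#!/bin/bash
# Hypothetical sketch of an array-aware script.
# This is NOT the actual contents of myarrayscript.sh.

# Process the input file whose numeric suffix matches the given index.
# DATA_DIR is an assumed override so the sketch can be tried outside LSF.
process_sequence() {
    local index="$1"
    local infile="${DATA_DIR:-data}/sequence.txt.${index}"

    if [ ! -r "$infile" ]; then
        echo "Cannot read $infile" >&2
        return 1
    fi

    echo "Reading $infile"
    local input
    input=$(<"$infile")
    echo "Input sequence: $input"
    echo "Complementary sequence: $(echo "$input" | tr 'ATGC' 'TACG')"
}

# LSF sets LSB_JOBINDEX for each job in an array (1, 2, 3, ...).
# Only run when an index has been provided.
if [ -n "${LSB_JOBINDEX:-}" ]; then
    process_sequence "$LSB_JOBINDEX"
fi
```

Setting the variable by hand, for example `LSB_JOBINDEX=2 ./myarrayscript.sh`, is a convenient way to try a script like this on one input before submitting the whole array.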

Don't forget to update the permissions on the script so that we can use it.

In [None]:
chmod u+x myarrayscript.sh

The submission command below uses two special variables, **%J** and **%I**, in the output file names. LSF replaces these with the JOBID and the JOBINDEX respectively. The JOBID will be the same for all of the jobs in this array. The JOBINDEX, which identifies which job in the array was run, will correspond to the number at the end of our input filenames.

In [None]:
bsub -J "sequence[1-5]" -o sequence.out.%J-%I ./myarrayscript.sh
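
The range `[1-5]` is typed by hand here. If the number of input files changes often, the array specification could be built from the files themselves. The sketch below is one way to do that, assuming the files are numbered contiguously from 1 with no gaps. The function name `array_spec` is our own.

```shell
#!/bin/bash

# Hypothetical helper: print an array specification such as "sequence[1-5]"
# based on how many sequence.txt.N files exist in the given directory.
# Assumes the files are numbered 1..N without gaps.
array_spec() {
    local name="$1"
    local dir="$2"
    local count
    count=$(find "$dir" -maxdepth 1 -name 'sequence.txt.[0-9]*' | wc -l)

    if [ "$((count))" -eq 0 ]; then
        echo "No sequence.txt.N files found in $dir" >&2
        return 1
    fi

    # $((count)) also strips any padding that wc adds on some systems.
    echo "${name}[1-$((count))]"
}

# Possible use at submission time (not run here):
#   bsub -J "$(array_spec sequence data)" -o sequence.out.%J-%I ./myarrayscript.sh
```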

When you check on the progress using `bjobs` you will see five jobs running. Here is an example output from `bjobs`.

    JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
    4019986 userA   RUN   normal     pcs5a       pcs5e       *quence[1] Jan 15 13:46
    4019986 userA   RUN   normal     pcs5a       pcs5e       *quence[2] Jan 15 13:46
    4019986 userA   RUN   normal     pcs5a       pcs5e       *quence[3] Jan 15 13:46
    4019986 userA   RUN   normal     pcs5a       pcs5e       *quence[4] Jan 15 13:46
    4019986 userA   RUN   normal     pcs5a       pcs5e       *quence[5] Jan 15 13:46

Notice here that there are five jobs, one for each of our input files, and that the JOBID is the same for all of the jobs (4019986). If we look at the JOB_NAME, LSF has added the JOBINDEX to the end of each of the job names (e.g. [1]).

Once your jobs have finished you can look for the output files using `ls`.

In [None]:
ls sequence.out*

You should see five files. Each of the filenames will begin with _sequence.out_. This is followed by the JOBID (e.g. 4019986), a hyphen and then the JOBINDEX. Remember, the JOBINDEX will be a number from 1 to 5 which corresponds to the number at the end of each of the input files. So, _sequence.out.4019986-1_ will correspond with _sequence.txt.1_.
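
This naming scheme means the input file can be recovered from an output filename. As a small sketch, the function below strips everything up to the last hyphen to get the JOBINDEX. The name `input_for_output` is our own, and the `data/` prefix simply follows the layout used in this section.

```shell
#!/bin/bash

# Hypothetical helper: given an output filename such as
# sequence.out.4019986-1, print the matching input file path.
input_for_output() {
    local outfile="$1"
    # ${var##*-} removes the longest prefix ending in a hyphen,
    # leaving only the JOBINDEX.
    local index="${outfile##*-}"
    echo "data/sequence.txt.${index}"
}

# List each output file next to the input it was produced from.
for f in sequence.out.*; do
    [ -e "$f" ] || continue   # skip if no output files exist yet
    echo "$f <- $(input_for_output "$f")"
done
```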

Let's take a look at the first three lines of each of the output files using `head`.

In [None]:
head -3 sequence.out.*

    ==> sequence.out.4019986-1 <==
    Reading data/sequence.txt.1
    Input sequence: AGGCTA
    Complementary sequence: TCCGAT

    ==> sequence.out.4019986-2 <==
    Reading data/sequence.txt.2
    Input sequence: TTGGCA
    Complementary sequence: AACCGT

    ==> sequence.out.4019986-3 <==
    Reading data/sequence.txt.3
    Input sequence: CGCAAT
    Complementary sequence: GCGTTA

    ==> sequence.out.4019986-4 <==
    Reading data/sequence.txt.4
    Input sequence: TTGCAA
    Complementary sequence: AACGTT

    ==> sequence.out.4019986-5 <==
    Reading data/sequence.txt.5
    Input sequence: GGCCAA
    Complementary sequence: CCGGTT

We can see that the same script (_myarrayscript.sh_) has been run on all five of our sequence files and we now have our complementary sequences.

***

## Job dependencies

Sometimes, we may have a job which uses the output from another job, i.e. job B must only start after job A has finished successfully. In the example below, we submit jobA using `bsub`.

In [None]:
bsub -J jobA -o jobA.o -e jobA.e "sleep 180"

We can check on jobA using `bjobs`.

    JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
    4020418 UserA   RUN   normal     pcs5a       pcs5e       jobA       Jan 15 15:44

In this example, jobA has the JOBID 4020418 and has started running. Now, let's create jobB which requires jobA to have finished before jobB is allowed to start running. To do this, we use the `-w` option with `bsub`. To say that we want jobB to start running only once jobA has finished, we use `-w 'ended(jobA)'`. Note that `ended` is satisfied whether jobA succeeds or fails; if jobB should only run after jobA completes successfully, use `done(jobA)` instead. Don't forget to put your dependency requirement inside quotes.

In [None]:
bsub -w 'ended(jobA)' -J jobB -o jobB.o -e jobB.e "sleep 60"

    JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
    4020418 UserA   RUN   normal     pcs5a       pcs5e       jobA       Jan 15 15:44
    4020421 UserA   PEND  normal     pcs5a                   jobB       Jan 15 15:45

Now, we can see that jobB is waiting (PEND), but how do we know it is waiting for jobA to finish? Well, let's take a look at the longer output with `bjobs -l`.

    bjobs -l 4020421

In the output, we can see the section _PENDING REASONS_ which tells us that the job dependency isn't satisfied (i.e. jobA hasn't finished yet).

    ...
    
     PENDING REASONS:
     Job dependency condition not satisfied;
     
    ...

We can take a closer look at the dependency conditions using `bjdepinfo -l`.

    bjdepinfo -l 4020421

This tells us that jobB (4020421) depends on jobA (4020418) being finished (ended). It shows the relationship between the two jobs, i.e. jobA is the parent job and jobB is the child job.

    The dependency condition of job <4020421> is not satisfied: ended(jobA)
    JOBID          PARENT         PARENT_STATUS  PARENT_NAME  LEVEL
    4020421        4020418        RUN            jobA         1
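
Job names are not guaranteed to be unique, so in scripts it can be safer to depend on the JOBID rather than the name. The sketch below captures the JOBID from the confirmation line that `bsub` prints and uses it in the dependency condition. The function names `parse_jobid` and `submit_chain` are our own, and the parsing assumes the confirmation line has the form `Job <4020418> is submitted to queue <normal>.`

```shell
#!/bin/bash

# Hypothetical helper: read bsub's confirmation line on standard input
# and print only the numeric JOBID.
parse_jobid() {
    sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}

# Hypothetical helper: submit jobA, then submit jobB so that it waits
# for jobA's JOBID rather than its name.
submit_chain() {
    local parent_id
    parent_id=$(bsub -J jobA -o jobA.o -e jobA.e "sleep 180" | parse_jobid)

    if [ -z "$parent_id" ]; then
        echo "Could not determine the JOBID of jobA" >&2
        return 1
    fi

    bsub -w "ended(${parent_id})" -J jobB -o jobB.o -e jobB.e "sleep 60"
}

# Call submit_chain on a system where bsub is available.
```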

***

## What's next?

For an overview of managing jobs, you can go back to [managing jobs](managing_jobs.ipynb). Otherwise, let's take a look at [priority and fairshare](priority_and_fairshare.ipynb).