# The Unix Shell: Writing Shell Scripts

Shell commands constitute a programming language, and command-line programs known as shell scripts can be written to perform complex tasks.

We provide only a brief overview here: shell scripts have many traps and pitfalls for the unwary, and we generally prefer languages with more consistent syntax, such as Python or R, for complex tasks. However, shell scripts are used extensively in domains such as the preprocessing of genomics data, so shell scripting is a useful tool to know.

## Assigning variables

We assign variables using `=` and recall them by using `$`. It is customary to spell shell variable names in ALL_CAPS.

In [None]:
NAME='Joe'
echo "Hello $NAME"
echo "Hello ${NAME}"

### Single and double quotes

The main difference between the use of '' and "" is that variable expansion only occurs within double quotes. For plain text, they are equivalent.

In [None]:
echo '${NAME}'

In [None]:
echo "${NAME}"

### Use of curly braces

Use of curly braces unambiguously specifies the variable of interest. I suggest you always use them as a defensive programming technique.

In [None]:
echo "Hello ${NAME}l"

Without the braces, the shell looks up `$NAMEl`, which is not defined and so expands to an empty string!

In [None]:
echo "Hello $NAMEl"

One of the quirks of shell scripting is already apparent: there cannot be spaces before or after the `=` in an assignment.

In [None]:
NAME2= 'Joe'
echo "Hello ${NAME2}"

The previous instruction assigns the **empty string** to NAME2 for the duration of a single command, then tries to execute 'Joe' as that command.

In [None]:
NAME3 ='Joe'
echo "Hello ${NAME3}"

The previous instruction runs the **command** NAME3 with ='Joe' as its **argument**.
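
The behavior of `NAME2= 'Joe'` follows from a general shell rule: an assignment written directly in front of a command applies only to the environment of that one command. A small sketch of this rule, where `GREETING` and `AFTER` are made-up variable names used only for illustration:

```shell
# GREETING is a hypothetical variable; make sure it starts out unset.
unset GREETING

# The assignment in front of `sh` is visible only inside that one command.
GREETING='hi' sh -c 'echo "inside the command: ${GREETING}"'

# Back in the current shell, GREETING was never set.
AFTER="${GREETING:-unset}"
echo "after the command: ${AFTER}"
```

The same mechanism is often used deliberately, for example to set an environment variable for a single program run without changing the current shell.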

## Assigning command output to variables

In [None]:
pwd

In [None]:
CUR_DIR=$(pwd)
dirname "${CUR_DIR}"
basename "${CUR_DIR}"
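
The `$( )` form captures whatever the enclosed command prints, so it also works with pipelines and can be nested. A minimal sketch, using the illustrative variable names `SHOUT` and `PARENT` and the example path `/usr/local/bin` (the path is only treated as a string, so it need not exist):

```shell
# Capture the output of a pipeline; trailing newlines are stripped.
SHOUT=$(echo "hello" | tr '[:lower:]' '[:upper:]')
echo "${SHOUT}"

# Substitutions can be nested; quoting each level keeps paths with spaces intact.
PARENT=$(basename "$(dirname "/usr/local/bin")")
echo "Parent directory name: ${PARENT}"
```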

## Working with numbers

**Careful**: Note the use of **double** parentheses, `$(( ))`, to trigger evaluation of a mathematical expression. Single parentheses, `$( )`, run a command instead.

In [None]:
NUM=$((1+2+3+4))
echo ${NUM}
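
One more pitfall worth knowing is that shell arithmetic works on integers only. A short sketch, with `A`, `B` and the result names chosen just for illustration:

```shell
A=7
B=2
# Inside $(( )), variables may be written without the leading `$`.
SUM=$((A + B))
QUOTIENT=$((A / B))     # integer division: the fractional part is dropped
REMAINDER=$((A % B))    # modulo
echo "${SUM} ${QUOTIENT} ${REMAINDER}"
```

For floating-point work, an external tool such as `bc` or `awk`, or a language like Python, is usually a better fit.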

### `seq` generates a range of numbers

In [None]:
seq 3

In [None]:
seq 2 5

In [None]:
seq 5 2 9
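
With three arguments, `seq` takes the first value, the increment, and the last value, in that order. Combined with command substitution, this gives a simple counting loop. A minimal sketch, with `I` and `TOTAL` as illustrative names:

```shell
# seq FIRST INCREMENT LAST: the odd numbers from 5 to 9
TOTAL=0
for I in $(seq 5 2 9); do
    echo "I = ${I}"
    TOTAL=$((TOTAL + I))
done
echo "Total: ${TOTAL}"
```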

## Branching

### Using if to check for file existence

Note that the test condition is written inside square brackets, with a space after `[` and before `]`.

In [None]:
if [ -f hello.txt ]; then
    cat hello.txt
else
    echo "No such file"
fi

### Downloading remote files

In [None]:
man wget | head -n 20

In [None]:
wget -qO- https://vincentarelbundock.github.io/Rdatasets/doc/HSAUR/Forbes2000.html \
    | html2text | head -n 27  | tail -n 17

In [None]:
if [ ! -f "data/forbes.csv" ]; then
    wget https://vincentarelbundock.github.io/Rdatasets/csv/HSAUR/Forbes2000.csv \
    -O data/forbes.csv
fi

### Conditional evaluation with `test`

The `[ -f hello.txt ]` syntax is equivalent to `test -f hello.txt`, where `test` is a shell command with a large range of operators and flags that you can view in the man page.

```

TEST(1)                   BSD General Commands Manual                  TEST(1)

NAME
     test, [ -- condition evaluation utility

SYNOPSIS
     test expression
     [ expression ]

DESCRIPTION
     The test utility evaluates the expression and, if it evaluates to true,
     returns a zero (true) exit status; otherwise it returns 1 (false).  If
     there is no expression, test also returns 1 (false).

     All operators and flags are separate arguments to the test utility.

     The following primaries are used to construct expression:

     -b file       True if file exists and is a block special file.

     -c file       True if file exists and is a character special file.

     -d file       True if file exists and is a directory.

     -e file       True if file exists (regardless of type).

     -f file       True if file exists and is a regular file.

     -g file       True if file exists and its set group ID flag is set.
```
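
Because `test` and `[` report their result through the exit status, the status can be inspected directly with `$?`. The following sketch assumes a scratch directory created with `mktemp -d`; the names `WORK_DIR`, `FILE_STATUS` and `DIR_STATUS` are made up for the example:

```shell
# Work in a scratch directory so the example does not depend on existing files.
WORK_DIR=$(mktemp -d)
touch "${WORK_DIR}/hello.txt"

# `test` form: exit status 0 means true.
test -f "${WORK_DIR}/hello.txt"
FILE_STATUS=$?
echo "is a regular file? exit status ${FILE_STATUS}"

# Bracket form: hello.txt is not a directory, so the status is 1 (false).
DIR_STATUS=0
[ -d "${WORK_DIR}/hello.txt" ] || DIR_STATUS=$?
echo "is a directory? exit status ${DIR_STATUS}"

# `!` negates a condition.
if [ ! -e "${WORK_DIR}/goodbye.txt" ]; then
    echo "goodbye.txt does not exist"
fi

rm -r "${WORK_DIR}"
```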

## Looping

### For loop

In [None]:
# Loop over the glob directly; parsing the output of ls breaks on names with spaces
for FILE in *ipynb; do
    echo "${FILE}"
done

### While loop

In [None]:
COUNTER=10
while [ "$COUNTER" -gt 0 ]; do
    echo $COUNTER
    COUNTER=$(($COUNTER - 1))
done

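**Careful**: Inside `[ ]`, `<` and `>` are redirection operators, not comparisons. For example, `[ $COUNTER > 0 ]` redirects output to a file named `0` and is always true, which leads to an infinite loop. Use `-lt` for less than and `-gt` for greater than. For equality and inequality, use `-eq` and `-ne` to compare numbers, or `==` and `!=` to compare strings.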
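
The difference between comparing numbers and comparing strings is easy to overlook. A small sketch, using the made-up value `"01"` and the illustrative names `NUMERIC` and `STRING`; it uses `=`, the portable spelling of the string equality operator, which bash also accepts as `==`:

```shell
VALUE="01"

# -eq compares as integers: 01 and 1 are the same number.
if [ "${VALUE}" -eq 1 ]; then NUMERIC="equal"; else NUMERIC="different"; fi

# = compares as strings: "01" and "1" are different strings.
if [ "${VALUE}" = "1" ]; then STRING="equal"; else STRING="different"; fi

echo "numeric: ${NUMERIC}, string: ${STRING}"
```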

In [None]:
COUNTER=10
while [ "$COUNTER" != 0 ]; do
    echo $COUNTER
    COUNTER=$(($COUNTER - 1))
done

## Shell script

From now on, we will write shell scripts using an editor for convenience. For a syntax-highlighted display, I use `pygmentize`, a third-party Python program that you can install with

```
pip install pygments
```

but you can also just use `cat` to display the file contents.

A shell script is traditionally given the extension `.sh`. There are a few things to note:

1. To make the script standalone, you need to add `#!/path/to/shell` in the first line. Otherwise you need to call the script with `bash /path/to/script` instead of just `/path/to/script`.
2. To make the script executable, change the file permissions to executable with `chmod +x /path/to/script`
3. Script arguments are accessed by position, much like function arguments: `$1`, `$2`, and so on, with `$@` expanding to all of them. Another useful variable is `$#`, which gives the number of command-line arguments.
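
To experiment with these variables without writing a script file, `set --` can be used to replace the positional parameters of the current shell. The sketch below uses it as a stand-in for the arguments a script would receive; the argument values and the names `SECOND`, `COUNT` and `ARG` are made up for illustration:

```shell
# Simulate running: ./script.sh alpha "beta gamma" delta
set -- alpha "beta gamma" delta

echo "Number of arguments: $#"
echo "First argument: $1"
SECOND="$2"
echo "Second argument: ${SECOND}"

# "$@" expands to each argument as a separate word, preserving embedded spaces.
COUNT=0
for ARG in "$@"; do
    COUNT=$((COUNT + 1))
    echo "Argument ${COUNT}: ${ARG}"
done
```

Quoting `"$@"` matters here: unquoted, the second argument would be split into two words.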

#### Find the path to bash for the `#!` line

In [None]:
which bash

#### Display script

In [None]:
pygmentize -g scripts/cat_if_exists.sh
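
The listing of `scripts/cat_if_exists.sh` is not reproduced in this text. As a rough guide, a script with this name might look like the sketch below. The usage check, the messages, and the temporary `WORK_DIR` are assumptions made for illustration, not the actual file. The script is run through `bash` here so that the sketch does not depend on execute permissions.

```shell
WORK_DIR=$(mktemp -d)

# Hypothetical stand-in for scripts/cat_if_exists.sh (the real script may differ).
cat > "${WORK_DIR}/cat_if_exists.sh" <<'EOF'
#!/usr/bin/env bash
# Print the file named by the first argument if it is a regular file.
if [ "$#" -ne 1 ]; then
    echo "Usage: $0 FILE" >&2
    exit 1
fi
if [ -f "$1" ]; then
    cat "$1"
else
    echo "No such file: $1"
fi
EOF

# Try the script on one file that exists and one that does not.
echo "Hello, world" > "${WORK_DIR}/hello.txt"
FOUND=$(bash "${WORK_DIR}/cat_if_exists.sh" "${WORK_DIR}/hello.txt")
MISSING=$(bash "${WORK_DIR}/cat_if_exists.sh" "${WORK_DIR}/goodbye.txt")
echo "${FOUND}"
echo "${MISSING}"
```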

#### Give executable permission

In [None]:
chmod +x scripts/cat_if_exists.sh

In [None]:
scripts/cat_if_exists.sh hello.txt

In [None]:
scripts/cat_if_exists.sh goodbye.txt

### Reading a file line by line

We will write a script to extract headers from a FASTA Nucleic Acid (FNA) file. Headers in FASTA format are lines that begin with the `>` character.

In [None]:
head -n 23 data/example.fna

In [None]:
pygmentize scripts/extract_headers.sh

#### The ${X:m:n} expression extracts n characters of X, starting at offset m (counting from 0)

In [None]:
LINE=">random sequence 1 consisting of 1000 bases." 
echo "${LINE:0:1}"

In [None]:
echo "${LINE:5:10}"

**Careful**: You need to put all variables in the test condition within double quotes. If not, when the variable is empty or undefined (e.g. empty line) it vanishes and leaves `[ == '>' ]` which raises a syntax error.
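
Putting these pieces together, a header extractor can be sketched as a `while read` loop over standard input. The version below is a hypothetical stand-in for `scripts/extract_headers.sh`, not necessarily the script used here; the sample input and the names `WORK_DIR` and `HEADERS` are made up, and the script is run through `bash` so that the sketch does not depend on execute permissions.

```shell
WORK_DIR=$(mktemp -d)

# Hypothetical stand-in for scripts/extract_headers.sh (the real script may differ).
cat > "${WORK_DIR}/extract_headers.sh" <<'EOF'
#!/usr/bin/env bash
# Read standard input line by line and print only FASTA header lines.
# IFS= keeps leading whitespace; -r keeps backslashes literal.
while IFS= read -r LINE; do
    # The quotes keep the test valid even when LINE is empty.
    if [ "${LINE:0:1}" == '>' ]; then
        echo "${LINE}"
    fi
done
EOF

# A tiny made-up FASTA input, including an empty line.
HEADERS=$(printf '>seq1 first\nACGT\n\n>seq2 second\nTTGA\n' \
    | bash "${WORK_DIR}/extract_headers.sh")
echo "${HEADERS}"

rm -r "${WORK_DIR}"
```

For this particular task `grep '^>'` would do the same job in one line; the loop form is mainly useful when each line needs more involved processing.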

In [None]:
chmod +x scripts/extract_headers.sh

In [None]:
cat data/example.fna | scripts/extract_headers.sh