# BMI565: Bioinformatics Programming & Scripting
#### (C) Michael Mooney (mooneymi@ohsu.edu)
## Week 2: Unix/Linux Commands and Bash Scripting

1. Linux Background
2. Basic Linux Commands
    - Commands for Navigating the Directory Tree
    - File Permissions
    - Environment Variables
    - Other Basic Utilities
3. Input/Output Redirection
    - STDIN and STDOUT
    - Redirecting I/O: `>, >>, <`
    - Pipes
4. File Manipulation in Linux
    - AWK
    - sed
    - cut and paste
5. Bash Scripts
    - Bash Control Structures
    
#### Requirements

1. Bash interpreter
2. Data Files
    - `./data/book.xml`
    - `./data/P00533.fasta`
    - `./data/serotonin_data.txt`
    - `./data/annot_test.txt`

**Caution:** This notebook depends on the bash interpreter and will not run on Windows.

## Unix/Linux Background

- Unix is an OS originally developed in 1969 at Bell Labs
- Unix utilizes a kernel: the main OS program that connects applications with hardware
- Designed to be portable, multi-tasking, and multi-user
- Custom versions of Unix were being sold with server workstations. They were expensive and behaved differently.
- Linux is an open-source, free version of Unix developed by Linus Torvalds (first released in 1991)
- There are several Linux distributions available: Ubuntu, Fedora, Mint

## Basic Linux Commands
### Commands for Navigating the Directory Tree

<table align="left">
<tr><td style="text-align:center"><b>Command</b></td><td><b>Description</b></td></tr>
<tr><td style="text-align:center">`pwd`</td><td>Displays the current directory</td></tr>
<tr><td style="text-align:center">`mkdir [DIR]`</td><td>Creates a new directory</td></tr>
<tr><td style="text-align:center">`cd [DIR]`</td><td>Changes the current directory</td></tr>
<tr><td style="text-align:center">`ls [DIR]`</td><td>Lists the directory contents</td></tr>
</table>

Absolute Path: The entire directory path (starting from the root directory)

Relative Path: The directory path relative to the current directory
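The difference between the two kinds of path can be tried safely in a scratch directory. The following is a minimal sketch; the temporary directory created by `mktemp -d` and the variable names `workdir` and `in_data` are assumptions made for illustration and are not part of the course data.

```bash
# Work in a throwaway directory so nothing in the course repository is touched.
workdir=$(mktemp -d)
mkdir "$workdir/data"

cd "$workdir"          # absolute path: starts at the root directory /
cd data                # relative path: resolved against the current directory
in_data=$(basename "$PWD")
echo "$in_data"        # data

cd ..                  # .. is the parent directory (also a relative path)
cd "$workdir/data"     # same destination as 'cd data', reached by absolute path
echo "$PWD"

cd / && rm -r "$workdir"   # clean up
```

A relative path means something different depending on where you are, while an absolute path always points to the same place.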

In [1]:
## Print the current working directory (absolute path is printed)
pwd

/Users/mooneymi/Documents/github/bioinformatics_programming


In [2]:
## Change to a sub-directory called data
## The following command uses the relative path to the data directory
cd data



In [3]:
## Print the current working directory again
pwd

/Users/mooneymi/Documents/github/bioinformatics_programming/data


In [4]:
## You can also move to a directory by giving an absolute path
cd /Users/mooneymi/Documents/github/bioinformatics_programming/data



In [5]:
## List the contents of the directory
ls

P00533.fasta		egfr.gb			new_file.txt
SHH.xml			file1.txt		pickle_data.dat
annot_test.txt		file2.txt		serotonin_data.txt
book.xml		files.txt		sorted_files.txt
egfr.fasta		hello.txt		test.txt


In [6]:
## List the contents in long form to see detailed info about the files
## the -l option is for long form, the -t option sorts by time
ls -lt

total 504
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users     13 Sep 21 10:03 test.txt
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users    161 Sep 21 10:02 sorted_files.txt
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users    161 Sep 21 10:01 files.txt
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users     14 Sep 21 10:01 hello.txt
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users     43 Sep 21 08:53 pickle_data.dat
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users   3712 Sep 21 08:53 new_file.txt
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users     79 Aug  4 16:54 file2.txt
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users    123 Aug  4 16:54 file1.txt
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users   1320 Jul 18 15:08 P00533.fasta
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users  99455 Jul 18 15:08 SHH.xml
-rwxr--r--  1 mooneymi  OHSUM01\Domain Users    299 Jul 18 15:08 book.xml
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users   3370 Jul 18 15:08 egfr.fasta
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users  38976 Jul 18 15

In [7]:
## Go back to parent directory (up one level in the tree)
cd ..



In [8]:
pwd

/Users/mooneymi/Documents/github/bioinformatics_programming


### File Permissions

Every file and directory has three permission types:
- Read
- Write
- Execute

Each permission type is assigned at three different levels:
- Owner
- Group
- Other

File permissions can be changed with the `chmod` command:
    chmod ### file

The `###` above represents three digits that specify the permissions for the three levels (owner, group, and other, in that order). The permission types are coded as follows:

 - 4 = Read Only
 - 2 = Write Only
 - 1 = Execute Only
 - 0 = None

To give multiple permission types, sum the respective numbers. 

#### Examples:
The first example below gives read, write, and execute permissions for the `test.txt` file to everyone. The second example gives all permissions to the file owner, but only read and execute permissions to the group and others.

    chmod 777 test.txt
    
    chmod 755 test.txt
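To see how the digits add up, the sketch below applies two modes to a scratch file and reads the permission string back from `ls -l`. The temporary file and the variable names `perms_640` and `perms_755` are assumptions for illustration; `cut -c 1-10` simply keeps the first ten characters of the listing.

```bash
tmpfile=$(mktemp)

# 6 = 4 (read) + 2 (write); 4 = read only; 0 = no permissions
chmod 640 "$tmpfile"
perms_640=$(ls -l "$tmpfile" | cut -c 1-10)
echo "$perms_640"      # -rw-r-----

# 7 = 4 + 2 + 1 (read, write, execute); 5 = 4 + 1 (read and execute)
chmod 755 "$tmpfile"
perms_755=$(ls -l "$tmpfile" | cut -c 1-10)
echo "$perms_755"      # -rwxr-xr-x

rm "$tmpfile"
```

The first character of the string is the file type (`-` for a regular file); the remaining nine are the read/write/execute flags for owner, group, and other.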
    

In [9]:
## Change to the data directory again, and list the contents
cd data
ls -l

total 504
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users   1320 Jul 18 15:08 P00533.fasta
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users  99455 Jul 18 15:08 SHH.xml
-rw-r--r--@ 1 mooneymi  OHSUM01\Domain Users   3712 Aug  7  2012 annot_test.txt
-rwxr--r--  1 mooneymi  OHSUM01\Domain Users    299 Jul 18 15:08 book.xml
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users   3370 Jul 18 15:08 egfr.fasta
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users  38976 Jul 18 15:08 egfr.gb
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users    123 Aug  4 16:54 file1.txt
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users     79 Aug  4 16:54 file2.txt
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users    161 Sep 21 10:01 files.txt
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users     14 Sep 21 10:01 hello.txt
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users   3712 Sep 21 08:53 new_file.txt
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users     43 Sep 21 08:53 pickle_data.dat
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users  62531 Jul 18 15:08

In [10]:
## Change the permissions of the book.xml file
chmod 777 book.xml



In [11]:
## View the directory again
ls -l

total 504
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users   1320 Jul 18 15:08 P00533.fasta
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users  99455 Jul 18 15:08 SHH.xml
-rw-r--r--@ 1 mooneymi  OHSUM01\Domain Users   3712 Aug  7  2012 annot_test.txt
-rwxrwxrwx  1 mooneymi  OHSUM01\Domain Users    299 Jul 18 15:08 book.xml
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users   3370 Jul 18 15:08 egfr.fasta
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users  38976 Jul 18 15:08 egfr.gb
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users    123 Aug  4 16:54 file1.txt
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users     79 Aug  4 16:54 file2.txt
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users    161 Sep 21 10:01 files.txt
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users     14 Sep 21 10:01 hello.txt
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users   3712 Sep 21 08:53 new_file.txt
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users     43 Sep 21 08:53 pickle_data.dat
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users  62531 Jul 18 15:08

In [12]:
## Now change the permissions back to read-only for the group and others
chmod 744 book.xml



### Environment Variables

Environment variables hold information about the computer environment in which your programs are running. You can view all environment variables with the `env` command.

Environment variables are commonly used to tell the OS where programs and libraries are installed. For instance, the PATH environment variable tells the operating system where to look for executable programs. The PATH variable is simply a list of directories. When you attempt to execute a program the operating system will search these directories (in the order listed) for the program you requested. Many of the utilities we've talked about today are located in the /usr/bin or /bin folders. Because these directories are listed in the PATH variable, you won't need to explicitly tell the OS where the utility is located. You can see where a program is located with the `which` command:

In [13]:
which python

/Applications/anaconda/bin/python


In [14]:
echo $PATH

/Applications/anaconda/bin:/Users/mooneymi/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin


You can modify the PATH variable as follows (this will add the bin folder in your home directory to the PATH):

In [15]:
PATH=~/bin:$PATH



The `export` command makes the variable available to child processes of the current shell (e.g. a shell script or other program started from it).

In [16]:
export PATH



Environment variables can be accessed (on the command-line or in a shell script) by placing a $ before the variable name. For example:

In [17]:
echo $PATH

/Users/mooneymi/bin:/Applications/anaconda/bin:/Users/mooneymi/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin


**Note:** Environment variables can be set automatically at login by putting the commands above into your login script (`~/.bash_profile`).
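The effect of `export` can be checked by asking a child shell whether it sees a variable. This is a minimal sketch: the variable `course_dir` and its value are hypothetical, and `${course_dir:-unset}` is a bash parameter expansion (not covered above) that prints `unset` when the variable is empty or undefined.

```bash
# A plain shell variable is visible only in the current shell.
course_dir="/tmp/bmi565"
unexported=$(bash -c 'echo "${course_dir:-unset}"')
echo "$unexported"     # unset

# After export, child processes inherit the variable.
export course_dir
exported=$(bash -c 'echo "${course_dir:-unset}"')
echo "$exported"       # /tmp/bmi565
```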

### Other Basic Linux Utilities

<table align="left">
<tr><td style="text-align:center"><b>Command</b></td><td><b>Description</b></td></tr>
<tr><td style="text-align:center">`man [COMMAND]`</td><td>Displays the manual entry for the command</td></tr>
<tr><td style="text-align:center">`cat [OPTION] [FILE]`</td><td>Prints the entire file</td></tr>
<tr><td style="text-align:center">`head [OPTION] [FILE]`</td><td>Prints the first few lines of a file</td></tr>
<tr><td style="text-align:center">`tail [OPTION] [FILE]`</td><td>Prints the last few lines of a file</td></tr>
<tr><td style="text-align:center">`less [FILE]`</td><td>Displays a file and allows scrolling</td></tr>
<tr><td style="text-align:center">`wc [OPTION] [FILE]`</td><td>Prints the line, word, and byte counts of a file</td></tr>
<tr><td style="text-align:center">`rm [OPTION] FILE`</td><td>Deletes a file</td></tr>
<tr><td style="text-align:center">`grep [OPTION] PATTERN [FILES]`</td><td>Searches files for a string pattern</td></tr>
</table>


**Caution:** Interactive commands that wait for user input, like `man` or `less`, won't work in the Jupyter notebook. Open a new terminal window to try those commands.

In [18]:
## Display the first few lines of a file
cd /Users/mooneymi/Documents/github/bioinformatics_programming/data
head P00533.fasta

>sp|P00533|EGFR_HUMAN Epidermal growth factor receptor OS=Homo sapiens GN=EGFR PE=1 SV=2
MRPSGTAGAALLALLAALCPASRALEEKKVCQGTSNKLTQLGTFEDHFLSLQRMFNNCEV
VLGNLEITYVQRNYDLSFLKTIQEVAGYVLIALNTVERIPLENLQIIRGNMYYENSYALA
VLSNYDANKTGLKELPMRNLQEILHGAVRFSNNPALCNVESIQWRDIVSSDFLSNMSMDF
QNHLGSCQKCDPSCPNGSCWGAGEENCQKLTKIICAQQCSGRCRGKSPSDCCHNQCAAGC
TGPRESDCLVCRKFRDEATCKDTCPPLMLYNPTTYQMDVNPEGKYSFGATCVKKCPRNYV
VTDHGSCVRACGADSYEMEEDGVRKCKKCEGPCRKVCNGIGIGEFKDSLSINATNIKHFK
NCTSISGDLHILPVAFRGDSFTHTPPLDPQELDILKTVKEITGFLLIQAWPENRTDLHAF
ENLEIIRGRTKQHGQFSLAVVSLNITSLGLRSLKEISDGDVIISGNKNLCYANTINWKKL
FGTSGQKTKIISNRGENSCKATGQVCHALCSPEGCWGPEPRDCVSCRNVSRGRECVDKCN


In [19]:
## Count the lines of a file
wc -l serotonin_data.txt

     251 serotonin_data.txt


#### Useful `grep` Options
<table align="left">
<tr><td style="text-align:center"><b>Option</b></td><td><b>Description</b></td></tr>
<tr><td style="text-align:center">`-i`</td><td>Ignore case</td></tr>
<tr><td style="text-align:center">`-c`</td><td>Print a count of matching lines</td></tr>
<tr><td style="text-align:center">`-r`</td><td>Recursive searching (look in subdirectories)</td></tr>
<tr><td style="text-align:center">`-l`</td><td>Only print filenames containing matches</td></tr>
<tr><td style="text-align:center">`-n`</td><td>Include line numbers of matches within files</td></tr>
</table>

In [20]:
## Search for probes on chromosome 21
grep "chr21" annot_test.txt

A_23_P253586	NM_005128	NM_005128	NM_005128	Hs.204575	9980	DOPEY2	dopey family member 2	ENST00000270190	THC2471394	GO:0000139(Golgi membrane)|GO:0003674(molecular_function)|GO:0006895(Golgi to endosome transport)|GO:0007029(endoplasmic reticulum organization and biogenesis)|GO:0007275(multicellular organismal development)	Homo sapiens dopey family member 2 (DOPEY2), mRNA [NM_005128]	chr21:36586364-36587509	hs|21q22.12


In [21]:
## How many lines match the pattern
grep -c "chr21" annot_test.txt

1
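The other options in the table can be tried on a small file. The sketch below builds its own three-line, tab-delimited file in a temporary directory; the file name `probes.txt` and its contents are made up for illustration and do not come from `annot_test.txt`.

```bash
tmpdir=$(mktemp -d)
printf 'probe1\tchr21\nprobe2\tCHR21\nprobe3\tchr7\n' > "$tmpdir/probes.txt"

# -c counts matching lines; the search is case-sensitive by default
exact_count=$(grep -c "chr21" "$tmpdir/probes.txt")
echo "$exact_count"        # 1

# -i ignores case, so CHR21 matches as well
nocase_count=$(grep -ic "chr21" "$tmpdir/probes.txt")
echo "$nocase_count"       # 2

# -n prefixes each matching line with its line number
numbered=$(grep -n "chr7" "$tmpdir/probes.txt")
echo "$numbered"           # 3:probe3<TAB>chr7

rm -r "$tmpdir"
```

Single-letter options can be combined after one dash, as in `-ic` above.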


## I/O Redirection

### STDIN and STDOUT

STDIN (standard input), STDOUT (standard output), and STDERR (standard error) are pre-connected input and output channels between a computer program and its environment. STDIN is the stream for data going into a program (typically keyboard input), STDOUT is the stream for data written by a program (typically the terminal that initiated the program), and STDERR is the stream used to report errors (again, typically the terminal that initiated the program).

However, we are able to use I/O redirection so that program input can come from sources other than the keyboard, and so that output can go to destinations other than the screen. 

### `> , >>, <`

The `>` and `>>` redirection characters can be used to redirect program output to a file. `>` causes the file to be overwritten (if it exists), while `>>` will append output to the end of an existing file.

    echo 'Hello, world!' > hello.txt

Likewise, we can supply input to a program from a file by using the `<` redirection character. 

    sort < files.txt > sorted_files.txt
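The difference between overwriting and appending is easiest to see by writing to the same file several times. The following is a small sketch using a scratch file; the names `log`, `appended`, and `overwritten` are hypothetical.

```bash
tmpdir=$(mktemp -d)
log="$tmpdir/log.txt"

echo 'first'  > "$log"     # creates (or overwrites) the file
echo 'second' >> "$log"    # appends a second line
appended=$(wc -l < "$log") # < feeds the file to wc on STDIN
echo $appended             # 2

echo 'third' > "$log"      # > discards the previous two lines
overwritten=$(cat "$log")
echo "$overwritten"        # third

rm -r "$tmpdir"
```

Because `wc` reads from STDIN here, it prints only the count and not the file name.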



In [22]:
echo 'Hello, world!' > hello.txt



In [23]:
cat hello.txt

Hello, world!


In [24]:
## Write list of files to a text file
ls -t > files.txt



In [25]:
cat files.txt

files.txt
hello.txt
test.txt
sorted_files.txt
pickle_data.dat
new_file.txt
file2.txt
file1.txt
P00533.fasta
SHH.xml
book.xml
egfr.fasta
egfr.gb
serotonin_data.txt
annot_test.txt


In [26]:
## Sort the file names and print to the screen
sort < files.txt

P00533.fasta
SHH.xml
annot_test.txt
book.xml
egfr.fasta
egfr.gb
file1.txt
file2.txt
files.txt
hello.txt
new_file.txt
pickle_data.dat
serotonin_data.txt
sorted_files.txt
test.txt


In [27]:
## Or save the sorted file names to another text file
sort < files.txt > sorted_files.txt



### Pipes |

Pipes are another form of redirection and can be used to redirect the output from one command to the input of another command. Any number of pipes or redirection characters can be strung together to perform multi-step tasks that might otherwise be cumbersome.

    ls | sort > sorted_files.txt
    
    ls | wc -l > num_files.txt
    
    ls -lt | head -n 5
    
    ls -t | grep ".py$" | sort | head -n 5
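Since the output of `ls` depends on the directory, a pipeline is easier to reason about with fixed input. The sketch below pipes a made-up list of gene symbols through the utilities introduced above; the variable names are hypothetical, and `grep -v` (print lines that do *not* match) is an option not covered in the table earlier.

```bash
# Five gene symbols, one per line
genes=$(printf 'SHH\nEGFR\nTP53\nEGFR\nBRCA1\n')

# Two stages: grep keeps the lines containing EGFR, wc -l counts them.
egfr_count=$(echo "$genes" | grep "EGFR" | wc -l)
echo $egfr_count           # 2

# Three stages: drop the EGFR lines, sort the rest, keep the first line.
first_other=$(echo "$genes" | grep -v "EGFR" | sort | head -n 1)
echo "$first_other"        # BRCA1
```

Each command only sees the output of the command to its left, so stages can be added or removed one at a time while building a pipeline.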

In [28]:
## How many files are in the current directory
ls | wc -l

      15


In [29]:
## Show a few of the most recently modified files
ls -lt | head -n 5

total 504
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users    178 Sep 21 11:10 sorted_files.txt
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users    178 Sep 21 11:10 files.txt
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users     14 Sep 21 11:10 hello.txt
-rw-r--r--  1 mooneymi  OHSUM01\Domain Users     13 Sep 21 10:03 test.txt


## File Manipulation in Linux

### `AWK`

`awk` can be used to easily manipulate delimited files. `awk` is a programming language in itself, and has many options that we won't cover here. The general structure of an `awk` command is as follows:

    awk -F 'delimiter' 'condition {command}' file

For more detailed info on `awk`: [http://tldp.org/LDP/Bash-Beginners-Guide/html/chap_06.html](http://tldp.org/LDP/Bash-Beginners-Guide/html/chap_06.html)

#### `awk` Built-in Variables
<table align="left">
<tr><td style="text-align:center"><b>Variable</b></td><td><b>Description</b></td></tr>
<tr><td style="text-align:center">`NR`</td><td>The current row number</td></tr>
<tr><td style="text-align:center">`NF`</td><td>The number of fields in the line</td></tr>
<tr><td style="text-align:center">`$0`</td><td>The entire line</td></tr>
<tr><td style="text-align:center">`$n`</td><td>Where `n` is an integer, the nth field</td></tr>
</table>

#### Examples
The first example below prints the number of fields on the first line of the file `annot_test.txt`. The second example prints the first two fields from the first 10 lines of the file.

    awk -F '\t' 'NR==1 {print NF}' annot_test.txt
    
    awk -F '\t' 'NR<=10 {print $1"\t"$2}' annot_test.txt
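The condition can also test the value of a field, and the command can accumulate values across lines. The following sketch runs on a small made-up table rather than `annot_test.txt`; the file name, column layout, and variable names are assumptions, and the `END` block (run once after the last line) is an `awk` feature not described above.

```bash
tmpdir=$(mktemp -d)
printf 'probe\tchrom\tlength\np1\tchr21\t1145\np2\tchr7\t800\np3\tchr21\t300\n' > "$tmpdir/probes.txt"

# Condition on a field: print the probe name where column 2 is chr21.
chr21_probes=$(awk -F '\t' '$2 == "chr21" {print $1}' "$tmpdir/probes.txt")
echo "$chr21_probes"       # p1 and p3, one per line

# Skip the header (NR > 1), add up column 3, and print the sum at the end.
total_length=$(awk -F '\t' 'NR > 1 {sum += $3} END {print sum}' "$tmpdir/probes.txt")
echo "$total_length"       # 2245

rm -r "$tmpdir"
```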

In [30]:
awk -F '\t' 'NR==1 {print NF}' annot_test.txt

14


In [31]:
awk -F '\t' 'NR<=10 {print $1"\t"$2}' annot_test.txt

ProbeID	PrimaryAccession
A_23_P253586	NM_005128
A_23_P217507	NM_004729
A_24_P538590	AK092032
A_24_P569294	NM_021107
A_23_P259451	NM_007080
A_32_P219520	NM_014350
A_32_P38619	THC2760960
A_24_P153234	A_24_P153234
A_23_P76006	NM_001235


### `sed`

`sed` is another powerful Linux utility for manipulating files. It is particularly useful for quickly editing files (e.g. performing find-and-replace operations). Like `awk`, `sed` has numerous commands. For more information see the following: [http://tldp.org/LDP/Bash-Beginners-Guide/html/chap_05.html](http://tldp.org/LDP/Bash-Beginners-Guide/html/chap_05.html)

In [32]:
## Create a test file
echo 'Hello World!' > test.txt



In [33]:
## Print the contents of the file to the screen
cat test.txt

Hello World!


In [34]:
## Use sed to replace all instances of l with X
## The pattern is s (for substitution), / (just a delimiter), a pattern to match, /, the replacement, / g (for global)
sed 's/l/X/g' test.txt 

HeXXo WorXd!


In [35]:
## Omitting the 'g' at the end tells sed to replace only the first occurrence in each line
sed 's/l/X/' test.txt 

HeXlo World!
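`sed` also reads from STDIN, so it fits naturally into a pipeline. The sketch below applies two substitutions to a shortened copy of the FASTA header shown earlier; the variable names are hypothetical, and the `^` anchor (match only at the start of the line) is regular-expression syntax not covered above.

```bash
fasta_header='>sp|P00533|EGFR_HUMAN Epidermal growth factor receptor'

# Anchor the pattern with ^ so only the leading > is removed.
no_marker=$(echo "$fasta_header" | sed 's/^>//')
echo "$no_marker"          # sp|P00533|EGFR_HUMAN Epidermal growth factor receptor

# Replace every | with a space (g = all occurrences on the line).
spaced=$(echo "$fasta_header" | sed 's/|/ /g')
echo "$spaced"             # >sp P00533 EGFR_HUMAN Epidermal growth factor receptor
```

In both cases the original text is left unchanged; `sed` prints the edited copy to STDOUT.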


### `cut`

`cut` can be used to extract columns from a delimited file. The first example below will print the second and third columns of the file `annot_test.txt`. The second example will print the first four columns. The `-f` option selects fields separated by a delimiter (tab by default; use `-d` to change it), while the `-c` option selects by character position.

    cut -f 2,3 annot_test.txt
    
    cut -f 1-4 annot_test.txt
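The `-d` and `-c` options can be illustrated on a single comma-separated record. This is a minimal sketch; the record is made up from identifiers that appear in `annot_test.txt`, and the variable names are hypothetical.

```bash
record='A_23_P253586,NM_005128,DOPEY2'

# -d sets the delimiter; -f then picks fields 1 and 3.
fields=$(echo "$record" | cut -d ',' -f 1,3)
echo "$fields"             # A_23_P253586,DOPEY2

# -c ignores delimiters and selects by character position.
prefix=$(echo "$record" | cut -c 1-4)
echo "$prefix"             # A_23
```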


In [36]:
cut -f 2,3 annot_test.txt

PrimaryAccession	RefSeqAccession
NM_005128	NM_005128
NM_004729	NM_004729
AK092032	
NM_021107	NM_021107
NM_007080	NM_007080
NM_014350	NM_014350
THC2760960	
A_24_P153234	
NM_001235	NM_001235


In [37]:
cut -f 1-4 annot_test.txt

ProbeID	PrimaryAccession	RefSeqAccession	GenbankAccession
A_23_P253586	NM_005128	NM_005128	NM_005128
A_23_P217507	NM_004729	NM_004729	NM_004729
A_24_P538590	AK092032		AK092032
A_24_P569294	NM_021107	NM_021107	NM_021107
A_23_P259451	NM_007080	NM_007080	NM_007080
A_32_P219520	NM_014350	NM_014350	NM_014350
A_32_P38619	THC2760960		
A_24_P153234	A_24_P153234		
A_23_P76006	NM_001235	NM_001235	NM_001235


### `paste`

`paste` can be used to merge two files. Corresponding rows of the input files are concatenated, separated by a delimiter (tab is default), and printed to the screen. For example, using `paste` to combine two tab-delimited input files, each with 2 columns and 10 rows, will result in 10 rows of 4 columns being printed to the screen. The `-d` option can be used to specify the delimiter.

    paste file1.txt file2.txt
    
    paste -d ',' file1.txt file2.txt

In [38]:
cut -f 1 annot_test.txt > file1.txt



In [39]:
cut -f 3 annot_test.txt > file2.txt



In [40]:
paste file1.txt file2.txt

ProbeID	RefSeqAccession
A_23_P253586	NM_005128
A_23_P217507	NM_004729
A_24_P538590	
A_24_P569294	NM_021107
A_23_P259451	NM_007080
A_32_P219520	NM_014350
A_32_P38619	
A_24_P153234	
A_23_P76006	NM_001235


In [41]:
paste -d ',' file1.txt file2.txt

ProbeID,RefSeqAccession
A_23_P253586,NM_005128
A_23_P217507,NM_004729
A_24_P538590,
A_24_P569294,NM_021107
A_23_P259451,NM_007080
A_32_P219520,NM_014350
A_32_P38619,
A_24_P153234,
A_23_P76006,NM_001235


## Bash Scripting

#### Why Script?

1. Automate repetitive tasks
2. Create tailored sequences of commands
3. Link together software tools written in different languages 

Writing a bash script is very similar to writing a Python program. A bash script is a text file containing commands that are interpreted by the bash interpreter. Just as in Python, the first line should contain a shebang (e.g. `#!/bin/bash`) indicating the location of the interpreter. To run a bash script you have two options:

    bash my_script.sh
    
    chmod 755 my_script.sh
    ./my_script.sh

#### Local Variables

You can assign user variables inside bash scripts with the format: `variable=value`. There must be no spaces between the `=` and the variable name or value. To access the variable, prefix the variable name with a `$`.

    first_name="John"
    
Use the `echo` or `printf` commands to print output. `printf` accepts format specifiers much like in Python.
    
    echo $first_name
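A short sketch of `printf` with format specifiers follows. The variable `n_files` and the strings being formatted are made up for illustration; `%s` formats a string, `%d` an integer, and `%03d` an integer padded with zeros to three digits.

```bash
first_name="John"
n_files=15

# Quoting "$first_name" keeps the value intact even if it contains spaces.
echo "Hello, $first_name"

# printf fills the specifiers in order from the arguments that follow.
summary=$(printf "%s has %d files" "$first_name" "$n_files")
echo "$summary"            # John has 15 files

# Zero-padding is handy for numbered file names.
padded=$(printf "sample_%03d.txt" 7)
echo "$padded"             # sample_007.txt
```

Unlike `echo`, `printf` does not add a newline on its own; include `\n` in the format string when printing directly to the screen.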

#### Command-line Arguments

To access command-line arguments within a bash script use the special variables `$1`, `$2`, `$3`, etc. For example, to supply a file name to a script you would run the script as follows:

    bash my_script.sh hello.txt
    
Within the script `my_script.sh` the filename argument might be used as follows:

    filename=$1
    
    if [ -r "$filename" ]; then
        ## Read the file on STDIN so wc prints only the count, not the file name
        linecount=$(wc -l < "$filename")
        printf "%s has %d lines\n" "$filename" "$linecount"
    fi
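To watch the positional variables being filled in, the sketch below writes a tiny script to a temporary directory and runs it with two arguments. The script name `show_args.sh` is hypothetical. Two things used here are not covered above: the here-document (`<<'EOF' ... EOF`), which is only a convenient way to create the file from within this example, and `$#`, which holds the number of arguments.

```bash
tmpdir=$(mktemp -d)

# Write a small script; the quoted 'EOF' stops $1 and $# from expanding here.
cat > "$tmpdir/show_args.sh" <<'EOF'
#!/bin/bash
# $# is the number of arguments, $1 and $2 are the first two
echo "count=$# first=$1 second=$2"
EOF

# Quotes make "my file.txt" a single argument despite the space.
args_output=$(bash "$tmpdir/show_args.sh" hello.txt "my file.txt")
echo "$args_output"        # count=2 first=hello.txt second=my file.txt

rm -r "$tmpdir"
```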

#### Exit Status

Programs return an exit status to indicate whether the program ran successfully or not (i.e. an error occurred). An exit status of 0 indicates the command ran successfully. An exit status other than 0 indicates a failure, with specific codes indicating different errors (exact codes depend on the program). We can use the exit status of commands to make decisions in our scripts. For example, an exit status can be used as a condition in control structures (as seen below). The special variable `$?` holds the exit status of the previous command.

    grep -q "ATCG" text.txt
    if (( $? == 0 )); then
        echo "ATCG found."
    fi
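The sketch below captures the exit status of `grep` for a match and for a non-match, using a scratch file with a made-up sequence. The variable names are hypothetical. It also shows that `if` can test a command directly, which is equivalent to checking `$?` afterwards.

```bash
tmpfile=$(mktemp)
echo "GGATCGTT" > "$tmpfile"

# -q suppresses output; only the exit status is of interest.
grep -q "ATCG" "$tmpfile"
found_status=$?            # 0: the pattern was found

grep -q "TTTT" "$tmpfile"
missing_status=$?          # 1: grep uses status 1 for "no match"

echo "$found_status $missing_status"   # 0 1

# if can also test the command directly, without going through $?
if grep -q "ATCG" "$tmpfile"; then
    echo "ATCG found."
fi

rm "$tmpfile"
```

Save `$?` into a variable right away if you need it later, because every subsequent command overwrites it.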

### Bash Control Structures

#### if/elif/else blocks


    if [ condition1 ]; then
        command block
    elif [ condition2 ]; then
        command block
    else
        command block
    fi
        

#### Bash Conditions

<table align="left">
<tr><td style="text-align:center"><b>Condition</b></td><td><b>Description</b></td></tr>
<tr><td style="text-align:center">`[ -e file ]`</td><td>File exists</td></tr>
<tr><td style="text-align:center">`[ -d directory ]`</td><td>Directory exists</td></tr>
<tr><td style="text-align:center">`[ -r file ]`</td><td>File exists and is readable</td></tr>
<tr><td style="text-align:center">`[ -w file ]`</td><td>File exists and is writable</td></tr>
<tr><td style="text-align:center">`[ -x file ]`</td><td>File exists and is executable</td></tr>
<tr><td style="text-align:center">`[ STRING1 == STRING2 ]`</td><td>`STRING1` equals `STRING2`</td></tr>
<tr><td style="text-align:center">`[ STRING1 != STRING2 ]`</td><td>`STRING1` does not equal `STRING2`</td></tr>
<tr><td style="text-align:center">`[[ STRING =~ PATTERN ]]`</td><td>`STRING` matches RegEx pattern `PATTERN` (Note the double square brackets).</td></tr>
<tr><td style="text-align:center">`[ NUM1 -eq NUM2 ]`</td><td>`NUM1` is equal to `NUM2`</td></tr>
<tr><td style="text-align:center">`[ NUM1 -ne NUM2 ]`</td><td>`NUM1` is not equal to `NUM2`</td></tr>
<tr><td style="text-align:center">`[ NUM1 -gt NUM2 ]`</td><td>`NUM1` is greater than `NUM2`</td></tr>
<tr><td style="text-align:center">`[ NUM1 -ge NUM2 ]`</td><td>`NUM1` is greater than or equal to `NUM2`</td></tr>
<tr><td style="text-align:center">`[ NUM1 -lt NUM2 ]`</td><td>`NUM1` is less than `NUM2`</td></tr>
<tr><td style="text-align:center">`[ NUM1 -le NUM2 ]`</td><td>`NUM1` is less than or equal to `NUM2`</td></tr>
<tr><td style="text-align:center">`(( NUM1 == NUM2 ))`</td><td>`NUM1` is equal to `NUM2` (Note the double parentheses--these conditions only accept integers; <br />`<, >, <=, >=, !=` operators can also be used).</td></tr>
</table>

Multiple conditions can be combined using 'and' and 'or' operators. The operators `-a` and `-o` are used within single brackets, while `&&` and `||` are used within double brackets.
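Both styles of combining conditions can be sketched as follows. The scratch file and the variables `count`, `single`, and `double` are made up for illustration.

```bash
tmpfile=$(mktemp)
echo "ATCG" > "$tmpfile"
count=3

# Single brackets: -a is 'and', -o is 'or'
if [ -r "$tmpfile" -a "$count" -gt 0 ]; then
    single="both true"
fi

# Double brackets: && and || do the same job
# (-d is false here because tmpfile is not a directory, but count < 10 is true)
if [[ -d "$tmpfile" || "$count" -lt 10 ]]; then
    double="at least one true"
fi

echo "$single / $double"   # both true / at least one true

rm "$tmpfile"
```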

#### while loops

    i=0
    while [ "$i" -lt 10 ]; do
        echo "$i"
        i=$((i+1))
    done

#### for loops

There are a few different methods for constructing a `for` loop in bash. Here are some examples:

    ## For loop example 1
    for i in 1 2 3 4 5
    do
        echo $i
    done
    
    ## For loop example 2
    for file in *
    do
        if [ -r "$file" ]; then
            echo "'$file' is readable."
        fi
    done
    
    ## For loop example 3
    for (( i=1; i<=5; i++ ))
    do
        echo $i
    done
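A `for` loop can also iterate over the files matching a wildcard pattern. The sketch below counts the total number of lines across the `.txt` files in a scratch directory; the file names and the variables `total` and `lines` are hypothetical.

```bash
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/one.txt"
printf 'c\n'    > "$tmpdir/two.txt"
touch "$tmpdir/notes.md"   # does not match *.txt, so the loop skips it

total=0
# The pattern *.txt expands to the matching file names.
for file in "$tmpdir"/*.txt
do
    lines=$(wc -l < "$file")
    total=$((total + lines))
done
echo "$total"              # 3

rm -r "$tmpdir"
```

Quoting `"$file"` inside the loop keeps file names that contain spaces in one piece.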

## In-Class Exercises

Exercise 1. 

Create two simple Python programs: one that generates a random DNA sequence to the STDOUT, and another that processes sequence data from STDIN (e.g. calculates the sequence length) and returns the result to STDOUT. Use the Python functions `sys.stdout.write()` and `sys.stdin.read()`. 

Construct a pipeline that calls the two Python programs, and saves the output to a file.

Exercise 2.

Write a bash script that takes a filename as an argument. The script should check that the file is a Python script (.py extension) and that it is readable, then print the first few lines to the screen.

Exercise 3.

Write a script that will iterate through all files in the current directory and perform some action (or not) depending on the file contents (e.g. results of a `grep` command).

## References

- Running Linux, Matt Welsh, Matthias Kalle Dalheimer, Lar Kaufman, O’Reilly (1999)
- Classic Shell Scripting, Arnold Robbins, Nelson H. F. Beebe, O'Reilly (2005)
- [https://linuxacademy.com/blog/linux/conditions-in-bash-scripting-if-statements/](https://linuxacademy.com/blog/linux/conditions-in-bash-scripting-if-statements/)
- [http://tldp.org/LDP/Bash-Beginners-Guide/html/index.html](http://tldp.org/LDP/Bash-Beginners-Guide/html/index.html)

#### Last Updated: 21-Sep-2016