# BMI565: Bioinformatics Programming & Scripting
#### (C) Michael Mooney (mooneymi@ohsu.edu)
#### * Thanks to Eric Leung for updating this material.
## Week 2: Unix/Linux Commands and Bash Scripting

1. Linux Background
2. Basic Linux Commands
    - Commands for Navigating the Directory Tree
    - File Permissions
    - Environment Variables
    - Other Basic Utilities
3. Input/Output Redirection
    - STDIN and STDOUT
    - Redirecting I/O: `>, >>, <`
    - Pipes
4. File Manipulation in Linux
    - AWK
    - sed
    - cut, etc.
5. Bash Scripts
    - Local Variables
    - Bash Control Structures
    - Exit Status
6. File Transfer
    - curl and wget
    - sftp and scp
7. Useful Tools
    - Screen
    
#### Requirements

1. Bash for Windows (Git Bash), or bash on Linux/Mac

## Getting Started

Open a terminal window (on Windows, use Git Bash or PuTTY) so you can connect to the Linux server and try out the commands demonstrated in this lecture.

First, let's all SSH into `state`.

```sh
ssh <username>@state.ohsu.edu
```

And navigate to your folder.

```sh
cd /home/courses/BMI565/students/<username>
```

## Unix and Linux Background

- Unix was developed at Bell Labs in the 1970s
- Unix is an operating system
- The Linux kernel (the core of the operating system) was developed in 1991 by Finnish-American programmer and computer scientist Linus Torvalds
    - Linux and its derivatives are Unix-like
    - They behave very much like the original Unix operating system
- Multiple "flavors" of Linux with different purposes and uses
    - Ubuntu = general purpose
    - elementary OS = fast and open replacement for Windows and macOS
    - Arch Linux = simple, lightweight distribution
    - More in https://distrowatch.com/

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/b/b5/Linux_Foundation_logo.png/320px-Linux_Foundation_logo.png" width="200" align="left" />
<img src="https://upload.wikimedia.org/wikipedia/commons/a/af/Tux.png" width="200" align="left" />

## Basic Linux Commands

### The Most Important Command - The Manual

```sh
# What does the man command do?
man man

# The manual page for the ls command
man ls
```

### Navigation Commands

When you're confronted with a terminal, here are a few basic commands to get you around your computer.

| Command   | Description             | Plain words                                   |
|-----------|-------------------------|-----------------------------------------------|
| `pwd`     | Print working directory | Your location in your computer                |
| `ls`      | List directory contents | Look at what's around you at your location    |
| `cd`      | Change directory        | Move your position in your computer           |

In sum, the above commands help you navigate around your computer.

#### `pwd`

You can use `pwd` (print working directory) to show you where you are in your computer.

#### `ls`

After knowing where you are, you should find out what files and directories are around you.

You can use various options to change the output depending on what is interesting to you.

| Option | Description                                        |
|--------|----------------------------------------------------|
| -l     | Format into a long list                            |
| -a     | List all files, including hidden dot files         |
| -t     | Sort by last modification time, newest first       |
| -r     | Reverse the sort order                             |
| -X     | Sort alphabetically by extension (GNU `ls` only)   |
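A quick way to try these options is in a scratch directory. The names below (`ls_demo`, `visible.txt`, `.hidden_file`) are made up for this sketch; `-a`, `-l`, `-t`, and `-r` are standard `ls` options on both Linux and macOS.

```sh
# Make a scratch directory with one ordinary file and one hidden "dot" file
mkdir -p ls_demo
cd ls_demo
touch visible.txt .hidden_file

ls        # hidden dot files are not listed
ls -a     # -a also lists dot entries: ., .., and .hidden_file
ls -ltr   # options can be combined: long format, sorted by time, oldest first

# Return to where we started
cd ..
```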

#### `cd`

It is no use if you can't go anywhere, so let's start moving around the computer using `cd`.

Here are some key shortcuts for using `cd`.

- `.` (single period) = this is where you are currently e.g. `cd .` = stays where you are
- `..` (double period) = go up to the parent directory/folder e.g. `cd ..` = goes up
- `-` (hyphen) = go to the previous location

```sh
# Explore briefly, printing your location with pwd after each move
pwd
cd ..
pwd
cd -
```

### Echo and File Exploration

Once you know your way around, you can look at the contents of files using a variety of commands.

| Command | Description                               |
|---------|-------------------------------------------|
| echo    | Display lines                             |
| head    | Prints first few lines                    |
| cat     | Prints entire file and concatenates files |

#### `echo`

The `echo` command displays lines of text.

Some reasons you may want to use `echo` are:

- output messages to the screen
- printing file names

```sh
# Print to the screen some text
echo "Hello, World!"
```

#### `head`

The `head` command will show just the beginning lines of a given file.

By default, this command will show the first 10 lines of a file.

```sh
# Take a look at the first few lines of a file
head /home/courses/BMI565/examples/test.py
```
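To change how many lines `head` prints, pass `-n` followed by a number. This sketch makes its own input with `seq` (which prints a sequence of numbers, one per line) and saves it with `>`, covered later under redirection, so it does not depend on the course files; `numbers.txt` is a made-up name.

```sh
# Write the numbers 1 through 20 to a file, one per line
seq 1 20 > numbers.txt

# Show only the first 3 lines instead of the default 10
head -n 3 numbers.txt
```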

#### `cat`

If you want to see more than just the first few lines of a file, you can use the `cat` command to print out the entire file.

This command can also be used to concatenate or join multiple files together.

```sh
# Print contents of a file to the terminal
cat /home/courses/BMI565/examples/test.py
```
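To see the "concatenate" part in action, give `cat` more than one file: it prints them one after another in the order given. The two input files here are made up for the example.

```sh
# Two small example files (hypothetical names)
printf 'first file\n' > part1.txt
printf 'second file\n' > part2.txt

# The output is the two files joined together
cat part1.txt part2.txt

# Save the joined output into a new file
cat part1.txt part2.txt > combined.txt
```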

#### `less`

The `less` command lets you scroll through a file one screen at a time (press `q` to quit).

```sh
less /home/courses/BMI565/examples/test.py
```

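#### `wc` - word/line count

The `wc` (word count) command counts the lines, words, and bytes in a file; the `-l` option reports only the number of lines.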

```sh
wc -l /home/courses/BMI565/examples/test.py
```
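Without options, `wc` reports all three counts; `-l`, `-w`, and `-c` select just one of them. The input here is generated with `printf` (two lines, three words) rather than read from a course file.

```sh
# Two lines, three words, 14 bytes (newline characters count as bytes)
printf 'one two\nthree\n' > wc_demo.txt

wc wc_demo.txt      # lines, words, and bytes, then the file name
wc -l wc_demo.txt   # lines only
wc -w wc_demo.txt   # words only
wc -c wc_demo.txt   # bytes only
```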

### File Manipulations

Once you're able to move around your computer, let's move files around and make simple changes to them.

| Command   | Description                 | Plain Words                                   |
|-----------|-----------------------------|-----------------------------------------------|
| `touch`   | Change file timestamps      | Creates new file if none existed              |
| `cp`      | Copy files and directories  | Make clones of everything                     |
| `mv`      | Move (and rename) files     | Change where files are in your computer       |
| `rm`      | Remove files or directories | Delete files (use with caution)               |
| `mkdir`   | Make directories            | Create new positions/folders in your computer |
| `rmdir`   | Remove empty directories    | Remove empty folders on your computer         |

In sum, the above commands help create, move, and delete files and directories.

#### `touch`

Moving around your computer is great and all, but without files to open and edit, it can be kind of boring.

The `touch` command is used for creating empty files quickly.

This can be useful for creating the skeleton of an analysis workflow.

```sh
# Create a new empty file quickly (make sure you are in your own directory)
touch new_file_1.txt
ls
```

#### `cp`

Now that we have a file, we can make a copy of it with `cp`.

The `cp` command is used to copy files and directories.

```sh
# Create a copy of a file
cp new_file_1.txt new_file_2.txt
ls
```

#### `mv`

The `mv` command is used to **move** files and directories around your computer.

Another use for this command is to **rename** things on your computer.

```sh
# "Rename" file by moving to new name
mv new_file_2.txt new_file_2_update.txt
ls
```

#### `rm`

The `rm` command **removes/deletes** files (and, with the `-r` option, directories).

**Use with caution:** there is no trash bin for retrieving deleted files on Linux.

```sh
# Delete and remove files with rm
rm new_file_2_update.txt
ls
```
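By itself, `rm` only deletes files; to delete a directory and everything inside it you need `-r` (recursive). Adding `-i` makes `rm` ask for confirmation before each deletion, a cheap safety net while you are learning. The `old_results` directory below is a made-up example.

```sh
# Build a small directory with two files in it
mkdir -p old_results
touch old_results/run1.txt old_results/run2.txt

# Without -r, rm refuses to delete a directory
rm old_results || echo "rm needs -r to remove a directory"

# -r removes the directory and all of its contents (there is no undo!)
rm -r old_results
ls
```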

#### `mkdir` and `rmdir`

Creating lots of files can get messy, so having an organization structure with folders can help keep your files tidy.

The `mkdir` command **makes** directories, while the `rmdir` command **removes** (empty) directories.

Let's create a new directory with `mkdir`.

```sh
# Create new folders/directories
mkdir new_dir
cd new_dir
pwd
```

We may eventually want to remove a directory. Empty directories can be removed with the `rmdir` command, which is similar to, but more limited than, the `rm` command we just learned about. Since we are currently *inside* `new_dir`, we first move back up to its parent directory.

```sh
# Move up to the parent directory, then delete the new (empty) directory
cd ..
rmdir new_dir
ls
```
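Two related behaviors are worth knowing: `mkdir -p` creates any missing parent directories in one step (and does not complain if the directory already exists), and `rmdir` refuses to delete a directory that still contains files. The `project/data/raw` layout is just an illustration.

```sh
# Create nested directories in one command
mkdir -p project/data/raw
ls project/data

# Put a file inside, then try to remove the directory with rmdir
touch project/data/raw/sample.txt
rmdir project/data/raw || echo "rmdir failed: directory is not empty"
```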

### File Permissions

File permissions can be an unfamiliar concept, especially if they have never affected you. Every file and directory has permissions that dictate **what** (type of action) can be done by **whom** (class of user). There are three classes of users and three types of actions:

#### Permission Groups

There are three permission groups, plus a shorthand that covers all of them:

- **owner/user (u)** = the owner of the file/directory
- **group (g)** = members of the group assigned to the file/directory
- **other users (o)** = everyone else (users who are neither the owner nor in the file's group)
- **all users (a)** = shorthand for u, g, and o together

#### Permission Types

There are three permission types:

- **read (r)** = read the contents of a file, or list the contents of a directory
- **write (w)** = change a file, or create, delete, and rename files within a directory
- **execute (x)** = run a file as a program, or enter (`cd` into) a directory and access the files inside it

#### `chmod`

Let's create a file to experiment with its permissions.

```sh
# Create new empty file and explore file permissions
touch restricted_file.txt
ls -l  # List files with extra information
```

The `chmod` command stands for **change mode**; a file's *mode* is the set of permission bits that control who may read, write, or execute it.

This allows you to specify **who** has which permission **type**.

There are two ways to specify a file/directory's permissions:

- **Octal representation** = using binary and numbers
- **Symbolic representation** = letters and semantics (easier to remember, IMO)

**Change permissions with octal representation**

Octal representation uses a base-8 number system to represent the three types of permissions for each of the groups. Each octal digit is a number from 0 to 7.

The power of this representation is its conciseness: every possible combination of permission types fits into a single digit.

Each digit is built from three binary bits, one for each type of permission (read, write, execute).

The table below summarizes all possible combinations.

| Permission               | rwx | Binary | Number |
|--------------------------|-----|--------|--------|
| read, write, and execute | rwx | 111    | 7      |
| read and write           | rw- | 110    | 6      |
| read and execute         | r-x | 101    | 5      |
| read only                | r-- | 100    | 4      |
| write and execute        | -wx | 011    | 3      |
| write only               | -w- | 010    | 2      |
| execute only             | --x | 001    | 1      |
| none                     | --- | 000    | 0      |

Another way to look at this is using just numbers:

- read = 4
- write = 2
- execute = 1

When crafting the correct number, you can follow this general workflow:

- figure out what kind of permissions you want,
- organize these permissions into the structure (read, write, execute),
- translate values to binary,
- translate binary to octal.

Each number can then be used to represent each of the three categories of people: user, group, and others (in that order).

For example, to give **read,write** (4 + 2 = 6) to the owner and **read** (4) to both group and others, you can run the following command with `chmod`:

```sh
# Change permissions of our file
chmod 644 restricted_file.txt
ls -l
```
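The read = 4, write = 2, execute = 1 rule is just addition, so the shell's arithmetic expansion `$(( ... ))` can check it. This sketch builds the mode for owner `rw-`, group `r--`, and others `r--`, applies it to a scratch file (`octal_demo.txt`, a made-up name), and shows the resulting permission string.

```sh
# Each digit is the sum of the permissions it grants
owner=$(( 4 + 2 ))   # read + write -> 6
group=4              # read only    -> 4
others=4             # read only    -> 4
mode="$owner$group$others"
echo "$mode"         # 644

# Apply the mode and compare with the symbolic form that ls -l shows
touch octal_demo.txt
chmod "$mode" octal_demo.txt
ls -l octal_demo.txt   # the line starts with -rw-r--r--
```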

**Change permissions with symbolic representation**

You can also use what is called symbolic representation to modify permissions.

This can be easier to use because you don't have to remember which permission equals which number or how to add the numbers together.

All you need to remember are letters for the different roles:

| Letter | Role   |
|--------|--------|
| u      | user   |
| g      | group  |
| o      | others |
| a      | all    |

And the different permissions:

| Letter | Permission |
|--------|------------|
| r      | read       |
| w      | write      |
| x      | execute    |

Then use one of the following operators to change the permissions:

- `+` = add permission in addition to current permissions
- `-` = remove permission from current permissions
- `=` = set only the specified permissions

To do the same as above,

> ...to give **read,write** (4 + 2 = 6) to user and just **read** (4) to both group and others, ...

you can run the following:

```sh
# Change file permissions with symbols
chmod u=rw,g=r,o=r restricted_file.txt
ls -l
```
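The `+` and `-` operators are handy when you only want to change one permission and leave the rest alone. A common example is giving a script's owner permission to execute it. `my_script.sh` is a placeholder name.

```sh
touch my_script.sh

# Add execute permission for the owner (user) only
chmod u+x my_script.sh

# Remove write permission from group and others, leaving everything else as is
chmod go-w my_script.sh

ls -l my_script.sh
```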

**Resources and more**

- [7 Chmod Command Examples for Beginners (The Geek Stuff)](https://www.thegeekstuff.com/2010/06/chmod-command-examples/)
- [Beginners Guide to File and Directory Permissions (The Geek Stuff)](https://www.thegeekstuff.com/2010/04/unix-file-and-directory-permissions/)
- [Examples of chmod](http://examplenow.com/chmod/)

### Environment Variables

Environment variables are named values that the shell, and the programs it starts, can read to make decisions, for example where your home directory is (`HOME`) or where to look for programs (`PATH`).

You can use the `env` command to see all environment variables that are currently set.

One important environment variable is `PATH`, a colon-separated list of directories that the shell searches, in order, when you type the name of a program.

```sh
# Show which python executable is found first on your PATH
which python
```

You can access environment variables using the dollar sign, `$`, in front of the variable name.

```sh
# Print out our PATH variable
echo $PATH
```

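You can change your path by assigning it a new value with an equals sign, `=`, using a colon, `:`, to join a new directory to the existing path. In the example below, `~/bin` is added to the front (prepended), so it is searched before the other directories. Note: there can be no spaces on either side of the `=`.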

```sh
# Edit our path by adding the ~/bin directory
echo $PATH
PATH=~/bin:$PATH
echo $PATH
```

Changes made to environment variables like this are temporary: if you log off the system, the changes won't be there when you return.

Also, you'll need to `export` a variable for its value to be seen by child processes (e.g. any shell script or program) started from this shell.

```sh
export PATH
```
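`PATH` is normally already exported by your login shell, but new variables you create are not. One way to see the difference is to start a child shell with `sh -c` and ask it to print a variable. `GREETING` is a made-up variable name; it is `unset` first so the demo starts from a clean state.

```sh
unset GREETING
GREETING="hello"

# Not exported yet, so the child shell does not see it: prints "child sees: []"
sh -c 'echo "child sees: [$GREETING]"'

# After export, child processes inherit it: prints "child sees: [hello]"
export GREETING
sh -c 'echo "child sees: [$GREETING]"'
```

The single quotes around the `sh -c` command matter: they stop the *current* shell from expanding `$GREETING`, so the child shell does the lookup itself.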

## Redirect Input and Output

Sometimes when running commands, you'll want to chain together multiple commands or write the output to a file rather than the screen. You can do this by **redirecting** data streams (input or output of commands).

### Standard Streams

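**Streams** are the channels through which data flows into and out of the programs you run in the Linux shell.

There are three standard streams, each with a file descriptor number:

- **STDIN** (0) = standard input, e.g. the keyboard
- **STDOUT** (1) = standard output, e.g. the screen
- **STDERR** (2) = standard error, where error messages are written (also the screen by default)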

### Redirection of Streams

Now that we have an understanding of the basic streams of data available to us, we can redirect these streams however we like using the less than and greater than symbols.

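| Symbol | Stream          | Effect                                                              |
|--------|-----------------|---------------------------------------------------------------------|
| `>`    | Standard output | Write output to a file, overwriting it                              |
| `>>`   | Standard output | Append output to the end of a file                                  |
| `2>`   | Standard error  | Write error messages to a file, overwriting it                      |
| `2>>`  | Standard error  | Append error messages to the end of a file                          |
| `<`    | Standard input  | Read input from a file                                              |
| `<<`   | Standard input  | Here-document: read input from the lines that follow, up to a marker word |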


#### Output and Overwrite with >

You can use the `>` (greater than symbol) character to write the output of a program to a file.

```sh
# Take output of command and redirect it to a file
ls ../ > list_of_files.txt
head list_of_files.txt
```

#### Output and Append with >>

The single greater than sign will overwrite the file you redirect to. But what if you want to keep adding to the same file?

That's where the double greater than sign `>>` comes in.

The double greater than sign adds to the end of the file instead.

```sh
# The -e flag tells echo to interpret backslash escapes such as \n (newline)
echo -e "\nCompare with sorted list" >> list_of_files.txt

# Instead of overwriting a file, let's just append to it
ls -lt ../ >> list_of_files.txt
cat list_of_files.txt
```

#### Input with <

You can also redirect files **into** commands using the less than symbol, `<`.

```sh
# Take file and input it to a command (line count)
wc -l < list_of_files.txt
```
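The redirection table above also lists `2>` and `<<`. Below, `2>` captures an error message in a file instead of printing it to the screen, and `<<` (a *here-document*) feeds the lines up to a marker word, here `EOF`, into a command's standard input. The missing path and the file names are made up for the demo.

```sh
# ls fails on a path that doesn't exist; its error goes to errors.txt, not the screen
ls /this/path/does/not/exist 2> errors.txt || echo "ls failed; see errors.txt"
cat errors.txt

# Here-document: everything up to the EOF line becomes cat's standard input
cat << EOF > heredoc_demo.txt
line one
line two
line three
EOF
wc -l < heredoc_demo.txt
```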

### Pipe

The pipe character, `|` (generally found above the Enter key), will redirect output from one command to the input of another command.

Let's look at what we have so far in our directory.

```sh
# Remind ourselves what is in our working directory
ls
```

Now, say I only want to look at the first 2 entries by using the `head` command.

```sh
# Pipe output of command into another
ls | head -n 2

# Same example but taking more steps
ls > all_files.txt
head -n 2 all_files.txt

# This command is particularly useful if you want to see 
# just a few of the most recently modified files
ls -lt | head
```

#### Resources and more

- [Input Output Redirection in Linux/Unix Examples (Guru99)](https://www.guru99.com/linux-redirection.html)
- [An Introduction to Linux I/O Redirection (Digital Ocean)](https://www.digitalocean.com/community/tutorials/an-introduction-to-linux-i-o-redirection)
- [Input/Output Redirection in the Shell (thoughtbot)](https://robots.thoughtbot.com/input-output-redirection-in-the-shell)
- [Pipeline (Wikipedia)](https://en.wikipedia.org/wiki/Pipeline_(Unix))
- [Bash One-Liners Explained, Part III: All about redirections](http://www.catonmat.net/blog/bash-one-liners-explained-part-three/)

## Exercise Break! (See Exercise #1 below)

## File Manipulation with Built-In Tools

We now have a basic understanding of how to move around our computer using the command line, explore files, and manipulate data to go where we want.

Here, let's explore powerful tools for manipulating text files. Many of these operations are also available in Python and R, but here we can do the same things with time-tested command line tools.

### `awk` - text processing language with strength in tabular data

The `awk` command line tool is a powerful tool for processing text files, especially those organized into rows and columns i.e. tabular data.

#### General syntax

The pseudocode below is borrowed from [Raunak Ramakrishnan](https://dev.to/rrampage/awk---a-useful-little-language-2fhf), whose blog post breaks down how awk works in terms of Python-like pseudocode.

```
initialize()                             # Initializes variables in BEGIN block
for line in input_lines:                 # Divides input into a list of lines
    for condition, action in conditions: # Just list of condition-action pairs
        if condition(line):              # Match line against condition
            action()                     # Perform action on match
```

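In other words, an `awk` program is a sequence of **pattern-action** pairs: for each line, `awk` checks whether the line matches each pattern, and if it does, executes the corresponding action. The optional `BEGIN` and `END` blocks run once, before the first line and after the last line respectively.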

```
BEGIN {...}
CONDITION {action}
CONDITION {action}
END {...}
```
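Here is a small, runnable instance of that structure. It writes a made-up tab-separated table of gene read counts, then uses `BEGIN` to set the field separator, a condition-action pair to add up column 2 on every line after the header (`NR > 1`), and `END` to print the total.

```sh
printf 'gene\tcount\nBRCA1\t12\nTP53\t30\nEGFR\t5\n' > counts.tsv

awk 'BEGIN { FS = "\t"; total = 0 }
     NR > 1 { total += $2 }
     END { print "total reads:", total }' counts.tsv
# prints: total reads: 47
```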

#### Built-in variables

There are some built-in variables that can be used to make using `awk` more powerful. These variables relate to the file itself, such as the number of columns/fields in the file, which may be useful in manipulating the file.

| Variable   | Description                                          | Example                          |
|------------|------------------------------------------------------|----------------------------------|
| FS         | Input field separator                                | `awk 'BEGIN{FS=":";}'`           |
| OFS        | Output field separator                               | `awk 'BEGIN{OFS="=";}'`          |
| RS         | Input record separator (newline by default)          | `awk 'BEGIN{RS=";";}'`           |
| ORS        | Output record separator                              | `awk 'BEGIN{ORS="=";}'`          |
| NR         | Number of the current record, counted across all files | `awk '{print "Number - ", NR;}'` |
| NF         | Number of fields/columns in the current record       | `awk '{print NR,"->",NF;}'`      |
| FILENAME   | Name of the current file                             | `awk '{print FILENAME}'`         |
| FNR        | Number of the current record within the current file | `awk '{print FILENAME, FNR;}'`   |
| `$0`       | The entire line                                      | `awk '{print $0;}'`              |
| `$n`       | The nth field                                        | `awk '{print $1;}'`              |


#### Examples

Now that we have the general syntax, let's try out some `awk` commands.

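```sh
# Reminder of what the -l flag does
ls -l

# Examples of awk and potential actions (NR > 1 skips the "total" line)
# Print the size (column 5) and name (column 9) of each file
ls -l | awk 'NR > 1 { print $5 " " $9 }'
# Arithmetic on a column: file size in kilobytes
ls -l | awk 'NR > 1 { print $5 / 1024 }'
```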
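`ls -l` output is convenient to play with, but real data is often comma-separated. In this sketch (with a made-up `samples.csv`), `-F ','` sets the input field separator, `OFS` sets the output separator, and the condition `NR > 1 && $3 > 10` keeps only data rows whose count is above 10.

```sh
printf 'sample,gene,count\nS1,TP53,30\nS2,EGFR,5\n' > samples.csv

# Print sample and gene, tab-separated, for rows with count > 10
awk -F ',' 'BEGIN { OFS = "\t" } NR > 1 && $3 > 10 { print $1, $2 }' samples.csv
# prints: S1<TAB>TP53
```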

#### Resources and More

- [Awk - A useful little language](https://dev.to/rrampage/awk---a-useful-little-language-2fhf)
- [How to Write AWK Commands and Scripts](https://www.lifewire.com/write-awk-commands-and-scripts-2200573)
- [8 Powerful Awk Built-in Variables (The Geek Stuff)](https://www.thegeekstuff.com/2010/01/8-powerful-awk-built-in-variables-fs-ofs-rs-ors-nr-nf-filename-fnr/)
- [Awk (Grymoire)](http://www.grymoire.com/Unix/Awk.html)
- [awk or gawk (GNU awk)](https://ss64.com/bash/awk.html)
- [Learn by Example awk](https://github.com/learnbyexample/Command-line-text-processing/blob/master/gnu_awk.md)

### `sed` - edit streams of text

The `sed` (stream editor) command is another powerful tool. While `awk` is useful for manipulating tabular data, `sed` reads text line by line and transforms it, for example by substituting one piece of text for another.

A simple use of `sed` is for replacing text.

```sh
# Use sed to replace "day" with "night"
echo Sunday | sed 's/day/night/'
```
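By default, `s/old/new/` only replaces the *first* match on each line; adding the `g` (global) flag replaces every match. `sed` also has commands that act on whole lines, such as `d` to delete them. The input text here is made up.

```sh
# Without a flag, only the first match on each line is replaced
echo "day after day" | sed 's/day/night/'    # night after day

# The g flag replaces every match on the line
echo "day after day" | sed 's/day/night/g'   # night after night

# 2d deletes line 2, leaving lines 1 and 3
printf 'line 1\nline 2\nline 3\n' | sed '2d'
```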

#### Resources and More

- [sed (Grymoire)](http://www.grymoire.com/Unix/Sed.html)
- [sed (SS64)](https://ss64.com/bash/sed.html)

### `cut` - divide file into several parts by columns/delimiter

The `cut` command extracts parts of each line of a file.

It can select parts of each line based on:

- byte position (`-b`)
- character position (`-c`)
- field, i.e. a column separated by a delimiter (`-f`, with `-d` to set the delimiter; tab by default)

```sh
# Slice and dice tabular output, keeping only the first 10 bytes of each line
ls -l
ls -l | cut -b 1-10
```
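Cutting by bytes works for fixed-width output like the permission string, but for delimited data you will usually want fields. In this sketch (with a made-up `samples.csv`), `-d ','` sets the delimiter and `-f 2,3` keeps the second and third columns.

```sh
printf 'sample,gene,count\nS1,TP53,30\nS2,EGFR,5\n' > samples.csv

# -d sets the delimiter and -f selects fields (columns) by number
cut -d ',' -f 2,3 samples.csv
# gene,count
# TP53,30
# EGFR,5
```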

### `paste`

`paste` can be used to merge two files line by line. Each row of the input files is joined, separated by a delimiter (tab by default), and printed to the screen. For example, using `paste` to combine two tab-delimited input files, each with 2 columns and 10 rows, will print 10 rows of 4 columns. The `-d` option can be used to specify a different delimiter.

```sh
paste file1.txt file2.txt

paste -d ',' file1.txt file2.txt
```
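Here is a runnable version with two small, made-up input files of one column each, so each output row has two columns.

```sh
printf 'TP53\nEGFR\n' > genes.txt
printf '30\n5\n' > counts.txt

paste genes.txt counts.txt          # TP53<TAB>30, then EGFR<TAB>5
paste -d ',' genes.txt counts.txt   # TP53,30, then EGFR,5
```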

### `sort` - put items in order

As the name implies, the `sort` command will order a list of items.

```sh
# Rearrange list of items
ls | sort
```
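By default `sort` compares lines as text, character by character, which gives surprising results for numbers. The `-n` option compares numerically and `-r` reverses the order. The numbers below are made up.

```sh
# Text order: 10, 100, 9 (one per line), because "1" sorts before "9"
printf '10\n9\n100\n' | sort

# Numeric order: 9, 10, 100
printf '10\n9\n100\n' | sort -n

# Numeric, largest first: 100, 10, 9
printf '10\n9\n100\n' | sort -nr
```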

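### `uniq` - filters repeated lines in a file

`uniq` removes repeated lines only when they are *adjacent* to each other, so its input is usually sorted first.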

```sh
python /home/courses/BMI565/examples/test.py 5 | uniq
```
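Because `uniq` only collapses adjacent duplicates, it is usually paired with `sort` in a pipe. Adding `-c` prefixes each line with how many times it occurred, which gives a quick frequency table. The list of gene names is made up.

```sh
printf 'TP53\nEGFR\nTP53\nBRCA1\nTP53\n' > gene_hits.txt

# Unsorted: the TP53 lines are not next to each other, so nothing is removed
uniq gene_hits.txt

# Sorted first: duplicates become adjacent, and -c counts them
sort gene_hits.txt | uniq -c

# Rank by count, most frequent first
sort gene_hits.txt | uniq -c | sort -nr
```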

### `grep` - search for patterns in a file

```sh
# Count the number of lines containing "Welcome"
python /home/courses/BMI565/examples/test.py 5 | grep -c "Welcome" 

# Output only the matched parts of each line, one match per line
python /home/courses/BMI565/examples/test.py 5 | grep -o "Welcome"
```
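A few other commonly used `grep` options: `-i` ignores case, `-v` prints lines that do *not* match, and `-n` shows line numbers. The input file is made up so the example runs anywhere.

```sh
printf 'Welcome to BMI565\nwelcome back\nGoodbye\n' > grep_demo.txt

grep "Welcome" grep_demo.txt         # case-sensitive: only the first line
grep -i "welcome" grep_demo.txt      # case-insensitive: the first two lines
grep -v -i "welcome" grep_demo.txt   # lines that do NOT match: Goodbye
grep -n "Goodbye" grep_demo.txt      # with line number: 3:Goodbye
```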

### `find` - search for files

The `find` command searches a directory tree for files that match the criteria you give it.

The general form of this command is

```
find (starting directory) (matching criteria and actions)
```

Here's a table summarizing the types of matching criteria available.

| Criteria     | Description                              |
|--------------|------------------------------------------|
| `-atime n`   | File accessed n days ago                 |
| `-mtime n`   | File modified n days ago                 |
| `-size n`    | File is n blocks big (block = 512 bytes) |
| `-type c`    | File type is c (f = file, d = dir)       |
| `-name nam`  | File name matches pattern nam            |
| `-user usr`  | File's owner is usr                      |
| `-group grp` | File's group is grp                      |
| `-perm p`    | File's access mode is p                  |

A `+` or `-` in front of a number means "more than" or "less than" that value.

| Modifiers    | Description                          |
|--------------|--------------------------------------|
| `-mtime +7`  | Modified more than seven days ago    |
| `-atime -2`  | File accessed less than two days ago |
| `-size +100` | File larger than 100 blocks (50 KB)  |


```sh
# Look for text files accessed less than three days ago
find . -atime -3 -name "*.txt"
```
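Criteria can be combined, and a file must match all of them. This sketch builds a small, made-up directory tree and uses `-type f` together with `-name` to list only regular files ending in `.txt`. Quoting the pattern (`"*.txt"`) keeps the shell from expanding it before `find` sees it.

```sh
mkdir -p find_demo/sub
touch find_demo/a.txt find_demo/sub/b.txt find_demo/sub/c.csv

# Regular files (not directories) whose names end in .txt, anywhere under find_demo
find find_demo -type f -name "*.txt"

# Directories only: find_demo itself and find_demo/sub
find find_demo -type d
```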

#### Resources and More

- [35 Practical Examples of Linux Find Command](https://www.tecmint.com/35-practical-examples-of-linux-find-command/)
- [Use the Unix find command to search for files](https://kb.iu.edu/d/admm)

## Exercise Break! (See Exercise #2 below)

## Bash Scripting

Similar to Python scripting, bash scripting is a quick way to:

- automate repetitive tasks
- create custom sequence of commands
- link together software tools written in different languages

### Local Variables

We covered variables very briefly when we talked about the `$PATH` variable. This is a built-in variable, but you can also create variables yourself.

```sh
# Save and echo out variables
university="Oregon Health & Science University"
echo $university
```

Note that there must be **no spaces** around the equals sign.

To use the variable's value, put a **dollar sign** in front of its name.

As a preview of what variables can do, you can save the output of a command into a variable and use it later.

```sh
# Quick assignment of command output to variables
file_list=$(ls)
echo $file_list
```
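One detail worth knowing early: wrap variables in double quotes when you use them. Unquoted, the shell splits the value on whitespace, so the newlines in the `$(ls)` output are lost; quoted, the value is passed through exactly as stored. The directory and file names below are made up.

```sh
mkdir -p quote_demo
touch quote_demo/a.txt quote_demo/b.txt
file_list=$(ls quote_demo)

echo $file_list     # unquoted: a.txt b.txt (on one line)
echo "$file_list"   # quoted: a.txt and b.txt on separate lines
```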

### Pass in Arguments

Instead of hard-coding (explicitly typing) file names into a script, we often want the script to work with any file we give it.

Let's create a simple script that counts the number of lines in a file, and a test file to try it on.

```sh
# Create a simple bash script that takes a file name as its first argument ($1)
echo '#!/usr/bin/env bash
filename="$1"

if [ -r "$filename" ]; then
  linecount=$(wc -l < "$filename")
  printf "%s has %d lines\n" "$filename" "$linecount"
else
  echo "Cannot read file: $filename" >&2
  exit 1
fi' > count_lines.sh
cat count_lines.sh

# The -e flag interprets backslash characters to create new lines with \n
echo -e "This\nfile\nhas\nseven\nlines\nin\nit" > test_file.txt
echo "" # Print an empty line to separate the outputs
cat test_file.txt
```

Now we can run the script we just created using the file we want as an **argument**.

```sh
# Read in command line arguments
bash count_lines.sh test_file.txt
```

### Control Structures

Similar to Python, you can write statements in bash to control the flow of logic based on conditions or loop through a list of items.

####  if/elif/else blocks

We briefly saw the `if` statement being used earlier.

The general syntax of a conditional is:

```
if [ expression ]; then
  # code to run if 'expression' is true
fi
```

Here's a working example using `if`, `elif`, and `else`:

```sh
# Set variable
object="food"

# Check if variable is car or food (quoting "$object" avoids errors if it is empty)
if [ "$object" == "car" ]; then
  echo "This is a car"
elif [ "$object" == "food" ]; then
  echo "This is food"
else
  echo "I don't know what this is"
fi
```

##### Resources and more

- [6. Conditionals](http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO-6.html)

#### Various Conditions

Below are tables summarizing the various kinds of conditionals in bash.

**Files and Directories**

| Condition          | Description                           |
|--------------------|---------------------------------------|
| `[ -e file ]`      | True if file exists                   |
| `[ -d directory ]` | True if directory exists              |
| `[ -r file ]`      | True if file exists and is readable   |
| `[ -w file ]`      | True if file exists and is writable   |
| `[ -x file ]`      | True if file exists and is executable |

**Compare Strings**

| Condition                 | Description                          |
|---------------------------|--------------------------------------|
| `[ -z STRING ]`           | True if length of STRING is zero     |
| `[ -n STRING ]`           | True if length of STRING is non-zero |
| `[ STRING1 == STRING2 ]`  | True if strings are equal            |
| `[ STRING1 != STRING2 ]`  | True if strings are not equal        |
| `[[ STRING1 < STRING2 ]]` | True if STRING1 sorts before STRING2 |
| `[[ STRING1 > STRING2 ]]` | True if STRING1 sorts after STRING2  |

**Numeric Comparisons**

| Condition            | Description                        |
|----------------------|------------------------------------|
| `[ NUM1 -eq NUM2 ]`  | Two numbers are equal              |
| `[ NUM1 -ne NUM2 ]`  | Two numbers not equal              |
| `[ NUM1 -gt NUM2 ]`  | NUM1 greater than NUM2             |
| `[ NUM1 -ge NUM2 ]`  | NUM1 greater than or equal to NUM2 |
| `[ NUM1 -lt NUM2 ]`  | NUM1 less than NUM2                |
| `[ NUM1 -le NUM2 ]`  | NUM1 less than or equal to NUM2    |
| `(( NUM1 == NUM2 ))` | Two numbers are equal              |

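**Note**: Double parentheses are specifically for arithmetic expressions. Inside single brackets, `>` and `<` are treated as redirection rather than comparison: `[ 5 > 10 ]` does not compare the numbers at all, but creates a file named `10` and returns true. Use `[ 5 -gt 10 ]` or `(( 5 > 10 ))` instead.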
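The sketch below tries one condition from each table, using only single-bracket tests so it also works outside `bash`. The file `conditions_demo.txt` and the variable names are made up; `tr -d ' '` strips the padding that some versions of `wc` add.

```sh
printf 'a\nb\nc\n' > conditions_demo.txt

# File test: -r is true if the file exists and is readable
if [ -r conditions_demo.txt ]; then
  echo "conditions_demo.txt is readable"
fi

# Numeric test: compare the line count against a threshold with -gt
lines=$(wc -l < conditions_demo.txt | tr -d ' ')
if [ "$lines" -gt 2 ]; then
  echo "more than 2 lines ($lines)"
fi

# String test: -z is true for an empty string
empty=""
if [ -z "$empty" ]; then
  echo "the variable 'empty' has zero length"
fi
```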

##### Resources and more

- [Introduction to if](http://www.tldp.org/LDP/Bash-Beginners-Guide/html/sect_07_01.html)
- [What is the difference between double and single square brackets in bash?](https://serverfault.com/questions/52034/what-is-the-difference-between-double-and-single-square-brackets-in-bash)


#### while loops

The `while` loop keeps on running a set of commands **while** some condition is still met.

```sh
COUNTER=0

# Check COUNTER being less than 10
while [ $COUNTER -lt 10 ]; do
  echo The counter is $COUNTER

  # let allows arithmetic expressions to be evaluated
  let COUNTER+=1
done
```
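A very common use of `while` is reading a file one line at a time with `read`. In this sketch (the gene list is made up), `IFS= read -r` reads each line exactly as written, and the file is fed in with `< genes.txt` after `done`. Redirecting into the loop, rather than piping into it, keeps the loop in the current shell, so the counter `n` keeps its final value afterwards.

```sh
printf 'TP53\nEGFR\nBRCA1\n' > genes.txt

n=0
while IFS= read -r gene; do
  n=$((n + 1))
  echo "gene $n: $gene"
done < genes.txt

echo "read $n genes"   # read 3 genes
```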

#### for loops

Like `while` loops, `for` loops repeat a set of commands. A `for` loop, however, runs the commands once for each item in a list and stops when the list is exhausted.

```sh
# Simple for loop example
for i in 1 2 3 4; do
  echo $i
done

# Another way to write the loop above (the echo first prints an empty line)
echo
for i in $(seq 1 4); do  # seq prints sequence of numbers
  echo $i
done
```
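`for` loops are also handy for running the same command on many files. The shell expands the wildcard `*.txt` into the list of matching file names, in sorted order, before the loop starts. The `loop_demo` directory and its files are made up for this sketch.

```sh
mkdir -p loop_demo
printf 'a\n' > loop_demo/one.txt
printf 'a\nb\n' > loop_demo/two.txt

# Quote "$f" so file names containing spaces are handled correctly
for f in loop_demo/*.txt; do
  lines=$(wc -l < "$f" | tr -d ' ')
  echo "$f has $lines lines"
done
```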

### Exit Status Indicators for Scripts

Unix and Linux programs return a number called an **exit code** (or exit status) when they finish running.

An exit code of `0` means the program succeeded; any non-zero code means something went wrong. By convention, `1` signals a general error and `2` misuse of shell commands, and there are a variety of other codes for other situations.

Because scripts are often run from within other scripts, it is important to know whether any of the inner scripts failed so that you can fix them.

To access the exit code of the most recently run command, use the `$?` variable.

```sh
ls %  # <-- This will fail
echo $?

bashscript  # <-- Will also fail because it doesn't exist (exit code 127: command not found)
echo $?
```

Here is how you could use these within your own bash scripts.

```sh
# Create bash script
echo '#!/usr/bin/env bash

head -n 1 ../README.md

if [[ $? -eq 0 ]]; then
  echo "Successfully read beginning of file"; exit 0
else
  echo "Failed to read beginning of file"; exit 1
fi' > test_exit_codes.sh

# Run bash script from above
bash test_exit_codes.sh
```
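Exit statuses are also what the `&&` and `||` operators check: `a && b` runs `b` only if `a` succeeded (status 0), and `a || b` runs `b` only if `a` failed. Since `$?` is overwritten by every command, save it in a variable right away if you need it later. The directory name and the missing path are made up.

```sh
# && : the echo runs only because mkdir succeeded
mkdir -p exit_demo && echo "created exit_demo"

# || : the echo runs only because ls failed (its error message is discarded)
ls /this/path/does/not/exist 2> /dev/null || echo "ls failed"

# grep -q exits with 1 when nothing matches; capture that status for later use
status=0
grep -q "Welcome" /dev/null || status=$?
echo "grep exit status: $status"   # grep exit status: 1
```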

**Resources and more**

- [Understanding Exit Codes and how to use them in bash scripts](http://bencane.com/2014/09/02/understanding-exit-codes-and-how-to-use-them-in-bash-scripts/)
- [Appendix E. Exit Codes with Special meanings](http://www.tldp.org/LDP/abs/html/exitcodes.html)
- [Exit command](https://bash.cyberciti.biz/guide/Exit_command)

## File Transfer and Interacting with the Web and Servers

When data analyses require compute power not available on your local computer, a server dedicated to crunching numbers and analyses may help.

When working with a server, you may want to move files between your own computer and the server. While there are graphical tools to do this, there are command line tools available to you to do this as well.

### `curl` and `wget` retrieve files from servers

`curl` and `wget` are both command line tools that can download contents from servers and the internet.

For simple file downloads, there isn't much of a difference in use.

#### curl

**Note**: the flags below are the letters `O` (as in ostrich) and `o`, not the number zero (`0`). `-O` saves the file under its name on the server, while `-o` lets you choose the name.

```sh
# Download Python's PEP 20 file
curl -O https://raw.githubusercontent.com/python/peps/master/pep-0020.txt

# Do the same thing, but name the downloaded file differently
curl -o zen.txt https://raw.githubusercontent.com/python/peps/master/pep-0020.txt
```

#### wget

```sh
# wget doesn't require any flags if you just want to download the file
wget https://raw.githubusercontent.com/python/peps/master/pep-0020.txt

# Use -O (capital letter O) to give the downloaded file a different name
wget -O pep_zen.txt https://raw.githubusercontent.com/python/peps/master/pep-0020.txt
```
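Inside a script, you usually want a download to fail loudly so that later steps do not run on a missing or broken file. Below is a minimal sketch, assuming either `curl` or `wget` is installed; the function name `download` is hypothetical. It passes `-f` to `curl` so that an HTTP error such as 404 produces a non-zero exit status instead of silently saving the error page. `wget` already exits with a non-zero status on such errors. Either way, the result can be checked with `$?` or `if`.

```sh
#!/usr/bin/env bash
# download URL OUTFILE
# Hypothetical helper: save URL to OUTFILE with curl if available, else wget.
# Returns the tool's exit status, so callers can check $? or use if.
download() {
  local url="$1" out="$2"
  if command -v curl >/dev/null 2>&1; then
    # -f: non-zero exit on HTTP errors (e.g. 404) instead of saving the error page
    # -s -S: hide the progress bar but still print errors; -L: follow redirects
    curl -fsSL --connect-timeout 20 -o "$out" "$url"
  elif command -v wget >/dev/null 2>&1; then
    # -q: quiet; -T: timeout in seconds; -O: name of the output file
    wget -q -T 20 -O "$out" "$url"
  else
    echo "download: neither curl nor wget is installed" >&2
    return 127
  fi
}

url="https://raw.githubusercontent.com/python/peps/master/pep-0020.txt"
if download "$url" zen.txt; then
  echo "Saved $(wc -l < zen.txt) lines to zen.txt"
else
  echo "Download failed; stopping before later steps use zen.txt" >&2
fi
```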

#### Resources and more

- [curl vs wget](https://daniel.haxx.se/docs/curl-vs-wget.html)
- [What is the difference between curl and wget?](https://unix.stackexchange.com/questions/47434/what-is-the-difference-between-curl-and-wget)
- [Linux/Unix: curl Command Download File Example](https://www.cyberciti.biz/faq/curl-download-file-example-under-linux-unix/)
- [Linux wget: Your Ultimate Command Line Downloader](https://www.cyberciti.biz/tips/linux-wget-your-ultimate-command-line-downloader.html)

### `scp` and `sftp` for secure transfer of files

`scp` and `sftp` are both useful command line tools to move files between servers/computers.

For example, you can use either command to move files between your computer and State.

The **s** in front of each of these commands stands for **secure** because they both encrypt the data they transfer.

#### `scp`

`scp` stands for *secure copy*: it copies files between computers, for example from your computer to a server or from a server to your computer.

For example, running the following command on my computer copies this document to State.

```sh
# Copy this file to the home directory (~) of my user account on State
scp unix_outline.org leunge@state.ohsu.edu:~/
```

**Note** the colon after the server name. After the colon, you can type the path where you'd like to put the file. In this case, `~/` copies the file into the home directory of my user account.

You can also copy entire directories using the `-r` (recursive) flag.

```sh
# Copy this entire lecture directory to my home directory on State
scp -r ../../ leunge@state.ohsu.edu:~/
```
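The commands above copy files *to* State, but `scp` also works in the other direction. Whichever argument has the `host:path` form is the remote side. The sketch below prints both directions instead of running them, so you can check the argument order before copying anything. `DRY_RUN`, the `run` helper, and `<username>` are hypothetical placeholders.

```sh
#!/usr/bin/env bash
# Hypothetical placeholder: replace <username> with your own account
remote="<username>@state.ohsu.edu"
DRY_RUN=1   # hypothetical switch: set to 0 to actually run scp

# run: print a command while DRY_RUN=1, otherwise execute it
run() {
  if [[ "$DRY_RUN" -eq 1 ]]; then
    echo "$@"
  else
    "$@"
  fi
}

# Upload: local source first, remote destination (host:path) second
run scp unix_outline.org "${remote}:~/"

# Download: remote source (host:path) first, local destination second
run scp "${remote}:~/unix_outline.org" ./
```

The `~/` is quoted so that the local shell leaves it alone and the path is interpreted on the server.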

#### `sftp`

`sftp` works similarly to `scp` for copying files. One key difference is that
it is **interactive**: you connect once and then type commands at an `sftp>` prompt.

| Command    | Description                              |
|------------|------------------------------------------|
| `cd dir`   | Change directory on server               |
| `lcd dir`  | Change directory on local computer       |
| `ls`       | List files on server                     |
| `lls`      | List files on local computer             |
| `pwd`      | Print working directory on server        |
| `lpwd`     | Print working directory on local computer|
| `get file` | Download file from server to local       |
| `put file` | Upload file from local to server         |
| `exit`     | Exit from sftp program                   |

To start an SFTP session, run the following.

```sh
# Start SFTP session
sftp leunge@state.ohsu.edu
```

Once connected, you should see the `sftp>` prompt.

```sh
sftp>
```

From here, you can run commands to move files between your computer and the server.

```sh
sftp> lls
sftp> put unix_outline.org
sftp> ls
sftp> exit
```
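For repeated transfers, `sftp` can also read its commands from a file instead of the keyboard, using the `-b` (batch) flag. Below is a minimal sketch that assumes a hypothetical batch file name, `upload.sftp`. It also assumes SSH key-based login, because batch mode cannot stop to ask for a password. It only connects when the hypothetical variable `RUN_SFTP` is set to 1.

```sh
#!/usr/bin/env bash
# Write the same commands used interactively above into a batch file,
# one sftp command per line
cat > upload.sftp <<'EOF'
lls
put unix_outline.org
ls
EOF

# Hypothetical placeholder: replace <username> with your own account
remote="<username>@state.ohsu.edu"

# Only connect when explicitly asked (RUN_SFTP=1), so the script is safe to try
if [[ "${RUN_SFTP:-0}" -eq 1 ]]; then
  # -b: read commands from the batch file instead of the keyboard;
  # in batch mode sftp aborts if a command such as put fails
  sftp -b upload.sftp "$remote"
else
  echo "Batch file written; to run it: sftp -b upload.sftp $remote"
fi
```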

#### Resources and more

- [How to exclude file when using scp command recursively](https://www.cyberciti.biz/faq/scp-exclude-files-when-using-command-recursively-on-unix-linux/)
- [Unix / Linux: sftp File From One Server To Another](https://www.cyberciti.biz/faq/sftp-file-from-server-to-another-in-unix-linux/)

## Exercise Break! (See Exercise #3 below)

## Useful Tools

### `screen` 

Screen is a "terminal multiplexer".

In other words, it lets you run several terminal sessions inside a single terminal window and switch between them, without opening another window.

Here are some benefits to using a terminal multiplexer:

- Quickly switch between contexts
- Keep a job running even when disconnected from the server
- Pick up where you left off when you log back in to a server

#### Screen

Initially released in 1987, Screen is a mature and stable terminal multiplexer.

```sh
# Start and open screen session
screen

# "Re-attach" to running session
screen -r

# Look at running sessions
screen -ls

# Create named screen session without attaching
screen -dmS myscreen
```
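A common use of Screen on a server is to start a long job, log out, and check on it later. Below is a minimal sketch of that workflow, assuming GNU Screen is installed. The session name `myjob`, the log file `myjob.log`, and the three-step loop (a stand-in for a real long-running analysis) are hypothetical.

```sh
#!/usr/bin/env bash
session="myjob"   # hypothetical session name

if ! command -v screen >/dev/null 2>&1; then
  echo "screen is not installed on this machine" >&2
elif screen -dmS "$session" bash -c 'for i in 1 2 3; do echo "step $i"; sleep 1; done > myjob.log 2>&1'; then
  # -dmS started a named, detached session; the job keeps running even if
  # you log out, and the session closes on its own when the job finishes
  echo "Started detached session '$session'"
  # The new session should be listed as "(Detached)";
  # "|| true" keeps a script going even if screen -ls reports a non-zero status
  screen -ls || true
else
  echo "Could not start screen session '$session'" >&2
fi

# Later, possibly after logging out and back in:
#   screen -r myjob            # reattach to watch the job
#   screen -S myjob -X quit    # or stop the session without attaching
```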

Screen is controlled with the prefix key <kbd>Ctrl+a</kbd>: press it, release, then press the next key. The following
is a list of commands to use while in Screen.

| Command                 | Description                                 |
|-------------------------|---------------------------------------------|
| <kbd>Ctrl+a c</kbd>     | Create new Screen window                    |
| <kbd>Ctrl+a 0-9</kbd>   | Switch to window by number                  |
| <kbd>Ctrl+a x</kbd>     | Lock terminal                               |
| <kbd>Ctrl+a n</kbd>     | Switch to next window                       |
| <kbd>Ctrl+a space</kbd> | Switch to next window                       |
| <kbd>Ctrl+a k</kbd>     | Close (kill) current window                 |
| <kbd>Ctrl+a A</kbd>     | Set title for current window                |
| <kbd>Ctrl+a d</kbd>     | Detach from Screen and keep session running |
| <kbd>Ctrl+a \|</kbd>    | Split into two side-by-side panes           |
| <kbd>Ctrl+a S</kbd>     | Split into top and bottom panes             |
| <kbd>Ctrl+a Q</kbd>     | Remove all splits except the current pane   |
| <kbd>Ctrl+a tab</kbd>   | Move focus to the next pane                 |
| <kbd>Ctrl+a "</kbd>     | Choose a window from a list                 |
| <kbd>Ctrl+a ?</kbd>     | Display list of all key bindings            |


#### Resources and more

- Manual
  - [Screen User's Manual](https://www.gnu.org/software/screen/manual/html_node/index.html)
- Screen tutorials
  - [Using GNU Screen to Manage Persistent Terminal Sessions - linode](https://www.linode.com/docs/networking/ssh/using-gnu-screen-to-manage-persistent-terminal-sessions/)
  - [A quick tutorial on screen - Matt Cutts](https://www.mattcutts.com/blog/a-quick-tutorial-on-screen/)
  - [Learn to use screen, a terminal multiplexer - dev.to](https://dev.to/thiht/learn-to-use-screen-a-terminal-multiplexer-gl)

## Summary

<span style="display: inline-block">

| Command/Term   | Simple Description                          |
|----------------|---------------------------------------------|
| `man`          | Display the manual for a command            |
| `pwd`          | Display current directory                   |
| `ls`           | List files in directory                     |
| `cd`           | Change current directory                    |
| `echo`         | Display a line of text                      |
| `head`         | Display first few lines of a file           |
| `cat`          | Print entire file                           |
| `less`         | Scroll through a file                       |
| `wc`           | Count lines, words, and characters          |
| `touch`        | Update file timestamp or create empty file  |
| `cp`           | Copy files and directories                  |
| `mv`           | Move files and directories                  |
| `rm`           | Delete files and directories                |
| `mkdir`        | Create new directory                        |
| `rmdir`        | Remove empty directory                      |
| `chmod`        | Change file/directory permissions           |
| `which`        | Show path to command                        |
| `STDIN`        | Input going into a program                  |
| `STDOUT`       | Output coming out of a program              |
| `STDERR`       | Error messages (separate output stream)     |
| `>`            | Redirect output to file (overwrite)         |
| `>>`           | Redirect output to file (append)            |
| `<`            | Redirect file into a command's input        |
| `\|` (Pipe)    | Pass output of one command to the next      |
| `awk`          | Process text in tabular form                |
| `sed`          | Edit streams of text                        |
| `cut`          | Extract columns/fields from lines           |
| `sort`         | Sort lines of text                          |
| `uniq`         | Filter adjacent repeated lines              |
| `grep`         | Search for patterns in text                 |
| `find`         | Search for files                            |
| `$?`           | Exit status of previous command             |
| `curl`         | Transfer data to or from URLs               |
| `wget`         | Download files from servers                 |
| `scp`          | Secure copy over SSH                        |
| `sftp`         | Secure file transfer over SSH (interactive) |
| `screen`       | Terminal multiplexer                        |

</span>

```sh
# Remove all generated files
rm -f count_lines.sh \
    list_of_files.txt \
    pep* \
    restricted_file.txt \
    test_exit_codes.sh \
    test_file.txt \
    zen.txt \
    new_file_1.txt
```

## Exercises

#### Exercise 1: Basic Pipelines

Create two simple Python programs:

1. Generate a random DNA sequence to `STDOUT`
2. Process sequence data from `STDIN` and calculate sequence length to `STDOUT`

Construct a bash pipeline script that calls the two Python programs and saves the
final output to a file.

**Hint**: Use the Python functions `sys.stdout.write()` and `sys.stdin.read()`.

#### Exercise 2: Search for Codons

Using the random DNA sequence Python script from Exercise 1, generate a random sequence of DNA and count the number of times the DNA sequence "TAA" (a stop codon) occurs.

**Hint**: You can use the `-o` flag for `grep` to print all matches.

**Bonus**: Simultaneously find all three stop codons ("TAA", "TAG", "TGA") and count each one.

#### Exercise 3: Download Data and Chop It Up

Create a bash script to download the processed Long Beach V.A. data `processed.va.data` from the [Heart Disease Data Set](https://archive.ics.uci.edu/ml/datasets/Heart+Disease) and filter to keep only subjects with Class 0 (last column). Save this subset of the data to a separate file.

The data can be found at: [https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/](https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/).

**Hint**: Use `wget`/`curl` to download the file and `awk` to subset the data.

## General Resources

- [Linux Commands and Shell Scripting - learnbyexample](https://github.com/learnbyexample/Linux_command_line): overview of Linux and commonly found commands
- [Command Line Text Processing - learnbyexample](https://github.com/learnbyexample/Command-line-text-processing): from finding text to search and replace, from sorting to beautifying text and more
- [Advanced Bash-Scripting Guide](http://tldp.org/LDP/abs/html/): an in-depth exploration of the art of shell scripting
- [Bioinformatics One-Liners - Stephen Turner](https://github.com/stephenturner/oneliners): bash one-liners useful for bioinformatics
- [The Art of Command Line](https://github.com/jlevy/the-art-of-command-line): guide both for beginners and the experienced, with goals of **breadth** (everything important), **specificity** (give concrete examples of the most common case), and **brevity** (avoid things that aren't essential or digressions you can easily look up elsewhere)
- [Bash Handbook - denysdovhan](https://github.com/denysdovhan/bash-handbook): document for those who want to learn Bash without diving in too deeply.
- [Awesome Bash](https://github.com/awesome-lists/awesome-bash): a curated list of delightful Bash scripts and resources
- [Julia Evans' (@b0rk) Twitter snippets](https://twitter.com/i/moments/1026078161115729920): scroll through her photos for hand-drawn descriptions of bash and others
- [Bash Guide for Beginners](http://tldp.org/LDP/Bash-Beginners-Guide/html/index.html)
- [BASH Programming - Introduction HOW-TO](http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html)
- [Bash Pitfalls - Common Errors Bash Programmers Make](http://mywiki.wooledge.org/BashPitfalls)
- [Unix as IDE Series](https://sanctum.geek.nz/arabesque/series/unix-as-ide/)