#### Let's get started:

STEP 1: MAKE A COPY OF THIS NOTEBOOK AND BEGIN WORKING IN THAT NOTEBOOK

STEP 2: Double-click to get into editing mode. FILL OUT THE FOLLOWING INFORMATION

**_User(s)_**: enter your name(s) here

**_Date_**: enter the date here

**_Description_**: This notebook is to learn and practice UNIX commands. This will be useful when we run the bioinformatics tool: QIIME 2.

## Introducing the Shell

*Lesson adapted from ["The Carpentries The UNIX Shell"](https://swcarpentry.github.io/shell-novice/)*
*Download shell-lesson-data.zip and move the file to your Desktop.*


**Objectives**

- What is a command shell and why should we use one?
- Learn how to explain how the shell relates to the keyboard, the screen, the operating system and programs.
- Learn why command-line interfaces are used versus graphical interfaces.

### Background

Computers do four things:
- run programs
- store data
- communicate with each other, and
- interact with us

Computers "interact with us" in many different ways. Can you name a few?

We use hardware interfaces like the keyboard, mouse, and touch screen, or speech recognition systems. Think of how these interfaces allow us to click menu selections and drag-and-drop.

Although most modern desktop operating systems (OS) communicate with their human users by means of windows, icons and pointers, these software technologies didn’t become widespread until the 1980s. What did we do beforehand?

#### The Command-Line Interface

This kind of interface is called a command-line interface, or CLI, to distinguish it from a graphical user interface, or GUI (pronounced: goo-ey), which most people now use. 

The heart of a CLI is a **r**ead-**e**valuate-**p**rint **l**oop, or **REPL**. When the user types a command and then presses the `Enter` (or `Return`) key, the computer reads it, executes it, and prints its output. The user then types another command, and so on until the user logs off.

#### The Shell

The REPL description makes it sound as though the user sends commands directly to the computer, and the computer sends output directly to the user. 

In fact, there is usually a program in between called a command shell. What the user types goes into the shell, which then figures out what commands to run and orders the computer to execute them. 

**Note** that the command shell is called “the shell” because it encloses the operating system in order to hide some of its complexity and make it simpler to interact with.

A shell is a computer program like any other. What’s special about it is that its job is to run other programs rather than to do calculations itself. The most popular shell is Bash, the Bourne Again SHell (so-called because it’s derived from a shell written by Stephen Bourne). Bash is the default shell on most modern implementations of Unix and in most packages that provide Unix-like tools for Windows.

In [None]:
echo $SHELL

#### Why should you learn to use the shell?

- Many bioinformatics tools can only be used through a command line interface, or have extra capabilities in the command line version that are not available in the GUI.

- In bioinformatics, you often need to do the same set of tasks with a large number of files. Learning to automate those repetitive tasks makes them less error-prone. When humans do the same thing a hundred different times (or even ten times), they’re likely to make a mistake. Your computer can do the same thing a thousand times with no mistakes.

- When you carry out your work in the command-line (rather than a GUI), your computer keeps a record of every step that you’ve carried out, which you can use to re-do your work when you need to.

- Many bioinformatic tasks require large amounts of computing power and **can’t realistically** be run on your own machine. These tasks are best performed using remote computers or cloud computing, which can only be accessed through a shell.

The shell is a nice place to start before moving on to other languages like Python or R.

### Navigating Files and Directories

#### Objectives
- Explain the similarities and differences between a file and a directory.
- Explain the steps in the shell’s read-run-print cycle.
- Identify the actual command, flags, and filenames in a command-line call.
- Demonstrate the use of tab completion, and explain its advantages.

The part of the operating system responsible for managing files and directories is called the file system. It organizes our data into files, which hold information, and directories (also called “folders”), which hold files or other directories.

#### Getting Started

Type the command `whoami`, then press `SHIFT-ENTER` to send the command to the shell. The command’s output is the ID of the current user, i.e., it shows us who the shell thinks we are (CyVerse username):

In [None]:
whoami

More specifically, when we type `whoami` the shell:

1. finds a program called `whoami`,
2. runs that program,
3. displays that program’s output, then
4. displays a new prompt to tell us that it’s ready for more commands.
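
The first steps in the list above can be made visible from inside the shell itself. The sketch below is only an illustration: it uses the shell built-in `command -v` to ask where the program was found before running it, and the variable names `program_path` and `current_user` are made up for this example.

```shell
# Step 1: ask the shell where it finds the program called whoami.
program_path=$(command -v whoami)
echo "The shell found whoami at: $program_path"

# Steps 2 and 3: run the program and capture what it prints.
current_user=$(whoami)
echo "The shell thinks we are: $current_user"
```

The exact path printed depends on the machine, so it may differ between CyVerse and your own computer.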

Next, let’s find out where we are by running a command called `pwd` (which stands for `print working directory`):

In [None]:
pwd

At any moment, our current working directory is our current default directory, i.e., the directory that the computer assumes we want to run commands in unless we explicitly specify something else. For CyVerse, it will print `/home/cyversename`.

To understand what a “home directory” is, let’s have a look at how the file system as a whole is organized. 

**For the sake of this example, we’ll be illustrating the filesystem on *Nelle's* computer.** 

After this illustration, you’ll be learning commands to explore your own filesystem, which will be constructed in a similar way, *but not be exactly identical*.

On Nelle's computer, the filesystem looks like this:

![Nelle's filesystem](http://swcarpentry.github.io/shell-novice/fig/filesystem.svg "Nelle's filesystem")

At the top is the root directory that holds everything else. We refer to it using a slash character `/` on its own; this is the leading slash in `/Users/nelle`.

Inside that directory are several other directories: 
- `bin` (which is where some built-in programs are stored)
- `data` (for miscellaneous data files)
- `Users` (where users’ personal directories are located)
- `tmp` (for temporary files that don’t need to be stored long-term)

`nelle` has an account on her machine. We know that her current working directory, `/Users/nelle`, is stored inside `/Users` because `/Users` is the first part of its name. Similarly, we know that `/Users` is stored inside the root directory `/` because its name begins with `/`.

In this example directory below, underneath `/Users`, we find one directory for each user with an account on Nelle's machine.

![Home directories on Nelle's machine](http://swcarpentry.github.io/shell-novice/fig/home-directories.svg "Home directories on Nelle's machine")

**PRACTICE:** Who are the two other users?

> Chat with the person next to you and explain your result. Use the file system to explain the result.

---------
**ANSWER:**
Nelle's colleagues have files stored in `/Users/imhotep` and `/Users/larry`. Typically, when you open a new command prompt you will be in your home directory to start.

## Commands
Now let’s learn the command that will let us see the contents of our own filesystem. We can see what’s in our home directory by running `ls`, which stands for “listing”:

In [None]:
ls

*(Again, outside the CyVerse image, results may be slightly different depending on your operating system and how you have customized your filesystem.)*

`ls` prints the names of the files and directories in the current directory in alphabetical order, arranged neatly into columns. 


We can make its output more comprehensible by using the **flag** `-F` (also known as a switch or an option), which tells `ls` to add a trailing `/` to the names of directories:

In [None]:
ls -F

**ACTION:** Put your GREEN STICKY UP WHEN DONE.

A command can be run on its own. When you use flags, they come after the command name and before the input (the arguments).
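
To make the order of command, flag, and input concrete, here is a small sketch. It works in a throwaway directory created with `mktemp -d`, and the names `results` and `notes.txt` are hypothetical, so nothing in your own files is touched.

```shell
# Build a throwaway directory holding one sub-directory and one file.
scratch=$(mktemp -d)
mkdir "$scratch/results"
touch "$scratch/notes.txt"

# command: ls    flag: -F    input (argument): the scratch directory
listing=$(ls -F "$scratch")
echo "$listing"

# Tidy up the throwaway directory.
rm -r "$scratch"
```

In the listing, the directory carries a trailing `/` while the plain file does not.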

`ls` has lots of other flags. To find out what they are, we use the `--help` flag.

### Getting help

Many commands and programs that people have written (that can be run from within bash) support the `--help` flag to display more information on how to use the command or program.

Try it by adding the `--help` flag to the command `ls` below:

In [None]:
ls --help

**QUESTION:** We can read the manual of the command `ls` by using the `man` command. Does `ls` come first or does `man` come first? 

> Note: Discuss with partner and try both! Was there an error?

In [None]:
ls man

In [None]:
man ls

**QUESTION:** What is `ls man` doing?

*HINT - look up the usage of `ls`.*

### Exploring Flags

*Listing Recursively and By Time*

Type `ls -R` in the code cell below. 

The `ls` command with the flag `-R` lists the contents of directories recursively, i.e., lists their sub-directories, sub-sub-directories, and so on in alphabetical order at each level.

In [None]:
ls -R

**PRACTICE:** Draw the directory structure (ignore files) that explains this output. Chat with your neighbor!

Type `ls -t` in the code cell below. 

The `ls` command with the flag `-t` lists things by time of last change, with most recently changed files or directories first.

In [None]:
ls -t

Compare this output to that of `ls -R`.

Now type `ls -R -t -l -h`. This combines the flags `-R`, `-t`, `-l` and `-h`: it lists the contents of the directories recursively (`-R`), sorted by time of last change with the most recently changed files first (`-t`), in long-listing format (`-l`), which shows details such as timestamps and file sizes, with the sizes printed in human-readable units (`-h`).

In [None]:
ls -R -t -l -h

Listing contents this way lets you check output files, including their sizes, without opening lots of windows. That is a lot of information for a few keystrokes.
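
Single-letter flags can usually be written together after one dash. The sketch below compares the two spellings on a throwaway directory; the directory and file names are placeholders chosen for this example.

```shell
# Build a small throwaway directory tree to list.
scratch=$(mktemp -d)
mkdir "$scratch/data"
touch "$scratch/data/sequences.fasta" "$scratch/notes.txt"

# The same four flags, written separately and then combined.
separate=$(ls -R -t -l -h "$scratch")
combined=$(ls -Rtlh "$scratch")

if [ "$separate" = "$combined" ]; then same=yes; else same=no; fi
echo "Both spellings give identical output: $same"

# Tidy up the throwaway directory.
rm -r "$scratch"
```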

The output above shows that the home directory contains sub-directories, one of which is called `data`. Earlier we used the `ls` command with the `-F` flag to identify directories. Below, type `ls data` to list the contents of the `data` directory.

In [None]:
ls data

Using `ls` to look inside other directories is helpful. We can use the same strategy to change our location to a different directory and move out of home.

### Changing Locations

As you may now see, using a bash shell is strongly dependent on the idea that your files are organized in a hierarchical file system. Organizing things hierarchically in this way helps us keep track of our work: it’s possible to put hundreds of files in our home directory, just as it’s possible to pile hundreds of printed papers on our desk, but it’s a self-defeating strategy.


We learned that we can look at a directory's contents with `ls`.

**ACTION:** First let's look at our current working directory. Type `pwd`:

In [None]:
pwd

Here we will play with the command to change locations: `cd` followed by a `directory name` changes our working directory. `cd` stands for “change directory”, which is a bit misleading: the command doesn’t change the directory, it changes the shell’s idea of what directory we are in.

We’ll start with the simplest.

There is a shortcut in the shell to move up one directory level that looks like this:

In [None]:
cd ..

`..` is a special directory name meaning “the directory containing this one”, or more succinctly, the parent of the current directory. Sure enough, if we run `pwd` after running `cd ..`, we see that we have moved up one level:

In [None]:
pwd

*Hint - look at the drawing you made previously. It will help you visualize and navigate your directory structure on the virtual machine.*

You've learned the basic commands for navigating the filesystem on your computer: `pwd`, `ls` and `cd`. 

Let’s explore some variations on those commands. What happens if you type `cd` on its own, with no flags and without giving a directory?

In [None]:
cd

Type the command for `print working directory` below:

In [None]:
pwd

It turns out that `cd` without an argument will return you to your home directory, which is great if you’ve gotten lost in your own filesystem.

Check that we’ve moved to the right place by running `pwd` and `ls -F`:

In [None]:
pwd

In [None]:
ls -F

Another shortcut is the `-` (dash) character. `cd` will translate `-` into "the previous directory I was in", which is faster than having to remember, then type, the full path. This is a very efficient way of moving back and forth between directories. The difference between `cd ..` (two periods) and `cd -` is that the former brings you up, while the latter brings you back. You can think of it as the Last Channel button on a TV remote.
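
The difference between "up" and "back" is easier to see in action. This is a minimal sketch in a throwaway directory; the names `project` and `data` are hypothetical, and the script returns to where it started so your own location is unchanged.

```shell
# Remember where we started, then build a throwaway two-level tree.
original_dir=$(pwd)
scratch=$(mktemp -d)
mkdir "$scratch/project" "$scratch/project/data"

cd "$scratch/project/data"

cd ..                      # up: now in project
up_dir=$(basename "$(pwd)")

cd - > /dev/null           # back: returns to data (cd - also prints the directory)
back_dir=$(basename "$(pwd)")

echo "cd .. took us up to: $up_dir"
echo "cd -  took us back to: $back_dir"

# Return to the starting directory and tidy up.
cd "$original_dir"
rm -r "$scratch"
```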

**PRACTICE:** Starting from `/Users/amanda/data/`, which of the following commands could Amanda use to navigate to her home directory, which is `/Users/amanda`?

1. `cd /home/amanda`
2. `cd ../..`
3. `cd ~`
4. `cd home`
5. `cd`
6. `cd ..`


> Note: Use a paper to draw this out.
> Try not to scroll down into the answers.
> Discuss with your peers which works and doesn't work.
>

-------------------------------
**ANSWERS** Be careful scrolling down for answers.

1. No: Amanda’s home directory is `/Users/amanda`.
2. No: this goes up two levels, i.e. ends in `/Users`.
3. Yes: `~` stands for the user’s home directory, in this case `/Users/amanda`.
4. No: this would navigate into a directory named `home` in the current directory, if it exists.
5. Yes: shortcut to go back to the user’s home directory.
6. Yes: goes up one level.

### Make a new directory

Let’s create a new directory called `UNIX-test` using the command `mkdir UNIX-test` (which has no output).

As you might guess from its name, `mkdir` means “make directory”. Since `UNIX-test` is a relative path (i.e., doesn’t have a leading slash), the new directory is created in the current working directory. Type `ls` to confirm that it is there.

Make a new directory called `UNIX-test-2` using `mkdir` but with the flag `-v`. What happens?
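
The relative-path behaviour of `mkdir` can be sketched as follows. The example runs inside a throwaway directory from `mktemp -d` (an assumption of this sketch, not part of the lesson data) and then returns to where it started.

```shell
# Remember where we started and move into a throwaway directory.
original_dir=$(pwd)
scratch=$(mktemp -d)
cd "$scratch"

# No leading slash, so the new directory is created right here.
mkdir UNIX-test
created=$(ls -F)
echo "$created"

# Return to the starting directory and tidy up.
cd "$original_dir"
rm -r "$scratch"
```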

### Good names for files and directories

Complicated names of files and directories can make your life painful when working on the command line. Here we provide a few useful tips for the names of your files.

> Don’t use whitespaces. 
>
> EXAMPLE: `JL water sample ID list copy12.txt`
>
> Whitespaces can make a name more meaningful, but since whitespace is used to separate arguments on the command line, it is better to avoid them in names of files and directories. You can use `-` (dash) or `_` (underscore) instead of whitespace.
>
> EXAMPLE: `JL-water-sample-ID-list-copy12.txt`
>
> Don’t begin the name with `-` (dash).
> Commands treat names starting with `-` as options.
> Stick with letters, numbers, `.` (period), `-` (dash) and `_` (underscore).
>
> Many other characters have special meanings on the command line. We will learn about some of these during this lesson. There are special characters that can cause your command to not work as expected and can even result in data loss.
>
>If you need to refer to names of files or directories that have whitespace or another non-alphanumeric character, you should surround the name in quotes ("").
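
Here is a small sketch of the quoting advice above. The file name is a shortened, made-up variant of the example, and everything happens in a throwaway directory.

```shell
# Remember where we started and move into a throwaway directory.
original_dir=$(pwd)
scratch=$(mktemp -d)
cd "$scratch"

# Quotes make the shell treat the whole name as one argument.
touch "JL water sample.txt"
found=$(ls "JL water sample.txt")
echo "Found: $found"

# Renaming with dashes removes the need for quotes from now on.
mv "JL water sample.txt" JL-water-sample.txt
renamed=$(ls)
echo "Renamed to: $renamed"

# Return to the starting directory and tidy up.
cd "$original_dir"
rm -r "$scratch"
```

Without the quotes, `ls JL water sample.txt` would look for three separate names.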

**Renaming files and directories**

Here is a directory called `thesis`, and within the directory a file named `draft.txt`. In the code cell below, start typing the directory name and press the `tab` key partway through (for example, after the "e"). This will give you options to `autocomplete` the file or directory name. Use the arrow keys to highlight `thesis`, then press enter to `autocomplete`.



In [None]:
ls thesis

`draft.txt` isn’t a particularly informative name, so let’s change the file’s name using `mv`, which is short for “move”. To understand how the `mv` command works, type `man mv` to read its manual page.

In [None]:
man mv

The usage of `mv` is:

`mv source-to-move target-directory` 

`-f` overwrite destination file

`-v` verbose, print source and destination files

Type: 

```
mv -v thesis/draft.txt thesis/quotes.txt
```

In [None]:
mv -v thesis/draft.txt thesis/quotes.txt

The first argument tells mv what we’re “moving”, while the second is where it’s to go. In this case, we’re moving `thesis/draft.txt` to `thesis/quotes.txt`, which has the same effect as renaming the file. Sure enough, `ls` shows us that thesis now contains one file called `quotes.txt`:

In [None]:
ls thesis


Let’s move `quotes.txt` into the current working directory. We use `mv` once again, but this time we’ll just use the name of a directory as the second argument to tell `mv` that we want to keep the filename, but put the file somewhere new. (This is why the command is called “move”.) In this case, the directory name we use is the special directory name `.`, which means the current working directory.

Type:
```
mv thesis/quotes.txt .
```

In [None]:
mv thesis/quotes.txt .

The effect is to move the file from the directory it was in to the current working directory. Use `ls` to show that `thesis` is empty:

In [None]:
ls thesis

Further, `ls` with a filename or directory name as an argument lists only that file or directory. We can use this to see that `quotes.txt` is now in our current directory:

In [None]:
ls quotes.txt

**PRACTICE:** After running the following commands, Jamie realizes that she put the files `sucrose.dat` and `maltose.dat` into the wrong directory:

```
$ ls -F
 analyzed/ raw/
$ ls -F analyzed
fructose.dat glucose.dat maltose.dat sucrose.dat
$ cd raw/
```
Fill in the blanks to move these files to the current directory (i.e., the one she is currently in):

```
$ mv ___/sucrose.dat  ___/maltose.dat ___

```

> Note: Draw this out again.
> Check with the person next to you.
> Think about how this would work in a graphical user interface (GUI).

--------
**ANSWER:** 

- The first command, `ls -F`, lists the two directories in the current working directory: `analyzed/` and `raw/`. The `-F` flag marks directories with a trailing `/`.
- The second command lists the contents in the `analyzed/` directory. The output lists four files, including the two `sucrose.dat` and `maltose.dat`.
- The third command `cd raw/` changes to the directory `raw/`. 
- To move the files from `analyzed` to `raw` (current directory), Jamie will need to use the `mv` command. The `..` is one directory above the current directory and `.` refers to the current directory. Thus:

`$ mv ../analyzed/sucrose.dat ../analyzed/maltose.dat .`
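
If you would like to check this answer yourself, the sketch below rebuilds Jamie's situation in a throwaway directory (the scratch location from `mktemp -d` is an assumption of this example) and runs the same `mv` command.

```shell
# Remember where we started, then recreate Jamie's two directories.
original_dir=$(pwd)
scratch=$(mktemp -d)
mkdir "$scratch/analyzed" "$scratch/raw"
touch "$scratch/analyzed/fructose.dat" "$scratch/analyzed/glucose.dat" \
      "$scratch/analyzed/maltose.dat" "$scratch/analyzed/sucrose.dat"

# Jamie's last step: she is now inside raw/.
cd "$scratch/raw"

# Move the two misplaced files from ../analyzed into the current directory.
mv ../analyzed/sucrose.dat ../analyzed/maltose.dat .

raw_files=$(ls)
analyzed_files=$(ls ../analyzed)
echo "raw now holds:";      echo "$raw_files"
echo "analyzed now holds:"; echo "$analyzed_files"

# Return to the starting directory and tidy up.
cd "$original_dir"
rm -r "$scratch"
```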

### Deleting is Forever

The Unix shell doesn’t have a trash bin that we can recover deleted files from (though most graphical interfaces to Unix do). Instead, when we delete files, they are unhooked from the file system so that their storage space on disk can be recycled. Tools for finding and recovering deleted files do exist, but there’s no guarantee they’ll work in any particular situation, since the computer may recycle the file’s disk space right away.

Type:
```
rm thesis
```

This produces an error because `rm` by default only works on files, not directories.

To really get rid of the directory `thesis`, we must delete it together with anything it contains. We can do this with the recursive option `-r`, as in `rm -r thesis`.
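
Because deleting is forever, it is worth trying `rm` on something disposable first. The sketch below uses a throwaway copy of a `thesis` directory (created with `mktemp -d`, with a made-up file `notes.txt` inside) rather than your real one.

```shell
# Build a throwaway thesis directory containing one file.
scratch=$(mktemp -d)
mkdir "$scratch/thesis"
touch "$scratch/thesis/notes.txt"

# Plain rm refuses to remove a directory.
if rm "$scratch/thesis" 2>/dev/null; then plain_rm=succeeded; else plain_rm=refused; fi
echo "rm on a directory: $plain_rm"

# rm -r removes the directory and everything inside it.
rm -r "$scratch/thesis"
if [ -e "$scratch/thesis" ]; then state=present; else state=gone; fi
echo "after rm -r the directory is: $state"

# Tidy up the now-empty scratch directory.
rmdir "$scratch"
```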

Check if the directory is removed from the current directory by using `ls`:

In [None]:
ls

> Note: Removing the files in a directory recursively can be a very dangerous operation.

### Examining Files

We now know how to switch directories, run programs, and look at the contents of directories, but how do we look at the contents of files?

One way to examine a file is to print out all of the contents using the program `cat`.
```
Usage:
cat [file]
```
In the code cell below, use `cat` to view `file-explanation.txt` in the `data` directory:

In the cell below, list all the contents of the directory `data`. We should see four files:

In the cell below, change into the `data` directory:

In the code cell below, use the command `cat` to print out the contents of the `sequences.fasta` file.

What is the last line of the file?

The last line from above is:
```
>2a06105f4e0e444f2777687683dddf1f
```

*That was a lot of scrolling.*

Use the command `wc` to output the line, word, and byte counts.

In [None]:
wc sequences.fasta

`wc` displays the number of lines, words, and bytes contained in the input file. For the fasta file `sequences.fasta`, it reports 66398 lines. Each sequence takes up two lines (a description line and a sequence line), so dividing 66398 by 2 gives 33,199, the number of sequences in this fasta file.
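
The same arithmetic can be done in the shell. The sketch below uses a tiny made-up FASTA file with three records and assumes, as above, that every record takes exactly two lines. As a cross-check it also counts the description lines directly with `grep -c`.

```shell
# A tiny stand-in FASTA file: three records, two lines each.
fasta=$(mktemp)
printf '>id1\nACGT\n>id2\nGGCC\n>id3\nTTAA\n' > "$fasta"

# Count lines, then halve the count (two lines per record is assumed).
lines=$(wc -l < "$fasta")
sequences=$((lines / 2))

# Cross-check: count the lines that start with ">".
headers=$(grep -c '^>' "$fasta")

echo "sequences from line count: $sequences"
echo "description lines found:   $headers"

# Tidy up the temporary file.
rm "$fasta"
```

If the two numbers disagree for a real file, some sequences probably span more than one line.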

`cat` is a great program to use but when the file is really big, it can be annoying. The line above isn't actually the last line in the file. How can we check?

There’s another way that we can look at files, and in this case, **just look at part of them**. This can be particularly useful if we just want to see the beginning or end of the file, or see how it’s formatted.

The commands are `head` and `tail` and they let you look at the beginning and end of a file, respectively.

In the code cell below, type:

```
head sequences.fasta
```

In [None]:
head sequences.fasta

`head` displays the first ten lines of its input. Above is a `fasta` [file format](https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=BlastHelp). The first line is called the description line:

```
>d2b972a835b4b341cf74a3e05bfa5fce
```
The description line (defline) is distinguished from the sequence data by a greater-than (">") symbol at the beginning. This is a QIIME feature ID number. This will make more sense in the tutorial "Exploring A Microbiome Workflow".

The second line is the nucleotide sequence data:
```
TACGTAGGGTGCGAGCGTTGTCCGGAATTATTGGGCGTAAAGGGCTTGTAGGCGGTTTGTCGCGTCTGCCGTGAAATCCTCTGGCTTAACTGGGGGCGTGCGGTGGGTACGGGCAGGCTTGAGTGCGGTAGGGGAGACTGGAACTCCTGGTGTAGCGGTGGAATGCGCAGATATCAGGAAGAACACCGGTGGCGAAGGCGGGTCTCTGGGCCGTTACTGACGCTG
```
The single-letter nucleotide codes are: (A) - Adenine; (C) - Cytosine; (T) - Thymine; (G) - Guanine.

In the code cell below, type:

```
tail sequences.fasta
```

In [None]:
tail sequences.fasta

`tail` displays the last few lines of its input.

The `-n` option to either of these commands can be used to print the first or last `n` lines of a file.

In [None]:
head -n 1 sequences.fasta

In [None]:
tail -n 1 sequences.fasta

We can also combine commands. A vertical bar, `|`, between two commands is called a `pipe`. To type it, hold down the shift key while pressing the `|` key (located above the `return` key). It tells the shell that we want to use the output of the command on the left as the input to the command on the right. The computer might create a temporary file if it needs to, or copy data from one program to the other in memory, or something else entirely; we don’t have to know or care.

In the code cell below, print the top 6 lines using `head`, then use a `pipe` to pass them to `tail`, which keeps the last 2 of those lines:

In [None]:
head -n 6 sequences.fasta | tail -n 2

The first command prints the 6 lines at the top of the file `sequences.fasta`:
```
line 1: feature ID 1
line 2: sequence feature ID 1
line 3: feature ID 2
line 4: sequence feature ID 2
line 5: feature ID 3
line 6: sequence feature ID 3
```
Using pipe, the output will go through the second command to print the 2 lines from the output:
```
line 5: feature ID 3
line 6: sequence feature ID 3
```

**PRACTICE** How would you only print out the 23rd feature ID + sequence ? Hint: use `head` and `tail`.

> Note: 
> This may require some math to calculate, especially if this is the first time looking at FASTA files.
> Consider how various programs are written to select only a few lines.
> Discuss with your peers before moving into the answers.


**ANSWER**:

1. Determine the lines on which the feature ID and its sequence fall. Each record takes two lines, so feature ID number `k` is on line `2k - 1` and its sequence is on line `2k`.
```
line 1: feature ID 1
line 2: sequence feature ID 1
...
line 9: feature ID 5
line 10: sequence feature ID 5
...
line 39: feature ID 20
line 40: sequence feature ID 20
...
line 45: feature ID 23
line 46: sequence feature ID 23
```
2. From the top of the file, use the `head` command to print out to line 46. Remember you want both the feature ID and sequence.
```
head -n 46 sequences.fasta
```
3. From the output of the head command, you want the last two lines of that output.
```
tail -n 2
```
4. Combine the two with the `|`.

5. Answer:
```
>c6bff886449e4fccdb66ecf1c3f0567f
TACGGGGGGGGCAAGCGTTGTTCGGAATTACTGGGCGTAAAGGGTTCGTAGGTGGCTTGCTAAGTCAGACGTGAAATCCCTCAGCTTAACTGGGGAACTGCGTCTGAGACTGGCCGGCTTGAGTGCAGGAGAGGAACGCGGAATTCCAGGTGTAGCGGTGAAATGCGTAGATATCTGGAGGAACACCGGTGGCGAAGGCGGCGTTCTGGACTGCAACTGACACTG
```
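
The same recipe works for any record number. The sketch below is an illustration on a made-up file of 30 two-line records (the names `feature1`, `SEQ1` and so on are placeholders, not real QIIME IDs); the variable `k` holds the record you want.

```shell
# Build a stand-in FASTA file with 30 two-line records.
fasta=$(mktemp)
i=1
while [ "$i" -le 30 ]; do
  printf '>feature%s\nSEQ%s\n' "$i" "$i"
  i=$((i + 1))
done > "$fasta"

# Record k ends on line 2k, so keep 2k lines and then the last 2 of those.
k=23
record=$(head -n $((2 * k)) "$fasta" | tail -n 2)
echo "$record"

# Tidy up the temporary file.
rm "$fasta"
```

Changing `k` is all it takes to pull out a different record, as long as every record occupies exactly two lines.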

### Combine existing commands

If there is more time, check out the next exercise below!


Now that we know a few basic commands, we can finally look at the shell’s most powerful feature: the ease with which it lets us combine existing programs in new ways.

We’ll start with a directory called `molecules` that contains six files describing some simple organic molecules. The `.pdb` extension indicates that these files are in Protein Data Bank format, a simple text format that specifies the type and position of each atom in the molecule.

In the code cell below, print working directory and navigate to the directory above `molecules`:

In [None]:
cd

In [None]:
pwd

In [None]:
ls -F

Let’s go into that directory:

In [None]:
cd 

Run the command `wc *.pdb`. The `wildcard` `*` in `*.pdb` matches zero or more characters, so the shell turns `*.pdb` into a list of all `.pdb` files in the current directory:

In [None]:
wc *.pdb

Because `*` matches zero or more characters, `*.pdb` matches `ethane.pdb`, `propane.pdb`, and every other file that ends with `.pdb`. On the other hand, `p*.pdb` only matches `pentane.pdb` and `propane.pdb`, because the `p` at the front only matches filenames that begin with the letter `p`.
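
A quick way to see what a wildcard expands to is to hand it to `echo`. This sketch uses empty stand-in files in a throwaway directory; the six `.pdb` names and the extra `notes.txt` are assumptions made for the example.

```shell
# Remember where we started and move into a throwaway directory.
original_dir=$(pwd)
scratch=$(mktemp -d)
cd "$scratch"

# Empty stand-in files: six .pdb files and one file that should not match.
touch cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb notes.txt

# The shell expands each pattern before echo ever runs.
all_pdb=$(echo *.pdb)
p_pdb=$(echo p*.pdb)
echo "*.pdb  expands to: $all_pdb"
echo "p*.pdb expands to: $p_pdb"

# Return to the starting directory and tidy up.
cd "$original_dir"
rm -r "$scratch"
```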

If we run `wc -l` instead of just `wc`, the output shows only the number of lines per file:

In [None]:
wc -l *.pdb

Which of these files is shortest? It’s an easy question to answer when there are only six files, but what if there were 6000? Our first step toward a solution is to run the command:

In [None]:
wc -l *.pdb > lengths.txt

The greater than symbol, `>`, tells the shell to redirect the command’s output to a file instead of printing it to the screen. (This is why there is no screen output: everything that `wc` would have printed has gone into the file `lengths.txt` instead.) The shell will create the file if it doesn’t exist. If the file exists, it will be silently overwritten, which may lead to data loss and thus requires some caution. `ls lengths.txt` confirms that the file exists:

In [None]:
ls lengths.txt

We can now send the content of `lengths.txt` to the screen using `cat lengths.txt`. `cat` stands for “concatenate”: it prints the contents of files one after another. There’s only one file in this case, so cat just shows us what it contains:

In [None]:
cat lengths.txt

In [None]:
sort lengths.txt

We will also use the `-n` flag to specify that the sort is numerical instead of alphabetical. This does not change the file; instead, it sends the sorted result to the screen:

In [None]:
sort -n lengths.txt

We can put the sorted list of lines in another temporary file called `sorted-lengths.txt` by putting `> sorted-lengths.txt` after the command, just as we used `> lengths.txt` to put the output of `wc` into `lengths.txt`.

In [None]:
sort -n lengths.txt > sorted-lengths.txt

In [None]:
ls

Once we’ve done that, we can run another command called `head` to get the first few lines in `sorted-lengths.txt`:

In [None]:
head -n 1 sorted-lengths.txt

Using `-n 1` with `head` tells it that we only want the first line of the file.

Since `sorted-lengths.txt` contains the lengths of our files ordered from least to greatest, the output of head must be the file with the fewest lines.


It’s a very bad idea to try redirecting the output of a command that operates on a file to the same file. For example:
```
$ sort -n lengths.txt > lengths.txt
```
Doing something like this may give you incorrect results and/or delete the contents of `lengths.txt`.
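
To see the danger safely, the sketch below repeats the mistake on a throwaway copy. The numbers in the stand-in `lengths.txt` are made up, and the temporary name `sorted.tmp` is just one possible way to avoid the problem.

```shell
# Remember where we started and move into a throwaway directory.
original_dir=$(pwd)
scratch=$(mktemp -d)
cd "$scratch"

# A stand-in lengths.txt and a copy we are willing to lose.
printf '20\n9\n12\n' > lengths.txt
cp lengths.txt copy.txt

# The shell empties copy.txt before sort gets a chance to read it.
sort -n copy.txt > copy.txt
truncated_size=$(wc -c < copy.txt)
echo "bytes left in copy.txt: $truncated_size"

# Safer: write to a different file first, then rename it.
sort -n lengths.txt > sorted.tmp && mv sorted.tmp lengths.txt
safe_result=$(cat lengths.txt)
echo "$safe_result"

# Return to the starting directory and tidy up.
cd "$original_dir"
rm -r "$scratch"
```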

We have already met the `head` command, which prints lines from the start of a file. `tail` is similar, but prints lines from the end of a file instead. There is also an operator `>>`, which works slightly differently from `>`: `>>` appends the output to the file if it already exists (e.g. when we run the command a second time) instead of overwriting it.

Consider the file  `cubane.pdb`.

In [None]:
head -n 3 cubane.pdb > cubaneUP.txt

Here, the top three lines of the file `cubane.pdb` are redirected to a new file named `cubaneUP.txt`. Look at the file `cubaneUP.txt` using `cat`:

In [None]:
cat cubaneUP.txt

In [None]:
tail -n 2 cubane.pdb >> cubaneUP.txt

Here we take the last two lines of the file `cubane.pdb` and append that output to the file `cubaneUP.txt`. Use `cat` to see the first three lines are the same and two new lines are added. 

In [None]:
cat cubaneUP.txt

If you think this is confusing, you’re in good company: even once you understand what wc, sort, and head do, all those intermediate files make it hard to follow what’s going on. We can make it easier to understand by running sort and head together:

In [None]:
sort -n lengths.txt | head -n 1

**PRACTICE:**
In our current directory, we want to find the 3 files which have the least number of lines. Which command listed below would work?
```
1. wc -l * > sort -n > head -n 3
2. wc -l * | sort -n | head -n 1-3
3. wc -l * | head -n 3 | sort -n
4. wc -l * | sort -n | head -n 3
```
-----------
**ANSWER:**
Option 4 is the solution. The pipe character `|` is used to feed the standard output from one process to the standard input of another. `>` is used to redirect standard output to a file. Try it in the data-shell/molecules directory!
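
If you are not in the molecules directory, the following sketch lets you try option 4 anywhere. It creates four made-up text files with one to four lines each in a throwaway directory and then runs the same pipeline on them.

```shell
# Remember where we started and move into a throwaway directory.
original_dir=$(pwd)
scratch=$(mktemp -d)
cd "$scratch"

# Four stand-in files containing 1, 2, 3 and 4 lines.
printf 'a\n'          > one.txt
printf 'a\nb\n'       > two.txt
printf 'a\nb\nc\n'    > three.txt
printf 'a\nb\nc\nd\n' > four.txt

# Count lines, sort the counts numerically, keep the three smallest.
shortest=$(wc -l *.txt | sort -n | head -n 3)
echo "$shortest"

# Return to the starting directory and tidy up.
cd "$original_dir"
rm -r "$scratch"
```

The `total` line that `wc` prints has the largest count, so it sorts to the bottom and never appears among the first three lines here.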

When do all these items become useful in our microbiome workflow? The program QIIME utilizes these types of sorting, filtering and organization to give a useable, human-readable output. You can begin to explore how these simple commands can be useful!