### Please make sure that you are using the bash kernel to run this notebook ###

Concepts explained in this tutorial:
- What a shell is
- Environment variables
- $PATH
- Navigating the file system
- Relative vs. absolute paths
- Chaining commands using pipes

Commands covered:
- echo
- which
- pwd
- ls
- ls -lt
- mkdir
- cd
- touch
- cat
- cp
- rm
- rm -r
- mv
- head
- tail
- gzip
- gunzip
- zcat

Operators/aliases covered:
- ~
- ..
- \>
- \>\>
- |


# 1.1 Unix Basics#

We'll start by going through some basic Unix commands. "Unix" is a term for a family of operating systems (just like "Windows"). The Mac operating system (macOS, formerly OS X) is part of the Unix family. You will also hear the term "Linux" a lot: Linux refers to a series of operating systems that are also part of the Unix family. Unix operating systems are very popular for running servers. However, while you may be used to interacting with your laptop using a graphical interface, these servers usually do not provide one (graphical interfaces are a LOT of work to build and are less flexible). Instead, you need to interact with them through the command line. Don't worry, it's easy once you get the hang of it, and it looks really cool to people who don't know what you are doing!

## How commands are understood##

Let's clarify a little about how Unix commands are understood. The program that understands your Unix commands is something called a "shell". If you hear the term "bash" get thrown around, just know that this is the name of a shell. There are many different kinds of shells, and some commands behave slightly differently depending on which shell is being run. For now, we will focus on the bash shell.

Let us double check that the bash shell is being run. To do this, we will use the command "echo $SHELL" illustrated below:

In [None]:
#lines that begin with a hashtag are comments; they are ignored
#by the shell.
echo $SHELL

Let's understand in detail how the command above was understood by the shell.

Commands tend to have the format:<br />
[name of the program] [one or more arguments to the program...]<br />
("arguments" just refers to all the terms that control the behaviour of the program).

In the example above, "echo" is the name of the program. The echo program prints the value of its arguments to the screen.

There is also a concept of an "environment variable". A variable is something that stores information, and an "environment variable" is a variable that the shell and the programs it launches can read (i.e. it pertains to the "environment" that commands are run in). Environment variables can be accessed by using "\$" (so \$SHELL produces the value of the SHELL variable). In the example above, \$SHELL gives the location where your shell program is stored. On my Mac, this location happens to be /bin/bash. It may be slightly different when you run this notebook, but it should usually still end in "bash". (Strictly speaking, SHELL records your default login shell, so on a system whose default is a different shell, such as zsh on recent versions of macOS, the value may end in something else even though this notebook runs bash.)
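
As a small illustration of the paragraph above, the sketch below reads two variables that most Unix systems define for you and then creates one of its own. The name GREETING is made up for this example, and `export` and `bash -c` are not otherwise covered in this tutorial.

```bash
# Read two variables that most Unix systems set up for you.
echo "Home directory: $HOME"
echo "Default shell:  $SHELL"

# Define a variable of our own (GREETING is a made-up name) and read it back.
GREETING="hello"
echo "$GREETING"

# export marks the variable as part of the environment, so that
# programs started from this shell can read it too.
export GREETING
bash -c 'echo "A program started from this shell sees: $GREETING"'
```

Note that there must be no spaces around the "=" when a variable is assigned.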

How do we read a path like "/bin/bash"? Files in a Unix system are organized into folders (also called "directories"). "/" refers to the topmost level. "/bin" is the "bin" folder ("bin" is an abbreviation for "binaries"; "binary" files are the form that runnable programs often take). So "/bin/bash" refers to the "bash" program stored in the "bin" folder.

When the shell is told to run a program (like "echo"), how does the shell know where to find it? This is where the PATH environment variable comes in. The PATH variable stores the names of a number of directories, each separated by a colon. The shell looks at each of these directories in turn and sees if a runnable file (also called an "executable") with the appropriate name exists in any of those directories. Once it finds such an executable, it stops looking and executes it.
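
The search described above can be imitated by hand. What follows is only a rough sketch of the idea, not how bash is actually implemented: real shells also check built-in commands and aliases first and remember earlier lookups. The variable names (program, found, dir, old_ifs) are chosen just for this example.

```bash
# Name of the program to look for; any command name would do.
program="ls"
found=""

# IFS controls how the shell splits text into pieces.
# Temporarily split on ":" so that $PATH breaks into its directories.
old_ifs=$IFS
IFS=:
for dir in $PATH; do
    # -f: is a regular file, -x: we are allowed to execute it
    if [ -f "$dir/$program" ] && [ -x "$dir/$program" ]; then
        found="$dir/$program"
        break   # stop at the first match, as the shell does
    fi
done
IFS=$old_ifs    # restore normal splitting

echo "First match for $program: $found"
```

Because the loop stops at the first match, a directory listed earlier in PATH takes priority over one listed later.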

<b> Exercise 1.1.0 </b><br />
Display the contents of your PATH environment variable below:

In [None]:
##enter the command to print out the value of PATH below
echo $PATH

The "which" program will tell you the exact location of the file that would be used to execute a particular program. For example, we can find the location of the "echo" program as shown below:

In [None]:
which echo

We can even find the location of the "which" program:

In [None]:
which which

<b> Exercise 1.1.1 </b><br />
A colleague of yours has installed one version of a program. However, when they try to launch the program, the shell keeps launching a different version of the program than the one they installed. What might the problem be? How could you check whether this is the problem?

## Navigating the file system, creating and editing files##

Here are a number of handy commands used to navigate the filesystem. Let's start with pwd, which tells you the directory that you are currently working in:

In [None]:
pwd

We can also display the contents of the directory with the ls command.

In [None]:
ls

The ls command with the -l, -t and -h arguments can be used to reveal a lot of additional information about the files, such as file permissions, modification time and file size. You can read more about that here: http://www.tutorialspoint.com/unix/unix-file-management.htm and here: http://www.tutorialspoint.com/unix/unix-file-permission.htm

In [None]:
ls -l -t -h

The letters after the dash, known as "flags", specify additional options to the command. These are: 
* -l provides an extended list of file attributes, including 
    * who owns the file (you should see your SUNet ID) 
    * file size 
    * file modification time 
* -t sorts the files from newest (on top) to oldest (at the bottom). You can add the -r flag to reverse the sort order. 
* -h provides the file size in a 'human-readable' format. 

Protip: You can group these flags together to make a single flag: ls -lth. Verify that this gives you the same output as above. 
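
To see the effect of -t and -r without depending on whatever happens to be in your current directory, here is a minimal sketch that works in a scratch directory. The file names new.txt and old.txt are made up, and `mktemp -d` and `touch -t` are helpers not otherwise covered in this tutorial.

```bash
# Scratch directory so that we do not touch any real files.
demo_dir=$(mktemp -d)

# Create two files and give one of them an old modification time.
# (touch -t takes a timestamp in the form YYYYMMDDhhmm.)
touch "$demo_dir/new.txt"
touch -t 202001010000 "$demo_dir/old.txt"

# $( ... ) captures the output of a command in a variable.
newest_first=$(ls -t "$demo_dir")
oldest_first=$(ls -tr "$demo_dir")

echo "ls -t:"
echo "$newest_first"
echo "ls -tr:"
echo "$oldest_first"

# Grouped and separate flags should give the same long listing.
ls -lth "$demo_dir"
ls -l -t -h "$demo_dir"

# Clean up the scratch directory.
rm -r "$demo_dir"
```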

**Exercise 1.1.2**: Explore the ls command with and without the various flags above. 

In [None]:
## YOUR CODE HERE 

To move around different folders, you can use the **cd** command. For example, to move to your "home" directory, you can use the alias "~".  

In [None]:
cd ~

The `pwd` command  **p**rints your **w**orking **d**irectory. 

In [None]:
pwd

To create a new directory, use the mkdir command. We will create a new directory called "exercise"

In [None]:
mkdir exercise

Let's now change into the exercise directory. To do this, we will need to use the cd command.

In [None]:
cd exercise
#pwd will tell us the present working directory
pwd

Once in the exercise directory, let's make a file with the name test_file.txt. To do this, we can use the touch command.

In [None]:
touch test_file.txt
ls #lists the contents of the directory

Let's write to test_file.txt. To do this, we will use the >> operator which appends the output to a file instead of printing it to the screen. If the file didn't exist, the >> operator would create the file.

In [3]:
echo "hello world\\nhello" >> test_file.txt

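Let's confirm the write using the cat command, which displays the contents of the file. Note that, by default, bash's built-in echo does not turn \n into a line break unless it is given the -e flag, so the file will contain the characters \n literally.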

In [None]:
cat test_file.txt

FYI: if we use a single > instead of >>, this will overwrite the file rather than appending to it.
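
The difference between the two operators can be seen side by side in the sketch below. It writes to a scratch file created by `mktemp` (a helper not otherwise covered here) so that test_file.txt is left alone; the variable names are made up for this example.

```bash
# Scratch file so that we do not touch test_file.txt (mktemp picks the name).
demo_file=$(mktemp)

echo "first"  >  "$demo_file"     # > creates the file or replaces its contents
echo "second" >> "$demo_file"     # >> adds to the end of the file
after_append=$(cat "$demo_file")  # $( ... ) captures a command's output

echo "third"  >  "$demo_file"     # > again: the earlier lines are gone
after_overwrite=$(cat "$demo_file")

echo "After > followed by >>:"
echo "$after_append"
echo "After a second >:"
echo "$after_overwrite"

# Clean up the scratch file.
rm "$demo_file"
```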

<b> Exercise 1.1.3 </b>
Write a single command to make test_file.txt contain a single line saying anything of your choosing. Hint: use the > operator mentioned above. Also know that "touch" will not overwrite files (rather, it will update their 'last edited' date).

In [None]:
#put your command below:

Let's make a copy of the file with the cp command. We will call the new file "test2_file.txt"

In [None]:
cp test_file.txt test2_file.txt
ls #list the contents of the directory to confirm the copy

In [None]:
cat test2_file.txt

Let's now delete test_file.txt using rm. Note that rm is PERMANENT - there is no recycle bin and no undo, so be careful when you use it!

In [None]:
rm test_file.txt
ls #list the contents of the directory to confirm the rm

In [None]:
mkdir my_directory


In [None]:
ls

In [None]:
rm -r my_directory

In [None]:
ls

Let's now rename test2_file.txt to test3_file.txt using the mv command ("move"). Why would we ever want to use mv versus just doing a cp followed by rm? For one, disk space. You may not always have the room to make a new copy of the file. The mv command works by updating information that says where the file is located without actually changing the data of the file on disk, so at no point is a copy made (Note: this is only true if you are doing the move within the same "drive", which is usually the case if the first folder relative to the root is the same).

However, be aware that you can overwrite files with mv or cp if the file you are moving or copying to is a file that already exists, and when you do so the change is permanent. So rename or copy carefully!

In [None]:
mv test2_file.txt test3_file.txt
ls #list the contents of the directory to confirm the rename

In [None]:
cat test3_file.txt
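
The warning above about overwriting can be made concrete with a small sketch. It runs in a scratch directory, and the file names source.txt and target.txt are made up for this example.

```bash
# Scratch directory so that no real files are at risk.
demo_dir=$(mktemp -d)

echo "old contents" > "$demo_dir/target.txt"
echo "new contents" > "$demo_dir/source.txt"

# target.txt already exists, and mv replaces it without asking.
mv "$demo_dir/source.txt" "$demo_dir/target.txt"

# $( ... ) captures a command's output in a variable.
result=$(cat "$demo_dir/target.txt")
remaining=$(ls "$demo_dir")

echo "target.txt now contains: $result"
echo "Files left in the directory: $remaining"

# Clean up the scratch directory.
rm -r "$demo_dir"
```

The original contents of target.txt cannot be recovered after the mv.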

Let's now clean up the exercise directory. Change back to the previous directory with cd. The ".." exists in every directory and points to "one directory up".

In [None]:
pwd

In [None]:
cd ..

In [None]:
pwd

FYI you can also use ls to list the contents of a specific directory. Let's list the contents of the exercise directory without changing into it:

In [None]:
ls exercise

We will now delete the exercise directory. Note that the standard rm command doesn't work - you will get a message saying exercise is a directory:

In [None]:
rm exercise
ls #show that the deletion did not happen

To delete the directory, you need to specify the -r flag, which stands for recursive. Recursion just refers to repeating the same action on a smaller scale. In this case, the rm command will delete the contents of any subdirectories (that's where "recursion" comes in), and will then delete the directory.

In [None]:
rm -r exercise
ls #show that the deletion happened

## A note on relative vs. absolute paths##

When you execute the pwd command (which shows the present working directory), the information that is printed out begins with a "/". This is called an "absolute path" to the present directory - "absolute" because it specifies the full location of the directory relative to the "root directory" (which is the "/").

By contrast, when we made the exercise directory, we didn't specify a location beginning with "/" - instead, we just said "mkdir exercise", and the exercise directory was created in the present directory. This is called a "relative path" because the location of "exercise" was interpreted relative to the location of the present working directory. If we had said "mkdir ../exercise", it would have created the exercise directory one level above the present working directory (remember ".." points to the directory one level up).

To get the absolute path, take the absolute path of the present working directory and append a "/" followed by the relative path. You can always specify absolute paths to commands like cd and ls.
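
As a minimal sketch of that rule, the cell below builds an absolute path from the output of pwd and a relative path, then shows that the absolute path still works after moving elsewhere. It runs in a scratch directory; the variable names are made up, and the -d flag of ls (list the directory itself rather than its contents) is not otherwise covered here.

```bash
start_dir=$(pwd)                 # remember where we started
demo_dir=$(mktemp -d)            # scratch directory (name chosen by mktemp)
cd "$demo_dir"

mkdir exercise                   # relative path: created inside the working directory
relative_path="exercise"
absolute_path="$(pwd)/$relative_path"

echo "Relative: $relative_path"
echo "Absolute: $absolute_path"

cd /                             # move somewhere else entirely
ls -d "$absolute_path"           # the absolute path still works from here

cd "$start_dir"                  # go back to where we started and clean up
rm -r "$demo_dir"
```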

<b> Exercise 1.1.4 </b>
What would be the result of the following commands?

In [None]:
#-p creates nested directories if they don't exist
mkdir -p exercise/a_dir/a_dir/a_dir/a_dir
cd exercise
cd a_dir/a_dir
touch a_dir/../../a_dir/a_dir/../hi.txt
cd ../..
echo "ls a_dir"
ls a_dir
echo "ls a_dir/a_dir"
ls a_dir/a_dir
echo "ls a_dir/a_dir/a_dir"
ls a_dir/a_dir/a_dir
echo "ls a_dir/a_dir/a_dir/a_dir"
ls a_dir/a_dir/a_dir/a_dir
#cleanup
cd ..
rm -r exercise

<b> Exercise 1.1.5 </b>
What is the absolute path of hi.txt in the example below? Check if you're right by issuing the command "cat /absolute/path/to/hi.txt", which will throw an error if your absolute path is incorrect.

In [None]:
mkdir -p exercise/a_dir/a_dir/
echo "blah" > exercise/a_dir/a_dir/hi.txt
cat exercise/a_dir/a_dir/hi.txt
cat /replace/with/absolute/path/to/hi.txt
#cleanup
rm -r exercise

## Chaining commands with a pipe operator##

In [None]:
ls

The "|", called a "pipe operator" (should be present above your return key) can be used to send the output of one command as input to another command. This is illustrated below.

Let's start by creating an exercise folder with the file hi.txt that contains 3 lines:

In [None]:
mkdir exercise
cd exercise
touch hi.txt
echo "line1" >> hi.txt
echo "line2" >> hi.txt
echo "line3" >> hi.txt
#view the contents of hi.txt to confirm
cat hi.txt

To set up an example involving the pipe operator, we're going to introduce a number of commands. The head and tail commands can be used to view the top or bottom lines of a file, as illustrated below:

In [None]:
echo "View the top 2 lines of hi.txt"
head -2 hi.txt
echo "View the bottom 1 line of hi.txt"
tail -1 hi.txt

We can also zip up files with the gzip command. This is something you should get in the habit of doing in order to save space. It would also be a smart thing to do if you're ever transferring large files, as it would reduce the sizes of the files you need to transfer. The gzip command will automatically create a new file with .gz appended to the file name. Gzipped files do not HAVE to have the .gz extension; it's just useful to give gzipped files this extension so that you can keep track of which files are gzipped and which ones are not.

In [None]:
gzip hi.txt
ls #view the contents of the directory

FYI, files can be unzipped with the gunzip command

In [None]:
gunzip hi.txt.gz #decompress the file. The gz extension is automatically removed
ls
gzip hi.txt #compress the file again for our example
ls

Say we want to view the contents of the file without unzipping the file on disk (which we may want to do because we don't have the disk space to unzip it). Because the file is compressed, the cat command would give us nonsensical results. Instead, we use the zcat command (on some systems, notably macOS, zcat expects file names ending in .Z; there, "gunzip -c" or "gzip -dc" does the same job):

In [None]:
zcat hi.txt.gz #view the contents
ls #this shows that the file is still zipped up on disk

But now let's say we want to go further and view the top two lines of hi.txt.gz without unzipping it on disk. This is where the pipe operator is useful. It will allow us to send the output of zcat to the head command to use as input, as illustrated below:

In [None]:
zcat hi.txt.gz | head -2
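
One way to think about the pipe is as a shortcut that avoids an intermediate file. The sketch below does the same job twice, once through a temporary file and once through a pipe, and the two results should match. It runs in a scratch directory with made-up variable names, and it uses `gzip -dc` in place of zcat as an assumption about portability (the two should behave the same on .gz files).

```bash
# Scratch directory with a small gzipped file to work on.
demo_dir=$(mktemp -d)
printf 'line1\nline2\nline3\n' | gzip -c > "$demo_dir/hi.txt.gz"

# Without a pipe: save the unzipped text to a file, then read that file.
gzip -dc "$demo_dir/hi.txt.gz" > "$demo_dir/unzipped.txt"
via_file=$(head -n 2 "$demo_dir/unzipped.txt")

# With a pipe: the unzipped text goes straight into head, with no extra file.
via_pipe=$(gzip -dc "$demo_dir/hi.txt.gz" | head -n 2)

echo "Using a temporary file:"
echo "$via_file"
echo "Using a pipe:"
echo "$via_pipe"

# Clean up the scratch directory.
rm -r "$demo_dir"
```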

We could go further and compress the output of the head command to create a gzipped file that has only the first two lines. To do this, we pipe the output of head into gzip (when gzip reads from a pipe it writes the gzipped data to its output; the -c flag requests this explicitly) and then we write this output to a file using the > operator.

In [None]:
zcat hi.txt.gz | head -2 | gzip -c > first_two_lines.txt.gz
#ls #list out the files in the directory
#zcat first_two_lines.txt.gz #view the contents of the new zipped file

In [None]:
zcat first_two_lines.txt.gz


Let's now clean up the exercise directory

In [None]:
cd ..
rm -r exercise

<b> Exercise 1.1.6 </b>
In the cell below, print ONLY the second line of hi.txt using a one-line command. Hint: this can be accomplished using the pipe operator and commands we have covered in the section above.

In [None]:
mkdir exercise
cd exercise
touch hi.txt
echo "line1" >> hi.txt
echo "line2" >> hi.txt
echo "line3" >> hi.txt

###Add your one-line command to print the second line of hi.txt


#cleanup
cd ..
rm -r exercise

## References##

Recommended Unix tutorial: http://www.ee.surrey.ac.uk/Teaching/Unix/

Here's a more detailed tutorial from tutorialspoint:
http://www.tutorialspoint.com/unix/index.htm

Another resource geared towards bioinformatics: http://manuals.bioinformatics.ucr.edu/home/linux-basics

Reference for commonly useful commands: https://sites.google.com/site/anshulkundaje/inotes/programming/shell-scripts

Learning shell programming: http://www.learnshell.org/

Debugging shell scripts: http://www.shellcheck.net/