Concepts explained in this tutorial:
- What a shell is
- Environment variables
- $PATH
- Navigating the file system
- Relative vs. absolute paths
- Chaining commands using pipes

Commands covered:
- echo
- which
- pwd
- ls
- mkdir
- cd
- touch
- cat
- cp
- rm
- head
- tail
- gzip
- zcat

Operators/aliases covered:
- ..
- \>\>
- |

# 1.1 Unix Basics#

We'll start by going through some basic Unix commands. "Unix" is a term for a family of operating systems (just like "Windows"). The Mac operating systems (OS X) are part of the Unix family. You will also hear the term "Linux" a lot; Linux refers to a series of operating systems that are also part of the Unix family. Unix operating systems are very popular for running servers. However, while you may be used to interacting with your laptop through a graphical interface, these servers typically do not provide one (graphical interfaces are a LOT of work to build and are less flexible). Instead, you need to interact with them through the command line. Don't worry, it's easy once you get the hang of it, and it looks really cool to people who don't know what you are doing!

## How commands are understood##

Let's clarify a little about how Unix commands are understood. The program that understands your Unix commands is called a "shell". If you hear the term "bash" get thrown around, just know that this is the name of a shell. There are many different kinds of shells, and some commands behave slightly differently depending on the shell that is being run. For now, we will focus on the bash shell. To use the shell through an ipython notebook, add an exclamation point (!) before the command, as illustrated below (run the code in the following cell).

In [1]:
#lines that begin with a hashtag are comments; they are ignored
#by the shell. Let us double check that the bash shell is being
#run. To do this, we will use the command "echo $SHELL" illustrated
#below:
! echo $SHELL

/bin/bash


Let's understand in detail how the command above was understood by the shell.

Commands tend to have the format:<br />
[name of the program] [one or more arguments to the program...]<br />
("arguments" just refers to all the terms that control the behaviour of the program).

In the example above, "echo" is the name of the program. The echo program prints the value of its arguments to the screen.
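
As a small illustration of the [name of the program] [arguments] format, the sketch below calls echo with different arguments. The words used are arbitrary, and in a notebook cell each command would be prefixed with "!".

```shell
# "echo" is the program; "hello" and "world" are two separate arguments.
# echo prints its arguments separated by single spaces.
echo hello world

# Extra spaces between arguments are discarded by the shell,
# so this prints exactly the same thing.
echo hello     world

# Quotes group everything inside them into ONE argument,
# so the spaces inside the quotes are kept.
echo "hello     world"
```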

There is also a concept of an "environment variable". A variable is something that stores information, and an "environment variable" is a variable whose information can be accessed by the shell and by the programs it runs (i.e. it pertains to the "environment" that commands are run in). The value of an environment variable is accessed by putting "\$" in front of its name (so \$SHELL produces the value of the SHELL variable). In the example above, \$SHELL gives the location where the shell program is stored (strictly speaking, the shell configured as your default). On my Mac, this location happens to be /bin/bash. It may be slightly different when you run this notebook, but it should still end in "bash".
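
To make the idea of an environment variable more concrete, here is a minimal sketch that reads one existing variable and creates a new one. The name TRAINING_CAMP_GREETING is made up for this example, and HOME is assumed to be set (it is on most systems). In a notebook, these lines would need to go together in a single %%bash cell, because each "!" line starts its own shell.

```shell
# Read an existing environment variable: the location of your home directory.
echo "$HOME"

# Create a variable of our own (the name is hypothetical).
TRAINING_CAMP_GREETING="hello"

# "export" places the variable in the environment, so that programs started
# from this shell (here, a second shell started with "sh -c") can read it.
export TRAINING_CAMP_GREETING
sh -c 'echo "$TRAINING_CAMP_GREETING"'
```

The single quotes around the last command stop the first shell from filling in the value itself, so it is the second shell that looks the variable up in its environment.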

How do we read a path like "/bin/bash"? Files in a Unix system are organized into folders (also called "directories"). "/" refers to the topmost level. "/bin" is the "bin" folder ("bin" is an abbreviation for "binaries"; "binary" files are the form that runnable programs often take). So "/bin/bash" refers to the "bash" program stored in the "bin" folder.

When the shell is told to run a program (like "echo"), how does the shell know where to find it? This is where the PATH environment variable comes in. The PATH variable stores the names of a number of directories, each separated by a colon. The shell looks at each of these directories in turn and sees if a runnable file (also called an "executable") with the appropriate name exists in any of those directories. Once it finds such an executable, it stops looking and executes it.
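
The search described above can be sketched in a few lines of shell. This is a simplified illustration, not how bash is actually implemented: find_in_path is a hypothetical helper written for this example, and a real shell also handles things such as built-in commands before it consults PATH.

```shell
# find_in_path is a made-up helper, not a standard command.
find_in_path() {
    name=$1
    old_ifs=$IFS
    IFS=:                      # split PATH on colons
    for dir in $PATH; do
        IFS=$old_ifs
        # -f: it is a regular file; -x: we are allowed to execute it
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            echo "$dir/$name"
            return 0           # stop at the first match
        fi
    done
    IFS=$old_ifs
    return 1                   # not found in any PATH directory
}

# Report where the "ls" program would be found.
find_in_path ls
```

Because the loop stops at the first match, the order of the directories in PATH decides which file wins when two directories contain a program with the same name.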

<b> Exercise 1.1.1 </b><br />
Display the contents of your PATH environment variable below:

In [2]:
##enter the command to print out the value of PATH below
!

The "which" program will tell you the exact location of the file that would be used to execute a particular program. For example, we can find the location of the "echo" program as shown below:

In [3]:
! which echo

/bin/echo


We can even find the location of the "which" program:

In [4]:
! which which

/usr/bin/which


<b> Exercise 1.1.2 </b><br />
A colleague of yours has installed one version of a program. However, when they try to launch the program, the shell keeps launching a different version than the one they installed. What might the problem be? How could you check whether this is the problem?

## Navigating the file system, creating and editing files##

Here are a number of handy commands used to navigate the filesystem:

In [5]:
#Find out the directory you are in with the pwd command
#IMPORTANT: whenever you invoke a shell from an ipython notebook (with "!" or
#%%bash), it will always be started from the directory that the ipython notebook
#is currently running out of. In other words, if you changed to a different
#directory with "cd" inside a previous shell, this will NOT be remembered when
#you start a new shell (this is why the cells below use "%cd", which changes the
#notebook's own directory). Keep this in mind when deleting files; you don't
#want to delete the wrong files!
!pwd

/home/user1/training_camp/workflow_notebooks
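
The warning in the comment above can be checked with a small experiment. In this sketch, "sh -c" starts a second shell; a change of directory made inside that second shell does not carry over to the shell that started it. The variable name start_dir is just for illustration.

```shell
# Remember where we are now.
start_dir=$(pwd)

# Start a second shell, change directory inside it, and print where it ended up.
sh -c 'cd / && pwd'

# Back in the original shell, the working directory has not moved.
pwd
```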


In [6]:
#Display the contents of the directory with ls
#note that the ls command can be used to reveal a lot of additional information about the files,
#such as file permissions, modification date and file size. You can read more about that
#here: http://www.tutorialspoint.com/unix/unix-file-management.htm
#and: http://www.tutorialspoint.com/unix/unix-file-permission.htm
!ls

1.0 Big Ideas.ipynb
1.1 Unix Basics.ipynb
1.3 Getting ready to run code on the cluster.ipynb
2.0_Sequencing_Data_Analysis.ipynb
2.4 Creating count coverage tracks.ipynb
3.1 Clustering analysis and PCA.ipynb
3.2 Calling differentially expressed peaks.ipynb
3.3 GO Term Enrichment.ipynb
3.4 Finding TF motifs.ipynb
exercise
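
The comment in the cell above mentions that ls can show more than file names. As a minimal sketch: the -l flag asks for a "long" listing (permissions, owner, size and modification time), and -a also lists hidden files, whose names begin with a dot. The example works in a throwaway directory made with mktemp (assumed to be available, as it is on most Linux and Mac systems), and the file names are made up.

```shell
demo_dir=$(mktemp -d)            # temporary scratch directory
touch "$demo_dir/notes.txt"
touch "$demo_dir/.hidden.txt"

ls "$demo_dir"                   # names only; hidden files are skipped
ls -l "$demo_dir"                # long format, one file per line
ls -a "$demo_dir"                # also shows .hidden.txt, "." and ".."
```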


In [7]:
!echo "Create a new directory called 'exercise' with mkdir"
!mkdir exercise
!echo "" #for a new line

!echo "Change into the exercise directory with cd"
%cd exercise
!echo ""

!echo "Confirm you are in the right directory"
!pwd
!echo ""

!echo "Make a file with the name test_file.txt with touch"
!touch test_file.txt
!echo ""

!echo "Write to test_file.txt"
#the ">>" appends the output to a file instead
#of printing it to the screen. Using a single ">" would overwrite the file
!echo "blah blah" >> test_file.txt
!echo ""

!echo "Display the contents of test_file.txt"
!cat test_file.txt
!echo ""

!echo "Make a copy of test_file.txt called test2_file.txt"
!cp test_file.txt test2_file.txt
!echo ""

!echo "Confirm test2_file.txt is a copy by printing out its contents"
!cat test2_file.txt
!echo ""

!echo "Change back to the previous directory"
%cd ..
#".." is a shortcut for the parent directory (one level up)
!echo ""

!echo "List the contents of the exercise directory to confirm it contains the two files"
!ls exercise
!echo ""

!echo "Remove test_file.txt"
!echo "IMPORTANT: THERE IS NO RECYCLE BIN. ONCE YOU DELETE, IT IS GONE FOREVER"
!echo "THUS, BE EXTREMELY CAREFUL WHEN USING rm. MAKE SURE YOU'RE DELETING THE CORRECT THINGS"
!rm exercise/test_file.txt
!echo ""

!echo "Confirm the removal"
!ls exercise #when you specify a directory, ls will list the contents of that directory
!echo ""

!echo "Remove the exercise directory"
#-r is recursive, meaning it removes the
#files within exercise, then removes exercise.
#You need to specify -r to remove directories.
!echo "IMPORTANT: THERE IS NO RECYCLE BIN. ONCE YOU DELETE, IT IS GONE FOREVER"
!echo "THUS, BE EXTREMELY CAREFUL WHEN USING rm. MAKE SURE YOU'RE DELETING THE CORRECT THINGS"
!rm -r exercise
!echo ""

!echo "List the contents of the present directory to confirm removal"
!ls

Create a new directory called 'exercise' with mkdir
mkdir: cannot create directory ‘exercise’: File exists

Change into the exercise directory with cd
/home/user1/training_camp/workflow_notebooks/exercise

Confirm you are in the right directory
/home/user1/training_camp/workflow_notebooks/exercise

Make a file with the name test_file.txt with touch

Write to test_file.txt

Display the contents of test_file.txt
blah blah
blah blah

Make a copy of test_file.txt called test2_file.txt

Confirm test2_file.txt is a copy by printing out its contents
blah blah
blah blah

Change back to the previous directory
/home/user1/training_camp/workflow_notebooks

List the contents of the exercise directory to confirm it contains the two files
exercise  test2_file.txt  test_file.txt

Remove test_file.txt
IMPORTANT: THERE IS NO RECYCLE BIN. ONCE YOU DELETE, IT IS GONE FOREVER
THUS, BE EXTREMELY CAREFUL WHEN USING rm. MAKE SURE YOU'RE DELETING THE CORRECT THINGS

Confirm the removal
exercise  test2_file.tx
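

The difference between ">>" (append) and ">" (overwrite) mentioned in the cell above is easy to get wrong, so here is a minimal sketch of both. It works in a throwaway directory made with mktemp (assumed to be available), and notes.txt is a made-up file name.

```shell
demo_dir=$(mktemp -d)

# ">" creates the file, or replaces its contents if it already exists.
echo "first" > "$demo_dir/notes.txt"

# ">>" adds to the end of the file and keeps what was already there.
echo "second" >> "$demo_dir/notes.txt"
cat "$demo_dir/notes.txt"        # prints "first" then "second"

# A single ">" throws away both earlier lines.
echo "third" > "$demo_dir/notes.txt"
cat "$demo_dir/notes.txt"        # prints only "third"
```

Like rm, a single ">" gives no warning before the old contents are lost.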

## A note on relative vs. absolute paths##

When you execute the pwd command (which shows the present working directory), the information that is printed out begins with a "/". This is called an "absolute path" to the present directory - "absolute" because it specifies the full location of the directory relative to the "root directory" (which is the "/").

By contrast, when we made the exercise directory, we didn't specify a location beginning with "/". Instead, we just said "mkdir exercise", and the exercise directory was created in the present directory. This is called a "relative path" because the location of "exercise" was interpreted RELATIVE to the location of the present working directory. If we had said "mkdir ../exercise", it would have created the exercise directory one level above the present working directory (remember ".." points to the directory one level up).

To get the absolute path, you must take the relative path and append it to the absolute path of the present working directory. You can always specify absolute paths to commands like cd and ls.
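
The rule above can be tried out directly. The sketch below builds an absolute path by joining the output of pwd and a relative path with a "/", then checks that both paths lead to the same place. It works in a throwaway directory made with mktemp (assumed to be available); hello.txt and the variable names are made up for the example.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
mkdir exercise
touch exercise/hello.txt

# A relative path: interpreted relative to the present working directory.
relative_path="exercise"

# The matching absolute path: present working directory + "/" + relative path.
absolute_path="$(pwd)/$relative_path"
echo "$absolute_path"

# Both paths refer to the same directory, so both listings show hello.txt.
listing_relative=$(ls "$relative_path")
listing_absolute=$(ls "$absolute_path")
echo "$listing_relative"
echo "$listing_absolute"
```

The relative path only works while the present working directory is the one containing "exercise"; the absolute path works from anywhere.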

<b> Exercise 1.1.3 </b><br />
What would be the result of the following commands?

In [8]:
#-p creates nested directories if they don't exist
!mkdir -p exercise/a_dir/a_dir/a_dir/a_dir
%cd exercise
%cd a_dir/a_dir
!touch a_dir/../../a_dir/a_dir/../hi.txt
%cd ../..
!echo "ls a_dir"
!ls a_dir
!echo "ls a_dir/a_dir"
!ls a_dir/a_dir
!echo "ls a_dir/a_dir/a_dir"
!ls a_dir/a_dir/a_dir
!echo "ls a_dir/a_dir/a_dir/a_dir"
!ls a_dir/a_dir/a_dir/a_dir

/home/user1/training_camp/workflow_notebooks/exercise
/home/user1/training_camp/workflow_notebooks/exercise/a_dir/a_dir
/home/user1/training_camp/workflow_notebooks/exercise
ls a_dir
a_dir
ls a_dir/a_dir
a_dir  hi.txt
ls a_dir/a_dir/a_dir
a_dir
ls a_dir/a_dir/a_dir/a_dir


<b> Exercise 1.1.4 </b><br />
What is the absolute path of hi.txt? Check if you're right by issuing the command "cat /absolute/path/to/hi.txt", which will throw an error if your absolute path is incorrect.

In [9]:
!cat /replace/with/absolute/path/to/hi.txt

cat: /replace/with/absolute/path/to/hi.txt: No such file or directory


In [13]:
%cd ..
!rm -r exercise #clean up the exercise folder

/home/user1/training_camp/workflow_notebooks


## Chaining commands with a pipe operator##

The "|" character, called the "pipe operator" (on many keyboards it sits above the return key), can be used to send the output of one command as the input to another command. This is illustrated below:

In [14]:
!mkdir exercise
%cd exercise
!touch hi.txt
!echo "line1" >> hi.txt
!echo "line2" >> hi.txt
!echo "line3" >> hi.txt

!echo "View the contents of hi.txt"
!cat hi.txt
!echo ""

#The head command can be used to view the top few lines of a file.
!echo "View the top 2 lines of hi.txt"
!head -2 hi.txt
!echo ""

#Similarly, the tail command can be used to view the last few lines of a file
!echo "View the bottom 2 lines of hi.txt"
!tail -2 hi.txt
!echo ""

#Let's see how the pipe operator can help us interact with zipped files.
#To create a compressed file, we will use gzip
#(note: you would often want to do this to save space):
!echo "Zip up hi.txt with gzip"
!gzip hi.txt
!echo ""

#The zipped file ends up with a "gz" extension appended to it
!ls
!echo ""

!echo "Print the contents of the zipped file to the screen with zcat"
!echo "Note: this does not change the file on disk"
!zcat hi.txt.gz
!echo ""

!echo "Pipe the output of zcat to the head command"
!echo "This allows us to view the first two lines without unzipping"
!zcat hi.txt.gz | head -2
!echo ""

#clean up
%cd ..
!rm -r exercise

/home/user1/training_camp/workflow_notebooks/exercise
View the contents of hi.txt
line1
line2
line3

View the top 2 lines of hi.txt
line1
line2

View the bottom 2 lines of hi.txt
line2
line3

Zip up hi.txt with gzip

hi.txt.gz

Print the contents of the zipped file to the screen with zcat
Note: this does not change the file on disk
line1
line2
line3

Pipe the output of zcat to the head command
This allows us to view the first two lines without unzipping
line1
line2

/home/user1/training_camp/workflow_notebooks
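

Pipes work with any command that prints output, not only zcat, and more than two commands can be chained. The sketch below is a hedged illustration using a throwaway directory made with mktemp (assumed to be available) and made-up file names. It writes "head -n 2", which is an equivalent spelling of the "head -2" used above, and it relies on two gzip flags: -c (write the result to the screen instead of to a file) and -d (decompress).

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt" "$demo_dir/c.txt"

# ls prints one name per line when its output goes into a pipe;
# head then keeps only the first two of those lines.
ls "$demo_dir" | head -n 2

# A chain of three commands: echo produces text, "gzip -c" compresses it,
# and "gzip -dc" decompresses whatever it receives, giving back "line1".
echo "line1" | gzip -c | gzip -dc
```

No compressed file is ever written to disk in the second command; the data only travels through the pipes.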


<b> Exercise 1.1.5 </b><br />
Print ONLY the second line of hi.txt using a one-line command. Hint: use the pipe operator.

In [15]:
!mkdir exercise
%cd exercise
!touch hi.txt
!echo "line1" >> hi.txt
!echo "line2" >> hi.txt
!echo "line3" >> hi.txt

###Replace this with your one-line command to print the second line of hi.txt

#cleanup
%cd ..
!rm -r exercise

/home/user1/training_camp/workflow_notebooks/exercise
/home/user1/training_camp/workflow_notebooks


## References##

Here is the tutorial that I (Avanti) used to learn Unix: http://www.ee.surrey.ac.uk/Teaching/Unix/

Here's a more detailed tutorial from tutorialspoint:
http://www.tutorialspoint.com/unix/index.htm

Another resource geared towards bioinformatics: http://manuals.bioinformatics.ucr.edu/home/linux-basics

Reference for commonly useful commands: https://sites.google.com/site/anshulkundaje/inotes/programming/shell-scripts

Learning shell programming: http://www.learnshell.org/

Debugging shell scripts: http://www.shellcheck.net/