# Intro To Unix Basics

### Why learn Unix?

"Unix" is a term for a family of operating systems (just like "Windows"). The Mac operating system (macOS, formerly OS X) is part of the Unix family. You will also hear the term "Linux" a lot; Linux is a Unix-like operating system and is usually counted as part of the same family. Unix operating systems are very popular for running servers.

While you may be used to interacting with your laptop using a mouse and graphical interface, Unix servers like Stanford's Sherlock cluster do not work that way (graphical interfaces are a LOT of work to build and are less flexible). Instead, we need to interact with these servers through the command line or terminal. These servers can store enormous files (such as files containing millions of ATAC-seq reads), and they have many CPUs and even GPUs too, which allow us to perform powerful bioinformatic analyses that our laptops just aren't capable of.

Don't worry, it's easy once you get the hang of it, and it looks really cool to people who don't know what you are doing!

# Day 1 Slides + Exercises

![title](images/slide1.png)

![title](images/slide2.png)

![title](images/slide3.png)

### Exercises

1. Log into the VM set up for training camp and open a terminal from the Jupyter homepage (New --> Terminal). Explore your home directory and beyond using `ls` and `cd`. Try creating, moving, copying, and removing new directories with `mkdir`, `mv`, `cp`, and `rm` (be careful with `rm`!!!). Test out using absolute vs. relative paths with the various commands. As you explore, keep track of where you are with `pwd` and compare to what the terminal prompt shows as your location. 


2. What happens if you try...
    - `cd [dir_that_doesnt_exist]`
    - `ls [dir_that_doesnt_exist]`
    - `cd` (without an input)


3. What happens if you try `mkdir [existing_dir]` or `mkdir dir1/dir2/dir3/dir4`, with and without the `-p` flag? (Hint: you'll get an error in one case and no error in the other case.)


4. Does the order of flags and/or inputs matter for `ls` (try using `-l`, `-t`, and an input directory at the same time) or for `mkdir` with `-p`? (Note that this doesn't generalize to all commands.)

5. What happens when you try `mv [existing_dir] [existing_dir]` (or the same with `cp`)? (Hint: the destination directory is not overwritten.) Can you imagine how this behavior could cause problems if you weren't aware the destination directory already existed?
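
The `mkdir -p` and `mv` behaviors explored in the exercises above can be tried safely in a throwaway directory. This is a minimal sketch, not part of the original exercises; the names `sandbox`, `source_dir`, and `dest_dir` are made up for illustration.

```shell
# Work in a temporary directory so nothing important is touched.
sandbox=$(mktemp -d)
cd "$sandbox"

# Without -p, mkdir refuses to create a nested directory whose parents are missing.
mkdir dir1/dir2/dir3 || echo "mkdir failed: the parent directories do not exist"

# With -p, missing parents are created, and an already existing directory is not an error.
mkdir -p dir1/dir2/dir3
mkdir -p dir1

# Moving a directory onto an existing directory places it INSIDE the destination.
mkdir source_dir dest_dir
mv source_dir dest_dir
ls dest_dir    # lists source_dir
```

Because the destination already existed, `source_dir` ended up at `dest_dir/source_dir` rather than being renamed to `dest_dir`.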

![title](images/slide4.png)

![title](images/slide5.png)

### Exercises

1. Create a directory (`mkdir`) called `example_files` in your home directory and copy over the file `/outputs/all_merged.peaks.bed` into this new directory. (Hint: the destination for the copied file can be written as `destination_dir/destination_file_name.bed` or simply as `destination_dir`, in which case the original file name will be used.)


2. This file is too big for a human to read through all the lines -- use `head` and `wc -l` to peek at the contents of the file and check how long it is.


3. How many peaks come from chromosome 16 (written as chrXVI)? (Use `grep` with the `-c` flag.)


4. Save the peaks from chromosome 16 to a new file (use `>`), and check that it appears to have worked using `head`, `tail`, and `wc -l` on this new file. (Make sure you don't see any other chromosomes besides 16, and the output from `wc -l` should match the output from your previous `grep -c`.) What would have happened if you used an incomplete chromosome name to `grep`, such as "chrXV", that could also match other chromosome names?


5. Using your chromosome 16 file and `sed`, convert all the Roman numeral XVIs to Arabic 16s, and save this to a new file. Check that it worked with `head`.


6. The header line can sometimes cause software to throw an error if it's not expected. To get around this, you would need to create a new bed file with the header removed. Try doing this using `grep` with and without the `-v` flag.


7. Let's do some column math with `awk`! In our peaks file, the peak width is equal to column 3 minus column 2. We can calculate this for each line with `awk '{ print $3 - $2 }' all_merged.peaks.bed`. Try it out yourself!
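
To see how `grep -c`, `>`, and `sed` fit together before running them on the real peaks file, here is a small sketch on a made-up four-line file. The file contents and the variable names are assumptions for illustration only. The `-w` flag (match whole words only) is not used in the exercises, but GNU and BSD `grep` both support it.

```shell
# Hypothetical tab-separated toy file: chromosome, start, end.
toy_bed=$(mktemp)
printf 'chrI\t100\t250\nchrII\t400\t1000\nchrIII\t50\t75\nchrII\t2000\t2100\n' > "$toy_bed"

# "chrII" is also a substring of "chrIII", so a plain pattern counts that line too.
loose_count=$(grep -c "chrII" "$toy_bed")      # 3
# -w requires a whole-word match, so the "chrIII" line is excluded.
exact_count=$(grep -c -w "chrII" "$toy_bed")   # 2
echo "$loose_count $exact_count"

# Save the matching lines to a new file with >, then rewrite the name with sed.
grep -w "chrII" "$toy_bed" > "$toy_bed.chrII"
sed 's/chrII/chr2/' "$toy_bed.chrII"
```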


Awk in particular is really powerful and has so many uses beyond printing columns. A notebook full of examples is here: https://colab.research.google.com/drive/1VOC7CVLWNvj59VAlpazbQI1dKM_YUXAN?usp=sharing
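
As a small taste of what `awk` can do beyond printing a single column, here is a sketch on a made-up three-line file. The file contents and variable names are hypothetical; the real peaks file may have more columns and a header line.

```shell
# Hypothetical tab-separated toy file: chromosome, start, end.
toy_bed=$(mktemp)
printf 'chrI\t100\t250\nchrII\t400\t1000\nchrXVI\t50\t75\n' > "$toy_bed"

# Print the chromosome next to the peak width (column 3 minus column 2).
awk '{ print $1, $3 - $2 }' "$toy_bed"

# A condition in front of the braces filters lines: keep only peaks wider than 100.
awk '$3 - $2 > 100 { print $0 }' "$toy_bed"

# Accumulate a running total and print it once, after the last line.
total_width=$(awk '{ total += $3 - $2 } END { print total }' "$toy_bed")
echo "$total_width"    # 775
```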

# Day 2 Slides + Exercises

![title](images/slide6.png)

### Exercises

1. Use `gzip` and `gunzip` to compress and decompress a few files. With `ls -l`, check that the size of the file (in bytes) is smaller when it is compressed than when it is not. What kinds of files shrink more/less when you compress them? Try it out on text/bed files as well as PNG files in the images directory.


2. Compress the bed file from the previous exercises section, and then use `zcat [gzipped file] | head` and confirm that the output is the same as when you run `head` on the non-gzipped file. Try out the `grep`, `sed`, and `awk` commands we ran on the bed file in the previous section, but using `zcat` and the pipe so that you avoid decompressing the bed file.


3. Try out interesting combinations of commands using piping. Can you combine `head` and `tail` to extract lines 101 to 201 from the bed file? Can you string together multiple `sed` commands to change all I numerals to "-one", all Vs to "-five", and all Xs to "-ten"?


4. Which chromosome is left after this series of grep filter commands: `grep "X" [bed_file] | grep "V" | grep -v "I"`? (Think about it before running to check your guess!)
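
The sketch below shows the general shape of a pipeline on made-up input. It uses `seq` to stand in for a 300-line file, and `gzip -dc`, which should behave like `zcat` on Linux (on macOS, `zcat` may expect a different file extension). The file contents and variable names are assumptions for illustration.

```shell
# seq prints the numbers 1 to 300, one per line.
# head keeps lines 1-201; tail then keeps the last 101 of those, i.e. lines 101-201.
line_range=$(seq 1 300 | head -n 201 | tail -n 101)
echo "$line_range" | head -n 1    # 101
echo "$line_range" | tail -n 1    # 201

# Compress a toy file, then stream it through a pipe without writing a decompressed copy.
toy_file=$(mktemp)
printf 'chrX\nchrV\nchrXV\n' > "$toy_file"
gzip "$toy_file"    # replaces the file with "$toy_file.gz"
streamed_count=$(gzip -dc "$toy_file.gz" | grep -c "X")
echo "$streamed_count"    # 2
```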

![title](images/slide7.png)

Let's clarify a little about how Unix commands are understood. The program that understands your Unix commands is called a "shell", and "bash" is the name of one such shell. There are many different kinds of shells, and some commands behave slightly differently depending on which shell is being run, but bash is pretty standard.

As an exercise, double check that the bash shell is being run in your terminal. To do this, we can look at what is stored in the variable `$SHELL` (the path to your default shell) by running `echo $SHELL`. If bash is your shell, this prints something like `/bin/bash`.

How do we interpret `/bin/bash`? We can see that it is an absolute path (because it starts with `/`), and that `bash` is located inside the folder `bin`, which is directly under the root directory. "bin" is an abbreviation for "binaries"; "binary" refers to the form that runnable programs often take. So "/bin/bash" refers to the "bash" program stored in the "bin" folder.
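
The same split into "folder" and "program" can be done with the standard `dirname` and `basename` commands. This is a minimal sketch that assumes the path is `/bin/bash`; the value on your system may differ, and the variable names are made up for illustration.

```shell
# The path discussed above; substitute the output of `echo $SHELL` if yours differs.
shell_path="/bin/bash"

shell_dir=$(dirname "$shell_path")     # everything before the last slash: /bin
shell_name=$(basename "$shell_path")   # everything after the last slash: bash
echo "directory: $shell_dir, program: $shell_name"

# Check whether a runnable file really exists at that path on this machine.
if [ -x "$shell_path" ]; then
    echo "$shell_path exists and is executable"
else
    echo "$shell_path was not found here"
fi
```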



When the shell is told to run a program (like `ls`), how does the shell know where to find it? This is where the `$PATH` environment variable comes in. You can look at what's in your `$PATH` using `echo $PATH`.

The `$PATH` variable stores the names of a number of directories, each separated by a colon. When you enter a command in the terminal, the shell looks at each of these directories in `$PATH` in turn, checking if a runnable file (also called an "executable") exists in any of those directories and has the same name as the command you typed. Once it finds such an executable, it stops looking and executes it.
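
The "first match wins" rule can be seen with a small experiment. This is a sketch under stated assumptions: the directories `first` and `second` and the program name `greet` are invented for illustration, and the change to `$PATH` only lasts for the current terminal session.

```shell
# Print each directory in $PATH on its own line, in the order the shell searches them.
echo "$PATH" | tr ':' '\n'

# Two hypothetical directories, each holding an executable named "greet".
demo=$(mktemp -d)
mkdir "$demo/first" "$demo/second"
printf '#!/bin/sh\necho "greet from first"\n' > "$demo/first/greet"
printf '#!/bin/sh\necho "greet from second"\n' > "$demo/second/greet"
chmod +x "$demo/first/greet" "$demo/second/greet"

# Put both directories at the front of $PATH; "first" is listed earlier, so it is found first.
PATH="$demo/first:$demo/second:$PATH"
greeting=$(greet)
echo "$greeting"    # greet from first
```

Swapping the order of the two directories in the `PATH=` line would make the shell run the copy in `second` instead.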

This is true for most of the commands we have learned to run so far, such as `ls` and `cat` (a few, such as `cd`, are built into the shell itself). You can look inside `/bin` and the other directories in `$PATH` to find where the file for each of these commands lives.

If you ever aren't sure where a particular command lives, you can retrieve the absolute path for it using `which`. Try it out with `which ls` and other commands. You can even run `which which`!
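
A short sketch of looking up a command's location follows. `command -v` is a standard shell builtin that reports much the same information as `which`; it is shown here as an alternative and is not part of the original material. The variable name `ls_path` is made up.

```shell
# Ask the shell where it would find `ls`.
ls_path=$(command -v ls)
echo "$ls_path"

# The reported path should point at a runnable file.
[ -x "$ls_path" ] && echo "$ls_path is executable"

# `which` is a separate program that performs a similar search through $PATH.
which ls
```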

### Exercise

A colleague of yours has installed one version of a program. However, when they try to launch the program, the shell keeps launching a different version than the one they installed. What might the problem be? How could you check whether this is the problem?

## Helpful References ##

Recommended Unix tutorial: http://www.ee.surrey.ac.uk/Teaching/Unix/

Here's a more detailed tutorial from tutorialspoint:
http://www.tutorialspoint.com/unix/index.htm

Another resource geared towards bioinformatics: http://manuals.bioinformatics.ucr.edu/home/linux-basics

Reference for commonly useful commands: https://sites.google.com/site/anshulkundaje/inotes/programming/shell-scripts

Learning shell programming: http://www.learnshell.org/

Debugging shell scripts: http://www.shellcheck.net/