# Introduction to Unix: the Unix Filesystem

### Questions:
- What is a command shell and why would I use one?
- How can I move around in a computer?
- How can I see what files and directories I have?
- How can I specify the location of a file or directory on my computer?

### Objectives:
- Describe key reasons for learning shell.
- Navigate your file system using the command line.
- Access and read help files for `bash` programs and use help files to identify useful command options.


### Section 1: What is a shell and why should I care?

A *shell* is a computer program that presents a command line interface
which allows you to control your computer using commands entered
with a keyboard instead of controlling graphical user interfaces
(GUIs) with a mouse/keyboard combination.

There are many reasons to learn about the shell.

* Many bioinformatics tools can only be used through a command line interface, or 
have extra capabilities in the command line version that are not available in the GUI.
This is true, for example, of BLAST, which offers many advanced functions only accessible
to users who know how to use a shell.  
* The shell makes your work less boring. In bioinformatics you often need to do
the same set of tasks with a large number of files. Learning the shell will allow you to
automate those repetitive tasks and leave you free to do more exciting things.  
* The shell makes your work less error-prone. When humans do the same thing a hundred different times
(or even ten times), they're likely to make a mistake. Your computer can do the same thing a thousand times
with no mistakes.  
* The shell makes your work more reproducible. When you carry out your work in the command-line 
(rather than a GUI), your computer keeps a record of every step that you've carried out, which you can use 
to re-do your work when you need to. It also gives you a way to communicate unambiguously what you've done, 
so that others can check your work or apply your process to new data.  
* Many bioinformatics tasks require large amounts of computing power and can't realistically be run on your
own machine. These tasks are best performed using remote computers or cloud computing, which are typically accessed
through a shell.

In this lesson you will learn how to use the command line interface to move around in your file system. 

### Section 2: Navigating Files and Directories

### Questions:
- How can I perform operations on files outside of my working directory?
- What are some navigational shortcuts I can use to make my work more efficient?

### Objectives:
- Use a single command to navigate multiple steps in your directory structure, including moving backwards (one level up).
- Perform operations on files in directories outside your working directory.
- Work with hidden directories and hidden files.
- Interconvert between absolute and relative paths.
- Employ navigational shortcuts to move around your file system.

### Keypoints:
- The `/`, `~`, and `..` characters represent important navigational shortcuts.
- Hidden files and directories start with `.` and can be viewed using `ls -a`.
- Relative paths specify a location starting from the current location, while absolute paths specify a location from the root of the file system.

#### Navigating your file system

The part of the operating system responsible for managing files and directories is called the **file system**.

It organizes our data into files, which hold information,
and directories (also called "folders"), which hold files or other directories.

Several commands are frequently used to create, inspect, rename, and delete files and directories.

The dollar sign is a **prompt**, which shows us that the shell is waiting for input;
your shell may use a different character as a prompt and may add information before the prompt. When typing commands, either from these lessons or from other sources, do not type the prompt, only the commands that follow it. In this lesson we will use the dollar sign to indicate the prompt. 

```
$
```

You won't see a prompt if you are using a cell in a Jupyter notebook, but you will if you are logged directly into a remote server. 

Let's find out where we are by running a command called `pwd`
(which stands for "print working directory").
At any moment, our **current working directory**
is our current default directory,
i.e.,
the directory that the computer assumes we want to run commands in
unless we explicitly specify something else.
Here, the computer's response is `/home/u20/bhurwitz`, which is my home directory.

In [None]:
'''
Type the command below, and run the cell.
!pwd
'''
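
Outside a notebook, the same idea can be tried in a plain shell. The following is a minimal sketch that assumes a throwaway directory created with `mktemp -d` (the name `demo_dir` is only for illustration), so it does not touch your real files:

```shell
# Create a throwaway directory so the example does not depend on your real files.
demo_dir=$(mktemp -d)

# Move into it; this becomes the current working directory.
cd "$demo_dir"

# Print the working directory. -P resolves symbolic links to the physical path.
here=$(pwd -P)
echo "$here"

# The shell also keeps the working directory in the PWD variable.
echo "$PWD"
```

The exact path printed depends on your system, but it always names the directory that later commands will run in.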

### Section 3: Using the listing command

Let's look at how our file system is organized. We can see what files and subdirectories are in this directory by running `ls`, which stands for "listing".

`ls` prints the names of the files and directories in the current directory in alphabetical order, arranged neatly into columns. 

In [None]:
'''
Type the command below, and run the cell.
!ls
'''

#### Using the /xdisk for storage

Your home directory has a limited amount of space. Because our metagenomics files are large, we are going to work in the `/xdisk/bhurwitz/bh_class/<your_netid>` directory, where you will create new subdirectories throughout this class.

The command to change locations in our file system is `cd` ("change directory"), followed by the name of the directory we want to make our working directory.

Let's say we want to navigate to the `/xdisk/bhurwitz/bh_class/<your_netid>` directory we saw above (where you swap out `<your_netid>` with your own netid). We can use the following command to get there:

In [None]:
# set a variable for your netid
# Replace "MY_NETID" with your actual netid
# Notice that the `cd` command is a magic command in Jupyter Notebooks and starts with %.
netid = "MY_NETID"
work_dir = "/xdisk/bhurwitz/bh_class/" + netid
%cd $work_dir
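
In a plain shell (outside Jupyter), building a path from a variable and changing into it might look like the sketch below. It assumes a temporary directory (`class_root`, a hypothetical stand-in for `/xdisk/bhurwitz/bh_class`) so that it runs on any machine:

```shell
# Hypothetical stand-in for /xdisk/bhurwitz/bh_class so the sketch runs anywhere.
class_root=$(mktemp -d)
netid="MY_NETID"            # replace with your own netid
mkdir -p "$class_root/$netid"

# Build the path from the variable, then change into it.
work_dir="$class_root/$netid"
cd "$work_dir"

# Confirm where we ended up by printing the last part of the path.
current=$(basename "$(pwd)")
echo "$current"             # MY_NETID
```

Quoting `"$work_dir"` keeps the command working even if a directory name contains spaces.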

In [None]:
'''
Type the command below, and run the cell.
!ls
'''


#### So what do you get?

You should see something like this:

```
assignments  exercises
```

We can make the `ls` output more comprehensible by using the **flag** `-F`, which tells `ls` to add a trailing `/` to the names of directories, or other symbols to identify the type of elements in the directory:

In [None]:
'''
Type the command below, and run the cell.
!ls -F
'''

#### What is a directory?

You should see something like this:

```
assignments/  exercises/
```

Anything with a "/" after it is a directory. Things with a "*" after them are programs. If there are no decorations, it's a file.
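
To see all three cases side by side, here is a small sketch that builds a scratch directory with one item of each kind. The names `assignments`, `notes.txt`, and `run_me.sh` are made up for illustration:

```shell
# Work in a throwaway directory.
demo_dir=$(mktemp -d)
cd "$demo_dir"

mkdir assignments            # a directory
touch notes.txt              # a regular file
touch run_me.sh              # a file we will mark as executable
chmod u+x run_me.sh

# -F appends / to directories and * to executables; plain files get no marker.
listing=$(ls -F)
echo "$listing"
```

The listing should show `assignments/`, `notes.txt`, and `run_me.sh*`.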

To understand a little better how to move between folders, let's look at the following image:

<a href="../fig/directory_structure.png">
  <img src="../fig/directory_structure.png" width="870" height="631" alt="Folder organization diagram showing a parent directory called dc_workshop, with three subdirectories called data, mags, and taxonomy. Inside data there is another one called untrimmed_fastq, and inside taxonomy there is another one called mags_taxonomy."/>
</a>

Here we can see a diagram of how the folders are nested one inside another. For example, to move from the `bh_class` directory to your `exercises` folder, the path must list the directories in the order they are nested: `cd <your_netid>/exercises`

### Section 4: Using manual pages

No one can possibly learn all of the options for every command; that's what the manual page is for. You can (and should) refer to the manual page or other help files as needed.

`ls` has lots of other options. To find out what they are, we can type:

```
$ man ls
```

Some manual files are very long. You can scroll through the file using
your keyboard's down arrow or use the <kbd>Space</kbd> key to go forward one page and the <kbd>b</kbd> key to go backwards one page. When you are done reading, hit <kbd>q</kbd> to quit.

In [None]:
'''
Type the command below, and run the cell.
!man ls
'''

#### Exercise 1: Extra information with `ls -l`
Use the `-l` option for the `ls` command to display more information for each item 
in the directory. What is one piece of additional information this long format
gives you that you don't see with the bare `ls` command?

In [None]:
'''
Type your commands below, and run the cell.
'''

### Section 5: Unzipping files

OK, let's get started with working with some real data. Make sure you have a shell open, and then let's go into the `/xdisk/bhurwitz/bh_class/<your_netid>/exercises/data/untrimmed_fastq` directory and see what is in there.

In [None]:
'''
Type the commands below, and run the cell.
%cd /xdisk/bhurwitz/bh_class/$netid/exercises/data/untrimmed_fastq
!ls
'''

#### How do I unzip files in Unix?

You should see:

```
JC1A_R1.fastq.gz  JC1A_R2.fastq.gz  JP4D_R1.fastq.gz  JP4D_R2.fastq.gz  TruSeq3-PE.fa
```

This directory contains a file `TruSeq3-PE.fa`, that we will use in a later lesson and four files with `.fastq.gz` extensions. 

FASTQ is a format for storing information about sequencing reads and their quality. The `.gz` extension indicates a file that has been compressed with gzip. We will be learning more about FASTQ files in a later lesson.

These data come in a compressed format, which is why there is a `.gz` at the end of the file names. Compression makes the files faster to transfer and lets them take up less space on our computer.

Let's use `gunzip` to decompress the files in your data directory so we can look at the FASTQ format. Notice that I can unzip all of the fastq files at once using the `*` wildcard, which matches any sequence of characters in a file name.

In [None]:
'''
Type these commands below, and run the cell.
!gunzip *fastq.gz
!ls
'''

#### What do the files look like after you unzip them?

You should see the following:

```
JC1A_R1.fastq  JC1A_R2.fastq  JP4D_R1.fastq  JP4D_R2.fastq  TruSeq3-PE.fa
```
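
If you want to practice the wildcard pattern without touching the class data, the sketch below does the same thing on two tiny stand-in files. The file names and their contents are invented for illustration:

```shell
# Work in a throwaway directory.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Create two small stand-in files (hypothetical names) and compress them.
printf '@read1\nACGT\n+\nIIII\n' > sampleA_R1.fastq
printf '@read2\nTTGA\n+\nIIII\n' > sampleA_R2.fastq
gzip sampleA_R1.fastq sampleA_R2.fastq
ls

# Peek at a compressed file without changing it on disk.
first_line=$(gzip -dc sampleA_R1.fastq.gz | head -n 1)
echo "$first_line"           # @read1

# The * wildcard expands to every name ending in fastq.gz,
# so one command decompresses all matching files.
gunzip *fastq.gz
ls
```

After `gunzip` runs, the `.gz` files are replaced by their uncompressed versions, just as in the class directory.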

### Section 6: Moving around the file system

We've learned how to use `pwd` to find our current location within our file system. We've also learned how to use `cd` to change locations and `ls` to list the contents of a directory. Now we're going to learn some additional commands for moving around within our file system.

Use the commands we've learned so far to navigate to the `exercises/data/untrimmed_fastq` directory, if you're not already there. 

In [None]:
'''
Type the commands below, and run the cell.
%cd /xdisk/bhurwitz/bh_class/$netid
%cd exercises
%cd data
%cd untrimmed_fastq
'''

#### What if we want to move back up and out of this directory and to our top level directory? 

Can we type `cd exercises`? Try it and see what happens.

In [None]:
'''
Type the command below, and run the cell.
%cd exercises
'''

#### Where am I?

Oh no! You likely got an error message like this...

```
-bash: cd: exercises: No such file or directory
```

Your computer looked for a directory or file called `exercises` within the directory you were already in. It didn't know you wanted to look at a directory level above the one you were located in. 

We can use a special directory name, `..`, to tell the computer to move us back (up) one directory level.

```
$ cd ..
```

Now we can use `pwd` to make sure that we are in the directory we intended to navigate to, and `ls` to check that the contents of the directory are correct.

```
$ pwd
```

```
/xdisk/bhurwitz/bh_class/<your_netid>/exercises/data
```

From this output, we can see that `..` did indeed take us back one level in our file system.

You can chain these together to move several levels:

```
$ cd ../..
```

Where are you now?


In [None]:
'''
Type the commands below, and run the cell.
%cd ../..
!pwd
'''
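
The same moves can be rehearsed in a scratch copy of the directory layout. This sketch assumes a temporary directory (`top`, a hypothetical stand-in for your class directory) that holds the same nested folder names:

```shell
# Hypothetical stand-in for your class directory.
top=$(mktemp -d)
mkdir -p "$top/exercises/data/untrimmed_fastq"
cd "$top/exercises/data/untrimmed_fastq"

cd ..                               # up one level
one_up=$(basename "$(pwd)")
echo "$one_up"                      # data

cd ../..                            # up two more levels in a single command
two_more=$(basename "$(pwd)")
echo "$two_more"                    # the name of the temporary top directory

cd exercises/data/untrimmed_fastq   # several levels down in a single command
back_down=$(basename "$(pwd)")
echo "$back_down"                   # untrimmed_fastq
```

Each `..` in a path climbs one level, so `../..` climbs two.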

#### Exercise 2: Finding hidden directories

First navigate to the `bh_class` directory. Remember how to get there? You can always use `pwd` to see the current directory you are in. 

There is a hidden directory within the `bh_class` directory. Explore the options for `ls` to find out how to see hidden directories. List the contents of the directory and identify the name of the text file in that directory.

Hint: hidden files and folders in Unix start with `.`, for example `.my_hidden_directory`

In [None]:
'''
Type your commands here to try to find the hidden file (Check out options for the ls command),
and then change directories into the hidden directory
'''
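
Once you have tried the exercise, you can compare your approach with the sketch below, which uses a scratch directory. The directory and file names are made up and are not the ones in `bh_class`:

```shell
# Work in a throwaway directory.
demo_dir=$(mktemp -d)
cd "$demo_dir"
mkdir .my_hidden_directory visible_directory
touch .my_hidden_directory/secret.txt

# Plain ls normally leaves out names that start with "."
ls

# -a includes names starting with ".", plus the special entries "." and ".."
all=$(ls -a)
echo "$all"

# A hidden directory is entered like any other.
cd .my_hidden_directory
inside=$(ls)
echo "$inside"               # secret.txt
```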

### Section 7: File permissions

The `ls` command can also show the permissions of a file. Keeping a backup folder with copies of all our files lets us rescue files that we have accidentally deleted. But having two copies doesn't make us safe: we can still accidentally delete or overwrite both. To make sure we can't accidentally change a file, we're going to change its permissions so that we're only allowed to read (i.e. view) the file, not write to it (i.e. make changes).

View the current permissions on a file using the `-l` (long) flag for the `ls` command. 

```
$ ls -l
```

```
total 4
-rw-r--r--. 1 bhurwitz bhurwitz 47 Aug 24 10:05 youfoundit.txt
```

The first part of the output for the `-l` flag gives you information about the file's current permissions. There are ten slots in the permissions list. The first character in this list is related to file type, not permissions, so we'll ignore it for now. The next three characters relate to the permissions that the file owner has, the next three relate to the permissions for group members, and the final three characters specify what other users outside of your group can do with the file. We're going to concentrate on the three positions that deal with your permissions (as the file owner). 
<a href="../fig/02-02-01.svg">
  <img src="../fig/02-02-01.svg" width="300" height="300" alt="The file permission parameters described in the text (-rw-rw-r--) showing which of the slots correspond to who has permissions, and a legend showing the meaning of the letters."/>
</a>

Here the three positions that relate to the file owner are `rw-`. The `r` means that you have permission to read the file, the `w` indicates that you have permission to write to (i.e. make changes to) the file, and the third position is a `-`, indicating that you don't have permission to carry out the ability encoded by that space (this is the space where `x`, or executable ability, is stored).

#### How can I change permissions on a file?

Our goal for now is to change permissions on the "youfoundit.txt" file so that you no longer have `w` or write permissions. We can do this using the `chmod` (change mode) command and subtracting (`-`) the write permission `-w`. 

But before we can do that, we need to make a copy of the file in your own directory, because I (bhurwitz) currently own the original.

In [None]:
'''
Type the commands below, and run the cell
!cp youfoundit.txt ../$netid
%cd ../$netid
!chmod -w youfoundit.txt 
!ls -l 
'''

#### What is this output?

You should see something like this:

```
total 0
drwxrwsr-x. 3 your_netid bh_class 512 Aug 22 16:20 assignments
drwxrwsr-x. 2 your_netid bh_class 512 Aug 22 16:20 exercises
-r--r--r-- 1  your_netid bh_class 47 Aug 22 16:20 youfoundit.txt
```

This shows all of the permissions for the files in the directory you are currently in.
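
To watch the permission string change step by step, here is a minimal sketch on a scratch file (`results.txt` is a hypothetical name). It uses the explicit form `u-w`, which removes write permission for the owner only, and it sets a known starting mode first so the result does not depend on your system defaults:

```shell
# Work in a throwaway directory.
demo_dir=$(mktemp -d)
cd "$demo_dir"
echo "important results" > results.txt   # hypothetical file name
chmod 644 results.txt                    # start from a known state: rw-r--r--

# The first ten characters of ls -l are the file type plus the permissions.
before=$(ls -l results.txt | cut -c1-10)
echo "$before"                           # -rw-r--r--

chmod u-w results.txt                    # remove write permission for the owner (u)
after=$(ls -l results.txt | cut -c1-10)
echo "$after"                            # -r--r--r--

chmod u+w results.txt                    # give the owner write permission back
restored=$(ls -l results.txt | cut -c1-10)
echo "$restored"                         # -rw-r--r--
```

Adding a permission uses `+` in the same position where removing one uses `-`.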

### Section 8: Absolute vs. relative paths

The `cd` command takes an argument which is a directory name. Directories can be specified using either a *relative* path or a full *absolute* path. The directories on the computer are arranged into a hierarchy. The full path tells you where a directory is in that hierarchy. You should be in your class directory; enter the `pwd` command to find out.

In [None]:
'''
Write your commands below:
'''

#### What is a full path in Unix?

You will see: 

```
/xdisk/bhurwitz/bh_class/your_netid
```

This is the full name of your class directory. This tells you that you
are in a directory called `your_netid`, which sits inside a directory called `bh_class` which sits inside the `/xdisk/bhurwitz` directory. The very top of the hierarchy is a directory called `/` which is usually referred to as the *root directory*. 

Let's make a `.hidden` directory and move the `youfoundit.txt` file into that directory.

In [None]:
'''
Type the commands below:
!mkdir .hidden
!mv youfoundit.txt .hidden
%cd /xdisk/bhurwitz/bh_class/$netid/.hidden
'''

#### Nice, I just got to the .hidden directory by specifying the entire path.

This jumps you to the `.hidden` directory. Can you do this in two steps instead?
First go back to the `bh_class` directory, and then in a second step go to the `.hidden` directory.

In [None]:
'''
Type the commands below:
%cd /xdisk/bhurwitz/bh_class
%cd $netid/.hidden
'''


#### What is the relative path in Unix?

These two approaches have the same effect: both take us to the `.hidden` directory. The first uses the absolute path, giving the full address from the root directory. The second uses a relative path, giving only the address from the working directory. A full path always starts with a `/`. A relative path does not.

A relative path is like getting directions from someone on the street. They tell you to "go right at the stop sign, and then turn left on Main Street". That works great if you're standing there together, but not so well if you're trying to tell someone how to get there from another country. A full path is like GPS coordinates. It tells you exactly where something is no matter where you are right now.

You can usually use either a full path or a relative path depending on what is most convenient. If we are in the `bh_class` directory, it is more convenient to enter the relative path since it involves less typing.
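
The equivalence of the two kinds of path can be checked directly. This sketch assumes a temporary directory (`class_root`, a hypothetical stand-in for `/xdisk/bhurwitz/bh_class`) and a placeholder netid:

```shell
# Hypothetical stand-in for /xdisk/bhurwitz/bh_class.
class_root=$(mktemp -d)
netid="MY_NETID"
mkdir -p "$class_root/$netid/.hidden"

# Absolute path: starts with / and works from anywhere, even the root directory.
cd /
cd "$class_root/$netid/.hidden"
via_absolute=$(pwd -P)

# Relative path: interpreted from the current working directory.
cd "$class_root"
cd "$netid/.hidden"
via_relative=$(pwd -P)

# Both routes end in the same place.
if [ "$via_absolute" = "$via_relative" ]; then
    echo "same directory"
fi
```

The relative form only works because we first moved to `class_root`; from any other starting point it would fail or land somewhere else.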

Over time, it will become easier for you to keep a mental note of the structure of the directories that you are using and how to quickly navigate between them.

#### Exercise 3: Relative path resolution

Using the filesystem diagram below, if `pwd` displays `/Users/thing`,
which one of the following will `ls ../backup` display?
 
1.  `../backup: No such file or directory`
2.  `2012-12-01 2013-01-08 2013-01-27`
3.  `2012-12-01/ 2013-01-08/ 2013-01-27/`
4.  `original pnas_final pnas_sub`
 
<img src="../fig/filesystem-challenge.svg" alt="Filesystem diagram with folders: Users/thing/backup/2012-12-02, Users/thing/backup/2012-01-08, Users/thing/backup/2013-01-27, Users/backup/original, Users/backup/pnas_final, and Users/backup/pnas_sub" />


### Summary

We now know how to move around our file system using the command line.
This gives us an advantage over interacting with the file system through a Graphical User Interface (GUI): it allows us to work on a remote server, carry out the same set of operations on a large number of files quickly, and use bioinformatics software that is only available in command line versions.

In the next few exercises, we'll be expanding on these skills and seeing how using the command line shell enables us to make our workflow more efficient and reproducible.

### Key Points
- The shell gives you the ability to work more efficiently by using keyboard commands rather than a GUI.
- Useful commands for navigating your file system include: `ls`, `pwd`, and `cd`.
- Most commands take options (flags) which begin with a `-`.
- Tab completion can reduce errors from mistyping and make work more efficient in the shell.

-----