# The Unix Shell

This course will have two parts. 

__Part I__ will discuss basic concepts such as:

* Introduction to the shell
* Navigating the file system (pwd, ls, cd)
* Moving files and directories (mv)
* Copying files and directories (cp)
* Deleting files and directories (rm)

__Part II__ will cover more advanced topics such as:

* Working with files and searches (wc)
* Redirecting output (>,>>)
* Pipes (|, sort, tail)
* Finding Things (grep, find)

Much of this course is based on the Software Carpentry lesson [The Unix Shell](https://swcarpentry.github.io/shell-novice/07-find.html). This tutorial is meant to be worked through by typing the commands into your own terminal. We will also use a dataset, [shell-lesson-data](https://drive.google.com/file/d/1r3uKC19rY-cxJ7_Ia7ZYx1_berJ-mZl-/view?usp=drive_link). Download the zip file from the link and unzip it in your Documents directory.

## __Part I__

### 1) Introduction to Unix Shell
#### a) Shell Basics

Unix is a powerful operating system (OS) that is used around the world. Many operating systems are based on Unix, such as BSD, Mac OS and Linux. These systems typically provide a Graphical User Interface (GUI), which lets you give instructions to the OS visually, through windows, menus and a computer mouse.

A GUI is an intuitive way to use a computer and helps with many tasks. However, when you need to perform many repetitive or analysis-based tasks, the GUI does not work as well as the __Unix Shell__. The Unix Shell is a __Command-Line Interface__ (CLI) and scripting language that is accessed through a terminal. There are several different shells:

* __Bash__ - Default for most Linux systems and what we will use for this tutorial
* Zsh - Default for Mac OS (similar to Bash)
* Csh
* Bourne Shell

The CLI is powerful as you can customize the interface as well as run various programs either locally or on remote servers through **S**ecure **SH**ell (SSH). The below image is a typical example of a terminal

<img src="_static/shell_terminal.jpg" width="500">

In our examples we have created an account called __sio__ on a computer named __localhost__. The $ is the prompt and users can begin entering commands after the prompt. The features of the terminal can be customized through your bash profile. As mentioned previously, we will be using the Bash shell for our examples. To determine which shell you are using you can run the following command.

```
sio@localhost $ echo $0
```

For Mac OS users, the default CLI is zsh. If you are using zsh, that is fine: no changes are needed, because zsh is very similar to bash. We will cover the __echo__ program later, as well as what the __$__ is used for here (it is not the prompt).

```
echo $0

# output
-bash
```

### 2)  Navigating through Files and Directories

Every Unix-based OS organizes files and folders in a hierarchical structure known as the __file system__. The top-level folder is the root directory, represented by a forward slash, ```/```. A folder inside another folder is called a __child__ or __subdirectory__ of that __parent__ folder.

<img src="_static/directory_structure.jpg" width="1000">

When you open a terminal you start in your __home directory__, which on Mac OS is ```/Users/username``` (on most Linux systems it is ```/home/username```). For the examples in this tutorial it is ```/Users/sio```. You can see which directory you are currently in by typing ```pwd``` (**P**rint **W**orking **D**irectory).


```
# input
sio@localhost $ pwd

# output
/Users/sio
```

To begin, we will change directories with the ```cd``` (**c**hange **d**irectory) command. Let's change to the directory where we stored the shell-lesson-data:

```
cd /Users/sio/Documents/shell-lesson-data
```

```cd``` prints nothing when it succeeds. Running ```pwd``` confirms the new location:

```
pwd

# output
/Users/sio/Documents/shell-lesson-data
```

The ```cd``` command changed the directory, and the ```pwd``` command shows where we are. We can also list the files in the directory with the ```ls``` command.

```
ls

# output
exercise-data/      north-pacific-gyre/
```


There are several built-in shortcuts that can be used with various commands. The tilde, ```~```, is a shortcut for your home directory. To change back to your home directory you could type

```cd ~```

To go back to the directory you were just in you could use the dash, ```-``` 

```cd -```

You can also move up to the parent directory by using the ```..``` shortcut

```cd ..  # moves up one directory```
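A minimal sketch for practicing these shortcuts safely: it builds a throwaway directory tree with ```mktemp -d``` (the ```practice/a/b``` folder names are made up for this demo) and prints where each ```cd``` lands.

```shell
# Build a temporary directory tree to move around in (folder names are hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/practice/a/b"

cd "$demo/practice/a/b"
cd ..                          # up one level: now in .../practice/a
after_up=$(pwd)
echo "after cd ..:    $after_up"

cd ../..                       # up two more levels: back in $demo
echo "after cd ../..: $(pwd)"

cd -                           # return to the previous directory (.../practice/a); cd prints it
cd ~                           # jump to your home directory
echo "after cd ~:     $(pwd)"
```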

Often we would like to see which commands we previously ran in a session. This can be done with the ```history``` command.

Lastly, to clear the screen use the ```clear``` command

```
clear
```

When referring to directories and paths we often talk about the absolute path or the relative path. The absolute path is the entire path of a directory, for example 

```/Users/sio/Documents```

is the absolute path to the Documents directory. If we were in our home directory (```/Users/sio```) and wanted to access the shell-lesson-data, the relative path would be

```Documents/shell-lesson-data```
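To check that both forms name the same place, here is a small sketch. It assumes nothing about your real home folder: ```$demo_home``` is a temporary stand-in for ```/Users/sio```, and the subfolders simply mirror the tutorial's names.

```shell
# $demo_home is a temporary stand-in for /Users/sio (hypothetical layout).
demo_home=$(mktemp -d)
mkdir -p "$demo_home/Documents/shell-lesson-data"

cd "$demo_home"                               # start in the stand-in home directory
cd Documents/shell-lesson-data                # relative path: resolved from where we are now
rel=$(pwd)

cd /                                          # go somewhere unrelated
cd "$demo_home/Documents/shell-lesson-data"   # absolute path: starts with /, works from anywhere
abs=$(pwd)

echo "relative: $rel"
echo "absolute: $abs"                         # both lines show the same directory
```

The relative path only works because we first moved into the stand-in home directory; the absolute path works no matter where we start.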

Besides listing the objects in a directory, ```ls``` accepts options that show whether each item is a file or folder, how large it is, and when it was last modified:

```ls -lsG```

<img src="_static/ls_meaning.jpg" width="1000">

There will be times, especially on a remote server, when you cannot copy or remove files in a specific directory. This is because of user or group permissions.

<img src="_static/ls_meaning_2.jpg" width="1000">


When using ```ls``` with the ```-l``` option you will see the permission structure for the Owner, Group and Other. In the below screenshot we see the following permissions

```drwxr-xr-x 7 tgstone staff```

<img src="_static/ls_permissions.jpg" width="1000">

The permission string in this first column is ten characters long. The first character shows whether the object is a file (```-```) or a directory (```d```). In the example above, the docs object is a directory.

The next three characters are the permissions for the Owner. 

* r - **r**ead
* w - **w**rite
* x - e**x**ecute

The owner, tgstone, has read, write and execute permissions for the docs folder.

The next three characters determine the permissions for the group. The group is **staff** and its permissions are **r**ead and e**x**ecute; the dash in the middle position means the group does not have **w**rite permission.

Lastly, a user who is neither the owner nor in the group falls into the **Other** category. The permissions for other are read and execute. The ```chmod``` command can be used to change an object's permissions, but this is outside the scope of this tutorial.
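To see the four parts of the ten-character string side by side, here is a minimal sketch that slices the example string with ```cut -c``` (character positions). It borrows the pipe, ```|```, which Part II introduces.

```shell
# Split the example permission string into type, owner, group and other.
perms="drwxr-xr-x"
echo "type:  $(echo "$perms" | cut -c1)"      # d = directory, - = regular file
echo "owner: $(echo "$perms" | cut -c2-4)"    # rwx
echo "group: $(echo "$perms" | cut -c5-7)"    # r-x
echo "other: $(echo "$perms" | cut -c8-10)"   # r-x
```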

Using ```ls``` with the ```-a``` option lists **a**ll files, including hidden ones. Hidden files are files or folders whose names begin with a ```.```

<img src='_static/ls_hidden.jpg' width="1000">
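A quick way to reproduce this yourself is a sketch in a throwaway directory containing one ordinary and one hidden file (both names are made up):

```shell
# Throwaway directory with one visible and one hidden file (hypothetical names).
demo=$(mktemp -d)
touch "$demo/visible.txt" "$demo/.hidden.txt"

ls "$demo"       # only visible.txt
ls -a "$demo"    # also .hidden.txt, plus the . and .. entries
```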

#### a) Syntax and Commands

We will now go over examples of command syntax 

```sio@localhost $ ls -lGh .```

<img src="_static/command_syntax.jpg" width="1000">

Generally with the CLI we type the command, then any options, then the arguments. Some commands do not follow this form (such as ```find```), but it is important to know the general pattern. The **options** describe how we want the command to behave, and the argument ```.``` (a shortcut for the current directory) determines which object we are working with. The ```-lGh``` options list the directory in **l**ong format (```-l```), color-code files and directories (```-G```) and show file sizes in **h**uman-readable units such as K and M (```-h```). Note that ```-G``` means color only on Mac OS and BSD; GNU ```ls``` on Linux uses ```--color``` instead, and there ```-G``` hides the group column.
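Single-letter options can usually be written separately or combined behind one dash. A minimal sketch, using a throwaway directory with one made-up file, shows that the two forms produce the same listing:

```shell
# Throwaway directory with one file (hypothetical name) to list.
demo=$(mktemp -d)
touch "$demo/notes.txt"

ls -l -a -h "$demo"   # command, three separate options, one argument
ls -lah "$demo"       # the same three options combined behind one dash
```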

Often you will see ```.``` or ```..``` used in a command:
* ```.``` refers to the current directory
* ```..``` refers to the parent directory (one level up)

#### b) Finding more information

If we want to find which options are available or how a command works, we can use the ```help``` command (which covers Bash built-in commands such as ```pwd``` and ```cd```) or read the ```man``` (**man**ual) pages.

```help pwd```

<img src="_static/help.jpg" width="1000">

```man pwd```

<img src="_static/man.jpg" width="1000">

Lastly, thanks to advances in AI, we can also ask a large language model (LLM) for information on a command or how to use it.

<img src="_static/chatgpt.jpg" width="1000">

### 3) Creating, deleting, copying and moving files and folders

Now that we can view information in the file system, we want to be able to manipulate objects. These basic tasks can be done with a few commands:

* ```mkdir``` - Create directories
* ```touch``` or a text editor - Create files
* ```mv``` - To move files and directories
* ```cp``` - To copy files and directories
* ```rm``` - To remove files and directories (need additional options to remove a directory)
* ```rmdir``` - To remove a directory 

#### a) Creating directories

Change into the ```shell-lesson-data/exercise-data``` directory (from ```shell-lesson-data```, run ```cd exercise-data```) and confirm your location with the ```pwd``` command. We can then create a new folder called **thesis**:

```
mkdir thesis
```

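The ```-p``` option creates any missing parent directories along the way, so the command below creates ```project``` (one level up) together with its ```data``` and ```result``` subdirectories: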

```
mkdir -p ../project/data ../project/result
```
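The difference ```-p``` makes is easiest to see side by side. This sketch works in a scratch directory (standing in for exercise-data) and, for simplicity, creates ```project``` inside it rather than one level up:

```shell
# Scratch directory standing in for exercise-data.
demo=$(mktemp -d) && cd "$demo"
mkdir thesis

# Without -p, mkdir fails because the parent directory project/ does not exist yet.
mkdir project/data 2>/dev/null || echo "mkdir project/data failed: no project/ yet"

# With -p, mkdir creates project/ first and then both subdirectories.
mkdir -p project/data project/result
ls project

# -p also keeps mkdir quiet when the directory already exists.
mkdir -p project/data
```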

#### b) Creating files

There are multiple options when creating a file

1) Create a new, empty file with the ```touch``` command. ```touch``` is often used when a file simply needs to exist. For example, in older versions of Python (before 3.3) a directory needed an ```__init__.py``` file to be treated as a package; the file could be empty, it only had to exist.

```
touch thesis/draft.txt
```

2) Use an editor such as nano, vim or emacs to create a file

```
nano thesis/draft.txt
```
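A minimal sketch of what ```touch``` actually does, in a scratch directory: it creates an empty file, and on a file that already exists it only updates the modification time, leaving the contents alone. (It uses ```>``` to add text, which Part II covers.)

```shell
# Scratch directory standing in for exercise-data (names follow the tutorial).
demo=$(mktemp -d) && cd "$demo"
mkdir thesis

touch thesis/draft.txt                  # creates an empty file
wc -c thesis/draft.txt                  # byte count is 0: the file exists but is empty

echo "first line" > thesis/draft.txt    # add some text
touch thesis/draft.txt                  # existing file: only the modification time changes
cat thesis/draft.txt                    # still prints: first line
```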

#### c) Moving files and directories

To move a file or folder to another location, or to rename it, we use the ```mv``` command. For example, to rename the draft.txt file we just created to quotes.txt:

__WARNING:__ If a file with the new name already exists, ```mv``` will overwrite it without asking. Adding the ```-i``` option makes ```mv``` ask for confirmation first.

```
mv thesis/draft.txt thesis/quotes.txt
```
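To see the overwrite warning in action without risking real files, here is a sketch in a scratch directory where ```thesis/quotes.txt``` already exists before the rename:

```shell
# Scratch directory; both files are created just for this demo.
demo=$(mktemp -d) && cd "$demo"
mkdir thesis
echo "draft text" > thesis/draft.txt
echo "old quotes" > thesis/quotes.txt   # a file that already has the target name

mv thesis/draft.txt thesis/quotes.txt   # no prompt and no warning
cat thesis/quotes.txt                   # prints: draft text  ("old quotes" is gone)
```

Running the same command as ```mv -i``` would have asked before replacing thesis/quotes.txt.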


To move quotes.txt into our current directory, we give the file followed by the destination, ```.```:

```
mv thesis/quotes.txt .
```

Let's say that we created a **references.tex**, **introduction.tex** and **methods.txt** file in the wrong directory. 

```
touch references.tex introduction.tex methods.txt
```

To move these files to the **thesis** directory we have two options

1) Manually move the files by listing each file

```mv introduction.tex references.tex methods.txt thesis```

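2) Use special characters that describe patterns to match several files at once

```
mv *.tex methods.txt thesis
```

Here ```methods.txt``` is named explicitly: the broader pattern ```*.txt``` would also match ```quotes.txt``` (and any other text file in the directory), which we still need for the next steps.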

<u>The __*__ Character</u>

The __*__ character is called the **wildcard**, and patterns that use it are known as **globs**. It matches zero or more characters in a file or directory name. Taking the .tex files as an example, ```*.tex``` matches every file whose name ends in .tex, so all of them are moved to thesis. If we only wanted to move files that start with an r and end with .tex, we could write ```r*.tex```.
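Because the shell, not ```mv```, expands the wildcard, ```echo``` is a handy way to preview which names a pattern matches before running a command on them. A minimal sketch in a scratch directory holding the example files (note that zsh reports "no matches found" for the last line instead of printing the pattern):

```shell
# Scratch directory with the example files from above.
demo=$(mktemp -d) && cd "$demo"
touch references.tex introduction.tex methods.txt

echo *.tex     # the shell expands this to: introduction.tex references.tex
echo r*.tex    # only names starting with r: references.tex
echo *.md      # no match: bash and sh leave the pattern as is and print *.md
```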

#### d) Copying files and directories

Similar to moving files, we can also copy files and folders into new locations. This is done with the ```cp``` command




```
cp quotes.txt thesis/quotations.txt
```

Notice that, in addition to copying the file, we have also named the copy ```quotations.txt```. To copy a directory and everything inside it we need the recursive option, ```-r```. The command below creates a backup of the thesis folder named ```thesis_backup```:

```
cp -r thesis thesis_backup
```
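A sketch of why ```-r``` matters, in a scratch directory: plain ```cp``` refuses to copy a directory, while ```cp -r``` copies it along with its contents.

```shell
# Scratch directory with a small thesis folder (created for the demo).
demo=$(mktemp -d) && cd "$demo"
mkdir thesis && touch thesis/quotes.txt

cp thesis/quotes.txt thesis/quotations.txt   # copy and rename in one step

cp thesis thesis_backup 2>/dev/null || echo "cp without -r refused to copy the directory"
cp -r thesis thesis_backup                   # copies the folder and everything in it
ls thesis_backup                             # quotations.txt  quotes.txt
```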

#### e) Removing files and directories

We can now delete the quotes.txt file in the current directory, since it is in the wrong place and we already have a copy in thesis:

```
rm quotes.txt
```

**WARNING**: ```rm``` does not move the file to the trash bin. It 'unlinks' the file, marking the space it used on the hard drive as free so it can be reused. In practice this means the file cannot be recovered; special recovery programs exist, but recovery is often very difficult.

If we want to remove an empty directory we can use the ```rmdir``` command. If we need to remove a directory that contains items we use the ```rm``` command with the ```-r``` option (recursive).
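The difference between the two can be sketched in a scratch directory (the folder names are made up). The ```-i``` prompts need someone at the keyboard, so this sketch leaves that option out:

```shell
# Scratch directory with one empty and one non-empty folder (hypothetical names).
demo=$(mktemp -d) && cd "$demo"
mkdir empty_dir full_dir
touch full_dir/notes.txt

rmdir empty_dir                          # works: the directory is empty
rmdir full_dir 2>/dev/null || echo "rmdir refused: full_dir is not empty"
rm -r full_dir                           # removes the directory and everything in it
ls                                       # nothing left
```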

```
rm -ir thesis
```

I have included the ```-i``` option above. It is easy to accidentally delete files in a folder that we did not know were there, and ```-i``` asks for confirmation before deleting each file. Now that we have covered the basics of the Unix shell, we can move on to the fun stuff.

## Part II - The Fun Stuff

### 1) Working with files and searches

Navigate to the ```shell-lesson-data/exercise-data/writing``` directory. Inside the writing directory is a file called **abstract.txt**. Using the ```wc``` command we can list 

1) The number of lines in the file
2) The number of words
3) How many bytes the file uses.

For the ```wc``` (**w**ord **c**ount) command, the ```-l``` option displays only the number of lines, ```-w``` only the number of words, and ```-c``` only the size of the file (in bytes).

```
cd writing
wc abstract.txt
```

<img src="_static/abstract_wc.jpg" width="1000">
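To check what each count means, here is a sketch on a tiny made-up file whose counts are easy to verify by hand: two lines, three words, and 14 bytes (8 for "one two" plus its newline, 6 for "three" plus its newline).

```shell
# Scratch directory with a small hypothetical file.
demo=$(mktemp -d) && cd "$demo"
printf 'one two\nthree\n' > sample.txt

wc sample.txt      # lines, words, bytes, then the file name
wc -l sample.txt   # 2 lines
wc -w sample.txt   # 3 words
wc -c sample.txt   # 14 bytes
```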

To see the number of lines, words and bytes for each file, plus a total for all of them, we can use the wildcard character:

```
wc *.txt
```

<img src="_static/abstract_wc_all.jpg" width="1000">

### 2) Redirecting output

In the example above only three files are listed. If we had thousands of files, printing everything to the screen would not be practical. Instead we can use the ```>``` character to redirect the output from standard output (normally the terminal) into a file:

```
wc *.txt > lengths.txt
```

Notice that nothing is displayed in the CLI. This is because we have decided to write the information to the **lengths.txt** file.

<u>Redirect Option </u>
* ```>``` will create the file if it doesn't exist or, if the file does exist, **overwrite** it
* ```>>``` will create the file if it doesn't exist or **append** the new information to the end of it

We can confirm that the data was written to the **lengths.txt** file by using the ```cat``` command

```
cat lengths.txt
```
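The difference between ```>``` and ```>>``` shows up after a few runs. A minimal sketch with a made-up ```log.txt``` in a scratch directory:

```shell
# Scratch directory; log.txt is a hypothetical file name.
demo=$(mktemp -d) && cd "$demo"
echo "first run"  > log.txt    # creates log.txt
echo "second run" > log.txt    # overwrites: "first run" is gone
echo "third run" >> log.txt    # appends a new line at the end
cat log.txt                    # second run, then third run
```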

### 3) Using the Pipe | 

The reason why the Unix shell is so powerful is that it has the ability to chain together multiple commands. This is done using the pipe ```|```. If we wanted to sort the output of the data we could use the ```sort``` command

```
wc *.txt | sort -n
```

The ```-n``` option sorts numerically, here by the first column (the line count). To find the files with the most lines, we can use the ```tail``` command to view the bottom three entries. Note that the very last entry is the total that ```wc``` prints when it is given several files.

```
wc *.txt | sort -n | tail -n 3
```
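A sketch with three made-up files of known length makes the ordering easy to check, including where the total line ends up:

```shell
# Scratch directory with three hypothetical files of 1, 3 and 5 lines.
demo=$(mktemp -d) && cd "$demo"
printf 'a\n'             > short.txt
printf 'a\nb\nc\n'       > medium.txt
printf 'a\nb\nc\nd\ne\n' > long.txt

wc -l *.txt | sort -n                          # smallest first; the "9 total" line sorts last
wc -l *.txt | sort -n | tail -n 2 | head -n 1  # skip the total: the longest file, long.txt
```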

### 4) Finding Things

It is often difficult to remember the location of files or directories on servers or computers. This is especially true when dealing with multiple iterations of a program or script. There are two powerful tools that can be used to find items within directories and text documents.

* ```grep``` A utility that searches any given input file, selecting lines that match one or more patterns
* ```find``` A utility that recursively descends the directory tree for each path listed, evaluating the expression in terms of each file in the tree

In the current **writing** directory there is a file called **haiku.txt**. To find every line that contains "not", we can use ```grep```:

```
grep not haiku.txt
```

<img src="_static/grep_not.jpg" width="1000">

We see that grep has successfully returned all the lines that have the word __not__ in it. Let's try again with using __The__ as a pattern to search for 

```
grep The haiku.txt
```

<img src="_static/grep_The.jpg" width = "500">

First, we see that grep also matched words that merely contain the sequence "The". Additionally, running ```cat haiku.txt``` (image below) shows that all lower-case versions of the word __the__ were missed.

<img src="_static/haiku.jpg" width ="500">

To fix this we can include additional options to perform a better search 

```
grep -win The haiku.txt
```

<img src="_static/grep_win.jpg" width = "650">

From the image above, grep now returns what we initially intended. This shows that:

* ```grep``` by default is case-sensitive. Using the ```-i``` option allows ```grep``` to be case-insensitive
* ```grep``` matches the pattern anywhere, even inside a longer word. Using ```-w``` restricts matches to whole words
* We can also return line numbers with grep by using the ```-n``` option. 
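The effect of each option can be checked on a three-line made-up file in a scratch directory:

```shell
# Scratch directory with a small hypothetical text file.
demo=$(mktemp -d) && cd "$demo"
printf 'The cat\nthe theory\nThesis here\n' > sample.txt

grep The sample.txt        # "The cat" and "Thesis here": case-sensitive, matches inside words
grep -i the sample.txt     # all three lines: any case, still matches inside words
grep -wi the sample.txt    # "The cat" and "the theory": whole word "the", any case
grep -win the sample.txt   # same two lines, prefixed with line numbers 1: and 2:
```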


If we wanted to search multiple text files in a directory as well as the subdirectories we could use the recursive option, ```-r``` with grep

```
grep -iwnr "Yesterday" .
```

In the command above, in addition to the options and the pattern, we must now include the directory we want to search (here ```.```, the current directory). Note that grep is designed for text files: when a binary file matches, grep typically reports only that the binary file matches instead of printing the line.
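A sketch of the recursive search on a small made-up folder tree, with one match in a subdirectory and one nested a level deeper:

```shell
# Scratch directory with hypothetical notes in nested folders.
demo=$(mktemp -d) && cd "$demo"
mkdir -p notes/old
echo "Yesterday it worked" > notes/day1.txt
echo "yesterday's news"    > notes/old/day0.txt

grep -iwnr "Yesterday" .   # both files match: -r descends into notes/ and notes/old/
```

The output order follows the order in which grep walks the directories, so it may differ between systems.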

### 5) grep and the pipe

Just as with any other program, we can also pipe things to grep so that it can search. As an example, when users want to find a command that had previously been executed in the terminal they can use the ```history``` command with ```grep```

```
history | grep -i "cd"
```

The command above pipes the entire history to ```grep```, which prints only the lines containing "cd" (in any case, because of ```-i```).

### 6) find: a tool for searching for directories and files

```grep``` searches for information inside files. ```find``` searches the file system for files and directories whose names or other properties match a pattern.

```
find .
```


The command above recursively lists every file and directory under the current working directory and prints them to standard output.

To search only for directories use the ```-type d``` option

```find . -type d```

To search for files use ```-type f``` instead

```find . -type f```

To find files by name, give ```-name``` a pattern. Quote the pattern so that the shell passes it to ```find``` unchanged instead of expanding it first:

```find . -type f -name "*.txt"```
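A sketch on a small made-up tree shows the ```-type``` tests and why the quotes matter: unquoted, the shell expands ```*.txt``` to the one matching name in the current directory (```notes.txt```) before ```find``` ever sees the pattern.

```shell
# Scratch directory with a hypothetical tree of files.
demo=$(mktemp -d) && cd "$demo"
mkdir -p data/raw
touch notes.txt data/NE01.txt data/raw/NE02.txt data/summary.csv

find . -type d                  # ., ./data, ./data/raw
find . -type f -name "*.txt"    # all three .txt files, at any depth
find . -type f -name *.txt      # unquoted: becomes -name notes.txt, so only ./notes.txt
```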

Notice that ```find``` doesn't follow our typical command syntax: the directory to search comes first, followed by the tests. ```find``` can also hand the paths it finds to other commands. For example, to find all text files that match the pattern **NE\*.txt** and count how many lines each has:

```
wc -l $(find . -type f -name "NE*.txt")
```

In the command above, ```$( ... )``` is **command substitution**: the shell runs the ```find``` command inside the parentheses first and replaces the whole expression with its output, so the list of matching paths becomes the arguments to ```wc```. This extends what we can do with ```wc```, since so far we have only run it on files in the current working directory. We could also perform the same step with the ```-exec``` option of ```find```, where ```{}``` stands for each path found and ```\;``` marks the end of the command:

```
find . -type f -name "NE*.txt" -exec wc -l {} \;
```
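The approaches can be compared on two made-up files in separate subfolders. Ending ```-exec``` with ```+``` instead of ```\;``` is a standard variant that passes all the paths to a single ```wc```, so a total line appears just as with command substitution:

```shell
# Scratch directory with two hypothetical NE files (2 lines and 1 line).
demo=$(mktemp -d) && cd "$demo"
mkdir -p gyre/site1 gyre/site2
printf 'x\ny\n' > gyre/site1/NE01.txt
printf 'x\n'    > gyre/site2/NE02.txt

wc -l $(find . -type f -name "NE*.txt")             # find runs first; its output becomes wc's arguments
find . -type f -name "NE*.txt" -exec wc -l {} \;    # one wc per file: no total line
find . -type f -name "NE*.txt" -exec wc -l {} +     # one wc for all files: total line at the end
```

Command substitution splits its result on spaces, so file names containing spaces would be broken apart; ```-exec``` passes each path intact.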

We have now given a basic overview of the Unix shell and a few of the powerful features. In further lessons we will go into scripting and working with the shell using remote sessions. 