# Scripting Bash

Scripting is almost always preferable to running commands line by line for any process you will repeat regularly.
It also improves reproducibility and reliability: if a bash process is part of your data pipeline, you need to be able to justify, reproduce and share it in your papers.

Before we start scripting there are a few more simple bash concepts we want to look at.

## Looking in and working with files: cat, more, head, tail

### `cat`, again.

The man page for `cat` is excellent. So far we have only used `cat` to read a single file.
We can read multiple files by passing them as multiple arguments.
We can con**cat**enate files by passing two files and redirecting (`>`) the output to a third file.
We can change how a file is displayed using the `-` options, for example `-n` to number the output lines.

Keep this in mind, as it will be used later; refer back to `man cat` to find the options you need.
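A minimal sketch of those three uses, run in a throwaway directory made with `mktemp -d`. The file names `part1.txt`, `part2.txt` and `combined.txt` are made up for illustration.

```shell
#!/bin/bash
# Work in a throwaway directory so no real files are touched
cd "$(mktemp -d)" || exit 1

printf 'alpha\nbeta\n' > part1.txt
printf 'gamma\n' > part2.txt

# Read several files in one go: part1.txt is printed, then part2.txt
cat part1.txt part2.txt

# Concatenate: redirect the combined output into a third file
cat part1.txt part2.txt > combined.txt

# Change the display with an option: -n numbers every output line
cat -n combined.txt
```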

### `more` (or `less`)

The man page for `more` will, more or less, shame you into using `less`. You'll still see `more` used a lot because habits are hard to break.
The two are similar, but `less` is better.
We will continue with `less`, although we won't use much beyond what `more` offers, apart from the ability to scroll back up.

`less filename` opens a viewer for inspecting the file: the up and down arrows scroll through it, space jumps forward a window's worth of lines, and return/enter scrolls one line.

To exit press `q`.

The man page shows everything `less` can do to search and filter text files; we won't use those features here.

### `head` and `tail`

`head` and `tail` read lines from the top and bottom of a file respectively.
`head` is really useful for checking the top of a file, e.g. the column names in a CSV and the first few data rows.
`tail` is handy when a program is writing logs to a file (e.g. Python's `logging` module) and you want to quickly check the last few logged lines.

Both take `-n` as an argument to specify the number of lines to take off the top or bottom. 
If `-n` isn't supplied then you get 10 lines.
Check out the man pages for the other useful things they do.
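A small sketch of both commands on a made-up CSV (`data.csv` and its values are invented for illustration):

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1

# A small CSV: one header line plus five data rows (made-up values)
printf 'id,value\n1,10\n2,20\n3,30\n4,40\n5,50\n' > data.csv

# The header plus the first two data rows
head -n 3 data.csv

# The last two lines, as you would check the newest entries of a log
tail -n 2 data.csv

# Without -n you get up to 10 lines; this file has only 6, so all are shown
head data.csv
```

For a log that is still being written, `tail -f logfile` keeps printing new lines as they arrive; press `Ctrl`+`C` to stop it.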

### Bonus `echo`

We briefly saw `echo` in the `*` aside, so it fits to talk about it here.

`echo` prints a line to standard output, which is usually the terminal.
Its man page shows it is a short and simple command.
It's most useful for checking what a wildcard or command would do before you run it.

`echo rm *` 

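shows every file that `rm` would try to remove, because the shell expands the wildcard before the command runs.
There are more expansions where this is useful. For more powerful pattern matching, look at regular expressions (regex), which are related to, but not the same as, shell wildcards; [helpful website here](https://regex-generator.olafneumann.org).
Regex is outside the scope of this course.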

It can also be used in scripts to print messages to the user running the script, which is how we will see it used shortly.
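A minimal sketch of both uses in a throwaway directory; the files `notes.txt`, `results.txt` and `script.sh` are created just for the demo.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
touch notes.txt results.txt script.sh

# Dry run: the shell expands *.txt before echo runs, so this only prints
# the command that would be executed: rm notes.txt results.txt
echo rm *.txt

# In a script, echo keeps the user informed about each step
echo "Removing .txt files"
rm *.txt
echo "Done"
```

Only the `.txt` files are removed; `script.sh` does not match the wildcard, exactly as the dry run showed.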


## Terminal based text editors

Choice of terminal based text editors can be a contentious topic with many people very tribal about their preferred editor.
This is of course a silly argument as the clearly best editor is vim.
Let's compare editors to see why this is.

## Nano

To launch type `nano`. Or `nano file` to open a specific file.

To edit the text you simply start typing.
A hint panel at the bottom lists useful commands such as WriteOut (save), Cut Text (cut) and UnCut Text (paste).
Finally, there is Exit, which exits the program.

All of these commands are executed by holding `Ctrl` and pressing the indicated key: O, K, U and X for the commands listed above.

There are more commands in the panel, and more still that aren't shown; a web search will find them.

## Emacs

Emacs is very niche these days.
Some people still fervently believe it is the best, so we include it here.
The best way to get acquainted with it is to read the [docs here](https://www.gnu.org/software/emacs/).

## Vim

To launch type `vim`. Or `vim file` to open a specific file.

To start editing you need to go into insert mode: press `i` and `-- INSERT --` appears at the bottom.
You can now type as usual.
To stop typing press `esc`; `-- INSERT --` disappears and you are back in command (normal) mode.

Copying is called yanking: `yy` yanks the current line, `10yy` yanks 10 lines.
Cutting is called deleting: `dd` deletes the current line, `10dd` deletes 10 lines.
Pasting is called putting: `p` puts the text after the cursor (below the current line for whole lines).
Saving is called writing: `:w`.
Exiting is called quitting: `:q`.
To quit without saving you need to override the warning with `:q!`.
The most common command sequence is `esc` then `:wq`: `esc` returns to command mode, then `:wq` writes and quits.
If you haven't given the file a name, `:w` will error; `:w my-text-file.txt` saves the file under that name.

For much, much more, see [vimhelp](https://vimhelp.org/) or a [quick reference](https://vim.rtorr.com/).


### The choice of editor is down to you, but at some point somebody will ask you to justify it.

The rest of this lesson will use vim but feel free to use whatever you choose.

## Writing Shell/Bash Scripts


### Simple script.

To get started on bash scripting we will create a script as follows.

```bash

vim simple_script.sh

```


The `.sh` extension, like `.txt`, `.py`, `.docx`, etc., just denotes that this is a shell script.

vim hint: use `i` to go into insert mode

We will start our shell script with a `shebang` (`#!` followed by a path), which tells the reader and the system which program should be used to run the commands in the file.

simple_script.sh:
```shell
#!/bin/bash
```

Then add an echo to the script to make it print a message.

simple_script.sh:
```shell
#!/bin/bash

echo "Running simple_script.sh"

```

Then save and exit the editor.

vim hint: `esc` `:` `wq` `enter`

The final thing we need to do is give our script execute permission. Not all files can be run directly; we have to tell the system that we want this one to be runnable.
This is good for security. To do this:

```bash
chmod u+x simple_script.sh
```

Now we can run the script with `./simple_script.sh` (the `./` tells bash to look for it in the current directory).
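If you would rather not open an editor, the same script can be written, made executable and run in one go. This is a minimal sketch in a throwaway directory; it uses a "here document" (`<<'EOF'`), which feeds the lines up to `EOF` to `cat`, whose output is redirected into the file.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1

# Everything between <<'EOF' and EOF becomes the contents of simple_script.sh
cat > simple_script.sh <<'EOF'
#!/bin/bash

echo "Running simple_script.sh"
EOF

# Before chmod the file has no execute permission
[ -x simple_script.sh ] || echo "not executable yet"

chmod u+x simple_script.sh
./simple_script.sh
```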


### Less simple script

Functionally, you can think of a shell script as doing the same thing you were doing at the command line.
If you have used Python, it's like the difference between the interactive interpreter and a `.py` file.
The benefits are reproducibility and autonomy.
Reproducible workflows will save you time if you have to execute the same work on raw data multiple times, and they let you share your workflow with others for review or to help them.
Autonomy means you can fire and forget, which is useful for long tasks with multiple steps and for submitting work to remote systems with a queue.

As an example we will recreate all the work we did for the treasure hunt and put it in one script. 
With this script we can rerun the hunt without typing all the commands by hand. 

scripted_hunt.sh:
```bash
#!/bin/bash

echo "make the treasure hunt folder and copy content to it"
mkdir treasure_hunt || { echo "treasure_hunt dir exists, exiting"; exit 1; }
cp -r /etc/skel/itl_treasure_hunt/. ./treasure_hunt || { echo "copy failed, exiting"; exit 1; }
echo "copied content to treasure_hunt dir"

echo "make solution dir and move readme"
mkdir treasure_hunt_solution || { echo "treasure_hunt_solution dir exists, exiting"; exit 1; }
mv ./treasure_hunt/.README_FOR_SOLUTION.md ./treasure_hunt_solution/README.md || { echo "move failed, exiting"; exit 1; }
echo "solution dir created and readme added"

echo "Clean the treasure hunt junk"
rm treasure_hunt/junk_file.txt || { echo "could not remove junk file, exiting"; exit 1; }
rm -r treasure_hunt/junk_folder/ || { echo "could not remove junk folder, exiting"; exit 1; }
echo "Junk cleaned"

echo "Run the first worked example"
cd treasure_hunt/search_in_here || { echo "1: cd failed, exiting"; exit 1; }
mv c.txt ../../treasure_hunt_solution/ || { echo "2: mv failed, exiting"; exit 1; }
mv .scraps map_pieces || { echo "3: rename failed, exiting"; exit 1; }
rm *.txt || { echo "4: cleanup failed, exiting"; exit 1; }
echo "First worked example complete"

exit 0
```

Apart from the shebang, this script contains no `#` lines; as in Python, a `#` starts a comment.
Instead, the script uses basic `echo` logging: running it gives verbose output describing each step, and the echo lines double as comments.

The pattern we use extensively here is `command args || { echo "error"; exit 1; }`.
Given how much power bash has, we want to make sure that if a command fails we do not continue, as the script would then be running from an unknown state.
Every bash command returns an exit status: 0 for success, non-zero for failure.
The `||` (or) operator checks this status and, only if it is **non-zero**, runs the code after it.
That code is in curly braces and contains two commands, each terminated by a semicolon.
The first is an `echo` to tell the user why/where the failure occurred.
The second is an `exit` command with a generic exit code of 1.
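A small sketch of exit statuses and `||` in isolation. It uses the standard `true` and `false` commands, which do nothing except succeed (status 0) and fail (status 1). The guard is wrapped in a subshell `( ... )` so that its `exit 1` ends only the subshell and not the demo.

```shell
#!/bin/bash
# $? holds the exit status of the most recent command
true
echo "true exited with $?"     # true exited with 0
false
echo "false exited with $?"    # false exited with 1

# || runs its right-hand side only if the left-hand side returned non-zero
true  || echo "never printed"
false || echo "false failed, so this runs"

# The guard pattern from the script; the exit ends the subshell early
( false || { echo "step failed, exiting"; exit 1; }; echo "never reached" )
echo "subshell exited with $?" # subshell exited with 1
```

Note the spaces inside the braces and the `;` before the closing `}`: bash requires both, so `{echo error; exit}` is a syntax error.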

All the commands before the `||` operators recreate the workflow from the first exercise.
Finally, we explicitly add `exit 0` to the end of our script.
The script therefore exits with 0 only if every step succeeded, so other scripts can run it and check whether it worked.
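A minimal sketch of a calling script relying on those exit codes. The child scripts `good_step.sh` and `bad_step.sh` are hypothetical, written on the fly with `printf` just for the demo.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1

# Two hypothetical child scripts: one succeeds, one fails part-way
printf '#!/bin/bash\necho "step ok"\nexit 0\n' > good_step.sh
printf '#!/bin/bash\nfalse || { echo "step failed"; exit 1; }\nexit 0\n' > bad_step.sh
chmod u+x good_step.sh bad_step.sh

# The caller only carries on if the child exits with 0
./good_step.sh || { echo "good_step failed"; exit 1; }

# if can also branch on a child's exit status directly
if ./bad_step.sh; then
    echo "bad_step succeeded"
else
    echo "bad_step exited with a non-zero status"
fi
```

This is the same idea the second worked example uses to chain several scripts together.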


## Zip files and tar

You are likely already familiar with compressed archives such as `.zip`, `.tar.gz`, etc.
Bash can handle `.tar` archives with ease using the `tar` program (`.zip` files have their own `zip`/`unzip` tools); use `man tar` to see the documentation.

We will focus on standard archives, `[filename].tar.gz`.
You can look up commands for other archive types using man or a web search.

The files we will look at have two extensions. The first, `.tar`, is the tarball or archive.
This is a single file that contains all the files that have gone into it.

`tar -cf archive.tar /path/files`

This command creates (`c`) an archive, and `f` tells `tar` to write it to the filename that follows.

The contents of a `.tar` can be listed using `-tvf`: `t` is list, `v` is verbose.

`tar -tvf archive.tar`
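A minimal sketch of creating and listing an archive in a throwaway directory; the folder `files` and its contents are made up for the demo.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1

# Some example files to archive
mkdir files
echo "first"  > files/one.txt
echo "second" > files/two.txt

# c = create, f = write the archive to the filename that follows
tar -cf archive.tar files

# t = list the contents, v = verbose (permissions, owner, size, date)
tar -tvf archive.tar
```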

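Moving on to the second extension, `.gz`: this denotes the compression, here `gzip`. Other compression formats exist and can be found with a web search.
Adding `z` tells `tar` to compress the archive with gzip:

`tar -czf archive.tar.gz /path/files`

This creates a `.tar.gz` file containing everything in `/path/files`, compressed into a smaller file that is better for transferring.
Keep `f` last in the group of options, because it takes the next argument as the archive's filename; `-cfz` would try to write an archive named `z`.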
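A small sketch comparing an uncompressed and a gzip-compressed archive of the same made-up, very repetitive file (repetitive data compresses well, so the difference is easy to see):

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1

# 5000 identical lines of demo data
mkdir files
yes "the same line again" | head -n 5000 > files/repeat.txt

tar -cf archive.tar files       # archive only
tar -czf archive.tar.gz files   # archive, then compress with gzip (z)

# Sizes in bytes: the .tar.gz should be much smaller
wc -c archive.tar archive.tar.gz
```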

Unpacking is more common, and to do it we simply swap `c` for `x`.

Unpacking an uncompressed archive:

`tar -xf archive.tar`

Or a compressed archive:

`tar -xzf archive.tar.gz`

Finally, either of these commands can take `-C`, which specifies the directory the archive will be extracted into, for example:

`tar -xzf archive.tar.gz -C /path/to/location/`

This will take the archive and put all the files it contains in the directory `location`, which must already exist.
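A minimal sketch of extracting into a chosen directory; `unpacked/location` is a made-up target, created first with `mkdir -p` because `-C` does not create it.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1

mkdir files
echo "first" > files/one.txt
tar -czf archive.tar.gz files

# -C needs the target directory to exist, so create it first
mkdir -p unpacked/location
tar -xzf archive.tar.gz -C unpacked/location

# The folder structure stored in the archive is recreated under the target
ls -R unpacked/location
```

Because the archive stored the files under `files/`, they end up in `unpacked/location/files/`, not directly in `location`. Listing an archive with `-tvf` first tells you what structure to expect.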


# Second Worked Example

This example will see you use scripts to automate your workflow. 

Copy and paste the following script into a file called `coordinate_run.sh` in the `treasure_hunt_solution` directory.

treasure_hunt_solution/coordinate_run.sh:
```bash
#!/bin/bash

# This script will remove treasure_hunt, required so we always start from the same state
./cleanup.sh || { echo cleanup failed; exit 1; }

# This script will rerun the steps from the 1st worked example, but not create or move things into treasure hunt solution
./rerun_we1.sh || { echo rerun_we1 failed; exit 1; }

# This script will untar the map pieces and results in all .map.txt pieces being moved to the search_in_here directory and any without the .map.txt extension deleted.
./untar_and_clean_map.sh || { echo untar_map failed; exit 1; }

exit 0;
```

You will need to create and write the three scripts it calls yourself.

Then, from inside `treasure_hunt_solution`, run `./coordinate_run.sh` and it will run your three scripts and exit.

You should then have all the map pieces required for the next exercise.

## Hints

<details>
<summary>Show all hints</summary>
<details>
<summary>Why won't it run?</summary>

Did you forget `chmod`?

</details>

<details>
<summary>cleanup.sh hints</summary>
<details>
<summary>Hint 1</summary>

You will need to use `rm -r`

</details>

<details>
<summary>Hint 2</summary>

You will need to use `..` to go up a level.

</details>
</details>

<details>
<summary>rerun_we1.sh hints</summary>

You need to copy and make all of the modifications we made to `treasure_hunt`.

You need to not copy/move things to `treasure_hunt_solution`.

You need to delete things that would have been copied/moved to `treasure_hunt_solution`.


</details>

<details>
<summary>untar_and_clean_map.sh hints</summary>

You need to pass `tar` the location of the archive and, with `-C`, the location of the intended output.

You need to make sure all files with .map.txt extension are in `search_in_here` and no others are.

<details>
<summary>Sorting Method 1</summary>

Extract into a temporary folder.

Use `*` to move all the `.map.txt` files to `search_in_here`.

Delete the temporary folder and content.

</details>

<details>
<summary>Sorting method 2</summary>

Inspect the files in the archive.

Make note of the alternative file extensions that are not `.map.txt` (there are 3 others).

Extract to `search_in_here`.

Use a wildcard to select and delete these files from `search_in_here`.

</details>

</details>

</details>


## Solutions

<details>
<summary>cleanup.sh solution</summary>

Very simple; just get the path right.

treasure_hunt_solution/cleanup.sh:
```bash
#!/bin/bash

echo "Removing treasure_hunt directory"
rm -r ../treasure_hunt/ || { echo "cannot remove treasure_hunt"; exit 1; }
echo "Done"

exit 0;
```

</details>

<details>
<summary>rerun_we1.sh solution</summary>

Mostly just the script from earlier but with some modifications to not mess with the solution folder.

treasure_hunt_solution/rerun_we1.sh:
```bash
#!/bin/bash

echo "make the treasure hunt folder and copy content to it"
mkdir ../treasure_hunt || { echo "treasure_hunt dir exists, exiting"; exit 1; }
cp -r /etc/skel/itl_treasure_hunt/. ../treasure_hunt || { echo "copy failed, exiting"; exit 1; }
echo "copied content to treasure_hunt dir"

echo "remove solution readme"
rm ../treasure_hunt/.README_FOR_SOLUTION.md || { echo "remove failed, exiting"; exit 1; }
echo "done"

echo "Clean the treasure hunt junk"
rm ../treasure_hunt/junk_file.txt || { echo "could not remove junk file, exiting"; exit 1; }
rm -r ../treasure_hunt/junk_folder/ || { echo "could not remove junk folder, exiting"; exit 1; }
echo "Junk cleaned"

echo "Move the map_pieces and remove text files"
cd ../treasure_hunt/search_in_here || { echo "1: cd failed, exit"; exit 1; }
mv .scraps map_pieces || { echo "2 rename failed, exit"; exit 1; }
rm *.txt || { echo "3 cleanup failed, exit"; exit 1; }
echo "done"

exit 0;
```

</details>

<details>
<summary>untar_and_clean_map.sh solution</summary>

The temp folder method benefits from being agnostic to the other file extensions.

Temp Folder Method:
treasure_hunt_solution/untar_and_clean_map.sh:
```bash
#!/bin/bash

mkdir ../treasure_hunt/temp_map || { echo "cannot create temp folder"; exit 1;}
tar -xzf ../treasure_hunt/search_in_here/map_pieces/map_bundle.tar.gz -C ../treasure_hunt/temp_map || { echo "failed to unpack"; exit 1; }

cp ../treasure_hunt/temp_map/map_bundle/*.map.txt ../treasure_hunt/search_in_here/ || { echo "failed to copy"; exit 1; }

rm -r ../treasure_hunt/temp_map || { echo "cannot remove temp_map"; exit 1; }

exit 0
```
Inspection method:

Inspect the archive's contents:

`tar -tvf ../treasure_hunt/search_in_here/map_pieces/map_bundle.tar.gz`

Note that the 3 junk extensions are:

- `.wet_map.txt`
- `.torn_map.txt`
- `.blank_page.txt`

treasure_hunt_solution/untar_and_clean_map.sh:
```bash
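#!/bin/bash

# Like the temp folder method, this assumes the archive unpacks into a map_bundle/ folder
tar -xzf ../treasure_hunt/search_in_here/map_pieces/map_bundle.tar.gz -C ../treasure_hunt/search_in_here || { echo "failed to unpack"; exit 1; }

cd ../treasure_hunt/search_in_here/map_bundle || { echo "cd failed"; exit 1; }
rm *.wet_map.txt *.torn_map.txt *.blank_page.txt || { echo "failed to remove junk"; exit 1; }
mv *.map.txt .. || { echo "failed to move map pieces"; exit 1; }
cd .. || { echo "cd failed"; exit 1; }
rmdir map_bundle || { echo "map_bundle not empty, unexpected files left"; exit 1; }

exit 0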
```

</details>