# Working with files and directories

::: {.callout-note}
## Prior experience

You can skip this section and proceed directly to the exercises if you are already familiar with basic commands like `cp`, `mv`, `less` and `nano`.
:::

::: {.callout-tip}
Remember that we have provided a list of helpful tips and hints in the appendix: @sec-unix-tips.
:::

## Examining files

### `cat`: viewing short files

The most basic command for viewing a file is the `cat <file>` command. It simply prints all of the contents of a file to the screen (= _standard output_). 



```bash
$ cd training/unix-demo
$ cat short.txt
CHAPTER I.
Down the Rabbit-Hole


Alice was beginning to get very tired of sitting by her sister on the
bank, and of having nothing to do: once or twice she had peeped into
the book her sister was reading, but it had no pictures or
conversations in it, “and what is the use of a book,” thought Alice
“without pictures or conversations?”

So she was considering in her own mind (as well as she could, for the
hot day made her feel very sleepy and stupid), whether the pleasure of
making a daisy-chain would be worth the trouble of getting up and
picking the daisies, when suddenly a White Rabbit with pink eyes ran
close by her.
```



::: {.callout-tip collapse="true"}
## Try using `cat` on the file named `long.txt` and see what happens (Click me to expand!)

The entire file (in this case, the entirety of Moby Dick) is printed to the screen. This works, but it is not very pleasant to navigate, especially when you consider that this text is only 0.03% of the size of the (rather short) human Y chromosome.
:::

While `cat` is very useful, it is clearly not suitable for large text files. Since long files are very common (and not just in bioinformatics), we need an alternative. Enter the `less` command.

### `less`: viewing large files

`less` is suitable for viewing very large files, which would otherwise crash a normal text editor or a program like Excel. It opens the contents of the file in a dedicated viewer, i.e. your terminal and prompt are temporarily replaced by the interface of the `less` tool. You can exit this interface by pressing `q`.

::: {.callout-tip collapse="true"}
## Try using the `less` command to view the contents of the truncated human Y chromosome (Click me to expand!)

Note that we truncated the Y chromosome to the first 100,000 basepairs, to keep the file size small.

```bash
$ less 3B207-2_S92_L001_R1_001.fastq.gz
```

![Opening a FASTQ file in `less`](../assets/less-fastq.png)
:::

We will learn more about FASTQ files in a later chapter. For the time being, it is enough to know that these files are very large and very common in genomics; they are the raw output of DNA/RNA sequencing and store the read fragments.

::: {.callout-note collapse="false"}
## Navigating inside `less`

- Use arrow keys to navigate. `space` and `b` can also be used to go forward and backwards, and `page up`/`page down` work as well.
- Press `g` to jump to the start of the file
- Press `G` (`shift + g`) to jump to the end of the file
- Type `/` followed by a search term to search forwards (or `?` followed by a search term to search backwards), then press `n`/`N` to jump to the next/previous match
- To exit, press `q`
- Use the help command for more info: `less --help`
:::

### `head` and `tail`: viewing the start or end of files

Sometimes we are not interested in viewing the entire file, but only its first or last few lines. The commands `head` and `tail` were created for exactly this use case. The basic usage is simply `head <filepath>`, but a few optional flags can alter the default behaviour.


```bash
$ head 3B207-2_S92_L001_R1_001.fastq
@M00984:485:000000000-KR655:1:1101:9123:1645 1:N:0:92
AAAGAGAATATATAAAGCCTTTTTCATTTTTTTTCGTTTTATCTTATCATCCTTATTAATTATATATATTATTAGTGATTGTATTTTTATTTTCCCTTTTGTAATTATATTAATATATTTTTTTGTTTTCAAAAGTTTTTCG
+
-ACCCFDGGFFFFDFAEEFGGAFFD,CEEFCCBC<+;C,8,<;C@,C@,C@@@C,;C,,<C,C<F,<6A,,<,,,,,,::,:,:AFFF8F??@@959?=A,9,4AB9A,CA,,9,C,9@@EF@C6DEE8=,,,,,,99@=8+
@M00984:485:000000000-KR655:1:1101:19398:1658 1:N:0:92
CATATTCTTCTTTTTTTTCATGTATATGACTACTATTATAATTATATGTAGATTTACTTTTATGTTCCTGGAAACATATTTTTTTATTTGTATTTTCATCTACGTTCTT
+
@CCCCGGGGGGGGGGGGGGGGGGGGGGCCEE96C,CE,C,,<C,C,C,C,6,,<C,<CFEF9EFFF@EE,,,,,;,C,CEFEFFE<FFEAF,AE,:5,A@ECE:FFFFF
@M00984:485:000000000-KR655:1:1101:10191:1682 1:N:0:92
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
```


| Command               | Result                                |
|-----------------------|---------------------------------------|
| `head <file>`         | Print the first 10 lines of a file    |
| `tail <file>`         | Print the last 10 lines of a file     |
| `head -n <N> <file>`  | Print the first N lines of a file     |
| `tail -n <N> <file>`  | Print the last N lines of a file      |
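
The effect of the `-n` flag is easiest to see on a small file. The following is a minimal sketch, not part of the course data: the file `demo.txt` is a made-up name, and it is created in a temporary directory with `seq`, which simply writes the numbers 1 to 20 on separate lines.

```bash
# Create a throwaway file containing the numbers 1 to 20, one per line
# (demo.txt is a hypothetical file used only for this illustration)
tmpdir=$(mktemp -d)
demo="$tmpdir/demo.txt"
seq 1 20 > "$demo"

# Keep only the first 3 lines
first_lines=$(head -n 3 "$demo")
echo "$first_lines"

# Keep only the last 2 lines
last_lines=$(tail -n 2 "$demo")
echo "$last_lines"

# Clean up the temporary directory
rm -r "$tmpdir"
```

The first command should print the numbers 1 to 3 and the second the numbers 19 and 20.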

### `wc`: counting lines

A final command for extracting information from a text file is the `wc` command, which can be used to count the number of lines, words and bytes (file size) of a file. By default, it prints all of this information, but by providing the `-l` flag, you can tell the command to only return the number of lines.

```bash
$ wc 3B207-2_S92_L001_R1_001.fastq
460600   575750 46746025 3B207-2_S92_L001_R1_001.fastq

$ wc -l 3B207-2_S92_L001_R1_001.fastq
460600 3B207-2_S92_L001_R1_001.fastq
```

::: {.callout-tip collapse="true"}
## How many reads are there in this FASTQ file? (Click me to expand!)

Each read in a FASTQ file consists of four lines (see @sec-fastq). Therefore, we can simply divide the output of `wc -l` by four to obtain the number of reads. In this case: $460{,}600 / 4 = 115{,}150$ reads.

| Line | Description                                                                                                           |
|------|-----------------------------------------------------------------------------------------------------------------------|
| 1    | Identifier: always starts with ‘@’ and contains information about the read                                            |
| 2    | The sequence of nucleotides making up the read                                                                        |
| 3    | Always begins with a ‘+’ and sometimes repeats the identifier                                                         |
| 4    | A string of ASCII characters that encode the quality score of each base (so it has exactly the same length as line 2) |

:::
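
The division by four can also be done directly in the shell. The sketch below assumes a well-formed FASTQ file in which every read occupies exactly four lines. The file `example.fastq` and its three reads are made up for illustration and are not part of the course data.

```bash
# Build a tiny FASTQ-like file with three made-up reads (four lines each)
tmpdir=$(mktemp -d)
fastq="$tmpdir/example.fastq"
printf '@read1\nACGT\n+\nFFFF\n@read2\nTTGA\n+\nFFFF\n@read3\nGGCC\n+\nFFFF\n' > "$fastq"

# Feeding the file via "<" makes wc print only the number, without the file name
n_lines=$(wc -l < "$fastq")

# Shell arithmetic: four lines per read
n_reads=$(( n_lines / 4 ))
echo "Number of reads: $n_reads"

# Clean up the temporary directory
rm -r "$tmpdir"
```

Note that `$(( ... ))` performs integer division, so a file whose line count is not a multiple of four would silently give a rounded-down result.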


## Editing files



### Navigating inside `nano`

- Your mouse pointer won’t work. Use arrow keys to move instead.
- To save, press `ctrl+o`, followed by return/enter.
- To exit, press `ctrl+x`. If you have unsaved changes, `nano` will first ask whether you want to save them (answer with `y` or `n`).

![The `nano` text editor](../assets/nano.png)

## Moving things around

### Copying files and directories

### Moving or renaming files and directories

### Creating directories

### Removing things

The `rm` command (_remove_) is used to delete files and directories. Be warned though: once deleted, things are really gone. There is no recycle bin or trash folder from which you can restore deleted items!

```bash
# for files:
rm <file path>

# for directories
rm -r <directory path>
```

For files, this works as expected, but for directories you need to provide the `-r` flag (or `--recursive`). This tells `rm` to remove the directory recursively, i.e. together with all of its contents. If you don't use this option, you will see the following error:

```bash
$ rm test/
rm: cannot remove 'test/': Is a directory
```
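
To try this out safely, you can recreate the situation in a throwaway location. This is a minimal sketch: the directory `test` and the file `notes.txt` are made-up names, and everything happens inside a temporary directory created with `mktemp -d`, so nothing of value can be deleted.

```bash
# Work in a temporary directory so nothing important can be deleted
tmpdir=$(mktemp -d)
mkdir "$tmpdir/test"
touch "$tmpdir/test/notes.txt"

# Without -r, rm refuses to delete a directory and exits with an error
if rm "$tmpdir/test" 2>/dev/null; then
    plain_rm="removed"
else
    plain_rm="refused"
fi
echo "rm without -r: $plain_rm"

# With -r, the directory and everything inside it are deleted
rm -r "$tmpdir/test"
if [ -e "$tmpdir/test" ]; then
    recursive_rm="still there"
else
    recursive_rm="removed"
fi
echo "rm with -r: $recursive_rm"

# Clean up the temporary directory itself
rm -r "$tmpdir"
```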

Sometimes, files will be write-protected and `rm` will ask for confirmation before removing them. If you are really sure that you want to delete them, you can type `y` and press enter. Alternatively, you can cancel the operation (by entering `n` or by pressing `ctrl+c`) and try again, this time providing the `-f` flag (or `--force`), which skips the confirmation.

```bash
# create a new empty file
$ touch protected-file
# change its permissions so that it is protected against writing and deleting (see appendix for more info on file permissions)
$ chmod a-w protected-file
# try to remove it
$ rm protected-file
rm: remove write-protected regular empty file 'protected-file'? n
# use the --force flag
$ rm -f protected-file
```

::: {.callout-warning}
## Watch out...

Be careful while learning your way around the command-line. The Unix shell will do _exactly_ what you tell it to, often without hesitation or asking for confirmation. This means that you might accidentally move, overwrite or delete files without intending to do so. For example, when creating, copying or moving files, they can overwrite existing ones if you give them the same name. Similarly, when a file is deleted, it will be removed completely, without first passing by a recycle bin.

**No matter how much experience you have, it is a good idea to remain cautious when performing these types of operations.**

For the purposes of learning, if you are using your own device instead of a cloud environment, we recommend that you work in a dedicated playground directory or even create a new user profile to be extra safe. And as always, backups of your important files are invaluable regardless of what you are doing.
:::
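
One simple way to guard against accidental overwriting is to check whether the destination already exists before copying. The snippet below is only a sketch of that idea: `results.txt` and `draft.txt` are hypothetical file names, created in a temporary directory.

```bash
# Set up two made-up files in a temporary directory
tmpdir=$(mktemp -d)
echo "original" > "$tmpdir/results.txt"
echo "new" > "$tmpdir/draft.txt"

# Only copy when the destination does not exist yet
if [ -e "$tmpdir/results.txt" ]; then
    echo "results.txt already exists, not overwriting"
else
    cp "$tmpdir/draft.txt" "$tmpdir/results.txt"
fi

# The existing file is untouched
kept=$(cat "$tmpdir/results.txt")
echo "$kept"

# Clean up the temporary directory
rm -r "$tmpdir"
```

For interactive work, the `-i` flag of `cp`, `mv` and `rm` offers a similar safety net: it makes the command ask for confirmation before overwriting or removing a file.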

## Summary

::: {.callout-tip collapse="false"}
## Overview of concepts and commands

| Command                               | Result                                                       |
|---------------------------------------|--------------------------------------------------------------|
| `cat <path/to/file>`                  | print the content of files                                   |
| `less <path/to/file>`                 | read the contents of (large) files in a special viewer       |
| `head/tail <path/to/file>`            | view the first or last lines of a file                       |
| `wc <path/to/file>`                   | display the line/word/byte count of a file                   |
| `nano <path/to/file>`                 | open a file (or create a new file) in the `nano` text editor |
| `cp [-r] <source> <destination>`      | copy a file/directory to a new location                      |
| `mv <source> <destination>`           | move a file/directory to a new location (or rename it)       |
| `rm [-r] <path/to/file_or_directory>` | permanently remove a file/directory                          |
| `mkdir <path/to/directory>`           | create a new directory                                       |

:::



