# Working directly with files and directories via CLI

I hope everyone is feeling more comfortable with coding and using the command-line interface! Now, let's explore how we can interact directly with plain text files, such as .txt, .tsv, and .csv, using the command line. This skill is especially valuable when working with genomic data, where you often need to count the number of FASTA sequences, inspect file contents, or even make quick edits. The command line provides powerful tools to do all of this efficiently. Let’s dive in! 

**I want to note that many of these commands will not work on files that are not plain text. Some examples of common file types that are not plain-text files would be “.docx”, “.pdf”, or “.xlsx”. This is because those file formats contain special types of compression and formatting information that are only interpretable by programs specifically designed to work with them.**

### Viewing file contents
Now that we are experts in navigating around our directories, let's make our way to the **unix_intro** folder. 

Say I want to see the entire contents of the ***example.txt*** file in our working directory. To do this, I would run the **```cat```** command, which prints the entire file to the screen. Let's try it below:

```bash
cat example.txt
```

```text
This is line 1
This is a pretend data file
This is line 2
This is line 3
This is line 4
This is line 5
This is line 6
This is line 7
There may be fasta sequences here
This is line 8
This is line 9
This is line 10
This is line 11
This is line 12
This is line 13
This is line 14
This is line 15
This is line 16
This is line 17
This is line 18
These could be fasta sequences
This is line 19
This is line 20
This is line 21
This is line 22
This is line 23
This is line 24
This is line 25
This is line 26
This is line 27
This is line 28
This is line 29
This is line 30
This is line 31
This is line 32
This is line 33
This is line 34
This is line 35
This is line 36
This is line 37
This is line 38
This is line 39
This is line 40
This is line 41
This is line 42
This is line 43
This is line 44
This is line 45
I have data in here
This is line 46
This is line 47
This is line 48
This is line 49
This is line 50
This is line 51
This is line 52
This is line 53
This is line 54
This is line 55
This is line 56
```


The ```cat``` command shows the entire contents of the text file. To number each line, add the ```-n``` flag.

```bash
cat -n example.txt
```

```text
     1	This is line 1
     2	This is a pretend data file
     3	This is line 2
     4	This is line 3
     5	This is line 4
     6	This is line 5
     7	This is line 6
     8	This is line 7
     9	There may be fasta sequences here
    10	This is line 8
    11	This is line 9
    12	This is line 10
    13	This is line 11
    14	This is line 12
    15	This is line 13
    16	This is line 14
    17	This is line 15
    18	This is line 16
    19	This is line 17
    20	This is line 18
    21	These could be fasta sequences
    22	This is line 19
    23	This is line 20
    24	This is line 21
    25	This is line 22
    26	This is line 23
    27	This is line 24
    28	This is line 25
    29	This is line 26
    30	This is line 27
    31	This is line 28
    32	This is line 29
    33	This is line 30
    34	This is line 31
    35	This is line 32
    36	This is line 33
    37	This is line 34
    38	This is line 35
    39	This is line 36
    40	This is line 37
    41	This is line 38
    42	This is line 3
```

With every line numbered, the number on the last line tells us exactly how many lines (rows) the file has. 
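
If all you need is the line count, ```wc -l``` reports it directly without printing the file. The sketch below is a minimal example that creates a small hypothetical file, `demo.txt`, in a temporary directory so it runs anywhere. With the course files, you would simply run ```wc -l example.txt```.

```bash
#!/usr/bin/env bash
# Create a throwaway directory so no real files are touched.
tmpdir=$(mktemp -d)
demo="$tmpdir/demo.txt"   # hypothetical demo file, not the course's example.txt

# Write three lines, each ending in a newline.
printf 'This is line 1\nThis is line 2\nThis is line 3\n' > "$demo"

# With a filename argument, wc -l prints the count followed by the name.
wc -l "$demo"

# Reading from stdin (<) prints only the count; tr strips the padding
# spaces that some versions of wc add in front of the number.
line_count=$(wc -l < "$demo" | tr -d ' ')
echo "demo.txt has $line_count lines"

# Clean up the temporary directory.
rm -r "$tmpdir"
```

One caveat: ```wc -l``` counts newline characters, so a file whose last line does not end with a newline reports one fewer line than ```cat -n``` numbers.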

What if we just wanted to see the first few lines of our file? Then, we would use the **```head```** command. Let's try it out:

```bash
head example.txt
```

What about if we wanted to see the last few lines? We would use the **```tail```** command.

```bash
tail example.txt
```

By default, ```head``` and ```tail``` show the first and last 10 lines of a file, respectively. To see a different number of lines, use the ```-n``` flag followed by the number of lines you want. 
Let's say I only want to see the first 5 lines of the **example.txt** file. We would run:

```bash
head -n 5 example.txt
```

And what about the last 5 lines?

```bash
tail -n 5 example.txt
```
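
Because ```head``` and ```tail``` both accept ```-n```, you can chain them with a pipe (```|```), which sends the output of one command into the next, to pull lines out of the middle of a file. Here is a minimal sketch that uses ```seq``` to generate a hypothetical 20-line file named `numbers.txt`. To get lines 6 through 10, it keeps the first 10 lines and then takes the last 5 of those.

```bash
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
demo="$tmpdir/numbers.txt"   # hypothetical demo file

# Write the numbers 1 to 20, one per line.
seq 1 20 > "$demo"

first_five=$(head -n 5 "$demo")   # lines 1-5
last_five=$(tail -n 5 "$demo")    # lines 16-20

# Lines 6-10: take the first 10 lines, then keep the last 5 of those.
middle=$(head -n 10 "$demo" | tail -n 5)

echo "First five:"; echo "$first_five"
echo "Last five:";  echo "$last_five"
echo "Lines 6-10:"; echo "$middle"

rm -r "$tmpdir"
```

The same pattern applies to any single line. For example, ```head -n 45 example.txt | tail -n 1``` prints only line 45 of **example.txt**.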

**Great work everyone! These are some of the most basic commands for looking into your files, and some of the commands we use most often in bioinformatics to inspect our data.**

Let's say we have a very large file we want to inspect. If we try to use the ```cat``` command, it prints every line to the screen, which can flood our CLI or even crash our environment. In this case, we can use the **```less```** or **```more```** commands to inspect the file instead.

##### ```less``` command
The ```less``` command lets you view a file one page at a time. Let's try it out on the **example.txt** file first. Because ```less``` is interactive, the notebook may simply print the entire file. If that happens, open a new terminal session and run the command there instead. 

```bash
less example.txt
```

```text
This is line 1
This is a pretend data file
This is line 2
This is line 3
This is line 4
This is line 5
This is line 6
This is line 7
There may be fasta sequences here
This is line 8
This is line 9
This is line 10
This is line 11
This is line 12
This is line 13
This is line 14
This is line 15
This is line 16
This is line 17
This is line 18
These could be fasta sequences
This is line 19
This is line 20
This is line 21
This is line 22
This is line 23
This is line 24
This is line 25
This is line 26
This is line 27
This is line 28
This is line 29
This is line 30
This is line 31
This is line 32
This is line 33
This is line 34
This is line 35
This is line 36
This is line 37
This is line 38
This is line 39
This is line 40
This is line 41
This is line 42
This is line 43
This is line 44
This is line 45
I have data in here
This is line 46
This is line 47
This is line 48
This is line 49
This is line 50
This is line 51
This is line 52
This is line 53
This is line 54
This is line 55
This is line 56
```


In a terminal, you should see the first screenful of the **example.txt** file (roughly 50 lines, depending on the size of your window). At the bottom of the screen, there is a blinking cursor. Press the down arrow on your keyboard once, then press the up arrow. 

As you can see, this lets us move through the file line by line. We can also use the mouse scroll wheel to move through the data. This is a great way to look at data in our files without having to:
* Open the file in a GUI application (which we may not even have installed for that file type)
* Print all of the data to the screen with ```cat```

After running the `less` command, you might notice that your terminal doesn’t return to a new command-line prompt. This is because `less` lets you scroll through the file interactively, and it stays open until you explicitly close it. To exit `less` and return to the command prompt, press the **"q"** key (for "quit").

I want to show you what the ```less``` command looks like on a larger file. To do this, let's navigate to the **~/six_commands** directory and try it out on the **example_gene_annotations.csv** file. 

```bash
less example_gene_annotations.csv
```

```text
gene_ID,genome,KO_ID,KO_annotation
1,CC9311,K02338,DPO3B; DNA polymerase III subunit beta [EC:2.7.7.7]
2,CC9311,NA,NA
3,CC9311,K01952,purL; phosphoribosylformylglycinamidine synthase [EC:6.3.5.3]
4,CC9311,K00764,purF; amidophosphoribosyltransferase [EC:2.4.2.14]
5,CC9311,K02469,gyrA; DNA gyrase subunit A [EC:5.99.1.3]
6,CC9311,NA,NA
7,CC9311,K18979,queG; epoxyqueuosine reductase [EC:1.17.99.6]
8,CC9311,NA,NA
9,CC9311,NA,NA
10,CC9311,K03625,nusB; N utilization substance protein B
11,CC9311,K03110,ftsY; fused signal recognition particle receptor
12,CC9311,K07315,rsbU_P; phosphoserine phosphatase RsbU/P [EC:3.1.3.3]
13,CC9311,K01755,argH; argininosuccinate lyase [EC:4.3.2.1]
14,CC9311,NA,NA
15,CC9311,K05539,dusA; tRNA-dihydrouridine synthase A [EC:1.-.-.-]
16,CC9311,K07305,msrB; peptide-methionine (R)-S-oxide reductase [EC:1.8.4.12]
17,CC9311,K07007,uncharacterized protein
18,CC9311,NA,NA
19,CC9311,K02653,pilC; type IV pilus assembly protein PilC
20,CC9311,K02669,pilT; twitching motility 
```
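
Before scrolling through a table like this, it can help to check its columns and size first. The minimal sketch below uses a hypothetical four-row excerpt, `annotations_demo.csv`, built from the rows shown above. In it, ```head -n 1``` prints the header row, and ```tail -n +2``` prints everything from line 2 onward (that is, every row except the header), which ```wc -l``` then counts.

```bash
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
csv="$tmpdir/annotations_demo.csv"   # hypothetical excerpt, not the full course file

# Same columns as example_gene_annotations.csv, first four data rows only.
cat > "$csv" <<'EOF'
gene_ID,genome,KO_ID,KO_annotation
1,CC9311,K02338,DPO3B; DNA polymerase III subunit beta [EC:2.7.7.7]
2,CC9311,NA,NA
3,CC9311,K01952,purL; phosphoribosylformylglycinamidine synthase [EC:6.3.5.3]
4,CC9311,K00764,purF; amidophosphoribosyltransferase [EC:2.4.2.14]
EOF

# The header is just the first line of the file.
header=$(head -n 1 "$csv")
echo "Columns: $header"

# tail -n +2 starts output at line 2, skipping the header; wc -l counts the rest.
data_rows=$(tail -n +2 "$csv" | wc -l | tr -d ' ')
echo "Data rows: $data_rows"

rm -r "$tmpdir"
```

On the real file, ```tail -n +2 example_gene_annotations.csv | wc -l``` would give the number of genes listed, assuming one gene per row as in the preview.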

##### ```more``` command
The ```more``` command is very similar to ```less```, but it is designed mainly for moving forward through a file, with little or no ability to scroll back. Feel free to try it out in the command line, but since it is more limited than ```less```, I tend not to use it. 