# Looking inside files

A common task is to look at the contents of a file. This can be done with several different Unix commands, including `less`, `head` and `tail`. Let us consider some examples.

## less

The `less` command displays the contents of a specified file one screen at a time. To test this command, open a terminal window on the computer and type the following command followed by the enter key.

`less Styphi.gff`

The contents of the file Styphi.gff are displayed one screen at a time. To view the next screen, press the space bar. As Styphi.gff is a large file, paging through all of it will take a while, so you may want to exit early. To do this, press the `q` key, which quits `less` and returns you to the Unix prompt. `less` can also scroll backwards if you press the `b` key. Another useful feature is the slash key, `/`, which searches for an expression in the file. Try it: search for the gene with locus tag t0038. What are the start and end positions of this gene?

## head and tail

Sometimes you may just want to view the text at the beginning or the end of a file, without having to display all of the file. The `head` and `tail` commands can be used to do this.

The `head` command displays the first ten lines of a file.

To look at the beginning of the Styphi.gff file use:

In [None]:
head Styphi.gff

The `tail` command displays the last ten lines of a file.

To look at the end of Styphi.gff use:

In [None]:
tail Styphi.gff

The number of lines displayed can be changed by adding an extra argument. To increase the number of lines shown from 10 to 25, add the `-25` argument to the command:

In [None]:
tail -25 Styphi.gff
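
The `-25` shorthand above is a historical form. The same request can also be written with the `-n` option, which is the form described in the POSIX specification for both `head` and `tail`. The sketch below uses a small, made-up five-line file (`demo.txt` is a hypothetical name, not one of the course files) so that the result is easy to check by eye.

```shell
# Build a small, hypothetical 5-line file in a temporary directory.
demo_dir=$(mktemp -d)
printf 'line1\nline2\nline3\nline4\nline5\n' > "$demo_dir/demo.txt"

# First two lines of the file.
first_two=$(head -n 2 "$demo_dir/demo.txt")
echo "$first_two"

# Last two lines of the file.
last_two=$(tail -n 2 "$demo_dir/demo.txt")
echo "$last_two"
```

On the course data, the equivalent of the command above would be `tail -n 25 Styphi.gff`.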

## Saving time

Saving time while typing may not seem important, but the longer that you spend in front of a computer, the happier you will be if you can reduce the time you spend at the keyboard.

* Pressing the up/down arrows will let you scroll through previous commands entered. 

* If you highlight some text, middle clicking on the mouse will paste it on the command line.

* One of the best Unix tips you can learn early on is that you can use tab to complete the names of programs and files on most Unix systems. Type enough letters to uniquely identify the name of a file, directory or command and press tab. Unix will do the rest. Try it...

In [None]:
fin

## man - getting help

To obtain further information on any of the Unix commands introduced in this tutorial you can use the `man` command. For example, to get a full description of the `tail` command and its options, use the following command in a terminal window.

In [None]:
man tail

## Manipulating files

There are several other useful commands that can be used to manipulate and summarise information inside files. We will introduce some of these next: `cat`, `sort`, `wc` and `uniq`.

## cat

The `cat` command joins files together. 

Having looked at the beginning and end of the Styphi.gff file you should notice that in GFF files the annotation comes first, then the DNA sequence at the end. If you had two separate files containing the annotation and the DNA sequence, it is possible to concatenate or join the two together to make a single file like the Styphi.gff file you have just looked at. The command `cat` can be used to join two or more files into a single file. The order in which the files are joined is determined by the order in which they appear in the command line. 

For example, we have two separate files, Styphi.noseq.gff and Styphi.fa, that contain the annotation and DNA sequence, respectively, for the Salmonella typhi CT18 genome. To join these files together use:

In [None]:
cat Styphi.noseq.gff Styphi.fa > Styphi.concatenated.gff

The files Styphi.noseq.gff and Styphi.fa will be joined together and written to a file called Styphi.concatenated.gff.

The `>` symbol in the command line redirects the output of the `cat` program to the designated file, Styphi.concatenated.gff. Use the command `ls` to check for the presence of this file.

In [None]:
ls
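
To see how the order of the arguments affects the result, here is a minimal sketch using two tiny stand-in files (`part1.txt` and `part2.txt` are hypothetical names chosen for illustration, not course files). Note that `>` replaces the contents of the target file if it already exists.

```shell
# Create two small, hypothetical input files in a temporary directory.
demo_dir=$(mktemp -d)
printf 'annotation line\n' > "$demo_dir/part1.txt"
printf 'sequence line\n' > "$demo_dir/part2.txt"

# Files are joined in the order given on the command line.
cat "$demo_dir/part1.txt" "$demo_dir/part2.txt" > "$demo_dir/joined.txt"
cat "$demo_dir/joined.txt"

# Swapping the arguments swaps the order of the content.
cat "$demo_dir/part2.txt" "$demo_dir/part1.txt" > "$demo_dir/reversed.txt"
cat "$demo_dir/reversed.txt"
```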

## wc - counting

The command `wc` counts lines, words or characters.

To count the number of files that are listed by `ls` use:

In [None]:
ls | wc -l

The `|` symbol above, also known as the pipe symbol, connects the two commands into a single operation. We say that the output from the first command is piped to, and used as input by, the second command.

You can connect as many commands as you want. For example:

In [None]:
ls | grep ".gff" | wc -l

What does this command do? You will learn more about the grep command later in this course.
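
As a further illustration of chaining, the sketch below pipes commands that were introduced earlier. It works on throwaway files in a temporary directory (all file names here are hypothetical). The `tr -d ' '` stage is an extra step, not part of the tutorial, that strips the padding some versions of `wc` add around the number.

```shell
# Create three empty, hypothetical files in a temporary directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt" "$demo_dir/c.txt"

# ls writes one name per line into the pipe; wc -l counts those lines.
n_files=$(ls "$demo_dir" | wc -l | tr -d ' ')
echo "$n_files"

# head keeps the first three lines, then tail keeps the last of those,
# so the pipeline prints only the third line of the file.
printf 'one\ntwo\nthree\nfour\n' > "$demo_dir/lines.txt"
third_line=$(head -n 3 "$demo_dir/lines.txt" | tail -n 1)
echo "$third_line"
```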

## sort - sorting values

The `sort` command lets you sort the contents of the input. When you sort the input, lines with identical content end up next to each other in the output. This is useful as the output can then be fed to the `uniq` command (see below) to count the number of unique lines in the input.

To sort the contents of the BED file Pfalciparum.bed use:

In [None]:
sort Pfalciparum.bed

To sort the contents of the BED file numerically by position (the second column), type the following command.

In [None]:
sort -k 2 -n Pfalciparum.bed

The `sort` command can sort by multiple columns, e.g. the 1st column and then the 2nd column, by specifying successive `-k` parameters in the command.
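
A minimal sketch of a multi-column sort, using a made-up two-column file (`demo.bed` and its contents are hypothetical, not taken from the course data). Each key is limited to a single column by writing it as `-kN,N`, and the `n` suffix on the second key asks for a numeric comparison on that column only.

```shell
# Create a small, hypothetical tab-separated file: name, then position.
demo_dir=$(mktemp -d)
printf 'chr2\t500\nchr1\t900\nchr1\t100\nchr2\t50\n' > "$demo_dir/demo.bed"

# Sort by column 1 as text, then by column 2 as a number.
sorted=$(sort -k1,1 -k2,2n "$demo_dir/demo.bed")
echo "$sorted"
```

Without the `n` suffix the second column would be compared as text, so `500` would sort before `50` is reached only by chance of its leading characters rather than by numeric value.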

## uniq - finding unique values

The `uniq` command extracts unique lines from the input. It only compares adjacent lines, so it is usually used in combination with `sort` to count unique values in the input.

To get the list of chromosomes in the Pfalciparum BED file use:

In [None]:
awk '{ print $1 }' Pfalciparum.bed | sort | uniq

How many chromosomes are there? You will learn more about the awk command later in this tutorial.
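
The reason `sort` comes before `uniq` in the pipeline is that `uniq` only collapses repeated lines when they are next to each other. The sketch below shows the difference on a made-up list of names (`names.txt` is a hypothetical file). The `tr -d ' '` stage is an extra step that strips the padding some versions of `wc` add around the number.

```shell
# Create a hypothetical list in which no two identical lines are adjacent.
demo_dir=$(mktemp -d)
printf 'chr1\nchr2\nchr1\nchr2\nchr1\n' > "$demo_dir/names.txt"

# Without sorting, uniq finds no adjacent duplicates and removes nothing.
unsorted_count=$(uniq "$demo_dir/names.txt" | wc -l | tr -d ' ')

# After sorting, duplicates sit side by side and collapse to one line each.
sorted_count=$(sort "$demo_dir/names.txt" | uniq | wc -l | tr -d ' ')

echo "$unsorted_count $sorted_count"
```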

## Exercises

Open up a new terminal window, navigate to the `Unix` directory and complete the following exercises:

1. Use the `head` command to extract the first 500 lines of the S. typhi GFF file and store the output in a new file called Styphi.500.gff.
2. Use the `wc` command to count the number of lines in the `Pfalciparum.bed` file.
3. Use the `sort` command to sort the file Pfalciparum.bed on chromosome and then gene position.
4. Use the `uniq` command to count the number of features per chromosome in the `Pfalciparum.bed` file. Hint: use the `man` command to look at the options for the `uniq` command. Or peruse the `wc` or `grep` manuals. There’s more than one way to do it!