# Creating delimiter-separated files in BASH

You can use BASH to pre-process data before importing it into other programs such as Microsoft Excel. In this use case, you want to transform your data into a delimiter-separated file, such as a tab-separated values (`.tsv`) or comma-separated values (`.csv`) file.

Using your new skills with `grep` and `sed`, you already have all the tools you need to make one of these files. But before you get started, let's have a look at how to change the order of the columns in a file.

## A glimpse into awk

[awk](https://de.wikipedia.org/wiki/Awk) is a very powerful tool. In fact, it is its own programming language, specialized in processing delimiter-separated files.

`awk` reads its input line by line and splits each line into fields at the delimiter. Each field corresponds to one column of the file and gets its own variable: the first column is called `$1`, the second `$2`, and so on. The whole line is available as `$0`.
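
To make the field variables concrete, here is a minimal sketch. The two input lines are made up for illustration (the values are borrowed from the expected output in the tasks further down), and the variable names `sample` and `result` are just placeholders:

```bash
# Two hypothetical lines with three tab-separated fields each.
sample=$(printf 'dnaA\t170\t1567\nrecF\t2828\t3910\n')

# -F '\t' sets the field separator to a tab.
# $1 is the first field of the current line, $3 the third.
result=$(printf '%s\n' "$sample" | awk -F '\t' '{print "gene=" $1 ", stop=" $3}')
echo "$result"
```

Each input line produces one output line, because `awk` runs the statement in curly braces once per line.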

Look at the example below. We call `awk` and use `-F` to tell it that our **F**ield delimiter is a tab character (`\t`). Inside the single quotes we then write the `awk` statement, which prints the second and then the first column, separated by a tab:

In [6]:
%%bash
grep -v "X" ejemplo.txt

aaaaa	xxxxx
xxxxx	bbbbb
ccccc	xxxxx
xxxxx	ddddd
eeeee	xxxxx
aaaaa	bbbbb
....	fffff
axaxa	bxbxb


In [7]:
%%bash
grep -v "X" ejemplo.txt | awk -F "\t" '{print $2 "\t" $1}'

xxxxx	aaaaa
bbbbb	xxxxx
xxxxx	ccccc
ddddd	xxxxx
xxxxx	eeeee
bbbbb	aaaaa
fffff	....
bxbxb	axaxa
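
The same idea can also change the delimiter itself, for example to turn tab-separated data into comma-separated data. The following is only a sketch under simple assumptions: the input is made up, the variable names `tsv` and `csv` are placeholders, and no field contains a comma or a quote (real CSV would need quoting for those).

```bash
# Hypothetical tab-separated input, standing in for a real .tsv file.
tsv=$(printf 'aaaaa\txxxxx\nxxxxx\tbbbbb\n')

# BEGIN runs once before any input is read.
# FS is the input field separator, OFS the output field separator.
# The comma in "print $2, $1" places OFS between the two fields.
csv=$(printf '%s\n' "$tsv" | awk 'BEGIN { FS = "\t"; OFS = "," } { print $2, $1 }')
echo "$csv"
```

Setting `OFS` once avoids repeating the delimiter string between every pair of columns, which becomes convenient when you print many columns.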


## Tasks

This folder contains the nucleotide sequences of the genes of *Acinetobacter baumannii* in FASTA format. In the sequence headers you will see additional information, such as the gene name, the description of the gene, and the location of the gene on the genome.

1. Extract information from the headers and create a tab- or pipe-separated (`|`) file that contains the gene name, the description, and the start and stop positions of the gene. The result should look like this:

```
dnaA    chromosomal replication initiation protein      170     1567
dnaN    DNA polymerase III subunit beta 1665    2813
recF    recombination protein F 2828    3910
```

In [None]:
%%bash


2. Reorder the columns of your result so that it looks like this:

```
dnaA    170     1567    chromosomal replication initiation protein
dnaN    1665    2813    DNA polymerase III subunit beta
recF    2828    3910    recombination protein F
```

In [None]:
%%bash
