# File conversion
In this section we will take a closer look at how to convert files from one format to another. There are many tools available for converting between file formats, and we will use some of the most common ones: samtools, bcftools and Picard.

## SAM to BAM (and vice versa)
Using samtools to convert between SAM and BAM format is very straightforward. For this tutorial we have bundled a BAM file (the same one used in the previous section), so let us start by converting BAM into SAM. We are going to use the `samtools view` command. This is a very versatile command that you can use for many things, for example viewing BAM files directly without converting them.

In this instance, we would like to include the SAM header as well, so we use the `-h` option:

In [None]:
samtools view -h data/NA20538.bam > data/NA20538.sam

Now, have a look at the first ten rows of your SAM file. They should look like they did in the previous section when you viewed the BAM file header.

In [None]:
head data/NA20538.sam
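
All header lines in a SAM file begin with `@`, so they can be counted separately from the alignment records. The sketch below shows one way to do this; the function name `sam_line_counts` is an illustrative helper and not part of samtools.

```shell
# Hypothetical helper (not part of samtools): report how many lines of a
# SAM file are header lines (starting with '@') and how many are alignments.
sam_line_counts() {
    local sam="$1"
    local header total
    header=$(grep -c '^@' "$sam")   # header lines begin with '@'
    total=$(wc -l < "$sam")         # every line in the file
    echo "header=${header} alignments=$((total - header))"
}

# Example call, assuming the SAM file created above exists:
# sam_line_counts data/NA20538.sam
```

If the header count is larger than ten, `head` will only have shown you header lines.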

Well that was easy! And converting SAM to BAM is just as easy. This time there is no need for the `-h` option, but we do have to tell samtools that we want the output in BAM format. We do so by adding the `-b` option:

In [None]:
samtools view -b data/NA20538.sam > data/NA20538_2.bam

Samtools is very well documented, so for more usage options and functions, head over to the [samtools manual](http://www.htslib.org/doc/samtools-1.0.html).

## BAM to CRAM
Support for the CRAM format was introduced in samtools version 1.3. This means that the `samtools view` command can also be used to convert a BAM file to CRAM format. In the data directory there is a BAM file called yeast.bam that was created from S. cerevisiae Illumina sequencing data. There is also a reference genome in the directory, called Saccharomyces_cerevisiae.EF4.68.dna.toplevel.fa.

To convert to CRAM, we use the `-C` option to tell samtools we want the output as CRAM, and the `-T` option to specify what reference file to use for the conversion. We also use the `-o` option to specify the name of the output file. Give this a go:

In [None]:
samtools view -C -T data/Saccharomyces_cerevisiae.EF4.68.dna.toplevel.fa -o data/yeast.cram data/yeast.bam

__Q1: Since CRAM files use reference-based compression, we expect the CRAM file to be smaller than the BAM file. What is the size of the CRAM file?__

In [None]:
ls -l data/yeast.cram

__Q2: Is your CRAM file smaller than the original BAM file?__ 

In [None]:
ls -l data/yeast*
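
Rather than comparing the byte counts by eye, the two sizes can be expressed as a percentage. This is a rough sketch using integer arithmetic; `size_percent` is an illustrative helper name, not a standard command.

```shell
# Hypothetical helper: print the size of file $1 as a whole-number
# percentage of the size of file $2 (integer division, so it rounds down).
size_percent() {
    local a b
    a=$(wc -c < "$1")   # size of the first file in bytes
    b=$(wc -c < "$2")   # size of the second file in bytes
    echo "$(( 100 * a / b ))"
}

# Example call, assuming both files exist in the data directory:
# echo "CRAM is $(size_percent data/yeast.cram data/yeast.bam)% of the BAM size"
```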

To convert CRAM back to BAM, simply change `-C` to `-b` and swap the input and output file names:

```
samtools view -b -T data/Saccharomyces_cerevisiae.EF4.68.dna.toplevel.fa -o data/yeast.bam data/yeast.cram
```

## FASTQ to SAM
As mentioned in the previous section of this tutorial, the SAM format is mainly used to store alignment data. However, in some cases we want to store unaligned data in SAM format, and for this we can use the Picard tools FastqToSam application. Picard tools is a Java application that comes with a number of handy tools for manipulating high-throughput sequencing data. Apart from FASTQ to SAM conversion, we won't go into any detail about Picard tools in this tutorial, but feel free to explore it on the [Picard tools website](https://broadinstitute.github.io/picard/). Below is an example of how it can be used to convert a pair of FASTQ files to unaligned SAM format:

```
java -jar /path/to/picard.jar FastqToSam F1=forward_1.fastq.gz F2=reverse_2.fastq.gz O=output.sam SM=sample_name
```

From here you can go on and convert the SAM file to BAM and CRAM. There are also multiple options for specifying metadata for the SAM header. To see all available options, run:

```
java -jar /path/to/picard.jar FastqToSam -h
```

## CRAM to FASTQ
Although it is possible to convert CRAM to FASTQ directly using the `samtools fastq` command, many applications need the reads to be grouped by name, so that the forward and reverse FASTQ files list the read pairs in the same order. For this reason, we will first use `samtools collate`, which produces a collated BAM file. Remember that the reference file and its index file (which was created when we converted BAM to CRAM) are required for this.

In [None]:
samtools collate data/yeast.cram data/yeast.collated

The newly produced BAM file will be called yeast.collated.bam. Let's use this to create two FASTQ files, one for the forward reads and one for the reverse reads:

In [None]:
samtools fastq -1 data/yeast.collated_1.fastq -2 data/yeast.collated_2.fastq data/yeast.collated.bam
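
For paired-end data, downstream tools generally expect the two FASTQ files to contain the same number of reads. A rough sanity check is sketched below. It assumes uncompressed FASTQ with exactly four lines per record, and the helper names `fastq_reads` and `same_read_count` are illustrative only.

```shell
# Hypothetical helper: count reads in an uncompressed FASTQ file, assuming
# the usual four lines per record (no wrapped sequence lines).
fastq_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Hypothetical helper: succeed only if both files hold the same number of reads.
same_read_count() {
    [ "$(fastq_reads "$1")" -eq "$(fastq_reads "$2")" ]
}

# Example call, assuming the FASTQ files created above exist:
# same_read_count data/yeast.collated_1.fastq data/yeast.collated_2.fastq \
#     && echo "read counts match"
```

Matching counts do not prove that the pairs are in the same order, but a mismatch is a quick sign that something went wrong.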

For further information and usage options, please feel free to have a look at the [samtools manual page](http://www.htslib.org/doc/samtools.html).

## VCF to BCF (and vice versa)
bcftools comprises a set of programs for interacting with VCF/BCF files. In the same way that samtools can be used to convert between SAM, BAM and CRAM, bcftools can be used to convert between VCF and BCF.

Convert the file called 1kg.bcf to a compressed VCF file called 1kg.vcf.gz using the `bcftools view` command:

In [None]:
bcftools view -O z -o 1kg.vcf.gz 1kg.bcf
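
The reverse direction works the same way: the `-O` option of `bcftools view` selects the output type (`v` for uncompressed VCF, `z` for compressed VCF, `b` for compressed BCF). The sketch below picks the option from the output file name. The helper names `bcftools_output_type` and `convert_variants`, and the output name `1kg_2.bcf`, are assumptions made for illustration.

```shell
# Hypothetical helper: choose the bcftools -O value from the output file name.
bcftools_output_type() {
    case "$1" in
        *.vcf.gz) echo z ;;   # compressed VCF
        *.vcf)    echo v ;;   # uncompressed VCF
        *.bcf)    echo b ;;   # compressed BCF
        *)        echo "unrecognised extension: $1" >&2; return 1 ;;
    esac
}

# Hypothetical wrapper: convert input file $1 to output file $2, with the
# output format decided by the extension of $2.
convert_variants() {
    local otype
    otype=$(bcftools_output_type "$2") || return 1
    bcftools view -O "$otype" -o "$2" "$1"
}

# Example call, assuming 1kg.vcf.gz was created as above:
# convert_variants 1kg.vcf.gz 1kg_2.bcf
```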

The answers to the questions on this page can be found [here](answers.ipynb).   

Now continue to the next section of the tutorial: [QC assessment of NGS data](assessment.ipynb).   
You can also return to the [index page](index.ipynb).