# Alignment Postprocessing

- A correctly aligned region (reads are shown as gray vertical bars with SNPs indicated as colored letters). 
- A spurious alignment where reads exhibit many small insertions (indicated as purple Is), deletions (shown as black horizontal lines) and SNPs.

## Steps

1. ~~__Local realignment around indels or__: GATK's IndelRealigner~~
2. __Calculation of per-base Base Quality scores (BAQ)__: 
    - We'll do this as a first pass, since local realignment is both challenging and computationally intensive.
3. __Removal of Duplicates__: Barcode2
    - Because this is an amplicon sequencing dataset, we'll need to resort to 
4. __Recalibration of Base Quality Scores__:

---

__Questions__:
- When do we sort BAM files?  Do we even need to?
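
One way to approach the sorting question is to check what sort order a file already declares. `samtools index` requires a coordinate-sorted BAM, while `bwa mem` writes reads in input order, so the `SO:` field of the `@HD` header line is worth inspecting before indexing or calling. The sketch below is a minimal helper under that assumption; the function name `sam_sort_order` is made up for illustration.

```bash
# Print the sort order (SO: field of the @HD line) from a SAM header on stdin.
# Prints "unknown" when the header has no @HD line or no SO: field.
sam_sort_order() {
    awk -F'\t' '
        $1 == "@HD" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1 }
        }
        END { if (!found) print "unknown" }
    '
}

# Only runs when samtools is installed and the BAM from this workflow exists.
BAM=./../../notebooks/data/original_files/aln.bam
if command -v samtools >/dev/null 2>&1 && [ -f "$BAM" ]; then
    samtools view -H "$BAM" | sam_sort_order
fi
```

A result of `coordinate` suggests the file can be indexed directly; `unsorted`, `queryname`, or `unknown` suggests running `samtools sort` first.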


## Workflow

| __Input__ | __Output__ |
|:----------|:-----------|
| `SAM`, `BAM` | `SAM`, `BAM` |

__To Index Reference__:

```
bwa index reference.fa
```

__To Sort BAM__:

```bash
samtools sort ./../../notebooks/data/original_files/aln.bam > ./../../notebooks/data/samtools_output/aln_sorted.bam
```

__To Index BAM__:

```bash
samtools index ./../../notebooks/data/samtools_output/aln_sorted.bam ./../../notebooks/data/samtools_output/aln_sorted.bam.bai
```

__To Align__:

```
bwa mem -M -t 2 ./../../Reference_Sequence/Homo_sapiens.GRCh37/chromosomes/Homo_sapiens.GRCh37.dna_sm.chromosome.3.fa ./../../notebooks/data/original_files/aln1.fastq ./../../notebooks/data/original_files/aln1.fastq > ./../../notebooks/data/bwa_output/aln.sam
```

__Call Variants__:

```
freebayes --fasta-reference ./../../Reference_Sequence/Homo_sapiens.GRCh37/Homo_sapiens.GRCh37.dna_sm.toplevel.fa ./../../notebooks/data/samtools_output/aln_sorted.bam > ./../../notebooks/data/freebayes_output/var.vcf
```


```
freebayes --fasta-reference ./../../Reference_Sequence/Homo_sapiens.GRCh37/chromosomes/Homo_sapiens.GRCh37.dna_sm.chromosome.3.fa ./../../notebooks/data/samtools_output/aln_sorted.bam > ./../../notebooks/data/freebayes_output/var_chrm3.vcf
```

```
bcftools mpileup -Ou -d 1150004 -f ./../../Reference_Sequence/Homo_sapiens.GRCh37/Homo_sapiens.GRCh37.dna_sm.toplevel.fa  ./../../notebooks/data/samtools_output/aln_sorted.bam | bcftools call -mv -Ob -o ./../../notebooks/data/bcftools_output/var1150004.bcf
```

```
bcftools mpileup -d 1150004 -f ./../../Reference_Sequence/Homo_sapiens.GRCh37/Homo_sapiens.GRCh37.dna_sm.toplevel.fa  ./../../notebooks/data/samtools_output/aln_sorted.bam | bcftools call -mv -Oz -o ./../../notebooks/data/bcftools_output/var1150004.vcf.gz
```
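
After calling, a quick tally of the output can show whether any variants were emitted at all. The following is a rough sketch, not part of the original workflow: `vcf_summary` is a hypothetical helper that reads an uncompressed VCF on stdin and counts records, treating a record as a SNP only when both REF and ALT are a single base.

```bash
# Summarise an uncompressed VCF read from stdin.
# Multi-allelic sites (ALT containing a comma) and indels are counted as "other".
vcf_summary() {
    awk -F'\t' '
        /^#/ { next }                                   # skip header lines
        { total++ }
        length($4) == 1 && length($5) == 1 { snps++ }   # single-base REF and ALT
        END { printf "records=%d snps=%d other=%d\n", total, snps, total - snps }
    '
}

# Only runs when the freebayes output from the step above exists.
VCF=./../../notebooks/data/freebayes_output/var.vcf
if [ -f "$VCF" ]; then
    vcf_summary < "$VCF"
fi
```

For BCF or compressed output, the records would first need to be decoded, for example by piping `bcftools view` into the function.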


__Call Filtering__:
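
As a placeholder for this step, here is a minimal sketch of one common kind of filter: dropping calls below a quality threshold. The threshold of 20, the helper name `filter_vcf_by_qual`, and the output file name are all assumptions for illustration, not values chosen in these notes. `bcftools filter` with an expression such as `-e 'QUAL<20'` is an alternative way to get a similar result.

```bash
# Keep header lines and records whose QUAL (column 6) is at least the given minimum.
# Records with a missing QUAL (".") are dropped.
filter_vcf_by_qual() {
    awk -F'\t' -v min="$1" '
        /^#/ { print; next }
        $6 != "." && ($6 + 0) >= (min + 0)
    '
}

MIN_QUAL=20   # assumed threshold, tune for the dataset
IN=./../../notebooks/data/freebayes_output/var.vcf
OUT=./../../notebooks/data/freebayes_output/var.q20.vcf   # hypothetical output name
if [ -f "$IN" ]; then
    filter_vcf_by_qual "$MIN_QUAL" < "$IN" > "$OUT"
fi
```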


### Analysis on Chromosome 3

Tutorials:

- [samtools tutorial (Quinlan Lab)](http://quinlanlab.org/tutorials/samtools/samtools.html)
- [Alignment and variant calling tutorial (ekg)](https://github.com/ekg/alignment-and-variant-calling-tutorial)

1.  Unzip, Index, and BWA index reference

    ```
    gunzip ./../../Reference_Sequence/Homo_sapiens.GRCh37/chromosomes/Homo_sapiens.GRCh37.dna_sm.chromosome.3.fa.gz
    samtools faidx ./../../Reference_Sequence/Homo_sapiens.GRCh37/chromosomes/Homo_sapiens.GRCh37.dna_sm.chromosome.3.fa
    bwa index ./../../Reference_Sequence/Homo_sapiens.GRCh37/chromosomes/Homo_sapiens.GRCh37.dna_sm.chromosome.3.fa
    ```

2.  Aligning to Chromosome 3

    ```
    bwa mem -M -t 2 ./../../Reference_Sequence/Homo_sapiens.GRCh37/chromosomes/Homo_sapiens.GRCh37.dna_sm.chromosome.3.fa ./../../notebooks/data/original_files/aln1.fastq ./../../notebooks/data/original_files/aln1.fastq > ./../../notebooks/data/bwa_output/aln_chrm3_raw.sam
    ```

3.  Sort SAM file

    ```
    samtools sort -O sam -o ./../../notebooks/data/samtools_output/aln_chrm3_sorted.sam ./../../notebooks/data/bwa_output/aln_chrm3_raw.sam
    ```
    
4.  Convert SAM to BAM

    ```
    samtools view -S -b ./../../notebooks/data/samtools_output/aln_chrm3_sorted.sam > ./../../notebooks/data/samtools_output/aln_chrm3_sorted.bam
    ```
    
5.  Index BAM

    ```
    samtools index ./../../notebooks/data/samtools_output/aln_chrm3_sorted.bam ./../../notebooks/data/samtools_output/aln_chrm3_sorted.bam.bai
    ```
    
6.  Call Variants using Freebayes

    ```
    freebayes -f ./../../Reference_Sequence/Homo_sapiens.GRCh37/chromosomes/Homo_sapiens.GRCh37.dna_sm.chromosome.3.fa ./../../notebooks/data/samtools_output/aln_chrm3_sorted.bam > ./../../notebooks/data/freebayes_output/aln_chrm3_sorted_variants.vcf
    ```

7.  Stopped because no coverage...
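
To put a number on the coverage problem, the output of `samtools depth` (tab-separated chromosome, position, depth) can be summarised. The sketch below is one way to do that; `depth_summary` is a hypothetical helper, and by default `samtools depth` only reports positions with at least one read, so an empty result means no reads were placed at all.

```bash
# Summarise samtools depth output (chrom, pos, depth) read from stdin.
depth_summary() {
    awk -F'\t' '
        { n++; sum += $3; if ($3 > 0) covered++ }
        END {
            if (n == 0) { print "positions=0"; exit }
            printf "positions=%d covered=%d mean_depth=%.2f\n", n, covered, sum / n
        }
    '
}

# Only runs when samtools is installed and the sorted BAM from this section exists.
BAM=./../../notebooks/data/samtools_output/aln_chrm3_sorted.bam
if command -v samtools >/dev/null 2>&1 && [ -f "$BAM" ]; then
    samtools flagstat "$BAM"                 # how many reads mapped at all
    samtools depth "$BAM" | depth_summary    # how deep the mapped regions are
fi
```

If `flagstat` reports almost no mapped reads, the problem is likely upstream of variant calling, in the alignment or the choice of reference.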
    

### Analysis on Chromosome 1

Tutorials:

- [samtools tutorial (Quinlan Lab)](http://quinlanlab.org/tutorials/samtools/samtools.html)
- [Alignment and variant calling tutorial (ekg)](https://github.com/ekg/alignment-and-variant-calling-tutorial)

1.  Unzip, Index, and BWA index reference

    ```
    gunzip ./../../Reference_Sequence/Homo_sapiens.GRCh37/chromosomes/Homo_sapiens.GRCh37.dna_sm.chromosome.1.fa.gz
    samtools faidx ./../../Reference_Sequence/Homo_sapiens.GRCh37/chromosomes/Homo_sapiens.GRCh37.dna_sm.chromosome.1.fa
    bwa index ./../../Reference_Sequence/Homo_sapiens.GRCh37/chromosomes/Homo_sapiens.GRCh37.dna_sm.chromosome.1.fa
    ```

2.  Aligning to Chromosome 1

    ```
    bwa mem -M -t 2 ./../../Reference_Sequence/Homo_sapiens.GRCh37/chromosomes/Homo_sapiens.GRCh37.dna_sm.chromosome.1.fa ./../../notebooks/data/original_files/aln1.fastq ./../../notebooks/data/original_files/aln1.fastq > ./../../notebooks/data/bwa_output/aln_chrm1_raw.sam
    ```

    
3.  Convert SAM to BAM

    ```
    samtools view -S -b ./../../notebooks/data/bwa_output/aln_chrm1_raw.sam > ./../../notebooks/data/samtools_output/aln_chrm1_raw.bam
    ```


4.  Sort BAM file

    ```
    samtools sort ./../../notebooks/data/samtools_output/aln_chrm1_raw.bam > ./../../notebooks/data/samtools_output/aln_chrm1_sorted.bam
    ```
    
5.  Index BAM

    ```
    samtools index ./../../notebooks/data/samtools_output/aln_chrm1_sorted.bam ./../../notebooks/data/samtools_output/aln_chrm1_sorted.bam.bai
    ```
    
6.  Call Variants using Freebayes

    ```
    freebayes -f ./../../Reference_Sequence/Homo_sapiens.GRCh37/chromosomes/Homo_sapiens.GRCh37.dna_sm.chromosome.1.fa ./../../notebooks/data/samtools_output/aln_chrm1_sorted.bam > ./../../notebooks/data/freebayes_output/aln_chrm1_sorted_variants.vcf
    ```

7.  Stopped because no coverage...
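
One possible cause worth ruling out, offered as a sanity check rather than a diagnosis, is a mismatch between the reference the reads were aligned to and the reference passed to the variant caller. The sketch below compares the sequence names in a BAM header (`@SQ` lines) with those in the `.fai` index written by `samtools faidx`; the helper names `header_contigs` and `missing_contigs` are made up for illustration.

```bash
# List sequence names (SN: field of @SQ lines) from a SAM header on stdin.
header_contigs() {
    awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' | sort -u
}

# Print header sequence names that are absent from a .fai index.
# usage: missing_contigs header.txt reference.fa.fai
missing_contigs() {
    fai_names=$(mktemp)
    cut -f1 "$2" | sort -u > "$fai_names"           # column 1 of .fai is the sequence name
    header_contigs < "$1" | grep -vxF -f "$fai_names"
    rm -f "$fai_names"
}

# Only runs when samtools is installed and both files from this section exist.
BAM=./../../notebooks/data/samtools_output/aln_chrm1_sorted.bam
FAI=./../../Reference_Sequence/Homo_sapiens.GRCh37/chromosomes/Homo_sapiens.GRCh37.dna_sm.chromosome.1.fa.fai
if command -v samtools >/dev/null 2>&1 && [ -f "$BAM" ] && [ -f "$FAI" ]; then
    header=$(mktemp)
    samtools view -H "$BAM" > "$header"
    missing_contigs "$header" "$FAI"
    rm -f "$header"
fi
```

Any name printed is a sequence the BAM refers to that the chosen reference does not contain; no output means the names agree.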