# Variant Calling
Call genetic variants in the evolved line, using the genome assembly of the ancestor line as the reference.

<hr >

## Current Directory Structure

In [1]:
%%bash
cd ./analysis
ls -1F

assembly/
data/
fastqc-analysis/
mappings/
trimmed/


- data: Raw FASTQ files
- trimmed: Sickle-trimmed FASTQ files
- fastqc-analysis: FastQC analysis of raw and trimmed FASTQ files
- assembly: Reference genome assembly from the ancestral genome, with bowtie- and bwa-indexed references
- mappings: bowtie- and bwa-aligned mappings (sorted and post-processed)

<hr >

## Install necessary tools with conda
- samtools: 1.9
- bcftools: 1.9
- bamtools: 2.5.1
- freebayes: 1.2.0
- vcflib: 1.0.0_rc2 
- rtg-tools: 3.10
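
One way to install these pinned versions is a dedicated conda environment. The sketch below is an assumption, not necessarily the setup used for this analysis: the environment name `variant-calling` is hypothetical, and the `conda-forge` and `bioconda` channels are assumed to still provide these exact builds. The command is only printed, so it can be reviewed before it is run.

```shell
# Hypothetical environment name; any name works.
ENV_NAME="variant-calling"

# Versions copied from the list above.
PKGS="samtools=1.9 bcftools=1.9 bamtools=2.5.1 freebayes=1.2.0 vcflib=1.0.0_rc2 rtg-tools=3.10"

# Assumption: the conda-forge and bioconda channels carry these builds.
CMD="conda create -y -n $ENV_NAME -c conda-forge -c bioconda $PKGS"

# Print the command for review; run it with: eval "$CMD"
echo "$CMD"
```

After the environment is created, `conda activate variant-calling` makes the tools available on the PATH.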

<hr >

## Pre-process: Indexing
- Index the FASTA reference for the SNP caller with SAMtools; this creates a .fai index file in the assembly/bwa directory
- Index the BAM files with BAMtools; this creates a .bai index file in the mappings/bwa directory
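
The two indexing steps could look roughly like the sketch below. The file names `scaffolds.fasta` and `evolved-6.sorted.bam` are hypothetical, since only the directories are named here; adjust them to the actual files. The commands run only if both inputs exist.

```shell
# Hypothetical file names; only the directories are given in this notebook.
REF="analysis/assembly/bwa/scaffolds.fasta"
BAM="analysis/mappings/bwa/evolved-6.sorted.bam"

if [ -f "$REF" ] && [ -f "$BAM" ]; then
    # Writes the .fai index next to the reference FASTA.
    samtools faidx "$REF"
    # Writes the .bai index next to the BAM file.
    bamtools index -in "$BAM"
else
    echo "Input files not found; adjust REF and BAM." >&2
fi
```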

<hr >

## Variant Calling with Samtools
- Pile up all the reads with SAMtools mpileup:
    - -u: uncompressed output
    - -g: generate genotype likelihoods in BCF format
    - -f FILE: faidx indexed reference sequence file
- Call variants with BCFtools call:
    - -v: output variant sites only
    - -m: alternative model for multiallelic and rare-variant calling
    - -o: output file name
    - -O z: output type ‘z’ (compressed VCF)
- Save the output to the variants directory
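
Put together, the options above could be combined as in the following sketch. The input file names are hypothetical; the output name matches the file counted below. In samtools 1.9 the `-u` and `-g` options of `mpileup` still work but are reported as deprecated in favor of `bcftools mpileup`.

```shell
# Hypothetical input names; adjust to the actual reference and BAM file.
REF="analysis/assembly/bwa/scaffolds.fasta"
BAM="analysis/mappings/bwa/evolved-6.sorted.bam"
OUT="analysis/variants/evolved-6.mpileup.vcf.gz"

if [ -f "$REF" ] && [ -f "$BAM" ]; then
    mkdir -p "$(dirname "$OUT")"
    # Pile up the reads as uncompressed BCF, then keep variant sites only,
    # using the multiallelic caller, and write a compressed VCF.
    samtools mpileup -u -g -f "$REF" "$BAM" \
        | bcftools call -v -m -O z -o "$OUT"
else
    echo "Input files not found; adjust REF and BAM." >&2
fi
```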

#### Count number of variants:

In [5]:
%%bash
zcat analysis/variants/evolved-6.mpileup.vcf.gz | grep -v '^#' | wc -l

695


<hr >

## Variant Calling with Freebayes
- Reference genome scaffold file:
    - In FASTA format, with its index in .fai format
- Mapping BAM file:
    - Mapping file (.bam file) and a mapping index (.bai file)
- Call variants with freebayes and redirect the results to a new file:
    - -f, --fasta-reference FILE: use FILE as the reference sequence for analysis. An index file (FILE.fai) will be created if none exists. If neither --targets nor --region is specified, freebayes will analyze every position in this reference.
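
A minimal sketch of the freebayes call, under the same assumptions as before: the input file names and the output name `evolved-6.freebayes.vcf.gz` are hypothetical. freebayes writes VCF to standard output, so the result is compressed and redirected to a file.

```shell
# Hypothetical file names; adjust to the actual reference, BAM, and output.
REF="analysis/assembly/bwa/scaffolds.fasta"
BAM="analysis/mappings/bwa/evolved-6.sorted.bam"
OUT="analysis/variants/evolved-6.freebayes.vcf.gz"

if [ -f "$REF" ] && [ -f "$BAM" ]; then
    mkdir -p "$(dirname "$OUT")"
    # gzip keeps the file readable with zcat; bgzip would be needed
    # instead if the VCF is to be indexed with tabix later.
    freebayes -f "$REF" "$BAM" | gzip -c > "$OUT"
else
    echo "Input files not found; adjust REF and BAM." >&2
fi
```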


#### Count number of variants:

In [None]:
%%bash
# Assumes the freebayes calls were saved as evolved-6.freebayes.vcf.gz (hypothetical name)
zcat analysis/variants/evolved-6.freebayes.vcf.gz | grep -v '^#' | wc -l