# Whole Genome Sequencing of SARS-CoV-2

This tutorial was written by [Rafika I. Paramita](https://github.com/fikaparamita04) and is adapted, with modifications, from the [Utah DoH ARTIC/Illumina Bioinformatic Workflow (Erin Young/Kelly Oakeson)](https://github.com/CDCgov/SARS-CoV-2_Sequencing/tree/master/protocols/BFX-UT_ARTIC_Illumina).

This workflow analyzes whole genome sequencing (WGS) data for SARS-CoV-2 generated from ARTIC v3 amplicons sequenced on Illumina.

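You can copy the sample files from the directory `/home/ref/sars-cov-2/samples/`. In the commands below, values in braces such as `{sample}`, `{threads}`, `{input.read1}` and `{input.read2}` are placeholders: replace them with your own sample name, thread count and FASTQ files.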



STEP 1. Map the Illumina reads to the reference (NC_045512; the ARTIC default) with bwa mem, sort the alignments with samtools, and remove unmapped reads (`-F 4`):

In [None]:
bwa mem -t {threads} /home/ref/sars-cov-2/cov_ref/NC_045512.fasta {input.read1} {input.read2} | samtools sort | samtools view -F 4 -o {sample}.sorted.bam
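One way to fill in the placeholders of the STEP 1 command is to derive the sample name from the FASTQ file name. The sketch below is a dry run: it prints the mapping command for each read pair instead of executing it, so you can inspect it first. It assumes, hypothetically, that reads are named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz`; the helper names `sample_name` and `map_cmd` are illustrative and not part of the original workflow.

```shell
# Dry-run sketch: build the STEP 1 command for each read pair and print it.
# Assumption (not from the original workflow): paired reads are named
# <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
REF=/home/ref/sars-cov-2/cov_ref/NC_045512.fasta

# Derive the sample name from the R1 file name.
sample_name() {
  local r1
  r1=$(basename "$1")
  printf '%s\n' "${r1%_R1.fastq.gz}"
}

# Print (do not run) the mapping command for one read pair.
map_cmd() {
  local r1=$1 threads=${2:-4}
  local sample r2
  sample=$(sample_name "$r1")
  r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
  printf 'bwa mem -t %s %s %s %s | samtools sort | samtools view -F 4 -o %s.sorted.bam\n' \
    "$threads" "$REF" "$r1" "$r2" "$sample"
}

# Dry run over every R1 file in the current directory.
for r1 in ./*_R1.fastq.gz; do
  [ -e "$r1" ] || continue   # skip when the glob matches nothing
  map_cmd "$r1" 4
done
```

Once the printed commands look right, they can be pasted into a cell or piped to `sh` to run them.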

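STEP 2. Trim the ARTIC primer sequences from the aligned reads with ivar, using the primer coordinates in the BED file (`-e` also keeps reads that contain no primer):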

In [None]:
ivar trim -e -i {sample}.sorted.bam -b /home/ref/sars-cov-2/cov_ref/nCoV-2019_v3.bed -p {sample}.primertrim
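Primer coordinates in the BED file only make sense relative to the reference the reads were mapped to, so a quick sanity check is to compare the sequence name in the FASTA header with the chromosome names in column 1 of the BED file. The sketch below is one way to do that with awk; it assumes a standard tab-separated BED file and a single-sequence reference, and the function names are hypothetical helpers, not part of ivar or samtools.

```shell
# Print the first sequence name in a FASTA file
# (the text after ">" up to the first whitespace).
fasta_name() {
  awk '/^>/ { sub(/^>/, "", $1); print $1; exit }' "$1"
}

# Print the distinct chromosome names used in column 1 of a BED file,
# skipping comment, track and browser lines.
bed_chroms() {
  awk -F '\t' '!/^(#|track|browser)/ && NF { print $1 }' "$1" | sort -u
}

# Report whether the BED file uses exactly the reference sequence name.
# Usage: check_bed_matches_ref reference.fasta primers.bed
check_bed_matches_ref() {
  local ref_name chroms
  ref_name=$(fasta_name "$1")
  chroms=$(bed_chroms "$2")
  if [ "$chroms" = "$ref_name" ]; then
    echo "OK: BED and reference both use $ref_name"
  else
    # unquoted on purpose: collapses several names onto one line
    echo "MISMATCH: reference is $ref_name, BED uses: $(echo $chroms)"
  fi
}
```

For this tutorial the call would be `check_bed_matches_ref /home/ref/sars-cov-2/cov_ref/NC_045512.fasta /home/ref/sars-cov-2/cov_ref/nCoV-2019_v3.bed`. A mismatch does not necessarily mean trimming will fail, but it is worth confirming that both files describe the same coordinate system.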

STEP 3. Re-sort the primer-trimmed BAM by coordinate so that it is ready for the pileup in the next step:

In [None]:
samtools sort {sample}.primertrim.bam -o {sample}.primertrim.sorted.bam

STEP 4. Build the consensus FASTA. The consensus includes all the variants found and does not fill missing sequence from the reference; missing sequence simply becomes "N" (`-n N`). ivar writes the result to `{sample}.consensus.fa`. The samtools mpileup options listed are those given in ivar's manual and might not be the best options for our needs:

In [None]:
samtools mpileup -A -d 1000 -B -Q 0 --reference /home/ref/sars-cov-2/cov_ref/NC_045512.fasta {sample}.primertrim.sorted.bam | ivar consensus -p {sample}.consensus -n N
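Because missing sequence becomes "N", the share of N bases in the consensus is a simple indicator of how complete the genome is. The sketch below counts them with awk. The function name `consensus_n_stats` is a hypothetical helper, and it assumes a plain FASTA file such as the `{sample}.consensus.fa` written by ivar consensus.

```shell
# Print total length, number of N bases, and percent N for a FASTA file.
# Usage: consensus_n_stats sample.consensus.fa
consensus_n_stats() {
  awk '
    /^>/ { next }                     # skip header lines
    {
      seq = toupper($0)               # count both "N" and "n"
      total += length(seq)
      n += gsub(/N/, "", seq)         # gsub returns the number of replacements
    }
    END {
      pct = (total > 0) ? 100 * n / total : 0
      printf "length=%d n_count=%d pct_n=%.2f\n", total, n, pct
    }
  ' "$1"
}
```

What counts as an acceptable percentage of N depends on your downstream use; this sketch only reports the number and does not apply a cutoff.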

STEP 5. QC. Report the breadth of coverage and mean depth of the mapped reads with samtools coverage:

In [None]:
samtools coverage {sample}.sorted.bam -o {sample}.samcov.txt
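The tabular output of samtools coverage has one row per reference sequence, with the columns `#rname`, `startpos`, `endpos`, `numreads`, `covbases`, `coverage`, `meandepth`, `meanbaseq` and `meanmapq`. The sketch below reads that file and flags each row as PASS or FAIL. The thresholds (at least 90% coverage and 10x mean depth) are illustrative assumptions, not recommendations from the original workflow, and `coverage_qc` is a hypothetical helper name.

```shell
# Flag each reference in a samtools coverage table as PASS or FAIL.
# Usage: coverage_qc sample.samcov.txt [min_coverage_percent] [min_mean_depth]
# The defaults (90 and 10) are illustrative assumptions only.
coverage_qc() {
  local file=$1 min_cov=${2:-90} min_depth=${3:-10}
  awk -F '\t' -v min_cov="$min_cov" -v min_depth="$min_depth" '
    /^#/ { next }                     # skip the header line
    {
      # column 6 = coverage (percent), column 7 = meandepth
      status = ($6 >= min_cov && $7 >= min_depth) ? "PASS" : "FAIL"
      printf "%s\tcoverage=%s\tmeandepth=%s\t%s\n", $1, $6, $7, status
    }
  ' "$file"
}
```

For example, `coverage_qc {sample}.samcov.txt 95 20` would apply stricter cutoffs to the file produced in this step.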

STEP 6. Variant predictions.

You can use the Nextclade web server from Nextstrain to analyse the variants: https://clades.nextstrain.org/
1. Upload or drop your consensus FASTA (`{sample}.consensus.fa`).
2. Choose the SARS-CoV-2 reference.
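
If you have several samples, it can be convenient to upload them together as one multi-FASTA file. The sketch below concatenates every `*.consensus.fa` file in a directory and prints how many sequences were written. The function name `combine_consensus` and the output file name are hypothetical choices for illustration.

```shell
# Concatenate every *.consensus.fa in a directory into one multi-FASTA file
# and print the number of sequences written.
# Usage: combine_consensus input_dir output.fasta
combine_consensus() {
  local dir=$1 out=$2 fa
  : > "$out"                                   # start with an empty output file
  for fa in "$dir"/*.consensus.fa; do
    [ -e "$fa" ] || continue                   # nothing matched the glob
    cat "$fa" >> "$out"
    # add a newline if the file did not end with one,
    # so the next header starts on its own line
    [ -z "$(tail -c 1 "$fa")" ] || echo >> "$out"
  done
  grep -c '^>' "$out" || true                  # count FASTA headers (0 if none)
}
```

For example, `combine_consensus . all_consensus.fasta` would collect the consensus files from the current directory.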