Skip to content

rjorton/PrimalAlign

Repository files navigation

PrimalAlign

Simple script to trim (trim_galore), align (bwa), primer trim (ivar) and conseus call (ivar) a sample. This is a basically a hard coded adaptation of valvs for covid19.

This is a simple script (that uses existing tools)and should not be used as a shortcut in the methods of a paper to represent methodology used. You should use the sentence:

"For illumina reads, reads were trimmed with trim_galore (bioinformatics.babraham.ac.uk/projects/trim_galore/) and mapped with BWA {Li,2009} to the Wuhan-Hu-1 reference sequence, followed by primer trimming and consensus calling with iVar {Grubaugh,2019}. A read coverage of least 10 was used for the consensus."

If starting from a folder of (unzipped) FASTQs from multiple samples i.e. an Illumina run, with the reads in the format:

CVR01_S1_R1_0001.fastq
CVR01_S1_R2_0001.fastq
CVR02_S1_R1_0001.fastq
CVR02_S1_R2_0001.fastq

First run setup_primal_align.sh to create a folder for each sample and move each samples corresponding FASTQs into it's folder:

setup_primal_align.sh

Then to run Primal Align over each sample it's folder, run loop_primal_align.sh:

loop_primal_align.sh

By deafult it will use the V1 primers, to use other primers (e.g. V2 or V3) provide this as an argument to the script:

loop_primal_align.sh V2

This will run primal align on each sample. To run primal align on a single sample, move into a folder which has paired end FASTQs in the format _R1_0001.fastq and _R2_0001.fastq and type:

primal_align.sh

By deafult it will use the V1 primers, to use other primers (e.g. V2 or V3) provide this as an argument to the script:

primal_align.sh V2

This will run:

  • trim_galore: quality 20 and length 50
  • bwa mem
  • remove primers using ivar
  • create a consensus using ivar: quality 20 depth 10

It will create:

  • SampleName.bam - no primer trimming/clipping
  • SampleName_trim.bam - with ivar primer trimming/clipping
  • SampleName_trim_ivar_consensus - ivar consensus sequence

There will also be a log file called

SampleName_primal_log.txt

Which has the following info:

  • Number of raw reads (reads not pairs)
  • Number of trimmed reads (reads not pairs) - number and as a % of raw
  • Number of reads mapping to BAM - number and as a % of raw
  • Number of reads mapping to primer trimmed/clipped BAM - number and as a % of raw
  • Average coverage extracted from weeSAM output

This is hard coded to run on alpha and hard coded for the covid-19 ref MN908947.fasta and artic v1 primers nCoV-2019-v1.bed to keep it minimal, will add an option to change to V2/V3 primers in a second.

Old PrimalAlign

Very very simple script to trim reads with trim_galore, align with bwa, sam to bam filtering out supplementary and additional alignments and primer-clip reads post alignment using bamclipper. NB: it is hardcoded to point to the bamclipper in my home directory on alpha. The -d and -i options should be tweaked

Within a folder with two fastqs in the format *_R1_001.fastq and *_R2_001.fastq (it will find those fatsq's automatically) provide the reference path/name, the stub for output names, and the bed file of primer locations with respect to the reference.

primal_align.sh ref.fasta output_name ref.bed

This will create a raw and primer clipped bam file (plus indexes):

output_name.bam

output_name.primerclipped.bam

I have included the nCoV reference and primer BEDPE file based on V1 of artic nCoV-2019 primers

bash primal_align.sh MN908947.fasta output_name MN908947_bed.txt

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages