Thank you for your interest in using the Variant Analysis Pipeline (VAP). VAP is a comprehensive workflow for reference mapping and variant detection from genomic and transcriptomic reads, using a suite of bioinformatics tools.
Article Source:
Adetunji MO, Lamont SJ, Abasht B, Schmidt CJ (2019) Variant analysis pipeline for accurate detection of genomic variants from transcriptome sequencing data. PLOS ONE 14(9): e0216838. https://doi.org/10.1371/journal.pone.0216838
Bioinformatics tools are grouped by the type of sequencing reads:
- Genomic reads (alignment)
  - BOWTIE2
  - BWA
- Transcriptomic reads (alignment)
  - TOPHAT2
  - STAR (2-PASS)
  - HISAT2
- Variant calling: PICARD + GATK HaplotypeCaller
  - Sort, add read groups, and mark duplicates using Picard Tools.
  - Split CIGAR reads using GATK (transcriptomic sequencing reads only).
  - Detect variants using GATK.
N.B.: Parameters of all tools are set to their defaults.
Software | Version |
---|---|
TopHat2 | 2.1.1 |
HiSAT2 | 2.1.0 |
STAR | 2.5.2b |
SAMtools | 1.4.1 |
Picard tools | 2.13.2 |
GATK | 3.8 |
BWA-mem | 0.7.17 |
BOWTIE2 | 2.3.5.1 |
The current pipeline is not compatible with GATK v4.
Contact the maintainer to make custom changes to the different tools.
- Edit config_job.file with your settings; the file may be renamed as required.
- Parameters that are not needed must be either removed or set to false.
  e.g.: SAM = false (there is no SAM file); otherwise, enter the file path:
  /path/to/samfiles/*sam
- Required workflows (prefix: run) must be set to true (case-sensitive).
  e.g.: runTopHAT = true (the pipeline should run TopHAT2)
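The rules above can be sketched as a small pre-flight check. This Python sketch is an illustration only: it assumes one `key = value` setting per line and `#` comments, which may not match how VariantAnalysisPipeline.pl actually reads the file. The function names `parse_config` and `check_run_flags` are hypothetical, and `runHISAT` is a made-up key used to show a failing case; only `SAM` and `runTopHAT` appear in this README.

```python
def parse_config(text):
    """Parse 'key = value' lines into a dict (assumed format, not VAP's own parser)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_run_flags(settings):
    """Return the run* keys whose value is not exactly 'true' or 'false'."""
    return sorted(
        key for key, value in settings.items()
        if key.startswith("run") and value not in ("true", "false")
    )


example = """
SAM = false
runTopHAT = true
runHISAT = True   # hypothetical key; wrong case, so it is reported
"""
settings = parse_config(example)
problems = check_run_flags(settings)
print(problems)
```

Because the workflow switches are case-sensitive, a value such as `True` is reported rather than silently accepted.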
Before running the pipeline, create the indexes for each aligner you have enabled. Reference genome index syntax:
- GATK:
  java -jar <picard directory>/picard.jar CreateSequenceDictionary R=<reference.fa> O=<reference.dict>
- HISAT2:
  hisat2-build <reference.fa> <path to reference.fa>/<index_name>
- BOWTIE2/TOPHAT2:
  bowtie2-build <reference.fa> <path to reference.fa>/<index_name>
- BWA:
  bwa index <reference.fa>
N.B. For ease of use, make sure every <index_name> is the same and is stored in the same directory as <reference.fa>.
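To make the naming convention concrete, here is a minimal Python sketch that assembles the four index commands listed above with one shared index name placed beside the reference file. It only builds argument lists and runs nothing. The function name `index_commands` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def index_commands(reference, index_name, picard_dir):
    """Return the four index-building commands as argument lists (nothing is executed)."""
    ref = PurePosixPath(reference)
    prefix = str(ref.parent / index_name)        # index stored beside the reference
    dictionary = str(ref.with_suffix(".dict"))   # e.g. genome.fa -> genome.dict
    picard_jar = str(PurePosixPath(picard_dir) / "picard.jar")
    return [
        ["java", "-jar", picard_jar, "CreateSequenceDictionary",
         f"R={ref}", f"O={dictionary}"],
        ["hisat2-build", str(ref), prefix],
        ["bowtie2-build", str(ref), prefix],
        ["bwa", "index", str(ref)],
    ]


# Hypothetical paths, for illustration only.
commands = index_commands("/data/ref/genome.fa", "genome", "/opt/picard")
for cmd in commands:
    print(" ".join(cmd))
```

Each list could then be passed to `subprocess.run(cmd, check=True)` for the aligners you actually intend to use.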
The downstream step performs the following:
- Merge SNPs called from all the alignment tools initially selected to run (TopHAT2/HiSAT2/STAR or BOWTIE/BWA).
- Apply pre-set filtering criteria using the GATK VariantFiltration tool. Variants meeting any of the following conditions are filtered out:
  - Read position rank-sum test (ReadPosRankSum) < -8
  - Quality by depth (QD) < 5
  - Read depth (DP) < 10
  - Fisher’s exact test for strand bias, Phred-scaled p-value (FS) > 60
  - Mapping quality (MQ) < 40
  - SNP cluster (3 SNPs within 35 bp)
  - Mapping quality rank-sum test (MQRankSum) < -12.5
- Generate exploratory statistics for all variant files.
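The threshold criteria listed above can be read as a set of per-variant tests. The Python sketch below is not the pipeline's implementation (VAP applies the filters through GATK VariantFiltration); it only illustrates the logic on a dictionary of annotation values. The keys are the GATK INFO field names (`ReadPosRankSum` for the read position rank-sum test), the function name `failed_filters` is hypothetical, and skipping missing annotations is an assumption of this sketch. The SNP cluster criterion is not covered because it depends on neighbouring variants.

```python
# Thresholds copied from the list above; a variant fails if any test is true.
FILTERS = {
    "ReadPosRankSum": lambda v: v < -8,
    "QD": lambda v: v < 5,
    "DP": lambda v: v < 10,
    "FS": lambda v: v > 60,
    "MQ": lambda v: v < 40,
    "MQRankSum": lambda v: v < -12.5,
}


def failed_filters(info):
    """Return the names of the filters a variant fails, given its annotations.

    Annotations missing from `info` are skipped (an assumption of this sketch).
    """
    return [
        name for name, fails in FILTERS.items()
        if name in info and fails(info[name])
    ]


# Made-up annotation values, for illustration only.
variant = {"QD": 3.2, "DP": 25, "FS": 12.0, "MQ": 60.0}
print(failed_filters(variant))
```

A variant with an empty result passes every threshold test in this sketch.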
To run the pipeline:
perl VariantAnalysisPipeline.pl -c config_job.file