A pipeline to detect SNVs and other short somatic variants. This tool can work with low variant allele frequencies, ideal for analyzing circulating tumor DNA (ctDNA) without the need of matched controls.

nickveltmaat/CircuSNV

CircuSNV

Pipeline to call SNVs with 4 tools (VarDict, LoFreq, Mutect2 & SiNVICT)

Prerequisites:

General Description

See S1_SNV_pipeline.png for a flowchart of the pipeline.

This pipeline is designed to reliably call somatic mutations in samples with low variant allele frequencies (VAF) in specific regions, such as NGS data from cfDNA. It analyzes .BAM files with 4 different tools (VarDict, LoFreq, Mutect2 & SiNVICT) and outputs the variants that are called by at least x tools, where x can be set from 1 to 4 with -C. A higher x lowers the false-positive call rate and makes each call more reliable, but it also increases the chance of missing relevant somatic variants.
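The effect of the minimum-calls setting can be illustrated with a small sketch. It assumes each caller's output has already been reduced to a set of (chrom, pos, ref, alt) tuples. The function name and the example variants are hypothetical, and this is not the code used in merge_variants.py.

```python
from collections import Counter


def consensus_variants(calls_per_tool, min_calls):
    """Return {variant: number_of_tools} for variants called by >= min_calls tools.

    calls_per_tool maps a tool name to an iterable of variant keys.
    The (chrom, pos, ref, alt) key layout is an assumption made for this sketch.
    """
    if not 1 <= min_calls <= len(calls_per_tool):
        raise ValueError("min_calls must be between 1 and the number of tools")
    counts = Counter()
    for variants in calls_per_tool.values():
        # set() ensures a tool is counted once per variant, even if it
        # reported the same variant more than once.
        counts.update(set(variants))
    return {variant: n for variant, n in counts.items() if n >= min_calls}


# Made-up example calls, purely for illustration.
calls = {
    "VarDict": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    "LoFreq": {("chr1", 100, "A", "T")},
    "Mutect2": {("chr1", 100, "A", "T"), ("chr3", 300, "C", "G")},
    "SiNVICT": {("chr2", 200, "G", "C")},
}

kept = consensus_variants(calls, min_calls=2)
```

With min_calls=2, the variant reported only by Mutect2 is dropped. With min_calls=4 nothing would remain in this example, which shows the trade-off between reliability and sensitivity.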

The general workflow in the pipeline is as follows:

First, a panel of normals (PoN, a blacklist used to filter mutations) can be generated by passing the argument -P with a directory containing normal/healthy control samples. This is optional. To generate it, all normal samples are analyzed with the 4 tools, in exactly the same way as the tumor samples, to produce a list of personal variants / SNPs and technical artifacts. All variants found across the normal samples and called by at least one of the tools are included. This blacklist is later used as a filter.

After generating the PoN, the 4 tools run in parallel, generating raw data for each SNV caller. Every tool outputs .vcf files, which are then gunzipped and indexed so they can be filtered on Read Depth (RD, -D), Variant Allele Frequency (VAF, -V) and Read Depth of the Mutant allele (MRD, -M).

Once all .vcf files are processed, overlapping variants are merged while keeping the RD, VAF and MRD values intact (merge_variants.py). All variants called by x or more tools are kept. Because the tools can call variants in different ways, a variant is sometimes part of a larger variant (MNV, multinucleotide variant). Such variants are duplicates and need to be removed, a process handled by a custom Python script (SNV-MNV_handling.py).

Next, the variants are annotated and immediately afterwards filtered on functional effect (e.g. non-synonymous variants are kept) (post_filtering.py). Finally, the remaining variants are annotated again using openCRAVAT, a wrapper around multiple well-known annotation tools such as ClinVar, dbSNP, COSMIC, gnomAD and many more. All annotated mutations are saved in an Excel file and as .vcf files.
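The threshold and blacklist filters described above can be sketched as follows. This is a minimal illustration, not the pipeline's own code: the Variant class, the function names and the example values are hypothetical. It also assumes that thresholds are inclusive and that VAF is computed as mutant reads divided by read depth, whereas each caller reports its own allele frequency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int      # read depth (RD)
    alt_depth: int  # reads supporting the mutant allele (MRD)

    @property
    def key(self):
        return (self.chrom, self.pos, self.ref, self.alt)

    @property
    def vaf(self):
        # Assumption: VAF = mutant reads / total reads at the position.
        return self.alt_depth / self.depth if self.depth else 0.0


def passes_thresholds(v, min_rd, min_vaf, min_mrd):
    """Inclusive thresholds, mirroring the -D, -V and -M arguments."""
    return v.depth >= min_rd and v.vaf >= min_vaf and v.alt_depth >= min_mrd


def filter_variants(variants, min_rd, min_vaf, min_mrd, blacklist=frozenset()):
    """Keep variants that pass all thresholds and are absent from the PoN blacklist."""
    return [
        v for v in variants
        if passes_thresholds(v, min_rd, min_vaf, min_mrd) and v.key not in blacklist
    ]


# Made-up example data.
variants = [
    Variant("chr1", 100, "A", "T", depth=2000, alt_depth=10),  # passes everything
    Variant("chr1", 200, "G", "C", depth=80, alt_depth=8),     # read depth too low
    Variant("chr2", 300, "C", "T", depth=3000, alt_depth=8),   # VAF too low
    Variant("chr2", 400, "T", "G", depth=1000, alt_depth=5),   # too few mutant reads
    Variant("chr3", 500, "G", "A", depth=1500, alt_depth=30),  # present in the PoN
]
pon_blacklist = {("chr3", 500, "G", "A")}

kept = filter_variants(variants, min_rd=100, min_vaf=0.004, min_mrd=7,
                       blacklist=pon_blacklist)
```

Only the first variant survives here. The thresholds are the example values from the argument list (-D 100, -V 0.004, -M 7).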

Installation

1. Clone the repo

git clone https://github.com/nickveltmaat/CircuSNV

2. Set working directory to the repo

cd /path/to/CircuSNV

3. Create python virtual environment (env)

python3 -m venv ./env

4. Install needed packages in env with pip3

source ./env/bin/activate
pip3 install pandas
# glob is part of the Python standard library and needs no installation
pip3 install xlrd
pip3 install open-cravat
oc module install-base
oc module ls -a -t annotator  # generates a list of available annotators that can be downloaded
oc module install clinvar cosmic dbsnp ...  # see https://open-cravat.readthedocs.io/en/latest/1.-Installation-Instructions.html for more detailed instructions
deactivate

5. Download the pre-built tools, copy them to /path/to/CircuSNV/ and unzip

unzip ./tools.zip

6. Adjust paths in CircuSNV.sh

  • the cd to the CircuSNV installation directory
  • source ../../../env/bin/activate in the pythonscript() and annotate() functions

Usage

Once all tools and prerequisites are installed correctly, the pipeline can be called with:

bash ./CircuSNV.sh ARGUMENTS

Arguments (all required unless marked optional):

  • -I Input: String --> example: /path/to/input.bam (either a single file or a directory)
  • -R Reference: String --> example: /path/to/reference.fa
  • -L Regions List: String --> example: /path/to/panel.bed
  • -D minimum Read Depth: Int --> example: 100
  • -V minimum VAF: Float [0-1] --> example: 0.004
  • -C minimum Calls: Int [1-4] --> example: 2
  • -P Panel of Normals: String --> example: /path/to/PoN/directory/ (optional)
  • -Q Base Quality: Int --> example: 18
  • -M minimum Mutant Read Depth: Int --> example: 7
  • -F Filter Mode: String --> Combined or PerTool

Output will be generated in /path/to/CircuSNV/output/name_of_.bam_file/
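For scripted runs, the call can be assembled from the documented flags. The helper below is a hypothetical convenience wrapper and not part of CircuSNV. It only checks the value ranges stated in the argument list and builds the argument vector, using the example values given above.

```python
import shlex


def build_command(input_path, reference, regions, min_depth, min_vaf, min_calls,
                  base_quality, min_mutant_depth, filter_mode, pon_dir=None,
                  script="./CircuSNV.sh"):
    """Build the argument list for a CircuSNV.sh call (hypothetical helper)."""
    # Range checks taken from the documented argument list.
    if not 0 <= min_vaf <= 1:
        raise ValueError("-V must be between 0 and 1")
    if min_calls not in (1, 2, 3, 4):
        raise ValueError("-C must be an integer from 1 to 4")
    if filter_mode not in ("Combined", "PerTool"):
        raise ValueError("-F must be 'Combined' or 'PerTool'")

    args = [
        "bash", script,
        "-I", input_path,
        "-R", reference,
        "-L", regions,
        "-D", str(min_depth),
        "-V", str(min_vaf),
        "-C", str(min_calls),
        "-Q", str(base_quality),
        "-M", str(min_mutant_depth),
        "-F", filter_mode,
    ]
    if pon_dir is not None:
        # The panel of normals is optional.
        args += ["-P", pon_dir]
    return args


cmd = build_command(
    "/path/to/input.bam", "/path/to/reference.fa", "/path/to/panel.bed",
    min_depth=100, min_vaf=0.004, min_calls=2,
    base_quality=18, min_mutant_depth=7, filter_mode="Combined",
)
print(shlex.join(cmd))  # shell-quoted command line (shlex.join needs Python 3.8+)
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the CircuSNV directory once the placeholder paths are replaced with real files.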
