VCFParser

Render beautiful heatmaps of the SNVs of interest from the output of any variant calling software. Output from the iVar pipeline is best supported.

Extract non-duplicated SNVs from a VCF or iVar TSV file and render a heatmap for each SARS-CoV-2 Variant of Concern (VOC).

If the SNVs of interest are not listed in the data/cov_lineage_variants.tsv file, append them to that file or provide a custom reference file with the -r parameter.

To render plots for all VOCs defined in cov_lineage_variants.tsv, specify -voc all.

SNVs can be filtered by minimum read coverage, minimum SNV frequency, and minimum PHRED quality score.
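The deduplication and filtering described above can be sketched in a few lines of standard-library Python. This is only an illustration, not the actual VCFParser implementation: the function name `filter_snvs` is hypothetical, the column names (`POS`, `REF`, `ALT`, `ALT_QUAL`, `ALT_FREQ`, `TOTAL_DP`) are assumed to follow the iVar variants TSV layout, and the example rows are made up.

```python
import csv
import io


def filter_snvs(rows, min_freq=0.0, min_depth=0, min_quality=0):
    """Keep unique SNVs that meet all three thresholds (hypothetical helper)."""
    seen = set()
    kept = []
    for row in rows:
        key = (row["POS"], row["REF"], row["ALT"])
        if key in seen:
            continue  # skip repeated records of the same SNV
        seen.add(key)
        if float(row["ALT_FREQ"]) < min_freq:
            continue  # SNV frequency below threshold
        if int(row["TOTAL_DP"]) < min_depth:
            continue  # read coverage below threshold
        if float(row["ALT_QUAL"]) < min_quality:
            continue  # PHRED quality below threshold
        kept.append(row)
    return kept


# Made-up rows using a subset of the assumed iVar TSV columns.
EXAMPLE_TSV = (
    "REGION\tPOS\tREF\tALT\tALT_QUAL\tALT_FREQ\tTOTAL_DP\n"
    "MN908947.3\t23063\tA\tT\t37\t0.98\t1200\n"
    "MN908947.3\t23063\tA\tT\t37\t0.98\t1200\n"  # duplicate record
    "MN908947.3\t23403\tA\tG\t35\t0.04\t900\n"   # low frequency
    "MN908947.3\t23604\tC\tA\t12\t0.91\t15\n"    # low depth and quality
)

rows = list(csv.DictReader(io.StringIO(EXAMPLE_TSV), delimiter="\t"))
kept = filter_snvs(rows, min_freq=0.1, min_depth=20, min_quality=20)
print([(r["POS"], r["REF"], r["ALT"]) for r in kept])
# prints [('23063', 'A', 'T')]
```

With all thresholds left at 0, nothing is filtered out and only the duplicate record is dropped, which mirrors the "0 = no filtering" defaults listed in the usage section.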

Requirements

  • pandas
  • python >= 3
  • matplotlib >= 3.3
  • openpyxl
  • pysam

Usage

$ vcfparser -h

usage: vcfparser parses VCF or TSV file and generates heatmaps and parsed VCF files on query SNVs/VOCs/VOIs.
The iVar TSV or VCF inputs are preferred (https://andersen-lab.github.io/ivar/html/manualpage.html) 

       [-h] (-i INPUT [INPUT ...] | -f INPUT_FILE | --clear_cov_cache)
       [-bam BAM_FILES [BAM_FILES ...]] [-voc VOC_NAMES] [-r REF_META]
       [--signature_snvs_only] [--key_snvs_only] [--stat_filter_snvs]
       [--subplots_mode SUBPLOTS_MODE] [--min_snv_freq_threshold [0-1]]
       [--min_depth_coverage [0-Inf]] [--min_quality [0-Inf]] [--annotate]
       [--dpi 400] [--font_size 2.5] [--annotate_text_color coral]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT [INPUT ...], --input INPUT [INPUT ...]
                        List of ivar_variants.vcf or ivar_variants.tsv to
                        summarise
  -f INPUT_FILE, --input_file INPUT_FILE
                        Input file with TSV/VCF and BAM file paths for batch
                        input
  --clear_cov_cache     Erase cache of SNV coverages generated by previous
                        runs (.cache_snv_coverages.json)
  -bam BAM_FILES [BAM_FILES ...], --bam_files BAM_FILES [BAM_FILES ...]
                        Optionally provide a list of corresponding bam files
                        in THE SAME ORDER as files provided for the -i
                        parameter
  -voc VOC_NAMES, --voc_names VOC_NAMES
                        List of Variants of Concern names (e.g. UK, SA,
                        Brazil, Nigeria)
  -r REF_META, --ref_meta REF_META
                        Path to metadata TSV file containing info on the key
                        mutations
  --signature_snvs_only
                        Check VCF for only signature/official snvs linked to a
                        VOC
  --key_snvs_only       Check VCF for only the key (S-gene associated) snvs
                        linked to a VOC
  --stat_filter_snvs    Filter snvs based on statistical significance (i.e. QC
                        PASS/FAIL flags)
  --subplots_mode SUBPLOTS_MODE
                        How to plot multiple plots (onerow, onecolumn,
                        oneplotperfile)
  --min_snv_freq_threshold [0-1]
                        Set minimum SNV frequency threshold to display
                        (default: 0)
  --min_depth_coverage [0-Inf]
                        Filter SNVs based on min depth coverage (default:0 =
                        no filtering)
  --min_quality [0-Inf]
                        Filter SNVs based on min PHRED sequencing quality
                        (default:0 = no filtering)
  --annotate            Annotate heatmap with SNV frequency values
  --dpi 400             DPI value for the heatmap rendering. Default value:
                        400
  --font_size 2.5       Labels font size for both axis: 2.5
  --annotate_text_color coral
                        Annotate text colour (freq. values)
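
Because the -bam files must be given in the same order as the -i inputs, it can help to assemble the command line from (variants, BAM) pairs instead of typing two parallel lists by hand. The sketch below is one possible way to do that from Python. The helper `build_vcfparser_command` and the file names are hypothetical; only flags shown in the help text above are used.

```python
import shlex


def build_vcfparser_command(samples, voc="all", min_freq=0.0, annotate=False):
    """Build a vcfparser argument list from (variants_path, bam_path) pairs.

    bam_path may be None, but then it must be None for every sample.
    """
    if not samples:
        raise ValueError("at least one input file is required")
    variants = [v for v, _ in samples]
    bams = [b for _, b in samples]
    cmd = ["vcfparser", "-i", *variants]
    if all(bams):
        # Pairs keep each BAM file aligned with its input file.
        cmd += ["-bam", *bams]
    elif any(bams):
        raise ValueError("provide a BAM file for every input, or for none")
    cmd += ["-voc", voc, "--min_snv_freq_threshold", str(min_freq)]
    if annotate:
        cmd.append("--annotate")
    return cmd


# Hypothetical file names, for illustration only.
samples = [("s1.tsv", "s1.bam"), ("s2.tsv", "s2.bam")]
cmd = build_vcfparser_command(samples, voc="UK", min_freq=0.25, annotate=True)
print(shlex.join(cmd))
# prints: vcfparser -i s1.tsv s2.tsv -bam s1.bam s2.bam -voc UK --min_snv_freq_threshold 0.25 --annotate
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where vcfparser is installed.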