
qual_classifier

Rob Flickenger edited this page Aug 9, 2021 · 1 revision

The biograph qual_classifier command assigns a genotype and a quality score to each variant in a VCF, then filters calls using configurable quality-score thresholds.

See Customizing the BioGraph Pipeline for an overview of how and when to use this command.

Essential Options

  • --vcf: the input VCF.
  • --model: the classifier model file. This is provided by Spiral Genetics and should match your version of BioGraph (for example, biograph_model-7.0.0.ml).
  • --grm: the DataFrame output from the truvari anno grm command. This is required only when the quality score classifier runs, which it does by default.
  • --out: the output VCF. If unspecified, the VCF will be written to STDOUT.
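
Because the model file should match your BioGraph version, a quick pre-flight check can catch a mismatch before a long run. The sketch below is a hypothetical helper, not part of BioGraph. It assumes model files follow the `biograph_model-<version>.ml` pattern seen in the example above, and that "match" means the two version strings are identical; you supply the BioGraph version string yourself.

```python
import os
import re
from typing import Optional

# ASSUMPTION: naming pattern inferred only from the example "biograph_model-7.0.0.ml".
MODEL_NAME = re.compile(r"^biograph_model-(\d+(?:\.\d+)*)\.ml$")


def model_version(path: str) -> Optional[str]:
    """Return the version embedded in a model filename, or None if it doesn't fit the pattern."""
    match = MODEL_NAME.match(os.path.basename(path))
    return match.group(1) if match else None


def model_matches(path: str, biograph_version: str) -> bool:
    """True when the model's embedded version is identical to the given BioGraph version."""
    # ASSUMPTION: "should match" is interpreted as exact version equality.
    return model_version(path) == biograph_version
```

For example, `model_matches("/data/biograph_model-7.0.0.ml", "7.0.0")` returns `True`, while a 6.x model checked against "7.0.0" returns `False`.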

Filter Thresholds

  • --filter: Calls with a quality score lower than this are removed from the output VCF (default: 0.1).
  • --lowqual_sv: Structural variants with a quality score lower than this are kept but marked lowq in the FILTER field (default: 0.352).
  • --lowqual_ao: SNPs and indels with a quality score lower than this are kept but marked lowq in the FILTER field (default: 0.22). The "ao" stands for "all others" (non-SVs).
  • --thresh_gt: Cutoff threshold for the genotype (GT) classifier (default: 0.5).
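
The three quality thresholds can be read as a small decision procedure applied to each call. This sketch is illustrative only: `triage` is a hypothetical function, its defaults are copied from the `--help` output on this page, and both the strict "lower than" comparisons and the `PASS` value for unflagged calls are assumptions about the tool's behavior. Whether a call is an SV is passed in rather than computed, since this page does not define the cutoff.

```python
from typing import Optional

# Defaults copied from the `--help` output on this page.
FILTER = 0.1        # --filter
LOWQUAL_SV = 0.352  # --lowqual_sv
LOWQUAL_AO = 0.22   # --lowqual_ao (SNPs and indels)


def triage(score: float, is_sv: bool,
           filter_thresh: float = FILTER,
           lowqual_sv: float = LOWQUAL_SV,
           lowqual_ao: float = LOWQUAL_AO) -> Optional[str]:
    """Hypothetical sketch mapping one call's quality score to an outcome.

    Returns None if the call is dropped, "lowq" if it is kept but flagged,
    or "PASS" (ASSUMED label) if it is kept unflagged.
    """
    if score < filter_thresh:
        return None  # removed from the output VCF
    # SVs and all other variants use separate lowq cutoffs.
    lowq_cutoff = lowqual_sv if is_sv else lowqual_ao
    return "lowq" if score < lowq_cutoff else "PASS"
```

With the defaults, a score of 0.3 is flagged `lowq` for an SV but passes for a SNP or indel, because the SV cutoff (0.352) is higher than the all-others cutoff (0.22).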

Other Options

  • --sample: When running on a multi-sample VCF, set --sample to choose the sample of interest.
  • --clsf: The genotype and quality classifiers both run by default (--clsf Both). You can run just the GT classifier with --clsf GT, or just the quality classifier with --clsf Qual.
  • --dataframe: A dataframe generated from the input VCF with bgvar2table.py. If not specified, one is created automatically.
  • --threads: Use the specified number of threads. By default, one thread is allocated per available processor.
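
To put the options together, the hypothetical helper below assembles a `biograph qual_classifier` command line from Python. It uses only the long flag names shown in the `--help` output, accepts the `--clsf` choices that output lists (`GT`, `Qual`, `Both`), and refuses to build a command that runs the quality classifier without `--grm`. It is a sketch for scripting pipelines, not a BioGraph API; the filenames are placeholders. Pass the returned list to `subprocess.run(cmd, check=True)` to execute it.

```python
import shlex
from typing import List, Optional

CLASSIFIERS = ("GT", "Qual", "Both")  # choices listed by `--help`


def qual_classifier_cmd(vcf: str, model: str, out: Optional[str] = None,
                        grm: Optional[str] = None, dataframe: Optional[str] = None,
                        sample: Optional[str] = None, clsf: str = "Both",
                        threads: Optional[int] = None) -> List[str]:
    """Hypothetical helper: build an argv list for `biograph qual_classifier`."""
    if clsf not in CLASSIFIERS:
        raise ValueError(f"clsf must be one of {CLASSIFIERS}, got {clsf!r}")
    if clsf in ("Qual", "Both") and grm is None:
        # --grm is needed whenever the quality classifier runs.
        raise ValueError("--grm is required when the quality classifier runs")
    cmd = ["biograph", "qual_classifier", "--vcf", vcf, "--model", model,
           "--clsf", clsf]
    # Append optional flags only when a value was supplied.
    optional = {"--out": out, "--grm": grm, "--dataframe": dataframe,
                "--sample": sample, "--threads": threads}
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


# GT classifier only, so no --grm is needed.
print(shlex.join(qual_classifier_cmd("calls.vcf.gz", "biograph_model-7.0.0.ml",
                                     out="classified.vcf", clsf="GT")))
# prints: biograph qual_classifier --vcf calls.vcf.gz --model biograph_model-7.0.0.ml --clsf GT --out classified.vcf
```

Returning a list rather than a single string avoids shell-quoting problems when paths contain spaces; `shlex.join` is used only to display it.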

Getting Help

To see a list of all biograph qual_classifier options, use the --help switch:

$ biograph qual_classifier --help
usage: qual_classifier [-h] -v VCF -d DATAFRAME -m MODEL [-o OUT] [-x GRM]
                       [-f FILTER] [-s LOWQUAL_SV] [-a LOWQUAL_AO]
                       [--sample SAMPLE] [--tmp TMP] [-t THREADS]
                       [-g THRESH_GT] [-c {GT,Qual,Both}]

Classify VCF variants

optional arguments:
  -h, --help            show this help message and exit
  -v VCF, --vcf VCF     VCF to parse
  -d DATAFRAME, --dataframe DATAFRAME
                        Coverage DataFrame frame
  -m MODEL, --model MODEL
                        Model to apply to data
  -o OUT, --out OUT     VCF to output
  -x GRM, --grm GRM     DataFrame conaining grm features from truvari
  -f FILTER, --filter FILTER
                        Maximum threshold of calls to filter (0.1)
  -s LOWQUAL_SV, --lowqual_sv LOWQUAL_SV
                        Maximum threshold for calls to mark as lowqual_sv
                        (0.352)
  -a LOWQUAL_AO, --lowqual_ao LOWQUAL_AO
                        Maximum threshold for calls to mark as lowqual_ao
                        (0.22)
  --sample SAMPLE       Sample identifier (only required for multi-sample
                        VCFs)
  --tmp TMP             Temporary directory (/tmp)
  -t THREADS, --threads THREADS
                        Number of threads to use (48)
  -g THRESH_GT, --thresh_gt THRESH_GT
                        threshold for GT
  -c {GT,Qual,Both}, --clsf {GT,Qual,Both}
                        Flag for which classifiers to run (Both)