
05. Output Format: A (modified) VCF

Sebastian Niehus edited this page Mar 25, 2021 · 4 revisions

Overview
Special INFO and FORMAT fields
Window-wise output

Overview

PopDel's output is a standard VCF-4.2 file containing the genotypes for every sample.

Variant Call Format


VCF is a text file format. It contains meta-information lines, a header line, and then data lines, each containing information about a position in the genome. In PopDel's output, each data line represents one possible deletion.

The CHROM and POS fields give the position of the variant. ID, REF and ALT have the respective default values ".", "N" and "<DEL>", since deletions are the only variants PopDel calls. QUAL and FILTER give the quality of the variant and whether it has passed the quality filter. The INFO and FORMAT fields are modified and are explained in the next section. Each remaining column represents one of the BAM files given to popdel profile and contains information about the variant in the order given by FORMAT.
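The column layout described above can be sketched in Python as follows. This is a minimal illustration, not part of PopDel: the names `VcfRecord` and `parse_data_line` are hypothetical, the line is assumed to be tab-separated as in standard VCF, and the FORMAT keys and sample values in the example line are made up.

```python
from typing import Dict, List, NamedTuple


class VcfRecord(NamedTuple):
    """One data line: the eight fixed VCF columns plus one dict per sample."""
    chrom: str
    pos: int
    id: str
    ref: str
    alt: str
    qual: str
    filt: str
    info: str
    samples: List[Dict[str, str]]


def parse_data_line(line: str) -> VcfRecord:
    """Split a tab-separated VCF data line into fixed and per-sample columns."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, id_, ref, alt, qual, filt, info, fmt = cols[:9]
    # FORMAT lists the keys; every sample column lists its values in that order.
    keys = fmt.split(":")
    samples = [dict(zip(keys, col.split(":"))) for col in cols[9:]]
    return VcfRecord(chrom, int(pos), id_, ref, alt, qual, filt, info, samples)


# Hypothetical line with two samples; the FORMAT keys and values are invented.
line = "chr21\t2000\t.\tN\t<DEL>\t100\tPASS\tSVLEN=-3000;SVTYPE=DEL\tGT:GQ\t0/1:40\t1/1:55"
rec = parse_data_line(line)
print(rec.chrom, rec.pos, rec.samples)
```

Keeping the sample columns as dictionaries keyed by the FORMAT entries means the code does not need to know in advance which FORMAT fields a given file contains.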


Every variant is defined by its genomic position and an estimate of its length. The precision of the length estimate mainly depends on the 'sharpness' of the insert size distribution(s) of the samples. The LR value in the INFO column gives a good additional quality measure for the deletion, as the value of the QUAL field will quickly cap at 100 while the Log-Likelihood Ratio has no upper limit. In fact, the QUAL value is simply a PHRED-like representation of the LR value.
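Because QUAL saturates at 100, ranking calls by LR can separate variants that QUAL alone cannot. The sketch below parses an INFO string into a dictionary and sorts by LR. The helper `parse_info` is hypothetical and the LR values are invented for illustration; only the key name LR is taken from the text above.

```python
from typing import Dict, Union


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Turn 'A=1;B=2;FLAG' into {'A': '1', 'B': '2', 'FLAG': True}."""
    out: Dict[str, Union[str, bool]] = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        # Entries without '=' (such as IMPRECISE) are flags.
        out[key] = value if sep else True
    return out


# Two hypothetical INFO strings; both variants could carry QUAL=100.
infos = [
    "IMPRECISE;SVLEN=-1200;SVTYPE=DEL;LR=212.0",
    "IMPRECISE;SVLEN=-3000;SVTYPE=DEL;LR=850.5",
]

# Sort by LR, highest first, since LR has no upper limit.
ranked = sorted((parse_info(i) for i in infos),
                key=lambda d: float(d["LR"]), reverse=True)
for entry in ranked:
    print(entry["SVLEN"], entry["LR"])
```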

Special INFO and FORMAT fields

The SWIN value is the number of significant 30 bp windows that have been merged into the variant. YIELD is the fraction of samples that could be genotyped for the deletion, regardless of carrier status. The Likelihood-derived Allelic Depth (LAD) is the number of reads that shifted the likelihood ratio in favor of the REF model, the ALT model, or neither. The Distribution-derived Allelic Depth (DAD) is similar to the LAD, but is based on the quantiles of the distributions. Therefore, it also contains counts for read pairs that support both models or that have an insert size too big for the deletion model. The First & Last (FL) values give the positions of the first and last read pair that acted in favor of the deletion model in the DAD calculation. The First to Last Distance (FLD) is the distance between those two read pairs and should roughly correspond to the size of the deletion plus the median insert size.
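The relationship between FLD, deletion size and median insert size can be used as a rough plausibility check. The following sketch assumes the first and last positions are already available as integers (how FL is encoded in the sample column is not covered here), and the function names and numbers are hypothetical.

```python
def expected_fld(deletion_length: int, median_insert_size: int) -> int:
    """Approximate FLD: deletion size plus the median insert size."""
    # SVLEN is negative for deletions, so use the absolute value.
    return abs(deletion_length) + median_insert_size


def fld_deviation(first: int, last: int,
                  deletion_length: int, median_insert_size: int) -> int:
    """Difference between the observed FLD and its rough expectation."""
    observed = last - first
    return observed - expected_fld(deletion_length, median_insert_size)


# Invented numbers: a 3000 bp deletion and a library with a 500 bp median insert size.
deviation = fld_deviation(first=1800, last=5300,
                          deletion_length=-3000, median_insert_size=500)
print(deviation)
```

A deviation close to zero is consistent with the length estimate. Since the correspondence is only approximate, any threshold on the deviation would be a judgment call.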

[Figure: Explanation of one line in a sample popdel.vcf. Panels: "One line of a popdel.vcf" and "Each sample in FORMAT".]

Window-wise output

For some applications, PopDel can write its output in a window-wise fashion. This behavior is enabled by setting the flag -n. Consider the following deletion of length 3000:

chr21 2000 . N <DEL> 100 PASS IMPRECISE;SVLEN=-3000;SVTYPE=DEL;AF=0.5 [...]

When applying the window-wise output this becomes:

chr21 1970 . N <DEL> 90 PASS IMPRECISE;SVLEN=-3015;SVTYPE=DEL;AF=0.4 [...]
chr21 2000 . N <DEL> 100 PASS IMPRECISE;SVLEN=-3000;SVTYPE=DEL;AF=0.5 [...]
chr21 2030 . N <DEL> 100 PASS IMPRECISE;SVLEN=-2999;SVTYPE=DEL;AF=0.5 [...]
...
chr21 5030 . N <DEL> 90 PASS IMPRECISE;SVLEN=-3012;SVTYPE=DEL;AF=0.4 [...]

Note: When using window-wise output, the POS field no longer represents the starting position of the variant but that of the window. As the example shows, every window that is overlapped by the deletion and passes all tests is reported in the output file. Furthermore, the estimates for the deletion (such as SVLEN and AF above) differ slightly from window to window.
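For downstream processing it can be useful to collapse adjacent window records back into runs. The sketch below is one possible approach and not something PopDel does itself: it assumes records sorted by chromosome and position, a window spacing of 30 bp, and uses the median SVLEN of a run as its summary. The function `merge_windows` is hypothetical, and the input is the first three window lines from the example above.

```python
from statistics import median
from typing import List, Tuple

WINDOW = 30  # assumed spacing between consecutive window records, in bp


def merge_windows(records: List[Tuple[str, int, int]],
                  window: int = WINDOW) -> List[Tuple[str, int, int, float]]:
    """Collapse (chrom, pos, svlen) window records into contiguous runs.

    Returns (chrom, first window POS, last window POS, median SVLEN) per run.
    """
    runs = []
    for chrom, pos, svlen in records:
        last = runs[-1] if runs else None
        if last and last["chrom"] == chrom and pos - last["end"] == window:
            # Directly adjacent window: extend the current run.
            last["end"] = pos
            last["svlens"].append(svlen)
        else:
            # New chromosome or a gap: start a new run.
            runs.append({"chrom": chrom, "start": pos, "end": pos,
                         "svlens": [svlen]})
    return [(r["chrom"], r["start"], r["end"], median(r["svlens"]))
            for r in runs]


records = [("chr21", 1970, -3015), ("chr21", 2000, -3000), ("chr21", 2030, -2999)]
merged = merge_windows(records)
print(merged)
```

Because only windows that pass all tests are written, a single deletion may contain gaps and would then be split into several runs by this sketch. Tolerating small gaps would be a natural extension.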



Next page → PopDel View