POS, SVLEN, and END are seemingly discordant in VCF output #72

RyanVidegar-Laird · 2022-12-05T18:35:05Z

Hi, thanks for all of your work on this tool!

I've run xTea (with defaults) on 4 short-read WGS samples, and I'm a bit confused about why the POS, END, and SVLEN values don't seem to align in the VCF output. I would expect END = POS + SVLEN, yet this doesn't hold for Alu or L1 SVs in any of my samples. Is this an error? I'm new to working with SVs, so perhaps I'm misunderstanding something.

Small output example:
awk '!/orphan/' ./xtea/out/sample-01_ALU.vcf | bcftools query -f'[%CHROM\t%POS\t%INFO/SVLEN\t%END\n]' - | shuf -n 5 | awk '{$5 = $4-$2}1' | column -t

CHR	POS	SVLEN	END	END-POS
chr6	49594200	269	49594216	16
chr9	96301414	276	96301428	14
chr8	19801418	292	19801427	9
chr2	64936633	274	64936649	16
chr8	127025867	378	127025879	12

simoncchu · 2022-12-07T13:52:13Z

These are insertions, which means the inserted sequences are absent from the reference genome. Here, SVLEN is the length of the inserted sequence. In general, we think of an insertion as having a single breakpoint on the genome. For TE insertions, however, the two strands usually do not break at exactly the same location (search for target-site duplication in L1 retrotransposition to learn more). Thus, two breakpoints are reported here (POS and END), and END - POS is the distance between them, not the insertion length. Hope this helps.
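A minimal sketch of how to read these fields, assuming tab-separated CHROM, POS, SVLEN, END rows as produced by the `bcftools query` command in the question. `insertion_spans` is a hypothetical helper (not part of xTea or bcftools). It treats END - POS as the distance between the two reported breakpoints on the reference and keeps SVLEN as a separate column for the inserted sequence length, instead of expecting END = POS + SVLEN.

```shell
#!/usr/bin/env bash
# Sketch under assumptions: input is tab-separated CHROM, POS, SVLEN, END
# (the column order of the bcftools query format string in the question).
# For an insertion, SVLEN is the length of the inserted sequence, which is
# absent from the reference, so END is not expected to equal POS + SVLEN.

# Hypothetical helper name; not an xTea or bcftools command.
# Output columns: CHROM, POS, END, END-POS (breakpoint distance), SVLEN.
insertion_spans() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         ref_span = $4 - $2    # distance between the two breakpoints on the reference
         print $1, $2, $4, ref_span, $3
       }'
}

# Example: the first row of the output shown in the question.
printf 'chr6\t49594200\t269\t49594216\n' | insertion_spans
```

To apply it to a whole file, pipe the output of the question's `bcftools query` command into `insertion_spans` in place of the final `awk`/`column` steps.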

simoncchu closed this as completed Dec 7, 2022