POS, SVLEN, and END are seemingly discordant in VCF output #72

Closed
RyanVidegar-Laird opened this issue Dec 5, 2022 · 1 comment

Comments

@RyanVidegar-Laird

Hi, thanks for all of your work on this tool!

I ran xTea (with defaults) on 4 short-read WGS samples, and I'm a bit confused about why the POS, END, and SVLEN values don't seem to agree in the VCF output. I would expect END = POS + SVLEN, yet that doesn't hold in any of my samples for Alu or L1 SVs. Is this an error? I'm new to working with SVs, so perhaps I'm misunderstanding something.

Small output example:
awk '!/orphan/' ./xtea/out/sample-01_ALU.vcf | bcftools query -f'[%CHROM\t%POS\t%INFO/SVLEN\t%END\n]' - | shuf -n 5 | awk '{$5 = $4-$2}1' | column -t

CHR POS SVLEN END END-POS
chr6 49594200 269 49594216 16
chr9 96301414 276 96301428 14
chr8 19801418 292 19801427 9
chr2 64936633 274 64936649 16
chr8 127025867 378 127025879 12
@simoncchu
Collaborator

These are insertions, which means the inserted sequences are absent from the reference genome. Here, SVLEN is the insertion length. In general, an insertion is expected to have a single breakpoint on the genome; for TE insertions, however, the two strands usually do not break at exactly the same location (search for target-site duplication in L1 retrotransposition to learn more). Thus, two breakpoints are reported here (POS and END). Hope this helps.
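
To make the distinction concrete, here is a minimal Python sketch that treats the two quantities separately, using the five rows posted in the question. The names `InsertionCall`, `parse_rows`, and `breakpoint_gap` are made up for illustration and are not part of xTea or bcftools. Reading END - POS as the distance between the two strand breakpoints follows the explanation above and is an assumption, not something taken from xTea's documentation.

```python
from typing import List, NamedTuple


class InsertionCall(NamedTuple):
    """One insertion record (hypothetical container, not an xTea type)."""
    chrom: str
    pos: int    # VCF POS: first reported breakpoint on the reference
    svlen: int  # INFO/SVLEN: length of the inserted sequence
    end: int    # INFO/END: second reported breakpoint on the reference


def parse_rows(text: str) -> List[InsertionCall]:
    """Parse whitespace-separated CHROM POS SVLEN END rows."""
    calls = []
    for line in text.strip().splitlines():
        chrom, pos, svlen, end = line.split()[:4]
        calls.append(InsertionCall(chrom, int(pos), int(svlen), int(end)))
    return calls


def breakpoint_gap(call: InsertionCall) -> int:
    """Distance between the two reported breakpoints (END - POS).

    Assumption: this gap reflects the staggered strand breaks described
    above. It is independent of SVLEN.
    """
    return call.end - call.pos


# The five rows from the example output in the question.
ROWS = """
chr6 49594200 269 49594216
chr9 96301414 276 96301428
chr8 19801418 292 19801427
chr2 64936633 274 64936649
chr8 127025867 378 127025879
"""

calls = parse_rows(ROWS)
for call in calls:
    # SVLEN describes the inserted sequence; the gap describes the reference.
    print(call.chrom, call.pos, call.end,
          "inserted_bp=%d" % call.svlen,
          "breakpoint_gap=%d" % breakpoint_gap(call))
```

For these rows the gap between breakpoints is 9 to 16 bp while SVLEN is 269 to 378 bp, so END = POS + SVLEN is not expected to hold for insertion calls of this kind.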
