how to get actual inserted/deleted sequence #10
Comments
Could use some help on this, @kcleal.
Hi brentp, for the first record you highlighted, it is possible to get the insertion sequence by slicing the first 36 bases (svlen=36) from the soft-clipped region, i.e. tttctttctttctttctttctttctttctttctttc. This is not guaranteed to work for all insertions though, as some only have partial mappings (or no mappings). I guess full insertion sequences could be added to the alt allele field, but it is not clear what to do for partial or no mappings. The lowercase letters are simply non-reference-aligned bases, so any insertion or soft-clipped sequence will be represented as lowercase. In the second example, there appears to be a 2 bp insertion sequence in the consensus, but this is not related to the deletion. The actual deletion sequence is quite difficult to extract from the record, but it can be sliced from the reference using POS and END. This is obviously a pain to do, so I will try to get that added to the alt allele field.
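A minimal sketch of the two slicing steps described above, in case it helps anyone reading along. The function names are hypothetical and not part of the tool. Two assumptions are baked in: the relevant soft-clipped region is taken to be the longest lowercase run in the contig, and POS is taken to follow the usual VCF convention of naming the base just before the deleted bases. Neither is confirmed for this caller, so check against a few known records first.

```python
import re


def insertion_from_contig(contig, svlen):
    """Return the first `svlen` bases of the longest lowercase run, or None.

    Assumption: lowercase bases are non-reference-aligned, and the longest
    lowercase run is the soft-clipped region that holds the insertion.
    """
    runs = re.findall(r"[a-z]+", contig)
    if not runs:
        return None
    longest = max(runs, key=len)
    if len(longest) < svlen:
        # Partial mapping: the contig does not cover the whole insertion.
        return None
    return longest[:svlen]


def deleted_sequence(reference, pos, end):
    """Slice the deleted bases out of a reference chromosome string.

    Assumption: POS and END are 1-based and POS is the base preceding the
    deletion, so the deleted bases are POS+1..END, i.e. reference[pos:end]
    with Python's 0-based slicing.
    """
    return reference[pos:end]


# Toy contig: 4 aligned bases, a 41-base lowercase run, 4 aligned bases.
contig = "ACGT" + "tttc" * 9 + "ttgca" + "GGCA"
print(insertion_from_contig(contig, 36))
print(deleted_sequence("AACCGGTT", 2, 5))
```

In practice the reference string would come from a FASTA lookup for the record's chromosome rather than a literal, and `None` from the first function signals that the full insertion is not recoverable from that contig alone.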
Thanks for the reply.
Thanks, I think that is a good solution. I will get this fixed.
That would be awesome! And, if it can be assembled, manta puts the insertion sequence in |
Hi brent, I've added some improvements to insertion reporting. Where possible, the insertion sequence is now written in the ALT field. However, this applies only to re-mapped sequences where the whole insertion was likely mapped, and to 'within-read' events. Other insertion sequences that have partial mappings (or no mappings) are available in the
Excellent, thank you! I am evaluating this now.
I can run this and get some INS sequences. For those that do not have it, what does that mean?
has only
has neither. |
Here is another one:
has contiga and contigb, but no SEQ fields. |
Hi brent, thanks for looking at this. I can see why this is confusing! The records you pulled out that have no insertion sequence called are events that were identified from split/supplementary mappings: they have RMS (re-mapping score) == 0 and WR (within-read support) == 0. These events will take a bit more handling to infer the insertion sequence. I will try to get this fixed in the next few days.
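For anyone filtering records while this is pending, here is a rough sketch of the check described above. It assumes RMS and WR appear as numeric `KEY=VALUE` entries in the INFO column; the helper names and the example INFO strings are made up for illustration.

```python
def parse_info(info):
    """Parse a VCF INFO string into a dict; bare flags map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def from_split_mappings_only(info):
    """True when both RMS and WR are present and equal to 0.

    Per the comment above, such records came from split/supplementary
    mappings and are not expected to carry an insertion sequence yet.
    Records missing either key are reported as False (unknown).
    """
    fields = parse_info(info)
    if "RMS" not in fields or "WR" not in fields:
        return False
    return float(fields["RMS"]) == 0 and float(fields["WR"]) == 0


# Hypothetical INFO strings, not taken from real output.
print(from_split_mappings_only("SVTYPE=INS;SVLEN=36;RMS=0;WR=0"))
print(from_split_mappings_only("SVTYPE=INS;SVLEN=36;RMS=12;WR=0"))
```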
Hi, given a variant like this one (added some newlines for readability):
what is the actual inserted sequence?
SVLEN says 36. EXPSEQ has len=28, and the lowercase sequence in CONTIGA has len=41.
I understand that in some cases we can't get the full inserted sequence, but for those cases, can we get the left end of the inserted sequence from CONTIGA and the right end from CONTIGB? How?
It would also be nice to be able to get the deleted sequence for DEL:
In this case, the sequence after the lower-case letters in CONTIGA is length 32. Is that the deleted sequence? Or would I look this up in a fasta and contig A is the haplotype with the deletion?