SAM Tag for structural variations #386
Comments
You can determine the order by parsing the CIGAR of each supplementary alignment.
I'm parsing the CIGAR to get the length, but I'm not sure how that gives me the order. If one read has several supplementary alignments, I don't see how the length helps. Could you elaborate? I'm analyzing circular DNA, which can contain fragments of several chromosomal regions. Since I'm using long reads, a single read might span several different chromosomal regions, yielding several supplementary alignments. Many thanks for your time and help.
The soft clips, coupled with the strand information, tell you exactly where in the query read each supplementary alignment starts aligning to the reference.
This one starts at query base 42.
As far as I understood, when you have a supplementary alignment, the fields would be And then the order of this one read's supplementary alignments would be Or have I understood this incorrectly? Because how do I know the order isn't like this instead?
CIGARs are computed with respect to the reference genome, not with respect to the reads. For reads mapped to the reverse strand, you need to look at the last clipping. You'd be better off using PAF output.
Okay, thanks. But I'm still not sure how this answers my question about the order of the "SA" tag for a specific read. As you say, the CIGAR is computed with respect to the reference. Should I then just assume that, regardless of strand, the order of the alignments is the order in which they are written in the tag?
As I said, use PAF. You can learn how SAM encodes the same information by comparing PAF and SAM.
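For readers following the PAF suggestion, here is a minimal sketch of ordering one read's alignments by query position. It assumes the standard 12 mandatory PAF columns, where columns 3 and 4 are the 0-based query start and end. The two records, the read name `read1`, and the targets `chrA` and `chrB` are made up for illustration; they are not data from this thread.

```python
from typing import NamedTuple


class PafHit(NamedTuple):
    qname: str
    qlen: int
    qstart: int  # 0-based start on the query
    qend: int    # end on the query (exclusive)
    strand: str
    tname: str
    tstart: int
    tend: int


def parse_paf_line(line: str) -> PafHit:
    """Parse the mandatory PAF columns needed to place a hit on the read."""
    f = line.rstrip("\n").split("\t")
    return PafHit(f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
                  f[5], int(f[7]), int(f[8]))


# Hypothetical records (not from the issue): one read split over two targets.
paf_lines = [
    "read1\t1000\t600\t1000\t-\tchrB\t50000\t2000\t2400\t390\t400\t60",
    "read1\t1000\t0\t600\t+\tchrA\t80000\t10000\t10600\t590\t600\t60",
]

# Sort by read name, then by where each hit starts on the read.
hits = sorted((parse_paf_line(l) for l in paf_lines),
              key=lambda h: (h.qname, h.qstart))

for h in hits:
    print(h.qname, h.qstart, h.qend, h.strand, h.tname, h.tstart, h.tend)
```

Because PAF states the query coordinates explicitly, no CIGAR or strand arithmetic is needed to recover the order along the read.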
I'm analyzing structural variations using Pysam. The structural variations arise because the DNA being sequenced is circularized. This means that when I map it back to the reference genome, I get a lot of soft-clipped reads across the circular junction and a lot of supplementary alignments. In some cases the SV might be complex, so when I extract the SA tag I can get several different chromosomal regions. As an example:
Extracting the soft-clipped reads for the region chr1:16626700-16627600,
I get the following tag:
['chr8,51869044,-,2097S539M9D340S,60,31', 'chr8,51869044,-,446S536M11D1994S,60,31', 'chr1,16626774,+,2530S357M3I86S,60,24', 'chr8,51892697,+,41S297M4I2634S,60,12', 'chr8,51892649,+,1661S328M10D987S,60,33', 'chr1,16626960,-,33S59M1D2884S,1,2']
So I was wondering: is the order of the entries in the tag actually the order in which the supplementary alignments occur along the read?
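Following the advice in the replies (use the clipping together with the strand, and the last clip for reverse-strand alignments), here is a minimal sketch that places each SA entry quoted above on the read and sorts by query start. It assumes the SA field layout `rname,pos,strand,CIGAR,mapQ,NM` from the SAM specification and that each CIGAR is written in reference-forward orientation. The helper name `query_interval` is hypothetical, not part of Pysam.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def query_interval(sa_entry):
    """Return (query_start, query_end, rname, pos, strand) for one SA entry.

    Coordinates are 0-based, end-exclusive, in the original read orientation.
    """
    rname, pos, strand, cigar, _mapq, _nm = sa_entry.rstrip(";").split(",")
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]

    # Clipped bases at the left end of the CIGAR.
    lead = 0
    for n, op in ops:
        if op not in "SH":
            break
        lead += n

    # Clipped bases at the right end of the CIGAR.
    trail = 0
    for n, op in reversed(ops):
        if op not in "SH":
            break
        trail += n

    # Operations that consume read bases inside the aligned part.
    aligned = sum(n for n, op in ops if op in "MI=X")

    # Reverse-strand CIGARs describe the reverse-complemented read,
    # so the last clip gives the offset from the start of the original read.
    start = lead if strand == "+" else trail
    return start, start + aligned, rname, int(pos), strand


sa_entries = [
    "chr8,51869044,-,2097S539M9D340S,60,31",
    "chr8,51869044,-,446S536M11D1994S,60,31",
    "chr1,16626774,+,2530S357M3I86S,60,24",
    "chr8,51892697,+,41S297M4I2634S,60,12",
    "chr8,51892649,+,1661S328M10D987S,60,33",
    "chr1,16626960,-,33S59M1D2884S,1,2",
]

ordered = sorted(query_interval(e) for e in sa_entries)
for start, end, rname, pos, strand in ordered:
    print(f"{start:>5}-{end:<5} {rname}:{pos} {strand}")
```

Under these assumptions, the segment from `41S297M4I2634S` comes first (query offset 41, that is, base 42 counting from 1) and the first entry in the tag comes second (340 to 879), so the order of the entries in the tag is not the order along the read.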