SAM Tag for structural variations #386

Closed
RAHenriksen opened this issue Apr 19, 2019 · 7 comments

@RAHenriksen

I'm analyzing structural variations using pysam. The structural variations arise because the DNA being sequenced is circularized, so when I map it back to the reference genome I get a lot of reads soft-clipped across the circular junction, and a lot of supplementary alignments. In some cases the SV might be complex, so when I extract the SA tag I can get several different chromosomal regions. As an example:
Extracting the soft-clipped reads for the region chr1:16626700-16627600,
I get the following tag:
['chr8,51869044,-,2097S539M9D340S,60,31', 'chr8,51869044,-,446S536M11D1994S,60,31', 'chr1,16626774,+,2530S357M3I86S,60,24', 'chr8,51892697,+,41S297M4I2634S,60,12', 'chr8,51892649,+,1661S328M10D987S,60,33', 'chr1,16626960,-,33S59M1D2884S,1,2']

So I was wondering: is the order of the entries in the tag actually the order in which the supplementary alignments occur along the read?

@armintoepfer
Contributor

You can determine the order by parsing the cigar of each supplementary alignment.

@RAHenriksen
Author

RAHenriksen commented Apr 19, 2019

I'm parsing the CIGAR in order to get the alignment length, but I'm not sure how that gives me the order. The problem is that if it's one read with several supplementary alignments, I don't see how the length helps. Could you elaborate?

I'm analyzing circular DNA, which can contain fragments of several chromosomal regions. Since I'm using long reads, a single read might span several different chromosomal regions, yielding several supplementary alignments. Many thanks for your time and help.

@armintoepfer
Contributor

The soft clips, together with the strand information, tell you exactly where in the query read each supplementary alignment starts aligning to the reference.
Example:

41S297M4I2634S

This one starts at query base 42.
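A minimal Python sketch of that rule, assuming each CIGAR has at most one clip operation (S or H) at each end. `query_interval` is a hypothetical helper, not part of pysam or minimap2. It returns a 0-based, half-open interval on the read as sequenced; for '-' strand entries it uses the trailing clip, because SAM CIGARs are written in reference orientation.

```python
import re

# A CIGAR is a run of <length><op> pairs, e.g. "41S297M4I2634S".
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def query_interval(cigar: str, strand: str) -> tuple[int, int]:
    """Return the 0-based, half-open [start, end) span an alignment covers
    on the read as sequenced.

    Assumes at most one clip (S or H) at each end of the CIGAR.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    lead = ops[0][0] if ops[0][1] in "SH" else 0     # clip before the aligned block
    trail = ops[-1][0] if ops[-1][1] in "SH" else 0  # clip after the aligned block
    # Only M, I, = and X consume query bases inside the aligned block (D/N do not).
    aligned = sum(n for n, op in ops if op in "MI=X")
    # The CIGAR is in reference orientation: on '-' the trailing clip comes first on the read.
    start = lead if strand == "+" else trail
    return start, start + aligned

print(query_interval("41S297M4I2634S", "+"))  # (41, 342): starts at query base 42 (1-based)
```

Offset 41 in 0-based coordinates is query base 42 in 1-based terms, matching the example above.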

@RAHenriksen
Author

As far as I understood, for a supplementary alignment the fields are
chromosome | start | strand | cigar ...
so for a supplementary alignment the start position is already given. But what you're describing makes sense.
But then, in the original example, if we only look at the negative strand, the tag entries for the supplementary alignments would be
['chr8,51869044,-,2097S539M9D340S,60,31', 'chr8,51869044,-,446S536M11D1994S,60,31', 'chr1,16626960,-,33S59M1D2884S,1,2']
meaning there are supplementary alignments to several chromosomal regions.

And then the order of this one read's supplementary alignments would be
'chr8,51869044,-,446S... => 'chr8,51869044,-,2097S... => 'chr1,16626960,-,33S

Or have I misunderstood this?

Because how do I know the order isn't like this instead?
'chr1,16626960,-,33S => 'chr8,51869044,-,446S.. => 'chr8,51869044,-,2097S...

lh3 added the question label Apr 19, 2019
@lh3
Owner

lh3 commented Apr 19, 2019

CIGARs are computed with respect to the reference genome, not with respect to the read. For reads mapped to the reverse strand, you need to look at the last clipping.

You'd better use PAF output.
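Applying the "last clipping for the reverse strand" rule to every entry gives the order along the read. The sketch below assumes the raw tag value is the ';'-separated string of `rname,pos,strand,CIGAR,mapQ,NM` entries described in the SAM spec (for example, as returned by pysam's `AlignedSegment.get_tag("SA")`). `read_start` and `order_sa` are hypothetical helper names.

```python
import re

# Length + op code pairs, e.g. "2097S539M9D340S" -> [("2097", "S"), ("539", "M"), ...]
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def read_start(cigar: str, strand: str) -> int:
    """0-based offset on the original read where this alignment begins.

    SA CIGARs are in reference orientation, so for '-' the clip that comes
    first on the read is the LAST operation of the CIGAR.
    """
    ops = CIGAR_RE.findall(cigar)
    length, op = ops[0] if strand == "+" else ops[-1]
    return int(length) if op in "SH" else 0

def order_sa(sa_tag: str) -> list[str]:
    """Sort the entries of a raw SA tag value by their start on the read."""
    # Entries are 'rname,pos,strand,CIGAR,mapQ,NM', separated (and terminated) by ';'.
    entries = [e for e in sa_tag.split(";") if e]
    return sorted(entries, key=lambda e: read_start(e.split(",")[3], e.split(",")[2]))

# The SA value from the question, rejoined into the raw ';'-separated form.
SA = ("chr8,51869044,-,2097S539M9D340S,60,31;chr8,51869044,-,446S536M11D1994S,60,31;"
      "chr1,16626774,+,2530S357M3I86S,60,24;chr8,51892697,+,41S297M4I2634S,60,12;"
      "chr8,51892649,+,1661S328M10D987S,60,33;chr1,16626960,-,33S59M1D2884S,1,2;")

for entry in order_sa(SA):
    rname, pos, strand, cigar = entry.split(",")[:4]
    print(read_start(cigar, strand), rname, pos, strand, cigar)
```

For this tag the read starts come out as 41, 340, 1661, 1994, 2530 and 2884, so the order in the tag is not the order along the read. Among the three reverse-strand entries, 2097S539M9D340S (offset 340) comes before 446S536M11D1994S (offset 1994) and 33S59M1D2884S (offset 2884). The record carrying the tag is not listed in its own SA tag, so its alignment has to be added with the same rule to see the full chain.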

lh3 closed this as completed Apr 19, 2019
@RAHenriksen
Author

Okay, thanks. But I'm still not sure how this answers my question about the order of the "SA" tag for a specific read. As you say, the CIGAR is computed with respect to the reference. Should I then just assume, regardless of strand, that the order of alignment is as written in the extracted tag?

@lh3
Owner

lh3 commented Apr 19, 2019

As I said, use PAF. You can learn how SAM encodes the same information by comparing PAF and SAM.
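One way to do that comparison is to derive PAF-style coordinates from an SA entry and set them next to the matching line of minimap2's PAF output. This is a hypothetical sketch (`sa_to_paf_fields` is not a library function). It assumes PAF's conventions that query start/end are 0-based on the read as sequenced regardless of strand, and that target start/end are 0-based. Fields an SA entry does not encode, such as target length (PAF column 7) and the residue-match count (column 10), are omitted.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def sa_to_paf_fields(entry: str) -> dict:
    """Derive PAF-style coordinates from one SA entry (rname,pos,strand,CIGAR,mapQ,NM).

    Assumes at most one clip (S or H) at each end of the CIGAR.
    """
    rname, pos, strand, cigar, mapq, _nm = entry.split(",")
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    lead = ops[0][0] if ops[0][1] in "SH" else 0
    trail = ops[-1][0] if ops[-1][1] in "SH" else 0
    qlen = sum(n for n, op in ops if op in "MIS=XH")  # clips count toward the full read length
    q_aln = sum(n for n, op in ops if op in "MI=X")   # query bases in the aligned block
    t_aln = sum(n for n, op in ops if op in "MDN=X")  # reference bases spanned
    qstart = lead if strand == "+" else trail         # read-orientation start, as in PAF
    tstart = int(pos) - 1                             # SAM POS is 1-based; PAF is 0-based
    return {"qlen": qlen, "qstart": qstart, "qend": qstart + q_aln, "strand": strand,
            "tname": rname, "tstart": tstart, "tend": tstart + t_aln, "mapq": int(mapq)}

print(sa_to_paf_fields("chr1,16626960,-,33S59M1D2884S,1,2"))
```

As a sanity check, every entry in the tag from the question yields `qlen` 2976, consistent with all six coming from the same read.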
