SAM Tag for structural variations #386

RAHenriksen · 2019-04-19T11:40:02Z

I'm analyzing structural variations using pysam. The structural variations arise because the DNA being sequenced is circularized. When I map the reads back to the reference genome, I get a lot of soft-clipped reads across the circular junction and a lot of supplementary alignments. In some cases the SV might be complex, so when I extract the SA tag I can get several different chromosomal regions. As an example, extracting the soft-clipped reads for the region chr1:16626700-16627600,
I get the following tag:
['chr8,51869044,-,2097S539M9D340S,60,31', 'chr8,51869044,-,446S536M11D1994S,60,31', 'chr1,16626774,+,2530S357M3I86S,60,24', 'chr8,51892697,+,41S297M4I2634S,60,12', 'chr8,51892649,+,1661S328M10D987S,60,33', 'chr1,16626960,-,33S59M1D2884S,1,2']

So I was wondering: is the order of the entries in the tag actually the order in which the supplementary alignments align?

armintoepfer · 2019-04-19T11:56:15Z

You can determine the order by parsing the cigar of each supplementary alignment.

RAHenriksen · 2019-04-19T11:58:12Z

I'm parsing the CIGAR in order to get the length, but I'm not sure how that gives me the order. The problem is that if it's one read with several supplementary alignments, I'm not sure how the length helps. Could you elaborate?

I'm analyzing circular DNA, which can contain fragments of several chromosomal regions. Since I'm using long reads, these reads might span several different chromosomal regions, thus yielding several supplementary alignments. Many thanks for your time and help.

armintoepfer · 2019-04-19T14:06:42Z

The soft clips, coupled with the strand information, tell you exactly where in the query read each supplementary alignment starts aligning to the reference.
Example:

41S297M4I2634S

This one starts at query base 42.
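
The arithmetic behind that statement can be sketched as follows. This is a minimal illustration, not part of pysam or minimap2; `leading_clip` is a hypothetical helper name. It covers the forward-strand case only: the entry this CIGAR comes from in the original example is on the + strand.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def leading_clip(cigar: str) -> int:
    """Return the number of clipped bases (S or H) at the start of a CIGAR."""
    clipped = 0
    for length, op in CIGAR_OP.findall(cigar):
        if op not in "SH":
            break
        clipped += int(length)
    return clipped


cigar = "41S297M4I2634S"
# 41 bases are clipped, so the first aligned base is number 42 (1-based).
first_aligned_base = leading_clip(cigar) + 1
print(first_aligned_base)  # 42
```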

RAHenriksen · 2019-04-19T14:28:56Z

As far as I understood, when you have a supplementary alignment, the fields would be
chromosome | start | strand | cigar ...
so if it's a supplementary alignment the start position is already given. But what you're describing makes sense.
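
For reference, each SA entry carries six comma-separated fields (reference name, 1-based position, strand, CIGAR, mapping quality, edit distance), and the position is a reference coordinate, not a position in the read. A minimal sketch of splitting an entry into named fields, where `SAEntry`, `parse_sa_entry` and `parse_sa_tag` are hypothetical names chosen for illustration:

```python
from typing import List, NamedTuple


class SAEntry(NamedTuple):
    rname: str   # reference (chromosome) name
    pos: int     # 1-based leftmost position on the reference
    strand: str  # "+" or "-"
    cigar: str
    mapq: int
    nm: int      # edit distance


def parse_sa_entry(entry: str) -> SAEntry:
    """Split one SA entry such as 'chr8,51892697,+,41S297M4I2634S,60,12'."""
    rname, pos, strand, cigar, mapq, nm = entry.split(",")
    return SAEntry(rname, int(pos), strand, cigar, int(mapq), int(nm))


def parse_sa_tag(tag: str) -> List[SAEntry]:
    """Parse a raw SA tag value, in which entries are separated by ';'."""
    return [parse_sa_entry(e) for e in tag.split(";") if e]


entry = parse_sa_entry("chr8,51892697,+,41S297M4I2634S,60,12")
print(entry.rname, entry.pos, entry.strand, entry.cigar)
```

The list shown in the original post looks like the raw tag already split on `;`, so `parse_sa_entry` can be applied to each element directly.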
But then, as in the original example, if we only look at the negative strand, the tag for the supplementary alignments would be
['chr8,51869044,-,2097S539M9D340S,60,31', 'chr8,51869044,-,446S536M11D1994S,60,31', 'chr1,16626960,-,33S59M1D2884S,1,2']
Meaning there are supplementary alignments for several chromosomal regions.

And then the order of this one read's supplementary alignments would be
'chr8,51869044,-,446S... => 'chr8,51869044,-,2097S... => 'chr1,16626960,-,33S

Or have I understood this incorrectly?

Because how do I know the order isn't like this instead?
'chr1,16626960,-,33S => 'chr8,51869044,-,446S.. => 'chr8,51869044,-,2097S...

lh3 · 2019-04-19T14:39:28Z

CIGARs are computed with respect to the reference genome, not with respect to the reads. For reads mapped to the reverse strand, you need to look at the last clipping.
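
A minimal sketch of that rule, applied to the six SA entries from the original post. The helper name `query_interval` is hypothetical. The sketch assumes that clipped bases (S or H) at the start of the CIGAR give the read offset for + strand entries and clipped bases at the end give it for - strand entries, and that only M, I, = and X consume read bases. Coordinates are 0-based offsets in the read as sequenced.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def query_interval(cigar: str, strand: str):
    """Return (start, end) of the aligned segment in original read coordinates."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    lead = 0
    for n, op in ops:                  # clipping at the start of the CIGAR
        if op not in "SH":
            break
        lead += n
    trail = 0
    for n, op in reversed(ops):        # clipping at the end of the CIGAR
        if op not in "SH":
            break
        trail += n
    aligned = sum(n for n, op in ops if op in "MI=X")
    # Reverse-strand CIGARs run backwards relative to the read,
    # so the last clipping is the offset from the read's start.
    start = lead if strand == "+" else trail
    return start, start + aligned


sa_entries = [
    "chr8,51869044,-,2097S539M9D340S,60,31",
    "chr8,51869044,-,446S536M11D1994S,60,31",
    "chr1,16626774,+,2530S357M3I86S,60,24",
    "chr8,51892697,+,41S297M4I2634S,60,12",
    "chr8,51892649,+,1661S328M10D987S,60,33",
    "chr1,16626960,-,33S59M1D2884S,1,2",
]


def sort_key(entry: str):
    fields = entry.split(",")
    return query_interval(fields[3], fields[2])


ordered = sorted(sa_entries, key=sort_key)
for entry in ordered:
    print(sort_key(entry), entry)
# Query starts, in order: 41, 340, 1661, 1994, 2530, 2884
```

Under these assumptions, the order of entries in the tag is not the order along the read; the order has to be derived from the clipping and the strand of each entry.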

You'd better use PAF output.

RAHenriksen · 2019-04-19T14:49:13Z

Okay, thanks. But I'm still not sure how this answers my question about the order of the "SA" tag for a specific read. As you say, the CIGAR is computed relative to the reference. Should I then just assume, regardless of strand, that the order of alignment is as written when extracting the tag?

lh3 · 2019-04-19T15:03:29Z

As I said, use PAF. You can learn how SAM encodes the same information by comparing PAF and SAM.
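
A minimal sketch of what using PAF could look like, assuming the standard twelve mandatory PAF columns, in which query start and end are given explicitly (0-based, on the original read strand), so no CIGAR arithmetic is needed. `PafRecord`, `parse_paf_line` and `order_along_query` are hypothetical names, and the file name in the usage comment is a placeholder.

```python
from typing import Iterable, List, NamedTuple


class PafRecord(NamedTuple):
    qname: str
    qlen: int
    qstart: int   # 0-based start on the query
    qend: int     # end on the query (exclusive)
    strand: str   # "+" or "-"
    tname: str
    tlen: int
    tstart: int
    tend: int
    matches: int
    block_len: int
    mapq: int


def parse_paf_line(line: str) -> PafRecord:
    """Parse the twelve mandatory columns; trailing optional tags are ignored."""
    f = line.rstrip("\n").split("\t")
    return PafRecord(f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
                     f[5], int(f[6]), int(f[7]), int(f[8]),
                     int(f[9]), int(f[10]), int(f[11]))


def order_along_query(lines: Iterable[str]) -> List[PafRecord]:
    """Sort alignments by read name, then by their start position on the read."""
    records = (parse_paf_line(line) for line in lines if line.strip())
    return sorted(records, key=lambda r: (r.qname, r.qstart))


# Usage (placeholder file name):
# with open("alignments.paf") as handle:
#     for rec in order_along_query(handle):
#         print(rec.qname, rec.qstart, rec.qend, rec.strand, rec.tname, rec.tstart)
```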

lh3 added the question label Apr 19, 2019

lh3 closed this as completed Apr 19, 2019
