Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Junction extract: different junction start/end sites detected using short and long reads #165

Open
ErminZ opened this issue Jul 20, 2022 · 1 comment

Comments

@ErminZ
Copy link

ErminZ commented Jul 20, 2022

Hello,

I have a question about junction extract, different junctions are identified between Regtools and NanoSplicer. Would you help me to understand why? Thank you!

The input file is the same BAM file generated by minimap2 for Nanopore data.

Regtools output file
ref 1180 3212 JUNC00000028 137 ? 1180 3212 255,0,0 2 99,51 0,1981
ref 1180 3245 JUNC00000017 658 ? 1180 3245 255,0,0 2 99,87 0,1978
ref 1180 6541 JUNC00000030 377 ? 1180 6541 255,0,0 2 99,71 0,5290
ref 1180 6608 JUNC00000020 2705 ? 1180 6608 255,0,0 2 99,141 0,5287
ref 1198 4144 JUNC00000033 66 ? 1198 4144 255,0,0 2 81,46 0,2900
ref 1207 5330 JUNC00000029 89 ? 1207 5330 255,0,0 2 72,134 0,3989
ref 1241 5191 JUNC00000117 1 ? 1241 5191 255,0,0 2 38,15 0,3935
ref 1252 6485 JUNC00000094 1 ? 1252 6485 255,0,0 2 18,18 0,5215
ref 1260 1433 JUNC00000042 1 ? 1260 1433 255,0,0 2 19,24 0,149
ref 1267 1710 JUNC00000095 2 ? 1267 1710 255,0,0 2 12,29 0,414
ref 1977 2486 JUNC00000163 3 ? 1977 2486 255,0,0 2 26,15 0,494
ref 2009 2518 JUNC00000168 3

From NanoSplicer
(1279, 6467) | 2681 (number of reads supporting the junction)
(1279, 3158) | 650
(1279, 6470) | 372
(1283, 6467) | 78

I found a read in this BAM file as an example:
5b273de4-fcdd-4c4a-91a3-0e6edd27e2cf 0 Ad-ED90A1.4.1.6_CMV-Wuhan_Stabilized_S-BGH-CMV-Luc 1178 60 32=1D37=1D31=5188N71=1D11=1D52=2D15=1I17=1I1=1X61=1X14=2D3=1I30=1I6=1I5=1D23=2I42=4I5=2D45=1D17=1X15=1D12=1D15=6S * 0 0

Based on the CIGAR, it should be a (1279, 6467). However, no junction (1279, 6467) in Regtools output junc file, is shown above. Would you help to point out what is causing this problem?

The Illumina data from the same sample also gives different junctions compared to Nanopore sequencing. That's why I want to understand why.
ref 1154 6614 JUNC00000057 4515 + 1154 6614 255,0,0 2 125,147 0,5313
ref 1166 3305 JUNC00000228 932 + 1166 3305 255,0,0 2 113,147 0,1992
ref 1177 6617 JUNC00000232 4481 + 1177 6617 255,0,0 2 102,147 0,5293
ref 1178 3308 JUNC00000083 256 + 1178 3308 255,0,0 2 101,147 0,1983
ref 1178 5343 JUNC00000234 226 + 1178 5343 255,0,0 2 101,147 0,4018
ref 1180 1803 JUNC00000237 13 + 1180 1803 255,0,0 2 99,122 0,501
ref 1180 4245 JUNC00000236 175 + 1180 4245 255,0,0 2 99,147 0,2918
ref 1180 5210 JUNC00000240 469 + 1180 5210 255,0,0 2 99,147 0,3883
ref 1182 6608 JUNC00000243 126 + 1182 6608 255,0,0 2 81,141 0,5285

Thank you in advance!

Best,
Ermin

@kcotto
Copy link
Contributor

kcotto commented Mar 23, 2023

I think this issue is related to how different tools represent junctions differently. For the Regtools output file you're referencing, which command did you run?

I'm pretty sure it's coming from junctions extract, which encodes some visualization information into the bed file but if you want to match coordinates to NanoSplicer, you could use the information to get there.

Alternatively, if you run junctions annotate, I think you'll get values similar to NanoSplicer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants