Start and end on miRNA paralogs #19
Comments
Hi,
Thanks for the question.
I would say that if the sequence maps exactly to different precursors that share the same mature miRNA, you can either use only one paralog and report the position on that one, or multiply the number of lines, one for each paralog. Use the attributes Parent and Name to give more information.
Does that make sense?
If not, can you give a specific example with sequences and numbers? That way it is easier to be on the same page.
Cheers
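
A minimal sketch of the "one line per paralog" option, assuming the read is located by exact string matching. The function name `locate_on_precursors` and the toy hairpins are made up for illustration; they are not real miRBase sequences and not part of mirtop.

```python
def locate_on_precursors(read, precursors):
    """Return (precursor_name, start, end) for every precursor containing the read.

    Coordinates are 1-based and inclusive, as in GFF columns 4 and 5.
    Only the first exact occurrence on each precursor is reported.
    """
    hits = []
    for name, hairpin in precursors.items():
        offset = hairpin.find(read)  # -1 when the read is absent
        if offset != -1:
            hits.append((name, offset + 1, offset + len(read)))
    return hits


# Made-up read and hairpins: the same read sits at a different
# offset on each toy precursor, and is absent from the third one.
read = "UCUUUGGUUAUCUAGCUGUAUGA"
toy_precursors = {
    "toy-mir-1": "GGA" + read + "GG",
    "toy-mir-2": "C" + read + "CC",
    "toy-mir-3": "AAAACCCCGGGGUUUU",
}

rows = locate_on_precursors(read, toy_precursors)
for name, start, end in rows:
    print(name, start, end)
```

Each returned tuple would become its own GFF line, so the start/end always refer to the precursor named on that line.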
… On May 22, 2018, at 2:03 PM, xbdr86 ***@***.***> wrote:
Hi! I have a question for the miRTop community.
How would you define the "precursor start/end" in the case of reads that can be assigned to paralogs (about 15% of described miRNAs have multiple copies with the exact same mature sequence)?
column4/5: start/end: precursor start/end as indicated by alignment tool
Hi @lpantano! Thanks for your fast response! As an example, I was thinking of the case I have been working on most recently, mir-9. This mature miRNA can arise from 3 different paralogs. It is extremely abundant in brain and therefore generates hundreds of 3' isomiRs. Interestingly, when the paralogs are studied separately, only one of them (paper coming soon, hopefully!) generates a 5' isomiR of functional importance (Tan et al. NAR 2014). I think an annotation system that assigned this 5' isomiR to every paralog could be misleading for future interpretations of the data. So far, in our custom program QuagmiR (https://github.com/Gu-Lab-RBL-NCI/QuagmiR/), we have annotated all miR-9 reads under the following naming structure: hsa-miR-9-5p-1-2-3 and hsa-miR-9-3p-1-2-3.
On the practical side, annotating each read under multiple gene locations would generate a significant amount of data duplication in the GFF file, although I don't see an easy way to deal with columns 1, 3, 4. Have a nice day!
Thank you for the example!
In the case of the 5' isomiR, you should certainly give only one parent: the one it comes from. What I do in my tool is report only one precursor when the match is perfect to more than one, because I am interested in the miRNA itself and not the parent.
I totally get your point about the increased redundancy of the GFF, although the mirtop code could handle this redundancy.
What we discussed some time ago was having another attribute to hold multiple parents. Ideally, Parent would be used for the representative precursor and the other attribute would list the rest. But we never came up with a name.
According to the original GFF3 format, Parent can have multiple values, separated with a ','. For instance: Parent hsa-miR-9-5p-1,hsa-miR-9-5p-2,hsa-miR-9-5p-3
That should be valid, what do you think?
However, in the case you describe, the 5' isomiR should have only one Parent, not all 3 of them, even if the 3' isomiRs do. Does that make sense?
Thanks!
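
A minimal sketch of how a multi-valued Parent could be read and written, assuming the standard GFF3 `tag=value;tag=value` syntax for column 9 with ',' between multiple values. The helper names are hypothetical, the mirtop GFF may use a different key/value separator, and URL-escaping of special characters is ignored here.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into {tag: [value, ...]}."""
    attrs = {}
    for field in column9.strip().split(";"):
        field = field.strip()
        if not field:
            continue  # tolerate a trailing ';'
        key, _, value = field.partition("=")
        attrs[key.strip()] = value.strip().split(",")
    return attrs


def format_attributes(attrs):
    """Inverse of parse_attributes: join values with ',' and tags with ';'."""
    return ";".join(f"{key}={','.join(values)}" for key, values in attrs.items())


column9 = "Name=hsa-miR-9-5p;Parent=hsa-mir-9-1,hsa-mir-9-2,hsa-mir-9-3"
attrs = parse_attributes(column9)
print(attrs["Parent"])

# A 5' isomiR known to come from a single paralog keeps a single Parent.
attrs["Parent"] = ["hsa-mir-9-1"]
print(format_attributes(attrs))
```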
… On May 23, 2018, at 10:07 AM, xbdr86 ***@***.***> wrote:
Hi @lpantano!
Thanks for your fast response!
As an example, I was thinking of the case I have been working on most recently, mir-9. This mature miRNA can arise from 3 different paralogs.
mir-9-1 (http://www.mirbase.org/cgi-bin/mirna_entry.pl?acc=MI0000466)
mir-9-2 (http://www.mirbase.org/cgi-bin/mirna_entry.pl?acc=MI0000467)
mir-9-3 (http://www.mirbase.org/cgi-bin/mirna_entry.pl?acc=MI0000468)
It is extremely abundant in brain and therefore generates hundreds of 3' isomiRs. Interestingly, when the paralogs are studied separately, only one of them (paper coming soon, hopefully!) generates a 5' isomiR of functional importance (Tan et al. NAR 2014). I think an annotation system that assigned this 5' isomiR to every paralog could be misleading for future interpretations of the data. So far, in our custom program QuagmiR (https://github.com/Gu-Lab-RBL-NCI/QuagmiR/), we have annotated all miR-9 reads under the following naming structure:
hsa-miR-9-5p-1-2-3
hsa-miR-9-3p-1-2-3
On the practical side, annotating each read under multiple gene locations would generate a significant amount of data duplication in the GFF file, although I don't see an easy way to deal with columns 1, 3, 4.
Have a nice day!
Hi! Yes, you are right, the issue of which parent pri-miRNA to assign is quite important for us. Do you think it might work to arbitrarily assign reads that can belong to multiple parents to paralog-1, indicating in the attributes that that particular sequence has, let's say, 3 paralogs, and to assign any read that can be uniquely mapped to one of the paralogs to the corresponding parent? For example:
Present the following reads in the GFF like this:
PS: Sorry for the long delay in my response; I missed the notification e-mail from GitHub.
Hi, no worries. I think it is better to name the other paralogs, in case some tool wants to do something with that information. I am happy to have another attribute with Let me know if that helps. Thanks for working on this!
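
A minimal sketch of the scheme discussed above: a representative precursor goes in Parent and the remaining paralogs are named in a second attribute. The attribute name `OtherParents` is a hypothetical placeholder, since the thread does not settle on one, and picking the first name in sorted order is just one arbitrary way to choose "paralog-1".

```python
def assign_parents(candidates, extra_tag="OtherParents"):
    """Pick one representative Parent and name the remaining paralogs.

    candidates: names of the precursors the read maps to exactly.
    extra_tag: placeholder attribute name (an assumption, not a mirtop tag).
    """
    if not candidates:
        raise ValueError("a read needs at least one candidate precursor")
    ordered = sorted(candidates)
    attrs = {"Parent": ordered[0]}
    if len(ordered) > 1:
        attrs[extra_tag] = ",".join(ordered[1:])
    return attrs


# Read matching all three mir-9 paralogs exactly.
print(assign_parents(["hsa-mir-9-2", "hsa-mir-9-1", "hsa-mir-9-3"]))

# Read that maps uniquely to one paralog: no extra attribute is added.
print(assign_parents(["hsa-mir-9-2"]))
```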
Hi,
Internally we favor reporting everything so that nothing is missed. Below are some illustrative examples; I picked some random sequences to illustrate the point. Example 1 shows that on 3 hairpins the 3p end of the isomiR differs by 1 nt from the annotated mature. But on one of the hairpins, for the same sequence, the 3p end differs by 2 nt (in the opposite direction) from the annotated mature. Example 2 shows a sequence that could come from 5 different precursors. Example 1: Example 2: Some other things to consider:
Thanks @lpantano @ThomasDesvignes @phillipeloher! I will take your suggestions into account! ;-)