Star or not star? #5
Comments
Nice idea! I think we can add the annotation, maybe not in the name itself, but as another label together with the variant information. In terms of file format, maybe we can start with something miRBase-compatible, and then add other naming with this kind of information.
I think this is a fair point to discuss, and my opinion is that star sequences, and in general all passenger-strand reads, have to be characterized too. Mind you, many such sequences are not annotated in miRBase. Whether a 2-fold or a 10-fold change is the threshold for calling a star is always going to be subjective, but before setting such a measure it is more important to think about how to name the passenger strands when the ratio does not support calling a star. We proposed "co-mature" as a designation, where by definition the 5' would be mature and the 3' co.
Great to have some input! For 1, indeed, many strands are missing in miRBase and other databases. I have personally updated my species of interest (some actinopterygian species) but haven't done it for any species for which I don't have sequencing data. There should for sure be encouragement to annotate all strands when possible (this makes me think of this paper about plants: do it right or not at all! http://onlinelibrary.wiley.com/doi/10.1002/bies.201600113/abstract). And maybe we should eventually think of a way to share up-to-date annotation files? For 2, that's where I think there's some thinking to be done:
My vision in short: I personally don't really care about the "star" symbol and don't pay attention to it. What I look for in my data (with all 5p and 3p strands annotated) is what is differentially expressed between my samples/conditions, no matter which strand the reads come from or whether that strand is the dominant one. Maybe the non-dominant strand is actually the one that makes the difference in my physiological situation, who knows? Then I look at the absolute abundance, the abundance of one strand compared to the complementary one, their isomiRs, their predicted targets, etc. Knowing that a given mature sequence is 'considered' the star sequence, or "co-mature", or not, won't change the way I analyze my sequencing data. I think it tries to bring non-robust functional information into a naming system that needs to be robust to anything.
I completely agree with Thomas's vision: it is better to have a robust naming system based on concrete and unchangeable miRNA characteristics. Other information, like the expression fold ratio between the two strands or whether a miR is canonical or not (using the Fromm notation), can be included and maintained in other complementary files or DBs (like MirGeneDB, miRGator, and others). If available, this info can be valuable during the analysis, provided that it is contextualised with the tissue, age, and conditions of the sample type.
Some of the available info can be used to access other databases like MirGeneDB by query composition (for example, to check the other members of the LET-7 family: http://mirgenedb.org/browse?org=hsa&query=LET-7 )
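To illustrate the query composition idea, here is a minimal sketch that builds a browse URL of the same shape as the link above. The helper name `mirgenedb_browse_url` is hypothetical, and the `org` and `query` parameters are simply taken from that link; I am not assuming anything else about how MirGeneDB handles queries.

```python
from urllib.parse import urlencode


def mirgenedb_browse_url(org: str, query: str) -> str:
    """Compose a MirGeneDB browse URL (hypothetical helper).

    The base URL and the parameter names 'org' and 'query' are copied
    from the link quoted in this thread, not from any official API docs.
    """
    base = "http://mirgenedb.org/browse"
    return f"{base}?{urlencode({'org': org, 'query': query})}"


if __name__ == "__main__":
    # Species code and family name as they appear in the link above.
    print(mirgenedb_browse_url("hsa", "LET-7"))
```

A tool following this naming scheme could emit such a link next to each annotated family so that the extra information stays in the external database rather than in the name.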
Here I'd like to open the discussion on the use of the "star" symbol.
Originally, when we thought that only one arm of the hairpin was functional, the star symbol (*) was used to convey that this strand was a non-functional by-product of the biogenesis of the functional miRNA. But the star strand denomination '*' is now not approved by any nomenclature consortium, including miRBase since April 2011 (Release 18), because many miRNA genes were then shown to produce mature miRNAs from both sides of the hairpin, and because the denomination can fluctuate with expression levels: it relies a lot on sequencing depth and on the nature of the studied tissue/stage/etc. In some cases of extreme differences in expression levels, however, this additional symbol can convey potentially useful functional information.
So, should we simply follow the gene nomenclature consortia and not support this symbol? Or try to find an agreement and define rules for using this symbol to make this symbol consistent and trustworthy?
For example, at what level of arm selection can we say that a strand is likely only a by-product? Fromm et al (2015) propose a one-fold change. To me this does not appear to be a strong enough difference to call the second strand the star strand, given that the complete expressed miRNAome of an organism is never fully represented and given the known sequence bias in miRNA-Seq library preparation. For instance, I would personally be more confident calling one strand the star strand with a 10-fold change and a good representation of tissue types in the organism considered.
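To make the threshold question concrete, here is a minimal sketch of how an arm-ratio rule could be applied to read counts. The function name `classify_arms`, the pseudocount of 1, and the default `min_fold=10.0` are assumptions for illustration only; none of this is an agreed rule or taken from any published pipeline.

```python
def classify_arms(count_5p, count_3p, min_fold=10.0, pseudocount=1.0):
    """Return (dominant_arm, fold, is_star_call) for one hairpin.

    count_5p / count_3p: read counts for each arm (ideally pooled over
    many tissues). min_fold is the hypothetical cutoff under discussion;
    pseudocount avoids division by zero when one arm has no reads.
    """
    a = count_5p + pseudocount
    b = count_3p + pseudocount
    if a >= b:
        dominant, fold = "5p", a / b
    else:
        dominant, fold = "3p", b / a
    # The non-dominant arm is called "star" only above the cutoff.
    return dominant, fold, fold >= min_fold


if __name__ == "__main__":
    # Same counts, two cutoffs: the call flips with the threshold chosen.
    print(classify_arms(199, 99, min_fold=2.0))
    print(classify_arms(199, 99, min_fold=10.0))
```

The same pair of counts gets a star call under a lenient cutoff and none under a strict one, which is exactly why the label depends so much on the chosen threshold and on sequencing depth.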
If you have any comments or propositions to try to clarify this situation, please participate!