Merge 'protein complex' term into macromolecular complex, rename 'protein-containing complex' (was: MP: GO:0005945 6-phosphofructokinase complex -> high level protein complex/macromolecular complex term) #12782
@dosumis Is this due to a pattern that isn't specific enough? |
The hierarchy now is transferase complex is_a catalytic complex is_a macromolecular complex. It's hard to see how to get protein complex into that hierarchy without also asserting that all catalytic complexes have only protein subunits, which seems dangerously restrictive. |
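To make the retrieval problem concrete, here is a minimal sketch of the is_a chain described above, written as a plain Python dictionary. The links from '6-phosphofructokinase complex' to 'transferase complex' and from 'protein complex' to 'macromolecular complex' are assumptions made for illustration, and `IS_A` and `ancestors` are hypothetical names, not part of any GO tooling.

```python
# Toy is_a graph; a sketch, not the real ontology.
IS_A = {
    # Assumed link, for illustration only.
    "6-phosphofructokinase complex": ["transferase complex"],
    # Chain stated in the comment above.
    "transferase complex": ["catalytic complex"],
    "catalytic complex": ["macromolecular complex"],
    # Assumed link: 'macromolecular complex' as the general parent.
    "protein complex": ["macromolecular complex"],
}


def ancestors(term, is_a=IS_A):
    """Collect every term reachable from `term` by following is_a links."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


pfk_ancestors = ancestors("6-phosphofructokinase complex")
# True when a query on 'protein complex' would retrieve the term.
found_under_protein_complex = "protein complex" in pfk_ancestors
```

Under these assumptions the term is reachable from 'macromolecular complex' but not from 'protein complex', which is the situation reported in this ticket.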
Yes. But certainly many of the current children of macromolecular complex are protein complexes. It seems they were misclassified en masse at some point. |
This issue comes up all the time! As soon as one child has a non-protein member the whole branch gets moved to macromolecular complex and can't be found under protein complex. We widened the def for protein complexes but only to include prosthetic groups. See the latest edits here, this might explain some of the problems: I'm afraid we are going round in circles here, folks, and it needs sorting :( I've been banging my head against the wall over this for the past 3+ years... @dosumis @paolaroncaglia @mcourtot Birgit |
I still like my crazy suggestion in this ticket: I have a bigger issue... many people use the "protein complex" term, and would expect that to retrieve complexes like the ribosome, the spliceosome and (I suspect) telomerase. Is it possible to define a protein complex as a complex which has only proteins, or protein and RNA components? so Would that be crazy? Then everything can go under protein complex, unless we know that it has an RNA component, in which case it moves down... It might not be possible, but to me it's similar to saying that a glycoprotein is_a protein... |
Indeed. As far as I'm concerned, we've already agreed that 'macromolecular complex' is the general term. We defined it as having at least one protein component and it has the synonym 'protein containing complex'. All complex classes defined entirely by activity are now under 'macromolecular complex': I wasn't entirely sure this was the best solution so delayed committing and asked for feedback on it at the time (see ticket). The obvious way to implement Val's solution would be to obsolete the current 'protein complex' term and rename macromolecular complex to protein complex. If we do this we would be unable to distinguish complexes consisting only of proteins (+ prosthetic groups & covalent modifications; see current protein complex def) from RNPs etc. Is everyone happy with that? |
It's often borderline and down to personal interpretation if something is a prosthetic group or a 'full blown component'... And looking at how users interpret the terms, they expect to retrieve everything that's currently under macromolecular complex when they query on protein complex. But in one of the previous tickets there was hesitation about doing away with the 'protein-only protein complex' class. Birgit |
Could be done as a merge. The def of macromolecular complex would win. Might cause complaints downstream though if it keeps its ID but gets the name 'protein complex'. |
Complaints from users or scripts? |
Would it help, at least in part, to swap primary name and synonym for 'macromolecular complex'? I.e. name it 'protein-containing complex' (and keep 'macromolecular complex' as an exact synonym) |
From consuming databases (see recent complaints from FlyBase). This is just a matter of strategy though. I think the most important thing is answering this question:
|
Coming back to Birgit's last comment, "prosthetic group" can be defined so that it's not a borderline personal interpretation. It's a molecule that is not encoded directly or indirectly in the genome (i.e., not DNA, RNA, protein) that is associated with a protein and required for the enzymatic activity of the protein or the complex of which the protein is a part (Devlin Biochemistry, 4th edition, page 414). Stryer just says "non-protein", but that definition was clearly composed before the significance of ribozymes was understood, so I think we are allowed to ignore it. Devlin then distinguishes cofactors and prosthetic groups by the strength of their association with the protein - loose / low-affinity for cofactors and tight / high-affinity / possibly covalent for prosthetic groups - but that subdistinction doesn't matter here. On the Reactome definition of complex, any association of two or more molecules, at least one of which is a protein, is a complex. (Does GO require two or more polypeptides? - I think so.) But we all agree that a complex composed entirely of polypeptides can be distinguished from a complex composed of polypeptides and other stuff, be that stuff encoded (proteins, peptides, RNAs, etc.) or unencoded (heme, biotin, etc.). Which still doesn't resolve the issue of whether it's useful to distinguish purely protein complexes from protein + other stuff ones. |
Tbh, we can't do this now either as many classes under macromolecular complex contain protein-only leaves but as they have sibling terms that do contain non-protein members the whole class has been re-classified. If we wanted to be true to any definition of protein-only complexes we'd have to sieve through all the leaves and add in the protein complex parent manually. That ain't gonna happen, is it? |
If the distinction between protein-only and protein-mixed complexes were discarded ("Val's crazy suggestion" or something close to it) information would be lost but, I think, the problem in this thread would go away. So, who uses the information captured by this distinction? How would they be hurt by the loss? |
That is the crucial point, Peter. At the moment, the mixed parentage is definitely causing issues for the users. Should we send a message to GO-discuss and GO-friends and ask what would work better? |
@ValWood I think we need to change the title so we can find the ticket again in the future as it has little to do with the actual 6-phosphofructokinase complex :( |
@ukemi 's suggestion would work for me :) |
In general, I think it is more harmful to have people retrieve "protein complex" and not get ribosome, spliceosome, telomerase (the historical and current situation) than it is to lose the distinction between a protein complex and a macromolecular complex. I vote to discard the distinction between protein-only complex and protein-mixed complex. It's a simplification that I'm sure would HELP users. |
What about changing the term name macromolecular complex to protein-containing complex? It fits the definition. Could we change the name of protein complex to make it more explicit that it only contains proteins? |
but do we really need the distinction? would users be hurt by not making this distinction? (I think not). we could even still have protein-RNA complex and protein-DNA complex. So if a user really, really did want to exclude ribosomes, telomerase, spliceosomes, DNA polymerase, MCM complex |
Although the DNA-protein complexes (at least GO:0043599 nuclear DNA replication factor C complex) appear to have is_a links to both protein complex and protein-DNA complex. So if that is valid, this would be an alternative solution... but it seems a bit wrong? |
Val, I think these inconsistencies stem from the issues with TPVs: If a pre-terminal node has children that are a mix of protein-only complexes and protein-X complexes, the pre-terminal node belongs to protein-X complex but the terminal nodes may have a mix of both ancestries. That's what happened with the endonucleases :( |
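The classification rule described here can be sketched in a few lines of Python. This is only an illustration of the reasoning, not how the GO reasoner is implemented; `classify_parent` and the child names are hypothetical.

```python
# Hypothetical labels for the two competing grouping classes.
PROTEIN_ONLY = "protein complex"
MIXED = "macromolecular complex"


def classify_parent(children_protein_only):
    """Pick the grouping class that holds for every child of a node.

    `children_protein_only` maps each child term to True when that child
    consists only of proteins. The parent may sit under the protein-only
    class only if all of its children do.
    """
    return PROTEIN_ONLY if all(children_protein_only.values()) else MIXED


# Made-up children: one protein-only, one with an RNA component.
endonuclease_children = {
    "protein-only endonuclease complex": True,
    "RNA-containing endonuclease complex": False,
}
parent_class = classify_parent(endonuclease_children)
```

A single non-protein child is enough to push the parent under the broader class, so any protein-only sibling that inherits only through that parent is no longer found under 'protein complex' unless the link is asserted by hand.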
I thought this was already the case. Can you check. If it's not, file a On 9 Nov 2016, at 4:59, David Hill wrote:
|
@cmungall committed, but not deployed. |
Just to recap for Thursday's editors' call and the discussion above. |
@bmeldal Or were you talking about 'complex involved in process'? This is clearly a dangerous path.... Thanks, Pascale |
I could merge x biogenesis into assembly for those, not sure that this is what the annotations were trying to capture. Looks rather like regulation of expression. |
I haven't used the biogenesis terms so better ask those who have. If it's just translation then there shouldn't be any specific x protein biogenesis terms unless something special happens :) |
historically biogenesis terms were created for some processes when they knew that the production of something was affected but were not sure whether it was the transcription, translation, assembly etc. The only strong case for keeping is "ribosome biogenesis" which researchers use to include rRNA processing, assembly and export from the nucleus because some of the steps don't appear to be separable (at least currently). |
Well, I suppose that 'biogenesis' of a protein could mean other things besides translation depending on what you were referring to (ie, posttranslational events) |
But then we have 'protein modification.... ' |
which, again, depending on what protein form you are referring to, would be included in biogenesis. It's a fairly broad grouping term. |
Yes, it's historic; we shouldn't need them. If you can't be sure which process, don't make the annotation... |
WooHoo. |
Thanks @pgaudet for taking this on. It was a very complicated merge/rename. |
No problem! I created 3 new tickets for the outstanding issues. Closing this one. |
Thank you everyone!!! I feel like celebrating! |
We just had an IntAct/CP release but I will tweet about these changes next week. Leaving our release tweets on top of the news feed for a few days. |
95 comments! |
No objections from here. We annotate the assembly of a complex to capture distinct functions mediated by the complex at various stages of its assembly, or to capture interactions with other physical entities that affect distinct steps of the assembly process. We treat the assembly process as part of whatever process the complex itself mediates, not as a distinct process in its own right, so these changes in GO should not affect us.
As I can't see the changes until they go public:
|
Yes and yes |
does not have the parent protein complex