Merge 'protein complex' term into macromolecular complex, rename 'protein-containing complex' (was: MP: GO:0005945 6-phosphofructokinase complex -> high level protein complex/macromolecular complex term) #12782

Closed
ValWood opened this issue Nov 7, 2016 · 99 comments · Fixed by #15170

Comments

@ValWood
Contributor

ValWood commented Nov 7, 2016

does not have the parent protein complex

@ukemi
Contributor

ukemi commented Nov 7, 2016

@dosumis Is this due to a pattern that isn't specific enough?

@deustp01

deustp01 commented Nov 7, 2016

The hierarchy now is transferase complex is_a catalytic complex is_a macromolecular complex. It's hard to see how to get protein complex into that hierarchy without also asserting that all catalytic complexes have only protein subunits, which seems dangerously restrictive.

@ukemi
Contributor

ukemi commented Nov 8, 2016

Yes. But certainly many of the current children of macromolecular complex are protein complexes. It seems they were mis-classified en masse at some point.

@bmeldal

bmeldal commented Nov 8, 2016

This issue comes up all the time! As soon as one child has a non-protein member the whole branch gets moved to macromolecular complex and can't be found under protein complex. We widened the def for protein complexes but only to include prosthetic groups.

See the latest edits here, this might explain some of the problems:
#12574 moved the whole branch of catalytic complex out of protein complex because some children of endoribonuclease complex are ribonucleoprotein complexes. (start halfway down with Val's comment on 11/8/16)
#12620 made more generic changes.

I'm afraid, we are going round in circles here, folks, and it needs sorting :( I've been banging my head against the wall over this for the past 3+ years...

@dosumis @paolaroncaglia @mcourtot

Birgit

@ValWood
Contributor Author

ValWood commented Nov 8, 2016

I still like my crazy suggestion in this ticket

I have a bigger issue......many people use the "protein complex" term, and would expect that to retrieve complexes like the ribosome and the spliceosome and telomerase (I suspect)

Is it possible to define a protein complex as a complex which has only proteins, or protein and RNA components?

so
protein complex
--ribonucleoprotein complex

Would that be crazy? then everything can go under protein complex, unless we know that it has an RNA component, then it moves down...

..it might not be possible but to me it's similar to saying that a glycoprotein is_a protein....

@dosumis
Contributor

dosumis commented Nov 8, 2016

"I'm afraid, we are going round in circles here, folks, and it needs sorting :( I've been banging my head against the wall over this for the past 3+ years..."

Indeed.

As far as I'm concerned, we've already agreed that 'macromolecular complex' is the general term. We defined it as having at least one protein component and it has the synonym 'protein containing complex'.

All complex classes defined entirely by activity are now under 'macromolecular complex' (#12620). This ensures proper classification where some complexes with an activity are protein (only) complexes and some are RNPs.

I wasn't entirely sure this was the best solution so delayed committing and asked for feedback on it at the time (see ticket).

The obvious way to implement Val's solution would be to obsolete the current 'protein complex' term and rename macromolecular complex to protein complex. If we do this we would be unable to distinguish complexes consisting only of proteins (+ prosthetic groups & covalent modifications; see current protein complex def) from RNPs etc. Is everyone happy with that?

@bmeldal

bmeldal commented Nov 8, 2016

"The obvious way to implement Val's solution would be to obsolete the current 'protein complex' term and rename macromolecular complex to protein complex. If we do this we would be unable to distinguish complexes consisting only of proteins (+ prosthetic groups & covalent modifications; see current protein complex def) from RNPs etc. Is everyone happy with that?"

It's often borderline and down to personal interpretation if something is a prosthetic group or a 'full blown component'... And looking at how users interpret the terms, they expect to retrieve everything that's currently under macromolecular complex when they query on protein complex. But in one of the previous tickets there was hesitation about doing away with the 'protein-only protein complex' class.

Birgit

@dosumis
Contributor

dosumis commented Nov 8, 2016

"obsolete the current 'protein complex' term and rename macromolecular complex to protein complex"

Could be done as a merge. The def of macromolecular complex would win. Might cause complaints downstream though if it keeps its ID but gets the name 'protein complex'.
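
A minimal sketch of what such a merge could mean for a consuming database, assuming one ID survives and the other is kept only as a secondary ID pointing at it. The IDs, gene names, and the `remap` helper are all hypothetical placeholders, not real GO accessions or tooling.

```python
# Placeholder IDs for illustration only; these are NOT real GO accessions.
SURVIVING_ID = "EX:0000001"  # stands in for 'macromolecular complex'
MERGED_ID = "EX:0000002"     # stands in for the old 'protein complex'

# After a merge, the retired ID is treated as a secondary ID of the survivor.
alt_ids = {MERGED_ID: SURVIVING_ID}


def remap(annotations, alt_ids):
    """Rewrite (gene_product, term_id) pairs so merged IDs point at the surviving term.

    Duplicates created by the remapping are collapsed.
    """
    return sorted({(gp, alt_ids.get(term, term)) for gp, term in annotations})


# Hypothetical annotations made before the merge.
old = [("geneA", MERGED_ID), ("geneB", SURVIVING_ID), ("geneA", SURVIVING_ID)]
print(remap(old, alt_ids))
```

The surviving ID does not change, so a consumer that keys on IDs keeps working; only the label attached to that ID would differ, which is the downstream concern raised here.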

@bmeldal

bmeldal commented Nov 8, 2016

Complaints from users or scripts?

@paolaroncaglia
Collaborator

Would it help, at least in part, to swap primary name and synonym for 'macromolecular complex'? I.e. name it 'protein-containing complex' (and keep 'macromolecular complex' as an exact synonym)

@dosumis
Contributor

dosumis commented Nov 8, 2016

"Complaints from users or scripts?"

From consuming databases (see recent complaints from FlyBase). This is just a matter of strategy though. I think the most important thing is answering this question:

"If we do this we would be unable to distinguish complexes consisting only of proteins (+ prosthetic groups & covalent modifications; see current protein complex def) from RNPs etc. Is everyone happy with that?"

@deustp01

deustp01 commented Nov 8, 2016

Coming back to Birgit's last comment, "prosthetic group" can be defined so that it's not a borderline personal interpretation. It's a molecule that is not encoded directly or indirectly in the genome (i.e., not DNA, RNA, protein) that is associated with a protein and required for the enzymatic activity of the protein or the complex of which the protein is a part (Devlin Biochemistry, 4th edition, page 414). Stryer just says "non-protein", but that definition was clearly composed before the significance of ribozymes was understood, so I think we are allowed to ignore it.

Devlin then distinguishes cofactors and prosthetic groups by the strength of their association with the protein - loose / low-affinity for cofactors and tight / high-affinity / possibly covalent for prosthetic groups - but that subdistinction doesn't matter here.

Under the Reactome definition of complex, any association of two or more molecules, at least one of which is a protein, is a complex. (Does GO require two or more polypeptides? - I think so.) But we all agree that a complex composed entirely of polypeptides can be distinguished from a complex composed of polypeptides and other stuff, be that stuff encoded proteins, peptides, RNAs, etc. or unencoded heme, biotin, etc.

Which still doesn't resolve the issue whether it's useful to distinguish purely protein complexes from protein + other stuff ones.

@bmeldal

bmeldal commented Nov 8, 2016

"If we do this we would be unable to distinguish complexes consisting only of proteins (+ prosthetic groups & covalent modifications; see current protein complex def) from RNPs etc. Is everyone happy with that?"

Tbh, we can't do this now either as many classes under macromolecular complex contain protein-only leaves but as they have sibling terms that do contain non-protein members the whole class has been re-classified. If we wanted to be true to any definition of protein-only complexes we'd have to sieve through all the leaves and add in the protein complex parent manually. That ain't gonna happen, is it?

@deustp01

deustp01 commented Nov 8, 2016

If the distinction between protein-only and protein-mixed complexes were discarded ("Val's crazy suggestion" or something close to it) information would be lost but, I think, the problem in this thread would go away. So, who uses the information captured by this distinction? How would they be hurt by the loss?

@bmeldal

bmeldal commented Nov 9, 2016

That is the crucial point, Peter. At the moment, the mixed parentage is definitely causing issues for the users. Should we send a message to GO-discuss and GO-friends and ask what would work better?

@bmeldal

bmeldal commented Nov 9, 2016

@ValWood I think we need to change the title so we can find the ticket again in the future as it has little to do with the actual 6-phosphofructokinase complex :(

@ukemi
Contributor

ukemi commented Nov 9, 2016

This request stemmed from the Noctua workshop. Maybe one practical solution for now is to have @kltm or @cmungall make macromolecular complexes valid as entities in the complex generator in Noctua. That way they can be chosen if they have protein components.

@ValWood ValWood changed the title from "MP: GO:0005945 6-phosphofructokinase complex" to "MP: GO:0005945 6-phosphofructokinase complex -> high level protein complex/macromolecular complex term" Nov 9, 2016

@srengel

srengel commented Nov 9, 2016

@ukemi 's suggestion would work for me :)

@ukemi ukemi self-assigned this Nov 11, 2016
@ValWood
Contributor Author

ValWood commented Nov 11, 2016

In general, I think it is more harmful to have people retrieve "protein complex" and not get ribosome, spliceosome, telomerase (the historical and current situation) than it would be to lose the distinction between a protein complex and a macromolecular complex.
I often need to tell people to go up to macromolecular complex, and I often forget myself and search on "protein complex" by mistake.

I vote to discard the distinction between protein-only complex and protein-mixed complex. It's a simplification that I'm sure would HELP users.

@ukemi
Contributor

ukemi commented Nov 11, 2016

What about changing the term name macromolecular complex to protein-containing complex? It fits the definition. Could we change the name of protein complex to make it more explicit that it only contains proteins?

@ValWood
Contributor Author

ValWood commented Nov 11, 2016

but do we really need the distinction? would users be hurt by not making this distinction? (I think not).

we could even still have protein-RNA complex and protein-DNA complex. So if a user really, really did want to exclude ribosomes, telomerase, spliceosomes, DNA polymerase, MCM complex (which would be excluded from "protein complex" in the current scenario), they could take the "protein containing complex" annotations and subtract the "protein-DNA" and "protein-RNA" complex annotations....
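
A minimal sketch of that subtraction, assuming each term's annotations are available as a set of gene products (including those inherited from child terms). The gene product names and the `protein_only` helper are invented for illustration and do not reflect real annotation data.

```python
# Hypothetical annotation sets: term name -> gene products annotated to the term
# or to any of its descendants. All identifiers here are invented.
annotations = {
    "protein-containing complex": {"rpl1", "prp8", "tert", "pfk1", "mcm2"},
    "protein-RNA complex": {"rpl1", "prp8", "tert"},
    "protein-DNA complex": {"mcm2"},
}


def protein_only(annotations):
    """Gene products in protein-containing complexes with no known RNA or DNA partner."""
    mixed = annotations["protein-RNA complex"] | annotations["protein-DNA complex"]
    return annotations["protein-containing complex"] - mixed


print(sorted(protein_only(annotations)))
```

The result is only as good as the annotations: a complex whose RNA or DNA component has not been annotated would still be counted as protein-only.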

@ValWood
Contributor Author

ValWood commented Nov 11, 2016

Although the DNA-protein complexes (at least GO:0043599 nuclear DNA replication factor C complex) appear to have is_a links to protein complex and protein-DNA complex. So if that is valid, this would be an alternative solution...but it seems a bit wrong?

@bmeldal

bmeldal commented Nov 14, 2016

Val, I think these inconsistencies stem from the issues with TPVs: If a pre-terminal node has children that are a mix of protein-only complexes and protein-X complexes, the pre-terminal node belongs to protein-X complex but the terminal nodes may have a mix of both ancestries. That's what happened with the endonucleases :(
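
A minimal sketch of the classification problem described here, assuming a grouping term may sit under 'protein complex' only when every one of its children is protein-only. The complex names, component lists, and both helper functions are hypothetical; this is not how the ontology reasoner is implemented.

```python
PROTEIN_ONLY = "protein complex"
MIXED = "macromolecular complex"


def classify_leaf(components):
    """A leaf complex is protein-only when every component is a protein."""
    return PROTEIN_ONLY if all(c == "protein" for c in components) else MIXED


def classify_grouping(children):
    """A grouping term is protein-only only if all of its children are."""
    kinds = {classify_leaf(components) for components in children.values()}
    return PROTEIN_ONLY if kinds == {PROTEIN_ONLY} else MIXED


# Hypothetical grouping term with one protein-only child and one RNP child.
endoribonuclease_like = {
    "complex A": ["protein", "protein"],
    "complex B": ["protein", "RNA"],
}

print(classify_grouping(endoribonuclease_like))
print(classify_leaf(endoribonuclease_like["complex A"]))
```

Here the single RNA-containing child forces the grouping term under 'macromolecular complex', while "complex A" on its own still qualifies as a protein complex, which is the mixed ancestry described above.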

@cmungall
Member

I thought this was already the case. Can you check? If it's not, file a ticket in the Noctua tracker.

On 9 Nov 2016, at 4:59, David Hill wrote:

This request stemmed from the Noctua workshop. Maybe one practical
solution for now is to have @kltm or @cmungall make macromolecular
complexes valid as entities in the complex generator in Noctua. That
way they can be chosen if they have protein components.


@kltm
Member

kltm commented Nov 17, 2016

@cmungall committed, but not deployed.

@ukemi
Contributor

ukemi commented Dec 7, 2016

Just to recap for Thursday's editors' call and the discussion above.
What about changing the term name macromolecular complex to protein-containing complex and merging protein complex into it? It fits the definition and I think addresses the comment by @bmeldal. If that is the case, purely protein complexes would be annotated to the parent, and mixed complexes to specific children such as protein-DNA complex, protein-lipid complex, protein DNA-RNA complex and protein carbohydrate complex.

@pgaudet
Contributor

pgaudet commented Feb 16, 2018

@bmeldal
'Biogenesis' for a protein is translation, isn't it? Unless there is something special about the translation of the protein making up these complexes ???

Or were you talking about 'complex involved in process'? This is clearly a dangerous path....

Thanks, Pascale

@pgaudet
Contributor

pgaudet commented Feb 16, 2018

I could merge x biogenesis into assembly for those, but I'm not sure that this is what the annotations were trying to capture. Looks rather like regulation of expression.

@bmeldal

bmeldal commented Feb 16, 2018

I haven't used the biogenesis terms so better ask those who have. If it's just translation then there shouldn't be any specific x protein biogenesis terms unless something special happens :)

@ValWood
Contributor Author

ValWood commented Feb 16, 2018

historically biogenesis terms were created for some processes when they knew that the production of something was affected but were not sure whether it was the transcription, translation, assembly etc.

The only strong case for keeping is "ribosome biogenesis" which researchers use to include rRNA processing, assembly and export from the nucleus because some of the steps don't appear to be separable (at least currently).

@hdrabkin
Contributor

Well, I suppose that 'biogenesis' of a protein could mean other things besides translation depending on what you were referring to (i.e., posttranslational events).

@pgaudet
Contributor

pgaudet commented Feb 16, 2018

But then we have 'protein modification.... '

@hdrabkin
Contributor

which, again, depending on what protein form you are referring to, would be included in biogenesis. It's a fairly broad grouping term.

@ValWood
Contributor Author

ValWood commented Feb 16, 2018

yes it's historic, we shouldn't need them. If you can't be sure which process, don't make the annotation....

@ValWood
Contributor Author

ValWood commented Feb 16, 2018

WooHoo.
I agree with @bmeldal: this is quite a big change. Maybe a post on GO-friends and the consortium list, just as a heads-up?

@ukemi
Contributor

ukemi commented Feb 16, 2018

Thanks @pgaudet for taking this on. It was a very complicated merge/rename.

@pgaudet
Contributor

pgaudet commented Feb 16, 2018

No problem!

I created 3 new tickets for the outstanding issues. Closing this one.

@bmeldal

bmeldal commented Feb 16, 2018

Thank you everyone!!! I feel like celebrating!
I think I first discussed this topic with the then EBI editors 5 years ago :)

@bmeldal

bmeldal commented Feb 16, 2018

We just had an IntAct/CP release but I will tweet about these changes next week. Leaving our release tweets on top of the news feed for a few days.

@ValWood
Contributor Author

ValWood commented Feb 16, 2018

95 comments!

@deustp01

No objections from here. We annotate the assembly of a complex to capture distinct functions mediated by the complex at various stages of its assembly, or to capture interactions with other physical entities that affect distinct steps of the assembly process, and we treat the assembly process as part of whatever process the complex itself mediates, not as a distinct process in its own right, so these changes in GO should not affect us.

@bmeldal

bmeldal commented Feb 19, 2018

As I can't see the changes until they go public:
@pgaudet

  1. Did you move the "comment" from the old protein complex term to the renamed "protein-containing complex" term?
  2. Have you updated the synonyms?
    protein complex [narrow]
    protein-protein complex [narrow]

@bmeldal bmeldal reopened this Feb 19, 2018
@pgaudet
Contributor

pgaudet commented Feb 19, 2018

Yes and yes

@pgaudet pgaudet closed this as completed Feb 19, 2018
@pgaudet pgaudet changed the title from "MP: GO:0005945 6-phosphofructokinase complex -> high level protein complex/macromolecular complex term" to "Merge 'protein complex' term into macromolecular complex, rename 'protein-containing complex' (was: MP: GO:0005945 6-phosphofructokinase complex -> high level protein complex/macromolecular complex term)" Mar 15, 2018
@ukemi ukemi moved this from In progress to Done in ontology weekly meetings Feb 4, 2019