Add active unit information from Reactome #31
Comments
@fabregat what do you think would be the best way to access the active unit information ? |
I don't see a way to get the info. from the Web API but did manage to find it in the graph database. Noting to self query will look something like: |
Working now with information gathered from the graph database. I'm uncomfortable about the longevity of this solution. I would prefer to use one standard and extend that standard as needed (e.g. by adding content to the BioPAX export even when it's not in the specification; doing so should not break any semantic-web-capable code that consumes it). |
To get the info using the ContentService, you will need two queries. Let's take the reaction you shared above (R-HSA-450346). The first query is Its result is in TSV format containing Identifier, DisplayName and SchemaClass:
Taking the first column, you then perform a second query Its result follows the same format:
Please note that if you need more info about the last EWAS, you can query the ContentService: Since you've also mentioned the graph database, I strongly recommend going that way, since it will be faster on your end ;) The query you suggest is almost correct; I say almost because it wouldn't work for identifiers corresponding to other types of reactions (BlackBoxEvent, for example). In any case, the fix is very easy (please see in bold what I've changed):
Related to extending the BioPax format in favour of the options above, maybe we could set a meeting to discuss how to go forward? |
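For anyone following the two-step ContentService route described above, here is a minimal sketch of handling the TSV results. It assumes each result line has exactly three tab-separated fields (Identifier, DisplayName, SchemaClass) and no header row; the function names and the sample row are hypothetical placeholders, not real Reactome output, and the actual query URLs are deliberately left out.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    identifier: str
    display_name: str
    schema_class: str


def parse_tsv(text: str) -> List[Row]:
    """Parse three-column TSV text (Identifier, DisplayName, SchemaClass).

    Assumption: no header row; blank or malformed lines are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        rows.append(Row(*fields))
    return rows


def first_column(rows: List[Row]) -> List[str]:
    """Return the identifiers, i.e. the input for the second query."""
    return [row.identifier for row in rows]


# Hypothetical placeholder response, NOT real Reactome data.
SAMPLE = "ID-0001\tExample catalyst activity\tCatalystActivity\n\n"

if __name__ == "__main__":
    print(first_column(parse_tsv(SAMPLE)))
```

The identifiers returned by `first_column` would then be fed into the second query, whose result can be parsed with the same function since it follows the same format.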
@fabregat thank you! Indeed I'm happy that I ended up down the graphdb path. It would have been quite slow the other way and this will clearly open up more possibilities. (But see my email about problems getting your Java access project to build.) I updated my query to include ReactionLikeEvent and it seems to be working; it captures about 120 more relations. Yes, I'd like to talk about how to get this information into your BioPAX export somehow. Although it would end up off-standard, an extension to the BioPAX ontology or even a hacky use of xrefs or comments could do the job for the short term and provide an example for extending the standard in the future. It might also be worth talking with @cmungall about new standardized ways for sharing graph databases like your neo4j version. Longer term, that work might end up replacing the BioPAX stuff, though I think that is likely a way off. |
Need to convert the code to read this data from a newly provided statement in the comments on Control and Catalysis entities, e.g. activeUnit: #Protein26 . This will replace the code that uses the graphdb. Future versions of the BioPAX export will contain this information. Note that the active unit (e.g. #Protein26) may or may not be otherwise linked to in the BioPAX file, but it should be present. |
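A minimal sketch of pulling the active unit out of such comment strings, assuming the comments have already been read from the BioPAX file into plain Python strings and that the statement always looks like `activeUnit: #<local id>`. The function name, the regular expression, and the second sample comment are illustrative assumptions, not part of the converter.

```python
import re
from typing import Iterable, List

# Matches e.g. "activeUnit: #Protein26" and captures "Protein26".
# Assumption: local ids contain only letters, digits and underscores.
ACTIVE_UNIT_RE = re.compile(r"activeUnit:\s*#([A-Za-z0-9_]+)")


def active_units(comments: Iterable[str]) -> List[str]:
    """Collect every active-unit local id found in the given comments."""
    found = []
    for comment in comments:
        for match in ACTIVE_UNIT_RE.finditer(comment):
            found.append(match.group(1))
    return found


if __name__ == "__main__":
    # The second comment is a made-up example of an unrelated comment.
    sample = ["activeUnit: #Protein26 .", "some unrelated comment"]
    print(active_units(sample))
```

Because the referenced entity may not be linked from anywhere else in the file, the returned ids would still need to be resolved against the file's entities in a separate step.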
example file: |
Question for @ukemi and @deustp01 when we have an active unit annotated on a complex that is not catalyzing but rather exerting a regulatory effect on the reaction, how should that be captured? For catalysis we have the enabled by / contributes to structure. What should we use for regulates? e.g. protein involved_in_negative_regulation of reaction, complex has_part protein, complex ?relation? reaction ? |
I think I can answer my own question here. The plan is to go ahead and assert the involved_in_regulation triple for the active entity. This happens in the first phase of processing. It will be picked up in the second phase and converted to the pattern from #39 |
Test with RAF-independent MAPK1/3 activation |
Hmm, I haven't thought this one through but that relation is mostly
intended for inference rather than assertion
|
In Reactome, "active unit" is an attribute of the catalyst activity class only, and is meant to be used when the physical entity doing the catalysis is a complex and one protein or subcomplex part is known to be the actual catalyst. We don't have an equivalent attribute on our regulator class (but it sounds like a good idea). |
@deustp01 I'm seeing it in the BioPAX that @guanmingwu enhanced with the additional activity information. For example, here is a BioPAX control structure showing negative regulation with an annotated active unit (PEA15) for the reaction "Phosphorylated MAPKs translocate into the nucleus" in pathway "RAF-independent MAPK1/3 activation". This kind of thing is present in the test file he sent for this pathway, and it looks like it shows up 66 times in the full Homo sapiens batch export. Is this an error? I don't see any indication of an active unit on the corresponding control element in the public interface or the curatorial interface for that reaction. |
For what it's worth, this is what the end result of the current transformation looks like for this one. Note that the involved_in_regulation triple (which I mentioned above and which starts this process) ends up being replaced with the regulatory binding activity pattern. (Sorry, Neo is not loaded; UniProtKB:Q15121 = PEA15.) |
@goodb The problem is at my end. Despite what I said on Saturday, "active unit" is an optional attribute of regulation instances and it has been used to annotate 578 of the 2108 regulation instances in our released database. The annotation criteria should be the same as for active unit annotations of catalyst activity: the physical entity associated with the activity is a complex of two or more different gene products and there is evidence that allows one of them (or a subcomplex) to be identified as the part of the complex that is directly doing the catalysis or regulation. I don't know how to do a query to filter out the non-human instances but they are a minority, so we have a new problem: why only 66 problem instances detected when it looks like there should be hundreds? Guanming will be back in about a week, I think. Sorry. |
@deustp01 Let me confirm the 66 number with a more careful test. (I pulled that out with a very fast hack to see if there was more than one, and I want to double-check.) In the meantime, do you think the representation depicted in the figure above is suitable for representing this information in GO-CAM? |
My initial reaction to the figure above is that there is nothing enabling the binding reaction in the lower middle. It only has an input. That seems weird. |
Okay, I made a mistake with the 66 number. I shouldn't have reported that specifically before verifying. For the Homo sapiens release that Guanming prepared for me in January I see: |
On the representation, it would be good if @ukemi could weigh in. From my perspective, I never really understood what was wrong with the initial entity involved_in_regulation_of function pattern, but everyone pretty adamantly did not like it. The only difference here compared to that one is that we have pulled the active unit out of its complex. According to the pattern, if the reaction there (Phosphorylated MAPKs translocate to the nucleus) had something enabling it, that something would be attached as an enabler of the binding.
We need to make sure that each of the above types of active entities are represented in the pathways that we systematically review. Can we get a reactome identifier and the corresponding models for: The reaction above is exactly why I have argued that kinase activity and phosphorylation represent a molecular function and a biological process respectively. In this case the activities of two independent kinases modify two different residues on the target protein. This is what I have always argued is the process of phosphorylation. @vanaukenk EDIT (Adding examples requested above:
|
It is misleading to me. It looks like the chemical is being regulated. I think it is also mixing a bit of apples and oranges. These models show the flow of activities that are enabled by continuants. The old representation short-circuits the activity aspect of the regulation. |
I don't really want to argue against everyone opposed to the use of the involved_in properties but the relation is unambiguous - the continuant entity is involved in the regulation of the occurrent function. Like many things, this may be confusing in the display. |
I completely agree that it is not technically wrong, but it blurs over the actual activity that is occurring. Don't you think we should state how the continuant is involved if we know? I don't want people to be misled into thinking this represents, for example, AMP regulates PFK activity. We have worked so hard to tell people that something that is happening regulates PFK activity. I agree that the display makes it more confusing. |
I get the point that if the chemical citrate is the product of reaction 1, and that citrate molecule regulates reaction 2, the Reactome assertion that citrate regulates reaction 2 maps cleanly and reliably onto the assertion that reaction 1 regulates reaction 2. What about cases where reaction 2 is responsive to the level of a small molecule and that level is determined by the flux through several reactions, some generating citrate and some consuming it? There is certainly not a 1:1 mapping, and a many-to-one mapping, even with flags to identify positive and negative contributors, is misleading without the quantitative / stoichiometric features (which neither of us wants) that would indicate how much each of the contributing reactions contributes. Generalizing: anywhere that there is a pipeline (A acts on B acts on C acts on D) and the organization of the pipeline physically excludes participation by entities outside the pipeline, mapping entities to the reactions that produce them works cleanly. Whenever we're dealing with a system where there are several sources and sinks affecting the level of an entity that does something to another reaction / activity, it's hard to see how to get a reliable mapping, especially if the mapping is also supposed to yield a direction (i.e., the net effect is to raise citrate levels and thus promote, or to lower citrate levels and thus suppress, a downstream activity). Hard for me to see, anyway. If there's a fix I'm missing, we move on to implementing it in the Reactome to GO-CAM process. But maybe there is a workaround. In classic metabolic cases where the end product of a pathway feedback-inhibits an early reaction in the pathway (e.g., the products of purine biosynthesis AMP, GMP, and IMP all negatively regulate the activity of PRPP synthase), the mapping is clean: each of those molecules is generated in a reaction in the pathway, so those reactions each negatively regulate the PRPP synthase reaction.
In a case like regulation by citrate or ATP where there may be many sources and sinks and indeed the engineering goal is to integrate over all of them to determine the need for the reaction that the citrate or ATP is regulating, would it be acceptable to create a placeholder reaction on the fly, "synthesis of ATP" and make that the positive regulator (and maybe "breakdown of ATP" as a negative regulator)? |
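The distinction drawn above (a regulator with a single producing reaction maps cleanly, one with several sources and sinks does not) can be sketched as a simple check. This is only one possible reading of the argument: the "clean" criterion used here (exactly one producing reaction and no consuming reactions) is an assumption for illustration, and the reaction names and toy network are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Each reaction maps to (input entities, output entities).
Reactions = Dict[str, Tuple[Set[str], Set[str]]]


def classify_regulator(entity: str, reactions: Reactions) -> Tuple[str, List[str], List[str]]:
    """Return (label, producing reactions, consuming reactions) for a regulator.

    Illustrative criterion (an assumption, not Reactome or GO-CAM policy):
    "clean" means exactly one producer and no consumers.
    """
    produced_by = defaultdict(set)
    consumed_by = defaultdict(set)
    for name, (inputs, outputs) in reactions.items():
        for item in inputs:
            consumed_by[item].add(name)
        for item in outputs:
            produced_by[item].add(name)
    sources = sorted(produced_by.get(entity, ()))
    sinks = sorted(consumed_by.get(entity, ()))
    label = "clean" if len(sources) == 1 and not sinks else "ambiguous"
    return label, sources, sinks


if __name__ == "__main__":
    # Hypothetical toy network, not curated data.
    toy = {
        "r1": ({"A"}, {"citrate"}),
        "r2": ({"X"}, {"citrate"}),
        "r3": ({"citrate"}, {"Y"}),
        "r4": ({"P"}, {"AMP"}),
    }
    print(classify_regulator("AMP", toy))
    print(classify_regulator("citrate", toy))
```

Regulators flagged "ambiguous" would be the candidates for the placeholder-reaction workaround suggested above, or for some other fallback.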
This looks like bad annotation practice: the curator appears to be trying to make one reaction instance do the job of at least two, by cramming two distinct ways of providing catalytic activity into a single catalystActivity instance attached to a single reaction instance. If Ben could provide the list of 204 control events with more than one active unit annotated, I will take a look to see if this initial reaction is right and, if so, work with Reactome people to come up with a plan for changed annotation practice henceforth and cleanup of the legacy instances. My hunch is that clean-up of the annotations could probably be automated, but all the new reactions that will appear will need to be laid out manually in pathway diagrams, and this part will be painful. |
Looks like we will have plenty to do next week. If you don't get around to these, we can look together.
The difficulty in annotating a reaction (or two) here is that choosing which one comes first might not be known or even consistent. That's why I would invoke a process with two kinase activities as parts. Not that I stubbornly argue a point. :) |
But our view since the beginning has been that it is not the citrate regulating reaction 2, but that the citrate is taking part in something that is happening that regulates reaction 2. That's why we have processes 'regulation of x'. Since we are gene-centric, we express the something that is happening in the context of what genes are doing. So the PFK binding the citrate is what is regulating the kinase activity. I think this also cleans up the mass action difficulties. If PFK binds citrate, it negatively regulates the kinase activity. |
PS. It makes total sense from a reaction point of view that the citrate (product of some reaction) is able to negatively regulate some downstream reaction. |
OK. Some of these are definitely processes from a GO perspective. |
@goodb What about splitting out the regulatory processes into a separate model, but referring to the same Reactome pathway? Would that give us the best of both worlds? It would get rid of the clutter that you find undesirable and would give me the explicit process-based representation. |
I'd really prefer to keep a 1 to 1 relationship between Reactome pathways and GO-CAM model. @ukemi I find the clutter and layout mess annoying from a UI perspective, but that is a separate issue from the modeling. We should not let weaknesses in the UI translate into weaknesses or hacks in the data model. The UI can be fixed once the structures in the model are stable. (Worst case is that I spend a week and hack together a new layout algorithm that is aware of the recent changes. Best case is that we get a client UI developer involved that has more power to control the system and can do more expand/contract operations on the graph entities...) |
After further pondering, I am slowly coming around to what you are referring to as the process-based representation for this case. The key for me is that it matches up better with the structure of everything else happening in the GO-CAM kb and this is really important for query. As an aside, the structures that are being established here should be stored somewhere as templates for curators once they are agreed upon... |
"Since we are gene-centric, we express the something that is happening in the context of what genes are doing. So the PFK binding the citrate is what is regulating the kinase activity." OK. We will need to look at a sample of regulation instances to be sure that we can translate Reactome's "entity X regulates an activity" into GO-CAM "entity X interacts with the enabler of the activity and that regulates the activity". Note the swapping of "binds" for "interacts": sometimes we do know that the interaction takes the form of binding, but we probably can't guarantee this. Indeed, there are likely to be at least some cases where the data say that an activity increases or decreases when an entity is present (so "entity regulates activity" is valid) but we have no data as to mechanism (so "binds" goes beyond the data). |
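To make the two options discussed here concrete, the sketch below emits triples for one regulation instance: a binding-activity pattern when binding is known, and the plain involved_in fallback when it is not. All relation labels, the node-naming scheme, and the function itself are hypothetical illustrations of the discussion, not the converter's actual output or agreed GO-CAM modeling.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


def translate_regulation(
    entity: str,
    activity: str,
    direction: str,
    enabler: Optional[str] = None,
    binding_known: bool = False,
) -> List[Triple]:
    """Translate "entity regulates activity" into illustrative triples.

    direction is "positive" or "negative". Relation labels are placeholders.
    """
    if direction not in ("positive", "negative"):
        raise ValueError("direction must be 'positive' or 'negative'")
    if not binding_known:
        # No mechanism data: assert only that the entity is involved.
        return [(entity, f"involved_in_{direction}_regulation_of", activity)]
    binding = f"binding[{entity}]"  # hypothetical node label
    triples = [
        (binding, "has_input", entity),
        (binding, f"{direction}ly_regulates", activity),
    ]
    if enabler is not None:
        # The enabler of the regulated activity also enables the binding.
        triples.append((binding, "enabled_by", enabler))
    return triples


if __name__ == "__main__":
    for triple in translate_regulation(
        "citrate", "PFK kinase activity", "negative",
        enabler="PFK", binding_known=True,
    ):
        print(triple)
```

Keeping the fallback branch separate reflects the point that "binds" can go beyond the data when only the effect, and not the mechanism, is known.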
"split out regulation" etc. I think the way we were going last week and also here, we aren't really doing this. We are more nearly looking for ways of collapsing the whole Reactome annotation of a regulatory process that affects glycolysis or BMP signaling into a simple assertion that X entity regulates Y activity in the process being converted into GO-CAM. If there's a lot of Reactome annotations hidden inside that simple assertion, that's lost for now and can be brought back in a future UI that enables navigation between GO-CAM models. Our goal now is to make GO-CAMs that are good templates for future GO curation, so suppression of some human-specific, possibly out-of-scope annotations seems OK, especially as the surviving regulation stubs can be used in the future to reassemble the lost pieces. |
If we can't invoke binding, then I think we should explore your idea above. We could invoke a regulatory process in which the entity is simply a participant. When you curate one of these types of regulatory events, is it always because the entity is somehow involved in regulating the whole pathway? |
So maybe Ben's original representation is the best for now, which essentially said entity X is somehow involved in the regulation of MF A. Sorry to be going round and round with this, I thought we could always assert binding in these representations. |
For some reactions, like 'activated human TAK1 phosphorylates MKK3/MKK6', Reactome says that a protein complex (or set) catalyzes the reaction, and then indicates a specific member of that complex (or set) as the 'active unit'. In this case, the modified protein 'p-T184,T187-MAP3K7' is the active unit and 'Activated TAK complexes' is the 'complex'. Note that several of the members of 'Activated TAK complexes' contain a MAP3K7.
To close this issue: