Updated Reactome xref for pyruvate kinase GO:0004743 #15921

Merged
merged 1 commit into from Jun 20, 2018

4 participants
@ukemi
Contributor

ukemi commented Jun 20, 2018

No description provided.

@ukemi ukemi merged commit 63cfd70 into master Jun 20, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
continuous-integration/travis-ci/push The Travis CI build passed

@ukemi ukemi deleted the reactome-xref-test branch Jun 20, 2018

@goodb

goodb commented Jun 20, 2018

@ukemi I believe the old non-human xrefs should be deleted per our conversation with Peter, right? Just checking as I set up a process for doing all of them.

@deustp01

deustp01 commented Jun 20, 2018

I think so, yes. There are no currently valid Reactome annotations that can be accessed via REACT_ IDs but not via R-HSA- IDs, and resolving an old ID to find its new counterpart can be tricky (and, for all I know, may become impossible depending on what the developers are planning to do to our search engine and APIs).

@ukemi

Contributor

ukemi commented Jun 20, 2018

@deustp01, here is an example of the stanza as it was before I modified it this morning (see the diff in this pull request). All I did was change the human xref to the HSA format. What Ben is asking is whether we should delete any reference to the non-human xrefs. I'm still a little concerned that someone somewhere might use them. If we retain them, we will have to replace them as well.

[Term]
id: GO:0004743
name: pyruvate kinase activity
namespace: molecular_function
def: "Catalysis of the reaction: ATP + pyruvate = ADP + phosphoenolpyruvate." [EC:2.7.1.40]
subset: gosubset_prok
synonym: "ATP:pyruvate 2-O-phosphotransferase activity" EXACT [EC:2.7.1.40]
synonym: "phosphoenol transphosphorylase activity" EXACT [EC:2.7.1.40]
synonym: "phosphoenolpyruvate kinase activity" EXACT [EC:2.7.1.40]
xref: EC:2.7.1.40
xref: MetaCyc:PEPDEPHOS-RXN
xref: Reactome:REACT_101257 "phosphoenolpyruvate + ADP => pyruvate + ATP, Saccharomyces cerevisiae"
xref: Reactome:REACT_101740 "phosphoenolpyruvate + ADP => pyruvate + ATP, Mus musculus"
xref: Reactome:REACT_107647 "phosphoenolpyruvate + ADP => pyruvate + ATP, Schizosaccharomyces pombe"
xref: Reactome:REACT_109252 "phosphoenolpyruvate + ADP => pyruvate + ATP, Canis familiaris"
xref: Reactome:REACT_112705 "phosphoenolpyruvate + ADP => pyruvate + ATP, Oryza sativa"
xref: Reactome:REACT_114095 "phosphoenolpyruvate + ADP => pyruvate + ATP, Arabidopsis thaliana"
xref: Reactome:REACT_114481 "phosphoenolpyruvate + ADP => pyruvate + ATP, Plasmodium falciparum"
xref: Reactome:REACT_115607 "phosphoenolpyruvate + ADP => pyruvate + ATP, Gallus gallus"
xref: Reactome:REACT_1524 "phosphoenolpyruvate + ADP => pyruvate + ATP, Homo sapiens"
xref: Reactome:REACT_30287 "phosphoenolpyruvate + ADP => pyruvate + ATP, Danio rerio"
xref: Reactome:REACT_32324 "phosphoenolpyruvate + ADP => pyruvate + ATP, Bos taurus"
xref: Reactome:REACT_78511 "phosphoenolpyruvate + ADP => pyruvate + ATP, Taeniopygia guttata"
xref: Reactome:REACT_78996 "phosphoenolpyruvate + ADP => pyruvate + ATP, Xenopus tropicalis"
xref: Reactome:REACT_81895 "phosphoenolpyruvate + ADP => pyruvate + ATP, Rattus norvegicus"
xref: Reactome:REACT_85253 "phosphoenolpyruvate + ADP => pyruvate + ATP, Dictyostelium discoideum"
xref: Reactome:REACT_88726 "phosphoenolpyruvate + ADP => pyruvate + ATP, Gallus gallus"
xref: Reactome:REACT_89006 "phosphoenolpyruvate + ADP => pyruvate + ATP, Staphylococcus aureus N315"
xref: Reactome:REACT_91947 "phosphoenolpyruvate + ADP => pyruvate + ATP, Sus scrofa"
xref: Reactome:REACT_99833 "phosphoenolpyruvate + ADP => pyruvate + ATP, Escherichia coli"

@deustp01

deustp01 commented Jun 20, 2018

A big (fatal?) practical problem with the non-human xrefs is that they are NOT necessarily stable from one Reactome release to the next.

The exception is non-human instances that were manually annotated, in order to support manual inference of a human event for which there is no experimental evidence. There are only a few of these.

The bulk of our non-human instances are created by computational inference based on sequence similarity. The previous (REACT_) version of our code tried to generate truly stable IDs for these computationally inferred instances and in fact failed badly. The current version (I think I said this near the start of the discussion of xref updating, probably on the RHEA ticket) gives any computationally inferred instance an identifier R-XXX-#####, where XXX identifies the species of the inferred instance (MMU = mouse, for example) and ##### is the numerical part of the stable ID for the human instance from which this one was inferred. That means that the XXX identifier persists only as long as its human counterpart does AND only if ortho-inference is successful. If either condition is not met, the model organism identifier disappears silently.

So you could make xrefs, and many would persist correctly for a long time, but a small and unpredictable fraction would change with every Reactome release. Maintaining a completely correct set of xrefs for non-human instances would require a full update every 3 months, timed to our release cycle.
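
To make the identifier scheme described above concrete, here is a minimal Python sketch. It assumes the R-XXX-##### form with an optional ".N" version suffix; the function names are hypothetical, and the result is only meaningful for computationally inferred instances (manually annotated non-human instances are not covered by this rule).

```python
import re

# New-style stable ID: R-XXX-#####, optionally followed by ".N" (version).
# Species codes are not checked against any Reactome species list.
STABLE_ID = re.compile(r"^R-([A-Z]{3})-(\d+)(?:\.(\d+))?$")


def parse_stable_id(stable_id):
    """Split a stable ID into (species_code, numeric_part, version or None)."""
    match = STABLE_ID.match(stable_id)
    if match is None:
        raise ValueError(f"not a new-style Reactome stable ID: {stable_id!r}")
    species, number, version = match.groups()
    return species, int(number), int(version) if version else None


def human_counterpart(stable_id):
    """Return the unversioned R-HSA ID that shares the numeric part.

    Assumption: the input is a computationally inferred instance, so its
    numeric part is the one of the human instance it was inferred from.
    """
    _species, number, _version = parse_stable_id(stable_id)
    return f"R-HSA-{number}"
```

Because the non-human ID is derived entirely from the human one, keeping only the R-HSA xrefs loses no information that could not be regenerated from a given Reactome release.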

@ukemi

Contributor

ukemi commented Jun 20, 2018

OK. So I think the plan that @goodb suggests is a good one. Since there will be no procedure in place to update the non-human xrefs, it is better just to leave them out for now.

@ukemi

Contributor

ukemi commented Jun 20, 2018

Just to be sure we dot all our i's and cross all our t's. Here is the line in today's mapping file that should change tomorrow:

Reactome:REACT_1524 > GO:pyruvate kinase activity ; GO:0004743

Note that all the other lines are also there. These will disappear once we run @goodb's update, eg:

Reactome:REACT_101257 > GO:pyruvate kinase activity ; GO:0004743
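
For spot-checking the mapping file before and after the update, a small parser for lines of this shape may help. This is a sketch that assumes every mapping line follows the "external ID > GO:term name ; GO:id" layout shown above; the function names are hypothetical.

```python
def parse_mapping_line(line):
    """Parse 'DB:id > GO:term name ; GO:nnnnnnn' into (external_id, go_name, go_id)."""
    external, _, go_part = line.strip().partition(" > ")
    name_part, _, go_id = go_part.rpartition(" ; ")
    if not external or not name_part.startswith("GO:") or not go_id.startswith("GO:"):
        raise ValueError(f"unexpected mapping line: {line!r}")
    # Drop the "GO:" prefix that precedes the term name.
    return external, name_part[len("GO:"):].strip(), go_id.strip()


def is_old_style_reactome(external_id):
    """True for old REACT_ identifiers, which should be gone after the update."""
    return external_id.startswith("Reactome:REACT_")
```

Running `is_old_style_reactome` over every parsed line of tomorrow's file is one way to confirm that no REACT_ identifiers remain.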

@goodb

goodb commented Jun 20, 2018

@ukemi

Contributor

ukemi commented Jun 20, 2018

I think the most straightforward way would be for you to update the ontology and put in a pull request when @deustp01 and I have a chance to sanity check the changes. I suspect we would want to take an hour or so and just make sure that a subset of the mappings look ok.

We should also fix the definition xrefs. The definition xrefs are a mixed bag. Ones like GOC:mah refer to a GO curator, in this case @mah11, GOC:dph refers to me. These were used to identify curators who worked on terms in case questions came up about the term in the future. Ones like GOC:Signaling refer to projects. In these cases, we often go back to project notes if questions arise about the terms.

@deustp01, @vanaukenk , @goodb is it time for a touch base call some time this week?

@ukemi

Contributor

ukemi commented Jun 20, 2018

One thing we want to be sure of first is that your changes only result in the deletion and addition of REACTOME xrefs. Is there any way you can do that programmatically?
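
One possible way to check this programmatically is to drop every Reactome xref line from both versions of the file and confirm that what remains is identical. The sketch below assumes the xref lines start with "xref: Reactome:" as in the stanza quoted earlier; the function names are hypothetical. Note that it would also flag a changed definition line whose dbxref list cites Reactome, which then needs a manual look.

```python
import difflib

REACTOME_XREF_PREFIX = "xref: Reactome:"


def without_reactome_xrefs(obo_text):
    """Return the lines of an OBO file, minus the Reactome xref lines."""
    return [
        line
        for line in obo_text.splitlines()
        if not line.startswith(REACTOME_XREF_PREFIX)
    ]


def unexpected_changes(old_text, new_text):
    """List added ('+') and removed ('-') lines that are not Reactome xrefs.

    An empty list means the two files differ only in Reactome xref lines.
    """
    diff = list(
        difflib.unified_diff(
            without_reactome_xrefs(old_text),
            without_reactome_xrefs(new_text),
            lineterm="",
            n=0,  # no context lines, only the changes themselves
        )
    )
    # The first two entries are the '---' / '+++' file headers;
    # lines starting with '@@' are hunk headers.
    return [line for line in diff[2:] if not line.startswith("@@")]
```

Reading the old and new go-edit.obo into strings and passing them to `unexpected_changes` should return an empty list if the update touched nothing else.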

@goodb

goodb commented Jun 20, 2018

@goodb

goodb commented Jun 20, 2018

@deustp01

deustp01 commented Jun 20, 2018

I can't find a mapping either. The closest I can get is complex R-HSA-2466116 formed in reaction R-HSA-1474244. This complex does consist of one integrin alphav subunit (chosen from a menu), one integrin beta5 subunit (also chosen from a menu), and vitronectin. I suspect that this complex superseded REACT_14045, but I can't find any journaling record to verify this (historically we were not reliable about this) - sorry

@goodb

goodb commented Jun 21, 2018

Correction: I found more than 100 missing mappings. @deustp01, it might be worth having your team check on these. See below for a text file with the problem children.

missing_mapping_old_new_reactome.txt

@deustp01

deustp01 commented Jun 21, 2018

The list has 161 entries of type xref and one of type IAO_0000115. Not being able to resolve that number of old Reactome IDs is not surprising, unfortunately - when the old system collapsed, a fair number of the old IDs were corrupted so when new-form identifiers were assigned to entity and event instances, we didn't have a reliable old-form ID to pair with them and so we have no mapping.

If I understand right, though, we should have a way to get reliable GO - Reactome xrefs. What the xref appears to do is connect a GO molecular function term, e.g., GO:0055056 "D-glucose transmembrane transporter activity" to a Reactome reaction that is enabled by a protein or protein complex with that molecular function attribute. If that's right, then a list of all of the Reactome reaction instances that should be listed as xrefs for GO:0055056 can be found by looking for all catalystActivity instances in Reactome that have that GO term as their "activity" attribute and looking in turn for all reactionlike event instances that have each of those catalystActivity instances as their catalystActivity attribute.

GO:0055056 is referred to by six catalystActivity instances, each associating transport activity with a different protein, complex, or set of proteins:
[CatalystActivity:211383] D-glucose transmembrane transporter activity of GLUT2 / SLC2A2 tetramer [plasma membrane]
[CatalystActivity:429072] D-glucose transmembrane transporter activity of SLC2A6,8,10,12 [plasma membrane]
[CatalystActivity:5358708] D-glucose transmembrane transporter activity of GLUT1 / SLC2A1 tetramer [plasma membrane]
[CatalystActivity:8981550] D-glucose transmembrane transporter activity of GLUT4 / SLC2A4 tetramer [plasma membrane]
[CatalystActivity:8981551] D-glucose transmembrane transporter activity of GLUT14 / SLC2A14 tetramer [plasma membrane]
[CatalystActivity:8981556] D-glucose transmembrane transporter activity of GLUT3 / SLC2A3 tetramer [plasma membrane]

The first of these catalystActivity instances is associated with two Reactome reactions:
[Reaction:450095] GLUT2 (SLC2A2) transports Glc from cytosol to extracellular region [Homo sapiens] stableID R-HSA-450095.2 and
[Reaction:8981574] GLUT2 (SLC2A2) tetramer transports Glc from extracellular region to cytosol [Homo sapiens] stableID R-HSA-8981574.1

The second is associated with one reaction:
[Reaction:429094] SLC2A6,8,10,12 transport Glc from extracellular region to cytosol [Homo sapiens] stableID R-HSA-429094.1

The third is associated with one reaction:
[Reaction:5339524] GLUT1 (SLC2A1) tetramer transports Glc from extracellular region to cytosol [Homo sapiens] stableID R-HSA-5339524.3

The fourth is associated with one reaction:
[Reaction:8981570] GLUT4 (SLC2A4) tetramer transports Glc from extracellular region to cytosol [Homo sapiens] stableID R-HSA-8981570.1

The fifth is associated with one reaction:
[Reaction:8981553] GLUT14 (SLC2A14) tetramer transports Glc from extracellular region to cytosol [Homo sapiens] stableID R-HSA-8981553.1

The sixth is associated with one reaction:
[Reaction:8981564] GLUT3 (SLC2A3) tetramer transports Glc from extracellular region to cytosol [Homo sapiens] stableID R-HSA-8981564.1

Bottom line – if I’ve understood the xref system correctly, GO:0055056 should have seven xrefs R-HSA-450095, R-HSA-8981574, R-HSA-429094, R-HSA-5339524, R-HSA-8981570, R-HSA-8981553, and R-HSA-8981564. (The version suffixes are stripped because we want the xref to resolve to the current version of the Reactome reaction instance).

Tangential point. One catalystActivity instance was associated with two reactions while the others were associated with one each. Under normal physiological conditions all of the transporters can mediate uptake of extracellular glucose into the cytosol, but only GLUT2 ever mediates glucose export. Because we annotate the two directions of a reversible reaction as two separate reaction instances (and only annotate physiologically relevant reactions), we get two reactions for GLUT2 and one each for all the others.

I worked this example by hand as a sanity test, but if this result looks correct there is surely a way to automate the recursive searching to find the names and stable identifiers of all of the Reactome reaction instances corresponding to each of the GO terms on the list.
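
The lookup worked by hand above can be sketched in Python as a two-step join: GO term to catalystActivity instances, then catalystActivity instances to reactions. The data below are copied from the example in this comment; the in-memory tables and the function name are hypothetical stand-ins for whatever a real export from Reactome would provide.

```python
from collections import defaultdict

# catalystActivity instance ID -> GO term in its "activity" attribute
ACTIVITY_GO_TERM = {
    211383: "GO:0055056",   # GLUT2 / SLC2A2 tetramer
    429072: "GO:0055056",   # SLC2A6,8,10,12
    5358708: "GO:0055056",  # GLUT1 / SLC2A1 tetramer
    8981550: "GO:0055056",  # GLUT4 / SLC2A4 tetramer
    8981551: "GO:0055056",  # GLUT14 / SLC2A14 tetramer
    8981556: "GO:0055056",  # GLUT3 / SLC2A3 tetramer
}

# (reaction stable ID with version, catalystActivity instance ID)
REACTION_CATALYST = [
    ("R-HSA-450095.2", 211383),
    ("R-HSA-8981574.1", 211383),
    ("R-HSA-429094.1", 429072),
    ("R-HSA-5339524.3", 5358708),
    ("R-HSA-8981570.1", 8981550),
    ("R-HSA-8981553.1", 8981551),
    ("R-HSA-8981564.1", 8981556),
]


def xrefs_by_go_term(activity_go_term, reaction_catalyst):
    """Map each GO term to the unversioned stable IDs of its reactions."""
    xrefs = defaultdict(list)
    for stable_id, activity_id in reaction_catalyst:
        go_term = activity_go_term.get(activity_id)
        if go_term is None:
            continue  # catalystActivity without a GO term of interest
        unversioned = stable_id.split(".", 1)[0]  # strip the ".N" suffix
        if unversioned not in xrefs[go_term]:
            xrefs[go_term].append(unversioned)
    return dict(xrefs)
```

Calling `xrefs_by_go_term(ACTIVITY_GO_TERM, REACTION_CATALYST)` yields the seven R-HSA identifiers listed above for GO:0055056.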

@goodb

goodb commented Jun 22, 2018

If team GO gives the green light, I'm pretty sure I could implement that xref generation pattern using the BioPAX export. I like the pattern - it would make it fairly straightforward to make a script that kept things up to date using it. Should this be done on the Reactome side (by Reactome developers) or on the GO side? How are similar xrefing scenarios usually handled?

@goodb

goodb commented Jun 26, 2018

Here is a version of go-edit.obo with all of the old REACT_ identifiers either replaced with their current R-HSA-.. identifier or deleted (if no current mapping exists or they were pointed to non-human ids) and a text report describing the changes made. I don't have permission to add a branch to the GO git. If I can get that permission I can push it there so you can review changes via git, otherwise someone more privileged can make the commit. (This uses the id mapping file provided by the Reactome team, it does not use the pattern @deustp01 describes above.)

reactome_xref_update.zip
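
For readers who want the gist of the replace-or-delete rule described in this comment, here is a minimal Python sketch. It is not the script that produced the zip file; the function name, the dict-based mapping, and the choice to keep the quoted description unchanged are assumptions made for illustration.

```python
import re

# Matches an xref line that still uses an old REACT_ identifier.
OLD_XREF = re.compile(r"^xref: Reactome:(REACT_\d+)(.*)$")


def update_xref_lines(lines, old_to_new):
    """Replace or drop old Reactome xref lines.

    old_to_new maps an old REACT_ ID to its current stable ID.
    A line is dropped when its old ID has no mapping or maps to a
    non-human (non R-HSA) ID. Returns (updated_lines, report), where
    report holds (old_id, new_id or None) for every old xref seen.
    """
    updated, report = [], []
    for line in lines:
        match = OLD_XREF.match(line)
        if match is None:
            updated.append(line)  # not an old Reactome xref: keep as is
            continue
        old_id, rest = match.groups()
        new_id = old_to_new.get(old_id)
        if new_id is not None and new_id.startswith("R-HSA-"):
            updated.append(f"xref: Reactome:{new_id}{rest}")
            report.append((old_id, new_id))
        else:
            report.append((old_id, None))  # deleted
    return updated, report
```

The report list plays the same role as the text report in the zip: it records, per old identifier, whether it was replaced or deleted.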

@cmungall

Member

cmungall commented Jun 26, 2018

you should have permission now

(but note that anyone can fork)

Looks like this is consistent with identifiers.org
https://www.ebi.ac.uk/miriam/main/datatypes/MIR:00000018

and with db-xrefs.yaml

- database: Reactome
  name: Reactome - a curated knowledgebase of biological pathways
  synonyms:
    - REACTOME
    - REAC
  rdf_uri_prefix: http://identifiers.org/reactome/
  generic_urls:
    - http://www.reactome.org/
  entity_types:
    - type_name: entity
      type_id: BET:0000000
      id_syntax: R-[A-Z]{3}-[0-9]+(-[0-9]+){0,1}(\.[0-9]+){0,1}
      url_syntax: http://www.reactome.org/content/detail/[example_id]
      example_id: Reactome:R-HSA-109582
      example_url: http://www.reactome.org/content/detail/R-HSA-109582
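
As a quick consistency check against the id_syntax pattern quoted above, the new xrefs could be validated with a few lines of Python. This sketch assumes the pattern is meant to match the whole local identifier (the part after the "Reactome:" prefix); the function name is hypothetical.

```python
import re

# id_syntax copied from the db-xrefs.yaml entry above.
ID_SYNTAX = re.compile(r"R-[A-Z]{3}-[0-9]+(-[0-9]+){0,1}(\.[0-9]+){0,1}")


def is_valid_reactome_xref(xref):
    """Check a 'Reactome:<id>' xref against the id_syntax pattern."""
    prefix, _, local_id = xref.partition(":")
    return prefix == "Reactome" and ID_SYNTAX.fullmatch(local_id) is not None
```

Old-style identifiers such as Reactome:REACT_1524 fail this check, so it doubles as a test that none were left behind.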

goodb pushed a commit that referenced this pull request Jun 26, 2018

goodb
updated Reactome xrefs
Using mapping file provided by Reactome, executed a script that replaced the old REACT_ ids with their corresponding current HSA_..  ids.  Also removed xrefs to non-human reactome ids as these are not stable and are generated automatically from the human ids.  Removed old xrefs to REACT_ids that have no current mapping.  See issue comments #15921 for more information.
@goodb

goodb commented Jun 26, 2018

Okay I pushed the branch for review. https://github.com/geneontology/go-ontology/tree/issue-15921 If folks are satisfied please go ahead and pull or let me know and I will issue the pull request.

@ukemi

Contributor

ukemi commented Jun 26, 2018

Wow! 2,986 additions, 28,838 deletions

@deustp01 and I will need to check this carefully. It might take a few days. The diff is large enough that we have to look at it locally. It might make more sense to look at the above file.

@goodb

goodb commented Jun 26, 2018

Many of the deletions are old references to non-human species ids. In any case, ball is in your court now. Let me know if I can help. The associated report file in the zip may be useful.

@goodb goodb referenced this pull request Jun 26, 2018

Closed

Update Rhea xrefs #15927

@ukemi

Contributor

ukemi commented Jun 28, 2018

I just tested to be sure that after pulling I could switch to your branch and look at the file in Protege. I didn't even need to ask Erich and hope it is all ok!!! It looks like the HSA refs are there in the branch I am looking at so @deustp01 and I can start looking at these together. Peter, we could look at these randomly or come up with a plan to do it systematically. Do you have any suspicions about ones that might be problematic to look at first?
