Align GO JSON-LD context with dipper curie-map #582

Open
cmungall opened this issue May 8, 2018 · 11 comments

Comments

@cmungall
Member

cmungall commented May 8, 2018

TODO: report on clashes

@cmungall
Member Author

cmungall commented May 8, 2018

@cmungall cmungall changed the title Align GO and Monarch json-ld contexts Align GO JSON-LD context with dipper curie-map May 8, 2018
@TomConlin
Contributor

  1. I applaud giving the primary database URLs their due.
  2. dipper warns on non-1:1 maps (https://github.com/monarch-initiative/dipper/blob/master/dipper/utils/CurieUtil.py#L21); GO might want to as well
  3. mind httpS where possible
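
A check for non-1:1 maps, as mentioned in point 2, could look roughly like the sketch below. This is not dipper's actual CurieUtil code; the function name `find_non_one_to_one` and the example map are assumptions made up for illustration.

```python
from collections import defaultdict


def find_non_one_to_one(curie_map):
    """Return each base URI that more than one prefix maps to."""
    prefixes_by_uri = defaultdict(list)
    for prefix, base_uri in curie_map.items():
        prefixes_by_uri[base_uri].append(prefix)
    return {
        uri: sorted(prefixes)
        for uri, prefixes in prefixes_by_uri.items()
        if len(prefixes) > 1
    }


# Illustrative map only: two prefixes deliberately share one base URI.
example_map = {
    "Reactome": "http://identifiers.org/reactome/",
    "REACTOME": "http://identifiers.org/reactome/",
    "EX": "http://example.org/ex/",
}

for uri, prefixes in find_non_one_to_one(example_map).items():
    print(f"warning: {uri} is mapped from several prefixes: {prefixes}")
```

Warning rather than failing keeps a load running while still surfacing the ambiguity, which matters when CURIEs are later contracted back from URIs.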

@cmungall
Member Author

I applaud giving the primary database URLs their due
The problem here is that these are often ad hoc and subject to change. In GO we are going for identifiers.org, which is more stable and predictable.

dipper warns on non 1:1 maps https://github.com/monarch-initiative/dipper/blob/master/dipper/utils/CurieUtil.py#L21 GO might want to as well

We should look at merging dipper CurieUtil and https://github.com/prefixcommons/prefixcommons-py

mind httpS where possible

I have gotten assurances from identifiers.org that they will support http in perpetuity. The same is of course true for OBO. Stability is key here.

@cmungall
Member Author

Note the list above only includes cases where the prefix matches or the URL matches.

It isn't reporting the fact that GO has

- database: Reactome
  name: Reactome - a curated knowledgebase of biological pathways
  synonyms:
    - REACTOME
    - REAC
  rdf_uri_prefix: http://identifiers.org/reactome/
  generic_urls:
    - http://www.reactome.org/
  entity_types:
    - type_name: entity
      type_id: BET:0000000
      id_syntax: R-[A-Z]{3}-[0-9]+(-[0-9]+){0,1}(\.[0-9]+){0,1}
      url_syntax: http://www.reactome.org/content/detail/[example_id]
      example_id: Reactome:R-HSA-109582
      example_url: http://www.reactome.org/content/detail/R-HSA-109582

whereas dipper has

'REACT': 'http://www.reactome.org/PathwayBrowser/#/'

It looks like we have just recommended REACT to the translator folks, ah well. I'm not sure where this abbreviation came from.

But the URL is a good example of a bad semantic web PURL: http://www.reactome.org/PathwayBrowser/#/
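
To make the gap concrete, here is a minimal sketch of a clash report that only fires when the prefix matches or the URL matches (but not both). It is not the script that produced the list in this thread; `report_clashes` and the single-entry maps are assumptions, reduced from the GO and dipper entries quoted above.

```python
def report_clashes(go_map, dipper_map):
    """List pairs where exactly one of prefix or base URI agrees."""
    clashes = []
    for go_prefix, go_uri in go_map.items():
        for dp_prefix, dp_uri in dipper_map.items():
            same_prefix = go_prefix == dp_prefix
            same_uri = go_uri == dp_uri
            # A clash is a partial match: one side agrees, the other does not.
            if same_prefix != same_uri:
                clashes.append((go_prefix, go_uri, dp_prefix, dp_uri))
    return clashes


go_map = {"Reactome": "http://identifiers.org/reactome/"}
dipper_map = {"REACT": "http://www.reactome.org/PathwayBrowser/#/"}

# Neither the prefix nor the URL matches, so nothing is reported.
print(report_clashes(go_map, dipper_map))
```

Because Reactome and REACT differ in both prefix and URL, a comparison of this kind stays silent about them, even though both refer to the same database.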

cmungall added a commit to biolink/biolink-model that referenced this issue May 10, 2018
@jmcmurry
Member

jmcmurry commented May 19, 2018

Please note that short-form CURIE resolution is now supported in identifiers.org, for example http://identifiers.org/MGI:3764834. My preference would be to use these simple URIs throughout our stack, except for OBO PURLs and other sources that have additional semantic sugar. I've made specific recommendations here.
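
As a rough illustration of the short form, the sketch below builds such a URI by appending the whole CURIE to the identifiers.org base. The helper name `compact_uri` and its validation rule are assumptions, not part of any identifiers.org client.

```python
IDENTIFIERS_ORG = "http://identifiers.org/"


def compact_uri(curie):
    """Build a short-form URI by embedding the whole CURIE in the path."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    return IDENTIFIERS_ORG + curie


print(compact_uri("MGI:3764834"))
```

With this form no per-database path segment has to be looked up, so the prefix map collapses to a single base URI.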

@nathandunn

@jmcmurry (sorry to interject) I was talking with @TomConlin about this. I think that it's going to be problematic even if it goes to the canonical source. I think you're going to run into problems if you squat on the base-level CURIE. I would propose something like the following (such that it's always scoped):

http://identifiers.org/monarch/MGI:3764834

This way, AGR, Monarch, MGI, etc. can each choose where their external links resolve, and it reduces any possibility of data collision along the way. Doing it this way, you don't really have to consult anyone outside Monarch, whereas doing it at the root level will require a higher level of coordination for establishing and changing them.

@nathandunn

But I really do like the identifiers.org approach overall. It's a nice approach to the ever-moving / dying web. I'm not sure if there is a better solution; I would just scope any CURIE in a way that you can own it long-term.

@nathandunn

Just to clarify my point. It might be fine to use the short-form if, for example, MGI is committed to supporting it internally, as they do the rest of their IDs, but even then, I think you are better off coming up with a scoping model. The reasons are:

1 - prevent potential collisions (can you register an entire CURIE?)

2 - allow an organization that doesn't own the IDs to quickly update changed external IDs (for example, if a downstream organization is using your IDs in a load, so they won't pick up your changed links)

3 - allow individual organizations to change where an ID points, since several entities house the same IDs. For example, external links on http://identifiers.org/monarch/MGI:107476 point to http://www.informatics.jax.org/marker/MGI:107476, but http://identifiers.org/myorg/MGI:107476 points to https://www.alliancegenome.org/gene/MGI:107476

4 - at a minimum, I don't think we'll be able to grab CURIEs for organizations we don't actively own (I imagine orgs would be furious if an organization other than their own controlled their CURIE). It wouldn't be a bad thing to encourage the MODs (for example) to register these with identifiers.org as @jmcmurry suggested, though.

This sort of resolves to a poor man's DNS in some ways, but I think it simplifies things quite a bit.
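
The scoped lookup described in point 3 can be sketched as a small table keyed by scope and prefix. The `SCOPED_TARGETS` table and `resolve_scoped` function are hypothetical; only the scopes and target URLs are taken from the example above, and no such scoped service is known to exist.

```python
BASE = "http://identifiers.org/"

# Hypothetical per-organization routing table, filled from the example above.
SCOPED_TARGETS = {
    "monarch": {"MGI": "http://www.informatics.jax.org/marker/"},
    "myorg": {"MGI": "https://www.alliancegenome.org/gene/"},
}


def resolve_scoped(uri):
    """Map a scoped URI (BASE + scope + '/' + CURIE) to its target URL."""
    if not uri.startswith(BASE):
        raise ValueError(f"unexpected base in {uri!r}")
    scope, _, curie = uri[len(BASE):].partition("/")
    prefix = curie.partition(":")[0]
    return SCOPED_TARGETS[scope][prefix] + curie


print(resolve_scoped("http://identifiers.org/monarch/MGI:107476"))
print(resolve_scoped("http://identifiers.org/myorg/MGI:107476"))
```

Each organization edits only its own row, which is the "poor man's DNS" aspect: the same CURIE resolves differently depending on who scoped it.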

@cmungall / @jmcmurry / @TomConlin I would be happy to chat about this. A lot of orgs are going to face this. I think that identifiers.org is the right way to do this for many reasons, but I think there needs to be a bit of nuance on the implementation.

@cmungall
Member Author

@nathandunn I think you're starting from some different assumptions. The primary use case here is joining triples, not resolution; hence URIs must be identical, and organism-specific URIs are contrary to this.

The choice is between the standard identifiers.org URLs and the newer ones that embed CURIEs directly in the URL. The latter is preferable for many reasons, but there are concerns over the effect of colons in various semweb specs.
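
Why identical URIs matter for joining can be shown with a toy join over plain tuples. The `join_on_subject` helper, the `ex:` predicates, and the object values are all made up for illustration; a real pipeline would use a triple store or an RDF library.

```python
def join_on_subject(left, right):
    """Pair triples from two sources whose subject URIs are identical strings."""
    right_by_subject = {}
    for subject, predicate, obj in right:
        right_by_subject.setdefault(subject, []).append((predicate, obj))
    return [
        (subject, predicate, obj, other_predicate, other_obj)
        for subject, predicate, obj in left
        for other_predicate, other_obj in right_by_subject.get(subject, [])
    ]


shared = "http://identifiers.org/MGI:107476"
scoped = "http://identifiers.org/monarch/MGI:107476"

# Hypothetical triples; predicates and objects are placeholders.
source_a = [(shared, "ex:label", "some gene")]
source_b_shared = [(shared, "ex:source", "database B")]
source_b_scoped = [(scoped, "ex:source", "database B")]

print(len(join_on_subject(source_a, source_b_shared)))
print(len(join_on_subject(source_a, source_b_scoped)))
```

The join is a string-equality match on the subject, so a scoped URI for the same gene never lines up with the shared one, regardless of where either would resolve.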

@nathandunn

@cmungall Thanks for the clarification, and sorry for any confusion. Yes, the CURIE is a no-brainer.

@jmcmurry
Member

No prob Nathan, agreed: we would never ever squat on a CURIE for our own 3rd-party purposes. It would break the trust of both users and providers.
The new identifiers.org syntax is such that a provider can be specified OR omitted as the user desires; however, where the user omits the provider, they're redirected to whichever of the trusted authoritative original sources and their close collaborators has the best "up time" record that month. There are some issues related to that, but it is what it is.


4 participants