Breaking change: correcting the definition of subject_type and object_type #323

Closed
matentzn opened this issue Oct 12, 2023 · 13 comments

@matentzn
Collaborator

Currently the subject_type and object_type fields are defined using an enum.

```yaml
entity_type_enum:
  permissible_values:
    owl class:
      meaning: owl:Class
    owl object property:
      meaning: owl:ObjectProperty
    owl data property:
      meaning: owl:DataProperty
    owl annotation property:
      meaning: owl:AnnotationProperty
    owl named individual:
      meaning: owl:NamedIndividual
    skos concept:
      meaning: skos:Concept
    rdfs resource:
      meaning: rdfs:Resource
    rdfs class:
      meaning: rdfs:Class
    rdfs literal:
      meaning: rdfs:Literal
    rdfs datatype:
      meaning: rdfs:Datatype
    rdf property:
      meaning: rdf:Property
```

When I added this, I intended to be able to use this enum like this:

```
subject_id .... subject_type
HP:0000118 ... owl:Class
```

Now I have learned that, according to the LinkML model, this is how it would look at the data level:

```
subject_id .... subject_type
HP:0000118 ... owl class
```

I still need to find out how to define an enum that takes exactly owl:Class, owl:ObjectProperty, owl:AnnotationProperty, etc. as values, in such a way that these values are understood to be CURIEs, so that when I translate, say, this dataset:

```
subject_id .... subject_type
HP:0000118 ... owl:Class
```

into RDF, I get:

```turtle
[] a sssom:Mapping;
sssom:subject_type <http://www.w3.org/2002/07/owl#Class> ;
sssom:subject_id <http://purl.obolibrary.org/obo/HP_0000118> .
```
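For illustration, the translation described above can be sketched in a few lines of Python. This is a minimal sketch, not SSSOM or LinkML code: `expand_curie`, `row_to_turtle` and the hard-coded `PREFIX_MAP` are hypothetical, and it assumes every value in the row is a CURIE whose prefix is in the map (a real SSSOM mapping set would supply its own `curie_map`).

```python
# Hypothetical prefix map, hard-coded for this example only.
PREFIX_MAP: dict[str, str] = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "HP": "http://purl.obolibrary.org/obo/HP_",
}


def expand_curie(curie: str, prefix_map: dict[str, str]) -> str:
    """Expand a CURIE such as 'owl:Class' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefix_map:
        raise ValueError(f"cannot expand {curie!r}: unknown or missing prefix")
    return prefix_map[prefix] + local


def row_to_turtle(row: dict[str, str], prefix_map: dict[str, str]) -> str:
    """Render one mapping row as a Turtle blank node.

    Assumes every value in the row is a CURIE, so each one becomes an IRI
    rather than a string literal. The sssom: prefix declaration is omitted.
    """
    statements = [
        f"    sssom:{slot} <{expand_curie(value, prefix_map)}>"
        for slot, value in row.items()
    ]
    return "[] a sssom:Mapping ;\n" + " ;\n".join(statements) + " ."


if __name__ == "__main__":
    row = {"subject_id": "HP:0000118", "subject_type": "owl:Class"}
    print(row_to_turtle(row, PREFIX_MAP))
```

Note that `expand_curie("owl class", PREFIX_MAP)` raises `ValueError`: the enum text `owl class` is not a CURIE, which is exactly the mismatch this issue is about.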

Help @sierra-moxon @hrshdhgd

@hrshdhgd
Contributor

As per this, I'm guessing the permissible_values should be the values of meaning (which I know will not fit well), but I'll let Sierra answer since she has much broader knowledge of LinkML in general than I do.

@hrshdhgd
Contributor

hrshdhgd commented Oct 12, 2023

My guess is something like:

```yaml
entity_type_enum:
  permissible_values:
    "owl:Class":
    "owl:ObjectProperty":
    "owl:DataProperty":
    "owl:AnnotationProperty":
    "owl:NamedIndividual":
    "skos:Concept":
    "rdfs:Resource":
    "rdfs:Class":
    "rdfs:Literal":
    "rdfs:Datatype":
    "rdf:Property":
```

But then each value will be treated as a string rather than a uriorcurie.

@sierra-moxon
Contributor

Right @hrshdhgd - after looking at the code, I don't think rdfgen will automatically expand those strings to URIs. Is that ok? There is a bit of code in owlgen that we probably want to add to rdfgen.

@hrshdhgd
Contributor

> I don't think rdfgen will automatically expand those strings to URIs. Is that ok?

We would want it to be URIs.

https://github.com/linkml/linkml/blob/5c1cbeae9abb841b4d35e23d73e4945ea2685e29/linkml/generators/owlgen.py#L693-L695

may be useful, Nico?

@matentzn
Collaborator Author

Thank you both for your support. Before we look into solutions for the problem, I would like to understand what the design intention is here on the LinkML side.

The question is simply: can an instance of an enum (a value) be an entity reference (uriorcurie), or not?

So if I have

```
subject_id .... subject_type
HP:0000118 ... owl:Class
```

or

```json
{
  "mappings": [ {
    "subject_id": "HP:0000118",
    "subject_type": "owl:Class"
  } ]
}
```

Can the enum that restricts subject_type be defined at all so that its values are uriorcurie instances, and so that, if I translate into RDF, I get:

```turtle
[] a sssom:Mapping;
sssom:subject_type <http://www.w3.org/2002/07/owl#Class> ;
sssom:subject_id <http://purl.obolibrary.org/obo/HP_0000118> .
```

@sierra-moxon
Contributor

sierra-moxon commented Oct 13, 2023

I think so, yeah; it's just that not all generators are feature-complete. The OWL generator code that Harshad and I pointed to is what I think we need to add to rdfgen (you are using rdfgen in SSSOM, right?).

@matentzn
Collaborator Author

Thank you. It is good to hear that this is not a conceptual issue. So, forgetting about RDF for now: can I somehow specify the base type of the enum, stating that its values are instances of, say, uriorcurie? Or is this currently not possible in the spec?

@sierra-moxon
Contributor

sierra-moxon commented Oct 13, 2023

No, we can't specify the range of a permissible value at this point, but I like the idea. The range of "meaning" is restricted to uriorcurie, however, so we can assume that the meaning can be translated in a constrained way, depending on the serialization.

@matentzn
Collaborator Author

matentzn commented Nov 8, 2023

@gouttegd, can you articulate your position on this? Do you prefer "owl class" as the rendering of the slot value in TSV and JSON?

@gouttegd
Contributor

gouttegd commented Nov 8, 2023

@matentzn It’s not a matter of what I prefer.

First, I think such a breaking change at this stage, so close to 1.0, would be harmful, for little to no benefit, especially since what you seem to want should be achievable without a breaking change (more on that later).

Consider that it has already been more than one year since match_type and the associated enum were replaced by mapping_justification and the SEMAPV vocabulary. There have been 9 releases of the SSSOM schema since then. And yet, as of today, we still find mapping sets that use the old slot and/or the old enum.

I don’t know why you prefer

```
subject_id   ...   subject_type
HP:0000118   ...   owl:Class
```

over

```
subject_id   ...   subject_type
HP:0000118   ...   owl class
```

but it is almost certain that the second form will still be present in the wild for years to come. So I’d recommend not changing the enum now.

What seems to really bother you is that you would like the enum value to be rendered as an IRI when serialising to RDF, am I correct?

Well then, just amend the spec to state that, when serialising to RDF, a value of type entity_type_enum should be serialised as the IRI indicated in the meaning field associated with the value (so, owl class should be serialised as http://www.w3.org/2002/07/owl#Class, etc.). That’s it. No need to break anything.

Arguably, this could even be specified once and for all at the LinkML level, not only for SSSOM and not only for entity_type_enum: when serialising to RDF, if the permissible values of an enum have a meaning field, then values of that enum should be serialised not as the string representation of the value but as whatever entity is referenced in the meaning field for each value.

This would seem to me like a perfectly reasonable behaviour, and actually the behaviour that I would expect, because otherwise I don’t quite see the point of the meaning field. The doc says it “allows enums to be backed by external ontologies”, but what does that mean exactly?
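The rule proposed above (serialise an enum value as the entity in its meaning field when one exists, otherwise as a string) can be sketched as follows. This is a hypothetical illustration, not LinkML's implementation: the enum is reduced to a dict from permissible-value text to meaning CURIE, only three values of entity_type_enum are included, and `to_rdf_term`, `PREFIX_MAP` and the `"unmapped value"` entry are made up for the example.

```python
from typing import Optional

# Subset of entity_type_enum: permissible value text -> meaning CURIE.
# "unmapped value" is a hypothetical entry without a meaning, for illustration.
ENTITY_TYPE_ENUM: dict[str, Optional[str]] = {
    "owl class": "owl:Class",
    "skos concept": "skos:Concept",
    "rdfs literal": "rdfs:Literal",
    "unmapped value": None,
}

# Hypothetical prefix map used to expand the meaning CURIEs.
PREFIX_MAP: dict[str, str] = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def to_rdf_term(
    value: str, enum_def: dict[str, Optional[str]], prefix_map: dict[str, str]
) -> str:
    """Return a Turtle term for an enum value.

    If the permissible value has a meaning, emit that entity as an IRI;
    otherwise fall back to a plain string literal (no escaping, for brevity).
    """
    if value not in enum_def:
        raise ValueError(f"{value!r} is not a permissible value")
    meaning = enum_def[value]
    if meaning is None:
        return f'"{value}"'
    prefix, _, local = meaning.partition(":")
    return f"<{prefix_map[prefix]}{local}>"


if __name__ == "__main__":
    for text in ("owl class", "skos concept", "unmapped value"):
        print(text, "->", to_rdf_term(text, ENTITY_TYPE_ENUM, PREFIX_MAP))
```

In this sketch the TSV and JSON forms keep the human-readable text (`owl class`); only the RDF serialiser consults meaning, so the enum definition itself does not need to change.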

But my real concern is with this:

> The question is simply: can an instance of an enum (a value) be an entity reference (uriorcurie), or not?

Please don’t. An enum value is of the type of that particular enum, nothing more, nothing less. Each language has its own way of representing enumerations – in particular, in many languages, they are integers. Specifying straight in the data model that enum values should be of a certain type is only going to make supporting SSSOM in non-Python languages even more complicated.

Basically the point I am trying to make is: let enum values be opaque values and if you want some of them to be serialised in a certain way, enforce that at the level of the parser/serialiser, not at the level of the data model.
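To make the "opaque values" argument concrete, here is a minimal Python sketch under assumed names (`EntityType`, `TSV_TEXT`, `TSV_PARSE`, `RDF_IRI`, `parse_tsv` and `to_rdf` are all hypothetical, and only three members are shown). The enum members are plain auto-assigned integers with no type beyond the enum itself; each serialiser owns its own table for reading or writing them.

```python
from enum import Enum, auto


class EntityType(Enum):
    """Opaque enum: the auto-assigned integers carry no meaning of their own."""

    OWL_CLASS = auto()
    OWL_OBJECT_PROPERTY = auto()
    SKOS_CONCEPT = auto()


# TSV/JSON serialiser table: member -> permissible value text.
TSV_TEXT = {
    EntityType.OWL_CLASS: "owl class",
    EntityType.OWL_OBJECT_PROPERTY: "owl object property",
    EntityType.SKOS_CONCEPT: "skos concept",
}
# Reverse table used by the TSV/JSON parser.
TSV_PARSE = {text: member for member, text in TSV_TEXT.items()}

# RDF serialiser table: member -> IRI taken from the schema's meaning field.
RDF_IRI = {
    EntityType.OWL_CLASS: "http://www.w3.org/2002/07/owl#Class",
    EntityType.OWL_OBJECT_PROPERTY: "http://www.w3.org/2002/07/owl#ObjectProperty",
    EntityType.SKOS_CONCEPT: "http://www.w3.org/2004/02/skos/core#Concept",
}


def parse_tsv(text: str) -> EntityType:
    """Read a TSV cell into an enum member (KeyError if not permissible)."""
    return TSV_PARSE[text]


def to_rdf(member: EntityType) -> str:
    """Write an enum member as a Turtle IRI term."""
    return f"<{RDF_IRI[member]}>"


if __name__ == "__main__":
    member = parse_tsv("owl class")
    print(member, TSV_TEXT[member], to_rdf(member))
```

The data model only knows `EntityType`; whether a member is written as `owl class` or as an IRI is decided entirely by the serialiser tables, so a port to another language could use its own native enum representation.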

@cmungall
Contributor

@gouttegd is correct. In fact, the current behavior for RDF serialization is to use the meaning URI if present; JSON and Python use the text. This needs to be clarified in the docs: https://linkml.io/linkml/schemas/enums.html#mapping-permissible-values-to-ontologies

@cmungall
Contributor

We discussed this on the linkml. We weren't totally clear whether this pertained to schema or data. Note that rdfgen, which was mentioned (and all the *gens), is for schema conversion. However, I think this is actually about data conversion (linkml-convert); is that right?

Either way, meaning, if present, is used to generate a URI, whether for representing PVs in a schema or in data.

@matentzn
Collaborator Author

OK. Let's leave things as they are, then. I have slept on the matter for a few nights now, and I am fine with not making a change. I will have to see what that means for the JSON serialisation, because I long ago forgot to check whether it is based on JSON-LD (in which case I would expect it to round-trip with RDF) or on some other standard representation.

@matentzn closed this as not planned on Nov 11, 2023.