TTL version of the vocabulary needs improvement #84

Open
reckart opened this issue Mar 11, 2019 · 3 comments

@reckart

reckart commented Mar 11, 2019

It seems to me as if the TTL version (or maybe all LD versions) of the LAPPS vocabulary could use some refactoring.

My understanding is that these should represent a schema (based on OWL and/or RDFS). As such, the LAPPS types would be classes (rdfs:Class or owl:Class) and their attributes would be properties (rdf:Property, owl:DatatypeProperty or owl:ObjectProperty).

Let's take http://vocab.lappsgrid.org/Token as an example. The current TTL file says:

<http://vocab.lappsgrid.org/Token>
        a                owl:Class , rdfs:Class , rdfs:Resource ;
        rdfs:comment     "A string of one or more characters that serves as an indivisible unit for the purposes of morpho-syntactic labeling (part of speech tagging)." ;
        rdfs:subClassOf  <http://vocab.lappsgrid.org/Region> , <http://vocab.lappsgrid.org/Token> , <http://vocab.lappsgrid.org/Annotation> , <http://vocab.lappsgrid.org/Thing> ;
       <http://vocab.lappsgrid.org/Token#pos>
                "String or URI" .

<http://vocab.lappsgrid.org/Token#pos>
        a             owl:DatatypeProperty ;
        rdfs:comment  "Part-of-speech tag associated with the token." .

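The inheritance information is highly redundant: Token is typed as rdfs:Class and rdfs:Resource in addition to owl:Class, and it is declared a subclass of itself and of every ancestor rather than only of its direct parent. In addition, the triple <http://vocab.lappsgrid.org/Token> <http://vocab.lappsgrid.org/Token#pos> "String or URI" does not express in RDFS or OWL that Token has an attribute called pos which can take a String or URI. It merely asserts that the Token class itself has the literal "String or URI" as its pos value.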

I believe a better representation would be e.g.

<http://vocab.lappsgrid.org/Token>
        a                owl:Class ;
        rdfs:comment     "A string of one or more characters that serves as an indivisible unit for the purposes of morpho-syntactic labeling (part of speech tagging)." ;
        rdfs:subClassOf  <http://vocab.lappsgrid.org/Region> .

<http://vocab.lappsgrid.org/Token#pos>
        a             owl:DatatypeProperty ;
        rdfs:comment  "Part-of-speech tag associated with the token." ;
        rdfs:domain <http://vocab.lappsgrid.org/Token> ;
        rdfs:range xsd:string .

I removed the redundant (inferable) information from the a and rdfs:subClassOf statements and expressed the value type as rdfs:range.

However, there is still a small problem here: this does not express that the range can be a "String or URI". Specifying multiple types as the range indicates an intersection of the types (which would be empty in this case), not a disjunction. That is why I only put the "more generic" type xsd:string here.
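One possible way to express the disjunction, offered as a sketch rather than something tested against the LAPPS tooling, is an OWL 2 union data range. This assumes that consumers of the vocabulary understand OWL 2 data ranges (plain RDFS tools will only see a blank node as the range) and that URI values are meant to be xsd:anyURI literals; the prefix declarations are added here so the fragment parses on its own:

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

<http://vocab.lappsgrid.org/Token#pos>
        a             owl:DatatypeProperty ;
        rdfs:comment  "Part-of-speech tag associated with the token." ;
        rdfs:domain   <http://vocab.lappsgrid.org/Token> ;
        # Union data range: a value may be an xsd:string or an xsd:anyURI literal.
        rdfs:range    [ a            rdfs:Datatype ;
                        owl:unionOf  ( xsd:string xsd:anyURI ) ] .
```

If the URI values are instead meant to be resources rather than literals, a datatype property would not fit and the modelling would need to be revisited.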

@ksuderman

The RDF, OWL, JSONLD, and TTL files are generated by Apache Jena from the same data model, and I notice that the OWL, JSONLD, and TTL files all contain redundant inheritance declarations while the RDF file does not. The only difference in how the files are generated is the value of the RDFFormat parameter passed to the RDFDataMgr.write() method. OntClass.setSuperClass(Resource) is only called once per class. We are using an old version of Jena, so hopefully simply updating the dependency will correct this.

The code that generates the property definitions is just plain buggy.

Both issues will be fixed in lappsgrid-incubator/vocabulary-dsl#10

@ksuderman

ksuderman commented Mar 17, 2019

@reckart I have deployed a test version to http://vocab.lappsgrid.org/1.3.0-SNAPSHOT for comment and review. In particular the RDF files are at http://vocab.lappsgrid.org/1.3.0-SNAPSHOT/lapps-vocabulary.ttl (et al).

All of the generated RDF files had the same redundant information, because the default Jena model uses a Reasoner that generates every triple it can infer. The redundant triples are removed by specifying a model that does not do inferencing.

The domain and range of properties should now be specified correctly.

Note: There are two definitions for Morphology included (http://vocab.lappsgrid.org/1.3.0-SNAPSHOT/Morphology and http://vocab.lappsgrid.org/1.3.0-SNAPSHOT/Token#morph). These are included only to test the schema generation and file deployment and do not represent how the WSEV may eventually represent morphological annotations.

@ksuderman

NOTE: Updated URLs now contain -SNAPSHOT.
