
Issue 19 analysis note

Thomas Francart edited this page Sep 27, 2023 · 3 revisions

Analysis note on how to articulate GSBPM and GSIM in COOS

Context: in COOS, GSBPM is modelled as a SKOS taxonomy whose concepts are instances of coos:ActivityCategory, itself a subClassOf skos:Concept. GSIM, being a data model, is captured in COOS as a set of OWL classes and properties. The question is how to express that a given GSBPM subProcess may, but need not, take instances of given GSIM classes as input. The difficulty is aligning SKOS-world entities with OWL-world entities.

Variant 1 : pure OWL implementation

# declaration of a defined class under coos:SubProcess
<http://id.unece.org/def/coos/SelectSample_SubProcess> rdf:type owl:Class ;
   rdfs:subClassOf coos:SubProcess ;
   # formal equivalence with the GSBPM ActivityCategory value
   owl:equivalentClass [ rdf:type owl:Restriction ;
                         owl:onProperty dcterms:type ;
                         owl:hasValue <http://id.unece.org/activities/subProcess/4.1>
                       ] ;
   # Every value of prov:used on a member of this class must be an instance of GSIM class A or B
   rdfs:subClassOf [
      rdf:type owl:Restriction ;
      owl:onProperty prov:used ;
      owl:allValuesFrom [
        rdf:type owl:Class ;
        owl:unionOf ( coos:A
                     coos:B
                   )
      ];              
   ]
.

Advantages:

  • Formal OWL sets are defined
  • The restriction can be interpreted by an OWL reasoner and is as formal as it can be

Disadvantages:

  • This requires writing one new defined OWL class for each GSBPM type we want to describe. Basically we are duplicating the GSBPM taxonomy as OWL classes. Note: if the GSBPM had been modelled as OWL classes (subclasses of coos:ActivityCategory), we would not have this problem
  • The constraint is "hard" (yes/no): instances of such subProcesses with prov:used values outside of A or B would be incorrect or yield reasoning errors. In practice, however, subProcesses may take as input something other than GSIM entities.
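
To make the last point concrete, here is a minimal sketch of instance data that falls outside the restriction. The names ex:mySubProcess, ex:myQuestionnaire and ex:Questionnaire are hypothetical, and the coos: prefix IRI is assumed to be the one used in the SHACL examples on this page.

```turtle
@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix coos:    <http://id.unece.org/def/coos#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.org/> .

# Hypothetical instance: its dcterms:type is GSBPM 4.1, so a reasoner
# classifies it as a member of SelectSample_SubProcess (owl:hasValue restriction)
ex:mySubProcess a coos:SubProcess ;
   dcterms:type <http://id.unece.org/activities/subProcess/4.1> ;
   prov:used ex:myQuestionnaire .

# Hypothetical input that is asserted to be neither a coos:A nor a coos:B
ex:myQuestionnaire a ex:Questionnaire .
```

Because OWL works under the open-world assumption, a reasoner does not reject this data as such: it infers that ex:myQuestionnaire belongs to the union of coos:A and coos:B. An inconsistency is only reported if ex:Questionnaire is declared disjoint with both classes.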

Variant 2 : OWL implementation with an annotation

In this variant the formal restriction on prov:used is replaced by a "loose" annotation with no formal semantics. The formal equivalence with the GSBPM value is kept.

# declaration of a defined class under coos:SubProcess
<http://id.unece.org/def/coos/SelectSample_SubProcess> rdf:type owl:Class ;
   rdfs:subClassOf coos:SubProcess ;
   # formal equivalence with the GSBPM ActivityCategory value
   owl:equivalentClass [ rdf:type owl:Restriction ;
                         owl:onProperty dcterms:type ;
                         owl:hasValue <http://id.unece.org/activities/subProcess/4.1>
                       ] ;
   # Annotation indicating that input "includes" (but is not restricted to) those 2 classes
   coos:inputIncludes coos:A, coos:B ;
.

The semantics of coos:inputIncludes are inspired by schema:domainIncludes / schema:rangeIncludes (and also dcam:domainIncludes, see https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/dcam/domainIncludes).
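
As an illustration, the property itself could be declared as follows. This is a sketch only: the declaration as an owl:AnnotationProperty, the label and the comment are assumptions, not part of the current COOS ontology.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix coos: <http://id.unece.org/def/coos#> .

# Assumed declaration: an annotation property carries no formal semantics,
# so an OWL reasoner draws no conclusion from it
coos:inputIncludes a owl:AnnotationProperty ;
   rdfs:label "input includes"@en ;
   rdfs:comment "Relates a subProcess type to a GSIM class whose instances it may take as input. Other inputs remain allowed."@en .
```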

Advantages:

  • Formal OWL sets are defined
  • The restriction is easy to assert
  • The restriction is "loose" (instances of other classes are allowed)

Disadvantages:

  • The annotation cannot be interpreted by an OWL reasoner
  • This requires writing one new defined OWL class for each GSBPM type we want to describe. Basically we are duplicating the GSBPM taxonomy as OWL classes. Note: if the GSBPM had been modelled as OWL classes (subclasses of coos:ActivityCategory), we would not have this problem

Variant 3 : SKOS implementation

In this variant the link between GSBPM and GSIM is expressed at the SKOS level, on the GSBPM concepts.

# remember coos:ActivityCategory is subClassOf skos:Concept
<http://id.unece.org/activities/subProcess/4.1> a coos:ActivityCategory ;
    coos:inputIncludes coos:A, coos:B ;
    # coos:outputIncludes ...
.

Advantages:

  • The restriction is easy to assert
  • The coos:inputIncludes property would be a subPropertyOf skos:mappingRelation. This would implicitly make GSIM classes instances of skos:Concept, as per the range definition of skos:semanticRelation, which is absolutely not a problem
  • The restriction is "loose" (instances of other classes are allowed)

Disadvantages:

  • The link cannot be interpreted by an OWL reasoner as a constraint on inputs
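
The SKOS alignment mentioned in the advantages could be sketched as follows. The declaration of coos:inputIncludes as an owl:ObjectProperty is an assumption made for this variant; the commented-out triples are what an RDFS or OWL reasoner would entail, they are not asserted.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix coos: <http://id.unece.org/def/coos#> .

# Assumed declaration for this variant
coos:inputIncludes a owl:ObjectProperty ;
   rdfs:subPropertyOf skos:mappingRelation .

# skos:mappingRelation is a sub-property of skos:semanticRelation, whose
# rdfs:domain and rdfs:range are skos:Concept. With the statement
#   <http://id.unece.org/activities/subProcess/4.1> coos:inputIncludes coos:A, coos:B .
# a reasoner therefore entails:
#   coos:A a skos:Concept .
#   coos:B a skos:Concept .
```

The GSIM classes are then used both as classes and as individuals, which OWL 2 permits through punning.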

Variant 4 : SHACL implementation - one rule per subProcess type

@prefix sh:      <http://www.w3.org/ns/shacl#>.
@prefix prov:    <http://www.w3.org/ns/prov#>.
@prefix coos:    <http://id.unece.org/def/coos#>.
@prefix dcterms: <http://purl.org/dc/terms/>.
@prefix myshape: <http://id.unece.org/def/coos_shapes#>.

myshape:TaskInsideSubProcess_4_1 a sh:NodeShape ;
   sh:target [
      a sh:SPARQLTarget ;
      sh:select """
         PREFIX dcterms: <http://purl.org/dc/terms/>
         PREFIX coos: <http://id.unece.org/def/coos#>
         SELECT ?this
         WHERE {
            # Any task inside a subprocess of type 4.1
            ?this a coos:Task .
            ?this (dcterms:isPartOf|^dcterms:hasPart)* ?sp_41 .
            ?sp_41 dcterms:type <http://id.unece.org/activities/subProcess/4.1> .
         }
      """
   ] ;
   sh:property [
      # SHOULD have instances of A or B as value for prov:used
      sh:path prov:used ;
      sh:or (
         [sh:class coos:A]
         [sh:class coos:B]
      ) ;
      # This is just a warning
      sh:severity sh:Warning ;
   ]
.

Advantages:

  • This is decoupled from the actual OWL ontology and placed in a separate file
  • Constraint is expressed as a Warning, not a strong Violation
  • This is SHACL-interpretable and can be given to a SHACL validator (e.g. https://shacl-play.sparna.fr/play/)

Disadvantages:

  • Requires writing and maintaining one rule per subProcess type
  • The link between GSBPM and GSIM is hidden in the shapes file and is not part of the GSBPM taxonomy
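
To illustrate which nodes the SPARQL target of this variant selects, here is a small sketch of data. All ex: names are hypothetical; coos:A stands for a GSIM class as in the shape above.

```turtle
@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix coos:    <http://id.unece.org/def/coos#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.org/> .

# A subprocess of GSBPM type 4.1 containing one task
ex:selectSample a coos:SubProcess ;
   dcterms:type <http://id.unece.org/activities/subProcess/4.1> ;
   dcterms:hasPart ex:drawSample .

# Selected by the target: ex:selectSample is reachable through ^dcterms:hasPart
ex:drawSample a coos:Task ;
   prov:used ex:frame , ex:notes .

ex:frame a coos:A .     # satisfies the sh:or constraint
ex:notes a ex:Memo .    # neither coos:A nor coos:B: reported with severity sh:Warning
```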

Variant 5 : SHACL implementation - one single rule

This is a combination of variants 3 and 4.

In the OWL file:

# remember coos:ActivityCategory is subClassOf skos:Concept
<http://id.unece.org/activities/subProcess/4.1> a coos:ActivityCategory ;
    coos:inputIncludes coos:A, coos:B ;
    # coos:outputIncludes ...
.

And in the SHACL file:

@prefix sh: <http://www.w3.org/ns/shacl#>.
@prefix prov:    <http://www.w3.org/ns/prov#>.
@prefix coos:    <http://id.unece.org/def/coos#>.
@prefix dcterms: <http://purl.org/dc/terms/>.
@prefix coosshapes: <http://id.unece.org/def/coos_shapes#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.

# 1. Declare a constraint component
coosshapes:CheckCoosInputIncludesComponent
  a sh:ConstraintComponent ;
  # declare the parameter to use to trigger that constraint
  sh:parameter [
    sh:path coosshapes:isIncludedInput ;
    sh:name "is included input" ;
    sh:description "Set to true to verify that the value has a class indicated by the coos:inputIncludes annotation on the dcterms:type of this instance" ;
    sh:datatype xsd:boolean
  ] ;
  sh:labelTemplate "The input does not have an expected type" ;
  # link to validator
  sh:propertyValidator coosshapes:CheckCoosInputIncludesValidator .

# 2. The corresponding validator
coosshapes:CheckCoosInputIncludesValidator
  a sh:SPARQLSelectValidator ;
  sh:message "{?value} is not an instance of one of the expected input classes" ;
  # The SPARQL query
  sh:select """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dcterms: <http://purl.org/dc/terms/>
    PREFIX coos: <http://id.unece.org/def/coos#>
    SELECT $this ?value WHERE {
      # "$this" is bound to the node being checked, in our case most often the task or subprocess
      # "$PATH" is replaced by the value of sh:path, in our case most often prov:used
      $this $PATH ?value .
      # Trigger a violation unless the class of the value, or one of its super-classes,
      # can be found as a value of coos:inputIncludes on the dcterms:type of the task/subprocess.
      # The type lookup sits inside NOT EXISTS so that a non-matching super-class
      # does not flag a value whose direct class is allowed.
      FILTER NOT EXISTS {
         $this dcterms:type ?gsbpmConcept .
         ?gsbpmConcept coos:inputIncludes ?valueClass .
         ?value rdf:type/rdfs:subClassOf* ?valueClass .
      }
      # This is just so that if the parameter is set to false in the shape,
      # The validation will not be triggered
      FILTER($isIncludedInput)
    }""" .

# 3. Define our shape with this constraint component

coosshapes:SubProcess a sh:NodeShape ;
   # shapes apply to all coos:SubProcess
   sh:targetClass coos:SubProcess ;
   sh:property [
      # This rule will trigger a simple Warning, this is not a strong Violation
      sh:severity sh:Warning ;
      # on property prov:used (this will be used as the value of ?PATH variable)
      sh:path prov:used ;
      # checks that the rdf:type of values are correct according to our rule
      coosshapes:isIncludedInput true ;
   ]
.

Advantages:

  • This is a combination of variants 3 and 4
  • Link between GSBPM and GSIM is really part of the GSBPM taxonomy
  • Constraint is expressed as a Warning, not a strong Violation
  • This is SHACL-interpretable and can be given to a SHACL validator (e.g. https://shacl-play.sparna.fr/play/)

Disadvantages:

  • none :-)

This can be tested with the following test data, which will trigger one warning (copy and paste the SHACL and the data into the form at https://shacl-play.sparna.fr/play/validate):

@prefix prov:    <http://www.w3.org/ns/prov#>.
@prefix coos:    <http://id.unece.org/def/coos#>.
@prefix dcterms: <http://purl.org/dc/terms/>.
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex: <http://exemple.fr#>.

# Some example data
<http://id.unece.org/activities/subProcess/4.1> a skos:Concept ;
    coos:inputIncludes coos:Dataset, coos:Product ;
.

# This one is correct
ex:myCorrectSubProcess a coos:SubProcess ;
   dcterms:type <http://id.unece.org/activities/subProcess/4.1> ;
   prov:used ex:myDataset ;
.
ex:myDataset a coos:Dataset .

# This one is incorrect
ex:myINCorrectSubProcess a coos:SubProcess ;
   dcterms:type <http://id.unece.org/activities/subProcess/4.1> ;
   prov:used ex:somethingElse ;
.
ex:somethingElse a ex:AnotherType .
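
For orientation, the resulting validation report should look roughly like the sketch below. It is written by hand from the SHACL vocabulary, not copied from a validator run: blank node structure, the sh:sourceShape value and the result message vary between validators and are left out.

```turtle
@prefix sh:         <http://www.w3.org/ns/shacl#> .
@prefix prov:       <http://www.w3.org/ns/prov#> .
@prefix ex:         <http://exemple.fr#> .
@prefix coosshapes: <http://id.unece.org/def/coos_shapes#> .

[] a sh:ValidationReport ;
   # sh:conforms is false as soon as there is any result, even a warning
   sh:conforms false ;
   sh:result [
      a sh:ValidationResult ;
      sh:resultSeverity sh:Warning ;
      sh:focusNode ex:myINCorrectSubProcess ;
      sh:resultPath prov:used ;
      sh:value ex:somethingElse ;
      sh:sourceConstraintComponent coosshapes:CheckCoosInputIncludesComponent
   ] .
```

Consumers that only want to block on hard errors should therefore filter on sh:resultSeverity rather than rely on sh:conforms.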