Review of COOS model Thomas Francart 2021 09 16

Thomas Francart edited this page Jul 6, 2022 · 4 revisions

COOS Ontology Review

This is a quick review of the COOS Ontology 20210916 -- https://github.com/linked-statistics/coos

MoSCoW is used to clearly indicate the priority/severity of each finding.

Author : thomas.francart [at] sparna.fr

Findings

Documentation

  • If the model is drawn from the GAMSO and GSBPM specification documents, a reference to the corresponding section/paragraph of these specifications should be placed on the classes and properties derived from them

--> Do you mean using rdfs:isDefinedBy or with a textual annotation?

--> [Thomas 2022-07-06] Not rdfs:isDefinedBy, rather dcterms:source pointing to the exact version of the spec where the entity is originally defined.
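
A minimal sketch of what such an annotation could look like. The coos: namespace and the specification IRI below are placeholders (assumptions, not the actual identifiers); the point is only that dcterms:source targets a precise, versioned location in the spec.

```turtle
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix coos:    <http://example.org/coos#> .   # placeholder namespace, not the real COOS IRI

# Placeholder IRI standing in for an exact version and section of the source specification
coos:SubProcess dcterms:source <http://example.org/specs/GSBPM/version-X#section-Y> .
```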

ProductContent / ProductPresentation

  • Classes ProductContent and ProductPresentation MUST be declared subClassOf skos:Concept as they correspond to skos:Concept (from a given skos:ConceptScheme)

--> Done (commit #473ed72)

  • The equivalences between ProductContent and ProductPresentation SHOULD be declared with a formal OWL equivalence (owl:equivalentClass) with a hasValue restriction on the corresponding concept scheme.

--> What do you mean by equivalences between ProductContent and ProductPresentation?

--> [Thomas 2022-07-06] I meant an equivalence between the owl:Class and the corresponding skos:ConceptScheme. Sorry, I think this applied to ProductContent and to ProductPresentation, but not between ProductContent and ProductPresentation.
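
One possible way to write the class-to-scheme equivalence is sketched below. It assumes membership in the scheme is expressed with skos:inScheme; the coos: namespace and the scheme IRI ex:productContents are placeholders introduced for illustration.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix coos: <http://example.org/coos#> .      # placeholder namespace
@prefix ex:   <http://example.org/schemes/> .   # hypothetical concept scheme namespace

# ProductContent is exactly the set of resources that are in the given concept scheme
coos:ProductContent a owl:Class ;
	rdfs:subClassOf skos:Concept ;
	owl:equivalentClass [
		a owl:Restriction ;
		owl:onProperty skos:inScheme ;
		owl:hasValue ex:productContents
	] .
```

With such an axiom, an OWL reasoner would classify any resource stating skos:inScheme ex:productContents as a coos:ProductContent, and conversely. The same pattern would apply to ProductPresentation with its own scheme.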

  • A skos:definition CAN be added to ProductContent / ProductPresentation to indicate that a skos:ConceptScheme with values is proposed, but not enforced, and that additional values can be introduced by implementors

--> Done (commit #473ed72)

StatisticalInformationObject

  • StatisticalInformationObject MUST be declared as subClassOf InformationObject

--> Done (commit #67ee5c4)

Activity / StatisticalActivity

  • I sense there is a confusion between the notion of an Activity that actually happened, or that is planned to happen, and the type of these activities as described in GSBPM. Here the 2 notions are merged in the class "StatisticalActivity", which covers both UML classes from GSIM (Program / ProgramCycle) that are intended to be instantiated by users, and classes from GSBPM (SubProcesses) that are instantiated with types of activities from GSBPM nomenclatures, but will not be instantiated directly by users.
insee:FrenchCensus a coos:StatisticalProgram, coos:StatisticalActivity ;
	# also by transitivity : rdf:type coos:Activity, prov:Activity, skos:Concept ;
	skos:prefLabel "recensement français"@fr ;
.

insee:FrenchCensus2021 a coos:StatisticalProgramCycle, coos:StatisticalActivity ;
	# also by transitivity : rdf:type coos:Activity, prov:Activity, skos:Concept ;
	skos:prefLabel "recensement français 2021"@fr ;
	# potential link between cycle and program, not in COOS
	# coos:isCycleOf insee:FrenchCensus ;
.

insee:SendQuestionnairesForFrenchCensus2021 a coos:Task, coos:StatisticalActivity ;
	# also by transitivity : rdf:type coos:Activity, prov:Activity, skos:Concept ;
	skos:prefLabel "Envoi des questionnaires pour le recensement 2021"@fr ;
	# link between a Task and a SubProcess
	xkos:classifiedUnder <http://id.unece.org/activities/subProcess/4.3> ;
.

<http://id.unece.org/activities/subProcess/4.3>
	a coos:SubProcess ;
	a coos:StatisticalProductionActivity ;
	# also by transitivity : rdf:type coos:Activity, prov:Activity, skos:Concept ;
	# this is inconsistent as both the instances and the types of these instances use the same class
	a coos:StatisticalActivity ;
	skos:broader <http://id.unece.org/activities/phase/4> ;
.
  • I think a distinction MUST be made between coos:Activity, which is to be instantiated in the model, and a new coos:ActivityType class, whose instances are the skos:Concept entries from GSBPM. coos:Activity is a subClassOf prov:Activity, while coos:ActivityType is not. prov:Activity is used to describe activities that really happened at a certain time, carried out by a certain agent, not "types of activities"

    • Note : the existing class organization would be correct if subProcesses instances were defined as owl:Class under coos:SubProcess and not as skos:Concept instances of these classes.
  • A possible refactoring could be :

    1. Introduce class "ActivityCategory", subClassOf skos:Concept
    2. Move classes OverarchingActivitives, Phase, SubProcess under "ActivityCategory"
    3. Remove the class "Activity" from under skos:Concept
    4. Introduce a property "activityCategory", domain Activity, range ActivityCategory (or specify that e.g. dct:type or xkos:classifiedUnder should be used for this)
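
The four steps above could be sketched as follows. This is only an illustration of the proposal: coos:ActivityCategory and coos:activityCategory do not exist in COOS today, and the coos: and insee: namespaces are placeholders.

```turtle
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .
@prefix prov:  <http://www.w3.org/ns/prov#> .
@prefix coos:  <http://example.org/coos#> .     # placeholder namespace
@prefix insee: <http://example.org/insee/> .    # placeholder namespace

# 1. New class grouping the taxonomy entries (hypothetical name from the proposal)
coos:ActivityCategory a owl:Class ;
	rdfs:subClassOf skos:Concept .

# 2. Taxonomy classes move under ActivityCategory
coos:Phase      rdfs:subClassOf coos:ActivityCategory .
coos:SubProcess rdfs:subClassOf coos:ActivityCategory .

# 3. Activity stays a prov:Activity but is no longer a skos:Concept
coos:Activity a owl:Class ;
	rdfs:subClassOf prov:Activity .

# 4. Property relating an actual activity to its category (hypothetical name)
coos:activityCategory a owl:ObjectProperty ;
	rdfs:domain coos:Activity ;
	rdfs:range  coos:ActivityCategory .

# An activity that really happened, categorized by a GSBPM sub-process
insee:SendQuestionnairesForFrenchCensus2021 a coos:Activity ;
	coos:activityCategory <http://id.unece.org/activities/subProcess/4.3> .
```

With this split, the sub-process 4.3 is typed only as a category, and the inconsistency noted in the earlier example (instances and their types sharing the same class) no longer arises.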

StatisticalProduct

  • Would it be relevant to declare StatisticalProduct a subClassOf StatisticalEntity ?
  • If StatisticalProduct is a GSIM class, it MUST be declared subClassOf InformationObject, or StatisticalInformationObject, or StatisticalEntity

--> Done in the meantime

Capability

  • As with "Activity", probably the class Capability MUST be removed from under skos:Concept
  • Whether a separation between Capability and CapabilityCategory would make sense remains to be analyzed

StatisticalProgram / StatisticalProgramCycle

  • The 2 classes SHOULD be linked with a property, equivalent to the GSIM property "has". Having the 2 classes without a property that links them prevents constructing a consistent Knowledge Graph.
  • Where do you draw the line on which GSIM items are in scope for COOS ? (answer : only the items that bridge the 2 models)

--> Issue created on this subject
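
As an illustration of the first point, a linking property could be declared along these lines. The name coos:isCycleOf is hypothetical (it is the one used in the commented-out line of the census example earlier in this review), and the namespaces are placeholders.

```turtle
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix coos:  <http://example.org/coos#> .     # placeholder namespace
@prefix insee: <http://example.org/insee/> .    # placeholder namespace

# Hypothetical property, read in the inverse direction of the GSIM "has" relation
coos:isCycleOf a owl:ObjectProperty ;
	rdfs:domain coos:StatisticalProgramCycle ;
	rdfs:range  coos:StatisticalProgram .

# The 2021 cycle is now connected to its program in the graph
insee:FrenchCensus2021 coos:isCycleOf insee:FrenchCensus .
```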

Application Profile

  • I understand COOS is supposed to be used in combination with other models, like SKOS or PROV, or ORG. But this is not specified / described anywhere. The ontology SHOULD be complemented with an application profile (specified with SHACL) to indicate how conformant graph data must be created.
  • In particular, COOS does not define many properties to relate the classes it declares, so it is currently impossible/very hard to understand how the classes need to be articulated together.
  • Note : if an application profile is introduced, a clear statement should be made as to whether implementations or local extensions of COOS must comply with the application profile. In other words, it is the application profile that becomes the "interoperability contract", and not the OWL ontology.

--> agreed
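
To give an idea of what a SHACL application profile could contain, here is a minimal sketch of a single shape. The constraints are assumptions chosen for illustration (a label and a GSBPM classification on each Task), not requirements stated by COOS; the coos: and ex: namespaces are placeholders.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix xkos: <http://rdf-vocabulary.ddialliance.org/xkos#> .
@prefix coos: <http://example.org/coos#> .      # placeholder namespace
@prefix ex:   <http://example.org/shapes/> .    # hypothetical shapes namespace

ex:TaskShape a sh:NodeShape ;
	sh:targetClass coos:Task ;
	# every Task carries at least one label
	sh:property [
		sh:path skos:prefLabel ;
		sh:minCount 1
	] ;
	# every Task is classified under at least one SubProcess
	sh:property [
		sh:path xkos:classifiedUnder ;
		sh:class coos:SubProcess ;
		sh:minCount 1
	] .
```

Shapes like this one make explicit how COOS classes combine with SKOS, XKOS or PROV properties, which the OWL ontology alone does not say.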

StatisticalDataset

  • The wording "StatisticalDataset" tends to lead non-expert readers to think it includes statistical data, while I understand a "StatisticalDataset" is not necessarily this, as a StatisticalDataset can be a "metadataFor" something else, and I don't see how this could be the case for a dataset that contains statistical information. Or does it mean the domain of "metadataFor" should be dcat:Dataset rather than StatisticalDataset ?
    • Is it a Dataset that is related to some statistical data ? Or is it a Dataset that contains the result of a StatisticalProgram(Cycle) ?
  • As a parallel, see how prov:Activity > coos:Activity > coos:StatisticalActivity was modelled, and see if it is meaningful to apply this to Datasets, by introducing dcat:Dataset > coos:Dataset > coos:StatisticalDataset

--> In the meantime, it was decided to rename coos:StatisticalDataset to coos:Dataset, which clarifies a bit, but additional text will also be added to the specification (issue opened).
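
For reference, the layering suggested in the review (as opposed to the simple rename) would amount to something like the sketch below. The coos: namespace is a placeholder, and whether coos:StatisticalDataset should be kept at all is precisely the open question.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix coos: <http://example.org/coos#> .   # placeholder namespace

# Generic COOS dataset, aligned with DCAT
coos:Dataset a owl:Class ;
	rdfs:subClassOf dcat:Dataset .

# Narrower class for datasets that actually contain statistical data
coos:StatisticalDataset a owl:Class ;
	rdfs:subClassOf coos:Dataset .
```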

NationalStatisticalInstitute

  • A property CAN link a NationalStatisticalInstitute with a Country, as it is a defining criterion of this class.

--> What class could we use for Country? If we limit ourselves to a datatype property giving the ISO 3166 code, we can think of gn:countryCode but it is defined with domain gn:Feature, which is problematic here.

--> [Thomas 2022-07-06] Good question. To avoid any dependency you could declare your own class Country and a corresponding property.
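
A dependency-free declaration could look like the following sketch. All three names (coos:Country, coos:country, coos:countryCode) are hypothetical, as are the namespaces and the two example instances.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix coos: <http://example.org/coos#> .   # placeholder namespace
@prefix ex:   <http://example.org/data/> .   # hypothetical data namespace

coos:Country a owl:Class .

# Links an institute to the country it is responsible for
coos:country a owl:ObjectProperty ;
	rdfs:domain coos:NationalStatisticalInstitute ;
	rdfs:range  coos:Country .

# ISO 3166 code carried by the country itself
coos:countryCode a owl:DatatypeProperty ;
	rdfs:domain coos:Country ;
	rdfs:range  xsd:string .

ex:France a coos:Country ;
	coos:countryCode "FR" .

ex:Insee a coos:NationalStatisticalInstitute ;
	coos:country ex:France .
```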

Explicit assertion of rdf:type skos:Concept on taxonomy entries

  • All taxonomy entries MUST have an explicit "rdf:type skos:Concept" assertion. This is not the case for GAMSO or GSBPM taxonomic entries. This will ensure compatibility with SKOS tools.
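
For example, a GSBPM entry would state both types explicitly rather than relying on inference (the coos: namespace below is a placeholder):

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix coos: <http://example.org/coos#> .   # placeholder namespace

<http://id.unece.org/activities/subProcess/4.3>
	# explicit skos:Concept type, so SKOS tools without a reasoner still see the entry
	a skos:Concept , coos:SubProcess ;
	skos:broader <http://id.unece.org/activities/phase/4> .
```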

Linking InformationObject to SubProcesses

  • I refer to this comment on GitHub : https://github.com/linked-statistics/COOS/issues/19#issuecomment-872432789
  • I assume that subClasses of InformationObject will be created corresponding to GSIM entities, or that another ontology at GSIM-level would declare subclasses of the generic InformationObject COOS class. So classes like :
    • DataStructure
    • Variable
    • Population
    • ProcessMethod
    • Rules
  • These are classes because they will be instantiated in actual implementations :
ex:myVariable_X rdf:type coos:Variable .
ex:myVariable_X rdf:type someOtherOntologyThatExtendsCoos:Variable .
  • What we want to express is that "Activities that are categorized in SubProcess 2.4 Design frame and sample can have as inputs Variables, Data structures, Datasets and Populations, and can have as output Process Methods and Rules"

  • Doing it the OWL way requires the following steps :

    1. Declare subclasses of coos:Activity with an equivalence on "xkos:classifiedUnder hasValue = subProcess X.Y" :
coos:DesignFrameAndSampleActivity a owl:Class ;
	rdfs:subClassOf coos:Activity ;
	owl:equivalentClass [
		a owl:Restriction ;
		owl:onProperty xkos:classifiedUnder ;
		owl:hasValue <http://id.unece.org/activities/subProcess/2.4> ;
	] .
    2. Declare subclasses of InformationObject (DataStructure, Variable, Population, etc.)
    3. Declare properties :
        - "input" with domain = Activity and range = InformationObject
        - "output" with domain = Activity and range = InformationObject
    4. Create OWL restrictions relating subclasses of coos:Activity with subclasses of InformationObject :
coos:DesignFrameAndSampleActivity rdfs:subClassOf [
	rdf:type owl:Restriction ;
	owl:onProperty coos:input ;
	owl:allValuesFrom [ rdf:type owl:Class ;
      owl:unionOf ( coos:DataStructure
                    coos:Variable
                    coos:Population
                  )
    ]
] ;