# Modelling Provenance and Reasoning with CIDOC-CRM

When we use SKOS to model the genre systems, we can only record a single opinion: a static view of the system. The data reflects the interpretation of a certain individual or group (here: me and Peer). For example, Peer sent me a list of terms in a YAML-like format, which I then converted to SKOS without looking at the actual source. This genesis of the SKOS representation is lost. Also lost is how and why I created the mappings between concepts across the different `skos:ConceptScheme`s. The logic behind this is partly included in notebooks 03-05; the process becomes transparent there, but it is not evident from the SKOS itself.

Peer might also hold a diverging opinion about the mappings. I used string identity or similarity as the basis for the `skos:closeMatch` statements, which of course produced some false positives, e.g. `Ode` and `Oper` at a Levenshtein edit distance of 2 (`d` substituted by `p`, `r` added; with a weight of 1 per operation this amounts to an edit distance of 2). I needed a threshold of 2 because different spellings of a-umlaut (e.g. `ä` vs. `ae`) already result in a Levenshtein distance of 2. The point is: mapping the concepts involves automatic processes as well as manual decisions, and all of this goes into the final SKOS.
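The trade-off behind the threshold of 2 can be reproduced with a small sketch. This is not the code from notebooks 03-05; the function names are hypothetical, and the labels `Märchen`/`Maerchen` are an invented illustration of the a-umlaut case (only `Ode`/`Oper` comes from the actual data).

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with a cost of 1 per insertion, deletion or substitution."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # deletion
                current[j - 1] + 1,                     # insertion
                previous[j - 1] + (char_a != char_b),   # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def close_match_candidates(labels_a, labels_b, max_distance=2):
    """Return (label_a, label_b, distance) for all pairs within the threshold."""
    candidates = []
    for label_a in labels_a:
        for label_b in labels_b:
            distance = levenshtein(label_a, label_b)
            if distance <= max_distance:
                candidates.append((label_a, label_b, distance))
    return candidates


# Hypothetical label lists: one wanted match (umlaut spelling) and one
# false positive (Ode / Oper) both sit at distance 2.
for candidate in close_match_candidates(["Märchen", "Ode"], ["Maerchen", "Oper"]):
    print(candidate)
```

Because the wanted match and the false positive sit at the same distance, the threshold alone cannot separate them; this is where the manual decisions come in.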

In this notebook we try to be more transparent about the genesis of, and the rationale behind, a single `skos:ConceptScheme` and its mappings to others. We test several ways of modeling this with CIDOC-CRM. Ultimately the goal could be a system that documents itself while creating the "Genre System SKOS representation".

The following sections discuss some of the aspects of a prospective data model based on CIDOC and its extensions.

## Interpreting the source when creating a ConceptScheme

This means: modeling the bibliographic metadata. In the SKOS this information is only included as `rdfs:label` on the `skos:ConceptScheme`. There are probably other options to include it in a more structured, or at least machine-readable/semantic, way, e.g. by also using Dublin Core. //TODO: add example.
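One possible shape for such a Dublin Core description is sketched here with plain Python tuples instead of an RDF library, to keep it dependency-free. The `dcterms:source`, `dcterms:creator` and `dcterms:title` properties exist in DCMI Metadata Terms; the function name, the `example.org` URIs and the placeholder title are assumptions made purely for illustration, not values from our data.

```python
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS_CONCEPT_SCHEME = "http://www.w3.org/2004/02/skos/core#ConceptScheme"


def describe_scheme_source(scheme, source, creator_name, title):
    """Link a ConceptScheme to its bibliographic source as (s, p, o) triples."""
    return [
        (scheme, RDF_TYPE, SKOS_CONCEPT_SCHEME),
        (scheme, DCTERMS + "source", source),        # the scheme is derived from the source
        (source, DCTERMS + "creator", creator_name), # author of the source, not of the scheme
        (source, DCTERMS + "title", title),
    ]


# Hypothetical identifiers and a placeholder title.
triples = describe_scheme_source(
    scheme="https://example.org/scheme/bouterwek",
    source="https://example.org/source/bouterwek",
    creator_name="Bouterwek",
    title="(title of the source, placeholder)",
)
for triple in triples:
    print(triple)
```

Note that `dcterms:creator` is attached to the source rather than to the scheme, because the scheme itself is our interpretation.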

In the CIDOC-world we would use classes and properties of an extension used for bibliographic/library data; we use LRMoo.



![image](./img/01_bibl_skos_concept.png)

How can we now connect the `skos:Concept` (which is a `crm:E55_Type`) and the `skos:ConceptScheme` (a `crm:E32_Authority_Document`)?
The property that connects E32 to E55 is `P71 lists`; it is equivalent to the inverse of `skos:inScheme`.
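The inverse relation can be made explicit with a small sketch that derives `P71 lists` statements from existing `skos:inScheme` statements. Triples are plain tuples of prefixed names here; the function name and the `ex:` identifiers are hypothetical.

```python
SKOS_IN_SCHEME = "skos:inScheme"
CRM_P71_LISTS = "crm:P71_lists"


def add_p71_lists(triples):
    """Return the input triples plus one inverse P71 triple per skos:inScheme triple."""
    derived = {
        (scheme, CRM_P71_LISTS, concept)
        for concept, predicate, scheme in triples
        if predicate == SKOS_IN_SCHEME
    }
    return set(triples) | derived


# Hypothetical identifiers for one concept and its scheme.
graph = add_p71_lists([
    ("ex:AesopischeFabel", SKOS_IN_SCHEME, "ex:BouterwekScheme"),
])
for triple in sorted(graph):
    print(triple)
```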

We could say that there is a Conceptual Object (the concept "Äsopische Fabel") as Bouterwek understands and defines it in his work. He does this in his text (the Expression as a realization of his Work at a certain point in time). We try to understand his conception of the concept and include it in the SKOS.
There might also be a second Conceptual Object here: Bouterwek's "idea" of the genre system that he describes in his Work.

The skos:Concept is "our" understanding of his idea about the "Äsopische Fabel"; and the skos:ConceptScheme is our understanding of his idea of the Genre System that he describes.

If we take only the F2 Expression for starters and look at the ways in which we arrive at the first draft of the `skos:ConceptScheme`, we might be able to make this process transparent and model it in its complexity.

![image](./img/02_f2_to_concept_generic.png)

On the most general level we created a type and based it on the text of Bouterwek.

We could try to be more specific here: on which parts of the text is the concept based? This could be a Linguistic Object (a part of the text) or Bouterwek's abstract concept as we distill it from the text. We can test both. `P136` is quite flexible: its range is `crm:E1 Entity`, meaning it can point to anything. Therefore it might make sense to use some form of reification to make the relation `was based on` more specific.

A more detailed description of the relation between the `skos:Concept` and `crm:E83_Type_Creation` could involve introducing the genre as described or defined by Bouterwek in the `F2 Expression` (the text). A candidate property would be `crm:P148 has component`, which connects two instances of `crm:E89 Propositional Object`. We still have to consider whether the thing "Äsopische Fabel" as described and defined by Bouterwek is best understood as a "Propositional Object":

> This class [Propositional Object, IB.] comprises immaterial items, including but not limited to stories, plots, procedural prescriptions, algorithms, laws of physics or images that are, or represent in some sense, sets of propositions about real or imaginary things and that are documented as single units or serve as topic of discourse.
>
> This class also comprises items that are “about” something in the sense of a subject. In the wider sense, this class includes expressions of psychological value such as non-figural art and musical themes. However, conceptual items such as types and classes are not instances of E89 Propositional Object. This should not be confused with the definition of a type, which is indeed an instance of E89 Propositional Object.

I would argue that what we capture here are Bouterwek's statements/assertions (propositions?) about a concept, i.e. the "Äsopische Fabel". The concept itself might not be a "Propositional Object" but might rather sit on the level of `lrm:F1 Work`, i.e. be a "Conceptual Object" (this needs further thought).

![image](./img/03_f2_connected_via_propositions_about_concept.png)

The relation to the concept (genre) in Bouterwek's understanding could be modeled as follows:

![image](./img/04_general_understanding_of_a_genre.png)

> Does this make sense to you: Bouterwek's text contains statements (Propositional Object) about what he understands by "Äsopische Fabel" (Conceptual Object). This "idea" of Bouterwek's represents, to an undetermined degree, an "intersubjective conception" (I need a better term here) of the "Äsopische Fabel".
> Do you think one can assume something like a "common sense" of a genre? Something like a general understanding of what one genre or another is? That is of course difficult, because it is more of a philosophical question, isn't it? One could also say that no such thing exists, but that it is produced discursively by works such as Bouterwek's. But then I can still make it addressable, can't I?
> In any case, the "thing" is neither static nor clearly delimited; the understanding of what a genre is is also historically conditioned... But the historical and subjective view is already captured by the Conceptual Object (Äsopische Fabel as understood by B.), isn't it? Is that enough to express the dynamics of the concept ("common understanding")?
> From my point of view such a model would be quite helpful, because via the node ("common understanding") I can relate the different genre systems to each other.

![image](./img/05_linking_concepts_cidoc.png)

We consider the "common understanding" node problematic for theoretical reasons, because it would lead to essentialist thinking. Instead we keep the individual authors' understandings of a genre ("understood by X" in the chart above), as reconstructed or assumed by us, and connect these directly:

![image](./img/06_charts-connections_of_concepts_of_individual_authors.png)

This "linking" (we say: one concept is similar in some sense to another) can be recorded as an attribute assignment (`crm:E13_Attribute_Assignment`):

![image](./img/07_shows_features_of_as_atttribute_assignment.png)
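As a rough sketch, such an assignment could be written down as the following triples (plain tuples of prefixed names, no RDF library). The properties `P140 assigned attribute to`, `P141 assigned`, `P177 assigned property of type` and `P14 carried out by` are defined in CIDOC-CRM. The use of `P130 shows features of` as the assigned property is an assumption based on the figure's file name, and the function name and all `ex:` identifiers, including the Eschenburg concept, are hypothetical.

```python
def similarity_assignment(assignment, concept_a, concept_b, property_type, actor):
    """Build the triples of an E13 Attribute Assignment that relates two concepts."""
    return [
        (assignment, "rdf:type", "crm:E13_Attribute_Assignment"),
        (assignment, "crm:P140_assigned_attribute_to", concept_a),        # subject of the claim
        (assignment, "crm:P141_assigned", concept_b),                     # object of the claim
        (assignment, "crm:P177_assigned_property_of_type", property_type),  # kind of relation
        (assignment, "crm:P14_carried_out_by", actor),                    # who made the claim
    ]


# Hypothetical identifiers; P130 is assumed from the figure above.
triples = similarity_assignment(
    assignment="ex:assignment/1",
    concept_a="ex:bouterwek/AesopischeFabel",
    concept_b="ex:eschenburg/AesopischeFabel",
    property_type="crm:P130_shows_features_of",
    actor="ex:actor/me",
)
for triple in triples:
    print(triple)
```

The point of the indirection is that the similarity claim itself becomes a node: it can carry an actor, a date, or a note on whether it came from the string-similarity heuristic or from a manual decision.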

We cannot use `skos:closeMatch` as the value of `P177`, because the range of `P177` is `E55 Type`. The properties defined in the CIDOC-CRM itself do meet this requirement:
> Note that the properties defined by the CIDOC CRM also constitute instances of E55 Type themselves.

If we wanted to be more specific in the attribute assignment, we could define E55 Types for the kinds of similarity between these concepts. This could be an open list.

Another, far more complex option would be to rely on the CIDOC-CRM argumentation model, CRMinf. I am not too sure about it, but I note it here for the sake of completeness. If we implement it, we might need additional named graphs to hold the actual argumentation. But let's see.
[CRMinf](https://cidoc-crm.org/sites/default/files/CRMinf%20v1.0%28site%29.pdf) is still a draft, and it is difficult. I have not tried very hard to adopt it, and I lack examples of its use in practice.

![image](./img/08_believe-argumentation_model.png)

I am not sure what can be gained from such a model.

## Modeling example: Schiller plays

* Bouterwek, p. 226, names Friedrich Schiller's "Die Braut von Messina" as an example of "Das Trauerspiel".
* Eschenburg, p. 307, names the "Trauerspiele" of Friedrich Schiller as notable examples of the "Trauerspiele der Deutschen" (tragedies of the Germans).

![image](./img/09_bouterwek_eschenburg_schiller_plays.png)

However, this is a heavily abbreviated representation.
Bouterwek's statement that "Die Braut von Messina" is a "Trauerspiel" can be modeled as a "Belief" (Bouterwek holds the belief that...):

![image](./img/10_bouterwek_belief.png)

Not sure how this would work for Eschenburg:
> the "Trauerspiele" of Friedrich Schiller as notable examples of the "Trauerspiele der Deutschen"

Maybe use Attribute Assignments to record the Statements? This does not work well for Eschenburg:

![image](./img/11_eschenbug_bouterwek_schiller_example.png)

One option would be a fictitious example in which two authors apply a concept with a similar label to the same work. We could then point out that we do not have this ideal case but messy data like the graph above (which is itself not quite correct, because Eschenburg does not actually mention "Die Braut von Messina" but the "Trauerspiele von Friedrich Schiller").