Fundamentals section (#133)

Add background on relevant concepts and technologies of SimPhoNy
simphony · Apr 12, 2021 · 56e5e6b
1 parent f8b5b54
commit 56e5e6b

Showing 4 changed files with 173 additions and 39 deletions.
diff --git a/docs/source/fundamentals.md b/docs/source/fundamentals.md
@@ -1,13 +1,15 @@
-## Fundamental concepts
+# Fundamental concepts
 In this section we will present some of the main concepts behind SimPhoNy.
-### Operability
+
+## General notions
+### Degrees of interoperability
 There is a multitude of tools and programs out there, all with their own formats and protocols.
 
 Every time a user wants to use one of these tools, they must familiarise themselves with the software.
 Furthermore, if they want to integrate multiple tools in one workflow, they must, in most cases,
 take care of the conversion on their own.
 
-Based on how tools communicate with other tools, we can define 3 levels of operability:
+Based on how tools communicate with other tools, we can define 3 levels:
 
 #### Compatibility
   ```eval_rst
@@ -77,16 +79,12 @@ Based on how tools communicate with other tools, we can define 3 levels of opera
   Here there is no need for all tools to go through the De Facto standard, 
   because there is a format that is known by all of them and enables all components to communicate among themselves.
 
-  This final stage could be compared to all parties learning a language like 
-  [Esperanto](https://en.wikipedia.org/wiki/Esperanto).
+  This final stage could be compared to all parties using an instant translator that can convert
+  text from one language into any other.
 
 
 Interoperability between software tools is one of the most important objectives of the SimPhoNy framework.
-
-
-### Abstraction and generalisation
-Once a certain degree of interoperability has been reached, other interesting concepts and details that arise:
-#### Semantic vs. syntactic
+### Semantic vs. syntactic
   We can interpret a word as a specific sequence of characters without caring about the meaning itself.
   This way, a simulation engine parsing an input file will know that the integer written after the keyword
   `step` will be used to set the number of iterations the execution loop will run.
@@ -98,32 +96,169 @@ Once a certain degree of interoperability has been reached, other interesting co
   Based on the domain, a person can also list other relevant concepts and relationships
   (e.g. when thinking of a stair, the `material` or the `width`).
 
-  Being able to know the semantic meaning of an instance and hence its connection to other concepts
-  is one of the principles of SimPhoNy. For that, ontologies play a major role.
+  Being able to know the semantic meaning of an instance, and hence its connection to other concepts,
+  is one of the principles of SimPhoNy. Ontologies play a major role in achieving this goal.
+
+### Ontology
+```eval_rst
+.. important::
+   An ontology is a formal specification of a shared conceptualization.  `[Borst, 1997]
+   <https://research.utwente.nl/en/publications/construction-of-engineering-ontologies-for-knowledge-sharing-and->`_ .
+
+```
+
+Let's look at the individual components of this definition, starting from the end.
+ - _Conceptualization_: an ontology deals with the ideas and relationships in an area of interest.
+ - _Shared_: the ideas and concepts are perceived and agreed upon by multiple people.
+ - _Specification_: it defines and describes them in detail, following predetermined rules and a predetermined format.
+ - _Formal_: it follows a machine-readable syntax.
+
+Put more simply, an ontology can be seen as the definition of the concepts relevant to a given domain,
+as well as the relationships between them, in a way that a machine can interpret.
+
+For a deeper, more detailed analysis of the definition, refer to [[Guarino, 2009]](http://dx.doi.org/10.1007/978-3-540-92673-3_0).
+
+Ontologies are more elaborate than taxonomies in that they can include multiple kinds of relationships
+(not just parent-child) between complex concepts in big domains.
+
+#### EMMO
+The European Materials Modelling Ontology ([EMMO](https://github.com/emmo-repo/EMMO)) is an ontology developed by the European Materials Modelling Council ([EMMC](https://emmc.info/)).
+EMMO's goal is to define a universal representational system for scientists in the field of materials modelling, thereby enabling interoperability.
+
+It has been designed from the bottom up, starting with the concepts of different domains and application fields
+and generalising into middle and top-level layers. It is currently being further
+developed in multiple projects of the European Union.
+
+SimPhoNy is being developed with the intention of being compatible with EMMO, and an easy installation of the 
+ontology is available (further explained [here](./ontologies_included.md#working-with-emmo)).
+
+There is also [documentation](https://ontology.pages.fraunhofer.de/documentation/latest/) available for developing an EMMO-compliant ontology (requires login).
+
+### CUDS
+CUDS, or Common Universal Data Structure, is the ontology-compliant data format of OSP-core:
+- **CUDS is an ontology individual**: each CUDS object is an instantiation of a class in the ontology.
+  If we assume a food ontology that describes classes like pizza or pasta, a CUDS object could represent one specific pizza or pasta dish that exists in the real world.
+  Like ontology individuals, CUDS objects can be related to other individuals/CUDS objects through relationships defined in the ontology, for example a _pizza_ that 'hasPart' _tomato sauce_.
+- **CUDS is an API**: to allow users to interact with the ontology individuals and their data, CUDS objects provide a CRUD API.
+- **CUDS is a container**: depending on the relationship connecting two CUDS objects, a certain instance can be seen as a container of other instances.
+  We call a relationship that expresses containment an 'active relationship'.
+  In the pizza example, 'hasPart' would be an 'active relationship': anyone sharing the pizza CUDS object with others would also want to share the tomato sauce.
+- **CUDS is RDF**: internally, a CUDS object is only an interface to an RDF-based triple store that contains the data of all CUDS objects.
+- **CUDS is a node in a graph**: CUDS objects being individuals in an RDF graph implies that each CUDS object can also be seen as a node in a graph.
+  This does not conflict with the container perspective; instead, we see them as two different views on the data.
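
The container and graph views described above can be contrasted in a few lines of plain Python. This is only a sketch and does not use the OSP-core API: the names `triples`, `ACTIVE_RELATIONSHIPS`, `contained_objects` and `neighbours`, as well as the `isOrderedBy` relationship, are hypothetical and chosen for illustration.

```python
# Plain-Python sketch, NOT the OSP-core API: all names here are hypothetical.
# Every object is a node and every relationship is an edge (a triple).
triples = [
    ("pizza", "hasPart", "tomato_sauce"),
    ("pizza", "hasPart", "cheese"),
    ("tomato_sauce", "hasPart", "tomato"),
    ("pizza", "isOrderedBy", "customer"),
]

# Assumption: only 'hasPart' expresses containment (an 'active relationship').
ACTIVE_RELATIONSHIPS = {"hasPart"}


def contained_objects(root, triples):
    """Container view: collect everything reachable via active relationships."""
    found, stack = [], [root]
    while stack:
        current = stack.pop()
        for subject, predicate, obj in triples:
            if (
                subject == current
                and predicate in ACTIVE_RELATIONSHIPS
                and obj not in found
            ):
                found.append(obj)
                stack.append(obj)
    return found


def neighbours(node, triples):
    """Graph view: every node directly connected by an outgoing edge."""
    return [obj for subject, _, obj in triples if subject == node]


print(contained_objects("pizza", triples))
print(neighbours("pizza", triples))
```

In this sketch, sharing the pizza would also share the sauce, the cheese and the tomato, but not the customer, because `isOrderedBy` is not treated as an active relationship. The graph view, in contrast, lists the customer as a direct neighbour.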
+## Technologies and frameworks
+### RDF
+[RDF](https://www.w3.org/RDF/) (Resource Description Framework) is a formal language for describing structured information
+used in the Semantic Web. Its first specification was published in 1999 and extended in 2004.
+
+Knowledge is represented as a directed graph in which the nodes are ontological classes,
+instances of those classes or literals, and the edges are the relationships connecting them.
 
-#### Requirement simplification
-  Since we know what a user means from the semantic approach, we can use this to automatise and simplify
-  the setup and initialisation of processes using default settings.
+The graph is serialised as triples of the form "subject-predicate-object":
+- _Subject_: The IRI of the entity the triple refers to.
+  Blank nodes have no IRI, but they are outside the scope of this documentation.
+- _Predicate_: The IRI of the relationship from subject to object.
+- _Object_: A literal or the IRI of an entity.
 
-  For example, a user could decide they want to run a simple simulation with a certain level of detail
-  (let's say low, medium or high).
-  This could be translated into a meaningful initial state that might suffice a general situation.
+The following is an example of an RDF triple. This example will also be used to show the different serialisation formats of RDF.
+For the IRIs, DBpedia's namespaces were used.
+```eval_rst
+.. uml::
+   :align: center
+   :caption: RDF triple sample
 
-  For other, more complex use cases, a higher level of customisation will of course still be available.
-#### Coupling and linking
-  In the domain of physics simulations, another interesting use case is coupling and linking.
-
-  For example, a certain engine might be useful for representing structures made up of atomistic particles
-  (molecular dynamics).
+   (dbr:J._R._R._Tolkien) as tolkien
+   (dbr:The_Lord_of_the_Rings) as lotr
+   lotr -> tolkien : dbo:author
+```
 
-  Another software tool could be focussed on representing bodies of fluids (fluid dynamics).
-  If both tools can communicate (i.e. there exists some interoperability between them),
-  they could both be run and synced simultaneously to create more complex scenarios.
+The most used formats for storing RDF data are:
+- [XML](https://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-syntax-grammar-20030117/):
+  Historically the most common format, given the number of libraries for handling it.
+  It was released hand in hand with the RDF specification.
+  Unfortunately, XML is best suited to tree-like structures rather than graphs,
+  which also makes RDF/XML harder for humans to read.
 
-  Here is an example of what that could look like:
+	The example triple in XML is:
+  ```xml
+    <?xml version="1.0" encoding="utf-8"?>
+    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+            xmlns:dbo="http://dbpedia.org/ontology/">
+        <rdf:Description rdf:about="http://dbpedia.org/resource/The_Lord_of_the_Rings">
+            <dbo:author rdf:resource="http://dbpedia.org/resource/J._R._R._Tolkien"/>
+        </rdf:Description>
+    </rdf:RDF>
+  ```
+
+- [N3](https://www.w3.org/TeamSubmission/n3/): Notation3 is designed with human readability as a motivator.
+	The RDF triples are written one per line, with the possibility to define common prefixes
+	and other directives for simplicity.
+
+	The previous example in N3 would be:
+  ```turtle
+      @prefix dbo: <http://dbpedia.org/ontology/> .
+      @prefix dbr: <http://dbpedia.org/resource/> .
+      dbr:The_Lord_of_the_Rings  dbo:author  dbr:J._R._R._Tolkien .
+  ```
+
+- [Turtle](https://www.w3.org/TR/turtle/): Based on N3, it strips some of its syntax, making it easier to parse
+	for machines.
+	The recurring example would be exactly the same in Turtle as in N3.
 
-  <iframe src="./_static/videos/coupling_and_linking.mp4" frameborder="0" allowfullscreen="true">
-  </iframe>
+- [N-Triples](https://www.w3.org/TR/n-triples/): N-Triples is even simpler, without any of the syntactic sugar from N3 or Turtle.
+  The triples are written one per line without prefixes. This makes it a very easy format to parse
+  but hard for humans to read and maintain.
+
+  The following representation should be on one line (it has been split for readability):
+  ```xml
+    <http://dbpedia.org/resource/The_Lord_of_the_Rings>
+      <http://dbpedia.org/ontology/author>
+      <http://dbpedia.org/resource/J._R._R._Tolkien> .
+  ```
+
+- [JSON-LD](https://www.w3.org/TR/json-ld/): Uses JSON, the widely accepted data format of the web, for serialising RDF triples.
+  Easier than XML for humans to read, JSON has standard libraries in practically all programming languages.
+
+  The example in JSON-LD is:
+  ```json
+    {
+      "@id": "http://dbpedia.org/resource/The_Lord_of_the_Rings",
+      "http://dbpedia.org/ontology/author": [
+        {"@id": "http://dbpedia.org/resource/J._R._R._Tolkien"}
+      ]
+    }
+  ```
+SimPhoNy supports all the previous formats (plus a simpler custom YAML format) as input when installing ontologies.
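
To make the relationship between the formats more concrete, the sample triple can be written out by hand using only the Python standard library. This is a minimal sketch rather than a real RDF serialiser: the helper names `to_ntriples` and `to_jsonld` are hypothetical, the predicate is the `dbo:author` IRI from the N3 example, and only IRI objects are handled (no literals, blank nodes or character escaping).

```python
import json

# The sample triple, stored as plain IRIs (subject, predicate, object).
triple = (
    "http://dbpedia.org/resource/The_Lord_of_the_Rings",
    "http://dbpedia.org/ontology/author",
    "http://dbpedia.org/resource/J._R._R._Tolkien",
)


def to_ntriples(subject, predicate, obj):
    """Write one triple on a single line, each IRI wrapped in <>."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def to_jsonld(subject, predicate, obj):
    """Build a JSON-LD node object: the predicate IRI becomes a key."""
    return {"@id": subject, predicate: [{"@id": obj}]}


print(to_ntriples(*triple))
print(json.dumps(to_jsonld(*triple), indent=2))
```

For real data, a dedicated RDF library is the safer choice, since it takes care of literals, datatypes and escaping.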
+
+#### SPARQL
+[SPARQL](https://www.w3.org/TR/sparql11-overview/) (a recursive acronym for SPARQL Protocol and RDF Query Language) is the most common query language for RDF.
+Queries are graph patterns (similar to the triples of Turtle) with variables for the parts of the pattern that make up the result.
+
+Variables start with `?` and are bound to concrete values during query evaluation.
+They can appear in multiple places in the patterns, and those listed in the
+`SELECT` clause are returned as the query result.
+
+The query for the author of _The Lord of the Rings_ from our sample triples in SPARQL is:
+  ```sparql
+    PREFIX dbo: <http://dbpedia.org/ontology/>
+    PREFIX dbr: <http://dbpedia.org/resource/>
+    SELECT ?person WHERE {
+        dbr:The_Lord_of_the_Rings  dbo:author  ?person .
+    }
+  ```
+
+The SPARQL query language offers multiple types of result sets and clauses, most of which are beyond the scope of this documentation.
+One that should be mentioned is the `FILTER` keyword.
+It limits the results to those for which the expression inside the parentheses evaluates to `true`.
+For instance (omitting the prefix declarations for simplicity):
+
+  ```sparql
+    SELECT ?character WHERE {
+        ?character dbp:affiliation dbr:The_Lord_of_the_Rings .
+        ?character dbo:age ?age .
+        FILTER(?age >= 100)
+    } 
+  ```
+The previous query would return the characters from the book series with an age greater than or equal to 100.
+(Note that while the query is correct, the result is empty, as such information is not stored in DBpedia.)
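
How variables are bound and then filtered can be mimicked with a toy matcher in plain Python. This is a sketch of the idea only, not a SPARQL engine: the functions `match` and `select` are hypothetical, and the ages in the data are invented for illustration (they do not come from DBpedia).

```python
# Toy data: the ages are invented for illustration and are NOT from DBpedia.
triples = [
    ("dbr:Gandalf", "dbp:affiliation", "dbr:The_Lord_of_the_Rings"),
    ("dbr:Gandalf", "dbo:age", 2019),
    ("dbr:Frodo_Baggins", "dbp:affiliation", "dbr:The_Lord_of_the_Rings"),
    ("dbr:Frodo_Baggins", "dbo:age", 50),
]


def match(pattern, triple, bindings):
    """Try to match one triple pattern; return extended bindings or None."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            # A variable: bind it, or check it against its earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None  # a constant term that does not match
    return new


def select(variable, patterns, triples, keep=lambda bindings: True):
    """Match the patterns one after another, then apply the filter `keep`."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return [b[variable] for b in solutions if keep(b)]


result = select(
    "?character",
    [
        ("?character", "dbp:affiliation", "dbr:The_Lord_of_the_Rings"),
        ("?character", "dbo:age", "?age"),
    ],
    triples,
    keep=lambda b: b["?age"] >= 100,  # plays the role of FILTER(?age >= 100)
)
print(result)
```

Because `?character` appears in both patterns, it must be bound to the same value in each, which is what joins the two patterns together.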
 
-  Furthermore, a truly interoperable platform would enable users to store and 
-  access data in databases or other repositories of information.
+For a very interesting and comprehensive introduction to RDF and SPARQL, see [[Hitzler, 2009]](http://dx.doi.org/10.1201/9781420090512).
diff --git a/docs/source/general_architecture.md b/docs/source/general_architecture.md
@@ -95,7 +95,7 @@ For that, a 3 layer schema is used:
 The closer to the user, the closer to the ontology concepts.
 The abstraction is replaced by specificity when you move towards the backend.
 
-For example, the City, Street or Neighborhood classes from the demonstrative [City Ontology](./ontologies_included.html#the-city-ontology) included in OSP-core, as well as the individuals that can be instantiated using them, would be part of the semantic layer. Any wrapper (e.g. the included [SQLite wrapper](https://github.com/simphony/osp-core/tree/master/osp/wrappers/sqlite)), would be part of the interoperability layer. Finally, following the SQLite example, the [sqlite3 library](https://docs.python.org/3/library/sqlite3.html) from python would be part of the syntactic layer.
+For example, the City, Street or Neighborhood classes from the demonstrative [City Ontology](./ontologies_included.md#the-city-ontology) included in OSP-core, as well as the individuals that can be instantiated using them, would be part of the semantic layer. Any wrapper (e.g. the included [SQLite wrapper](https://github.com/simphony/osp-core/tree/master/osp/wrappers/sqlite)) would be part of the interoperability layer. Finally, following the SQLite example, the [sqlite3 library](https://docs.python.org/3/library/sqlite3.html) from Python would be part of the syntactic layer.
 
 
 For a full explanation on the architecture and design, go to [detailed design](./detailed_design.md).

diff --git a/docs/source/links.md b/docs/source/links.md
@@ -33,6 +33,8 @@ OntoTRANS      Horizon 2020  H2020-NMBP-TO-IND-2019            862136
 =============  ============  ===============================   ==================
 ```
 
+Some of the explanations and background provided have been adapted from Pablo de Andres'
+master's thesis, "Natural Language Search on an ontology-based data structure".
 
 # Compatibility table
 The following table describes the compatibilities between the SimPhoNy docs and OSP-core.

diff --git a/docs/source/ontologies_included.md b/docs/source/ontologies_included.md
@@ -22,14 +22,11 @@ pico install city
 
 Take a look at our [examples](jupyter/cuds-api.md) to see how you can build your own city!
 
-## Working with EMMO using OSP-core
+## Working with EMMO
 
 The second ontology that is ready to be used out of the box is the European
-Materials Modelling Ontology, or EMMO in short. This ontology is an effort
-to develop an ontology for applied sciences. It is based on physics,
-analytical philosophy and information and communication technologies.
-Its source code is open and [available on Github](https://github.com/emmo-repo/EMMO).
-If you want to develop an emmo compliant ontology, see [the documentation](https://ontology.pages.fraunhofer.de/documentation/latest/).
+Materials Modelling Ontology, or EMMO for short.
+For a short introduction, see the [fundamentals](./fundamentals.md#emmo).
 
 You can install EMMO using [Pico](utils.md#pico-installs-cuds-ontologies).