Fundamentals section (#133)
Add background on relevant concepts and technologies of SimPhoNy
create-issue-branch[bot] committed Apr 12, 2021
1 parent f8b5b54 commit 56e5e6b
Showing 4 changed files with 173 additions and 39 deletions.
199 changes: 167 additions & 32 deletions docs/source/fundamentals.md
# Fundamental concepts
In this section we will present some of the main concepts behind SimPhoNy.

## General notions
### Degrees of interoperability
There is a multitude of tools and programs out there, all with their own formats and protocols.

Every time a user wants to use one of these tools, they must familiarise themselves with the software.
Furthermore, if they want to integrate multiple tools in one workflow, they must, in most cases,
take care of the conversion on their own.

Based on how tools communicate with other tools, we can define 3 levels:

#### Compatibility
```eval_rst
```

…
Here there is no need for all tools to go through the De Facto standard,
because there is a format that is known by all of them and enables all components to communicate among themselves.

This final stage could be compared to all parties learning a language like
[Esperanto](https://en.wikipedia.org/wiki/Esperanto).
This final stage could be compared to all parties using an instant translator that can convert
text from one language into any other.


Interoperability between software tools is one of the most important objectives of the SimPhoNy framework.


### Semantic vs. syntactic
We can interpret a word as a specific sequence of characters without caring about the meaning itself.
This way, a simulation engine parsing an input file will know that the integer written after the keyword
`step` will be used to set the number of iterations the execution loop will run.
…
Based on the domain, a person can also list other relevant concepts and relationships
(e.g. when thinking of a stair, the `material` or the `width`).

Being able to know the semantic meaning of an instance, and hence its connection to other concepts,
is one of the principles of SimPhoNy. To achieve this goal, ontologies play a major role.

### Ontology
```eval_rst
.. important::
   An ontology is a formal specification of a shared conceptualization. `[Borst, 1997]
   <https://research.utwente.nl/en/publications/construction-of-engineering-ontologies-for-knowledge-sharing-and->`_ .
```

Let's look at the individual components of this definition, starting from the end.
- _Conceptualization_, an ontology will work on the ideas and relationships in an area of interest.
- _Shared_, the ideas and concepts are perceived and agreed by multiple people.
- _Specification_, it will define and describe them in detail, following some predetermined rules and format.
- _Formal_, meaning it will follow a machine-readable syntax.

In a simpler way, an ontology can be seen as the definition of concepts relevant to a given domain,
as well as the relationships between them, in a way that a machine can interpret it.

For a deeper, more detailed analysis of the definition, refer to [[Guarino, 2009]](http://dx.doi.org/10.1007/978-3-540-92673-3_0).

Ontologies are more elaborate than taxonomies in that they can include multiple kinds of relationships
(not just parent-child) between complex concepts in big domains.
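
To make the difference concrete, here is a minimal sketch in Python (the language OSP-core is written in) that stores a few concepts as plain (subject, relationship, object) tuples. The class names and the `is_a`, `has_part` and `made_of` relationship names are illustrative assumptions, not taken from any real ontology: a taxonomy only needs `is_a`, while an ontology adds further kinds of relationships on top of it.

```python
# Hypothetical concepts as (subject, relationship, object) triples.
# A taxonomy only knows the parent-child relation "is_a".
TAXONOMY = {
    ("Pizza", "is_a", "Food"),
    ("TomatoSauce", "is_a", "Food"),
    ("Tomato", "is_a", "Ingredient"),
}

# An ontology can add other kinds of relationships between the same concepts.
ONTOLOGY = TAXONOMY | {
    ("Pizza", "has_part", "TomatoSauce"),
    ("TomatoSauce", "made_of", "Tomato"),
}


def relation_kinds(triples):
    """Return the distinct relationship types used in a set of triples."""
    return {relation for _, relation, _ in triples}


def related(triples, subject, relation):
    """Return all objects linked to `subject` through `relation`."""
    return {o for s, r, o in triples if s == subject and r == relation}


if __name__ == "__main__":
    print(sorted(relation_kinds(TAXONOMY)))        # ['is_a']
    print(sorted(relation_kinds(ONTOLOGY)))        # ['has_part', 'is_a', 'made_of']
    print(related(ONTOLOGY, "Pizza", "has_part"))  # {'TomatoSauce'}
```

A real ontology is of course written in a formal language (e.g. OWL) rather than Python sets; the sketch only shows the extra expressiveness.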

#### EMMO
The European Materials Modelling Ontology ([EMMO](https://github.com/emmo-repo/EMMO)) is an ontology developed by the European Materials Modelling Council ([EMMC](https://emmc.info/)).
EMMO's goal is to define a representational system that is universal for scientists in the field of materials modelling, in order to enable interoperability.

It has been designed from the bottom up, starting with the concepts of different domains and application fields
and generalising into middle and top level layers, and it is currently being further
developed in multiple projects of the European Union.

SimPhoNy is being developed with the intention of being compatible with EMMO, and an easy installation of the
ontology is available (further explained [here](./ontologies_included.md#working-with-emmo)).

There is also [documentation](https://ontology.pages.fraunhofer.de/documentation/latest/) available for developing an EMMO compliant ontology (requires login).

### CUDS
CUDS, or Common Universal Data Structure, is the ontology-compliant data format of OSP-core:
- **CUDS is an ontology individual**: each CUDS object is an instantiation of a class in the ontology.
  If we assume a food ontology that describes classes like pizza or pasta, a CUDS object could represent one specific pizza or pasta dish that exists in the real world.
  Similar to ontology individuals, CUDS objects can be related to other individuals/CUDS by relations defined in the ontology, like a _pizza_ that 'hasPart' _tomato sauce_.
- **CUDS is API**: To allow users to interact with the ontology individuals and their data, CUDS provides a CRUD API.
- **CUDS is a container**: Depending on the relationship connecting two CUDS objects, a certain instance can be seen as a container of other instances.
  We call a relationship that expresses containment an 'active relationship'.
  In the pizza example, 'hasPart' would be an 'active relationship'. If one wanted to share the pizza CUDS object with others, one would also want to share the tomato sauce.
- **CUDS is RDF**: Internally, a CUDS object is only an interface to an RDF-based triple store that contains the data of all CUDS objects.
- **CUDS is a node in a graph**: Since CUDS objects are individuals in an RDF graph, each CUDS object can also be seen as a node in a graph.
  This does not conflict with the container perspective; instead, we see them as two different views on the data.

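
The following Python sketch illustrates these five views on a toy class. `MiniCuds`, the `food:` names and the choice of `hasPart` as the only active relationship are hypothetical, and this is NOT the OSP-core API; it only shows how a single shared triple store can back an individual, a CRUD-style interface, a container and a graph node at the same time.

```python
import uuid

# One shared store of (subject, predicate, object) triples. Each MiniCuds
# object is only an interface to it, loosely mirroring "CUDS is RDF".
TRIPLES = set()

# Relationships treated as containment ("active"); a hypothetical choice.
ACTIVE_RELATIONSHIPS = {"hasPart"}


class MiniCuds:
    """Toy stand-in for a CUDS object. This is NOT the OSP-core API."""

    def __init__(self, oclass):
        # The individual gets its own IRI and is typed by an ontology class.
        self.iri = f"urn:uuid:{uuid.uuid4()}"
        TRIPLES.add((self.iri, "rdf:type", oclass))

    @property
    def oclass(self):
        return next(o for s, p, o in TRIPLES if s == self.iri and p == "rdf:type")

    # --- CRUD-style operations ---
    def add(self, other, rel):
        """Create: relate this individual to another one."""
        TRIPLES.add((self.iri, rel, other.iri))
        return other

    def get(self, rel):
        """Read: objects connected to this individual through `rel`."""
        return {o for s, p, o in TRIPLES if s == self.iri and p == rel}

    def update(self, attribute, value):
        """Update: replace the value of an attribute with a new literal."""
        old = {t for t in TRIPLES if t[0] == self.iri and t[1] == attribute}
        TRIPLES.difference_update(old)
        TRIPLES.add((self.iri, attribute, value))

    def remove(self, other, rel):
        """Delete: drop the relationship; the other individual still exists."""
        TRIPLES.discard((self.iri, rel, other.iri))

    # --- Container / graph view ---
    def contents(self):
        """IRIs reachable from this node through active relationships only."""
        found, stack = set(), [self.iri]
        while stack:
            node = stack.pop()
            for s, p, o in TRIPLES:
                if s == node and p in ACTIVE_RELATIONSHIPS and o not in found:
                    found.add(o)
                    stack.append(o)
        return found


if __name__ == "__main__":
    pizza = MiniCuds("food:Pizza")
    sauce = pizza.add(MiniCuds("food:TomatoSauce"), rel="hasPart")
    tomato = sauce.add(MiniCuds("food:Tomato"), rel="hasPart")
    pizza.update("food:diameter", 30)  # hypothetical attribute
    print(pizza.oclass)                                  # food:Pizza
    print(pizza.contents() == {sauce.iri, tomato.iri})   # True
```

Sharing the pizza "as a container" then amounts to sharing everything returned by `contents()`, i.e. following the active relationships through the graph.
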
## Technologies and frameworks
### RDF
[RDF](https://www.w3.org/RDF/) (Resource Description Framework) is a formal language for describing structured information
used in the Semantic Web. Its first specification was published in 1999 and extended in 2004.

Knowledge is represented in directed graphs where the nodes are either ontological classes,
instances of those classes or literals, and the edges are the relationships connecting them.

The graph is serialised as triples of the form "subject-predicate-object":
- _Subject_: The IRI of the entity the triple refers to.
  Blank nodes have no IRI, but they are outside the scope of this section.
- _Predicate_: The IRI of the relationship from subject to object.
- _Object_: A literal or the IRI of an entity.
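
As an illustration, a triple can be modelled in Python with a few named tuples. The class names (`IRI`, `Literal`, `Triple`) are our own assumptions rather than any library's API, and the label triple is only there to show a literal object. Reading each triple as a labelled edge gives the directed graph described above.

```python
from typing import NamedTuple, Union


class IRI(str):
    """A string marked as an IRI, as opposed to a literal value."""


class Literal(NamedTuple):
    """A literal value with its datatype IRI (XSD string by default)."""
    value: str
    datatype: str = "http://www.w3.org/2001/XMLSchema#string"


class Triple(NamedTuple):
    subject: IRI                # blank nodes are left out, as in the text
    predicate: IRI
    obj: Union[IRI, Literal]    # the object: another entity or a literal


def nodes_and_edges(triples):
    """View triples as a directed graph: nodes plus (source, target, label) edges."""
    nodes, edges = set(), set()
    for t in triples:
        nodes.update((t.subject, t.obj))
        edges.add((t.subject, t.obj, t.predicate))
    return nodes, edges


LOTR = IRI("http://dbpedia.org/resource/The_Lord_of_the_Rings")
TOLKIEN = IRI("http://dbpedia.org/resource/J._R._R._Tolkien")
AUTHOR = IRI("http://dbpedia.org/ontology/author")
LABEL = IRI("http://www.w3.org/2000/01/rdf-schema#label")

TRIPLES = {
    Triple(LOTR, AUTHOR, TOLKIEN),
    # Illustrative literal-valued triple, added only to show a literal object.
    Triple(LOTR, LABEL, Literal("The Lord of the Rings")),
}

if __name__ == "__main__":
    nodes, edges = nodes_and_edges(TRIPLES)
    print(len(nodes), len(edges))  # 3 2
```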

The following is an example of an RDF triple. This example will also be used to show the different serialisation formats of RDF.
For the IRIs, DBpedia's namespaces are used.
```eval_rst
.. uml::
   :align: center
   :caption: RDF triple sample

   (dbr:J._R._R._Tolkien) as tolkien
   (dbr:The_Lord_of_the_Rings) as lotr
   lotr -> tolkien : dbo:author
```

The most commonly used formats for storing RDF data are:
- [XML](https://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-syntax-grammar-20030117/):
  Historically the most common format, given the number of libraries available for handling it.
  It was released hand in hand with the RDF specification.
  Unfortunately, XML is best suited to tree-like structures rather than graphs,
  which also makes it harder for humans to read.

The example triple in XML is:
```xml
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dbo="http://dbpedia.org/ontology/">
  <rdf:Description rdf:about="http://dbpedia.org/resource/The_Lord_of_the_Rings">
    <dbo:author rdf:resource="http://dbpedia.org/resource/J._R._R._Tolkien"/>
  </rdf:Description>
</rdf:RDF>
```
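
For illustration only, the simple `rdf:Description` pattern of this example can be read back with Python's standard `xml.etree.ElementTree`. This sketch assumes exactly that pattern (no nested descriptions, blank nodes or typed literals) and uses the `dbo:author` predicate of the other serialisations; real applications should rely on a proper RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dbo="http://dbpedia.org/ontology/">
  <rdf:Description rdf:about="http://dbpedia.org/resource/The_Lord_of_the_Rings">
    <dbo:author rdf:resource="http://dbpedia.org/resource/J._R._R._Tolkien"/>
  </rdf:Description>
</rdf:RDF>"""


def simple_rdfxml_triples(text):
    """Extract triples from the rdf:Description / rdf:resource pattern only."""
    root = ET.fromstring(text.encode("utf-8"))
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree names tags "{namespace}local"; the IRI is namespace + local.
            ns, local = prop.tag[1:].split("}", 1)
            obj = prop.get(f"{{{RDF_NS}}}resource", prop.text)
            triples.append((subject, ns + local, obj))
    return triples


if __name__ == "__main__":
    for triple in simple_rdfxml_triples(RDF_XML):
        print(triple)
```

Note how much of the code deals with the XML tree rather than with the triple itself, which is the tree-versus-graph mismatch mentioned above.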

- [N3](https://www.w3.org/TeamSubmission/n3/): Notation3 is designed with human readability as a motivator.
The RDF triples are written one per line, with the possibility to define common prefixes
and other directives for simplicity.

The previous example in N3 would be:
```turtle
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix dbr: <http://dbpedia.org/resource/> .
dbr:The_Lord_of_the_Rings dbo:author dbr:J._R._R._Tolkien .
```

- [Turtle](https://www.w3.org/TR/turtle/): Based on N3, it strips some of its syntax, making it easier to parse
for machines.
The recurring example would be exactly the same in Turtle as in N3.
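
To show what the prefix directives buy us, this Python sketch expands the prefixed names of the N3/Turtle example into full IRIs. It assumes a toy subset of the syntax (one complete triple per line, only `@prefix` directives) and is not a general Turtle parser.

```python
import re

TURTLE = """\
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix dbr: <http://dbpedia.org/resource/> .
dbr:The_Lord_of_the_Rings dbo:author dbr:J._R._R._Tolkien .
"""

PREFIX_RE = re.compile(r"@prefix\s+(\w*):\s*<([^>]*)>\s*\.")


def expand(term, prefixes):
    """Turn a prefixed name such as dbo:author into a full IRI."""
    prefix, local = term.split(":", 1)
    return prefixes[prefix] + local


def expand_simple_turtle(text):
    """Expand prefixed names in one-triple-per-line Turtle (a toy subset)."""
    prefixes = dict(PREFIX_RE.findall(text))
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("@prefix"):
            continue
        if not line.endswith("."):
            raise ValueError(f"expected a terminating '.': {line!r}")
        subject, predicate, obj = line[:-1].split()
        triples.append(tuple(expand(t, prefixes) for t in (subject, predicate, obj)))
    return triples


if __name__ == "__main__":
    print(expand_simple_turtle(TURTLE))
```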

- [N-Triples](https://www.w3.org/TR/n-triples/): N-Triples are even simpler, without any of the syntactic sugar of N3 or Turtle.
  The triples are written one per line, without prefixes. This makes it a very easy format to parse,
  but hard for humans to read and maintain.

In N-Triples, the following triple would be written on a single line (it has been split here for readability):
```turtle
<http://dbpedia.org/resource/The_Lord_of_the_Rings>
<http://dbpedia.org/ontology/author>
<http://dbpedia.org/resource/J._R._R._Tolkien> .
```
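
Because every term is written out in full, N-Triples can be produced and read back with very little code. A Python sketch, assuming IRI-only triples (literals, blank nodes and character escaping are deliberately left out):

```python
import re

LOTR = "http://dbpedia.org/resource/The_Lord_of_the_Rings"
AUTHOR = "http://dbpedia.org/ontology/author"
TOLKIEN = "http://dbpedia.org/resource/J._R._R._Tolkien"

# One triple per line: three <IRI> terms followed by " ."
LINE_RE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.$")


def to_ntriples(triples):
    """Serialise IRI-only triples, one per line, with no prefixes."""
    return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in triples)


def from_ntriples(text):
    """Parse IRI-only N-Triples; each non-empty line is one triple."""
    triples = []
    for line in text.splitlines():
        if line.strip():
            match = LINE_RE.match(line.strip())
            if match is None:
                raise ValueError(f"unsupported line: {line!r}")
            triples.append(match.groups())
    return triples


if __name__ == "__main__":
    text = to_ntriples([(LOTR, AUTHOR, TOLKIEN)])
    print(text, end="")
    print(from_ntriples(text) == [(LOTR, AUTHOR, TOLKIEN)])  # True
```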

- [JSON-LD](https://www.w3.org/TR/json-ld/): uses JSON, the widely adopted web data format, for serialising RDF triples.
  Easier for humans to read than XML, JSON has standard libraries in practically all programming languages.

The example in JSON-LD is:
```json
{"@id": "http://dbpedia.org/resource/The_Lord_of_the_Rings",
  "http://dbpedia.org/ontology/author":
    [{"@id": "http://dbpedia.org/resource/J._R._R._Tolkien"}]
}
```
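
Since JSON-LD is plain JSON, Python's standard `json` module is enough to read this example back. The sketch assumes a single, already expanded node object like the one above (no `@context`, no nested nodes) and uses the `dbo:author` predicate of the other serialisations.

```python
import json

JSON_LD = """{"@id": "http://dbpedia.org/resource/The_Lord_of_the_Rings",
  "http://dbpedia.org/ontology/author":
    [{"@id": "http://dbpedia.org/resource/J._R._R._Tolkien"}]
}"""


def simple_jsonld_triples(text):
    """Read triples from a single expanded JSON-LD node object (toy subset)."""
    node = json.loads(text)
    subject = node["@id"]
    triples = []
    for predicate, values in node.items():
        if predicate.startswith("@"):
            continue  # keywords such as @id are not properties
        for value in values:
            # {"@id": ...} refers to another node; {"@value": ...} is a literal.
            triples.append((subject, predicate, value.get("@id", value.get("@value"))))
    return triples


if __name__ == "__main__":
    print(simple_jsonld_triples(JSON_LD))
```
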
SimPhoNy supports all the previous formats (plus a simpler custom YAML format) as input for ontology installation.

#### SPARQL
[SPARQL](https://www.w3.org/TR/sparql11-overview/) (a recursive acronym for SPARQL Protocol and RDF Query Language) is the most common query language for RDF.
Queries are graph patterns (similar to the triples of Turtle) with variables for the parts of the pattern that make up the result.

Variables start with the identifier `?` and represent concrete values that will be matched in the query process.
They can appear in multiple locations in the patterns and those present in the
`SELECT` clause will be returned as the query result.

The query for the author of _The Lord of the Rings_ from our sample triples in SPARQL is:
```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?person WHERE {
  dbr:The_Lord_of_the_Rings dbo:author ?person .
}
```
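
To illustrate how variables are matched, here is a toy Python evaluation of a single triple pattern. It is a sketch, not how SPARQL engines are implemented: prefixed names are kept as plain strings, and the _The Hobbit_ triple is added only so that one triple with a different subject gets skipped.

```python
TRIPLES = [
    ("dbr:The_Lord_of_the_Rings", "dbo:author", "dbr:J._R._R._Tolkien"),
    ("dbr:The_Hobbit", "dbo:author", "dbr:J._R._R._Tolkien"),
]


def match_pattern(triples, pattern):
    """Return one dict of variable bindings per triple matching `pattern`.

    Terms starting with "?" are variables; all other terms must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # The same variable must get the same value everywhere it occurs.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


if __name__ == "__main__":
    # SELECT ?person WHERE { dbr:The_Lord_of_the_Rings dbo:author ?person . }
    rows = match_pattern(TRIPLES, ("dbr:The_Lord_of_the_Rings", "dbo:author", "?person"))
    print([row["?person"] for row in rows])  # ['dbr:J._R._R._Tolkien']
```

Projecting `row["?person"]` plays the role of the `SELECT` clause: only the variables listed there end up in the result.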

The SPARQL query language offers multiple types of result sets and clauses, most of which are not covered here.
One that should be mentioned is the `FILTER` keyword.
It limits the results to those for which the expression inside the parentheses evaluates to `true`.
For instance (omitting the prefix declaration for simplicity):

```sparql
SELECT ?character WHERE {
  ?character dbp:affiliation dbr:The_Lord_of_the_Rings .
  ?character dbo:age ?age .
  FILTER(?age >= 100)
}
```
The previous query would return the characters from the book series with an age greater than or equal to 100.
(Note that while the query is correct, the result is empty, as such information is not stored in DBpedia.)
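
The same toy approach can be extended to a join of several patterns plus a `FILTER` condition. In this Python sketch the characters and their ages are made-up placeholders (DBpedia has no such data), the `ex:` prefix is hypothetical, and the filter is an ordinary Python function applied to each set of bindings.

```python
# Made-up placeholder data: DBpedia does not store these ages.
TRIPLES = [
    ("ex:CharacterA", "dbp:affiliation", "dbr:The_Lord_of_the_Rings"),
    ("ex:CharacterA", "dbo:age", 150),
    ("ex:CharacterB", "dbp:affiliation", "dbr:The_Lord_of_the_Rings"),
    ("ex:CharacterB", "dbo:age", 50),
]

QUERY = [
    ("?character", "dbp:affiliation", "dbr:The_Lord_of_the_Rings"),
    ("?character", "dbo:age", "?age"),
]


def match(triples, pattern, binding):
    """Yield every extension of `binding` under which `pattern` matches a triple."""
    for triple in triples:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if isinstance(term, str) and term.startswith("?"):
                # A variable: bind it, or check it agrees with an earlier binding.
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield new


def select(triples, patterns, variable, condition=lambda b: True):
    """Join all patterns, keep bindings passing `condition` (the FILTER), project `variable`."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [new for b in bindings for new in match(triples, pattern, b)]
    return [b[variable] for b in bindings if condition(b)]


if __name__ == "__main__":
    print(select(TRIPLES, QUERY, "?character", lambda b: b["?age"] >= 100))
    # ['ex:CharacterA']
```

The shared variable `?character` is what joins the two patterns: a binding from the first pattern only survives if the second pattern matches with the same value.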

For a very interesting and comprehensive introduction into RDF and SPARQL, see [[Hitzler, 2009]](http://dx.doi.org/10.1201/9781420090512).
2 changes: 1 addition & 1 deletion docs/source/general_architecture.md
…
The closer to the user, the closer to the ontology concepts.
The abstraction is replaced by specificity when you move towards the backend.

For example, the City, Street or Neighborhood classes from the demonstrative [City Ontology](./ontologies_included.md#the-city-ontology) included in OSP-core, as well as the individuals that can be instantiated using them, would be part of the semantic layer. Any wrapper (e.g. the included [SQLite wrapper](https://github.com/simphony/osp-core/tree/master/osp/wrappers/sqlite)) would be part of the interoperability layer. Finally, following the SQLite example, the [sqlite3 library](https://docs.python.org/3/library/sqlite3.html) from Python would be part of the syntactic layer.


For a full explanation of the architecture and design, go to [detailed design](./detailed_design.md).
2 changes: 2 additions & 0 deletions docs/source/links.md
============= ============ =============================== ==================
```

Some of the explanations and background provided have been adapted from Pablo de Andres'
master's thesis "Natural Language Search on an ontology-based data structure".

# Compatibility table
The following table describes the compatibility between the SimPhoNy docs and OSP-core.
9 changes: 3 additions & 6 deletions docs/source/ontologies_included.md
…

Take a look at our [examples](jupyter/cuds-api.md) to see how you can build your own city!

## Working with EMMO

The second ontology that is ready to be used out of the box is the European
Materials Modelling Ontology, or EMMO in short.
For a short introduction, see the [fundamentals](./fundamentals.md#emmo).

You can install EMMO using [Pico](utils.md#pico-installs-cuds-ontologies).
