# ABECTO Tutorial

ABECTO is an **AB**ox **E**valuation and **C**omparison **T**ool for **O**ntologies.
It lets you easily compare and evaluate two or more ontologies with regard to the correctness and completeness of the facts they contain.
ABECTO implements a workflow that consists of five components:
* A *source* component to load the ontologies,
* a *transformation* component to add deduced axioms to the ontologies in preparation for further processing,
* a *mapping* component to map the resources of the ontologies,
* a *comparison* component to provide measurements of the ontologies, and
* an *evaluation* component to identify potential mistakes in the ontologies.

For each component, ABECTO provides a couple of *processors*, which provide specific functionality.
These processors can be arranged into a processing pipeline to define the comparison process.
This tutorial provides an introduction to the use of ABECTO inside a Jupyter Notebook.

## Preparation

Before we can start, we need to complete a few preparatory steps. If ABECTO has not been compiled yet, we should do that now. (This step is not needed if you run this notebook on [mybinder.org](https://mybinder.org).)

```
mvn package -Dmaven.test.skip=true
```

ABECTO runs as an HTTP REST service in the background. We will use some provided Python functions that hide the raw HTTP requests.

In [None]:
from abecto import *

First, we create some sample files that we will use in this tutorial.
These four files belong to three ontologies, all of which describe some people and their relations.
To load a file from the filesystem of the machine that hosts the Jupyter notebook, we could use `open("path/to/ontology.file")` instead.

In [None]:
import tempfile

source1file1 = tempfile.TemporaryFile(mode = "w+")
source1file1.write("""
    BASE         <http://example.org/a/>
    PREFIX :     <http://example.org/a/>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

    :alice rdf:type   :Person ;
           rdfs:label "Alice" ;
           :pnr       "45678"^^xsd:integer ;
           :boss      :bob .
""")
source1file1.seek(0)

source1file2 = tempfile.TemporaryFile(mode = "w+")
source1file2.write("""
    BASE         <http://example.org/a/>
    PREFIX :     <http://example.org/a/>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

    :bill rdf:type   :Person ;
          rdfs:label "Bill" ;
          :pnr       "67890"^^xsd:integer ;
          :boss      :alice .
""")
source1file2.seek(0)

source2file1 = tempfile.TemporaryFile(mode = "w+")
source2file1.write("""
    BASE            <http://example.org/b/>
    PREFIX :        <http://example.org/b/>
    PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX owl:     <http://www.w3.org/2002/07/owl#>
    PREFIX dcterms: <http://purl.org/dc/terms/>
    
    <http://example.org/a/> rdf:type owl:Ontology ;
                            dcterms:modified "2020-07-21" .

    :alice rdf:type   :Person ;
           rdfs:label "Alice" ;
           :boss      :alice .

    :william rdf:type   :Person ;
             rdfs:label "William" ;
             :boss      "Alice" .

    :charlie rdf:type   :Person ;
             rdfs:label "Charlie" .
""")
source2file1.seek(0);

source3file1 = tempfile.TemporaryFile(mode = "w+")
source3file1.write("""
    BASE         <http://example.org/c/>
    PREFIX :     <http://example.org/c/>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
    PREFIX owl:  <http://www.w3.org/2002/07/owl#>
    
    <http://example.org/c/> rdf:type owl:Ontology ;
                            owl:versionInfo "2.1" .

    :P001 rdf:type   :Person ;
          rdfs:label "Alice" ;
          :pnr       "12345"^^xsd:integer .

    :P002 rdf:type   :Person ;
          rdfs:label "Charlie" ;
          :pnr       "45678"^^xsd:integer .

    :P003 rdf:type   :Person ;
          rdfs:label "Dave" ;
          :pnr       "98765"^^xsd:integer .

    :P004 rdf:type   :Person ;
          rdfs:label "Williams" ;
          :pnr       "10000"^^xsd:integer .
""")
source3file1.seek(0);
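
The `seek(0)` calls in the cell above matter because a file object that has just been written to is positioned at its end. The following minimal sketch, using only the Python standard library and a made-up one-line Turtle statement, illustrates the difference:

```python
import tempfile

# Write some Turtle text into a temporary file opened for reading and writing.
with tempfile.TemporaryFile(mode="w+") as demo_file:
    demo_file.write("<http://example.org/s> <http://example.org/p> <http://example.org/o> .\n")

    # Directly after writing, the position is at the end: nothing is left to read.
    read_without_rewind = demo_file.read()

    # seek(0) moves the position back to the start, so the whole content is readable.
    demo_file.seek(0)
    read_after_rewind = demo_file.read()

print(repr(read_without_rewind))
print(repr(read_after_rewind))
```

Without the rewind, a consumer of the file object would see an empty document.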

Now, we start the service. **This might take a few seconds.** (If ABECTO is already running, this will just initialize the Python object needed in this notebook.)

In [None]:
abecto = Abecto("target/abecto.jar") if 'abecto' not in locals() else abecto
abecto.start()

After the service is started, we are ready to create our ontology evaluation and comparison project.

## Project Setup

First, we create a new ABECTO project. If a project with that name already exists, the existing project will be loaded.

In [None]:
project = abecto.project("My Comparison Project")

A project consists of the ontologies to analyze and a processing pipeline. Each node of the pipeline calls a processor with a specific set of parameters and input nodes. The results of these processings are RDF models that can be consumed by further nodes or fetched for analysis.
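
To make the idea of a pipeline more concrete, here is a toy model in plain Python. It is only an illustration of nodes that consume the results of their input nodes. `ToyNode` and `run` are hypothetical names and do not reflect how ABECTO is implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyNode:
    """Hypothetical stand-in for a pipeline node (not an ABECTO class)."""
    name: str
    processor: Callable[[List[set]], set]  # turns the input results into a result
    inputs: List["ToyNode"] = field(default_factory=list)


def run(node: ToyNode, cache: Dict[str, set]) -> set:
    """Process the input nodes first, then the node itself; reuse cached results."""
    if node.name not in cache:
        input_results = [run(parent, cache) for parent in node.inputs]
        cache[node.name] = node.processor(input_results)
    return cache[node.name]


# Two source nodes without inputs, each producing a small set of "triples".
source_a = ToyNode("source_a", lambda _: {("alice", "label", "Alice")})
source_b = ToyNode("source_b", lambda _: {("bill", "label", "Bill")})

# A downstream node that receives both source results and passes them through.
merged = ToyNode("merged", lambda results: set().union(*results), [source_a, source_b])

results: Dict[str, set] = {}
print(sorted(run(merged, results)))
```

Each node is processed once, and its result stays available in `results` for any further node or for inspection.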

Now, we create the ontology objects for the ontologies we want to include in our project. If an ontology object with that name already exists, the existing ontology object will be loaded.

In [None]:
onto1 = project.ontology("ABC")
onto2 = project.ontology("DEF")
onto3 = project.ontology("GHI")

An ontology might consist of several sources. For each source we create at least one source node. In this case, we use the `RdfFileSourceProcessor`, which reads RDF files from the local file system.

In [None]:
onto1source1 = onto1.source("RdfFileSourceProcessor")
onto1source2 = onto1.source("RdfFileSourceProcessor")
onto2source1 = onto2.source("RdfFileSourceProcessor")
onto3source1 = onto3.source("RdfFileSourceProcessor")

Now, we load the RDF files into the source nodes. This is done in a two-stage process to allow later updates of the sources.

In [None]:
onto1source1.load(source1file1)
onto1source2.load(source1file2)
onto2source1.load(source2file1)
onto3source1.load(source3file1);

To compare the ontologies, ABECTO needs to know what we want to compare. This is declared with so-called "categories". For each ontology we can define one pattern for each applicable category. The patterns use the Turtle/SPARQL syntax, and one variable must have the same name as the category itself. In this case, we use the `ManualCategoryProcessor` to declare a single category called "person". We use `into()` to create the following node for each ontology with the source node as input. We use `+` to combine the two source nodes of ontology 1.

In [None]:
categories1 = (onto1source1 + onto1source2).into("ManualCategoryProcessor", {"patterns": {
    "person": """{?person <http://www.w3.org/2000/01/rdf-schema#label> ?label ;
                         <http://example.org/a/pnr>                   ?pnr ;
                         <http://example.org/a/boss>                  ?boss .}"""}})
categories2 = onto2source1.into("ManualCategoryProcessor", {"patterns": {
    "person": """{?person <http://www.w3.org/2000/01/rdf-schema#label> ?label . 
                 OPTIONAL {
                     ?person <http://example.org/b/boss> ?boss .
                 }}"""}})
categories3 = onto3source1.into("ManualCategoryProcessor", {"patterns": {
    "person": """{?person <http://www.w3.org/2000/01/rdf-schema#label> ?label ;
                         <http://example.org/c/pnr>                   ?pnr .}"""}})
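
The effect of the `OPTIONAL` block in the second pattern can be sketched in plain Python. This is a simplified illustration, not ABECTO's pattern evaluation: the hypothetical helper `match_category` works on shortened names and assumes at most one value per property.

```python
from collections import defaultdict

# Triples modeled on the second sample ontology, with shortened names.
triples = [
    ("alice", "label", "Alice"), ("alice", "boss", "alice"),
    ("william", "label", "William"), ("william", "boss", "Alice"),
    ("charlie", "label", "Charlie"),
]


def match_category(triples, required, optional=()):
    """Return {resource: {variable: value}} for resources with all required properties."""
    properties = defaultdict(dict)
    for subject, predicate, value in triples:
        properties[subject][predicate] = value
    matches = {}
    for subject, values in properties.items():
        if all(name in values for name in required):
            wanted = list(required) + [name for name in optional if name in values]
            matches[subject] = {name: values[name] for name in wanted}
    return matches


# "boss" as a required property excludes charlie; as an optional one it does not.
strict = match_category(triples, required=["label", "boss"])
lenient = match_category(triples, required=["label"], optional=["boss"])
print(sorted(strict))
print(sorted(lenient))
print(lenient["charlie"])
```

With the optional variant, a resource without a `boss` still belongs to the category and simply has no binding for that variable.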

Next, we map the resources of the different ontologies. It is possible to use multiple mappers in one project. A `MappingProcessor` can set mappings of resource pairs, but can also prohibit mappings of resource pairs. In doing so, it takes the mappings of previous `MappingProcessor`s into account.

To enable manual mapping corrections, we first add a `ManualMappingProcessor`. The manual mappings are defined in the node parameters. Here we prohibit a mapping of `http://example.org/b/william` and `http://example.org/c/P004`. The parameters can be changed later to add further manual mapping corrections. We use all three category nodes as input. This way, the results of the source nodes are also available to the mapping node, as the results of earlier nodes are passed through.

Next, we use the `JaroWinklerMappingProcessor`, a simple mapper based on the Jaro-Winkler similarity, to automatically map the entities of the different ontologies. The mapping will be used by the subsequent nodes. As we do not need to interact with the individual nodes, we chain all the node definitions.

In [None]:
mapping = (categories1 + categories2 + categories3).into("ManualMappingProcessor", {
    "mappings": [],
    "suppressed_mappings": [
        ["http://example.org/b/william", "http://example.org/c/P004"]
    ]})\
    .into("JaroWinklerMappingProcessor", {"threshold": 0.9, "case_sensitive": False, "category": "person", "variables": ["label"]})
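
To get a feeling for the `threshold` of 0.9, the following sketch implements the textbook Jaro-Winkler similarity in plain Python. It is not ABECTO's implementation. The function names, the prefix weight of 0.1, the prefix limit of four characters, and the `>=` comparison with the threshold are assumptions made for this illustration.

```python
def jaro(first: str, second: str) -> float:
    """Jaro similarity of two strings, between 0.0 and 1.0."""
    if first == second:
        return 1.0
    if not first or not second:
        return 0.0
    # Characters match if they are equal and not farther apart than this window.
    window = max(max(len(first), len(second)) // 2 - 1, 0)
    first_matched = [False] * len(first)
    second_matched = [False] * len(second)
    matches = 0
    for i, char in enumerate(first):
        for j in range(max(0, i - window), min(len(second), i + window + 1)):
            if not second_matched[j] and second[j] == char:
                first_matched[i] = second_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    first_seq = [c for c, m in zip(first, first_matched) if m]
    second_seq = [c for c, m in zip(second, second_matched) if m]
    transpositions = sum(a != b for a, b in zip(first_seq, second_seq)) // 2
    return (matches / len(first) + matches / len(second)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(first, second, prefix_weight=0.1, case_sensitive=False):
    """Jaro similarity, boosted for a common prefix of up to four characters."""
    if not case_sensitive:
        first, second = first.lower(), second.lower()
    similarity = jaro(first, second)
    prefix = 0
    for a, b in zip(first[:4], second[:4]):
        if a != b:
            break
        prefix += 1
    return similarity + prefix * prefix_weight * (1 - similarity)


threshold = 0.9
for a, b in [("William", "Williams"), ("Bill", "William"), ("Alice", "Alice")]:
    score = jaro_winkler(a, b)
    print(f"{a} / {b}: {score:.3f} -> {'mapped' if score >= threshold else 'not mapped'}")
```

Under these assumptions, the near-identical labels "William" and "Williams" score above the threshold, which is the kind of mapping a manual suppression can guard against, while "Bill" and "William" stay well below it.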

Now we define some nodes for comparison and evaluation. As the nodes do not depend on each other, we directly use the mapping as input for each of them. This enables parallel processing of these nodes.

In [None]:
mapping.into("CategoryCountProcessor")
mapping.into("LiteralDeviationProcessor", {"variables": {"person": ["label", "pnr"] }})
mapping.into("ResourceDeviationProcessor", {"variables": {"person": ["boss"] }})
mapping.into("CompletenessProcessor");
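
As an illustration of what a deviation between mapped resources means, the following sketch compares literal values of mapped resource pairs in plain Python. The data is taken from the sample files with shortened names. The mapped pairs and the helper `literal_deviations` are assumptions for this example and do not describe the actual logic of the `LiteralDeviationProcessor`.

```python
# Variable values per resource, taken from the sample files (shortened names).
values = {
    "a/alice": {"label": "Alice", "pnr": 45678},
    "c/P001": {"label": "Alice", "pnr": 12345},
    "b/charlie": {"label": "Charlie"},
    "c/P002": {"label": "Charlie", "pnr": 45678},
}

# Assumed mapping result: pairs of resources considered equivalent.
mapped_pairs = [("a/alice", "c/P001"), ("b/charlie", "c/P002")]


def literal_deviations(values, mapped_pairs, variables):
    """List (resource1, resource2, variable, value1, value2) where both values exist but differ."""
    deviations = []
    for first, second in mapped_pairs:
        for variable in variables:
            value1 = values[first].get(variable)
            value2 = values[second].get(variable)
            if value1 is not None and value2 is not None and value1 != value2:
                deviations.append((first, second, variable, value1, value2))
    return deviations


found = literal_deviations(values, mapped_pairs, ["label", "pnr"])
print(found)
```

In this sketch a value that is missing on one side is not counted as a deviation; only two present but different values are reported.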

## Project Execution and Result Reporting

After all nodes have been defined, we now execute the pipeline.

In [None]:
execution = project.runAndAwait()

The returned `Execution` can be used to inspect the execution results. To ensure some degree of reproducibility, we display the available metadata of the sources used.

In [None]:
execution.metadata()

Next, we take a look at the mapping results and the apparently missing resources. We might notice a missing mapping of `http://example.org/a/bill` and `http://example.org/b/william` and add it to the manual mappings for future executions.

In [None]:
execution.mappings()
execution.omissions()
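
A corrected parameter set for the `ManualMappingProcessor` might look like the following sketch. It assumes that `"mappings"` takes pairs of IRIs in the same shape as `"suppressed_mappings"`; the variable names are made up for this example, and how the updated parameters are passed to the node is not shown here.

```python
# Parameters of the ManualMappingProcessor node as defined earlier in this tutorial.
manual_mapping_parameters = {
    "mappings": [],
    "suppressed_mappings": [
        ["http://example.org/b/william", "http://example.org/c/P004"]
    ],
}

# Assumption: "mappings" takes pairs of IRIs in the same shape as "suppressed_mappings".
missing_pair = ["http://example.org/a/bill", "http://example.org/b/william"]
if missing_pair not in manual_mapping_parameters["mappings"]:
    manual_mapping_parameters["mappings"].append(missing_pair)

print(manual_mapping_parameters["mappings"])
```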

Now we inspect the results of the `CategoryCountProcessor`. Note that the following command shows the measurements generated by all processors, not only those of the `CategoryCountProcessor`.

In [None]:
execution.measurements()

Next, we will inspect the deviations between mapped resources, as provided by the `LiteralDeviationProcessor` or the `ResourceDeviationProcessor`.

In [None]:
execution.deviations()

Some processors might also have reported issues, which we inspect now.

In [None]:
execution.issues()

Finally, we shut down the ABECTO server.

In [None]:
abecto.stop()

## Advanced Features

In [None]:
# restart the ABECTO server
abecto.start()

In [None]:
# list projects
abecto.projects()

In [None]:
# get project by id (avoid shadowing the built-in id())
project_id = abecto.projects()[0].id
abecto.getProject(project_id)

In [None]:
# get project information
project.info()

In [None]:
# delete projects
trashProject = abecto.project("Trash Project")
trashProject.delete()
abecto.projects()

In [None]:
# get ontologies of a project
project.ontologies()

In [None]:
# get ontology by id (avoid shadowing the built-in id())
ontology_id = project.ontologies()[0].id
abecto.getOntology(ontology_id)

In [None]:
# get ontology information
onto1.info()

In [None]:
# delete ontologies
trashKB = project.ontology("Trash Ontology")
trashKB.delete()
project.ontologies()

In [None]:
# get nodes of a project
project.nodes()

In [None]:
# get node by id (avoid shadowing the built-in id())
node_id = project.nodes()[0].id
abecto.getNode(node_id)

In [None]:
# get node information
mapping.info()

In [None]:
# get processings of a node
mapping.processings()

In [None]:
# get the last processing of a node
mapping.lastProcessing()

In [None]:
# get the raw results of a processing; might be useful for debugging
mapping.lastProcessing().raw()

In [None]:
# get the result graph of a processing as JSON-LD
mapping.lastProcessing().graph()

In [None]:
# get the results of the processing as pandas.DataFrame
mapping.lastProcessing().dataFrame()

In [None]:
# shut down the ABECTO server
abecto.stop()