# ABECTO Tutorial

ABECTO is an **AB**ox **E**valuation and **C**omparison **T**ool for **O**ntologies. It makes it easy to compare and evaluate two or more RDF knowledge bases with regard to the information they contain. This tutorial introduces the use of ABECTO inside a Jupyter Notebook.

## Preparation

Before we can start, a few preparation steps are needed. If ABECTO has not been compiled yet, we should do so now. (This step is not needed if you run this notebook on [mybinder.org](https://mybinder.org).)

```
mvn package -Dmaven.test.skip=true
```

ABECTO runs as an HTTP REST service in the background. We will use some provided Python functions, which hide the raw HTTP requests.

In [None]:
from abecto import *

First, we create some sample files that we will use in this tutorial.

In [None]:
import tempfile

source1file1 = tempfile.TemporaryFile(mode = "w+")
source1file1.write("""
    BASE         <http://example.org/a/>
    PREFIX :     <http://example.org/a/>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

    :alice rdf:type   :Person ;
           rdfs:label "Alice" ;
           :pnr       "45678"^^xsd:integer ;
           :boss      :bob .
""")
source1file1.seek(0)

source1file2 = tempfile.TemporaryFile(mode = "w+")
source1file2.write("""
    BASE         <http://example.org/a/>
    PREFIX :     <http://example.org/a/>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

    :bill rdf:type   :Person ;
          rdfs:label "Bill" ;
          :pnr       "67890"^^xsd:integer ;
          :boss      :alice .
""")
source1file2.seek(0)

source2file1 = tempfile.TemporaryFile(mode = "w+")
source2file1.write("""
    BASE         <http://example.org/b/>
    PREFIX :     <http://example.org/b/>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    :alice rdf:type   :Person ;
           rdfs:label "Alice" ;
           :boss      :alice .

    :william rdf:type   :Person ;
             rdfs:label "William" ;
             :boss      "Alice" .

    :charlie rdf:type   :Person ;
             rdfs:label "Charlie" .
""")
source2file1.seek(0);

source3file1 = tempfile.TemporaryFile(mode = "w+")
source3file1.write("""
    BASE         <http://example.org/c/>
    PREFIX :     <http://example.org/c/>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

    :P001 rdf:type   :Person ;
          rdfs:label "Alice" ;
          :pnr       "12345"^^xsd:integer .

    :P002 rdf:type   :Person ;
          rdfs:label "Charlie" ;
          :pnr       "45678"^^xsd:integer .

    :P003 rdf:type   :Person ;
          rdfs:label "Dave" ;
          :pnr       "98765"^^xsd:integer .

    :P004 rdf:type   :Person ;
          rdfs:label "Williams" ;
          :pnr       "10000"^^xsd:integer .
""")
source3file1.seek(0);
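
Each file is rewound with `seek(0)` after writing, because a later reader would otherwise start at the end of the file and see no content. The following standalone sketch uses only the Python standard library (the file content is a shortened placeholder) to show the difference:

```python
import tempfile

# Open a temporary text file for both writing and reading ("w+").
with tempfile.TemporaryFile(mode="w+") as handle:
    handle.write("PREFIX : <http://example.org/a/>\n:alice a :Person .\n")

    # After writing, the file position is at the end, so reading yields nothing.
    at_end = handle.read()

    # Rewinding to the start makes the full content available to the next reader.
    handle.seek(0)
    rewound = handle.read()

print(repr(at_end))
print(rewound.startswith("PREFIX"))
```

The first `read()` returns an empty string, while the second returns the text that was written.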

Now, we start the service. **This might take a few seconds.** (If ABECTO is already running, this will just initialize the Python object needed in this notebook.)

In [None]:
abecto = Abecto("http://localhost:8080/", "target/abecto.jar")
abecto.start()

Once the service has started, we are ready to create our ontology evaluation and comparison project.

## Project Setup

First, we create a new ABECTO project. We can also give the project an arbitrary name.

In [None]:
project = abecto.project("My Comparison Project")

A project consists of the knowledge bases to analyse and a processing pipeline. Each node of the pipeline (called a "step") calls a processor with a specific set of parameters and input steps. The result of each processing is an RDF model that can be consumed by further steps or fetched for analysis.

Now, we create a knowledge base object for each knowledge base we want to include in our project.

In [None]:
kb1 = project.knowledgeBase("ABC")
kb2 = project.knowledgeBase("DEF")
kb3 = project.knowledgeBase("GHI")

A knowledge base might consist of several sources. For each source, we create at least one source step. In this case, we use the `RdfFileSourceProcessor`, which reads RDF files from the local file system.

In [None]:
kb1source1 = kb1.source("RdfFileSourceProcessor")
kb1source2 = kb1.source("RdfFileSourceProcessor")
kb2source1 = kb2.source("RdfFileSourceProcessor")
kb3source1 = kb3.source("RdfFileSourceProcessor")

Now, we load the RDF files into the source steps. This is done in a two-stage process to allow later updates of the sources.

In [None]:
kb1source1.load(source1file1)
kb1source2.load(source1file2)
kb2source1.load(source2file1)
kb3source1.load(source3file1);

To compare the knowledge bases, ABECTO needs to know what we want to compare. This is declared with so-called "categories". For each knowledge base, we can define one pattern for each applicable category. The patterns use the Turtle/SPARQL syntax, and one variable must have the same name as the category itself. In this case, we use the `ManualCategoryProcessor` to declare a single category called "person". We use `into()` to create the following step for each knowledge base with the source step as input. We use `+` to combine the two source steps of knowledge base 1.

In [None]:
categories1 = (kb1source1 + kb1source2).into("ManualCategoryProcessor", {"patterns": {
    "person": """?person <http://www.w3.org/2000/01/rdf-schema#label> ?label ;
                         <http://example.org/a/pnr>                   ?pnr ;
                         <http://example.org/a/boss>                  ?boss ."""}})
categories2 = kb2source1.into("ManualCategoryProcessor", {"patterns": {
    "person": """?person <http://www.w3.org/2000/01/rdf-schema#label> ?label . 
                 OPTIONAL {
                     ?person <http://example.org/b/boss> ?boss .
                 }"""}})
categories3 = kb3source1.into("ManualCategoryProcessor", {"patterns": {
    "person": """?person <http://www.w3.org/2000/01/rdf-schema#label> ?label ;
                         <http://example.org/c/pnr>                   ?pnr ."""}})
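
To illustrate what such a pattern selects, here is a toy matcher in plain Python. It is not how ABECTO evaluates patterns; it only mimics the effect of required and `OPTIONAL` triples on the data of the second sample source. The function `match_category` and the abbreviated prefixed names are assumptions made for this sketch, and the `rdf:type` triples are omitted for brevity.

```python
# Triples of the second sample source as (subject, predicate, object) tuples.
# The prefixed names are abbreviations used only in this sketch.
TRIPLES = [
    ("b:alice", "rdfs:label", "Alice"),
    ("b:alice", "b:boss", "b:alice"),
    ("b:william", "rdfs:label", "William"),
    ("b:william", "b:boss", "Alice"),
    ("b:charlie", "rdfs:label", "Charlie"),
]


def match_category(triples, required, optional=None):
    """Bind variables for every subject that has all required predicates.

    `required` and `optional` map variable names to predicates. Only
    star-shaped patterns with single-valued predicates are handled.
    """
    optional = optional or {}
    properties = {}
    for subject, predicate, obj in triples:
        properties.setdefault(subject, {})[predicate] = obj

    bindings = []
    for subject, props in properties.items():
        if not all(pred in props for pred in required.values()):
            continue  # a required triple is missing, so the subject does not match
        row = {"person": subject}
        for var, pred in required.items():
            row[var] = props[pred]
        for var, pred in optional.items():
            if pred in props:
                row[var] = props[pred]  # optional variables may stay unbound
        bindings.append(row)
    return bindings


with_optional = match_category(TRIPLES, {"label": "rdfs:label"}, {"boss": "b:boss"})
all_required = match_category(TRIPLES, {"label": "rdfs:label", "boss": "b:boss"})

print([row["person"] for row in with_optional])
print([row["person"] for row in all_required])
```

With the boss triple optional, all three persons are selected. If it were required, `b:charlie` would drop out of the category, which is why the pattern for the second knowledge base wraps it in `OPTIONAL`.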

Next, we map the resources of the different knowledge bases. It is possible to use multiple mappers in one project. A `MappingProcessor` can set mappings of resource pairs, but can also prohibit mappings of resource pairs. In doing so, it takes the mappings of previous `MappingProcessor`s into account.

To enable manual mapping corrections, we first add a `ManualMappingProcessor`. The manual mappings are defined in the processing parameters. Here we prohibit a mapping of `http://example.org/b/william` and `http://example.org/c/P004`. The parameters can be changed later to add further manual mapping corrections. We use all three category steps as input. This way, the results of the source steps are also available to the mapping step, as the results of earlier steps are passed through.

Next, we use the `JaroWinklerMappingProcessor`, a simple mapper that utilizes the Jaro-Winkler similarity, to automatically map the entities of the different knowledge bases. The mapping will be used by the subsequent steps.

In [None]:
manualMapping = (categories1 + categories2 + categories3).into("ManualMappingProcessor", {
    "mappings": [],
    "suppressed_mappings": [
        ["http://example.org/b/william", "http://example.org/c/P004"]
    ]})
autoMapping = manualMapping.into("JaroWinklerMappingProcessor", {"threshold": 0.9, "case_sensitive": False, "category": "person", "variables": ["label"]})
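
To get a feeling for the `threshold` of 0.9, the following sketch computes the Jaro-Winkler similarity of some labels from the sample files. It is a textbook formulation written for this tutorial (prefix scale 0.1, prefix length capped at 4); the implementation used by ABECTO may differ in details, so the scores are only indicative.

```python
def jaro(s1, s2):
    """Jaro similarity of two strings, in the range 0.0 to 1.0."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    # Characters only count as matching within this distance of each other.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, char in enumerate(s1):
        start = max(0, i - window)
        end = min(len(s2), i + window + 1)
        for j in range(start, end):
            if not matched2[j] and s2[j] == char:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    chars1 = [c for c, m in zip(s1, matched1) if m]
    chars2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(chars1, chars2)) // 2
    return (
        matches / len(s1)
        + matches / len(s2)
        + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1, case_sensitive=False):
    """Jaro similarity boosted for a common prefix of up to four characters."""
    if not case_sensitive:
        s1, s2 = s1.lower(), s2.lower()
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


for first, second in [("Bill", "William"), ("William", "Williams"), ("Alice", "alice")]:
    print(first, second, round(jaro_winkler(first, second), 3))
```

Under this formulation, "William" and "Williams" score about 0.975 and would be mapped at a threshold of 0.9, which is the kind of pairing the suppressed mapping above rules out. "Bill" and "William" score only about 0.726, so such a pair would have to be added manually.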

Now we define some steps for comparison and evaluation. As we do not need to refer to the individual steps later, we chain all the step definitions.

In [None]:
autoMapping.into("CategoryCountProcessor")\
           .into("LiteralDeviationProcessor", {"variables": {"person": ["label", "pnr"] }})\
           .into("ResourceDeviationProcessor", {"variables": {"person": ["boss"] }});
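
Chaining of this kind works when every `into()` call returns the newly created step. The toy classes below model that idea together with the `+` operator. `ToyStep` and `ToyStepGroup` are hypothetical names invented for this sketch and are not ABECTO's actual classes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStep:
    """A pipeline node: a processor name, its parameters and its input steps."""

    processor: str
    parameters: dict = field(default_factory=dict)
    inputs: list = field(default_factory=list)

    def into(self, processor, parameters=None):
        # Returning the new step is what makes chaining possible.
        return ToyStep(processor, parameters or {}, [self])

    def __add__(self, other):
        # Combining steps yields a group that can feed one following step.
        return ToyStepGroup([self]) + other


@dataclass
class ToyStepGroup:
    """Several steps that serve together as input of a following step."""

    steps: list

    def __add__(self, other):
        extra = other.steps if isinstance(other, ToyStepGroup) else [other]
        return ToyStepGroup(self.steps + extra)

    def into(self, processor, parameters=None):
        return ToyStep(processor, parameters or {}, list(self.steps))


source1 = ToyStep("RdfFileSourceProcessor")
source2 = ToyStep("RdfFileSourceProcessor")
last = (source1 + source2).into("ManualCategoryProcessor").into("CategoryCountProcessor")

print(last.processor)
print(len(last.inputs[0].inputs))
```

The chain ends in the count step, whose single input is the category step, which in turn has both source steps as inputs.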

## Project Execution and Reporting

After all steps have been defined, we can execute the pipeline.

In [None]:
execution = project.runAndAwait()

The returned `Execution` can be used to inspect the execution results. We will first take a look at the mappings and can update them if we want to. For example, we could add a mapping of `http://example.org/a/bill` and `http://example.org/b/william`.

In [None]:
execution.mappings(manualMapping)

If we updated the mapping, the changes will be contained in the parameters of the `ManualMappingProcessor`.

In [None]:
manualMapping.parameters()

To take the new mapping into account, we need to re-execute the project.

In [None]:
execution = project.runAndAwait()

Now we will inspect the results of the `CategoryCountProcessor`. The following command would also show any other measurements generated by other processors.

In [None]:
execution.measures()

Next, we will inspect the deviations between mapped resources, as provided by the `LiteralDeviationProcessor` or the `ResourceDeviationProcessor`.

In [None]:
execution.deviations()
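
Conceptually, a literal deviation is a pair of mapped resources whose values for the same variable differ. The sketch below reproduces that idea on values from the sample files. The function `literal_deviations`, the hard-coded mapping pairs and the choice to skip missing values are assumptions of this sketch and do not describe the actual behaviour of the `LiteralDeviationProcessor`.

```python
# Variable values per resource, taken from the sample sources.
VALUES = {
    "http://example.org/a/alice": {"label": "Alice", "pnr": 45678},
    "http://example.org/b/alice": {"label": "Alice"},
    "http://example.org/c/P001": {"label": "Alice", "pnr": 12345},
}

# Hypothetical mapping result: pairs of resources considered equivalent.
MAPPINGS = [
    ("http://example.org/a/alice", "http://example.org/b/alice"),
    ("http://example.org/a/alice", "http://example.org/c/P001"),
]


def literal_deviations(values, mappings, variables):
    """List differing values of mapped resources, one tuple per deviation."""
    deviations = []
    for first, second in mappings:
        for variable in variables:
            value1 = values[first].get(variable)
            value2 = values[second].get(variable)
            if value1 is None or value2 is None:
                continue  # assumption: a missing value is not a deviation
            if value1 != value2:
                deviations.append((first, second, variable, value1, value2))
    return deviations


deviations = literal_deviations(VALUES, MAPPINGS, ["label", "pnr"])
for deviation in deviations:
    print(deviation)
```

For this input, the only deviation is the `pnr` of Alice, which is 45678 in the first source and 12345 in the third.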

Some processors might also have reported issues, which we inspect now.

In [None]:
execution.issues()

Finally, we shut down the ABECTO server.

In [None]:
abecto.stop()

## Advanced Features

In [None]:
# restart the ABECTO server
abecto.start()

In [None]:
# list projects
abecto.projects()

In [None]:
# get a project by id (the name project_id avoids shadowing the built-in id())
project_id = abecto.projects()[0].id
abecto.getProject(project_id)

In [None]:
# get project information
project.info()

In [None]:
# delete projects
trashProject = abecto.project("Trash Project")
trashProject.delete()
abecto.projects()

In [None]:
# get knowledge bases of a project
project.knowledgeBases()

In [None]:
# get a knowledge base by id (the name kb_id avoids shadowing the built-in id())
kb_id = project.knowledgeBases()[0].id
abecto.getKnowledgeBase(kb_id)

In [None]:
# get knowledge base information
kb1.info()

In [None]:
# delete knowledge bases
trashKB = project.knowledgeBase("Trash Knowledge Base")
trashKB.delete()
project.knowledgeBases()

In [None]:
# get steps of a project
project.steps()

In [None]:
# get a step by id (the name step_id avoids shadowing the built-in id())
step_id = project.steps()[0].id
abecto.getStep(step_id)

In [None]:
# get step information
manualMapping.info()

In [None]:
# get processings of a step
manualMapping.processings()

In [None]:
# get the last processing of a step
manualMapping.lastProcessing()

In [None]:
# get the raw results of a processing; might be useful for debugging
manualMapping.lastProcessing().raw()

In [None]:
# get the result graph of a processing as JSON-LD
manualMapping.lastProcessing().graph()

In [None]:
# get the results of the processing as pandas.DataFrame
manualMapping.lastProcessing().dataFrame()

In [None]:
# shut down the ABECTO server
abecto.stop()