# ABECTO Tutorial

ABECTO is an **AB**ox **E**valuation and **C**omparison **T**ool for **O**ntologies. It lets you compare and evaluate two or more RDF knowledge bases with respect to the information they contain. This tutorial introduces the use of ABECTO.


## Preparation

Before we can start, we need to take a few preparatory steps. If ABECTO has not been compiled yet, we should do that now. (This step is not needed if you run this notebook on [mybinder.org](https://mybinder.org).)

```
mvn package -Dmaven.test.skip=true
```

ABECTO runs as an HTTP REST service in the background. We will use some provided Python functions, which hide the raw HTTP requests.

In [None]:
from abecto import *

First, we create some sample files that we will use in this tutorial.

In [None]:
import tempfile

source1file1 = tempfile.TemporaryFile(mode = "w+")
source1file1.write("""
    BASE         <http://example.org/a/>
    PREFIX :     <http://example.org/a/>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

    :alice rdf:type   :Person ;
           rdfs:label "Alice" ;
           :pnr       "45678"^^xsd:integer ;
           :boss      :bob .
""")
source1file1.seek(0)

source1file2 = tempfile.TemporaryFile(mode = "w+")
source1file2.write("""
    BASE         <http://example.org/a/>
    PREFIX :     <http://example.org/a/>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

    :bill rdf:type   :Person ;
          rdfs:label "Bill" ;
          :pnr       "67890"^^xsd:integer ;
          :boss      :alice .
""")
source1file2.seek(0)

source2file1 = tempfile.TemporaryFile(mode = "w+")
source2file1.write("""
    BASE         <http://example.org/b/>
    PREFIX :     <http://example.org/b/>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    :alice rdf:type   :Person ;
           rdfs:label "Alice" ;
           :boss      :alice .

    :william rdf:type   :Person ;
             rdfs:label "William" ;
             :boss      "Alice" .

    :charlie rdf:type   :Person ;
             rdfs:label "Charlie" .
""")
source2file1.seek(0);

source3file1 = tempfile.TemporaryFile(mode = "w+")
source3file1.write("""
    BASE         <http://example.org/c/>
    PREFIX :     <http://example.org/c/>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

    :P001 rdf:type   :Person ;
          rdfs:label "Alice" ;
          :pnr       "12345"^^xsd:integer .

    :P002 rdf:type   :Person ;
          rdfs:label "Charlie" ;
          :pnr       "45678"^^xsd:integer .
""")
source3file1.seek(0);
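
Each sample file is rewound with `seek(0)` after writing. Writing leaves the file position at the end of the file, so a reader that starts from the current position would find nothing. The following minimal sketch uses only the standard library and is independent of ABECTO; it assumes that the consumer of the file (here presumably `load()`) reads from the current position.

```python
import tempfile

# Writing advances the file position to the end of the file.
with tempfile.TemporaryFile(mode="w+") as f:
    f.write("<http://example.org/a/alice> <http://example.org/a/boss> <http://example.org/a/bob> .")
    at_end = f.read()    # read from the current position: nothing is left
    f.seek(0)            # rewind to the start of the file
    rewound = f.read()   # now the full content is returned

print(repr(at_end))
print(rewound)
```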

Now, we start the service. **This might take a few seconds.** (If ABECTO is already running, this will just initialize the Python object needed in this notebook.)

In [None]:
abecto = Abecto("http://localhost:8080/", "target/abecto.jar")
abecto.start()

Once the service has started, we are ready to create our ontology evaluation and comparison project.

## Project Setup

First, we create a new ABECTO project. We can also give the project an arbitrary name.

In [None]:
project = abecto.project("My Comparison Project")

A project consists of the knowledge bases to analyse and a processing pipeline. Each node of the pipeline (called a "step") calls a processor with a specific set of parameters and input steps. The result of each processing is an RDF model that can be consumed by later steps or fetched for analysis.
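
To make the idea of a pipeline of steps more concrete, here is a toy model of the data structure described above. It is only an illustrative sketch: `ToyStep` and `execution_order` are hypothetical names invented for this example and are not part of the ABECTO Python functions. The processor names are taken from this tutorial.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: two steps are the same only if they are the same object
class ToyStep:
    """Hypothetical stand-in for a pipeline step; not ABECTO's own class."""
    processor: str
    parameters: dict = field(default_factory=dict)
    inputs: list = field(default_factory=list)


def execution_order(step, ordered=None):
    """Return the step and all of its upstream steps, inputs first."""
    ordered = [] if ordered is None else ordered
    if step in ordered:
        return ordered
    for parent in step.inputs:
        execution_order(parent, ordered)
    ordered.append(step)
    return ordered


# Two sources feed one category step, which feeds one mapping step.
source_a = ToyStep("RdfFileSourceProcessor")
source_b = ToyStep("RdfFileSourceProcessor")
categories = ToyStep("ManualCategoryProcessor", {"patterns": {}}, [source_a, source_b])
mapping = ToyStep("JaroWinklerMappingProcessor", {"threshold": 0.9}, [categories])

print([s.processor for s in execution_order(mapping)])
```

The point of the sketch is only that a step is fully described by its processor, its parameters, and its input steps, and that a step can run once all of its inputs have produced their results.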

Now, we create the knowledge base objects for the knowledge bases we want to include in our project.

In [None]:
kb1 = project.knowledgeBase("ABC")
kb2 = project.knowledgeBase("DEF")
kb3 = project.knowledgeBase("GHI")

A knowledge base might consist of several sources. For each source, we create at least one source step. In this case, we use the `RdfFileSourceProcessor`, which reads RDF files from the local file system.

In [None]:
kb1source1 = kb1.source("RdfFileSourceProcessor")
kb1source2 = kb1.source("RdfFileSourceProcessor")
kb2source1 = kb2.source("RdfFileSourceProcessor")
kb3source1 = kb3.source("RdfFileSourceProcessor")

Now, we load the RDF files into the source steps. This is done in a two-stage process to allow later updates of the sources.

In [None]:
kb1source1.load(source1file1)
kb1source2.load(source1file2)
kb2source1.load(source2file1)
kb3source1.load(source3file1);

To compare the knowledge bases, ABECTO needs to know what we want to compare. This is declared with so-called "categories". For each knowledge base, we can define one pattern for each applicable category. The patterns use Turtle/SPARQL syntax, and one variable must have the same name as the category itself. In this case, we use the `ManualCategoryProcessor` to declare a single category called "person". We use `into()` to create the next step for each knowledge base, with the source step as input, and `plus()` to combine the two source steps of knowledge base 1.

In [None]:
categories1 = kb1source1.plus(kb1source2).into("ManualCategoryProcessor", {"patterns": {
    "person": """?person <http://www.w3.org/2000/01/rdf-schema#label> ?label ;
                         <http://example.org/a/pnr>                   ?pnr ;
                         <http://example.org/a/boss>                  ?boss ."""}})
categories2 = kb2source1.into("ManualCategoryProcessor", {"patterns": {
    "person": """?person <http://www.w3.org/2000/01/rdf-schema#label> ?label . 
                 OPTIONAL {
                     ?person <http://example.org/b/boss> ?boss .
                 }"""}})
categories3 = kb3source1.into("ManualCategoryProcessor", {"patterns": {
    "person": """?person <http://www.w3.org/2000/01/rdf-schema#label> ?label ;
                         <http://example.org/c/pnr>                   ?pnr ."""}})
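
Because each pattern must contain a variable named after its category, a quick local check before creating the steps can catch typos early. The helper below is a hypothetical sketch, not part of ABECTO: `missing_category_variables` is an invented name, the "city" category exists only to show a failing case, and the regular expression is a rough approximation of SPARQL variable syntax (it would also match a `?` inside an IRI).

```python
import re


def missing_category_variables(patterns):
    """Return the categories whose pattern lacks a variable of the same name."""
    missing = []
    for category, pattern in patterns.items():
        # SPARQL variables start with "?" or "$"; collect their names.
        variables = set(re.findall(r"[?$](\w+)", pattern))
        if category not in variables:
            missing.append(category)
    return missing


patterns = {
    # Valid: the pattern binds ?person for the category "person".
    "person": "?person <http://www.w3.org/2000/01/rdf-schema#label> ?label .",
    # Invalid on purpose: the category is "city" but the variable is ?town.
    "city": "?town <http://www.w3.org/2000/01/rdf-schema#label> ?label .",
}

print(missing_category_variables(patterns))
```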

In the next step, we use the `JaroWinklerMappingProcessor` to map the entities of the different knowledge bases. To do so, we use all three category steps as input. This way, the results of the source steps are also available to the mapping step, because the results of earlier steps are passed through. The mapping will be used by the subsequent steps.

In [None]:
mapping1 = categories1.plus(categories2).plus(categories3).into("JaroWinklerMappingProcessor", {"threshold": 0.9, "case_sensitive": False, "category": "person", "variables": ["label"]})
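
To get a feeling for what the `threshold` of 0.9 means, the following sketch computes the Jaro-Winkler similarity of two labels in plain Python. It is a textbook implementation written for this tutorial, not ABECTO's own code; the prefix scale of 0.1, the prefix limit of four characters, and the handling of `case_sensitive` are assumptions, so the processor may produce slightly different scores.

```python
def jaro(s1, s2):
    """Jaro similarity in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    # Characters only count as matching within this sliding window.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order are transpositions.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, case_sensitive=False, prefix_scale=0.1):
    """Jaro similarity boosted for a common prefix of up to four characters."""
    if not case_sensitive:
        s1, s2 = s1.lower(), s2.lower()
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


for a, b in [("Alice", "alice"), ("Bill", "William")]:
    print(a, b, round(jaro_winkler(a, b), 3))
```

With this sketch, "Bill" and "William" score roughly 0.73, which is below the threshold of 0.9, so a purely label-based mapping would not link these two entities.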

It is also possible to use multiple mappers in one project. We now improve the mapping using the `ManualMappingProcessor`.

In [None]:
mapping2 = mapping1.into("ManualMappingProcessor", {"mappings": [["http://example.org/a/bill","http://example.org/b/william"]]})

With the `MappingReportProcessor` we prepare the mappings for later reporting.

In [None]:
mappingReport = mapping2.into("MappingReportProcessor")

In [None]:
count = mappingReport.into("CategoryCountProcessor")

In [None]:
literalDeviation = count.into("LiteralDeviationProcessor", {"variables": {"person": ["label", "pnr"] }})

In [None]:
resourceDeviation = literalDeviation.into("ResourceDeviationProcessor", {"variables": {"person": ["boss"] }})

In [None]:
deviationReport = resourceDeviation.into("DeviationReportProcessor")

In [None]:
issueReport = deviationReport.into("IssueReportProcessor")

## Project Execution and Reporting

In [None]:
execution = project.runAndAwait()

In [None]:
execution.measures()

In [None]:
mappingReport.report()

In [None]:
count.report()

In [None]:
deviationReport.report()

In [None]:
issueReport.report()

## Advanced Features

Our new project now appears in the list of projects.

In [None]:
abecto.projects()

We can also request information about a specific project.

In [None]:
project.info()

Furthermore, we can delete projects.

In [None]:
trashProject = abecto.project("Trash Project")
trashProject.delete()
abecto.projects()

The knowledge bases now appear in the project's list of knowledge bases.

In [None]:
project.knowledgeBases()

We can also request information about a specific knowledge base.

In [None]:
kb1.info()

And we can delete knowledge bases.

In [None]:
trashKB = project.knowledgeBase("Trash Knowledge Base")
trashKB.delete()
project.knowledgeBases()

In [None]:
project.steps()

In [None]:
kb1source1.info()

In [None]:
mapping2.processings()

In [None]:
mappingReport.last()

In [None]:
mappingReport.last().raw()

In [None]:
mappingReport.last().graph()

In [None]:
mappingReport.last().graphAsDataFrame()