# ABECTO Tutorial

ABECTO is an **AB**ox **E**valuation and **C**omparison **T**ool for **O**ntologies. It makes it easy to compare and evaluate two or more RDF knowledge bases with respect to the information they contain. This tutorial provides an introduction to the use of ABECTO.


## Preparation

Before we can start, we need to do a few preparation steps. If ABECTO has not been compiled yet, we should compile it now. (This step is not needed if you run this notebook on [mybinder.org](https://mybinder.org).)

```
mvn package -Dmaven.test.skip=true
```

ABECTO runs as an HTTP REST service in the background. We will use some provided Python functions that hide the raw HTTP requests.

In [None]:
from abecto import *

First, we create some sample files that we will use in this tutorial.

In [None]:
import tempfile

source1file1 = tempfile.TemporaryFile(mode = "w+")
source1file1.write("""
    BASE         <http://example.org/a/>
    PREFIX :     <http://example.org/a/>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

    :alice rdf:type   :Person ;
           rdfs:label "Alice" ;
           :pnr       "45678"^^xsd:integer ;
           :boss      :bob .
""")
source1file1.seek(0)

source1file2 = tempfile.TemporaryFile(mode = "w+")
source1file2.write("""
    BASE         <http://example.org/a/>
    PREFIX :     <http://example.org/a/>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

    :bill rdf:type   :Person ;
          rdfs:label "Bill" ;
          :pnr       "67890"^^xsd:integer ;
          :boss      :alice .
""")
source1file2.seek(0)

source2file1 = tempfile.TemporaryFile(mode = "w+")
source2file1.write("""
    BASE         <http://example.org/b/>
    PREFIX :     <http://example.org/b/>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    :alice rdf:type   :Person ;
           rdfs:label "Alice" ;
           :boss      :alice .

    :william rdf:type   :Person ;
             rdfs:label "William" ;
             :boss      "Alice" .

    :charlie rdf:type   :Person ;
             rdfs:label "Charlie" .
""")
source2file1.seek(0);

source3file1 = tempfile.TemporaryFile(mode = "w+")
source3file1.write("""
    BASE         <http://example.org/c/>
    PREFIX :     <http://example.org/c/>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

    :P001 rdf:type   :Person ;
          rdfs:label "Alice" ;
          :pnr       "12345"^^xsd:integer .

    :P002 rdf:type   :Person ;
          rdfs:label "Charlie" ;
          :pnr       "45678"^^xsd:integer .

    :P003 rdf:type   :Person ;
          rdfs:label "Dave" ;
          :pnr       "98765"^^xsd:integer .
""")
source3file1.seek(0);

Now, we start the service. **This might take a few seconds.** (If ABECTO is already running, this will just initialize the Python object needed in this notebook.)

In [None]:
abecto = Abecto("http://localhost:8080/", "target/abecto.jar")
abecto.start()

Once the service has started, we are ready to create our ontology evaluation and comparison project.

## Project Setup

First, we create a new ABECTO project and give it an arbitrary name.

In [None]:
project = abecto.project("My Comparison Project")

A project consists of the knowledge bases to analyse and a processing pipeline. Each node of the pipeline (called a "step") calls a processor with a specific set of parameters and input steps. The results of these processing runs are RDF models that can be consumed by further steps or fetched for analysis.
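Since every step names its input steps, a pipeline forms a directed acyclic graph. The toy model below is plain Python, not ABECTO code: it writes the pipeline built later in this tutorial as a hypothetical dictionary from each step to its input steps and computes all steps upstream of a given step. Assuming results are passed through along the input edges, these are the steps whose results a step can see.

```python
def upstream(step: str, inputs: dict[str, list[str]]) -> set[str]:
    """Return every step reachable from `step` by following input edges backwards."""
    seen: set[str] = set()
    stack = list(inputs.get(step, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            # the inputs of an input are also upstream of `step`
            stack.extend(inputs.get(current, []))
    return seen


# hypothetical encoding of this tutorial's pipeline: step -> input steps
pipeline = {
    "categories1": ["kb1source1", "kb1source2"],
    "categories2": ["kb2source1"],
    "categories3": ["kb3source1"],
    "autoMapping": ["categories1", "categories2", "categories3"],
    "manualMapping": ["autoMapping"],
}

print(sorted(upstream("autoMapping", pipeline)))
```

In this model the mapping step sees all three category steps and, through them, all four source steps.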

Now, we create a knowledge base object for each knowledge base we want to include in our project.

In [None]:
kb1 = project.knowledgeBase("ABC")
kb2 = project.knowledgeBase("DEF")
kb3 = project.knowledgeBase("GHI")

A knowledge base might consist of several sources. For each source, we create at least one source step. In this case, we use the `RdfFileSourceProcessor`, which reads RDF files from the local file system.

In [None]:
kb1source1 = kb1.source("RdfFileSourceProcessor")
kb1source2 = kb1.source("RdfFileSourceProcessor")
kb2source1 = kb2.source("RdfFileSourceProcessor")
kb3source1 = kb3.source("RdfFileSourceProcessor")

Now, we load the RDF files into the source steps. This is done in a two-stage process to allow later updates of the sources.

In [None]:
kb1source1.load(source1file1)
kb1source2.load(source1file2)
kb2source1.load(source2file1)
kb3source1.load(source3file1);

To compare the knowledge bases, ABECTO needs to know what we want to compare. This is declared with so-called "categories". For each knowledge base, we can define one pattern for each applicable category. The patterns use the Turtle/SPARQL syntax, and one variable must have the same name as the category itself. In this case, we use the `ManualCategoryProcessor` to declare a single category called "person". We use `into()` to create the subsequent step for each knowledge base with the source step as input, and `+` to combine the two source steps of knowledge base 1.

In [None]:
categories1 = (kb1source1 + kb1source2).into("ManualCategoryProcessor", {"patterns": {
    "person": """?person <http://www.w3.org/2000/01/rdf-schema#label> ?label ;
                         <http://example.org/a/pnr>                   ?pnr ;
                         <http://example.org/a/boss>                  ?boss ."""}})
categories2 = kb2source1.into("ManualCategoryProcessor", {"patterns": {
    "person": """?person <http://www.w3.org/2000/01/rdf-schema#label> ?label . 
                 OPTIONAL {
                     ?person <http://example.org/b/boss> ?boss .
                 }"""}})
categories3 = kb3source1.into("ManualCategoryProcessor", {"patterns": {
    "person": """?person <http://www.w3.org/2000/01/rdf-schema#label> ?label ;
                         <http://example.org/c/pnr>                   ?pnr ."""}})
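Because each pattern must contain a variable named after its category, a quick local check before sending the patterns to ABECTO can catch typos. The helper `check_patterns` below is hypothetical (it is not part of the `abecto` module) and only searches the text for `?person` or `$person`; it does not parse SPARQL.

```python
import re


def check_patterns(patterns: dict[str, str]) -> list[str]:
    """Report each category whose pattern lacks a variable named after the category."""
    problems = []
    for category, pattern in patterns.items():
        # SPARQL variables are written as ?name or $name; \b stops "?person" matching "?personnel"
        variable = re.compile(r"[?$]" + re.escape(category) + r"\b")
        if not variable.search(pattern):
            problems.append(f"pattern for '{category}' does not use ?{category}")
    return problems


# the pattern used above for knowledge base 3
patterns3 = {"person": """?person <http://www.w3.org/2000/01/rdf-schema#label> ?label ;
                         <http://example.org/c/pnr>                   ?pnr ."""}
print(check_patterns(patterns3))  # []
print(check_patterns({"person": "?p <http://www.w3.org/2000/01/rdf-schema#label> ?label ."}))
```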

In the next step, we use the `JaroWinklerMappingProcessor` to map the entities of the different knowledge bases. To do so, we use all three category steps as input. The results of the source steps are thereby also available to the mapping step, as the results of earlier steps are passed through. The mapping will be used by the subsequent steps.

In [None]:
autoMapping = (categories1 + categories2 + categories3).into("JaroWinklerMappingProcessor", {"threshold": 0.9, "case_sensitive": False, "category": "person", "variables": ["label"]})
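To get a feeling for the `threshold` parameter, here is a textbook Jaro-Winkler similarity (scaling factor 0.1, common prefix of at most four characters) applied case-insensitively to labels from the sample data. This is a plain-Python sketch, and ABECTO's processor may differ in details; `match_labels` is a hypothetical helper. It suggests why "Bill" and "William" are not mapped automatically at a threshold of 0.9, while the two "Alice" resources are.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity of two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # characters only count as matching if they are at most `window` positions apart
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, char in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == char:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # half-transpositions: matched characters that appear in a different order
    half_transpositions = 0
    j = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                half_transpositions += 1
            j += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, scaling: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to `max_prefix` characters."""
    similarity = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return similarity + prefix * scaling * (1 - similarity)


def match_labels(labels1, labels2, threshold=0.9, case_sensitive=False):
    """Hypothetical helper: all resource pairs whose labels reach the threshold."""
    pairs = []
    for resource1, label1 in labels1.items():
        for resource2, label2 in labels2.items():
            x, y = (label1, label2) if case_sensitive else (label1.lower(), label2.lower())
            if jaro_winkler(x, y) >= threshold:
                pairs.append((resource1, resource2))
    return pairs


labels_a = {"http://example.org/a/alice": "Alice", "http://example.org/a/bill": "Bill"}
labels_b = {
    "http://example.org/b/alice": "Alice",
    "http://example.org/b/william": "William",
    "http://example.org/b/charlie": "Charlie",
}
print(match_labels(labels_a, labels_b))
print(round(jaro_winkler("bill", "william"), 3))  # 0.726
```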

It is also possible to use multiple mappers in one project. We now improve the mapping using the `ManualMappingProcessor`.

In [None]:
#manualMapping = autoMapping.into("ManualMappingProcessor", {"mappings": [["http://example.org/a/bill","http://example.org/b/william"]]})
manualMapping = autoMapping.into("ManualMappingProcessor")
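The `ManualMappingProcessor` parameters read back later in this notebook are `mappings` and `suppressed_mappings`, both lists of resource groups as in the commented-out example above. Assuming that manual mappings are added to the automatic ones and suppressed mappings are then removed, the combination can be sketched with plain sets. This illustrates that assumption only; it is not ABECTO's implementation, and both helper names are hypothetical.

```python
from itertools import combinations


def pairs_of(groups):
    """All unordered resource pairs within each group, as frozensets."""
    return {frozenset(pair) for group in groups for pair in combinations(group, 2)}


def apply_manual_mappings(auto_pairs, mappings, suppressed_mappings):
    """Assumed semantics: add manual pairs to automatic ones, then drop suppressed pairs."""
    result = {frozenset(pair) for pair in auto_pairs}
    result |= pairs_of(mappings)
    result -= pairs_of(suppressed_mappings)
    return result


auto = [("http://example.org/a/alice", "http://example.org/b/alice")]
manual = [["http://example.org/a/bill", "http://example.org/b/william"]]
for pair in sorted(sorted(p) for p in apply_manual_mappings(auto, manual, [])):
    print(pair)
```

Using frozensets makes a pair independent of the order in which its two resources are listed.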

Now we define some steps for comparison and evaluation. As we do not need to reference the individual steps later, we chain all the step definitions.

In [None]:
manualMapping.into("CategoryCountProcessor")\
             .into("LiteralDeviationProcessor", {"variables": {"person": ["label", "pnr"] }})\
             .into("ResourceDeviationProcessor", {"variables": {"person": ["boss"] }});
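Conceptually, a literal deviation is a mapped pair of resources whose values for the same variable differ. The toy function `literal_deviations` below is hypothetical and does not reproduce the `LiteralDeviationProcessor`'s algorithm or output format. It uses values from the sample data: if Alice from knowledge base ABC is mapped to `P001` from GHI (whose label is also "Alice"), their `pnr` values 45678 and 12345 differ.

```python
def literal_deviations(values, pairs, variables):
    """Return (variable, resource1, value1, resource2, value2) for each differing value."""
    deviations = []
    for resource1, resource2 in pairs:
        for variable in variables:
            value1 = values.get(resource1, {}).get(variable)
            value2 = values.get(resource2, {}).get(variable)
            # only compare if both resources have a value for this variable
            if value1 is not None and value2 is not None and value1 != value2:
                deviations.append((variable, resource1, value1, resource2, value2))
    return deviations


values = {
    "http://example.org/a/alice": {"label": "Alice", "pnr": 45678},
    "http://example.org/c/P001": {"label": "Alice", "pnr": 12345},
}
pairs = [("http://example.org/a/alice", "http://example.org/c/P001")]
print(literal_deviations(values, pairs, ["label", "pnr"]))
```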

## Project Execution and Reporting

After all steps have been defined, we now execute the pipeline and look at the results.

In [None]:
execution = project.runAndAwait()

In [None]:
execution.measures()

In [None]:
execution.deviations()

In [None]:
execution.issues()

In [None]:
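#execution.mappings(manualMapping)
import ipywidgets as widgets
from IPython.display import display, HTML

# bind the names used by the mapping review code below
self = execution
manualMappingStep = manualMapping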


# symbols
accepted = "✓"
retained = "?"
rejected = "✗"

output = widgets.Output()

# widgets templates
def unmappedWidget(resource, resourceData, resourceSink):
    with output:
        table = "<table>"
        if any(list(resourceData)):
            for key in sorted(resourceData):
                table += "<tr>"
                table += "<td style=\"text-align:center;\">" + key + "</td>"
                table += "<td style=\"text-align:right;\">" + ", ".join(resourceData[key]) + "</td>"
                table += "</tr>"
        table += "</table>"
        button = widgets.Button(description=resource, tooltip='Use', layout={'width': 'max-content'})
        def use(b):
            resourceSink.value = resource
        button.on_click(use)
        return widgets.VBox([button, widgets.HTML(value=table)], layout={'border': 'solid 1px lightgrey'})

def mappingPairWidget(resource1, resource2, resource1Data, resource2Data, value):
    with output:
        keys = set(list(resource1Data)).union(set(list(resource2Data)))
        table = "<table><tr><th>" + resource1 + "</th><th></th><th>"+ resource2 + "</th></tr>"
        if any(keys):
            for key in sorted(keys):
                table += "<tr>"
                table += "<td style=\"text-align:right;\">" + (", ".join(resource1Data[key]) if key in resource1Data else "") + "</td>"
                table += "<td style=\"text-align:center;\">" + key + "</td>"
                table += "<td style=\"text-align:left;\">" + (", ".join(resource2Data[key]) if key in resource2Data else "") + "</td>"
                table += "</tr>"
        table += "</table>"
        button = widgets.ToggleButtons(options=[accepted,retained,rejected],value=value,tooltips=["Accept", "Retain", "Reject"], style={'button_width': 'auto'})
        return widgets.HBox([button, widgets.HTML(value=table)], layout={'border': 'solid 1px lightgrey'})

def newMappingWidget(resource1, resource2, newMappingSink):
    with output:
        button = widgets.Button(description=rejected, tooltip='Remove')
        widget = widgets.HBox([widgets.Label(value=resource1), button, widgets.Label(value=resource2)])
        def remove(b):
            newMappingSink.children = tuple(x for x in newMappingSink.children if x != widget)
        button.on_click(remove)
        return widget

def unmappedPairingWidget(resource1Widget, resource2Widget, newMappingSink):
    with output:
        button = widgets.Button(description='Add Mapping')
        def add(b):
            if resource1Widget.value != "" and resource2Widget.value != "":
                newMappingSink.children += (newMappingWidget(resource1Widget.value, resource2Widget.value, newMappingSink),)
                resource1Widget.value = ""
                resource2Widget.value = ""
        button.on_click(add)
        return widgets.VBox([widgets.HBox([resource1Widget, button, resource2Widget]), newMappingSink])

# collect execution data
categoryData = self.resultDataFrame("Category")
categories =  set(categoryData["name"])
knowledgeBases = self.sortedKnowledgeBases(set(categoryData["knowledgeBase"]))
mappings = {}
for mapping in self.results("Mapping"):
    if mapping["resourcesMap"]:
        if mapping["resource1"] in mappings:
            mappings[mapping["resource1"]].add(mapping["resource2"])
        else:
            mappings[mapping["resource1"]] = {mapping["resource2"]}
        if mapping["resource2"] in mappings:
            mappings[mapping["resource2"]].add(mapping["resource1"])
        else:
            mappings[mapping["resource2"]] = {mapping["resource1"]}
# get manual mapping parameters
manualMappingParameters = manualMappingStep.parameters()["parameters"]
display(manualMappingParameters)
# collect positive manual mappings from parameters
manualPositiveMappings = {}
for mappingList in (manualMappingParameters["mappings"] if manualMappingParameters["mappings"] else []):
    for resource1 in mappingList:
        for resource2 in mappingList:
            if resource1 != resource2:
                if resource1 in manualPositiveMappings:
                    manualPositiveMappings[resource1].add(resource2)
                else:
                    manualPositiveMappings[resource1] = {resource2}
                if resource2 in manualPositiveMappings:
                    manualPositiveMappings[resource2].add(resource1)
                else:
                    manualPositiveMappings[resource2] = {resource1}
# collect negative manual mappings from parameters
manualNegativeMappings = {}
for mappingList in (manualMappingParameters["suppressed_mappings"] if manualMappingParameters["suppressed_mappings"] else []):
    for resource1 in mappingList:
        for resource2 in mappingList:
            if resource1 != resource2:
                if resource1 in manualNegativeMappings:
                    manualNegativeMappings[resource1].add(resource2)
                else:
                    manualNegativeMappings[resource1] = {resource2}
                if resource2 in manualNegativeMappings:
                    manualNegativeMappings[resource2].add(resource1)
                else:
                    manualNegativeMappings[resource2] = {resource1}
# prepare unmapped resources widgets
resourceSinks = {}
unmappedResourceWidgets = {}
kbData = {}
for categoryName in categories:
    resourceSinks[categoryName] = {}
    kbData[categoryName]= {}
    unmappedResourceWidgets[categoryName]= {}
    for (kbId, kbLabel) in knowledgeBases:
        resourceSinks[categoryName][kbId] = widgets.Text(value='', placeholder='Resource to map')
        kbData[categoryName][kbId] = self.data(categoryName, kbId)
        unmapped = []
        for unmappedResource in set(kbData[categoryName][kbId])-set(mappings)-set(manualPositiveMappings):
            unmapped.append(unmappedWidget(unmappedResource, kbData[categoryName][kbId][unmappedResource], resourceSinks[categoryName][kbId]))
        unmappedResourceWidgets[categoryName][kbId] = widgets.VBox(unmapped,layout={'width': '50%','max_height':'20em'})

mappingPairWidgets = {}
newMappingSinks = []
categoryTabChildren = []
categoryTabTitles = []
for categoryName in categories:
    kbTabChildrens = []
    kbTabTitles = []
    for (kb1Id, kb1Label) in knowledgeBases:
        for (kb2Id, kb2Label) in knowledgeBases:
            if (kb1Label < kb2Label):
                pairs = []
                for resource1 in kbData[categoryName][kb1Id]:
                    resource1Data = kbData[categoryName][kb1Id][resource1]
                    # add positive manual mappings
                    if resource1 in manualPositiveMappings:
                        for resource2 in manualPositiveMappings[resource1]:
                            if resource2 in kbData[categoryName][kb2Id]:
                                resource2Data = kbData[categoryName][kb2Id][resource2]
                                pair = mappingPairWidget(resource1, resource2, resource1Data, resource2Data, accepted)
                                mappingPairWidgets[pair] = [resource1, resource2]
                                pairs.append(pair)
                    # add negative manual mappings
                    if resource1 in manualNegativeMappings:
                        for resource2 in manualNegativeMappings[resource1]:
                            if resource2 in kbData[categoryName][kb2Id]:
                                resource2Data = kbData[categoryName][kb2Id][resource2]
                                pair = mappingPairWidget(resource1, resource2, resource1Data, resource2Data, rejected)
                                mappingPairWidgets[pair] = [resource1, resource2]
                                pairs.append(pair)
                    # add non-manual mappings
                    if resource1 in mappings:
                        for resource2 in mappings[resource1]:
                            if resource2 in kbData[categoryName][kb2Id] and not (
                                resource1 in manualPositiveMappings and resource2 in manualPositiveMappings[resource1] or
                                resource1 in manualNegativeMappings and resource2 in manualNegativeMappings[resource1] ):
                                resource2Data = kbData[categoryName][kb2Id][resource2]
                                pair = mappingPairWidget(resource1, resource2, resource1Data, resource2Data, retained)
                                mappingPairWidgets[pair] = [resource1, resource2]
                                pairs.append(pair)
                # widgets management
                newMappingSink = widgets.VBox([],layout={'max_height':'30em'})
                pairTab = widgets.VBox([
                    widgets.VBox(pairs,layout={'max_height':'30em'}),
                    widgets.HBox([
                        unmappedResourceWidgets[categoryName][kb1Id],
                        unmappedResourceWidgets[categoryName][kb2Id]
                    ]),
                    unmappedPairingWidget(resourceSinks[categoryName][kb1Id], resourceSinks[categoryName][kb2Id], newMappingSink)
                ])
                kbTabChildrens.append(pairTab)
                newMappingSinks.append(newMappingSink)
                kbTabTitles.append(kb1Label + " <-> " + kb2Label)
    # widgets management
    kbTabs = widgets.Tab(children=kbTabChildrens)
    for i, title in enumerate(kbTabTitles):
        kbTabs.set_title(i, title)    
    categoryTabChildren.append(kbTabs)
    categoryTabTitles.append(categoryName)
# widgets management
categoryTabs = widgets.Tab(children=categoryTabChildren)
for i, title in enumerate(categoryTabTitles):
    categoryTabs.set_title(i, title)
    
display(HTML("<h2>Mapping Review</h2>"))
display(categoryTabs)

updateButton = widgets.Button(description='Update Mappings')
hideButton = widgets.Button(description='Show Only Undecided', layout={'width': 'max-content'})
showButton = widgets.Button(description='Show All')
showButton.layout.display = "none"
display(widgets.HBox([updateButton, hideButton, showButton]), output)
def updateMappings(b):
    with output:
        # get remote manual mapping data
        manualMappingParameters = manualMappingStep.parameters()["parameters"]
        manualPositiveMappings = manualMappingParameters["mappings"] if manualMappingParameters["mappings"] else []
        manualNegativeMappings = manualMappingParameters["suppressed_mappings"] if manualMappingParameters["suppressed_mappings"] else []
        # update local manual mapping data
        for mappingPairWidget in mappingPairWidgets:
            pair = mappingPairWidgets[mappingPairWidget]
            decision = mappingPairWidget.children[0].value
            if decision == accepted:
                if pair not in manualPositiveMappings:
                    manualPositiveMappings.append(pair)
                while pair in manualNegativeMappings:
                    manualNegativeMappings.remove(pair)
            elif decision == retained:
                while pair in manualPositiveMappings:
                    manualPositiveMappings.remove(pair)
                while pair in manualNegativeMappings:
                    manualNegativeMappings.remove(pair)
            elif decision == rejected:
                while pair in manualPositiveMappings:
                    manualPositiveMappings.remove(pair)
                if pair not in manualNegativeMappings:
                    manualNegativeMappings.append(pair)
        for newMappingSink in newMappingSinks:
            for newMapping in newMappingSink.children:
                manualPositiveMappings.append([newMapping.children[0].value, newMapping.children[2].value])
        # update remote manual mapping data
        manualMappingStep.setParameter("mappings", manualPositiveMappings)
        manualMappingStep.setParameter("suppressed_mappings", manualNegativeMappings)
        display(HTML("Manual Mappings updated."))
updateButton.on_click(updateMappings)
def hide(b):
    with output:
        for mappingPairWidget in mappingPairWidgets:
            if mappingPairWidget.children[0].value != retained:
                mappingPairWidget.layout.display = "none"
        hideButton.layout.display = "none"
        showButton.layout.display = "inline-flex"
hideButton.on_click(hide)
def show(b):
    with output:
        for mappingPairWidget in mappingPairWidgets:
            if mappingPairWidget.children[0].value != retained:
                mappingPairWidget.layout.display = "inline-flex"
        hideButton.layout.display = "inline-flex"
        showButton.layout.display = "none"
showButton.on_click(show)
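The cell above builds the same structure three times (for `mappings`, `manualPositiveMappings` and `manualNegativeMappings`): a symmetric dictionary from each resource to the set of resources it is paired with. A more compact equivalent can be sketched with `collections.defaultdict`; the helper name `symmetric_index` is hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def symmetric_index(groups):
    """Map each resource to the set of other resources it shares a group with."""
    index = defaultdict(set)
    for group in groups:
        # permutations yields both (a, b) and (b, a), so the index stays symmetric
        for resource1, resource2 in permutations(group, 2):
            if resource1 != resource2:
                index[resource1].add(resource2)
    # return a plain dict so that later membership lookups do not create empty entries
    return dict(index)


# In the notebook, this could replace the three loops, e.g.:
# mappings = symmetric_index((m["resource1"], m["resource2"]) for m in self.results("Mapping") if m["resourcesMap"])
# manualPositiveMappings = symmetric_index(manualMappingParameters["mappings"] or [])
print(symmetric_index([["a:bill", "b:william"], ["a:alice", "b:alice", "c:P001"]]))
```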


## Advanced Features

Now, we see our new project in the list of projects.

In [None]:
abecto.projects()

We can also request information about a specific project.

In [None]:
project.info()

Furthermore, we can delete projects.

In [None]:
trashProject = abecto.project("Trash Project")
trashProject.delete()
abecto.projects()

The knowledge bases now appear in the project's list of knowledge bases.

In [None]:
project.knowledgeBases()

We can also request information about a specific knowledge base.

In [None]:
kb1.info()

And we can delete knowledge bases.

In [None]:
trashKB = project.knowledgeBase("Trash Knowledge Base")
trashKB.delete()
project.knowledgeBases()

In [None]:
project.steps()

In [None]:
kb1source1.info()

In [None]:
manualMapping.processings()

In [None]:
manualMapping.last()

In [None]:
manualMapping.last().raw()

In [None]:
manualMapping.last().graph()

In [None]:
manualMapping.last().graphAsDataFrame()

In [None]:
#abecto.stop()