# ABECTO Tutorial

ABECTO is an **AB**ox **E**valuation and **C**omparison **T**ool for **O**ntologies. It makes it easy to compare and evaluate two or more RDF knowledge bases with respect to the information they contain. This tutorial introduces the use of ABECTO.


## Preparation

Before we can start, a few preparation steps are needed. If ABECTO has not been compiled yet, we should do that now. (This step is not needed if you run this notebook on [mybinder.org](https://mybinder.org).)

```
mvn package -Dmaven.test.skip=true
```

ABECTO runs as an HTTP REST service in the background. We will use some provided Python functions that hide the raw HTTP requests.

In [1]:
from abecto import *

First, we create some sample files that we will use in this tutorial.

In [2]:
import tempfile

source1file1 = tempfile.TemporaryFile(mode = "w+")
source1file1.write("""
    BASE         <http://example.org/a/>
    PREFIX :     <http://example.org/a/>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

    :alice rdf:type   :Person ;
           rdfs:label "Alice" ;
           :pnr       "45678"^^xsd:integer ;
           :boss      :bob .
""")
source1file1.seek(0)

source1file2 = tempfile.TemporaryFile(mode = "w+")
source1file2.write("""
    BASE         <http://example.org/a/>
    PREFIX :     <http://example.org/a/>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

    :bill rdf:type   :Person ;
          rdfs:label "Bill" ;
          :pnr       "67890"^^xsd:integer ;
          :boss      :alice .
""")
source1file2.seek(0)

source2file1 = tempfile.TemporaryFile(mode = "w+")
source2file1.write("""
    BASE         <http://example.org/b/>
    PREFIX :     <http://example.org/b/>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    :alice rdf:type   :Person ;
           rdfs:label "Alice" ;
           :boss      :alice .

    :william rdf:type   :Person ;
             rdfs:label "William" ;
             :boss      "Alice" .

    :charlie rdf:type   :Person ;
             rdfs:label "Charlie" .
""")
source2file1.seek(0);

source3file1 = tempfile.TemporaryFile(mode = "w+")
source3file1.write("""
    BASE         <http://example.org/c/>
    PREFIX :     <http://example.org/c/>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

    :P001 rdf:type   :Person ;
          rdfs:label "Alice" ;
          :pnr       "12345"^^xsd:integer .

    :P002 rdf:type   :Person ;
          rdfs:label "Charlie" ;
          :pnr       "45678"^^xsd:integer .
""")
source3file1.seek(0);

Now, we start the service. **This might take a few seconds.** (If ABECTO is already running, this will just initialize the Python object needed in this notebook.)

In [3]:
abecto = Abecto("http://localhost:8080/", "target/abecto.jar")
abecto.start()

Once the service has started, we are ready to create our ontology evaluation and comparison project.

## Project Setup

First, we create a new ABECTO project. We can also give the project an arbitrary name.

In [4]:
project = abecto.project("My Comparison Project")

A project consists of the knowledge bases to analyse and a processing pipeline. Each node of the pipeline (called a "step") calls a processor with a specific set of parameters and input steps. The result of each processing is an RDF model that can be consumed by further steps or fetched for analysis.
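To make the pipeline idea concrete, here is a minimal sketch in plain Python. It is not ABECTO's implementation: the `Step` class and the `execution_order` function are hypothetical names, used only to illustrate how steps with input steps form a graph that can be processed in dependency order. The sketch assumes the graph has no cycles.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Step:
    """Hypothetical stand-in for a pipeline step (not the ABECTO class)."""
    processor: str
    parameters: dict = field(default_factory=dict)
    inputs: list = field(default_factory=list)


def execution_order(steps):
    """Return the steps so that every step comes after all of its inputs."""
    ordered, visited = [], set()

    def visit(step):
        if id(step) in visited:
            return
        visited.add(id(step))
        for input_step in step.inputs:
            visit(input_step)
        ordered.append(step)

    for step in steps:
        visit(step)
    return ordered


source = Step("RdfFileSourceProcessor")
categories = Step("ManualCategoryProcessor", {"patterns": {}}, [source])
mapping = Step("JaroWinklerMappingProcessor", {"threshold": 0.9}, [categories])

# The steps are passed in reverse order to show that inputs are still scheduled first.
for step in execution_order([mapping, categories, source]):
    print(step.processor)
```

The source step is printed first and the mapping step last, regardless of the order in which the steps are passed in.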

Now, we create a knowledge base object for each knowledge base we want to include in our project.

In [5]:
kb1 = project.knowledgeBase("ABC")
kb2 = project.knowledgeBase("DEF")
kb3 = project.knowledgeBase("GHI")

A knowledge base might consist of several sources. For each source we create at least one source step. In this case, we use the `RdfFileSourceProcessor`, which reads RDF files from the local file system.

In [6]:
kb1source1 = kb1.source("RdfFileSourceProcessor")
kb1source2 = kb1.source("RdfFileSourceProcessor")
kb2source1 = kb2.source("RdfFileSourceProcessor")
kb3source1 = kb3.source("RdfFileSourceProcessor")

Now, we load the RDF files into the source steps. This is done in a two-stage process to allow later updates of the sources.

In [7]:
kb1source1.load(source1file1)
kb1source2.load(source1file2)
kb2source1.load(source2file1)
kb3source1.load(source3file1);

To compare the knowledge bases, ABECTO needs to know what we want to compare. This is declared with so-called "categories". For each knowledge base we can define one pattern for each applicable category. The patterns use the Turtle/SPARQL syntax, and one variable must have the same name as the category itself. In this case, we use the `ManualCategoryProcessor` to declare a single category called "person". We use `into()` to create the following step for each knowledge base with the source step as input, and `plus()` to combine the two source steps of knowledge base 1.

In [8]:
categories1 = kb1source1.plus(kb1source2).into("ManualCategoryProcessor", {"patterns": {
    "person": """?person <http://www.w3.org/2000/01/rdf-schema#label> ?label ;
                         <http://example.org/a/pnr>                   ?pnr ;
                         <http://example.org/a/boss>                  ?boss ."""}})
categories2 = kb2source1.into("ManualCategoryProcessor", {"patterns": {
    "person": """?person <http://www.w3.org/2000/01/rdf-schema#label> ?label . 
                 OPTIONAL {
                     ?person <http://example.org/b/boss> ?boss .
                 }"""}})
categories3 = kb3source1.into("ManualCategoryProcessor", {"patterns": {
    "person": """?person <http://www.w3.org/2000/01/rdf-schema#label> ?label ;
                         <http://example.org/c/pnr>                   ?pnr ."""}})
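
The rule that one variable must carry the category name can be illustrated with a small sketch. The helper `pattern_to_query` below is hypothetical and not part of ABECTO. It only shows one plausible way such a pattern could be wrapped into a SPARQL `SELECT` query, under the assumption that the pattern is used as the graph pattern of the `WHERE` clause.

```python
import re


def pattern_to_query(category, pattern):
    """Wrap a category pattern into a SPARQL SELECT query (illustrative only)."""
    # Collect all variable names of the form ?name used in the pattern.
    variables = set(re.findall(r"\?(\w+)", pattern))
    if category not in variables:
        raise ValueError(f"pattern must use the variable ?{category}")
    return f"SELECT * WHERE {{\n{pattern}\n}}"


query = pattern_to_query(
    "person",
    "?person <http://www.w3.org/2000/01/rdf-schema#label> ?label .",
)
print(query)
```

A pattern that never mentions `?person` is rejected with a `ValueError` in this sketch.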

In the next step we use the `JaroWinklerMappingProcessor` to map the entities of the different knowledge bases. To do so, we use all three category steps as input. The results of the source steps are thereby also available to the mapping step, as the results of earlier steps are passed through. The mapping will be used by the subsequent steps.

In [9]:
mapping1 = categories1.plus(categories2).plus(categories3).into("JaroWinklerMappingProcessor", {"threshold": 0.9, "case_sensitive": False, "category": "person", "variables": ["label"]})
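
To give an intuition for the `threshold` parameter, the following sketch computes the Jaro-Winkler similarity in plain Python. It follows the common textbook definition (prefix scaling factor 0.1, prefix length capped at 4); the processor used by ABECTO may differ in details, so treat the numbers as illustrative only.

```python
def jaro(s1, s2):
    """Jaro similarity of two strings, between 0.0 and 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and not too far apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i in range(len1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s1[i] == s2[j]:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, scaling=0.1):
    """Jaro similarity boosted for a common prefix of up to 4 characters."""
    similarity = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return similarity + prefix * scaling * (1 - similarity)


# Lower-casing mimics a case-insensitive comparison.
print(jaro_winkler("Alice".lower(), "alice".lower()))              # 1.0
print(round(jaro_winkler("Bill".lower(), "William".lower()), 3))   # 0.726
```

Under this definition, "Bill" and "William" score well below a threshold of 0.9, so a label-based mapper would not be expected to map these two entities.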

It is also possible to use multiple mappers in one project. We now improve the mapping using the `ManualMappingProcessor`.

In [10]:
mapping2 = mapping1.into("ManualMappingProcessor", {"mappings": [["http://example.org/a/bill","http://example.org/b/william"]]})

With the `MappingReportProcessor` we prepare the mappings for later reporting.

In [11]:
mappingReport = mapping2.into("MappingReportProcessor")

In [12]:
count = mappingReport.into("CategoryCountProcessor")

In [13]:
literalDeviation = count.into("LiteralDeviationProcessor", {"variables": {"person": ["label", "pnr"] }})

In [14]:
resourceDeviation = literalDeviation.into("ResourceDeviationProcessor", {"variables": {"person": ["boss"] }})

In [15]:
deviationReport = resourceDeviation.into("DeviationReportProcessor")

In [16]:
issueReport = deviationReport.into("IssueReportProcessor")

## Project Execution and Reporting

In [17]:
execution = project.runAndAwait()

In [18]:
mappingReport.report()

DEF,Unnamed: 1,ABC
http://example.org/b/alice,,http://example.org/a/alice
http://example.org/b/alice,boss,http://example.org/a/bob
Alice,label,Alice
,pnr,45678
http://example.org/b/charlie,,
Charlie,label,
http://example.org/b/william,,http://example.org/a/bill
Alice,boss,http://example.org/a/alice
William,label,Bill
,pnr,67890


GHI,Unnamed: 1,ABC
http://example.org/c/P001,,http://example.org/a/alice
,boss,http://example.org/a/bob
Alice,label,Alice
12345,pnr,45678
http://example.org/c/P002,,
Charlie,label,
45678,pnr,
,,http://example.org/a/bill
,boss,http://example.org/a/alice
,label,Bill


GHI,Unnamed: 1,DEF
http://example.org/c/P001,,http://example.org/b/alice
,boss,http://example.org/b/alice
Alice,label,Alice
12345,pnr,
http://example.org/c/P002,,http://example.org/b/charlie
Charlie,label,Charlie
45678,pnr,
,,http://example.org/b/william
,boss,Alice
,label,William


In [19]:
count.report()

Category,Variable,GHI,ABC,DEF
person,,2.0,2,3.0
person,boss,,2,2.0
person,label,2.0,2,3.0
person,pnr,2.0,2,


In [20]:
deviationReport.report()

ABC,ABC.1,ABC.2,DEF,DEF.1,DEF.2
http://example.org/a/alice,boss,<http://example.org/a/bob>,<http://example.org/b/alice>,boss,http://example.org/b/alice
http://example.org/a/bill,label,Bill,William,label,http://example.org/b/william


ABC,ABC.1,ABC.2,GHI,GHI.1,GHI.2
http://example.org/a/alice,pnr,45678^^http://www.w3.org/2001/XMLSchema#integer,12345^^http://www.w3.org/2001/XMLSchema#integer,pnr,http://example.org/c/P001
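
As a rough illustration of what a deviation between two mapped entities means, the following sketch compares the values of selected variables for one mapped pair. The function `deviations` is hypothetical and not ABECTO's implementation. The sample values are taken from the source files defined at the beginning of this tutorial, and the sketch assumes that a variable missing on one side is not counted as a deviation.

```python
def deviations(record1, record2, variables):
    """List (variable, value1, value2) where both values exist and differ."""
    result = []
    for variable in variables:
        value1, value2 = record1.get(variable), record2.get(variable)
        if value1 is not None and value2 is not None and value1 != value2:
            result.append((variable, value1, value2))
    return result


alice_abc = {"label": "Alice", "pnr": 45678}  # http://example.org/a/alice
alice_ghi = {"label": "Alice", "pnr": 12345}  # http://example.org/c/P001

print(deviations(alice_abc, alice_ghi, ["label", "pnr"]))
# [('pnr', 45678, 12345)]
```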


In [21]:
issueReport.report()

Issue Type,Affected Entity,Message
UnexpectedValueType,http://example.org/b/william,"Value of property ""boss"" is not a resource."


## Advanced Features

Our new project now appears in the list of projects.

In [22]:
abecto.projects()

[{'id': 'dffef576-5bda-4084-8d12-965158dbf862', 'label': 'My Comparison Project'},
 {'id': '4c938347-a4c0-4c4c-ba93-2ad51310932f', 'label': 'My Comparison Project'},
 {'id': 'ffc52c87-1ac0-4f5e-894d-ad2dba8de680', 'label': 'My Comparison Project'},
 {'id': 'baf24ad7-2894-480b-a808-f0adb6f260ff', 'label': 'My Comparison Project'},
 {'id': 'fffa5ac2-6d94-45c5-92c6-e4041b688160', 'label': 'My Comparison Project'},
 {'id': '66a74f15-b393-485a-a339-0c6a8edbcbe9', 'label': 'My Comparison Project'},
 {'id': '0f7ec2c0-064e-49ff-a436-cbd33c545440', 'label': 'My Comparison Project'},
 {'id': 'ae8b7263-f265-45ef-a2b1-448d86847656', 'label': 'My Comparison Project'},
 {'id': 'bcf5a0b1-4637-40ce-8d4f-2713b73d7157', 'label': 'My Comparison Project'},
 {'id': 'acd73eda-17a3-4cde-9b3c-a55aac9218df', 'label': 'My Comparison Project'},
 {'id': '62f2c72e-f370-4654-82a1-46dc7807af8e', 'label': 'My Comparison Project'},
 {'id': '52d9e2ec-002b-4844-8f68-6044a4f65072', 'label': 'My Comparison Project'},
 {'i

We can also request information about a specific project.

In [23]:
project.info()

{'id': 'aed99acd-3be0-46f0-962a-c6696a0b4650',
 'label': 'My Comparison Project'}

Furthermore, we can delete projects.

In [24]:
trashProject = abecto.project("Trash Project")
trashProject.delete()
abecto.projects()

[{'id': 'dffef576-5bda-4084-8d12-965158dbf862', 'label': 'My Comparison Project'},
 {'id': '4c938347-a4c0-4c4c-ba93-2ad51310932f', 'label': 'My Comparison Project'},
 {'id': 'ffc52c87-1ac0-4f5e-894d-ad2dba8de680', 'label': 'My Comparison Project'},
 {'id': 'baf24ad7-2894-480b-a808-f0adb6f260ff', 'label': 'My Comparison Project'},
 {'id': 'fffa5ac2-6d94-45c5-92c6-e4041b688160', 'label': 'My Comparison Project'},
 {'id': '66a74f15-b393-485a-a339-0c6a8edbcbe9', 'label': 'My Comparison Project'},
 {'id': '0f7ec2c0-064e-49ff-a436-cbd33c545440', 'label': 'My Comparison Project'},
 {'id': 'ae8b7263-f265-45ef-a2b1-448d86847656', 'label': 'My Comparison Project'},
 {'id': 'bcf5a0b1-4637-40ce-8d4f-2713b73d7157', 'label': 'My Comparison Project'},
 {'id': 'acd73eda-17a3-4cde-9b3c-a55aac9218df', 'label': 'My Comparison Project'},
 {'id': '62f2c72e-f370-4654-82a1-46dc7807af8e', 'label': 'My Comparison Project'},
 {'id': '52d9e2ec-002b-4844-8f68-6044a4f65072', 'label': 'My Comparison Project'},
 {'i

The knowledge bases now appear in the project's list of knowledge bases.

In [25]:
project.knowledgeBases()

[{'id': '5cc584d1-f322-44db-8c7f-62905107d482', 'label': 'ABC', 'project': 'aed99acd-3be0-46f0-962a-c6696a0b4650'},
 {'id': '6b50d0ef-3519-429d-9476-1151585e7edd', 'label': 'DEF', 'project': 'aed99acd-3be0-46f0-962a-c6696a0b4650'},
 {'id': 'fe3a9f1e-1320-41b2-aaf4-38489b2cc8af', 'label': 'GHI', 'project': 'aed99acd-3be0-46f0-962a-c6696a0b4650'}]

We can also request information about a specific knowledge base.

In [26]:
kb1.info()

{'id': '5cc584d1-f322-44db-8c7f-62905107d482',
 'label': 'ABC',
 'project': 'aed99acd-3be0-46f0-962a-c6696a0b4650'}

And we can delete knowledge bases.

In [27]:
trashKB = project.knowledgeBase("Trash Knowledge Base")
trashKB.delete()
project.knowledgeBases()

[{'id': '5cc584d1-f322-44db-8c7f-62905107d482', 'label': 'ABC', 'project': 'aed99acd-3be0-46f0-962a-c6696a0b4650'},
 {'id': '6b50d0ef-3519-429d-9476-1151585e7edd', 'label': 'DEF', 'project': 'aed99acd-3be0-46f0-962a-c6696a0b4650'},
 {'id': 'fe3a9f1e-1320-41b2-aaf4-38489b2cc8af', 'label': 'GHI', 'project': 'aed99acd-3be0-46f0-962a-c6696a0b4650'}]

In [28]:
project.steps()

[{'id': '169e445f-c8ef-4498-bc80-0b5cb77b6664', 'project': 'aed99acd-3be0-46f0-962a-c6696a0b4650', 'knowledgeBase': '5cc584d1-f322-44db-8c7f-62905107d482', 'parameter': {'id': '1905f1b7-2cae-4c18-b6d7-556f2c1581b4', 'parameters': {}}, 'processorClass': 'de.uni_jena.cs.fusion.abecto.processor.implementation.RdfFileSourceProcessor'},
 {'id': 'c86bf51d-9679-46dc-93c6-3f847a0d7ee6', 'project': 'aed99acd-3be0-46f0-962a-c6696a0b4650', 'knowledgeBase': '5cc584d1-f322-44db-8c7f-62905107d482', 'parameter': {'id': '5af9d31b-35d2-47f7-af70-6c9b2dbe8397', 'parameters': {}}, 'processorClass': 'de.uni_jena.cs.fusion.abecto.processor.implementation.RdfFileSourceProcessor'},
 {'id': '162226b8-d126-4a5a-bfc8-727f11e871d0', 'project': 'aed99acd-3be0-46f0-962a-c6696a0b4650', 'knowledgeBase': '6b50d0ef-3519-429d-9476-1151585e7edd', 'parameter': {'id': '9d6b2247-174f-419d-9395-eaa3d7ccbbe6', 'parameters': {}}, 'processorClass': 'de.uni_jena.cs.fusion.abecto.processor.implementation.RdfFileSourceProcessor'}

In [29]:
kb1source1.info()

{'id': '169e445f-c8ef-4498-bc80-0b5cb77b6664',
 'project': 'aed99acd-3be0-46f0-962a-c6696a0b4650',
 'knowledgeBase': '5cc584d1-f322-44db-8c7f-62905107d482',
 'parameter': {'id': '1905f1b7-2cae-4c18-b6d7-556f2c1581b4', 'parameters': {}},
 'processorClass': 'de.uni_jena.cs.fusion.abecto.processor.implementation.RdfFileSourceProcessor'}

In [30]:
mapping2.processings()

[{'id': '46138029-1bb7-4e5a-a5ce-844a030792ab', 'step': '2a8d9790-6c4d-452b-b1d7-3c89cf305dc7', 'parameter': {'id': 'b82c3413-6509-42ec-bbcb-0f63fd687760', 'parameters': {'mappings': [['http://example.org/a/bill', 'http://example.org/b/william']], 'suppressed_mappings': None}}, 'inputProcessings': ['03002939-6e3f-4eae-833f-874af42c15df'], 'startDateTime': '2020-02-26T17:32:21.884883+01:00', 'endDateTime': '2020-02-26T17:32:21.956685+01:00', 'status': 'SUCCEEDED', 'stackTrace': None, 'modelHash': '9b9a13a8a9b192a141c61e72cbd4a7c621633668', 'processorClass': 'de.uni_jena.cs.fusion.abecto.processor.implementation.ManualMappingProcessor'}]
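
The `startDateTime` and `endDateTime` fields are ISO 8601 timestamps, so the run time of a processing can be derived with the standard library. A small sketch, using the two values from the output above as literals:

```python
from datetime import datetime

start = datetime.fromisoformat("2020-02-26T17:32:21.884883+01:00")
end = datetime.fromisoformat("2020-02-26T17:32:21.956685+01:00")

# Subtracting two timezone-aware datetimes yields a timedelta.
duration = (end - start).total_seconds()
print(f"{duration:.6f} s")  # 0.071802 s
```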

In [31]:
mappingReport.last()

{'id': '3c1ef2ec-9570-4b74-931f-f9c416b2c2ac', 'step': 'e246b740-3052-4e9c-a3b9-45707c72035f', 'parameter': {'id': 'a48d1290-c93b-410e-a61b-60686a55b538', 'parameters': {}}, 'inputProcessings': ['46138029-1bb7-4e5a-a5ce-844a030792ab'], 'startDateTime': '2020-02-26T17:32:21.888523+01:00', 'endDateTime': '2020-02-26T17:32:21.972734+01:00', 'status': 'SUCCEEDED', 'stackTrace': None, 'modelHash': 'b91ffab08407cca6d87ff30fbf364bed536eff71', 'processorClass': 'de.uni_jena.cs.fusion.abecto.processor.implementation.MappingReportProcessor'}

In [32]:
mappingReport.last().raw()

'{\n  "@context" : {\n    "knowledgeBase2" : "http://fusion.cs.uni-jena.de/ontology/abecto#knowledgeBase2",\n    "category2" : "http://fusion.cs.uni-jena.de/ontology/abecto#category2",\n    "knowledgeBase1" : "http://fusion.cs.uni-jena.de/ontology/abecto#knowledgeBase1",\n    "data2" : "http://fusion.cs.uni-jena.de/ontology/abecto#data2",\n    "id2" : "http://fusion.cs.uni-jena.de/ontology/abecto#id2",\n    "id1" : "http://fusion.cs.uni-jena.de/ontology/abecto#id1",\n    "data1" : "http://fusion.cs.uni-jena.de/ontology/abecto#data1",\n    "category1" : "http://fusion.cs.uni-jena.de/ontology/abecto#category1",\n    "owl" : "http://www.w3.org/2002/07/owl#",\n    "rdf" : "http://www.w3.org/1999/02/22-rdf-syntax-ns#",\n    "xsd" : "http://www.w3.org/2001/XMLSchema#",\n    "rdfs" : "http://www.w3.org/2000/01/rdf-schema#"\n  },\n  "@graph" : [ {\n    "@id" : "_:b0",\n    "@type" : "http://fusion.cs.uni-jena.de/ontology/abecto#MappingReportEntity",\n    "category1" : "person",\n    "category2

In [33]:
mappingReport.last().graph()

[{'@id': '_:b0',
  '@type': 'http://fusion.cs.uni-jena.de/ontology/abecto#MappingReportEntity',
  'category1': 'person',
  'category2': 'person',
  'data1': '{"label":"Alice","person":"http://example.org/c/P001","pnr":"12345"}',
  'data2': '{"boss":"http://example.org/a/bob","label":"Alice","person":"http://example.org/a/alice","pnr":"45678"}',
  'id1': 'http://example.org/c/P001',
  'id2': 'http://example.org/a/alice',
  'knowledgeBase1': 'fe3a9f1e-1320-41b2-aaf4-38489b2cc8af',
  'knowledgeBase2': '5cc584d1-f322-44db-8c7f-62905107d482'},
 {'@id': '_:b1',
  '@type': 'http://fusion.cs.uni-jena.de/ontology/abecto#MappingReportEntity',
  'category1': 'person',
  'category2': 'person',
  'data1': '{"label":"Charlie","person":"http://example.org/c/P002","pnr":"45678"}',
  'id1': 'http://example.org/c/P002',
  'knowledgeBase1': 'fe3a9f1e-1320-41b2-aaf4-38489b2cc8af',
  'knowledgeBase2': '5cc584d1-f322-44db-8c7f-62905107d482'},
 {'@id': '_:b2',
  '@type': 'http://fusion.cs.uni-jena.de/ontolog
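
In the output above, the `data1` and `data2` values are JSON-encoded strings rather than nested dictionaries. A minimal sketch of decoding one of them with the standard library, using part of the first entry shown above as a literal (in the notebook it would come from the list returned by `graph()`):

```python
import json

entry = {
    "id1": "http://example.org/c/P001",
    "data1": '{"label":"Alice","person":"http://example.org/c/P001","pnr":"12345"}',
}

# Decode the JSON string into a regular dictionary.
data1 = json.loads(entry["data1"])
print(data1["label"], data1["pnr"])  # Alice 12345
```

Note that the `pnr` value is a string in this encoding, even though it is typed as `xsd:integer` in the source file.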

In [34]:
mappingReport.last().graphAsDataFrame()

Unnamed: 0,@id,@type,category1,category2,data1,data2,id1,id2,knowledgeBase1,knowledgeBase2
0,_:b0,http://fusion.cs.uni-jena.de/ontology/abecto#M...,person,person,"{""label"":""Alice"",""person"":""http://example.org/...","{""boss"":""http://example.org/a/bob"",""label"":""Al...",http://example.org/c/P001,http://example.org/a/alice,fe3a9f1e-1320-41b2-aaf4-38489b2cc8af,5cc584d1-f322-44db-8c7f-62905107d482
1,_:b1,http://fusion.cs.uni-jena.de/ontology/abecto#M...,person,person,"{""label"":""Charlie"",""person"":""http://example.or...",,http://example.org/c/P002,,fe3a9f1e-1320-41b2-aaf4-38489b2cc8af,5cc584d1-f322-44db-8c7f-62905107d482
2,_:b2,http://fusion.cs.uni-jena.de/ontology/abecto#M...,person,person,"{""label"":""Charlie"",""person"":""http://example.or...",,http://example.org/b/charlie,,6b50d0ef-3519-429d-9476-1151585e7edd,5cc584d1-f322-44db-8c7f-62905107d482
3,_:b3,http://fusion.cs.uni-jena.de/ontology/abecto#M...,person,person,"{""label"":""Alice"",""person"":""http://example.org/...","{""boss"":""http://example.org/b/alice"",""label"":""...",http://example.org/c/P001,http://example.org/b/alice,fe3a9f1e-1320-41b2-aaf4-38489b2cc8af,6b50d0ef-3519-429d-9476-1151585e7edd
4,_:b4,http://fusion.cs.uni-jena.de/ontology/abecto#M...,person,person,,"{""boss"":""http://example.org/a/alice"",""label"":""...",,http://example.org/a/bill,fe3a9f1e-1320-41b2-aaf4-38489b2cc8af,5cc584d1-f322-44db-8c7f-62905107d482
5,_:b5,http://fusion.cs.uni-jena.de/ontology/abecto#M...,person,person,"{""label"":""Charlie"",""person"":""http://example.or...","{""label"":""Charlie"",""person"":""http://example.or...",http://example.org/c/P002,http://example.org/b/charlie,fe3a9f1e-1320-41b2-aaf4-38489b2cc8af,6b50d0ef-3519-429d-9476-1151585e7edd
6,_:b6,http://fusion.cs.uni-jena.de/ontology/abecto#M...,person,person,"{""boss"":""Alice"",""label"":""William"",""person"":""ht...","{""boss"":""http://example.org/a/alice"",""label"":""...",http://example.org/b/william,http://example.org/a/bill,6b50d0ef-3519-429d-9476-1151585e7edd,5cc584d1-f322-44db-8c7f-62905107d482
7,_:b7,http://fusion.cs.uni-jena.de/ontology/abecto#M...,person,person,,"{""boss"":""Alice"",""label"":""William"",""person"":""ht...",,http://example.org/b/william,fe3a9f1e-1320-41b2-aaf4-38489b2cc8af,6b50d0ef-3519-429d-9476-1151585e7edd
8,_:b8,http://fusion.cs.uni-jena.de/ontology/abecto#M...,person,person,"{""boss"":""http://example.org/b/alice"",""label"":""...","{""boss"":""http://example.org/a/bob"",""label"":""Al...",http://example.org/b/alice,http://example.org/a/alice,6b50d0ef-3519-429d-9476-1151585e7edd,5cc584d1-f322-44db-8c7f-62905107d482
