## Ontology Alignment

In this tutorial we will align the taxonomy created in Tutorial 6 with an upper ontology.

In [None]:
!pip install owlready2
!pip install thefuzz
!pip install nltk

In [None]:
import nltk
nltk.download('wordnet')

In [None]:
# Use this cell if the 'owlready2' module cannot be found in the steps below
# Adapt the path accordingly for your user
import sys
modulename = 'owlready2'
if modulename not in sys.modules:
    sys.path.append('path to your Python packages (e.g. /home/USER/.local/lib/python3.13/site-packages)')

### Let us first load an upper ontology, here DUL, and list its content

In [None]:
from owlready2 import *

onto = get_ontology("http://www.ease-crc.org/ont/DUL.owl").load()

list(onto.classes())

#### Let us get the class relations

For this, we create a second namespace (`dul`) for the loaded ontology and list the subclasses of a class.

We can repeat this step, descending one level at a time, until we reach a class that might fit as an upper class.

In [None]:
dul = get_namespace("http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#")
                    
list(dul.Entity.subclasses())

In [None]:
list(dul.Object.subclasses())

In [None]:
list(dul.PhysicalObject.subclasses())
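
Instead of listing one level per cell, the same descent can be written as a small recursive helper. The following is only a sketch: `StubClass` is a hypothetical stand-in so the example runs without loading DUL, and `tree_lines` is an illustrative name, not part of owlready2. It only assumes that a class object offers `.name` and `.subclasses()`, both of which are used on owlready2 classes elsewhere in this notebook.

```python
from dataclasses import dataclass, field


@dataclass
class StubClass:
    """Hypothetical stand-in that mimics the two members used below."""
    name: str
    children: list = field(default_factory=list)

    def subclasses(self):
        return iter(self.children)


def tree_lines(cls, depth=0, max_depth=3):
    """Return one indented line per class, walking subclasses depth-first."""
    lines = ["  " * depth + cls.name]
    if depth < max_depth:
        for sub in sorted(cls.subclasses(), key=lambda c: c.name):
            lines.extend(tree_lines(sub, depth + 1, max_depth))
    return lines


# Toy hierarchy using the class names explored above (names only, no axioms).
artifact = StubClass("PhysicalArtifact")
physical = StubClass("PhysicalObject", [artifact])
obj = StubClass("Object", [physical])
entity = StubClass("Entity", [obj])

print("\n".join(tree_lines(entity)))
```

With the ontology loaded, calling `tree_lines(dul.Entity)` should give the same kind of overview, and `max_depth` keeps the output short for large ontologies.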

#### Now we have reached a point where I am unsure what the difference between a physical object and a physical artifact is. In such situations, we can query the ontology for the class comment (the rdfs:comment annotation property).

Note that the rdfs namespace, like the owl namespace, is loaded automatically, which makes things easier.

In [None]:
print(dul.PhysicalArtifact.comment)

#### At this point I personally decide to stop my search. Our goal is to align the products from our product taxonomy with the DUL top-level ontology. I argue that products are physical objects but, under the definition above, are not physical artifacts (since products are often structurally designed, like a shampoo that is designed for a certain use). Therefore, we should make our products a subclass of the physical object class.

### In a second step we want to also load our product taxonomy created in Tutorial 6 and align it with the DUL ontology

#### For this, please upload the saved ontology to this Jupyter notebook (if you are using the dockerized version).

The ontology alignment in this case consists of making the product class a subclass of the DUL.PhysicalObject class.

#### To load the file, we now have to use a second variable name (here `prod`), since `onto` already holds DUL. Print the classes to make sure the ontology was loaded correctly.

In [None]:
prod = get_ontology("ProductTaxonomyFromLidl.owl").load()

list(prod.classes())

#### Let us again assign a new namespace for the product taxonomy

In [None]:
tax = get_namespace("http://ProductTaxonomyFromLidl.owl#")

print(tax.Product.iri)

## Now on to actually aligning the ontologies!

#### This can now be done easily by making the product class from the product taxonomy file a subclass of dul.PhysicalObject in the DUL upper ontology.

In [None]:
with onto: 
    tax.Product.is_a.append(dul.PhysicalObject)

list(dul.PhysicalObject.subclasses())

### Unfortunately, we also need to re-assert the subclass relation to the product class for every subclass of the product class, so that these relations end up in the aligned ontology.

In [None]:
# Copy the subclasses into a list first, since the loop changes their is_a relations
for subclass in list(tax.Product.subclasses()):
    # Clear all subClassOf relationships, or else this class will not be attached
    # to the Product class in the AlignedOntology.
    # As long as you don't save the LIDL product taxonomy, the removed relationships are not persisted.
    subclass.is_a = []
    with onto:
        subclass.is_a.append(tax.Product)
#### Let us save the aligned ontology

In [None]:
onto.save(file = "AlignedOntology.owl", format = "rdfxml")

## Entity Matching

In the following we will automatically link the Product subclasses from the LIDL ontology to fitting Product classes from the [GoodRelations ontology](http://www.heppnetz.de/projects/goodrelations/) using the 'hasDbXref' property.
First we will use the ['thefuzz' library](https://pypi.org/project/thefuzz/) to compute a similarity score, based on the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance), between the labels of two classes.
If the resulting score is at or above a specified threshold, both classes are matched.
If no match is found, we use WordNet to iterate over all possible synonyms of the class label and choose the highest-scoring one.
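
To make the notion of an edit-distance score concrete, here is a minimal pure-Python sketch of the Levenshtein distance and one simple way to rescale it to 0-100. It is an illustration only: `levenshtein` and `similarity` are names chosen for this example, and the rescaling is an assumption, not the formula 'thefuzz' uses. In particular, `process.extractOne` defaults to the `WRatio` scorer, which combines several ratios, so its scores will generally differ from the ones computed here.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance rescaled to 0-100, where 100 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


print(levenshtein("kitten", "sitting"))  # 3
print(similarity("bread", "bread"))      # 100.0
```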

In [None]:
link_foodon = get_ontology("linkFoodOn.owl").load()
taxonomy = get_ontology("product-taxonomy.owl").load()

# Define namespaces
gr = get_namespace("http://purl.org/goodrelations/v1#")
obo = get_namespace("http://www.geneontology.org/formats/oboInOwl#")

# Top-level concept of all products
product_service = gr["ProductOrService"]
print(product_service.iri)

In [None]:
with onto:
    class hasDbXref(AnnotationProperty):
        namespace = obo

In [None]:
from thefuzz import process
from nltk.corpus import wordnet

match_thresh = 75
label_cls = {}

# Map each lowercase English label to its class
for product_category in product_service.descendants():
    # Keep only labels that are explicitly tagged as English
    labels = [l for l in product_category.label if isinstance(l, locstr) and l.lang == 'en']
    if len(labels) == 0:
        continue
    label_cls[labels[0].lower()] = product_category
#print(label_cls)

# Apply distance-based string matching to the LIDL product classes and find a
# matching class from the product-taxonomy.
# Note: this compares strings only; it does not match classes by semantic similarity.
for product_category in prod.Product.subclasses():
    products = product_category.label
    assert len(products) == 1
    # A label may join several categories with "&" (e.g. "A & B"); try each part
    labels = products[0].split("&")

    matched = None
    high_score = -1
    for label in labels:
        label = label.lower().strip()
        match = process.extractOne(label, label_cls.keys())
        if match[1] >= match_thresh:
            if match[1] > high_score:
                high_score = match[1]
                matched = match[0]
        else:
            # No direct match above the threshold: try every WordNet synonym
            # of the label and keep the best-scoring one (no threshold here)
            for syn in wordnet.synsets(label):
                for synonym in syn.lemma_names():
                    match = process.extractOne(synonym, label_cls.keys())
                    if match[1] > high_score:
                        high_score = match[1]
                        matched = match[0]

    if matched is None:
        # Neither the label nor any of its synonyms produced a candidate
        print(f"No match found for '{product_category.name}'")
        continue

    matched_cls = label_cls[matched]
    print(f"'{product_category.name}' from LIDL taxonomy matched against '{matched_cls.name}' from product-taxonomy with score {high_score}")
    with onto:
        product_category.hasDbXref = matched_cls
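
The control flow of the cell above (direct match first, synonyms as a fallback) can be tried in isolation with the standard library. This is a sketch under stated assumptions: `difflib.SequenceMatcher` stands in for the 'thefuzz' scorer, the `synonyms` dictionary is a hypothetical replacement for the WordNet lookup, and `score`, `best_match`, and `match_label` are names invented for this example. As in the cell above, the fallback applies no threshold.

```python
from difflib import SequenceMatcher


def score(a, b):
    """Similarity on a 0-100 scale (difflib stand-in for a fuzzy scorer)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_match(query, candidates):
    """Return (candidate, score) for the highest-scoring candidate."""
    return max(((c, score(query, c)) for c in candidates), key=lambda pair: pair[1])


def match_label(label, candidates, synonyms, threshold=75):
    """Try the label itself first; below the threshold, fall back to synonyms."""
    candidate, best = best_match(label, candidates)
    if best >= threshold:
        return candidate, best
    # Fallback: score every synonym and keep the best one, without a threshold
    fallback = [best_match(s, candidates) for s in synonyms.get(label, [])]
    if not fallback:
        return None, -1
    return max(fallback, key=lambda pair: pair[1])


# Hypothetical labels and synonyms, for illustration only
candidates = ["beverage", "bread", "cheese"]
synonyms = {"drinks": ["beverage", "drink"]}

for label in ["breads", "drinks", "xyz"]:
    print(label, "->", match_label(label, candidates, synonyms))
```

Returning `None` when nothing is found makes the "no match" case explicit, so the caller can skip the annotation instead of failing on a missing class.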

#### Let us save the matchings in the ontology

In [None]:
onto.save(file = "AlignedOntology.owl", format = "rdfxml")