## Notes

- [This paper](https://arxiv.org/pdf/2309.07172) suggests that Flan-T5-XXL might perform best ([model on Hugging Face](https://huggingface.co/google/flan-t5-xxl)).

## Steps
1. Ontology Parsing
    - Support RDF and OWL input
    - Save the relevant information to a "standardized" JSON file
2. Lexical Equivalence Matching (optional, skip for now)
3. Matching (parallel)
    - String Matching
    - Embeddings Matching
4. Combining & Filtering
    - Output is an .rdf file containing the alignment
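
A minimal sketch of what the String Matching step could look like, assuming each ontology has already been parsed into records with `id` and `label` keys (the JSON layout produced further down). The helper names `normalize` and `string_match`, the 0.9 threshold, and the `NCI_X*` ids are hypothetical choices for illustration, and `difflib` is only one of several possible similarity measures.

```python
import re
from difflib import SequenceMatcher


def normalize(label):
    """Lowercase a label and collapse punctuation/underscores into single spaces."""
    return re.sub(r"[\W_]+", " ", label.lower()).strip()


def string_match(classes1, classes2, threshold=0.9):
    """Return (id1, id2, score) triples whose label similarity is >= threshold."""
    matches = []
    for c1 in classes1:
        if c1["label"] == "No label":  # placeholder labels would all match each other
            continue
        n1 = normalize(c1["label"])
        for c2 in classes2:
            if c2["label"] == "No label":
                continue
            score = SequenceMatcher(None, n1, normalize(c2["label"])).ratio()
            if score >= threshold:
                matches.append((c1["id"], c2["id"], round(score, 3)))
    return matches


# Hypothetical records; the NCI_X* ids are made up for this example.
mouse = [{"id": "http://mouse.owl#MA_0000216", "label": "spinal cord"}]
human = [
    {"id": "http://human.owl#NCI_X1", "label": "Spinal_Cord"},
    {"id": "http://human.owl#NCI_X2", "label": "Trunk"},
]
print(string_match(mouse, human))
```

The nested loop is quadratic in the number of classes, so for larger ontologies an index on the normalized labels would probably be needed before this is practical.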

In [13]:
# imports
import json
from owlready2 import get_ontology

In [14]:
# input paths
onto1_path_in = "test_ontologies/mouse.owl"
onto2_path_in = "test_ontologies/human.owl"

# output paths
onto1_path_out = "ontology_jsons/onto1.json"
onto2_path_out = "ontology_jsons/onto2.json"

In [15]:
# load ontologies
onto1 = get_ontology(onto1_path_in).load()
onto2 = get_ontology(onto2_path_in).load()
print(onto1, onto2)

get_ontology("http://mouse.owl#") get_ontology("http://human.owl#")


In [16]:
# Extract classes and object properties
classes1 = list(onto1.classes())
properties1 = list(onto1.object_properties())
annotations1 = list(onto1.annotation_properties())

classes2 = list(onto2.classes())
properties2 = list(onto2.object_properties())
annotations2 = list(onto2.annotation_properties())


In [17]:
print(classes1[:10])
print(annotations1[:10])

[owl.Thing, mouse.MA_0000001, mouse.MA_0000002, mouse.MA_0001112, mouse.MA_0000216, mouse.MA_0000003, mouse.MA_0002405, mouse.MA_0000004, mouse.MA_0002433, mouse.MA_0000005]
[rdf-schema.label, oboInOwl.hasRelatedSynonym, oboInOwl.hasDbXref, oboInOwl.hasDefaultNamespace, oboInOwl.hasAlternativeId, oboInOwl.savedBy, oboInOwl.hasDate]


In [18]:
print(classes2[:10])
print(annotations2[:10])

[oboInOwl.DbXref, oboInOwl.Definition, oboInOwl.ObsoleteClass, oboInOwl.Subset, oboInOwl.Synonym, oboInOwl.SynonymType, human.NCI_C12219, human.NCI_C12220, human.NCI_C38617, human.NCI_C12419]
[rdf-schema.label, oboInOwl.hasRelatedSynonym, oboInOwl.hasDefaultNamespace, oboInOwl.savedBy, oboInOwl.hasDate, oboInOwl.hasDefinition]


In [19]:
# Extracting classes and their attributes for ontology 1
count = 0
for cls in onto1.classes():
    count += 1
    print("Class:", cls)
    print("Superclasses:", list(cls.is_a))
    print("Annotations:", list(cls.label))
    if count == 10:
        break

Class: owl.Thing
Superclasses: []
Annotations: []
Class: mouse.MA_0000001
Superclasses: [owl.Thing]
Annotations: ['mouse anatomy']
Class: mouse.MA_0000002
Superclasses: [mouse.MA_0001112, mouse.UNDEFINED_part_of.some(mouse.MA_0000216)]
Annotations: ['spinal cord grey matter']
Class: mouse.MA_0001112
Superclasses: [owl.Thing, mouse.UNDEFINED_part_of.some(mouse.MA_0000167)]
Annotations: ['grey matter']
Class: mouse.MA_0000216
Superclasses: [mouse.MA_0001901, mouse.UNDEFINED_part_of.some(mouse.MA_0000167)]
Annotations: ['spinal cord']
Class: mouse.MA_0000003
Superclasses: [owl.Thing, mouse.UNDEFINED_part_of.some(mouse.MA_0002405)]
Annotations: ['organ system']
Class: mouse.MA_0002405
Superclasses: [owl.Thing, mouse.UNDEFINED_part_of.some(mouse.MA_0000001)]
Annotations: ['adult mouse']
Class: mouse.MA_0000004
Superclasses: [mouse.MA_0002433]
Annotations: ['trunk']
Class: mouse.MA_0002433
Superclasses: [owl.Thing, mouse.UNDEFINED_part_of.some(mouse.MA_0002405)]
Annotations: ['anatomic regio

In [20]:
# extract information from onto into json
classes_info = []
for cls in onto1.classes():
    class_details = {
        "id": cls.iri,
        "label": cls.label[0] if cls.label else "No label",
        "superclasses": [supercls.iri for supercls in cls.is_a if hasattr(supercls, 'iri')],
        "annotations": {
            "comment": cls.comment[0] if cls.comment else "No comment"
        }
    }
    classes_info.append(class_details)

def save_to_json(file_path, data):
    with open(file_path, 'w') as f:
        json.dump(data, f, indent=4)
    print(f"Data has been saved to '{file_path}'.")

save_to_json(onto1_path_out, classes_info)

Data has been saved to 'ontology_jsons/onto1.json'.
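
The cell above only handles `onto1`. One possible way to reuse the same logic for `onto2` is to wrap it in a function, sketched below. The name `extract_classes` is hypothetical, and the function relies only on the attributes already used above (`iri`, `label`, `is_a`, `comment`), so it can be tried on stand-in objects without loading an ontology.

```python
from types import SimpleNamespace


def extract_classes(classes):
    """Turn class-like objects into the list-of-dicts layout used above."""
    records = []
    for cls in classes:
        records.append({
            "id": cls.iri,
            "label": cls.label[0] if cls.label else "No label",
            # Same filter as above: entries without an `iri` attribute
            # (presumably restrictions like part_of.some(...)) are skipped.
            "superclasses": [s.iri for s in cls.is_a if hasattr(s, "iri")],
            "annotations": {
                "comment": cls.comment[0] if cls.comment else "No comment"
            },
        })
    return records


# Stand-in objects with hypothetical values, just to exercise the function.
parent = SimpleNamespace(iri="http://example.org#Parent")
child = SimpleNamespace(
    iri="http://example.org#Child",
    label=["child"],
    is_a=[parent, object()],  # object() mimics an entry without an `iri`
    comment=[],
)
records = extract_classes([child])
print(records)
```

With the ontologies loaded as above, the second file could then be written with `save_to_json(onto2_path_out, extract_classes(onto2.classes()))`.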


Output format example:
```
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns="http://knowledgeweb.semanticweb.org/heterogeneity/alignment" 
	 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
	 xmlns:xsd="http://www.w3.org/2001/XMLSchema#">

<Alignment>
<xml>yes</xml>
<level>0</level>
<type>??</type>

<map>
	<Cell>
		<entity1 rdf:resource="http://mouse.owl#MA_0002401"/>
		<entity2 rdf:resource="http://human.owl#NCI_C52561"/>
		<measure rdf:datatype="xsd:float">1.0</measure>
		<relation>=</relation>
	</Cell>
</map>
</Alignment>
</rdf:RDF>
```
```
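
A rough sketch of how step 4 could serialize matches into this format, assuming they arrive as `(entity1, entity2, score)` triples. The function name `alignment_to_rdf` is hypothetical, the `<type>??</type>` placeholder is carried over unchanged from the example, and the sketch closes the `Alignment` and `rdf:RDF` elements so that the result parses as XML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

HEADER = """<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns="http://knowledgeweb.semanticweb.org/heterogeneity/alignment"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema#">

<Alignment>
<xml>yes</xml>
<level>0</level>
<type>??</type>
"""

CELL = """<map>
    <Cell>
        <entity1 rdf:resource={e1}/>
        <entity2 rdf:resource={e2}/>
        <measure rdf:datatype="xsd:float">{score}</measure>
        <relation>{rel}</relation>
    </Cell>
</map>
"""

FOOTER = "</Alignment>\n</rdf:RDF>\n"


def alignment_to_rdf(matches, relation="="):
    """Render (entity1, entity2, score) triples as an alignment RDF string."""
    cells = [
        CELL.format(
            e1=quoteattr(e1),  # quoteattr adds the surrounding quotes and escapes
            e2=quoteattr(e2),
            score=float(score),
            rel=escape(relation),
        )
        for e1, e2, score in matches
    ]
    return HEADER + "".join(cells) + FOOTER


# The single cell from the example above.
rdf = alignment_to_rdf([("http://mouse.owl#MA_0002401", "http://human.owl#NCI_C52561", 1.0)])
root = ET.fromstring(rdf.encode("utf-8"))  # raises ParseError if not well-formed
print(rdf)
```

String templates are used here instead of building the tree with `ElementTree` to keep the default namespace and prefixes exactly as in the example; writing the result to disk is then a plain `open(path, "w").write(rdf)`.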