![logo](https://raw.githubusercontent.com/sciknoworg/OntoAligner/main/images/logo-with-background.png)

[![PyPI version](https://badge.fury.io/py/OntoAligner.svg)](https://badge.fury.io/py/OntoAligner)
[![PyPI Downloads](https://static.pepy.tech/badge/ontoaligner)](https://pepy.tech/projects/ontoaligner)
![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit)](https://github.com/pre-commit/pre-commit)
[![Documentation Status](https://readthedocs.org/projects/ontoaligner/badge/?version=main)](https://ontoaligner.readthedocs.io/)
[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](MAINTANANCE.md)
 [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.14533133.svg)](https://doi.org/10.5281/zenodo.14533133)

- **Documentation website**: [https://ontoaligner.readthedocs.io/index.html](https://ontoaligner.readthedocs.io/index.html)
- **Resource Paper**: [https://doi.org/10.1007/978-3-031-94578-6_10](https://doi.org/10.1007/978-3-031-94578-6_10)


--------

# Deep Dive into OntoAligner Modules 2 (Retriever, LLM, and RAG Approaches)




## Tutorial Overview

This tutorial explores three alignment strategies implemented in OntoAligner, moving from retrieval-based methods to retrieval-augmented generation (RAG) approaches. By the end, you will know how to apply each technique to align ontologies in real-world scenarios.

### What You'll Learn

**Ontology alignment** is the task of finding semantic correspondences between entities in different ontologies, a critical challenge in knowledge engineering, data integration, and semantic web applications. This tutorial covers:

1. **Retrieval-Based Alignment**: Fast, lightweight semantic matching using embeddings
2. **LLM-Based Alignment**: Leveraging large language models for deep semantic understanding
3. **RAG-Based Alignment**: Combining retrieval and generation for hybrid, context-aware matching

### Real-World Context

Consider a scenario where you need to integrate medical ontologies from different hospitals, or match product hierarchies across e-commerce platforms. These aligners help you automatically discover which entities represent the same concepts, reducing the amount of manual curation required.

# 🤖 Understanding the LLMs4OM Approach

The LLMs4OM ([Large Language Models for Ontology Matching](https://arxiv.org/abs/2404.10317)) framework applies large language models, combined with retrieval, to ontology alignment. The diagram below illustrates how the different components work together:

![](https://raw.githubusercontent.com/sciknoworg/OntoAligner/refs/heads/dev/docs/source/img/LLMs4OM.jpg)

# 🛠️ Installation and Dataset Initialization


In [None]:
# Install the OntoAligner library. (restart the notebook after installation)
!pip install -q ontoaligner "numpy>=2.0"

We previously introduced the OA dataset. Here we use the **CMT-Conference** ontology pair, loaded with `GenericOMDataset` from the `assets/cmt-conference` folder of the OntoAligner repository. Both ontologies describe the conference-organization domain, and the pair comes with a reference alignment for evaluation.


**The Three Encoders**: Before alignment, we need to encode the ontologies into suitable representations:
- **ConceptLightweightEncoder**: Suitable for retrieval-based aligners.
- **ConceptLLMEncoder**: Suitable for LLM-based aligners.
- **ConceptParentRAGEncoder**: Enriches lightweight structures with parent concept information, suitable for RAG-based aligners.

> Other encoders such as `ConceptParentLLMEncoder` and `ConceptParentLightweightEncoder` also exist, but this tutorial uses only the three encoders above.

In [89]:
import json
from tqdm import tqdm
from pprint import pprint

from ontoaligner.utils import metrics

from ontoaligner.ontology import GenericOMDataset
from ontoaligner.encoder import ConceptLightweightEncoder        # For Retriever aligner
from ontoaligner.encoder import ConceptLLMEncoder                # For LLM aligner
from ontoaligner.encoder import ConceptParentRAGEncoder          # For RAG aligner

# Load the dataset
task = GenericOMDataset()

dataset = task.collect(
    source_ontology_path="https://raw.githubusercontent.com/sciknoworg/OntoAligner/main/assets/cmt-conference/source.xml",
    target_ontology_path="https://raw.githubusercontent.com/sciknoworg/OntoAligner/main/assets/cmt-conference/target.xml",
    reference_matching_path="https://raw.githubusercontent.com/sciknoworg/OntoAligner/main/assets/cmt-conference/reference.xml"
)

print(f"\n{'='*60}")
print("DATASET INFORMATION")
print(f"{'='*60}")
print(f"Dataset: {dataset['dataset-info']}")
print(f"Source Ontology Entities: {len(dataset['source'])}")
print(f"Target Ontology Entities: {len(dataset['target'])}")
print(f"Reference Alignments (Ground Truth): {len(dataset['reference'])}")
print(f"{'='*60}\n")

print("Sample Source Entity:")
print(f"  {dataset['source'][0]}")
print("\nSample Target Entity:")
print(f"  {dataset['target'][0]}")
print("\nSample Reference Alignment:")
print(f"  {dataset['reference'][0]}")

32it [00:00, 17936.35it/s]
81it [00:00, 17423.39it/s]
100%|██████████| 102/102 [00:00<00:00, 20902.87it/s]


DATASET INFORMATION
Dataset: {'track': 'Generic', 'ontology-name': 'Source-Target'}
Source Ontology Entities: 29
Target Ontology Entities: 59
Reference Alignments (Ground Truth): 15

Sample Source Entity:
  {'name': 'Meta-Reviewer', 'iri': 'http://cmt#Meta-Reviewer', 'label': 'Meta-Reviewer', 'childrens': [], 'parents': [{'iri': 'http://cmt#Reviewer', 'label': 'Reviewer', 'name': 'Reviewer'}], 'synonyms': [], 'comment': ['A special type of Reviewer.  There can be any number of Reviewers for a given paper, but only one Meta-Reviewer.  The Meta-Reviewer can go over all the reviews submitted for the paper and submit their own review.  Questions for the Meta-Reviewer can be different from those for a normal Reviewer.  Their role is to ensure that the reviews are good and consistent. The decision to use Meta-Reviewers is optional, and must be set before paper assignment occurs.']}

Sample Target Entity:
  {'name': 'Conference_participant', 'iri': 'http://conference#Conference_participant',




In [90]:
# Lightweight encoding - fast, suitable for retrieval
print("[1/3] Encoding with ConceptLightweightEncoder...")
retrieve_encoder = ConceptLightweightEncoder()
retriever_encoded_ontos = retrieve_encoder(source=dataset['source'], target=dataset['target'])
print("\n      ✓ Lightweight encoding complete")

# LLM encoding - computationally intensive, semantically rich
print("[2/3] Encoding with ConceptLLMEncoder...")
llm_encoder = ConceptLLMEncoder()
llm_encoded_ontos = llm_encoder(source=dataset['source'], target=dataset['target'])
print("\n      ✓ LLM encoding complete")

# RAG encoding - lightweight with parent information for context
print("[3/3] Encoding with ConceptParentRAGEncoder...")
rag_encoder = ConceptParentRAGEncoder()
rag_encoded_ontos = rag_encoder(source=dataset['source'], target=dataset['target'])
print("\n      ✓ RAG encoding complete\n")

[1/3] Encoding with ConceptLightweightEncoder...

      ✓ Lightweight encoding complete
[2/3] Encoding with ConceptLLMEncoder...

      ✓ LLM encoding complete
[3/3] Encoding with ConceptParentRAGEncoder...

      ✓ RAG encoding complete



---

# 1️⃣. Retrieval-Based Alignment: Fast Semantic Matching


Retrieval-based alignment uses **semantic embeddings** to find the most similar target entities for each source entity. This approach is inspired by information retrieval techniques and offers several advantages:

**Advantages**
- **Speed**: Can process large ontologies in seconds
- **Scalability**: Memory-efficient, suitable for large ontologies
- **Simplicity**: Easy to understand and debug
- **Flexibility**: Works with various embedding models

**How It Works**
1. Each entity is converted to a dense vector using an embedding model
2. For each source entity, we find the `top_k` most similar target entities
3. Similarity is measured using distance metrics (e.g., cosine similarity)
4. Candidates are ranked by similarity score
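The four steps above can be sketched without any model at all. The snippet below is a minimal illustration, not OntoAligner's implementation: the three-dimensional vectors are made up, and `cosine` and `top_k_candidates` are hypothetical helper names introduced only for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k_candidates(source_vec, target_vecs, top_k=2):
    """Rank target IRIs by cosine similarity to one source vector."""
    scored = [(iri, cosine(source_vec, vec)) for iri, vec in target_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy "embeddings" (invented values; a real model produces hundreds of dimensions).
source_vec = [0.9, 0.1, 0.0]
target_vecs = {
    "http://conference#Reviewer": [0.8, 0.2, 0.1],
    "http://conference#Review": [0.5, 0.5, 0.2],
    "http://conference#Committee": [0.0, 0.2, 0.9],
}

candidates = top_k_candidates(source_vec, target_vecs, top_k=2)
for iri, score in candidates:
    print(f"{score:.3f}  {iri}")
```

The result has the same shape as the aligner output shown later: a ranked list of target candidates with one score each, where a larger `top_k` trades precision for recall.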

In [51]:
from ontoaligner.aligner import SBERTRetrieval

# Initialize retrieval model with configuration
# device='cuda' leverages GPU for faster computation
# top_k=10 means for each source entity, we retrieve 10 best matches from target
retriever_aligner = SBERTRetrieval(device='cuda', top_k=10)
retriever_aligner.load(path="all-MiniLM-L6-v2")

# Generate alignments
retriever_alignments = retriever_aligner.generate(input_data=retriever_encoded_ontos)

Batches:   0%|          | 0/4 [00:00<?, ?it/s]

Batches:   0%|          | 0/2 [00:00<?, ?it/s]

29it [00:00, 6441.16it/s]


In [52]:
print(f"\nGenerated {len(retriever_alignments)} alignment predictions")

print("\nSample alignment:")
pprint(retriever_alignments[0])


Generated 29 alignment predictions

Sample alignment:
{'score-cands': [0.8213430643081665,
                 0.6479050517082214,
                 0.5971811413764954,
                 0.5774840712547302,
                 0.5129503011703491,
                 0.3322896361351013,
                 0.28883448243141174,
                 0.2737828493118286,
                 0.26960983872413635,
                 0.2677655816078186],
 'source': 'http://cmt#Meta-Reviewer',
 'target-cands': ['http://conference#Reviewer',
                  'http://conference#Review',
                  'http://conference#Review_expertise',
                  'http://conference#Review_preference',
                  'http://conference#Reviewed_contribution',
                  'http://conference#Regular_author',
                  'http://conference#Conference_contributor',
                  'http://conference#Committee',
                  'http://conference#Rejected_contribution',
                  'http://conference#Po

In [53]:
from ontoaligner.postprocess import retriever_postprocessor

# Post-process alignments: expand each top-k candidate list into individual source-target pairs
processed_retriever_alignments = retriever_postprocessor(retriever_alignments)

print(f"After post-processing: {len(processed_retriever_alignments)} alignments\n")

print("Evaluating against ground truth references...\n")

# Evaluate alignments against the reference set
evaluation = metrics.evaluation_report(
    predicts=processed_retriever_alignments,
    references=dataset['reference']
)

print("Evaluation Report:")
print(json.dumps(evaluation, indent=4))

100%|██████████| 29/29 [00:00<00:00, 95250.44it/s]

After post-processing: 290 alignments

Evaluating against ground truth references...

Evaluation Report:
{
    "intersection": 11,
    "precision": 3.793103448275862,
    "recall": 73.33333333333333,
    "f-score": 7.213114754098362,
    "predictions-len": 290,
    "reference-len": 15
}
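The numbers in the report follow directly from the three counts it prints. As a small sketch (the function name `evaluation_summary` is hypothetical and is not part of `ontoaligner.utils.metrics`), precision, recall, and F-score can be recomputed as percentages like this:

```python
def evaluation_summary(intersection, predictions_len, reference_len):
    """Precision, recall, and F-score (as percentages) from raw counts."""
    precision = intersection / predictions_len if predictions_len else 0.0
    recall = intersection / reference_len if reference_len else 0.0
    if precision + recall == 0:
        f_score = 0.0
    else:
        f_score = 2 * precision * recall / (precision + recall)
    return {
        "precision": 100 * precision,
        "recall": 100 * recall,
        "f-score": 100 * f_score,
    }

# Counts taken from the retrieval evaluation report above.
report = evaluation_summary(intersection=11, predictions_len=290, reference_len=15)
print({name: round(value, 2) for name, value in report.items()})
```

With 290 predictions and only 15 reference pairs, precision cannot exceed 15/290 (about 5.2%) no matter how good the ranking is, which is why keeping all ten candidates per source entity yields high recall but very low precision.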





### Alternative Retrieval Methods

OntoAligner supports multiple retrieval approaches. Choose based on your specific needs:

```python
# TF-IDF Retrieval - Lightweight, works on term frequency
from ontoaligner.aligner import TFIDFRetrieval
aligner = TFIDFRetrieval(top_k=5)
aligner.load()
predictions = aligner.generate(input_data=retriever_encoded_ontos)

# BM25 Retrieval - Information retrieval standard, good term weighting
from ontoaligner.aligner import BM25Retrieval
aligner = BM25Retrieval(top_k=5)
aligner.load()
predictions = aligner.generate(input_data=retriever_encoded_ontos)

# OpenAI embeddings - paid API, requires an API key
from ontoaligner.aligner import AdaRetrieval
aligner = AdaRetrieval(top_k=5, openai_key='your-api-key')
aligner.load(path='text-embedding-3-small')
predictions = aligner.generate(input_data=retriever_encoded_ontos)
```

### Performance Comparison

| Method | Speed | Quality | Cost | Use Case |
|--------|-------|---------|------|----------|
| TF-IDF | ⚡⚡⚡ | ⭐⭐ | Free | Quick baselines |
| BM25 | ⚡⚡⚡ | ⭐⭐⭐ | Free | Production-grade aligner |
| SBERT | ⚡⚡ | ⭐⭐⭐⭐ | Free | Balanced quality/speed |
| Ada | ⚡ | ⭐⭐⭐⭐⭐ | 💰 | Maximum quality |

---

# 2️⃣. LLM-Based Alignment: Deep Semantic Understanding


LLM-based alignment leverages the reasoning capabilities of large language models to understand semantic relationships between entities. Rather than just matching similarity scores, LLMs can consider complex contextual information.

**Key Characteristics:**
- **Semantic Reasoning**: Models understand concepts, synonymy, hypernymy, and other relations
- **Context Awareness**: Incorporates descriptions, comments, and related concepts
- **Flexibility**: Can handle various entity representations and data quality issues
- **Interpretability**: Can generate explanations for alignment decisions

**Typical Workflow**:
1. Create dataset objects from encoded ontologies
2. Generate prompts that describe the alignment task
3. Process prompts through the LLM in batches
4. Extract entity references from model outputs
5. Post-process with a mapper to convert outputs to final alignments

In [54]:
from torch.utils.data import DataLoader
from ontoaligner.aligner import AutoModelDecoderLLM, ConceptLLMDataset

# Create the dataset object, which fills the prompt template for each source-target pair
llm_dataset = ConceptLLMDataset(source_onto=llm_encoded_ontos[0], target_onto=llm_encoded_ontos[1])

print(f"Total alignment pairs to evaluate: {len(llm_dataset)}")

Total alignment pairs to evaluate: 1711


If you wish to use your own prompt, you can inherit from `LLMDataset` and define a custom dataset with your desired prompt:

```python
from typing import Any

from ontoaligner.aligner import LLMDataset

class MyLLMDataset(LLMDataset):
    prompt = """your prompt"""

    def fill_one_sample(self, input_data: Any) -> str:
        # Build and return the filled prompt for one source-target pair
        ...
```

The current `ConceptLLMDataset` is defined as follows:

```python
class ConceptLLMDataset(LLMDataset):
    prompt = """Determine whether the following two concepts refer to the same real-world entity. Respond with "yes" or "no" only.
### Concept 1:
{source}
### Concept 2:
{target}
### Your Answer:"""

    def fill_one_sample(self, input_data: Any) -> str:
        source = self.preprocess(input_data["source"]["concept"])
        target = self.preprocess(input_data["target"]["concept"])
        return self.prompt.replace("{source}", source).replace("{target}", target)
```
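To see what a filled prompt looks like, the template logic can be reproduced as a standalone sketch. The `preprocess` function here is an assumption (underscores replaced by spaces, then lowercasing), chosen because it is consistent with the lowercase concept names that appear in the model outputs later in this tutorial; the library's own `preprocess` may differ.

```python
PROMPT = """Determine whether the following two concepts refer to the same real-world entity. Respond with "yes" or "no" only.
### Concept 1:
{source}
### Concept 2:
{target}
### Your Answer:"""

def preprocess(text):
    # Assumed normalization: underscores to spaces, then lowercase.
    return text.replace("_", " ").lower()

def fill_prompt(source_concept, target_concept):
    """Fill the template for one source-target pair."""
    filled = PROMPT.replace("{source}", preprocess(source_concept))
    return filled.replace("{target}", preprocess(target_concept))

example_prompt = fill_prompt("Meta-Reviewer", "Conference_participant")
print(example_prompt)
```

Because the prompt ends with `### Your Answer:`, a decoder-only model continues the text from that point, which is why the generated sequences start with the answer itself.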

In [55]:
# Create DataLoader for efficient batch processing
dataloader = DataLoader(
    llm_dataset,
    batch_size=256,
    shuffle=False,
    collate_fn=llm_dataset.collate_fn
)

print(f"DataLoader ready with {len(dataloader)} batches")

DataLoader ready with 7 batches
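The pair and batch counts printed above are consistent with pairing every source entity with every target entity. A short arithmetic check, using only the entity counts reported earlier (29 source, 59 target) and the chosen `batch_size`:

```python
import math

n_source, n_target, batch_size = 29, 59, 256

n_pairs = n_source * n_target                          # every source paired with every target
n_batches = math.ceil(n_pairs / batch_size)            # the last batch may be smaller
last_batch_size = n_pairs - (n_batches - 1) * batch_size

print(f"pairs={n_pairs}, batches={n_batches}, last batch={last_batch_size}")
```

The quadratic growth of `n_pairs` is the main cost of the pure LLM approach: doubling both ontologies quadruples the number of prompts.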


In [60]:
# Initialize LLM model
# max_length: max input tokens (prompt + entity descriptions)
# max_new_tokens: maximum output tokens (we expect short answers like 'yes', 'no', or entity IDs)
model = AutoModelDecoderLLM(device='cuda', max_length=300, max_new_tokens=20)
model.load(path="Qwen/Qwen2.5-0.5B")

print("✓ Model loaded successfully")

tokenizer_config.json: 0.00B [00:00, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

config.json:   0%|          | 0.00/681 [00:00<?, ?B/s]

The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.


model.safetensors:   0%|          | 0.00/988M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/138 [00:00<?, ?B/s]

✓ Model loaded successfully


In [61]:
# List to store predictions
predictions = []

# Generate predictions for ontology alignments using the model
for batch in tqdm(dataloader, desc="Processing batches"):  # Iterate through batched data
    prompts = batch["prompts"]                             # Extract prompts from the batch
    sequences = model.generate(prompts)                    # Generate predictions using the model
    predictions.extend(sequences)                          # Append predictions to the list

print("\n✓ Generation complete")
print(f"Generated predictions for {len(predictions)} pairs")

print("\nSample predictions (first 10):")
for i, pred in enumerate(predictions[:10]):
    print(f"  {i+1}. {pred[:100]}..." if len(str(pred)) > 100 else f"  {i+1}. {pred}")

Processing batches: 100%|██████████| 7/7 [00:30<00:00,  4.41s/it]


✓ Generation complete
Generated predictions for 1711 pairs

Sample predictions (first 10):
  1.  yes

The two concepts "meta-reviewer" and "conference participant" refer to the same real
  2.  yes

The two concepts "meta-reviewer" and "active conference participant" refer to the same
  3.  yes

The two concepts "meta-reviewer" and "passive conference participant" refer to the
  4.  yes

The two concepts "meta-reviewer" and "person" refer to the same real-world
  5.  yes

The two concepts "meta-reviewer" and "review expertise" refer to the same real
  6.  yes

The two concepts "meta-reviewer" and "submitted contribution" refer to the same real
  7.  yes
  8.  yes
  9.  yes

The two concepts "meta-reviewer" and "conference contributor" refer to the same real
  10.  yes

The two concepts "meta-reviewer" and "contribution 1th-author" refer





In [62]:
from ontoaligner.postprocess import TFIDFLabelMapper, llm_postprocessor
from sklearn.linear_model import LogisticRegression

# Create a mapper that converts free-text LLM outputs into yes/no labels
# TFIDFLabelMapper classifies each generated text using TF-IDF features and the given classifier
# yes = the pair is a match, no = the pair is not a match
mapper = TFIDFLabelMapper(classifier=LogisticRegression(), ngram_range=(1, 1))

# Post-process the LLM predictions
# Converts text outputs to structured alignment objects
llm_alignments = llm_postprocessor(predicts=predictions, mapper=mapper, dataset=llm_dataset)

print("✓ Post-processing complete")
print(f"Final alignments: {len(llm_alignments)}")

✓ Post-processing complete
Final alignments: 1428
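The role of the mapper is easier to see with a deliberately simplified stand-in. The sketch below is not `TFIDFLabelMapper`: it swaps the TF-IDF classifier for a plain keyword lookup, and `map_answer` is a hypothetical helper. The answer words mirror the `answer_set` used in the RAG configuration later in this tutorial.

```python
import re

ANSWER_SET = {
    "yes": ["yes", "correct", "true", "positive", "valid"],
    "no": ["no", "incorrect", "false", "negative", "invalid"],
}

def map_answer(generated_text, answer_set=ANSWER_SET, default="no"):
    """Return the label of the first known answer word in the generated text."""
    keyword_to_label = {
        word: label for label, words in answer_set.items() for word in words
    }
    for token in re.findall(r"[a-z]+", generated_text.lower()):
        if token in keyword_to_label:
            return keyword_to_label[token]
    return default  # nothing recognizable: treat the pair as a non-match

sample = ' yes\n\nThe two concepts "meta-reviewer" and "person" refer to the same real-world'
print(map_answer(sample))
```

Any mapper of this kind can only be as good as the generated text. In the sample predictions above, the small model answers "yes" for clearly unrelated pairs, so most of the 1,711 pairs survive post-processing.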


In [63]:
# Evaluating LLM alignments against ground truth...
evaluation = metrics.evaluation_report(predicts=llm_alignments, references=dataset['reference'])

print("Evaluation Report:")
print(json.dumps(evaluation, indent=4))

Evaluation Report:
{
    "intersection": 10,
    "precision": 0.700280112044818,
    "recall": 66.66666666666666,
    "f-score": 1.3860013860013862,
    "predictions-len": 1428,
    "reference-len": 15
}


### Additional LLM Models Available

OntoAligner supports various LLM architectures:

```python
# Encoder-Decoder Models
from ontoaligner.aligner import FlanT5LEncoderDecoderLM
model = FlanT5LEncoderDecoderLM(device='cuda')
model.load(path="google/flan-t5-base")

# Decoder-Only Models (like Qwen in this tutorial)
from ontoaligner.aligner import AutoModelDecoderLLM
model = AutoModelDecoderLLM(device='cuda', max_length=300, max_new_tokens=10)
model.load(path="Qwen/Qwen2.5-0.5B")  # or any HuggingFace model

# OpenAI GPT Models (requires API key)
from ontoaligner.aligner import GPTOpenAILLM
model = GPTOpenAILLM(openai_api_key="your-api-key")
model.load(path="gpt-4")  # or another OpenAI model name
```

### Model Selection Guide

| Model | Size | Speed | Quality | Cost | Best For |
|-------|------|-------|---------|------|----------|
| Qwen2.5-0.5B | 500M | Very Fast | Good | Free | Prototyping, demos |
| Flan-T5-Base | 250M | Fast | Good | Free | Balanced approach |
| Flan-T5-Large | 770M | Medium | Very Good | Free | Production use |
| Mistral-7B | 7B | Slow | Excellent | Free | High-quality results |
| GPT-4 | Unknown | Slow | State-of-the-art | 💰💰💰 | Critical applications |

### Prompt Engineering Tips

The quality of LLM alignments depends heavily on how you formulate prompts:

1. **Be Specific**: Include entity descriptions or type information if available.
2. **Provide Context**: Mention relationships and hierarchies when relevant.
3. **Clear Instructions**: Explicitly state what constitutes a match.
4. **Constrain Outputs**: Ask for specific formats (yes/no, entity IDs, confidence scores).

---

# 3️⃣. RAG-Based Alignment: Hybrid Approach for Optimal Performance


**Retrieval-Augmented Generation (RAG)** combines the strengths of both previous approaches:
1. **Retrieval Phase**: Use fast semantic embeddings to find candidate matches
2. **Augmentation**: Enrich retrieved context with parent/sibling concepts, definitions, etc.
3. **Generation Phase**: Feed the LLM with relevant context, not all possibilities
4. **Decision Making**: Combine retrieval confidence scores with LLM reasoning.
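The four phases above can be sketched as a retrieve-then-verify loop. This is an illustrative toy, not OntoAligner's `RAG` class: word overlap stands in for embedding similarity, `stub_llm_judge` is a hypothetical placeholder for a real LLM call, and the fixed `ir_threshold` is an assumption made for the example.

```python
def token_overlap(a, b):
    """Jaccard overlap of lowercase word sets (stand-in for embedding similarity)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b)

def retrieve(source, targets, top_k):
    """Retrieval phase: keep only the top_k most similar targets."""
    scored = [(target, token_overlap(source, target)) for target in targets]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

def stub_llm_judge(source, target):
    """Hypothetical placeholder for an LLM yes/no answer."""
    return "yes" if source.lower() == target.lower() else "no"

def rag_align(sources, targets, top_k=2, ir_threshold=0.3):
    alignments, llm_calls = [], 0
    for source in sources:
        for target, ir_score in retrieve(source, targets, top_k):
            llm_calls += 1  # generation phase runs only on retrieved candidates
            answer = stub_llm_judge(source, target)
            # Decision: require both a positive answer and a sufficient retrieval score.
            if answer == "yes" and ir_score >= ir_threshold:
                alignments.append((source, target, ir_score))
    return alignments, llm_calls

sources = ["program committee member", "paper"]
targets = ["program committee", "committee member", "paper", "reviewed paper", "conference"]
alignments, llm_calls = rag_align(sources, targets, top_k=2)
print(alignments, llm_calls)
```

The saving comes from the number of LLM calls: here 4 instead of the 10 exhaustive pairs. In this tutorial's setting, `top_k=15` over 29 source entities gives 435 candidate pairs instead of the 1,711 pairs evaluated by the pure LLM aligner.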


For this tutorial, we define a custom RAG aligner by subclassing `RAG` and choosing its retriever and LLM components:

```python
from ontoaligner.aligner.rag import RAG, AutoModelDecoderRAGLLM
from ontoaligner.aligner import SBERTRetrieval

class CustomRAGAligner(RAG):
    Retrieval = SBERTRetrieval      
    LLM = AutoModelDecoderRAGLLM         
```

You can also use one of the existing RAG aligners, such as:
```python
from ontoaligner.aligner import LLaMALLMBERTRetrieverRAG
```
OntoAligner supports several advanced RAG approaches. For more details, see https://ontoaligner.readthedocs.io/aligner/rag.html.

In [110]:
from ontoaligner.aligner.rag import RAG, AutoModelDecoderRAGLLM
from ontoaligner.aligner import SBERTRetrieval

class CustomRAGAligner(RAG):
    Retrieval = SBERTRetrieval
    LLM = AutoModelDecoderRAGLLM

In [129]:
# Define configuration for both components
retriever_config = {
    "device": 'cuda',   # Use GPU for retrieval
    "top_k": 15,        # Number of top candidates to retrieve
    "threshold": 0.0,   # Confidence threshold for retrieval
}

llm_config = {
    "device": "cuda",                                          # Use GPU for LLM
    "max_length": 300,                                         # Maximum length for LLM input
    "max_new_tokens": 10,                                      # Maximum number of new tokens generated by LLM
    "huggingface_access_token": "....",                        # Access token for Huggingface models
    "device_map": 'balanced',                                  # Distribute LLM model across devices
    "batch_size": 15,                                          # Batch size for processing
    "answer_set": {                                            # Define answer mapping for yes/no responses
        "yes": ["yes", "correct", "true", "positive", "valid"],
        "no": ["no", "incorrect", "false", "negative", "invalid"]
    }
}


# Create the RAG aligner with combined configuration
rag_aligner = CustomRAGAligner(retriever_config=retriever_config, llm_config=llm_config)

# Load both the retrieval model and LLM
rag_aligner.load(llm_path="Qwen/Qwen2.5-0.5B", ir_path="all-MiniLM-L6-v2")

The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.


In [120]:
# Generate predictions using RAG
# The aligner automatically:
# 1. Retrieves candidates for each source entity
# 2. Passes candidate context to the LLM
# 3. Combines scores from both components
predicts = rag_aligner.generate(input_data=rag_encoded_ontos)
print("✓ RAG generation complete")

Batches:   0%|          | 0/4 [00:00<?, ?it/s]

Batches:   0%|          | 0/2 [00:00<?, ?it/s]

29it [00:00, 6152.18it/s]
100%|██████████| 29/29 [00:00<00:00, 101531.57it/s]
100%|██████████| 29/29 [00:17<00:00,  1.67it/s]

✓ RAG generation complete





In [130]:
from ontoaligner.postprocess import rag_heuristic_postprocessor

# Process RAG predictions with heuristic decision thresholds
# This function balances retrieval and LLM confidence
rag_alignments, hybrid_configs = rag_heuristic_postprocessor(
    predicts=predicts,
)
print(f"\nHybrid alignment configuration: {hybrid_configs}")
print(f"Final alignment count: {len(rag_alignments)}\n")

# Evaluate the RAG results
evaluation = metrics.evaluation_report(predicts=rag_alignments, references=dataset['reference'])

print("Evaluation Report:")
print(json.dumps(evaluation, indent=4))

100%|██████████| 29/29 [00:00<00:00, 70187.43it/s]
100%|██████████| 435/435 [00:00<00:00, 43059.62it/s]


Hybrid alignment configuration: {'topk-confidence-ratio': 3, 'topk-confidence-score': 1, 'confidence-ratio-th': 0.0, 'ir-score-th': 0.5676249296500765, 'llm-confidence-th': 0.0}
Final alignment count: 20

Evaluation Report:
{
    "intersection": 8,
    "precision": 40.0,
    "recall": 53.333333333333336,
    "f-score": 45.714285714285715,
    "predictions-len": 20,
    "reference-len": 15
}





---

# ✅ Resources

For more information and advanced usage:
- **Full Documentation**: [https://ontoaligner.readthedocs.io/](https://ontoaligner.readthedocs.io/)
- **Research Paper**: [https://doi.org/10.1007/978-3-031-94578-6_10](https://doi.org/10.1007/978-3-031-94578-6_10)
- **GitHub Repository**: [https://github.com/sciknoworg/OntoAligner](https://github.com/sciknoworg/OntoAligner)


---


# 📃 Acknowledgement

OntoAligner is licensed under [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)


```bibtex
@inproceedings{babaei2025ontoaligner,
  title={OntoAligner: A Comprehensive Modular and Robust Python Toolkit for Ontology Alignment},
  author={Babaei Giglou, Hamed and D'Souza, Jennifer and Karras, Oliver and Auer, S{\"o}ren},
  booktitle={European Semantic Web Conference},
  pages={174--191},
  year={2025},
  organization={Springer}
}
```
