### References
- **Fuzzy F1 technique:** Lo, Andy; Jiang, Albert; Li, Wenda; Jamnik, Mateja. End-to-End Ontology Learning with Large Language Models. 2024. doi:10.48550/arXiv.2410.23584.
- **WordNet threshold:** Miller, George A. WordNet: a lexical database for English. Communications of the ACM, 38(11): 39–41, 1995.
- **Sentence-BERT model for embeddings:** Reimers, Nils; Gurevych, Iryna. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, November 2019. URL http://arxiv.org/abs/1908.10084.


## **What is Fuzzy F1?**

The **Fuzzy F1 metric** is an evaluation metric designed to compare structured data, such as taxonomies or graphs, by considering **semantic similarity** rather than exact matches. This is particularly useful when comparing ontologies, where nodes or relationships may differ in wording but share similar meanings.

Traditional metrics like Literal Precision, Recall, and F1 are overly strict because they require exact matches between nodes or relationships. For example, "AML" (Acute Myeloid Leukemia) and "Acute Myeloid Leukemia" would not be considered a match in literal comparison, even though they represent the same concept. The **Fuzzy F1 metric** addresses this limitation by using semantic similarity instead of strict equality.

---

## **How Does Fuzzy F1 Work?**

The Fuzzy F1 metric evaluates the similarity of two graphs (e.g., a reference taxonomy and a generated taxonomy) based on **nodes** (concepts) and **edges** (relationships). It uses embeddings to compute the semantic similarity between nodes and applies a threshold to determine matches. The metric then computes **precision**, **recall**, and **F1** based on these matches.

---

## **Steps in Fuzzy F1 Calculation**

### **1. Node Similarity Using Embeddings**
- Each node (e.g., "AML" or "Leukemia") is converted into a vector representation using a pretrained language model, such as `all-MiniLM-L6-v2` from the SentenceTransformers library.
- The similarity between two nodes is measured using **cosine similarity** of their embeddings:
  $$ \text{NodeSim}(u, u') = \frac{\vec{u} \cdot \vec{u'}}{\|\vec{u}\| \cdot \|\vec{u'}\|}
  $$
  where $\vec{u}$ is the embedding of node $u$. The similarity ranges from -1 (vectors pointing in opposite directions) to 1 (vectors pointing in the same direction); higher values indicate closer meanings.
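
As a minimal sketch of the formula above, cosine similarity can be written directly from its definition. The helper name `node_sim` and the three-dimensional toy vectors are assumptions made for illustration; real embeddings from `all-MiniLM-L6-v2` have many more dimensions.

```python
import math

def node_sim(vec_u, vec_v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(vec_u, vec_v))
    norm_u = math.sqrt(sum(a * a for a in vec_u))
    norm_v = math.sqrt(sum(b * b for b in vec_v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: a zero vector is treated as similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)

# Toy "embeddings", made up for illustration.
same_direction = node_sim([1.0, 2.0, 0.0], [2.0, 4.0, 0.0])
orthogonal = node_sim([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
opposite = node_sim([1.0, 0.0, 0.0], [-1.0, 0.0, 0.0])
print(same_direction, orthogonal, opposite)
```

The three calls cover the ends and the midpoint of the range: parallel vectors score close to 1, orthogonal vectors score 0, and opposite vectors score -1.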

### **2. Edge Matching**
- An edge $(u, v)$ in one graph is considered a match to $(u', v')$ in the other graph if:
  $$ \text{NodeSim}(u, u') > t \quad \text{and} \quad \text{NodeSim}(v, v') > t $$
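  where $t$ is the cosine similarity threshold. This notebook uses $t = 0.436$, the value used by Lo et al. (2024), derived from WordNet synonyms. Both inequalities are strict, so a similarity of exactly $t$ does not count as a match.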
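
The matching rule can be sketched as a small predicate. The lookup table `SIM` is a hypothetical stand-in for embedding similarities (0.85 is the value used in the example later in this notebook, 0.30 is made up), and the names `sim` and `edges_match` are illustrative.

```python
T = 0.436  # threshold quoted in the text

# Hypothetical similarity table standing in for embedding cosine similarity.
SIM = {
    frozenset(("Leukemia", "AML")): 0.85,
    frozenset(("Leukemia", "Melanoma")): 0.30,
}

def sim(a, b):
    """Symmetric lookup: identical labels score 1.0, unknown pairs 0.0."""
    return 1.0 if a == b else SIM.get(frozenset((a, b)), 0.0)

def edges_match(edge, edge_prime, sim, t=T):
    """True if both endpoints of the two edges are similar above t."""
    (u, v), (u_p, v_p) = edge, edge_prime
    return sim(u, u_p) > t and sim(v, v_p) > t

reference_edge = ("Leukemia", "Blood Cancer")
print(edges_match(reference_edge, ("AML", "Blood Cancer"), sim))
print(edges_match(reference_edge, ("Melanoma", "Blood Cancer"), sim))
```

Both endpoints must pass the threshold, so a perfect match on the parent node does not rescue an edge whose child node is too dissimilar.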

### **3. Fuzzy Precision**
- Measures how many edges in the **generated graph** $E'$ are correctly matched to edges in the **reference graph** $E$:
  $$
  \text{Fuzzy Precision} = \frac{|\{(u', v') \in E' \mid \exists (u, v) \in E, \text{NodeSim}(u, u') > t \wedge \text{NodeSim}(v, v') > t\}|}{|E'|}
  $$
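
A hedged sketch of fuzzy precision on a toy input follows. Edges are plain `(u, v)` pairs, and the similarity table is hypothetical (0.30 is a made-up value); only the counting logic mirrors the formula.

```python
def fuzzy_precision(ref_edges, gen_edges, sim, t=0.436):
    """Share of generated edges that have at least one fuzzy match in the reference."""
    if not gen_edges:
        return 0.0
    matched = sum(
        1
        for (u_p, v_p) in gen_edges
        if any(sim(u, u_p) > t and sim(v, v_p) > t for (u, v) in ref_edges)
    )
    return matched / len(gen_edges)

# Hypothetical similarity table standing in for embedding cosine similarity.
SIM = {
    frozenset(("Leukemia", "AML")): 0.85,
    frozenset(("Leukemia", "Melanoma")): 0.30,
}

def sim(a, b):
    return 1.0 if a == b else SIM.get(frozenset((a, b)), 0.0)

reference = [("Leukemia", "Blood Cancer")]
generated = [("AML", "Blood Cancer"), ("Melanoma", "Skin Cancer")]
precision = fuzzy_precision(reference, generated, sim)
print(precision)
```

Only one of the two generated edges finds a counterpart in the reference, so half of the generated graph counts as correct.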


### **4. Fuzzy Recall**
- Measures how many edges in the **reference graph** $E$ are correctly matched to edges in the **generated graph** $E'$:
  $$
  \text{Fuzzy Recall} = \frac{|\{(u, v) \in E \mid \exists (u', v') \in E', \text{NodeSim}(u, u') > t \wedge \text{NodeSim}(v, v') > t\}|}{|E|}
  $$
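
Recall is the same count taken from the other side. One way to see this, sketched below under a hypothetical helper name `covered_fraction`, is that recall and precision are one function with the two edge lists swapped. The similarity values are illustrative (0.20 is made up).

```python
def covered_fraction(edges_a, edges_b, sim, t=0.436):
    """Fraction of edges in edges_a with at least one fuzzy match in edges_b."""
    if not edges_a:
        return 0.0
    hits = sum(
        1
        for (u_a, v_a) in edges_a
        if any(sim(u_a, u_b) > t and sim(v_a, v_b) > t for (u_b, v_b) in edges_b)
    )
    return hits / len(edges_a)

# Hypothetical similarity table standing in for embedding cosine similarity.
SIM = {
    frozenset(("Leukemia", "AML")): 0.85,
    frozenset(("Lymphoma", "AML")): 0.20,
}

def sim(a, b):
    return 1.0 if a == b else SIM.get(frozenset((a, b)), 0.0)

reference = [("Leukemia", "Blood Cancer"), ("Lymphoma", "Blood Cancer")]
generated = [("AML", "Blood Cancer")]

recall = covered_fraction(reference, generated, sim)     # reference edges recovered
precision = covered_fraction(generated, reference, sim)  # generated edges supported
print(recall, precision)
```

Here the single generated edge is fully supported, but it recovers only one of the two reference edges, so precision and recall diverge.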

### **5. Fuzzy F1 Score**
- Combines fuzzy precision and recall into a single score:
  $$
  \text{Fuzzy F1} = \frac{2 \cdot \text{Fuzzy Precision} \cdot \text{Fuzzy Recall}}{\text{Fuzzy Precision} + \text{Fuzzy Recall}}
  $$
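
The formula is undefined when both precision and recall are zero. A minimal sketch that guards that case, using the hypothetical function name `fuzzy_f1`, could look like this:

```python
def fuzzy_f1(precision, recall):
    """Harmonic mean of precision and recall, defined as 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

balanced = fuzzy_f1(1.0, 1.0)
skewed = fuzzy_f1(1.0, 0.5)
empty = fuzzy_f1(0.0, 0.0)
print(balanced, skewed, empty)
```

Because F1 is a harmonic mean, the skewed case lands below the arithmetic mean of 0.75, so a graph cannot score well by maximising only one of the two components.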

---

## **Key Characteristics**

- **Semantic Focus:** Unlike strict metrics, Fuzzy F1 tolerates variations in node labels by comparing their meanings using embeddings.
- **Threshold $t$:** Sets the minimum semantic similarity for a match. A lower $t$ accepts looser matches and raises the risk of false positives; a higher $t$ moves the metric closer to literal matching.
- **Edge-Based Evaluation:** The metric evaluates edges (relationships) rather than just individual nodes, ensuring structural fidelity in the comparison.
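
To illustrate the effect of the threshold, the sketch below counts how many label pairs would be accepted at three values of $t$. All similarity values are made up for illustration, and `accepted_pairs` is a hypothetical helper.

```python
# Hypothetical cosine similarities between label pairs (made up for illustration).
pair_sims = {
    ("AML", "Acute Myeloid Leukemia"): 0.90,
    ("Leukemia", "Blood Cancer"): 0.60,
    ("Leukemia", "Lymphoma"): 0.40,
    ("Leukemia", "Melanoma"): 0.10,
}

def accepted_pairs(pair_sims, t):
    """Return the label pairs whose similarity strictly exceeds t."""
    return [pair for pair, s in pair_sims.items() if s > t]

# Number of accepted pairs at a loose, the default, and a strict threshold.
counts = {t: len(accepted_pairs(pair_sims, t)) for t in (0.2, 0.436, 0.8)}
print(counts)
```

With these toy values the loose threshold also accepts a sibling concept ("Lymphoma"), while the strict one keeps only the abbreviation and its expansion.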

---

## **Example**

### **Input Graphs**

- **Reference Graph $E$:**
  - **Nodes:** {"Leukemia", "Blood Cancer"}
  - **Edge:** ("Leukemia", "Blood Cancer")

- **Generated Graph $E'$:**
  - **Nodes:** {"AML", "Blood Cancer"}
  - **Edge:** ("AML", "Blood Cancer")

### **Embedding Similarities**

- NodeSim("Leukemia", "AML") = 0.85
- NodeSim("Blood Cancer", "Blood Cancer") = 1.0

### **Matching**

- Edge ("AML", "Blood Cancer") matches ("Leukemia", "Blood Cancer") because both node similarities (0.85 and 1.0) exceed $t = 0.436$.

### **Metric Calculation**

- **Fuzzy Precision:** $\frac{1}{1} = 1.0$
- **Fuzzy Recall:** $\frac{1}{1} = 1.0$
- **Fuzzy F1:** $\frac{2 \cdot (1.0 \cdot 1.0)}{1.0 + 1.0} = 1.0$
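
For contrast, the sketch below scores the same pair of graphs under literal and fuzzy matching. It takes the similarity of 0.85 from the example above as given rather than computing it from a model; the function names are illustrative.

```python
def precision_recall_f1(ref_edges, gen_edges, edge_match):
    """Precision, recall, and F1 for any edge-matching predicate."""
    hits_gen = sum(any(edge_match(r, g) for r in ref_edges) for g in gen_edges)
    hits_ref = sum(any(edge_match(r, g) for g in gen_edges) for r in ref_edges)
    p = hits_gen / len(gen_edges) if gen_edges else 0.0
    r = hits_ref / len(ref_edges) if ref_edges else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1

T = 0.436
SIM = {frozenset(("Leukemia", "AML")): 0.85}  # value taken from the example

def node_sim(a, b):
    return 1.0 if a == b else SIM.get(frozenset((a, b)), 0.0)

def literal_match(ref_edge, gen_edge):
    return ref_edge == gen_edge

def fuzzy_match(ref_edge, gen_edge):
    return (node_sim(ref_edge[0], gen_edge[0]) > T
            and node_sim(ref_edge[1], gen_edge[1]) > T)

reference = [("Leukemia", "Blood Cancer")]
generated = [("AML", "Blood Cancer")]

literal_scores = precision_recall_f1(reference, generated, literal_match)
fuzzy_scores = precision_recall_f1(reference, generated, fuzzy_match)
print(literal_scores, fuzzy_scores)
```

Literal matching gives zero on all three scores because the edge tuples differ in one label, whereas fuzzy matching reproduces the values of 1.0 computed by hand.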



In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
import sys
sys.path.append('/content/drive/MyDrive/thesis')

In [4]:
!pip install rdflib

Successfully installed isodate-0.7.2 rdflib-7.1.1


In [5]:
from rdflib import Graph

def extract_triples(file_path):
    """
    Extracts triples (edges) from an RDF or OWL file.

    Args:
        file_path (str): Path to the RDF or OWL file.

    Returns:
        list: A list of triples in the form (subject, predicate, object).
    """
    graph = Graph()
    graph.parse(file_path, format='xml')  # Parse RDF/OWL
    triples = []
    for s, p, o in graph:
        triples.append((str(s), str(p), str(o)))  # Convert nodes to strings
    return triples
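

The metric is defined over edges $(u, v)$, while `extract_triples` returns every `(subject, predicate, object)` triple in the file. As an optional sketch, not part of the pipeline run in this notebook, triples could be reduced to edges and restricted to chosen predicates before scoring. The helper `triples_to_edges` is hypothetical, and the choice of `rdfs:subClassOf` assumes the taxonomy encodes its hierarchy with that predicate.

```python
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

def triples_to_edges(triples, keep_predicates=None):
    """Drop the predicate from each triple, optionally keeping only some predicates.

    keep_predicates: a set of predicate URIs to keep, or None to keep everything.
    """
    edges = []
    for s, p, o in triples:
        if keep_predicates is None or p in keep_predicates:
            edges.append((s, o))
    return edges

# Toy triples with made-up example.org URIs.
toy_triples = [
    ("http://example.org/AML", RDFS_SUBCLASS_OF, "http://example.org/Leukemia"),
    ("http://example.org/AML", "http://www.w3.org/2000/01/rdf-schema#label", "AML"),
]

all_edges = triples_to_edges(toy_triples)
hierarchy_edges = triples_to_edges(toy_triples, keep_predicates={RDFS_SUBCLASS_OF})
print(len(all_edges), len(hierarchy_edges))
```

Filtering changes the denominators $|E|$ and $|E'|$, so scores computed with and without it are not directly comparable.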


In [6]:
!pip install sentence-transformers



In [12]:
from sentence_transformers import SentenceTransformer
import numpy as np
from tqdm import tqdm

# Load the sentence transformer model
model = SentenceTransformer('all-MiniLM-L6-v2')

def compute_embeddings(nodes, batch_size=32):
    """
    Computes embeddings for a list of nodes using a sentence transformer, with progress tracking.

    Args:
        nodes (list): List of node strings.
        batch_size (int): Number of nodes to process in each batch.

    Returns:
        dict: A dictionary mapping nodes to their embeddings.
    """
    embeddings = {}
    for i in tqdm(range(0, len(nodes), batch_size), desc="Computing embeddings"):
        batch_nodes = nodes[i:i + batch_size]  # Get the current batch
        batch_embeddings = model.encode(batch_nodes, convert_to_numpy=True)  # Compute embeddings for the batch
        embeddings.update(dict(zip(batch_nodes, batch_embeddings)))  # Update the dictionary with the batch results
    return embeddings



In [8]:
from scipy.spatial.distance import cosine

def cosine_similarity(vec1, vec2):
    """
    Computes cosine similarity between two vectors.

    Args:
        vec1 (np.array): First embedding vector.
        vec2 (np.array): Second embedding vector.

    Returns:
        float: Cosine similarity.
    """
    return 1 - cosine(vec1, vec2)

def compute_fuzzy_metrics(triples_ref, triples_gen, embeddings, threshold=0.436):
    """
    Computes fuzzy precision, recall, and F1 score.

    Args:
        triples_ref (list): List of reference triples (edges).
        triples_gen (list): List of generated triples (edges).
        embeddings (dict): Embeddings for nodes.
        threshold (float): Cosine similarity threshold.

    Returns:
        dict: Fuzzy precision, recall, and F1 score.
    """
    fuzzy_precision_matches = 0
    for u_prime, _, v_prime in triples_gen:
        if u_prime not in embeddings or v_prime not in embeddings:
            continue
        for u, _, v in triples_ref:
            if u not in embeddings or v not in embeddings:
                continue
            if (cosine_similarity(embeddings[u], embeddings[u_prime]) > threshold and
                cosine_similarity(embeddings[v], embeddings[v_prime]) > threshold):
                fuzzy_precision_matches += 1
                break

    fuzzy_recall_matches = 0
    for u, _, v in triples_ref:
        if u not in embeddings or v not in embeddings:
            continue
        for u_prime, _, v_prime in triples_gen:
            if u_prime not in embeddings or v_prime not in embeddings:
                continue
            if (cosine_similarity(embeddings[u], embeddings[u_prime]) > threshold and
                cosine_similarity(embeddings[v], embeddings[v_prime]) > threshold):
                fuzzy_recall_matches += 1
                break

    fuzzy_precision = fuzzy_precision_matches / len(triples_gen) if triples_gen else 0
    fuzzy_recall = fuzzy_recall_matches / len(triples_ref) if triples_ref else 0
    fuzzy_f1 = (2 * fuzzy_precision * fuzzy_recall) / (fuzzy_precision + fuzzy_recall) if (fuzzy_precision + fuzzy_recall) > 0 else 0

    return {
        "Fuzzy Precision": fuzzy_precision,
        "Fuzzy Recall": fuzzy_recall,
        "Fuzzy F1": fuzzy_f1
    }
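

The nested loops above evaluate one cosine similarity per pair of edges and endpoint. As an alternative sketch, not the implementation that produced the results in this notebook, the same counts can be obtained from two similarity matrices with NumPy. The function name `fuzzy_metrics_vectorized` is hypothetical, and the approach assumes the full reference-by-generated matrix fits in memory; large graphs would need to be processed in chunks.

```python
import numpy as np

def fuzzy_metrics_vectorized(triples_ref, triples_gen, embeddings, threshold=0.436):
    """Matrix-based variant of the fuzzy metrics; predicates are ignored."""
    def unit_rows(nodes):
        # Stack embeddings and scale each row to unit length.
        mat = np.stack([np.asarray(embeddings[n], dtype=float) for n in nodes])
        return mat / np.linalg.norm(mat, axis=1, keepdims=True)

    # Keep only edges whose endpoints both have embeddings.
    ref = [(u, v) for u, _, v in triples_ref if u in embeddings and v in embeddings]
    gen = [(u, v) for u, _, v in triples_gen if u in embeddings and v in embeddings]
    if not ref or not gen:
        return {"Fuzzy Precision": 0.0, "Fuzzy Recall": 0.0, "Fuzzy F1": 0.0}

    # Rows index reference edges, columns index generated edges.
    head_sim = unit_rows([u for u, _ in ref]) @ unit_rows([u for u, _ in gen]).T
    tail_sim = unit_rows([v for _, v in ref]) @ unit_rows([v for _, v in gen]).T
    match = (head_sim > threshold) & (tail_sim > threshold)

    # Denominators count all triples, as in the loop-based version.
    precision = float(match.any(axis=0).sum()) / len(triples_gen)
    recall = float(match.any(axis=1).sum()) / len(triples_ref)
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return {"Fuzzy Precision": precision, "Fuzzy Recall": recall, "Fuzzy F1": f1}

# Toy two-dimensional embeddings, made up for illustration.
toy_embeddings = {
    "Leukemia": np.array([1.0, 0.0]),
    "AML": np.array([0.9, 0.1]),
    "Melanoma": np.array([-1.0, 0.0]),
    "Blood Cancer": np.array([0.0, 1.0]),
}
toy_ref = [("Leukemia", "is_a", "Blood Cancer")]
toy_gen = [("AML", "is_a", "Blood Cancer"), ("Melanoma", "is_a", "Blood Cancer")]
toy_result = fuzzy_metrics_vectorized(toy_ref, toy_gen, toy_embeddings)
print(toy_result)
```

Results should agree with the loop-based function except for pairs whose similarity sits within floating-point error of the threshold.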


In [13]:
# File paths
generated_rdf = "/content/drive/MyDrive/thesis/results/symbolic_taxonomy.rdf"
reference_owl = "/content/drive/MyDrive/thesis/benchmark/tumor_types.owl"

# Step 1: Extract triples
triples_generated = extract_triples(generated_rdf)
triples_reference = extract_triples(reference_owl)

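# Step 2: Compute embeddings
# Note: this collects predicates as well as subjects and objects, although
# compute_fuzzy_metrics only looks up subject and object embeddings.
nodes = {node for triple in (triples_generated + triples_reference) for node in triple}
node_embeddings = compute_embeddings(list(nodes))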

# Step 3: Compute fuzzy metrics
fuzzy_metrics = compute_fuzzy_metrics(triples_reference, triples_generated, node_embeddings)

# Step 4: Display results
print("Fuzzy Metrics:")
for metric, value in fuzzy_metrics.items():
    print(f"{metric}: {value:.4f}")


Computing embeddings: 100%|██████████| 5753/5753 [1:51:59<00:00,  1.17s/it]


Fuzzy Metrics:
Fuzzy Precision: 0.0898
Fuzzy Recall: 0.0970
Fuzzy F1: 0.0932
