# Final Experiment: Multi-Layered Evaluation of LLMs vs. Baselines for Knowledge Graph Generation

This notebook conducts a comprehensive, multi-layered experiment to evaluate LLM-based knowledge graph generation against two standard heuristic baselines: **Pattern-Matching (SVO)** and **Distant Supervision**. The evaluation is designed to move from a rigid, syntactic comparison to a flexible, semantic one, providing a deep analysis of model capabilities.

### Experimental Workflow:
1.  **Setup & Configuration**: Load libraries and define experimental parameters.
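2.  **Prompt Engineering**: Define three prompts: two for graph generation (a default prompt and a user-supplied prompt that adds coreference and pronoun resolution) and one for the LLM judge used in evaluation.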
3.  **Data Preparation**: A one-time, heavy preprocessing step to build the crucial ID-to-Name mapping from the full Freebase dataset, with caching for fast re-runs.
4.  **Batch Execution Loop**: Iterate through the specified number of samples.
    - **Checkpointing**: Check if a sample has been processed. If so, load results from disk. If not, proceed.
    - **Graph Generation**: Generate four raw graphs: two from the LLM (one per generation prompt) and one from each baseline method.
    - **Save Checkpoint**: Save the raw generated graphs to disk immediately.
5.  **Final Aggregation & Analysis**: After the loop, aggregate the results from all samples and perform the final, multi-layered evaluation, calculating and displaying all metrics.
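
The checkpointing step of the workflow can be sketched in isolation. This is a minimal illustration, not the notebook's actual loop: `process_with_checkpoint` and `fake_generate` are hypothetical names, and the expensive LLM and baseline calls are replaced by a stub so the example runs on its own.

```python
import json
import tempfile
from pathlib import Path

def process_with_checkpoint(sample_id, checkpoint_dir, generate_fn):
    """Return cached results for sample_id, or generate and cache them."""
    path = Path(checkpoint_dir) / f"sample_{sample_id}.json"
    if path.exists():
        # A checkpoint exists: skip generation and load from disk.
        with open(path, "r") as f:
            return json.load(f), True
    result = generate_fn(sample_id)
    # Save immediately so an interrupted run can resume from here.
    with open(path, "w") as f:
        json.dump(result, f)
    return result, False

def fake_generate(sample_id):
    # Hypothetical stand-in for the expensive generation step.
    return {"triplets": [["node_a", "relates to", f"node_{sample_id}"]]}

with tempfile.TemporaryDirectory() as tmp:
    first, first_cached = process_with_checkpoint(0, tmp, fake_generate)
    second, second_cached = process_with_checkpoint(0, tmp, fake_generate)
```

The first call generates and writes the file; the second call finds the file and returns the same data without calling the generator again.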

## 1. Setup and Dependencies

In [1]:
# Ensure all required libraries are installed (chromadb is used by the VectorDatabase helper in Section 4).
#!pip install pandas ollama langchain tqdm spacy chromadb

In [2]:
# Download the spaCy English model for sentence segmentation and dependency parsing.
#!python -m spacy download en_core_web_lg

In [1]:
import os
import json
import pandas as pd
import ollama
import spacy
from tqdm.auto import tqdm
from pathlib import Path
from IPython.display import display, Markdown
from langchain.text_splitter import RecursiveCharacterTextSplitter
from wikigraphs.data import paired_dataset, io_tools

pd.set_option('display.max_colwidth', 300)


## 2. Configuration

In [4]:
# Set to an integer to run on a subset, or None to run on the entire dataset.
SAMPLES_TO_RUN = 100

LLM_MODEL = "llama3.1:8b"
WIKIGRAPHS_DATA_DIR = "data/wikigraphs/"
CHECKPOINT_DIR = "checkpoints"
Path(CHECKPOINT_DIR).mkdir(exist_ok=True)

## 3. Prompt Engineering: Thorough Prompts for Generation and Evaluation

In [5]:
DEFAULT_SYS_PROMPT = """You are a knowledge graph maker who extracts terms and their relations from a given context. 
You are provided with a context chunk (delimited by ```). Your task is to extract the ontology
of terms mentioned in the given context. These terms should represent the key concepts as per the context. \n
Thought 1: While traversing through each sentence, Think about the key terms mentioned in it.\n
\tTerms may include object, entity, location, organization, person, \n
\tcondition, acronym, documents, service, concept, etc.\n
\tTerms should be as atomistic as possible\n\n
Thought 2: Think about how these terms can have one on one relation with other terms.\n
\tTerms that are mentioned in the same sentence or the same paragraph are typically related to each other.\n
\tTerms can be related to many other terms\n\n
Thought 3: Find out the relation between each such related pair of terms. \n\n
Format your output as a list of json. Each element of the list contains a pair of terms
and the relation between them, like the following: \n
[\n
   {\n
       "node_1": "A concept from extracted ontology",\n
       "node_2": "A related concept from extracted ontology",\n
       "edge": "relationship between the two concepts, node_1 and node_2 in one or two sentences"\n
   }, {...}\n
]\n
Do not add any other comment before or after the json. Respond ONLY with a well formed json that can be directly read by a program."""

USER_SUPPLIED_SYS_PROMPT = """You are a knowledge graph maker who extracts terms and their relations from a given context. 
You are provided with a context chunk (delimited by ```). Your task is to extract the ontology 
of terms mentioned in the given context and the relationships between them.\n\nThought 1: First, read through the text to identify the core entities. These are the main people, organizations, locations, and concepts being discussed.\n\nThought 2: Pay close attention to coreferences. If an entity is mentioned multiple times with different names (e.g., 'Valkyria Chronicles III', 'the game', 'it'), I must identify the most complete and descriptive name (e.g., 'Valkyria Chronicles III') and use it consistently for all triplets involving that entity.\n\nThought 3: I must also resolve pronouns. If the text says 'it was developed by Sega', the pronoun 'it' must be resolved to the specific entity it refers to from the preceding text. The final triplet should not contain pronouns like 'he', 'she', or 'it'.\n\nThought 4: For each sentence, I will extract relationships as `(node_1, edge, node_2)` triplets. The `edge` should be a concise, verb-oriented phrase describing the relationship.\n\nFormat your output as a list of json. Each element of the list contains a pair of terms
and the relation between them, like the following: \n
[\n
   {\n
       "node_1": "Canonical Entity Name",\n
       "node_2": "Another Canonical Entity Name",\n
       "edge": "A concise relationship phrase"\n
   }, {...}\n
]\n
Respond ONLY with a well-formed JSON list. Do not include any introductory text, comments, or explanations in your response."""

LLM_JUDGE_SYS_PROMPT_CATEGORICAL = """You are an expert evaluator for knowledge graph relations. Your task is to determine if two items are semantically equivalent.
You will be given Item A and Item B.

CRITERIA:
- 'High Confidence': They refer to the exact same concept without ambiguity (e.g., 'Sega' and 'Sega Wow'; 'developed by' and 'developer').
- 'Plausible': They are closely related and could be considered equivalent in a broader context, but are not identical (e.g., 'game' and 'tactical role-playing game'; 'is part of' and 'member of').
- 'No Match': The items refer to different concepts.

Your response MUST be a single phrase from the list: ['High Confidence', 'Plausible', 'No Match']. Do not provide any other text."""
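
The judge prompt asks for exactly one of three phrases, but a model may still wrap its answer in extra text. The judge function defined later in this notebook handles that with case-insensitive substring matching, and the evaluation counts every 'High Confidence' match toward the 'Plausible' tier as well. A minimal sketch of both behaviors, using hypothetical helper names (`parse_judgment`, `cumulative_counts`) and made-up replies:

```python
def parse_judgment(raw_answer):
    """Map a free-text judge reply onto one of the three categories."""
    answer = raw_answer.strip().lower()
    if "high confidence" in answer:
        return "High Confidence"
    if "plausible" in answer:
        return "Plausible"
    # Anything unrecognized, including an empty reply, is treated as a miss.
    return "No Match"

def cumulative_counts(judgments):
    """Count true positives per tier; the Plausible tier includes High Confidence."""
    high = sum(j == "High Confidence" for j in judgments)
    plausible = sum(j in ("High Confidence", "Plausible") for j in judgments)
    return high, plausible

# Illustrative replies, not real model output.
replies = ["High Confidence", " plausible.\n", "No Match", "I think: Plausible", ""]
labels = [parse_judgment(r) for r in replies]
high_tp, plausible_tp = cumulative_counts(labels)
```

Because the tiers are cumulative, the 'Plausible' scores can never fall below the 'High Confidence' scores for the same graph.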

## 4. Helper and Generation Functions

In [6]:
def load_wikigraphs_data(data_root, subset='train', version='max256'):
    print("Loading WikiGraphs dataset...")
    paired_dataset.DATA_ROOT = data_root
    dataset = paired_dataset.ParsedDataset(
        subset=subset, shuffle_data=False, data_dir=None, version=version
    )
    return list(dataset)

def get_ground_truth_graph(pair):
    g = pair.graph
    df = pd.DataFrame(g.edges(), columns=["src", "tgt", "edge"])
    df["subject"] = df["src"].apply(lambda node_id: g.nodes()[node_id])
    df["object"] = df["tgt"].apply(lambda node_id: g.nodes()[node_id])
    df = df[["subject", "edge", "object"]]
    df.rename(columns={"edge": "predicate"}, inplace=True)
    return df.drop_duplicates().reset_index(drop=True)

def fix_prompt_output(text):
    starting_characters = ('"', '{', '}', '[', ']')
    lines = text.splitlines()
    filtered_lines = [line for line in lines if any(line.lstrip().startswith(char) for char in starting_characters)]
    return "\n".join(filtered_lines)

def generate_graph_from_text(text, system_prompt, model=LLM_MODEL):
    splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=150, length_function=len)
    pages = splitter.split_text(text)
    all_triplets = []
    print(f"Processing text in {len(pages)} chunks...")
    for page in tqdm(pages, desc="Generating graph with LLM", leave=False):
        if len(page.strip()) < 50: continue
        user_prompt = f"context: ```{page}``` \n\n output: "
        try:
            response_dict = ollama.generate(model=model, system=system_prompt, prompt=user_prompt)
            response_text = response_dict["response"]
            cleaned_response = fix_prompt_output(response_text)
            triplets = json.loads(cleaned_response)
            if isinstance(triplets, list) and all(isinstance(i, dict) for i in triplets):
                all_triplets.extend(triplets)
        except Exception:
            continue
    if not all_triplets: return pd.DataFrame(columns=['node_1', 'node_2', 'edge'])
    valid_triplets = [t for t in all_triplets if all(k in t for k in ['node_1', 'node_2', 'edge'])]
    df = pd.DataFrame(valid_triplets)
    return df.drop_duplicates().reset_index(drop=True)

def generate_graph_with_distant_supervision(text, truth_df, nlp):
    if nlp is None: return pd.DataFrame(columns=['subject', 'object', 'predicate'])
    print("Generating graph using Naive Distant Supervision baseline...")
    doc = nlp(text)
    discovered_triplets = []
    for sentence in tqdm(doc.sents, desc="Scanning sentences for DS", leave=False):
        sentence_text = sentence.text.lower()
        for _, row in truth_df.iterrows():
            subject = str(row['subject']).lower().strip('"')
            obj = str(row['object']).lower().strip('"')
            if len(subject) > 2 and len(obj) > 2 and subject in sentence_text and obj in sentence_text:
                discovered_triplets.append({"subject": row['subject'], "object": row['object'], "predicate": row['predicate']})
    if not discovered_triplets: return pd.DataFrame(columns=['subject', 'object', 'predicate'])
    return pd.DataFrame(discovered_triplets).drop_duplicates().reset_index(drop=True)

def generate_graph_with_pattern_matching(text, nlp):
    if nlp is None: return pd.DataFrame(columns=['node_1', 'node_2', 'edge'])
    print("Generating raw graph using Pattern-Matching (spaCy SVO)... ")
    doc = nlp(text)
    discovered_triplets = []
    for token in tqdm(doc, desc="Parsing dependencies for SVO", leave=False):
        if token.pos_ == 'VERB':
            subjects = [child.text for child in token.children if child.dep_ == 'nsubj']
            objects = [child.text for child in token.children if child.dep_ == 'dobj']
            if subjects and objects:
                for subj in subjects:
                    for obj in objects:
                        discovered_triplets.append({"node_1": subj, "node_2": obj, "edge": token.lemma_})
    if not discovered_triplets: return pd.DataFrame(columns=['node_1', 'node_2', 'edge'])
    return pd.DataFrame(discovered_triplets).drop_duplicates().reset_index(drop=True)

def create_human_readable_ground_truth(truth_df, id_to_name_map):
    readable_df = truth_df.copy()
    readable_df['subject'] = readable_df['subject'].apply(lambda x: id_to_name_map.get(x, x))
    readable_df['object'] = readable_df['object'].apply(lambda x: id_to_name_map.get(x, x))
    return readable_df

class VectorDatabase:
    """
    A robust wrapper for ChromaDB that correctly handles both persistent (on-disk)
    and ephemeral (in-memory) clients.
    """
    def __init__(self, collection_name, texts, path=None, similarity="cosine"):
        import chromadb
        
        # Use a persistent client when a path is given; otherwise keep the collection in memory.
        if path:
            # For persistent storage
            self.client = chromadb.PersistentClient(path=path)
        else:
            # For ephemeral, in-memory storage
            self.client = chromadb.Client()

        self.collection_name = collection_name
        
        # Create the collection on first use, or reopen it if it already exists.
        self.collection = self.client.get_or_create_collection(
            name=self.collection_name,
            metadata={"hnsw:space": similarity}
        )
        
        # Populate the collection only when it is empty and there are texts to add.
        if self.collection.count() == 0 and texts:
            ids = [f"id{i}" for i in range(len(texts))]
            self.collection.add(documents=texts, ids=ids)

    def query(self, query_text, n_results):
        if self.collection.count() == 0: return None
        # Ensure query text is a string
        results = self.collection.query(query_texts=[str(query_text)], n_results=n_results)
        # Safely handle the output
        return results['documents'][0] if results and results.get('documents') and results['documents'][0] else None

def standardize_graph_entities(raw_graph_df, vector_db, truth_entities):
    if raw_graph_df.empty: return raw_graph_df
    generated_entities = pd.concat([raw_graph_df['node_1'], raw_graph_df['node_2']]).unique()
    name_mapping = {}
    for entity in tqdm(generated_entities, desc="Linking entities", leave=False):
        best_match_list = vector_db.query(query_text=str(entity), n_results=1)
        if best_match_list and isinstance(best_match_list, list):
            name_mapping[entity] = best_match_list[0]
        else:
            name_mapping[entity] = entity
    std_graph_df = raw_graph_df.copy()
    std_graph_df['node_1'] = std_graph_df['node_1'].map(name_mapping)
    std_graph_df['node_2'] = std_graph_df['node_2'].map(name_mapping)
    return std_graph_df.drop_duplicates().reset_index(drop=True)

def standardize_graph_predicates(entity_std_graph_df, vector_db, truth_predicates_vocab):
    if entity_std_graph_df.empty: return entity_std_graph_df
    generated_edges = entity_std_graph_df['edge'].unique()
    predicate_mapping = {}
    for edge in tqdm(generated_edges, desc="Linking predicates", leave=False):
        best_match_list = vector_db.query(query_text=str(edge), n_results=1)
        if best_match_list and isinstance(best_match_list, list):
            predicate_mapping[edge] = best_match_list[0]
        else:
            predicate_mapping[edge] = None
    fully_std_graph_df = entity_std_graph_df.copy()
    fully_std_graph_df['edge'] = fully_std_graph_df['edge'].map(predicate_mapping)
    fully_std_graph_df.dropna(inplace=True)
    return fully_std_graph_df.drop_duplicates().reset_index(drop=True)

# --- Evaluation Functions --- 
llm_judge_cache = {}

def ask_llm_judge_categorical(item_a, item_b):
    cache_key = tuple(sorted((str(item_a).lower(), str(item_b).lower())))
    if cache_key in llm_judge_cache: return llm_judge_cache[cache_key]
    prompt = f"Item A: \"{item_a}\"\nItem B: \"{item_b}\""
    try:
        response = ollama.generate(model=LLM_MODEL, system=LLM_JUDGE_SYS_PROMPT_CATEGORICAL, prompt=prompt)
        answer = response['response'].strip().lower()
        result = 'High Confidence' if 'high confidence' in answer else 'Plausible' if 'plausible' in answer else 'No Match'
        llm_judge_cache[cache_key] = result
        return result
    except Exception: return 'No Match'

def _calculate_metrics_from_counts(true_positives, generated_count, truth_count):
    if generated_count == 0 and truth_count == 0: return {"Precision": 1.0, "Recall": 1.0, "F1-Score": 1.0}
    precision = true_positives / generated_count if generated_count > 0 else 0
    recall = true_positives / truth_count if truth_count > 0 else 0
    f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
    return {"Precision": precision, "Recall": recall, "F1-Score": f1_score}

def run_comprehensive_evaluation(generated_df, truth_df):
    if generated_df.empty: return {
        "Strict Triplets": {"Precision": 0, "Recall": 0, "F1-Score": 0},
        "Resilient Triplets": {"Precision": 0, "Recall": 0, "F1-Score": 0},
        "Semantic (High Confidence)": {"Precision": 0, "Recall": 0, "F1-Score": 0},
        "Semantic (Plausible)": {"Precision": 0, "Recall": 0, "F1-Score": 0}}
    
    gen_norm = generated_df.rename(columns={'node_1': 'subject', 'node_2': 'object', 'edge': 'predicate'})
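    gen_norm = gen_norm.apply(lambda x: x.astype(str).str.lower().str.strip())
    truth_norm = truth_df.apply(lambda x: x.astype(str).str.lower().str.strip())
    # Normalization can collapse rows that differed only in case or whitespace.
    # Drop those duplicates so the metric denominators count unique triplets,
    # matching the set-based true-positive counts below.
    gen_norm = gen_norm.drop_duplicates().reset_index(drop=True)
    truth_norm = truth_norm.drop_duplicates().reset_index(drop=True)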
    
    gen_strict_set = set(map(tuple, gen_norm[['subject', 'predicate', 'object']].to_records(index=False)))
    truth_strict_set = set(map(tuple, truth_norm[['subject', 'predicate', 'object']].to_records(index=False)))
    strict_triplet_tp = len(gen_strict_set.intersection(truth_strict_set))
    
    gen_resilient_set = { (tuple(sorted((p.subject, p.object))), p.predicate) for p in gen_norm.itertuples(index=False) }
    truth_resilient_set = { (tuple(sorted((p.subject, p.object))), p.predicate) for p in truth_norm.itertuples(index=False) }
    resilient_triplet_tp = len(gen_resilient_set.intersection(truth_resilient_set))
    
    high_confidence_tp = 0; plausible_tp = 0
    truth_map = {}
    for t in truth_norm.itertuples(index=False): truth_map.setdefault(tuple(sorted((t.subject, t.object))), []).append(t.predicate)
    
    for gen_triplet in tqdm(gen_norm.itertuples(index=False), desc="Running Semantic Evaluation", leave=False):
        gen_key = tuple(sorted((gen_triplet.subject, gen_triplet.object)))
        if gen_key in truth_map:
            best_judgment_for_triplet = 'No Match'
            for truth_pred in truth_map[gen_key]:
                judgment = ask_llm_judge_categorical(gen_triplet.predicate, truth_pred)
                if judgment == 'High Confidence':
                    best_judgment_for_triplet = 'High Confidence'
                    break
                elif judgment == 'Plausible':
                    best_judgment_for_triplet = 'Plausible'
            if best_judgment_for_triplet == 'High Confidence': high_confidence_tp += 1; plausible_tp += 1
            elif best_judgment_for_triplet == 'Plausible': plausible_tp += 1
    
    num_gen, num_truth = len(gen_norm), len(truth_norm)
    return {
        "Strict Triplets": _calculate_metrics_from_counts(strict_triplet_tp, num_gen, num_truth),
        "Resilient Triplets": _calculate_metrics_from_counts(resilient_triplet_tp, num_gen, num_truth),
        "Semantic (High Confidence)": _calculate_metrics_from_counts(high_confidence_tp, num_gen, num_truth),
        "Semantic (Plausible)": _calculate_metrics_from_counts(plausible_tp, num_gen, num_truth)}

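The difference between the strict and resilient layers is easiest to see on a small example. The sketch below uses toy triplets invented for illustration (they are not taken from the dataset) and plain Python sets in place of DataFrames; `prf` and `undirected` are hypothetical helper names that mirror the logic of `_calculate_metrics_from_counts` and the sorted-pair matching above.

```python
def prf(tp, n_generated, n_truth):
    """Precision, recall, and F1 from raw counts, guarding against division by zero."""
    precision = tp / n_generated if n_generated else 0.0
    recall = tp / n_truth if n_truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

def undirected(triplets):
    """Ignore edge direction by sorting each (subject, object) pair."""
    return {(tuple(sorted((s, o))), p) for s, p, o in triplets}

# Toy (subject, predicate, object) triplets, already lower-cased.
generated = [
    ("sega", "developed", "valkyria chronicles iii"),  # reversed direction
    ("valkyria chronicles iii", "genre", "tactical rpg"),
]
truth = [
    ("valkyria chronicles iii", "developed", "sega"),
    ("valkyria chronicles iii", "genre", "tactical rpg"),
    ("valkyria chronicles iii", "platform", "playstation portable"),
]

# Strict: subject, predicate, and object must all match in order.
strict_tp = len(set(generated) & set(truth))
# Resilient: the same entity pair and predicate match regardless of direction.
resilient_tp = len(undirected(generated) & undirected(truth))

strict = prf(strict_tp, len(generated), len(truth))
resilient = prf(resilient_tp, len(generated), len(truth))
```

Here the reversed triplet is a miss under strict matching but a hit under resilient matching, so F1 rises from 0.4 to 0.8 on this toy data.
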
## 5. Data Preparation (One-Time Setup)

In [7]:
def load_or_build_id_to_name_map():
    """Loads the ID-to-Name map from a file or builds it if it doesn't exist."""
    id_map_path = Path('id_to_name_map.json')
    if id_map_path.exists():
        print(f"Loading existing ID-to-Name map from {id_map_path}...")
        with open(id_map_path, 'r') as f:
            id_to_name = json.load(f)
        print(f"ID-to-Name dictionary loaded with {len(id_to_name)} entries.")
        return id_to_name
    else:
        print("ID-to-Name map not found. Building from raw Freebase data (this may take over 10 minutes)...\n")
        # This is the heavy preprocessing logic
        freebase_graphs_generator = io_tools.graphs_from_file("data/freebase/max1024/whole.gz")
        
        # We need to convert the io_tools.Graph to a paired_dataset.Graph to use its methods
        freebase_graphs = [paired_dataset.Graph.from_edges(g.edges) for g in list(freebase_graphs_generator)]
        
        df_list = []
        for g in tqdm(freebase_graphs, desc="Processing Freebase graphs"):
            
            # Graph.edges is a method here, so call it to obtain the (src, tgt, edge) rows.
            df = pd.DataFrame(g.edges(), columns=["src", "tgt", "edge"])
            
            node_map = {i: node_val for i, node_val in enumerate(g.nodes())}
            df["subject"] = df["src"].map(node_map)
            df["object"] = df["tgt"].map(node_map)
            df_list.append(df[["subject", "edge", "object"]])

        full_freebase_df = pd.concat(df_list, ignore_index=True).drop_duplicates()
        object_name_df = full_freebase_df[full_freebase_df["edge"] == "ns/type.object.name"].drop_duplicates()
        
        id_to_name = {row['subject']: str(row['object']) for _, row in object_name_df.iterrows()}
        
        print(f"Saving ID-to-Name map with {len(id_to_name)} entries to {id_map_path}...")
        with open(id_map_path, 'w') as f:
            json.dump(id_to_name, f)
            
        return id_to_name

id_to_name = load_or_build_id_to_name_map()

Loading existing ID-to-Name map from id_to_name_map.json...
ID-to-Name dictionary loaded with 92128 entries.
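
The mapping itself is a simple filter over the graph edges: every triple whose predicate is `ns/type.object.name` contributes one ID-to-name entry, and IDs without a name fall back to themselves when the ground truth is made readable. A minimal sketch on toy triples; the node IDs, names, and the `ns/example.*` predicates below are hypothetical, and only the name predicate comes from the code above.

```python
NAME_PREDICATE = "ns/type.object.name"

def build_id_to_name(triples):
    """Collect {entity_id: name} from triples that carry the name predicate."""
    return {s: str(o) for s, p, o in triples if p == NAME_PREDICATE}

def to_readable(triples, id_to_name):
    """Replace IDs with names where known; unknown IDs are left unchanged."""
    return [(id_to_name.get(s, s), p, id_to_name.get(o, o)) for s, p, o in triples]

# Toy data for illustration only.
toy_triples = [
    ("ns/m.0001", "ns/type.object.name", "Example Game"),
    ("ns/m.0002", "ns/type.object.name", "Example Studio"),
    ("ns/m.0001", "ns/example.developer", "ns/m.0002"),
    ("ns/m.0001", "ns/example.sequel", "ns/m.0003"),  # m.0003 has no name edge
]

id_to_name_toy = build_id_to_name(toy_triples)
readable = to_readable(toy_triples[2:], id_to_name_toy)
```

The last triple keeps its raw object ID because no name edge exists for it, which is the same fallback `create_human_readable_ground_truth` applies through `id_to_name_map.get(x, x)`.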


## 6. Main Experiment Loop with Checkpointing

In [8]:
def load_all_dependencies():
    """
    Loads all heavy dependencies like spaCy and the Freebase ID-to-Name map.
    This function encapsulates the one-time setup costs.
    """
    print("Loading all heavy dependencies...")
    try:
        nlp = spacy.load("en_core_web_lg")
        print("spaCy model 'en_core_web_lg' loaded successfully.")
    except OSError:
        print("spaCy model not found. Please run: !python -m spacy download en_core_web_lg")
        nlp = None

    # Reuse the cached loader from Section 5 instead of duplicating its logic.
    id_to_name = load_or_build_id_to_name_map()
            
    return nlp, id_to_name

def run_batch_experiment(nlp, id_to_name):
    """
    Main function to run the batch experiment.
    Dependencies (nlp, id_to_name) are passed in as arguments for efficiency.
    """
    
    parsed_pairs = load_wikigraphs_data(WIKIGRAPHS_DATA_DIR)
    # Never request more samples than the dataset contains.
    num_samples = len(parsed_pairs) if SAMPLES_TO_RUN is None else min(SAMPLES_TO_RUN, len(parsed_pairs))
    
    all_raw_graphs = []
    all_final_graphs = []

    for i in tqdm(range(num_samples), desc="Processing Samples"):
        sample_pair = parsed_pairs[i]
        sample_title = "".join(c for c in sample_pair.title if c.isalnum() or c in (' ', '_')).rstrip()
        checkpoint_path = Path(CHECKPOINT_DIR) / f"sample_{i}_{sample_title}.json"

        if checkpoint_path.exists():
            print(f"\n--- Sample {i}: '{sample_pair.title}' --- Checkpoint found, loading results...")
            with open(checkpoint_path, 'r') as f:
                raw_graphs_dict = json.load(f)
            raw_graph_default_df = pd.DataFrame(raw_graphs_dict['default_llm']['data'], columns=raw_graphs_dict['default_llm']['columns'])
            raw_graph_user_df = pd.DataFrame(raw_graphs_dict['user_llm']['data'], columns=raw_graphs_dict['user_llm']['columns'])
            raw_graph_pattern_based_df = pd.DataFrame(raw_graphs_dict['pattern_based']['data'], columns=raw_graphs_dict['pattern_based']['columns'])
            graph_distant_supervision_df = pd.DataFrame(raw_graphs_dict['distant_supervision']['data'], columns=raw_graphs_dict['distant_supervision']['columns'])
        
        else:
            print(f"\n--- Sample {i}: '{sample_pair.title}' --- No checkpoint, running generation...")
            original_text = sample_pair.text
            sample_text = original_text.replace(' @-@ ', '-')
            ground_truth_df_original = get_ground_truth_graph(sample_pair)

            raw_graph_default_df = generate_graph_from_text(sample_text, DEFAULT_SYS_PROMPT)
            raw_graph_user_df = generate_graph_from_text(sample_text, USER_SUPPLIED_SYS_PROMPT)
            raw_graph_pattern_based_df = generate_graph_with_pattern_matching(sample_text, nlp)
            graph_distant_supervision_df = generate_graph_with_distant_supervision(sample_text, ground_truth_df_original, nlp)

            checkpoint_data = {
                'default_llm': raw_graph_default_df.to_dict(orient='split'),
                'user_llm': raw_graph_user_df.to_dict(orient='split'),
                'pattern_based': raw_graph_pattern_based_df.to_dict(orient='split'),
                'distant_supervision': graph_distant_supervision_df.to_dict(orient='split'),
            }
            with open(checkpoint_path, 'w') as f:
                json.dump(checkpoint_data, f, indent=4)
            print(f"Saved checkpoint to {checkpoint_path}")

        all_raw_graphs.append({
            "LLM (Default Prompt)": raw_graph_default_df,
            "LLM (User-Supplied Prompt)": raw_graph_user_df
        })
        
        ground_truth_df_original = get_ground_truth_graph(sample_pair)
        ground_truth_df_readable = create_human_readable_ground_truth(ground_truth_df_original, id_to_name)
        
        truth_entities_vocab = list(set(pd.concat([ground_truth_df_readable['subject'], ground_truth_df_readable['object']]).unique()))
        truth_predicates_vocab = list(ground_truth_df_readable['predicate'].unique())
        
        name_db = VectorDatabase(collection_name=f"name_db_for_sample_{i}", texts=truth_entities_vocab)
        predicate_db = VectorDatabase(collection_name=f"predicate_db_for_sample_{i}", texts=truth_predicates_vocab)
        
        std_entity_graph_default_df = standardize_graph_entities(raw_graph_default_df, name_db, truth_entities_vocab)
        fully_std_graph_default_df = standardize_graph_predicates(std_entity_graph_default_df, predicate_db, truth_predicates_vocab)
        
        std_entity_graph_user_df = standardize_graph_entities(raw_graph_user_df, name_db, truth_entities_vocab)
        fully_std_graph_user_df = standardize_graph_predicates(std_entity_graph_user_df, predicate_db, truth_predicates_vocab)
        
        std_entity_graph_pattern_based_df = standardize_graph_entities(raw_graph_pattern_based_df, name_db, truth_entities_vocab)
        fully_std_graph_pattern_based_df = standardize_graph_predicates(std_entity_graph_pattern_based_df, predicate_db, truth_predicates_vocab)
        
        human_readable_ds_graph = create_human_readable_ground_truth(graph_distant_supervision_df, id_to_name)

        all_final_graphs.append({
            "LLM (Default Prompt)": fully_std_graph_default_df,
            "LLM (User-Supplied Prompt)": fully_std_graph_user_df,
            "Baseline (SVO + Embeddings)": fully_std_graph_pattern_based_df,
            "Baseline (Distant Supervision)": human_readable_ds_graph,
            "ground_truth": ground_truth_df_readable
        })
        
    return all_raw_graphs, all_final_graphs
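
The standardization step in the loop maps every generated entity to its nearest ground-truth vocabulary entry through a vector query with `n_results=1`. A minimal sketch of that nearest-neighbor linking follows. It uses character-level similarity from the standard library's `difflib` as a stand-in for the embedding search, an assumption made only to keep the example dependency-free; the notebook itself queries ChromaDB. `link_to_vocab` is a hypothetical name and the vocabulary is illustrative.

```python
import difflib

def link_to_vocab(term, vocab):
    """Return the vocabulary entry most similar to term, or term itself if vocab is empty."""
    if not vocab:
        return term

    def similarity(candidate):
        # Character-level ratio in [0, 1]; stands in for embedding similarity.
        return difflib.SequenceMatcher(None, term.lower(), candidate.lower()).ratio()

    return max(vocab, key=similarity)

# Illustrative vocabulary and generated surface forms.
truth_entities = ["Valkyria Chronicles III", "Sega", "PlayStation Portable"]
generated_entities = ["valkyria chronicles 3", "SEGA", "the PlayStation Portable"]

mapping = {g: link_to_vocab(g, truth_entities) for g in generated_entities}

# With no similarity threshold, even an unrelated term is forced onto some entry.
forced = link_to_vocab("Nebraska Highway 88", truth_entities)
```

As in the notebook's version, there is no distance cutoff, so every generated entity is assigned to some ground-truth entry even when nothing in the vocabulary is a real match. This is worth keeping in mind when reading the precision figures.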

In [9]:
nlp_model, id_to_name_map = load_all_dependencies()

Loading all heavy dependencies...
spaCy model 'en_core_web_lg' loaded successfully.
Loading existing ID-to-Name map from id_to_name_map.json...
ID-to-Name dictionary loaded with 92128 entries.


In [10]:
all_raw_graphs, all_final_graphs = run_batch_experiment(nlp=nlp_model, id_to_name=id_to_name_map)

Loading WikiGraphs dataset...


Processing Samples:   0%|          | 0/100 [00:00<?, ?it/s]


--- Sample 0: 'Valkyria_Chronicles_III' --- Checkpoint found, loading results...


Linking entities:   0%|          | 0/261 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/175 [00:00<?, ?it/s]

Linking entities:   0%|          | 0/243 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/181 [00:00<?, ?it/s]

Linking entities:   0%|          | 0/94 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/48 [00:00<?, ?it/s]


--- Sample 1: 'Tower_Building_of_the_Little_Rock_Arsenal' --- Checkpoint found, loading results...


Linking entities:   0%|          | 0/242 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/169 [00:00<?, ?it/s]

Linking entities:   0%|          | 0/206 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/146 [00:00<?, ?it/s]

Linking entities:   0%|          | 0/70 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/31 [00:00<?, ?it/s]


--- Sample 2: 'Cicely_Mary_Barker' --- Checkpoint found, loading results...


Linking entities:   0%|          | 0/190 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/96 [00:00<?, ?it/s]

Linking entities:   0%|          | 0/207 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/105 [00:00<?, ?it/s]

Linking entities:   0%|          | 0/81 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/45 [00:00<?, ?it/s]


--- Sample 3: 'Plain_maskray' --- Checkpoint found, loading results...


Linking entities:   0%|          | 0/97 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/56 [00:00<?, ?it/s]

Linking entities:   0%|          | 0/77 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/51 [00:00<?, ?it/s]

Linking entities:   0%|          | 0/28 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/14 [00:00<?, ?it/s]


--- Sample 4: '2011$201312_Columbus_Blue_Jackets_season' --- Checkpoint found, loading results...


Linking entities:   0%|          | 0/169 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/141 [00:00<?, ?it/s]

Linking entities:   0%|          | 0/141 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/149 [00:00<?, ?it/s]

Linking entities:   0%|          | 0/110 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/60 [00:00<?, ?it/s]


--- Sample 5: 'Gregorian_Tower' --- Checkpoint found, loading results...


Linking entities:   0%|          | 0/102 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/73 [00:00<?, ?it/s]

Linking entities:   0%|          | 0/86 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/59 [00:00<?, ?it/s]

Linking entities:   0%|          | 0/24 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/11 [00:00<?, ?it/s]


--- Sample 6: 'There$0027s_Got_to_Be_a_Way' --- Checkpoint found, loading results...


Linking entities:   0%|          | 0/52 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/27 [00:00<?, ?it/s]

Linking entities:   0%|          | 0/64 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/41 [00:00<?, ?it/s]

Linking entities:   0%|          | 0/36 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/18 [00:00<?, ?it/s]


--- Sample 7: 'Nebraska_Highway_88' --- Checkpoint found, loading results...


Linking entities:   0%|          | 0/29 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/21 [00:00<?, ?it/s]

Linking entities:   0%|          | 0/30 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/21 [00:00<?, ?it/s]

Linking entities:   0%|          | 0/13 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/4 [00:00<?, ?it/s]


--- Sample 8: 'USS_Atlanta_$00281861$0029' --- Checkpoint found, loading results...


Linking entities:   0%|          | 0/149 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/129 [00:00<?, ?it/s]

Linking entities:   0%|          | 0/130 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/115 [00:00<?, ?it/s]

Linking entities:   0%|          | 0/88 [00:00<?, ?it/s]

Linking predicates:   0%|          | 0/48 [00:00<?, ?it/s]


--- Sample 9: 'Jacqueline_Fernandez' --- Checkpoint found, loading results...



--- Sample 10: 'John_Cullen' --- Checkpoint found, loading results...



--- Sample 11: 'SMS_Erzherzog_Ferdinand_Max' --- Checkpoint found, loading results...



--- Sample 12: 'Ancient_Egyptian_deities' --- Checkpoint found, loading results...



--- Sample 13: 'South_of_Heaven' --- Checkpoint found, loading results...



--- Sample 14: 'SMS_Zr$00EDnyi' --- Checkpoint found, loading results...



--- Sample 15: 'Geopyxis_carbonaria' --- Checkpoint found, loading results...



--- Sample 16: 'Gold_dollar' --- Checkpoint found, loading results...



--- Sample 17: 'Johnson$2013Corey$2013Chaykovsky_reaction' --- Checkpoint found, loading results...



--- Sample 18: 'Treaty_of_Ciudad_Ju$00E1rez' --- Checkpoint found, loading results...



--- Sample 19: 'The_Feast_of_the_Goat' --- Checkpoint found, loading results...



--- Sample 20: 'Charles_Eaton_$0028RAAF_officer$0029' --- Checkpoint found, loading results...



--- Sample 21: 'WASP-44' --- Checkpoint found, loading results...



--- Sample 22: 'Zagreb_Synagogue' --- Checkpoint found, loading results...



--- Sample 23: '1806_Great_Coastal_hurricane' --- Checkpoint found, loading results...



--- Sample 24: 'Trinsey_v$002E_Pennsylvania' --- Checkpoint found, loading results...



--- Sample 25: 'SMS_Markgraf' --- Checkpoint found, loading results...



--- Sample 26: 'Coldrum_Long_Barrow' --- Checkpoint found, loading results...



--- Sample 27: 'Soviet_cruiser_Krasnyi_Kavkaz' --- Checkpoint found, loading results...



--- Sample 28: 'Rhode_Island_Route_4' --- Checkpoint found, loading results...



--- Sample 29: 'West_End_Girls' --- Checkpoint found, loading results...



--- Sample 30: 'Wrapped_in_Red' --- Checkpoint found, loading results...



--- Sample 31: 'Christmas_1994_nor$0027easter' --- Checkpoint found, loading results...



--- Sample 32: 'Sholay' --- Checkpoint found, loading results...



--- Sample 33: 'Adam_Stansfield' --- Checkpoint found, loading results...



--- Sample 34: 'Saprang_Kalayanamitr' --- Checkpoint found, loading results...



--- Sample 35: 'Grammy_Award_for_Best_Concept_Music_Video' --- Checkpoint found, loading results...



--- Sample 36: 'Hadji_Ali' --- Checkpoint found, loading results...



--- Sample 37: 'Battle_of_Tellicherry' --- Checkpoint found, loading results...



--- Sample 38: 'Loose_$0028Nelly_Furtado_album$0029' --- Checkpoint found, loading results...



--- Sample 39: 'Antimony' --- Checkpoint found, loading results...



--- Sample 40: 'Mortimer_Wheeler' --- Checkpoint found, loading results...



--- Sample 41: 'Species_of_Allosaurus' --- Checkpoint found, loading results...



--- Sample 42: 'Astraeus_hygrometricus' --- Checkpoint found, loading results...



--- Sample 43: 'Paul_Thomas_Anderson' --- Checkpoint found, loading results...



--- Sample 44: 'Joe_Nathan' --- Checkpoint found, loading results...



--- Sample 45: 'Art_Ross' --- Checkpoint found, loading results...



--- Sample 46: 'Saint_Leonard_Catholic_Church_$0028Madison$002C_Nebraska$0029' --- Checkpoint found, loading results...



--- Sample 47: 'Portuguese_ironclad_Vasco_da_Gama' --- Checkpoint found, loading results...



--- Sample 48: 'Livin$0027_the_Dream' --- Checkpoint found, loading results...



--- Sample 49: 'Tonin$00E1' --- Checkpoint found, loading results...



--- Sample 50: 'Corn_crake' --- Checkpoint found, loading results...



--- Sample 51: 'Acute_myeloid_leukemia' --- Checkpoint found, loading results...



--- Sample 52: 'Shaoguan_incident' --- Checkpoint found, loading results...



--- Sample 53: 'French_cruiser_Sully' --- Checkpoint found, loading results...



--- Sample 54: 'Norman_Finkelstein' --- Checkpoint found, loading results...



--- Sample 55: 'Mutinus_elegans' --- Checkpoint found, loading results...



--- Sample 56: 'The_Boat_Race_1900' --- Checkpoint found, loading results...



--- Sample 57: 'Yamaha_NS-10' --- Checkpoint found, loading results...



--- Sample 58: 'Utah_State_Route_61' --- Checkpoint found, loading results...



--- Sample 59: 'Edward_Creutz' --- Checkpoint found, loading results...



--- Sample 60: 'Leanne_Del_Toso' --- Checkpoint found, loading results...



--- Sample 61: 'Vitamin_D_$0028Glee$0029' --- Checkpoint found, loading results...



--- Sample 62: 'Fern_Hobbs' --- Checkpoint found, loading results...



--- Sample 63: 'Jessie_Stephen' --- Checkpoint found, loading results...



--- Sample 64: 'Of_Human_Feelings' --- Checkpoint found, loading results...



--- Sample 65: 'Dangerously_in_Love_Tour' --- Checkpoint found, loading results...



--- Sample 66: 'Zhou_Tong_$0028archer$0029' --- Checkpoint found, loading results...



--- Sample 67: 'Romanian_Land_Forces' --- Checkpoint found, loading results...



--- Sample 68: 'Not_Quite_Hollywood$003A_The_Wild$002C_Untold_Story_of_Ozploitation$0021' --- Checkpoint found, loading results...



--- Sample 69: 'Papal_conclave$002C_1769' --- Checkpoint found, loading results...



--- Sample 70: 'West_Hendford_Cricket_Ground$002C_Yeovil' --- Checkpoint found, loading results...



--- Sample 71: 'New_Year$0027s_Eve_$0028Up_All_Night$0029' --- Checkpoint found, loading results...



--- Sample 72: 'World_War_Z' --- Checkpoint found, loading results...



--- Sample 73: 'Sentence_spacing' --- Checkpoint found, loading results...



--- Sample 74: 'The_Crab_with_the_Golden_Claws' --- Checkpoint found, loading results...



--- Sample 75: 'L$002EA$002EM$002EB$002E' --- Checkpoint found, loading results...



--- Sample 76: 'First-move_advantage_in_chess' --- Checkpoint found, loading results...



--- Sample 77: 'Frederick_Reines' --- Checkpoint found, loading results...



--- Sample 78: 'Lock_Haven$002C_Pennsylvania' --- Checkpoint found, loading results...



--- Sample 79: 'Rachel_Green' --- Checkpoint found, loading results...



--- Sample 80: 'Krak_des_Chevaliers' --- Checkpoint found, loading results...



--- Sample 81: 'The_Importance_of_Being_Earnest' --- Checkpoint found, loading results...



--- Sample 82: 'Lloyd_Mathews' --- Checkpoint found, loading results...



--- Sample 83: 'Kaimanawa_horse' --- Checkpoint found, loading results...


--- Sample 84: 'The_Remix_$0028Lady_Gaga_album$0029' --- Checkpoint found, loading results...


--- Sample 85: 'Lost_Horizons_$0028Lemon_Jelly_album$0029' --- Checkpoint found, loading results...


--- Sample 86: 'Fastra_II' --- Checkpoint found, loading results...


--- Sample 87: 'USS_Breese_$0028DD-122$0029' --- Checkpoint found, loading results...


--- Sample 88: 'Sandwich_Day' --- Checkpoint found, loading results...


--- Sample 89: 'Tiber_Oil_Field' --- Checkpoint found, loading results...


--- Sample 90: 'Glorious_First_of_June' --- Checkpoint found, loading results...


--- Sample 91: 'New_York_State_Route_368' --- Checkpoint found, loading results...


--- Sample 92: 'M-122_$0028Michigan_highway$0029' --- Checkpoint found, loading results...


--- Sample 93: 'Tupolev_Tu-12' --- Checkpoint found, loading results...


--- Sample 94: 'Civilian_Public_Service' --- Checkpoint found, loading results...


--- Sample 95: 'St_Nazaire_Raid' --- Checkpoint found, loading results...


--- Sample 96: 'Curtis_Woodhouse' --- Checkpoint found, loading results...


--- Sample 97: 'Thom_Darden' --- Checkpoint found, loading results...


--- Sample 98: 'Voyage$003A_Inspired_by_Jules_Verne' --- Checkpoint found, loading results...


--- Sample 99: 'Old_Baltimore_Pike' --- Checkpoint found, loading results...



## 7. Final Aggregated Results & Analysis
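
The evaluation in this section compares triplets at several levels of strictness. As a minimal sketch of the first two layers, the snippet below builds a strict key (ordered subject, predicate, object) and a resilient key (unordered node pair plus predicate) for two made-up triplets. The helper names `strict_key` and `resilient_key` and the sample data are illustrative assumptions, not part of the notebook.

```python
# Illustrative only: helper names and sample triplets are hypothetical.

def strict_key(subject, predicate, obj):
    """Ordered key: subject and object positions must match exactly."""
    return (subject.lower().strip(), predicate.lower().strip(), obj.lower().strip())


def resilient_key(subject, predicate, obj):
    """Direction-insensitive key: the node pair is sorted before comparison."""
    s, p, o = strict_key(subject, predicate, obj)
    return (tuple(sorted((s, o))), p)


generated = ("Paris", "capital of", "France")
truth = ("France", "capital of", "Paris")  # same node pair, reversed direction

strict_match = strict_key(*generated) == strict_key(*truth)
resilient_match = resilient_key(*generated) == resilient_key(*truth)

print("Strict match:", strict_match)
print("Resilient match:", resilient_match)
```

The resilient key deliberately ignores edge direction, so it forgives a generator that swaps subject and object but can also credit a triplet whose direction is genuinely wrong.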

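Each evaluation layer is reduced to precision, recall, and F1 from three counts: true positives, generated triplets, and ground-truth triplets. The sketch below works through one hypothetical case. The function name `prf_from_counts` and the counts are assumptions chosen for illustration; it returns 0.0 when a denominator is zero and omits the special case for two empty graphs that the notebook's own helper handles.

```python
def prf_from_counts(true_positives, generated_count, truth_count):
    """Return (precision, recall, f1), using 0.0 when a denominator is zero."""
    precision = true_positives / generated_count if generated_count > 0 else 0.0
    recall = true_positives / truth_count if truth_count > 0 else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


# Hypothetical counts: 12 matches out of 96 generated and 48 ground-truth triplets.
precision, recall, f1 = prf_from_counts(12, 96, 48)
print(f"Precision: {precision:.3f}, Recall: {recall:.3f}, F1: {f1:.3f}")
```

Because the generated graph in this example is twice the size of the ground truth, precision is half of recall, which illustrates how a verbose generator can be penalized even when it recovers the same number of facts.
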
In [15]:
def compare_generated_graphs(graphs_dict):
    """Display unique-node and edge counts for each generated graph."""
    display(Markdown("\n" + "="*25 + " Quantitative Graph Comparison " + "="*25))
    stats = {}
    for name, df in graphs_dict.items():
        if df.empty:
            stats[name] = {"Unique Nodes": 0, "Edges (Triplets)": 0}
        else:
            stats[name] = {
                "Unique Nodes": len(pd.concat([df['node_1'], df['node_2']]).unique()),
                "Edges (Triplets)": len(df)}
    summary_df = pd.DataFrame.from_dict(stats, orient='index')
    display(summary_df)
    display(Markdown("="*79))

def compare_coreference_resolution(default_graph_df, user_graph_df, prompt1_name="Default Prompt", prompt2_name="User-Supplied Prompt"):
    display(Markdown("\n" + "="*25 + " Direct Coreference Resolution Comparison " + "="*25))
    display(Markdown("Compares the two LLM-generated graphs. Better coreference resolution leads to fewer unique entities."))
    default_entities = set(pd.concat([default_graph_df['node_1'], default_graph_df['node_2']]).astype(str).str.lower().str.strip().unique())
    user_entities = set(pd.concat([user_graph_df['node_1'], user_graph_df['node_2']]).astype(str).str.lower().str.strip().unique())
    summary_data = {
        "Metric": ["Total Unique Entities Generated", "Entities Common to Both"],
        prompt1_name: [len(default_entities), len(default_entities.intersection(user_entities))],
        prompt2_name: [len(user_entities), len(user_entities.intersection(default_entities))]}
    summary_df = pd.DataFrame.from_dict(summary_data).set_index("Metric")
    display(summary_df)
    display(Markdown("**Interpretation:**"))
    if len(user_entities) < len(default_entities):
        improvement = (len(default_entities) - len(user_entities)) / len(default_entities) * 100
        print(f"Conclusion: The '{prompt2_name}' prompt shows superior coreference resolution with {improvement:.2f}% fewer unique entities.")
    elif len(default_entities) < len(user_entities):
        degradation = (len(user_entities) - len(default_entities)) / len(user_entities) * 100
        print(f"Conclusion: The '{prompt1_name}' prompt shows superior coreference resolution with {degradation:.2f}% fewer unique entities.")
    else:
        print("Conclusion: Both prompts resulted in the same number of unique entities.")
    display(Markdown("="*82))

llm_judge_cache = {}

def ask_llm_judge_categorical(item_a, item_b):
    """
    Asks the LLM if two items are semantically equivalent, with caching.
    Configured to be deterministic by setting temperature to 0.
    """
    # Normalize and create a canonical cache key
    cache_key = tuple(sorted((str(item_a).lower(), str(item_b).lower())))
    if cache_key in llm_judge_cache:
        return llm_judge_cache[cache_key]

    prompt = f'Item A: "{item_a}"\nItem B: "{item_b}"'
    
    try:
        # Request deterministic decoding via `options` so that repeated runs
        # give the same judgment for the same pair.
        response = ollama.generate(
            model=LLM_MODEL,
            system=LLM_JUDGE_SYS_PROMPT_CATEGORICAL,
            prompt=prompt,
            options={
                'temperature': 0.0,
                'top_p': 0.1 # Further reduces randomness by focusing on the most likely tokens
            }
        )
        
        answer = response['response'].strip().lower()
        
        if 'high confidence' in answer:
            result = 'High Confidence'
        elif 'plausible' in answer:
            result = 'Plausible'
        else:
            result = 'No Match'
        
        llm_judge_cache[cache_key] = result
        return result
    except Exception as e:
        # Do not cache failures: a transient error should not permanently
        # mark this pair as 'No Match' for the rest of the run.
        print(f"LLM Judge Error: {e}")
        return 'No Match'

def _calculate_metrics_from_counts(true_positives, generated_count, truth_count):
    if generated_count == 0 and truth_count == 0: return {"Precision": 1.0, "Recall": 1.0, "F1-Score": 1.0}
    precision = true_positives / generated_count if generated_count > 0 else 0
    recall = true_positives / truth_count if truth_count > 0 else 0
    f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
    return {"Precision": precision, "Recall": recall, "F1-Score": f1_score}

def run_comprehensive_evaluation(generated_df, truth_df):
    if generated_df.empty:
        return {"Strict Triplets": {"Precision": 0, "Recall": 0, "F1-Score": 0},
                "Resilient Triplets": {"Precision": 0, "Recall": 0, "F1-Score": 0},
                "Semantic (High Confidence)": {"Precision": 0, "Recall": 0, "F1-Score": 0},
                "Semantic (Plausible)": {"Precision": 0, "Recall": 0, "F1-Score": 0}}
    
    gen_norm = generated_df.rename(columns={'node_1': 'subject', 'node_2': 'object', 'edge': 'predicate'})
    gen_norm = gen_norm.apply(lambda x: x.astype(str).str.lower().str.strip())
    truth_norm = truth_df.apply(lambda x: x.astype(str).str.lower().str.strip())
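    # Denominators are row counts, so rows that collapse into the same triplet
    # after normalization are still counted individually.
    num_gen, num_truth = len(gen_norm), len(truth_norm)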

    gen_strict_set = set(map(tuple, gen_norm[['subject', 'predicate', 'object']].to_records(index=False)))
    truth_strict_set = set(map(tuple, truth_norm[['subject', 'predicate', 'object']].to_records(index=False)))
    strict_triplet_tp = len(gen_strict_set.intersection(truth_strict_set))

    gen_resilient_set = { (tuple(sorted((p.subject, p.object))), p.predicate) for p in gen_norm.itertuples(index=False) }
    truth_resilient_set = { (tuple(sorted((p.subject, p.object))), p.predicate) for p in truth_norm.itertuples(index=False) }
    resilient_triplet_tp = len(gen_resilient_set.intersection(truth_resilient_set))

    # Greedy one-to-one semantic matching: each generated triplet may claim at most
    # one still-unmatched truth triplet that shares the same (unordered) node pair.
    high_confidence_tp = 0
    plausible_tp = 0
    unmatched_truth_set = truth_resilient_set.copy()
    for gen_triplet in tqdm(gen_resilient_set, desc="Running Semantic Evaluation", leave=False):
        gen_nodes, gen_pred = gen_triplet
        best_match_found = 'No Match'
        matched_truth_triplet = None
        for truth_triplet in unmatched_truth_set:
            truth_nodes, truth_pred = truth_triplet
            if gen_nodes == truth_nodes:
                judgment = ask_llm_judge_categorical(gen_pred, truth_pred)
                if judgment == 'High Confidence':
                    best_match_found = 'High Confidence'
                    matched_truth_triplet = truth_triplet
                    break  # A high-confidence match cannot be improved on.
                elif judgment == 'Plausible':
                    # Remember the plausible match but keep looking for a better one.
                    best_match_found = 'Plausible'
                    matched_truth_triplet = truth_triplet
        if matched_truth_triplet is not None:
            # High-confidence matches also count toward the looser "Plausible" tier.
            if best_match_found == 'High Confidence':
                high_confidence_tp += 1
            plausible_tp += 1
            unmatched_truth_set.remove(matched_truth_triplet)
            
    return {
        "Strict Triplets": _calculate_metrics_from_counts(strict_triplet_tp, num_gen, num_truth),
        "Resilient Triplets": _calculate_metrics_from_counts(resilient_triplet_tp, num_gen, num_truth),
        "Semantic (High Confidence)": _calculate_metrics_from_counts(high_confidence_tp, num_gen, num_truth),
        "Semantic (Plausible)": _calculate_metrics_from_counts(plausible_tp, num_gen, num_truth)}

# ==============================================================================
# --- EXECUTION BLOCK ---
# ==============================================================================
# This block calls the functions defined above to perform the final analysis.

# --- 1. Aggregate Quantitative & Coreference --- 
quant_results = {name: [] for name in all_final_graphs[0].keys() if name != 'ground_truth'}
for sample in all_final_graphs:
    for name, df in sample.items():
        if name == 'ground_truth': continue
        quant_results[name].append(len(df))

avg_triplets = {name: pd.Series(counts).mean() for name, counts in quant_results.items()}

coref_results = {'Default Prompt': [], 'User-Supplied Prompt': []}
for sample in all_raw_graphs:
    for prompt_name, counts in coref_results.items():
        # Graph keys follow the pattern 'LLM (<prompt name>)'.
        graph_df = sample[f'LLM ({prompt_name})']
        if not graph_df.empty:
            counts.append(len(pd.concat([graph_df['node_1'], graph_df['node_2']]).unique()))

avg_coref = {name: pd.Series(counts).mean() for name, counts in coref_results.items()}

display(Markdown("### 7.1 Aggregated Quantitative and Coreference Analysis"))
print(f"Based on {len(all_final_graphs)} samples:\n")
print("Average Triplets Generated:", avg_triplets)
print("Average Unique Entities (Coreference):", avg_coref)

# --- 2. Aggregate Performance Metrics --- 
final_metrics = {name: [] for name in all_final_graphs[0].keys() if name != 'ground_truth'}

llm_judge_cache.clear()

for sample in tqdm(all_final_graphs, desc="Aggregating Final Evaluations"):
    gt = sample['ground_truth']
    for name, df in sample.items():
        if name == 'ground_truth': continue
        metrics = run_comprehensive_evaluation(df, gt)
        final_metrics[name].append(metrics)

aggregated_results = {}
for name, results_list in final_metrics.items():
    metric_types = results_list[0].keys()
    temp_agg = {}
    for mtype in metric_types:
        df = pd.DataFrame([res[mtype] for res in results_list])
        for col in df.columns:
            temp_agg[f"{col} ({mtype})"] = df[col].mean()
    aggregated_results[name] = temp_agg

final_df = pd.DataFrame.from_dict(aggregated_results, orient='index')

# --- 3. Format Final DataFrame for Percentage Display ---
# Multiply all performance scores by 100
final_df_percent = final_df * 100

# Use pandas styling to format every column to show two decimal places and a '%' sign
styled_df = final_df_percent.style.format("{:.2f}%")

display(Markdown("### 7.2 Aggregated Performance Metrics vs. Ground Truth"))
display(styled_df)

### 7.1 Aggregated Quantitative and Coreference Analysis

Based on 100 samples:

Average Triplets Generated: {'LLM (Default Prompt)': 96.42, 'LLM (User-Supplied Prompt)': 95.34, 'Baseline (SVO + Embeddings)': 47.68, 'Baseline (Distant Supervision)': 0.0}
Average Unique Entities (Coreference): {'Default Prompt': 172.03, 'User-Supplied Prompt': 155.74}
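
As a rough reading of the averages above, the relative-reduction formula used in `compare_coreference_resolution` can be applied to the two mean entity counts. This is only a sketch: the notebook computes that percentage for a single pair of graphs, so applying it to aggregate means is an assumption made here for illustration, and the variable names are hypothetical.

```python
# Mean unique-entity counts copied from the output above.
avg_default_entities = 172.03
avg_user_entities = 155.74

# Relative reduction in unique entities, as a percentage of the default prompt's count.
reduction_pct = (avg_default_entities - avg_user_entities) / avg_default_entities * 100
print(f"Relative reduction in unique entities: {reduction_pct:.2f}%")
```

A lower entity count is only a proxy for better coreference resolution; it would also drop if a prompt simply extracted fewer facts, so it is best read alongside the average triplet counts.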



Running Semantic Evaluation:   0%|          | 0/56 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/294 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/300 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/128 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/222 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/215 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/168 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/173 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/157 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/65 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/279 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/294 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/185 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/123 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/116 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/43 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/33 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/42 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/20 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/70 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/75 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/36 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/63 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/61 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/30 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/11 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/12 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/10 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/41 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/45 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/15 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/53 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/52 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/20 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/11 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/11 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/7 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/338 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/363 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/137 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/7 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/10 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/3 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/7 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/5 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/3 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/26 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/27 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/9 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/16 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/16 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/10 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/279 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/274 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/124 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/95 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/84 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/51 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/27 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/28 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/17 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/88 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/88 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/55 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/19 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/17 [00:00<?, ?it/s]

Running Semantic Evaluation:   0%|          | 0/6 [00:00<?, ?it/s]

### 7.2 Aggregated Performance Metrics vs. Ground Truth

| Method | Precision (Strict Triplets) | Recall (Strict Triplets) | F1-Score (Strict Triplets) | Precision (Resilient Triplets) | Recall (Resilient Triplets) | F1-Score (Resilient Triplets) | Precision (Semantic, High Confidence) | Recall (Semantic, High Confidence) | F1-Score (Semantic, High Confidence) | Precision (Semantic, Plausible) | Recall (Semantic, Plausible) | F1-Score (Semantic, Plausible) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LLM (Default Prompt) | 3.57% | 16.94% | 5.50% | 4.52% | 20.10% | 6.86% | 14.96% | 46.65% | 20.28% | 15.04% | 46.81% | 20.37% |
| LLM (User-Supplied Prompt) | 3.82% | 17.96% | 5.87% | 4.61% | 20.36% | 6.91% | 14.35% | 44.31% | 19.32% | 14.44% | 44.46% | 19.42% |
| Baseline (SVO + Embeddings) | 2.63% | 7.78% | 3.41% | 3.59% | 10.20% | 4.59% | 15.68% | 28.16% | 17.35% | 15.74% | 28.21% | 17.41% |
| Baseline (Distant Supervision) | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
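
A note on reading the F1 columns: they are not the harmonic mean of the aggregate precision and recall in the same row. For the first row, 2 · 3.57 · 16.94 / (3.57 + 16.94) ≈ 5.90%, whereas the table reports 5.50%. This is consistent with metrics being computed per sample and then averaged, although the aggregation code is not shown in this section. The following is only a minimal sketch of that interpretation, assuming macro-averaging over samples and exact set matching of triplets; `prf`, `macro_average`, and the toy `samples` are hypothetical and are not taken from the notebook.

```python
def prf(predicted, gold):
    """Precision, recall and F1 for one sample, treating triplets as set members."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # triplets present in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def macro_average(samples):
    """Average per-sample (precision, recall, F1) tuples over all samples."""
    scores = [prf(predicted, gold) for predicted, gold in samples]
    n = len(scores)
    return tuple(sum(score[i] for score in scores) / n for i in range(3))


# Toy data (assumption): two samples of (predicted triplets, gold triplets).
samples = [
    ([("a", "r", "b"), ("a", "r", "c")], [("a", "r", "b")]),
    ([("x", "r", "y")], [("x", "r", "y"), ("x", "r", "z"), ("x", "q", "w")]),
]

macro_p, macro_r, macro_f1 = macro_average(samples)

# Harmonic mean of the already-averaged precision and recall, for comparison.
f1_of_means = 2 * macro_p * macro_r / (macro_p + macro_r)

print(f"macro P={macro_p:.4f}  R={macro_r:.4f}  F1={macro_f1:.4f}")
print(f"F1 recomputed from averaged P and R: {f1_of_means:.4f}")
```

On this toy input the averaged F1 is lower than the F1 recomputed from the averaged precision and recall, the same direction of gap seen in the table. If the notebook instead pools counts across samples (micro-averaging), the two values would coincide, so the table's figures would need a different explanation.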
