TheatreNet: Knowledge Graph Construction and Reconciliation Pipeline
Integrating Theatrical Heritage through Semantic Modeling and AI

This notebook implements the technical workflow for TheatreNet, a project developed in collaboration with Promemoria Group. The goal is to transform heterogeneous, fragmented datasets from the Teatro Regio di Parma and the Fondazione I Teatri di Reggio Emilia into a unified, event-centric Knowledge Graph.

This pipeline covers data ingestion, semantic enrichment via vector embeddings, and multi-layered entity resolution to create "Golden Records."

In [1]:
%pip install neo4j python-dotenv sentence-transformers

Note: you may need to restart the kernel to use updated packages.


Data Ingestion Pipeline: The Teatro Regio di Parma Corpus

This script executes the core ingestion phase for the Teatro Regio di Parma dataset, transforming static CSV records into the foundational nodes of the TheatreNet graph.

The process is organized into the following functional modules:

- System Initialization: Deletes all existing nodes, relationships, and constraints, then creates uniqueness constraints so that each source identifier maps to exactly one node.

- Entity Ingestion (People & Works): Creates Person and Work nodes, mapping biographical metadata and primary creative relationships such as composers and librettists.

- Structural Layers (Seasons & Productions): Models the institutional context and the "Performance Plan" (FRBRoo F25), connecting creative teams (directors, set designers) to their specific productions.

- Event Mapping (Performances): Captures the concrete performance events (FRBRoo F31), linking singers to characters and conductors to specific dates and venues.
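
The import query for works unpacks composer and librettist references with nested SPLIT calls, which suggests cells shaped like a label followed by the person identifier in parentheses. A minimal Python sketch of that extraction, under that assumption; the function name and the sample values are hypothetical and not taken from the dataset:

```python
from typing import Optional


def extract_parenthesized_id(cell: Optional[str]) -> Optional[str]:
    """Python mirror of Cypher's SPLIT(SPLIT(cell, '(')[1], ')')[0]."""
    if cell is None:
        return None
    parts = cell.split("(")
    if len(parts) < 2:
        # No opening parenthesis: the Cypher expression evaluates to null,
        # so the subsequent MATCH on the person identifier finds nothing.
        return None
    return parts[1].split(")")[0]


# Hypothetical cell values, used only to show the behaviour
print(extract_parenthesized_id("Verdi, Giuseppe (1234)"))
print(extract_parenthesized_id("no identifier here"))
```

A cell without parentheses yields no identifier, which is why such rows never reach the composer or librettist MERGE in the query.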

Key feature: the import creates a separate ID node, with a regio_-prefixed code, for the Person, Work, Season, Production, and Performance records it ingests. This keeps data provenance explicit and prepares the graph for the subsequent entity resolution and reconciliation phases.
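
A minimal sketch of how the ID node codes are composed, mirroring the string concatenations used in the Cypher of this notebook (`'regio_' + row.person_id` and `row.production_id + '_' + row.id_recita`). The helper names and the sample identifiers are hypothetical:

```python
def make_id_code(source_prefix: str, local_id: str) -> str:
    """Build an ID node code: source prefix, underscore, source-local identifier."""
    return f"{source_prefix}_{local_id}"


def make_performance_key(production_id: str, recita_id: str) -> str:
    """Build the composite Regio performance identifier from its two parts."""
    return f"{production_id}_{recita_id}"


# The same local identifier coming from two institutions yields two distinct codes
print(make_id_code("regio", "42"))
print(make_id_code("fondazione", "42"))

# Performance codes combine the production id and the performance id
print(make_id_code("regio", make_performance_key("7", "3")))
```

Because the prefix is part of the code, the uniqueness constraint on ID.code can hold for both institutions even when their local identifiers overlap.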

Source file: property_graph/1_cypher_regio.py

In [None]:
from neo4j import GraphDatabase
from dotenv import load_dotenv
import os, sys, traceback

dotenv_path = "/Users/elenabinotti/Documents/scuola/unibo/LM-43 DHDK/promemoria group/env.env"
load_dotenv(dotenv_path=dotenv_path)

user = os.getenv("ID")
password = os.getenv("SECRET_KEY")
uri_db = "bolt://archiuidev.promemoriagroup.com:7687"

FILE_REGIO_OPERE = 'https://raw.githubusercontent.com/elena2notti/theatreNet/refs/heads/main/regio/regio_opere_pulito_con_anno.csv'
FILE_REGIO_PERSONE = 'https://raw.githubusercontent.com/elena2notti/theatreNet/refs/heads/main/regio/regio_persone.csv' 
FILE_REGIO_STAGIONI = 'https://raw.githubusercontent.com/elena2notti/theatreNet/refs/heads/main/regio/regio_stagioni.csv'
FILE_REGIO_PRODUZIONI = 'https://raw.githubusercontent.com/elena2notti/theatreNet/refs/heads/main/regio/regio_produzioni.csv'
FILE_REGIO_RECITE = 'https://media.githubusercontent.com/media/elena2notti/theatreNet/main/regio/recite-regio-luoghi-qid2.csv'

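def execute_cypher_script(tx, script):
    """Run a Cypher script and return the first value of its single result row."""
    record = tx.run(script).single()
    # single() returns None when the query yields no rows
    if record is None:
        return "Nessun risultato restituito."
    return record.value()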

def run_import_step(driver, command, step_name):
    with driver.session() as session:
        print(f"\n--- Inizio: {step_name} ---")
        try:
            result_summary = session.execute_write(execute_cypher_script, command)
            print(f"SUCCESSO: {step_name} completato.")
            print(f"Risultati: {result_summary}")
        except Exception as e:
            print(f"ERRORE CRITICO in {step_name}: {e}")
            print(">>> Il processo continua con lo step successivo...")

def clean_db(driver):
    print("\n--- 0. PULIZIA DATABASE (DETACH DELETE e rimozione vincoli) ---")
    with driver.session() as session:
        try:
            session.run("CALL apoc.schema.assert({}, {})").consume()
            print("Vincoli e indici rimossi.")
        except Exception:
            try:
                result = session.run("SHOW CONSTRAINTS")
                for record in result:
                    session.run(f"DROP CONSTRAINT {record['name']}").consume()
            except Exception:
                # Best effort: continue even if constraints cannot be listed or dropped
                pass
        
        try:
            session.run("MATCH (n) DETACH DELETE n").consume()
            print("Database pulito con successo.")
        except Exception as e:
            print(f"Errore pulizia: {e}")

def create_constraints(driver):
    print("\n--- 0.1 CREAZIONE VINCOLI DI UNICITÀ REGIO ---")
    with driver.session() as session:
        constraints = [
            "CREATE CONSTRAINT person_id_regio_unique IF NOT EXISTS FOR (p:Person) REQUIRE p.internal_id_regio IS UNIQUE",
            "CREATE CONSTRAINT id_code_unique IF NOT EXISTS FOR (i:ID) REQUIRE i.code IS UNIQUE",   
            "CREATE CONSTRAINT work_id_regio_unique IF NOT EXISTS FOR (o:Work) REQUIRE o.internal_id_regio IS UNIQUE",
            "CREATE CONSTRAINT season_id_regio_unique IF NOT EXISTS FOR (s:Season) REQUIRE s.internal_id_regio IS UNIQUE",
            "CREATE CONSTRAINT performance_id_regio_unique IF NOT EXISTS FOR (r:Performance) REQUIRE r.internal_id_regio IS UNIQUE",
            "CREATE CONSTRAINT production_id_regio_unique IF NOT EXISTS FOR (p:Production) REQUIRE p.internal_id_regio IS UNIQUE",
            "CREATE CONSTRAINT organizer_id_regio_unique IF NOT EXISTS FOR (o:Organizer) REQUIRE o.internal_id_regio IS UNIQUE",
            "CREATE CONSTRAINT ensemble_id_regio_unique IF NOT EXISTS FOR (e:Ensemble) REQUIRE e.internal_id_regio IS UNIQUE"
        ]
        
        for constraint in constraints:
            try:
                session.run(constraint).consume()
            except Exception:
                pass
        print("Vincoli creati.")


# 1. Import people
cypher_import_persone = f"""
LOAD CSV WITH HEADERS FROM '{FILE_REGIO_PERSONE}' AS row
FIELDTERMINATOR ','
WITH row 
WHERE row.person_id IS NOT NULL AND TRIM(row.person_id) <> ''
MERGE (p:Person {{internal_id_regio: row.person_id}})
ON CREATE SET 
    p.name = row.full_name,
    p.full_name = row.full_name,
    p.wikidata_qid = row.wikidata_id,
    p.wikidata_uri = row.wikidata_uri,
    p.birth_date = row.birth_date,
    p.birth_place = row.birth_place,
    p.death_date = row.death_date,
    p.death_place = row.death_place,
    p.occupation = row.occupation,
    p.viaf = row.viaf,
    p.source = 'Regio'

// Create the ID node
MERGE (id_node:ID {{code: 'regio_' + row.person_id}})
ON CREATE SET id_node.source = 'Regio'
MERGE (id_node)-[:IS_ID_OF]->(p)

RETURN count(p) AS total_people;
"""
# 2. Import works
cypher_import_opere_complete = f"""
LOAD CSV WITH HEADERS FROM '{FILE_REGIO_OPERE}' AS row
FIELDTERMINATOR ','
WITH row
WHERE row.compositions_id IS NOT NULL AND TRIM(row.compositions_id) <> ''

// --- A. CREATE WORK ---
MERGE (o:Work {{internal_id_regio: row.compositions_id}})
ON CREATE SET 
    o.title = row.dcTitle,
    o.year = CASE WHEN row.Anno IS NOT NULL AND row.Anno <> '' THEN toInteger(row.Anno) ELSE NULL END,
    o.wikidata_qid = row.wikidata_entity_id,
    o.wikidata_uri = row.composizione_uri,
    o.from_date = row.from,
    o.to_date = row.to,
    o.source = 'Regio'

// Create the ID node
MERGE (id_node:ID {{code: 'regio_' + row.compositions_id}})
ON CREATE SET id_node.source = 'Regio'
MERGE (id_node)-[:IS_ID_OF]->(o)

// --- B. LINK COMPOSER (bidirectional) ---
// Note: from here on, each WITH ... WHERE / MATCH step drops the rows that fail it,
// so those rows skip the later sections and are not counted in the final RETURN
WITH o, row
WHERE row.autore_musica IS NOT NULL AND TRIM(row.autore_musica) <> ''
WITH o, row, SPLIT(SPLIT(row.autore_musica, '(')[1], ')')[0] AS comp_id
MATCH (comp:Person {{internal_id_regio: comp_id}})
// Relationship 1: the Work HAS the composer
MERGE (o)-[:HAS_COMPOSER]->(comp)
// Relationship 2: the Person IS COMPOSER of the Work
MERGE (comp)-[:IS_COMPOSER]->(o)

// --- C. LINK LIBRETTIST (bidirectional) ---
WITH o, row
WHERE row.autore_testo IS NOT NULL AND TRIM(row.autore_testo) <> ''
WITH o, row, SPLIT(SPLIT(row.autore_testo, '(')[1], ')')[0] AS lib_id
MATCH (lib:Person {{internal_id_regio: lib_id}})
// Relationship 1: the Work HAS the librettist
MERGE (o)-[:HAS_LIBRETTIST]->(lib)
// Relationship 2: the Person IS LIBRETTIST of the Work
MERGE (lib)-[:IS_LIBRETTIST]->(o)

// --- D. LINK LITERARY AUTHOR (bidirectional) ---
WITH o, row
WHERE row.literary_author_id IS NOT NULL AND TRIM(row.literary_author_id) <> ''
MATCH (lit:Person {{internal_id_regio: row.literary_author_id}})
// Relationship 1: the Work HAS the literary author
MERGE (o)-[:HAS_LITERARY_AUTHOR]->(lit)
// Relationship 2: the Person IS LITERARY AUTHOR of the Work
MERGE (lit)-[:IS_LITERARY_AUTHOR]->(o)

// --- E. CREATE CHARACTERS (HAS_CHARACTER) ---
WITH o, row
WHERE row.character_wikidata_id IS NOT NULL AND TRIM(row.character_wikidata_id) <> ''
MERGE (c:Character {{wikidata_qid: row.character_wikidata_id}})
ON CREATE SET
    c.name = row.character_name,
    c.voice_type = row.voice_type,
    c.gender = row.character_gender,
    c.source = 'Regio'
MERGE (o)-[:HAS_CHARACTER]->(c)

RETURN count(o) AS total_works;
"""

# 3. Import seasons
cypher_import_stagioni = f"""
LOAD CSV WITH HEADERS FROM '{FILE_REGIO_STAGIONI}' AS row
FIELDTERMINATOR ','
WITH row
WHERE row.season_id IS NOT NULL AND TRIM(row.season_id) <> ''

// --- A. CREATE SEASON ---
MERGE (s:Season {{internal_id_regio: row.season_id}})
ON CREATE SET 
    s.title = row.season_title,
    s.type = row.season_type,
    s.start_date = row.season_start_date,
    s.end_date = row.season_end_date,
    s.source = 'Regio'

// Create the ID node
MERGE (id_node:ID {{code: 'regio_' + row.season_id}})
ON CREATE SET id_node.source = 'Regio'
MERGE (id_node)-[:IS_ID_OF]->(s)

// --- B. CREATE ORGANIZER ---
WITH s, row
WHERE row.organizer_id IS NOT NULL AND TRIM(row.organizer_id) <> ''
MERGE (org:Organizer {{internal_id_regio: row.organizer_id}})
ON CREATE SET 
    org.name = row.organizer_name,
    org.source = 'Regio'

// --- C. LINK SEASON -> ORGANIZER ---
MERGE (s)-[:ORGANIZED_BY]->(org)

// --- D. LINK SEASON -> PRODUCTIONS ---
WITH s, row
WHERE row.linked_production_ids IS NOT NULL AND TRIM(row.linked_production_ids) <> ''
UNWIND SPLIT(row.linked_production_ids, ',') AS production_id
WITH s, TRIM(production_id) AS prod_id
WHERE prod_id <> ''
MERGE (p:Production {{internal_id_regio: prod_id}})
ON CREATE SET p.source = 'Regio'
MERGE (s)-[:INCLUDES_PRODUCTION]->(p)
MERGE (p)-[:IS_PART_OF]->(s)

RETURN count(s) AS total_seasons;
"""

# 4. Import productions
cypher_import_produzioni_recite = f"""
LOAD CSV WITH HEADERS FROM '{FILE_REGIO_PRODUZIONI}' AS row
FIELDTERMINATOR ','
WITH row
WHERE row.production_id IS NOT NULL AND TRIM(row.production_id) <> ''

// --- A. CREATE PRODUCTION (always executed) ---
MERGE (r:Production {{internal_id_regio: row.production_id}})
ON CREATE SET
    r.title = row.work_title,
    r.start_date = row.performance_start_date,
    r.end_date = row.performance_end_date, 
    r.year = CASE WHEN row.year IS NOT NULL THEN toInteger(row.year) ELSE NULL END,
    r.first_location = row.first_location,
    r.first_venue = row.first_venue,
    r.source = 'Regio'
// If the node already exists, refresh the title to be safe
ON MATCH SET
    r.title = row.work_title

MERGE (id_node:ID {{code: 'regio_' + row.production_id}})
ON CREATE SET id_node.source = 'Regio'
MERGE (id_node)-[:IS_ID_OF]->(r)

// --- B. LINK WORK (SAFE MODE) ---
// OPTIONAL MATCH keeps the row alive when the Work node is missing
// (rows with an empty related_work_id are still removed by the WHERE below)
WITH r, row
WHERE row.related_work_id IS NOT NULL AND TRIM(row.related_work_id) <> ''
OPTIONAL MATCH (w:Work {{internal_id_regio: row.related_work_id}})

// FOREACH trick: the MERGE runs only when 'w' was found (is not null)
FOREACH (_ IN CASE WHEN w IS NOT NULL THEN [1] ELSE [] END |
    MERGE (r)-[:RELATED_TO_WORK]->(w)
    MERGE (w)-[:RELATES_TO]->(r)
)

// --- C. LINK STAFF (SAFE MODE) ---
// Continue from 'r' and 'row' whether or not step B created a link
WITH r, row
WHERE row.person_id IS NOT NULL AND TRIM(row.person_id) <> ''
// OPTIONAL MATCH here as well, in case the person is missing from the people import
OPTIONAL MATCH (p:Person {{internal_id_regio: row.person_id}})

WITH r, row, p,
     CASE
        WHEN row.person_role = 'Regista' THEN 'DIRECTED'
        WHEN row.person_role = 'Scenografo' THEN 'DESIGNED_SET'
        WHEN row.person_role = 'Coreografo' THEN 'CHOREOGRAPHED'
        WHEN row.person_role CONTAINS 'Costumista' THEN 'DESIGNED_COSTUMES'
        ELSE 'HAD_ROLE_IN'
     END AS relation_type

// Run the merges only when person 'p' was found
FOREACH (_ IN CASE WHEN p IS NOT NULL AND relation_type = 'DIRECTED' THEN [1] ELSE [] END |
    MERGE (p)-[:DIRECTED]->(r)
)
FOREACH (_ IN CASE WHEN p IS NOT NULL AND relation_type = 'DESIGNED_SET' THEN [1] ELSE [] END |
    MERGE (p)-[:DESIGNED_SET]->(r)
)
FOREACH (_ IN CASE WHEN p IS NOT NULL AND relation_type = 'CHOREOGRAPHED' THEN [1] ELSE [] END |
    MERGE (p)-[:CHOREOGRAPHED]->(r)
)
FOREACH (_ IN CASE WHEN p IS NOT NULL AND relation_type = 'DESIGNED_COSTUMES' THEN [1] ELSE [] END |
    MERGE (p)-[:DESIGNED_COSTUMES]->(r)
)
FOREACH (_ IN CASE WHEN p IS NOT NULL AND relation_type = 'HAD_ROLE_IN' THEN [1] ELSE [] END |
    MERGE (p)-[rel:HAD_ROLE_IN]->(r)
    SET rel.role = row.person_role
)

// Return the number of DISTINCT production nodes created or touched
RETURN count(DISTINCT r) AS distinct_productions_processed;
"""

# 5. Import performance details
cypher_import_dettagli_performance = f"""
LOAD CSV WITH HEADERS FROM '{FILE_REGIO_RECITE}' AS row
FIELDTERMINATOR ','
WITH row
WHERE row.production_id IS NOT NULL AND TRIM(row.production_id) <> ''
  AND row.id_recita IS NOT NULL AND TRIM(row.id_recita) <> ''

WITH row, row.production_id + '_' + row.id_recita AS unique_perf_id

// --- 1. MATCH PRODUCTION AND CREATE PERFORMANCE ---
MERGE (rec:Performance {{internal_id_regio: unique_perf_id}})
ON CREATE SET
    rec.internal_id_dettaglio = row.id_recita,
    rec.title = row.titolo_breve,
    rec.date = row.from,
    rec.venue = row.luogo_nome,
    rec.building = row.edificio_nome,
    rec.source = 'Regio'

MERGE (id_node:ID {{code: 'regio_' + unique_perf_id}})
ON CREATE SET id_node.source = 'Regio'
MERGE (id_node)-[:IS_ID_OF]->(rec)

// Link to the parent Production
WITH rec, row
MERGE (prod:Production {{internal_id_regio: row.production_id}})
MERGE (prod)-[:HAS_PERFORMANCE]->(rec)

// --- 1.5 (NEW) LINK DIRECTLY TO THE WORK ---
// This aligns the model with the one used for the Fondazione data
WITH rec, row
WHERE row.composizione_id IS NOT NULL AND TRIM(row.composizione_id) <> ''
MATCH (w:Work {{internal_id_regio: row.composizione_id}})
MERGE (rec)-[:RELATED_TO_WORK]->(w)
MERGE (w)-[:RELATES_TO]->(rec)

// --- 2. LINK CONDUCTORS ---
WITH rec, row
WHERE row.curatore_id IS NOT NULL AND TRIM(row.curatore_id) <> ''
  AND row.curatore_ruolo IS NOT NULL
MERGE (cur:Person {{internal_id_regio: row.curatore_id}})
ON CREATE SET cur.name = row.curatore_nome, cur.source = 'Regio'
WITH rec, row, cur
FOREACH (i IN CASE WHEN row.curatore_ruolo CONTAINS 'Direttore' THEN [1] ELSE [] END |
    MERGE (cur)-[:CONDUCTED]->(rec)
)

// --- 3. PERFORMERS AND CHARACTERS ---
WITH rec, row
WHERE row.interprete_id IS NOT NULL AND TRIM(row.interprete_id) <> ''
  AND row.personaggio IS NOT NULL AND TRIM(row.personaggio) <> ''
MERGE (int:Person {{internal_id_regio: row.interprete_id}})
ON CREATE SET int.name = row.interprete, int.source = 'Regio'

WITH rec, row, int
MERGE (char:Character {{name: row.personaggio}})
ON CREATE SET char.voice_type = row.personaggio_voce, char.source = 'Regio'

MERGE (int)-[:INTERPRETED]->(char)
MERGE (char)-[:APPEARED_IN]->(rec)

MERGE (int)-[r:PERFORMED_IN]->(rec)

FOREACH (_ IN CASE WHEN row.ruolo IS NOT NULL AND TRIM(row.ruolo) <> '' THEN [1] ELSE [] END |
    SET r.role = row.ruolo
)

// --- 4. GROUP PERFORMERS (ENSEMBLES) ---
WITH rec, row
WHERE row.esecutore_id IS NOT NULL AND TRIM(row.esecutore_id) <> ''
MERGE (e:Ensemble {{internal_id_regio: row.esecutore_id}})
ON CREATE SET
    e.name = row.esecutore_nome,
    e.type = row.esecutore_ruolo,
    e.source = 'Regio'
MERGE (e)-[:PARTICIPATED_IN]->(rec)

RETURN count(rec) AS total_performances;
"""

driver = None 
try:
    driver = GraphDatabase.driver(uri_db, auth=(user, password))
    driver.verify_connectivity()
    print(f"Connessione a Neo4j stabilita all'URI: {uri_db}")
    
    print("\n[STEP 0/5] Esecuzione pulizia database e creazione vincoli...")
    clean_db(driver) 
    create_constraints(driver) 
    
    print("\n[STEP 1/5] Importazione Persone...")
    run_import_step(driver, cypher_import_persone, "1. Importazione Nodi Person")

    print("\n[STEP 2/5] Importazione Opere...")
    run_import_step(driver, cypher_import_opere_complete, "2. Importazione Works")

    print("\n[STEP 3/5] Importazione Stagioni...")
    run_import_step(driver, cypher_import_stagioni, "3. Importazione Seasons")

    print("\n[STEP 4/5] Importazione Produzioni...")
    run_import_step(driver, cypher_import_produzioni_recite, "4. Importazione Productions")

    print("\n[STEP 5/5] Importazione Performances...")
    run_import_step(driver, cypher_import_dettagli_performance, "5. Importazione Performances")

    # NOTE: Step 6 removed.
    # To merge nodes, run apoc.refactor.mergeNodes AFTER the Fondazione data has also been loaded.

    print("\n>>> SUCCESSO: Importazione Regio (English + Updated ID) completata!")
    
except Exception as e:
    print("\n!!! ERRORE FATALE DURANTE IL PROCESSO DI UPLOAD !!!")
    print(e)
    traceback.print_exc(file=sys.stdout)
finally:
    if driver:
        driver.close()
        print("Connessione a Neo4j chiusa.")

Connessione a Neo4j stabilita all'URI: bolt://archiuidev.promemoriagroup.com:7687

[STEP 0/5] Esecuzione pulizia database e creazione vincoli...

--- 0. PULIZIA DATABASE (DETACH DELETE e rimozione vincoli) ---
Vincoli e indici rimossi.
Database pulito con successo.

--- 0.1 CREAZIONE VINCOLI DI UNICITÀ REGIO ---
Vincoli creati.

[STEP 1/5] Importazione Persone...

--- Inizio: 1. Importazione Nodi Person ---
SUCCESSO: 1. Importazione Nodi Person completato.
Risultati: 18039

[STEP 2/5] Importazione Opere...

--- Inizio: 2. Importazione Works ---
SUCCESSO: 2. Importazione Works completato.
Risultati: 887

[STEP 3/5] Importazione Stagioni...

--- Inizio: 3. Importazione Seasons ---
SUCCESSO: 3. Importazione Seasons completato.
Risultati: 1162

[STEP 4/5] Importazione Produzioni...

--- Inizio: 4. Importazione Productions ---
SUCCESSO: 4. Importazione Productions completato.
Risultati: 483

[STEP 5/5] Importazione Performances...

--- Inizio: 5. Importazione Performances ---
SUCCESSO: 5. I

Data Ingestion Pipeline: The Fondazione I Teatri Corpus

This script executes the ingestion phase for the Fondazione I Teatri dataset, integrating Reggio Emilia's theatrical records into the TheatreNet infrastructure.

The process is organized into the following functional modules:

- Semantic Mapping (People & Works): Ingests Person and Work nodes, prioritizing Wikidata QIDs for future reconciliation and mapping creative roles.

- Structural Organization (Productions): Translates institutional role descriptions into standardized semantic relationships, linking productions to their creative teams.

- Event Hierarchy (Performances): Maps concrete performance events, establishing the complex "triangle" between performers, characters, and musical works.

- Relational Linking: Connects performances to their parent productions and seasons.
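
The role translation can be sketched in Python as an ordered list of keyword rules, mirroring the CASE expression in the productions query of this script. The rule table and function name are illustrative, and the sample role strings are hypothetical:

```python
from typing import Optional

# Ordered rules: the first rule whose keyword occurs in the role text wins,
# as in a Cypher CASE expression. Matching is case-sensitive, like CONTAINS.
ROLE_RULES = [
    (("Regista", "regia"), "DIRECTED"),
    (("Scenografo", "scene"), "DESIGNED_SET"),
    (("Coreografo", "coreografia"), "CHOREOGRAPHED"),
    (("Costumista", "costumi"), "DESIGNED_COSTUMES"),
]


def relation_type(role_text: Optional[str]) -> str:
    """Map a free-text role description to a relationship type."""
    if role_text is None:
        return "HAD_ROLE_IN"
    for keywords, relation in ROLE_RULES:
        if any(keyword in role_text for keyword in keywords):
            return relation
    # Anything unrecognised falls back to the generic relationship
    return "HAD_ROLE_IN"


for sample in ["Regista", "scene e costumi", "Maestro del coro"]:
    print(sample, "->", relation_type(sample))
```

Since only the first matching rule applies, a combined description such as "scene e costumi" produces a single relationship; splitting such descriptions beforehand would be needed to record both roles.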

Key feature: use of institution-specific ID nodes and prefixes. This ensures strict data lineage and prevents identifier collisions, preparing the graph for the final multi-source reconciliation phase.
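
As a rough illustration of why the prefixed codes and the Wikidata QIDs matter for the later reconciliation, the sketch below groups records by QID and keeps the QIDs that occur under more than one source prefix. It is not the project's reconciliation code; the function name, the record shape, and the QID values are placeholders:

```python
from collections import defaultdict
from typing import Dict, List


def cross_source_candidates(records: List[dict]) -> Dict[str, List[str]]:
    """Return QIDs shared by records from different sources, with their ID codes."""
    codes_by_qid = defaultdict(list)
    for record in records:
        qid = record.get("wikidata_qid")
        if qid:  # records without a QID cannot be matched this way
            codes_by_qid[qid].append(record["code"])

    candidates = {}
    for qid, codes in codes_by_qid.items():
        # The source is the part of the code before the first underscore
        sources = {code.split("_", 1)[0] for code in codes}
        if len(sources) > 1:
            candidates[qid] = sorted(codes)
    return candidates


# Placeholder records: codes and QIDs are invented for the example
records = [
    {"code": "regio_10", "wikidata_qid": "QX1"},
    {"code": "fondazione_10", "wikidata_qid": "QX1"},
    {"code": "regio_11", "wikidata_qid": "QX2"},
    {"code": "fondazione_12", "wikidata_qid": None},
]
print(cross_source_candidates(records))
```

Records that lack a QID fall outside this simple grouping and would need another matching strategy.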

Source file: property_graph/2_cypher_fondazione.py

In [3]:
from neo4j import GraphDatabase
from dotenv import load_dotenv
import os, sys, traceback

dotenv_path = "/Users/elenabinotti/Documents/scuola/unibo/LM-43 DHDK/promemoria group/env.env"
load_dotenv(dotenv_path=dotenv_path)

user = os.getenv("ID")
password = os.getenv("SECRET_KEY")
uri_db = "bolt://archiuidev.promemoriagroup.com:7687"

FILE_FONDAZIONE_OPERE = 'https://raw.githubusercontent.com/elena2notti/theatreNet/refs/heads/main/fondazione/fondazione-iteatri-opere-musicali-wiki-reconciled.csv'
FILE_FONDAZIONE_PERSONE = 'https://raw.githubusercontent.com/elena2notti/theatreNet/refs/heads/main/fondazione/persone.csv' 
FILE_FONDAZIONE_STAGIONI = 'https://raw.githubusercontent.com/elena2notti/theatreNet/refs/heads/main/fondazione/stagioni.csv'
FILE_FONDAZIONE_PRODUZIONI = 'https://raw.githubusercontent.com/elena2notti/theatreNet/refs/heads/main/fondazione/produzioni_clean.csv'
FILE_FONDAZIONE_RECITE = 'https://raw.githubusercontent.com/elena2notti/theatreNet/refs/heads/main/fondazione/recite_fondazione_con_qid.csv'
FILE_FONDAZIONE_LINKS = 'https://raw.githubusercontent.com/elena2notti/theatreNet/refs/heads/main/fondazione/20251125_fondazione-iteatri-export-produzione-recite.csv'

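def execute_cypher_script(tx, script):
    """Run a Cypher script and return the first value of its single result row."""
    record = tx.run(script).single()
    # single() returns None when the query yields no rows
    if record is None:
        return "Nessun risultato restituito."
    return record.value()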

def run_import_step(driver, command, step_name):
    with driver.session() as session:
        print(f"\n--- Inizio: {step_name} ---")
        try:
            result_summary = session.execute_write(execute_cypher_script, command)
            print(f"SUCCESSO: {step_name} completato.")
            print(f"Risultati: {result_summary}")
        except Exception as e:
            print(f"ERRORE CRITICO in {step_name}: {e}")
            print(">>> Il processo continua con lo step successivo...")

def create_constraints_fondazione(driver):
    print("\n--- 0.1 CREAZIONE VINCOLI (ENGLISH) ---")
    with driver.session() as session:
        constraints = [
            "CREATE CONSTRAINT person_internal_id_fond_unique IF NOT EXISTS FOR (p:Person) REQUIRE p.internal_id_fondazione IS UNIQUE",
            "CREATE CONSTRAINT work_internal_id_fondazione_unique IF NOT EXISTS FOR (o:Work) REQUIRE o.internal_id_fondazione IS UNIQUE",
            "CREATE CONSTRAINT id_code_unique IF NOT EXISTS FOR (i:ID) REQUIRE i.code IS UNIQUE",
            "CREATE CONSTRAINT season_internal_id_fondazione_unique IF NOT EXISTS FOR (s:Season) REQUIRE s.internal_id_fondazione IS UNIQUE",
            "CREATE CONSTRAINT production_internal_id_fondazione_unique IF NOT EXISTS FOR (p:Production) REQUIRE p.internal_id_fondazione IS UNIQUE",
            "CREATE CONSTRAINT performance_internal_id_fondazione_unique IF NOT EXISTS FOR (r:Performance) REQUIRE r.internal_id_fondazione IS UNIQUE",
            "CREATE CONSTRAINT ensemble_internal_id_fondazione_unique IF NOT EXISTS FOR (e:Ensemble) REQUIRE e.internal_id_fondazione IS UNIQUE",
            "CREATE CONSTRAINT building_id_fondazione_unique IF NOT EXISTS FOR (b:Building) REQUIRE b.internal_id_fondazione IS UNIQUE"
        ]
        for c in constraints:
            try:
                session.run(c).consume()
            except Exception:
                pass
        print("Vincoli verificati.")

# 1. Import people
cypher_import_persone = f"""
LOAD CSV WITH HEADERS FROM '{FILE_FONDAZIONE_PERSONE}' AS row 
FIELDTERMINATOR ',' 
WITH row WHERE row.id IS NOT NULL AND TRIM(row.id) <> ''

MERGE (p:Person {{internal_id_fondazione: row.id}})
ON CREATE SET 
    p.name = row.dcTitle,
    p.wikidata_qid = row.entity,
    p.wikidata_uri = row.uri,
    p.source = 'Fondazione'
ON MATCH SET
    p.wikidata_qid = row.entity,
    p.wikidata_uri = row.uri

// Create the ID node
MERGE (id_node:ID {{code: 'fondazione_' + row.id}})
ON CREATE SET id_node.source = 'Fondazione'
MERGE (id_node)-[:IS_ID_OF]->(p)

RETURN count(p) as persone_aggiornate
"""

# 2. Import works
cypher_import_opere = f"""
LOAD CSV WITH HEADERS FROM '{FILE_FONDAZIONE_OPERE}' AS row FIELDTERMINATOR ','
WITH row WHERE row.id IS NOT NULL AND TRIM(row.id) <> ''

MERGE (o:Work {{internal_id_fondazione: row.id}})
ON CREATE SET o.title = row.dcTitle, o.wikidata_qid = row.entity_id, o.source = 'Fondazione'

// Create the ID node
MERGE (id_node:ID {{code: 'fondazione_' + row.id}})
ON CREATE SET id_node.source = 'Fondazione'
MERGE (id_node)-[:IS_ID_OF]->(o)

// Link people (creates stub nodes if missing, otherwise links to the existing ones)
WITH o, row
WHERE row.persone_collegate IS NOT NULL
UNWIND SPLIT(row.persone_collegate, ',') AS path
WITH o, SPLIT(SPLIT(path, '(')[1], ')')[0] AS p_id
WHERE p_id IS NOT NULL AND TRIM(p_id) <> ''

MERGE (p:Person {{internal_id_fondazione: p_id}})
ON CREATE SET p.source = 'Fondazione' // Set only when the node is newly created
MERGE (p)-[:HAD_ROLE_IN {{source: 'Fondazione'}}]->(o)

RETURN count(o)
"""

# 3. Productions
cypher_import_produzioni = f"""
LOAD CSV WITH HEADERS FROM '{FILE_FONDAZIONE_PRODUZIONI}' AS row 
FIELDTERMINATOR ';' 
WITH row WHERE row.id IS NOT NULL AND TRIM(row.id) <> ''

// --- 1. CREATE PRODUCTION AND ID ---
MERGE (p:Production {{internal_id_fondazione: row.id}})

// Create the ID node
MERGE (id_node:ID {{code: 'fondazione_' + row.id}})
ON CREATE SET id_node.source = 'Fondazione'
MERGE (id_node)-[:IS_ID_OF]->(p)

// Set the base properties
SET
    p.title = row.dcTitle,
    p.start_date = row.from,
    p.end_date = row.to,
    p.source = 'Fondazione',
    p.city = row.luogo_rappresentazione,
    p.venue = row.edificio_rappresentazione

// --- 2. LINK WORKS ---
WITH p, row
UNWIND SPLIT(COALESCE(row.opere_collegate_id, ''), ',') AS work_id_raw
WITH p, row, TRIM(work_id_raw) AS work_id
WHERE work_id <> ''

MERGE (w:Work {{internal_id_fondazione: work_id}})
ON CREATE SET w.source = 'Fondazione'
MERGE (p)-[:RELATED_TO_WORK]->(w)
MERGE (w)-[:RELATES_TO]->(p)

// --- 3. LINK PEOPLE ---
WITH p, row
WHERE row.persone_collegate_id IS NOT NULL AND TRIM(row.persone_collegate_id) <> ''

WITH p, 
     SPLIT(row.persone_collegate_id, ',') AS ids, 
     SPLIT(COALESCE(row.persone_collegate_ruolo, ''), ',') AS roles

UNWIND range(0, size(ids)-1) AS i
WITH p, TRIM(ids[i]) AS pid, TRIM(roles[i]) AS role_text
WHERE pid IS NOT NULL AND pid <> ''

MERGE (per:Person {{internal_id_fondazione: pid}})
ON CREATE SET per.source = 'Fondazione'

// Role logic
WITH p, per, role_text,
     CASE
        WHEN role_text CONTAINS 'Regista' OR role_text CONTAINS 'regia' THEN 'DIRECTED'
        WHEN role_text CONTAINS 'Scenografo' OR role_text CONTAINS 'scene' THEN 'DESIGNED_SET'
        WHEN role_text CONTAINS 'Coreografo' OR role_text CONTAINS 'coreografia' THEN 'CHOREOGRAPHED'
        WHEN role_text CONTAINS 'Costumista' OR role_text CONTAINS 'costumi' THEN 'DESIGNED_COSTUMES'
        ELSE 'HAD_ROLE_IN'
     END AS relation_type

FOREACH(ignore IN CASE WHEN relation_type = 'DIRECTED' THEN [1] ELSE [] END | MERGE (per)-[:DIRECTED]->(p))
FOREACH(ignore IN CASE WHEN relation_type = 'DESIGNED_SET' THEN [1] ELSE [] END | MERGE (per)-[:DESIGNED_SET]->(p))
FOREACH(ignore IN CASE WHEN relation_type = 'CHOREOGRAPHED' THEN [1] ELSE [] END | MERGE (per)-[:CHOREOGRAPHED]->(p))
FOREACH(ignore IN CASE WHEN relation_type = 'DESIGNED_COSTUMES' THEN [1] ELSE [] END | MERGE (per)-[:DESIGNED_COSTUMES]->(p))

FOREACH(ignore IN CASE WHEN relation_type = 'HAD_ROLE_IN' THEN [1] ELSE [] END | 
    MERGE (per)-[r:HAD_ROLE_IN]->(p) 
    SET r.role = role_text
)

RETURN count(distinct p) as total_productions
"""

# 4. Performances
cypher_import_recite = f"""
LOAD CSV WITH HEADERS FROM '{FILE_FONDAZIONE_RECITE}' AS row FIELDTERMINATOR ','
WITH row WHERE row.id IS NOT NULL AND TRIM(row.id) <> ''

// 1. Create Performance
MERGE (r:Performance {{internal_id_fondazione: row.id}})
ON CREATE SET 
    r.title = row.titolo_breve,
    r.date = row.from,
    r.venue = row.luogo_nome,
    r.building_text = row.edificio_nome,
    r.source = 'Fondazione'
    
// 2. ID node
MERGE (id_node:ID {{code: 'fondazione_' + row.id}})
ON CREATE SET id_node.source = 'Fondazione'
MERGE (id_node)-[:IS_ID_OF]->(r)

// 3. Building
WITH r, row
WHERE row.edificio_id IS NOT NULL AND TRIM(row.edificio_id) <> ''
MERGE (b:Building {{internal_id_fondazione: row.edificio_id}})
ON CREATE SET 
    b.name = row.edificio_nome,
    b.city = row.luogo_nome,
    b.wikidata_qid = row.entity,
    b.wikidata_uri = row.uri,
    b.source = 'Fondazione'
ON MATCH SET
    b.wikidata_qid = CASE WHEN row.entity IS NOT NULL AND row.entity <> '' THEN row.entity ELSE b.wikidata_qid END
MERGE (r)-[:HELD_IN]->(b)

// 4. Link Work - first connection
WITH r, row
WHERE row.composizione_id IS NOT NULL AND TRIM(row.composizione_id) <> ''
MERGE (o:Work {{internal_id_fondazione: row.composizione_id}})
MERGE (r)-[:RELATED_TO_WORK]->(o)
MERGE (o)-[:RELATES_TO]->(r)

// 5. Conductor
// WITH r, row drops the variable 'o' here, which is fine
WITH r, row
WHERE row.curatore_id IS NOT NULL AND TRIM(row.curatore_id) <> ''
MERGE (cur:Person {{internal_id_fondazione: row.curatore_id}})
ON CREATE SET cur.name = row.curatore_nome, cur.source = 'Fondazione'
MERGE (cur)-[:CONDUCTED]->(r)

// 6. Ensemble
WITH r, row
WHERE row.esecutore_id IS NOT NULL AND TRIM(row.esecutore_id) <> ''
MERGE (esec:Ensemble {{internal_id_fondazione: row.esecutore_id}})
ON CREATE SET esec.name = row.esecutore_nome, esec.source = 'Fondazione'
MERGE (esec)-[rel:PARTICIPATED_IN]->(r)
SET rel.role = row.esecutore_ruolo

// 7. Performer (Person)
WITH r, row
WHERE row.interprete_id IS NOT NULL AND TRIM(row.interprete_id) <> ''
MERGE (int:Person {{internal_id_fondazione: row.interprete_id}})
ON CREATE SET int.name = row.interprete, int.source = 'Fondazione'

MERGE (int)-[rel_int:PERFORMED_IN]->(r)
FOREACH (_ IN CASE WHEN row.ruolo IS NOT NULL AND TRIM(row.ruolo) <> '' THEN [1] ELSE [] END |
    SET rel_int.role = row.ruolo
)

// 8. CHARACTERS
// Fix: 'o' is not requested in the WITH; the Work is matched again below
WITH r, row, int
WHERE row.personaggio IS NOT NULL AND TRIM(row.personaggio) <> ''

MERGE (char:Character {{name: row.personaggio}})
ON CREATE SET char.voice_type = row.personaggio_voce, char.source = 'Fondazione'

// Triangle: performer, character, performance
MERGE (int)-[:INTERPRETED]->(char)
MERGE (char)-[:APPEARED_IN]->(r)

// *** FIX: RE-MATCH THE WORK ***
// Match the Work again by the ID in the row, so it is guaranteed to be bound
WITH char, row
WHERE row.composizione_id IS NOT NULL AND TRIM(row.composizione_id) <> ''
MATCH (o_final:Work {{internal_id_fondazione: row.composizione_id}})

// Create the Work -> Character link (requested by the supervisor)
MERGE (o_final)-[:HAS_CHARACTER]->(char)

RETURN count(distinct char)
"""

# 4.5 Link Production -> Performances (HAS_PERFORMANCE)
cypher_link_produzioni_recite = f"""
LOAD CSV WITH HEADERS FROM '{FILE_FONDAZIONE_LINKS}' AS row
FIELDTERMINATOR ';'
WITH row
WHERE row.id IS NOT NULL AND row.recite_collegate IS NOT NULL AND TRIM(row.recite_collegate) <> ''

// Find the Production
MATCH (p:Production {{internal_id_fondazione: row.id}})

// Find and link the Performances
UNWIND SPLIT(row.recite_collegate, ',') AS path_raw
WITH p, path_raw, SPLIT(path_raw, '(') AS parts
WITH p, SPLIT(parts[-1], ')')[0] AS recita_id
WHERE recita_id IS NOT NULL AND TRIM(recita_id) <> ''

MATCH (r:Performance {{internal_id_fondazione: TRIM(recita_id)}})
MERGE (p)-[:HAS_PERFORMANCE]->(r)

RETURN count(*) as links_created
"""

# 5. Seasons
cypher_import_stagioni = f"""
LOAD CSV WITH HEADERS FROM '{FILE_FONDAZIONE_STAGIONI}' AS row FIELDTERMINATOR ','
WITH row WHERE row.id IS NOT NULL AND TRIM(row.id) <> ''

MERGE (s:Season {{internal_id_fondazione: row.id}})
ON CREATE SET 
    s.title = row.dcTitle,
    s.type = row.dcType,
    s.start_date = row.from,
    s.end_date = row.to,
    s.source = 'Fondazione'

// Create the ID node
MERGE (id_node:ID {{code: 'fondazione_' + row.id}})
ON CREATE SET id_node.source = 'Fondazione'
MERGE (id_node)-[:IS_ID_OF]->(s)

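// Link Productions (INCLUDES_PRODUCTION)
// coalesce() keeps seasons whose production list is empty (UNWIND of NULL yields no rows)
WITH s, row
UNWIND SPLIT(coalesce(row.produzioni_collegate_id, ''), ',') AS pid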
OPTIONAL MATCH (p:Production {{internal_id_fondazione: TRIM(pid)}})
FOREACH (_ IN CASE WHEN p IS NOT NULL THEN [1] ELSE [] END |
    MERGE (s)-[:INCLUDES_PRODUCTION]->(p)
    MERGE (p)-[:IS_PART_OF]->(s)
)

// Link Performances (INCLUDES_PERFORMANCE) - fallback path, always useful
// DISTINCT collapses the rows fanned out by the previous UNWIND
WITH DISTINCT s, row
UNWIND SPLIT(coalesce(row.manifestazioni_recite_concerti_collegati_id, ''), ',') AS rid
OPTIONAL MATCH (r:Performance {{internal_id_fondazione: TRIM(rid)}})
FOREACH (_ IN CASE WHEN r IS NOT NULL THEN [1] ELSE [] END |
    MERGE (s)-[:INCLUDES_PERFORMANCE]->(r)
)

RETURN count(distinct s)
"""

if __name__ == "__main__":
    driver = None 
    try:
        driver = GraphDatabase.driver(uri_db, auth=(user, password))
        driver.verify_connectivity()
        print(f"Connesso a {uri_db}")
        
        create_constraints_fondazione(driver)
        
        run_import_step(driver, cypher_import_persone, "1. Persone (Arricchimento Wikidata)")
        run_import_step(driver, cypher_import_opere, "2. Opere (Works)")
        run_import_step(driver, cypher_import_produzioni, "3. Produzioni (Productions)")
        run_import_step(driver, cypher_import_recite, "4. Recite (Performances)")
        run_import_step(driver, cypher_link_produzioni_recite, "4.5 Link Produzioni->Recite")
        run_import_step(driver, cypher_import_stagioni, "5. Stagioni (Seasons)")
        
        print("\n>>> IMPORTAZIONE FONDAZIONE COMPLETATA.")
        print("    ORA ESEGUI LO SCRIPT 'reconcile_final.py' PER UNIRE I NODI!")

    except Exception as e:
        print(f"\n!!! ERRORE GENERALE: {e}")
        traceback.print_exc()
    finally:
        if driver: driver.close()

Connesso a bolt://archiuidev.promemoriagroup.com:7687

--- 0.1 CREAZIONE VINCOLI (ENGLISH) ---
Vincoli verificati.

--- Inizio: 1. Persone (Arricchimento Wikidata) ---
SUCCESSO: 1. Persone (Arricchimento Wikidata) completato.
Risultati: 24279

--- Inizio: 2. Opere (Works) ---
SUCCESSO: 2. Opere (Works) completato.
Risultati: 5309

--- Inizio: 3. Produzioni (Productions) ---
SUCCESSO: 3. Produzioni (Productions) completato.
Risultati: 587

--- Inizio: 4. Recite (Performances) ---
SUCCESSO: 4. Recite (Performances) completato.
Risultati: 1846

--- Inizio: 4.5 Link Produzioni->Recite ---
SUCCESSO: 4.5 Link Produzioni->Recite completato.
Risultati: 7891

--- Inizio: 5. Stagioni (Seasons) ---
SUCCESSO: 5. Stagioni (Seasons) completato.
Risultati: 138

>>> IMPORTAZIONE FONDAZIONE COMPLETATA.
    ORA ESEGUI LO SCRIPT 'reconcile_final.py' PER UNIRE I NODI!
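
Step 4.5 extracts each performance ID from the recite_collegate column by splitting on commas and keeping the text inside the last pair of parentheses. The same parsing can be sketched in plain Python, which is handy for checking a CSV cell before running the import. The function name and the sample value are hypothetical; the exact layout of the real column is an assumption inferred from the Cypher above.

```python
def extract_linked_ids(raw):
    """Return the IDs found in a comma-separated list of 'label (id)' entries."""
    ids = []
    for entry in (raw or "").split(","):
        tail = entry.split("(")[-1]             # text after the last '('
        candidate = tail.split(")")[0].strip()  # text before the first ')'
        if candidate:
            ids.append(candidate)
    return ids


# Hypothetical cell value, for illustration only
sample = "La Traviata 1/3 (rec_101), La Traviata 2/3 (rec_102)"
print(extract_linked_ids(sample))
```

As in the Cypher version, an entry without parentheses is returned whole, so malformed cells surface as unmatched IDs rather than being silently dropped.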


In [4]:
%pip install -U ipywidgets jupyter tqdm

Note: you may need to restart the kernel to use updated packages.


Semantic Enrichment Pipeline: Work Vectorization

This script implements the AI-driven enrichment layer of the TheatreNet graph: it converts the textual metadata of Work nodes into dense numerical vectors (embeddings).

The process is organized into the following functional modules:

- AI Model Integration: Initializes the Sentence-Transformers (SBERT) model (all-MiniLM-L6-v2) to translate artistic metadata into a 384-dimensional semantic space.

- Contextual Data Fetching: Retrieves titles and associated composers for all works lacking external identifiers (Wikidata QIDs), ensuring that the vector representation captures both the "conception" and its authorship.

- Latent Semantic Embedding: Encodes the combined metadata (title plus composer names) into vectors whose mathematical proximity (cosine similarity) is expected to correlate with conceptual and historical similarity.

- Graph Property Injection: Writes the embeddings back to Neo4j in a single UNWIND batch, storing them in the embedding property of each node for use by the vector-based reconciliation steps.

Key feature: transition from literal matching to latent semantic understanding. This step is essential for resolving fragmented records across different archives, because it lets the system "recognize" the same work even when it is recorded with slight title variations or linguistic differences.
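
To make the notion of "mathematical proximity" concrete, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors and their variable names are invented for illustration; the real embeddings have 384 dimensions and come from the SBERT model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" (not produced by any model)
traviata_regio = [0.9, 0.1, 0.3]
traviata_fondazione = [0.8, 0.2, 0.3]
rigoletto = [0.1, 0.9, 0.2]

same_work = cosine_similarity(traviata_regio, traviata_fondazione)
other_work = cosine_similarity(traviata_regio, rigoletto)
print(f"same work: {same_work:.3f}, different work: {other_work:.3f}")
```

Vectors pointing in nearly the same direction score close to 1, which is the property the later reconciliation thresholds rely on.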

CODE OF THE FILE property_graph/3_vector_opere.py

In [5]:
from neo4j import GraphDatabase, basic_auth
from sentence_transformers import SentenceTransformer
import os
import sys
from dotenv import load_dotenv

# 1. Config
dotenv_path = "/Users/elenabinotti/Documents/scuola/unibo/LM-43 DHDK/promemoria group/env.env"
load_dotenv(dotenv_path=dotenv_path)

URI = "bolt://archiuidev.promemoriagroup.com:7687"
USER = os.getenv("ID")
PASSWORD = os.getenv("SECRET_KEY")

# DEBUG CHECK
if not USER or not PASSWORD:
    print(f"ERROR: Could not load credentials from {dotenv_path}")
    print(f"USER: {USER}, PASSWORD: {'SET' if PASSWORD else 'NONE'}")
    sys.exit(1)

print("Loading AI Model...")
model = SentenceTransformer('all-MiniLM-L6-v2') 

def add_embeddings(driver):
    with driver.session() as session:
        print("Fetching works from database...")
        # A. Fetch the Works that have no Wikidata QID, together with their composers
        result = session.run("""
            MATCH (w:Work)
            WHERE w.wikidata_qid IS NULL OR trim(w.wikidata_qid) = ''
            OPTIONAL MATCH (w)-[:HAS_COMPOSER]->(c:Person)
            WITH w, collect(DISTINCT c.name) AS composers
            RETURN elementId(w) AS id, w.title AS title, composers
        """)
        
        operations = []
        count = 0
        
        print("Calculating vectors...")
        
        # Consume the result entirely to avoid timeout/cursor issues
        records = list(result)
        
        for record in records:
            title = record['title']
            # Safe handling if title is missing
            if not title: continue 
            
            composers = record['composers']
            
            # Create text to embed (title plus composer names, if any)
            text_to_embed = f"{title} {' '.join(composers)}".strip()
            
            # Calculate Vector
            vector = model.encode(text_to_embed).tolist()
            
            # Prepare the update
            operations.append({"id": record["id"], "vector": vector})
            count += 1
            
            if count % 100 == 0:
                print(f"Processed {count} works...")

        # B. Write vectors back to Neo4j
        if operations:
            print(f"Writing {len(operations)} vectors to Neo4j...")
            
            # Write all vectors with one UNWIND statement; consume() waits for it to finish
            session.run("""
                UNWIND $batch AS item
                MATCH (w) WHERE elementId(w) = item.id
                CALL db.create.setNodeVectorProperty(w, 'embedding', item.vector)
            """, batch=operations).consume()
        else:
            print("No works found to update.")

# Run
driver = None
try:
    print(f"Connecting to {URI} as {USER}...")
    driver = GraphDatabase.driver(URI, auth=basic_auth(USER, PASSWORD))
    driver.verify_connectivity() # Check connection before running logic
    
    add_embeddings(driver)
    
except Exception as e:
    print("\nCRITICAL ERROR:")
    print(e)
finally:
    # Close the driver only if it was actually created
    if driver is not None:
        driver.close()
    print("Done.")

Loading AI Model...

Loading weights:   0%|          | 0/103 [00:00<?, ?it/s]

BertModel LOAD REPORT from: sentence-transformers/all-MiniLM-L6-v2
Key                     | Status     |  | 
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED: can be ignored when loading from different task/architecture; not ok if you expect identical arch.

Connecting to bolt://archiuidev.promemoriagroup.com:7687 as elenabinotti...
Fetching works from database...
Calculating vectors...
Processed 100 works...
Processed 200 works...
Processed 300 works...
Processed 400 works...
Processed 500 works...
Processed 600 works...
Processed 700 works...
Processed 800 works...
Processed 900 works...
Processed 1000 works...
Processed 1100 works...
Processed 1200 works...
Processed 1300 works...
Processed 1400 works...
Processed 1500 works...
Processed 1600 works...
Processed 1700 works...
Processed 1800 works...
Processed 1900 works...
Processed 2000 works...
Processed 2100 works...
Processed 2200 works...
Processed 2300 works...
Processed 2400 works...
Processed 2500 works...
Processed 2600 works...
Processed 2700 works...
Processed 2800 works...
Processed 2900 works...
Processed 3000 works...
Processed 3100 works...
Processed 3200 works...
Processed 3300 works...
Processed 3400 works...
Processed 3500 works...
Processed 3600 works...
Processed 3700

Semantic Enrichment Pipeline: Person Vectorization

This script extends the AI enrichment layer to Person nodes, creating a high-dimensional representation of the artists and professionals within the TheatreNet graph.

The process is organized into the following functional modules:

- Identity Contextualization: Retrieves names along with birth and death dates for individuals lacking global identifiers (Wikidata QIDs). This metadata combination is crucial for the model to distinguish between historical homonyms across different centuries.

- Biographical Embedding: Utilizes the SBERT model to encode biographical identities into 384-dimensional vectors, capturing the unique "signature" of each person in a latent semantic space.

- Vector Property Mapping: Systematically updates the Neo4j database by injecting these embeddings into the nodes, enabling advanced similarity searches and cross-institutional person reconciliation.

Key feature: use of temporal metadata as a semantic anchor. By embedding dates alongside names, the system keeps the reconciliation logic historically grounded and reduces the risk of merging different individuals who happen to share the same name.
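
The text that gets embedded for each person is simply the name followed by whatever dates are available. A minimal sketch of that step, using a hypothetical helper name (person_text) and made-up sample values, could look like this:

```python
def person_text(name, birth_date=None, death_date=None):
    """Join name and available dates into one string, skipping empty parts."""
    fields = (name, birth_date, death_date)
    cleaned = [str(field).strip() for field in fields if field is not None]
    return " ".join(part for part in cleaned if part)


# Illustrative values only; the date format in the graph is an assumption
print(person_text("Mario Rossi", "1820", "1891"))
print(person_text("Mario Rossi", None, "1975"))
print(person_text("Mario Rossi"))
```

Two namesakes with different dates produce different strings, and therefore different vectors, while a record without dates falls back to the bare name.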

CODE OF THE FILE property_graph/4_vector_persone.py

In [6]:
from neo4j import GraphDatabase, basic_auth
from sentence_transformers import SentenceTransformer
import os
import sys
from dotenv import load_dotenv

dotenv_path = "/Users/elenabinotti/Documents/scuola/unibo/LM-43 DHDK/promemoria group/env.env"
load_dotenv(dotenv_path=dotenv_path)

URI = "bolt://archiuidev.promemoriagroup.com:7687"
USER = os.getenv("ID")
PASSWORD = os.getenv("SECRET_KEY")

if not USER or not PASSWORD:
    print(f"ERROR: Missing credentials from {dotenv_path}")
    sys.exit(1)

print("Loading AI Model...")
model = SentenceTransformer('all-MiniLM-L6-v2') 

def add_person_embeddings(driver):
    with driver.session() as session:
        print("Fetching people from database...")
        
        result = session.run("""
            MATCH (p:Person)
            WHERE p.wikidata_qid IS NULL OR trim(p.wikidata_qid) = ''
            RETURN elementId(p) AS id, p.name AS name, p.birth_date AS bdate, p.death_date AS ddate
        """)
        
        operations = []
        count = 0
        print("Calculating vectors for People...")
        
        records = list(result)
        
        for record in records:
            name = record['name']
            if not name: continue 
            
            # Including the dates helps the model tell apart namesakes from different centuries.
            bdate = str(record['bdate']) if record['bdate'] else ""
            ddate = str(record['ddate']) if record['ddate'] else ""
            
            # Final string to embed; empty parts are skipped to avoid double spaces
            text_to_embed = " ".join(part for part in (name, bdate, ddate) if part)
            
            # Compute the vector
            vector = model.encode(text_to_embed).tolist()
            
            operations.append({"id": record["id"], "vector": vector})
            count += 1
            
            if count % 500 == 0:
                print(f"Processed {count} people...")

        if operations:
            print(f"Writing {len(operations)} vectors to Neo4j...")
            # One UNWIND statement for all vectors; consume() waits for it to finish
            session.run("""
                UNWIND $batch AS item
                MATCH (p) WHERE elementId(p) = item.id
                CALL db.create.setNodeVectorProperty(p, 'embedding', item.vector)
            """, batch=operations).consume()
        else:
            print("No people found.")

driver = None
try:
    driver = GraphDatabase.driver(URI, auth=basic_auth(USER, PASSWORD))
    driver.verify_connectivity()
    add_person_embeddings(driver)
except Exception as e:
    print(e)
finally:
    # Close the driver only if it was actually created
    if driver is not None:
        driver.close()
    print("Done.")

Loading AI Model...

Loading weights:   0%|          | 0/103 [00:00<?, ?it/s]

BertModel LOAD REPORT from: sentence-transformers/all-MiniLM-L6-v2
Key                     | Status     |  | 
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED: can be ignored when loading from different task/architecture; not ok if you expect identical arch.

Fetching people from database...
Calculating vectors for People...
Processed 500 people...
Processed 1000 people...
Processed 1500 people...
Processed 2000 people...
Processed 2500 people...
Processed 3000 people...
Processed 3500 people...
Processed 4000 people...
Processed 4500 people...
Processed 5000 people...
Processed 5500 people...
Processed 6000 people...
Processed 6500 people...
Processed 7000 people...
Processed 7500 people...
Processed 8000 people...
Processed 8500 people...
Processed 9000 people...
Processed 9500 people...
Processed 10000 people...
Processed 10500 people...
Processed 11000 people...
Processed 11500 people...
Processed 12000 people...
Processed 12500 people...
Processed 13000 people...
Processed 13500 people...
Processed 14000 people...
Processed 14500 people...
Processed 15000 people...
Processed 15500 people...
Processed 16000 people...
Processed 16500 people...
Processed 17000 people...
Processed 17500 people...
Processed 18000 people...
Writing 18310 vec

Physical Reconciliation Pipeline: Wikidata-Based Fusion

This script implements the Wikidata-based stage of the Entity Resolution process within the TheatreNet graph. It physically merges duplicate nodes to create unified "Golden Records."

The process is organized into the following functional modules:

- Wikidata-Driven Matching: Identifies Person, Work, and Building nodes that share the same Wikidata QID, treating it as the authoritative global identifier for matching across different institutional sources.

- Atomic Node Fusion: Utilizes the apoc.refactor.mergeNodes procedure to collapse multiple source-specific nodes into a single, enriched entity.

- Property Conflict Resolution: Implements specific merging policies: combine for provenance metadata (preserving both 'Regio' and 'Fondazione' as sources), overwrite for names and source-specific internal identifiers, and discard for the Wikidata QID, which is identical within each group.

- Relational Integrity Maintenance: Ensures that all incoming and outgoing relationships (such as PERFORMED_IN, CONDUCTED, or HELD_IN) are preserved and re-linked to the new unified node.

Key feature: merging shared entities into a single Golden Record. In this way the script heals the fragmentation of the theatrical heritage, transforming two separate datasets into a single network of shared history.
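
The three merging policies can be illustrated with a small in-memory model. This is a simplified sketch of the idea, not APOC's implementation: the function names, the sample records, the placeholder QID, and the choice to deduplicate combined values and to default unlisted keys to overwrite are all assumptions made for the example.

```python
def _as_list(value):
    """Wrap a scalar in a list; copy an existing list."""
    return list(value) if isinstance(value, list) else [value]


def merge_properties(nodes, policies):
    """Fold a list of property dicts into one record, left to right."""
    merged = {}
    for props in nodes:
        for key, value in props.items():
            policy = policies.get(key, "overwrite")  # default used by this sketch only
            if key not in merged:
                merged[key] = value
            elif policy == "discard":
                pass  # keep the value that is already present
            elif policy == "overwrite":
                merged[key] = value  # the later node wins
            elif policy == "combine":
                combined = _as_list(merged[key])
                for item in _as_list(value):
                    if item not in combined:
                        combined.append(item)
                merged[key] = combined if len(combined) > 1 else combined[0]
            else:
                raise ValueError(f"unknown policy: {policy}")
    return merged


# Hypothetical source records for the same person
regio = {"name": "Mario Rossi", "source": "Regio",
         "wikidata_qid": "Q000", "internal_id_regio": "R-1"}
fondazione = {"name": "Rossi, Mario", "source": "Fondazione",
              "wikidata_qid": "Q000", "internal_id_fondazione": "F-9"}
policies = {"source": "combine", "name": "overwrite", "wikidata_qid": "discard"}

golden = merge_properties([regio, fondazione], policies)
print(golden)
```

The resulting record keeps both internal IDs, lists both sources, and takes the name from the last record in the list, which is why the order of the collected nodes matters for overwrite properties.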

CODE OF THE FILE property_graph/5_node_merge.py

In [7]:
from neo4j import GraphDatabase
from dotenv import load_dotenv
import os

dotenv_path = "/Users/elenabinotti/Documents/scuola/unibo/LM-43 DHDK/promemoria group/env.env"
load_dotenv(dotenv_path=dotenv_path)

user = os.getenv("ID")
password = os.getenv("SECRET_KEY")
uri_db = "bolt://archiuidev.promemoriagroup.com:7687"

cypher_merge_people = """
MATCH (p:Person)
WHERE p.wikidata_qid IS NOT NULL AND TRIM(p.wikidata_qid) <> ''
WITH p.wikidata_qid as qid, collect(p) as nodes
WHERE size(nodes) > 1
CALL apoc.refactor.mergeNodes(nodes, {
    properties: {
        source: 'combine',              // Becomes ['Regio', 'Fondazione']
        name: 'overwrite',              // The value from the last node in the list wins
        wikidata_qid: 'discard',        // Identical within the group, keep the first
        wikidata_uri: 'discard',
        internal_id_regio: 'overwrite', // Keeps the Regio ID
        internal_id_fondazione: 'overwrite', // Keeps the Fondazione ID
        trace_ids: 'combine'            // Merges any historical IDs
    },
    mergeRels: true                     // Merges the relationships (PERFORMED_IN, etc.)
}) YIELD node
RETURN count(node) as persone_unite
"""

cypher_merge_works = """
MATCH (w:Work)
WHERE w.wikidata_qid IS NOT NULL AND TRIM(w.wikidata_qid) <> ''
WITH w.wikidata_qid as qid, collect(w) as nodes
WHERE size(nodes) > 1
CALL apoc.refactor.mergeNodes(nodes, {
    properties: {
        source: 'combine',
        title: 'overwrite',
        wikidata_qid: 'discard',
        internal_id_regio: 'overwrite',
        internal_id_fondazione: 'overwrite',
        year: 'overwrite'
    },
    mergeRels: true
}) YIELD node
RETURN count(node) as opere_unite
"""

cypher_merge_buildings = """
MATCH (b:Building)
WHERE b.wikidata_qid IS NOT NULL AND TRIM(b.wikidata_qid) <> ''
WITH b.wikidata_qid as qid, collect(b) as nodes
WHERE size(nodes) > 1
CALL apoc.refactor.mergeNodes(nodes, {
    properties: {
        source: 'combine',               // Becomes ['Regio', 'Fondazione']
        
        // Names and cities
        name: 'overwrite',               // E.g. "Teatro Regio"
        city: 'overwrite',               // E.g. "Parma"
        address: 'overwrite',            // If an address is present
        
        // Technical IDs
        wikidata_qid: 'discard',
        internal_id_regio: 'overwrite',      // Keeps the Regio ID
        internal_id_fondazione: 'overwrite'  // Keeps the Fondazione ID
    },
    mergeRels: true  // Important: merges the HELD_IN relationships (where the performance took place)
}) YIELD node
RETURN count(node) as edifici_uniti
"""

cypher_clean_same_as = """
MATCH (n)-[r:SAME_AS]->(m)
DELETE r
"""

def run_reconciliation(driver):
    with driver.session() as session:
        print("\n--- START RECONCILIATION (PHYSICAL MERGE) ---")
        print("The goal is to create a single Golden Record per shared entity.")

        # A. People
        print("\n1. Merging People...")
        res = session.run(cypher_merge_people).single()
        print(f"   -> People merged: {res['persone_unite'] if res else 0}")

        # B. Works
        print("\n2. Merging Works...")
        res = session.run(cypher_merge_works).single()
        print(f"   -> Works merged: {res['opere_unite'] if res else 0}")

        # C. Buildings
        print("\n3. Merging Buildings (Theatres/Venues)...")
        res = session.run(cypher_merge_buildings).single()
        print(f"   -> Buildings merged: {res['edifici_uniti'] if res else 0}")

        # D. SAME_AS cleanup (safety)
        print("\n4. Removing leftover SAME_AS relationships...")
        session.run(cypher_clean_same_as)
        print("   -> SAME_AS relationships removed.")
        
        print("\n PROCESS COMPLETED.")

if __name__ == "__main__":
    driver = GraphDatabase.driver(uri_db, auth=(user, password))
    try:
        run_reconciliation(driver)
    except Exception as e:
        print(f"Error: {e}")
    finally:
        driver.close()


--- START RECONCILIATION (PHYSICAL MERGE) ---
The goal is to create a single Golden Record per shared entity.

1. Merging People...
   -> People merged: 3956

2. Merging Works...
   -> Works merged: 225

3. Merging Buildings (Theatres/Venues)...
   -> Buildings merged: 0

4. Removing leftover SAME_AS relationships...
   -> SAME_AS relationships removed.

 PROCESS COMPLETED.


Semantic Reconciliation Pipeline: Vector-Based Person Matching

This script implements the final, advanced layer of Entity Resolution for individuals lacking global identifiers (Wikidata QIDs). It uses the SBERT embeddings generated in previous steps to "bridge" the gaps between the Teatro Regio and Fondazione I Teatri datasets through mathematical similarity.

The process is organized into the following functional modules:

- Similarity Graph Construction: Identifies candidate matches by querying the Neo4j vector index (person_embeddings) for the K = 5 nearest neighbours of each person and creating temporary SAME_AS relationships for pairs whose similarity score is at least 0.944.

- Iterative Component Merging: Detects clusters of similar nodes (subgraphs) and merges them into unified Golden Records using an iterative logic and the apoc.refactor.mergeNodes procedure.

- Refined Property Management: Preserves data lineage by combining sources and trace IDs, while consolidating biographical data (names, dates) and discarding obsolete embeddings.

- Graph Integrity Cleanup: Removes temporary similarity links and resolves self-loops to maintain a clean, production-ready relational structure.

Key feature: use of dual thresholds. SAME_AS links are created from a score of 0.944, but only links scoring at least 0.955 seed a physical merge. This keeps borderline matches out of the starting set and supports a high-confidence reconciliation process that respects the historical complexity of theatrical metadata.
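
The interplay between the two thresholds can be sketched in memory with a small union-find. This is an illustrative model under stated assumptions, not the notebook's method: the function name, the node labels, and the scores are hypothetical, and here only pairs at or above the stricter threshold are grouped, whereas the notebook delegates grouping to apoc.path.subgraphAll over the SAME_AS relationships.

```python
LINK_THRESHOLD = 0.944   # minimum score to record a candidate link
MERGE_THRESHOLD = 0.955  # minimum score to allow a physical merge


def plan_merges(scored_pairs, link_t=LINK_THRESHOLD, merge_t=MERGE_THRESHOLD):
    """Return (candidate links, components to merge) for (a, b, score) triples."""
    links = [(a, b, s) for a, b, s in scored_pairs if s >= link_t]

    parent = {}

    def find(x):
        # Locate the representative of x, shortening the path as we go
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in links:
        if score >= merge_t:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), []).append(node)
    components = sorted(sorted(g) for g in groups.values() if len(g) > 1)
    return links, components


# Hypothetical similarity scores between person nodes
pairs = [("p1", "p2", 0.97), ("p2", "p3", 0.96),
         ("p4", "p5", 0.95), ("p6", "p7", 0.90)]
links, components = plan_merges(pairs)
print(len(links), components)
```

In this toy input, p4 and p5 are linked but not merged, and p6 and p7 are not even linked, which is the behaviour the two thresholds are meant to separate.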

In [8]:
from neo4j import GraphDatabase, basic_auth
from dotenv import load_dotenv
import os
import sys

# =========================
# CONFIG
# =========================
dotenv_path = "/Users/elenabinotti/Documents/scuola/unibo/LM-43 DHDK/promemoria group/env.env"
load_dotenv(dotenv_path=dotenv_path)

URI = "bolt://archiuidev.promemoriagroup.com:7687"
USER = os.getenv("ID")
PASSWORD = os.getenv("SECRET_KEY")

if not USER or not PASSWORD:
    print(f"ERROR: Missing credentials from {dotenv_path}")
    sys.exit(1)

# Vector matching params
K = 5
LINK_THRESHOLD = 0.944       # threshold to CREATE SAME_AS
MERGE_THRESHOLD = 0.955      # stricter threshold to MERGE nodes

# =========================
# CYPHER QUERIES
# =========================

# 1) Create SAME_AS links (only Persons without QID, with embedding)
CYPHER_CREATE_SAME_AS = """
WITH $k AS k, $threshold AS threshold
MATCH (p:Person)
WHERE (p.wikidata_qid IS NULL OR trim(p.wikidata_qid) = '')
  AND p.embedding IS NOT NULL
CALL db.index.vector.queryNodes('person_embeddings', k, p.embedding)
YIELD node, score
WHERE node <> p
  AND id(p) < id(node)  // avoid duplicates (A->B and B->A)
  AND (node.wikidata_qid IS NULL OR trim(node.wikidata_qid) = '')
  AND node.embedding IS NOT NULL
  AND score >= threshold
MERGE (p)-[r:SAME_AS]->(node)
SET r.confidence = score,
    r.method = 'sbert_name_dates'
RETURN count(r) AS created;
"""

# 2) Process components one by one using iterative approach
CYPHER_FIND_COMPONENTS = """
// Find distinct components using SAME_AS relationships above threshold
MATCH (a:Person)-[r:SAME_AS]->(b:Person)
WHERE r.confidence >= $threshold
WITH collect(DISTINCT a) + collect(DISTINCT b) AS seeds
UNWIND seeds AS seed
WITH DISTINCT seed
MATCH (seed)
// Use apoc to get component nodes, but limit to components that are still valid
CALL apoc.path.subgraphAll(seed, {
  relationshipFilter: "SAME_AS",
  minLevel: 1,
  maxLevel: 10
}) YIELD nodes AS component
WHERE size(component) > 1
// Return one component at a time
RETURN component
LIMIT 1
"""

CYPHER_MERGE_COMPONENT = """
// Merge a specific component by node IDs
WITH $node_ids AS node_ids
MATCH (n:Person) WHERE id(n) IN node_ids
WITH collect(n) AS nodes
WHERE size(nodes) > 1
CALL apoc.refactor.mergeNodes(nodes, {
  properties: {
    source: 'combine',
    trace_ids: 'combine',
    name: 'overwrite',
    birth_date: 'overwrite',
    death_date: 'overwrite',
    wikidata_qid: 'overwrite',
    wikidata_uri: 'overwrite',
    internal_id_regio: 'overwrite',
    internal_id_fondazione: 'overwrite',
    embedding: 'discard'
  },
  mergeRels: true
}) YIELD node
RETURN node
"""

# 3) Cleanup SAME_AS relationships (after merges)
CYPHER_DELETE_SAME_AS = """
MATCH ()-[r:SAME_AS]->()
DELETE r
RETURN count(r) AS deleted;
"""

# Optional: remove self-loops if any remain (defensive)
CYPHER_DELETE_SAME_AS_SELF_LOOPS = """
MATCH (n)-[r:SAME_AS]->(n)
DELETE r
RETURN count(r) AS deleted_self_loops;
"""

# =========================
# RUNNER
# =========================

def run_vector_reconciliation_people():
    driver = GraphDatabase.driver(URI, auth=basic_auth(USER, PASSWORD))
    try:
        with driver.session() as session:
            print("\n--- STEP 1: Create SAME_AS links (vector similarity) ---")
            res = session.run(CYPHER_CREATE_SAME_AS, k=K, threshold=LINK_THRESHOLD).single()
            created = res["created"] if res else 0
            print(f"SAME_AS created: {created}")

            if created > 0:
                print("\n--- STEP 2: Iteratively merge SAME_AS components into Golden Records ---")
                merged_count = 0
                
                # Process components one by one until no more components are found
                while True:
                    # Find one component to merge
                    result = session.run(CYPHER_FIND_COMPONENTS, threshold=MERGE_THRESHOLD).single()
                    
                    if not result:
                        print("No more components to merge")
                        break
                    
                    component = result["component"]
                    node_ids = [node.id for node in component]
                    
                    print(f"Found component with {len(node_ids)} nodes. Merging...")
                    
                    try:
                        # Merge this specific component
                        merge_result = session.run(
                            CYPHER_MERGE_COMPONENT, 
                            node_ids=node_ids
                        ).single()
                        
                        if merge_result:
                            merged_count += 1
                            print(f"Component merged successfully. Total merged: {merged_count}")
                    
                    except Exception as e:
                        print(f"Error merging component, stopping merge loop: {str(e)[:100]}")
                        # The LIMIT 1 query would return this same component again,
                        # so retrying here could loop forever; stop and report instead.
                        break

                print(f"\nTotal components merged: {merged_count}")
            else:
                print("\n--- STEP 2: No SAME_AS links created, skipping merge ---")

            print("\n--- STEP 3: Cleanup SAME_AS relationships ---")
            res = session.run(CYPHER_DELETE_SAME_AS_SELF_LOOPS).single()
            deleted_loops = res["deleted_self_loops"] if res else 0
            if deleted_loops:
                print(f"SAME_AS self-loops deleted: {deleted_loops}")

            res = session.run(CYPHER_DELETE_SAME_AS).single()
            deleted = res["deleted"] if res else 0
            print(f"SAME_AS deleted: {deleted}")

            print("\nDONE. Note: embeddings were discarded on merged nodes; recompute embeddings after merges if needed.")

    except Exception as e:
        print(f"ERROR: {str(e)}")
        raise
    finally:
        driver.close()

if __name__ == "__main__":
    run_vector_reconciliation_people()


--- STEP 1: Create SAME_AS links (vector similarity) ---
SAME_AS created: 2514

--- STEP 2: Iteratively merge SAME_AS components into Golden Records ---
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 1
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 2
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 3
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 4
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 5
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 6
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 7
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 8
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 9
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 10
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 11
Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 12
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 13
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 14
Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 15
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 16
Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 17
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 18
Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 19
Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 20
Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 21
Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 22
Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 23
Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 24
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 25
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 26
Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 27
Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 28
Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 29
Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 30
Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 31
Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 32
Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 33
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 34
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 35
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 36
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 37
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 38
Found component with 5 nodes. Merging...
Component merged successfully. Total merged: 39
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 40
Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 41
Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 42
Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 43
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 44
Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 45
Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 46
Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 47
Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 48
Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 49
Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 50
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 51
Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 52
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 53
Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 54
Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 55
Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 56
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 57
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 58
Found component with 7 nodes. Merging...
Component merged successfully. Total merged: 59
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 60
Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 61
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 62
Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 63
Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 64
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 65
Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 66
Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 67
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 68
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 69
Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 70
Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 71
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 72
Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 73
Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 74
Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 75
Found component with 4 nodes. Merging...




Component merged successfully. Total merged: 76
Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 77




Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 78
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 79
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 80




Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 81
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 82
Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 83




Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 84
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 85
Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 86




Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 87
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 88
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 89




Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 90
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 91
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 92




Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 93
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 94
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 95




Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 96
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 97
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 98
Found component with 4 nodes. Merging...




Component merged successfully. Total merged: 99
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 100
Found component with 3 nodes. Merging...




Component merged successfully. Total merged: 101
Found component with 3 nodes. Merging...




Component merged successfully. Total merged: 102
Found component with 4 nodes. Merging...




Component merged successfully. Total merged: 103
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 104
Found component with 4 nodes. Merging...




Component merged successfully. Total merged: 105
Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 106




Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 107
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 108
Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 109




Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 110




Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 111
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 112
Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 113




Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 114
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 115
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 116




Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 117
Found component with 4 nodes. Merging...




Component merged successfully. Total merged: 118
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 119




Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 120
Found component with 4 nodes. Merging...




Component merged successfully. Total merged: 121
Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 122




Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 123
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 124
Found component with 4 nodes. Merging...




Component merged successfully. Total merged: 125
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 126
Found component with 3 nodes. Merging...




Component merged successfully. Total merged: 127




Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 128
Found component with 4 nodes. Merging...




Component merged successfully. Total merged: 129
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 130
Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 131




Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 132
Found component with 4 nodes. Merging...




Component merged successfully. Total merged: 133
Found component with 4 nodes. Merging...




Component merged successfully. Total merged: 134
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 135




Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 136
Found component with 4 nodes. Merging...




Component merged successfully. Total merged: 137
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 138




Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 139
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 140
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 141
Found component with 4 nodes. Merging...
Component merged successfully. Total merged: 142




Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 143
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 144
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 145




Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 146
Found component with 3 nodes. Merging...




Component merged successfully. Total merged: 147
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 148
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 149
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 150
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 151




Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 152
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 153
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 154




Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 155
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 156
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 157
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 158
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 159
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 160




Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 161
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 162
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 163




Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 164
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 165
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 166
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 167




Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 168




Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 169
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 170
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 171




Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 172




Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 173
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 174
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 175
Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 176




Found component with 2 nodes. Merging...
Component merged successfully. Total merged: 177
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 178
Found component with 2 nodes. Merging...




Component merged successfully. Total merged: 179
Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 180




Found component with 3 nodes. Merging...
Component merged successfully. Total merged: 181
No more components to merge

Total components merged: 181

--- STEP 3: Cleanup SAME_AS relationships ---
SAME_AS self-loops deleted: 170
SAME_AS deleted: 1780

DONE. Note: embeddings were discarded on merged nodes; recompute embeddings after merges if needed.
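
The log above reports three things: connected components of nodes linked by SAME_AS being found and merged, SAME_AS self-loops being deleted afterwards, and embeddings being discarded on the merged nodes. As a minimal, database-free sketch of the first two ideas, the snippet below groups node ids into components with a union-find structure, picks one survivor per component, and counts how many SAME_AS pairs collapse into self-loops once both endpoints point at the same survivor. This is an illustration only, not the notebook's actual merge code: the function names (`find_components`, `merge_components`), the choice of the lexicographically smallest id as survivor, and the sample ids are all assumptions.

```python
from collections import defaultdict


def find_components(pairs):
    """Group node ids into connected components with union-find.

    `pairs` is an iterable of (id_a, id_b) tuples, one per SAME_AS link.
    Returns a sorted list of sorted id lists, keeping only components
    with at least two nodes (singletons need no merge).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


def merge_components(pairs):
    """Simulate the merge: map every node to a survivor and count self-loops.

    Assumption: the survivor is simply the smallest id in each component.
    Returns (components, survivor_map, self_loop_count).
    """
    pairs = list(pairs)
    components = find_components(pairs)
    survivor = {node: comp[0] for comp in components for node in comp}
    # After merging, each SAME_AS pair is redirected to the survivors;
    # pairs inside one component become self-loops that must be cleaned up.
    redirected = [(survivor.get(a, a), survivor.get(b, b)) for a, b in pairs]
    self_loops = sum(1 for a, b in redirected if a == b)
    return components, survivor, self_loops


# Hypothetical SAME_AS links between person ids from the two theatres.
sample_pairs = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
components, survivor, self_loops = merge_components(sample_pairs)
for total, comp in enumerate(components, start=1):
    print(f"Found component with {len(comp)} nodes. Merging...")
    print(f"Component merged successfully. Total merged: {total}")
print(f"SAME_AS self-loops after merge: {self_loops}")
```

Because a merged node stands for several source records, any embedding computed from a single original record no longer describes it. This is consistent with the log's closing note that embeddings should be recomputed after merging if later steps depend on them.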
