# Personal Memory Pipeline — Quality Inspection

This notebook lets you inspect the output of each stage of the personal memory pipeline:

| Stage | What we test |
|-------|--------------|
| **Stage 0** | Build a synthetic session with two clear topics and pronoun references |
| **Stage 1** | Coreference resolution — before vs after comparison |
| **Stage 2** | Topic-boundary chunking — where does the LLM split the session? |
| **Stage 3** | GSW extraction (CONVERSATIONAL prompt) — entity nodes, verb phrases, speaker_id, evidence_turn_ids |
| **Stage 4** | *(optional)* Repeat Stages 1–3 on real LoCoMo data if available |
| **Stage 5** | Reconcile per-chunk GSWs into one merged GSW (synthetic) |
| **Stage 6** | *(optional)* Reconcile real LoCoMo chunks (session 1 only) |
| **Stage 7** | *(optional)* Full pipeline on sessions 1–5: coref → chunk → GSW → Layer 1 per session |
| **Stage 8** | *(optional)* Layer 2: `ConversationReconciler.reconcile_sessions` across 5 sessions |
| **Stage 9** | *(optional)* Layer 3: `reconcile_conversations_agentic` — agentic person-level merge |

**Requirements:** `OPENAI_API_KEY` set in a `.env` file at the repo root.
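
Before any cell calls the API, it can help to confirm that the key was actually loaded. The following is a minimal sketch, assuming a hypothetical helper named `require_env` (it is not part of `gsw_memory` or `python-dotenv`) that fails fast with a readable message:

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of an environment variable or raise a clear error.

    `require_env` is a hypothetical helper for this notebook, not a library API.
    """
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to the .env file at the repo root")
    return value


# Example with an explicit mapping, so the check itself needs no real key.
print(require_env("OPENAI_API_KEY", env={"OPENAI_API_KEY": "sk-example"}))
```

In the notebook, calling `require_env("OPENAI_API_KEY")` right after `load_dotenv(...)` would surface a missing key before the first request is made.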

In [1]:
import json
import os
import sys

# Make sure the package is importable when running from this directory
repo_root = os.path.abspath(os.path.join(os.getcwd(), '../../../../..'))
if repo_root not in sys.path:
    sys.path.insert(0, repo_root)

from dotenv import load_dotenv
load_dotenv(os.path.join(repo_root, '.env'))

from openai import OpenAI

from gsw_memory.personal_memory.data_ingestion.locomo import Session, Turn
from gsw_memory.personal_memory.chunker import TopicBoundaryChunker
from gsw_memory.memory.models import GSWStructure
from gsw_memory.prompts.operator_prompts import CorefPrompts, ConversationalOperatorPrompts

client = OpenAI()
MODEL = 'gpt-4o'

print('Imports OK')

Imports OK


## Stage 0 — Synthetic Session

We use a short synthetic dialogue with:
- Two speakers: **Alex** and **Jordan**
- Two clear topics: work stress (turns 1–9) → birthday party plans (turns 10–18)
- Pronouns (`she`, `he`, `her`, `him`) so coref has something to resolve
- Informal language (`gonna`, `ya`, contractions)

In [2]:
turns = [
    # ----- Topic A: work stress -----
    Turn(speaker="Alex",   text="I had a really rough day. My manager pulled me aside and said she's putting me on a performance plan.",                         dia_id="D1:1"),
    Turn(speaker="Jordan", text="Wait, your manager Sarah? She seemed fine at the all-hands last week.",                                                          dia_id="D1:2"),
    Turn(speaker="Alex",   text="Yeah, her. Apparently she thinks my last two projects were late. She even cc'd HR on the email.",                                dia_id="D1:3"),
    Turn(speaker="Jordan", text="That's rough. Did she give you specific deadlines to hit?",                                                                       dia_id="D1:4"),
    Turn(speaker="Alex",   text="She wants me to finish the analytics dashboard by Friday. I'm not sure I can do it alone.",                                      dia_id="D1:5"),
    Turn(speaker="Jordan", text="I can help you after work tomorrow. I've done dashboards like that before at my old company.",                                    dia_id="D1:6"),
    Turn(speaker="Alex",   text="Seriously? That would be amazing. I owe you one.",                                                                               dia_id="D1:7"),
    Turn(speaker="Jordan", text="No worries. And honestly, if Sarah keeps this up you should talk to someone in HR yourself.",                                    dia_id="D1:8"),
    Turn(speaker="Alex",   text="I might. I just don't want to make things worse with her.",                                                                      dia_id="D1:9"),
    # ----- Topic B: birthday party -----
    Turn(speaker="Alex",   text="Anyway, are you coming to my birthday party on Saturday?",                                               dia_id="D1:10"),
    Turn(speaker="Jordan", text="Of course! Where's it gonna be?",                                                                                                dia_id="D1:11"),
    Turn(speaker="Alex",   text="At my place. My roommate Marcus is helping me set it up. He's ordering pizza and decorations.",                                  dia_id="D1:12"),
    Turn(speaker="Jordan", text="Oh nice, I like Marcus. Should I bring something?",                                                                              dia_id="D1:13"),
    Turn(speaker="Alex",   text="Maybe just some drinks? Marcus already bought a cake.",                                                                          dia_id="D1:14"),
    Turn(speaker="Jordan", text="Sure, I'll grab some wine and maybe some sparkling water for people who don't drink.",                                           dia_id="D1:15"),
    Turn(speaker="Alex",   text="Perfect. It starts at seven. My parents are coming too, so fair warning it might get a bit loud.",                               dia_id="D1:16"),
    Turn(speaker="Jordan", text="Ha, I'll be ready. Is there a dress code?",                                                                                      dia_id="D1:17"),
    Turn(speaker="Alex",   text="Nah, just casual. Marcus said he might dress up as a clown as a joke but don't hold him to that.",                               dia_id="D1:18"),
]

session = Session(session_id=1, date_time="2024-03-15 19:00", turns=turns)
raw_text = session.to_document()
print(raw_text)

[Session 1 — 2024-03-15 19:00]

Alex: I had a really rough day. My manager pulled me aside and said she's putting me on a performance plan.
Jordan: Wait, your manager Sarah? She seemed fine at the all-hands last week.
Alex: Yeah, her. Apparently she thinks my last two projects were late. She even cc'd HR on the email.
Jordan: That's rough. Did she give you specific deadlines to hit?
Alex: She wants me to finish the analytics dashboard by Friday. I'm not sure I can do it alone.
Jordan: I can help you after work tomorrow. I've done dashboards like that before at my old company.
Alex: Seriously? That would be amazing. I owe you one.
Jordan: No worries. And honestly, if Sarah keeps this up you should talk to someone in HR yourself.
Alex: I might. I just don't want to make things worse with her.
Alex: Anyway, are you coming to my birthday party on Saturday?
Jordan: Of course! Where's it gonna be?
Alex: At my place. My roommate Marcus is helping me set it up. He's ordering pizza and decorati

## Stage 1 — Coreference Resolution

We send the `CorefPrompts` templates straight to the OpenAI client, which avoids writing curator cache files.  
Check: third-person pronouns such as `she`, `her`, `him`, `he` should be replaced with the name of the person they refer to.
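
One way to make this check less dependent on eyeballing the output is to count the third-person pronouns before and after resolution. The sketch below is an assumption about what a useful check looks like, not part of the pipeline: the helper name `count_third_person_pronouns` and the pronoun list are illustrative, and whether possessives such as `his` get rewritten depends on the coref prompt.

```python
import re
from collections import Counter

# Third-person singular pronouns that coreference resolution is expected to
# replace. First- and second-person forms ("I", "you", "my") are left out on
# purpose, because the speaker label already identifies them.
THIRD_PERSON = re.compile(r"\b(she|her|hers|he|him|his)\b", re.IGNORECASE)


def count_third_person_pronouns(text: str) -> Counter:
    """Count third-person singular pronouns, case-insensitively."""
    return Counter(match.group(1).lower() for match in THIRD_PERSON.finditer(text))


before = "Alex: Yeah, her. Apparently she thinks my last two projects were late."
after = "Alex: Yeah, Sarah. Apparently, Sarah thinks my last two projects were late."
print(count_third_person_pronouns(before))
print(count_third_person_pronouns(after))
```

Applied to `raw_text` and `resolved_text`, a non-empty counter for the resolved text would point at pronouns the model left untouched.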

In [3]:
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": CorefPrompts.SYSTEM_PROMPT},
        {"role": "user",   "content": CorefPrompts.USER_PROMPT_TEMPLATE.format(text=raw_text)},
    ],
    temperature=0,
    max_tokens=2000,
)
resolved_text = response.choices[0].message.content

print('=== ORIGINAL ===')
print(raw_text)
print('\n' + '='*60)
print('=== COREF RESOLVED ===')
print(resolved_text)

=== ORIGINAL ===
[Session 1 — 2024-03-15 19:00]

Alex: I had a really rough day. My manager pulled me aside and said she's putting me on a performance plan.
Jordan: Wait, your manager Sarah? She seemed fine at the all-hands last week.
Alex: Yeah, her. Apparently she thinks my last two projects were late. She even cc'd HR on the email.
Jordan: That's rough. Did she give you specific deadlines to hit?
Alex: She wants me to finish the analytics dashboard by Friday. I'm not sure I can do it alone.
Jordan: I can help you after work tomorrow. I've done dashboards like that before at my old company.
Alex: Seriously? That would be amazing. I owe you one.
Jordan: No worries. And honestly, if Sarah keeps this up you should talk to someone in HR yourself.
Alex: I might. I just don't want to make things worse with her.
Alex: Anyway, are you coming to my birthday party on Saturday?
Jordan: Of course! Where's it gonna be?
Alex: At my place. My roommate Marcus is helping me set it up. He's ordering p

In [4]:
# Quick diff: show lines that changed
orig_lines = raw_text.splitlines()
resolved_lines = resolved_text.splitlines()

print('Lines changed by coref:')
changed = 0
for i, (o, r) in enumerate(zip(orig_lines, resolved_lines)):
    if o != r:
        print(f'  [{i+1}] BEFORE: {o}')
        print(f'       AFTER:  {r}')
        changed += 1
print(f'\nTotal changed: {changed} / {len(orig_lines)} lines')

Lines changed by coref:
  [3] BEFORE: Alex: I had a really rough day. My manager pulled me aside and said she's putting me on a performance plan.
       AFTER:  Alex: I had a really rough day. My manager pulled me aside and said Sarah is putting me on a performance plan.  
  [4] BEFORE: Jordan: Wait, your manager Sarah? She seemed fine at the all-hands last week.
       AFTER:  Jordan: Wait, your manager Sarah? Sarah seemed fine at the all-hands last week.  
  [5] BEFORE: Alex: Yeah, her. Apparently she thinks my last two projects were late. She even cc'd HR on the email.
       AFTER:  Alex: Yeah, Sarah. Apparently, Sarah thinks my last two projects were late. Sarah even cc'd HR on the email.  
  [6] BEFORE: Jordan: That's rough. Did she give you specific deadlines to hit?
       AFTER:  Jordan: That's rough. Did Sarah give you specific deadlines to hit?  
  [7] BEFORE: Alex: She wants me to finish the analytics dashboard by Friday. I'm not sure I can do it alone.
       AFTER:  Alex:
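
The `AFTER` lines above end in two trailing spaces, so a strict `!=` comparison may also flag lines whose words did not change, and `zip` stops at the shorter list if the model adds or drops a line. A hedged alternative, using a hypothetical helper `changed_lines`, ignores trailing whitespace and pads the shorter side:

```python
from itertools import zip_longest


def changed_lines(original: str, resolved: str):
    """Yield (line_number, before, after) for lines whose text differs.

    Trailing whitespace is ignored, and a missing line on either side is
    reported as an empty string instead of being silently dropped.
    """
    pairs = zip_longest(original.splitlines(), resolved.splitlines(), fillvalue="")
    for number, (before, after) in enumerate(pairs, start=1):
        if before.rstrip() != after.rstrip():
            yield number, before.rstrip(), after.rstrip()


original = "Alex: Yeah, her.\nJordan: That's rough."
resolved = "Alex: Yeah, Sarah.  \nJordan: That's rough.  \nextra line"
for number, before, after in changed_lines(original, resolved):
    print(f"[{number}] BEFORE: {before!r}  AFTER: {after!r}")
```

In this toy input only line 1 and the extra line 3 are reported; line 2 differs only by trailing spaces and is skipped.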

## Stage 2 — Topic-Boundary Chunking

The `TopicBoundaryChunker` detects topic shifts via an LLM call on the **original** session turns,  
then assembles chunk strings from the **coref-resolved** text.

Expected: 2 chunks (work stress | birthday party), with the split falling between turns 9 and 10.
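
The assembly half of that process can be sketched without any LLM call. The example below assumes the boundary positions have already been chosen and that every chunk repeats the session header. The function name `assemble_chunks` and the `boundaries` argument are hypothetical, and the real `TopicBoundaryChunker` may work differently.

```python
from typing import List


def assemble_chunks(resolved_text: str, boundaries: List[int]) -> List[str]:
    """Split a resolved session document into chunks at the given turn indices.

    `boundaries` holds 0-based indices of the first turn of every chunk after
    the first one. The first line of the document is treated as the session
    header and is repeated at the top of each chunk.
    """
    lines = [line.rstrip() for line in resolved_text.splitlines()]
    header, turn_lines = lines[0], [line for line in lines[1:] if line]
    starts = [0] + sorted(boundaries)
    ends = starts[1:] + [len(turn_lines)]
    return [
        header + "\n\n" + "\n".join(turn_lines[start:end])
        for start, end in zip(starts, ends)
        if start < end
    ]


doc = "[Session 1]\n\nAlex: work A\nJordan: work B\nAlex: party A\nJordan: party B"
for chunk in assemble_chunks(doc, boundaries=[2]):
    print(chunk)
    print("---")
```

For the 18-turn synthetic session, a boundary list of `[9]` would correspond to the expected nine-turn split.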

In [5]:
chunker = TopicBoundaryChunker(model_name=MODEL, max_turns_per_chunk=10)
chunks = chunker.chunk_session_from_text(resolved_text, session)

print(f'Session split into {len(chunks)} chunk(s)\n')
for i, chunk in enumerate(chunks):
    lines = chunk.splitlines()
    print(f'--- Chunk {i+1} ({len(lines)} lines) ---')
    print(chunk)
    print()

Session split into 2 chunk(s)

--- Chunk 1 (11 lines) ---
[Session 1 — 2024-03-15 19:00]

Alex: I had a really rough day. My manager pulled me aside and said Sarah is putting me on a performance plan.  
Jordan: Wait, your manager Sarah? Sarah seemed fine at the all-hands last week.  
Alex: Yeah, Sarah. Apparently, Sarah thinks my last two projects were late. Sarah even cc'd HR on the email.  
Jordan: That's rough. Did Sarah give you specific deadlines to hit?  
Alex: Sarah wants me to finish the analytics dashboard by Friday. I'm not sure I can do it alone.  
Jordan: I can help you after work tomorrow. I've done dashboards like that before at my old company.  
Alex: Seriously? That would be amazing. I owe you one.  
Jordan: No worries. And honestly, if Sarah keeps this up you should talk to someone in HR yourself.  
Alex: I might. I just don't want to make things worse with Sarah.  

--- Chunk 2 (11 lines) ---
[Session 1 — 2024-03-15 19:00]

Alex: Anyway, are you coming to my birthday 

## Stage 3 — GSW Extraction (CONVERSATIONAL prompt)

We run the `ConversationalOperatorPrompts` templates on **each chunk** via the OpenAI client, requesting structured output (`response_format` with `"type": "json_schema"`).

What to look for:
- `speaker_id` populated on entity roles and questions (should be `"Alex"` or `"Jordan"`)
- `evidence_turn_ids` populated (should reference the correct `D1:N` IDs)
- Entity separation: Alex's manager and Jordan's old company should be separate entities
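
The second check can be automated. A minimal sketch, assuming the `D1:N` format used by the synthetic turns and a hypothetical helper `invalid_turn_ids`, flags IDs that are malformed or out of range:

```python
import re
from typing import Iterable, List

# The synthetic session uses dialogue IDs of the form "D1:<turn number>".
TURN_ID = re.compile(r"D1:(\d+)")


def invalid_turn_ids(turn_ids: Iterable[str], n_turns: int) -> List[str]:
    """Return the IDs that are malformed or point outside turns 1..n_turns."""
    bad = []
    for turn_id in turn_ids:
        match = TURN_ID.fullmatch(turn_id)
        if match is None or not 1 <= int(match.group(1)) <= n_turns:
            bad.append(turn_id)
    return bad


print(invalid_turn_ids(["D1:1", "D1:18", "D1:19", "Session 1:3"], n_turns=18))
```

In the notebook this could be applied to `role.evidence_turn_ids` and `q.evidence_turn_ids` for every chunk. IDs written as `Session 1:N`, as some of the outputs below are, would be flagged as not following the `D1:N` convention.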

In [6]:
speaker_context = "Speaker A: Alex, Speaker B: Jordan"
gsw_schema = GSWStructure.model_json_schema()
# Drop the fields the operator is not asked to produce. The space/time nodes
# and edges are populated later by SpaceTimeLinker; similarity_edges is removed too.
for _field in ("space_nodes", "time_nodes", "space_edges", "time_edges", "similarity_edges"):
    gsw_schema.get("properties", {}).pop(_field, None)
    if _field in gsw_schema.get("required", []):
        gsw_schema["required"].remove(_field)

chunk_gsws = []
for i, chunk_text in enumerate(chunks):
    print(f'Extracting GSW for chunk {i+1}...')
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": ConversationalOperatorPrompts.SYSTEM_PROMPT},
            {"role": "user",   "content": ConversationalOperatorPrompts.USER_PROMPT_TEMPLATE.format(
                speaker_context=speaker_context,
                input_text=chunk_text,
                background_context="",
            )},
        ],
        response_format={
            "type": "json_schema",
            "json_schema": {"name": "GSWStructure", "strict": False, "schema": gsw_schema},
        },
        temperature=0,
        max_tokens=4000,
    )
    gsw_dict = json.loads(response.choices[0].message.content)
    gsw = GSWStructure(**gsw_dict)
    chunk_gsws.append(gsw)
    print(f'  → {len(gsw.entity_nodes)} entities, {len(gsw.verb_phrase_nodes)} verb phrases')

print('\nDone.')

Extracting GSW for chunk 1...
  → 6 entities, 4 verb phrases
Extracting GSW for chunk 2...
  → 10 entities, 4 verb phrases

Done.


In [7]:
# Run SpaceTimeLinker on each synthetic chunk (direct OpenAI call)
# Then apply results to the GSW in-place via apply_spacetime_to_gsw
from gsw_memory.memory.operator_utils.spacetime import apply_spacetime_to_gsw
from gsw_memory.prompts.operator_prompts import SpaceTimePrompts

for chunk_idx, (chunk_text, gsw) in enumerate(zip(chunks, chunk_gsws)):
    # Build operator_output_json — just the entity nodes (what SpaceTimeLinker expects)
    operator_output = {"entity_nodes": [e.model_dump() for e in gsw.entity_nodes]}
    operator_output_json = json.dumps(operator_output, indent=2)

    print(f"Running SpaceTimeLinker on chunk {chunk_idx+1}...")
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": SpaceTimePrompts.SYSTEM_PROMPT},
            {"role": "user",   "content": SpaceTimePrompts.USER_PROMPT_TEMPLATE.format(
                text_chunk_content=chunk_text,
                operator_output_json=operator_output_json,
                session_context="",
            )},
        ],
        temperature=0,
        max_tokens=1000,
    )
    answer_text = resp.choices[0].message.content.strip()

    # Parse the JSON response
    if "```json" in answer_text:
        json_content = answer_text.split("```json")[1].split("```")[0].strip()
    elif "```" in answer_text:
        json_content = answer_text.split("```")[1].split("```")[0].strip()
    else:
        json_content = answer_text

    parsed = json.loads(json_content)
    links = parsed.get("spatio_temporal_links", [])
    print(f"  Raw links: {links}")

    # Stamp the nodes/edges onto the GSW
    chunk_id = f"chunk{chunk_idx+1}"
    apply_spacetime_to_gsw(gsw, links, chunk_id=chunk_id)

print("\nSpaceTimeLinker done.")

Running SpaceTimeLinker on chunk 1...
  Raw links: [{'linked_entities': ['e1', 'e2', 'e3'], 'tag_type': 'spatial', 'tag_value': None}, {'linked_entities': ['e1', 'e3', 'e4'], 'tag_type': 'temporal', 'tag_value': '2024-03-15'}, {'linked_entities': ['e5', 'e6'], 'tag_type': 'temporal', 'tag_value': '2024-03-22'}]
Running SpaceTimeLinker on chunk 2...
  Raw links: [{'linked_entities': ['e1', 'e2', 'e3', 'e4', 'e7'], 'tag_type': 'spatial', 'tag_value': "Alex's place"}, {'linked_entities': ['e1', 'e2', 'e3', 'e5', 'e6', 'e7', 'e8', 'e9', 'e10'], 'tag_type': 'temporal', 'tag_value': 'Saturday'}]

SpaceTimeLinker done.
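
The fence-stripping logic in the cell above can be pulled into a small function so that a malformed reply produces a readable error. This is a sketch under assumptions: `parse_json_reply` is a hypothetical name, and the fence marker is built at runtime only to keep the example free of literal fences.

```python
import json
import re

FENCE_MARK = "`" * 3  # three backticks, built at runtime
# Matches a fenced block, with or without a "json" language tag.
FENCE = re.compile(
    FENCE_MARK + r"(?:json)?\s*(.*?)" + FENCE_MARK,
    re.DOTALL | re.IGNORECASE,
)


def parse_json_reply(reply: str):
    """Parse JSON from a model reply that may or may not be fenced."""
    match = FENCE.search(reply)
    payload = (match.group(1) if match else reply).strip()
    try:
        return json.loads(payload)
    except json.JSONDecodeError as exc:
        # Show the start of the payload so the failure is easy to diagnose.
        raise ValueError(f"reply is not valid JSON: {payload[:80]!r}") from exc


fenced = FENCE_MARK + 'json\n{"spatio_temporal_links": []}\n' + FENCE_MARK
print(parse_json_reply("Here you go:\n" + fenced))
print(parse_json_reply('{"spatio_temporal_links": []}'))
```

With such a helper, the parsing step in the loop would reduce to `parsed = parse_json_reply(answer_text)`.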


In [8]:
# Display entity nodes for each chunk
SEP = "=" * 60
for chunk_idx, gsw in enumerate(chunk_gsws):
    print(f"\n{SEP}")
    print(f"CHUNK {chunk_idx+1} — Entity Nodes ({len(gsw.entity_nodes)} total)")
    print(SEP)
    for entity in gsw.entity_nodes:
        print(f"  [{entity.id}] {entity.name!r}  (entity.speaker_id={entity.speaker_id!r})")
        for role in entity.roles:
            print(f"       role={role.role!r}")
            print(f"       states={role.states}")
            print(f"       speaker_id={role.speaker_id!r}  evidence_turn_ids={role.evidence_turn_ids}")


CHUNK 1 — Entity Nodes (6 total)
  [e1] 'Alex'  (entity.speaker_id='Alex')
       role='employee'
       states=['having a rough day', 'put on a performance plan']
       speaker_id='Alex'  evidence_turn_ids=['Session 1:1', 'Session 1:3']
       role='speaker'
       states=[]
       speaker_id='Alex'  evidence_turn_ids=['Session 1:1', 'Session 1:3', 'Session 1:5', 'Session 1:7', 'Session 1:9']
  [e2] 'Jordan'  (entity.speaker_id='Jordan')
       role='friend'
       states=['offering help']
       speaker_id='Jordan'  evidence_turn_ids=['Session 1:6']
       role='speaker'
       states=[]
       speaker_id='Jordan'  evidence_turn_ids=['Session 1:2', 'Session 1:4', 'Session 1:6', 'Session 1:8']
  [e3] 'Sarah'  (entity.speaker_id=None)
       role='manager'
       states=['putting Alex on a performance plan', "cc'd HR"]
       speaker_id='Alex'  evidence_turn_ids=['Session 1:1', 'Session 1:3']
  [e4] 'HR'  (entity.speaker_id=None)
       role='department'
       states=["cc'd on email

In [9]:
# Display verb phrase nodes + questions for each chunk
SEP = "=" * 60
for chunk_idx, gsw in enumerate(chunk_gsws):
    print(f"\n{SEP}")
    print(f"CHUNK {chunk_idx+1} — Verb Phrase Nodes ({len(gsw.verb_phrase_nodes)} total)")
    print(SEP)
    for vp in gsw.verb_phrase_nodes:
        print(f"  [{vp.id}] \"{vp.phrase}\"")
        for q in vp.questions:
            print(f"    Q: {q.text}")
            print(f"       answers={q.answers}")
            print(f"       speaker_id={q.speaker_id!r}  evidence_turn_ids={q.evidence_turn_ids}")


CHUNK 1 — Verb Phrase Nodes (4 total)
  [v1] "put on"
    Q: Who put [someone] on [something]?
       answers=['e3']
       speaker_id='Alex'  evidence_turn_ids=['Session 1:1']
    Q: Who was put on [something]?
       answers=['e1']
       speaker_id='Alex'  evidence_turn_ids=['Session 1:1']
    Q: What was [someone] put on?
       answers=['TEXT:a performance plan']
       speaker_id='Alex'  evidence_turn_ids=['Session 1:1']
  [v2] "cc"
    Q: Who cc'd [someone]?
       answers=['e3']
       speaker_id='Alex'  evidence_turn_ids=['Session 1:3']
    Q: Who was cc'd?
       answers=['e4']
       speaker_id='Alex'  evidence_turn_ids=['Session 1:3']
  [v3] "finish"
    Q: Who wants [someone] to finish [something]?
       answers=['e3']
       speaker_id='Alex'  evidence_turn_ids=['Session 1:5']
    Q: What does [someone] want [someone] to finish?
       answers=['e5']
       speaker_id='Alex'  evidence_turn_ids=['Session 1:5']
    Q: When does [someone] want [someone] to finish [somethin

In [10]:
# Speaker attribution summary: how many roles have speaker_id set?
for chunk_idx, gsw in enumerate(chunk_gsws):
    total_roles = sum(len(e.roles) for e in gsw.entity_nodes)
    roles_with_speaker = sum(
        1 for e in gsw.entity_nodes for r in e.roles if r.speaker_id
    )
    roles_with_evidence = sum(
        1 for e in gsw.entity_nodes for r in e.roles if r.evidence_turn_ids
    )
    total_qs = sum(len(vp.questions) for vp in gsw.verb_phrase_nodes)
    qs_with_speaker = sum(
        1 for vp in gsw.verb_phrase_nodes for q in vp.questions if q.speaker_id
    )
    print(f'Chunk {chunk_idx+1} attribution coverage:')
    print(f'  Roles with speaker_id:    {roles_with_speaker}/{total_roles}')
    print(f'  Roles with evidence turns: {roles_with_evidence}/{total_roles}')
    print(f'  Questions with speaker_id: {qs_with_speaker}/{total_qs}')
    print()

Chunk 1 attribution coverage:
  Roles with speaker_id:    8/8
  Roles with evidence turns: 8/8
  Questions with speaker_id: 10/10

Chunk 2 attribution coverage:
  Roles with speaker_id:    10/10
  Roles with evidence turns: 10/10
  Questions with speaker_id: 8/8



In [11]:
def print_full_gsw(gsw, label="GSW"):
    """Print a complete GSWStructure in a readable format."""
    SEP = "=" * 70
    SEP2 = "-" * 70
    print(f"\n{SEP}")
    print(f"  {label}")
    print(SEP)

    # Entities
    print(f"\n  ENTITIES ({len(gsw.entity_nodes)})")
    print(f"  {SEP2}")
    for e in gsw.entity_nodes:
        print(f"  [{e.id}] {e.name!r}  speaker={e.speaker_id!r}")
        for r in e.roles:
            states_str = ", ".join(r.states) if r.states else "—"
            ev_str = ", ".join(r.evidence_turn_ids) if r.evidence_turn_ids else "—"
            print(f"        role={r.role!r}  states=[{states_str}]")
            print(f"               speaker={r.speaker_id!r}  evidence=[{ev_str}]")

    # Verb phrases + questions
    print(f"\n  VERB PHRASES ({len(gsw.verb_phrase_nodes)})")
    print(f"  {SEP2}")
    for vp in gsw.verb_phrase_nodes:
        print(f"  [{vp.id}] \"{vp.phrase}\"")
        for q in vp.questions:
            ans_str = ", ".join(q.answers) if q.answers else "—"
            ev_str = ", ".join(q.evidence_turn_ids) if q.evidence_turn_ids else "—"
            print(f"    Q: {q.text}")
            print(f"       answers=[{ans_str}]  speaker={q.speaker_id!r}  evidence=[{ev_str}]")

    # Space nodes + edges
    print(f"\n  SPACE NODES ({len(gsw.space_nodes)})  /  SPACE EDGES ({len(gsw.space_edges)})")
    print(f"  {SEP2}")
    for sn in gsw.space_nodes:
        linked = [e[0] for e in gsw.space_edges if e[1] == sn.id]
        linked_names = []
        for eid in linked:
            entity = next((x for x in gsw.entity_nodes if x.id == eid), None)
            linked_names.append(entity.name if entity else eid)
        print(f"  [{sn.id}] {sn.current_name!r}  → entities: {linked_names}")
    if not gsw.space_nodes:
        print("  (none)")

    # Time nodes + edges
    print(f"\n  TIME NODES ({len(gsw.time_nodes)})  /  TIME EDGES ({len(gsw.time_edges)})")
    print(f"  {SEP2}")
    for tn in gsw.time_nodes:
        linked = [e[0] for e in gsw.time_edges if e[1] == tn.id]
        linked_names = []
        for eid in linked:
            entity = next((x for x in gsw.entity_nodes if x.id == eid), None)
            linked_names.append(entity.name if entity else eid)
        print(f"  [{tn.id}] {tn.current_name!r}  → entities: {linked_names}")
    if not gsw.time_nodes:
        print("  (none)")

    # Similarity edges
    if gsw.similarity_edges:
        print(f"\n  SIMILARITY EDGES ({len(gsw.similarity_edges)})")
        print(f"  {SEP2}")
        for e1_id, e2_id in gsw.similarity_edges:
            e1 = next((x.name for x in gsw.entity_nodes if x.id == e1_id), e1_id)
            e2 = next((x.name for x in gsw.entity_nodes if x.id == e2_id), e2_id)
            print(f"  {e1!r} ↔ {e2!r}")

    print(f"\n{SEP}\n")


# Print full GSW for each synthetic chunk
for i, gsw in enumerate(chunk_gsws):
    print_full_gsw(gsw, label=f"SYNTHETIC CHUNK {i+1} — Full GSW")


  SYNTHETIC CHUNK 1 — Full GSW

  ENTITIES (6)
  ----------------------------------------------------------------------
  [e1] 'Alex'  speaker='Alex'
        role='employee'  states=[having a rough day, put on a performance plan]
               speaker='Alex'  evidence=[Session 1:1, Session 1:3]
        role='speaker'  states=[—]
               speaker='Alex'  evidence=[Session 1:1, Session 1:3, Session 1:5, Session 1:7, Session 1:9]
  [e2] 'Jordan'  speaker='Jordan'
        role='friend'  states=[offering help]
               speaker='Jordan'  evidence=[Session 1:6]
        role='speaker'  states=[—]
               speaker='Jordan'  evidence=[Session 1:2, Session 1:4, Session 1:6, Session 1:8]
  [e3] 'Sarah'  speaker=None
        role='manager'  states=[putting Alex on a performance plan, cc'd HR]
               speaker='Alex'  evidence=[Session 1:1, Session 1:3]
  [e4] 'HR'  speaker=None
        role='department'  states=[cc'd on email]
               speaker='Alex'  evidence=[Sessi

## Stage 5 — Reconciliation (Synthetic)

The `Reconciler` merges per-chunk GSWs into a single unified GSW.  
It globalises entity IDs (`chunk1::e1`, `chunk2::e3`, ...) and uses exact-string matching  
to merge entities with the same name across chunks.

What to look for:
- `Alex` and `Jordan` from both chunks should merge into single entities with roles from **both** topics
- Entities that occur in only one chunk (`Marcus`, `Sarah`, etc.) should appear exactly once
- Verb phrase questions should have updated answers pointing to the merged global entity IDs
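
To make the expected behaviour concrete, here is a toy sketch of ID globalisation followed by exact-name merging. It is not the `Reconciler` implementation: the function `merge_by_exact_name`, the tuple layout, and the "first occurrence wins" rule are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple


def merge_by_exact_name(chunks: Dict[str, List[Tuple[str, str, List[str]]]]):
    """Toy exact-name merge over {chunk_id: [(local_id, name, roles), ...]}.

    Returns (merged, id_map): `merged` maps a global ID to its name and the
    combined roles, and `id_map` maps every globalised ID to the ID that
    survived, which is what verb-phrase answers would be rewritten to.
    """
    merged: Dict[str, dict] = {}
    by_name: Dict[str, str] = {}
    id_map: Dict[str, str] = {}
    for chunk_id, entities in chunks.items():
        for local_id, name, roles in entities:
            global_id = f"{chunk_id}::{local_id}"
            survivor = by_name.setdefault(name, global_id)  # first occurrence wins
            entry = merged.setdefault(survivor, {"name": name, "roles": []})
            entry["roles"].extend(roles)
            id_map[global_id] = survivor
    return merged, id_map


merged, id_map = merge_by_exact_name({
    "chunk1": [("e1", "Alex", ["employee"]), ("e3", "Sarah", ["manager"])],
    "chunk2": [("e1", "Alex", ["party host"]), ("e3", "Marcus", ["roommate"])],
})
print(len(merged), merged["chunk1::e1"]["roles"], id_map["chunk2::e1"])
```

In this toy version the match is case- and whitespace-sensitive, so `"Sarah"` and `"sarah"` would remain separate entities.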

In [22]:
from gsw_memory.memory.reconciler import Reconciler

reconciler = Reconciler(matching_approach="exact")

for i, (chunk_text, gsw) in enumerate(zip(chunks, chunk_gsws)):
    chunk_id = f"chunk{i+1}"
    print(f"Reconciling chunk {i+1} (id={chunk_id!r}) ...")
    reconciler.reconcile(gsw, chunk_id=chunk_id, new_chunk_text=chunk_text)

synth_merged_gsw = reconciler.global_memory
stats = reconciler.get_statistics()

print("\nReconciliation done.")
print(f"  Entities:     {stats['entities']}")
print(f"  Verb phrases: {stats['verb_phrases']}")
print(f"  Questions:    {stats['questions']}")
print(f"  Space nodes:  {stats['space_nodes']}")
print(f"  Time nodes:   {stats['time_nodes']}")


Reconciling chunk 1 (id='chunk1') ...
Reconciling chunk 2 (id='chunk2') ...

Reconciliation done.
  Entities:     14
  Verb phrases: 8
  Questions:    18
  Space nodes:  1
  Time nodes:   2


In [23]:
print_full_gsw(synth_merged_gsw, label="SYNTHETIC — Merged GSW (after reconciliation)")


  SYNTHETIC — Merged GSW (after reconciliation)

  ENTITIES (14)
  ----------------------------------------------------------------------
  [chunk1::e1] 'Alex'  speaker='Alex'
        role='employee'  states=[having a rough day, put on a performance plan]
               speaker='Alex'  evidence=[Session 1:1, Session 1:3]
        role='speaker'  states=[—]
               speaker='Alex'  evidence=[Session 1:1, Session 1:3, Session 1:5, Session 1:7, Session 1:9]
        role='party host'  states=[hosting birthday party]
               speaker='Alex'  evidence=[D1:1, D1:3]
  [chunk1::e2] 'Jordan'  speaker='Jordan'
        role='friend'  states=[offering help]
               speaker='Jordan'  evidence=[Session 1:6]
        role='speaker'  states=[—]
               speaker='Jordan'  evidence=[Session 1:2, Session 1:4, Session 1:6, Session 1:8]
        role='party guest'  states=[bringing drinks]
               speaker='Jordan'  evidence=[D1:2, D1:6]
  [chunk1::e3] 'Sarah'  speaker=None
    