FD-337: windowed re-extraction + cross-session parent back-fill by patrickkidd · Pull Request #124 · patrickkidd/btcopilot

patrickkidd · 2026-06-02T23:45:44Z

FD-337: Re-extraction for long/multi-session conversations

Corrected diagnosis (the ticket's "stuck at 25-30%" was a measurement bug)

The from-empty connectivity_check --accumulate honored each discussion's persisted re-extraction cursor. Diagram 1924's discussions carry a stale "accepted through order 200" cursor, so the from-empty accumulate extracted an empty tail and the model produced a degenerate fragment → the ~25-30% figure. It was never the extraction that was stuck; it was the measurement.

The real defect is still real: the actual stored diagram 1924 is 32 people in 14 components — 56% connected. Production's incremental extraction (with the Pass-3 crash below live) left it fragmented.

What this PR does

Crash fix (llmutil.gemini_structured) — accepts optional model=. A pre-existing regression (FD-333, c8d5c00) crashed every extraction reaching Pass 3 (SARF review) with TypeError. ⚠️ If master is deployed, production chat extraction has been crashing since May 21 — please confirm deploy state.
Windowing (pdp.extract_full) — conversations > WINDOW_SIZE (default 80, env FD_WINDOW_SIZE) are extracted in windows on a clone, each window committed so later windows see it as context, then re-staged as negative-id deltas. Short spans keep the single-shot path byte-for-byte. Contract preserved — never commits into the caller's diagram (verified by test).
Cross-session parent back-fill (schema.commit_pdp_items) — sets parents on an already-committed child when a later session emits a birth event naming its couple.
Measurement fidelity — connectivity_check/run_extract_full_f1 null the persisted cursor for from-empty measurement; F1 runner now prints PairBonds + ParentChild precision.
Tests — back-fill (+ no-overwrite guard), windowing contract, ID-remap reconstruction, model-override regression guard. 264 passing in the changed areas; CI green.

Honest attribution (accumulate 55,58,60, real implementation)

config	LCC	people	note
stale cursor (the old "baseline")	~35%	~18	degenerate measurement
cursor fix only, single-shot	~81% median	~22	dominant lever; drops ~10 family members
+ back-fill	~82%	~22	back-fill ≈ +1%
+ windowing (the deliverable)	80.0% median (15 runs)	~31	full family; see distribution below

Windowing earns its place on completeness: single-shot recovers only ~22 of 32 people; windowing recovers ~31. LCC% is a ratio that hides this — the denominator grows with completeness.

Acceptance status — borderline, stated plainly

15-run windowed distribution: 53, 69, 75, 77, 77, 78, 80, 80, 81, 83, 83, 84, 84, 84, 90.

median 80.0% (right at the AC line), mean 78.6%, 9/15 ≥80%, floor tail 53-69%.
Full family recovered (~31 of 32 people) — vs the stored diagram's fragmented 56%.

So the AC's ≥80% is met on the median only; the mean is just under and the floor is not uniformly ≥80%.

Floor limitation (why it isn't higher) — and the ceiling

The sub-80 runs drop a real family member's bridge — e.g. the proband's sibling's spouse (Vance+Meredith) extracted as a free-floating couple, or Jim+Sheila's family correctly internally-linked but not joined to the proband. These are missing relationship links the model omits run-to-run. The obvious levers are ruled out by the ticket: proband-linking prompt directives (tried, caused dangling-reference crashes), name-matching dedup (forbidden), extra structural-completion passes (tried, regressed F1).

The bridges exist — they're variance-distributed. A 6-run consensus experiment (merge the union of repeated windowed extractions) reaches 94.9% connectivity (plain LCC) vs a single-run median of ~84% — each run misses a different subset of bridges, and the union recovers nearly all of them. So the floor is achievable via multi-sample consensus at ~K× extraction cost — viable as an async "deep re-extraction," not real-time. That's a product decision (is K× latency acceptable for a high-fidelity re-extraction mode?), so it's not in this PR. This confirms the floor is a model-variance limitation, not a code gap.

No F1 regression

Aggregate ~0.65, ParentChild F1 ~0.80 (precision reported), People ~0.93. On baseline.

Companion: fdserver#24.

🤖 Generated with Claude Code

Long/multi-session chat extraction recovered only ~half the family tree (single 200+ statement LLM pass drops people and parent links). Two mechanisms, both measured: - llmutil.gemini_structured: accept an optional `model` override. Fixes a pre-existing FD-333 regression (c8d5c00) that passed model= to a function without the param, crashing EVERY extraction that reached Pass 3 (SARF review) with TypeError. - pdp.extract_full: window long conversations (>WINDOW_SIZE statements) on a clone of the diagram, committing each window so later windows see earlier results as committed context, then re-stage the combined new items as negative-id deltas. Short spans keep the existing single-shot path byte-for-byte. Contract preserved: never commits into the caller's diagram. Window size via FD_WINDOW_SIZE (default 80). - schema.commit_pdp_items: deterministic cross-session parent back-fill — set parents on an already-committed child when a later session emits a birth event naming its couple (the delta-only repair could never reach a committed child). - connectivity_check --accumulate / run_extract_full_f1: null each discussion's persisted cursor for from-empty measurement (a stale "accepted through order N" cursor made the from-empty accumulate extract an empty tail). run_extract_full_f1 also now prints PairBonds + ParentChild precision. Measured (diagram 1924, accumulate 55,58,60): single-shot median ~35% LCC → windowed+backfill median ~84%, recovering ~31 of 32 people. No F1 regression. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request introduces windowed extraction in extract_full to handle large discussions by chunking statements into windows, along with a cross-session parent backfill mechanism to link committed children to their parents when birth or adoption events are processed later. It also updates evaluation scripts to reset extraction cursors and track precision and recall metrics. The reviewer feedback suggests separating the ID remapping dictionary by entity type in _restage_new_items to avoid ID overlaps, adding an early return in extract_full when there are no statements to extract to save LLM tokens, and adding defensive checks for the presence of the "id" key in dictionary items to prevent potential KeyErrors.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-02T23:48:04Z

+def _restage_new_items(working: DiagramData, original: DiagramData) -> PDP:
+    """Re-stage items committed across windows that are absent from the original
+    diagram as a fresh negative-id PDP. References into original committed items
+    stay positive; references among the new items are remapped to negatives."""
+    orig_people = {p["id"] for p in original.people}
+    orig_events = {e["id"] for e in original.events}
+    orig_bonds = {pb["id"] for pb in original.pair_bonds}
+
+    new_people = [p for p in working.people if p["id"] not in orig_people]
+    new_events = [e for e in working.events if e["id"] not in orig_events]
+    new_bonds = [pb for pb in working.pair_bonds if pb["id"] not in orig_bonds]
+
+    next_id = -1
+    id_map: dict[int, int] = {}
+    for item in [*new_people, *new_bonds, *new_events]:
+        id_map[item["id"]] = next_id
+        next_id -= 1
+
+    def remap(old):
+        return id_map.get(old, old) if old is not None else None
+
+    people, bonds, events = [], [], []
+    for p in new_people:
+        d = dict(p)
+        d["id"] = id_map[p["id"]]
+        d["parents"] = remap(p.get("parents"))
+        people.append(from_dict(Person, d))
+    for pb in new_bonds:
+        d = dict(pb)
+        d["id"] = id_map[pb["id"]]
+        d["person_a"] = remap(pb.get("person_a"))
+        d["person_b"] = remap(pb.get("person_b"))
+        bonds.append(from_dict(PairBond, d))
+    for e in new_events:
+        d = dict(e)
+        d["id"] = id_map[e["id"]]
+        for k in ("person", "spouse", "child"):
+            d[k] = remap(d.get(k))
+        d["relationshipTargets"] = [remap(t) for t in d.get("relationshipTargets") or []]
+        d["relationshipTriangles"] = [
+            remap(t) for t in d.get("relationshipTriangles") or []
+        ]
+        for k in ("dateTime", "endDateTime"):
+            v = d.get(k)
+            if v is not None and hasattr(v, "toString"):
+                d[k] = v.toString("yyyy-MM-dd")
+        events.append(from_dict(Event, d))
+
+    return PDP(people=people, events=events, pair_bonds=bonds)


When remapping entity IDs, ensure the remapping dictionary corresponds to the correct entity type. Using a single id_map for all entity types (People, PairBonds, Events) can lead to incorrect remapping if there are any ID overlaps or if the ID generation strategy changes in the future. Additionally, add defensive checks to ensure the "id" key exists in the dictionary items before accessing it to prevent potential KeyErrors.

def _restage_new_items(working: DiagramData, original: DiagramData) -> PDP: """Re-stage items committed across windows that are absent from the original diagram as a fresh negative-id PDP. References into original committed items stay positive; references among the new items are remapped to negatives.""" orig_people = {p["id"] for p in original.people if "id" in p} orig_events = {e["id"] for e in original.events if "id" in e} orig_bonds = {pb["id"] for pb in original.pair_bonds if "id" in pb} new_people = [p for p in working.people if "id" in p and p["id"] not in orig_people] new_events = [e for e in working.events if "id" in e and e["id"] not in orig_events] new_bonds = [pb for pb in working.pair_bonds if "id" in pb and pb["id"] not in orig_bonds] person_id_map: dict[int, int] = {} bond_id_map: dict[int, int] = {} event_id_map: dict[int, int] = {} next_id = -1 for p in new_people: person_id_map[p["id"]] = next_id next_id -= 1 for pb in new_bonds: bond_id_map[pb["id"]] = next_id next_id -= 1 for e in new_events: event_id_map[e["id"]] = next_id next_id -= 1 people, bonds, events = [], [], [] for p in new_people: d = dict(p) d["id"] = person_id_map[p["id"]] d["parents"] = bond_id_map.get(p.get("parents"), p.get("parents")) if p.get("parents") is not None else None people.append(from_dict(Person, d)) for pb in new_bonds: d = dict(pb) d["id"] = bond_id_map[pb["id"]] d["person_a"] = person_id_map.get(pb.get("person_a"), pb.get("person_a")) if pb.get("person_a") is not None else None d["person_b"] = person_id_map.get(pb.get("person_b"), pb.get("person_b")) if pb.get("person_b") is not None else None bonds.append(from_dict(PairBond, d)) for e in new_events: d = dict(e) d["id"] = event_id_map[e["id"]] for k in ("person", "spouse", "child"): val = d.get(k) if val is not None: d[k] = person_id_map.get(val, val) d["relationshipTargets"] = [person_id_map.get(t, t) for t in d.get("relationshipTargets") or []] d["relationshipTriangles"] = [ person_id_map.get(t, t) for t in d.get("relationshipTriangles") or [] ] for k in ("dateTime", "endDateTime"): v = d.get(k) if v is not None and hasattr(v, "toString"): d[k] = v.toString("yyyy-MM-dd") events.append(from_dict(Event, d)) return PDP(people=people, events=events, pair_bonds=bonds)

References

When remapping entity IDs, ensure the remapping dictionary corresponds to the correct entity type. For instance, do not remap a PairBond ID using a Person ID map.

Applied the per-type maps in 2d900b2 — matches the existing reassign_delta_ids convention in this file. For the record it wasn't a correctness bug: commit_pdp_items assigns globally-unique ids from a single counter, so the combined map had no overlaps. Declining the defensive id guards — committed items always carry an id, and per our fail-early policy a missing one should raise, not be silently skipped.

gemini-code-assist · 2026-06-02T23:48:04Z

+        to_extract = [s for s in ordered if (s.order or 0) > cursor]
+        prior_committed = [s for s in ordered if (s.order or 0) <= cursor]
+
+    if len(to_extract) <= WINDOW_SIZE:


If to_extract is empty (meaning all statements in the discussion have already been extracted and committed), we can return early with empty PDP and PDPDeltas objects. This avoids unnecessary LLM calls, saving API tokens and execution time.

if not to_extract: diagram_data.pdp = PDP() return PDP(), PDPDeltas() if len(to_extract) <= WINDOW_SIZE:

Applied in 2d900b2. It also avoids the degenerate empty-tail extraction (the model emitting spurious items when nothing is past the cursor), so good catch.

gemini-code-assist · 2026-06-02T23:48:04Z

+    def _backfill_committed_parents(self) -> None:
+        """Cross-session parent back-fill: when a birth/adopted event names a
+        couple as the parents of an already-committed child whose `parents` is
+        still null, set it. The committed-data analogue of
+        pdp.infer_parents_from_birth_events, which only reaches delta people —
+        so a child committed in an earlier session is otherwise unreachable when
+        a later session states its parentage."""
+        bond_by_dyad: dict[tuple[int, int], int] = {}
+        for pb in self.pair_bonds:
+            a, b = pb.get("person_a"), pb.get("person_b")
+            if a is not None and b is not None:
+                bond_by_dyad[tuple(sorted((a, b)))] = pb["id"]
+        people_by_id = {p["id"]: p for p in self.people}
+        for e in self.events:
+            kind = e.get("kind")
+            kind = kind.value if isinstance(kind, EventKind) else kind
+            if kind not in (EventKind.Birth.value, EventKind.Adopted.value):
+                continue
+            person, spouse, child = e.get("person"), e.get("spouse"), e.get("child")
+            if person is None or spouse is None or child is None:
+                continue
+            bond_id = bond_by_dyad.get(tuple(sorted((person, spouse))))
+            if bond_id is None:
+                continue
+            ch = people_by_id.get(child)
+            if ch is None or ch.get("parents") is not None:
+                continue
+            ch["parents"] = bond_id
+            _log.info(
+                f"backfill_committed_parents: committed Person {child} "
+                f"parents={bond_id} from birth event {e.get('id')}"
+            )
+


Add defensive checks to ensure the "id" key exists in the dictionary items before accessing it. This prevents potential KeyErrors if any person, event, or pair bond in the diagram data is malformed or missing an ID.

def _backfill_committed_parents(self) -> None: """Cross-session parent back-fill: when a birth/adopted event names a couple as the parents of an already-committed child whose `parents` is still null, set it. The committed-data analogue of pdp.infer_parents_from_birth_events, which only reaches delta people — so a child committed in an earlier session is otherwise unreachable when a later session states its parentage.""" bond_by_dyad: dict[tuple[int, int], int] = {} for pb in self.pair_bonds: a, b = pb.get("person_a"), pb.get("person_b") if a is not None and b is not None and "id" in pb: bond_by_dyad[tuple(sorted((a, b)))] = pb["id"] people_by_id = {p["id"]: p for p in self.people if "id" in p} for e in self.events: kind = e.get("kind") kind = kind.value if isinstance(kind, EventKind) else kind if kind not in (EventKind.Birth.value, EventKind.Adopted.value): continue person, spouse, child = e.get("person"), e.get("spouse"), e.get("child") if person is None or spouse is None or child is None: continue bond_id = bond_by_dyad.get(tuple(sorted((person, spouse)))) if bond_id is None: continue ch = people_by_id.get(child) if ch is None or ch.get("parents") is not None: continue ch["parents"] = bond_id _log.info( f"backfill_committed_parents: committed Person {child} " f"parents={bond_id} from birth event {e.get('id')}" )

Declining: committed people/events/pair_bonds always have an id assigned by commit_pdp_items. A committed item missing one is an invariant violation elsewhere; per our fail-early / no-bandaid policy we let it raise rather than silently skip.

…y return on empty tail - _restage_new_items: separate person/bond/event id maps (matches reassign_delta_ids convention; removes coupling to the global-unique-id invariant) - extract_full: early return when nothing is past the cursor, avoiding a wasted LLM call and the degenerate empty-tail extraction Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ride - test_commit: committed-child parent back-fill + no-overwrite guard - test_extract_full: windowing runs per-window and preserves the caller's committed diagram; _restage_new_items keeps committed refs positive and remaps new refs negative; gemini_structured accepts model= (FD-333 crash guard) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gemini-code-assist Bot reviewed Jun 2, 2026

View reviewed changes

patrickkidd and others added 2 commits June 2, 2026 17:35

patrickkidd merged commit cec10ff into master Jun 3, 2026
2 checks passed

patrickkidd deleted the FD-337 branch June 3, 2026 03:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FD-337: windowed re-extraction + cross-session parent back-fill#124

FD-337: windowed re-extraction + cross-session parent back-fill#124
patrickkidd merged 3 commits into
masterfrom
FD-337

patrickkidd commented Jun 2, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Uh oh!

patrickkidd Jun 3, 2026

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Uh oh!

patrickkidd Jun 3, 2026

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Uh oh!

patrickkidd Jun 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

patrickkidd commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

FD-337: Re-extraction for long/multi-session conversations

Corrected diagnosis (the ticket's "stuck at 25-30%" was a measurement bug)

What this PR does

Honest attribution (accumulate 55,58,60, real implementation)

Acceptance status — borderline, stated plainly

Floor limitation (why it isn't higher) — and the ceiling

No F1 regression

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

patrickkidd Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

patrickkidd Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

patrickkidd Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

patrickkidd commented Jun 2, 2026 •

edited

Loading