FD-324: infer_parents_from_birth_events repair + connectivity_check script#120
Conversation
…cript Deterministic post-extraction repair sets Person.parents from birth events when the LLM fails to forward-reference the PairBond ID. ParentChild F1 0.366→0.815 N=3 avg (+123%), LCC 51%→89.5% (target ≥80% met), no regression. Ships connectivity_check.py script and 5 unit tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request introduces a deterministic repair function, infer_parents_from_birth_events, to improve family-tree connectivity by resolving parent-child relationships from birth and adoption events. It also adds a connectivity_check utility to measure the largest connected component (LCC) percentage and updates documentation to reflect significant improvements in ParentChild F1 and LCC metrics. Feedback focuses on improving the robustness of the parent inference by considering already-committed diagram data and correcting a performance issue in the connectivity check's graph traversal logic.
| def infer_parents_from_birth_events(deltas: PDPDeltas) -> None: | ||
| bond_by_dyad: dict[tuple[int, int], int] = {} | ||
| for pb in deltas.pair_bonds: |
There was a problem hiding this comment.
The infer_parents_from_birth_events function currently only considers pair bonds present in the current deltas. If the parents already have a committed pair bond in the diagram, this repair will fail to link the child to it. Updating the function to accept and check diagram_data will make the repair more robust for multi-pass extractions or re-extractions where parents are already committed.
|
|
||
| fix_committed_person_duplicates(pdp_deltas, diagram_data) | ||
| fix_unresolved_person_refs(pdp_deltas, pdp, diagram_data) | ||
| infer_parents_from_birth_events(pdp_deltas) |
| # BFS connected components | ||
| visited: set[int] = set() | ||
| component_sizes: list[int] = [] | ||
| for start in nodes: | ||
| if start in visited: | ||
| continue | ||
| queue = [start] | ||
| visited.add(start) | ||
| size = 0 | ||
| while queue: | ||
| cur = queue.pop() |
There was a problem hiding this comment.
The implementation uses queue.pop(), which results in a Depth-First Search (DFS) rather than a Breadth-First Search (BFS) as stated in the comment. While both correctly identify connected components, for a proper BFS, collections.deque with popleft() should be used. Additionally, pop(0) on a list is an O(N) operation, making the traversal O(N^2) in the worst case.
Summary
infer_parents_from_birth_events()deterministic post-extraction repair: wiresPerson.parentsfrom birth events when the LLM fails to forward-reference the PairBond ID (ID assignment order problem)connectivity_check.pyLCC % measurement script (FD-324 acceptance criteria)Results (N=3, 6 GT discussions)
Test plan
test_infer_parents_*)🤖 Generated with Claude Code