Implement progressive merging via epoch-based propagation with union-find

## Summary

Implement **epoch-based progressive merging** as described in [`docs/progressive_merging.md`](docs/progressive_merging.md). High-confidence entity merges should be committed *during* propagation (not just in post-processing), enriching neighborhoods for subsequent iterations. This lets evidence from transitively-matched entities compound — the core benefit described in the design doc.

## Background

Currently, `propagate_similarity()` runs to convergence, then `build_match_groups()` applies union-find as a separate post-processing step. An entity appearing in 5 articles only benefits from pairwise similarity — it never gets the combined neighborhood of its merged selves. Progressive merging fixes this by periodically committing high-confidence merges and continuing propagation with enriched neighborhoods.

## Implementation plan

Each step should be a standalone commit where all 54 existing tests pass. Follow the conventions in CLAUDE.md — clean refactors, not patches.

### Step 1: Move negative dampening outside the inner propagation loop

This is a **pure behavior-preserving refactor**. In the current code, the negative factor is already constant per pair within the inner loop (it reads from `name_seed`, which is fixed, and `positive_base`, which the negative factor doesn't depend on). Moving it out doesn't change any final values — only makes the inner loop a clean monotone fixpoint.

Concretely in `propagate_similarity()`:
- The inner `for _ in range(max_iter)` loop should only compute `positive_base` (remove the negative factor application from inside the loop)
- Convergence check should use `positive_base` changes only
- After the loop converges, apply negative dampening in a single pass over all pairs above `neg_gate`

All existing tests must pass unchanged.

### Step 2: Extract `propagate_positive()` and `apply_negative()` as separate functions

Refactor `propagate_similarity()` into:
- `propagate_positive(graph, adjacency, pairs, positive_base, ...) -> positive_base` — the monotone fixpoint inner loop
- `apply_negative(positive_base, forward_adj, rel_sim, name_seed, ...) -> confidence` — one-shot post-convergence dampening
- `propagate_similarity()` calls these two in sequence (same API, same behavior)

All existing tests must pass unchanged.

### Step 3: Add the epoch loop with union-find merging

Wrap the propagate + dampen cycle in an outer epoch loop. The union-find (already in the codebase) tracks merges between epochs.

New parameters for `propagate_similarity()` (or a new top-level function if cleaner):
- `merge_threshold` (float, default 0.9) — only pairs above this are merged between epochs
- `max_epochs` (int, default 5) — maximum number of merge epochs

Epoch loop pseudocode:
```
uf = UnionFind()
for epoch in range(max_epochs):
    1. Build adjacency using UF canonical reps
       (merged entity's neighborhood = union of all members' adjacencies)
    2. Build cross-graph pairs between canonical reps only
       (map original pairs through uf.find(), deduplicate, skip same-group)
    3. Seed name similarity: max across all member-pair name comparisons
    
    4. propagate_positive() to convergence
    5. apply_negative()
    
    6. Find pairs above merge_threshold where uf.find(a) != uf.find(b)
    7. If none → break (global convergence)
    8. uf.union() the new merges
    9. Carry forward: confidence(canon_a, canon_b) = max over member pairs

Return confidence and incorporate UF into the final grouping.
```

**Important**: with default `merge_threshold=0.9` and `max_epochs=5`, existing tests should pass unchanged — pairs that currently score below 0.9 won't trigger any progressive merges, so the epoch loop exits after epoch 1 with no merges.

Between-epoch negative evidence can use converged positive_base (not just name_seed) for neighbor-match checks — there's no circular reinforcement risk since it's between epochs, not within the inner loop. However, if this causes test failures, fall back to using name_seed across epochs too.

### Step 4: Write the enriched-neighborhood test

Add a Layer 2 test in `test_propagation.py` that demonstrates the enriched-neighborhood effect — the core benefit of progressive merging.

The scenario from the design doc:
- Article A: `[Meridian Tech] —acquired→ [DataVault]`
- Article B: `[Meridian Technologies] —bought→ [DataVault Inc]` and `[Meridian Technologies] —headquartered in→ [Austin]`
- Article C: `[Meridian] —headquartered in→ [Austin, TX]`

After epoch 1, A's and B's Meridian merge (strong name sim + structural match via acquisition). The merged entity now has edges to both DataVault and Austin. In epoch 2, C's "Meridian" benefits from TWO structural paths.

The test should verify that C's "Meridian" matches the A+B merged entity. Without progressive merging (max_epochs=1), C's "Meridian" may not match due to low name similarity ("Meridian" vs "Meridian Technologies") and only one structural path per pairwise comparison.

If the scenario doesn't cleanly demonstrate the effect (e.g., name similarity alone is sufficient), adjust entity names to make name similarity insufficient without the enriched neighborhood. The test should **fail with `max_epochs=1`** (or at least produce lower confidence) and **pass with `max_epochs>1`**.

### Step 5: Integration test

Add a Layer 3 test in `test_integration.py` verifying that progressive merging doesn't cause cascading false merges. Reuse the cross-cluster isolation pattern from `test_identical_names_different_contexts_no_merge` — two unrelated clusters with similar structure should stay separate even with progressive merging enabled.

## Acceptance criteria

- [ ] All 54 existing tests pass at every step
- [ ] Negative dampening is cleanly separated from the positive fixpoint loop  
- [ ] Progressive merging is implemented via union-find (no graph mutation)
- [ ] At least one test demonstrates the enriched-neighborhood benefit
- [ ] At least one test verifies no cascading false merges
- [ ] `match_graphs()` API remains backward-compatible (new params have defaults that reproduce current behavior)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement progressive merging via epoch-based propagation with union-find #25

Summary

Background

Implementation plan

Step 1: Move negative dampening outside the inner propagation loop

Step 2: Extract `propagate_positive()` and `apply_negative()` as separate functions

Step 3: Add the epoch loop with union-find merging

Step 4: Write the enriched-neighborhood test

Step 5: Integration test

Acceptance criteria

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Implement progressive merging via epoch-based propagation with union-find #25

Description

Summary

Background

Implementation plan

Step 1: Move negative dampening outside the inner propagation loop

Step 2: Extract propagate_positive() and apply_negative() as separate functions

Step 3: Add the epoch loop with union-find merging

Step 4: Write the enriched-neighborhood test

Step 5: Integration test

Acceptance criteria

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Step 2: Extract `propagate_positive()` and `apply_negative()` as separate functions