# Product Entity Resolution with Kanoniv

Reconcile product catalogs from 4 heterogeneous retail feeds into a single canonical product catalog using the Kanoniv local SDK.

| Source | Records | Key Identifiers | Challenge |
|--------|---------|-----------------|----------|
| ecommerce_catalog | ~2,600 | UPC (12-digit barcode), SKU | UPC format differs from GTIN |
| wholesale_feed | ~2,600 | GTIN-13 ("0" + UPC), MPN | Different barcode format |
| marketplace_listings | ~2,600 | ASIN only | No barcode or MPN |
| retail_inventory | ~2,600 | manufacturer_code (= MPN) | No barcode |

**10,000+ records** across 4 sources with ~3,000 ground-truth products.

**What this covers:**
- Full `ReconcileResult` API: clusters, golden records, decisions, telemetry, entity lookup
- Iterative spec refinement: rules-based v1 -> Fellegi-Sunter v2
- Three-layer evaluation: structural, stability, ground truth P/R/F1
- Entity-level diffing with `ChangeLog`
- Spec versioning with `diff()`
- Persistence with `save()` / `load()`

```bash
pip install kanoniv pandas
```

---
## 1. Load and Explore the Data

In [None]:
import pandas as pd
from pathlib import Path

DATA = Path("data")

ecom_df = pd.read_csv(DATA / "ecommerce_catalog.csv", dtype=str)
whol_df = pd.read_csv(DATA / "wholesale_feed.csv", dtype=str)
mkt_df = pd.read_csv(DATA / "marketplace_listings.csv", dtype=str)
ret_df = pd.read_csv(DATA / "retail_inventory.csv", dtype=str)

summary = pd.DataFrame([
    {"Source": "ecommerce", "Records": len(ecom_df), "Columns": ", ".join(ecom_df.columns)},
    {"Source": "wholesale", "Records": len(whol_df), "Columns": ", ".join(whol_df.columns)},
    {"Source": "marketplace", "Records": len(mkt_df), "Columns": ", ".join(mkt_df.columns)},
    {"Source": "retail", "Records": len(ret_df), "Columns": ", ".join(ret_df.columns)},
])
summary

### Sample rows from each source

Note that each source uses different column names, identifiers, and price fields.

In [None]:
print("Ecommerce (UPC barcode, product_name, brand, price_usd):")
display(ecom_df[["product_id", "product_name", "upc", "brand", "price_usd"]].head(5))

print("\nWholesale (GTIN-13 barcode, MPN, item_name, manufacturer):")
display(whol_df[["item_id", "item_name", "gtin", "manufacturer", "mpn", "unit_cost"]].head(5))

print("\nMarketplace (ASIN only - no barcode, no MPN):")
display(mkt_df[["listing_id", "title", "asin", "brand", "list_price"]].head(5))

print("\nRetail (manufacturer_code = MPN, no barcode):")
display(ret_df[["inventory_id", "description", "manufacturer_code", "brand_name", "retail_price"]].head(5))

### Key data challenges

| Challenge | Example |
|-----------|--------|
| UPC (12-digit) vs GTIN-13 ("0" + UPC) | `030000000007` vs `0030000000007` |
| MPN = manufacturer_code (different column names) | wholesale `mpn` = retail `manufacturer_code` |
| Marketplace has no barcode or MPN | Must link via product name + brand only |
| Name variations | `"Samsung Galaxy Buds FE - New"` vs `"Samsung Galaxy Buds FE"` |
| Price differences | Wholesale cost < retail price < marketplace list price |

In [None]:
# Demonstrate the UPC/GTIN format mismatch
print("UPC vs GTIN format:")
print(f"  Ecommerce UPC:   {ecom_df.iloc[0]['upc']}  (12 digits)")

# Find the matching GTIN
gtin_lookup = "0" + ecom_df.iloc[0]["upc"]
match = whol_df[whol_df["gtin"] == gtin_lookup]
if len(match) > 0:
    print(f"  Wholesale GTIN:  {match.iloc[0]['gtin']}  (13 digits = '0' + UPC)")
    print(f"\n  Same product:")
    print(f"    ecommerce: {ecom_df.iloc[0]['product_name']}")
    print(f"    wholesale: {match.iloc[0]['item_name']}")

# Count known overlaps
# dropna(): even with dtype=str, empty cells load as NaN and would break "0" + upc
ecom_upcs = set(ecom_df["upc"].dropna())
whol_gtins = set(whol_df["gtin"].dropna())
barcode_matches = sum(1 for upc in ecom_upcs if "0" + upc in whol_gtins)

whol_mpns = set(whol_df["mpn"].dropna())
ret_mfgs = set(ret_df["manufacturer_code"].dropna())
mpn_matches = len(whol_mpns & ret_mfgs)

ecom_names = set(ecom_df["product_name"].dropna().str.lower().str.strip())
mkt_names = set(mkt_df["title"].dropna().str.lower().str.strip())
name_matches = len(ecom_names & mkt_names)

print(f"\nKnown linkages:")
print(f"  Barcode (UPC->GTIN): {barcode_matches} ecommerce<->wholesale")
print(f"  MPN exact:           {mpn_matches} wholesale<->retail")
print(f"  Name exact:          {name_matches} ecommerce<->marketplace")
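
The barcode mismatch shown above is purely a formatting issue: a 12-digit UPC and its GTIN-13 form differ only by a leading zero. A minimal sketch of normalizing both to a common 13-digit key follows. `to_gtin13` is a hypothetical helper, not part of Kanoniv, and it assumes each barcode is a string of digits with optional surrounding whitespace.

```python
def to_gtin13(code: str) -> str:
    """Normalize a 12-digit UPC or a 13-digit GTIN string to 13 digits."""
    digits = code.strip()
    if not digits.isdigit() or len(digits) not in (12, 13):
        raise ValueError(f"not a 12- or 13-digit barcode: {code!r}")
    # A 12-digit UPC becomes GTIN-13 by prepending a single "0";
    # a 13-digit value is returned unchanged.
    return digits.zfill(13)


# Values from the challenge table above.
upc = "030000000007"
gtin = "0030000000007"
print(to_gtin13(upc) == to_gtin13(gtin))  # True
```

Applying the same function to both columns before comparison would let an exact-match rule treat the two formats as equal, without relying on any engine-side normalization.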

---
## 2. Spec v1 - Rules-Based Matching

First attempt: `weighted_sum` scoring with explicit rules.

The spec maps each source's columns to canonical attribute names, defines blocking keys, matching rules, and survivorship.

In [None]:
from kanoniv import Spec, Source, validate, plan, diff, reconcile, ReconcileResult
import textwrap

SPEC_V1 = textwrap.dedent("""\
    api_version: kanoniv/v2
    identity_version: product_v1.0

    entity:
      name: product

    sources:
      - name: ecommerce
        system: csv
        table: ecommerce_catalog
        id: product_id
        attributes:
          barcode: upc
          product_name: product_name
          brand: brand
          price: price_usd
          sku: sku

      - name: wholesale
        system: csv
        table: wholesale_feed
        id: item_id
        attributes:
          barcode: gtin
          product_name: item_name
          brand: manufacturer
          mpn: mpn
          price: unit_cost

      - name: marketplace
        system: csv
        table: marketplace_listings
        id: listing_id
        attributes:
          product_name: title
          brand: brand
          price: list_price

      - name: retail
        system: csv
        table: retail_inventory
        id: inventory_id
        attributes:
          product_name: description
          brand: brand_name
          mpn: manufacturer_code
          price: retail_price

    blocking:
      strategy: composite
      keys:
        - [brand]

    rules:
      - name: barcode_exact
        type: exact
        field: barcode
        weight: 1.0

      - name: mpn_exact
        type: exact
        field: mpn
        weight: 0.9

      - name: name_fuzzy
        type: similarity
        field: product_name
        algorithm: jaro_winkler
        threshold: 0.97
        weight: 0.6

      - name: brand_exact
        type: exact
        field: brand
        weight: 0.3

    decision:
      scoring: weighted_sum
      thresholds:
        match: 0.99
        review: 0.7

    survivorship:
      default: most_complete
      overrides:
        - field: product_name
          strategy: source_priority
          priority: [ecommerce, marketplace, wholesale, retail]
        - field: price
          strategy: aggregate
          function: min
        - field: barcode
          strategy: source_priority
          priority: [ecommerce, wholesale]
""")

spec_v1 = Spec.from_string(SPEC_V1)
print(f"Entity:  {spec_v1.entity}")
print(f"Version: {spec_v1.version}")
print(f"Sources: {len(spec_v1.sources)}")
print(f"Rules:   {len(spec_v1.rules)}")
for r in spec_v1.rules:
    print(f"  {r['name']:18s}  type={r['type']:12s}  weight={r.get('weight')}")

### Validate and plan

In [None]:
vr = validate(spec_v1)
print(f"Valid: {vr.valid}")
if not vr.valid:
    for e in vr.errors:
        print(f"  ERROR: {e}")

plan_v1 = plan(spec_v1)
print(f"\n{plan_v1.summary()}")
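
Blocking on `brand` means that only records sharing a brand value become candidate pairs. The effect on workload can be estimated before running anything by counting within-block pairs, n(n-1)/2 per block. The sketch below uses only the standard library and toy brand values. `candidate_pairs` is a hypothetical helper, not a Kanoniv API, and it counts every within-block pair, whereas the engine may apply further restrictions.

```python
from collections import Counter


def candidate_pairs(block_keys):
    """Count within-block record pairs: sum of n*(n-1)/2 over all blocks."""
    # Records without a blocking key (None) are assumed to form no pairs.
    sizes = Counter(k for k in block_keys if k is not None)
    return sum(n * (n - 1) // 2 for n in sizes.values())


# Toy brand values (hypothetical); None marks a record with no brand.
brands = ["samsung", "samsung", "samsung", "sony", "sony", "bose", None]
n = len(brands)

print(candidate_pairs(brands), "pairs with brand blocking")  # 4
print(n * (n - 1) // 2, "pairs without blocking")            # 21
```

Running the same count for alternative keys gives the candidate-pair side of the tradeoff; the recall side still has to come from an evaluation against ground truth.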

---
## 3. First Reconciliation

Load the sources, run the engine, and explore every part of `ReconcileResult`.

In [None]:
sources = [
    Source.from_csv("ecommerce",   str(DATA / "ecommerce_catalog.csv"),   primary_key="product_id"),
    Source.from_csv("wholesale",   str(DATA / "wholesale_feed.csv"),      primary_key="item_id"),
    Source.from_csv("marketplace", str(DATA / "marketplace_listings.csv"), primary_key="listing_id"),
    Source.from_csv("retail",      str(DATA / "retail_inventory.csv"),     primary_key="inventory_id"),
]

import time
t0 = time.perf_counter()
result_v1 = reconcile(sources, spec_v1)
elapsed = time.perf_counter() - t0

total_records = sum(len(c) for c in result_v1.clusters)
print(f"Input records: {total_records:,}")
print(f"Clusters:      {result_v1.cluster_count:,}")
print(f"Merge rate:    {result_v1.merge_rate:.1%}")
print(f"Runtime:       {elapsed:.2f}s")

### 3a. Clusters

Each cluster is a list of internal UUIDs representing records the engine decided belong to the same product.

In [None]:
multi = [c for c in result_v1.clusters if len(c) > 1]
single = [c for c in result_v1.clusters if len(c) == 1]

print(f"Multi-record clusters: {len(multi)}")
print(f"Singletons:            {len(single)}")

# Cluster size distribution
from collections import Counter
sizes = Counter(len(c) for c in result_v1.clusters)
print(f"\nCluster size distribution:")
for size in sorted(sizes):
    print(f"  size {size:>2d}: {sizes[size]:>4d} clusters")

### 3b. Golden records

`result.to_pandas()` returns the merged canonical records with survivorship applied.

In [None]:
df_golden = result_v1.to_pandas()
print(f"Golden records: {len(df_golden)} rows, {len(df_golden.columns)} columns")

# Show key fields
display_cols = ["kanoniv_id", "product_name", "brand", "price", "barcode", "mpn", "member_count"]
existing = [c for c in display_cols if c in df_golden.columns]
df_golden[existing].head(10)
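
To make the survivorship block of the spec concrete, here is a minimal pure-Python sketch of the three strategies it names: `source_priority`, `aggregate` with `min`, and `most_complete`. It illustrates the idea under assumed semantics and is not Kanoniv's implementation. The `members` records, their values, and the helper functions are all hypothetical, and `most_complete` is assumed to mean "take the value from the member with the most populated fields".

```python
# Hypothetical cluster members: one dict per source record (None = missing).
members = [
    {"source": "wholesale", "product_name": "Galaxy Buds FE", "price": "61.50", "mpn": "MPN-001"},
    {"source": "ecommerce", "product_name": "Samsung Galaxy Buds FE", "price": "99.99", "mpn": None},
    {"source": "retail", "product_name": None, "price": "89.00", "mpn": "MPN-001"},
]


def source_priority(members, field, priority):
    """First non-null value, visiting sources in priority order."""
    for src in priority:
        for m in members:
            if m["source"] == src and m.get(field) is not None:
                return m[field]
    return None


def aggregate_min(members, field):
    """Smallest non-null numeric value across members."""
    values = [float(m[field]) for m in members if m.get(field) is not None]
    return min(values) if values else None


def most_complete(members, field):
    """Value from the member with the most populated fields (assumed semantics)."""
    ranked = sorted(members, key=lambda m: sum(v is not None for v in m.values()), reverse=True)
    for m in ranked:
        if m.get(field) is not None:
            return m[field]
    return None


golden = {
    "product_name": source_priority(
        members, "product_name", ["ecommerce", "marketplace", "wholesale", "retail"]
    ),
    "price": aggregate_min(members, "price"),
    "mpn": most_complete(members, "mpn"),
}
print(golden)
```

With these toy members the name comes from the ecommerce record, the price is the wholesale cost (the minimum), and the MPN comes from the wholesale record because it has the most populated fields.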

### 3c. Match decisions

Every candidate pair evaluated by the engine produces a decision: `merge`, `review`, or `nomerge`.

In [None]:
decisions = result_v1.decisions
print(f"Total decisions: {len(decisions):,}")

# Count by type
from collections import Counter
dec_counts = Counter(d.get("decision") for d in decisions)
for dtype, count in sorted(dec_counts.items(), key=lambda x: -x[1]):
    print(f"  {dtype:10s}: {count:,}")

# Show a merge decision
merges = [d for d in decisions if d.get("decision") == "merge"]
if merges:
    print(f"\nExample merge decision:")
    for k, v in merges[0].items():
        print(f"  {k}: {v}")

### 3d. Telemetry

Engine performance metrics: pairs evaluated, blocking groups, per-rule hit rates.

In [None]:
tel = result_v1.telemetry
print(f"Pairs evaluated:  {tel.get('pairs_evaluated', 0):,}")
print(f"Blocking groups:  {tel.get('blocking_groups', 0):,}")

# Per-rule stats
rule_tel = tel.get("rule_telemetry", [])
if rule_tel:
    rule_df = pd.DataFrame([
        {
            "Rule": rt.get("rule_name"),
            "Evaluated": rt.get("evaluated", 0),
            "Matched": rt.get("matched", 0),
            # max(..., 1) avoids ZeroDivisionError for rules evaluated zero times
            "Match Rate": f"{rt.get('matched', 0) / max(rt.get('evaluated', 0), 1):.1%}",
            "Avg Score": f"{rt.get('avg_score', 0):.3f}",
        }
        for rt in rule_tel
    ])
    display(rule_df)

### 3e. Entity lookup

Reverse index mapping every source record to its canonical `kanoniv_id`. Use this to join back to operational data.

In [None]:
lookup_df = result_v1.entity_lookup  # property, not method
print(f"Entity lookup: {len(lookup_df)} rows")
display(lookup_df.head(10))

# Show all records for one product
sample_kid = lookup_df.iloc[0]["kanoniv_id"]
members = lookup_df[lookup_df["kanoniv_id"] == sample_kid]
print(f"\nAll source records for {sample_kid[:24]}...:")
display(members)

---
## 4. Evaluation - Layers 1 + 2

`result.evaluate()` returns structural and stability metrics without needing labeled data.

In [None]:
metrics_v1 = result_v1.evaluate()
print(metrics_v1.summary())

In [None]:
# Key metrics as a table
pd.DataFrame([
    {"Metric": "Total Records", "Value": metrics_v1.total_records},
    {"Metric": "Total Clusters", "Value": metrics_v1.total_clusters},
    {"Metric": "Merge Rate", "Value": f"{metrics_v1.merge_rate:.1%}"},
    {"Metric": "Singletons", "Value": f"{metrics_v1.singletons} ({metrics_v1.singletons_pct:.1%})"},
    {"Metric": "Largest Cluster", "Value": metrics_v1.largest_cluster},
    {"Metric": "Pairs Evaluated", "Value": f"{metrics_v1.pairs_evaluated:,}"},
    {"Metric": "Blocking Groups", "Value": metrics_v1.blocking_groups},
])

---
## 5. Ground Truth Evaluation (P/R/F1)

Build ground truth from known deterministic linkages:
1. **Barcode**: ecommerce UPC -> wholesale GTIN (prepend "0")
2. **MPN**: wholesale `mpn` = retail `manufacturer_code`
3. **Name**: ecommerce `product_name` = marketplace `title` (exact, case-insensitive)

Then compute pairwise precision, recall, and F1.

In [None]:
from collections import defaultdict

# Union-find for building ground truth clusters
parent = {}

def find(x):
    if x not in parent:
        parent[x] = x
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(a, b):
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[ra] = rb

# Link 1: ecommerce UPC <-> wholesale GTIN
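# Skip missing barcodes: empty cells load as NaN and would break "0" + upc
ecom_by_upc = {u: pid for u, pid in zip(ecom_df["upc"], ecom_df["product_id"]) if pd.notna(u)}
whol_by_gtin = {g: iid for g, iid in zip(whol_df["gtin"], whol_df["item_id"]) if pd.notna(g)}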
barcode_links = 0
for upc, eid in ecom_by_upc.items():
    gtin = "0" + upc
    if gtin in whol_by_gtin:
        union(("ecommerce", eid), ("wholesale", whol_by_gtin[gtin]))
        barcode_links += 1

# Link 2: wholesale MPN <-> retail manufacturer_code
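# Skip missing MPNs so two records are never linked through a shared NaN key
whol_by_mpn = {m: iid for m, iid in zip(whol_df["mpn"], whol_df["item_id"]) if pd.notna(m)}
ret_by_mfg = {m: iid for m, iid in zip(ret_df["manufacturer_code"], ret_df["inventory_id"]) if pd.notna(m)}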
mpn_links = 0
for mpn, wid in whol_by_mpn.items():
    if mpn in ret_by_mfg:
        union(("wholesale", wid), ("retail", ret_by_mfg[mpn]))
        mpn_links += 1

# Link 3: ecommerce name <-> marketplace title
# Skip missing names: NaN has no .lower()
ecom_by_name = {n.lower().strip(): pid for n, pid in zip(ecom_df["product_name"], ecom_df["product_id"]) if pd.notna(n)}
mkt_by_name = {t.lower().strip(): lid for t, lid in zip(mkt_df["title"], mkt_df["listing_id"]) if pd.notna(t)}
name_links = 0
for name, eid in ecom_by_name.items():
    if name in mkt_by_name:
        union(("ecommerce", eid), ("marketplace", mkt_by_name[name]))
        name_links += 1

# Collect clusters
gt_clusters = defaultdict(list)
for record in parent:
    gt_clusters[find(record)].append(record)

# Format for evaluate(): {entity_id: [(source, id), ...]}
ground_truth = {
    f"gt_{i}": members
    for i, members in enumerate(gt_clusters.values())
    if len(members) >= 2
}

print(f"Ground truth:")
print(f"  Barcode links (ecom<->wholesale):  {barcode_links}")
print(f"  MPN links (wholesale<->retail):     {mpn_links}")
print(f"  Name links (ecom<->marketplace):    {name_links}")
print(f"  Ground truth clusters:              {len(ground_truth)}")

In [None]:
metrics_v1_gt = result_v1.evaluate(ground_truth=ground_truth)

pd.DataFrame([
    {"Metric": "Precision", "Value": f"{metrics_v1_gt.precision:.4f}"},
    {"Metric": "Recall", "Value": f"{metrics_v1_gt.recall:.4f}"},
    {"Metric": "F1", "Value": f"{metrics_v1_gt.f1:.4f}"},
    {"Metric": "True Positives", "Value": metrics_v1_gt.true_positives},
    {"Metric": "False Positives", "Value": metrics_v1_gt.false_positives},
    {"Metric": "False Negatives", "Value": metrics_v1_gt.false_negatives},
    {"Metric": "Predicted Pairs", "Value": metrics_v1_gt.predicted_pairs},
    {"Metric": "Ground Truth Pairs", "Value": metrics_v1_gt.ground_truth_pairs},
])
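
For reference, pairwise precision, recall, and F1 can be computed from two clusterings by expanding each cluster into its record pairs and comparing the pair sets. The sketch below is a standard-library illustration of that definition, not the code behind `evaluate()`. The record IDs are toy values, and counting every predicted pair that is absent from the ground truth as a false positive is an assumption; Kanoniv may treat records missing from the ground truth differently.

```python
from itertools import combinations


def cluster_pairs(clusters):
    """All unordered record pairs that share a cluster."""
    pairs = set()
    for members in clusters:
        # sorted() gives each pair one canonical orientation.
        pairs.update(combinations(sorted(members), 2))
    return pairs


def pairwise_prf(predicted, truth):
    """Pairwise precision, recall, and F1 of predicted clusters vs. truth."""
    pred, gold = cluster_pairs(predicted), cluster_pairs(truth)
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Toy example: one true product with three records.
truth = [[("ecommerce", "E1"), ("wholesale", "W1"), ("retail", "R1")]]
predicted = [
    [("ecommerce", "E1"), ("wholesale", "W1")],   # correct pair
    [("retail", "R1"), ("marketplace", "M9")],    # incorrect pair
]

p, r, f1 = pairwise_prf(predicted, truth)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")  # precision=0.50 recall=0.33 f1=0.40
```

The truth cluster contributes three pairs, of which the prediction recovers one, and one of the two predicted pairs is wrong, which yields the values shown.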

---
## 6. Spec v2 - Fellegi-Sunter Probabilistic Matching

Switch from `weighted_sum` to `fellegi_sunter`. Key differences:

| | Weighted Sum | Fellegi-Sunter |
|---|---|---|
| Score formula | `sum(weight * match) / sum(weight)` | `sum(weight * log2(m/u))` |
| Missing fields | Penalizes (contributes 0 to numerator, weight to denominator) | Null-aware (contributes 0) |
| Training | None | EM estimates m/u from data |
| Score scale | 0.0 - 1.0 | Log-likelihood ratio (unbounded) |

FS handles partial evidence gracefully: when neither barcode nor MPN can be compared (marketplace<->retail, where marketplace carries neither identifier), agreement on name and brand alone can drive a match.
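
A small worked calculation shows the difference for a marketplace<->retail pair. It applies the two formulas from the table literally, using the rule weights from spec v1 and the `weight`, `m_probability`, and `u_probability` values from spec v2. This is a sketch under simplifying assumptions: the name comparator is treated as a binary agree/disagree, disagreement penalties are ignored, and the engine's actual scoring (including any EM re-estimation of m/u) may differ.

```python
import math

# Rule weights from spec v1.
V1_WEIGHTS = {"barcode": 1.0, "mpn": 0.9, "product_name": 0.6, "brand": 0.3}

# (weight, m_probability, u_probability) per field from spec v2.
V2_FIELDS = {
    "barcode": (2.0, 0.95, 0.001),
    "mpn": (1.5, 0.90, 0.005),
    "product_name": (1.0, 0.85, 0.03),
    "brand": (0.8, 0.95, 0.05),
}


def weighted_sum(agreeing):
    """sum(weight * match) / sum(weight); missing fields count as non-matches."""
    return sum(V1_WEIGHTS[f] for f in agreeing) / sum(V1_WEIGHTS.values())


def fellegi_sunter(agreeing):
    """sum(weight * log2(m/u)) over agreeing fields; missing fields add 0."""
    return sum(
        w * math.log2(m / u)
        for field, (w, m, u) in V2_FIELDS.items()
        if field in agreeing
    )


# Only name and brand can be compared for this pair, and both agree.
agreeing = {"product_name", "brand"}
ws = weighted_sum(agreeing)
fs = fellegi_sunter(agreeing)

print(f"weighted_sum:   {ws:.3f}  (v1 review threshold: 0.7)")
print(f"fellegi_sunter: {fs:.2f}  (v2 match threshold: 5.6)")
```

Under these assumptions the weighted-sum score is about 0.32, below even the v1 review threshold, because the absent barcode and MPN still count in the denominator. The Fellegi-Sunter score is about 8.2, above the v2 match threshold of 5.6.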

In [None]:
SPEC_V2 = textwrap.dedent("""\
    api_version: kanoniv/v2
    identity_version: product_v2.0

    entity:
      name: product

    sources:
      - name: ecommerce
        system: csv
        table: ecommerce_catalog
        id: product_id
        attributes:
          barcode: upc
          product_name: product_name
          brand: brand
          price: price_usd
          sku: sku

      - name: wholesale
        system: csv
        table: wholesale_feed
        id: item_id
        attributes:
          barcode: gtin
          product_name: item_name
          brand: manufacturer
          mpn: mpn
          price: unit_cost

      - name: marketplace
        system: csv
        table: marketplace_listings
        id: listing_id
        attributes:
          product_name: title
          brand: brand
          price: list_price

      - name: retail
        system: csv
        table: retail_inventory
        id: inventory_id
        attributes:
          product_name: description
          brand: brand_name
          mpn: manufacturer_code
          price: retail_price

    blocking:
      strategy: composite
      keys:
        - [brand]

    decision:
      scoring:
        strategy: fellegi_sunter
        fields:
          - name: barcode
            comparator: exact
            weight: 2.0
            m_probability: 0.95
            u_probability: 0.001
            normalizer: generic
          - name: mpn
            comparator: exact
            weight: 1.5
            m_probability: 0.90
            u_probability: 0.005
            normalizer: generic
          - name: product_name
            comparator: jaro_winkler
            weight: 1.0
            m_probability: 0.85
            u_probability: 0.03
            normalizer: generic
          - name: brand
            comparator: exact
            weight: 0.8
            m_probability: 0.95
            u_probability: 0.05
            normalizer: generic
        thresholds:
          match: 5.6
          possible: 2.0
          non_match: -4.0
      thresholds:
        match: 0.9
        review: 0.7

    survivorship:
      default: most_complete
      overrides:
        - field: product_name
          strategy: source_priority
          priority: [ecommerce, marketplace, wholesale, retail]
        - field: price
          strategy: aggregate
          function: min
        - field: barcode
          strategy: source_priority
          priority: [ecommerce, wholesale]
""")

spec_v2 = Spec.from_string(SPEC_V2)
validate(spec_v2).raise_on_error()
print("Spec v2 valid.")

### Spec diff (v1 -> v2)

`diff()` shows exactly what changed between two spec versions.

In [None]:
d = diff(spec_v1, spec_v2)

print(f"Has changes:        {d.has_changes}")
print(f"Version changed:    {d.version_changed}")
print(f"Rules removed:      {d.rules_removed}")
print(f"Scoring changed:    {d.scoring_changed}")
print(f"Thresholds changed: {d.thresholds_changed}")
print(f"Blocking changed:   {d.blocking_changed}")
print(f"\nSummary: {d.summary}")

---
## 7. V2 Reconciliation + ChangeLog

Run v2, then use `changes_since()` to see exactly which entities were created, grew, merged, split, or were removed relative to v1.

In [None]:
t0 = time.perf_counter()
result_v2 = reconcile(sources, spec_v2)
elapsed = time.perf_counter() - t0

print(f"V2 clusters:   {result_v2.cluster_count:,}")
print(f"V2 merge rate: {result_v2.merge_rate:.1%}")
print(f"V2 runtime:    {elapsed:.2f}s")

In [None]:
changelog = result_v2.changes_since(result_v1)

print(f"Summary: {changelog.summary}")
print(f"\nCreated:   {len(changelog.created):>3d}  (new entities)")
print(f"Grown:     {len(changelog.grown):>3d}  (gained records)")
print(f"Merged:    {len(changelog.merged):>3d}  (v1 entities combined)")
print(f"Split:     {len(changelog.split):>3d}  (v1 entity broke apart)")
print(f"Removed:   {len(changelog.removed):>3d}  (lost all records)")
print(f"Unchanged: {changelog.unchanged_count:>3d}")

In [None]:
# Inspect a merged entity
if changelog.merged:
    m = changelog.merged[0]
    print(f"Example merged entity:")
    print(f"  kanoniv_id:    {m.kanoniv_id}")
    print(f"  Members:       {len(m.source_records)}")
    print(f"  Previous IDs:  {len(m.previous_kanoniv_ids)} entities combined")
    for src, rid in m.source_records[:5]:
        print(f"    {src:14s} {rid}")

# ChangeLog as DataFrame
cl_df = changelog.to_pandas()
print(f"\nChangeLog DataFrame: {len(cl_df)} rows")
display(cl_df.head(10))
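
The categories reported above can be understood as a comparison of record membership between two runs. The sketch below classifies each current entity by where its records lived previously. It is an illustration with assumed definitions, not the logic behind `changes_since()`: the entity IDs, record IDs, and the `classify_changes` helper are hypothetical, and Kanoniv's exact rules for each category may differ.

```python
def classify_changes(previous, current):
    """Label each current entity by comparing its records to previous entities.

    Both arguments map entity_id -> set of record ids.
    """
    # Reverse index: record -> previous entity id.
    prev_of = {rec: eid for eid, recs in previous.items() for rec in recs}
    labels = {}
    for eid, recs in current.items():
        origins = {prev_of[r] for r in recs if r in prev_of}
        if not origins:
            labels[eid] = "created"        # no record was seen before
        elif len(origins) > 1:
            labels[eid] = "merged"         # records from several old entities
        else:
            old = previous[next(iter(origins))]
            if recs == old:
                labels[eid] = "unchanged"
            elif recs > old:
                labels[eid] = "grown"      # all old records plus new ones
            else:
                labels[eid] = "split"      # some old records went elsewhere
    return labels


previous = {"p1": {"E1"}, "p2": {"W1"}, "p3": {"E2", "M2"}, "p4": {"R3", "M3"}, "p5": {"E5"}}
current = {
    "c1": {"E1", "W1"},   # p1 + p2 combined
    "c2": {"E2", "M2"},   # same as p3
    "c3": {"R3"},         # p4 broke apart
    "c4": {"M3"},         # p4 broke apart
    "c5": {"R9"},         # brand-new record
    "c6": {"E5", "R5"},   # p5 gained a new record
}

for eid, label in classify_changes(previous, current).items():
    print(eid, label)
```

Previous entities whose records all disappeared would fall under "removed"; the sketch omits that case because it only iterates over current entities.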

---
## 8. V1 vs V2 Comparison

Side-by-side evaluation with ground truth.

In [None]:
metrics_v2_gt = result_v2.evaluate(ground_truth=ground_truth)

comparison = pd.DataFrame([
    {"Metric": "Clusters", "V1 (weighted_sum)": result_v1.cluster_count, "V2 (fellegi_sunter)": result_v2.cluster_count},
    {"Metric": "Merge rate", "V1 (weighted_sum)": f"{result_v1.merge_rate:.1%}", "V2 (fellegi_sunter)": f"{result_v2.merge_rate:.1%}"},
    {"Metric": "Precision", "V1 (weighted_sum)": f"{metrics_v1_gt.precision:.4f}", "V2 (fellegi_sunter)": f"{metrics_v2_gt.precision:.4f}"},
    {"Metric": "Recall", "V1 (weighted_sum)": f"{metrics_v1_gt.recall:.4f}", "V2 (fellegi_sunter)": f"{metrics_v2_gt.recall:.4f}"},
    {"Metric": "F1", "V1 (weighted_sum)": f"{metrics_v1_gt.f1:.4f}", "V2 (fellegi_sunter)": f"{metrics_v2_gt.f1:.4f}"},
    {"Metric": "True Positives", "V1 (weighted_sum)": metrics_v1_gt.true_positives, "V2 (fellegi_sunter)": metrics_v2_gt.true_positives},
    {"Metric": "False Positives", "V1 (weighted_sum)": metrics_v1_gt.false_positives, "V2 (fellegi_sunter)": metrics_v2_gt.false_positives},
    {"Metric": "False Negatives", "V1 (weighted_sum)": metrics_v1_gt.false_negatives, "V2 (fellegi_sunter)": metrics_v2_gt.false_negatives},
])
comparison

### Save and load

Persist results for later comparison or incremental reconciliation.

In [None]:
result_v2.save("product_v2.knv")

restored = ReconcileResult.load("product_v2.knv")
print(f"Saved and restored: {restored.cluster_count} clusters")
assert restored.cluster_count == result_v2.cluster_count

# Clean up
import os
os.remove("product_v2.knv")

---
## 9. Full V2 Evaluation

In [None]:
print(metrics_v2_gt.summary())

---
## Summary

### What we did

1. Loaded 4 product catalogs (10,000+ records) with different schemas and identifiers
2. Wrote a rules-based spec (v1) with `weighted_sum` scoring
3. Ran reconciliation and inspected every part of `ReconcileResult`
4. Evaluated with structural metrics (no labels needed) and ground truth P/R/F1
5. Upgraded to Fellegi-Sunter (v2) with null-aware log-likelihood scoring
6. Used `diff()` to compare spec versions and `changes_since()` to compare results
7. Demonstrated `save()` / `load()` for persistence

### API coverage

| API | Section |
|-----|--------|
| `Source.from_csv()` | 3 |
| `Spec.from_string()` | 2, 6 |
| `validate()` | 2, 6 |
| `plan()` | 2 |
| `diff()` | 6 |
| `reconcile()` | 3, 7 |
| `result.clusters` | 3a |
| `result.to_pandas()` | 3b |
| `result.decisions` | 3c |
| `result.telemetry` | 3d |
| `result.entity_lookup` | 3e |
| `result.cluster_count` / `merge_rate` | 3 |
| `result.evaluate()` | 4 |
| `result.evaluate(ground_truth=)` | 5, 8 |
| `result.changes_since()` | 7 |
| `result.save()` / `ReconcileResult.load()` | 8 |
| `EvaluateResult.summary()` | 4, 9 |
| `ChangeLog` | 7 |
| `DiffResult` | 6 |

### Next steps

- **Active learning**: Label uncertain pairs via the [Cloud API](https://kanoniv.com/docs/sdks/cloud/) feedback endpoint and retrain FS with supervised EM
- **Blocking experiments**: Try `[[product_name]]` or `[[brand, category]]` keys and measure the recall/candidate-pairs tradeoff
- **Incremental reconciliation**: Pass `previous=result_v2` to `reconcile()` when new data arrives
- **Cloud deployment**: Use `kanoniv.Client` for persistent identity graph, real-time resolve, and incremental export