feat(resolution): semantic merchant clustering + correction learning by Copilot · Pull Request #110 · plures/FinancialAdvisor

Copilot · 2026-03-25T14:33:26Z

Adds a Phase 3 intelligence layer to the resolution engine: semantic merchant similarity (replacing pure keyword matching), user-correction feedback loops, and structured explanation objects for every categorization decision.

New modules

SemanticMerchantClusterer (semantic-clustering.ts) — TF-IDF term vectors + cosine similarity for cross-merchant clustering; character tri-gram Jaccard similarity for name variants (e.g. "Trader Joes" ↔ "Trader Joe's"); zero-shot category classification via pre-defined centroid vectors for 8 spending categories.
CorrectionLearner (correction-learning.ts) — indexes user re-categorizations by normalized merchant name (high confidence) and description terms (lower confidence). Confidence scales with repeated corrections; supports exportState/importState for persistence.
ResolutionEngine (resolution-engine.ts) — orchestrates the full pipeline with a strict priority order:
1. User-correction lookup
2. Semantic merchant classification
3. Keyword/rule fallback (TransactionAnalyzer)
Every call to resolve() returns a ResolutionResult with a ResolutionExplanation:

const engine = new ResolutionEngine();
engine.loadHistory(priorTransactions);

const result = engine.resolve(tx);
// result.explanation.reasons →
// [
//   "User previously categorized 'Trader Joes' as Groceries (3 times)",
//   "Amount $42.50 matches typical Groceries range ($20–$80, avg $45)",
//   "Weekly transaction pattern detected"
// ]

engine.applyCorrection(tx, 'Groceries'); // feeds back into CorrectionLearner

Explanation fields: reasons[], fromCorrection / fromSemanticMatch / fromKeywordMatch flags, amountPattern, temporalPattern, matchedMerchants, confidence.
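
The priority order and the explanation flags listed above can be pictured with a small sketch. It is not the PR's `ResolutionEngine`: `resolveByPriority`, `Resolver`, and `Candidate` are hypothetical names, and the `Explanation` field types are guesses based on the field names above (the pattern and merchant fields are omitted for brevity).

```typescript
// Hypothetical sketch of a first-match-wins pipeline; not the PR's actual code.
type Source = 'correction' | 'semantic' | 'keyword';

interface Candidate {
  category: string;
  confidence: number; // assumed to be in the range 0..1
  reasons: string[];
}

interface Explanation {
  reasons: string[];
  fromCorrection: boolean;
  fromSemanticMatch: boolean;
  fromKeywordMatch: boolean;
  confidence: number;
}

interface Resolution {
  category: string;
  explanation: Explanation;
}

// A resolver returns null when it has no opinion about the merchant.
type Resolver = (merchant: string) => Candidate | null;

function resolveByPriority(
  merchant: string,
  resolvers: Array<[Source, Resolver]>,
): Resolution | null {
  // Resolvers are tried in order; the first non-null candidate wins.
  for (const [source, resolver] of resolvers) {
    const hit = resolver(merchant);
    if (hit === null) continue;
    return {
      category: hit.category,
      explanation: {
        reasons: hit.reasons,
        fromCorrection: source === 'correction',
        fromSemanticMatch: source === 'semantic',
        fromKeywordMatch: source === 'keyword',
        confidence: hit.confidence,
      },
    };
  }
  return null;
}

// Illustrative resolvers: no stored corrections, a toy semantic rule, a catch-all fallback.
const demoResolvers: Array<[Source, Resolver]> = [
  ['correction', () => null],
  [
    'semantic',
    m =>
      m.toLowerCase().includes('market')
        ? { category: 'Groceries', confidence: 0.6, reasons: ['Matched category terms: market'] }
        : null,
  ],
  ['keyword', () => ({ category: 'Other', confidence: 0.2, reasons: ['Keyword fallback'] })],
];

const demoResolution = resolveByPriority('Corner Market', demoResolvers);
```

Because the loop returns on the first hit, exactly one of the three `from*` flags is set for any resolved transaction in this sketch.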

Exports

packages/resolution/src/index.ts re-exports all three new modules alongside the existing TransactionAnalyzer.

Original prompt

This section details the original issue you should resolve

<issue_title>feat(resolution): semantic merchant clustering + correction learning</issue_title>
<issue_description>## Summary
Phase 3 intelligence layer for the resolution engine.

Requirements

Embed merchant/entity descriptions (via ai-providers or local model)

Cluster semantically similar transactions (not just text matching)

Learn from user corrections: when user re-categorizes, update resolution model

Generate explanation objects: "Classified as Groceries because: similar to previous Trader Joe's transactions, amount range matches, weekly pattern"

Acceptance

Semantic clustering finds related merchants that text matching misses

User corrections improve future resolution accuracy

Explanations are generated for all resolutions</issue_description>

Comments on the Issue (you are @copilot in this section)
@kayodebristol @copilot Please implement this issue.

Fixes feat(resolution): semantic merchant clustering + correction learning #73

Co-authored-by: kayodebristol <3579196+kayodebristol@users.noreply.github.com> Agent-Logs-Url: https://github.com/plures/FinancialAdvisor/sessions/89b1df2f-7b8f-4fcf-bcb8-fbedde674b7f

Copilot

Pull request overview

Adds a new “Phase 3” resolution layer in @financialadvisor/resolution by introducing semantic merchant classification/clustering, user correction learning, and a new stateful ResolutionEngine that produces structured explanations for categorization decisions.

Changes:

Added SemanticMerchantClusterer (TF‑IDF/cosine + trigram similarity; centroid-based category classification).
Added CorrectionLearner (records user corrections, provides lookup + stats + export/import state).
Added ResolutionEngine (priority pipeline: corrections → semantic classification → keyword fallback; includes explanation enrichment from history) and unit tests covering the new modules.
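
For readers unfamiliar with the trigram similarity mentioned above, here is a minimal sketch of character tri-gram Jaccard similarity. The helper names (`normalizeName`, `trigrams`, `trigramJaccard`) and the normalization rules are assumptions for illustration; the PR's own normalization is not shown on this page.

```typescript
// Hypothetical helpers; the normalization here is an assumption, not the PR's.
function normalizeName(name: string): string {
  return name
    .toLowerCase()
    .replace(/['’]/g, '') // drop apostrophes so "Joe's" and "Joes" align
    .replace(/[^a-z0-9]+/g, ' ') // collapse other punctuation and whitespace
    .trim();
}

// Collect the set of overlapping 3-character windows in a string.
function trigrams(text: string): Set<string> {
  const grams = new Set<string>();
  if (text.length < 3) {
    if (text.length > 0) grams.add(text); // short names count as a single gram
    return grams;
  }
  for (let i = 0; i + 3 <= text.length; i++) {
    grams.add(text.slice(i, i + 3));
  }
  return grams;
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| over the two trigram sets.
function trigramJaccard(a: string, b: string): number {
  const ga = trigrams(normalizeName(a));
  const gb = trigrams(normalizeName(b));
  if (ga.size === 0 || gb.size === 0) return 0;
  let shared = 0;
  ga.forEach(g => {
    if (gb.has(g)) shared++;
  });
  return shared / (ga.size + gb.size - shared);
}

const variantScore = trigramJaccard('Trader Joes #123', 'Trader Joes');
```

Set-based Jaccard ignores how often a trigram repeats, which is usually acceptable for short merchant names.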

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

File	Description
test/unit/resolution.test.ts	Adds unit coverage for the new semantic clustering, correction learning, and resolution engine explanation behavior.
packages/resolution/src/semantic-clustering.ts	Implements merchant vectorization, similarity search, clustering, and centroid-based merchant classification.
packages/resolution/src/correction-learning.ts	Implements correction recording, lookup by merchant/terms, stats, and state export/import.
packages/resolution/src/resolution-engine.ts	Introduces the new resolution pipeline and explanation enrichment via history.
packages/resolution/src/index.ts	Re-exports the new modules from the package entrypoint.

Copilot · 2026-03-25T15:36:00Z

+      corrections: this.corrections,
+      merchantCorrections: Array.from(this.merchantCorrections),
+      termCorrections: Array.from(this.termCorrections),


exportState() returns the internal corrections array by reference, so callers can mutate the learner’s internal state unintentionally (e.g., exported.corrections.push(...)). Consider returning a defensive copy (and/or deep-cloning entries) to keep the learner state encapsulated.

Suggested change

-      corrections: this.corrections,
-      merchantCorrections: Array.from(this.merchantCorrections),
-      termCorrections: Array.from(this.termCorrections),
+      // Defensive copies to avoid external mutation of internal state
+      corrections: this.corrections.map(c => ({
+        ...c,
+        correctedAt: new Date(c.correctedAt),
+      })),
+      merchantCorrections: Array.from(
+        this.merchantCorrections,
+        ([key, value]) => [key, { ...value }],
+      ),
+      termCorrections: Array.from(
+        this.termCorrections,
+        ([key, value]) => [key, { ...value }],
+      ),

Copilot · 2026-03-25T15:36:02Z

+    const categoryTxns = this.transactionHistory.filter(
+      t =>
+        t.type === TransactionType.EXPENSE &&
+        (t.category === resolvedCategory ||
+          TransactionAnalyzer.categorizeTransaction(t) === resolvedCategory),
+    );


similarTransactionCount and amountPattern are computed over all historical transactions in the resolved category, not transactions actually “similar” to the current one (same merchant or semantically similar merchants). This makes the explanation text (“Similar to X previous …”) misleading and can inflate confidence adjustments based on very broad category stats. Consider restricting the historical set to the same merchant (and/or the merchant’s semantic peers) before computing counts and amount ranges.
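
One way to act on this comment is to narrow the history to the same merchant before computing counts and amount ranges. The sketch below is only an illustration under assumed types: `HistoryTxn`, `AmountPattern`, `normalizeMerchant`, and `amountPatternForMerchant` are hypothetical and do not reflect the engine's real transaction model.

```typescript
// Hypothetical types; the PR's real Transaction model is not shown here.
interface HistoryTxn {
  merchant?: string;
  category: string;
  amount: number;
}

interface AmountPattern {
  count: number;
  min: number;
  max: number;
  avg: number;
}

// Assumed normalization: lowercase, drop apostrophes, collapse punctuation.
function normalizeMerchant(name: string): string {
  return name
    .toLowerCase()
    .replace(/['’]/g, '')
    .replace(/[^a-z0-9]+/g, ' ')
    .trim();
}

// Compute amount stats only over prior transactions from the same merchant
// in the resolved category, rather than over the whole category.
function amountPatternForMerchant(
  history: HistoryTxn[],
  merchant: string,
  category: string,
): AmountPattern | null {
  const key = normalizeMerchant(merchant);
  const amounts = history
    .filter(
      t =>
        t.merchant !== undefined &&
        normalizeMerchant(t.merchant) === key &&
        t.category === category,
    )
    .map(t => Math.abs(t.amount));
  if (amounts.length === 0) return null;
  const total = amounts.reduce((sum, a) => sum + a, 0);
  return {
    count: amounts.length,
    min: Math.min(...amounts),
    max: Math.max(...amounts),
    avg: total / amounts.length,
  };
}

const sampleHistory: HistoryTxn[] = [
  { merchant: 'Trader Joes', category: 'Groceries', amount: 40 },
  { merchant: "Trader Joe's", category: 'Groceries', amount: 50 },
  { merchant: 'Costco', category: 'Groceries', amount: 300 },
];
const samplePattern = amountPatternForMerchant(sampleHistory, 'Trader Joes', 'Groceries');
```

With the sample data, the large Costco purchase no longer widens the range reported for Trader Joes. Extending the filter to the merchant's semantic peers would follow the same shape.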

Copilot · 2026-03-25T15:36:02Z

+
+    if (transaction.merchant) {
+      this.clusterer.addMerchant(transaction.merchant);
+    }


applyCorrection() calls clusterer.addMerchant(), which rebuilds the entire IDF model every time a correction is recorded. For many corrections/merchants this becomes O(N²) work. Consider batching rebuilds (e.g., mark IDF dirty and rebuild lazily on the next similarity query, or provide an addMerchants path here) to keep correction application cheap.

Suggested change

    if (transaction.merchant) {
      this.clusterer.addMerchant(transaction.merchant);
    }
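
The lazy-rebuild idea from this comment (mark the IDF model dirty, rebuild on the next query) might look roughly like the following. `LazyIdfIndex` is a hypothetical stand-in rather than the PR's `SemanticMerchantClusterer`, and the smoothed IDF formula is an assumption chosen for the example.

```typescript
// Hypothetical stand-in for the clusterer's IDF bookkeeping.
class LazyIdfIndex {
  private readonly docs = new Map<string, string[]>();
  private idf = new Map<string, number>();
  private dirty = false;
  rebuildCount = 0; // exposed only to make the laziness observable

  // O(1) amortized: store the merchant and flag the model as stale.
  addMerchant(name: string): void {
    const key = name.toLowerCase().trim();
    if (key === '' || this.docs.has(key)) return;
    this.docs.set(key, key.split(/\s+/));
    this.dirty = true;
  }

  // Queries trigger at most one rebuild, no matter how many adds preceded them.
  getIdf(term: string): number {
    this.ensureFresh();
    return this.idf.get(term) ?? 0;
  }

  private ensureFresh(): void {
    if (!this.dirty) return;
    const docFreq = new Map<string, number>();
    this.docs.forEach(terms => {
      new Set(terms).forEach(t => {
        docFreq.set(t, (docFreq.get(t) ?? 0) + 1);
      });
    });
    const n = this.docs.size;
    const next = new Map<string, number>();
    docFreq.forEach((df, term) => {
      // Assumed smoothed IDF: ln((1 + N) / (1 + df)) + 1
      next.set(term, Math.log((1 + n) / (1 + df)) + 1);
    });
    this.idf = next;
    this.dirty = false;
    this.rebuildCount++;
  }
}

const idfIndex = new LazyIdfIndex();
['Trader Joes', 'Trader Vics', 'Shell'].forEach(m => idfIndex.addMerchant(m));
```

Recording N corrections then costs N cheap inserts plus one rebuild at the next similarity query, instead of N full rebuilds.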

Copilot · 2026-03-25T15:36:03Z

+      // Direct substring match with any centroid term is also a strong signal
+      const substringBonus = centroidTerms.some(t => normalized.includes(t)) ? 0.3 : 0;
+      const score = Math.max(cosine, substringBonus);
+
+      if (score > bestScore) {
+        bestScore = score;
+        bestCategory = category;
+      }
+    }
+
+    if (!bestCategory || bestScore < 0.05) return null;
+
+    const centroidTerms = SemanticMerchantClusterer.CATEGORY_CENTROIDS[bestCategory] ?? [];
+    const matchedTerms = Array.from(terms.keys()).filter(t => centroidTerms.includes(t));
+    const reasons: string[] = [];
+
+    if (matchedTerms.length > 0) {
+      reasons.push(`Matched category terms: ${matchedTerms.join(', ')}`);
+    }
+
+    return {
+      category: bestCategory,
+      confidence: Math.min(bestScore * 2, 1),
+      reasons,
+    };


classifyMerchant() can return a non-null classification with an empty reasons array when the best score comes from the substring bonus (e.g., a centroid term is contained in the normalized merchant string but doesn’t appear as an extracted term). This can lead to ResolutionEngine.resolve() returning an explanation with no reasons for semantic matches. Consider always emitting at least one reason (e.g., “Matched substring term …” / “Cosine similarity …”) when returning a classification.
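
A possible shape for the fix suggested here is a helper that guarantees a non-empty `reasons` array. `buildReasons` is a hypothetical function, not code from the PR, and the fallback wording is only an example.

```typescript
// Hypothetical helper: always returns at least one reason for a classification.
function buildReasons(
  normalized: string, // normalized merchant string
  terms: string[], // terms extracted from the merchant name
  centroidTerms: string[], // terms of the winning category centroid
  cosine: number, // cosine similarity to that centroid
): string[] {
  const reasons: string[] = [];

  // Exact term overlap, as in the original code path.
  const exact = terms.filter(t => centroidTerms.includes(t));
  if (exact.length > 0) {
    reasons.push(`Matched category terms: ${exact.join(', ')}`);
  }

  // Centroid terms that only appear as substrings (the substring-bonus case).
  const substrings = centroidTerms.filter(
    t => !exact.includes(t) && normalized.includes(t),
  );
  if (substrings.length > 0) {
    reasons.push(`Matched substring terms: ${substrings.join(', ')}`);
  }

  // Last resort: report the similarity score so the array is never empty.
  if (reasons.length === 0) {
    reasons.push(`Cosine similarity to category centroid: ${cosine.toFixed(2)}`);
  }
  return reasons;
}

const substringOnly = buildReasons('supermarketplus', ['supermarketplus'], ['market', 'grocery'], 0);
```

In the substring-only case the caller now gets a "Matched substring terms" reason instead of an empty explanation.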

kayodebristol

Auto-approved: CI green + Copilot code review complete.

Initial plan

e5ca45b

Copilot AI assigned Copilot and kayodebristol Mar 25, 2026

Copilot started work on behalf of kayodebristol March 25, 2026 14:33

github-actions Bot marked this pull request as ready for review March 25, 2026 14:33

Copilot AI review requested due to automatic review settings March 25, 2026 14:33

feat(resolution): semantic merchant clustering + correction learning

caaafad

Co-authored-by: kayodebristol <3579196+kayodebristol@users.noreply.github.com> Agent-Logs-Url: https://github.com/plures/FinancialAdvisor/sessions/89b1df2f-7b8f-4fcf-bcb8-fbedde674b7f

Copilot AI requested review from Copilot and removed request for Copilot March 25, 2026 14:45

Copilot AI changed the title ~~[WIP] Add semantic merchant clustering with correction learning~~ feat(resolution): semantic merchant clustering + correction learning Mar 25, 2026

Copilot AI requested a review from kayodebristol March 25, 2026 14:48

Copilot finished work on behalf of kayodebristol March 25, 2026 14:48

kayodebristol requested a review from Copilot March 25, 2026 15:31

Copilot started reviewing on behalf of kayodebristol March 25, 2026 15:32

Copilot AI reviewed Mar 25, 2026

kayodebristol approved these changes Mar 25, 2026

kayodebristol merged commit fa20950 into main Mar 25, 2026
11 checks passed

kayodebristol deleted the copilot/feat-merchants-clustering-correction-learning branch March 25, 2026 16:32

feat(resolution): semantic merchant clustering + correction learning #110
kayodebristol merged 2 commits into main from copilot/feat-merchants-clustering-correction-learning

3 participants
