feat(resolution): semantic merchant clustering + correction learning #110

Merged
kayodebristol merged 2 commits into main from copilot/feat-merchants-clustering-correction-learning
Mar 25, 2026

Copilot AI commented Mar 25, 2026

Adds a Phase 3 intelligence layer to the resolution engine: semantic merchant similarity (taking priority over pure keyword matching, which remains as a fallback), user-correction feedback loops, and structured explanation objects for every categorization decision.

New modules

  • SemanticMerchantClusterer (semantic-clustering.ts) — TF-IDF term vectors + cosine similarity for cross-merchant clustering; character tri-gram Jaccard similarity for name variants (e.g. "Trader Joes" ↔ "Trader Joe's"); zero-shot category classification via pre-defined centroid vectors for 8 spending categories.

  • CorrectionLearner (correction-learning.ts) — indexes user re-categorizations by normalized merchant name (high confidence) and description terms (lower confidence). Confidence scales with repeated corrections; supports exportState/importState for persistence.

  • ResolutionEngine (resolution-engine.ts) — orchestrates the full pipeline with a strict priority order:

    1. User-correction lookup
    2. Semantic merchant classification
    3. Keyword/rule fallback (TransactionAnalyzer)

    Every call to resolve() returns a ResolutionResult with a ResolutionExplanation:

const engine = new ResolutionEngine();
engine.loadHistory(priorTransactions);

const result = engine.resolve(tx);
// result.explanation.reasons →
// [
//   "User previously categorized 'Trader Joes' as Groceries (3 times)",
//   "Amount $42.50 matches typical Groceries range ($20–$80, avg $45)",
//   "Weekly transaction pattern detected"
// ]

engine.applyCorrection(tx, 'Groceries'); // feeds back into CorrectionLearner

Explanation fields: reasons[], fromCorrection / fromSemanticMatch / fromKeywordMatch flags, amountPattern, temporalPattern, matchedMerchants, confidence.
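
The strict priority order and the three from* flags can be illustrated with a small standalone sketch. This is not the PR's ResolutionEngine: the Candidate and Resolver types, the stub stages, the confidence numbers, and the 'Uncategorized' placeholder are all assumptions made for illustration.

```typescript
// Sketch only: these types and stub stages are illustrative assumptions,
// not the actual ResolutionEngine internals.
type Source = 'correction' | 'semantic' | 'keyword';

interface Candidate {
  category: string;
  confidence: number;
  reasons: string[];
}

type Resolver = (merchant: string) => Candidate | null;

interface Explanation {
  reasons: string[];
  fromCorrection: boolean;
  fromSemanticMatch: boolean;
  fromKeywordMatch: boolean;
  confidence: number;
}

interface Resolution {
  category: string;
  explanation: Explanation;
}

// Try each stage in order; the first stage that returns a candidate wins.
function resolveInPriorityOrder(
  merchant: string,
  stages: Array<[Source, Resolver]>,
): Resolution {
  for (const [source, resolver] of stages) {
    const hit = resolver(merchant);
    if (hit !== null) {
      return {
        category: hit.category,
        explanation: {
          reasons: hit.reasons,
          fromCorrection: source === 'correction',
          fromSemanticMatch: source === 'semantic',
          fromKeywordMatch: source === 'keyword',
          confidence: hit.confidence,
        },
      };
    }
  }
  // 'Uncategorized' is a hypothetical placeholder for "no stage matched".
  return {
    category: 'Uncategorized',
    explanation: {
      reasons: ['No stage produced a match'],
      fromCorrection: false,
      fromSemanticMatch: false,
      fromKeywordMatch: false,
      confidence: 0,
    },
  };
}

// Stub stages with made-up confidence values.
const learned = new Map<string, string>([['trader joes', 'Groceries']]);
const stages: Array<[Source, Resolver]> = [
  ['correction', m => {
    const category = learned.get(m.toLowerCase());
    return category === undefined
      ? null
      : { category, confidence: 0.95, reasons: [`User previously categorized '${m}' as ${category}`] };
  }],
  ['semantic', m => (/coffee/i.test(m)
    ? { category: 'Dining', confidence: 0.6, reasons: ['Matched category terms: coffee'] }
    : null)],
  ['keyword', () => null],
];

const viaCorrection = resolveInPriorityOrder('Trader Joes', stages);
const viaSemantic = resolveInPriorityOrder('Blue Bottle Coffee', stages);
const unmatched = resolveInPriorityOrder('ACME 1234', stages);
```

Because the loop returns on the first hit, exactly one of the three flags is true for any matched transaction, which keeps the explanation unambiguous about which stage decided the category.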

Exports

packages/resolution/src/index.ts re-exports all three new modules alongside the existing TransactionAnalyzer.

Original prompt

This section details the original issue you should resolve

<issue_title>feat(resolution): semantic merchant clustering + correction learning</issue_title>
<issue_description>## Summary
Phase 3 intelligence layer for the resolution engine.

Requirements

  • Embed merchant/entity descriptions (via ai-providers or local model)
  • Cluster semantically similar transactions (not just text matching)
  • Learn from user corrections: when user re-categorizes, update resolution model
  • Generate explanation objects: "Classified as Groceries because: similar to previous Trader Joe's transactions, amount range matches, weekly pattern"

Acceptance

  • Semantic clustering finds related merchants that text matching misses
  • User corrections improve future resolution accuracy
  • Explanations are generated for all resolutions</issue_description>

Comments on the Issue (you are @copilot in this section)

@kayodebristol @copilot Please implement this issue.

@github-actions github-actions Bot marked this pull request as ready for review March 25, 2026 14:33
Copilot AI review requested due to automatic review settings March 25, 2026 14:33
Copilot AI requested review from Copilot and removed request for Copilot March 25, 2026 14:45
Copilot AI changed the title from "[WIP] Add semantic merchant clustering with correction learning" to "feat(resolution): semantic merchant clustering + correction learning" Mar 25, 2026
Copilot AI requested a review from kayodebristol March 25, 2026 14:48
@kayodebristol kayodebristol requested a review from Copilot March 25, 2026 15:31

Copilot AI left a comment

Pull request overview

Adds a new “Phase 3” resolution layer in @financialadvisor/resolution by introducing semantic merchant classification/clustering, user correction learning, and a new stateful ResolutionEngine that produces structured explanations for categorization decisions.

Changes:

  • Added SemanticMerchantClusterer (TF‑IDF/cosine + trigram similarity; centroid-based category classification).
  • Added CorrectionLearner (records user corrections, provides lookup + stats + export/import state).
  • Added ResolutionEngine (priority pipeline: corrections → semantic classification → keyword fallback; includes explanation enrichment from history) and unit tests covering the new modules.
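
The two similarity measures named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code in semantic-clustering.ts: the tokenizer, the smoothed IDF formula, the normalization rule, and every function name here are hypothetical.

```typescript
// Sketch only: all names and the exact weighting formula are assumptions.
type Vector = Map<string, number>;

function tokenize(name: string): string[] {
  return name.toLowerCase().split(/[^a-z0-9]+/).filter(t => t.length > 0);
}

// Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1.
function buildIdf(docs: string[][]): Map<string, number> {
  const df = new Map<string, number>();
  docs.forEach(doc => {
    new Set(doc).forEach(term => df.set(term, (df.get(term) ?? 0) + 1));
  });
  const idf = new Map<string, number>();
  df.forEach((count, term) => {
    idf.set(term, Math.log((1 + docs.length) / (1 + count)) + 1);
  });
  return idf;
}

// Term frequency (normalized by document length) times IDF.
function tfidf(doc: string[], idf: Map<string, number>): Vector {
  const counts = new Map<string, number>();
  doc.forEach(term => counts.set(term, (counts.get(term) ?? 0) + 1));
  const vec: Vector = new Map();
  counts.forEach((tf, term) => vec.set(term, (tf / doc.length) * (idf.get(term) ?? 0)));
  return vec;
}

function cosine(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((w, term) => {
    dot += w * (b.get(term) ?? 0);
    normA += w * w;
  });
  b.forEach(w => { normB += w * w; });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Character trigrams over a normalized name (lowercase, punctuation stripped).
function trigrams(text: string): Set<string> {
  const grams = new Set<string>();
  for (let i = 0; i + 3 <= text.length; i++) grams.add(text.slice(i, i + 3));
  return grams;
}

function trigramJaccard(a: string, b: string): number {
  const norm = (s: string) =>
    s.toLowerCase().replace(/[^a-z0-9 ]/g, '').replace(/\s+/g, ' ').trim();
  const na = norm(a);
  const nb = norm(b);
  const ga = trigrams(na);
  const gb = trigrams(nb);
  // Names too short to yield trigrams: fall back to exact comparison.
  if (ga.size === 0 && gb.size === 0) return na === nb ? 1 : 0;
  let shared = 0;
  ga.forEach(g => { if (gb.has(g)) shared++; });
  return shared / (ga.size + gb.size - shared);
}

const docs = ['Whole Foods Market', 'Fresh Foods Market', 'Shell Gas Station'].map(tokenize);
const idf = buildIdf(docs);
const whole = tfidf(tokenize('Whole Foods Market'), idf);
const fresh = tfidf(tokenize('Fresh Foods Market'), idf);
const shell = tfidf(tokenize('Shell Gas Station'), idf);
```

In this sketch, cosine similarity captures merchants that share whole words, while the trigram measure tolerates punctuation and spelling variants of a single name, which is why the two are complementary.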

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| test/unit/resolution.test.ts | Adds unit coverage for the new semantic clustering, correction learning, and resolution engine explanation behavior. |
| packages/resolution/src/semantic-clustering.ts | Implements merchant vectorization, similarity search, clustering, and centroid-based merchant classification. |
| packages/resolution/src/correction-learning.ts | Implements correction recording, lookup by merchant/terms, stats, and state export/import. |
| packages/resolution/src/resolution-engine.ts | Introduces the new resolution pipeline and explanation enrichment via history. |
| packages/resolution/src/index.ts | Re-exports the new modules from the package entrypoint. |

Comment on lines +142 to +144
corrections: this.corrections,
merchantCorrections: Array.from(this.merchantCorrections),
termCorrections: Array.from(this.termCorrections),

Copilot AI Mar 25, 2026

exportState() returns the internal corrections array by reference, so callers can mutate the learner’s internal state unintentionally (e.g., exported.corrections.push(...)). Consider returning a defensive copy (and/or deep-cloning entries) to keep the learner state encapsulated.

Suggested change
- corrections: this.corrections,
- merchantCorrections: Array.from(this.merchantCorrections),
- termCorrections: Array.from(this.termCorrections),
+ // Defensive copies to avoid external mutation of internal state
+ corrections: this.corrections.map(c => ({
+   ...c,
+   correctedAt: new Date(c.correctedAt),
+ })),
+ merchantCorrections: Array.from(
+   this.merchantCorrections,
+   ([key, value]) => [key, { ...value }],
+ ),
+ termCorrections: Array.from(
+   this.termCorrections,
+   ([key, value]) => [key, { ...value }],
+ ),
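
The difference between exporting by reference and exporting a defensive copy can be reproduced in isolation. The CorrectionStore class below is a hypothetical stand-in, not the PR's CorrectionLearner; only the copying pattern mirrors the suggestion.

```typescript
// Sketch only: CorrectionStore is a hypothetical stand-in for CorrectionLearner.
interface Correction {
  merchant: string;
  category: string;
  correctedAt: Date;
}

class CorrectionStore {
  private corrections: Correction[] = [];

  record(merchant: string, category: string, correctedAt: Date): void {
    this.corrections.push({ merchant, category, correctedAt });
  }

  size(): number {
    return this.corrections.length;
  }

  timestamps(): number[] {
    return this.corrections.map(c => c.correctedAt.getTime());
  }

  // Leaky: hands out the internal array itself.
  exportByReference(): { corrections: Correction[] } {
    return { corrections: this.corrections };
  }

  // Safe: new array, new entry objects, and new Date instances.
  exportDefensively(): { corrections: Correction[] } {
    return {
      corrections: this.corrections.map(c => ({
        ...c,
        correctedAt: new Date(c.correctedAt.getTime()),
      })),
    };
  }
}

const store = new CorrectionStore();
store.record('Trader Joes', 'Groceries', new Date(0));

// Pushing onto the leaked array changes the store's own state.
store.exportByReference().corrections.push({
  merchant: 'Injected', category: 'Other', correctedAt: new Date(5),
});
const sizeAfterLeak = store.size();

// Mutating the defensive copy leaves the store untouched.
const safe = store.exportDefensively();
safe.corrections.push({ merchant: 'Ignored', category: 'Other', correctedAt: new Date(9) });
safe.corrections.forEach(c => c.correctedAt.setTime(1000));
const sizeAfterSafe = store.size();
```

Copying the Date objects matters because a spread copy alone is shallow: without it, a caller could still change stored timestamps through the exported entries.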

Comment on lines +263 to +268
const categoryTxns = this.transactionHistory.filter(
t =>
t.type === TransactionType.EXPENSE &&
(t.category === resolvedCategory ||
TransactionAnalyzer.categorizeTransaction(t) === resolvedCategory),
);

Copilot AI Mar 25, 2026

similarTransactionCount and amountPattern are computed over all historical transactions in the resolved category, not transactions actually “similar” to the current one (same merchant or semantically similar merchants). This makes the explanation text (“Similar to X previous …”) misleading and can inflate confidence adjustments based on very broad category stats. Consider restricting the historical set to the same merchant (and/or the merchant’s semantic peers) before computing counts and amount ranges.
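
A minimal sketch of the narrowing proposed in this comment, assuming a simplified transaction shape: the Txn and AmountPattern interfaces and both helper names are hypothetical, and the semantic peers are passed in as a plain list rather than looked up from the clusterer.

```typescript
// Sketch only: Txn, AmountPattern, and both helpers are illustrative assumptions.
interface Txn {
  merchant: string;
  category: string;
  amount: number;
}

interface AmountPattern {
  count: number;
  min: number;
  max: number;
  avg: number;
}

function amountPattern(txns: Txn[]): AmountPattern | null {
  if (txns.length === 0) return null;
  const amounts = txns.map(t => t.amount);
  const sum = amounts.reduce((s, a) => s + a, 0);
  return {
    count: amounts.length,
    min: Math.min(...amounts),
    max: Math.max(...amounts),
    avg: sum / amounts.length,
  };
}

// Keep only history for the same merchant or its named semantic peers,
// instead of every transaction in the resolved category.
function similarHistory(history: Txn[], current: Txn, peers: string[] = []): Txn[] {
  const names = new Set([current.merchant, ...peers].map(n => n.toLowerCase()));
  return history.filter(
    t => t.category === current.category && names.has(t.merchant.toLowerCase()),
  );
}

const history: Txn[] = [
  { merchant: 'Trader Joes', category: 'Groceries', amount: 40 },
  { merchant: 'Trader Joes', category: 'Groceries', amount: 50 },
  { merchant: 'Costco', category: 'Groceries', amount: 300 },
];
const current: Txn = { merchant: 'Trader Joes', category: 'Groceries', amount: 42.5 };

const categoryWide = amountPattern(history.filter(t => t.category === current.category));
const merchantScoped = amountPattern(similarHistory(history, current));
```

With this sample data the category-wide range stretches to 300 because of the single Costco purchase, while the merchant-scoped range stays at 40 to 50, which is the distinction the comment is asking the explanation text to respect.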

Comment on lines +216 to +219

if (transaction.merchant) {
this.clusterer.addMerchant(transaction.merchant);
}

Copilot AI Mar 25, 2026

applyCorrection() calls clusterer.addMerchant(), which rebuilds the entire IDF model every time a correction is recorded. For many corrections/merchants this becomes O(N²) work. Consider batching rebuilds (e.g., mark IDF dirty and rebuild lazily on the next similarity query, or provide an addMerchants path here) to keep correction application cheap.

Suggested change
- if (transaction.merchant) {
-   this.clusterer.addMerchant(transaction.merchant);
- }
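
The lazy-rebuild alternative mentioned in this comment (mark the IDF dirty, rebuild on the next query) could look roughly like the sketch below. LazyIdfIndex and its members are hypothetical names, and the smoothed IDF formula is an assumption; the clusterer's real data structures are not shown in this thread.

```typescript
// Sketch only: LazyIdfIndex is a hypothetical illustration of a dirty-flag rebuild.
class LazyIdfIndex {
  private merchants: string[] = [];
  private idf = new Map<string, number>();
  private dirty = false;
  rebuildCount = 0; // exposed only to make the behavior observable

  // O(1): record the merchant and defer the expensive rebuild.
  addMerchant(name: string): void {
    this.merchants.push(name);
    this.dirty = true;
  }

  // Queries trigger at most one rebuild, no matter how many adds preceded them.
  idfFor(term: string): number {
    this.ensureFresh();
    return this.idf.get(term.toLowerCase()) ?? 0;
  }

  private ensureFresh(): void {
    if (!this.dirty) return;
    const df = new Map<string, number>();
    this.merchants.forEach(m => {
      const terms = new Set(
        m.toLowerCase().split(/[^a-z0-9]+/).filter(t => t.length > 0),
      );
      terms.forEach(t => df.set(t, (df.get(t) ?? 0) + 1));
    });
    const total = this.merchants.length;
    const next = new Map<string, number>();
    df.forEach((count, term) => {
      next.set(term, Math.log((1 + total) / (1 + count)) + 1);
    });
    this.idf = next;
    this.dirty = false;
    this.rebuildCount++;
  }
}

const index = new LazyIdfIndex();
index.addMerchant('Shell Gas');
index.addMerchant('Chevron Gas');
index.addMerchant('Trader Joes');
const rebuildsBeforeQuery = index.rebuildCount;

const gasIdf = index.idfFor('gas');
const shellIdf = index.idfFor('shell');
const rebuildsAfterQueries = index.rebuildCount;

index.addMerchant('Whole Foods');
index.idfFor('foods');
const rebuildsAfterSecondBatch = index.rebuildCount;
```

Under this scheme a burst of N corrections costs N cheap appends plus one rebuild, rather than N rebuilds, at the price of making the first query after a burst slower.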

Comment on lines +140 to +164
// Direct substring match with any centroid term is also a strong signal
const substringBonus = centroidTerms.some(t => normalized.includes(t)) ? 0.3 : 0;
const score = Math.max(cosine, substringBonus);

if (score > bestScore) {
bestScore = score;
bestCategory = category;
}
}

if (!bestCategory || bestScore < 0.05) return null;

const centroidTerms = SemanticMerchantClusterer.CATEGORY_CENTROIDS[bestCategory] ?? [];
const matchedTerms = Array.from(terms.keys()).filter(t => centroidTerms.includes(t));
const reasons: string[] = [];

if (matchedTerms.length > 0) {
reasons.push(`Matched category terms: ${matchedTerms.join(', ')}`);
}

return {
category: bestCategory,
confidence: Math.min(bestScore * 2, 1),
reasons,
};

Copilot AI Mar 25, 2026

classifyMerchant() can return a non-null classification with an empty reasons array when the best score comes from the substring bonus (e.g., a centroid term is contained in the normalized merchant string but doesn’t appear as an extracted term). This can lead to ResolutionEngine.resolve() returning an explanation with no reasons for semantic matches. Consider always emitting at least one reason (e.g., “Matched substring term …” / “Cosine similarity …”) when returning a classification.
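
One way to guarantee a non-empty reasons array is sketched below. It reuses the thresholds visible in the quoted snippet (0.3 substring bonus, 0.05 cutoff, confidence capped at 1), but the explainMatch function, its parameters, and the reason wording are assumptions rather than the merged fix.

```typescript
// Sketch only: explainMatch and its reason strings are hypothetical.
interface Classification {
  category: string;
  confidence: number;
  reasons: string[];
}

function explainMatch(
  normalized: string,      // normalized merchant string
  terms: string[],         // terms extracted from the merchant name
  centroidTerms: string[], // terms defining the candidate category
  category: string,
  cosine: number,          // cosine similarity to the category centroid
): Classification | null {
  const substringHits = centroidTerms.filter(t => normalized.includes(t));
  const score = Math.max(cosine, substringHits.length > 0 ? 0.3 : 0);
  if (score < 0.05) return null;

  const reasons: string[] = [];
  const matched = terms.filter(t => centroidTerms.includes(t));
  if (matched.length > 0) {
    reasons.push(`Matched category terms: ${matched.join(', ')}`);
  }
  // Substring-only hits get their own reason instead of being silent.
  const partial = substringHits.filter(t => !matched.includes(t));
  if (partial.length > 0) {
    reasons.push(`Merchant name contains: ${partial.join(', ')}`);
  }
  // Last resort so that a classification never ships without a reason.
  if (reasons.length === 0) {
    reasons.push(`Cosine similarity ${cosine.toFixed(2)} to ${category} centroid`);
  }
  return { category, confidence: Math.min(score * 2, 1), reasons };
}

const viaSubstring = explainMatch('starbuckscoffee', ['starbuckscoffee'], ['coffee', 'cafe'], 'Dining', 0);
const viaCosine = explainMatch('xyz', ['xyz'], ['coffee', 'cafe'], 'Dining', 0.4);
const noMatch = explainMatch('xyz', ['xyz'], ['coffee', 'cafe'], 'Dining', 0);
```

The substring-only case is the one flagged in the comment: "starbuckscoffee" contains "coffee" but never yields it as an extracted term, so without the second branch the result would carry a confidence and no explanation.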

@kayodebristol kayodebristol left a comment

Auto-approved: CI green + Copilot code review complete.

@kayodebristol kayodebristol merged commit fa20950 into main Mar 25, 2026
11 checks passed
@kayodebristol kayodebristol deleted the copilot/feat-merchants-clustering-correction-learning branch March 25, 2026 16:32