feat(resolution): semantic merchant clustering + correction learning#110
Conversation
Co-authored-by: kayodebristol <3579196+kayodebristol@users.noreply.github.com> Agent-Logs-Url: https://github.com/plures/FinancialAdvisor/sessions/89b1df2f-7b8f-4fcf-bcb8-fbedde674b7f
There was a problem hiding this comment.
Pull request overview
Adds a new “Phase 3” resolution layer in @financialadvisor/resolution by introducing semantic merchant classification/clustering, user correction learning, and a new stateful ResolutionEngine that produces structured explanations for categorization decisions.
Changes:
- Added
SemanticMerchantClusterer(TF‑IDF/cosine + trigram similarity; centroid-based category classification). - Added
CorrectionLearner(records user corrections, provides lookup + stats + export/import state). - Added
ResolutionEngine(priority pipeline: corrections → semantic classification → keyword fallback; includes explanation enrichment from history) and unit tests covering the new modules.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| test/unit/resolution.test.ts | Adds unit coverage for the new semantic clustering, correction learning, and resolution engine explanation behavior. |
| packages/resolution/src/semantic-clustering.ts | Implements merchant vectorization, similarity search, clustering, and centroid-based merchant classification. |
| packages/resolution/src/correction-learning.ts | Implements correction recording, lookup by merchant/terms, stats, and state export/import. |
| packages/resolution/src/resolution-engine.ts | Introduces the new resolution pipeline and explanation enrichment via history. |
| packages/resolution/src/index.ts | Re-exports the new modules from the package entrypoint. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| corrections: this.corrections, | ||
| merchantCorrections: Array.from(this.merchantCorrections), | ||
| termCorrections: Array.from(this.termCorrections), |
There was a problem hiding this comment.
exportState() returns the internal corrections array by reference, so callers can mutate the learner’s internal state unintentionally (e.g., exported.corrections.push(...)). Consider returning a defensive copy (and/or deep-cloning entries) to keep the learner state encapsulated.
| corrections: this.corrections, | |
| merchantCorrections: Array.from(this.merchantCorrections), | |
| termCorrections: Array.from(this.termCorrections), | |
| // Defensive copies to avoid external mutation of internal state | |
| corrections: this.corrections.map(c => ({ | |
| ...c, | |
| correctedAt: new Date(c.correctedAt), | |
| })), | |
| merchantCorrections: Array.from( | |
| this.merchantCorrections, | |
| ([key, value]) => [key, { ...value }], | |
| ), | |
| termCorrections: Array.from( | |
| this.termCorrections, | |
| ([key, value]) => [key, { ...value }], | |
| ), |
| const categoryTxns = this.transactionHistory.filter( | ||
| t => | ||
| t.type === TransactionType.EXPENSE && | ||
| (t.category === resolvedCategory || | ||
| TransactionAnalyzer.categorizeTransaction(t) === resolvedCategory), | ||
| ); |
There was a problem hiding this comment.
similarTransactionCount and amountPattern are computed over all historical transactions in the resolved category, not transactions actually “similar” to the current one (same merchant or semantically similar merchants). This makes the explanation text (“Similar to X previous …”) misleading and can inflate confidence adjustments based on very broad category stats. Consider restricting the historical set to the same merchant (and/or the merchant’s semantic peers) before computing counts and amount ranges.
|
|
||
| if (transaction.merchant) { | ||
| this.clusterer.addMerchant(transaction.merchant); | ||
| } |
There was a problem hiding this comment.
applyCorrection() calls clusterer.addMerchant(), which rebuilds the entire IDF model every time a correction is recorded. For many corrections/merchants this becomes O(N²) work. Consider batching rebuilds (e.g., mark IDF dirty and rebuild lazily on the next similarity query, or provide an addMerchants path here) to keep correction application cheap.
| if (transaction.merchant) { | |
| this.clusterer.addMerchant(transaction.merchant); | |
| } |
| // Direct substring match with any centroid term is also a strong signal | ||
| const substringBonus = centroidTerms.some(t => normalized.includes(t)) ? 0.3 : 0; | ||
| const score = Math.max(cosine, substringBonus); | ||
|
|
||
| if (score > bestScore) { | ||
| bestScore = score; | ||
| bestCategory = category; | ||
| } | ||
| } | ||
|
|
||
| if (!bestCategory || bestScore < 0.05) return null; | ||
|
|
||
| const centroidTerms = SemanticMerchantClusterer.CATEGORY_CENTROIDS[bestCategory] ?? []; | ||
| const matchedTerms = Array.from(terms.keys()).filter(t => centroidTerms.includes(t)); | ||
| const reasons: string[] = []; | ||
|
|
||
| if (matchedTerms.length > 0) { | ||
| reasons.push(`Matched category terms: ${matchedTerms.join(', ')}`); | ||
| } | ||
|
|
||
| return { | ||
| category: bestCategory, | ||
| confidence: Math.min(bestScore * 2, 1), | ||
| reasons, | ||
| }; |
There was a problem hiding this comment.
classifyMerchant() can return a non-null classification with an empty reasons array when the best score comes from the substring bonus (e.g., a centroid term is contained in the normalized merchant string but doesn’t appear as an extracted term). This can lead to ResolutionEngine.resolve() returning an explanation with no reasons for semantic matches. Consider always emitting at least one reason (e.g., “Matched substring term …” / “Cosine similarity …”) when returning a classification.
kayodebristol
left a comment
There was a problem hiding this comment.
Auto-approved: CI green + Copilot code review complete.
Adds a Phase 3 intelligence layer to the resolution engine: semantic merchant similarity (replacing pure keyword matching), user-correction feedback loops, and structured explanation objects for every categorization decision.
New modules
SemanticMerchantClusterer(semantic-clustering.ts) — TF-IDF term vectors + cosine similarity for cross-merchant clustering; character tri-gram Jaccard similarity for name variants (e.g. "Trader Joes" ↔ "Trader Joe's"); zero-shot category classification via pre-defined centroid vectors for 8 spending categories.CorrectionLearner(correction-learning.ts) — indexes user re-categorizations by normalized merchant name (high confidence) and description terms (lower confidence). Confidence scales with repeated corrections; supportsexportState/importStatefor persistence.ResolutionEngine(resolution-engine.ts) — orchestrates the full pipeline with a strict priority order:TransactionAnalyzer)Every call to
resolve()returns aResolutionResultwith aResolutionExplanation:Explanation fields:
reasons[],fromCorrection / fromSemanticMatch / fromKeywordMatchflags,amountPattern,temporalPattern,matchedMerchants,confidence.Exports
packages/resolution/src/index.tsre-exports all three new modules alongside the existingTransactionAnalyzer.Original prompt
📱 Kick off Copilot coding agent tasks wherever you are with GitHub Mobile, available on iOS and Android.