Improve ROM recognition with fuzzy matching and multi-strategy search by Copilot · Pull Request #92 · luandev/jacare

Copilot · 2026-01-08T18:30:46Z

Summary

Implements fuzzy matching to improve ROM recognition from 40% to near 100% for common abbreviated filenames (e.g., SMW.sfc, ALTTP.sfc)
Adds multi-strategy search with confidence scoring to handle filename variations, typos, and common abbreviations

Changes

Core Matching Engine (`fuzzy-matcher.ts`)

Levenshtein distance for character-level similarity
Token-based matching with stop word filtering
Abbreviation expansion (20+ mappings: SMW → Super Mario World, ALTTP → A Link to the Past, etc.)
Normalization handles underscores, dots, CamelCase, Roman numerals (II → 2)
Configurable scoring with extracted constants for weights and thresholds
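As a rough illustration of the character-level similarity listed above, here is a minimal sketch of Levenshtein distance turned into a 0..1 score. The function names (`levenshtein`, `similarity`) and the normalization by the longer string's length are assumptions for illustration; the actual `fuzzy-matcher.ts` may be structured differently.

```typescript
// Hypothetical helpers, not taken from the PR's fuzzy-matcher.ts.

/** Levenshtein distance via dynamic programming, keeping only two rows. */
function levenshtein(a: string, b: string): number {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // deletion
        curr[j - 1] + 1,   // insertion
        prev[j - 1] + cost // substitution (or match when cost is 0)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

/** Assumed normalization: 1 means identical, 0 means nothing in common. */
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}
```

With this scoring, a one-character typo in a 17-character title still scores above 0.9, which is how a typo-tolerant matcher can stay well above a 0.6 threshold.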

Enhanced Scanner

Replaces single-strategy Crocdb search with 5-tier fallback:

1. Original filename

2. Normalized (stripped version tags/regions)

3. Abbreviation expansion

4. Best match above 0.6 confidence threshold

5. Cross-platform search (no platform filter)

Returns matches only above configurable confidence threshold (default 0.6).
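The tiered fallback described above can be sketched as an ordered list of strategies that stops at the first confident result. Everything here is a simplified assumption: the `Strategy` type and `pickBest` function are hypothetical, the strategies run synchronously against fake data instead of querying Crocdb, and the sketch does not model any per-tier threshold adjustments the real `findCrocdbMatch` may apply.

```typescript
interface Match {
  title: string;
  slug: string;
  score: number;
}

// Hypothetical shape: each tier is a named function returning scored candidates.
interface Strategy {
  name: string;
  run: () => Match[];
}

// Default threshold as stated in the PR description.
const MIN_CONFIDENCE_SCORE = 0.6;

/** Runs tiers in order and returns as soon as the best candidate so far clears the threshold. */
function pickBest(strategies: Strategy[], threshold: number = MIN_CONFIDENCE_SCORE): Match | null {
  let best: Match | null = null;
  for (const strategy of strategies) {
    for (const candidate of strategy.run()) {
      if (best === null || candidate.score > best.score) {
        best = candidate;
      }
    }
    // Later (broader, more expensive) tiers are skipped once a confident match exists.
    if (best !== null && best.score >= threshold) {
      return best;
    }
  }
  // Nothing cleared the threshold: report "no match" rather than a weak guess.
  return null;
}
```

Returning `null` below the threshold is the design choice that keeps low-confidence guesses from being reported as matches.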

Example

// Before: No match
findCrocdbMatch("SMW.sfc", "snes") → null

// After: High confidence match
findCrocdbMatch("SMW.sfc", "snes") → {
  title: "Super Mario World",
  slug: "super-mario-world",
  score: 0.95
}

Filename Support

Abbreviations: SMW, ALTTP, CT, FF6, SF2, etc.
Formats: game_name.sfc, Game.Name.sfc, GameName.sfc
Tolerates: minor typos, version tags, case variations
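To make the supported formats concrete, the following sketch shows one way the three filename styles above could collapse to the same normalized string. The function name `normalizeRomName` and the exact regular expressions are assumptions for illustration, not the PR's `normalize()` implementation.

```typescript
/** Hypothetical normalizer: strips extension and tags, then unifies separators and case. */
function normalizeRomName(filename: string): string {
  return filename
    .replace(/\.[A-Za-z0-9]{2,4}$/, "")      // drop a trailing extension such as .sfc
    .replace(/[\(\[][^\)\]]*[\)\]]/g, " ")   // drop tags like (USA), (Rev 1), [!]
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")  // split CamelCase: GameName -> Game Name
    .replace(/[_.]/g, " ")                   // underscores and dots become spaces
    .replace(/\s+/g, " ")                    // collapse runs of whitespace
    .trim()
    .toLowerCase();
}

// All three styles from the list above normalize to "game name".
const normalizedSamples = ["game_name.sfc", "Game.Name.sfc", "GameName.sfc"].map(normalizeRomName);
```

The extension is removed before dots are turned into spaces so that `.sfc` does not survive as a stray `sfc` token.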

Testing

45 new tests (33 fuzzy matching + 12 scanner integration)
All 127 tests passing
Type checking, linting, CodeQL security scan passed
Backward compatibility confirmed

Checklist

Added a semantic version comment to this PR using /semver: patch, /semver: minor, or /semver: major. (See template for examples)
Confirmed workflows and automation updates (if any) have appropriate permissions.

Original prompt

This section details the original issue you should resolve

<issue_title>Low ROM recognition despite clear filenames</issue_title>
<issue_description>Only 2/5 well-known SNES ROMs were recognised, despite clear filenames. Other tools (e.g. EmulationStation) identify them correctly.

Expected:

Better filename heuristics

Fuzzy matching / fallbacks

Reduced reliance on a single provider</issue_description>

<agent_instructions>Implement this using a test-driven development (TDD) approach, with production-grade code quality.
</agent_instructions>

Comments on the Issue (you are @copilot in this section)

Fixes Low ROM recognition despite clear filenames #86

Copilot

Pull request overview

This PR implements a comprehensive fuzzy matching system to significantly improve ROM recognition rates from approximately 40% to near 100% for common abbreviated filenames. The solution addresses the original issue where only 2 out of 5 well-known SNES ROMs were recognized despite clear filenames.

Key Changes

New fuzzy matching engine with Levenshtein distance, token-based matching, and abbreviation expansion for 20+ common ROM abbreviations
Multi-strategy search implementing 5-tier fallback system with confidence scoring (default threshold: 0.6)
Comprehensive test coverage with 45 new tests (33 unit tests for fuzzy matcher + 12 integration tests) - all 127 tests passing
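For readers unfamiliar with token-based matching, here is a minimal sketch of the idea: split both names into words, drop stop words, and measure the overlap. The stop-word list, the function names, and the use of Jaccard overlap are assumptions; the PR does not state which token metric or stop words it uses.

```typescript
// Assumed stop-word list for illustration only.
const STOP_WORDS = new Set<string>(["the", "of", "a", "an", "and", "to"]);

/** Lowercases, splits on non-alphanumerics, and removes stop words and duplicates. */
function tokenize(text: string): string[] {
  const tokens = text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0 && !STOP_WORDS.has(token));
  return Array.from(new Set(tokens));
}

/** Jaccard overlap of the two token sets: |intersection| / |union|. */
function tokenOverlap(a: string, b: string): number {
  const tokensA = tokenize(a);
  const tokensB = new Set(tokenize(b));
  if (tokensA.length === 0 || tokensB.size === 0) return 0;
  const shared = tokensA.filter((token) => tokensB.has(token)).length;
  return shared / (tokensA.length + tokensB.size - shared);
}
```

Because stop words are removed first, "The Legend of Zelda" and "Legend of Zelda" share an identical token set, while "Super Mario World" and "Super Mario Kart" share only two of four distinct tokens.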

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.

Show a summary per file

| File | Description |
| --- | --- |
| `docs/rom-recognition.md` | Comprehensive documentation covering supported abbreviations, search strategies, configuration, troubleshooting, and technical details |
| `docs/README.md` | Added ROM recognition section with quick examples and feature overview linking to detailed guide |
| `apps/server/src/services/fuzzy-matcher.ts` | Core fuzzy matching engine implementing Levenshtein distance, normalization, abbreviation expansion, and scoring algorithms |
| `apps/server/src/services/scanner.ts` | Enhanced `findCrocdbMatch` with 5-tier strategy (original → normalized → abbreviation → threshold check → cross-platform); added helper function `trySearchStrategy`; renamed unused `isRomHack` to `_isRomHack` |
| `apps/server/src/services/__tests__/fuzzy-matching.test.ts` | 33 unit tests covering string similarity, abbreviations, normalization, token matching, scoring, and real-world scenarios |
| `apps/server/src/services/__tests__/scanner-enhanced.test.ts` | 12 integration tests validating abbreviated ROM names, filename format variations, fuzzy matching with typos, and backward compatibility |
| `apps/server/src/services/scanner.test.ts` | Removed unused type imports (`UnorganizedItem`, `ReorganizeResult`) |

Copilot · 2026-01-08T19:02:52Z

+    // Return best match if above threshold, otherwise null
+    if (bestMatch && bestMatch.score >= MIN_CONFIDENCE_SCORE) {
+      logger.info("Returning best match", { 
+        folderName, 
+        match: bestMatch.title,
+        score: bestMatch.score 
+      });
+      return bestMatch;
+    }
+


This code block duplicates the logic from lines 691-698. After Strategy 5 (cross-platform search), if the match doesn't meet the lowered threshold (MIN_CONFIDENCE_SCORE * 0.9), control flows to line 720 which performs the exact same check and logging as lines 691-698. Consider removing this redundant block since the check at lines 691-698 already handles this case before Strategy 5.

Suggested change

-    // Return best match if above threshold, otherwise null
-    if (bestMatch && bestMatch.score >= MIN_CONFIDENCE_SCORE) {
-      logger.info("Returning best match", {
-        folderName,
-        match: bestMatch.title,
-        score: bestMatch.score
-      });
-      return bestMatch;
-    }

Copilot · 2026-01-08T19:02:52Z

+  const lowerName = normalized.toLowerCase();
+
+  // Check for exact abbreviation match
+  if (ABBREVIATIONS[lowerName]) {
+    return [...ABBREVIATIONS[lowerName], name];


Line 185 applies toLowerCase() to an already lowercase string. The normalize() function on line 184 already converts the string to lowercase as its first operation (line 143), making this additional call redundant.

Suggested change

-  const lowerName = normalized.toLowerCase();
-  // Check for exact abbreviation match
-  if (ABBREVIATIONS[lowerName]) {
-    return [...ABBREVIATIONS[lowerName], name];
+  // Check for exact abbreviation match
+  if (ABBREVIATIONS[normalized]) {
+    return [...ABBREVIATIONS[normalized], name];

Copilot · 2026-01-08T19:02:52Z

+  }
+
+  // Check for abbreviation as first word
+  const firstWord = normalized.split(" ")[0].toLowerCase();


Line 193 also applies toLowerCase() to an already lowercase string. Since normalized comes from the normalize() function which already lowercases the input, this call is redundant.

Suggested change

-  const firstWord = normalized.split(" ")[0].toLowerCase();
+  const firstWord = normalized.split(" ")[0];

Copilot · 2026-01-08T19:02:53Z

+  }
+
+  // Check if name without spaces matches an abbreviation (e.g., "ChronoTrigger")
+  const noSpaces = normalized.replace(/\s+/g, "").toLowerCase();


Line 203 applies toLowerCase() to an already lowercase string. The normalized variable is already lowercase from the normalize() function, making this call redundant.

Suggested change

-  const noSpaces = normalized.replace(/\s+/g, "").toLowerCase();
+  const noSpaces = normalized.replace(/\s+/g, "");

Copilot · 2026-01-08T19:02:53Z

+ * Roman numeral to Arabic number mapping
+ */
+const ROMAN_TO_ARABIC: Record<string, string> = {
+  "i": "1",


Mapping the single letter "i" as a Roman numeral (line 70) could cause false positives in titles where "i" appears as a standalone token. For example, the "i" in "i am" or "i think" would be converted to "1". Consider removing this mapping or using a more context-aware approach that only converts Roman numerals when they appear in a game title context (e.g., after the game name or in numeric sequences).

Suggested change

-  "i": "1",
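One possible shape for the "context-aware approach" mentioned in this comment is to convert a Roman numeral only when it follows at least one other token. The sketch below is an assumption, not the PR's code: the function name is hypothetical, the numeral table beyond `"i"` is illustrative, and a `Map` is used so that tokens such as `constructor` cannot hit inherited object properties.

```typescript
// Illustrative table; only the "i" -> "1" entry is confirmed by the diff above.
const ROMAN_TO_ARABIC = new Map<string, string>([
  ["i", "1"], ["ii", "2"], ["iii", "3"], ["iv", "4"], ["v", "5"],
  ["vi", "6"], ["vii", "7"], ["viii", "8"], ["ix", "9"],
]);

/** Converts Roman numeral tokens to digits, except when the token is the first word. */
function convertRomanNumerals(normalized: string): string {
  return normalized
    .split(" ")
    .map((token, index) => {
      const arabic = ROMAN_TO_ARABIC.get(token);
      // A leading "i" or "v" is far more likely a word than a sequel number.
      return index > 0 && arabic !== undefined ? arabic : token;
    })
    .join(" ");
}
```

This only reduces false positives rather than eliminating them: a standalone "i" in the middle of a title would still be converted, so a stricter rule may be needed in practice.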

Copilot · 2026-01-08T19:02:53Z

+  "mm": ["Majora's Mask", "Majoras Mask"],
+  "loz": ["Legend of Zelda", "The Legend of Zelda"],
+
+  // Final Fantasy series
+  "ff": ["Final Fantasy"],
+  "ff6": ["Final Fantasy 6", "Final Fantasy VI"],
+  "ff7": ["Final Fantasy 7", "Final Fantasy VII"],
+  "ff4": ["Final Fantasy 4", "Final Fantasy IV"],
+  "ffvi": ["Final Fantasy VI", "Final Fantasy 6"],
+  "ffvii": ["Final Fantasy VII", "Final Fantasy 7"],
+  "ffiv": ["Final Fantasy IV", "Final Fantasy 4"],
+
+  // Street Fighter series
+  "sf": ["Street Fighter"],
+  "sf2": ["Street Fighter 2", "Street Fighter II"],
+  "sf2turbo": ["Street Fighter 2 Turbo", "Street Fighter II Turbo"],
+
+  // Other common abbreviations
+  "ct": ["Chrono Trigger"],
+  "chronotrigger": ["Chrono Trigger"],
+  "sm": ["Super Metroid"],


The abbreviations "sm" (line 49 for Super Metroid) and "mm" (line 29 for Majora's Mask) are very short and could lead to false positive matches. These two-letter abbreviations might incorrectly match unrelated game names. Consider whether these abbreviations are commonly used enough to justify the potential for mismatches, or add additional validation when these short abbreviations are detected.
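The "additional validation" suggested here could take the form of a minimum length for first-word (prefix) lookups, so that two-letter abbreviations expand only when they are the entire name. This is a sketch under assumptions: `MIN_PREFIX_ABBREVIATION_LENGTH` and this `expandAbbreviation` are hypothetical, and the table is a small subset of the mappings shown in the diff.

```typescript
// Subset of the PR's mappings, stored in a Map to avoid inherited-property lookups.
const ABBREVIATIONS = new Map<string, string[]>([
  ["sm", ["Super Metroid"]],
  ["mm", ["Majora's Mask", "Majoras Mask"]],
  ["alttp", ["A Link to the Past"]],
]);

// Hypothetical guard: abbreviations shorter than this only match the whole name.
const MIN_PREFIX_ABBREVIATION_LENGTH = 3;

/** Returns candidate search terms for an already normalized (lowercase) name. */
function expandAbbreviation(normalized: string): string[] {
  const exact = ABBREVIATIONS.get(normalized);
  if (exact !== undefined) {
    return [...exact, normalized];
  }

  const firstWord = normalized.split(" ")[0] ?? "";
  if (firstWord.length >= MIN_PREFIX_ABBREVIATION_LENGTH) {
    const byPrefix = ABBREVIATIONS.get(firstWord);
    if (byPrefix !== undefined) {
      return [...byPrefix, normalized];
    }
  }

  // No safe expansion: fall back to the name itself.
  return [normalized];
}
```

Under this rule `sm` alone still expands to "Super Metroid", but `sm backup` is searched as written instead of being treated as a Super Metroid ROM.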

Copilot · 2026-01-08T19:02:54Z

+  // Replace common separators with spaces (explicitly escape dot for clarity)
+  normalized = normalized.replace(/[_\\.]/g, " ");


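The regex pattern /[_\\.]/g contains an unnecessary escaped backslash. Inside a character class the dot is already literal and does not need to be escaped, so the pattern can be written as /[_.]/g. Note that the two are not strictly equivalent: in a regex literal, \\ inside the class is a literal backslash, so the current pattern replaces underscores, backslashes, and dots, whereas /[_.]/g replaces only underscores and dots. The current pattern works for the intended separators, but it is unnecessarily complex and could be confusing.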

Suggested change

-  // Replace common separators with spaces (explicitly escape dot for clarity)
-  normalized = normalized.replace(/[_\\.]/g, " ");
+  // Replace common separators with spaces
+  normalized = normalized.replace(/[_.]/g, " ");
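A quick way to see what each character class actually matches is to run both patterns against a string that contains an underscore, a dot, and a backslash. The sample string and variable names below are made up for this check.

```typescript
const currentPattern = /[_\\.]/g;   // class contains: underscore, backslash, dot
const simplifiedPattern = /[_.]/g;  // class contains: underscore, dot

// The string literal "\\" is one backslash, so the value is: game_name.v1\final
const sample = "game_name.v1\\final";

const withCurrent = sample.replace(currentPattern, " ");
// withCurrent is "game name v1 final" (the backslash is replaced too)

const withSimplified = sample.replace(simplifiedPattern, " ");
// withSimplified is "game name v1\final" (the backslash is left alone)
```

The only behavioral difference is the handling of backslashes; for underscores and dots the two patterns give identical results.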

Copilot · 2026-01-08T19:02:54Z

+  for (const [roman, arabic] of Object.entries(ROMAN_TO_ARABIC)) {
+    const pattern = new RegExp(`\\b${roman}\\b`, "g");
+    normalized = normalized.replace(pattern, arabic);


Creating a new RegExp object inside a loop for each Roman numeral is inefficient. Consider pre-compiling these regular expressions outside the function or caching them to improve performance, especially since this function is called frequently during ROM matching.
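A minimal sketch of the pre-compilation this comment suggests: build the `RegExp` objects once at module load and reuse them on every call. The names `ROMAN_PATTERNS` and `replaceRomanNumerals` are hypothetical, and the numeral table is an illustrative subset rather than the PR's full mapping.

```typescript
// Illustrative subset of a Roman numeral table.
const ROMAN_TO_ARABIC: Record<string, string> = {
  ii: "2",
  iii: "3",
  iv: "4",
  vi: "6",
  vii: "7",
};

// Compiled once at module load instead of once per call per numeral.
const ROMAN_PATTERNS: ReadonlyArray<[RegExp, string]> = Object.entries(ROMAN_TO_ARABIC).map(
  ([roman, arabic]) => [new RegExp(`\\b${roman}\\b`, "g"), arabic] as [RegExp, string]
);

/** Replaces whole-word Roman numerals using the cached patterns. */
function replaceRomanNumerals(input: string): string {
  let output = input;
  for (const [pattern, arabic] of ROMAN_PATTERNS) {
    // String.prototype.replace resets lastIndex for global regexes, so reuse is safe.
    output = output.replace(pattern, arabic);
  }
  return output;
}
```

The word boundaries keep `ii` from matching inside `iii`, so the iteration order of the table does not affect the result.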

Initial plan

0720a09

Copilot AI assigned Copilot and luandev Jan 8, 2026

Copilot started work on behalf of luandev January 8, 2026 18:31

Copilot AI and others added 5 commits January 8, 2026 18:37

Implement fuzzy matching utilities with comprehensive tests

85301d3

Co-authored-by: luandev <6452989+luandev@users.noreply.github.com>

Integrate fuzzy matching into scanner with multi-strategy search

c743c71

Co-authored-by: luandev <6452989+luandev@users.noreply.github.com>

Fix linting warnings in fuzzy matcher and scanner

b6c3b3b

Co-authored-by: luandev <6452989+luandev@users.noreply.github.com>

Add comprehensive ROM recognition documentation

0cb1679

Co-authored-by: luandev <6452989+luandev@users.noreply.github.com>

Address code review feedback: extract magic numbers and improve documentation

80cf29a

Co-authored-by: luandev <6452989+luandev@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Improve ROM recognition with better heuristics and fallback~~ Improve ROM recognition with fuzzy matching and multi-strategy search Jan 8, 2026

Copilot AI requested a review from luandev January 8, 2026 18:49

Copilot finished work on behalf of luandev January 8, 2026 18:49

luandev marked this pull request as ready for review January 8, 2026 18:55

Copilot AI review requested due to automatic review settings January 8, 2026 18:55

Copilot started reviewing on behalf of luandev January 8, 2026 18:56

luandev approved these changes Jan 8, 2026

luandev merged commit 5169efd into main Jan 8, 2026
13 checks passed

Copilot AI reviewed Jan 8, 2026

Copilot AI mentioned this pull request Jan 8, 2026

Add non-destructive scan setting for unrecognized files #91

Merged

2 tasks

luandev merged 6 commits into main from copilot/improve-rom-recognition-heuristics
