Co-authored-by: luandev <6452989+luandev@users.noreply.github.com>
Pull request overview
This PR implements a comprehensive fuzzy matching system to significantly improve ROM recognition rates from approximately 40% to near 100% for common abbreviated filenames. The solution addresses the original issue where only 2 out of 5 well-known SNES ROMs were recognized despite clear filenames.
Key Changes
- New fuzzy matching engine with Levenshtein distance, token-based matching, and abbreviation expansion for 20+ common ROM abbreviations
- Multi-strategy search implementing 5-tier fallback system with confidence scoring (default threshold: 0.6)
- Comprehensive test coverage with 45 new tests (33 unit tests for fuzzy matcher + 12 integration tests) - all 127 tests passing
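The Levenshtein-based scoring mentioned above can be illustrated with a minimal sketch. The function names `levenshtein` and `similarity`, and the normalization by the longer string's length, are assumptions for illustration and may not match what `fuzzy-matcher.ts` actually does.

```typescript
// Classic dynamic-programming edit distance, keeping only two rows in memory.
function levenshtein(a: string, b: string): number {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical similarity score in [0, 1]; 1 means the strings are identical.
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}
```

Under this scoring, `similarity("chrono triger", "chrono trigger")` is one edit over 14 characters, roughly 0.93, which would clear the default 0.6 threshold.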
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| docs/rom-recognition.md | Comprehensive documentation covering supported abbreviations, search strategies, configuration, troubleshooting, and technical details |
| docs/README.md | Added ROM recognition section with quick examples and feature overview linking to detailed guide |
| apps/server/src/services/fuzzy-matcher.ts | Core fuzzy matching engine implementing Levenshtein distance, normalization, abbreviation expansion, and scoring algorithms |
| apps/server/src/services/scanner.ts | Enhanced findCrocdbMatch with 5-tier strategy (original → normalized → abbreviation → threshold check → cross-platform); added helper function trySearchStrategy; renamed unused isRomHack to _isRomHack |
| apps/server/src/services/__tests__/fuzzy-matching.test.ts | 33 unit tests covering string similarity, abbreviations, normalization, token matching, scoring, and real-world scenarios |
| apps/server/src/services/__tests__/scanner-enhanced.test.ts | 12 integration tests validating abbreviated ROM names, filename format variations, fuzzy matching with typos, and backward compatibility |
| apps/server/src/services/scanner.test.ts | Removed unused type imports (UnorganizedItem, ReorganizeResult) |
    // Return best match if above threshold, otherwise null
    if (bestMatch && bestMatch.score >= MIN_CONFIDENCE_SCORE) {
      logger.info("Returning best match", {
        folderName,
        match: bestMatch.title,
        score: bestMatch.score
      });
      return bestMatch;
    }
This code block duplicates the logic from lines 691-698. After Strategy 5 (cross-platform search), if the match doesn't meet the lowered threshold (MIN_CONFIDENCE_SCORE * 0.9), control flows to line 720 which performs the exact same check and logging as lines 691-698. Consider removing this redundant block since the check at lines 691-698 already handles this case before Strategy 5.
Suggested change:
    - // Return best match if above threshold, otherwise null
    - if (bestMatch && bestMatch.score >= MIN_CONFIDENCE_SCORE) {
    -   logger.info("Returning best match", {
    -     folderName,
    -     match: bestMatch.title,
    -     score: bestMatch.score
    -   });
    -   return bestMatch;
    - }
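One way to avoid repeating the threshold check is to run the tiers in a loop and compare against the threshold in a single place. The sketch below is hypothetical: the `Match` and `Strategy` types and the `findBestMatch` function are invented for illustration, the strategies are synchronous stand-ins for the real Crocdb searches, and the lowered cross-platform threshold (MIN_CONFIDENCE_SCORE * 0.9) is deliberately left out.

```typescript
interface Match {
  title: string;
  score: number;
}

// A strategy is any search tier that may or may not produce a candidate (assumed shape).
type Strategy = (name: string) => Match | null;

const MIN_CONFIDENCE_SCORE = 0.6; // default threshold stated in the PR overview

function findBestMatch(
  name: string,
  strategies: Strategy[],
  threshold: number = MIN_CONFIDENCE_SCORE
): Match | null {
  let best: Match | null = null;
  for (const strategy of strategies) {
    const candidate = strategy(name);
    if (candidate && (best === null || candidate.score > best.score)) {
      best = candidate;
    }
    // The only threshold check: stop as soon as any tier yields a confident match.
    if (best !== null && best.score >= threshold) {
      return best;
    }
  }
  // Every tier ran and nothing reached the threshold.
  return null;
}
```

Because the check sits inside the loop, there is no need for a second copy after the last tier: falling out of the loop already means no candidate was good enough.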
    const lowerName = normalized.toLowerCase();

    // Check for exact abbreviation match
    if (ABBREVIATIONS[lowerName]) {
      return [...ABBREVIATIONS[lowerName], name];
Line 185 applies toLowerCase() to an already lowercase string. The normalize() function on line 184 already converts the string to lowercase as its first operation (line 143), making this additional call redundant.
Suggested change:
    - const lowerName = normalized.toLowerCase();
    - // Check for exact abbreviation match
    - if (ABBREVIATIONS[lowerName]) {
    -   return [...ABBREVIATIONS[lowerName], name];
    + // Check for exact abbreviation match
    + if (ABBREVIATIONS[normalized]) {
    +   return [...ABBREVIATIONS[normalized], name];
    }

    // Check for abbreviation as first word
    const firstWord = normalized.split(" ")[0].toLowerCase();
Line 193 also applies toLowerCase() to an already lowercase string. Since normalized comes from the normalize() function which already lowercases the input, this call is redundant.
Suggested change:
    - const firstWord = normalized.split(" ")[0].toLowerCase();
    + const firstWord = normalized.split(" ")[0];
    }

    // Check if name without spaces matches an abbreviation (e.g., "ChronoTrigger")
    const noSpaces = normalized.replace(/\s+/g, "").toLowerCase();
Line 203 applies toLowerCase() to an already lowercase string. The normalized variable is already lowercase from the normalize() function, making this call redundant.
Suggested change:
    - const noSpaces = normalized.replace(/\s+/g, "").toLowerCase();
    + const noSpaces = normalized.replace(/\s+/g, "");
     * Roman numeral to Arabic number mapping
     */
    const ROMAN_TO_ARABIC: Record<string, string> = {
      "i": "1",
Mapping the single letter "i" as a Roman numeral (line 70) could cause false positives whenever "i" appears as a standalone word in a title. For example, the "i" in "i am" or "i think" would be converted to "1". Consider removing this mapping or using a more context-aware approach that only converts Roman numerals when they appear in a game title context (e.g., after the game name or in a numeric sequence).
Suggested change:
    - "i": "1",
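A context-aware conversion could be as simple as never converting the first token, since a sequel number normally follows the title. This is only a sketch under that assumption: `convertRomanNumerals` is a hypothetical helper, and every entry in the map other than "i" is assumed rather than taken from the PR.

```typescript
// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const ROMAN_TO_ARABIC = new Map<string, string>([
  ["i", "1"],
  ["ii", "2"],
  ["iii", "3"],
  ["iv", "4"],
  ["v", "5"],
  ["vi", "6"],
  ["vii", "7"],
  ["viii", "8"],
  ["ix", "9"],
]);

// Convert a Roman numeral token only when it is not the first token of the name.
// Expects an already normalized (lowercase, space-separated) string.
function convertRomanNumerals(normalized: string): string {
  return normalized
    .split(" ")
    .map((token, index) => {
      const arabic = ROMAN_TO_ARABIC.get(token);
      return index > 0 && arabic !== undefined ? arabic : token;
    })
    .join(" ");
}
```

With this rule "final fantasy vi" becomes "final fantasy 6" while "i am" is left alone. It is still a heuristic: a standalone "i" later in a title would be converted, so a stricter rule (for example, only the last token) may be preferable.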
      "mm": ["Majora's Mask", "Majoras Mask"],
      "loz": ["Legend of Zelda", "The Legend of Zelda"],

      // Final Fantasy series
      "ff": ["Final Fantasy"],
      "ff6": ["Final Fantasy 6", "Final Fantasy VI"],
      "ff7": ["Final Fantasy 7", "Final Fantasy VII"],
      "ff4": ["Final Fantasy 4", "Final Fantasy IV"],
      "ffvi": ["Final Fantasy VI", "Final Fantasy 6"],
      "ffvii": ["Final Fantasy VII", "Final Fantasy 7"],
      "ffiv": ["Final Fantasy IV", "Final Fantasy 4"],

      // Street Fighter series
      "sf": ["Street Fighter"],
      "sf2": ["Street Fighter 2", "Street Fighter II"],
      "sf2turbo": ["Street Fighter 2 Turbo", "Street Fighter II Turbo"],

      // Other common abbreviations
      "ct": ["Chrono Trigger"],
      "chronotrigger": ["Chrono Trigger"],
      "sm": ["Super Metroid"],
The abbreviations "sm" (line 49 for Super Metroid) and "mm" (line 29 for Majora's Mask) are very short and could lead to false positive matches. These two-letter abbreviations might incorrectly match unrelated game names. Consider whether these abbreviations are commonly used enough to justify the potential for mismatches, or add additional validation when these short abbreviations are detected.
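One possible form of "additional validation" is to expand very short abbreviations only when they make up the entire name, and to allow first-word expansion only for longer ones. The sketch below assumes that policy; `MIN_PREFIX_ABBREVIATION_LENGTH` and this version of `expandAbbreviation` are hypothetical and use a small excerpt of the abbreviation table.

```typescript
// Excerpt of the abbreviation table, stored in a Map to avoid prototype-key lookups.
const ABBREVIATIONS = new Map<string, string[]>([
  ["sm", ["Super Metroid"]],
  ["mm", ["Majora's Mask", "Majoras Mask"]],
  ["ct", ["Chrono Trigger"]],
  ["ff6", ["Final Fantasy 6", "Final Fantasy VI"]],
]);

// Hypothetical cutoff: abbreviations shorter than this only match the whole name.
const MIN_PREFIX_ABBREVIATION_LENGTH = 3;

// Expects an already normalized (lowercase, space-separated) name.
function expandAbbreviation(normalized: string): string[] {
  // An exact match is always allowed, even for two-letter abbreviations.
  const exact = ABBREVIATIONS.get(normalized);
  if (exact) {
    return [...exact, normalized];
  }

  // First-word expansion is restricted to longer, less ambiguous abbreviations.
  const [firstWord, ...rest] = normalized.split(" ");
  const expansions =
    firstWord.length >= MIN_PREFIX_ABBREVIATION_LENGTH
      ? ABBREVIATIONS.get(firstWord)
      : undefined;
  if (expansions && rest.length > 0) {
    const suffix = rest.join(" ");
    return [...expansions.map((title) => `${title} ${suffix}`), normalized];
  }

  return [normalized];
}
```

Under this policy "sm" alone still expands to "Super Metroid", but "sm backup" is left untouched, while "ff6 advance" expands as before.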
    // Replace common separators with spaces (explicitly escape dot for clarity)
    normalized = normalized.replace(/[_\\.]/g, " ");
The character class in /[_\\.]/g contains an escaped backslash (\\) followed by a dot, so the pattern matches a literal backslash as well as an underscore or a dot. Inside a character class the dot needs no escaping, so /[_.]/g is enough to replace underscores and dots. The current pattern still handles both separators, but the extra backslash is confusing and, contrary to the code comment, does not escape the dot.
Suggested change:
    - // Replace common separators with spaces (explicitly escape dot for clarity)
    - normalized = normalized.replace(/[_\\.]/g, " ");
    + // Replace common separators with spaces
    + normalized = normalized.replace(/[_.]/g, " ");
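The difference between the two patterns only shows up when the input contains a backslash. The small sketch below makes that visible; `replaceSeparators` is a hypothetical helper written for this comparison.

```typescript
// Pattern from the PR: the class holds "_", a literal backslash, and ".".
const WITH_BACKSLASH = /[_\\.]/g;
// Suggested pattern: the class holds only "_" and ".".
const UNDERSCORE_AND_DOT = /[_.]/g;

// Replace every character matched by the given separator pattern with a space.
function replaceSeparators(name: string, pattern: RegExp): string {
  return name.replace(pattern, " ");
}

// Both patterns agree on ordinary filenames.
const typical = "super_mario.world";
console.log(replaceSeparators(typical, WITH_BACKSLASH));
console.log(replaceSeparators(typical, UNDERSCORE_AND_DOT));

// They differ only when a backslash is present ("\\" below is one backslash character).
const withPath = "roms\\smw_v1";
console.log(replaceSeparators(withPath, WITH_BACKSLASH)); // backslash becomes a space
console.log(replaceSeparators(withPath, UNDERSCORE_AND_DOT)); // backslash is preserved
```

If filenames never contain backslashes the two behave identically, which is why the simpler pattern is a safe replacement in that case.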
    for (const [roman, arabic] of Object.entries(ROMAN_TO_ARABIC)) {
      const pattern = new RegExp(`\\b${roman}\\b`, "g");
      normalized = normalized.replace(pattern, arabic);
Creating a new RegExp object inside a loop for each Roman numeral is inefficient. Consider pre-compiling these regular expressions outside the function or caching them to improve performance, especially since this function is called frequently during ROM matching.
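One way to pre-compile is to build a single alternation pattern once at module load and look up each hit in a map. This is a sketch, not the PR's code: `ROMAN_NUMERALS`, `ROMAN_NUMERAL_PATTERN`, and `replaceRomanNumerals` are hypothetical names, and the numeral entries are assumed. It addresses only the compilation cost, not the standalone "i" concern.

```typescript
// Assumed numeral table; only the idea of a lookup table comes from the PR.
const ROMAN_NUMERALS = new Map<string, string>([
  ["i", "1"],
  ["ii", "2"],
  ["iii", "3"],
  ["iv", "4"],
  ["v", "5"],
  ["vi", "6"],
  ["vii", "7"],
  ["viii", "8"],
  ["ix", "9"],
]);

// Compiled once at module load instead of once per numeral per call.
// The word boundaries ensure only whole tokens match.
const ROMAN_NUMERAL_PATTERN = new RegExp(
  `\\b(?:${Array.from(ROMAN_NUMERALS.keys()).join("|")})\\b`,
  "g"
);

// Replace every standalone Roman numeral in a single pass over the string.
function replaceRomanNumerals(normalized: string): string {
  return normalized.replace(
    ROMAN_NUMERAL_PATTERN,
    (numeral) => ROMAN_NUMERALS.get(numeral) ?? numeral
  );
}
```

Besides avoiding repeated compilation, a single pass means a numeral is never rewritten twice, which can matter if the table ever grows entries whose replacements look like other keys.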
Summary
(SMW.sfc, ALTTP.sfc)
Changes
Core Matching Engine (fuzzy-matcher.ts)
Enhanced Scanner
Replaces single-strategy Crocdb search with 5-tier fallback:
Returns matches only above configurable confidence threshold (default 0.6).
Example
Filename Support
SMW, ALTTP, CT, FF6, SF2, etc.
game_name.sfc, Game.Name.sfc, GameName.sfc
Testing
Checklist
/semver: patch, /semver: minor, or /semver: major. (See template for examples)