fix: normalize Unicode paths to NFC and support directory renames#18
fix: normalize Unicode paths to NFC and support directory renames#18
Conversation
On macOS, git ls-tree outputs NFC-normalized paths (core.precomposeUnicode=true) while fs.readdir preserves the original form (NFD for files with dakuten/handakuten characters like が, ド, ベ). This caused Set/Map key mismatches: files appeared as both "idle" (NFC from git) and "untracked new" (NFD from readdir), and syncIgnoreState incorrectly removed legitimate tracked files. Also fixes directory renames being treated as single-file renames, which corrupted the git index by writing a blob entry at the directory path (causing "appears as both a file and as a directory" errors on subsequent operations). Changes: - Add pathNormalize.ts with NFC normalization utility - Normalize all readdir paths in stateManager directory walkers - Normalize all git path operations (snapshot, getBaseline, remove, rename) - Normalize uri.fsPath in all FileWatcher event handlers - Rewrite renameFile to handle directory renames (rename all entries under prefix) - Update stateManager.renameFile to migrate directory children in state map - Add unit tests for NFC/NFD path handling and directory renames Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR addresses Unicode path normalization mismatches (notably macOS NFC/NFD differences between filesystem and git output) and fixes git index corruption caused by treating directory renames as single-file renames. It introduces a centralized normalizePath() helper and expands rename handling to correctly migrate all entries under a renamed directory.
Changes:
- Add NFC path normalization at key boundaries (filesystem reads, VSCode
fsPath, git output, and state key usage). - Update
HunkwiseGit.renameFile()to support directory renames by renaming all tracked entries under a prefix. - Add unit tests covering NFC/NFD interoperability and directory rename behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tsconfig.test.json | Includes the new pathNormalize.ts in the test TypeScript build config. |
| src/pathNormalize.ts | Introduces normalizePath() to normalize paths to NFC. |
| src/hunkwiseGit.ts | Normalizes relpaths and rewrites renameFile() to handle directory renames via staged entry migration. |
| src/stateManager.ts | Normalizes directory walk paths and improves renameFile() to migrate in-memory state for directory renames. |
| src/fileWatcher.ts | Normalizes VSCode/FS event paths before state lookups and operations. |
| src/test/hunkwiseGit.test.ts | Adds tests for NFC/NFD path consistency and directory rename behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (this._suppressed) return; | ||
| if (!this.stateManager.enabled) return; | ||
| const filePath = uri.fsPath; | ||
| const filePath = normalizePath(uri.fsPath); | ||
| const basename = path.basename(filePath); | ||
| if (this.shouldIgnore(filePath)) return; |
There was a problem hiding this comment.
onDiskDelete() now normalizes filePath, but pendingUserDeletes is still populated/cleared using raw uri.fsPath from onWillDeleteFiles/onDidDeleteFiles. For NFD/NFC-mismatched paths this will cause user-initiated deletes to be misclassified as external deletes, leading to incorrect baseline/remove behavior. Normalize the paths when adding/removing from pendingUserDeletes (and any other sets used for path matching) so they use the same canonical form as filePath here.
| private async onDiskCreate(uri: vscode.Uri): Promise<void> { | ||
| const filePath = uri.fsPath; | ||
| const filePath = normalizePath(uri.fsPath); | ||
| const basename = path.basename(filePath); | ||
| if (this._suppressed) return; | ||
| if (!this.stateManager.enabled) return; |
There was a problem hiding this comment.
Normalizing uri.fsPath here is good for consistent Map/Set keys, but there are still direct === comparisons later in this function (and in onDiskChange) against un-normalized TextDocument.uri.fsPath. For NFD filenames on macOS this can break the "manual create/save" detection because the open document path won’t match the normalized filePath. Normalize the document fsPath (or normalize both sides) before comparing.
| renameFile(oldFilePath: string, newFilePath: string): void { | ||
| oldFilePath = normalizePath(oldFilePath); | ||
| newFilePath = normalizePath(newFilePath); |
There was a problem hiding this comment.
renameFile() normalizes its inputs, but other StateManager entry points (getFile, setFile, removeFile, exitReviewing, isReviewing, etc.) still accept raw paths and use them as Map keys. Since several other callers in the codebase pass uri.fsPath directly, this can reintroduce NFC/NFD mismatches and make state lookups unreliable. Consider normalizing filePath at the start of these public methods (or centralizing access through a single normalization helper) so the Map key format is consistent regardless of caller.
| */ | ||
| export function normalizePath(p: string): string { | ||
| return p.normalize('NFC'); |
There was a problem hiding this comment.
normalizePath() currently forces NFC on all platforms. On normalization-sensitive filesystems (common on Linux), NFC and NFD can be distinct filenames; normalizing unconditionally can conflate two different paths into the same Map/Set key and potentially apply git/index operations to the wrong entry. Consider making this a no-op except on macOS (or gating it behind detection of a git/FS normalization mismatch such as core.precomposeUnicode), and document the tradeoff explicitly.
| */ | |
| export function normalizePath(p: string): string { | |
| return p.normalize('NFC'); | |
| * | |
| * On normalization-sensitive filesystems (for example, many Linux setups), | |
| * NFC and NFD can be distinct filenames. To avoid conflating different paths | |
| * into the same Map/Set key or applying git/index operations to the wrong | |
| * entry, this function only forces NFC on macOS and is a no-op elsewhere. | |
| */ | |
| export function normalizePath(p: string): string { | |
| if (typeof process !== 'undefined' && process.platform === 'darwin') { | |
| return p.normalize('NFC'); | |
| } | |
| return p; |
| for (const [fp, fs] of [...this.state.entries()]) { | ||
| if (fp.startsWith(oldPrefix)) { | ||
| this.state.delete(fp); | ||
| const newFp = newFilePath + fp.slice(oldFilePath.length); | ||
| this.state.set(newFp, fs); |
There was a problem hiding this comment.
In the directory-children migration loop, the tuple variable is named fs, which shadows the imported fs module in this scope and makes the code harder to read/grep. Renaming it to something like childState (or similar) would avoid accidental confusion.
| for (const [fp, fs] of [...this.state.entries()]) { | |
| if (fp.startsWith(oldPrefix)) { | |
| this.state.delete(fp); | |
| const newFp = newFilePath + fp.slice(oldFilePath.length); | |
| this.state.set(newFp, fs); | |
| for (const [fp, childState] of [...this.state.entries()]) { | |
| if (fp.startsWith(oldPrefix)) { | |
| this.state.delete(fp); | |
| const newFp = newFilePath + fp.slice(oldFilePath.length); | |
| this.state.set(newFp, childState); |
- Add readability check in collectUntrackedFiles to skip files with permission errors (EACCES/EPERM) that would fail downstream (#17) - Normalize pendingUserDeletes paths to prevent NFC/NFD mismatches (#18) - Normalize document fsPath in onDiskCreate/onDiskChange comparisons (#18) - Normalize filePath in all StateManager public methods for consistent Map key access (#18) - Gate NFC normalization to macOS only — on Linux NFC/NFD can be distinct filenames (#18) - Rename shadowed `fs` variable to `childState` in renameFile (#18) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: address Copilot review issues from PR #17 and #18 - Add readability check in collectUntrackedFiles to skip files with permission errors (EACCES/EPERM) that would fail downstream (#17) - Normalize pendingUserDeletes paths to prevent NFC/NFD mismatches (#18) - Normalize document fsPath in onDiskCreate/onDiskChange comparisons (#18) - Normalize filePath in all StateManager public methods for consistent Map key access (#18) - Gate NFC normalization to macOS only — on Linux NFC/NFD can be distinct filenames (#18) - Rename shadowed `fs` variable to `childState` in renameFile (#18) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update normalizePath docstring for platform-conditional behavior The docstring still described unconditional NFC normalization after the implementation was changed to macOS-only. Updated to document that the function is a no-op on non-macOS platforms. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version to 0.0.29 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
git ls-treeoutputs NFC paths (core.precomposeUnicode=true) whilefs.readdirreturns NFD for files with dakuten/handakuten characters (e.g.が,ド,ベ). This caused files with Japanese names to repeatedly appear as "untracked new" after being accepted, andsyncIgnoreStateto incorrectly remove legitimate tracked files from the git index.onWillRenameFilestreated directory renames as single-file renames, writing a blob entry at the directory path. This corrupted the git index, causing "appears as both a file and as a directory" errors on all subsequent operations under that directory.normalizePath()(NFC normalization) at all path boundaries: git output, readdir results, VSCodeuri.fsPath, and state Map keys. RewroterenameFileto handle directory renames by renaming all entries under the old prefix.Test plan
🤖 Generated with Claude Code