Skip to content

fix: normalize Unicode paths to NFC and support directory renames#18

Merged
molon merged 1 commit intomainfrom
fix/unicode-nfc-nfd-and-dir-rename
Apr 1, 2026
Merged

fix: normalize Unicode paths to NFC and support directory renames#18
molon merged 1 commit intomainfrom
fix/unicode-nfc-nfd-and-dir-rename

Conversation

@molon
Copy link
Copy Markdown
Owner

@molon molon commented Apr 1, 2026

Summary

  • NFC/NFD Unicode mismatch on macOS: git ls-tree outputs NFC paths (core.precomposeUnicode=true) while fs.readdir returns NFD for files with dakuten/handakuten characters (e.g. , , ). This caused files with Japanese names to repeatedly appear as "untracked new" after being accepted, and syncIgnoreState to incorrectly remove legitimate tracked files from the git index.
  • Directory rename corruption: onWillRenameFiles treated directory renames as single-file renames, writing a blob entry at the directory path. This corrupted the git index, causing "appears as both a file and as a directory" errors on all subsequent operations under that directory.
  • Added normalizePath() (NFC normalization) at all path boundaries: git output, readdir results, VSCode uri.fsPath, and state Map keys. Rewrote renameFile to handle directory renames by renaming all entries under the old prefix.

Test plan

  • New unit test: NFD paths from snapshot are found by listTrackedFiles as NFC
  • New unit test: renameFile handles directory renames (moves all entries under prefix)
  • All 73 existing unit tests pass
  • Manual test: workspace with Japanese-named files (dakuten chars) — accept files, rebuild, verify no reappearance
  • Manual test: rename a directory containing tracked files — verify no "appears as both a file and as a directory" error

🤖 Generated with Claude Code

On macOS, git ls-tree outputs NFC-normalized paths (core.precomposeUnicode=true)
while fs.readdir preserves the original form (NFD for files with dakuten/handakuten
characters like が, ド, ベ). This caused Set/Map key mismatches: files appeared as
both "idle" (NFC from git) and "untracked new" (NFD from readdir), and
syncIgnoreState incorrectly removed legitimate tracked files.

Also fixes directory renames being treated as single-file renames, which corrupted
the git index by writing a blob entry at the directory path (causing "appears as
both a file and as a directory" errors on subsequent operations).

Changes:
- Add pathNormalize.ts with NFC normalization utility
- Normalize all readdir paths in stateManager directory walkers
- Normalize all git path operations (snapshot, getBaseline, remove, rename)
- Normalize uri.fsPath in all FileWatcher event handlers
- Rewrite renameFile to handle directory renames (rename all entries under prefix)
- Update stateManager.renameFile to migrate directory children in state map
- Add unit tests for NFC/NFD path handling and directory renames

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 1, 2026 17:25
@molon molon merged commit 9c3eeab into main Apr 1, 2026
2 checks passed
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses Unicode path normalization mismatches (notably macOS NFC/NFD differences between filesystem and git output) and fixes git index corruption caused by treating directory renames as single-file renames. It introduces a centralized normalizePath() helper and expands rename handling to correctly migrate all entries under a renamed directory.

Changes:

  • Add NFC path normalization at key boundaries (filesystem reads, VSCode fsPath, git output, and state key usage).
  • Update HunkwiseGit.renameFile() to support directory renames by renaming all tracked entries under a prefix.
  • Add unit tests covering NFC/NFD interoperability and directory rename behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tsconfig.test.json Includes the new pathNormalize.ts in the test TypeScript build config.
src/pathNormalize.ts Introduces normalizePath() to normalize paths to NFC.
src/hunkwiseGit.ts Normalizes relpaths and rewrites renameFile() to handle directory renames via staged entry migration.
src/stateManager.ts Normalizes directory walk paths and improves renameFile() to migrate in-memory state for directory renames.
src/fileWatcher.ts Normalizes VSCode/FS event paths before state lookups and operations.
src/test/hunkwiseGit.test.ts Adds tests for NFC/NFD path consistency and directory rename behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/fileWatcher.ts
Comment on lines 318 to 322
if (this._suppressed) return;
if (!this.stateManager.enabled) return;
const filePath = uri.fsPath;
const filePath = normalizePath(uri.fsPath);
const basename = path.basename(filePath);
if (this.shouldIgnore(filePath)) return;
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

onDiskDelete() now normalizes filePath, but pendingUserDeletes is still populated/cleared using raw uri.fsPath from onWillDeleteFiles/onDidDeleteFiles. For NFD/NFC-mismatched paths this will cause user-initiated deletes to be misclassified as external deletes, leading to incorrect baseline/remove behavior. Normalize the paths when adding/removing from pendingUserDeletes (and any other sets used for path matching) so they use the same canonical form as filePath here.

Copilot uses AI. Check for mistakes.
Comment thread src/fileWatcher.ts
Comment on lines 256 to 260
private async onDiskCreate(uri: vscode.Uri): Promise<void> {
const filePath = uri.fsPath;
const filePath = normalizePath(uri.fsPath);
const basename = path.basename(filePath);
if (this._suppressed) return;
if (!this.stateManager.enabled) return;
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normalizing uri.fsPath here is good for consistent Map/Set keys, but there are still direct === comparisons later in this function (and in onDiskChange) against un-normalized TextDocument.uri.fsPath. For NFD filenames on macOS this can break the "manual create/save" detection because the open document path won’t match the normalized filePath. Normalize the document fsPath (or normalize both sides) before comparing.

Copilot uses AI. Check for mistakes.
Comment thread src/stateManager.ts
Comment on lines 315 to +317
renameFile(oldFilePath: string, newFilePath: string): void {
oldFilePath = normalizePath(oldFilePath);
newFilePath = normalizePath(newFilePath);
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renameFile() normalizes its inputs, but other StateManager entry points (getFile, setFile, removeFile, exitReviewing, isReviewing, etc.) still accept raw paths and use them as Map keys. Since several other callers in the codebase pass uri.fsPath directly, this can reintroduce NFC/NFD mismatches and make state lookups unreliable. Consider normalizing filePath at the start of these public methods (or centralizing access through a single normalization helper) so the Map key format is consistent regardless of caller.

Copilot uses AI. Check for mistakes.
Comment thread src/pathNormalize.ts
Comment on lines +15 to +17
*/
export function normalizePath(p: string): string {
return p.normalize('NFC');
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

normalizePath() currently forces NFC on all platforms. On normalization-sensitive filesystems (common on Linux), NFC and NFD can be distinct filenames; normalizing unconditionally can conflate two different paths into the same Map/Set key and potentially apply git/index operations to the wrong entry. Consider making this a no-op except on macOS (or gating it behind detection of a git/FS normalization mismatch such as core.precomposeUnicode), and document the tradeoff explicitly.

Suggested change
*/
export function normalizePath(p: string): string {
return p.normalize('NFC');
*
* On normalization-sensitive filesystems (for example, many Linux setups),
* NFC and NFD can be distinct filenames. To avoid conflating different paths
* into the same Map/Set key or applying git/index operations to the wrong
* entry, this function only forces NFC on macOS and is a no-op elsewhere.
*/
export function normalizePath(p: string): string {
if (typeof process !== 'undefined' && process.platform === 'darwin') {
return p.normalize('NFC');
}
return p;

Copilot uses AI. Check for mistakes.
Comment thread src/stateManager.ts
Comment on lines +330 to +334
for (const [fp, fs] of [...this.state.entries()]) {
if (fp.startsWith(oldPrefix)) {
this.state.delete(fp);
const newFp = newFilePath + fp.slice(oldFilePath.length);
this.state.set(newFp, fs);
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the directory-children migration loop, the tuple variable is named fs, which shadows the imported fs module in this scope and makes the code harder to read/grep. Renaming it to something like childState (or similar) would avoid accidental confusion.

Suggested change
for (const [fp, fs] of [...this.state.entries()]) {
if (fp.startsWith(oldPrefix)) {
this.state.delete(fp);
const newFp = newFilePath + fp.slice(oldFilePath.length);
this.state.set(newFp, fs);
for (const [fp, childState] of [...this.state.entries()]) {
if (fp.startsWith(oldPrefix)) {
this.state.delete(fp);
const newFp = newFilePath + fp.slice(oldFilePath.length);
this.state.set(newFp, childState);

Copilot uses AI. Check for mistakes.
molon added a commit that referenced this pull request Apr 1, 2026
- Add readability check in collectUntrackedFiles to skip files with
  permission errors (EACCES/EPERM) that would fail downstream (#17)
- Normalize pendingUserDeletes paths to prevent NFC/NFD mismatches (#18)
- Normalize document fsPath in onDiskCreate/onDiskChange comparisons (#18)
- Normalize filePath in all StateManager public methods for consistent
  Map key access (#18)
- Gate NFC normalization to macOS only — on Linux NFC/NFD can be
  distinct filenames (#18)
- Rename shadowed `fs` variable to `childState` in renameFile (#18)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
molon added a commit that referenced this pull request Apr 1, 2026
* fix: address Copilot review issues from PR #17 and #18

- Add readability check in collectUntrackedFiles to skip files with
  permission errors (EACCES/EPERM) that would fail downstream (#17)
- Normalize pendingUserDeletes paths to prevent NFC/NFD mismatches (#18)
- Normalize document fsPath in onDiskCreate/onDiskChange comparisons (#18)
- Normalize filePath in all StateManager public methods for consistent
  Map key access (#18)
- Gate NFC normalization to macOS only — on Linux NFC/NFD can be
  distinct filenames (#18)
- Rename shadowed `fs` variable to `childState` in renameFile (#18)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update normalizePath docstring for platform-conditional behavior

The docstring still described unconditional NFC normalization after the
implementation was changed to macOS-only. Updated to document that the
function is a no-op on non-macOS platforms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version to 0.0.29

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@molon molon deleted the fix/unicode-nfc-nfd-and-dir-rename branch April 1, 2026 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants