Skip to content

Commit 6735967

Browse files
fix: handle concurrent file edits and symlink loops in watcher/builder
Add readFileSafe helper that retries on transient OS errors (EBUSY/EACCES/EPERM) from editors performing non-atomic saves. Replace bare readFileSync calls in builder.js (3 sites) and watcher.js (1 site). Add symlink loop detection to collectFiles via realpathSync tracking of visited directories, preventing infinite recursion from circular symlinks. Update architecture.md sections #7, #15, and summary table to reflect these fixes. Impact: 5 functions changed, 5 affected
1 parent ea201d1 commit 6735967

3 files changed

Lines changed: 52 additions & 8 deletions

File tree

generated/architecture.md

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -228,7 +228,7 @@ Consumers receive an engine object and call methods on it. They never branch on
228228
229229
**Current state:** The entire build pipeline is synchronous batch processing. Parse all files → insert all nodes → build all edges. The watcher does per-file updates but reimplements the pipeline in a simpler form.
230230
231-
**Problem:** For large repos (10K+ files), the user waits for the entire pipeline to complete before seeing anything. There's no progress reporting during parsing. There's no way to cancel a build mid-flight. The watcher's simplified pipeline diverges from the main build path (different code, different edge cases).
231+
**Problem:** For large repos (10K+ files), the user waits for the entire pipeline to complete before seeing anything. There's no progress reporting during parsing. There's no way to cancel a build mid-flight. The watcher's simplified pipeline diverges from the main build path (different code, different edge cases). *(Note: two concrete edge cases — concurrent file edits causing EBUSY/EACCES during read, and symlink loops causing infinite recursion in `collectFiles` — have been fixed. `readFileSafe` retries on transient OS errors and is shared between `builder.js` and `watcher.js`. `collectFiles` tracks visited real paths to break symlink cycles.)*
232232
233233
**Ideal architecture — event-driven pipeline:**
234234
@@ -473,6 +473,8 @@ This is a simple LRU or TTL cache that sits between the analysis layer and the r
473473
474474
**Problem:** Bug fixes to edge building in `builder.js` must be separately applied to `watcher.js`. The watcher's edge building is simpler (no barrel resolution, simpler confidence) which means watch-mode graphs are subtly different from full-build graphs.
475475
476+
**Partial progress:** `readFileSafe` (exported from `builder.js`, imported by `watcher.js`) is the first shared utility between the two modules. It retries on transient OS errors (EBUSY/EACCES/EPERM) that occur when editors perform non-atomic saves, replacing bare `readFileSync` calls in both code paths. This is a small step toward the shared-stages goal.
477+
476478
**Ideal fix:** The pipeline architecture from point #4 eliminates this entirely. Watch mode uses the same pipeline stages, just triggered per-file instead of per-project. The `insertNodes` and `buildEdges` stages are literally the same functions.
477479
478480
---
@@ -583,7 +585,7 @@ Consumers can only import from the documented entry points. Internal modules are
583585
| 9 | Transitive import-aware confidence | Low-Medium | Accuracy |
584586
| 14 | Query result caching | Low | Performance |
585587
| 8 | Config profiles for monorepos | Low | Feature |
586-
| 15 | Unify watcher/builder code paths | Low | Falls out of #4 |
588+
| 15 | Unify watcher/builder code paths | Low | Falls out of #4 (partial: `readFileSafe` shared) |
587589
588590
Items 1–4 and 6 are foundational — they restructure the core and everything else becomes easier after them. Items 13 and 7 are the most impactful feature-level changes. Items 14–15 are natural consequences of earlier changes.
589591

src/builder.js

Lines changed: 46 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -43,8 +43,28 @@ const BUILTIN_RECEIVERS = new Set([
4343
'require',
4444
]);
4545

46-
export function collectFiles(dir, files = [], config = {}, directories = null) {
46+
export function collectFiles(
47+
dir,
48+
files = [],
49+
config = {},
50+
directories = null,
51+
_visited = new Set(),
52+
) {
4753
const trackDirs = directories !== null;
54+
55+
// Resolve real path to detect symlink loops
56+
let realDir;
57+
try {
58+
realDir = fs.realpathSync(dir);
59+
} catch {
60+
return trackDirs ? { files, directories } : files;
61+
}
62+
if (_visited.has(realDir)) {
63+
warn(`Symlink loop detected, skipping: ${dir}`);
64+
return trackDirs ? { files, directories } : files;
65+
}
66+
_visited.add(realDir);
67+
4868
let entries;
4969
try {
5070
entries = fs.readdirSync(dir, { withFileTypes: true });
@@ -67,7 +87,7 @@ export function collectFiles(dir, files = [], config = {}, directories = null) {
6787

6888
const full = path.join(dir, entry.name);
6989
if (entry.isDirectory()) {
70-
collectFiles(full, files, config, directories);
90+
collectFiles(full, files, config, directories, _visited);
7191
} else if (EXTENSIONS.has(path.extname(entry.name))) {
7292
files.push(full);
7393
hasFiles = true;
@@ -125,6 +145,27 @@ function fileStat(filePath) {
125145
}
126146
}
127147

148+
/**
149+
* Read a file with retry on transient errors (EBUSY/EACCES/EPERM).
150+
* Editors performing non-atomic saves can cause these during mid-write.
151+
*/
152+
const TRANSIENT_CODES = new Set(['EBUSY', 'EACCES', 'EPERM']);
153+
const RETRY_DELAY_MS = 50;
154+
155+
export function readFileSafe(filePath, retries = 2) {
156+
for (let attempt = 0; ; attempt++) {
157+
try {
158+
return fs.readFileSync(filePath, 'utf-8');
159+
} catch (err) {
160+
if (attempt < retries && TRANSIENT_CODES.has(err.code)) {
161+
Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, RETRY_DELAY_MS);
162+
continue;
163+
}
164+
throw err;
165+
}
166+
}
167+
}
168+
128169
/**
129170
* Determine which files have changed since last build.
130171
* Three-tier cascade:
@@ -193,7 +234,7 @@ function getChangedFiles(db, allFiles, rootDir) {
193234

194235
let content;
195236
try {
196-
content = fs.readFileSync(absPath, 'utf-8');
237+
content = readFileSafe(absPath);
197238
} catch {
198239
continue;
199240
}
@@ -256,7 +297,7 @@ function getChangedFiles(db, allFiles, rootDir) {
256297
for (const item of needsHash) {
257298
let content;
258299
try {
259-
content = fs.readFileSync(item.file, 'utf-8');
300+
content = readFileSafe(item.file);
260301
} catch {
261302
continue;
262303
}
@@ -459,7 +500,7 @@ export async function buildGraph(rootDir, opts = {}) {
459500
const absPath = path.join(rootDir, relPath);
460501
let code;
461502
try {
462-
code = fs.readFileSync(absPath, 'utf-8');
503+
code = readFileSafe(absPath);
463504
} catch {
464505
code = null;
465506
}

src/watcher.js

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,6 @@
11
import fs from 'node:fs';
22
import path from 'node:path';
3+
import { readFileSafe } from './builder.js';
34
import { EXTENSIONS, IGNORE_DIRS, normalizePath } from './constants.js';
45
import { initSchema, openDb } from './db.js';
56
import { appendJournalEntries } from './journal.js';
@@ -35,7 +36,7 @@ async function updateFile(_db, rootDir, filePath, stmts, engineOpts, cache) {
3536

3637
let code;
3738
try {
38-
code = fs.readFileSync(filePath, 'utf-8');
39+
code = readFileSafe(filePath);
3940
} catch (err) {
4041
warn(`Cannot read ${relPath}: ${err.message}`);
4142
return null;

0 commit comments

Comments
 (0)