perf: make updater faster by streaming instead of reading into memory#3630
Conversation
…g hidden/large dirs (node_modules, vendor, docs, etc.) to avoid OOMs and hangs; frees ASTs immediately. - ts-morph dependency bumped to ^27.0.2.
| let totalFiles = 0; | ||
| let userFileCount = 0; | ||
| for await ( | ||
| const entry of walk(dir, { | ||
| includeDirs: false, | ||
| includeFiles: true, | ||
| exts: ["js", "jsx", "ts", "tsx"], | ||
| skip: [SKIP_DIRS], | ||
| }) | ||
| ) { | ||
| totalFiles++; | ||
| if (!HIDE_FILES.test(entry.path)) userFileCount++; | ||
| } |
There was a problem hiding this comment.
Let's store the found paths, so that we don't need to walk the file system again a second time.
There was a problem hiding this comment.
Agreed. I just pushed a new commit.
…again a second time
| const HIDE_FILES = /[\\/]+(node_modules|vendor)[\\/]+/; | ||
| // Directories we should completely skip when walking the tree. This keeps us | ||
| // from loading vendored dependencies (often thousands of files) into ts-morph. | ||
| const SKIP_DIRS = |
There was a problem hiding this comment.
What's the difference between HIDE_FILES and SKIP_DIRS? Aren't both used for the same thing? Let's unify that.
bartlomieju
left a comment
There was a problem hiding this comment.
Good fix for a real OOM problem. The core idea — stream files one-by-one via walk + forget()/removeSourceFile() instead of loading everything via addSourceFilesAtPaths — is solid. A few things to address:
1. userFileCount vs totalFiles are always equal (logic bug)
Since walk is called with skip: [SKIP_DIRS], no file from a skipped directory will ever appear in filesToProcess. That means the if (!SKIP_DIRS.test(entry.path)) userFileCount++ check will always be true for every entry, so userFileCount === totalFiles always. The distinction between the two is dead code after the refactor.
Same applies to the modifiedFilesToShow filter — it's filtering against paths that can never match since walk already excluded them.
You can simplify by dropping userFileCount entirely and just using filesToProcess.length.
2. docs in SKIP_DIRS is too aggressive
```
node_modules|vendor|docs|.git|.next|.turbo|_fresh|dist|build|target|.cache
```
A user's Fresh 1.x project could easily have a docs/ directory containing actual .tsx routes or components that need migration. Unlike node_modules/vendor/dist/build which contain generated or third-party code, docs is a common name for user-authored content. I'd remove docs from this list.
3. The old code comment said "sequential" but was actually parallel
The original code had:
// process files sequentially to show proper progress
await Promise.all(sfs.map(async (sourceFile) => { ... }));That was parallel via Promise.all, not sequential. The new for loop actually is sequential — good fix, but worth noting this was also a latent bug (progress bar was racing).
4. Minor: build and target in skip list
build is a pretty common directory name that could appear in user projects (e.g., routes/build/index.tsx). Consider whether matching it only at the project root level would be safer, or just remove it. target (Rust convention) is unlikely to contain TS/TSX files so that's fine.
5. Test change looks correct
The updated expectations for node_modules/foo/bar.ts and vendor/foo/bar.ts to retain old imports (not be transformed) is the right behavior — these files should now be skipped entirely.
Overall this is a good improvement. The main things to fix are #1 (dead code from the refactor) and #2 (docs being too aggressive).
- Remove `docs` and `build` from SKIP_DIRS — these are common names for user-authored directories that may need migration - Remove redundant `userFileCount`/`totalFiles`/`modifiedFilesToShow` variables — since `walk` already skips excluded dirs, the post-walk filtering was dead code Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix system crash (OOM) caused by updater spending time in folders like docs/
Closes #3483