fix(diff): skip huge file rendering #266
Conversation
Greptile Summary

This PR guards against large untracked files stalling startup by short-circuiting the expensive `git diff` patch generation for files that exceed the size threshold.
Confidence Score: 3/5

The call-stack fix and placeholder rendering are safe to merge, but the line-counting path can still block the process for a long time on multi-hundred-MB untracked files: for files that exceed the 1 MB byte-size threshold, `src/core/loaders.ts` — specifically `countLinesInFile` — still reads the whole file.

Important Files Changed
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
  A[buildUntrackedDiffFile] --> B{shouldSkipLargeUntrackedFile?}
  B -->|no| C[runGitUntrackedFileDiffText and parse patch]
  C --> D[buildDiffFile normal path]
  B -->|size over 1MB| E[countLinesInFile reads ENTIRE file]
  B -->|lines in 256KB over 20k| F[countNewlinesInFilePrefix reads 256KB]
  F --> E
  E --> G[createSkippedLargeUntrackedMetadata]
  G --> H[buildDiffFile with isTooLarge=true]
  D --> I[DiffFile]
  H --> I
  I --> J{diffMessage in renderRows}
  J -->|isTooLarge| K[File too large to render message]
  J -->|isBinary| L[Binary file skipped]
  J -->|normal| M[render hunks]
```
Prompt To Fix All With AI

Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 3
src/core/loaders.ts:434
**`countLinesInFile` reads the entire file unconditionally for all skipped files**
`countLinesInFile` is called for every file that `shouldSkipLargeUntrackedFile` flags — including files that tripped the 1 MB byte-size threshold. A user with a 2 GB log file as an untracked worktree entry will hit `shouldSkipLargeUntrackedFile → true` in milliseconds (stat), then spend tens of seconds blocked in `countLinesInFile` reading all 2 GB synchronously in 64 KB chunks before the UI renders. The original goal of the fix — eliminating startup hangs caused by huge untracked files — is therefore still broken for the most extreme cases. Consider capping the read inside `countLinesInFile` to a reasonable byte limit (e.g. `LARGE_UNTRACKED_FILE_MAX_BYTES`) and returning a `null` / `undefined` stat when the full count can't be determined, or deriving an estimate from `stat.size`.
### Issue 2 of 3
src/core/loaders.ts:423-438
**Double file I/O for line-density-detected large files**
When a file is smaller than `LARGE_UNTRACKED_FILE_MAX_BYTES` (1 MB) but has high line density, the path is: `statSync` → `countNewlinesInFilePrefix` (reads up to 256 KB) → function returns `true` → `countLinesInFile` (re-opens and reads the whole file from byte 0). For the regression test case (200 KB, 100k lines) that means reading the file twice in full. Returning the partial newline count from the sniff pass (or threading an `fd` through) would avoid the redundant second open.
### Issue 3 of 3
src/ui/diff/renderRows.tsx:547
The `isTooLarge` message only describes the `--exclude-untracked` flag as a workaround, but `isTooLarge` applies to untracked files specifically — a user who has no idea their file is untracked may find the hint confusing or inapplicable. Aligning the message with the binary-skipped pattern (which omits workaround hints) would be cleaner, or the hint could mention that the file is untracked.
```suggestion
return "Untracked file too large to render. Use --exclude-untracked to hide large untracked files.";
```
Reviews (1). Last reviewed commit: "fix(diff): skip huge untracked file rend..."
```ts
isTooLarge: true,
isUntracked: true,
stats: {
  additions: countLinesInFile(join(repoRoot, filePath)),
```
**`countLinesInFile` reads the entire file unconditionally for all skipped files**

`countLinesInFile` is called for every file that `shouldSkipLargeUntrackedFile` flags — including files that tripped the 1 MB byte-size threshold. A user with a 2 GB log file as an untracked worktree entry will hit `shouldSkipLargeUntrackedFile → true` in milliseconds (stat), then spend tens of seconds blocked in `countLinesInFile` reading all 2 GB synchronously in 64 KB chunks before the UI renders. The original goal of the fix — eliminating startup hangs caused by huge untracked files — is therefore still broken for the most extreme cases. Consider capping the read inside `countLinesInFile` to a reasonable byte limit (e.g. `LARGE_UNTRACKED_FILE_MAX_BYTES`) and returning a `null` / `undefined` stat when the full count can't be determined, or deriving an estimate from `stat.size`.
Fixed by bounding skipped untracked-file line counting to the large-file byte limit. When the count is incomplete, Hunk now marks stats as truncated and renders the addition badge with a trailing + instead of synchronously reading the whole file. Added a regression test for the byte-threshold path.
This comment was generated by Pi using OpenAI GPT-5
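A minimal sketch of the "trailing +" badge behavior the reply describes: when the bounded count stopped before EOF, the additions badge is rendered as a lower bound. `formatAdditionsBadge` is a hypothetical helper for illustration, not the component actually changed in this PR.

```typescript
// Renders an additions badge; a truncated (bounded) count gets a trailing "+"
// to signal "at least this many lines".
function formatAdditionsBadge(count: number, truncated: boolean): string {
  const n = count.toLocaleString("en-US"); // e.g. 12345 -> "12,345"
  return truncated ? `+${n}+` : `+${n}`;
}
```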
```ts
if (shouldSkipLargeUntrackedFile(repoRoot, filePath)) {
  return buildDiffFile(
    createSkippedLargeUntrackedMetadata(filePath),
    "",
    index,
    sourcePrefix,
    agentContext,
    {
      isTooLarge: true,
      isUntracked: true,
      stats: {
        additions: countLinesInFile(join(repoRoot, filePath)),
        deletions: 0,
      },
    },
  );
```
**Double file I/O for line-density-detected large files**

When a file is smaller than `LARGE_UNTRACKED_FILE_MAX_BYTES` (1 MB) but has high line density, the path is: `statSync` → `countNewlinesInFilePrefix` (reads up to 256 KB) → function returns `true` → `countLinesInFile` (re-opens and reads the whole file from byte 0). For the regression test case (200 KB, 100k lines) that means reading the file twice in full. Returning the partial newline count from the sniff pass (or threading an `fd` through) would avoid the redundant second open.
Fixed by reusing the bounded line-count pass for untracked large-file detection and placeholder stats. Line-density skips no longer reopen and reread the file from byte 0; incomplete counts are surfaced as truncated stats.
```tsx
if (file.isTooLarge) {
  return "File too large to render. Use --exclude-untracked to hide large untracked files.";
```
The `isTooLarge` message only describes the `--exclude-untracked` flag as a workaround, but `isTooLarge` applies to untracked files specifically — a user who has no idea their file is untracked may find the hint confusing or inapplicable. Aligning the message with the binary-skipped pattern (which omits workaround hints) would be cleaner, or the hint could mention that the file is untracked.

```suggestion
return "Untracked file too large to render. Use --exclude-untracked to hide large untracked files.";
```
Updated the placeholder copy to be generic: "File too large to render automatically." The placeholder now applies to both tracked and untracked large diffs, so the previous `--exclude-untracked` hint was too specific.
Force-pushed: 18b5e32 → 9810196, then 9810196 → 35c3f3e.
Summary

Huge untracked files are now skipped instead of being run through the `git diff` patch, so large changes do not slow startup. Fixes #218.
Testing

```shell
bun run format:check
bun run typecheck
bun run lint
bun test src/core/loaders.test.ts src/ui/diff/codeColumns.test.ts src/ui/lib/ui-lib.test.ts
bun run scripts/test-large-untracked-render.tsx 700000
bun run scripts/test-large-untracked-render.tsx 700000 tracked
```

This PR description was generated by Pi using OpenAI GPT-5