fix(backend): add filesystem-first scan to GC to handle orphaned disk resources#973
Conversation
This comment has been minimized.
This comment has been minimized.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review infoConfiguration used: Organization UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughAdds a filesystem-first garbage collection step at scheduler startup: new utilities extract repo IDs from disk paths, RepoIndexManager scans repo and index cache directories for orphaned entries (no DB record) and deletes them; startup now awaits this cleanup before scheduling. Changes
Sequence Diagram(s)sequenceDiagram
participant Scheduler
participant RepoIndexManager
participant FileSystem as File System
participant Database
Scheduler->>RepoIndexManager: startScheduler()
activate RepoIndexManager
RepoIndexManager->>RepoIndexManager: cleanupOrphanedDiskResources()
Note over RepoIndexManager: Scan REPOS_CACHE_DIR
RepoIndexManager->>FileSystem: readdir(REPOS_CACHE_DIR)
FileSystem-->>RepoIndexManager: [repo_dirs]
loop each repo_dir
RepoIndexManager->>RepoIndexManager: getRepoIdFromPath(dir)
RepoIndexManager->>Database: repoExists(repoId)?
alt not found
RepoIndexManager->>FileSystem: delete repo directory
end
end
Note over RepoIndexManager: Scan INDEX_CACHE_DIR
RepoIndexManager->>FileSystem: readdir(INDEX_CACHE_DIR)
FileSystem-->>RepoIndexManager: [shard_files]
loop each shard_file
RepoIndexManager->>RepoIndexManager: getRepoIdFromShardFileName(file)
RepoIndexManager->>Database: repoExists(repoId)?
alt not found
RepoIndexManager->>FileSystem: delete shard file
end
end
deactivate RepoIndexManager
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
… resources Adds cleanupOrphanedDiskResources() which walks the repos/ and index/ directories on disk and removes any entries with no corresponding Repo record in the database. This handles desyncs caused by DB resets or cascade deletes that bypass the normal cleanup job flow. Also adds getRepoIdFromPath() to @sourcebot/shared and getRepoIdFromShardFileName() to the backend utils as inverse helpers to the existing getRepoPath() and getShardPrefix() functions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
0423e5c to
feef580
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/backend/src/repoIndexManager.ts`:
- Around line 657-675: The orphan shard cleanup currently does a per-entry DB
lookup (this.db.repo.findUnique) causing N+1 queries; instead, collect all
repoIds by mapping entries through getRepoIdFromShardFileName, deduplicate them,
fetch existing repos once via this.db.repo.findMany({ where: { id: { in: [...] }
} }), build a Set of existing ids, and then iterate entries removing files whose
repoId is missing from that Set; apply the same batching approach used for repo
directories under INDEX_CACHE_DIR to avoid per-entry DB calls.
In `@packages/shared/src/utils.ts`:
- Around line 90-93: The getRepoIdFromPath function uses parseInt which accepts
trailing characters; change it to validate the basename is strictly numeric
before parsing (e.g., test path.basename(repoPath) against a /^\d+$/ check) and
only then convert to a number (or use Number after validation) so names like
"123_tmp" return undefined; update getRepoIdFromPath to perform this strict
check and return undefined for non-pure-numeric basenames.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
CHANGELOG.mdpackages/backend/src/repoIndexManager.tspackages/backend/src/utils.tspackages/shared/src/index.server.tspackages/shared/src/utils.ts
Summary
cleanupOrphanedDiskResources()toRepoIndexManagerwhich walks therepos/andindex/directories on disk and removes any entries with no correspondingReporecord in the databasefindManyquery for both repo directories and index shards instead of per-entryfindUniquecalls (avoids N+1).sourcebot/persists on disk)getRepoIdFromPath()to@sourcebot/shared(inverse ofgetRepoPath()) andgetRepoIdFromShardFileName()to backend utils (inverse ofgetShardPrefix())Test plan
.sourcebot/repos/and.sourcebot/index/contain data, restart the backend, and verify orphaned directories and shards are removed on startup🤖 Generated with Claude Code
Summary by CodeRabbit