Description
Describe the bug
When syncing a generic-git-host connection with a large number of local repositories (~14,000+ in my case), sourcebot skips thousands of repositories with the warning "Skipping [path] - not a git repository" even though the directories are valid git repositories. This occurs during connection compilation in compileGenericGitHostConfig_file.
Root Cause:
The issue is caused by attempting to validate all repositories simultaneously using Promise.all in packages/backend/src/repoCompileUtils.ts at line 485:
await Promise.all(repoPaths.map(async (repoPath) => {
    const isGitRepo = await isPathAValidGitRepoRoot({
        path: repoPath,
    });
    // ...
}));
When processing 14,000+ repositories, this creates 14,000+ concurrent git processes. The system runs out of resources (process/file descriptor limits), causing git operations to fail with:
GitError: Error: spawn git EAGAIN
The isPathAValidGitRepoRoot function then returns false, causing valid repositories to be skipped.
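To illustrate why this matters, here is a minimal sketch of a check that separates "git could not be spawned" from "this path is not a repository". The names `classifyRepoPath`, `isResourceExhaustion`, and the `RepoCheck` type are hypothetical and not part of sourcebot. The sketch also assumes the underlying check throws on spawn failures rather than swallowing them, which is not how `isPathAValidGitRepoRoot` currently behaves according to the description above.

```typescript
// Hypothetical types and helpers, for illustration only.
type RepoCheck = (path: string) => Promise<boolean>;
type RepoCheckResult = 'repo' | 'not-a-repo' | 'check-failed';

// Treat EAGAIN/EMFILE spawn errors as resource exhaustion. The message test
// covers wrapped errors such as "GitError: Error: spawn git EAGAIN".
function isResourceExhaustion(err: unknown): boolean {
    const code = (err as { code?: unknown } | null)?.code;
    const message = err instanceof Error ? err.message : String(err);
    return (
        code === 'EAGAIN' ||
        code === 'EMFILE' ||
        /\bspawn\b.*\b(EAGAIN|EMFILE)\b/.test(message)
    );
}

// Assumption: `check` rejects when git cannot be spawned.
async function classifyRepoPath(
    path: string,
    check: RepoCheck,
): Promise<RepoCheckResult> {
    try {
        return (await check(path)) ? 'repo' : 'not-a-repo';
    } catch (err) {
        if (isResourceExhaustion(err)) {
            // The repository was never actually inspected, so do not
            // report it as "not a git repository".
            return 'check-failed';
        }
        throw err;
    }
}
```

With a distinction like this, a caller could log or retry `check-failed` paths instead of silently skipping them.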
To reproduce
- Create a generic-git-host connection with a large number of local repositories:
{
    "connections": {
        "my-local-repos": {
            "type": "git",
            "url": "file:///repos/**/*"
        }
    }
}
- Ensure the glob pattern matches 10,000+ git repositories
- Start sourcebot and wait for connection sync to complete
- Check the logs for warnings:
docker logs <container> 2>&1 | grep "Skipping.*not a git repository" | wc -l
This will show thousands of skipped repositories.
- Verify that the skipped paths are actually valid git repositories:
docker exec <container> git -C /path/to/skipped/repo status
docker exec <container> git -C /path/to/skipped/repo rev-parse --show-cdup
Both commands work fine, confirming these are valid git repos.
- (Optional) To reproduce the EAGAIN error directly, run this test in the container:
const { readdir } = require('fs/promises');
const simpleGit = require('simple-git');
const path = require('path');
(async () => {
    const basePath = '/repos/';
    const shards = await readdir(basePath);
    let allRepoPaths = [];
    for (const shard of shards) {
        const shardPath = path.join(basePath, shard);
        const repos = await readdir(shardPath);
        allRepoPaths = allRepoPaths.concat(
            repos.map(repo => path.join(shardPath, repo))
        );
    }
    console.log(`Testing ${allRepoPaths.length} repos...`);
    // This will fail with EAGAIN:
    await Promise.all(allRepoPaths.map(async (repoPath) => {
        const git = simpleGit.simpleGit().cwd({ path: repoPath });
        return await git.checkIsRepo(simpleGit.CheckRepoActions.IS_REPO_ROOT);
    }));
})();
Sourcebot deployment information
Sourcebot version (e.g. v3.0.1): v4.8.1
Additional information
Proposed Fix
Process repositories in batches instead of all at once. Replace the unbounded Promise.all with a batched approach:
const repos: RepoData[] = [];
const warnings: string[] = [];
// Process repos in batches to avoid resource exhaustion (EAGAIN errors)
// when checking thousands of repos simultaneously
const BATCH_SIZE = 100;
for (let i = 0; i < repoPaths.length; i += BATCH_SIZE) {
    const batch = repoPaths.slice(i, i + BATCH_SIZE);
    logger.debug(`Processing repo batch ${Math.floor(i / BATCH_SIZE) + 1}/${Math.ceil(repoPaths.length / BATCH_SIZE)} (${batch.length} repos)`);
    await Promise.all(batch.map(async (repoPath) => {
        const isGitRepo = await isPathAValidGitRepoRoot({
            path: repoPath,
        });
        // ... rest of existing logic
    }));
}
This approach:
- Limits concurrent git processes to 100 at a time (configurable)
- Prevents system resource exhaustion
- Maintains all existing validation logic
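Since the same pattern may show up elsewhere, the batching could be pulled into a small reusable helper. The following is only a sketch: `mapInBatches` is a hypothetical name that does not exist in sourcebot, and the batch size is passed in by the caller rather than read from any real configuration option.

```typescript
// Hypothetical helper (not part of sourcebot): applies `fn` to every item,
// running at most `batchSize` calls concurrently by processing the input
// in sequential batches. Results are returned in input order.
async function mapInBatches<T, R>(
    items: readonly T[],
    batchSize: number,
    fn: (item: T) => Promise<R>,
): Promise<R[]> {
    if (!Number.isInteger(batchSize) || batchSize < 1) {
        throw new RangeError(`batchSize must be a positive integer, got ${batchSize}`);
    }
    const results: R[] = [];
    for (let i = 0; i < items.length; i += batchSize) {
        const batch = items.slice(i, i + batchSize);
        // Wait for the whole batch to settle before starting the next one.
        const batchResults = await Promise.all(batch.map(fn));
        for (const result of batchResults) {
            results.push(result);
        }
    }
    return results;
}
```

A call site could then look roughly like `await mapInBatches(repoPaths, 100, (repoPath) => isPathAValidGitRepoRoot({ path: repoPath }))`, with the existing per-repo logic moved into the callback.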
Additional Notes
- The issue manifests as intermittent behavior because resource availability varies
- Adjusting system limits (ulimit -n, process limits) does not fully prevent the issue
- The batch size of 100 is conservative and could potentially be increased to 200-500; better still, it could be made configurable
- This same pattern may exist in other parts of the codebase that process large numbers of items concurrently
I have tested the fix locally and it successfully processes all 14,000+ repositories without skipping any.
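As a possible alternative to fixed batches, a worker-pool style limiter keeps up to a fixed number of checks in flight at all times instead of waiting for the slowest item in each batch. This is a different technique from the tested fix above, shown only as a sketch. `mapWithConcurrency` is a hypothetical name, and the limit is an assumption to be tuned per deployment.

```typescript
// Hypothetical helper (not part of sourcebot): runs `fn` over `items` with
// at most `limit` calls in flight. Each worker pulls the next unclaimed
// index as soon as its previous call finishes. Results keep input order.
async function mapWithConcurrency<T, R>(
    items: readonly T[],
    limit: number,
    fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
    if (!Number.isInteger(limit) || limit < 1) {
        throw new RangeError(`limit must be a positive integer, got ${limit}`);
    }
    const results = new Array<R>(items.length);
    let next = 0;

    const worker = async (): Promise<void> => {
        // JavaScript runs this synchronously between awaits, so claiming
        // an index with `next++` cannot race with other workers.
        while (next < items.length) {
            const index = next++;
            results[index] = await fn(items[index] as T, index);
        }
    };

    const workerCount = Math.min(limit, items.length);
    await Promise.all(Array.from({ length: workerCount }, () => worker()));
    return results;
}
```

If any call rejects, the returned promise rejects with that error, while calls already in flight on other workers still run to completion.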