
[bug] Promise.all resource exhaustion causes thousands of repos to be skipped during generic-git-host connection sync #590

@derek-miller

Description


Describe the bug

When syncing a generic-git-host connection with a large number of local repositories (~14,000+ in my case), Sourcebot skips thousands of repositories with the warning "Skipping [path] - not a git repository", even though the directories are valid git repositories. This occurs during connection compilation in compileGenericGitHostConfig_file.

Root Cause:
The issue is caused by attempting to validate all repositories simultaneously using Promise.all in packages/backend/src/repoCompileUtils.ts at line 485:

```typescript
await Promise.all(repoPaths.map(async (repoPath) => {
  const isGitRepo = await isPathAValidGitRepoRoot({
      path: repoPath,
  });
  // ...
}));
```

When processing 14,000+ repositories, this attempts to spawn 14,000+ git processes concurrently. The system runs out of resources (process/file descriptor limits), causing git operations to fail with:

```
GitError: Error: spawn git EAGAIN
```

The isPathAValidGitRepoRoot function then returns false, causing valid repositories to be skipped.
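Because the check returns false when git cannot be spawned, a resource failure is indistinguishable from "not a git repository". A minimal sketch of separating the two outcomes, assuming the check can be wrapped by the caller. The names `RepoCheckResult`, `checkRepo`, and `isResourceExhaustionError` are hypothetical and not part of Sourcebot; the internals of isPathAValidGitRepoRoot are not shown in this issue.

```typescript
// Hypothetical result type: "check-failed" means git could not be spawned,
// which says nothing about whether the path is a repository.
type RepoCheckResult = "repo" | "not-repo" | "check-failed";

// EAGAIN: process/thread limit reached; EMFILE/ENFILE: too many open files.
const RESOURCE_ERROR_CODES: readonly string[] = ["EAGAIN", "EMFILE", "ENFILE"];

function isResourceExhaustionError(err: unknown): boolean {
  if (!(err instanceof Error)) return false;
  const code = (err as Error & { code?: unknown }).code;
  if (typeof code === "string" && RESOURCE_ERROR_CODES.includes(code)) return true;
  // Wrapped errors such as "GitError: Error: spawn git EAGAIN" may only
  // carry the errno name inside the message text.
  const message = err.message;
  return RESOURCE_ERROR_CODES.some((c) => message.includes(c));
}

// Wraps a hypothetical boolean check so resource errors are reported
// separately instead of being treated as "not a repo".
async function checkRepo(
  repoPath: string,
  isRepoRoot: (p: string) => Promise<boolean>,
): Promise<RepoCheckResult> {
  try {
    return (await isRepoRoot(repoPath)) ? "repo" : "not-repo";
  } catch (err) {
    return isResourceExhaustionError(err) ? "check-failed" : "not-repo";
  }
}
```

With this split, "check-failed" paths could be retried or logged as errors rather than silently skipped as non-repositories.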

To reproduce

  1. Create a generic-git-host connection with a large number of local repositories:

```json
{
  "connections": {
    "my-local-repos": {
      "type": "git",
      "url": "file:///repos/**/*"
    }
  }
}
```

  2. Ensure the glob pattern matches 10,000+ git repositories.

  3. Start Sourcebot and wait for the connection sync to complete.

  4. Check the logs for warnings:

```shell
docker logs <container> 2>&1 | grep "Skipping.*not a git repository" | wc -l
```

This will show thousands of skipped repositories.

  5. Verify that the skipped paths are actually valid git repositories:

```shell
docker exec <container> git -C /path/to/skipped/repo status
docker exec <container> git -C /path/to/skipped/repo rev-parse --show-cdup
```

Both commands work fine, confirming these are valid git repos.

  6. (Optional) To reproduce the EAGAIN error directly, run this test in the container:
```javascript
const { readdir } = require('fs/promises');
const { simpleGit, CheckRepoActions } = require('simple-git');
const path = require('path');

(async () => {
  const basePath = '/repos/';
  const shards = await readdir(basePath);

  // Collect every <shard>/<repo> directory under basePath
  let allRepoPaths = [];
  for (const shard of shards) {
    const shardPath = path.join(basePath, shard);
    const repos = await readdir(shardPath);
    allRepoPaths = allRepoPaths.concat(
      repos.map(repo => path.join(shardPath, repo))
    );
  }

  console.log(`Testing ${allRepoPaths.length} repos...`);

  // Spawns one git process per repo, all at once; this will fail with EAGAIN:
  await Promise.all(allRepoPaths.map(async (repoPath) => {
    const git = simpleGit().cwd({ path: repoPath });
    return await git.checkIsRepo(CheckRepoActions.IS_REPO_ROOT);
  }));
  console.log('All checks completed without errors');
})().catch((err) => {
  // Surface the first failure (e.g. "spawn git EAGAIN") instead of an unhandled rejection
  console.error('Check failed:', err.message);
  process.exitCode = 1;
});
```

Sourcebot deployment information

Sourcebot version: v4.8.1

Additional information

Proposed Fix

Process repositories in batches instead of all at once. Replace the unbounded Promise.all with a batched approach:

```typescript
const repos: RepoData[] = [];
const warnings: string[] = [];

// Process repos in batches to avoid resource exhaustion (EAGAIN errors)
// when checking thousands of repos simultaneously
const BATCH_SIZE = 100;
for (let i = 0; i < repoPaths.length; i += BATCH_SIZE) {
  const batch = repoPaths.slice(i, i + BATCH_SIZE);
  logger.debug(`Processing repo batch ${Math.floor(i / BATCH_SIZE) + 1}/${Math.ceil(repoPaths.length / BATCH_SIZE)} (${batch.length} repos)`);

  await Promise.all(batch.map(async (repoPath) => {
      const isGitRepo = await isPathAValidGitRepoRoot({
          path: repoPath,
      });
      // ... rest of existing logic
  }));
}
```

This approach:

  • Limits concurrent git processes to 100 at a time (adjustable via BATCH_SIZE)
  • Prevents system resource exhaustion
  • Maintains all existing validation logic
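One trade-off of fixed batches is that each batch waits for its slowest check before the next 100 start. A common alternative, sketched here as an assumption rather than Sourcebot's code, is a small worker pool that keeps up to `limit` checks in flight at all times. `mapWithConcurrency` is a hypothetical helper name; the npm package `p-limit` provides equivalent behavior if adding a dependency is acceptable.

```typescript
/**
 * Hypothetical helper: map over `items` with at most `limit` calls to `fn`
 * in flight at once. Results are returned in input order, like Promise.all.
 */
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError(`limit must be a positive integer, got ${limit}`);
  }
  const results = new Array<R>(items.length);
  // Index of the next unclaimed item; safe to share because JS runs
  // each worker's synchronous section without interleaving.
  let next = 0;

  // Each worker claims the next item as soon as its previous one finishes,
  // so a single slow check does not stall the other slots.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

In the proposed fix, the batch loop would become something like `await mapWithConcurrency(repoPaths, BATCH_SIZE, async (repoPath) => { ... })` with the body unchanged. As with Promise.all, the returned promise rejects on the first error thrown by `fn`, while the remaining workers keep draining the queue.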

Additional Notes

  • The issue manifests as intermittent behavior because resource availability varies
  • Raising system limits (ulimit -n, process limits) does not fully prevent the issue
  • The batch size of 100 is conservative; it could likely be increased to 200-500, or better yet, made configurable
  • This same pattern may exist in other parts of the codebase that process large numbers of items concurrently
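If the limit is made configurable, a sketch of parsing it safely, assuming it is read from an environment variable. The name `REPO_CHECK_CONCURRENCY` is hypothetical, not an existing Sourcebot setting; missing or invalid values fall back to the proposed default of 100.

```typescript
// Default taken from BATCH_SIZE in the proposed fix.
const DEFAULT_REPO_CHECK_CONCURRENCY = 100;

/**
 * Parse a concurrency limit from a raw string, e.g. the value of a
 * hypothetical REPO_CHECK_CONCURRENCY environment variable.
 * Missing, blank, non-numeric, non-integer, or non-positive values
 * fall back to the default.
 */
function parseConcurrency(
  raw: string | undefined,
  fallback: number = DEFAULT_REPO_CHECK_CONCURRENCY,
): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}
```

Usage would be along the lines of `const BATCH_SIZE = parseConcurrency(process.env.REPO_CHECK_CONCURRENCY);`. Falling back rather than throwing keeps a typo in the setting from blocking connection sync.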

I have tested the fix locally, and it successfully processes all 14,000+ repositories without skipping any.

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
