Skip to content

refactor(GitService): make getStatus faster#1265

Merged
arnestrickmann merged 2 commits intogeneralaction:mainfrom
anuragts:as/improve-git-diff-perf
Mar 4, 2026
Merged

refactor(GitService): make getStatus faster#1265
arnestrickmann merged 2 commits intogeneralaction:mainfrom
anuragts:as/improve-git-diff-perf

Conversation

@anuragts
Copy link
Contributor

@anuragts anuragts commented Mar 4, 2026

Problem

getStatus() spawns 2 git diff --numstat processes per file in a sequential loop. 100 files = 200 process
spawns, blocking the main thread.

Fix

#1264
Run 2 total git diff --numstat calls (staged + unstaged) for all files at once via Promise.all(), parse output
into a Map, then look up per file in O(1).

Benchmark

Changed files Before After Speedup
10 306ms 41ms 7.5x
25 660ms 41ms 16x
50 1,319ms 41ms 32x
100 2,771ms 41ms 67x

O(n) → O(1). All 391 tests pass.

Old -

old.mp4

New -

CleanShot.2026-03-04.at.07.04.45.mp4

…tracking and make it fast by extraction diff from one api call instead of looped call
@vercel
Copy link

vercel bot commented Mar 4, 2026

@anuragts is attempting to deploy a commit to the General Action Team on Vercel.

A member of the Team first needs to authorize it.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 4, 2026

Greptile Summary

This PR refactors getStatus in GitService.ts to replace the previous O(n) sequential per-file git diff --numstat calls (2 spawns per file) with two bulk Promise.all calls — one staged, one unstaged — followed by O(1) map lookups per file. The approach is architecturally sound and delivers the claimed speedups for large working-tree diffs.

Key changes:

  • Two total execFileAsync calls replace 2N per-file calls; untracked file line-counting is also parallelised.
  • A new parseNumstatMap helper parses git diff --numstat output into a Map<filePath, {add, del}>.
  • Results are assembled in a single entries.map(...) pass using the two maps.

Issue found:

  • git diff --numstat --cached (without a path filter) uses rename detection by default and emits renamed-file entries using the combined old => new or {old => new} format as the path field. parseNumstatMap stores that raw string as the map key, but the lookup key is always the new path extracted from git status --porcelain. For files that are renamed and modified, stagedMap.get(newPath) returns undefined and the additions/deletions are silently reported as 0 — a behavioural regression vs the old per-file calls, where specifying -- newPath forced git to emit just the new path.

Confidence Score: 3/5

  • Safe to merge for most workflows, but introduces a behavioural regression for staged renames-with-modifications due to a git numstat path-format mismatch.
  • The bulk-numstat approach is correct and well-implemented for the common cases (new, modified, deleted, and untracked files). The performance improvement is real and substantial. However, git diff --numstat --cached emits rename entries using old => new path notation when rename detection is active (the default), while the lookup key is always the bare new path. For any file that is both renamed and modified in a single stage operation, additions/deletions will be reported as 0 instead of their actual values. This is a silent regression with no error or warning. The fix is straightforward (normalise the rename notation in parseNumstatMap) but the issue should be addressed before merging.
  • src/main/services/GitService.ts — specifically the parseNumstatMap helper and its interaction with git's rename-detection output format.

Important Files Changed

Filename Overview
src/main/services/GitService.ts Refactors getStatus to run two bulk git diff --numstat calls instead of 2N per-file calls, with a map-based O(1) lookup per entry. Significant performance improvement, but the batch command's rename-detection output format (old => new) doesn't match the new-path key used for lookup, causing additions/deletions to be silently zeroed for renamed+modified files.

Sequence Diagram

sequenceDiagram
    participant C as Caller
    participant GS as getStatus
    participant Git as git (child process)
    participant FS as Filesystem

    Note over C,FS: ── Before (O(n) spawns) ──
    C->>GS: getStatus(taskPath)
    GS->>Git: git status --porcelain
    Git-->>GS: N status lines
    loop for each of N files
        GS->>Git: git diff --numstat --cached -- fileN
        Git-->>GS: staged stats
        GS->>Git: git diff --numstat -- fileN
        Git-->>GS: unstaged stats
    end
    GS-->>C: GitChange[]

    Note over C,FS: ── After (O(1) spawns) ──
    C->>GS: getStatus(taskPath)
    GS->>Git: git status --porcelain
    Git-->>GS: N status lines
    par staged batch
        GS->>Git: git diff --numstat --cached
        Git-->>GS: all staged stats
    and unstaged batch
        GS->>Git: git diff --numstat
        Git-->>GS: all unstaged stats
    end
    GS->>GS: parseNumstatMap → stagedMap, unstagedMap
    opt untracked files (in parallel)
        GS->>FS: countFileNewlinesCapped (×M)
        FS-->>GS: line counts
    end
    GS->>GS: entries.map → lookup in maps O(1)
    GS-->>C: GitChange[]
Loading

Last reviewed commit: d5c2bc6

Comment on lines +122 to +142
const parseNumstatMap = (stdout: string): Map<string, { add: number; del: number }> => {
const map = new Map<string, { add: number; del: number }>();
if (!stdout || !stdout.trim()) return map;
for (const line of stdout.trim().split('\n')) {
if (!line.trim()) continue;
const parts = line.split('\t');
if (parts.length >= 3) {
const add = parts[0] === '-' ? 0 : parseInt(parts[0], 10) || 0;
const del = parts[1] === '-' ? 0 : parseInt(parts[1], 10) || 0;
const file = parts.slice(2).join('\t');
const existing = map.get(file);
if (existing) {
existing.add += add;
existing.del += del;
} else {
map.set(file, { add, del });
}
}
};
}
return map;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed files with modifications will silently report 0 additions/deletions

When git diff --numstat --cached is invoked without a path filter, git's rename detection is active by default. For a staged rename-with-modification, the output line looks like:

5	3	old_file.ts => new_file.ts

or in the in-place format:

5	3	src/components/{OldButton => NewButton}.tsx

parseNumstatMap stores the key as the full third tab-field ("old_file.ts => new_file.ts" or "src/components/{OldButton => NewButton}.tsx"). The corresponding entries entry, however, has filePath = "new_file.ts" (parsed from git status --porcelain). Because stagedMap.get("new_file.ts") finds nothing, additions and deletions fall back to 0 + 0 = 0.

In the old per-file approach, git diff --numstat --cached -- new_file.ts forced git to emit just new_file.ts as the key, so the lookup always matched.

A fix is to normalise rename notation when inserting into the map:

// After: const file = parts.slice(2).join('\t');
let resolvedFile = file;
if (file.includes(' => ')) {
  // "old => new"  or  "prefix/{old => new}/suffix"
  resolvedFile = file
    .replace(/\{[^}]+ => ([^}]+)\}/g, '$1')  // in-place: {old => new}
    .replace(/^.+ => (.+)$/, '$1');           // full rename: old => new
}
const existing = map.get(resolvedFile);
if (existing) { ... } else { map.set(resolvedFile, { add, del }); }

Note: pure renames (zero content changes) produce 0\t0\t..., so the zero-fallback happens to be correct for them — but any rename-with-modification (which refactoring tools frequently produce) will show incorrect stats.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch, fixed

@arnestrickmann
Copy link
Contributor

Thanks for the PR! @anuragts

@arnestrickmann arnestrickmann merged commit 6fe3307 into generalaction:main Mar 4, 2026
2 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants