Skip to content

Let callers opt out of the GitProvenance committer-history walk#7910

Merged
knutwannheden merged 1 commit into
mainfrom
gitprovenance-opt-out-of-committer-walk
Jun 5, 2026
Merged

Let callers opt out of the GitProvenance committer-history walk#7910
knutwannheden merged 1 commit into
mainfrom
gitprovenance-opt-out-of-committer-walk

Conversation

@knutwannheden
Copy link
Copy Markdown
Contributor

Motivation

GitProvenance.fromProjectDirectory(...) unconditionally calls getCommitters(repository), which does an unbounded git.log() walk over the entire commit history. There is no flag, env gate, or bound on it.

Many callers consume only getOrigin()/getRepositoryOrigin()/getRepositoryPath()/getBranch()/getChange() and never read committers — for them the walk is pure waste. CPU profiling of CommonStaticAnalysis over a ~40k-commit repository (spring-boot) attributed ~11% of the whole run to GitProvenance.getCommitters → JGit PackIndex binary search. This PR lets such callers opt out of the walk.

Examples

// Existing behavior — walks full history to populate committers (unchanged):
GitProvenance withCommitters =
    GitProvenance.fromProjectDirectory(projectDir, environment, gitRemoteParser);

// New overload — skip the committer walk; getCommitters() returns an empty list.
// origin/branch/change/autocrlf/eol/remote are identical to the call above.
GitProvenance fast =
    GitProvenance.fromProjectDirectory(projectDir, environment, gitRemoteParser, /* includeCommitters */ false);

Summary

  • Added a public overload fromProjectDirectory(Path, BuildEnvironment, GitRemote.Parser, boolean includeCommitters).
  • The existing three-arg overload delegates to it with includeCommitters = true, so all existing overloads keep current behavior.
  • Threaded includeCommitters through both private fromGitConfig methods (including the Jenkins branch). When false, the committer field is populated with emptyList() instead of getCommitters(repository), skipping the history walk.
  • Everything else (origin, branch, change, autocrlf, eol, remote) is unchanged, so GitProvenance.equals, RepositoryId derivation, and serialized markers stay byte-identical for committer-consuming callers.

Out of scope / follow-ups

  • Even the committer-including path is unbounded. The only first-party consumer (FindCommitters) bounds to 90 days; adding a since/limit parameter could be a follow-up.
  • A downstream consumer can now drop its hand-rolled reimplementation of the cheap git-config read path in favor of this overload.

Test plan

  • New GitProvenanceTest.excludeCommittersMatchesIncludingPath: on a normal branch checkout, asserts the opt-out overload produces a GitProvenance whose origin/branch/change match the committer-including path, with a non-empty committers list on the including path (proving the walk ran) and an empty list on the opt-out path (proving it was skipped).
  • Existing GitProvenanceTest stays green: 59 tests, 2 @Disabled skipped, 0 failures, 0 errors.

`GitProvenance.fromProjectDirectory(...)` unconditionally walked the entire
commit history to compute the committer list, which is pure waste for callers
that only consume origin/branch/change. Profiling CommonStaticAnalysis over a
~40k-commit repository attributed ~11% of the run to this walk.

Add a `fromProjectDirectory(Path, BuildEnvironment, GitRemote.Parser, boolean
includeCommitters)` overload that threads the flag down to `fromGitConfig`,
substituting an empty committers list when `false`. Existing overloads delegate
with `true`, so current behavior is preserved. Everything else (origin, branch,
change, autocrlf, eol, remote) is unchanged, leaving `equals`, `RepositoryId`
derivation, and serialized markers byte-identical for committer-consuming
callers.
@github-project-automation github-project-automation Bot moved this to In Progress in OpenRewrite Jun 5, 2026
@knutwannheden knutwannheden merged commit f82452d into main Jun 5, 2026
1 check passed
@knutwannheden knutwannheden deleted the gitprovenance-opt-out-of-committer-walk branch June 5, 2026 10:52
@github-project-automation github-project-automation Bot moved this from In Progress to Done in OpenRewrite Jun 5, 2026
jkschneider added a commit that referenced this pull request Jun 6, 2026
…ory` options type (#7921)

PR #7910 added a `boolean includeCommitters` overload to
`GitProvenance.fromProjectDirectory(...)` to let callers skip the unbounded
`git.log()` walk that populates `getCommitters()`. A boolean only offers
all-or-nothing and masks the real question: how much history, at what detail?

Replace it (the boolean is unreleased) with `GitProvenance.CommitHistory`, a
factory-input value type with named presets on two axes:

- Scope (how far the walk goes): `none()`, `full()`, `since(date)` /
  `sinceDaysAgo(n)`, `lastCommits(n)`. `since` and `lastCommits` genuinely prune
  the JGit walk (CommitTimeRevFilter.after / setMaxCount both StopWalk), so they
  are real CPU savings, not output filters.
- Detail (how much per-committer data is retained, only when walking):
  `COMMITTERS` (identities) or `COMMITS_BY_DAY` (full per-day breakdown),
  overridable via `withDetail(...)`.

The off-switch lives on the scope axis, so no nonsense combination like
`since(date).withDetail(NONE)` is expressible. The cheap fields (origin, branch,
change, autocrlf, eol, remote) are always computed. `none()` now yields
`committers == null` ("not computed"), distinct from `emptyList()` ("walked,
found nobody"), keeping `FindCommitters`'s non-null latch correct. The 3-arg
overload keeps full-history behavior.

Also:
- Support linked git worktrees. The shaded JGit (5.13) has no `commondir`
  support and reports a worktree's private gitdir as bare with no refs/objects,
  so `fromProjectDirectory` previously returned null on any worktree checkout.
  Detect the worktree, open the shared common repository (objects/refs/config),
  and recover this worktree's branch/HEAD from its own HEAD file.
- `FindCommitters`/`DistinctCommitters` tolerate identities-only committers
  (empty per-day map -> null last-commit; not dropped by a date filter).
- Add `GitProvenanceBenchmark` (rewrite-benchmarks) measuring the cost curve
  against a deep-history repo. Against openrewrite/rewrite's own history:
  none() ~0.7ms, lastCommits(100) ~10ms, sinceDaysAgo(90) ~20ms, full() ~80ms.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

1 participant