Let callers opt out of the GitProvenance committer-history walk #7910
Merged
`GitProvenance.fromProjectDirectory(...)` unconditionally walked the entire commit history to compute the committer list, which is pure waste for callers that only consume origin/branch/change. Profiling CommonStaticAnalysis over a ~40k-commit repository attributed ~11% of the run to this walk. Add a `fromProjectDirectory(Path, BuildEnvironment, GitRemote.Parser, boolean includeCommitters)` overload that threads the flag down to `fromGitConfig`, substituting an empty committers list when `false`. Existing overloads delegate with `true`, so current behavior is preserved. Everything else (origin, branch, change, autocrlf, eol, remote) is unchanged, leaving `equals`, `RepositoryId` derivation, and serialized markers byte-identical for committer-consuming callers.
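The delegation pattern described above can be sketched in isolation. This is a minimal illustration, not the real `GitProvenance`: `ProvenanceSketch`, `fromHistory`, `walkHistory`, and the `walkCount` counter are hypothetical names, and the "walk" is a loop over an in-memory list rather than a JGit log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical stand-in for the pattern in this PR; NOT the real GitProvenance API.
class ProvenanceSketch {
    // Counts how often the expensive walk ran, so the skip is observable.
    static int walkCount = 0;

    final String origin;
    final String branch;
    final List<String> committers;

    private ProvenanceSketch(String origin, String branch, List<String> committers) {
        this.origin = origin;
        this.branch = branch;
        this.committers = committers;
    }

    // Existing-style overload: delegates with true, so behavior is unchanged.
    static ProvenanceSketch fromHistory(String origin, String branch, List<String> commitAuthors) {
        return fromHistory(origin, branch, commitAuthors, true);
    }

    // New-style overload: the flag decides whether the history walk runs at all.
    static ProvenanceSketch fromHistory(String origin, String branch,
                                        List<String> commitAuthors, boolean includeCommitters) {
        List<String> committers = includeCommitters
                ? walkHistory(commitAuthors)
                : Collections.<String>emptyList();
        return new ProvenanceSketch(origin, branch, committers);
    }

    // Stand-in for the unbounded history walk: visits every commit once
    // and keeps distinct authors in first-seen order.
    private static List<String> walkHistory(List<String> commitAuthors) {
        walkCount++;
        return new ArrayList<>(new LinkedHashSet<>(commitAuthors));
    }

    public static void main(String[] args) {
        List<String> authors = Arrays.asList("ann", "bob", "ann");
        ProvenanceSketch cheap = fromHistory("origin-url", "main", authors, false);
        System.out.println(cheap.committers.isEmpty() + " " + walkCount); // true 0
        ProvenanceSketch full = fromHistory("origin-url", "main", authors);
        System.out.println(full.committers + " " + walkCount);            // [ann, bob] 1
    }
}
```

The cheap fields are assigned identically on both paths, which is why equality of origin and branch between the two results is a reasonable thing for a test to assert.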
jkschneider added a commit that referenced this pull request on Jun 6, 2026
…ory` options type (#7921)

PR #7910 added a `boolean includeCommitters` overload to `GitProvenance.fromProjectDirectory(...)` to let callers skip the unbounded `git.log()` walk that populates `getCommitters()`. A boolean only offers all-or-nothing and masks the real question: how much history, at what detail?

Replace it (the boolean is unreleased) with `GitProvenance.CommitHistory`, a factory-input value type with named presets on two axes:

- Scope (how far the walk goes): `none()`, `full()`, `since(date)` / `sinceDaysAgo(n)`, `lastCommits(n)`. `since` and `lastCommits` genuinely prune the JGit walk (`CommitTimeRevFilter.after` / `setMaxCount` both StopWalk), so they are real CPU savings, not output filters.
- Detail (how much per-committer data is retained, only when walking): `COMMITTERS` (identities) or `COMMITS_BY_DAY` (full per-day breakdown), overridable via `withDetail(...)`.

The off-switch lives on the scope axis, so no nonsense combination like `since(date).withDetail(NONE)` is expressible. The cheap fields (origin, branch, change, autocrlf, eol, remote) are always computed. `none()` now yields `committers == null` ("not computed"), distinct from `emptyList()` ("walked, found nobody"), keeping `FindCommitters`'s non-null latch correct. The 3-arg overload keeps full-history behavior.

Also:

- Support linked git worktrees. The shaded JGit (5.13) has no `commondir` support and reports a worktree's private gitdir as bare with no refs/objects, so `fromProjectDirectory` previously returned null on any worktree checkout. Detect the worktree, open the shared common repository (objects/refs/config), and recover this worktree's branch/HEAD from its own HEAD file.
- `FindCommitters`/`DistinctCommitters` tolerate identities-only committers (empty per-day map -> null last-commit; not dropped by a date filter).
- Add `GitProvenanceBenchmark` (rewrite-benchmarks) measuring the cost curve against a deep-history repo. Against openrewrite/rewrite's own history: `none()` ~0.7ms, `lastCommits(100)` ~10ms, `sinceDaysAgo(90)` ~20ms, `full()` ~80ms.
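To make the two-axis idea concrete, here is a rough model of such an options type. It is a sketch under stated assumptions, not the real `GitProvenance.CommitHistory`: the class name `CommitHistorySketch`, the `Commit` holder, the `collect` method, the `visited` counter, and the default detail level of each preset are all invented for illustration, and the history is an in-memory list assumed to be ordered newest-first.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a two-axis options type; NOT the real GitProvenance.CommitHistory.
final class CommitHistorySketch {
    enum Detail { COMMITTERS, COMMITS_BY_DAY } // deliberately no NONE constant

    // A commit reduced to the two fields this sketch needs.
    static final class Commit {
        final String author;
        final LocalDate day;
        Commit(String author, LocalDate day) { this.author = author; this.day = day; }
    }

    private final boolean walk;     // scope axis: false only for none()
    private final int maxCommits;   // scope axis: Integer.MAX_VALUE means unbounded
    private final LocalDate since;  // scope axis: null means no date bound
    private final Detail detail;    // detail axis: only consulted when walking
    int visited = 0;                // commits touched by the last collect() call

    private CommitHistorySketch(boolean walk, int maxCommits, LocalDate since, Detail detail) {
        this.walk = walk;
        this.maxCommits = maxCommits;
        this.since = since;
        this.detail = detail;
    }

    // The default detail level of each preset is an assumption made for this sketch.
    static CommitHistorySketch none() { return new CommitHistorySketch(false, 0, null, Detail.COMMITTERS); }
    static CommitHistorySketch full() { return new CommitHistorySketch(true, Integer.MAX_VALUE, null, Detail.COMMITS_BY_DAY); }
    static CommitHistorySketch lastCommits(int n) { return new CommitHistorySketch(true, n, null, Detail.COMMITS_BY_DAY); }
    static CommitHistorySketch since(LocalDate date) { return new CommitHistorySketch(true, Integer.MAX_VALUE, date, Detail.COMMITS_BY_DAY); }

    CommitHistorySketch withDetail(Detail d) { return new CommitHistorySketch(walk, maxCommits, since, d); }

    // Returns null when no walk was requested ("not computed");
    // an empty map means "walked, found nobody".
    Map<String, Map<LocalDate, Integer>> collect(List<Commit> newestFirst) {
        visited = 0;
        if (!walk) return null;
        Map<String, Map<LocalDate, Integer>> out = new LinkedHashMap<>();
        for (Commit c : newestFirst) {
            // Stop the walk itself instead of filtering its output afterwards.
            if (visited >= maxCommits || (since != null && c.day.isBefore(since))) break;
            visited++;
            Map<LocalDate, Integer> perDay = out.computeIfAbsent(c.author, a -> new TreeMap<>());
            // Identities-only detail leaves the per-day map empty.
            if (detail == Detail.COMMITS_BY_DAY) perDay.merge(c.day, 1, Integer::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Commit> log = Arrays.asList(
                new Commit("ann", LocalDate.of(2026, 6, 5)),
                new Commit("bob", LocalDate.of(2026, 6, 4)),
                new Commit("ann", LocalDate.of(2026, 1, 2)));
        System.out.println(none().collect(log)); // null
        System.out.println(lastCommits(2).withDetail(Detail.COMMITTERS).collect(log)); // {ann={}, bob={}}
    }
}
```

Because `Detail` has no "off" constant and only `none()` disables the walk, a caller cannot ask for a bounded walk that retains nothing. The early `break` is what separates pruning from output filtering: commits past the bound are never visited.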
Motivation
`GitProvenance.fromProjectDirectory(...)` unconditionally calls `getCommitters(repository)`, which does an unbounded `git.log()` walk over the entire commit history. There is no flag, env gate, or bound on it.

Many callers consume only `getOrigin()`/`getRepositoryOrigin()`/`getRepositoryPath()`/`getBranch()`/`getChange()` and never read committers; for them the walk is pure waste. CPU profiling of `CommonStaticAnalysis` over a ~40k-commit repository (spring-boot) attributed ~11% of the whole run to `GitProvenance.getCommitters` → JGit `PackIndex` binary search. This PR lets such callers opt out of the walk.

Examples
Summary
- Add a `fromProjectDirectory(Path, BuildEnvironment, GitRemote.Parser, boolean includeCommitters)` overload.
- Existing overloads delegate with `includeCommitters = true`, so they all keep current behavior.
- Thread `includeCommitters` through both private `fromGitConfig` methods (including the Jenkins branch). When `false`, the committer field is populated with `emptyList()` instead of `getCommitters(repository)`, skipping the history walk.
- Everything else (origin, branch, change, autocrlf, eol, remote) is unchanged, so `GitProvenance.equals`, `RepositoryId` derivation, and serialized markers stay byte-identical for committer-consuming callers.

Out of scope / follow-ups
- … `FindCommitters`) bounds to 90 days; adding a `since`/limit parameter could be a follow-up.

Test plan
- `GitProvenanceTest.excludeCommittersMatchesIncludingPath`: on a normal branch checkout, asserts the opt-out overload produces a `GitProvenance` whose `origin`/`branch`/`change` match the committer-including path, with a non-empty committers list on the including path (proving the walk ran) and an empty list on the opt-out path (proving it was skipped).
- `GitProvenanceTest` stays green: 59 tests, 2 `@Disabled` skipped, 0 failures, 0 errors.