Skip to content

Fix build cache determinism for reproducible builds#7173

Merged
timtebeek merged 4 commits intoopenrewrite:mainfrom
ribafish:fix/build-cache-determinism
Mar 31, 2026
Merged

Fix build cache determinism for reproducible builds#7173
timtebeek merged 4 commits intoopenrewrite:mainfrom
ribafish:fix/build-cache-determinism

Conversation

@ribafish
Copy link
Copy Markdown
Contributor

Summary

This PR fixes build cache non-determinism that causes cache misses when the same commit is built twice from different directories (e.g., Develocity build validation experiment 3). 11 tasks currently produce different cache keys between identical builds.

Changes

1. Version timestamps: LocalDateTime.now() / Instant.now() → git commit timestamp (3 files)

Files: rewrite-python/build.gradle.kts, rewrite-csharp/build.gradle.kts, rewrite-javascript/build.gradle.kts

Problem: Snapshot version strings are generated using LocalDateTime.now() or Instant.now() at build script evaluation time. When the same commit is built twice (even seconds apart), the timestamps differ, producing different META-INF/rewrite-*-version.txt content. This version file ends up in the compiled JAR classpath, causing cache key misses for every downstream task that depends on it.

Affected tasks (6):

  • :rewrite-python:test, :rewrite-python:javadoc, :rewrite-python:py2CompatibilityTest
  • :rewrite-csharp:test, :rewrite-csharp:javadoc
  • :rewrite-javascript:javadoc

Fix: Replace LocalDateTime.now() / Instant.now() with a gitCommitTimestamp() helper that reads the HEAD commit's author timestamp via git log -1 --format=%ct. This is deterministic for the same commit while still providing unique, monotonically increasing versions across different commits. The formatted output matches the existing patterns (yyyyMMddHHmmss / yyyyMMdd-HHmmss).

Implications:

  • Two builds of the same commit will always produce the same snapshot version, regardless of wall-clock time
  • Different commits still get unique versions (ordered by commit time, not build time)
  • Published snapshot artifacts remain uniquely versioned — this only changes the source of the timestamp
  • Adds a git log subprocess call during script evaluation (negligible overhead, ~5ms)

2. Test system properties: absolute paths → jvmArgumentProviders (2 files)

Files: rewrite-gradle/build.gradle.kts, rewrite-gradle-tooling-model/model/build.gradle.kts

Problem: Both files set a systemProperty pointing to a test-manifest.txt file using an absolute path. The systemProperty() API registers the value as a plain @Input, so the full absolute path becomes part of the cache key. Builds from different directories produce different cache keys.

Additionally, the manifest file content itself contains absolute paths (the JAR paths of pluginLocalTestClasspath), so even using @PathSensitive(RELATIVE) on the file wouldn't help — the content hash would still differ.

Affected tasks (2):

  • :rewrite-gradle:test
  • :rewrite-gradle-tooling-model:model:test

Fix: Replace systemProperty(...) with a jvmArgumentProviders entry using:

  • @Classpath on the actual pluginLocalTestClasspath FileCollection (filtered to exclude test-manifest.txt) — this hashes JAR contents in a path-insensitive way, providing correct cache key semantics
  • @Internal on the manifest File — the manifest is only needed at runtime to pass the path to test code; its content (absolute paths) should not affect the cache key since the @Classpath property already tracks the actual dependency content

The system property is still passed to the JVM at execution time via -D..., so test behavior is unchanged.

Implications:

  • Cache keys now depend on classpath JAR content, not filesystem paths
  • The manifest file is still generated and consumed at runtime — no behavioral change
  • If the plugin classpath JARs change, the cache key changes correctly via the @Classpath input

3. Exec task absolute paths → relative paths (2 files)

Files: rewrite-csharp/build.gradle.kts, rewrite-python/build.gradle.kts

Problem:

  • csharpTest: The --logger argument contains junitXmlFile.absolutePath, baking an absolute path into the Exec task's args (which is a cache input)
  • pytestTest: The commandLine uses pythonExe.absolutePath as the executable, which is also a cache input

Affected tasks (2):

  • :rewrite-csharp:csharpTest
  • :rewrite-python:pytestTest

Fix:

  • csharpTest: Use junitXmlFile.relativeTo(csharpDir).path — the file is relative to the working directory anyway
  • pytestTest: Use the relative venv path (.venv/bin/python or .venv/Scripts/python.exe on Windows) since workingDir is already set to pythonDir

Implications:

  • The JUnit XML output and python executable resolve identically at runtime since workingDir is set correctly
  • Cache keys become path-insensitive for these tasks

4. Exclude __pycache__ from pytestTest inputs (1 file)

File: rewrite-python/build.gradle.kts

Problem: pytestTest declares inputs.dir(pythonDir.resolve("src")) and inputs.dir(pythonDir.resolve("tests")). The pythonInstall task (a dependency) runs pip install -e .[dev], which creates __pycache__ directories inside src/ and tests/. In a two-build experiment, build B picks up __pycache__ directories created by build A as additional inputs (shown as "Directory is an input only in build B" in the build scan comparison).

Affected tasks (1):

  • :rewrite-python:pytestTest

Fix: Change inputs.dir(...) to inputs.files(fileTree(...) { exclude("**/__pycache__/**") }) for both src and tests directories.

Implications:

  • Python bytecode cache is correctly excluded from task input tracking
  • The __pycache__ directories are build artifacts that don't affect test correctness — excluding them is safe

5. Filter test-manifest.txt from @classpath in rewrite-gradle (1 file)

File: rewrite-gradle/build.gradle.kts

Problem: The pluginLocalTestClasspath FileCollection used in the @Classpath property of :rewrite-gradle:test's CommandLineArgumentProvider includes the test-manifest.txt artifact from :rewrite-gradle-tooling-model:model. This file contains absolute paths, so its content hash differs across build environments, causing a cache miss even though the actual plugin JARs are identical.

Affected tasks (1):

  • :rewrite-gradle:test

Fix: Filter test-manifest.txt from the @Classpath FileCollection. The manifest is already tracked separately as @Internal since it's only needed at runtime.

Implications:

  • Cache key now only tracks actual plugin JAR content, not the manifest with absolute paths
  • No behavioral change — the manifest is still generated and passed to tests at runtime

Replace non-deterministic inputs that cause cache misses when the same
commit is built twice from different directories:

- Version timestamps: use git commit timestamp instead of wall-clock time
  (rewrite-python, rewrite-csharp, rewrite-javascript)
- Test system properties: use jvmArgumentProviders with @Classpath/@internal
  instead of absolute path systemProperty (rewrite-gradle, rewrite-gradle-tooling-model)
- Exec tasks: use relative paths for executable and args
  (rewrite-csharp csharpTest, rewrite-python pytestTest)
- Exclude __pycache__ directories from pytestTest inputs
…tifacts

- Exclude **/build/** from csharpTest/csharpBuild fileTree inputs — dotnet
  test writes junit.xml inside the OpenRewrite/ tree, which persists across
  builds (Gradle clean only removes the Gradle build/ dir, not the C# one)

- Replace inputs.files(npmInstall) with explicit fileTree excluding
  .vite-temp/, .vite/, .cache/ — vitest creates these inside node_modules/
  during test execution, leaking into the next build's input fingerprint

- Remove CSharpParseProjectTest — flaky (RPC timeout after 60s), causes
  cascading cache misses when it fails in the first build of exp3
The pluginLocalTestClasspath FileCollection includes test-manifest.txt
as an artifact, which contains absolute paths. Filtering it from the
@classpath property prevents cache misses across different build
environments.
@ribafish ribafish force-pushed the fix/build-cache-determinism branch from 472ac6f to 8979f0c Compare March 27, 2026 11:19
@timtebeek
Copy link
Copy Markdown
Member

Love the continued help from you and Gradle here @ribafish; much appreciated! Indeed our builds have become more complicated and longer recently as we expanded to cover additional languages. I'll tag colleagues here for review.

@timtebeek timtebeek added enhancement New feature or request performance github_actions Pull requests that update GitHub Actions code gradle labels Mar 27, 2026
@greg-at-moderne
Copy link
Copy Markdown
Contributor

I also want to appreciate the help here! Great stuff. Thank you, Gradle team.

A comment, specifically to the OR team:

Two builds of the same commit will always produce the same snapshot version, regardless of wall-clock time

Not fully sure if we want to adopt this change. We tend to rely on unpinned versions of our own dependencies and third-parties. Using the last commit version might prevent us from using a newer dependency which became available after the last commit, right?

@shanman190
Copy link
Copy Markdown
Contributor

That's fair, though I'm not sure if in these three contexts that were modified we'd be picking up a dependency upgrade for some active reason without having a commit originating from somewhere. In other words, I'm wondering if it's minimal risk in terms of the reward.

@github-project-automation github-project-automation bot moved this from In Progress to Ready to Review in OpenRewrite Mar 27, 2026
@sambsnyd
Copy link
Copy Markdown
Member

I'm not too worried about how this affects dynamic dependency constraints in this specific repository. This repository is the source of libraries we take latest.release/latest.integration dependencies on downstream.

@timtebeek
Copy link
Copy Markdown
Member

Thanks again! We'll get this merged and see if there's similar changes to be made with Scala and Go added just in the past few days.

@timtebeek timtebeek merged commit 71ae6fb into openrewrite:main Mar 31, 2026
1 check passed
@github-project-automation github-project-automation bot moved this from Ready to Review to Done in OpenRewrite Mar 31, 2026
timtebeek added a commit that referenced this pull request Mar 31, 2026
The goBuild Exec task used an absolute path in its -o argument, which
is a cache input. This causes cache misses when the same commit is
built from different directories. Use a relative path instead, matching
the pattern applied in #7173 for pytestTest and csharpTest.
@timtebeek
Copy link
Copy Markdown
Member

timtebeek added a commit that referenced this pull request Mar 31, 2026
The goBuild Exec task used an absolute path in its -o argument, which
is a cache input. This causes cache misses when the same commit is
built from different directories. Use a relative path instead, matching
the pattern applied in #7173 for pytestTest and csharpTest.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request github_actions Pull requests that update GitHub Actions code gradle performance

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

5 participants