CNTRLPLANE-3329: Use fuse-overlayfs for build cache instead of full copy#8568
Conversation
Point GOCACHE at the read-only EFS mount (/cache/go-build) instead of copying the entire cache into /tmp at job start. Go's build cache handles read-only directories gracefully by skipping writes. This eliminates the per-job cp -a overhead that was adding ~2 minutes to every CI job since the EFS cache was introduced. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
Skipping CI for Draft Pull Request. |
|
@celebdor: This pull request references CNTRLPLANE-3329 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
📝 WalkthroughWalkthroughThe PR updates the warm-go-cache composite GitHub Action to always create /tmp/go-build-cache and export GOCACHE to it; when /cache/go-build exists and fuse-overlayfs with /dev/fuse is available, the action mounts an overlay (lower=/cache/go-build, upper/work in /tmp) into /tmp/go-build-cache, otherwise it attempts a 120s timed copy from /cache/go-build into /tmp/go-build-cache; failures in both paths result in proceeding without a warmed cache. The action description was updated, and the runner image now installs fuse-overlayfs. Sequence Diagram(s)sequenceDiagram
participant ComponentA
participant ComponentB
ComponentA->>ComponentB: observable interaction
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 12✅ Passed checks (12 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: celebdor The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/actions/warm-go-cache/action.yaml:
- Around line 9-16: The overlay mount branch uses an elif so a failed
fuse-overlayfs mount never triggers the copy fallback; change the logic in the
block that calls fuse-overlayfs so that if fuse-overlayfs exists and
/cache/go-build exists you attempt the mount, and on mount failure explicitly
fall back to the copy step (e.g., check fuse-overlayfs exit status or use a
compound || block to run the timeout cp -a /cache/go-build/.
/tmp/go-build-cache/ fallback and emit a warning). Update the code around the
fuse-overlayfs invocation and the timeout cp -a command so copy to
/tmp/go-build-cache is attempted whenever the mount fails, not only when
fuse-overlayfs is unavailable.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 946e332a-7584-4610-9010-bae27e831707
📒 Files selected for processing (2)
.github/actions/warm-go-cache/action.yamlDockerfile.github-actions-runner
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #8568 +/- ##
==========================================
+ Coverage 40.34% 40.40% +0.06%
==========================================
Files 755 755
Lines 93167 93235 +68
==========================================
+ Hits 37587 37675 +88
+ Misses 52877 52858 -19
+ Partials 2703 2702 -1 see 3 files with indirect coverage changes
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
Replace the per-job `cp -a` of the entire EFS cache with a fuse-overlayfs mount. This gives Go a writable view over the read-only EFS PVC with zero copy overhead — reads hit the EFS mount directly and writes go to a tmpfs upper layer. Falls back to `cp -a` if fuse-overlayfs or /dev/fuse is unavailable, and to an empty cache if both fail. Adds fuse-overlayfs to the runner image. On OpenShift 4.15+ /dev/fuse is available to unprivileged pods without cluster config changes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
553f092 to
56fbf18
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/actions/warm-go-cache/action.yaml:
- Around line 10-16: Ensure the fuse-overlayfs mount uses a truly empty workdir
and that the mountpoint is created before mounting: before invoking the
fuse-overlayfs command referenced in the script (the line with "fuse-overlayfs
-o
lowerdir=/cache/go-build,upperdir=/tmp/go-cache-upper,workdir=/tmp/go-cache-work
/tmp/go-build-cache"), recreate/empty /tmp/go-cache-work (e.g., remove any
contents or rm -rf and then mkdir) so it is guaranteed empty per fuse-overlayfs
requirements, and explicitly mkdir -p /tmp/go-build-cache before the mount
attempt; keep the existing mount invocation and the mounted=true fallback logic
intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 00b72777-978f-48ac-bec4-2c181b014575
📒 Files selected for processing (2)
.github/actions/warm-go-cache/action.yamlDockerfile.github-actions-runner
| if [ -d /cache/go-build ] && command -v fuse-overlayfs >/dev/null 2>&1 && [ -e /dev/fuse ]; then | ||
| mkdir -p /tmp/go-cache-upper /tmp/go-cache-work | ||
| if fuse-overlayfs -o lowerdir=/cache/go-build,upperdir=/tmp/go-cache-upper,workdir=/tmp/go-cache-work /tmp/go-build-cache; then | ||
| mounted=true | ||
| else | ||
| echo "::warning::fuse-overlayfs mount failed, falling back to copy" | ||
| fi |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
What is the correct syntax for fuse-overlayfs mount command with lowerdir, upperdir, and workdir options?
💡 Result:
The correct syntax for the fuse-overlayfs mount command is [1][2]: fuse-overlayfs -o lowerdir=DIR1[:DIR2...],upperdir=UPPER,workdir=WORK MOUNTPOINT Options explanation: -o lowerdir=DIR1[:DIR2...]: A colon-separated list of directories to be merged as the read-only base layers [1][3][4]. -o upperdir=UPPER: The directory where all write changes to the filesystem will be stored [1][3][4]. -o workdir=WORK: An internal working directory required by the filesystem, which must reside on the same filesystem as the upperdir [1][3][4]. MOUNTPOINT: The target directory where the merged filesystem will be accessible [1][3]. Example: fuse-overlayfs -o lowerdir=lower1:lower2,upperdir=upper,workdir=work merged_dir [2] Note that workdir must be an empty directory on the same filesystem as the upperdir [1][3]. If you omit upperdir and workdir, the filesystem will be mounted in a read-only state [5].
Citations:
- 1: https://github.com/containers/fuse-overlayfs/blob/main/fuse-overlayfs.1
- 2: https://github.com/containers/fuse-overlayfs
- 3: https://manpages.debian.org/trixie/fuse-overlayfs/fuse-overlayfs.1.en.html
- 4: https://github.com/containers/fuse-overlayfs/blob/main/fuse-overlayfs.1.md
- 5: https://wiki.archlinux.org/title/Overlay_filesystem
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '1,120p' .github/actions/warm-go-cache/action.yamlRepository: openshift/hypershift
Length of output: 1045
Confirm fuse-overlayfs mount syntax and tighten workdir requirements.
- The command
fuse-overlayfs -o lowerdir=/cache/go-build,upperdir=/tmp/go-cache-upper,workdir=/tmp/go-cache-work /tmp/go-build-cachematches fuse-overlayfs’ documented-o lowerdir=...,upperdir=...,workdir=... MOUNTPOINTsyntax, and/tmp/go-build-cacheis created before the mount call. - To maximize mount success, ensure
/tmp/go-cache-workis an empty directory before mounting (the currentmkdir -pdoesn’t guarantee emptiness per fuse-overlayfs’ requirement).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/actions/warm-go-cache/action.yaml around lines 10 - 16, Ensure the
fuse-overlayfs mount uses a truly empty workdir and that the mountpoint is
created before mounting: before invoking the fuse-overlayfs command referenced
in the script (the line with "fuse-overlayfs -o
lowerdir=/cache/go-build,upperdir=/tmp/go-cache-upper,workdir=/tmp/go-cache-work
/tmp/go-build-cache"), recreate/empty /tmp/go-cache-work (e.g., remove any
contents or rm -rf and then mkdir) so it is guaranteed empty per fuse-overlayfs
requirements, and explicitly mkdir -p /tmp/go-build-cache before the mount
attempt; keep the existing mount invocation and the mounted=true fallback logic
intact.
|
/override ci/prow/e2e-v2-aws |
|
@celebdor: Overrode contexts on behalf of celebdor: ci/prow/e2e-aks, ci/prow/e2e-aws, ci/prow/e2e-aws-upgrade-hypershift-operator, ci/prow/e2e-azure-self-managed, ci/prow/e2e-kubevirt-aws-ovn-reduced, ci/prow/e2e-v2-aws, ci/prow/e2e-v2-gke DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/override ci/prow/images |
|
@celebdor: Overrode contexts on behalf of celebdor: ci/prow/images DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@celebdor: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
Now I have all the evidence to produce the final report. Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryThe Konflux Root CauseThe The
The preflight tool was killed by SIGKILL (signal 9) after 58 seconds on the arm64 platform, which is the characteristic signature of a container exceeding its memory limit (OOM kill by the kubelet/cgroup). The amd64 variant passed in 2 minutes — arm64 images can have different layer sizes and extraction memory profiles. The previous Dockerfile change (commit The PR's actual changes (adding Recommendations
Evidence
|
Summary
fuse-overlayfsto the GitHub Actions runner imagecp -aof the entire EFS cache with a fuse-overlayfs mount that gives Go a writable view over the read-only EFS PVC with zero copy overheadcp -aif fuse-overlayfs or/dev/fuseis unavailable, and to an empty cache if both fail/dev/fuseis available to unprivileged pods without cluster config changesContext
Benchmarking with
contrib/ci/gha-cache-timing.shshowed that all cached workflows (lint, verify, unit tests, envtest) got slower after the EFS cache was introduced (May 13-18) compared to before:The
timeout 120 cp -a /cache/go-build/. /tmp/go-build-cache/in the warm-go-cache action was copying the full cache on every job start, negating the compilation time savings. PointingGOCACHEdirectly at the read-only mount doesn't work either — Go fails hard if it can't write to its cache directory.Test plan
cp -asince the new runner image isn't built yet)contrib/ci/gha-cache-timing.sh🤖 Generated with Claude Code
Summary by CodeRabbit