
fix: trivy cache isolation + 0.70.0 bump + multi-image report heading #223

Merged
Cre-eD merged 1 commit into main from fix/trivy-cache-lock on Apr 19, 2026


Cre-eD commented Apr 19, 2026

Three small fixes surfaced by the first PAY-SPACE prod rollout:

1. Trivy cache lock timeout

Symptom (pay_space_wallet, crypto-tools runs):

```
ERROR Failed to acquire cache or database lock, see https://trivy.dev/docs/...
FATAL image scan error: unable to initialize fs cache: cache may be in use by another process: timeout
```

Cause: Trivy uses file locks on its cache directory to synchronize DB access. When the cache persists across runs (Blacksmith persistent cache) or a prior trivy crashed, stale locks cause the next scan to time out waiting for them.

Fix: Use a per-scan ephemeral cache directory via os.MkdirTemp("<userCache>/trivy", "scan-*"), with defer cleanup in Scan(). Eliminates lock contention entirely. Cost: re-downloads the ~100MB vulnerability DB per scan (a few seconds), acceptable for CI. Codex flagged the earlier "just delete the lock files" approach as unsafe under real concurrency — this version sidesteps the race entirely.

Updated TestEnsureTrivyCacheDir to assert the per-invocation scan-* suffix and that two calls return different directories.

2. Trivy version bump 0.69.3 → 0.70.0

Scan logs from the 0.69.3 runs surface the upstream notice:

```
📣 Notices:
  - Version 0.70.0 of Trivy is now available, current version is 0.69.3
```

3. Multi-image report heading disambiguation

Symptom (PAY-SPACE/pay_space_wallet): the step summary showed what looked like the same Security Pipeline Summary report twice.

Cause: The pay_space_wallet stack builds two images (web + worker), so the pipeline creates one security-report-<image> local.Command per image. Each one writes ## Security Pipeline Summary to $GITHUB_STEP_SUMMARY. Both reports have identical scan counts (same codebase, different entrypoints), so two distinct reports look like a duplicate.

Fix: Suffix the heading with the image name:

```markdown
## Security Pipeline Summary — web
**Image:** `...--web@sha256:...`

## Security Pipeline Summary — worker
**Image:** `...--worker@sha256:...`
```

Verification

  • go build ./... clean
  • go test ./pkg/security/scan/... green (incl. updated cache-dir tests)
  • go test ./pkg/clouds/pulumi/docker/... green

…e report headings

Three small fixes addressing PAY-SPACE feedback from the first prod rollout:

1. trivy cache lock timeout (pay_space_wallet, crypto-tools runs):
   Trivy uses file locks on its cache directory to synchronize DB access.
   When the cache is shared across runs (Blacksmith persistent cache) or a
   prior trivy crashed, stale locks cause: 'unable to initialize fs cache:
   cache may be in use by another process: timeout'. Switch to a per-scan
   ephemeral cache directory via os.MkdirTemp — eliminates lock contention
   entirely at the cost of re-downloading the ~100MB DB per scan, which
   is acceptable for CI. Added Scan() defer cleanup so dirs don't leak.

2. trivy version bump 0.69.3 → 0.70.0:
   Latest upstream release; the 0.69.3 scan logs show an upgrade notice for it.

3. Multi-image report disambiguation:
   Stacks with multiple images (e.g. pay_space_wallet builds web+worker)
   produce one security-report local.Command per image. They wrote
   identical '## Security Pipeline Summary' headings to $GITHUB_STEP_SUMMARY,
   making two distinct reports look like a duplicate. Suffix the heading
   with imageName so web/worker/etc render as visibly separate sections.

Verified: go build ./... clean, go test ./pkg/security/scan/... and
./pkg/clouds/pulumi/docker/... all green (including new multi-scan
thread-safety test for ensureTrivyCacheDir).
Cre-eD merged commit b3d5fbc into main on Apr 19, 2026
9 checks passed
Cre-eD added a commit that referenced this pull request Apr 19, 2026
PR #222 added the sc symlink to this Dockerfile on the staging branch,
but push.yaml (triggered on push to main) also builds and pushes
simplecontainer/github-actions:staging using MAIN's copy of this file.

When any main push triggers push.yaml, it overwrites the :staging image
with a build that doesn't include the symlink — reintroducing
'sc: not found' on downstream deploys (e.g. PAY-SPACE/crypto-tools at
2026-04-19T20:02, right after PR #223 merged to main and re-triggered
push.yaml).

Root cause: two workflows publish the same tag from two branches. Fix
keeps the Dockerfiles in sync by applying the same symlink+verify lines
to main's copy. Mirror of #222 exactly.
Cre-eD added a commit that referenced this pull request Apr 19, 2026
…image

Brings into staging:
- PR #223 (trivy cache isolation, 0.70.0 bump, multi-image report heading)
- AWS tags, cloudflare worker fix, Caddy preStop drain, etc.

Conflicts (4 files):
- trivy.go, trivy_test.go, security_report.go: take main (new features)
- simple_container.go: KEEP staging's serviceSpec() helper + serviceTypeStr
  (main regresses here — inline ServiceSpecArgs lacks ClusterIP guard that
  codex flagged as a P1 during PR #221 review)

Triggers build-staging.yml to publish a :staging image with staging's
fixed Dockerfile (sc symlink from PR #222) AND main's improvements.

Note: this is a TEMPORARY fix. PR #224 (still in review) fixes the root
cause — main's github-actions-staging.Dockerfile. Until that merges,
every push to main re-overwrites :staging with a broken image and this
merge must be repeated.
Cre-eD added a commit that referenced this pull request Apr 19, 2026
…staging with sc symlink (#225)

## Why now

`:staging` image is currently broken (`/bin/sh: sc: not found` on
PAY-SPACE/crypto-tools). Root cause in #224 (main's Dockerfile also
needs the symlink). Until that merges, we can unblock PAY-SPACE by
re-triggering `build-staging.yml` — which builds from **staging's**
Dockerfile (already fixed by #222).

This PR takes the opportunity to also pull main's improvements into
staging so the rebuilt image includes:
- PR #223 — trivy cache isolation, 0.70.0 bump, multi-image report
heading
- Misc main commits (AWS tags, Caddy preStop drain, cloudflare worker
fix, etc.)

## Conflict resolution (4 files)

| File | Taken | Why |
|---|---|---|
| `pkg/security/scan/trivy.go` | main | New cache-isolation + 0.70.0 version bump |
| `pkg/security/scan/trivy_test.go` | main | Updated tests for per-invocation cache dir |
| `pkg/clouds/pulumi/docker/security_report.go` | main | Multi-image heading fix |
| `pkg/clouds/pulumi/kubernetes/simple_container.go` | **staging** | Keep `serviceSpec()` helper + `serviceTypeStr`; main's inline version regresses on the ClusterIP guard (was P1 in #221 codex review) |

## Verification

- `go build ./...` clean
- `go test ./pkg/security/scan/... ./pkg/clouds/pulumi/kubernetes/...
./pkg/clouds/pulumi/docker/...` all green

## Test plan

- [ ] Merge triggers `build-staging.yml`
- [ ] `docker pull simplecontainer/github-actions:staging && docker run
--rm --entrypoint sh ... -c 'which sc'` returns `/usr/local/bin/sc`
- [ ] Re-run PAY-SPACE/crypto-tools deploy — passes

## Follow-up

Still need **PR #224** (main's Dockerfile fix) to merge — otherwise the
next push to main will overwrite `:staging` with a broken image again.

---------

Co-authored-by: universe-ops <177390656+universe-ops@users.noreply.github.com>
Co-authored-by: Universe Ops <universe-ops@github.com>
Co-authored-by: simple-container-forge[bot] <257785999+simple-container-forge[bot]@users.noreply.github.com>
Co-authored-by: GitHub Action <action@github.com>
Co-authored-by: Ilya <smecsia@gmail.com>
Co-authored-by: Bao Tran <baotn166@users.noreply.github.com>
Cre-eD added a commit that referenced this pull request Apr 20, 2026
…le (root cause of repeated :staging regressions) (#224)

## Symptom

After #221 merged main→staging and #222 added the sc symlink to
staging's Dockerfile, PAY-SPACE deploys briefly worked. Then #223 merged
to main and immediately after, PAY-SPACE/crypto-tools started failing
again with:

```
/bin/sh: sc: not found
error: exit status 127: running "... 'sc' 'sbom' 'generate' ..."
```

## Root cause

**Two separate workflows build and push
`simplecontainer/github-actions:staging`:**

| Workflow | Trigger | Dockerfile (from which branch) |
|---|---|---|
| `build-staging.yml` | push to `staging` | `github-actions-staging.Dockerfile` on `staging` branch |
| `push.yaml` | push to `main` | `github-actions-staging.Dockerfile` on `main` branch |

PR #222 only fixed the Dockerfile on **staging**. The main branch copy
never got the `ln -s /root/github-actions /usr/local/bin/sc` line. So
every time anything merges to main, `push.yaml` runs, builds without the
symlink, and **overwrites** the good `:staging` image that was pushed by
`build-staging.yml`.

Verified by `docker pull simplecontainer/github-actions:staging &&
docker run --rm --entrypoint sh .../github-actions:staging -c 'ls -la
/usr/local/bin/sc'`:
```
ls: /usr/local/bin/sc: No such file or directory
```

Image created timestamp matches `push.yaml` run at 19:47 (after PR #223
merged at 19:47), not `build-staging.yml` run at 17:53 (after #222).

## Fix

Apply the identical symlink + verify lines to main's copy of
`github-actions-staging.Dockerfile`. Both workflows now produce an image
with `sc` in PATH. This keeps the two Dockerfiles in sync going forward.

## Long-term

Two workflows publishing the same tag from two branches is fragile —
whichever runs last wins. Consider consolidating: either
`build-staging.yml` is the sole publisher of `:staging`, or `push.yaml`
drops the staging tag. Out of scope for this PR.

## Test plan

- [ ] Merge triggers `push.yaml` → new `:staging` image pushed
- [ ] `docker pull simplecontainer/github-actions:staging && docker run
--rm --entrypoint sh ... -c 'sc --help | head -1'` succeeds
- [ ] Re-run PAY-SPACE/crypto-tools deploy — passes end-to-end