Commit 74f51ad

adnaan and claude committed
fix(validate): unflake mermaid validator on Ubuntu CI (--disable-dev-shm-usage + bump per-file deadline 15s → 60s)
Surfaced while running the broadcast-redesign Phase-6 wave through livetemplate/docs#27's build workflow: the same docs content that validates cleanly in ~17s on a devbox failed in 5 successive CI runs with two error modes:

    ✗ index.md: chrome failed to start: Failed to connect to the bus
    ✗ recipes/architecture-flow.md: context deadline exceeded
    ✗ recipes/how-this-site-works.md: context deadline exceeded

Each rerun flagged a *different* set of files (sometimes index.md alone; sometimes 3 unrelated files including ones the docs PR didn't touch), with the failures clustering on cold-runner runs. Local repro: 17s total for 55 files / 5 mermaid diagrams against the same content.

The bug was in two layers:

1. **Missing `--disable-dev-shm-usage`.** Ubuntu CI runners default `/dev/shm` to 64MB (Docker/Actions cgroup default); Chrome's renderer process attempts to allocate shared memory there, OOMs, and the recovery path manifests as the `"Failed to connect to the bus"` D-Bus negotiation failure rather than a clean OOM error. Switching to `/tmp` via `--disable-dev-shm-usage` eliminates both the OOM and the misleading D-Bus message. Plus `--disable-extensions` + `--no-first-run` for faster cold-start.

2. **15s per-file deadline too tight on slow runners.** `validateMermaidDiagrams` creates a fresh `chromedp.NewExecAllocator` per file. Chrome cold-start on Ubuntu CI is routinely 5-10s; the diagram loop then adds Navigate + Sleep(2s) + Evaluate (~3s per diagram). On devbox: 15s holds. On CI with a slow cold start: only 5-7s budget remains, which explains the per-file randomness in which files fail. Bumped to 60s with a per-file rationale comment.

Verified locally: same `time /tmp/tk-bin validate content/` against the docs site = 17s (no slowdown from the new flags), all 55 files pass. The CI side should follow once `docs#27` re-runs against a tinkerdown@main that includes this commit.
This unblocks livetemplate/docs#27 (which has been red on `build` since it was opened) without changing any docs content; the docs content was always valid, and the validator's deadline was simply too tight for slow CI hardware.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent dbab6f6 commit 74f51ad

1 file changed

Lines changed: 22 additions & 2 deletions

cmd/tinkerdown/commands/validate.go

@@ -171,11 +171,25 @@ func validateMermaidDiagrams(filePath string) ([]string, error) {
 
 	var errors []string
 
-	// Create chrome context
+	// Create chrome context.
+	//
+	// --disable-dev-shm-usage is the critical flag on Ubuntu CI runners:
+	// the default /dev/shm is 64MB on Docker/Actions runners, Chrome's
+	// renderer OOMs trying to allocate shared memory there, and the
+	// fallback path manifests as "chrome failed to start: Failed to
+	// connect to the bus" (D-Bus negotiation failure after the shm OOM).
+	// Switching to /tmp via --disable-dev-shm-usage avoids both.
+	//
+	// --disable-extensions and --no-first-run shave a second or two off
+	// cold-start by skipping the default extension scan and welcome
+	// flow Chromium does on a fresh profile.
 	opts := append(chromedp.DefaultExecAllocatorOptions[:],
 		chromedp.Flag("headless", true),
 		chromedp.Flag("disable-gpu", true),
 		chromedp.Flag("no-sandbox", true),
+		chromedp.Flag("disable-dev-shm-usage", true),
+		chromedp.Flag("disable-extensions", true),
+		chromedp.Flag("no-first-run", true),
 	)
 
 	allocCtx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
@@ -184,7 +198,13 @@ func validateMermaidDiagrams(filePath string) ([]string, error) {
 	ctx, cancel := chromedp.NewContext(allocCtx)
 	defer cancel()
 
-	ctx, cancel = context.WithTimeout(ctx, 15*time.Second)
+	// Per-FILE deadline. Chrome cold-start can take 5-10s on a slow Ubuntu
+	// CI runner, and each diagram below adds Navigate + Sleep(2s) +
+	// Evaluate (~3s). 60s comfortably fits the worst case observed on CI
+	// (cold Chrome + several diagrams + occasional D-Bus init jitter)
+	// without masking real hangs. The previous 15s budget was tight on
+	// devbox and routinely missed on CI.
+	ctx, cancel = context.WithTimeout(ctx, 60*time.Second)
 	defer cancel()
 
 	// Create a simple HTML page with Mermaid
