feat(privacy): block merge-data when private wiki pages exist #3329
Adds a defense-in-depth check before the merge-data PR is created: if `knowledge/wiki/repos/` contains a file whose name matches a private repo entry's canonical slug or `node_id`, the workflow stops. Unit 6's dispatch gate already prevents new private wiki pages from being written; this catches anything that slips through.

- The workflow checks the `data` branch (not `main`) via a second checkout step scoped to `data-branch-check/`, so the gate inspects the tree being promoted rather than the already-merged destination.
- GraphQL slug-resolution failures now fail closed: if any private entry's canonical name cannot be resolved, the script exits 1 and lists the failing `node_id`s. Falling back to `node_id`-only matching when the canonical name is unrecoverable would let a slug-named leak slip through undetected.
- `fs.readdir` errors other than ENOENT now propagate. ENOENT is handled gracefully (fresh checkout, no wiki yet); permission errors and other FS failures must not be silently swallowed.
- The GraphQL response is parsed into `unknown` and narrowed via a type guard, so there are no unsafe casts at the JSON boundary.
- The pure `detectPrivateWikiLeaks` function has full unit coverage (16 tests); `resolveCanonicalSlugs` and `loadWikiFilenames` are exported for direct testing. The CLI is thin glue.
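A minimal sketch of what a pure `detectPrivateWikiLeaks` could look like, assuming each private entry carries a `nodeId` and an optional `canonicalSlug`, and that leaks are grouped per entry. The `PrivateEntry` and `Leak` shapes, and the exact-match rule for node IDs, are assumptions for illustration, not the script's actual types.

```typescript
// Hypothetical input/output shapes; the real script's types may differ.
interface PrivateEntry {
  nodeId: string;
  // Absent when no slug was resolved; the pure function then falls back
  // to node_id-only matching for this entry.
  canonicalSlug?: string;
}

type LeakReason = "slug" | "node_id";

interface Leak {
  nodeId: string;
  reasons: LeakReason[];
  filenames: string[];
}

export function detectPrivateWikiLeaks(params: {
  privateEntries: readonly PrivateEntry[];
  wikiRepoFilenames: readonly string[];
}): Leak[] {
  // Only Markdown pages count; pair each with its extension-less stem.
  const pages = params.wikiRepoFilenames
    .filter((name) => /\.md$/i.test(name))
    .map((name) => ({ name, stem: name.replace(/\.md$/i, "") }));

  const leaks: Leak[] = [];
  for (const entry of params.privateEntries) {
    const reasons = new Set<LeakReason>();
    const filenames: string[] = [];
    for (const page of pages) {
      // Slug match is case-insensitive (GitHub preserves case in nameWithOwner).
      const slugHit =
        entry.canonicalSlug !== undefined &&
        page.stem.toLowerCase() === entry.canonicalSlug.toLowerCase();
      // Assumption: node IDs are compared exactly.
      const idHit = page.stem === entry.nodeId;
      if (slugHit) reasons.add("slug");
      if (idHit) reasons.add("node_id");
      if (slugHit || idHit) filenames.push(page.name);
    }
    if (filenames.length > 0) {
      leaks.push({ nodeId: entry.nodeId, reasons: Array.from(reasons), filenames });
    }
  }
  return leaks;
}
```

Because the function takes plain arrays and returns plain data, cases like "both reasons on one entry", `.md`-only filtering, and a missing slug can be exercised without GraphQL or the filesystem; the CLI would then block whenever the returned array is non-empty.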
fro-bot left a comment
The dual-checkout pattern is the right move here — reading metadata/repos.yaml and knowledge/wiki/repos/ from the data tree being promoted (not main) is what makes this gate actually evaluate the merge content. Fail-closed posture is consistent: GraphQL failure throws, non-ENOENT FS errors propagate, schema validation propagates, and the only graceful path (no private entries → exit 0) is the one where there's nothing to leak. Tests cover the fail-closed paths individually, which is what matters for a privacy gate.
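The ENOENT-only graceful path described above can be sketched as follows. The name matches the exported `loadWikiFilenames` mentioned in the PR, but the body (promise-based `readdir`, a plain directory argument) is an illustrative assumption, not the script's code.

```typescript
import { readdir } from "node:fs/promises";

// True only when `err` is an object whose `code` property is "ENOENT".
function isEnoent(err: unknown): boolean {
  return typeof err === "object" && err !== null && "code" in err && err.code === "ENOENT";
}

// Lists entries in the wiki directory. A missing directory (fresh checkout,
// no wiki yet) yields an empty list; every other failure propagates so the
// privacy gate fails closed instead of reporting "no leaks".
export async function loadWikiFilenames(dir: string): Promise<string[]> {
  try {
    return await readdir(dir);
  } catch (err: unknown) {
    if (isEnoent(err)) return [];
    throw err; // EACCES, EPERM, ENOTDIR, EIO, ...
  }
}
```

Pointing the function at a regular file triggers `ENOTDIR`, which exercises the propagate branch without relying on permission bits (those behave differently when tests run as root).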
Verdict: PASS
Blocking issues
None.
Non-blocking concerns
- `scripts/check-wiki-private-presence.ts:114`: `node_id` is interpolated directly into the GraphQL query string: `query={ node(id: "${entry.node_id}") ... }`. The value flows from `metadata/repos.yaml` on the `data` branch, which a PR author could tamper with (already tracked as #3327). A crafted `node_id` containing `"` or `)` would produce malformed GraphQL, which the API rejects, and the script then fails closed. So the worst case is a self-DoS of the gate, not exfiltration. Still worth tightening to a parameterized form (`gh api graphql -F nodeId=...` with `$nodeId: ID!`) when you tackle #3327: it eliminates the injection surface entirely and produces clearer error messages.
- `scripts/check-wiki-private-presence.ts:124-127`: when GraphQL returns `data.node: null` (deleted repo, revoked app access), the type guard correctly rejects it and the entry lands in `failures`. The thrown error message then says "investigate token scope / repo access," which is helpful, but operators may not realize the `null` branch is distinct from the thrown-error branch. Consider distinguishing "resolution returned null" from "subprocess threw" in the failures list: one is a permission/lifecycle issue, the other is infra.
- `scripts/check-wiki-private-presence.ts:67`: `stem.toLowerCase() === entry.canonicalSlug.toLowerCase()` matches the test for `MarcusRBrown--Poly.md`, but the canonical wiki convention (per the wiki context) is the lowercased `owner--repo` form. If the input slug from `nameWithOwner` ever contains mixed case (it can; GitHub preserves case in `nameWithOwner`), you'll want to verify behavior end-to-end. The case-insensitive comparison handles it; this just flags that the resolved `canonicalSlug` is not normalized to lowercase before it is stored in `resolved`.
- `.github/workflows/merge-data.yaml:31-41`: the second checkout fetches `ref: data` without specifying a `token`, so it falls back to `GITHUB_TOKEN` (job default, `contents: read`). That works because `data` is in the same repo and readable with default permissions. Worth a comment in the workflow noting why both checkouts exist (scripts from `main`, content from `data`); future maintainers will wonder.
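The parameterized form suggested in the `:114` concern could look like this. `gh api graphql` treats every field other than `query` and `operationName` as a GraphQL variable, so the node ID never enters the query text. The helper name and the `nameWithOwner` selection are assumptions. The sketch uses `-f` (raw string) rather than the suggested `-F`, because `-F` applies type conversion and reads a file when the value starts with `@`.

```typescript
// Hypothetical helper. The query is a constant; the node ID is passed as a
// typed GraphQL variable ($nodeId: ID!), so quotes or braces in a tampered
// node_id can no longer change the query's structure.
const NODE_QUERY = `query($nodeId: ID!) {
  node(id: $nodeId) {
    ... on Repository { nameWithOwner }
  }
}`;

export function buildNodeQueryArgs(nodeId: string): string[] {
  return [
    "api",
    "graphql",
    "-f",
    `query=${NODE_QUERY}`,
    // -f sends the value verbatim as a string variable.
    "-f",
    `nodeId=${nodeId}`,
  ];
}

// Usage (requires gh on PATH and an authenticated token, e.g. via GH_TOKEN):
// import { execFileSync } from "node:child_process";
// const raw = execFileSync("gh", buildNodeQueryArgs(entry.node_id), { encoding: "utf8" });
```

Since `execFileSync` does not go through a shell, each array element reaches `gh` as a single argument.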
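For the `:124-127` concern, one option is a small discriminated union so the thrown message groups failures by cause. The `ResolutionFailure` type, the `kind` names, and the message wording below are hypothetical.

```typescript
// Hypothetical failure record: "null-node" means the API answered but the
// node is gone or invisible (deleted repo, revoked app access); "exec-error"
// means the gh subprocess itself failed (network, auth, timeout).
type ResolutionFailure =
  | { nodeId: string; kind: "null-node" }
  | { nodeId: string; kind: "exec-error"; detail: string };

export function formatResolutionFailures(failures: readonly ResolutionFailure[]): string {
  const nullNodes = failures.filter((f) => f.kind === "null-node").map((f) => f.nodeId);
  const execErrors = failures.flatMap((f) =>
    f.kind === "exec-error" ? [`${f.nodeId} (${f.detail})`] : [],
  );
  const lines: string[] = [];
  if (nullNodes.length > 0) {
    lines.push(`node returned null (check repo lifecycle / app access): ${nullNodes.join(", ")}`);
  }
  if (execErrors.length > 0) {
    lines.push(`gh subprocess failed (check infra / token): ${execErrors.join(", ")}`);
  }
  return lines.join("\n");
}
```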
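On the `:67` concern, normalizing once when the slug is stored keeps every later comparison on the lowercase `owner--repo` form. A sketch with hypothetical names, assuming the slug is derived from `nameWithOwner` via `.replace('/', '--')` as the review describes:

```typescript
// Derive and store the slug already lowercased. GitHub preserves case in
// nameWithOwner (e.g. "MarcusRBrown/Poly"); the wiki convention is lowercase.
export function toStoredSlug(nameWithOwner: string): string {
  return nameWithOwner.replace("/", "--").toLowerCase();
}

// The stored slug is lowercase, but filenames on the data branch may not be,
// so the stem is still lowered before comparing.
export function stemMatchesStoredSlug(stem: string, storedSlug: string): boolean {
  return stem.toLowerCase() === storedSlug;
}
```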
Missing tests
- No test for the `node: null` GraphQL response shape (only thrown-error and well-formed responses are covered). The type guard rejects it, but a direct test would lock in the contract: a `null` node is treated as a failure, not a silent skip. Add a `mockExecFileSync.mockReturnValue(JSON.stringify({data: {node: null}}))` case asserting `resolveCanonicalSlugs` throws.
- No test for the `nameWithOwner` → `owner--repo` slug conversion (the `.replace('/', '--')` on line 122). A single test for `nameWithOwner: 'acme/secret'` → `canonicalSlug: 'acme--secret'` covers it implicitly, but explicit coverage of an owner or repo containing a `-` or `.` would prevent regression if someone "improves" the replacement to a regex.
- The deferred slug-variant cases (`marcusrbrown--poly.draft.md`, subdirectories) are tracked in #3328. That's fine to defer, but mention them in a comment near the `.replace(/\.md$/i, '')` so the next reader knows the stem extraction is intentionally strict.
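A self-contained version of the `node: null` test, without the mocking library: the resolver below takes the GraphQL call as a parameter so a stub can return `{ data: { node: null } }`. `resolveSlugsWith` and its `query` parameter are hypothetical stand-ins for `resolveCanonicalSlugs` and the `execFileSync` call; what matters is the contract that a null node lands in the failure list and the function throws.

```typescript
// Stand-in for the gh call: takes a node ID, returns raw JSON text.
type Query = (nodeId: string) => string;

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Narrow parsed JSON from unknown without casts; anything other than
// { data: { node: { nameWithOwner: string } } } yields undefined.
function nameWithOwnerOf(raw: string): string | undefined {
  const parsed: unknown = JSON.parse(raw);
  const data = isRecord(parsed) ? parsed["data"] : undefined;
  const node = isRecord(data) ? data["node"] : undefined;
  const name = isRecord(node) ? node["nameWithOwner"] : undefined;
  return typeof name === "string" ? name : undefined;
}

export function resolveSlugsWith(nodeIds: readonly string[], query: Query): Map<string, string> {
  const resolved = new Map<string, string>();
  const failures: string[] = [];
  for (const nodeId of nodeIds) {
    let name: string | undefined;
    try {
      name = nameWithOwnerOf(query(nodeId));
    } catch {
      name = undefined; // subprocess or JSON failure: also fail closed
    }
    if (name === undefined) failures.push(nodeId);
    else resolved.set(nodeId, name.replace("/", "--"));
  }
  if (failures.length > 0) {
    throw new Error(`unresolved private entries: ${failures.join(", ")}`);
  }
  return resolved;
}
```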
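For the slug-conversion test, the cases worth pinning are owners or repos that already contain `-` or `.`. The sketch restates the `.replace('/', '--')` conversion as a standalone function (the name is hypothetical) next to a regex "improvement" that would silently change those slugs.

```typescript
// A string pattern replaces only the first "/", and nameWithOwner contains
// exactly one, so "-" and "." in either half are left untouched.
export function toCanonicalSlug(nameWithOwner: string): string {
  return nameWithOwner.replace("/", "--");
}

// Counter-example: collapsing every non-alphanumeric character also rewrites
// "-" and ".", so existing wiki filenames would stop matching.
export function regexSlug(nameWithOwner: string): string {
  return nameWithOwner.replace(/[^A-Za-z0-9]/g, "--");
}

const cases: Array<[input: string, expected: string]> = [
  ["acme/secret", "acme--secret"],
  ["my-org/repo.js", "my-org--repo.js"],
  ["a.b/c-d", "a.b--c-d"],
];
```

`regexSlug("my-org/repo.js")` returns `"my--org--repo--js"`, which is exactly the regression an explicit test would catch.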
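For the deferred slug variants, the comment could sit directly on the stem extraction. A sketch, with `wikiStem` as a hypothetical name for whatever wraps `.replace(/\.md$/i, '')`:

```typescript
// Stem extraction is intentionally strict: only a trailing ".md" is removed,
// so "owner--repo.draft.md" yields "owner--repo.draft" and does NOT match the
// slug "owner--repo". Pages inside subdirectories are not reached because the
// directory listing is not recursive. Both gaps are deferred to #3328.
export function wikiStem(filename: string): string | undefined {
  if (!/\.md$/i.test(filename)) return undefined; // non-Markdown entries are ignored
  return filename.replace(/\.md$/i, "");
}
```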
Risk assessment
LOW. This is additive defense-in-depth in front of an already-gated path. Action is SHA-pinned, permissions are minimal (file-level contents: read, app token only injected into the two steps that need it), no untrusted input lands in run: blocks, subprocess stderr is captured (no token echo). Fail-closed on every error path. The injection vector in the GraphQL query string only becomes meaningful in concert with the metadata-tampering bypass already tracked in #3327, and even then degrades to self-DoS rather than leak.
Run Summary
| Field | Value |
|---|---|
| Event | pull_request |
| Repository | fro-bot/.github |
| Run ID | 26142300188 |
| Cache | hit |
| Session | ses_1bc40437fffe70Gz0Aqpwi8QzV |
Adds a defense-in-depth check before `merge-data.yaml` opens the weekly `data → main` promotion PR: if any `knowledge/wiki/repos/` file on the `data` branch matches a private repo entry's canonical slug or `node_id`, the workflow stops.

The earlier dispatch gate (Unit 6) already prevents new private wiki pages from being written via `survey-repo.yaml`. This catches anything that slips through: a manual dispatch with a wrong input, an agent error, an out-of-band write.

What changed

- `scripts/check-wiki-private-presence.ts` (new, 141 LOC): pure `detectPrivateWikiLeaks(params)` over `(privateEntries, wikiRepoFilenames)`. CLI glue reads `metadata/repos.yaml`, resolves canonical slugs via `gh api graphql node(id:)`, lists `knowledge/wiki/repos/`, and blocks on a match. Exports `resolveCanonicalSlugs` and `loadWikiFilenames` separately so each fail mode is independently testable.
- `scripts/check-wiki-private-presence.test.ts` (new, 168 LOC): 16 tests covering happy path, slug match, `node_id` match, both reasons firing on the same entry, case-insensitive match, no-canonical-slug fallback (resolution returned null), multi-entry detection, `.md`-only filtering, fail-closed-on-GraphQL-failure with single- and multi-entry messages, and ENOENT-graceful + EPERM-propagates filesystem behavior.
- `.github/workflows/merge-data.yaml`: adds two steps before the existing `🔀 Open weekly data merge PR`:
  - `⤵ Fetch data branch for privacy check`: a second `actions/checkout` at `ref: data`, `path: data-branch-check`. The default checkout (`main`) is still needed because that's where the scripts live.
  - `🔒 Block private wiki pages`: runs `node ../scripts/check-wiki-private-presence.ts` with `working-directory: data-branch-check`, so `metadata/repos.yaml` and `knowledge/wiki/repos/` are read from the `data` branch tree being promoted, not the `main` checkout.

Fail-closed posture
This is a privacy gate. Every failure mode favors blocking the merge over letting a leak slip:
- GraphQL slug resolution fails for any private entry → lists the failing `node_id` values and exits non-zero. Operator investigates token scope or repo access before re-running.
- `fs.readdir` failure other than ENOENT → propagates and exits non-zero. ENOENT alone is graceful (fresh checkout, never had a wiki).
- Schema validation failure on `metadata/repos.yaml` → propagates and exits non-zero via `assertReposFile`.

Deferred (follow-up issues)
Three concerns surfaced during review are tracked separately because each needs a design decision rather than a code patch:
- … `private: true → false` reads the tampered state. Mitigations include sourcing the denylist from a protected branch or doing live visibility probes per entry.
- … slug variants (`marcusrbrown--poly.draft.md`), subdirectory bypass, no per-GraphQL-call timeout, non-repo wiki areas (`topics/`, `entities/`, `comparisons/`), and a workflow step-order regression test.

The current PR ships v1 of the gate. The deferred items are the v1.1 hardening pass.
Verification
- `actionlint .github/workflows/merge-data.yaml` clean.
- `pnpm check-types`, `pnpm lint`, `pnpm test`: green. 620 passed + 3 todo (+6 over baseline 614, all behavior-bearing).
- `node -e "import('./scripts/check-wiki-private-presence.ts')"` exits 0 under Node 24 strip-only.
- `node scripts/check-wiki-private-presence.ts` (no private entries on `main`) prints "no private wiki leaks detected" and exits 0.