Skip to content

feat(privacy): block merge-data when private wiki pages exist#3329

Merged
marcusrbrown merged 1 commit into
mainfrom
feat/private-merge-data-wiki-block
May 20, 2026
Merged

feat(privacy): block merge-data when private wiki pages exist#3329
marcusrbrown merged 1 commit into
mainfrom
feat/private-merge-data-wiki-block

Conversation

@marcusrbrown
Copy link
Copy Markdown
Collaborator

Adds a defense-in-depth check before merge-data.yaml opens the weekly data → main promotion PR: if any knowledge/wiki/repos/ file on the data branch matches a private repo entry's canonical slug or node_id, the workflow stops.

The earlier dispatch gate (Unit 6) already prevents new private wiki pages from being written via survey-repo.yaml. This catches anything that slips through — a manual dispatch with a wrong input, an agent error, an out-of-band write.

What changed

scripts/check-wiki-private-presence.ts (new, 141 LOC) — pure detectPrivateWikiLeaks(params) over (privateEntries, wikiRepoFilenames). CLI glue reads metadata/repos.yaml, resolves canonical slugs via gh api graphql node(id:), lists knowledge/wiki/repos/, blocks on match. Exports resolveCanonicalSlugs and loadWikiFilenames separately so each fail-mode is independently testable.

scripts/check-wiki-private-presence.test.ts (new, 168 LOC) — 16 tests covering happy path, slug match, node_id match, both reasons firing on the same entry, case-insensitive match, no-canonical-slug fallback (resolution returned null), multi-entry detection, .md-only filtering, fail-closed-on-GraphQL-failure with single and multi-entry messages, ENOENT-graceful + EPERM-propagates filesystem behavior.

.github/workflows/merge-data.yaml — adds two steps before the existing 🔀 Open weekly data merge PR:

  1. ⤵ Fetch data branch for privacy check — second actions/checkout at ref: data, path: data-branch-check. The default checkout (main) is still needed because that's where the scripts live.
  2. 🔒 Block private wiki pages — runs node ../scripts/check-wiki-private-presence.ts with working-directory: data-branch-check so metadata/repos.yaml and knowledge/wiki/repos/ are read from the data branch tree being promoted, not the main checkout.

Fail-closed posture

This is a privacy gate. Every failure mode favors blocking the merge over letting a leak slip:

  • GraphQL resolution failure for any private entry → throws with all failing node_id values, exits non-zero. Operator investigates token scope or repo access before re-running.
  • fs.readdir failure other than ENOENT → propagates and exits non-zero. ENOENT only is graceful (fresh checkout, never had a wiki).
  • YAML/schema validation failure on metadata/repos.yaml → propagates and exits non-zero via assertReposFile.
  • No private entries → exits 0 immediately, no GraphQL or FS calls wasted.

Deferred (follow-up issues)

Three concerns surfaced during review are tracked separately because each needs a design decision rather than a code patch:

The current PR ships the v1 of the gate. The deferred items are the v1.1 hardening pass.

Verification

  • actionlint .github/workflows/merge-data.yaml clean.
  • pnpm check-types, pnpm lint, pnpm test — green. 620 passed + 3 todo (+6 over baseline 614, all behavior-bearing).
  • node -e "import('./scripts/check-wiki-private-presence.ts')" exits 0 under Node 24 strip-only.
  • node scripts/check-wiki-private-presence.ts (no private entries on main) prints "no private wiki leaks detected", exits 0.
  • Reviewed via persona dispatch (9 reviewers parallel, then 1 adversarial round-2 against the fixes). Round 2 found zero new issues against the dual-checkout + fail-closed implementation.

Adds a defense-in-depth check before the merge-data PR is created: if
knowledge/wiki/repos/ contains a file whose name matches a private repo
entry's canonical slug or node_id, the workflow stops. Unit 6's dispatch
gate already prevents new private wiki pages from being written; this
catches anything that slips through.

The workflow checks the data branch (not main) via a second checkout step
scoped to data-branch-check/, so the gate inspects the tree being promoted
rather than the already-merged destination.

GraphQL slug-resolution failures now fail closed: if any private entry's
canonical name cannot be resolved, the script exits 1 with the failing
node_ids listed. Relying on node_id-only matching when the canonical name
is unrecoverable would let a slug-named leak slip through undetected.

fs.readdir errors other than ENOENT now propagate. ENOENT is graceful
(fresh checkout, no wiki yet); permission errors and other FS failures
are not and must not be silently swallowed.

GraphQL response parsed into unknown and narrowed via type-guard; no
unsafe casts at the JSON boundary.

Pure detectPrivateWikiLeaks function with full unit coverage (16 tests);
resolveCanonicalSlugs and loadWikiFilenames exported for direct testing.
CLI is thin glue.
Copy link
Copy Markdown
Owner

@fro-bot fro-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dual-checkout pattern is the right move here — reading metadata/repos.yaml and knowledge/wiki/repos/ from the data tree being promoted (not main) is what makes this gate actually evaluate the merge content. Fail-closed posture is consistent: GraphQL failure throws, non-ENOENT FS errors propagate, schema validation propagates, and the only graceful path (no private entries → exit 0) is the one where there's nothing to leak. Tests cover the fail-closed paths individually, which is what matters for a privacy gate.

Verdict: PASS

Blocking issues

None.

Non-blocking concerns

  • scripts/check-wiki-private-presence.ts:114node_id is interpolated directly into the GraphQL query string: query={ node(id: "${entry.node_id}") ... }. The value flows from metadata/repos.yaml on the data branch, which a PR author could tamper with (already tracked as #3327). A crafted node_id containing " or ) would produce malformed GraphQL, which the API rejects → script fails closed. So the worst case is a self-DoS of the gate, not exfiltration. Still worth tightening to a parameterized form (gh api graphql -F nodeId=... with $nodeId: ID!) when you tackle #3327 — eliminates the injection surface entirely and gets clearer error messages.

  • scripts/check-wiki-private-presence.ts:124-127 — when GraphQL returns data.node: null (deleted repo, revoked app access), the type guard correctly rejects and the entry lands in failures. The thrown error message then says "investigate token scope / repo access," which is helpful, but operators may not realize the null branch is distinct from the thrown-error branch. Consider distinguishing "resolution returned null" from "subprocess threw" in the failures list — one is a permission/lifecycle issue, the other is infra.

  • scripts/check-wiki-private-presence.ts:67stem.toLowerCase() === entry.canonicalSlug.toLowerCase() matches the test for MarcusRBrown--Poly.md, but the canonical wiki convention (per the wiki context) is the lowercased owner--repo form. If the input slug from nameWithOwner ever contains mixed case (it can, GitHub preserves case in nameWithOwner), you'll want to verify behavior end-to-end. The case-insensitive comparison handles it, just flagging that the resolved canonicalSlug is not normalized to lowercase before storage in resolved.

  • .github/workflows/merge-data.yaml:31-41 — the second checkout fetches ref: data without specifying a token, so it falls back to GITHUB_TOKEN (job-default, contents: read). That works because data is in the same repo and readable with default perms. Worth a comment in the workflow noting why both checkouts exist (scripts from main, content from data) — future maintainers will wonder.

Missing tests

  • No test for the node: null GraphQL response shape (only thrown-error and well-formed responses are covered). The type guard rejects it, but a direct test would lock in the contract: a null node is treated as a failure, not a silent skip. Add a mockExecFileSync.mockReturnValue(JSON.stringify({data: {node: null}})) case asserting resolveCanonicalSlugs throws.

  • No test for the nameWithOwnerowner--repo slug conversion (the .replace('/', '--') on line 122). A single-test for nameWithOwner: 'acme/secret'canonicalSlug: 'acme--secret' covers it implicitly, but explicit coverage of an owner or repo containing a - or . would prevent regression if someone "improves" the replacement to a regex.

  • The deferred slug-variant cases (marcusrbrown--poly.draft.md, subdirectories) are tracked in #3328 — fine to defer, but mention them in a comment near the .replace(/\.md$/i, '') so the next reader knows the stem extraction is intentionally strict.

Risk assessment

LOW. This is additive defense-in-depth in front of an already-gated path. Action is SHA-pinned, permissions are minimal (file-level contents: read, app token only injected into the two steps that need it), no untrusted input lands in run: blocks, subprocess stderr is captured (no token echo). Fail-closed on every error path. The injection vector in the GraphQL query string only becomes meaningful in concert with the metadata-tampering bypass already tracked in #3327, and even then degrades to self-DoS rather than leak.


Run Summary
Field Value
Event pull_request
Repository fro-bot/.github
Run ID 26142300188
Cache hit
Session ses_1bc40437fffe70Gz0Aqpwi8QzV

@marcusrbrown marcusrbrown merged commit a53637e into main May 20, 2026
12 checks passed
@marcusrbrown marcusrbrown deleted the feat/private-merge-data-wiki-block branch May 20, 2026 05:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants