diff --git a/.well-known/README.adoc b/.well-known/README.adoc
new file mode 100644
index 0000000..166b4ac
--- /dev/null
+++ b/.well-known/README.adoc
@@ -0,0 +1,17 @@
+// SPDX-License-Identifier: PMPL-1.0-or-later
+= .well-known/ — RFC 8615 web metadata
+:status: populated web metadata (see docs/decisions/ADR-0002-data-subdir-population.adoc)
+
+RFC 8615 well-known URIs, published at the site root when this repo is served
+via GitHub Pages. This is web-standard metadata, *not* a pipeline data sink —
+it is listed here only because issue #5 grouped it with the data dirs.
+
+Files::
+* `ai.txt` — AI interaction policy (training/summarisation/generation
+  permissions; PMPL Emotional Lineage requirement). Points agents at
+  `0-AI-MANIFEST.a2ml`.
+* `humans.txt` — humanstxt.org credits/colophon.
+* `security.txt` — RFC 9116 security disclosure contact.
+
+Writer: maintainers (hand-edited). Reader: web clients, crawlers, AI
+agents, security researchers. Edited in place; not deleted.
diff --git a/dispatch/README.adoc b/dispatch/README.adoc
new file mode 100644
index 0000000..4f93a19
--- /dev/null
+++ b/dispatch/README.adoc
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: PMPL-1.0-or-later
+= dispatch/ — scan→triage work queue
+:status: populated data store (see docs/decisions/ADR-0002-data-subdir-population.adoc)
+
+Append-only JSONL work queue emitted by the panic-attacker scan → triage
+stage. One JSON object per line; one finding per object.
+
+Files::
+* `dispatch-YYYY-MM-DD.jsonl` — the day's dispatched findings (immutable once
+  written).
+* `pending.jsonl` — findings accepted but not yet actioned by a bot.
+* `held.jsonl` — findings deferred or quarantined (e.g. low confidence,
+  manual-review required).
+
+Record shape::
+`action`, `category`, `confidence`, `auto_fixable`, `description`,
+`pattern_id`, `recipe_id`, `replacement`, `repo`, `program_path`, `severity`,
+`strategy`, `tier`, `timestamp`.
+
+Writer: the triage stage of the ingest pipeline. Reader: the autofix /
+review bots. Append-only — never rewritten in place.
diff --git a/docs/decisions/ADR-0002-data-subdir-population.adoc b/docs/decisions/ADR-0002-data-subdir-population.adoc
new file mode 100644
index 0000000..0ea8bf0
--- /dev/null
+++ b/docs/decisions/ADR-0002-data-subdir-population.adoc
@@ -0,0 +1,89 @@
+// SPDX-License-Identifier: PMPL-1.0-or-later
+// Copyright (c) 2026 Jonathan D.A. Jewell (hyperpolymath)
+= ADR-0002: data subdirectories are populated and self-documented, not removed
+:revdate: 2026-05-17
+:status: Accepted
+
+== Status
+
+Accepted — 2026-05-17.
+
+Resolves: https://github.com/hyperpolymath/verisimdb-data/issues/5[V-L3-S1].
+
+Builds on: xref:ADR-0001-repo-purpose.adoc[ADR-0001] (the two-purpose framing;
+these directories are purpose 1, the flat-file data store).
+
+== Context
+
+Issue #5 observed that `dispatch/`, `patterns/`, `recipes/`, `outcomes/`,
+`policy/`, `health/` and `.well-known/` looked like "mostly empty placeholder
+directories" and asked for a binary decision: *populate* them (one issue per
+subdir to make the contents concrete) or *delete* them and rebuild as content
+lands.
+
+That premise is now stale. ADR-0001 already declared all of these subtrees as
+the repository's flat-file data store, and the panic-attacker → triage →
+autofix pipeline has since filled every one of them with production data:
+
+* `dispatch/` — dated work-queue JSONL plus `pending.jsonl` / `held.jsonl`
+* `patterns/` — `registry.json` (the cross-repo pattern registry)
+* `recipes/` — ~46 `recipe-*.json` remediation recipes
+* `outcomes/` — monthly applied-fix ledgers plus a fleet import
+* `policy/` — `policy.ncl` (the Nickel baseline policy contract)
+* `health/` — `sitrep.txt` and `hypatia.json` operational telemetry
+* `.well-known/` — RFC 8615 web metadata (`ai.txt`, `humans.txt`,
+  `security.txt`)
+
+There are no empty placeholder directories left in the tree. The only real
+gap against the issue's acceptance criteria was the absence of a per-directory
+README explaining what each one holds.
+
+== Decision
+
+**Populate, not remove.** The data subdirectories are a permanent, declared
+part of the repository (per ADR-0001) and are now backed by real, actively
+written content. They are *not* placeholders and must not be deleted.
+
+To make the contents concrete and self-documenting — the spirit of the issue's
+"one issue per subdir" alternative, achieved in one PR — each data directory
+carries a `README.adoc` that states:
+
+. what artefact(s) it holds and their on-disk format,
+. who writes it (which pipeline stage / bot) and who reads it,
+. whether it is append-only or rewritten in place,
+. its retention / `.gitkeep` behaviour.
+
+Splitting this into seven tracking issues was rejected: the directories are
+already populated, so seven issues would each open and immediately close as
+"add a README" with no design content. One ADR plus seven READMEs records the
+decision and discharges the acceptance criteria together.
+
+== Consequences
+
+. The `dispatch/ patterns/ recipes/ outcomes/ policy/ health/ .well-known/`
+  directories each gain a `README.adoc`; no data files are moved or removed.
+. `outcomes/.gitkeep` is retained: the ledger is monthly, so the directory
+  can legitimately contain only `.gitkeep` at the start of a month, and that
+  is now documented rather than mistaken for an empty placeholder.
+. Future data directories under purpose 1 must ship with a `README.adoc`
+  from the first commit (close the "is this a placeholder?" question at
+  creation time).
+. ADR-0001's directory-layout list remains authoritative for *which* subtree
+  serves *which* purpose; this ADR documents the *contents* of the data
+  subtree.
+
+== Alternatives considered
+
+Delete the directories and rebuild as content lands::
+Rejected. The content has already landed and is being written continuously by
+the ingest/triage/autofix pipeline; deleting live data sinks would break that
+pipeline and lose history. This alternative only made sense under the (now
+false) assumption that the directories were empty.
+
+One tracking issue per subdirectory::
+Rejected as busywork. The directories are populated; each issue would reduce
+to "add a README." Folded into this single ADR + the seven READMEs instead.
+
+Move telemetry / web metadata out of the data store::
+Out of scope. `health/` and `.well-known/` are small and co-located by
+design; relocating them is a separate decision if either grows.
diff --git a/health/README.adoc b/health/README.adoc
new file mode 100644
index 0000000..fe38e54
--- /dev/null
+++ b/health/README.adoc
@@ -0,0 +1,17 @@
+// SPDX-License-Identifier: PMPL-1.0-or-later
+= health/ — operational telemetry
+:status: populated data store (see docs/decisions/ADR-0002-data-subdir-population.adoc)
+
+Latest-state operational telemetry for the data store and its pipeline.
+These are *snapshots* (rewritten in place each run), not append-only logs —
+history lives in `outcomes/` and `dispatch/`.
+
+Files::
+* `sitrep.txt` — human-readable situation report: last-run timestamp,
+  counts (scans / patterns / actions / outcomes / recipes), the
+  auto/review/report split, process up/down, contingency status, run time.
+* `hypatia.json` — machine-readable health snapshot consumed by the
+  Hypatia scanner / Scorecard surface.
+
+Writer: the pipeline's reporting stage. Reader: humans (sitrep) and the
+Hypatia / health-monitoring tooling (json).
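The dispatch README defines a one-finding-per-line JSONL record. The policy README says that triage uses two confidence thresholds to choose between auto-fix, review, and report. A minimal Python sketch of a consumer that reads such a queue might look like the following. It is illustrative only, and it rests on several assumptions. The threshold values are hypothetical placeholders, because the real ones live in `policy/policy.ncl` and are not shown here. `confidence` is assumed to be a number between 0 and 1. `read_dispatch` and `route` are made-up names, not part of the pipeline.

```python
import json
from typing import Iterable, Iterator

# Field names as listed under "Record shape" in dispatch/README.adoc.
DISPATCH_FIELDS = frozenset({
    "action", "category", "confidence", "auto_fixable", "description",
    "pattern_id", "recipe_id", "replacement", "repo", "program_path",
    "severity", "strategy", "tier", "timestamp",
})

# Hypothetical stand-ins for the `triage` thresholds in policy/policy.ncl
# (auto_fix_min_confidence, review_min_confidence); real values not shown.
AUTO_FIX_MIN_CONFIDENCE = 0.9
REVIEW_MIN_CONFIDENCE = 0.6


def read_dispatch(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSONL: one JSON object (one finding) per non-blank line."""
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        missing = DISPATCH_FIELDS.difference(record)
        if missing:
            raise ValueError(f"line {lineno}: missing {sorted(missing)}")
        yield record


def route(record: dict) -> str:
    """Pick auto-fix vs. review vs. report (confidence assumed in 0..1)."""
    confidence = float(record["confidence"])
    if record["auto_fixable"] and confidence >= AUTO_FIX_MIN_CONFIDENCE:
        return "auto-fix"
    if confidence >= REVIEW_MIN_CONFIDENCE:
        return "review"
    return "report"
```

Rejecting records with missing fields at read time stops a malformed line from silently reaching a bot. Whether the real pipeline validates this strictly is not stated in the READMEs.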
diff --git a/outcomes/README.adoc b/outcomes/README.adoc
new file mode 100644
index 0000000..6c7675c
--- /dev/null
+++ b/outcomes/README.adoc
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: PMPL-1.0-or-later
+= outcomes/ — applied-fix outcome ledger
+:status: populated data store (see docs/decisions/ADR-0002-data-subdir-population.adoc)
+
+Append-only JSONL ledger of what happened when a recipe was applied to a
+repo. Feeds the recipe success/fail counters and health reporting.
+
+Files::
+* `YYYY-MM.jsonl` — one month of outcomes (immutable once the month closes).
+* `*-fleet-import.jsonl` — bulk historical imports.
+* `.gitkeep` — retained on purpose: the ledger is monthly, so at the start
+  of a month this directory can legitimately hold only `.gitkeep`. That is
+  the expected steady state, *not* an empty placeholder (see ADR-0002).
+
+Record shape::
+`pattern`, `repo`, `bot`, `outcome` (`success` / failure), `fixed_at`.
+
+Writer: the autofix/review bots after each attempt. Reader: recipe
+counter rollup and `health/`. Append-only.
diff --git a/patterns/README.adoc b/patterns/README.adoc
new file mode 100644
index 0000000..ba4896d
--- /dev/null
+++ b/patterns/README.adoc
@@ -0,0 +1,16 @@
+// SPDX-License-Identifier: PMPL-1.0-or-later
+= patterns/ — canonical cross-repo pattern registry
+:status: populated data store (see docs/decisions/ADR-0002-data-subdir-population.adoc)
+
+`registry.json` — the single canonical pattern registry. It deduplicates
+raw findings across every scanned repo into trackable, long-lived patterns.
+
+Shape::
+Top-level `description`, `last_updated`, and a `patterns` map keyed by
+pattern id. Each entry: `id`, `category`, `description`, `pa_rule`,
+`occurrences`, `first_seen`, `last_seen`, `recipe_id` (the remediation
+recipe, or `null` if none yet), `repo_paths`.
+
+Writer: the pattern-aggregation stage of the ingest pipeline (rewritten in
+place on each run — it is a registry, not a log). Reader: triage (to attach
+`recipe_id`) and reporting/health.
diff --git a/policy/README.adoc b/policy/README.adoc
new file mode 100644
index 0000000..ba1f098
--- /dev/null
+++ b/policy/README.adoc
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: PMPL-1.0-or-later
+= policy/ — baseline policy contract
+:status: populated data store (see docs/decisions/ADR-0002-data-subdir-population.adoc)
+
+`policy.ncl` — the Nickel baseline policy contract for this data store.
+
+Contents::
+* `enforcement` — repo hygiene flags (`require_spdx_headers`,
+  `require_ci_security_checks`, `block_committed_secrets`,
+  `require_pinned_actions`).
+* `triage` — confidence thresholds: `auto_fix_min_confidence`,
+  `review_min_confidence`.
+* `version` — policy schema version.
+
+Writer: maintainers (hand-edited; version-bumped on change). Reader:
+`contractiles/must/Mustfile` consumes it to gate enforcement, and triage
+reads the confidence thresholds to choose auto-fix vs. review vs. report.
+Single file, edited in place.
diff --git a/recipes/README.adoc b/recipes/README.adoc
new file mode 100644
index 0000000..f64f9ae
--- /dev/null
+++ b/recipes/README.adoc
@@ -0,0 +1,20 @@
+// SPDX-License-Identifier: PMPL-1.0-or-later
+= recipes/ — remediation recipe knowledge base
+:status: populated data store (see docs/decisions/ADR-0002-data-subdir-population.adoc)
+
+One `recipe-*.json` file per remediation recipe — the knowledge base the
+autofix bot consults to turn a matched pattern into a fix.
+
+Shape (per file)::
+`id`, `action`, `description`, `confidence`, `auto_fixable`, `fix_script`
+(or `null`), `languages`, `match`, `replacement`, `pattern_ids` (the
+patterns this recipe remediates), `proven_module`, `triangle_tier`
+(eliminate / substitute / …), and running counters
+`total_attempts` / `successful_fixes` / `failed_fixes`.
+
+Naming::
+`recipe-.json`; the `recipe-scorecard-*` family maps to OpenSSF
+Scorecard checks.
+
+Writer: curated, plus counter updates from outcome ingest. Reader: the
+autofix bot and triage (recipe lookup by `pattern_id`).
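The recipes and outcomes READMEs describe running counters on each recipe, which are updated as outcome records are ingested. A hedged Python sketch of that rollup follows. It makes two assumptions. First, it assumes that an outcome's `pattern` field holds a pattern id that can be matched against a recipe's `pattern_ids`. Second, it counts any `outcome` other than `success` as a failure, because the README does not name the failure value. `apply_outcomes` is a hypothetical name, not the pipeline's actual ingest code.

```python
import json
from collections import Counter
from typing import Iterable


def apply_outcomes(recipes: list[dict], outcome_lines: Iterable[str]) -> list[dict]:
    """Return copies of `recipes` with counters bumped by new outcome records.

    Assumes outcome `pattern` values match entries in recipe `pattern_ids`,
    and treats any `outcome` other than "success" as a failed fix.
    """
    # Index: pattern id -> ids of the recipes that remediate it.
    by_pattern: dict[str, list[str]] = {}
    for recipe in recipes:
        for pid in recipe.get("pattern_ids") or []:
            by_pattern.setdefault(pid, []).append(recipe["id"])

    attempts, successes = Counter(), Counter()
    for line in outcome_lines:
        if not line.strip():
            continue  # skip blank lines in the JSONL ledger
        record = json.loads(line)
        for rid in by_pattern.get(record["pattern"], []):
            attempts[rid] += 1
            if record["outcome"] == "success":
                successes[rid] += 1

    # Build new dicts so the caller's recipe objects are left untouched.
    return [
        {
            **r,
            "total_attempts": r.get("total_attempts", 0) + attempts[r["id"]],
            "successful_fixes": r.get("successful_fixes", 0) + successes[r["id"]],
            "failed_fixes": r.get("failed_fixes", 0)
            + attempts[r["id"]] - successes[r["id"]],
        }
        for r in recipes
    ]
```

This sketch adds to the existing counters rather than recomputing them, so it should only be given outcome lines that have not already been ingested. If double counting is a concern, recomputing the counters from the full `outcomes/` ledger is the alternative.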