Phase-1 Track B of the AI-discoverability plan: implement the two
scripts that turn `profile/tools.json` from a hand-curated file into a
generated artifact, plus the catalog-level validator that enforces the
post-P1-A contract.
profile/build/validate-catalog.py
* Validates `profile/tools.json` against `tools.schema.json` and
`profile/task_index.json` against `task_index.schema.json`, both
via Draft202012Validator from jsonschema.
* Asserts no data-key collision between the two top-level shapes —
a future-proofing guard: after P1-A the two documents share no
top-level data key. The five meta-keys ($schema / schema_compat /
schema_version / kind / _comment) are expected on both and skipped.
* argparse: --tools (default profile/tools.json),
--task-index (default profile/task_index.json).
* Returns 0 on success; non-zero with structured stderr on failure.
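A minimal sketch of the collision guard and CLI shape described above, using only the standard library. It is not the script's code: the helper name `data_key_collisions` and the specific exit codes (1 for a collision, 2 for an unreadable file) are assumptions for illustration. The real validator also runs `Draft202012Validator` from `jsonschema` against both schemas; that step is omitted here so the sketch runs without third-party packages.

```python
"""Hypothetical sketch of validate-catalog.py's collision guard and CLI.

Schema validation via jsonschema's Draft202012Validator is omitted so the
sketch runs on the standard library alone.
"""
import argparse
import json
import sys
from pathlib import Path

# Meta-keys expected on both documents; skipped by the collision check.
META_KEYS = {"$schema", "schema_compat", "schema_version", "kind", "_comment"}


def data_key_collisions(tools: dict, task_index: dict) -> set:
    """Return top-level data keys (non-meta) present in both documents."""
    return (set(tools) & set(task_index)) - META_KEYS


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Validate the org catalog.")
    parser.add_argument("--tools", default="profile/tools.json")
    parser.add_argument("--task-index", default="profile/task_index.json")
    args = parser.parse_args(argv)
    try:
        tools = json.loads(Path(args.tools).read_text(encoding="utf-8"))
        task_index = json.loads(Path(args.task_index).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        # Assumed convention: one-line error on stderr, exit code 2.
        print(f"error: cannot load catalog: {exc}", file=sys.stderr)
        return 2
    collisions = data_key_collisions(tools, task_index)
    if collisions:
        # Assumed convention: exit code 1 for a contract violation.
        print(f"error: data-key collision: {sorted(collisions)}", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Any non-empty intersection outside the five meta-keys is a violation of the post-P1-A invariant that the two documents share no top-level data key.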
profile/build/build-catalog.py
* TIER_1 + TIER_2 constants: the six onboarded repos' raw-GitHub
repo.meta.json URLs (m-cli, m-stdlib, m-standard, tree-sitter-m,
m-test-engine, m-modern-corpus).
* For each manifest: fetch → validate against repo.meta.schema.json
(reusing validate-repo-meta.py's logic so the schema-check path
stays in one place) → translate to a tools.<key> summary entry.
* Translation: id / repo / role / language / license /
agent_instructions / verified_on / status (default "active") /
repo_meta_url straight from the manifest; each exposes.<kind>
becomes <kind>_url with the URL resolved against the repo's
main-branch raw prefix; consumes / consumed_by passed through.
* Top-level narrative ($schema / schema_compat / schema_version /
kind / description / org / workflow / discovery_protocol) is
copied verbatim from the prior tools.json so we don't lose
hand-curated content. task_index is NOT emitted — it stays in
its own file post-P1-A.
* --write PATH (default stdout), --prior PATH (default the
committed tools.json), --no-network (dry-run framing only),
--urls (override TIER_1+TIER_2).
* Deterministic: sorted keys, 2-space indent, trailing newline,
ensure_ascii=False (em dashes pass through). Running twice
against the same input produces byte-identical output.
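The translation rules above can be sketched as a single function. This is a hypothetical illustration rather than the script's code: it assumes `repo` is an `org/name` slug, that `exposes` maps each kind to a repo-relative path, and that the default branch is `main`; the function name `translate` is invented here.

```python
"""Hypothetical sketch of the manifest -> tools.<key> translation.

Assumptions (not confirmed by the PR): `repo` is an "org/name" slug,
`exposes` maps kind -> repo-relative path, default branch is "main".
"""

BRANCH = "main"  # assumption: main-branch raw prefix
PASSTHROUGH = ("id", "repo", "role", "license", "verified_on",
               "repo_meta_url", "consumes", "consumed_by")


def translate(manifest: dict) -> tuple[str, dict]:
    """Return (tools key, summary entry) for one repo.meta.json payload."""
    repo = manifest["repo"]
    raw_prefix = f"https://raw.githubusercontent.com/{repo}/{BRANCH}/"
    entry = {k: manifest[k] for k in PASSTHROUGH if k in manifest}
    entry["status"] = manifest.get("status", "active")

    # Single-element language arrays collapse to a string; longer ones stay lists.
    language = manifest.get("language")
    if isinstance(language, list) and len(language) == 1:
        language = language[0]
    if language is not None:
        entry["language"] = language

    # agent_instructions becomes a browsable github.com blob URL.
    if "agent_instructions" in manifest:
        entry["agent_instructions"] = (
            f"https://github.com/{repo}/blob/{BRANCH}/{manifest['agent_instructions']}")

    # Every exposes.<kind> becomes <kind>_url; there is no allow-list of kinds.
    for kind, path in manifest.get("exposes", {}).items():
        entry[f"{kind}_url"] = raw_prefix + path

    # The tools key drops the "tool:" prefix; the entry id keeps it.
    key = manifest["id"].removeprefix("tool:")
    return key, entry
```

Because `exposes` is iterated without an allow-list, a new kind such as `plugins` yields `plugins_url` with no code change, which is the behavior the extra-exposes-kind fixture checks.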
profile/build/test_validate_catalog.py (10 tests)
* Baseline pair validates clean (smoke).
* Unknown top-level keys fail under additionalProperties: false
(covers task_index re-inlined, generic unknown key, inlined-facts
block in a tool entry, surprise field in task_index).
* Malformed typed IDs fail under the typedID regex (in primary
and in see_also).
* Missing required field in a tool entry fails.
* Missing file path reports a clean error.
* main(argv) exits 0 on the committed baseline.
profile/build/test_build_catalog.py (12 tests)
* Three synthetic repo.meta.json payloads — minimal, rich,
extra-exposes-kind — exercise the manifest → tools entry
translation surface end-to-end.
* Build emits all required top-level keys.
* Build never emits task_index (post-P1-A contract).
* Build preserves hand-curated top-level from prior_tools.
* Minimal meta → summary entry; single-element language collapses
to string; agent_instructions resolves to a github.com/.../blob/
URL; manifest_url derives from exposes.manifest.
* Rich meta → multi-language array stays as array; multiple
exposes become multiple *_url pointers.
* Extra-exposes-kind passes through (no hardcoded allow-list).
* Tools key strips the `tool:` prefix from id.
* Generated output validates against tools.schema.json.
* dumps() is deterministic across input-order shuffles; emits
trailing newline and sorted keys.
* Two-run determinism (B5).
* Invalid manifest raises a clear error rather than emitting a
malformed entry.
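The determinism properties tested above (sorted keys, 2-space indent, trailing newline, `ensure_ascii=False`) map directly onto `json.dumps` arguments. A sketch, assuming a `dumps` helper shaped like the one the tests name; `shuffled` is a hypothetical test helper that rebuilds dicts with a random insertion order.

```python
import json
import random


def dumps(catalog: dict) -> str:
    """Serialize deterministically: sorted keys, 2-space indent, trailing newline.

    ensure_ascii=False lets non-ASCII text (e.g. em dashes) pass through unescaped.
    """
    return json.dumps(catalog, indent=2, sort_keys=True, ensure_ascii=False) + "\n"


def shuffled(d: dict, seed: int) -> dict:
    """Hypothetical helper: rebuild a dict (recursively) in shuffled key order."""
    items = list(d.items())
    random.Random(seed).shuffle(items)
    return {k: shuffled(v, seed) if isinstance(v, dict) else v for k, v in items}
```

`sort_keys=True` only orders object keys. List order still comes from the input, so arrays such as a multi-language `language` value must already be in a stable order for two runs to match.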
Verification
* pytest profile/build/ → 22 passed (10 + 12).
* python3 profile/build/validate-catalog.py → exits 0 against the
committed baseline.
* python3 profile/build/build-catalog.py > generated.json runs
twice → byte-identical output (B5 determinism).
* Generated output validates clean against tools.schema.json.
* make phase0-smoke → PASS (manifests unchanged).
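The two-run check in the verification list can be automated with `subprocess`. A sketch: `runs_identically` is a hypothetical helper, and the `__main__` block assumes it is invoked from the repository root. Since the default build fetches live manifests, an upstream push between the two runs would also register as a difference.

```python
import subprocess
import sys


def runs_identically(cmd: list[str], runs: int = 2) -> bool:
    """Run `cmd` several times; True if every run's stdout is byte-identical."""
    outputs = [subprocess.run(cmd, capture_output=True, check=True).stdout
               for _ in range(runs)]
    return all(out == outputs[0] for out in outputs)


if __name__ == "__main__":
    # Assumes the repo root as working directory (same path as the PR's check).
    ok = runs_identically([sys.executable, "profile/build/build-catalog.py"])
    sys.exit(0 if ok else 1)
```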
Deferred to P1-D
* Makefile `catalog` + `validate-catalog` targets — P1-D's job.
* CI workflow `make catalog && git diff --exit-code` drift gate
and `make validate-catalog` step — P1-D's job.
* The drift between this branch's generator output and the
committed `profile/tools.json` is intentional and surfaces
exactly what the P1-D drift gate will need to address — see
PR body for the catalogued differences.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rafael5 added a commit that referenced this pull request on May 11, 2026:
Coordinated companion to the three tier-3 onboarding PRs that landed in parallel today:
* tree-sitter-m-vscode #3 (squash-merge 1251518)
* m-stdlib-vscode #2 (squash-merge e92f660)
* m-cli-extras #2 (squash-merge 6e6fccf)

Each entry gains Phase-0 pointer URLs:
* repo_meta_url → dist/repo.meta.json
* tree-sitter-m-vscode: + extension_info_url, package_json_url, language_configuration_url
* m-stdlib-vscode: + extension_info_url, package_json_url
* m-cli-extras: + plugins_url

Also drops the placeholder "not yet onboarded" notes lines and fixes three stale field values surfaced by the onboarding PRs:
* tree-sitter-m-vscode license: AGPL-3.0 → MIT (matches package.json)
* tree-sitter-m-vscode agent_instructions: CLAUDE.md → AGENTS.md
* m-stdlib-vscode agent_instructions: README.md placeholder → AGENTS.md

Phase 2 (tier-2) and tier-3 onboardings are now both COMPLETE: every non-archived repo in the org-catalog carries the Phase-0 contract. Mechanical pickup will happen via P1-B's build-catalog.py (PR #12, open for review). make phase0-smoke still passes; tools.json validates against tools.schema.json (P1-A's strict shape).
…s PR

The three tier-3 repos (tree-sitter-m-vscode, m-stdlib-vscode, m-cli-extras) all shipped dist/repo.meta.json today (PRs #3, #2, #2 in their respective repos; the org-side companion .github PR #13 is merged).

Adds TIER_3 = [...] alongside TIER_1 + TIER_2 and defaults the URL list to TIER_1 + TIER_2 + TIER_3, so build-catalog covers all nine manifest-bearing org repos. Without this commit the regenerated catalog would silently drop the three tier-3 entries, which would look like a drift-vs-committed bug.

Tests unchanged (still 22 green). The local diff against the committed tools.json now shows the four real semantic gaps that P1-D will need to address:
* m-tools archived-entry handling
* consumed_by inverse-edge computation
* m-stdlib manifest_url/modules_url naming
* additive licenses_url/pyproject_toml_url payload pointers
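Of these gaps, `consumed_by` is the one that can be derived mechanically: invert the `consumes` edges across all entries (option (a) in the PR description's drift notes). A sketch, assuming `consumes` holds the typed `id` strings of other entries (e.g. `tool:m-stdlib`); the function name is hypothetical, and targets with no entry in the catalog are simply not emitted.

```python
from collections import defaultdict


def compute_consumed_by(entries: dict[str, dict]) -> dict[str, list[str]]:
    """Invert `consumes` edges: if A consumes B, then B is consumed_by A.

    `entries` maps tools key -> entry. Assumes edges are typed IDs such as
    "tool:m-stdlib" (hypothetical format). Lists are sorted so the output
    stays deterministic.
    """
    inverse = defaultdict(set)
    for entry in entries.values():
        for target in entry.get("consumes", []):
            inverse[target].add(entry["id"])
    # .get() on a defaultdict does not insert; entries nobody consumes get [].
    return {entry["id"]: sorted(inverse.get(entry["id"], ()))
            for entry in entries.values()}
```

Computing the inverse in the generator keeps each manifest responsible only for its own outgoing edges.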
rafael5 added a commit that referenced this pull request on May 11, 2026:
Phase 3 launch state captured. Both upstream blockers from §0 are closed:
* Phase 1 (org routing layer) CLOSED 2026-05-10: A/B/C/D all merged (PRs #10/#11/#12/#16); make catalog + make validate-catalog green in CI; make catalog byte-idempotent against origin/main.
* Phase 2 (tier-2 + tier-3 manifests) CLOSED 2026-05-10: all 3 tier-2 + all 3 tier-3 repos onboarded the same day; tools.json carries 9 manifest-bearing entries; the archived m-tools holdout is rehosted under docs/history/ via PR #17.

§0 status column refreshed; verification commands inlined so any future session can re-confirm the launch state without spelunking git history. Recipe 7's MCP-server soft dependency is noted as a Phase-4 follow-up but is not gating.

Pure documentation change; no plan-structure edits beyond §0. Tracks A → B+C+D → E and stage matrices unchanged.
Summary
Phase-1 Track B of the AI-discoverability plan: ship the two scripts that turn `profile/tools.json` from hand-curated to generated, plus the catalog-level validator that enforces the post-P1-A contract.

* `profile/build/validate-catalog.py`: strict Draft 2020-12 validation of `tools.json` against `tools.schema.json` and `task_index.json` against `task_index.schema.json`, plus a key-collision guard between the two top-level shapes (post-P1-A invariant).
* `profile/build/build-catalog.py`: fetches each of the six onboarded repos' `dist/repo.meta.json`, validates it, translates it to a summary `tools.<key>` entry (one `*_url` per `exposes.<kind>`), and emits a deterministic `tools.json` carrying the hand-curated `org` / `workflow` / `discovery_protocol` narrative from the prior file.

TDD coverage
`test_validate_catalog.py` (10 tests)
* Unknown top-level key in `tools.json` fails under `additionalProperties: false` (covers re-inlined `task_index`, generic unknown key, inlined-facts block in a tool entry).
* Unknown key in `task_index.json` fails.
* Malformed typed ID in `primary` fails the typedID regex.
* Malformed typed ID in `see_also` fails.
* `main(argv)` exits 0 against the committed baseline.

`test_build_catalog.py` (12 tests)
* Three synthetic `repo.meta.json` fixtures (minimal, rich, extra-exposes-kind) exercise the translation surface.
* `task_index` is NOT emitted; hand-curated `org` / `workflow` / `discovery_protocol` / `description` are carried verbatim.
* Single-element `language` arrays collapse to strings; multi-element arrays stay as arrays.
* `exposes.<kind>` → `<kind>_url`, resolved against the manifest's repo-root raw URL.
* `agent_instructions` resolves to a `github.com/.../blob/<branch>/...` URL.
* The tools key strips the `tool:` prefix; the entry `id` keeps it.
* Generated output validates against `tools.schema.json`.
* `dumps()` is deterministic across input-order shuffles; sorted keys; trailing newline.

Verification (all green)
Drift vs. committed `profile/tools.json` (what P1-D's drift gate will need to address)

Running `build-catalog.py` against the live TIER_1 + TIER_2 manifests produces a `tools.json` that diverges from the committed hand-curated baseline in four categorical ways. This is expected: the build script's job is to produce a regenerated version, and the drift is exactly what the P1-D drift gate (`make catalog && git diff --exit-code`) will need to reconcile. It is documented here so P1-D's reviewer can plan for it.

1. Four hand-curated entries dropped (no `dist/repo.meta.json` on any of them):
   * `m-cli-extras`: tier-3, no manifest.
   * `m-stdlib-vscode`: tier-3, no manifest.
   * `tree-sitter-m-vscode`: tier-3, no manifest.
   * `m-tools`: archived seed repo.

   These need a decision in P1-D: either (a) keep them as hand-merged additions overlaid on the generator output, (b) onboard them to the Phase-0 contract first (T3-* tasks per `current-state-inventory-priority.md` §3.2), or (c) drop them and add a top-level `unonboarded` list elsewhere.

2. `consumed_by` lost on all 6 onboarded tools. `repo.meta.schema.json` has `consumes` but not `consumed_by`; the inverse edge is currently hand-maintained in `tools.json` only. P1-D options: (a) compute `consumed_by` in `build-catalog.py` from the inverse `consumes` graph, or (b) extend `repo.meta.schema.json` to allow `consumed_by` (less clean: a repo shouldn't have to track who consumes it).

3. `role` text drifts on all 6 tools. The manifests (post-Phase-0) have shorter role strings; the committed `tools.json` carries the original longer descriptions:
   * `m-cli`: "Canonical CLI toolchain — m fmt / lint / test / coverage / watch / lsp / doc / new / ..." → "Canonical M CLI — fmt / lint / test / coverage / watch / lsp / doc / new"
   * `m-stdlib`: "Pure-M (and selectively $ZF-bound) runtime standard library — STD* modules" → "Pure-M runtime standard library — STD* modules"
   * `m-standard`: "Citable, machine-readable M language reference reconciling AnnoStd / YottaDB / IRIS / VA SAC" → "Machine-readable M language reference"
   * `m-modern-corpus`, `m-test-engine`, `tree-sitter-m`: similar drift.

   The manifest is canonical going forward; P1-D should let the generator overwrite.
4. `exposes` key naming drift (one rename and one addition):
   * `m-stdlib`: the manifest exposes `modules` (→ `modules_url`); the committed baseline calls the pointer `manifest_url`. The manifest is canonical.
   * `m-modern-corpus`: the manifest exposes `licenses` (→ `licenses_url`); the committed baseline doesn't carry it. The generator picks it up additively.

Deferred to P1-D (per the plan and the task brief)
* Makefile targets: `catalog` and a strict `validate-catalog` replacing the current `python -m json.tool` parse-only target.
* CI: `make catalog && git diff --exit-code profile/tools.json profile/task_index.json` drift gate, plus the `make validate-catalog` step.
* Drift reconciliation (… `tools.json` from the generator output and reconcile, or extend the generator's surface, or both).

Files added
* `profile/build/build-catalog.py` (executable, 274 lines)
* `profile/build/validate-catalog.py` (executable, 128 lines)
* `profile/build/test_build_catalog.py` (236 lines, 12 tests)
* `profile/build/test_validate_catalog.py` (135 lines, 10 tests)

No existing files modified: Track-B work is purely additive in `profile/build/`. `profile/tools.json` and `profile/task_index.json` are untouched (the drift question is P1-D's call, not this PR's).

Test plan
* `pytest profile/build/test_validate_catalog.py` and `pytest profile/build/test_build_catalog.py` green.
* `python3 profile/build/validate-catalog.py` exits 0 against the committed `tools.json` + `task_index.json`.
* `python3 profile/build/build-catalog.py | python3 -m json.tool >/dev/null` parses.
* Running `build-catalog.py` twice produces byte-identical output (B5 determinism).
* `make phase0-smoke` still green (manifests unchanged).

Do not auto-merge: the drift catalogued above needs P1-D-level reconciliation before this generator should overwrite `profile/tools.json`.

🤖 Generated with Claude Code