diff --git a/docs/AI-discoverability-guide.md b/docs/AI-discoverability-guide.md
index 9c87848..99084f5 100644
--- a/docs/AI-discoverability-guide.md
+++ b/docs/AI-discoverability-guide.md
@@ -309,7 +309,7 @@ tool:m-cli
 cmd:m-cli#test
 cmd:m-cli#ci.init
 module:m-stdlib#STDJSON
-doc:m-tools#gap-analysis
+doc:m-dev-tools#m-tool-gap-analysis
 data:m-standard#grammar-surface
 workflow:tdd_inner_loop
 task:workflow.scaffold_new_project
diff --git a/docs/AI-discoverability-plan.md b/docs/AI-discoverability-plan.md
index 144a66f..22ae365 100644
--- a/docs/AI-discoverability-plan.md
+++ b/docs/AI-discoverability-plan.md
@@ -117,7 +117,7 @@ For each repo, in order:
 | `tree-sitter-m-vscode` | exists | `package.json` already declares it; wrap | needs add | tier 3 |
 | `m-stdlib-vscode` | needs check | `package.json` + manifest discovery config | needs add | tier 3 |
 | `m-cli-extras` | unknown | dump plugin entry points to JSON | needs add | tier 3 |
-| `m-tools` | exists | **archived** — emit `status: archived` only | n/a | optional |
+| `m-tools` | archived (upstream) | design docs **rehosted** under [`.github/docs/history/`](history/); `tool:m-tools` dropped from `tools.json` | n/a | resolved |
 
 Until tier 1 emits machine-readable artifacts, the org catalog is fiction.
 
@@ -418,8 +418,11 @@ Goal: freshness, link-check, license-reconcile gates running weekly in CI.
    `CONTRIBUTING.md` is ~30 lines pointing at each repo's own contribution
    guide.
 4. **No history/archive routing in the catalog.** `m-tools` is archived;
-   either rehost its docs in `.github/docs/history/` or drop them from
-   `tools.json`. Agents care about the current shape.
+   its three design docs are rehosted under
+   [`.github/docs/history/`](history/) and reachable through `task_index.json`'s
+   `history` category as `doc:m-dev-tools#<slug>` typed IDs. The
+   `tool:m-tools` entry is dropped from `tools.json`, so the tool catalog lists
+   only live tools: agents care about the current shape, not retired ones.
 5. **No human/AI documentation split per repo.** One `AGENTS.md` per repo;
    AI-specific sections marked inline. Two parallel docs always drift.
 
diff --git a/docs/history/README.md b/docs/history/README.md
new file mode 100644
index 0000000..3fe36da
--- /dev/null
+++ b/docs/history/README.md
@@ -0,0 +1,55 @@
+# Historical documents
+
+Frozen, in-org copies of design documents from now-archived repositories in
+the `m-dev-tools` org. The original repos remain read-only on GitHub; these
+copies exist so the *why* behind the current org shape stays discoverable
+inside `.github` itself, immune to upstream pruning or renames.
+
+These documents are **not maintained**. They reflect the state of the world
+at the moment they were imported. For the *current* shape of the org, start
+at [`profile/README.md`](../../profile/README.md) and
+[`profile/tools.json`](../../profile/tools.json).
+
+## Contents
+
+| Document | Source | Imported from commit | Why it's preserved |
+|---|---|---|---|
+| [`m-tool-gap-analysis.md`](m-tool-gap-analysis.md) | `m-dev-tools/m-tools/docs/m-tool-gap-analysis.md` | [`16fe3f7`](https://github.com/m-dev-tools/m-tools/commit/16fe3f7dc6982070809cd1d8290d01fedc5905ac) (2026-04-27) | The Go/Rust/Python toolchain comparison that produced the `m <subcommand>` design and `m-cli`'s CLI ergonomics. |
+| [`m-tooling-tier1.md`](m-tooling-tier1.md) | `m-dev-tools/m-tools/docs/m-tooling-tier1.md` | [`16fe3f7`](https://github.com/m-dev-tools/m-tools/commit/16fe3f7dc6982070809cd1d8290d01fedc5905ac) (2026-04-27) | The scoped Tier-1 strategy that defined what `m-cli` shipped first (fmt / lint / test / coverage / watch / LSP). |
+| [`gap-analysis-and-remediation-strategy.md`](gap-analysis-and-remediation-strategy.md) | `m-dev-tools/m-tools/docs/gap-analysis-and-remediation-strategy.md` | [`16fe3f7`](https://github.com/m-dev-tools/m-tools/commit/16fe3f7dc6982070809cd1d8290d01fedc5905ac) (2026-04-27) | The phased remediation roadmap that produced both `m-cli` and `m-stdlib`. |
+
+## Provenance policy
+
+- **Imported verbatim**, with a single `> **Archived snapshot.**` banner added
+  after each H1 to make the rehosting fact visible inline.
+- **No rewrites, no link-rot patching**, except where a *sibling-doc* link
+  pointed at a file we did not rehost — those links were retargeted at the
+  archived upstream repo (read-only) so they still resolve.
+- **Typed IDs** for these documents live under
+  [`profile/task_index.json`](../../profile/task_index.json) (category
+  `history`). The grammar is `doc:m-dev-tools#<filename-without-extension>`.
+
+## Adding a new historical doc
+
+Trigger: another repo in the org is archived and contains design rationale
+that future agents/contributors will benefit from reading.
+
+1. Copy the file(s) verbatim into this directory.
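+2. Add the `> **Archived snapshot.**` banner immediately after the H1, citing
+   the source repo, source commit hash, and date. Retarget any sibling-doc links that point at files you are not rehosting to the archived upstream repo, per the provenance policy above.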
+3. Append a row to the table above.
+4. Add a `doc:m-dev-tools#<slug>` typed ID to `task_index.json` under
+   the `history` category (the slug is the filename without `.md`), with an
+   `intent` line that names the plain-English question the doc answers.
+5. Run `make validate-catalog` to confirm the typed IDs validate.
+6. Open a PR titled `chore(history): rehost <repo>/<path>`.
+
+## Not on this list
+
+- **`m-tools/docs/implementation.md`** — implementation log; superseded by
+  `m-cli/docs/evolution.md` and `m-cli/docs/plans/m-cli-history-and-evolution.md`.
+- **`m-tools/docs/ydb-dev-tools-gap-analysis.md`** — 10-line stub; no
+  preserved content worth rehosting.
+
+Both remain reachable in the archived `m-tools` repo on GitHub for anyone
+who wants the deeper context.
diff --git a/docs/history/gap-analysis-and-remediation-strategy.md b/docs/history/gap-analysis-and-remediation-strategy.md
new file mode 100644
index 0000000..85cdbc7
--- /dev/null
+++ b/docs/history/gap-analysis-and-remediation-strategy.md
@@ -0,0 +1,1313 @@
+# M Tools — Gap Analysis and Remediation Strategy
+
+> **Archived snapshot.** Imported verbatim from [`m-dev-tools/m-tools`](https://github.com/m-dev-tools/m-tools) — source commit [`16fe3f7`](https://github.com/m-dev-tools/m-tools/commit/16fe3f7dc6982070809cd1d8290d01fedc5905ac) (2026-04-27), before that repo was archived. Preserved as the original phased remediation roadmap that scoped what became `m-cli` and `m-stdlib`. **Not maintained.** For the *current* shape of the org, start at [`profile/README.md`](../../profile/README.md).
+
+**Document type:** Strategic planning
+**Scope:** Developer toolchain for the M (MUMPS) language
+**Audience:** Developers building productivity tools for the M ecosystem
+**Sibling document:** [`implementation.md`](https://github.com/m-dev-tools/m-tools/blob/main/docs/implementation.md) — what's actually shipped *(not rehosted; resolves to the archived m-tools repo, which remains read-only on GitHub)*
+
+---
+
+## Scope and portability
+
+This document analyses the developer experience for the M (MUMPS) programming language. M itself is a portable, ISO-standardised language — implementations include InterSystems IRIS, YottaDB, GT.M, and historically others. The *toolchain* this analysis recommends (linters, formatters, AST tools, package managers) operates over `.m` source code and is portable in principle to any conformant M runtime.
+
+In practice this project uses **YottaDB as the foundation runtime** for two reasons:
+
+1. **YottaDB is open source under AGPL-3.0**, which makes it the most portable foundation for non-commercial / community tooling — anyone can install it and run the full toolchain end-to-end without licence negotiation. A toolchain bound to a closed-source runtime would be unreproducible for most contributors and unusable in CI without per-developer licensing. Open source means the toolchain is genuinely portable in the practical sense, not just the theoretical sense.
+2. **YottaDB's command-line surface is mature and well-documented.** `mupip` (database management), `gde` (global directory editor), `lke` (lock examination), `dse` (database structure editor), and the `ydb` runtime — together with the `%XCMD` mechanism for one-shot M execution — give a concrete substrate to integrate with. These vendor tools are not wrapped or renamed by this project; they are used directly, with their own `--help` as canonical documentation.
+
+Where this matters in the analysis: tool *recommendations* (e.g., "auto-formatter using `tree-sitter-m`") are M-portable. Tool *implementations* that touch a runtime (e.g., the test runner, coverage instrumentation, the trace tail) are YottaDB-bound today and would need a runtime-adapter layer to run against IRIS or other implementations. The shell-level naming convention reflects this split: portable analysis commands use `m <subcommand>`; runtime-bound commands use `ydb <subcommand>`. See [implementation.md](https://github.com/m-dev-tools/m-tools/blob/main/docs/implementation.md) for the canonical command map and as-built status.
+
+---
+
+## Table of Contents
+
+- [1. Introduction — The Problem](#1-introduction--the-problem)
+- [2. Comprehensive Gap Analysis](#2-comprehensive-gap-analysis)
+- [3. Strategic Recommendations](#3-strategic-recommendations)
+  - [3.1 Tier 1 — Immediate (high impact, low effort)](#31-tier-1--immediate-high-impact-low-effort)
+  - [3.2 Tier 2 — Short term (high impact, medium effort)](#32-tier-2--short-term-high-impact-medium-effort)
+  - [3.3 Tier 3 — Medium term (medium impact, medium/high effort)](#33-tier-3--medium-term-medium-impact-mediumhigh-effort)
+  - [3.4 Tier 4 — Long term / aspirational](#34-tier-4--long-term--aspirational)
+- **[Addendum A: Technology-Optimal Remediation Strategy](#addendum-a-technology-optimal-remediation-strategy)**
+  - [A.1 — The Foundation Problem: MUMPS Needs a Parser](#a1--the-foundation-problem-mumps-needs-a-parser)
+  - [A.2 — Technology Selection Matrix](#a2--technology-selection-matrix)
+  - [A.3 — The Database Layer: ZWR Format as Universal Interface](#a3--the-database-layer-zwr-format-as-universal-interface)
+  - [A.4 — The Instrumentation Layer: Observability Without a Profiler](#a4--the-instrumentation-layer-observability-without-a-profiler)
+  - [A.5 — Per-Gap Remediation (Major Gaps 🔴)](#a5--per-gap-remediation-major-gaps-)
+  - [A.6 — Per-Gap Remediation (Moderate Gaps 🟡)](#a6--per-gap-remediation-moderate-gaps-)
+- **[Addendum B: Prioritized Sequence of Remediation (Post-Parser)](#addendum-b-prioritized-sequence-of-remediation-post-parser)**
+  - [B.1 — Sequencing Principles](#b1--sequencing-principles)
+  - [B.2 — Phase 1: Canonicalise the Codebase](#b2--phase-1-canonicalise-the-codebase)
+  - [B.3 — Phase 2: Catch Bugs Before Runtime](#b3--phase-2-catch-bugs-before-runtime)
+  - [B.4 — Phase 3: Replace Approximations with Truth](#b4--phase-3-replace-approximations-with-truth)
+  - [B.5 — Phase 4: Interactive Surfaces (No Parser Dep)](#b5--phase-4-interactive-surfaces-no-parser-dep)
+  - [B.6 — Phase 5: Ecosystem Layer](#b6--phase-5-ecosystem-layer)
+  - [B.7 — Cross-Cutting: Umbrella Dispatcher Rename](#b7--cross-cutting-umbrella-dispatcher-rename)
+  - [B.8 — Sequence Summary](#b8--sequence-summary)
+- **[Appendix B: Gold Standard — Top 5 Language Toolchains](#appendix-b-gold-standard--top-5-language-toolchains)**
+  - [B.1 Python](#b1-python)
+  - [B.2 JavaScript / TypeScript](#b2-javascript--typescript)
+  - [B.3 Go](#b3-go)
+  - [B.4 Rust](#b4-rust)
+  - [B.5 Java](#b5-java)
+- **[Appendix C: What Ships with YottaDB (Foundation Runtime)](#appendix-c-what-ships-with-yottadb-foundation-runtime)**
+  - [C.1 Runtime and Interactive Tools](#c1-runtime-and-interactive-tools)
+  - [C.2 MUPIP — Database Management Utility](#c2-mupip--database-management-utility)
+  - [C.3 Auxiliary Utilities](#c3-auxiliary-utilities)
+  - [C.4 MUMPS Intrinsic Debugging Commands](#c4-mumps-intrinsic-debugging-commands)
+  - [C.5 Percent-Sign Utility Routines](#c5-percent-sign-utility-routines)
+
+---
+
+## 1. Introduction — The Problem
+
+### Background
+
+MUMPS (Massachusetts General Hospital Utility Multi-Programming System), now standardized as M, is a programming language and integrated hierarchical database that has been in continuous production use since 1966. It powers the majority of the world's large-scale healthcare IT infrastructure — Epic Systems, MEDITECH, the U.S. Department of Veterans Affairs' VistA system, and many others collectively manage hundreds of millions of patient records in MUMPS databases. M is implemented by several runtimes: InterSystems IRIS (commercial), YottaDB (open source, the foundation used here), GT.M (the open-source ancestor of YottaDB), and historically by several other vendors.
+
+Despite this operational scale and longevity, the developer experience around M has received comparatively little investment in tooling. The language itself predates virtually every modern software development practice: unit testing, continuous integration, static analysis, code coverage, package management, and automated formatting all emerged decades after M was in widespread use. As a result, the ecosystem of developer productivity tools that mainstream language communities take for granted simply does not exist in the M world.
+
+### The Core Problem
+
+A developer arriving at an M codebase from Python, Go, JavaScript, Rust, or Java faces a jarring regression in developer experience. The gap is not merely cosmetic — it affects every stage of the development lifecycle:
+
+**Edit:** No formatter exists. Code style is enforced only by convention and discipline. There is no equivalent of `black`, `gofmt`, or `prettier` to keep a codebase consistent without manual effort.
+
+**Lint:** The only available static check is syntax validation (`zcompile`). There is no analysis of logic errors, unused variables, unreachable code, missing QUIT statements, undefined labels, or style violations. Python's `ruff`/`pylint`, Go's `golangci-lint`, and Rust's `clippy` all catch categories of bugs before runtime; M has nothing comparable.
+
+**Test:** A test framework (`TESTRUN.m`) exists in this project, but the tooling around it is primitive. There is no way to run a single test case without running the entire suite, no coverage measurement, no test history, and no parallel execution. The `make watch` command reruns *all* tests on every file save — a workflow that degrades as the test suite grows.
+
+**Debug:** M has built-in debugging commands (`ZBREAK`, `ZSTEP`, `ZSHOW`) but they are interactive and require entering the runtime manually. There is no scriptable debugger, no conditional breakpoint wrapper, and no integration with any IDE debugger protocol.
+
+**Observe:** The integrated database is both a strength and an observability challenge. Globals are persistent and shared across processes, which makes it easy to accidentally carry test state between runs. There is no tool to snapshot the database state before a test, compare it after, or reliably reset it to a known fixture. The trace log (`^trace`) exists but cannot be tailed live.
+
+**Integrate:** There are no pre-commit hooks, no CI pipeline script, no coverage gate, and no automated quality check that runs before code is committed or deployed.
+
+### Why This Matters
+
+The consequence of this tooling gap is not merely inconvenience. It means that:
+
+1. **Bugs that would be caught automatically in other ecosystems reach manual testing** — or production. A Go developer's `go vet` or a Python developer's `mypy` catches entire categories of errors before a single test runs. In M, these categories are only caught when the faulty code path is manually exercised.
+
+2. **The feedback loop is slower and more manual.** A Rust developer running `cargo watch -x test` gets sub-second feedback on every save. An M developer runs `make test`, waits for all 11 suites, and manually reads output. As the codebase grows, this degrades.
+
+3. **Onboarding new developers is harder.** Modern languages have toolchains that enforce consistency and provide guardrails. M has neither, so new developers must learn conventions that are undocumented and unenforced.
+
+4. **The barrier to contribution is higher.** Open-source projects with good tooling (formatters, lint gates, coverage requirements) attract more contributors because the bar for a "correct" contribution is clear and automatically checkable.
+
+### The Strategic Opportunity
+
+YottaDB's runtime is mature, performant, and POSIX-compliant. The runtime provides powerful hooks — `%XCMD` for one-shot execution, `$ZHOROLOG` for microsecond timing, `ZSHOW` for full process introspection, `mupip extract` for database export, and a straightforward routine compilation model. These are the building blocks of a complete developer toolchain. What is missing is the shell toolchain layer that assembles these primitives into a coherent, ergonomic developer experience comparable to what Python, Go, and Rust developers have. Because YottaDB is open source, every layer of this toolchain is reproducible without licence negotiation — a property no closed-source M runtime can offer.
+
+This document surveys what currently exists, maps the complete gap against the toolchains of the five most popular programming languages (see [Appendix B](#appendix-b-gold-standard--top-5-language-toolchains) for the per-language reference tables), and proposes a prioritized roadmap of shell tools that can be built now using existing YottaDB capabilities.
+
+---
+
+## 2. Comprehensive Gap Analysis
+
+This chapter maps every significant developer toolchain category against four reference points: what the gold standard provides (synthesized from the toolchains of Python, JavaScript/TypeScript, Go, Rust, and Java — see [Appendix B](#appendix-b-gold-standard--top-5-language-toolchains) for the per-language tables), what YottaDB ships with natively (see [Appendix C](#appendix-c-what-ships-with-yottadb-foundation-runtime)), what this project has built (see [implementation.md](https://github.com/m-dev-tools/m-tools/blob/main/docs/implementation.md) for the live status), and the remaining gap with severity.
+
+**Severity key:** 🔴 Major gap (daily pain) · 🟡 Moderate gap (occasional friction) · 🟢 Minor gap or N/A
+
+**Status legend:** ✅ shipped (Tier 1–3) · 🟢 unblocked (parser foundation now exists in [`tree-sitter-m`](https://github.com/rafael5/tree-sitter-m); Tier 4 tool not yet built) · ⏸ deferred (no parser dep; awaiting demand) · 🟢/🟡/🔴 = original severity
+
+| Category | Gold Standard | YDB Native | This Project | Original Sev | Status |
+|----------|--------------|------------|--------------|--------------|--------|
+| **Syntax check** | Per-file, fast, exit-code | `zcompile` via `%XCMD` | `ycheck` | 🟢 | ✅ shipped (with known exit-code bug — see TODO.md) |
+| **Interactive REPL** | History, completion, multiline | `ydb` direct mode (bare) | `yeval` (single expression) | 🟡 | ⏸ Tier 4 (`yrepl` — needs prompt_toolkit) |
+| **Lint — style** | Configurable style rules | Nothing | Nothing | 🔴 | 🟢 unblocked (tree-sitter-m AST visitor) |
+| **Lint — logic** | Unused vars, unreachable code, missing returns | Nothing | Nothing | 🔴 | 🟢 unblocked (tree-sitter-m + CFG analysis) |
+| **Lint — deep** | Data flow, type errors, null safety | Nothing | Nothing | 🔴 | 🟢 unblocked (tree-sitter-m + whole-program call graph) |
+| **Auto-formatter** | Zero-config, deterministic | Nothing | Nothing | 🔴 | 🟢 unblocked (tree-sitter-m CST pretty-printer) |
+| **Type checking** | Full static type analysis | N/A (untyped language) | N/A | 🟢 | N/A by language design |
+| **Run all tests** | `make test` / `cargo test` | Nothing | `make test` | 🟢 | ✅ pre-existing |
+| **Run one suite** | Select by name/path | Nothing | `ytest <suite>` | 🔴 | ✅ Tier 1 |
+| **Run one test** | Select individual test case | Nothing | `ytest <suite> <label>` | 🔴 | ✅ Tier 1 |
+| **Test watcher** | Smart — reruns only affected | Nothing | `ytest-watch-smart` | 🟡 | ✅ Tier 2 |
+| **Test output** | Structured (TAP, JUnit XML) | Plain text | `ytap` (TAP-13) | 🟡 | ✅ Tier 3 |
+| **Test history** | Pass/fail trends over time | Nothing | Nothing | 🟡 | ⏸ Tier 4 / future |
+| **Coverage — line** | Which lines executed | Nothing | Nothing | 🔴 | 🟢 unblocked (tree-sitter-m identifies executable lines for source instrumentation) |
+| **Coverage — branch** | Which branches taken | Nothing | Nothing | 🔴 | 🟢 unblocked (tree-sitter-m branch-aware injection) |
+| **Coverage report** | HTML / lcov / badge | Nothing | `ycover` (label-entry, JSON or table) | 🔴 | ✅ Tier 3 (approximate) |
+| **Benchmarking** | Repeatable, statistical | Nothing | `yperf` ($ZHOROLOG, μs precision) | 🟡 | ✅ Tier 3 |
+| **Profiling** | Call graph, flame graph | `$ZHOROLOG` (manual) | Nothing | 🟡 | 🟢 unblocked (tree-sitter-m + source instrumentation share the coverage pipeline) |
+| **Debugger — interactive** | Breakpoints, step, inspect | `ZBREAK`/`ZSTEP`/`ZSHOW` (manual) | Nothing | 🟡 | ⏸ Tier 4 (`ydebug`) |
+| **Debugger — scriptable** | Conditional BPs, watchpoints | Nothing | Nothing | 🔴 | ⏸ Tier 4 |
+| **Symbol introspection** | List functions/exports | `%RD` (compiled routines only) | `ywhat` | 🟡 | ✅ Tier 1 |
+| **Documentation gen** | Extract comments → HTML/MD | Nothing | `ydoc` (Markdown) | 🟡 | ✅ Tier 3 |
+| **Dependency mgmt** | Lockfile, versioned packages | Nothing | Nothing | 🔴 | ⏸ Tier 4 (manifest format to be designed in `m-standard`) |
+| **DB export** | Dump state to portable format | `mupip extract`, `%GO` | `yexport` (json/zwr/raw) | 🟡 | ✅ Tier 2 |
+| **DB import / fixture load** | Load known state for tests | `mupip load`, `%GI` | `yseed` (auto-detect format) | 🔴 | ✅ Tier 2 |
+| **DB diff** | What changed between runs | Nothing | `ydiff` (+/-/~ markers) | 🔴 | ✅ Tier 2 |
+| **DB state snapshot** | Before/after comparison | `mupip extract` (manual) | `ydiff before/after`, `ysnapshot` | 🔴 | ✅ Tier 2/3 |
+| **DB global sizing** | Node counts, storage usage | `mupip size` | `yglobsize` (nodes + blocks) | 🟡 | ✅ Tier 2 |
+| **DB reset / clean** | Wipe test globals reliably | `kill` in test teardown | `yclean` (named groups) | 🔴 | ✅ Tier 1 |
+| **DB integrity check** | Verify database not corrupt | `mupip integ` | Nothing wired | 🟡 | ⏸ future |
+| **Live log tail** | Stream output in real time | Nothing | `ylog` (poll + filter) | 🟡 | ✅ Tier 1 |
+| **Pre-commit hooks** | Block bad commits | Nothing | `yhook install/run/uninstall` | 🟡 | ✅ Tier 1 |
+| **CI pipeline** | One-command full check | Nothing | `yci`, `yci --report` | 🟡 | ✅ Tier 1 |
+| **Environment check** | Verify full toolchain | Nothing | `make check-env` (minimal, but `yci` wraps it) | 🟡 | ✅ Tier 1 (via yci) |
+| **Scaffolding** | New module/test template | Nothing | `ynew` (module + test + Makefile injection) | 🟡 | ✅ Tier 3 |
+| **Security scan** | Dependency CVE check | Nothing | Nothing | 🟢 | N/A (no dependencies) |
+| **Complexity metrics** | Cyclomatic complexity | Nothing | Nothing | 🟡 | 🟢 unblocked (tree-sitter-m AST visitor) |
+| **Dead code detection** | Unused labels/variables | Nothing | Nothing | 🟡 | 🟢 unblocked (tree-sitter-m + reachability over call graph) |
+| **Snapshot testing** | Compare output to baseline | Nothing | `ysnapshot create/check/update` | 🟡 | ✅ Tier 3 |
+| **Parallel tests** | Run suites concurrently | Nothing | Nothing | 🟡 | ⏸ Tier 4 |
+| **Test fixtures** | Composable, scoped test state | Nothing | `yseed` + `yclean` cover the foundation | 🔴 | ✅ Tier 1+2 |
+| **Crash / lockup cleanup** | Recover from bad process exit | `mupip rundown`, `lke` | `yrundown` | 🟡 | ✅ Tier 2 |
+
+**Original severity counts (still meaningful as a baseline):** 🔴 Major: 16 · 🟡 Moderate: 20 · 🟢 Minor/N/A: 4
+**Closed by Tier 1–3:** 11 of 16 majors · 12 of 20 moderates · all 4 minors-or-N/A handled.
+**Unblocked by `tree-sitter-m` v1.0** (parser foundation now ships — see [implementation.md → Parser-foundation status](https://github.com/m-dev-tools/m-tools/blob/main/docs/implementation.md#41-parser-foundation-status-the-unlock-for-tier-4)): 4 of the 5 remaining majors (lint-style, lint-logic, lint-deep, auto-formatter) plus 4 moderates (profiling, complexity-metrics, dead-code, line/branch-coverage). Tools themselves are not yet built — they are downstream consumers of the parser.
+**Still genuinely open (no parser dep, awaiting demand):** scriptable-debugger (DAP server), interactive-REPL (`yrepl` Phase 1 = `rlwrap ydb`), test-history (SQLite trend store), interactive-debugger, dependency-mgmt (manifest design in `m-standard`), DB-integrity (wrap `mupip integ`), parallel-tests (test-isolation refactor).
+
+---
+
+## 3. Strategic Recommendations
+
+### Prioritization Criteria
+
+Tools are ranked by the product of:
+- **Daily friction:** How often does this gap cause pain in a normal edit-test-commit cycle?
+- **Build effort:** How hard is this to build with existing YDB primitives?
+- **Ecosystem unlock:** Does this tool enable other tools (e.g., fixture management enables reliable testing)?
+
+---
+
+### 3.1 Tier 1 — Immediate (high impact, low effort) — ✅ DONE 2026-04-25
+
+All six shipped in the same session as the analysis itself. Each tool closes the original "Why Now" friction via shell-only implementation; no MUMPS-side changes were required for Tier 1.
+
+| Tool | Closes Gap | Status |
+|------|-----------|--------|
+| **`ytest`** | Single suite / single test execution | ✅ |
+| **`yclean`** | DB reset / test isolation | ✅ — 7 named groups (`tasks`, `trace`, `txn`, `idx`, `fixtures`, `demo`, `safe`) |
+| **`ylog`** | Live trace tail | ✅ — polls `$$count^trace()` at 0.5s; supports `--n`, `--clear`, `--filter` |
+| **`ywhat`** | Symbol introspection | ✅ — pure awk over column-1 lines |
+| **`yhook`** | Pre-commit hooks | ✅ — refuses to overwrite a hand-written hook (marker line check) |
+| **`yci`** | CI pipeline | ✅ — `--fast` and `--report` modes |
+
+---
+
+### 3.2 Tier 2 — Short term (high impact, medium effort) — ✅ DONE 2026-04-25
+
+| Tool | Closes Gap | Implementation Note |
+|------|-----------|---------------------|
+| **`ydiff`** | DB diff / state change tracking | ✅ — chose flat `^ref=value` dumps + `diff -u` + awk to pair `-`/`+` lines into `~` change lines (simpler than parsing ZWR) |
+| **`yexport`** | DB export | ✅ — three formats: `json` (via `exportJson^yutil`), `zwr` (`mupip extract`), `raw` (flat dump) |
+| **`yseed`** | DB fixture loading | ✅ — auto-detects format; JSON path uses python3 to emit `set` commands and pipes them to `$YDB -direct` |
+| **`ytest-watch-smart`** | Targeted test watcher | ✅ — pure-bash `stat -c %Y` polling (no `entr`/`inotifywait` dep); `<NAME>TST` convention mapping |
+| **`yglobsize`** | Global size reporting | ✅ — exact node count via `count^yutil`; storage blocks via `mupip size` (with stderr→stdout redirect) |
+| **`yrundown`** | Crash cleanup | ✅ — refuses to run if other YDB processes are alive; `--dry`, `--locks`, `--db` flags |
+
+**New helper module:** [`routines/yutil.m`](../routines/yutil.m) — small MUMPS-side helpers (`count`, `dump`, `exportJson`, `bench`, `listGlobals`) since argless `FOR` loops fail through `%XCMD`'s wrapper. Shell tools call labels directly via `$YDB -run <label>^yutil <arg>`.
+
+---
+
+### 3.3 Tier 3 — Medium term (medium impact, medium/high effort) — ✅ DONE 2026-04-25
+
+| Tool | Closes Gap | Implementation Note |
+|------|-----------|---------------------|
+| **`ydoc`** | Documentation generation | ✅ — pure awk; emits H2 per routine, H3 per label; skips `tXxx` test labels and `;@TEST` decorations |
+| **`yperf`** | Benchmarking | ✅ — `bench^yutil` (3 warmups + N measurements with `$ZHOROLOG`); awk computes mean/median/p95/min/max/stddev/outliers |
+| **`ynew`** | Scaffolding | ✅ — generates module + test + Makefile injection (python3 helper for the Makefile edit) |
+| **`ycover`** | Coverage approximation | ✅ — ZBREAK at every label, run all suites in one YDB process, diff `^ycov` against discovered label set; reports per-routine % |
+| **`ytap`** | TAP output | ✅ — awk transformer over `ytest` output; `1..N` plan emitted at end |
+| **`ysnapshot`** | Snapshot testing | ✅ — `create`/`check`/`update`/`list`/`show`/`rm`; baselines in `fixtures/snapshots/<name>.txt` |
+
+**Real test gaps surfaced by `ycover`** (current state, 69.1% coverage):
+- `server.m` 0% (9 labels) — no test suite exists
+- `taskscli.m` 0% (6 labels) — CLI exercised only via shell, not unit tests
+- `trace.m` 0% (6 labels) — used as a side effect, never asserted on
+- `ystate.m` 0% (3 labels) — has the known parse bug from TODO.md
+- `yutil.m` 0% (5 labels) — new helper, no dedicated suite
+
+---
+
+### 3.4 Tier 4 — Long term / aspirational — 🟢 PARSER FOUNDATION SHIPPED 2026-04-26
+
+The remaining tools all share one root prerequisite: **a real MUMPS parser**. Hand-rolled regex/awk approaches hit ceiling fast (postconditionals, dot blocks, naked references, indirection). The original strategic plan called for splitting the parser work into separate repos so the parser could mature on its own lifecycle — that work is now done.
+
+| Project | Purpose | Status |
+|---------|---------|--------|
+| **[`m-standard`](https://github.com/rafael5/m-standard)** | Authoritative reference for the MUMPS language: integrated, citable, machine-readable spec layer reconciling AnnoStd (ISO 11756), YottaDB docs, IRIS docs, and VA SAC/XINDEX into a unified grammar-surface JSON. Also home to the dependency manifest format and any cross-cutting standards documents. | ✅ **v1.0 tagged** for AnnoStd + YottaDB scope; end-to-end pipeline green; all 9 validation gates passing in CI. v0.2 in progress for IRIS + SAC additions. |
+| **[`tree-sitter-m`](https://github.com/rafael5/tree-sitter-m)** | The implementation layer. Production tree-sitter grammar generated from `m-standard`'s grammar-surface (949 keyword forms, schema-pinned). Bindings scaffolded for Node / Rust / Python / Go. **Note:** the original plan called for a "Lark phase 1, Tree-sitter phase 2" split under a single `m-grammar` repo. In practice, `tree-sitter-m` was built directly against `m-standard`'s grammar-surface and the Lark phase was skipped — the schema-pinned grammar-surface JSON gave enough structure that the iteration speed argument for Lark went away. | ✅ **v1.0 grammar work complete.** 99.06% clean on the full 39,330-routine VistA corpus; 100% of clinical packages. 10k-line synthesised routine parses in 78.6 ms. 110 corpus tests + 19 lib tests + 347/347 keyword-coverage triples all green. Remaining: publish bindings to npm/crates.io/PyPI/Go, AD-03 stamping integration, perf budget in CI. |
+| **[`tree-sitter-m-vscode`](https://github.com/rafael5/tree-sitter-m-vscode)** | VS Code extension exercising the grammar end-to-end. Two-layer highlighting (TextMate cold-load + tree-sitter-m WASM semantic-tokens) demonstrates the editor-integration success criterion. | ✅ **v0.1 working.** `vsce package` produces a 1.27 MB `.vsix` bundling the parser WASM + web-tree-sitter runtime. Marketplace `vsce publish` gated only on a Personal Access Token from dev.azure.com. |
+
+With the parser foundation in place, the Tier 4 tools become straightforward downstream consumers. The remaining work is on the *tools themselves*, not the prerequisite:
+
+| Tool | Depends on | Status / Notes |
+|------|-----------|----------------|
+| **`yfmt`** (→ `m fmt`) | tree-sitter-m AST + pretty-printer | 🟢 Ready to build. Use lossless byte-range mode to preserve comments. |
+| **`ylint-deep`** (→ `m lint --deep`) | tree-sitter-m AST + call graph | 🟢 Ready to build. Visitor pattern with rule predicates; configurable warning-set. |
+| **`ylint-style`** / **`ylint-logic`** (→ `m lint --style` / `--logic`) | tree-sitter-m AST visitor | 🟢 Ready to build. Style + control-flow rules over the AST. |
+| **`ycov-line` / `ycov-branch`** (→ `ydb cover --line` / `--branch`) | tree-sitter-m for instrumentation-point identification + `^ycov` global | 🟢 Ready to build. Replaces today's label-entry-only `ycover`. |
+| **`ydebug`** (→ `ydb debug`) | YDB `ZBREAK`/`ZSTEP` + DAP server (no parser dep) | ⏸ No parser dep; deferred on demand. Wraps existing YDB primitives in a Debug Adapter Protocol server. |
+| **`yrepl`** (→ `ydb repl`) | `prompt_toolkit` + tree-sitter-m (for completion) | ⏸ Phase 1 = `rlwrap ydb` (no parser dep, ships now); Phase 2 uses tree-sitter-m for tab-completion. |
+| **`yparallel`** (→ `ydb test --parallel`) | Global-isolation conventions in test suites | ⏸ No parser dep; blocked on test-suite isolation discipline. |
+| **`ydb-pkg`** (→ `m pkg`) | TOML manifest spec in `m-standard` + installer script | ⏸ Manifest format design pending in `m-standard`. |
+
+**Decision (revised 2026-04-26):** the parser foundation is now shipped, so the strategic question shifts from *should we build a parser?* to *which downstream tools are worth building, in what order?* The natural sequencing follows daily friction: `yfmt` (zero current solution; canonicalises the codebase), then `ylint-style` + `ylint-logic` (catches bug categories before runtime), then `ycov-line` (replaces today's approximate `ycover`). `ylint-deep`, `ydebug`, `yrepl` Phase 2, and `ydb-pkg` are larger investments and can wait.
+
+For the technology selection, parser hard problems, and the rationale for splitting into `m-standard` (spec) vs `tree-sitter-m` (impl), see [Addendum A](#addendum-a-technology-optimal-remediation-strategy).
+
+---
+
+## Addendum A: Technology-Optimal Remediation Strategy
+
+This addendum provides a technology-first remediation plan for every Major (🔴) and Moderate (🟡) gap identified in the gap analysis. It is structured as an engineering specification: each section names specific libraries, parser technologies, and integration patterns. The goal is not a wish list but a buildable roadmap grounded in how comparable ecosystems have solved identical problems.
+
+---
+
+### A.1 — The Foundation Problem: MUMPS Needs a Parser
+
+Almost every high-value gap in this analysis — linting, formatting, dead code detection, documentation generation, symbol introspection, complexity metrics, snapshot testing — shares a single prerequisite: the ability to transform MUMPS source text into a structured representation that a program can reason about. Regex-based approaches have been tried in MUMPS tooling for decades and consistently fail at the same boundaries: postconditional expressions embedded in commands, the distinction between a DO block's dot-notation and an argument list, naked references, and the interaction between `IF`/`ELSE` and the `$TEST` special variable. A proper parser is not a luxury; it is the foundation.
+
+**The MUMPS Grammar's Hard Problems**
+
+Any grammar for MUMPS must handle the following without ambiguity:
+
+- **Column-1 labels.** A MUMPS source file is not free-form. A label must begin in column 1; everything indented is a command. This is a lexer-level concern — the tokenizer must be line-position-aware.
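+- **Postconditionals.** `DO:condition label` and `SET:condition var=val` attach a condition to a whole command, and `DO label:condition` attaches one to a single argument (argument postconditionals are allowed on `DO`, `GOTO`, and `XECUTE`). Neither is a separate control structure. The grammar must represent these as optional decorated nodes on every command and on those arguments.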
+- **FOR variants.** `FOR i=1:1:10`, `FOR i=1,3,5`, and `FOR` (infinite loop with no argument) are three distinct syntactic forms with the same keyword.
+- **Dot-block indentation.** MUMPS has no braces. `DO` blocks are delimited by leading dots: one dot for one level of nesting, two dots for two levels. This is whitespace-significant at the token level, not the grammar level.
+- **String literal escaping.** The only escape sequence in MUMPS string literals is `""` (doubled quote) to represent a literal quote. Parsers that assume backslash escaping will silently misparse.
+- **Indirection.** `@var` uses the value of `var` as a name (name indirection), and `@var@(subscript)` appends subscripts to that name (subscript indirection). The `@` operator is legal in argument positions across nearly every command.
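+- **DO/ELSE/IF interaction.** MUMPS `ELSE` does not attach to an `IF` syntactically. It tests `$TEST`, a per-process flag that is set by `IF` and by commands with timeouts (`READ`, `LOCK`, `OPEN`, `JOB`), and that can be changed by whatever code an argumented `DO` invokes (argumentless `DO` and `$$` calls restore it on return). A formatter that reformats `IF`/`ELSE` pairs without understanding this will silently break code.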
+- **Naked references.** `^(subscript)` reuses the global name, and all but the last subscript, of the most recent global reference. This makes static data-flow analysis context-sensitive in a way most languages are not.
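+
+Three of these concerns (column-1 labels, dot-block depth, and `""` string escaping) can be handled before any grammar rule runs. The sketch below is illustrative only and is not how `tree-sitter-m` implements them; `split_line`, `read_string`, and `MLine` are hypothetical names, and it assumes the line start is a space or tab.
+
+```python
+from __future__ import annotations
+
+from typing import NamedTuple, Optional
+
+
+class MLine(NamedTuple):
+    label: Optional[str]  # label (with any formal list), or None if column 1 is blank
+    dots: int             # dot-block nesting depth
+    body: str             # commands and any trailing comment, after the dots
+
+
+def split_line(line: str) -> MLine:
+    """Split one M source line into its label, dot depth, and command body."""
+    i, label = 0, None
+    if line and line[0] not in " \t":
+        # Any non-whitespace character in column 1 starts a label.
+        while i < len(line) and line[i] not in " \t":
+            i += 1
+        label = line[:i]
+    dots = 0
+    # Skip the line-start whitespace and count dots (spaces may separate them).
+    while i < len(line) and line[i] in " \t.":
+        if line[i] == ".":
+            dots += 1
+        i += 1
+    return MLine(label, dots, line[i:])
+
+
+def read_string(text: str, start: int) -> tuple[str, int]:
+    """Decode the M string literal opening at text[start]; return (value, end index).
+
+    The only escape is a doubled quote, which stands for one literal quote.
+    """
+    if text[start] != '"':
+        raise ValueError("not at a string literal")
+    out, i = [], start + 1
+    while i < len(text):
+        if text[i] == '"':
+            if text[i + 1 : i + 2] == '"':  # "" -> literal quote
+                out.append('"')
+                i += 2
+                continue
+            return "".join(out), i + 1      # closing quote
+        out.append(text[i])
+        i += 1
+    raise ValueError("unterminated string literal")
+```
+
+For example, `split_line(" . . SET X=1")` yields no label, a dot depth of 2, and the body `SET X=1`. A tokenizer would then call `read_string` at each `"` so that a `;` or space inside a literal is never mistaken for a comment or a command separator.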
+
+**Parser Technology Survey**
+
+| Technology | Strengths | Weaknesses | Verdict |
+|---|---|---|---|
+| **ANTLR4** | Mature, generates parsers for Java, C#, Python, Go, C++, JavaScript, and other targets, large community, good error recovery, IDE grammar tooling | Java toolchain dependency, generated code is verbose, ALL(*) supports direct left recursion but not indirect left recursion | Strong candidate |
+| **Tree-sitter** | Incremental parsing, excellent IDE integration (Neovim/Helix/Emacs native), generates C with bindings to any language, handles error recovery gracefully | Grammar is written in a JavaScript DSL (`grammar.js`) with a learning curve, less documentation than ANTLR | Best long-term choice |
+| **Lark (Python Earley/LALR)** | Pure Python, EBNF grammar files, no code generation step, `lark.Tree` output is Pythonic, Earley handles ambiguous grammars | Slower than compiled parsers, Earley mode is O(n³) worst case, not suitable for IDE incremental use | Best for rapid prototyping |
+| **pest.rs (Rust PEG)** | Extremely fast, safe memory model, excellent for CLI tools | Rust-only bindings, grammar in `.pest` DSL is less standard, high barrier for contributors | Good if Rust is already in stack |
+| **flex/bison (lex/yacc)** | Decades of precedent, C output, small runtime | LALR(1) grammars are difficult to write and debug, C output requires C toolchain for every consumer language | Poor ergonomics for modern tooling |
+| **Hand-written recursive descent** | Full control, can handle context-sensitive constructs like column-1 labels naturally | High maintenance cost, difficult for contributors, error messages require explicit effort | Acceptable only if grammar is small |
+
+**Recommendation: Tree-sitter (Lark phase skipped in practice)**
+
+The original recommendation was a two-phase path: Lark EBNF for the bootstrap, Tree-sitter for the long-term incremental/IDE-grade grammar. In practice the Lark phase was skipped. The grammar source-of-truth was extracted into `m-standard`'s schema-pinned `grammar-surface.json` (949 forms across the seven concept families), which gave enough structure that iterating directly in Tree-sitter's grammar DSL was viable from the start. `tools/build-grammar.js` in `tree-sitter-m` reads the grammar-surface and emits keyword tables, so grammar changes are driven by the spec-side data, not by hand-editing parser internals.
+
+Tree-sitter's incremental parsing model is the prerequisite for IDE integration (the most valuable long-term unlock), and its C-with-bindings architecture means a single grammar can serve Python tooling, Neovim plugins, and GitHub's Linguist. [`tree-sitter-m`](https://github.com/rafael5/tree-sitter-m) is the published grammar; bindings for Node / Rust / Python / Go are scaffolded and locally green; publishing to package registries is the remaining release work. The VS Code demonstration ([`tree-sitter-m-vscode`](https://github.com/rafael5/tree-sitter-m-vscode)) exercises the WASM build path end-to-end.
+
+**The single investment that unlocks everything.** With a working parse tree:
+- A formatter is a pretty-printer over the AST
+- A linter is a visitor over the AST with rule predicates
+- Documentation generation reads doc comments adjacent to label nodes
+- Dead code detection becomes a reachability problem on the call graph extracted from the AST
+- Symbol introspection is a label-index over all parsed files
+- IDE support (via Language Server Protocol) becomes a tree query problem
+
+No other single investment has this leverage ratio.
+
+**Project split (decided 2026-04-25, executed 2026-04-26):** This work lives outside `m-tools`. Three repos now exist:
+- **[`m-standard`](https://github.com/rafael5/m-standard)** — the spec layer. Reconciled grammar-surface JSON + per-concept TSVs derived from AnnoStd (ISO 11756), YottaDB docs, IRIS docs, and VA SAC/XINDEX. Schema-pinned (`schema_version="1"`). v1.0 tagged for the AnnoStd + YottaDB scope; v0.2 in progress for IRIS + SAC additions. Also home to the dependency-manifest format for `ydb-pkg` (TBD).
+- **[`tree-sitter-m`](https://github.com/rafael5/tree-sitter-m)** — the implementation layer. Production tree-sitter grammar generated from `m-standard`'s grammar-surface; 99.06% clean on the full 39,330-routine VistA corpus; Node / Rust / Python / Go bindings scaffolded. The original plan called for an `m-grammar` repo containing both Lark (phase 1) and Tree-sitter (phase 2); in practice the schema-pinned grammar-surface let us go straight to tree-sitter, so `m-grammar` collapsed into `tree-sitter-m` as a single repo.
+- **[`tree-sitter-m-vscode`](https://github.com/rafael5/tree-sitter-m-vscode)** — sibling editor extension. Two-layer highlighting: TextMate grammar for cold-load + `DocumentSemanticTokensProvider` powered by `tree-sitter-m` compiled to WASM via `tree-sitter build --wasm --docker`. Demonstrates the editor-integration success criterion.
+
+The Tier 4 tools (`yfmt`, `ylint-deep`, `ydoc-html`, `ycov-line`, `ydeadcode`, etc.) become downstream consumers of `tree-sitter-m`'s bindings. `m-tools` remains a pure shell-tools + MUMPS-library workspace; the parser project does not get folded back in.
+
+---
+
+### A.2 — Technology Selection Matrix
+
+> **Updated 2026-04-26.** Where the original matrix said *Python + Lark/Tree-sitter*, the realised choice is *Python + `tree-sitter-m` Python binding*. The Lark phase from the original two-phase plan was skipped — see [A.1](#a1--the-foundation-problem-mumps-needs-a-parser).
+
+| Gap | Severity | Recommended Technology |
+|---|---|---|
+| Lint — style | 🔴 | Python + tree-sitter-m AST visitor |
+| Lint — logic | 🔴 | Python + tree-sitter-m + cfg-style analysis |
+| Lint — deep (data flow) | 🔴 | Python + networkx call graph over tree-sitter-m AST |
+| Auto-formatter | 🔴 | Python + tree-sitter-m CST printer (lossless mode) |
+| Run one suite | 🔴 | Bash + pytest-style test discovery in yeval |
+| Run one test | 🔴 | Bash + yeval argument parsing |
+| Coverage — line | 🔴 | Python source instrumentation (tree-sitter-m identifies executable lines) + `^ycov` global |
+| Coverage — branch | 🔴 | Python source instrumentation (tree-sitter-m branch-aware injection) |
+| Coverage report | 🔴 | Python + rich / LCOV-format output |
+| Debugger — scriptable | 🔴 | YDB ZBREAK hooks + expect/pexpect driver |
+| Dependency management | 🔴 | Python + TOML manifest + ydb-pkg installer script |
+| DB import / fixture load | 🔴 | Python ZWR processor + mupip load wrapper |
+| DB diff | 🔴 | Python ZWR parser + unified diff |
+| DB state snapshot | 🔴 | Python ZWR export wrapper |
+| DB reset / clean | 🔴 | Python ZWR fixture restore |
+| Test fixtures | 🔴 | Python ZWR fixture library + pytest-style fixture injection |
+| Interactive REPL | 🟡 | Python + prompt_toolkit wrapping mumps process |
+| Test watcher | 🟡 | Python + watchfiles + targeted yeval re-run |
+| Test output | 🟡 | Python TAP/JUnit XML formatter over yeval output |
+| Test history | 🟡 | Python SQLite store + trend reporter |
+| Benchmarking | 🟡 | Python source instrumentation + time.perf_counter_ns wrapper |
+| Profiling | 🟡 | Python source instrumentation + call-count aggregator |
+| Symbol introspection | 🟡 | Python + tree-sitter-m label index |
+| Documentation generation | 🟡 | Python + tree-sitter-m AST + Jinja2 HTML/Markdown output |
+| DB export | 🟡 | Python ZWR wrapper + mupip extract |
+| DB global sizing | 🟡 | Python ZWR parser + size aggregator |
+| Live log tail | 🟡 | Python + rich.live + tail -F wrapper |
+| Pre-commit hooks | 🟡 | pre-commit framework + ycheck as hook |
+| CI pipeline | 🟡 | GitHub Actions / Gitea Actions YAML |
+| Environment check | 🟡 | Python platform inspector script |
+| Scaffolding | 🟡 | Python + Jinja2 template engine |
+| Complexity metrics | 🟡 | Python + tree-sitter-m AST cyclomatic counter |
+| Dead code detection | 🟡 | Python + networkx reachability over call graph |
+| Snapshot testing | 🟡 | Python ZWR snapshot + diff assertion |
+| Parallel tests | 🟡 | Python + concurrent.futures + isolated DB regions |
+| Crash / lockup cleanup | 🟡 | Python + psutil + mupip rundown wrapper |
+
+---
+
+### A.3 — The Database Layer: ZWR Format as Universal Interface
+
+Before addressing individual database-layer gaps, it is worth recognizing that YottaDB has already solved the hardest part of database tooling: it provides a textual export format that is both complete and trivially parseable. That format is ZWR (ZWRITE format, named after the `ZWRITE` command that prints variables the same way), produced by `mupip extract` and consumed by `mupip load`.
+
+**ZWR Format Description**
+
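+A ZWR file is a sequence of newline-terminated records. A `mupip extract` file opens with two header lines (an extract label, then a line with the date, time, and format name); every line after that is one node:
+
+```
+^global(1,"sub2")="value"
+^global(1,"sub3")=42
+^global(2,"sub4")="abc"_$C(0,7)_"def"
+```
+
+The format is:
+- One node per line, always
+- Global names begin with `^`; `ZWRITE` prints local variables (names beginning with `%` or a letter) in the same notation, but `mupip extract` emits only globals
+- Subscripts are comma-separated inside parentheses; numeric subscripts appear bare and string subscripts are double-quoted
+- Values are canonical numbers, quoted strings (with `""` escaping), or, for values containing non-printable characters, quoted segments joined to `$C(...)` calls with the `_` concatenation operator
+- Subscript ordering follows M collation (numeric subscripts in numeric order, then string subscripts in byte order)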
+
+This is, structurally, a sorted key-value dump with explicit hierarchy visible in the subscripts. Every line is self-contained.
+
+**Why This Is a Gift for Tooling**
+
+Most databases require either a binary dump format (requiring vendor tools to inspect) or a complex multi-table SQL dump (requiring schema knowledge to interpret). ZWR is neither. A Python script can process a multi-gigabyte ZWR file with a single-pass line iterator — no binary parsing, no schema introspection, no vendor library. Each line can be parsed with a small state machine that splits on the first `=` not inside quotes, then parses the subscript list.
+
+This single property enables a Python ZWR processing library that becomes the foundation for all database-layer tooling:
+
+- **DB diff**: Extract two snapshots, sort both, feed to `difflib.unified_diff`
+- **DB export**: Wrap `mupip extract`, optionally filter by global prefix
+- **DB import / fixture load**: Validate ZWR then call `mupip load`
+- **DB state snapshot**: Timestamped `mupip extract` to a versioned directory
+- **DB reset / clean**: `mupip load` from a known-good ZWR fixture
+- **Test fixtures**: Curated ZWR files, one per test scenario, loaded before each test
+- **Snapshot testing**: Capture ZWR after a test run, compare with committed baseline
+- **Global sizing**: Parse ZWR, accumulate byte counts per top-level global name
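+
+The line-level operations in this list need very little code. Below is a minimal sketch of DB diff and global sizing. It assumes `mupip extract` ZWR output in which every node record starts with `^` and the header lines do not. `node_lines`, `zwr_diff`, and `global_sizes` are hypothetical helpers, and the sizes count bytes of extract text, not database block usage.
+
+```python
+from __future__ import annotations
+
+import difflib
+from collections import Counter
+from pathlib import Path
+
+
+def node_lines(path: Path) -> list[str]:
+    """Node records of a ZWR extract: lines starting with '^' (header lines do not)."""
+    text = path.read_text(encoding="utf-8", errors="surrogateescape")
+    return [line for line in text.splitlines() if line.startswith("^")]
+
+
+def zwr_diff(a: Path, b: Path) -> str:
+    """Unified diff of two ZWR snapshots, comparing sorted node lines."""
+    return "".join(
+        difflib.unified_diff(
+            [line + "\n" for line in sorted(node_lines(a))],
+            [line + "\n" for line in sorted(node_lines(b))],
+            fromfile=str(a),
+            tofile=str(b),
+        )
+    )
+
+
+def global_sizes(path: Path) -> Counter:
+    """Bytes of extract text per top-level global (not database block usage)."""
+    sizes: Counter = Counter()
+    for line in node_lines(path):
+        # The global name runs from after '^' to the first '(' or '='.
+        cuts = [i for i in (line.find("("), line.find("=")) if i != -1]
+        name = line[1 : min(cuts)] if cuts else line[1:]
+        sizes[name] += len(line.encode("utf-8", errors="surrogateescape")) + 1  # + newline
+    return sizes
+```
+
+Skipping the header matters for diffs: the date line differs between any two extracts and would otherwise show up as a spurious change.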
+
+**Recommended Python ZWR Library**
+
+The library should be a single module, `yzwr.py`, with the following API surface:
+
+- `parse_line(line: str) -> ZWRNode` — parses one ZWR record into a typed object
+- `load_file(path: Path) -> Iterator[ZWRNode]` — streaming parser, handles arbitrarily large files
+- `dump_nodes(nodes: Iterable[ZWRNode], path: Path)` — writes ZWR file
+- `diff(a: Path, b: Path) -> str` — unified diff of two ZWR files
+- `filter_prefix(nodes: Iterable[ZWRNode], prefix: str) -> Iterator[ZWRNode]` — subset by global name
+
+The `ZWRNode` dataclass holds: `name: str`, `subscripts: list[str | int | float]`, `value: str`, `is_global: bool`, `raw: str`.
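+
+A minimal sketch of `parse_line`, `load_file`, and `filter_prefix` follows, including the quote-aware split on the first `=` described above. It assumes `mupip extract` ZWR input and treats `$C(...)` / `$ZCH(...)` arguments as decimal character codes, approximating `$ZCH` bytes with `chr`. It also leaves out `dump_nodes` and `diff`. `_split_top` and `_decode` are hypothetical helper names.
+
+```python
+from __future__ import annotations
+
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Iterable, Iterator
+
+
+@dataclass
+class ZWRNode:
+    name: str
+    subscripts: list[str | int | float]
+    value: str
+    is_global: bool
+    raw: str
+
+
+def _split_top(text: str, sep: str) -> list[str]:
+    """Split on `sep` only where it is outside double quotes and parentheses."""
+    parts, depth, in_str, start = [], 0, False, 0
+    for i, ch in enumerate(text):
+        if ch == '"':
+            in_str = not in_str  # a doubled "" toggles twice, staying inside the string
+        elif not in_str:
+            if ch == "(":
+                depth += 1
+            elif ch == ")":
+                depth -= 1
+            elif ch == sep and depth == 0:
+                parts.append(text[start:i])
+                start = i + 1
+    parts.append(text[start:])
+    return parts
+
+
+def _decode(token: str) -> str | int | float:
+    """Decode a bare number, a "quoted" string, or a _-joined mix of strings and $C()."""
+    parts = _split_top(token, "_")
+    if len(parts) == 1 and not token.startswith(('"', "$")):
+        return float(token) if any(c in token for c in ".E") else int(token)
+    pieces = []
+    for part in parts:
+        if part.startswith('"'):
+            pieces.append(part[1:-1].replace('""', '"'))  # undo "" escaping
+        elif part.startswith(("$C(", "$ZCH(")):
+            codes = part[part.index("(") + 1 : -1].split(",")
+            pieces.append("".join(chr(int(c)) for c in codes))
+        else:
+            raise ValueError(f"unexpected ZWR token: {part!r}")
+    return "".join(pieces)
+
+
+def parse_line(line: str) -> ZWRNode:
+    raw = line.rstrip("\r\n")
+    key, *rest = _split_top(raw, "=")  # first top-level '=' outside quotes splits key/value
+    if not rest:
+        raise ValueError(f"not a ZWR record: {raw!r}")
+    value_tok = "=".join(rest)
+    is_global = key.startswith("^")
+    name, _, sub_text = (key[1:] if is_global else key).partition("(")
+    subs = [_decode(s) for s in _split_top(sub_text[:-1], ",")] if sub_text else []
+    value = _decode(value_tok)
+    # Keep canonical numbers as their original text so `value` is always a str.
+    return ZWRNode(name, subs, value if isinstance(value, str) else value_tok, is_global, raw)
+
+
+def load_file(path: Path) -> Iterator[ZWRNode]:
+    """Stream nodes, skipping extract header lines (node records start with '^')."""
+    with path.open(encoding="utf-8", errors="surrogateescape") as f:
+        for line in f:
+            if line.startswith("^"):
+                yield parse_line(line)
+
+
+def filter_prefix(nodes: Iterable[ZWRNode], prefix: str) -> Iterator[ZWRNode]:
+    return (n for n in nodes if n.name.startswith(prefix))
+```
+
+For example, `parse_line('^PAT(1,"NAME")="SMITH,JOHN"')` gives the name `PAT`, the subscripts `[1, "NAME"]`, and the value `SMITH,JOHN`. Because `load_file` is a generator, memory use stays flat no matter how large the extract is.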
+
+This library is ~200 lines of Python and unlocks five major gaps and three moderate gaps simultaneously.
+
+---
+
+### A.4 — The Instrumentation Layer: Observability Without a Profiler
+
+Several gaps — coverage (line and branch), profiling, and benchmarking — require the ability to observe what code ran and how often. YottaDB provides no built-in profiler and no coverage instrumentation. However, MUMPS is a text-based language that can be pre-processed before execution, which makes source-level instrumentation practical and portable.
+
+**Source Instrumentation vs Runtime Instrumentation**
+
+Runtime instrumentation in YottaDB means setting `ZBREAK` breakpoints, which attach actions to specific labels or offsets. `ZBREAK label+offset^routine:"action"` XECUTEs `action` (a string of M code) when execution reaches that point. This is powerful for interactive debugging but has serious limitations for automated tooling:
+
+- ZBREAK actions are set programmatically in a running YDB session; there is no way to inject them from outside
+- ZBREAK does not survive process restarts
+- ZBREAK on every line of every routine has unmeasured overhead and adds fragility
+- ZBREAK cannot easily instrument branch-level decisions
+
+Source instrumentation is the alternative: a Python preprocessor reads each `.m` file, injects counter-increment statements into the source, writes modified `.m` files to a temporary directory, and runs the test suite against the modified source. After the suite completes, the counters (stored in a YDB global) are read back and converted into a coverage or profile report.
+
+**The Instrumentation Pattern**
+
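+The preprocessor identifies instrumentation points by walking the parsed AST (using `tree-sitter-m` from A.1) or, as a simpler bootstrap, by line-level heuristics: every line that begins with a label, and every line that follows a line containing a branching command (`IF`, `FOR`, `DO`). At each point it prepends:
+
+```mumps
+ SET %ycov=$INCREMENT(^ycov(routineName,labelName,lineOffset))
+```
+
+Here `routineName`, `labelName`, and `lineOffset` stand for the literal values the preprocessor fills in. A bare `$INCREMENT(...)` is not a valid command, and the shorter `IF $INCREMENT(...)` idiom would overwrite `$TEST` and break any `ELSE` that follows, so the return value is discarded into a throwaway local (`%ycov`). `^ycov` is the instrumentation global. After the test run, reading `^ycov` with `$ORDER` loops gives the full execution frequency table. A Python script reads this table via `ydb_get` or by `mupip extract`-ing `^ycov` and parsing the ZWR output.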
+
+**Why Source Instrumentation is Practical**
+
+- The modified `.m` source is valid MUMPS and runs in any standard YDB environment without configuration changes
+- The instrumentation global survives across process boundaries, accumulating across all test processes in a parallel run
+- The overhead is one atomic `$INCREMENT` of a global node (a database update) per instrumented line: negligible for correctness testing, acceptable for coverage, and well-characterized for profiling
+- Resetting instrumentation is a single `KILL ^ycov` before the test run
+- The same mechanism serves three different consumers: coverage reporter, profiler, and benchmarking harness, differing only in which metrics they extract from `^ycov`
+
+**Coverage vs Profiling Distinction**
+
+Coverage asks: which lines were executed at all (binary)? Profiling asks: how many times was each line executed, and in what proportion (quantitative)? The `^ycov` counter serves both: coverage is `$DATA(^ycov(r,l,o)) > 0`, profiling is the raw counter value. Branch coverage requires observing both outcomes of each conditional. Inject one counter before the conditional and one at the start of its branch body. The branch went both ways when the body counter is non-zero but smaller than the counter before the conditional.
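+
+Turning the counters into coverage and profile numbers is a small reduction over the extract. The sketch below is a hedged illustration, not a specified format. It assumes `^ycov` was extracted to ZWR with the three subscripts used in the injected counter above (routine, label, line offset). `read_counters`, `line_coverage`, `hottest`, and `branch_both_ways` are hypothetical names. For branches, it assumes one counter before the conditional and one at the start of the branch body.
+
+```python
+import re
+from collections import Counter
+from typing import Iterable
+
+# Matches extract lines such as ^ycov("ROUTINE","LABEL",3)=17. The subscript
+# shape (routine, label, line offset) is an assumption, not a fixed spec.
+_YCOV = re.compile(r'^\^ycov\("([^"]*)","([^"]*)",(\d+)\)=(\d+)$')
+
+
+def read_counters(lines: Iterable[str]) -> Counter:
+    """Map (routine, label, offset) -> execution count; non-matching lines are skipped."""
+    counts: Counter = Counter()
+    for line in lines:
+        m = _YCOV.match(line.strip())
+        if m:
+            routine, label, offset, n = m.groups()
+            counts[(routine, label, int(offset))] = int(n)
+    return counts
+
+
+def line_coverage(counts: Counter, points: set) -> float:
+    """Binary view: percentage of instrumented points whose counter is > 0."""
+    if not points:
+        return 100.0
+    return 100.0 * sum(1 for p in points if counts[p] > 0) / len(points)
+
+
+def hottest(counts: Counter, n: int = 5) -> list:
+    """Quantitative view: the n most frequently executed points."""
+    return counts.most_common(n)
+
+
+def branch_both_ways(before: int, body: int) -> bool:
+    """True if the branch body ran on some, but not all, evaluations of its conditional."""
+    return 0 < body < before
+```
+
+Coverage and profiling read the same `Counter`. Only the reduction differs, which is the point of sharing one `^ycov` global across both consumers.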
+
+---
+
+### A.5 — Per-Gap Remediation (Major Gaps 🔴)
+
+---
+
+#### Gap 1 — Lint: Style 🔴
+
+**Domain Analysis**
+
+Style linting is the enforcement of formatting and naming conventions at the syntactic level: indentation consistency, label naming conventions, command capitalization (MUMPS commands are case-insensitive; a codebase that mixes `SET` and `set` and `Set` is harder to read), line length, spacing around operators. Solving this requires recognizing syntactic structures — a regex over raw text cannot reliably distinguish a command keyword from a string value that happens to contain the same characters.
+
+**Language/Technology Candidates**
+
+| Option | Strengths | Weaknesses | Maturity |
+|---|---|---|---|
+| Python + `tree-sitter-m` AST visitor | Pythonic, easy rule addition, integrates with existing ycheck; **parser already exists** (99.06% on VistA corpus) | Requires the published Python binding (or a local install of `tree-sitter-m`) | High (tree-sitter Python bindings are mature) |
+| Python + regex heuristics | No parser prerequisite, deployable now | Fragile at edge cases, false positives on strings containing keywords | Medium — superseded now that the parser exists |
+| Rust + `tree-sitter-m` Rust binding | Fast, single binary; same grammar source-of-truth | Higher contributor barrier than Python | Medium |
+| Go + `tree-sitter-m` Go binding | Single binary distribution | Less ergonomic AST walking than Python | Medium |
+| JavaScript/Node + `tree-sitter-m` Node binding | Same grammar; trivial integration in editor extensions | Unexpected runtime dependency for a CLI MUMPS tool | Medium |
+
+**Recommended Approach**
+
+Python 3.11+ with the [`tree-sitter-m`](https://github.com/rafael5/tree-sitter-m) Python binding as the parse layer (the parser is already complete — 99.06% clean on the full 39,330-routine VistA corpus, 78.6 ms for a 10k-line synthesised routine). The rule engine is a visitor over the tree-sitter `Node` API: walk the tree with `tree.walk()` (or `node.children` recursion), match on `node.type`, and accumulate violations. Style rules should be configurable via a `ylint.toml` file (using Python's `tomllib` standard library). Output should use `rich` for terminal color and support `--format=json` for CI integration.
+
+For per-keyword rules (e.g. command-keyword capitalization), use the metadata join exposed by `tree-sitter-m`'s `lib/stamp.js` (`canonical_name`, `matched_form`, `standard_status`) to drive checks against the canonical form, not by hand-listing keywords.
+
+The rule set for style linting: command capitalization enforcement, consistent label casing (all-uppercase or CamelCase, not mixed), maximum line length (configurable, default 120), trailing whitespace, spacing after command keywords, dot-block indentation consistency.
+
+**Precedent / Inspiration**
+
+This mirrors `pylint`'s convention rules (C-prefixed messages) or ESLint's stylistic rules. The closest analogue is `rustfmt --check` in Rust: it does not modify files but exits non-zero if the file would be reformatted, enabling CI enforcement without auto-modification.
+
+**Implementation Sketch**
+
+1. Parse each `.m` file into a tree-sitter parse tree via the `tree-sitter-m` Python binding
+2. Walk the tree with a `StyleVisitor` that checks each rule against relevant node types
+3. Accumulate violations as `(file, line, col, rule_id, message)` tuples
+4. Format and output; exit non-zero if any violations found
+5. Suppression comments: `; ylint:disable=<rule_id>` on a line suppresses that rule for that line
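+
+Steps 2 to 5 can be sketched as follows. This is a minimal, hypothetical sketch. It relies only on the standard py-tree-sitter `Node` attributes (`type`, `text`, `start_point`, `children`). The node type name `command_keyword` is a placeholder, not a confirmed `tree-sitter-m` node type, and rule IDs such as `S001` are invented for illustration.
+
+```python
+from __future__ import annotations
+
+import re
+from typing import Callable, Iterator, NamedTuple, Optional, Protocol, Sequence
+
+
+class Node(Protocol):
+    """The subset of the py-tree-sitter Node API this sketch relies on."""
+    type: str
+    text: bytes
+    start_point: tuple[int, int]  # (row, column), both 0-based
+    children: Sequence["Node"]
+
+
+class Violation(NamedTuple):
+    file: str
+    line: int  # 1-based
+    col: int   # 1-based
+    rule_id: str
+    message: str
+
+
+# A rule inspects one node and returns a message, or None if the node is fine.
+Rule = Callable[[Node], Optional[str]]
+
+_SUPPRESS = re.compile(r";\s*ylint:disable=([\w,-]+)")
+
+
+def walk(node: Node) -> Iterator[Node]:
+    """Pre-order traversal via node.children (a TreeCursor avoids deep recursion)."""
+    yield node
+    for child in node.children:
+        yield from walk(child)
+
+
+def suppressed(lines: list[str], line_no: int, rule_id: str) -> bool:
+    """Honour a trailing '; ylint:disable=<rule_id>[,...]' comment on that line."""
+    m = _SUPPRESS.search(lines[line_no - 1]) if line_no <= len(lines) else None
+    return bool(m) and rule_id in m.group(1).split(",")
+
+
+class StyleVisitor:
+    def __init__(self, rules: dict[str, tuple[str, Rule]]):
+        self.rules = rules  # rule_id -> (node type the rule applies to, predicate)
+
+    def check(self, file: str, root: Node, source: str) -> list[Violation]:
+        lines = source.splitlines()
+        found = []
+        for node in walk(root):
+            for rule_id, (node_type, rule) in self.rules.items():
+                if node.type != node_type:
+                    continue
+                message = rule(node)
+                row, col = node.start_point
+                if message and not suppressed(lines, row + 1, rule_id):
+                    found.append(Violation(file, row + 1, col + 1, rule_id, message))
+        return found
+
+
+def uppercase_keyword(node: Node) -> Optional[str]:
+    """Example rule: command keywords should be written in uppercase."""
+    word = node.text.decode()
+    return None if word.isupper() else f"command keyword {word!r} should be uppercase"
+```
+
+With a real tree, `root` would be the `root_node` of the tree that a py-tree-sitter `Parser` configured with the `tree-sitter-m` language returns. The exact import path depends on how the binding is published. The CLI would then exit non-zero when `check` returns any violations.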
+
+**Integration Path**
+
+`ycheck --style` calls the style linter on all `.m` files in the project. `make check` includes style checking. Pre-commit hook runs `ycheck --style --format=json` and fails the commit on violations.
+
+---
+
+#### Gap 2 — Lint: Logic 🔴
+
+**Domain Analysis**
+
+Logic linting detects semantic errors in syntactically valid code: variables that are SET but never read, labels defined but never called, `QUIT` missing from a function that should return a value, and unreachable code after an unconditional `QUIT` or `GOTO`. These checks require understanding control flow, not just syntax. At minimum, a control-flow graph (CFG) per routine is needed. Full data-flow analysis (which variables are live at which points) requires more: a def-use chain over the CFG.
+
+**Language/Technology Candidates**
+
+| Option | Strengths | Weaknesses | Maturity |
+|---|---|---|---|
+| Python + networkx CFG | networkx is mature, well-documented, pure Python | networkx graphs can be slow for very large routines | High |
+| Python + custom CFG | Lightweight, no external dependency | More code to maintain | Medium |
+| Clang-Tidy style (C++ AST) | Very mature model | Wrong language entirely for this project | N/A |
+| Semgrep patterns | Easy rule authorship in YAML | Semgrep's MUMPS support is nonexistent | Low |
+| Datalog (Soufflé) | Datalog-style analysis underpins real static analyzers (Doop runs on Soufflé; CodeQL's QL is Datalog-derived) | Extreme complexity overhead for a hobbyist tool | Low |
+
+**Recommended Approach**
+
+Python 3.11+ with `tree-sitter-m`'s parse tree converted to a CFG using `networkx.DiGraph`. Each routine becomes a directed graph where nodes are basic blocks (sequences of statements with no branches) and edges represent control transfers (`IF`, `FOR`, `DO`, `GOTO`, `QUIT`). Logic rules are then graph queries:
+
+- **Unused variables**: SET nodes whose variable never appears as a read in any downstream node (def-use chain)
+- **Unreachable code**: nodes with no incoming edges after the entry node
+- **Missing QUIT**: routines that have a call-return usage pattern but lack a `QUIT expr` on all exit paths
+- **Naked reference usage**: flag any `^(` as a warning unless it is inside a routine that explicitly sets the last-used global
+
+**Precedent / Inspiration**
+
+This is the equivalent of `go vet` in the Go ecosystem: a lightweight correctness checker that ships with the language toolchain and catches common mistakes without requiring full type inference. Go's `go vet` includes `unreachable`, `unusedresult`, and `lostcancel` checks, all implemented as analysis passes (`golang.org/x/tools/go/analysis`) over the parsed, type-checked source rather than over the compiler's own IR.
+
+**Implementation Sketch**
+
+1. Parse `.m` files into AST (shared with style linter)
+2. For each routine, build a CFG: label nodes as basic block headers, add edges for each branch target
+3. Run a reachability analysis from the entry label to mark live blocks
+4. Run a def-use pass: collect every `SET var` as a definition and every read of `var` in an expression as a use; report variables defined but never used (excluding intentional throwaway variables such as `%`)
+5. Check `QUIT` coverage: for routines invoked with `$$` (extrinsic function syntax), verify all paths reach `QUIT expr`
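+
+Steps 3 and 4 reduce to a graph search plus set arithmetic. The hedged sketch below uses only the standard library. With networkx, the reachable set would be `{entry} | nx.descendants(g, entry)`. The `Block` type and block names are hypothetical, and the sketch assumes the CFG has already been built from the AST. The unused-variable check is flow-insensitive. Because M locals stay visible to called code unless `NEW`ed, a production rule would also have to inspect callees before reporting a variable as unused.
+
+```python
+from __future__ import annotations
+
+from collections import deque
+from dataclasses import dataclass, field
+
+
+@dataclass
+class Block:
+    """A basic block: straight-line statements with no internal branches."""
+    defs: set[str] = field(default_factory=set)    # variables SET in this block
+    uses: set[str] = field(default_factory=set)    # variables read in this block
+    succ: list[str] = field(default_factory=list)  # names of successor blocks
+
+
+def reachable(cfg: dict[str, Block], entry: str) -> set[str]:
+    """Breadth-first search from the entry block over successor edges."""
+    seen, queue = {entry}, deque([entry])
+    while queue:
+        for nxt in cfg[queue.popleft()].succ:
+            if nxt not in seen:
+                seen.add(nxt)
+                queue.append(nxt)
+    return seen
+
+
+def unreachable(cfg: dict[str, Block], entry: str) -> set[str]:
+    """Blocks with no path from the entry label."""
+    return set(cfg) - reachable(cfg, entry)
+
+
+def unused_variables(cfg: dict[str, Block], entry: str,
+                     ignore: frozenset[str] = frozenset({"%"})) -> set[str]:
+    """Variables SET in a live block but read in no live block (flow-insensitive)."""
+    live = reachable(cfg, entry)
+    defined = set().union(*(cfg[b].defs for b in live))
+    used = set().union(*(cfg[b].uses for b in live))
+    return defined - used - ignore
+```
+
+Restricting the def-use pass to live blocks means that a read inside dead code does not hide an otherwise unused variable.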
+
+**Integration Path**
+
+`ycheck --logic` runs logic linting. Violations include severity levels: ERROR for unreachable code and missing returns, WARN for unused variables. `make check` gates on zero ERRORs.
+
+---
+
+#### Gap 3 — Lint: Deep (Data Flow) 🔴
+
+**Domain Analysis**
+
+Deep linting goes beyond control flow to data flow: tracking where data values originate, how they transform, and where they are consumed. In MUMPS, this is particularly valuable for global variable data flow — tracking which routines write to which globals, which routines read from them, and whether there are routines that read a global path that is never written anywhere in the codebase (a potential runtime error source). This is a whole-program analysis, not per-routine.
+
+**Language/Technology Candidates**
+
+| Option | Strengths | Weaknesses | Maturity |
+|---|---|---|---|
+| Python + networkx (whole-program call/data graph) | Reuses CFG infrastructure from Gap 2 | Interprocedural analysis is significantly more complex | High (networkx), Low (MUMPS impl) |
+| Soufflé Datalog | Purpose-built for program analysis, declarative rules | High operational complexity, separate toolchain | Medium |
+| Python + custom taint analysis | Taint tracking is a well-understood pattern | Requires full inter-procedural call graph | Medium |
+| Joern (code property graphs) | Used in security research | JVM-based (written in Scala), no MUMPS support, heavyweight | Low |
+| LLVM-based IR | Maximum power | Requires MUMPS→LLVM IR frontend, enormous effort | Research-grade |
+
+**Recommended Approach**
+
+Python 3.11+ with `networkx` for both the call graph and data flow graph. The analysis is structured as three passes:
+
+1. **Call graph extraction**: Build a directed graph of `routine A calls routine B` by identifying `DO label^routine` and `$$label^routine()` patterns in the AST
+2. **Global access extraction**: For each routine, extract the set of `{global_name: access_type}` where `access_type` is READ, WRITE, KILL, or LOCK
+3. **Flow query layer**: The `ycheck --dataflow` command answers specific queries: "which routines write to `^patients`?", "which globals are read but never written?", "is `^tempwork` killed after every use path?"
+
+The global access extraction is the novel part. It requires resolving indirection when possible (`@var` where `var` is a locally-set constant) and flagging the access as UNKNOWN when not.
+
+**Precedent / Inspiration**
+
+In intent this is whole-program reasoning, as in TypeScript's project-wide type checking. In implementation it is closer to call-graph analysis: the graphs `pycallgraph` draws (though `pycallgraph` records them at runtime), or the inter-procedural analysis in `pyright`. The closest direct analogue is the `cargo-geiger` tool in Rust, which scans a crate and its dependencies for `unsafe` usage.
+
+**Implementation Sketch**
+
+1. Build call graph for entire project using AST from all `.m` files
+2. Annotate each node with its global access set
+3. Propagate annotations up the call graph (a routine that calls a routine that writes `^X` transitively writes `^X`)
+4. Report: globals read but never written anywhere; globals written but never read anywhere; globals that cross routine boundaries without any documented interface
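+
+Step 3's transitive propagation is a fixpoint over the call graph. It terminates even on recursive call cycles because the access sets only grow. Below is a minimal sketch that assumes the call graph and each routine's direct global-access sets have already been extracted from the AST. `propagate`, `read_never_written`, and `to_dot` are hypothetical names. `to_dot` shows one way the DOT output for visualization could be produced.
+
+```python
+from __future__ import annotations
+
+from collections import defaultdict
+
+
+def propagate(calls: dict[str, set[str]],
+              direct: dict[str, dict[str, set[str]]]) -> dict[str, dict[str, set[str]]]:
+    """Transitive global access per routine, iterated to a fixpoint.
+
+    calls:  routine -> routines it calls (DO label^routine, $$label^routine())
+    direct: routine -> {global: {"READ", "WRITE", "KILL", "LOCK"}} from its own code
+    """
+    routines = set(calls) | set(direct) | {c for cs in calls.values() for c in cs}
+    acc = {r: defaultdict(set, {g: set(k) for g, k in direct.get(r, {}).items()})
+           for r in routines}
+    changed = True
+    while changed:
+        changed = False
+        for r in routines:
+            for callee in calls.get(r, ()):
+                for g, kinds in list(acc[callee].items()):
+                    if not kinds <= acc[r][g]:  # caller inherits callee's access
+                        acc[r][g] |= kinds
+                        changed = True
+    return {r: dict(m) for r, m in acc.items()}
+
+
+def read_never_written(direct: dict[str, dict[str, set[str]]]) -> set[str]:
+    """Globals some routine reads that no routine in the project writes."""
+    def having(kind: str) -> set[str]:
+        return {g for acc in direct.values() for g, k in acc.items() if kind in k}
+    return having("READ") - having("WRITE")
+
+
+def to_dot(calls: dict[str, set[str]]) -> str:
+    """Graphviz DOT text for the call graph, suitable for `dot -Tsvg`."""
+    edges = [f'  "{r}" -> "{c}";' for r in sorted(calls) for c in sorted(calls[r])]
+    return "\n".join(["digraph calls {", *edges, "}"])
+```
+
+The "read but never written" report deliberately uses direct access only. A propagated read would attribute the same global to every caller, which does not change whether any routine writes it.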
+
+**Integration Path**
+
+`ycheck --dataflow` for interactive use; `ycheck --dataflow --format=dot | dot -Tsvg > callgraph.svg` for visualization. Not included in `make check` by default (too slow for routine CI) but available as `make analyze`.
+
+---
+
+#### Gap 4 — Auto-Formatter 🔴
+
+**Domain Analysis**
+
+An auto-formatter transforms source code into a canonical style without changing its semantics. For MUMPS, the stakes are higher than for most languages: because `IF`/`ELSE` interact through `$TEST`, and because dot-block nesting is semantically significant whitespace, a formatter that makes incorrect assumptions will silently break code. This is a correctness-critical tool that must be grounded in a complete semantic model of MUMPS control flow, not just syntactic pretty-printing.
+
+**Language/Technology Candidates**
+
+| Option | Strengths | Weaknesses | Maturity |
+|---|---|---|---|
+| Python + `tree-sitter-m` (Python binding) | `node.text` over the byte-range tree gives lossless source reconstruction; **parser already exists** | Comment preservation requires explicit handling of trivia between named nodes | High |
+| Python + line-level transformer | Simpler, no full parser required | Cannot reformat multi-line constructs | Low — superseded |
+| Rust + `tree-sitter-m` (Rust binding) | Byte ranges (`node.byte_range()`, `node.utf8_text(source)`) enable lossless formatting; single binary | Higher contributor barrier than Python | Medium |
+| Go + `tree-sitter-m` (Go binding) | Single-binary distribution | Less ergonomic AST walking | Medium |
+| Haskell + Prettify/Pandoc-style | Algebraic pretty-printing is elegant | Extreme contributor barrier | Research |
+
+**Recommended Approach**
+
+Python 3.11+ with the [`tree-sitter-m`](https://github.com/rafael5/tree-sitter-m) Python binding. Tree-sitter's parse tree is naturally a CST: every byte of the input is covered by some node, and `node.text` (or `source[node.start_byte:node.end_byte]`) lets the formatter reconstruct trivia (comments, whitespace) that lives between named children. The key insight from tools like `rustfmt` and `prettier` is that a formatter must preserve comments and whitespace that carry meaning — tree-sitter's lossless byte-range model is exactly that property.
+
+The formatter outputs canonical MUMPS: uppercase command keywords, single space after command keyword, postconditionals attached with no space (`SET:cond`), dot-blocks indented with exactly one dot plus one space per level, blank lines between top-level routines, and label names normalized to the project-configured convention.
+
+**Precedent / Inspiration**
+
+`rustfmt` in the Rust ecosystem parses source with the compiler's own parser, lays each node out with width-aware rules, and recovers comments from the original source spans. `prettier` in JavaScript takes a comparable approach, with a layout algorithm based on Philip Wadler's "A Prettier Printer". The key lesson from both is twofold. Formatting must be idempotent (formatting a formatted file is a no-op), and the formatter must be integrated with the parser: standalone regex-based formatters are always broken in edge cases.
+
+**Implementation Sketch**
+
+1. Parse `.m` file into a tree-sitter parse tree via the `tree-sitter-m` Python binding; keep the original `source: bytes` alongside the tree so trivia spans can be reconstructed via byte ranges
+2. Walk the tree with a `Formatter` class that maintains indent level and line width
+3. For each node type, emit the canonical form: commands as uppercase, spaces canonically placed
+4. Preserve comment nodes in their original relative positions (before or after the statement they annotate)
+5. Detect `IF`/`ELSE` pairs and emit them with a warning if the formatter cannot statically verify semantic equivalence (to avoid the `$TEST` trap)
+6. Write output to stdout; use `--in-place` flag to overwrite; use `--check` for CI enforcement
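+
+The three output modes and the idempotence rule from the `rustfmt` precedent can share one driver. A sketch, assuming the formatter is exposed as a plain `bytes -> bytes` callable; `run` and its keyword arguments are illustrative, not an existing `yfmt` interface:
+
+```python
+import sys
+from pathlib import Path
+from typing import Callable
+
+Formatter = Callable[[bytes], bytes]
+
+
+def run(paths: list[Path], fmt: Formatter, *, check: bool = False,
+        in_place: bool = False) -> int:
+    """Return an exit code: 0 when clean, 1 when check mode finds drift."""
+    drift = False
+    for path in paths:
+        original = path.read_bytes()
+        formatted = fmt(original)
+        if fmt(formatted) != formatted:  # idempotence: a second pass must be a no-op
+            raise RuntimeError(f"formatter is not idempotent on {path}")
+        if check:
+            if formatted != original:
+                print(f"would reformat {path}", file=sys.stderr)
+                drift = True
+        elif in_place:
+            if formatted != original:
+                path.write_bytes(formatted)
+        else:
+            sys.stdout.buffer.write(formatted)  # default: formatted text to stdout
+    return 1 if drift else 0
+```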
+
+**Integration Path**
+
+`yfmt file.m` for single file. `yfmt --check` in `make check`. Pre-commit hook runs `yfmt --check` and fails if formatting differs. `make fmt` runs `yfmt --in-place` on all `.m` files.
+
+---
+
+> **Note on coverage:** This Addendum keeps per-gap entries only for the **unbuilt** tools — the strategic forward-looking content. Per-gap entries for Gaps 5, 6, 12–16 (Major) and 18, 19, 21, 23–31, 34, 36 (Moderate) covered tools that have since shipped; their as-built specifications live in [implementation.md §3](implementation.md#3-as-built-tool-specifications) and the per-tool deltas in [implementation.md §5.1](implementation.md#51-per-tool-delta-vs-original-spec). The remaining unbuilt-tool entries continue below: Gaps 7, 8, 10, 11 (Major) and Gaps 17, 20, 22, 32, 33, 35 (Moderate).
+
+---
+
+#### Gap 7 — Coverage: Line 🔴
+
+**Domain Analysis**
+
+Line coverage measures which source lines were executed during the test suite. For MUMPS, there is no built-in coverage facility. The solution is source instrumentation as described in A.4: inject counter increments before each executable line, run the suite, read the counters.
+
+**Recommended Approach**
+
+Python 3.11+ source instrumentor (`ycov.py`) that uses the [`tree-sitter-m`](https://github.com/rafael5/tree-sitter-m) Python binding to identify every executable line (every line with a command, not blank lines or comment-only lines), injects a `$INCREMENT(^ycov("L",routine,linenum))` counter bump as the first command on that line (wrapped in a `SET` to a scratch local: M cannot execute a bare function call, and `SET` leaves `$TEST` untouched where an `IF` would not), writes instrumented `.m` files to a temporary directory (preserving the directory structure), runs `yeval` against the instrumented source, reads back `^ycov` via `mupip extract` into a ZWR file, and computes line coverage as `executed_lines / executable_lines * 100`.
+
+The parser is used to distinguish executable lines from label-only lines and comment-only lines, which must not be counted in the denominator. `tree-sitter-m`'s `line` / `label` / `comment` node types make this a direct query.
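+
+A sketch of the injection step, assuming the parser has already supplied, for each executable line, the column where its first command starts (after any label, formal list, and dot-block prefix). The counter is wrapped in `SET` because M has no bare function-call statement, and `SET`, unlike `IF`, leaves `$TEST` alone. The scratch local `%ycov` is a hypothetical name. The `^ycov` reference also resets the naked indicator, so code that relies on an earlier naked reference (`^(...)`) needs separate handling.
+
+```python
+# %ycov is a hypothetical scratch local that receives the counter's new value.
+COUNTER = 'SET %ycov=$INCREMENT(^ycov("L","{routine}",{line})) '
+
+
+def instrument(lines: list[str], first_cmd_col: dict[int, int],
+               routine: str) -> list[str]:
+    """Insert a line counter before the first command of each executable line.
+
+    first_cmd_col maps 1-based line numbers to the 0-based column of the
+    line's first command; lines absent from it (label-only, comment-only,
+    blank) are left untouched and are not part of the denominator.
+    """
+    out = []
+    for lineno, text in enumerate(lines, start=1):
+        col = first_cmd_col.get(lineno)
+        if col is None:
+            out.append(text)
+        else:
+            counter = COUNTER.format(routine=routine, line=lineno)
+            out.append(text[:col] + counter + text[col:])
+    return out
+```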
+
+**Precedent / Inspiration**
+
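+`coverage.py` in Python measures line execution through the interpreter's tracing hooks rather than by rewriting source. `gcov` in C/C++ instruments at compile time. The closest analogue is `istanbul` (JavaScript), which parses source into an AST, inserts counter statements, and regenerates the code: the same parse-then-inject design proposed here, and the reason a text-level (regex) instrumentor is not enough for multi-statement lines.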
+
+**Implementation Sketch**
+
+1. For each `.m` file, parse with `tree-sitter-m` to get line-type annotations (command line, label, comment, blank)
+2. Inject `$INCREMENT(^ycov("L",routineName,lineNum))`, wrapped in a `SET`, before the first command on each command line
+3. Write instrumented files to `/tmp/ycov_src/`
+4. Prepend `/tmp/ycov_src/` to `ydb_routines` so the instrumented routines shadow the originals
+5. Run `yeval`
+6. `mupip extract /tmp/ycov.zwr` to get `^ycov` data
+7. Parse ZWR, compute coverage per routine and project-wide
+8. Output: per-routine table with line counts and coverage %; highlight uncovered lines
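+
+Steps 6 and 7 reduce to a small parser and a ratio. A sketch, assuming the extract contains lines of the form `^ycov("L","ROUTINE",12)=3` (header lines and other globals are skipped by the regex) and that the per-routine sets of executable lines come from the parse in step 1; the function names are illustrative:
+
+```python
+import re
+from collections import defaultdict
+
+ZWR_LINE = re.compile(r'^\^ycov\("L","([^"]+)",(\d+)\)=(\d+)$')
+
+
+def parse_counts(zwr_text: str) -> dict[str, dict[int, int]]:
+    """Map routine -> {line number: execution count} from a ZWR extract."""
+    counts: dict[str, dict[int, int]] = defaultdict(dict)
+    for raw in zwr_text.splitlines():
+        m = ZWR_LINE.match(raw.strip())
+        if m:  # header lines and unrelated globals do not match
+            counts[m.group(1)][int(m.group(2))] = int(m.group(3))
+    return dict(counts)
+
+
+def line_coverage(executable: dict[str, set[int]],
+                  counts: dict[str, dict[int, int]]) -> dict[str, float]:
+    """Percentage of executable lines with a non-zero count, per routine."""
+    report = {}
+    for routine, lines in executable.items():
+        hits = counts.get(routine, {})
+        executed = sum(1 for ln in lines if hits.get(ln, 0) > 0)
+        report[routine] = 100.0 * executed / len(lines) if lines else 100.0
+    return report
+```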
+
+**Integration Path**
+
+Replaces today's `ycover` (label-entry approximation). `make coverage` runs `ycov --report=term`. `make coverage-html` produces HTML report. CI posts coverage percentage as a check status.
+
+---
+
+#### Gap 8 — Coverage: Branch 🔴
+
+**Domain Analysis**
+
+Branch coverage measures whether both sides of every conditional were executed. In MUMPS, the relevant branching constructs are `IF condition` (true branch executed / false branch — fall through to `ELSE` or next command), postconditionals `CMD:condition` (command executed / skipped), and `FOR` loop entry/exit. Branch coverage requires more precise instrumentation than line coverage.
+
+**Recommended Approach**
+
+Extend `ycov.py` with a branch instrumentation mode. For each `IF condition`, inject two counters: `^ycov("BT",routine,linenum)` (branch true: increment before the if-body) and `^ycov("BF",routine,linenum)` (branch false: increment before the `ELSE` body or, if there is no `ELSE`, in a synthetic `ELSE` block appended after the `IF`). Because `ELSE` reads `$TEST`, the synthetic block is only reliable when nothing between the `IF` and it (such as a label called with `DO`) can reset `$TEST`.
+
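+For postconditionals `CMD:cond`, inject `SET:cond` and `SET:'(cond)` counter bumps before the command: the first increments `^ycov("PT",...)` (the command will execute), the second increments `^ycov("PF",...)` (the command will be skipped). The parentheses in `'(cond)` matter because M evaluates strictly left to right, so `'a=1` means `('a)=1`. This evaluates the condition up to three times, so it is only safe for side-effect-free conditions (no extrinsic calls, `$INCREMENT`, or naked references).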
+
+**Implementation Sketch**
+
+1. Extend the AST visitor over the `tree-sitter-m` tree to identify branch points (IF, ELSE, postconditionals, FOR bodies)
+2. For each branch point, inject a pair of counters (true-side and false-side)
+3. After test run, compute branch coverage: for each branch point, check if both sides have non-zero counts
+4. Report as: `branch coverage: 45/60 branches covered (75%)`
+5. Combine with line coverage report in the same output
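+
+Step 4's summary line can be computed directly from the counter pairs. A sketch, assuming counts keyed by `(kind, routine, line)` where `kind` is `BT`/`BF` for `IF` branch points and `PT`/`PF` for postconditionals; the `branch_summary` name and the data shapes are illustrative:
+
+```python
+def branch_summary(points: list[tuple[str, int, str]],
+                   taken: dict[tuple[str, str, int], int]) -> str:
+    """points: (routine, line, prefix) with prefix "B" (IF) or "P" (postconditional).
+
+    Each point contributes two branches (true side and false side); a branch
+    is covered when its counter is non-zero.
+    """
+    total = covered = 0
+    for routine, line, prefix in points:
+        for side in ("T", "F"):
+            total += 1
+            if taken.get((prefix + side, routine, line), 0) > 0:
+                covered += 1
+    pct = 100.0 * covered / total if total else 100.0
+    return f"branch coverage: {covered}/{total} branches covered ({pct:.0f}%)"
+```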
+
+**Integration Path**
+
+`ycov --branch` adds branch coverage to the standard coverage report. `make coverage` runs both by default. `ycov` can also write an LCOV tracefile, which `genhtml` turns into an HTML report.
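+
+LCOV tracefiles are plain text: `SF:` names the source file, `DA:<line>,<hits>` records line hits, `BRDA:<line>,<block>,<branch>,<taken>` records branch hits (with `-` when the line never ran), and `LF`/`LH`/`BRF`/`BRH` give found/hit totals before `end_of_record`. A sketch of a per-routine writer, assuming line hit counts keyed by line number and one `(true_hits, false_hits)` pair per branch point; `to_lcov` is a hypothetical name:
+
+```python
+def to_lcov(path: str, line_hits: dict[int, int], executable: set[int],
+            branches: dict[int, tuple[int, int]]) -> str:
+    """Render one routine's coverage as an LCOV record."""
+    out = ["TN:", f"SF:{path}"]
+    for ln in sorted(executable):
+        out.append(f"DA:{ln},{line_hits.get(ln, 0)}")
+    out.append(f"LF:{len(executable)}")
+    out.append(f"LH:{sum(1 for ln in executable if line_hits.get(ln, 0) > 0)}")
+    found = hit = 0
+    for ln in sorted(branches):
+        line_ran = line_hits.get(ln, 0) > 0
+        for idx, n in enumerate(branches[ln]):  # idx 0 = true side, 1 = false side
+            found += 1
+            hit += 1 if n > 0 else 0
+            out.append(f"BRDA:{ln},0,{idx},{n if line_ran else '-'}")
+    out += [f"BRF:{found}", f"BRH:{hit}", "end_of_record"]
+    return "\n".join(out) + "\n"
+```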
+
+---
+
+#### Gap 10 — Debugger: Scriptable 🔴
+
+**Domain Analysis**
+
+YottaDB has an interactive debugger built into the environment: ZBREAK sets breakpoints, ZSHOW shows state, and the direct-mode prompt allows inspection. The gap is that none of this is scriptable — there is no programmatic API for setting breakpoints, stepping, and inspecting state from an external process. A scriptable debugger enables automated failure analysis: when a test fails, automatically attach a debugging session to replay the failure and capture the state at the point of error.
+
+**Recommended Approach**
+
+Python 3.11+ with `pexpect` as the immediate solution, targeting the DAP (Debug Adapter Protocol) as a long-term goal. The `pexpect` driver wraps a `yottadb` direct-mode session: it sends ZBREAK commands to set breakpoints, sends `DO label^routine` to start execution, waits for the breakpoint prompt, and then sends ZSHOW commands to inspect state. The output is captured and parsed by Python.
+
+A `ydebug.py` script provides a scriptable interface: `ydebug.set_break("label", "routine")`, `ydebug.run("label", "routine")`, `ydebug.get_local("varname")`, `ydebug.get_global("^globalname", subscripts)`.
+
+**Precedent / Inspiration**
+
+`gdb` with Python scripting is the gold standard for scriptable debugging. `lldb` has an equivalent Python API. The long-term DAP implementation mirrors how `debugpy` (Python's VS Code debugger) works: a standalone process implementing the Debug Adapter Protocol.
+
+**Implementation Sketch**
+
+1. `pexpect.spawn('yottadb -direct')` starts a YDB direct-mode session
+2. Send `ZBREAK label^routine` for each requested breakpoint
+3. Send `DO entry^routine` to start execution
+4. Wait for `%YDB-I-BPNTSET` and breakpoint hit prompts
+5. At each breakpoint, send `ZSHOW "V"` to capture local variable state; parse the output
+6. Provide `step()`, `continue_()`, `inspect_local()`, `inspect_global()` methods
+
+**Integration Path**
+
+`ydebug --script=replay_failure.py` runs a debugging script against a failing test. Future: `ytest --debug YTAUTH:TESTLOGIN` automatically attaches debugger on test failure and dumps variable state.
+
+---
+
+#### Gap 11 — Dependency Management 🔴
+
+**Domain Analysis**
+
+MUMPS has no package manager. Sharing code between projects requires manually copying `.m` files, with no version pinning, conflict resolution, or transitive dependency tracking. A dependency management system for MUMPS needs: a manifest format for declaring dependencies, a registry or source location for packages, a resolver that satisfies version constraints, and an installer that places `.m` files in the correct location in `ydb_routines`.
+
+**Recommended Approach**
+
+Python 3.11+ with a `ydb-pkg` (future `m pkg`) tool using TOML manifests. The manifest format (`ydb.toml`) specifies dependencies as git repository URLs with version tags or commit hashes — exactly the Go modules model, which deliberately avoids central registry lock-in. The `ydb-pkg install` command reads `ydb.toml`, clones or fetches each dependency, checks out the specified version, and copies the `.m` files to a local `vendor/` directory. The `ydb_routines` environment variable is updated to include `vendor/`.
+
+A lockfile (`ydb.lock`) records the exact commit hash of each dependency at install time, enabling reproducible installs. The manifest format itself should be specified in `m-standard` (it is M-language metadata, not YDB-specific), and the installer can grow YDB-specific and IRIS-specific backends later.
+
+**Precedent / Inspiration**
+
+Go's module system (`go.mod` + `go.sum`) is the closest analogue: dependencies declared as URL + version, with a lockfile for reproducibility, no central registry required. `cargo` in Rust adds a central registry on top of the same model. For M, starting with the Go model is appropriate because the ecosystem is too small to justify a registry.
+
+**Implementation Sketch**
+
+1. `ydb.toml` format: `[dependencies]`, key = package name, value = `"git+https://github.com/user/repo@v1.2.3"`
+2. `ydb-pkg install` reads manifest, resolves versions, clones to `vendor/<name>/`, copies `.m` files
+3. `ydb-pkg update <name>` fetches latest tag matching the version constraint
+4. `ydb-pkg lock` generates `ydb.lock` with exact commit hashes
+5. Environment setup: `eval $(ydb-pkg env)` adds `vendor/` to `ydb_routines`
+
+**Integration Path**
+
+`make install` runs `ydb-pkg install`. `ydb_routines` in `.envrc` (direnv) includes the vendor path. CI runs `ydb-pkg install --frozen` (respects lockfile, fails if lockfile is stale).
+
+---
+
+### A.6 — Per-Gap Remediation (Moderate Gaps 🟡)
+
+Per-gap entries for the still-unbuilt Moderate gaps. (See note at the start of A.5: shipped-tool entries are in [implementation.md](implementation.md).)
+
+---
+
+#### Gap 17 — Interactive REPL 🟡
+
+**Domain Analysis**
+
+The YottaDB direct-mode prompt is an interactive REPL, but it lacks the features that make REPLs productive: command history with search, tab completion for global names and labels, multi-line expression editing with proper continuation, and syntax highlighting. The gap is not the absence of a REPL but the absence of a good REPL.
+
+**Recommended Approach**
+
+Two phases. **Immediate**: `rlwrap yottadb -direct` already adds history and basic readline editing with zero new code. Document this and add `alias ymrepl='rlwrap yottadb -direct'` to environment setup. **Proper**: Python 3.11+ with `prompt_toolkit` wrapping a `pexpect`-driven YDB process. `prompt_toolkit` provides: history with `~/.ydb_history`, tab completion (complete global names from `$ORDER`, label names from the symbol index built by `tree-sitter-m`), multi-line input (detect incomplete expressions by counting unclosed parentheses and DO blocks), and syntax highlighting using a Pygments lexer for MUMPS.
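+
+The multi-line heuristic can be sketched in a few lines. This assumes "incomplete" means either unclosed parentheses outside string literals or a trailing argumentless `DO` (optionally postconditioned) that opens a dot block; comments after `;` are ignored. `needs_more_input` is a hypothetical helper, and a line that ends by writing a variable named `D` would be a false positive.
+
+```python
+import re
+
+# Argumentless DO (or D), optionally with a postconditional, at the end of the code.
+TRAILING_DO = re.compile(r"(?:^|\s)do?(?::\S+)?\s*$", re.IGNORECASE)
+
+
+def needs_more_input(line: str) -> bool:
+    """Heuristic: True if the REPL should read a continuation line."""
+    depth, in_string, code_end = 0, False, len(line)
+    for i, ch in enumerate(line):
+        if ch == '"':
+            in_string = not in_string  # a doubled "" toggles twice: no net change
+        elif in_string:
+            continue
+        elif ch == ";":
+            code_end = i  # the rest of the line is a comment
+            break
+        elif ch == "(":
+            depth += 1
+        elif ch == ")":
+            depth -= 1
+    if depth > 0:
+        return True
+    return TRAILING_DO.search(line[:code_end]) is not None
+```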
+
+**Precedent / Inspiration**
+
+`pgcli` (PostgreSQL CLI replacement) and `mycli` (MySQL) both use `prompt_toolkit` to provide schema-aware completion, syntax highlighting, and history for database CLIs. `bpython` uses the same approach for Python. `ipython` is the reference implementation.
+
+**Integration Path**
+
+`yrepl` command launches the enhanced REPL. Falls back to `rlwrap yottadb -direct` if Python dependencies are not installed.
+
+---
+
+#### Gap 20 — Test History 🟡
+
+**Domain Analysis**
+
+Test history — tracking pass/fail/skip counts and timing trends across runs — enables identifying flaky tests, tracking coverage regression, and demonstrating progress over time. This requires persistent storage of test results and a reporting interface.
+
+**Recommended Approach**
+
+Python 3.11+ with SQLite (via the standard library `sqlite3` module). The `yhistory.py` tool maintains a `data/ydb/test_history.db` SQLite database. Each test run inserts one row per test: `(run_id, timestamp, routine, label, result, duration_ms, output)`. The `yhistory report` command shows trends: flaky tests (pass/fail in the last 10 runs), slowest tests, recent regressions.
+
+**Implementation Sketch**
+
+1. `ytest` appends results to `data/ydb/test_history.db` after each run (TAP parse → SQLite insert)
+2. Schema: `runs(id, timestamp, suite, duration)`, `results(run_id, test_id, result, duration_ms)`
+3. `yhistory trend --last=30` shows a sparkline of pass rate over time (for example, drawn with Unicode block characters)
+4. `yhistory flaky` lists tests with >10% failure rate in the last 50 runs
+5. `yhistory compare run1 run2` shows which tests changed result between runs
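+
+A sketch of the step 2 schema and the step 4 query with the standard-library `sqlite3` module. Column types, the `CHECK` constraint, and the `record_run`/`flaky_tests` helpers are assumptions. The query also requires at least one pass in the window, so a test that fails on every run is treated as broken rather than flaky:
+
+```python
+import sqlite3
+
+SCHEMA = """
+CREATE TABLE IF NOT EXISTS runs (
+    id INTEGER PRIMARY KEY, timestamp TEXT NOT NULL, suite TEXT, duration REAL);
+CREATE TABLE IF NOT EXISTS results (
+    run_id INTEGER NOT NULL REFERENCES runs(id),
+    test_id TEXT NOT NULL,
+    result TEXT NOT NULL CHECK (result IN ('pass', 'fail', 'skip')),
+    duration_ms INTEGER);
+"""
+
+FLAKY = """
+SELECT test_id, AVG(result = 'fail') AS fail_rate
+FROM results
+WHERE result != 'skip'
+  AND run_id IN (SELECT id FROM runs ORDER BY id DESC LIMIT :window)
+GROUP BY test_id
+HAVING SUM(result = 'pass') > 0 AND SUM(result = 'fail') > 0
+   AND AVG(result = 'fail') > :threshold
+ORDER BY fail_rate DESC, test_id
+"""
+
+
+def record_run(db: sqlite3.Connection, timestamp: str, suite: str,
+               results: list[tuple[str, str, int]]) -> None:
+    """Insert one run plus one (test_id, result, duration_ms) row per test."""
+    cur = db.execute("INSERT INTO runs (timestamp, suite) VALUES (?, ?)",
+                     (timestamp, suite))
+    db.executemany(
+        "INSERT INTO results (run_id, test_id, result, duration_ms) VALUES (?, ?, ?, ?)",
+        [(cur.lastrowid, test, res, ms) for test, res, ms in results])
+    db.commit()
+
+
+def flaky_tests(db: sqlite3.Connection, window: int = 50,
+                threshold: float = 0.10) -> list[tuple[str, float]]:
+    """Tests that both passed and failed in the last `window` runs."""
+    return db.execute(FLAKY, {"window": window, "threshold": threshold}).fetchall()
+```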
+
+**Integration Path**
+
+History is written automatically by `ytest`. `make history` shows trend report. CI publishes history stats as job summaries.
+
+---
+
+#### Gap 22 — Profiling 🟡
+
+**Domain Analysis**
+
+Profiling identifies which routines and labels consume the most execution time or are called most frequently. Unlike benchmarking (which measures a specific operation), profiling measures the whole system under realistic load to find bottlenecks. As described in A.4, source instrumentation is the practical approach for MUMPS profiling.
+
+**Recommended Approach**
+
+Extend `ycov.py` (Gap 7) with a profiling mode that captures both call counts (via a `^yprof` instrumentation global) and wall-clock time (via `$ZHOROLOG` at label entry/exit). The profiler report shows: top 20 labels by call count, top 20 labels by total time, top 20 labels by time-per-call. This is a flat profile (not a call graph profile), which is appropriate for MUMPS given the complexity of building a full call graph profile.
+
+For a more sophisticated sampling profiler (without source modification), the profiler can periodically interrupt a running YDB process to capture its current execution stack: `mupip intrpt <pid>` triggers the process's `$ZINTERRUPT` handler, whose default (`IF $ZJOBEXAM()`) writes a `ZSHOW` dump to a file. This is the statistical sampling approach, equivalent to how `perf` and `py-spy` work.
+
+**Implementation Sketch**
+
+1. Instrumentation mode: at each label entry, inject `$INCREMENT(^yprof(label,routine))` (wrapped in a `SET`, since M cannot execute a bare function call) and capture the `$ZHOROLOG` value
+2. Run routine under test N times
+3. Read `^yprof` via ZWR export
+4. Report: sorted by call count, then by total time
+5. Sampling mode (alternative): interrupt the target process at a fixed sampling interval (e.g., with `mupip intrpt`) and aggregate the stack frames from each `ZSHOW` dump
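+
+A sketch of the report step, assuming entry and exit `$ZHOROLOG` pairs have already been read back from `^yprof` into `(label, routine, entry, exit)` tuples. `$ZHOROLOG` has the form `days,seconds,microseconds,offset`, so elapsed time is computed from the first three pieces, which also handles calls that span midnight. The function names are hypothetical:
+
+```python
+from collections import defaultdict
+
+
+def zh_to_us(zh: str) -> int:
+    """Convert a $ZHOROLOG value to a microsecond count suitable for differencing."""
+    days, secs, micros = (int(p) for p in zh.split(",")[:3])
+    return (days * 86_400 + secs) * 1_000_000 + micros
+
+
+def flat_profile(samples: list[tuple[str, str, str, str]],
+                 top: int = 20) -> dict[str, list[str]]:
+    """Rank label^routine keys by call count, total time, and time per call."""
+    calls: dict[str, int] = defaultdict(int)
+    total_us: dict[str, int] = defaultdict(int)
+    for label, routine, entry, exit_ in samples:
+        key = f"{label}^{routine}"
+        calls[key] += 1
+        total_us[key] += zh_to_us(exit_) - zh_to_us(entry)
+    per_call = {k: total_us[k] / calls[k] for k in calls}
+
+    def rank(metric: dict) -> list[str]:
+        return sorted(metric, key=lambda k: (-metric[k], k))[:top]
+
+    return {"by_calls": rank(calls), "by_total_us": rank(total_us),
+            "by_per_call_us": rank(per_call)}
+```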
+
+**Integration Path**
+
+`yprof --routine YTPERF^TESTQUERY` runs instrumented profiling. `yprof --sample --attach <pid>` runs sampling profiler against a running YDB process. `make profile` runs instrumented profiling on the benchmark suite.
+
+---
+
+#### Gap 32 — Complexity Metrics 🟡
+
+**Domain Analysis**
+
+Complexity metrics quantify the structural complexity of code: cyclomatic complexity (number of linearly independent paths through a routine), cognitive complexity (how difficult a routine is to understand), nesting depth (maximum dot-block nesting). High-complexity routines are candidates for refactoring. Computing these metrics requires parsing the control flow structure.
+
+**Recommended Approach**
+
+Python 3.11+ complexity visitor over the `tree-sitter-m` AST. Cyclomatic complexity (CC) for a MUMPS routine is: 1 + (number of `IF` commands) + (number of `FOR` loops) + (number of postconditionals on any command, `DO:condition` included). Counting `DO:condition` only as a postconditional avoids scoring the same decision twice. This is the standard McCabe formula applied to MUMPS control flow constructs. Cognitive complexity adds weighting for nesting depth.
+
+Output: per-routine table sorted by complexity, with configurable thresholds (warn at CC > 10, error at CC > 20). Export as JSON for trend tracking.
+
+**Precedent / Inspiration**
+
+`radon` in Python computes McCabe complexity for Python code. `lizard` is a polyglot complexity analyzer. The key insight from `radon`: complexity should be tracked over time (regression detection) and integrated into code review.
+
+**Implementation Sketch**
+
+1. `tree-sitter-m` AST visitor `ComplexityVisitor` counts: `IF` nodes (+1 each), `FOR` nodes (+1 each), postconditionals on any command, `DO:cond` included (+1 each), and logical operators in conditions (`&`, `!` in M = `AND`/`OR`) (+1 each)
+2. Track maximum nesting depth (dot-block depth)
+3. `ycomplex` command reports per-routine CC, max nesting, total LOC
+4. `--max-cc=15` fails if any routine exceeds threshold
+5. History stored in SQLite for trend analysis
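+
+A sketch of the counting rules in steps 1 and 2 over a stand-in tree. The `Node` fields and the `if`/`for`/`command`/`dot_block` type names are hypothetical simplifications of whatever `tree-sitter-m` actually exposes. Each postconditional is counted once, whichever command carries it:
+
+```python
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+
+
+@dataclass
+class Node:
+    type: str                # hypothetical: "if", "for", "command", "dot_block"
+    postcond: bool = False   # command carries a :condition
+    logical_ops: int = 0     # count of & and ! operators in its conditions
+    children: list[Node] = field(default_factory=list)
+
+
+def complexity(routine: list[Node]) -> tuple[int, int]:
+    """Return (cyclomatic complexity, max dot-block depth) for one routine."""
+    cc, max_depth = 1, 0
+    stack = [(node, 0) for node in routine]
+    while stack:
+        node, depth = stack.pop()
+        if node.type in ("if", "for"):
+            cc += 1
+        if node.postcond:  # DO:cond, QUIT:cond, SET:cond: each counted once
+            cc += 1
+        cc += node.logical_ops
+        inner = depth + 1 if node.type == "dot_block" else depth
+        max_depth = max(max_depth, inner)
+        stack.extend((child, inner) for child in node.children)
+    return cc, max_depth
+```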
+
+**Integration Path**
+
+`ycomplex` command. `make complexity` runs and reports. `--max-cc` flag integrated into `ycheck`. Complexity trends shown alongside test history.
+
+---
+
+#### Gap 33 — Dead Code Detection 🟡
+
+**Domain Analysis**
+
+Dead code detection identifies code that can never be executed: labels that are defined but never called from anywhere in the codebase (unreachable labels), globals that are written but never read, and code after unconditional `QUIT` statements. This is a whole-program reachability analysis.
+
+**Recommended Approach**
+
+Python 3.11+ with `networkx` call graph analysis. The call graph is built by the data flow tool (Gap 3); dead code detection adds a reachability query: starting from known entry points (labels that are called from test entry points, from external interfaces, or marked with a `;;@export` doc annotation), traverse the call graph and collect all reachable labels. Any label not in the reachable set is potentially dead.
+
+The "potentially" qualifier is important: MUMPS uses dynamic dispatch extensively (string-valued routine and label names, `DO @labelvar^@routinevar`), so static reachability analysis will have false positives. The tool should flag dynamic dispatch sites and annotate potentially-dead-but-dynamically-called labels separately.
+
+**Precedent / Inspiration**
+
+`cargo unused-features` and rustc's built-in `dead_code` lint in Rust; `pylint`'s unused-import warnings in Python; `knip` in TypeScript. The key design lesson: dead code detection must have a clear notion of "entry points" (exported symbols), and everything reachable from entry points is live.
+
+**Implementation Sketch**
+
+1. Parse all `.m` files into AST; build call graph (from Gap 3)
+2. Identify entry points: labels called from `yeval` test entry points, labels marked `;;@export`, labels in known framework hook positions
+3. BFS/DFS from all entry points; mark all reachable labels as LIVE
+4. Report all labels not marked LIVE as potentially DEAD
+5. Flag any call sites that use string-valued routine/label names as DYNAMIC (cannot trace statically)
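+
+A sketch of steps 3 to 5 using a breadth-first search over a plain adjacency dict, standing in for the `networkx` graph to keep the example dependency-free. Keys are `label^routine` strings. `find_dead` and its conservative rule for dynamic dispatch are assumptions: if any live label contains an indirect call, every unreached label is downgraded from dead to possibly dead.
+
+```python
+from collections import deque
+
+
+def find_dead(calls: dict[str, set[str]], entry_points: set[str],
+              dynamic_sites: frozenset[str] = frozenset()) -> dict[str, set[str]]:
+    """calls maps label^routine to the labels it calls statically;
+    dynamic_sites lists labels containing indirect calls (DO @x^@y)."""
+    live = set(entry_points)
+    queue = deque(entry_points)
+    while queue:  # BFS marks everything reachable from an entry point LIVE
+        for callee in calls.get(queue.popleft(), ()):
+            if callee not in live:
+                live.add(callee)
+                queue.append(callee)
+    known = set(calls) | {c for callees in calls.values() for c in callees}
+    unreached = known - live
+    if live & dynamic_sites:  # a live indirect call could reach any label
+        return {"dead": set(), "possibly_dead": unreached}
+    return {"dead": unreached, "possibly_dead": set()}
+```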
+
+**Integration Path**
+
+`ydead` command. `make dead-code` for project-wide analysis. Not in default `make check` (too many false positives from dynamic dispatch); run as `make analyze`.
+
+---
+
+#### Gap 35 — Parallel Tests 🟡
+
+**Domain Analysis**
+
+Running tests in parallel reduces total test suite time. For MUMPS/YottaDB, parallelism is constrained by database isolation: two parallel tests that write to the same global will interfere. The solution is either global namespace partitioning (each parallel worker uses a unique global prefix) or separate database files (each worker gets its own `ydb_gbldir` pointing to an isolated database).
+
+**Recommended Approach**
+
+Python 3.11+ with `concurrent.futures.ProcessPoolExecutor`, using the **separate database files** approach for isolation: each worker process gets its own `ydb_gbldir`, pointing to a global directory whose segments map to a private copy of the baseline database files in a temporary directory. Workers are pre-provisioned (one database copy per worker), and tests are distributed to workers in a queue. Results are collected and merged by the main process.
+
+The worker count defaults to `min(cpu_count(), suite_count)`. Test suites that must share state (integration tests) are pinned to a single worker with a `;;@serial` annotation.
+
+**Precedent / Inspiration**
+
+`pytest-xdist` in Python is the model: each worker gets an isolated environment, tests are distributed via a work queue, results are streamed back. The database-per-worker approach mirrors how the `parallel_tests` gem for Rails creates a separate test database per worker.
+
+**Implementation Sketch**
+
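+1. `ytest --parallel=4` provisions 4 worker environments, `/tmp/ywrk_0` through `/tmp/ywrk_3`, each holding a copy of the baseline database files and a global directory mapped to that copy (copying the `ydb_gbldir` file alone is not enough, because a global directory records database file paths and a copy would still point at the shared database)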
+2. Distributes test list to a shared `multiprocessing.Queue`
+3. Each worker pops tests from the queue, runs with its own `ydb_gbldir`, pushes results to results queue
+4. Main process aggregates results, reports progress with `rich.progress`
+5. On test failure, worker saves its database state for post-mortem debugging
+6. Cleanup: removes worker database copies
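+
+A sketch of steps 2 to 4 that swaps `ProcessPoolExecutor` for a thread pool: each test already runs in its own YottaDB child process, so Python threads only orchestrate, and worker slots are handed out through a queue. It assumes provisioning has already created `<root>/ywrk_<n>/yottadb.gld` mapping to that worker's database copy (a hypothetical layout), and `run` is an injected callable that would launch the suite, for example with `subprocess.run(..., env=env)`:
+
+```python
+import queue
+from concurrent.futures import ThreadPoolExecutor
+from pathlib import Path
+from typing import Callable
+
+Runner = Callable[[str, dict[str, str]], str]
+
+
+def run_parallel(suites: list[str], workers: int, root: Path, run: Runner,
+                 base_env: dict[str, str]) -> dict[str, str]:
+    """Run each suite with a private ydb_gbldir; at most `workers` in flight."""
+    slots: queue.Queue[int] = queue.Queue()
+    for i in range(workers):
+        slots.put(i)
+
+    def task(suite: str) -> tuple[str, str]:
+        slot = slots.get()  # block until a worker environment is free
+        try:
+            gld = root / f"ywrk_{slot}" / "yottadb.gld"
+            env = dict(base_env, ydb_gbldir=str(gld))
+            return suite, run(suite, env)
+        finally:
+            slots.put(slot)  # hand the environment to the next suite
+
+    with ThreadPoolExecutor(max_workers=workers) as pool:
+        return dict(pool.map(task, suites))
+```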
+
+**Integration Path**
+
+`make test-parallel` runs with `--parallel=$(nproc)`. CI uses `--parallel=4`. Sequential mode (`make test`) is used for debugging.
+
+---
+
+*End of Addendum A*
+
+---
+
+## Addendum B: Prioritized Sequence of Remediation (Post-Parser)
+
+This addendum was added 2026-04-27, after `tree-sitter-m` v1.0 shipped (99.06% clean on the 39,330-routine VistA corpus) and `m-standard` v1.0 was tagged. The strategic question is no longer *should we build a parser?* — it is *which downstream tools are worth building, in what order, now that the parser dependency is satisfied?*
+
+The Tier 1–3 tools listed in Chapter 3 are already shipped. The remaining work is the Tier 4 backlog plus the umbrella-dispatcher rename. This addendum sequences that work into five phases ordered by daily friction, ecosystem unlock, and dependency depth.
+
+---
+
+### B.1 — Sequencing Principles
+
+Three criteria drive the order:
+
+1. **Daily friction first.** A tool that hurts every commit (no formatter) outranks a tool that hurts once a quarter (a debugger).
+2. **Ecosystem unlock.** A tool that unblocks others (a formatter that lets a linter assume canonical layout) outranks a self-contained tool of equal friction.
+3. **Risk-adjusted effort.** Tools with a clear analogue in another ecosystem (`yfmt` ≈ `gofmt`, `ylint` ≈ `clippy`) have lower implementation risk than novel work (a MUMPS-native package manager).
+
+Two anti-principles also apply:
+
+- **Do not bundle.** Each phase ships independently and is usable on its own. No phase blocks the next; a delay in Phase 3 must not stall Phase 4.
+- **Do not perfect.** Each tool ships a usable v0.1 first. Coverage of edge cases (`ylint-deep`'s call-graph analysis, `ydebug`'s breakpoint expressions) matures over subsequent releases.
+
+---
+
+### B.2 — Phase 1: Canonicalise the Codebase
+
+**Goal:** eliminate style debate and lock in a deterministic file layout that downstream tools can assume.
+
+| Tool | Future name | Effort | Why now |
+|------|-------------|--------|---------|
+| **`yfmt`** | `m fmt` | Medium (2–3 weeks) | No current solution. Canonical formatting is the precondition for every later visitor — a linter that fights inconsistent indentation is a much harder linter. |
+
+**Implementation:**
+- Lossless byte-range pretty-printer over the `tree-sitter-m` AST (preserves comments, blank-line groupings, trailing-comment column alignment).
+- Configuration: `m.toml` with style rules (label case, dot-block depth limits, max line length). Defaults match the "Lowercase Pythonic MUMPS" style in this project's CLAUDE.md.
+- Idempotent: `m fmt | m fmt` produces no further change. Round-trip CI test runs the formatter twice and asserts byte-identical output.
+- `--check` mode exits non-zero on any drift; wired into `yhook` and `yci`.
+
+**Exit criteria for the phase:** `yfmt --check` passes on this repo and on a representative VistA package; the output is bytewise stable across two consecutive runs.
+
+---
+
+### B.3 — Phase 2: Catch Bugs Before Runtime
+
+**Goal:** move bug categories from "found in test" to "found at edit time."
+
+| Tool | Future name | Effort | Why now |
+|------|-------------|--------|---------|
+| **`ylint-style`** | `m lint --style` | Small (1–2 weeks) | AST visitor with rule predicates; rules are mostly mechanical. Builds the lint framework itself. |
+| **`ylint-logic`** | `m lint --logic` | Medium (2–3 weeks) | Control-flow rules over the same framework: missing `QUIT`, unreachable code, undefined labels, unused locals. |
+
+**Implementation:**
+- Single binary, pluggable rule set. `m lint --style --logic` runs both groups; granular `--enable=R001,R012` for CI tuning.
+- Lint configuration in the same `m.toml` as `yfmt`. Rule severity (`error` / `warning` / `off`) is per-project.
+- Output formats: human (default), `--format=json` for editor integration, `--format=tap` for the existing `ytap` pipeline.
+- `--fix` mode for mechanically rewritable rules (e.g., trailing whitespace, missing `QUIT` at the end of a routine).
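+
+A sketch of the pluggable rule set with per-project severities read from `m.toml` via `tomllib`. The `[lint.severity]` table, the decorator-based registry, and rule ID `R001` are hypothetical; rules are plain functions over a routine's lines so the same registry can hold both the style and logic groups:
+
+```python
+import tomllib
+from dataclasses import dataclass
+from typing import Callable, Iterator
+
+Check = Callable[[list[str]], Iterator[tuple[int, str]]]
+RULES: dict[str, tuple[str, str, Check]] = {}  # id -> (group, default severity, check)
+
+
+@dataclass(frozen=True)
+class Finding:
+    rule: str
+    line: int
+    message: str
+    severity: str
+
+
+def rule(rule_id: str, group: str, severity: str = "warning"):
+    def register(check: Check) -> Check:
+        RULES[rule_id] = (group, severity, check)
+        return check
+    return register
+
+
+@rule("R001", "style")  # hypothetical ID
+def trailing_whitespace(lines: list[str]) -> Iterator[tuple[int, str]]:
+    for n, text in enumerate(lines, start=1):
+        if text != text.rstrip():
+            yield n, "trailing whitespace"
+
+
+def lint(lines: list[str], groups: set[str], config_toml: str = "",
+         enable: set[str] | None = None) -> list[Finding]:
+    severities = tomllib.loads(config_toml).get("lint", {}).get("severity", {})
+    findings = []
+    for rule_id, (group, default, check) in sorted(RULES.items()):
+        if group not in groups or (enable and rule_id not in enable):
+            continue  # group not requested, or filtered out by --enable
+        severity = severities.get(rule_id, default)
+        if severity != "off":
+            findings += [Finding(rule_id, n, msg, severity) for n, msg in check(lines)]
+    return findings
+```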
+
+**Why before `ylint-deep`:** the style + logic rules are local — they reason within a single function or routine. The deep variant needs a call graph and a symbol table that span the whole project, which is materially more work for a smaller marginal payoff. Ship the cheap, broad-coverage layer first.
+
+**Exit criteria:** `m lint --style --logic` runs cleanly on this repo's existing routines (after the formatter pass), and surfaces ≥3 real bugs when run against a noisy VistA package as a smoke test.
+
+---
+
+### B.4 — Phase 3: Replace Approximations with Truth
+
+**Goal:** retire the placeholder coverage tool with a parser-grounded replacement, and add the deeper analyses that need a call graph.
+
+| Tool | Future name | Effort | Why now |
+|------|-------------|--------|---------|
+| **`ycov-line` / `ycov-branch`** | `ydb cover --line` / `--branch` | Medium (2–3 weeks) | Today's `ycover` reports label-entry coverage only — i.e., "did we enter this label?" not "did we execute every line in it?" The parser identifies real instrumentation points (statements + branches), and the existing `^ycov` global infrastructure is reused. |
+| **`ylint-deep`** | `m lint --deep` | Large (4–6 weeks) | Builds a project-wide call graph and symbol table on top of the AST. Detects unused exports, dead labels, missing-routine references, and circular dependencies. |
+
+**Implementation notes:**
+- `ydb cover` keeps the existing `ZBREAK`-based runtime hook; the change is in instrumentation-point selection (now AST-driven) and in reporting (line + branch percentages, lcov export for IDE integration).
+- `ylint-deep` shares the rule framework from Phase 2. New rule categories: `dead-code`, `unused-export`, `unresolved-call`, `cyclic-import`. The call graph itself becomes a reusable artifact (`m graph` could ship as a thin CLI over it later).
+
+**Exit criteria:** `ydb cover --line` agrees with hand-instrumented spot checks on a small routine; `ylint-deep` correctly identifies the known dead labels in this repo's own `routines/`.
+
+---
+
+### B.5 — Phase 4: Interactive Surfaces (No Parser Dep)
+
+Apart from the `yrepl` Phase 2 completion work, these tools do not require the parser; they were previously deferred only until there was demand for them. They can be picked up in parallel with Phase 2 or Phase 3 by a separate contributor.
+
+| Tool | Future name | Effort | Notes |
+|------|-------------|--------|-------|
+| **`yrepl` Phase 1** | `ydb repl` | Small (≤1 week) | Wrap `ydb` direct mode with `rlwrap` or `prompt_toolkit` for history + multi-line editing. No parser dependency. |
+| **`yparallel`** | `ydb test --parallel` | Medium (2 weeks) | Worker pool over isolated per-worker databases (see [A.6 → Gap 35, Parallel Tests](#a6--per-gap-remediation-moderate-gaps-)). Blocked only on per-suite isolation discipline in our existing tests. |
+| **`ydebug`** | `ydb debug` | Large (3–4 weeks) | DAP server over YDB's `ZBREAK` / `ZSTEP` / `ZSHOW` primitives. Highest ceiling (full IDE step-debugging) but the lowest daily friction — a battery of `ZSHOW`s in direct mode covers most cases today. |
+| **`yrepl` Phase 2** | `ydb repl` | Small follow-on | Adds tab-completion driven by `tree-sitter-m` (after Phase 1 ships). |
+
+**Sequencing within the phase:** ship `yrepl` Phase 1 first — it is the smallest measurable win and unblocks REPL-driven exploration immediately. `yparallel` next, since it directly speeds up the existing test loop. `ydebug` last; it is the largest investment and the smallest delta over today's manual `ZBREAK` workflow.
+
+---
+
+### B.6 — Phase 5: Ecosystem Layer
+
+| Tool | Future name | Effort | Status |
+|------|-------------|--------|--------|
+| **`ydb-pkg`** | `m pkg` | Large (6+ weeks) | Blocked on the manifest-format design in `m-standard`. Once the format is specified, the installer is a relatively small shell + Python tool over a declarative TOML manifest (no central registry; see Gap 11). |
+| **Bindings publishing** | (`tree-sitter-m`) | Small (≤1 week each) | Publish to npm / crates.io / PyPI / Go module proxy. Unblocks third-party tool authors. Tracked in `tree-sitter-m`'s STATUS, not in this document. |
+| **AD-03 stamping** | (`tree-sitter-m`) | Small | Per `tree-sitter-m` STATUS — integrate the keyword-coverage stamping into the grammar release process. |
+
+The package manager is intentionally last. The current toolchain assumes a single-repo / single-routine-set model, which has been adequate. Cross-project sharing of `.m` libraries is the genuine new capability that `ydb-pkg` unlocks, and the design ROI grows once a second project (e.g., a CLI on top of the parser) actually wants to depend on `m-tools`'s helpers.
+
+---
+
+### B.7 — Cross-Cutting: Umbrella Dispatcher Rename
+
+The `m <subcommand>` / `ydb <subcommand>` rename (see [implementation.md → §1](implementation.md#1-canonical-command-map-m-help)) is independent of the per-tool work and can be done in any phase. Recommended timing: **after Phase 2 ships**, when the lint framework forces a config file to exist (`m.toml`) and the umbrella dispatcher gives the config a natural home.
+
+The migration is mechanical: existing `y*` shell scripts become thin shims that dispatch to the umbrella. Old names remain functional indefinitely (no breakage); new documentation references the umbrella form.
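+
+A sketch of the shim logic, written in Python for consistency with the rest of the toolchain (the actual shims may stay shell scripts). The legacy-to-umbrella table follows the "Future name" columns in this addendum; `translate` and `main` are hypothetical names:
+
+```python
+import os
+import sys
+
+# Legacy command -> umbrella argv prefix, from the phase tables' "Future name" column.
+LEGACY = {
+    "yfmt": ["m", "fmt"],
+    "ylint-style": ["m", "lint", "--style"],
+    "ylint-logic": ["m", "lint", "--logic"],
+    "ylint-deep": ["m", "lint", "--deep"],
+    "ycov-line": ["ydb", "cover", "--line"],
+    "ycov-branch": ["ydb", "cover", "--branch"],
+    "yrepl": ["ydb", "repl"],
+    "yparallel": ["ydb", "test", "--parallel"],
+    "ydebug": ["ydb", "debug"],
+    "ydb-pkg": ["m", "pkg"],
+}
+
+
+def translate(argv: list[str]) -> list[str]:
+    """Rewrite a legacy invocation (argv[0] is the old name) to the umbrella form."""
+    name = os.path.basename(argv[0])
+    if name not in LEGACY:
+        raise SystemExit(f"{name}: no umbrella mapping")
+    return LEGACY[name] + argv[1:]
+
+
+def main() -> None:
+    new_argv = translate(sys.argv)
+    os.execvp(new_argv[0], new_argv)  # replace the shim process; its exit code is the tool's
+```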
+
+---
+
+### B.8 — Sequence Summary
+
+| Phase | Tools | Approximate effort | Unblocks |
+|-------|-------|--------------------|----------|
+| **1** | `yfmt` | 2–3 weeks | Style debate ends; later visitors assume canonical layout |
+| **2** | `ylint-style`, `ylint-logic` | 3–5 weeks | Bugs caught at edit time; lint framework reusable |
+| **3** | `ycov-line`/`ycov-branch`, `ylint-deep` | 6–9 weeks | Real (not approximate) coverage; project-wide analyses |
+| **4** | `yrepl` (P1+P2), `yparallel`, `ydebug` | 6–8 weeks | Interactive ergonomics; faster test loop |
+| **5** | `ydb-pkg`, bindings publishing | 6+ weeks | Cross-project library sharing; third-party tool authors |
+| **X-cut** | Umbrella dispatcher rename | 1 week | Coherent CLI surface; canonical config home |
+
+**Critical-path summary:** the parser foundation is shipped, so no phase has a hard blocker on another. Phases 1 → 2 → 3 form the natural single-developer critical path (each builds on the previous AST + framework). Phase 4 is parallelisable. Phase 5 waits on `m-standard`'s manifest design and accepts a third-party-driven cadence.
+
+---
+
+*End of Addendum B*
+
+---
+
+## Appendix B: Gold Standard — Top 5 Language Toolchains
+
+This appendix documents the toolchain available to developers in each of the five most widely used mainstream programming languages. These represent the lived experience of developers who would need to transition to or work alongside M code, and form the basis for the gold-standard column in [Chapter 2 — Comprehensive Gap Analysis](#2-comprehensive-gap-analysis).
+
+---
+
+### B.1 Python
+
+Python's toolchain has matured significantly with the `ruff` era. The ecosystem prioritizes speed of feedback and comprehensive static analysis.
+
+| Category | Tool(s) | Command | Notes |
+|----------|---------|---------|-------|
+| Runtime / REPL | `python`, `ipython`, `ptpython` | `python`, `ipython` | Full REPL with history, completion, multiline, magic commands |
+| Syntax check | `py_compile`, `ruff` | `python -m py_compile f.py` | Instant; part of every linter |
+| Linting (style) | `ruff`, `flake8`, `pycodestyle` | `ruff check .` | Rule-based; hundreds of configurable checks |
+| Linting (logic) | `pylint`, `ruff` | `pylint src/` | Detects unused vars, unreachable code, missing returns |
+| Type checking | `mypy`, `pyright` | `mypy src/` | Full static type analysis; catches type errors before runtime |
+| Formatting | `ruff format`, `black`, `autopep8` | `ruff format .` | Zero-config; deterministic output |
+| Test runner | `pytest`, `unittest` | `pytest` | Autodiscovers tests; rich output; plugins |
+| Single test | `pytest` | `pytest tests/test_foo.py::test_bar` | Path + name selector |
+| Test watcher | `pytest-watch`, `watchdog` | `ptw` | Reruns the tests whenever a watched file changes |
+| Coverage | `coverage.py`, `pytest-cov` | `pytest --cov=src` | Line + branch coverage; HTML report |
+| Benchmarking | `pytest-benchmark`, `timeit` | `pytest --benchmark-only` | Repeatable, statistical results |
+| Profiling | `cProfile`, `py-spy`, `line_profiler` | `py-spy record -o out.svg -- python f.py` | Flame graphs, line-level timing |
+| Debugging | `pdb`, `ipdb`, `debugpy` | `python -m pdb script.py` | Breakpoints, step, inspect; IDE integration via DAP |
+| Documentation | `pdoc`, `sphinx`, `mkdocs` | `pdoc src/mymodule` | Extracts docstrings; generates HTML |
+| Dependency mgmt | `uv`, `pip`, `poetry`, `pipenv` | `uv add requests` | Lockfiles, virtual envs, reproducible installs |
+| Build / tasks | `make`, `tox`, `nox`, `invoke` | `tox` | Multi-env test matrix; task automation |
+| Import analysis | `isort`, `ruff` | `ruff check --select I,F401` | Sort order (`I` rules) and unused imports (`F401`) |
+| Security scan | `bandit`, `safety` | `bandit -r src/` | Detects common security anti-patterns |
+| Complexity | `radon`, `ruff` | `radon cc src/` | Cyclomatic complexity per function |
+| Dead code | `vulture` | `vulture src/` | Unused functions, variables, imports |
+| Fixture mgmt | `pytest fixtures`, `factory_boy` | `@pytest.fixture` decorator | Scoped, composable test state |
+| Snapshot testing | `syrupy` | `assert result == snapshot` | Auto-update expected output |
+| Pre-commit hooks | `pre-commit` | `pre-commit install` | Runs lint+format+type-check before every commit |
+| CI script | `tox`, `nox`, GitHub Actions | `tox -e lint,type,test` | Full pipeline; matrix testing |
+| Environment check | `tox`, `pyenv` | `python --version` | Version managers + lockfiles ensure reproducibility |
+| Package publishing | `twine`, `flit`, `uv publish` | `uv publish` | Upload to PyPI |
+
+---
+
+### B.2 JavaScript / TypeScript
+
+The JS/TS ecosystem has the broadest toolchain of any language, driven by the npm ecosystem's culture of small, composable packages.
+
+| Category | Tool(s) | Command | Notes |
+|----------|---------|---------|-------|
+| Runtime / REPL | `node`, `ts-node`, `deno` | `node` | Readline REPL; `ts-node` for TypeScript |
+| Syntax check | `tsc` | `tsc --noEmit` | TypeScript compiler; also catches type errors |
+| Linting | `eslint`, `biome` | `eslint src/` | Pluggable; hundreds of rules; fixable violations |
+| Type checking | `tsc` | `tsc --strict` | Full inference + structural typing |
+| Formatting | `prettier`, `biome` | `prettier --write .` | Zero-config; opinionated; universal |
+| Test runner | `jest`, `vitest`, `mocha` | `jest` | Autodiscovery; parallel; snapshots built-in |
+| Single test | `jest`, `vitest` | `jest --testNamePattern "my test"` | Regex name or path filter |
+| Test watcher | `jest`, `vitest` | `jest --watch` or `vitest --watch` | Interactive; runs only changed files |
+| Coverage | `istanbul/nyc`, `c8`, `v8` | `jest --coverage` | Built into jest; HTML + lcov output |
+| Benchmarking | `tinybench`, `benchmark.js` | (library-based) | Statistical microbenchmarks |
+| Profiling | Node `--prof`, Chrome DevTools | `node --prof script.js` | V8 CPU profiler; flame graphs |
+| Debugging | `node --inspect`, VS Code | `node --inspect-brk` | DAP protocol; full IDE integration |
+| Documentation | `jsdoc`, `typedoc` | `typedoc src/` | Extracts JSDoc/TSDoc comments; HTML output |
+| Dependency mgmt | `npm`, `yarn`, `pnpm` | `npm install` | `package-lock.json`; semantic versioning |
+| Build | `webpack`, `vite`, `esbuild`, `rollup` | `vite build` | Bundling, tree-shaking, minification |
+| Snapshot testing | `jest snapshots` | `expect(x).toMatchSnapshot()` | Auto-create + update expected output files |
+| Fixture mgmt | `jest beforeEach/afterEach` | `beforeEach(() => setup())` | Scoped setup/teardown per test/suite |
+| Mock/stub | `jest.mock()`, `sinon` | `jest.mock('./module')` | Module-level mocking; spy functions |
+| Pre-commit hooks | `husky`, `lint-staged` | `npx husky install` | Run lint+format on staged files only |
+| CI script | GitHub Actions, `npm run ci` | `npm run lint && npm test` | Standard `ci` script in `package.json` |
+| Security scan | `npm audit`, `snyk` | `npm audit` | Dependency vulnerability scanning |
+| Environment check | `nvm`, `volta`, `.nvmrc` | `node --version` | Version pinning per project |
+
+---
+
+### B.3 Go
+
+Go's toolchain is the gold standard for batteries-included developer experience. Nearly everything ships with the language itself; third-party tools fill only the gaps.
+
+| Category | Tool(s) | Command | Notes |
+|----------|---------|---------|-------|
+| Runtime / REPL | `gore`, `yaegi` | `gore` | No official REPL; `go run` for quick scripts |
+| Syntax check | `go build` | `go build ./...` | Compile errors are syntax + type errors |
+| Linting (vet) | `go vet` | `go vet ./...` | Ships with Go; catches common mistakes |
+| Linting (full) | `golangci-lint`, `staticcheck` | `golangci-lint run` | Aggregates 50+ linters; industry standard |
+| Type checking | built-in | `go build` | Types are checked at compile time — always |
+| Formatting | `gofmt`, `goimports` | `gofmt -w .` | **Ships with Go**; canonical; non-negotiable in PRs |
+| Test runner | `go test` | `go test ./...` | **Ships with Go**; parallel by default |
+| Single test | `go test` | `go test -run TestName ./pkg/` | Regex name filter + package path |
+| Test watcher | `gotestsum`, `air` | `gotestsum --watch` | `gotestsum` formats output; `--watch` reruns on change |
+| Coverage | `go test -cover` | `go test -coverprofile=c.out ./...` | **Ships with Go**; HTML report via `go tool cover` |
+| Benchmarking | `go test -bench` | `go test -bench=. -benchmem` | **Ships with Go**; ns/op + allocs/op |
+| Profiling | `go tool pprof` | `go test -cpuprofile=cpu.out` | **Ships with Go**; flame graphs, heap profiles |
+| Debugging | `dlv` (Delve) | `dlv test ./pkg/` | Full DAP debugger; breakpoints, stack, goroutines |
+| Documentation | `godoc`, `pkgsite` | `godoc -http :6060` | Extracts doc comments; standard format |
+| Dependency mgmt | `go mod` | `go mod tidy` | **Ships with Go**; lockfile (`go.sum`); reproducible |
+| Build | `go build` | `go build -o bin/app` | **Ships with Go** |
+| Race detector | `go test -race` | `go test -race ./...` | **Ships with Go**; detects data races at runtime |
+| Fuzzing | `go test -fuzz` | `go test -fuzz=FuzzFn` | **Ships with Go** since 1.18 |
+| Fixture mgmt | `testing.T`, `testcontainers` | `t.Cleanup(func(){...})` | `t.TempDir()`, `t.Cleanup()` built into stdlib |
+| Pre-commit hooks | `golangci-lint` + `pre-commit` | `pre-commit run --all-files` | Standard practice; enforces `gofmt` + vet |
+| CI script | `Makefile`, GitHub Actions | `make lint test` | `go vet + golangci-lint + go test` |
+| Security scan | `gosec`, `govulncheck` | `govulncheck ./...` | `govulncheck` is maintained by the Go team but installed separately (`go install golang.org/x/vuln/cmd/govulncheck@latest`) |
+| Complexity | `gocyclo`, `golangci-lint` | (via golangci-lint) | Cyclomatic complexity reporting |
+
+> **Note:** Go is the benchmark for language-bundled tooling. `go test`, `go fmt`, `go vet`, `go doc`, `go mod`, `-race`, `-bench`, `-cover`, and `-fuzz` all ship with the standard `go` binary. Separately installed tools cover the rest: aggregated linting, the debugger, a REPL, test watching, and vulnerability scanning.
+
+---
+
+### B.4 Rust
+
+Rust's toolchain, delivered via `cargo`, is the closest to Go in terms of batteries-included quality and the most ergonomic for a compiled language.
+
+| Category | Tool(s) | Command | Notes |
+|----------|---------|---------|-------|
+| Runtime / REPL | `evcxr` | `evcxr` | Third-party; reasonable quality |
+| Syntax / compile | `cargo check` | `cargo check` | Type-checks without linking; very fast |
+| Linting | `cargo clippy` | `cargo clippy -- -D warnings` | Ships in rustup; 700+ lints; highly actionable |
+| Type checking | built-in | `cargo check` | Always; Rust's type system is the primary safety tool |
+| Formatting | `rustfmt` | `cargo fmt` | Ships with rustup; canonical; enforced in most projects |
+| Test runner | `cargo test` | `cargo test` | Ships with cargo; captures output; parallel |
+| Single test | `cargo test` | `cargo test test_name` | String filter on test names |
+| Test watcher | `cargo watch` | `cargo watch -x test` | Watches source; reruns on change |
+| Coverage | `cargo tarpaulin`, `cargo llvm-cov` | `cargo llvm-cov` | LLVM source-based coverage (`llvm-cov`); both tools emit lcov |
+| Benchmarking | `cargo bench`, `criterion` | `cargo bench` | `criterion` gives statistical analysis |
+| Profiling | `cargo flamegraph`, `samply` | `cargo flamegraph` | Generates SVG flame graphs |
+| Debugging | `rust-gdb`, `rust-lldb`, `CodeLLDB` | `rust-gdb target/debug/app` | IDE-integrated via DAP |
+| Documentation | `cargo doc` | `cargo doc --open` | Extracts `///` doc comments; runs doctests |
+| Dependency mgmt | `cargo` | `cargo add serde` | `Cargo.lock`; deterministic; audit-able |
+| Build | `cargo build` | `cargo build --release` | Incremental; cross-compilation |
+| Fixture mgmt | `rstest`, `proptest` | `#[rstest]` attribute | Parameterized tests; property-based testing |
+| Fuzzing | `cargo fuzz` | `cargo fuzz run <target>` | LibFuzzer integration; requires a nightly toolchain |
+| Security scan | `cargo audit` | `cargo audit` | Checks advisory database for vulnerable deps |
+| Pre-commit hooks | `cargo fmt --check` + `cargo clippy` | (via `.pre-commit-config.yaml`) | Standard practice |
+| CI script | GitHub Actions + `cargo` | `cargo fmt --check && cargo clippy && cargo test` | Standard three-step pipeline |
+
+---
+
+### B.5 Java
+
+Java has the most mature and enterprise-focused toolchain, with build systems that can feel heavyweight but provide comprehensive lifecycle management.
+
+| Category | Tool(s) | Command | Notes |
+|----------|---------|---------|-------|
+| Runtime / REPL | `jshell` | `jshell` | Ships with JDK since Java 9; reasonable REPL |
+| Syntax / compile | `javac`, `maven`, `gradle` | `mvn compile` | Compilation is syntax + type checking |
+| Linting (style) | `Checkstyle`, `PMD` | `mvn checkstyle:check` | Rule-based style enforcement; Google/Sun rules |
+| Linting (logic) | `SpotBugs`, `PMD`, `Error Prone` | `mvn spotbugs:check` | Detects null dereferences, resource leaks, etc. |
+| Type checking | built-in | `javac` | Strong static typing; checked at compile time |
+| Formatting | `google-java-format`, `Spotless` | `mvn spotless:apply` | Plugin-driven; enforces Google Java style |
+| Test runner | `JUnit 5`, `TestNG` | `mvn test` | Industry standard; rich annotations |
+| Single test | `Maven Surefire` | `mvn test -Dtest=MyTest#myMethod` | Class + method filter |
+| Test watcher | `JUnit Platform`, `fizzed-watcher` | (limited native support) | Less ergonomic than other ecosystems |
+| Coverage | `JaCoCo` | `mvn jacoco:report` | Line + branch + complexity; HTML + XML |
+| Benchmarking | `JMH` | (annotation-based) | JVM Microbenchmark Harness; industry standard |
+| Profiling | `JProfiler`, `async-profiler`, `JFR` | `jfr print recording.jfr` | JFR ships with JDK; async-profiler is excellent |
+| Debugging | `jdb`, IDE debuggers | `jdb` | JDWP protocol; universal IDE support |
+| Documentation | `Javadoc` | `mvn javadoc:javadoc` | Ships with JDK; standard `/** */` format |
+| Dependency mgmt | `Maven`, `Gradle` | `mvn dependency:tree` | `pom.xml` / `build.gradle`; central repository |
+| Build | `Maven`, `Gradle`, `Bazel` | `mvn package` | Full lifecycle management |
+| Fixture mgmt | `JUnit @BeforeEach`, `DBUnit` | `@BeforeEach void setup()` | Scoped; `@Nested` for grouping |
+| Mock/stub | `Mockito`, `EasyMock` | `@Mock MyService svc` | Industry-standard mocking framework |
+| Static analysis | `SonarQube`, `Error Prone` | `mvn sonar:sonar` | Enterprise-grade; technical debt tracking |
+| Security scan | `OWASP Dependency-Check` | `mvn dependency-check:check` | CVE database scanning |
+| Pre-commit hooks | `Maven enforcer`, `Checkstyle` | (via Maven lifecycle) | Bound to `validate` phase |
+| CI script | `mvn verify` | `mvn clean verify` | Runs compile + test + check + package |
+
+---
+
+## Appendix C: What Ships with YottaDB (Foundation Runtime)
+
+Before assessing gaps, it is important to inventory what YottaDB — the open-source M runtime used as this project's foundation — already provides. Many developers are unaware of the full scope of YottaDB's built-in utilities. These vendor tools are used directly throughout the toolchain; they are not wrapped or renamed.
+
+### C.1 Runtime and Interactive Tools
+
+| Tool | Invocation | Description |
+|------|-----------|-------------|
+| `ydb` / `mumps` | `$YDB_DIST/ydb` | Main runtime. Without `-run`, it enters interactive direct mode and accepts MUMPS commands. |
+| `%XCMD` | `ydb -run %XCMD "code"` | Execute a MUMPS code string and exit. The foundation of all shell wrappers. |
+| Direct mode | `ydb` (interactive) | REPL-like mode: type MUMPS commands, see results. No history, no completion. |
+| `ZCOMPILE` | `ydb -run %XCMD "zcompile \"file.m\""` | Compile a routine to object code (`.m` → `.o`). Reports syntax errors. Used by `ycheck`. |
+| `ZLINK` | in-process | Dynamically load a compiled routine. Happens automatically on first call. |
+
+### C.2 MUPIP — Database Management Utility
+
+`mupip` is one of YottaDB's most powerful yet least-used utilities. It operates directly on the database files and is the closest thing M has to a database administration toolkit.
+
+| Subcommand | Description | Dev Relevance |
+|-----------|-------------|---------------|
+| `mupip extract` | Export globals to a portable text file (ZWR or GO format) | **High** — enables fixture export, backup before tests, diff between runs |
+| `mupip load` | Import globals from ZWR/GO format | **High** — enables fixture loading, restoring known test state |
+| `mupip integ` | Verify database file structural integrity | Medium — useful after crashes or unexpected exits |
+| `mupip backup` | Backup live database to a file | Medium — snapshot before risky operations |
+| `mupip restore` | Restore from a backup | Medium — reset to snapshot |
+| `mupip size` | Report node counts and storage statistics for all globals | **High** — global size reporting; equivalent of `du` for the database |
+| `mupip reorg` | Compact and reorganize database blocks | Low (maintenance) |
+| `mupip rundown` | Clean up after crashed processes (remove stale locks, shared memory) | Medium — critical after crashes |
+| `mupip journal` | Manage journal/WAL files | Low (operations) |
+| `mupip set` | Modify database parameters (block size, extension size, etc.) | Low (setup) |
+| `mupip trigger` | Manage YottaDB triggers (code that fires on global updates) | Medium — triggers are an advanced feature |
+| `mupip freeze` | Freeze/unfreeze database updates | Low |
+| `mupip replicate` | Configure primary/secondary replication | Low (operations) |
+
+> **Key insight:** `mupip extract` and `mupip load` are the foundation of any fixture management system. Both already ship with YottaDB but are rarely used in day-to-day development workflows.
+
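+To make the "diff between runs" use case from the `mupip extract` row concrete, here is a minimal Python sketch that compares two ZWR-format extracts, for example one taken before a test run and one after. It assumes every data line starts with `^` and has the form `^global(subscripts)=value`, and it skips every other line, such as the header lines at the top of an extract. The function names `parse_zwr` and `diff_zwr` and the sample data are hypothetical. Values are kept as raw ZWR text (including any `$C(...)` concatenations) rather than decoded.
+
+```python
+# Hypothetical helper: compare two ZWR-format global extracts.
+
+
+def parse_zwr(text: str) -> dict[str, str]:
+    """Map each node reference (e.g. ^ACCT(1,"name")) to its raw ZWR value text."""
+    nodes: dict[str, str] = {}
+    for line in text.splitlines():
+        if not line.startswith("^"):
+            continue  # assumed non-data line (e.g. extract header); skip it
+        in_string = False
+        depth = 0
+        for i, ch in enumerate(line):
+            if ch == '"':
+                # A doubled "" inside a string toggles twice, leaving the state unchanged.
+                in_string = not in_string
+            elif in_string:
+                continue  # '=' or parentheses inside a quoted subscript are data
+            elif ch == "(":
+                depth += 1
+            elif ch == ")":
+                depth -= 1
+            elif ch == "=" and depth == 0:
+                # First unquoted '=' outside the subscript list splits node from value.
+                nodes[line[:i]] = line[i + 1:]
+                break
+    return nodes
+
+
+def diff_zwr(before: str, after: str) -> dict[str, list[str]]:
+    """Classify nodes as added, removed, or changed between two extracts."""
+    old, new = parse_zwr(before), parse_zwr(after)
+    return {
+        "added": sorted(k for k in new if k not in old),
+        "removed": sorted(k for k in old if k not in new),
+        "changed": sorted(k for k in new if k in old and old[k] != new[k]),
+    }
+
+
+if __name__ == "__main__":
+    before = 'label\ndate ZWR\n^ACCT(1,"bal")=100\n^ACCT(2,"name")="Jones"\n'
+    after = 'label\ndate ZWR\n^ACCT(1,"bal")=250\n^ACCT(3,"name")="Lee"\n'
+    print(diff_zwr(before, after))
+```
+
+In practice, the two inputs would be ZWR extracts of the globals a test touches, captured before and after the run; the same files can later be fed to `mupip load` to restore the "before" state.
+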
+### C.3 Auxiliary Utilities
+
+| Tool | Description |
+|------|-------------|
+| `gde` (Global Directory Editor) | Interactive tool to configure which globals live in which database files. Used at setup, rarely during development. |
+| `lke` (Lock Examination) | Inspect and forcibly clear `LOCK` entries held by any process. Critical when a crashed process leaves locks held. |
+| `dse` (Database Structure Editor) | Low-level, block-by-block database editor. Dangerous; use only in recovery scenarios. |
+
+### C.4 MUMPS Intrinsic Debugging Commands
+
+These commands are available within any MUMPS routine or interactive session:
+
+| Command / Function | Description |
+|-------------------|-------------|
+| `ZSHOW "V"` | Print all local variables and their values |
+| `ZSHOW "G"` | Print global-access statistics for each open database region |
+| `ZSHOW "L"` | Print all currently held locks |
+| `ZSHOW "D"` | Print all open devices |
+| `ZSHOW "B"` | Print all ZBREAK breakpoints |
+| `ZSHOW "S"` | Print the current call stack |
+| `ZSHOW "*"` | Print all categories, including everything above |
+| `ZWRITE var` | Print a variable in MUMPS `SET` syntax (full subtree) |
+| `ZPRINT label^routine` | Print source code of a label or entire routine |
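+| `ZBREAK label^routine` | Set a breakpoint at an entry reference |
+| `ZBREAK label^routine:"if x>5 break"` | Breakpoint with an action: the quoted M code runs when the line is reached, so a conditional `BREAK` gives a conditional breakpoint |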
+| `ZCONTINUE` | Resume execution after a ZBREAK halt |
+| `ZSTEP INTO` | Step into the next line |
+| `ZSTEP OVER` | Step to the next line at the current stack level, without stopping inside called code |
+| `ZSTEP OUTOF` | Step until the current stack level returns to its caller |
+| `ZGOTO level:label` | Unwind stack to level and jump to label |
+| `$STACK(n,"MCODE")` | Source code of call at stack level n |
+| `$STACK(n,"PLACE")` | Routine+label+offset of call at stack level n |
+| `$ZPOSITION` | Current routine and label+offset |
+| `$ZTRAP` | Implementation-specific error trap (non-standard alternative to ANSI `$ETRAP`) |
+
+### C.5 Percent-Sign Utility Routines
+
+These ship with YottaDB and live in `$YDB_DIST`:
+
+| Routine | Description |
+|---------|-------------|
+| `%GO` | Export one or more globals to a file in GO (sequential) format |
+| `%GI` | Import globals from a GO-format file |
+| `%GSEL` | Interactive global name selection utility |
+| `%RD` | Routine directory — list all compiled routines |
+| `%RSEL` | Routine selector — interactive search through routines |
+| `%ZDATE` | Date/time formatting utility |
+| `%ZCRC` | CRC checksum computation |
+| `%ZMVALID` | Validate that a string is a legal MUMPS variable name |
+| `%XCMD` | Execute a command string (used by shell wrappers) |
+| `%ZTRIGGER` | Trigger management interface |
+
+---
+
+*End of gap-analysis-and-remediation-strategy document.*
diff --git a/docs/history/m-tool-gap-analysis.md b/docs/history/m-tool-gap-analysis.md
new file mode 100644
index 0000000..c0063ac
--- /dev/null
+++ b/docs/history/m-tool-gap-analysis.md
@@ -0,0 +1,1146 @@
+# M Tools — Gap Analysis (Vendor-Neutral)
+
+> **Archived snapshot.** Imported verbatim from [`m-dev-tools/m-tools`](https://github.com/m-dev-tools/m-tools) — source commit [`16fe3f7`](https://github.com/m-dev-tools/m-tools/commit/16fe3f7dc6982070809cd1d8290d01fedc5905ac) (2026-04-27), before that repo was archived. Preserved for the design rationale behind the m-dev-tools ecosystem (Go/Rust/Python toolchain analogy that drove `m-cli`'s CLI ergonomics). **Not maintained.** For the *current* shape of the org, start at [`profile/README.md`](../../profile/README.md).
+
+**Document type:** Reference / strategic planning
+**Scope:** Developer toolchain for the M (MUMPS) programming language across all current implementations
+**Audience:** Anyone evaluating, building for, or contributing to the M ecosystem
+**Companion document:** [gap-analysis-and-remediation-strategy.md](gap-analysis-and-remediation-strategy.md) — the YottaDB-bound remediation roadmap that consumes this analysis
+
+---
+
+## Table of Contents
+
+- [1. Introduction](#1-introduction)
+  - [1.1 What is M, and why does its toolchain matter?](#11-what-is-m-and-why-does-its-toolchain-matter)
+  - [1.2 The two main current implementations](#12-the-two-main-current-implementations)
+  - [1.3 What "M tools" means in this document](#13-what-m-tools-means-in-this-document)
+- [2. The Gold Standard — Top 5 Language Toolchains](#2-the-gold-standard--top-5-language-toolchains)
+  - [2.1 Python](#21-python)
+  - [2.2 JavaScript / TypeScript](#22-javascript--typescript)
+  - [2.3 Go](#23-go)
+  - [2.4 Rust](#24-rust)
+  - [2.5 Java](#25-java)
+- [3. The M Language Surface Across Implementations](#3-the-m-language-surface-across-implementations)
+  - [3.1 Concept-by-concept reconciliation](#31-concept-by-concept-reconciliation)
+  - [3.2 What's portable vs what isn't](#32-whats-portable-vs-what-isnt)
+  - [3.3 Multi-vendor extensions (non-ANSI but in both engines)](#33-multi-vendor-extensions-non-ansi-but-in-both-engines)
+- [4. The M Development Toolchain Across Implementations](#4-the-m-development-toolchain-across-implementations)
+  - [4.1 InterSystems IRIS](#41-intersystems-iris)
+    - [4.1.1 IRIS ObjectScript (IOS): what it is, and why it isn't ANSI standard MUMPS](#411-iris-objectscript-ios-what-it-is-and-why-it-isnt-ansi-standard-mumps)
+    - [4.1.2 File extensions in IRIS source code](#412-file-extensions-in-iris-source-code)
+    - [4.1.3 IRIS tooling, by file scope and language](#413-iris-tooling-by-file-scope-and-language)
+  - [4.2 YottaDB](#42-yottadb)
+  - [4.3 Common across both engines](#43-common-across-both-engines)
+  - [4.4 Foreign-language integration: "embedded language" vs "embedded database"](#44-foreign-language-integration-embedded-language-vs-embedded-database)
+  - [4.5 Polyglot routines vs C-API separation: a quality / maintainability analysis](#45-polyglot-routines-vs-c-api-separation-a-quality--maintainability-analysis)
+- [5. Summary Table: MUMPS-vs-MUMPS — Gold Standard, IRIS, YottaDB, VA/Community](#5-summary-table-mumps-vs-mumps--gold-standard-iris-yottadb-vacommunity)
+  - [5.1 Where both engines fall short of the gold standard](#51-where-both-engines-fall-short-of-the-gold-standard)
+  - [5.2 Where the engines diverge most sharply](#52-where-the-engines-diverge-most-sharply)
+  - [5.3 What the MUMPS-only matrix reveals](#53-what-the-mumps-only-matrix-reveals)
+- [6. The Real Question: Developer Experience for a Legacy MUMPS Codebase](#6-the-real-question-developer-experience-for-a-legacy-mumps-codebase)
+  - [6.1 The IRIS-based VistA scenario](#61-the-iris-based-vista-scenario)
+  - [6.2 The YottaDB-based VistA scenario](#62-the-yottadb-based-vista-scenario)
+  - [6.3 Side-by-side summary](#63-side-by-side-summary)
+  - [6.4 The bottom line](#64-the-bottom-line)
+- [7. Consolidated Gap Analysis](#7-consolidated-gap-analysis)
+- [8. Rank-Ordered Developer Impact: Where to Invest First](#8-rank-ordered-developer-impact-where-to-invest-first)
+
+---
+
+## 1. Introduction
+
+### 1.1 What is M, and why does its toolchain matter?
+
+M (originally MUMPS — Massachusetts General Hospital Utility Multi-Programming System, ANSI X11.1, ISO 11756) is a high-level programming language with an integrated hierarchical key-value database. It has been in continuous production use since 1966 and underpins a disproportionate share of the world's healthcare IT — Epic Systems, MEDITECH, the U.S. Department of Veterans Affairs' VistA system, and others collectively store hundreds of millions of patient records in M databases.
+
+Despite that operational footprint, the developer experience around M has received comparatively little tooling investment. Most modern software-development practices — unit testing, continuous integration, static analysis, code coverage, package management, automated formatting — emerged after M was already in widespread production use. As a result, the productivity tools that mainstream language communities take for granted are largely absent in the M world.
+
+This document inventories the gap. It is deliberately **vendor-neutral**: it begins from the language standard and the cross-vendor reality, not from any single implementation's strengths or limitations. The companion document, [gap-analysis-and-remediation-strategy.md](gap-analysis-and-remediation-strategy.md), is the YottaDB-bound remediation strategy that builds on this analysis.
+
+### 1.2 The two main current implementations
+
+M is a standardised language with multiple historical and current implementations. Two implementations are in active production today and are the focus of this analysis:
+
+| Implementation | Vendor | Licence | Notes |
+|----------------|--------|---------|-------|
+| **InterSystems IRIS** | InterSystems Corporation | Commercial / proprietary | The current product, branded as **IRIS** since 2018. *(See the **Naming history** note below the table.)* The runtime is MUMPS at its core. The primary developer-facing language is **IRIS ObjectScript (IOS)** — a proprietary superset of MUMPS that adds object-oriented classes, methods, embedded SQL, and embedded Python (see [§4.1.1](#411-iris-objectscript-ios-what-it-is-and-why-it-isnt-ansi-standard-mumps)). IOS is **not** ANSI standard MUMPS, and most IRIS tooling targets IOS classes (`.cls`) rather than `.m` MUMPS routines. |
+| **YottaDB** | YottaDB LLC | AGPL-3.0 (open source) | A 2017 fork of FIS GT.M (the open-source M implementation that traces back to the same Massachusetts General codebase). M-only at its core; richer extensibility comes via a stable C API (`libyottadb`) that other languages bind to. |
+
+Other implementations — FIS GT.M (the YottaDB ancestor, now in maintenance), MiniM, M21, Reference Standard M (RSM), MUMPS V1 — exist but are either retired, niche, or maintained on a different cadence; they are not covered in detail here.
+
+#### Naming history: InterSystems MUMPS → Caché ObjectScript → IRIS ObjectScript (IOS)
+
+InterSystems' technology has been continuously evolved for several decades; its **branding** has been changed twice in that time. Distinguishing the technology from the marketing layers is important for an accurate gap analysis:
+
+1. **InterSystems MUMPS (ISM)** — late 1970s through the 1990s. A pure ANSI MUMPS implementation, more or less.
+2. **Caché ObjectScript (COS)** — late 1990s onward. InterSystems built an object-oriented layer on top of ISM's MUMPS runtime — adding classes, methods, properties, embedded SQL, and a class-compilation phase — and named the resulting language **Caché ObjectScript**, abbreviated **COS**. The product itself was branded **Caché**. The MUMPS runtime remained underneath; COS code compiles down to MUMPS-shaped intermediate routines that the routine compiler then turns into object code.
+3. **IRIS / "ObjectScript"** — 2018 onward. InterSystems rebranded the product from **Caché** to **IRIS**. *This was a marketing rename, not a technology change* — the engine, the class-compilation pipeline, and the language itself were all carried over. InterSystems scrubbed most mentions of "Caché" from its website and product surfaces, and now refers to the language simply as **"ObjectScript"** (without the "Caché" prefix). Technically, however, today's "ObjectScript" is **the same Caché ObjectScript** with the brand name removed.
+
+For clarity in this document — and to disambiguate from Apple's iOS (which is an unrelated mobile operating system) — we use the term **IRIS ObjectScript (IOS)** when referring to InterSystems' current language. **IOS = Caché ObjectScript with the "Caché" prefix scrubbed.** Where context calls for it, we also use **COS** to refer to the historically-cumulative language (the pre- and post-rename forms are functionally equivalent), or just **ObjectScript** when the modifier doesn't add information.
+
+**The MUMPS runtime under IOS is still MUMPS.** This is what the §5 matrix scores on: the IRIS column tracks what's available to a developer writing pure `.m` or pure-MUMPS `.mac` routines on IRIS — i.e., the developer who is not opting into the IOS class layer. IOS-specific tooling (the class compiler, `%UnitTest`, Documatic, IPM, etc.) is out of scope for the MUMPS-vs-MUMPS comparison; see [§4.1.1](#411-iris-objectscript-ios-what-it-is-and-why-it-isnt-ansi-standard-mumps).
+
+### 1.3 What "M tools" means in this document
+
+Two related things are inventoried below:
+
+1. **The language surface** — commands, intrinsic functions, intrinsic special variables (ISVs), operators, and pattern codes that an M program can use. This is the input to any source-level tool (parser, linter, formatter, AST-based analyser) and is necessarily implementation-aware: a tool that promises portability has to know which features each engine implements.
+2. **The development toolchain** — the editors, debuggers, test runners, linters, formatters, profilers, package managers, CI integrations, and other utilities that surround the language. This is where the gap against mainstream languages is most acute.
+
+The data in [§3](#3-the-m-language-surface-across-implementations) is grounded in [`m-standard`](https://github.com/rafael5/m-standard), an integrated machine-readable reference that reconciles four primary sources (the Annotated M Standard / ISO 11756, the YottaDB documentation, the IRIS documentation, and the VA SAC / XINDEX rule set) into a unified data layer. All cross-engine counts cited below trace back to that reconciliation.
+
+---
+
+## 2. The Gold Standard — Top 5 Language Toolchains
+
+The following tables document the toolchain available to developers in each of the five most widely used mainstream programming languages today. They establish the gold-standard reference against which the M ecosystem is measured in [§5](#5-summary-table-gold-standard-vs-iris-vs-yottadb).
+
+These are the lived experience of developers who would need to transition to or work alongside M code. The tables are deliberately uniform in shape so they can be compared directly.
+
+---
+
+### 2.1 Python
+
+Python's toolchain has matured significantly with the `ruff` era. The ecosystem prioritises speed of feedback and comprehensive static analysis.
+
+| Category | Tool(s) | Command | Notes |
+|----------|---------|---------|-------|
+| Runtime / REPL | `python`, `ipython`, `ptpython` | `python`, `ipython` | Full REPL with history, completion, multiline, magic commands |
+| Syntax check | `py_compile`, `ruff` | `python -m py_compile f.py` | Instant; part of every linter |
+| Linting (style) | `ruff`, `flake8`, `pycodestyle` | `ruff check .` | Rule-based; hundreds of configurable checks |
+| Linting (logic) | `pylint`, `ruff` | `pylint src/` | Detects unused vars, unreachable code, missing returns |
+| Type checking | `mypy`, `pyright` | `mypy src/` | Full static type analysis; catches type errors before runtime |
+| Formatting | `ruff format`, `black`, `autopep8` | `ruff format .` | Zero-config; deterministic output |
+| Test runner | `pytest`, `unittest` | `pytest` | Autodiscovers tests; rich output; plugins |
+| Single test | `pytest` | `pytest tests/test_foo.py::test_bar` | Path + name selector |
+| Test watcher | `pytest-watch`, `watchdog` | `ptw` | Reruns only affected tests on save |
+| Coverage | `coverage.py`, `pytest-cov` | `pytest --cov=src` | Line + branch coverage; HTML report |
+| Benchmarking | `pytest-benchmark`, `timeit` | `pytest --benchmark-only` | Repeatable, statistical results |
+| Profiling | `cProfile`, `py-spy`, `line_profiler` | `py-spy record -o out.svg -- python f.py` | Flame graphs, line-level timing |
+| Debugging | `pdb`, `ipdb`, `debugpy` | `python -m pdb script.py` | Breakpoints, step, inspect; IDE integration via DAP |
+| Documentation | `pdoc`, `sphinx`, `mkdocs` | `pdoc src/mymodule` | Extracts docstrings; generates HTML |
+| Dependency mgmt | `uv`, `pip`, `poetry`, `pipenv` | `uv add requests` | Lockfiles, virtual envs, reproducible installs |
+| Build / tasks | `make`, `tox`, `nox`, `invoke` | `tox` | Multi-env test matrix; task automation |
+| Import analysis | `isort`, `ruff` | `ruff check --select I,F401` | `I` rules check import sort order; `F401` flags unused imports |
+| Security scan | `bandit`, `safety` | `bandit -r src/` | Detects common security anti-patterns |
+| Complexity | `radon`, `ruff` | `radon cc src/` | Cyclomatic complexity per function |
+| Dead code | `vulture` | `vulture src/` | Unused functions, variables, imports |
+| Fixture mgmt | `pytest fixtures`, `factory_boy` | `@pytest.fixture` decorator | Scoped, composable test state |
+| Snapshot testing | `syrupy` | `assert result == snapshot` | Auto-update expected output |
+| Pre-commit hooks | `pre-commit` | `pre-commit install` | Runs lint+format+type-check before every commit |
+| CI script | `tox`, `nox`, GitHub Actions | `tox -e lint,type,test` | Full pipeline; matrix testing |
+| Environment check | `tox`, `pyenv` | `python --version` | Version managers + lockfiles ensure reproducibility |
+| Package publishing | `twine`, `flit`, `uv publish` | `uv publish` | Upload to PyPI |
+
+---
+
+### 2.2 JavaScript / TypeScript
+
+The JS/TS ecosystem has the broadest toolchain of any language, driven by the npm ecosystem's culture of small, composable packages.
+
+| Category | Tool(s) | Command | Notes |
+|----------|---------|---------|-------|
+| Runtime / REPL | `node`, `ts-node`, `deno` | `node` | Readline REPL; `ts-node` for TypeScript |
+| Syntax check | `tsc` | `tsc --noEmit` | TypeScript compiler; also catches type errors |
+| Linting | `eslint`, `biome` | `eslint src/` | Pluggable; hundreds of rules; fixable violations |
+| Type checking | `tsc` | `tsc --strict` | Full inference + structural typing |
+| Formatting | `prettier`, `biome` | `prettier --write .` | Zero-config; opinionated; universal |
+| Test runner | `jest`, `vitest`, `mocha` | `jest` | Autodiscovery; parallel; snapshots built-in |
+| Single test | `jest`, `vitest` | `jest --testNamePattern "my test"` | Regex name or path filter |
+| Test watcher | `jest`, `vitest` | `jest --watch` or `vitest --watch` | Interactive; runs only changed files |
+| Coverage | `istanbul/nyc`, `c8`, `v8` | `jest --coverage` | Built into jest; HTML + lcov output |
+| Benchmarking | `tinybench`, `benchmark.js` | (library-based) | Statistical microbenchmarks |
+| Profiling | Node `--prof`, Chrome DevTools | `node --prof script.js` | V8 CPU profiler; flame graphs |
+| Debugging | `node --inspect`, VS Code | `node --inspect-brk script.js` | Chrome DevTools Protocol; IDE integration via DAP adapters |
+| Documentation | `jsdoc`, `typedoc` | `typedoc src/` | Extracts JSDoc/TSDoc comments; HTML output |
+| Dependency mgmt | `npm`, `yarn`, `pnpm` | `npm install` | `package-lock.json`; semantic versioning |
+| Build | `webpack`, `vite`, `esbuild`, `rollup` | `vite build` | Bundling, tree-shaking, minification |
+| Snapshot testing | `jest snapshots` | `expect(x).toMatchSnapshot()` | Auto-create + update expected output files |
+| Fixture mgmt | `jest beforeEach/afterEach` | `beforeEach(() => setup())` | Scoped setup/teardown per test/suite |
+| Mock/stub | `jest.mock()`, `sinon` | `jest.mock('./module')` | Module-level mocking; spy functions |
+| Pre-commit hooks | `husky`, `lint-staged` | `npx husky init` | Run lint+format on staged files only |
+| CI script | GitHub Actions, `npm run ci` | `npm run lint && npm test` | Standard `ci` script in `package.json` |
+| Security scan | `npm audit`, `snyk` | `npm audit` | Dependency vulnerability scanning |
+| Environment check | `nvm`, `volta`, `.nvmrc` | `node --version` | Version pinning per project |
+
+---
+
+### 2.3 Go
+
+Go's toolchain is the gold standard for batteries-included developer experience. Nearly everything ships with the language itself; third-party tools fill only the gaps.
+
+| Category | Tool(s) | Command | Notes |
+|----------|---------|---------|-------|
+| Runtime / REPL | `gore`, `yaegi` | `gore` | No official REPL; `go run` for quick scripts |
+| Syntax check | `go build` | `go build ./...` | Compile errors are syntax + type errors |
+| Linting (vet) | `go vet` | `go vet ./...` | Ships with Go; catches common mistakes |
+| Linting (full) | `golangci-lint`, `staticcheck` | `golangci-lint run` | Aggregates 50+ linters; industry standard |
+| Type checking | built-in | `go build` | Types are checked at compile time — always |
+| Formatting | `gofmt`, `goimports` | `gofmt -w .` | **Ships with Go**; canonical; non-negotiable in PRs |
+| Test runner | `go test` | `go test ./...` | **Ships with Go**; parallel by default |
+| Single test | `go test` | `go test -run TestName ./pkg/` | Regex name filter + package path |
+| Test watcher | `gotestsum`, `air` | `gotestsum --watch` | `gotestsum` formats output; `--watch` reruns on change |
+| Coverage | `go test -cover` | `go test -coverprofile=c.out ./...` | **Ships with Go**; HTML report via `go tool cover` |
+| Benchmarking | `go test -bench` | `go test -bench=. -benchmem` | **Ships with Go**; ns/op + allocs/op |
+| Profiling | `go tool pprof` | `go test -cpuprofile=cpu.out` | **Ships with Go**; flame graphs, heap profiles |
+| Debugging | `dlv` (Delve) | `dlv test ./pkg/` | Full DAP debugger; breakpoints, stack, goroutines |
+| Documentation | `godoc`, `pkgsite` | `godoc -http :6060` | Extracts doc comments; standard format |
+| Dependency mgmt | `go mod` | `go mod tidy` | **Ships with Go**; lockfile (`go.sum`); reproducible |
+| Build | `go build` | `go build -o bin/app` | **Ships with Go** |
+| Race detector | `go test -race` | `go test -race ./...` | **Ships with Go**; detects data races at runtime |
+| Fuzzing | `go test -fuzz` | `go test -fuzz=FuzzFn` | **Ships with Go** since 1.18 |
+| Fixture mgmt | `testing.T`, `testcontainers` | `t.Cleanup(func(){...})` | `t.TempDir()`, `t.Cleanup()` built into stdlib |
+| Pre-commit hooks | `golangci-lint` + `pre-commit` | `pre-commit run --all-files` | Standard practice; enforces `gofmt` + vet |
+| CI script | `Makefile`, GitHub Actions | `make lint test` | `go vet + golangci-lint + go test` |
+| Security scan | `gosec`, `govulncheck` | `govulncheck ./...` | `govulncheck` is maintained by the Go team but installed separately from `golang.org/x/vuln` |
+| Complexity | `gocyclo`, `golangci-lint` | (via golangci-lint) | Cyclomatic complexity reporting |
+
+> **Note:** Go is the benchmark for language-bundled tooling. `go test`, `go fmt`, `go vet`, `go doc`, `go mod`, `-race`, `-bench`, `-cover`, and `-fuzz` all ship with the standard `go` binary.
+
+---
+
+### 2.4 Rust
+
+Rust's toolchain, delivered via `cargo`, comes closest to Go's batteries-included quality and is among the most ergonomic available for a compiled language.
+
+| Category | Tool(s) | Command | Notes |
+|----------|---------|---------|-------|
+| Runtime / REPL | `evcxr` | `evcxr` | Third-party; reasonable quality |
+| Syntax / compile | `cargo check` | `cargo check` | Type-checks without linking; very fast |
+| Linting | `cargo clippy` | `cargo clippy -- -D warnings` | Ships in rustup; 700+ lints; highly actionable |
+| Type checking | built-in | `cargo check` | Always; Rust's type system is the primary safety tool |
+| Formatting | `rustfmt` | `cargo fmt` | Ships with rustup; canonical; enforced in most projects |
+| Test runner | `cargo test` | `cargo test` | Ships with cargo; captures output; parallel |
+| Single test | `cargo test` | `cargo test test_name` | String filter on test names |
+| Test watcher | `cargo watch` | `cargo watch -x test` | Watches source; reruns on change |
+| Coverage | `cargo tarpaulin`, `cargo llvm-cov` | `cargo tarpaulin` | LLVM-based; line + branch; lcov output |
+| Benchmarking | `cargo bench`, `criterion` | `cargo bench` | `criterion` gives statistical analysis |
+| Profiling | `cargo flamegraph`, `samply` | `cargo flamegraph` | Generates SVG flame graphs |
+| Debugging | `rust-gdb`, `rust-lldb`, `CodeLLDB` | `rust-gdb target/debug/app` | IDE-integrated via DAP |
+| Documentation | `cargo doc` | `cargo doc --open` | Extracts `///` doc comments; runs doctests |
+| Dependency mgmt | `cargo` | `cargo add serde` | `Cargo.lock`; deterministic; audit-able |
+| Build | `cargo build` | `cargo build --release` | Incremental; cross-compilation |
+| Fixture mgmt | `rstest`, `proptest` | `#[rstest]` attribute | Parameterized tests; property-based testing |
+| Fuzzing | `cargo fuzz` | `cargo fuzz run <target>` | libFuzzer integration; requires a nightly toolchain |
+| Security scan | `cargo audit` | `cargo audit` | Checks advisory database for vulnerable deps |
+| Pre-commit hooks | `cargo fmt --check` + `cargo clippy` | (via `.pre-commit-config.yaml`) | Standard practice |
+| CI script | GitHub Actions + `cargo` | `cargo fmt --check && cargo clippy && cargo test` | Standard three-step pipeline |
+
+---
+
+### 2.5 Java
+
+Java has the most mature and enterprise-focused toolchain, with build systems that can feel heavyweight but provide comprehensive lifecycle management.
+
+| Category | Tool(s) | Command | Notes |
+|----------|---------|---------|-------|
+| Runtime / REPL | `jshell` | `jshell` | Ships with JDK since Java 9; reasonable REPL |
+| Syntax / compile | `javac`, `maven`, `gradle` | `mvn compile` | Compilation is syntax + type checking |
+| Linting (style) | `Checkstyle`, `PMD` | `mvn checkstyle:check` | Rule-based style enforcement; Google/Sun rules |
+| Linting (logic) | `SpotBugs`, `PMD`, `Error Prone` | `mvn spotbugs:check` | Detects null dereferences, resource leaks, etc. |
+| Type checking | built-in | `javac` | Strong static typing; checked at compile time |
+| Formatting | `google-java-format`, `Spotless` | `mvn spotless:apply` | Plugin-driven; enforces Google Java style |
+| Test runner | `JUnit 5`, `TestNG` | `mvn test` | Industry standard; rich annotations |
+| Single test | `Maven Surefire` | `mvn test -Dtest=MyTest#myMethod` | Class + method filter |
+| Test watcher | Gradle continuous build, `fizzed-watcher` | `gradle test --continuous` | Built into Gradle; Maven needs a plugin; less ergonomic than other ecosystems |
+| Coverage | `JaCoCo` | `mvn jacoco:report` | Line + branch + complexity; HTML + XML |
+| Benchmarking | `JMH` | (annotation-based) | JVM Microbenchmark Harness; industry standard |
+| Profiling | `JProfiler`, `async-profiler`, `JFR` | `jfr print recording.jfr` | JFR ships with JDK; async-profiler is excellent |
+| Debugging | `jdb`, IDE debuggers | `jdb` | JDWP protocol; universal IDE support |
+| Documentation | `Javadoc` | `mvn javadoc:javadoc` | Ships with JDK; standard `/** */` format |
+| Dependency mgmt | `Maven`, `Gradle` | `mvn dependency:tree` | `pom.xml` / `build.gradle`; central repository |
+| Build | `Maven`, `Gradle`, `Bazel` | `mvn package` | Full lifecycle management |
+| Fixture mgmt | `JUnit @BeforeEach`, `DBUnit` | `@BeforeEach void setup()` | Scoped; `@Nested` for grouping |
+| Mock/stub | `Mockito`, `EasyMock` | `@Mock MyService svc` | Industry-standard mocking framework |
+| Static analysis | `SonarQube`, `Error Prone` | `mvn sonar:sonar` | Enterprise-grade; technical debt tracking |
+| Security scan | `OWASP Dependency-Check` | `mvn dependency-check:check` | CVE database scanning |
+| Pre-commit hooks | `Maven enforcer`, `Checkstyle` | (via Maven lifecycle) | Bound to `validate` phase |
+| CI script | `mvn verify` | `mvn clean verify` | Cleans, then runs the lifecycle through compile, test, package, and verify |
+
+---
+
+## 3. The M Language Surface Across Implementations
+
+Before discussing toolchains, it is necessary to understand what the language itself looks like across implementations. A formatter, linter, or AST analyser has to know which features each engine implements; otherwise it cannot make portability claims.
+
+The numbers in this section are drawn directly from [`m-standard`](https://github.com/rafael5/m-standard), which reconciles the Annotated M Standard (ISO 11756 / ANSI X11.1-1995), the YottaDB documentation tree, and the InterSystems IRIS documentation site into a unified per-concept inventory.
+
+### 3.1 Concept-by-concept reconciliation
+
+| Concept | Total catalogued | In ANSI standard | YottaDB implements | IRIS implements | Implemented by **both** |
+|---------|------------------|------------------|--------------------|-----------------|-------------------------|
+| **Commands** | 82 | 40 | 50 | 47 | 29 |
+| **Intrinsic functions** | 159 | 28 | 60 | 119 | 26 |
+| **Intrinsic special variables (ISVs)** | 82 | 17 | 65 | 42 | 26 |
+| **Operators** | 17 | 16 | 17 | 16 | 16 |
+| **Pattern codes** | 7 | 7 | 7 | 7 | 7 |
+
+Two observations follow from the table:
+
+1. **The ANSI core is small, and neither engine implements all of it.** Of the 40 ANSI commands, 14 (mostly the `ASTART`/`ASTOP`/`AUNBLOCK`/`ASSIGN` async-event family) are absent from both YottaDB and IRIS. In practice, conformance means "ANSI minus a sizeable block of unimplemented commands, plus a large layer of extensions."
+2. **IRIS extends the function library far more aggressively than YottaDB.** IRIS ships 93 intrinsic functions that YottaDB does not — `$BIT`, `$LISTBUILD`, `$LISTGET`, `$ZF`, `$ZHEX`, the `$WCHAR` family, and many more. YottaDB extends primarily through ISVs (39 YDB-only ISVs vs. 16 IRIS-only) and through its C-API.
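+
+The engine-only figures above are plain differences against the "both" column, because every token counted as "both" is also counted in each engine's own column. Here is a minimal Python sketch of that arithmetic. The counts are copied from the table, and `vendor_only` is a hypothetical helper name used only for this illustration.
+
+```python
+# Counts copied from the §3.1 table: (total, ansi, ydb, iris, both).
+TABLE = {
+    "commands": (82, 40, 50, 47, 29),
+    "functions": (159, 28, 60, 119, 26),
+    "isvs": (82, 17, 65, 42, 26),
+    "operators": (17, 16, 17, 16, 16),
+    "patterns": (7, 7, 7, 7, 7),
+}
+
+def vendor_only(concept: str) -> dict[str, int]:
+    """Tokens each engine implements that the other engine does not."""
+    _total, _ansi, ydb, iris, both = TABLE[concept]
+    return {"ydb_only": ydb - both, "iris_only": iris - both}
+
+print(vendor_only("functions"))  # {'ydb_only': 34, 'iris_only': 93}
+print(vendor_only("isvs"))       # {'ydb_only': 39, 'iris_only': 16}
+```
+
+The same subtraction gives 21 YottaDB-only and 18 IRIS-only commands.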
+
+### 3.2 What's portable vs what isn't
+
+`m-standard` defines three layered standards beyond raw counts:
+
+| Standard | Definition | Count |
+|----------|------------|-------|
+| **Pragmatic** | ANSI token implemented by both YottaDB and IRIS | 81 |
+| **VA SAC-clean** | Token permitted by VA Standards & Conventions / XINDEX | 65 rules / 171 per-name flags |
+| **Operational** | Pragmatic ∩ SAC-clean — i.e., what runs unmodified on both engines AND passes the VA's static-analysis rules | 58 |
+
+For a developer whose code must run on both engines, the language surface is roughly the **81 pragmatic** tokens. For a VistA developer it shrinks to **58 operational** tokens. The remaining ANSI commands — the parts of the standard that no engine implements — are dead surface that no portable program can use.
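+
+The layering can be read as plain set algebra over token names. The Python sketch below makes this explicit under stated assumptions. The four token sets are tiny hypothetical stand-ins, so their membership is illustrative and is not the `m-standard` data. "Pragmatic" is taken to mean ANSI tokens that both engines implement.
+
+```python
+# Hypothetical token sets: membership is illustrative, NOT the m-standard data.
+ANSI = {"SET", "KILL", "MERGE", "ASTART"}
+YDB = {"SET", "KILL", "MERGE", "ZWRITE", "ZSHOW"}
+IRIS = {"SET", "KILL", "MERGE", "ZWRITE"}
+SAC_CLEAN = {"SET", "KILL"}  # invented stand-in for the XINDEX-permitted set
+
+both_engines = YDB & IRIS            # implemented by both engines, ANSI or not
+pragmatic = ANSI & both_engines      # ANSI tokens that run on both engines
+operational = pragmatic & SAC_CLEAN  # ...that also pass the SAC rules
+dead = ANSI - (YDB | IRIS)           # ANSI tokens that no engine implements
+
+print(sorted(pragmatic))    # ['KILL', 'MERGE', 'SET']
+print(sorted(operational))  # ['KILL', 'SET']
+print(sorted(dead))         # ['ASTART']
+```
+
+`both_engines - ANSI` (here just `ZWRITE`) is the multi-vendor layer that sits outside the pragmatic set.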
+
+### 3.3 Multi-vendor extensions (non-ANSI but in both engines)
+
+A small set of `Z*` tokens originated outside ANSI but were picked up by both major engines. These are de-facto cross-vendor extensions:
+
+| Concept | Tokens implemented by YDB and IRIS but not in ANSI |
+|---------|----------------------------------------------------|
+| Commands | `ZBREAK`, `ZKILL`, `ZPRINT`, `ZWRITE` |
+| Intrinsic functions | `$INCREMENT`, `$ZCONVERT`, `$ZDATE`, `$ZSEARCH`, `$ZWIDTH` |
+| Intrinsic special variables | `$X`, `$Y`, `$ZA`, `$ZB`, `$ZEOF`, `$ZERROR`, `$ZHOROLOG`, `$ZIO`, `$ZJOB`, `$ZMODE`, `$ZTRAP`, `$ZVERSION` |
+
+These are useful for a portability-minded toolchain because they expand the practical pragmatic surface from 81 to ~102 tokens — still well short of ANSI + every extension, but enough to cover most non-trivial diagnostic and I/O code.
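+
+The ~102 figure is simple addition: the 81 pragmatic tokens plus the 21 tokens listed in the table (4 commands, 5 functions, 12 ISVs). Here is a minimal Python tally, with the token lists transcribed from the table:
+
+```python
+# Tokens transcribed from the multi-vendor extension table (§3.3).
+MULTI_VENDOR = {
+    "commands": ["ZBREAK", "ZKILL", "ZPRINT", "ZWRITE"],
+    "functions": ["$INCREMENT", "$ZCONVERT", "$ZDATE", "$ZSEARCH", "$ZWIDTH"],
+    "isvs": ["$X", "$Y", "$ZA", "$ZB", "$ZEOF", "$ZERROR", "$ZHOROLOG",
+             "$ZIO", "$ZJOB", "$ZMODE", "$ZTRAP", "$ZVERSION"],
+}
+PRAGMATIC = 81  # pragmatic count quoted in §3.2
+
+extensions = sum(len(tokens) for tokens in MULTI_VENDOR.values())
+print(extensions)              # 21
+print(PRAGMATIC + extensions)  # 102
+```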
+
+---
+
+## 4. The M Development Toolchain Across Implementations
+
+This chapter inventories what each implementation provides for **developing** M code (as distinct from running it). The structure mirrors §2's gold-standard categories so the comparison in §5 can be a direct row-by-row match.
+
+### 4.1 InterSystems IRIS
+
+IRIS sits at the commercial end of the spectrum. Its tooling is comprehensive but largely proprietary, web- or IDE-centric, and gated by licensing. Crucially for an M-portability analysis, **most IRIS tooling targets IRIS ObjectScript (IOS) classes (`.cls`), not pure MUMPS routines (`.m`).** Before the tooling inventory, the next two subsections establish what IOS is and how it relates to ANSI MUMPS, then enumerate the file types found in an IRIS source tree. (IOS = IRIS ObjectScript, the language formerly branded as Caché ObjectScript / COS; see the [naming history](#naming-history-intersystems-mumps--caché-objectscript--iris-objectscript-ios) above.)
+
+#### 4.1.1 IRIS ObjectScript (IOS): what it is, and why it isn't ANSI standard MUMPS
+
+**IRIS ObjectScript (IOS)**, historically and technically still **Caché ObjectScript (COS)** (see the [naming history](#naming-history-intersystems-mumps--caché-objectscript--iris-objectscript-ios) above), is InterSystems' primary programming language. It is **not** ANSI standard MUMPS. It is a proprietary superset built on top of MUMPS that adds object orientation, embedded SQL, embedded Python, and a class-compilation phase. Portable ANSI MUMPS code generally runs under IOS, because IRIS implements the ANSI core apart from the commands that neither engine supports (see [§3.1](#31-concept-by-concept-reconciliation)). IOS code, however, does **not** run on a pure ANSI MUMPS engine such as YottaDB.
+
+The terms **IOS**, **Caché ObjectScript**, **COS**, and (when InterSystems is the speaker) plain **ObjectScript** all refer to the same language. We use **IOS** in this document to disambiguate from Apple's iOS and to make the IRIS-attachment explicit.
+
+**What ObjectScript adds beyond ANSI MUMPS:**
+
+| Feature | ObjectScript example | Why it isn't ANSI MUMPS |
+|---------|----------------------|-------------------------|
+| **Classes** | `Class Pkg.Foo Extends %Persistent { Property X As %Integer; Method Bar() {...} }` | ANSI MUMPS has no class concept. Defined in `.cls` files, compiled by the class compiler. |
+| **Method dispatch syntax** | `set obj=##class(Pkg.Foo).%New()`, `do obj.Bar()`, `set y=obj.X`, `..Property` | The `obj.Method(args)` form lexically overlaps MUMPS dot-blocks (where a leading-dot line introduces a nested DO scope). The grammar disambiguates by context — but the disambiguation rules themselves are non-ANSI. |
+| **Embedded SQL** | `&sql(SELECT ID INTO :id FROM Pkg.Foo WHERE X=:val)` | The `&sql(...)` form is an ObjectScript-only construct that compiles to a prepared SQL plan. ANSI MUMPS has no SQL layer at all. |
+| **Embedded Python** | `Method M [Language=python] { ... Python code ... }` (since IRIS 2021.1) | Method-level language switching is an ObjectScript extension. ANSI MUMPS executes only MUMPS. |
+| **Macros** | `#include %occInclude`, `#define Foo expr`, `$$$Foo` | Pre-processed before the routine compiler sees the code. ANSI MUMPS has no macro pre-processor. |
+| **Class-scoped operators** | `$this`, `..Property`, `##super(...)`, `##class(...)`, `%this` | Tokens unique to the class-compilation model. None exist in ANSI. |
+| **Property / parameter typing** | `Property X As %Integer (MAXVAL=100)` | ANSI MUMPS is untyped — every value is a string until coerced. ObjectScript's class properties carry compile-time type metadata. |
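+
+Of these rows, macros are the easiest to picture as a separate pass over the source text. The Python sketch below is a deliberately simplified, hypothetical model of that pass (`expand_macros` is not an IRIS API). It handles only object-like `#define Name value` lines referenced as `$$$Name`. It ignores macro arguments, `#include`, and conditional directives, all of which the real pre-processor supports.
+
+```python
+import re
+
+DEFINE = re.compile(r"^\s*#define\s+(\w+)\s+(.*)$")  # "#define Name value"
+MACRO_REF = re.compile(r"\$\$\$(\w+)")                # "$$$Name"
+
+def expand_macros(source: str) -> str:
+    """Expand object-like macros; undefined $$$Name references are left as-is."""
+    macros: dict[str, str] = {}
+    out: list[str] = []
+    for line in source.splitlines():
+        match = DEFINE.match(line)
+        if match:
+            # Record the definition; the directive itself emits no code.
+            macros[match.group(1)] = match.group(2).strip()
+            continue
+        out.append(MACRO_REF.sub(lambda m: macros.get(m.group(1), m.group(0)), line))
+    return "\n".join(out)
+
+print(expand_macros("#define MaxRetries 3\n set n=$$$MaxRetries"))  # " set n=3"
+```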
+
+**How ObjectScript compiles internally:**
+
+```
+.cls source (ObjectScript class)
+        │
+        │   class compiler
+        ▼
+.int file (intermediate routine — generated, MUMPS-shaped code)
+        │
+        │   routine compiler  (also runs on hand-written .mac and .m)
+        ▼
+object code  (executed by the IRIS runtime)
+```
+
+A `.cls` file is **compiled into one or more `.int` (intermediate) routines** by the class compiler. Those `.int` routines look superficially like MUMPS but use ObjectScript-specific tokens (`$this`, `&sql`, `..Property`, etc.) that a pure-ANSI parser cannot accept. The `.int` routines are then run through the routine compiler to produce object code. A hand-written `.mac` (macro routine) or `.m` (ANSI routine) skips the class compiler — it goes straight through the macro pre-processor (for `.mac`) and into the routine compiler.
+
+This means ObjectScript is, in effect, **a higher-level language that transpiles down to a MUMPS-flavoured intermediate form**, and then to object code. It is not "MUMPS with extensions" in the same sense that GNU C is "ISO C with extensions" — it is a separate language with its own grammar, semantics, and compilation pipeline that happens to share a runtime with MUMPS.
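+
+The paths through the diagram differ only in which front-end stages run before the routine compiler. Here is a minimal Python sketch of that routing, following the diagram and the paragraph above. The stage labels are descriptive, not IRIS API names, and the model ignores details such as macros used inside classes. `compile_path` is a hypothetical helper.
+
+```python
+import os.path
+
+# Front-end stages per source extension, following the diagram above.
+FRONT_END = {
+    ".cls": ["class compiler"],       # class -> generated .int routines
+    ".mac": ["macro pre-processor"],  # $$$ macros expanded into .int
+    ".int": [],                       # already intermediate code
+    ".m": [],                         # ANSI routine: no class or macro step
+}
+
+def compile_path(filename: str) -> list[str]:
+    """Return the ordered stages a source file passes through to object code."""
+    ext = os.path.splitext(filename)[1].lower()
+    if ext not in FRONT_END:
+        raise ValueError(f"not a routine or class source: {filename}")
+    return FRONT_END[ext] + ["routine compiler", "object code"]
+
+print(compile_path("Pkg.Foo.cls"))  # ['class compiler', 'routine compiler', 'object code']
+print(compile_path("XUP.m"))        # ['routine compiler', 'object code']
+```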
+
+**Why ObjectScript is not part of the ANSI standard:**
+
+1. **The standard pre-dates the OO additions.** ANSI X11.1-1995 / ISO 11756 defines MUMPS as a procedural, untyped language with hierarchical key-value globals. The standard has not been revised to incorporate classes, methods, embedded SQL, or macros. ObjectScript is InterSystems-proprietary; the ANSI committee did not adopt it.
+2. **The grammars are incompatible.** A pure-ANSI MUMPS parser cannot parse a typical `.cls` file. The `Class`, `Property`, `Method`, `Parameter`, `Index`, and `Storage` keywords, and the `&sql(...)`, `&js<...>`, `##class(...)`, `..Property`, and `$this` tokens, have no ANSI definition.
+3. **The compilation model is different.** ANSI MUMPS is routine-based: the `.m` file is the unit of compilation. ObjectScript adds a class-compilation phase that generates `.int` routines from `.cls` definitions. The class layer has no ANSI counterpart.
+4. **Embedded SQL and embedded Python are out of scope for ANSI MUMPS.** They are first-class in ObjectScript but have no place in the ANSI grammar or runtime model.
+
+**Bottom line:** ObjectScript is a different language built on top of MUMPS. For the rest of this document, "MUMPS code" means ANSI-flavoured M (`.m` source, or `.mac` source restricted to ANSI features) that runs on any conformant engine. "ObjectScript code" means IRIS-extended code (`.cls` classes, or `.mac` routines using ObjectScript tokens) that runs only on IRIS. **A tool that handles ObjectScript does not necessarily handle MUMPS, and vice versa.**
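+
+A MUMPS-only tool therefore has to know whether a routine strays into IOS before it can trust its own analysis. A rough first pass is a token scan for the constructs listed above. The Python sketch below is a heuristic under explicit assumptions, not a parser. The pattern list is illustrative and incomplete. Matches inside string literals or comments produce false positives. The dot-block handling covers only unlabelled lines. `ios_constructs` is a hypothetical helper.
+
+```python
+import re
+
+# Heuristic patterns for IOS-only constructs named in this section.
+IOS_PATTERNS = {
+    "##class": re.compile(r"##class\s*\(", re.IGNORECASE),
+    "##super": re.compile(r"##super\s*\(", re.IGNORECASE),
+    "&sql": re.compile(r"&sql\s*\(", re.IGNORECASE),
+    "$this": re.compile(r"(?<!\$)\$this\b", re.IGNORECASE),
+    "$$$macro": re.compile(r"\$\$\$%?\w+"),
+    "..member": re.compile(r"\.\.%?[A-Za-z]"),
+}
+# Leading whitespace plus ANSI dot-block level markers, e.g. " . . " or " ..".
+DOT_BLOCK_PREFIX = re.compile(r"^\s+[.\s]*")
+
+def ios_constructs(source: str) -> set[str]:
+    """Return the names of IOS-only constructs found in `source` (heuristic)."""
+    found: set[str] = set()
+    for line in source.splitlines():
+        # Strip dot-block markers so ANSI " ..S X=1" is not read as "..member".
+        body = DOT_BLOCK_PREFIX.sub("", line, count=1)
+        for name, pattern in IOS_PATTERNS.items():
+            if pattern.search(body):
+                found.add(name)
+    return found
+
+print(ios_constructs(" set obj=##class(Pkg.Foo).%New()"))  # {'##class'}
+print(ios_constructs(" ..S X=1"))                          # set()
+```
+
+The dot-block stripping exists because ` ..S X=1` is legal ANSI, which is the same lexical overlap with `..Property` that the method-dispatch row describes.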
+
+#### 4.1.2 File extensions in IRIS source code
+
+| Extension | Contents | Language | Notes |
+|-----------|----------|----------|-------|
+| `.cls` | ObjectScript class definition | ObjectScript | Compiled by the class compiler. Generates `.int` routines. **Not parseable as ANSI MUMPS.** |
+| `.mac` | Macro routine — most common form of routine code | ObjectScript or MUMPS | Can use plain MUMPS or ObjectScript routine syntax. Macros (`$$$Foo`) are expanded before compilation. The bulk of "routine-layer" IRIS code lives here. |
+| `.int` | Intermediate routine — post-macro-expansion | ObjectScript or MUMPS | Auto-generated from `.cls` (always) and `.mac` (after macro expansion). Editable but not the canonical source-of-truth. |
+| `.inc` | Macro / include definitions | ObjectScript | `#define Foo expr` definitions (referenced as `$$$Foo`), included by `.mac` routines via `#include` and by `.cls` classes via the `Include` keyword. |
+| `.m` | ANSI MUMPS routine | MUMPS (ANSI) | Recognised on import; **stored internally as a MAC routine.** Rarely the source-of-truth in IRIS-native projects. |
+| `.bas`, `.mvb`, `.mvi` | Caché Basic / MultiValue Basic | Basic | Legacy. Out of scope here. |
+| `.csp` | Caché Server Pages | Mixed (HTML + ObjectScript) | Server-side templating; legacy. |
+| `.dfi`, `.lut`, `.pivot` | DeepSee / Analytics artefacts | XML metadata | Out of scope for source-code analysis. |
+| `.xml`, `.gof` | Export bundle formats | Wrapper | Used for source-control export and database import/export, not as direct edit targets. |
+
+In a typical IRIS-native project, source code is overwhelmingly `.cls` (ObjectScript classes) plus some `.mac` (routines, usually with ObjectScript tokens). **Pure `.m` ANSI MUMPS files are uncommon** — they appear mainly in projects that maintain cross-engine portability with YottaDB or in legacy VistA imports.
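+
+A quick way to tell which situation a repository is in is to bucket its files by extension, using the language column of the table above. Here is a minimal Python sketch. The `triage` helper and the bucket labels are hypothetical. `.mac` and `.int` files can only be classified by inspecting their content, so they get a bucket of their own.
+
+```python
+from collections import Counter
+from pathlib import PurePath
+
+# Language column of the extension table above; labels are for this sketch only.
+LANGUAGE_BY_EXT = {
+    ".cls": "IOS",
+    ".inc": "IOS",
+    ".mac": "IOS or MUMPS",  # container: only the content tells you which
+    ".int": "IOS or MUMPS",
+    ".m": "MUMPS (ANSI)",
+}
+
+def triage(paths: list[str]) -> Counter:
+    """Count files per language bucket; unlisted extensions count as 'other'."""
+    return Counter(
+        LANGUAGE_BY_EXT.get(PurePath(p).suffix.lower(), "other") for p in paths
+    )
+
+print(triage(["src/Pkg/Foo.cls", "src/Pkg/Bar.cls", "rtn/XUP.m", "rtn/ABC.mac"]))
+```
+
+Files in the "IOS or MUMPS" bucket need a content-level check before any MUMPS-only tooling is trusted on them.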
+
+#### 4.1.3 IRIS tooling, by file scope and language
+
+The table annotates each IRIS tool along **three** orthogonal axes: the file scope (which IRIS source format it touches), whether it genuinely supports IOS code, and whether it genuinely supports MUMPS code. **File scope and language are not the same thing:** a `.mac` routine is a *container*, not a *language*. The same `.mac` slot can hold either pure ANSI MUMPS or hand-written IOS, and importing a `.m` file into IRIS produces a MAC routine without changing a single byte of the source. A tool that "operates on `.mac` files" may or may not actually understand MUMPS. That depends on whether the tool needs IOS constructs (`$this`, `..Property`, `&sql(...)`, `##class()`, `///` doc comments, `$$$Foo` macros) to do useful work.
+
+**File scope** — what kind of source format does the tool ingest?
+- **`.cls`** — IOS class file
+- **`.mac/.int`** — routine-layer file (any language)
+- **`.m`** — ANSI MUMPS native source
+- **engine** — operates on compiled bytecode, the database, or the engine itself
+
+**Language columns** — does the tool genuinely support each language?
+- <span style="color:#22863a;font-weight:bold">✔</span> — yes, the tool supports this language. (Where support is degraded for one language vs the other, the Notes column elaborates.)
+- <span style="color:#cb2431;font-weight:bold">✘</span> — no, the tool does not meaningfully support this language (or no tool exists in this row).
+
+**IOS = IRIS ObjectScript** (formerly Caché ObjectScript / COS); see [§4.1.1](#411-iris-objectscript-ios-what-it-is-and-why-it-isnt-ansi-standard-mumps) for the language definition. The IOS column comes first, since IOS is the engine's primary developer-facing language; MUMPS is second to make the IOS-vs-MUMPS asymmetry visible at a glance.
+
+| Category | What ships with IRIS | File scope | IOS | MUMPS | Notes |
+|----------|----------------------|------------|:---:|:-----:|-------|
+| Runtime / REPL | `iris session`, `iris terminal` | `.mac/.int`, engine | <span style="color:#22863a;font-weight:bold">✔</span> | <span style="color:#22863a;font-weight:bold">✔</span> | Interactive prompt accepts either IOS or MUMPS commands; limited history. |
+| IDE | **VS Code ObjectScript extension** | `.cls`, `.mac/.int`, `.inc` | <span style="color:#22863a;font-weight:bold">✔</span> | <span style="color:#cb2431;font-weight:bold">✘</span> | **By definition an IOS extension** — the IntelliSense, completion, navigation, refactoring, and class-aware features all target IOS. `.m` files can be opened, but the extension provides no MUMPS-aware features for them; the experience is a bare text editor. (InterSystems Studio, the now-deprecated Windows-only IDE, had the same posture.) |
+| Class compiler | `$SYSTEM.OBJ.Compile`, `##class(%SYSTEM.OBJ).Compile` | `.cls` only | <span style="color:#22863a;font-weight:bold">✔</span> | <span style="color:#cb2431;font-weight:bold">✘</span> | Compiles classes into `.int` routines. **Cannot run on `.m` files** — there is no class to compile. |
+| Routine compiler | On save or load (e.g. `ZSAVE`, `$SYSTEM.OBJ.Load` with the `c` flag) | `.mac/.int`, `.m` | <span style="color:#22863a;font-weight:bold">✔</span> | <span style="color:#22863a;font-weight:bold">✔</span> | Compiles whatever routine code is loaded, IOS or MUMPS, to object code. Reports MUMPS-level syntax errors. Language-neutral within the MUMPS family. |
+| Linting | VS Code extension diagnostics | `.cls`, `.mac` | <span style="color:#22863a;font-weight:bold">✔</span> | <span style="color:#cb2431;font-weight:bold">✘</span> | Surfaces class-compile errors and IOS-only checks; no MUMPS-aware linting. (`^XINDEX` is **not** IRIS-shipped — it is a VA-provided community package; see [§4.3](#43-common-across-both-engines).) |
+| Formatting | **None official** | — | <span style="color:#cb2431;font-weight:bold">✘</span> | <span style="color:#cb2431;font-weight:bold">✘</span> | No `gofmt`/`prettier` analogue at any layer or for any language. |
+| Test runner | **`%UnitTest`** framework | `.cls` only | <span style="color:#22863a;font-weight:bold">✔</span> | <span style="color:#cb2431;font-weight:bold">✘</span> | Requires extending `%UnitTest.TestCase` (an IOS class). **Cannot test a `.m` routine directly** — only via a class wrapper that calls into the routine. |
+| Coverage | `%UnitTest.Coverage` (line-level) | `.mac/.int` (instrumented) | <span style="color:#22863a;font-weight:bold">✔</span> | <span style="color:#22863a;font-weight:bold">✔</span> | Instruments compiled routines regardless of source language. The instrumentation sees compiled bytecode, not source — it does not care whether the routine was originally IOS or MUMPS. **Driver, however, must be a `%UnitTest` class** (IOS). |
+| Benchmarking | **None standardised** | — | <span style="color:#cb2431;font-weight:bold">✘</span> | <span style="color:#cb2431;font-weight:bold">✘</span> | Ad-hoc `$ZHOROLOG`. |
+| Profiling | **`^%SYS.MONLBL`** (line-by-line); `^pButtons` / `^SystemPerformance` | `.mac/.int`, engine | <span style="color:#22863a;font-weight:bold">✔</span> | <span style="color:#22863a;font-weight:bold">✔</span> | Profiles compiled routines. Language-neutral: runs equally well over a hand-written MUMPS routine, an imported `.m`, or an `.int` generated from `.cls`. |
+| Debugging — terminal | `ZBREAK`, `BREAK`, `ZWRITE` | engine | <span style="color:#22863a;font-weight:bold">✔</span> | <span style="color:#22863a;font-weight:bold">✔</span> | Pure runtime primitives (breakpoints, stepping, variable dumps). Work on any compiled routine regardless of source language. |
+| Debugging — IDE | Studio debugger, VS Code debugger via DAP | `.cls`, `.mac/.int`, `.m` | <span style="color:#22863a;font-weight:bold">✔</span> | <span style="color:#22863a;font-weight:bold">✔</span> | Underlying step/breakpoint mechanics are engine-level, but UI affordances (variable inspection, expression evaluation, source mapping) are tuned for IOS classes. Pure MUMPS works in degraded form. |
+| Documentation | **Documatic**, `class.View()` | `.cls` only | <span style="color:#22863a;font-weight:bold">✔</span> | <span style="color:#cb2431;font-weight:bold">✘</span> | Requires `///` doc comments **and** class-level metadata. **No documentation generator exists for MUMPS routine code at all**, regardless of file scope. |
+| Dependency mgmt | **IPM** (InterSystems Package Manager, formerly **ZPM**) | `.cls`-centric | <span style="color:#22863a;font-weight:bold">✔</span> | <span style="color:#cb2431;font-weight:bold">✘</span> | `module.xml` manifest treats classes as the unit of distribution. A MUMPS-only project has no natural manifest unit. |
+| Build / tasks | `$SYSTEM.OBJ.LoadDir`, `Installer.cls`, Makefile around `iris session` | `.cls`, `.mac/.int` | <span style="color:#22863a;font-weight:bold">✔</span> | <span style="color:#cb2431;font-weight:bold">✘</span> | The IRIS build / install pattern is **`Installer.cls`** — by definition a class-based, IOS-only construct. There is no MUMPS-routine-shaped equivalent. Raw `iris session` invocations can be wrapped in a Makefile to load `.m` files manually, but that is a hand-rolled bypass of the IRIS build model, not first-class MUMPS support. |
+| Source control | `%Studio.SourceControl.*`; community git integrations | `.cls`, `.mac/.int` | <span style="color:#22863a;font-weight:bold">✔</span> | <span style="color:#cb2431;font-weight:bold">✘</span> | **IRIS stores all routines and classes inside its proprietary database (in a global), *not* on the filesystem.** This is the single most consequential workflow detail in the IRIS column. Source control therefore requires an explicit **export → filesystem → git → import** round-trip on every iteration: the developer edits inside IRIS, exports to `.cls` / `.mac` / `.m` text, commits to git, and on the consuming side imports the text back into the IRIS database before the code can run. There is **no filesystem-resident development model** comparable to Python, Go, Rust, or YottaDB (where `.m` source files on disk *are* the routines). The export/import dance is a serious productivity drag for both languages, but it is **worse for MUMPS**: the `%Studio.SourceControl.*` hooks and the VS Code extension's server-side editing are designed around IOS classes, leaving MUMPS-routine round-trips largely manual. Day-to-day MUMPS development on IRIS effectively reduces to: dump routines from the global → version-control on git → re-load to the global to test. |
+| Pre-commit hooks | **None standardised** | — | <span style="color:#cb2431;font-weight:bold">✘</span> | <span style="color:#cb2431;font-weight:bold">✘</span> | Ad-hoc. |
+| CI script | Docker image + `iris session` | `.cls`, `.mac/.int` | <span style="color:#22863a;font-weight:bold">✔</span> | <span style="color:#22863a;font-weight:bold">✔</span> | Official Docker images make CI feasible regardless of source language. The harness inside (class compile, `%UnitTest`) is IOS-shaped. |
+| Snapshot testing | **None standardised** | — | <span style="color:#cb2431;font-weight:bold">✘</span> | <span style="color:#cb2431;font-weight:bold">✘</span> | Ad-hoc within `%UnitTest`. |
+| Foreign-language API | **Native API** (.NET, Java, Python, Node.js) + embedded Python | `.cls`, engine | <span style="color:#22863a;font-weight:bold">✔</span> | <span style="color:#cb2431;font-weight:bold">✘</span> | Native API calls dispatch through IOS methods; embedded Python lives inside IOS methods. **Not callable from a pure `.m` routine.** |
+| System administration | **System Management Portal (SMP)** | engine | <span style="color:#22863a;font-weight:bold">✔</span> | <span style="color:#22863a;font-weight:bold">✔</span> | Web admin UI: namespaces, users, journal, replication. Independent of source language. |
+| Database export / import | `$SYSTEM.OBJ.Export/Import`, journal replication | engine | <span style="color:#22863a;font-weight:bold">✔</span> | <span style="color:#22863a;font-weight:bold">✔</span> | Container/engine level. Bundles whichever routines and classes exist in the namespace. |
+| Embedded SQL | `&sql(...)` blocks | `.cls` only | <span style="color:#22863a;font-weight:bold">✔</span> | <span style="color:#cb2431;font-weight:bold">✘</span> | IOS-only construct. Cannot appear in an ANSI MUMPS routine. |
+| Containerised deployment | Official Docker images, Kubernetes kits | engine | <span style="color:#22863a;font-weight:bold">✔</span> | <span style="color:#22863a;font-weight:bold">✔</span> | Engine-level packaging. |
+
+**Key observation about IRIS tooling and language:**
+
+- **Language-aware tooling in IRIS is overwhelmingly ObjectScript-targeted.** The class compiler, `%UnitTest`, Documatic, IPM, Studio's smart features, the VS Code extension's IntelliSense, the Native API, embedded Python, and embedded SQL all require ObjectScript constructs to be useful.
+- **`^XINDEX` does not ship with IRIS at all.** It is a VistA Toolkit routine (pure M source, from the VA's Kernel package) that happens to be present in any IRIS-based VistA installation because VistA itself brings it. The same routine runs identically on YottaDB. It is the closest thing to a MUMPS-aware linter in the M ecosystem today, but it is a VistA artefact, not a vendor tool of either engine.
+- **The remainder are engine-level**: the routine compiler, `^%SYS.MONLBL`, `%UnitTest.Coverage`'s instrumentation, `ZBREAK`/`ZSHOW`/`ZSTEP`, journal export/import, and SMP. These are language-neutral because they operate below the language layer (compiled bytecode, the database, or the running process). They work on a 40,000-routine VistA codebase as readily as on an ObjectScript application — but they tell you nothing about the source language and provide no source-language-aware guidance.
+
+The MAC-routine container does not magically transform MUMPS code into ObjectScript. Importing a `.m` routine into IRIS produces a MAC routine slot that holds *MUMPS code*, and tools that genuinely understand MUMPS (essentially `^XINDEX` only, and only when VistA — or a standalone Toolkit install — is present) treat it as MUMPS. Tools that require ObjectScript constructs simply have nothing to do with that routine.
+
+**The IRIS toolchain in one sentence:** comprehensive, IDE-centric, gated by commercial licensing, and **ObjectScript-targeted at the language layer** — strong for class-based development, language-neutral at the engine layer (profiler, admin UI, journal), and offering **no first-party MUMPS-aware tooling at all** (the `^XINDEX` static analyser, often cited in this context, is a VistA Toolkit routine — not an InterSystems tool).
+
+### 4.2 YottaDB
+
+YottaDB sits at the open-source end. The runtime is feature-complete and POSIX-compliant; the **C API (`libyottadb.so`)** is the principal extensibility surface, and most non-M language support comes through bindings on top of it. There is no first-party IDE.
+
+The table below mirrors the format of [§4.1.3 (IRIS tooling)](#413-iris-tooling-by-file-scope-and-language) so the two engines can be compared row-by-row. Because YottaDB is **MUMPS-only** — there is no IOS-equivalent layer to score separately — the language column is just **MUMPS**.
+
+**File scope** — what kind of source format or runtime layer does the tool touch?
+- **`.m`** — ANSI MUMPS routine source
+- **engine** — operates on compiled bytecode, the database, or the engine itself
+- **bindings** — operates via the C API with host-language bindings (`libyottadb.so`)
+- **—** — no tool exists in this category
+
+**MUMPS column** — does the tool genuinely support MUMPS code?
+- <span style="color:#22863a;font-weight:bold">✔</span> — yes, the tool supports MUMPS code (or is engine-level / language-neutral and applies to MUMPS).
+- <span style="color:#cb2431;font-weight:bold">✘</span> — no first-party tool exists. (Community-supplied tools — `^XINDEX`, `%ut`, KIDS — are inventoried in [§4.3](#43-common-across-both-engines), not here.)
+
+| Category | What ships with YottaDB | File scope | MUMPS | Notes |
+|----------|-------------------------|------------|:-----:|-------|
+| Runtime / REPL | `ydb` direct mode; `ydb -run %XCMD "code"` | engine | <span style="color:#22863a;font-weight:bold">✔</span> | REPL is bare: no history, no completion, no multi-line editing. `%XCMD` is the foundation of all shell wrappers. |
+| Syntax check / routine compiler | `ZCOMPILE` via `%XCMD` | `.m` | <span style="color:#22863a;font-weight:bold">✔</span> | Compile-only; reports syntax errors. No type system. |
+| Linting | **None** | — | <span style="color:#cb2431;font-weight:bold">✘</span> | No analogue to `ruff` / `clippy`. (`^XINDEX` is community-supplied — see [§4.3](#43-common-across-both-engines).) |
+| Formatting | **None** | — | <span style="color:#cb2431;font-weight:bold">✘</span> | No `gofmt` analogue. |
+| Test runner | **None first-party** | — | <span style="color:#cb2431;font-weight:bold">✘</span> | OSEHRA `%ut` (M-Unit) is the de-facto community framework — see [§4.3](#43-common-across-both-engines). |
+| Coverage | **None first-party** | — | <span style="color:#cb2431;font-weight:bold">✘</span> | `ZBREAK`-based community implementations exist; nothing canonical. |
+| Benchmarking | `$ZHOROLOG` primitive only | engine | <span style="color:#cb2431;font-weight:bold">✘</span> | A microsecond timer is available; no `criterion`-style harness. |
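+| Profiling | `VIEW "TRACE"` (M-profiling) | engine | <span style="color:#22863a;font-weight:bold">✔</span> | Records line-level execution counts and timings into a global named by the caller. It is the nearest analogue to IRIS's `^%SYS.MONLBL`. No report viewer ships with it, so analysis is manual. |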
+| Debugging — terminal | `ZBREAK`, `ZSTEP INTO/OVER/OUTOF`, `ZSHOW "V/G/L/S/A"`, `ZWRITE`, `ZPRINT`, `ZCONTINUE`, `ZGOTO`, `$STACK`, `$ZPOSITION` | engine | <span style="color:#22863a;font-weight:bold">✔</span> | Powerful but interactive and manual. |
+| Debugging — IDE / DAP | **None** | — | <span style="color:#cb2431;font-weight:bold">✘</span> | No first-party IDE or DAP server. |
+| Documentation | **None integrated** | — | <span style="color:#cb2431;font-weight:bold">✘</span> | Comments in source; no `godoc` analogue ships with YDB. |
+| Dependency mgmt | **None** | — | <span style="color:#cb2431;font-weight:bold">✘</span> | No package manager. Source is shipped as `.m` files manually. (KIDS is community-supplied — see [§4.3](#43-common-across-both-engines).) |
+| Build / tasks | **None integrated** | — | <span style="color:#cb2431;font-weight:bold">✘</span> | No build system. Teams roll their own with `make` + `ydb -run`. |
+| Source control | Plain git on the routine directory | `.m` | <span style="color:#22863a;font-weight:bold">✔</span> | **Filesystem-resident:** `.m` source files on disk *are* the routines. No export / import dance — git just works on the routine directory. **Direct contrast with IRIS**, where routines live inside the database and source control requires an export → git → import round-trip on every iteration (see [§4.1.3 source-control row](#413-iris-tooling-by-file-scope-and-language)). |
+| Pre-commit hooks | **None integrated** | — | <span style="color:#cb2431;font-weight:bold">✘</span> | Standard git hooks are usable; nothing M-aware. |
+| CI script | **None integrated** | — | <span style="color:#cb2431;font-weight:bold">✘</span> | Standard CI runners over `make` + `ydb -run`. |
+| Snapshot testing | **None** | — | <span style="color:#cb2431;font-weight:bold">✘</span> | Ad-hoc. |
+| Foreign-language API | **`libyottadb.so` C API** + bindings: Go (official), Python (official), Node.js (community), Rust (community), Lua (community), Perl (community) | engine + bindings | <span style="color:#22863a;font-weight:bold">✔</span> | Stable C API is the major extensibility win. Calls are bidirectional: M ↔ host language. See [§4.4](#44-foreign-language-integration-embedded-language-vs-embedded-database). |
+| Database management | **`mupip`** (extract / load / size / integ / backup / restore / rundown / journal / set / trigger / freeze / replicate) | engine | <span style="color:#22863a;font-weight:bold">✔</span> | The single most powerful and underused YDB utility. `mupip extract` / `load` are the foundation of any fixture management system. |
+| Global directory | **`gde`** | engine | <span style="color:#22863a;font-weight:bold">✔</span> | Configures which globals live in which database files. |
+| Lock examination | **`lke`** | engine | <span style="color:#22863a;font-weight:bold">✔</span> | Inspect and forcibly clear `LOCK` entries from crashed processes. |
+| Database structure / recovery | **`dse`** | engine | <span style="color:#22863a;font-weight:bold">✔</span> | Block-level recovery editor. Dangerous; recovery scenarios only. |
+| Utility routines | **`%GO`**, **`%GI`**, **`%GSEL`**, **`%RD`**, **`%RSEL`**, **`%ZDATE`**, **`%ZCRC`**, **`%ZMVALID`**, **`%XCMD`**, **`%ZTRIGGER`** | engine | <span style="color:#22863a;font-weight:bold">✔</span> | Ship inside `$YDB_DIST`. Cover global I/O, routine listing, date/time, CRC, identifier validation, trigger management. |
+| SQL | **Octo** (separate package) | separate runtime | <span style="color:#cb2431;font-weight:bold">✘</span> | A SQL-on-YottaDB layer — *not part of the core distribution*. Roughly comparable in surface to IRIS's embedded SQL but sits as a separate runtime. |
+| Containerised deployment | Community Docker images; YottaDB AWS / GCP marketplace listings | engine | <span style="color:#22863a;font-weight:bold">✔</span> | Adequate; less polished than IRIS's official kits. |
+| System administration — web UI | **YDBGUI** ([gitlab.com/YottaDB/UI/YDBGUI](https://gitlab.com/YottaDB/UI/YDBGUI)) — Vue.js front-end on an **M backend**, served by the **YDB Web Server** plugin. Companion projects: **YDBGDEGUI** (Global Directory Editor GUI) and **YDBAdminOpsGUI** (admin / ops dashboard). | engine | <span style="color:#22863a;font-weight:bold">✔</span> | Shipped with YottaDB r1.36 (2022). Dashboard, real-time process / global statistics, database admin and monitor surfaces. Younger and narrower in scope than IRIS's SMP, but actively developed. The fact that the backend is itself written in M is a deliberate "M-first" architectural choice. |
+| System administration — shell | `mupip`, `gde`, `lke`, `dse`, environment variables, `.envrc`-style setup | engine | <span style="color:#22863a;font-weight:bold">✔</span> | Composable POSIX surface; fully scriptable. Unfamiliar to non-Unix admins, which is what motivated YDBGUI. |
+
+**The YottaDB toolchain in one sentence:** mature, open-source, runtime-first, and POSIX-composable — strong on the engine / database / admin layer (mupip, gde, lke, dse, YDBGUI, the C API), with **filesystem-resident routines** that play naturally with standard git and POSIX tooling — but most developer-experience layers (linter, formatter, IDE, test runner, coverage, docs, package manager) are simply absent and have to be built or sourced from the community.
+
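+The table notes that teams roll their own wrappers around `ydb -run %XCMD`. Below is a minimal Python sketch of such a wrapper. It assumes `ydb` is on `PATH` and that the process environment (`ydb_gbldir`, `ydb_routines`) is already configured. The helper names `build_xcmd_argv` and `run_m` are hypothetical and are not part of any YottaDB distribution.
+
+```python
+import shlex
+import subprocess
+
+
+def build_xcmd_argv(m_code: str, ydb: str = "ydb") -> list[str]:
+    """Return the argv that runs one line of M code through %XCMD (hypothetical helper)."""
+    # %XCMD executes its single argument as M code in a fresh process.
+    return [ydb, "-run", "%XCMD", m_code]
+
+
+def run_m(m_code: str, ydb: str = "ydb") -> str:
+    """Run M code in a new YottaDB process and return its standard output."""
+    # Assumes ydb is on PATH and ydb_gbldir / ydb_routines are already set.
+    result = subprocess.run(
+        build_xcmd_argv(m_code, ydb),
+        capture_output=True,
+        text=True,
+        check=True,  # raise CalledProcessError on a non-zero exit status
+    )
+    return result.stdout
+
+
+if __name__ == "__main__":
+    # Show the equivalent shell command, quoted safely for copy-paste.
+    print(shlex.join(build_xcmd_argv('write $zyrelease,!')))
+```
+
+The wrapper passes an argument list to `subprocess.run` instead of a shell string. This keeps M's own double quotes from colliding with shell quoting. `shlex.join` is used only to display the equivalent command line.
+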
+### 4.3 Common across both engines
+
+A handful of capabilities are consistently available either because they come from the M language itself or because they ship as third-party / community / VistA-provided M source that runs on any conformant engine:
+
+**From the M language itself**
+
+- **Interactive direct mode** with manual breakpoints (`ZBREAK`) and inspection (`ZSHOW`, `ZWRITE`).
+- **Microsecond timing** via `$ZHOROLOG` — sufficient for hand-rolled benchmarks.
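+- **Trigger primitives** (global triggers via `MUPIP TRIGGER` / `$ZTRIGGER()` in YDB; class- and SQL-level triggers in IRIS) for instrumentation patterns that don't require a separate profiler. These are engine extensions rather than ANSI M, and the two models differ: IRIS triggers fire on SQL / object filing, not on direct global updates.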
+- **Plain-text routine source** that any text editor or CI runner can handle, even without M-aware tooling.
+- **A stable, decades-old language standard** — code written against the ANSI / pragmatic core ports between engines without rewrites.
+
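+Hand-rolled benchmarks usually sample `$ZHOROLOG` before and after a block and then subtract. The Python sketch below shows that arithmetic for post-processing logged samples. It assumes the four-piece YottaDB format (days, seconds, microseconds, UTC offset) and that both samples share the same offset. `zh_to_us` and `elapsed_us` are hypothetical helpers, and other engines format `$ZHOROLOG` differently.
+
+```python
+US_PER_SECOND = 1_000_000
+US_PER_DAY = 86_400 * US_PER_SECOND
+
+
+def zh_to_us(zh: str) -> int:
+    """Microseconds since the $HOROLOG epoch; the UTC-offset piece is ignored."""
+    days, secs, usecs, _offset = (int(piece) for piece in zh.split(","))
+    return days * US_PER_DAY + secs * US_PER_SECOND + usecs
+
+
+def elapsed_us(start: str, end: str) -> int:
+    """Elapsed microseconds between two $ZHOROLOG samples with the same offset."""
+    return zh_to_us(end) - zh_to_us(start)
+
+
+# Samples as they might be logged around a benchmarked block in M
+print(elapsed_us("66000,100,500,18000", "66000,101,250,18000"))  # 999750
+```
+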
+**From the VA / VistA ecosystem (not vendor-shipped)**
+
+- **`^XINDEX`** — the VistA Toolkit static analyser (the 17 `XINDX*` routines from the VA Kernel package, in `WorldVistA/VistA-M`). Pure M source; runs identically on YottaDB and IRIS. The closest thing to a MUMPS-aware linter that exists today: it understands ANSI MUMPS, enforces the VA Standards & Conventions (SAC) rule set, and rejects ObjectScript-only constructs as non-conformant. Bundled with any VistA deployment; can also be installed standalone.
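+- **OSEHRA `%ut`** (M-Unit) — the de-facto community testing framework for pure MUMPS routines. Pure M source; runs on either engine.
+- **KIDS** (Kernel Installation and Distribution System): the VistA Kernel's mechanism for packaging and installing routines, globals, and FileMan components as a unit. Pure M source and the nearest thing the M world has to a package format, but it assumes a VistA Kernel is installed.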
+
+**From the OSS community (cross-engine)**
+
+- **OSEHRA / WorldVistA tooling** — additional M-source utilities for code archaeology, indentation, and routine inspection. Quality varies; not vendor-supported.
+
+---
+
+### 4.4 Foreign-language integration: "embedded language" vs "embedded database"
+
+The §4.1 and §4.2 tables list foreign-language integration as a single row, but the two engines take **architecturally inverse approaches** that deserve a closer look. InterSystems markets IRIS's runtime hosting of Python (since 2021.1) under the label **"Embedded Python"**. YottaDB ships a stable C API (`libyottadb.so`) with first-party bindings for Python and Go, plus community bindings for Rust, Node.js, Lua, and Perl. Both put the foreign language and the M engine in the same OS process. Neither pays IPC or serialisation overhead. **They are not the same thing.**
+
+#### "Embedded" is a technical term of art, not marketing
+
+Embedding a language interpreter into a host program is a well-established architectural pattern with decades of precedent: Tcl was designed for embedding in the late 1980s; Lua's defining design property is that it is embedded into host applications; CPython's C API explicitly distinguishes [embedding](https://docs.python.org/3/extending/embedding.html) from extending; V8 is embedded in Chrome, Node.js, and Deno; SQLite is the canonical embedded database. The term is not marketing.
+
+**Definition (technical):** *an embedded runtime is one that is loaded as a library into another program's address space, runs under that program's process and threading control, and exposes an API the host can use to evaluate code, call functions, and exchange data.* The host owns the lifecycle; the embedded runtime is the guest.
+
+#### The architectural inversion
+
+The difference between the two integration models is **which side is the host**:
+
+| Aspect | IRIS Embedded Python | YottaDB C-API integration |
+|--------|---------------------|---------------------------|
+| Host process | IRIS server | The user's Python (or Rust, Go, Lua, …) program |
+| Embedded runtime | CPython, loaded into the IRIS process | YottaDB, loaded as `libyottadb.so` into the user's program |
+| Who owns lifecycle, threading, scheduling | IRIS | The host language's program |
+| Inline source-file integration | Python in `.cls` methods (`[Language=python]`); SQL via `&sql(...)` | None — host language has its own source files; M routines invoked via `ydb_ci()` (call-in) |
+| Cross-language call mechanism | ObjectScript ↔ Python proxy layer | Direct C FFI: `ydb_set_s`, `ydb_get_s`, `ydb_subscript_next_s`, etc. |
+| Foreign-language package ecosystem | Operates inside an IRIS-managed environment | Normal PyPI / crates.io / npm / Go module proxy |
+| Tooling for the foreign language | pdb, pip, IDE debug all IRIS-mediated; unusual for a Python developer | Host's normal tooling works unchanged |
+| Threading model | IRIS owns threads; Python GIL applies | Host owns threads; YDB's threaded API (`ydb_*_st()` functions) accepts calls from multiple threads, though the engine serialises them internally |
+| Licence posture | Commercial IRIS licence required | AGPL-3.0 on YDB; host language unconstrained |
+| Foreign-language coverage | Python only (+ SQL, JavaScript via CSP) | Any language with a C FFI; bindings currently exist for Go, Python, Rust, Node.js, Lua, and Perl (see §4.2) |
+| Foreign object persistence | Python objects can *be* IRIS objects (auto-mapped to globals via class storage) | Host objects are not persisted — globals are a separate, deliberate API surface |
+
+Both architectures have decades of precedent in other systems:
+
+- **IRIS's model** — a database server that hosts a foreign-language interpreter — is the same shape as **PostgreSQL's PL/Python**, **Oracle's Java stored procedures**, and **SQL Server's CLR integration**. The database drives.
+- **YottaDB's model** — the database as a library linked into the host program — is the same shape as **SQLite**, **RocksDB**, **LevelDB**, **LMDB**, and **DuckDB**. These are routinely described as *embedded databases*. The host language drives.
+
+So **both models embed something in something else**. IRIS embeds a language inside the database; YottaDB embeds the database inside a language. The labels "Embedded Python" (IRIS) and "C-API bindings" (YDB) describe these inverse architectures.
+
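+The "database as a library inside the host program" shape can be seen with SQLite. The text cites SQLite as the canonical embedded database, and it ships in Python's standard library. The example below is only an analogy, not YottaDB code, and YottaDB's binding exposes a different, globals-oriented API. It shows the host program opening, driving, and closing the database entirely inside its own process.
+
+```python
+import sqlite3
+
+
+def demo() -> list[tuple[int, str]]:
+    # No server and no IPC: the database engine is a library in this process.
+    conn = sqlite3.connect(":memory:")
+    try:
+        conn.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT)")
+        conn.executemany(
+            "INSERT INTO patient (id, name) VALUES (?, ?)",
+            [(1, "Ada"), (2, "Grace")],
+        )
+        # The host decides when to query, on which thread, and when to stop.
+        return conn.execute("SELECT id, name FROM patient ORDER BY id").fetchall()
+    finally:
+        conn.close()  # host-controlled teardown
+
+
+print(demo())  # [(1, 'Ada'), (2, 'Grace')]
+```
+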
+#### A terminology test
+
+The conventional **`X-on-Y`** idiom for software stacks names the data foundation last: *Python-on-Postgres*, *Rails-on-Postgres*, *Django-on-MySQL*. The application is `X`; the database is `Y`.
+
+- **YottaDB's model fits this idiom cleanly.** A Python application using YottaDB as its data layer is **Python-on-YDB**. Same for **Rust-on-YDB**, **Go-on-YDB**, **Node-on-YDB**, **Lua-on-YDB**. The host language is the application; YDB is the data foundation. Stack idiom works.
+- **IRIS's model breaks the idiom.** IRIS is not a database that Python applications run on top of — IRIS is a host runtime that *encapsulates* Python as an embedded guest interpreter. Neither *Python-on-IRIS* nor *IRIS-on-Python* captures the relationship correctly. The natural phrasing is **"IRIS encapsulating Python"** (or *"Python embedded in IRIS"*, *"IRIS hosting Python"*).
+
+The fact that `X-on-Y` works for one model and not the other is itself diagnostic of the architectural inversion. Stack idioms describe relationships of dependency-on-foundation; encapsulation idioms describe relationships of host-and-guest. The two models are different *shapes*, not just different orientations of the same shape, and the language we naturally reach for reflects that.
+
+#### Performance
+
+Both models are in-process and avoid IPC / serialisation. For pure compute, neither has an architectural advantage. The dominant cost is **cross-language call overhead** (typically sub-microsecond per call on modern hardware), and that cost is comparable in both models. Workloads that respect the "do bulk work on one side of the boundary, then cross" rule perform well in either model; chatty workloads suffer in both.
+
+What dominates real workloads differs:
+
+- **Python-on-YDB:** a typical hot loop traverses an M global from Python (`for sub in ydb.subscripts("^Patient", []): …`). Each iteration crosses the FFI boundary once. Performance is dominated by *number of round-trips*, which binding authors can amortise by exposing bulk / iterator APIs.
+- **IRIS Embedded Python:** a typical hot loop iterates over an ObjectScript class's properties (`for prop in obj.Properties: …`). Each iteration crosses the proxy layer once. Performance is dominated by the proxy-cache hit rate.
+
+Neither architecture is faster in the abstract; the question is whether your workload's natural API shape lines up with the model's strengths.
+
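+A toy counter illustrates the "do bulk work on one side, then cross" rule. `CountingBinding`, `chatty`, and `bulk` are hypothetical. The class stands in for a binding and merely counts calls, so it shows the number of boundary crossings, not their real cost. Real bindings differ in which bulk or iterator APIs they actually expose.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class CountingBinding:
+    """Hypothetical binding stand-in: each method call is one boundary crossing."""
+    store: dict[str, str]
+    crossings: int = 0
+
+    def get(self, key: str) -> str:
+        self.crossings += 1
+        return self.store[key]
+
+    def get_many(self, keys: list[str]) -> list[str]:
+        self.crossings += 1  # one crossing for the whole batch
+        return [self.store[k] for k in keys]
+
+
+def chatty(db: CountingBinding, keys: list[str]) -> list[str]:
+    return [db.get(k) for k in keys]  # one crossing per item
+
+
+def bulk(db: CountingBinding, keys: list[str]) -> list[str]:
+    return db.get_many(keys)  # batch on one side, cross once
+
+
+data = {str(i): f"rec{i}" for i in range(1000)}
+keys = list(data)
+a, b = CountingBinding(data), CountingBinding(data)
+assert chatty(a, keys) == bulk(b, keys)
+print(a.crossings, b.crossings)  # 1000 1
+```
+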
+#### Implications for portability
+
+When evaluating whether code is "portable" between IRIS and YottaDB:
+
+- **A `.cls` file using Embedded Python or `&sql(...)`** is not portable — both the class layer and the embedded constructs are ObjectScript-specific.
+- **A `.m` file plus a Python program that calls into it via `libyottadb`** is portable *in spirit* — the M code runs on either engine, and the Python program could in principle call IRIS via the Native API instead. The two paths use different APIs (and different licences), but the M source itself is unchanged.
+- **A `.m` file containing only ANSI / pragmatic MUMPS** runs on either engine; the question of which language hosts it is orthogonal.
+
+This asymmetry is what the §5 matrix's "Embedded other language" row captures: IRIS's `.cls` ⬤ entry reflects inline foreign code in source files; YottaDB's ◯ entry reflects that you cannot write Python directly inside a `.m` file. But the equivalent capability — *calling Python from M, or calling M from Python* — exists on both engines via different mechanisms, and that broader capability is what the "Foreign-language API" row tracks.
+
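+One way to keep the Python side of the "portable in spirit" case actually portable is to code against a narrow interface over globals and plug in an engine-specific adapter. The sketch below works under that assumption. `GlobalStore`, `InMemoryStore`, and `patient_name` are hypothetical. Real adapters over YottaDB's binding or IRIS's Native API are not shown, and each would need its own error handling.
+
+```python
+from typing import Optional, Protocol
+
+
+class GlobalStore(Protocol):
+    """Narrow, engine-neutral view of M globals (hypothetical interface)."""
+
+    def get(self, gvn: str, *subs: str) -> Optional[str]: ...
+
+    def set(self, gvn: str, *subs: str, value: str) -> None: ...
+
+
+class InMemoryStore:
+    """Test double; a YottaDB or IRIS adapter would implement the same methods."""
+
+    def __init__(self) -> None:
+        self._data: dict[tuple[str, ...], str] = {}
+
+    def get(self, gvn: str, *subs: str) -> Optional[str]:
+        return self._data.get((gvn, *subs))
+
+    def set(self, gvn: str, *subs: str, value: str) -> None:
+        self._data[(gvn, *subs)] = value
+
+
+def patient_name(db: GlobalStore, patient_id: int) -> str:
+    # Application logic sees only global names and subscripts, never an engine API.
+    return db.get("^Patient", str(patient_id), "name") or "<unknown>"
+
+
+store = InMemoryStore()
+store.set("^Patient", "42", "name", value="Ada")
+print(patient_name(store, 42), patient_name(store, 7))  # Ada <unknown>
+```
+
+The M data contract (global names and subscripts) stays fixed; only the adapter changes between engines.
+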
+---
+
+### 4.5 Polyglot routines vs C-API separation: a quality / maintainability analysis
+
+§4.4 establishes that the two integration architectures put the foreign language and the M engine in the same OS process and have comparable runtime performance. The substantive question is therefore not *which is faster?* but **what happens to the foreign-language code over the project's lifetime**: how it is reviewed, tested, refactored, deployed, debugged, and handed off to new contributors. That is what determines whether a system is good to work on five years from now.
+
+This subsection compares the two models on the dimensions that matter to developers and maintainers — code quality, efficiency, maintainability, and integration with modern CI/CD lifecycles.
+
+#### The two models, concretely
+
+**Polyglot routine (IRIS):** ObjectScript class with embedded Python and embedded SQL in the same `.cls` file:
+
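+```objectscript
+Class MyApp.Patients Extends %Persistent
+{
+
+Property ID As %Integer;
+
+ClassMethod Process(id As %Integer) [ Language = python ]
+{
+    import iris
+    import pandas as pd
+    # iris.sql.exec runs the query in-process; the result set iterates as rows
+    rs = iris.sql.exec("SELECT * FROM MyApp.Patients WHERE ID = ?", id)
+    df = pd.DataFrame(list(rs))
+    # ... pandas operations ...
+}
+
+Method Save() As %Status
+{
+    set id = ..ID
+    &sql(INSERT INTO MyApp.Patients (ID) VALUES (:id))
+    quit $select(SQLCODE=0:$$$OK,1:$$$ERROR($$$GeneralError,"INSERT failed, SQLCODE="_SQLCODE))
+}
+
+}
+```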
+
+Three grammars — ObjectScript, Python, embedded SQL — in one file.
+
+**C-API separation (YottaDB):** pure M on one side, pure Python project on the other, joined by a stable C ABI:
+
+```mumps
+patients ; routines/patients.m
+process(id) ; return ^Patient(id), or "" if it is undefined
+  new rec
+  set rec=$get(^Patient(id))
+  quit rec
+```
+
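+```python
+# patient_service/main.py — separate project, normal layout
+import pandas as pd
+import yottadb  # YDBPython binding; loads libyottadb.so into this process
+
+def process(patient_id: int) -> dict:
+    # get(varname, subsarray) returns bytes, or None if the node is undefined
+    raw = yottadb.get("^Patient", (str(patient_id),))
+    if raw is None:
+        raise KeyError(f"^Patient({patient_id}) is undefined")
+    df = pd.DataFrame([{"id": patient_id, "record": raw.decode()}])
+    # ... pandas operations ...
+    return df.to_dict(orient="records")[0]
+```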
+
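+Two languages, two source trees, joined by the YDB C-API. Here the Python side reads `^Patient` directly through the binding. It could instead call the M routine `process^patients` through the call-in interface (`ydb_ci()`).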
+
+#### Comparison matrix
+
+**Symbol convention** (cell-by-cell, scoring each side independently): <span style="color:#22863a;font-weight:bold">✔</span> = capability present and works as expected for that side · <span style="color:#cb2431;font-weight:bold">✘</span> = capability absent or materially degraded for that side. A row with <span style="color:#22863a;font-weight:bold">✔</span>/<span style="color:#22863a;font-weight:bold">✔</span> means both architectures handle the dimension well; a row with <span style="color:#22863a;font-weight:bold">✔</span>/<span style="color:#cb2431;font-weight:bold">✘</span> or <span style="color:#cb2431;font-weight:bold">✘</span>/<span style="color:#22863a;font-weight:bold">✔</span> shows a substantive split.
+
+| Dimension | C-API separation (YDB) | Polyglot (IRIS) |
+|-----------|------------------------|-----------------|
+| **Linter / formatter for foreign code** | <span style="color:#22863a;font-weight:bold">✔</span> Full — Python lives in `.py`; standard tools work without modification | <span style="color:#cb2431;font-weight:bold">✘</span> None — `ruff`, `black`, `mypy`, `pylint` cannot read `.cls` files; embedded Python is invisible to its own ecosystem |
+| **Type checker for foreign code** | <span style="color:#22863a;font-weight:bold">✔</span> Full — `mypy --strict` works; binding types (e.g., `yottadb-rs`'s typed `Result<T, E>`) carry contracts across the boundary | <span style="color:#cb2431;font-weight:bold">✘</span> Effectively absent — `mypy` / `pyright` don't see embedded Python; IOS↔Python seam errors surface only at runtime |
+| **Test framework for foreign code** | <span style="color:#22863a;font-weight:bold">✔</span> Native — `pytest` / `cargo test` / `go test`, with mocked or real YDB calls | <span style="color:#cb2431;font-weight:bold">✘</span> `%UnitTest` only — testing embedded Python in isolation requires either (a) wrapping in an IOS test class or (b) extracting to `.py` (which defeats the polyglot integration) |
+| **Code review (PRs)** | <span style="color:#22863a;font-weight:bold">✔</span> Single-language PRs; each can be reviewed by domain experts | <span style="color:#cb2431;font-weight:bold">✘</span> Multi-grammar diffs; reviewer context-switches between IOS, Python, and SQL |
+| **Refactoring tools** | <span style="color:#22863a;font-weight:bold">✔</span> Full — `gopls rename`, `rust-analyzer rename`, PyCharm refactor all work normally | <span style="color:#cb2431;font-weight:bold">✘</span> None — PyCharm / VS Code Python extensions can't refactor inside `.cls`; ObjectScript-specific tools don't understand Python |
+| **IDE / completion / navigation** | <span style="color:#22863a;font-weight:bold">✔</span> Full — each language gets its native IDE support | <span style="color:#cb2431;font-weight:bold">✘</span> Degraded — Studio and the VS Code ObjectScript extension treat embedded Python as opaque text inside a class |
+| **Dependency management** | <span style="color:#22863a;font-weight:bold">✔</span> Standard — `pip` / `uv` / `poetry` / `cargo` / `go mod` with lockfiles | <span style="color:#cb2431;font-weight:bold">✘</span> IRIS-managed Python environment; not the standard PyPI flow; conflicts with system Python / virtualenvs |
+| **CI / CD** | <span style="color:#22863a;font-weight:bold">✔</span> Standard — `pytest` in a 50 MB Python container, `cargo test` in `rust:slim`; M-side runs separately in YDB | <span style="color:#cb2431;font-weight:bold">✘</span> Requires an IRIS container (commercial; ~GB image) to compile and test the class; pipelines are IRIS-specific |
+| **Documentation** | <span style="color:#22863a;font-weight:bold">✔</span> Full — Sphinx / mkdocs / `pdoc` for Python, `godoc` for Go, `rustdoc` for Rust | <span style="color:#cb2431;font-weight:bold">✘</span> Documatic extracts IOS class headers and `///` comments; embedded Python is invisible to it |
+| **Static analysis (security, complexity, dead code)** | <span style="color:#22863a;font-weight:bold">✔</span> Full — `bandit`, `gosec`, `cargo-audit` operate on each side as designed | <span style="color:#cb2431;font-weight:bold">✘</span> None across the IOS / Python boundary; foreign-language tools don't see embedded code |
+| **Debugger** | <span style="color:#22863a;font-weight:bold">✔</span> Native — host language's debugger (`pdb`, `dlv`, `rust-lldb`) works on the host code; M-side via `ZBREAK` is separate but well-defined | <span style="color:#cb2431;font-weight:bold">✘</span> Studio / VS Code debug ObjectScript; stepping into embedded Python is awkward; Python frames render as IRIS-managed proxies |
+| **Versioning** | <span style="color:#22863a;font-weight:bold">✔</span> M and host language evolve independently | <span style="color:#cb2431;font-weight:bold">✘</span> A `.cls` file with embedded Python is a single unit; updating the Python revs the class; backward-compat is awkward |
+| **Hiring pool** | <span style="color:#22863a;font-weight:bold">✔</span> Large — anyone fluent in the host language can contribute to that side; M expertise remains needed for the M side, but is bounded | <span style="color:#cb2431;font-weight:bold">✘</span> Small, and shrinking — needs IOS class developers plus IRIS-specific Python proxy expertise |
+| **Onboarding** | <span style="color:#22863a;font-weight:bold">✔</span> A Python developer can learn "this is how I call the M database" and ship code without learning ObjectScript | <span style="color:#cb2431;font-weight:bold">✘</span> New developer must learn the IOS class system, embedded Python conventions, IRIS-specific Python proxy semantics, and embedded SQL |
+| **Vendor lock-in** | <span style="color:#22863a;font-weight:bold">✔</span> Low — the C ABI is a stable, portable contract | <span style="color:#cb2431;font-weight:bold">✘</span> High — code commits to the IRIS class layer and IRIS-specific Python integration |
+| **Initial prototype velocity** | <span style="color:#cb2431;font-weight:bold">✘</span> Slower bootstrap — two trees, two test runners, two CI jobs | <span style="color:#22863a;font-weight:bold">✔</span> Faster — everything in one file; class-compile feedback is fast |
+| **Steady-state development velocity** | <span style="color:#22863a;font-weight:bold">✔</span> Faster — each language uses its native (and, in Python / Rust / Go, very mature) tooling | <span style="color:#cb2431;font-weight:bold">✘</span> Slower — degraded tooling on the foreign language compounds over time |
+| **Runtime performance** | <span style="color:#22863a;font-weight:bold">✔</span> In-process when host loads `libyottadb.so`; ~µs per FFI call. Comparable. | <span style="color:#22863a;font-weight:bold">✔</span> In-process; ~µs per IOS↔Python proxy call |
+
+**Tally:** of 18 dimensions, the C-API model wins 16, the polyglot model wins 1 (initial prototype velocity), and 1 is a tie (runtime performance). The single polyglot win is a transient advantage that disappears once the project crosses a low size threshold; the 16 C-API wins compound over the project lifetime.
+
+#### The CI / CD lifecycle dimension
+
+This is where the gap between the two models is sharpest. Modern host-language ecosystems assume:
+
+- Linters and formatters on every commit (`ruff check`, `cargo fmt --check`)
+- Type checkers in CI (`mypy --strict`, `tsc --noEmit`)
+- Test suites in fast, ephemeral containers (`pytest` in 50 MB images, `cargo test` in `rust:slim`)
+- Lockfiles guaranteeing reproducible builds (`uv.lock`, `Cargo.lock`, `go.sum`)
+- Pre-commit hooks blocking bad commits before review
+- Coverage tracked over time (Codecov, Coveralls)
+- Static analysis (`bandit`, `gosec`, `cargo-audit`) on every PR
+- Documentation auto-generated and deployed (`mkdocs gh-deploy`, `cargo doc`)
+
+**In the polyglot model, none of this applies to the embedded foreign-language code.** The Python inside a `.cls` method is opaque to `ruff`, invisible to `mypy`, untestable by `pytest`, undocumented by Sphinx, uncovered by `pytest-cov`, and unreachable by every other tool the Python ecosystem has built over the past decade. The IRIS class compiler and `%UnitTest` are the only validation — and they were not designed to be a Python toolchain.
+
+**In the C-API model, every modern host-language tool applies as-is.** The Python side is a normal Python project; the Rust side is a normal Cargo crate. The M side has weak tooling (the gap motivating this entire document), but the weak M tooling **does not drag down the host language**. Each language gets the best available for its ecosystem.
+
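+As an illustration of the Python side being "a normal Python project", a function that takes the binding's getter as a parameter can be unit-tested with the standard library and no YottaDB installation. This is a sketch that assumes the binding's `get(varname, subsarray)` returns `bytes` or `None`. `load_record` and its tests are hypothetical.
+
+```python
+import unittest
+from typing import Callable, Optional
+from unittest import mock
+
+Getter = Callable[[str, tuple], Optional[bytes]]
+
+
+def load_record(patient_id: int, get: Getter) -> str:
+    """Read ^Patient(patient_id) through an injected getter (hypothetical)."""
+    raw = get("^Patient", (str(patient_id),))
+    if raw is None:
+        raise KeyError(patient_id)
+    return raw.decode()
+
+
+class LoadRecordTest(unittest.TestCase):
+    def test_reads_patient_global(self) -> None:
+        fake_get = mock.Mock(return_value=b"Ada^1815")
+        self.assertEqual(load_record(42, fake_get), "Ada^1815")
+        # Subscripts go in as a tuple of strings, per the assumed binding signature
+        fake_get.assert_called_once_with("^Patient", ("42",))
+
+    def test_missing_patient_raises(self) -> None:
+        with self.assertRaises(KeyError):
+            load_record(7, mock.Mock(return_value=None))
+```
+
+Under this assumption, the tests run with `python -m unittest` or `pytest` in any plain Python container, and the M side is exercised by a separate job.
+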
+#### The Inner Platform Effect
+
+The polyglot model exhibits what software-engineering literature calls the [**Inner Platform Effect**](https://en.wikipedia.org/wiki/Inner-platform_effect): IRIS effectively reinvents a multi-language tooling stack inside its own walls.
+
+| Mainstream tool / capability | IRIS-internal reinvention | Why it falls short |
+|------------------------------|---------------------------|--------------------|
+| `ruff` / `mypy` / `pylint` | IRIS class compiler + runtime errors for embedded Python | Class compiler validates ObjectScript; embedded Python is just text until it executes |
+| `pytest` | `%UnitTest` (ObjectScript class-based) | Cannot test embedded Python in isolation; can only test through OS wrappers |
+| Sphinx / `pdoc` | Documatic | OS-classes only; embedded Python is invisible to documentation extraction |
+| `pip` / `uv` | IRIS-managed Python environment | Not standard PyPI; lockfiles, virtualenvs, and reproducible installs are second-class |
+| `pdb` / IDE Python debug | IRIS Studio / VS Code debugger | Stepping into Python from OS is awkward; Python frames render as IRIS-managed proxies |
+| GitHub Actions matrices | IRIS-container-based CI | Commercial container required; per-PR cost is materially higher |
+
+Each reinvention is necessarily weaker than the original it shadows, because each original carries years (in some cases decades) of community investment that no single vendor can match for every language it hosts. The C-API model avoids this entirely by **not** trying to host a Python lifecycle inside the database — it just exposes a C ABI and lets `pip` / `uv` / `pytest` / `mypy` do what they already do well.
+
+#### When polyglot is legitimately the right call
+
+The polyglot model has narrow but real use cases:
+
+- **Genuinely small snippets.** A single `&sql(SELECT ID FROM Foo WHERE X=:val)` inside a method is more readable than dispatching to a separate SQL file. The line where this stops being true is roughly when the foreign code grows past one screen.
+- **Stored-procedure-style logic.** Where the foreign code is logically a stored procedure that runs close to the data and never needs to evolve independently, polyglot has lower ceremony.
+- **Incremental modernisation.** A team with decades of ObjectScript that wants to introduce Python without uprooting structure may reasonably start polyglot before separating out.
+- **Single-developer or very small projects.** Where the velocity cost of two source trees outweighs the long-term maintenance benefit, polyglot can win on net.
+
+These cases are bounded. Once a project has multiple developers, a non-trivial Python / Rust / Go component, or any expectation of long-term maintenance, the polyglot model's costs compound while the C-API model's costs amortise.
+
+#### Recommendation
+
+For systems of any non-trivial scale, **the C-API separation model is materially better on every dimension that matters for long-term maintenance**: code quality, testing, refactoring, CI/CD, hiring, documentation, and lock-in. Runtime performance is roughly a wash. The polyglot model wins only on initial-prototype velocity and on a small class of snippet-sized foreign code — both temporary or bounded advantages.
+
+The deeper observation: **good software engineering separates concerns by domain, and language is one of the most important domain boundaries.** Each language has its own conventions, tooling, experts, and ecosystem. Mixing languages in one file works against all of these; separating them lets each part be excellent at what it does. The C-API approach respects this boundary; the polyglot approach fights it.
+
+For an M codebase being modernised today, the question "polyglot or C-API?" reduces to: *do we want to reinvent the Python (or Rust, or Go) tooling stack inside our walls, or use the tooling that already exists outside them?* Phrased that way, the answer is rarely in doubt.
+
+---
+
+## 5. Summary Table: MUMPS-vs-MUMPS — Gold Standard, IRIS, YottaDB, VA/Community
+
+**This is a MUMPS-vs-MUMPS comparison.** It scores each engine on what's available to a developer writing **pure ANSI / pragmatic MUMPS code** — `.m` routines, or pure-MUMPS `.mac` routines on IRIS that contain no ObjectScript constructs. ObjectScript classes (`.cls`) are deliberately **out of scope**: ObjectScript is a separate language built on top of the runtime ([§4.1.1](#411-objectscript-what-it-is-and-why-it-isnt-ansi-standard-mumps)), and a `%UnitTest` class or Documatic comment doesn't help a developer writing MUMPS. Tools that target ObjectScript belong in a different comparison.
+
+The columns are:
+
+1. **Gold Standard** — the consensus capability mainstream-language developers expect (synthesised across Python, JS/TS, Go, Rust, Java; see [§2](#2-the-gold-standard--top-5-language-toolchains)).
+2. **IRIS (MUMPS routine)** — what's available to a developer writing `.m` or pure-MUMPS `.mac` files on IRIS, with no class wrapping.
+3. **YottaDB (`.m`)** — what's available to a developer writing `.m` files on YottaDB.
+4. **VA / Community packages** — pure-M source artefacts that supplement *either* engine equally (`^XINDEX`, KIDS, `%ut` / M-Unit, OSEHRA / WorldVistA tools; see [§4.3](#43-common-across-both-engines)). These are not vendor-shipped — they are M-language packages distributed by the VA, OSEHRA, WorldVistA, and similar communities. **This column is informational, not scored.** It names the community / VA add-ons where they exist and carries a short descriptive note where they don't. It uses **no scoring labels** (no Full / Basic / Minimal / None), because the absence of a community package is not an implementation gap. See the scoping caveat below.
+
+> **Scoping caveat.** *None of the tooling in the table below — neither the IRIS nor the YottaDB engine entries, nor the VA / Community packages — has been formally scoped against the gold-standard tools in §2 for actual functionality, depth, or feature parity.* This document identifies the **presence or absence** of an analogous capability; it does not measure how close that capability comes to (for example) `ruff`'s rule set, `pytest`'s discovery, or `cargo`'s dependency resolution. The Full / Basic / Minimal / None labels on the IRIS and YottaDB columns reflect relative engine-shipping maturity (e.g., IRIS's profiler scores **Full**, while YDB's bare direct-mode REPL scores **Minimal**); they are **not** a comparison against the gold standard. The VA / Community column drops the labels entirely because community packages have not been benchmarked against peer tools at all. *Quantifying these gaps and measuring the remaining distance to gold-standard parity is a follow-on project; the purpose of **this** analysis is to identify the gaps in the first place.*
+
+#### A note on "OS-class wrappers" (and why they don't help the underlying MUMPS code)
+
+Several IRIS-column cells in the matrix below note that a capability is available *only via an OS-class wrapper*. This is shorthand for a specific pattern: writing an ObjectScript class file (`.cls`) that extends an IRIS framework class (e.g., `%UnitTest.TestCase`) and whose methods do nothing but call into pure MUMPS routines via the `$$label^routine` syntax. Concretely:
+
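+```objectscript
+/// Test class: pure scaffolding around a MUMPS routine
+Class MyApp.Tests.PatientServiceTest Extends %UnitTest.TestCase
+{
+
+Method TestProcess()
+{
+    // $$process^patientService is a pure-MUMPS extrinsic function; it returns a plain string, not an object
+    set result = $$process^patientService(123)
+    // $$$AssertEquals(actual, expected, description) records pass / fail with the %UnitTest manager
+    do $$$AssertEquals(result, "OK", "process^patientService(123) returns OK")
+}
+
+}
+```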
+
+The class is **scaffolding, not logic** — it exists solely so IRIS's class-based tooling (`%UnitTest`, `%UnitTest.Coverage`, Documatic, IPM) has something to dispatch on. The underlying MUMPS routine (`patientService.m`) is unchanged.
+
+**What this gives you:** access to IRIS's test discovery, test reporting, line-coverage instrumentation, and class-based packaging — all hanging off the wrapper class.
+
+**What this does *not* give you:**
+
+- **No MUMPS-language awareness.** The test framework sees pass/fail returned from a class method; it has no understanding of MUMPS syntax, control flow, or idioms. To the framework, a test that calls `$$process^patientService` is an opaque unit: it can report pass or fail, but it cannot tell you anything about the routine's quality.
+- **No improvement to the underlying MUMPS code.** Linting, formatting, complexity analysis, dead-code detection, documentation extraction, and refactoring of the MUMPS routines themselves are entirely unaffected. The wrapper class is a façade, not an analyser. The MUMPS code remains as opaque after wrapping as before.
+- **No MUMPS-side refactoring support.** Refactoring tools (rename, extract method, find references) operate on the class, not the routine. Renaming a label inside `patientService.m` will silently break every wrapper that calls `$$oldlabel^patientService`, and no IRIS-side tool will catch it.
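+
+The rename hazard in the last bullet is exactly the kind of check a source-level tool can perform mechanically. The sketch below is a minimal illustration under stated assumptions. Routine sources and wrapper classes are assumed to be available as text. Simplified regular expressions stand in for a real M parser, so indirection, offsets, and `DO label^routine` references are ignored. `defined_labels` and `find_broken_calls` are hypothetical helpers.
+
+```python
+from __future__ import annotations
+
+import re
+
+# A label starts in column 1: an optional '%' then alphanumerics, or a purely numeric label.
+LABEL_RE = re.compile(r"^(%?[A-Za-z][A-Za-z0-9]*|[0-9]+)")
+# Extrinsic-function call $$label^routine (simplified; no indirection or offsets).
+CALL_RE = re.compile(r"\$\$(%?[A-Za-z][A-Za-z0-9]*)\^(%?[A-Za-z][A-Za-z0-9]*)")
+
+
+def defined_labels(routine_source: str) -> set[str]:
+    """Collect labels defined in a routine; body lines start with a space or tab and never match."""
+    labels: set[str] = set()
+    for line in routine_source.splitlines():
+        match = LABEL_RE.match(line)
+        if match:
+            labels.add(match.group(1))
+    return labels
+
+
+def find_broken_calls(routines: dict[str, str], wrapper_source: str) -> list[str]:
+    """Return every $$label^routine call in the wrapper whose routine or label no longer exists."""
+    labels = {name: defined_labels(src) for name, src in routines.items()}
+    broken: list[str] = []
+    for label, routine in CALL_RE.findall(wrapper_source):
+        if label not in labels.get(routine, set()):
+            broken.append(f"$${label}^{routine}")
+    return broken
+```
+
+For example, if `process` in `patientService` is renamed to `processV2`, `find_broken_calls` reports `$$process^patientService` for a wrapper like the one above.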
+
+**What this *forces*:**
+
+- Test code, fixture code, documentation, and dependency manifests must all be expressed in **ObjectScript class syntax** — pulling MUMPS development into the OS class hierarchy and the OS toolchain.
+- The wrapper layer is itself a maintenance burden: every MUMPS entry point that needs testing or coverage requires a parallel class method, and the two must be kept in sync by hand.
+- The team's tooling investment goes into the wrapper layer (not the MUMPS routines), which compounds IRIS lock-in: the wrappers are non-portable to YottaDB, even though the MUMPS routines they call are portable.
+
+**Implication for a legacy MUMPS codebase.** A 40,000-routine VistA codebase derives **effectively zero benefit** from this tooling pattern. To get *partial* coverage of those routines under IRIS's class-based test framework, a team would need to author and maintain tens of thousands of wrapper class files — a multi-year, non-MUMPS-improving effort whose only product is permission to use IRIS's tools on a fraction of the surface. The underlying 40,000 routines remain unlinted, unformatted, undocumented at the language layer, and unaffected by IRIS's developer-experience investment, **regardless of how thorough the wrapper layer becomes**.
+
+**Wrapping doesn't manage MUMPS code; it manages ObjectScript code that happens to call MUMPS.** OS-wrapping is therefore a **severe blocker of MUMPS-side code management**, not a partial fill of the MUMPS-tooling gap. Every tool it unlocks (test discovery, coverage, documentation, packaging, refactoring, code review) operates on the wrapper classes. The underlying MUMPS routines remain opaque to all of them.
+
+The structural consequence: the IRIS column in the matrix below is scored **None** everywhere the supposed capability is gated by an OS-class wrapper. By the table's MUMPS-only scope (see preface), **"capability available only via an OS-class wrapper" is equivalent to "capability not available to MUMPS code"** — wrapping is scored as **None**, never as Basic or Minimal.
+
+**Legend.** The whole table is MUMPS-only by scope (see preface). The IRIS and YottaDB columns are scored on whether the **implementation ships this functionality**, using bold-text labels (no symbols). The VA / Community column is *informational* — it carries descriptive notes only, not scoring labels. Licensing posture is captured separately in [§1.2](#12-the-two-main-current-implementations) and [§5.2](#52-where-the-engines-diverge-most-sharply).
+
+**For the IRIS and YottaDB columns** (four-level scoring):
+
+- **Full** — the implementation ships a mature, comprehensive equivalent.
+- **Basic** — the implementation ships something usable but minimal; works, but well below the gold standard.
+- **Minimal** — the implementation provides only a primitive (e.g., `$ZHOROLOG` for benchmarking, bare `ydb` direct-mode for REPL); below "Basic" but not entirely absent.
+- **None** — the implementation does NOT ship this functionality, *or* the underlying capability exists but only via an **OS-class wrapper** (see the OS-class-wrapper note above). Per the table's MUMPS-only scope, OS-wrapped capability is not in scope and is scored **None** — never Basic or Minimal. Wrapping is a severe blocker for MUMPS code management, not a bridge to it.
+
+**For the VA / Community column** (informational; no scoring labels — see the scoping caveat above): the cell carries a descriptive note (package name, scope qualifier, or "VistA-shaped" where applicable) when a community / VA package exists, or a short note where nothing widely-known exists. Presence is only *presence* — not parity with gold-standard tools.
+
+**N/A** — concept does not apply at this scope (e.g., type checking on an untyped language).
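+
+The engine-column legend is mechanical enough to state as code. This is a minimal sketch with hypothetical names (`Score`, `score_cell`). It covers only the IRIS and YottaDB columns; **N/A** and the informational VA / Community column are out of its scope. The point it encodes is that a wrapper-only capability short-circuits to **None** before maturity is even considered.
+
+```python
+from enum import Enum
+
+
+class Score(Enum):
+    FULL = "Full"
+    BASIC = "Basic"
+    MINIMAL = "Minimal"
+    NONE = "None"
+
+
+def score_cell(ships: bool, maturity: Score, wrapper_only: bool) -> Score:
+    """Apply the legend to one IRIS or YottaDB cell.
+
+    ships:        the engine ships the capability in some form
+    maturity:     the reviewer's judgement of how mature the shipped form is
+    wrapper_only: the capability is reachable only through an OS-class wrapper
+    """
+    if not ships or wrapper_only:
+        # MUMPS-only scope: wrapper-gated capability is never Basic or Minimal.
+        return Score.NONE
+    return maturity
+```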
+
+| Category | Gold Standard | IRIS (MUMPS routine) | YottaDB (`.m`) | VA / Community packages |
+|----------|---------------|----------------------|----------------|--------------------------|
+| **Runtime / REPL** | History, completion, multiline | **Basic** — `iris terminal` | **Minimal** — `ydb` direct mode (bare; no history / completion) | none |
+| **Syntax check** | Per-file, fast, exit-code | **Basic** — routine compile (on first use) | **Basic** — `zcompile` | (`^XINDEX` does deeper static analysis — see Linting rows) |
+| **Linting (style)** | Configurable, hundreds of rules | **None** | **None** | `^XINDEX` (VA Toolkit) |
+| **Linting (logic)** | Unused vars, unreachable code, missing returns | **None** | **None** | `^XINDEX` (control-flow + reachability) |
+| **Type checking** | Full static analysis | **N/A**<br>untyped | **N/A**<br>untyped | **N/A**<br>language-level |
+| **Formatting** | Canonical, deterministic, idempotent | **None** | **None** | no canonical formatter |
+| **Test runner** | Auto-discovery, parallel, rich output | **None** — `%UnitTest` requires OS-class wrapper — **blocker** for MUMPS code improvement (forces test code into ObjectScript classes; leaves the MUMPS routines themselves untested in any MUMPS-aware sense) | **None** | `%ut` / M-Unit (OSEHRA) |
+| **Single-test selection** | Path + name | **None** — only via OS-class wrapper — **blocker** | **None** — only via `%ut` | via `%ut` |
+| **Test watcher** | Reruns on save | **None** | **None** | none |
+| **Coverage (line)** | HTML + lcov | **None** — `%UnitTest.Coverage` instrumentation works on MAC routines, but the driver requires an OS-class wrapper — **blocker** for MUMPS code improvement | **None** | no widely-adopted community line-coverage tool |
+| **Coverage (branch)** | Branch + condition | **None** | **None** | none |
+| **Benchmarking** | Statistical, repeatable | **Minimal** — `$ZHOROLOG` primitive only | **Minimal** — `$ZHOROLOG` primitive only | none |
+| **Profiling** | Flame graphs, line timing | **Full** — `^%SYS.MONLBL`, `^SystemPerformance` (engine-level; works on any compiled routine) | **None** | none widely-adopted |
+| **Debugging (interactive)** | Breakpoints, step, inspect | **Basic** — `ZBREAK` in terminal (Studio / VS Code support is OS-first) | **Basic** — in-runtime `ZBREAK` only | none |
+| **Debugging (DAP / IDE)** | DAP server, IDE-agnostic | **Basic** — VS Code extension (routine-level, second-class) | **None** | none |
+| **Documentation gen** | Extract comments → HTML / MD | **None** — Documatic is `.cls`-only via the `///` convention; documenting MUMPS routines requires wrapping them in classes and rewriting comments as `///`, the **same OS-class-wrapper blocker pattern** as the test framework (see note above). Not a partial fill — a redirect into ObjectScript that leaves the underlying MUMPS routines undocumented. | **None** | no canonical M-source doc generator |
+| **Dependency mgmt** | Lockfile, registry | **None** — IPM is class-centric; no MUMPS-routine manifest unit | **None** | **KIDS** (Kernel Installation & Distribution System; VA Kernel; VistA-shaped) |
+| **Build / tasks** | Standard task runner | **Basic** — Makefile around `iris session` | **Basic** — Makefile-only | KIDS install / build workflow (VistA-shaped) |
+| **Pre-commit hooks** | Block bad commits before push | **None** | **None** | none |
+| **CI pipeline** | One-command full check | **Basic** — Docker + `iris session` (no MUMPS-specific harness) | **Basic** — Makefile-only | none |
+| **Snapshot testing** | Compare to baseline; auto-update | **None** | **None** | none |
+| **Fixture management** | Composable, scoped test state | **None** — only via OS test class — **blocker** for MUMPS code improvement | **None** | `%ut` setup / teardown |
+| **Mock / stub** | Standard library | **None** | **None** | none |
+| **Database export** | Portable text format | **Full** — `$SYSTEM.OBJ.Export` (engine-level) | **Full** — `mupip extract` (ZWR / GO) | FileMan-derived utilities (VistA-shaped) |
+| **Database import / fixture load** | Load known state | **Full** — `$SYSTEM.OBJ.Import` (engine-level) | **Full** — `mupip load`, `%GI` | FileMan-derived utilities |
+| **Database diff** | What changed between runs | **None** | **None** | none |
+| **Database state snapshot** | Before/after comparison | **None** — ad-hoc | **None** — ad-hoc | ad-hoc |
+| **Crash / lockup cleanup** | Recover from bad process exit | **Full** — SMP / journal recovery (engine-level) | **Full** — `mupip rundown`, `lke` | none |
+| **System administration UI** | Web admin | **Full** — System Management Portal | **Basic** — YDBGUI (Vue.js + M backend, since 2022; narrower scope than SMP) | none |
+| **Foreign-language API**<br>(see [§4.4](#44-foreign-language-integration-embedded-language-vs-embedded-database)) | First-class FFI | **None** — Native API targets classes; embedded Python is OS-only | **Full** — stable C API; foreign language hosts YDB | no community FFI |
+| **Containerised deployment** | Official images | **Full** — InterSystems Docker images | **Basic** — community / marketplace | none |
+| **Source-control integration** | Editor + CI hooks | **Basic** — standard git hooks over routines exported as plain-text `.m` files | **Basic** — plain git over `.m` files | VA-internal Forum is not git-like |
+| **Symbol introspection** | List functions / exports | **Basic** — `%RD` (routine directory) | **Basic** — `%RD`, manual | `^XINDEX` cross-references; KIDS routine catalog |
+| **Security scan** | CVE / advisory check | **None** | **None** | none |
+| **Complexity metrics** | Cyclomatic complexity | **None** | **None** | `^XINDEX` complexity output |
+| **Dead code detection** | Unused functions / labels | **None** | **None** | `^XINDEX` flags unreferenced labels |
+| **Package publishing** | Public registry | **None** — IPM is class-centric | **None** | **KIDS** distributions; **OSEHRA** / **WorldVistA** repositories |
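+
+Every engine cell is one of four labels and every community cell is an optional note, so the matrix can be treated as data. The minimal sketch below transcribes six rows from the table above. Community cells that say nothing widely-known exists are encoded as `None`; that encoding is a choice made here, not part of the table. `no_answer_anywhere` computes the union check that §5.1 applies across IRIS, YottaDB, and the VA / Community ecosystem.
+
+```python
+from typing import List, Optional, Tuple
+
+# (category, IRIS label, YottaDB label, VA / Community note or None if nothing widely-known exists)
+Row = Tuple[str, str, str, Optional[str]]
+
+ROWS: List[Row] = [
+    ("Formatting", "None", "None", None),
+    ("Linting (logic)", "None", "None", "^XINDEX (control-flow + reachability)"),
+    ("Test watcher", "None", "None", None),
+    ("Profiling", "Full", "None", None),
+    ("Database diff", "None", "None", None),
+    ("Dependency mgmt", "None", "None", "KIDS"),
+]
+
+
+def no_answer_anywhere(rows: List[Row]) -> List[str]:
+    """Categories with no engine support on either side and no community package."""
+    return [
+        category
+        for category, iris, ydb, community in rows
+        if iris == "None" and ydb == "None" and community is None
+    ]
+```
+
+On these rows it returns `Formatting`, `Test watcher`, and `Database diff`. `Linting (logic)` and `Dependency mgmt` drop out because `^XINDEX` and KIDS supply a partial community answer.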
+
+### 5.1 Where both engines fall short of the gold standard
+
+Even taking the union of IRIS, YottaDB, **and** the VA / Community ecosystem, a number of mainstream-language toolchain categories have **no credible answer anywhere in the M world**:
+
+1. **Formatter.** No canonical layout tool exists for M — neither vendor-shipped nor in the community. Style is enforced by convention, code review, and discipline.
+2. **Linter (style + logic).** Neither engine surfaces unused variables, unreachable code, missing `QUIT`, undefined labels, or style violations as first-class diagnostics. The `^XINDEX` static analyser **partially** plugs this gap, but it is a VA Toolkit routine — not an InterSystems or YottaDB tool — and is only present where VistA (or a standalone Toolkit install) is available. There is no `ruff`/`clippy`-class linter for general M code.
+3. **Test watcher.** No equivalent of `cargo watch` / `pytest-watch` in any tier.
+4. **Branch / condition coverage.** IRIS provides line coverage via `%UnitTest.Coverage` (driven from an OS class); neither engine provides branch coverage; nothing in the community fills this.
+5. **Benchmarking harness.** Only `$ZHOROLOG`-based primitives; no `criterion` or `pytest-benchmark` analogue.
+6. **Snapshot testing.** No equivalent of `jest`'s snapshots or `syrupy`.
+7. **Mocking / stubbing.** No framework anywhere.
+8. **Database diff / state snapshot.** Critical for testing globals-bound code, absent in both engines and unaddressed by the community.
+9. **Complexity metrics.** `^XINDEX` reports some complexity statistics, but not in a form comparable to `radon` / `gocyclo` / `cargo-cyclo`, so it is at best a partial fill.
+10. **Dead-code detection.** `^XINDEX` flags unreferenced labels — useful, but again partial. No `vulture`-equivalent for M.
+11. **Security scanner (M-specific).** No CVE / advisory pipeline targeting M code anywhere in the ecosystem.
+12. **Cross-engine, MUMPS-native package manager.** IPM is OS-centric (and so excluded from the MUMPS scope); KIDS is VistA-shaped; YottaDB has nothing first-party. The community has not produced an `npm` / `cargo` / `uv` equivalent for cross-engine MUMPS routines.
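+
+Item 8 is the most M-specific gap. Tests of globals-bound code need to assert on what changed in the database, not just on return values. The minimal sketch below shows the missing primitive. It assumes each run's state for one global has already been loaded into a dict keyed by subscript tuple, for example from a `mupip extract` ZWR file; that parsing step is not shown. `diff_globals` and `GlobalDiff` are hypothetical names, not an existing M tool.
+
+```python
+from dataclasses import dataclass
+from typing import Dict, List, Tuple
+
+Subscripts = Tuple[str, ...]
+Snapshot = Dict[Subscripts, str]  # e.g. ("123", "status") -> "OK" for ^PATIENT(123,"status")
+
+
+@dataclass
+class GlobalDiff:
+    added: List[Subscripts]
+    removed: List[Subscripts]
+    changed: List[Tuple[Subscripts, str, str]]  # (subscripts, old value, new value)
+
+
+def diff_globals(before: Snapshot, after: Snapshot) -> GlobalDiff:
+    """Compare two snapshots of one global.
+
+    Results are sorted lexically as Python tuples of strings, not in M collation order
+    (which places canonical numeric subscripts first and orders them numerically).
+    """
+    added = sorted(key for key in after if key not in before)
+    removed = sorted(key for key in before if key not in after)
+    changed = sorted(
+        (key, before[key], after[key])
+        for key in before.keys() & after.keys()
+        if before[key] != after[key]
+    )
+    return GlobalDiff(added, removed, changed)
+```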
+
+The **VA / Community column** in §5's matrix captures where the M ecosystem has real partial fills — most of them anchored on `^XINDEX`, KIDS, and `%ut`. None of these reach the maturity bar of mainstream-language tooling, and they are concentrated in the VistA developer's lifecycle (testing, distribution, static analysis) rather than spread across the full toolchain.
+
+### 5.2 Where the engines diverge most sharply
+
+| Capability | IRIS | YottaDB |
+|------------|------|---------|
+| **Licensing posture** | Commercial; tooling gated | Open source (AGPL); reproducible without licence negotiation |
+| **Primary language surface** | ObjectScript classes (`.cls`) — a proprietary superset | ANSI MUMPS only (`.m`) |
+| **IDE story** | Studio (legacy) + VS Code extension, both ObjectScript-first | None; editor-agnostic plain-text workflow |
+| **Type system** | Class-typed (ObjectScript classes only) | Untyped (M is untyped by definition) |
+| **Profiler** | First-class (`^%SYS.MONLBL`, `^SystemPerformance`) | None integrated |
+| **Package manager** | IPM/ZPM (ObjectScript-centric) | None |
+| **Documentation generator** | Documatic (ObjectScript-only via `///`) | None |
+| **System admin UI** | System Management Portal (decades mature; comprehensive) | YDBGUI + YDBGDEGUI + YDBAdminOpsGUI (Vue.js, M backend, since 2022; younger and narrower in scope) |
+| **Embedded other language** | Embedded Python, embedded SQL (inside ObjectScript only) | None |
+| **Foreign-language extensibility** | Native API (.NET, Java, Python, Node.js) | C API + bindings (Go, Python, Node.js, Rust, Lua, Perl) |
+| **Test framework posture** | First-party `%UnitTest` (ObjectScript classes only) | Community `%ut` (M-Unit) |
+| **Documentation availability** | Vendor-controlled, login-gated for some | Public Git repo |
+
+### 5.3 What the MUMPS-only matrix reveals
+
+The matrix above is **deliberately MUMPS-vs-MUMPS only**: ObjectScript is excluded because it isn't MUMPS ([§4.1.1](#411-objectscript-what-it-is-and-why-it-isnt-ansi-standard-mumps)) and tools that target ObjectScript don't help a developer writing pure MUMPS. With that scope enforced and the VA / Community column made explicit, three observations emerge:
+
+**1. The IRIS-vs-YottaDB gap, for MUMPS code, is small.** Once OS-specific tooling is excluded, IRIS's advantages narrow to a handful of engine-level capabilities — the profiler (`^%SYS.MONLBL`), a more mature web admin (SMP, with decades of accumulated scope), and official Docker images. YDB matches IRIS on the runtime and database-export tier (`mupip extract` / `mupip load`) and now ships its own, narrower web admin (YDBGUI). IRIS holds an edge on profiling, admin-UI maturity, and container polish. **Neither approaches the gold standard.**
+
+**2. Most of the genuinely MUMPS-aware tooling lives in the VA / Community column, not in either vendor.** `^XINDEX` (static analysis), KIDS (package management and distribution), and `%ut` / M-Unit (testing) are the canonical answers for those concerns in MUMPS. They are pure M source, run on either engine, and predate both vendors' modern tooling efforts. **The MUMPS-aware ecosystem is mostly community / VA-driven, not vendor-driven** — and the vendors have, for different reasons, invested elsewhere (IRIS in ObjectScript; YottaDB in the runtime and the C API).
+
+**3. Even with the VA / Community column, large gaps remain.** Formatter, deeper linter, branch coverage, benchmarking, snapshot testing, mocking, doc generator, true cyclomatic complexity, security scanner, generic / non-VistA package manager — categories where neither vendor *nor* community has a credible answer. These are the gaps that motivate building vendor-neutral, source-level M tooling on a shared parser foundation.
+
+The companion document, [gap-analysis-and-remediation-strategy.md](gap-analysis-and-remediation-strategy.md), describes one such effort grounded in YottaDB but designed to be portable to any conformant M engine via a shared parser foundation ([`tree-sitter-m`](https://github.com/rafael5/tree-sitter-m)) and a vendor-neutral grammar surface ([`m-standard`](https://github.com/rafael5/m-standard)). Because that parser targets the ANSI / pragmatic MUMPS surface (not ObjectScript), tools built on it would fill gaps in **all three** implementation columns of §5's matrix (IRIS, YottaDB, and VA / Community). They would be MUMPS-aware first-party tooling that complements both engines and the VA / Community ecosystem alike.
+
+---
+
+## 6. The Real Question: Developer Experience for a Legacy MUMPS Codebase
+
+The preceding chapters analysed each engine's tooling on its own merits. But the question that motivates most M tooling work is concrete and specific: **what is the developer experience for someone maintaining a large, legacy MUMPS codebase — for example, the U.S. Department of Veterans Affairs' VistA system, with roughly 40,000 routines of pure ANSI MUMPS?**
+
+This question deserves a direct answer, because the engine-level tooling story (§4.1, §4.2) and the language-surface analysis (§3) both miss it. The codebase in question:
+
+- Is overwhelmingly `.m` files — hand-written ANSI MUMPS, decades of accumulated procedural code.
+- Has **no `.cls` files**, no `&sql(...)`, no embedded Python, no `##class()`, no `///` doc comments, no `$$$Foo` macros.
+- Uses dot-blocks, naked references, `$DATA` / `$ORDER` / `$PIECE` traversals over hierarchical globals — the classical M idiom.
+- Conforms primarily to the VA SAC / XINDEX rule set (a stricter subset of ANSI), not to InterSystems-extended ObjectScript.
+
+For this codebase, the question is not "what does ObjectScript give me?" — there is no ObjectScript involved. The question is: **what tooling actually treats my code as MUMPS, and what does my daily edit / test / debug / ship loop look like?**
+
+### 6.1 The IRIS-based VistA scenario
+
+A team adopts IRIS to host a legacy MUMPS codebase. The engine runs the code (IRIS supports the ANSI / pragmatic surface that VistA uses). Engine-level operations work cleanly: the System Management Portal admin UI, journal-based replication, official Docker images, `^%SYS.MONLBL` profiling, `$SYSTEM.OBJ.Export/Import`, and `ZBREAK`-based debugging all function regardless of the source language.
+
+But the **developer-experience layer is mostly inaccessible**, because virtually all of IRIS's language-aware tooling targets ObjectScript. Specifically:
+
+- **`%UnitTest` cannot test the existing routines** without re-casting them, or wrapping them, in ObjectScript classes. A 40,000-routine wrap-and-port effort is not a credible undertaking.
+- **Documatic produces nothing** because there are no classes and no `///` doc comments.
+- **IPM/ZPM has no manifest unit** — the package manager assumes ObjectScript classes as the unit of distribution.
+- **Studio and VS Code provide a bare editor** with syntax highlighting but no MUMPS-specific completions, refactorings, or lints. The IntelliSense is tuned for ObjectScript.
+- **The class compiler is irrelevant** — there are no classes.
+- **Embedded SQL and embedded Python are unreachable** — both are OS-only language features.
+
+What remains genuinely useful:
+- **`^XINDEX`** — VistA's own static analyser (M source from the VA Kernel package). Actually MUMPS-aware and SAC-aware. **Not an IRIS tool** — it is part of the VistA distribution itself, so it is present regardless of which engine hosts VistA.
+- **Routine compiler** — catches MUMPS syntax errors at first reference.
+- **`^%SYS.MONLBL`** — profiles routines regardless of source language.
+- **`ZBREAK` / `BREAK`** (with `ZWRITE` to inspect variables) — manual interactive debugging in the terminal.
+- **SMP, journal, Docker, export/import** — engine-level features that don't care about the language.
+
+That is approximately the YottaDB experience plus a profiler, a more mature web admin UI, and official container images — at the cost of a commercial licence. (YottaDB now ships its own web admin via **YDBGUI**, since r1.36 / 2022, but its scope is narrower than IRIS's SMP.)
+
+### 6.2 The YottaDB-based VistA scenario
+
+YottaDB is pure ANSI MUMPS — it runs a legacy MUMPS codebase without translation. The runtime is mature and POSIX-composable. There is no ObjectScript layer to navigate around, because there is no ObjectScript layer. But there is also **no first-party developer-experience layer**: no formatter, no linter beyond `zcompile`, no test framework, no coverage tool, no profiler, no docs generator, no package manager, no IDE. (The `^XINDEX` static analyser is bundled with VistA itself, so any YottaDB-based VistA deployment has it — it is the same routine that runs on IRIS-based VistA.)
+
+The community fills some gaps — M-Unit (`%ut`) for testing, ad-hoc patterns for everything else — but nothing reaches the polish of mainstream-language tooling.
+
+Compared to IRIS-based VistA, YottaDB-based VistA loses:
+- The integrated profiler (`^%SYS.MONLBL`).
+- A more mature web admin UI — YottaDB ships **YDBGUI** (Vue.js + M backend, since 2022), but SMP has decades of accumulated scope.
+- Some polish in container tooling.
+
+It gains:
+- Open-source licensing (AGPL-3.0) — no commercial licence negotiation, fully reproducible CI, no per-developer seat costs.
+- `mupip` and YDB-specific administration utilities: `lke` (lock examination), `gde` (global-directory editing), and `dse` (low-level database repair).
+- A stable, public C API (`libyottadb.so`) with first-party language bindings (Go, Python) and several community bindings (Node.js, Rust, Lua, Perl).
+- A public Git documentation tree (no login wall).
+
+### 6.3 Side-by-side summary
+
+> **About the IRIS-based VistA column.** Many of the IRIS rows below collapse to the same underlying fact: **the capability exists in IRIS, but only for InterSystems ObjectScript (IOS; abbreviated "OS" elsewhere in this document)**. That is, it is available to a developer writing IOS classes (`.cls`), not to a developer maintaining `.m` MUMPS routines. Rather than spelling this out five different ways ("requires OS wrapper", "requires OS class", "no class manifest", "OS-first", "OS-tuned"), those rows are simply marked **◯** with a single annotation: **"IOS-only feature."** Elsewhere in the table, **⬤** marks a capability that is available and **◐** one that is only partially available. The structural cause is the same in every case (see [§4.1.3](#413-iris-tooling-by-file-scope-and-language) and the [OS-class-wrapper note in §5](#5-summary-table-mumps-vs-mumps--gold-standard-iris-yottadb-vacommunity)); listing it once is more useful than restating it per row.
+>
+> **About the VistA tools column.** This column lists VA-supplied M packages — principally **`^XINDEX`** (the VA Kernel Toolkit static analyser) and **KIDS** (Kernel Installation & Distribution System) — that *neither IRIS nor YottaDB ship*. They are bundled with VistA itself and run identically on either engine. Listing them here keeps the IRIS / YottaDB columns honest about what the *engines* ship for MUMPS code, while still acknowledging that a VistA codebase brings its own toolset.
+
+| Capability the team actually needs | Gold standard | IRIS-based VistA | YottaDB-based VistA | VistA tools |
+|------------------------------------|---------------|------------------|---------------------|-------------|
+| Engine runs the code unmodified | ⬤ | ⬤ | ⬤ | — |
+| Static analysis of MUMPS code | ⬤<br>ruff / clippy / staticcheck | ◯<br>none vendor-shipped | ◯<br>none vendor-shipped | `^XINDEX` |
+| Test runner over pure MUMPS routines | ⬤ | ◯<br>IOS-only feature | <span style="font-size:1.5em;line-height:1">◐</span><br>community `%ut` (M-Unit) | — |
+| Coverage over MUMPS routines | ⬤ | ◯<br>IOS-only feature | ◯<br>community efforts only | — |
+| Profiler | ⬤ | ⬤<br>`^%SYS.MONLBL` | ◯<br>none integrated | — |
+| Documentation generator | ⬤ | ◯<br>IOS-only feature | ◯<br>none | — |
+| Package / dependency mgmt | ⬤ | ◯<br>IOS-only feature | ◯<br>none | KIDS |
+| Formatter | ⬤ | ◯<br>none | ◯<br>none | — |
+| Linter (style / logic) | ⬤ | ◯<br>none vendor-shipped | ◯<br>none vendor-shipped | `^XINDEX` |
+| IDE support for MUMPS source | ⬤ | ◯<br>IOS-only feature | ◯<br>none | — |
+| Interactive debugger | ⬤ | <span style="font-size:1.5em;line-height:1">◐</span><br>`ZBREAK` only (IDE step-debugger is an IOS-only feature) | <span style="font-size:1.5em;line-height:1">◐</span><br>`ZBREAK` only | — |
+| Web admin UI | **N/A**<br>for runtime languages | ⬤<br>SMP (mature) | ⬤<br>YDBGUI (since 2022; narrower scope) | — |
+| Foreign-language API (see [§4.4](#44-foreign-language-integration-embedded-language-vs-embedded-database)) | ⬤ | ◯<br>IOS-only feature | ⬤<br>stable C API; foreign language hosts YDB | — |
+| Licensing posture for OSS / community work | ⬤<br>free | ◯<br>commercial; per-seat / instance | ⬤<br>AGPL-3.0; no negotiation | — |
+
+### 6.4 The bottom line
+
+**For a pure-MUMPS legacy codebase, the gap between IRIS and YottaDB is much narrower than the gap between either engine and the gold-standard developer experience of Python, Go, or Rust.** Most of IRIS's tooling investment is consumed by ObjectScript developers, and most of YottaDB's investment goes into the runtime itself. **Neither engine offers a developer experience that approaches what mainstream-language developers consider table stakes.**
+
+This is the structural problem that motivates building **vendor-neutral, source-level M tooling** on top of a shared parser:
+
+- A parser that targets the ANSI / pragmatic MUMPS surface (not ObjectScript) gives every downstream tool — formatter, linter, doc generator, complexity analyser, dead-code detector — the same input on either engine.
+- A grammar surface that treats MUMPS as a first-class language (rather than as a mode of ObjectScript or as a secondary file format inside IRIS) is the only way to fill the source-language gaps for legacy MUMPS codebases.
+- Vendor neutrality matters because the codebases that need this tooling most — VistA and similarly-shaped systems — must be able to run on either engine without lock-in.
+
+The companion remediation strategy ([gap-analysis-and-remediation-strategy.md](gap-analysis-and-remediation-strategy.md)) describes one such effort, grounded in YottaDB for pragmatic reasons (open-source reproducibility) but designed around a portable parser foundation (`tree-sitter-m`) and a vendor-neutral grammar surface (`m-standard`), so that the resulting tools serve MUMPS routines on IRIS as well as they serve those on YottaDB.
+
+---
+
+## 7. Consolidated Gap Analysis
+
+The §5 and §6.3 matrices score each engine separately. This section flips the perspective and asks the consolidated question: **which gold-standard developer-toolchain categories are missing from *both* engines for pure MUMPS code?**
+
+The table below mirrors the category list from [§2.1 (Python)](#21-python) — the most comprehensive of the five gold-standard toolchain tables — **in the same order**. For each gold-standard category, IRIS-MUMPS and YottaDB statuses are summarised in the four-level scoring convention from [§5](#5-summary-table-mumps-vs-mumps--gold-standard-iris-yottadb-vacommunity) (**Full** / **Basic** / **Minimal** / **None**), and a final column classifies each row by gap severity.
+
+**Gap classification:**
+
+- **MAJOR — common gap** — *both* engines ship **None** for MUMPS code. These are the most severe gaps and the highest-leverage targets for vendor-neutral, source-level M tooling: a single tool built on a shared parser foundation can fill the gap on both engines simultaneously.
+- **PARTIAL — common gap** — both engines ship something usable but well below the gold standard (Basic, Minimal, or some combination). Real, but less acute than a Major gap.
+- **ENGINE-SPECIFIC** — one engine has a meaningful tool, the other does not. Not a common gap; the absence is single-engine.
+- **N/A** — concept does not apply to M (e.g., type checking on an untyped language; import analysis where there is no import system).
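+
+The four classification rules above are mechanical enough to state as a function. The sketch below is an illustration under stated assumptions, not part of any shipped tool: it treats every status other than **None** and **N/A** as "ships something" (so whether a **Minimal** tool counts as "meaningful" for the ENGINE-SPECIFIC rule is a judgment call it resolves in favour of yes), and its `SOLVED` label for a Full/Full pair is hypothetical, since no such row appears in the table.
+
+```python
+# Minimal sketch of the gap-classification rules as a function.
+# Assumptions: each engine status is one of the four scoring levels
+# ("Full", "Basic", "Minimal", "None") or "N/A"; "SOLVED" is a
+# hypothetical label for a Full/Full row, which the table never contains.
+
+LEVELS = {"Full", "Basic", "Minimal", "None"}
+
+
+def classify(iris: str, ydb: str) -> str:
+    """Return the gap classification for one gold-standard category."""
+    if iris == "N/A" and ydb == "N/A":
+        return "N/A"  # concept does not apply to M
+    if iris not in LEVELS or ydb not in LEVELS:
+        raise ValueError(f"unexpected status pair: {iris!r}, {ydb!r}")
+    if iris == "None" and ydb == "None":
+        return "MAJOR"  # neither engine ships anything for MUMPS code
+    if (iris == "None") != (ydb == "None"):
+        return "ENGINE-SPECIFIC"  # exactly one engine ships something
+    if iris == "Full" and ydb == "Full":
+        return "SOLVED"  # hypothetical; no row in the table is Full/Full
+    return "PARTIAL"  # both ship something, at least one below Full
+```
+
+Under these assumptions the profiling row (`Full`, `None`) classifies as `ENGINE-SPECIFIC` and the REPL row (`Basic`, `Minimal`) as `PARTIAL`, matching the table.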
+
+**Scoping caveat carries forward:** as established in [§5's preface](#5-summary-table-mumps-vs-mumps--gold-standard-iris-yottadb-vacommunity), none of the IRIS / YottaDB tooling has been formally scoped against the gold-standard exemplars for actual feature parity. The Full / Basic / Minimal labels reflect *engine-shipping maturity*, not *parity with the exemplar*. Quantifying the remaining distance to gold-standard parity is a follow-on project.
+
+| # | Gold-standard category | Exemplar (Python ref) | IRIS (MUMPS) | YottaDB (MUMPS) | Gap classification |
+|---|------------------------|-----------------------|:------------:|:---------------:|--------------------|
+|  1 | Runtime / REPL          | `ipython`, `ptpython`              | **Basic** — `iris terminal` | **Minimal** — `ydb` direct mode (bare) | **PARTIAL — common gap** |
+|  2 | Syntax check            | `ruff`, `py_compile`               | **Basic** — routine compile | **Basic** — `zcompile`                 | **PARTIAL — common gap** |
+|  3 | Linting (style)         | `ruff`, `flake8`, `pycodestyle`    | **None**                    | **None**                               | **MAJOR — common gap**   |
+|  4 | Linting (logic)         | `pylint`, `ruff`                   | **None**                    | **None**                               | **MAJOR — common gap**   |
+|  5 | Type checking           | `mypy`, `pyright`                  | **N/A** — untyped language  | **N/A** — untyped language             | **N/A**                  |
+|  6 | Formatting              | `ruff format`, `black`             | **None**                    | **None**                               | **MAJOR — common gap**   |
+|  7 | Test runner             | `pytest`, `unittest`               | **None** (IOS-only)         | **None**                               | **MAJOR — common gap**   |
+|  8 | Single-test selection   | `pytest tests/x.py::test_y`        | **None** (IOS-only)         | **None**                               | **MAJOR — common gap**   |
+|  9 | Test watcher            | `pytest-watch`, `ptw`              | **None**                    | **None**                               | **MAJOR — common gap**   |
+| 10 | Coverage                | `coverage.py`, `pytest-cov`        | **None** (driver IOS-only)  | **None**                               | **MAJOR — common gap**   |
+| 11 | Benchmarking            | `pytest-benchmark`, `timeit`       | **Minimal** — `$ZHOROLOG`   | **Minimal** — `$ZHOROLOG`              | **PARTIAL — common gap** |
+| 12 | Profiling               | `cProfile`, `py-spy`               | **Full** — `^%SYS.MONLBL`   | **None**                               | **ENGINE-SPECIFIC** (IRIS-only) |
+| 13 | Debugging               | `pdb`, `ipdb`, `debugpy`           | **Basic** — `ZBREAK` etc.   | **Basic** — `ZBREAK` etc.              | **PARTIAL — common gap** |
+| 14 | Documentation           | `pdoc`, `sphinx`, `mkdocs`         | **None** (IOS-only)         | **None**                               | **MAJOR — common gap**   |
+| 15 | Dependency mgmt         | `uv`, `pip`, `poetry`              | **None** (IOS-only)         | **None**                               | **MAJOR — common gap**   |
+| 16 | Build / tasks           | `make`, `tox`, `nox`               | **Basic** — Makefile + `iris session` | **Basic** — Makefile + `ydb -run` | **PARTIAL — common gap** |
+| 17 | Import analysis         | `isort`, `ruff --select I`         | **N/A** — no import system  | **N/A** — no import system             | **N/A**                  |
+| 18 | Security scan           | `bandit`, `safety`                 | **None**                    | **None**                               | **MAJOR — common gap**   |
+| 19 | Complexity              | `radon`, `ruff`                    | **None**                    | **None**                               | **MAJOR — common gap**   |
+| 20 | Dead code               | `vulture`                          | **None**                    | **None**                               | **MAJOR — common gap**   |
+| 21 | Fixture management      | `pytest fixtures`, `factory_boy`   | **None** (IOS-only)         | **None**                               | **MAJOR — common gap**   |
+| 22 | Snapshot testing        | `syrupy`                           | **None**                    | **None**                               | **MAJOR — common gap**   |
+| 23 | Pre-commit hooks        | `pre-commit`                       | **None**                    | **None**                               | **MAJOR — common gap**   |
+| 24 | CI script               | `tox`, `nox`, GitHub Actions       | **Basic** — Docker + `iris session` | **Basic** — Makefile + `ydb -run` | **PARTIAL — common gap** |
+| 25 | Environment check       | `pyenv`, `tox`                     | **N/A** — no equivalent     | **N/A** — no equivalent                | **N/A**                  |
+| 26 | Package publishing      | `twine`, `flit`, `uv publish`      | **None** (IOS-only)         | **None**                               | **MAJOR — common gap**   |
+
+### Tally
+
+Of the **26 gold-standard categories** from §2.1:
+
+- **16 are MAJOR common gaps** — both engines ship **None** for MUMPS code. These are the highest-leverage remediation targets: linting (style and logic), formatting, test runner, single-test selection, test watcher, coverage, documentation, dependency management, security scan, complexity, dead-code, fixture management, snapshot testing, pre-commit hooks, package publishing.
+- **6 are PARTIAL common gaps** — both engines ship something below gold standard: runtime/REPL, syntax check, benchmarking, debugging, build/tasks, CI script.
+- **1 is ENGINE-SPECIFIC** — IRIS-only: profiling (`^%SYS.MONLBL`).
+- **0 are YottaDB-only** — there is no gold-standard category where YottaDB ships a meaningful first-party tool that IRIS lacks (within the MUMPS scope; YottaDB's foreign-language API and YDBGUI are not in §2.1's category list).
+- **3 are N/A** — type checking, import analysis, environment check (these don't apply to M).
+
+**22 of the 23 applicable categories are common gaps** (16 major + 6 partial). Only profiling is single-engine. **No category is fully solved on both engines for MUMPS code.**
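+
+The tally can be re-derived mechanically. This minimal sketch transcribes the table's "Gap classification" column in row order (labels shortened) and counts it; it is a consistency check on the arithmetic, not an independent measurement.
+
+```python
+from collections import Counter
+
+# "Gap classification" column of the table above, rows 1-26 in order.
+CLASSIFICATIONS = [
+    "PARTIAL", "PARTIAL", "MAJOR", "MAJOR", "N/A",              # 1-5
+    "MAJOR", "MAJOR", "MAJOR", "MAJOR", "MAJOR",                # 6-10
+    "PARTIAL", "ENGINE-SPECIFIC", "PARTIAL", "MAJOR", "MAJOR",  # 11-15
+    "PARTIAL", "N/A", "MAJOR", "MAJOR", "MAJOR",                # 16-20
+    "MAJOR", "MAJOR", "MAJOR", "PARTIAL", "N/A",                # 21-25
+    "MAJOR",                                                    # 26
+]
+
+tally = Counter(CLASSIFICATIONS)
+applicable = len(CLASSIFICATIONS) - tally["N/A"]  # drop rows that don't apply to M
+common = tally["MAJOR"] + tally["PARTIAL"]        # gaps shared by both engines
+
+print(f"{common} of {applicable} applicable categories are common gaps")
+# prints: 22 of 23 applicable categories are common gaps
+```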
+
+### Why this consolidation matters
+
+The 16 major common gaps are the highest-priority targets for M tooling investment. **A single vendor-neutral, source-level tool — built on a shared MUMPS parser foundation (e.g., [`tree-sitter-m`](https://github.com/rafael5/tree-sitter-m)) — can fill each of these gaps for both engines simultaneously.** That leverage is what justifies treating M as a portable language with portable tooling, rather than as a feature of a vendor's runtime that each vendor solves separately (and that neither vendor solves for MUMPS code).
+
+The 6 partial common gaps are second-tier targets: tools where both engines ship something usable but below gold standard, so the remediation work is to *augment* rather than to *originate*.
+
+The single engine-specific item (profiling, IRIS-only) is the only category where a remediation effort would be *YottaDB-side only*, with no parallel benefit to IRIS.
+
+This consolidated view is precisely what the companion document [gap-analysis-and-remediation-strategy.md](gap-analysis-and-remediation-strategy.md) builds its sequencing on — the major common gaps are the natural Tier-1 targets for any M-language toolchain effort.
+
+---
+
+## 8. Rank-Ordered Developer Impact: Where to Invest First
+
+The §7 consolidated table inventories *what is missing*. This closing section ranks the same gold-standard categories by **developer impact** — which tools, in absolute terms, do the most to improve **productivity, efficiency, code quality, and rapid code evolution**. The ranking is **independent of which engine implements them**: it asks the universal question, *"if a single tool from this list could be added to a developer's day, which would matter most?"*
+
+The categories cluster into four tiers. Within each tier, ordering is by approximate daily-use frequency × magnitude of impact.
+
+### Tier 1 — The development loop (transformative impact)
+
+These are the tools whose **absence is felt on every single edit**. They form the inner loop of modern software development: write code → check it → run a test → see the result. Without them, every other quality activity is harder.
+
+| Rank | Category | Why it sits here |
+|------|----------|------------------|
+| 1 | **Test runner** | The single most foundational tool. Without a test framework, no quality activity is possible — refactoring is unsafe, CI has nothing to gate on, coverage cannot be measured. Mainstream developers run tests dozens of times per hour; an M workflow where tests run once per session is a defining gap. |
+| 2 | **Linter (logic)** | Catches whole categories of bugs (unused vars, unreachable code, missing returns, undefined labels) **at edit time**, before they reach a test or production. Every keystroke is implicitly checked by IDE-integrated linters in mainstream languages; the absence in M means bugs surface only at runtime. |
+| 3 | **Formatter** | Eliminates style debate, makes diffs review-friendly, enforces canonical layout that downstream tools (linters, AST analysers) can rely on. Runs invisibly on every save in mainstream languages. |
+| 4 | **Single-test selection** | Without it, the test loop devolves to "run all tests, wait, scroll for the relevant failure." With it, the loop is sub-second. The difference compounds over a workday. |
+| 5 | **Test watcher** | Auto-rerun on save; sub-second feedback. Once a developer has experienced this loop (Rust's `cargo watch`, Python's `pytest-watch`), going back is painful. |
+
+**Tier 1 summary:** these five tools, used together, are the single biggest developer-experience gap between modern languages and M. They are also tightly coupled — adding any one without the others delivers a fraction of the value.
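+
+To make the logic-linter example of "undefined labels" concrete, here is a deliberately naive sketch of that one rule. It uses regular expressions rather than a parse tree, assumes one argument per `DO`/`GOTO`, and ignores string literals and indirection, so it illustrates the rule rather than serving as a usable linter; a real implementation would walk an AST such as the one `tree-sitter-m` produces.
+
+```python
+import re
+
+# A label starts in column 1: a name (optionally %-prefixed) or digits.
+LABEL = re.compile(r"^([%A-Za-z][A-Za-z0-9]*|[0-9]+)")
+
+# DO/D/GOTO/G, an optional postconditional, exactly one space, then a local
+# label. The lookahead rejects FOO^ROUTINE, which is an external reference.
+CALL = re.compile(
+    r"(?:^|\s)(?:DO|D|GOTO|G)(?::\S+)? ([%A-Za-z][A-Za-z0-9]*)(?![A-Za-z0-9^])",
+    re.IGNORECASE,
+)
+
+
+def undefined_labels(source: str) -> set[str]:
+    """Labels targeted by DO/GOTO in this routine but never defined in it."""
+    defined: set[str] = set()
+    referenced: set[str] = set()
+    for line in source.splitlines():
+        label = LABEL.match(line)
+        if label:
+            defined.add(label.group(1))
+        code = line.split(";", 1)[0]  # naive: ignores ";" inside strings
+        referenced.update(m.group(1) for m in CALL.finditer(code))
+    return referenced - defined
+
+
+SRC = (
+    "START ; entry point\n"
+    " D HELPER\n"
+    " D:X MISSING\n"
+    " D EXT^OTHER\n"
+    " Q\n"
+    "HELPER ; local helper\n"
+    " Q\n"
+)
+print(undefined_labels(SRC))  # {'MISSING'}
+```
+
+The single-space requirement after the command is deliberate: an argumentless `DO` is followed by two spaces in M, so requiring exactly one keeps the next command from being misread as a label.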
+
+### Tier 2 — Quality gates and team scaling (high impact)
+
+These tools move quality work from "individual discipline" to "automated guarantee." They are run periodically rather than on every edit, but they are how teams scale quality across many contributors.
+
+| Rank | Category | Why it sits here |
+|------|----------|------------------|
+| 6 | **CI script** | Every commit gets the full quality battery (lint, format check, tests). The bedrock of multi-developer collaboration. Currently both engines have *Basic* (Makefile + container); the gap is in CI-shaped harnesses tuned for M. |
+| 7 | **Coverage** | Measures test thoroughness; identifies untested code paths. Quality investment compounds when coverage is visible per-PR. |
+| 8 | **Linter (style)** | Secondary to logic linting, but pairs with the formatter to enforce a consistent codebase. |
+| 9 | **Pre-commit hooks** | Catches lint and format errors *before* a bad commit reaches the remote, saving CI cycles and shortening feedback. Cheap to implement on top of a linter and formatter. |
+| 10 | **Debugger** | When a bug resists static analysis, an interactive debugger (step / breakpoint / inspect) is the canonical recovery tool. Both engines provide `ZBREAK` at the engine level (basic) but lack mainstream IDE-integrated step-debugging for MUMPS code. |
+
+### Tier 3 — Maintenance and ecosystem (medium impact)
+
+These tools become important *after* a project has scale: shared knowledge, shared dependencies, code health over time.
+
+| Rank | Category | Why it sits here |
+|------|----------|------------------|
+| 11 | **Documentation generator** | Critical for onboarding new contributors and for long-term maintainability. Less daily-use than testing/linting, but every codebase eventually needs it. |
+| 12 | **Dependency management** | Becomes critical when projects need to share or consume libraries. For a single-team codebase, less acute; for an ecosystem, indispensable. |
+| 13 | **Dead code detection** | Periodic cleanup; identifies labels, routines, and exports no longer referenced. Quality-of-life. |
+| 14 | **Complexity metrics** | Code-health monitoring; flags routines that have grown unwieldy. Useful in CI as a "no new complexity above threshold" gate. |
+| 15 | **Fixture management** | Test-infrastructure scaffolding that becomes valuable once test runner + single-test selection exist. Without those, fixture management is moot. |
+
+### Tier 4 — Specialised or quality-of-life (lower impact)
+
+These tools matter, but either operate in narrow contexts (performance work, deployment, sharing) or are quality-of-life polish on top of capabilities already minimally present.
+
+| Rank | Category | Why it sits here |
+|------|----------|------------------|
+| 16 | **Snapshot testing** | Useful for specific patterns (CLI output, generated text); not a daily-use tool for most code. |
+| 17 | **Build / tasks** | Both engines already have *Basic* coverage via Makefile. The gap is convenience, not capability. |
+| 18 | **Runtime / REPL** | Quality-of-life for exploration; both engines have *something* (Basic / Minimal). Improvements are incremental, not transformative. |
+| 19 | **Syntax check** | Already exists at compile time on both engines (*Basic*). Gap is in editor-integrated speed and granular reporting. |
+| 20 | **Profiling** | Critical when performance work is on the agenda; idle the rest of the time. IRIS already has *Full* (`^%SYS.MONLBL`); YDB lacks it. Not daily-use. |
+| 21 | **Benchmarking** | Only used in performance-critical work. `$ZHOROLOG` covers the primitive case. |
+| 22 | **Security scan** | Important pre-deployment, less important pre-commit. Not daily-use. |
+| 23 | **Package publishing** | Only matters when sharing artefacts publicly. Until M has a vibrant package ecosystem, this is mostly aspirational. |
+
+### Closing observation
+
+The ranking is steeply skewed toward **Tier 1**. The five tools at the top — test runner, logic linter, formatter, single-test selection, test watcher — are not five separable items but a single integrated **inner loop** that every modern language ecosystem provides and the M ecosystem does not. **Filling those five gaps would close the most consequential portion of the M developer-experience deficit, regardless of which engine the code runs on.** Every tool below them depends on or is amplified by them.
+
+The ranking also confirms a striking economy: **the highest-impact gaps are also the most universal** — they are MUMPS-language gaps, not engine-specific gaps, and they are best filled by source-level tools built on a shared parser foundation. The companion remediation strategy ([gap-analysis-and-remediation-strategy.md](gap-analysis-and-remediation-strategy.md)) prioritises the inner-loop tools first, on exactly this reasoning.
+
+### 8.5 Validation: empirical grounding for the ranking
+
+The ranking above is informed by primary research where empirical data exists, and by engineering judgment where it does not. This subsection documents both — and is honest about the limits of the evidence base, since most research targets mainstream languages, not M.
+
+#### Tier 1 — strongest empirical support
+
+**Test automation as a foundational capability.** The clearest evidence comes from the DORA / Accelerate research programme:
+
+- Forsgren, Humble & Kim (2018), *Accelerate: The Science of Lean Software and DevOps* (IT Revolution). Drawing on four years of DORA research (**23,000+ respondents from 2,000+ organisations**), the book identifies **test automation** as one of the technical capabilities *most strongly correlated with high software-delivery performance* — alongside version control, continuous integration, continuous delivery, and loosely-coupled architecture.
+- [DORA / Test Automation capability](https://dora.dev/capabilities/test-automation/) summarises the core finding: fast, reliable automated test suites drive *higher software stability, reduced team burnout, and lower deployment pain*.
+
+**Fast feedback loop (test runner + watcher + single-test selection).** Foundational TDD research:
+
+- Erdogmus, Morisio & Torchiano (2005), [*"On the Effectiveness of the Test-First Approach to Programming"*](https://www.researchgate.net/publication/3189711) (IEEE TSE 31(3)). Controlled experiment with students: test-first programmers wrote more tests on average, and programmers who wrote more tests tended to be more productive. Supports Tier 1's emphasis on a test-centred inner loop.
+- Tosun et al. (2017), [*"An industry experiment on the effects of test-driven development on external quality and productivity"*](https://link.springer.com/article/10.1007/s10664-016-9490-0) (Empirical Software Engineering). Industry experiment with 24 professionals; mixed productivity findings but consistent quality improvements.
+
+**Static analysis (linter) impact.** Strong industrial evidence:
+
+- Sadowski, Aftandilian, Eagle, Miller-Cushon & Jaspan (2018), [*"Lessons from Building Static Analysis Tools at Google"*](https://cacm.acm.org/research/lessons-from-building-static-analysis-tools-at-google/) (CACM 61(4)). Documents Google's Tricorder system, which prevents *hundreds of bugs per day* from entering the Google codebase. Confirms static analysis as a high-leverage tool when integrated into the developer workflow — and characterises why ad-hoc bug-filing approaches fail (84% of bugs not fixed) but compiler-integrated checks succeed.
+
+**Linters and formatters in practice.** Industry survey data:
+
+- [Stack Overflow Annual Developer Survey 2024](https://survey.stackoverflow.co/2024/) — among build / dev tools, **Ruff** (Python linter + formatter) scores **84% admired** (highest in its category), and **Cargo** (Rust dep manager + test runner + build tool) scores **83% admired**. Tools that bundle the Tier 1 capabilities consistently rank at the top of developer-satisfaction surveys, supporting their primacy in the ranking.
+
+#### Tier 2 — solid empirical support for CI; coverage is more nuanced
+
+**Continuous integration:**
+
+- Vasilescu, Yu, Wang, Devanbu & Filkov (2015), [*"Quality and productivity outcomes relating to continuous integration in GitHub"*](https://web.cs.ucdavis.edu/~filkov/papers/pr_soc_lan.pdf) (FSE 2015). Large-scale GitHub study: CI-using projects merge PRs significantly faster, and core developers using CI discover more bugs.
+- Hilton, Tunnell, Huang, Marinov & Dig (2016), [*"Usage, Costs, and Benefits of Continuous Integration in Open-Source Projects"*](https://dl.acm.org/doi/10.5555/3155562.3155575) (ASE 2016). Documents adoption patterns and quantifies productivity benefits.
+
+**Coverage — a useful signal, not a guarantee:**
+
+- Inozemtseva & Holmes (2014), *"Coverage is Not Strongly Correlated with Test Suite Effectiveness"* (ICSE 2014). Important counter-point: coverage is a useful indicator but does **not** guarantee test quality. This is why Coverage sits at #7 (Tier 2, high impact) rather than Tier 1 — its value is conditional on having good tests already.
+
+**Code review (related to pre-commit hooks):**
+
+- Bacchelli & Bird (2013), *"Expectations, Outcomes, and Challenges of Modern Code Review"* (ICSE 2013). Establishes modern code review as a high-impact quality activity; pre-commit hooks shift some of that work to an earlier, faster checkpoint.
+
+#### Tier 3 / 4 — weaker empirical grounding, more reliance on cross-language consensus
+
+For documentation generators, dependency managers, complexity metrics, and the specialised tools in Tier 4, direct empirical comparison studies are sparse. The ranking here relies on:
+
+- **Cross-language consensus** — every mainstream language (Python, JS/TS, Go, Rust, Java) ships these tools at this approximate level of priority, as documented in [§2's gold-standard tables](#2-the-gold-standard--top-5-language-toolchains).
+- **Frequency of daily use** as a proxy — tools that operate periodically (security scan, package publishing) are placed below tools that operate continuously.
+- **Stack Overflow Developer Survey** popularity rankings, which consistently place dependency managers (Cargo, uv, npm) among the most-admired tools, validating their Tier 3 placement.
+
+#### The SPACE framework as a sanity check
+
+- Forsgren, Storey, Maddila, Zimmermann, Houck & Butler (2021), [*"The SPACE of Developer Productivity"*](https://queue.acm.org/detail.cfm?id=3454124) (CACM 64(6)). The five SPACE dimensions — **S**atisfaction, **P**erformance, **A**ctivity, **C**ommunication, **E**fficiency — provide a useful sanity check. Tier 1 tools touch all five (they affect satisfaction *and* performance *and* efficiency); lower-tier tools tend to touch one or two. The tier ordering is broadly consistent with SPACE coverage.
+
+#### Limitations and caveats
+
+1. **Most research targets mainstream languages.** M-specific empirical productivity data is scarce. The ranking transfers cross-language conclusions on the assumption that the M development cycle is structurally similar — a defensible but unverified premise.
+2. **No empirical study directly compares all 23 categories head-to-head.** The ranking synthesises research where it exists and engineering judgment where it doesn't.
+3. **"Productivity" is multi-dimensional.** SPACE makes this explicit: no single metric captures it. The ranking reflects an *unweighted* aggregate across productivity, efficiency, quality, and rapid code evolution. A team weighting one dimension heavily (e.g., a research lab prioritising exploration velocity) would justifiably re-rank some categories.
+4. **Ordering *within* a tier is judgment-based.** Cross-tier ordering (Tier 1 above Tier 2 etc.) is research-supported; intra-tier ordering (e.g., test runner #1 vs logic linter #2) is informed by tool-dependency graphs rather than direct comparative studies.
+
+#### Suggested primary sources for follow-up
+
+- Forsgren, Humble & Kim (2018), *Accelerate: The Science of Lean Software and DevOps* (IT Revolution Press)
+- DORA's annual *State of DevOps* reports at [dora.dev/research](https://dora.dev/research/)
+- Forsgren, Storey et al. (2021), [*"The SPACE of Developer Productivity"*](https://queue.acm.org/detail.cfm?id=3454124) (CACM 64(6))
+- Sadowski et al. (2018), [*"Lessons from Building Static Analysis Tools at Google"*](https://cacm.acm.org/research/lessons-from-building-static-analysis-tools-at-google/) (CACM 61(4))
+- Vasilescu et al. (2015), [*"Quality and productivity outcomes relating to continuous integration in GitHub"*](https://web.cs.ucdavis.edu/~filkov/papers/pr_soc_lan.pdf) (FSE 2015)
+- [Stack Overflow Annual Developer Survey](https://survey.stackoverflow.co/) (annual; useful for tool popularity / satisfaction)
+- [JetBrains State of Developer Ecosystem](https://www.jetbrains.com/lp/devecosystem-2024/) (annual; complementary tool-popularity data)
+
+---
+
+*End of m-tool-gap-analysis document.*
diff --git a/docs/history/m-tooling-tier1.md b/docs/history/m-tooling-tier1.md
new file mode 100644
index 0000000..17d99b0
--- /dev/null
+++ b/docs/history/m-tooling-tier1.md
@@ -0,0 +1,260 @@
+# M Tooling — Tier 1 Strategy: Closing the Inner-Loop Gaps
+
+> **Archived snapshot.** Imported verbatim from [`m-dev-tools/m-tools`](https://github.com/m-dev-tools/m-tools) — source commit [`16fe3f7`](https://github.com/m-dev-tools/m-tools/commit/16fe3f7dc6982070809cd1d8290d01fedc5905ac) (2026-04-27), before that repo was archived. Preserved as the original Tier 1 strategy that scoped what became `m-cli`'s first deliverables. **Not maintained.** For the *current* shape of the org, start at [`profile/README.md`](../../profile/README.md).
+
+**Document type:** Strategic plan, scoped
+**Scope:** The five Tier 1 developer-toolchain gaps in the M (MUMPS) ecosystem
+**Audience:** Anyone planning, coordinating, or contributing to M-language tooling work
+**Companion documents:**
+- [m-tool-gap-analysis.md](m-tool-gap-analysis.md) — the broader cross-engine gap analysis (this doc focuses on its [§8 Tier 1](m-tool-gap-analysis.md#8-rank-ordered-developer-impact-where-to-invest-first))
+- [gap-analysis-and-remediation-strategy.md](gap-analysis-and-remediation-strategy.md) — the wider phased remediation plan (this doc is the focused Tier 1 extract)
+
+---
+
+## Table of Contents
+
+- [1. The Tier 1 gaps](#1-the-tier-1-gaps)
+- [2. Foundation already in place](#2-foundation-already-in-place)
+  - [2.1 `m-standard` — the language reference](#21-m-standard--the-language-reference)
+  - [2.2 `tree-sitter-m` — the parser](#22-tree-sitter-m--the-parser)
+  - [2.3 VistA — the corpus](#23-vista--the-corpus)
+- [3. Strategy: incremental remediation](#3-strategy-incremental-remediation)
+  - [3.1 Principles](#31-principles)
+  - [3.2 The sequence](#32-the-sequence)
+  - [3.3 What ships at each step](#33-what-ships-at-each-step)
+  - [3.4 Portability across M implementations](#34-portability-across-m-implementations)
+  - [3.5 Validation gates](#35-validation-gates)
+  - [3.6 Out of scope (intentional)](#36-out-of-scope-intentional)
+- [4. Why Tier 1 first](#4-why-tier-1-first)
+- [5. Design decisions](#5-design-decisions)
+  - [5.1 IRIS adapter ownership](#51-iris-adapter-ownership)
+  - [5.2 `^XINDEX` integration](#52-xindex-integration)
+  - [5.3 Performance baselining](#53-performance-baselining)
+  - [5.4 Editor integration cadence](#54-editor-integration-cadence)
+  - [5.5 Versioning across `m-standard` updates](#55-versioning-across-m-standard-updates)
+
+---
+
+## 1. The Tier 1 gaps
+
+The [§8 ranking in m-tool-gap-analysis.md](m-tool-gap-analysis.md#8-rank-ordered-developer-impact-where-to-invest-first) identifies five Tier 1 capabilities — the development inner loop — that are **MAJOR common gaps** across both major M engines (IRIS, YottaDB) for pure MUMPS code. They are the **transformative** tools, validated in [§8.5](m-tool-gap-analysis.md#85-validation-empirical-grounding-for-the-ranking) against DORA / *Accelerate* research and the broader literature on developer productivity.
+
+| # | Capability | Why it's Tier 1 |
+|---|------------|-----------------|
+| 1 | **Test runner** | Foundation. Without it, no quality activity is possible — refactoring is unsafe, CI has nothing to gate on, coverage cannot be measured. |
+| 2 | **Linter (logic)** | Catches whole categories of bugs — unused vars, unreachable code, missing `QUIT`s, undefined labels — *at edit time*, before they reach a test or production. |
+| 3 | **Formatter** | Eliminates style debate; enforces canonical layout that downstream tools (linters, AST analysers) can rely on. Runs invisibly on every save in mainstream languages. |
+| 4 | **Single-test selection** | Without it, the test loop is "run all tests, scroll for the relevant failure." With it, the loop is sub-second. The difference compounds over a workday. |
+| 5 | **Test watcher** | Auto-rerun on save; sub-second feedback. Once a developer has experienced this loop (Rust's `cargo watch`, Python's `pytest-watch`), going back is painful. |
+
+These five are not five separable tools but a single integrated **inner loop**: edit → save → format → lint → run-affected-test → see result. Filling all five gaps closes the highest-leverage portion of the M developer-experience deficit; any one in isolation delivers a fraction of the value.
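+
+The watcher step of this loop (auto-rerun on save) can be sketched with simple polling. The sketch assumes M sources are `*.m` files under a single directory and takes the rerun action as a caller-supplied function, a hypothetical hook that might, for example, invoke whatever test runner is in use. Production watchers typically rely on OS file-change notifications rather than polling.
+
+```python
+import time
+from pathlib import Path
+from typing import Callable, Optional
+
+
+def snapshot(root: Path) -> dict[Path, float]:
+    """Map every *.m file under root to its last-modified time."""
+    return {p: p.stat().st_mtime for p in root.rglob("*.m")}
+
+
+def changed(before: dict[Path, float], after: dict[Path, float]) -> set[Path]:
+    """Files added, removed, or modified between two snapshots."""
+    return {p for p in before.keys() | after.keys() if before.get(p) != after.get(p)}
+
+
+def watch(
+    root: Path,
+    rerun: Callable[[set[Path]], None],
+    interval: float = 0.5,
+    max_polls: Optional[int] = None,
+) -> None:
+    """Poll root every `interval` seconds; call rerun() with any changed files."""
+    previous = snapshot(root)
+    polls = 0
+    while max_polls is None or polls < max_polls:
+        time.sleep(interval)
+        current = snapshot(root)
+        delta = changed(previous, current)
+        if delta:
+            rerun(delta)  # hypothetical hook: e.g. run tests for these files
+        previous = current
+        polls += 1
+```
+
+A caller might run `watch(Path("routines"), rerun=print, max_polls=10)` (the directory name is illustrative) to see which files changed on each poll; `max_polls` exists only so the loop can terminate in tests.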
+
+---
+
+## 2. Foundation already in place
+
+Tier 1 is feasible *now* because three prerequisites have already shipped. These are not aspirations — they are tagged, tested, machine-readable artefacts that downstream tools can consume directly.
+
+### 2.1 [`m-standard`](https://github.com/rafael5/m-standard) — the language reference
+
+A machine-readable, vendor-neutral inventory of the M language surface, reconciled across the Annotated M Standard (ISO 11756), YottaDB documentation, IRIS documentation, and the VA SAC / XINDEX rule set:
+
+- **949 keyword forms** (commands, intrinsic functions, intrinsic special variables, operators, pattern codes) with provenance flags (`in_anno`, `in_ydb`, `in_iris`).
+- Three layered standards: **Pragmatic** (81 tokens — runs unmodified on both YDB and IRIS), **VA SAC-clean**, and **Operational** (58 tokens — Pragmatic ∩ SAC).
+- A `grammar-surface.json` artefact purpose-built for parser generators and tools to consume.
+- All sources offline-replicated; the build is byte-deterministic; **9 validation gates** passing on every CI run.
+
+This is the **vocabulary** every Tier 1 tool needs: which tokens are valid, which are pragmatic, which are SAC-compliant, what provenance each carries.
+
+### 2.2 [`tree-sitter-m`](https://github.com/rafael5/tree-sitter-m) — the parser
+
+A production tree-sitter grammar for M, generated from `m-standard`'s grammar-surface JSON:
+
+- **99.06% clean parse on the full 39,330-routine VistA corpus**; **100% on clinical packages**.
+- A synthesised 10,000-line routine parses in **78.6 ms**.
+- 110 corpus tests + 19 lib tests + 347/347 keyword-coverage triples passing in CI.
+- Bindings scaffolded for **Node / Rust / Python / Go**; publishing to npm / crates.io / PyPI / Go modules in progress.
+
+This is the **AST** every Tier 1 tool needs. A formatter walks it to produce canonical text. A linter visits its nodes with rule predicates. A test discoverer searches for `tXxx` test labels in it. Without this, every tool would have to re-implement an M parser from scratch — a multi-year effort that has been the historical blocker for M tooling.
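+
+To make the `tXxx` convention concrete, here is a dependency-free stand-in for test discovery. It is **not** the parser-aware implementation described above: it applies the M rule that a label starts in column 1 and keeps labels matching an assumed `tXxx` pattern (lowercase `t`, then an uppercase letter). In the real tool, a tree-sitter-m query over label nodes would replace the line scan. `find_test_labels` is a hypothetical name.
+
+```python
+import re
+
+# Assumption: a label starts in column 1 and is a name (optional leading %,
+# then letters/digits) or a run of digits; a formal list may follow.
+LABEL_RE = re.compile(r"^(%?[A-Za-z][A-Za-z0-9]*|[0-9]+)")
+# Assumption: test labels follow the tXxx convention, i.e. a lowercase "t"
+# followed by an uppercase letter.
+TEST_LABEL_RE = re.compile(r"^t[A-Z][A-Za-z0-9]*$")
+
+
+def find_test_labels(source: str) -> list[str]:
+    """Return test labels in file order (line scan, not parser-aware)."""
+    labels = []
+    for line in source.splitlines():
+        # Lines that begin with a space or tab carry no label.
+        if not line or line[0] in " \t":
+            continue
+        match = LABEL_RE.match(line)
+        if match and TEST_LABEL_RE.match(match.group(1)):
+            labels.append(match.group(1))
+    return labels
+```
+
+For example, a suite containing the labels `tAdd`, `tSub(a)` and a helper `total` yields `["tAdd", "tSub"]`.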
+
+### 2.3 VistA — the corpus
+
+The U.S. Department of Veterans Affairs' VistA system — distributed publicly via [`WorldVistA/VistA-M`](https://github.com/WorldVistA/VistA-M) — is **~40,000 routines of pure ANSI MUMPS**, in active production use for decades. It is the **largest, most diverse open-source M codebase** in existence and the gold-standard real-world test corpus.
+
+Every Tier 1 tool can be:
+
+- **Built against it.** VistA exercises every hard-to-parse M idiom (dot-blocks, naked references, postconditionals, indirection, edge-case parameter passing). A tool that handles VistA handles real-world M.
+- **Validated against it.** A formatter that doesn't round-trip cleanly on VistA isn't ready. A linter that produces 40,000 false positives isn't ready. A test discoverer that misses VistA's `tXxx` conventions isn't ready.
+- **Demonstrated on it.** VistA is the showcase: *"this tool runs on the largest M codebase in production today."*
+
+VistA is the difference between toy tooling and tooling proven at scale. `tree-sitter-m` already validates against it (39,330 routines, 99.06% clean); every Tier 1 tool inherits that validation harness.
+
+---
+
+## 3. Strategy: incremental remediation
+
+### 3.1 Principles
+
+1. **Build in dependency order.** A formatter unblocks the linter (linter rules can assume canonical layout). A test runner unblocks single-test selection and the watcher. Build prerequisites first.
+2. **Ship each tool independently.** No tool waits for the next; each is usable on the day it's released.
+3. **Validate against VistA on every release.** A Tier 1 tool that isn't tested on the 40,000-routine corpus is unfinished work.
+4. **Build on YottaDB; design for portability.** YottaDB is AGPL-3.0 — fully open-source, fully reproducible CI, no licence negotiation, no per-developer seat costs. But the tools are **source-level**: they consume `.m` files via `tree-sitter-m`, not via any engine-specific interface. They run on any conformant M engine simply by running on the source.
+5. **No engine lock-in in the binding choice.** Each tool exposes a stable CLI; engine integration is a thin shell wrapper. A Python user with `tree-sitter-m` Python bindings can run the formatter without YottaDB; an IRIS shop can run it the same way.
+
+### 3.2 The sequence
+
+| Step | Tool | Depends on | Notes |
+|------|------|-----------|-------|
+| 1 | **Formatter** (`m fmt`) | tree-sitter-m AST + lossless byte-range pretty-printer | Build first: every later tool benefits from canonical layout. **Idempotent**: running `m fmt` on its own output produces no further change. `--check` mode for CI. |
+| 2 | **Linter — logic** (`m lint --logic`) | tree-sitter-m AST visitor + rule predicates | Catches missing `QUIT`, unreachable code, undefined labels, unused locals, naked-reference hazards. Pluggable rules; configurable via `m.toml`. JSON / TAP output for editor integration. |
+| 3 | **Test runner** (`m test`) | YottaDB runtime + parser-aware `tXxx` test discovery | The project already ships [`ytest`](../bin/ytest); the strategic step is to make it **parser-aware** (test discovery via tree-sitter-m, not regex), portable across engines via thin adapters, and TAP-13 compliant out of the box. |
+| 4 | **Single-test selection** | (folded into test runner) | `m test <suite> <label>`; `m test --pattern '...'`. Already prototyped in `ytest <suite> <label>`. |
+| 5 | **Test watcher** (`m watch`) | Formatter + linter + test runner | Auto-rerun on save. Smart routing — recompile + test only the affected suites. The project already ships [`ytest-watch-smart`](../bin/ytest-watch-smart) as a foundation; the parser-aware version replaces stat-based polling with AST-derived dependency tracking. |
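+
+The watcher's "test only the affected suites" step reduces to a reachability query over a call graph. A minimal sketch, assuming a hypothetical `calls` map (routine → routines it references) has already been extracted from the AST: invert the edges, walk upward from the changed routines, and keep the routines that are test suites. `affected_suites` is an illustrative name, not an existing API.
+
+```python
+from collections import deque
+
+
+def affected_suites(calls: dict[str, set[str]], suites: set[str],
+                    changed: set[str]) -> list[str]:
+    """Return suites whose transitive call graph reaches a changed routine."""
+    # Invert the edges: routine -> routines that call it.
+    callers: dict[str, set[str]] = {}
+    for caller, callees in calls.items():
+        for callee in callees:
+            callers.setdefault(callee, set()).add(caller)
+
+    # Breadth-first walk from the changed routines up through their callers.
+    seen = set(changed)
+    queue = deque(changed)
+    while queue:
+        routine = queue.popleft()
+        for caller in callers.get(routine, ()):
+            if caller not in seen:
+                seen.add(caller)
+                queue.append(caller)
+
+    # `seen` includes the changed routines, so an edited suite reruns itself.
+    return sorted(seen & suites)
+```
+
+With `TSTMATH` calling `MATH` and `MATH` calling `UTIL`, a save that touches `UTIL` reruns only `TSTMATH`.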
+
+### 3.3 What ships at each step
+
+**After Step 1 (formatter):**
+- Every contributor in the M ecosystem can apply canonical layout to any `.m` file with a single command.
+- VistA codebases can adopt a consistent style without manual re-indenting.
+- Pre-commit hooks gain a meaningful `m fmt --check` gate.
+- Code reviews stop arguing about whitespace.
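+
+The `--check` gate and the idempotence requirement are two small predicates over any formatting function. The sketch below uses a hypothetical `toy_format` (trailing-whitespace cleanup only) in place of the real AST-driven `m fmt`; a CI wrapper would exit non-zero when `check` returns `False`.
+
+```python
+def toy_format(source: str) -> str:
+    """Hypothetical stand-in formatter: strip trailing whitespace on each
+    line and end the file with exactly one newline."""
+    lines = [line.rstrip() for line in source.splitlines()]
+    return "\n".join(lines).rstrip("\n") + "\n"
+
+
+def check(source: str, fmt=toy_format) -> bool:
+    """`--check` semantics: True if the file is already canonical."""
+    return fmt(source) == source
+
+
+def is_idempotent(source: str, fmt=toy_format) -> bool:
+    """Formatting already-formatted output must change nothing."""
+    once = fmt(source)
+    return fmt(once) == once
+```
+
+Running `is_idempotent` over every routine in the VistA corpus is a cheap property test to add alongside the round-trip gate.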
+
+**After Step 2 (linter — logic):**
+- Whole categories of bugs caught at edit time, not at runtime.
+- Pre-commit hook gains an `m lint --logic` gate.
+- VistA-specific rule sets can be enabled via SAC compliance level (driven from `m-standard`'s SAC mappings).
+- Editor integration via JSON output makes diagnostics first-class in VS Code, Vim, Emacs.
+
+**After Steps 3 and 4 (test runner with single-test selection):**
+- Test discovery is parser-aware (no false-positive label detection).
+- Tests run on any M engine that has an adapter; YottaDB is primary, and an IRIS adapter is deferred (see §5.1).
+- `m test <suite> <label>` is the supported, documented invocation.
+- TAP-13 output integrates with mainstream CI dashboards.
+
+**After Step 5 (test watcher):**
+- The full inner loop: edit → save → format → lint → run-affected-test → instant feedback.
+- For the first time, the M ecosystem gains the fast-feedback workflow that DORA / *Accelerate* research identifies as foundational.
+
+### 3.4 Portability across M implementations
+
+Each Tier 1 tool is **source-level by construction**:
+
+| Tool | Engine touchpoint | Portability story |
+|------|-------------------|-------------------|
+| Formatter | None | Operates on `.m` text via tree-sitter-m. Engine-independent. |
+| Linter (logic) | None | Operates on the AST. Engine-independent. |
+| Test discovery | None | Parser-aware label scan. Engine-independent. |
+| Test execution | Local M engine CLI | Pluggable adapter: YottaDB primary (`ydb -run ^TESTRUN`), IRIS via `iris session`, GT.M via `mumps -run`, etc. |
+| Test watcher | Filesystem + test execution | Engine-independent orchestrator; only the test-execution adapter is engine-specific. |
+
+The integration boundary with the engine is *only* the test-execution adapter — everything else operates on `.m` source files via the parser. This makes each tool **portable to IRIS-based VistA, GT.M, or any other conformant M engine** with **only a small adapter** for that engine's CLI. The bulk of the implementation is engine-neutral.
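+
+A sketch of that boundary, assuming each engine needs only an argv prefix plus an entry reference. The invocations mirror the portability table above (`ydb -run`, `mumps -run`, default entry `^TESTRUN`); the `Adapter` class and `build_command` helper are hypothetical names. No IRIS entry is included, consistent with §5.1.
+
+```python
+from dataclasses import dataclass
+from typing import List, Tuple
+
+
+@dataclass(frozen=True)
+class Adapter:
+    """Engine-specific shim: the only part of the runner that knows the engine."""
+    engine: str
+    argv_prefix: Tuple[str, ...]
+
+    def command(self, entryref: str) -> List[str]:
+        # Build the argv list; subprocess.run(argv) would execute it.
+        return [*self.argv_prefix, entryref]
+
+
+# Invocations taken from the portability table; treat them as illustrative.
+ADAPTERS = {
+    "yottadb": Adapter("yottadb", ("ydb", "-run")),
+    "gtm": Adapter("gtm", ("mumps", "-run")),
+}
+
+
+def build_command(engine: str, entryref: str = "^TESTRUN") -> List[str]:
+    try:
+        return ADAPTERS[engine].command(entryref)
+    except KeyError:
+        raise ValueError(f"no test-execution adapter for {engine!r}") from None
+```
+
+Everything upstream of `build_command` (discovery, selection, watching) stays engine-neutral; adding an engine means adding one `ADAPTERS` entry.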
+
+YottaDB is the primary build / development engine for two pragmatic reasons:
+
+1. **Open-source reproducibility.** AGPL-3.0 means anyone can install, run, and contribute without licence negotiation. CI runs in any standard container.
+2. **Mature C API.** `libyottadb.so` is a stable extensibility surface; foreign-language bindings (Go, Python, Rust, Node.js, Lua, Perl) make it straightforward to embed engine calls in tool implementations when needed (see [m-tool-gap-analysis.md §4.4](m-tool-gap-analysis.md#44-foreign-language-integration-embedded-language-vs-embedded-database) for the architecture rationale).
+
+But "built on YottaDB" never means "locked to YottaDB." Each tool's parser-side work is engine-neutral; only the test-execution shim varies by engine.
+
+### 3.5 Validation gates
+
+Before any Tier 1 tool is considered production-ready, it must pass:
+
+1. **VistA round-trip.** Runs cleanly on the full 40,000-routine [VistA-M](https://github.com/WorldVistA/VistA-M) corpus: the formatter produces no false-positive failures, the linter no false-positive lints, and test discovery no missed tests.
+2. **Cross-engine smoke test.** Each available test-runner adapter works on its engine (YottaDB is primary; an IRIS `iris session` adapter is deferred per §5.1), and no engine-specific behaviours leak into the source-level tools.
+3. **CI integration.** Each tool is wired into the project's own CI as a self-test (`make ci` runs the tool on the project's own routines). Dogfooding is the first acceptance test.
+4. **Performance ceiling.** Each tool runs the full VistA corpus inside a documented budget — first-pass targets: formatter ≤ 60 s, linter ≤ 120 s on a current developer laptop. Performance-budgeting from day one prevents the "works on small examples but unusable at scale" failure mode.
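+
+A budget gate can be as small as a lookup and a comparison. This sketch hard-codes the first-pass figures above (60 s for the formatter, 120 s for the linter) under hypothetical keys `fmt` and `lint`, and times work with `time.perf_counter`, a monotonic clock suited to measuring intervals.
+
+```python
+import time
+
+# First-pass budgets in seconds; §5.3 notes these may be revised.
+BUDGETS_S = {"fmt": 60.0, "lint": 120.0}
+
+
+def within_budget(tool: str, elapsed_s: float) -> bool:
+    """True if a full-corpus run stayed inside the documented budget."""
+    return elapsed_s <= BUDGETS_S[tool]
+
+
+def timed(fn, *args):
+    """Run fn(*args); return (result, elapsed seconds)."""
+    start = time.perf_counter()
+    result = fn(*args)
+    return result, time.perf_counter() - start
+```
+
+In CI, a step would time the corpus run (for example with a hypothetical `format_corpus` function passed to `timed`) and fail the job when `within_budget("fmt", elapsed)` is `False`.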
+
+### 3.6 Out of scope (intentional)
+
+The Tier 1 plan does **not** cover:
+
+- **Coverage** (line / branch) — Tier 2 in [§8](m-tool-gap-analysis.md#8-rank-ordered-developer-impact-where-to-invest-first).
+- **Documentation generation** — Tier 3.
+- **Dependency management** — Tier 3, blocked on a manifest-format design in `m-standard`.
+- **IDE / DAP integration** — Tier 2; substantial engineering on its own.
+- **IRIS ObjectScript (IOS) tooling** — out of scope; IOS is a separate language ([m-tool-gap-analysis.md §4.1.1](m-tool-gap-analysis.md#411-iris-objectscript-ios-what-it-is-and-why-it-isnt-ansi-standard-mumps)) with its own toolchain.
+
+These are excluded to keep the Tier 1 plan focused. Each is sequenced separately in [gap-analysis-and-remediation-strategy.md → Addendum B](gap-analysis-and-remediation-strategy.md#addendum-b-prioritized-sequence-of-remediation-post-parser).
+
+---
+
+## 4. Why Tier 1 first
+
+The case for Tier 1 primacy has three legs, all already established in the companion analysis:
+
+**1. Empirical research on developer productivity.** [m-tool-gap-analysis.md §8.5](m-tool-gap-analysis.md#85-validation-empirical-grounding-for-the-ranking) cites primary sources:
+
+- Forsgren, Humble & Kim (2018), *Accelerate*, identifies test automation as among the technical capabilities most strongly correlated with high software-delivery performance (DORA programme, 23,000+ respondents).
+- Sadowski et al. (2018), [*"Lessons from Building Static Analysis Tools at Google"*](https://cacm.acm.org/research/lessons-from-building-static-analysis-tools-at-google/) (CACM 61(4)) — Tricorder static analysis prevents hundreds of bugs per day from entering Google's codebase.
+- Vasilescu et al. (2015), [*"Quality and productivity outcomes relating to continuous integration in GitHub"*](https://web.cs.ucdavis.edu/~filkov/papers/pr_soc_lan.pdf) (FSE 2015) — CI users merge PRs significantly faster and find more bugs.
+- Stack Overflow Annual Developer Survey: Ruff (84% admired) and Cargo (83% admired) — top-of-survey tools that bundle the Tier 1 capabilities.
+
+**2. Cross-engine consolidation.** [§7 in m-tool-gap-analysis.md](m-tool-gap-analysis.md#7-consolidated-gap-analysis) shows that **all five Tier 1 capabilities are MAJOR common gaps** — both IRIS and YottaDB ship **None** for MUMPS code. **A single source-level tool, built on a shared parser foundation, fills the gap on every M engine simultaneously.** That economy of leverage is the strategic case for treating M as a portable language with portable tooling, not as a vendor-locked feature.
+
+**3. The 40,000-routine VistA reality.** [§6.1 / §6.2](m-tool-gap-analysis.md#6-the-real-question-developer-experience-for-a-legacy-mumps-codebase) of m-tool-gap-analysis frames the real-world stakes: a VistA codebase has effectively zero benefit from IRIS's IOS-targeted tooling (the wrappers don't reach the MUMPS code) and effectively zero benefit from YottaDB's runtime-first investment (the developer-experience layer simply isn't there). Tier 1 is the work that closes the gap *for the actual M codebase that matters most*.
+
+---
+
+## 5. Design decisions
+
+The five questions raised during initial planning now have working resolutions. They remain provisional — they may be revisited as work proceeds — but the project starts with these decisions in place.
+
+### 5.1 IRIS adapter ownership
+
+**Decision: defer indefinitely. Tier 1 ships without an IRIS adapter.**
+
+InterSystems' demonstrated trajectory is to promote IRIS ObjectScript (IOS) as the developer-facing language and to scrub mention of MUMPS where possible — the 2018 Caché → IRIS rename was a marketing exercise, and IOS is a proprietary wrapper sitting *between* IRIS users and the MUMPS substrate. The vendor is not investing in MUMPS-side developer experience and has shown no interest in doing so. (See [m-tool-gap-analysis.md §1.2 naming history](m-tool-gap-analysis.md#naming-history-intersystems-mumps--caché-objectscript--iris-objectscript-ios) and [§4.1.3](m-tool-gap-analysis.md#413-iris-tooling-by-file-scope-and-language) for the evidence.)
+
+Building and maintaining an IRIS adapter would require coordinating with a vendor whose strategic interests are misaligned with the goals of this work. The pragmatic choice is to invest the same effort in YottaDB and the source-level tooling — which ports automatically to any conformant M engine — and let an IRIS adapter remain a community contribution if one ever emerges. The source-level tools (formatter, linter, test discovery) are unaffected by this decision; they run on `.m` files via the parser, regardless of which engine the runtime side targets.
+
+### 5.2 `^XINDEX` integration
+
+**Decision: import the `^XINDEX` rule set as the linter's baseline; validate against XINDEX on the VistA corpus; then expand. Expose rule-family selection via a `--rules` toggle.**
+
+Mechanics:
+
+- The Tier 1 linter's first rule pack **replicates the XINDEX rule set**, mapping each XINDEX check to an `m lint` rule with stable IDs (e.g., `M-XINDX-001`, `M-XINDX-002`, …).
+- Running `m lint --rules xindex` on the VistA corpus must reproduce XINDEX's findings — a hard validation gate ("if XINDEX flags it, we flag it; if XINDEX doesn't flag it, we don't").
+- After parity, `m lint` extends with rules that XINDEX does not cover: parser-aware checks XINDEX cannot do (e.g., naked-reference hazards in nested dot blocks), modern lint categories (dead-code analysis, unused-locals), SAC compliance levels, and project-specific rules.
+- A `--rules` toggle selects the rule family at invocation time:
+  - `--rules xindex` — XINDEX-equivalent only (legacy compatibility mode)
+  - `--rules sac` — VA SAC compliance set (driven from `m-standard`'s SAC mappings)
+  - `--rules all` — everything the linter knows
+  - `--rules <custom>` — per-project profiles defined in `m.toml`
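+
+A sketch of how `--rules` could resolve to a concrete rule set, plus the XINDEX parity gate from the list above expressed as two set differences. The registry contents, every rule ID other than the `M-XINDX-NNN` style, and the shape of per-project profiles (assumed to come from an `m.toml` table, which Python's `tomllib` could parse into the `custom` dict) are all hypothetical.
+
+```python
+from typing import Dict, List, Optional, Set, Tuple
+
+# Hypothetical registry: rule ID -> rule families that include it.
+# Only the "M-XINDX-NNN" ID style comes from the plan; the rest is made up.
+RULES: Dict[str, Set[str]] = {
+    "M-XINDX-001": {"xindex"},
+    "M-XINDX-002": {"xindex"},
+    "M-SAC-001": {"sac"},
+    "M-DEAD-001": set(),  # a parser-aware rule beyond XINDEX
+}
+BUILTIN_FAMILIES = {"xindex", "sac"}
+
+# A finding is (routine, line, rule ID); this shape is an assumption.
+Finding = Tuple[str, int, str]
+
+
+def select_rules(profile: str,
+                 custom: Optional[Dict[str, List[str]]] = None) -> Set[str]:
+    """Resolve a --rules value to rule IDs."""
+    if profile == "all":
+        return set(RULES)
+    if profile in BUILTIN_FAMILIES:
+        return {rid for rid, families in RULES.items() if profile in families}
+    profiles = custom or {}
+    if profile not in profiles:
+        raise ValueError(f"unknown rule profile: {profile!r}")
+    unknown = set(profiles[profile]) - set(RULES)
+    if unknown:
+        raise ValueError(f"profile {profile!r} names unknown rules: {sorted(unknown)}")
+    return set(profiles[profile])
+
+
+def parity_gaps(ours: Set[Finding], xindex: Set[Finding]) -> Dict[str, Set[Finding]]:
+    """Parity holds when both sets are empty: nothing missed, nothing extra."""
+    return {"missed": xindex - ours, "extra": ours - xindex}
+```
+
+Rejecting unknown rule IDs in a custom profile keeps a typo in `m.toml` from silently disabling a check.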
+
+**Rationale.** XINDEX's rule set encodes decades of accumulated VA / VistA experience about what to catch in M code. Replicating it gives the linter immediate credibility on the VistA corpus, makes the migration path frictionless ("disable `^XINDEX`, enable `m lint --rules xindex`, expect the same findings"), and provides a baseline from which to expand. The pattern of "absorb the predecessor, then extend" is well-precedented — ESLint absorbed JSHint / JSLint rules; Ruff absorbed flake8 / isort / pyupgrade.
+
+### 5.3 Performance baselining
+
+**Decision: TBD. Resolved by measurement once the tools exist.**
+
+The 60 s / 120 s budgets in §3.5 are first-pass estimates. Empirical baselining will happen once each tool exists and can be measured against the 40,000-routine VistA corpus on representative hardware. If the budgets prove unrealistic, they will be revised based on actual measurements; the *requirement* of having a documented budget remains, even if the specific numbers change. Performance regressions versus the budget should be a CI-blocking event from the first release.
+
+### 5.4 Editor integration cadence
+
+**Decision: JSON and LSP-compatible output from the very first release of each tool. VS Code is the primary editor target.**
+
+Mechanics:
+
+- Each tool ships with a `--format=json` flag from the first release. The JSON output schema is documented and held stable across patch versions.
+- An **LSP server** is developed alongside the tool implementation, not bolted on after. The LSP wrapper consumes the tool's own JSON output internally — so editor integration and CLI use share a single source of truth for diagnostics, formatting, and test results.
+- A **VS Code extension** is the primary editor surface: linter diagnostics, format-on-save (`m fmt`), test-runner integration via the VS Code Test Explorer API. Vim / Emacs / JetBrains LSP clients work without additional effort once the LSP server exists.
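+
+Because the LSP server consumes the CLI's JSON, the wrapper is largely a field mapping. The sketch assumes a hypothetical finding shape (`file`, 1-based `line` and `column`, optional exclusive `end_column`, `rule`, `severity`, `message`); the output follows the LSP `Diagnostic` structure, whose positions are 0-based and whose severities run from 1 (Error) to 4 (Hint). LSP counts `character` in UTF-16 code units by default, which coincides with byte columns only for ASCII source.
+
+```python
+import json
+from typing import List
+
+# LSP DiagnosticSeverity values.
+LSP_SEVERITY = {"error": 1, "warning": 2, "information": 3, "hint": 4}
+
+
+def to_lsp_diagnostic(finding: dict) -> dict:
+    """Map one (assumed) `--format=json` finding to an LSP Diagnostic."""
+    line = finding["line"] - 1
+    start = finding["column"] - 1
+    # Without an end column, emit a zero-width range at the start position.
+    end = finding.get("end_column", finding["column"]) - 1
+    return {
+        "range": {
+            "start": {"line": line, "character": start},
+            "end": {"line": line, "character": end},
+        },
+        "severity": LSP_SEVERITY[finding["severity"]],
+        "code": finding["rule"],
+        "source": "m lint",
+        "message": finding["message"],
+    }
+
+
+def diagnostics_from_json(text: str) -> List[dict]:
+    """Parse a JSON array of findings into LSP diagnostics."""
+    return [to_lsp_diagnostic(f) for f in json.loads(text)]
+```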
+
+**Rationale.** Adding LSP / IDE support after the fact is materially more expensive than designing for it upfront — it requires retrofitting tool internals to expose structured output and support partial / incremental computation. VS Code is the dominant editor in contemporary developer surveys ([Stack Overflow Developer Survey 2024](https://survey.stackoverflow.co/2024/) places it at the top), and it is the editor most likely to be installed by the M developers who would benefit from this work. Designing for VS Code first incidentally serves all other LSP-aware editors.
+
+### 5.5 Versioning across `m-standard` updates
+
+**Decision: each tool pins to a specific `m-standard` snapshot identified by the generation date of the `m-standard` artefact the tool was built against.**
+
+Mechanics:
+
+- Each tool's release manifest records the `m-standard` generation date (e.g., `m-standard@2025-01-15`).
+- Upgrading a tool to a newer `m-standard` snapshot is a **deliberate operation at release time**, accompanied by regression tests against the VistA corpus to confirm no parsing or rule-evaluation drift.
+- Tools do not float against a moving `m-standard`, because that would compromise reproducibility (an `m fmt` run today must produce the same output as an `m fmt` run last year against the same source).
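+
+Pinning by date keeps validation simple. A sketch, assuming the manifest stores the pin as the literal string `m-standard@YYYY-MM-DD` shown above: parsing rejects floating specifiers such as `latest`, and comparing dates tells a release script whether a pin change is an upgrade that must trigger the VistA regression run. The function names are hypothetical.
+
+```python
+from datetime import date
+
+
+def parse_pin(pin: str) -> date:
+    """Parse 'm-standard@YYYY-MM-DD'; reject other names and floating tags."""
+    name, sep, stamp = pin.partition("@")
+    if name != "m-standard" or not sep:
+        raise ValueError(f"not an m-standard pin: {pin!r}")
+    # date.fromisoformat raises ValueError for tags such as "latest".
+    return date.fromisoformat(stamp)
+
+
+def is_upgrade(current: str, proposed: str) -> bool:
+    """True when the proposed pin is a newer snapshot (regression run required)."""
+    return parse_pin(proposed) > parse_pin(current)
+
+
+def matches_artefact(pin: str, generated_on: date) -> bool:
+    """The pinned date must equal the artefact's generation date."""
+    return parse_pin(pin) == generated_on
+```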
+
+**Rationale.** Pinning by date is simpler than version-range constraints, and `m-standard`'s build is byte-deterministic — so a date is sufficient to identify an exact snapshot. Reproducibility is more important than being on the absolute newest grammar surface; users who need new tokens upgrade explicitly.
+
+---
+
+*End of m-tooling-tier1 document.*
diff --git a/profile/README.md b/profile/README.md
index 146be80..759d420 100644
--- a/profile/README.md
+++ b/profile/README.md
@@ -68,15 +68,17 @@ Manifest pointers for symbol-level lookups:
 
 ### Historical root
 
-| Repo | What it is |
+| Where | What it is |
 |---|---|
-| [`m-tools`](https://github.com/m-dev-tools/m-tools)                     | **Archived** seed of the entire org. The original 2026 gap analysis, Tier 1–4 strategy, and `m <subcommand>` command map that produced everything above. Kept as the historical record; working code has graduated into the sibling repos. Start here for the *why*. |
+| [`.github/docs/history/`](../docs/history/) | Frozen in-org snapshots of the design documents that seeded the entire org — the original 2026 gap analysis, the Tier 1–4 strategy, and the `m <subcommand>` command map that produced everything above. Imported verbatim from the (now archived) [`m-tools`](https://github.com/m-dev-tools/m-tools) repo. Start here for the *why*. |
 
 ## How the pieces connect
 
 ```
                           ┌────────────────────────────────────┐
-                          │   m-tools  (archived seed)         │
+                          │   .github/docs/history/            │
+                          │   (frozen snapshots from the       │
+                          │    archived m-tools seed repo)     │
                           │   • original gap analysis          │
                           │   • Tier 1–4 strategy docs         │
                           └─────────────────┬──────────────────┘
diff --git a/profile/task_index.json b/profile/task_index.json
index 5144d55..cb04f67 100644
--- a/profile/task_index.json
+++ b/profile/task_index.json
@@ -278,18 +278,18 @@
     "history": {
       "why_does_m_dev_tools_exist": {
         "intent": "Read the original gap analysis that justified the org",
-        "primary": "tool:m-tools",
-        "doc": "https://github.com/m-dev-tools/m-tools/blob/main/docs/gap-analysis-and-remediation-strategy.md"
+        "primary": "doc:m-dev-tools#gap-analysis-and-remediation-strategy",
+        "doc": "https://github.com/m-dev-tools/.github/blob/main/docs/history/gap-analysis-and-remediation-strategy.md"
       },
       "go_rust_python_analogues": {
         "intent": "How does m-cli map to go/cargo/poetry?",
-        "primary": "tool:m-tools",
-        "doc": "https://github.com/m-dev-tools/m-tools/blob/main/docs/m-tool-gap-analysis.md"
+        "primary": "doc:m-dev-tools#m-tool-gap-analysis",
+        "doc": "https://github.com/m-dev-tools/.github/blob/main/docs/history/m-tool-gap-analysis.md"
       },
       "tier1_strategy": {
         "intent": "What was the original Tier 1 deliverable scope?",
-        "primary": "tool:m-tools",
-        "doc": "https://github.com/m-dev-tools/m-tools/blob/main/docs/m-tooling-tier1.md"
+        "primary": "doc:m-dev-tools#m-tooling-tier1",
+        "doc": "https://github.com/m-dev-tools/.github/blob/main/docs/history/m-tooling-tier1.md"
       }
     }
   }
diff --git a/profile/tools.json b/profile/tools.json
index f0823eb..202a305 100644
--- a/profile/tools.json
+++ b/profile/tools.json
@@ -18,7 +18,7 @@
       "m-stdlib has architectural priority over m-cli — when both projects need a utility, it lands in m-stdlib first.",
       "Source-level tools (m fmt, m lint) are engine-neutral; runtime tools (m test, m coverage) are YottaDB-targeted.",
       "Every module ships with a vendored conformance corpus tied to the relevant RFC or NIST publication.",
-      "m-cli's CLI ergonomics deliberately mirror Go (`go fmt`/`test`/`vet`) and Rust (`cargo fmt`/`test`/`clippy`) — see m-tools/docs/m-tool-gap-analysis.md for the gap analysis that drove the design."
+      "m-cli's CLI ergonomics deliberately mirror Go (`go fmt`/`test`/`vet`) and Rust (`cargo fmt`/`test`/`clippy`) — see .github/docs/history/m-tool-gap-analysis.md for the gap analysis that drove the design."
     ]
   },
 
@@ -169,18 +169,6 @@
       "extension_info_url": "https://raw.githubusercontent.com/m-dev-tools/m-stdlib-vscode/main/dist/extension-info.json",
       "package_json_url":   "https://raw.githubusercontent.com/m-dev-tools/m-stdlib-vscode/main/package.json",
       "consumes": ["tool:m-stdlib"]
-    },
-
-    "m-tools": {
-      "id": "tool:m-tools",
-      "repo": "https://github.com/m-dev-tools/m-tools",
-      "role": "ARCHIVED — historical seed of the entire org. Original gap analysis + Tier 1–4 strategy + canonical command map.",
-      "language": "markdown",
-      "license": "AGPL-3.0",
-      "agent_instructions": "https://github.com/m-dev-tools/m-tools/blob/main/README.md",
-      "verified_on": "2026-05-10",
-      "status": "archived",
-      "notes": "Read for: why the ecosystem is shaped the way it is. See m-tools/docs/m-tool-gap-analysis.md for the Go/Rust/Python comparison that drove m-cli's CLI ergonomics. preserved_docs: docs/gap-analysis-and-remediation-strategy.md, docs/m-tool-gap-analysis.md, docs/m-tooling-tier1.md, docs/implementation.md."
     }
   },