Safe, testable LLM-assisted authoring. Move the assistant from a "confident editor" to a "safe,
testable editor" by closing the propose → ground → verify → confirm loop and adding a re-runnable
requirements (competency-question) suite — all built by reusing shipping primitives (the single-undo
transactional apply, the embedded reasoner, Jena ARQ, OWLEntityFinder, the catalog sidecar pattern).
55 → 61 tools.
New tools
- add_competency_question / list_competency_questions / remove_competency_question / run_competency_questions: a re-runnable requirements suite. A competency question (CQ) pairs an executable SPARQL query with an expected result: nonEmpty (default), empty, count OP N (OP ∈ >=, <=, ==, >, <), or exactRows. The run tool re-checks them all against one shared point-in-time snapshot, so a curation edit that quietly breaks a requirement is caught like a failing unit test. CQs are stored via a small storage SPI with three conventions: robot-sparql-dir (the default: a cqs/ folder of *.rq files with header-comment metadata, for ROBOT/CI interop), sidecar-manifest (a full-fidelity <basename>-cqs.json with a version: 1 contract, unknown-key-preserving, written atomically), and ontology-annotations (CQs stored inside the artifact; the fallback when the ontology is unsaved). The list tool detects the convention(s), add and remove operate in a chosen one (explicit convention wins > single detected > default), and run is convention-agnostic. Malformed input is isolated (a bad .rq file or manifest entry is skipped with a reason, never fatal); mandatory caveats (open-world empty, truncated results/inferences) are surfaced, never silent.
- verify_ontology: runs project-defined SPARQL invariants (like ROBOT verify). Each queries[] item is a SELECT or ASK whose results are violations (a returned row or an ASK true flags it, at the item's error/warn/info severity). A graph-producing CONSTRUCT/DESCRIBE is not a detector and is rejected (use sparql_query for those). Checks run over a shared off-EDT snapshot (UPDATE/SERVICE rejected); violations are reported as raw SPARQL bindings, never rendered through the UI thread. The overall gate fails when a violation reaches fail_on (default error). A check that cannot run (a query that errors, an include_inferred invariant with no classified reasoner, or a rejected non-SELECT/ASK form) fails closed: it never silently degrades to the asserted triples and reports a false pass.
- run_qc_suite: one aggregate quality-control gate. Composable stages (default reasoner + profile + structural), plus opt-in invariants, cqs, and a reserved shacl, are all evaluated against one shared snapshot and collapsed to a single verdict. A stage whose backing data is absent (no classified reasoner, no invariants, no CQs, no SHACL) is skipped with a reason, never an error; the gate compares the worst stage that actually ran against fail_on.
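The gate rule shared by verify_ontology and run_qc_suite (worst severity among the stages that ran, compared against fail_on, with absent stages skipped rather than failed) can be sketched as follows. This is a minimal illustration, not the plugin's code: StageResult, gate, the info < warn < error ordering, and the PASS/FAIL return values are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed severity ordering for this sketch: info < warn < error.
SEVERITY_RANK = {"info": 0, "warn": 1, "error": 2}


@dataclass
class StageResult:
    """Hypothetical per-stage outcome; not the plugin's real type."""
    name: str
    worst: Optional[str] = None           # worst severity reported, None if clean
    skipped_reason: Optional[str] = None  # set when the stage had no backing data


def gate(stages, fail_on="error"):
    """Collapse stage results to one verdict plus the skipped stages."""
    threshold = SEVERITY_RANK[fail_on]
    # Skipped stages never count toward the verdict; they are only reported.
    ran = [s for s in stages if s.skipped_reason is None]
    skipped = {s.name: s.skipped_reason for s in stages if s.skipped_reason is not None}
    failed = any(
        s.worst is not None and SEVERITY_RANK[s.worst] >= threshold for s in ran
    )
    return ("FAIL" if failed else "PASS", skipped)


stages = [
    StageResult("reasoner"),
    StageResult("profile", worst="warn"),
    StageResult("shacl", skipped_reason="no SHACL shapes"),
]
print(gate(stages))                  # default fail_on="error"
print(gate(stages, fail_on="warn"))  # stricter threshold
```

With the default threshold the warn-level finding does not trip the gate, while fail_on="warn" does; the skipped shacl stage is reported with its reason in both cases.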
Improved
- apply_changes gains verify=none | report | rollback (reasoner-verified apply). With report or rollback, the batch is applied as one undoable transaction, the reasoner is classified off the UI thread, and the result is checked for a regression caused by this batch: a class that became unsatisfiable (postUnsat \ preUnsat) or an ontology that became inconsistent. report keeps the batch and returns the verdict; rollback additionally reverts the whole batch in one undo when a regression is attributable. The pre-read → apply → classify → post-read → undo sequence runs under a server-level write mutex (MCP handlers are multi-threaded), and an intervening GUI edit between apply and re-classification degrades to report semantics rather than blind-undoing. Warm reasoner = 1 classification, cold = 2; a timeout_ms bounds each.
- search_entities is now grounding-aware (additive fields; note the ordering change below). Each hit carries a score and a match_kind (exact | prefix | substring | fuzzy; the exact tier considers every rdfs:label language variant and the IRI local name, case/whitespace/diacritic-folded), and the result adds best_match (the IRI the query grounds to, or null) and would_mint (true when a single-term query grounds to nothing, so using it as a create_* name would introduce a NEW entity; a full-IRI / Manchester / multi-word query is never flagged). This lets an assistant decide whether to reuse a term or mint one.
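The regression decision behind verify=report and verify=rollback reduces to a set difference plus a consistency flip. The sketch below illustrates that rule only; judge_batch, its parameters, and the returned keys are hypothetical names chosen for the example, and the real tool's response shape may differ.

```python
def judge_batch(pre_unsat, post_unsat, pre_consistent, post_consistent,
                mode="report", gui_edit_intervened=False):
    """Decide whether a batch caused a regression (illustrative names only)."""
    if mode not in ("report", "rollback"):
        raise ValueError("verification only runs for mode 'report' or 'rollback'")

    # Classes unsatisfiable after the batch but not before: postUnsat \ preUnsat.
    newly_unsat = set(post_unsat) - set(pre_unsat)
    # Only a consistent -> inconsistent flip is attributed to this batch.
    became_inconsistent = pre_consistent and not post_consistent
    regression = bool(newly_unsat) or became_inconsistent

    # An intervening GUI edit degrades rollback to report semantics,
    # so the batch is never blind-undone.
    undo_batch = regression and mode == "rollback" and not gui_edit_intervened

    return {
        "regression": regression,
        "newly_unsatisfiable": sorted(newly_unsat),
        "became_inconsistent": became_inconsistent,
        "undo_batch": undo_batch,
    }


result = judge_batch({"ex:A"}, {"ex:A", "ex:B"}, True, True, mode="rollback")
print(result)
```

Classes that were already unsatisfiable before the batch (ex:A here) do not count against it; only ex:B is attributed to the edit.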
Behavior change
search_entities results are now RANKED, not just display-sorted: the top-level items[] are ordered by score (exact → prefix → substring → fuzzy), then display, then IRI (a stable tiebreak so the finder's Set order can't leak). Clients that relied on the previous purely alphabetical order should sort explicitly. The count / items / truncated shape is otherwise unchanged, and every other tool's entityList ordering is untouched.
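For clients affected by the ordering change, the sketch below shows the new ranking next to an explicit alphabetical re-sort. It is an illustration under assumptions: the hit fields iri, display, and match_kind follow the names used above, but the numeric score scale is not stated here, so the match tier stands in for it, and the sample hits are made up.

```python
# Tier order standing in for the score: exact, prefix, substring, fuzzy.
TIER = {"exact": 0, "prefix": 1, "substring": 2, "fuzzy": 3}


def rank(hits):
    """Order hits by match tier, then display text, then IRI (stable tiebreak)."""
    return sorted(hits, key=lambda h: (TIER[h["match_kind"]], h["display"], h["iri"]))


def alphabetical(hits):
    """Client-side re-sort for callers that want a display-alphabetical order."""
    return sorted(hits, key=lambda h: (h["display"].casefold(), h["iri"]))


# Made-up hits for a query such as "heart".
hits = [
    {"iri": "http://example.org/b", "display": "Heart valve", "match_kind": "prefix"},
    {"iri": "http://example.org/a", "display": "Heart", "match_kind": "exact"},
    {"iri": "http://example.org/c", "display": "Artery of heart", "match_kind": "substring"},
]

print([h["display"] for h in rank(hits)])
print([h["display"] for h in alphabetical(hits)])
```

Including the IRI as the last sort key is what makes the order deterministic when two entities share a tier and a display name.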
Notes
- New method-level tests for every core (F1 regression decision; F2 ranking + mint prediction incl. multi-language-label and diacritic cases; F3 expectation judging + exactRows set/bnode handling + each store's load/upsert/remove round-trip incl. malformed-skip + selection precedence; F4 violation detection + fail-closed gate; F5 stage aggregation + no-reasoner skip) and for the tool wrappers (verify_ontology and the four competency-question tools: schema, arg parsing, store selection/aggregation, and the run/remove branches, driven end-to-end over a headless OntologyAccess), plus a headless CQ add → run → remove pipeline. Three adversarial review rounds were folded in before release: an eight-finding first round; a second round that hardened verify_ontology (an include_inferred invariant with no reasoner now fails closed instead of silently degrading to the asserted triples, and a CONSTRUCT/DESCRIBE invariant is now rejected; SELECT/ASK only); and a third round that fixed run_qc_suite's aggregation (a warn/info-severity invariant that cannot run now surfaces as WARN, so fail_on=warn trips it, instead of being swallowed to PASS, and the cqs stage now surfaces per-CQ degradation caveats). Test count 1720 → 2036.
- The default robot-sparql-dir needs no new serialization dependency (plain .rq + header comments); sidecar-manifest uses JSON (jackson-databind, already a direct dependency). Requires a Java 17+ JVM (unchanged).
Install: download protege-mcp-0.4.0.jar below, or use Protégé ▸ File ▸ Check for plugins.