protege-mcp v0.4.0

Released by github-actions on 02 Jul at 01:49.

Safe, testable LLM-assisted authoring. This release moves the assistant from a "confident editor" to a "safe,
testable editor" by closing the propose → ground → verify → confirm loop and adding a re-runnable
requirements (competency-question) suite. Everything is built by reusing primitives that already ship (the single-undo
transactional apply, the embedded reasoner, Jena ARQ, OWLEntityFinder, and the catalog sidecar pattern).
The tool count grows from 55 to 61.

New tools

  • add_competency_question / list_competency_questions / remove_competency_question / run_competency_questions: a re-runnable requirements suite. A competency question (CQ) pairs an executable SPARQL query with an expected result: nonEmpty (the default), empty, count OP N (where OP is one of >=, <=, ==, >, <), or exactRows. run re-checks every CQ against one shared point-in-time snapshot, so a curation edit that quietly breaks a requirement is caught like a failing unit test. CQs are persisted through a small storage SPI that supports three conventions: robot-sparql-dir (the default; a cqs/ folder of *.rq files with header-comment metadata, for ROBOT/CI interop), sidecar-manifest (a full-fidelity <basename>-cqs.json with a version: 1 contract that preserves unknown keys and is written atomically), and ontology-annotations (CQs stored inside the ontology itself; the fallback when the ontology has not been saved). list detects which convention(s) are in use; add and remove operate in a chosen one (an explicit convention wins, then a single detected convention, then the default); run is convention-agnostic. Malformed input is isolated: a bad .rq file or manifest entry is skipped with a reason and is never fatal. Mandatory caveats (an empty result under the open-world assumption, truncated results or inferences) are always surfaced, never silently dropped.
  • verify_ontology: runs project-defined SPARQL invariants (like ROBOT verify). Each queries[] item is a SELECT or ASK query whose results are violations: any returned row, or an ASK that returns true, flags a violation at the item's severity (error, warn, or info). Graph-producing CONSTRUCT/DESCRIBE queries are not detectors and are rejected (use sparql_query for those). Queries run over a shared snapshot off the EDT, and UPDATE and SERVICE are rejected. Violations are reported as raw SPARQL bindings and are never rendered through the UI thread. The overall gate fails when a violation reaches fail_on (default error). A check that cannot run (a query that errors, an include_inferred invariant with no classified reasoner, or a rejected non-SELECT/ASK form) fails closed: it never silently degrades to the asserted triples and reports a false pass.
  • run_qc_suite: one aggregate quality-control gate. It composes stages (reasoner, profile, and structural by default, plus the opt-in invariants and cqs stages and a reserved shacl stage), evaluates them all against one shared snapshot, and collapses the outcome into a single verdict. A stage whose backing data is absent (no classified reasoner, no invariants, no CQs, no SHACL shapes) is skipped with a reason rather than reported as an error. The gate compares the worst result among the stages that actually ran against fail_on.
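To make the four CQ expectation kinds concrete, here is a minimal sketch of how a result set could be judged. It assumes SELECT rows are modelled as `List<Map<String, String>>` (variable name to value); `Expectation`, `CqJudge`, and the record names are hypothetical and are not the plugin's actual API. exactRows is compared with set semantics, and blank-node handling is deliberately left out.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical expectation model: one type per kind named in the release notes.
interface Expectation {}
record NonEmpty() implements Expectation {}
record Empty() implements Expectation {}
record Count(String op, int n) implements Expectation {}
record ExactRows(List<Map<String, String>> rows) implements Expectation {}

final class CqJudge {
    private CqJudge() {}

    /** True when the SELECT result rows (variable name -> value) satisfy the expectation. */
    static boolean passes(Expectation e, List<Map<String, String>> rows) {
        if (e instanceof NonEmpty) return !rows.isEmpty();
        if (e instanceof Empty) return rows.isEmpty();
        if (e instanceof Count c) return compare(rows.size(), c.op(), c.n());
        if (e instanceof ExactRows x) {
            // Set semantics: row order and duplicate rows do not matter.
            // Blank-node relabelling is NOT handled in this sketch.
            return new HashSet<>(x.rows()).equals(new HashSet<>(rows));
        }
        throw new IllegalArgumentException("unknown expectation: " + e);
    }

    /** Applies "count OP N" with OP one of >=, <=, ==, >, <. */
    static boolean compare(int actual, String op, int n) {
        return switch (op) {
            case ">=" -> actual >= n;
            case "<=" -> actual <= n;
            case "==" -> actual == n;
            case ">" -> actual > n;
            case "<" -> actual < n;
            default -> throw new IllegalArgumentException("unsupported operator: " + op);
        };
    }
}
```

Because every CQ is judged against the same snapshot, a single `passes` call per CQ is enough to produce a suite-wide report.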
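The convention precedence for add/remove (explicit wins, then a single detected convention, then the default) can be sketched as a small selector. The enum, the `ontologySaved` flag, and the choice to fall back to the default when several conventions are detected are assumptions for illustration; the notes only state the three-level precedence and that ontology-annotations is the fallback for an unsaved ontology.

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical names for the three storage conventions.
enum Convention { ROBOT_SPARQL_DIR, SIDECAR_MANIFEST, ONTOLOGY_ANNOTATIONS }

final class ConventionSelector {
    private ConventionSelector() {}

    static Convention select(Optional<Convention> explicit, Set<Convention> detected, boolean ontologySaved) {
        // 1. An explicitly requested convention always wins.
        if (explicit.isPresent()) return explicit.get();
        // 2. Exactly one convention detected: keep using it.
        if (detected.size() == 1) return detected.iterator().next();
        // 3. Default, assumed to apply when zero or several are detected. An unsaved
        //    ontology has no folder for files, so fall back to annotations (assumption).
        return ontologySaved ? Convention.ROBOT_SPARQL_DIR : Convention.ONTOLOGY_ANNOTATIONS;
    }
}
```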
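The verify_ontology gate combines a severity threshold with fail-closed handling. The sketch below is one reading of that rule: a check that could not run fails the gate outright instead of counting as a pass. `Severity`, `CheckOutcome`, and `InvariantGate` are hypothetical names, and executing the SPARQL itself is out of scope here.

```java
import java.util.List;

// Ascending severity, so compareTo() expresses "reaches fail_on".
enum Severity { INFO, WARN, ERROR }

/** Outcome of one invariant: it either ran (with a violation count) or could not run. */
record CheckOutcome(String id, Severity severity, boolean ran, int violations, String error) {
    static CheckOutcome completed(String id, Severity s, int violations) {
        return new CheckOutcome(id, s, true, violations, null);
    }
    static CheckOutcome couldNotRun(String id, Severity s, String error) {
        return new CheckOutcome(id, s, false, 0, error);
    }
}

final class InvariantGate {
    private InvariantGate() {}

    /** Passes only if every check ran and no violation reached failOn. */
    static boolean passes(List<CheckOutcome> outcomes, Severity failOn) {
        for (CheckOutcome o : outcomes) {
            // Fail closed: a check that could not run is never treated as a pass.
            if (!o.ran()) return false;
            if (o.violations() > 0 && o.severity().compareTo(failOn) >= 0) return false;
        }
        return true;
    }
}
```

With fail_on=error, a warn-level invariant that returns rows does not fail the gate, but a check that errored or had no classified reasoner does.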
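run_qc_suite's aggregation can be pictured as "worst stage that actually ran, compared with fail_on", with skipped stages ignored. The sketch also encodes the rule from the third review round: a warn- or info-severity invariant that cannot run surfaces as WARN instead of PASS. The `Level` enum and the `StageResult`/`QcGate` types are hypothetical.

```java
import java.util.List;

// Ascending severity; fail_on is expected to be INFO, WARN, or ERROR.
enum Level { PASS, INFO, WARN, ERROR }

/** A stage outcome; a skipped stage carries a reason and does not affect the verdict. */
record StageResult(String stage, boolean skipped, String skipReason, Level level) {
    static StageResult executed(String stage, Level level) {
        return new StageResult(stage, false, null, level);
    }
    static StageResult skippedBecause(String stage, String reason) {
        return new StageResult(stage, true, reason, Level.PASS);
    }
}

final class QcGate {
    private QcGate() {}

    /** An invariant that could not run never collapses to PASS: ERROR stays ERROR, warn/info become WARN. */
    static Level unrunnableInvariant(Level severity) {
        return severity == Level.ERROR ? Level.ERROR : Level.WARN;
    }

    /** Worst level among the stages that ran; skipped stages are ignored. */
    static Level worstRan(List<StageResult> stages) {
        Level worst = Level.PASS;
        for (StageResult s : stages) {
            if (!s.skipped() && s.level().compareTo(worst) > 0) worst = s.level();
        }
        return worst;
    }

    /** The gate fails when the worst ran stage reaches failOn. */
    static boolean gateFails(List<StageResult> stages, Level failOn) {
        return worstRan(stages).compareTo(failOn) >= 0;
    }
}
```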

Improved

  • apply_changes gains verify=none | report | rollback (reasoner-verified apply). With report or rollback, the batch is applied as one undoable transaction, the reasoner classifies off the UI thread, and the result is checked for a regression caused by this batch: a class that became unsatisfiable (postUnsat \ preUnsat) or an ontology that became inconsistent. report keeps the batch and returns the verdict; rollback additionally reverts the whole batch in a single undo when a regression is attributable to it. The pre-read → apply → classify → post-read → undo sequence runs under a server-level write mutex (MCP handlers are multi-threaded), and if a GUI edit intervenes between apply and re-classification, the call degrades to report semantics rather than undoing blindly. A warm reasoner needs one classification and a cold one needs two; timeout_ms bounds each classification.
  • search_entities is now grounding-aware (the new fields are additive; note the ordering change under Behavior change). Each hit carries a score and a match_kind (exact | prefix | substring | fuzzy). The exact tier considers every rdfs:label language variant and the IRI local name, compared after case, whitespace, and diacritic folding. The result also adds best_match (the IRI the query grounds to, or null) and would_mint (true when a single-term query grounds to nothing, so using it as a create_* name would introduce a NEW entity; a full-IRI, Manchester-syntax, or multi-word query is never flagged). This lets an assistant decide whether to reuse an existing term or mint a new one.
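The regression decision behind verify=report|rollback reduces to a set difference plus a consistency check. In this sketch `ReasonerState`, `VerifyMode`, and `RegressionCheck` are hypothetical, and classes are identified by IRI strings; the mutex, undo, and off-EDT classification described above are not modelled.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical reasoner snapshot: unsatisfiable class IRIs plus the consistency flag. */
record ReasonerState(Set<String> unsatisfiable, boolean consistent) {}

enum VerifyMode { NONE, REPORT, ROLLBACK }

final class RegressionCheck {
    private RegressionCheck() {}

    /** Classes that became unsatisfiable because of the batch: postUnsat \ preUnsat. */
    static Set<String> newlyUnsat(ReasonerState pre, ReasonerState post) {
        Set<String> diff = new HashSet<>(post.unsatisfiable());
        diff.removeAll(pre.unsatisfiable());  // already-unsatisfiable classes are not blamed on the batch
        return diff;
    }

    /** A regression is a newly unsatisfiable class or a consistent ontology turning inconsistent. */
    static boolean regressed(ReasonerState pre, ReasonerState post) {
        return !newlyUnsat(pre, post).isEmpty() || (pre.consistent() && !post.consistent());
    }

    /** Undo only in rollback mode, only for an attributable regression, and never after an intervening GUI edit. */
    static boolean shouldUndo(VerifyMode mode, ReasonerState pre, ReasonerState post, boolean guiEditIntervened) {
        return mode == VerifyMode.ROLLBACK && !guiEditIntervened && regressed(pre, post);
    }
}
```

Subtracting preUnsat is what keeps a batch from being rolled back for problems that existed before it was applied.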
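The folding and match tiers can be illustrated with a small classifier. This is a sketch, not the plugin's implementation: `Grounding` and its methods are hypothetical, the fuzzy tier is assumed to be Levenshtein distance of at most 2 (the notes do not name a metric), and the would_mint check approximates "full-IRI / Manchester / multi-word" with simple syntactic tests.

```java
import java.text.Normalizer;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

enum MatchKind { EXACT, PREFIX, SUBSTRING, FUZZY }  // best tier first

final class Grounding {
    private Grounding() {}

    /** Case-, whitespace- and diacritic-folded form used for comparison. */
    static String fold(String s) {
        String noMarks = Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        return noMarks.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    /** Best tier of the query against an entity's names (every label variant plus the IRI local name). */
    static Optional<MatchKind> classify(String query, List<String> names) {
        String q = fold(query);
        MatchKind best = null;
        for (String name : names) {
            String n = fold(name);
            MatchKind k = n.equals(q) ? MatchKind.EXACT
                    : n.startsWith(q) ? MatchKind.PREFIX
                    : n.contains(q) ? MatchKind.SUBSTRING
                    : editDistance(n, q) <= 2 ? MatchKind.FUZZY  // assumed fuzzy threshold
                    : null;
            if (k != null && (best == null || k.ordinal() < best.ordinal())) best = k;
        }
        return Optional.ofNullable(best);
    }

    /** Approximate would_mint: a single-term query that grounds to nothing. */
    static boolean wouldMint(String query, Optional<String> bestMatch) {
        String q = query.trim();
        boolean singleTerm = !q.isEmpty()
                && q.chars().noneMatch(Character::isWhitespace)      // multi-word / most Manchester
                && !q.contains("://") && !q.startsWith("<")           // full IRI
                && q.chars().noneMatch(c -> c == '(' || c == ')');  // parenthesised expression
        return singleTerm && bestMatch.isEmpty();
    }

    /** Classic two-row Levenshtein distance. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }
}
```

For example, "cafe" grounds EXACT to a class labelled "Café", while a new single word with no match would be reported as a mint candidate.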

Behavior change

  • search_entities results are now RANKED, not just display-sorted: the top-level items[] are ordered by score (exact → prefix → substring → fuzzy), then by display name, then by IRI (a stable tiebreak so the finder's Set iteration order cannot leak into results). Clients that relied on the previous purely alphabetical order should sort explicitly. The count/items/truncated shape is otherwise unchanged, and the entityList ordering of every other tool is untouched.
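The new ordering is a three-key comparator. This sketch assumes the score is fully determined by the match tier and compares display names case-sensitively; `Hit` and `Ranking` are hypothetical types. The final IRI key is what makes the output independent of the iteration order of the input set.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

enum MatchKind { EXACT, PREFIX, SUBSTRING, FUZZY }  // declaration order = rank order

record Hit(String iri, String display, MatchKind kind) {}

final class Ranking {
    private Ranking() {}

    /** Tier first, then display name, then IRI as a stable tiebreak. */
    static final Comparator<Hit> ORDER = Comparator
            .comparing(Hit::kind)
            .thenComparing(Hit::display)
            .thenComparing(Hit::iri);

    /** Copies hits out of any (possibly unordered) collection and sorts them deterministically. */
    static List<Hit> rank(Iterable<Hit> unordered) {
        List<Hit> out = new ArrayList<>();
        unordered.forEach(out::add);
        out.sort(ORDER);
        return out;
    }
}
```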

Notes

  • New method-level tests cover every core component: F1 (regression decision), F2 (ranking and mint prediction, including multi-language-label and diacritic cases), F3 (expectation judging, exactRows set/bnode handling, each store's load/upsert/remove round-trip including malformed-entry skipping, and selection precedence), F4 (violation detection and the fail-closed gate), and F5 (stage aggregation and the no-reasoner skip). The tool wrappers are also tested (verify_ontology and the four competency-question tools: schema, argument parsing, store selection/aggregation, and the run/remove branches, driven end to end over a headless OntologyAccess), plus a headless CQ add → run → remove pipeline. Three adversarial review rounds were folded in before release: a first round with eight findings; a second round that hardened verify_ontology (an include_inferred invariant with no reasoner now fails closed instead of silently degrading to the asserted triples, and a CONSTRUCT/DESCRIBE invariant is now rejected, so only SELECT/ASK are accepted); and a third round that fixed run_qc_suite's aggregation (a warn- or info-severity invariant that cannot run now surfaces as WARN, so fail_on=warn trips it, instead of being swallowed as PASS, and the cqs stage now surfaces per-CQ degradation caveats). The test count grew from 1720 to 2036.
  • The default robot-sparql-dir convention needs no new serialization dependency (plain .rq files with header comments); sidecar-manifest uses JSON via jackson-databind, which is already a direct dependency. A Java 17+ JVM is still required.

Install: download protege-mcp-0.4.0.jar below, or use Protégé ▸ File ▸ Check for plugins.