Releases: nuri428/ontorag
v1.1.1 — stdio MCP · causal explainability · reasoning goldset
ontorag v1.1.1
Three lightweight, high-impact additions on top of v1.0. The GNN / learning layer stays deferred to a later release; v1.1 instead sharpens the existing reasoning stack and broadens integration reach, without adding any heavy dependency to the core.
What's new
1. Standalone stdio MCP server (ontorag-mcp)
The HTTP `/mcp` endpoint (fastapi-mcp) needs a running FastAPI server. v1.1 adds a client-spawned stdio entrypoint so the ontology tools wire into any MCP client with a single config line and no server:

{ "command": "ontorag-mcp", "env": { "GRAPH_STORE": "fuseki" } }

- Built on the official MCP Python SDK (`stdio_server`); handlers call the existing `create_store()` + `GraphStore` protocol + Bayesian/Causal engines directly. `GRAPH_STORE` selects the backend (Fuseki / Neo4j / FalkorDB all work).
- Exposes the high-value read tools + `compute_posterior` / `do_query`; raw SPARQL stays excluded (same policy as HTTP `/mcp`).
- Ships as the `[mcp]` extra (`mcp>=1.0`) + an `ontorag-mcp` console script.
- Verified end-to-end: a real MCP client spawns `ontorag-mcp`, `initialize` → `tools/list` (10) → `call_tool` `get_schema` (6 classes / 20 props) + `count_entities` (Pokemon=13).
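To make the `initialize` → `tools/list` → `call_tool` sequence concrete, here is a minimal, hand-rolled sketch of the stdio wire protocol: newline-delimited JSON-RPC 2.0 on stdin/stdout. It is not ontorag's implementation (which uses the SDK's `stdio_server`); the in-memory `TOY_COUNTS` store, the `class_name` argument, and the server name are hypothetical. Note that no raw-SPARQL tool is registered, so asking for one yields an "unknown tool" error.

```python
from __future__ import annotations

import json
import sys
from typing import Any, Callable

# Toy in-memory stand-in for the graph backend (hypothetical). The real
# ontorag-mcp handlers call create_store() and the GraphStore protocol.
TOY_COUNTS: dict[str, int] = {"Pokemon": 13}


def get_schema(args: dict[str, Any]) -> str:
    return json.dumps({"classes": sorted(TOY_COUNTS)})


def count_entities(args: dict[str, Any]) -> str:
    # "class_name" is a hypothetical argument name for this sketch.
    return str(TOY_COUNTS.get(args.get("class_name", ""), 0))


# Only typed read tools are registered; there is deliberately no raw-SPARQL tool.
TOOLS: dict[str, Callable[[dict[str, Any]], str]] = {
    "get_schema": get_schema,
    "count_entities": count_entities,
}


def _error(msg_id: Any, code: int, message: str) -> dict[str, Any]:
    return {"jsonrpc": "2.0", "id": msg_id, "error": {"code": code, "message": message}}


def handle(msg: dict[str, Any]) -> dict[str, Any] | None:
    """Dispatch one JSON-RPC 2.0 message; notifications (no "id") get no reply."""
    if "id" not in msg:
        return None
    method, params = msg.get("method"), msg.get("params") or {}
    if method == "initialize":
        # Simplification: echo the client's protocol version instead of negotiating.
        result: dict[str, Any] = {
            "protocolVersion": params.get("protocolVersion", "2024-11-05"),
            "capabilities": {"tools": {}},
            "serverInfo": {"name": "toy-ontology-mcp", "version": "0.0.1"},
        }
    elif method == "tools/list":
        result = {"tools": [{"name": n, "inputSchema": {"type": "object"}} for n in TOOLS]}
    elif method == "tools/call":
        tool = TOOLS.get(params.get("name", ""))
        if tool is None:
            return _error(msg["id"], -32602, "unknown tool")
        result = {"content": [{"type": "text", "text": tool(params.get("arguments") or {})}]}
    else:
        return _error(msg["id"], -32601, f"method not found: {method}")
    return {"jsonrpc": "2.0", "id": msg["id"], "result": result}


def serve() -> None:
    # stdio transport: one newline-delimited JSON-RPC message per line.
    for line in sys.stdin:
        if line.strip():
            reply = handle(json.loads(line))
            if reply is not None:
                sys.stdout.write(json.dumps(reply) + "\n")
                sys.stdout.flush()
```

In a packaged server, `serve()` would be the target of a console-script entry point, which is what lets the client config above launch the process by command name alone.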
2. Causal answer explainability
`do_query` now carries its own justification, turning a bare number into an auditable answer (the vector-RAG differentiator made visible):

- `CausalEngine.explain_do()` returns the interventional distribution, the back-door adjustment set the graph surgery used, and a one-line "why do ≠ see" summary.
- Surfaced through the MCP tool, the REST route (`DoQueryResponse` gains optional `adjustment` + `explanation`), and the Reasoning WebUI (a "why:" trace under the `do()` bars).
- The existing `do_query` signature is unchanged, so no caller breaks.
- For the smoking example it reports the back-door set `{Genotype}`; when there is no confounder it says "do equals see".
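The difference between "see" and "do" can be shown with a small, self-contained sketch over a three-node DAG (Genotype → Smoking, Genotype → Cancer, Smoking → Cancer). The CPT values are hypothetical toy numbers, not ontorag's smoking model, and `explain_do` here is an illustrative stand-in that only mirrors the shape of the real `CausalEngine.explain_do()` output: conditioning reweights Genotype by P(g | s), while back-door adjustment uses the unconditioned P(g).

```python
from __future__ import annotations

import math

# Hypothetical toy CPTs; these are NOT the numbers in ontorag's smoking example.
P_G1 = 0.3                                   # P(Genotype=1)
P_S1 = {1: 0.8, 0: 0.2}                      # P(Smoking=1 | Genotype=g)
P_C1 = {(1, 1): 0.9, (1, 0): 0.5,            # P(Cancer=1 | Smoking=s, Genotype=g)
        (0, 1): 0.4, (0, 0): 0.1}

Cpt = dict[tuple[int, int], float]


def p_g(g: int, p_g1: float) -> float:
    return p_g1 if g == 1 else 1.0 - p_g1


def see(s: int, p_g1: float, p_s1: dict[int, float], p_c1: Cpt) -> float:
    """P(Cancer=1 | Smoking=s): condition on an observation (Bayes over Genotype)."""
    w = {g: p_g(g, p_g1) * (p_s1[g] if s == 1 else 1.0 - p_s1[g]) for g in (0, 1)}
    z = sum(w.values())
    return sum(p_c1[(s, g)] * w[g] / z for g in (0, 1))


def do(s: int, p_g1: float, p_c1: Cpt) -> float:
    """P(Cancer=1 | do(Smoking=s)) by back-door adjustment over {Genotype}."""
    return sum(p_c1[(s, g)] * p_g(g, p_g1) for g in (0, 1))


def explain_do(s: int, p_g1: float, p_s1: dict[int, float], p_c1: Cpt) -> dict:
    observed, interventional = see(s, p_g1, p_s1, p_c1), do(s, p_g1, p_c1)
    if math.isclose(observed, interventional, abs_tol=1e-9):
        why = "do equals see: Genotype does not shift Smoking, so nothing confounds the observation"
    else:
        why = (f"do != see: Genotype opens the back-door path Smoking <- Genotype -> Cancer; "
               f"adjusting for it gives {interventional:.3f} instead of the observed {observed:.3f}")
    # The DAG is fixed in this sketch, so the adjustment set is always {Genotype}.
    return {"see": observed, "do": interventional, "adjustment": {"Genotype"}, "explanation": why}
```

With these toy numbers, `explain_do(1, P_G1, P_S1, P_C1)` gives see ≈ 0.75 against do = 0.62. If `P_S1` no longer depends on Genotype (for example `{1: 0.5, 0: 0.5}`), the two coincide and the explanation reads "do equals see".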
3. Reasoning-layer goldset + ontorag eval reasoning
Fills the gap flagged in `docs/BENCHMARK_v1.md` ("reasoning layers have no goldset"). The SPARQL-centric goldset can't express probability queries, so this adds a parallel reasoning goldset + a thin runner:

- `eval/reasoning_goldset.py`: `ReasoningQuestion` / `ReasoningGoldset` (kinds: `posterior` / `do` / `counterfactual` / `identify`) + `evaluate()` checks each against expected values within tolerance, reusing `BayesianEngine` / `CausalEngine`. Backend-agnostic.
- `ontorag eval reasoning <goldset>`: loads the stored BN (+ causal DAG) from the active backend and reports pass/fail per question.
- `examples/smoking/reasoning_goldset.jsonl`: 6 hand-verified checks: P(Cancer|see)=0.72, P(Cancer|do)=0.60 / 0.20, counterfactual=0.28, back-door `{Genotype}`, marginal P(Cancer)=0.43.
- The runner immediately earned its keep: it caught a wrong prior in the goldset itself (P(Cancer) is 0.43, not 0.5), now fixed to the engine-verified value.
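A tolerance-based goldset check of this kind can be sketched in a few lines. The JSONL field names (`qid`, `kind`, `expected`, `tol`) and the stubbed `ENGINE` answers are hypothetical; the real runner computes answers with `BayesianEngine` / `CausalEngine` against the active backend. The stub simply returns the values quoted above, which is enough to show how a wrong expected prior of 0.5 fails against an engine value of 0.43.

```python
from __future__ import annotations

import json
import math
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass(frozen=True)
class ReasoningQuestion:  # hypothetical field layout, not ontorag's schema
    qid: str
    kind: str                        # "posterior" | "do" | "counterfactual" | "identify"
    expected: float | list[str]
    tol: float = 0.01


def load_goldset(lines: Iterable[str]) -> list[ReasoningQuestion]:
    return [ReasoningQuestion(**json.loads(line)) for line in lines if line.strip()]


def evaluate(questions: list[ReasoningQuestion],
             answer: Callable[[ReasoningQuestion], Any]) -> dict[str, bool]:
    """Return {qid: passed}; sets are compared exactly, numbers within tolerance."""
    results = {}
    for q in questions:
        got = answer(q)
        if q.kind == "identify":
            results[q.qid] = set(got) == set(q.expected)
        else:
            results[q.qid] = math.isclose(got, q.expected, abs_tol=q.tol)
    return results


GOLDSET = """\
{"qid": "see", "kind": "posterior", "expected": 0.72}
{"qid": "do_smoke", "kind": "do", "expected": 0.60}
{"qid": "do_nosmoke", "kind": "do", "expected": 0.20}
{"qid": "cf", "kind": "counterfactual", "expected": 0.28}
{"qid": "backdoor", "kind": "identify", "expected": ["Genotype"]}
{"qid": "marginal", "kind": "posterior", "expected": 0.43}
"""

# Stubbed engine answers mirroring the values quoted in these notes.
ENGINE = {"see": 0.72, "do_smoke": 0.60, "do_nosmoke": 0.20, "cf": 0.28,
          "backdoor": ["Genotype"], "marginal": 0.43}
```

Running `evaluate(load_goldset(GOLDSET.splitlines()), lambda q: ENGINE[q.qid])` passes all six; swapping the marginal's `expected` to 0.5 makes that single question fail, which is the kind of goldset error the runner surfaced.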
Notes
- Suite: 914 unit tests pass; all three features are tested. No change to the v1.0 3-backend parity numbers (`docs/BENCHMARK_v1.md`).
- Versioning: these are feature additions, which would normally mean a minor bump; the `1.1.0 → 1.1.1` tag was chosen per request.
- License: MIT (ontorag itself). Backend licenses vary; FalkorDB is RSAL, not OSI-approved.
See the README for the quickstart, full CLI reference, and the stdio MCP config guide.
한국어 (Korean)
ontorag v1.1.1
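Three additions on top of v1.0: lightweight, but with real impact. The GNN (learning layer) remains deferred to the next release; v1.1 focuses on sharpening the existing reasoning stack and widening integration reach without adding heavy dependencies to the core.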
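1. Standalone stdio MCP server (ontorag-mcp)
The HTTP `/mcp` endpoint (fastapi-mcp) only works while a FastAPI server is running. v1.1 adds a stdio entrypoint that the client launches itself, so the ontology tools can be connected to any MCP client with a single config line and no server:

{ "command": "ontorag-mcp", "env": { "GRAPH_STORE": "fuseki" } }

- Based on the official MCP Python SDK (`stdio_server`). Handlers call the existing `create_store()` + `GraphStore` protocol + Bayesian/causal engines directly. `GRAPH_STORE` selects the backend (Fuseki / Neo4j / FalkorDB all work).
- Exposes the core read tools + `compute_posterior` / `do_query`. Raw SPARQL is excluded (same policy as HTTP `/mcp`).
- Provided as the `[mcp]` extra (`mcp>=1.0`) + the `ontorag-mcp` console script.
- End-to-end verification complete: a real MCP client spawns `ontorag-mcp` → `initialize` → `tools/list` (10 tools) → `call_tool` `get_schema` (6 classes / 20 properties) + `count_entities` (Pokemon=13).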
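2. Causal answer explainability
`do_query` now returns its own justification alongside the result, turning a bare number into an auditable answer and making the differentiator over vector RAG visible:

- `CausalEngine.explain_do()` returns the interventional distribution together with the back-door adjustment set used in the graph surgery and a one-line explanation of "why do ≠ see".
- Exposed in the MCP tool, the REST route (`DoQueryResponse` gains optional `adjustment` + `explanation`), and the Reasoning WebUI (a "why:" trace under the do() bars).
- The existing `do_query` signature is unchanged, so no call sites break.
- In the worked example it reports the back-door set `{Genotype}`, and when there is no confounder it says "do equals see".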
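3. Reasoning-layer goldset + ontorag eval reasoning
This fills the gap pointed out in `docs/BENCHMARK_v1.md` ("the reasoning layers have no goldset"). The SPARQL-centric goldset cannot express probability queries, so a parallel reasoning goldset and a thin runner were added:

- `eval/reasoning_goldset.py`: `ReasoningQuestion` / `ReasoningGoldset` (kinds: `posterior` / `do` / `counterfactual` / `identify`) + `evaluate()` compares each item against its expected value within tolerance. Reuses `BayesianEngine` / `CausalEngine`; backend-agnostic.
- `ontorag eval reasoning <goldset>`: loads the stored BN (+ causal DAG) from the active backend and reports pass/fail per question.
- `examples/smoking/reasoning_goldset.jsonl`: 6 hand-verified checks: P(Cancer|see)=0.72, P(Cancer|do)=0.60 / 0.20, counterfactual=0.28, back-door `{Genotype}`, marginal P(Cancer)=0.43.
- The runner proved its worth right away: it caught a wrong prior in the goldset itself (P(Cancer) is 0.43, not 0.5), which was corrected to the engine-verified value.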
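Notes
- Tests: 914 unit tests pass, and all three features are tested. No change to the v1.0 3-backend parity numbers (`docs/BENCHMARK_v1.md`).
- Versioning: as feature additions these would normally warrant a minor bump, but per request they were tagged `1.1.0 → 1.1.1`.
- License: ontorag itself is MIT. Backend licenses differ: FalkorDB is RSAL and is not OSI-approved open source.

See the README for the quickstart, the full CLI reference, and the stdio MCP configuration guide.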
v1.0.0 — Production-Ready & Proven
ontorag v1.0.0 — "Production-Ready & Proven"
The 0.x → 1.0 maturity milestone. It is not a new paradigm (the GNN/learning layer is deferred to v1.1+) but a trust milestone: safe to run, with the numbers to back it up.
What ontorag is
An OWL-native, ontology-aware RAG framework. The RDF/OWL graph is the source of truth; an LLM agent navigates it through typed MCP tools instead of approximate vector search, with probabilistic and causal reasoning layered on top.
Highlights of v1.0
Production hardening
- Configurable query/LLM timeouts on every backend: Neo4j (`NEO4J_QUERY_TIMEOUT`), FalkorDB (`FALKORDB_QUERY_TIMEOUT`), Fuseki (`FUSEKI_TIMEOUT`), LLM (`LLM_TIMEOUT`). Closes the "a hung query blocks the worker" gap; defaults preserve prior behavior.
- Global structured-500 exception handler: unexpected errors return `{detail, type}` JSON and are logged server-side, with no traceback leak.
- CI gate: ruff + the full unit suite (910 tests) run on every push/PR, plus a Neo4j+FalkorDB integration job. Previously only the eval module was tested in CI.
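The structured-500 behavior amounts to "log the traceback, return only a type name". ontorag does this with a FastAPI exception handler; the sketch below is a framework-neutral analog written as stdlib WSGI middleware so it runs without extra dependencies. The `"Internal server error"` detail text and the `toy_api` logger name are assumptions for illustration.

```python
from __future__ import annotations

import json
import logging
import sys
from typing import Any, Callable, Iterable

logger = logging.getLogger("toy_api")

WSGIApp = Callable[[dict, Callable[..., Any]], Iterable[bytes]]


def structured_500(app: WSGIApp) -> WSGIApp:
    """Turn unexpected exceptions into a {detail, type} JSON 500 response."""

    def wrapped(environ: dict, start_response: Callable[..., Any]) -> Iterable[bytes]:
        try:
            # Only errors raised before the body iterable is returned are caught here.
            return app(environ, start_response)
        except Exception as exc:
            # Full traceback goes to the server log, never to the client.
            logger.exception("unhandled error on %s", environ.get("PATH_INFO", "?"))
            body = json.dumps({"detail": "Internal server error",
                               "type": type(exc).__name__}).encode("utf-8")
            start_response("500 Internal Server Error",
                           [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))],
                           sys.exc_info())
            return [body]

    return wrapped
```

Returning only the exception class name gives clients something stable to branch on, while messages that might contain connection strings or query text stay in the server log.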
Proof (`docs/BENCHMARK_v1.md`), key-free and reproducible:

- Goldset quality: 5 domains / 130 questions, 0 `gold_sparql` failures.
- 3-backend deterministic parity: 7/7 protocol metrics identical across Fuseki / Neo4j / FalkorDB (`full_parity=True`); schema, subclass-inferred counts, aggregation, and traversal all match.
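A parity verdict like `full_parity=True` reduces to comparing each deterministic metric across backends. The sketch below is a hypothetical reconstruction of that comparison, not the benchmark's code; the metric names and values in the usage example are toy data.

```python
from __future__ import annotations

from typing import Any


def parity_report(metrics_by_backend: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Compare the same deterministic metrics across backends."""
    names = sorted(set().union(*(m.keys() for m in metrics_by_backend.values())))
    per_metric: dict[str, bool] = {}
    for name in names:
        # A metric missing on any backend shows up as None and breaks parity.
        values = [m.get(name) for m in metrics_by_backend.values()]
        per_metric[name] = all(v == values[0] for v in values)
    return {
        "matched": sum(per_metric.values()),
        "total": len(names),
        "full_parity": bool(names) and all(per_metric.values()),
        "per_metric": per_metric,
    }
```

For example, feeding identical toy dicts for `fuseki`, `neo4j`, and `falkordb` yields `full_parity=True`; changing one inferred count on a single backend flips it to `False` and pinpoints the metric in `per_metric`.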
Cumulative capabilities (0.1 → 1.0)
- 3 graph backends: Fuseki (Apache 2.0), Neo4j + n10s (GPL/AGPL), FalkorDB (RSAL). Same CLI/MCP tools, full parity, swap via `GRAPH_STORE`.
- 4-layer reasoning stack: Logical (RDFS+ / `subClassOf*` / `TransitiveProperty` / `inverseOf`), Probabilistic (Bayesian: `compute_posterior`, `mpe`), Causal (Pearl rungs 2–3: `do_query`, `identify_effect`, `counterfactual`). The smoking example shows P(Cancer|see)=0.72 ≠ P(Cancer|do)=0.60.
- LLMs4OL: text/CSV/JSON → RDF triples (`ontorag learn`), with a SHACL validation gate.
- Agentic MCP: 18 typed tools over SSE; raw SPARQL is never exposed to the LLM.
- Web UI: Schema, Data, Playground, and Reasoning tabs (`/ui`), all Playwright-verified.
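The "subclass-inferred counts" in the logical layer come from following `rdfs:subClassOf*` (zero or more hops), which in SPARQL 1.1 is the property path `?x rdf:type/rdfs:subClassOf* :Pokemon`. The sketch below does the same closure in plain Python over hypothetical toy triples, to show why an instance typed only as `FirePokemon` still counts as a `Pokemon`.

```python
from __future__ import annotations

from collections import defaultdict, deque


def subclasses_closure(subclass_of: list[tuple[str, str]], root: str) -> set[str]:
    """All classes C with C rdfs:subClassOf* root (reflexive and transitive)."""
    children: dict[str, list[str]] = defaultdict(list)
    for child, parent in subclass_of:
        children[parent].append(child)
    seen, queue = {root}, deque([root])
    while queue:  # breadth-first walk down the hierarchy; `seen` guards cycles
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def count_instances(types: list[tuple[str, str]], subclass_of: list[tuple[str, str]], cls: str) -> int:
    """Count distinct individuals typed as cls or any of its subclasses."""
    closure = subclasses_closure(subclass_of, cls)
    return len({ind for ind, t in types if t in closure})


# Hypothetical toy ontology: (subclass, superclass) and (individual, class) pairs.
SUB = [("Pokemon", "Creature"), ("FirePokemon", "Pokemon"), ("Person", "Agent")]
TYPES = [("charmander", "FirePokemon"), ("pikachu", "Pokemon"),
         ("mew", "Pokemon"), ("ash", "Person")]
```

Here `count_instances(TYPES, SUB, "Pokemon")` is 3 rather than the 2 direct `Pokemon` assertions, which is the behavior the backends must agree on for count parity.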
Install
git clone https://github.com/nuri428/ontorag.git && cd ontorag
uv sync # core (Fuseki)
uv sync --extra bayes # probabilistic + causal reasoning
uv sync --extra neo4j # Neo4j backend
uv sync --extra falkordb # FalkorDB backend

See the README for the quickstart, CLI reference, and architecture.
Deferred to v1.1+
GNN learning layer (R-GCN link prediction, neural CPT), connection-pool tuning, startup health-check, JSON/JSONL typed-literal fidelity.
License: MIT (ontorag itself). Backend licenses vary; note that FalkorDB is RSAL, not OSI-approved.
v0.4.2 — perf verification + harness/DX fixes
Perf verification, harness/DX fixes, and docs. No new features: this release verifies and documents agent performance and fixes two latent issues found along the way.
⚡ Highlights
- Latency verified (4-domain speed bench). Real agent latency is ~1.6–1.9 s mean / ~2.2–2.7 s p95 across pokemon · techstack · ods · pure_land (agent = `gpt-4o`). LLM round-trips are 98.5% of wall time; the graph/SPARQL layer is ~1.5% (median 21 ms per question). The previously seen "91 s" was a dead-key timeout artifact, not real latency.
- Prompt cache at 72–81% when one ontology is queried repeatedly (shared schema prompt prefix), keeping per-query cost low.
| domain | wall p50 | wall mean | wall p95 | LLM% | tools/Q | cache% |
|---|---|---|---|---|---|---|
| pokemon | 1477 ms | 1601 ms | 2219 ms | 98.6% | 1.10 | 77.9% |
| techstack | 1573 ms | 1744 ms | 2512 ms | 98.3% | 1.15 | 79.6% |
| ods | 1633 ms | 1876 ms | 2486 ms | 98.4% | 1.30 | 80.9% |
| pure_land | 1650 ms | 1844 ms | 2740 ms | 98.7% | 1.05 | 71.9% |
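Each row of the table summarizes per-question wall times plus the share spent in LLM calls. The speed bench's exact percentile method is not stated here; the sketch below assumes linear interpolation via `statistics.quantiles(..., method="inclusive")`, which keeps p95 within the observed range on small samples, and uses made-up sample values.

```python
from __future__ import annotations

import statistics


def latency_profile(wall_ms: list[float], llm_ms: list[float]) -> dict[str, float]:
    """Summarise per-question wall times and the share spent in LLM calls."""
    # 99 cut points with linear interpolation between observed samples;
    # "inclusive" never extrapolates past the largest observation.
    cuts = statistics.quantiles(wall_ms, n=100, method="inclusive")
    return {
        "p50": statistics.median(wall_ms),
        "mean": statistics.fmean(wall_ms),
        "p95": cuts[94],
        "llm_pct": 100.0 * sum(llm_ms) / sum(wall_ms),
    }
```

On toy samples `[1000, 2000, 3000, 4000]` ms with LLM time `[980, 1970, 2950, 3940]` ms this gives p50 = mean = 2500 ms, p95 = 3850 ms, and an LLM share of 98.4%. With p50 below the mean, as in every row above, a few slow questions are pulling the average up.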
🐞 Fixes
- `fix(fuseki)`: default `FUSEKI_DATASET` to `ontorag` to match docker-compose, the Fuseki Dockerfile (`--mem /ontorag`), and templates. A fresh clone copying `.env.example`, or running CLI/scripts without the env var, previously hit a 405 on a nonexistent `ontology` dataset.
- `fix(bench)`: align the speed bench with the parallel-dispatch phase schema (it was raising `KeyError: 'tool'` on every tool-using question).
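The dataset fix boils down to choosing a default that matches what the container actually serves. A minimal sketch, assuming a hypothetical `FUSEKI_URL` variable with Fuseki's usual `http://localhost:3030` base and the `/{dataset}/sparql` query service; only the `FUSEKI_DATASET` name and its `ontorag` default come from this release.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def fuseki_query_url(env: Mapping[str, str] | None = None) -> str:
    """Build the SPARQL query endpoint from environment variables."""
    env = os.environ if env is None else env
    # FUSEKI_URL and its default are hypothetical names for this sketch.
    base = env.get("FUSEKI_URL", "http://localhost:3030").rstrip("/")
    # Default matches the dataset the docker image creates (--mem /ontorag),
    # so an unset variable no longer points at a nonexistent dataset.
    dataset = env.get("FUSEKI_DATASET", "ontorag")
    return f"{base}/{dataset}/sparql"
```

Passing the environment as an optional mapping keeps the function testable without mutating `os.environ`.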
📚 Docs
- New Performance — agent latency profile section in README (EN + KO) with the speed table + one-line reproduce command.
- Moved the SPARQL 3-layer design note to `docs/design/sparql-approach.md` and updated the CLAUDE.md reference.
- `chore`: gitignore stale benchmark artifacts (bench outputs, `.env`, coverage, `chat.db`).
🔭 Next
v0.5.0 — Neo4j + n10s adapter (GRAPH_STORE env var).
Full changelog: v0.4.1...v0.4.2