OntographRAG

Turn unstructured documents into schema-consistent knowledge graphs. Query them with RAG. Measure how much to trust the answers.

OntographRAG extracts entities and relationships from raw text guided by a custom ontology, stores the result in Neo4j, and answers natural-language questions using hybrid vector + graph retrieval. A built-in uncertainty pipeline flags answers the model isn't confident in before they reach users.

Works for any domain — research literature, legal documents, financial reports, technical manuals. Particularly powerful for clinical and biomedical data, where ontology-constrained extraction enables population-level evidence generation from patient records at scale.

Use cases

Clinical intelligence and population-level evidence

Supply a clinical ontology (SNOMED CT, ICD-10, HPO) and process patient notes, discharge summaries, or EHR exports in bulk. Because every patient's data is extracted into the same schema, the entire population becomes queryable as a single graph:

-- Which comorbidities most frequently co-occur with hypertension in patients over 60?
MATCH (p:Patient)-[:HAS_DIAGNOSIS]->(d:Diagnosis {name: "Hypertension"})
      -[:CO_OCCURS_WITH]->(c:Diagnosis)
WHERE p.age > 60
RETURN c.name, count(*) AS frequency ORDER BY frequency DESC

Ask the same question in plain English via the RAG layer. The ontology is what makes this possible — without schema enforcement, "T2DM", "type 2 diabetes", and "DM2" land as different nodes and aggregation breaks.

Research and knowledge synthesis

Process a corpus of papers, extract entities and relationships consistently across all documents, then ask cross-paper questions the individual documents couldn't answer alone.

Any domain with structured knowledge requirements

Legal (case law entities), finance (company relationships), engineering (component hierarchies). Supply the domain ontology; OntographRAG handles the rest.

What makes this different

Most GraphRAG tools (including Microsoft's GraphRAG) let an LLM freely decide what to extract — producing inconsistent entity types, duplicate concepts, and graphs that drift between documents. OntographRAG takes the opposite approach: you define the schema, the system respects it.

	OntographRAG	Microsoft GraphRAG
Schema control	Bring your own OWL/RDF ontology — entities and relationships are constrained to your types	LLM decides freely; no schema enforcement
Graph storage	Neo4j — production graph DB with Cypher, vector indexes, persistent named KGs	Parquet files in a local directory
Hallucination detection	8 uncertainty metrics including RS-UQ (novel) and semantic entropy	None
LLM providers	OpenRouter, OpenAI, Gemini, Ollama, DeepSeek, HuggingFace	OpenAI / Azure OpenAI only
Retrieval	Hybrid: vector similarity + graph traversal in one query	Community summarisation (global) or entity search (local)
Interface	Web UI + REST API + Python library	CLI + Python library
Domain	Any domain; ontology is user-supplied	General purpose but tuned for summarisation

Key features

1. Ontology-guided KG construction

Supply a .owl / .rdf / .ttl ontology file and every extracted entity and relationship is validated against your schema. The same document processed twice produces the same graph shape. Across a corpus of documents, every entity lands in the same type hierarchy — enabling aggregation, comparison, and population-level queries that are impossible with free-form extraction.

This is the core differentiator. Without schema enforcement, LLMs produce synonym explosion ("myocardial infarction", "heart attack", "MI", "AMI" as four separate nodes), type drift (the same concept classified differently across documents), and graphs that can't be meaningfully queried at scale. The ontology collapses all of this into a consistent, traversable structure.

Without an ontology, extraction still works — the LLM infers types — but schema-constrained extraction is what unlocks population-level reasoning.

2. Neo4j as the graph store

Graphs are persisted in Neo4j with:

Vector indexes (384-dim all-MiniLM-L6-v2 by default) for semantic search over chunks
Named KGs — multiple independent graphs in one database, scoped by name tag
Full Cypher access — query or extend the graph with any Cypher statement

3. Hybrid retrieval

Queries run vector similarity search over chunk embeddings and graph traversal over entity neighborhoods simultaneously. The combined context is richer than either alone — the vector index finds relevant passages, the graph finds related concepts those passages didn't mention.

4. Uncertainty quantification and hallucination detection

A dedicated pipeline computes 8 metrics per answer to flag low-confidence responses:

Metric	What it measures
`semantic_entropy`	Shannon entropy over meaning-clusters of N sampled responses (Farquhar et al., Nature 2023)
`discrete_semantic_entropy`	Same with hard cluster boundaries
`token_entropy`	Surface-level diversity across responses
`p_true`	NLI-based probability responses are supported by context
`embedding_consistency`	Pairwise NLI contradiction rate between response pairs
`spuq`	Semantic entropy weighted by variance under probability perturbation
`rs_uq` ⭐	Novel. Cosine dissimilarity between the LLM's last-layer hidden state for the prompt alone vs. prompt + response. A large shift signals the model's answer diverges from its internal encoding of the question — no multiple samples needed.

Hypothesis: ontology-constrained, deduplicated graph context lowers semantic entropy compared to vanilla RAG because the model receives less contradictory evidence.

5. Provider-agnostic LLM support

Every endpoint accepts a provider + model pair. Supported providers: OpenRouter (free tier available), OpenAI, Google Gemini, Ollama (local), DeepSeek, HuggingFace. Switch model per request with no code changes.

Quick start

# 1. Clone and install
git clone https://github.com/julka01/OntographRAG.git
cd OntographRAG
uv sync          # creates .venv and installs all dependencies
source .venv/bin/activate

# 2. Start Neo4j
docker compose up -d neo4j

# 3. Configure
cp .env.example .env
# Edit .env — set NEO4J_PASSWORD and at least one LLM provider key

# 4. Start server
python start_server.py
# → Web UI at http://localhost:8004
# → API docs at http://localhost:8004/docs

Setup

Requirements

Python 3.11+
Neo4j 5.0+ (via Docker or local install)
8 GB RAM minimum (16 GB recommended for large documents)

Installation

uv sync          # recommended
# or
pip install -r requirements.txt

Environment variables

Copy .env.example to .env and fill in your values:

# ── Neo4j ───────────────────────────────────────────────────────────────────
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your-password
NEO4J_DATABASE=neo4j

# ── LLM providers (at least one required) ───────────────────────────────────
OPENROUTER_API_KEY=your-key   # free-tier models available
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...
DEEPSEEK_API_KEY=...
HF_API_TOKEN=...
OLLAMA_HOST=http://localhost:11434

# ── Embeddings ───────────────────────────────────────────────────────────────
EMBEDDING_PROVIDER=huggingface   # huggingface | openai | vertexai

# ── Document processing ──────────────────────────────────────────────────────
CHUNK_SIZE=1500
CHUNK_OVERLAP=200
MAX_CHUNKS=50

# ── Retrieval ────────────────────────────────────────────────────────────────
VECTOR_SIMILARITY_THRESHOLD=0.1

# ── Security (production) ────────────────────────────────────────────────────
APP_API_KEY=               # set to enforce API key auth on all endpoints
ALLOWED_ORIGINS=*          # comma-separated origins for CORS

# ── Server ───────────────────────────────────────────────────────────────────
LOG_LEVEL=INFO
MAX_WORKERS=4
LLM_TIMEOUT_SECONDS=120

Web UI

The web interface is served at http://localhost:8004.

Knowledge graph panel

Build KG — upload a document (PDF, TXT, CSV, JSON, XML ≤ 50 MB), choose provider/model, optionally attach an ontology file. Extraction progress streams to the UI in real time via SSE and the graph loads automatically on completion.
Graph visualisation — interactive vis.js network. Node size scales with degree. Click a node to open its detail panel (type, properties, connected nodes).
Search — dims non-matching nodes rather than hiding them; shows match count.
Filter — per-type checkboxes with node/edge counts.
Named KG management — create, list, and switch between multiple saved graphs.

Chat panel

Ask questions against the active knowledge graph; answers cite source chunks.
Chat history persisted in localStorage.
Highlighted nodes — entities used in the answer are highlighted in the graph.
Thinking indicator while waiting for the LLM response.

API Reference

Server runs on port 8004. Interactive docs at http://localhost:8004/docs.

Authentication: set APP_API_KEY in .env to require X-API-Key: <key> on all requests. Unset = open (development mode).

Knowledge graph — build & query

Method	Endpoint	Description
`POST`	`/create_ontology_guided_kg`	Build an ontology-guided KG from a file upload
`POST`	`/extract_graph`	Extract a raw KG (no ontology) from a file
`POST`	`/load_kg_from_file`	Load a graph from file into Neo4j
`GET`	`/kg_progress_stream`	SSE stream of KG build progress

`POST /create_ontology_guided_kg`

Multipart form:

Field	Type	Default	Description
`file`	file	required	Document (PDF/TXT/CSV/JSON/XML, ≤ 50 MB)
`provider`	string	`openai`	LLM provider
`model`	string	`gpt-3.5-turbo`	Model name
`ontology_file`	file	optional	Custom ontology (.owl/.rdf/.ttl/.xml)
`max_chunks`	int	`50`	Max text chunks to process
`kg_name`	string	optional	Name tag for the resulting KG

Response:

{
  "kg_id": "uuid",
  "kg_name": "my-kg",
  "graph_data": { "nodes": [...], "relationships": [...] },
  "method": "ontology_guided"
}

`GET /kg_progress_stream`

Server-Sent Events. Connect with EventSource:

data: {"line": "✓ Extracted 42 entities from chunk 3/10"}
data: {"done": true}

Named KG management

Method	Endpoint	Description
`POST`	`/kg/create`	Create a named KG record
`GET`	`/kg/list`	List all KGs with document counts
`GET`	`/kg/{kg_name}`	Stats for a specific KG
`DELETE`	`/kg/{kg_name}`	Delete a KG
`GET`	`/kg/{kg_name}/entities`	List entities in a KG

Neo4j management

Method	Endpoint	Description
`POST`	`/save_kg_to_neo4j`	Persist an in-memory KG to Neo4j
`POST`	`/load_kg_from_neo4j`	Load a KG from Neo4j by name
`POST`	`/clear_kg`	Delete all nodes and relationships
`GET`	`/health/neo4j`	Connectivity check

Chat / RAG

`POST /chat`

Rate limited: 30 requests/minute per IP.

JSON body:

Field	Type	Default	Description
`question`	string	required	Question to answer (max 4096 chars)
`provider_rag`	string	`openrouter`	LLM provider
`model_rag`	string	`openai/gpt-oss-120b:free`	Model name
`kg_name`	string	optional	Restrict retrieval to a specific KG
`document_names`	string[]	`[]`	Restrict to specific documents
`session_id`	string	`default_session`	Session identifier

Response:

{
  "session_id": "default_session",
  "message": "...",
  "info": {
    "sources": ["chunk_id_1"],
    "model": "openai/gpt-oss-120b:free",
    "confidence": 0.87,
    "chunk_count": 5,
    "entity_count": 12,
    "relationship_count": 8,
    "entities": { "used_entities": [...] }
  }
}

CSV bulk processing

Method	Endpoint	Description
`POST`	`/validate_csv`	Validate a CSV before bulk processing
`POST`	`/bulk_process_csv`	Build KGs from all rows of a CSV

`POST /bulk_process_csv`

Field	Type	Default	Description
`file`	file	required	CSV file
`provider`	string	`openai`	LLM provider
`model`	string	`gpt-3.5-turbo`	LLM model
`text_column`	string	`full_report_text`	Column containing the text to process
`id_column`	string	optional	Column to use as document ID
`start_row`	int	`0`	First row to process
`batch_size`	int	`50`	Rows per batch

Models

GET /models/{provider} — lists available models for a provider.

cURL examples

# Build a KG with ontology
curl -X POST http://localhost:8004/create_ontology_guided_kg \
  -F "file=@document.pdf" \
  -F "provider=openrouter" \
  -F "model=openai/gpt-4o-mini" \
  -F "ontology_file=@schema.owl" \
  -F "kg_name=my-kg"

# Ask a question
curl -X POST http://localhost:8004/chat \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the main concepts?", "kg_name": "my-kg", "provider_rag": "openrouter", "model_rag": "openai/gpt-4o-mini"}'

# Stream build progress
curl -N http://localhost:8004/kg_progress_stream

# List KGs
curl http://localhost:8004/kg/list

# Health check
curl http://localhost:8004/health/neo4j

Experiments

The experiments/ directory evaluates KG-RAG vs Vanilla RAG on biomedical QA benchmarks, measuring answer quality and hallucination via semantic uncertainty. See experiments/README.md for full details.

# 5-question smoke test
python experiments/experiment.py --num-samples 5 --entropy-samples 3 --datasets pubmedqa

# Sweep thresholds and chunk sizes
python experiments/experiment.py \
  --num-samples 50 --entropy-samples 4 \
  --similarity-thresholds 0.1 0.15 0.2 \
  --max-chunks-values 5 10 15 \
  --datasets pubmedqa bioasq

Supported datasets

Dataset	Task	Download
`pubmedqa`	Yes/no QA from PubMed abstracts	pubmedqa/pubmedqa
`bioasq`	Biomedical factoid QA	bioasq.org (free registration)

Place downloaded files under MIRAGE/rawdata/ — see experiments/README.md for exact paths.

Evaluation flags

Flag	Default	Description
`--num-samples`	all	Questions per dataset
`--entropy-samples`	`3`	Responses per question for uncertainty metrics
`--similarity-thresholds`	`[0.1]`	Cosine similarity cutoffs to sweep
`--max-chunks-values`	`[10]`	Retrieved chunk counts to sweep
`--llm-provider`	`openrouter`	LLM provider
`--llm-model`	`openai/gpt-oss-120b:free`	Model
`--datasets`	`pubmedqa bioasq`	Datasets to run
`--skip-kg-build`	`False`	Reuse existing Neo4j graph

Results are saved to results/<dataset>_<timestamp>.json and optionally synced to Weights & Biases.

Architecture

KG build pipeline

Ingest — file uploaded; PDF text extracted via PyMuPDF, plaintext decoded
Chunk — overlapping text windows (CHUNK_SIZE=1500, CHUNK_OVERLAP=200)
Ontology load — custom .owl/.ttl parsed (owlready2), or free-form extraction if none supplied
LLM extraction — each chunk sent with an ontology-constrained prompt; relationships and entities returned as structured JSON (relationships-first ordering prevents truncation data loss)
Cross-chunk extraction — sliding window over adjacent chunk pairs; a second LLM call extracts relationships that span chunk boundaries
Entity harmonization — duplicate entities merged by normalized text; most specific ontology type wins; stable UUID assigned per entity
Confidence filtering — each relationship triple is scored by co-occurrence in source chunks; hallucinated triples (score < 0.25) are dropped
Embed — chunk and entity embeddings computed with all-MiniLM-L6-v2 (384-dim, CPU) or OpenAI
Write — nodes, relationships, and embeddings stored in Neo4j; entities tagged with kgName for scoped loading; progress streamed via SSE

RAG query pipeline

Embed query — same model as build time
Vector search — top-K chunks by cosine similarity
Graph traversal — neighbour entities fetched via Cypher for each retrieved chunk
Assemble context — chunk text + entity graph merged into LLM prompt
Synthesise — LLM generates a grounded answer with citations

Module layout

ontographrag/
├── api/
│   ├── app.py                         # FastAPI application, all endpoints
│   └── static/
│       └── index.html                 # Single-page web UI
├── kg/
│   ├── builders/
│   │   ├── ontology_guided_kg_creator.py   # OntologyGuidedKGCreator — core extraction, harmonization, Neo4j write
│   │   └── enhanced_kg_creator.py          # UnifiedOntologyGuidedKGCreator — API-facing wrapper + CSV bulk ops
│   ├── loaders/
│   │   └── kg_loader.py               # KGLoader — reads KG from Neo4j by kgName
│   └── utils/
│       ├── common_functions.py        # Shared helpers (embedding, text normalization)
│       └── constants.py               # Default values and Neo4j label constants
├── rag/
│   └── systems/
│       ├── enhanced_rag_system.py     # KG-RAG: hybrid vector + graph retrieval
│       └── vanilla_rag_system.py      # Vanilla RAG: vector-only baseline
└── providers/
    └── model_providers.py             # LLM + embedding provider abstractions

Key specs

Component	Detail
Embeddings	`all-MiniLM-L6-v2` (384-dim), runs locally on CPU
Vector similarity	Cosine, default threshold 0.1
Chunk size	1500 chars, 200 overlap
Graph database	Neo4j 5.0+ with vector indexes
Graph visualisation	vis.js 9.1.0
File upload limit	50 MB
Chat rate limit	30 req/min per IP
KG build rate limit	5 req/min per IP

Utility scripts

Script	Purpose
`start_server.py`	Start the FastAPI server on port 8004
`run_kg_generation.py`	Build a KG from the command line without the API
`populate_neo4j.py`	Seed Neo4j with sample data
`clear_neo4j.py`	Delete all nodes and relationships
`cleanup_and_rebuild_kg.py`	Clear Neo4j and rebuild a KG in one step
`graphDB_dataAccess.py`	Low-level Neo4j data access layer (named KG CRUD)
`csv_processor.py`	`MedicalReportCSVProcessor` for medical-report CSVs
`compare_extraction_methods.py`	Compare ontology-guided vs free-form LLM extraction
`test_named_kg.py`	Integration test for named KG creation and retrieval
`MIRAGE_adaptation.py`	Adapter for the MIRAGE biomedical benchmark

Docker

# Neo4j only (recommended for development)
docker compose up -d neo4j

# Full stack (Neo4j + API server)
docker compose up -d

# Logs
docker compose logs -f

# Stop
docker compose down

# Neo4j Browser → http://localhost:7474
# Connect to bolt://localhost:7687

Configuration reference

LLM providers

Provider	Env var	Notes
`openrouter`	`OPENROUTER_API_KEY`	Recommended; free-tier models available
`openai`	`OPENAI_API_KEY`	GPT-3.5, GPT-4, GPT-4o
`gemini`	`GEMINI_API_KEY`	Gemini Pro, Flash
`ollama`	—	Local models; set `OLLAMA_HOST` if non-default
`huggingface`	`HF_API_TOKEN`	HuggingFace Inference API
`deepseek`	`DEEPSEEK_API_KEY`	DeepSeek Chat/Coder

Embedding providers

Provider	Env var	Model
`huggingface` (default)	—	`all-MiniLM-L6-v2`, runs locally
`openai`	`OPENAI_API_KEY`	`text-embedding-ada-002`
`vertexai`	GCP credentials	`textembedding-gecko`

Processing tuning

Variable	Default	Effect
`CHUNK_SIZE`	`1500`	Tokens per chunk
`CHUNK_OVERLAP`	`200`	Token overlap between consecutive chunks
`MAX_CHUNKS`	`50`	Max chunks processed per document
`VECTOR_SIMILARITY_THRESHOLD`	`0.1`	Minimum cosine similarity for retrieval
`MAX_WORKERS`	`4`	Parallel workers for batch processing
`LLM_TIMEOUT_SECONDS`	`120`	Per-request LLM timeout

License

MIT. See LICENSE for details.

Issues and feature requests: GitHub Issues

Name		Name	Last commit message	Last commit date
Latest commit History 115 Commits
MIRAGE		MIRAGE
apoc		apoc
data		data
experiments		experiments
ontographrag		ontographrag
shared		shared
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
cleanup_and_rebuild_kg.py		cleanup_and_rebuild_kg.py
csv_processor.py		csv_processor.py
docker-compose.yml		docker-compose.yml
fix_indexes.py		fix_indexes.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
run_kg_generation.py		run_kg_generation.py
start_server.py		start_server.py
uv.lock		uv.lock

Folders and files

Latest commit

History

Repository files navigation

OntographRAG

Use cases

Clinical intelligence and population-level evidence

Research and knowledge synthesis

Any domain with structured knowledge requirements

What makes this different

Key features

1. Ontology-guided KG construction

2. Neo4j as the graph store

3. Hybrid retrieval

4. Uncertainty quantification and hallucination detection

5. Provider-agnostic LLM support

Quick start

Table of contents

Setup

Requirements

Installation

Environment variables

Web UI

Knowledge graph panel

Chat panel

API Reference

Knowledge graph — build & query

POST /create_ontology_guided_kg

GET /kg_progress_stream

Named KG management

Neo4j management

Chat / RAG

POST /chat

CSV bulk processing

POST /bulk_process_csv

Models

cURL examples

Experiments

Supported datasets

Evaluation flags

Architecture

KG build pipeline

RAG query pipeline

Module layout

Key specs

Utility scripts

Docker

Configuration reference

LLM providers

Embedding providers

Processing tuning

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

`POST /create_ontology_guided_kg`

`GET /kg_progress_stream`

`POST /chat`

`POST /bulk_process_csv`

Packages