A lightweight, Palantir Foundry-inspired platform for building domain ontologies from raw data. Connect your data sources, run them through a visual transform pipeline, map curated datasets to entity types, and explore the resulting knowledge graph — complete with entities, relations, logic rules, and executable actions.
Two build paths are supported:
- Pipeline Mapping (v2) — full data-integration chain:
Data Connection → Raw Storage → Transform → Curated Dataset → Ontology Mapping - Simple LLM Extraction (v1) — upload documents, pick a prompt and model, and extract a knowledge graph in one shot
An ontology is a formal representation of knowledge in a specific domain — a shared vocabulary of concepts and the relationships between them. Think of it as the structured backbone that turns raw data into machine-readable, queryable knowledge.
In nano-ontoprompt, every ontology is made of these building blocks:
| Building Block | What it captures | Example |
|---|---|---|
| Entity (Object Type) | A key concept mapped from a curated dataset, one node per data row | Supplier, PurchaseOrder |
| Relation (Link Type) | An edge between entities, inferred from foreign keys and cross-dataset value overlap | PurchaseOrder -[HAS_SUPPLIER]-> Supplier |
| Logic Rule | The rule layer: mapping / validation / state / inference / automation rules discovered from schema, quality reports and relations | amount > 0, state machine on 库存状态 |
| Action | The executable behavior layer: CRUD, state-transition and link actions generated from object types and relations, with submission criteria and audit snapshots | Approve Record, Link Order to Supplier |
Typical use cases: supply chain modeling, clinical concept extraction, financial compliance, legal document structuring — any domain where you need to turn heterogeneous data into structured knowledge.
- Visual pipeline builder — connector / storage / transform / output nodes on a canvas, with per-node status and data preview
- Three transform routes — A: structured (CSV/Excel, schema inference + cleansing), B: semi-structured (JSON flatten / XML parse), C: unstructured (document → Markdown → LLM or rule-based structured extraction)
- Connectors — file upload, MySQL/PostgreSQL, MongoDB, REST API (with incremental sync)
- Curated datasets — quality scoring, human review (admin approval), versioning
- Auto mapping engine — dataset → entity type, column → property, FK → link type, with cardinality inference
- Cross-dataset link inference — exact FK matching, value normalization (
SUP-001↔SUP001), alternate-key matching (e.g. document mentions of company names linking to Supplier entities), optional LLM-assisted semantic linking (ENABLE_LLM_FK_DETECTION=1) - Logic & Action discovery — rules and actions are discovered from mappings, schema constraints, state fields and relations, then go through draft → review → publish
- Knowledge graph — interactive Cytoscape.js mesh view with isolated-node toggle; Neo4j-backed when available, SQLite fallback otherwise
- Search — keyword search (SQL fallback when ChromaDB is down) and semantic search (ChromaDB)
- LLM extraction — any OpenAI, Anthropic, or OpenAI-compatible model
- Prompt management — versioned domain prompts with one-click template generation
- Export — JSON, YAML, CSV, Turtle (RDF), HTML
- Graceful degradation — Neo4j / MinIO / ChromaDB / Redis are all optional; the system falls back to SQLite + local file storage + synchronous runs
- Multi-language UI — English / Chinese toggle
- User management — JWT auth, admin/editor roles; curated approval is admin-only
| Layer | Technology |
|---|---|
| Frontend | React 18, TypeScript, Vite, Tailwind CSS, Cytoscape.js |
| Backend | FastAPI, SQLAlchemy, Alembic |
| Metadata DB | SQLite (dev) / PostgreSQL (prod) |
| Object storage | MinIO (optional, local-file fallback) |
| Graph DB | Neo4j (optional, SQLite fallback) |
| Vector DB | ChromaDB (optional) |
| Task queue | Celery + Redis (optional, synchronous fallback) |
| LLM clients | OpenAI SDK, Anthropic SDK |
git clone https://github.com/jingw2/nano-ontoprompt.git
cd nano-ontoprompt
cp .env.example .env # edit secrets before production use
docker compose -f docker-compose.v2.yml up --buildThis starts PostgreSQL, Redis, Neo4j, MinIO, ChromaDB, backend and frontend. For the lightweight v1 stack use docker-compose.yml instead.
Open http://localhost:5173. Default credentials: admin / changeme123.
Prerequisites: Python 3.11+, Node.js 18+
# Backend
cd backend
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
alembic upgrade head # or rely on auto create_all in dev
uvicorn app.main:app --reload --port 8000
# Frontend (separate terminal)
cd frontend
npm install
npm run devNeo4j / MinIO / ChromaDB / Redis are optional — without them the app uses SQLite graph fallback, local file storage and synchronous pipeline runs.
- Add a model — Models → Add Model: provider, API key, base URL. Tag usage (extraction / VLM / FK detection).
- Create a pipeline — Pipelines → New: drop connector / storage / transform / output nodes on the canvas, attach your data file, pick a transform route, then Run.
- Review curated data — Pipelines → Curated: inspect quality score and preview, then approve (admin).
- Create an ontology — Ontologies → New, build mode Pipeline Mapping: select approved curated datasets and map each to an entity type with a primary key.
- Build — relations are inferred across datasets automatically; logic rules and actions are discovered as drafts.
- Explore — Graph tab for the mesh view, Entities / Logic / Actions tabs for details and review, then publish logic/actions.
- Export — JSON, YAML, CSV, Turtle (RDF), or HTML from the Info tab.
For the Simple LLM Extraction path: create an ontology in simple_llm mode, upload documents in the Files tab, pick a prompt + model, and run extraction.
nano-ontoprompt/
├── backend/
│ ├── alembic/ # DB migrations (0001_full_baseline covers all tables)
│ ├── app/
│ │ ├── routers/ # v1 + v2 REST API endpoints
│ │ ├── models/ # SQLAlchemy ORM models (v1 + v2)
│ │ ├── services/
│ │ │ ├── connection/ # File / SQL / Mongo / REST connectors
│ │ │ └── v2/
│ │ │ ├── pipeline/ # Transform engine, routes A/B/C, steps
│ │ │ ├── mapping/ # Auto mapper, FK & alternate-key link inference
│ │ │ ├── graph/ # Neo4j service, Cypher validation, analytics
│ │ │ ├── curated/ # Quality scoring, review workflow
│ │ │ └── vector/ # ChromaDB service
│ │ └── tasks/ # Celery tasks (pipeline run, sync, extraction)
│ ├── scripts/ # Maintenance scripts (orphan data cleanup, migration)
│ └── tests/ # 300+ pytest cases
├── frontend/
│ └── src/
│ ├── pages/pipelines/ # Pipeline list + canvas builder
│ ├── pages/ontologies/ # Ontology detail: graph / entities / logic / actions
│ └── api/ # Axios clients (v1 + v2)
├── docker-compose.yml # v1 lightweight stack
├── docker-compose.v2.yml # Full stack: Postgres + Redis + Neo4j + MinIO + Chroma
└── test_data/ # Sample datasets and E2E acceptance scripts
See .env.example for the full list. Key settings:
ENVIRONMENT=development # "production" enforces non-default secrets at startup
DATABASE_URL=sqlite:///./ontoprompt.db
SECRET_KEY=change-me
ENCRYPTION_KEY= # Fernet key for encrypting stored API keys
FIRST_ADMIN_USER=admin
FIRST_ADMIN_PASSWORD=changeme123
# Optional services (graceful fallback when absent)
REDIS_URL=redis://localhost:6379/0
NEO4J_URI=bolt://localhost:7687
MINIO_ENDPOINT=localhost:9000
CHROMA_HOST=localhost
# Upload limits
MAX_UPLOAD_MB=200
ALLOWED_UPLOAD_EXTENSIONS=csv,xlsx,xls,json,xml,pdf,docx,doc,pptx,ppt,md,txt
# Optional: LLM-assisted semantic FK detection (needs a configured model)
ENABLE_LLM_FK_DETECTION=0MIT