A generic, domain-agnostic knowledge graph extraction and RAG system that works with any PDF document. The system automatically analyzes document content, generates an appropriate schema, extracts entities and relationships, and builds a queryable knowledge graph.
- 🔄 Fully Generic: Works with any PDF content - research papers, business docs, technical manuals, reports, etc.
- 🤖 Auto-Schema Generation: Automatically analyzes document content and creates appropriate entity/relationship schemas
- 📊 Smart Extraction: Uses LLM-powered agents to extract comprehensive entities and relationships
- 💾 Neo4j Integration: Stores extracted knowledge in a graph database for powerful queries
- 💬 Natural Language Chat: Ask questions about your documents using natural language
- 🔍 Direct Cypher Queries: Run custom graph queries for advanced analysis
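To make the auto-schema idea more concrete, here is a minimal sketch of how a generated schema could constrain extracted relationships. The `Schema` and `Relationship` classes and the `validate` function are hypothetical names chosen for illustration; the project's actual data structures are not shown in this README and may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relationship:
    source: str  # name of the source entity
    type: str    # relationship type, e.g. "AUTHORED"
    target: str  # name of the target entity


@dataclass
class Schema:
    # Hypothetical schema shape: the allowed entity labels, and for each
    # relationship type the (source label, target label) pair it may connect.
    entity_labels: set[str] = field(default_factory=set)
    relationship_types: dict[str, tuple[str, str]] = field(default_factory=dict)


def validate(schema: Schema, entities: dict[str, str],
             relationships: list[Relationship]) -> list[Relationship]:
    """Return only the relationships that conform to the schema.

    `entities` maps an entity name to its label.
    """
    valid = []
    for rel in relationships:
        expected = schema.relationship_types.get(rel.type)
        if expected is None:
            continue  # relationship type is not part of the schema
        labels = (entities.get(rel.source), entities.get(rel.target))
        if labels == expected and all(l in schema.entity_labels for l in labels):
            valid.append(rel)
    return valid


# Illustrative data only; not taken from a real document.
schema = Schema(
    entity_labels={"Person", "Paper"},
    relationship_types={"AUTHORED": ("Person", "Paper")},
)
entities = {"Ada": "Person", "Notes": "Paper"}
candidates = [
    Relationship("Ada", "AUTHORED", "Notes"),   # matches the schema
    Relationship("Notes", "AUTHORED", "Ada"),   # wrong direction
    Relationship("Ada", "CITES", "Notes"),      # unknown relationship type
]
accepted = validate(schema, entities, candidates)
```

Filtering extracted relationships against the schema in this way is one possible design for keeping LLM output consistent before it reaches the graph database.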
# Install dependencies
pip install -r requirements.txt
# Copy and configure .env file
cp .env.example .env
# Edit .env with your API keys and Neo4j credentials
python main.py
Choose option 1, upload any PDF, and let the system automatically:
- Analyze the document content
- Generate an appropriate schema
- Extract entities and relationships
- Store everything in Neo4j
Use option 2 to chat with your knowledge graph using natural language!
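The last of the automatic steps above, storing extracted entities and relationships in Neo4j, can be sketched as building parameterized Cypher `MERGE` statements. This is an illustration under assumptions, not the project's code: the function names and the `name` property are hypothetical, and the sketch only produces query strings and parameter dictionaries without connecting to a database.

```python
import re

# Labels and relationship types are interpolated into the query text here
# rather than passed as parameters, so they are validated first.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check_identifier(name):
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe Cypher identifier: {name!r}")
    return name


def entity_statement(label, name):
    """Build a MERGE statement that creates the entity node if it is missing."""
    query = f"MERGE (n:{_check_identifier(label)} {{name: $name}})"
    return query, {"name": name}


def relationship_statement(source, rel_type, target):
    """Build a statement linking two existing nodes, matched by name."""
    query = (
        "MATCH (a {name: $source}), (b {name: $target}) "
        f"MERGE (a)-[:{_check_identifier(rel_type)}]->(b)"
    )
    return query, {"source": source, "target": target}


# Illustrative values only.
node_query, node_params = entity_statement("Person", "Ada")
rel_query, rel_params = relationship_statement("Ada", "AUTHORED", "Notes")
```

With the official `neo4j` Python driver, each pair could then be passed to `session.run(query, params)`. Using `MERGE` rather than `CREATE` keeps repeated runs over the same PDF from duplicating nodes.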
docker compose up --build -d
# Pipeline (app), HIL UI http://localhost:8501, Jupyter http://localhost:8888
Open http://localhost:8888 and try:
- notebooks/pagerank.ipynb
- notebooks/community.ipynb
- notebooks/link_prediction.ipynb
- notebooks/link_prediction_pipeline.ipynb
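The notebooks' contents are not reproduced in this README. As background for the PageRank notebook, the following is a plain-Python power-iteration sketch of the algorithm on a small edge list. It is an illustration only and makes no claim about how the notebook computes its scores; the `pagerank` function and the sample edges are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Compute PageRank scores for a directed graph given as (src, dst) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Illustrative graph: A and B both point to C, and C points back to A.
scores = pagerank([("A", "C"), ("B", "C"), ("C", "A")])
```

In this example `C` receives links from both other nodes and ends up with the highest score, while `B`, which has no incoming links, ends up with the lowest.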
See docs/bloom.md.