A GraphRAG (Graph Retrieval-Augmented Generation) application for IBM Edge Application Manager (IEAM) documentation using Neo4j as the knowledge graph database.
This project implements a RAG system that:
- Parses HTML documentation from local files
- Constructs a knowledge graph in Neo4j with entities, relationships, and semantic connections
- Supports multiple embedding providers (Ollama, OpenAI, etc.)
- Provides a REST API for querying documentation using natural language
- Leverages graph traversal for context-aware responses
┌─────────────────┐
│ HTML Documents  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  HTML Parser    │
│  & Processor    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐     ┌──────────────┐
│     Entity      │────▶│  Embeddings  │
│   Extraction    │     │   (Ollama/   │
└────────┬────────┘     │   OpenAI)    │
         │              └──────────────┘
         ▼
┌─────────────────┐
│  Neo4j Graph    │
│  Construction   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    GraphRAG     │
│  Query Engine   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Express API    │
│     Server      │
└─────────────────┘
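The configuration exposes CHUNK_SIZE and CHUNK_OVERLAP, which suggests that the parsing stage splits page text into overlapping chunks before extraction and embedding. The sketch below shows one way that could work. The function name `chunkText` is hypothetical, and character-based splitting is an assumption; the actual parser may split on sentences or HTML sections instead.

```typescript
// Hypothetical helper: split text into fixed-size, overlapping chunks.
// The defaults mirror CHUNK_SIZE=1000 and CHUNK_OVERLAP=200 from the .env template.
function chunkText(text: string, chunkSize = 1000, chunkOverlap = 200): string[] {
  if (chunkSize <= 0) {
    throw new RangeError("chunkSize must be positive");
  }
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError("chunkOverlap must be in [0, chunkSize)");
  }

  // Each chunk starts `step` characters after the previous one,
  // so consecutive chunks share `chunkOverlap` characters.
  const step = chunkSize - chunkOverlap;
  const chunks: string[] = [];

  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current chunk reaches the end of the text.
    if (start + chunkSize >= text.length) {
      break;
    }
  }
  return chunks;
}
```

The overlap keeps a sentence that straddles a chunk boundary fully visible in at least one chunk, at the cost of embedding some text twice.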
- Automatic entity extraction from HTML documentation
- Relationship detection between concepts
- Hierarchical document structure preservation
- Semantic similarity connections
- Ollama: Local embedding generation (nomic-embed-text, mxbai-embed-large)
- OpenAI: Cloud-based embeddings (text-embedding-3-small, text-embedding-3-large)
- Configurable: Easy to add new providers
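As an illustration of how a new provider might be plugged in, here is a minimal sketch of a provider interface with a small registry. Everything in it is an assumption: the interface shape, `registerProvider`, `createProvider`, and the toy `HashEmbeddingProvider` are hypothetical names, and the real interface in src/embeddings/base.ts may look different.

```typescript
// Hypothetical shape of a provider; the project's real interface may differ.
interface EmbeddingProvider {
  readonly name: string;
  readonly dimensions: number;
  embed(texts: string[]): Promise<number[][]>;
}

// Toy provider for illustration only: counts characters into fixed-size buckets.
// It stands in for a real Ollama or OpenAI client, which would call a remote API.
class HashEmbeddingProvider implements EmbeddingProvider {
  readonly name = "hash";
  readonly dimensions: number;

  constructor(dimensions = 8) {
    this.dimensions = dimensions;
  }

  embedOne(text: string): number[] {
    const vector = new Array<number>(this.dimensions).fill(0);
    for (let i = 0; i < text.length; i++) {
      vector[text.charCodeAt(i) % this.dimensions] += 1;
    }
    return vector;
  }

  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((text) => this.embedOne(text));
  }
}

// Registry keyed by the value one would put in EMBEDDING_PROVIDER.
const providerRegistry = new Map<string, () => EmbeddingProvider>();

function registerProvider(name: string, factory: () => EmbeddingProvider): void {
  providerRegistry.set(name.toLowerCase(), factory);
}

function createProvider(name: string): EmbeddingProvider {
  const factory = providerRegistry.get(name.toLowerCase());
  if (!factory) {
    throw new Error(`Unknown embedding provider: ${name}`);
  }
  return factory();
}

// Adding a provider is a single registration call.
registerProvider("hash", () => new HashEmbeddingProvider());
```

With a layout like this, the rest of the pipeline only depends on `EmbeddingProvider`, so switching between local and cloud embeddings would be a configuration change rather than a code change.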
- Semantic search using vector similarity
- Graph traversal for contextual information
- Community detection for topic clustering
- Multi-hop reasoning across related concepts
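To make "semantic search using vector similarity" concrete, the sketch below ranks nodes by cosine similarity against a query vector, entirely in memory. The names `EmbeddedNode`, `cosineSimilarity`, and `topKSimilar` are hypothetical. In a real deployment this ranking would more likely be delegated to a Neo4j vector index, so treat this as an explanation of the scoring rather than the project's implementation.

```typescript
// Hypothetical in-memory node with a precomputed embedding.
interface EmbeddedNode {
  id: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep nodes scoring at or above the threshold, best first, at most k of them.
// The default threshold mirrors SIMILARITY_THRESHOLD=0.75 from the .env template.
function topKSimilar(
  query: number[],
  nodes: EmbeddedNode[],
  k = 5,
  threshold = 0.75
): Array<{ id: string; score: number }> {
  return nodes
    .map((node) => ({ id: node.id, score: cosineSimilarity(query, node.embedding) }))
    .filter((result) => result.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The nodes returned here would serve as the starting points for graph traversal, which then pulls in related context that pure vector search would miss.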
- /api/query: Natural language queries
- /api/graph/stats: Graph statistics
- /api/graph/search: Entity search
- /api/health: Health check
- Neo4j Aura or Local Instance
  - Neo4j Aura: https://console.neo4j.io/
  - Local: Docker or Neo4j Desktop
- Embedding Provider (choose one):
  - Ollama (local): https://ollama.ai/
  - OpenAI API key
- Node.js >= 18.x
# Clone or navigate to the project
cd ieam-graphrag
# Install dependencies
npm install
# Copy environment template
cp .env.example .env
# Edit .env with your configuration
nano .env
Create a .env file with the following:
# Server Configuration
PORT=3000
HOST=localhost
# Neo4j Configuration
NEO4J_URI=neo4j+s://your-instance.databases.neo4j.io
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your-password
NEO4J_DATABASE=neo4j
# Embedding Provider (ollama or openai)
EMBEDDING_PROVIDER=ollama
# Ollama Configuration (if using Ollama)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
OLLAMA_LLM_MODEL=llama3.2:3b
# OpenAI Configuration (if using OpenAI)
OPENAI_API_KEY=your-api-key
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
OPENAI_LLM_MODEL=gpt-4
# Data Paths
HTML_DOCS_PATH=./data/ieam-html
PROCESSED_DATA_PATH=./data/processed
# Graph Configuration
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
SIMILARITY_THRESHOLD=0.75
MAX_GRAPH_DEPTH=3
# Build the project
npm run build
# Parse HTML and import to Neo4j
npm run import
# This will:
# - Parse all HTML files in data/ieam-html
# - Extract entities and relationships
# - Generate embeddings
# - Create knowledge graph in Neo4j
# Development mode with auto-reload
npm run dev
# Production mode
npm start
# Using curl
curl -X POST http://localhost:3000/api/query \
-H "Content-Type: application/json" \
-d '{"query": "How to register an edge node in IEAM?"}'
# Using the provided test script
npm run test:query
# Get graph statistics
curl http://localhost:3000/api/graph/stats
# Search for entities
curl "http://localhost:3000/api/graph/search?q=edge+node"
ieam-graphrag/
├── src/
│ ├── config/
│ │ └── index.ts # Configuration management
│ ├── parsers/
│ │ ├── html-parser.ts # HTML document parser
│ │ └── entity-extractor.ts # Entity extraction
│ ├── embeddings/
│ │ ├── base.ts # Base embedding interface
│ │ ├── ollama.ts # Ollama provider
│ │ └── openai.ts # OpenAI provider
│ ├── graph/
│ │ ├── neo4j-client.ts # Neo4j connection
│ │ ├── graph-builder.ts # Graph construction
│ │ └── graph-query.ts # Graph queries
│ ├── graphrag/
│ │ ├── query-processor.ts # Query processing
│ │ └── context-builder.ts # Context aggregation
│ ├── api/
│ │ ├── server.ts # Express server
│ │ └── routes.ts # API routes
│ ├── utils/
│ │ ├── logger.ts # Logging utility
│ │ └── helpers.ts # Helper functions
│ └── index.ts # Main entry point
├── data/
│ ├── ieam-html/ # HTML documentation
│ └── processed/ # Processed data
├── docs/
│ ├── ARCHITECTURE.md # Architecture details
│ ├── API.md # API documentation
│ └── NEO4J_SETUP.md # Neo4j setup guide
├── tests/
│ └── integration/ # Integration tests
├── .env.example # Environment template
├── package.json
├── tsconfig.json
└── README.md
- Document
  - Properties: id, title, url, content, embedding
  - Represents a documentation page
- Section
  - Properties: id, title, content, level, embedding
  - Represents a section within a document
- Entity
  - Properties: id, name, type, description, embedding
  - Types: Concept, Component, Command, API, Configuration
- Topic
  - Properties: id, name, description
  - Represents high-level topics/categories
- HAS_SECTION: Document → Section
- MENTIONS: Section → Entity
- RELATES_TO: Entity → Entity (semantic similarity)
- BELONGS_TO: Entity → Topic
- SIMILAR_TO: Document → Document (vector similarity)
- NEXT: Section → Section (sequential order)
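The relationship list above can be written down as data, which makes it possible to check an edge before it is written to the graph. This is a sketch under assumptions: `RELATIONSHIP_ENDPOINTS`, `isValidRelationship`, and `buildMergeCypher` are hypothetical names, and the `id` property is assumed to be the lookup key for every node type.

```typescript
type NodeLabel = "Document" | "Section" | "Entity" | "Topic";

type RelationshipType =
  | "HAS_SECTION"
  | "MENTIONS"
  | "RELATES_TO"
  | "BELONGS_TO"
  | "SIMILAR_TO"
  | "NEXT";

// Source and target label for each relationship type, as listed in this README.
const RELATIONSHIP_ENDPOINTS: Record<RelationshipType, { from: NodeLabel; to: NodeLabel }> = {
  HAS_SECTION: { from: "Document", to: "Section" },
  MENTIONS: { from: "Section", to: "Entity" },
  RELATES_TO: { from: "Entity", to: "Entity" },
  BELONGS_TO: { from: "Entity", to: "Topic" },
  SIMILAR_TO: { from: "Document", to: "Document" },
  NEXT: { from: "Section", to: "Section" },
};

// True when the edge connects the node types the schema expects.
function isValidRelationship(type: RelationshipType, from: NodeLabel, to: NodeLabel): boolean {
  const expected = RELATIONSHIP_ENDPOINTS[type];
  return expected.from === from && expected.to === to;
}

// Build a Cypher statement that creates the edge if it does not exist yet.
// Labels and the relationship type come from the closed sets above;
// node ids are passed as query parameters ($fromId, $toId).
function buildMergeCypher(type: RelationshipType): string {
  const { from, to } = RELATIONSHIP_ENDPOINTS[type];
  return `MATCH (a:${from} {id: $fromId}), (b:${to} {id: $toId}) MERGE (a)-[:${type}]->(b)`;
}
```

For example, `buildMergeCypher("MENTIONS")` yields a statement that links a Section to an Entity and can be run repeatedly without creating duplicate edges.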
- Query Embedding: Convert user query to vector
- Semantic Search: Find relevant nodes using vector similarity
- Graph Traversal: Expand context through relationships
- Community Detection: Identify related concept clusters
- Context Aggregation: Combine information from multiple paths
- Response Generation: Use LLM with enriched context
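The graph traversal step can be pictured as a breadth-first expansion from the nodes found by semantic search, limited to a maximum number of hops. The sketch below runs on an in-memory adjacency map instead of Neo4j. `expandContext` and the `Adjacency` type are hypothetical, and the default depth mirrors MAX_GRAPH_DEPTH=3 from the .env template.

```typescript
// Hypothetical in-memory graph: node id -> ids of directly connected nodes.
type Adjacency = Map<string, string[]>;

// Breadth-first expansion from the seed nodes.
// Returns every reached node id together with its hop distance from the nearest seed.
function expandContext(seeds: string[], graph: Adjacency, maxDepth = 3): Map<string, number> {
  const depthOf = new Map<string, number>();
  let frontier: string[] = [];

  for (const seed of seeds) {
    if (!depthOf.has(seed)) {
      depthOf.set(seed, 0);
      frontier.push(seed);
    }
  }

  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const neighbor of graph.get(id) ?? []) {
        // Visit each node once, at the shallowest depth it is reachable.
        if (!depthOf.has(neighbor)) {
          depthOf.set(neighbor, depth);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return depthOf;
}
```

The hop distance is useful during context aggregation: nodes closer to a seed can be weighted higher or placed earlier in the prompt than nodes found at the depth limit.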
// POST /api/query
{
"query": "How to register an edge node in IEAM?",
"maxResults": 5,
"includeGraph": true
}
// Response
{
"answer": "To register an edge node in IEAM...",
"sources": [
{
"title": "Registering Edge Nodes",
"url": "...",
"relevance": 0.95
}
],
"graph": {
"nodes": [...],
"relationships": [...]
}
}
// GET /api/graph/stats
{
"nodes": {
"Document": 150,
"Section": 450,
"Entity": 320,
"Topic": 25
},
"relationships": {
"HAS_SECTION": 450,
"MENTIONS": 1200,
"RELATES_TO": 850
},
"totalNodes": 945,
"totalRelationships": 2500
}
Automatically groups related concepts:
CALL gds.louvain.stream('myGraph')
YIELD nodeId, communityId
Identifies key concepts:
CALL gds.pageRank.stream('myGraph')
YIELD nodeId, score
Finds connections between concepts:
MATCH path = shortestPath(
(a:Entity {name: 'Edge Node'})-[*]-(b:Entity {name: 'Agent'})
)
RETURN path
- Vector Indexes: Create vector indexes for fast similarity search
- Batch Processing: Import documents in batches
- Connection Pooling: Reuse Neo4j connections
- Caching: Cache frequent queries
- Parallel Processing: Process documents concurrently
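Two of these tips, batch processing and caching, are easy to sketch without any external dependency. The helpers below are illustrative assumptions, not the project's code: `toBatches` splits a work list into fixed-size groups, and `LruCache` is a small least-recently-used cache that could hold answers to frequent queries.

```typescript
// Split a list into consecutive batches of at most `batchSize` items.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (batchSize <= 0) {
    throw new RangeError("batchSize must be positive");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Minimal LRU cache. A Map iterates in insertion order, so the first key
// is always the least recently used entry.
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    if (capacity < 1) {
      throw new RangeError("capacity must be at least 1");
    }
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) {
      return undefined;
    }
    const value = this.entries.get(key) as V;
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.entries.has(key)) {
      this.entries.delete(key);
    } else if (this.entries.size >= this.capacity) {
      // Evict the least recently used entry.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
    this.entries.set(key, value);
  }
}
```

During import, each batch could be written to Neo4j in a single transaction. A query cache would typically be keyed by the normalized query text and cleared whenever the graph is re-imported.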
# Test connection
npm run test:neo4j
# Check Neo4j logs in Aura console
# For Ollama, ensure model is pulled
ollama pull nomic-embed-text
# Check Ollama is running
curl http://localhost:11434/api/tags
# Check HTML files exist
ls -la data/ieam-html
# Verify Neo4j credentials
npm run test:config
# Run tests
npm test
# Run specific test
npm test -- graph-builder
# Lint code
npm run lint
# Format code
npm run format
# Type check
npm run type-check
# Build image
docker build -t ieam-graphrag .
# Run container
docker run -p 3000:3000 --env-file .env ieam-graphrag
See docs/DEPLOYMENT.md for cloud deployment guides.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
ISC
For issues and questions:
- GitHub Issues: [Create an issue]
- Documentation: See docs/ folder
- Neo4j Community: https://community.neo4j.com/
Made with ❤️ for IEAM Documentation