IEAM GraphRAG - Neo4j Knowledge Graph RAG System

A GraphRAG (Graph Retrieval-Augmented Generation) application for IBM Edge Application Manager (IEAM) documentation using Neo4j as the knowledge graph database.

Overview

This project implements a RAG system that:

Parses HTML documentation from local files
Constructs a knowledge graph in Neo4j with entities, relationships, and semantic connections
Supports multiple embedding providers (Ollama, OpenAI, etc.)
Provides a REST API for querying documentation using natural language
Leverages graph traversal for context-aware responses

Architecture

┌─────────────────┐
│  HTML Documents │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  HTML Parser    │
│  & Processor    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐     ┌──────────────┐
│  Entity         │────▶│  Embeddings  │
│  Extraction     │     │  (Ollama/    │
└────────┬────────┘     │   OpenAI)    │
         │              └──────────────┘
         ▼
┌─────────────────┐
│  Neo4j Graph    │
│  Construction   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  GraphRAG       │
│  Query Engine   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Express API    │
│  Server         │
└─────────────────┘

Features

1. Knowledge Graph Construction

Automatic entity extraction from HTML documentation
Relationship detection between concepts
Hierarchical document structure preservation
Semantic similarity connections

2. Multi-Provider Embedding Support

Ollama: Local embedding generation (nomic-embed-text, mxbai-embed-large)
OpenAI: Cloud-based embeddings (text-embedding-3-small, text-embedding-3-large)
Configurable: Easy to add new providers
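
As an illustration of how adding a provider could work, here is a minimal sketch of a provider abstraction with a small registry. The names `EmbeddingProvider`, `registerProvider`, `createProvider`, and the toy `HashEmbeddingProvider` are hypothetical and are not taken from src/embeddings/; a real provider would call Ollama or OpenAI instead of hashing characters.

```typescript
// Hypothetical provider contract; the project's actual interface in
// src/embeddings/base.ts may look different.
interface EmbeddingProvider {
  readonly name: string;
  readonly dimensions: number;
  embed(text: string): Promise<number[]>;
}

type ProviderFactory = () => EmbeddingProvider;

// Registry keyed by lower-cased provider name (for example the value of
// EMBEDDING_PROVIDER).
const providerRegistry = new Map<string, ProviderFactory>();

function registerProvider(name: string, factory: ProviderFactory): void {
  providerRegistry.set(name.toLowerCase(), factory);
}

function createProvider(name: string): EmbeddingProvider {
  const factory = providerRegistry.get(name.toLowerCase());
  if (!factory) {
    const known = Array.from(providerRegistry.keys()).join(", ") || "none";
    throw new Error(`Unknown embedding provider "${name}" (registered: ${known})`);
  }
  return factory();
}

// Toy provider for illustration and offline tests only: it buckets character
// codes into a fixed-size vector and normalizes it to unit length.
// The result carries no semantic meaning.
class HashEmbeddingProvider implements EmbeddingProvider {
  readonly name = "hash";
  constructor(readonly dimensions: number = 8) {}

  async embed(text: string): Promise<number[]> {
    const vector: number[] = new Array<number>(this.dimensions).fill(0);
    for (let i = 0; i < text.length; i++) {
      vector[i % this.dimensions] += text.charCodeAt(i);
    }
    const norm = Math.sqrt(vector.reduce((sum, v) => sum + v * v, 0)) || 1;
    return vector.map((v) => v / norm);
  }
}

registerProvider("hash", () => new HashEmbeddingProvider());
const provider = createProvider("HASH"); // lookup is case-insensitive
```

With this shape, supporting another backend would mean writing one class and one `registerProvider` call, while the rest of the pipeline depends only on the interface.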

3. GraphRAG Query Processing

Semantic search using vector similarity
Graph traversal for contextual information
Community detection for topic clustering
Multi-hop reasoning across related concepts
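
To make the graph traversal and multi-hop ideas concrete, the following sketch expands a set of seed nodes breadth-first up to a hop limit, similar in spirit to what `MAX_GRAPH_DEPTH` suggests. It runs on an in-memory adjacency map rather than Neo4j, and `expandContext`, `Adjacency`, and the "Service" and "Pattern" nodes are assumptions made for illustration.

```typescript
// In-memory stand-in for the Neo4j graph: node id -> neighbor ids.
type Adjacency = Map<string, string[]>;

// Breadth-first expansion from the seed nodes, stopping after maxDepth hops.
// Returns every reached node with the number of hops needed to reach it.
function expandContext(
  graph: Adjacency,
  seeds: string[],
  maxDepth: number
): Map<string, number> {
  const hops = new Map<string, number>();
  let frontier: string[] = [];
  for (const seed of seeds) {
    if (!hops.has(seed)) {
      hops.set(seed, 0);
      frontier.push(seed);
    }
  }
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const neighbor of graph.get(id) ?? []) {
        if (!hops.has(neighbor)) {
          hops.set(neighbor, depth); // first visit is the shortest hop count
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return hops;
}

// Hypothetical sample data, not extracted from the IEAM documentation.
const demoGraph: Adjacency = new Map<string, string[]>([
  ["Edge Node", ["Agent"]],
  ["Agent", ["Edge Node", "Service"]],
  ["Service", ["Agent", "Pattern"]],
  ["Pattern", ["Service"]],
]);

// With a limit of 2 hops, "Pattern" (3 hops away) is not reached.
const reached = expandContext(demoGraph, ["Edge Node"], 2);
```

In the real system the same expansion would more likely be expressed as a bounded variable-length Cypher pattern; the hop count per node can then be used to weight closer context more heavily.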

4. REST API

/api/query - Natural language queries
/api/graph/stats - Graph statistics
/api/graph/search - Entity search
/api/health - Health check

Prerequisites

Neo4j Aura or Local Instance
- Neo4j Aura: https://console.neo4j.io/
- Local: Docker or Neo4j Desktop
Embedding Provider (choose one):
- Ollama (local): https://ollama.ai/
- OpenAI API key
Node.js >= 18.x

Installation

# Clone or navigate to the project
cd ieam-graphrag

# Install dependencies
npm install

# Copy environment template
cp .env.example .env

# Edit .env with your configuration
nano .env

Configuration

Create a .env file with the following:

# Server Configuration
PORT=3000
HOST=localhost

# Neo4j Configuration
NEO4J_URI=neo4j+s://your-instance.databases.neo4j.io
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your-password
NEO4J_DATABASE=neo4j

# Embedding Provider (ollama or openai)
EMBEDDING_PROVIDER=ollama

# Ollama Configuration (if using Ollama)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
OLLAMA_LLM_MODEL=llama3.2:3b

# OpenAI Configuration (if using OpenAI)
OPENAI_API_KEY=your-api-key
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
OPENAI_LLM_MODEL=gpt-4

# Data Paths
HTML_DOCS_PATH=./data/ieam-html
PROCESSED_DATA_PATH=./data/processed

# Graph Configuration
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
SIMILARITY_THRESHOLD=0.75
MAX_GRAPH_DEPTH=3
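
For readers unsure how `CHUNK_SIZE` and `CHUNK_OVERLAP` interact, here is a minimal sketch of overlapping chunking. It assumes both values are measured in characters (the project may count tokens instead), and `chunkText` is a hypothetical helper, not the project's parser.

```typescript
// Split text into fixed-size windows that overlap by `overlap` characters.
// Each new window starts (chunkSize - overlap) characters after the previous one.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be >= 0 and smaller than chunkSize");
  }
  const step = chunkSize - overlap;
  const result: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    result.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return result;
}

// A 2500-character input with the defaults yields windows starting at
// offsets 0, 800, and 1600.
const sampleChunks = chunkText("a".repeat(2500));
```

The overlap means that a sentence cut at a chunk boundary still appears whole in one of the two neighboring chunks, at the cost of embedding some text twice.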

Usage

1. Import HTML Documentation to Neo4j

# Build the project
npm run build

# Parse HTML and import to Neo4j
npm run import

# This will:
# - Parse all HTML files in data/ieam-html
# - Extract entities and relationships
# - Generate embeddings
# - Create knowledge graph in Neo4j

2. Start the API Server

# Development mode with auto-reload
npm run dev

# Production mode
npm start

3. Query the Documentation

# Using curl
curl -X POST http://localhost:3000/api/query \
  -H "Content-Type: application/json" \
  -d '{"query": "How to register an edge node in IEAM?"}'

# Using the provided test script
npm run test:query
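
The same request can be issued from TypeScript. The sketch below only builds the HTTP call so that it can be inspected without a running server; `QueryRequest`, `HttpCall`, and `buildQueryCall` are hypothetical names, and the optional fields are inferred from the API examples in this README rather than from the server code.

```typescript
// Request shape inferred from the examples in this README; the server's
// actual types may include more fields.
interface QueryRequest {
  query: string;
  maxResults?: number;
  includeGraph?: boolean;
}

interface HttpCall {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the HTTP call without sending it, so it can be logged or tested.
function buildQueryCall(baseUrl: string, request: QueryRequest): HttpCall {
  if (request.query.trim() === "") throw new Error("query must not be empty");
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/query`, // tolerate trailing slashes
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(request),
  };
}

const call = buildQueryCall("http://localhost:3000/", {
  query: "How to register an edge node in IEAM?",
  maxResults: 5,
});

// With Node.js 18+, the call could then be sent with the built-in fetch:
//   const res = await fetch(call.url, {
//     method: call.method, headers: call.headers, body: call.body,
//   });
//   const data = await res.json();
```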

4. Explore the Graph

# Get graph statistics
curl http://localhost:3000/api/graph/stats

# Search for entities
curl "http://localhost:3000/api/graph/search?q=edge+node"

Project Structure

ieam-graphrag/
├── src/
│   ├── config/
│   │   └── index.ts              # Configuration management
│   ├── parsers/
│   │   ├── html-parser.ts        # HTML document parser
│   │   └── entity-extractor.ts   # Entity extraction
│   ├── embeddings/
│   │   ├── base.ts               # Base embedding interface
│   │   ├── ollama.ts             # Ollama provider
│   │   └── openai.ts             # OpenAI provider
│   ├── graph/
│   │   ├── neo4j-client.ts       # Neo4j connection
│   │   ├── graph-builder.ts      # Graph construction
│   │   └── graph-query.ts        # Graph queries
│   ├── graphrag/
│   │   ├── query-processor.ts    # Query processing
│   │   └── context-builder.ts    # Context aggregation
│   ├── api/
│   │   ├── server.ts             # Express server
│   │   └── routes.ts             # API routes
│   ├── utils/
│   │   ├── logger.ts             # Logging utility
│   │   └── helpers.ts            # Helper functions
│   └── index.ts                  # Main entry point
├── data/
│   ├── ieam-html/                # HTML documentation
│   └── processed/                # Processed data
├── docs/
│   ├── ARCHITECTURE.md           # Architecture details
│   ├── API.md                    # API documentation
│   └── NEO4J_SETUP.md           # Neo4j setup guide
├── tests/
│   └── integration/              # Integration tests
├── .env.example                  # Environment template
├── package.json
├── tsconfig.json
└── README.md

Neo4j Graph Schema

Node Types

Document
- Properties: id, title, url, content, embedding
- Represents a documentation page
Section
- Properties: id, title, content, level, embedding
- Represents a section within a document
Entity
- Properties: id, name, type, description, embedding
- Types: Concept, Component, Command, API, Configuration
Topic
- Properties: id, name, description
- Represents high-level topics/categories

Relationship Types

HAS_SECTION: Document → Section
MENTIONS: Section → Entity
RELATES_TO: Entity → Entity (semantic similarity)
BELONGS_TO: Entity → Topic
SIMILAR_TO: Document → Document (vector similarity)
NEXT: Section → Section (sequential order)
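
The schema above can also be written down as types, which makes it possible to reject malformed relationships before they are sent to Neo4j. This is a sketch under the assumption that each relationship type connects exactly the two labels listed; `RELATIONSHIP_SCHEMA` and `isValidRelationship` are hypothetical names.

```typescript
type NodeLabel = "Document" | "Section" | "Entity" | "Topic";

type RelType =
  | "HAS_SECTION"
  | "MENTIONS"
  | "RELATES_TO"
  | "BELONGS_TO"
  | "SIMILAR_TO"
  | "NEXT";

// Allowed source and target labels per relationship type, as listed above.
const RELATIONSHIP_SCHEMA: Record<RelType, { from: NodeLabel; to: NodeLabel }> = {
  HAS_SECTION: { from: "Document", to: "Section" },
  MENTIONS: { from: "Section", to: "Entity" },
  RELATES_TO: { from: "Entity", to: "Entity" },
  BELONGS_TO: { from: "Entity", to: "Topic" },
  SIMILAR_TO: { from: "Document", to: "Document" },
  NEXT: { from: "Section", to: "Section" },
};

// True when the (from, to) labels match the schema for the given type.
function isValidRelationship(type: RelType, from: NodeLabel, to: NodeLabel): boolean {
  const rule = RELATIONSHIP_SCHEMA[type];
  return rule.from === from && rule.to === to;
}

const ok = isValidRelationship("MENTIONS", "Section", "Entity");
const rejected = isValidRelationship("MENTIONS", "Document", "Entity");
```

Because `RELATIONSHIP_SCHEMA` is typed as a `Record` over `RelType`, the compiler flags a relationship type that is added to the union but missing from the table.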

GraphRAG Query Process

1. Query Embedding: Convert user query to vector
2. Semantic Search: Find relevant nodes using vector similarity
3. Graph Traversal: Expand context through relationships
4. Community Detection: Identify related concept clusters
5. Context Aggregation: Combine information from multiple paths
6. Response Generation: Use LLM with enriched context
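
The semantic search step can be sketched in isolation as cosine similarity plus a threshold. This is an in-memory illustration only: in the actual system the comparison would presumably run against a Neo4j vector index, and `cosineSimilarity`, `semanticSearch`, and the two-dimensional sample vectors are assumptions made for the example. The default threshold mirrors `SIMILARITY_THRESHOLD=0.75` from the configuration.

```typescript
interface EmbeddedNode {
  id: string;
  embedding: number[];
}

interface ScoredNode {
  id: string;
  score: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|); defined as 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every node, drop those below the threshold, return the best first.
function semanticSearch(
  queryEmbedding: number[],
  nodes: EmbeddedNode[],
  threshold = 0.75,
  maxResults = 5
): ScoredNode[] {
  return nodes
    .map((node) => ({ id: node.id, score: cosineSimilarity(queryEmbedding, node.embedding) }))
    .filter((hit) => hit.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, maxResults);
}

// Toy 2-dimensional vectors; real embeddings have hundreds of dimensions.
const hits = semanticSearch([1, 0], [
  { id: "a", embedding: [1, 0] }, // identical direction
  { id: "b", embedding: [1, 1] }, // 45 degrees away, below the threshold
  { id: "c", embedding: [0, 1] }, // orthogonal
  { id: "d", embedding: [2, 1] }, // close, above the threshold
]);
```

The surviving hits would then serve as seed nodes for the graph traversal step.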

API Examples

Query Documentation

// POST /api/query
{
  "query": "How to register an edge node in IEAM?",
  "maxResults": 5,
  "includeGraph": true
}

// Response
{
  "answer": "To register an edge node in IEAM...",
  "sources": [
    {
      "title": "Registering Edge Nodes",
      "url": "...",
      "relevance": 0.95
    }
  ],
  "graph": {
    "nodes": [...],
    "relationships": [...]
  }
}

Get Graph Statistics

// GET /api/graph/stats
{
  "nodes": {
    "Document": 150,
    "Section": 450,
    "Entity": 320,
    "Topic": 25
  },
  "relationships": {
    "HAS_SECTION": 450,
    "MENTIONS": 1200,
    "RELATES_TO": 850
  },
  "totalNodes": 945,
  "totalRelationships": 2500
}
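
A client can sanity-check a statistics response by recomputing the totals. The sketch below assumes that `totalNodes` and `totalRelationships` are plain sums of the per-label counts, which is consistent with the example numbers above but is not confirmed by the server code; `sumCounts` is a hypothetical helper.

```typescript
type Counts = Record<string, number>;

// Sum all per-label (or per-type) counts in one section of the response.
function sumCounts(counts: Counts): number {
  return Object.keys(counts).reduce((total, key) => total + counts[key], 0);
}

// Values copied from the example response above.
const stats = {
  nodes: { Document: 150, Section: 450, Entity: 320, Topic: 25 },
  relationships: { HAS_SECTION: 450, MENTIONS: 1200, RELATES_TO: 850 },
};

const totalNodes = sumCounts(stats.nodes); // 945
const totalRelationships = sumCounts(stats.relationships); // 2500
```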

Advanced Features

1. Community Detection

Automatically groups related concepts:

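// Assumes a GDS graph projection named 'myGraph' already exists
CALL gds.louvain.stream('myGraph')
YIELD nodeId, communityId
RETURN gds.util.asNode(nodeId).name AS name, communityId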

2. PageRank for Important Concepts

Identifies key concepts:

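// Assumes a GDS graph projection named 'myGraph' already exists
CALL gds.pageRank.stream('myGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score ORDER BY score DESC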

3. Shortest Path Queries

Finds connections between concepts:

MATCH path = shortestPath(
  (a:Entity {name: 'Edge Node'})-[*]-(b:Entity {name: 'Agent'})
)
RETURN path

Performance Optimization

Vector Indexes: Create vector indexes for fast similarity search
Batch Processing: Import documents in batches
Connection Pooling: Reuse Neo4j connections
Caching: Cache frequent queries
Parallel Processing: Process documents concurrently
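
As one way to approach the caching item, here is a minimal sketch of a least-recently-used cache keyed by query text. `LruCache` is a hypothetical class, not part of the project; it relies on the fact that a JavaScript `Map` iterates in insertion order, and it does not support storing `undefined` as a value.

```typescript
class LruCache<V> {
  private readonly entries = new Map<string, V>();

  constructor(private readonly capacity: number) {
    if (capacity < 1) throw new Error("capacity must be at least 1");
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert so the key becomes the most recently used entry.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  has(key: string): boolean {
    return this.entries.has(key);
  }

  get size(): number {
    return this.entries.size;
  }
}

const cache = new LruCache<string>(2);
cache.set("query A", "answer A");
cache.set("query B", "answer B");
cache.get("query A"); // touch A so that B becomes the eviction candidate
cache.set("query C", "answer C"); // capacity exceeded, "query B" is evicted
```

A production cache would likely also need a time-to-live so that answers are refreshed after the documentation is re-imported.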

Troubleshooting

Neo4j Connection Issues

# Test connection
npm run test:neo4j

# Check Neo4j logs in Aura console

Slow Embedding Generation

# For Ollama, ensure model is pulled
ollama pull nomic-embed-text

# Check Ollama is running
curl http://localhost:11434/api/tags

Import Fails

# Check HTML files exist
ls -la data/ieam-html

# Verify Neo4j credentials
npm run test:config

Development

# Run tests
npm test

# Run specific test
npm test -- graph-builder

# Lint code
npm run lint

# Format code
npm run format

# Type check
npm run type-check

Deployment

Docker Deployment

# Build image
docker build -t ieam-graphrag .

# Run container
docker run -p 3000:3000 --env-file .env ieam-graphrag

Cloud Deployment

See docs/DEPLOYMENT.md for cloud deployment guides.

Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Submit a pull request

License

ISC

Support

For issues and questions:

GitHub Issues: [Create an issue]
Documentation: See docs/ folder
Neo4j Community: https://community.neo4j.com/

Made with ❤️ for IEAM Documentation
