IEAM GraphRAG - Neo4j Knowledge Graph RAG System

A GraphRAG (Graph Retrieval-Augmented Generation) application for IBM Edge Application Manager (IEAM) documentation using Neo4j as the knowledge graph database.

Overview

This project implements a graph-based RAG system that:

  • Parses HTML documentation from local files
  • Constructs a knowledge graph in Neo4j with entities, relationships, and semantic connections
  • Supports multiple embedding providers (Ollama, OpenAI, etc.)
  • Provides a REST API for querying documentation using natural language
  • Leverages graph traversal for context-aware responses

Architecture

┌─────────────────┐
│  HTML Documents │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  HTML Parser    │
│  & Processor    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐     ┌──────────────┐
│  Entity         │────▶│  Embeddings  │
│  Extraction     │     │  (Ollama/    │
└────────┬────────┘     │   OpenAI)    │
         │              └──────────────┘
         ▼
┌─────────────────┐
│  Neo4j Graph    │
│  Construction   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  GraphRAG       │
│  Query Engine   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Express API    │
│  Server         │
└─────────────────┘

Features

1. Knowledge Graph Construction

  • Automatic entity extraction from HTML documentation
  • Relationship detection between concepts
  • Hierarchical document structure preservation
  • Semantic similarity connections

2. Multi-Provider Embedding Support

  • Ollama: Local embedding generation (nomic-embed-text, mxbai-embed-large)
  • OpenAI: Cloud-based embeddings (text-embedding-3-small, text-embedding-3-large)
  • Configurable: Easy to add new providers
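
One way to keep providers swappable is a small shared interface plus a registry keyed by the `EMBEDDING_PROVIDER` value. The sketch below is a minimal illustration under that assumption: `EmbeddingProvider`, `registerProvider`, `createProvider`, and `FakeProvider` are hypothetical names, not the contents of `src/embeddings/base.ts`, and the fake provider only stands in for a real Ollama or OpenAI call.

```typescript
// Hypothetical provider contract; the project's real base interface may differ.
interface EmbeddingProvider {
  readonly name: string;
  readonly dimensions: number;
  embed(texts: string[]): Promise<number[][]>;
}

type ProviderFactory = () => EmbeddingProvider;

// Registry keyed by the lower-cased provider name (e.g. the EMBEDDING_PROVIDER value).
const providerRegistry = new Map<string, ProviderFactory>();

function registerProvider(key: string, factory: ProviderFactory): void {
  providerRegistry.set(key.toLowerCase(), factory);
}

function createProvider(key: string): EmbeddingProvider {
  const factory = providerRegistry.get(key.toLowerCase());
  if (!factory) {
    const known = Array.from(providerRegistry.keys()).join(", ");
    throw new Error(`Unknown embedding provider "${key}". Known: ${known}`);
  }
  return factory();
}

// Stand-in provider for illustration only: folds character codes into a
// fixed-size vector. A real provider would call Ollama or OpenAI instead.
class FakeProvider implements EmbeddingProvider {
  readonly name = "fake";
  readonly dimensions = 8;

  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((text) => {
      const vec = new Array<number>(this.dimensions).fill(0);
      for (let i = 0; i < text.length; i++) {
        const slot = i % this.dimensions;
        vec[slot] = (vec[slot] ?? 0) + text.charCodeAt(i);
      }
      return vec;
    });
  }
}

registerProvider("fake", () => new FakeProvider());
```

Under this design, adding a provider means implementing the interface and registering one factory; the rest of the pipeline only ever sees `EmbeddingProvider`.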

3. GraphRAG Query Processing

  • Semantic search using vector similarity
  • Graph traversal for contextual information
  • Community detection for topic clustering
  • Multi-hop reasoning across related concepts
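
To make the vector-similarity step concrete, here is a minimal in-memory sketch of cosine similarity with a threshold and a result limit. It assumes cosine is the similarity measure (the README does not say which one is used), and `cosineSimilarity` and `topMatches` are illustrative names. In a deployed system this ranking would more likely be delegated to a Neo4j vector index than computed in application code.

```typescript
// Cosine similarity between two equal-length vectors; returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Candidate<T> {
  item: T;
  embedding: number[];
}

interface Scored<T> {
  item: T;
  score: number;
}

// Keep candidates scoring at or above the threshold, best first, capped at limit.
function topMatches<T>(
  query: number[],
  candidates: Candidate<T>[],
  threshold: number,
  limit: number
): Scored<T>[] {
  return candidates
    .map((c) => ({ item: c.item, score: cosineSimilarity(query, c.embedding) }))
    .filter((s) => s.score >= threshold)
    .sort((left, right) => right.score - left.score)
    .slice(0, limit);
}
```

The `threshold` parameter plays the role that a setting such as `SIMILARITY_THRESHOLD` would play in configuration.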

4. REST API

  • /api/query - Natural language queries
  • /api/graph/stats - Graph statistics
  • /api/graph/search - Entity search
  • /api/health - Health check

Prerequisites

  1. Neo4j Aura or a local Neo4j instance

  2. Embedding provider (choose one): Ollama (local) or OpenAI (requires an API key)

  3. Node.js >= 18.x

Installation

# Clone or navigate to the project
cd ieam-graphrag

# Install dependencies
npm install

# Copy environment template
cp .env.example .env

# Edit .env with your configuration
nano .env

Configuration

Create a .env file with the following:

# Server Configuration
PORT=3000
HOST=localhost

# Neo4j Configuration
NEO4J_URI=neo4j+s://your-instance.databases.neo4j.io
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your-password
NEO4J_DATABASE=neo4j

# Embedding Provider (ollama or openai)
EMBEDDING_PROVIDER=ollama

# Ollama Configuration (if using Ollama)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
OLLAMA_LLM_MODEL=llama3.2:3b

# OpenAI Configuration (if using OpenAI)
OPENAI_API_KEY=your-api-key
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
OPENAI_LLM_MODEL=gpt-4

# Data Paths
HTML_DOCS_PATH=./data/ieam-html
PROCESSED_DATA_PATH=./data/processed

# Graph Configuration
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
SIMILARITY_THRESHOLD=0.75
MAX_GRAPH_DEPTH=3
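
To illustrate how `CHUNK_SIZE` and `CHUNK_OVERLAP` interact, the sketch below splits text into overlapping windows. It assumes both values are measured in characters, which the configuration above does not state, and `chunkText` is a hypothetical helper rather than the project's actual splitter.

```typescript
// Split text into windows of chunkSize characters, where each window starts
// (chunkSize - overlap) characters after the previous one.
// Assumption: sizes are in characters, not tokens.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be at least 0 and less than chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a window has reached the end, to avoid a redundant tail chunk.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

With the defaults shown in the configuration, consecutive chunks share 200 characters, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.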

Usage

1. Import HTML Documentation to Neo4j

# Build the project
npm run build

# Parse HTML and import to Neo4j
npm run import

# This will:
# - Parse all HTML files in data/ieam-html
# - Extract entities and relationships
# - Generate embeddings
# - Create knowledge graph in Neo4j

2. Start the API Server

# Development mode with auto-reload
npm run dev

# Production mode
npm start

3. Query the Documentation

# Using curl
curl -X POST http://localhost:3000/api/query \
  -H "Content-Type: application/json" \
  -d '{"query": "How to register an edge node in IEAM?"}'

# Using the provided test script
npm run test:query
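
The same request can be prepared from TypeScript. The sketch below only builds the call so that it stays free of network side effects; `QueryRequest`, `HttpCall`, and `buildQueryCall` are hypothetical names, and the optional fields mirror the request body shown under API Examples.

```typescript
// Request body fields as they appear in this README's API example.
interface QueryRequest {
  query: string;
  maxResults?: number;
  includeGraph?: boolean;
}

// Plain description of an HTTP call, independent of any HTTP client.
interface HttpCall {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildQueryCall(baseUrl: string, request: QueryRequest): HttpCall {
  if (request.query.trim() === "") {
    throw new Error("query must not be empty");
  }
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: `${baseUrl.replace(/\/+$/, "")}/api/query`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(request),
  };
}
```

On Node.js 18 or later, the result can be passed to the built-in `fetch`, for example `fetch(call.url, call)`, and the JSON response read with `response.json()`.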

4. Explore the Graph

# Get graph statistics
curl http://localhost:3000/api/graph/stats

# Search for entities
curl "http://localhost:3000/api/graph/search?q=edge+node"

Project Structure

ieam-graphrag/
├── src/
│   ├── config/
│   │   └── index.ts              # Configuration management
│   ├── parsers/
│   │   ├── html-parser.ts        # HTML document parser
│   │   └── entity-extractor.ts   # Entity extraction
│   ├── embeddings/
│   │   ├── base.ts               # Base embedding interface
│   │   ├── ollama.ts             # Ollama provider
│   │   └── openai.ts             # OpenAI provider
│   ├── graph/
│   │   ├── neo4j-client.ts       # Neo4j connection
│   │   ├── graph-builder.ts      # Graph construction
│   │   └── graph-query.ts        # Graph queries
│   ├── graphrag/
│   │   ├── query-processor.ts    # Query processing
│   │   └── context-builder.ts    # Context aggregation
│   ├── api/
│   │   ├── server.ts             # Express server
│   │   └── routes.ts             # API routes
│   ├── utils/
│   │   ├── logger.ts             # Logging utility
│   │   └── helpers.ts            # Helper functions
│   └── index.ts                  # Main entry point
├── data/
│   ├── ieam-html/                # HTML documentation
│   └── processed/                # Processed data
├── docs/
│   ├── ARCHITECTURE.md           # Architecture details
│   ├── API.md                    # API documentation
│   └── NEO4J_SETUP.md            # Neo4j setup guide
├── tests/
│   └── integration/              # Integration tests
├── .env.example                  # Environment template
├── package.json
├── tsconfig.json
└── README.md

Neo4j Graph Schema

Node Types

  1. Document
    • Properties: id, title, url, content, embedding
    • Represents a documentation page

  2. Section
    • Properties: id, title, content, level, embedding
    • Represents a section within a document

  3. Entity
    • Properties: id, name, type, description, embedding
    • Types: Concept, Component, Command, API, Configuration

  4. Topic
    • Properties: id, name, description
    • Represents high-level topics/categories

Relationship Types

  1. HAS_SECTION: Document → Section
  2. MENTIONS: Section → Entity
  3. RELATES_TO: Entity → Entity (semantic similarity)
  4. BELONGS_TO: Entity → Topic
  5. SIMILAR_TO: Document → Document (vector similarity)
  6. NEXT: Section → Section (sequential order)
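
The schema above can be encoded as types so that invalid edges are caught before they reach the database. This is a minimal sketch: `NodeLabel`, `RelType`, `REL_ENDPOINTS`, and both helper functions are illustrative names, and matching nodes on an `id` property with `$fromId` / `$toId` parameters is an assumption about how the import addresses nodes.

```typescript
type NodeLabel = "Document" | "Section" | "Entity" | "Topic";

type RelType =
  | "HAS_SECTION"
  | "MENTIONS"
  | "RELATES_TO"
  | "BELONGS_TO"
  | "SIMILAR_TO"
  | "NEXT";

// Allowed source and target label for each relationship type, per the list above.
const REL_ENDPOINTS: Record<RelType, { from: NodeLabel; to: NodeLabel }> = {
  HAS_SECTION: { from: "Document", to: "Section" },
  MENTIONS: { from: "Section", to: "Entity" },
  RELATES_TO: { from: "Entity", to: "Entity" },
  BELONGS_TO: { from: "Entity", to: "Topic" },
  SIMILAR_TO: { from: "Document", to: "Document" },
  NEXT: { from: "Section", to: "Section" },
};

function isValidRelationship(type: RelType, from: NodeLabel, to: NodeLabel): boolean {
  const rule = REL_ENDPOINTS[type];
  return rule.from === from && rule.to === to;
}

// Build a parameterised Cypher statement that links two existing nodes.
// Assumption: nodes are looked up by their `id` property.
function mergeRelationshipCypher(type: RelType): string {
  const { from, to } = REL_ENDPOINTS[type];
  return (
    `MATCH (a:${from} {id: $fromId}), (b:${to} {id: $toId}) ` +
    `MERGE (a)-[:${type}]->(b)`
  );
}
```

Labels and relationship types cannot be passed as Cypher parameters, which is why they are interpolated here from a closed set of known values while the ids remain parameters.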

GraphRAG Query Process

  1. Query Embedding: Convert user query to vector
  2. Semantic Search: Find relevant nodes using vector similarity
  3. Graph Traversal: Expand context through relationships
  4. Community Detection: Identify related concept clusters
  5. Context Aggregation: Combine information from multiple paths
  6. Response Generation: Use LLM with enriched context
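
Steps 3 and 5 can be sketched in memory to show the idea. The code below expands a set of seed nodes breadth-first up to a hop limit (the role `MAX_GRAPH_DEPTH` appears to play) and then concatenates node text nearest-first within a character budget. All names are hypothetical, and the adjacency map stands in for what would normally be a variable-length Cypher pattern run against Neo4j.

```typescript
// Node id -> ids of directly connected nodes.
type Adjacency = Map<string, string[]>;

// Step 3: breadth-first expansion from the seed nodes, at most maxDepth hops.
// Returns each reached node id with the hop count at which it was first seen.
function expandContext(seeds: string[], graph: Adjacency, maxDepth: number): Map<string, number> {
  const depthById = new Map<string, number>();
  let frontier: string[] = [];
  for (const seed of seeds) {
    if (!depthById.has(seed)) {
      depthById.set(seed, 0);
      frontier.push(seed);
    }
  }
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const neighbor of graph.get(id) ?? []) {
        if (!depthById.has(neighbor)) {
          depthById.set(neighbor, depth);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return depthById;
}

// Step 5: join node texts, closest hops first, until the character budget is used.
function aggregateContext(
  depthById: Map<string, number>,
  textById: Map<string, string>,
  maxChars: number
): string {
  const ordered = Array.from(depthById.entries()).sort((l, r) => l[1] - r[1]);
  const parts: string[] = [];
  let used = 0;
  for (const [id] of ordered) {
    const text = textById.get(id);
    if (text === undefined) continue;
    if (used + text.length > maxChars) break;
    parts.push(text);
    used += text.length;
  }
  return parts.join("\n\n");
}
```

The string returned by `aggregateContext` is what would be placed in the LLM prompt in step 6, alongside the user's question.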

API Examples

Query Documentation

// POST /api/query
{
  "query": "How to register an edge node in IEAM?",
  "maxResults": 5,
  "includeGraph": true
}

// Response
{
  "answer": "To register an edge node in IEAM...",
  "sources": [
    {
      "title": "Registering Edge Nodes",
      "url": "...",
      "relevance": 0.95
    }
  ],
  "graph": {
    "nodes": [...],
    "relationships": [...]
  }
}

Get Graph Statistics

// GET /api/graph/stats
{
  "nodes": {
    "Document": 150,
    "Section": 450,
    "Entity": 320,
    "Topic": 25
  },
  "relationships": {
    "HAS_SECTION": 450,
    "MENTIONS": 1200,
    "RELATES_TO": 850
  },
  "totalNodes": 945,
  "totalRelationships": 2500
}

Advanced Features

1. Community Detection

Automatically groups related concepts:

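// Assumes a graph named 'myGraph' has already been projected with gds.graph.project
CALL gds.louvain.stream('myGraph')
YIELD nodeId, communityId
RETURN gds.util.asNode(nodeId).name AS name, communityId ORDER BY communityId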

2. PageRank for Important Concepts

Identifies key concepts:

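// Assumes 'myGraph' has already been projected with gds.graph.project
CALL gds.pageRank.stream('myGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score ORDER BY score DESC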

3. Shortest Path Queries

Finds connections between concepts:

MATCH path = shortestPath(
  (a:Entity {name: 'Edge Node'})-[*]-(b:Entity {name: 'Agent'})
)
RETURN path

Performance Optimization

  1. Vector Indexes: Create vector indexes for fast similarity search
  2. Batch Processing: Import documents in batches
  3. Connection Pooling: Reuse Neo4j connections
  4. Caching: Cache frequent queries
  5. Parallel Processing: Process documents concurrently
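
Two of these items are easy to sketch. The helpers below split work into fixed-size batches and generate a `CREATE VECTOR INDEX` statement. Both function names and the index name in the usage note are hypothetical; the Cypher syntax assumes Neo4j 5.13 or later, cosine similarity is an assumed choice, and the dimension must match whichever embedding model is configured (768 for `nomic-embed-text`, for instance).

```typescript
// Split items into consecutive batches of at most batchSize elements.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Build a vector index statement for the `embedding` property of a label.
// Assumptions: Neo4j 5.13+ syntax and cosine similarity.
function vectorIndexCypher(indexName: string, label: string, dimensions: number): string {
  return [
    `CREATE VECTOR INDEX ${indexName} IF NOT EXISTS`,
    `FOR (n:${label})`,
    "ON n.embedding",
    "OPTIONS { indexConfig: {",
    "  `vector.dimensions`: " + dimensions + ",",
    "  `vector.similarity_function`: 'cosine'",
    "} }",
  ].join("\n");
}
```

For example, `vectorIndexCypher("section_embedding", "Section", 768)` yields a statement that could be run once before import, and `toBatches(sections, 100)` yields groups that can each be written in a single transaction.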

Troubleshooting

Neo4j Connection Issues

# Test connection
npm run test:neo4j

# Check Neo4j logs in Aura console

Embedding Generation Slow

# For Ollama, ensure model is pulled
ollama pull nomic-embed-text

# Check Ollama is running
curl http://localhost:11434/api/tags

Import Fails

# Check HTML files exist
ls -la data/ieam-html

# Verify Neo4j credentials
npm run test:config

Development

# Run tests
npm test

# Run specific test
npm test -- graph-builder

# Lint code
npm run lint

# Format code
npm run format

# Type check
npm run type-check

Deployment

Docker Deployment

# Build image
docker build -t ieam-graphrag .

# Run container
docker run -p 3000:3000 --env-file .env ieam-graphrag

Cloud Deployment

See docs/DEPLOYMENT.md for cloud deployment guides.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

License

ISC

Resources

Support

For issues and questions:


Made with ❤️ for IEAM Documentation
