HC-GraphRAG (Hierarchical Community Graph RAG) - Java Implementation

A Java implementation of Microsoft Research's GraphRAG approach, providing hierarchical community-based retrieval-augmented generation for enhanced question-answering over document collections.

Features

  • Hierarchical Community Detection: Leiden algorithm for multi-level community structure
  • ONNX-based Embeddings: High-performance embedding generation using ONNX Runtime
  • Multiple Search Modes: Local, global, and hybrid search strategies
  • LLM Integration: Support for AWS Bedrock and local LLM providers
  • Native Binary Support: GraalVM Native Image for instant startup and low memory footprint
  • Comprehensive Document Support: PDF, Word, HTML, and plain text processing

Quick Start

Prerequisites

  • Java 21 or higher
  • Maven 3.9.0 or higher
  • (Optional) GraalVM 21 for native image compilation

Installation

Using JAR

# Download the latest release
wget https://github.com/asopi/hcgraphrag-java/releases/latest/download/hcgraphrag.jar

# Run with Java
java -jar hcgraphrag.jar --help

Using Native Binary

# Download native binary for your platform
# Linux
wget https://github.com/asopi/hcgraphrag-java/releases/latest/download/hcgraphrag-linux-amd64
chmod +x hcgraphrag-linux-amd64
./hcgraphrag-linux-amd64 --help

# macOS
curl -LO https://github.com/asopi/hcgraphrag-java/releases/latest/download/hcgraphrag-darwin-amd64
chmod +x hcgraphrag-darwin-amd64
./hcgraphrag-darwin-amd64 --help

# Windows
# Download hcgraphrag-windows-amd64.exe from the releases page, then run it:
# hcgraphrag-windows-amd64.exe --help

Building from Source

# Clone the repository
git clone https://github.com/asopi/hcgraphrag-java.git
cd hcgraphrag-java

# Build with Maven
mvn clean package

# Run
java -jar target/hcgraphrag-*.jar --help

# Build native image (requires GraalVM)
mvn clean package -Pnative
./target/hcgraphrag --help

Usage

Basic Configuration

Create a config.yaml file:

io:
  input_dir: ./data/raw
  output_dir: ./output

embedding:
  model: sentence-transformers/all-MiniLM-L6-v2
  backend: onnx-runtime
  batch_size: 32

llm:
  provider: bedrock  # or 'local'
  model: anthropic.claude-3-haiku-20240307-v1:0
  region: us-east-1

graph:
  leiden:
    levels: 2
    resolution: 1.0

retrieval:
  top_k: 12

query:
  local_k: 6
  global_levels: [0, 1]

Command Line Interface

Load Documents and Build Knowledge Graph

# Process all documents in input directory
hcgraphrag load --config config.yaml

# With custom LLM settings
hcgraphrag load --config config.yaml \
  --llm-provider local \
  --llm-endpoint http://localhost:11434

Query the Knowledge Graph

# Local search (entity-based)
hcgraphrag query "What is machine learning?" --mode local

# Global search (community-based)
hcgraphrag query "Summarize the main topics" --mode global

# Hybrid search (combines both)
hcgraphrag query "Explain the key concepts" --mode hybrid
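
This README does not say how hybrid mode combines local and global results. One common way to merge two ranked lists is reciprocal rank fusion (RRF); the sketch below illustrates that technique only. The class name, the string identifiers, and the constant 60 are assumptions, and hcgraphrag itself may merge results differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative reciprocal rank fusion over two ranked lists of result ids. */
final class HybridMergeSketch {
    /** Smoothing constant commonly used with RRF (an assumption here). */
    static final double K = 60.0;

    static List<String> fuse(List<String> localRanked, List<String> globalRanked, int topK) {
        Map<String, Double> scores = new HashMap<>();
        accumulate(scores, localRanked);
        accumulate(scores, globalRanked);

        List<String> ids = new ArrayList<>(scores.keySet());
        // Highest fused score first; ties broken by id so the order is deterministic.
        ids.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return ids.subList(0, Math.min(topK, ids.size()));
    }

    /** Adds 1 / (K + rank) for each id, where rank starts at 1. */
    private static void accumulate(Map<String, Double> scores, List<String> ranked) {
        for (int i = 0; i < ranked.size(); i++) {
            scores.merge(ranked.get(i), 1.0 / (K + i + 1), Double::sum);
        }
    }
}
```

An id that appears in both lists accumulates two contributions, so it tends to rise above ids found by only one strategy.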

List Indexed Documents

hcgraphrag docs --config config.yaml

Programmatic Usage

import java.nio.file.Path;
import java.util.List;

import tech.asopi.hcgraphrag.config.HcGraphragConfig;
import tech.asopi.hcgraphrag.service.IngestService;
import tech.asopi.hcgraphrag.service.QueryService;
// Document and SearchResult also need imports; their packages are not shown here.

// Load configuration from the YAML file
HcGraphragConfig config = HcGraphragConfig.fromYamlFile(Path.of("config.yaml"));

// Process all documents in the input directory
IngestService ingestService = new IngestService(config);
List<Document> documents = ingestService.processAllDocuments();

// Query the knowledge graph; the mode is "local", "global", or "hybrid"
QueryService queryService = new QueryService(config);
List<SearchResult> results = queryService.query("Your question here", "hybrid");
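
Because the search mode is passed as a plain string, a typo such as "hybird" would only surface at query time. A small enum can catch that earlier. The following is a hypothetical helper, not part of the hcgraphrag API; only the three mode names come from this README.

```java
import java.util.Locale;

/** Hypothetical type-safe wrapper around the three documented search modes. */
enum SearchModeSketch {
    LOCAL, GLOBAL, HYBRID;

    /** Parses a user-supplied mode, ignoring case and surrounding whitespace. */
    static SearchModeSketch parse(String value) {
        try {
            return valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                    "Unknown search mode '" + value + "' (expected local, global, or hybrid)");
        }
    }

    /** The lowercase form used on the command line and in the query call. */
    String cliValue() {
        return name().toLowerCase(Locale.ROOT);
    }
}
```

A caller could then pass SearchModeSketch.parse(userInput).cliValue() as the mode argument.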

Architecture

Core Components

  1. IngestService: Document loading, chunking, and processing pipeline
  2. EmbeddingService: ONNX-based embedding generation for texts
  3. GraphService: Entity/relationship extraction and graph construction
  4. CommunityService: Hierarchical community detection using Leiden algorithm
  5. SearchService: Search combining vector similarity with graph traversal
  6. QueryService: Query orchestration with local/global/hybrid strategies
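
For intuition about the resolution setting used by community detection: Leiden is commonly run against a quality function such as modularity, where the resolution parameter scales the penalty on large communities. The sketch below computes resolution-weighted modularity for a small undirected graph. It assumes modularity is the objective, which this README does not state (other objectives, such as the Constant Potts Model, exist), and the class and method names are illustrative.

```java
/** Illustrative resolution-weighted modularity for an undirected, unweighted graph. */
final class ModularitySketch {

    /**
     * @param edges      each entry is {u, v}, listed once per undirected edge
     * @param community  community[i] is the community id of node i (ids start at 0)
     * @param resolution values above 1 favor smaller communities, below 1 larger ones
     */
    static double modularity(int[][] edges, int[] community, double resolution) {
        if (edges.length == 0) throw new IllegalArgumentException("graph has no edges");

        int numCommunities = 0;
        for (int c : community) numCommunities = Math.max(numCommunities, c + 1);

        double[] internalEdges = new double[numCommunities]; // edges with both ends inside
        double[] totalDegree = new double[numCommunities];   // sum of member degrees
        for (int[] e : edges) {
            int cu = community[e[0]];
            int cv = community[e[1]];
            totalDegree[cu] += 1;
            totalDegree[cv] += 1;
            if (cu == cv) internalEdges[cu] += 1;
        }

        double m = edges.length;
        double q = 0.0;
        for (int c = 0; c < numCommunities; c++) {
            double degreeShare = totalDegree[c] / (2.0 * m);
            // Observed internal edge fraction minus the scaled expected fraction.
            q += internalEdges[c] / m - resolution * degreeShare * degreeShare;
        }
        return q;
    }
}
```

For two triangles joined by a single edge, splitting the graph into the two triangles scores higher than placing all six nodes in one community, which is the kind of partition a community detection pass is looking for.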

Data Flow

Documents → Chunking → Embedding → Entity Extraction → Graph Construction
                                                      ↓
                                              Community Detection
                                                      ↓
                                              Community Summarization
                                                      ↓
                                                Index Building
                                                      ↓
                                                Query Processing
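
The chunking stage at the start of this flow is not specified in this README. As a rough illustration of what such a stage does, the sketch below splits text into fixed-size character windows with overlap. The strategy, the class name, and the parameters are all assumptions; the actual IngestService may chunk by tokens, sentences, or document structure.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sliding-window chunker (character-based, with overlap). */
final class ChunkerSketch {

    static List<String> chunk(String text, int size, int overlap) {
        if (size <= 0) throw new IllegalArgumentException("size must be > 0");
        if (overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("overlap must be in [0, size)");
        }
        List<String> chunks = new ArrayList<>();
        int step = size - overlap; // how far the window advances each time
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + size, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break; // last window reached the end of the text
        }
        return chunks;
    }
}
```

Overlap keeps a little shared context between neighboring chunks, so an entity mention that straddles a boundary still appears whole in at least one of them.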

Model Support

Embedding Models

  • Sentence-Transformers models (via ONNX)
  • Japanese language support with hotchpotch/japanese-embedding models
  • Custom ONNX models
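
Embeddings from sentence-embedding models like these are typically compared with cosine similarity. The sketch below shows a brute-force top-k lookup of that kind. It is a generic illustration with hypothetical names, not the project's SearchService, which may use an index structure or a different metric.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative brute-force top-k retrieval by cosine similarity. */
final class TopKSketch {

    /** Position of a stored vector and its similarity to the query. */
    record Hit(int index, double score) {}

    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) return 0.0; // treat zero vectors as dissimilar
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    static List<Hit> topK(float[] query, List<float[]> vectors, int k) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < vectors.size(); i++) {
            hits.add(new Hit(i, cosine(query, vectors.get(i))));
        }
        hits.sort((x, y) -> Double.compare(y.score(), x.score())); // best match first
        return hits.subList(0, Math.min(k, hits.size()));
    }
}
```

A linear scan like this is fine for small collections; larger ones usually call for an approximate nearest-neighbor index.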

LLM Providers

  • AWS Bedrock: Claude, Llama, and other Bedrock models
  • Local: Ollama, LlamaCpp, ONNX-GenAI
  • Custom: Extensible provider interface
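
The extensible provider interface is not documented here, so the following is only a sketch of what a pluggable provider could look like. Every name in it (LlmProvider, ProviderRegistry, complete) is hypothetical and should not be read as the project's actual API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical shape of a pluggable LLM provider mechanism. */
final class ProviderSketch {

    /** A provider turns a prompt into a completion. */
    interface LlmProvider {
        String complete(String prompt);
    }

    /** Maps provider names (for example "bedrock" or "local") to implementations. */
    static final class ProviderRegistry {
        private final Map<String, LlmProvider> providers = new HashMap<>();

        void register(String name, LlmProvider provider) {
            providers.put(name, provider);
        }

        LlmProvider get(String name) {
            LlmProvider provider = providers.get(name);
            if (provider == null) {
                throw new IllegalArgumentException("No LLM provider registered as '" + name + "'");
            }
            return provider;
        }
    }
}
```

Under this design, a custom backend would be added by registering one more implementation under a new name and selecting it through the llm.provider setting.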

Performance

Benchmarks

Operation              Throughput         Latency (P99)
Embedding Generation   50+ texts/sec      < 100 ms
Document Ingestion     1+ MB/sec          -
Local Search           100+ queries/sec   < 50 ms
Global Search          50+ queries/sec    < 100 ms
Index Persistence      20+ MB/sec         -
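
P99 latency is the value that 99% of measured requests stay at or below. For readers reproducing these measurements, here is a minimal sketch of the nearest-rank percentile calculation. The README does not say which percentile method the project's performance tests use, so treat the method and the names as assumptions.

```java
import java.util.Arrays;

/** Illustrative nearest-rank percentile over latency samples. */
final class PercentileSketch {

    /**
     * @param samples latency measurements in any consistent unit
     * @param p       percentile in the range (0, 100]
     */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        if (p <= 0.0 || p > 100.0) throw new IllegalArgumentException("p must be in (0, 100]");

        double[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        // Nearest rank: the smallest rank covering at least p percent of the samples.
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }
}
```

With few samples the P99 is effectively the maximum, so a meaningful figure needs at least a few hundred measurements.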

Memory Requirements

  • JVM Mode: 512 MB to 2 GB, depending on dataset size
  • Native Mode: 128 MB to 512 MB, with near-instant startup

Development

Running Tests

# Unit tests
mvn test

# Integration tests
mvn verify

# Performance tests
mvn test -Drun.performance.tests=true

Code Quality

# Run all quality checks
mvn clean verify -Pquality

# Individual checks
mvn checkstyle:check
mvn spotbugs:check
mvn jacoco:report

Contributing

Contributions are welcome! Please read our Contributing Guide for details on our code of conduct and the process for submitting pull requests.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Microsoft Research for the GraphRAG paper
  • Deep Java Library (DJL) team for ONNX Runtime integration
  • JGraphT developers for graph algorithms implementation

Citation

If you use HC-GraphRAG in your research, please cite:

@software{hcgraphrag2024,
  title = {HC-GraphRAG: Hierarchical Community Graph RAG Java Implementation},
  author = {ASOPI Tech},
  year = {2024},
  url = {https://github.com/asopi/hcgraphrag-java}
}

Support

For issues, questions, or suggestions, please open an issue on GitHub.
