# HC-GraphRAG

A Java implementation of Microsoft Research's GraphRAG approach, providing hierarchical, community-based retrieval-augmented generation for question answering over document collections.
## Features

- Hierarchical Community Detection: Leiden algorithm for multi-level community structure
- ONNX-based Embeddings: High-performance embedding generation using ONNX Runtime
- Multiple Search Modes: Local, global, and hybrid search strategies
- LLM Integration: Support for AWS Bedrock and local LLM providers
- Native Binary Support: GraalVM Native Image for instant startup and a low memory footprint
- Comprehensive Document Support: PDF, Word, HTML, and plain text processing
## Requirements

- Java 21 or higher
- Maven 3.9.0 or higher
- (Optional) GraalVM 21 for native image compilation
## Installation

### Pre-built JAR

```bash
# Download the latest release
wget https://github.com/asopi/hcgraphrag-java/releases/latest/download/hcgraphrag.jar

# Run with Java
java -jar hcgraphrag.jar --help
```

### Native Binaries

```bash
# Download the native binary for your platform

# Linux
wget https://github.com/asopi/hcgraphrag-java/releases/latest/download/hcgraphrag-linux-amd64
chmod +x hcgraphrag-linux-amd64
./hcgraphrag-linux-amd64 --help

# macOS
wget https://github.com/asopi/hcgraphrag-java/releases/latest/download/hcgraphrag-darwin-amd64
chmod +x hcgraphrag-darwin-amd64
./hcgraphrag-darwin-amd64 --help

# Windows
# Download hcgraphrag-windows-amd64.exe and run it
```

### Build from Source

```bash
# Clone the repository
git clone https://github.com/asopi/hcgraphrag-java.git
cd hcgraphrag-java

# Build with Maven
mvn clean package

# Run
java -jar target/hcgraphrag-*.jar --help

# Build a native image (requires GraalVM)
mvn clean package -Pnative
./target/hcgraphrag --help
```

## Configuration

Create a `config.yaml` file:
```yaml
io:
  input_dir: ./data/raw
  output_dir: ./output

embedding:
  model: sentence-transformers/all-MiniLM-L6-v2
  backend: onnx-runtime
  batch_size: 32

llm:
  provider: bedrock  # or 'local'
  model: anthropic.claude-3-haiku-20240307
  region: us-east-1

graph:
  leiden:
    levels: 2
    resolution: 1.0

retrieval:
  top_k: 12

query:
  local_k: 6
  global_levels: [0, 1]
```

## Usage

### Ingest Documents

```bash
# Process all documents in the input directory
hcgraphrag load --config config.yaml

# With custom LLM settings
hcgraphrag load --config config.yaml \
  --llm-provider local \
  --llm-endpoint http://localhost:11434
```
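The `retrieval.top_k` and `query.local_k` settings in the configuration above control how many nearest chunks are pulled from the vector index per query. The core operation can be sketched in plain Java; the class and method names here are illustrative, not part of the HC-GraphRAG API:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.IntStream;

// Illustrative sketch: cosine-similarity top-k selection over chunk embeddings.
// Names (TopK, cosine, topK) are hypothetical, not HC-GraphRAG API.
public class TopK {

    // Cosine similarity between two equal-length vectors.
    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Indices of the k chunks most similar to the query embedding,
    // ordered by decreasing similarity.
    static List<Integer> topK(float[] query, float[][] chunks, int k) {
        return IntStream.range(0, chunks.length)
                .boxed()
                .sorted(Comparator.comparingDouble((Integer i) -> -cosine(query, chunks[i])))
                .limit(k)
                .toList();
    }

    public static void main(String[] args) {
        float[][] chunks = { {1f, 0f}, {0f, 1f}, {0.9f, 0.1f} };
        float[] query = {1f, 0f};
        System.out.println(topK(query, chunks, 2)); // prints [0, 2]
    }
}
```

A production index would use an approximate-nearest-neighbor structure rather than this brute-force scan, but the ranking criterion is the same.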
### Query

```bash
# Local search (entity-based)
hcgraphrag query "What is machine learning?" --mode local

# Global search (community-based)
hcgraphrag query "Summarize the main topics" --mode global

# Hybrid search (combines both)
hcgraphrag query "Explain the key concepts" --mode hybrid
```

```bash
hcgraphrag docs --config config.yaml
```

### Java API

```java
import java.nio.file.Path;
import java.util.List;

import tech.asopi.hcgraphrag.config.HcGraphragConfig;
import tech.asopi.hcgraphrag.service.IngestService;
import tech.asopi.hcgraphrag.service.QueryService;

// Load configuration
HcGraphragConfig config = HcGraphragConfig.fromYamlFile(Path.of("config.yaml"));

// Process documents
IngestService ingestService = new IngestService(config);
List<Document> documents = ingestService.processAllDocuments();

// Query
QueryService queryService = new QueryService(config);
List<SearchResult> results = queryService.query("Your question here", "hybrid");
```

## Architecture

- IngestService: Document loading, chunking, and processing pipeline
- EmbeddingService: ONNX-based embedding generation for texts
- GraphService: Entity/relationship extraction and graph construction
- CommunityService: Hierarchical community detection using Leiden algorithm
- SearchService: Multi-modal search with vector similarity and graph traversal
- QueryService: Query orchestration with local/global/hybrid strategies
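A common way to realize the hybrid strategy the QueryService offers is a weighted fusion of local (entity-level) and global (community-level) relevance scores. The sketch below illustrates the idea only; all names are hypothetical and not taken from the HC-GraphRAG codebase:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch of hybrid ranking: weighted fusion of local
// (entity-based) and global (community-based) relevance scores.
// All names here are hypothetical, not HC-GraphRAG API.
public class HybridFusion {

    // Merge two score maps: fused = w * local + (1 - w) * global,
    // returned sorted by descending fused score.
    static Map<String, Double> fuse(Map<String, Double> local,
                                    Map<String, Double> global,
                                    double w) {
        Map<String, Double> fused = new HashMap<>();
        local.forEach((id, s) -> fused.merge(id, w * s, Double::sum));
        global.forEach((id, s) -> fused.merge(id, (1 - w) * s, Double::sum));
        return fused.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        Map<String, Double> local  = Map.of("chunk-a", 0.9, "chunk-b", 0.2);
        Map<String, Double> global = Map.of("chunk-b", 0.8, "chunk-c", 0.6);
        // chunk-b ranks first: it scores in both the local and global lists.
        System.out.println(fuse(local, global, 0.5));
    }
}
```

Other fusion schemes (e.g. reciprocal rank fusion) are equally plausible; the point is that results scored by both strategies are boosted.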
```
Documents → Chunking → Embedding → Entity Extraction → Graph Construction
                                                               ↓
                                                      Community Detection
                                                               ↓
                                                    Community Summarization
                                                               ↓
                                                         Index Building
                                                               ↓
                                                        Query Processing
```
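The first stage of the pipeline above splits each document into overlapping chunks before embedding, so that entities spanning a chunk boundary still appear whole in at least one chunk. A minimal fixed-size chunker might look like this (the sizes and names are illustrative, not the project's actual defaults):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the chunking stage: fixed-size character windows
// with overlap. Parameters and names are hypothetical, not HC-GraphRAG defaults.
public class Chunker {

    static List<String> chunk(String text, int size, int overlap) {
        if (overlap >= size) throw new IllegalArgumentException("overlap must be < size");
        List<String> chunks = new ArrayList<>();
        int step = size - overlap;  // how far each window advances
        for (int start = 0; start < text.length(); start += step) {
            chunks.add(text.substring(start, Math.min(start + size, text.length())));
            if (start + size >= text.length()) break;  // last window reached the end
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Each chunk shares its last 2 characters with the next chunk.
        System.out.println(chunk("abcdefghij", 4, 2)); // prints [abcd, cdef, efgh, ghij]
    }
}
```

Real ingestion would typically chunk on token or sentence boundaries rather than raw characters, but the windowing logic is the same.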
### Embedding Models

- Sentence-Transformers models (via ONNX)
- Japanese language support with `hotchpotch/japanese-embedding` models
- Custom ONNX models

### LLM Providers

- AWS Bedrock: Claude, Llama, and other Bedrock models
- Local: Ollama, LlamaCpp, ONNX-GenAI
- Custom: Extensible provider interface
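The "extensible provider interface" bullet implies that new backends plug in behind a common abstraction. A plausible shape for such an interface is sketched below; the names are hypothetical, so check the actual source for the real SPI:

```java
// Illustrative sketch of an extensible LLM provider SPI. The interface and
// class names are hypothetical, not the project's actual API.
public class ProviderDemo {

    // Minimal contract every backend (Bedrock, Ollama, ...) would implement.
    interface LlmProvider {
        String name();
        String complete(String prompt);
    }

    // A trivial echo backend standing in for a real local provider.
    static class EchoProvider implements LlmProvider {
        public String name() { return "echo"; }
        public String complete(String prompt) { return "echo: " + prompt; }
    }

    // Callers depend only on the interface, so backends are swappable
    // without touching the calling code.
    static String summarize(LlmProvider llm, String text) {
        return llm.complete("Summarize: " + text);
    }

    public static void main(String[] args) {
        LlmProvider provider = new EchoProvider();
        System.out.println(summarize(provider, "graph communities"));
        // prints: echo: Summarize: graph communities
    }
}
```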
## Performance

| Operation | Throughput | Latency (P99) |
|---|---|---|
| Embedding Generation | 50+ texts/sec | < 100ms |
| Document Ingestion | 1+ MB/sec | - |
| Local Search | 100+ queries/sec | < 50ms |
| Global Search | 50+ queries/sec | < 100ms |
| Index Persistence | 20+ MB/sec | - |
### Memory Usage

- JVM Mode: 512MB - 2GB depending on dataset size
- Native Mode: 128MB - 512MB with instant startup
## Testing

```bash
# Unit tests
mvn test

# Integration tests
mvn verify

# Performance tests
mvn test -Drun.performance.tests=true
```

## Code Quality

```bash
# Run all quality checks
mvn clean verify -Pquality

# Individual checks
mvn checkstyle:check
mvn spotbugs:check
mvn jacoco:report
```

## Contributing

Contributions are welcome! Please read our Contributing Guide for details on our code of conduct and the process for submitting pull requests.
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- Microsoft Research for the GraphRAG paper
- Deep Java Library (DJL) team for ONNX Runtime integration
- JGraphT developers for graph algorithms implementation
## Citation

If you use HC-GraphRAG in your research, please cite:

```bibtex
@software{hcgraphrag2024,
  title  = {HC-GraphRAG: Hierarchical Community Graph RAG Java Implementation},
  author = {ASOPI Tech},
  year   = {2024},
  url    = {https://github.com/asopi/hcgraphrag-java}
}
```

## Support

For issues, questions, or suggestions, please open an issue on GitHub.