# HC-GraphRAG

A Java implementation of Microsoft Research's GraphRAG approach, providing hierarchical, community-based retrieval-augmented generation for question answering over document collections.
## Features

- Hierarchical Community Detection: Leiden algorithm for multi-level community structure
- ONNX-based Embeddings: High-performance embedding generation using ONNX Runtime
- Multiple Search Modes: Local, global, and hybrid search strategies
- LLM Integration: Support for AWS Bedrock and local LLM providers
- Native Binary Support: GraalVM Native Image for instant startup and a low memory footprint
- Comprehensive Document Support: PDF, Word, HTML, and plain text processing
## Requirements

- Java 21 or higher
- Maven 3.9.0 or higher
- (Optional) GraalVM 21 for native image compilation
## Installation

### Pre-built JAR

```bash
# Download the latest release
wget https://github.com/asopi/hcgraphrag-java/releases/latest/download/hcgraphrag.jar

# Run with Java
java -jar hcgraphrag.jar --help
```

### Native Binaries

```bash
# Download the native binary for your platform

# Linux
wget https://github.com/asopi/hcgraphrag-java/releases/latest/download/hcgraphrag-linux-amd64
chmod +x hcgraphrag-linux-amd64
./hcgraphrag-linux-amd64 --help

# macOS
wget https://github.com/asopi/hcgraphrag-java/releases/latest/download/hcgraphrag-darwin-amd64
chmod +x hcgraphrag-darwin-amd64
./hcgraphrag-darwin-amd64 --help

# Windows
# Download hcgraphrag-windows-amd64.exe and run it
```

### Build from Source

```bash
# Clone the repository
git clone https://github.com/asopi/hcgraphrag-java.git
cd hcgraphrag-java

# Build with Maven
mvn clean package

# Run
java -jar target/hcgraphrag-*.jar --help

# Build a native image (requires GraalVM)
mvn clean package -Pnative
./target/hcgraphrag --help
```

## Configuration

Create a `config.yaml` file:
```yaml
io:
  input_dir: ./data/raw
  output_dir: ./output

embedding:
  model: sentence-transformers/all-MiniLM-L6-v2
  backend: onnx-runtime
  batch_size: 32

llm:
  provider: bedrock  # or 'local'
  model: anthropic.claude-3-haiku-20240307
  region: us-east-1

graph:
  leiden:
    levels: 2
    resolution: 1.0

retrieval:
  top_k: 12

query:
  local_k: 6
  global_levels: [0, 1]
```

## Usage

### Ingest Documents

```bash
# Process all documents in the input directory
hcgraphrag load --config config.yaml

# With custom LLM settings
hcgraphrag load --config config.yaml \
  --llm-provider local \
  --llm-endpoint http://localhost:11434
```
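The `retrieval.top_k` and `query.local_k` settings in the configuration above control how many nearest chunks are pulled from the vector index per query. The core operation can be sketched in plain Java; the class and method names here are illustrative, not part of the HC-GraphRAG API:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.IntStream;

// Illustrative sketch: cosine-similarity top-k selection over chunk embeddings.
// Names (TopK, cosine, topK) are hypothetical, not HC-GraphRAG API.
public class TopK {

    // Cosine similarity between two equal-length vectors.
    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Indices of the k chunks most similar to the query embedding,
    // ordered by decreasing similarity.
    static List<Integer> topK(float[] query, float[][] chunks, int k) {
        return IntStream.range(0, chunks.length)
                .boxed()
                .sorted(Comparator.comparingDouble((Integer i) -> -cosine(query, chunks[i])))
                .limit(k)
                .toList();
    }

    public static void main(String[] args) {
        float[][] chunks = { {1f, 0f}, {0f, 1f}, {0.9f, 0.1f} };
        float[] query = {1f, 0f};
        System.out.println(topK(query, chunks, 2)); // prints [0, 2]
    }
}
```

A production index would use an approximate-nearest-neighbor structure rather than this brute-force scan, but the ranking criterion is the same.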
### Query

```bash
# Local search (entity-based)
hcgraphrag query "What is machine learning?" --mode local

# Global search (community-based)
hcgraphrag query "Summarize the main topics" --mode global

# Hybrid search (combines both)
hcgraphrag query "Explain the key concepts" --mode hybrid
```

```bash
hcgraphrag docs --config config.yaml
```

### Java API

```java
import java.nio.file.Path;
import java.util.List;

import tech.asopi.hcgraphrag.config.HcGraphragConfig;
import tech.asopi.hcgraphrag.service.IngestService;
import tech.asopi.hcgraphrag.service.QueryService;

// Load configuration
HcGraphragConfig config = HcGraphragConfig.fromYamlFile(Path.of("config.yaml"));

// Process documents
IngestService ingestService = new IngestService(config);
List<Document> documents = ingestService.processAllDocuments();

// Query
QueryService queryService = new QueryService(config);
List<SearchResult> results = queryService.query("Your question here", "hybrid");
```

## Architecture

- IngestService: Document loading, chunking, and processing pipeline
- EmbeddingService: ONNX-based embedding generation for texts
- GraphService: Entity/relationship extraction and graph construction
- CommunityService: Hierarchical community detection using Leiden algorithm
- SearchService: Multi-modal search with vector similarity and graph traversal
- QueryService: Query orchestration with local/global/hybrid strategies
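A common way to realize the hybrid strategy the QueryService offers is a weighted fusion of local (entity-level) and global (community-level) relevance scores. The sketch below illustrates the idea only; all names are hypothetical and not taken from the HC-GraphRAG codebase:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch of hybrid ranking: weighted fusion of local
// (entity-based) and global (community-based) relevance scores.
// All names here are hypothetical, not HC-GraphRAG API.
public class HybridFusion {

    // Merge two score maps: fused = w * local + (1 - w) * global,
    // returned sorted by descending fused score.
    static Map<String, Double> fuse(Map<String, Double> local,
                                    Map<String, Double> global,
                                    double w) {
        Map<String, Double> fused = new HashMap<>();
        local.forEach((id, s) -> fused.merge(id, w * s, Double::sum));
        global.forEach((id, s) -> fused.merge(id, (1 - w) * s, Double::sum));
        return fused.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        Map<String, Double> local  = Map.of("chunk-a", 0.9, "chunk-b", 0.2);
        Map<String, Double> global = Map.of("chunk-b", 0.8, "chunk-c", 0.6);
        // chunk-b ranks first: it scores in both the local and global lists.
        System.out.println(fuse(local, global, 0.5));
    }
}
```

Other fusion schemes (e.g. reciprocal rank fusion) are equally plausible; the point is that results scored by both strategies are boosted.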
```
Documents → Chunking → Embedding → Entity Extraction → Graph Construction
                                                               ↓
                                                      Community Detection
                                                               ↓
                                                    Community Summarization
                                                               ↓
                                                         Index Building
                                                               ↓
                                                        Query Processing
```
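The first stage of the pipeline above splits each document into overlapping chunks before embedding, so that entities spanning a chunk boundary still appear whole in at least one chunk. A minimal fixed-size chunker might look like this (the sizes and names are illustrative, not the project's actual defaults):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the chunking stage: fixed-size character windows
// with overlap. Parameters and names are hypothetical, not HC-GraphRAG defaults.
public class Chunker {

    static List<String> chunk(String text, int size, int overlap) {
        if (overlap >= size) throw new IllegalArgumentException("overlap must be < size");
        List<String> chunks = new ArrayList<>();
        int step = size - overlap;  // how far each window advances
        for (int start = 0; start < text.length(); start += step) {
            chunks.add(text.substring(start, Math.min(start + size, text.length())));
            if (start + size >= text.length()) break;  // last window reached the end
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Each chunk shares its last 2 characters with the next chunk.
        System.out.println(chunk("abcdefghij", 4, 2)); // prints [abcd, cdef, efgh, ghij]
    }
}
```

Real ingestion would typically chunk on token or sentence boundaries rather than raw characters, but the windowing logic is the same.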
### Embedding Models

- Sentence-Transformers models (via ONNX)
- Japanese language support with `hotchpotch/japanese-embedding` models
- Custom ONNX models

### LLM Providers

- AWS Bedrock: Claude, Llama, and other Bedrock models
- Local: Ollama, LlamaCpp, ONNX-GenAI
- Custom: Extensible provider interface
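The "extensible provider interface" bullet implies that new backends plug in behind a common abstraction. A plausible shape for such an interface is sketched below; the names are hypothetical, so check the actual source for the real SPI:

```java
// Illustrative sketch of an extensible LLM provider SPI. The interface and
// class names are hypothetical, not the project's actual API.
public class ProviderDemo {

    // Minimal contract every backend (Bedrock, Ollama, ...) would implement.
    interface LlmProvider {
        String name();
        String complete(String prompt);
    }

    // A trivial echo backend standing in for a real local provider.
    static class EchoProvider implements LlmProvider {
        public String name() { return "echo"; }
        public String complete(String prompt) { return "echo: " + prompt; }
    }

    // Callers depend only on the interface, so backends are swappable
    // without touching the calling code.
    static String summarize(LlmProvider llm, String text) {
        return llm.complete("Summarize: " + text);
    }

    public static void main(String[] args) {
        LlmProvider provider = new EchoProvider();
        System.out.println(summarize(provider, "graph communities"));
        // prints: echo: Summarize: graph communities
    }
}
```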
## Performance

| Operation | Throughput | Latency (P99) |
|---|---|---|
| Embedding Generation | 50+ texts/sec | < 100ms |
| Document Ingestion | 1+ MB/sec | - |
| Local Search | 100+ queries/sec | < 50ms |
| Global Search | 50+ queries/sec | < 100ms |
| Index Persistence | 20+ MB/sec | - |
### Memory Usage

- JVM Mode: 512MB - 2GB depending on dataset size
- Native Mode: 128MB - 512MB with instant startup
## Testing

```bash
# Unit tests
mvn test

# Integration tests
mvn verify

# Performance tests
mvn test -Drun.performance.tests=true
```

## Code Quality

```bash
# Run all quality checks
mvn clean verify -Pquality

# Individual checks
mvn checkstyle:check
mvn spotbugs:check
mvn jacoco:report
```

## Contributing

Contributions are welcome! Please read our Contributing Guide for details on our code of conduct and the process for submitting pull requests.
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- Microsoft Research for the GraphRAG paper
- Deep Java Library (DJL) team for ONNX Runtime integration
- JGraphT developers for graph algorithms implementation
## Citation

If you use HC-GraphRAG in your research, please cite:

```bibtex
@software{hcgraphrag2024,
  title  = {HC-GraphRAG: Hierarchical Community Graph RAG Java Implementation},
  author = {ASOPI Tech},
  year   = {2024},
  url    = {https://github.com/asopi/hcgraphrag-java}
}
```

## Support

For issues, questions, or suggestions, please open an issue on GitHub.