Core-based Hierarchies for Efficient GraphRAG

This repository contains the code accompanying the paper
“Core-based Hierarchies for Efficient GraphRAG.”

Our implementation builds directly on the official GraphRAG benchmarking framework released by Microsoft. We introduce k-core–based hierarchical community construction algorithms as drop-in replacements for Leiden-based community detection. The overall pipeline, datasets, and evaluation procedures remain unchanged.

Code Base and Attribution

The code follows the same setup and execution procedure as the original GraphRAG benchmarking repository:

https://github.com/microsoft/graphrag-benchmarking-datasets

We retain:

The same dataset format
The same indexing and query-time execution flow
The same evaluation and head-to-head comparison framework

Our contributions are limited to modifications in the community construction and hierarchy generation components.

Running the Code

The code is executed using the standard GraphRAG commands. No additional steps are required beyond those described in the original GraphRAG repository.

Installing Necessary Packages

Run the following from the root directory:

pip install -e ./graphrag

Prepare Your Workspace

Create a new directory to hold your input files (e.g., for the Kevin Scott podcast benchmark):

mkdir -p ./kevin_scott_podcasts/input

Place the unzipped input files into the input folder.

Initialize the GraphRAG Workspace

python graphrag/cli/main.py init  --root kevin_scott_podcasts/

This command creates two files inside ./kevin_scott_podcasts/:

.env: contains the GRAPHRAG_API_KEY environment variable. Set it to your OpenAI or Azure OpenAI API key.
settings.yaml: configures the GraphRAG pipeline. You may edit this file to customize pipeline behavior.

Indexing

Indexing constructs the knowledge graph and hierarchical communities:

python graphrag/cli/main.py index --root ./kevin_scott_podcasts --community RkH

Here, RkH specifies the community construction algorithm. The default is the original GraphRAG Leiden algorithm. You may alternatively select one of our proposed heuristics:

RkH
M2hC
MRC

Indexing time varies depending on dataset size and may take several minutes to several hours.

Query

You can now run queries against your indexed dataset. For example, to perform global search on leaf-level (LF) communities:

python graphrag/cli/main.py query \
  --root ./kevin_scott_podcasts \
  --method global \
  --community-level LF \
  --query "What recurring topics do tech leaders emphasize in their discussions?"

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.github		.github
.semversioner		.semversioner
.vscode		.vscode
docs		docs
examples_notebooks		examples_notebooks
graphrag		graphrag
scripts		scripts
tests		tests
unified-search-app		unified-search-app
.gitignore		.gitignore
README.md		README.md
dictionary.txt		dictionary.txt
mkdocs.yaml		mkdocs.yaml
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Core-based Hierarchies for Efficient GraphRAG

Code Base and Attribution

Running the Code

Installing Necessary Packages

Prepare Your Workspace

Initialize the GraphRAG Workspace

Indexing

Query

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Core-based Hierarchies for Efficient GraphRAG

Code Base and Attribution

Running the Code

Installing Necessary Packages

Prepare Your Workspace

Initialize the GraphRAG Workspace

Indexing

Query

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages