
patw/mGraphRAG


mGraphRAG

A GraphRAG demo using MongoDB Atlas as the graph and vector store, VoyageAI for embeddings, and a local LLM (via llama.cpp) for entity extraction and answering.

The pipeline ingests markdown documents, extracts entities and relationships using an LLM, stores them as a graph in MongoDB, then answers questions by anchoring on relevant entities via vector search and expanding context through multi-hop graph traversal ($graphLookup).

How it works

  1. Ingest — each document is chunked with LangChain's MarkdownTextSplitter. Each chunk is embedded (VoyageAI) and stored. An LLM extracts entities and relationships from the chunk, which are upserted as nodes and edges in MongoDB.
  2. Query — the question is embedded and used to find anchor entities via Atlas Vector Search. A $graphLookup traversal expands outward up to 2 hops, pulling in neighboring entities and evidence chunks. The assembled subgraph is printed (so you can see exactly what context the LLM received), then passed to the LLM for a final answer.
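The anchor-then-expand retrieval in step 2 can be sketched as two aggregation pipelines. This is a minimal sketch, not the scripts' actual code: the index, collection, and field names for entities follow this README, but the edge field names (source, target) and the stage options are assumptions.

```python
def anchor_pipeline(query_embedding, k=5):
    """$vectorSearch stage: find the anchor entities nearest the question."""
    return [
        {
            "$vectorSearch": {
                "index": "entity_vector_index",  # index name from this README
                "path": "embedding",
                "queryVector": query_embedding,
                "numCandidates": 20 * k,         # oversample, then keep top k
                "limit": k,
            }
        },
        {"$project": {"name": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


def expansion_pipeline(max_hops=2):
    """$graphLookup stage: walk relationship edges outward from each anchor."""
    return [
        {
            "$graphLookup": {
                "from": "relationships",
                "startWith": "$_id",           # assumed edge schema:
                "connectFromField": "target",  #   {source, target, ...}
                "connectToField": "source",
                "as": "neighborhood",
                "maxDepth": max_hops - 1,      # maxDepth is zero-based: 1 == 2 hops
            }
        }
    ]
```

Running db.entities.aggregate(anchor_pipeline(qvec) + expansion_pipeline()) against the graphrag database would reproduce the anchor-then-expand shape described above.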

Collections

Collection      Purpose
chunks          Raw text chunks with embeddings
entities        Graph nodes with embeddings
relationships   Graph edges linking entities
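For orientation, here are hypothetical document shapes for the three collections. The field names below are illustrative assumptions based on the descriptions above, not the scripts' actual schema.

```python
# Illustrative documents only — the real scripts' field names may differ.
chunk_doc = {
    "text": "Atlas Vector Search supports scalar quantization ...",
    "source_file": "docs/quantization.md",
    "embedding": [0.012, -0.034, 0.101],  # truncated; really 1024 dims (voyage-4)
}

entity_doc = {
    "name": "Scalar Quantization",
    "type": "technique",
    "description": "Compresses float32 vectors to int8 to cut index size.",
    "embedding": [0.050, 0.110, -0.021],  # truncated; really 1024 dims
}

relationship_doc = {
    "source": "Atlas Vector Search",   # edge: source --relation--> target
    "target": "Scalar Quantization",
    "relation": "supports",
    "chunk_id": "<id of evidence chunk>",  # placeholder, not a real ObjectId
}
```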

Installation

Option A — uv (recommended)

uv is a fast Python package manager. If you don't have it:

curl -LsSf https://astral.sh/uv/install.sh | sh

Create a virtual environment and install dependencies:

uv venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
uv pip install -r requirements.txt

Option B — pip

python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt

Configuration

Copy .env.example to .env and fill in your credentials:

cp .env.example .env

The .env file needs:

MONGO_URI=mongodb+srv://<user>:<pass>@<cluster>.mongodb.net/
VOYAGE_API_KEY=your-voyage-api-key

You will also need an Atlas cluster with Vector Search enabled (M10 or higher, or a local Atlas deployment). The ingest script will attempt to create the vector search indexes automatically; if that fails, create them manually in the Atlas UI:

  • Collection graphrag.entities, index name entity_vector_index, field embedding, 1024 dimensions, cosine similarity
  • Collection graphrag.chunks, index name chunk_vector_index, field embedding, 1024 dimensions, cosine similarity
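Both indexes share one definition shape. As a sketch, this is the "vectorSearch" index definition you would paste into the Atlas UI's JSON editor, or pass to pymongo's create_search_index (the helper name and default arguments here are illustrative, not from the scripts):

```python
def vector_index_definition(num_dimensions=1024, path="embedding"):
    """Atlas vectorSearch index definition matching the bullets above."""
    return {
        "fields": [
            {
                "type": "vector",          # a vector field for $vectorSearch
                "path": path,              # document field holding the embedding
                "numDimensions": num_dimensions,
                "similarity": "cosine",
            }
        ]
    }
```

Create one index per collection (entity_vector_index on graphrag.entities, chunk_vector_index on graphrag.chunks) using this same definition.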

The local LLM endpoint and model are hardcoded in both scripts (http://10.0.23.6:8086/v1, model gemma-4). Edit the llm = OpenAI(...) line to point at a different server or swap in a real OpenAI API key.
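A sketch of what that edit looks like. The base_url and model values are the hardcoded defaults quoted above; llama.cpp's server ignores the API key, so any placeholder string works. This is a config fragment that needs a live server to actually run.

```python
from openai import OpenAI

# Any OpenAI-compatible endpoint works here: llama.cpp's server,
# or api.openai.com with a real key and model name.
llm = OpenAI(base_url="http://10.0.23.6:8086/v1", api_key="unused")

reply = llm.chat.completions.create(
    model="gemma-4",
    messages=[{"role": "user", "content": "Extract entities from: ..."}],
)
```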

Usage

Ingest docs (run once, or whenever docs change — this takes a while):

python mgraphrag-ingest.py

Query:

# As a command-line argument
python mgraphrag-query.py "How do I create a vector search index with scalar quantization?"

# Piped from stdin
echo "What is binary quantization?" | python mgraphrag-query.py

# Interactive (type question, then Ctrl+D)
python mgraphrag-query.py

The query script prints the graph context (entities, relationships, and evidence chunks) that was passed to the LLM before showing the final answer, so you can trace exactly how the result was produced.

Notes

  • The embedding model is voyage-4. Changing it requires re-ingesting all documents since stored embeddings must match the query embedding space.
  • The ingest script drops and rebuilds all three collections on each run; previously ingested data is not preserved.
