This repository contains all code, data, and notebooks referenced in the Neo4j GraphRAG book. It provides a step-by-step guide to building a Retrieval-Augmented Generation (RAG) system using Neo4j, OpenAI, LangChain, and real-world financial filings.
- Ingest and model company, filing, and holdings data as a knowledge graph in Neo4j
- Run retrieval-augmented queries and semantic search over filings and financial data
- Use Jupyter notebooks for hands-on experimentation
- 01_PDF_Loader_for_Neo4j_GraphRAG.ipynb: Loads PDF and CSV data into Neo4j, builds the graph, and creates vector embeddings
- 02_Retreivers_notebook.ipynb: Interactive notebook for querying the graph, running RAG workflows, and experimenting with retrieval
- 03_Neo4j_agent_notebook.ipynb: (Optional) Agent-based exploration and advanced workflows
- data/: Contains:
  - Asset_Manager_Holdings.csv, Company_Filings.csv: Sample data files
  - form10k-sample/: Example 10-K filings for graph ingestion
- Additional_Info/: (Optional) Additional scripts and explained notebooks
- .env.sample: Template for required environment variables
- Clone the repository
- Install dependencies: pip install -r requirements.txt
- Configure environment variables:
  - Copy .env.sample to .env and fill in: OPENAI_API_KEY, NEO4J_URI, NEO4J_USERNAME, NEO4J_PASSWORD
- Start Neo4j AuraDB (see book for connection details)
- Run the loader notebook (01_PDF_Loader_for_Neo4j_GraphRAG.ipynb) to ingest data
- Open 02_Retreivers_notebook.ipynb to run retrievals and explore the graph
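
Before running the loader notebook, it can help to confirm that the four variables listed in the setup steps are actually set. The following is a minimal sketch, not the notebooks' own loading code: the helper names `parse_env_file` and `missing_vars` are hypothetical, and the parser only handles simple KEY=VALUE lines.

```python
import os

# Variable names taken from the setup steps above.
REQUIRED_VARS = ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def parse_env_file(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(env):
    """Return the required variable names that are absent or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    # Start from the process environment, then overlay values from .env if present.
    env = dict(os.environ)
    if os.path.exists(".env"):
        with open(".env", encoding="utf-8") as fh:
            env.update(parse_env_file(fh.read()))
    missing = missing_vars(env)
    print("Missing:", ", ".join(missing) if missing else "none")
```

Run it from the repository root after copying .env.sample to .env. Any name it reports as missing should be filled in before starting the notebooks.
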
- All sample data used in the book is in the data/ directory
- You may add or replace files in form10k-sample/ as needed for your experiments
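
To see what is in the data/ directory before ingesting it, a short standard-library sketch like the one below may be useful. It makes no assumptions about column names; `list_data_files` and `preview_csv` are hypothetical helpers, not part of the repository.

```python
import csv
from pathlib import Path


def list_data_files(data_dir="data"):
    """Return sorted paths (relative to data_dir) of every file under it."""
    root = Path(data_dir)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


def preview_csv(path, n_rows=3):
    """Return (header, first n_rows data rows) of a CSV file."""
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        header = next(reader, [])
        # zip stops after n_rows without reading the rest of the file.
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows
```

For example, `preview_csv("data/Company_Filings.csv")` returns the header row and the first three records, which is a quick way to check a replacement file before rerunning the loader notebook.
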
- You may safely delete the following if not used in your workflow:
  - Additional_Info/__pycache__/
  - Any script or notebook in Additional_Info/ not referenced in the book
  - neo4j_importer_model_2025-05-08.json if not required
- Keep only the notebooks and data files you use in your book walkthrough
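
The __pycache__ cleanup mentioned above can be scripted. This is a minimal sketch with hypothetical helper names. It defaults to a dry run, so nothing is deleted unless `dry_run=False` is passed.

```python
import shutil
from pathlib import Path


def find_pycache_dirs(root):
    """Return all __pycache__ directories below root, sorted."""
    return sorted(p for p in Path(root).rglob("__pycache__") if p.is_dir())


def remove_pycache(root, dry_run=True):
    """Delete __pycache__ directories under root; with dry_run=True, only report them."""
    found = find_pycache_dirs(root)
    if not dry_run:
        for path in found:
            # ignore_errors covers a directory already removed with its parent.
            shutil.rmtree(path, ignore_errors=True)
    return found
```

Calling `remove_pycache("Additional_Info")` lists what would be removed, and `remove_pycache("Additional_Info", dry_run=False)` performs the deletion. Scripts, notebooks, and the importer model JSON are left alone, so removing those stays a manual decision.
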
- Never commit secrets: .env is in .gitignore
- Use Jupyter or VS Code for interactive exploration
- All notebooks are designed to be run step-by-step as described in the book
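
If you have edited .gitignore, a quick check that .env is still listed can guard against committing secrets by accident. The sketch below is a deliberately simplified check and does not implement full gitignore pattern matching: it only looks for a literal, non-negated line, and `gitignore_lists` is a hypothetical helper.

```python
def gitignore_lists(gitignore_text, name=".env"):
    """Return True if name appears as a literal, non-negated pattern line."""
    for line in gitignore_text.splitlines():
        pattern = line.strip()
        # Skip blank lines, comments, and negated (!) patterns.
        if not pattern or pattern.startswith("#") or pattern.startswith("!"):
            continue
        if pattern.lstrip("/") == name:
            return True
    return False
```

For an authoritative answer that accounts for wildcards, negation, and nested ignore files, run `git check-ignore -v .env` inside the repository.
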
For questions or help, open an issue in this repo.