Graph-Code: A Graph-Based RAG System for Python Codebases

Graph-Code is a Retrieval-Augmented Generation (RAG) system that analyzes Python repositories, builds a knowledge graph of their structure, and answers natural-language questions about the codebase's structure and relationships.

🚀 Features

AST-based Code Analysis: Deep parsing of Python files to extract classes, functions, methods, and their relationships
Knowledge Graph Storage: Uses Memgraph to store codebase structure as an interconnected graph
Natural Language Querying: Ask questions about your codebase in plain English
AI-Powered Cypher Generation: Leverages Google Gemini to translate natural language to Cypher queries
Code Snippet Retrieval: Retrieves the actual source code of the functions and methods a query returns
Dependency Analysis: Parses pyproject.toml to understand external dependencies
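
As a rough illustration of the AST-based analysis listed above, the sketch below uses Python's standard ast module to collect classes, module-level functions, and methods from source text. The function name extract_definitions and the shape of the returned dictionary are assumptions made for this example; they are not the actual interface of repo_parser.py.

```python
import ast


def extract_definitions(source: str) -> dict[str, list[str]]:
    """Collect class, function, and method names from Python source text."""
    tree = ast.parse(source)
    found: dict[str, list[str]] = {"classes": [], "functions": [], "methods": []}
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)

    # Only inspect top-level statements, so nested helper functions are
    # not mistaken for module-level functions.
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            found["classes"].append(node.name)
            for item in node.body:
                if isinstance(item, func_types):
                    # Qualify each method with the name of its class.
                    found["methods"].append(f"{node.name}.{item.name}")
        elif isinstance(node, func_types):
            found["functions"].append(node.name)
    return found


sample = '''
class User:
    def login(self):
        pass

def connect():
    pass
'''
print(extract_definitions(sample))
```

In a graph-backed design, each collected name would typically become a node, with the class-to-method nesting recorded as a relationship.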

🏗️ Architecture

The system consists of two main components:

Repository Parser (repo_parser.py): Analyzes Python codebases and ingests data into Memgraph
RAG System (codebase_rag/): Interactive CLI for querying the stored knowledge graph

Core Components

Graph Database: Memgraph for storing code structure as nodes and relationships
LLM Integration: Google Gemini for natural language processing
Code Analysis: AST traversal for extracting code elements
Query Tools: Specialized tools for graph querying and code retrieval

📋 Prerequisites

Python 3.12+
Docker & Docker Compose (for Memgraph)
Google Gemini API key
uv package manager

🛠️ Installation

Clone the repository:

git clone <repository-url>
cd graph-code

Install dependencies:

uv sync

Set up environment variables:

cp .env.example .env
# Edit .env with your configuration

Required environment variables:

GEMINI_API_KEY=your-api-key
GEMINI_MODEL_ID=gemini-2.5-pro
MODEL_CYPHER_ID=gemini-2.5-flash-lite-preview-06-17
MEMGRAPH_HOST=localhost
MEMGRAPH_PORT=7687

Start Memgraph database:

docker-compose up -d

🎯 Usage

Step 1: Parse a Repository

Parse and ingest a Python repository into the knowledge graph:

python repo_parser.py /path/to/your/python/repo --clean

Options:

--clean: Clear existing data before parsing
--host: Memgraph host (default: localhost)
--port: Memgraph port (default: 7687)
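
The options above could be declared with the standard argparse module roughly as follows. This is a sketch only: whether repo_parser.py actually uses argparse, and the names build_parser and repo_path, are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the positional repository path and the three documented options."""
    parser = argparse.ArgumentParser(
        description="Parse a Python repository and ingest it into Memgraph."
    )
    parser.add_argument("repo_path", help="Path to the repository to analyze")
    parser.add_argument(
        "--clean", action="store_true", help="Clear existing data before parsing"
    )
    parser.add_argument("--host", default="localhost", help="Memgraph host")
    parser.add_argument("--port", type=int, default=7687, help="Memgraph port")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["/path/to/your/python/repo", "--clean"])
print(args.repo_path, args.clean, args.host, args.port)
```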

Step 2: Query the Codebase

Start the interactive RAG CLI:

python -m codebase_rag.main --repo-path /path/to/your/repo

Example queries:

"Show me all classes that contain 'user' in their name"
"Find functions related to database operations"
"What methods does the User class have?"
"Show me functions that handle authentication"
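
For orientation, two of these questions might translate into Cypher along the following lines. In the real system the query text is generated by the Gemini model; the hand-written helpers below only illustrate the kind of query involved. The node labels and the DEFINES_METHOD relationship come from the Graph Schema section, while the name property and both helper function names are assumptions.

```python
def methods_of_class_query(class_name: str) -> tuple[str, dict[str, str]]:
    """Build a parameterized Cypher query listing the methods of one class."""
    query = (
        "MATCH (c:Class {name: $class_name})-[:DEFINES_METHOD]->(m:Method) "
        "RETURN m.name AS method_name ORDER BY method_name"
    )
    return query, {"class_name": class_name}


def classes_containing_query(fragment: str) -> tuple[str, dict[str, str]]:
    """Build a case-insensitive substring match on class names."""
    query = (
        "MATCH (c:Class) "
        "WHERE toLower(c.name) CONTAINS toLower($fragment) "
        "RETURN c.name AS class_name"
    )
    return query, {"fragment": fragment}


query, params = methods_of_class_query("User")
print(query)
print(params)
```

Passing values as query parameters rather than formatting them into the string keeps user input out of the query text.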

📊 Graph Schema

The knowledge graph uses the following node types and relationships:

Node Types

Project: Root node representing the entire repository
Package: Python packages (directories with __init__.py)
Module: Individual Python files
Class: Class definitions
Function: Module-level functions
Method: Methods defined inside classes
Folder: Regular directories
File: Non-Python files
ExternalPackage: External dependencies
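
The distinction between Package, Folder, Module, and File can be sketched as a small classification function. This is an assumed reading of the node types above, not the parser's actual logic; the name classify_path is hypothetical, and treating every .py file (including __init__.py) as a Module is a simplifying assumption.

```python
from pathlib import Path


def classify_path(path: Path) -> str:
    """Map a filesystem entry to one of the node labels listed above."""
    if path.is_dir():
        # A directory counts as a Package only if it contains __init__.py.
        return "Package" if (path / "__init__.py").is_file() else "Folder"
    # Python files become Module nodes; everything else is a plain File.
    return "Module" if path.suffix == ".py" else "File"
```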

Relationships

CONTAINS_PACKAGE, CONTAINS_MODULE, CONTAINS_FILE, CONTAINS_FOLDER: Hierarchical containment
DEFINES: Module defines classes/functions
DEFINES_METHOD: Class defines methods
DEPENDS_ON_EXTERNAL: Project depends on external packages

🔧 Configuration

Configuration is managed through environment variables and the config.py file:

MEMGRAPH_HOST = "localhost"
MEMGRAPH_PORT = 7687
GEMINI_MODEL_ID = "gemini-2.5-pro"  # Main RAG orchestrator model
MODEL_CYPHER_ID = "gemini-2.5-flash-lite-preview-06-17"  # Cypher generation model
TARGET_REPO_PATH = "."
GEMINI_API_KEY = "required"  # Must be set; there is no usable default

🏃‍♂️ Development

Project Structure

graph-code/
├── repo_parser.py              # Repository analysis and ingestion
├── codebase_rag/              # RAG system package
│   ├── main.py                # CLI entry point
│   ├── config.py              # Configuration management
│   ├── prompts.py             # LLM prompts and schemas
│   ├── schemas.py             # Pydantic models
│   ├── services/              # Core services
│   │   ├── graph_db.py        # Memgraph integration
│   │   └── llm.py             # Gemini LLM integration
│   └── tools/                 # RAG tools
│       ├── codebase_query.py  # Graph querying tool
│       └── code_retrieval.py  # Code snippet retrieval
├── docker-compose.yaml        # Memgraph setup
└── pyproject.toml            # Project dependencies

Key Dependencies

pydantic-ai: AI agent framework
pymgclient: Memgraph Python client
loguru: Advanced logging
python-dotenv: Environment variable management

🐛 Debugging

Check Memgraph connection:
- Ensure Docker containers are running: docker-compose ps
- Verify Memgraph is accessible on port 7687
View database in Memgraph Lab:
- Open http://localhost:3000
- Connect to memgraph:7687
Enable debug logging:
- The RAG orchestrator runs in debug mode by default
- Check logs for detailed execution traces
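
To check quickly whether anything is listening on the Memgraph port, a plain TCP probe is often enough. The helper below is a sketch with a hypothetical name; it confirms only that the port accepts connections, not that the Bolt protocol handshake succeeds.

```python
import socket


def memgraph_reachable(
    host: str = "localhost", port: int = 7687, timeout: float = 2.0
) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

If memgraph_reachable() returns False while docker-compose ps shows the container running, the port mapping in docker-compose.yaml is a likely place to look.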

🤝 Contributing

Follow the established code structure
Keep files under 100 lines
Use type annotations
Follow conventional commit messages
Use DRY principles

🙋‍♂️ Support

For issues or questions:

Check the logs for error details
Verify Memgraph connection
Ensure all environment variables are set
Check that the graph schema matches your expectations

Repository: jcvikl/code-graph-rag
