GitHab is a Retrieval-Augmented Generation (RAG) system designed to "understand" entire Python GitHub repositories. By combining AST parsing, modular chunking, and multi-agent orchestration via LangGraph, GitHab lets developers hold high-context conversations with their Python codebases.
- Deep AST Parsing: Extracts functions, classes, and dependencies to preserve code logic, not just raw text.
- Dual-Stream Indexing: Stores both raw code chunks and LLM-generated summaries in Pinecone for superior retrieval accuracy.
- Intelligent Query Optimization: Uses a specialized "Understand" node to rewrite messy user questions into optimized search queries.
- Stateful Multi-Agent Workflow: Orchestrated by LangGraph, ensuring a reliable path from question → retrieval → analysis → answer.
- Persistent Context: Remembers which repository you are discussing across chat sessions.
```
.
├── AI/              # Agentic logic & LangGraph
│   ├── graph/       # Workflow definitions
│   └── nodes/       # Specialized LLM tasks
├── ingestion/       # Data processing pipeline
├── pipeline/        # Orchestration layer
├── templates/       # Flask frontend (HTML)
├── vectorestore/    # Database & Embeddings
├── main.py          # Flask server entry point
└── .env             # Environment secrets
```
The system is organized into specialized modules to ensure scalability and maintainability:
- `repository_loader.py`: Handles cloning and local management of GitHub repos.
- `ast_parser.py`: Navigates the Abstract Syntax Tree to identify code structures.
- `chunker.py`: Breaks code into "semantic" chunks with rich metadata.
- `summary.py`: Uses LLMs to generate high-level summaries of code modules.
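To illustrate the parsing and chunking stages, here is a minimal sketch using Python's built-in `ast` module. The `extract_chunks` helper is hypothetical, not the project's actual `ast_parser.py`/`chunker.py` code; it shows the general idea of emitting one metadata-rich chunk per top-level function or class.

```python
import ast

def extract_chunks(source: str, path: str) -> list[dict]:
    """Walk a module's AST and emit one semantic chunk per
    top-level function or class, with metadata attached."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "file": path,
                "name": node.name,
                "kind": type(node).__name__,
                "lineno": node.lineno,
                "docstring": ast.get_docstring(node),
                "code": ast.get_source_segment(source, node),
            })
    return chunks

sample = '''
def greet(name):
    """Say hello."""
    return f"Hello, {name}!"

class Repo:
    pass
'''

chunks = extract_chunks(sample, "example.py")
```

Chunking on AST boundaries rather than fixed token windows is what keeps each stored chunk logically self-contained.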
- `graph/stategraph.py`: The "brain" of the app. Defines the LangGraph flow and state transitions.
- `nodes/`: Individual processing units:
  - `understand_question.py`: Optimizes user intent.
  - `retrieve_code_context.py`: Queries Pinecone namespaces.
  - `analyze_code.py`: Reasons over retrieved snippets.
  - `generate_answer.py`: Produces the final natural-language response.
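The four-node flow above can be sketched as a plain-Python state machine: each node reads a shared state dict, updates it, and hands off to the next. This is a stdlib illustration of the question → retrieval → analysis → answer path, not the project's actual LangGraph definition; the function bodies are stand-ins.

```python
# Each node mutates a shared state dict, mirroring a LangGraph StateGraph
# where nodes are wired in sequence and pass state along edges.

def understand_question(state):
    # Stand-in for LLM query rewriting: normalize the raw question.
    state["query"] = state["question"].lower().rstrip("?")
    return state

def retrieve_code_context(state):
    # Stand-in for a Pinecone query: naive keyword match over a tiny corpus.
    corpus = {"chunker": "def chunk(code): ..."}
    state["snippets"] = [v for k, v in corpus.items() if k in state["query"]]
    return state

def analyze_code(state):
    state["analysis"] = f"Found {len(state['snippets'])} relevant snippet(s)."
    return state

def generate_answer(state):
    state["answer"] = state["analysis"]
    return state

PIPELINE = [understand_question, retrieve_code_context, analyze_code, generate_answer]

def run(question):
    state = {"question": question}
    for node in PIPELINE:
        state = node(state)
    return state

result = run("How does the chunker work?")
```

In the real system, LangGraph adds what this sketch lacks: typed state, conditional edges, and checkpointing for persistent context.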
- `pinecone_client.py`: Manages index lifecycle and serverless configurations.
- `vectordb.py`: Handles embedding generation and namespace-isolated storage.
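Namespace isolation and dual-stream indexing can be pictured with an in-memory stand-in: raw code chunks and LLM summaries live in separate namespaces keyed per repository. This is a stdlib sketch, not the Pinecone client API; namespace names are invented for illustration.

```python
# In-memory stand-in for namespace-isolated storage: raw code chunks and
# summaries are kept in separate namespaces so queries can target either stream.
from collections import defaultdict

class NamespaceStore:
    def __init__(self):
        self._data = defaultdict(dict)  # namespace -> {id: record}

    def upsert(self, namespace, record_id, record):
        self._data[namespace][record_id] = record

    def query(self, namespace, predicate):
        # Real retrieval would rank by vector similarity; a predicate
        # filter is enough to show the namespace isolation.
        return [r for r in self._data[namespace].values() if predicate(r)]

store = NamespaceStore()
store.upsert("repo1-code", "chunk-1", {"text": "def parse(): ..."})
store.upsert("repo1-summaries", "sum-1", {"text": "Parses the repo AST."})

code_hits = store.query("repo1-code", lambda r: "parse" in r["text"])
```

Keeping the two streams separate lets the retriever blend exact-code matches with higher-level summary matches at query time.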
- Framework: Flask (Backend), Jinja2 (Frontend)
- Orchestration: LangChain & LangGraph
- LLMs: Nvidia Nemotron (via OpenRouter/NIM)
- Embeddings: HuggingFace `all-MiniLM-L6-v2`
- Vector Database: Pinecone
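Retrieval ultimately reduces to cosine similarity between embedding vectors. A minimal sketch with toy 3-dimensional vectors (standing in for the 384-dimensional `all-MiniLM-L6-v2` embeddings; the file names and values are invented):

```python
import math

def cosine_similarity(a, b):
    # cos(a, b) = (a · b) / (|a| |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [1.0, 0.0, 1.0]
docs = {"chunker.py": [0.9, 0.1, 0.8], "README.md": [0.0, 1.0, 0.1]}
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
```

Pinecone performs this ranking server-side at scale; the sketch only shows the metric it optimizes.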
- Python 3.10+
- Pinecone API Key
- OpenAI-Compatible API Key (e.g., OpenRouter or NVIDIA)
```bash
# Clone this repository
git clone https://github.com/steve601/codebase-Understanding-agent.git
cd codebase-Understanding-agent

# Install dependencies (using uv or pip)
pip install -r requirements.txt
```

Create a `.env` file and add:
```bash
OPENAI_API_KEY=your_key_here
OPENAI_API_BASE=https://openrouter.ai  # or your preferred provider
PINECONE_API_KEY=your_key_here
```

Run the server:

```bash
uv run python main.py
```

We are constantly working to make GitHab more powerful and accessible. Upcoming features include:
- Multi-Language AST Parsing: Expanding beyond Python to support TypeScript, Go, and Java using Tree-sitter for universal codebase compatibility.
- Streaming Responses: Transitioning from batch processing to real-time token streaming (SSE) to provide an "instant-reply" chat experience.
- Voice-to-Code Interaction: Integrating Speech-to-Text (STT) for hands-free codebase navigation and natural voice querying.
- Incremental Ingestion: Automatically detecting file changes in a repository to update Pinecone vectors without re-parsing the entire project.
- Local LLM Support: Optional integration with Ollama or vLLM for 100% private, on-premise code analysis.
Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Stephen Odhiambo
Building the future of AI-assisted software engineering.