GitHab: AI-Powered Codebase Intelligence Agent

GitHab is a Retrieval-Augmented Generation (RAG) system designed to "understand" entire Python GitHub repositories. By combining AST parsing, modular chunking, and multi-agent orchestration via LangGraph, it lets developers hold high-context conversations with their Python codebases.

Key Features

  • Deep AST Parsing: Extracts functions, classes, and dependencies to preserve code logic, not just raw text.
  • Dual-Stream Indexing: Stores both raw code chunks and LLM-generated summaries in Pinecone to improve retrieval accuracy.
  • Intelligent Query Optimization: Uses a specialized "Understand" node to rewrite messy user questions into optimized search queries.
  • Stateful Multi-Agent Workflow: Orchestrated by LangGraph, ensuring a reliable path from question → retrieval → analysis → answer.
  • Persistent Context: Remembers which repository you are discussing across chat sessions.

Project Architecture

.
├── AI/                 # Agentic logic & LangGraph
│   ├── graph/          # Workflow definitions
│   └── nodes/          # Specialized LLM tasks
├── ingestion/          # Data processing pipeline
├── pipeline/           # Orchestration layer
├── templates/          # Flask frontend (HTML)
├── vectorestore/       # Database & Embeddings
├── main.py             # Flask Server Entry Point
└── .env                # Environment Secrets

The system is organized into specialized modules to ensure scalability and maintainability:

1. Ingestion Engine (/ingestion)

  • repository_loader.py: Handles cloning and local management of GitHub repos.
  • ast_parser.py: Navigates the Abstract Syntax Tree to identify code structures.
  • chunker.py: Breaks code into "semantic" chunks with rich metadata.
  • summary.py: Uses LLMs to generate high-level summaries of code modules.
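
The parsing and chunking steps above can be sketched with Python's standard ast module. This is a minimal illustration, not the project's actual ast_parser.py or chunker.py: the CodeChunk record, its field names, and the chunk_source function are hypothetical, and only top-level functions and classes are chunked.

```python
import ast
from dataclasses import dataclass


@dataclass
class CodeChunk:
    """Hypothetical chunk record; the field names are illustrative."""
    name: str
    kind: str          # "function" or "class"
    start_line: int
    end_line: int
    source: str


def chunk_source(source: str) -> list[CodeChunk]:
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue  # skip imports, assignments, and other statements
        chunks.append(
            CodeChunk(
                name=node.name,
                kind=kind,
                start_line=node.lineno,
                end_line=node.end_lineno,
                source=ast.get_source_segment(source, node),
            )
        )
    return chunks


SAMPLE = '''import os

def greet(name):
    return f"Hello, {name}"

class Greeter:
    def hello(self):
        return greet("world")
'''

chunks = chunk_source(SAMPLE)
for chunk in chunks:
    print(chunk.kind, chunk.name, chunk.start_line, chunk.end_line)
```

Keeping the line range and kind next to the source text is what lets a retrieved chunk point back to its exact location in the repository.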

2. AI & Agent Logic (/AI)

  • graph/stategraph.py: The "brain" of the app. Defines the LangGraph flow and state transitions.
  • nodes/: Individual processing units:
    • understand_question.py: Optimizes user intent.
    • retrieve_code_context.py: Queries Pinecone namespaces.
    • analyze_code.py: Reasons over retrieved snippets.
    • generate_answer.py: Produces the final natural language response.
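
The four-node flow above can be approximated in plain Python to show how state moves from one step to the next. This sketch deliberately does not use LangGraph, an LLM, or Pinecone: the ChatState keys, the CORPUS dictionary, and the keyword-matching logic are all stand-ins invented for illustration. Only the node names come from the project layout.

```python
from typing import TypedDict


class ChatState(TypedDict, total=False):
    """Hypothetical shared state; the real state schema may differ."""
    question: str
    search_query: str
    snippets: list[str]
    analysis: str
    answer: str


# Tiny stand-in corpus; the real system queries Pinecone instead.
CORPUS = {
    "load_repo": "def load_repo(url): ...  # clones a repository",
    "parse_file": "def parse_file(path): ...  # walks the AST",
}


def understand_question(state):
    # Stand-in for the LLM rewrite: lowercase and drop filler words.
    filler = {"how", "does", "the", "what", "is", "a", "please"}
    words = [w.strip("?.,!").lower() for w in state["question"].split()]
    return {"search_query": " ".join(w for w in words if w and w not in filler)}


def retrieve_code_context(state):
    # Stand-in for vector search: match query terms against snippet names.
    terms = set(state["search_query"].split())
    hits = [src for name, src in CORPUS.items() if terms & set(name.split("_"))]
    return {"snippets": hits}


def analyze_code(state):
    return {"analysis": f"{len(state['snippets'])} relevant snippet(s) found"}


def generate_answer(state):
    return {"answer": f"{state['analysis']} for query '{state['search_query']}'"}


PIPELINE = [understand_question, retrieve_code_context, analyze_code, generate_answer]


def run(question: str) -> ChatState:
    """Run each node in order, merging its partial update into the state."""
    state: ChatState = {"question": question}
    for node in PIPELINE:
        state.update(node(state))
    return state


result = run("How does the repo parse a file?")
print(result["answer"])
```

Each node reads what it needs from the state and returns only the keys it changes, so nodes can be tested in isolation and reordered or replaced without touching the others.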

3. Vector Storage (/vectorestore)

  • pinecone_client.py: Manages index lifecycle and serverless configurations.
  • vectordb.py: Handles embedding generation and namespace-isolated storage.
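
Namespace-isolated storage can be illustrated with a toy in-memory index. This is not the Pinecone client and does not reflect how vectordb.py is written: the InMemoryVectorStore class, the two-dimensional vectors, the "repo-a"/"repo-b" namespaces, and the "stream" metadata key are assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Toy stand-in for a namespaced vector index (not the Pinecone client)."""

    def __init__(self):
        # namespace -> {record id -> (vector, metadata)}
        self._data = {}

    def upsert(self, namespace, record_id, vector, metadata):
        self._data.setdefault(namespace, {})[record_id] = (vector, metadata)

    def query(self, namespace, vector, top_k=3):
        # Only records in the requested namespace are ever scored.
        records = self._data.get(namespace, {})
        scored = [
            (cosine(vector, vec), record_id, meta)
            for record_id, (vec, meta) in records.items()
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]


store = InMemoryVectorStore()
# One namespace per repository; code and summary records share it.
store.upsert("repo-a", "loader:code", [1.0, 0.0], {"stream": "code"})
store.upsert("repo-a", "loader:summary", [0.8, 0.6], {"stream": "summary"})
store.upsert("repo-b", "other:code", [1.0, 0.0], {"stream": "code"})

hits = store.query("repo-a", [1.0, 0.0], top_k=2)
for score, record_id, meta in hits:
    print(round(score, 2), record_id, meta["stream"])
```

Because a query only scores records in its own namespace, a question about one repository cannot surface chunks from another, even when their vectors are identical.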

Tech Stack

  • Framework: Flask (Backend), Jinja2 (Frontend)
  • Orchestration: LangChain & LangGraph
  • LLMs: NVIDIA Nemotron (via OpenRouter/NIM)
  • Embeddings: HuggingFace all-MiniLM-L6-v2
  • Vector Database: Pinecone

Getting Started

1. Prerequisites

  • Python 3.10+
  • Pinecone API Key
  • OpenAI-Compatible API Key (e.g., OpenRouter or NVIDIA)

2. Installation

# Clone this repository
git clone https://github.com/steve601/codebase-Understanding-agent.git
cd codebase-Understanding-agent

# Install dependencies (using uv or pip)
pip install -r requirements.txt

3. Configuration

# Create a .env file and add:
OPENAI_API_KEY=your_key_here
OPENAI_API_BASE=https://openrouter.ai/api/v1  # or your preferred provider's base URL
PINECONE_API_KEY=your_key_here
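
A small startup check can catch a missing key before the server begins handling requests. The sketch below assumes the three variables are already present in the process environment (for example, loaded from .env by a tool such as python-dotenv; whether this project does so is an assumption). The load_settings helper is hypothetical.

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "OPENAI_API_BASE", "PINECONE_API_KEY")


def load_settings(env=None):
    """Return the required settings, or raise if any are missing or empty."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

Calling load_settings() once at the top of the entry point turns a confusing authentication error deep in a request into a clear message at launch.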

4. Run the application

uv run python main.py  # or: python main.py, if you installed with pip

Future Roadmap

Planned features include:

  • Multi-Language AST Parsing: Expanding beyond Python to support TypeScript, Go, and Java using Tree-sitter for universal codebase compatibility.
  • Streaming Responses: Transitioning from batch processing to real-time token streaming (SSE) to provide an "instant-reply" chat experience.
  • Voice-to-Code Interaction: Integrated Speech-to-Text (STT) for hands-free codebase navigation and natural voice querying.
  • Incremental Ingestion: Automatically detecting file changes in a repository to update Pinecone vectors without re-parsing the entire project.
  • Local LLM Support: Optional integration with Ollama or vLLM for 100% private, on-premise code analysis.
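
One possible approach to the incremental ingestion item is to keep a manifest of content hashes and compare it against the current state of the repository. This is a sketch of the idea only, since the feature is not implemented yet: fingerprint and diff_manifests are hypothetical helpers, and file contents are passed in as strings to keep the example self-contained.

```python
import hashlib


def fingerprint(files: dict[str, str]) -> dict[str, str]:
    """Map each file path to the SHA-256 hash of its contents."""
    return {
        path: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for path, text in files.items()
    }


def diff_manifests(old: dict[str, str], new: dict[str, str]):
    """Return (changed_or_added, removed) paths between two manifests."""
    changed = sorted(path for path in new if old.get(path) != new[path])
    removed = sorted(path for path in old if path not in new)
    return changed, removed


before = fingerprint({"a.py": "x = 1\n", "b.py": "y = 2\n", "c.py": "z = 3\n"})
after = fingerprint({"a.py": "x = 1\n", "b.py": "y = 20\n", "d.py": "w = 4\n"})

changed, removed = diff_manifests(before, after)
print("re-embed:", changed)
print("delete vectors for:", removed)
```

Only the paths in the first list would need re-parsing and re-embedding, and vectors for the paths in the second list could be deleted from the index.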

Contributing

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Author

Stephen Odhiambo
Building the future of AI-assisted software engineering.

Sample tests

[Two app screenshots]
