A Retrieval-Augmented Generation (RAG) system that provides intelligent answers to questions about GitHub documentation using local LLMs and vector embeddings.
This project creates a question-answering system that:
- Fetches and processes GitHub's official documentation
- Generates embeddings for document chunks using local Ollama models
- Stores embeddings in a PostgreSQL vector database
- Provides contextually relevant answers to GitHub-related questions
The system consists of several key components:
- `src/llama.py`: Interface for Ollama LLM operations (text generation and embeddings)
- `src/postgres_db.py`: PostgreSQL vector database operations using pgvector
- `load.ipynb`: Data loading pipeline that fetches and processes GitHub docs
- `main.ipynb`: Query interface for asking questions and getting answers
The project uses Docker Compose with the following services:
- Ollama: Local LLM server for text generation and embeddings
- PostgreSQL with pgvector: Vector database for similarity search
This project is designed to run in a VS Code Dev Container with all dependencies and services pre-configured.
- Docker and Docker Compose
- VS Code with the Dev Containers extension
- Git
- Clone the repository

  ```bash
  git clone <repository-url>
  cd Dev-Container-Compose
  ```

- Open in VS Code

  ```bash
  code .
  ```

- Reopen in Container

  - VS Code will prompt to "Reopen in Container"
  - Or use the Command Palette: `Dev Containers: Reopen in Container`
The dev container will automatically:
- Start all required services (Ollama, PostgreSQL with pgvector)
- Install Python dependencies
- Configure the development environment
- Open `main.ipynb` for immediate use
If you prefer to run without dev containers:
- Clone and navigate to the repository

  ```bash
  git clone <repository-url>
  cd Dev-Container-Compose
  ```

- Start the services

  ```bash
  docker-compose -f .devcontainer/docker-compose.yml up -d ollama db
  ```

- Install Python dependencies

  ```bash
  pip install -r requirements.txt
  ```
Note: You'll need to adjust the host configurations in the notebooks to connect to the services.
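For example, a minimal sketch of overriding the hosts in a notebook cell, assuming the Compose file publishes the default Ollama and PostgreSQL ports on localhost (check `.devcontainer/docker-compose.yml` for the actual mapping):

```python
# Hypothetical notebook cell: when running outside the dev container, point
# the clients at localhost instead of the Compose service hostnames.
import os

os.environ["OLLAMA_HOST"] = "http://localhost:11434"  # instead of http://ollama:11434
os.environ["POSTGRES_HOST"] = "localhost"             # instead of vector-postgres
```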
Run the load.ipynb notebook to:
- Clone the GitHub docs repository
- Fetch documentation content via GitHub's API
- Process and chunk documents
- Generate embeddings and store in vector database
Key configuration:
- Embedding Model: `granite-embedding:30m` (384 dimensions)
- Chunk Size: 5000 characters with 100 character overlap
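As a rough illustration of the chunking configuration above (a sketch only; the helper name `chunk_text` is hypothetical, not the project's actual API):

```python
def chunk_text(text: str, size: int = 5000, overlap: int = 100) -> list[str]:
    """Split a document into fixed-size character chunks with a small overlap."""
    chunks = []
    start = 0
    step = size - overlap
    while start < len(text):
        chunks.append(text[start:start + size])
        start += step
    return chunks
```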
Use the main.ipynb notebook to:
- Submit questions about GitHub features
- Retrieve relevant documentation chunks
- Generate contextual answers using LLM
Example query:
prompt = "How much do github actions cost for linux?"OLLAMA_HOST: Ollama server URL (default:http://ollama:11434)POSTGRES_HOST: PostgreSQL server host (default:vector-postgres)
- Text Generation: `llama3.2:1b`
- Embeddings: `granite-embedding:30m`
Find more models at the Ollama Library.
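To make a model available before its first use, it can be pulled ahead of time (a sketch using the Ollama Python client; the model names match the defaults above):

```python
import ollama

# Download the default models up front so the first query doesn't block on a pull.
ollama.pull("llama3.2:1b")
ollama.pull("granite-embedding:30m")
```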
```
.
├── README.md
├── requirements.txt
├── main.ipynb            # Query interface
├── load.ipynb            # Data loading pipeline
├── AVAILABLE_MODELS.md   # Model documentation
├── src/
│   ├── llama.py          # Ollama LLM interface
│   └── postgres_db.py    # Vector database operations
└── data/                 # GitHub documentation storage
```
- GitHub Docs Cloning: Automatically clones the latest GitHub docs repository
- API Content Fetching: Retrieves rendered documentation via GitHub's API
- Document Processing: Splits documents into manageable chunks for embedding
- Batch Processing: Efficiently processes large document collections
- Semantic Search: Finds relevant documentation using vector similarity
- Contextual Answers: Generates responses using retrieved context
- Flexible Models: Easy switching between different Ollama models
- pgvector Integration: Uses PostgreSQL with vector extensions
- Similarity Search: Cosine similarity search for document retrieval
- Scalable Storage: Handles large document collections efficiently
- Clone GitHub docs repository
- Extract markdown files
- Fetch rendered content via API
- Split into 5000-character chunks
- Generate 384-dimensional embeddings
- Store in PostgreSQL with pgvector
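A minimal sketch of the embed-and-store step above, given an open psycopg2 connection (illustrative only; the table and column names are assumptions, not the actual schema in `src/postgres_db.py`):

```python
import ollama
import psycopg2

def store_chunk(conn, chunk: str) -> None:
    """Embed one chunk with Ollama and insert it into a pgvector column."""
    emb = ollama.embeddings(model="granite-embedding:30m", prompt=chunk)["embedding"]
    vec = "[" + ",".join(str(x) for x in emb) + "]"  # pgvector text format
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO documents (content, embedding) VALUES (%s, %s::vector)",
            (chunk, vec),
        )
    conn.commit()
```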
- Generate embedding for user question
- Search for top-k similar document chunks
- Combine relevant context
- Generate response using LLM with context
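A rough end-to-end sketch of this flow (the table name and prompt template are assumptions; `main.ipynb` may differ):

```python
import ollama

def answer(conn, question: str, k: int = 5) -> str:
    """Retrieve the k nearest chunks by cosine distance and answer with the LLM."""
    emb = ollama.embeddings(model="granite-embedding:30m", prompt=question)["embedding"]
    vec = "[" + ",".join(str(x) for x in emb) + "]"
    with conn.cursor() as cur:
        # <=> is pgvector's cosine-distance operator
        cur.execute(
            "SELECT content FROM documents ORDER BY embedding <=> %s::vector LIMIT %s",
            (vec, k),
        )
        context = "\n\n".join(row[0] for row in cur.fetchall())
    prompt = f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {question}"
    return ollama.generate(model="llama3.2:1b", prompt=prompt)["response"]
```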
- Rate Limiting: API requests are throttled to avoid GitHub rate limits
- Batch Processing: Documents are processed in batches with progress tracking
- Memory Efficiency: Large documents are chunked to fit model context windows
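For instance, a simple throttling pattern for the API fetches described above (the delay value and request details are illustrative, not the project's actual settings):

```python
import time
import requests

def throttled_get(url: str, delay: float = 1.0) -> requests.Response:
    """Fetch a URL and pause briefly afterwards to stay under GitHub's rate limit."""
    resp = requests.get(url)
    resp.raise_for_status()
    time.sleep(delay)  # back off before the next request
    return resp
```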
- Ollama Connection: Ensure Ollama service is running and accessible
- Model Downloads: First-time model usage requires downloading (may take time)
- PostgreSQL: Verify pgvector extension is installed and enabled
- API Rate Limits: GitHub API has rate limits; increase delays if needed
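A quick way to check the first three items (a sketch; the database credentials are assumptions based on typical Compose defaults, not the project's actual settings):

```python
import os
import ollama
import psycopg2

# Can we reach Ollama and list its locally available models?
client = ollama.Client(host=os.environ.get("OLLAMA_HOST", "http://ollama:11434"))
print(client.list())

# Is the pgvector extension installed in PostgreSQL?
conn = psycopg2.connect(
    host=os.environ.get("POSTGRES_HOST", "vector-postgres"),
    user="postgres", password="postgres", dbname="postgres",  # assumed credentials
)
with conn.cursor() as cur:
    cur.execute("SELECT extname FROM pg_extension WHERE extname = 'vector'")
    print("pgvector installed:", cur.fetchone() is not None)
```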