A production-ready Retrieval-Augmented Generation (RAG) toolkit for Laravel with PostgreSQL + pgvector. Ingest documents, store embeddings, and answer questions grounded only in your own content — with source citations.
- 🧩 Document ingestion — text is chunked (configurable size/overlap), embedded, and stored.
- 🔎 pgvector vector search — cosine similarity over an HNSW index for fast approximate nearest-neighbor search.
- 💬 RAG pipeline — retrieves the most relevant chunks and asks an LLM to answer from them.
- 📎 Source citations — every answer returns the chunks it used, with similarity scores.
- 🔌 Driver-based — swap embedding/chat providers; OpenAI built-in, plus deterministic `fake` drivers for offline dev and tests (no API key, no network).
- ⚙️ Sync or queued ingestion — inline by default, or dispatch a job for large documents.
- 🌐 HTTP API — ready-to-use `POST /api/rag/ingest` and `POST /api/rag/ask` endpoints.
- ✅ Tested — ships with a Pest suite that runs against pgvector using the fake drivers.
- PHP 8.3+
- Laravel 11, 12, or 13
- PostgreSQL with the `pgvector` extension (0.5+ for HNSW)
    composer require rubyat/laravel-rag

Publish the config and run the migration:

    php artisan vendor:publish --tag=rag-config
    php artisan migrate

The migration runs `CREATE EXTENSION IF NOT EXISTS vector`, which needs a database role allowed to create extensions. The pgvector Docker image (`pgvector/pgvector:pg16`) is the quickest way to get it locally.
Configure a provider (or keep the offline fake drivers — no network needed):
    OPENAI_API_KEY=sk-...
    RAG_EMBEDDING_DRIVER=openai
    RAG_CHAT_DRIVER=openai

Use it from PHP:

    use Rubyat\LaravelRag\Ingestion\DocumentIngestor;
    use Rubyat\LaravelRag\Rag\RagPipeline;

    // Ingest a document (chunk -> embed -> store in pgvector)
    app(DocumentIngestor::class)->ingest('handbook.md', $longText);

    // Ask a question grounded in the ingested chunks
    $result = app(RagPipeline::class)->ask('How does pgvector search work?');

    $result['answer'];    // string — grounded in your documents
    $result['citations']; // array — [{ source, chunk_index, content, score }, ...]

Ingest:
    curl -X POST http://localhost:8000/api/rag/ingest \
      -H "Content-Type: application/json" \
      -d '{"source": "handbook.md", "content": "Postgres pgvector stores embeddings and supports cosine similarity search..."}'
    # => 201 { "message": "Document ingested.", "source": "handbook.md", "chunks": 1 }

Ask:
    curl -X POST http://localhost:8000/api/rag/ask \
      -H "Content-Type: application/json" \
      -d '{"question": "How does pgvector search work?", "top_k": 4}'
    # => 200
    # {
    #   "answer": "pgvector stores embeddings and supports cosine similarity search...",
    #   "citations": [
    #     { "source": "handbook.md", "chunk_index": 0, "content": "...", "score": 0.83 }
    #   ]
    # }

Published to `config/rag.php`. Every option is environment-driven:
| Option | Env | Default | Description |
|---|---|---|---|
| `embedding_driver` | `RAG_EMBEDDING_DRIVER` | `openai` | `openai` or `fake` |
| `chat_driver` | `RAG_CHAT_DRIVER` | `openai` | `openai` or `fake` |
| `dimensions` | `RAG_DIMENSIONS` | `1536` | Vector size; must match the embedding model |
| `chunk_size` | `RAG_CHUNK_SIZE` | `1000` | Chunk length in characters |
| `chunk_overlap` | `RAG_CHUNK_OVERLAP` | `200` | Overlap between chunks |
| `top_k` | `RAG_TOP_K` | `4` | Chunks retrieved per question |
| `queue_ingestion` | `RAG_QUEUE_INGESTION` | `false` | Dispatch ingestion to the queue (needs a worker) |
| `register_routes` | `RAG_REGISTER_ROUTES` | `true` | Auto-register the HTTP routes |
| `route_prefix` | `RAG_ROUTE_PREFIX` | `api/rag` | Prefix for the package routes |

The fake drivers are deterministic and require no network access — they power the test suite and let you try the full flow without API keys.
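
To make the interaction between `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of overlapping character chunking. It is an illustration only: `chunkText` is a hypothetical helper, not the package's `DocumentChunker`, which may split on different boundaries.

```php
<?php

declare(strict_types=1);

/**
 * Split text into fixed-size character chunks that overlap.
 * Hypothetical helper for illustration; not the package's DocumentChunker.
 *
 * @return list<string>
 */
function chunkText(string $text, int $size = 1000, int $overlap = 200): array
{
    if ($size <= 0 || $overlap < 0 || $overlap >= $size) {
        throw new InvalidArgumentException('Require size > 0 and 0 <= overlap < size.');
    }

    // Split into UTF-8 characters so multi-byte text is not cut mid-character.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $length = count($chars);
    $step = $size - $overlap; // distance between the starts of consecutive chunks
    $chunks = [];

    for ($start = 0; $start < $length; $start += $step) {
        $chunks[] = implode('', array_slice($chars, $start, $size));

        // Stop once this chunk has reached the end of the text.
        if ($start + $size >= $length) {
            break;
        }
    }

    return $chunks;
}

// With the defaults, consecutive chunks start 800 characters apart.
$chunks = chunkText(str_repeat('a', 2500));
echo count($chunks), PHP_EOL;
```

With the default values each chunk repeats the last 200 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.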
Embedding and chat providers are resolved from config and implement small contracts, so you can plug in your own:
    use Rubyat\LaravelRag\Contracts\EmbeddingDriver;
    use Rubyat\LaravelRag\Contracts\ChatDriver;

Built in: `OpenAiEmbeddingDriver` / `OpenAiChatDriver` (HTTP) and `FakeEmbeddingDriver` / `FakeChatDriver` (offline, deterministic).
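
As a rough idea of what a deterministic, offline embedding driver can look like, the sketch below hashes the input text into a unit vector. The methods of the real contracts are not listed here, so `EmbeddingDriverSketch` and its `embed()` signature are assumptions standing in for the package's `EmbeddingDriver`; check the actual interface before adapting this.

```php
<?php

declare(strict_types=1);

// Stand-in for the package contract. The real EmbeddingDriver interface is not
// shown in this README, so this method name and signature are assumptions.
interface EmbeddingDriverSketch
{
    /** @return list<float> */
    public function embed(string $text): array;
}

// Deterministic offline driver: the same text always yields the same unit vector.
final class HashEmbeddingDriver implements EmbeddingDriverSketch
{
    public function __construct(private int $dimensions = 8)
    {
    }

    public function embed(string $text): array
    {
        $vector = [];
        for ($i = 0; $i < $this->dimensions; $i++) {
            // Hash the text together with the component index, then map the
            // first 32 bits of the digest into the range [-1, 1].
            $bits = hexdec(substr(hash('sha256', $i . ':' . $text), 0, 8));
            $vector[] = $bits / 0xFFFFFFFF * 2 - 1;
        }

        // Normalise to unit length so cosine similarity behaves predictably.
        $norm = sqrt(array_sum(array_map(fn (float $v): float => $v * $v, $vector)));

        return array_map(fn (float $v): float => $norm > 0 ? $v / $norm : 0.0, $vector);
    }
}

$driver = new HashEmbeddingDriver(8);
$embedding = $driver->embed('hello');
echo count($embedding), PHP_EOL;
```

Hash-based vectors carry no semantic meaning, which is acceptable for tests that only need stable, repeatable output; in a real setup the vector length would have to match the `dimensions` setting.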
        ingest                              ask
          │                                  │
          ▼                                  ▼
    DocumentChunker                   VectorRetriever
          │ chunks                           │ query embedding
          ▼                                  ▼
    EmbeddingDriver ─── embeddings ──► pgvector documents (HNSW, cosine)
                                             │ top-k chunks
                                             ▼
                                        RagPipeline
                                      (context + LLM)
                                             ▼
                                     answer + citations
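
The last stage of the diagram, combining retrieved chunks into context for the LLM, might look roughly like the sketch below. `buildPrompt` is a hypothetical helper and the prompt wording is an assumption; only the chunk fields (`source`, `chunk_index`, `content`, `score`) come from the citation format shown earlier.

```php
<?php

declare(strict_types=1);

/**
 * Assemble a grounded prompt from retrieved chunks.
 * Hypothetical helper; the prompt wording is an assumption, not the package's.
 *
 * @param list<array{source: string, chunk_index: int, content: string, score: float}> $chunks
 */
function buildPrompt(string $question, array $chunks): string
{
    $context = [];
    foreach ($chunks as $i => $chunk) {
        // Number each chunk so the answer can point back to its source.
        $context[] = sprintf(
            '[%d] (%s#%d) %s',
            $i + 1,
            $chunk['source'],
            $chunk['chunk_index'],
            $chunk['content']
        );
    }

    return "Answer using only the context below. If the context is insufficient, say so.\n\n"
        . "Context:\n" . implode("\n", $context) . "\n\n"
        . "Question: {$question}";
}

$prompt = buildPrompt('How does pgvector search work?', [
    [
        'source' => 'handbook.md',
        'chunk_index' => 0,
        'content' => 'pgvector supports cosine similarity search.',
        'score' => 0.83,
    ],
]);
echo $prompt, PHP_EOL;
```

Keeping the source and chunk index next to each passage is one way to let the returned citations line up with what the model was actually shown.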
Documents are stored one row per chunk in the documents table (source, chunk_index, content, metadata, embedding vector(n)), indexed with HNSW (vector_cosine_ops). Retrieval orders by cosine distance (embedding <=> :query) and returns a similarity score in [0, 1].
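
For intuition about the ordering, the sketch below reproduces in plain PHP what the `<=>` operator computes (cosine distance, `1 - cos`) and ranks a few in-memory rows by it. Converting distance to a score as `1 - distance` is an assumption about how the package derives its score; in general that value lies in [-1, 1], and it falls in [0, 1] only when the cosine is non-negative or the result is clamped. The helper names are hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Cosine distance between two equal-length, non-zero vectors: 1 - cos(a, b).
 * This is the quantity pgvector's <=> operator returns.
 */
function cosineDistance(array $a, array $b): float
{
    $dot = $normA = $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot   += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }

    return 1.0 - $dot / (sqrt($normA) * sqrt($normB));
}

/**
 * Score every row against the query and keep the k best.
 * The score conversion (1 - distance) is an assumption for illustration.
 */
function topK(array $query, array $rows, int $k): array
{
    $scored = array_map(
        fn (array $row): array => [
            'content' => $row['content'],
            'score' => 1.0 - cosineDistance($query, $row['embedding']),
        ],
        $rows
    );

    // Highest score first, which is the same order as ascending distance.
    usort($scored, fn (array $x, array $y): int => $y['score'] <=> $x['score']);

    return array_slice($scored, 0, $k);
}

$rows = [
    ['content' => 'about cats', 'embedding' => [1.0, 0.0]],
    ['content' => 'about dogs', 'embedding' => [0.0, 1.0]],
    ['content' => 'cats and dogs', 'embedding' => [1.0, 1.0]],
];

$results = topK([1.0, 0.0], $rows, 2);
foreach ($results as $result) {
    printf("%s %.4f\n", $result['content'], $result['score']);
}
```

This brute-force scan compares the query against every row; the HNSW index exists so that PostgreSQL can avoid doing that, at the cost of returning approximate rather than exact nearest neighbors.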
The suite runs against PostgreSQL + pgvector using the deterministic fake drivers, so no API keys or network are required:
    ./vendor/bin/pest

The MIT License (MIT). See LICENSE.