A Retrieval-Augmented Generation (RAG) system for querying documents using vector embeddings and LLMs.
This project demonstrates a simple implementation of a RAG system that:
- Processes and extracts data from PDF documents
- Splits them into chunks and generates vector embeddings
- Stores these embeddings in a Chroma vector database
- Allows users to query the system in natural language
- Retrieves relevant context and generates accurate answers
## Requirements

- Python 3.8+
- Ollama (for embeddings and inference)
## Installation

- Clone this repository:

```bash
git clone https://github.com/emarashliev/simple-rag.git
cd simple-rag
```

- Install the required packages:

```bash
pip install -r requirements.txt
```

- Make sure Ollama is installed and running on your system. Visit Ollama's website for installation instructions.

```bash
ollama pull mistral
ollama serve
```

## Usage

### Populating the Database

Before querying, you need to populate the vector database with your documents:
```bash
python populate_database.py --data_path data --chunk_size 500 --chunk_overlap 50
```

Options:

- `--data_path`: Directory containing PDF documents (default: `data`)
- `--chunk_size`: Size of text chunks (default: 500)
- `--chunk_overlap`: Overlap between chunks (default: 50)
- `--clear_db`: Clear the existing database before adding new documents
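For illustration, the chunking parameters above can be sketched in plain Python. This is a simplified stand-in for the splitter inside `populate_database.py`; the `split_text` helper is hypothetical:

```python
from typing import List

def split_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Split text into fixed-size chunks where consecutive chunks
    share `chunk_overlap` characters (hypothetical stand-in)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between chunk start positions
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text("a" * 1200)
print(len(chunks))  # → 3 (chunks start at offsets 0, 450, and 900)
```

A larger overlap reduces the chance that a sentence is cut in half at a chunk boundary, at the cost of storing more redundant text.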
### Querying

To ask questions:

```bash
python query_data.py --query "How many players can play Monopoly?" --model mistral
```

Options:

- `--query`: The question you want to ask
- `--model`: The Ollama model to use (default: `mistral`)
- `--k`: Number of similar documents to retrieve (default: 4)
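The `--k` option controls how many chunks are pulled back by similarity search. Chroma performs this internally; as a conceptual sketch (with hypothetical helper functions), top-k retrieval by cosine similarity looks like:

```python
import math
from typing import List

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec: List[float], doc_vecs: List[List[float]], k: int = 4) -> List[int]:
    """Indices of the k document vectors most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]

docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
print(top_k([1.0, 0.1], docs, k=2))  # → [0, 2]
```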
## Testing

To run the test suite:

```bash
pytest test_rag.py
```

## Project Structure

- `data/`: Contains PDF documents
- `chroma/`: Vector database storage (created during database population)
- `populate_database.py`: Script to process documents and populate the database
- `query_data.py`: Script to query the RAG system
- `get_embedding_function.py`: Provides embedding functionality using Ollama
- `test_rag.py`: Tests for validating the RAG system's responses
## How It Works

- Document Processing: PDFs are loaded, parsed, and split into chunks with overlaps
- Embedding Generation: Text chunks are converted to vector embeddings
- Vector Storage: Embeddings are stored in a Chroma vector database
- Retrieval: When a query is received, the system finds semantically similar chunks
- Response Generation: Retrieved context is sent to an LLM along with the query to generate a response
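The last two steps — combining the retrieved chunks and the question into a single prompt — can be sketched as follows. The actual template in `query_data.py` may differ; `build_prompt` is illustrative only:

```python
from typing import List

PROMPT_TEMPLATE = """Answer the question based only on the following context:

{context}

---
Question: {question}
"""

def build_prompt(question: str, retrieved_chunks: List[str]) -> str:
    """Join retrieved chunks and interpolate them with the question
    (hypothetical stand-in for the prompt assembly in query_data.py)."""
    context = "\n\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)

print(build_prompt("How many players can play Monopoly?",
                   ["Monopoly is played by 2 to 8 players.",
                    "Players take turns rolling two dice."]))
```

Grounding the LLM in the retrieved context this way is what lets it answer questions about documents it was never trained on.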
## Adding More Data

To add more data:

- Add PDFs to the `data/` directory
- Run `populate_database.py` with the `--clear_db` flag if you want to rebuild the entire database
## License

MIT