Welcome to the Python RAG (Retrieval-Augmented Generation) Tutorial! This project is designed as a step-by-step learning journey to help you understand, build, and optimize RAG applications running completely locally.
By following the scripts in order, you will learn how to process documents, generate embeddings, store them in a vector database, and connect them to a local LLM using Ollama.
RAG_Python_Tutorial/
├── data/ # Contains sample datasets used in the tutorial
│ ├── knowledge_base/ # Text files covering generic Python knowledge
│ ├── pdfs/ # Folder for testing PDF processing
│ ├── python_info.txt # Single document sample
│ └── sample_doc.txt # General sample text
├── genai_env/ # Python virtual environment (dependencies)
└── scripts/ # The core tutorial scripts (numbered sequentially)
(Note: As you run the scripts, various chroma_db folders will be generated to store your local vector embeddings).
To build a solid RAG system, it is crucial to understand the foundational lifecycle. This tutorial implementation structure maps sequentially through these core concepts:
- Documents and ingestion → Gathering raw data sources (text files, PDFs, etc.).
- Text extraction and cleaning → Parsing the documents to remove noise and extract usable textual data.
- Chunking → Breaking the clean text into smaller intelligently grouped pieces.
- Embeddings → Converting chunked text into vectors to allow calculation of semantic similarity.
- Vector storage → Storing and searching these embeddings efficiently using a database.
- Retrieval → Finding the most relevant context vectors for user queries based on semantic representation.
- Prompting + answer generation → Feeding the retrieved context to the LLM alongside the user prompt to synthesize an accurate answer.
- Evaluation → Analyzing the quality, relevance, and accuracy of generated outputs.
- Serving/UI + deployment → Wrapping your operational RAG engine into user-friendly platforms (e.g., Gradio web UI).
The tutorial is broken down into sequentially numbered Python scripts located in the scripts/ folder. It is highly recommended to study and run them in order.
01_embeddings_basics.py: Introduction to creating embeddings. Learn how text is converted into numerical vectors so machines can understand semantic meaning.02_document_processing.py: Learn how to load, clean, and split (chunk) large text documents into smaller, manageable pieces suitable for vectorization.03_rag_ollama_basic.py: Your first end-to-end RAG pipeline! Connect document embeddings, a ChromaDB vector store, and a local Ollama LLM to answer questions based on your data.
04_retrieval_strategies.py: Explores advanced retrieval techniques (e.g., semantic search vs. keyword search, similarity thresholds) to ensure the LLM gets the most relevant context.05_multi_document_rag.py: Scales up the basic pipeline to ingest and query across multiple text documents within thedata/knowledge_base/directory.05b_rag_ollama_pdf.py: Extends the data ingestion pipeline to handle PDFs instead of just plain text files.
06_ollama_chatbot_local.py: Upgrades your RAG system into a continuous, interactive terminal chatbot.06b_add_conversation_memory.py: Adds conversation history (memory) to the chatbot so it can remember previous questions and answers in your chat session.06c_add_calculate_tokens.py: Introduces token counting and management, crucial for ensuring your prompts and conversational memory don't exceed the LLM's context window limits.
07_rag_chatbot_ui.py: Moves the chatbot out of the terminal and into a web-based User Interface (using tools like Gradio) for a more user-friendly experience.08_optimization_techniques.py: Covers advanced optimization strategies to improve the speed, accuracy, and reliability of your RAG outputs.09_custom_RAG.py: A fully customized, advanced RAG implementation incorporating everything you've learned into a robust, object-oriented pipeline.
To get the most value out of this tutorial, follow this iterative approach:
- Read, Run, Check & Edit: Start by reading the script to understand its logic. Then, run the script, review the answers it generates, and critically edit or customize it to test new behaviors.
- Capstone Challenge: Once you finish all scripts, challenge yourself by creating a "final project": a completely customized RAG-based local chatbot centered around a dataset (PDFs or Text) of your own choosing!
To run this project on your own machine:
- Install Ollama: Download and install Ollama. Pull your preferred local model (e.g.,
ollama run llama3orollama run mistral). - Set up Virtual Environment:
# Create a virtual environment (if not already created) python -m venv genai_env # Activate it (On Windows) .\genai_env\Scripts\activate # Or on macOS/Linux source genai_env/bin/activate
- Install Requirements:
pip install -r requirements.txt
- Run the Scripts: Navigate to the
scriptsdirectory and run them one by one to see how the ecosystem works.cd scripts python 01_embeddings_basics.py
- Ollama Models: The specific LLMs used in this tutorial (like
llama3,mistral, etc.) might be updated or replaced over time. Ensure you have pulled the model required by the active script usingollama pull <model_name>. - Gradio UI: The
gradiopackage used for the web interface in script07may undergo API changes in newer versions. If you encounter errors launching the UI, check your package version against the Gradio documentation.
Happy coding and enjoy building your own local AI applications!