# Local RAG Workflow (Overview)

This notebook demonstrates a minimal local RAG pipeline using:
- Ollama for embeddings & chat LLMs
- Chroma as a local vector store
- PyMuPDF for PDF ingestion

Sections:
1) Setup: install dependencies and verify connectivity
2) Ingestion: load PDFs, split into chunks, and index into Chroma
3) Retrieval & QA: build a multi-query retriever and ask the model

Run cells sequentially. If you run into errors, check that `ollama serve` is running locally.
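
As a quick way to check the server before running anything else, the sketch below queries Ollama's HTTP API directly using only the standard library. It assumes the default address `http://localhost:11434` and the `/api/tags` endpoint, which lists locally pulled models. The helper name `list_ollama_models` is hypothetical and not part of this notebook.

```python
import http.client
import json
import urllib.error
import urllib.request


def list_ollama_models(base_url="http://localhost:11434", timeout=2.0):
    """Return names of locally available Ollama models, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, or a non-JSON reply: treat as "not reachable".
        return None
    if not isinstance(payload, dict):
        return None
    return [m.get("name", "") for m in payload.get("models", [])]


models = list_ollama_models()
if models is None:
    print("Ollama is not reachable; start it with `ollama serve`.")
else:
    print("Ollama is up. Models:", models)
```

If the returned list does not include the model used later in the notebook (`llama3`), pulling it first with `ollama pull llama3` should resolve "model not found" errors.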

In [1]:
pip install ollama

You should consider upgrading via the '/Users/mohitxsh/Dev/rag/venv/bin/python -m pip install --upgrade pip' command.
Note: you may need to restart the kernel to use updated packages.


In [2]:
pip install chromadb

Note: you may need to restart the kernel to use updated packages.


## Connectivity & Imports
This section imports the libraries used in the notebook and performs a quick connectivity test to ensure `ollama` is reachable.

In [3]:
# Imports for connectivity check
import ollama
import chromadb

# Test if Ollama is reachable. Ensure you have run `ollama serve` locally before executing.
try:
    response = ollama.chat(model='llama3', messages=[
        {'role': 'user', 'content': 'Hello, are you there?'}
    ])
    print("Ollama Connection Success:", response['message']['content'])
except Exception as e:
    print("Error connecting to Ollama. Make sure 'ollama serve' is running in your terminal.")
    print(e)

Ollama Connection Success: Hello! Yes, I'm here. I'm an AI assistant trained to help answer your questions and provide information on a wide range of topics. What's on your mind today? Do you have something specific you'd like to chat about or ask me? I'm all ears (or rather, all text)!


In [4]:
pip install pymupdf

Note: you may need to restart the kernel to use updated packages.


In [5]:
pip install langchain-text-splitters

Note: you may need to restart the kernel to use updated packages.


In [6]:
pip install -U langchain-ollama langchain-chroma langchain_community

Note: you may need to restart the kernel to use updated packages.


## Part 1: Ingestion (Load & Index Data)

This section defines `load_and_index_data(folder_path)`, which:
- Loads PDF files from a folder using `PyMuPDFLoader`.
- Splits documents into chunks for embeddings.
- Creates/persists a Chroma vector DB using `OllamaEmbeddings`.
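
The splitting step can be pictured as a sliding window over the text. The sketch below is a simplified illustration, not what `RecursiveCharacterTextSplitter` does internally: the real splitter treats `chunk_size` as a maximum and prefers to break on separators such as paragraph breaks, newlines, and spaces, so its chunk boundaries will differ. The function name `chunk_text` is hypothetical; the sizes match the `chunk_size=1000` and `chunk_overlap=100` used in this notebook.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=100):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2500))  # 2500 characters of dummy text
chunks = chunk_text(sample)
print([len(c) for c in chunks])            # [1000, 1000, 700]
print(chunks[0][-100:] == chunks[1][:100])  # True: neighbours share 100 characters
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of embedding some text twice.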

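Run this cell to build the vector store before querying. If `./chroma_db` already exists and is non-empty, the function loads it instead of re-indexing; delete that directory to force a rebuild.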

In [7]:
import os
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_chroma import Chroma

from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever

# Constants
CHROMA_PATH = "./chroma_db"
COLLECTION_NAME = "local-rag"
EMBEDDING_MODEL = "llama3"  # a dedicated embedding model may retrieve better; changing this requires rebuilding the DB
LLM_MODEL = "llama3"

# --- PART 1: INGESTION (Loading & Embedding) ---

def load_and_index_data(folder_path):
    # 1. Check if DB exists to save time
    if os.path.exists(CHROMA_PATH) and os.listdir(CHROMA_PATH):
        print("Vector DB already exists. Loading...")
        return Chroma(
            persist_directory=CHROMA_PATH, 
            embedding_function=OllamaEmbeddings(model=EMBEDDING_MODEL), 
            collection_name=COLLECTION_NAME
        )

    print("Creating new Vector DB...")
    
    # 2. Load PDFs
    if not os.path.exists(folder_path):
        print(f"Folder {folder_path} not found.")
        return None
        
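    documents = []
    for filename in os.listdir(folder_path):
        if filename.lower().endswith('.pdf'):
            file_path = os.path.join(folder_path, filename)
            loader = PyMuPDFLoader(file_path)
            documents.extend(loader.load())

    # Nothing to index: stop here rather than creating an empty collection
    if not documents:
        print(f"No PDF content found in {folder_path}.")
        return None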

    # 3. Split Text
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = text_splitter.split_documents(documents)
    
    # 4. Create DB
    # Using the new OllamaEmbeddings from langchain_ollama
    embedding = OllamaEmbeddings(model=EMBEDDING_MODEL)
    
    # Using the new Chroma from langchain_chroma
    vector_db = Chroma.from_documents(
        documents=chunks,
        embedding=embedding,
        collection_name=COLLECTION_NAME,
        persist_directory=CHROMA_PATH
    )
    
    return vector_db

# Initialize the DB
vector_db = load_and_index_data("./data")
print("Database ready." if vector_db is not None else "Database not created.")



Vector DB already exists. Loading...
Database ready.
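
Because `load_and_index_data` returns the existing store whenever `./chroma_db` is non-empty, new or changed PDFs are not picked up automatically. One way to force a re-index is to remove the persisted directory first. The helper below is a minimal sketch under that assumption; `reset_vector_store` is a hypothetical name, and the demo runs against a temporary directory so nothing real is deleted.

```python
import shutil
import tempfile
from pathlib import Path


def reset_vector_store(path="./chroma_db"):
    """Delete the persisted store directory if present. Returns True if removed."""
    store = Path(path)
    if not store.is_dir():
        return False
    shutil.rmtree(store)
    return True


# Demo on a throwaway directory rather than the real store.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "chroma_db"
    demo.mkdir()
    (demo / "index.bin").write_text("placeholder")
    print(reset_vector_store(demo))  # True: directory existed and was removed
    print(reset_vector_store(demo))  # False: nothing left to remove
```

In the notebook this would be called as `reset_vector_store(CHROMA_PATH)` before `load_and_index_data("./data")`. If a Chroma object created earlier in the same kernel is still holding the directory, restarting the kernel first may be necessary.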


## Part 2: Retrieval & QA
The following cells build the retriever and the QA chain. The `ask_rag` function generates multiple query variants, retrieves relevant chunks, and asks the LLM to answer using only the provided context.
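
The multi-query idea described above can be sketched without any LLM or vector store: split the model's newline-separated output into queries, run each one, and keep the unique union of the results. This is an illustration of the general technique, not the internals of `MultiQueryRetriever`; the names `parse_query_variants`, `retrieve_unique_union`, and `toy_search` are hypothetical, and the keyword match stands in for a real similarity search.

```python
def parse_query_variants(llm_output):
    """Split newline-separated LLM output into non-empty, stripped queries."""
    return [line.strip() for line in llm_output.splitlines() if line.strip()]


def retrieve_unique_union(queries, search):
    """Run `search` per query and merge results, dropping duplicates in first-seen order."""
    seen = set()
    merged = []
    for query in queries:
        for chunk in search(query):
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return merged


# Toy stand-in for a vector store: keyword overlap against three "chunks".
CHUNKS = [
    "Avoid listening at high volume for long periods.",
    "MacBook Air contains magnets.",
    "Use a stable, well-ventilated work surface.",
]


def _words(text):
    return {w.lower().strip("?.,") for w in text.split()}


def toy_search(query):
    return [c for c in CHUNKS if _words(query) & _words(c)]


variants = parse_query_variants("How to protect hearing volume?\n\nIs high volume harmful?\n")
print(variants)
print(retrieve_unique_union(variants, toy_search))
```

Both variants match the same chunk here, and it appears only once in the merged result. Rephrasings widen recall while the deduplication keeps the context passed to the LLM from filling up with repeats.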

In [8]:
def ask_rag(question, vector_db):
    if not question or not vector_db:
        return None
        
    llm = ChatOllama(model=LLM_MODEL)

    # Multi-Query Prompt
    QUERY_PROMPT = PromptTemplate(
        input_variables=["question"],
        template="""You are an AI language model assistant. Your task is to generate five
        different versions of the given user question to retrieve relevant documents from
        a vector database. By generating multiple perspectives on the user question, your
        goal is to help the user overcome some of the limitations of the distance-based
        similarity search. Provide these alternative questions separated by newlines.
        Original question: {question}""",
    )

    # Retriever
    retriever = MultiQueryRetriever.from_llm(
        vector_db.as_retriever(), 
        llm,
        prompt=QUERY_PROMPT
    )

    # Answer Prompt
    template = """Answer the question based ONLY on the following context:
    {context}
    Question: {question}
    """
    answer_prompt = ChatPromptTemplate.from_template(template)

    # Chain
    chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | answer_prompt
        | llm
        | StrOutputParser()
    )

    print("Thinking... (Generating variations and searching)")
    response = chain.invoke(question)
    
    return response
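
In the chain above, the retriever's output (a list of documents) is placed into `{context}` as-is, so the prompt likely receives the list's printed representation, metadata included. A common alternative is to join only the text of each document before it reaches the prompt. The sketch below assumes objects with a `page_content` attribute, as LangChain `Document` objects have; `format_docs` is a hypothetical helper, and `SimpleNamespace` stands in for real documents so the example runs on its own.

```python
from types import SimpleNamespace


def format_docs(docs):
    """Join the text of retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)


# Stand-ins for LangChain Document objects (only `page_content` is used here).
docs = [
    SimpleNamespace(
        page_content="Avoid hearing damage by not listening at high volume levels for long periods.",
        metadata={"source": "example.pdf"},
    ),
    SimpleNamespace(
        page_content="MacBook Air contains magnets that may interfere with medical devices.",
        metadata={"source": "example.pdf"},
    ),
]
print(format_docs(docs))
```

In the chain this could be wired as `{"context": retriever | format_docs, "question": RunnablePassthrough()}`, since LangChain runnables generally accept a plain function on the right-hand side of `|`. Whether it improves the answers depends on the model and the documents.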

In [9]:
vector_db = load_and_index_data("./data")
if vector_db:
    # Use the defined `ask_rag` function to query the DB
    answer = ask_rag("What should I do to avoid hearing damage while using my mac?", vector_db)
    print("\n--- Answer ---\n")
    print(answer)

Vector DB already exists. Loading...
Thinking... (Generating variations and searching)

--- Answer ---

Based on the provided context, the key points are:

1. Software License Agreement:
	* Using MacBook Air constitutes acceptance of Apple and third-party software license terms.
2. Apple One-Year Limited Warranty Summary:
	* Apple warrants the included hardware product against defects in materials and workmanship for one year from the date of original retail purchase.
3. Safety and Handling:
	* Avoid hearing damage by not listening at high volume levels for long periods.
	* MacBook Air contains magnets that may interfere with medical devices.
4. Prolonged Heat Exposure:
	* Keep the device on a hard, stable, and well-ventilated work surface when in use or charging to avoid discomfort or injury.
5. Regulatory Information:
	* The device complies with part 15 of the FCC Rules and ISED Canada licence-exempt RSS standard(s).
6. ENERGY STAR Compliance:
	* Standard configurations of this produ