# Lab 2.1: Local RAG with Ollama - Starter Notebook

**Duration**: 60 minutes | **Difficulty**: Intermediate

## Objectives
- Build a complete RAG pipeline using local Ollama models
- Implement document loading, chunking, and embedding
- Create a vector store with Chroma
- Build a query-response pipeline with source attribution

## Prerequisites
- Ollama installed and running, with the `llama2` model pulled (`ollama pull llama2`)
- Python packages: langchain, chromadb, sentence-transformers

## Setup and Configuration

In [None]:
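# Import required libraries
# Note: newer LangChain releases move these classes to `langchain_community`
# and emit deprecation warnings for the import paths below.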
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import Ollama
from langchain.chains import RetrievalQA
import os

# TODO: Configure your RAG system parameters
CONFIG = {
    "corpus_path": "data/corpus",  # Path to your documents
    "chunk_size": 1000,            # Maximum characters per chunk
    "chunk_overlap": 200,          # Characters shared by consecutive chunks
    "embedding_model": "all-MiniLM-L6-v2",  # sentence-transformers model loaded via HuggingFace
    "llm_model": "llama2",         # Ollama model
    "vector_db_path": "./chroma_db",  # Path to store vector DB
    "top_k": 5                     # Number of chunks to retrieve
}

print("✅ Configuration loaded")
print(f"Corpus path: {CONFIG['corpus_path']}")
print(f"Chunk size: {CONFIG['chunk_size']}")
print(f"Top-k: {CONFIG['top_k']}")

## Step 1: Document Loading

**Task**: Load documents from your corpus directory.

**Hints**: 
- Use `DirectoryLoader` to load all text files
- You can use glob patterns like `**/*.txt` to match files recursively
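
Before handing a pattern such as `**/*.txt` to `DirectoryLoader`, it can help to preview which files it matches. The sketch below uses only the standard library's `pathlib` on a throwaway directory. The `preview_matches` helper and the file names are hypothetical, and this illustrates the glob pattern only, not how `DirectoryLoader` works internally.

```python
import tempfile
from pathlib import Path


def preview_matches(corpus_path: str, pattern: str = "**/*.txt") -> list[str]:
    """Return corpus-relative paths of files matching the glob pattern, sorted."""
    root = Path(corpus_path)
    return sorted(
        p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
    )


# Build a small throwaway corpus (hypothetical file names) to try the pattern on.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes").mkdir()
    (root / "intro.txt").write_text("top-level file")
    (root / "notes" / "chunking.txt").write_text("nested file")
    (root / "notes" / "readme.md").write_text("not a .txt file")

    matches = preview_matches(tmp)
    print(matches)
```

The `**` segment matches the corpus root as well as any nested folders, so both the top-level and the nested `.txt` files are listed while the `.md` file is skipped.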

In [None]:
# TODO: Implement document loading
# loader = DirectoryLoader(...)
# documents = loader.load()

# SOLUTION TEMPLATE:
# loader = DirectoryLoader(
#     CONFIG["corpus_path"],
#     glob="**/*.txt",
#     loader_cls=TextLoader  # plain-text loader; the default loader needs the `unstructured` package
# )
# documents = loader.load()

# For now, use in-memory sample documents so the lab runs without a corpus on disk
from langchain.schema import Document

# Sample documents about RAG
documents = [
    Document(page_content="""Retrieval Augmented Generation (RAG) is a technique that combines 
    information retrieval with text generation. It allows language models to access external 
    knowledge bases, reducing hallucinations and improving factual accuracy.""", 
    metadata={"source": "rag_intro.txt"}),
    
    Document(page_content="""Vector databases store embeddings and enable semantic search. 
    Popular options include Chroma, FAISS, and Elasticsearch. They use similarity metrics 
    like cosine similarity to find relevant documents.""", 
    metadata={"source": "vector_db.txt"}),
    
    Document(page_content="""Chunking strategies affect RAG performance. Common approaches include 
    fixed-size chunking, semantic chunking, and recursive splitting. Overlap between chunks 
    helps maintain context.""", 
    metadata={"source": "chunking.txt"})
]

print(f"✅ Loaded {len(documents)} documents")

## Step 2: Text Chunking

**Task**: Split documents into smaller chunks for better retrieval.

**Hints**:
- Use `RecursiveCharacterTextSplitter`
- Try different separators: `["\n\n", "\n", ". ", " "]`
- Experiment with chunk_size and chunk_overlap
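
To build intuition for how `chunk_size` and `chunk_overlap` interact, here is a simplified fixed-size chunker in plain Python. It is only a sketch under stated assumptions: the `sliding_chunks` helper is hypothetical, and unlike `RecursiveCharacterTextSplitter` it ignores separators and cuts purely by character position.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so a larger overlap produces more chunks for the same text.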

In [None]:
# TODO: Implement text chunking
# splitter = RecursiveCharacterTextSplitter(...)
# chunks = splitter.split_documents(documents)

splitter = RecursiveCharacterTextSplitter(
    chunk_size=CONFIG["chunk_size"],
    chunk_overlap=CONFIG["chunk_overlap"],
    separators=["\n\n", "\n", ". ", " "],
    length_function=len
)

chunks = splitter.split_documents(documents)

print(f"✅ Created {len(chunks)} chunks from {len(documents)} documents")
print("\nFirst chunk preview:")
print(chunks[0].page_content[:200])

## Step 3: Create Embeddings

**Task**: Initialize the embedding model.

**Hints**:
- Use `HuggingFaceEmbeddings`
- Set `model_kwargs={'device': 'cpu'}` to run on CPU
- Pass `encode_kwargs={'normalize_embeddings': True}` so vectors have unit length, which makes dot-product and cosine similarity scores equivalent
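
The effect of normalization can be checked with a few lines of plain Python. This sketch uses hypothetical 2-D toy vectors rather than real embeddings. It shows that once vectors are scaled to unit length, a plain dot product gives the same score as cosine similarity.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [6.0, 8.0]   # same direction as a, twice the length
c = [4.0, -3.0]  # perpendicular to a

a_n, b_n, c_n = l2_normalize(a), l2_normalize(b), l2_normalize(c)

print("cosine(a, b):", cosine_similarity(a, b))
print("dot of normalized a, b:", dot(a_n, b_n))
print("dot of normalized a, c:", dot(a_n, c_n))
```

Vector length no longer influences the score after normalization, so only direction (a proxy for meaning) drives the ranking.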

In [None]:
# TODO: Initialize embedding model
# embeddings = HuggingFaceEmbeddings(...)

embeddings = HuggingFaceEmbeddings(
    model_name=CONFIG["embedding_model"],
    model_kwargs={'device': 'cpu'},
    encode_kwargs={'normalize_embeddings': True}
)

print("✅ Embedding model initialized")

# Test embedding
test_embedding = embeddings.embed_query("What is RAG?")
print(f"Embedding dimension: {len(test_embedding)}")

## Step 4: Create Vector Store

**Task**: Create a Chroma vector database from your chunks.

**Hints**:
- Use `Chroma.from_documents()`
- Provide documents, embeddings, and persist_directory
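- Call `.persist()` to save the database (Chroma 0.4 and later write to `persist_directory` automatically, and newer LangChain releases deprecate this call)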

In [None]:
# TODO: Create vector store
# vectorstore = Chroma.from_documents(...)
# vectorstore.persist()

vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory=CONFIG["vector_db_path"]
)

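# Flush to disk. With Chroma 0.4+ data is persisted automatically,
# so this call may only emit a deprecation warning.
vectorstore.persist()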

print("✅ Vector store created and persisted")
print(f"Location: {CONFIG['vector_db_path']}")

## Step 5: Test Retrieval

**Task**: Test that retrieval works correctly.

**Hints**:
- Use `vectorstore.similarity_search(query, k=top_k)`
- Inspect the returned documents
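
Conceptually, a top-k similarity search scores every stored vector against the query and keeps the best `k`. The brute-force sketch below illustrates that idea with hypothetical 2-D toy vectors keyed by the sample file names. It is not Chroma's implementation, which uses an approximate nearest-neighbor index instead of scanning everything.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], indexed: dict[str, list[float]], k: int):
    """Return up to k (score, doc_id) pairs with the highest cosine similarity, best first."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in indexed.items()]
    # Clamp k so asking for more results than stored vectors is harmless.
    return heapq.nlargest(min(k, len(scored)), scored)


# Hypothetical toy "embeddings" for the three sample documents.
index = {
    "rag_intro.txt": [0.9, 0.1],
    "vector_db.txt": [0.5, 0.5],
    "chunking.txt": [0.1, 0.9],
}
query = [1.0, 0.0]

hits = top_k(query, index, k=5)
for score, doc_id in hits:
    print(f"{doc_id}: {score:.3f}")
```

With only three stored vectors, a request for `k=5` returns three hits. This is worth remembering when `top_k` in `CONFIG` exceeds the number of chunks in a small corpus.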

In [None]:
# TODO: Test retrieval
# test_query = "What is RAG?"
# results = vectorstore.similarity_search(...)

test_query = "What is RAG?"
results = vectorstore.similarity_search(test_query, k=CONFIG["top_k"])

print(f"Query: {test_query}")
print(f"\nFound {len(results)} relevant chunks:")
for i, doc in enumerate(results):
    print(f"\n[{i+1}] Source: {doc.metadata.get('source', 'unknown')}")
    print(doc.page_content[:150] + "...")

## Step 6: Initialize LLM

**Task**: Initialize the Ollama LLM for generation.

**Hints**:
- Use `Ollama(model=...)`
- Make sure Ollama is running (`ollama serve`)
- Make sure the model is pulled (`ollama pull llama2`)
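
Both checks can also be made from Python before initializing the LLM. The sketch below assumes Ollama is listening on its default address, `http://localhost:11434`, and queries the `/api/tags` endpoint, which lists locally pulled models. The `list_local_models` helper is hypothetical, so adjust the URL if your setup differs.

```python
import http.client
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default address; change if yours differs


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 2.0):
    """Return names of locally pulled models, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers refused connections and timeouts; ValueError covers invalid JSON.
        return None
    if not isinstance(payload, dict):
        return None
    return [m.get("name", "") for m in payload.get("models", [])]


models = list_local_models()
if models is None:
    print("Ollama is not reachable; start it with `ollama serve`.")
elif not any(name.split(":")[0] == "llama2" for name in models):
    print("Ollama is running, but llama2 is not pulled; run `ollama pull llama2`.")
else:
    print("Ollama is running and llama2 is available.")
```

Model names usually carry a tag suffix such as `:latest`, which is why the check compares only the part before the colon.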

In [None]:
# TODO: Initialize LLM
# llm = Ollama(...)

llm = Ollama(model=CONFIG["llm_model"])

print("✅ LLM initialized")

# Test LLM
test_response = llm.invoke("Say hello!")
print(f"\nTest response: {test_response[:100]}")

## Step 7: Build RAG Chain

**Task**: Connect the retriever and LLM into a complete RAG pipeline.

**Hints**:
- Use `RetrievalQA.from_chain_type()`
- Set `chain_type="stuff"` to concatenate all retrieved chunks into a single prompt
- Set `return_source_documents=True` for citations
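
The "stuff" strategy is simple enough to sketch by hand: every retrieved chunk is joined into one context block and placed in a single prompt. In the sketch below, the `Doc` class, the `build_stuffed_prompt` helper, and the prompt wording are all hypothetical stand-ins. LangChain uses its own `Document` class and default template.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (hypothetical, for illustration)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Hypothetical prompt wording; LangChain ships its own default template.
PROMPT_TEMPLATE = (
    "Use the following context to answer the question.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_stuffed_prompt(question: str, docs: list[Doc]) -> str:
    """Concatenate every retrieved chunk into one context block and fill the template."""
    context = "\n\n".join(doc.page_content for doc in docs)
    return PROMPT_TEMPLATE.format(context=context, question=question)


retrieved = [
    Doc("RAG combines information retrieval with text generation.", {"source": "rag_intro.txt"}),
    Doc("Overlap between chunks helps maintain context.", {"source": "chunking.txt"}),
]
prompt = build_stuffed_prompt("What is RAG?", retrieved)
print(prompt)
```

Because everything lands in one prompt, roughly `top_k` times `chunk_size` characters of context must fit within the model's context window.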

In [None]:
# TODO: Build RAG chain
# qa_chain = RetrievalQA.from_chain_type(...)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": CONFIG["top_k"]}),
    return_source_documents=True
)

print("✅ RAG chain created")

## Step 8: Query the RAG System

**Task**: Ask questions and get answers with citations.
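
When several retrieved chunks come from the same file, the source list repeats that file. One optional refinement is to deduplicate sources while keeping retrieval order. The `unique_sources` helper below is a hypothetical sketch, and it uses `SimpleNamespace` objects as stand-ins for the documents returned under `source_documents`.

```python
from types import SimpleNamespace


def unique_sources(source_documents) -> list[str]:
    """Return each document's `source` metadata once, in first-seen order."""
    sources = (doc.metadata.get("source", "unknown") for doc in source_documents)
    # dict.fromkeys drops duplicates while preserving insertion order.
    return list(dict.fromkeys(sources))


# Stand-in documents (hypothetical); real ones come from result["source_documents"].
docs = [
    SimpleNamespace(metadata={"source": "rag_intro.txt"}),
    SimpleNamespace(metadata={"source": "chunking.txt"}),
    SimpleNamespace(metadata={"source": "rag_intro.txt"}),
    SimpleNamespace(metadata={}),
]
sources = unique_sources(docs)
print(sources)
```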

In [None]:
def ask_question(question: str):
    """Ask a question and display answer with sources."""
    # Use .invoke() (as with the LLM above); calling the chain directly is deprecated
    result = qa_chain.invoke({"query": question})
    
    print(f"Question: {question}")
    print(f"\nAnswer: {result['result']}")
    print("\nSources:")
    for doc in result['source_documents']:
        print(f"  - {doc.metadata.get('source', 'unknown')}")
    print("\n" + "="*80 + "\n")

# Test queries
ask_question("What is RAG?")
ask_question("What are vector databases?")
ask_question("What is chunking?")

## Experiments

Try modifying parameters and see how results change:

In [None]:
# TODO: Experiment with different parameters
# - Try different chunk sizes
# - Try different top_k values
# - Try different embedding models
# - Add your own documents to the corpus

## Key Takeaways

- ✅ Built a complete local RAG pipeline
- ✅ Implemented document loading and chunking
- ✅ Created embeddings and vector store
- ✅ Connected retrieval and generation
- ✅ Added source attribution

**Next**: Lab 2.2 - Build the same system with watsonx.ai for enterprise deployment!