# GitHub

In [None]:
!git clone https://github.com/shainakumar/cs6501workshop.git
%cd cs6501workshop

Cloning into 'cs6501workshop'...
remote: Enumerating objects: 239, done.
remote: Counting objects: 100% (239/239), done.
remote: Compressing objects: 100% (175/175), done.
remote: Total 239 (delta 119), reused 170 (delta 60), pack-reused 0 (from 0)
Receiving objects: 100% (239/239), 2.44 MiB | 18.76 MiB/s, done.
Resolving deltas: 100% (119/119), done.
/content/cs6501workshop


# Manual RAG Pipeline: Mechanisms First

This notebook builds a Retrieval-Augmented Generation (RAG) pipeline from scratch.
You'll see every step explicitly before we move to frameworks like LangChain.

**Works on:** Google Colab, Local Jupyter (Mac/Windows/Linux)

**Pipeline Overview:**
```
Documents → Chunking → Embedding → Index (FAISS)
                                        ↓
User Query → Embed Query → Similarity Search → Top-K Chunks
                                                    ↓
                                        Prompt Assembly → LLM → Answer
```

## TODO: Topic 5 RAG Course Project Checklist

- **Exercise 0:** Set-up - Get notebook running; unzip Corpora.zip. Use PDFs from `Corpora/<corpus>/pdf_embedded/`.
- **Exercise 1:** Open model RAG vs no RAG - Compare Qwen 2.5 1.5B with/without RAG on Model T manual and Congressional Record.
- **Exercise 2:** Open model + RAG vs large model - Run GPT-4o Mini with no tools on same queries.
- **Exercise 3:** Open model + RAG vs frontier chat - Compare local Qwen+RAG vs GPT-4/Claude (web).
- **Exercise 4:** Effect of top-K - Test k = 1, 3, 5, 10, 20.
- **Exercise 5:** Unanswerable questions - Off-topic, related-but-missing, false premise.
- **Exercise 6:** Query phrasing sensitivity - Same question in 5+ phrasings.
- **Exercise 7:** Chunk overlap - Re-chunk with overlap 0, 64, 128, 256.
- **Exercise 8:** Chunk size - Chunk at 128, 256, 512, 1024, 2048.
- **Exercise 9:** Retrieval score analysis - 10 queries, top-10 chunks, score distribution.
- **Exercise 10:** Prompt template variations - Minimal, strict grounding, citation, permissive, structured.
- **Exercise 11:** Failure mode catalog - Computation, temporal, comparison, ambiguous, multi-hop, etc.
- **Exercise 12:** Cross-document synthesis - Questions needing multiple chunks.

## Setup

First, let's install the required packages and detect our compute environment.

In [None]:
# Install dependencies
# On Colab, these install quickly. Locally, you may already have them.
# Use a kernel-aware install when available; fall back to subprocess otherwise.
try:
    ip = get_ipython()
    ip.run_line_magic('pip', 'install -q torch transformers sentence-transformers faiss-cpu pymupdf accelerate ipyfilechooser')
except NameError:
    import subprocess, sys
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-q', 'torch', 'transformers', 'sentence-transformers', 'faiss-cpu', 'pymupdf', 'accelerate', 'ipyfilechooser'])
# For Exercise 2 (GPT-4o Mini): add 'openai' to the list above if needed



In [None]:
# =============================================================================
# ENVIRONMENT AND DEVICE DETECTION
# =============================================================================
import os
import sys

# Enable MPS fallback for any PyTorch operations not yet implemented on Metal
# This MUST be set before importing torch
os.environ['PYTORCH_ENABLE_MPS_FALLBACK'] = '1'

# Prevent kernel crash from duplicate OpenMP libraries (PyTorch + FAISS conflict on macOS)
os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'

import torch
from typing import Tuple

def detect_environment() -> str:
    """Detect if we're running on Colab or locally."""
    try:
        import google.colab
        return 'colab'
    except ImportError:
        return 'local'

def get_device() -> Tuple[str, torch.dtype]:
    """
    Detect the best available compute device.

    Priority: CUDA > MPS (Apple Silicon) > CPU

    Returns:
        Tuple of (device_string, recommended_dtype)

    Notes:
        - CUDA: Use float16 for memory efficiency (Tensor Cores optimize this)
        - MPS: Use float32 - Apple Silicon doesn't have the same float16
               optimizations as NVIDIA, and float32 is often faster
        - CPU: Use float32 (float16 not well supported on CPU)
    """
    if torch.cuda.is_available():
        device = 'cuda'
        dtype = torch.float16
        device_name = torch.cuda.get_device_name(0)
        memory_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
        print(f"✓ Using CUDA GPU: {device_name} ({memory_gb:.1f} GB)")

    elif torch.backends.mps.is_available() and torch.backends.mps.is_built():
        device = 'mps'
        dtype = torch.float32  # float32 is often faster on Apple Silicon!
        print("✓ Using Apple Silicon GPU (MPS)")
        print("  Note: Using float32 (faster than float16 on Apple Silicon)")

    else:
        device = 'cpu'
        dtype = torch.float32
        print("⚠ Using CPU (no GPU detected)")
        print("  Tip: For faster processing, use a machine with a GPU")

    return device, dtype

# Detect environment and device
ENVIRONMENT = detect_environment()
DEVICE, DTYPE = get_device()

print(f"\nEnvironment: {ENVIRONMENT.upper()}")
print(f"Device: {DEVICE}, Dtype: {DTYPE}")

✓ Using CUDA GPU: Tesla T4 (15.6 GB)

Environment: COLAB
Device: cuda, Dtype: torch.float16


## Load Your Documents

**Cell 1:** Configure your document source and select/upload files
- **Local Jupyter**: Use the folder picker, then run Cell 2
- **Colab + Upload**: Files upload immediately (blocking), then run Cell 2
- **Colab + Drive**: Set `USE_GOOGLE_DRIVE = True`, mounts Drive and shows picker, then run Cell 2

**Cell 2:** Confirms the selection and checks that the folder exists (it does not list the files)

In [None]:
# =============================================================================
# CELL 1: SELECT DOCUMENT SOURCE
# =============================================================================
# This cell either:
#   - Shows a folder picker (Local or Colab+Drive) - NON-BLOCKING
#   - Shows an upload dialog (Colab+Upload) - BLOCKING
#
# If a folder picker is shown, SELECT YOUR FOLDER BEFORE running Cell 2.
# The picker widget is non-blocking, so the code continues before you select.
# =============================================================================

from pathlib import Path

# ------------- COLAB USERS: CONFIGURE HERE -------------
USE_GOOGLE_DRIVE = True  # Set to True to use Google Drive instead of uploading
# -------------------------------------------------------

# Default folder: use Corpora from course project (unzip Corpora.zip first).
_folder_default = Path("Corpora/ModelTService")
DOC_FOLDER = str(_folder_default) if _folder_default.exists() else "documents"
folder_chooser = None  # Will hold the picker widget if used

if ENVIRONMENT == 'colab':
    if USE_GOOGLE_DRIVE:
        # ----- COLAB + GOOGLE DRIVE -----
        # Mount Drive first, then show folder picker
        from google.colab import drive
        print("Mounting Google Drive...")
        drive.mount('/content/drive')
        print("✓ Google Drive mounted\n")

        # Now show folder picker for the Drive
        try:
            from ipyfilechooser import FileChooser

            folder_chooser = FileChooser(
                path='/content/drive/MyDrive',
                title='Select your documents folder in Google Drive',
                show_only_dirs=True,
                select_default=True
            )
            print("📁 Select your documents folder below, then run Cell 2:")
            print("   (The picker is non-blocking - select BEFORE running the next cell)")
            display(folder_chooser)

        except ImportError:
            # Fallback: manual path entry
            print("Folder picker not available.")
            print("Edit DOC_FOLDER below with your Google Drive path, then run Cell 2:")
            DOC_FOLDER = '/content/drive/MyDrive/your_documents_folder'  # ← Edit this!
            print(f"  DOC_FOLDER = '{DOC_FOLDER}'")
    else:
        # ----- COLAB + UPLOAD -----
        # Upload dialog blocks until complete, so DOC_FOLDER is ready when done
        from google.colab import files
        os.makedirs(DOC_FOLDER, exist_ok=True)

        print("Upload your documents (PDF, TXT, or MD):")
        print("(This dialog blocks until upload is complete)\n")
        uploaded = files.upload()

        for filename in uploaded.keys():
            os.rename(filename, f'{DOC_FOLDER}/{filename}')
            print(f"  ✓ Saved: {DOC_FOLDER}/{filename}")

        print("\n✓ Upload complete. Run Cell 2 to continue.")

else:
    # ----- LOCAL JUPYTER -----
    # Show folder picker
    print("Running locally\n")

    try:
        from ipyfilechooser import FileChooser

        folder_chooser = FileChooser(
            path=str(Path.home()),
            title='Select your documents folder',
            show_only_dirs=True,
            select_default=True
        )
        print("📁 Select your documents folder below, then run Cell 2:")
        print("   (The picker is non-blocking - select BEFORE running the next cell)")
        display(folder_chooser)

    except ImportError:
        # Fallback: manual path entry
        print("Folder picker not available (ipyfilechooser not installed).")
        print(f"\nUsing default folder: {Path(DOC_FOLDER).absolute()}")
        print("\nTo use a different folder, edit DOC_FOLDER in this cell:")
        print("  DOC_FOLDER = '/path/to/your/documents'")
        os.makedirs(DOC_FOLDER, exist_ok=True)

Mounting Google Drive...
Mounted at /content/drive
✓ Google Drive mounted

📁 Select your documents folder below, then run Cell 2:
   (The picker is non-blocking - select BEFORE running the next cell)


FileChooser(path='/content/drive/MyDrive', filename='', title='Select your documents folder in Google Drive', …

In [None]:
# =============================================================================
# CELL 2: CONFIRM SELECTION AND LIST DOCUMENTS
# =============================================================================
# If you used a folder picker above, make sure you selected a folder
# BEFORE running this cell. The picker is non-blocking.
# =============================================================================

# Read selection from folder picker (if one was used)
if folder_chooser is not None and folder_chooser.selected_path:
    DOC_FOLDER = folder_chooser.selected_path
    print(f"✓ Using selected folder: {DOC_FOLDER}")
elif folder_chooser is not None:
    print("⚠ No folder selected in picker!")
    print("  Please go back to Cell 1, select a folder, then run this cell again.")
else:
    # No picker used (upload or manual path)
    print(f"✓ Using folder: {DOC_FOLDER}")

# Confirm folder (listing skipped for speed)
doc_path = Path(DOC_FOLDER)
if doc_path.exists():
    print(f"✓ Folder set: {doc_path.absolute()}")
    print("  Run the next cells to load, chunk, and index documents.")
else:
    print(f"⚠ Folder not found: {DOC_FOLDER}")
    print("  Please set DOC_FOLDER in the previous cell and run it again.")

✓ Using selected folder: /content/drive/MyDrive/Corpora 3/NewModelT
✓ Folder set: /content/drive/MyDrive/Corpora 3/NewModelT
  Run the next cells to load, chunk, and index documents.


---
## Stage 1: Document Loading

We need to extract text from our documents. For PDFs with embedded text,
PyMuPDF (fitz) reads the text layer directly - no OCR needed.

**Corpora:** Use PDFs from `Corpora/<name>/pdf_embedded/`. The `.txt` files in `txt/` are for checking retrieval vs OCR issues.

In [None]:
# Exercise 1 (and reuse): Official query lists. Reference: CR Jan 13, 20, 21, 23, 2026.
QUERIES_MODEL_T = [
    "How do I adjust the carburetor on a Model T?",
    "What is the correct spark plug gap for a Model T Ford?",
    "How do I fix a slipping transmission band?",
    "What oil should I use in a Model T engine?",
]
QUERIES_CR = [
    "What did Mr. Flood have to say about Mayor David Black in Congress on January 13, 2026?",
    "What mistake did Elise Stefanik make in Congress on January 23, 2026?",
    "What is the purpose of the Main Street Parity Act?",
    "Who in Congress has spoken for and against funding of pregnancy centers?",
]

In [None]:
import fitz  # PyMuPDF
from typing import List, Tuple

def load_text_file(filepath: str) -> str:
    """Load a plain text file."""
    with open(filepath, 'r', encoding='utf-8', errors='ignore') as f:
        return f.read()


def load_pdf_file(filepath: str) -> str:
    """
    Extract text from a PDF with embedded text.

    PyMuPDF reads the text layer directly.
    For scanned PDFs without embedded text, you'd need OCR.
    """
    doc = fitz.open(filepath)
    text_parts = []

    for page_num, page in enumerate(doc):
        text = page.get_text()
        if text.strip():
            # Add page marker for debugging/citation
            text_parts.append(f"\n[Page {page_num + 1}]\n{text}")

    doc.close()
    return "\n".join(text_parts)


def load_documents(doc_folder: str) -> List[Tuple[str, str]]:
    """Load all documents from a folder. Returns list of (filename, content)."""
    documents = []
    folder = Path(doc_folder)

    for filepath in folder.rglob("*"):
        try:
            if not filepath.is_file():
                continue
        except OSError:
            continue
        if filepath.suffix.lower() not in ('.pdf', '.txt', '.md', '.text'):
            continue
        try:
            if filepath.suffix.lower() == '.pdf':
                content = load_pdf_file(str(filepath))
            elif filepath.suffix.lower() in ['.txt', '.md', '.text']:
                content = load_text_file(str(filepath))
            else:
                continue

            if content.strip():
                documents.append((filepath.name, content))
                print(f"✓ Loaded: {filepath.name} ({len(content):,} chars)")
        except Exception as e:
            print(f"✗ Error loading {filepath}: {e}")

    return documents

In [None]:
# Load your documents
documents = load_documents(DOC_FOLDER)
print(f"\nLoaded {len(documents)} documents")

if len(documents) == 0:
    print("\n⚠ No documents loaded! Please add PDF or TXT files to the documents folder.")

✓ Loaded: ModelTNew.pdf (469,891 chars)
✓ Loaded: ModelTNew.txt (545,492 chars)

Loaded 2 documents


In [None]:
# Inspect a document to verify loading worked
if documents:
    # Index 1 is the second loaded file; fall back to index 0 if only one file loaded
    filename, content = documents[min(1, len(documents) - 1)]
    print(f"Inspecting document: {filename}")
    print(f"Total length: {len(content):,} characters")
    print(f"\nFirst 1000 characters:\n{'-'*40}")
    print(content[:1000])

Inspecting document: ModelTNew.txt
Total length: 545,492 characters

First 1000 characters:
----------------------------------------
SERVI

 Detailed Instructions for
  Servicing Ford Gars




    PRICE $250



         Published by




 DETROIT, MICHIGAN, U. S. A.
                                         Contents

Foreword . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    111
Essentials of good service. . . . : . . . . . : . . . . . . . . . . . . . . . . . . . . . . .               ix
Ideal shop layout for average size dealer. . . . . . . . . . . . . . . . . . . . .                           x
Essential shop equipment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .                  xi
The parts department. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
                                                                                                            ...
                                   

---
## Stage 2: Chunking

Documents need to be split into pieces small enough to be relevant but large enough to carry meaning.

**Why overlap?** If a key sentence sits right at a chunk boundary, splitting without overlap might cut it in half. Overlap ensures that information near boundaries appears intact in at least one chunk.

**Experiment:** Try different chunk sizes (256, 512, 1024) and see how they affect retrieval!
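
To make the boundary effect concrete, here is a minimal sketch using plain fixed-size character windows. `sliding_windows` is a hypothetical helper written for this illustration, not the notebook's `chunk_text`: it skips the paragraph and sentence boundary search, and the toy sentence and window sizes are made up.

```python
def sliding_windows(text: str, size: int, overlap: int) -> list[str]:
    """Fixed-size character windows; consecutive windows share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return windows


text = "Drain the oil. Set the spark plug gap. Replace the band."
key = "Set the spark plug gap."

no_overlap = sliding_windows(text, size=30, overlap=0)
with_overlap = sliding_windows(text, size=30, overlap=15)

# Without overlap the key sentence straddles two windows;
# with overlap one window holds it intact.
print("intact without overlap:", any(key in w for w in no_overlap))
print("intact with overlap:   ", any(key in w for w in with_overlap))
```

The trade-off is index size: overlapping windows store some text more than once, so the number of chunks (and embeddings) grows as overlap increases.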

In [None]:
from dataclasses import dataclass

@dataclass
class Chunk:
    """A chunk of text with metadata for tracing back to source."""
    text: str
    source_file: str
    chunk_index: int
    start_char: int
    end_char: int


def chunk_text(
    text: str,
    source_file: str,
    chunk_size: int = 512,
    chunk_overlap: int = 128
) -> List[Chunk]:
    """
    Split text into overlapping chunks.

    We try to break at sentence or paragraph boundaries
    to avoid cutting mid-thought.
    """
    chunks = []
    start = 0
    chunk_index = 0

    while start < len(text):
        end = start + chunk_size

        # Try to break at a good boundary
        if end < len(text):
            # Look for paragraph break first
            para_break = text.rfind('\n\n', start + chunk_size // 2, end)
            if para_break != -1:
                end = para_break + 2
            else:
                # Look for sentence break
                sentence_break = text.rfind('. ', start + chunk_size // 2, end)
                if sentence_break != -1:
                    end = sentence_break + 2

        chunk_text_str = text[start:end].strip()

        if chunk_text_str:
            chunks.append(Chunk(
                text=chunk_text_str,
                source_file=source_file,
                chunk_index=chunk_index,
                start_char=start,
                end_char=min(end, len(text))  # clamp: the last window can run past the text
            ))
            chunk_index += 1

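        # Stop once the end of the text is reached; otherwise the overlap
        # step can emit a trailing chunk that only repeats the previous one
        if end >= len(text):
            break

        # Move forward, accounting for overlap
        start = end - chunk_overlap
        if chunks and start <= chunks[-1].start_char:
            start = end  # Safety: ensure progress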

    return chunks

In [None]:
# ============================================
# EXPERIMENT: Try different chunk sizes!
# ============================================
CHUNK_SIZE = 512      # Try: 256, 512, 1024
CHUNK_OVERLAP = 128   # Try: 64, 128, 256
# For Ex 7/8 use rebuild_pipeline() - see cell after FAISS index.

# Chunk all documents
all_chunks = []
for filename, content in documents:
    doc_chunks = chunk_text(content, filename, CHUNK_SIZE, CHUNK_OVERLAP)
    all_chunks.extend(doc_chunks)
    print(f"{filename}: {len(doc_chunks)} chunks")

print(f"\nTotal: {len(all_chunks)} chunks")

ModelTNew.pdf: 1496 chunks
ModelTNew.txt: 1781 chunks

Total: 3277 chunks


In [None]:
# Inspect some chunks
if all_chunks:
    print("Sample chunks:")
    indices_to_show = [0, len(all_chunks)//2, -1] if len(all_chunks) > 2 else range(len(all_chunks))
    for i in indices_to_show:
        chunk = all_chunks[i]
        print(f"\n{'='*60}")
        print(f"Chunk {chunk.chunk_index} from {chunk.source_file}")
        print(f"{'='*60}")
        print(chunk.text[:300] + "..." if len(chunk.text) > 300 else chunk.text)

Sample chunks:

Chunk 0 from ModelTNew.pdf
[Page 2]
S E R V I  
Detailed Instructions for 
Servicing Ford Gars 
PRICE $250 
Published by 
DETROIT, MICHIGAN, U. S. A. 


[Page 3]
Contents 
Foreword . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  
111 
. . .  
Essentials of good service. 
: . . ...

Chunk 142 from ModelTNew.txt
g off the four dash
     bracket to frame bolt nuts "C" and withdrawing bolts.
50      Remove radiator studs, nuts and springs. The two radiator
     studs, washer nuts and springs "D" are removed by withdrawing
     cotter pins and unscrewing studs from washer nuts.
51      Remove front fender iron...

Chunk 1780 from ModelTNew.txt
trouble chart (Page 233 )




                           Note : Numbers refer to paragraphs.


---
## Stage 3: Embedding

Embeddings map text to dense vectors where **semantic similarity = geometric proximity**.

A sentence about "cardiac arrest" and one about "heart attack" will have similar embeddings even though they share no words.

**Note:** Some versions of sentence-transformers do not auto-detect Apple MPS, so we pass the device explicitly.

In [None]:
from sentence_transformers import SentenceTransformer
import numpy as np

# Load embedding model
# Options:
# - "sentence-transformers/all-MiniLM-L6-v2": Fast, small (~90MB download), good quality
# - "BAAI/bge-small-en-v1.5": Better for retrieval, similar size

EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"

print(f"Loading embedding model: {EMBEDDING_MODEL}")
print(f"Device: {DEVICE}")

# Must explicitly pass device for MPS support!
embed_model = SentenceTransformer(EMBEDDING_MODEL, device=DEVICE)
EMBEDDING_DIM = embed_model.get_sentence_embedding_dimension()
print(f"Embedding dimension: {EMBEDDING_DIM}")

Loading embedding model: sentence-transformers/all-MiniLM-L6-v2
Device: cuda


Embedding dimension: 384


In [None]:
# DEMO: See how embeddings capture semantic similarity
test_sentences = [
    "The engine needs regular oil changes.",
    "Motor oil should be replaced periodically.",
    "The Senate convened at noon.",
    "Congress began its session at midday."
]

test_embeddings = embed_model.encode(test_sentences)

# Compute cosine similarity matrix
from numpy.linalg import norm

def cosine_sim(a, b):
    return np.dot(a, b) / (norm(a) * norm(b))

print("Cosine similarity matrix:")
print("\n" + " " * 40 + "  [0]    [1]    [2]    [3]")
for i, s1 in enumerate(test_sentences):
    sims = [cosine_sim(test_embeddings[i], test_embeddings[j]) for j in range(4)]
    print(f"[{i}] {s1[:35]:35} {sims[0]:.3f}  {sims[1]:.3f}  {sims[2]:.3f}  {sims[3]:.3f}")

print("\n→ Notice: [0]-[1] are similar (both about oil), [2]-[3] are similar (both about Congress)")

Cosine similarity matrix:

                                          [0]    [1]    [2]    [3]
[0] The engine needs regular oil change 1.000  0.728  -0.045  -0.032
[1] Motor oil should be replaced period 0.728  1.000  0.014  0.035
[2] The Senate convened at noon.        -0.045  0.014  1.000  0.684
[3] Congress began its session at midda -0.032  0.035  0.684  1.000

→ Notice: [0]-[1] are similar (both about oil), [2]-[3] are similar (both about Congress)


In [None]:
# Embed all chunks - this may take a few minutes for large corpora
if all_chunks:
    print(f"Embedding {len(all_chunks)} chunks on {DEVICE}...")
    chunk_texts = [c.text for c in all_chunks]
    chunk_embeddings = embed_model.encode(chunk_texts, show_progress_bar=True)
    chunk_embeddings = chunk_embeddings.astype('float32')  # FAISS wants float32
    print(f"Embeddings shape: {chunk_embeddings.shape}")
else:
    print("No chunks to embed - please load documents first.")

Embedding 3277 chunks on cuda...


Embeddings shape: (3277, 384)


---
## Stage 4: Vector Index (FAISS)

FAISS efficiently finds nearest neighbors in high-dimensional spaces.

We use a simple **flat index** (brute-force search) which is transparent and works well for up to ~100k vectors. For larger corpora, you'd use approximate methods like IVF or HNSW.

**Note:** FAISS GPU support is CUDA-only. On MPS/CPU, we use faiss-cpu (still very fast for <100k vectors).
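
As a rough picture of what a flat inner-product index computes, the sketch below does the same exhaustive search in plain NumPy: normalize every vector, take the dot product of the query with each one, and keep the highest scores. The helper names (`normalize_rows`, `brute_force_top_k`) and the 2-D toy vectors are assumptions made for illustration; FAISS performs this kind of search with optimized native code.

```python
import numpy as np


def normalize_rows(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 length so that inner product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)  # guard against division by zero


def brute_force_top_k(query: np.ndarray, vectors: np.ndarray, k: int):
    """Score every stored vector against the query and return the k best (indices, scores)."""
    scores = vectors @ query             # one inner product per stored vector
    order = np.argsort(-scores)[:k]      # indices sorted by descending score
    return order, scores[order]


# Toy 2-D "embeddings" (real ones have 384 dimensions with all-MiniLM-L6-v2)
toy_vectors = normalize_rows(np.array([
    [1.0, 0.0],
    [0.8, 0.6],
    [0.0, 1.0],
    [-1.0, 0.0],
], dtype="float32"))
toy_query = normalize_rows(np.array([[2.0, 0.1]], dtype="float32"))[0]

top_idx, top_scores = brute_force_top_k(toy_query, toy_vectors, k=2)
print("top indices:", top_idx)
print("top scores: ", top_scores)
```

Because every stored vector is scored, the cost grows linearly with the corpus size, which is why approximate indexes become attractive for much larger collections.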

In [None]:
import faiss

# Create FAISS index
# IndexFlatIP = Inner Product (for cosine similarity on normalized vectors)
index = faiss.IndexFlatIP(EMBEDDING_DIM)

if all_chunks:
    # Normalize vectors so inner product = cosine similarity
    faiss.normalize_L2(chunk_embeddings)

    # Add vectors to index
    index.add(chunk_embeddings)
    print(f"Index built with {index.ntotal} vectors")
else:
    print("No embeddings to index - please load and embed documents first.")

Index built with 3277 vectors


---
## Stage 5: Retrieval

Now we can search! Given a query, we:
1. Embed the query with the same model
2. Find the top-k most similar chunks
3. Return those chunks as context

In [None]:
# Helper for Exercises 7 & 8: rebuild chunks + index with different chunk_size / chunk_overlap.
def rebuild_pipeline(chunk_size: int = 512, chunk_overlap: int = 128):
    """Re-chunk documents, re-embed, and rebuild FAISS index. Updates global all_chunks and index."""
    global all_chunks, index
    all_chunks = []
    for filename, content in documents:
        all_chunks.extend(chunk_text(content, filename, chunk_size=chunk_size, chunk_overlap=chunk_overlap))
    chunk_embeddings = embed_model.encode([c.text for c in all_chunks], show_progress_bar=True).astype("float32")
    faiss.normalize_L2(chunk_embeddings)
    index = faiss.IndexFlatIP(EMBEDDING_DIM)
    index.add(chunk_embeddings)
    print(f"Rebuilt: {len(all_chunks)} chunks, chunk_size={chunk_size}, chunk_overlap={chunk_overlap}")

In [None]:
def retrieve(query: str, top_k: int = 5):
    """
    Retrieve the top-k most relevant chunks for a query.

    Returns: List of (chunk, similarity_score) tuples
    """
    # Embed the query
    query_embedding = embed_model.encode([query]).astype('float32')
    faiss.normalize_L2(query_embedding)

    # Search
    scores, indices = index.search(query_embedding, top_k)

    results = []
    for score, idx in zip(scores[0], indices[0]):
        if idx != -1:
            results.append((all_chunks[idx], float(score)))

    return results

In [None]:
# Test retrieval
# ============================================
# TRY DIFFERENT QUERIES FOR YOUR CORPUS!
# ============================================
test_query = "What is the procedure for engine maintenance?"  # ← Modify this!

if index.ntotal > 0:
    results = retrieve(test_query, top_k=5)

    print(f"Query: {test_query}\n")
    print("Top 5 retrieved chunks:")
    for i, (chunk, score) in enumerate(results, 1):
        print(f"\n[{i}] Score: {score:.4f} | Source: {chunk.source_file}")
        print(f"    {chunk.text[:200]}...")
else:
    print("Index is empty - please load, chunk, and embed documents first.")

Query: What is the procedure for engine maintenance?

Top 5 retrieved chunks:

[1] Score: 0.4028 | Source: CREC-2026-01-22.txt
    the property and cure all project
deficiencies or seek a judicial order of specific performance requiring the owner to
cure all project deficiencies;
(H) work with the owner, lender, or other
related ...

[2] Score: 0.3517 | Source: CREC-2026-01-06.txt
    Operation and Maintenance

Council Grove Lake, KS; U.S. Army
Corps of Engineers

2,095,000

13,725,000

15,820,000

Moran

S

Army Corps of Engineers (Civil)

Operation and Maintenance

Kanopolis Lake...

[3] Score: 0.3493 | Source: CREC-2026-01-22-bk2.txt
    as and encourages the Department to
quickly identify and empower strong leadership to properly oversee and manage its biomanufacturing programs, establish a rapid
tempo of execution commensurate with ...

[4] Score: 0.3481 | Source: CREC-2026-01-08-bk3.txt
    o hire and train dedicated
personnel for all activities included under
the Energy and Mine

---
## Stage 6: Generation (LLM)

Now we load a local LLM to generate answers from the retrieved context.

**Recommended models:**
- `Qwen/Qwen2.5-1.5B-Instruct` - Best instruction following at this size
- `Qwen/Qwen2.5-3B-Instruct` - Even better if you have 8GB+ VRAM
- `meta-llama/Llama-3.2-1B-Instruct` - Alternative, slightly weaker

**Device handling:**
- CUDA: Uses `device_map="auto"` and float16
- MPS: Loads to CPU first, then moves to MPS with float32
- CPU: Uses float32 (slower but works)

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# ============================================
# CHOOSE YOUR MODEL
# ============================================
LLM_MODEL = "Qwen/Qwen2.5-1.5B-Instruct"  # Or try "Qwen/Qwen2.5-3B-Instruct"

print(f"Loading LLM: {LLM_MODEL}")
print(f"Device: {DEVICE}, Dtype: {DTYPE}")
print("This may take a few minutes on first run...\n")

tokenizer = AutoTokenizer.from_pretrained(LLM_MODEL)

# Load with appropriate settings for each device type
if DEVICE == 'cuda':
    model = AutoModelForCausalLM.from_pretrained(
        LLM_MODEL,
        device_map="auto",
        torch_dtype=DTYPE,
        trust_remote_code=True
    )
    print("Model loaded on CUDA")

elif DEVICE == 'mps':
    # For MPS, load to CPU first, then move to MPS
    # (device_map="auto" doesn't work well with MPS)
    model = AutoModelForCausalLM.from_pretrained(
        LLM_MODEL,
        torch_dtype=DTYPE,
        trust_remote_code=True
    )
    model = model.to(DEVICE)
    print("Model loaded on MPS (Apple Silicon)")

else:
    # CPU
    model = AutoModelForCausalLM.from_pretrained(
        LLM_MODEL,
        torch_dtype=DTYPE,
        trust_remote_code=True
    )
    print("Model loaded on CPU (this will be slow)")

Loading LLM: Qwen/Qwen2.5-1.5B-Instruct
Device: cuda, Dtype: torch.float16
This may take a few minutes on first run...



`torch_dtype` is deprecated! Use `dtype` instead!


Model loaded on CUDA


In [None]:
def generate_response(prompt: str, max_new_tokens: int = 512, temperature: float = 0.3) -> str:
    """
    Generate a response from the LLM.

    Lower temperature = more focused/deterministic
    Higher temperature = more creative/random
    """
    inputs = tokenizer(prompt, return_tensors="pt")

    # Move inputs to the correct device
    if DEVICE == 'cuda':
        inputs = {k: v.to(model.device) for k, v in inputs.items()}
    else:
        inputs = {k: v.to(DEVICE) for k, v in inputs.items()}

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=temperature,
            do_sample=temperature > 0,  # sample only when temperature is positive
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode only the new tokens
    response = tokenizer.decode(
        outputs[0][inputs['input_ids'].shape[1]:],
        skip_special_tokens=True
    )

    return response.strip()
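
To make the docstring's temperature note concrete, here is a minimal standalone sketch of how temperature rescales logits before sampling. The logits are made-up values for illustration, and `softmax_with_temperature` is a hypothetical helper, not part of `transformers`; real generation applies the same idea over the full vocabulary.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy decoding for 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens
example_logits = [2.0, 1.0, 0.5]

for t in (0.3, 1.0, 2.0):
    probs = softmax_with_temperature(example_logits, t)
    print(f"temperature={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

At 0.3 (the default in `generate_response`) most of the probability mass lands on the top-scoring token, which is why low temperatures give more repeatable answers.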

---
## Stage 7: The Complete RAG Pipeline

Now we put it all together. The **prompt template** is critical: it must instruct the model to use the retrieved context.
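
Since `rag_query` fills the template with `str.format`, a custom template that leaves out `{context}` is filled without any error and the retrieved text is silently dropped, while an unknown placeholder raises `KeyError`. A minimal sketch for checking a template up front; `check_template` and the two example templates are hypothetical and not part of the notebook.

```python
from string import Formatter

REQUIRED_FIELDS = {"context", "question"}

def check_template(template: str):
    """Return (missing, unexpected) placeholder names for a RAG prompt template."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion)
    found = {name for _, name, _, _ in Formatter().parse(template) if name}
    return REQUIRED_FIELDS - found, found - REQUIRED_FIELDS

good = "CONTEXT:\n{context}\n\nQUESTION: {question}\n\nANSWER:"
bad = "Answer using the context.\n\nQUESTION: {query}\n\nANSWER:"

print(check_template(good))
print(check_template(bad))
```

Both returned sets are empty for a usable template; anything in the first set would be silently omitted from the prompt, and anything in the second would make `format` fail.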

In [None]:
# The RAG prompt template
PROMPT_TEMPLATE = """You are a helpful assistant that answers questions based on the provided context.

CONTEXT:
{context}

QUESTION: {question}

INSTRUCTIONS:
- Answer the question based ONLY on the information in the context above
- If the context doesn't contain enough information to answer, say so
- Quote relevant parts of the context to support your answer
- Be concise and direct
- If the context doesn't contain the answer, say 'I cannot answer this from the available documents.'

ANSWER:"""


def direct_query(question: str, max_new_tokens: int = 512) -> str:
    """Ask the LLM directly with no retrieved context (for RAG vs no-RAG comparison)."""
    prompt = f"""Answer this question:
{question}

Answer:"""
    return generate_response(prompt, max_new_tokens=max_new_tokens)

def rag_query(question: str, top_k: int = 5, show_context: bool = False, prompt_template: str = None) -> str:
    """The complete RAG pipeline. prompt_template: custom template for Exercise 10."""
    # Step 1: Retrieve
    results = retrieve(question, top_k)

    # Format context
    context_parts = []
    for chunk, score in results:
        context_parts.append(f"[Source: {chunk.source_file}, Relevance: {score:.3f}]\n{chunk.text}")
    context = "\n\n---\n\n".join(context_parts)

    if show_context:
        print("=" * 60)
        print("RETRIEVED CONTEXT:")
        print("=" * 60)
        print(context)
        print("=" * 60 + "\n")

    # Step 2: Build prompt (use custom template if provided)
    template = prompt_template if prompt_template is not None else PROMPT_TEMPLATE
    prompt = template.format(context=context, question=question)

    # Step 3: Generate
    answer = generate_response(prompt)

    return answer

In [None]:
# ============================================
# TEST YOUR RAG PIPELINE!
# ============================================

question = "What maintenance is required for the engine?"  # ← Modify for your corpus!

if index.ntotal > 0:
    print(f"Question: {question}\n")
    print("Generating answer...\n")

    answer = rag_query(question, top_k=5, show_context=True)

    print("ANSWER:")
    print(answer)
else:
    print("Pipeline not ready - please complete all previous stages first.")

Question: What maintenance is required for the engine?

Generating answer...

RETRIEVED CONTEXT:
[Source: ModelTNew.txt, Relevance: 0.529]
. . . . . . . .    1      00
3     Install generator , test and remove car covers . . . . . . .                             15

                                                                                        1      25
                        CHAPTER XXX

            Starting Motor Overhaul

---

[Source: ModelTNew.txt, Relevance: 0.510]
g, clean all parts thoroughly, also lubricate all
  moving parts and the surfaces upon which they move, such as
  bearings, bushings, pistons, cylinders, etc. Draw all bolts, nuts and
  cap screws down tightly, making sure to replace lock washers and
  cotter pins as required.

---

[Source: ModelTNew.txt, Relevance: 0.510]
.......          30
3     Install hood, fill radiator with water, remove car covers.. .                                    8

                                                             

---
## Experiments: Understanding RAG Behavior

Now that you have a working pipeline, try these experiments to understand how each component affects the results.

In [None]:
# # EXPERIMENT 1: Compare WITH vs WITHOUT RAG
# # ==========================================

# question = "What are the specifications for the landing gear?"  # ← Use a corpus-specific question!

# if index.ntotal > 0:
#     # WITHOUT RAG - just ask the model directly
#     direct_prompt = f"""Answer this question:
# {question}

# Answer:"""

#     print("WITHOUT RAG (model's own knowledge):")
#     print("-" * 40)
#     direct_answer = generate_response(direct_prompt)
#     print(direct_answer)

#     print("\n" + "=" * 60 + "\n")

#     # WITH RAG
#     print("WITH RAG (using retrieved context):")
#     print("-" * 40)
#     rag_answer = rag_query(question, top_k=5)
#     print(rag_answer)
# else:
#     print("Please complete the pipeline setup first.")


# Run for Model T
model_t_questions = [
    "How do I adjust the carburetor on a Model T?",
    "What is the correct spark plug gap for a Model T Ford?",
    "How do I fix a slipping transmission band?",
    "What oil should I use in a Model T engine?",
]

def compare_no_rag_vs_rag(questions, top_k=5, show_context=True):
    if index.ntotal == 0:
        print("Please complete the pipeline setup first (index is empty).")
        return

    for question in questions:
        print("\n" + "=" * 90)
        print("QUESTION:", question)

        # WITHOUT RAG
        direct_prompt = f"""Answer this question:
{question}

Answer:"""
        print("\nWITHOUT RAG (model's own knowledge):")
        print("-" * 40)
        direct_answer = generate_response(direct_prompt)
        print(direct_answer)

        print("\n" + "-" * 40)

        # WITH RAG
        print("WITH RAG (using retrieved context):")
        print("-" * 40)
        rag_answer = rag_query(question, top_k=top_k, show_context=show_context)
        print(rag_answer)

        print("=" * 90)

compare_no_rag_vs_rag(model_t_questions, top_k=5, show_context=True)



QUESTION: How do I adjust the carburetor on a Model T?

WITHOUT RAG (model's own knowledge):
----------------------------------------
To adjust the carburetor on a Model T, you will need to follow these steps:

1. Locate the choke lever and set it in the "off" position.
2. Remove the spark plug wire by loosening the nut at the top of the cylinder head with an open-end wrench or socket.
3. Use a screwdriver to remove the small screws holding the carburetor assembly in place.
4. Once the carburetor is removed, locate the float bowl and replace any worn-out parts such as the float, needle valve, or jets.
5. Adjust the idle speed by turning the adjustment screw located near the throttle body until the engine runs smoothly without hesitation.
6. Check for proper fuel flow by using a fuel pressure gauge if available. The recommended fuel pressure should be around 20-30 psi.
7. Reassemble the carburetor and reinstall the spark plug wire.
8. Start the engine and test its performance.

Note: T

In [None]:
%cd /content/cs6501workshop
!mkdir -p Topic5RAG/outputs

/content/cs6501workshop


In [None]:
cr_questions = [
    "What did Mr. Flood have to say about Mayor David Black in Congress on January 13, 2026?",
    "What mistake did Elise Stefanovic make in Congress on January 23, 2026?",
    "What is the purpose of the Main Street Parity Act?",
    "Who in Congress has spoken for and against funding of pregnancy centers?",
]

# Run for Congressional Record
compare_no_rag_vs_rag(cr_questions, top_k=5, show_context=True)




QUESTION: What did Mr. Flood have to say about Mayor David Black in Congress on January 13, 2026?

WITHOUT RAG (model's own knowledge):
----------------------------------------
In a speech delivered at the House of Representatives on January 13, 2026, Mr. Flood expressed his support for Mayor David Black's leadership and dedication to improving the city's infrastructure. He emphasized that Mayor Black had demonstrated strong commitment to public service and was well-liked by constituents. The mayor had also shown resilience during challenging times, which Mr. Flood commended. Additionally, Mr. Flood highlighted Mayor Black's efforts to address traffic congestion and improve transportation systems within the city. He concluded by expressing confidence in Mayor Black's ability to continue leading the city effectively and positively influence its future development. This statement reflects a positive outlook on Mayor Black's performance and his role as a leader in the community.

-------

In [None]:
!git add .
!git commit -m "Topic 5 Experiment 1"
!git push

[main 3bf3aae] Topic 5 Experiment 1
 2 files changed, 828 insertions(+)
 create mode 100644 Topic5RAG/outputs/experiment1_congrecord.txt
 create mode 100644 Topic5RAG/outputs/experiment1_modelt.txt
Enumerating objects: 7, done.
Counting objects: 100% (7/7), done.
Delta compression using up to 2 threads
Compressing objects: 100% (5/5), done.
Writing objects: 100% (6/6), 13.05 KiB | 6.53 MiB/s, done.
Total 6 (delta 1), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
To https://github.com/shainakumar/cs6501workshop.git
   e4bfeff..3bf3aae  main -> main


## Top K Experiments

In [None]:
# # EXPERIMENT 2: Effect of top_k
# # ==========================================

# question = "What safety procedures are required?"  # ← Use a corpus-specific question!

# if index.ntotal > 0:
#     for k in [1, 3, 5, 10]:
#         print(f"\n{'='*60}")
#         print(f"TOP_K = {k}")
#         print(f"{'='*60}")
#         answer = rag_query(question, top_k=k)
#         print(answer[:500] + "..." if len(answer) > 500 else answer)
# else:
#     print("Please complete the pipeline setup first.")

In [None]:
# EXPERIMENT 2: Effect of top_k
# =================================================

import time

questions = model_t_questions[:3]  # use 3–5 questions
k_values = [1, 3, 5, 10, 20]

if index.ntotal > 0:
    for q in questions:
        print("\n" + "="*90)
        print("QUESTION:", q)

        for k in k_values:
            print(f"\n{'-'*60}")
            print(f"TOP_K = {k}")

            # perf_counter is a monotonic clock intended for measuring elapsed time
            start = time.perf_counter()
            answer = rag_query(q, top_k=k)
            latency = time.perf_counter() - start

            print("\nAnswer:")
            print(answer)
            print(f"\nLatency: {latency:.2f} seconds")

        print("="*90)
else:
    print("Please complete the pipeline setup first.")


QUESTION: How do I adjust the carburetor on a Model T?

------------------------------------------------------------
TOP_K = 1

Answer:
To adjust the carburetor on a Model T, you need to follow these steps:

1. Insert the end of the rod through the throttle lever "B".
2. Lock this rod in place using a cotter pin.
3. Next, install the carburetor adjusting rod by threading its head into the dash slot.
4. Position the forked end of rod "C" over the carburetor needle valve.
5. Finally, secure the rod in place with a cotter key at the end.

These instructions provide a clear method for adjusting the carburetor on a Model T vehicle as described in the given context. The process involves securing rods through specific slots and holes to ensure proper alignment and adjustment of components within the carburetor system. This ensures optimal fuel delivery during engine operation. 

The context does not mention any additional tools or methods beyond those listed, making it straightforward and co

In [None]:
!git add .
!git commit -m"exercise 4"
!git push

[main 1f1aa74] exercise 4
 1 file changed, 254 insertions(+)
 create mode 100644 Topic5RAG/outputs/exercise4.txt
Enumerating objects: 8, done.
Counting objects: 100% (8/8), done.
Delta compression using up to 2 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (5/5), 5.66 KiB | 483.00 KiB/s, done.
Total 5 (delta 2), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
To https://github.com/shainakumar/cs6501workshop.git
   649fd60..1f1aa74  main -> main


## Exercise 2

In [None]:
!pip install openai



In [None]:
from openai import OpenAI
import os

from google.colab import userdata

os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")

client = OpenAI()

def gpt4o_mini_query(question):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "user", "content": question}
        ],
        temperature=0  # important for fair comparison
    )
    return response.choices[0].message.content

In [None]:
def run_gpt4o_mini(questions):
    for q in questions:
        print("\n" + "="*90)
        print("QUESTION:", q)
        print("\nGPT-4o Mini (NO RAG)")
        print("-"*40)
        print(gpt4o_mini_query(q))
        print("="*90)

# Model T
run_gpt4o_mini(model_t_questions)

# Congressional Record
run_gpt4o_mini(cr_questions)


QUESTION: How do I adjust the carburetor on a Model T?

GPT-4o Mini (NO RAG)
----------------------------------------
Adjusting the carburetor on a Model T Ford involves several steps to ensure proper fuel-air mixture and engine performance. Here’s a general guide to help you with the adjustment:

### Tools Needed:
- Screwdriver
- Wrench (if necessary)
- Tachometer (optional, for fine-tuning)

### Steps to Adjust the Carburetor:

1. **Warm Up the Engine**: Start the engine and let it warm up to operating temperature. This ensures that the adjustments you make are accurate.

2. **Locate the Carburetor**: The Model T typically has a simple carburetor mounted on the side of the engine. Familiarize yourself with its components, including the mixture adjustment screw and the throttle.

3. **Adjust the Mixture**:
   - **Idle Mixture**: Locate the mixture adjustment screw, usually found on the side of the carburetor. 
   - **Initial Setting**: Turn the screw clockwise until it lightly seat

In [None]:
!git add .
!git commit -m "exercise 2"
!git push

[main a32c969] exercise 2
 3 files changed, 0 insertions(+), 0 deletions(-)
 rename Topic5RAG/outputs/{experiment1_congrecord.txt => exercise1_congrecord.txt} (100%)
 rename Topic5RAG/outputs/{experiment1_modelt.txt => exercise1_modelt.txt} (100%)
 rename Topic5RAG/outputs/{experiment2.txt => exercise2.txt} (100%)
Enumerating objects: 7, done.
Counting objects: 100% (7/7), done.
Delta compression using up to 2 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 410 bytes | 410.00 KiB/s, done.
Total 4 (delta 1), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
To https://github.com/shainakumar/cs6501workshop.git
   bd9493b..a32c969  main -> main


## Exercise 3

In [None]:
!git add .
!git commit -m"exercise 3"
!git push

[main 649fd60] exercise 3
 1 file changed, 207 insertions(+)
 create mode 100644 Topic5RAG/outputs/exercise3.txt
Enumerating objects: 8, done.
Counting objects: 100% (8/8), done.
Delta compression using up to 2 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (5/5), 3.59 KiB | 3.59 MiB/s, done.
Total 5 (delta 2), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
To https://github.com/shainakumar/cs6501workshop.git
   a32c969..649fd60  main -> main


## Exercise 5

In [None]:
unanswerable_questions = [
    # Completely off-topic
    "What is the capital of France?",

    # Related but likely not in the manual (depends on your manual content)
    "What's the horsepower of a 1925 Model T?",

    # False premise / leading
    "Why does the manual recommend synthetic oil?",
    "Which section says to use 5W-30 full synthetic oil?",
]

def test_unanswerables(questions, top_k=5, show_context=True):
    if index.ntotal == 0:
        print("Index empty: build the RAG pipeline first.")
        return

    for q in questions:
        print("\n" + "="*90)
        print("QUESTION:", q)

        # NO RAG
        print("\nNO RAG:")
        print("-"*40)
        direct_prompt = f"""Answer this question:
{q}

Answer:"""
        print(generate_response(direct_prompt))

        # WITH RAG
        print("\nWITH RAG:")
        print("-"*40)
        print(rag_query(q, top_k=top_k, show_context=show_context))

test_unanswerables(unanswerable_questions, top_k=5, show_context=True)


QUESTION: What is the capital of France?

NO RAG:
----------------------------------------
Paris
Paris is the capital city of France. It's located in northern France and is known for its famous landmarks such as the Eiffel Tower, Notre-Dame Cathedral, and Louvre Museum. The city has a population of over 2 million people and serves as the center of government, finance, culture, education, science, and transportation for the country. Paris is also home to many world-renowned fashion houses, including Chanel, Dior, and Louis Vuitton. The city is known for its beautiful architecture, art museums, and cultural events, making it one of the most popular tourist destinations in Europe. Additionally, Paris hosts the annual French presidential election every five years, where citizens elect their president through a direct vote. In summary, Paris is the capital city of France, serving as both the political and economic hub of the nation.

WITH RAG:
----------------------------------------
RETRI
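
One way to tally results for this exercise is to check whether each answer contains the refusal sentence that `PROMPT_TEMPLATE` asks for. The sketch below is an assumption-laden starting point: `is_refusal` is a hypothetical helper, the looser markers are guessed paraphrases, and the sample answers are invented for illustration rather than taken from a real run. Substring matching misses refusals phrased differently, so spot-check the outputs by hand.

```python
REFUSAL_MARKERS = (
    # Exact sentence requested by the prompt template, lowercased
    "i cannot answer this from the available documents",
    # Looser, assumed paraphrases; extend after reading real outputs
    "does not contain enough information",
    "doesn't contain enough information",
)

def is_refusal(answer: str) -> bool:
    """Return True if the answer contains any known refusal marker."""
    normalized = " ".join(answer.lower().split())  # lowercase, collapse whitespace
    return any(marker in normalized for marker in REFUSAL_MARKERS)

# Invented answers, for illustration only
sample_answers = {
    "off-topic": "I cannot answer this from the available documents.",
    "paraphrased": "The context does not contain enough information to say.",
    "hallucinated": "The manual recommends synthetic oil because it lasts longer.",
}

for label, answer in sample_answers.items():
    print(f"{label:>12}: refusal={is_refusal(answer)}")
```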

In [None]:
!git add .
!git commit -m"exercise 6"
!git push

[main c1c8e49] exercise 6
 2 files changed, 754 insertions(+)
 create mode 100644 Topic5RAG/outputs/exercise5_modifiedprompt.txt
 create mode 100644 Topic5RAG/outputs/exercise5_originalprompt.txt
Enumerating objects: 9, done.
Counting objects: 100% (9/9), done.
Delta compression using up to 2 threads
Compressing objects: 100% (5/5), done.
Writing objects: 100% (6/6), 9.11 KiB | 1.30 MiB/s, done.
Total 6 (delta 3), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (3/3), completed with 2 local objects.
To https://github.com/shainakumar/cs6501workshop.git
   1f1aa74..c1c8e49  main -> main


## Exercise 6

In [None]:
phrasings = {
    "Formal": "What is the correct spark plug gap specification for a Model T Ford?",
    "Casual": "How far apart should the Model T spark plug be?",
    "Keywords": "Model T spark plug gap",
    "Question form": "What gap do I set the spark plugs to?",
    "Indirect": "Ignition system spark plug spacing requirement",
    "More specific": "Spark plug gap in thousandths of an inch for a Model T",
}

In [None]:
def show_top_chunks(query, top_k=5, max_chars=250):
    results = retrieve(query, top_k)
    print("\n" + "="*90)
    print("QUERY:", query)
    print("="*90)
    for i, (chunk, score) in enumerate(results, 1):
        preview = chunk.text.replace("\n", " ").strip()
        preview = preview[:max_chars] + ("..." if len(preview) > max_chars else "")
        print(f"\n#{i}  score={score:.3f}  source={chunk.source_file}")
        print(preview)

all_results = {}  # store for overlap comparisons

for label, q in phrasings.items():
    results = retrieve(q, top_k=5)
    all_results[label] = results
    show_top_chunks(q, top_k=5, max_chars=250)


QUERY: What is the correct spark plug gap specification for a Model T Ford?

#1  score=0.542  source=ModelTNew.pdf
lacing a screw driver on the terminal and resting it  against the radiator stay rod (See Fig. 503). (The commutator ter- 237    [Page 256] 238  FORD SERVICE  Fig. 503  minals on the coil box are the four upper terminals and for conven- ience are numb...

#2  score=0.536  source=ModelTNew.txt
0" higher clearance                     Fig. 210                          Fig. 2ll 86                            FORD SERVICE       t:1an the upper half. Shims (See Fig. 210) are furnished in various      thicknesses so that extremely close adjustme...

#3  score=0.510  source=ModelTNew.txt
. . . . . . . . . . . . . . . . . . . . . . . . . . . 149               installing and removing .... . .. . ....... . ........ 66- 60 Fuel system, tracing trouble in ....... . .. . .. . .. . ... . . . . . . .... 1014                                  ...

#4  score=0.500  source=ModelTNew.txt
iddle

In [None]:
def chunk_id(chunk):
    # create a simple stable-ish identifier
    return (chunk.source_file, chunk.text[:80])

# Build sets of chunk IDs for each phrasing
result_sets = {
    label: {chunk_id(chunk) for (chunk, score) in results}
    for label, results in all_results.items()
}

labels = list(result_sets.keys())

print("\n" + "="*90)
print("OVERLAP (count of shared chunks in top-5)")
print("="*90)

# Pairwise overlap counts
for i in range(len(labels)):
    for j in range(i+1, len(labels)):
        a, b = labels[i], labels[j]
        overlap = len(result_sets[a] & result_sets[b])
        print(f"{a:>12} vs {b:<12}  overlap={overlap}/5")


OVERLAP (count of shared chunks in top-5)
      Formal vs Casual        overlap=1/5
      Formal vs Keywords      overlap=1/5
      Formal vs Question form  overlap=1/5
      Formal vs Indirect      overlap=1/5
      Formal vs More specific  overlap=0/5
      Casual vs Keywords      overlap=4/5
      Casual vs Question form  overlap=2/5
      Casual vs Indirect      overlap=2/5
      Casual vs More specific  overlap=4/5
    Keywords vs Question form  overlap=2/5
    Keywords vs Indirect      overlap=2/5
    Keywords vs More specific  overlap=3/5
Question form vs Indirect      overlap=3/5
Question form vs More specific  overlap=1/5
    Indirect vs More specific  overlap=1/5
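
The overlap counts above depend on `chunk_id`, which keys each chunk on its first 80 characters, so two different chunks that begin with the same text (for example, runs of dot leaders from an index page) would be counted as one. A minimal sketch of an alternative, assuming chunks expose a source file and full text: hash the whole text, then report Jaccard similarity (intersection over union) instead of a raw count. `chunk_key`, `jaccard`, and the toy data are hypothetical.

```python
import hashlib
from itertools import combinations

def chunk_key(source_file: str, text: str) -> str:
    """Identify a chunk by its source file plus a hash of its full text."""
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest()[:16]
    return f"{source_file}:{digest}"

def jaccard(a: set, b: set) -> float:
    """Intersection size divided by union size (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Made-up retrieval results: label -> list of (source_file, chunk_text)
toy_results = {
    "Formal": [("manual.txt", "spark plug gap"), ("manual.txt", "coil box terminals")],
    "Casual": [("manual.txt", "spark plug gap"), ("manual.txt", "valve clearance")],
}

key_sets = {
    label: {chunk_key(src, text) for src, text in hits}
    for label, hits in toy_results.items()
}

for a, b in combinations(key_sets, 2):
    print(f"{a} vs {b}: jaccard={jaccard(key_sets[a], key_sets[b]):.2f}")
```

SHA-1 is used here only as a cheap fingerprint, not for security; any stable hash would do.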


In [None]:
import numpy as np

print("\n" + "="*90)
print("SCORE SUMMARY")
print("="*90)

for label, results in all_results.items():
    scores = [score for (chunk, score) in results]
    print(f"{label:>12}: max={max(scores):.3f}  mean={np.mean(scores):.3f}  min={min(scores):.3f}")


SCORE SUMMARY
      Formal: max=0.542  mean=0.517  min=0.496
      Casual: max=0.525  mean=0.477  min=0.447
    Keywords: max=0.586  mean=0.537  min=0.500
Question form: max=0.590  mean=0.550  min=0.530
    Indirect: max=0.516  mean=0.464  min=0.447
More specific: max=0.593  mean=0.545  min=0.502


In [None]:
!git add .
!git commit -m "exercise 6"
!git push

[main d6df5ac] exercise 6
 1 file changed, 148 insertions(+)
 create mode 100644 Topic5RAG/outputs/exercise6.txt
Enumerating objects: 8, done.
Counting objects: 100% (8/8), done.
Delta compression using up to 2 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (5/5), 2.47 KiB | 2.47 MiB/s, done.
Total 5 (delta 2), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
To https://github.com/shainakumar/cs6501workshop.git
   c1c8e49..d6df5ac  main -> main


---
## Save/Load Your Index

For large corpora, you don't want to re-embed every time. Here's how to persist the index.

In [None]:
import pickle

def save_index(filepath: str):
    """Save FAISS index and chunks to disk."""
    faiss.write_index(index, f"{filepath}.faiss")
    with open(f"{filepath}.chunks", 'wb') as f:
        pickle.dump(all_chunks, f)
    print(f"✓ Saved index to {filepath}.faiss")
    print(f"✓ Saved chunks to {filepath}.chunks")

def load_saved_index(filepath: str):
    """Load FAISS index and chunks from disk."""
    global index, all_chunks
    index = faiss.read_index(f"{filepath}.faiss")
    with open(f"{filepath}.chunks", 'rb') as f:
        all_chunks = pickle.load(f)
    print(f"✓ Loaded index with {index.ntotal} vectors")

# Save your index
if index.ntotal > 0:
    save_index("my_rag_index")
else:
    print("No index to save.")

# Later, to load:
# load_saved_index("my_rag_index")
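
Because the index and the chunk list are saved to two separate files, they can drift apart (for example, if you re-chunk and overwrite only one of them). Assuming retrieval maps FAISS row positions to entries of `all_chunks`, a mismatch would return the wrong text for a hit. The sketch below is a hypothetical sanity check to run after loading; it uses stand-in objects so it runs without FAISS installed, and relies only on the `ntotal` attribute that FAISS indexes expose.

```python
from types import SimpleNamespace

def check_alignment(index, chunks) -> None:
    """Raise ValueError if the vector count and the chunk count disagree."""
    if index.ntotal != len(chunks):
        raise ValueError(
            f"index has {index.ntotal} vectors but {len(chunks)} chunks were loaded"
        )

# Stand-ins for a loaded FAISS index and chunk list
fake_index = SimpleNamespace(ntotal=3)
fake_chunks = ["chunk a", "chunk b", "chunk c"]

check_alignment(fake_index, fake_chunks)
print("index and chunks are aligned")
```

In the notebook this would be called as `check_alignment(index, all_chunks)` right after `load_saved_index(...)`. Separately, `pickle.load` can execute arbitrary code, so only load `.chunks` files you created yourself.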

---
## Next Steps

You've built a complete RAG pipeline from scratch! In the next class, we'll:

1. **Improve retrieval** with query rewriting and hybrid search
2. **Rebuild with LangChain** to see how frameworks abstract these steps
3. **Evaluate systematically** with test questions and metrics

### Exercises to try:
- Vary chunk size (256, 512, 1024) and measure retrieval quality
- Try a different embedding model (`BAAI/bge-small-en-v1.5`)
- Try a larger LLM (`Qwen/Qwen2.5-3B-Instruct`) and compare answer quality
- Ask questions that require combining information from multiple chunks

---
## Appendix: Device Information

Run this cell to see detailed information about your compute environment.

In [None]:
def print_device_info():
    """Print detailed information about available compute devices."""
    print("=" * 60)
    print("DEVICE INFORMATION")
    print("=" * 60)

    print(f"\nEnvironment: {ENVIRONMENT}")
    print(f"PyTorch version: {torch.__version__}")

    # CUDA
    print(f"\nCUDA available: {torch.cuda.is_available()}")
    if torch.cuda.is_available():
        print(f"  Device: {torch.cuda.get_device_name(0)}")
        print(f"  Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

    # MPS
    print(f"\nMPS available: {torch.backends.mps.is_available()}")
    print(f"MPS built: {torch.backends.mps.is_built()}")

    # Current selection
    print(f"\n→ Selected device: {DEVICE}")
    print(f"→ Selected dtype: {DTYPE}")
    print("=" * 60)

print_device_info()

In [None]:
# EXPERIMENT 3: Question the corpus CAN'T answer
# ==========================================
# Does the model admit it doesn't know, or hallucinate?

unanswerable_question = "What is the CEO's favorite color?"

if index.ntotal > 0:
    print(f"Question: {unanswerable_question}\n")
    answer = rag_query(unanswerable_question, top_k=5, show_context=True)
    print(f"\nAnswer: {answer}")
else:
    print("Please complete the pipeline setup first.")