
# Day 2 — Exercise 2: Baseline RAG & Hybrid Retrieval (with LiteLLM + RRF)

**Objective:** Develop a retrieval-augmented generation (RAG) system with **hybrid retrieval**.

**Tasks:**
- Build a simple **RAG pipeline (retrieve → read)** using the vector store from Ex1 to answer 10 domain-specific queries.
- Extend it to a **hybrid retrieval** system combining **BM25 (sparse)** and **dense embeddings**, and merge with **Reciprocal Rank Fusion (RRF)**.
- Compare **Top-5 retrieved documents** for **BM25**, **Dense**, and **Hybrid** approaches.

**Learning Outcome:** Learn the mechanics of RAG and how **hybrid retrieval** improves relevance over single-method retrieval.

> Notes:
> - This notebook uses **LiteLLM** as the API wrapper for embeddings & chat models, so you can switch providers with minimal code changes.
> - Optional: a debug/enrichment panel shows how to query a FIL/Perplexity-style endpoint for web search (off by default).



## 1. Setup
Install dependencies and import packages. Rerun the install cell whenever you switch to a different environment or kernel.


In [3]:

%pip install -qU litellm chromadb rank_bm25 faiss-cpu numpy pandas scikit-learn tqdm python-dotenv


Note: you may need to restart the kernel to use updated packages.


In [4]:

import os
import json
import time
import math
import glob
import uuid
from dataclasses import dataclass
from typing import List, Dict, Any, Tuple

import numpy as np
import pandas as pd
from tqdm import tqdm

from rank_bm25 import BM25Okapi

# For cosine similarity
from sklearn.metrics.pairwise import cosine_similarity

# LiteLLM wrapper
from litellm import embedding, completion

# Optional: Chroma or FAISS (we'll support both paths)
import chromadb
from chromadb.config import Settings as ChromaSettings

import faiss



## 2. Configuration

Centralize all knobs here. Update paths and keys before running.


In [5]:

# ==== Paths & Artifacts ====
DATA_DIR = os.getenv("DOCS_DIR", "./data/docs")           # Where raw docs live (used if rebuilding index)
ARTIFACT_DIR = "./artifacts/day2_ex2"
os.makedirs(ARTIFACT_DIR, exist_ok=True)

# Vector store choices: "chroma" or "faiss"
VECTOR_STORE_KIND = os.getenv("VECTOR_STORE_KIND", "chroma")

# Persisted vector store paths
CHROMA_DIR = os.path.join(ARTIFACT_DIR, "chroma_store")
FAISS_INDEX_PATH = os.path.join(ARTIFACT_DIR, "faiss.index")
FAISS_META_PATH = os.path.join(ARTIFACT_DIR, "faiss_meta.json")

# ==== Retrieval Settings ====
TOP_K_RETRIEVE = 5
RRF_K = 60  # reciprocal rank fusion constant

# ==== Embeddings ====
# Two models to compare interchangeably via LiteLLM
EMBED_MODEL_A = os.getenv("EMBED_MODEL_A", "text-embedding-3-small")  # OpenAI
EMBED_MODEL_B = os.getenv("EMBED_MODEL_B", "sentence-transformers/all-MiniLM-L6-v2")  # HF via LiteLLM route (if configured)

# Normalize vectors for cosine similarity (recommended)
NORMALIZE_EMBEDDINGS = True

# ==== Reader / Chat Model ====
CHAT_MODEL = os.getenv("CHAT_MODEL", "gpt-4o-mini")  # Any LiteLLM-supported chat model

# ==== Keys / Environment ====
# Set provider keys in env, e.g., OPENAI_API_KEY, etc.
# If using Azure/OpenAI/Anthropic/etc., set their corresponding env vars as required by LiteLLM.
os.environ.setdefault("LITELLM_LOG", "INFO")  # or DEBUG

# ==== Optional: FIL/Perplexity-style Web Debug Panel ====
USE_FIL_PERPLEXITY = os.getenv("USE_FIL_PERPLEXITY", "false").lower() == "true"
FIL_HOST = os.getenv("FIL_HOST", "")
FIL_ACCESS_TOKEN = os.getenv("FIL_ACCESS_TOKEN", "")



## 3. Document Loading & Chunking (fallback if vector store not found)

If a persisted vector store isn't found, we will rebuild by:
1. Loading files from `DATA_DIR` (txt/pdf/csv basic loader).
2. Chunking text.
3. Indexing into the chosen vector store (Chroma/FAISS) with LiteLLM embeddings.
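
To make the chunk-size and overlap arithmetic in step 2 concrete, here is a minimal sketch of a sliding-window word chunker on a toy input. The name `sliding_chunks` and the tiny sizes are illustrative assumptions, not the notebook's settings; the real chunker in the next cell works on the same principle with larger windows.

```python
from typing import List

def sliding_chunks(words: List[str], size: int, overlap: int) -> List[List[str]]:
    """Split `words` into windows of `size` words, where consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks

words = "a b c d e f g h i j".split()
chunks = sliding_chunks(words, size=4, overlap=1)
for c in chunks:
    print(c)
```

Each window starts `size - overlap` words after the previous one, so the last word of one chunk reappears as the first word of the next. That shared text is what keeps a sentence that straddles a boundary retrievable from at least one chunk.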


In [6]:

def read_txt(path: str) -> str:
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except UnicodeDecodeError:  # retry with a permissive encoding only when UTF-8 decoding fails
        with open(path, "r", encoding="latin-1") as f:
            return f.read()

def read_csv(path: str) -> str:
    df = pd.read_csv(path)
    return df.to_csv(index=False)

def read_pdf(path: str) -> str:
    # minimal no-deps placeholder: in real use, install pypdf and parse properly
    try:
        import pypdf  # type: ignore
        text = []
        reader = pypdf.PdfReader(path)
        for page in reader.pages:
            text.append(page.extract_text() or "")
        return "\n".join(text)
    except Exception:
        # Fallback: just record the file name
        return f"[PDF placeholder] {os.path.basename(path)} (install pypdf for full text)"
    
def load_docs_from_dir(data_dir: str) -> List[Dict[str, Any]]:
    paths = []
    paths += glob.glob(os.path.join(data_dir, "**/*.txt"), recursive=True)
    paths += glob.glob(os.path.join(data_dir, "**/*.csv"), recursive=True)
    paths += glob.glob(os.path.join(data_dir, "**/*.pdf"), recursive=True)

    docs = []
    for p in paths:
        ext = os.path.splitext(p)[1].lower()
        if ext == ".txt":
            content = read_txt(p)
        elif ext == ".csv":
            content = read_csv(p)
        elif ext == ".pdf":
            content = read_pdf(p)
        else:
            continue
        docs.append({
            "doc_id": str(uuid.uuid4()),
            "source": p,
            "text": content
        })
    return docs

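def chunk_text(text: str, chunk_size: int = 800, chunk_overlap: int = 100) -> List[str]:
    # Simple whitespace chunker: sizes are counted in whitespace-separated words, not model tokens
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("Require chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    tokens = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        chunks.append(" ".join(tokens[start:end]))
        if end == len(tokens):
            break  # last window reached the end; avoid emitting a redundant overlap-only chunk
        start += step
    return chunks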



## 4. Embedding Helpers (LiteLLM)

Two interchangeable embedding models are configured. We also support vector normalization, plus small JSON save/load helpers that are used to persist the FAISS metadata.
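
Normalization matters because, for unit-length vectors, the inner product equals cosine similarity, which is why an inner-product index can serve as a cosine index. A minimal pure-Python sketch of that identity follows; the helper names `l2_normalize`, `dot`, and `cosine` are illustrative and not part of the notebook.

```python
import math
from typing import List

def l2_normalize(vec: List[float], eps: float = 1e-12) -> List[float]:
    """Scale `vec` to unit length; `eps` guards against division by zero."""
    norm = math.sqrt(sum(x * x for x in vec)) + eps
    return [x / norm for x in vec]

def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def cosine(a: List[float], b: List[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [3.0, 4.0]
b = [4.0, 3.0]
cos_raw = cosine(a, b)                              # cosine on the raw vectors
dot_unit = dot(l2_normalize(a), l2_normalize(b))    # plain dot product on unit vectors
print(cos_raw, dot_unit)
```

Both values agree up to floating-point error, so normalizing once at indexing time lets every later query use the cheaper dot product.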


In [7]:

def embed_texts(texts: List[str], model: str) -> np.ndarray:
    # Batch for stability
    BATCH = 64
    all_vecs = []
    for i in tqdm(range(0, len(texts), BATCH), desc=f"Embedding with {model}"):
        batch = texts[i:i+BATCH]
        resp = embedding(model=model, input=batch)
        vecs = [d["embedding"] for d in resp["data"]]
        all_vecs.extend(vecs)
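    if not all_vecs:
        # Nothing to embed: return an empty 2-D array instead of failing on axis=1 below
        return np.zeros((0, 0), dtype=np.float32)
    arr = np.array(all_vecs, dtype=np.float32)
    if NORMALIZE_EMBEDDINGS:
        norms = np.linalg.norm(arr, axis=1, keepdims=True) + 1e-12
        arr = arr / norms
    return arr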

def save_json(path: str, obj: Any):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, ensure_ascii=False, indent=2)

def load_json(path: str) -> Any:
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)



## 5. Vector Store Adapters (Chroma & FAISS)

We try to **load** existing stores. If not available, we **rebuild** from `DATA_DIR`.


In [8]:

@dataclass
class Corpus:
    texts: List[str]
    metadatas: List[Dict[str, Any]]
    ids: List[str]

def build_corpus_from_docs(docs: List[Dict[str, Any]], chunk_size=800, chunk_overlap=100) -> Corpus:
    texts, metas, ids = [], [], []
    for d in docs:
        chunks = chunk_text(d["text"], chunk_size=chunk_size, chunk_overlap=chunk_overlap)
        for idx, ch in enumerate(chunks):
            texts.append(ch)
            metas.append({"source": d["source"], "doc_id": d["doc_id"], "chunk": idx})
            ids.append(str(uuid.uuid4()))
    return Corpus(texts=texts, metadatas=metas, ids=ids)

# --------- Chroma ----------
def chroma_load_or_build(corpus: Corpus, embed_model: str):
    client = chromadb.PersistentClient(path=CHROMA_DIR, settings=ChromaSettings(allow_reset=True))
    coll = client.get_or_create_collection(name=f"day2_ex2_{embed_model.replace('/', '_')}")

    # If empty, build it
    if coll.count() == 0:
        vecs = embed_texts(corpus.texts, embed_model).tolist()
        coll.add(documents=corpus.texts, metadatas=corpus.metadatas, ids=corpus.ids, embeddings=vecs)
    return coll

# --------- FAISS ----------
def faiss_load_or_build(corpus: Corpus, embed_model: str):
    if os.path.exists(FAISS_INDEX_PATH) and os.path.exists(FAISS_META_PATH):
        index = faiss.read_index(FAISS_INDEX_PATH)
        meta = load_json(FAISS_META_PATH)
        return index, meta

    vecs = embed_texts(corpus.texts, embed_model)
    dim = vecs.shape[1]
    index = faiss.IndexFlatIP(dim)  # works best with normalized vectors (cosine via dot product)
    index.add(vecs)

    meta = {"texts": corpus.texts, "metadatas": corpus.metadatas, "ids": corpus.ids, "embed_model": embed_model}
    faiss.write_index(index, FAISS_INDEX_PATH)
    save_json(FAISS_META_PATH, meta)
    return index, meta



## 6. Build / Load Vector Store

- Tries to **load** a persisted store first.
- If missing, **rebuilds** from `DATA_DIR`.


In [9]:

# Load docs only if needed
need_docs = False
if VECTOR_STORE_KIND == "chroma":
    # Check if there is already a persisted Chroma collection (we'll just attempt and see count)
    client_probe = chromadb.PersistentClient(path=CHROMA_DIR, settings=ChromaSettings(allow_reset=True))
    try:
        probe = client_probe.get_collection(name=f"day2_ex2_{EMBED_MODEL_A.replace('/', '_')}")
        empty = probe.count() == 0
        if empty:
            need_docs = True
    except Exception:
        need_docs = True
else:
    # FAISS
    if not (os.path.exists(FAISS_INDEX_PATH) and os.path.exists(FAISS_META_PATH)):
        need_docs = True

if need_docs:
    print("No persisted store found; rebuilding from DATA_DIR =", DATA_DIR)
    docs = load_docs_from_dir(DATA_DIR)
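    if len(docs) == 0:
        # Stop here: continuing with an empty corpus fails later with an opaque AxisError
        raise FileNotFoundError(
            f"No documents found in DATA_DIR={DATA_DIR!r}. Add some .txt/.csv/.pdf files and rerun."
        )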
    corpus = build_corpus_from_docs(docs)
else:
    # A persisted store exists. Keep the corpus empty here: re-chunking DATA_DIR would mint new
    # UUIDs that no longer match the stored chunk IDs, so BM25 and dense hits could never be fused.
    # Section 7 pulls texts, metadata, and IDs back from the store instead.
    corpus = Corpus(texts=[], metadatas=[], ids=[])

# Build or load store for EMBED_MODEL_A (primary dense retriever)
if VECTOR_STORE_KIND == "chroma":
    dense_store = chroma_load_or_build(corpus, EMBED_MODEL_A)
    print("Chroma collection size:", dense_store.count())
else:
    dense_index, dense_meta = faiss_load_or_build(corpus, EMBED_MODEL_A)
    print("FAISS index size:", dense_index.ntotal)


No persisted store found; rebuilding from DATA_DIR = ./data/docs


Embedding with text-embedding-3-small: 0it [00:00, ?it/s]


AxisError: axis 1 is out of bounds for array of dimension 1


## 7. Build Sparse (BM25) Index
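
BM25 matches exact tokens, so tokenization decides what counts as a match: with plain whitespace splitting, `settings?` and `settings` are different terms. The sketch below shows one possible regex-based alternative. The name `regex_tokenize` and the small stopword set are hypothetical choices for illustration, not the notebook's tokenizer.

```python
import re
from typing import List

# Tiny illustrative stopword set (an assumption; tune it for your domain)
STOPWORDS = {"the", "a", "an", "to", "of", "do", "i", "how"}
_TOKEN_RE = re.compile(r"[a-z0-9]+")

def regex_tokenize(text: str, drop_stopwords: bool = True) -> List[str]:
    """Lowercase, keep only alphanumeric runs, and optionally drop stopwords."""
    tokens = _TOKEN_RE.findall(text.lower())
    if drop_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens

query = "How do I reset the device to factory settings?"
whitespace_tokens = query.lower().split()
regex_tokens = regex_tokenize(query)
print(whitespace_tokens)
print(regex_tokens)
```

Whichever tokenizer you choose, apply the same one to both the corpus and the queries; otherwise query terms will not line up with indexed terms.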


In [10]:

def tokenize_for_bm25(text: str) -> List[str]:
    # Simple whitespace tokenization; you can enhance with regex or nltk
    return text.lower().split()

if VECTOR_STORE_KIND == "chroma":
    # Chroma: get texts directly
    # We may have only a persisted store with no in-memory corpus; fetch docs in batches if needed
    # For simplicity, we assume we have `corpus.texts`. If empty, we can pull from chroma.
    if not corpus.texts:
        # Pull all documents back (in chunks) — beware for large corpora
        # Here, we assume moderate size
        all_texts = []
        all_metas = []
        all_ids = []
        BATCH = 500
        count = dense_store.count()
        for start in range(0, count, BATCH):
            # Chroma always returns ids; "ids" is not a valid `include` value
            res = dense_store.get(include=["documents", "metadatas"], limit=BATCH, offset=start)
            all_texts.extend(res.get("documents", []))
            all_metas.extend(res.get("metadatas", []))
            all_ids.extend(res.get("ids", []))
        corpus = Corpus(texts=all_texts, metadatas=all_metas, ids=all_ids)

    tokenized = [tokenize_for_bm25(t) for t in corpus.texts]
    bm25 = BM25Okapi(tokenized)
else:
    # FAISS path: use dense_meta as source of texts
    tokenized = [tokenize_for_bm25(t) for t in dense_meta["texts"]]
    bm25 = BM25Okapi(tokenized)


NameError: name 'dense_store' is not defined


## 8. Retrieval Functions: Dense, BM25, and RRF
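
Reciprocal Rank Fusion ignores raw scores and uses only ranks: each retriever contributes `1 / (k + rank)` for every document it returns, and the contributions are summed per document. Here is a minimal worked sketch on toy rankings; the document IDs and the helper name `rrf_scores` are illustrative assumptions.

```python
from typing import Dict, List

def rrf_scores(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Sum 1 / (k + rank) over every ranking in which a document appears (ranks start at 1)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores

bm25_ranking = ["A", "B", "C"]    # toy sparse ranking
dense_ranking = ["C", "A", "D"]   # toy dense ranking
scores = rrf_scores([bm25_ranking, dense_ranking])
fused = sorted(scores, key=scores.get, reverse=True)
print(fused)
```

Documents `A` and `C` appear in both lists and therefore outrank `B` and `D`, which each appear in only one. Because only ranks are used, BM25 scores and cosine similarities never need to be put on a common scale.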


In [None]:

def dense_search(query: str, top_k: int = TOP_K_RETRIEVE, embed_model: str = EMBED_MODEL_A):
    # Embed the query
    qvec = np.array(embedding(model=embed_model, input=query)["data"][0]["embedding"], dtype=np.float32)
    if NORMALIZE_EMBEDDINGS:
        qvec = qvec / (np.linalg.norm(qvec) + 1e-12)

    if VECTOR_STORE_KIND == "chroma":
        # Use Chroma similarity search with the provided query embedding.
        # ids are always returned; "ids" is not a valid `include` value.
        res = dense_store.query(
            query_embeddings=[qvec.tolist()],
            n_results=top_k,
            include=["documents", "metadatas", "distances"]
        )
        hits = []
        for i in range(len(res["ids"][0])):
            hits.append({
                "id": res["ids"][0][i],
                "text": res["documents"][0][i],
                "metadata": res["metadatas"][0][i],
                # Assumes Chroma's default squared-L2 space and unit-norm vectors, where cosine = 1 - d/2
                "score": 1 - res["distances"][0][i] / 2
            })
        return hits
    else:
        # FAISS
        D, I = dense_index.search(np.expand_dims(qvec, axis=0), top_k)
        hits = []
        for rank, (dist, idx) in enumerate(zip(D[0], I[0])):
            if idx == -1: 
                continue
            hits.append({
                "id": dense_meta["ids"][idx],
                "text": dense_meta["texts"][idx],
                "metadata": dense_meta["metadatas"][idx],
                "score": float(dist)  # inner product if normalized ~ cosine
            })
        return hits

def bm25_search(query: str, top_k: int = TOP_K_RETRIEVE):
    tokenized_query = tokenize_for_bm25(query)
    scores = bm25.get_scores(tokenized_query)
    top_idx = np.argsort(scores)[::-1][:top_k]
    hits = []
    for idx in top_idx:
        if VECTOR_STORE_KIND == "chroma":
            hits.append({
                "id": corpus.ids[idx],
                "text": corpus.texts[idx],
                "metadata": corpus.metadatas[idx],
                "score": float(scores[idx])
            })
        else:
            hits.append({
                "id": dense_meta["ids"][idx],
                "text": dense_meta["texts"][idx],
                "metadata": dense_meta["metadatas"][idx],
                "score": float(scores[idx])
            })
    return hits

def rrf_merge(bm25_hits: List[Dict[str, Any]], dense_hits: List[Dict[str, Any]], k: int = RRF_K, top_k: int = TOP_K_RETRIEVE):
    # Build rank maps
    bm25_rank = {h["id"]: r for r, h in enumerate(bm25_hits, start=1)}
    dense_rank = {h["id"]: r for r, h in enumerate(dense_hits, start=1)}

    # Doc universe
    doc_ids = set(bm25_rank.keys()) | set(dense_rank.keys())
    fused = []
    for did in doc_ids:
        score = 0.0
        if did in bm25_rank:
            score += 1.0 / (k + bm25_rank[did])
        if did in dense_rank:
            score += 1.0 / (k + dense_rank[did])
        fused.append((did, score))

    fused.sort(key=lambda x: x[1], reverse=True)
    # Build full records; dense hits are inserted last so they win when an id appears in both lists
    by_id = {}
    for h in bm25_hits + dense_hits:
        by_id[h["id"]] = h
    results = []
    for did, fscore in fused[:top_k]:
        rec = by_id[did].copy()
        rec["rrf_score"] = fscore
        results.append(rec)
    return results



## 9. RAG: Prompting & Answering

We implement two pipelines:
- **Baseline RAG (Dense-only)**
- **Hybrid RAG (RRF over Dense + BM25)**

Both produce answers with **citations**.


In [None]:

SYSTEM_PROMPT = (
    "You are a careful assistant. Answer using only the provided context. "
    "If the answer is not in the context, say you don't know."
)

def build_context(docs: List[Dict[str, Any]]) -> str:
    lines = []
    for i, d in enumerate(docs, start=1):
        src = d.get("metadata", {}).get("source", "unknown")
        lines.append(f"[{i}] Source: {os.path.basename(src)}\n{d['text'][:1000]}")
    return "\n\n".join(lines)

def cite_sources(docs: List[Dict[str, Any]]) -> str:
    cites = []
    for i, d in enumerate(docs, start=1):
        src = d.get("metadata", {}).get("source", "unknown")
        cites.append(f"[{i}] {os.path.basename(src)}")
    return "Citations: " + "; ".join(cites)

def rag_answer(query: str, retriever: str = "dense") -> Dict[str, Any]:
    if retriever == "dense":
        hits = dense_search(query, top_k=TOP_K_RETRIEVE)
    elif retriever == "bm25":
        hits = bm25_search(query, top_k=TOP_K_RETRIEVE)
    elif retriever == "hybrid":
        hits = rrf_merge(bm25_search(query, TOP_K_RETRIEVE), dense_search(query, TOP_K_RETRIEVE))
    else:
        raise ValueError("retriever must be one of: dense | bm25 | hybrid")

    context = build_context(hits)
    user_prompt = f"Question: {query}\n\nContext:\n{context}\n\nAnswer clearly and cite sources by index."

    resp = completion(
        model=CHAT_MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt}
        ],
        temperature=0.2,
        max_tokens=400
    )
    answer_text = resp["choices"][0]["message"]["content"]
    return {
        "query": query,
        "retriever": retriever,
        "hits": hits,
        "answer": answer_text,
        "citations": cite_sources(hits)
    }



## 10. Evaluation Queries (10)

Edit these to match your domain. You can also load from a CSV.


In [None]:

# Option A: inline list of queries
queries = [
    "What are the warranty terms for the product?",
    "How do I reset the device to factory settings?",
    "What are the installation prerequisites?",
    "How often should I schedule maintenance?",
    "What is the recommended operating temperature range?",
    "How can I export data to CSV?",
    "What are the safety precautions before setup?",
    "How do I enable debug logging?",
    "What is the process to request technical support?",
    "How do I upgrade to the latest firmware?"
]

# Option B: load from a CSV with a 'query' column
# import pandas as pd
# df_q = pd.read_csv("./data/queries_day2_ex2.csv")
# queries = df_q["query"].dropna().tolist()



## 11. Compare Top-5 Retrieved Documents (BM25 vs Dense vs Hybrid)

We will collect the top-5 for each method and present side-by-side tables.
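
Besides reading the tables, it can help to quantify how much two retrievers agree. One simple option, offered here as a sketch rather than as part of the exercise, is the Jaccard overlap between their top-k ID sets; the chunk IDs below are made up for illustration.

```python
from typing import Iterable

def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Size of the intersection divided by size of the union of two ID collections."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # two empty result lists are treated as identical
    return len(sa & sb) / len(sa | sb)

bm25_ids = ["c1", "c2", "c3", "c4", "c5"]    # hypothetical top-5 chunk ids
dense_ids = ["c3", "c4", "c5", "c6", "c7"]
overlap = jaccard(bm25_ids, dense_ids)
print(round(overlap, 3))
```

A low overlap means the two retrievers surface different chunks, which is the situation where fusion has the most to add; an overlap near 1 means hybrid retrieval will mostly reorder the same results.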


In [None]:

def summarize_hits(hits: List[Dict[str, Any]]) -> pd.DataFrame:
    rows = []
    for rank, h in enumerate(hits, start=1):
        rows.append({
            "rank": rank,
            "id": h["id"],
            "score": round(h.get("score", 0.0), 4),
            "rrf_score": round(h.get("rrf_score", 0.0), 4) if "rrf_score" in h else None,
            "source": os.path.basename(h.get("metadata", {}).get("source", "unknown")),
            "snippet": (h["text"][:200] + "...") if len(h["text"]) > 200 else h["text"]
        })
    return pd.DataFrame(rows)

all_comparisons = {}
for q in queries:
    bm25_hits = bm25_search(q, TOP_K_RETRIEVE)
    dense_hits = dense_search(q, TOP_K_RETRIEVE, EMBED_MODEL_A)
    hybrid_hits = rrf_merge(bm25_hits, dense_hits, RRF_K, TOP_K_RETRIEVE)

    all_comparisons[q] = {
        "bm25": summarize_hits(bm25_hits),
        "dense": summarize_hits(dense_hits),
        "hybrid": summarize_hits(hybrid_hits)
    }

# Show a sample query comparison
sample_q = queries[0]
print("Sample Query:", sample_q)
display(all_comparisons[sample_q]["bm25"])
display(all_comparisons[sample_q]["dense"])
display(all_comparisons[sample_q]["hybrid"])



## 12. Baseline vs Hybrid RAG Answers

We generate answers using:
- **Baseline RAG** (dense-only)
- **Hybrid RAG** (RRF merged)

and collect their outputs with citations.


In [None]:

answers = []
for q in tqdm(queries, desc="Answering queries"):
    baseline = rag_answer(q, retriever="dense")
    hybrid = rag_answer(q, retriever="hybrid")
    answers.append({
        "query": q,
        "baseline_answer": baseline["answer"],
        "baseline_citations": baseline["citations"],
        "hybrid_answer": hybrid["answer"],
        "hybrid_citations": hybrid["citations"]
    })

df_answers = pd.DataFrame(answers)
df_answers.head()



## 13. Light-weight Retrieval Check (Optional)

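If you have a ground-truth relevance file (CSV) with columns `query, doc_id` (multiple rows per query), we can compute a simple **Precision@5** for each method. The `doc_id` values are compared against the chunk `id` column of the Section 11 tables, so they must be chunk IDs from the persisted store; these are random UUIDs generated at indexing time, so labels become invalid if the store is rebuilt. If no ground-truth file is present, this section is skipped.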
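
As a worked illustration of the metric, the sketch below computes Precision@5 for one toy result list. The IDs are invented, and `precision_at_k` is a hypothetical helper that mirrors the cell below by dividing by the number of results actually returned.

```python
from typing import List, Set

def precision_at_k(retrieved: List[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved ids that are labeled relevant."""
    top = retrieved[:k]
    if not top:
        return 0.0
    return sum(1 for doc_id in top if doc_id in relevant) / len(top)

retrieved = ["c1", "c9", "c3", "c7", "c5"]   # toy top-5 chunk ids
relevant = {"c1", "c3", "c8"}                # toy ground truth for the query
p5 = precision_at_k(retrieved, relevant)
print(p5)
```

Two of the five retrieved chunks (`c1` and `c3`) are relevant, giving 2/5. The relevant chunk `c8` was never retrieved, which precision does not penalize; a recall-style metric would be needed to capture that.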


In [None]:

relevance_path = "./data/relevance.csv"
if os.path.exists(relevance_path):
    rel_df = pd.read_csv(relevance_path)
    rel_map = {}
    for q, grp in rel_df.groupby("query"):
        rel_map[q] = set(grp["doc_id"].astype(str).tolist())

    rows = []
    for q in queries:
        bm25_ids = all_comparisons[q]["bm25"]["id"].astype(str).tolist()
        dense_ids = all_comparisons[q]["dense"]["id"].astype(str).tolist()
        hybrid_ids = all_comparisons[q]["hybrid"]["id"].astype(str).tolist()
        gt = rel_map.get(q, set())

        def p_at_5(ids):
            return round(sum(1 for x in ids if x in gt) / max(1, len(ids)), 3)

        rows.append({
            "query": q,
            "bm25_p@5": p_at_5(bm25_ids),
            "dense_p@5": p_at_5(dense_ids),
            "hybrid_p@5": p_at_5(hybrid_ids),
        })
    df_metrics = pd.DataFrame(rows)
    display(df_metrics)
else:
    print("No ground-truth relevance file found at", relevance_path, "- skipping metric computation.")



## 14. (Optional) FIL/Perplexity Web Debug Panel

Use this section to ping a FIL/Perplexity-style endpoint for web results when you need external confirmation.
Enable it by setting the environment variable `USE_FIL_PERPLEXITY=true` (case-insensitive), along with `FIL_HOST` and `FIL_ACCESS_TOKEN`.


In [None]:

import requests

def perplexity_web_search(prompt: str) -> Dict[str, Any]:
    assert FIL_HOST and FIL_ACCESS_TOKEN, "Set FIL_HOST and FIL_ACCESS_TOKEN for FIL/Perplexity access."
    url = f"{FIL_HOST}/api/filcopilot/chat/perplexity/v1/"
    headers = {"Authorization": f"Bearer {FIL_ACCESS_TOKEN}", "Content-Type": "application/json"}
    data = {
        "messages": [
            {"role": "system", "content": "Be precise and concise."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 800,
        "temperature": 0.2,
        "return_images": False,
        "search_after_date_filter": None,
        "search_before_date_filter": None,
        "search_domain_filter": [],
        "search_context_size": "low",
        "top_p": 0.9,
        "search_mode": "web",
        "return_related_questions": False,
        "top_k": 0,
        "presence_penalty": 0,
        "frequency_penalty": 0,
    }
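    # NOTE: verify=False disables TLS certificate verification; only acceptable for a trusted internal host
    resp = requests.post(url, headers=headers, json=data, verify=False, timeout=60)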
    resp.raise_for_status()
    return resp.json()

if USE_FIL_PERPLEXITY:
    try:
        dbg = perplexity_web_search("Quick test query for Day 2 Ex2 debug panel")
        print("FIL/Perplexity result keys:", list(dbg.keys())[:10])
    except Exception as e:
        print("Perplexity debug call failed:", e)
else:
    print("FIL/Perplexity panel disabled. Set USE_FIL_PERPLEXITY=True to enable.")



## 15. Results & Takeaways

- **Top-5 Comparison:** Inspect the tables in Section 11 for each query. Hybrid (RRF) should balance **precision** (BM25) and **semantic recall** (Dense).
- **Answers:** Check the **Baseline vs Hybrid** outputs in Section 12. Look for cases where hybrid surfaces a slightly off-keyword but semantically relevant chunk that improves the final answer.
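- **Metrics:** If you provided `relevance.csv`, compare **Precision@5** across methods. Hybrid often matches or beats the single-method retrievers, but this depends on the corpus and the queries, so check the numbers rather than assuming it.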

### What to tweak next
- Try **different embedding models** via LiteLLM by changing `EMBED_MODEL_A/B`.
- Adjust chunk size/overlap and stopwords/tokenization for BM25.
- Experiment with `RRF_K` and `TOP_K_RETRIEVE`.
- Add a **reranker** (cross-encoder) on top of the retrieved candidates; see Day 2 Ex3.
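
To see why `RRF_K` is worth experimenting with, compare how much more a rank-1 hit contributes than a rank-5 hit for different values of `k`. This is a small arithmetic sketch; `rank_gap` is a hypothetical helper, not part of the notebook.

```python
def rank_gap(k: int, hi_rank: int = 1, lo_rank: int = 5) -> float:
    """Ratio of RRF contributions: (1 / (k + hi_rank)) / (1 / (k + lo_rank))."""
    return (k + lo_rank) / (k + hi_rank)

gaps = {k: rank_gap(k) for k in (1, 10, 60)}
for k, gap in gaps.items():
    print(k, round(gap, 3))
```

With a small `k`, the top rank of a single retriever dominates the fused score. With `k = 60`, rank 1 is worth only about 7% more than rank 5, so appearing in both lists matters far more than being first in one.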



## Appendix: Inspect Full Comparison for a Specific Query


In [None]:

def compare_query(q: str):
    display(all_comparisons[q]["bm25"])
    display(all_comparisons[q]["dense"])
    display(all_comparisons[q]["hybrid"])

# Example:
# compare_query(queries[1])
