# CS 5588 — Week 3 Hands-On  
## Building a Multimodal RAG Product Prototype (PDF + Images)

**Goal (today):** Build a *working product prototype* that answers user questions from real documents (PDFs + images) with **evidence citations**.

**What you’ll leave with:**
- A project-ready multimodal RAG pipeline (ingestion → indexing → retrieval → grounded answer)
- A short **Product Brief** inside the notebook (persona, problem, value, success metrics)
- A small **demo loop** you can show to stakeholders (prompt → answer + citations)

> This hands-on is application-first: prioritize a realistic use case and a clean demo.


## 0) Product Brief (Fill in — REQUIRED for Week 3)
- **Team / Name:**  
- **Project name (working title):**  

### 0.1 Target user persona
- Who will use this? (role, context, pain point)

- Role: Clinical Research Coordinator or Junior Doctor.
- Environment: Busy hospital setting or research review board.
- Pain Point: Medical guidelines and new research papers are hundreds of pages long. Critical data (dosage limits, exclusion criteria) is often buried in complex tables or statistical charts, making it hard to quickly verify protocol compliance.


### 0.2 Problem statement (1–2 sentences)
- What decision/task does your product support?

- Clinicians struggle to rapidly cross-reference textual treatment guidelines with visual statistical evidence (tables/charts) from the latest research, leading to potential delays in evidence-based decision making.


### 0.3 Value proposition (1 sentence)
- What improves (speed, accuracy, trust, cost, risk)?

- The module retrieves the relevant protocol text together with the supporting figures and tables for a clinical question, and answers with page- and figure-level citations so clinicians can verify every claim against the source before acting on it.

### 0.4 Success metrics (pick 2–3)
- e.g., time-to-answer, citation coverage, % “not enough evidence” when missing, user satisfaction (1–5), precision@5

- Retrieval Speed: < 10 seconds to find a specific dosage or exclusion criterion.
- Evidence Integrity: 100% of answers must link to the specific guideline page or figure number.
- Refusal Rate: The system must refuse to answer if the specific drug or condition is not in the uploaded documents (avoiding hallucinations).


## 1) Setup (Colab)
Run installs, then imports.


In [1]:
# === Setup & Imports (Colab-friendly) ===
import os, re, glob, json, math
from dataclasses import dataclass
from typing import List, Dict, Any, Tuple, Optional

import numpy as np
import pandas as pd

# ---- Core deps ----
# PyMuPDF for PDF text extraction
!pip -q install pymupdf pillow pandas numpy scikit-learn

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize

import fitz  # PyMuPDF
from PIL import Image

# ---- OCR deps ----
!pip -q install pytesseract
!sudo apt-get -qq update
!sudo apt-get -qq install -y tesseract-ocr

import pytesseract

# ---- Retrieval deps ----
!pip -q install faiss-cpu rank-bm25
import faiss
from rank_bm25 import BM25Okapi

# ---- Dense + rerank (optional) ----
# sentence-transformers is usually preinstalled on Colab. If the import fails (missing package or
# version conflict), we fall back to TF-IDF vectors for "dense" retrieval and skip reranking.
USE_ST = True
USE_RERANK = True

try:
    from sentence_transformers import SentenceTransformer, CrossEncoder
except Exception as e:
    USE_ST = False
    USE_RERANK = False
    print("⚠️ sentence-transformers not available in this runtime. Falling back to TF-IDF for 'dense' retrieval.")
    print("   Error:", e)

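# Optional captioning (bonus): this only checks that transformers is importable.
# Section 5.1 decides whether captioning actually runs.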
USE_CAPTIONING = False
try:
    from transformers import pipeline
    USE_CAPTIONING = True
except Exception:
    USE_CAPTIONING = False



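### 1.1 System dependencies (Colab/Linux)
Tesseract OCR is installed by the Setup cell above (`apt-get install tesseract-ocr`). If OCR fails later, re-run that cell; the cell below only prints a reminder.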


In [2]:
# (Handled in Setup & Imports above)
print('System dependencies installed in Section 1.')

System dependencies installed in Section 1.
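Because the reminder cell above does not actually verify anything, a minimal sketch of a real check follows. It only asks whether a `tesseract` executable is on `PATH` (via the standard-library `shutil.which`); the helper names `tesseract_path` and `check_ocr_ready` are hypothetical. Inside the notebook, `pytesseract.get_tesseract_version()` is an alternative that also confirms pytesseract can call the binary.

```python
import shutil
from typing import Optional


def tesseract_path() -> Optional[str]:
    """Return the path of the `tesseract` executable on PATH, or None if it is missing."""
    return shutil.which("tesseract")


def check_ocr_ready() -> bool:
    """Print a short status message and return True only if Tesseract is available."""
    path = tesseract_path()
    if path is None:
        print("tesseract not found on PATH: re-run the Setup cell (apt-get install tesseract-ocr).")
        return False
    print("tesseract found at", path)
    return True


check_ocr_ready()
```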


### 1.2 Imports


> **Note:** Dependencies are installed and imported above. If you restart the runtime, re-run all cells in order starting from Section 1, because every later cell depends on the variables defined there.

## 2) Choose a project dataset (realistic, stakeholder-facing)
Create this structure (you can start small today):

```
project_data_mm/
  docs/
    doc1.pdf
    doc2.pdf
  figures/
    fig1.png
    fig2.jpg
  notes.txt (optional)
```

**Recommended today:** 2 PDFs + 3–5 images that matter to your use case.


In [3]:
DATA_DIR = "project_data_mm"
DOC_DIR = os.path.join(DATA_DIR, "docs")
FIG_DIR = os.path.join(DATA_DIR, "figures")

for d in [DATA_DIR, DOC_DIR, FIG_DIR]:
    if not os.path.isdir(d):
        print(f"Missing folder: {d} (create it and add files)")

pdfs = sorted(glob.glob(os.path.join(DOC_DIR, "*.pdf")))
imgs = sorted(glob.glob(os.path.join(FIG_DIR, "*.*")))

print("PDFs:", len(pdfs), pdfs[:5])
print("Images:", len(imgs), imgs[:5])


PDFs: 4 ['project_data_mm/docs/CCR-16-258.pdf', 'project_data_mm/docs/diaclincd23as01.pdf', 'project_data_mm/docs/fendo-14-1236404.pdf', 'project_data_mm/docs/virani-et-al-2023-2023-aha-acc-accp-aspc-nla-pcna-guideline-for-the-management-of-patients-with-chronic-coronary.pdf']
Images: 6 ['project_data_mm/figures/13098_2024_1553_Fig3_HTML.jpg', 'project_data_mm/figures/13098_2024_1553_Fig4_HTML.jpg', 'project_data_mm/figures/13098_2024_1553_Fig5_HTML.jpg', 'project_data_mm/figures/13098_2024_1553_Fig6_HTML.jpg', 'project_data_mm/figures/13098_2024_1553_Fig7_HTML.jpg']


## 3) Define 3 stakeholder questions (application-oriented)
- **Q1/Q2:** require both text + figure/table evidence  
- **Q3:** ambiguous/missing evidence → system should say **Not enough evidence in the retrieved context.**

Also add:
- Must-cite evidence (page or figure)
- Success criteria (what a good answer must include)


In [4]:
QUERIES = [
    {"id":"Q1","question":"TODO","must_cite":["TODO"],"success_criteria":["TODO"],"keywords":["TODO"]},
    {"id":"Q2","question":"TODO","must_cite":["TODO"],"success_criteria":["TODO"],"keywords":["TODO"]},
    {"id":"Q3","question":"TODO","must_cite":[],"success_criteria":["Not enough evidence in the retrieved context."],"keywords":["TODO"]},
]
for q in QUERIES:
    print(q["id"], q["question"])


Q1 TODO
Q2 TODO
Q3 TODO
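The output above shows that the questions are still placeholders, so every later retrieval step runs on the literal string "TODO". A minimal sketch of a pre-flight check on the query spec follows, assuming the same dictionary fields as `QUERIES` (`question`, `must_cite`, `success_criteria`, `keywords`). The function name `validate_queries` and the rules it enforces (no TODO left, answerable questions list must-cite evidence, the refusal question lists none) are our own reading of the instructions in this section.

```python
from typing import Any, Dict, List

REFUSAL = "Not enough evidence in the retrieved context."


def validate_queries(queries: List[Dict[str, Any]]) -> List[str]:
    """Return the problems found in a query spec; an empty list means it is ready to run."""
    problems = []
    for q in queries:
        qid = q.get("id", "?")
        # Any remaining "TODO" placeholder means the query has not been filled in yet.
        fields = [q.get("question", "")] + list(q.get("must_cite", [])) + list(q.get("keywords", []))
        if any("TODO" in str(f) for f in fields):
            problems.append(f"{qid}: contains TODO placeholders")
        is_refusal = REFUSAL in q.get("success_criteria", [])
        if is_refusal and q.get("must_cite"):
            problems.append(f"{qid}: refusal query should not list must_cite evidence")
        if not is_refusal and not q.get("must_cite"):
            problems.append(f"{qid}: answerable query needs at least one must_cite item")
    return problems


# Hypothetical example spec: one unfinished query and one finished refusal query.
example = [
    {"id": "Q1", "question": "TODO", "must_cite": ["TODO"], "success_criteria": ["TODO"], "keywords": ["TODO"]},
    {"id": "Q3", "question": "Example refusal question?", "must_cite": [],
     "success_criteria": [REFUSAL], "keywords": ["example"]},
]
print(validate_queries(example))
```

In the notebook, calling `validate_queries(QUERIES)` before the demo loop and stopping on a non-empty result keeps placeholder questions out of the acceptance tests.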


## 4) Ingest PDFs (per-page text)


In [5]:
@dataclass
class TextChunk:
    chunk_id: str
    doc_id: str
    page_num: int
    text: str

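def extract_pdf_pages(pdf_path: str) -> List[TextChunk]:
    """Return one TextChunk per non-empty PDF page, with whitespace collapsed."""
    doc_id = os.path.basename(pdf_path)
    out = []
    with fitz.open(pdf_path) as doc:  # closes the file handle when done
        for i, page in enumerate(doc):
            text = page.get_text("text") or ""
            text = re.sub(r"\s+", " ", text).strip()
            if text:
                # chunk_id encodes document and 1-based page number, e.g. "doc.pdf::p3"
                out.append(TextChunk(f"{doc_id}::p{i+1}", doc_id, i+1, text))
    return out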

page_chunks = []
for p in pdfs:
    page_chunks.extend(extract_pdf_pages(p))

print("Total PDF page chunks:", len(page_chunks))
if page_chunks:
    print("Sample:", page_chunks[0].chunk_id, page_chunks[0].text[:250])


Total PDF page chunks: 158
Sample: CCR-16-258.pdf::p1 Send Orders for Reprints to reprints@benthamscience.net 258 Current Cardiology Reviews, 2020, 16, 258-265 REVIEW ARTICLE 1573-403X/20 $65.00+.00 © 2020 Bentham Science Publishers SGLT2 Inhibitors and Cardiovascular Outcomes: Do They Differ or There i


## 5) Ingest images (OCR first, optional captioning)


In [6]:
@dataclass
class EvidenceItem:
    evid_id: str
    source: str
    image_path: str
    ocr_text: str
    caption_text: str
    evidence_text: str

def run_ocr(image_path: str) -> str:
    img = Image.open(image_path).convert("RGB")
    text = pytesseract.image_to_string(img)
    return re.sub(r"\s+", " ", text).strip()

evidence_items = []
for ip in imgs:
    base = os.path.basename(ip)
    evid_id = os.path.splitext(base)[0]
    ocr = run_ocr(ip)
    evidence_items.append(EvidenceItem(evid_id, base, ip, ocr, "", ocr))

print("Evidence items:", len(evidence_items))
if evidence_items:
    print("Sample OCR:", evidence_items[0].source, evidence_items[0].ocr_text[:200])


Evidence items: 6
Sample OCR: 13098_2024_1553_Fig3_HTML.jpg 1.00 075 > a 3 3 2 a 0.50: g < Non-SGLT2i Group: 2 a * SGLT2 Group 0.25 p=0.011 0.00: 0 200 400 600 800 Time (Days) Number at risk Now-SGLT2 Group 154 33 14 5 2 scuricrop | 134 32 "1 2 0 0 200 400 600


### 5.1 Optional captioning (bonus)


In [7]:
USE_CAPTIONING = False  # set to True to add BLIP captions (downloads a model; can be slow on CPU)
if USE_CAPTIONING:
    from transformers import pipeline
    captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
    for ei in evidence_items:
        cap = captioner(Image.open(ei.image_path).convert("RGB"), max_new_tokens=40)[0]["generated_text"]
        cap = re.sub(r"\s+", " ", cap).strip()
        ei.caption_text = cap
        ei.evidence_text = (ei.ocr_text + "\n" + cap).strip()
    print("Captioning complete.")
else:
    print("Captioning skipped.")


Captioning skipped.


## 6) Chunking (page-based vs fixed-size)


In [8]:
@dataclass
class SubChunk:
    chunk_id: str
    doc_id: str
    page_num: int
    text: str

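def fixed_size_chunk(text: str, words_per_chunk: int = 250, overlap: int = 40) -> List[str]:
    """Split text into windows of `words_per_chunk` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < words_per_chunk:
        # overlap >= words_per_chunk would stop the window from advancing (infinite loop)
        raise ValueError("overlap must satisfy 0 <= overlap < words_per_chunk")
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = min(len(words), start + words_per_chunk)
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        start = end - overlap  # step back by `overlap` words so windows overlap
    return chunks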

sub_chunks = []
for pc in page_chunks:
    for j, t in enumerate(fixed_size_chunk(pc.text, 250, 40)):
        sub_chunks.append(SubChunk(f"{pc.doc_id}::p{pc.page_num}::c{j+1}", pc.doc_id, pc.page_num, t))

print("Page chunks:", len(page_chunks))
print("Fixed-size chunks:", len(sub_chunks))


Page chunks: 158
Fixed-size chunks: 647
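With 250-word windows and a 40-word overlap, each new window starts 210 words after the previous one. A page with N > 250 words therefore yields 1 + ceil((N - 250) / 210) chunks, which is why 158 pages became 647 chunks here (about 4 per page). A minimal sketch that checks this closed form against the same windowing logic follows; `expected_chunk_count` is a hypothetical helper, and `fixed_size_chunk` is repeated so the block runs on its own.

```python
import math
from typing import List


def fixed_size_chunk(text: str, words_per_chunk: int = 250, overlap: int = 40) -> List[str]:
    # Same windowing as the notebook cell above, including the guard against non-advancing windows.
    if not 0 <= overlap < words_per_chunk:
        raise ValueError("overlap must satisfy 0 <= overlap < words_per_chunk")
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        end = min(len(words), start + words_per_chunk)
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        start = end - overlap
    return chunks


def expected_chunk_count(n_words: int, words_per_chunk: int = 250, overlap: int = 40) -> int:
    """Closed form: one window, plus one more per stride of (words_per_chunk - overlap) words."""
    if n_words == 0:
        return 0
    if n_words <= words_per_chunk:
        return 1
    stride = words_per_chunk - overlap
    return 1 + math.ceil((n_words - words_per_chunk) / stride)


for n in [0, 100, 250, 251, 460, 461, 1000]:
    text = " ".join(f"w{i}" for i in range(n))
    assert len(fixed_size_chunk(text)) == expected_chunk_count(n)
    print(n, "words ->", expected_chunk_count(n), "chunks")
```

Raising `overlap` increases the chunk count (and index size) but lowers the chance that a dosage row or exclusion criterion is split across two chunks.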


## 7) Indexing & retrieval (dense + sparse + rerank)


In [9]:
def tokenize(text: str) -> List[str]:
    return [t.lower() for t in re.findall(r"[a-zA-Z0-9]+", text)]

# --- Embeddings (dense retrieval) ---
# If SentenceTransformers is available, we use it. Otherwise, we fall back to TF-IDF vectors.
if USE_ST:
    embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

    def embed_texts(texts: List[str], batch_size: int = 32) -> np.ndarray:
        return embedder.encode(
            texts, batch_size=batch_size, show_progress_bar=True,
            convert_to_numpy=True, normalize_embeddings=True
        )
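else:
    # TF-IDF fallback: sparse TF-IDF rows converted to dense arrays so FAISS can index them.
    # The vocabulary is fitted on the FIRST call (normally the page-level corpus) and reused
    # for all later corpora and queries, so every vector has the same dimension.
    tfidf_vec = TfidfVectorizer(max_features=50000, ngram_range=(1,2))
    _tfidf_fitted = False

    def embed_texts(texts: List[str], batch_size: int = 32) -> np.ndarray:
        global _tfidf_fitted
        X = tfidf_vec.fit_transform(texts) if not _tfidf_fitted else tfidf_vec.transform(texts)
        _tfidf_fitted = True
        X = normalize(X)  # L2-normalize rows so inner product equals cosine similarity
        return X.toarray().astype(np.float32)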

def build_faiss_ip(vectors: np.ndarray):
    dim = vectors.shape[1]
    index = faiss.IndexFlatIP(dim)
    index.add(vectors.astype(np.float32))
    return index

TEXT_CORPUS_A = page_chunks
TEXT_CORPUS_B = sub_chunks

texts_A = [c.text for c in TEXT_CORPUS_A]
vecs_A = embed_texts(texts_A) if texts_A else np.zeros((0,384), dtype=np.float32)
faiss_A = build_faiss_ip(vecs_A) if len(texts_A)>0 else None
bm25_A = BM25Okapi([tokenize(t) for t in texts_A]) if len(texts_A)>0 else None

texts_B = [c.text for c in TEXT_CORPUS_B]
vecs_B = embed_texts(texts_B) if texts_B else np.zeros((0,384), dtype=np.float32)
faiss_B = build_faiss_ip(vecs_B) if len(texts_B)>0 else None
bm25_B = BM25Okapi([tokenize(t) for t in texts_B]) if len(texts_B)>0 else None

evid_texts = [e.evidence_text for e in evidence_items]
evid_vecs = embed_texts(evid_texts) if evid_texts else np.zeros((0,384), dtype=np.float32)
faiss_E = build_faiss_ip(evid_vecs) if len(evid_texts)>0 else None

print("Indexes ready.")



BertModel LOAD REPORT from: sentence-transformers/all-MiniLM-L6-v2
Key                     | Status     |  | 
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.



Indexes ready.
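Both branches of `embed_texts` return L2-normalized rows (`normalize_embeddings=True` for SentenceTransformers, `sklearn.preprocessing.normalize` for the TF-IDF fallback), so the inner-product scores that FAISS `IndexFlatIP` returns are cosine similarities. A small NumPy check of that identity follows, using random vectors rather than the notebook's embeddings; `l2_normalize` is a hypothetical helper.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    # Divide each row by its L2 norm so every vector has unit length.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


rng = np.random.default_rng(0)
docs = rng.normal(size=(5, 8)).astype(np.float32)   # 5 toy "chunk" vectors
query = rng.normal(size=(1, 8)).astype(np.float32)  # 1 toy query vector

# Cosine similarity computed directly from the raw vectors, shape (1, 5).
cos = (query @ docs.T) / (np.linalg.norm(query) * np.linalg.norm(docs, axis=1))
# Plain inner product after normalization: what IndexFlatIP scores for unit-length inputs.
ip = l2_normalize(query) @ l2_normalize(docs).T

print(np.allclose(cos, ip, atol=1e-5))
```

If the normalization step were dropped, `IndexFlatIP` would favor long vectors (for TF-IDF, long pages) regardless of topic, so keeping both sides normalized matters.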


In [10]:
def dense_search(query: str, index, corpus, top_k: int = 5):
    if index is None or len(corpus)==0:
        return []
    qv = embed_texts([query])
    scores, idxs = index.search(qv.astype(np.float32), top_k)
    out = []
    for s, i in zip(scores[0], idxs[0]):
        if int(i) >= 0:
            out.append((float(s), corpus[int(i)]))
    return out

def sparse_search(query: str, bm25, corpus, top_k: int = 5):
    if bm25 is None or len(corpus)==0:
        return []
    scores = bm25.get_scores(tokenize(query))
    top = np.argsort(scores)[::-1][:top_k]
    return [(float(scores[i]), corpus[int(i)]) for i in top]

def hybrid_fuse(dense_res, sparse_res, alpha: float = 0.5, top_k: int = 5):
    """Weighted reciprocal-rank fusion: score = alpha/dense_rank + (1-alpha)/sparse_rank.
    An item missing from one list gets rank len(list)+1 in that list."""
    def k(item): return getattr(item, "chunk_id", getattr(item, "evid_id", str(item)))
    dense_rank = {k(it): r for r, (_, it) in enumerate(dense_res, start=1)}
    sparse_rank = {k(it): r for r, (_, it) in enumerate(sparse_res, start=1)}
    # Map each key back to its object (the dense-result object wins if present in both).
    objs = {k(it): it for _, it in sparse_res}
    objs.update({k(it): it for _, it in dense_res})
    fused = []
    for key in set(dense_rank) | set(sparse_rank):
        dr = dense_rank.get(key, len(dense_res) + 1)
        sr = sparse_rank.get(key, len(sparse_res) + 1)
        fused.append((alpha * (1.0 / dr) + (1 - alpha) * (1.0 / sr), objs[key]))
    fused.sort(key=lambda x: x[0], reverse=True)
    return fused[:top_k]

# --- Reranker (optional) ---
reranker = None
if USE_ST and USE_RERANK:
    try:
        reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    except Exception as e:
        reranker = None
        USE_RERANK = False
        print("⚠️ Reranker unavailable, continuing without reranking. Error:", e)


def rerank(query: str, items, get_text, top_k=5):
    if reranker is None:
        return list(items)[:top_k]

    if not items:
        return []
    scores = reranker.predict([(query, get_text(it)) for it in items])
    ranked = sorted(zip(scores, items), key=lambda x: x[0], reverse=True)
    return [it for _, it in ranked[:top_k]]

def retrieve_text(query: str, chunking: str = "page", method: str = "hybrid", top_k: int = 5, alpha: float = 0.5, use_rerank: bool = True):
    if chunking == "page":
        corpus, index, bm25 = TEXT_CORPUS_A, faiss_A, bm25_A
    else:
        corpus, index, bm25 = TEXT_CORPUS_B, faiss_B, bm25_B

    if method == "dense":
        res = dense_search(query, index, corpus, top_k=max(10, top_k))
        items = [it for _, it in res]
    elif method == "sparse":
        res = sparse_search(query, bm25, corpus, top_k=max(10, top_k))
        items = [it for _, it in res]
    else:
        d = dense_search(query, index, corpus, top_k=max(10, top_k))
        s = sparse_search(query, bm25, corpus, top_k=max(10, top_k))
        res = hybrid_fuse(d, s, alpha=alpha, top_k=max(10, top_k))
        items = [it for _, it in res]

    if use_rerank:
        return rerank(query, items, lambda it: it.text, top_k=top_k)
    return items[:top_k]

def retrieve_evidence(query: str, top_k: int = 3):
    res = dense_search(query, faiss_E, evidence_items, top_k=top_k)
    return [it for _, it in res]



BertForSequenceClassification LOAD REPORT from: cross-encoder/ms-marco-MiniLM-L-6-v2
Key                          | Status     |  | 
-----------------------------+------------+--+-
bert.embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.


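`hybrid_fuse` scores each item as alpha/dense_rank + (1 - alpha)/sparse_rank and gives an item missing from one list the rank just past that list's end. A minimal sketch of the same arithmetic on plain IDs follows (`weighted_rank_fusion` and the IDs A to D are hypothetical). It shows that an item ranked well by both retrievers can beat one that tops only a single list. This is a weighted variant of reciprocal-rank fusion; the commonly cited Reciprocal Rank Fusion uses 1/(k + rank) with a constant such as k = 60, which shrinks the gap between the top ranks.

```python
from typing import Dict, List, Tuple


def weighted_rank_fusion(dense_ids: List[str], sparse_ids: List[str],
                         alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Same scoring as hybrid_fuse, on IDs: alpha/dense_rank + (1 - alpha)/sparse_rank."""
    dense_rank = {cid: r for r, cid in enumerate(dense_ids, start=1)}
    sparse_rank = {cid: r for r, cid in enumerate(sparse_ids, start=1)}
    scores: Dict[str, float] = {}
    for cid in set(dense_rank) | set(sparse_rank):
        # An ID missing from one list is treated as ranked just past that list's end.
        dr = dense_rank.get(cid, len(dense_ids) + 1)
        sr = sparse_rank.get(cid, len(sparse_ids) + 1)
        scores[cid] = alpha / dr + (1 - alpha) / sr
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


dense = ["A", "B", "C"]   # toy dense ranking
sparse = ["B", "D", "A"]  # toy BM25 ranking
for cid, s in weighted_rank_fusion(dense, sparse):
    print(cid, round(s, 3))
```

Here B (ranks 2 and 1) scores 0.75 and beats A (ranks 1 and 3, 0.667); moving `alpha` toward 1.0 shifts the weight toward the dense ranking.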

## 8) Evidence pack + citations (product output)


In [11]:
def cite_text(it): return f"[{it.doc_id} p{it.page_num}]"
def cite_fig(ei): return f"[{os.path.splitext(ei.source)[0]}]"

def build_evidence_pack(question: str, chunking="page", method="hybrid", top_k_text=4, top_k_fig=2):
    txt = retrieve_text(question, chunking=chunking, method=method, top_k=top_k_text, use_rerank=True)
    figs = retrieve_evidence(question, top_k=top_k_fig)
    pack = []
    for it in txt:
        pack.append({"type":"text", "cite": cite_text(it), "content": it.text[:800]})
    for ei in figs:
        pack.append({"type":"figure", "cite": cite_fig(ei), "content": (ei.evidence_text or "")[:800], "path": ei.image_path})
    return pack

ep = build_evidence_pack(QUERIES[0]["question"])
for e in ep:
    print(e["cite"], e["type"], e["content"][:120])



[virani-et-al-2023-2023-aha-acc-accp-aspc-nla-pcna-guideline-for-the-management-of-patients-with-chronic-coronary.pdf p19] text CLINICAL STATEMENTS AND GUIDELINES Circulation. 2023;148:e9–e119. DOI: 10.1161/CIR.0000000000001168 August 29, 2023 Vir
[virani-et-al-2023-2023-aha-acc-accp-aspc-nla-pcna-guideline-for-the-management-of-patients-with-chronic-coronary.pdf p111] text CLINICAL STATEMENTS AND GUIDELINES Circulation. 2023;148:e9–e119. DOI: 10.1161/CIR.0000000000001168 August 29, 2023 Vir
[virani-et-al-2023-2023-aha-acc-accp-aspc-nla-pcna-guideline-for-the-management-of-patients-with-chronic-coronary.pdf p108] text CLINICAL STATEMENTS AND GUIDELINES August 29, 2023 Circulation. 2023;148:e9–e119. DOI: 10.1161/CIR.0000000000001168 Vir
[virani-et-al-2023-2023-aha-acc-accp-aspc-nla-pcna-guideline-for-the-management-of-patients-with-chronic-coronary.pdf p21] text CLINICAL STATEMENTS AND GUIDELINES Circulation. 2023;148:e9–e119. DOI: 10.1161/CIR.0000000000001168 August 29, 2023 Vir

## 9) Grounded response (LLM/VLM) — connect Gemini/HF if available


In [12]:
def rag_prompt(question: str, evidence_pack: list) -> str:
    evidence_lines = [f'{e["cite"]} {e["content"]}' for e in evidence_pack]
    evidence_block = "\n\n".join(evidence_lines)
    return f"""You are a grounded assistant. Use ONLY the evidence below.
Every key claim must cite evidence like [doc p#] or [fig1].
If the evidence is insufficient, respond exactly:
Not enough evidence in the retrieved context.

Evidence:
{evidence_block}

Question:
{question}

Answer (with citations):
"""

def generate_answer(prompt: str, image_paths: Optional[list]=None) -> str:
    # TODO: connect Gemini or a HF generator
    return "TODO: connect Gemini/HF LLM. For demo: summarize evidence with citations."


## 10) Demo loop (stakeholder-facing)


In [13]:
def demo_one(question: str, chunking="page", method="hybrid"):
    ep = build_evidence_pack(question, chunking=chunking, method=method)
    prompt = rag_prompt(question, ep)
    ans = generate_answer(prompt, image_paths=[e.get("path") for e in ep if e.get("type")=="figure"])
    return ep, ans

for q in QUERIES:
    ep, ans = demo_one(q["question"])
    print("\n=== ", q["id"], " ===")
    print("Q:", q["question"])
    print("Top evidence citations:", [e["cite"] for e in ep])
    print("Answer:", ans[:400])




===  Q1  ===
Q: TODO
Top evidence citations: ['[virani-et-al-2023-2023-aha-acc-accp-aspc-nla-pcna-guideline-for-the-management-of-patients-with-chronic-coronary.pdf p19]', '[virani-et-al-2023-2023-aha-acc-accp-aspc-nla-pcna-guideline-for-the-management-of-patients-with-chronic-coronary.pdf p111]', '[virani-et-al-2023-2023-aha-acc-accp-aspc-nla-pcna-guideline-for-the-management-of-patients-with-chronic-coronary.pdf p108]', '[virani-et-al-2023-2023-aha-acc-accp-aspc-nla-pcna-guideline-for-the-management-of-patients-with-chronic-coronary.pdf p21]', '[13098_2024_1553_Fig7_HTML]', '[13098_2024_1553_Fig5_HTML]']
Answer: TODO: connect Gemini/HF LLM. For demo: summarize evidence with citations.




===  Q2  ===
Q: TODO
Top evidence citations: ['[virani-et-al-2023-2023-aha-acc-accp-aspc-nla-pcna-guideline-for-the-management-of-patients-with-chronic-coronary.pdf p19]', '[virani-et-al-2023-2023-aha-acc-accp-aspc-nla-pcna-guideline-for-the-management-of-patients-with-chronic-coronary.pdf p111]', '[virani-et-al-2023-2023-aha-acc-accp-aspc-nla-pcna-guideline-for-the-management-of-patients-with-chronic-coronary.pdf p108]', '[virani-et-al-2023-2023-aha-acc-accp-aspc-nla-pcna-guideline-for-the-management-of-patients-with-chronic-coronary.pdf p21]', '[13098_2024_1553_Fig7_HTML]', '[13098_2024_1553_Fig5_HTML]']
Answer: TODO: connect Gemini/HF LLM. For demo: summarize evidence with citations.




===  Q3  ===
Q: TODO
Top evidence citations: ['[virani-et-al-2023-2023-aha-acc-accp-aspc-nla-pcna-guideline-for-the-management-of-patients-with-chronic-coronary.pdf p19]', '[virani-et-al-2023-2023-aha-acc-accp-aspc-nla-pcna-guideline-for-the-management-of-patients-with-chronic-coronary.pdf p111]', '[virani-et-al-2023-2023-aha-acc-accp-aspc-nla-pcna-guideline-for-the-management-of-patients-with-chronic-coronary.pdf p108]', '[virani-et-al-2023-2023-aha-acc-accp-aspc-nla-pcna-guideline-for-the-management-of-patients-with-chronic-coronary.pdf p21]', '[13098_2024_1553_Fig7_HTML]', '[13098_2024_1553_Fig5_HTML]']
Answer: TODO: connect Gemini/HF LLM. For demo: summarize evidence with citations.


## 11) Week 3 acceptance tests (CS 5588)
Fill in after running your demo:
- Does the evidence pack include the must-cite items for Q1/Q2?
- Does Q3 properly refuse with “Not enough evidence…”?
- Is the output understandable to your target user?


In [14]:
ACCEPTANCE_CHECKLIST = [
    {"qid":"Q1", "must_cite_expected":"TODO", "pass_fail":"TODO", "notes":"TODO"},
    {"qid":"Q2", "must_cite_expected":"TODO", "pass_fail":"TODO", "notes":"TODO"},
    {"qid":"Q3", "must_cite_expected":"(none) — should refuse", "pass_fail":"TODO", "notes":"TODO"},
]
ACCEPTANCE_CHECKLIST


[{'qid': 'Q1',
  'must_cite_expected': 'TODO',
  'pass_fail': 'TODO',
  'notes': 'TODO'},
 {'qid': 'Q2',
  'must_cite_expected': 'TODO',
  'pass_fail': 'TODO',
  'notes': 'TODO'},
 {'qid': 'Q3',
  'must_cite_expected': '(none) — should refuse',
  'pass_fail': 'TODO',
  'notes': 'TODO'}]
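The checklist can be filled automatically once the queries are real. A minimal sketch follows, assuming each `must_cite` entry is written in exactly the citation format the notebook produces (e.g. `[doc.pdf p19]` from `cite_text` or `[fig_id]` from `cite_fig`), so an exact string match suffices. The function `acceptance_row` and its rules (refusal queries must return the exact refusal string; answerable queries must retrieve every must-cite item) are our own encoding of the three questions in this section.

```python
from typing import Dict, List

REFUSAL = "Not enough evidence in the retrieved context."


def acceptance_row(qid: str, evidence_cites: List[str], answer: str,
                   must_cite: List[str], expect_refusal: bool) -> Dict[str, str]:
    """Return one checklist row with PASS/FAIL and a short note."""
    if expect_refusal:
        ok = answer.strip() == REFUSAL
        note = "refused as expected" if ok else "answered instead of refusing"
    elif not must_cite:
        ok, note = False, "no must-cite items defined"
    else:
        retrieved = set(evidence_cites)
        missing = [m for m in must_cite if m not in retrieved]
        ok = not missing
        note = "all must-cite items retrieved" if ok else "missing: " + ", ".join(missing)
    return {"qid": qid, "pass_fail": "PASS" if ok else "FAIL", "notes": note}


# Hypothetical citations and answers, for illustration only.
cites = ["[guide.pdf p19]", "[guide.pdf p21]", "[fig3]"]
rows = [
    acceptance_row("Q1", cites, "answer text [guide.pdf p19]", ["[guide.pdf p19]", "[fig3]"], False),
    acceptance_row("Q2", cites, "answer text", ["[fig5]"], False),
    acceptance_row("Q3", cites, REFUSAL, [], True),
]
for r in rows:
    print(r)
```

In the notebook, the inputs per query would come from `demo_one(q["question"])`: `[e["cite"] for e in ep]` as `evidence_cites`, `ans` as `answer`, and `q["must_cite"]` as `must_cite`. With the current `generate_answer` stub, Q3 will fail, which is the expected signal that a generator (or the refusal logic) is still missing.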

## 11.5 Team work items (project enhancement)

Use this hands-on to **advance your semester project**. Each team member should “own” at least one deliverable below.

**Product Lead (Applicability)**
- Update your project **persona + workflow** so the multimodal RAG module is a *core feature*, not an add-on.
- Write 3 stakeholder tasks that map to your product’s real decision points (2 require text+figure evidence, 1 must refuse).

**Systems Lead (Integration)**
- Replace the toy dataset with your **project-domain PDFs + figures**.
- Add **metadata fields** that matter to your domain (e.g., policy date, version, department, study cohort, device model).
- Implement a clean **`retrieve()` API** your final demo can reuse.
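For the Systems Lead items, a minimal sketch of a reusable `retrieve()` that first filters on domain metadata and then ranks the remaining chunks follows. Everything here is hypothetical: the `Chunk` type with a free-form `metadata` dict, the example keys `doc_type` and `year`, and the word-overlap `overlap_score`, which only stands in for the notebook's dense, BM25, or hybrid scores.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Chunk:
    chunk_id: str
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)  # e.g. {"doc_type": "guideline", "year": 2023}


def overlap_score(query: str, chunk: Chunk) -> float:
    # Placeholder scorer: number of shared lowercase words.
    return float(len(set(query.lower().split()) & set(chunk.text.lower().split())))


def retrieve(query: str, corpus: List[Chunk], top_k: int = 5,
             filters: Optional[Dict[str, Any]] = None,
             score_fn: Callable[[str, Chunk], float] = overlap_score) -> List[Chunk]:
    """Keep chunks whose metadata exactly matches every filter, then rank them by score_fn."""
    filters = filters or {}
    candidates = [c for c in corpus if all(c.metadata.get(k) == v for k, v in filters.items())]
    return sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)[:top_k]


# Toy corpus (made-up text and metadata).
corpus = [
    Chunk("guide.pdf::p1", "dose table for drug a", {"doc_type": "guideline", "year": 2023}),
    Chunk("paper.pdf::p2", "drug a trial outcomes", {"doc_type": "research", "year": 2020}),
    Chunk("guide.pdf::p5", "exclusion criteria overview", {"doc_type": "guideline", "year": 2023}),
]
print([c.chunk_id for c in retrieve("drug a dose", corpus, top_k=2, filters={"doc_type": "guideline"})])
print([c.chunk_id for c in retrieve("drug a dose", corpus, top_k=2)])
```

With FAISS, filtering after the vector search can leave fewer than `top_k` survivors, so over-fetching candidates (or keeping one index per metadata slice) is a reasonable design choice.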

**Evaluation & Risk Lead (Shipping readiness)**
- Build a tiny evaluation table: *Task × Method × P@5 × R@10 × Faithfulness*.
- Add one real failure scenario + mitigation UX (warnings, “show evidence” first, or human-in-the-loop flag).
- Draft the “If we shipped this” plan: data refresh, monitoring, and governance rule.
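For the evaluation table, P@5 and R@10 can be computed from the ranked IDs a method returns and a hand-labeled set of relevant IDs per task. A minimal sketch follows; the function names and the toy IDs are hypothetical, and precision@k divides by k even when fewer than k items come back, which is one common convention.

```python
from typing import List, Set


def precision_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved IDs that are relevant (divides by k)."""
    if k <= 0:
        return 0.0
    return sum(1 for r in retrieved[:k] if r in relevant) / k


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant IDs that appear in the top-k."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


# Toy ranking and relevance labels for one task.
retrieved = ["p19", "p111", "p108", "p21", "fig7", "fig5"]
relevant = {"p19", "p21", "fig3"}
print("P@5 =", precision_at_k(retrieved, relevant, 5))
print("R@10 =", round(recall_at_k(retrieved, relevant, 10), 3))
```

Running this for each Task × Method pair (e.g. `retrieve_text(..., method="dense" | "sparse" | "hybrid")`) fills the retrieval columns; faithfulness still needs a human or model judgment of the generated answer.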

**Bonus (Optional)**
- Add a minimal UI (Gradio/Streamlit) that shows: question → evidence pack → answer with citations.


## 12) Week 3 deliverables (CS 5588)
- Product Brief completed (persona, problem, value, success metrics)
- Demo run for Q1–Q3 with citations (screenshots encouraged)
- 1 failure case + mitigation plan (risk + fix)
- Repo link submitted in the survey
