### High Risk Project - Retrieval-Augmented QA system over real clinical guidelines
#### Build a retrieval-augmented QA system over real clinical guidelines, where we’ll:
1. Ingest PDF guidelines (e.g. sepsis management)
2. Chunk the text and embed it with OpenAI’s embedding model
3. Index the vectors with FAISS
4. Build a query pipeline that retrieves the top chunks and asks an OpenAI chat model for a final answer

#### Step 1: Notebook Setup & PDF Ingestion

# Clinical-QA: Retrieval-Augmented System

**Goal:** Ask free-text clinical questions and get grounded answers from PDF guidelines.

**Outline:**
1. Setup, imports & PDF text extraction  
2. Chunking  
3. Embedding & FAISS indexing  
4. Question answering  
5. Interactive demo  
6. Save & publish code  

In [1]:
# ——— 1. (Re)install needed packages ———
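# The cells below call openai.Embedding and openai.ChatCompletion, which exist only
# in the pre-1.0 OpenAI SDK, so pin it. numpy and tqdm are imported later as well.
!pip install PyPDF2 faiss-cpu "openai<1" tiktoken numpy tqdm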

Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl.metadata (6.8 kB)
Collecting faiss-cpu
  Downloading faiss_cpu-1.11.0-cp311-cp311-macosx_14_0_arm64.whl.metadata (4.8 kB)
Collecting tiktoken
  Downloading tiktoken-0.9.0-cp311-cp311-macosx_11_0_arm64.whl.metadata (6.7 kB)
Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
Downloading faiss_cpu-1.11.0-cp311-cp311-macosx_14_0_arm64.whl (3.3 MB)
Downloading tiktoken-0.9.0-cp311-cp311-macosx_11_0_arm64.whl (1.0 MB)
Installing collected packages: PyPDF2, faiss-cpu, tiktoken
Successfully installed PyPDF2-3.0.1 faiss-cpu-1.11.0 tiktoken-0.9.0


In [2]:

# ——— 2. Imports ———
import os
import PyPDF2
import faiss
import openai
from tiktoken import get_encoding

In [4]:
assert "OPENAI_API_KEY" in os.environ

In [5]:
# ——— 3. Load a PDF from disk ———
pdf_path = "/Users/rajeshkohli/Documents/UT_Austin/AI_In_Healthcare/High_Risk_Project/Sepsis-Guidelines-UF-Shands.pdf"
reader = PyPDF2.PdfReader(pdf_path)
raw_text = "\n\n".join(page.extract_text() or "" for page in reader.pages)

print("Extracted", len(raw_text), "characters of text.")

Extracted 17137 characters of text.
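
Text extracted from a PDF usually carries irregular spacing and stray line breaks. The block below is a minimal sketch of an optional cleanup pass that could run on `raw_text` before chunking. It assumes that collapsing whitespace loses nothing important for this guideline; `normalize_whitespace` is a hypothetical helper, not part of the original notebook.

```python
import re

def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces/tabs and limit blank lines (hypothetical helper)."""
    # Replace runs of spaces and tabs with a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Drop the single spaces left directly around line breaks.
    text = re.sub(r" ?\n ?", "\n", text)
    # Keep at most one blank line between paragraphs.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()

sample = " \n 1 \n  POLICY AND PROCEDURE   ADULT SEVERE SEPSIS  \n\n\n\nPURPOSE:   Sepsis"
print(normalize_whitespace(sample))
```

Cleaning before chunking also means fewer tokens are spent on whitespace, so each 500-token chunk carries slightly more guideline text.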


#### Step 2: Chunk the PDF Text

## 2. Chunking the Text

We’ll:

1. Count tokens using the same tokenizer as our embedding model  
2. Split the raw text into fixed-size chunks (e.g. 500 tokens) with overlap (e.g. 50 tokens)  
3. Inspect a few chunks to make sure the splits look clean  

In [6]:
from tiktoken import get_encoding

# 1. Select the encoding for text-embedding-ada-002
enc = get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

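def chunk_text(text: str, max_tokens: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of up to max_tokens tokens that share `overlap` tokens."""
    if not 0 <= overlap < max_tokens:
        # Otherwise the window would never advance and the loop would not terminate.
        raise ValueError("overlap must be non-negative and smaller than max_tokens")
    tokens = enc.encode(text)
    step = max_tokens - overlap  # how far the window moves each iteration
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + max_tokens, len(tokens))
        chunks.append(enc.decode(tokens[start:end]))
        start += step
    return chunks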

# 2. Run chunking
chunks = chunk_text(raw_text, max_tokens=500, overlap=50)

# 3. Inspect results
print(f"Total chunks: {len(chunks)}")
print("\n--- Sample chunk [0] ---\n")
print(chunks[0][:500], "…")

Total chunks: 11

--- Sample chunk [0] ---

 
 1 
  POLICY AND PROCEDURE ADULT SEVERE SEPSIS AND SEPTIC SHOCK MANAGEMENT  SUBJECT:  Guidelines for the management of Severe Sepsis and Septic Shock at Shands UF   PURPOSE:   Sepsis is recognized as a challenging disease to overcome.  The progression of sepsis to severe sepsis and septic shock is devastating yielding a mortality of 30-80%.1 In an effort to reduce the morbidity and mortality from sepsis, Shands University of Florida Hospital has committed to identify and implement “bundles.”   …
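
The window arithmetic behind `chunk_text` can be checked without a tokenizer. The sketch below computes only the `(start, end)` token offsets of each chunk. `chunk_spans` is a hypothetical helper, and the 4,600-token input is an assumed figure for illustration, not a count measured from the guideline.

```python
def chunk_spans(n_tokens: int, max_tokens: int = 500, overlap: int = 50):
    """Return the (start, end) token offsets of each sliding-window chunk."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    step = max_tokens - overlap
    return [(start, min(start + max_tokens, n_tokens))
            for start in range(0, n_tokens, step)]

# 4,600 is a hypothetical token count, not measured from the guideline.
spans = chunk_spans(4600)
print(len(spans), spans[:2], spans[-1])
```

With a step of 450 tokens, the number of chunks is the token count divided by 450, rounded up. Each chunk repeats the last 50 tokens of the previous one, and the final chunk may be much shorter than the rest.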


#### Step 3: Embedding & FAISS Indexing

## 3. Embedding & Indexing

We’ll:

1. Use OpenAI’s `text-embedding-ada-002` to embed each chunk  
2. Build a FAISS index over these vectors for fast similarity search  
3. Persist both the FAISS index and our chunk list for querying  

In [7]:
import os
import openai
import faiss
import pickle
import numpy as np
from tqdm.auto import tqdm

# 0. Explicitly load the key into the client
openai.api_key = os.environ["OPENAI_API_KEY"]

# 1. Model name
model_name = "text-embedding-ada-002"

# 2. Embed each chunk
emb_list = []
for chunk in tqdm(chunks, desc="Embedding chunks"):
    resp = openai.Embedding.create(
        input=chunk,
        model=model_name
    )
    emb_list.append(resp["data"][0]["embedding"])

# 3. Convert to numpy array
emb_matrix = np.array(emb_list, dtype="float32")
print("Embeddings shape:", emb_matrix.shape)

# 4. Build FAISS index
dim = emb_matrix.shape[1]
index = faiss.IndexFlatL2(dim)
index.add(emb_matrix)
print("FAISS index contains", index.ntotal, "vectors")

# 5. Save to disk
faiss.write_index(index, "sepsis_guideline.index")
with open("sepsis_chunks.pkl", "wb") as f:
    pickle.dump(chunks, f)
print("Index and chunks saved.")

Embeddings shape: (11, 1536)
FAISS index contains 11 vectors
Index and chunks saved.
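
`faiss.IndexFlatL2` does no approximation: it compares the query against every stored vector and ranks by squared Euclidean distance. The pure-Python sketch below reproduces that computation on toy 2-D vectors to make the index's behavior concrete. `l2_top_k` and the toy vectors are illustrative assumptions, not part of the notebook.

```python
def l2_top_k(query, vectors, k):
    """Exhaustive nearest-neighbour search by squared L2 distance."""
    scored = [
        (sum((q - v) ** 2 for q, v in zip(query, vec)), idx)
        for idx, vec in enumerate(vectors)
    ]
    scored.sort()  # smallest distance first, ties broken by index
    return scored[:k]

toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for dist, idx in l2_top_k([1.0, 0.0], toy_vectors, k=2):
    print(idx, dist)
```

For an index of 11 vectors an exhaustive scan is effectively instantaneous, so a flat index is a reasonable choice here. Approximate index types only start to matter for much larger collections.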


#### Step 4: Build & Test the Query Function

## 4. Question Answering

We’ll:
1. Load our FAISS index and chunk list  
2. Embed incoming questions  
3. Retrieve the top-k most similar chunks  
4. Feed those chunks plus the question to OpenAI’s chat model  
5. Return the model’s answer  

In [8]:
import os, pickle, faiss, openai
import numpy as np
from tiktoken import get_encoding

# 0. Reload your API key
openai.api_key = os.environ["OPENAI_API_KEY"]

# 1. Load the saved FAISS index and chunks
index = faiss.read_index("sepsis_guideline.index")
with open("sepsis_chunks.pkl", "rb") as f:
    chunks = pickle.load(f)

# 2. Tokenizer (kept for token counting) and embedding helper
enc = get_encoding("cl100k_base")

def embed_text(text):
    """Embed a string with the same model that was used to build the index."""
    resp = openai.Embedding.create(
        input=text, model="text-embedding-ada-002"
    )
    return np.array(resp["data"][0]["embedding"], dtype="float32")

# 3. Define QA function
def answer_question(question, k=3):
    # 3a. Embed question
    q_emb = embed_text(question)
    # 3b. Retrieve top-k
    D, I = index.search(q_emb.reshape(1, -1), k)
    # FAISS pads the result with -1 when fewer than k vectors are available
    selected = [chunks[i] for i in I[0] if i != -1]
    # 3c. Build the context prompt
    context = "\n\n---\n\n".join(selected)
    system_prompt = (
        "You are a knowledgeable clinical assistant. "
        "Use the following extracted guideline snippets to answer the question truthfully."
    )
    user_prompt = f"CONTEXT:\n\n{context}\n\nQUESTION: {question}"
    # 3d. Call chat completion
    resp = openai.ChatCompletion.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user",   "content": user_prompt}
        ],
        temperature=0.0,
        max_tokens=300
    )
    return resp.choices[0].message.content.strip()

# 4. Test it
print(answer_question("What is the mortality range for severe sepsis and septic shock?"))

The mortality range for severe sepsis and septic shock is 30-80%.
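
The prompt assembly in step 3c can also be isolated and tested without calling the API. The sketch below is a hypothetical variant, not the notebook's code. It keeps the same `CONTEXT` / `QUESTION` layout but labels each snippet with `[n]` so an answer can be traced back to a chunk, and it refuses to build a prompt when retrieval returned nothing.

```python
def build_user_prompt(question: str, snippets: list[str]) -> str:
    """Assemble a prompt whose snippets carry [n] labels (hypothetical variant)."""
    if not snippets:
        raise ValueError("no snippets retrieved; refusing to build an ungrounded prompt")
    labelled = [f"[{n}] {text.strip()}" for n, text in enumerate(snippets, start=1)]
    context = "\n\n---\n\n".join(labelled)
    return f"CONTEXT:\n\n{context}\n\nQUESTION: {question}"

print(build_user_prompt(
    "What is the mortality range?",
    ["Septic shock yields a mortality of 30-80%.", "Bundles reduce mortality."],
))
```

Keeping prompt construction in its own function makes it straightforward to inspect exactly what the chat model sees for a given question.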


#### Step 5: Interactive Q&A Demo

Now let’s turn this into a little interactive demo so you (or anyone) can type in a question right in the notebook and get an answer on the fly.

## 5. Interactive Demo

The cell below answers a single question: edit the `question` string with any clinical question about sepsis management and re-run the cell. The system retrieves the relevant guideline snippets and answers from them.

In [9]:
# 🔧 Ask your question here:
question = "When should vasopressors be started in septic shock?"

# 🔍 Get an answer:
answer = answer_question(question, k=3)
print(f"Question: {question}\n\nAnswer:\n{answer}")

Question: When should vasopressors be started in septic shock?

Answer:
Vasopressors should be initiated in septic shock when the mean arterial pressure (MAP) is less than 65 mmHg or the systolic blood pressure (SBP) is less than 90 mmHg despite a fluid challenge of 20 ml/kg or 2 liters of crystalloid, or if the central venous pressure (CVP) is greater than 8 mmHg.
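
To keep asking questions without re-running the cell, the call can be wrapped in a loop that stops when the user types "exit". This is a minimal sketch: `qa_loop` is a hypothetical helper, and the `read` and `write` parameters exist only so that the loop can be exercised with scripted input, as in the stubbed demo at the bottom.

```python
def qa_loop(answer_fn, read=input, write=print):
    """Prompt for questions until the user types 'exit' (hypothetical helper)."""
    while True:
        question = read("Question (or 'exit'): ").strip()
        if question.lower() == "exit":
            break
        if not question:
            continue  # ignore empty input
        write(f"Answer:\n{answer_fn(question)}\n")

# Scripted demo with a stub answer function, so no API call is made.
scripted = iter(["What is MAP?", "exit"])
qa_loop(lambda q: f"(stub answer to: {q})", read=lambda prompt: next(scripted))
```

In the notebook itself the call would be `qa_loop(answer_question)`, which reads from the keyboard and sends each question through the retrieval pipeline.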


## 6. Save & Publish Code

We’ll push our notebook, index, and chunks to GitHub so others can reproduce our work.

- Create a new repo on GitHub (e.g. `sepsis-clinical-qa`).  
- Use your own repo’s SSH or HTTPS URL in the `git remote add origin` cell below.

In [10]:
# Initialize git (if you haven’t already)
!git init

hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint:
hint: 	git config --global init.defaultBranch <name>
hint:
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint:
hint: 	git branch -m <name>
Initialized empty Git repository in /Users/rajeshkohli/Documents/UT_Austin/AI_In_Healthcare/High_Risk_Project/.git/


In [11]:
# Add files
!git add High_Risk_Project_Rajesh_Kohli.ipynb sepsis_guideline.index sepsis_chunks.pkl


In [13]:
# Commit
!git commit -m "Initial retrieval-augmented clinical QA pipeline"

[master (root-commit) c8fc385] Initial retrieval-augmented clinical QA pipeline
 Committer: Rajesh Kohli <rajeshkohli@Mac-90.lan>
Your name and email address were configured automatically based
on your username and hostname. Please check that they are accurate.
You can suppress this message by setting them explicitly. Run the
following command and follow the instructions in your editor to edit
your configuration file:

    git config --global --edit

After doing this, you may fix the identity used for this commit with:

    git commit --amend --reset-author

 3 files changed, 479 insertions(+)
 create mode 100644 High_Risk_Project_Rajesh_Kohli.ipynb
 create mode 100644 sepsis_chunks.pkl
 create mode 100644 sepsis_guideline.index


In [14]:
# Add your GitHub repo URL and push
!git remote add origin https://github.com/rajesh-kohli/sepsis-clinical-qa

In [15]:
!git branch -M main

In [16]:
!git push -u origin main

Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 14 threads
Compressing objects: 100% (5/5), done.
Writing objects: 100% (5/5), 67.91 KiB | 22.63 MiB/s, done.
Total 5 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
remote: error: GH013: Repository rule violations found for refs/heads/main.
remote: 
remote: - GITHUB PUSH PROTECTION
remote:   —————————————————————————————————————————
remote:     Resolve the following violations before pushing again
remote: 
remote:     - Push cannot contain secrets
remote: 
remote:     
remote:      (?) Learn how to resolve a blocked push
remote:      https://docs.github.com/code-security/secret-scanning/working-with-secret-scanning-and-push-protection/working-with-push-protection-from-the-command-line#resolving-a-blocked-push
remote:     
remote:     
remote:       —— OpenAI API Key ————————————————————————————————————
remote:        locations:
remote:    
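
The push was rejected because GitHub's push protection found an OpenAI API key in the committed files. A check like the one below could be run on the notebook before committing. It is only a sketch: the regular expression is an assumed approximation of the key format rather than an official specification, and `find_possible_keys` is a hypothetical helper. The demo builds a fake key at runtime so that no real secret appears in the code.

```python
import json
import re

# Assumption: keys look like "sk-" followed by 20+ URL-safe characters.
KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_-]{20,}")

def find_possible_keys(notebook_json: str):
    """Return (cell_index, where) pairs for cells whose source or outputs match."""
    nb = json.loads(notebook_json)
    hits = []
    for i, cell in enumerate(nb.get("cells", [])):
        source = "".join(cell.get("source", []))
        if KEY_PATTERN.search(source):
            hits.append((i, "source"))
        # Outputs can also echo a key, so scan their serialized form too.
        if KEY_PATTERN.search(json.dumps(cell.get("outputs", []))):
            hits.append((i, "outputs"))
    return hits

demo = json.dumps({"cells": [
    {"cell_type": "code", "source": ["import os\n"], "outputs": []},
    {"cell_type": "code",
     "source": ['os.environ["OPENAI_API_KEY"] = "sk-' + "x" * 24 + '"\n'],
     "outputs": []},
]})
print(find_possible_keys(demo))
```

Deleting the key from the notebook is not sufficient once it has been committed, because it remains in the commit history. The usual remedy is to revoke the key, remove it from the file, and rewrite the offending commit (here there is only one, so `git commit --amend` would cover it) before pushing again.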