<a href="https://colab.research.google.com/github/phfrebelo/aiml-portfolio/blob/main/NLP_RAG_Project_Notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Problem Statement

### Business Context

The healthcare industry is rapidly evolving, with professionals facing increasing challenges in managing vast volumes of medical data while delivering accurate and timely diagnoses. The need for quick access to comprehensive, reliable, and up-to-date medical knowledge is critical for improving patient outcomes and ensuring informed decision-making in a fast-paced environment.

Healthcare professionals often encounter information overload, struggling to sift through extensive research and data to create accurate diagnoses and treatment plans. This challenge is amplified by the need for efficiency, particularly in emergencies, where time-sensitive decisions are vital. Furthermore, access to trusted, current medical information from renowned manuals and research papers is essential for maintaining high standards of care.

To address these challenges, healthcare centers can focus on integrating systems that streamline access to medical knowledge, provide tools to support quick decision-making, and enhance efficiency. Leveraging centralized knowledge platforms and ensuring healthcare providers have continuous access to reliable resources can significantly improve patient care and operational effectiveness.

**Common Questions to Answer**

**1. Diagnostic Assistance**: "What are the common symptoms and treatments for pulmonary embolism?"

**2. Drug Information**: "Can you provide the trade names of medications used for treating hypertension?"

**3. Treatment Plans**: "What are the first-line options and alternatives for managing rheumatoid arthritis?"

**4. Specialty Knowledge**: "What are the diagnostic steps for suspected endocrine disorders?"

**5. Critical Care Protocols**: "What is the protocol for managing sepsis in a critical care unit?"

### Objective

As an AI specialist, your task is to develop a RAG-based AI solution using renowned medical manuals to address healthcare challenges. The objective is to **understand** issues like information overload, **apply** AI techniques to streamline decision-making, **analyze** its impact on diagnostics and patient outcomes, **evaluate** its potential to standardize care practices, and **create** a functional prototype demonstrating its feasibility and effectiveness.

### Data Description

The **Merck Manuals** are medical references published by the American pharmaceutical company Merck & Co. They cover a wide range of medical topics, including disorders, tests, diagnoses, and drugs. The manuals have been published since 1899, when Merck & Co. was still a subsidiary of the German company Merck.

The manual is provided as a PDF with over 4,000 pages divided into 23 sections.

## Installing and Importing Necessary Libraries and Dependencies

In [None]:
# Installation for GPU llama-cpp-python
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.2.28 --force-reinstall --no-cache-dir -q

**Note**:
- After running the above cell, restart the runtime (Google Colab) or the notebook kernel (Jupyter Notebook), then run all cells sequentially starting from the next cell.
- The installation may print a warning about package dependencies. This warning can be ignored: the pinned versions ensure that all libraries and dependencies needed to run ***this notebook*** are installed.

In [None]:
# For installing the libraries & downloading models from HF Hub
!pip install -q \
  huggingface_hub==0.35.3 pandas==2.2.2 tiktoken==0.12.0 pymupdf==1.26.5 \
  langchain==0.3.27 langchain-community==0.3.31 chromadb==1.1.1 \
  sentence-transformers==5.1.1 numpy==2.3.3

**Note**:
- After running the above cell, restart the runtime (Google Colab) or the notebook kernel (Jupyter Notebook), then run all cells sequentially starting from the next cell.
- The installation may print a warning about package dependencies. This warning can be ignored: the pinned versions ensure that all libraries and dependencies needed to run ***this notebook*** are installed.

In [None]:
# Libraries for downloading and loading the LLM
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

## Question Answering using LLM

#### Downloading and Loading the model

In [None]:
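# resume_download is omitted: it is deprecated in recent huggingface_hub releases,
# which resume interrupted downloads by default
model_path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    filename="mistral-7b-instruct-v0.2.Q5_K_M.gguf",
)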

In [None]:
llm = Llama(model_path=model_path, n_ctx=8192, n_gpu_layers=-1, n_batch=512, verbose=True)

#### Response

In [None]:
def response(query, max_tokens=256, temperature=0.2, top_p=0.95, top_k=50):
    prompt = f"[INST] {query} [/INST]"
    out = llm(prompt=prompt, max_tokens=max_tokens, temperature=temperature, top_p=top_p, top_k=top_k)
    return out["choices"][0]["text"].strip()
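`response` wraps the raw query in Mistral's `[INST] ... [/INST]` instruction tags. Mistral-7B-Instruct-v0.2 has no separate system role, so a common workaround (assumed here, not taken from Mistral's documentation) is to place any system text inside the first `[INST]` block, ahead of the user message. The sketch below shows this with a hypothetical helper `build_mistral_prompt`; it omits the `<s>` BOS token on the assumption that llama-cpp-python adds it during tokenization.

```python
def build_mistral_prompt(user_message: str, system_message: str = "") -> str:
    """Wrap a (system, user) pair in Mistral-style [INST] tags.

    Hypothetical helper: the system text, if any, is prepended to the user
    turn inside the same [INST] block because the model has no system role.
    The BOS token <s> is left to the runtime (assumption).
    """
    parts = [system_message.strip(), user_message.strip()]
    body = "\n\n".join(p for p in parts if p)  # skip an empty system message
    return f"[INST] {body} [/INST]"


# Without a system message this reproduces the prompt that response() builds.
print(build_mistral_prompt("What is sepsis?"))
print(build_mistral_prompt("What is sepsis?", system_message="Be concise."))
```

Keeping the instruction tags in one helper makes it easier to apply the same format to every prompt in the notebook, including the RAG and judge prompts.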

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
user_input = "What is the protocol for managing sepsis in a critical care unit?"
response(user_input)

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [None]:
user_input2 = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
response(user_input2)

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
user_input3 = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
response(user_input3)

### Query 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
user_input4 = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
response(user_input4)

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [None]:
user_input5 = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
response(user_input5)

### Observations

- The LLM provides a coherent and medically plausible response based on its pretrained knowledge.
- However, answers are generated without referencing any external medical source, which may lead to hallucinations or outdated recommendations.
- For protocol-based questions (e.g., sepsis management), responses are high-level and may lack institution-specific steps.
- This highlights the limitation of using an LLM alone for clinical decision support and motivates the need for Retrieval-Augmented Generation (RAG).

## Question Answering using LLM with Prompt Engineering

In [None]:
system_prompt = """
You are a medical knowledge assistant helping healthcare professionals quickly summarize information.
You must follow these rules:

1) Do NOT invent facts. If you are uncertain or the question requires specifics you do not have, say:
   "Insufficient information to answer with certainty."
2) Do NOT provide individualized medical advice. Provide general clinical information only.
3) Keep the answer structured and concise, optimized for clinical scanning.
4) If medications are discussed, avoid exact dosing unless explicitly asked AND you are confident.
5) Use clear headings and bullet points.
"""

answer_format = """
Return your answer in this exact structure:

1) Summary (2-3 bullets)
2) Key clinical features / symptoms
3) Diagnostic approach
4) Management / treatment
5) Red flags / when to escalate
6) Notes / limitations (what you are assuming or what is missing)
"""

In [None]:
param_grid = [
    {"temperature":0.0, "max_tokens":300, "top_p":0.95, "top_k":50},
    {"temperature":0.1, "max_tokens":350, "top_p":0.95, "top_k":50},
    {"temperature":0.2, "max_tokens":400, "top_p":0.90, "top_k":40},
    {"temperature":0.3, "max_tokens":400, "top_p":0.95, "top_k":20},
    {"temperature":0.5, "max_tokens":450, "top_p":0.98, "top_k":50},
]

user_question = "What is the protocol for managing sepsis in a critical care unit?"

for i, p in enumerate(param_grid, 1):
    prompt = system_prompt + "\n\n" + answer_format + f"\n\nUser question:\n{user_question}\n\nAssistant answer:\n"
    ans = response(prompt, **p)
    print(f"\n--- Combo {i}: {p} ---\n{ans}\n")

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
user_question = "What is the protocol for managing sepsis in a critical care unit?"
user_input = (system_prompt + "\n\n" + answer_format + "\n\nUser question:\n" + user_question + "\n\nAssistant answer:\n")
response(user_input)

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [None]:
user_question2 = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
user_input2 = (system_prompt + "\n\n" + answer_format + "\n\nUser question:\n" + user_question2 + "\n\nAssistant answer:\n")
response(user_input2)

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
user_question3 = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
user_input3 = (system_prompt + "\n\n" + answer_format + "\n\nUser question:\n" + user_question3 + "\n\nAssistant answer:\n")
response(user_input3)

### Query 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
user_question4 = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
user_input4 = (system_prompt + "\n\n" + answer_format + "\n\nUser question:\n" + user_question4 + "\n\nAssistant answer:\n")
response(user_input4)

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [None]:
user_question5 = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
user_input5 = (system_prompt + "\n\n" + answer_format + "\n\nUser question:\n" + user_question5 + "\n\nAssistant answer:\n")
response(user_input5)

### Observations (Prompt Engineering)

- Adding a system prompt and structured answer format significantly improves readability and clinical scannability.
- Lower temperature values (0.0–0.2) produce more deterministic and protocol-like answers, which are preferable for healthcare use cases.
- Higher temperature values introduce more variability and narrative detail but may reduce precision.
- Explicitly instructing the model to avoid guessing reduces hallucinated medical details.
- Prompt engineering alone improves response quality but still does not guarantee factual grounding.
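The observations above depend on how `temperature`, `top_k`, and `top_p` reshape the model's next-token distribution. The toy sampler below applies them to a hand-made logit dictionary in plain Python: top-k keeps the k most likely tokens, top-p keeps the smallest prefix whose probability mass reaches `top_p`, and temperature rescales what survives before sampling, with `temperature <= 0` falling back to greedy decoding. The step order and the token names are simplifications chosen for illustration; llama.cpp's actual sampler chain may order or implement these steps differently.

```python
import math
import random


def _softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature=0.2, top_k=50, top_p=0.95, rng=None):
    """Toy next-token sampler: top-k, then top-p (nucleus), then temperature.

    `logits` maps token -> raw score. temperature <= 0 means greedy decoding.
    """
    rng = rng or random.Random(0)
    # Rank tokens by logit, highest first.
    ranked = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    if temperature <= 0:
        return ranked[0][0]  # greedy: always the most likely token

    # Top-k: keep only the k highest-scoring tokens (k <= 0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-p: keep the smallest prefix whose probability mass reaches top_p.
    probs = _softmax([score for _, score in ranked])
    kept, cumulative = [], 0.0
    for (token, score), p in zip(ranked, probs):
        kept.append((token, score))
        cumulative += p
        if cumulative >= top_p:
            break

    # Temperature: rescale the surviving logits and sample from the result.
    final_probs = _softmax([score / temperature for _, score in kept])
    tokens = [token for token, _ in kept]
    return rng.choices(tokens, weights=final_probs, k=1)[0]


toy_logits = {"fluids": 3.0, "antibiotics": 2.5, "vasopressors": 1.0, "surgery": -1.0}
print(sample_next_token(toy_logits, temperature=0.0))  # greedy: always "fluids"
draws = [sample_next_token(toy_logits, temperature=1.0, rng=random.Random(s)) for s in range(200)]
print(sorted(set(draws)))  # the low-probability tail is cut by top_p=0.95
```

With `temperature=0.0` the output never varies, which matches the more deterministic answers observed at low temperatures; raising the temperature flattens the surviving distribution and lets less likely tokens through.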

## Data Preparation for RAG

In [None]:
# Libraries for processing dataframes and text
import json
import os
import tiktoken
import pandas as pd

#Libraries for Loading Data, Chunking, Embedding, and Vector Databases
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

### Loading the Data

In [None]:
# Mount Google Drive to access the dataset
from google.colab import drive

drive.mount('/content/drive', force_remount=True)

# Path to the Merck Manual PDF on Google Drive
base_path = "/content/drive/MyDrive/Colab Notebooks/NLP/medical_diagnosis_manual.pdf"

pdf_loader = PyMuPDFLoader(base_path)

manual = pdf_loader.load()

### Data Overview

#### Checking the first 5 pages

In [None]:
for i in range(5):
    print(f"\n===== PAGE {i+1} =====")
    print(manual[i].page_content[:1500])

#### Checking the number of pages

In [None]:
print("Total pages:", len(manual))

### Data Chunking

In [None]:
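# Token-based chunking keeps chunk sizes predictable relative to the LLM context window.
# Note: cl100k_base is OpenAI's tokenizer, so its counts only approximate Mistral's tokenizer.
# 800-token chunks aim to keep multi-step clinical protocols together, and the
# 100-token overlap reduces the chance that information spanning a boundary is lost.
# Caveat: all-MiniLM-L6-v2 truncates inputs longer than 256 word pieces by default,
# so only the beginning of each 800-token chunk influences its embedding.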

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name='cl100k_base',
    chunk_size=800,
    chunk_overlap=100
)
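`chunk_size=800` and `chunk_overlap=100` mean that consecutive chunks share up to 100 tokens. `RecursiveCharacterTextSplitter` actually splits on separators (paragraphs, lines, spaces) and merges the pieces up to the size limit, so its boundaries are not fixed windows. The simplified fixed-window sketch below, using a hypothetical `window_chunks` function and whitespace-style "words" as a stand-in for cl100k_base tokens, only illustrates what the two parameters control.

```python
def window_chunks(tokens, chunk_size=800, chunk_overlap=100):
    """Split a token list into fixed windows where neighbours share `chunk_overlap` tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


# Words stand in for tokens here; the notebook counts cl100k_base tokens instead.
words = [f"w{i}" for i in range(20)]
for chunk in window_chunks(words, chunk_size=8, chunk_overlap=2):
    print(chunk[0], "...", chunk[-1], f"({len(chunk)} tokens)")
```

Each window starts `chunk_size - chunk_overlap` tokens after the previous one, so a sentence that straddles a boundary appears intact in at least one chunk as long as it is shorter than the overlap.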

In [None]:
# Split the pages already loaded into `manual` instead of re-parsing the PDF
document_chunks = text_splitter.split_documents(manual)

In [None]:
len(document_chunks)

In [None]:
document_chunks[0].page_content

In [None]:
document_chunks[-2].page_content

In [None]:
document_chunks[-1].page_content

### Embedding

In [None]:
# We use a transformer-based embedding model to convert text chunks into dense vectors.
# These embeddings capture semantic similarity, enabling retrieval of relevant medical passages
# even when exact keywords are not present in the query.

embedding_model = SentenceTransformerEmbeddings(model_name='all-MiniLM-L6-v2')

In [None]:
embedding_1 = embedding_model.embed_query(document_chunks[0].page_content)
embedding_2 = embedding_model.embed_query(document_chunks[1].page_content)

In [None]:
print("Dimension of the embedding vector ",len(embedding_1))
len(embedding_1)==len(embedding_2)

In [None]:
embedding_1,embedding_2
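Printing `embedding_1` and `embedding_2` shows raw 384-dimensional vectors, which are hard to interpret directly. Cosine similarity is the usual way to read how semantically close two chunks are: 1.0 means the vectors point in the same direction, 0.0 means they are orthogonal. The plain-Python sketch below uses short toy vectors as stand-ins for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy 3-d vectors; in the notebook these would be embedding_1 and embedding_2.
print(round(cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 1.0]), 3))
print(round(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]), 3))
print(round(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.5]), 3))
```

Because the embeddings are plain lists of floats, `cosine_similarity(embedding_1, embedding_2)` would run on them unchanged.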

### Vector Database

In [None]:
out_dir = 'manual_db'

# Create the persistence directory if it does not already exist
os.makedirs(out_dir, exist_ok=True)

In [None]:
# The vector database stores embeddings along with document metadata.
# This enables efficient semantic similarity search and allows retrieved answers
# to be traced back to specific sections of the medical manual.

vectorstore = Chroma.from_documents(
    document_chunks,
    embedding_model,
    persist_directory=out_dir
)

In [None]:
vectorstore = Chroma(persist_directory=out_dir,embedding_function=embedding_model)

In [None]:
vectorstore.embeddings

In [None]:
vectorstore.similarity_search("appendicitis sepsis",k=3)
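`similarity_search(..., k=3)` returns the three stored chunks whose embeddings lie closest to the query embedding. Chroma answers this with an approximate HNSW index and, unless the collection is configured otherwise, squared L2 distance. The exact brute-force sketch below, with a hypothetical `top_k_by_l2` function and invented chunk ids and 2-d vectors, shows the same ranking idea.

```python
import heapq


def top_k_by_l2(query, stored, k=3):
    """Return the k (distance, doc_id) pairs closest to `query` by squared L2 distance.

    `stored` maps a document id to its embedding vector. This is an exact scan,
    unlike the approximate index a vector database would use at scale.
    """
    def squared_l2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    scored = ((squared_l2(query, vec), doc_id) for doc_id, vec in stored.items())
    return heapq.nsmallest(k, scored)  # smallest distance = most similar


# Hypothetical chunk ids and 2-d vectors, for illustration only.
store = {
    "appendicitis_chunk": [0.9, 0.1],
    "sepsis_chunk": [0.2, 0.95],
    "alopecia_chunk": [-0.7, -0.6],
    "appendectomy_chunk": [0.85, 0.2],
}
for dist, doc_id in top_k_by_l2([0.8, 0.15], store, k=2):
    print(f"{doc_id}: {dist:.4f}")
```

A brute-force scan costs one distance computation per stored chunk per query, which is why an index such as HNSW becomes worthwhile for a manual with thousands of chunks.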

### Retriever

In [None]:
DEFAULT_K = 5

retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": DEFAULT_K}
)

In [None]:
# Sanity check (.invoke replaces the deprecated get_relevant_documents)
retriever.invoke("appendicitis symptoms treatment")

In [None]:
def get_retriever(k: int):
    return vectorstore.as_retriever(
        search_type="similarity",
        search_kwargs={"k": k}
    )

### System and User Prompt Template

In [None]:
qna_system_message = """
You are a medical knowledge assistant.

Rules:
- Answer ONLY using the reference sections provided.
- Do NOT mention "context", "retrieval", "documents", "excerpts", "passages", or "provided text".
- Do NOT describe your process.
- If the answer is not present, respond exactly: "I don't know".
- Include citations in the answer using this exact format: (Merck Manual, p. X).
- If you make multiple factual claims, include citations for each major claim.
"""

In [None]:
qna_user_message_template = """
###Context
Source text:
{context}

###Question
{question}
"""

### Response Function

In [None]:
def page_label(d):
    p = d.metadata.get("page", None)
    if isinstance(p, int):
        return str(p + 1)  # PyMuPDFLoader uses 0-based indexing
    return "NA"

In [None]:
def generate_rag_response(user_input, k=3, max_tokens=256, temperature=0.2, top_p=0.95, top_k=50):
    # Retrieve the k most similar chunks (.invoke replaces the deprecated get_relevant_documents)
    r = get_retriever(k)
    docs = r.invoke(user_input)

    # Consistent, model-friendly page markers so the model can cite pages
    context_for_query = "\n\n".join(
      [f"[p. {page_label(d)}] {d.page_content}" for d in docs]
    )

    user_message = qna_user_message_template.format(context=context_for_query, question=user_input)
    # Wrap in Mistral's instruction tags, as response() does for the LLM-only baseline
    prompt = f"[INST] {qna_system_message}\n{user_message} [/INST]"

    out = llm(prompt=prompt, max_tokens=max_tokens, temperature=temperature, top_p=top_p, top_k=top_k)
    return out["choices"][0]["text"].strip()
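`generate_rag_response` turns each retrieved chunk into a `[p. N]`-prefixed block so the model can cite pages in the `(Merck Manual, p. X)` format. The sketch below reproduces that assembly without LangChain, using a hypothetical minimal `Doc` dataclass in place of LangChain's `Document`. It also adds an optional character budget, `max_chars`, which is an assumption and not part of the notebook: lower-ranked chunks are dropped whole so that a large `k` cannot silently bloat the prompt.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (hypothetical, for illustration)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def page_label(d):
    # Same logic as the notebook: PyMuPDFLoader stores 0-based page indices.
    p = d.metadata.get("page")
    return str(p + 1) if isinstance(p, int) else "NA"


def build_context(docs, max_chars=None):
    """Join chunks as '[p. N] text' blocks; optionally stop before exceeding max_chars."""
    blocks, used = [], 0
    for d in docs:
        block = f"[p. {page_label(d)}] {d.page_content}"
        extra = len(block) + (2 if blocks else 0)  # account for the "\n\n" separator
        if max_chars is not None and used + extra > max_chars:
            break  # drop this and all lower-ranked chunks rather than cut one mid-text
        blocks.append(block)
        used += extra
    return "\n\n".join(blocks)


docs = [
    Doc("first chunk", {"page": 0}),
    Doc("second chunk", {"page": 41}),
    Doc("third chunk", {}),
]
print(build_context(docs))
print(build_context(docs, max_chars=40))
```

Dropping whole chunks keeps every page marker attached to complete text, so citations never point at a truncated passage.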

In [None]:
rag_param_grid = [
    {"k": 2, "temperature": 0.0, "max_tokens": 250},
    {"k": 3, "temperature": 0.0, "max_tokens": 300},
    {"k": 5, "temperature": 0.0, "max_tokens": 350},
    {"k": 5, "temperature": 0.2, "max_tokens": 350},
    {"k": 8, "temperature": 0.2, "max_tokens": 450},
]

q = "What is the protocol for managing sepsis in a critical care unit?"

for i, p in enumerate(rag_param_grid, 1):
    ans = generate_rag_response(q, k=p["k"], temperature=p["temperature"], max_tokens=p["max_tokens"])
    print(f"\n--- RAG Combo {i}: {p} ---\n{ans}\n")
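With `n_ctx=8192`, the larger `k` values in this sweep leave little headroom: eight 800-token chunks alone are 6,400 cl100k_base tokens, before the system prompt, the question, and the `max_tokens` reserved for the answer. Because chunk sizes were measured with cl100k_base rather than Mistral's own tokenizer, the hypothetical `fits_context` check below multiplies by an assumed `tokenizer_ratio`; both 1.2 and the 300-token `prompt_overhead` are placeholder guesses, not measured values.

```python
def fits_context(k, chunk_tokens=800, prompt_overhead=300, max_tokens=450,
                 n_ctx=8192, tokenizer_ratio=1.2):
    """Rough check that k retrieved chunks plus prompt and answer fit in n_ctx.

    tokenizer_ratio converts cl100k_base token counts to the generator's
    tokenizer; 1.2 and prompt_overhead=300 are assumptions to be measured.
    Returns (fits, estimated_total_tokens).
    """
    context_tokens = k * chunk_tokens * tokenizer_ratio
    total = int(round(context_tokens + prompt_overhead + max_tokens))
    return total <= n_ctx, total


# The (k, max_tokens) pairs used in rag_param_grid.
for k, max_tokens in [(2, 250), (3, 300), (5, 350), (8, 450)]:
    fits, total = fits_context(k, max_tokens=max_tokens)
    print(f"k={k}: ~{total} tokens -> {'fits' if fits else 'may overflow'}")
```

In the notebook, the ratio could be estimated by comparing `len(llm.tokenize(chunk.page_content.encode("utf-8")))` with the cl100k_base count for a sample of chunks.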

## Question Answering using RAG

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
user_input = "What is the protocol for managing sepsis in a critical care unit?"
print(generate_rag_response(user_input))

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [None]:
user_input2 = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
print(generate_rag_response(user_input2))

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
user_input3 = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
print(generate_rag_response(user_input3))

### Query 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
user_input4 = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
print(generate_rag_response(user_input4))

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [None]:
user_input5 = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
print(generate_rag_response(user_input5))

### Observations (RAG-based Answer)

- The response is grounded in retrieved excerpts from the Merck Manual, reducing hallucinations.
- Compared to the LLM-only answer, the RAG response is more specific and clinically reliable.
- The system correctly avoids answering when the required information is not present in the retrieved context.
- Retrieval quality strongly influences answer completeness; increasing `k` improves coverage for complex questions.
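Groundedness can be partly checked mechanically: the system prompt requires citations of the form `(Merck Manual, p. X)`, so each cited page can be compared with the page labels that were retrieved for that query. The hypothetical helper `check_citations` below does this with a regular expression. It only verifies that cited pages were in the context, not that the cited text supports the claim, and it will miss variants such as page ranges.

```python
import re

# Matches the citation format required by qna_system_message.
CITATION_RE = re.compile(r"\(Merck Manual, p\. (\d+)\)")


def check_citations(answer, retrieved_pages):
    """Compare '(Merck Manual, p. X)' citations in `answer` with retrieved page labels.

    `retrieved_pages` holds page_label() strings; non-numeric labels like "NA" are ignored.
    Returns the cited pages, any cited pages that were not retrieved, and whether
    the answer is the exact refusal string the prompt asks for.
    """
    cited = {int(p) for p in CITATION_RE.findall(answer)}
    retrieved = {int(p) for p in retrieved_pages if str(p).isdigit()}
    return {
        "refused": answer.strip() == "I don't know",
        "cited_pages": sorted(cited),
        "uncited": not cited,
        "pages_not_retrieved": sorted(cited - retrieved),
    }


# Placeholder answer text, not real model output.
answer = "Claim A (Merck Manual, p. 42). Claim B (Merck Manual, p. 43)."
print(check_citations(answer, retrieved_pages=["42", "44", "NA"]))
print(check_citations("I don't know", retrieved_pages=["42"]))
```

A non-empty `pages_not_retrieved` list flags a citation the model could not have taken from its context, which is a cheap signal to review before relying on the LLM judge.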

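## Fine-tuning (Parameter Tuning)

No model weights are updated in this section; only the sampling parameters (`temperature`, `top_p`, `top_k`, `max_tokens`) are varied per query, with retrieval fixed at the default `k=3`.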

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
user_input = "What is the protocol for managing sepsis in a critical care unit?"
print(generate_rag_response(user_input, temperature=0.5))

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [None]:
user_input2 = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
print(generate_rag_response(user_input2, temperature=0.1, max_tokens=350))

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
user_input3 = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
print(generate_rag_response(user_input3, top_p=0.98, top_k=20, max_tokens=256))

### Query 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
user_input4 = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
print(generate_rag_response(user_input4, temperature=0.5))

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [None]:
user_input5 = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
print(generate_rag_response(user_input5, temperature=0.1, max_tokens=200))

### Fine-tuning Insights

- In the `rag_param_grid` sweep, increasing the number of retrieved chunks (`k`) improved groundedness for complex protocols such as sepsis management.
- Larger chunks may preserve context continuity but can also pull in irrelevant text; chunk size was fixed at 800 tokens in this notebook, so this trade-off was not tested directly.
- Lower temperature values remain preferable for healthcare use cases because they make recommendations more consistent across runs.
- Tuning retrieval parameters had a greater impact on answer quality than tuning the LLM's sampling parameters alone; only one LLM was evaluated.

## Output Evaluation

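Let us now use the LLM-as-a-judge method to check the quality of the RAG system on two criteria: groundedness (is the answer supported by the retrieved context?) and relevance (does the answer address the question?). We illustrate this evaluation on the five questions from the previous sections. Note that `generate_ground_relevance_response` generates a fresh answer for each question (`k=5`, temperature 0.2) rather than reusing the earlier outputs.

- The same Mistral model is used as the judge, so the LLM is effectively grading its own answers. Self-evaluation of this kind may be biased in the model's favour, so the scores are best read as indicative rather than definitive.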

In [None]:
groundedness_rater_system_message = """
You are a strict Groundedness Rater for a Retrieval-Augmented Generation (RAG) system.

Goal:
Evaluate whether the ANSWER is supported by the CONTEXT excerpts provided. Groundedness means the answer’s factual claims can be traced to the context (directly stated or clearly implied). Do NOT use outside knowledge.

Instructions:
1) Read the QUESTION, CONTEXT, and ANSWER.
2) Identify the key factual claims in the ANSWER (diagnostic criteria, treatments, steps, contraindications, definitions, statistics, timelines, etc.).
3) For each key claim, check if it is:
   - SUPPORTED: explicitly stated or clearly implied by the CONTEXT.
   - UNSUPPORTED: not present in the CONTEXT.
   - CONTRADICTED: conflicts with the CONTEXT.
4) If the ANSWER includes medical dosing, exact protocols, or specific steps not present in the CONTEXT, mark those claims UNSUPPORTED.
5) If CONTEXT is insufficient, the correct behavior is for the ANSWER to say it lacks enough information.

Scoring rubric:
5 = Fully grounded: all key claims supported; no major gaps.
4 = Mostly grounded: minor unsupported details; core answer supported.
3 = Partially grounded: mixed; several key claims unsupported or vague.
2 = Not grounded: most claims unsupported; answer relies on outside knowledge.
1 = Contradicted: one or more key claims conflict with context OR answer is largely invented.
"""

In [None]:
relevance_rater_system_message = """
You are a strict Relevance Rater for a Retrieval-Augmented Generation (RAG) system.

Goal:
Evaluate whether the ANSWER directly addresses the QUESTION. Relevance measures alignment with the user's intent and task, not factual correctness or grounding.

Instructions:
1) Read the QUESTION and the ANSWER only. Ignore the CONTEXT.
2) Identify the core intent of the QUESTION (e.g., definition, protocol, symptoms, comparison, treatment steps).
3) Evaluate whether the ANSWER:
   - Directly addresses the core intent
   - Covers all major sub-parts of the question
   - Stays focused without unnecessary or tangential information
4) Penalize answers that are:
   - Vague, generic, or overly high-level
   - Missing required sub-questions
   - Off-topic or addressing a different problem
   - Overly verbose without adding relevant value

Scoring rubric:
5 = Fully relevant: directly and completely answers all parts of the question.
4 = Mostly relevant: answers the main intent but misses minor aspects.
3 = Partially relevant: addresses the question superficially or misses key parts.
2 = Not relevant: mostly off-topic or fails to answer the question.
1 = Irrelevant: does not address the question at all.
"""

In [None]:
user_message_template = """
QUESTION:
{question}

ANSWER:
{answer}

CONTEXT:
{context}

Please evaluate the ANSWER according to your role.
"""

In [None]:
# The evaluator LLM is run deterministically (temperature=0)
# to ensure reproducible and consistent scoring across runs.

import json

def safe_json_loads(s: str):
    try:
        return json.loads(s)
    except Exception:
        return {"raw_text": s}

def generate_ground_relevance_response(user_input, k=3, gen_max_tokens=256, gen_temperature=0.2,
                                      judge_max_tokens=256):
    r = get_retriever(k)
    docs = r.invoke(user_input)  # .invoke replaces the deprecated get_relevant_documents

    context_for_query = "\n\n".join(
      [f"[p. {page_label(d)}] {d.page_content}" for d in docs]
    )

    # Generate answer (RAG)
    user_message = qna_user_message_template.format(context=context_for_query, question=user_input)
    # Wrap in Mistral's instruction tags, matching response()
    gen_prompt = f"[INST] {qna_system_message}\n{user_message} [/INST]"

    gen_out = llm(prompt=gen_prompt, max_tokens=gen_max_tokens, temperature=gen_temperature, top_p=0.95, top_k=50)
    answer = gen_out["choices"][0]["text"].strip()

    # Judge
    eval_user_message = user_message_template.format(question=user_input, answer=answer, context=context_for_query)

    # Judge prompts use the same Mistral instruction tags as the generation prompt
    grounded_prompt = "[INST] " + groundedness_rater_system_message + "\n" + eval_user_message + """
Return JSON with keys:
- score (1-5)
- justification (1-3 sentences)
- unsupported_claims (list; can be empty)
[/INST]"""

    relevance_prompt = "[INST] " + relevance_rater_system_message + "\n" + eval_user_message + """
Return JSON with keys:
- score (1-5)
- justification (1-3 sentences)
- missing_parts (list; can be empty)
[/INST]"""

    ground_out = llm(prompt=grounded_prompt, max_tokens=judge_max_tokens, temperature=0.0, top_p=1.0, top_k=0)
    rel_out    = llm(prompt=relevance_prompt, max_tokens=judge_max_tokens, temperature=0.0, top_p=1.0, top_k=0)

    ground_text = ground_out["choices"][0]["text"].strip()
    rel_text    = rel_out["choices"][0]["text"].strip()

    return answer, safe_json_loads(ground_text), safe_json_loads(rel_text)
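`safe_json_loads` falls back to `{"raw_text": ...}` whenever the judge wraps its JSON in extra prose or a Markdown fence, which instruction-tuned models often do. A more forgiving parser, sketched below as a hypothetical replacement named `extract_judge_json`, scans for the first decodable JSON object with the standard library's `json.JSONDecoder.raw_decode` and only accepts it if `score` is an integer from 1 to 5, keeping the notebook's `raw_text` fallback otherwise.

```python
import json


def extract_judge_json(text):
    """Return the first JSON object embedded in `text`, with a validated 'score'.

    Falls back to {"raw_text": text} (the notebook's convention) if no object
    decodes or if 'score' is missing or outside 1-5.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`, ignoring trailing text.
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            start = text.find("{", start + 1)  # try the next opening brace
            continue
        score = obj.get("score") if isinstance(obj, dict) else None
        if isinstance(score, int) and not isinstance(score, bool) and 1 <= score <= 5:
            return obj
        break  # decoded, but the score is unusable
    return {"raw_text": text}


fence = "`" * 3  # a Markdown code fence, as a judge reply might contain
reply = ("Here is my rating:\n" + fence + "json\n"
         '{"score": 4, "justification": "Mostly supported.", "unsupported_claims": []}\n'
         + fence)
print(extract_judge_json(reply)["score"])
print(extract_judge_json("Score: four out of five"))
```

Validating the score range at parse time means downstream aggregation never has to guess whether a value like 9 or "four" was a real rating.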

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
answer, ground, rel = generate_ground_relevance_response("What is the protocol for managing sepsis in a critical care unit?", k=5)
print("ANSWER:\n", answer)
print("\nGROUNDEDNESS:\n", json.dumps(ground, indent=2))
print("\nRELEVANCE:\n", json.dumps(rel, indent=2))

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [None]:
answer, ground, rel = generate_ground_relevance_response("What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?", k=5)
print("ANSWER:\n", answer)
print("\nGROUNDEDNESS:\n", json.dumps(ground, indent=2))
print("\nRELEVANCE:\n", json.dumps(rel, indent=2))


### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
answer, ground, rel = generate_ground_relevance_response("What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?", k=5)
print("ANSWER:\n", answer)
print("\nGROUNDEDNESS:\n", json.dumps(ground, indent=2))
print("\nRELEVANCE:\n", json.dumps(rel, indent=2))

### Query 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
answer, ground, rel = generate_ground_relevance_response("What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?", k=5)
print("ANSWER:\n", answer)
print("\nGROUNDEDNESS:\n", json.dumps(ground, indent=2))
print("\nRELEVANCE:\n", json.dumps(rel, indent=2))

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [None]:
answer, ground, rel = generate_ground_relevance_response("What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?", k=5)
print("ANSWER:\n", answer)
print("\nGROUNDEDNESS:\n", json.dumps(ground, indent=2))
print("\nRELEVANCE:\n", json.dumps(rel, indent=2))
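The five evaluation cells print scores one query at a time. To monitor groundedness and relevance over time, as the recommendations section suggests, the parsed judge outputs can be collected and summarised. The hypothetical `summarise_scores` sketch below assumes each entry is `(query_label, ground_dict, rel_dict)` with the dicts shaped like `safe_json_loads` output, and it reports unparseable (`raw_text`) outputs separately instead of silently dropping them. The scores in the usage example are made up.

```python
from statistics import mean


def summarise_scores(results):
    """Summarise judge outputs: results is a list of (query_label, ground_dict, rel_dict)."""
    summary = {}
    for metric, idx in (("groundedness", 1), ("relevance", 2)):
        scores, unparsed = [], []
        for row in results:
            parsed = row[idx]
            score = parsed.get("score") if isinstance(parsed, dict) else None
            if isinstance(score, int) and 1 <= score <= 5:
                scores.append(score)
            else:
                unparsed.append(row[0])  # keep track of which query failed to parse
        summary[metric] = {
            "mean": round(mean(scores), 2) if scores else None,
            "min": min(scores) if scores else None,
            "n_scored": len(scores),
            "unparsed": unparsed,
        }
    return summary


# Made-up example values, not results from this notebook.
example = [
    ("Q1 sepsis", {"score": 4}, {"score": 5}),
    ("Q2 appendicitis", {"score": 5}, {"score": 4}),
    ("Q3 hair loss", {"raw_text": "unparseable reply"}, {"score": 3}),
]
print(summarise_scores(example))
```

In the notebook, each evaluation cell's `ground` and `rel` could be appended to a list with a short query label before calling `summarise_scores`; tracking the minimum alongside the mean surfaces individual weak answers that an average would hide.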

## Actionable Insights and Business Recommendations

### Key Business Insights
- A RAG-based assistant can substantially reduce the time clinicians spend searching through large medical manuals.
- Grounded, cited responses improve trust and reliability, which is critical in healthcare settings.
- Structured outputs reduce cognitive load and can support faster decision-making in emergency scenarios.
- In this prototype, retrieval settings (such as the number of retrieved chunks) affected answer reliability more than generation settings; model size itself was not varied.

### Business Recommendations
- Deploy the system as a clinical decision support tool rather than a decision-making authority.
- Integrate authoritative medical sources and enforce citation-based answers to ensure compliance and trust.
- Continuously monitor groundedness and relevance scores to identify retrieval or prompting issues.
- Customize prompts and retrieval strategies for high-impact use cases such as sepsis, trauma, and acute care.
- Invest in governance and auditing mechanisms to support regulatory and safety requirements.

Overall, a RAG-based medical knowledge assistant can enhance operational efficiency,
standardize care guidance, and support better clinical outcomes when used responsibly.
