## Problem Statement

### Business Context

The healthcare industry is rapidly evolving, with professionals facing increasing challenges in managing vast volumes of medical data while delivering accurate and timely diagnoses. The need for quick access to comprehensive, reliable, and up-to-date medical knowledge is critical for improving patient outcomes and ensuring informed decision-making in a fast-paced environment.

Healthcare professionals often encounter information overload, struggling to sift through extensive research and data to create accurate diagnoses and treatment plans. This challenge is amplified by the need for efficiency, particularly in emergencies, where time-sensitive decisions are vital. Furthermore, access to trusted, current medical information from renowned manuals and research papers is essential for maintaining high standards of care.

To address these challenges, healthcare centers can focus on integrating systems that streamline access to medical knowledge, provide tools to support quick decision-making, and enhance efficiency. Leveraging centralized knowledge platforms and ensuring healthcare providers have continuous access to reliable resources can significantly improve patient care and operational effectiveness.

**Common Questions to Answer**

**1. Diagnostic Assistance**: "What are the common symptoms and treatments for pulmonary embolism?"

**2. Drug Information**: "Can you provide the trade names of medications used for treating hypertension?"

**3. Treatment Plans**: "What are the first-line options and alternatives for managing rheumatoid arthritis?"

**4. Specialty Knowledge**: "What are the diagnostic steps for suspected endocrine disorders?"

**5. Critical Care Protocols**: "What is the protocol for managing sepsis in a critical care unit?"

### Objective

As an AI specialist, your task is to develop a RAG-based AI solution using renowned medical manuals to address healthcare challenges. The objective is to **understand** issues like information overload, **apply** AI techniques to streamline decision-making, **analyze** its impact on diagnostics and patient outcomes, **evaluate** its potential to standardize care practices, and **create** a functional prototype demonstrating its feasibility and effectiveness.

### Data Description

The **Merck Manuals** are medical references published by the American pharmaceutical company Merck & Co., that cover a wide range of medical topics, including disorders, tests, diagnoses, and drugs. The manuals have been published since 1899, when Merck & Co. was still a subsidiary of the German company Merck.

The manual is provided as a PDF with over 4,000 pages divided into 23 sections.

## Installing and Importing Necessary Libraries and Dependencies

In [None]:
# Installation for GPU llama-cpp-python
# uncomment and run the following code in case GPU is being used
!CMAKE_ARGS="-DLLAMA_CUBLAS=ON" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q



- **Run the above installation code only once.**
- After installation, check if the following file exists:
  
  ```
  .venv/lib/python3.12/site-packages/llama_cpp/ggml-metal.metal
  ```

- **Note:** Regular `pip install llama-cpp-python` will **not** enable GPU support. You must install it as shown above for GPU acceleration.

- You may see some errors related to `numpy`. These can be ignored for now; we will address them in the next step.

**To install all required dependencies for this notebook, run the following command in your terminal:**

`pip install -r requirements_rag.txt`

or install your versions of libraries using !pip install

This will fix the numpy errors


In [None]:
### Restart Kernel after running the above code ###

In [1]:
#Libraries for processing dataframes,text
import json,os
import tiktoken
import pandas as pd

#Libraries for Loading Data, Chunking, Embedding, and Vector Databases
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

#Libraries for downloading and loading the llm
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

  from .autonotebook import tqdm as notebook_tqdm


## Question Answering using LLM

#### Downloading and Loading the model

In [2]:
import sys
print(sys.version)

3.12.8 (v3.12.8:2dc476bcb91, Dec  3 2024, 14:43:19) [Clang 13.0.0 (clang-1300.0.29.30)]


In [2]:
model_path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    filename="mistral-7b-instruct-v0.2.Q6_K.gguf"
)

In [3]:
print(model_path)

/Users/jithinravi/.cache/huggingface/hub/models--TheBloke--Mistral-7B-Instruct-v0.2-GGUF/snapshots/3a6fbf4a41a1d52e415a4958cde6856d34b2db93/mistral-7b-instruct-v0.2.Q6_K.gguf


In [4]:
# Installation for GPU llama-cpp-python
# uncomment and run the following code in case GPU is being used
!CMAKE_ARGS="-DLLAMA_CUBLAS=OFF"

In [6]:
lcpp_llm = Llama(
    model_path=model_path,
    n_threads=10,  # CPU cores
    n_batch=512,  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.
    n_gpu_layers=43,  # uncomment and change this value based on GPU VRAM pool.
    n_ctx=4096,  # Context window
verbose=False,  # Reduce verbose output
)

llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from /Users/jithinravi/.cache/huggingface/hub/models--TheBloke--Mistral-7B-Instruct-v0.2-GGUF/snapshots/3a6fbf4a41a1d52e415a4958cde6856d34b2db93/mistral-7b-instruct-v0.2.Q6_K.gguf (version unknown)
llama_model_loader: - tensor    0:                token_embd.weight q6_K     [  4096, 32000,     1,     1 ]
llama_model_loader: - tensor    1:              blk.0.attn_q.weight q6_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    2:              blk.0.attn_k.weight q6_K     [  4096,  1024,     1,     1 ]
llama_model_loader: - tensor    3:              blk.0.attn_v.weight q6_K     [  4096,  1024,     1,     1 ]
llama_model_loader: - tensor    4:         blk.0.attn_output.weight q6_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    5:            blk.0.ffn_gate.weight q6_K     [  4096, 14336,     1,     1 ]
llama_model_loader: - tensor    6:              blk.0.ffn_up.weight q6_K     

#### Response

In [7]:
def response(query,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    model_output = lcpp_llm (
      prompt=query,
      max_tokens=max_tokens,
      temperature=temperature,
      top_p=top_p,
      top_k=top_k
    )
    #print("Model Output:", model_output)
    return model_output['choices'][0]['text']

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [8]:
query1 = "What is the protocol for managing sepsis in a critical care unit?"
response_text1 = response(query1)
print("Response:", response_text1)

Response: 

Sepsis is a life-threatening condition that can arise from an infection, and it requires prompt recognition and aggressive management in a critical care unit. The following are general steps for managing sepsis in a critical care unit:

1. Early recognition: Recognize the signs and symptoms of sepsis early and initiate treatment as soon as possible. Sepsis can present with various clinical features, including fever or hypothermia, tachycardia or bradycardia, altered mental status, respiratory distress, and lactic acidosis.
2. Source control


### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [9]:
query2 = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
response2_text = response(query2)
print("Response:", response2_text)

Response: 

Appendicitis is a medical condition characterized by inflammation of the appendix, a small tube-shaped organ located in the lower right side of the abdomen. The symptoms of appendicitis can vary from person to person, but some common signs include:

1. Abdominal pain: The pain is typically located in the lower right quadrant of the abdomen and may start as a mild discomfort that gradually worsens over time. The pain may be constant or intermittent and may be aggravated by movement, deep breathing, or coughing.
2. Loss of appetite


### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [10]:
query3 = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
response3_text = response(query3)
print("Response:", response3_text)

Response: 

Sudden patchy hair loss, also known as alopecia areata, is a common autoimmune disorder that affects the hair follicles. It can result in round or oval bald patches on the scalp, but it can also occur on other parts of the body such as the beard area, eyebrows, or eyelashes.

The exact cause of alopecia areata is not known, but it's believed to be related to a problem with the immune system. Some possible triggers for this condition include stress, genetics, viral infections, and certain medications.


### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [11]:
query4 = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
response4_text = response(query4)
print("Response:", response4_text)

Response: 

A person who has sustained a physical injury to the brain tissue may require various treatments depending on the severity and location of the injury. Here are some common treatments that may be recommended:

1. Emergency care: In case of a traumatic brain injury (TBI), it is essential to seek emergency medical attention as soon as possible. The primary goal of emergency care is to prevent further damage to the brain, stabilize vital signs, and manage any life-threatening conditions.
2. Medications: Depending on the symptoms, healthcare professionals may prescribe medications to manage various conditions associated with a


### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [12]:
query5 = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
response5_text = response(query5)
print("Response:", response5_text)

Response: 

First and foremost, if you suspect that someone has fractured their leg while hiking, it's essential to ensure their safety and prevent further injury. Here are some necessary precautions:

1. Keep the person calm and still: Encourage them to remain as still as possible to minimize pain and prevent worsening the injury.
2. Assess the situation: Check for any signs of shock, such as pale skin, rapid heartbeat, or shallow breathing. If you notice these symptoms, seek medical help immediately.
3. Immobilize the leg: Use a splint, sl


## Summary - First Approach - Plain LLM response

- **Initial Approach:**  
    The plain LLM model is used for question answering without any prompt engineering or parameter tuning.

- **Default Model Behavior:**  
    The model responds to queries using its pre-trained knowledge, without any additional guidance or customization.

- **Limitations:**  
    - While this method is straightforward and quick to implement, it may not yield the most accurate or contextually relevant answers—especially for specialized domains like medicine. Without tuning parameters (e.g., `temperature`, `max_tokens`) or providing system/user prompts, the model's outputs can be generic and may lack the specificity or structure required for clinical use. 
    - We also see the answers are largely `truncated` which is not ideal.

- **Advanced Techniques:**  
    More advanced methods, such as prompt engineering or Retrieval-Augmented Generation (RAG), can significantly improve the quality and reliability of responses in such applications.

## Question Answering using LLM with Prompt Engineering

In [13]:
# Example of prompt engineering for better LLM responses

# Define a system prompt to instruct the LLM to answer as a medical expert using evidence from the Merck Manual
qna_system_message = (
    "You are a highly knowledgeable medical assistant. "
    "Be concise, accurate. If you don't know the answer, say 'I don't know'. "
    "Not more than 4 points in your response.No more than approximately 30 words in each point. "
)

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [14]:
response_text_pe1 = response(qna_system_message + " " + query1, max_tokens=256).strip()
print("Response with prompt engineering:")
print(response_text_pe1) 

Response with prompt engineering:
1. Immediate antibiotic administration based on suspected infection source and sensitivity results.
2. Aggressive fluid resuscitation to maintain adequate tissue perfusion.
3. Close monitoring of vital signs, lactate levels, and organ function.
4. Consideration of vasopressors or mechanical ventilation as needed for hemodynamic instability.


**Note:**  
We added a system prompt to guide the model's responses and used the `max_tokens` parameter to ensure the responses are not truncated.

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [15]:
response_text_pe2 = response(qna_system_message + " " + query2, max_tokens=256).strip()
print("Response with prompt engineering:")
print(response_text_pe2)

Response with prompt engineering:
1. Appendicitis: Sudden pain around navel, loss of appetite, fever, abdominal swelling, and vomiting.
2. No cure with medicine: Appendicitis requires surgery to remove the inflamed appendix before it bursts and spreads infection.
3. Surgical procedure: Laparoscopic appendectomy is a common surgical procedure used to remove the appendix through small incisions, minimizing recovery time.


In [16]:
response_text_pe2 = response(qna_system_message + " " + query2, max_tokens=256, temperature=5).strip()
print("Response with prompt engineering:")
print(response_text_pe2)

Response with prompt engineering:
1. Abdominal pain, especially around the navel that migrates to the lower right side

2. Loss of appetite
3. Fever
4. Vomiting or feeling sick (nausea)

Applying heat or antibiotics may relieve mild symptoms but won't cure appendicitis. The only treatment is appendectomy - surgery to remove the infected appendix.


**Note:**  
Increasing the temperature parameter has made the model's responses more non-deterministic. This is evident in the following:  

- Some points in the response are overly brief, sometimes consisting of just a single word. Sometimes not answering the question at all 

This behavior highlights the trade-off between creativity and consistency when adjusting the temperature setting.

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [17]:
response_text_pe3 = response(qna_system_message + " " + query3, max_tokens=256).strip()
print("Response with prompt engineering:")
print(response_text_pe3)

Response with prompt engineering:
1. Minoxidil: Over-the-counter topical treatment that stimulates hair growth and slows down hair loss in affected areas.
2. Corticosteroids: Prescription medication to reduce inflammation and suppress immune system response causing alopecia areata.
3. DHT blockers: Finasteride or dutasteride can inhibit the production of dihydrotestosterone, a hormone contributing to male pattern baldness.
4. Hair transplant: Surgical procedure where healthy hair follicles are moved from donor areas to bald spots for permanent regrowth.

Possible causes: autoimmune disorders, stress, genetics, nutritional deficiencies, certain medications, or infections. Consult a healthcare professional for proper diagnosis and treatment plan.


In [18]:
response_text_pe3 = response(qna_system_message + " " + query3, max_tokens=256, temperature= 5, top_p=0.4).strip()
print("Response with prompt engineering:")
print(response_text_pe3)

Response with prompt engineering:
1. Minoxidil: Over-the-counter topical treatment that stimulates hair growth, effective for androgenetic alopecia and some types of patchy hair loss.
2. Corticosteroids: Prescription medication to reduce inflammation and suppress the immune system, useful for treating alopecia areata, an autoimmune cause of sudden hair loss.
3. Hair transplant surgery: Permanent solution for bald spots caused by androgenetic alopecia or other forms of permanent hair loss.
4. Nutritional deficiencies: Correcting vitamin or mineral deficiencies, such as iron or biotin, can help improve hair growth and potentially reverse patchy hair loss.


**Note:**
- When `top_p` is set low (e.g., 0.4), the model only considers the most probable tokens, often resulting in shorter, less detailed, or less varied answers. This can cause the model to omit less common details—like "possible causes"—and focus only on the most typical treatments or facts. 


### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [19]:
response_text_pe4 = response(qna_system_message + " " + query4, max_tokens=256 ).strip()
print("Response with prompt engineering:")
print(response_text_pe4)

Response with prompt engineering:
1. Rehabilitation therapy: Occupational, speech, and physical therapy help improve function and compensate for deficits.
2. Medications: Anti-seizure drugs, pain relievers, and psychotropic medications may be prescribed to manage symptoms.
3. Surgery: Depending on the injury's location and severity, surgery might be necessary to remove hematomas or repair damaged tissue.
4. Lifestyle modifications: Rest, proper nutrition, and stress management can aid in recovery and help maintain brain health.


In [20]:
response_text_pe4 = response(qna_system_message + " " + query4, max_tokens=256, top_k=1).strip()
print("Response with prompt engineering:")
print(response_text_pe4)

Response with prompt engineering:
1. Rehabilitation therapy: Occupational, speech, and physical therapy help improve function and compensate for deficits.
2. Medications: Anti-seizure drugs, pain relievers, and psychotropic medications may be prescribed to manage symptoms.
3. Surgery: Depending on the injury's location and severity, surgery might be necessary to remove hematomas or repair damaged tissue.
4. Lifestyle modifications: Rest, proper nutrition, and stress management can aid in recovery and help maintain brain health.


**Note**
- In our case the model is generating deterministic responses, changing `top_k` may not have any effect. (mostly because we set temp to zero)

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [21]:
response_text_pe5 = response(qna_system_message + " " + query5, max_tokens=256 ).strip()
print("Response with prompt engineering:")
print(response_text_pe5)

Response with prompt engineering:
1. Immediate first aid: Apply a sterile dressing to the wound, immobilize the leg with a splint or sling, and control bleeding if necessary.
2. Seek medical attention: Transport the person to the nearest hospital or urgent care center for proper diagnosis and treatment of the fracture.
3. Pain management: Administer over-the-counter pain medication or prescription painkillers as directed by a healthcare professional.
4. Follow doctor's instructions: Attend follow-up appointments, undergo rehabilitation therapy, and adhere to any weight-bearing restrictions or other guidelines provided by the healthcare team.


## Summary - Second Approach - After prompt Engineering and fine tuning
The chosen parameters for a deterministic model configuration are:

- **max_tokens=256**: Limits the response length to a maximum of 256 tokens, ensuring concise and controlled outputs.
- **temperature=0**: Eliminates randomness, making the models responses deterministic and consistent.
- **top_p=0.95**: Enables nucleus sampling, considering only the top 95% of the probability mass for token selection, ensuring relevance while maintaining some flexibility.
- **top_k=50**: Restricts token selection to the top 50 most probable tokens, further refining the response quality.

This configuration is ideal for generating precise, repeatable, and contextually accurate outputs, especially in critical applications like medical question answering.

## Data Preparation for RAG

### Loading the Data

In [22]:
#load the pdf file
loader = PyMuPDFLoader("medical_diagnosis_manual.pdf")
documents = loader.load()

### Data Overview

#### Checking the first 2 pages

In [23]:
# Display the content of the first 2 pages
for i, doc in enumerate(documents[:2]):
    print(f"Page {i + 1} Content:\n{doc.page_content}\n{'-' * 80}")

Page 1 Content:
hovels_bounces.0v@icloud.com
Y74OHC3EQZ
for personal use by hovels_bounces.0v@
shing the contents in part or full is liable 

--------------------------------------------------------------------------------
Page 2 Content:
hovels_bounces.0v@icloud.com
Y74OHC3EQZ
This file is meant for personal use by hovels_bounces.0v@icloud.com only.
Sharing or publishing the contents in part or full is liable for legal action.

--------------------------------------------------------------------------------


#### Checking the number of pages

In [24]:
# Check the number of pages in the loaded PDF document
num_pages = len(documents)
print(f"Number of pages in the PDF: {num_pages}")

Number of pages in the PDF: 4114


### Data Chunking

In [25]:
# Chunk the documents for embedding using RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,      # Number of characters per chunk
    chunk_overlap=200     # Overlap between chunks to preserve context
)

# Split all documents into chunks
chunks = text_splitter.split_documents(documents)

print(f"Total number of chunks created: {len(chunks)}")
print("Sample chunk:\n", chunks[0].page_content[:500])


Total number of chunks created: 18088
Sample chunk:
 hovels_bounces.0v@icloud.com
Y74OHC3EQZ
for personal use by hovels_bounces.0v@
shing the contents in part or full is liable


In [26]:
print("Sample chunk:\n", chunks[-1].page_content[-500:])

Sample chunk:
 201, 910
mastocytosis vs 1125
Menetrier's disease vs 132
peptic ulcer disease vs 134
Zolmitriptan 1721
Zolpidem 1709, 3103
Zonisamide 1701
Zoonotic diseases, cutaneous 718
Zoophobia 1498
Zoster (see Herpes zoster virus infection)
Zygomycosis 1332
The Merck Manual of Diagnosis & Therapy, 19th Edition
Z
4104
hovels_bounces.0v@icloud.com
Y74OHC3EQZ
This file is meant for personal use by hovels_bounces.0v@icloud.com only.
Sharing or publishing the contents in part or full is liable for legal action.


### Embedding

In [27]:
def embed_and_create_chroma(chunks, persist_directory="chroma_db", embedding_model_name="all-MiniLM-L6-v2"):
    """
    Embeds the provided chunks and creates a persistent Chroma vector store.

    Args:
        chunks (list): List of langchain Document objects to embed.
        persist_directory (str): Directory to persist the Chroma database.
        embedding_model_name (str): Name of the sentence-transformer model to use for embeddings.

    Returns:
        Chroma: A persistent Chroma vector store instance.
    """
    # Create embedding function
    embedding_function = SentenceTransformerEmbeddings(model_name=embedding_model_name)

    # Create Chroma vector store and persist it
    vectordb = Chroma.from_documents(
        documents=chunks,
        embedding=embedding_function,
        persist_directory=persist_directory
    )
    vectordb.persist()
    return vectordb

### Vector Database

In [28]:
# Create and persist the Chroma vector store
persist_directory = "chroma_db"  # Directory to store the Chroma database
embedding_model_name = "all-MiniLM-L6-v2"  # Name of the embedding model

vectordb = embed_and_create_chroma(chunks, persist_directory=persist_directory, embedding_model_name=embedding_model_name)

print("Chroma vector store created and persisted successfully.")

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
0it [00:00, ?it/s]
Matplotlib is building the font cache; this may take a moment.
Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given


Chroma vector store created and persisted successfully.


In [29]:
# Check the number of documents (chunks) stored in the Chroma vector database  vs Original chunks
print(f"Total number of chunks created: {len(chunks)}")

Total number of chunks created: 18088


In [30]:
# Check the number of documents (chunks) stored in the Chroma vector database
print("Number of chunks in the vector database:", vectordb._collection.count())

# Optionally, check a few sample chunks stored in the vector database
sample_docs = vectordb.get(include=['documents'], ids=[str(i) for i in range(5)])
for idx, doc in enumerate(sample_docs['documents']):
    print(f"Chunk {idx}:\n{doc[:500]}\n{'-'*60}")

Number of chunks in the vector database: 18088


### Retriever

In [31]:
# Create a retriever from the Chroma vector database
retriever = vectordb.as_retriever(  search_type='similarity',search_kwargs={"k": 3})

In [32]:
retriever.get_relevant_documents("How to treat a patient with sepsis in a critical care unit?")

Failed to send telemetry event CollectionQueryEvent: capture() takes 1 positional argument but 3 were given


[Document(page_content='16 - Critical Care Medicine\nChapter 222. Approach to the Critically Ill Patient\nIntroduction\nCritical care medicine specializes in caring for the most seriously ill patients. These patients are best\ntreated in an ICU staffed by experienced personnel. Some hospitals maintain separate units for special\npopulations (eg, cardiac, surgical, neurologic, pediatric, or neonatal patients). ICUs have a high\nnurse:patient ratio to provide the necessary high intensity of service, including treatment and monitoring\nof physiologic parameters.\nSupportive care for the ICU patient includes provision of adequate nutrition (see p. 21) and prevention of\ninfection, stress ulcers and gastritis (see p. 131), and pulmonary embolism (see p. 1920). Because 15 to\n25% of patients admitted to ICUs die there, physicians should know how to minimize suffering and help\ndying patients maintain dignity (see p. 3480).\nPatient Monitoring and Testing', metadata={'author': '', 'creationDa

### System and User Prompt Template

In [33]:
# System and user prompt templates for RAG-based medical assistant

qna_system_message = (
    "You are a highly knowledgeable medical assistant. "
    "Use only the provided context from the Merck Manual to answer the user's question. "
    "Be concise, accurate, and cite relevant sections if possible. "
    "If the answer is not present in the context, say 'I don't know'. "
    "Limit your response to a maximum of 4 bullet points, each not exceeding 30 words."
)

qna_user_message_template = (
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)

### Response Function

In [34]:
def generate_rag_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.get_relevant_documents(query=user_input,k=k)
    context_list = [d.page_content for d in relevant_document_chunks]

    # Combine document chunks into a single context
    context_for_query = ". ".join(context_list)

    user_message = qna_user_message_template.replace('{context}', context_for_query)
    user_message = user_message.replace('{question}', user_input)

    prompt = qna_system_message + '\n' + user_message

    #print("Prompt for LLM:\n", prompt)

    # Generate the response
    try:
        response = lcpp_llm(
                  prompt=prompt,
                  max_tokens=max_tokens,
                  temperature=temperature,
                  top_p=top_p,
                  top_k=top_k
                  )

        # Extract and print the model's response
        response = response['choices'][0]['text'].strip()
    except Exception as e:
        response = f'Sorry, I encountered the following error: \n {e}'

    return response

## Question Answering using RAG

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [35]:
response_text_rag1 = generate_rag_response(query1,k=3,max_tokens=256,temperature=0,top_p=0.95,top_k=50)
print("Response with RAG:")
print(response_text_rag1)

Response with RAG:
1. Provide supplemental oxygen and secure airway if necessary through intubation and mechanical ventilation.
2. Insert two large IV catheters into separate peripheral veins or use a central venous line or intraosseous needle for access.
3. Initiate aggressive fluid resuscitation.
4. Administer antibiotics promptly based on culture results and suspected organism, and consider surgical intervention for infected or necrotic tissues.


### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [36]:
response_text_rag2 = generate_rag_response(query2,k=3,max_tokens=256,temperature=0,top_p=0.95,top_k=50)
print("Response with RAG:")
print(response_text_rag2)

Response with RAG:
1. Common symptoms include epigastric or periumbilical pain followed by brief nausea, vomiting, anorexia, and a shift in pain to the right lower quadrant after a few hours. Pain increases with cough and motion. Direct and rebound tenderness are located at McBurney's point.
2. Appendicitis cannot be cured via medicine; it requires surgical removal.
3. The standard surgical procedure for appendicitis is an appendectomy, which involves removing the inflamed appendix.
4. Before the operation, an NGT is inserted, and patients with signs of volume depletion should have urine output monitored with a catheter. IV antibiotics effective against intestinal flora are given to maintain fluid status and prevent infection.


### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [37]:
response_text_rag3 = generate_rag_response(query3,k=3,max_tokens=256,temperature=0,top_p=0.95,top_k=50)
print("Response with RAG:")
print(response_text_rag3)

Response with RAG:
1. Sudden patchy hair loss can be caused by various conditions such as fungal infections (tinea capitis), bacterial infections, or autoimmune disorders like alopecia areata.
2. For fungal infections, topical or oral antifungals are effective treatments.
3. For alopecia areata, treatment options include topical corticosteroids, minoxidil, anthralin, immunotherapy (diphencyprone or squaric acid dibutylester), or psoralen plus ultraviolet A (PUVA).
4. In cases of trichotillomania, behavior modification, clomipramine, or selective serotonin reuptake inhibitors (SSRIs) may be beneficial.

Note: The provided context does not mention any surgical options for treating sudden patchy hair loss specifically.


### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [38]:
response_text_rag4 = generate_rag_response(query4,k=3,max_tokens=256,temperature=0,top_p=0.95,top_k=50)
print("Response with RAG:")
print(response_text_rag4)

Response with RAG:
1. Ensure reliable airway and maintain adequate ventilation, oxygenation, and blood pressure.
2. Surgery may be needed for monitoring intracranial pressure, decompression, or hematoma removal.
3. In the first few days, maintain adequate brain perfusion and oxygenation to prevent complications.
4. Subsequently, rehabilitation is often required to improve functional recovery.


### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [39]:
response_text_rag5 = generate_rag_response(query5,k=3,max_tokens=256,temperature=0,top_p=0.95,top_k=50)
print("Response with RAG:")
print(response_text_rag5)

Response with RAG:
1. Treat life-threatening injuries first, such as excessive bleeding or signs of shock.
2. Immobilize the injured leg using a splint to prevent further damage.
3. For lower-limb prosthesis users, maintain body alignment and adjust fit as needed to prevent skin breakdown.
4. Seek medical attention for hip fracture surgery and rehabilitation, including walking aids or orthotics for recovery.


### Fine-tuning

In [40]:
response_text_rag1_ft = generate_rag_response(query1,max_tokens=256, top_k=20) ## moving top_k to 20 to make it deterministic
print("Response with RAG FT:")
print(response_text_rag1_ft)

Response with RAG FT:
1. Provide supplemental oxygen and secure airway if necessary through intubation and mechanical ventilation.
2. Insert two large IV catheters into separate peripheral veins or use a central venous line or intraosseous needle for access.
3. Initiate aggressive fluid resuscitation.
4. Administer antibiotics promptly based on culture results and suspected organism, and consider surgical intervention for infected or necrotic tissues.


In [42]:
response_text_rag2_ft = generate_rag_response(query2,max_tokens= 256, top_k=20) ## moving top_k to 20 to make it deterministic
print("Response with RAG FT:")
print(response_text_rag2_ft)

Response with RAG FT:
1. Common symptoms include epigastric or periumbilical pain followed by brief nausea, vomiting, anorexia, and a shift in pain to the right lower quadrant after a few hours. Pain increases with cough and motion. Direct and rebound tenderness are located at McBurney's point.
2. Appendicitis cannot be cured via medicine; it requires surgical removal.
3. The standard surgical procedure for appendicitis is an appendectomy, which involves removing the inflamed appendix.
4. Before the operation, an NGT is inserted, and patients with signs of volume depletion should have urine output monitored with a catheter. IV antibiotics effective against intestinal flora are given to maintain fluid status and prevent infection.


In [43]:
response_text_rag3_ft = generate_rag_response(query3,max_tokens=256,top_k=20) ## moving top_k to 20 to make it deterministic
print("Response with RAG FT:")
print(response_text_rag3_ft)

Response with RAG FT:
1. Sudden patchy hair loss can be caused by various conditions such as fungal infections (tinea capitis), bacterial infections, or autoimmune disorders like alopecia areata.
2. For fungal infections, topical or oral antifungals are effective treatments.
3. For alopecia areata, treatment options include topical corticosteroids, minoxidil, anthralin, immunotherapy (diphencyprone or squaric acid dibutylester), or psoralen plus ultraviolet A (PUVA).
4. In cases of trichotillomania, behavior modification, clomipramine, or selective serotonin reuptake inhibitors (SSRIs) may be beneficial.

Note: The provided context does not mention any surgical options for treating sudden patchy hair loss specifically.


In [44]:
response_text_rag4_ft = generate_rag_response(query4,max_tokens=256,top_k=20) ## moving top_k to 20 to make it deterministic
print("Response with RAG FT:")
print(response_text_rag4_ft)

Response with RAG FT:
1. Ensure reliable airway and maintain adequate ventilation, oxygenation, and blood pressure.
2. Surgery may be needed for monitoring intracranial pressure, decompression, or hematoma removal.
3. In the first few days, maintain adequate brain perfusion and oxygenation to prevent complications.
4. Subsequently, rehabilitation is often required to improve functional recovery.


In [45]:
response_text_rag5_ft = generate_rag_response(query5,max_tokens=256,top_k=20) ## moving top_k to 20 to make it deterministic
print("Response with RAG FT:")
print(response_text_rag5_ft)

Response with RAG FT:
1. Treat life-threatening injuries first, such as excessive bleeding or signs of shock.
2. Immobilize the injured leg using a splint to prevent further damage.
3. For lower-limb prosthesis users, maintain body alignment and adjust fit as needed to prevent skin breakdown.
4. Seek medical attention for hip fracture surgery and rehabilitation, including walking aids or orthotics for recovery.


## Output Evaluation

In [46]:
groundedness_rater_system_message = (
    "You are a medical expert evaluating the groundedness of an AI-generated medical answer. "
    "Your task is to assess how well the answer is strictly supported by the provided context from the Merck Manual PDF. "
    "Rate the answer based on these criteria:\n"
    "- 'Fully Grounded': All medical information comes directly from the provided context, with accurate facts and procedures\n"
    "- 'Partially Grounded': Some information is supported by the context, but contains unsupported details or generalizations\n"
    "- 'Not Grounded': Information is not found in the context or contradicts the provided medical manual content\n\n"
    "IMPORTANT: Medical answers must be factually accurate and traceable to the source material. "
    "Any medical advice, treatments, or procedures mentioned must be explicitly found in the Merck Manual context. "
    "Respond with your rating and provide a detailed justification in 2-3 sentences explaining your assessment.\n\n"
    "Sample answer format:\n"
    "Rating: Fully Grounded\n"
    "Justification: The answer accurately reflects the sepsis management protocols outlined in the Merck Manual context, "
    "including specific treatment steps and medication recommendations that are directly referenced in the provided text."
)

relevance_rater_system_message = (
    "You are a medical expert evaluating the relevance of an AI-generated answer. "
    "Given the user's question and the answer, rate how relevant the answer is to the question. "
    "Respond with one of: 'Highly Relevant', 'Somewhat Relevant', or 'Not Relevant'. "
    "And in the next line, briefly justify your rating in 1-2 sentences."
    "Sample answer: Rating: Higly Relevant \
    Justification:  The answer directly addresses the user's question, providing specific and actionable information."
)

user_message_template = (
    "Context:\n{context}\n\n"
    "Answer:\n{answer}\n\n"
    "Question:\n{question}\n\n"
    "Please provide your rating and justification."
)

In [47]:
def generate_ground_relevance_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.get_relevant_documents(query=user_input,k=3)
    pages = [str(doc.metadata.get('page', 'N/A')) for doc in relevant_document_chunks]
    pages_str = ", ".join(pages)

    context_list = [d.page_content for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)

    # Combine user_prompt and system_message to create the prompt
    prompt = f"""[INST]{qna_system_message}\n
                {'user'}: {qna_user_message_template.format(context=context_for_query, question=user_input)}
                [/INST]"""

    response = lcpp_llm(
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    answer =  response["choices"][0]["text"]

    # Combine user_prompt and system_message to create the prompt
    groundedness_prompt = f"""[INST]{groundedness_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    # Combine user_prompt and system_message to create the prompt
    relevance_prompt = f"""[INST]{relevance_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    response_1 =  lcpp_llm(
            prompt=groundedness_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    response_2 = lcpp_llm(
            prompt=relevance_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    return response_1['choices'][0]['text'],response_2['choices'][0]['text'],pages_str

In [48]:
results = []
for q, ans in [
    (query1, response_text_rag1_ft),
    (query2, response_text_rag2_ft),
    (query3, response_text_rag3_ft),
    (query4, response_text_rag4_ft),
    (query5, response_text_rag5_ft),
]:
    ground, rel, page_str = generate_ground_relevance_response(user_input=q, max_tokens=370, top_k=20)
    results.append({
        "Query": q,
        "Answer": ans,
        "Groundedness": ground.strip(),
        "Relevance": rel.strip(),
        "Pages": page_str if "Not Grounded" not in ground else None
    })

eval_df = pd.DataFrame(results)
eval_df

Unnamed: 0,Query,Answer,Groundedness,Relevance,Pages
0,What is the protocol for managing sepsis in a ...,1. Provide supplemental oxygen and secure airw...,Rating: Fully Grounded\nJustification: The ans...,Rating: Highly Relevant\nJustification: The an...,"2400, 2448, 2453"
1,"What are the common symptoms for appendicitis,...",1. Common symptoms include epigastric or periu...,Rating: Fully Grounded\nJustification: The ans...,Rating: Highly Relevant\nJustification: The an...,"173, 172, 3567"
2,What are the effective treatments or solutions...,1. Sudden patchy hair loss can be caused by va...,Rating: Fully Grounded\nJustification: The ans...,Rating: Highly Relevant\nJustification: The an...,"858, 855, 857"
3,What treatments are recommended for a person w...,1. Ensure reliable airway and maintain adequat...,Rating: Fully Grounded\nJustification: The ans...,Rating: Highly Relevant\nJustification: The an...,"3647, 3409, 3404"
4,What are the necessary precautions and treatme...,"1. Treat life-threatening injuries first, such...",Rating: Fully Grounded\nJustification: The ans...,Rating: Highly Relevant\nJustification: The an...,"3390, 3646, 1023"


In [49]:
from IPython.display import display, Markdown

# Create a more readable summary of the evaluation results, including pages

def format_eval_content(df):
    content = ""
    for idx, row in df.iterrows():
        content += f"**Query {idx+1}:** {row['Query']}\n\n"
        content += f"**Answer:**\n{row['Answer']}\n\n"
        content += f"**Groundedness:** {row['Groundedness']}\n\n"
        if 'Pages' in row and pd.notnull(row['Pages']):
            content += f"**Pages:** {row['Pages']}\n\n"
        content += f"**Relevance:** {row['Relevance']}\n\n"
        content += "---\n"
    return content

readable_eval_content = format_eval_content(eval_df)
display(Markdown(readable_eval_content))

**Query 1:** What is the protocol for managing sepsis in a critical care unit?

**Answer:**
1. Provide supplemental oxygen and secure airway if necessary through intubation and mechanical ventilation.
2. Insert two large IV catheters into separate peripheral veins or use a central venous line or intraosseous needle for access.
3. Initiate aggressive fluid resuscitation.
4. Administer antibiotics promptly based on culture results and suspected organism, and consider surgical intervention for infected or necrotic tissues.

**Groundedness:** Rating: Fully Grounded
Justification: The answer accurately reflects the sepsis management protocols outlined in the Merck Manual context, including providing supplemental oxygen, checking airway and ventilation, inserting large IV catheters for fluid resuscitation, initiating aggressive fluid resuscitation, antibiotics, and considering surgical intervention if necessary. These steps are directly referenced in the provided text.

**Pages:** 2400, 2448, 2453

**Relevance:** Rating: Highly Relevant
Justification: The answer directly addresses the user's question by providing specific steps for managing sepsis in a critical care unit, including supplemental oxygen, checking airway and ventilation, inserting large IV catheters for fluid resuscitation, aggressive fluid resuscitation, antibiotics, surgical intervention if necessary, and supportive care.

---
**Query 2:** What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

**Answer:**
1. Common symptoms include epigastric or periumbilical pain followed by brief nausea, vomiting, anorexia, and a shift in pain to the right lower quadrant after a few hours. Pain increases with cough and motion. Direct and rebound tenderness are located at McBurney's point.
2. Appendicitis cannot be cured via medicine; it requires surgical removal.
3. The standard surgical procedure for appendicitis is an appendectomy, which involves removing the inflamed appendix.
4. Before the operation, an NGT is inserted, and patients with signs of volume depletion should have urine output monitored with a catheter. IV antibiotics effective against intestinal flora are given to maintain fluid status and prevent infection.

**Groundedness:** Rating: Fully Grounded
Justification: The answer accurately reflects the symptoms of appendicitis as described in the Merck Manual context, including epigastric or periumbilical pain, nausea, vomiting, anorexia, and right lower quadrant pain that increases with cough and motion. The answer also correctly states that appendicitis cannot be cured via medicine and requires surgical removal, specifically an appendectomy, which is also mentioned in the Merck Manual context. Before surgery, patients may receive an NGT, urinary catheter, IV fluids and electrolytes, and antibiotics effective against intestinal flora as stated in the answer, which is also supported by the Merck Manual context.

**Pages:** 173, 172, 3567

**Relevance:** Rating: Highly Relevant
Justification: The answer directly addresses the user's question by providing a clear list of common symptoms for appendicitis and stating that it cannot be cured via medicine, but rather requires surgical removal through an appendectomy. The answer also mentions the pre-surgery preparations such as NGT, urinary catheter, IV fluids and electrolytes, and antibiotics.

---
**Query 3:** What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

**Answer:**
1. Sudden patchy hair loss can be caused by various conditions such as fungal infections (tinea capitis), bacterial infections, or autoimmune disorders like alopecia areata.
2. For fungal infections, topical or oral antifungals are effective treatments.
3. For alopecia areata, treatment options include topical corticosteroids, minoxidil, anthralin, immunotherapy (diphencyprone or squaric acid dibutylester), or psoralen plus ultraviolet A (PUVA).
4. In cases of trichotillomania, behavior modification, clomipramine, or selective serotonin reuptake inhibitors (SSRIs) may be beneficial.

Note: The provided context does not mention any surgical options for treating sudden patchy hair loss specifically.

**Groundedness:** Rating: Fully Grounded
Justification: The answer accurately reflects the information provided in the Merck Manual context regarding the causes (alopecia areata, tinea capitis, trichotillomania) and treatments (topical corticosteroids, minoxidil, anthralin, immunotherapy, systemic corticosteroids, antifungals, behavior modification techniques, clomipramine, and SSRIs) for sudden patchy hair loss. The answer also includes additional details about alopecia areata being an autoimmune disorder that may progress to permanent baldness if left untreated, and the use of fungal and bacterial cultures and immunofluorescence studies for diagnosis.

**Pages:** 858, 855, 857

**Relevance:** Rating: Highly Relevant
Justification: The answer directly addresses the user's question by providing information about the potential causes of sudden patchy hair loss (alopecia areata, tinea capitis, trichotillomania) and effective treatments for each condition.

---
**Query 4:** What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

**Answer:**
1. Ensure reliable airway and maintain adequate ventilation, oxygenation, and blood pressure.
2. Surgery may be needed for monitoring intracranial pressure, decompression, or hematoma removal.
3. In the first few days, maintain adequate brain perfusion and oxygenation to prevent complications.
4. Subsequently, rehabilitation is often required to improve functional recovery.

**Groundedness:** Rating: Fully Grounded
Justification: The answer accurately reflects the treatment recommendations for traumatic brain injury (TBI) as outlined in the Merck Manual context. The answer includes specific treatments such as ensuring a reliable airway, maintaining adequate ventilation, oxygenation, and blood pressure, and the potential need for surgery to place monitors, decompress the brain, or remove intracranial hematomas. Additionally, the importance of rehabilitation for functional recovery and preventing secondary disabilities is mentioned, which aligns with the information provided in the context on pages 3231 and 3400.

**Pages:** 3647, 3409, 3404

**Relevance:** Rating: Highly Relevant
Justification: The answer directly addresses the user's question by providing specific treatments recommended for a person with a traumatic brain injury, including ensuring a reliable airway, maintaining adequate ventilation, oxygenation, and blood pressure, surgery if necessary, and rehabilitation.

---
**Query 5:** What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

**Answer:**
1. Treat life-threatening injuries first, such as excessive bleeding or signs of shock.
2. Immobilize the injured leg using a splint to prevent further damage.
3. For lower-limb prosthesis users, maintain body alignment and adjust fit as needed to prevent skin breakdown.
4. Seek medical attention for hip fracture surgery and rehabilitation, including walking aids or orthotics for recovery.

**Groundedness:** Rating: Fully Grounded
Justification: The answer accurately reflects the recommended precautions and treatment steps for a person with a fractured leg as outlined in the Merck Manual context, including prioritizing life-threatening injuries, immobilization using a splint, seeking medical attention for definitive treatment, applying RICE for comfort and swelling reduction, and maintaining proper body alignment during recovery for individuals with lower-limb prostheses. The answer also emphasizes the importance of following healthcare practitioner instructions during hip fracture surgery rehabilitation. All information is directly referenced in the provided text.

**Pages:** 3390, 3646, 1023

**Relevance:** Rating: Highly Relevant
Justification: The answer directly addresses the user's question by providing specific and actionable information on the necessary precautions and treatment steps for a person who has fractured their leg, as well as considerations for their care and recovery.

---


## Actionable Insights and Business Recommendations


1. **Improved Contextual Responses**:
    - RAG-based approach enhances relevance and groundedness using trusted sources like the Merck Manual.
    - Fine-tuned parameters (`temperature=0`, `top_p=0.95`, `top_k=20`) ensure consistent outputs.

2. **Efficient Data Handling**:
    - Chunking with RecursiveCharacterTextSplitter preserves context, creating 18,088 chunks from 4,114 pages.
    - Chroma vector database enables fast, domain-specific retrieval.

3. **Evaluation Metrics**:
    - Groundedness: 100% Fully Grounded.
    - Relevance: 100% Highly Relevant.


---

### Business Recommendations:
1. **Healthcare Assistant Product**:
    - Develop an AI-powered assistant for clinicians and patients.

2. **Expand Knowledge Base**:
    - Add sources like PubMed to cover broader medical domains.

3. **Regulatory Compliance**:
    - Ensure HIPAA/GDPR compliance for data security.

4. **Specialized Solutions**:
    - Tailor systems for specialties like neurology or cardiology.

5. **Continuous Updates**:
    - Regularly update the vector database with new medical research.

<font size=6 color='blue'>Power Ahead</font>
___