## Problem Statement

### Business Context

The healthcare industry is rapidly evolving, with professionals facing increasing challenges in managing vast volumes of medical data while delivering accurate and timely diagnoses. The need for quick access to comprehensive, reliable, and up-to-date medical knowledge is critical for improving patient outcomes and ensuring informed decision-making in a fast-paced environment.

Healthcare professionals often encounter information overload, struggling to sift through extensive research and data to create accurate diagnoses and treatment plans. This challenge is amplified by the need for efficiency, particularly in emergencies, where time-sensitive decisions are vital. Furthermore, access to trusted, current medical information from renowned manuals and research papers is essential for maintaining high standards of care.

To address these challenges, healthcare centers can focus on integrating systems that streamline access to medical knowledge, provide tools to support quick decision-making, and enhance efficiency. Leveraging centralized knowledge platforms and ensuring healthcare providers have continuous access to reliable resources can significantly improve patient care and operational effectiveness.

**Common Questions to Answer**

**1. Diagnostic Assistance**: "What are the common symptoms and treatments for pulmonary embolism?"

**2. Drug Information**: "Can you provide the trade names of medications used for treating hypertension?"

**3. Treatment Plans**: "What are the first-line options and alternatives for managing rheumatoid arthritis?"

**4. Specialty Knowledge**: "What are the diagnostic steps for suspected endocrine disorders?"

**5. Critical Care Protocols**: "What is the protocol for managing sepsis in a critical care unit?"

### Objective

As an AI specialist, your task is to develop a RAG-based AI solution using renowned medical manuals to address healthcare challenges. The objective is to **understand** issues like information overload, **apply** AI techniques to streamline decision-making, **analyze** its impact on diagnostics and patient outcomes, **evaluate** its potential to standardize care practices, and **create** a functional prototype demonstrating its feasibility and effectiveness.

### Data Description

The **Merck Manuals** are medical references published by the American pharmaceutical company Merck & Co., that cover a wide range of medical topics, including disorders, tests, diagnoses, and drugs. The manuals have been published since 1899, when Merck & Co. was still a subsidiary of the German company Merck.

The manual is provided as a PDF with over 4,000 pages divided into 23 sections.

## Installing and Importing Necessary Libraries and Dependencies

In [1]:
# Installation for GPU llama-cpp-python
# uncomment and run the following code in case GPU is being used
!CMAKE_ARGS="-DLLAMA_CUBLAS=OFF" FORCE_CMAKE=1 #pip install llama-cpp-python --force-reinstall --no-cache-dir -q

# Installation for CPU llama-cpp-python
# uncomment and run the following code in case GPU is not being used
# !CMAKE_ARGS="-DLLAMA_CUBLAS=off" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q

In [2]:
# For installing the libraries & downloading models from HF Hub
#!pip install huggingface_hub==0.23.2 pandas==1.5.3 tiktoken==0.6.0 pymupdf==1.25.1 langchain==0.1.1 langchain-community==0.0.13 chromadb==0.4.22 sentence-transformers==2.3.1 numpy==1.25.2 -q

In [3]:
#Libraries for processing dataframes,text
import json,os
import tiktoken
import pandas as pd

#Libraries for Loading Data, Chunking, Embedding, and Vector Databases
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

#Libraries for downloading and loading the llm
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

  from .autonotebook import tqdm as notebook_tqdm


## Question Answering using LLM

#### Downloading and Loading the model

In [4]:
model_name_or_path = "TheBloke/Mistral-7B-Instruct-v0.2-GGUF"
model_basename = "mistral-7b-instruct-v0.2.Q6_K.gguf"

In [5]:
import sys
print(sys.version)

3.12.8 (v3.12.8:2dc476bcb91, Dec  3 2024, 14:43:19) [Clang 13.0.0 (clang-1300.0.29.30)]


In [6]:
# Using hf_hub_download to download a model from the Hugging Face model hub
# The repo_id parameter specifies the model name or path in the Hugging Face repository
# The filename parameter specifies the name of the file to download
model_path = hf_hub_download(
    repo_id=model_name_or_path,
    filename=model_basename
)

In [7]:
lcpp_llm = Llama(
    model_path=model_path,
    n_threads=10,  # CPU cores
    n_batch=512,  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.
    n_gpu_layers=5,  # uncomment and change this value based on GPU VRAM pool.
    n_ctx=4096,  # Context window
    verbose=False,  # Reduce verbose output
)

llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from /Users/jithinravi/.cache/huggingface/hub/models--TheBloke--Mistral-7B-Instruct-v0.2-GGUF/snapshots/3a6fbf4a41a1d52e415a4958cde6856d34b2db93/mistral-7b-instruct-v0.2.Q6_K.gguf (version unknown)
llama_model_loader: - tensor    0:                token_embd.weight q6_K     [  4096, 32000,     1,     1 ]
llama_model_loader: - tensor    1:              blk.0.attn_q.weight q6_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    2:              blk.0.attn_k.weight q6_K     [  4096,  1024,     1,     1 ]
llama_model_loader: - tensor    3:              blk.0.attn_v.weight q6_K     [  4096,  1024,     1,     1 ]
llama_model_loader: - tensor    4:         blk.0.attn_output.weight q6_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    5:            blk.0.ffn_gate.weight q6_K     [  4096, 14336,     1,     1 ]
llama_model_loader: - tensor    6:              blk.0.ffn_up.weight q6_K     

#### Response

In [10]:
def response(query,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    model_output = lcpp_llm (
      prompt=query,
      max_tokens=max_tokens,
      temperature=temperature,
      top_p=top_p,
      top_k=top_k
    )
    #print("Model Output:", model_output)
    return model_output['choices'][0]['text']

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [11]:
query1 = "What is the protocol for managing sepsis in a critical care unit?"
response_text1 = response(query1)
print("Response:", response_text1)

Response: 

Sepsis is a life-threatening condition that can arise from an infection, and it requires prompt recognition and aggressive management in a critical care unit. The following are general steps for managing sepsis in a critical care unit:

1. Early recognition and suspicion: Septic patients may present with non-specific symptoms such as fever, chills, tachycardia, tachypnea, altered mental status, or lactic acidosis. It is essential to have a high index of suspicion for sepsis in patients with suspected or confirmed infection.
2. Early goal-directed


### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [12]:
query2 = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
response2_text = response(query2)
print("Response:", response2_text)

Response: 

Appendicitis is a medical condition characterized by inflammation of the appendix, a small tube-shaped organ located in the lower right side of the abdomen. The symptoms of appendicitis can vary from person to person, but some common signs include:

1. Abdominal pain: The pain is typically located in the lower right quadrant of the abdomen and may start as a mild discomfort that gradually worsens over time. The pain may be constant or intermittent and may be aggravated by movement, deep breathing, or coughing.
2. Loss of appetite


### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [13]:
query3 = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
response3_text = response(query3)
print("Response:", response3_text)

Response: 

Sudden patchy hair loss, also known as alopecia areata, is a common autoimmune disorder that affects the hair follicles. It can result in round or oval bald patches on the scalp, but it can also occur on other parts of the body such as the beard area, eyebrows, and eyelashes.

The exact cause of alopecia areata is not known, but it's believed to be related to a problem with the immune system. Some possible triggers for this condition include stress, genetics, viral infections, and certain medications.


### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [14]:
query4 = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
response4_text = response(query4)
print("Response:", response4_text)

Response: 

A person who has sustained a physical injury to brain tissue, also known as a traumatic brain injury (TBI), may require various treatments depending on the severity and location of the injury. Here are some common treatments recommended for TBIs:

1. Emergency care: The first priority is to ensure the person's airway is clear, they are breathing, and their heart is beating normally. In severe cases, emergency surgery may be required to remove hematomas or other obstructions.
2. Medications: Depending on the symptoms, medications may be prescribed to manage conditions such as


### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [15]:
query5 = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
response5_text = response(query5)
print("Response:", response5_text)

Response: 

First and foremost, it is essential to ensure the safety of the injured person and any companions. If possible, try to keep the person as comfortable as possible while preventing further injury or complications. Here are some general steps to follow:

1. Assess the severity of the injury: Check for signs of open wounds, swelling, deformity, numbness, or inability to move the leg. If the fracture is open (compound), do not attempt to move the person unless it is necessary for their safety. In this case, call for emergency medical assistance immediately.
2.


## Summary - First Approach - Plain LLM response

- **Initial Approach:**  
    The plain LLM model is used for question answering without any prompt engineering or parameter tuning.

- **Default Model Behavior:**  
    The model responds to queries using its pre-trained knowledge, without any additional guidance or customization.

- **Limitations:**  
    - While this method is straightforward and quick to implement, it may not yield the most accurate or contextually relevant answers—especially for specialized domains like medicine. Without tuning parameters (e.g., `temperature`, `max_tokens`) or providing system/user prompts, the model's outputs can be generic and may lack the specificity or structure required for clinical use. 
    - We also see the answers are largely `truncated` which is not ideal.

- **Advanced Techniques:**  
    More advanced methods, such as prompt engineering or Retrieval-Augmented Generation (RAG), can significantly improve the quality and reliability of responses in such applications.

## Question Answering using LLM with Prompt Engineering

In [16]:
# Example of prompt engineering for better LLM responses

# Define a system prompt to instruct the LLM to answer as a medical expert using evidence from the Merck Manual
qna_system_message = (
    "You are a highly knowledgeable medical assistant. "
    "Be concise, accurate. If you don't know the answer, say 'I don't know'. "
    "Not more than 4 points in your response.No more than approximately 30 words in each point. "
)

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [17]:
response_text_pe1 = response(qna_system_message + " " + query1, max_tokens=256).strip()
print("Response with prompt engineering:")
print(response_text_pe1) 

Response with prompt engineering:
1. Immediate antibiotic administration based on suspected infection source and sensitivity results.
2. Aggressive fluid resuscitation to maintain adequate tissue perfusion.
3. Close monitoring of vital signs, lactate levels, and organ function.
4. Consideration of vasopressors or mechanical ventilation as needed for hemodynamic instability.


**Note:**  
We added a system prompt to guide the model's responses and used the `max_tokens` parameter to ensure the responses are not truncated.

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [19]:
response_text_pe2 = response(qna_system_message + " " + query2, max_tokens=256).strip()
print("Response with prompt engineering:")
print(response_text_pe2)

Response with prompt engineering:
1. Appendicitis: Sudden onset of abdominal pain, usually localized in lower right quadrant. Nausea, vomiting, loss of appetite, fever, and constipation or diarrhea may occur.
2. No cure with medicine: Appendicitis requires surgical removal of the appendix (appendectomy) due to its potential rupture and risk of peritonitis.
3. Laparoscopic appendectomy: Commonly performed procedure, using small incisions and a laparoscope for minimally invasive surgery.
4. Open appendectomy: Traditional surgical method with a larger incision to directly access the appendix, used in cases where laparoscopy is not feasible or effective.


In [20]:
response_text_pe2 = response(qna_system_message + " " + query2, max_tokens=256, temperature=5).strip()
print("Response with prompt engineering:")
print(response_text_pe2)

Response with prompt engineering:
1. Appendicitis symptoms: sudden pain in lower right abdomen, loss of appetite, nausea, vomiting
2. Appendicitis diagnosis may include pelvic exam, blood test for white blood cell count elevation, imaging studies like CT scan
3. Appendicitis can't be cured with medicine - it requires appendectomy, a surgical procedure to remove the appendix
4. Appendix is usually removed through a small incision using laparoscopic surgery or an open procedure through a larger incision depending on severity.


**Note:**  
Increasing the temperature parameter has made the model's responses more non-deterministic. This is evident in the following:  

- Some points in the response are overly brief, sometimes consisting of just a single word. Sometimes not answering the question at all 

This behavior highlights the trade-off between creativity and consistency when adjusting the temperature setting.

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [21]:
response_text_pe3 = response(qna_system_message + " " + query3, max_tokens=256).strip()
print("Response with prompt engineering:")
print(response_text_pe3)

Response with prompt engineering:
1. Minoxidil: Over-the-counter topical treatment that stimulates hair growth and slows down hair loss in affected areas.
2. Finasteride: Prescription medication that blocks DHT production, reducing hair loss and promoting regrowth.
3. Corticosteroids: Topical or oral treatments to reduce inflammation and suppress the immune system, helping with alopecia areata.
4. Hair transplant: Surgical procedure where healthy hair follicles are moved from one area of the scalp to another to replace lost hair. Possible causes: stress, autoimmune disorders, hormonal imbalances, nutritional deficiencies.


In [22]:
response_text_pe3 = response(qna_system_message + " " + query3, max_tokens=256, temperature= 5, top_p=0.4).strip()
print("Response with prompt engineering:")
print(response_text_pe3)

Response with prompt engineering:
1. Minoxidil: A topical medication applied directly to the affected area, promoting hair growth and reducing hair loss.
2. Finasteride: An oral medication that inhibits the conversion of testosterone to dihydrotestosterone, which can cause hair loss.
3. Corticosteroids: Topical or injected steroids can help reduce inflammation and suppress the immune system response, which may be contributing to the hair loss.
4. Identify causes: Possible causes include alopecia areata, nutritional deficiencies, stress, or autoimmune disorders; proper diagnosis is essential for effective treatment.


**Note:**
- When `top_p` is set low (e.g., 0.4), the model only considers the most probable tokens, often resulting in shorter, less detailed, or less varied answers. This can cause the model to omit less common details—like "possible causes"—and focus only on the most typical treatments or facts. 


### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [23]:
response_text_pe4 = response(qna_system_message + " " + query4, max_tokens=256 ).strip()
print("Response with prompt engineering:")
print(response_text_pe4)

Response with prompt engineering:
1. Rehabilitation therapy: Occupational, speech, and physical therapy help improve function and compensate for deficits.
2. Medications: Anti-seizure drugs, pain relievers, and psychotropic medications may be prescribed to manage symptoms.
3. Surgery: Depending on the injury's location and severity, surgery might be necessary to remove hematomas or repair damaged tissue.
4. Assistive devices: Devices like wheelchairs, walkers, or communication aids can help individuals with mobility or communication impairments.


In [24]:
response_text_pe4 = response(qna_system_message + " " + query4, max_tokens=256, top_k=1).strip()
print("Response with prompt engineering:")
print(response_text_pe4)

Response with prompt engineering:
1. Rehabilitation therapy: Occupational, speech, and physical therapy help improve function and compensate for deficits.
2. Medications: Anti-seizure drugs, pain relievers, and psychotropic medications may be prescribed to manage symptoms.
3. Surgery: Depending on the injury's location and severity, surgery might be necessary to remove hematomas or repair damaged tissue.
4. Assistive devices: Devices like wheelchairs, walkers, or communication aids can help individuals with mobility or communication impairments.


**Note**
- In our case the model is generating deterministic responses, changing `top_k` may not have any effect. (mostly because we set temp to zero)

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [25]:
response_text_pe5 = response(qna_system_message + " " + query5, max_tokens=256 ).strip()
print("Response with prompt engineering:")
print(response_text_pe5)

Response with prompt engineering:
1. Immediate first aid: Apply a sterile dressing to control bleeding, immobilize the affected leg with a splint or sling.
2. Seek medical attention: Transport the person to the nearest hospital for proper diagnosis and treatment of the fracture.
3. Pain management: Administer over-the-counter pain relievers or prescribed medication as directed by a healthcare professional.
4. Rest, ice, compression, elevation (RICE): Follow this protocol to reduce swelling and promote healing after the fracture has been set and immobilized.


## Summary - Second Approach - After prompt Engineering and fine tuning
The chosen parameters for a deterministic model configuration are:

- **max_tokens=256**: Limits the response length to a maximum of 256 tokens, ensuring concise and controlled outputs.
- **temperature=0**: Eliminates randomness, making the models responses deterministic and consistent.
- **top_p=0.95**: Enables nucleus sampling, considering only the top 95% of the probability mass for token selection, ensuring relevance while maintaining some flexibility.
- **top_k=50**: Restricts token selection to the top 50 most probable tokens, further refining the response quality.

This configuration is ideal for generating precise, repeatable, and contextually accurate outputs, especially in critical applications like medical question answering.

## Data Preparation for RAG

### Loading the Data

In [26]:
#load the pdf file
loader = PyMuPDFLoader("medical_diagnosis_manual.pdf")
documents = loader.load()

### Data Overview

#### Checking the first 2 pages

In [27]:
# Display the content of the first 2 pages
for i, doc in enumerate(documents[:2]):
    print(f"Page {i + 1} Content:\n{doc.page_content}\n{'-' * 80}")

Page 1 Content:
hovels_bounces.0v@icloud.com
Y74OHC3EQZ
for personal use by hovels_bounces.0v@
shing the contents in part or full is liable 

--------------------------------------------------------------------------------
Page 2 Content:
hovels_bounces.0v@icloud.com
Y74OHC3EQZ
This file is meant for personal use by hovels_bounces.0v@icloud.com only.
Sharing or publishing the contents in part or full is liable for legal action.

--------------------------------------------------------------------------------


#### Checking the number of pages

In [28]:
# Check the number of pages in the loaded PDF document
num_pages = len(documents)
print(f"Number of pages in the PDF: {num_pages}")

Number of pages in the PDF: 4114


### Data Chunking

In [29]:
# Chunk the documents for embedding using RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,      # Number of characters per chunk
    chunk_overlap=200     # Overlap between chunks to preserve context
)

# Split all documents into chunks
chunks = text_splitter.split_documents(documents)

print(f"Total number of chunks created: {len(chunks)}")
print("Sample chunk:\n", chunks[0].page_content[:500])


Total number of chunks created: 18088
Sample chunk:
 hovels_bounces.0v@icloud.com
Y74OHC3EQZ
for personal use by hovels_bounces.0v@
shing the contents in part or full is liable


In [30]:
print("Sample chunk:\n", chunks[-1].page_content[-500:])

Sample chunk:
 201, 910
mastocytosis vs 1125
Menetrier's disease vs 132
peptic ulcer disease vs 134
Zolmitriptan 1721
Zolpidem 1709, 3103
Zonisamide 1701
Zoonotic diseases, cutaneous 718
Zoophobia 1498
Zoster (see Herpes zoster virus infection)
Zygomycosis 1332
The Merck Manual of Diagnosis & Therapy, 19th Edition
Z
4104
hovels_bounces.0v@icloud.com
Y74OHC3EQZ
This file is meant for personal use by hovels_bounces.0v@icloud.com only.
Sharing or publishing the contents in part or full is liable for legal action.


### Embedding

In [31]:
def embed_and_create_chroma(chunks, persist_directory="chroma_db", embedding_model_name="all-MiniLM-L6-v2"):
    """
    Embeds the provided chunks and creates a persistent Chroma vector store.

    Args:
        chunks (list): List of langchain Document objects to embed.
        persist_directory (str): Directory to persist the Chroma database.
        embedding_model_name (str): Name of the sentence-transformer model to use for embeddings.

    Returns:
        Chroma: A persistent Chroma vector store instance.
    """
    # Create embedding function
    embedding_function = SentenceTransformerEmbeddings(model_name=embedding_model_name)

    # Create Chroma vector store and persist it
    vectordb = Chroma.from_documents(
        documents=chunks,
        embedding=embedding_function,
        persist_directory=persist_directory
    )
    vectordb.persist()
    return vectordb

### Vector Database

In [None]:
# Create and persist the Chroma vector store
persist_directory = "chroma_db"  # Directory to store the Chroma database
embedding_model_name = "all-MiniLM-L6-v2"  # Name of the embedding model

vectordb = embed_and_create_chroma(chunks, persist_directory=persist_directory, embedding_model_name=embedding_model_name)

print("Chroma vector store created and persisted successfully.")

Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given
Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given


In [None]:
# Check the number of documents (chunks) stored in the Chroma vector database  vs Original chunks
print(f"Total number of chunks created: {len(chunks)}")

Total number of chunks created: 18088


In [None]:
# Check the number of documents (chunks) stored in the Chroma vector database
print("Number of chunks in the vector database:", vectordb._collection.count())

# Optionally, check a few sample chunks stored in the vector database
sample_docs = vectordb.get(include=['documents'], ids=[str(i) for i in range(5)])
for idx, doc in enumerate(sample_docs['documents']):
    print(f"Chunk {idx}:\n{doc[:500]}\n{'-'*60}")

Number of chunks in the vector database: 18088


### Retriever

In [None]:
# Create a retriever from the Chroma vector database
retriever = vectordb.as_retriever(  search_type='similarity',search_kwargs={"k": 3})

In [None]:
retriever.get_relevant_documents("How to treat a patient with sepsis in a critical care unit?")

Failed to send telemetry event CollectionQueryEvent: capture() takes 1 positional argument but 3 were given


[Document(page_content='16 - Critical Care Medicine\nChapter 222. Approach to the Critically Ill Patient\nIntroduction\nCritical care medicine specializes in caring for the most seriously ill patients. These patients are best\ntreated in an ICU staffed by experienced personnel. Some hospitals maintain separate units for special\npopulations (eg, cardiac, surgical, neurologic, pediatric, or neonatal patients). ICUs have a high\nnurse:patient ratio to provide the necessary high intensity of service, including treatment and monitoring\nof physiologic parameters.\nSupportive care for the ICU patient includes provision of adequate nutrition (see p. 21) and prevention of\ninfection, stress ulcers and gastritis (see p. 131), and pulmonary embolism (see p. 1920). Because 15 to\n25% of patients admitted to ICUs die there, physicians should know how to minimize suffering and help\ndying patients maintain dignity (see p. 3480).\nPatient Monitoring and Testing', metadata={'author': '', 'creationDa

### System and User Prompt Template

In [None]:
# System and user prompt templates for RAG-based medical assistant

qna_system_message = (
    "You are a highly knowledgeable medical assistant. "
    "Use only the provided context from the Merck Manual to answer the user's question. "
    "Be concise, accurate, and cite relevant sections if possible. "
    "If the answer is not present in the context, say 'I don't know'. "
    "Limit your response to a maximum of 4 bullet points, each not exceeding 30 words."
)

qna_user_message_template = (
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)

### Response Function

In [None]:
def generate_rag_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.get_relevant_documents(query=user_input,k=k)
    context_list = [d.page_content for d in relevant_document_chunks]

    # Combine document chunks into a single context
    context_for_query = ". ".join(context_list)

    user_message = qna_user_message_template.replace('{context}', context_for_query)
    user_message = user_message.replace('{question}', user_input)

    prompt = qna_system_message + '\n' + user_message

    #print("Prompt for LLM:\n", prompt)

    # Generate the response
    try:
        response = lcpp_llm(
                  prompt=prompt,
                  max_tokens=max_tokens,
                  temperature=temperature,
                  top_p=top_p,
                  top_k=top_k
                  )

        # Extract and print the model's response
        response = response['choices'][0]['text'].strip()
    except Exception as e:
        response = f'Sorry, I encountered the following error: \n {e}'

    return response

## Question Answering using RAG

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
response_text_rag1 = generate_rag_response(query1,k=3,max_tokens=256,temperature=0,top_p=0.95,top_k=50)
print("Response with RAG:")
print(response_text_rag1)

Llama.generate: prefix-match hit


Response with RAG:
1. Provide supplemental oxygen and secure airway if necessary through intubation and mechanical ventilation.
2. Insert two large IV catheters into separate peripheral veins or use a central venous line or intraosseous needle for access.
3. Initiate aggressive fluid resuscitation.
4. Administer antibiotics promptly based on culture results and suspected organism, and consider surgical intervention for infected or necrotic tissues.



llama_print_timings:        load time =   291.60 ms
llama_print_timings:      sample time =    98.57 ms /   102 runs   (    0.97 ms per token,  1034.83 tokens per second)
llama_print_timings: prompt eval time =  8641.84 ms /   828 tokens (   10.44 ms per token,    95.81 tokens per second)
llama_print_timings:        eval time =  6523.88 ms /   101 runs   (   64.59 ms per token,    15.48 tokens per second)
llama_print_timings:       total time = 15497.53 ms


### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [None]:
response_text_rag2 = generate_rag_response(query2,k=3,max_tokens=256,temperature=0,top_p=0.95,top_k=50)
print("Response with RAG:")
print(response_text_rag2)

Llama.generate: prefix-match hit


Response with RAG:
1. Common symptoms include epigastric or periumbilical pain followed by brief nausea, vomiting, anorexia, and a shift in pain to the right lower quadrant after a few hours. Pain increases with cough and motion. Direct and rebound tenderness are located at McBurney's point.
2. Appendicitis cannot be cured via medicine; it requires surgical removal.
3. The standard surgical procedure for appendicitis is an appendectomy, which involves removing the inflamed appendix.
4. Before the operation, an NGT is inserted, and patients with signs of volume depletion should have urine output monitored with a catheter. IV antibiotics effective against intestinal flora are given to maintain fluid status and prevent infection.



llama_print_timings:        load time =   291.60 ms
llama_print_timings:      sample time =   110.27 ms /   174 runs   (    0.63 ms per token,  1577.93 tokens per second)
llama_print_timings: prompt eval time =  5597.57 ms /   796 tokens (    7.03 ms per token,   142.20 tokens per second)
llama_print_timings:        eval time = 10997.95 ms /   173 runs   (   63.57 ms per token,    15.73 tokens per second)
llama_print_timings:       total time = 16883.49 ms


### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
response_text_rag3 = generate_rag_response(query3,k=3,max_tokens=256,temperature=0,top_p=0.95,top_k=50)
print("Response with RAG:")
print(response_text_rag3)

Llama.generate: prefix-match hit


Response with RAG:
1. Sudden patchy hair loss can be caused by various conditions such as fungal infections (tinea capitis), bacterial infections, or autoimmune disorders like alopecia areata.
2. For fungal infections, topical or oral antifungals are effective treatments.
3. For alopecia areata, treatment options include topical corticosteroids, minoxidil, anthralin, immunotherapy (diphencyprone or squaric acid dibutylester), or psoralen plus ultraviolet A (PUVA).
4. In cases of trichotillomania, behavior modification, clomipramine, or selective serotonin reuptake inhibitors (SSRIs) may be beneficial.

Note: The provided context does not mention any surgical options for treating sudden patchy hair loss specifically.



llama_print_timings:        load time =   291.60 ms
llama_print_timings:      sample time =   124.86 ms /   197 runs   (    0.63 ms per token,  1577.83 tokens per second)
llama_print_timings: prompt eval time =  6098.52 ms /   855 tokens (    7.13 ms per token,   140.20 tokens per second)
llama_print_timings:        eval time = 12476.78 ms /   196 runs   (   63.66 ms per token,    15.71 tokens per second)
llama_print_timings:       total time = 18892.20 ms


### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
response_text_rag4 = generate_rag_response(query4,k=3,max_tokens=256,temperature=0,top_p=0.95,top_k=50)
print("Response with RAG:")
print(response_text_rag4)

Llama.generate: prefix-match hit


Response with RAG:
1. Ensure reliable airway and maintain adequate ventilation, oxygenation, and blood pressure.
2. Surgery may be needed for monitoring intracranial pressure, decompression, or hematoma removal.
3. In the first few days, maintain adequate brain perfusion and oxygenation to prevent complications.
4. Subsequently, rehabilitation is often required to improve functional recovery.



llama_print_timings:        load time =   291.60 ms
llama_print_timings:      sample time =    53.43 ms /    85 runs   (    0.63 ms per token,  1590.90 tokens per second)
llama_print_timings: prompt eval time =  4880.26 ms /   689 tokens (    7.08 ms per token,   141.18 tokens per second)
llama_print_timings:        eval time =  5327.22 ms /    84 runs   (   63.42 ms per token,    15.77 tokens per second)
llama_print_timings:       total time = 10340.30 ms


### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [None]:
response_text_rag5 = generate_rag_response(query5,k=3,max_tokens=256,temperature=0,top_p=0.95,top_k=50)
print("Response with RAG:")
print(response_text_rag5)

Llama.generate: prefix-match hit


Response with RAG:
1. Treat life-threatening injuries first, such as excessive bleeding or signs of shock.
2. Immobilize the injured leg using a splint to prevent further damage.
3. For lower-limb prosthesis users, maintain body alignment and adjust fit as needed to prevent skin breakdown.
4. Seek medical attention for hip fracture surgery and rehabilitation, including walking aids or orthotics for recovery.



llama_print_timings:        load time =   291.60 ms
llama_print_timings:      sample time =    58.78 ms /    93 runs   (    0.63 ms per token,  1582.28 tokens per second)
llama_print_timings: prompt eval time =  5358.41 ms /   760 tokens (    7.05 ms per token,   141.83 tokens per second)
llama_print_timings:        eval time =  5871.22 ms /    92 runs   (   63.82 ms per token,    15.67 tokens per second)
llama_print_timings:       total time = 11376.05 ms


### Fine-tuning

In [None]:
response_text_rag1_ft = generate_rag_response(query1,max_tokens=256, top_k=20) ## moving top_k to 20 to make it deterministic
print("Response with RAG FT:")
print(response_text_rag1_ft)

Llama.generate: prefix-match hit


Response with RAG FT:
1. Provide supplemental oxygen and secure airway if necessary through intubation and mechanical ventilation.
2. Insert two large IV catheters into separate peripheral veins or use a central venous line or intraosseous needle for access.
3. Initiate aggressive fluid resuscitation.
4. Administer antibiotics promptly based on culture results and suspected organism, and consider surgical intervention for infected or necrotic tissues.



llama_print_timings:        load time =   291.60 ms
llama_print_timings:      sample time =    68.85 ms /   102 runs   (    0.68 ms per token,  1481.46 tokens per second)
llama_print_timings: prompt eval time =  5382.90 ms /   751 tokens (    7.17 ms per token,   139.52 tokens per second)
llama_print_timings:        eval time =  6636.87 ms /   101 runs   (   65.71 ms per token,    15.22 tokens per second)
llama_print_timings:       total time = 12199.76 ms


In [None]:
response_text_rag2_ft = generate_rag_response(query2,max_tokens= 256, top_k=20) ## moving top_k to 20 to make it deterministic
print("Response with RAG FT:")
print(response_text_rag2_ft)

Llama.generate: prefix-match hit


Response with RAG FT:
1. Common symptoms include epigastric or periumbilical pain followed by brief nausea, vomiting, anorexia, and a shift in pain to the right lower quadrant after a few hours. Pain increases with cough and motion. Direct and rebound tenderness are located at McBurney's point.
2. Appendicitis cannot be cured via medicine; it requires surgical removal.
3. The standard surgical procedure for appendicitis is an appendectomy, which involves removing the inflamed appendix.
4. Before the operation, an NGT is inserted, and patients with signs of volume depletion should have urine output monitored with a catheter. IV antibiotics effective against intestinal flora are given to maintain fluid status and prevent infection.



llama_print_timings:        load time =   291.60 ms
llama_print_timings:      sample time =   109.58 ms /   174 runs   (    0.63 ms per token,  1587.84 tokens per second)
llama_print_timings: prompt eval time =  5678.07 ms /   796 tokens (    7.13 ms per token,   140.19 tokens per second)
llama_print_timings:        eval time = 11093.37 ms /   173 runs   (   64.12 ms per token,    15.59 tokens per second)
llama_print_timings:       total time = 17052.90 ms


In [None]:
response_text_rag3_ft = generate_rag_response(query3,max_tokens=256,top_k=20) ## moving top_k to 20 to make it deterministic
print("Response with RAG FT:")
print(response_text_rag3_ft)

Llama.generate: prefix-match hit


Response with RAG FT:
1. Sudden patchy hair loss can be caused by various conditions such as fungal infections (tinea capitis), bacterial infections, or autoimmune disorders like alopecia areata.
2. For fungal infections, topical or oral antifungals are effective treatments.
3. For alopecia areata, treatment options include topical corticosteroids, minoxidil, anthralin, immunotherapy (diphencyprone or squaric acid dibutylester), or psoralen plus ultraviolet A (PUVA).
4. In cases of trichotillomania, behavior modification, clomipramine, or selective serotonin reuptake inhibitors (SSRIs) may be beneficial.

Note: The provided context does not mention any surgical options for treating sudden patchy hair loss specifically.



llama_print_timings:        load time =   291.60 ms
llama_print_timings:      sample time =   130.15 ms /   197 runs   (    0.66 ms per token,  1513.68 tokens per second)
llama_print_timings: prompt eval time =  6120.32 ms /   855 tokens (    7.16 ms per token,   139.70 tokens per second)
llama_print_timings:        eval time = 12597.85 ms /   196 runs   (   64.27 ms per token,    15.56 tokens per second)
llama_print_timings:       total time = 19056.01 ms


In [None]:
response_text_rag4_ft = generate_rag_response(query4,max_tokens=256,top_k=20) ## moving top_k to 20 to make it deterministic
print("Response with RAG FT:")
print(response_text_rag4_ft)

Llama.generate: prefix-match hit


Response with RAG FT:
1. Ensure reliable airway and maintain adequate ventilation, oxygenation, and blood pressure.
2. Surgery may be needed for monitoring intracranial pressure, decompression, or hematoma removal.
3. In the first few days, maintain adequate brain perfusion and oxygenation to prevent complications.
4. Subsequently, rehabilitation is often required to improve functional recovery.



llama_print_timings:        load time =   291.60 ms
llama_print_timings:      sample time =    53.62 ms /    85 runs   (    0.63 ms per token,  1585.23 tokens per second)
llama_print_timings: prompt eval time =  4878.83 ms /   689 tokens (    7.08 ms per token,   141.22 tokens per second)
llama_print_timings:        eval time =  5317.33 ms /    84 runs   (   63.30 ms per token,    15.80 tokens per second)
llama_print_timings:       total time = 10331.82 ms


In [None]:
response_text_rag5_ft = generate_rag_response(query5,max_tokens=256,top_k=20) ## moving top_k to 20 to make it deterministic
print("Response with RAG FT:")
print(response_text_rag5_ft)

Llama.generate: prefix-match hit


Response with RAG FT:
1. Treat life-threatening injuries first, such as excessive bleeding or signs of shock.
2. Immobilize the injured leg using a splint to prevent further damage.
3. For lower-limb prosthesis users, maintain body alignment and adjust fit as needed to prevent skin breakdown.
4. Seek medical attention for hip fracture surgery and rehabilitation, including walking aids or orthotics for recovery.



llama_print_timings:        load time =   291.60 ms
llama_print_timings:      sample time =    62.26 ms /    93 runs   (    0.67 ms per token,  1493.83 tokens per second)
llama_print_timings: prompt eval time =  5345.83 ms /   760 tokens (    7.03 ms per token,   142.17 tokens per second)
llama_print_timings:        eval time =  6014.52 ms /    92 runs   (   65.38 ms per token,    15.30 tokens per second)
llama_print_timings:       total time = 11537.44 ms


## Output Evaluation

In [None]:
groundedness_rater_system_message = (
    "You are a medical expert evaluating the groundedness of an AI-generated answer. "
    "Given the context from the Merck Manual and the answer, rate how well the answer is supported by the provided context. "
    "Respond with one of: 'Fully Grounded', 'Partially Grounded', or 'Not Grounded'. "
    "And in the next line, Briefly justify your rating in 1-2 sentences."
    "Sample answer: Rating: Fully Grounded \
     Justification: The answer is fully supported by the context provided, citing specific sections of the Merck Manual."
)

relevance_rater_system_message = (
    "You are a medical expert evaluating the relevance of an AI-generated answer. "
    "Given the user's question and the answer, rate how relevant the answer is to the question. "
    "Respond with one of: 'Highly Relevant', 'Somewhat Relevant', or 'Not Relevant'. "
    "And in the next line, briefly justify your rating in 1-2 sentences."
    "Sample answer: Rating: Higly Relevant \
    Justification:  The answer directly addresses the user's question, providing specific and actionable information."
)

user_message_template = (
    "Context:\n{context}\n\n"
    "Answer:\n{answer}\n\n"
    "Question:\n{question}\n\n"
    "Please provide your rating and justification."
)

In [None]:
def generate_ground_relevance_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.get_relevant_documents(query=user_input,k=3)
    pages = [str(doc.metadata.get('page', 'N/A')) for doc in relevant_document_chunks]
    pages_str = ", ".join(pages)

    context_list = [d.page_content for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)

    # Combine user_prompt and system_message to create the prompt
    prompt = f"""[INST]{qna_system_message}\n
                {'user'}: {qna_user_message_template.format(context=context_for_query, question=user_input)}
                [/INST]"""

    response = lcpp_llm(
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    answer =  response["choices"][0]["text"]

    # Combine user_prompt and system_message to create the prompt
    groundedness_prompt = f"""[INST]{groundedness_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    # Combine user_prompt and system_message to create the prompt
    relevance_prompt = f"""[INST]{relevance_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    response_1 =  lcpp_llm(
            prompt=groundedness_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    response_2 = lcpp_llm(
            prompt=relevance_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    return response_1['choices'][0]['text'],response_2['choices'][0]['text'],pages_str

In [None]:
results = []
for q, ans in [
    (query1, response_text_rag1_ft),
    (query2, response_text_rag2_ft),
    (query3, response_text_rag3_ft),
    (query4, response_text_rag4_ft),
    (query5, response_text_rag5_ft),
]:
    ground, rel, page_str = generate_ground_relevance_response(user_input=q, max_tokens=370, top_k=20)
    results.append({
        "Query": q,
        "Answer": ans,
        "Groundedness": ground.strip(),
        "Relevance": rel.strip(),
        "Pages": page_str if "Not Grounded" not in ground else None
    })

eval_df = pd.DataFrame(results)
eval_df

Llama.generate: prefix-match hit

llama_print_timings:        load time =   291.60 ms
llama_print_timings:      sample time =    48.89 ms /    76 runs   (    0.64 ms per token,  1554.54 tokens per second)
llama_print_timings: prompt eval time =  5921.23 ms /   852 tokens (    6.95 ms per token,   143.89 tokens per second)
llama_print_timings:        eval time =  4769.98 ms /    75 runs   (   63.60 ms per token,    15.72 tokens per second)
llama_print_timings:       total time = 10811.98 ms
Llama.generate: prefix-match hit

llama_print_timings:        load time =   291.60 ms
llama_print_timings:      sample time =    75.09 ms /   119 runs   (    0.63 ms per token,  1584.81 tokens per second)
llama_print_timings: prompt eval time =  7040.76 ms /   972 tokens (    7.24 ms per token,   138.05 tokens per second)
llama_print_timings:        eval time =  7528.65 ms /   118 runs   (   63.80 ms per token,    15.67 tokens per second)
llama_print_timings:       total time = 14760.79 ms
Llama.gene

Unnamed: 0,Query,Answer,Groundedness,Relevance,Pages
0,What is the protocol for managing sepsis in a ...,1. Provide supplemental oxygen and secure airw...,Rating: Fully Grounded\nJustification: The ans...,Rating: Highly Relevant\nJustification: The an...,"2400, 2448, 2453"
1,"What are the common symptoms for appendicitis,...",1. Common symptoms include epigastric or periu...,Rating: Fully Grounded\nJustification: The ans...,Rating: Highly Relevant\nJustification: The an...,"173, 172, 3567"
2,What are the effective treatments or solutions...,1. Sudden patchy hair loss can be caused by va...,Rating: Fully Grounded\nJustification: The ans...,Rating: Highly Relevant\nJustification: The an...,"858, 855, 857"
3,What treatments are recommended for a person w...,1. Ensure reliable airway and maintain adequat...,Rating: Fully Grounded\nJustification: The ans...,Rating: Highly Relevant\nJustification: The an...,"3647, 3409, 3404"
4,What are the necessary precautions and treatme...,"1. Treat life-threatening injuries first, such...",Rating: Fully Grounded\nJustification: The ans...,Rating: Highly Relevant\nJustification: The an...,"3390, 3646, 1023"


In [None]:
from IPython.display import display, Markdown

# Create a more readable summary of the evaluation results, including pages

def format_eval_content(df):
    content = ""
    for idx, row in df.iterrows():
        content += f"**Query {idx+1}:** {row['Query']}\n\n"
        content += f"**Answer:**\n{row['Answer']}\n\n"
        content += f"**Groundedness:** {row['Groundedness']}\n\n"
        if 'Pages' in row and pd.notnull(row['Pages']):
            content += f"**Pages:** {row['Pages']}\n\n"
        content += f"**Relevance:** {row['Relevance']}\n\n"
        content += "---\n"
    return content

readable_eval_content = format_eval_content(eval_df)
display(Markdown(readable_eval_content))

**Query 1:** What is the protocol for managing sepsis in a critical care unit?

**Answer:**
1. Provide supplemental oxygen and secure airway if necessary through intubation and mechanical ventilation.
2. Insert two large IV catheters into separate peripheral veins or use a central venous line or intraosseous needle for access.
3. Initiate aggressive fluid resuscitation.
4. Administer antibiotics promptly based on culture results and suspected organism, and consider surgical intervention for infected or necrotic tissues.

**Groundedness:** Rating: Fully Grounded
Justification: The answer directly aligns with the information provided in the context from the Merck Manual, specifically Chapter 227 on Sepsis and Septic Shock. The manual suggests providing supplemental oxygen, checking airway and ventilation, inserting large IV catheters for fluid resuscitation, initiating aggressive fluid resuscitation, antibiotics, and considering surgical intervention if necessary. Additionally, the answer mentions monitoring for organ failure and providing supportive care as needed, which is also mentioned in the context.

**Pages:** 2400, 2448, 2453

**Relevance:** Rating: Highly Relevant
Justification: The answer directly addresses the user's question by providing specific steps for managing sepsis in a critical care unit, including supplemental oxygen, checking airway and ventilation, inserting large IV catheters for fluid resuscitation, aggressive fluid resuscitation, antibiotics, surgical intervention if necessary, and supportive care.

---
**Query 2:** What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

**Answer:**
1. Common symptoms include epigastric or periumbilical pain followed by brief nausea, vomiting, anorexia, and a shift in pain to the right lower quadrant after a few hours. Pain increases with cough and motion. Direct and rebound tenderness are located at McBurney's point.
2. Appendicitis cannot be cured via medicine; it requires surgical removal.
3. The standard surgical procedure for appendicitis is an appendectomy, which involves removing the inflamed appendix.
4. Before the operation, an NGT is inserted, and patients with signs of volume depletion should have urine output monitored with a catheter. IV antibiotics effective against intestinal flora are given to maintain fluid status and prevent infection.

**Groundedness:** Rating: Fully Grounded
Justification: The answer directly addresses the question by accurately quoting the common symptoms of appendicitis from the context provided in the Merck Manual, and also clarifies that appendicitis cannot be cured via medicine and requires surgical removal (appendectomy). The answer is fully supported by the context.

**Pages:** 173, 172, 3567

**Relevance:** Rating: Highly Relevant
Justification: The answer directly addresses the user's question by providing a clear list of common symptoms for appendicitis and stating that it cannot be cured via medicine, but rather requires surgical removal through an appendectomy. The answer also mentions the pre-surgery preparations such as NGT, urinary catheter, IV fluids and electrolytes, and antibiotics.

---
**Query 3:** What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

**Answer:**
1. Sudden patchy hair loss can be caused by various conditions such as fungal infections (tinea capitis), bacterial infections, or autoimmune disorders like alopecia areata.
2. For fungal infections, topical or oral antifungals are effective treatments.
3. For alopecia areata, treatment options include topical corticosteroids, minoxidil, anthralin, immunotherapy (diphencyprone or squaric acid dibutylester), or psoralen plus ultraviolet A (PUVA).
4. In cases of trichotillomania, behavior modification, clomipramine, or selective serotonin reuptake inhibitors (SSRIs) may be beneficial.

Note: The provided context does not mention any surgical options for treating sudden patchy hair loss specifically.

**Groundedness:** Rating: Fully Grounded
Justification: The answer accurately identifies the causes of sudden patchy hair loss (alopecia areata, tinea capitis, trichotillomania) based on the provided context from the Merck Manual. Additionally, it lists the appropriate treatments for each condition as mentioned in the manual.

**Pages:** 858, 855, 857

**Relevance:** Rating: Highly Relevant
Justification: The answer directly addresses the user's question by providing information about the potential causes of sudden patchy hair loss (alopecia areata, tinea capitis, trichotillomania) and effective treatments for each condition.

---
**Query 4:** What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

**Answer:**
1. Ensure reliable airway and maintain adequate ventilation, oxygenation, and blood pressure.
2. Surgery may be needed for monitoring intracranial pressure, decompression, or hematoma removal.
3. In the first few days, maintain adequate brain perfusion and oxygenation to prevent complications.
4. Subsequently, rehabilitation is often required to improve functional recovery.

**Groundedness:** Rating: Fully Grounded
Justification: The answer accurately reflects the information provided in the context from the Merck Manual, which includes ensuring a reliable airway and maintaining adequate ventilation, oxygenation, and blood pressure as initial treatments for traumatic brain injury (p. 3400). Additionally, the answer mentions the need for surgery for patients with more severe injuries to place monitors, decompress the brain, or remove intracranial hematomas (p. 3400), and the importance of rehabilitation for functional recovery and preventing secondary disabilities after initial treatment (p. 3231).

**Pages:** 3647, 3409, 3404

**Relevance:** Rating: Highly Relevant
Justification: The answer directly addresses the user's question by providing specific treatments recommended for a person with a traumatic brain injury, including ensuring a reliable airway, maintaining adequate ventilation, oxygenation, and blood pressure, surgery if necessary, and rehabilitation.

---
**Query 5:** What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

**Answer:**
1. Treat life-threatening injuries first, such as excessive bleeding or signs of shock.
2. Immobilize the injured leg using a splint to prevent further damage.
3. For lower-limb prosthesis users, maintain body alignment and adjust fit as needed to prevent skin breakdown.
4. Seek medical attention for hip fracture surgery and rehabilitation, including walking aids or orthotics for recovery.

**Groundedness:** Rating: Fully Grounded
Justification: The answer directly addresses the question by summarizing the necessary precautions and treatment steps for a person who has fractured their leg, drawing from specific sections of the Merck Manual context provided.

**Pages:** 3390, 3646, 1023

**Relevance:** Rating: Highly Relevant
Justification: The answer directly addresses the user's question by providing specific and actionable information on the necessary precautions and treatment steps for a person who has fractured their leg, as well as considerations for their care and recovery.

---


## Actionable Insights and Business Recommendations


1. **Improved Contextual Responses**:
    - RAG-based approach enhances relevance and groundedness using trusted sources like the Merck Manual.
    - Fine-tuned parameters (`temperature=0`, `top_p=0.95`, `top_k=20`) ensure consistent outputs.

2. **Efficient Data Handling**:
    - Chunking with RecursiveCharacterTextSplitter preserves context, creating 18,088 chunks from 4,114 pages.
    - Chroma vector database enables fast, domain-specific retrieval.

3. **Evaluation Metrics**:
    - Groundedness: 100% Fully Grounded.
    - Relevance: 100% Highly Relevant.


---

### Business Recommendations:
1. **Healthcare Assistant Product**:
    - Develop an AI-powered assistant for clinicians and patients.

2. **Expand Knowledge Base**:
    - Add sources like PubMed to cover broader medical domains.

3. **Regulatory Compliance**:
    - Ensure HIPAA/GDPR compliance for data security.

4. **Specialized Solutions**:
    - Tailor systems for specialties like neurology or cardiology.

5. **Continuous Updates**:
    - Regularly update the vector database with new medical research.

<font size=6 color='blue'>Power Ahead</font>
___