## Problem Statement

### Business Context

The healthcare industry is rapidly evolving, with professionals facing increasing challenges in managing vast volumes of medical data while delivering accurate and timely diagnoses. The need for quick access to comprehensive, reliable, and up-to-date medical knowledge is critical for improving patient outcomes and ensuring informed decision-making in a fast-paced environment.

Healthcare professionals often encounter information overload, struggling to sift through extensive research and data to create accurate diagnoses and treatment plans. This challenge is amplified by the need for efficiency, particularly in emergencies, where time-sensitive decisions are vital. Furthermore, access to trusted, current medical information from renowned manuals and research papers is essential for maintaining high standards of care.

To address these challenges, healthcare centers can focus on integrating systems that streamline access to medical knowledge, provide tools to support quick decision-making, and enhance efficiency. Leveraging centralized knowledge platforms and ensuring healthcare providers have continuous access to reliable resources can significantly improve patient care and operational effectiveness.

**Common Questions to Answer**

**1. Diagnostic Assistance**: "What are the common symptoms and treatments for pulmonary embolism?"

**2. Drug Information**: "Can you provide the trade names of medications used for treating hypertension?"

**3. Treatment Plans**: "What are the first-line options and alternatives for managing rheumatoid arthritis?"

**4. Specialty Knowledge**: "What are the diagnostic steps for suspected endocrine disorders?"

**5. Critical Care Protocols**: "What is the protocol for managing sepsis in a critical care unit?"

### Objective

As an AI specialist, your task is to develop a RAG-based AI solution using renowned medical manuals to address healthcare challenges. The objective is to **understand** issues like information overload, **apply** AI techniques to streamline decision-making, **analyze** its impact on diagnostics and patient outcomes, **evaluate** its potential to standardize care practices, and **create** a functional prototype demonstrating its feasibility and effectiveness.

### Data Description

The **Merck Manuals** are medical references published by the American pharmaceutical company Merck & Co. They cover a wide range of medical topics, including disorders, tests, diagnoses, and drugs. The manuals have been published since 1899, when Merck & Co. was still a subsidiary of the German company Merck.

The manual is provided as a PDF with over 4,000 pages divided into 23 sections.

## Installing and Importing Necessary Libraries and Dependencies

In [1]:
# Installation for GPU llama-cpp-python
# uncomment and run the following code in case GPU is being used
#!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --no-cache-dir -q

# Installation for CPU llama-cpp-python
# uncomment and run the following code in case GPU is not being used
!CMAKE_ARGS="-DLLAMA_CUBLAS=off" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --no-cache-dir -q

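# Installation for Metal (macOS GPU) llama-cpp-python
# uncomment and run the following code in case a Metal GPU is being used
#!CMAKE_ARGS="-DLLAMA_METAL=on" pip install --force-reinstall --no-cache-dir llama-cpp-python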




In [2]:
# For installing the libraries & downloading models from HF Hub
%pip install huggingface_hub pandas tiktoken pymupdf langchain langchain-community chromadb sentence-transformers numpy -q


Note: you may need to restart the kernel to use updated packages.


In [3]:
#Libraries for processing dataframes,text
import json,os
import tiktoken
import pandas as pd

#Libraries for Loading Data, Chunking, Embedding, and Vector Databases
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

#Libraries for downloading and loading the llm
from huggingface_hub import hf_hub_download
from llama_cpp import Llama


A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.3.3 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/ggovvala/Development/Code/medical-diagnoses-rag/.venv/lib/python3.12/site-packages/ipykernel_launcher.py", line 18, in <module>
    app.launch_new_instance()
  File "/Users/ggovvala/Development/Code/medical-diagnoses-rag/.venv/lib/python3.12/site-packages/traitlets/config/application.py", line 1075, in launch_instance
    app.start()
  File "/Users/ggovvala/Development/Code/medical-diagnoses-rag/.venv/lib

## Question Answering using LLM

#### Downloading and Loading the model

In [4]:
# Download and load Mistral-7B model optimized for Intel Mac
# This model is quantized for better performance on CPU

import os

# Model configuration for Intel Mac
model_name = "mistralai/Mistral-7B-Instruct-v0.2"
model_file = "mistral-7b-instruct-v0.2.Q4_K_M.gguf"  # 4-bit quantized for Intel Mac

print("🔄 Downloading Mistral-7B model for Intel Mac...")
print(f"Model: {model_name}")
print(f"File: {model_file}")
print("⚠️  This may take several minutes (model size: ~4GB)")

try:
    # Download the model from Hugging Face Hub
    model_path = hf_hub_download(
        repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
        filename=model_file,
        cache_dir="./models"  # Download to local models directory
    )
    
    print(f"✅ Model downloaded successfully!")
    print(f"📁 Model path: {model_path}")
    
    # Load the model with llama-cpp-python optimized for Intel Mac
    print("🔄 Loading model into memory...")
    
    # Sampling settings (temperature, max_tokens) are passed per call, not to the constructor
    llm = Llama(
        model_path=model_path,
        n_ctx=2048,       # Context length
        n_threads=8,      # Number of CPU threads (adjust based on your Mac)
        n_gpu_layers=35,  # Layers offloaded to the GPU; set to 0 for CPU-only on Intel Mac
        verbose=False,    # Set to True for debugging
        seed=42           # For reproducible results
    )
    
    print("✅ Model loaded successfully!")
    print("🎯 Ready for inference!")
    
    # Test the model with a simple query
    print("\n🧪 Testing model with a simple medical query...")
    test_response = llm(
        "What is hypertension?",
        max_tokens=100,
        temperature=0
    )
    
    print("Test Response:")
    print("-" * 50)
    print(test_response['choices'][0]['text'])
    print("-" * 50)
    
except Exception as e:
    print(f"❌ Error loading model: {e}")
    print("💡 Make sure you have sufficient RAM (8GB+ recommended)")
    print("💡 Check your internet connection for downloading")
    
# Display model information
if 'llm' in locals():
    print(f"\n📊 Model Information:")
    print(f"   • Model: Mistral-7B-Instruct-v0.2")
    print(f"   • Quantization: Q4_K_M (4-bit)")
    print(f"   • Context Length: 2048 tokens")
    print(f"   • Optimization: Intel Mac CPU")
    print(f"   • Memory Usage: ~4-6GB RAM")

🔄 Downloading Mistral-7B model for Intel Mac...
Model: mistralai/Mistral-7B-Instruct-v0.2
File: mistral-7b-instruct-v0.2.Q4_K_M.gguf
⚠️  This may take several minutes (model size: ~4GB)


mistral-7b-instruct-v0.2.Q4_K_M.gguf:   0%|          | 0.00/4.37G [00:00<?, ?B/s]

✅ Model downloaded successfully!
📁 Model path: ./models/models--TheBloke--Mistral-7B-Instruct-v0.2-GGUF/snapshots/3a6fbf4a41a1d52e415a4958cde6856d34b2db93/mistral-7b-instruct-v0.2.Q4_K_M.gguf
🔄 Loading model into memory...


llama_context: n_ctx_per_seq (2048) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
ggml_metal_init: skipping kernel_soft_max_f16                      (not supported)
ggml_metal_init: skipping kernel_soft_max_f16_4                    (not supported)
ggml_metal_init: skipping kernel_soft_max_f32                      (not supported)
ggml_metal_init: skipping kernel_soft_max_f32_4                    (not supported)
ggml_metal_init: skipping kernel_soft_max_f16                      (not supported)
ggml_metal_init: skipping kernel_soft_max_f16_4                    (not supported)
ggml_metal_init: skipping kernel_soft_max_f32                      (not supported)
ggml_metal_init: skipping kernel_soft_max_f32_4                    (not supported)
ggml_metal_init: skipping kernel_get_rows_bf16                     (not supported)
ggml_metal_init: skipping kernel_get_rows_bf16                     (not supported)
ggml_metal_init: skipping kernel_set_rows_bf16           

✅ Model loaded successfully!
🎯 Ready for inference!

🧪 Testing model with a simple medical query...
Test Response:
--------------------------------------------------


Hypertension, also known as high blood pressure, is a condition where the force of blood pushing against the walls of the arteries is consistently too high. The American Heart Association defines hypertension as a systolic blood pressure of 140 mm Hg or higher and/or a diastolic blood pressure of 90 mm Hg or higher.

What causes hypertension?

The exact cause of hypertension is not known in most
--------------------------------------------------

📊 Model Information:
   • Model: Mistral-7B-Instruct-v0.2
   • Quantization: Q4_K_M (4-bit)
   • Context Length: 2048 tokens
   • Optimization: Intel Mac CPU
   • Memory Usage: ~4-6GB RAM

#### Response

In [8]:
import IPython.display as display

def response(query,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    model_output = llm(
      prompt=query,
      max_tokens=max_tokens,
      temperature=temperature,
      top_p=top_p,
      top_k=top_k
    )

    return display.Markdown(model_output['choices'][0]['text'])
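
Mistral-7B-Instruct models are trained on prompts wrapped in `[INST] ... [/INST]` tags, and the evaluation function later in this notebook uses that wrapping. The `response` helper above sends the raw query instead, which may be why the test output earlier drifts into a follow-up question of its own. A minimal sketch of a wrapper is shown here; `build_instruct_prompt` is a hypothetical helper name, not part of llama-cpp-python.

```python
def build_instruct_prompt(user_query: str, system_message: str = "") -> str:
    """Wrap a query in Mistral-style [INST] tags (hypothetical helper)."""
    # Optional instructions go first, separated from the question by a blank line.
    parts = [system_message.strip(), user_query.strip()]
    body = "\n\n".join(p for p in parts if p)
    return f"[INST] {body} [/INST]"


prompt = build_instruct_prompt("What is hypertension?")
print(prompt)
```

The wrapped string could then be passed to the helper as `response(build_instruct_prompt(query))`. Whether this improves the answers for this particular quantized model is something to verify by comparing outputs.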

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [9]:
#Query1 output response
sepsis_query1 = "What is the protocol for managing sepsis in a critical care unit?"
response(sepsis_query1)



Sepsis is a life-threatening condition that can arise from an infection, and prompt recognition and appropriate management are crucial for improving outcomes. In a critical care unit, the following steps should be taken for managing sepsis:

1. Early recognition: Identify sepsis early by recognizing the signs and symptoms, such as fever, chills, tachycardia, tachypnea, altered mental status, and lactic acidosis. Use the Sequential Organ Failure Assessment (SOFA) score or the Quick Sequential Organ Failure Assessment (qSOFA

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [10]:
# Query2 output response
query2 = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
response(query2)



Appendicitis is a medical condition characterized by inflammation of the appendix, a small tube-shaped organ located in the lower right side of the abdomen. The symptoms of appendicitis can vary from person to person, but some of the most common ones include:

1. Abdominal pain: The pain is typically located in the lower right side of the abdomen and may start as a mild discomfort that gradually worsens over time. The pain may be constant or come and go, and it may be accompanied by cramping or bloating.
2. Loss of appetite

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [11]:
# Query3 output response
query3 = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
response(query3)



Sudden patchy hair loss, also known as alopecia areata, is a common autoimmune disorder that affects the hair follicles, leading to hair loss in small, round patches on the scalp, beard, or other areas of the body. The exact cause of alopecia areata is not known, but it is believed to be related to a dysfunction of the immune system.

There are several treatments that have been shown to be effective in addressing sudden patchy hair loss:

1. Corticosteroids: Corticosteroids are anti-inflammat

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [12]:
query4 = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
response(query4)



A person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function, is typically diagnosed with a traumatic brain injury (TBI). The treatment for a TBI depends on the severity and location of the injury. Here are some common treatments recommended for individuals with a TBI:

1. Emergency care: If the injury is severe, the person may require emergency care, including surgery to remove hematomas or decompressing a skull fracture.
2. Medications: Depending on the symptoms, the doctor may prescribe medications to manage seiz

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [13]:
query5 = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
response(query5)



First and foremost, it is essential to ensure the safety and stability of the injured person. If the fracture is open or compound, meaning the bone has pierced the skin, do not attempt to move the person without proper medical assistance. In this case, call for emergency medical services and follow their instructions.

If the fracture is closed, meaning the bone has not pierced the skin, the following steps can be taken:

1. Assess the situation: Check the person's vital signs, such as pulse, breathing rate, and level of consciousness. If there are any signs of shock,

## Question Answering using LLM with Prompt Engineering

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [18]:
sepsis_prompt = "Step-by-step approach or treatment guidelines"
sepsis_input = sepsis_prompt + "\n" + "What is the protocol for managing sepsis in a critical care unit?"
response(sepsis_input)


1. Early recognition and prompt assessment: Recognize sepsis early and initiate assessment as soon as possible. Use the Sequential [Sepsis-related] Organ Failure Assessment (SOFA) score to evaluate organ dysfunction.
2. Fluid resuscitation: Administer intravenous fluids to maintain adequate tissue perfusion. Use crystalloids initially, and consider colloids or blood products if fluid resuscitation is insufficient.
3. Antibiotics: Administer broad-spectrum antibiotics as soon as possible, based on suspected infection site and microbi

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [19]:
appendicitis_prompt = "Answer about appendicitis in a structured format including symptoms, medical treatments, and how to explain to patient."
appendicitis_input = appendicitis_prompt + "\n" + "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
response(appendicitis_input)



Appendicitis is a medical condition characterized by inflammation of the appendix, a small tube-shaped organ located in the lower right side of the abdomen. The common symptoms of appendicitis include:

1. Sudden onset of pain in the lower right abdomen, which may be mild at first but eventually becomes severe and constant.
2. Loss of appetite and feeling sick to your stomach.
3. Nausea and vomiting.
4. Fever, often over 100.4°F (38°C).
5. Diarrhe

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [20]:
hair_loss_prompt = "Answer about a dermatologist taking care of a patient with hair loss."
hair_loss_input = hair_loss_prompt + "\n" + "What are the common causes of hair loss, and what treatment options are available?"
response(hair_loss_input)

 A dermatologist is a medical doctor who specializes in the diagnosis and treatment of skin, hair, and nail conditions. When it comes to hair loss, a dermatologist is an excellent specialist to consult.
Common causes of hair loss include:
1. Androgenetic alopecia (male or female pattern baldness)
2. Telogen effluvium (stress-related hair loss)
3. Alopecia areata (autoimmune hair loss)
4. Thyroid disorders
5. Nutritional deficiencies
6. Medications (che

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [24]:
brain_injury_prompt = "Answer about Traumatic Brain Injury (TBI) in a structured format including causes, symptoms, and treatment protocols. List the common symptoms of TBI"
brain_injury_input = brain_injury_prompt + "\n" + "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
response(brain_injury_input)

 This condition is known as a Traumatic Brain Injury (TBI).

Causes:
1. Motor vehicle accidents
2. Falls
3. Sports injuries
4. Violence (assaults or gunshot wounds)
5. Explosions

Symptoms:
1. Headache or a feeling of pressure in the head
2. Temporary loss of consciousness
3. Confusion or disorientation
4. Amnesia surrounding the traumatic event
5. Dizziness or loss of balance
6. Nausea or vomiting
7. Sl

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [25]:
fractured_leg_prompt = "Answer about a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery."
fractured_leg_input = fractured_leg_prompt + "\n" + "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
response(fractured_leg_input)



First and foremost, the safety and well-being of the injured person should be prioritized. If the fracture is obvious and the person is in significant pain, immobilization of the leg should be achieved as soon as possible to prevent further injury or complications. This can be done by using a splint, a makeshift sling, or a hiking pole to keep the leg stable.

If the fracture is severe or if the person's condition is deteriorating, it may be necessary to call for emergency medical assistance. In the meantime, efforts should be made to keep the

## Data Preparation for RAG

### Loading the Data

### Data Overview

#### Checking the first 5 pages

#### Checking the number of pages

### Data Chunking
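
The imports at the top of the notebook suggest that `RecursiveCharacterTextSplitter` is the intended splitter for this step. As a library-free illustration of the underlying idea only, the sketch below splits text into fixed-size, overlapping windows of whitespace-separated words. The function name `chunk_words` and the sizes are assumptions for illustration; the LangChain splitter works on characters or tokens and prefers paragraph and sentence boundaries.

```python
def chunk_words(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into overlapping word windows (illustrative, not LangChain)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = "Sepsis is a life-threatening condition that can arise from an infection and needs prompt care"
for chunk in chunk_words(sample):
    print(chunk)
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.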

### Embedding

### Vector Database

### Retriever

### System and User Prompt Template
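
The response function in the next section expects two globals, `qna_system_message` and `qna_user_message_template`, and fills the `{context}` and `{question}` placeholders with `str.replace`. The sketch below shows one possible shape for them. The wording of both templates is an assumption made for illustration; only the variable names and the two placeholders come from the notebook.

```python
# Hypothetical wording; only the names and placeholders come from the notebook.
qna_system_message = (
    "You are an assistant that answers medical questions using only the "
    "provided context. If the context does not contain the answer, say so."
)

qna_user_message_template = (
    "###Context\n{context}\n\n"
    "###Question\n{question}"
)

# Fill the placeholders the same way generate_rag_response does.
example_context = "<retrieved chunk 1>. <retrieved chunk 2>"
example_question = "What is the protocol for managing sepsis in a critical care unit?"
user_message = qna_user_message_template.replace("{context}", example_context)
user_message = user_message.replace("{question}", example_question)

prompt = qna_system_message + "\n" + user_message
print(prompt)
```

Using `str.replace` rather than `str.format` keeps the template safe if it ever contains other literal braces.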

### Response Function

In [None]:
def generate_rag_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.get_relevant_documents(query=user_input,k=k)
    context_list = [d.page_content for d in relevant_document_chunks]

    # Combine document chunks into a single context
    context_for_query = ". ".join(context_list)

    user_message = qna_user_message_template.replace('{context}', context_for_query)
    user_message = user_message.replace('{question}', user_input)

    prompt = qna_system_message + '\n' + user_message

    # Generate the response
    try:
        response = llm(
                  prompt=prompt,
                  max_tokens=max_tokens,
                  temperature=temperature,
                  top_p=top_p,
                  top_k=top_k
                  )

        # Extract the generated text from the model's response
        response = response['choices'][0]['text'].strip()
    except Exception as e:
        response = f'Sorry, I encountered the following error: \n {e}'

    return response

## Question Answering using RAG

### Query 1: What is the protocol for managing sepsis in a critical care unit?

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

### Fine-tuning

## Output Evaluation

In [None]:
groundedness_rater_system_message  = ""

In [None]:
relevance_rater_system_message = ""

In [None]:
user_message_template = ""

In [None]:
def generate_ground_relevance_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.get_relevant_documents(query=user_input,k=k)
    context_list = [d.page_content for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)

    # Combine user_prompt and system_message to create the prompt
    prompt = f"""[INST]{qna_system_message}\n
                {'user'}: {qna_user_message_template.format(context=context_for_query, question=user_input)}
                [/INST]"""

    response = llm(
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    answer =  response["choices"][0]["text"]

    # Build the groundedness rater prompt from the context, question, and generated answer
    groundedness_prompt = f"""[INST]{groundedness_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    # Build the relevance rater prompt from the context, question, and generated answer
    relevance_prompt = f"""[INST]{relevance_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    response_1 = llm(
            prompt=groundedness_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    response_2 = llm(
            prompt=relevance_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    return response_1['choices'][0]['text'],response_2['choices'][0]['text']
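
The three rater strings above are left empty, so the function as written would send blank instructions to the model. The sketch below shows one possible shape for them, plus a small parser for a numeric rating. All wording, the 1 to 5 scale, and the helper `parse_rating` are assumptions for illustration; the only constraint taken from the notebook is that `user_message_template` is formatted with `context`, `question`, and `answer`.

```python
import re

# Hypothetical rater instructions (the notebook leaves these strings empty).
groundedness_rater_system_message = (
    "Rate from 1 to 5 how well the answer is supported by the context alone. "
    "Reply with 'Rating: <number>' followed by a one-sentence justification."
)
relevance_rater_system_message = (
    "Rate from 1 to 5 how directly the answer addresses the question. "
    "Reply with 'Rating: <number>' followed by a one-sentence justification."
)
user_message_template = (
    "###Question\n{question}\n\n###Context\n{context}\n\n###Answer\n{answer}"
)


def parse_rating(rater_output: str):
    """Return the first 1-5 rating found after 'Rating:', or None."""
    match = re.search(r"Rating:\s*([1-5])\b", rater_output)
    return int(match.group(1)) if match else None


filled = user_message_template.format(
    context="<retrieved chunks>", question="<user question>", answer="<model answer>"
)
print(filled)
print(parse_rating("Rating: 4. The answer follows the context closely."))
```

Asking the rater for a fixed `Rating: <number>` prefix makes the two returned strings easy to turn into numbers that can be compared across queries.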

## Actionable Insights and Business Recommendations
