## Problem Statement

### Business Context

The healthcare industry is rapidly evolving, with professionals facing increasing challenges in managing vast volumes of medical data while delivering accurate and timely diagnoses. The need for quick access to comprehensive, reliable, and up-to-date medical knowledge is critical for improving patient outcomes and ensuring informed decision-making in a fast-paced environment.

Healthcare professionals often encounter information overload, struggling to sift through extensive research and data to create accurate diagnoses and treatment plans. This challenge is amplified by the need for efficiency, particularly in emergencies, where time-sensitive decisions are vital. Furthermore, access to trusted, current medical information from renowned manuals and research papers is essential for maintaining high standards of care.

To address these challenges, healthcare centers can focus on integrating systems that streamline access to medical knowledge, provide tools to support quick decision-making, and enhance efficiency. Leveraging centralized knowledge platforms and ensuring healthcare providers have continuous access to reliable resources can significantly improve patient care and operational effectiveness.

**Common Questions to Answer**

**1. Diagnostic Assistance**: "What are the common symptoms and treatments for pulmonary embolism?"

**2. Drug Information**: "Can you provide the trade names of medications used for treating hypertension?"

**3. Treatment Plans**: "What are the first-line options and alternatives for managing rheumatoid arthritis?"

**4. Specialty Knowledge**: "What are the diagnostic steps for suspected endocrine disorders?"

**5. Critical Care Protocols**: "What is the protocol for managing sepsis in a critical care unit?"

### Data Description

The **Merck Manuals** are medical references published by the American pharmaceutical company Merck & Co., that cover a wide range of medical topics, including disorders, tests, diagnoses, and drugs. The manuals have been published since 1899, when Merck & Co. was still a subsidiary of the German company Merck.

The manual is provided as a PDF with over 4,000 pages divided into 23 sections.

## Installing and Importing Necessary Libraries and Dependencies

In [1]:
# Installation for GPU llama-cpp-python
!pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122 \
  --upgrade --force-reinstall --no-cache-dir

Looking in indexes: https://pypi.org/simple, https://abetlen.github.io/llama-cpp-python/whl/cu122
Collecting llama-cpp-python
  Downloading https://github.com/abetlen/llama-cpp-python/releases/download/v0.3.16-cu122/llama_cpp_python-0.3.16-cp312-cp312-linux_x86_64.whl (551.6 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m551.6/551.6 MB[0m [31m100.4 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting typing-extensions>=4.5.0 (from llama-cpp-python)
  Downloading typing_extensions-4.15.0-py3-none-any.whl.metadata (3.3 kB)
Collecting numpy>=1.20.0 (from llama-cpp-python)
  Downloading numpy-2.3.5-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (62 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m62.1/62.1 kB[0m [31m4.6 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting diskcache>=5.6.1 (from llama-cpp-python)
  Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Collecting jinja2>=2.11.3 (from llama-cpp-python)
  Downloading j

**Note**:
- After running the above cell, kindly restart the runtime (for Google Colab) or notebook kernel (for Jupyter Notebook), and run all cells sequentially from the next cell.
- On executing the above line of code, you might see a warning regarding package dependencies. This error message can be ignored as the above code ensures that all necessary libraries and their dependencies are maintained to successfully execute the code in ***this notebook***.

In [2]:
# For installing the libraries & downloading models from HF Hub
!pip install huggingface_hub pandas tiktoken pymupdf langchain langchain-community chromadb sentence-transformers langchain-huggingface numpy --upgrade -q

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m91.2/91.2 kB[0m [31m1.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m67.3/67.3 kB[0m [31m7.0 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m12.4/12.4 MB[0m [31m114.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m24.1/24.1 MB[0m [31m105.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m102.2/102.2 kB[0m [31m9.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.5/2.5 MB[0m [31m63.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m21.4/21.4 MB[0m [31m23.7 MB/s[0m eta [36

**Note**:
- After running the above cell, kindly restart the runtime (for Google Colab) or notebook kernel (for Jupyter Notebook), and run all cells sequentially from the next cell.
- On executing the above line of code, you might see a warning regarding package dependencies. This error message can be ignored as the above code ensures that all necessary libraries and their dependencies are maintained to successfully execute the code in ***this notebook***.

In [1]:
#Libraries for downloading and loading the llm
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

## Question Answering using LLM

#### Downloading and Loading the model

In [2]:
model_name_or_path = "TheBloke/Mistral-7B-Instruct-v0.2-GGUF"
model_basename = "mistral-7b-instruct-v0.2.Q6_K.gguf"

In [3]:
model_path = hf_hub_download(
    repo_id= model_name_or_path,
    filename= model_basename
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


mistral-7b-instruct-v0.2.Q6_K.gguf:   0%|          | 0.00/5.94G [00:00<?, ?B/s]

In [4]:

llm = Llama(
    model_path=model_path,
    n_ctx=2300,
    n_gpu_layers=38,
    n_batch=512
)

ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: Tesla T4, compute capability 7.5, VMM: yes
llama_model_load_from_file_impl: using device CUDA0 (Tesla T4) - 14992 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from /root/.cache/huggingface/hub/models--TheBloke--Mistral-7B-Instruct-v0.2-GGUF/snapshots/3a6fbf4a41a1d52e415a4958cde6856d34b2db93/mistral-7b-instruct-v0.2.Q6_K.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = mistralai_mistral-7b-instruct-v0.2
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                  

#### Response

In [5]:
def response(query,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    model_output = llm(
      prompt=query,
      max_tokens=max_tokens,
      temperature=temperature,
      top_p=top_p,
      top_k=top_k
    )

    return model_output['choices'][0]['text']

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [6]:
user_input = "What is the protocol for managing sepsis in a critical care unit?"
response(user_input)

llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =     648.96 ms /    16 tokens (   40.56 ms per token,    24.65 tokens per second)
llama_perf_context_print:        eval time =    4701.49 ms /   127 runs   (   37.02 ms per token,    27.01 tokens per second)
llama_perf_context_print:       total time =    5424.60 ms /   143 tokens
llama_perf_context_print:    graphs reused =        122


'\n\nSepsis is a life-threatening condition that can arise from an infection, and it requires prompt recognition and aggressive management in a critical care unit. The following are the general steps for managing sepsis in a critical care unit:\n\n1. Early recognition and suspicion: Septic patients may present with non-specific symptoms such as fever, chills, tachycardia, tachypnea, altered mental status, and lactic acidosis. It is essential to have a high index of suspicion for sepsis, especially in patients with known infections or risk factors.\n2.'

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [7]:
user_input_2 = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?" #Complete the code to define the query #2
response(user_input_2)

Llama.generate: 2 prefix-match hit, remaining 32 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =     122.55 ms /    32 tokens (    3.83 ms per token,   261.11 tokens per second)
llama_perf_context_print:        eval time =    4705.10 ms /   127 runs   (   37.05 ms per token,    26.99 tokens per second)
llama_perf_context_print:       total time =    4888.51 ms /   159 tokens
llama_perf_context_print:    graphs reused =        122


'\n\nAppendicitis is a medical condition characterized by inflammation of the appendix, a small pouch-like structure that extends from the cecum, the first part of the large intestine. The symptoms of appendicitis can vary from person to person, but the following are the most common ones:\n\n1. Abdominal pain: The pain is typically located in the lower right side of the abdomen, although it can also be felt in the middle or left side. The pain may start as a mild discomfort, but it can quickly become severe and constant.\n2. Loss of'

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [8]:
user_input_2 = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?" #Complete the code to define the query #3
response(user_input_2)

Llama.generate: 4 prefix-match hit, remaining 34 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =      65.56 ms /    34 tokens (    1.93 ms per token,   518.60 tokens per second)
llama_perf_context_print:        eval time =    4661.19 ms /   127 runs   (   36.70 ms per token,    27.25 tokens per second)
llama_perf_context_print:       total time =    4821.29 ms /   161 tokens
llama_perf_context_print:    graphs reused =        122


'\n\nSudden patchy hair loss, also known as alopecia areata, is a common autoimmune disorder that affects the hair follicles, leading to hair loss in small, round patches on the scalp, beard, or other areas of the body. The exact cause of alopecia areata is not known, but it is believed to be related to a problem with the immune system.\n\nThere are several treatments that have been shown to be effective in addressing sudden patchy hair loss:\n\n1. Corticosteroids: Corticosteroids are anti-inflammatory'

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [9]:
user_input_2 = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?" #Complete the code to define the query #4
response(user_input_2)

Llama.generate: 2 prefix-match hit, remaining 28 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =      61.18 ms /    28 tokens (    2.19 ms per token,   457.64 tokens per second)
llama_perf_context_print:        eval time =    4763.69 ms /   127 runs   (   37.51 ms per token,    26.66 tokens per second)
llama_perf_context_print:       total time =    4891.68 ms /   155 tokens
llama_perf_context_print:    graphs reused =        122


"\n\nA person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function, is typically diagnosed with a traumatic brain injury (TBI). The treatment for a TBI depends on the severity and location of the injury, as well as the individual's overall health and age.\n\nImmediate treatment for a TBI may include:\n\n1. Emergency medical care: This may include surgery to remove hematomas or other obstructions, as well as treatment for any life-threatening conditions, such as airway obstruction or severe bleeding.\n"

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [10]:
user_input_2 = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?" #Complete the code to define the query #5
response(user_input_2)

Llama.generate: 2 prefix-match hit, remaining 35 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =      66.40 ms /    35 tokens (    1.90 ms per token,   527.14 tokens per second)
llama_perf_context_print:        eval time =    4846.69 ms /   127 runs   (   38.16 ms per token,    26.20 tokens per second)
llama_perf_context_print:       total time =    4975.38 ms /   162 tokens
llama_perf_context_print:    graphs reused =        122


"\n\nFirst and foremost, if a person has fractured their leg during a hiking trip, it is essential to ensure their safety and prevent further injury. Here are some necessary precautions and treatment steps:\n\n1. Assess the situation: Check the extent of the injury and assess the person's condition. If the fracture is open or the person is in severe pain, do not move them unless it is necessary to prevent further harm.\n2. Call for help: If possible, call for emergency medical assistance. If there is no cell phone reception, try to find a way to signal for help,"

## Question Answering using LLM with Prompt Engineering

In [11]:
system_prompt = "You are a knowledgeable medical professinal. Answer the following questions accurately and concisely."

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [12]:
user_input = system_prompt+"\n"+ "What is the protocol for managing sepsis in a critical care unit?"
response(user_input)

Llama.generate: 1 prefix-match hit, remaining 34 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =     106.76 ms /    34 tokens (    3.14 ms per token,   318.46 tokens per second)
llama_perf_context_print:        eval time =    4920.21 ms /   127 runs   (   38.74 ms per token,    25.81 tokens per second)
llama_perf_context_print:       total time =    5088.49 ms /   161 tokens
llama_perf_context_print:    graphs reused =        122


'\nSepsis is a life-threatening condition caused by a dysregulated response to infection. In a critical care unit, the protocol for managing sepsis includes the following steps:\n1. Early recognition and diagnosis: Identify sepsis suspects based on clinical suspicion, laboratory results, and vital signs. Use the Sequential Organ Failure Assessment (SOFA) score to assess organ dysfunction.\n2. Immediate fluid resuscitation: Administer intravenous fluids to maintain adequate tissue perfusion and prevent hypotension. Use crystalloids initially, and consider colloids or'

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [13]:
user_input = system_prompt + "\n" + "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?" #Complete the code to pass the query #2
response(user_input)

Llama.generate: 21 prefix-match hit, remaining 32 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =      62.59 ms /    32 tokens (    1.96 ms per token,   511.22 tokens per second)
llama_perf_context_print:        eval time =    5020.66 ms /   127 runs   (   39.53 ms per token,    25.30 tokens per second)
llama_perf_context_print:       total time =    5151.84 ms /   159 tokens
llama_perf_context_print:    graphs reused =        122


'\n\nAppendicitis is a medical condition characterized by inflammation of the appendix, a small pouch that extends from the cecum in the large intestine. Common symptoms include:\n\n1. Sudden and persistent pain in the lower right abdomen, which may start as a mild discomfort and gradually worsen over hours.\n2. Loss of appetite and feeling sick to your stomach.\n3. Fever, which may be low-grade at first but can rise as high as 102°F (38.9°C) or higher.\n4. N'

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [14]:
user_input = system_prompt + "\n" + "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?" #Complete the code to pass the query #3
response(user_input)

Llama.generate: 23 prefix-match hit, remaining 34 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =      67.94 ms /    34 tokens (    2.00 ms per token,   500.45 tokens per second)
llama_perf_context_print:        eval time =    5004.72 ms /   127 runs   (   39.41 ms per token,    25.38 tokens per second)
llama_perf_context_print:       total time =    5157.46 ms /   161 tokens
llama_perf_context_print:    graphs reused =        122


'\n\nSudden patchy hair loss, also known as alopecia areata, is an autoimmune condition that causes hair loss in small, round patches on the scalp, beard, or other areas of the body. The exact cause of alopecia areata is not known, but it is believed to be related to a problem with the immune system.\n\nEffective treatments for alopecia areata include:\n\n1. Corticosteroids: These are anti-inflammatory medications that can help reduce inflammation and suppress the immune system response that causes hair loss.'

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [15]:
user_input = system_prompt+"\n"+"What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?" #Complete the code to pass the query #4
response(user_input)

Llama.generate: 21 prefix-match hit, remaining 28 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =      59.47 ms /    28 tokens (    2.12 ms per token,   470.83 tokens per second)
llama_perf_context_print:        eval time =    5011.55 ms /   127 runs   (   39.46 ms per token,    25.34 tokens per second)
llama_perf_context_print:       total time =    5143.71 ms /   155 tokens
llama_perf_context_print:    graphs reused =        122


'\n\nThe treatment for a person with a brain injury depends on the severity and location of the injury. For mild to moderate brain injuries, rest, rehabilitation, and medication may be recommended. Rehabilitation may include physical therapy, occupational therapy, speech therapy, and cognitive rehabilitation to help the person regain lost skills and functions. Medications may be used to manage symptoms such as headaches, seizures, or depression. For severe brain injuries, such as those causing permanent impairment or coma, more intensive treatments may be necessary. These may include surgery to remove hematomas or other les'

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [16]:
user_input = system_prompt+"\n"+"What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?" #Complete the code to pass the query #5
response(user_input)

Llama.generate: 21 prefix-match hit, remaining 35 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =      68.67 ms /    35 tokens (    1.96 ms per token,   509.72 tokens per second)
llama_perf_context_print:        eval time =    5141.20 ms /   127 runs   (   40.48 ms per token,    24.70 tokens per second)
llama_perf_context_print:       total time =    5280.54 ms /   162 tokens
llama_perf_context_print:    graphs reused =        122


'\n\nA fractured leg during a hiking trip requires prompt medical attention. Here are the necessary precautions and treatment steps:\n\n1. Assess the severity of the injury: Check for signs of open fractures, severe swelling, or inability to move the leg. If the injury appears severe, do not move the person and call for emergency medical help.\n2. Immobilize the leg: Use a splint or a makeshift sling to immobilize the leg to prevent further damage and provide support. Be careful not to apply too much pressure on the injury site.\n3.'

## Data Preparation for RAG

In [17]:
#Libraries for processing dataframes,text
import json,os
import tiktoken
import pandas as pd

#Libraries for Loading Data, Chunking, Embedding, and Vector Databases
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

### Loading the Data

In [18]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [19]:
manual_pdf_path = "/content/drive/MyDrive/medical_diagnosis_manual.pdf"

In [20]:
pdf_loader = PyMuPDFLoader(manual_pdf_path)
manual = pdf_loader.load()

### Data Overview

#### Checking the first 5 pages

In [21]:
for i in range(5):
    print(f"Page Number : {i+1}",end="\n")
    print(manual[i].page_content,end="\n")

Page Number : 1
vviswanadhan@humana.com
XAM85L0U74
nt for personal use by vviswanadhan@hu
shing the contents in part or full is liable
Page Number : 2
vviswanadhan@humana.com
XAM85L0U74
This file is meant for personal use by vviswanadhan@humana.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Page Number : 3
Table of Contents
1
Front    ................................................................................................................................................................................................................
1
Cover    .......................................................................................................................................................................................................
2
Front Matter    .......................................................................................................................................................................................

#### Checking the number of pages

In [22]:
len(manual)

4114

### Data Chunking

In [23]:
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name='cl100k_base',
    chunk_size=256,
    chunk_overlap= 20
)

In [24]:
document_chunks = pdf_loader.load_and_split(text_splitter)
len(document_chunks)

15701

In [25]:
#Checking that the contents are overlapping
document_chunks[0].page_content

'vviswanadhan@humana.com\nXAM85L0U74\nnt for personal use by vviswanadhan@hu\nshing the contents in part or full is liable'

In [26]:
document_chunks[1].page_content

'vviswanadhan@humana.com\nXAM85L0U74\nThis file is meant for personal use by vviswanadhan@humana.com only.\nSharing or publishing the contents in part or full is liable for legal action.'

In [27]:
document_chunks[2].page_content

'Table of Contents\n1\nFront    ................................................................................................................................................................................................................\n1\nCover    .......................................................................................................................................................................................................\n2\nFront Matter    ...........................................................................................................................................................................................\n53\n1 - Nutritional Disorders    ...............................................................................................................................................................\n53\nChapter 1. Nutrition: General Considerations    ...........................................................................................

In [None]:
document_chunks[3].page_content

### Embedding

In [28]:
#Defining the model name
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md: 0.00B [00:00, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

In [29]:
embedding_1 = embedding_model.embed_query(document_chunks[0].page_content)
embedding_2 = embedding_model.embed_query(document_chunks[1].page_content)

In [30]:
print("Dimension of the embedding vector ",len(embedding_1))
len(embedding_1)==len(embedding_2)

Dimension of the embedding vector  384


True

### Vector Database

In [31]:
out_dir = 'medical_db'

if not os.path.exists(out_dir):
  os.makedirs(out_dir)

In [32]:
vectorstore = Chroma.from_documents(
    documents=document_chunks,
    embedding=embedding_model,
    persist_directory=out_dir
)

In [33]:
vectorstore = Chroma(persist_directory=out_dir,embedding_function=embedding_model)

  vectorstore = Chroma(persist_directory=out_dir,embedding_function=embedding_model)


In [34]:
vectorstore.embeddings

HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2', cache_folder=None, model_kwargs={}, encode_kwargs={}, query_encode_kwargs={}, multi_process=False, show_progress=False)

In [35]:
vectorstore.similarity_search("What is the protocol for managing sepsis?",k=5)

[Document(metadata={'keywords': '', 'total_pages': 4114, 'source': '/content/drive/MyDrive/medical_diagnosis_manual.pdf', 'creator': 'Atop CHM to PDF Converter', 'format': 'PDF 1.7', 'file_path': '/content/drive/MyDrive/medical_diagnosis_manual.pdf', 'creationdate': '2012-06-15T05:44:40+00:00', 'author': '', 'modDate': 'D:20251208184942Z', 'page': 2456, 'moddate': '2025-12-08T18:49:42+00:00', 'trapped': '', 'creationDate': 'D:20120615054440Z', 'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition', 'subject': ''}, page_content='benefit assessment is required in other patients with increased risk of serious bleeding (eg, with\nthrombocytopenia or recent GI bleeding, receiving concurrent heparin, or with recent aspirin or other\nanticoagulant use).\nOther emerging therapies for severe sepsis include cooling for hyperthermia and early treatment of renal\nfailure (eg, with continuous venovenous hemofiltration).\nTrials o

### Retriever

In [36]:
retriever = vectorstore.as_retriever(
    search_type='similarity',
    search_kwargs={'k': 3}
)

In [37]:
rel_docs = retriever.invoke("What is the protocol for managing sepsis?")
rel_docs

[Document(metadata={'keywords': '', 'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'moddate': '2025-12-08T18:49:42+00:00', 'format': 'PDF 1.7', 'creationDate': 'D:20120615054440Z', 'file_path': '/content/drive/MyDrive/medical_diagnosis_manual.pdf', 'creationdate': '2012-06-15T05:44:40+00:00', 'total_pages': 4114, 'creator': 'Atop CHM to PDF Converter', 'trapped': '', 'page': 2456, 'subject': '', 'source': '/content/drive/MyDrive/medical_diagnosis_manual.pdf', 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition', 'modDate': 'D:20251208184942Z', 'author': ''}, page_content='benefit assessment is required in other patients with increased risk of serious bleeding (eg, with\nthrombocytopenia or recent GI bleeding, receiving concurrent heparin, or with recent aspirin or other\nanticoagulant use).\nOther emerging therapies for severe sepsis include cooling for hyperthermia and early treatment of renal\nfailure (eg, with continuous venovenous hemofiltration).\nTrials o

In [38]:
model_output = llm(
      "What is the protocol for managing sepsis?",
      max_tokens=512,
      temperature=0.1,
    )

Llama.generate: 1 prefix-match hit, remaining 10 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =      82.25 ms /    10 tokens (    8.23 ms per token,   121.57 tokens per second)
llama_perf_context_print:        eval time =   23442.16 ms /   497 runs   (   47.17 ms per token,    21.20 tokens per second)
llama_perf_context_print:       total time =   24016.70 ms /   507 tokens
llama_perf_context_print:    graphs reused =        481


In [39]:
model_output['choices'][0]['text']

'\n\nSepsis is a life-threatening condition that can arise from an infection, and early recognition and management are crucial for improving outcomes. The following steps outline the general approach to managing sepsis:\n\n1. Recognition: Identify patients who may have sepsis based on clinical signs and symptoms, such as fever, chills, rapid heart rate, rapid breathing, confusion, and low blood pressure. Use the Sequential Organ Failure Assessment (SOFA) score or the Quick Sequential Organ Failure Assessment (qSOFA) score to help identify patients who may have sepsis.\n2. Resuscitation: Begin resuscitation efforts as soon as possible. This includes administering oxygen, intravenous fluids, and vasopressors if necessary to maintain adequate blood pressure and organ perfusion.\n3. Source control: Identify and address the source of the infection, if possible. This may involve drainage of an abscess, debridement of necrotic tissue, or removal of a catheter or other foreign body.\n4. Antibi

### System and User Prompt Template

In [40]:
qna_system_message = "You are an AI medical assistant. Your goal is to answer questions accurately using only the provided context from the medical manual. If the answer is not contained in the context, state that you do not know."

In [41]:
qna_user_message_template = """
### Context:
{context}

### Question:
{question}

### Answer:
"""

### Response Function

In [42]:
def generate_rag_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.invoke(user_input)
    context_list = [d.page_content for d in relevant_document_chunks]

    # Combine document chunks into a single context
    context_for_query = ". ".join(context_list)

    user_message = qna_user_message_template.replace('{context}', context_for_query)
    user_message = user_message.replace('{question}', user_input)

    prompt = qna_system_message + '\n' + user_message

    # Generate the response
    try:
        response = llm(
                  prompt=prompt,
                  max_tokens=max_tokens,
                  temperature=temperature,
                  top_p=top_p,
                  top_k=top_k
                  )

        # Extract and print the model's response
        response = response['choices'][0]['text'].strip()
    except Exception as e:
        response = f'Sorry, I encountered the following error: \n {e}'

    return response

## Question Answering using RAG

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [43]:
user_input = "What is the protocol for managing sepsis in a critical care unit?"
generate_rag_response(user_input,top_k=20)

Llama.generate: 1 prefix-match hit, remaining 931 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =    1137.51 ms /   931 tokens (    1.22 ms per token,   818.45 tokens per second)
llama_perf_context_print:        eval time =    6147.34 ms /   127 runs   (   48.40 ms per token,    20.66 tokens per second)
llama_perf_context_print:       total time =    7365.14 ms /  1058 tokens
llama_perf_context_print:    graphs reused =        122


'The protocol for managing sepsis in a critical care unit involves providing adequate supportive care, including nutrition, prevention of infection, stress ulcers and gastritis, and pulmonary embolism. For patient monitoring and testing, vital signs are measured, and severe sepsis is diagnosed when there is failure of at least one organ. Septic shock is severe sepsis with organ hypoperfusion and hypotension that is poorly responsive to initial fluid resuscitation. Most cases of septic shock are caused by hospital-acquired gram-negative bacilli or gram-positive cocci and'

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [44]:
user_input_2 = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?" #Complete the code to pass the query #2
generate_rag_response(user_input_2)

Llama.generate: 48 prefix-match hit, remaining 919 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =    1126.92 ms /   919 tokens (    1.23 ms per token,   815.50 tokens per second)
llama_perf_context_print:        eval time =    6030.77 ms /   127 runs   (   47.49 ms per token,    21.06 tokens per second)
llama_perf_context_print:       total time =    7225.63 ms /  1046 tokens
llama_perf_context_print:    graphs reused =        122


"The common symptoms for appendicitis include epigastric or periumbilical pain followed by brief nausea, vomiting, and anorexia, which is then followed by pain shifting to the right lower quadrant. The pain increases with cough and motion. Classic signs include right lower quadrant direct and rebound tenderness located at McBurney's point. Additional signs include pain felt in the right lower quadrant with palpation of the left lower quadrant (Rovsing sign), and an increase in pain from passive extension of the right hip joint.\n\nAppendicitis cannot be c"

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [45]:
user_input_2 = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?" #Complete the code to pass the query #3
generate_rag_response(user_input_2)

Llama.generate: 48 prefix-match hit, remaining 882 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =    1038.40 ms /   882 tokens (    1.18 ms per token,   849.38 tokens per second)
llama_perf_context_print:        eval time =    5903.09 ms /   127 runs   (   46.48 ms per token,    21.51 tokens per second)
llama_perf_context_print:       total time =    7017.53 ms /  1009 tokens
llama_perf_context_print:    graphs reused =        122


'The condition you are describing is likely alopecia areata. The effective treatments or solutions for addressing sudden patchy hair loss in alopecia areata include:\n\n1. Topical treatments: Topical corticosteroids, minoxidil, anthralin, or immunotherapy (diphencyprone or squaric acid dibutylester) can be used.\n2. Intralesional treatments: Corticosteroid injections can be administered directly into the affected areas.\n3. Systemic treatments: In severe cases, systemic corticosteroids may'

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [46]:
user_input_2 = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?" #Complete the code to pass the query #4
generate_rag_response(user_input_2)

Llama.generate: 48 prefix-match hit, remaining 858 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =    1022.59 ms /   858 tokens (    1.19 ms per token,   839.04 tokens per second)
llama_perf_context_print:        eval time =    5854.11 ms /   127 runs   (   46.10 ms per token,    21.69 tokens per second)
llama_perf_context_print:       total time =    6943.27 ms /   985 tokens
llama_perf_context_print:    graphs reused =        122


'The context suggests that there is no specific treatment for brain damage, but supportive care is recommended. This includes preventing systemic complications due to immobilization, providing good nutrition, and preventing pressure ulcers. Early intervention by rehabilitation specialists is also important for maximal functional recovery, which includes prevention of secondary disabilities, prevention of pneumonia, and family education. Rehabilitation specialists should evaluate patients to establish baseline findings and reevaluate them before starting rehabilitation therapy to help prioritize treatment. Patients with severe cognitive dysfunction require extensive cognitive therapy, which is often begun immediately after injury and continued for months or years'

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [47]:
user_input_2 = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?" #Complete the code to pass the query #5
generate_rag_response(user_input_2)

Llama.generate: 48 prefix-match hit, remaining 856 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =    1021.17 ms /   856 tokens (    1.19 ms per token,   838.26 tokens per second)
llama_perf_context_print:        eval time =    5811.79 ms /   127 runs   (   45.76 ms per token,    21.85 tokens per second)
llama_perf_context_print:       total time =    6905.67 ms /   983 tokens
llama_perf_context_print:    graphs reused =        122


'The person with a fractured leg should first receive evaluation in the emergency department for any signs or symptoms of ischemia, such as absent pulses, marked pallor, coolness distal to the injury, or severe pain. If the limb is injured, they should be immediately evaluated for these symptoms. The mechanism of the injury, which is the direction of force applied to the bone or joint, may suggest the type of injury. For a fracture, there is usually immediate pain.\n\nThe fracture should be immobilized immediately by splinting to prevent further injury to soft tissues and to decrease pain'

### Fine-tuning

#### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [48]:
user_input = "What is the protocol for managing sepsis in a critical care unit?"
generate_rag_response(user_input,temperature=0.5)

Llama.generate: 48 prefix-match hit, remaining 884 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =    1028.22 ms /   884 tokens (    1.16 ms per token,   859.74 tokens per second)
llama_perf_context_print:        eval time =    5851.56 ms /   127 runs   (   46.08 ms per token,    21.70 tokens per second)
llama_perf_context_print:       total time =    6950.79 ms /  1011 tokens
llama_perf_context_print:    graphs reused =        122


'The protocol for managing sepsis in a critical care unit includes supportive care such as adequate nutrition, prevention of infection, stress ulcers and gastritis, and pulmonary embolism. Patient monitoring and testing involve measuring vital signs and monitoring for signs of organ failure. Severe sepsis is defined as sepsis with failure of at least one organ, and septic shock is severe sepsis with organ hypoperfusion and unresponsive hypotension. Most cases of septic shock are caused by hospital-acquired gram-negative bacilli or gram-positive cocci and'

#### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [49]:
user_input_2 = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
generate_rag_response(user_input_2, temperature = 0.5)

Llama.generate: 48 prefix-match hit, remaining 919 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =    1113.31 ms /   919 tokens (    1.21 ms per token,   825.47 tokens per second)
llama_perf_context_print:        eval time =    5876.23 ms /   127 runs   (   46.27 ms per token,    21.61 tokens per second)
llama_perf_context_print:       total time =    7062.75 ms /  1046 tokens
llama_perf_context_print:    graphs reused =        122


"The common symptoms for appendicitis include epigastric or periumbilical pain followed by brief nausea, vomiting, and anorexia, which is then followed by pain shifting to the right lower quadrant. The pain increases with cough and motion. Classic signs are right lower quadrant direct and rebound tenderness located at McBurney's point. Additional signs are pain felt in the right lower quadrant with palpation of the left lower quadrant (Rovsing sign), an increase in pain from passive extension of the right hip joint, and pain with palpation of the left"

#### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [50]:
user_input_2 = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
generate_rag_response(user_input_2, temperature = 0.5)

Llama.generate: 48 prefix-match hit, remaining 882 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =    1035.63 ms /   882 tokens (    1.17 ms per token,   851.66 tokens per second)
llama_perf_context_print:        eval time =    5915.55 ms /   127 runs   (   46.58 ms per token,    21.47 tokens per second)
llama_perf_context_print:       total time =    7027.45 ms /  1009 tokens
llama_perf_context_print:    graphs reused =        122


'The condition you are describing is likely alopecia areata. The treatment options for alopecia areata include topical, intralesional, or systemic corticosteroids, topical minoxidil, topical anthralin, topical immunotherapy (diphencyprone or squaric acid dibutylester), or psoralen plus ultraviolet A (PUVA). The underlying cause of alopecia areata is nonspecific inflammation. It is important to note that the most common cause of alopecia is androgenetic al'

#### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [51]:
user_input_2 = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
generate_rag_response(user_input_2, temperature = 0.5)

Llama.generate: 48 prefix-match hit, remaining 858 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =    1035.56 ms /   858 tokens (    1.21 ms per token,   828.53 tokens per second)
llama_perf_context_print:        eval time =    5980.86 ms /   127 runs   (   47.09 ms per token,    21.23 tokens per second)
llama_perf_context_print:       total time =    7087.94 ms /   985 tokens
llama_perf_context_print:    graphs reused =        122


'The context suggests that there is no specific treatment for brain damage other than supportive care. Supportive care includes preventing systemic complications such as pneumonia, UTI, and thromboembolic disease, providing good nutrition, and preventing pressure ulcers. Early intervention by rehabilitation specialists is also recommended for maximal functional recovery, which includes prevention of secondary disabilities, and evaluation and reevaluation of patients to establish baseline findings and prioritize treatment. Patients with severe cognitive dysfunction require extensive cognitive therapy. For mild injuries, discharge and observation are recommended. For moderate and severe injuries, optimization of ventilation,'

#### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [52]:
user_input_2 = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
generate_rag_response(user_input_2, temperature = 0.5)

Llama.generate: 48 prefix-match hit, remaining 856 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =    1032.80 ms /   856 tokens (    1.21 ms per token,   828.81 tokens per second)
llama_perf_context_print:        eval time =    5980.89 ms /   127 runs   (   47.09 ms per token,    21.23 tokens per second)
llama_perf_context_print:       total time =    7095.87 ms /   983 tokens
llama_perf_context_print:    graphs reused =        122


'The first priority for a person with a fractured leg, as with any injury, is to ensure their safety and prevent further harm. This may involve immobilizing the leg with a splint to prevent additional injury to soft tissues and decrease pain. For long-bone fractures, splinting can also help prevent fat embolism.\n\nIf the fracture involves an artery, the person may require surgical repair. However, if the injury only affects small arteries with good collateral circulation, no repair may be necessary. For nerve injuries, observation, supportive measures, and sometimes physical therapy may be the'

## Output Evaluation

Let us now use the LLM-as-a-judge method to check the quality of the RAG system on two parameters - retrieval and generation. We illustrate this evaluation based on the answeres generated to the question from the previous section.

- We are using the same Mistral model for evaluation, so basically here the llm is rating itself on how well he has performed in the task.

In [53]:
groundedness_rater_system_message = "You are a rater. Your task is to check if the answer is based on the provided context. If the answer is fully supported by the context, respond with 'Grounded'. If the answer contains information not found in the context, respond with 'Not Grounded'."

In [54]:
relevance_rater_system_message = "You are a rater. Your task is to check if the answer is relevant to the user question. If the answer directly addresses the question, respond with 'Relevant'. If it does not, respond with 'Not Relevant'."

In [55]:
user_message_template = """
###Question
{question}

###Context
{context}

###Answer
{answer}
"""

In [56]:
def generate_ground_relevance_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.invoke(user_input)
    context_list = [d.page_content for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)

    # Combine user_prompt and system_message to create the prompt
    prompt = f"""[INST]{qna_system_message}\n
                {'user'}: {qna_user_message_template.format(context=context_for_query, question=user_input)}
                [/INST]"""

    response = llm(
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    answer =  response["choices"][0]["text"]

    # Combine user_prompt and system_message to create the prompt
    groundedness_prompt = f"""[INST]{groundedness_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    # Combine user_prompt and system_message to create the prompt
    relevance_prompt = f"""[INST]{relevance_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    response_1 = llm(
            prompt=groundedness_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    response_2 = llm(
            prompt=relevance_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    return response_1['choices'][0]['text'],response_2['choices'][0]['text']

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [57]:
ground,rel = generate_ground_relevance_response(user_input="What is the protocol for managing sepsis in a critical care unit?",max_tokens=370)

print(ground,end="\n\n")
print(rel)

Llama.generate: 1 prefix-match hit, remaining 947 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =    1174.85 ms /   947 tokens (    1.24 ms per token,   806.06 tokens per second)
llama_perf_context_print:        eval time =   11158.40 ms /   242 runs   (   46.11 ms per token,    21.69 tokens per second)
llama_perf_context_print:       total time =   12506.63 ms /  1189 tokens
llama_perf_context_print:    graphs reused =        233
Llama.generate: 6 prefix-match hit, remaining 1198 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =    1461.34 ms /  1198 tokens (    1.22 ms per token,   819.80 tokens per second)
llama_perf_context_print:        eval time =    3395.44 ms /    70 runs   (   48.51 ms per token,    20.62 tokens per second)
llama_perf_context_print:       total time =    4891.91 ms /  1268 tokens
llama_perf_context_print:    

 Grounded. The answer is fully supported by the context, which discusses the approach to managing sepsis in a critical care unit, including supportive care measures, patient monitoring, and treatment options. The answer accurately summarizes the information provided in the context regarding the diagnosis and treatment of sepsis, as well as the causes and risk factors.

 Relevant. The user asked about the protocol for managing sepsis in a critical care unit, and the answer provided information on the diagnosis, causes, symptoms, and treatment of sepsis, as well as the role of supportive care and antibiotics in managing the condition. The answer also mentioned the importance of adapting guidelines to individual patient circumstances, which is relevant to the user's question.


### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [58]:
ground,rel = generate_ground_relevance_response(user_input="What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?",max_tokens=370)

print(ground,end="\n\n")
print(rel)

Llama.generate: 6 prefix-match hit, remaining 977 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =    1204.19 ms /   977 tokens (    1.23 ms per token,   811.33 tokens per second)
llama_perf_context_print:        eval time =   11195.08 ms /   224 runs   (   49.98 ms per token,    20.01 tokens per second)
llama_perf_context_print:       total time =   12568.82 ms /  1201 tokens
llama_perf_context_print:    graphs reused =        216
Llama.generate: 6 prefix-match hit, remaining 1215 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =    1489.38 ms /  1215 tokens (    1.23 ms per token,   815.78 tokens per second)
llama_perf_context_print:        eval time =    3487.16 ms /    71 runs   (   49.11 ms per token,    20.36 tokens per second)
llama_perf_context_print:       total time =    5010.96 ms /  1286 tokens
llama_perf_context_print:    

 Grounded. The context clearly states that the symptoms of appendicitis include epigastric or periumbilical pain followed by pain shifting to the right lower quadrant, as well as other signs mentioned. It also states that the treatment for appendicitis is surgical removal, and that antibiotics are used before and after the surgery.

 Relevant. The answer directly addresses the user's question about the common symptoms of appendicitis and the treatment options, including the fact that it cannot be cured via medicine alone and the standard treatment is surgical removal.


### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [59]:
ground,rel = generate_ground_relevance_response(user_input="What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?",max_tokens=370)

print(ground,end="\n\n")
print(rel)

Llama.generate: 6 prefix-match hit, remaining 940 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =    1131.77 ms /   940 tokens (    1.20 ms per token,   830.56 tokens per second)
llama_perf_context_print:        eval time =    8015.55 ms /   170 runs   (   47.15 ms per token,    21.21 tokens per second)
llama_perf_context_print:       total time =    9251.44 ms /  1110 tokens
llama_perf_context_print:    graphs reused =        164
Llama.generate: 6 prefix-match hit, remaining 1124 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =    1341.88 ms /  1124 tokens (    1.19 ms per token,   837.63 tokens per second)
llama_perf_context_print:        eval time =    4179.77 ms /    90 runs   (   46.44 ms per token,    21.53 tokens per second)
llama_perf_context_print:       total time =    5571.15 ms /  1214 tokens
llama_perf_context_print:    

 Grounded. The context clearly states that alopecia areata is a type of sudden patchy hair loss and provides a list of treatment options for it, which includes the ones mentioned in the answer. The answer also correctly identifies alopecia areata as a nonscarring alopecia and mentions that the cause is not well understood but believed to be an autoimmune disorder, which is consistent with the context.

 Relevant. The answer directly addresses the user question by providing information about the causes and effective treatments for alopecia areata, which is a type of sudden patchy hair loss.


### Query 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [60]:
ground,rel = generate_ground_relevance_response(user_input="What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?",max_tokens=370)

print(ground,end="\n\n")
print(rel)

Llama.generate: 6 prefix-match hit, remaining 916 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =    1107.79 ms /   916 tokens (    1.21 ms per token,   826.87 tokens per second)
llama_perf_context_print:        eval time =    6049.98 ms /   132 runs   (   45.83 ms per token,    21.82 tokens per second)
llama_perf_context_print:       total time =    7229.14 ms /  1048 tokens
llama_perf_context_print:    graphs reused =        127
Llama.generate: 6 prefix-match hit, remaining 1062 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =    1269.40 ms /  1062 tokens (    1.20 ms per token,   836.62 tokens per second)
llama_perf_context_print:        eval time =    2149.26 ms /    47 runs   (   45.73 ms per token,    21.87 tokens per second)
llama_perf_context_print:       total time =    3446.15 ms /  1109 tokens
llama_perf_context_print:    

 Grounded. The answer is fully supported by the context, which mentions supportive care, preventing systemic complications, providing good nutrition, preventing pressure ulcers, and early intervention by rehabilitation specialists for patients with brain damage.

 Relevant. The answer directly addresses the question by explaining the recommended treatments for a person with brain damage, which include supportive care and early intervention by rehabilitation specialists.


### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [61]:
ground,rel = generate_ground_relevance_response(user_input="What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?",max_tokens=370)

print(ground,end="\n\n")
print(rel)

Llama.generate: 6 prefix-match hit, remaining 914 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =    1104.08 ms /   914 tokens (    1.21 ms per token,   827.84 tokens per second)
llama_perf_context_print:        eval time =   16913.13 ms /   369 runs   (   45.84 ms per token,    21.82 tokens per second)
llama_perf_context_print:       total time =   18325.88 ms /  1283 tokens
llama_perf_context_print:    graphs reused =        356
Llama.generate: 6 prefix-match hit, remaining 1298 prompt tokens to eval
llama_perf_context_print:        load time =     649.74 ms
llama_perf_context_print: prompt eval time =    1576.96 ms /  1298 tokens (    1.21 ms per token,   823.10 tokens per second)
llama_perf_context_print:        eval time =    2372.04 ms /    50 runs   (   47.44 ms per token,    21.08 tokens per second)
llama_perf_context_print:       total time =    3975.49 ms /  1348 tokens
llama_perf_context_print:    

 Grounded. The answer is fully supported by the context, which discusses the evaluation, immobilization, treatment, and rehabilitation of leg fractures, as well as the risk of skin breakdown and the importance of proper prosthetic fitting.

 Relevant. The answer directly addresses the user question by providing necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, as well as considerations for their care and recovery.


## Actionable Insights and Business Recommendations


*   Enhanced Diagnostic Efficiency: The RAG system successfully retrieves relevant medical protocols (e.g., for sepsis and fractures) and generates grounded answers. This proves that AI can significantly reduce the time healthcare professionals spend searching through 4,000+ pages of manuals, directly addressing the "information overload" challenge.
*  High Reliability for Decision Support: The "LLM-as-a-judge" evaluation consistently rated the system's responses as both "Grounded" and "Relevant". This demonstrates that the system minimizes hallucinations and provides trustworthy information, which is critical for standardizing care and ensuring patient safety.
* Scalable Knowledge Access: The prototype confirms that complex medical knowledge can be democratized and accessed instantly via natural language queries. This suggests that deploying such a tool across healthcare centers could streamline operations and ensure all staff have immediate access to the latest standard operating procedures.



<font size=6 color='blue'>Power Ahead</font>
___



*   Deploying as a Web Application: The next step is to wrap this backend logic into a user-friendly interface. This would allow doctors and nurses to interact with the system via a simple chat window on their tablets or hospital computers.
*   Integrating Advanced RAG Techniques: To further improve accuracy, we could implement Hybrid Search (combining keyword search with semantic search) and Re-ranking. Adding a re-ranker step would double-check the retrieved documents to ensure the absolute most relevant medical protocols are passed to the LLM.
* Multimodal Capabilities: Future iterations could incorporate multimodal LLMs (like Llama 3.2 Vision or GPT-4o) to allow the system to interpret not just text queries, but also medical images (X-rays, MRI scans) alongside the textual protocols

