## Problem Statement

### Business Context

The healthcare industry is rapidly evolving, with professionals facing increasing challenges in managing vast volumes of medical data while delivering accurate and timely diagnoses. The need for quick access to comprehensive, reliable, and up-to-date medical knowledge is critical for improving patient outcomes and ensuring informed decision-making in a fast-paced environment.

Healthcare professionals often encounter information overload, struggling to sift through extensive research and data to create accurate diagnoses and treatment plans. This challenge is amplified by the need for efficiency, particularly in emergencies, where time-sensitive decisions are vital. Furthermore, access to trusted, current medical information from renowned manuals and research papers is essential for maintaining high standards of care.

To address these challenges, healthcare centers can focus on integrating systems that streamline access to medical knowledge, provide tools to support quick decision-making, and enhance efficiency. Leveraging centralized knowledge platforms and ensuring healthcare providers have continuous access to reliable resources can significantly improve patient care and operational effectiveness.

**Common Questions to Answer**

**1. Diagnostic Assistance**: "What are the common symptoms and treatments for pulmonary embolism?"

**2. Drug Information**: "Can you provide the trade names of medications used for treating hypertension?"

**3. Treatment Plans**: "What are the first-line options and alternatives for managing rheumatoid arthritis?"

**4. Specialty Knowledge**: "What are the diagnostic steps for suspected endocrine disorders?"

**5. Critical Care Protocols**: "What is the protocol for managing sepsis in a critical care unit?"

### Objective

As an AI specialist, your task is to develop a RAG-based AI solution using renowned medical manuals to address healthcare challenges. The objective is to **understand** issues like information overload, **apply** AI techniques to streamline decision-making, **analyze** its impact on diagnostics and patient outcomes, **evaluate** its potential to standardize care practices, and **create** a functional prototype demonstrating its feasibility and effectiveness.

### Data Description

The **Merck Manuals** are medical references published by the American pharmaceutical company Merck & Co., that cover a wide range of medical topics, including disorders, tests, diagnoses, and drugs. The manuals have been published since 1899, when Merck & Co. was still a subsidiary of the German company Merck.

The manual is provided as a PDF with over 4,000 pages divided into 23 sections.

## Installing and Importing Necessary Libraries and Dependencies

In [None]:
# Installation for GPU llama-cpp-python
# uncomment and run the following code in case GPU is being used
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q

# Installation for CPU llama-cpp-python
# uncomment and run the following code in case GPU is not being used
# !CMAKE_ARGS="-DLLAMA_CUBLAS=off" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q

[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/1.8 MB[0m [31m?[0m eta [36m-:--:--[0m[2K     [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━━━━━[0m [32m1.4/1.8 MB[0m [31m41.9 MB/s[0m eta [36m0:00:01[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.8/1.8 MB[0m [31m49.9 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m62.1/62.1 kB[0m [31m221.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m45.5/45.5 kB[0m [31m221.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m16.6/16.6 MB[0m [31m261.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m44.6/44.6 kB[0m [31m199.1 MB/

**Note**:
- After running the above cell, kindly restart the runtime (for Google Colab) or notebook kernel (for Jupyter Notebook), and run all cells sequentially from the next cell.
- On executing the above line of code, you might see a warning regarding package dependencies. This error message can be ignored as the above code ensures that all necessary libraries and their dependencies are maintained to successfully execute the code in ***this notebook***.

In [None]:
# For installing the libraries & downloading models from HF Hub
!pip install huggingface_hub pandas tiktoken pymupdf langchain langchain-community chromadb sentence-transformers numpy -q

[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/67.3 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m67.3/67.3 kB[0m [31m4.7 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m24.1/24.1 MB[0m [31m110.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.5/2.5 MB[0m [31m101.9 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m20.8/20.8 MB[0m [31m129.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m278.2/278.2 kB[0m [31m26.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.0/2.0 MB[0m [31m96.6 MB/s[0m eta [36m0:00:

**Note**:
- After running the above cell, kindly restart the runtime (for Google Colab) or notebook kernel (for Jupyter Notebook), and run all cells sequentially from the next cell.
- On executing the above line of code, you might see a warning regarding package dependencies. This error message can be ignored as the above code ensures that all necessary libraries and their dependencies are maintained to successfully execute the code in ***this notebook***.

In [None]:
#Libraries for processing dataframes,text
import json,os
import tiktoken
import pandas as pd

#Libraries for Loading Data, Chunking, Embedding, and Vector Databases
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

#Libraries for downloading and loading the llm
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

## Question Answering using LLM

#### Downloading and Loading the model





#### Downloading and Loading chat model

In [None]:
# Defining the Hugging Face repository and model version for Mistral-7B fine-tuned for instruction-following
model_name_or_path = 'TheBloke/Mistral-7B-Instruct-v0.2-GGUF'

# Specifying the file name for the quantized Mistral-7B model in GGUF format (Q6_K for optimal performance)
model_basename = 'mistral-7b-instruct-v0.2.Q6_K.gguf'


## Observation:
Downloaded the mistral model. Will be using mistral model for further analysis.


In [None]:
# Downloading the specified model file from Hugging Face Hub and store its local path
model_path = hf_hub_download(
    repo_id=model_name_or_path, #The Hugging Face repository containing the model
    filename=model_basename  # The specific model file to download (in GGUF format)
)
#The GGUF format is used because it provides memory-efficient storage and faster inference while maintaining compatibility across different hardware platforms.


mistral-7b-instruct-v0.2.Q6_K.gguf:   0%|          | 0.00/5.94G [00:00<?, ?B/s]

In [None]:
# Loading the LLaMA model with specified context, GPU layers, and batch size
llm = Llama(
    model_path=model_path, #Path to the GGUF model file
    n_ctx=4096, #Sets the context window to 2300 tokens (how much text the model can "see" at once)
    n_gpu_layers=38, #Loads 38 model layers onto GPU for faster inference (set to 0 for CPU-only)
    n_batch=256, #Number of tokens processed at
    use_mmap = False

)

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 


In [None]:
llm

<llama_cpp.llama.Llama at 0x798047b08f50>

In [None]:
model_path

'/root/.cache/huggingface/hub/models--TheBloke--Mistral-7B-Instruct-v0.2-GGUF/snapshots/3a6fbf4a41a1d52e415a4958cde6856d34b2db93/mistral-7b-instruct-v0.2.Q6_K.gguf'

In [None]:
def response1(query,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    model_output = llm(
      prompt=query,
      max_tokens=max_tokens,
      temperature=temperature,
      top_p=top_p,
      top_k=top_k
    )

    return model_output['choices'][0]['text']

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
response1('What is the protocol for managing sepsis in a critical care unit?')


'\n\nSepsis is a life-threatening condition that can arise from an infection, and it requires prompt recognition and aggressive management in a critical care unit. The following are general steps for managing sepsis in a critical care unit:\n\n1. Early recognition: Recognize the signs and symptoms of sepsis early and initiate treatment as soon as possible. Sepsis can present with various clinical features, including fever or hypothermia, tachycardia or bradycardia, altered mental status, respiratory distress, and lactic acidosis.\n2. Resusc'

**Observation:**
The response generated by the LLM is detailed and medically appropriate regarding the management of sepsis in a critical care unit.

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [None]:
response1('Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?')

Llama.generate: prefix-match hit


'\n\nAppendicitis is a medical condition characterized by inflammation of the appendix, a small pouch that extends from the large intestine. The following are common symptoms of appendicitis:\n\n1. Abdominal pain: The pain typically starts around the navel area and then shifts to the lower right side of the abdomen. It may be constant or come and go, and it can range from mild to severe.\n2. Loss of appetite: People with appendicitis often lose their appetite due to abdominal pain and discomfort.\n3. Nausea and vomiting:'

**Observations:**

The answer, similar to the first query, is medically accurate, demonstrating that the model is performing effectively.

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
response1('Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?')

Llama.generate: prefix-match hit


"\n\nSudden patchy hair loss, also known as alopecia areata, is a common autoimmune disorder that affects the hair follicles. The exact cause of alopecia areata is not fully understood, but it's believed to be related to an abnormal immune response that attacks the hair follicles, leading to hair loss.\n\nThere are several treatments and solutions for addressing sudden patchy hair loss:\n\n1. Corticosteroids: These are anti-inflammatory medications that can help reduce inflammation and suppress the immune system's attack on the"

**Observations:**

The answer is medically appropriate and lists effective treatements for sudden patchy hair loss.

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
response1('Query 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?')

Llama.generate: prefix-match hit


"\n\nAnswer: Treatment for a person with a brain injury depends on the severity and location of the injury. Here are some common treatments that may be recommended:\n\n1. Emergency care: If the brain injury is severe, the first priority is to provide emergency care to stabilize the patient's condition. This may include surgery to remove hematomas or other obstructions, administering medications to prevent further damage, and providing oxygen and fluids to maintain vital signs.\n2. Rehabilitation: Rehabilitation is an essential part of treatment for brain injuries. It can help improve function, reduce symptoms"

**Observation**
The answer provided is medically appropriate and offers a comprehensive list of common treatments for brain injury, which aligns well with the query.


### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [None]:
response1('Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?')

Llama.generate: prefix-match hit


'\n\nA fractured leg is a serious injury that requires prompt medical attention. Here are some necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip:\n\n1. Assess the severity of the injury: Check if there is any open wound, swelling, or deformity in the leg. If the bone is protruding through the skin, do not try to move the person and call for emergency medical help immediately.\n2. Provide first aid: Apply a sterile dressing to any wounds and immobilize the leg using a splint or a makes'

** Observations **

This answer is comprehensive and medically appropriate, outlining the necessary precautions and treatment steps for a person who has fractured their leg while hiking.


## Strengths and Limitation of Mistral 7B model in this scenario

**Strengths**

Comprehensive Responses: The Mistral 7B model provides detailed and medically appropriate answers, covering a broad spectrum of information.

Structured Answer Format: The model outputs answers in a clear, structured format, using numbered lists to break down the response into digestible steps.

Contextual Relevance: The model generates contextually relevant and precise information based on the query, addressing each aspect of the question.

**Limitations**

Incomplete Responses: All of the answers provided by the this model are cut off mid-sentence (e.g., the answer about treatments for traumatic brain injury or fractures), suggesting that the model might be truncating its output due to the token limits. This results in incomplete answers, which could be problematic for users seeking thorough information.

Generic Responses: While the answers are medically appropriate in structure, some responses are quite general (e.g., treatment steps or protocols for common injuries). This could imply that the model might need further fine-tuning or more specific context to generate highly detailed, situation-specific answers.

Need for Contextual Depth: The model tends to provide useful but somewhat high-level responses. This could make the answers seem more like generic guidelines rather than tailored advice for specific scenarios.
**Next Steps**
Therefore, we will leverage prompt engineering to enhance the quality of our responses.

## Question Answering using LLM with Prompt Engineering

In [None]:
## Write the code for promopt engineering function
def generate_llama_response(query, instruction="Answer the following question clearly and concisely."):

    # Create system message with instructions for the model
    system_message = f"[INST]<<SYS>>\n{instruction}\n<</SYS>>[/INST]"

    # Construct the final prompt using the user's query and system message
    prompt = f"{system_message} {query}"

    # Generate a response using the LLaMA model
    response = llm(
        prompt=prompt,
        max_tokens=1024,
        temperature=0.7,
        top_p=0.95,
        repeat_penalty=1.1,
        top_k=50,
        stop=['</s>'],
        echo=False,
        )

    # Return only the generated answer text
    return response["choices"][0]["text"].strip()

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
generate_llama_response('Query 1: What is the protocol for managing sepsis in a critical care unit?')

"Response: The management of sepsis in a critical care unit involves a systematic approach known as the Surviving Sepsis Campaign (SSC) guidelines. Here are the key steps:\n\n1. Early recognition and identification: Assess patients with suspected infection and organize for septic workup, including blood cultures and lactate measurement.\n2. Fluid resuscitation: Initiate intravenous fluid replacement with isotonic crystalloids to maintain mean arterial pressure (MAP) ≥65 mm Hg and urine output >0.5 ml/kg/h.\n3. Antimicrobial therapy: Administer broad-spectrum antibiotics based on the suspected source of infection as soon as possible (ideally within 1 hour).\n4. Source control: Address the underlying source of infection, such as surgical debridement, drainage, or endovascular repair.\n5. Vasopressor and/or inotropic support: If necessary to maintain MAP >65 mm Hg, administer vasopressors or inotropes as recommended by the SSC guidelines.\n6. Corticosteroids: Consider using corticosteroid

**Modified parameters - Temperature =0.0

In [None]:
def generate_llama_response1(query, instruction="Answer the following question clearly and concisely."):

    # Create system message with instructions for the model
    system_message = f"[INST]<<SYS>>\n{instruction}\n<</SYS>>[/INST]"

    # Construct the final prompt using the user's query and system message
    prompt = f"{system_message} {query}"

    # Generate a response using the LLaMA model
    response = llm(
        prompt=prompt,
        max_tokens=1024,
        temperature=0.0,
        top_p=0.95,
        repeat_penalty=1.1,
        top_k=50,
        stop=['</s>'],
        echo=False,
        )

    # Return only the generated answer text
    return response["choices"][0]["text"].strip()

In [None]:
generate_llama_response1('Query 1: What is the protocol for managing sepsis in a critical care unit?')

Llama.generate: prefix-match hit


'Response: The management of sepsis in a critical care unit involves a systematic approach based on current guidelines from organizations such as the Surviving Sepsis Campaign (SSC). Here are the key steps:\n\n1. Early recognition and diagnosis: Identify sepsis suspects using clinical criteria, such as the Sequential [Sepsis-related] Organ Failure Assessment (SOFA) score or Quick Sequential [Sepsis-related] Organ Failure Assessment (qSOFA) score.\n\n2. Early goal-directed therapy: Initiate resuscitation with intravenous fluids and vasopressors to maintain adequate tissue perfusion, as indicated by a mean arterial pressure ≥65 mmHg and a central venous oxygen saturation (ScvO2) >70%.\n\n3. Source control: Identify and address the source of infection, such as removing catheters or draining abscesses.\n\n4. Antimicrobial therapy: Administer broad-spectrum antimicrobials based on suspected pathogens and local resistance patterns.\n\n5. Corticosteroids: Consider using corticosteroids in spe

**Observation**

Answer 1, generated with a temperature of 0.7, is more comprehensive and nuanced, offering a clear step-by-step clinical protocol with specific metrics (e.g., MAP ≥ 65 mmHg, urine output ≥ 0.5 ml/kg/h). This reflects a more dynamic and human-like response, likely enhanced by the higher temperature which allows the model to explore a broader range of relevant tokens, contributing to richer content.

Answer 2, with temperature 0.0, is more conservative and deterministic, providing keysteps, sticking closely to the most probable next token at each step. Its not very human like conversational text which a doctor may give to patients

Overall, Answer 1 is superior in terms of completeness, clinical clarity, and usefulness for decision-making in a critical care setting.

Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

** Basic parameter evaluation **

In [None]:
generate_llama_response('What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?')

Llama.generate: prefix-match hit


'The common symptoms for appendicitis include:\n1. Sudden pain in the lower right abdomen that may begin as mild but quickly worsens\n2. Loss of appetite\n3. Nausea and vomiting\n4. Fever, which may start after a few hours or days of having other symptoms\n5. Abdominal swelling and rigidity\n6. Constipation or diarrhea\n7. Inability to pass gas\n8. Pain upon walking or moving around\n\nAppendicitis cannot be cured via medicine alone as the appendix will eventually rupture if left untreated, leading to severe complications such as peritonitis and sepsis. The standard treatment for appendicitis is a surgical procedure called an appendectomy, which involves removing the infected appendix before it ruptures. This surgery can be performed through open incision or laparoscopically. Early diagnosis and prompt surgical intervention are crucial in preventing complications and ensuring the best possible outcome.'

**Modified Parameters - Top-p = 0.85 **

In [None]:
## Write the code for promopt engineering function
def generate_llama_response2(query, instruction="Answer the following question clearly and concisely."):

    # Create system message with instructions for the model
    system_message = f"[INST]<<SYS>>\n{instruction}\n<</SYS>>[/INST]"

    # Construct the final prompt using the user's query and system message
    prompt = f"{system_message} {query}"

    # Generate a response using the LLaMA model
    response = llm(
        prompt=prompt,
        max_tokens=1024,
        temperature=0.7,
        top_p=0.85,
        repeat_penalty=1.1,
        top_k=50,
        stop=['</s>'],
        echo=False,
        )

    # Return only the generated answer text
    return response["choices"][0]["text"].strip()


In [None]:
generate_llama_response2('### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?')

Llama.generate: prefix-match hit


"Appendicitis is a medical condition characterized by inflammation of the appendix. Common symptoms include:\n\n1. Sudden pain in the lower right abdomen, which may begin as mild and then progressively worsen over several hours.\n2. Loss of appetite.\n3. Nausea and vomiting.\n4. Abdominal swelling.\n5. Fever (often a temperature above 37.8°C or 100°F).\n6. Constipation or diarrhea.\n\nAppendicitis cannot be cured through medication alone, as the inflammation and potential infection can only be effectively treated by surgically removing the appendix (an appendectomy). This procedure is typically performed as an emergency surgery due to the risk of rupture, which could lead to peritonitis – a potentially life-threatening condition. If you suspect that you may have appendicitis, it's crucial to seek medical attention immediately."

**Observations on Query 2**

Answer 1, generated with a top_p of 0.95, allows the model to consider a wider range of possible next tokens, leading to slightly more diverse and expansive content. The answer is informative but a bit general in structure.

Answer 2, generated with a top_p = 0.85, that is a narrower top_p, the model restricts its output to more probable and focused responses.

Overall, Answer 2 is better because the lower top_p value helps the model generate more focused and relevant content, minimizing unnecessary variation.



### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

** Basic Parameter **

In [None]:
generate_llama_response('What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it')

Llama.generate: prefix-match hit


'?\n\nSudden patchy hair loss, also known as alopecia areata, is an autoimmune disorder that causes hair loss in small patches on the scalp or other areas of the body. There are several treatments for addressing this condition:\n\n1. Corticosteroids: Topical and injectable corticosteroids are the most commonly used treatments for alopecia areata. They help to suppress the immune system response, reducing inflammation and promoting hair regrowth.\n2. Minoxidil: Minoxidil is a medication that has been shown to promote hair growth in some people with alopecia areata. It is available as a topical solution or foam.\n3. DHT blockers: Finasteride and dutasteride are medications that block the production of dihydrotestosterone (DHT), which is believed to contribute to hair loss. Some people with alopecia areata have found these medications helpful, but they may not be effective for everyone.\n4. Hair transplantation: In severe cases of alopecia areata, hair transplantation may be an option. Th

** Modified parameters - Top_k = 80

In [None]:
## Write the code for promopt engineering function
def generate_llama_response3(query, instruction="Answer the following question clearly and concisely."):

    # Create system message with instructions for the model
    system_message = f"[INST]<<SYS>>\n{instruction}\n<</SYS>>[/INST]"

    # Construct the final prompt using the user's query and system message
    prompt = f"{system_message} {query}"

    # Generate a response using the LLaMA model
    response = llm(
        prompt=prompt,
        max_tokens=1024,
        temperature=0.7,
        top_p=0.95,
        repeat_penalty=1.1,
        top_k=80,
        stop=['</s>'],
        echo=False,
        )

    # Return only the generated answer text
    return response["choices"][0]["text"].strip()


In [None]:
generate_llama_response2('Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it')

Llama.generate: prefix-match hit


"?\n\nResponse:\nSudden patchy hair loss can be alarming, but there are several potential causes and treatments. Here are some common explanations and corresponding solutions:\n\n1. Alopecia Areata: This is an autoimmune disorder causing sudden hair loss in small patches on the scalp, face, or other areas of the body. There is no cure for alopecia areata, but treatments such as corticosteroids, immunotherapy (Minoxidil), or light therapy can help stimulate hair growth.\n\n2. Traction Alopecia: This condition results from excessive pulling or tension on the hair, often due to tight hairstyles like ponytails, braids, weaves, or extensions. To prevent further damage, avoid using tight hairstyles and consider gentler alternatives, such as loose braids or twists.\n\n3. Nutritional Deficiencies: Lack of essential nutrients like iron, zinc, vitamin B12, and biotin can cause hair loss. Incorporate a balanced diet rich in these nutrients or consider supplements under the guidance of a healthcar

**Observations on Query 3**

Answer 1, generated with a top_k of 50, provides a broader overview by listing multiple causes of patchy hair loss and briefly mentioning treatment options for each, which makes it informative but slightly general.

Answer 2, with top_k of 80, dives deeper into specific treatments for different types of hair loss, especially Alopecia Areata and Traction Alopecia, offering more clinical detail and therapeutic options.

Overall, in the context of this project, Answer 2 is better because it shows more medical depth and actionable insights.



### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

** Basic Parameters**

In [None]:
generate_llama_response('What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?')

Llama.generate: prefix-match hit


'The treatment for a brain injury can vary depending on the severity and location of the injury. Here are some common interventions:\n\n1. Medical intervention: This includes addressing any life-threatening conditions such as respiratory failure, seizures, or infections.\n2. Rehabilitation: Physical therapy, occupational therapy, speech therapy, and cognitive rehabilitation can help improve physical function and cognitive abilities.\n3. Medications: Depending on the specific symptoms, medications may be prescribed to manage conditions such as seizures, pain, depression, or memory loss.\n4. Surgery: In some cases, surgery may be required to remove blood clots or repair skull fractures.\n5. Assistive devices and technology: Devices such as wheelchairs, communication aids, and assistive technology can help improve function and independence.\n6. Support groups: Joining support groups with others who have experienced similar injuries can provide emotional support and practical advice.\n7. L

** Max_tokens 512

In [None]:
def generate_llama_response4(query, instruction="Answer the following question clearly and concisely."):

    # Create system message with instructions for the model
    system_message = f"[INST]<<SYS>>\n{instruction}\n<</SYS>>[/INST]"

    # Construct the final prompt using the user's query and system message
    prompt = f"{system_message} {query}"

    # Generate a response using the LLaMA model
    response = llm(
        prompt=prompt,
        max_tokens=512,
        temperature=0.7,
        top_p=0.95,
        repeat_penalty=1.1,
        top_k=50,
        stop=['</s>'],
        echo=False,

    )

    # Return only the generated answer text
    return response["choices"][0]["text"].strip()

In [None]:
generate_llama_response4('### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?')


Llama.generate: prefix-match hit


"The treatment for a brain injury depends on the severity and location of the injury. Here are some common interventions:\n\n1. Initial care: This includes addressing any life-threatening conditions such as airway obstruction, breathing problems, or excessive bleeding. Brain injuries may require emergency surgery to remove hematomas (clots) or contusions (bruises).\n\n2. Medications: Various medications can be used to manage symptoms and prevent complications. These include pain relievers, anti-inflammatory drugs, anticonvulsants for seizure prevention, and drugs that enhance brain function or protect against further damage.\n\n3. Rehabilitation: Rehabilitation plays a crucial role in helping the person regain lost functions and improve their overall quality of life. This may include physical therapy to help with motor skills, speech therapy for communication issues, occupational therapy to learn new skills, and cognitive rehabilitation to address problems with memory, attention, and p

** Observations on Query 4 **

Answer 1, generated with a Max_tokens of 1024, provides a comprehensive overview of brain injury treatment, listing specific interventions such as medications, surgery, rehabilitation, and supportive care, covering a wide range of recovery aspects, including physical, cognitive, and emotional support.

Answer 2, with Max_tokens of 512, is slightly more concise, focusing on key areas like initial care, rehabilitation, medications, and supportive care. It emphasizes rehabilitation methods and includes mention of surgery when necessary for specific issues such as hematomas or clots.

Overall, Answer 1 is better as it offers a more detailed and well-rounded view, covering additional aspects. With more tokens, the model can cover additional aspects of the topic, giving a fuller answer.



##Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

**Basic Parameters**

In [None]:
generate_llama_response('What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?')

Llama.generate: prefix-match hit


"First, it's crucial to ensure that the person's airway is clear and they have no signs of shock. If there are any signs of shock such as rapid heartbeat, low blood pressure, or confusion, they should receive immediate medical attention.\n\nFor a suspected leg fracture, the following steps should be taken:\n1. Immobilize the injured leg by using a splint or sling to prevent further damage and keep the bone in place.\n2. Apply ice to reduce swelling and pain.\n3. Elevate the injured leg above heart level to minimize swelling and promote healing.\n4. Provide pain relief with over-the-counter medication or prescription painkillers as needed.\n5. Seek medical attention as soon as possible, even if it means hiking out or calling for rescue services.\n\nDuring transport, the person should remain still to prevent any movement that could worsen the fracture. Once at a hospital, the fracture will be diagnosed through X-rays and treated with either a cast or surgery depending on the severity of 

**Modified Parameters based on our learning from previous prompt engineering analysis**

Temp = 0.7
Top_p = 0.85
Top_k = 80
Max_tokens = 1024



In [None]:
def generate_llama_response5(query, instruction="Answer the following question clearly and concisely."):

    # Create system message with instructions for the model
    system_message = f"[INST]<<SYS>>\n{instruction}\n<</SYS>>[/INST]"

    # Construct the final prompt using the user's query and system message
    prompt = f"{system_message} {query}"

    # Generate a response using the LLaMA model
    response = llm(
        prompt=prompt,
        max_tokens=1024,
        temperature=0.7,
        top_p=0.85,
        repeat_penalty=1.1,
        top_k=80,
        stop=['</s>'],
        echo=False
       )

    # Return only the generated answer text
    return response["choices"][0]["text"].strip()


In [None]:
generate_llama_response5('Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?')

Llama.generate: prefix-match hit


"Answer:\n1. Assess the severity of the injury: Determine if it's an open or closed fracture, if there is any swelling, bruising, or deformity.\n2. Provide first aid: Apply a sterile dressing to the wound, if present, and stabilize the leg using a splint, sling, or brace to prevent movement.\n3. Seek medical attention: Fractures require professional evaluation and treatment. Delay in seeking care may result in complications such as infection or improper healing.\n4. Pain management: Use over-the-counter pain relievers or prescription medications as directed by a healthcare professional.\n5. Rest and elevate the leg: Avoid putting weight on the injured limb and keep it raised to reduce swelling and promote proper healing.\n6. Follow a doctor's recommended rehabilitation program: This may include physical therapy, exercises, and assistive devices to help restore strength and flexibility.\n7. Monitor for complications: Keep an eye out for signs of infection such as redness, warmth, or inc

**Observation on Query 5**

Answer 1, generated with default parameters, provides a more detailed approach to the immediate steps for handling a fractured leg during a hiking trip, emphasizing both the severity assessment and infection prevention, while focusing on the necessity of seeking medical attention in case of complications.

Answer 2, with modified parameters, is more structured and offers a comprehensive recovery plan, addressing not only the immediate care but also long-term considerations like nutrition, rest, and physical therapy. It also focuses on the steps that a layperson can take, such as cleaning the wound and elevating the leg for pain management.

Overall, Answer 2 is more holistic and includes follow-up care, making it slightly better for a complete response that covers both the immediate treatment and recovery process.



## Data Preparation for RAG

### Loading the Data

In [None]:
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


In [None]:
merck_pdf_path = '/content/drive/MyDrive/medicalassistant/medical_diagnosis_manual.pdf'
pdf_loader = PyMuPDFLoader(merck_pdf_path)
merck = pdf_loader.load()

### Data Overview

#### Checking the first 5 pages

In [None]:
for i in range(5):
    print(f"Page Number : {i+1}",end="\n")
    print(merck[i].page_content,end="\n")


Page Number : 1
pkharbanda@gmail.com
Y0B4FE3CJA
eant for personal use by pkharbanda@gm
shing the contents in part or full is liable
Page Number : 2
pkharbanda@gmail.com
Y0B4FE3CJA
This file is meant for personal use by pkharbanda@gmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Page Number : 3
Table of Contents
1
Front    ................................................................................................................................................................................................................
1
Cover    .......................................................................................................................................................................................................
2
Front Matter    ...........................................................................................................................................................................................
53
1

#### Checking the number of pages

In [None]:
len(merck)


4114

### Data Chunking

In [None]:
# Initializing a RecursiveCharacterTextSplitter to split the text into manageable chunks for embedding and retrieval
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name='cl100k_base',
    chunk_size=512,
    chunk_overlap= 20
)


In [None]:
#loading the PDF document, extracting its text, and splitting it into smaller chunks
document_chunks = pdf_loader.load_and_split(text_splitter)

In [None]:
document_chunks[1]

Document(metadata={'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'creator': 'Atop CHM to PDF Converter', 'creationdate': '2012-06-15T05:44:40+00:00', 'source': '/content/drive/MyDrive/medicalassistant/medical_diagnosis_manual.pdf', 'file_path': '/content/drive/MyDrive/medicalassistant/medical_diagnosis_manual.pdf', 'total_pages': 4114, 'format': 'PDF 1.7', 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition', 'author': '', 'subject': '', 'keywords': '', 'moddate': '2025-11-07T19:52:57+00:00', 'trapped': '', 'modDate': 'D:20251107195257Z', 'creationDate': 'D:20120615054440Z', 'page': 1}, page_content='pkharbanda@gmail.com\nY0B4FE3CJA\nThis file is meant for personal use by pkharbanda@gmail.com only.\nSharing or publishing the contents in part or full is liable for legal action.')

In [None]:
document_chunks[2]

Document(metadata={'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'creator': 'Atop CHM to PDF Converter', 'creationdate': '2012-06-15T05:44:40+00:00', 'source': '/content/drive/MyDrive/medicalassistant/medical_diagnosis_manual.pdf', 'file_path': '/content/drive/MyDrive/medicalassistant/medical_diagnosis_manual.pdf', 'total_pages': 4114, 'format': 'PDF 1.7', 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition', 'author': '', 'subject': '', 'keywords': '', 'moddate': '2025-11-07T19:52:57+00:00', 'trapped': '', 'modDate': 'D:20251107195257Z', 'creationDate': 'D:20120615054440Z', 'page': 2}, page_content='Table of Contents\n1\nFront    ................................................................................................................................................................................................................\n1\nCover    .................................................................................................................

In [None]:
len(document_chunks)

8463

### Embedding

In [None]:
#This model is chosen because of its embedding vector size is the same as our token size in chunking (512).
embedding_model = SentenceTransformerEmbeddings(model_name='thenlper/gte-large')


  embedding_model = SentenceTransformerEmbeddings(model_name='thenlper/gte-large')


modules.json:   0%|          | 0.00/385 [00:00<?, ?B/s]

README.md: 0.00B [00:00, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/57.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/619 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/670M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/342 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/125 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/191 [00:00<?, ?B/s]

In [None]:
# Generating embedding for the first document chunk
embedding_1 = embedding_model.embed_query(document_chunks[1].page_content)
# Generating embedding for the second document chunk
embedding_2 = embedding_model.embed_query(document_chunks[2].page_content)

In [None]:
#Checking if both are of the same size
print("Dimension of the embedding vector ",len(embedding_1))
len(embedding_1)==len(embedding_2)


Dimension of the embedding vector  1024


True

### Vector Database

In [None]:
import os


In [None]:
# Creating the output directory 'merck_db' if it doesn't already exist, so we can save the processed data or vector database files there.
out_dir = '/content/drive/MyDrive/medicalassistant/merck_db'

if not os.path.exists(out_dir):
  os.makedirs(out_dir)

In [None]:
vectorstore = Chroma.from_documents( #creating a Chroma vector store from a set of document chunks.
    document_chunks, #creating a list of text chunks that will be converted into embeddings..
    embedding_model, #model responsible for embedding the document chunks into vector representations
    persist_directory=out_dir #name of the collection in the Chroma database
)


In [None]:
#Loading Chroma vector store with the given embedding model
vectorstore = Chroma(persist_directory=out_dir,embedding_function=embedding_model)


  vectorstore = Chroma(persist_directory=out_dir,embedding_function=embedding_model)


In [None]:
#Accessing the embedding function used in the Chroma vector store
vectorstore.embeddings

HuggingFaceEmbeddings(client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
), model_name='thenlper/gte-large', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False, show_progress=False)

In [None]:
#Performing a similarity search in the vector store to find the top 3 most similar documents to "Alopecia Areata"
vectorstore.similarity_search("Alopecia Areata ",k=3)


[Document(metadata={'source': '/content/drive/MyDrive/medicalassistant/medical_diagnosis_manual.pdf', 'format': 'PDF 1.7', 'page': 859, 'moddate': '2025-11-07T19:52:57+00:00', 'creationdate': '2012-06-15T05:44:40+00:00', 'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition', 'author': '', 'modDate': 'D:20251107195257Z', 'total_pages': 4114, 'creationDate': 'D:20120615054440Z', 'trapped': '', 'subject': '', 'keywords': '', 'creator': 'Atop CHM to PDF Converter', 'file_path': '/content/drive/MyDrive/medicalassistant/medical_diagnosis_manual.pdf'}, page_content='infection or emotional stress. It occasionally coexists with autoimmune vitiligo or thyroiditis.\nDiagnosis\n• Examination\nDiagnosis is by inspection. Alopecia areata typically manifests as discrete circular patches of hair loss\ncharacterized by short broken hairs at the margins, which resemble exclamation points. Nails are\nsometimes pitted or display trachy

**Observations**

From the retrieved chunks, we observe that all the chunks are related to the key terms Alopecia Areata

### Retriever

In [None]:
retriever = vectorstore.as_retriever( #Converting the Chroma vector store into a retriever for querying.
    search_type='similarity', #Specifying that retrieval is based on cosine similarity
    search_kwargs={'k': 3} #Retrieving the top 3 most similar documents for a given query.
)

In [None]:
user_input = 'What are the symptoms of migraine?'
rel_docs = retriever.get_relevant_documents(user_input)
rel_docs


  rel_docs = retriever.get_relevant_documents(user_input)


[Document(metadata={'keywords': '', 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition', 'creationdate': '2012-06-15T05:44:40+00:00', 'total_pages': 4114, 'page': 1895, 'subject': '', 'creator': 'Atop CHM to PDF Converter', 'moddate': '2025-11-07T19:52:57+00:00', 'format': 'PDF 1.7', 'author': '', 'file_path': '/content/drive/MyDrive/medicalassistant/medical_diagnosis_manual.pdf', 'trapped': '', 'source': '/content/drive/MyDrive/medicalassistant/medical_diagnosis_manual.pdf', 'modDate': 'D:20251107195257Z', 'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'creationDate': 'D:20120615054440Z'}, page_content='are less common than visual auras. Some patients have an aura with little or no headache.\nHeadache varies from moderate to severe, and attacks last from 4 hours to several days, typically\nresolving with sleep. The pain is often unilateral but may be bilateral, most often in a frontotemporal\ndistribution, and is typically described as pulsating or throbbing.

### System and User Prompt Template

In [None]:
# System message instructing the LLM to only answer using Merck Manual 19th Edition
qna_system_message = """
You are a helpful assistant trained to answer questions based only on the Merck Manual of Medical Diagnosis and Therapy, Nineteenth Edition.
Use the context provided to find accurate and reliable answers.
If the answer is not found in the context, reply with "I don't know".
Do not mention the context or the Merck Manual in your final answer.
"""

In [None]:
# Template for formatting the user's input with context from the Merck Manual, 19th Edition and the actual medical question.
qna_user_message_template = """
###Context
The following excerpts are from the Merck Manual of Medical Diagnosis and Therapy, Nineteenth Edition:
{context}

###Question
{question}
"""

### Response Function

In [None]:
def generate_rag_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.get_relevant_documents(query=user_input,k=k)
    context_list = [d.page_content for d in relevant_document_chunks]

    # Combine document chunks into a single context
    context_for_query = ". ".join(context_list)

    user_message = qna_user_message_template.replace('{context}', context_for_query)
    user_message = user_message.replace('{question}', user_input)

    prompt = qna_system_message + '\n' + user_message

    # Generate the response
    try:
        response = llm(
                  prompt=prompt,
                  max_tokens=max_tokens,
                  temperature=temperature,
                  top_p=top_p,
                  top_k=top_k
                  )

        # Extract and print the model's response
        response = response['choices'][0]['text'].strip()
    except Exception as e:
        response = f'Sorry, I encountered the following error: \n {e}'

    return response

## Question Answering using RAG

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
user_input1 = 'What is the protocol for managing sepsis in a critical care unit?'
print(generate_rag_response(user_input1))

Llama.generate: prefix-match hit


Based on the context provided, the Merck Manual does not specifically discuss the protocol for managing sepsis in a critical care unit. However, it does mention that ICU patients typically have routine daily blood tests to help detect problems early, including infection. Patients with arrhythmias should also have Mg, phosphate, and Ca levels measured. Patients receiving TPN need weekly liver enzymes and coagulation profiles. Other tests (eg, blood culture for fever, CBC after a bleeding episode) are done as needed. The context also mentions that every ICU should strictly follow protocols for investig


Fine Tuning - Removing Chunk Overlap
Chunk size = 512, overlap = 0

In [None]:
# Initializing a RecursiveCharacterTextSplitter to split the text into manageable chunks for embedding and retrieval
text_splitter1 = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name='cl100k_base',
    chunk_size=512,
    chunk_overlap= 0
)

In [None]:
#loading the PDF document, extracting its text, and splitting it into smaller chunks
document_chunks1 = pdf_loader.load_and_split(text_splitter1)

In [None]:
#Checking the number of text chunks the pdf has been split into
len(document_chunks1)


8441

In [None]:
# Generating embedding for the first document chunk
embedding_1 = embedding_model.embed_query(document_chunks1[0].page_content)
# Generating embedding for the second document chunk
embedding_2 = embedding_model.embed_query(document_chunks1[1].page_content)

In [None]:
vectorstore1 = Chroma.from_documents( #creating a Chroma vector store from a set of document chunks.
    document_chunks1, #creating a list of text chunks that will be converted into embeddings..
    embedding_model, #model responsible for embedding the document chunks into vector representations
    persist_directory=out_dir #name of the collection in the Chroma database
)

In [69]:
 #Loading Chroma vector store with the given embedding model
vectorstore1 = Chroma(persist_directory=out_dir,embedding_function=embedding_model)

  vectorstore1 = Chroma(persist_directory=out_dir,embedding_function=embedding_model)


In [70]:
#Accessing the embedding function used in the Chroma vector store
vectorstore1.embeddings


HuggingFaceEmbeddings(client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
), model_name='thenlper/gte-large', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False, show_progress=False)

In [71]:
retriever1 = vectorstore1.as_retriever( #Converting the Chroma vector store into a retriever for querying.
    search_type='similarity', #Specifying that retrieval is based on cosine similarity
    search_kwargs={'k': 3} #Retrieving the top 3 most similar documents for a given query.
)

In [72]:
def generate_rag_response1(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks1 = retriever1.get_relevant_documents(query=user_input,k=k)
    context_list = [d.page_content for d in relevant_document_chunks1]

    # Combine document chunks into a single context
    context_for_query = ". ".join(context_list)

    user_message = qna_user_message_template.replace('{context}', context_for_query)
    user_message = user_message.replace('{question}', user_input)

    prompt = qna_system_message + '\n' + user_message

    # Generate the response
    try:
        response = llm(
                  prompt=prompt,
                  max_tokens=max_tokens,
                  temperature=temperature,
                  top_p=top_p,
                  top_k=top_k
                  )

        # Extract and print the model's response
        response = response['choices'][0]['text'].strip()
    except Exception as e:
        response = f'Sorry, I encountered the following error: \n {e}'

    return response

In [73]:
user_input1 = 'What is the protocol for managing sepsis in a critical care unit?'
print(generate_rag_response1(user_input1))


Llama.generate: prefix-match hit


Based on the context provided, the Merck Manual does not specifically discuss the protocol for managing sepsis in a critical care unit. However, it does mention that ICU patients typically have routine daily blood tests to help detect problems early, including infection. Patients with arrhythmias should also have Mg, phosphate, and Ca levels measured. Patients receiving TPN need weekly liver enzymes and coagulation profiles. Other tests (eg, blood culture for fever, CBC after a bleeding episode) are done as needed. The context also mentions that every ICU should strictly follow protocols for investig


**Observations on Query 1**

Removing chunk overlap did not change the answers in this case, probably because the relavant information for the question was already contained within a single chunk.

Removing chunk overlap might affect the answer if the question required information that spanned across multiple chunks from different parts of the text.



### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

**Basic Parameter**

In [82]:
user_input2 = 'Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?'
print(generate_rag_response(user_input2))

Llama.generate: prefix-match hit


The common symptoms for appendicitis include abdominal pain, anorexia (loss of appetite), and abdominal tenderness. Appendicitis cannot be cured via medicine alone; the standard treatment is surgical removal of the appendix through open or laparoscopic appendectomy. This procedure should be preceded by IV antibiotics to prevent infection spread. If the appendix is perforated, antibiotics should be continued until the patient's temperature and WBC count have normalized. In cases where a large inflammatory mass is found involving the appendix, terminal ile


**Fine Tuning Retriver Parameters**

In [75]:
retriever2 = vectorstore.as_retriever( #Converting the Chroma vector store into a retriever for querying.
    search_type='similarity', #Specifying that retrieval is based on cosine similarity
    search_kwargs={'k': 2} #Retrieving the top 2 most similar documents for a given query.
)

In [80]:
def generate_rag_response2(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever2.get_relevant_documents(query=user_input,k=k)
    context_list = [d.page_content for d in relevant_document_chunks]

    # Combine document chunks into a single context
    context_for_query = ". ".join(context_list)

    user_message = qna_user_message_template.replace('{context}', context_for_query)
    user_message = user_message.replace('{question}', user_input)

    prompt = qna_system_message + '\n' + user_message

    # Generate the response
    try:
        response = llm(
                  prompt=prompt,
                  max_tokens=max_tokens,
                  temperature=temperature,
                  top_p=top_p,
                  top_k=top_k
                  )

        # Extract and print the model's response
        response = response['choices'][0]['text'].strip()
    except Exception as e:
        response = f'Sorry, I encountered the following error: \n {e}'

    return response

In [81]:
user_input2 = 'What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?'
print(generate_rag_response2(user_input2))


Llama.generate: prefix-match hit


The common symptoms for appendicitis include abdominal pain, loss of appetite, nausea or vomiting, fever, and a feeling of being sick. Appendicitis cannot be cured via medicine alone as it requires surgical removal of the appendix due to the risk of perforation and potential complications such as peritonitis, gangrene, and sepsis. The preferred treatment is open or laparoscopic appendectomy. Antibiotics are administered before surgery to prevent infection and reduce the risk of complications. If the appendix is perforated, antibiot


**Observations on Query 2**

Answer 1, with k=3, is slightly more detailed and includes examples of cases (like when surgery is impossible).

Answer 2, with k=2, is slightly shorter and cuts off earlier, suggesting that retrieving more chunks (k=3) provided additional helpful context for a more complete answer.



### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [83]:
user_input3 = 'What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?'
print(generate_rag_response(user_input3))

Llama.generate: prefix-match hit


Based on the context provided, sudden patchy hair loss, also known as alopecia areata, is an autoimmune disorder affecting genetically susceptible people exposed to unclear environmental triggers. The most effective treatments for addressing this condition include corticosteroids, retinoids, or immunosuppressants. These medications help suppress the immune system's attack on hair follicles and promote hair regrowth. In some cases, microscopic hair examination or scalp biopsy may be required for definitive diagnosis. It is essential to note that alopecia areata is


**Fine Tuning - LLM Parameters**

Temperature = 0.7



In [84]:
user_input3 = 'What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?'
print(generate_rag_response(user_input3, temperature=0.7))


Llama.generate: prefix-match hit


Based on the context provided, the condition being described is alopecia areata. The Merck Manual suggests that alopecia areata is a sudden patchy hair loss in people with no obvious skin or systemic disorder. It affects genetically susceptible individuals exposed to unclear environmental triggers. The treatment for alopecia areata includes corticosteroids, retinoids, and immunosuppressants. These medications can help promote regrowth of the affected hair. However, it's important to note that alopecia areata is an autoimmune disorder,


**Observations on Query 3**

Answer 1, with temperature = 0, is more precise and structured.

Answer 2, with temperature = 0.7, introduces a bit more variability, with slight changes in wording, reflecting the increased randomness from the higher temperature.

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

** Basic Parameters**


In [89]:
user_input4 = 'What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?'
print(generate_rag_response(user_input4))

Llama.generate: prefix-match hit


Based on the context provided, the following treatments are recommended for a person who has sustained a physical injury to brain tissue:
1. Ensuring a reliable airway and maintaining adequate ventilation, oxygenation, and blood pressure.
2. Surgery may be needed in patients with more severe injury to place monitors to track and treat intracranial pressure, decompress the brain if intracranial pressure is increased, or remove intracranial hematomas.
3. In the first few days after the injury, maintaining adequate brain perfusion and oxygenation and preventing complications of altered sensorium are


**Modified LLM Parameters top_p=0.8**

In [90]:
user_input4 = 'What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?'
print(generate_rag_response(user_input4, top_p=0.8))

Llama.generate: prefix-match hit


Based on the context provided, the following treatments are recommended for a person who has sustained a physical injury to brain tissue:
1. Ensuring a reliable airway and maintaining adequate ventilation, oxygenation, and blood pressure.
2. Surgery may be needed in patients with more severe injury to place monitors to track and treat intracranial pressure, decompress the brain if intracranial pressure is increased, or remove intracranial hematomas.
3. In the first few days after the injury, maintaining adequate brain perfusion and oxygenation and preventing complications of altered sensorium are


**Observations on Query 4**

Answer 1 with Top_p = 0.95 means the model will consider the top 95% probability mass of the next token and select from that pool. This makes the answer generated more focused, coherent responses because it prioritizes more likely words and phrases.

Answer 2 with Top_p = 0.8 means the model restricts the token selection to only the top 80% of the probability distribution, including less likely but more diverse options.

Thus, in Answer 1, the model focuses more on the main facts (like supportive care and rehab) without adding unnecessary or unrelated details. In Answer 2, the model explores a wider range of possibilities, leading to additional information about severe cases, surgery, and treatment phases.




### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [93]:
user_input5='What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?'
print(generate_rag_response(user_input5))

Llama.generate: prefix-match hit


###Answer
The person with a fractured leg should have the injured limb elevated above heart level for the first two days to minimize swelling. After 48 hours, they can apply heat using a heating pad for 15 to 20 minutes to relieve pain and speed healing. Immobilization is necessary to prevent further injury and facilitate healing, which includes immobilizing joints proximal and distal to the injury. A cast may be used for fractures requiring weeks of immobilization. Severe swelling under a cast can contribute to compartment syndrome, so patients should keep the


** Fine turning top_k = 25**

In [95]:
user_input5='What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?'
print(generate_rag_response(user_input5, top_k=25))

Llama.generate: prefix-match hit


###Answer
The person with a fractured leg should have the injured limb elevated above heart level for the first two days to minimize swelling. After 48 hours, they can apply heat using a heating pad for 15 to 20 minutes to relieve pain and speed healing. Immobilization is necessary to prevent further injury and facilitate healing, which includes immobilizing joints proximal and distal to the injury. A cast may be used for fractures requiring weeks of immobilization. Severe swelling under a cast can contribute to compartment syndrome, so patients should keep the


**Observations on Query 5**

In this case, the generated answers are the same, probably because the topic is very structured, and the model chooses almost the same tokens even when it has a broader range of options (top_k=50)

### Fine-tuning

## Output Evaluation

Let us now use the LLM-as-a-judge method to check the quality of the RAG system on two parameters - retrieval and generation. We illustrate this evaluation based on the answeres generated to the question from the previous section.

- We are using the same Mistral model for evaluation, so basically here the llm is rating itself on how well he has performed in the task.

In [101]:
groundedness_rater_system_message  = """
You will be presented a ###Question, ###Context used by the AI system and AI generated ###Answer.

Your task is to judge the extent to which the ###Answer is derived from ###Context.

Rate it 1 - if The ###Answer is not derived from the ###Context at all
Rate it 2 - if The ###Answer is derived from the ###Context only to a limited extent
Rate it 3 - if The ###Answer is derived from ###Context to a good extent
Rate it 4 - if The ###Answer is derived from ###Context mostly
Rate it 5 - if The ###Answer is is derived from ###Context completely

Please note: Make sure you give a single overall rating in the range of 1 to 5 along with an overall explanation.

"""

In [102]:
relevance_rater_system_message = """

You will be presented with a ###Question, the ###Context used by the AI system to generate a response, and the AI-generated ###Answer.

Your task is to judge the extent to which the ###Answer is relevant to the ###Question, considering whether it directly addresses the key aspects of the ###Question based on the provided ###Context.

Rate the relevance as follows:
- Rate 1 – The ###Answer is not relevant to the ###Question at all.
- Rate 2 – The ###Answer is only slightly relevant to the **###Question**, missing key aspects.
- Rate 3 – The ###Answer is moderately relevant, addressing some parts of the **###Question** but leaving out important details.
- Rate 4 – The ###Answer is mostly relevant, covering key aspects but with minor gaps.
- Rate 5 – The ###Answer is fully relevant, directly answering all important aspects of the **###Question** with appropriate details from the **###Context**.

Note: Provide a single overall rating in the range of 1 to 5, along with a brief explanation of why you assigned that score.
"""



In [104]:
user_message_template = """
###Question
{question}

###Context
{context}

###Answer
{answer}
"""


In [105]:
def generate_ground_relevance_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.get_relevant_documents(query=user_input,k=3)
    context_list = [d.page_content for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)

    # Combine user_prompt and system_message to create the prompt
    prompt = f"""[INST]{qna_system_message}\n
                {'user'}: {qna_user_message_template.format(context=context_for_query, question=user_input)}
                [/INST]"""

    response = llm(
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            )

    answer =  response["choices"][0]["text"]

    # Combine user_prompt and system_message to create the prompt
    groundedness_prompt = f"""[INST]{groundedness_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    # Combine user_prompt and system_message to create the prompt
    relevance_prompt = f"""[INST]{relevance_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    response_1 = llm(
            prompt=groundedness_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            )

    response_2 = llm(
            prompt=relevance_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            )

    return response_1['choices'][0]['text'],response_2['choices'][0]['text']

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [107]:
user_input1 = 'What is the protocol for managing sepsis in a critical care unit?'
ground,rel = generate_ground_relevance_response(user_input1,max_tokens=512)

print(ground,end="\n\n")
print(rel)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Rating: 2

Explanation: The context provided does mention the importance of preventing infection in ICU patients, but it does not specifically discuss the protocol for managing sepsis. The answer goes beyond the context by providing information on the importance of early recognition and prompt treatment, appropriate antibiotic therapy, fluid resuscitation, supportive measures, close monitoring, and following hospital protocols for sepsis management in the ICU setting. While some of this information is related to the context, it is not explicitly stated in the context and is derived from the Merck Manual.

 Rating: 3

Explanation: The answer provides some relevant information on managing sepsis in a critical care unit, such as the importance of early recognition and prompt treatment, appropriate antibiotic therapy, fluid resuscitation, and close monitoring of vital signs, urine output, and lactate levels. However, it does not directly address all key aspects of the question, which spec

Observation: Rating of 3 means a satisfactory level of quality was provided

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [108]:
user_input2 = 'What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?'
ground,rel = generate_ground_relevance_response(user_input2,max_tokens=512)

print(ground,end="\n\n")
print(rel)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Rating: 5 - The Answer is derived from the Context completely.

Explanation: The context clearly states that the treatment for appendicitis involves surgical removal of the appendix, and that antibiotics are used in conjunction with surgery. It also mentions that appendicitis cannot be cured via medicine alone. The answer accurately reflects this information. Additionally, the answer lists the common symptoms for appendicitis, which are also mentioned in the context.

 Rating: 5

Explanation: The answer directly addresses the key aspects of the question by listing the common symptoms for appendicitis and stating that surgery (appendectomy) is necessary for treatment. It also clarifies that medicine alone cannot cure appendicitis. The context provided in the text supports these statements, making the answer fully relevant to the question.


** Observations **

The rating of 5 for groudness and relavance indicates that the answer is fully grounded in the context and is also highly relevant to the query.

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [110]:
user_input3 = 'What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?'
ground,rel = generate_ground_relevance_response(user_input3,max_tokens=512)

print(ground,end="\n\n")
print(rel)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Rating: 5 - The Answer is derived from the Context completely.

Explanation: The context provides detailed information about alopecia areata, including its symptoms (sudden patchy hair loss), potential causes (autoimmune disorder, environmental triggers, and genetics), and treatments (corticosteroids, topical immunotherapy, and systemic immunosuppressants). The Answer accurately reflects this information by stating that alopecia areata can be treated with corticosteroids, topical immunotherapy, or systemic immunosuppressants and providing some additional context about the choice of treatment based on the size and number of bald patches. Additionally, the Answer correctly identifies the potential causes of alopecia areata as an autoimmune disorder that affects genetically susceptible individuals exposed to unclear environmental triggers.

 Rating: 5

Explanation: The AI-generated answer directly addresses the question by providing information on effective treatments and possible causes

**Observations**

The rating of 5 for groudness and relavance indicates that the answer is fully grounded in the context and is also highly relevant to the query.

### Query 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [111]:
user_input4 = 'What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?'
ground,rel = generate_ground_relevance_response(user_input4,max_tokens=512)

print(ground,end="\n\n")
print(rel)


Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Rating: 5 - The Answer is derived from the Context completely.

Explanation: The Question asks about treatments for a person with a traumatic brain injury (TBI). The Context provides detailed information about TBI, including its definition, diagnosis, initial treatment, and subsequent care. The Answer accurately summarizes the initial treatment for TBI as ensuring a reliable airway, maintaining adequate ventilation, oxygenation, and blood pressure, and mentions the possibility of surgery in more severe cases to treat intracranial pressure or hematomas. These details are all taken directly from the Context.

 Rating: 5

Explanation: The answer directly addresses the key aspects of the question by mentioning the initial treatments for a person with a traumatic brain injury (TBI), which include ensuring a reliable airway, maintaining adequate ventilation, oxygenation, and blood pressure, as well as potential surgery to treat intracranial pressure or hematomas. The answer also mentions th

**Observations**

The rating of 5 for groudness and relavance indicates that the answer is fully grounded in the context and is also highly relevant to the query.

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [112]:
user_input5 = 'What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?'
ground,rel = generate_ground_relevance_response(user_input5,max_tokens=512)

print(ground,end="\n\n")
print(rel)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Rating: 5 - The AI generated answer is completely derived from the context provided. The context outlines the necessary precautions and treatment steps for a person with a fractured leg, and the AI answer accurately reflects those steps, including limb elevation, use of elastic bandages, application of heat after 48 hours, immobilization through casting, and instructions for cast care to prevent complications.

 Rating: 5

Explanation: The AI-generated answer directly addresses all important aspects of the question by providing necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip. It covers key aspects such as elevating the injured limb, using elastic bandages, applying warmth after 48 hours, immobilization with a cast if required, and instructions for patients with casts to keep it dry, inspect edges regularly, and pad rough edges. The answer is fully relevant to the question based on the provided context.


**Observations**

The rating of 5 for groudness and relavance indicates that the answer is fully grounded in the context and is also highly relevant to the query.

## Actionable Insights and Business Recommendations

**Business Insights**

**Accuracy and Trustworthiness**

RAG-based answers are context-specific, reliable, and trustworthy, enhancing customer satisfaction in medical applications.
Enhanced User Experience

The RAG model is able to provide personalized, relevant responses and more accurate answers.

**Cost Reduction in Medical Consultations**

The model can be used for automating medical Q&A, reducing the need for consultations, saving costs in telemedicine and healthcare support.
Recommendations

**Continuous Fine-tuning**

The model should be updated and fine-tuned to reflect the latest medical guidelines and research.

**Ethical Considerations**

Data privacy should be prioritized to ensure compliance with regulations like HIPAA and GDPR.

**Specialized Models**

Dedicated models can be created, tailored to individual medical specialties

<font size=6 color='blue'>Power Ahead</font>
___