## Problem Statement

### Business Context

The healthcare industry is rapidly evolving, with professionals facing increasing challenges in managing vast volumes of medical data while delivering accurate and timely diagnoses. The need for quick access to comprehensive, reliable, and up-to-date medical knowledge is critical for improving patient outcomes and ensuring informed decision-making in a fast-paced environment.

Healthcare professionals often encounter information overload, struggling to sift through extensive research and data to create accurate diagnoses and treatment plans. This challenge is amplified by the need for efficiency, particularly in emergencies, where time-sensitive decisions are vital. Furthermore, access to trusted, current medical information from renowned manuals and research papers is essential for maintaining high standards of care.

To address these challenges, healthcare centers can focus on integrating systems that streamline access to medical knowledge, provide tools to support quick decision-making, and enhance efficiency. Leveraging centralized knowledge platforms and ensuring healthcare providers have continuous access to reliable resources can significantly improve patient care and operational effectiveness.

**Common Questions to Answer**

**1. Diagnostic Assistance**: "What are the common symptoms and treatments for pulmonary embolism?"

**2. Drug Information**: "Can you provide the trade names of medications used for treating hypertension?"

**3. Treatment Plans**: "What are the first-line options and alternatives for managing rheumatoid arthritis?"

**4. Specialty Knowledge**: "What are the diagnostic steps for suspected endocrine disorders?"

**5. Critical Care Protocols**: "What is the protocol for managing sepsis in a critical care unit?"

### Objective

As an AI specialist, your task is to develop a RAG-based AI solution using renowned medical manuals to address healthcare challenges. The objective is to **understand** issues like information overload, **apply** AI techniques to streamline decision-making, **analyze** its impact on diagnostics and patient outcomes, **evaluate** its potential to standardize care practices, and **create** a functional prototype demonstrating its feasibility and effectiveness.

### Data Description

The **Merck Manuals** are medical references published by the American pharmaceutical company Merck & Co., that cover a wide range of medical topics, including disorders, tests, diagnoses, and drugs. The manuals have been published since 1899, when Merck & Co. was still a subsidiary of the German company Merck.

The manual is provided as a PDF with over 4,000 pages divided into 23 sections.

## Installing and Importing Necessary Libraries and Dependencies

In [1]:
# Installation for GPU llama-cpp-python
# uncomment and run the following code in case GPU is being used
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q

# Installation for CPU llama-cpp-python
# uncomment and run the following code in case GPU is not being used
#!CMAKE_ARGS="-DLLAMA_CUBLAS=off" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q

[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/1.8 MB[0m [31m?[0m eta [36m-:--:--[0m[2K     [91m━━━━━━[0m[90m╺[0m[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.3/1.8 MB[0m [31m8.5 MB/s[0m eta [36m0:00:01[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.8/1.8 MB[0m [31m28.6 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m62.1/62.1 kB[0m [31m329.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m45.5/45.5 kB[0m [31m233.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m16.6/16.6 MB[0m [31m303.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m44.6/44.6 kB[0m [31m275.4 MB/s

**Note**:
- After running the above cell, kindly restart the runtime (for Google Colab) or notebook kernel (for Jupyter Notebook), and run all cells sequentially from the next cell.
- On executing the above line of code, you might see a warning regarding package dependencies. This error message can be ignored as the above code ensures that all necessary libraries and their dependencies are maintained to successfully execute the code in ***this notebook***.

In [1]:
# For installing the libraries & downloading models from HF Hub
!pip install -q huggingface_hub==0.25.0 pandas==2.2.2 tiktoken==0.6.0 pymupdf==1.25.1 langchain==0.3.0 langchain-community==0.3.0 langchain-text-splitters==0.3.6 chromadb==0.5.5 sentence-transformers==3.2.0 numpy==1.26.0

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m58.5/58.5 kB[0m [31m2.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m67.3/67.3 kB[0m [31m6.9 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m40.1/40.1 kB[0m [31m3.6 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m42.2/42.2 kB[0m [31m3.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m40.1/40.1 kB[0m [31m3.2 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m42.0/42.0 kB[0m [31m3.6 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m42.0/42.0 kB[0m [31m3.7 MB/s[0m eta 

**Note**:
- After running the above cell, kindly restart the runtime (for Google Colab) or notebook kernel (for Jupyter Notebook), and run all cells sequentially from the next cell.
- On executing the above line of code, you might see a warning regarding package dependencies. This error message can be ignored as the above code ensures that all necessary libraries and their dependencies are maintained to successfully execute the code in ***this notebook***.

In [1]:
#Libraries for processing dataframes,text
import json,os
import tiktoken
import pandas as pd

#Libraries for Loading Data, Chunking, Embedding, and Vector Databases
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

#Libraries for downloading and loading the llm
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

## Question Answering using LLM

#### Downloading and Loading the model

Download and load the Mistral-7B-Instruct-v0.2-GGUF from HuggingFace

In [2]:
# Download the Mistral model from Hugging Face
model_name_mistral = "TheBloke/Mistral-7B-Instruct-v0.2-GGUF"
model_basename_mistral = "mistral-7b-instruct-v0.2.Q6_K.gguf"
model_path_mistral = hf_hub_download(
    repo_id=model_name_mistral,
    filename=model_basename_mistral
    )

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


mistral-7b-instruct-v0.2.Q6_K.gguf:   0%|          | 0.00/5.94G [00:00<?, ?B/s]

In [3]:
# Load the model as Llama model
llm = Llama(
    model_path=model_path_mistral, #Path to the GGUF model file
    n_ctx=4096, #Sets the context window to 4096 tokens
    n_gpu_layers=36, #Loads 36 model layers onto GPU for faster inference (set to 0 for CPU-only)
    n_batch=512 #Number of tokens processed at once
)

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 


#### Response

In [4]:
# Define function to use the loaded Mistral model to generate response to user question
# Use default values for some parameters if values are not provided in the function input
def response(query,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    model_output = llm(
      prompt=query,
      max_tokens=max_tokens,
      temperature=temperature,
      top_p=top_p,
      top_k=top_k
    )

    return model_output['choices'][0]['text']

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [5]:
# Invoke the function by providing the question
response('What is the protocol for managing sepsis in a critical care unit?')

'\n\nSepsis is a life-threatening condition that can arise from an infection, and it requires prompt recognition and aggressive management in a critical care unit. The following are general steps for managing sepsis in a critical care unit:\n\n1. Early recognition: Recognize the signs and symptoms of sepsis early and initiate treatment as soon as possible. Sepsis can present with various clinical features, including fever or hypothermia, tachycardia or bradycardia, altered mental status, respiratory distress, and lactic acidosis.\n2. Resusc'

**Observations**
* The defaut value for max output tokens is 128, and it appears to be very less as we can see that the response is being cut off towards the end of the output.
* It appears that the LLM is not able to fetch the complete information to answer the question in 128 tokens.
* Let's increase the max output tokens and observe the output

In [6]:
# Invoke the function by providing the question
response('What is the protocol for managing sepsis in a critical care unit?', max_tokens=500)

Llama.generate: prefix-match hit


'\n\nSepsis is a life-threatening condition that can arise from an infection, and it requires prompt recognition and aggressive management in a critical care unit. The following are general steps for managing sepsis in a critical care unit:\n\n1. Early recognition: Recognize the signs and symptoms of sepsis early and initiate treatment as soon as possible. Sepsis can present with various clinical features, including fever or hypothermia, tachycardia or bradycardia, altered mental status, respiratory distress, and lactic acidosis.\n2. Source control: Identify and address the source of infection as quickly as possible. This may involve surgical intervention, such as drainage of an abscess or debridement of necrotic tissue.\n3. Fluid resuscitation: Administer intravenous fluids to maintain adequate blood pressure and organ perfusion. The goal is to achieve a mean arterial pressure (MAP) of at least 65 mmHg and a central venous oxygen saturation (ScvO2) of greater than 70%.\n4. Vasopressor

**Observations**
* LLM is able to fetch and respond with more information by increasing the tokens to 500
* The answer to the question is detailed and appears medically appropriate
* The answer still appears to being cut off towards the end
* We will use token size as 500 for response to the subsequent questions

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [7]:
# Invoke the function by providing the question
response('What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?', max_tokens=500)

Llama.generate: prefix-match hit


"\n\nAppendicitis is a medical condition characterized by inflammation of the appendix, a small tube-shaped organ located in the lower right side of the abdomen. The symptoms of appendicitis can vary from person to person, but some common signs include:\n\n1. Abdominal pain: The pain may start as a mild discomfort around the navel or in the lower right abdomen, which then gradually moves to the right lower quadrant and becomes more severe over time. The pain may be constant or intermittent and is often worsened by movement, coughing, or deep breathing.\n2. Loss of appetite: People with appendicitis may lose their appetite due to abdominal pain and discomfort.\n3. Nausea and vomiting: Vomiting is a common symptom of appendicitis, especially in the later stages of the condition.\n4. Fever: A fever of 100.4°F (38°C) or higher may be present in some cases of appendicitis.\n5. Constipation or diarrhea: Both constipation and diarrhea can occur with appendicitis, depending on the location and

**Observations**
* Answer to the question appears to be medically appropriate
* Answer to this question is also getting cut off towards the end

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [8]:
# Invoke the function by providing the question
response('What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?', max_tokens=500)

Llama.generate: prefix-match hit


"\n\nSudden patchy hair loss, also known as alopecia areata, is a common autoimmune disorder that affects the hair follicles. It can result in round or oval bald patches on the scalp, but it can also occur on other parts of the body such as the beard area, eyebrows, or eyelashes.\n\nThe exact cause of alopecia areata is not known, but it's believed to be related to a problem with the immune system. Some possible triggers for this condition include stress, genetics, viral infections, and certain medications.\n\nThere are several treatments that have been shown to be effective in addressing sudden patchy hair loss:\n\n1. Corticosteroids: These are anti-inflammatory drugs that can help reduce inflammation and suppress the immune system's attack on the hair follicles. They can be applied topically or taken orally, depending on the severity of the condition.\n2. Minoxidil: This is a medication that has been shown to promote hair growth in some people with alopecia areata. It works by increa

**Observations**
* As similar to the first 2 questions, answer to the question appears to be medically appropriate and provides details on the treatment for hair loss
* Answer to this question is also getting cut off towards the end

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [9]:
# Invoke the function by providing the question
response('What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?', max_tokens=500)

Llama.generate: prefix-match hit


"\n\nA person who has sustained a physical injury to brain tissue, also known as a traumatic brain injury (TBI), may require various treatments depending on the severity and location of the injury. Here are some common treatments recommended for TBIs:\n\n1. Emergency care: The first priority is to ensure the person's airway is clear, they are breathing, and their heart is beating normally. In severe cases, emergency surgery may be required to remove hematomas or other obstructions.\n2. Medications: Depending on the symptoms, medications may be prescribed to manage conditions such as swelling, seizures, pain, or infections. For example, corticosteroids may be used to reduce brain swelling, and anticonvulsants may be given to prevent seizures.\n3. Rehabilitation: Rehabilitation is an essential part of the recovery process for TBI patients. Rehabilitation may include physical therapy to help with mobility and strength, occupational therapy to help with daily living skills, speech therapy 

**Observations**
* Answer appears to be medically appropriate
* It provides a comprehensive list of treatments

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [10]:
# Invoke the function by providing the question
response('What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?', max_tokens=500)

Llama.generate: prefix-match hit


"\n\nFirst and foremost, if you suspect that someone has fractured their leg while hiking, it's essential to ensure their safety and prevent further injury. Here are some necessary precautions:\n\n1. Keep the person calm and still: Encourage them to remain as still as possible to minimize pain and prevent worsening the injury.\n2. Assess the situation: Check for any signs of shock, such as pale skin, rapid heartbeat, or shallow breathing. If you notice these symptoms, seek medical help immediately.\n3. Immobilize the leg: Use a splint, sling, or other available materials to immobilize the leg and prevent movement. Be sure not to apply too much pressure on the injury site.\n4. Provide pain relief: Offer over-the-counter pain medication, such as acetaminophen or ibuprofen, to help manage pain.\n5. Seek medical attention: If the fracture is severe or if you suspect that there may be other injuries, seek medical help as soon as possible.\n\nOnce you've ensured the person's safety and stabi

**Observations**
* Answer appears to be medically appropriate
* It provides a detailed list of precautions and the treatment steps

### Overall Observations

**Strengths**
* Model is providing detailed responses to the questions
* Model is providing contextually relevant responses to the questions
* Model is providing responses in a structured format, using numbered list to break down the response into points or steps

**Limitations**
* Responses are getting cut off towards the end, possibly due to the token limits
* Although the answers are medically appropriate, they seem to be generalized responses
* The responses appear to be at a high level

## Question Answering using LLM with Prompt Engineering

In [11]:
# Define function to use the loaded Mistral model to generate response to user question, while prefixing the user question with a system message
# Use default values for some parameters if values are not provided in the function input
def generate_resp_with_prompt_eng(user_prompt, max_tokens=128, temperature=0.2, top_p=0.95, top_k=50, repeat_penalty=1):
    # Define the system message
    system_message = "[INST]<<SYS>>Answer the following question in a clear and consise manner<</SYS>>[/INST]"

    # Formulate the prompt using system message and the user question
    prompt = f"{system_message}\n{user_prompt}"

    response = llm(prompt = prompt,
                   max_tokens=max_tokens,
                   temperature=temperature,
                   top_p=top_p,
                   top_k=top_k,
                   repeat_penalty=repeat_penalty,
                   stop=['INST'],
                   echo=False)

    return response['choices'][0]['text']

### Query 1: What is the protocol for managing sepsis in a critical care unit?

Let's check the response with default parameters

In [12]:
# Invoke the function by providing the question
generate_resp_with_prompt_eng("What is the protocol for managing sepsis in a critical care unit?")

Llama.generate: prefix-match hit


"\n\nSepsis is a life-threatening condition caused by the body's response to infection. In a critical care unit, the following steps are typically taken to manage sepsis:\n\n1. Early recognition and diagnosis: Identify sepsis early by recognizing signs and symptoms such as fever, tachycardia, tachypnea, altered mental status, and lactic acidosis.\n2. Fluid resuscitation: Administer intravenous fluids to maintain adequate blood pressure and organ perfusion.\n3. Antibiotics: Administer broad-spectrum"

Let's check the response by increasing max_tokens to 1024

In [13]:
# Invoke the function by providing the question and model parameters
generate_resp_with_prompt_eng("What is the protocol for managing sepsis in a critical care unit?", max_tokens=1024)

Llama.generate: prefix-match hit


'\n\n1. Early recognition: Identify sepsis suspects using quick Sequential [Sepsis-related] Organ Failure Assessment (qSOFA) or Sequential Organ Failure Assessment (SOFA) scores.\n2. Immediate fluid resuscitation: Administer 30 mL/kg of crystalloid within the first 3 hours.\n3. Antibiotics: Start broad-spectrum antibiotics within 1 hour of recognition.\n4. Source control: Address the source of infection as soon as possible.\n5. Vasopressors: Use vasopressors if needed to maintain mean arterial pressure (MAP) ≥65 mmHg.\n6. Corticosteroids: Consider using corticosteroids in patients with septic shock who do not respond to fluid resuscitation.\n7. Inotropes: Use inotropes to maintain cardiac output.\n8. Mechanical ventilation: Provide mechanical ventilation if needed to maintain oxygenation and ventilation.\n9. Renal replacement therapy: Consider using renal replacement therapy for patients with severe sepsis or septic shock and acute kidney injury.\n10. Monitoring: Continuously monitor v

**Observations**
* First answer is very short, and doesn't appear to give adequate information
* Second answer is more detailed and provides a comprehensive list of protocols

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

Let's check the response with max_tokens as 1024, and other parameters with default value

In [14]:
# Invoke the function by providing the question and model parameters
generate_resp_with_prompt_eng("What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?", max_tokens=1024)

Llama.generate: prefix-match hit


'\n\nCommon symptoms for appendicitis include:\n1. Sudden pain in the lower right abdomen, often starting around the navel and moving to the right.\n2. Loss of appetite.\n3. Nausea and vomiting.\n4. Fever.\n5. Abdominal swelling.\n6. Inability to pass gas or have a bowel movement.\n\nAppendicitis cannot be cured via medicine alone. If left untreated, the appendix may burst, leading to peritonitis, a serious inflammation of the abdominal cavity. The standard treatment for appendicitis is an appendectomy, a surgical procedure to remove the appendix. This can be performed as an open appendectomy or laparoscopically, depending on the severity of the case.'

Let's check the response by changing temperature to 0

In [15]:
# Invoke the function by providing the question and model parameters
generate_resp_with_prompt_eng("What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?", max_tokens=1024, temperature=0)

Llama.generate: prefix-match hit


'\n\nAppendicitis is an inflammatory condition of the appendix, a small tube-shaped organ located in the lower right side of the abdomen. Common symptoms include:\n\n1. Sudden pain in the lower right abdomen, often starting around the navel and moving to the right.\n2. Loss of appetite.\n3. Nausea and vomiting.\n4. Fever.\n5. Abdominal swelling.\n6. Constipation or diarrhea.\n7. Inability to pass gas.\n\nAppendicitis cannot be cured with medicine alone. If left untreated, the appendix may burst, leading to peritonitis, a serious inflammation of the abdominal cavity. The standard treatment for appendicitis is surgical removal of the appendix, known as an appendectomy. This procedure can be performed as an open appendectomy or laparoscopically, depending on the severity of the condition. The goal is to remove the inflamed appendix before it ruptures to prevent complications.'

**Observations**:
* First answer directly starts by listing the symptoms followed by the treatment
* Second answer starts by explaining what is appendicitis, and with what part of the human body it is associated to, and then lists the same symptoms and treatment as in answer 1
* Overall, second answer is more appropriate in terms of completeness, but the first answer appropriately responds to the overall context of the question

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

Let's check the response with max_tokens as 1024, and other parameters with default value

In [16]:
# Invoke the function by providing the question and model parameters
generate_resp_with_prompt_eng("What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?", max_tokens=1024)

Llama.generate: prefix-match hit


"\n\nAnswer:\n\nSudden patchy hair loss, also known as alopecia areata, is an autoimmune condition that causes hair loss in small, round patches on the scalp. The exact cause of alopecia areata is unknown, but it's believed to be related to a problem with the immune system.\n\nEffective treatments for alopecia areata include:\n\n1. Corticosteroids: These are anti-inflammatory drugs that can be applied directly to the scalp or taken orally to reduce inflammation and suppress the immune system.\n2. Minoxidil: This is a medication that can be applied topically to stimulate hair growth.\n3. DHT blockers: Finasteride and dutasteride are medications that can block the production of dihydrotestosterone (DHT), which is believed to contribute to hair loss.\n4. Immunotherapy: Injections of certain substances, such as diphenylcyclopropenone (DPCP) or squaric acid dibutyl ester (SADBE), can help stimulate hair regrowth by altering the immune response.\n5. Herbal remedies: Certain herbs, such as sa

Let's check the response by changing top_p to 0.7

In [17]:
# Invoke the function by providing the question and model parameters
generate_resp_with_prompt_eng("What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?", max_tokens=1024, top_p=0.7)

Llama.generate: prefix-match hit


'\n\nAnswer:\n\nSudden patchy hair loss, also known as alopecia areata, is an autoimmune disorder that causes hair loss in small, round patches on the scalp, beard, or other areas of the body. The exact cause of alopecia areata is unknown, but it is believed to be related to a problem with the immune system.\n\nEffective treatments for alopecia areata include:\n\n1. Corticosteroids: These are anti-inflammatory drugs that can help reduce inflammation and suppress the immune system, allowing hair to grow back. They can be applied topically or taken orally.\n2. Immunotherapy: This involves using medications that stimulate the immune system to attack the cells causing the hair loss. One such medication is diphenylcyclopropenone (DPCP).\n3. Minoxidil: This is a medication that can help stimulate hair growth. It is applied topically and is available over the counter.\n4. Hair transplantation: This is a surgical procedure in which healthy hair is taken from one area of the scalp and transplan

**Observations**
* First answer provides information on the treatments and possibles causes for patchy hair loss, but appears slightly general
* Second answer also provides similar information as the first answer, but is slightly more informative than the first answer
* Overall, second answer is better, probably due to a lower top_p value that helps the model generate more focused and relevant content, as the model restricts its output to more probable responses

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

Let's check the response with max_tokens as 1024, and other parameters with default value

In [18]:
# Invoke the function by providing the question and model parameters
generate_resp_with_prompt_eng("What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?", max_tokens=1024)

Llama.generate: prefix-match hit


"\n\nThe treatment for a brain injury depends on the severity and location of the injury. Here are some common treatments:\n\n1. Emergency care: For severe brain injuries, the first priority is to ensure the person's airway is clear, they are breathing, and their heart is beating. This may involve CPR, intubation, or other emergency measures.\n2. Surgery: In some cases, surgery may be necessary to remove blood clots, repair skull fractures, or relieve pressure on the brain.\n3. Medications: Medications may be used to manage symptoms such as pain, swelling, seizures, or infections.\n4. Rehabilitation: Rehabilitation is an essential part of treatment for brain injuries. It may include physical therapy, occupational therapy, speech therapy, and cognitive rehabilitation to help the person regain lost skills and functions.\n5. Assistive devices: Depending on the extent of the injury, the person may need assistive devices such as wheelchairs, braces, or communication aids to help them functi

Let's check the response by increasing top_k to 80

In [19]:
# Invoke the function by providing the question and model parameters
generate_resp_with_prompt_eng("What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?", max_tokens=1024, top_k=80)

Llama.generate: prefix-match hit


"\n\nThe treatment for a brain injury depends on the severity and location of the injury. Here are some common treatments:\n\n1. Emergency care: For severe brain injuries, the first priority is to ensure the person's airway is clear, they are breathing, and their heart is beating. This may involve CPR, intubation, or other emergency measures.\n2. Surgery: In some cases, surgery may be necessary to remove blood clots, repair skull fractures, or relieve pressure on the brain.\n3. Medications: Medications may be used to manage symptoms such as pain, swelling, seizures, or infections.\n4. Rehabilitation: Rehabilitation is an essential part of treatment for brain injuries. It may include physical therapy, occupational therapy, speech therapy, and cognitive rehabilitation.\n5. Assistive devices: Depending on the extent of the injury, assistive devices such as wheelchairs, braces, or communication aids may be necessary.\n6. Lifestyle modifications: Making lifestyle modifications, such as gett

**Observations**
* First answer provides a comprehensive list of recommended treatments
* Second answer also provides a comprehensive list of recommended treatments
* Both answers provide mostly same recommendations, and only 1 recommendation is different between both the answers (Supportive care in first answer and Support groups in second answer)
* Overall, first answer appears to be more appropriate as it has slight more details compared to the second answer

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

Let's check the response with max_tokens as 1024, and other parameters with default value

In [20]:
# Invoke the function by providing the question and model parameters
generate_resp_with_prompt_eng("What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?", max_tokens=1024)

Llama.generate: prefix-match hit


"\n\nFirst and foremost, if a person has fractured their leg while hiking, they should not move the leg unnecessarily to avoid causing further damage. They should try to keep the leg as still as possible and apply a makeshift splint or use any available materials to immobilize the leg. If possible, they should also try to elevate the leg above heart level to reduce swelling and pain.\n\nNext, they should try to signal for help and call for emergency medical assistance if they are unable to hike out on their own. If they are with a group, they should ask for assistance in getting to safety.\n\nOnce they have reached medical help, the fracture will be diagnosed and treated appropriately. Depending on the severity of the fracture, this may involve setting the bone in place with pins or a cast, or undergoing surgery to repair the damage. The person may also be prescribed pain medication to manage their discomfort.\n\nDuring the recovery process, the person should follow their doctor's inst

Let's check the response by increasing max_tokens to 1024

In [21]:
# Invoke the function by providing the question and model parameters
generate_resp_with_prompt_eng("What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?", max_tokens=1024, top_k=50, top_p=0.7, temperature=0)

Llama.generate: prefix-match hit


'\n\n1. Assess the severity of the fracture: If the fracture is open or compound, meaning the bone has pierced the skin, it is a medical emergency and the person should be transported to the nearest hospital immediately. For a closed fracture, where the bone is not visible through the skin, assess for signs of severe pain, swelling, or inability to move the leg.\n2. Provide first aid: Apply a sterile dressing to the injury to prevent infection. Splint the leg to prevent movement and provide support. Elevate the leg above heart level to reduce swelling and pain.\n3. Transport the person to medical care: If the fracture is not severe enough to require immediate hospitalization, transport the person to a medical facility as soon as possible.\n4. Pain management: Provide pain relief through over-the-counter pain medications or prescription painkillers as directed by a healthcare professional.\n5. Immobilization: The leg will likely need to be immobilized with a cast or brace to allow the b

## Data Preparation for RAG

### Loading the Data

In [22]:
# Load the PDF file for the diagnosis manual
manual_pdf = 'medical_diagnosis_manual.pdf'
pdfLoader = PyMuPDFLoader(manual_pdf)
manual = pdfLoader.load()

### Data Overview

#### Checking the first 5 pages

In [23]:
# Display the data from the first 5 pages
for i in range(5):
    print('Page #: ', (i+1))
    print(manual[i].page_content, '\n')

Page #:  1
sanket.07@yahoo.com
S15FYNH08J
eant for personal use by sanket.07@yah
shing the contents in part or full is liable 
 

Page #:  2
sanket.07@yahoo.com
S15FYNH08J
This file is meant for personal use by sanket.07@yahoo.com only.
Sharing or publishing the contents in part or full is liable for legal action.
 

Page #:  3
Table of Contents
1
Front    ................................................................................................................................................................................................................
1
Cover    .......................................................................................................................................................................................................
2
Front Matter    ...........................................................................................................................................................................................
53
1 - Nutritio

#### Checking the number of pages

In [24]:
# Get the number of pages from the loaded PDF
len(manual)

4114

PDF file has 4114 pages

### Data Chunking

In [25]:
# Split the data from the manual into chunks of text with length no more than 512 chunks
# Let's ensure overlapping of 20 chunks between consecutive chunks to that we preserve the context
text_splitter_overlap = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name='cl100k_base',
    chunk_size=512,
    chunk_overlap=20
)

In [26]:
# Load and split the data in PDF into chunks
chunks = pdfLoader.load_and_split(text_splitter_overlap)

Let's check some consecutive chunks to see if there is an overlap of 20 chunks

In [27]:
chunks[400]

Document(metadata={'source': 'medical_diagnosis_manual.pdf', 'file_path': 'medical_diagnosis_manual.pdf', 'page': 217, 'total_pages': 4114, 'format': 'PDF 1.7', 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition', 'author': '', 'subject': '', 'keywords': '', 'creator': 'Atop CHM to PDF Converter', 'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'creationDate': 'D:20120615054440Z', 'modDate': 'D:20251028030449Z', 'trapped': ''}, page_content='more prominent, acidosis is more likely. Both vomiting and diarrhea can cause hypokalemia.\nHyponatremia may develop, particularly if hypotonic fluids are used in replacement therapy.\nIn viral infections, watery diarrhea is the most common symptom; stools rarely contain mucus or blood.\nRotavirus gastroenteritis in infants and young children may last 5 to 7 days. Vomiting occurs in 90% of\npatients, and fever > 39° C (> 102.2° F) occurs in about 30%. Norovirus typically causes acute onset of\nvomiting, abdominal cramps, an

In [28]:
chunks[401]

Document(metadata={'source': 'medical_diagnosis_manual.pdf', 'file_path': 'medical_diagnosis_manual.pdf', 'page': 218, 'total_pages': 4114, 'format': 'PDF 1.7', 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition', 'author': '', 'subject': '', 'keywords': '', 'creator': 'Atop CHM to PDF Converter', 'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'creationDate': 'D:20120615054440Z', 'modDate': 'D:20251028030449Z', 'trapped': ''}, page_content='• Stool testing in select cases\nOther GI disorders that cause similar symptoms (eg, appendicitis, cholecystitis, ulcerative colitis) must be\nexcluded. Findings suggestive of gastroenteritis include copious, watery diarrhea; ingestion of potentially\ncontaminated food (particularly during a known outbreak), untreated surface water, or a known GI irritant;\nrecent travel; or contact with similarly ill people. E. coli O157:H7-induced diarrhea is notorious for\nappearing to be a hemorrhagic rather than an infectious process, 

In [29]:
chunks[402]

Document(metadata={'source': 'medical_diagnosis_manual.pdf', 'file_path': 'medical_diagnosis_manual.pdf', 'page': 218, 'total_pages': 4114, 'format': 'PDF 1.7', 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition', 'author': '', 'subject': '', 'keywords': '', 'creator': 'Atop CHM to PDF Converter', 'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'creationDate': 'D:20120615054440Z', 'modDate': 'D:20251028030449Z', 'trapped': ''}, page_content='use should have a stool assay for C. difficile toxin.\nGeneral tests: Serum electrolytes, BUN, and creatinine should be obtained to evaluate hydration and\nacid-base status in patients who appear seriously ill. CBC is nonspecific, although eosinophilia may\nindicate parasitic infection.\nTreatment\n• Oral or IV rehydration\n• Consideration of antidiarrheal agents if there is no suspicion of C. difficile or E. coli O157:H7 infection\n• Antibiotics only in select cases\nSupportive treatment is all that is needed for most pati

We can observe the overlap in the chunks in the beginning and end of the block of chunks between 2 consecutive blocks of chunks

### Embedding

Let's encode the blocks of chunks into vector representations. We will use the "gte-large" embedding model to generate the embeddings. The embedding vector size for this model is 512, which is same as the token size of our chunking strategy

In [30]:
# Download and load the "gte-large" embedding model from HuggingFace
embedding_model = SentenceTransformerEmbeddings(model_name='thenlper/gte-large')

  embedding_model = SentenceTransformerEmbeddings(model_name='thenlper/gte-large')
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.


0it [00:00, ?it/s]

modules.json:   0%|          | 0.00/385 [00:00<?, ?B/s]

README.md: 0.00B [00:00, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/57.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/619 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/670M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/342 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/125 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/191 [00:00<?, ?B/s]

### Vector Database

Let's generate the embeddings and store them in an output directory

In [31]:
# Create the output directory if it doesn't exist
out_dir = 'medical_diag_db'

if not os.path.exists(out_dir):
  os.makedirs(out_dir)

In [32]:
# Generate the embeddings and store them in the output directory
vectorstore = Chroma.from_documents(
    chunks,
    embedding_model,
    persist_directory=out_dir
)

ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given


Even though an error is observed in the console above, there is no issue in creating the vector database, as we will observe in the next section

### Retriever

In [33]:
# Define the retriever to fetch documents based on similar embeddings, using cosine similarity, and return top 5 matches
retriever = vectorstore.as_retriever(
    search_type='similarity',
    search_kwargs={'k': 5}
)

In [34]:
# Test the retriever on a sample query
relevant_chunks = retriever.invoke("What is the protocol for managing sepsis in a critical care unit?")
relevant_chunks

ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event CollectionQueryEvent: capture() takes 1 positional argument but 3 were given


[Document(metadata={'author': '', 'creationDate': 'D:20120615054440Z', 'creator': 'Atop CHM to PDF Converter', 'file_path': 'medical_diagnosis_manual.pdf', 'format': 'PDF 1.7', 'keywords': '', 'modDate': 'D:20251028030449Z', 'page': 2400, 'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'source': 'medical_diagnosis_manual.pdf', 'subject': '', 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition', 'total_pages': 4114, 'trapped': ''}, page_content="16 - Critical Care Medicine\nChapter 222. Approach to the Critically Ill Patient\nIntroduction\nCritical care medicine specializes in caring for the most seriously ill patients. These patients are best\ntreated in an ICU staffed by experienced personnel. Some hospitals maintain separate units for special\npopulations (eg, cardiac, surgical, neurologic, pediatric, or neonatal patients). ICUs have a high\nnurse:patient ratio to provide the necessary high intensity of service, including treatment and monitoring\nof physiolog

We are able to retreive the relevant chunks for the given input query

### System and User Prompt Template

Let's define the below 2 parts to create a template for the prompt for the RAG based system:
* System Message: Instructions to be given to the LLM
* User message template: Message template that contains the context derived from the document chunks, and the user query

In [35]:
# Define the system message template
qna_system_message = """
You are a medical assistant whose work is to give answers to questions with respect to a context.
User input will have the context for you to answer user questions.

The context will begin with the token: ###Context.
The context contains references to specific portions of a document relevant to the user query.

User questions will begin with the token: ###Question.

Use the context provided to find accurate and reliable answers.
If the answer is not found in the context, reply with "I don't know".
Do not mention the about the provided context in your final answer.

"""


In [36]:
# Define the user message template
qna_user_message_template = """
Conider the following ###Context and ###Question
###Context
{context}

###Question
{question}
"""


### Response Function

In [37]:
# Define a funtion that takes the user question as an input, along with some model parameters to optimize the response
# Function will use the system message and user message templates and use the LLM to evaluate the answer to the user question
def generate_rag_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.invoke(input=user_input,k=k)
    context_list = [d.page_content for d in relevant_document_chunks]

    # Combine document chunks into a single context
    context_for_query = ". ".join(context_list)

    user_message = qna_user_message_template.replace('{context}', context_for_query)
    user_message = user_message.replace('{question}', user_input)

    #prompt = qna_system_message + '\n' + user_message
    prompt = f"""[INST]{qna_system_message}\n
                {'user'}: {qna_user_message_template.format(context=context_for_query, question=user_input)}
                [/INST]"""
    # Generate the response
    #print(prompt)

    try:
        response = llm(
                  prompt=prompt,
                  max_tokens=max_tokens,
                  temperature=temperature,
                  top_p=top_p,
                  top_k=top_k,
                  stop=['INST'],
                  echo=False
                  )

        # Extract and print the model's response
        response = response['choices'][0]['text'].strip()
    except Exception as e:
        response = f'Sorry, I encountered the following error: \n {e}'

    return response

## Question Answering using RAG

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [38]:
# Invoke the function by providing the question
generate_rag_response(user_input="What is the protocol for managing sepsis in a critical care unit?" )

Llama.generate: prefix-match hit


'The protocol for managing sepsis in a critical care unit includes giving parenteral antibiotics after taking specimens for Gram stain and culture, starting very prompt empiric therapy as soon as sepsis is suspected, changing the antibiotic regimen based on culture and sensitivity results, continuing antibiotics for at least 5 days after shock resolves, draining abscesses and excising necrotic tissues, normalizing blood glucose levels, using corticosteroid therapy with replacement doses, and considering other emerging therapies like activated protein C (drotrecogin'

**Observations**
* Output generated with default value to max tokens as 128 is generating a very short response that is getting cut off towards the end
* Output is probaly not able to get all the information from the context due to the limit of max tokens

Let's increase the value of max tokens to 1024 for subsequent questions

In [39]:
# Invoke the function by providing the question and model parameters
generate_rag_response(user_input="What is the protocol for managing sepsis in a critical care unit?", max_tokens=1024 )

Llama.generate: prefix-match hit


'The protocol for managing sepsis in a critical care unit includes giving parenteral antibiotics after taking specimens for Gram stain and culture, starting very prompt empiric therapy as soon as sepsis is suspected, changing the antibiotic regimen based on culture and sensitivity results, continuing antibiotics for at least 5 days after shock resolves, draining abscesses and excising necrotic tissues, normalizing blood glucose levels, using corticosteroid therapy with replacement doses, and considering other emerging therapies like activated protein C (drotrecogin alfa) for severe sepsis and septic shock.'

**Observations**:
* Answer to the question is still short, possibly due to less number of chunks

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [40]:
# Invoke the function by providing the question and model parameters
generate_rag_response(user_input="What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?", max_tokens=1024)

Llama.generate: prefix-match hit


'The common symptoms for appendicitis include abdominal pain, anorexia, and abdominal tenderness. Appendicitis cannot be cured via medicine alone, and the standard treatment is surgical removal of the appendix through open or laparoscopic appendectomy.'

**Observations**:
* Answer to the question is very short, even with allowed 1024 output tokens, possibly due to less number of contextual chunks going into the input to the LLM

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [41]:
# Invoke the function by providing the question and model parameters
generate_rag_response("What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?", max_tokens=1024)

Llama.generate: prefix-match hit


'The effective treatments for addressing sudden patchy hair loss, also known as alopecia areata, include topical corticosteroids, topical minoxidil, topical anthralin, topical immunotherapy (diphencyprone or squaric acid dibutylester), or psoralen plus ultraviolet A (PUVA). In severe cases, systemic corticosteroids may be used. The possible causes behind sudden patchy hair loss are thought to be an autoimmune disorder affecting genetically susceptible people exposed to unclear environmental triggers.'

**Observations**:
* Answer to the question is very short, even with allowed 1024 output tokens, possibly due to less number of contextual chunks going into the input to the LLM

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [42]:
# Invoke the function by providing the question and model parameters
generate_rag_response("What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?", max_tokens=1024)

Llama.generate: prefix-match hit


"The recommended treatments for a person with a traumatic brain injury (TBI) include ensuring a reliable airway and maintaining adequate ventilation, oxygenation, and blood pressure. Surgery may be needed to place monitors to track and treat intracranial pressure, decompress the brain if intracranial pressure is increased, or remove intracranial hematomas. In the first few days after the injury, maintaining adequate brain perfusion and oxygenation and preventing complications of altered sensorium are important. Subsequently, many patients require rehabilitation. There is no specific treatment for TBI, but supportive care should include preventing systemic complications due to immobilization, providing good nutrition, and preventing pressure ulcers. For post-traumatic seizures, drugs may be given to prevent seizures if there is significant structural injury or a Glasgow Coma Scale score of less than 10. These drugs reduce the risk of seizures during the first week after injury but do no

**Observations**
* Answer is different than what is given by the model without using RAG, and appears to be precise and as per the context provided in the input

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [43]:
# Invoke the function by providing the question and model parameters
generate_rag_response("What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?", max_tokens=1024)

Llama.generate: prefix-match hit


"The necessary precautions for a person with a fractured leg include keeping the injured limb elevated above heart level for the first two days to minimize swelling, applying warmth for pain relief after 48 hours, immobilizing the joints proximal and distal to the injury, and avoiding putting objects inside the cast or getting it wet. They should also inspect the cast's edges and skin around it daily, apply lotion to any red or sore areas, pad rough edges with soft material, seek medical care if an odor emanates from within the cast or a fever develops, and maintain good hygiene.\n\nFor treatment, they may use a splint for immobilization if the injury is stable and does not require weeks of immobilization. A cast can be used for fractures that do require extended immobilization but should be checked for signs of compartment syndrome. Prolonged immobilization can lead to complications such as stiffness, contractures, and muscle atrophy.\n\nIn the field, if the person has a frostbitten e

**Observations**
* Answer is different than what is given by the model without using RAG, and appears to be precise and as per the context provided in the input

### Fine-tuning

Let's generate a response for the same questions while changing the value of some parameters of the LLM

### Query 1: What is the protocol for managing sepsis in a critical care unit?

Let's increase the number of documents into the input context

In [44]:
# Invoke the function by providing the question and model parameters
generate_rag_response(user_input="What is the protocol for managing sepsis in a critical care unit?", max_tokens=1024, k=6 )

Llama.generate: prefix-match hit


'The protocol for managing sepsis in a critical care unit includes aggressive fluid resuscitation, antibiotics, surgical excision of infected or necrotic tissues and drainage of pus, supportive care, normalization of blood glucose, corticosteroid therapy (in some cases), and activated protein C (drotrecogin alfa) for severe sepsis and septic shock if begun early. Antibiotic selection requires an educated guess based on the suspected source, clinical setting, knowledge or suspicion of causative organisms and sensitivity patterns common to that specific inpatient unit, and previous culture results. Septic foci must be eliminated to prevent further deterioration. Other emerging therapies for severe sepsis include cooling for hyperthermia and early treatment of renal failure with continuous venovenous hemofiltration. Monoclonal antibodies to the lipid A fraction of endotoxin, antileukotrienes, and antibodies to tumor necrosis factor have been unsuccessful in clinical trials. Diagnosis is c

**Observations**
* We are able to get a more detailed response with k=6, suggesting that retrieving more chunks provided additional context to formulate a more complete answer

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

Let's increase temperature to 0.2 to evaluate the answer to this question

In [45]:
# Invoke the function by providing the question and model parameters
generate_rag_response(user_input="What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?", max_tokens=1024, temperature=0.2)

Llama.generate: prefix-match hit


'The common symptoms for appendicitis include abdominal pain, anorexia, and abdominal tenderness. Appendicitis cannot be cured via medicine alone, and the standard surgical procedure to treat it is an appendectomy.'

**Observations**
* Answer is still short with increased temperature value, with a little difference from the answer with default parameters
* Answer is short possibly due to less chunks in the context

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

Let's change top_p to 0.8 to evaluate the answer to this question

In [46]:
# Invoke the function by providing the question and model parameters
generate_rag_response("What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?", max_tokens=1024, top_p=0.8)

Llama.generate: prefix-match hit


'The effective treatments for addressing sudden patchy hair loss, also known as alopecia areata, include topical corticosteroids, topical minoxidil, topical anthralin, topical immunotherapy (diphencyprone or squaric acid dibutylester), or psoralen plus ultraviolet A (PUVA). In severe cases, systemic corticosteroids may be used. The possible causes behind sudden patchy hair loss are thought to be an autoimmune disorder affecting genetically susceptible people exposed to unclear environmental triggers.'

**Observations**
* With value of top_p as 0.8, model restrics the token selection to only top 80% of the probability distribution
* Effective treatments given are very similar to what is given in the answer with default parameters
* Possible causes are also very similar, with the differnce being that this answer also warns that some treatments may have adverse effects

### Query 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

Let's decrease top_k to 25 to evaluate the answer to this question

In [47]:
# Invoke the function by providing the question and model parameters
generate_rag_response("What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?", max_tokens=1024, top_k=25)

Llama.generate: prefix-match hit


"The recommended treatments for a person with a traumatic brain injury (TBI) include ensuring a reliable airway and maintaining adequate ventilation, oxygenation, and blood pressure. Surgery may be needed to place monitors to track and treat intracranial pressure, decompress the brain if intracranial pressure is increased, or remove intracranial hematomas. In the first few days after the injury, maintaining adequate brain perfusion and oxygenation and preventing complications of altered sensorium are important. Subsequently, many patients require rehabilitation. There is no specific treatment for TBI, but supportive care should include preventing systemic complications due to immobilization, providing good nutrition, and preventing pressure ulcers. For post-traumatic seizures, drugs may be given to prevent seizures if there is significant structural injury or a Glasgow Coma Scale score of less than 10. These drugs reduce the risk of seizures during the first week after injury but do no

**Observations**
* The generated answer is almost similar to the answer generated with default parameters
* This indicates that model is choosing the same tokens even when it has a wider range of options when top_k was 50 during the first attempt to answer this question

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

Let's increase k to 6 and decrease top_p to 0.8 and top_k to 25 to evaluate the answer to this question

In [48]:
# Invoke the function by providing the question and model parameters
generate_rag_response("What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?", max_tokens=512, k=6, top_p=0.8, top_k=25)

Llama.generate: prefix-match hit


"The necessary precautions for a person with a fractured leg include immobilizing the injured limb to prevent further injury and facilitate healing. This can be achieved through the use of a cast or splint. It is important to keep the cast dry, never put any objects inside it, inspect the edges and skin around the cast daily, and seek medical care if an odor emanates from within the cast or a fever develops. The person should also be advised to avoid putting weight on the injured leg and to elevate it above heart level for the first 48 hours.\n\nFor acute care, the person's core temperature should be stabilized, and the limb should be rapidly rewarmed in large containers of circulating water kept at about 40.5°C. Pain relief through parenteral analgesics may be necessary during rewarming. Once in the hospital, the affected part should be moved to promote healing and prevent complications such as deep venous thrombosis or muscle atrophy.\n\nProlonged immobilization (more than 3-4 weeks)

**Observations**
* Answer to this question using these parameter values is more compact with slightly lesser details compared to answer with default values

## Output Evaluation

Let us now use the LLM-as-a-judge method to check the quality of the RAG system on two parameters - retrieval and generation. We illustrate this evaluation based on the answeres generated to the question from the previous section.

- We are using the same Mistral model for evaluation, so basically here the llm is rating itself on how well he has performed in the task.

Let's define the below 3 parts to create a templates for output evaluation
* Groundedness rater System Message: Instructions to be given to the LLM to evaluate the groundedness
* Relevance rater System Message: Instructions to be given to the LLM to evaluate the relevance
* User message template: Message template that contains the context derived from the document chunks, and the user query and the answer to the user query based


In [49]:
# Define the system message for groundedness rating
groundedness_rater_system_message  = """
You will be provided a question and context used by an AI system and the answer generated by the AI system

Your task is to evaluate the extent to which the answer has been derived from the context.
Use below criteria to evaluate:
Rate it 1 - if the answer is not derived from the context at all
Rate it 2 - if the answer is derived from the context only to a limited extent
Rate it 3 - if the answer is derived from the context to a good extent
Rate it 4 - if the answer is derived from the context for most of the part
Rate it 5 - if the answer is derived from the context completely

Only give a single rating along with a brief explanation for the rating

"""

In [50]:
# Define the system message for relevance rating
relevance_rater_system_message = """
You will be provided a question and context used by an AI system and the answer generated by the AI system

Your task is to evaluate the extent to which the answer is relevant to the question while considering if it directly answers the key aspect of the question based on the context provided.
Use below criteria to evaluate:
Rate it 1 - if the answer is not relevant to the question at all
Rate it 2 - if the answer is only slightly related to the question and missing key aspects
Rate it 3 - if the answer is moderately related to the question and addresses some parts of the question but leaves out important details
Rate it 4 - if the answer is mostly relevant and addresses most parts of the question
Rate it 5 - if the answer is fully relevant and addresses all aspects of the question with details from the context

Only give a single rating along with a brief explanation for the rating

"""

In [51]:
# Define the user message template
user_message_template = """
###Question
{question}

###Context
{context}

###Answer
{answer}
"""

In [52]:
# Define a funtion that takes the user question as an input, along with some model parameters to optimize the response
# Function will use the system messages for the ratings and user message templates and use the LLM to evaluate the ratings for groundedness and relevance
def generate_ground_relevance_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.invoke(input=user_input,k=3)
    context_list = [d.page_content for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)

    # Combine user_prompt and system_message to create the prompt
    prompt = f"""[INST]{qna_system_message}\n
                {'user'}: {qna_user_message_template.format(context=context_for_query, question=user_input)}
                [/INST]"""

    response = llm(
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            )

    answer =  response["choices"][0]["text"]

    # Combine user_prompt and system_message to create the prompt
    groundedness_prompt = f"""[INST]{groundedness_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    # Combine user_prompt and system_message to create the prompt
    relevance_prompt = f"""[INST]{relevance_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    # Evaluate groundedness of the answer
    response_1 = llm(
            prompt=groundedness_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            )

    # Evaluate relevance of the answer
    response_2 = llm(
            prompt=relevance_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            )

    return response_1['choices'][0]['text'],response_2['choices'][0]['text']

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [53]:
# Let's evaluate the groundedness and relevance of the answer by using k=5 and max_tokens=512
groundedness_report, relevance_report = generate_ground_relevance_response("What is the protocol for managing sepsis in a critical care unit?", k=5, max_tokens=512)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


In [54]:
# Print the groundedness report
groundedness_report

' Rating: 5\n\nExplanation: The answer is derived completely from the context as it includes all the key steps mentioned in the context for managing sepsis in a critical care unit, such as giving parenteral antibiotics after taking specimens for Gram stain and culture, starting very prompt empiric therapy, changing the antibiotic regimen based on culture and sensitivity results, continuing antibiotics for at least 5 days after shock resolves, draining abscesses and excising necrotic tissues, normalizing blood glucose levels, using corticosteroid therapy with replacement doses, and considering other emerging therapies like activated protein C (drotrecogin alfa) for severe sepsis and septic shock.'

In [55]:
# Print the relevance report
relevance_report

' Rating: 5\n\nExplanation: The answer directly addresses all aspects of the question based on the context provided. It mentions taking specimens for Gram stain and culture before giving parenteral antibiotics, starting prompt empiric therapy as soon as sepsis is suspected, changing the antibiotic regimen based on culture and sensitivity results, continuing antibiotics for at least 5 days after shock resolves, draining abscesses and excising necrotic tissues, normalizing blood glucose levels, using corticosteroid therapy with replacement doses, and considering other emerging therapies like activated protein C (drotrecogin alfa) for severe sepsis and septic shock. These are all key aspects of managing sepsis in a critical care unit as outlined in the context.'

**Observations**
* Groundedness and Relevance ratings of 5 indicate that the RAG system was able to give a fully grounded answer in the context, and answer is also highly relevant to the question

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [56]:
# Let's evaluate the groundedness and relevance of the answer by using k=5 and max_tokens=512
groundedness_report, relevance_report = generate_ground_relevance_response(user_input="What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?", max_tokens=512, k=5)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


In [57]:
# Print the groundedness report
groundedness_report

' Rating: 5\n\nExplanation: The answer is derived completely from the context. The context clearly states that the treatment for appendicitis is surgical removal (appendectomy), and it cannot be cured via medicine alone. Additionally, the symptoms mentioned in the question are also listed in the context as common symptoms of appendicitis.'

In [58]:
# Print the relevance report
relevance_report

' Rating: 5\n\nExplanation: The answer directly addresses the question by listing the common symptoms for appendicitis and stating that it cannot be cured via medicine alone, requiring surgical removal. The recommended surgical procedure, an appendectomy, is also mentioned. The context provided in the text fully supports this answer as it discusses the treatment options for appendicitis, including surgery and antibiotics.'

**Observations**
* Groundedness and Relevance ratings of 5 indicate that the RAG system was able to give a fully grounded answer in the context, and answer is also highly relevant to the question

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [59]:
# Let's evaluate the groundedness and relevance of the answer by using k=5 and max_tokens=512
groundedness_report, relevance_report = generate_ground_relevance_response("What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?", max_tokens=512, temperature=0.3)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


In [60]:
# Print the groundedness report
groundedness_report

' Rating: 5 - The answer is derived from the context completely. The context mentions various treatments for different types of hair loss, including topical corticosteroids, minoxidil, anthralin, immunotherapy, oral antimalarials, hormonal modulators, and surgical options. The answer also correctly identifies alopecia areata as a cause of sudden patchy hair loss and mentions that it is an autoimmune disorder. Therefore, the answer accurately reflects the information provided in the context.'

In [61]:
# Print the relevance report
relevance_report

' Rating: 5\n\nExplanation: The answer directly addresses the question by providing information on effective treatments for sudden patchy hair loss, which is a key aspect of the question. It also mentions possible causes, which are outlined in the context provided. The answer covers various treatment options for different types of alopecia and provides some details about the underlying causes. Therefore, it fully answers both the question and the context.'

**Observations**
* Groundedness and Relevance ratings of 5 indicate that the RAG system was able to give a fully grounded answer in the context, and answer is also highly relevant to the question

### Query 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [62]:
# Let's evaluate the groundedness and relevance of the answer by using k=5 and max_tokens=512
groundedness_report, relevance_report = generate_ground_relevance_response("What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?", max_tokens=512, top_p=0.8)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


In [63]:
# Print the groundedness report
groundedness_report

' Rating: 5\n\nExplanation: The answer is derived completely from the context as it covers all the treatments mentioned in the context for a person with a traumatic brain injury, including initial treatment, surgical interventions, supportive care, and long-term treatment for post-traumatic seizures.'

In [64]:
# Print the relevance report
relevance_report

' Rating: 5\n\nExplanation: The answer directly addresses all aspects of the question based on the context provided. It mentions the initial treatment for TBI, which includes ensuring a reliable airway and maintaining adequate ventilation, oxygenation, and blood pressure. It also mentions surgery as a possible treatment option for more severe injuries. In the first few days after the injury, it emphasizes the importance of maintaining adequate brain perfusion and oxygenation and preventing complications of altered sensorium. Subsequently, many patients require rehabilitation. The answer also discusses supportive care, including preventing systemic complications due to immobilization, providing good nutrition, and preventing pressure ulcers. Lastly, it mentions the treatment for post-traumatic seizures, which includes drugs given to prevent seizures if there is significant structural injury or a Glasgow Coma Scale score of less than 10. The answer provides details from the context regar

**Observations**
* Groundedness and Relevance ratings of 5 indicate that the RAG system was able to give a fully grounded answer in the context, and answer is also highly relevant to the question

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [65]:
# Let's evaluate the groundedness and relevance of the answer by using k=5 and max_tokens=512
groundedness_report, relevance_report = generate_ground_relevance_response("What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?", max_tokens=512, top_k=25)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


In [66]:
# Print the groundedness report
groundedness_report

' Rating: 5\n\nExplanation: The answer is derived completely from the context as it covers all the necessary precautions and treatment steps mentioned in the text for a person with a fractured leg, including immobilization using a cast or splint, elevating the injured limb, applying warmth after 48 hours, avoiding putting objects inside the cast, inspecting the edges of the cast and skin around it daily, seeking medical care if necessary, and following instructions on keeping the cast dry. The answer also mentions the potential complications of prolonged immobilization and the benefits of early mobilization, which are discussed in the context.'

In [67]:
# Print the relevance report
relevance_report

' Rating: 5\n\nExplanation: The answer directly addresses all aspects of the question by providing detailed information on the necessary precautions and treatment steps for a person with a fractured leg, as well as considerations for their care and recovery based on the context provided. It covers immobilization methods (cast or splint), elevating the injured limb, applying warmth, keeping the area clean and dry, inspecting the cast and skin, seeking medical care for potential complications, and the importance of early mobilization for some injuries to prevent complications. The answer also mentions the potential risks of prolonged immobilization and the need for hospitalization and specialized medical care for severe fractures or those requiring surgery.'

**Observations**
* Groundedness and Relevance ratings of 5 indicate that the RAG system was able to give a fully grounded answer in the context, and answer is also highly relevant to the question

## Actionable Insights and Business Recommendations

### Insights

When using Mistral  model on its own (without RAG system),
* Model is providing detailed, but generalized responses to the medical questions presented to it
* Model is providing contextually relevant responses to the questions
* Model is providing responses in a structured format, using numbered list to break down the response into points or steps
* The response is getting cut off towards the end, even with a larger token size
* Prompt engineering improved the answer slightly, but they still appear as generalized responses

When using Mistral using with a RAG system,
* Model is providing contextually relevant and accurate answers to the questions, without including too much of generalized responses
* Retriever appears to be fetching the relevant text for the user query, thus enabling the model to provide grounded and relevant answers
* For the model to provide complete and detailed answers, the model needs good amount of contextual data (k>=5)
* Varying other parameters (top_p, top_k, temperature) slightly did not have too much of an impact, but certain changes to these parameters did provide better answers

### Recommendations

* It is recommended that the healthcare professionals use the RAG based system instead of only the LLM to have their questions answered, so that the answers are more precise and accurate
* Model should be updated on a timely manner to reflect the latest medical guidelines
* Model can be further fine tuned for medical specialities like neurology, gastroenterology, hematology, etc

<font size=6 color='blue'>Power Ahead</font>
___