## Problem Statement

### Business Context

The healthcare industry is rapidly evolving, with professionals facing increasing challenges in managing vast volumes of medical data while delivering accurate and timely diagnoses. The need for quick access to comprehensive, reliable, and up-to-date medical knowledge is critical for improving patient outcomes and ensuring informed decision-making in a fast-paced environment.

Healthcare professionals often encounter information overload, struggling to sift through extensive research and data to create accurate diagnoses and treatment plans. This challenge is amplified by the need for efficiency, particularly in emergencies, where time-sensitive decisions are vital. Furthermore, access to trusted, current medical information from renowned manuals and research papers is essential for maintaining high standards of care.

To address these challenges, healthcare centers can focus on integrating systems that streamline access to medical knowledge, provide tools to support quick decision-making, and enhance efficiency. Leveraging centralized knowledge platforms and ensuring healthcare providers have continuous access to reliable resources can significantly improve patient care and operational effectiveness.

**Common Questions to Answer**

**1. Diagnostic Assistance**: "What are the common symptoms and treatments for pulmonary embolism?"

**2. Drug Information**: "Can you provide the trade names of medications used for treating hypertension?"

**3. Treatment Plans**: "What are the first-line options and alternatives for managing rheumatoid arthritis?"

**4. Specialty Knowledge**: "What are the diagnostic steps for suspected endocrine disorders?"

**5. Critical Care Protocols**: "What is the protocol for managing sepsis in a critical care unit?"

### Objective

As an AI specialist, your task is to develop a RAG-based AI solution using renowned medical manuals to address healthcare challenges. The objective is to **understand** issues like information overload, **apply** AI techniques to streamline decision-making, **analyze** its impact on diagnostics and patient outcomes, **evaluate** its potential to standardize care practices, and **create** a functional prototype demonstrating its feasibility and effectiveness.

### Data Description

The **Merck Manuals** are medical references published by the American pharmaceutical company Merck & Co. They cover a wide range of medical topics, including disorders, tests, diagnoses, and drugs. The manuals have been published since 1899, when Merck & Co. was still a subsidiary of the German company Merck.

The manual is provided as a PDF with over 4,000 pages divided into 23 sections.

## Installing and Importing Necessary Libraries and Dependencies

In [1]:
# Installation for GPU llama-cpp-python
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q


In [1]:
# For installing the libraries & downloading models from HF Hub
!pip install huggingface_hub==0.23.2 pandas==1.5.3 tiktoken==0.6.0 pymupdf==1.25.1 langchain==0.1.1 langchain-community==0.0.13 chromadb==0.4.22 sentence-transformers==2.3.1 numpy==1.25.2 -q


In [1]:
#Libraries for processing dataframes,text
import json,os
import tiktoken
import pandas as pd

#Libraries for Loading Data, Chunking, Embedding, and Vector Databases
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

#Libraries for downloading and loading the llm
from huggingface_hub import hf_hub_download
from llama_cpp import Llama


## Question Answering using LLM

#### Downloading and Loading the model

In [2]:
#Mistral instruct model seems to be trending on the huggingface website.
#The mistral-7b-instruct-v0.2.Q6_K.gguf provides a model that is 'very large with extremely low quality loss'.
model_name_path = "TheBloke/Mistral-7B-Instruct-v0.2-GGUF"
model_basename = "mistral-7b-instruct-v0.2.Q6_K.gguf"

In [3]:
#Downloading the model
model_path = hf_hub_download(
    repo_id=model_name_path,
    filename=model_basename
)


In [4]:
#Loading the model with the context set to 10000, batch size set to 512 and gpu layers set to 35
#The maximum sequence length specified on huggingface for this model is 32768, so a smaller value is selected that still fits the prompt and the response.
#Increased n_ctx to 10000 from 5000 since responses were being truncated.
lcpp_llm = Llama(
    model_path=model_path,
    n_ctx=10000,
    n_batch=512, #Using default batch size, tried others but 512 was giving good results.
    n_gpu_layers=35 #setting to 35 similar to the documentation example for this model on huggingface
)

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 


#### Response

In [5]:
# Defining a method to generate response using the above model.
def response(query,max_tokens=500,temperature=0,top_p=0.95,top_k=50):
    model_output = lcpp_llm(
      prompt="[INST]" + query + "[/INST]",
      max_tokens=max_tokens,
      temperature=temperature,
      top_p=top_p,
      top_k=top_k
    )

    return model_output['choices'][0]['text']

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
user_input = "What is the protocol for managing sepsis in a critical care unit?"
response(user_input)

'\n\nSepsis is a life-threatening condition that can arise from an infection, and it requires prompt recognition and aggressive management in a critical care unit. The following are general steps for managing sepsis in a critical care unit:\n\n1. Early recognition: Recognize the signs and symptoms of sepsis early and initiate treatment as soon as possible. Sepsis can present with various clinical features, including fever or hypothermia, tachycardia or bradycardia, altered mental status, respiratory distress, and lactic acidosis.\n2. Resuscitation: Provide adequate fluid resuscitation to maintain adequate tissue perfusion. The goal is to achieve a mean arterial pressure (MAP) of at least 65 mmHg and a central venous oxygen saturation (ScvO2) of greater than 70%.\n3. Antibiotics: Administer broad-spectrum antibiotics as soon as possible based on the suspected source of infection and local microbiology data.\n4. Source control: Identify and address the source of infection, if possible. T

**Observations:** The model understands the question and answers it well. The response is truncated, most likely because of the max_tokens=500 default; raising max_tokens should fix this.
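One way to confirm that a reply was cut off by the token limit, rather than ending naturally, is to inspect the `finish_reason` field that llama-cpp-python includes in each completion choice. The sketch below is only an illustration: the helper name `was_truncated` and the two sample dictionaries are hypothetical and merely mimic the shape of a completion result.

```python
def was_truncated(model_output):
    """Return True when the first completion choice stopped because it hit max_tokens."""
    choice = model_output["choices"][0]
    # "length" means the token limit was reached; "stop" means the model ended on its own.
    return choice.get("finish_reason") == "length"


# Hypothetical results shaped like llama-cpp-python completion dictionaries.
cut_off = {"choices": [{"text": "Sepsis is ...", "finish_reason": "length"}]}
finished = {"choices": [{"text": "Sepsis is ...", "finish_reason": "stop"}]}

print(was_truncated(cut_off))
print(was_truncated(finished))
```

To use this with the notebook's model, the `response` method would need to return the whole `model_output` dictionary rather than only the generated text.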

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [None]:
user_input = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
response(user_input)

Llama.generate: prefix-match hit


"\n\nAppendicitis is a medical condition characterized by inflammation of the appendix, a small tube-shaped organ located in the lower right side of the abdomen. The symptoms of appendicitis can vary from person to person, but some common signs include:\n\n1. Abdominal pain: The pain may start as a mild discomfort around the navel or in the lower right abdomen, which then gradually moves to the right lower quadrant and becomes more severe over time. The pain may be constant or intermittent and is often worsened by movement, coughing, or deep breathing.\n2. Loss of appetite: People with appendicitis may lose their appetite due to abdominal pain and discomfort.\n3. Nausea and vomiting: Vomiting is a common symptom of appendicitis, especially in the later stages of the condition.\n4. Fever: A fever of 100.4°F (38°C) or higher may be present in some cases of appendicitis.\n5. Constipation or diarrhea: Both constipation and diarrhea can occur with appendicitis, depending on the location and

**Observations:** Again, the model understands the question and gives a good response. The response is truncated, which can be fixed by raising max_tokens.

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
user_input = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
response(user_input)

Llama.generate: prefix-match hit


"\n\nSudden patchy hair loss, also known as alopecia areata, is a common autoimmune disorder that affects the hair follicles. It can result in round or oval bald patches on the scalp, but it can also occur on other parts of the body such as the beard area, eyebrows, or eyelashes.\n\nThe exact cause of alopecia areata is not known, but it's believed to be related to a problem with the immune system. Some possible triggers for this condition include stress, genetics, viral infections, and certain medications.\n\nThere are several treatments that have been shown to be effective in addressing sudden patchy hair loss:\n\n1. Corticosteroids: These are anti-inflammatory drugs that can help reduce inflammation and suppress the immune system's attack on the hair follicles. They can be applied topically or taken orally, depending on the severity of the condition.\n2. Minoxidil: This is a medication that has been shown to promote hair growth in some people with alopecia areata. It works by increa

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
user_input = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
response(user_input)

Llama.generate: prefix-match hit


"\n\nA person who has sustained a physical injury to brain tissue, also known as a traumatic brain injury (TBI), may require various treatments depending on the severity and location of the injury. Here are some common treatments recommended for TBIs:\n\n1. Emergency care: The first priority is to ensure the person's airway is clear, they are breathing, and their heart is beating normally. In severe cases, emergency surgery may be required to remove hematomas or other obstructions.\n2. Medications: Depending on the symptoms, medications may be prescribed to manage conditions such as swelling, seizures, pain, or infections. For example, corticosteroids may be used to reduce brain swelling, and anticonvulsants may be given to prevent seizures.\n3. Rehabilitation: Rehabilitation is an essential part of the recovery process for TBI patients. Rehabilitation may include physical therapy to help with mobility and strength, occupational therapy to help with daily living skills, speech therapy 

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [None]:
user_input = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
response(user_input)

Llama.generate: prefix-match hit


"\n\nFirst and foremost, if you suspect that someone has fractured their leg while hiking, it's essential to ensure their safety and prevent further injury. Here are some necessary precautions:\n\n1. Keep the person calm and still: Encourage them to remain as still as possible to minimize pain and prevent worsening the injury.\n2. Assess the situation: Check for any signs of shock, such as pale skin, rapid heartbeat, or shallow breathing. If you notice these symptoms, seek medical help immediately.\n3. Immobilize the leg: Use a splint, sling, or other available materials to immobilize the leg and prevent movement. Be sure not to apply too much pressure on the injury site.\n4. Provide pain relief: Offer over-the-counter pain medication, such as acetaminophen or ibuprofen, to help manage pain.\n5. Seek medical attention: If the fracture is severe or if you suspect that there may be other injuries, seek medical help as soon as possible.\n\nOnce you've ensured the person's safety and stabi

**Observations:** Queries 3, 4, and 5 also generated relevant responses. Queries 4 and 5 appear to be complete.

## Question Answering using LLM with Prompt Engineering

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
#Increasing the max_tokens value to 1000
user_input = """
What is the protocol for managing sepsis in a critical care unit?
"""
response(user_input, max_tokens=1000)

Llama.generate: prefix-match hit


"Sepsis is a life-threatening condition that requires prompt recognition and aggressive management. The following are general steps for managing sepsis in a critical care unit:\n1. Early recognition: Recognize sepsis early by identifying signs and symptoms such as fever, chills, tachycardia, respiratory distress, altered mental status, and lactic acidosis. Use validated scoring systems like the Sequential Organ Failure Assessment (SOFA) score or Quick Sequential Organ Failure Assessment (qSOFA) to help identify patients at risk for sepsis.\n2. Resuscitation: Initiate resuscitation measures as soon as possible, including administering oxygen, intravenous fluids, and vasopressors if necessary to maintain adequate tissue perfusion.\n3. Source control: Identify and address the source of infection as quickly as possible. This may involve surgical intervention or other interventions such as drainage or debridement.\n4. Antimicrobial therapy: Administer appropriate antimicrobial therapy based

**Observations:** Increasing the max_tokens value renders the complete response.
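The prompt and the generated reply share the same context window (n_ctx=10000 in the loading cell), so max_tokens can only be raised as far as the remaining space allows. The following sketch shows that arithmetic; the function names and the token counts are hypothetical and chosen only for illustration.

```python
def remaining_generation_budget(prompt_tokens, n_ctx=10000):
    """Tokens left for the reply once the prompt occupies part of the context window."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(n_ctx - prompt_tokens, 0)


def fits_in_context(prompt_tokens, max_tokens, n_ctx=10000):
    """Check that the prompt plus the requested reply length stays within n_ctx."""
    return prompt_tokens + max_tokens <= n_ctx


# Hypothetical token counts: a short question and a very long retrieved context.
print(fits_in_context(prompt_tokens=30, max_tokens=1000))
print(fits_in_context(prompt_tokens=9500, max_tokens=1000))
print(remaining_generation_budget(prompt_tokens=9500))
```

In the notebook, the prompt length could be measured with the model's own tokenizer, for example `len(lcpp_llm.tokenize(prompt.encode("utf-8")))`, assuming the installed llama-cpp-python version exposes `tokenize` with a bytes argument.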

In [None]:
user_input = """
You are an experienced medical assistant. Respond to the user question in a bulleted list.
What is the protocol for managing sepsis in a critical care unit?
"""
response(user_input)

Llama.generate: prefix-match hit


'\n- Immediate recognition and assessment of sepsis using the Sequential [Sepsis-related] Organ Failure Assessment (SOFA) score or Quick Sequential [Sepsis-related] Organ Failure Assessment (qSOFA) score.\n- Initiate resuscitation with intravenous fluids to maintain adequate blood pressure and organ perfusion.\n- Administer broad-spectrum antibiotics as soon as possible based on culture results, suspected source of infection, and local guidelines.\n- Monitor vital signs closely, including heart rate, respiratory rate, temperature, oxygen saturation, and mean arterial pressure.\n- Provide supplemental oxygen if needed to maintain adequate oxygenation.\n- Administer vasopressors if necessary to maintain adequate blood pressure.\n- Initiate mechanical ventilation if the patient is unable to breathe adequately or if there is evidence of respiratory failure.\n- Monitor and manage electrolyte imbalances, acid-base disturbances, and coagulation abnormalities as needed.\n- Provide appropriate 

**Observation:** The model provides the answer as a bulleted list, as requested in the prompt.

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [None]:
system_message="""
You are an experienced medical professional providing written advice. Respond to the user question in a professional tone.
"""

In [None]:
user_input = """
What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?
"""
response(system_message+"\n"+user_input, max_tokens=1000, temperature=0.2)

Llama.generate: prefix-match hit


"\nAppendicitis is a condition characterized by inflammation of the appendix, a small tube-shaped organ located in the lower right side of the abdomen. The symptoms for appendicitis may include:\n\n1. Sudden pain in the lower right abdomen that may begin as mild and gradually worsens over several hours or days.\n2. Loss of appetite and feeling sick to your stomach (nausea).\n3. Vomiting, especially in advanced cases.\n4. Fever, which can be low-grade at first but may rise as high as 101°F (38.3°C) or more.\n5. Constipation or diarrhea.\n6. Abdominal swelling and tenderness.\n7. Pain that worsens when you walk, cough, sneeze, or make other jarring movements.\n8. Inability to pass gas or have a bowel movement.\n\nAppendicitis cannot be cured with medicine alone as the inflammation can lead to rupture of the appendix, which can result in peritonitis – a serious infection of the abdominal cavity. If left untreated, it can lead to severe complications and even death. Therefore, if you suspe

**Observations:** The model has generated a very good response. It answers every part of the question and is well organized, with a separate paragraph for each part. The symptoms are presented as a numbered list.

Increasing max_tokens to 1000 produced the complete response.

Raising the temperature to 0.2 has little visible effect; only the wording differs slightly.
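The small effect of temperature=0.2 can be understood from how temperature rescales the model's next-token scores before sampling. The sketch below is a simplified illustration of a temperature-scaled softmax, not the sampler llama.cpp actually runs; the logits are made-up values.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; 0 is usually handled as greedy decoding")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exponentials = [math.exp(value - peak) for value in scaled]
    total = sum(exponentials)
    return [value / total for value in exponentials]


# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.5]
probs_default = softmax_with_temperature(logits, 1.0)
probs_sharp = softmax_with_temperature(logits, 0.2)

print([round(p, 3) for p in probs_default])
print([round(p, 3) for p in probs_sharp])
```

At a low temperature almost all probability mass sits on the top-scoring token, so sampling behaves much like the greedy choice made at temperature 0.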

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
user_input = """What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp?
"""
response(system_message+"\n"+user_input, top_p=0.85, max_tokens=1000 )

Llama.generate: prefix-match hit


"\nSudden patchy hair loss, also known as alopecia areata, is an autoimmune condition that causes hair loss in small, round patches on the scalp. While there is no cure for this condition, several treatments can help promote hair regrowth and potentially prevent further hair loss. Here are some effective treatment options:\n\n1. Corticosteroids: Topical or injected corticosteroids are often used to treat alopecia areata. They work by reducing inflammation in the affected area, which helps to stop the immune system from attacking the hair follicles. Topical corticosteroids can be applied directly to the bald spots, while injected corticosteroids are administered using a very fine needle.\n\n2. Minoxidil: Minoxidil is a medication that has been shown to promote hair growth in people with alopecia areata. It works by increasing blood flow to the affected area and stimulating hair follicles to produce new hairs. Minoxidil is available as a liquid or foam, which can be applied directly to t

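**Observations:** The response is relevant and complete.
Lowering top_p to 0.85 made little difference. This is expected when the temperature is 0 or close to it, since the most probable token dominates and the top_p cutoff rarely changes which token is chosen.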
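To see what top_p controls, the sketch below applies a nucleus (top-p) filter to a toy probability distribution: it keeps the smallest set of most likely tokens whose cumulative probability reaches top_p and renormalizes them. This is a simplified illustration with made-up probabilities, not the sampler used inside llama.cpp.

```python
def nucleus_filter(probs, top_p):
    """Keep the most probable tokens until their cumulative mass reaches top_p, then renormalize."""
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in the interval (0, 1]")
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept = []
    cumulative = 0.0
    for index, probability in ranked:
        kept.append((index, probability))
        cumulative += probability
        if cumulative >= top_p:
            break
    total = sum(probability for _, probability in kept)
    return {index: probability / total for index, probability in kept}


# Hypothetical next-token distribution dominated by one token.
probs = [0.9, 0.06, 0.03, 0.01]
kept_095 = nucleus_filter(probs, 0.95)
kept_085 = nucleus_filter(probs, 0.85)

print(sorted(kept_095))
print(sorted(kept_085))
```

When one token already dominates the distribution, both cutoffs leave that token as the overwhelmingly likely choice, which is consistent with the nearly unchanged response.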

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
user_input = """What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?
Provide your response in a bulleted list with detailed explanation of the benefits and risks for each treatment.
Briefly introduce your response with the factors that determine whether the brain damage is temporary or permanent.
"""
response(system_message+"\n"+user_input, max_tokens=1000)

Llama.generate: prefix-match hit


' I. Assessing the Extent and Type of Brain Injury:\n- A thorough medical evaluation is necessary to determine the severity, location, and type of brain injury.\n- Imaging studies such as CT scans or MRIs can help identify areas of damage and assess the extent of swelling or bleeding.\n\nII. Acute Care:\n- In cases of severe brain injury, immediate care may include:\n  * Controlling intracranial pressure to prevent further damage\n  * Managing seizures with anticonvulsant medications\n  * Providing oxygen and maintaining adequate blood flow through the use of ventilators or other supportive measures\n\nIII. Rehabilitation:\n- Rehabilitation programs can help individuals regain lost skills and improve overall function.\n- Depending on the specific injury, rehabilitation may include:\n  * Physical therapy to address motor impairments\n  * Occupational therapy to help with daily living activities\n  * Speech therapy for communication difficulties\n  * Cognitive rehabilitation to address m

**Observation:** Rewording the question and asking for specific information produced more specific content in the requested format than the original response.

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [None]:
user_input = """
What are the necessary treatment steps for a person who has fractured their leg during a hiking trip? What should be considered for their care and recovery?
Also, what are some precautionary measures the person must take going forward?
"""
response(system_message+"\n"+user_input)

Llama.generate: prefix-match hit


" I. Immediate Care for a Fractured Leg During Hiking:\n1. Assess the severity of the injury: Check for signs of open wounds, swelling, deformity, or inability to move the leg.\n2. Provide first aid: Apply a sterile dressing if there are any wounds, immobilize the leg using a splint or sling, and apply ice packs to reduce swelling.\n3. Seek medical attention: If the fracture is suspected, it's crucial to seek professional medical help as soon as possible.\n\nII. Treatment for a Fractured Leg:\n1. Diagnosis: X-rays will be taken to determine the type and location of the fracture.\n2. Immobilization: A cast or brace will be used to keep the bone in place while it heals.\n3. Pain management: Prescription pain medication may be necessary to manage discomfort during the healing process.\n4. Rehabilitation: Physical therapy exercises will help restore strength, flexibility, and mobility to the affected leg.\n5. Follow-up appointments: Regular check-ups with your healthcare provider are essen

**Observation:** The response is relevant to the prompt and well organized. It also reads as more professional written material, as requested in the system message.
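The prompt-engineering cells above join `system_message` and `user_input` by hand before the text is wrapped in `[INST]` tags. A small helper can make that composition explicit. This is a minimal sketch: `build_prompt` is a hypothetical name, and unlike the notebook cells it strips leading and trailing whitespace from both parts.

```python
def build_prompt(user_input, system_message=""):
    """Wrap an optional system message and a user question in [INST] ... [/INST] tags."""
    parts = [system_message.strip(), user_input.strip()]
    # Skip the system message when it is empty so no blank first line is produced.
    body = "\n".join(part for part in parts if part)
    return "[INST]" + body + "[/INST]"


# Hypothetical inputs in the style of the notebook's prompts.
with_system = build_prompt("What is sepsis?", "Respond in a professional tone.")
without_system = build_prompt("  What is sepsis?\n")

print(with_system)
print(without_system)
```

Because the helper already adds the `[INST]` tags, its result would be passed straight to the model rather than through the `response` method, which adds the tags itself.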

## Data Preparation for RAG

### Loading the Data

In [6]:
#Loading the Merck PDF Manual
manual_pdf_path = "medical_diagnosis_manual.pdf"
pdf_loader = PyMuPDFLoader(manual_pdf_path)
manual = pdf_loader.load()

### Data Overview

#### Checking the first 5 pages

In [None]:
for i in range(5):
    print(f"Page Number : {i+1}",end="\n")
    print(manual[i].page_content)

Page Number : 1
smithavn@gmail.com
3B1ENFRQJG
meant for personal use by smithavn@gm
shing the contents in part or full is liable 

Page Number : 2
smithavn@gmail.com
3B1ENFRQJG
This file is meant for personal use by smithavn@gmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.

Page Number : 3
Table of Contents
1
Front    ................................................................................................................................................................................................................
1
Cover    .......................................................................................................................................................................................................
2
Front Matter    ...........................................................................................................................................................................................
53
1 - N

**Observation:** The pages have been loaded correctly.

#### Checking the number of pages

In [None]:
print(f"The manual contains {len(manual)} pages.")

The manual contains 4114 pages.


**Observation:** The Merck Manual contains 4114 pages and all pages have been loaded.

### Data Chunking

In [7]:
#Using RecursiveCharacterTextSplitter as the text splitter to break up documents into chunks
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name='cl100k_base',
    chunk_size=512, #creating chunks of up to 512 tokens, counted with the cl100k_base encoding
    chunk_overlap= 20 #using an overlap of 20 tokens between consecutive chunks
)

In [8]:
#Load documents and split into chunks. Document chunks are returned as a list of documents
document_chunks = pdf_loader.load_and_split(text_splitter)

In [None]:
print(f"The manual contains {len(document_chunks)} chunks.")

The manual contains 8436 chunks.


In [None]:
#display content of first chunk
document_chunks[0].page_content

'smithavn@gmail.com\n3B1ENFRQJG\nmeant for personal use by smithavn@gm\nshing the contents in part or full is liable'

In [None]:
document_chunks[1].page_content

'smithavn@gmail.com\n3B1ENFRQJG\nThis file is meant for personal use by smithavn@gmail.com only.\nSharing or publishing the contents in part or full is liable for legal action.'

In [None]:
document_chunks[2].page_content

'Table of Contents\n1\nFront    ................................................................................................................................................................................................................\n1\nCover    .......................................................................................................................................................................................................\n2\nFront Matter    ...........................................................................................................................................................................................\n53\n1 - Nutritional Disorders    ...............................................................................................................................................................\n53\nChapter 1. Nutrition: General Considerations    ...........................................................................................

In [None]:
document_chunks[3].page_content

"Chapter 24. Testing for Hepatic & Biliary Disorders    ......................................................................................................\n305\nChapter 25. Drugs & the Liver    ................................................................................................................................................\n308\nChapter 26. Alcoholic Liver Disease    ....................................................................................................................................\n314\nChapter 27. Fibrosis & Cirrhosis    ............................................................................................................................................\n322\nChapter 28. Hepatitis    ..................................................................................................................................................................\n333\nChapter 29. Vascular Disorders of the Liver    .................................................

**Observation:** Document chunks have been created. Chapter 24 appears at the end of document_chunks[2] and at the beginning of document_chunks[3], which shows that consecutive chunks overlap. This is expected since we specified an overlap of 20 tokens.
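The effect of chunk_size and chunk_overlap can be illustrated with a plain sliding window over a token list. This is a simplified sketch only: RecursiveCharacterTextSplitter splits on separators and merges the pieces, so its chunk boundaries will differ. The function name and the toy token list are hypothetical.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split a token list into fixed-size windows that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


# Toy example: ten integer "tokens", windows of four with one shared token.
toy_chunks = chunk_tokens(list(range(10)), chunk_size=4, chunk_overlap=1)
for chunk in toy_chunks:
    print(chunk)
```

Each window starts with the last token of the previous one, which mirrors the repeated Chapter 24 entry seen at the boundary between the two chunks above.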

### Embedding

In [9]:
#GTE models are trained on a large-scale corpus of relevance text pairs, covering a wide range of domains and scenarios.
#Therefore, using the gte-large embedding model here
embedding_model = SentenceTransformerEmbeddings(model_name="thenlper/gte-large")


In [None]:
#Checking size of embedding for chunk0
embedding_0 = embedding_model.embed_query(document_chunks[0].page_content)
print(f"Size of embedding for chunk0 is {len(embedding_0)}.")

Size of embedding for chunk0 is 1024.


In [None]:
#Checking size of embedding for chunk1
embedding_1 = embedding_model.embed_query(document_chunks[1].page_content)
print(f"Size of embedding for chunk1 is {len(embedding_1)}.")

Size of embedding for chunk1 is 1024.


In [None]:
#Checking size of embedding for chunk2
embedding_2 = embedding_model.embed_query(document_chunks[2].page_content)
print(f"Size of embedding for chunk2 is {len(embedding_2)}.")

Size of embedding for chunk2 is 1024.


**Observation:** The embeddings for all chunks are the same size, as expected.
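Rather than checking chunks one at a time, the same property can be asserted over a whole batch. This is a minimal sketch under stated assumptions: `embedding_dimensions` is a hypothetical helper, and the toy vectors stand in for what a batch call such as `embedding_model.embed_documents(...)` would return.

```python
def embedding_dimensions(vectors):
    """Return the set of distinct vector lengths found in `vectors`."""
    return {len(v) for v in vectors}


# Toy vectors standing in for real embeddings (values are arbitrary)
toy_vectors = [[0.1] * 1024, [0.2] * 1024, [0.3] * 1024]
dims = embedding_dimensions(toy_vectors)

# A single distinct length means every embedding has the same size
assert len(dims) == 1, f"Inconsistent embedding sizes: {dims}"
print(dims)
```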

### Vector Database

In [10]:
#Creating an output directory for vectorstore
out_dir = 'merck_db'

if not os.path.exists(out_dir):
  os.makedirs(out_dir)

In [11]:
#Creating the Chroma vectorstore from the document chunks using the embedding model defined above
vectorstore = Chroma.from_documents(
    document_chunks,
    embedding_model,
    persist_directory=out_dir
)

#Reloading the persisted vectorstore from the output directory
vectorstore = Chroma(persist_directory=out_dir, embedding_function=embedding_model)

In [None]:
#Checking the 8 documents most similar to the user input
#Using similarity_search_with_score to also retrieve the distance score for each document (lower is more similar)
vectorstore.similarity_search_with_score("What is the protocol for managing sepsis in a critical care unit?",k=8)

[(Document(page_content="16 - Critical Care Medicine\nChapter 222. Approach to the Critically Ill Patient\nIntroduction\nCritical care medicine specializes in caring for the most seriously ill patients. These patients are best\ntreated in an ICU staffed by experienced personnel. Some hospitals maintain separate units for special\npopulations (eg, cardiac, surgical, neurologic, pediatric, or neonatal patients). ICUs have a high\nnurse:patient ratio to provide the necessary high intensity of service, including treatment and monitoring\nof physiologic parameters.\nSupportive care for the ICU patient includes provision of adequate nutrition (see p. 21) and prevention of\ninfection, stress ulcers and gastritis (see p. 131), and pulmonary embolism (see p. 1920). Because 15 to\n25% of patients admitted to ICUs die there, physicians should know how to minimize suffering and help\ndying patients maintain dignity (see p. 3480).\nPatient Monitoring and Testing\nSome monitoring is manual (ie, by d

**Observation:** The documents most similar to the prompt have been retrieved. The first 3 documents appear highly relevant; the 4th through 8th are less relevant to the question.
The distance scores, reported here as cosine distance, range from 0.26 to 0.32 for the retrieved documents, with the first 3 at 0.26 - 0.29. Lower distance values indicate higher similarity. Note that the metric actually used depends on how the Chroma collection is configured.
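To make the distance scores above easier to interpret, the sketch below compares cosine distance with squared L2 distance on toy vectors. It is illustrative only and uses no vector store; the helper names are hypothetical. For unit-length vectors the two metrics differ by a factor of two, so a score's meaning depends on which metric the store is configured to use.

```python
import math


def normalize(u):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(a * a for a in u))
    return [a / norm for a in u]


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means identical direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def squared_l2(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Toy 2-d vectors standing in for 1024-d embeddings
u = normalize([3.0, 4.0])
v = normalize([4.0, 3.0])
print(cosine_distance(u, v))  # roughly 0.04
print(squared_l2(u, v))       # roughly 0.08, i.e. twice the cosine distance
```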

### Retriever

In [12]:
#Setting the retriever to retrieve 8 most similar documents from the vectorstore
retriever = vectorstore.as_retriever(
    search_type='similarity',
    search_kwargs={'k': 8} #choosing 8 documents to retrieve after trying other values
)

In [None]:
#Retrieving context from the vectorstore for the specific query
rel_docs = retriever.get_relevant_documents("What is the protocol for managing sepsis in a critical care unit?")
rel_docs

[Document(page_content="16 - Critical Care Medicine\nChapter 222. Approach to the Critically Ill Patient\nIntroduction\nCritical care medicine specializes in caring for the most seriously ill patients. These patients are best\ntreated in an ICU staffed by experienced personnel. Some hospitals maintain separate units for special\npopulations (eg, cardiac, surgical, neurologic, pediatric, or neonatal patients). ICUs have a high\nnurse:patient ratio to provide the necessary high intensity of service, including treatment and monitoring\nof physiologic parameters.\nSupportive care for the ICU patient includes provision of adequate nutrition (see p. 21) and prevention of\ninfection, stress ulcers and gastritis (see p. 131), and pulmonary embolism (see p. 1920). Because 15 to\n25% of patients admitted to ICUs die there, physicians should know how to minimize suffering and help\ndying patients maintain dignity (see p. 3480).\nPatient Monitoring and Testing\nSome monitoring is manual (ie, by di

In [None]:
model_output = response("What is the protocol for managing sepsis in a critical care unit?",
                        max_tokens=1000,
                        temperature=0,
                        top_p=0.95,
                        top_k=50)

In [None]:
model_output

' Sepsis is a life-threatening condition that can arise from an infection, and it requires prompt recognition and management in a critical care unit. The following are general steps for managing sepsis in a critical care unit:\n\n1. Early Recognition: Identify sepsis early by recognizing the signs and symptoms, which may include fever or hypothermia, tachycardia or bradycardia, altered mental status, respiratory distress, and lactic acidosis. Use validated scoring systems such as the Sequential Organ Failure Assessment (SOFA) score or Quick Sequential Organ Failure Assessment (qSOFA) to help identify patients at risk for sepsis.\n2. Immediate Fluid Resuscitation: Administer intravenous fluids to maintain adequate tissue perfusion and prevent hypotension. The goal is to achieve a mean arterial pressure (MAP) of 65 mmHg or higher, and a central venous pressure (CVP) between 8-12 mmHg in mechanically ventilated patients.\n3. Antibiotics: Administer broad-spectrum antibiotics as soon as po

### System and User Prompt Template

In [13]:
qna_system_message =  """
You are a medical assistant and you need to answer the user question based on your medical knowledge.
The context for this query is provided to you. It begins with the token: ###Context.

User questions will begin with the token: ###Question.

If the answer is not found in the context, respond "I don't know".
"""

In [14]:
qna_user_message_template = """
###Context
Here are some documents that are relevant to the question mentioned below.
{context}

###Question
{question}
"""

### Response Function

In [15]:
def generate_rag_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    # Retrieve relevant document chunks
    # Note: depending on the LangChain version, the k passed here may be ignored in favor of the retriever's search_kwargs (k=8 above)
    relevant_document_chunks = retriever.get_relevant_documents(query=user_input,k=k)
    context_list = [d.page_content for d in relevant_document_chunks]

    # Combine document chunks into a single context
    context_for_query = ". ".join(context_list)

    user_message = qna_user_message_template.replace('{context}', context_for_query)
    user_message = user_message.replace('{question}', user_input)

    prompt = "[INST]" + qna_system_message + '\n' + user_message + "[/INST]"

    # Generate the response
    try:
        response = lcpp_llm(
                  prompt=prompt,
                  max_tokens=max_tokens,
                  temperature=temperature,
                  top_p=top_p,
                  top_k=top_k
                  )

        # Extract the model's response text
        response = response['choices'][0]['text'].strip()
    except Exception as e:
        response = f'Sorry, I encountered the following error: \n {e}'

    return response
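The prompt assembly inside the response function can be isolated and inspected without loading the model. The sketch below mirrors the steps above (join the chunks, fill the template, wrap in `[INST]` tags); `build_prompt` and the toy template and chunks are hypothetical names introduced for illustration.

```python
def build_prompt(system_message, user_template, context_chunks, question):
    """Assemble an [INST]-wrapped prompt from a system message, template, chunks, and question."""
    # Combine the retrieved chunks into a single context string
    context = ". ".join(context_chunks)
    # Fill the placeholders with plain string replacement, as in the function above
    user_message = user_template.replace("{context}", context)
    user_message = user_message.replace("{question}", question)
    return "[INST]" + system_message + "\n" + user_message + "[/INST]"


# Toy inputs standing in for the real system message, template, and retrieved chunks
toy_system = "Answer only from the context."
toy_template = "###Context\n{context}\n\n###Question\n{question}\n"
toy_chunks = ["Sepsis requires prompt fluid resuscitation", "Give broad-spectrum antibiotics"]

prompt = build_prompt(toy_system, toy_template, toy_chunks, "How is sepsis managed?")
print(prompt)
```

Printing the assembled prompt before calling the model is a quick way to confirm how many chunks actually reached the context.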

## Question Answering using RAG

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [16]:
user_input = "What is the protocol for managing sepsis in a critical care unit?"
generate_rag_response(user_input,max_tokens=1000)

'The protocol for managing sepsis in a critical care unit involves several steps:\n\n1. Early recognition and diagnosis: Evaluate patients with suspected infection for signs of sepsis, which may include fever, tachycardia, respiratory rate > 20 breaths per minute or PaCO2 < 32 mm Hg, WBC count > 12,000 cells/μL or < 4,000 cells/μL or > 10% immature forms. However, these criteria are not diagnostic and clinical judgment is important.\n\n2. Fluid resuscitation: Aggressively administer fluids to maintain adequate tissue perfusion and organ function. Target CVP at 8 mm Hg (10 cm H2O) or PAOP at 12 to 15 mm Hg. If a patient remains hypotensive after reaching these targets, consider adding vasopressors such as dopamine or norepinephrine.\n\n3. Antibiotics: Administer broad-spectrum antibiotics based on suspected organisms and clinical setting. Culture results should guide subsequent antibiotic selection.\n\n4. Source control: Drain abscesses, excise necrotic tissue, and address other sources

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [None]:
user_input = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
generate_rag_response(user_input, max_tokens=1000)

Llama.generate: prefix-match hit


"The common symptoms for appendicitis include epigastric or periumbilical pain followed by brief nausea, vomiting, and anorexia. After a few hours, the pain shifts to the right lower quadrant. Pain increases with cough and motion. Classic signs are right lower quadant direct and rebound tenderness located at McBurney's point. Additional signs may include pain felt in the right lower quadrant with palpation of the left lower quadrant (Rovsing sign), an increase in pain from passive extension of the right hip joint that stretches the iliopsoas muscle (psoas sign), or pain caused by passive internal rotation of the flexed thigh (obturator sign). A low-grade fever is also common.\n\nWithout surgery or antibiotics, mortality from appendicitis is over 50%. Early surgical removal is the preferred treatment for appendicitis to reduce the risk of complications such as rupture and development of an abscess or peritonitis. The surgical procedure typically involves open or laparoscopic appendectom

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
user_input = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
generate_rag_response(user_input, max_tokens=1000)

Llama.generate: prefix-match hit


"Based on the context provided, the condition being described is likely Alopecia Areata. The treatment options for Alopecia Areata include:\n\n1. Corticosteroids: Triamcinolone acetonide suspension can be injected intradermally into small lesions, while potent topical corticosteroids like betamethasone 0.05% can be used. However, oral corticosteroids are effective but have adverse effects and hair loss recurs after cessation of therapy.\n2. Topical anthralin: It may be used in combination with minoxidil for treating Alopecia Areata.\n3. Minoxidil: This medication can be applied topically to stimulate hair growth.\n4. Induction of allergic contact dermatitis using diphencyprone or squaric acid dibutylester: This treatment is best reserved for patients with diffuse involvement who have not responded to other therapies.\n5. Scalp biopsy: It may be necessary when diagnosis is in doubt and can differentiate scarring from nonscarring forms of alopecia.\n\nThe possible causes behind sudden pa

### Query 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
user_input = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
generate_rag_response(user_input, max_tokens=1000)

Llama.generate: prefix-match hit


'Based on the context provided, the following treatments are recommended for a person with traumatic brain injury (TBI):\n\n1. Initial treatment: Ensuring a reliable airway and maintaining adequate ventilation, oxygenation, and blood pressure. Surgery may be needed in patients with more severe injuries to place monitors to track and treat intracranial pressure, decompress the brain if intracranial pressure is increased, or remove intracranial hematomas.\n2. Supportive care: Preventing systemic complications due to immobilization (such as pneumonia, UTI, thromboembolic disease), providing good nutrition, preventing pressure ulcers, and physical therapy to prevent limb contractures.\n3. Seizure prevention: In patients with significant structural injury or a Glasgow Coma Scale score < 10, a prophylactic anticonvulsant should be considered to prevent seizures during the first week after injury.\n4. Long-term treatment: No single drug controls all types of seizures, and different patients m

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [None]:
user_input = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
generate_rag_response(user_input, max_tokens=1000)

Llama.generate: prefix-match hit


"Based on the context provided, here's the answer:\n\nA fractured leg requires immediate medical attention. The first step is to immobilize the injured limb using a splint or sling to prevent further injury and decrease pain. Pain can be managed with opioids if necessary. Once the patient is in the hospital, core temperature should be stabilized, and the leg should be rapidly rewarmed in large containers of circulating water kept at about 40.5°C for around 15 to 30 minutes. The use of parenteral analgesics may also be necessary"

All queries have been answered. max_tokens was set to 1000 rather than the default of 128 so that the complete response could be generated. The generation parameters are tuned further below.

### Fine-tuning

#### Q1: What is the protocol for managing sepsis in a critical care unit?

In [17]:
#Setting temperature parameter to see if the response generated is more creative
user_input = "What is the protocol for managing sepsis in a critical care unit?"
generate_rag_response(user_input,temperature= 0.9)

Llama.generate: prefix-match hit


'The protocol for managing sepsis in a critical care unit involves several steps:\n\n1. Early recognition and diagnosis of sepsis based on clinical signs and symptoms such as fever, hypotension, oliguria, and confusion.\n2. Aggressive fluid resuscitation to maintain adequate tissue perfusion and prevent or reverse organ dysfunction.\n3. Antibiotic therapy based on suspected source and sensitivity patterns of causative organisms.\n4. Surgical excision of infected or necrotic tissues and drainage of pus when necessary.\n5. Supportive care'

**Observation:** The response differs slightly from the original response, but it is still relevant and refers to the context provided.
A temperature of 0.9 did not change the answer much.
The generated response is also truncated with the default max_tokens value.
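Truncation can be detected programmatically instead of by reading the output. The sketch below assumes the completion dictionary follows the OpenAI-style layout, where each choice carries a `finish_reason` of `"length"` when the token limit was hit; `was_truncated` is a hypothetical helper, and the dictionaries are mocks rather than real model output.

```python
def was_truncated(completion: dict) -> bool:
    """Return True if the first choice stopped because it hit the token limit."""
    return completion["choices"][0].get("finish_reason") == "length"


# Mock completions (assumed shape; not real model output)
cut_off = {"choices": [{"text": "5. Supportive care", "finish_reason": "length"}]}
complete = {"choices": [{"text": "I don't know.", "finish_reason": "stop"}]}

print(was_truncated(cut_off))
print(was_truncated(complete))
```

A check like this could trigger a retry with a larger max_tokens value when a response is cut off.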


In [18]:
#Increasing max_tokens parameter value to allow the response to be complete
user_input = "What is the protocol for managing sepsis in a critical care unit?"
generate_rag_response(user_input, temperature=0.9, max_tokens=1000)

Llama.generate: prefix-match hit


'The protocol for managing sepsis in a critical care unit includes the following steps:\n\n1. Early recognition and evaluation of sepsis, which may include signs of infection, systemic inflammation, and organ dysfunction.\n2. Aggressive fluid resuscitation to maintain adequate tissue perfusion until CVP or PAOP reaches target levels.\n3. Use of vasopressors (such as dopamine or norepinephrine) if the patient remains hypotensive after fluid resuscitation.\n4. Administration of oxygen and possible mechanical ventilation for respiratory failure.\n5. Broad-spectrum antibiotics based on suspected source, clinical setting, and previous culture results. Antibiotic selection may require an educated guess and adjustment based on culture and sensitivity results.\n6. Surgical excision of infected or necrotic tissues and drainage of pus to eliminate septic foci.\n7. Supportive care, including monitoring vital signs, electrolyte levels, renal function, and blood glucose levels.\n8. Normalization of

**Observation:** Setting max_tokens to 1000 generates the complete response. The wording differs from the previous run, but the content still refers to sepsis management.

#### Q2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [31]:
user_input = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
generate_rag_response(user_input, top_p=0.5, max_tokens=5000)

Llama.generate: prefix-match hit


"The common symptoms for appendicitis include epigastric or periumbilical pain that shifts to the right lower quadrant, brief nausea, vomiting, anorexia, and pain that increases with coughing and movement. Right lower quadrant direct and rebound tenderness at McBurney's point is also a classic sign. However, these symptoms may not be present in all cases, particularly in children and elderly patients or pregnant women.\n\nAppendicitis cannot be cured via medicine alone as it requires surgical intervention to remove the inflamed appendix. The standard procedure for treating appendicitis is an appendectomy, which can be performed through open surgery or laparoscopically. Antibiotics are usually administered before and after the surgery to prevent infection and reduce complications. If the appendix has ruptured and formed an abscess, drainage of the abscess may be necessary prior to the appendectomy."

#### Q3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [30]:
user_input = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
generate_rag_response(user_input,temperature= 0.5, max_tokens=1000)

Llama.generate: prefix-match hit


"The document provided discusses Alopecia Areata, a condition characterized by sudden patchy hair loss. The treatment options for Alopecia Areata include:\n\n1. Corticosteroids: Triamcinolone acetonide suspension or potent topical corticosteroids can be used to treat small lesions. Oral corticosteroids are effective but have adverse effects and hair loss recurs after cessation of therapy.\n2. Topical anthralin: It may be used in combination with minoxidil for treating Alopecia Areata.\n3. Minoxidil: It can be applied topically to stimulate hair growth.\n4. Induction of allergic contact dermatitis using diphencyprone or squaric acid dibutylester: This treatment is best reserved for patients with diffuse involvement who have not responded to other therapies.\n5. Oral antimalarials may be used to treat Lichen Planopilaris and chronic cutaneous lupus lesions, which can also cause patchy hair loss.\n\nThe possible causes behind sudden patchy hair loss include autoimmune disorders, genetic s

#### Q4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [29]:
user_input = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
generate_rag_response(user_input,top_p=0.85, max_tokens=1000)

Llama.generate: prefix-match hit


"Based on the context provided, the initial treatment for a person with a traumatic brain injury (TBI) includes ensuring a reliable airway and maintaining adequate ventilation, oxygenation, and blood pressure. Surgery may be needed for patients with more severe injuries to place monitors to track and treat intracranial pressure, decompress the brain if intracranial pressure is increased, or remove intracranial hematomas. In the first few days after the injury, maintaining adequate brain perfusion and oxygenation and preventing complications of altered sensorium are important. Subsequently, many patients require rehabilitation.\n\nSupportive care is essential for TBI patients, including preventing systemic complications due to immobilization, providing good nutrition, and preventing pressure ulcers. For patients with significant structural injury or a Glasgow Coma Scale (GCS) score < 10, a prophylactic anticonvulsant should be considered to prevent seizures. Seizures can worsen brain da

#### Q5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [28]:
user_input = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
generate_rag_response(user_input,top_k=100, temperature=0.2, max_tokens=1000)

Llama.generate: prefix-match hit


'The context provides information on the treatment for fractures, which includes the following necessary precautions and treatment steps:\n\n1. Immobilization: The injured limb should be immobilized to prevent further injury and facilitate healing. This can be done using a cast or a splint, depending on the severity of the injury and the length of immobilization required.\n2. Elevation: The injured leg should be elevated above heart level for the first 48 hours to help minimize swelling and drain edema fluid.\n3. Application of heat: After 48 hours, periodic application of warmth (such as a heating pad) for 15 to 20 minutes may relieve pain and speed healing.\n4. Keeping the cast dry: Patients with casts should be instructed to keep them dry to prevent moisture from entering and causing skin irritation or infection.\n5. Avoiding putting objects inside the cast: Patients should not put any objects inside their cast as it may cause pressure points or other complications.\n6. Inspecting t

#### Q6: Random question not related to the medical field.

In [None]:
#Trying a prompt that does not have a context in the manual
user_input = "Who won the Super Bowl in 2025?"
generate_rag_response(user_input,top_k=20, temperature=0)

Llama.generate: prefix-match hit


"I don't know. The context provided does not contain any information about the Super Bowl winner in 2025."

**Observations:**
All questions have been answered, and the responses with different parameters are shown above. Groundedness and relevance are checked next to evaluate the output.
The model also correctly states that it doesn't know the Super Bowl winner of 2025.
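To clarify what the temperature, top_k, and top_p parameters tried above control, the sketch below applies the textbook versions of these filters to a toy next-token distribution. It is an illustration only: `filter_distribution` and the toy logits are hypothetical, and the sampler inside the actual model runtime may apply the steps in a different order or with extra stages.

```python
import math


def filter_distribution(logits, temperature=1.0, top_k=50, top_p=0.95):
    """Return {token: probability} after temperature scaling, top-k, and top-p filtering."""
    if temperature <= 0:
        # Temperature 0 is treated as greedy decoding: keep only the best token
        best = max(logits, key=logits.get)
        return {best: 1.0}
    # Temperature scaling followed by a numerically stable softmax
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((tok, e / total) for tok, e in exps.items()),
                    key=lambda pair: pair[1], reverse=True)
    # top-k: keep only the k most likely tokens
    ranked = ranked[:top_k]
    # top-p: keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1
    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}


# Toy logits for four candidate next tokens (hypothetical values)
toy_logits = {"fluids": 2.0, "antibiotics": 1.5, "oxygen": 0.5, "rest": -1.0}
print(filter_distribution(toy_logits, temperature=0))
print(filter_distribution(toy_logits, temperature=1.0, top_p=0.5))
print(filter_distribution(toy_logits, temperature=1.0, top_p=0.95))
```

With these toy values, a low top_p or a temperature of 0 leaves a single candidate, which is consistent with the answers above changing only modestly when one parameter is varied at a time.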

## Output Evaluation

In [19]:
groundedness_rater_system_message  = """
You are tasked with rating AI generated answers to questions posed by users.
You will be presented a question, context used by the AI system to generate the answer and an AI generated answer to the question.
In the input, the question will begin with ###Question, the context will begin with ###Context while the AI generated answer will begin with ###Answer.

Evaluation criteria:
The task is to judge the extent to which the metric is followed by the answer.
1 - The metric is not followed at all
2 - The metric is followed only to a limited extent
3 - The metric is followed to a good extent
4 - The metric is followed mostly
5 - The metric is followed completely

Metric:
The answer should be derived only from the information presented in the context

Instructions:
1. First write down the steps that are needed to evaluate the answer as per the metric.
2. Give a step-by-step explanation if the answer adheres to the metric considering the question and context as the input.
3. Next, evaluate the extent to which the metric is followed.
4. Use the previous information to rate the answer using the evaluation criteria and assign a score.
"""

In [20]:
relevance_rater_system_message = """
You are tasked with rating AI generated answers to questions posed by users.
You will be presented a question, context used by the AI system to generate the answer and an AI generated answer to the question.
In the input, the question will begin with ###Question, the context will begin with ###Context while the AI generated answer will begin with ###Answer.

Evaluation criteria:
The task is to judge the extent to which the metric is followed by the answer.
1 - The metric is not followed at all
2 - The metric is followed only to a limited extent
3 - The metric is followed to a good extent
4 - The metric is followed mostly
5 - The metric is followed completely

Metric:
Relevance measures how well the answer addresses the main aspects of the question, based on the context.
Consider whether all and only the important aspects are contained in the answer when evaluating relevance.

Instructions:
1. First write down the steps that are needed to evaluate the context as per the metric.
2. Give a step-by-step explanation if the context adheres to the metric considering the question as the input.
3. Next, evaluate the extent to which the metric is followed.
4. Use the previous information to rate the context using the evaluation criteria and assign a score.
"""

In [21]:
user_message_template = """
###Question
{question}

###Context
{context}

###Answer
{answer}

"""

In [22]:
def generate_ground_relevance_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.get_relevant_documents(query=user_input,k=k)
    context_list = [d.page_content for d in relevant_document_chunks]
    # Combine document chunks into a single context
    context_for_query = ". ".join(context_list)

    # Combine user_prompt and system_message to create the prompt
    prompt = f"""[INST]{qna_system_message}\n
                {'user'}: {qna_user_message_template.format(context=context_for_query, question=user_input)}
                [/INST]"""

    print("prompt:", prompt)

    response = lcpp_llm(
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    answer =  response["choices"][0]["text"]

    # Build the groundedness rater prompt from the question, context, and generated answer
    groundedness_prompt = f"""[INST]{groundedness_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    # Build the relevance rater prompt from the question, context, and generated answer
    relevance_prompt = f"""[INST]{relevance_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    response_1 = lcpp_llm(
            prompt=groundedness_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    response_2 = lcpp_llm(
            prompt=relevance_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    return response_1['choices'][0]['text'],response_2['choices'][0]['text']
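The rater responses returned above are free text, so comparing runs requires pulling out the numeric score. The sketch below is one possible approach, not part of the original notebook: `parse_rating` is a hypothetical helper that looks for a digit from 1 to 5 shortly after a word such as "score", "rating", or "rate", and returns the last such match. Rater output that phrases the score differently, or is truncated before the score, yields `None`.

```python
import re

# A rating keyword, up to 40 non-digit characters, then a standalone digit 1-5
_RATING_PATTERN = re.compile(
    r"\b(?:score|rating|rated|rate)\b\D{0,40}?\b([1-5])\b",
    re.IGNORECASE,
)


def parse_rating(rater_text):
    """Return the last 1-5 rating mentioned after a rating keyword, or None."""
    matches = _RATING_PATTERN.findall(rater_text)
    return int(matches[-1]) if matches else None


# Sample text adapted from the rater output format shown in this notebook
sample = "I would rate the answer with a score of 5 because it fully adheres to the metric."
print(parse_rating(sample))
print(parse_rating("Rating:\nBased on the evalu"))
```

Because the rater is itself a language model, parsed scores are best treated as rough signals and spot-checked against the full explanation.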

In [23]:
#Checking groundedness and relevance for Q1 with temperature=0.9 and max_tokens=1000
user_input = "What is the protocol for managing sepsis in a critical care unit?"
ground,rel = generate_ground_relevance_response(user_input,temperature=0.9, max_tokens=1000)


print(ground,end="\n\n")
print(rel)

prompt: [INST]
You are a medical assistant and you need to answer the user question based on your medical knowledge.
The context for this query is provided to you. It begins with the token: ###Context.

User questions will begin with the token: ###Question.

If the answer is not found in the context, respond "I don't know".


                user: 
###Context
Here are some documents that are relevant to the question mentioned below.
16 - Critical Care Medicine
Chapter 222. Approach to the Critically Ill Patient
Introduction
Critical care medicine specializes in caring for the most seriously ill patients. These patients are best
treated in an ICU staffed by experienced personnel. Some hospitals maintain separate units for special
populations (eg, cardiac, surgical, neurologic, pediatric, or neonatal patients). ICUs have a high
nurse:patient ratio to provide the necessary high intensity of service, including treatment and monitoring
of physiologic parameters.
Supportive care for the ICU 

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 The steps mentioned in the answer are largely derived from the context provided. The context discusses various aspects of managing critically ill patients, including the importance of fluid resuscitation, use of vasopressors, mechanical ventilation, antibiotics, drainage of abscesses and necrotic tissue, normalization of blood glucose levels, corticosteroid therapy, and monitoring in the context of sepsis and septic shock. The answer explicitly lists out these steps in managing sepsis in a critical care unit.

The answer follows the metric as it is derived solely from the information presented in the context and does not include any additional or external information. Therefore, the extent to which the metric is followed is 5 - The metric is followed completely.

Based on the evaluation criteria, I would rate the answer with a score of 5 because it fully adheres to the given metric by being derived solely from the context and not including any additional or external information.

 To 

**Observation:** With temperature=0.9 and max_tokens=1000, the AI-generated response is rated a 5 for groundedness because it is derived solely from the context and does not include any additional or external information.

The context is rated only a 3, since the response mentions ICU patient care, supportive care, and monitoring, which relate to managing critically ill patients and are somewhat relevant to the question but do not directly address sepsis treatment.

In [24]:
#Checking groundedness and relevance for Q2 with top_p=0.5, temperature=0 (default) and max_tokens=5000
user_input = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
ground,rel = generate_ground_relevance_response(user_input, top_p=0.5, max_tokens=5000)


print(ground,end="\n\n")
print(rel)

prompt: [INST]
You are a medical assistant and you need to answer the user question based on your medical knowledge.
The context for this query is provided to you. It begins with the token: ###Context.

User questions will begin with the token: ###Question.

If the answer is not found in the context, respond "I don't know".


                user: 
###Context
Here are some documents that are relevant to the question mentioned below.
• Surgical removal
• IV fluids and antibiotics
Treatment of acute appendicitis is open or laparoscopic appendectomy; because treatment delay
increases mortality, a negative appendectomy rate of 15% is considered acceptable. The surgeon can
usually remove the appendix even if perforated. Occasionally, the appendix is difficult to locate: In these
cases, it usually lies behind the cecum or the ileum and mesentery of the right colon. A contraindication to
appendectomy is inflammatory bowel disease involving the cecum. However, in cases of terminal ileitis
and 

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Steps to evaluate the answer:
1. Identify the question and the specific information being asked for in the question.
2. Read through the context provided to understand the background information related to appendicitis, its symptoms, diagnosis, and treatment.
3. Compare the information given in the context with the answer to determine if the answer is derived solely from the context.

Evaluation:
The answer adheres to the metric to a good extent as it includes the common symptoms for appendicitis mentioned in the context, such as epigastric or periumbilical pain that shifts to the right lower quadrant, brief nausea, vomiting, and anorexia. The answer also mentions that pain increases with cough and motion and that classic signs include right lower quadrant direct and rebound tenderness located at McBurney's point. Additionally, the answer states that appendicitis cannot be cured via medicine alone and requires surgical removal of the appendix for treatment.

Rating:
Based on the evalu

**Observation:** With top_p=0.5 and max_tokens=5000, the AI-generated answer is rated a 3. The answer includes most of the relevant information from the context but does not cover all aspects of the question, such as the specific symptoms that may be present in infants, children, elderly patients, and pregnant women, or the use of imaging studies for diagnosis. However, the answer accurately conveys the main points regarding the symptoms and treatment of appendicitis as presented in the context.

The context is completely relevant and rated a 5.

In [25]:
#Checking relevance for Q3 with temperature=0.5 and max_tokens=1000
user_input = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
ground,rel = generate_ground_relevance_response(user_input, temperature= 0.5, max_tokens=1000)


print(ground,end="\n\n")
print(rel)

prompt: [INST]
You are a medical assistant and you need to answer the user question based on your medical knowledge.
The context for this query is provided to you. It begins with the token: ###Context.

User questions will begin with the token: ###Question.

If the answer is not found in the context, respond "I don't know".


                user: 
###Context
Here are some documents that are relevant to the question mentioned below.
corticosteroids, retinoids, or immunosuppressants.
Hair loss due to chemotherapy is temporary and is best treated with a wig; when hair regrows, it may be
different in color and texture from the original hair. Hair loss due to telogen effluvium or anagen effluvium
is usually temporary as well and abates after the precipitating agent is eliminated.
Key Points
• Androgenetic alopecia (male-pattern and female-pattern hair loss) is the most common type of hair loss.
• Concomitant virilization in women or scarring hair loss should prompt a thorough evaluation fo

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 To evaluate the answer, we need to follow these steps:

1. Identify the information in the context that relates to the question about effective treatments and possible causes of sudden patchy hair loss (alopecia areata).
2. Determine if the AI-generated answer is derived solely from the provided context without introducing any new or irrelevant information.
3. Evaluate the extent to which the metric is followed by the answer based on the above steps.
4. Assign a score according to the evaluation criteria.

Step 1: The context provides detailed information about alopecia areata, its causes, and various treatment options.

Step 2: The AI-generated answer mentions corticosteroids, topical anthralin, minoxidil, immunotherapy (diphencyprone or squaric acid dibutylester), and the possible autoimmune nature of alopecia areata as causes. All these details are directly taken from the context.

Step 3: The metric is followed mostly by the answer since it mentions all the treatment options provi

**Observation:** With a temperature of 0.5, the generated answer is rated a 4. The metric is followed mostly by the answer since it mentions all the treatment options provided in the context without introducing any new or irrelevant information.

Based on the evaluation criteria and steps above, I would rate the AI-generated answer as a 4 (The metric is followed mostly).

The context is highly relevant to the question as it provides detailed information about alopecia areata, its causes, and various treatment options. The context covers all important aspects of the question and is rated a 5.

In [26]:
#Checking relevance for Q4 with top_p=0.85 and max_tokens=1000
user_input = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
ground,rel = generate_ground_relevance_response(user_input, top_p=0.85, max_tokens=1000)


print(ground,end="\n\n")
print(rel)

prompt: [INST]
You are a medical assistant and you need to answer the user question based on your medical knowledge.
The context for this query is provided to you. It begins with the token: ###Context.

User questions will begin with the token: ###Question.

If the answer is not found in the context, respond "I don't know".


                user: 
###Context
Here are some documents that are relevant to the question mentioned below.
Chapter 324. Traumatic Brain Injury
Introduction
Traumatic brain injury (TBI) is physical injury to brain tissue that temporarily or permanently
impairs brain function. Diagnosis is suspected clinically and confirmed by imaging (primarily
CT). Initial treatment consists of ensuring a reliable airway and maintaining adequate
ventilation, oxygenation, and blood pressure. Surgery is often needed in patients with more
severe injury to place monitors to track and treat intracranial pressure, decompress the brain if
intracranial pressure is increased, or remove i

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Steps to evaluate the answer:
1. Identify the key information related to treatments for traumatic brain injury from the context.
2. Compare the information identified in step 1 with the AI generated answer to determine if all the recommended treatments are included and derived only from the context.

Explanation:
The AI generated answer includes all the recommended treatments for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function as per the context. The answer is derived entirely from the information provided in the context. Therefore, the metric is followed completely.

Rating:
Based on the evaluation criteria and the complete adherence to the metric, I would rate the AI generated answer a 5.

 To evaluate the context as per the metric of relevance, the following steps are needed:

1. Identify the main aspects of the question: The question asks about treatments for a person with brain injury resulting in impa

**Observation:** With top_p=0.85 and max_tokens=1000, the AI generated answer is rated a 5. The AI generated answer includes all the recommended treatments for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function as per the context.

The context is also rated a 5.

In [27]:
#Checking relevance for Q5 with top_k=100, temperature=0.2 and max_tokens=1000
user_input = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
ground,rel = generate_ground_relevance_response(user_input, top_k=100, temperature=0.2, max_tokens=1000)


print(ground,end="\n\n")
print(rel)

prompt: [INST]
You are a medical assistant and you need to answer the user question based on your medical knowledge.
The context for this query is provided to you. It begins with the token: ###Context.

User questions will begin with the token: ###Question.

If the answer is not found in the context, respond "I don't know".


                user: 
###Context
Here are some documents that are relevant to the question mentioned below.
batting, and layers 2 and 4 are elastic bandages. The injured limb is elevated above the heart for the first
2 days in a position that allows gravity to help drain edema fluid and thus minimize swelling. After 48 h,
periodic application of warmth (eg, a heating pad) for 15 to 20 min may relieve pain and speed healing.
Immobilization: Immobilization decreases pain and facilitates healing by preventing further injury and is
helpful except for very rapidly healing injuries. Joints proximal and distal to the injury should be
immobilized.
A cast is usually used 

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Steps to evaluate the answer:
1. Identify the necessary precautions and treatment steps mentioned in the context for a person with a fractured leg.
2. Check if the AI generated answer includes all the identified precautions and treatment steps.
3. Evaluate the extent to which the answer is derived only from the information presented in the context.

The AI generated answer adheres to the metric as it includes all the necessary precautions and treatment steps mentioned in the context for a person with a fractured leg. Therefore, the answer is:
5 - The metric is followed completely.

Based on the evaluation criteria, I would rate the answer as:
5 - The metric is followed completely.

Therefore, the score would be 5.

 To evaluate the context as per the metric of relevance, the following steps are needed:

1. Identify the main aspects of the question: The question asks about the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, as we

**Observation:** With top_k=100, temperature=0.2 and max_tokens=1000, the answer is rated a 5. It includes all the necessary precautions and treatment steps mentioned in the context for a person with a fractured leg.

The context is rated a 5. The context covers all the important aspects related to the precautions and treatment steps for a person with a fractured leg, as well as what should be considered for their care and recovery.

## Actionable Insights and Business Recommendations

* The vector database took a while to create. As the size of the PDF document(s) to be loaded increases, so do the storage space and the time needed to create the vector store.

* The retrieval parameter k is critical because the answer can be spread across multiple contexts. Setting k=8 seemed sufficient for the context to be completely relevant for all questions.

* Setting chunk_overlap to 20 ensures continuity across chunks.

* max_tokens depends on the complexity of the response. Detailed responses need higher values, while simple queries result in concise outputs in spite of large token limits. In this project, max_tokens=1000 seemed to satisfy the needs of all questions asked.

* Prompt design can influence how the response is formatted and what specific aspects to focus on. Specific questions will generate specific answers.

* The temperature value affects the variability of the response. With temperature set to 0.5, the response was rated a 4. Temperature settings of 0.2 or lower yielded responses that were given an evaluation rating of 5.

* top_p influences the response quality. With top_p=0.5, the response was rated a 3. Higher values generated better-rated responses: in this scenario, top_p of 0.85 or greater generated responses that were given an evaluation rating of 5.

* Default value top_k=50 generated quality responses. There is no need to increase the top_k value in this case.

* The responses when using RAG always started with "Based on the context...", which seems to be something the Mistral model is trained to generate and is hard to get rid of.
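To make the temperature, top_k and top_p observations above more concrete, here is a minimal sketch of how these three settings narrow the set of tokens a model can sample from. It is an illustration under stated assumptions, not the implementation used by the Mistral model or by the inference library in this notebook: the function name `filter_distribution`, the toy logits, and the order in which the filters are applied are all hypothetical, and real libraries may order or combine these steps differently.

```python
import math


def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return renormalized token probabilities after temperature, top-k and top-p.

    `logits` maps token -> raw score. top_k=0 disables the top-k filter.
    This is an illustrative sketch, not any library's actual sampler.
    """
    if temperature <= 0:
        # Assumption: treat temperature 0 as greedy decoding.
        best = max(logits, key=logits.get)
        return {best: 1.0}

    # Temperature scaling followed by a numerically stable softmax.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(
        ((tok, weight / total) for tok, weight in weights.items()),
        key=lambda item: item[1],
        reverse=True,
    )

    # top-k: keep only the k most likely tokens, then renormalize.
    if top_k > 0:
        ranked = ranked[:top_k]
        kept_mass = sum(prob for _, prob in ranked)
        ranked = [(tok, prob / kept_mass) for tok, prob in ranked]

    # top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}


# Hypothetical scores for four candidate next tokens.
example_logits = {"embolism": 2.0, "infarction": 1.0, "effusion": 0.5, "edema": 0.0}

for p in (0.5, 0.85):
    survivors = filter_distribution(example_logits, top_p=p)
    print(f"top_p={p}: {len(survivors)} candidate token(s) remain")
```

With these toy scores, a low top_p leaves a single candidate while a higher top_p leaves several, which is one way to picture why the restrictive setting can produce narrower answers. The ratings reported above come from the notebook runs, not from this sketch.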

In my case, the Mistral model that I used seemed to generate good responses, and it seemed to understand the questions even without RAG. For proprietary information, RAG would be very useful. The temperature, top_p and top_k values can be adjusted based on the size and number of documents. The k value for the similarity_search can be adjusted depending on how dispersed the information is in the documents: when information is spread across many chunks, k may need to be larger to generate good responses. chunk_size and chunk_overlap can be adjusted to ensure coherence between chunks.
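As an illustration of how chunk_size and chunk_overlap interact, the following sketch splits text into fixed-size windows that share a few characters with their neighbours. It is a simplified stand-in, not the splitter used in this notebook: the helper name `split_with_overlap` is hypothetical, it counts characters instead of tokens, and it ignores sentence and paragraph boundaries.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# Sentence taken from the retrieved context shown earlier in this notebook.
sample = "Immobilization decreases pain and facilitates healing by preventing further injury."
for chunk in split_with_overlap(sample, chunk_size=40, chunk_overlap=10):
    print(repr(chunk))
```

Each chunk repeats the last chunk_overlap characters of the previous one, so a sentence cut at a boundary still appears intact in at least one neighbouring chunk as long as the overlap is large enough. A larger overlap improves continuity but creates more chunks to embed and store.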
___

