<a href="https://colab.research.google.com/github/r-autowired/AIMLWorks/blob/main/Notes/NLP/Pi5_Proj_NLP_RAG_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Problem Statement

### Business Context

The healthcare industry is rapidly evolving, with professionals facing increasing challenges in managing vast volumes of medical data while delivering accurate and timely diagnoses. The need for quick access to comprehensive, reliable, and up-to-date medical knowledge is critical for improving patient outcomes and ensuring informed decision-making in a fast-paced environment.

Healthcare professionals often encounter information overload, struggling to sift through extensive research and data to create accurate diagnoses and treatment plans. This challenge is amplified by the need for efficiency, particularly in emergencies, where time-sensitive decisions are vital. Furthermore, access to trusted, current medical information from renowned manuals and research papers is essential for maintaining high standards of care.

To address these challenges, healthcare centers can focus on integrating systems that streamline access to medical knowledge, provide tools to support quick decision-making, and enhance efficiency. Leveraging centralized knowledge platforms and ensuring healthcare providers have continuous access to reliable resources can significantly improve patient care and operational effectiveness.

**Common Questions to Answer**

**1. Diagnostic Assistance**: "What are the common symptoms and treatments for pulmonary embolism?"

**2. Drug Information**: "Can you provide the trade names of medications used for treating hypertension?"

**3. Treatment Plans**: "What are the first-line options and alternatives for managing rheumatoid arthritis?"

**4. Specialty Knowledge**: "What are the diagnostic steps for suspected endocrine disorders?"

**5. Critical Care Protocols**: "What is the protocol for managing sepsis in a critical care unit?"

### Objective

As an AI specialist, your task is to develop a RAG-based AI solution using renowned medical manuals to address healthcare challenges. The objective is to **understand** issues like information overload, **apply** AI techniques to streamline decision-making, **analyze** its impact on diagnostics and patient outcomes, **evaluate** its potential to standardize care practices, and **create** a functional prototype demonstrating its feasibility and effectiveness.

### Data Description

The **Merck Manuals** are medical references published by the American pharmaceutical company Merck & Co. They cover a wide range of medical topics, including disorders, tests, diagnoses, and drugs. The manuals have been published since 1899, when Merck & Co. was still a subsidiary of the German company Merck.

The manual is provided as a PDF with over 4,000 pages divided into 23 sections.

## Installing and Importing Necessary Libraries and Dependencies

In [None]:
# 1. Check current CUDA version on the A100 VM
print("Checking CUDA version...")
!nvcc --version

Checking CUDA version...
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:18:23_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0


In [None]:
# Installation for GPU llama-cpp-python
# uncomment and run the following code in case GPU is being used
!CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=all-major" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q
#!CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=all-major" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir --verbose

# Installation for CPU llama-cpp-python
# uncomment and run the following code in case GPU is not being used
# !CMAKE_ARGS="-DLLAMA_CUBLAS=off" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for llama-cpp-python (pyproject.toml) ... done
ERROR: pip's dependency

In [None]:
# For installing the libraries & downloading models from HF Hub
!pip install huggingface_hub pandas tiktoken pymupdf langchain langchain-community chromadb sentence-transformers numpy -q

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gensim 4.3.3 requires numpy<2.0,>=1.18.5, but you have numpy 2.2.6 which is incompatible.
tensorflow 2.18.0 requires numpy<2.1.0,>=1.26.0, but you have numpy 2.2.6 which is incompatible.
numba 0.60.0 requires numpy<2.1,>=1.22, but you have numpy 2.2.6 which is incompatible.

In [None]:
!pip uninstall numpy scipy tsfresh -y
!pip install numpy==1.26.4 scipy==1.13.1 gensim==4.3.3


Found existing installation: numpy 2.2.6
Uninstalling numpy-2.2.6:
  Successfully uninstalled numpy-2.2.6
Found existing installation: scipy 1.13.1
Uninstalling scipy-1.13.1:
  Successfully uninstalled scipy-1.13.1
Collecting numpy==1.26.4
  Using cached numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Collecting scipy==1.13.1
  Using cached scipy-1.13.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
Using cached numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
Using cached scipy-1.13.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (38.6 MB)
Installing collected packages: numpy, scipy
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
thinc 8.3.6 requires numpy<3.0.0,>=2.0.0, but you have numpy 1.26.4 which is incompatible.
Succe
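
The version pins above follow from the upper bounds in pip's conflict messages. As a rough sketch, the check can be reproduced with plain tuple comparison. `version_tuple` and `below_upper_bound` are hypothetical helpers written for this note, and they assume purely numeric version strings; the bounds are copied from the messages printed earlier.

```python
def version_tuple(version):
    """Parse a plain dotted version such as '1.26.4' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def below_upper_bound(installed, upper_bound):
    """Return True if the installed version is strictly below the bound."""
    return version_tuple(installed) < version_tuple(upper_bound)


# Upper bounds on numpy reported by pip for each package (from the output above)
upper_bounds = {"gensim": "2.0", "tensorflow": "2.1.0", "numba": "2.1"}

for package, bound in upper_bounds.items():
    # Compare the pinned version (1.26.4) with the one pip had pulled in (2.2.6)
    print(package, below_upper_bound("1.26.4", bound), below_upper_bound("2.2.6", bound))
```

The same pip output shows thinc 8.3.6 asking for numpy>=2.0.0, so pinning 1.26.4 resolves these three conflicts at the cost of that one.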

In [None]:
# Libraries for processing dataframes and text
import json
import os
import tiktoken
import pandas as pd

# Libraries for loading data, chunking, embedding, and vector databases
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma



In [None]:
#Libraries for downloading and loading the llm
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

## Question Answering using LLM

#### Downloading and Loading the model

In [None]:
## Model configuration
model_name_or_path = "TheBloke/Llama-2-13B-chat-GGUF"
model_basename = "llama-2-13b-chat.Q5_K_M.gguf"
model_path = hf_hub_download(
    repo_id=model_name_or_path,
    filename=model_basename
    )

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Observation
- Downloading the Llama 2 13B chat model (GGUF, Q5_K_M quantization) from the Hugging Face Hub

In [None]:
import torch

if torch.cuda.is_available():
    gpu_count = torch.cuda.device_count()
    print(f"Number of GPUs available: {gpu_count}")
    for i in range(gpu_count):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
        total_memory = torch.cuda.get_device_properties(i).total_memory
        print(f"Total memory: {total_memory / (1024**3):.2f} GB") # Convert bytes to GB
else:
    print("No CUDA-enabled GPU found.")

No CUDA-enabled GPU found.


Observation
- I was getting out-of-memory errors when using the GPU, so this cell checks whether a GPU is actually available
- Oddly, the session **crashes and restarts** in the middle of execution, which forces me to rerun everything from scratch
- I ran most of the cells on the CPU and spent a lot of time waiting for them to complete

In [None]:
lcpp_llm = Llama(
    model_path=model_path,
    n_threads=2,  # CPU cores
    n_batch=512,  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.
    n_gpu_layers=43,  # Change this value based on your model and your GPU VRAM pool.
    n_ctx=4096,  # Context window
)

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 


Observation
- n_threads is set to 2 because this code runs on my laptop, which has only 4 CPU cores
- n_batch is 512. It is the number of prompt tokens processed together in a single step. A higher value speeds up prompt evaluation (this is inference, not training) at the cost of more memory
- n_gpu_layers is the number of model layers offloaded to the GPU; the remaining layers stay on the CPU. Despite the memory issues I kept it at 43
- n_ctx is the context length in tokens, covering the prompt plus the generated text. It determines how much of the conversation the model can **remember**, which is why n_batch should not exceed it. A larger value lets the model stay coherent and relevant over longer inputs
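
One way to avoid hard-coding these values is to derive them from the machine at hand. The helper below is only a sketch: `suggest_llama_params` is a hypothetical function, not part of llama-cpp-python, and its defaults are assumptions taken from the cell above. It leaves half the cores free, clamps n_batch to n_ctx, and falls back to CPU-only (n_gpu_layers=0) when no GPU is reported.

```python
import os


def suggest_llama_params(n_ctx=4096, n_batch=512, gpu_layers=43,
                         gpu_available=False, cpu_count=None):
    """Return keyword arguments for Llama(...) (hypothetical helper)."""
    # Fall back to the detected core count when none is passed in
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return {
        "n_threads": max(1, cores // 2),         # use half of the cores
        "n_batch": max(1, min(n_batch, n_ctx)),  # keep n_batch within 1..n_ctx
        "n_gpu_layers": gpu_layers if gpu_available else 0,
        "n_ctx": n_ctx,
    }


params = suggest_llama_params(gpu_available=False, cpu_count=4)
print(params)
```

The result could then be unpacked into the constructor, for example `Llama(model_path=model_path, **params)`.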

#### Response

In [None]:
def response(query, max_tokens=128, temperature=0, top_p=0.95, top_k=50):
    # System message that tells the model how to treat the query
    system_message = "Respond to the user question based on the user prompt"

    # Combine the system message and the query using the Llama-2 chat template
    prompt = f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n{query} [/INST]"

    # Generate a completion with the given sampling settings
    model_output = lcpp_llm(
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=temperature,
        top_p=top_p,
        top_k=top_k
    )

    # Return only the generated text of the first choice
    return model_output['choices'][0]['text']

##### Observation
- max_tokens caps the number of tokens the model generates in its response (128 by default in this function)
- temperature controls the randomness of the generated response. Lower values give more predictable responses, and 0 is the lowest
- top_p controls the diversity of the generated response by establishing a cumulative probability cutoff for token selection. Higher values allow more diverse responses
- top_k limits sampling to the k most likely next tokens
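
To make the three sampling parameters concrete, here is a toy example on a five-token vocabulary. It is a simplified sketch, not the sampler llama.cpp actually runs, and both function names are hypothetical. The temperature must be positive here; a temperature of 0, as used in this notebook, is typically treated as always picking the most likely token.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the shortest prefix of them
    whose cumulative probability reaches top_p. Returns token indices."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


logits = [4.0, 3.0, 2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, 0.5)  # low temperature
flat = softmax_with_temperature(logits, 2.0)   # high temperature

print(round(sharp[0], 3), round(flat[0], 3))
print(filter_top_k_top_p(flat, top_k=3, top_p=0.95))
print(filter_top_k_top_p(sharp, top_k=50, top_p=0.95))
```

With the low temperature, most of the probability mass sits on the first token, so the top_p cutoff is reached after fewer candidates than with the high temperature.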

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
resp = response("What is the protocol for managing sepsis in a critical care unit?")
print(resp)

 Sepsis is a life-threatening condition that can arise from an infection, and it is a medical emergency. The protocol for managing sepsis in a critical care unit typically includes the following steps:

1. Early recognition and diagnosis: Healthcare providers should be vigilant in identifying patients who are at risk of developing sepsis, such as those with severe infections, and monitor them closely for signs of sepsis.

2. Rapid administration of antibiotics: Once sepsis is suspected, antibiotics should be admin


### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [None]:
resp = response("What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?")
print(resp)

Llama.generate: prefix-match hit


 Appendicitis is a medical emergency that requires immediate attention. The symptoms of appendicitis can vary from person to person, but common signs include:

1. Severe pain in the abdomen, usually starting near the belly button and then moving to the lower right side of the abdomen.
2. Nausea and vomiting.
3. Loss of appetite.
4. Fever.
5. Abdominal tenderness and guarding (muscle tension).
6. Abdominal swelling.
7. Diarrhea or const


### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
resp = response("What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?")
print(resp)

Llama.generate: prefix-match hit


 Sudden patchy hair loss, also known as alopecia areata, is a common condition that can cause localized bald spots on the scalp. There are several effective treatments and solutions for addressing this condition, and the possible causes vary depending on the individual case. Here are some of the most common causes and treatment options:

1. Autoimmune disorders: Alopecia areata is often associated with autoimmune disorders, where the body's immune system mistakenly attacks healthy hair follicles, leading to hair loss. Treatment options for autoimmune-related


### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
resp = response("What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?")
print(resp)

Llama.generate: prefix-match hit


 Treatments for a person who has sustained a physical injury to brain tissue and experienced temporary or permanent impairment of brain function depend on the severity and location of the injury, as well as the individual's overall health and medical history. Here are some possible treatment options:

1. Medications: To manage symptoms such as pain, inflammation, and anxiety, medications like analgesics, anti-inflammatory drugs, and antidepressants may be prescribed.
2. Rehabilitation therapy: Physical, occupational, and speech therapy


### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [None]:
resp = response("What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?")
print(resp)

Llama.generate: prefix-match hit


 Sure, I'd be happy to help! If someone has fractured their leg while hiking, it is essential to take immediate action to provide proper care and prevent further complications. Here are some necessary precautions and treatment steps that should be considered:

1. Stop activity: The first step is to stop any physical activity and rest the affected leg immediately. This will help prevent further damage or injury to the fractured bone.

2. Assess the severity of the injury: Evaluate the extent of the fracture by checking for signs such as swelling, deform


Observation
- Most of the responses are cut off mid-sentence because max_tokens defaults to 128

Clarification
- This section only asks for the raw LLM response, so I am not varying the parameters here. I experiment with different parameter values in the next section.
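
A cut-off answer can also be detected in code rather than by eye. The sketch below assumes the completion dict returned by llama-cpp-python follows the OpenAI-style layout, where each choice carries a `finish_reason` that is `"length"` when generation stopped at max_tokens. `was_truncated` is a hypothetical helper, and the sample dict is a hand-written stand-in, not real model output.

```python
def was_truncated(model_output):
    """Return True if the first choice stopped because it hit max_tokens."""
    choice = model_output["choices"][0]
    return choice.get("finish_reason") == "length"


# Hand-written stand-in for a completion dict (assumed layout, not real output)
sample_output = {
    "choices": [
        {"text": " Sepsis is a life-threatening condition", "finish_reason": "length"}
    ]
}

print(was_truncated(sample_output))
```

If the check returns True, the call could be repeated with a larger max_tokens.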

## Question Answering using LLM with Prompt Engineering
- Each question gets its own hand-written prompt; no common prompt template is used

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
user_prompt = ''' As a clinician seeking information for a patient, please provide a concise summary of the diagnosis and management protocol for sepsis in a critical care unit ?'''
resp = response(user_prompt, max_tokens=1024)
print(resp)

Llama.generate: prefix-match hit




As a clinician seeking information for a patient, I would like to know the following:

1. Definition and criteria for sepsis: What are the specific signs and symptoms that indicate sepsis has developed?
2. Diagnostic tests: Which laboratory and imaging studies are used to confirm the diagnosis of sepsis?
3. Management protocol: What are the general principles of management for sepsis in a critical care unit, including fluid resuscitation, antibiotics, and vasopressors?
4. Monitoring and follow-up: How is the patient's response to treatment monitored, and what are the criteria for escalation of care or transfer to a higher level of care?
5. Complications and sequelae: What are the potential complications and long-term sequelae of sepsis, and how can they be prevented or managed?
6. Prognosis and mortality: What is the typical prognosis for patients with sepsis, and what factors influence mortality?
7. Prevention: What strategies can be implemented to prevent sepsis from developing in 

Observation
- max_tokens is 1024, so the response is longer than the one in the previous section

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [None]:
user_prompt = ''' As a clinician seeking information for a patient, please provide a concise summary of common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it ?
Please provide clear and easy-to-understand answers, and include links to resources for additional information.
'''
resp = response(user_prompt, 1024)
print(resp)

Llama.generate: prefix-match hit



As a clinician seeking information for a patient, here is a concise summary of common symptoms for appendicitis:

Common Symptoms of Appendicitis:

1. Severe pain in the abdomen, usually starting near the belly button and then moving to the lower right side of the abdomen.
2. Nausea and vomiting.
3. Loss of appetite.
4. Fever.
5. Abdominal tenderness and guarding (muscle tension) in the affected area.
6. Abdominal swelling or bloating.
7. Diarrhea or constipation.
8. Inability to pass gas.

It is important to note that not everyone with appendicitis will experience all of these symptoms, and some people may have different symptoms altogether. If you suspect that your patient has appendicitis, it is essential to seek medical attention immediately.

Can Appendicitis be Cured via Medicine?

No, appendicitis cannot be cured via medicine. Appendicitis is a surgical condition that requires prompt surgical intervention to remove the inflamed appendix. Delaying surgery can lead to complicatio

Observation
- The prompt asks the LLM to include links to additional resources
- max_tokens is 1024

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
user_prompt = ''' As a Dermatologist looking for information about effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp for a patient,
please provide a concise summary of what could be the possible causes behind it?
'''
resp = response(user_prompt, 2048)
print(resp)

Llama.generate: prefix-match hit



As a dermatologist, I would consider the following possible causes of sudden patchy hair loss, also known as alopecia areata:

1. Autoimmune disorder: Alopecia areata is an autoimmune condition where the immune system mistakenly attacks healthy hair follicles, leading to hair loss.
2. Hormonal imbalance: Hormonal changes, such as those that occur during pregnancy or menopause, can cause hair loss in patches.
3. Skin conditions: Certain skin conditions like eczema, psoriasis, and folliculitis can cause inflammation and lead to hair loss.
4. Infections: Fungal or bacterial infections can cause hair loss by damaging the hair follicles.
5. Traction alopecia: Prolonged use of tight hairstyles like braids, ponytails, or cornrows can cause hair loss due to traction on the scalp.
6. Telogen effluvium: This is a condition where there is an increase in the number of hair follicles that stop growing and enter the resting phase, leading to excessive hair shedding.
7. Medications: Certain medicati

Observation
- max_tokens is 2048

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
user_prompt = ''' As a caregiver looking for information about What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?
Provide response that is coming from a senior medical expert and also make it clear to understand for the patient.
'''
resp = response(user_prompt, 1024, temperature=0.01)
print(resp)

Llama.generate: prefix-match hit



Answer:

As a senior medical expert, I understand the concern and complexity of treating a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function. The treatment approach will depend on the severity and location of the injury, as well as the individual's overall health and medical history.

Here are some common treatments that may be recommended for a person with a brain injury:

1. Medications: To manage symptoms such as pain, anxiety, or depression, and to prevent seizures or further damage to the brain.
2. Rehabilitation therapy: To help regain lost function and improve cognitive, physical, and emotional abilities. This may include physical therapy, occupational therapy, speech therapy, and cognitive therapy.
3. Surgery: In some cases, surgery may be necessary to relieve pressure on the brain, repair damaged blood vessels, or remove a tumor.
4. Lifestyle changes: To help manage symptoms and improve overall health,

Observation
- max_tokens is set to 1024
- temperature is 0.01

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [None]:
user_prompt = ''' As a nurse looking for information about What are the necessary precautions and treatment steps for a person who has fractured their leg ?
Include details on potential treatment modalities, rehabilitation strategies, and important aspects of recovery for someone who has experienced this type of injury.
'''
resp = response(user_prompt, 1024, top_k=25)
print(resp)

Llama.generate: prefix-match hit



A fracture of the leg can be a serious and painful injury that requires prompt medical attention. As a nurse, it is essential to understand the necessary precautions and treatment steps to ensure proper healing and minimize potential complications. This article will provide an overview of the common types of leg fractures, potential treatment modalities, rehabilitation strategies, and important aspects of recovery for someone who has experienced this type of injury.

Types of Leg Fractures
-------------------------

There are several types of leg fractures that can occur, depending on the location and severity of the injury. Some common types of leg fractures include:

* Femur fracture: A break in the thigh bone (femur) is a serious injury that requires prompt medical attention.
* Tibia fracture: A break in the shin bone (tibia) is another common type of leg fracture that can be caused by trauma or sports injuries.
* Fibula fracture: A break in the smaller bone of the lower leg (fibul

Observation
- top_k is 25


Observation
- Comparing question answering with and without prompt engineering suggests that a well-crafted prompt, combined with different settings for **temperature, max_tokens, top_p, top_k**, helps the LLM produce better responses
- I wrote a dedicated prompt for each query, and the responses are clearer and better tailored to the question as a result
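
The prompts in this section were written one by one. A shared template is one possible alternative; the sketch below shows the idea. `build_prompt` and its wording are hypothetical and were not used to produce the responses above.

```python
def build_prompt(role, question, instructions=""):
    """Assemble a role-based prompt from reusable parts (hypothetical helper)."""
    parts = [
        f"As a {role} seeking information for a patient, please answer the following question.",
        f"Question: {question.strip()}",
    ]
    # Extra guidance is optional and only appended when provided
    if instructions:
        parts.append(f"Instructions: {instructions.strip()}")
    return "\n".join(parts)


prompt = build_prompt(
    role="clinician",
    question="What is the protocol for managing sepsis in a critical care unit?",
    instructions="Provide a concise summary.",
)
print(prompt)
```

Keeping the role, question, and instructions separate makes it easier to change one of them while holding the others fixed when comparing responses.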

## Data Preparation for RAG

### Loading the Data

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
from langchain_community.document_loaders import PyMuPDFLoader
pdf_loader = PyMuPDFLoader('/content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/medical_diagnosis_manual.pdf')

In [None]:
manual = pdf_loader.load()

In [None]:
model_name_or_path = "TheBloke/Mistral-7B-Instruct-v0.2-GGUF"
model_basename = "mistral-7b-instruct-v0.2.Q6_K.gguf"


In [None]:
model_path = hf_hub_download(
    repo_id=model_name_or_path,
    filename=model_basename
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [None]:
llm = Llama(
    model_path=model_path,
    n_ctx=2500,
    n_gpu_layers=23,
    n_batch=512
)

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 


Observation
- Downloading and loading the Mistral 7B Instruct model (GGUF, Q6_K quantization)
- n_ctx is 2500, the context window in tokens that has to hold both the prompt and the response
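
The `Llama(...)` call above sets n_ctx=2500, and that window has to hold the instructions, the retrieved context, and the generated answer. The arithmetic below is a rough sketch of how many retrieved chunks fit. `max_context_chunks` is a hypothetical helper, and the 512-token chunk size, 512-token reply budget, and 200-token prompt overhead are illustrative assumptions. Token counts also depend on the tokenizer, so the result is only approximate.

```python
def max_context_chunks(n_ctx, max_tokens, prompt_overhead, chunk_size):
    """Number of whole chunks that fit beside the prompt and the reply."""
    available = n_ctx - max_tokens - prompt_overhead
    return max(0, available // chunk_size)


# 2500-token window, 512 tokens reserved for the answer, 200 for instructions
print(max_context_chunks(n_ctx=2500, max_tokens=512, prompt_overhead=200, chunk_size=512))
```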

### Data Overview

#### Checking the first 5 pages

In [None]:
for page in manual[:5]:
    print(page.page_content)


This file is meant for personal use by rajesh.g86@yahoo.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Table of Contents
1
Front    ................................................................................................................................................................................................................
1
Cover    .......................................................................................................................................................................................................
2
Front Matter    ...........................................................................................................................................................................................
53
1 - Nutritional Disorders    ...................

#### Checking the number of pages

In [None]:
len(manual)

4114

### Data Chunking

In [None]:
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name='cl100k_base',
    chunk_size=512,
    chunk_overlap= 50
)

In [None]:
document_chunks = pdf_loader.load_and_split(text_splitter)
document_chunks[4]

Document(metadata={'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'creator': 'Atop CHM to PDF Converter', 'creationdate': '2012-06-15T05:44:40+00:00', 'source': '/content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/medical_diagnosis_manual.pdf', 'file_path': '/content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/medical_diagnosis_manual.pdf', 'total_pages': 4114, 'format': 'PDF 1.7', 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition', 'author': '', 'subject': '', 'keywords': '', 'moddate': '2025-06-21T13:48:40+00:00', 'trapped': '', 'modDate': 'D:20250621134840Z', 'creationDate': 'D:20120615054440Z', 'page': 3}, page_content='491\nChapter 44. Foot & Ankle Disorders    .....................................................................................................................................\n502\nChapter 45. Tumors of Bones & Joints    .......................................................................................................

In [None]:
document_chunks[5]

Document(metadata={'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'creator': 'Atop CHM to PDF Converter', 'creationdate': '2012-06-15T05:44:40+00:00', 'source': '/content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/medical_diagnosis_manual.pdf', 'file_path': '/content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/medical_diagnosis_manual.pdf', 'total_pages': 4114, 'format': 'PDF 1.7', 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition', 'author': '', 'subject': '', 'keywords': '', 'moddate': '2025-06-21T13:48:40+00:00', 'trapped': '', 'modDate': 'D:20250621134840Z', 'creationDate': 'D:20120615054440Z', 'page': 3}, page_content='731\nChapter 69. Optic Nerve Disorders    ......................................................................................................................................\n737\nChapter 70. Orbital Diseases    ...............................................................................................................

Observation
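- document_chunks [4] and [5] above both come from the same page (page index 3, part of the table of contents). With chunk_overlap=50, consecutive chunks from the same page can share up to 50 tokens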

In [None]:
len(document_chunks)

8659

In [None]:
import pickle

# Save to Google Drive or local directory
with open("/content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/document_chunks1.pkl", "wb") as f:
    pickle.dump(document_chunks, f)


In [None]:
with open("/content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/document_chunks1.pkl", "rb") as f:
    document_chunks = pickle.load(f)

print(f"Loaded {len(document_chunks)} chunks.")


Loaded 8659 chunks.


Observation
- pickle is used above to save document_chunks to Google Drive
- I was getting out-of-memory errors, so I used this technique to reload document_chunks when a later step needs them instead of re-chunking the PDF
- The reloaded file contains exactly the same number of chunks, 8659
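
The save and reload steps can be folded into a single load-or-build helper, so that a restarted session picks up the cached chunks automatically. This is a minimal sketch: `load_or_build` is a hypothetical helper, and the demo uses a temporary file and a stand-in function instead of the real PDF chunking. Pickle files should only be loaded when they come from a trusted source.

```python
import os
import pickle
import tempfile


def load_or_build(cache_path, build_fn):
    """Load a pickled object if the cache exists, else build and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    result = build_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result


# Stand-in for the expensive chunking step; records how often it runs
calls = []

def fake_chunking():
    calls.append(1)
    return ["chunk-a", "chunk-b"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "chunks.pkl")
    first = load_or_build(path, fake_chunking)   # builds and writes the cache
    second = load_or_build(path, fake_chunking)  # reads the cache instead

print(first == second, len(calls))
```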

Observation
- chunk_size is 512, so each chunk holds at most 512 tokens (counted with the cl100k_base encoding) before it is embedded and stored in the vector database
- chunk_overlap is 50, the number of tokens shared between consecutive chunks so that context carries over from one chunk to the next
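
How chunk_size and chunk_overlap interact is easiest to see on a tiny input. The function below is a simplified sliding-window sketch, not the actual `RecursiveCharacterTextSplitter` algorithm, which also prefers to split on separators such as paragraph and line breaks. `chunk_with_overlap` is a hypothetical helper that works on a list of already tokenized items.

```python
def chunk_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size sharing chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end
    return chunks


tokens = list(range(10))
chunks = chunk_with_overlap(tokens, chunk_size=4, chunk_overlap=1)
print(chunks)
```

Each window starts chunk_size minus chunk_overlap tokens after the previous one, so the last token of one chunk reappears as the first token of the next.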

### Embedding

In [None]:
#! pip install sentence-transformers==2.7.0
embedding_model = SentenceTransformerEmbeddings(model_name='thenlper/gte-large')



In [None]:
embedding_1 = embedding_model.embed_query(document_chunks[0].page_content)
embedding_2 = embedding_model.embed_query(document_chunks[1].page_content)

In [None]:
print("Dimension of the embedding vector ",len(embedding_1))
len(embedding_1)==len(embedding_2)

Dimension of the embedding vector  1024


True

Observation
- The embedding model maps every chunk to a vector of the same fixed length (1024 dimensions for gte-large), regardless of how long the chunk is
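
Because every chunk maps to a vector of the same length, two chunks can be compared directly. Cosine similarity is a common choice for this; the sketch below uses toy two-dimensional vectors, whereas in this notebook the inputs would be 1024-dimensional vectors such as `embedding_1` and `embedding_2`. `cosine_similarity` is a hypothetical helper written for this note.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```

Values close to 1 indicate that two vectors point in nearly the same direction, which is the signal a vector database uses to rank chunks against a query.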

### Vector Database

In [None]:
zip_file_path1 = "/content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/manual_db1.zip"
persist_directory1 = "/content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/chroma_db_extracted1"
os.makedirs(persist_directory1, exist_ok=True)

In [None]:

print(f"Unzipping {zip_file_path1} to {persist_directory1}...")
!unzip "{zip_file_path1}" -d "{persist_directory1}"
print("Unzipping complete!")

Unzipping /content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/manual_db1.zip to /content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/chroma_db_extracted1...
Archive:  /content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/manual_db1.zip
   creating: /content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/chroma_db_extracted1/content/manual_db1/
   creating: /content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/chroma_db_extracted1/content/manual_db1/698233d7-e8e0-40a6-bdc9-d42bff959296/
  inflating: /content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/chroma_db_extracted1/content/manual_db1/698233d7-e8e0-40a6-bdc9-d42bff959296/header.bin  
  inflating: /content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/chroma_db_extracted1/content/manual_db1/698233d7-e8e0-40a6-bdc9-d42bff959296/link_lists.bin  
  inflating: /content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/chroma_db_extracted1/content/manual_db1/698233d7-e8e0-40a6-bdc9-d42bff95929

In [None]:
import os
import shutil

db_path = "/content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/chroma_db_extracted1/content/manual_db1"

# Detect the actual UUID folder
for name in os.listdir(db_path):
    full_path = os.path.join(db_path, name)
    if os.path.isdir(full_path) and name != "index":
        print(f"Renaming index folder: {name} → index")
        shutil.move(full_path, os.path.join(db_path, "index"))
        break


Renaming index folder: 698233d7-e8e0-40a6-bdc9-d42bff959296 → index


In [None]:
vectorstore1 = Chroma(
    persist_directory=db_path,  # or the unzipped path
    embedding_function=embedding_model
)

ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given


In [None]:
print(vectorstore1._collection.count())  # _collection is a private attribute; it works here but may change between versions

16892


In [None]:
vectorstore1.similarity_search("The main symptom of cataract",k=2)

ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event CollectionQueryEvent: capture() takes 1 positional argument but 3 were given


[Document(metadata={'creationDate': 'D:20120615054440Z', 'modDate': 'D:20250621134840Z', 'format': 'PDF 1.7', 'author': '', 'subject': '', 'creator': 'Atop CHM to PDF Converter', 'trapped': '', 'file_path': '/content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/medical_diagnosis_manual.pdf', 'creationdate': '2012-06-15T05:44:40+00:00', 'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'page': 4113, 'total_pages': 4114, 'moddate': '2025-06-21T13:48:40+00:00', 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition', 'source': '/content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/medical_diagnosis_manual.pdf', 'keywords': ''}, page_content="Z\nZafirlukast 1879\nZalcitabine 1451\nin children 2854\nZaleplon 1709\nZanamivir 1407\nin influenza 1407, 1929\nZAP-70 (zeta-associated protein 70) deficiency 1092, 1108\nZavanelli maneuver 2680\nZellweger syndrome 2383, 3023\nZenker's diverticulum 125\nZidovudine 1451, 1453\nin children 2854\nZileuton 1881\nin asthma 1

Observation
- The steps above restore a Chroma vector database that was previously zipped and saved to Google Drive, so it can be reused in later sessions
- This project ran into repeated out-of-memory errors, which is why the database was saved and is reloaded when needed
- Even so, the rest of the project relies on the vector store built in the session's temporary directory
- vectorstore1 does respond to the query, but it appears to return only one result, and the hit shown above is an index page of the manual (page 4113) rather than the cataract chapter; that is why the vector store built from the temporary Chroma DB is still used

#### Actual implementation of Vector Store

In [None]:
import json,os

out_dir = 'manual_db3'

if not os.path.exists(out_dir):
  os.makedirs(out_dir)

In [None]:
vectorstore = Chroma.from_documents(
    document_chunks,
    embedding_model,
    persist_directory=out_dir
)
vectorstore.persist()

ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given


In [None]:
print(vectorstore._collection.count())  # _collection is a private attribute; it works here but may change between versions


17318


Observation
- vectorstore.persist() writes the data to ChromaDB on disk; without it, the vectors held in temporary storage are lost when the session restarts
- Vector store collection counts
  - vectorstore --> 17318
  - vectorstore1 --> 16892
  - The two collections are of similar size
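
A quick sanity check is to compare each collection count with the number of chunks loaded earlier (8659). The helper name vectors_per_chunk below is hypothetical. A ratio of exactly 2.0 would be consistent with the chunks having been added to the same persist directory twice, but that is only a guess; nothing in the outputs above confirms the cause.

```python
def vectors_per_chunk(n_vectors, n_chunks):
    """Return how many stored vectors exist per source chunk."""
    return n_vectors / n_chunks


# Counts copied from the outputs above
ratio_vectorstore = vectors_per_chunk(17318, 8659)
ratio_vectorstore1 = vectors_per_chunk(16892, 8659)
print(ratio_vectorstore)   # 2.0
print(round(ratio_vectorstore1, 2))
```

If one vector per chunk is expected, a ratio well above 1 is worth investigating before relying on the retrieval results.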

In [None]:
## Downloaded the Vector Store for future use
## This project is consuming more RAM
!zip -r manual_db3.zip /content/manual_db3

updating: content/manual_db3/ (stored 0%)
updating: content/manual_db3/chroma.sqlite3 (deflated 52%)
updating: content/manual_db3/846ad7b3-18cd-4500-b48d-8947e8f19a67/ (stored 0%)
updating: content/manual_db3/846ad7b3-18cd-4500-b48d-8947e8f19a67/header.bin (deflated 55%)
updating: content/manual_db3/846ad7b3-18cd-4500-b48d-8947e8f19a67/link_lists.bin (deflated 75%)
updating: content/manual_db3/846ad7b3-18cd-4500-b48d-8947e8f19a67/length.bin (deflated 98%)
updating: content/manual_db3/846ad7b3-18cd-4500-b48d-8947e8f19a67/data_level0.bin (deflated 8%)
updating: content/manual_db3/846ad7b3-18cd-4500-b48d-8947e8f19a67/index_metadata.pickle (deflated 45%)


Observation
- vectorstore.persist() saves the data to disk; the command above then zips the directory so it can be stored in Google Drive
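
As an alternative to the shell command, the archive can be created from Python with the standard library. The sketch below uses shutil.make_archive on a throwaway directory; the directory and file names are placeholders, not the real database. Archiving relative to root_dir also avoids storing the leading content/ path that showed up as a nested folder when the earlier archive was unzipped.

```python
import os
import shutil
import tempfile


def zip_directory(src_dir, dest_base):
    """Create dest_base + '.zip' holding the contents of src_dir."""
    return shutil.make_archive(dest_base, "zip", root_dir=src_dir)


# Demonstration on a temporary folder standing in for the Chroma directory
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo_db")
    os.makedirs(src)
    with open(os.path.join(src, "chroma.sqlite3"), "wb") as f:
        f.write(b"placeholder")
    archive_path = zip_directory(src, os.path.join(tmp, "demo_db_backup"))
    archive_name = os.path.basename(archive_path)
    archive_exists = os.path.exists(archive_path)
    print(archive_name, archive_exists)
```

For the real store, src_dir would be the persist directory and dest_base a path on Google Drive without the .zip extension.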

In [None]:
vectorstore = Chroma(persist_directory=out_dir,embedding_function=embedding_model)

ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given


In [None]:
vectorstore.embeddings

HuggingFaceEmbeddings(client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
), model_name='thenlper/gte-large', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False, show_progress=False)

In [None]:
vectorstore.similarity_search("The main symptom of cataract",k=2)

ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event CollectionQueryEvent: capture() takes 1 positional argument but 3 were given


[Document(metadata={'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'page': 719, 'trapped': '', 'creationdate': '2012-06-15T05:44:40+00:00', 'source': '/content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/medical_diagnosis_manual.pdf', 'file_path': '/content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/medical_diagnosis_manual.pdf', 'creationDate': 'D:20120615054440Z', 'keywords': '', 'subject': '', 'creator': 'Atop CHM to PDF Converter', 'modDate': 'D:20250621134840Z', 'moddate': '2025-06-21T13:48:40+00:00', 'author': '', 'format': 'PDF 1.7', 'total_pages': 4114, 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition'}, page_content='Chapter 66. Cataract\n(For developmental or congenital cataracts, see p. 2920.)\nA cataract is a congenital or degenerative opacity of the lens. The main symptom is gradual,\npainless vision blurring. Diagnosis is by ophthalmoscopy and slit-lamp examination. Treatment\nis surgical removal and placement of an intraocular l

In [None]:
vectorstore1.similarity_search("The main symptom of cataract",k=2)

[Document(metadata={'source': '/content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/medical_diagnosis_manual.pdf', 'keywords': '', 'total_pages': 4114, 'moddate': '2025-06-21T13:48:40+00:00', 'format': 'PDF 1.7', 'file_path': '/content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/medical_diagnosis_manual.pdf', 'page': 4113, 'author': '', 'creationDate': 'D:20120615054440Z', 'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'trapped': '', 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition', 'creationdate': '2012-06-15T05:44:40+00:00', 'modDate': 'D:20250621134840Z', 'creator': 'Atop CHM to PDF Converter', 'subject': ''}, page_content="Z\nZafirlukast 1879\nZalcitabine 1451\nin children 2854\nZaleplon 1709\nZanamivir 1407\nin influenza 1407, 1929\nZAP-70 (zeta-associated protein 70) deficiency 1092, 1108\nZavanelli maneuver 2680\nZellweger syndrome 2383, 3023\nZenker's diverticulum 125\nZidovudine 1451, 1453\nin children 2854\nZileuton 1881\nin asthma 1

Observation
- The vector store holds the chunk embeddings so they can be searched later
- It is a database of high-dimensional vectors: each chunk is stored as a numerical vector (1024 dimensions here)


### Retriever

In [None]:
retriever = vectorstore.as_retriever(
    search_type='similarity',
    search_kwargs={'k': 2}
)

In [None]:
retriever1 = vectorstore1.as_retriever(
    search_type='similarity',
    search_kwargs={'k': 2}
)

In [None]:
rel_docs = retriever.get_relevant_documents("What is the diagnosis and treatment for Cataract? ")
rel_docs



[Document(metadata={'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'trapped': '', 'creationdate': '2012-06-15T05:44:40+00:00', 'creationDate': 'D:20120615054440Z', 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition', 'moddate': '2025-06-21T13:48:40+00:00', 'source': '/content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/medical_diagnosis_manual.pdf', 'author': '', 'file_path': '/content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/medical_diagnosis_manual.pdf', 'creator': 'Atop CHM to PDF Converter', 'total_pages': 4114, 'subject': '', 'keywords': '', 'modDate': 'D:20250621134840Z', 'format': 'PDF 1.7', 'page': 719}, page_content='Chapter 66. Cataract\n(For developmental or congenital cataracts, see p. 2920.)\nA cataract is a congenital or degenerative opacity of the lens. The main symptom is gradual,\npainless vision blurring. Diagnosis is by ophthalmoscopy and slit-lamp examination. Treatment\nis surgical removal and placement of an intraocular l

In [None]:
rel_docs1 = retriever1.get_relevant_documents("What is the diagnosis and treatment for Cataract?")
rel_docs1

[Document(metadata={'creator': 'Atop CHM to PDF Converter', 'creationDate': 'D:20120615054440Z', 'subject': '', 'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'source': '/content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/medical_diagnosis_manual.pdf', 'page': 4113, 'format': 'PDF 1.7', 'trapped': '', 'keywords': '', 'modDate': 'D:20250621134840Z', 'total_pages': 4114, 'author': '', 'creationdate': '2012-06-15T05:44:40+00:00', 'moddate': '2025-06-21T13:48:40+00:00', 'file_path': '/content/drive/MyDrive/AIML_UT/Colab_Files/Pi5_NLP/Pi5_Proj/medical_diagnosis_manual.pdf', 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition'}, page_content="Z\nZafirlukast 1879\nZalcitabine 1451\nin children 2854\nZaleplon 1709\nZanamivir 1407\nin influenza 1407, 1929\nZAP-70 (zeta-associated protein 70) deficiency 1092, 1108\nZavanelli maneuver 2680\nZellweger syndrome 2383, 3023\nZenker's diverticulum 125\nZidovudine 1451, 1453\nin children 2854\nZileuton 1881\nin asthma 1

Observation
- This is the final step in the RAG data setup
- As the name suggests, the retriever fetches data from the vector store
- With k=2, the retriever returns the two most relevant chunks for the query
- Two vector stores are kept so that data can still be retrieved when a memory error occurs, but retriever1 does not return the same output as retriever
- For the same query, the store built in the session's temporary storage returns the cataract chapter (page 719), while the store reloaded from Google Drive returns an index page of the manual (page 4113)
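
One lightweight way to compare two retrievers is to look at the page metadata of what each returns. The sketch below is an assumption-laden illustration: FakeDoc is a hypothetical stand-in for the retrieved Document objects, and the page numbers are the ones visible in the truncated outputs above.

```python
def retrieved_pages(docs):
    """Collect the 'page' metadata of each retrieved document."""
    return [d.metadata.get("page") for d in docs]


class FakeDoc:
    """Hypothetical stand-in exposing only a metadata dict."""

    def __init__(self, page):
        self.metadata = {"page": page}


pages_a = retrieved_pages([FakeDoc(719)])    # first hit from retriever
pages_b = retrieved_pages([FakeDoc(4113)])   # first hit from retriever1
shared = set(pages_a) & set(pages_b)
print(pages_a, pages_b, shared)
```

In the notebook, retrieved_pages(rel_docs) and retrieved_pages(rel_docs1) would give the same comparison on the real results; an empty intersection means the two stores disagree for that query.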

### System and User Prompt Template

In [None]:
qna_system_message = """
You are a medical assistant whose work is to review the report and provide the appropriate answers from the context.
User input will have the context required by you to answer user questions.
This context will begin with the token: ###Context.
The context contains references to specific portions of a document relevant to the user query.

User questions will begin with the token: ###Question.

Please answer only using the context provided in the input. Do not mention anything about the context in your final answer.

If the answer is not found in the context, respond "I don't know".
"""

In [None]:

qna_user_message_template = """
###Context
Here are some documents that are relevant to the question mentioned below.
{context}

###Question
{question}
"""


### Response Function

In [None]:
def generate_rag_response(user_input,k=3,max_tokens=256,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.get_relevant_documents(query=user_input,k=k)
    context_list = [d.page_content for d in relevant_document_chunks]

    # Combine document chunks into a single context
    context_for_query = ". ".join(context_list)

    user_message = qna_user_message_template.replace('{context}', context_for_query)
    user_message = user_message.replace('{question}', user_input)

    prompt = qna_system_message + '\n' + user_message

    # Generate the response
    try:
        response = llm(
                  prompt=prompt,
                  max_tokens=max_tokens,
                  temperature=temperature,
                  top_p=top_p,
                  top_k=top_k
                  )

        # Extract the generated text from the model's response
        response = response['choices'][0]['text'].strip()
    except Exception as e:
        response = f'Sorry, I encountered the following error: \n {e}'

    return response
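
The prompt-assembly part of the function can be looked at on its own, without a retriever or a model. The sketch below mirrors the joining and placeholder replacement used above; build_prompt and the demo strings are hypothetical names introduced only for this illustration.

```python
def build_prompt(system_message, user_template, chunks, question):
    """Join retrieved chunks and fill the user template, as in the function above."""
    context = ". ".join(chunks)
    user_message = user_template.replace("{context}", context)
    user_message = user_message.replace("{question}", question)
    return system_message + "\n" + user_message


# Shortened stand-ins for the real system message and template
demo_system = "Answer only from the context."
demo_template = "###Context\n{context}\n\n###Question\n{question}\n"
demo_prompt = build_prompt(
    demo_system,
    demo_template,
    ["Chunk one", "Chunk two"],
    "What is a cataract?",
)
print(demo_prompt)
```

Printing the assembled prompt like this is a simple way to check what the model actually sees before looking at its answer.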

## Question Answering using RAG

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
user_input = "What is the protocol for managing sepsis in a critical care unit?"
print(generate_rag_response(user_input))

Based on the context, the protocol for managing sepsis in a critical care unit includes:
1. Prompt empiric antibiotic therapy: Start immediately after suspecting sepsis.
2. Antibiotic selection: Based on suspected source, clinical setting, knowledge or suspicion of causative organisms and sensitivity patterns common to that specific inpatient unit, and previous culture results.
3. Regimen for septic shock of unknown cause: Gentamicin or tobramycin 5.1 mg/kg IV once/day plus a 3rd-generation cephalosporin


### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [None]:
user_input = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
print(generate_rag_response(user_input))

Llama.generate: prefix-match hit


Based on the context provided, the answer would be: Appendicitis is characterized by abdominal pain, anorexia, and abdominal tenderness. It cannot be cured via medicine alone; instead, surgical removal of the appendix (appendectomy) is required for treatment.


### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
user_input = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
print(generate_rag_response(user_input, max_tokens=512))

Llama.generate: prefix-match hit


Based on the context provided, the condition being described is alopecia areata. The effective treatments for this condition include topical, intralesional, or systemic corticosteroids, topical minoxidil, topical anthralin, topical immunotherapy (diphencyprone or squaric acid dibutylester), or psoralen plus ultraviolet A (PUVA). The possible causes behind this condition are thought to be an autoimmune disorder affecting genetically susceptible people exposed to unclear environmental triggers.


### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
user_input = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
print(generate_rag_response(user_input))

Llama.generate: prefix-match hit


Based on the context provided, the following treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function:

1. Ensuring a reliable airway and maintaining adequate ventilation, oxygenation, and blood pressure.
2. Surgery to place monitors to track and treat intracranial pressure, decompress the brain if intracranial pressure is increased, or remove intracranial hematomas.
3. Maintaining adequate brain perfusion and oxygenation and preventing complications of altered sensorium in the first few


### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [None]:
user_input = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
print(generate_rag_response(user_input))

Llama.generate: prefix-match hit


Based on the context provided, here's the answer:

The person with a fractured leg should follow these steps:
1. Elevate the injured limb above heart level for the first 2 days to minimize swelling.
2. Apply ice after 48 hours to relieve pain and speed healing.
3. Immobilize the joints proximal and distal to the injury using elastic bandages, a cast, or a splint.
   - If a cast is used, keep it dry, never put anything inside it, inspect the edges and skin around it daily


### Observation
- The RAG responses above use the default generation parameters, without any tuning; the model is simply answering from chunks retrieved from the Merck Manual

### Fine-tuning

In [None]:
user_input = "What is the protocol for managing sepsis in a critical care unit?"
print(generate_rag_response(user_input, temperature=0.001, max_tokens=1024))



Based on the context, the protocol for managing sepsis in a critical care unit includes:
1. Monitoring systemic pressure, CVP or PAOP, pulse oximetry, ABGs, blood glucose, lactate and electrolyte levels, renal function, and possibly sublingual PCO2.
2. Measuring urine output using an indwelling catheter.
3. Administering fluid resuscitation with 0.9% saline until CVP reaches 8 mm Hg (10 cm H2O) or PAOP reaches 12 to 15 mm Hg.
4. Monitoring for hypotension and oliguria, which is not a contraindication to vigorous fluid resuscitation.
5. Using PAOP or echocardiography to identify limitations in left ventricular function and incipient pulmonary edema due to fluid overload.
6. If the patient remains hypotensive after CVP or PAOP has been raised to target levels, dopamine may be given to increase mean BP to at least 60 mm Hg. If dopamine dose exceeds 20 μg/kg/min, another vasopressor like norepinephrine may be added.
7. Giving oxygen by mask or nasal prongs and considering tracheal intubati

Observation
- Generated RAG response with temperature=0.001 and max_tokens=1024

In [None]:
user_input = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
print(generate_rag_response(user_input, max_tokens=1024))

Llama.generate: prefix-match hit


Based on the context, the common symptoms for appendicitis include difficulty in locating the appendix, which is usually found behind the cecum or the ileum and mesentery of the right colon. Treatment for appendicitis is open or laparoscopic appendectomy because of the risk of mortality if treatment is delayed. Antibiotics are used to treat perforated appendicitis until the patient's temperature and WBC count have normalized, or continued for a fixed course according to the surgeon's preference. If surgery is impossible, antibiotics can improve the survival rate but are not curative. Therefore, appendicitis cannot be cured via medicine alone, and surgical removal of the appendix is necessary.


Observation
- Although max_tokens is 1024, the response is short; temperature is left at its default of 0, and the retriever may have found the right chunk

In [None]:
user_input = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
print(generate_rag_response(user_input, temperature=0.1, max_tokens=1024))

Llama.generate: prefix-match hit


Based on the context provided, here's the answer:

The person with a fractured leg should follow these steps:
1. Elevate the injured limb above heart level for the first 2 days to minimize swelling.
2. Apply ice after 48 hours to relieve pain and speed healing.
3. Immobilize the joints proximal and distal to the injury using elastic bandages, a cast, or a splint.
4. Keep the cast dry and never put an object inside it.
5. Inspect the cast's edges and skin around it every day for redness or soreness and apply lotion if needed.
6. Pad any rough edges with soft material to prevent discomfort from the cast's edges.
7. Seek medical care immediately if there is an odor coming from within the cast, a fever develops, or signs of infection appear.
8. If possible, resume active motion within the first few days or weeks for rapidly healing injuries to minimize contractures and muscle atrophy.
9. In case of severe swelling under a cast, it may be cut open from end to end medially and laterally (biv

Observation
- RAG response with temperature=0.1 and max_tokens=1024

In [None]:
user_input = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
print(generate_rag_response(user_input, temperature=0.01, max_tokens=512, top_k=20, top_p=0.96))

Llama.generate: prefix-match hit


Based on the context provided, the following treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function:

1. Ensuring a reliable airway and maintaining adequate ventilation, oxygenation, and blood pressure.
2. Surgery to place monitors to track and treat intracranial pressure, decompress the brain if intracranial pressure is increased, or remove intracranial hematomas.
3. Maintaining adequate brain perfusion and oxygenation and preventing complications of altered sensorium in the first few days after the injury.
4. Rehabilitation for many patients subsequently.
5. Preventing systemic complications due to immobilization, providing good nutrition, and preventing pressure ulcers.
6. Supportive care with drugs given to prevent seizures if there is significant structural injury or a Glasgow Coma Scale score of less than 10. These drugs reduce the risk of seizures during the first week after in

Observation
- RAG response with temperature of 0.01 and max_tokens of 512 and top_k=20 and top_p=0.96

In [None]:
user_input = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
print(generate_rag_response(user_input, max_tokens=100))

Llama.generate: prefix-match hit


Based on the context provided, the condition being described is alopecia areata. The effective treatments for this condition include topical, intralesional, or systemic corticosteroids, topical minoxidil, topical anthralin, topical immunotherapy (diphencyprone or squaric acid dibutylester), or psoralen plus ultraviolet A (PUVA). The possible causes behind this condition are thought to


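Observation
- With max_tokens set to 100, the response stops mid-sentence ("...are thought to"), which suggests it was cut off at the token limit
- Up to that point the text matches the earlier run for this question, likely because temperature is 0 and the retriever returns the same chunks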
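
Whether an answer was cut off by max_tokens can be checked programmatically instead of by eye. The sketch below assumes the raw llm response is an OpenAI-style completion dictionary whose choices carry a finish_reason field ("length" when the token limit was hit), as llama-cpp-python returns; fake_response is hand-written for illustration, and the helper name was_truncated is hypothetical.

```python
def was_truncated(response):
    """True if generation stopped because max_tokens was reached."""
    return response["choices"][0].get("finish_reason") == "length"


# Hand-written example shaped like a completion response (assumption)
fake_response = {
    "choices": [{"text": "... are thought to", "finish_reason": "length"}]
}
print(was_truncated(fake_response))
```

The response function above returns only the text, so it would need to return or inspect the raw response for this check to be used.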

Observation on the overall tuning:
- Most of the calls above use a larger max_tokens than the calls in the RAG section
- Different generation parameters were tried for the llm: top_k, top_p, temperature, etc.
- Here "fine-tuning" means adjusting the parameters of the RAG response function, not retraining the model; the runs above suggest that these settings can produce more complete responses
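
To see what temperature, top_k and top_p do, the sketch below applies them to a toy set of four token scores. It is a simplified conceptual illustration, not the sampling code of the model library: real implementations may apply the filters in a different order, and they treat temperature=0 as greedy decoding, which this function does not handle.

```python
import math


def sampling_distribution(logits, temperature=1.0, top_k=None, top_p=1.0):
    """Token probabilities after temperature scaling, top-k and top-p filtering."""
    # Temperature must be > 0 here; lower values sharpen the distribution
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank tokens by probability and keep at most top_k of them
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]

    # Keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if top_p < 1.0 and cumulative >= top_p:
            break

    kept_total = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_total for i in kept}


logits = [2.0, 1.0, 0.5, 0.1]  # toy scores for four candidate tokens
d_hot = sampling_distribution(logits, temperature=1.0)
d_cold = sampling_distribution(logits, temperature=0.1)
d_topk = sampling_distribution(logits, temperature=1.0, top_k=2)
print(d_hot)
print(d_cold)
print(d_topk)
```

At a low temperature almost all probability sits on the top token, which is why the runs with temperature at or near 0 return nearly identical text for the same retrieved context.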

## Output Evaluation

In [None]:
groundedness_rater_system_message  = """
You are tasked with rating AI generated answers to questions posed by users.
You will be presented a question, context used by the AI system to generate the answer and an AI generated answer to the question.
In the input, the question will begin with ###Question, the context will begin with ###Context while the AI generated answer will begin with ###Answer.

Evaluation criteria:
The task is to judge the extent to which the metric is followed by the answer.
1 - The metric is not followed at all
2 - The metric is followed only to a limited extent
3 - The metric is followed to a good extent
4 - The metric is followed mostly
5 - The metric is followed completely

Metric:
The answer should be derived only from the information presented in the context

Instructions:
1. First write down the steps that are needed to evaluate the answer as per the metric.
2. Give a step-by-step explanation if the answer adheres to the metric considering the question and context as the input.
3. Next, evaluate the extent to which the metric is followed.
4. Use the previous information to rate the answer using the evaluation criteria and assign a score.
"""

In [None]:
relevance_rater_system_message = """
You are tasked with rating AI generated answers to questions posed by users.
You will be presented a question, context used by the AI system to generate the answer and an AI generated answer to the question.
In the input, the question will begin with ###Question, the context will begin with ###Context while the AI generated answer will begin with ###Answer.

Evaluation criteria:
The task is to judge the extent to which the metric is followed by the answer.
1 - The metric is not followed at all
2 - The metric is followed only to a limited extent
3 - The metric is followed to a good extent
4 - The metric is followed mostly
5 - The metric is followed completely

Metric:
Relevance measures how well the answer addresses the main aspects of the question, based on the context.
Consider whether all and only the important aspects are contained in the answer when evaluating relevance.

Instructions:
1. First write down the steps that are needed to evaluate the context as per the metric.
2. Give a step-by-step explanation if the context adheres to the metric considering the question as the input.
3. Next, evaluate the extent to which the metric is followed.
4. Use the previous information to rate the context using the evaluation criteria and assign a score.
"""


In [None]:

user_message_template = """
###Question
{question}

###Context
{context}

###Answer
{answer}
"""

In [None]:
def generate_ground_relevance_response(user_input,k=3,max_tokens=512,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.get_relevant_documents(query=user_input,k=k)
    context_list = [d.page_content for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)
    try:
      # Combine user_prompt and system_message to create the prompt
      prompt = f"""[INST]{qna_system_message}\n
                  {'user'}: {qna_user_message_template.format(context=context_for_query, question=user_input)}
                  [/INST]"""

      response = llm(
              prompt=prompt,
              max_tokens=max_tokens,
              temperature=temperature,
              top_p=top_p,
              top_k=top_k,
              stop=['INST'],
              echo=False
              )

      answer =  response["choices"][0]["text"]

      # Combine user_prompt and system_message to create the prompt
      groundedness_prompt = f"""[INST]{groundedness_rater_system_message}\n
                  {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                  [/INST]"""

      # Combine user_prompt and system_message to create the prompt
      relevance_prompt = f"""[INST]{relevance_rater_system_message}\n
                  {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                  [/INST]"""

      response_1 = llm(
              prompt=groundedness_prompt,
              max_tokens=max_tokens,
              temperature=temperature,
              top_p=top_p,
              top_k=top_k,
              stop=['INST'],
              echo=False
              )

      response_2 = llm(
              prompt=relevance_prompt,
              max_tokens=max_tokens,
              temperature=temperature,
              top_p=top_p,
              top_k=top_k,
              stop=['INST'],
              echo=False
              )
    except Exception as e:
      print(f"An error occurred during LLM invocation: {e}")
      # Return a pair so callers can always unpack two values
      error_message = f'Sorry, I encountered the following error: \n {e}'
      return error_message, error_message

    return response_1['choices'][0]['text'],response_2['choices'][0]['text']
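
The rater returns free text, so comparing runs is easier if the numeric score is pulled out. The sketch below is one possible approach using a regular expression; extract_score is a hypothetical helper, and it assumes the rater states the score as a digit from 1 to 5 after a word such as "rate", "rating" or "score", which may not hold for every response.

```python
import re


def extract_score(rating_text):
    """Return the last 1-5 digit that follows 'rate', 'rating' or 'score', else None."""
    pattern = r"(?:rate|rating|score)[^.\n]*?\b([1-5])\b"
    matches = re.findall(pattern, rating_text, flags=re.IGNORECASE)
    return int(matches[-1]) if matches else None


# Sentence in the style of the rater output
sample_rating = (
    "Based on the evaluation criteria and the extent to which the metric "
    "is followed, I would rate the answer a 5."
)
print(extract_score(sample_rating))
```

Returning None when no score is found keeps truncated or off-format ratings from being silently counted as a number.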

In [None]:
user_input = "What is the protocol for managing sepsis in a critical care unit?"
ground,rel = generate_ground_relevance_response(user_input, max_tokens=1024)

print(ground,end="\n\n")
print(rel)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 To evaluate the answer as per the metric that the answer should be derived only from the information presented in the context, here are the steps:

1. Read the question and understand its meaning: The question is asking about the protocol for managing sepsis in a critical care unit.
2. Read and comprehend the context: The context provides details about how to manage septic shock patients in an ICU, including monitoring parameters, fluid resuscitation, medication use, and oxygen administration.
3. Identify the key information from the context: The context mentions that patients with septic shock should be treated in an ICU, and several monitoring parameters (systemic pressure, CVP or PAOP, pulse oximetry, ABGs, blood glucose, lactate and electrolyte levels, renal function, sublingual PCO2, and urine output) should be monitored frequently. Fluid resuscitation with 0.9% saline is recommended until CVP reaches 8 mm Hg or PAOP reaches 12 to 15 mm Hg. If a patient remains hypotensive after 

Observation
- Retrieved the RAG response for the given prompt and rated it against the metrics

In [None]:
user_input = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
ground,rel = generate_ground_relevance_response(user_input,max_tokens=1024, temperature=0.1)

print(ground,end="\n\n")
print(rel)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Steps to evaluate the answer:
1. Identify the information in the context related to appendicitis and its symptoms.
2. Determine if the AI generated answer includes only the information from the context.
3. Evaluate the extent to which the metric is followed.

The AI generated answer "The common symptoms for appendicitis include abdominal pain, anorexia, and abdominal tenderness. Appendicitis cannot be cured via medicine alone; the standard treatment is surgical removal of the appendix through open or laparoscopic appendectomy." adheres to the metric as it only includes information derived from the context.

Therefore, the extent to which the metric is followed is 5 - The metric is followed completely.

Based on the evaluation criteria and the extent to which the metric is followed, I would rate the answer a 5.

 Steps to evaluate the context as per the relevance metric:
1. Identify the main aspects of the question: common symptoms for appendicitis and whether it can be cured via medic

Observation
- Got a perfect score of 5 on groundedness but 4 on relevance
- Both retrieval and augmentation are good
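
The scores noted above are read off the free-text rater output by eye. A minimal sketch of pulling the number out programmatically is shown here; `extract_rating` is a hypothetical helper (not part of the notebook) and the regular expression is only a heuristic that looks for a digit from 1 to 5 shortly after the words "rate", "rated", "rating" or "score".

```python
import re

# Heuristic pattern: a rating keyword followed, within 40 non-digit characters,
# by a standalone digit between 1 and 5.
_RATING_PATTERN = re.compile(
    r"\b(?:rating|rated|rate|score)\b\D{0,40}?([1-5])\b",
    flags=re.IGNORECASE,
)


def extract_rating(rater_text):
    """Return the last 1-5 rating mentioned in the rater output, or None."""
    matches = _RATING_PATTERN.findall(rater_text)
    return int(matches[-1]) if matches else None


# A complete evaluation yields a number; a truncated one yields None.
print(extract_rating("Based on the evaluation criteria, I would rate the answer a 5."))
print(extract_rating(" Steps to"))
```

Returning `None` for output that was cut off before the rating makes truncated evaluations easy to spot instead of silently counting them as a score.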

In [None]:
user_input = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
ground,rel = generate_ground_relevance_response(user_input,max_tokens=512, temperature=0.01, top_k=20)

print(ground,end="\n\n")
print(rel)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Steps to evaluate the answer:
1. Identify the information in the context related to effective treatments or solutions for sudden patchy hair loss.
2. Check if the AI generated answer includes only the information from the context and not any additional or incorrect information.
3. Verify that the possible causes mentioned in the answer are also present in the context.

Explanation:
The AI generated answer adheres to the metric as it is derived solely from the information presented in the context. The answer includes all the effective treatments for sudden patchy hair loss mentioned in the context, and also mentions the possible causes of this condition which are also stated in the context.

Evaluation:
The metric is followed completely.

Rating:
Based on the evaluation criteria, I would rate the answer as a 5. The metric was followed completely as all the information in the answer was derived directly from the context and no additional or incorrect information was included.

 Steps to

Observation
- Got a perfect groundedness score of 5
- RAG with prompt engineering once again delivers what the rater prompt asks for:
  - Steps
  - Evaluation
  - Rating
  - Groundedness
  - Relevance
- However, the relevance response only explains its reasoning and gives no rating, possibly because of ambiguity across the retrieved contexts

In [None]:
user_input = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
ground,rel = generate_ground_relevance_response(user_input,max_tokens=350, top_k=70)

print(ground,end="\n\n")
print(rel)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Steps to evaluate the answer:
1. Identify the information in the context related to treatments for traumatic brain injury (TBI).
2. Compare the AI generated answer with the identified information from the context.
3. Determine if the answer is derived only from the information presented in the context.

Explanation:
The AI generated answer mentions several treatments for TBI, including ensuring a reliable airway and maintaining adequate ventilation, oxygenation, and blood pressure, surgery for patients with more severe injuries, rehabilitation, preventing systemic complications due to immobilization, providing good nutrition, preventing pressure ulcers, and drugs to prevent seizures for those with significant structural injury or a Glasgow Coma Scale score of less than 10.

The context provides information on the initial treatment for TBI, which includes ensuring a reliable airway and maintaining adequate ventilation, oxygenation, and blood pressure. It also mentions surgery for patie

Observation
- Relevance scores 4, but some details are missing from the groundedness response and it is cut off; this is likely because max_tokens limits the length of the response
- The configuration parameters for the LLM RAG response need to be tuned carefully
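
One way to confirm that an evaluation was cut off by `max_tokens` rather than finishing naturally is to inspect the completion's `finish_reason`. The sketch below assumes the completion dict carries an OpenAI-style `finish_reason` field next to `text` (`"length"` when the token limit was hit, `"stop"` otherwise); `was_truncated` and the two sample dicts are hypothetical and only mimic the shape of the responses used in this notebook.

```python
def was_truncated(response):
    """Return True when generation stopped because max_tokens was reached."""
    return response["choices"][0].get("finish_reason") == "length"


# Hypothetical completion dicts shaped like the llm(...) output used above
cut_off = {"choices": [{"text": "It also mentions surgery for patie",
                        "finish_reason": "length"}]}
complete = {"choices": [{"text": "I would rate the answer a 5.",
                         "finish_reason": "stop"}]}

for name, resp in [("cut_off", cut_off), ("complete", complete)]:
    print(name, "truncated:", was_truncated(resp))
```

If a response is flagged as truncated, rerunning with a larger `max_tokens` is a reasonable first step before drawing conclusions about the rating.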

In [None]:
user_input = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
ground,rel = generate_ground_relevance_response(user_input,max_tokens=512)

print(ground,end="\n\n")
print(rel)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 To evaluate the answer as per the metric, the following steps are needed:
1. Identify the information related to necessary precautions and treatment steps for a person who has fractured their leg from both the question and context.
2. Determine if the answer is derived only from the information presented in the context.

The answer adheres to the metric as it mentions the necessary precautions and treatment steps (sterile wound dressings, tetanus prophylaxis, and broad-spectrum antibiotics) directly from the context. Therefore, the answer is derived only from the information presented in the context.

Based on the evaluation criteria:
1 - The metric is not followed at all: 1
2 - The metric is followed only to a limited extent: 2
3 - The metric is followed to a good extent: 3
4 - The metric is followed mostly: 4
5 - The metric is followed completely: 5

Since the answer follows the metric completely, it would receive a score of 5.

 To evaluate the context as per the relevance metric, 

Observation
- Again the groundedness receives a score of 5
- The relevance response finds that the context directly addresses the main aspect of the question

### Additional LLM Engineering
- This is mostly done to compare different prompt designs and their rating systems
- A different prompting approach is used
- The additional prompts below also rate groundedness and relevance for the same set of 5 questions asked in the previous section

In [None]:
mod_groundedness_relevance_rater_system_message  = """
Review the provided AI generated answer to the user's question based on the provided context from the medical manual.
In the input, the question will begin with ###Question, the context will begin with ###Context while the AI generated answer will begin with ###Answer.

Groundedness evaluation:
5 - Perfectly Grounded: The information is explicitly stated and verifiable within the provided source text.
4 - Well Grounded: The information is strongly supported by the provided source text, although not explicitly stated word-for-word.
3 - Partially Grounded: Some claims within the information are grounded in the source, but others are not verifiable or lack sufficient support.
2 - Weakly Grounded: The information has weak support from the source, and the connection is difficult to verify.
1 - Ungrounded: The information is not supported or verifiable by the provided source text.

Groundedness:
The answer should be derived only from the information presented in the context

Instructions:
1. Answer/Claim: The specific piece of information being evaluated.
2. Groundedness Rating: The assigned groundedness score (1-5).
3. Page(s) Found: The page number(s) within the medical manual where the information is located or supported.
"""

In [None]:
mod_relevance_rater_system_message = """
You are tasked with rating AI generated answers to questions posed by users.
You will be presented a question, context used by the AI system to generate the answer and an AI generated answer to the question.
In the input, the question will begin with ###Question, the context will begin with ###Context while the AI generated answer will begin with ###Answer.

Relevance evaluation:
5 - Highly Relevant: The information is directly related to the main topic and essential for understanding the context.
4 - Relevant: The information is related to the topic and adds valuable detail or insight.
3 - Moderately Relevant: The information is loosely related to the topic, but not essential for understanding.
2 - Minimally Relevant: The information is barely related and could potentially distract from the main topic.
1 - Irrelevant: The information is not related to the main topic.

Relevance:
Relevance measures how well the answer addresses the main aspects of the question, based on the context.
Consider whether all and only the important aspects are contained in the answer when evaluating relevance.

Instructions:
1. Answer/Claim: The specific piece of information being evaluated for Relevance.
2. Relevance Rating: The assigned relevance score (1-5).
3. Page(s) Found: The page number(s) within the medical manual where the information is located or supported.
"""


In [None]:

user_message_template = """
###Question
{question}

###Context
{context}

###Answer
{answer}
"""

In [None]:
def generate_prompt_rating_response(user_input,k=3,max_tokens=512,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template, mod_groundedness_relevance_rater_system_message, user_message_template, mod_relevance_rater_system_message
    # Retrieve relevant document chunks, honoring the k argument instead of a hard-coded 3
    relevant_document_chunks = retriever.get_relevant_documents(query=user_input,k=k)
    context_list = [d.page_content for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)

    answer = ""
    evaluation_output = ""

    try:
      # First, generate the RAG response
      prompt = f"""[INST]{qna_system_message}\n
                  {'user'}: {qna_user_message_template.format(context=context_for_query, question=user_input)}
                  [/INST]"""

      response = llm(
              prompt=prompt,
              max_tokens=max_tokens,
              temperature=temperature,
              top_p=top_p,
              top_k=top_k,
              stop=['INST'],
              echo=False
              )

      answer =  response["choices"][0]["text"].strip()

      # Combine user_prompt and system_message to create the prompt
      groundedness_prompt = f"""[INST]{mod_groundedness_relevance_rater_system_message}\n
                  {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                  [/INST]"""

      # Combine user_prompt and system_message to create the prompt
      relevance_prompt = f"""[INST]{mod_relevance_rater_system_message}\n
                  {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                  [/INST]"""

      response_1 = llm(
              prompt=groundedness_prompt,
              max_tokens=max_tokens,
              temperature=temperature,
              top_p=top_p,
              top_k=top_k,
              stop=['INST'],
              echo=False
              )

      response_2 = llm(
              prompt=relevance_prompt,
              max_tokens=max_tokens,
              temperature=temperature,
              top_p=top_p,
              top_k=top_k,
              stop=['INST'],
              echo=False
              )

    except Exception as e:
      print(f"An error occurred during LLM invocation: {e}")
      # Return a 2-tuple so callers that unpack (groundedness, relevance) still work
      return f'Sorry, I encountered the following error: \n {e}', ''

    return response_1['choices'][0]['text'],response_2['choices'][0]['text']
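
The groundedness and relevance prompts in the function above are assembled the same way and differ only in the system message. A minimal sketch of factoring that step out is given here; `build_inst_prompt` and `build_rater_prompts` are hypothetical helpers, and the whitespace inside the prompt is simplified compared with the f-strings used in the notebook.

```python
def build_inst_prompt(system_message, user_message):
    """Wrap a system message and a user message in [INST] ... [/INST] tags."""
    return f"[INST]{system_message}\n\nuser: {user_message}\n[/INST]"


def build_rater_prompts(rater_messages, template, question, context, answer):
    """Build one prompt per rater from a shared user-message template."""
    user_message = template.format(question=question, context=context, answer=answer)
    return {name: build_inst_prompt(message, user_message)
            for name, message in rater_messages.items()}


# Illustrative inputs only; the real system messages are defined in earlier cells
template = "###Question\n{question}\n\n###Context\n{context}\n\n###Answer\n{answer}\n"
prompts = build_rater_prompts(
    {"groundedness": "Rate groundedness.", "relevance": "Rate relevance."},
    template,
    question="What is sepsis?",
    context="Sepsis is ...",
    answer="Sepsis is ...",
)
print(sorted(prompts))
```

Keeping the prompt construction in one place means a change to the template or the tags applies to every rater at once.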

In [None]:
user_input = "What is the protocol for managing sepsis in a critical care unit?"
grounded_rate, relevance_rate = generate_prompt_rating_response(user_input, max_tokens=1024)

print(grounded_rate,end="\n\n")
print(relevance_rate)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Answer/Claim: The protocol for managing sepsis in a critical care unit.

Groundedness Rating: 5 - Perfectly Grounded: The information is explicitly stated and verifiable within the provided source text.

Page(s) Found: The information can be found on pages 2446-2447 of "The Merck Manual of Diagnosis & Therapy, 19th Edition."

 Relevance Rating: 5 - The AI generated answer directly addresses the main aspects of managing sepsis in a critical care unit as outlined in the context, including monitoring parameters, fluid resuscitation, use of vasopressors, and oxygen therapy. All important aspects are contained in the answer.
Answer/Claim: Patients with septic shock should be treated in a critical care unit. Monitoring frequently includes systemic pressure, CVP or PAOP, pulse oximetry, ABGs, blood glucose, lactate and electrolyte levels, renal function, sublingual PCO2, and urine output. Fluid resuscitation with 0.9% saline is given until CVP reaches 8 mm Hg or PAOP reaches 12 to 15 mm Hg. 

Observation
- Groundedness and relevance are both at 5
- However, this prompt did not add much of an answer or a step-by-step evaluation to the questions
- It includes the pages where the answers to the questions can be found (a cheat sheet for students)

In [None]:
user_input = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
grounded_rate, relevance_rate = generate_prompt_rating_response(user_input,max_tokens=1024, temperature=0.1)

print(grounded_rate,end="\n\n")
print(relevance_rate)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Answer/Claim: The symptoms of appendicitis include abdominal pain, loss of appetite, nausea, and vomiting, which are typically localized in the lower right quadrant of the abdomen. Appendicitis cannot be cured via medicine alone and requires surgical intervention for treatment.

Groundedness Rating: 5 - Perfectly Grounded: The information is explicitly stated within the provided source text.

Page(s) Found: Multiple pages throughout the context, including mentions of symptoms (e.g., "The pain is typically localized in the lower right quadrant of the abdomen") and treatment methods (e.g., "Treatment of acute appendicitis is open or laparoscopic appendectomy").

 ###Answer
The common symptoms for appendicitis include abdominal pain, loss of appetite, nausea, and vomiting, with the pain typically localized in the lower right quadrant of the abdomen. Appendicitis cannot be cured via medicine alone and requires surgical intervention. The standard treatment is an appendectomy, which can be 

Observation
- Groundedness and relevance are both at 5
- However, this prompt did not add much of an answer/claim or a step-by-step evaluation to the questions
- It includes the pages where the answers to the questions can be found (a cheat sheet for students)

In [None]:
user_input = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
grounded_rate, relevance_rate  = generate_prompt_rating_response(user_input,max_tokens=512, temperature=0.01, top_k=20)
print(grounded_rate,end="\n\n")
print(relevance_rate)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Answer/Claim: The effective treatments or solutions for addressing sudden patchy hair loss, also known as alopecia areata, and the possible causes behind it.

Groundedness Rating: 5 - Perfectly Grounded: The information is explicitly stated in the provided source text that topical corticosteroids, oral antimalarials, corticosteroids, retinoids, or immunosuppressants can be used to treat alopecia areata. The cause of alopecia areata is also mentioned as an autoimmune disorder affecting genetically susceptible people exposed to unclear environmental triggers.

Page(s) Found: The information provided in the context is directly from the medical manual, specifically mentioning the treatments and causes of alopecia areata on pages 849-850.

 Relevance Rating: 5 - Highly Relevant

Answer/Claim: The effective treatments or solutions for addressing sudden patchy hair loss, also known as alopecia areata, and possible causes.

The answer directly addresses the main aspects of the question by pro

Observation
- Again groundedness and relevance are both at 5
- However, this prompt did not add much of an answer or a step-by-step evaluation to the questions
- It includes the pages where the answers to the questions can be found (a cheat sheet for students)
- This makes one point clear: when the LLM is asked for a straightforward answer, it is easy to get high groundedness and relevance scores

In [None]:
user_input = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
grounded_rate, relevance_rate = generate_prompt_rating_response(user_input,max_tokens=350, top_k=70)
print(grounded_rate,end="\n\n")
print(relevance_rate)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Answer/Claim: The recommended treatments for a person with a physical injury to the brain resulting in temporary or permanent impairment of brain function.

Groundedness Rating: 5 - Perfectly Grounded: The information is explicitly stated and verifiable within the provided source text.

Page(s) Found: Throughout the context, specifically sections discussing initial treatment and subsequent care for traumatic brain injury.

 Relevance Rating: 5

Answer/Claim: The recommended treatments for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function, include initial treatment such as ensuring a reliable airway, maintaining adequate ventilation, oxygenation, and blood pressure, as well as potential surgery, and subsequent rehabilitation.

Explanation: The answer directly addresses the main aspects of the question by summarizing the recommended treatments for traumatic brain injury (TBI) from the context. It includes the i

Observation
- Again groundedness and relevance are both at 5
- However, this prompt did not add much of an answer or a step-by-step evaluation to the questions
- This makes one point clear: when the LLM is asked for a straightforward answer, it is easy to get high groundedness and relevance scores

In [None]:
user_input = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
grounded_rate, relevance_rate = generate_prompt_rating_response(user_input,max_tokens=512)

print(grounded_rate,end="\n\n")
print(relevance_rate)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Answer/Claim: The necessary precautions and treatment steps for a person with a suspected open leg fracture.

Groundedness Rating: 5 - Perfectly Grounded: The information is explicitly stated in the provided context.

Page(s) Found: The context itself does not provide specific page numbers, but the information can be found in Chapter 323. Fractures, Dislocations & Sprains of The Merck Manual of Diagnosis & Therapy, 19th Edition.

 Answer/Claim: The necessary precautions and treatment steps for a suspected open leg fracture include sterile wound dressings, tetanus prophylaxis, and broad-spectrum antibiotics (eg, a 2nd-generation cephalosporin plus an aminoglycoside).

Relevance Rating: 5 - Highly Relevant: The answer directly addresses the main aspects of the question by providing specific treatment steps for a suspected open leg fracture.

Page(s) Found: The information is from the context provided, specifically from the text mentioning "sterile wound dressings, tetanus prophylaxis, a

Observation
- Again groundedness and relevance are both at 5
- However, this prompt did not add much of an answer or a step-by-step evaluation to the questions
- This makes one point clear: when the LLM is asked for a straightforward answer, it is easy to get high groundedness and relevance scores

#### Using Gemini LLM
- This is a quick experiment that uses the Gemini LLM through its API to generate RAG responses

In [None]:
!pip install -q langchain-google-genai
from langchain.chains import RetrievalQA
from langchain_google_genai import ChatGoogleGenerativeAI

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-generativeai 0.8.5 requires google-ai-generativelanguage==0.6.15, but you

In [None]:
import google.generativeai as genai
GOOGLE_API_KEY = "XXXX"
genai.configure(api_key=GOOGLE_API_KEY)
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
embeddings_model = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
llm_chat_gemini = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0.2)

In [None]:
model_0 = genai.GenerativeModel('gemini-1.5-flash')

Observation
- Installed and imported the packages required for Gemini
- Set the Google API key
 - It is hard-coded here, but in a real scenario it should be stored as a Colab secret (the "+ Add new secret" option in the settings)
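
A minimal sketch of loading the key without hard-coding it is shown below. It assumes the key is available either as an environment variable or as a Colab secret, both named `GOOGLE_API_KEY`; `load_google_api_key` is a hypothetical helper, and the Colab branch only runs when the `google.colab` package can be imported.

```python
import os


def load_google_api_key(name="GOOGLE_API_KEY"):
    """Read the API key from the environment, falling back to Colab secrets."""
    key = os.environ.get(name)
    if key:
        return key
    try:
        # Only available inside Google Colab
        from google.colab import userdata
    except ImportError:
        userdata = None
    if userdata is not None:
        key = userdata.get(name)
        if key:
            return key
    raise RuntimeError(f"{name} is not set as an environment variable or Colab secret")


# Demonstration with a placeholder value; a real key should never be committed
os.environ.setdefault("GOOGLE_API_KEY", "placeholder-key")
api_key = load_google_api_key()
print("API key loaded:", bool(api_key))
```

The returned value can then be passed to `genai.configure(api_key=...)` in place of the literal string.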

In [None]:
def generate_GEMINI_1_5_rag_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.get_relevant_documents(query=user_input,k=k)
    context_list = [d.page_content for d in relevant_document_chunks]

    # Combine document chunks into a single context
    context_for_query = ". ".join(context_list)

    user_message = qna_user_message_template.replace('{context}', context_for_query)
    user_message = user_message.replace('{question}', user_input)


    prompt = qna_system_message + '\n' + user_message



    # Generate the response
    try:
        response_chat = model_0.generate_content(
            prompt,
            generation_config={
                "temperature": temperature,
                "top_p": top_p,
                "top_k": top_k,
                # 'max_output_tokens' is the parameter name used by the direct SDK
                "max_output_tokens": max_tokens  # honor the max_tokens argument
            }
        )

        #print(response_chat)

        # The SDK returns a response object; its .text attribute holds the generated text
        generated_text = response_chat.text
        return generated_text

    except Exception as e:
        response = f'Sorry, I encountered the following error: \n {e}'

    return response

In [None]:
user_input = "What is the protocol for managing sepsis in a critical care unit? For the answer provided please provide a groundedness and relevance rating!"
response = generate_GEMINI_1_5_rag_response(user_input, max_tokens=1024)
print(response)

I don't know



In [None]:
user_input = "What is the protocol for managing sepsis in a critical care unit?"
response = generate_GEMINI_1_5_rag_response(user_input, max_tokens=1024)
print(response)

Patients with septic shock should be treated in an ICU and have the following monitored frequently: systemic pressure; CVP, PAOP, or both; pulse oximetry; ABGs; blood glucose, lactate, and electrolyte levels; renal function; and possibly sublingual PCO2.  Urine output should be measured, usually with an indwelling catheter. Fluid resuscitation with 0.9% saline should be given until CVP reaches 8 mm Hg (10 cm H2O) or PAOP reaches 12 to 15 mm Hg.  If a patient remains hypotensive after CVP or PAOP has been raised to target levels, dopamine may be given to increase mean BP to at least 60 mm Hg. If the dopamine dose exceeds 20 μg/kg/min, norepinephrine may be added. O2 is given by mask or nasal prongs, and tracheal intubation and mechanical ventilation may be needed for respiratory failure.



Observation
- The two cells above ask the same question, except that the first also asks for groundedness and relevance ratings
- The answer to the first is simply **I don't know**
- The second is exactly the question asked in the RAG section, without any modification
 - It produces a direct response that is short, clean, and concise
 - Further development and tuning are needed, but this was only an experiment and not an attempt to develop a full-fledged app

In [None]:
import google.generativeai as genai
# Ensure your API key is configured here as per the above steps
# genai.configure(api_key="...")

try:
    model = genai.GenerativeModel('gemini-1.5-flash')
    response = model.generate_content("Hello, Gemini!")
    print(response.text)
    print("API key is working!")
except Exception as e:
    print(f"API key test failed: {e}")

Hello there!  How can I help you today?

API key is working!


Observation
- This is a sample of connecting to the Gemini API to generate content

## Actionable Insights, Business Recommendations and Technical Challenges

Insights and Business Recommendations
- Vector store creation time increases with the number of pages in the document
- Configuration parameters are very important for tuning and prompt engineering
- max_tokens depends on query complexity and needs a higher value for detailed responses
- n_ctx is a critical parameter: it defines the number of tokens the LLM can consider as context at any given time. This includes the input tokens and any generated text, so it has to be tuned to suit the RAG responses
- n_gpu_layers=23 shifts much of the compute to the GPU and has to be tuned to the GPU's cores and memory. Several values were tried, from -1 to 43 (in llama-cpp-python, -1 offloads all layers to the GPU and 0 keeps everything on the CPU). A value of 23 was the setting that no longer crashed the session.
- chunk_overlap makes consecutive chunks overlap so that context carries over from one chunk to another. chunk_size is set to 512, a commonly recommended value for Chroma/Pinecone; the usual range is 256-1000
  - smaller chunks suit specific information
  - larger chunks (512-1000) suit summarizing concepts and understanding the relations between pieces of information
- Continuously adjust RAG parameters based on specific use cases for optimal performance.
- Prioritize groundedness and relevance in evaluations to ensure reliable and contextually accurate outputs.
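
To make the interaction between chunk_size and chunk_overlap concrete, here is a minimal character-based sliding-window sketch. It is an illustration only, not the splitter used in this notebook: `chunk_text` is a hypothetical helper, it counts characters rather than tokens, and the default overlap of 64 is an arbitrary illustrative value.

```python
def chunk_text(text, chunk_size=512, chunk_overlap=64):
    """Split text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


# Small values make the shared characters between neighbours easy to see
for chunk in chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2):
    print(chunk)
```

A larger overlap repeats more text between neighbouring chunks, which helps preserve context at the boundaries but increases the number of chunks to embed and store.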


Technical challenges
- This section matters for this application because the challenges faced while developing the RAG responses took out-of-the-box thinking to resolve
- Bought $9.99 of extra compute units for generating and evaluating the RAG responses
- The session kept crashing and had to be restarted from the beginning every time
- Out-of-memory errors were thrown continuously while building the vector store and also while running RAG question answering
 - This led to buying the additional compute units
 - It also taught me how to save the Chroma DB vectors to Google Drive and load them later
 - There was a lot of other learning when switching to the Gemini API; the only concern in using it is that it cannot be tuned with the same LLM parameters
 - n_ctx and n_batch must be chosen carefully during development, as each behaves differently at execution time
  - n_ctx covers the total tokens (input + output); the RAG prompt itself may take up many tokens, so the output may not fit the full response format expected from the RAG response
  - n_batch: a higher value can speed up the prompt-processing phase by using parallel processing for large batches
   - But that comes with much higher memory consumption
  - Tuning these two parameters let the app run again without any memory errors or issues
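
The n_ctx point above comes down to simple arithmetic: the prompt and the generated text share one context window. The sketch below illustrates that budget; `output_token_budget` is a hypothetical helper and the numbers are illustrative, not measurements from this notebook. In practice the prompt token count would come from the model's tokenizer.

```python
def output_token_budget(n_ctx, prompt_tokens, max_tokens):
    """Tokens actually available for generation within the context window."""
    remaining = max(n_ctx - prompt_tokens, 0)  # space left after the prompt
    return min(max_tokens, remaining)


# Illustrative numbers: a long RAG prompt leaves less room than max_tokens asks for
n_ctx, prompt_tokens, max_tokens = 2300, 1900, 512
budget = output_token_budget(n_ctx, prompt_tokens, max_tokens)
print(f"requested {max_tokens} tokens, only {budget} fit in the context window")
```

When the budget is smaller than the requested max_tokens, the options are to raise n_ctx (at the cost of memory), retrieve fewer or smaller chunks, or shorten the system message.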


<font size=6 color='blue'>Power Ahead</font>
___