# RAG (Retrieval-Augmented Generation) with Phi-3.5

This notebook is a simple implementation of a RAG system with Phi-3.5. A RAG system combines a retriever and a generator: the retriever finds the passages in a large corpus that are most relevant to a query, and the generator writes an answer based on those retrieved passages.

**Models used:**
- Embedding model: `all-mpnet-base-v2` (768-dimensional embeddings)
- Generator model: `Phi-3.5-mini-instruct` (3.8B parameter model)
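
The retrieve-then-generate flow described above can be sketched with toy data. This is a minimal illustration, not the notebook's implementation: the 2-D vectors, `retrieve`, and `build_prompt` are hypothetical names made up for the example, and the generator step is left out.

```python
import numpy as np

def retrieve(query_vec, doc_vecs, k=2):
    """Return indices of the k documents closest to the query (squared L2 distance)."""
    dists = ((doc_vecs - query_vec) ** 2).sum(axis=1)
    return np.argsort(dists)[:k].tolist()

def build_prompt(query, docs, hits):
    """Number the retrieved documents and place them ahead of the query."""
    context = "\n".join(f"{i + 1}. {docs[j]}" for i, j in enumerate(hits))
    return f"Context:\n{context}\n\nQuery: {query}\nAnswer:"

# Toy corpus with made-up 2-D "embeddings" (real ones come from the embedding model).
docs = [
    "Vitamin C is found in citrus fruit.",
    "Iron is found in red meat.",
    "Paris is in France.",
]
doc_vecs = np.array([[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]])
query_vec = np.array([1.0, 0.1])

hits = retrieve(query_vec, doc_vecs, k=2)
prompt = build_prompt("Which foods contain vitamin C?", docs, hits)
print(prompt)
```

In the notebook itself, the retriever role is played by the FAISS index and the generator role by the Phi-3.5 model.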

In [1]:
!pip install faiss-cpu pymupdf sentence_transformers gdown -q

!mkdir -p vec_db # stores vector embeddings
!mkdir -p knowledge # stores pdf files

In [2]:
import sys
sys.path.append('/kaggle/input/rag-utils/')

import os
import json
import torch
import faiss
import pymupdf
import numpy as np
from tqdm import tqdm

from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load embedding model
embedder = SentenceTransformer('all-mpnet-base-v2')

# Load the llm and tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-mini-instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3.5-mini-instruct",trust_remote_code=True)


In [3]:
# Initialize an empty flat (exhaustive-search) L2 index
index = faiss.IndexFlatL2(embedder.get_sentence_embedding_dimension())

# Initialize metadata
metadata = []

print('Number of chunks: ', index.ntotal) # 0

Number of chunks:  0
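
`IndexFlatL2` performs an exhaustive search ranked by squared Euclidean distance. If the embeddings are unit-length (an assumption here, since it depends on whether the embedding model normalizes its output), that ranking coincides with cosine-similarity ranking, because $\|a - b\|^2 = 2 - 2\,a \cdot b$ for unit vectors. A small NumPy check, with random vectors standing in for real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for embeddings, normalized to unit length (assumption).
docs = rng.normal(size=(100, 768))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query = rng.normal(size=768)
query /= np.linalg.norm(query)

sq_l2 = ((docs - query) ** 2).sum(axis=1)  # quantity a flat L2 index ranks by
cosine = docs @ query                      # cosine similarity for unit vectors

# Smallest distance first vs. largest similarity first.
order_l2 = np.argsort(sq_l2)
order_cos = np.argsort(-cosine)

same_top5 = np.array_equal(order_l2[:5], order_cos[:5])
identity_holds = np.allclose(sq_l2, 2 - 2 * cosine)
print(same_top5, identity_holds)
```

If the embeddings are not normalized, the two rankings can differ, and an inner-product index over normalized vectors would be the usual way to get cosine ranking.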


## Helper class that holds methods we'll use later

In [4]:
class Utils:
    def __init__(self, embedding_model: SentenceTransformer=None, llm_model: AutoModelForCausalLM=None, llm_tokenizer: AutoTokenizer=None, index=None, metadata=None, chunk_size=512):
        self.embedding_model = embedding_model
        self.llm_model = llm_model
        self.llm_tokenizer = llm_tokenizer
        self.index = index
        self.metadata = metadata
        self.chunk_size = chunk_size

    def extract_text_from_pdf(self, pdf_path):
        """
        Extract text from PDF file. Returns a list of tuples (page_number, text).
        """
        text = []
        pdf_document = pymupdf.open(pdf_path)
        for page_num in range(len(pdf_document)):
            page = pdf_document.load_page(page_num)
            text.append((page_num, str(page.get_text()).replace("\n", " ")))
        return text

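    def chunk_text(self, text: list[tuple[int, str]]):
        """
        Split each page into consecutive chunks of at most `chunk_size` characters
        (characters, not tokens). Returns a list of tuples (page_number, chunk).
        """
        chunks = []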
        for page_num, page_text in text:
            page_chunks = [
                (page_num, page_text[i:i+self.chunk_size])
                for i in range(0, len(page_text), self.chunk_size)
            ]
            chunks.extend(page_chunks)
        return chunks

    def add_chunks_to_faiss(self, chunks, filename, db_loc="/kaggle/working/vec_db/"):
        for chunk_num, (page_number, chunk) in enumerate(tqdm(chunks, desc="Adding chunks to FAISS")):
            embeddings = self.embedding_model.encode(chunk, show_progress_bar=False)
            self.index.add(np.array([embeddings]))
            self.metadata.append({
                "filename": filename,
                "page_number": page_number,
                "chunk_num": chunk_num,
                "chunk": chunk
            })
        faiss.write_index(self.index, db_loc + "vector_database.index")
        with open(db_loc + "metadata.json", "w") as file:
            json.dump(self.metadata, file)

    def process_file(self, file_path):
        """
        Process the file and add chunks to FAISS index
        """
        if file_path.endswith('.pdf'):
            text = self.extract_text_from_pdf(file_path)
        else:
            print(f"Unsupported file format, with extension: {os.path.splitext(file_path)[1]}")
            return 0

        chunks = self.chunk_text(text)
        self.add_chunks_to_faiss(chunks, filename=os.path.basename(file_path))
        return len(chunks)
    
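    def answer_question(self, prompt_template="", query="", max_tokens=512, temp=None, k=5):
        """
        Retrieve the k chunks nearest to the query, insert them into the prompt
        template, and generate an answer. Returns (answer, retrieved_chunks).
        """
        # Embed the question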
        question_embedding = self.embedding_model.encode(query, show_progress_bar=False)

        # search for similar chunks
        D, I = self.index.search(np.array([question_embedding]), k)

        # get the chunks text
        chunks = [self.metadata[i] for i in I[0]]
        
        # Augment the input text with the chunks
        context=""
        for i, chunk in enumerate(chunks):
            context += f"{i+1}. {chunk['chunk']}\n"
            
        prompt = prompt_template.format(context=context, query=query)
        
        messages = [
            {"role": "system", "content": "Be helpful, straight to the point\nDon't put irrelevant information\nAnswer using the context\nMake sure your answers are as explanatory as possible\nDon't hallucinate."},
            {"role": "user", "content": prompt},
        ]
        
        input_ids = self.llm_tokenizer.apply_chat_template(conversation=messages,
                                      tokenize=True,
                                      add_generation_prompt=True,
                                      return_tensors="pt")
        
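        # Generate answer; an all-ones attention mask is valid for a single unpadded prompt
        output_ids = self.llm_model.generate(input_ids, attention_mask=torch.ones_like(input_ids),
                                             max_new_tokens=max_tokens, temperature=temp, do_sample=True)[0]

        # Decode only the newly generated tokens rather than splitting on the "<|assistant|>" marker
        answer = self.llm_tokenizer.decode(output_ids[input_ids.shape[-1]:], skip_special_tokens=True)

        return answer.replace("<|end|>", "").strip(), chunks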
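
`chunk_text` cuts each page into fixed-size character windows, so a sentence can be split across two chunks. One common mitigation is to let consecutive windows overlap. The function below is a hypothetical variant (`chunk_with_overlap` and its `overlap` parameter are not part of the `Utils` class), shown as a sketch rather than a drop-in replacement:

```python
def chunk_with_overlap(text, chunk_size=512, overlap=64):
    """Split text into character windows of chunk_size that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "abcdefghij" * 3  # 30 characters
chunks = chunk_with_overlap(sample, chunk_size=10, overlap=2)
print(len(chunks))                    # number of windows
print(chunks[0][-2:], chunks[1][:2])  # characters shared by the first two windows
```

Overlap increases the number of chunks, and therefore the index size, so it trades storage and indexing time for better context at chunk boundaries.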

## Download some data about human nutrition
You can inspect the books by pasting any of the links below into a browser.

In [5]:
!gdown --fuzzy 'https://drive.google.com/file/d/1sI4LVzS_cxhyUqkVfJstxT_QyymdWzmW/view?usp=sharing' -O knowledge/human-nutrition-1.pdf
!gdown --fuzzy 'https://drive.google.com/file/d/1WyF1j5l1y-35SS3zpflXXi_y02ffdXbT/view?usp=sharing' -O knowledge/human-nutrition-2.pdf
!wget --no-check-certificate 'http://solr.bccampus.ca:8001/bcc/file/17b5912c-ff77-466a-acb7-c1af0cda5bf8/1/Human-Nutrition-1611795434%281%29.pdf' -O knowledge/human-nutrition-3.pdf

Downloading...
From: https://drive.google.com/uc?id=1sI4LVzS_cxhyUqkVfJstxT_QyymdWzmW
To: /kaggle/working/knowledge/human-nutrition-1.pdf
100%|██████████████████████████████████████| 26.9M/26.9M [00:00<00:00, 31.9MB/s]
Downloading...
From: https://drive.google.com/uc?id=1WyF1j5l1y-35SS3zpflXXi_y02ffdXbT
To: /kaggle/working/knowledge/human-nutrition-2.pdf
100%|██████████████████████████████████████| 5.01M/5.01M [00:00<00:00, 16.9MB/s]
--2024-09-18 18:37:28--  http://solr.bccampus.ca:8001/bcc/file/17b5912c-ff77-466a-acb7-c1af0cda5bf8/1/Human-Nutrition-1611795434(1).pdf
Resolving solr.bccampus.ca (solr.bccampus.ca)... 142.58.234.47
Connecting to solr.bccampus.ca (solr.bccampus.ca)|142.58.234.47|:8001... connected.
HTTP request sent, awaiting response... 200 
Length: 26355405 (25M) [application/pdf]
Saving to: 'knowledge/human-nutrition-3.pdf'


2024-09-18 18:38:53 (306 KB/s) - 'knowledge/human-nutrition-3.pdf' saved [26355405/26355405]



In [6]:
utils = Utils(
    embedder,
    model,
    tokenizer,
    index,
    metadata,
    chunk_size=512
)

In [7]:
# Chunk the PDFs and add their embeddings to the index
knowledge_dir = "/kaggle/working/knowledge/"
for file in os.listdir(knowledge_dir):
    utils.process_file(knowledge_dir + file)
    
print('Number of chunks: ', index.ntotal)

Adding chunks to FAISS: 100%|██████████| 3088/3088 [00:46<00:00, 66.32it/s]
Adding chunks to FAISS: 100%|██████████| 3311/3311 [00:50<00:00, 65.19it/s]
Adding chunks to FAISS: 100%|██████████| 3059/3059 [00:48<00:00, 63.43it/s]


Number of chunks:  9458
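
The total reported above is the sum of the three per-file counts (3088 + 3311 + 3059 = 9458). Because `chunk_text` splits each page independently, a page of `n` characters contributes `ceil(n / chunk_size)` chunks. A quick way to sanity-check such a count, using made-up page lengths (`expected_chunk_count` is a hypothetical helper):

```python
import math

def expected_chunk_count(page_lengths, chunk_size=512):
    """Number of chunks produced when each page is split separately."""
    return sum(math.ceil(n / chunk_size) for n in page_lengths)

# Hypothetical page lengths in characters; an empty page yields no chunks.
page_lengths = [1200, 512, 0, 513]
total = expected_chunk_count(page_lengths)
print(total)  # 3 + 1 + 0 + 2
```

Since the split is per page, the last chunk of every page is usually shorter than `chunk_size`.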


## Test using few-shot prompting

In [8]:
# List of 20 nutrition-related questions
questions = [
    "What are the health benefits of consuming leafy green vegetables?",
    "How much protein should I consume per day? I weigh 80 kg and am 21 years old.",
    "What is the recommended daily intake of vitamin C for adults?",
    "What are the primary sources of omega-3 fatty acids in a plant-based diet?",
    "Which foods are high in iron and suitable for people with anemia?",
    "What are the side effects of excessive sugar consumption?",
    "Which nutrients are essential for muscle recovery after exercise?",
    "How can a vegan diet provide all essential amino acids?",
    "What foods should be avoided to reduce the risk of cardiovascular disease?",
    "Which diet is best for managing Type 2 diabetes?",
    "What are the health benefits of following a Mediterranean diet?",
    "What are the symptoms of vitamin D deficiency?",
    "How does magnesium support muscle function?",
    "What are the signs of a vitamin B12 deficiency, and how can it be treated?",
    "How much potassium should be consumed daily to maintain normal blood pressure?",
    "What are the benefits of selenium, and which foods are rich in it?",
    "How does excessive sodium intake affect the body?",
    "What is the best diet for people with gluten intolerance?",
    "What is the best way to manage high cholesterol through diet?",
    "Which foods can help alleviate symptoms of acid reflux?"
]

# Randomly select a query from the list
random_query = np.random.choice(questions)
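
`np.random.choice` draws a different question on each run unless the global seed is fixed. For repeatable experiments, one option is a seeded `Generator`. The seed value and the shortened question list here are arbitrary choices for illustration:

```python
import numpy as np

sample_questions = [
    "What are the symptoms of vitamin D deficiency?",
    "How does magnesium support muscle function?",
    "How does excessive sodium intake affect the body?",
]

rng = np.random.default_rng(seed=42)        # fixed seed: same draw on every run
picked = str(rng.choice(sample_questions))  # cast the NumPy string to a plain str
print(picked)
```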

In [9]:
prompt_template = """Based on the following context items, please answer the query.
Give yourself room to think by extracting relevant passages from the context before answering the query.
Don't return the thinking, only return the answer.
Use the following examples as reference for the ideal answer style.
Example 1:
Query: What are the fat-soluble vitamins?
Answer: **Fat-soluble vitamins:** Include Vitamin A, Vitamin D, Vitamin E, and Vitamin K. These vitamins are absorbed along with fats in the diet and can be stored in the body's fatty tissue and liver for later use. Vitamin A is important for vision, immune function, and skin health. Vitamin D plays a critical role in calcium absorption and bone health. Vitamin E acts as an antioxidant, protecting cells from damage. Vitamin K is essential for blood clotting and bone metabolism.
Example 2:
Query: What are the causes of type 2 diabetes?
Answer: **Type 2 diabetes** is often associated with overnutrition, particularly the overconsumption of calories leading to obesity. **Factors include:** Diet high in refined sugars and saturated fats, which can lead to insulin resistance, a condition where the body's cells do not respond effectively to insulin. Over time, the pancreas cannot produce enough insulin to manage blood sugar levels, resulting in type 2 diabetes. Additionally, excessive caloric intake without sufficient physical activity exacerbates the risk by promoting weight gain and fat accumulation, particularly around the abdomen, further contributing to insulin resistance.
Example 3:
Query: What is the importance of hydration for physical performance?
Answer: **Hydration** is crucial for physical performance because water plays key roles in maintaining blood volume, regulating body temperature, and ensuring the transport of nutrients and oxygen to cells. Adequate hydration is essential for optimal muscle function, endurance, and recovery. Dehydration can lead to decreased performance, fatigue, and increased risk of heat-related illnesses, such as heat stroke. Drinking sufficient water before, during, and after exercise helps ensure peak physical performance and recovery.
Now use the following context items to answer this one user query only:
{context}
Relevant passages: <extract relevant passages from the context here>
Main User Query: {query}
Answer:\n"""

response, chunks = utils.answer_question(
        prompt_template=prompt_template,
        query=random_query,
        max_tokens=512,
        temp=0.1
)

print("Query: ", random_query)
print("\nResponse: ", response)
print("\nContext: ", chunks)

Query:  What are the signs of a vitamin B12 deficiency, and how can it be treated?

Response:  **Signs of Vitamin B12 Deficiency:**
- Megaloblastic bone marrow changes
- Incipient anemia
- Myelin damage
- Early bone marrow changes
- Abnormalities of the deoxyuridine monophosphate (dUMP) suppression test

**Treatment for Vitamin B12 Deficiency:**
- Large oral doses of vitamin B12
- Sublingual administration (placing the vitamin under the tongue)
- Injection of vitamin B12 for patients who do not respond to oral or sublingual treatment

Additionally, older individuals with impaired absorption due to conditions like atrophic gastritis might benefit from food enrichment with vitamin B12.

The context indicates that vitamin B12 deficiency can lead to irreversible nerve damage, especially in older people. Treatment options include oral or sublingual administration of vitamin B12, and injections for those who do not respond to these methods. Food enrichment with vitamin B12 is also suggested 
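
Each retrieved chunk carries the `filename` and `page_number` stored in the metadata, so an answer can be traced back to its sources. A small sketch with made-up metadata entries follows. The `format_sources` helper is hypothetical, and page numbers are assumed to be 0-based, as produced by the `range(len(pdf_document))` loop in `extract_text_from_pdf`:

```python
def format_sources(chunks):
    """List the distinct (file, page) pairs behind an answer, in retrieval order."""
    seen = []
    for chunk in chunks:
        source = (chunk["filename"], chunk["page_number"] + 1)  # 1-based for display
        if source not in seen:
            seen.append(source)
    return [f"{name}, p. {page}" for name, page in seen]

# Made-up entries in the same shape as the notebook's metadata records.
example_chunks = [
    {"filename": "human-nutrition-1.pdf", "page_number": 11, "chunk_num": 40, "chunk": "..."},
    {"filename": "human-nutrition-1.pdf", "page_number": 11, "chunk_num": 41, "chunk": "..."},
    {"filename": "human-nutrition-2.pdf", "page_number": 3, "chunk_num": 9, "chunk": "..."},
]
sources = format_sources(example_chunks)
for line in sources:
    print(line)
```

Printing such a list next to the response is usually easier to read than dumping the raw `chunks` list.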