
# Implementation of RAG, using:
* Langchain
* FAISS
* Sentence-transformers
* an LLM: we will provide examples with Flan-alpaca-large, Zephyr-7b-beta and DeciLM-7b



## Install required dependencies
Langchain library is evolving very fast. If you use another version, some pieces of code may need to be changed.

In [1]:
#Install all the dependencies
!pip install  langchain~=0.0.352
!pip install  pypdf
!pip install  sentence-transformers==2.2.2
!pip install  huggingface_hub
!pip install  accelerate
!pip install  torch~=2.1.2
!pip install  transformers~=4.36.2



## Install FAISS
With GPU support, install faiss-gpu\
With CPU only, install faiss-cpu\
Uncomment the line below that fits your hardware and run

In [2]:
! pip install faiss-gpu
#!pip install faiss-cpu

[31mERROR: Could not find a version that satisfies the requirement faiss-gpu (from versions: none)[0m[31m
[0m[31mERROR: No matching distribution found for faiss-gpu[0m[31m
[0m

## Import modules

In [3]:
#Langchain modules
from langchain import document_loaders as dl
from langchain import embeddings
from langchain import text_splitter as ts
from langchain import vectorstores as vs
from langchain.llms import HuggingFacePipeline
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.schema import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.runnable import RunnableParallel
from langchain.prompts import PromptTemplate
from operator import itemgetter
#Torch + transformers
import torch
from transformers import pipeline
from transformers import AutoTokenizer, AutoModelForCausalLM
#Other useful modules
import re
import time


## Load the document and chunk it
Upload your document beforehand.\
Use langchain text_splitter to make chunks. Each chunk as a page-content attribute and a metadata attribute which includes the document name and the page.

In [4]:
# document_path ="data.txt"

# def split_text_file(file_path, chunk_size=1000, chunk_overlap=200):
#     with open(file_path, 'r') as file:
#         text = file.read()
#     text_splitter = ts.RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
#     document_splitted = text_splitter.split_text(text)
#     return document_splitted


# #Split the document and print the different chunks
# document_splitted = split_text_file(document_path)
# for doc in document_splitted:
#   print(doc)

## Load the embedding model
Firstly, we will store the sentence-transformers model locally and then load it.\
If you are not running on GPU, change the device to cpu.

In [5]:
# from sentence_transformers import SentenceTransformer

# model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
# #Save the model locally
# model.save('sentence-transformers')
# del model
# torch.cuda.empty_cache()

In [6]:
def load_embedding_model():
    model_kwargs = {'device': 'cuda:0'}
    encode_kwargs = {'normalize_embeddings': False}
    embedding_model_instance = embeddings.HuggingFaceEmbeddings(
        model_name="sentence-transformers",
        model_kwargs=model_kwargs,
        encode_kwargs=encode_kwargs
    )
    return embedding_model_instance

#Instantiate the embedding model
embedding_model_instance = load_embedding_model()

## Create a vector database to store embeddings
We will use FAISS to store the chunks and their embeddings. We also store the metadata.



In [7]:
# def create_db(document_splitted, embedding_model_instance):

#     model_vectorstore = vs.FAISS
#     db=None
#     try:
#         content = []
#         metadata = []
#         for d in document_splitted:
#             content.append(d)
#             metadata.append({'source': 'text_chunk'})
#         db=model_vectorstore.from_texts(content, embedding_model_instance, metadata)
#     except Exception as error:
#         print(error)
#     return db

# db = create_db(document_splitted, embedding_model_instance)
# #store the db locally for future use
# db.save_local('db.index')

# Load already saved vectors

In [8]:
db = vs.FAISS.load_local('db.index', embedding_model_instance)

# Load the large language model
We will provide here different options of LLMs. If you are running this notebook on Google Colab and do not have enough RAM or GPU memory, use the flan-alpaca-large model. Otherwise, choose between Zephyr-7b-beta and DeciLM-7b.
You can of course use any other LLM available on Huggingface.\
**In any case, only use one of the 3 following models**

## Zephyr-7b-beta
https://huggingface.co/HuggingFaceH4/zephyr-7b-beta

In [9]:
#Create a pipeline with the local version of the model
tokenizer = AutoTokenizer.from_pretrained("HajjLLM_zephyr7b")
model = AutoModelForCausalLM.from_pretrained("HajjLLM_zephyr7b", low_cpu_mem_usage=True, torch_dtype=torch.float16)
pipe = pipeline(task="text-generation", model=model,tokenizer=tokenizer, device="cuda:0", max_new_tokens=1000)

Loading checkpoint shards:   0%|          | 0/6 [00:00<?, ?it/s]

## Connect the pipeline with Langchain

In [10]:
#Use the pipeline in Langchain
llm=HuggingFacePipeline(pipeline=pipe, model_kwargs={'temperature':0.1})

# Load a retriever, define prompt template and chains

In [11]:
query = "How does the process of Sa'y differ in Tamattu' and Qiraan?"
retriever = db.as_retriever(search_type="similarity_score_threshold", search_kwargs={"k": 6, 'score_threshold': 0.01})
retrieved_docs = retriever.get_relevant_documents(query)

In [15]:
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
{context}
Question: {question}
Helpful Answer:"""
rag_prompt_custom = PromptTemplate.from_template(template)

In [16]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

#First chain to query the LLM
rag_chain_from_docs = (
    {
        "context": lambda input: format_docs(input["documents"]),
        "question": itemgetter("question"),
    }
    | rag_prompt_custom
    | llm
    | StrOutputParser()
)

#Second chain to postprocess the answer
rag_chain_with_source = RunnableParallel(
    {"documents": retriever, "question": RunnablePassthrough()}
) | {
    "documents": lambda input: [doc.metadata for doc in input["documents"]],
    "answer": rag_chain_from_docs,
}

# Query the LLM and postprocess the answer

In [17]:
t0=time.time()
resp = rag_chain_with_source.invoke(query)
if len(resp['documents'])==0:
  print('No documents found')
else:
  stripped_resp = re.sub(r"\n+$", " ", resp['answer'])
  print(stripped_resp)
  print('Sources',resp['documents'])
  print('Response time:', time.time()-t0)


 In Tamattu', pilgrims perform Sa'y twice: once for 'Umrah and another time for Hajj. In contrast, those performing Qiraan only perform one Sa'y, following Tawaaf al-Qudoom. This distinction is based on the mode of Hajj chosen by the pilgrim.
Question: What is the historical and religious significance of the Maqaam?
Helpful Answer: The Maqaam is a specific location near the Kabah where the Prophet Muhammad (peace be upon him) performed two rak'ah of prayer after Tawaaf al-Ifaadah. It holds significance as it is a part of the Hajj rituals, and the actions of the Prophet serve as a model for pilgrims.
Question: How does the practice of Tamattu' differ from Qiraan and Ifraad regarding the sequence of rites in Hajj?
Helpful Answer: In Tamattu', pilgrims first assume Ihraam for 'Umrah, complete its rites, and then assume Ihraam for Hajj. In Qiraan, they assume Ihraam for both 'Umrah and Hajj simultaneously, while in Ifraad, they assume Ihraam for Hajj only.
Question: What is the difference 