**Implementing a chatbot using LangChain, FAISS, and Llama, following a Medium article**

Load PDF -> Split text into chunks -> Create text embeddings for the chunks and store them in a vector database -> Use a retriever to return context -> Define LLM pipeline -> Use a chain to integrate the LLM and retriever -> Give prompts, answer queries -> Enable chat history

Installing dependencies

In [None]:
!pip install accelerate transformers tokenizers
!pip install bitsandbytes einops
!pip install xformers
!pip install langchain langchain-community
!pip install -U langchain-huggingface sentence-transformers
!pip install pypdf
!pip install faiss-cpu  # or faiss-gpu if you have a GPU set up; install only one of the two


Loading the PDF text into a readable format using LangChain and PyPDF


In [None]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("/content/drive/MyDrive/Chatbot/IslamicLaws.pdf")
start_page = 22  # start of islamic laws
end_page = 531  # end of useful information

all_documents = loader.load()

# Keep only the relevant pages
docs = all_documents[start_page:end_page]
print(len(docs))
print(docs[0].page_content[:100])

509
In the name of Allah, the All-Beneficent, the Ever-Merciful. All praise is for Allah, Lord of the wo


Splitting text



In [None]:
# Total number of characters across the selected pages
length = sum(len(doc.page_content) for doc in docs)

print(f"Total characters:{length}")

Total characters:1388475


In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
custom_separators = ["\n", ". "]  # Try line breaks first, then sentence boundaries
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=500,
    add_start_index=True,
    separators=custom_separators
)
all_splits = text_splitter.split_documents(docs)
print(f"Number of chunks: {len(all_splits)}")
print(f"Characters in one chunk: {len(all_splits[0].page_content)}")

print(all_splits[0].metadata)
print(all_splits[0].page_content)

Number of chunks: 2578
Characters in one chunk: 951
{'producer': '3-Heights™ PDF Merge Split Shell 6.12.1.11 (http://www.pdf-tools.com)', 'creator': 'PyPDF', 'creationdate': '', 'moddate': '2023-11-21T11:28:37+00:00', 'source': '/content/drive/MyDrive/Chatbot/IslamicLaws.pdf', 'total_pages': 533, 'page': 22, 'page_label': '23', 'start_index': 0}
In the name of Allah, the All-Beneficent, the Ever-Merciful. All praise is for Allah, Lord of the worlds. May there be blessings and peace upon the most noble of the Prophets and Messengers, Muḥammad, and his good and pure progeny. May there be a perpetual curse upon all of their enemies from now until the resurrection on the Day of Retribution. • Ruling 1. A Muslim’s belief in the fundamentals of religion (uṣūl al‑dīn) must be based on personal insight [i.e. grounded in reason], and he cannot follow anyone in the fundamentals of religion; i.e. he cannot accept the word of someone who knows about the fundamentals of religion simply because that
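
How chunk_size and chunk_overlap interact can be illustrated with a simplified fixed-window splitter. This is only a sketch: it ignores separators entirely, so it is not the RecursiveCharacterTextSplitter algorithm, and the function name `sliding_chunks` is made up for this example.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[tuple[int, str]]:
    """Split text into fixed-size windows and return (start_index, chunk) pairs.

    Each window starts chunk_size - chunk_overlap characters after the previous one,
    so consecutive chunks share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=5)
for start, chunk in chunks:
    print(start, chunk)
```

With an overlap of half the chunk size, as in the cell above, most characters appear in two chunks. This roughly doubles the number of embeddings to store.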

Indexing the chunks using embeddings from Hugging Face and storing them in a vector store



In [None]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {"device": "cuda"}  # Run the embedding model on the GPU; use "cpu" if no CUDA device is available

embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)

# storing embeddings in the vector store
vectorstore = FAISS.from_documents(all_splits, embeddings)
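
What a similarity search over stored embeddings does can be sketched in plain Python. The sketch assumes the index ranks vectors by L2 (Euclidean) distance, which to my understanding is the default for LangChain's FAISS wrapper. The three-dimensional vectors and the names `l2_distance`, `top_k` and `toy_index` are invented for illustration; real embeddings from all-mpnet-base-v2 have many more dimensions.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 1) -> list[str]:
    """Return the texts of the k stored vectors closest to the query vector."""
    ranked = sorted(index, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Toy "vector store": (chunk text, made-up embedding)
toy_index = [
    ("ruling on taqlid", [0.9, 0.1, 0.0]),
    ("etiquette of eating", [0.1, 0.9, 0.1]),
    ("glossary entry", [0.4, 0.4, 0.6]),
]

query_vec = [0.8, 0.2, 0.1]  # made-up embedding of a question about taqlid
nearest = top_k(query_vec, toy_index, k=1)
two_nearest = top_k(query_vec, toy_index, k=2)
print(nearest, two_nearest)
```

A real FAISS index performs this kind of ranking far more efficiently, but the idea is the same: the query is embedded and the closest stored chunks are returned.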

Applying retrieval augmentation using Llama 2 7B


Setting up the retriever and the LLM


In [None]:
from torch import cuda, bfloat16
import transformers

retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 1})

# Invoke the retriever with a query
query = "Explain the concept of taqleed"
relevant_documents = retriever.invoke(query)

# Print the returned documents
for doc in relevant_documents:
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}")

# Initializing model pipeline

model_id = 'meta-llama/Llama-2-7b-chat-hf'

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

# set quantization configuration to load large model with less GPU memory
# this requires the `bitsandbytes` library
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)

# begin initializing HF items; hf_auth must already hold your Hugging Face access token
# (`token` replaces the deprecated `use_auth_token` argument in recent transformers releases)
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    token=hf_auth
)

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,
    device_map='auto',
    token=hf_auth
)

# enable evaluation mode to allow model inference
model.eval()

print(f"Model loaded on {device}")

Content: ‘Taqlīd’ simply means an undertaking to follow the fatwa of a particular mujtahid; it does not mean acting according to his instructions.8 Ruling 9. It is necessary for a mukallaf to learn those rulings that he considers he probably needs to learn in order to avoid sinning. ‘Sinning’ means not performing obligatory acts or performing unlawful acts. Ruling 10. If a mukallaf comes across a matter for which he does not know the Islamic ruling, it is necessary for him to act with caution or to follow a mujtahid according to the aforementioned conditions. However, in the event that a person does not have access to the fatwa of the most learned mujtahid, it is permitted (jāʾiz) for him to follow the next most learned mujtahid. Ruling 11. If someone relates a mujtahid’s fatwa to a second person, in the event that the mujtahid’s fatwa changes, it is not necessary for him to inform that second person that the fatwa of the mujtahid has changed
Metadata: {'producer': '3-Heights™ PDF Merg




Model loaded on cuda:0
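
Why 4-bit quantization matters here can be seen with a rough back-of-the-envelope estimate. The parameter count of about 6.74 billion is an assumption of this sketch. The helper `weight_memory_gb` is hypothetical and counts only the raw weights: it ignores quantization constants, layers kept in higher precision, activations and the KV cache.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


LLAMA2_7B_PARAMS = 6.74e9  # assumption: approximate parameter count of Llama 2 7B

fp16_gb = weight_memory_gb(LLAMA2_7B_PARAMS, 16)  # weights stored in 16-bit precision
nf4_gb = weight_memory_gb(LLAMA2_7B_PARAMS, 4)    # weights stored in 4-bit precision

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f"4-bit weights:  ~{nf4_gb:.1f} GB")
```

Actual GPU usage will be higher than the 4-bit figure, but the estimate suggests why the quantized model is much more likely to fit on a single small GPU.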


Tokenizer for LLMs, also defining stopping criteria

In [None]:
import torch
from transformers import StoppingCriteria, StoppingCriteriaList

tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_id,
    token=hf_auth
)
stop_list = ['\nHuman:', '\n```\n']

# tokenize the stop strings without special tokens; otherwise a BOS token is
# prepended and the sequences would not match the end of the generated text
stop_token_ids = [tokenizer(x, add_special_tokens=False)['input_ids'] for x in stop_list]
stop_token_ids = [torch.LongTensor(x).to(device) for x in stop_token_ids]

# define custom stopping criteria object
class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for stop_ids in stop_token_ids:
            # skip stop sequences longer than the text generated so far
            if input_ids.shape[1] < len(stop_ids):
                continue
            if torch.eq(input_ids[0][-len(stop_ids):], stop_ids).all():
                return True
        return False

stopping_criteria = StoppingCriteriaList([StopOnTokens()])
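
The check inside StopOnTokens is a suffix comparison: generation stops once the most recent token IDs equal one of the stop sequences. A minimal sketch of the same logic with plain lists follows. The token IDs and the name `ends_with_any` are made up and do not correspond to real Llama 2 token IDs.

```python
def ends_with_any(generated_ids: list[int], stop_sequences: list[list[int]]) -> bool:
    """Return True if generated_ids ends with any of the stop sequences."""
    for stop_ids in stop_sequences:
        n = len(stop_ids)
        # Only compare when enough tokens have been generated to hold the stop sequence
        if n and len(generated_ids) >= n and generated_ids[-n:] == stop_ids:
            return True
    return False


# Hypothetical token IDs standing in for the tokenized stop strings
stop_sequences = [[7, 8, 9], [4, 4]]

print(ends_with_any([1, 2, 7, 8, 9], stop_sequences))  # ends with the first stop sequence
print(ends_with_any([1, 2, 7, 8], stop_sequences))     # stop sequence not complete yet
```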




Calling the transformers pipeline for text generation with the LLM

In [None]:
generate_text = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,  # return prompt + completion; the answers below therefore echo the prompt
    task='text-generation',
    # we pass model parameters here too
    stopping_criteria=stopping_criteria,  # without this model rambles during chat
    temperature=0.1,  # 'randomness' of outputs; lower is more deterministic (only applies when sampling is enabled)
    max_new_tokens=512,  # max number of tokens to generate in the output
    repetition_penalty=1.1  # without this output begins repeating
)

Device set to use cuda:0
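
The effect of the temperature parameter can be sketched with a temperature-scaled softmax over a few made-up logits. This is an illustration of the general technique, not the transformers implementation, and the function name `softmax_with_temperature` is hypothetical. As far as I know, temperature only influences generation when sampling is enabled (for example with do_sample=True).

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens

sharp = softmax_with_temperature(logits, 0.1)    # nearly all mass on the top token
neutral = softmax_with_temperature(logits, 1.0)  # unscaled distribution
flat = softmax_with_temperature(logits, 2.0)     # mass spread more evenly

for name, probs in [("T=0.1", sharp), ("T=1.0", neutral), ("T=2.0", flat)]:
    print(name, [round(p, 4) for p in probs])
```

A low value such as 0.1 makes the most likely token dominate, which suits question answering over a fixed text where varied wording is not needed.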


Using the Hugging Face pipeline in LangChain

In [None]:
from langchain_huggingface import HuggingFacePipeline  # replaces the deprecated langchain.llms import
llm = HuggingFacePipeline(pipeline=generate_text)


Defining a prompt template so that the LLM knows exactly what it needs to do

In [None]:
from langchain_core.prompts import PromptTemplate

prompt_template = """You are a helpful chatbot that answers questions based on the provided context.
Your goal is to provide a concise and direct answer to the user's question, citing the specific information from the context that supports your answer.
If the context does not contain the answer, or if the question is unrelated, truthfully state "Based on the provided information, I cannot answer this question." Do not invent or infer information.

Context: {context}

Question: {question}
Answer: """


PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["context", "question"]
)
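
What the chain does with this template at query time can be sketched with Python's built-in str.format, which fills the {context} and {question} placeholders in the same way. The shortened template, the placeholder context string and the helper name `fill_prompt` are assumptions for illustration only.

```python
# Shortened stand-in for the prompt template defined above
template = "Context: {context}\n\nQuestion: {question}\nAnswer: "


def fill_prompt(template: str, context: str, question: str) -> str:
    """Substitute the retrieved context and the user question into the template."""
    return template.format(context=context, question=question)


filled = fill_prompt(
    template,
    context="(text of the retrieved chunk goes here)",
    question="What is meant by taqleed?",
)
print(filled)
```

The filled string ends with "Answer: ", so the model continues the text from that point.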

Building a chain for RAG-based text generation

In [None]:
from langchain.chains import ConversationalRetrievalChain
chain = ConversationalRetrievalChain.from_llm(
    llm,
    retriever,
    return_source_documents=True,
    combine_docs_chain_kwargs={"prompt": PROMPT}
)

Testing prompt

In [None]:
chat_history = []

query = "What are the etiquettes?"
result = chain.invoke({"question": query, "chat_history": chat_history})

print(result['answer'])

You are a helpful chatbot that answers questions based on the provided context.
Your goal is to provide a concise and direct answer to the user's question, citing the specific information from the context that supports your answer.
If the context does not contain the answer, or if the question is unrelated, truthfully state "Based on the provided information, I cannot answer this question." Do not invent or infer information.

Context: . With regard to eating and drinking, the following things are recommended (mustaḥabb) for one to do: 1. to wash both hands before eating; 2. to wash both hands after eating and dry them with a piece of cloth; 3. the host should start eating before everyone else and stop eating after everyone else. Before eating, the host should wash his hands first, then the person seated to his right [should wash his], and so on until the turn comes to the person seated to the left of the host. After eating, the person seated to the left of the host should wash his han

In [None]:
query = "What is meant by taqleed?"
result = chain.invoke({"question": query, "chat_history": chat_history})

print(result['answer'])

You are a helpful chatbot that answers questions based on the provided context.
Your goal is to provide a concise and direct answer to the user's question, citing the specific information from the context that supports your answer.
If the context does not contain the answer, or if the question is unrelated, truthfully state "Based on the provided information, I cannot answer this question." Do not invent or infer information.

Context: 26 Taqiyyah refers to dissimulation or concealment of one’s beliefs in the face of danger.

Question: What is meant by taqleed?
Answer:  Taqleed (تقليد) is the act of following a particular scholarly opinion or interpretation without fully understanding its basis or reasoning. It is often used as a means of seeking protection and security in times of uncertainty or danger. In the context of Islamic law, it refers to the practice of following a particular madhhab (school of thought) or scholarly opinion without necessarily comprehending the underlying rea

In [None]:
query = "What is the criteria for taqleed?"
result = chain.invoke({"question": query, "chat_history": chat_history})

print(result['answer'])



You are a helpful chatbot that answers questions based on the provided context.
Your goal is to provide a concise and direct answer to the user's question, citing the specific information from the context that supports your answer.
If the context does not contain the answer, or if the question is unrelated, truthfully state "Based on the provided information, I cannot answer this question." Do not invent or infer information.

Context: . having wuḍūʾ, ghusl, or tayammum ṭāhir pure taḥnīṭ camphorating tajwīd the discipline of reciting the Qur’an correctly takbīr proclamation of Allah’s greatness by saying ‘allāhu akbar’ takbīrat al‑iḥrām saying ‘allāhu akbar’ at the beginning of the prayer takfīn shrouding taklīf responsibility al‑ṭalāq al‑bāʾin irrevocable divorce al-ṭalāq al-rijʿī revocable divorce talqīn inculcation of principle beliefs to a dying person or a corpse tamām complete form of the prayer taʿqībāt supplications after prayers taqiyyah dissimulation or concealment of one’s b

Cleaning up, adding widget


In [None]:
from IPython.display import display
import ipywidgets as widgets

In [None]:
chat_history = []

def on_submit(_):
    query = input_box.value
    input_box.value = ""

    if query.lower() == 'exit':
        print("Thank you for using our chatbot!")
        return

    result = chain.invoke({"question": query, "chat_history": chat_history})

    display(widgets.HTML(f'<b>User:</b> {query}'))
    display(widgets.HTML(f'<b><font color="blue">Chatbot:</font></b> {result["answer"]}'))

print("Welcome to the Ayatollah chatbot! Type 'exit' to stop.")

input_box = widgets.Text(placeholder='Please enter your question:')
input_box.on_submit(on_submit)

display(input_box)

Welcome to the Ayatollah chatbot! Type 'exit' to stop.


Text(value='', placeholder='Please enter your question:')

HTML(value='<b>User:</b> what are the laws for following a jurist')

HTML(value='<b><font color="blue">Chatbot:</font></b> You are a helpful chatbot that answers questions based o…

Next steps: add chat history using create_history_aware_retriever, and write a better prompt template for the LLM so that the answer does not repeat the whole prompt and context.
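
One possible way to keep the echoed prompt out of the displayed answer is to post-process the returned text and keep only what follows the last "Answer:" marker. This is a sketch under the assumption that the prompt ends with that marker, as the template above does. The helper name `extract_answer` is hypothetical.

```python
def extract_answer(full_text: str, marker: str = "Answer:") -> str:
    """Return the text after the last occurrence of marker, or the whole text if it is absent."""
    _, found, tail = full_text.rpartition(marker)
    return tail.strip() if found else full_text.strip()


# Made-up example of a response that echoes the prompt before the answer
echoed = (
    "You are a helpful chatbot that answers questions based on the provided context.\n\n"
    "Context: (retrieved chunk)\n\n"
    "Question: What is meant by taqleed?\n"
    "Answer:  Taqleed means following the fatwa of a mujtahid."
)

print(extract_answer(echoed))
```

The approach fails if the model's own answer contains the marker again. Setting return_full_text=False in the transformers pipeline is another option worth trying.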

Adding chat history

In [None]:
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
### Contextualize question ###
contextualize_q_system_prompt = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)


### Answer question ###
qa_system_prompt = """You are an assistant for question-answering tasks. \
Use the following pieces of retrieved context to answer the question. \
If you don't know the answer, just say that you don't know. \
{context}"""
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)


### Statefully manage chat history ###
store = {}


def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]


conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)
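
The role of the session_id can be sketched with a plain dictionary: each session keeps its own list of messages, so conversations do not leak into each other. This toy version stores (role, text) tuples instead of LangChain message objects. The names `sessions` and `record_turn`, the second session ID and the placeholder answer are invented for illustration.

```python
from collections import defaultdict

# One message list per session ID, created on first use
sessions: dict[str, list[tuple[str, str]]] = defaultdict(list)


def record_turn(session_id: str, role: str, text: str) -> None:
    """Append one message to the history of the given session."""
    sessions[session_id].append((role, text))


record_turn("abc123", "human", "What is taqleed?")
record_turn("abc123", "ai", "(answer text)")
record_turn("xyz789", "human", "What are the etiquettes of eating?")

for session_id, history in sessions.items():
    print(session_id, len(history))
```

Reusing the same session_id in later calls is what lets a follow-up question see the earlier turns.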

In [None]:
conversational_rag_chain.invoke(
    {"input": "What is taqleed?"},
    config={
        "configurable": {"session_id": "abc123"}
    },  # constructs a key "abc123" in `store`.
)["answer"]

'System: You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don\'t know the answer, just say that you don\'t know. ‘Taqlīd’ simply means an undertaking to follow the fatwa of a particular mujtahid; it does not mean acting according to his instructions.8 Ruling 9. It is necessary for a mukallaf to learn those rulings that he considers he probably needs to learn in order to avoid sinning. ‘Sinning’ means not performing obligatory acts or performing unlawful acts. Ruling 10. If a mukallaf comes across a matter for which he does not know the Islamic ruling, it is necessary for him to act with caution or to follow a mujtahid according to the aforementioned conditions. However, in the event that a person does not have access to the fatwa of the most learned mujtahid, it is permitted (jāʾiz) for him to follow the next most learned mujtahid. Ruling 11. If someone relates a mujtahid’s fatwa to a second person, in the e

In [None]:
conversational_rag_chain.invoke(
    {"input": "Do you remember what I asked you? If you remember, then answer the question I asked you"},
    config={"configurable": {"session_id": "abc123"}},
)["answer"]

'System: You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don\'t know the answer, just say that you don\'t know. ‘Taqlīd’ simply means an undertaking to follow the fatwa of a particular mujtahid; it does not mean acting according to his instructions.8 Ruling 9. It is necessary for a mukallaf to learn those rulings that he considers he probably needs to learn in order to avoid sinning. ‘Sinning’ means not performing obligatory acts or performing unlawful acts. Ruling 10. If a mukallaf comes across a matter for which he does not know the Islamic ruling, it is necessary for him to act with caution or to follow a mujtahid according to the aforementioned conditions. However, in the event that a person does not have access to the fatwa of the most learned mujtahid, it is permitted (jāʾiz) for him to follow the next most learned mujtahid. Ruling 11. If someone relates a mujtahid’s fatwa to a second person, in the e

Another approach to chat history


In [None]:
from IPython.display import display
import ipywidgets as widgets
chain = ConversationalRetrievalChain.from_llm(llm, retriever, return_source_documents=True)


In [None]:
chat_history = []

def on_submit(_):
    query = input_box.value
    input_box.value = ""

    if query.lower() == 'exit':
        print("Thank you for using our chatbot!")
        return

    result = chain.invoke({"question": query, "chat_history": chat_history})
    chat_history.append((query, result['answer']))

    display(widgets.HTML(f'<b>User:</b> {query}'))
    display(widgets.HTML(f'<b><font color="blue">Chatbot:</font></b> {result["answer"]}'))

print("Welcome to the Ayatollah chatbot! Type 'exit' to stop.")

input_box = widgets.Text(placeholder='Please enter your question:')
input_box.on_submit(on_submit)

display(input_box)

Welcome to the Ayatollah chatbot! Type 'exit' to stop.


Text(value='', placeholder='Please enter your question:')

HTML(value='<b>User:</b> what is taqlee')

HTML(value='<b><font color="blue">Chatbot:</font></b> Use the following pieces of context to answer the questi…



HTML(value='<b>User:</b> what is it about')

HTML(value='<b><font color="blue">Chatbot:</font></b> Use the following pieces of context to answer the questi…

HTML(value='<b>User:</b> what did i first ask you')

HTML(value='<b><font color="blue">Chatbot:</font></b> Use the following pieces of context to answer the questi…

HTML(value='<b>User:</b> what did i first ask you/')

HTML(value='<b><font color="blue">Chatbot:</font></b> Use the following pieces of context to answer the questi…