<img src="https://drive.google.com/uc?id=1uZppqKTPFt0zfrTsA-76AGRfPcp-PY-a">

Steps in building a RAG-based conversational assistant
<img src="https://drive.google.com/uc?id=1Bh_GFFABV45OyupAI6hsr_DHWRcYJSNF">

In [1]:
!pip install pypdf accelerate bitsandbytes gradio langchain chromadb sentence_transformers


In [2]:
from torch import cuda, bfloat16
import transformers

model_id = "mistralai/Mistral-7B-Instruct-v0.2"

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

# set quantization configuration to load large model with less GPU memory
# this requires the `bitsandbytes` library
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map={'':0},
)
model.eval()
print(f"Model loaded on {device}")

Model loaded on cuda:0
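
As a rough back-of-the-envelope check on why 4-bit loading helps, the sketch below estimates the memory needed for the weights alone at different precisions. The parameter count of about 7 billion is an assumption read off the model name, and the estimate ignores activations, the KV cache, and any quantization overhead.

```python
# Rough estimate of weight memory only; the parameter count is an assumed round figure.
ASSUMED_PARAMS = 7_000_000_000

def weight_memory_gib(num_params: int, bits_per_param: float) -> float:
    """Return the memory needed to store the weights, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3

for label, bits in [("fp32", 32), ("bf16", 16), ("4-bit (nf4)", 4)]:
    print(f"{label:>12}: {weight_memory_gib(ASSUMED_PARAMS, bits):.1f} GiB")
```

Under these assumptions the 4-bit weights take about a quarter of the bf16 footprint, which is what lets a model of this size fit on a single consumer or Colab GPU.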


In [3]:
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)

In [4]:

stop_list = ['\nHuman:', '\n```\n']

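# Tokenize each stop sequence. Note that the tokenizer prepends the BOS id (1),
# visible in the output below; a stop sequence that contains it may never match
# the tail of the generated text.
stop_token_ids = [tokenizer(x)['input_ids'] for x in stop_list]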
stop_token_ids

[[1, 28705, 13, 28769, 6366, 28747], [1, 28705, 13, 13940, 28832, 13]]

In [5]:

import torch

stop_token_ids = [torch.LongTensor(x).to(device) for x in stop_token_ids]
stop_token_ids

[tensor([    1, 28705,    13, 28769,  6366, 28747], device='cuda:0'),
 tensor([    1, 28705,    13, 13940, 28832,    13], device='cuda:0')]

In [6]:
from transformers import StoppingCriteria, StoppingCriteriaList

# define custom stopping criteria object
class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for stop_ids in stop_token_ids:
            if torch.eq(input_ids[0][-len(stop_ids):], stop_ids).all():
                return True
        return False

stopping_criteria = StoppingCriteriaList([StopOnTokens()])
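
The suffix comparison inside `StopOnTokens` can be illustrated with plain lists. The sketch below reuses the ids printed earlier for `'\nHuman:'`; the generated ids and the helper name `ends_with` are made up for illustration. It suggests that a stop sequence still carrying the leading BOS id will not match the tail of generated text, while the same sequence without it does. Passing `add_special_tokens=False` to the tokenizer is one way to avoid the BOS id, though the remaining ids can still depend on tokenization context.

```python
def ends_with(generated: list[int], stop_ids: list[int]) -> bool:
    """True if the generated ids end with the stop sequence."""
    return len(generated) >= len(stop_ids) and generated[-len(stop_ids):] == stop_ids

# Stop ids as printed above, including the BOS id 1 added by the tokenizer.
stop_with_bos = [1, 28705, 13, 28769, 6366, 28747]
stop_without_bos = stop_with_bos[1:]

# Hypothetical generation: BOS, some made-up token ids, then the stop text.
generated = [1, 100, 200, 300, 28705, 13, 28769, 6366, 28747]

print(ends_with(generated, stop_with_bos))     # False
print(ends_with(generated, stop_without_bos))  # True
```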


In [7]:
llm = transformers.pipeline(
    model=model, tokenizer=tokenizer,
    return_full_text=True,  # langchain expects the full text
    task='text-generation',
    # we pass model parameters here too
    stopping_criteria=stopping_criteria,  # without this model rambles during chat
    temperature=0.01,  # sampling temperature; lower values are more deterministic (used only when sampling is enabled via do_sample=True)
    top_p=0.95,
    top_k=50,
    max_new_tokens=512,  # max number of tokens to generate in the output
    repetition_penalty=1.1  # without this output begins repeating
)
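
To give a feel for what `temperature`, `top_k`, and `top_p` do, here is a minimal pure-Python sketch of the three filters applied to toy logits. It is an illustration under simplifying assumptions, not the `transformers` implementation; the function name `filter_probs` and the toy logits are made up.

```python
import math

def filter_probs(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Apply temperature, then top-k, then nucleus (top-p) filtering.

    Returns a dict mapping the surviving token indices to renormalized probabilities.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = {i: math.exp(x - peak) for i, x in enumerate(scaled)}
    ranked = sorted(weights, key=weights.get, reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]  # keep only the k most likely tokens
    total = sum(weights[i] for i in ranked)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += weights[i] / total
        if cumulative >= top_p:  # stop once the nucleus covers top_p of the mass
            break
    kept_total = sum(weights[i] for i in kept)
    return {i: weights[i] / kept_total for i in kept}

toy_logits = [2.0, 1.0, 0.0]
print(filter_probs(toy_logits, temperature=0.01, top_k=50, top_p=0.95))
print(filter_probs(toy_logits, temperature=1.0, top_k=2, top_p=1.0))
```

With a temperature as low as 0.01 essentially all probability mass lands on the top token, so sampling behaves almost like greedy decoding.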

In [8]:
res = llm("Explain to me the difference between nuclear fission and fusion.")
print(res[0]["generated_text"])

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Explain to me the difference between nuclear fission and fusion.

Nuclear Fission is a process in which an atomic nucleus splits into two or more smaller nuclei, releasing energy in the form of gamma radiation and neutrons. This process typically occurs when a large nucleus absorbs a neutron, becoming unstable and splitting apart. The resulting smaller nuclei have less mass than the original nucleus, and this mass difference is released as energy according to Einstein's famous equation E=mc².

Nuclear Fusion, on the other hand, is a process in which two or more atomic nuclei combine to form a single larger nucleus, releasing energy in the form of light and heat. This process typically occurs at extremely high temperatures and pressures, where the atomic nuclei are able to overcome their natural repulsion and come close enough together to fuse. The most common type of nuclear fusion is the fusion of hydrogen isotopes (deuterium and tritium) to form helium.

In summary, Nuclear Fission i

In [9]:
from langchain.llms import HuggingFacePipeline

llm = HuggingFacePipeline(pipeline=llm)

In [10]:
llm(prompt="Explain to me the difference between nuclear fission and fusion.")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


"\n\nNuclear Fission is a process in which an atomic nucleus splits into two or more smaller nuclei, releasing energy in the form of gamma radiation and neutrons. This process typically occurs when a large nucleus absorbs a neutron, becoming unstable and splitting apart. The resulting smaller nuclei have less mass than the original nucleus, and this mass difference is released as energy according to Einstein's famous equation E=mc².\n\nNuclear Fusion, on the other hand, is a process in which two or more atomic nuclei combine to form a single larger nucleus, releasing energy in the form of light and heat. This process typically occurs at extremely high temperatures and pressures, where the atomic nuclei are able to overcome their natural repulsion and come close enough together to fuse. The most common type of nuclear fusion is the fusion of hydrogen isotopes (deuterium and tritium) to form helium.\n\nIn summary, Nuclear Fission involves the splitting of a large nucleus into smaller one

In [11]:
# Re-save the file as clean UTF-8: undecodable bytes are dropped on read
with open('sample_data/article_370.txt', 'r', encoding='utf-8', errors='ignore') as f:
    lines = f.readlines()

with open('sample_data/article_370.txt', 'w', encoding='utf-8') as f:
    f.writelines(lines)

In [12]:
# loading
from langchain.document_loaders import DirectoryLoader, TextLoader


path = "sample_data"
loader = DirectoryLoader(
    path,
    glob="*.txt",
    loader_cls=TextLoader
)

documents = loader.load()

In [13]:
documents[0]



In [14]:
# chunking

from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    chunk_size=448,
    chunk_overlap=128,
    length_function=len,
    separator="\n"
)
chunks = text_splitter.split_documents(documents)




In [15]:
print(f"{len(documents)=} {len(chunks)=}")

len(documents)=1 len(chunks)=1803
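
The idea behind separator-based chunking with overlap can be sketched in a few lines of plain Python. This is a simplified illustration, not LangChain's `CharacterTextSplitter` implementation; the function name `split_with_overlap` and the sample text are hypothetical.

```python
def split_with_overlap(text, chunk_size, chunk_overlap, separator="\n"):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size
    characters, carrying trailing pieces over into the next chunk as overlap."""
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], []
    for piece in pieces:
        if current and len(separator.join(current + [piece])) > chunk_size:
            chunks.append(separator.join(current))
            # Drop leading pieces until the carried-over tail fits the overlap
            # budget and leaves room for the incoming piece.
            while current and (
                len(separator.join(current)) > chunk_overlap
                or len(separator.join(current + [piece])) > chunk_size
            ):
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks

sample = "\n".join(["aaaa", "bbbb", "cccc", "dddd"])
print(split_with_overlap(sample, chunk_size=9, chunk_overlap=4))
# ['aaaa\nbbbb', 'bbbb\ncccc', 'cccc\ndddd']
```

Each chunk repeats the last line of the previous one, which is the role `chunk_overlap=128` plays above: context that straddles a chunk boundary is still retrievable from at least one chunk.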


In [16]:
chunks[0]

Document(page_content='2023 INSC 1058\tReportable  \nIN THE SUPREME COURT OF INDIA \nORIGINAL WRIT / APPELLATE JURISDICTION \n \nWrit Petition (Civil) No. 1099 of 2019 \n \n \n \n \nIN RE: ARTICLE 370 OF THE CONSTITUTION \n \n \n \n \n \n \n \n \nWith  \n \nWrit Petition (C) No. 871 of 2015 \n \nWith  \n \nWrit Petition (C) No. 722 of 2014 \n \nWith  \n \nSLP (C) No. 19618 of 2017 \n \nWith  \n \nWrit Petition (C) No. 1013 of 2019 \n \nWith  \n \nWrit Petition (C) No. 1082 of 2019 \n \nWith', metadata={'source': 'sample_data/article_370.txt'})

In [19]:
# indexing
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores.chroma import Chroma

BI_ENCODER_MODEL = "intfloat/e5-large-v2"
PERSIST_DIR = "db"

hf_embedding_model = HuggingFaceEmbeddings(model_name=BI_ENCODER_MODEL)

vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=hf_embedding_model,
    persist_directory=PERSIST_DIR
)
vector_db.persist()


In [20]:
print(vector_db._collection.count())

1803


In [45]:
k=10
fetch_k=20
query = "What are the factors that would allow for the abrogation of Article 370?"
vector_db.max_marginal_relevance_search(query, k=k, fetch_k=fetch_k)

[Document(page_content='370(1)(d); \nd. Whether the abrogation of Article 370 by the President in exercise of the power under Article 370(3) is constitutionally invalid in the absence of a recommendation of the Constituent Assembly of the State of Jammu and \nKashmir as mandated by the proviso to clause (3);', metadata={'source': 'sample_data/article_370.txt'}),
 Document(page_content='52. The nature of Article 370 itself - whether temporary or permanent - is the key to assessing the validity of the impugned actions. We propose to conduct this enquiry in three ways. First, by examining the historical background that led to the introduction of the provision in the Constitution. Second, by looking at the structure of the provision itself, and third, by reflecting on how the provision has worked out in the context of State-Union relations.', metadata={'source': 'sample_data/article_370.txt'}),
 Document(page_content='44. The abrogation of Article 370 brings the residents of Jammu and Kash
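
Maximal marginal relevance (MMR) first fetches `fetch_k` candidates by similarity and then picks `k` of them, trading relevance to the query against redundancy with what has already been picked. The sketch below shows the selection step on toy 2-D vectors; the function names and vectors are made up, and `lambda_mult=0.5` is assumed as the balance between the two terms.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Return the indices of k documents chosen by maximal marginal relevance."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document (0 if none yet).
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

toy_query = [1.0, 0.0]
toy_docs = [[1.0, 0.1], [1.0, 0.11], [0.8, -0.6]]  # docs 0 and 1 are near-duplicates

print(mmr(toy_query, toy_docs, k=2, lambda_mult=0.5))  # [0, 2]
print(mmr(toy_query, toy_docs, k=2, lambda_mult=1.0))  # [0, 1]
```

With `lambda_mult=1.0` the selection reduces to plain similarity ranking and returns both near-duplicates; with 0.5 the second pick is the less similar but more diverse document.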

In [46]:
# Retrieval QA Chain
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

template = """Use the following pieces of context to answer the question at the end. \
If you don't know the answer, just say that you don't know, don't try to make up an answer. \
Keep the answer short and succinct.

Context:<{context}>
Question:<{question}>
Helpful Answer:"""

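# Wrap the prompt in the model's chat format; the {context} and {question}
# placeholders pass through unchanged and are filled in later by LangChain
conversation = [{"role": "user", "content": template}]

template = tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)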

QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vector_db.as_retriever(search_type="mmr"),
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

result = qa_chain({"query": query})
print(result["result"])

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


 According to the context provided, the abrogation of Article 370 can be allowed through the President's powers under Article 370(1)(d), which permits alterations to Article 370 itself. This is necessary because if this route is not kept open, Article 370 would become permanent, which was not the intention of the Constitution makers. The context also states that the abrogation brings residents of Jammu and Kashmir at par with other Indian citizens and confers upon them all rights flowing from the Constitution, making it an non-arbitrary act. However, it is important to note that this interpretation may be subject to further legal analysis and debate.
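
Conceptually, the default "stuff" approach concatenates the retrieved chunks into one context string and substitutes it into the prompt together with the question. A minimal sketch of that step, assuming a blank-line separator between chunks and using a made-up helper name `build_prompt` and toy chunks:

```python
# Same placeholders as the notebook's prompt, shortened for illustration.
TEMPLATE = "Context:<{context}>\nQuestion:<{question}>\nHelpful Answer:"

def build_prompt(chunks, question, separator="\n\n"):
    """Join the retrieved chunks and fill the prompt template."""
    context = separator.join(chunks)
    return TEMPLATE.format(context=context, question=question)

toy_chunks = ["First retrieved passage.", "Second retrieved passage."]
print(build_prompt(toy_chunks, "What does the first passage say?"))
```

Because every retrieved chunk lands in a single prompt, the number of chunks and the chunk size together have to fit within the model's context window.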


In [47]:
# Memory

from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=False
)
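
With `return_messages=False`, the buffer memory hands the chain its history as a single string rather than a list of message objects. The class below is a minimal stand-in that mimics this behaviour for illustration; `BufferMemorySketch` and its method names are hypothetical, and the `Human:` / `AI:` prefixes are an assumption about the formatting.

```python
class BufferMemorySketch:
    """Keeps every turn and renders the whole conversation as one string."""

    def __init__(self):
        self.turns = []  # list of (user_message, ai_message) pairs

    def save(self, user_message, ai_message):
        self.turns.append((user_message, ai_message))

    def as_text(self):
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)

sketch_memory = BufferMemorySketch()
sketch_memory.save("What is Article 370?", "A provision of the Constitution of India.")
sketch_memory.save("Was it temporary?", "That is one of the questions the judgment examines.")
print(sketch_memory.as_text())
```

Since the buffer keeps every turn verbatim, the history grows with the conversation and eventually competes with the retrieved context for space in the prompt.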

In [48]:
from langchain.chains import ConversationalRetrievalChain

retriever = vector_db.as_retriever(search_type="mmr", search_kwargs={"k": k})
qa = ConversationalRetrievalChain.from_llm(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    get_chat_history=lambda h: h,  # memory already returns the history as a string
    memory=memory,
    verbose=True)

# Query the conversational chain (it expects "question" and returns "answer")
result = qa({"question": query})
print(result["answer"])


In [49]:
import gradio as gr

with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    msg = gr.Textbox()
    clear = gr.ClearButton([msg, chatbot])

    def respond(message, chat_history):
        # Use the conversational chain; its memory supplies the chat history
        bot_message = qa({"question": message})["answer"]
        # Strip angle brackets, which the chat UI could otherwise interpret as markup
        chat_history.append((message, bot_message.replace("<", "").replace(">", "")))
        return "", chat_history

    msg.submit(respond, [msg, chatbot], [msg, chatbot])

demo.launch(share=True)





### References

1. [LangChain for LLM Application Development](https://www.deeplearning.ai/short-courses/langchain-for-llm-application-development/)
2. [LangChain: Chat with Your Data](https://www.deeplearning.ai/short-courses/langchain-chat-with-your-data/)
3. [How to create GPT-powered conversational bot for any website](https://youtu.be/T1hdz3eU3bg)
4. [LLaMa 70B Chatbot in Hugging Face and LangChain](https://github.com/pinecone-io/examples/blob/master/learn/generation/llm-field-guide/llama-2-70b-chat-agent.ipynb)
5. [Using HuggingFace, OpenAI, and Cohere models with Langchain](https://medium.com/the-techlife/using-huggingface-openai-and-cohere-models-with-langchain-db57af14ac5b)

### Further Reading
1. [Build a ChatGPT for PDFs with Langchain](https://www.analyticsvidhya.com/blog/2023/05/build-a-chatgpt-for-pdfs-with-langchain/)
2. [Build A ChatGPT For YouTube Videos with Langchain](https://www.analyticsvidhya.com/blog/2023/06/build-a-chatgpt-for-youtube-videos-with-langchain/)
3. [QA over Documents](https://python.langchain.com/docs/use_cases/question_answering/)